Calculation of the future behavior of COVID-19 using past data

April 8th, 2020 – Reading time: 8 minutes

The CoE Analytics & Sensorics regularly analyzes the available COVID-19 data to give insights into the possibilities and constraints of data analysis.

Established as a think tank for statistics and data analysis, the CoE AS normally helps companies understand themselves better by analyzing their data (process, metrology, manufacturing…) to extract hidden information and parameters.

With this project we want to share our daily analysis of the current COVID-19 outbreak and of the impact of these statistics. Everyone should have the opportunity to understand how decisions in politics, society and our daily lives may have an impact, and why the same data can be interpreted in such different ways by different stakeholders.

Comments, additions and corrections are gratefully received.


COVID-19 Key Facts

  • Current doubling time: 10.622±0.902 days
  • After March 29th the doubling time increased again, a strong indicator of the success of the government actions, as we were able to confirm
  • Parameters can be extracted from the data and used to calculate models that predict the future behavior
  • With the current doubling time, we expect the current situation to last more than 200 additional days, depending on whether a vaccine becomes available and whether the restrictions are lifted

Model parameters from past data

To predict the future, we use standard models. These, however, rely on measured data from the ongoing pandemic. The most relevant model parameter is the infection rate β = ln(2)/τ₂, which is directly connected to the doubling time τ₂.

Figures 1 and 2 show logarithmic plots of the daily reported and the accumulated COVID-19 cases versus the reporting date. Figure 1 shows the reported COVID-19 cases and deaths per reporting date. From March 19th onwards, the growth of newly reported cases slows down, and it slows down again after March 29th. This is a clear indicator of the success of the contact prohibition, as we already mentioned in our previous article.

Figure 1: (Logarithmic scale) Number of new cases and new deaths vs. the reporting date. The growth of the number of new cases per day slows down after March 19th. The difference between the periods March 23rd to March 29th and March 30th to April 5th is much smaller than between comparable earlier periods.

Extracting the doubling time

Figure 2 shows the accumulated cases against their respective reporting dates. Here, too, the curve bends from March 19th and again from March 29th onwards.

Figure 2: (Logarithmic scale) Accumulated COVID-19 cases and deaths vs. the reporting date. The green, grey and black curves are the best exponential fits. Note that the doubling time of the green curve is 3 days longer than that of the grey curve, which in turn is 3 days longer than that of the black curve. We also added the major government actions [4], which influence the behavior of the curve. Additionally, we show an estimate of the number of recoveries. (*) The estimate is based upon the WHO report on the Hubei case [1], which gives us rough numbers on the recovery time.

This coincides with the three fitted exponential curves, which intersect on March 19th and March 28th (note again the logarithmic scale; Figure 3 shows the same graph on a linear scale).

Figure 3: (Linear scale) Accumulated COVID-19 cases and deaths on a linear scale for comparison. Note that the number of deaths is small compared to the number of infections. Additionally, we show an estimate of the number of recoveries. (*) The estimate is based upon the WHO report on the Hubei case [1], which gives us rough numbers on the recovery time.

At the transition from the grey to the green curve, the doubling time increased to τ₂ = 10.622 days (β = 0.065 day⁻¹), which means that it takes about 10 days for the number of reported infections to double. The government and the Robert Koch Institute aim for a doubling time of about 13-14 days; the reasons are given in the model calculations below.
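The relation between doubling time and infection rate can be checked with a few lines of Python. This is a minimal sketch, not the CoE's own code: the function names are hypothetical, and the first-order error propagation (|dβ/dτ₂|·Δτ₂) is our assumption about how the ±0.902 days of the doubling time carry over to β.

```python
import math


def rate_from_doubling_time(tau2_days: float) -> float:
    """Infection rate beta (1/day) for a doubling time tau2 (days): beta = ln(2) / tau2."""
    return math.log(2) / tau2_days


def doubling_time_from_rate(beta_per_day: float) -> float:
    """Inverse relation: tau2 = ln(2) / beta."""
    return math.log(2) / beta_per_day


tau2, tau2_err = 10.622, 0.902        # current doubling time and its uncertainty (days)
beta = rate_from_doubling_time(tau2)
# First-order error propagation: |d(beta)/d(tau2)| * tau2_err = ln(2) / tau2**2 * tau2_err
beta_err = math.log(2) / tau2**2 * tau2_err
print(f"beta = {beta:.3f} ± {beta_err:.3f} 1/day")  # beta = 0.065 ± 0.006 1/day
```

Because β and τ₂ are inversely proportional, a longer doubling time always means a smaller infection rate, and the relative uncertainty of both is the same to first order.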

Extracting parameters from data


Parameters are important quantities in mathematical models and can influence their behavior significantly. The parameters of linear models are easy to obtain by linear regression; however, most models are not linear. For those models, the best-fitting parameters can be found with a non-linear least-squares fitting method. An important algorithm for non-linear least-squares fitting is the Levenberg-Marquardt algorithm.
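The following pure-Python sketch illustrates Levenberg-Marquardt for the exponential growth model N(t) = N₀·e^(βt), the kind of fit shown in Figure 2. It is an illustration under simplifying assumptions, not the code behind the figures: the function name fit_exponential_lm, the damping schedule (λ divided or multiplied by 10) and the noise-free example data are hypothetical. In practice a library routine such as SciPy's scipy.optimize.curve_fit (which offers method="lm") would normally be used instead.

```python
import math


def fit_exponential_lm(t, y, a0, b0, max_iter=200, tol=1e-12):
    """Fit y ~ a * exp(b * t) by non-linear least squares (Levenberg-Marquardt).

    Minimal two-parameter sketch: a plays the role of N0, b the role of beta.
    """
    def sse(a, b):
        # Sum of squared residuals for parameters (a, b)
        return sum((yi - a * math.exp(b * ti)) ** 2 for ti, yi in zip(t, y))

    a, b, lam = float(a0), float(b0), 1e-3
    cost = sse(a, b)
    for _ in range(max_iter):
        # Accumulate J^T J (symmetric 2x2) and J^T r for the current parameters
        jaa = jab = jbb = ga = gb = 0.0
        for ti, yi in zip(t, y):
            e = math.exp(b * ti)
            r = yi - a * e            # residual
            ja, jb = e, a * ti * e    # partial derivatives df/da and df/db
            jaa += ja * ja
            jab += ja * jb
            jbb += jb * jb
            ga += ja * r
            gb += jb * r
        while True:
            # Marquardt damping: (J^T J + lam * diag(J^T J)) delta = J^T r, solved by Cramer's rule
            m00, m11 = jaa * (1.0 + lam), jbb * (1.0 + lam)
            det = m00 * m11 - jab * jab
            da = (ga * m11 - jab * gb) / det
            db = (m00 * gb - jab * ga) / det
            new_cost = sse(a + da, b + db)
            if new_cost < cost:
                a, b = a + da, b + db    # step accepted: behave more like Gauss-Newton
                lam = max(lam / 10.0, 1e-12)
                break
            lam *= 10.0                  # step rejected: behave more like gradient descent
            if lam > 1e12:
                return a, b              # no further improvement possible
        if cost - new_cost <= tol * cost:
            return a, b                  # relative improvement below tolerance
        cost = new_cost
    return a, b


# Hypothetical, noise-free example: 1000 cases on day 0, doubling every 10.622 days
days = list(range(21))
cases = [1000 * 2 ** (d / 10.622) for d in days]
a, b = fit_exponential_lm(days, cases, a0=800.0, b0=0.08)
print(f"N0 = {a:.1f}, beta = {b:.4f} 1/day, tau2 = {math.log(2) / b:.2f} days")
```

Each iteration solves the damped normal equations: a small λ makes the step close to a Gauss-Newton step, a large λ makes it a short step along the gradient, which is what makes the method robust when the starting values are off.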

Predicting the future

It should be obvious that the future of a pandemic cannot be predicted, because it is far too complex a system that depends on too many influencing variables. Among other things, government action, the existing health care system, general hygienic conditions and wealth have a great influence on the spread of an infection. To make statements about the future behavior of a system, it is necessary to simplify it to such an extent that the number of influencing factors remains manageable. Models are used for this purpose.

Various models are used in epidemiology, including the very simple SI model (susceptible-infected model) [11] and the somewhat more advanced SIR model (susceptible-infected-removed model) [12], which we have chosen for the model calculations here. Both models are strong simplifications of reality, but can still provide orientation in dealing with COVID-19.

The future according to the SI model

The SI model only considers the spread of an infection; recoveries and deaths are not modeled. The model is governed by the number of infected persons, the number of persons who can still be infected (the so-called susceptibles) and the infection rate determined above. Figure 4 shows the model calculated for each of the three doubling times from Figure 2. As can be seen, the spread of the infection slows down with longer doubling times, but this comes at the price of a longer time until the population has been infected to the extent necessary for herd immunity. According to this model, the minimum of 70% of the population that must be infected (approx. 58 million inhabitants) will be reached in approx. 115 days with the current doubling time of approx. 10 days.
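For the SI model, dI/dt = β·S·I/N with S = N − I, the solution is the logistic curve I(t) = N / (1 + (N/I₀ − 1)·e^(−βt)), so the time until a given share of the population is infected can be computed in closed form. The sketch below assumes a population of N = 83 million (consistent with 70% ≈ 58 million in the text) and a hypothetical starting value of I₀ = 100,000 infected; the two shorter doubling times are only approximations derived from the Figure 2 caption (3 and 6 days shorter than the current one). It is not the calculation behind Figure 4.

```python
import math

N = 83_000_000   # assumption: population of Germany (70% is approx. 58 million)
I0 = 100_000     # hypothetical number of infected persons at t = 0


def si_infected(t: float, beta: float, n: float = N, i0: float = I0) -> float:
    """Logistic solution of the SI model dI/dt = beta * S * I / N with S = N - I."""
    return n / (1.0 + (n / i0 - 1.0) * math.exp(-beta * t))


def si_days_to_share(share: float, beta: float, n: float = N, i0: float = I0) -> float:
    """Days until the infected share I/N reaches `share` (0 < share < 1)."""
    return math.log((n / i0 - 1.0) * share / (1.0 - share)) / beta


# Current doubling time and the two shorter ones implied by the Figure 2 caption
for tau2 in (10.622 - 6, 10.622 - 3, 10.622):
    beta = math.log(2) / tau2
    print(f"tau2 = {tau2:5.2f} days -> 70% infected after {si_days_to_share(0.7, beta):4.0f} days")
```

Setting I(t)/N equal to the target share and solving for t gives the second function; the result scales linearly with the doubling time, which is why slowing the spread stretches the time to herd immunity.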

Figure 4: Forecasts of the future behavior using the simplest model, the SI model [11] (susceptible-infected model), with the findings from the current data. The line widths (thin to thick) represent the three doubling times from Figure 2. This model shows the number of infected people in a population. It does not consider deaths or recoveries (i.e. it assumes that the recovery time is far longer than the time the infection needs to spread).

The SIR model yields different results

The SIR model extends the SI model by taking into account the persons removed from the system (through immunity or death). Figure 5 shows this model calculated for the three doubling times from Figure 2.

Figure 5: Forecasts of the future behavior using the SIR model [12] (susceptible-infected-removed model) with the findings from the current data. The line widths (thin to thick) represent the three doubling times from Figure 2. The curves show the Susceptible, the Infected and the Removed (i.e. immune or deceased) population in the three scenarios.

You can see the susceptible (grey), infected (blue) and already immune persons (green) for the three scenarios. An interesting result of this model is that the curve of infected persons has a maximum in each scenario. In addition, the model shows how high the final immunity of the population will be (in this idealized case). For example, if the disease spreads rapidly, an immunity of almost 100% is reached, whereas at the current rate of spread we can probably expect an immunity of less than 50% of the population. As a consequence, the current situation will most likely repeat itself with a new wave of COVID-19 if no vaccine is available by then. Flattening the infection curve therefore has its price: we gain time for research but have to live longer with the current restrictions. With the current doubling time of about 10 days, we can expect the number of simultaneously infected persons to peak at about 2.5 million in about one year.
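The SIR curves can be reproduced qualitatively by integrating the three model equations numerically. This is a minimal sketch under stated assumptions: a classic fourth-order Runge-Kutta (RK4) integrator, N = 83 million, a hypothetical I₀ = 100,000 and, following the text, β taken directly from the current doubling time and γ = 1/20 day⁻¹. The function name simulate_sir is hypothetical, and the printed peak and final immunity depend on these assumed inputs.

```python
import math


def simulate_sir(beta, gamma, n, i0, days, dt=0.25):
    """Integrate the SIR model with a classic 4th-order Runge-Kutta scheme.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I
    Returns a list of (t, S, I, R) tuples, one per time step.
    """
    def deriv(s, i):
        new_inf = beta * s * i / n          # new infections per day
        return -new_inf, new_inf - gamma * i, gamma * i

    s, i, r, t = float(n - i0), float(i0), 0.0, 0.0
    out = [(t, s, i, r)]
    for _ in range(int(days / dt)):
        k1 = deriv(s, i)
        k2 = deriv(s + dt / 2 * k1[0], i + dt / 2 * k1[1])
        k3 = deriv(s + dt / 2 * k2[0], i + dt / 2 * k2[1])
        k4 = deriv(s + dt * k3[0], i + dt * k3[1])
        s += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        r += dt / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        t += dt
        out.append((t, s, i, r))
    return out


N = 83_000_000                  # assumption: population of Germany
I0 = 100_000                    # hypothetical number of infected at t = 0
beta = math.log(2) / 10.622     # infection rate from the current doubling time (as in the text)
gamma = 1 / 20                  # assumed recovery rate of 1/20 per day (as in the text)

traj = simulate_sir(beta, gamma, N, I0, days=2000)
t_peak, _, i_peak, _ = max(traj, key=lambda row: row[2])
final_removed_share = traj[-1][3] / N
print(f"peak: {i_peak / 1e6:.2f} million infected on day {t_peak:.0f}")
print(f"share removed (immune or deceased) at the end: {final_removed_share:.0%}")
```

Since the three derivatives sum to zero, S + I + R stays equal to N, which is a quick sanity check for any integrator used here.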

The SIR model provides another important quantity, the so-called basic reproduction number R₀ = β/γ, where γ is the recovery rate of the disease, which we assume to be 1/20 day⁻¹. This figure indicates the average number of people infected by one infected person. With the current data it is approximately 1.3 (0.065/0.05), so one infected person infects 1.3 other people on average. To stop the spread of the epidemic, the basic reproduction number must be kept below 1, i.e. each infected person infects less than one additional person, so that the epidemic comes to a standstill over time. The doubling time at a basic reproduction number of 1 (linear spread) is τ₂ = ln(2)/(R₀·γ) = ln(2)/(1 × 1/20 day⁻¹) ≈ 13.9 days, i.e. the 13-14 days targeted above.
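The arithmetic behind R₀ and the 13-14 day target can be written out directly, using the same relations as the text (β = ln(2)/τ₂, R₀ = β/γ, γ = 1/20 day⁻¹). A minimal sketch; the helper name doubling_time_for_r0 is hypothetical.

```python
import math

GAMMA = 1 / 20                      # assumed recovery rate (1/day), as in the text


def doubling_time_for_r0(r0: float, gamma: float = GAMMA) -> float:
    """Doubling time tau2 = ln(2) / beta with beta = r0 * gamma (relation used in the text)."""
    return math.log(2) / (r0 * gamma)


beta = math.log(2) / 10.622         # current infection rate from the doubling time (1/day)
r0 = beta / GAMMA                   # basic reproduction number R0 = beta / gamma
print(f"R0 = {r0:.1f}")                                          # R0 = 1.3
print(f"tau2 at R0 = 1: {doubling_time_for_r0(1.0):.1f} days")   # tau2 at R0 = 1: 13.9 days
```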

Models can fail

Models are based on simplifications and idealizations. No computer has the capacity to simulate a real system in full detail, and it is equally impossible to capture all influencing variables at any given point in time. A model therefore describes an idealized system which, in the best case, approximates reality. Nevertheless, deviations and completely different outcomes are possible in systems as complex as pandemics (weather forecasting is a good example of such unexpected behavior). To stay as close as possible to reality, models must be recalculated repeatedly with updated parameters (weather models are calculated several times a day). Last but not least, a model is only as good as the parameters determined beforehand. If the data source is insufficient, even the best model cannot provide a good forecast. See also our analysis of data sources in the previous article.

Predicting a system's behaviour


To predict the behavior of a system, a model must be available that describes the system mathematically, at least in an idealized way. The more advanced and detailed such a model is, i.e. the fewer simplifications are necessary, the better it can describe the system and the more reliable its predictions are. However, as the number of parameters grows, it also becomes more difficult to fit the model to the data measured so far, which is what allows the future to be predicted from the measured course of the past.


Erik Hänel, Head of Analytics & Sensorics





The presented analysis is based upon the data provided by the Robert Koch Institute. This data is updated daily. The conclusions we draw today may already be invalidated by new data published tomorrow. We try to keep this analysis as up-to-date as possible.

Because not every health department updates its data over the weekend, the data updates published by the Robert Koch Institute on Sundays and Mondays are not reliable and are ignored in our analysis. The next update will be performed on Tuesday.
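The weekend rule can be expressed as a simple filter on the publication date. A minimal sketch, assuming updates are held as (publication date, payload) pairs; the function usable_updates and this data layout are hypothetical. Python's date.weekday() returns 0 for Monday and 6 for Sunday.

```python
from datetime import date

# date.weekday(): Monday == 0, ..., Sunday == 6
IGNORED_WEEKDAYS = {0, 6}  # updates published on Mondays and Sundays are treated as unreliable


def usable_updates(updates):
    """Keep only updates that were not published on a Sunday or Monday.

    `updates` is assumed to be a list of (publication_date, payload) pairs.
    """
    return [(d, payload) for d, payload in updates if d.weekday() not in IGNORED_WEEKDAYS]


# Example: Sunday, April 5th to Wednesday, April 8th 2020
week = [(date(2020, 4, day), f"update {day}") for day in range(5, 9)]
print([d.isoformat() for d, _ in usable_updates(week)])  # ['2020-04-07', '2020-04-08']
```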

Change history


  • Doubling time: 10.622±0.902 days
  • Added model calculations to predict the future behavior of the pandemic


  • Doubling time: 9.692±0.586 days
  • Exponential growth decreased again after March 29th
  • Success of the contact prohibition could be confirmed


  • Doubling time: 7.213±0.346 days
  • Highlighted the COVID-19 key facts


  • Doubling time: 6.939±0.345 days


  • Doubling time: 6.623±0.417 days
  • Added estimated number of recoveries based upon WHO estimation


  • Doubling time: 6.231±0.274 days
  • Added population and population density of the German federal states
  • Added incidences and case densities of the German federal states
  • Added an initial analysis on the spread in the federal states


  • Doubling time: 5.849±0.172 days
  • Added major government actions
  • The last two days are now excluded from the fit because their data is quite incomplete
  • Added initial analysis about government actions


  • Doubling time: 6.182 days
  • Initial version

Further Reading

  1. WHO. Report of the WHO-China Joint Mission on Coronavirus Disease 2019 (COVID-19), 16-24 February 2020. World Health Organization (WHO); 2020
  2. RKI. COVID-19 Dashboard. Robert Koch Institute; 2020
  3. RKI. SARS-CoV2-Steckbrief [SARS-CoV-2 profile]. Robert Koch Institute; 2020
  4. Finn Bauer et al. CovidCountries; 2020. Data source for government actions
  5. RKI. COVID-19 data set. NPGEO Corona; 2020. Main data source
  6. RKI. Federal state data set. NPGEO Corona; 2020
  7. Interview with Martin Eichner; 2020
  8. BR. Recherche zum Casus Ischgl [Investigation into the Ischgl case]; 2020
  9. Martin Eichner et al. COVIDsim; 2020
  10. DIVI. Momentane Auslastung der Intensivbetten in Deutschland [Current utilization of intensive care beds in Germany]. Deutsche Interdisziplinäre Vereinigung für Intensiv- und Notfallmedizin; 2020
  11. Wikipedia. SI-Modell [SI model]. Wikipedia; 2020
  12. Wikipedia. SIR-Modell [SIR model]. Wikipedia; 2020

Data reference

Reference: Robert Koch-Institut (RKI), dl-de/by-2-0

The data are the “Case Figures in Germany” of the Robert Koch Institute (RKI) and are available under the Data License Germany – Attribution – Version 2.0.


INVENSITY Competencies


Accelerate your development


Let’s make things better

© Copyright 2007 – 2020   |   All Rights Reserved 
