## The CoE Analytics & Sensorics regularly analyses available COVID-19 data to give insights into the possibilities and constraints of data analysis

Established as a think tank for statistics and data analysis, the CoE AS normally enables companies to understand themselves better by analyzing their data (process, metrology, manufacturing, …) to extract hidden information and parameters.

With this project we want to share our daily analysis of the current COVID-19 outbreak and of the impact of these statistics. Everyone should have the opportunity to understand how decisions in politics, society and daily life may have an impact, and why different stakeholders can interpret the same data in such different ways.

Any additional comments, additions and corrections are gratefully received.

### Your contact

Erik Hänel

Head of Analytics & Sensorics

## Model parameters from past data

To predict the future, we use standard models. These rely on data measured during the current pandemic. The most relevant model parameter is the infection rate *β* = ln(2)/*τ*_{2}, which is directly connected to the doubling time *τ*_{2}.

Figures 1 and 2 show logarithmic visualisations of the daily reported and the accumulated COVID-19 cases against the reporting date. Figure 1 shows the daily reported cases and deaths. From March 19th onwards, the growth of newly reported cases slows down, and it slows again after March 29th. This is a clear indicator of the success of the contact prohibition, as we already mentioned in another article.

### Extracting the doubling time

Figure 2 shows the accumulated cases against their respective reporting dates. Here, too, the curve bends from March 19th and again from March 29th onwards. This coincides with the three fitted exponential curves, which intersect each other on March 19th and March 28th (note again the logarithmic scale; figure 3 presents the same graph on a linear scale). At the transition from the grey to the green curve, the doubling time increased to *τ*_{2} = 16.969 days (*β* = 0.041 days^{-1}), so it now takes nearly 17 days for the number of reported infections to double. This is well above the target doubling time of about 13-14 days set by the government and the Robert Koch Institute. For the reasoning behind this target, see the model calculations below.

## Extracting parameters from data

Parameters are the key quantities of a mathematical model and can influence its behavior significantly. The parameters of a linear model are easy to obtain by linear regression, but most models are not linear. For those models, a non-linear least-squares fit yields the best-fitting parameters. An important algorithm for least-squares fitting is Levenberg-Marquardt.
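A minimal sketch of such a fit, assuming SciPy is available: `curve_fit` with `method="lm"` uses a Levenberg-Marquardt implementation to fit an exponential growth curve and derive the doubling time from *β*. The function names and the data are illustrative assumptions; the data are synthetic and are not the reported case numbers.

```python
import numpy as np
from scipy.optimize import curve_fit


def exponential(t, n0, beta):
    """Exponential growth N(t) = n0 * exp(beta * t)."""
    return n0 * np.exp(beta * t)


def fit_doubling_time(days, cases):
    """Fit the exponential model and return (beta, tau2, tau2_err)."""
    # Rough starting values; Levenberg-Marquardt needs an initial guess.
    p0 = (cases[0], 0.1)
    params, cov = curve_fit(exponential, days, cases, p0=p0, method="lm")
    beta = params[1]
    beta_err = np.sqrt(cov[1, 1])
    tau2 = np.log(2) / beta
    # First-order error propagation: d(tau2) / tau2 = d(beta) / beta.
    tau2_err = tau2 * beta_err / beta
    return beta, tau2, tau2_err


# Synthetic example data (NOT real case numbers): growth with
# beta = 0.041 per day plus 1 % multiplicative noise.
rng = np.random.default_rng(seed=0)
days = np.arange(21, dtype=float)
noise = 1.0 + 0.01 * rng.standard_normal(days.size)
cases = 1000.0 * np.exp(0.041 * days) * noise

beta, tau2, tau2_err = fit_doubling_time(days, cases)
print(f"beta = {beta:.4f} per day, tau2 = {tau2:.1f} +/- {tau2_err:.1f} days")
```

Fitting separate date ranges in this way would give one exponential curve per range, similar in spirit to the three curves in figure 2.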

## Predicting the future

It should be obvious that the future of a pandemic cannot be predicted exactly: it is far too complex a system and depends on too many influencing variables. Among other things, government, the existing health care system, general hygienic conditions and wealth have a great influence on the spread of an infection. In order to make statements about the behaviour of a system in the future, it is necessary to simplify it to such an extent that the number of influencing factors remains controllable. Models are used for this purpose.

Various models are used in epidemiology, including the very simple SI model (*susceptible-infected model*) [11], the somewhat more advanced SIR model (*susceptible-infected-removed model*) [12] and the SEIR (*susceptible-exposed-infected-removed model*) [13], from which we have picked the latter two for the model calculations here. Both models are strong simplifications of reality, but can still give orientation in dealing with COVID-19.

### The SIR model

The SIR model takes into account the persons removed from the system (by immunity or death). Figure 4 shows the model calculated with the three doubling times from figure 2. You can see the susceptible (grey), infected (blue) and removed, i.e. already immune, persons (green) for the three scenarios. Note that the current doubling time is already higher than the target, so the third and thickest curves are fairly flat and difficult to see.

One interesting result of this model is that each scenario has a maximum in the curve of infected persons. In addition, the model shows how high the final immunity of the society will be (in this idealized case). For example, if the disease spreads rapidly, an immunity of almost 100% is reached, whereas at the current rate of spread we can probably expect an immunity of less than 1% of the population. As a consequence, the current situation will most likely repeat itself with a new wave of COVID-19 if no vaccine is available by then. Flattening the infection curve therefore has its price: we gain time for research but have to live longer with the current restrictions. With the current doubling time of about 15 days, we can expect the number of infectious persons to peak today at about 55,000.
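The SIR equations can be integrated numerically in a few lines. The following is a sketch under stated assumptions, not the calculation behind figure 4: the population size, the initial number of infected persons and the 3-day doubling time of the fast scenario are illustrative values chosen here, and `run_sir` is a hypothetical helper name. Only *γ* = 1/20 days^{-1} and the doubling time of 16.969 days come from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp


def run_sir(beta, gamma, population, infected0, days):
    """Integrate the SIR equations and return (t, S, I, R)."""

    def rhs(t, y):
        s, i, r = y
        new_infections = beta * s * i / population
        recoveries = gamma * i
        return [-new_infections, new_infections - recoveries, recoveries]

    y0 = [population - infected0, infected0, 0.0]
    t_eval = np.arange(days + 1, dtype=float)  # one sample per day
    sol = solve_ivp(rhs, (0.0, float(days)), y0, t_eval=t_eval,
                    rtol=1e-8, atol=1e-6)
    return sol.t, sol.y[0], sol.y[1], sol.y[2]


GAMMA = 1.0 / 20.0          # recovery rate from the text, per day
POPULATION = 83_000_000.0   # assumption: roughly the population of Germany
INFECTED0 = 50_000.0        # assumption: illustrative starting value

# Fast spread (hypothetical 3-day doubling time) vs. the fitted 16.969 days.
for tau2 in (3.0, 16.969):
    beta = np.log(2) / tau2
    t, s, i, r = run_sir(beta, GAMMA, POPULATION, INFECTED0, days=600)
    immune_share = r[-1] / POPULATION
    print(f"tau2 = {tau2:6.3f} d: peak {i.max():,.0f} infected on day "
          f"{t[i.argmax()]:.0f}, removed after 600 d: {immune_share:.1%}")
```

In this sketch the fast scenario shows a pronounced peak and a high final share of removed persons, while the slow scenario stays flat, which mirrors the qualitative behaviour described above. The absolute numbers depend on the assumed starting values.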

### The basic reproduction number as indicator for the spread

With the quotient *R*_{0} = *β* / *γ*, where *γ* is the recovery rate of the disease, which we assume to be 1/20 days^{-1}, the SIR model provides an additional important quantity, the so-called basic reproduction number (also referred to here as the basic reproduction rate). It indicates the average number of people infected by one infected person. With the current data it is approximately 0.80, so one infected person infects on average 0.80 further people. To stop the epidemic from spreading, the basic reproduction rate must be kept below 1 permanently, i.e. less than one additional person is infected per infected person, so that the epidemic comes to a standstill over time. Calculating the doubling time for a basic reproduction number of 1 (linear spread) gives ln(2) / (1 × *γ*) ≈ 13-14 days.

If the value 0.99 is used for the basic reproduction rate in the model equations, the dashed lines in figure 5 are obtained. As expected, neither the infections nor the removed (immune) persons increase noticeably. The numerical values confirm this picture: the number of infections decreases over time (albeit very slowly), while the number of immune persons slowly increases. After 600 days, 31,000 infected and 140,000 immune persons remain.
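The arithmetic behind these two numbers can be checked with a short sketch that uses only the definitions from the text, *β* = ln(2)/*τ*_{2} and *R*_{0} = *β*/*γ*. The function names are illustrative.

```python
import math

GAMMA = 1.0 / 20.0  # recovery rate assumed in the text, per day


def infection_rate(tau2):
    """Infection rate beta (per day) for a doubling time tau2 (days)."""
    return math.log(2) / tau2


def basic_reproduction_number(tau2, gamma=GAMMA):
    """R0 = beta / gamma for a given doubling time."""
    return infection_rate(tau2) / gamma


def critical_doubling_time(gamma=GAMMA):
    """Doubling time at which R0 equals exactly 1."""
    return math.log(2) / gamma


r0_current = basic_reproduction_number(16.969)
tau2_target = critical_doubling_time()
print(f"R0 at 16.969 days doubling time: {r0_current:.2f}")
print(f"Doubling time for R0 = 1: {tau2_target:.1f} days")
```

With the fitted doubling time this gives a value slightly above 0.8, consistent with the rounded figure quoted above, and a critical doubling time just under 14 days.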

### The SEIR model considers the delay in infectiousness

The SEIR model takes into account that an infected individual is not immediately infectious and therefore, despite being infected, cannot immediately infect other people. It is thus closer to reality than the SIR model. Using the calculated parameters in this model yields the graph in figure 5, with the susceptible (grey), exposed (red), infectious (blue) and removed (green) persons. The course is very similar to that of the SIR model. The central difference is the curve of infectious persons, which in this model is automatically flattened somewhat by the time delay. Another difference is that the targeted share of the society that becomes infected (see the dashed lines) is marginally higher: in this case, about 8,000 exposed, 46,000 infectious and 224,000 immune persons remain after 600 days. Using the current numbers, we may expect a maximum of 89,200 infectious persons in about 12 days. However, this estimate depends strongly on the number of exposed people, which is unknown and therefore guessed.
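A comparable sketch for the SEIR equations is shown below. The text does not state the rate at which exposed persons become infectious, so `SIGMA` (a latency of 5 days) is purely an assumption, as are the population size, the starting values and the 3-day doubling time; `run_seir` is a hypothetical helper name.

```python
import numpy as np
from scipy.integrate import solve_ivp


def run_seir(beta, sigma, gamma, population, exposed0, infectious0, days):
    """Integrate the SEIR equations and return (t, S, E, I, R)."""

    def rhs(t, y):
        s, e, i, r = y
        new_exposed = beta * s * i / population  # S -> E
        new_infectious = sigma * e               # E -> I after the latency
        removals = gamma * i                     # I -> R
        return [-new_exposed,
                new_exposed - new_infectious,
                new_infectious - removals,
                removals]

    y0 = [population - exposed0 - infectious0, exposed0, infectious0, 0.0]
    t_eval = np.arange(days + 1, dtype=float)  # one sample per day
    sol = solve_ivp(rhs, (0.0, float(days)), y0, t_eval=t_eval,
                    rtol=1e-8, atol=1e-6)
    return (sol.t, *sol.y)


GAMMA = 1.0 / 20.0          # recovery rate from the text, per day
SIGMA = 1.0 / 5.0           # assumption: 5 days from exposure to infectiousness
POPULATION = 83_000_000.0   # assumption: roughly the population of Germany

beta = np.log(2) / 3.0      # assumption: hypothetical 3-day doubling time
t, s, e, i, r = run_seir(beta, SIGMA, GAMMA, POPULATION,
                         exposed0=20_000.0, infectious0=50_000.0, days=600)
print(f"peak of {i.max():,.0f} infectious persons on day {t[i.argmax()]:.0f}")
```

Changing `exposed0` in this sketch shifts the timing and height of the peak, which illustrates why the estimate depends so strongly on the guessed number of exposed people.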

### Models can fail

Models are based on simplifications and idealizations. No computer has the computing capacity to calculate a real system. Of course, it is also just as impossible to capture all influencing variables at a certain point in time. Therefore, a model also describes an idealized system which, in the best case, approximates reality. Nevertheless, deviations and completely different outcomes are possible in such complex systems as pandemics (a good example of such unexpected behaviors is weather forecasting). In order to stay as close as possible to reality, models must be repeatedly calculated with updated parameters (weather models are calculated several times a day). And last but not least, a model is only as good as the previously determined parameters. If the data source is insufficient, even the best model cannot provide a good forecast. See also our analysis of data sources in another article.

## Deriving the growth rate

The growth rate of COVID-19 can be described by the basic reproduction rate. This rate contains the information about how many more people are infected by an infectious person on average. With a number greater than 1, the growth is exponential, with 1 it is linear, and with less than 1 the growth is inverse-exponential, i.e. exponentially decreasing.

Example of a wave of 15 generations of infected persons (15 infection cycles):

- *R*_{0} = 1.5: 1.5^{15} ≈ 438 (times the number of initial infections)
- *R*_{0} = 1: 1^{15} = 1 (times the number of initial infections)
- *R*_{0} = 0.5: 0.5^{15} ≈ 0.000031 (times the number of initial infections)
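The factors in this example follow from raising *R*_{0} to the power of the number of generations. A minimal sketch, with an illustrative function name:

```python
def relative_wave_size(r0, generations=15):
    """Size of the last generation relative to the initial infections."""
    return r0 ** generations


for r0 in (1.5, 1.0, 0.5):
    factor = relative_wave_size(r0)
    print(f"R0 = {r0}: factor {factor:.6g} after 15 generations")
```

The same function can be used to try other values, for example an *R*_{0} only slightly below 1, where the decline over 15 generations is much slower.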

## COVID-19 Key Facts

- Current doubling time: 16.969±1.243 days
- After March 29th the doubling time increased again, which is a strong indicator of the success of the government actions
- Parameters can be extracted from the data and used to calculate models that predict the future behavior
- The basic reproduction rate currently is *R*_{0} ≈ 0.80, i.e. an infectious person infects less than one additional person: the number of infectious persons is currently decreasing

Figure 1: (Logarithmic scale) Number of new cases and new deaths displayed vs. the reporting date. One can see that the growth of the number of new cases per day slows down after the 19th of March. The difference between the data ranges of March 23rd to March 29th and March 30th to April 5th is much smaller than between comparable data ranges before.

Figure 2: (Logarithmic scale) Accumulated COVID-19 cases and deaths displayed vs. the reporting date. The green, grey and black curves are the best fits for the exponential growth. Note that the doubling time of the green curve is 3 days larger than that of the grey one, which is again 3 days larger than that of the black curve. We also added the major government actions [4], which influence the behavior of the curve. Additionally, we show an estimate of the number of recoveries.

(*) The estimation is based upon the WHO report on the Hubei case [1], which gives us rough numbers on the recovery time.

Figure 3: (Linear scale) Accumulated COVID-19 cases and deaths using a linear scale for comparison. Note that the number of deaths is small compared to the number of infections. Additionally, we show an estimate of the number of recoveries.

(*) The estimation is based upon the WHO report on the Hubei case [1], which gives us rough numbers on the recovery time.

Figure 4: Forecasts of the future behavior using the SIR model [12] (*susceptible-infected-removed model*) with the findings from the current data. The different line widths represent the three doubling times from figure 2 (thin to thick); the dashed lines represent the target behavior using a basic reproduction number of 0.99. The different quantities represent the **S**usceptible, the **I**nfected and the **R**emoved (i.e. immune or deceased) population in the three scenarios.

Figure 5: Forecasts of the future behavior using the SEIR model [13] (*susceptible-exposed-infected-removed model*) with the findings from the current data. The different line widths represent the three doubling times from figure 2 (thin to thick); the dashed lines represent the target behavior with a basic reproduction number of 0.99. This model considers a disease in which an infected person is not immediately infectious.

## Read more about data analysis applied to COVID-19 data

#### COVID-19: Basic reproduction rate as spread estimation

Using the data from April 15th, we can confirm that the spread of COVID-19 is currently slowing down fairly rapidly (R = 0.89).

#### COVID-19: Simulating current and target behavior

We compare the current and the target (as required by the government) spread of COVID-19 using the epidemiological models from the previous article, fed with data provided by the Robert Koch Institute.

#### COVID-19: Calculation of the future behavior using daily data

Different epidemiological models are used to describe the behavior of COVID-19. In this article, we explain their usage and how their parameters are obtained.

#### COVID-19: Success of the contact prohibition

In this article we present evidence that the contact ban ordered by the government is beginning to achieve first successes. We also discuss the different data sources and summarize the development of the last month.



© Copyright 2007 – 2020 | All Rights Reserved
