COVID-19 Modelling


Modelling the spread of a novel infectious agent can be a complex process, but most mathematical models of infection start with a few broad assumptions and follow similar principles. The aim of nearly all models is to help understand how many people are going to be infected and how severe the infection might be, including the number of people who might die from the disease. Some models also consider surges in demand for highly specialised intensive care units (ICUs) to help with healthcare planning.

There cannot be a single model for the spread of the COVID-19 outbreak, as the variables considered in each model are specific to each population. News articles that consider the models used in different countries will be available in the panel to the right, as part of a new series on how national governments are approaching non-pharmaceutical interventions (NPIs) to control the spread of COVID-19.

Why model?

Models are created to help planners and governments understand the impact of different measures on controlling the spread of infection with the eventual aim of bringing the outbreak to an end. Models help to inform which measures a country might adopt and the likely outcome of different NPIs on the spread of the disease.

These NPIs include measures such as border closures, restrictions on inbound and outbound travel, isolation of society’s most vulnerable, school closures, shuttering of businesses where people gather, directives to work remotely, and guidance on whether the general population should wear masks or whether these should be prioritised for healthcare workers.

As we have seen in the numerous countries that have enacted NPIs, many are designed to reduce contact between people and to educate the population in how to reduce the spread of the disease through hygiene measures (cough etiquette/hand washing/use of hand sanitiser/etc.), maintaining a distance from others when in public and reducing the opportunities for contact between people wherever possible. Most public health interventions like these are designed to reduce the infection rate, which is also known as R0 (pronounced R-nought).

Much has been said about "flattening the curve", or spreading out the number of infections over time so that a country's limited intensive care resources are not overwhelmed by a surge in demand. Such a surge happens when many people become acutely unwell and need intensive care at the same time. By using measures to slow the rate of new infections, the peak demand can be reduced to a more manageable level. This is true even if the total number of ICU beds needed is the same, because the demand is spread out over a longer period of time.
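The effect of spreading the same demand over a longer period can be sketched in a few lines of Python. This is only an illustration: the bell-shaped curve, the function name daily_icu_admissions and every number below are assumptions chosen for the example, not values taken from any real outbreak.

```python
import math


def daily_icu_admissions(total, peak_day, spread_days, n_days):
    """Spread `total` admissions over `n_days` as a bell-shaped curve."""
    weights = [
        math.exp(-0.5 * ((day - peak_day) / spread_days) ** 2)
        for day in range(n_days)
    ]
    scale = total / sum(weights)  # rescale so the curve sums to `total`
    return [w * scale for w in weights]


TOTAL_ADMISSIONS = 10_000  # hypothetical figure, for illustration only

# Same total, but the second curve is three times as spread out.
unmitigated = daily_icu_admissions(TOTAL_ADMISSIONS, peak_day=150, spread_days=10, n_days=300)
flattened = daily_icu_admissions(TOTAL_ADMISSIONS, peak_day=150, spread_days=30, n_days=300)

for label, curve in (("unmitigated", unmitigated), ("flattened", flattened)):
    print(f"{label}: total = {sum(curve):.0f}, peak per day = {max(curve):.0f}")
```

Both curves add up to the same total, but the flattened curve peaks at roughly a third of the daily admissions of the unmitigated one.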

Other interventions, such as improved treatment options (ventilator support for severe cases/trials of new or existing drugs/etc.), are designed to reduce the case fatality rate.

The start of the model

In the first step, the population at risk of infection is divided into three or four parts. This is known as the SIR or SEIR model, with each letter representing a stage in the progression of the infection through the population:

  1. Susceptible = people at risk of infection.
    1. With new infectious diseases that have never been encountered before, this is taken to be everyone within the population. No one has been infected previously, so the assumption is that no one has any immunity against the virus.
  2. Exposed = people who have been exposed to the infection but are not yet able to infect others. This relates to the incubation period where people have been infected but are not yet able to pass it to others.
    1. Not all models include this step and therefore people once infected may go directly to being infectious.
  3. Infectious = people with the infection that can pass it on to others, often unwittingly, and therefore spread the disease to more people within the population.
    1. Not all infectious people are symptomatic and there’s evidence that the virus which causes COVID-19 can spread from asymptomatic people, even if this is not a major source of spread of the disease.
  4. Removed = people who have recovered from the infection, have no symptoms and are thought to no longer be infectious. The removed group also includes people who have died from the infection or its complications, from whom the infection can no longer be transmitted to other people.
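The four groups above can be turned into a very small simulation. The sketch below is one common textbook formulation, not the method used by any particular government model. The parameter names beta (transmission rate), sigma (rate of moving from exposed to infectious) and gamma (rate of removal) are conventional symbols introduced here, and the values are hypothetical and not fitted to COVID-19.

```python
def simulate_seir(population, initial_infectious, beta, sigma, gamma, days, dt=0.1):
    """Step a basic SEIR model forward in time and return one state per day.

    Each state is a tuple (susceptible, exposed, infectious, removed).
    """
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    history = [(s, e, i, r)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / population * dt  # S -> E
            new_infectious = sigma * e * dt               # E -> I
            new_removed = gamma * i * dt                  # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical inputs: 1 million people, 10 initial infectious cases,
# a 5-day average exposed period and a 7-day average infectious period.
history = simulate_seir(
    population=1_000_000,
    initial_infectious=10,
    beta=0.5,
    sigma=1 / 5,
    gamma=1 / 7,
    days=300,
)

peak_day = max(range(len(history)), key=lambda day: history[day][2])
print(f"Peak of infectious people on day {peak_day}: {history[peak_day][2]:.0f}")
```

Setting sigma very high makes the exposed stage negligibly short, which approximates the simpler SIR model in which people go directly from susceptible to infectious.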

All models start with the data that is available. It is important to note that this data will be specific to the population it has been taken from. Characteristics such as age, sex, ethnicity and other medical conditions all depend on the population being considered. There will also be differences in what data is collected, what is prioritised, how timely it is and what is left out.

Models also consider the "dynamics of transmission" which includes how easily the virus spreads between people based on their social interactions and the mode of spread of infection (cough/touch/airborne/water/etc.). Some populations are more isolated and have fewer interactions on average than others. This may be true even within a single country with rural communities often less connected than more densely populated areas such as larger towns and cities.


There are many variables, and the detail collected on each of them is imperfect, especially at the start of a new outbreak. There may be differences in the severity of symptoms experienced by different people, which is in turn influenced by their age, sex, ethnic and genetic background, pre-existing medical conditions and other factors.

Models can begin from what appear to be similar starting points and end up with very different recommendations as there may be significant variability in how data has been recorded and it can be challenging to work with so much uncertainty.

Different models can arrive at very different outputs with only minor changes to the starting variables and assumptions made by the modellers.

For example, these are some key values to be put into the model:

  1. The Infection Rate, also known as the basic reproduction number or R0 (pronounced "R-nought")
    1. This relates to the number of new infections each infected person goes on to cause, if the exposed population is fully susceptible.
    2. If the R0 is greater than 1, the infection will likely keep going, and if less than 1 it will probably die out.
  2. The Case Fatality Rate or CFR is taken from the number of deaths divided by the number of known infected people.
    1. This is highly dependent on changes in the number of infected people being counted.
  3. How many people are at risk of (or susceptible to) being infected. Is there anyone with immunity to the disease?
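The threshold at R0 = 1 can be illustrated with a short calculation. The sketch below assumes, as a simplification, that the population stays fully susceptible and that each "generation" of cases infects the next in one step. The function name and the R0 values are hypothetical.

```python
def cases_per_generation(r0, generations, initial_cases=1):
    """Return the number of new cases in each successive generation of infection."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * r0)  # each case causes r0 new cases on average
    return cases


growing = cases_per_generation(r0=2.5, generations=4)   # R0 greater than 1
shrinking = cases_per_generation(r0=0.8, generations=4)  # R0 less than 1

print("R0 = 2.5:", [round(c, 2) for c in growing])
print("R0 = 0.8:", [round(c, 2) for c in shrinking])
```

With R0 above 1 each generation is larger than the last; below 1 each generation is smaller and the outbreak dwindles.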

The infection rate is itself a combination of other factors, which may vary widely between different populations and locations. It includes:

  1. The rate of contact between infected and susceptible individuals
    1. How many people does an infected person come into contact with?
    2. For how long are they in contact?
  2. The rate of transmission
    1. How many people become infected after contact with an infected person?
  3. What proportion of people become infected after contact with an asymptomatic case?
  4. For how long does a person remain infectious?
  5. For how long does the infection remain able to infect others in the environment?
    1. Does it remain suspended in the air?
    2. Or is it limited to larger droplets which fall to earth within a short time?
    3. How long can it survive on surfaces and go on to infect another person?
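One common textbook simplification combines the first factors above into a single product: contacts per day, multiplied by the chance of transmission per contact, multiplied by the number of days a person remains infectious. The sketch below uses that simplification. It ignores asymptomatic spread and environmental survival, and all names and values are hypothetical.

```python
def basic_reproduction_number(contacts_per_day, transmission_probability, infectious_days):
    """Estimate R0 as contacts per day x probability per contact x days infectious."""
    return contacts_per_day * transmission_probability * infectious_days


# Hypothetical baseline: 10 contacts a day, 5% chance of transmission
# per contact, and 5 days of infectiousness.
baseline = basic_reproduction_number(10, 0.05, 5)

# Halving daily contacts (for example through distancing) halves the estimate.
with_distancing = basic_reproduction_number(5, 0.05, 5)

print(f"baseline R0 estimate: {baseline:.2f}")
print(f"with half the contacts: {with_distancing:.2f}")
```

Because the estimate is a simple product, a change in any one factor changes the result in proportion, which is why measures that cut contact or shorten the infectious period in the community both lower the infection rate.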

As mentioned earlier, the case fatality rate (CFR), or how many infected people die from the disease, is often only properly understood once the outbreak has progressed beyond the first few weeks or months, and it may not be fully known until the outbreak is contained or over. It may also vary widely between age groups within a population or between people with different ethnic or genetic backgrounds.

As a result, the CFR often varies over time as more people with less severe illness are counted and the number of deaths is therefore divided by a larger population of infected people. There is no single CFR for all populations. It is also important to note that several periods of time elapse between infection and the reporting of a death from COVID-19:

  • Incubation period - how long after exposure until a person develops the disease
  • Infective period - how long is the person able to transmit the infection to others
  • Symptomatic period - how long does the person have symptoms before
  • Period of hospital care? 
  • Period of ICU care?
  • Time of death
  • Time taken to record death
  • Time taken to report death

As a result, the rate of deaths from COVID-19 may lag behind the number of cases by a week or more and may continue to climb for weeks after the peak of cases.
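The lag between cases and deaths can be sketched by shifting a series of daily cases forward in time. The example below assumes a single fixed delay and a fixed fraction of cases that die, which is a deliberate simplification: in reality the delay varies from person to person. The case counts, the 14-day delay and the 1% fraction are hypothetical.

```python
def expected_deaths(daily_cases, fatality_fraction, delay_days):
    """Shift daily cases forward by a fixed delay and scale by the fraction who die."""
    deaths = [0.0] * (len(daily_cases) + delay_days)
    for day, cases in enumerate(daily_cases):
        deaths[day + delay_days] += cases * fatality_fraction
    return deaths


# Hypothetical daily case counts that rise to a peak and then fall.
daily_cases = [10, 20, 40, 80, 160, 80, 40, 20, 10]
DELAY_DAYS = 14  # assumed fixed delay from case report to death report

deaths = expected_deaths(daily_cases, fatality_fraction=0.01, delay_days=DELAY_DAYS)

case_peak = daily_cases.index(max(daily_cases))
death_peak = deaths.index(max(deaths))
print(f"cases peak on day {case_peak}, deaths peak on day {death_peak}")
```

In this sketch reported deaths are still rising while reported cases are already falling, which is the pattern described above.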

Differences in data gathering can also have a dramatic impact on the reported case fatality rate. If, for example, a country records only confirmed tests in people with severe infections, the CFR will appear to be higher than would be the case if the same country had tested and found more infected people with less severe disease.

The case fatality rate is also affected by the healthcare capacity of a country and by its ability to assign the cause of death accurately and in a timely fashion. In settings with limited healthcare resources, it may not be possible to test every case, so the cause of death may not be attributed to the new disease, resulting in a lower CFR than might be expected. Similarly, if testing is prioritised for only the severely ill, the CFR may be falsely elevated.
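The arithmetic behind these distortions is simple division, as the sketch below shows. The scenarios and all of the counts are hypothetical and are chosen only to show how the same outbreak can produce very different reported figures.

```python
def case_fatality_rate(deaths, confirmed_cases):
    """Return deaths divided by the number of known infected people."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


# The same hypothetical outbreak, counted in three different ways.
severe_only = case_fatality_rate(deaths=100, confirmed_cases=1_000)       # only severe cases tested
broad_testing = case_fatality_rate(deaths=100, confirmed_cases=10_000)    # milder cases also found
missed_deaths = case_fatality_rate(deaths=50, confirmed_cases=10_000)     # half of deaths not attributed

for label, cfr in (
    ("severe cases only", severe_only),
    ("broad testing", broad_testing),
    ("deaths under-attributed", missed_deaths),
):
    print(f"{label}: CFR = {cfr:.1%}")
```

Testing only the severely ill gives 10%, broad testing gives 1%, and failing to attribute half of the deaths gives 0.5%, even though the underlying outbreak is identical in all three cases.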

Early on in any outbreak, the ability of a healthcare system to identify cases of a new disease is limited, so only those with more serious infections and more severe symptoms are picked up. At the same time, these severe cases are more likely to die than people with less serious illness.

In the case of COVID-19, this was especially true as it took time to isolate the virus, grow it in a lab and develop a sensitive and specific test to identify who was infected. Issues with the production of test kits and overly centralised testing may have limited some countries’ ability to know how far and how fast the infection was spreading in their locations. Both the US and the UK were found to have limited testing capacity in the early days of their outbreaks. This has only partially been addressed in the months since the first imported cases were identified.

The decisions taken by governments equipped with imperfect information will be subject to scrutiny in the future but now is almost certainly not the best time for recriminations.