
Coronavirus model update: Background, assumptions, and room for improvement

Julien Riou, coauthor of one of the models we discussed here, writes:

Here is an overview of the current state of the project, so that everyone can quickly grasp where there is room for improvement.

Background on the epidemic: COVID-19 just passed 100,000 confirmed cases all over the world, and is expected to continue to spread. Without strong control measures or a lucky turn of events (such as a lower transmission during summer months), a significant portion of humanity is going to get infected in the next few months (Marc Lipsitch from Harvard, one of the most respected scientists of the field, suggests that 20 to 50% of all adults may become infected and pretty much everyone in the field agrees). Some countries implemented strong control measures (China, Singapore) and are actually controlling the epidemic, so there is still hope to win enough time to develop a vaccine, which would be the best scenario.

Background on mortality: Most reports about the mortality (including WHO) only consider the crude case fatality ratio (CFR), i.e. the number of deaths observed divided by the number of cases observed until a given date.

This estimate is biased in two ways:

– the number of deaths is underestimated because of the delay between disease onset and death (up to 2 months)

– the number of cases is underestimated because of asymptomatics and mild cases.
WHO says 3.6%, and there are other estimates using other denominators, but the question remains unclear. This is problematic as politicians and public health authorities need to decide on strong control measures, and the debate keeps getting polluted by contradictory claims (from “it’s just a flu” to panic-inducing claims).
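
To make the first bias concrete, here is a minimal Python sketch comparing the crude CFR with one common style of delay adjustment, in which deaths are divided only by the cases whose outcome could already have been observed. This is not the method of the model discussed here; the function names, the delay distribution and all numbers are hypothetical and chosen purely for illustration. The second bias (unobserved mild and asymptomatic cases) pushes in the opposite direction and is not addressed by this sketch.

```python
def crude_cfr(deaths, cases):
    """Crude case fatality ratio: observed deaths over observed cases."""
    return deaths / cases


def delay_adjusted_cfr(deaths, daily_cases, death_delay_cdf):
    """Divide deaths by the cases whose outcome could already have been observed.

    daily_cases[k] is the number of cases with onset on day k (last entry = today).
    death_delay_cdf[d] is the ASSUMED probability that a fatal case has died
    within d days of onset; longer delays reuse the last value of the list.
    """
    today = len(daily_cases) - 1
    resolved = 0.0
    for onset_day, n_cases in enumerate(daily_cases):
        elapsed = today - onset_day
        resolved += n_cases * death_delay_cdf[min(elapsed, len(death_delay_cdf) - 1)]
    return deaths / resolved


# Hypothetical numbers, for illustration only (not real data).
daily_cases = [10, 20, 40, 80, 160]
death_delay_cdf = [0.0, 0.1, 0.3, 0.6, 1.0]
deaths = 5

crude = crude_cfr(deaths, sum(daily_cases))
adjusted = delay_adjusted_cfr(deaths, daily_cases, death_delay_cdf)
print(crude, adjusted)
```

During a growing epidemic most cases are recent, so the adjusted denominator is much smaller than the raw case count and the adjusted ratio is correspondingly higher than the crude one.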

Objective: To develop a model to estimate the true mortality, i.e. the proportion of people infected with or without symptoms that will die, stratified by age. The model will be applied to data from China first, then to other countries when it becomes available.

Impact: Obtaining a single, rock-solid estimate of the true mortality would be extremely important to support and direct the efforts of public health authorities.

Data: We started from data released by the Chinese CDC (enclosed), analysing data from the Hubei province (where the epidemic started in December). We got the incidence of confirmed cases by day of disease onset (first day of symptoms, see Fig A below) and the age distribution of cases and deaths in China, which is different from the age distribution of the Chinese population (see Fig B).

This age shift in cases can have several explanations:

1) younger people have prior immunity to COVID-19 (unlikely as it is a new disease)

2) younger people have cross immunity by being infected by other coronaviruses (very few data on that, and why would it only concern young people)

3) younger people have less contact with the potential infectors (possible but partial explanation, see Fig 1C)

4) younger people are infected at the same rate but have fewer symptoms, do not seek care and are not identified (more likely)

Analysis plan: In this analysis, we want to consider only 3) and 4), and try to estimate the total size of the population. We also account for the asymptomatics and for the delay in mortality using external data. For that we need to do some kind of poststratification, backed by an epidemic model that generates infections by date of disease onset, which is important when accounting for the delay. All code and data are in

Current state: We published preliminary results in a preprint (here), because we were comfortable with the results at this point, and wanted to put the idea out. But there is still room for improvement. In this version (model10) we did not consider the different contacts by age class. Since then we added the contact structure, and also an alternative source for the symptomatic rate (with uncertainty). This is model12 on github. Enclosed with this email (supplementary.pdf) you will find a better description of model12 (still a bit rough at this point). I don’t have the results of model12 yet (almost finished, it took 3 days on the cluster at the University of Bern).


Bottlenecks and room for improvement: The model relies on many assumptions listed below. It would be good to be able to relax some of these assumptions, or find more data or ideas to circumvent them. There are also potential improvements, also listed.

1) We assume that there is no prior immunity to COVID-19 due to cross reaction to other coronaviruses. I did not find any good data on that, but an alternative strategy may be to estimate what kind of prior immunity would explain the age shift, and see if it is conceivable.

2) We assume that about 80% of cases are symptomatic (comes from Bi et al, enclosed), without any trend by age. Only symptomatic people can be recognised as cases (with an age-dependent reporting rate) and can die (with an age-dependent mortality). It means that we assume that the probability of “no symptoms at all” is the same in each age group, but that younger individuals who do have symptoms have milder ones, which leads to less reporting and fewer deaths. Better data on that would be good.

3) In order to identify the model we need to set the reporting rate for the older group to 100% (like an upper bound). This is not too far fetched (as we can imagine that in this context, old people with fever or cough will get tested very quickly), but still an assumption.

4) Mortality is also higher in people with comorbidities (same Chinese CDC paper):

It would be nice to stratify by comorbidity like we did for the age groups, but I worry about computation time and identifiability. And we would need the proper data in China.

5) We include a time-dependent parameter for transmission (to show the decrease in transmission after the control measures), but mortality is not time-dependent in the model. Nor do we include the saturation of the hospitals, which could lead to increased mortality at the peak of the epidemic. These are obvious limitations, but I’m not sure if we can address them here.

6) It would be good to apply the model to other contexts (South Korea, Japan, Iran, Italy), but I didn’t find the appropriate data yet (time series of cases and deaths + age distributions). We could also apply our age-specific estimates to other age distributions in countries around the world.
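
The last idea in point 6, applying age-specific estimates to another country's age distribution, amounts to a population-weighted average. A minimal Python sketch follows; the function name and every number in it are hypothetical placeholders, not estimates from the model, and it assumes for simplicity that all age groups are infected at the same rate.

```python
def overall_mortality(mortality_by_age, population_by_age):
    """Population-weighted average of age-specific mortality among the infected.

    ASSUMPTION: every age group has the same probability of being infected,
    so the age mix of infections equals the age mix of the population.
    """
    if len(mortality_by_age) != len(population_by_age):
        raise ValueError("one mortality value is needed per age group")
    total = sum(population_by_age)
    expected_deaths = sum(m * n for m, n in zip(mortality_by_age, population_by_age))
    return expected_deaths / total


# Hypothetical age-specific mortality (young, middle, old): NOT model output.
mortality_by_age = [0.0005, 0.005, 0.05]

# Two hypothetical populations (in millions) with different age structures.
younger_country = [50, 40, 10]
older_country = [30, 40, 30]

print(overall_mortality(mortality_by_age, younger_country))
print(overall_mortality(mortality_by_age, older_country))
```

The same age-specific estimates give a noticeably higher overall figure in the older population, which is why a single unstratified number does not transfer well between countries.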

Again, the data, code, and research paper by Riou et al. are all posted, so anyone can get involved here.


  1. Some thoughts. The differential equation model may not be the best thing from a computational perspective. I’m not sure. Converting it to a discrete time model in days may lead to substantially simpler calculations, without real loss of accuracy. Differential equation solvers do a lot of work to take small step sizes and control their error to a tolerance of 1 part in 1e5 or whatever… who cares, if you get 3% errors it really doesn’t matter.

    I’m not sure if that’s a real issue or not, but it might be worthwhile to try implementing the model as discrete time with 1 day timesteps and see if it can be fit substantially faster, thereby letting you iterate more on the form of the model without having to wait 3 days for your cluster to crank through it.
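
As a rough illustration of the discrete-time idea in this comment, here is a minimal Python sketch of a generic SEIR-type model advanced in 1-day steps. It is not the model from the preprint: the compartment structure, the function name `seir_daily` and all parameter values are assumptions made for the example.

```python
def seir_daily(beta, sigma, gamma, population, initial_infectious, n_days):
    """Advance a generic SEIR model in 1-day steps (no ODE solver).

    beta:  transmission rate per day
    sigma: daily probability of moving from exposed to infectious (<= 1)
    gamma: daily probability of leaving the infectious compartment (<= 1)
    Returns the daily number of new infectious cases and the final state.
    """
    s = float(population - initial_infectious)
    e = 0.0
    i = float(initial_infectious)
    r = 0.0
    new_onsets = []
    for _ in range(n_days):
        # Clamp so that a large beta can never make S negative.
        new_exposed = min(s, beta * s * i / population)
        new_infectious = sigma * e
        new_removed = gamma * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_removed
        r += new_removed
        new_onsets.append(new_infectious)
    return new_onsets, (s, e, i, r)


# Hypothetical parameter values, for illustration only.
onsets, final_state = seir_daily(
    beta=0.5, sigma=0.2, gamma=0.2,
    population=1_000_000, initial_infectious=10, n_days=200,
)
print(max(onsets), final_state)
```

Each day costs a handful of arithmetic operations, so a fit that evaluates the model many times would avoid the solver's adaptive step-size work, at the price of a fixed discretisation error.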

    • Julien Riou says:

      Thank you for the feedback. I received some advice from Ben Bales and Charles Margossian that led to a huge reduction of computation time (from 3 days to 2h, unbelievable), so I don’t think I need to ditch the ODE solver just yet. You talk about 3% errors, but we’re dealing with a few thousand deaths in a total population of 59 million, so there is still a need for precision!

      • Keith O’Rourke says:

        Thanks for pointing this out “still hope to win enough time to develop a vaccine”.

        Not something to count on but something to hope for. In the same way your 3 days to 2h was enabled by a couple of experts, that’s likely also happening in the vaccine development.

      • Glad you found a way to speed it up, I guessed it should be possible to make a massive reduction, glad someone with expertise was able to help.

        I still think 3% errors are meaningless but I didn’t mean 3 percentage points I meant if the correct solution to your ode given the parameters was say 1000 cases and your actual numerical solution was 1030 it wouldn’t materially affect the inference on the parameters. maybe I’m wrong but often ODE solvers would work hard to make sure the solution was say 1000.035 and I don’t think .035 people is meaningful.

        Keith, another reason to delay as much as possible is that this kind of virus mutates quickly and will have selection pressure for less severe forms if we do things right. by April or May severity could be 10 or 100x less

    • David Kimpton says:

      Why isn’t population density and proximity to trade routes factored into this model?

      Also, do you compare real mortality statistics as they become available with the stats predicted by the model to get a feel for model accuracy?

  2. Giacomo Petrillo says:

    Some Italian data: (I don’t know if that is sufficient for the analysis, maybe they already know about it)

  3. Two thoughts:

    * Are there no data on distribution of length of incubation/infectious period? (Viral load over time could be a proxy for the latter?) You could consider a SEEIIR or SEEEIIIR model, I think it is very impactful in flu models.
    * I don’t see the contact matrix you used in the preprint. Whatever it is, it could be scaled down over time (e.g. at discrete timepoints related to introduction of quarantine)? Scaling could be a parameter to estimate (with informative prior, even based on previous epidemics)?

    Both of these should not affect computation time too much and could be added directly into the Stan model that you have on GitHub, I think.

    • Julien Riou says:

      Thanks for your interest.

      * About using more complex distributions than exponential (as you suggested by adding more compartments, resulting in gamma distributions), we tried it in earlier versions of the model and it did not change the fit significantly. Since the main objective of this model is to link incident cases and incident deaths, and that the model fit to incidence data was good, I didn’t see a clear reason to add this complexity.

      * There was no contact matrix in the preprint, it was one of the things I added afterwards. On the other hand, the progressive decline of transmission after the introduction of the control measures was included in the preprint with a downward logistic function, and as you suggest the parameters of this decrease (delay and slope) were estimated from data.
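
For readers unfamiliar with the "downward logistic function" mentioned in this reply, here is one possible form as a Python sketch. The exact parameterisation used in the preprint is not given in this thread, so the formula, the function name and the extra `reduction` parameter are assumptions; only `delay` and `slope` are named in the reply above.

```python
import math


def transmission_rate(t, beta0, reduction, delay, slope, t_control):
    """Transmission rate that declines smoothly after control measures.

    ASSUMED form: beta(t) = beta0 * (1 - reduction * logistic(x)),
    with x = slope * (t - t_control - delay).
    The logistic is written with tanh so that it cannot overflow.
    """
    x = slope * (t - t_control - delay)
    logistic = 0.5 * (1.0 + math.tanh(0.5 * x))
    return beta0 * (1.0 - reduction * logistic)


# Hypothetical values: measures on day 15, midpoint of the decline 5 days later.
for day in (0, 15, 20, 25, 40):
    print(day, transmission_rate(day, beta0=1.0, reduction=0.8,
                                 delay=5.0, slope=1.0, t_control=15.0))
```

Well before the measures the rate stays close to `beta0`, at the midpoint it has fallen by half of `reduction`, and long afterwards it levels off near `beta0 * (1 - reduction)`.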

  4. Anoneuoid says:

    The way the comorbidity correlation is being treated is strange to me due to the obvious increase in disease with age. How much more likely are > 60 people with eg hypertension to have a severe response than those without?

  5. Howard says:

    I have conducted a preliminary analysis of the Diamond Princess data from the quarantine of the ship in Japan. This is still a work in progress, but you can see the code and the report on github:


    Key takeaways:
    1. Mortality is highly skewed towards elderly population.
    2. All ages are at risk of infection
    3. Asymptomatic proportion of cases is high, so there are going to be lots of people walking around with Corona and not knowing it.

    For a summary of results, go to page 36 of the PDF. This is a “naive” analysis, in that future expected deaths are ignored. However, the analysis is still, in my opinion, very useful in its own right and sets a floor for mortality rates.

    I use a Greta model. I also use a latent GP layer to induce correlation among the various age categories and smooth out the rates. (for Stan users, you will see how relatively straightforward it is to set up a fairly complex model in Greta).

  6. Thomas Speidel says:

    Given that many of the cases are still in the at risk set, wouldn’t it be more appropriate framing this as a censored/survival problem?

  7. somebody says:

    I feel like the case fatality rate is not exactly what we should be focusing on estimating. For one, getting a meaningful number out of it and applying that number is tricky for reasons pointed out by others; the interpretation is sensitive to the base rate of tests, the current health system capacity, the quality of medical care, the current state of knowledge regarding disease treatment. A more meaningful number which is also easier to estimate seems to be the fraction requiring hospitalization. This seems like it’s less sensitive to inevitable changes and, combined with rate of growth, more immediately actionable for containment protocols.

    • I agree with this, and fatality rate is essentially a subset of the hospitalization rate, and the size of the subset is directly related to the degree to which the hospitals are overwhelmed…

      However, have you actually seen hospitalization data? this is stuff that doesn’t seem to be discussed much.

      • somebody says:

        > the size of the subset is directly related to the degree to which the hospitals are overwhelmed…

        I would impute with “anyone who died outside a hospital needed to be hospitalized.”

        > However, have you actually seen hospitalization data

        No, I haven’t seen any brought to my attention, though it’s possible they’re available somewhere

        • Zhou Fang says:

          What if they had coronavirus and then got hit by a bus?

        • Anoneuoid says:

          Just add up serious and critical:

          Assuming you trust those numbers…

          Also, this was a good idea imo. Basically divide the rate at which COVID patients die by the “natural” rate of mortality in the next year for each age group. There was only one death in the 10-20 yr old group, and zero in 0-9. But besides that it seems pretty close to a constant 2x multiplier:

          • I think that analysis has a dimensions problem. COVID-19 killed those people in ~ 2 weeks compared to all cause mortality for a year. So rather than 2 it should probably be about 52

            • Anoneuoid says:

              I see what you mean. But that seems to imply dying now vs next March is 52x (shouldnt that be 52/2= 26?) worse. Sooner is obviously worse than later, but the y-axis is based on probability of dying at a certain age.

              Also, dying of ARDS may be worse than other ways to go.

              • My point was, if you’re going to do something for 2 weeks, you could do “my normal life” or you could do “get COVID-19”.

                Suppose your risk for a year of “my normal life” is R, then the risk C from 2 weeks of COVID-19 is 2R… so the relative risk rate (per week) is

                (2R/2) / (R/52)

                So 52 times as high from getting COVID-19 as from “my regular life”
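
The unit conversion in this comment can be checked with a few lines of Python. The function name is hypothetical, and the inputs (a risk of 2R accrued over 2 weeks, against R accrued over 52 weeks) are simply the figures used in the comment above.

```python
def weekly_risk_ratio(risk_multiple, illness_weeks, baseline_weeks=52):
    """Ratio of per-week risk during illness to per-week baseline risk.

    risk_multiple: total risk during the illness, in units of the yearly
                   baseline risk R (2 means "2R").
    Equivalent to (risk_multiple * R / illness_weeks) / (R / baseline_weeks);
    R cancels, so it does not need to be known.
    """
    return risk_multiple * baseline_weeks / illness_weeks


print(weekly_risk_ratio(risk_multiple=2, illness_weeks=2))  # 52.0
```

Because R cancels, the result depends only on the 2x multiplier and on the assumed 2-week duration of illness; a longer assumed duration would lower the ratio proportionally.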

              • Anoneuoid says:

                I see now. Itd be good if they followed up a year later and reported all cause mortality rates in the patients by age.

                My understanding is few elderly who undergo mechanical ventilation survive another year (their lungs can’t take it and the alveoli burst or something, but if ventilation is stopped they don’t have enough oxygen in their blood to survive either). Those deaths may not show up here.


          • More Anonymous says:

            That Reddit comparison of COVID-19’s case fatality with the yearly risk of death is revealing, and helps to place the risks of COVID-19 in an intuitive context: The risk of death during a case of COVID-19 is about 1.5-3.5 times the mean yearly risk of death in the general population, and this is true for all studied age groups >20 years old.

            However, some additional observations are relevant to interpreting this correctly:

            * The case fatality is from China (CDC weekly report) and the yearly risk of death is from the US. Case fatality could be different in the US.
            * The case fatality estimates from China do not adjust for incomplete observation of cases, or for incomplete death counts due to limited follow-up times. See Julien Riou’s drafts for more on that.
            * The linked figure should really have confidence intervals, because some of the results at young ages are so uncertain as to be useless. In particular, the case fatalities at ages 0-9 and 10-19 are based on only 0 and 1 deaths, respectively! But the case fatalities in older age groups should be much more statistically reliable.

            Some other comparators could also be useful to place the COVID-19 case fatalities in context:

            * In the US, the highest risk jobs have an occupational risk of death of about 0.1% per year.
            * One of the very highest risk US jobs, Alaskan King Crab Fishing (“The Deadliest Catch” job) has an occupational risk of death of about 0.35% per year.
            * For the US, the cause-specific case fatality in the first year after diagnosis is 17% for all cancers (overall), 51% for lung and bronchus cancers, 41% for stomach cancers, 16% for colon and rectum cancers, 3% for breast cancers, and 2% for prostate cancers, all as based on SEER 9 cases diagnosed in 2011.

            It would be better to have non-yearly risks to compare with COVID-19, but I can’t think of good comparators off the top of my head. Maybe someone else has some good ideas on that.

            • Anoneuoid says:

              I agree with all you say. Also interesting to compare people with common COVID-19 comorbidities, eg table 1 here:

              So 2010 Scottish type 1 diabetics are also ~2x more likely to die for a given age (assuming the baseline is similar to 2015 US). And this doesn’t suffer from the two weeks vs yr problem mentioned by Daniel above.

              It’d be nice to see the same for hypertension but I’m not seeing it out there. For some reason these life tables are created for every possible combination of gender and race but not by chronic illness.

  8. Mark Siddall says:

    You have left out a possible contributor to pathology and mortality. “Immunity to” coronaviruses itself could be the cause of mortality and morbidity. It is well known that after being immunized against SARS, challenge with SARS caused dangerous hyper-responsiveness by the immune system. The corollary is that the younger you are the less you have been exposed to other circulating coronaviruses and the less primed your immune system is to wreak havoc on your lungs. Being immune to something is not always good, ask anyone who is a responder to poison ivy, or who goes anaphylactic from bee stings.

  9. jim says:

    a bit off topic but good Joe Rogan interview w/ Michael Osterholm (UofM, infectious disease expert); amazingly joe is letting the expert speak.

    • Andrew says:


      Rogan should interview Matthew Walker again. Walker’s an expert on epidemics!

      • jim says:

        Ha, I hadn’t seen that! Interesting to see what people actually look like after I’ve dissed their work. He sounds pretty official. Is he an “expert” on epidemics too? :)

        Osterholm smashes several Roganisms during the interview. Joe believed that 180° sauna air would kill microbes in the lungs. He had a really hard time accepting that 180° air in your lungs would be deadly. Rogan also tentatively advanced the idea that COVID-19 is a Chinese bioweapon. What a laugher. File under “no concept of fundamentals”.

        • Of course, sauna air *is* 180F, it’s just that the molecules in the air have a lot less energy in them than water molecules in your lungs at 180F so there will be a strong gradient between the surface of the lungs and say the air at the centroid of the lung which would be 180F.

          this is like when you grab aluminum foil that came fresh out of the oven, you don’t get burned, because you suck the energy out of the aluminum foil very quickly and it isn’t that much energy, so it doesn’t heat your finger much.

          on the other hand, grab a cookie sheet and it’ll burn like crazy, because it’s a lot more metal, stores a lot more energy, and doesn’t cool “instantly”

          • jim says:

            Yes, small volume of gas (air), low heat capacity; large volume of body (80% water, 20% solid) means air entering the body quickly reaches equilibrium with body temp.

            Joe was really after that one, apparently certain that his own experience in the sauna had positively impacted his internal biota.

            He also searched carefully for the (organic vegetable, pro-biotic, exercise regimen, (popular fad)…) that would function as a virus repellent, also to no avail.

  10. Ethan says:

    I’m not sure if the current model deals with this, but do we have any information on the false positive rate of the current covid-19 test? It seems this could cause a lot of bias because of the low prevalence of covid-19 in the general population. In particular, we might be vastly overestimating the number of asymptomatic patients with covid-19.

    • More Anonymous says:

      I’d like to see more on this too.

      To my understanding, the PCR testing used for COVID-19 produces very few false positives, but if good tests are given to groups with a low enough underlying disease rates, then false positives can still dominate.

      For example, if a test has 99.5% sensitivity and 99.5% specificity, then false positives will be more common than true positives if the tested population has a disease prevalence of less than about 5 in 1000.
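
The break-even point in this example follows from the standard positive predictive value formula, which a short Python sketch can verify. The function name is hypothetical; the 99.5% sensitivity and specificity are the illustrative figures from the comment above, not measured properties of any real test.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Share of positive test results that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Illustrative figures from the comment: 99.5% sensitivity and specificity.
for prevalence in (0.001, 0.005, 0.05):
    ppv = positive_predictive_value(0.995, 0.995, prevalence)
    print(prevalence, round(ppv, 3))
```

At a prevalence of 5 in 1000 the true and false positives balance, so about half of all positive results are false; below that prevalence false positives dominate, and above it true positives do.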

      Disease rates could be low among both asymptomatic contacts and persons with suspicious signs and symptoms who are noncontacts. However, I don’t know if the disease rates and test performance are low enough to cause high false discovery rates / low PPVs in practice.

  11. Martha (Smith) says:

    Has anyone here looked at this guy’s models? His article at seems to be getting attention on popular websites (e.g., Facebook).

    • Dzhaughn says:

      +1 that he is getting a lot of attention as an amateur. I am not saying he’s wrong.

    • jim says:

      Super cool graphics! Excellent! Chart 21c is great, really shows the effect of cutting transmission rates. cool.

    • It kind of doesn’t matter how good his models are, the *gist* of it is correct, and explaining the gist of it to the layman is what’s getting the stuff done that needs to happen.

      That the CDC hasn’t been shouting this from the rooftops with their own pre-prepared estimates is evidence that Trump may stand as equivalent to maybe Joe Stalin.

    • Anoneuoid says:

      China contained it but then it leaked? This is total fantasy. 5 million people left Wuhan in early January, it was never contained.

      Then he shows basically a chart of how many tests have been issued or loosening/tightening of the diagnostic criteria and acts like it is the spreading of the virus.

      Just look at that big bump Feb 12 or so. The entire chart is like that.

      I couldn’t read on.

      The various health authorities seem to be 2-4 weeks behind the curve as usual:

      Yes, there have probably been millions of cases in the US already. There was absolutely nothing stopping it from spreading here months ago.

    • Keith O’Rourke says:

      Seems compatible with what I know.

      Now, I do remember SARS as being a China, Hong Kong and Canada thing. But then after being in the midst of a SARS outbreak in Toronto and having symptoms consistent with it (no tests at the time) maybe that’s just me.

      Anyway, in Canada we are mostly closed down or close to it (little choice given the March break travelling).

      My choice now is to focus on experimental testing of hopeful, but not too Hail Mary, treatment innovations and on vaccine evaluation.

      Most of us, 95% +/- ? will be fine.

      • Many people who survived SARS have had long term problems… lung problems of course but also bone necrosis and PTSD and depression and loss of livelihoods etc. survival is not a guarantee of being ok

        • Nick Adams says:

          This is simply untrue. This is Twitter-level stuff.
          However a search reveals a fascinatingly prescient article (free to view):
          Nature Reviews: Microbiology. Perlman S, Netland J. Coronavirus post-SARS. 2009.
          Interesting comment about how disappointing vaccines for Animal corona viruses have been.

          • I can’t verify it directly because I don’t read Chinese, but the current Wikipedia article on sars links two Chinese references under the section on Prognosis. perhaps someone with the language ability can assess.

          • Also this is the US where medical causes are a leading cause of bankruptcy already, and where ~50% of households report to government surveys that they couldn’t afford a $500 car repair without taking a loan. A two week stay in an ICU could easily have a decade long financial consequence for people. Italy is reporting ~10% of confirmed cases need this level of care. Epidemiologists are estimating 20-60% of people will get this disease or in that ballpark. so let’s call it 5% of US financially devastated from direct medical causes alone? seems reasonable even not including the loss of jobs and so forth from multi week closures, and the devastation of retirement funds from a stock market crash.

            the consequences downstream at a societal level for tens of millions of infections are much much bigger than the downstream fallout of less than 10k SARS cases globally.

            the idea that we’re going to be fine is unfortunately untenable to me. The US has been riding for an economic fall for a decade with everything held up on a pillar of printed cash from the fed, and corporate debt at an all time high. Homeless camps on the streets of LA are everywhere. and it’s not all the traditional mentally ill.

            coming out of this medical disaster will require major societal changes to the way things are done in the US to restore normalcy long term. this issue obviously will vary in other countries.

            • jim says:

              Homelessness is a manufactured crisis. When jobs and housing demand grow dramatically faster than housing units are built, as has happened in the tech-dominated west coast cities, it’s hardly surprising that housing prices go through the roof and people wind up on the street.

              • Sure, and when in LA every immigrant service worker with no insurance has this disease and their jobs disappear because the small businesses they worked for close because there is no demand for 2 months and commercial rents are very high, and they have a $400,000 medical bill with no insurance, and the schools where their kids get 2 out of 3 meals a day are closed through next fall… are they going to be fine?

                Population of LA county is ~ 10M. Population of CA is ~40M, or a little more than all of Canada. the structural problems are there even if they’re caused by terrible policy. it’s hard to change policy.

                is there any scenario here where we don’t enter a major depression with interest rates already at rock bottom?

                The truth is the only scenario I can see which would mitigate the economic consequences would be to institute UBI immediately. Pouring money into banks would be insane. How many of the hardest hit are going to get a couple hundred thousand in loans anytime soon? we will need the information coming from individual level demand to prioritize the goods and services needed for recovery. the last thing we need is rich people sitting on piles of cash.

              • jim says:

                Actually I was referring to the part about the social changes that have to occur coming out of the epidemic or really just generally. Housing policy is a major problem, it’s killing poor people. It has to change.

                Regarding the immediate situation:
                The direct financial impact (health care costs) will correlate strongly positively with high income because of the strong correlation with age. Most hourly workers (~75%??) simply won’t need care. They’ll just tough it out.

                The biggest impact on low wage earners will be loss of income.

                The obvious way to fire up the economy post-epidemic is infrastructure spending. It’s good news that it’s so close to an election because congress (and Trump) will be very worried about keeping their chairs and will want to spend generously, so maybe finally we can get a really good transportation package done – something on the scale of 2009, like $500B – $1000B. If that’s through by the end of June, we might be able to hit the ground running.

                UBI is just spending. Transportation infrastructure is an investment in future growth.

              • I’m with you on how social policy needs to change so we aren’t harming people.

                I think infrastructure spending is extremely gameable. we would wind up with bridges to Trump golf courses etc. UBI isn’t “just spending” it’s 320 million people individually steering resources where they’re needed. I’m not against decent infrastructure development but I think we need to solve some of the social problems before that could move forward in a good way.

        • Keith O’Rourke says:

          Of my two colleagues that actually had SARS one seemed to be back to normal in a few months, the other it was a year or two. Both are now still working and seem happy and are now caught up in the Covid19 mess.

  12. Stevec says:

    I’ve read a bunch of papers published in the last few weeks on Covid19.

    There’s large uncertainty on everything. Incubation periods. Death rates. Whether you can transmit before you develop symptoms. How long does the virus survive on a surface.

    Definitely people should be trying to come up with their best estimates. But even before we consider mutations of this RNA virus into additional more virulent/less virulent strains the range of outcomes from a model are very wide.

    Not sure if it’s relevant to the model here, but just in case:

    A mathematical model for the novel coronavirus epidemic in Wuhan, China Chayu Yang and Jin Wang

  13. RE: Coronavirus

    Peter Attia MD I follow on Twitter.
