Estimates of the severity of COVID-19 disease: another Bayesian model with poststratification

Following up on our discussions here and here of poststratified models of coronavirus risk, Jon Zelner writes:

Here’s a paper [by Robert Verity et al.] that I think shows what could be done with an MRP approach.

From the abstract:

We used individual-case data from mainland China and cases detected outside mainland China to estimate the time between onset of symptoms and outcome (death or discharge from hospital). We next obtained age-stratified estimates of the CFR by relating the aggregate distribution of cases by dates of onset to the observed cumulative deaths in China, assuming a constant attack rate by age and adjusting for the demography of the population, and age- and location-based under-ascertainment. We additionally estimated the CFR from individual line-list data on 1,334 cases identified outside mainland China. We used data on the PCR prevalence in international residents repatriated from China at the end of January 2020 to obtain age-stratified estimates of the infection fatality ratio (IFR). Using data on age-stratified severity in a subset of 3,665 cases from China, we estimated the proportion of infections that will likely require hospitalisation.

And here’s what they found:

We estimate the mean duration from onset-of-symptoms to death to be 18 days and from onset-of-symptoms to hospital discharge to be 23 days. We estimate a crude CFR of 3.7% in cases from mainland China. Adjusting for demography and under-ascertainment of milder cases in Wuhan relative to the rest of China, we obtain a best estimate of the CFR in China of 1.4%, with substantially higher values in older ages. Our estimates of the CFR from international cases stratified by age (under 60 / 60 and above) are consistent with these estimates from China. We obtain an overall IFR estimate for China of 0.7%, again with an increasing profile with age.

I edited the above paragraph by rounding all numbers and removing the 95% intervals. The intervals are model-based and look way too narrow compared to actual uncertainty. For example, their estimate of the mean duration from onset to death is 17.8 days with a 95% interval of 16.9–19.2 days. There’s no way they can know this so precisely. And their estimate of the crude CFR from mainland China is 3.67% with 95% interval of 3.56%–3.80%. Again, this interval is too narrow to tell us anything. With an interval so narrow, we might as well just take the point estimate.
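To see why, here’s a minimal sketch (my numbers, not the paper’s: the case and death counts below are hypothetical, chosen only to match a 3.67% crude CFR). A plain binomial calculation on tens of thousands of cases already yields an interval about this narrow, and it reflects sampling error only, ignoring under-ascertainment and reporting delays entirely:

```python
# Minimal sketch with hypothetical counts (not the paper's data): a plain
# binomial interval for the crude CFR is narrow simply because n is large.
# It captures sampling error only, not under-ascertainment or delay biases.
import math

deaths, cases = 1_600, 43_600          # hypothetical, ~3.67% crude CFR
p_hat = deaths / cases
se = math.sqrt(p_hat * (1 - p_hat) / cases)
lo, hi = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"crude CFR {p_hat:.2%}, 95% interval {lo:.2%}-{hi:.2%}")   # ~3.5%-3.9%
```

An interval like that is a statement about the sample size, not about how well we know the true fatality rate.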

I’ve not looked at the substance of their data or model, but I did notice they used Bayesian inference, which I think is a good idea given the need to integrate different data sources when studying this problem. I’m pretty sure they could fit their model in Stan, which would be a good idea, as it would allow them to incorporate more structure in the model without onerous programming effort. Also, I skimmed through the paper and have some issues with their prior distributions (compare to general principles here). But that’s fine; there’s room for improvement. This and other models will need to be re-fit with new data in any case.
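For intuition about what an age-stratified Bayesian CFR estimate looks like, here is a toy sketch, emphatically not the paper’s model: independent Beta-Binomial updates per age band, with hypothetical counts. The point of moving to Stan would be to replace the independent strata with partial pooling and to add the censoring and ascertainment corrections this toy omits:

```python
# Toy age-stratified CFR estimate (my sketch, not Verity et al.'s model):
# conjugate Beta-Binomial posterior per age band, hypothetical counts.
import numpy as np

rng = np.random.default_rng(0)
age_bands = ["0-39", "40-59", "60+"]         # hypothetical strata
cases     = np.array([5000, 3000, 2000])     # hypothetical case counts
deaths    = np.array([   5,   30,  200])     # hypothetical death counts

a0, b0 = 1.0, 99.0                           # weak Beta prior centered near 1%
for band, n, d in zip(age_bands, cases, deaths):
    draws = rng.beta(a0 + d, b0 + n - d, size=10_000)  # posterior draws of CFR
    lo, hi = np.percentile(draws, [2.5, 97.5])
    print(f"{band}: CFR {draws.mean():.2%} (95% interval {lo:.2%}-{hi:.2%})")
```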

There’s some literature on this problem. For example, Verity et al. cite a 2005 article by A. C. Ghani et al., “Methods for Estimating the Case Fatality Ratio for a Novel, Emerging Infectious Disease,” in the American Journal of Epidemiology. That earlier work is non-Bayesian, though, which will create challenges if you’re dealing with sparse data or trying to combine multiple sources of information.
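To see the real-time estimation problem that literature wrestles with: mid-outbreak, the naive deaths/cases ratio is biased low because many open cases haven’t resolved yet, while restricting to resolved cases trades that bias for others. A sketch with made-up numbers:

```python
# Two simple mid-outbreak CFR estimators (made-up snapshot, for illustration).
# deaths/cases is biased low while many cases are still unresolved;
# deaths/(deaths + recoveries) uses only cases with known outcomes.
deaths, recoveries, cases = 100, 900, 5000   # hypothetical snapshot

naive    = deaths / cases                    # biased low during growth
resolved = deaths / (deaths + recoveries)    # closed cases only
print(f"naive CFR: {naive:.1%}, resolved-cases CFR: {resolved:.1%}")  # 2.0% vs 10.0%
```

The gap between the two is exactly the kind of censoring problem that onset-to-outcome delay modeling, Bayesian or not, is trying to solve.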

P.S. More here.

11 thoughts on “Estimates of the severity of COVID-19 disease: another Bayesian model with poststratification”

    • I agree we don’t know the number of infected or the case fatality rate, but I don’t think we need that information in order to be able to determine that this disease is probably very serious. The health care systems in both Wuhan and northern Italy were completely overwhelmed. Would a normal flu do that?

      Also, I’ve said this already several times, but although the CFR in Singapore so far is 0, this gives a misleadingly rosy impression of the situation because 10–15% of the patients here have ended up in critical condition. I think it’s unlikely that there are lots of undetected cases in Singapore, because the situation has been under control for weeks, so 10–15% might be a relatively reliable estimate of the proportion of infected people who end up needing critical care (although of course this proportion will depend on the demographic composition of the population). That a lot of people who are infected end up needing critical care is also borne out by what’s happened in Wuhan and northern Italy. Hence, even if the CFR with proper medical care might be low, at this point there’s good reason to believe it’s high if the health care system becomes overwhelmed. So I think there’s a very good case to be made for using strong measures to prevent that from happening. Yes there’s a lot of uncertainty, and yes the measures need to be balanced against other costs, but I don’t think it would be a good idea to wait for more reliable data before implementing strong measures.

      • Of course we need better data; randomized testing at grocery stores would be a fantastic idea. But we don’t need that testing data to make our decision. As is usual for doctors, Ioannidis focuses on what are essentially individual consequences for patients. Doctors take care of patients, so this framing makes sense to them, but for the policy decision it is essentially irrelevant.

        Civil engineering and public health are both dominated by the systemic consequences of system failure. Sure, if your dam up in the woods somewhere fails, “only 15,000” people will be flooded and die, but over the next year 18 million people will be forced to flee their homes for lack of drinking water. Sure, the CFR for this disease may turn out to be the same as flu, if the progression were similar to the flu: circulating around the globe over a six-month period and affecting about 10% of the population overall. But in reality half the world would have this disease by June 1, and half of those would be sick in the last week of that period. The consequences of everyone being sick, and 10% extremely so, are obviously far worse than cutting economic output in half for a year. Anyone who can’t see this is just not familiar enough with the calculations; it’s useful to simply sit down and do back-of-the-envelope calcs here (see the sketch below). It’s not even close.
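        One such back-of-the-envelope calc, with an assumed three-day doubling time (my assumption, not the commenter’s): unchecked exponential growth compresses half of all infections into the final doubling, which is what makes the timing so brutal.

        ```python
        # Back-of-the-envelope sketch (assumed doubling time): with unchecked
        # exponential growth, half of all cases arrive in the final doubling.
        import math

        doubling_days = 3                  # assumed doubling time
        start_cases   = 100_000
        target        = 4_000_000_000      # "half the world"

        doublings = math.log2(target / start_cases)    # ~15
        print(f"~{doublings * doubling_days:.0f} days to reach half the world")
        print(f"cases arriving in the final {doubling_days} days alone: {target // 2:,}")
        ```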

        +100 for Olav

    • I plead ignorance as to the actual severity of the situation or the proper course of action to take. However, I do think we have a major issue of credibility: none of the official actions being taken gives me any confidence. The responses are so thoroughly fraught with politics that I am not sure what to believe. I can do some of the modeling, and it suggests we are either massively over-reacting or that things could turn out worse than we think, perhaps even with the severe responses we are living through. I blame many of the political developments we have seen (particularly over the past three years), as well as many of the research issues we have discussed on this blog for longer than that. Trustworthiness is in short supply right now. Here is a very good talk (though not focused on COVID-19):
      https://community.jmp.com/t5/Discovery-Summit-Munich-2020/The-Art-of-Statistics-Spiegelhalter/ta-p/252647

      • There isn’t any question in my mind that aggressive social distancing such as has been done by universities throughout the country is absolutely the right thing to do.

        Bayesian decision theory says that without these measures there is a near certainty that nearly everyone in the country would be sick within a few months and around 10% would need ICU care, and that this cost far exceeds the cost of working from home, missing classes, closing restaurants, etc. With that level of swamping I can’t imagine we’d have less than 5% of the population dead, not just from the virus but from the collapse of everything else; imagine 30M people needing ICUs and only 100k beds…

        Now, 30M deaths times, say, 30 life-years each is roughly a billion life-years lost, so at some round number of, say, $100k per life-year, that’s about $100 trillion (spelled out in the sketch below). Basically we should be willing to close everything completely for five years to avoid that. The decision isn’t even close to difficult.
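        Spelled out with the comment’s own round numbers (these are the commenter’s assumed inputs, not data):

        ```python
        # The comment's back-of-the-envelope, spelled out (round numbers from
        # the comment, not data): value of life-years lost in a worst case.
        deaths           = 30e6     # feared deaths in an unmitigated scenario
        life_years_each  = 30       # assumed life-years lost per death
        usd_per_year     = 100e3    # round valuation of one life-year

        loss = deaths * life_years_each * usd_per_year
        print(f"implied loss: ${loss / 1e12:.0f} trillion")   # -> $90 trillion
        # At roughly $20T/year of US output, that's on the order of five
        # years of total output - hence "close everything for 5 years".
        ```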

        What’s happening is that the correct response was TOTALLY obvious, and yet it took about two weeks too long. Now maybe hundreds of thousands will die whose deaths could have been avoided… hard to say at this time.

        • I believe you meant to say “nearly everyone in the country would BE EXPOSED TO THE VIRUS within a few months”.

          All available evidence indicates that not everyone exposed gets sick and that only a small proportion of those exposed get seriously ill.

          It is inevitable that hundreds of millions of people will eventually be exposed to COVID-19, quite possibly billions. You don’t have to be a Bayesian analyst to notice that the numbers of seriously ill or dead are orders of magnitude smaller.

    • Thanks, well written. Ioannidis’s only really valid point seems to be that we need broad randomized surveillance testing, which I agree with. The rest of his article is off in left field picking daisies while the stadium is on fire.

  1. I found a preprint that claims that the false positive rate for COVID-19 PCR tests is probably higher than is commonly thought: “False positives in reverse transcription PCR testing for SARS-CoV-2” (https://www.medrxiv.org/content/10.1101/2020.04.26.20080911v1).

    Here are some quotes from the preprint:

    “We compiled data and calculated FPRs [false positive rates] for 43 EQAs [external quality assessments] conducted between 2004 and 2019, each of which assessed between three and 174 laboratories. These laboratories provided the EQAs with assays of 4,113 blind panels containing 10,538 negative samples”

    “3.2% of the 10,538 negative samples were reported as positive”

    “Rather, the likeliest source of these false positives is sample contamination or human error.”

    ‘The Patient’s Fact Sheet for the US CDC’s SARS-CoV-2 test states that there is only “a very small chance” that positive tests result could be wrong, but suggests that negative results could well be wrong. Public health officials have similarly suggested that positive test results are absolutely reliable but negative results are untrustworthy. In reality the opposite is true over a wide range of likely scenarios.’

    “We suspect that under-appreciation of the potential impact of false positives in SARS-CoV-2 testing has two sources. First, because PCR-based diagnostic protocols are effectively designed to eliminate false positives due to cross-reactivity, some individuals assume that false positives have been eliminated from the test process. In practice, however, minute levels of contamination, which are extremely challenging to control, can produce false positives in PCR-based tests, and the potential for human error in sample handling or records management can probably never be entirely eliminated. EQAs regularly found significant FPRs [false positive rates], from whatever cause, in PCR-based assays. Second, there appears to be little understanding of the relationship between FPR [false positive rate] and FDR [false discovery rate], in which a small FPR can produce a large FDR if the test positivity rate is low.”
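    The FPR-versus-FDR point is easy to make concrete. A quick sketch: the 3.2% FPR is the EQA figure quoted above, while the sensitivity and prevalence are assumptions I’ve added for illustration:

    ```python
    # Sketch of the preprint's FPR-vs-FDR point. The 3.2% FPR is from the
    # quoted EQA figure; sensitivity and prevalence are assumed inputs.
    fpr         = 0.032   # false positive rate (quoted above)
    sensitivity = 0.90    # assumed
    prevalence  = 0.01    # assumed fraction of those tested who are infected

    true_pos  = prevalence * sensitivity
    false_pos = (1 - prevalence) * fpr
    fdr = false_pos / (true_pos + false_pos)
    print(f"share of positives that are false: {fdr:.0%}")   # -> ~78%
    ```

    So with a 3.2% FPR and 1% prevalence among those tested, most positive results would be false, which is the preprint’s point.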
