## Estimating the mortality rate from corona?

Aaron “Edlin factor” Edlin writes:

What is the Gelman confidence interval for mortality rate from corona?

Aaron continues:

Would it be interesting to survey epidemiologists and generalists on their point estimates for the mortality rate for corona?

Will the average guess be close to reality, like with guessing jelly beans in a jar? Or will it be far off? How will guesses vary by location, I wonder? Probably a silly idea. What could be learned, given that the real n is 1?

Would you guess that experts or generalists would do better?

My only suggestion is to estimate the mortality rate as a function of demographic predictors and then do some sort of poststratification if you want an overall estimate. That’s gonna make more sense than trying to come up with a single ratio and then arguing endlessly about what’s the right denominator.
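A minimal sketch of the poststratification step in Python. The age groups, stratum-specific rates, and population shares are all hypothetical placeholders, not estimates for any real population; in practice the rates would come from a model with demographic predictors. The point is that the same stratum rates give different overall rates for populations with different age structures.

```python
def poststratify(rates, pop_shares):
    """Overall rate = sum over strata of (stratum rate * stratum population share)."""
    if abs(sum(pop_shares.values()) - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(rates[s] * share for s, share in pop_shares.items())


# Hypothetical age-specific fatality rates, e.g. from a model with age as a predictor
rates = {"0-39": 0.001, "40-59": 0.005, "60-79": 0.03, "80+": 0.08}

# Two hypothetical target populations with different age structures
younger = {"0-39": 0.55, "40-59": 0.27, "60-79": 0.15, "80+": 0.03}
older = {"0-39": 0.40, "40-59": 0.30, "60-79": 0.22, "80+": 0.08}

print(f"younger population: {poststratify(rates, younger):.2%}")  # about 0.88%
print(f"older population:   {poststratify(rates, older):.2%}")    # about 1.49%
```

Arguing about "the" mortality rate then becomes arguing about which target population to poststratify to, which is a more concrete question.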

P.S. More here.

1. Joseph Delaney says:

https://en.wikipedia.org/wiki/2020_coronavirus_outbreak_in_South_Korea

South Korea is at about 140,000 tests and seems to do good contact tracing. They have 6284 cases and 42 deaths as of today. Wikipedia has a nice chart by age and sex of the estimated fatality rates. Many of the interesting patterns in the Chinese data show up in the South Korea data (unusually low fatality rate among children, higher rate in men than women), and it has the advantage that they caught the epidemic early.

No data set is perfect and the numbers are quite small, but if I wanted a best-case scenario, this is the data I would currently look at. It does support Andrew’s contention that a single, overall number is meaningless: there is far too much effect measure modification in the raw data, even using only age and sex.
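For a rough sense of sampling uncertainty in the South Korean numbers quoted above, here is a sketch that computes the crude ratio 42/6284 and a 95% Wilson score interval for a binomial proportion (a standard interval; `wilson_interval` is a hypothetical helper name). It reflects only binomial noise, not the reporting delays, testing selection, or age/sex heterogeneity raised elsewhere in this thread, so it understates the real uncertainty.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (about 95% for z=1.96)."""
    p_hat = k / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


deaths, cases = 42, 6284  # South Korea counts quoted in the comment
crude = deaths / cases
lo, hi = wilson_interval(deaths, cases)
print(f"crude CFR = {crude:.4f}, 95% Wilson interval = ({lo:.4f}, {hi:.4f})")
```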

• Martha (Smith) says:

Thanks for the link. The mortality rates (so far) by age are interesting, and they make it sound not quite as dire as the standard news reports of “Higher mortality rate for people over 60”. The chart shows mortality rates of 1.4% for ages 60–69, 4.1% for ages 70–79, and 6.0% for ages 80–89.

• Martha (Smith) says:

I’ve been hearing the suggestion of “elbow-bumping” instead of handshakes, etc., to reduce the spread of the virus, but I just saw a variant: “toe-bumping”. (However, I don’t seem to be able to insert the URL here.)

2. Kaiser says:

The biggest problem is that we need cohort-adjusted rates. That means we need to know when each patient became sick, which is hard to know until later. Assuming no other problems with data collection (bad assumption, I know), the current calculation that ignores cohorts will underestimate the mortality rate because there is a time to death (or survival), and as the infection spreads, most cases are new and have not yet passed that time. When infection growth starts to slow down and the cohorts age, this mortality rate will shoot up, causing media spasms.
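Kaiser's point can be illustrated with a toy deterministic epidemic. The sketch assumes (hypothetically) that new cases grow exponentially for 30 days and then stop, and that every fatal case dies exactly 14 days after onset, with a true case-fatality of 2%. The naive ratio of cumulative deaths to cumulative cases sits far below 2% while cases are growing, then climbs to 2% once every cohort has had time to resolve.

```python
import math


def naive_cfr(day, true_cfr=0.02, growth_rate=0.2, delay_days=14, stop_day=30):
    """Cumulative deaths / cumulative cases observed on `day` in a toy epidemic.

    Hypothetical assumptions, for illustration only:
    - new cases on day t are exp(growth_rate * t) up to stop_day, then zero;
    - every fatal case dies exactly delay_days after onset.
    """
    new_cases = [math.exp(growth_rate * t) if t <= stop_day else 0.0
                 for t in range(day + 1)]
    cum_cases = sum(new_cases)
    # Only cohorts whose onset was at least delay_days ago can have died yet.
    resolved = new_cases[: max(0, day - delay_days + 1)]
    cum_deaths = true_cfr * sum(resolved)
    return cum_deaths / cum_cases


for day in (20, 30, 37, 44):
    print(day, round(naive_cfr(day), 4))
```

The rise after day 30 is exactly the "mortality rate shooting up" Kaiser describes: nothing about the disease changed, only the share of cohorts old enough to have resolved.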

• David Chorlian says:

Would the cohort problem be solved by having data on recovery? I thought I saw a news story in the last few days that gave recovered cases and deaths, but I can’t remember where, or vouch for the methodology that yielded the numbers.
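One common way to use recovery data is to restrict the denominator to cases whose outcome is known: deaths / (deaths + recoveries). The sketch contrasts that with the crude deaths / confirmed ratio on a hypothetical snapshot. It is not a complete fix: if recoveries take longer to be recorded than deaths, resolved cases over-represent deaths early on and the ratio overstates the fatality rate.

```python
def crude_cfr(deaths, confirmed):
    """Deaths over all confirmed cases, including those still ill."""
    return deaths / confirmed


def resolved_case_cfr(deaths, recovered):
    """Deaths over cases with a known outcome: deaths / (deaths + recovered)."""
    resolved = deaths + recovered
    if resolved == 0:
        raise ValueError("no resolved cases yet")
    return deaths / resolved


# Hypothetical snapshot: 1,000 confirmed cases, 20 deaths, 380 recoveries, 600 still ill
print(crude_cfr(20, 1000))         # 0.02
print(resolved_case_cfr(20, 380))  # 0.05
```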

• Aaron says:

Kaiser: You identify one important effect of what will happen with time. But another is that the selection of those tested will change. At the outset only the very sickest are tested, and so the case-rate mortality appears high. Over time it has decreased in most and perhaps all countries, as I understand it. I suspect this misleading, oft-quoted statistic will continue to decrease. Perhaps Korea has enough testing now that the case-rate mortality will rise as you say, but my money is on Korea falling too over time as testing increases and more of the less-affected population is drawn into the sample.

• Carlos Ungil says:

It may also be the other way around: they can go from testing wider groups of people coming from certain areas or who had contact with previous positives to testing only the subpopulations where the disease is known to have graver consequences.

That’s the case in Switzerland: next week the criteria for testing mild cases shift from potential contact with infected people to patient characteristics like age and comorbidities. Suspected cases in hospitalized patients and healthcare workers will still be tested to prevent propagation, but in general only vulnerable subpopulations will be tested.

3. Alex Blocker says:

It would be great to see that kind of stratified analysis as an extension to this work https://cmmid.github.io/topics/covid19/severity/diamond_cruise_cfr_estimates.html

The correction is well-motivated epidemiologically, but a full modeling approach that explicitly pools across groups and incorporates the demographic variables would be useful.
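As a much simpler stand-in for the full hierarchical model described here, the sketch partially pools group-level fatality rates toward the overall rate using beta-binomial shrinkage with a fixed prior strength. The groups, counts, and `prior_strength=200` are hypothetical; a proper multilevel model (for example, one fit in Stan) would estimate the amount of pooling from the data and include demographic predictors.

```python
def shrink_rates(deaths, cases, prior_strength=200.0):
    """Partially pool group fatality rates toward the pooled rate.

    Each group's rate is the posterior mean under a
    Beta(prior_strength * p_pool, prior_strength * (1 - p_pool)) prior,
    where p_pool is the overall deaths/cases ratio. prior_strength is a
    hypothetical tuning value, not estimated from the data.
    """
    p_pool = sum(deaths) / sum(cases)
    a = prior_strength * p_pool
    b = prior_strength * (1 - p_pool)
    return [(d + a) / (n + a + b) for d, n in zip(deaths, cases)]


# Hypothetical groups: one large, one small with no deaths, one small with one death
deaths = [40, 0, 1]
cases = [4000, 30, 20]
raw = [d / n for d, n in zip(deaths, cases)]
pooled = shrink_rates(deaths, cases)
for r, s in zip(raw, pooled):
    print(f"raw {r:.4f} -> partially pooled {s:.4f}")
```

Small groups are pulled strongly toward the pooled rate, so a 0-of-30 or 1-of-20 group no longer produces an extreme estimate, while the large group barely moves.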

• Alex Blocker says:

@Kaiser Yes, exactly. Take a look at the “Adjusting for outcome delay in CFR estimates” section of the linked item.
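A generic sketch of the kind of outcome-delay adjustment that section is about (not the linked authors' code): instead of dividing cumulative deaths by all cases, divide by the sum over onset days t of c[t] * F(T - t), where c[t] is new cases on day t and F is the cumulative onset-to-death delay distribution. The gamma-like delay distribution, growth rate, and 2% CFR are hypothetical. Deaths are simulated by convolving cases with the delay distribution, so the adjusted estimate should recover 2% while the naive one does not.

```python
import math
from itertools import accumulate


def delay_adjusted_cfr(new_cases, cum_deaths, delay_pmf):
    """Cumulative deaths / cases weighted by P(a fatal outcome is already observed).

    new_cases[t]: new cases with onset on day t, for t = 0..T
    cum_deaths:   cumulative deaths observed by day T
    delay_pmf[k]: assumed probability that a fatal case dies k days after onset
    """
    T = len(new_cases) - 1
    cdf = list(accumulate(delay_pmf))  # cdf[k] = P(delay <= k)

    def cdf_at(k):
        return min(cdf[k], 1.0) if k < len(cdf) else 1.0

    expected_resolved = sum(c * cdf_at(T - t) for t, c in enumerate(new_cases))
    return cum_deaths / expected_resolved


# Hypothetical onset-to-death delay: gamma-like shape on days 0..40
weights = [k * math.exp(-k / 5) for k in range(41)]
total_weight = sum(weights)
delay_pmf = [w / total_weight for w in weights]

# Toy epidemic: 31 days of exponential growth, true CFR of 2% (hypothetical)
true_cfr = 0.02
new_cases = [math.exp(0.2 * t) for t in range(31)]

# Deaths on day s: each onset cohort t <= s contributes true_cfr * cases * pmf[s - t]
daily_deaths = [
    true_cfr * sum(new_cases[t] * delay_pmf[s - t] for t in range(s + 1))
    for s in range(len(new_cases))
]
cum_deaths = sum(daily_deaths)

naive = cum_deaths / sum(new_cases)
adjusted = delay_adjusted_cfr(new_cases, cum_deaths, delay_pmf)
print(f"naive {naive:.4f}, delay-adjusted {adjusted:.4f}")
```

This recovers the true value only because the assumed delay distribution is the one that generated the deaths; with real data, the answer is only as good as the delay distribution plugged in, which is part of what Kaiser's questions below are probing.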

• Kaiser says:

I just skimmed the webpage, so I may have missed some details. Here are some questions:
a) The modes of those distributions are at 5 days, and yet there is at least a 15-day gap between the first case and the first reported deaths in the column charts.
b) I see that Figure 1A is not a hazard ratio but a distribution of deaths. So if the historical data are to be unbiased, they have to consist of “fully mature” cohorts, with no censoring. Is that an assumption?
c) How should we reconcile the 14-day quarantine cited in the media with Figure 1A, which says deaths can occur after 30 days?
d) Do you have individual tracking data, or is this based on aggregate counts like those in the column charts?
Thanks!

4. Anon says:

Many posts end up discussing MRP and how it’s so great, how Nate Silver is unfairly trashing the method, etc. However, all methods and approaches have cases where they don’t work well (e.g., as discussed in the recent post on holes in Bayesian statistics). What about MRP? Would be interested to know when MRP breaks down and doesn’t perform well.

• Andrew says:

Anon:

1. I have no problem with Nate or anyone else criticizing MRP. My problem with Nate’s criticisms was not that they were “unfair” but that they were empty. He wasn’t saying anything; he had no references or links; there was no content.

2. MRP breaks down if you don’t have good group-level predictors. I discussed some problems with MRP in this 2013 post, Mister P: What’s its secret sauce?

5. Thomas says:

Just to clarify the definitions: usually a mortality rate is computed for a population potentially at risk over a given period, whereas case-fatality is the proportion of cases (however they are defined) who die.
So if Hubei province has 60 million inhabitants, 80000 cases, and 3000 deaths accrued over 3 months, the mortality rate from the virus is 3/60000, times 4 (to make it annual), or about 20/100’000 person-years. Case-fatality is 3/80, about 4%. The problem with case-fatality is that the denominator is soft, since infection with the covid-19 virus (confusingly called SARS-CoV-2…) may lead to asymptomatic carriage, minor symptoms, more severe symptoms, etc., and the virus may or may not be identified in any given person.
Of course everyone is free to use common words as they please (e.g., “mortality” for case-fatality), but misunderstandings will ensue.
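Thomas's arithmetic, spelled out as a short Python sketch using his round numbers for Hubei. Treating population × 3 months as the person-time at risk is the same approximation he makes; it ignores the small change in the population at risk over the period.

```python
population = 60_000_000  # Hubei inhabitants (round number from the comment)
cases = 80_000
deaths = 3_000
years = 3 / 12  # deaths accrued over about 3 months

# Mortality rate: deaths per person-year in the whole population at risk
mortality_rate = deaths / (population * years)
# Case-fatality: share of (reported) cases who die
case_fatality = deaths / cases

print(f"mortality: {mortality_rate * 100_000:.0f} per 100,000 person-years")
print(f"case-fatality: {case_fatality:.2%}")
```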

• Hans says:

(+1). Taking it from here, we can then take the slowdown of new infections in Hubei as a sign that steady state is approached with 40-70% of the population being infected (https://threadreaderapp.com/thread/1228373884027592704.html). E.g., we may assume that the slowdown occurs at 10% infected. This gives an estimate of the mortality of 3000 / (0.1 * 60 * 10^6) = 5e-04 = 0.05%.
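This calculation depends entirely on the assumed fraction infected, so it may help to see how the estimate moves across the range mentioned. The 10%, 40%, and 70% values are assumptions taken from the comment and the linked thread, not measurements, and, like the comment's own calculation, this ignores deaths still to come among current infections.

```python
deaths = 3_000
population = 60_000_000  # Hubei, round number used above


def infection_fatality(deaths, population, fraction_infected):
    """Deaths divided by the assumed number of infections (fraction_infected is an assumption)."""
    return deaths / (fraction_infected * population)


for f in (0.10, 0.40, 0.70):
    print(f"{f:.0%} infected -> {infection_fatality(deaths, population, f):.3%}")
```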

6. Julien Riou says:

Stan epidemiologist here. We actually just released a preprint doing exactly that using Stan (https://www.medrxiv.org/content/10.1101/2020.03.04.20031104v1).

The point is that crude estimates of case fatality ratio obtained by dividing observed deaths by observed cases are biased in two ways:

1) Deaths are underestimated because of the delay between disease onset and death (right censoring);

2) Total cases are underestimated because surveillance efforts focus on severe cases and miss asymptomatic and mild cases.

We attempted to correct for both these biases using data from China and a few assumptions. It might still need some refinement though, happy to hear any comment!
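A back-of-envelope version of the two corrections, not the preprint's Stan model: divide observed deaths by an assumed fraction of eventual deaths already reported (bias 1), and divide observed cases by an assumed ascertainment fraction (bias 2). Both correction factors are hypothetical inputs; a real analysis has to estimate or justify them, which is where the modeling effort goes.

```python
def corrected_fatality(deaths, cases, share_deaths_observed, ascertainment):
    """Back-of-envelope correction of deaths/cases for both biases.

    share_deaths_observed: assumed fraction of eventual deaths among current
        cases that have already been reported (bias 1, right censoring)
    ascertainment: assumed fraction of all infections that show up as
        confirmed cases (bias 2, missed mild and asymptomatic cases)
    """
    eventual_deaths = deaths / share_deaths_observed  # censoring correction raises deaths
    total_infections = cases / ascertainment          # ascertainment correction raises the denominator
    return eventual_deaths / total_infections


# Hubei-scale counts from an earlier comment, plus purely hypothetical correction factors
deaths, cases = 3_000, 80_000
crude = deaths / cases
corrected = corrected_fatality(deaths, cases, share_deaths_observed=0.9, ascertainment=0.2)
print(f"crude {crude:.2%}, corrected {corrected:.2%}")
```

The two biases push in opposite directions, so whether the corrected figure ends up above or below the crude one depends on which assumed factor is further from 1.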

7. Carlos Ungil says:

What is the right denominator to estimate the mortality rate as a function of demographic predictors? I don’t think estimating multiple case-fatality rates [*] instead of one helps with those endless discussions.

[*] That’s what I think you meant. (Thomas: thank you for bringing up the issue and the explanation.)

Another “sheep from the train” story:

A shepherd and a math professor travel by train together. They pass a herd of sheep, and the professor starts counting how many sheep are in the herd. Before he finishes, the shepherd tells him the correct number. The professor is really surprised.

This happens again, and then once more, which makes the professor even more curious about how the shepherd does it so fast. So he asks him, and the shepherd replies:
It’s very easy: I first count the legs and then divide by four.

8. Justin Smith says:

No clue, but on jelly beans in a jar… http://www.statisticool.com/candyjar.htm

Justin