Am I missing something here? This estimate seems off by several orders of magnitude!

Posted on June 16, 2020 9:34 AM by Andrew

A reporter writes:

I’m writing about a new preprint by doctors at Stanford University and UCLA on relative COVID-19 risk, in which they assert the risk is much less than most people might think. One author in an interview compared it to the risk of food poisoning. It’s a preprint so it’s obviously not fully baked yet, but it’s out there. Copying a link below, it’s not very long. Just wanted to ask your thoughts on the soundness of the science and their conclusions.

From the abstract of the linked article:

Among the 100 most populous US Counties, for the week ending May 30, 2020, the median probability of COVID-19 infection transmission is 1 infection per 3836 unprotected community-level contacts. For a 50 to 64 year old individual, the estimated median probability of hospitalization is 1 hospitalization per 852,000 community level person-contacts and the median probability of a fatality is 1 fatality per 19.1 million community-level person-contacts.

I did a quick look-up of the U.S. numbers and got these numbers for ages 55-64: 11439 deaths out of 42 million people. That’s a death rate of 1 in 4000.

If it’s 1 in 4000 people, and the fatality rate is 1 per 20 million person-contacts, that would mean that each person has roughly 5000 person-contacts. 5000 seems like a lot to me! I guess you can get to 5000 by saying that the epidemic has been happening for about 100 days, with 50 contacts a day. That still doesn’t seem quite right. For one thing, even if you do have 50 contacts a day for 100 days, a lot of these will be the same people over and over, and most of them are not exposed. The numbers just don’t seem to add up.
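The arithmetic in the paragraph above can be sketched in a few lines of Python. The variable names are illustrative and the inputs are just the figures quoted in this post; the 100-day window is the rough guess made above, not a number from the preprint.

```python
# Rough check of the figures quoted in the post (all names are illustrative).
deaths_55_64 = 11_439            # U.S. COVID-19 deaths, ages 55-64, as quoted above
population_55_64 = 42_000_000    # approximate U.S. population in that age group
contacts_per_fatality = 19.1e6   # preprint: 1 fatality per 19.1 million person-contacts
epidemic_days = 100              # assumption: the epidemic has run for about 100 days

people_per_death = population_55_64 / deaths_55_64
implied_contacts_per_person = contacts_per_fatality / people_per_death
implied_contacts_per_day = implied_contacts_per_person / epidemic_days

print(f"1 death per {people_per_death:,.0f} people")
print(f"implied person-contacts per person: {implied_contacts_per_person:,.0f}")
print(f"implied contacts per day: {implied_contacts_per_day:.0f}")
```

Without rounding, the death rate comes out nearer 1 in 3,700 than 1 in 4000 and the implied total is a bit over 5,000 contacts per person, so the rounded figures above are in the right range.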

Indeed, the numbers here seem soooo wrong that I feel like I must be missing something here.

Could someone explain? If I’m completely wrong here, this would be a good teaching moment!

One of the hardest things about statistics is that you can never be completely sure of yourself. The hivemind is a big help.

P.S. Michael Jetsupphasuk sends along this picture of a friendly cat in Turkey, which he says is maybe the best place to be an outdoor cat.

98 thoughts on “Am I missing something here? This estimate seems off by several orders of magnitude!”

Raghuveer Parthasarathy on June 16, 2020 9:45 AM at 9:45 am said:

Note: I haven’t read the paper; quickly looking, it’s the sort of thing I’ve been hoping someone would do — I’ve often thought it bizarre that we don’t even have order-of-magnitude assessments of risk, and too many conversations seem to degenerate into a binary picture of zero risk versus inevitable doom. That said:

“For one thing, even if you do have 50 contacts a day for 100 days, a lot of these will be the same people over and over, and most of them are not exposed”

“Most of them are not exposed” is irrelevant. “Contacts” as used in the assessment is contacts with anyone, as is generally considered in contact tracing. The attempt is to measure risk per contact, not risk of transmission from contact with an infected person.

Second: you’re using the death rate rather than the hospitalization rate. Using the latter would push your estimate to 5 contacts per day, which is very reasonable.

- Raghuveer Parthasarathy on June 16, 2020 9:52 AM at 9:52 am said:
  
  Sorry — I need an edit button, as I often realize after hitting Submit. Ignore the part about hospitalization vs. death rate.
  
- Andrew on June 16, 2020 10:03 AM at 10:03 am said:
  
  Raghu:
  
  I’m using the death rate because it’s more of a hard number. Regarding “contact,” I was going back and forth on the definition. They say, “a single unprotected and substantive community level contact with an individual of unknown infection status,” but then it all seems to depend on what is meant by “community level contact.” If I talk with my family members 100 times a day, is that 100 contacts? If so, I guess you can get to 20 million, but it doesn’t mean much.
  
  - Ian DB on June 16, 2020 10:13 AM at 10:13 am said:
    
    Why doesn’t it mean much? It seems like it means a lot: it tells you very specifically what risk you face given how many contacts you have. Obviously it ignores how homogeneous or heterogeneous the contacts are, and that’s an interesting extra source of variation to account for, but that doesn’t make it a useless statistic.
    
    - Andrew on June 16, 2020 10:33 AM at 10:33 am said:
      
      Ian:
      
      One problem is correlation. Seeing the same person 1000 times should not count; it just makes the numbers look bigger. Heterogeneity of contacts is not just an interesting extra source of variation, it’s the whole ballgame.
    - Jonathan on June 16, 2020 2:26 PM at 2:26 pm said:
      
      There would be some hetero in recontacting to an extent related to intervening contacts, but the strength of those versus originals is a guess. Somewhere between not material at all and equal to a new contact, which isn’t instructive.
    - mik on June 16, 2020 3:13 PM at 3:13 pm said:
      
      I think it matters – a single contact does not necessarily mean infection, and repeated contacts with the same person increase the risk of being infected. Even with family members.
    - mik on June 16, 2020 3:15 PM at 3:15 pm said:
      
      ‘matters’ -> ‘should count’
    - Daniel Lakeland on June 16, 2020 3:30 PM at 3:30 pm said:
      
      Sort of… it’s complicated.
      
      Suppose two kids get together and play patty-cake… Is this 37 contacts every time their hands meet, or is it one contact lasting 2 minutes?
    - David J. Littleboy on June 17, 2020 7:53 AM at 7:53 am said:
      
      I’d think that it’s also the _quality_ (or _badness_) of the contact that’s significant. How careless, prolific, are your contacts?
      
      Here, my barber, guitar teacher, personal trainer are all one-on-one. Just me and the bloke. But all of these blokes see lots of people, so these three contacts for me become effectively (integrated over the previous two weeks) probably 100 (barber) plus 20 (guitar teacher) plus 40 (personal trainer). Writing down these estimates, I see I’ve got this completely backwards: I got my hair cut the other day, am planning on seeing personal trainer next week, and guitar teach next month.
      
      Of course, this is Japan, where the total COVID-19 deaths so far (less than 1000, but creeping up) are less than the deaths per day (on most days) in the US. Still, as a retiree who doesn’t have to do anything other than grocery shopping, maybe I’m being more suicidal than I normally am…
      
      This is a different issue but, on the news today (again, here in Japan), it seems that the number of excess deaths in Japan for April and May was _much_ larger than the number of Covid-19 deaths for those months (compared to the average numbers of deaths for those months over the past few years). And that’s true for some European countries as well. In Japan, the number of suicides was _down_, so there’s something nasty going on that hasn’t been figured out yet.
    - Joshua on June 17, 2020 8:50 AM at 8:50 am said:
      
      Yes, your comment raises an important point. Don’t know if it has been discussed above (I would imagine it must have been), but not all contacts are equal.
      
      If I have contact with 5 people in a day who have all had contact with five people that day it is far different than if I have contact with 5 people who have all had contact with 100 people.
      
      In the two scenarios, the next person I’m in contact with will effectively have contact with an additional 25 people in the one scenario and an additional 500 in the other.
      
      The ratio of contacts to illness necessarily depends on the connectedness of the contacts. Hence, the ratio varies in different communities. Any attempt to quantify these issues while treating all contacts as equal is worthless.
    - jim on June 16, 2020 5:27 PM at 5:27 pm said:
      
      “Seeing the same person 1000 times should not count”
      
      This is wrong. Heterogeneity of contacts is not the whole game.
      
      First: Each contact with the same person isn’t the same. You pass Bob in the hall at 7:45am; you step in the elevator with him at 8:05 for 2 minutes. At 11:05 you sit next to him in a meeting for 30 mins. At 3:00 pm he stops by your office to complain about the department meeting. At 4:15 you walk down the hall with him on the way out of the building. These different contacts have different transmission probabilities.
      
      Second: The person might be non-contagious in the morning and contagious in the afternoon. You sat with Bob at a meeting at 11:05 but his infection progressed and he was contagious at 4:15. However, as you only walked down the hall with him, you did not acquire the infection, but the person that sat next to him on the bus for 30 min did acquire the infection.
      
      Even if Bob was already infected and contagious at 5am, you might not acquire the infection in all of your meetings with him. The Korean call center study showed that the elevator wasn’t a significant source of transmission. If Bob sat next to you at the meeting but didn’t speak or vocalize at all, you might not acquire the infection. The Korean exercise study showed that an infected yoga instructor who wasn’t exerting herself did not transmit the virus to students in a relatively small room, so how much a person is breathing or talking matters.
    - Martha (Smith) on June 16, 2020 5:59 PM at 5:59 pm said:
      
      I agree with Jim. To add to his comments:
      If Bob were singing (or perhaps even whistling), he might be more likely to spread the virus than if he were just sitting next to someone.
      Also, I have heard that viral load may be important — that higher viral load in the body may increase the risk of having the disease — so repeated contacts with an infected person might increase viral load in the recipient, thus increasing the risk of acquiring the disease.
    - Daniel Lakeland on June 16, 2020 9:56 PM at 9:56 pm said:
      
      Most likely the relevant variable is “minutes spent within 10 ft of each other per day”
      
      rather than “number of contacts”.
      
      “Contact” is simply not a well defined thing, it’s like a “good movie”.
    - jim on June 17, 2020 1:25 AM at 1:25 am said:
      
      ‘“contact” is simply not a well defined thing,’
      
      Apparently not in this discussion but presumably if buddy is using it in a paper he has some definition. Presumably the definition shouldn’t be specific to any particular disease.
Nick Nolan on June 16, 2020 10:07 AM at 10:07 am said:

(I’m probably completely wrong but speculating)

From European perspective restrictions and deaths work strangely. Maybe only heterogeneous modelling makes sense. Average R hides something important.

Speculation:

(1) there are wide subnetworks who spread covid to each other very fast. R is much smaller outside these networks.

(2) high risk people may be concentrated on ‘the edges’. At first, little deaths. Then suddenly spike.

(3) Mortality risk could have inverse relation to R in heterogeneous models.

(4) restrictions work differently in different phases and subpopulations. At the beginning they stop the spread in highly connected subnetwork. few deaths, R drops. If restrictions implemented later, deaths drop much slower even if R drops equal amount.

- confused on June 16, 2020 12:21 PM at 12:21 pm said:
  
  >>(2) high risk people may be concentrated on ‘the edges’. At first, little deaths. Then suddenly spike.
  
  This seems plausible in the US as well. Looking back, California found a COVID death 20 days earlier (February 6) than the previously known earliest US COVID death (February 26 in Kirkland, WA). And the California person was reported to probably be a community spread case. So it was probably spreading at a low level in CA from sometime in January on*, but didn’t become obvious since it wasn’t hitting high-risk populations. But in Kirkland, WA, a nursing home was hit…
  
  *Given the infection-to-death lag, a Feb 6th death strongly suggests infection in January.
  
Benjamin Morris on June 16, 2020 10:15 AM at 10:15 am said:

Have not read this study but have seen the issue discussed in other contexts. I think part of the discrepancy may come from the fact that most people who get covid get it from close contacts, like family or others they live with or see for extended periods on a regular basis. In other words, your odds of getting the disease from a random community contact may be extremely small, but then the number of expected infections resulting from that infection may be significantly higher than 1 as you spread it to close contacts, or those in your “bubble”.

Regardless, I think despite some bickering, even the mainstream science of this has seemed to be converging on the idea that the transmission chains for this are not that similar to what we thought early on. It’s like occasional super-spreading events followed by people spreading it to their immediate contacts. OTOH, we now have a data problem that all our data is basically grouped in two categories: 1) early data from a period where the virus spread uncontrollably, and 2) later data from a period where everyone had been locked down and drastic measures had been taken. Amazingly, these data seem to lead to different statistical conclusions, despite neither being an accurate representation of what things might look like after a controlled re-opening.

- Benjamin Morris on June 16, 2020 10:17 AM at 10:17 am said:
  
  *That was a sarcastic “Amazingly”
  
Zhou Fang on June 16, 2020 10:23 AM at 10:23 am said:

The paper is basically not understanding exponential growth again. The numbers seem reasonable, but only due to currently low prevalence.

If you do a back of the envelope calculation and reverse out the maths here, the US having 20k new cases per day indicates the average individual has 0.26 contacts per day. Increasing this instantaneously to pre-lockdown contacts per day of about 20 means 1.5 million new cases of covid19 on the first day of ending lockdown.

On the second day after returning to normal, per encounter infection probability increases to 1/889 (due to increased prevalence), so you get 7 million additional cases.

On the third day per encounter infection probability increases to 1/306, and you get 20 million additional cases.

By the sixth day everyone in the US is infected.

Obviously I’m speeding things up a lot by assuming every infected person becomes infectious immediately. But it’s pretty clear that linearising an exponential by looking at it in terms of *individual* per-encounter infection probability is totally unreasonable.
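The first step of that back-of-the-envelope calculation can be sketched as follows. This is only a rough reconstruction: the 300 million population is an assumption (it is the round figure used elsewhere in this thread), and the later days are not reproduced because they depend on how rising prevalence feeds back into the per-contact probability, which the comment does not spell out.

```python
# Sketch of the day-one arithmetic; names and the population figure are assumptions.
population = 300_000_000             # assumed round U.S. population
p_infection_per_contact = 1 / 3836   # preprint's median per-contact probability
new_cases_per_day = 20_000           # current U.S. new cases per day, as stated above
pre_lockdown_contacts_per_day = 20   # pre-lockdown contacts per person, as stated above

# Contacts per person per day implied by the current case count.
implied_contacts_per_day = new_cases_per_day / (population * p_infection_per_contact)

# New cases on the first day if contacts jump straight back to pre-lockdown levels.
day_one_cases = population * pre_lockdown_contacts_per_day * p_infection_per_contact

print(f"implied contacts per person per day: {implied_contacts_per_day:.2f}")
print(f"day-one cases at pre-lockdown contact rates: {day_one_cases:,.0f}")
```

Under these assumptions the implied contact rate is about 0.26 per day and the day-one figure is about 1.5 million, matching the numbers above.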

- Zhou Fang on June 16, 2020 10:36 AM at 10:36 am said:
  
  You should tell the reporter that the paper basically uses dodgy maths to make deeply scary numbers look unscary. Infectious disease is infectious. Food poisoning is not.
  
- Zhou Fang on June 16, 2020 10:44 AM at 10:44 am said:
  
  The other error is that authors imply an IFR for a 50-65 year old of 3836/19e6 = 0.02%. This is far below most estimates, and using current deaths stats, would imply over half of this age group has been infected. This is clearly nonsense.
  
  - Steve on June 16, 2020 3:13 PM at 3:13 pm said:
    
    Isn’t the problem estimating the IFR based on geography? It is just obvious that one county will have low or zero risk and another county very high risk based on when the virus got there. My risk of getting COVID-19 in the US in November of last year is approximately zero. The risk of getting COVID-19 in New Zealand right now is approximately zero. Why would I average over those locations? Why should we even care about that number? It seems meaningless to me. The risk we should care about is the risk conditioned on what we are doing and where we are. I want to know what my risk is if everyone wears a mask and social distances, or just wears a mask or just social distances, and even that risk changes over geography and time. It seems like a meaningless number.
    
  - confused on June 16, 2020 3:39 PM at 3:39 pm said:
    
    My response below should have been to this comment. Yes, this is wrong, apparently due to a misreading of the CDC pandemic planning scenarios, confusing “symptomatic” with “reported” cases. Even by the CDC numbers, which seem to be considered pretty low*, it should be 6.5x higher. (If the real population-overall IFR is more like 0.8% than 0.4%, 13x higher.)
    
    *In fact, they seem overly low… except that these are *planning* numbers, forward-looking. I don’t think an 0.4% IFR from infections happening *from now on* is all that unreasonable, given what we now know about supportive care and repurposed drugs, and given that the distribution of infected people is probably younger.
    
- confused on June 16, 2020 3:36 PM at 3:36 pm said:
  
  I think they misread the CDC pandemic planning estimates with regard to “symptomatic” cases.
  
  The pre-print states (page 8) that “For example, for an individual in the 50-64 year old age group the ratios between infections; reported cases, hospitalizations, and deaths are: 10:1:0.045:0.002.”
  
  But the 0.002 death rate / 0.045 hospitalization rate in the CDC “best estimate” scenario (https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html ) is for *all symptomatic cases* not *reported cases*.
  
  That same scenario assumes 35% asymptomatic. So that 0.045/0.002 applies to 65% of all infections, not 10% of all infections.
  
  So the death and hospitalization rates should be 6.5x higher.
  
Tom Passin on June 16, 2020 11:03 AM at 11:03 am said:

I agree that those numbers sound screwy. Early on, I made the following rough estimate, which still seems reasonably good:

R0 was said to be about 3 at that time. People are contagious for something like ten days (3 – 5 days before they show symptoms, maybe about 5 days or so when they have symptoms but haven’t gone into a hospital or started to self-quarantine). So the average number of other people an infected person would infect per day would be 3 / 10 = 0.3. This means that the rate of increase of the number of cases should be about 30% / day. Even if there were many more cases out there than were officially confirmed (because of lack of testing), the fractional rate of change would be the same.

That is actually the rate of increase that I found in most US states I looked at and most countries I looked at too, although there was sometimes a very noisy initial period that tended to obscure this value. (This was calculated using the Johns-Hopkins data).

With social distancing and isolation and face masks, obviously the number of daily contacts and the probability of infecting someone during a contact must be way down. For many US states, the daily fractional rate is currently 1 – 2 percent. That’s a decrease by a factor of 15 – 20. I would have thought that the factor would be much larger (just based on the number of people you don’t meet any more during the day). Maybe the numbers are being kept high by infections in institutions – prisons and nursing homes, meat packing plants, etc.?

BTW, in Italy and France the rate is around 0.1%/day and still dropping.

The number of contacts a person has in a day must be so highly variable from place to place that it’s hard to know how to handle it, especially if there are any serious non-linearities. Population density has something to do with it, but that can’t be the whole story. In the US, the per capita case numbers within a state can vary from county to county by factors of 5 – 7, and the smallest counties (by population) can have the highest numbers. So averages may be suspect, but it’s hard to avoid them.

The daily number of new cases should be

R = Ni * Nc * P,

where Ni is the number of infectious people out there, Nc is the average number of contacts per day for an infectious person, and P is the probability of infection / contact. Let’s ignore complications like repeated contacts with the same person, etc., etc. Then

P = R / (Ni * Nc)

If it’s close to the mark that a person is infectious for about ten days, then Ni = 10 * R. So

P = 0.1 / Nc or Nc = 0.1 / P

Plugging in the value of 1 / 3836 from the referenced paper, we get Nc = 0.1 * 3836 ~ 384 contacts per day. That seems awfully high to me. This estimate would not be changed if the number of cases were higher than the number of confirmed cases, because that would change both R and Ni by the same factor.
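For anyone who wants to rerun this estimate, here is a minimal sketch. It follows the formulation in this comment and treats the paper's 1 / 3836 as the per-contact probability P for an infectious person, which is exactly the reading the first reply below disputes. The variable names are illustrative.

```python
# Sketch of the estimate in this comment; variable names are illustrative.
r0 = 3.0                     # early estimate of R0 quoted above
infectious_days = 10         # assumed length of the infectious period, in days
p_per_contact = 1 / 3836     # per-contact probability taken from the paper

# Average number of people an infected person infects per day (the ~30%/day growth).
daily_growth_rate = r0 / infectious_days

# With Ni = infectious_days * R, the relation P = R / (Ni * Nc) gives Nc = 0.1 / P.
contacts_per_day = (1 / infectious_days) / p_per_contact

print(f"daily fractional growth: {daily_growth_rate:.0%}")
print(f"implied contacts per day: {contacts_per_day:.0f}")
```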

- Andrew on June 16, 2020 11:22 AM at 11:22 am said:
  
  Tom:
  
  Yeah, another way of putting it is if this “1 in 20 million” (actually, an inappropriately precise “1 in 19.1 million”) sounds infinitesimal, it’s only because the denominator is inflated.
  
- Zhou Fang on June 16, 2020 11:30 AM at 11:30 am said:
  
  No, no. The probability given in the paper is the probability for an *uninfected person to pick up the virus* by being in contact with a random person, who may or may not be infected. In your formulation it would be P*Ni/300 million.
  
  - Craig on June 16, 2020 3:04 PM at 3:04 pm said:
    
    It might be worth looking at this paper from one of the authors for similar numbers that don’t make sense or possibly a flaw in the approach https://www.medrxiv.org/content/10.1101/2020.05.02.20089086v3 (estimates of excess suicides and deaths due to unemployment)
    
- Martha (Smith) on June 16, 2020 6:08 PM at 6:08 pm said:
  
  Tom said,
  “The number of contacts a person has in a day must be so highly variable from place to place that it’s hard to know how to handle it, especially if there are any serious non-linearities. Population density has something to do with it, but that can’t be the whole story. In the US, the per capita case numbers within a state can vary from county to county by factors of 5 – 7, and the smallest counties (by population) can have the highest numbers. So averages may be suspect, but it’s hard to avoid them.”
  
  Yeah, there is so much variability — not a good situation for getting credible estimates.
  
Raghuveer Parthasarathy on June 16, 2020 12:33 PM at 12:33 pm said:

Now having actually read the paper:

1. I do think that what the authors are trying to do is good, and as mentioned above, I wish there were more quantitative estimates being made; living is full of risks, which numbers help us understand.

That said, I have a lot of issues with the paper. It’s very possible I’m being dense, of course. Here are the main issues:

2. The paper is very unclear. It’s not at all apparent what things are a consequence of simply plugging things into their model (p. 7), what come from empirical data that are used to assess the model, etc. Even the parameter values are unclear, being given in a table that doesn’t use the symbols of their simple equation.

3a. The model is essentially that the probability of infection *per contact* P = S b L I [A] [B] (page 7), where [A] is some stuff that depends on the duration of infection, [B] on asymptomatics, and the important parts to the left are the “attack rate” b, the probability that a contact leads to infection and the fraction of infected people L. There are rough values the authors claim for all these parameters. As far as I can tell, L is set to be 1 (!) (Table 1 line 1), i.e. the entire population is infectious, which is so strange I must be misreading somehow.

3b. As oddly, and more importantly, a value is given for b (10%), but this seems precisely the thing one would want the model to be used to estimate! In other words, the model is *assuming* 10% of contacts lead to infection, and then making claims about the probability of infection, which means that the model is simply a way of estimating *how many contacts* a person has. If I wanted to know how many contacts a person has, I wouldn’t use a disease model, I’d just look at behavior.

These are rather quick thoughts. If I were reviewing this paper, I’d second-guess myself more, and write more, but perhaps this will spur other people (even if just to explain what I’ve gotten wrong).

Now back to the work I’m neglecting…

- Raghuveer Parthasarathy on June 16, 2020 12:47 PM at 12:47 pm said:
  
  I forgot to add:
  
  1b. Related to the lack of clarity, the paper commits the sin of being dimensionally incorrect. The main equation (p 7) has a probability on the left side, and units of time (from the “D” of the infection duration) on the right, which can’t be correct. Somewhere, somehow, the “D” is being non-dimensionalized, but this isn’t clearly written, which just invites mistakes to be made in the calculations!
  
  - Daniel Lakeland on June 16, 2020 2:03 PM at 2:03 pm said:
    
    When I was TAing at USC I used to hammer into the engineering students the importance of thinking about dimensions of equations. They would then complain in the course evaluations that I spent too much time doing that and not enough time working textbook problem after textbook problem… Seriously though I think if I taught a few tens of Civil Engineers to check the dimensions on their equations it was probably the most important thing I ever did as a TA.
    
  - Joseph Candelora on June 16, 2020 5:47 PM at 5:47 pm said:
    
    The per-day item must be the “IReported = Incidence of reported cases”. It’s not described as such, but that has to be the percentage of the total population reported as newly infected each day. Then they multiply that by the number of days a given person stays infectious, and you get a rough estimate of the number of infectious people at any point in time.
    
    That assumes that the incidence of reported cases (per day) has been level over the last n days, but that’s probably not the worst assumption they made.
    
    If I’m wrong and the IReported is instead the percentage of the population currently reported as an active case then they’ve overstated their likelihoods, but I doubt they made that basic an error. Although given their other errors, maybe I should revise my prior on how basic an error they would make.
    
- Mendel on June 16, 2020 3:49 PM at 3:49 pm said:
  
  10% is the “attack rate”, i.e. how probable it is that an infectious person infects a contact.
  https://www.thelancet.com/journals/laninf/article/PIIS1473-3099(20)30314-5/fulltext found 75% for household members over the total infectious period (but for n=4) and 5% for high-risk contacts (15min at under 6 feet).
  
  You basically multiply that with the prevalence of infectious people in the population to get a random contact risk, and that’s already hugely oversimplified because the intensity of contacts varies a lot.
  
  And they don’t seem to consider superspreading events, such as when a care home or a meat plant has an outbreak, or the choir infections, where the reported attack rates were much higher.
  
  - Raghuveer Parthasarathy on June 16, 2020 4:15 PM at 4:15 pm said:
    
    It’s unclear, but I think they mean the 10% to be the “attack rate” *per contact event* not integrated over the total time. So the relevant number would be the 5%, not the 75%.
    
- Martha (Smith) on June 16, 2020 6:14 PM at 6:14 pm said:
  
  Raghuveer said,
  “These are rather quick thoughts. If I were reviewing this paper, I’d second-guess myself more, and write more, but perhaps this will spur other people (even if just to explain what I’ve gotten wrong).”
  
  It sounds like people are rushing to publish back-of-the-envelope calculations — which are not inherently bad, but can often be misleading in things as complex as this. On the other hand, if these rush-to-publish methods are adequately critiqued, they might lead to better approaches which might be more realistic, hence more useful — but I am skeptical at this point.
  
  - Raghuveer Parthasarathy on June 16, 2020 8:06 PM at 8:06 pm said:
    
    Good points; I agree. I don’t think there’s anything wrong with publishing simple calculations — some of the best calculations are simple! — but they should be clear and well-written simple calculations, which this is not. (See also John N-G’s comment below.) If this were an end-of-term class project, it would get a failing grade just based on clarity alone. But: hopefully the authors will improve this, and then it really could be a stepping stone to useful insights!
    
wally on June 16, 2020 2:21 PM at 2:21 pm said:

My 10th grade chemistry teacher, Mr. Eibner, hammered into us the importance of dimensions: ‘true statements in fraction form’. Not content with that pearl, he also taught us: ‘partying in college goes a lot farther than partying in high school’. Legend.

John N-G on June 16, 2020 3:51 PM at 3:51 pm said:

Their equation seems to be a mashup of two equations. Here’s the basic equation:
P(infection | contact) = Susceptibility (P(infection | attack) = 1) * Attack rate (P(attack | infectious contact) = 0.1) * Community incidence (P(infectious contact | contact))
where P(infectious contact|contact) = P(infectious in community | infectious) * P(infectious)
P(infectious) = reported cases/day/person (=6/100K) * correction for unreported cases (=10) * #days infectious (=8) * fraction of infectious days spent circulating (=0.55)
In the paper’s symbols, this is P = S beta Ireported [1 + alpha/(1-alpha)] Dinfectious lambda
Doing the math: P = 1 * 0.1 * 6/100,000/day * 10 * 8 days * 0.55 = 1/3788 (close enough to the 1/3836 in the paper)
Key assumptions: only 10% of true cases are reported, and there are about 8*0.55 = 4.4 days of community infectiousness per true case.

The units do work out correctly.

Lambda is the “proportion of infected persons circulating” which is 0.55. It is calculated from:
(sigma + eta(1-sigma))
where sigma is the proportion of infectious time pre-symptomatic (=0.4) and eta is the noncompliance rate while symptomatic (=0.25).

The equation in the paper has written lambda on the RHS, and then written the formula for lambda on the same RHS of the same equation! To make matters worse, there’s an extra lambda in the formula for lambda!

Evidence that I’ve analyzed this correctly: the paper never assigns a value for lambda, and my answer is consistent with the paper’s calculation. So hoo boy.

From there: the symptomatic hospitalization ratio for 50-64 yrs is 0.045 so the paper assumes that the true hospitalization rate is 10% of that (because of underreporting of true infections), or 0.0045. Likewise, the fatality ratio of 0.002 gets multiplied by 10% too.
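The whole chain in this comment can be checked with a short script. This is a sketch of the reconstruction described here, not the paper's own code; the variable names are made up, and the parameter values are the ones quoted in this comment.

```python
# Sketch of the reconstructed calculation; names are illustrative, values as quoted above.
sigma = 0.4    # proportion of infectious time that is pre-symptomatic
eta = 0.25     # noncompliance rate while symptomatic
lam = sigma + eta * (1 - sigma)   # proportion of infected persons circulating

susceptibility = 1.0
attack_rate = 0.1                        # beta
reported_incidence = 6 / 100_000         # reported cases per person per day
unreported_correction = 10               # true cases per reported case
infectious_days = 8

p_infection = (susceptibility * attack_rate * reported_incidence
               * unreported_correction * infectious_days * lam)
contacts_per_infection = 1 / p_infection

# Symptomatic ratios for ages 50-64, scaled by 10% for under-reporting as described.
hospitalization_per_infection = 0.045 * 0.1
fatality_per_infection = 0.002 * 0.1
contacts_per_hospitalization = contacts_per_infection / hospitalization_per_infection
contacts_per_fatality = contacts_per_infection / fatality_per_infection

print(f"lambda = {lam:.2f}")
print(f"1 infection per {contacts_per_infection:,.0f} contacts")
print(f"1 hospitalization per {contacts_per_hospitalization:,.0f} contacts")
print(f"1 fatality per {contacts_per_fatality:,.0f} contacts")
```

With these inputs the hospitalization and fatality figures land near the abstract's 852,000 and 19.1 million, with the small gap coming from using 1/3788 rather than the paper's 1/3836.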

- Joseph Candelora on June 16, 2020 5:58 PM at 5:58 pm said:
  
  They most definitely screwed something up in their formula, although my guess on how they messed up lambda is different than yours. Check out Table 1 — they did set a value for lambda; they set it to 100% (“Proportion of infectious individuals circulating in the community: 100%”). So that extra lambda (or lambdas) in the formula doesn’t (don’t) matter — it’s all multiplication by 1.
  
  My guess is that at some point they were going to include a factor reflecting the percentage reduction in activity of people who are sick, but then decided to just use 100% as close enough. But then again that would be “proportion of symptomatic people circulating”, not “infectious”, but who knows.
  
  They also didn’t close their first parentheses in the formula. And have a typo in one of their variables on RHS: “IReproted”. (I only noticed the latter because I copied out their formula to Notepad; that subscript in the PDF is entirely too small to read at anything less than 200%.)
  
Mendel on June 16, 2020 4:23 PM at 4:23 pm said:

The CDC has published 5 scenarios (sets of assumptions) for modeling. Their “best estimate” has 40% “transmission before symptom onset”, R0=2.5, Percentage of infections that are asymptomatic 35%, Infectiousness of asymptomatic individuals relative to symptomatic individuals 100%, from table 1, scenario 5 at https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html#table-1

With an attack rate of 10% (from the paper cited in the blog post), the average infected person needs 25 contacts to achieve R=2.5. If we isolate everyone with symptoms (which is feasible in the summer, but not in the winter), we eliminate 40% of transmissions (presymptomatic and asymptomatic transmission continues), making Ri=1.5. Then everyone needs to reduce their contacts by at least 33%, and we’d have the epidemic licked. This is independent of any “contact risk” or even the attack rate, but it depends on the abovementioned assumptions.
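
The chain of numbers in that paragraph can be sketched as follows. The inputs are the figures quoted in this comment; the variable names are illustrative, and treating isolation as removing exactly 40% of transmission is the comment's own simplification, not a derived result.

```python
r0 = 2.5                  # CDC "best estimate" scenario
attack_rate = 0.10        # chance that a single contact leads to infection
share_eliminated = 0.40   # share of transmission removed by isolating symptomatic people

# Average number of contacts an infected person needs to produce R0 infections
contacts_needed = r0 / attack_rate

# Reproduction number once symptomatic people are isolated
r_after_isolation = r0 * (1 - share_eliminated)

# Further cut in contacts needed to bring the reproduction number down to 1
required_contact_reduction = 1 - 1 / r_after_isolation

print(f"contacts needed: {contacts_needed:.0f}")
print(f"R after isolation: {r_after_isolation:.1f}")
print(f"required contact reduction: {required_contact_reduction:.0%}")
```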

Good contact tracing can find and isolate asymptomatic cases: basically, if you isolate all contacts of a known case, you’d be reducing the contact numbers for *infectious* individuals, and that could suffice to not need everyone to reduce their contacts that much: after all, what we need is for infectious people to have a low number of average contacts, and if we can find enough of them in time, we can push the average down substantially with contact tracing.

So basically, the study attempts to answer a question that is epidemiologically useless: if R isn’t below 1, the epidemic spreads, and if it spreads, it’ll get to many, many people eventually. Individual risk makes people feel safe individually, but if that leads to more contacts, the risk is rising.

- Martha (Smith) on June 16, 2020 6:18 PM at 6:18 pm said:
  
  Good summary points in last two paragraphs.
  
- confused on June 16, 2020 7:19 PM at 7:19 pm said:
  
  >> Individual risk makes people feel safe individually, but if that leads to more contacts, the risk is rising.
  
  Yeah.
  
  And I don’t think this is even the right way to look at individual risk.
  
  Given my age, gender, and overall health, my chance of dying of COVID if I catch it is probably something like 0.1% or maybe a bit less (and maybe 1%-2% risk of hospitalization).
  
  In an uncontained epidemic, my chance of being infected might be 50% (probably less, as I live by myself and am fairly introverted, but …)
  
  So my total risk of death in an uncontained epidemic might be 0.05% (1 in 2000).
  
  But what does that mean? It’s not that insignificant, given how low the baseline risk of death for someone of my age and health is. But it’s not a large enough risk to warrant significantly reducing my quality of life over, either.
  
  So the main driver for “burdensome” measures would have to be my risk of infecting others… But most people I interact with are also at low risk (my workplace is mostly young and I see older family members only a few times a year).
  
  So I don’t know how helpful that is, except maybe to allay fear a bit.
  
- Steve on June 16, 2020 7:53 PM at 7:53 pm said:
  
  +1. Agreed. There are a lot of these studies that you call “useless”. I would call them positively harmful because they imply that the rate of infection is static and give the public the wrong idea, often that the risk is low. The risk depends on what one is doing. In countries where the public conversation is about what to do to lower the risk (New Zealand, Germany, Australia, Singapore, S. Korea), the risk is trending toward zero. In the U.S., where we seem to think it’s a good idea to worry about whether we are overreacting, the risk is plateauing and starting to rise again. It is dumbfounding.
  
  - Martha (Smith) on June 16, 2020 10:17 PM at 10:17 pm said:
    
    +1
    
  - Mendel on June 17, 2020 4:37 AM at 4:37 am said:
    
    Singapore is not a good example: while they managed to have success initially, the epidemic grew to a level that exceeds the US per capita.
    
    - Carlos Ungil on June 17, 2020 5:07 AM at 5:07 am said:
      
      Singapore is an interesting case, with a very low CFR (41k cases, 26 deaths). One of the factors seems to be that it’s mostly healthy, young foreign workers who get infected (as it happens also in some Persian Gulf countries).
    - Steve on June 17, 2020 7:43 AM at 7:43 am said:
      
      Again, with meaningless averages. Singapore is a fairly densely populated city. The US is an enormous country with parts where the virus has yet to reach. Compare Singapore to NYC, and it is clear that they managed the situation better than we did.
- nutmeg on June 21, 2020 1:05 AM at 1:05 am said:
  
  Yes, your second-to-last paragraph also points up why the delay in any testing (and the absence of widespread testing) is in part responsible for why widespread lockdowns were resorted to.
  
  Without enough information (or really any information) about who was sick, the only tool available was the bluntest one in the box.
  
Joseph Candelora on June 16, 2020 7:08 PM at 7:08 pm said:

I think most here are misunderstanding the goal of this paper. It isn’t a research study, it’s simply the Covid equivalent of the Drake equation. Go pick whatever values you like best for the parameters, then boom you have an estimate of your likelihood of getting sick/hospitalized/dead, under current conditions in your location.

I think we all get that if the virus is circulating at very low prevalence in a given location, then anyone’s chances of catching it are also very low — even if that person were to go about their life with the “normal” number of contacts. The authors are simply putting some numbers to that intuition.

Did they do a good job? No. Pretty terrible, in fact. In addition to the problems in writing up their equation that John N-G discussed above, they made two significant errors in their calcs.

Summary:
1. They misused the CDC’s estimate of the Symptomatic Case Fatality Ratio, and used fatality rates that are a factor of 6.5 lower than they intended. (And even if they _had_ used the estimates they intended, they picked pretty bad ones from the CDC.)
2. They got their formula wrong with respect to duration of infectiousness of non-identified infections, which cut all their likelihoods by a little less than half.

- Joseph Candelora on June 16, 2020 7:09 PM at 7:09 pm said:
  
  Details:
  1. In Table 1, they list
  “Symptomatic Case Fatality Ratio, 0 to 49 years 0.0005
  Symptomatic Case Fatality Ratio, 50 to 64 years 0.002
  Symptomatic Case Fatality Ratio, Over 65 years 0.013”
  
  Having read through the paper a couple times, I’m confident that they’re using “Symptomatic Case Fatality Ratio” to just mean what is typically termed “case fatality ratio” — i.e. the number of deaths divided by number of positive cases. That understanding is supported in this illustration on page 8:
  “For example, for an individual in the 50-64 year old age group the ratios between infections; reported cases,
  hospitalizations, and deaths are: 10:1:0.045:0.002. (Table 1)”
  In Table 1 they show “Proportion of infections either asymptomatic or unreported: 90%”. That explains the 10:1 ratio between “infections” and “reported cases”, and then the 1:…:0.002 ratio between “reported cases” and “deaths” just restates their Symptomatic Case Fatality Ratio.
  
  On page 10-11, they say the SCFRs come from CDC’s pandemic planning guidance, Scenario 5 (best estimate). See here: (https://www.cdc.gov/coronavirus/2019-ncov/hcp/planning-scenarios.html).
  And those SCFR numbers do in fact come from that CDC report, but the CDC specifically cautions on what they mean:
  “Symptomatic Case Fatality Ratio: The number of symptomatic individuals who die of the disease among all individuals experiencing symptoms from the infection. ***This parameter is not necessarily equivalent to the number of reported deaths per reported cases***, because many cases and deaths are never confirmed to be COVID-19, and there is a lag in time between when people are infected and when they die. This parameter reflects the existing standard of care and may be affected by the introduction of new therapeutics.”
  
  So in case it’s not clear: The CDC’s Symptomatic Case Fatality Ratio is not the percentage of **identified** cases expected to die, it’s the percentage of *all symptomatic infections* expected to die. Further note that in Scenario 5 the CDC also assumed that only 35% of cases never show symptoms. So the CDC’s IFRs would simply be 0.65*the reported SCFRs.
  
  The authors glossed over this. They applied this SCFR in conjunction with their assumption that only 10% of total infections would be identified as cases, with the 90% comprising both asymptomatic _and_ untested symptomatic cases. So the authors are ultimately using an IFR that’s equal to 0.1*the reported SCFR.
  
  If they had paid attention to the CDC’s warning that the number was meant to be applied to total symptomatic infections and not just identified cases, they would’ve come up with mortality rates 6.5 times higher.
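
  To make the factor of 6.5 concrete, here is a small sketch comparing the two implied IFRs for each age band. The inputs are the Table 1 ratios and the two multipliers discussed above; the variable names are illustrative only.

  ```python
  scfr = {"0-49": 0.0005, "50-64": 0.002, "65+": 0.013}  # Table 1 / CDC Scenario 5
  symptomatic_share = 0.65   # CDC: 35% of infections never show symptoms
  identified_share = 0.10    # authors: 10% of infections are identified as cases

  for age, ratio in scfr.items():
      cdc_ifr = symptomatic_share * ratio      # SCFR applied to all symptomatic infections
      authors_ifr = identified_share * ratio   # SCFR applied to identified cases only
      print(f"{age}: CDC-implied IFR {cdc_ifr:.5f}, authors' IFR {authors_ifr:.5f}")

  # The ratio between the two readings is the same for every age band
  understatement_factor = symptomatic_share / identified_share
  print(f"understatement factor: {understatement_factor:.1f}")
  ```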
  
  As an aside, I still don’t think that’s high enough; the CDC’s “best estimate” from the end of April was itself problematic.
  
  An example showing how both fatality ratio estimates seem too low:
  The CDC’s expected symptomatic case fatality ratio for people age 65+ is 1.3%.
  To date, we’ve had 76,000 Covid deaths in the US among that age cohort. Given that known death total, a 1.3% “symptomatic case” fatality rate implies there have been 5.8m symptomatic cases in that cohort to date. The US only has 35m people who are age 65+.
  – So the CDC’s estimate implies fully 5.8/35 = 17% of the US age 65+ population has already had a symptomatic Covid infection. Really? And the CDC assumed that symptomatic only comprises 65% of total infections, so that implies 25% of US age 65+ population has already had the disease. Really??? I think that’s probably high by factor of 3 to 5 (I’m guessing actual infection prevalence is in the range of 5% to 10%). The CDC’s IFR is too low.
  – And the way the authors used the numbers of course produces an even more absurd result. For them, it would be 5.8m **identified** cases, so they would be saying that 17% of the age 65+ population has had an _identified_ case of Covid. Really???? And with their assumption that only 10% of cases are identified, they’re saying 165% of the US age 65+ population has had it. Fantastic.
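
  The two readings above can be reproduced approximately with the following sketch. The inputs are the round numbers quoted in this comment, so the outputs only roughly match the 17%, 25% and 165% figures; variable names are illustrative.

  ```python
  deaths_65_plus = 76_000       # US Covid deaths to date, age 65+
  population_65_plus = 35e6     # US population age 65+
  scfr_65_plus = 0.013          # CDC symptomatic case fatality ratio, 65+
  symptomatic_share = 0.65      # CDC: share of infections that show symptoms
  identified_share = 0.10       # authors: share of infections identified as cases

  # Cases implied by the death count and the fatality ratio (about 5.8 million)
  implied_cases = deaths_65_plus / scfr_65_plus

  # CDC reading: those implied cases are symptomatic infections
  cdc_symptomatic_prevalence = implied_cases / population_65_plus
  cdc_total_prevalence = cdc_symptomatic_prevalence / symptomatic_share

  # Authors' reading: those implied cases are identified cases, 10% of all infections
  authors_total_prevalence = implied_cases / identified_share / population_65_plus

  print(f"CDC, symptomatic: {cdc_symptomatic_prevalence:.0%}")
  print(f"CDC, all infections: {cdc_total_prevalence:.0%}")
  print(f"Authors, all infections: {authors_total_prevalence:.0%}")
  ```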
  
  So yes Andrew, their numbers make no sense. The only way to get there is if you think everybody in the US age 65+ has already been infected (not exposed, infected!) — and not just that; over half of them must have been infected twice!
  
  I think the authors’ mortality numbers are likely low by more than an order of magnitude. Probably something on the order of a factor of 15 to 30.
  
  - confused on June 16, 2020 7:26 PM at 7:26 pm said:
    
    Yes, the misuse of symptomatic vs. reported cases is a really critical error.
    
    The CDC IFR is definitely too low for cases so far. In their (partial) defense, though, this is a “planning” document, and I think there is good reason to think that the IFR for people infected after, say, the end of May will be notably lower than those infected in March. Care is improving quite a bit, for one thing.
    
    Also, I think the IFR in places like Texas probably has to be significantly lower than in New York. The deaths/confirmed cases ratio is low enough that if IFR is ~1%, an anomalously high proportion of all infections were detected with a relatively small number of tests. If that’s true, then the national IFR will drop as the New York/New Jersey infections become a smaller percentage of the total.
    
    - Joseph Candelora on June 16, 2020 8:59 PM at 8:59 pm said:
      
      I added this back on the other thread, but I think we can consolidate the conversation here.
      
      I haven’t seen anything in the data that suggests that NY death rate has changed over time or with resource utilization. This is very simplistic, but I was grabbing the numbers from Cuomo’s daily briefings until he stopped giving them:
      https://i.redd.it/hdz4cjox6d551.png
      
      It will certainly change over time as the demographics of the infected group evolve. I haven’t looked at how NY, and NYC really, compare to the rest of the country in a couple of months, but if memory serves it’s a bit older with a bit higher prevalence of comorbidity, so I guess it will come down a bit as it infects a broader population across the country. But probably not that much.
      
      I just don’t think there was any justification for those estimates at the time the CDC put them out.
    - confused on June 17, 2020 4:06 AM at 4:06 am said:
      
      NY death rate might not have changed much, I don’t know much about that.
      
      But NY vs. other large states… there does seem to be something going on.
      
      TX has ~2000 deaths. If IFR is 1%, 2000 deaths imply 200,000 infections as of a few weeks ago.
      
      But TX has something like 90,000 confirmed cases and 60,000 recovered. It’s really hard to believe, given the tests per day/population ratio, that we were catching more than say 1/5 of all cases. Maybe now with focused testing of hotspots, better reporting from prisons… maybe. But not before end of May.
      
      For a while people were saying only 1/10 or so were being detected, though that’s probably much better now.
      
      So if confirmed cases are 1/5 of all infections, TX has ~450k infections, ~300k recovered… IFR maybe 0.45%-0.67%.
      If confirmed cases are 1/10 of all infections, TX has ~900k infections, ~600k recovered… IFR maybe 0.23%-0.33%.
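
      A back-of-the-envelope sketch of those two lines, using the rounded Texas figures quoted above. The function name and structure are illustrative; the detection fractions are the guesses from this comment, not measured values.

      ```python
      deaths = 2_000       # approximate reported TX deaths
      confirmed = 90_000   # approximate confirmed cases
      recovered = 60_000   # approximate recovered cases

      def ifr_range(detected_fraction):
          """Return (low, high) IFR using confirmed and recovered counts as denominators."""
          infections_from_confirmed = confirmed / detected_fraction
          infections_from_recovered = recovered / detected_fraction
          return deaths / infections_from_confirmed, deaths / infections_from_recovered

      for fraction in (1 / 5, 1 / 10):
          low, high = ifr_range(fraction)
          print(f"detected {fraction:.0%}: IFR roughly {low:.2%} to {high:.2%}")
      ```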
      
      Seems pretty incompatible with ~1.1% for NY.
      
      And rates can vary wildly between different places based on demographics of those infected. Singapore has a *CFR* (not even IFR) less than 0.1%!
      
      Other states have similar things going on. Georgia’s fatality curve departed from its case curve (fatalities lower) long enough ago that lag in deaths alone doesn’t seem to be a sufficient explanation.
    - Carlos Ungil on June 17, 2020 4:28 AM at 4:28 am said:
      
      > TX has ~2000 deaths.
      
      Reported.
      
      > If IFR is 1%, 2000 deaths imply 200,000 infections as of a few weeks ago.
      > But TX has something like 90,000 confirmed cases
      
      Those are the numbers as of June 15. But shouldn’t you look at confirmed cases “a few weeks ago”. 2 weeks before: 27% less (67,000). 3 weeks before: 37% less (57,000). 4 weeks before: 45% less (49,000).
    - confused on June 17, 2020 11:09 AM at 11:09 am said:
      
      >>Reported.
      
      Yeah. Large-scale underreporting of deaths is another potential explanation, but I don’t think it’s particularly plausible.
      
      >>Those are the numbers as of June 15. But shouldn’t you look at confirmed cases “a few weeks ago”
      
      Sure, that’s why I also compared to recovered cases only (~60,000 reported, so 300,000 if 1/5 are detected, 600,000 if 1/10 are detected).
    - Carlos Ungil on June 17, 2020 1:47 PM at 1:47 pm said:
      
      Sorry, I missed the adjustment (1/3 looks fine).
      
      Anyway, I don’t think a factor of 2 between the NYC and Texas numbers necessarily makes them “pretty incompatible”.
      
      The IFR is quite sensitive to changes in the age distribution of infections. The median age of victims seems to be higher in NYC than in Texas (but that’s based on only one third of the cases with “completed fatality investigations”, so who knows). The imbalanced age distribution may be due to structural reasons (younger population in Texas, 8% are 70+ rather than 10%) and to different infection rates due to epidemic dynamics (less homogeneous in the early phases, distancing measures in place relatively earlier in Texas reducing exposure of elderly compared to younger population).
    - confused on June 17, 2020 7:34 PM at 7:34 pm said:
      
      >>The median age of victims seems to be higher in NYC than in Texas (but that’s based on only one third of the cases with “completed fatality investigations”, so who knows)
      
      Yeah, that 1/3 thing is why I wasn’t leaning too heavily on the age differences. TX median age is younger than NY though, so there is probably some effect. Lower density might also reduce infections among the independently-living elderly… maybe more 65+ Texans live in their own house by themselves or with a spouse, and there are more multigenerational households in New York?
    - confused on June 17, 2020 7:36 PM at 7:36 pm said:
      
      >>The IFR is quite sensitive to changes in the age distribution of infections
      
      I guess maybe I’m missing something, but I would count that as a “real” difference in IFR rather than an apparent one due to reporting issues.
      
      That’s what I meant by “incompatible” – not that either NY or TX must have fake or irretrievably flawed data, or that the virus itself must be different, but that the IFR is genuinely different in NY vs. TX.
- Steve on June 16, 2020 8:04 PM at 8:04 pm said:
  
  Joseph writes, “I think most here are misunderstanding the goal of this paper. It isn’t a research study, it’s simply the Covid equivalent of the Drake equation. . . . I think we all get that if the virus is circulating at very low prevalence in a given location, then anyone’s chances of catching it are also very low — even if that person were to go about their life with the “normal” number of contacts. The authors are simply putting some numbers to that intuition.”
  
  I would say that if that is the goal, it is a pretty unworthy goal to have. We don’t need a number put to the intuition that when the virus is not widely circulated the risk is low. All that number can do is mislead.
  
  - Zhou Fang on June 16, 2020 8:21 PM at 8:21 pm said:
    
    Right. The lead author has done an editorial on National Review pushing the end of lockdown. Like I said in my earlier response, the core of this preprint is trying to express a large risk as a dismissible small one, and the basic way they do that is by linearising an exponential growth by looking at instantaneous individual risk assuming prevalence remains constant. There are various other errors but that’s not the point.
    
    - Joseph Candelora on June 16, 2020 9:17 PM at 9:17 pm said:
      
      I’m actually a bit torn on how worthy an endeavor I think this is (while being endlessly frustrated by the parade of voices making arguments to end lockdown, most in bad faith). I don’t think it’s worth exploring my feelings here.
      
      But I do want to quote a bit from the paper that I think helps illuminate what they tried to accomplish and why:
      
      “Absent systematically collected and reported, robust case risk factors and contact tracing data,
      gauging relative risks among individual types of settings is speculative. Still, society has taken
      dramatic and unprecedented steps to control COVID-19, choosing to apply universal contact
      reductions through home confinement, limits on travel, closures of schools and businesses and
      limits on gatherings. While heightened perception of risk (fear) motivated those proscriptions on
      social contact at the outset of the epidemic, ongoing restrictions on community activity may be
      mediating ongoing risk perceptions. Today, even with transmission falling in many places and
      States re-opening, public sentiment surveys indicate a high level of apprehension about returning
      to everyday community activities. As of May 20th, 2020, over half of the US population fears
      getting a haircut, going shopping, or visiting a friend. (3)
      While data does not permit estimating setting specific risks of COVID-19 transmission,
      sufficient data do allow an estimate of the average individual-level probability of infection across
      all community settings. Here, we contribute to COVID-19 risk perception by estimating the
      individual probabilities of acquiring infection, being hospitalized, and dying from community-level contacts in large U.S. Counties. Our findings may inform both the public as well as policy
      makers less familiar with epidemiologic metrics. Equally important, we identify areas of
      available and future knowledge that could make risk assessment more precise and context
      specific.”
    - Martha (Smith) on June 16, 2020 10:30 PM at 10:30 pm said:
      
      Joseph said,
      
      “While heightened perception of risk (fear) motivated those proscriptions on
      social contact at the outset of the epidemic, ongoing restrictions on community activity may be mediating ongoing risk perceptions. Today, even with transmission falling in many places and States re-opening, public sentiment surveys indicate a high level of apprehension about returning to everyday community activities. As of May 20th, 2020, over half of the US population fears getting a haircut, going shopping, or visiting a friend. (3)”
      
      I think the high level of apprehension was warranted — as indicated by today’s reports of what has been happening in Texas since lessening of restrictions: https://www.npr.org/sections/coronavirus-live-updates/2020/06/16/878924556/as-texas-coronavirus-cases-reach-new-high-gov-abbott-plays-down-the-numbers
    - Martha (Smith) on June 16, 2020 10:54 PM at 10:54 pm said:
      
      And then there’s this from the Onion yesterday:
      https://local.theonion.com/city-enters-phase-4-of-pretending-coronavirus-over-1844037065?utm_medium=sharefromsite&utm_source=onionlocal_facebook&fbclid=IwAR25zhK6eqcSpoPDyYBOZ9nEjJr6Fgbzb8GFnDDzZ_HLlDECGPKKdIbBTu8
    - Joseph Candelora on June 16, 2020 11:18 PM at 11:18 pm said:
      
      Yes and no.
      
      I’ve been watching hospitalization numbers from the Texas Medical Center (Houston). They’re showing pretty steady exponential growth for the last 4+ weeks.
      
      Thing is, the growth rate is much much lower than in the initial wave, particularly the initial wave in NYC. Houston is seeing a doubling time of about 3 weeks, whereas in NYC it was more like 4 or 5 days.
      
      That makes for effectively a different situation. In NYC it was necessary to act well in advance, because if it takes say two weeks for a measure to take full effect then you’re talking a full order of magnitude difference in caseload between when you act and when you get the benefit. In Houston at current, you’re talking less than a doubling of caseload between action and effect.
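
      The difference is easy to quantify, assuming steady exponential growth at a fixed doubling time over the two-week lag. The function name is illustrative; the doubling times are the rough figures from this comment.

      ```python
      def growth_factor(lag_days, doubling_time_days):
          """Multiplicative growth in caseload over lag_days at a fixed doubling time."""
          return 2 ** (lag_days / doubling_time_days)

      lag = 14  # "say two weeks" for a measure to take full effect
      scenarios = [("NYC, 4-day doubling", 4),
                   ("NYC, 5-day doubling", 5),
                   ("Houston, 3-week doubling", 21)]

      for label, doubling_time in scenarios:
          print(f"{label}: caseload x{growth_factor(lag, doubling_time):.1f}")
      ```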
      
      That also means that Houston is much closer to an R0 of 1 than NYC before lockdown, meaning the additional restrictions Houston would need to get under 1 are relatively modest.
      
      It looks like they’ll need to clamp down a bit more, but they can afford to wait another few weeks. And if it’s a situation where lockdown fatigue and breakdown of trust means that people will need to actually see pictures of overwhelmed ICUs before they take it seriously and change behavior… well that’s bad, but it’s not as nightmarish as you might think based on the first go-round, given that things won’t deteriorate as rapidly.
    - Tom Passin on June 16, 2020 11:33 PM at 11:33 pm said:
      
      But there is one thing about these numbers. If you compare the Texas daily death rates with the confirmed case rates (Johns Hopkins data), you will find that the daily case numbers have been going up for about 25 days but the death rates have been going steadily down with no apparent change in behavior. (It’s easier to see this behavior if you apply some smoothing since the data vary so much. LOWESS smoothing using a 6-day window seems to work well here. Smoothing the total case curve before taking finite differences works better than the other way around, because of end effects at the last few days of data.)
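
      A minimal pure-Python sketch of that order of operations: smooth the cumulative curve first, then difference. The smoother here is a simplified LOWESS (tricube-weighted local linear fit over the 6 nearest days, no robustness iterations), which may differ from whatever tool was actually used, and the case counts are made-up numbers for illustration only.

      ```python
      def lowess_local_linear(xs, ys, window=6):
          """Simplified LOWESS: tricube-weighted local linear fit, no robustness passes."""
          n = len(xs)
          k = min(window, n)
          smoothed = []
          for x0 in xs:
              # Indices of the k points nearest to x0
              nearest = sorted(range(n), key=lambda j: abs(xs[j] - x0))[:k]
              h = max(abs(xs[j] - x0) for j in nearest) or 1.0
              sw = swx = swy = swxx = swxy = 0.0
              for j in nearest:
                  w = (1 - (abs(xs[j] - x0) / h) ** 3) ** 3  # tricube weight
                  sw += w
                  swx += w * xs[j]
                  swy += w * ys[j]
                  swxx += w * xs[j] ** 2
                  swxy += w * xs[j] * ys[j]
              denom = sw * swxx - swx ** 2
              if abs(denom) < 1e-12:
                  # Degenerate neighbourhood: fall back to the weighted mean
                  smoothed.append(swy / sw)
                  continue
              slope = (sw * swxy - swx * swy) / denom
              intercept = (swy - slope * swx) / sw
              smoothed.append(intercept + slope * x0)
          return smoothed

      days = list(range(12))
      # HYPOTHETICAL cumulative case counts, not real Texas data
      cumulative_cases = [100, 130, 150, 210, 230, 300, 340, 360, 450, 470, 560, 600]

      # Smooth the cumulative curve first, then take finite differences
      smooth_cumulative = lowess_local_linear(days, cumulative_cases)
      daily_new = [b - a for a, b in zip(smooth_cumulative, smooth_cumulative[1:])]
      print([round(v, 1) for v in daily_new])
      ```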
      
      Either the hospitals are doing a progressively better job at keeping people alive, and by fairly large factors, or Texas has been doing progressively more testing over those 25 days. I presume it’s the latter. In that case, the rise in case numbers would really be a result of more testing, not more infections.
      
      Since the death curves tend to lag behind the new case number by around 6 days in many places, any dramatic changes in the last few days wouldn’t be showing up in the fatality data yet.
      
      I see the same pattern for many (not all) U.S. states I’ve looked at.
    - Joseph Candelora on June 17, 2020 8:53 AM at 8:53 am said:
      
      The numbers I referred to were Covid hospitalizations, not Covid confirmed cases. It shouldn’t be subject to the distortion by test volume you’re referring to. The trend is there from about 5/18.
      
      Looking at Worldometer data for Texas deaths since 5/25 shows level to a slight uptick, but we’re talking very small numbers – fewer than 25 deaths per day on average.
      
      I would guess that in another month of similar conditions we would have a better sense of the death curve following the hospitalizations, but for now it’s just too low prevalence and noisy.
    - confused on June 17, 2020 1:48 PM at 1:48 pm said:
      
      Hospitalizations shouldn’t be distorted by testing alone, but there might be other factors, such as better surveillance of nursing homes (i.e. people who would have died without going through a hospital in April now get to a hospital). Texas is testing everyone in a nursing home in the state, so…
      
      Hospitalizations up / deaths down does look very odd, and the lag from hospitalization to death shouldn’t be as large as from infection to death – 5/18 should have started showing up in the death numbers at least a week ago.
      
      If you take 7-day averages of the DSHS death numbers*, it does seem to show a decline from the numbers TX was seeing end of April to mid-May. Yesterday’s was the highest this month (though still lower than several weekdays in May) but yesterday was also a “data dump” of cases… not sure if that applies to deaths.
      
      *I’ve been doing this since late April.
      
      week of 4/26 to 5/2 = 32
      week of 5/3 to 5/9 = 28.9
      week of 5/10 to 5/16 = 36.6 (includes highest daily deaths so far, 58 reported on 05/14)
      week of 5/17 to 5/23 = 28.7
      
      …
      
      past 7 days = 25.1 (highest day = 46, yesterday)
      
      Today’s and tomorrow’s numbers should show which it is (data dump, or beginning of an uptick).
      
      —
      
      That’s a lot of words and numbers to basically say “we don’t know yet”.
    - confused on June 17, 2020 1:52 PM at 1:52 pm said:
      
      Also, TX testing has increased from what it was early on, but I think the significant change in the last ~month is not number of tests but who is being tested — focused testing of likely “hot spots” and congregate settings (prisons, nursing homes, etc., I think meatpacking plants too, but not sure if it is only ones with known cases — vs. all nursing homes).
    - confused on June 19, 2020 2:39 PM at 2:39 pm said:
      
      >>Today’s and tomorrow’s numbers should show which it is (data dump, or beginning of an uptick).
      
      Well, I was overconfident – they really don’t. Wednesday was comparable to last Wednesday (difference of +1). Thursday was somewhat higher, but lower than Tuesday, and Wed/Thu are usually the highest reporting days for TX.
      
      Also, Houston Health Department says the daily deaths being reported from Houston are not really “new” (largely May, some even in April). If that is true for other cities, the daily deaths reported may not represent the curve in occurrence of deaths.
    - Joseph Candelora on June 20, 2020 1:01 AM at 1:01 am said:
      
      The TMC hospitalization rate of growth has ticked up and sustained the higher rates over the last few days, showing a trend that I’d now say is looking worrisome and potentially urgent.
      
      If the hospitalization/death relationship really holds, and if Houston’s TMC is broadly indicative of Texas, then we should see it reasonably clearly in deaths within a week — averaging 45-50+ per day, and sustaining all-time highs in 7 day average deaths.
      
      A few days ago I would’ve said it’s too early to reinstitute lockdown in Houston, and give it two weeks to wait and see. I think I have to revise that down to one week now.
    - confused on June 25, 2020 5:44 PM at 5:44 pm said:
      
      A week after my last comment, hospitalization vs. deaths still looks weird for TX.
      
      Today’s deaths are 47, which is relatively high (43 last week). But the 7-day average of daily deaths is 27.3, which is a bit higher than it was when I posted June 17th, but still well below May’s number.
      
      This may well be just the beginning of an uptick in deaths (hospitalization is certainly rising rapidly, especially in Houston).
      
      But that doesn’t totally explain away the weirdness. 2 weeks ago TX current hospitalizations were already over 2000 – higher than they were 2 weeks before early/mid-May when the highest deaths so far were reported. Why would ~1500-1800 hospitalizations translate into ~36 deaths/day but 2000+ hospitalizations translate into ~27 deaths/day?
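
      Put as a crude ratio of daily deaths to the hospital census two weeks earlier, using the rounded figures above. This is only a sketch: the ratio ignores reporting lags and changes in who gets hospitalized, and the variable names are illustrative.

      ```python
      # Early/mid-May peak: ~36 deaths/day from ~1500-1800 hospitalized two weeks earlier
      may_deaths_per_day = 36
      may_hospitalized_low, may_hospitalized_high = 1500, 1800

      # Late June: ~27 deaths/day from 2000+ hospitalized two weeks earlier
      june_deaths_per_day = 27
      june_hospitalized = 2000

      may_ratio_low = may_deaths_per_day / may_hospitalized_high
      may_ratio_high = may_deaths_per_day / may_hospitalized_low
      june_ratio = june_deaths_per_day / june_hospitalized

      print(f"May: {may_ratio_low:.2%} to {may_ratio_high:.2%} daily deaths per hospitalized patient")
      print(f"June: {june_ratio:.2%} (an upper bound, since hospitalizations were 2000+)")
      ```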
      
      Is the threshold for being hospitalized lower now? Is it a matter of younger demographics getting infected, so fewer deaths as a percentage of hospitalizations?
      
      Or is it all just a matter of reporting lags (which do seem bad)? So the dates deaths are reported do not form a curve anything like the actual dates of death?
    - confused on June 17, 2020 4:11 AM at 4:11 am said:
      
      I live in Texas, and I am not sure what is really going on here.
      
      Deaths do not seem to be following the case curve. ICU hospitalizations, at least locally (I haven’t seen ICU vs. all hospitalizations for the whole state) seem pretty flat while total hospitalizations are rising.
      
      Does that mean fewer hospitalizations are critical now? Why? Maybe something to do with keeping COVID+ people out of nursing homes (wild baseless speculation)?
      
      New daily cases are inflated by lots of prison cases being added in on “day reported” not “day actually tested” (creating a big one-day addition), but cases are rising even without that effect.
      
      There is a program going on to test all kinds of hotspots – every nursing home in the state, prisons, meatpacking plants, etc. So it’s not clear how much of what we’re seeing is “general community” vs “hotspot driven”.
    - Daniel Lakeland on June 17, 2020 11:56 AM at 11:56 am said:
      
      There’s also a plausible effect of the virus mutating to less severe form. Lockdown is a fairly strong selective pressure for more transmissible versions, and for less severe versions (so that more asymptomatics are out there doing the spreading).
      
      In the end, we may find that the main positive effect of lockdown was really to modify the virus to be less virulent and more contagious.
      
      Right now we don’t know enough about the genetics and how genetic changes affect virulence.
    - Zhou Fang on June 17, 2020 1:05 PM at 1:05 pm said:
      
      Could there be a seasonal/climatological factor where warmer weather makes respiratory disease less lethal? Flu deaths I understand peak in the winter months, so maybe a similar dynamic is at play with covid19. Not so much the disease itself going away in summer, but symptoms being more endurable.
    - Daniel Lakeland on June 17, 2020 1:12 PM at 1:12 pm said:
      
      Zhou, it’s definitely possible. With Flu there’s a complicated dynamic due to the fact that birds migrate and they are a primary source of flu virus. They can then spread to pigs, which is another source. And pigs can get infected by both avian and human flus so there’s genetic recombination that occurs. This bird migratory pattern isn’t an issue with COVID. so… it’s complicated.
    - confused on June 17, 2020 1:17 PM at 1:17 pm said:
      
      It’s a possibility, but I think coronaviruses don’t mutate anything like as quickly/easily as influenza viruses, so wouldn’t it be a bit quick for that to happen?
    - Zhou Fang on June 17, 2020 1:21 PM at 1:21 pm said:
      
      There’s a little graph of seasonality of various causes of deaths here. So perhaps this factor can explain some of the variation. Maybe someone can put out a spatial graph as well.
    - confused on June 17, 2020 1:29 PM at 1:29 pm said:
      
      One idea is that vitamin D, levels of which are higher in summer (due to sunlight), strengthens the immune system against respiratory infections.
      
      IIRC, there are at least four hypotheses for seasonality of colds/flu/etc., but no clear picture on which is correct (or, if all are, which is most important):
      – increased indoor activity in winter;
      – school years;
      – summer conditions reduce virus survival outside the body;
      – vitamin D
      
      I’m far from an expert, though (I have only an undergraduate degree in biology and no specific medical / epidemiological training).
    - Zhou Fang on June 17, 2020 1:37 PM at 1:37 pm said:
      
      The vitamin D hypothesis feels unlikely. I would have suggested some kind of temperature effect – as you can see, extremes of temperatures are significant for a broad range of ailments. It would not surprise me at all if many people are just generally in weaker health during colder times, and this makes them vulnerable to infections.
      
      https://journals.plos.org/plosmedicine/article/figure/image?id=10.1371/journal.pmed.1002619.g004&size=inline
    - confused on June 17, 2020 1:53 PM at 1:53 pm said:
      
      Maybe, but then why does Texas still have a flu season? In a place where winters are often very mild and air conditioning is ubiquitous, I’m not sure the temperature that most people are actually exposed to (except outdoor workers) is that different between winter and summer.
    - Zhou Fang on June 17, 2020 3:30 PM at 3:30 pm said:
      
      I can’t find any data on the seasonality of Texan mortality. There could still be some seasonal effects from other aspects.
Joseph Candelora on June 16, 2020 7:09 PM at 7:09 pm said:

2. They claim in their paper that “Conservatively, we treat asymptomatic and unreported fraction as infectious for the same duration as those with symptoms.”

That is shown to be untrue by examination of their formula on page 7. See John N-G’s comment above for a good description of how it works. (I’ve also validated my understanding of the formula by replicating the numbers they provided.)

Contrary to their statement, they have not assumed that asymptomatic/unreported are infectious for the 8 days they assume symptomatic people to be infectious. Rather, they have used the same weighted-average duration of time spent both infectious and in the community that they did for symptomatic people. For people who develop symptoms, they assume the final 60% of the infectious period is spent symptomatic, and that on average only 25% of symptomatic people go out in the community during that time. In other words, they assume that a person who develops symptoms is only infectious and in the community for 8*(40% + 60%*25%) = 4.4 days.

It’s the downward-adjusted 4.4 days that they’re using for the 90% of the infected population they estimate to not be identified, not the full 8 days.

Had they used the “conservative” assumption they described, all of their likelihoods would be about 1.75 times what they published. The number of person-contacts required for an outcome would’ve been cut by about 45%.
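
To make the arithmetic above concrete, here is a minimal sketch in Python. The names are illustrative and not taken from the paper, and the overall multiplier assumes the 10% reported / 90% unreported split mentioned above; it is a reconstruction of the "about 1.75" figure, not the paper's own formula.

```python
# Sketch of the duration arithmetic; all names are illustrative, not the paper's.
INFECTIOUS_DAYS = 8.0          # infectious period assumed for a symptomatic case
PRE_SYMPTOMATIC_SHARE = 0.40   # share of the period before symptoms appear
SYMPTOMATIC_SHARE = 0.60       # share of the period spent symptomatic
OUT_WHILE_SYMPTOMATIC = 0.25   # share of symptomatic people still in the community
UNREPORTED_SHARE = 0.90        # assumed share of infections never identified


def effective_days() -> float:
    """Days spent both infectious and in the community for a symptomatic case."""
    return INFECTIOUS_DAYS * (
        PRE_SYMPTOMATIC_SHARE + SYMPTOMATIC_SHARE * OUT_WHILE_SYMPTOMATIC
    )


def conservative_ratio() -> float:
    """Risk multiplier if the unreported share were given the full 8 days."""
    adjusted = effective_days()
    reported_share = 1.0 - UNREPORTED_SHARE
    conservative = reported_share * adjusted + UNREPORTED_SHARE * INFECTIOUS_DAYS
    return conservative / adjusted


if __name__ == "__main__":
    ratio = conservative_ratio()
    print(f"effective days in community: {effective_days():.1f}")
    print(f"risk multiplier: {ratio:.2f}")
    print(f"cut in contacts per outcome: {1 - 1 / ratio:.0%}")
```

Under these assumptions the multiplier comes out a little under 1.75 and the cut in contacts a little over 40%, in line with the rounded figures above.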

jim on June 16, 2020 8:08 PM at 8:08 pm said:

Here I think is the problem:

“Among the 100 most populous US Counties, for the week ending May 30, 2020”

Right? It’s for a week only, in the 100 most populous counties. Andrew uses 11,439 fatalities, but that’s for the entire epidemic, not for the week ending May 30. Here the CDC gives 257 fatalities for the stated week in the relevant age group. So rather than 5000 (actually 5447) contacts for the epidemic, the correct value is 122.4 contacts for the week – remember it’s just a week – in question, which is 17.5 contacts per day.

As I argued above, each contact is unique even if it’s with the same person. So 17.5 contacts per day isn’t at all ridiculous in one of the hundred most populated counties in the US.
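
The numbers above can be checked with a short Python sketch. It assumes the 42 million population figure from the post and the rounded 20 million contacts per fatality; the function name is made up for illustration.

```python
# Sketch of the implied-contacts arithmetic; names are illustrative.
POPULATION = 42_000_000              # US population aged 55-64, as used in the post
CONTACTS_PER_FATALITY = 20_000_000   # rounded from the paper's 19.1 million
DAYS_IN_WEEK = 7


def implied_contacts(deaths: int) -> float:
    """Contacts per person needed to produce `deaths` at the paper's rate."""
    return deaths / POPULATION * CONTACTS_PER_FATALITY


if __name__ == "__main__":
    whole_epidemic = implied_contacts(11_439)   # cumulative deaths used in the post
    one_week = implied_contacts(257)            # deaths in the week ending May 30
    print(f"whole epidemic: {whole_epidemic:.0f} contacts per person")
    print(f"one week: {one_week:.1f} contacts per person")
    print(f"per day in that week: {one_week / DAYS_IN_WEEK:.1f}")
```

Swapping in 19.1 million instead of the rounded 20 million lowers each figure by about 4.5%.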

- Andrew on June 16, 2020 8:12 PM at 8:12 pm said:
  
  Jim:
  
  If you count my family, I have hundreds of contacts per day without ever leaving the apartment.
  
  - jim on June 17, 2020 1:12 AM at 1:12 am said:
    
    Well I guess you have to take seriously the idea of figuring out what counts as an independent contact in the real world if you want to count the number of contacts and do a model that has a hope of being realistic.
    
  - jim on June 17, 2020 1:14 AM at 1:14 am said:
    
    But that does explain why your numbers are so high.
    
Joseph Candelora on June 16, 2020 9:33 PM at 9:33 pm said:

Andrew, the other thing that I believe is confounding your analysis is that this paper provides purely a point-in-time estimate. They’ve estimated a bunch of parameters that they believe are constant and characterize the epidemic (e.g. 10% of contacts with infected people result in infections, 10% of infections are detected by testing, etc.) as well as one additional parameter that applies to a locality at the current point in time: the percent of the local population that was newly infected with Covid that day.

US deaths peaked in mid/late-April at about 3 times the current rate. Separately I’m certain that when it was raging through NYC in early April, the top 5 counties accounted for well north of 80% of new infections/deaths. So using the current (or May 30, really) numbers from this paper to look retrospectively at numbers accumulated over the entire course of the pandemic just doesn’t work, and looking at the median county given the way cases were distributed doesn’t work either.

- Martha (Smith) on June 16, 2020 10:43 PM at 10:43 pm said:
  
  May 30 was more than two weeks ago. What’s happening now (and in the next few days) is probably more relevant. In today’s news: https://www.npr.org/sections/coronavirus-live-updates/2020/06/16/878924556/as-texas-coronavirus-cases-reach-new-high-gov-abbott-plays-down-the-numbers . It remains to be seen what happens in the next couple of weeks.
  
  - jim on June 17, 2020 1:18 AM at 1:18 am said:
    
    Yes, it’s for one of the lowest weeks in a long time. But doubling or quadrupling the death rate doesn’t seriously change the odds of contracting an infection on any single contact. Also, a large portion of new infections are probably coming from crowds and protesting, so people who maintain social distancing probably have much lower odds.
    
Terry on June 16, 2020 10:01 PM at 10:01 pm said:

Why is this article deemed worthy of all this effort? Because there is a Stanford connection? Because the reporter is implicitly vouching for its worthiness? Why does the reporter think this is worthy of attention?

It sounds badly written. Why should we put in all this effort if the authors can’t be bothered to make it readable?

Clifton on June 17, 2020 12:57 AM at 12:57 am said:

I’m sorry if someone else has already posted this — I haven’t read all the comments.

But 1/4000 chance of infection per contact and 1/20,000,000 chance of death per contact implies a 1/5000 chance of death once infected (in the 50-64 age group). I don’t think this is remotely true. I haven’t checked, but I’d guess 1/100, which is a factor of 50 higher. That would mean to get the deaths that have actually happened, you’d have 1 contact per day, not 50.

So it could be that the whole paper is correct, except they used the wrong death rate for the age group. Certainly 1/3836 sounds very plausible.
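
As a quick check on the ratio argument, here is a small Python sketch using exact fractions. The 1/100 fatality rate is the unchecked guess from above, not a sourced figure, and the names are illustrative.

```python
# Sketch of the implied fatality-per-infection arithmetic; names are illustrative.
from fractions import Fraction

P_INFECTION_PER_CONTACT = Fraction(1, 4000)    # rounded from the paper's 1 in 3836
P_DEATH_PER_CONTACT = Fraction(1, 20_000_000)  # rounded from 1 in 19.1 million
GUESSED_FATALITY_RATE = Fraction(1, 100)       # unchecked guess for ages 50-64


def implied_fatality_rate() -> Fraction:
    """Chance of death once infected that the two per-contact figures imply."""
    return P_DEATH_PER_CONTACT / P_INFECTION_PER_CONTACT


def understatement_factor() -> Fraction:
    """How many times higher the guessed rate is than the implied one."""
    return GUESSED_FATALITY_RATE / implied_fatality_rate()


if __name__ == "__main__":
    print(f"implied fatality rate once infected: {implied_fatality_rate()}")
    factor = understatement_factor()
    print(f"guess is higher by a factor of {factor}")
    print(f"50 contacts per day shrinks to {50 / factor} per day")
```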

- confused on June 17, 2020 4:14 AM at 4:14 am said:
  
  Yeah, someone misread the CDC numbers by confusing “reported case” with “symptomatic case”, which effectively divided the risks of hospitalization and death by a factor of 6.5.
  
Paul on June 17, 2020 9:04 AM at 9:04 am said:

Having read the paper, my opinion is that their analysis of the risk of infection is generally correct but conceptually misses the point entirely.

In response to some of the above comments:
1. The dimensions of the equation work out, if you understand I_reported to be the rate of reported cases *per day*. That’s not at all clear from the equation or Table 1, but it matches the number they’re using (about 6/100,000) and in the text they repeatedly refer to “daily case incidence”.

2. The definition of “contact” that they’re using seems to be: any interaction with an individual that, if the individual is infectious, would increase your chance of getting COVID by 10%. So if walking by someone on the sidewalk has a 1% chance of transferring COVID, that would count as 0.1 of a contact.

For a quick back-of-the-envelope calculation (I mean, even more than this paper already is), if 6 out of 100,000 people are reported as infected each day, only 10% of all infections are reported, and each case is infectious for 8 days, then on any given day, 480 out of 100,000 people are infectious. That means you’d have to encounter 100,000/480 = 208 people before meeting someone with COVID. If each encounter with an infectious person has a 10% chance of transferring COVID, then you’d need to encounter 2080 people before being expected to acquire COVID. The numbers in the paper are higher, because they assume that symptomatic individuals will generally sequester themselves, meaning any random person you meet is less likely to have COVID than you would expect from the daily incidence rate.
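
The same back-of-the-envelope steps can be written as a small Python sketch. The inputs are the rounded values quoted above, the names are illustrative, and it deliberately ignores the paper's adjustment for symptomatic people staying home.

```python
# Sketch of the back-of-the-envelope infection estimate; names are illustrative.
REPORTED_PER_100K_PER_DAY = 6    # daily reported cases per 100,000 people
REPORTED_FRACTION = 0.10         # assumed share of infections that get reported
INFECTIOUS_DAYS = 8              # assumed days each case stays infectious
TRANSMISSION_PER_CONTACT = 0.10  # assumed chance a contact transmits COVID


def infectious_per_100k() -> float:
    """People per 100,000 who are infectious on any given day."""
    return REPORTED_PER_100K_PER_DAY / REPORTED_FRACTION * INFECTIOUS_DAYS


def encounters_per_infectious_person() -> float:
    """Random encounters expected before meeting one infectious person."""
    return 100_000 / infectious_per_100k()


def encounters_per_infection() -> float:
    """Random encounters expected before acquiring COVID."""
    return encounters_per_infectious_person() / TRANSMISSION_PER_CONTACT


if __name__ == "__main__":
    print(f"infectious per 100,000: {infectious_per_100k():.0f}")
    print(f"encounters per infectious person: {encounters_per_infectious_person():.0f}")
    print(f"encounters per infection: {encounters_per_infection():.0f}")
```

Without rounding the intermediate 208, the last figure is about 2083 rather than 2080.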

- Paul on June 17, 2020 10:05 AM at 10:05 am said:
  
  For the fatality rate, I agree with the other commenters that it looks like they’re using the estimated fatality rate per symptomatic case, but treating it as the fatality rate per reported case. If we use the estimate that 65% of all cases are symptomatic, rather than 10% of all cases are reported, that changes the estimated contacts per fatality to 3836 / 0.045 / 0.65 = 130,000 contacts.
  
  The bigger issue to me is that I think the shutdowns and stay at home orders aren’t about keeping me safe, they’re about keeping the community safe. And that means keeping the spread of COVID as low as possible. If the daily incidence rate gets high enough, then the number of contacts per fatality is going to plummet.
  
  So let’s invert the calculation and ask: If I have COVID, how many people do I need to contact before I am likely to spread it to more than 1 person? We’ll match the paper, and assume every contact has a 10% chance of spreading COVID. We’ll also assume that you can’t get COVID twice. Based on the New York Times data, in the 100 most populous counties, an average of about 450 cases of COVID have been reported per 100,000 people. Using the 10:1 ratio of total:reported cases, that’s 4,500 out of 100,000 people who have ever had COVID, meaning 95,500 out of 100,000 are susceptible. So to infect 1 person, it takes on average 1 / (0.1 * 0.955) = 10.5 contacts. And based on equation 1 in the paper, 40% of infectious people don’t know that they’re infectious, so it’s not enough to quarantine people once they show symptoms.
  
  - Zhou Fang on June 17, 2020 1:08 PM at 1:08 pm said:
    
    This is exactly right.
    
  - Joseph Candelora on June 17, 2020 2:22 PM at 2:22 pm said:
    
    Well put.
    
    But all that said, the current incidence rate is still important: if it’s very uncommon in the population, then it’s very unlikely I’m infected, thus it’s very unlikely I’ll give it to someone else.
    
    There does come a point where continued lockdown is protective against an unsubstantial risk, and not worth the cost. Of course that time won’t last forever once you release the lockdown, but might as well take the reprieve, unless you think you can drive to true viral extinction (I don’t think you can).
    
