New analysis of excess coronavirus mortality; also a question about poststratification

Uros Seljak writes:

You may be interested in our Gaussian Process counterfactual analysis of Italy mortality data that we just posted.

Our results are in a strong disagreement with the Stanford seropositive paper that appeared on Friday.

Their work was all over the news, but is completely misleading and needs to be countered: they claim infection fatality ratio as low as 0.1%, we claim it cannot be less than 0.5% in Santa Clara or NYC. This is not just a bad flu season, as people from NYC probably can confirm.

From the paper by Modi, Boehm, Ferraro, Stein, and Seljak:

We find an excess mortality that is correlated in time with the COVID-19 reported death rate time series. Our analysis shows good agreement with reported COVID-19 mortality for age<70 years, but an excess in total mortality increasing with age above 70 years, suggesting there is a large population of predominantly old people missing from the official fatality statistics. We estimate that the number of COVID-19 deaths in Italy is 52,000 ± 2000 as of April 18 2020, more than a factor of 2 higher than the official number. The Population Fatality Rate (PFR) has reached 0.22% in the most affected region of Lombardia and 0.57% in the most affected province of Bergamo, which constitutes a lower bound to the Infection Fatality Rate (IFR). We estimate PFR as a function of age, finding a steep age dependence: in Lombardia (Bergamo province) 0.6% (1.7%) of the total population in age group 70-79 died, 1.6% (4.6%) in age group 80-89, and 3.41% (10.2%) in the age group above 90. . . .

I have not read their article so you’ll have to make your own judgment on this—which I guess you should be doing, even if I had read the article!

In any case, this is all a tough subject to study, as we’re taking the ratios of numbers where both the numerator and denominator are uncertain.
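
To make the counterfactual idea concrete, here is a minimal sketch of the general approach (synthetic weekly death counts and made-up kernel settings, not the authors' data or pipeline): fit a Gaussian process to pre-2020 mortality, predict what early 2020 would have looked like without the epidemic, and call the difference excess mortality.

```python
# Minimal sketch of a Gaussian-process counterfactual for excess mortality.
# Synthetic data and made-up kernel settings: illustrative only, not the
# Modi et al. pipeline.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Synthetic weekly deaths for 2015-2019: a seasonal baseline (winter peak) plus noise.
weeks = np.arange(52 * 5)
baseline = 1000 + 150 * np.cos(2 * np.pi * weeks / 52)
deaths_past = rng.poisson(baseline)

# Fit a GP to the historical weeks, with week-of-year as the input feature.
X_past = (weeks % 52).reshape(-1, 1)
kernel = 1.0 * RBF(length_scale=8.0) + WhiteKernel(noise_level=50.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_past, deaths_past)

# Counterfactual prediction for the first 16 weeks of 2020...
X_2020 = np.arange(16).reshape(-1, 1)
expected, sd = gp.predict(X_2020, return_std=True)

# ...compared with "observed" 2020 deaths that include an epidemic bump from week 9 on.
observed_2020 = rng.poisson(1000 + 150 * np.cos(2 * np.pi * np.arange(16) / 52)
                            + np.where(np.arange(16) >= 9, 600, 0))

excess = observed_2020 - expected
print("cumulative excess deaths:", int(excess[excess > 0].sum()))
```

The real analysis also has to handle reporting delays, demographic trends, and the weak 2019-2020 flu season discussed in the comments below; the sketch only shows the bookkeeping.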

Also, Seljak had a question about poststratification, arising from our discussion of that Stanford paper the other day. Seljak writes:

You say it is standard procedure, but I have my misgivings (but this is not my field). I understand that it makes sense when the data tell you that you have to do it, ie you have clear evidence that the infection ratio depends on these covariates. But the null hypothesis in epidemiology models is that everyone is infected at the same rate regardless of sex, race, age etc. Their data (50 positives) is too small to do regression analysis and get any signal to violate the null hypothesis. If so, and given the null hypothesis, then there is no point to post-stratify, one is just increasing the noise from importance weighting and not gaining anything. If so the crude rate of 1.5% would be the infection rate (ignoring the other issues you talk about). So by post-stratifying they are rejecting this null hypothesis, ie their underlying assumption of post-stratification is that the infection rate is not the same across these covariates. Say I know my sample is unbalanced against 10 things (age, zip code, sex, race, income, urban-suburban, it never ends). So how do I choose against which to post-stratify, when does one stop? Eg they did not do it for age, despite the large imbalance, how is this justified? Either you do it for all, and pay the price in large variance, or not at all?

What I am getting at is that post-stratification offers a lot of opportunity for p-hacking (could be subconscious of course): you stop when you like the answer, until then you try this and that. It is easy to give a posteriori story justifying it (say the answer was weird for age post-stratification, so we did not do it), but it is still p-hacking. Why is this not always a concern in these procedures, specially if the null hypothesis is saying there is no need to do anything? And in their case, this takes them from 1.5% to 2.8%.

My reply:

As the saying goes, not making a decision is itself a decision. It’s fine to take survey data and not poststratify; then instead of doing inference about the general population, you’re doing inference about people who are like those in your sample. For many purposes that can be fine. If you’re doing this, you’ll want to know what population you’re generalizing to. If you do want to make inferences to the general population, then I think there’s no choice but to poststratify (or to do some other statistical adjustment that is the equivalent of poststratification).

Regarding the point that 50 positive cases does not give you enough information to do regression analysis: it’s actually 3300 cases, not 50 cases, as you poststratify on all the observations, not just the ones who test positive. And if you look at the summary data from that Santa Clara study, you’ll see some clear differences between sample and population. But, yes, even 3300 is not such a large number, so you’ll want to do regularization. That’s why we do multilevel regression and poststratification, not just regression and poststratification.

Regarding concerns about p-hacking: Yes, that’s an issue in the Santa Clara study. But I’d also be concerned if they’d done no adjustment at all. In general, I think the solution to this sort of p-hacking concern is to include more variables in the model and then to regularize. In this example, I’d say the same thing: poststratify not just on sex, ethnicity, and zip code but also on age, and use a multilevel model with partial pooling. I can’t quite tell how they did the poststratification in that study, but from the description in the paper I’m concerned that what they did was too noisy. Again, though, the problem is not with the poststratification but with the lack of regularization in the small-area estimates that go into the poststrat.
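
For readers who want to see the mechanics, here is a toy version of the poststratification step with made-up cells (not the actual Santa Clara data). The poststratified estimate is just a population-weighted average of cell-level rates, and the noise Seljak worries about lives in the small-cell rates p/n, which is what the multilevel model is there to stabilize.

```python
# Toy poststratification with hypothetical cells, not the actual Santa Clara data.
# Each cell: (label, number sampled, number testing positive, population share).
cells = [
    ("women, zip A",  900, 10, 0.20),
    ("women, zip B",  600,  8, 0.30),
    ("men, zip A",   1200, 14, 0.25),
    ("men, zip B",    300, 14, 0.25),   # undersampled cell with a higher raw rate
]

n_total = sum(n for _, n, _, _ in cells)
pos_total = sum(p for _, _, p, _ in cells)

# Raw (unadjusted) estimate: positives over tests.
raw_rate = pos_total / n_total

# Poststratified estimate: weight each cell's rate by its population share.
post_rate = sum(share * (p / n) for _, n, p, share in cells)

print(f"raw rate:            {raw_rate:.2%}")
print(f"poststratified rate: {post_rate:.2%}")
# With small cells the per-cell rates p/n are noisy, which is why the "M" in MRP
# (multilevel regression, i.e. partial pooling) matters before poststratifying.
```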

81 thoughts on “New analysis of excess coronavirus mortality; also a question about poststratification”

  1. Their work was all over the news, but is completely misleading and needs to be countered: they claim infection fatality ratio as low as 0.1%, we claim it cannot be less than 0.5% in Santa Clara or NYC.

    Why do people keep talking about IFR like it is a property of the virus that must be the same everywhere?

    • From the paper:

      Our analysis shows that the IR vary a lot within a single country like Italy (Table 1). High estimates of CFR in Italy, for example 20% in Lombardia, can be understood by the high IR. In Lombardia, the total number of administered tests as of April 18 2020 was ≈ 2.5% of the population, and 0.6% of the population tested positive, compared to our estimated 23% infection rate. Therefore, we estimate that the infection rate is 35 times higher than number of test positives, and since these are the most severe cases that likely required hospitalization their fatality rate was significantly higher than for the overall infected population.

    • Why would we think the virus would be 5 times deadlier in one locale relative to another unless it’s a different virus (or a mutated form of it)? It seems that assuming the IFR to be same (conditional on (at least) age) is a safer assumption, but it is that… an assumption.

      • “Why would we think the virus would be 5 times deadlier in one locale relative to another unless it’s a different virus (or a mutated form of it)? It seems that assuming the IFR to be same (conditional on (at least) age) is a safer assumption, but it is that… an assumption.”

        We keep coming round and round to the demographics. Given that the severity of an infection APPEARS to vary by more than an order of magnitude between young (20’s, 30’s, 40’s) and old (70’s, 80’s and 90’s) people, an all-population IFR in an older-skewing region will indeed be as much as 5 times higher than the all-population IFR in a very young region.

        There’s also no reason to disbelieve that other factors (viral load, repeated exposures, environmental risk factors) influence the probability of dying conditional on “infected” status. Heck, we’re not even sure what the best way to code “infected” versus “not infected” is for this particular disease.

        • Also,

          SES and race/ethnicity factors go a long way towards explaining disparities in health outcomes. Maybe not as influential in this situation as age, but they are a big reason why generalizing from Santa Clara *infection* rates to estimate national *fatality* rates should, it seems to me, be a scientific no-no.

          I would wonder whether, if viral load (of exposure, in contrast to the term used to describe amount of shedding) helps explain outcomes, then population density might affect *fatality* as well as *infection* rates.

        • That is certainly plausible. One day we’ll probably figure out if it’s actually true. It sure would help explain the over-the-top effect of this thing in NYC versus almost everywhere else in the USA.

  2. At the risk of sounding naive and clueless, I do speculate that the Santa Clara study’s broader claims are plausible despite the controversy generated over calculations. If you ask John Ioannidis, I would bet he would not suggest that the IFR is the same everywhere. Recall that around the time that John came out with that 1st article referencing the Princess cruise ship numbers, Dr. Fauci and others estimated anywhere from 100,000-240,000 US deaths. I further speculate therefore that nearly all or all modeling is subject to some error based on Andrew’s own blog comments over the last couple of years.

    What I might be saying, and I may be all nonsense, is that the Santa Clara study is not an example of the worst statistical calculation, especially since the CDC and NIAID have a better handle on the fatality rate now. The broader claims from the Santa Clara study are in line with the CDC analysis in most respects.

    The more concerning issue is the reliability of the COVID19 tests for testing the broader population.

    • > If you ask John Ioannidis
      Sorry, the whole point of science is to avoid having to take anyone’s word for it.

      When one does not publish openly, giving all pertinent details (or offering to do so ASAP for a general or privacy-restricted group), the work is not science and, perhaps especially in a pandemic, should be treated at most as supportive (a polite word for hearsay).

      • Keith, you must know that I wouldn’t ever support withholding data. That was not the essence of my comment. Yes, experts, audiences should be transparent.

        You picked up on one point I made. Again, if you ask John whether he meant to suggest that the IFR is the same everywhere, I doubt that he would make such an absolute assertion.

        The substance of the Santa Clara claims doesn’t seem to be far-fetched, given my reading of the blog comments so far.

        The problem is that neither John Ioannidis nor his co-authors are here defending or clarifying their study. So that’s too bad.

    • I agree. The study’s estimate of Santa Clara residents who contracted the virus (48k – 81k by April) is more in line with models that have predicted the spread of COVID-19. The first related death in Santa Clara occurred on February 6th. Even if we assumed that the individual died relatively soon after contracting the virus (5 days vs average of 20-25), reaching 81k by April 1st is more in line with a 2.2 to 2.7 R0 until California’s shelter in place orders were issued.
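
      A back-of-envelope version of that check, under assumed numbers (first infection around Feb 1, a 5-day serial interval, unchecked growth until April 1), runs like this:

```python
# Rough generation-counting check of the "48k-81k infected by April 1" claim.
# Assumptions (mine, not the study's): first infection ~Feb 1, serial interval
# ~5 days, unchecked exponential growth until the shelter-in-place order.
days = 60                  # Feb 1 to Apr 1
serial_interval = 5.0      # days per generation (assumed)
generations = days / serial_interval

for target in (48_000, 81_000):
    # Solve R0**generations = target  =>  R0 = target**(1/generations)
    r0 = target ** (1 / generations)
    print(f"{target:>6} infections by Apr 1 implies R0 ~ {r0:.2f}")
# With these assumptions the implied R0 comes out around 2.4-2.6, i.e. roughly
# consistent with the 2.2 to 2.7 range mentioned above.
```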

      • The problem is that 81k infections by April 1st doesn’t seem compatible with 88 deaths by April 21st.

        They have published today the preliminary results of antibody tests in Geneva over the last couple of weeks (760 participants in total, a representative sample from a cohort of people that participate in annual health surveys). They estimate the population prevalence to be 3.5% (1.6%-5.4%) in the first week and 5.5% (3.3%-7.7%) in the second.

        The more recent estimate, 5.5%, represents 27’000 people. That’s higher than the official count (4710, 0.9% of the population), but not surprisingly so. The number of deaths reported is 205 (0.04% of the population). The IFR, whatever the relevance you may want to attach to that calculation, would be 0.8%.

        https://www.hug-ge.ch/medias/communique-presse/seroprevalence-covid-19-premiere-estimation

        • Carlos –

          > 760 participants in total, a representative sample from a cohort of people that participate in annual health surveys.

          Does that mean representative for Geneva? Would SES be one of the controls? Does SES have much explanatory power for health outcomes in Switzerland?

          If it is for Geneva, how representative would Geneva be of the whole country? Would Geneva be likely to have a higher prevalence of people traveling from Lombardy?

        • Geneva is not representative of the rest of Switzerland. It’s among the cantons where the prevalence is higher. The situation seems relatively under control: the peak in hospital utilization was three weeks ago, and the number of deaths has increased by less than 25% in the last 10 days.

          The people tested are supposed to be representative of Geneva. I assumed it was from a cohort study; apparently that’s not the case: they conduct public health and epidemiological surveys and analyses on around 1’000 people recruited each year, and for this study they contacted people who have participated in that program before. I don’t know more than that (but surely it’s better than using Facebook ads).

          Geneva is not particularly close to Lombardy, but has (or at least used to have) people travelling from everywhere (lots of international organisations and headquarters of multinationals). Anyway, the point is that the prevalence they have found is *low* relative to the expectations of many people and consistent with conventional knowledge obtained from other places (not of the just-a-flu kind).

        • Carlos –

          Thanks for that info.

          By the by, as an American who actually lived in Bergamo for a year, I’d say it’s pretty close to Geneva, but I guess that’s all relative. I was curious about the level of travel back and forth because I know that it used to be that a lot of Italians would travel to Switzerland to work. But I don’t know how much that’s still happening or if it happens much with Geneva?

        • Europeans think 100 miles is a long way, Americans think 100 years is a long time. But from Bergamo in particular Geneva is almost as far as you can get within Switzerland, at least as the crow flies: https://imgur.com/a/xSnLHDx

          Driving from Bergamo to Geneva also takes as much time as driving to Basel (4:12 vs 4:14 according to Google maps) while Zurich is closer (3:39). Lugano is much closer (1:28 from Bergamo) and is just half an hour away from Como in Lombardy.

          Over 65 thousand people (used to) cross the border every day to go to work in Ticino. Geneva has even more people crossing the border every day, but it’s a different border.

        • I agree that the low fatality rate implied is a concern, but we know that testing rates have been too low in most countries to obtain an accurate IFR.

          If the Stanford and USC studies are overestimating, by 10 to 20 fold, the number who contracted COVID-19 in Santa Clara and Los Angeles, then the virus is nowhere near as contagious as most think. This runs counter to what is being experienced in other areas. Residents of Santa Clara passed away from COVID-19 related causes well before the county thought anyone had it. In addition, at least six weeks passed from the first case(s) to California’s shelter in place orders. So it had ample time and an unaware population to spread. One would expect a very high transmission rate, yet neither location is or was a hot spot for deaths.

          Although they are probably flawed to an uncomfortable extent, the studies could offer insight and verify the exponential growth rates predicted a few weeks ago (“…roughly 22.4 million Californians could come down with the virus over the next eight weeks according to Governor Newsom…” —March 19, 2020). Otherwise, the relatively pedestrian spread rate in Santa Clara county needs to be explained.

  3. Sameera –

    > I would bet he would not suggest that the IFR is the same everywhere.

    He has repeatedly said it is about the same as the seasonal flu. We don’t know exactly what data he uses for that conclusion, but he is saying that in the context of a study of Santa Clara, a study which has some dubious recruitment methodology and, I’m told, dubious statistical methodology.

    Ioannidis may well be right about the numbers. That isn’t a justification for how he’s selectively leveraging uncertainty to support his science.

    I am certainly not offering to judge his motivations. I assume he’s trying to lend his expertise on a very important topic. But he is outpacing the science. It’s not atypical, but it is questionable, scientifically.

      • Thanks for responding, Joshua. I did not interpret that assertion to mean that the IFR for the seasonal flu or COVID was the same everywhere. I interpreted it to mean that the COVID19 IFR was likely closer to the seasonal flu IFR. Recall that at the time, the availability heuristic led some prominent virologists to compare COVID19 to the 1918 pandemic, which had millions of casualties.

        I comprehend what you mean. My comments were also attempting to draw a rudimentary comparison between Fauci’s initial IHME forecast [100,000-240,000 deaths] and John’s 1st rebuttal of the IHME figures, which have since been downgraded to an estimate of 62,000. But if the testing is unreliable, then everyone is handicapped in making assessments.

        • Sameera –

          > I did not interpret that assertion to mean that the IFR for the seasonal flu or COVID was the same everywhere.

          I agree. But I’m focusing on the difference between a uniform rate everywhere, and a rate which is broadly applicable – say nationally. Ioannidis has, most definitely, talked very publicly about using data from a very particular location to justify a broad infection fatality rate. So I think we’re maybe stuck on a question of semantics here.

          Theoretically, he wouldn’t have to say that the rate is the same everywhere to stretch the evidence beyond what’s scientifically justifiable. Now maybe I’m wrong that he’s done that, and I know I’m more than a bit of a broken record about this, but I simply don’t get how it’s scientifically valid to extrapolate nationally from a rate that’s based on non-nationally representative data. Santa Clara is not remotely nationally representative in many respects – on top of the potential skewing from their recruitment methodology.

          He did the same thing with another outlier data set – cruise passengers.

          I have to assume I’m the one who’s wrong about this as Ioannidis is a renowned expert in this science – but I have yet to see an explanation for *why* I’m wrong.

        • Sameera –

          Just wanted to add that I have held Ioannidis in high esteem for quite a while. I have thought maybe his criticism of the replicability “crisis” might have been somewhat overstated, but it nonetheless serves an extremely important and useful role in improving the integration of scientific and academic research into society.

          That’s part of the reason why I really want to understand what’s going on here.

          • Joshua:

          By all means, hope Ioannidis is a renowned expert in this science, but don’t take anyone too seriously, and so presume he might be wrong.

    • Hi there Joshua,

      Well, I do think that providing quality references/citations would have been useful. It was a preprint, right? Hopefully, an appendix is forthcoming.

      I try to be mindful of what is said by whom in which context. Even the best of thinkers are capable of blunders. And in fact highly trained thinkers have also engaged in blunders. So I’m all for accountability.

      Often we don’t know what personal dynamics are undergirding some of the academic rivalries or feuds either. I experienced that through much of my teen years in academic circles. It was just surreal. So I resist getting into cliques even to this day.

      As I commented yesterday, I think the substance of the claims will not be significantly altered if all the corrections are incorporated.

      • BTW, I don’t always agree with John Ioannidis and others either. It’s that I think that consumers of statistics & medicine must acquire critical thinking capacities, which are integral to expert accountability. I think the history of medicine is a cautionary tale. And it continues to be of concern.

      • Hi Sameera –

        Yes, maybe much of this discussion is premature given that it was only a preprint. But likewise, perhaps the publicity campaign by the authors was premature for the same reason.

        I certainly hope that the viably extrapolated infection rate is as high, and the fatality rate is as low, as they are asserting. Maybe they’ve launched this publicity campaign based on more tightly controlled data than what they’ve released thus far. If so, then I really could understand why they outpaced what was released. If not, then I think that what they’ve done is of questionable value and reflects poor scientific decision-making.

  4. Yes! An excess mortality analysis!

    What’s not to like about the data: how far off can the count of total deaths be? Plus, years of solid data to compare current data to. The graphed results are clear and jump off the page to the point where you don’t even need statistics. And the graphs give an intuitive feel for how large the effect is: they automatically compare this year’s mortality to previous years’ mortalities.

    Why isn’t this the first analysis everyone does (if possible)? If a more complicated analysis doesn’t agree with the excess mortality results, then sorry, the more complicated analysis loses. The commentary on this blog makes clear that there are *huge* problems with the commonly-used data.

    You need to be a little careful about the interpretation because not all the difference is due to Covid infections, so you need to interpret the results as excess deaths due to all direct *and* indirect effects. I can live with that … in fact, those might even be the more important results.

    • I think it’s kind of misleading because where is the excess flu season mortality? The Italian government said this year was an exceptionally weak flu season which allowed the vulnerable to accumulate.

      • Found it:

        The comparison of winter mortality (December-February) shows in the winter of 2019-2020 a low mortality attributable to the reduced impact of seasonal risk factors (low temperatures and flu epidemics). This phenomenon, already observed in previous years (Michelozzi et al. 2016), has the effect of increasing the pool of more fragile subjects (the elderly and those with chronic diseases), which can increase the impact of the COVID-19 epidemic on mortality and explain, at least in part, the greater lethality observed in our country.

        https://www.epiprev.it/andamento-della-mortalit%C3%A0-giornaliera-sismg-nelle-citt%C3%A0-italiane-relazione-all%E2%80%99epidemia-di-covid-19

        • We want to predict how this illness would affect other places and times and devise an appropriate response. The mortality rate is way higher in some regions of Italy because

          1) Already a relatively old population
          2) A weak flu season leading to even more vulnerable than usual
          3) Aggressive use of ventilators
          4) The hospitals were sending home a lot of their staff for testing positive, probably aggravating problem #3
          5) Lots of infections there (large Chinese population so more contact with Wuhan, public service announcements suggesting it was racist to be careful around Chinese tourists, etc)

          I also saw from some videos it looked like they were putting many covid patients in the same room without dividers, likely leading to higher viral load. I don’t know how common that is but it seems like a bad idea to me. Maybe once you have too many patients there is no choice though.

          So looking at those regions of Italy and extrapolating to elsewhere that doesn’t share those properties is not a good idea.

        • A large proportion of vulnerable people means the mortality rate is going to be higher in that region than elsewhere.

          And I feel like the seasonal flu excess mortality should be more prominent on those charts, so something may be up with the data.

        • You don’t see excess mortality in the summer after a flu.
          The people who get Covid-19 aren’t the same who would have gotten the flu.
          A lot of people have immunity for some strains of flu, they wouldn’t have been vulnerable,but are vulnerable to Covid-19.

          I think your argument comes down to a version of the gambler’s fallacy: because you lost this round, there isn’t a higher chance that you will win the next round.
          Because someone did not die from the flu, there isn’t a higher chance they will die from Covid-19.
          The flu doesn’t “cull” the at-risk population like it might have done in the stone age.

        • You don’t see excess mortality in the summer after a flu.

          I don’t follow, I was talking about the excess mortality in the elderly seen every winter during flu season. You can see it clearly here: https://www.euromomo.eu/

          The people who get Covid-19 aren’t the same who would have gotten the flu.

          The people who die from the flu are largely the same who die from covid-19 (elderly people with comorbidities).

        • @Anoneuoid
          Sorry, I dropped a word. You don’t see excess mortality after a MILD flu season.
          Flu waves are not all the same; if your theory held water, the summer mortality would “react” to whether the flu season was severe or not, but it doesn’t.

        • @Anoneuoid
          Sorry, I dropped a word. You don’t see excess mortality after a MILD flu season.
          Flu waves are not all the same; if your theory held water, the summer mortality would “react” to whether the flu season was severe or not, but it doesn’t.

          I understand now. Are you sure that is true? I’m looking at the (apparently just new and improved today) charts here for weekly 65+ excess mortality: https://www.euromomo.eu/graphs-and-maps/

          2016 was a weak flu year and yea I don’t see any real difference between the years after week 20 or so until next flu season. That is interesting. I don’t know if comparing a normal year to one with an illness that is especially severe in the elderly makes sense though.

        • @Anoneuoid
          The EuroMOMO 65+ data supports the idea that the deaths from different epidemic waves are somewhat independent random variables.
          If that assumption is true, then, if Covid-19 kills 15% of the 80+ population this year, then next year, that age group will suffer approximately 15% fewer deaths across all causes because the population shrunk by that amount. That’s it.
          If the assumption wasn’t true, we ought to see signs of that in the historic 65+ influenza data, but we don’t.

          You may think, “the virus only kills people with preconditions”, but it doesn’t kill all of them, and in those age groups, most people have some.

    • “you need to interpret the results as excess deaths due to all direct *and* indirect effects.”

      Yes, the more I think about it the more it seems there could be a significant issue with indirect effects. Imagine you have your 85yo parent in your house and they develop any kind of non-covid symptoms. Would you take them to the ER at the first sign of trouble, or wait until the last possible minute? If people have knowledge of the age-related impact of COVID, then that could produce the exact same pattern shown in this paper: that is, the older your home-care parent, the less likely you are to take them to the ER, and the more likely they are to become an indirect fatality.

      So if you’re backing IFR out of the excess fatalities, you’d be getting a maximum estimate and it might be quite high.

      • Or, you just don’t visit your 85-year-old parent in the care facility, which in itself is a drop in care quality. And maybe the quality of their staff’s care varies with less pressure from the adult children. Maybe there is a staffing shortage due to quarantine rules or paranoia or stress.

        Or, (as I would have) you check them out, bring them home with you, and their quality of care drops despite your best efforts. (But, thanks to my indefatigable sister, in my case quality would have increased.)

        And in the other direction, there are possibly decreased fatalities from auto accidents, other infectious diseases, who knows? Hesitating before going to the ER might be a plus.

    • > Why isn’t this the first analysis everyone does (if possible)?

      I think it largely wasn’t possible until very recently, as mortality data is delayed. For example, in the US I’d normally think about the CDC data, but I read that the delays in reporting to the national level can be weeks to months in normal times.

  5. 2 points. First, living in Boston, I dug into MA’s death figures. Or rather tried: the online tool pulled up 2015, while the online archive of death reports ends at 2017, and mostly looks at town by town figures. I think that’s because the information system is GIS, as it is in the individual towns. I found no monthly information, just a single page which describes daily deaths as 161, broken into groups. That’s 2017. You could build a crappy-assed model off that, or go through the obits (for how many news sources?) every day and hope you get every death. You could assume 161 is close and multiply, etc. The 80+ category dominates MA Covid-19 deaths (1237 of 1961 as of 4/21). Some is forward displacement by weeks or months. You could model that displacement using expected life, which they do report in generalities, though I note life expectancy in general in MA is 80. But there’s not enough data to make that reasonable.

    2nd, they divvy by cause, but that raises even more questions. Like they show about 1200 diabetes deaths a year, but about 2/3 are considered underlying. I note this because deaths have an immediate, proximate, and underlying cause. That gets into co-morbidity issues and thus into IFR for age groups, condition groups, etc.

    My interest is in understanding how we function going forward. To personalize, I know a little kid who had a liver transplant. He has had almost no immune system for the last 3 years, but he goes to school, has friends, and leads a fairly normal life, with precautions and observation. He’s been hospitalized, but managing his extreme risk level has proven ‘manageable’. Knowing who to manage is important. (I’ve noted before the state’s failure to quarantine nursing homes was criminal: they even tried to empty facilities to use them as spillover hospitals when they found massive infection rates. The news showed them hauling aged people out on gurneys. I wish I were kidding.)

    Or, you could believe we a) can spend trillions without consequence and b) we’re not displacing other medical issues. Regarding the latter, you may have heard ‘elective’ and ‘non-essential’ surgeries. Believe that if you want. Here’s an anecdote: a dear friend’s daughter was called at 10PM the night before she was to report for her ‘non-essential’ surgery and told it was canceled. Her surgery: ovarian cancer. She was then told they might be able to handle her in June. So delay from early March to maybe June. She has children. Here’s a question: given the age of many Covid-19 deaths, did they just shorten her life by more than Covid-19 did for many of the very elderly? (Good news is they fit her in this Monday, so she was ‘only’ delayed by 6 weeks or so. That took how many months of life away from her and her family?)

    Here’s another question: how much diagnostic imaging is being done now? Mammography is at a standstill. Other cancer screenings, including those included in ordinary office visits and blood work, are not being done. You can then attribute these extra deaths (+ cost, + pain) to Covid-19 (though I’d be more likely to say it’s terrible management by the demonstrably terrible medical establishment). I’m not going to argue that point, but rather point out that paths have costs. There are lots of models that connect screening and surgery to finding cancers, to life extension, etc. Haven’t seen a single mention of that anywhere as a cost of this particular path.

      • The doctors I know, including family, have been seeing patients but not nearly the volume, and obviously not as many in person. The number of diagnostic tests ordered has dropped. That has a cost which can be thought of as the bringing forward of future deaths, just as the deaths of the very elderly is a bringing forward of future deaths. Let’s say the virus kills half the centenarians. We’d see a bulge in deaths that would bleed off over time as we ‘construct’ more centenarians (assuming we don’t decide to keep the centenarian census low, which is the same as ‘down-regulating’ by imposing a continuing process – the virus – that increases 100+ mortality). You can do the same with 90 year olds because they already exceed the standard life expectancies. You can take it down to deeply stratified life expectancy if you want. Or at least do it by sex. Point: once you go below life expectancy, that bulge in death has 2 effects, one being recovery of the lost population to life expectancy and the other going beyond that. Isn’t that how you make waves across a field: if your data is strong enough, you can step each rational stratification and describe something approaching a smooth(er) curve at the projected level. The (er) is because this invokes elliptic curves so you can draw lines but not be able to describe the actual dimensional pathway. At least that makes sense to me.

  6. Here’s a challenge for all you model-mavens:

    1. Is R0 approximately a linear function of the average number of people an infected person interacts with each day? Seems pretty reasonable to me.

    2. If so, why isn’t it relatively easy to reduce R0 dramatically? Banning large gatherings and increasing physical distancing alone should have a big impact on average daily interactions.

    This reasoning must be wrong, but I don’t see why.

    • R0 is a fine measure in a homogenous network, but that’s not what we have. Note, for example, the vast difference between infection occurrences in nursing homes vs. the general population (even the general population of elderly people). As another example: 90% (!) of Singapore’s spike of new Covid-19 cases are in worker dormitories.

      Banning large gatherings etc lowers R0 for the “average” set of people, but that R0 is already low, and the average R0 is not the average of the network’s subgroups. Imagine one group with R0=0, and another with R0=2; in the overall population that is the two groups together, each infection leads to two new infections, so R0=2, not 1.

      Put another way, “We find that the classical concept of the basic reproduction number is untenable in realistic populations, and it does not provide any conceptual understanding of the epidemic evolution.” — from a 2018 paper that I quote in a post I wrote yesterday.
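
      A quick toy simulation of that two-group example, under the reading that the R0=0 group simply isn’t part of the transmission network (my assumption, not spelled out above):

```python
# Toy illustration of "the average R0 is not the average of the subgroups."
# Group A takes no part in transmission (R0 = 0); group B passes the virus on to
# 2 others on average (R0 = 2). The naive population-weighted average is 1, but
# every transmission chain lives entirely inside group B, so cases keep doubling.
import numpy as np

rng = np.random.default_rng(1)
naive_average_r0 = 0.5 * 0.0 + 0.5 * 2.0   # = 1.0, which would suggest no growth

cases = 10
for generation in range(1, 9):
    # Every active case is in group B, so each spawns Poisson(2) new cases.
    cases = rng.poisson(2.0, size=cases).sum()
    print(f"generation {generation}: {cases} cases")

print("naive averaged R0:", naive_average_r0,
      "(misleading: the simulated epidemic roughly doubles each generation)")
```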

      • Good points about heterogeneity of R0; and good paper. I’ve been thinking about heterogeneity lately too.

        My intuition is that heterogeneity makes it easier in some ways to reduce R0. Say we have a population with two subgroups. One group goes to sporting events and the other doesn’t; and infection is spread only through sporting events. R0 = 10 for the first group and R0 = 0 for the second (the first group are “super-spreaders”). If we just eliminate sporting events, R0 = 0 for everyone, and the second group doesn’t need to do anything.

        To my mind, this suggests that, when super-spreaders are present, efforts should be focused narrowly on them and low-R0 populations could be left alone. Perhaps suburbanites naturally have R0 < 1 and need to do little.

        It is hard to believe I am right about this. Any idea why I am wrong?

        • Germany has data on symptom onset for most cases and also did a staggered shutdown. Look at page 3, figure 2 of this report: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Situationsberichte/2020-04-21-en.pdf?__blob=publicationFile

          Crudely, March 9 shutdown of schools, events >500 canceled, people asked to keep 1.5m distance, wash hands, and observe cough hygiene: changed daily case rate of growth from exponential to linear
          March 15 shutdown of nonessential shops and all entertainment venues etc, limiting gatherings to 50 persons: daily case rate now approximately flat
          March 22 shutdown of restaurants, more shops, limiting gatherings to 10 and soon 2 people: daily case rate starts shrinking

          Contact tracing and quarantining of outbreak regions is going on throughout.

          So, the “easy” measures do have an effect, but they act in parallel to “teach” the population that individual compliance is important. Growth rates are quite different by region, my county hasn’t had a new case for a week.

          Ideally, you’d be able to control the situation with physical distancing, hygiene, and containment measures (contact tracing/quarantine); the problem is to catch and recognize outbreaks that “escape”, because these are driving the spread. If you could lock down regions where outbreaks are imminent, you’d be home free, but unfortunately that violates some fundamental principles of physics.

          The RKI is trying to find ways to recognize outbreak regions faster: they’re now collecting smartwatch/fitness data from volunteers to get clues about the general health of a region, and they’ve opened up direct reporting lines to hospitals and laboratories. The old way is just using county health officials, which gives very good data but introduces delays up the chain: not an issue when you’re dealing with salmonella in restaurants, but difficult when you’re trying to control an international epidemic!

          The conundrum is to find a subset of measures that will keep the daily case rate from growing, and ideally keeps it shrinking, while re-establishing as much economic activity as possible.
          Personally, I hope we can revert to containment for all regions; underpowered county health is getting help with that, people start using community masks, summer is coming, …

      • > Banning large gatherings etc lowers R0 for the “average” set of people, but that R0 is already low, and the average R0 is not the average of the network’s subgroups. Imagine one group with R0=0, and another with R0=2; in the overall population that is the two groups together, each infection leads to two new infections, so R0=2, not 1.

        Let me run a thought by you.

        I think your point needs to be stressed with respect to perhaps a related point about relaxing social distancing.

        I see people saying that it shouldn’t be a problem to let people walk on the beach because most infections take place in the home.

        Suppose you had a 10 person family. 9 stayed at home and one went walking out on the beach. And let’s say a low probability event happened, and that person walking on the beach got infected and then went home and infected his family members.

        So then you have 90% of the infections coming from close contact, and only 10% from public interaction. But you have that one contact expanding exponentially.

        Thoughts? Is that a related concept?

        • I’m not sure what you’re illustrating with this example that’s different, but I’m probably missing your point. If your goal were to minimize the spread of infections (or death) breaking the highly connected 10-person group would be effective. (I don’t like phrasing it as families, but whatever.) Obviously if there are *any* connections, even ones with extremely low probability like walking on the beach, transmission risks are not zero, but that’s not relevant to this.

          Perhaps more relevant: in this construction the highly connected group has no connections outside its group, other than the guy walking on the beach, rather than being part of a broader network of high connectivity. Whether that’s realistic or not I won’t say, but it highlights again that the *real* network structure matters.

    • Terry, I don’t know, but I’m going to make some stuff up.

      With regard to 1: everything is linear to first order, so there is nothing wrong with your approximation over some range. (Yes, I know if you expand y=x^2 around zero, it’s not linear to first order, but you know what I mean). I think if you cut the number of close personal interactions per day in a region in half you will cut transmissions per day by a large factor. But not a factor of 2.

      As a practical matter, in most social regimes some people are going to keep having very large numbers of interactions even if large gatherings stop. A grocery store checkout person is going to have a close encounter with dozens or even hundreds of people per day. They are presumably both much more likely to become infected than most other people, and more likely to pass it along to others, and this won’t stop if large gatherings are banned. This is one reason a factor of two decrease in average interactions won’t lead to a factor of two reduction in infections. If a significant number of infections come from ‘superspreaders’ who aren’t affected by the behavior changes, that’s obviously going to interfere with the kind of scaling you’re looking for.

      But I don’t think your reasoning is wrong, really. What has convinced you that those measures aren’t at least moderately effective?

      • I think those measures *are* moderately successful. Just less so than I would have guessed. It seems pretty easy to cut your number of interactions by 2/3, so my intuition is “why isn’t that enough”. The answer seems to be that it would cut a super-spreader’s interactions from 15 to 5, which isn’t good enough. You have to cut almost every group’s R0 to less than 1.

    • Every epidemic, when it’s growing exponentially, grows like

      N = exp(r*t)

      the point of R0 as far as I can tell is that it’s a way to connect knowledge you might acquire from contact tracing efforts to estimate the r in the growth equation. Like after you’ve got 10 confirmed cases, and traced all their contacts and found that on average each confirmed case infected 3 other people over a period of 10 days… then you can estimate r from that and have something to say about the epidemic *even before there are significant numbers of cases to fit an exponential curve to*

      This is why the construct of R0 exists.

      So R0 isn’t a fundamental property of anything, it’s really more like a way to connect epidemiological data to estimate r… But what we’re getting is bass-ackwards… where people are looking at the N data and estimating R0 from that.

      R0 estimation is just an extra step adding noise to the signal.

      • And this is coming from the person who insists on having mechanistic models of everything… But you are right, anything growing exponentially grows like an exponential.

        • If you’re estimating R0 from contact tracing, then it gives you a mechanistic connection between data you can collect via contact tracing and a prediction about how fast the exponential will grow. This is super valuable. so Contact Tracing -> estimate r before you can observe it is a useful direction.

          But if you’re backing out Reff from observed exponential growth… then it’s just a deterministic transform of fitting r… So fit r -> transform r to Reff, the Reff adds nothing here.

          Now, if you’re doing contact tracing at the same time, and you’re getting different numbers for Reff(r) and Reff(ContactTracing) now there’s actionable information… you can ask yourself what is wrong with what we’re doing? Like maybe tests aren’t ascertaining enough cases, or maybe there are crazy false positive rates, or maybe we’re not contact tracing hard enough, or whatever.

          Without a comparison to contact data the Reff(r) is just a deterministic transform.

        • Also, Carlos, the concept of “basic reproduction number” is more or less an example of “homogenization”. You take a statistical sample and say that the average is all that matters, and then go from there.

          It’s the simplest mechanistic model. Realistic mechanistic models would be using realistic contact networks. And Raghuveer posted a nice paper above showing that people who do that don’t think the R0 concept is particularly helpful:

          https://doi.org/10.1073/pnas.1811115115

          It’s a nice paper

      • Specifically, suppose that you did see an average of spreading to 3 people in 10 days… So it went from 1 to 4 people in 10 days on average… so r = log(4)/10 if t is measured in days.

        Admittedly this is the Reff (effective R) but to get R0 you just make the assumption that the environment you are looking at is more or less the “uncontrolled” most rapid spread (because in the early stages no one is taking any precautions).
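
        In code, that bookkeeping is just a couple of lines (hypothetical tracing numbers, the same arithmetic as above):

```python
# Estimating the exponential growth rate r from contact-tracing numbers, as in
# the example above: 1 case becomes 4 cases (itself plus 3 secondary cases) in
# 10 days, so exp(r * 10) = 4.
import math

secondary_per_case = 3       # from (hypothetical) contact tracing
window_days = 10

r = math.log(1 + secondary_per_case) / window_days
print(f"growth rate r ~ {r:.3f} per day")
print(f"implied doubling time ~ {math.log(2) / r:.1f} days")

# Forward projection, usable before the case curve is long enough to fit directly:
for t in (10, 20, 30):
    print(f"day {t}: ~{math.exp(r * t):.0f}x the initial case count")
```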

  7. The error in the epidemic model predictions (output) may be due to miscalculation of the model input parameters and R0 (input), based on curve fitting the data for increasing number of cases per day, instead of correcting for the bias in that metric due to the increasing number of tests per day. So, looks like the typical “garbage in, garbage out”. Hard to believe that anyone capable of running a model could make such an obvious error. (But then again, we have others who should know better also using this wrong metric: “None of the states have met basic White House guidelines unveiled last week of two weeks of declining cases before a state should reopen.” The number of cases per day is the wrong metric, because it depends (largely, or maybe entirely) upon the number of tests per day.)

    The correct metric to show the growth rate of the epidemic was not the increasing number of cases per day (what the news media showed to panic the public), but rather was the trend in the percentage of positive test results (a trend some researchers have reported was fairly flat, indicating no increasing epidemic). This important metric was not presented in a timely manner by the CDC, and the CDC made it difficult for anyone else to determine it, through their procedure for back-dating the reported cases (so the reported #cases on a given date do not match the #tests reported for that date). So the data suffer from deliberate back-dating, unknown false positives (failure to evaluate the test accuracy), and double-counting (which the CDC acknowledges for their flu reports). It would help to unravel these mysteries if the CDC and the states were forced to present the correct metric on their websites, and to explain why they did not do so in a timely way during March.

    • > The correct metric to show the growth rate of the epidemic was not the increasing number of cases per day (what the news media showed to panic the public), but rather was the trend in the percentage of positive test results

      I don’t think this is right. When I plot tests and cases, and look at cases per day… in most moderately+ affected areas cases per day grew *exponentially* while tests grew *linearly*. This makes sense because there’s some testing capacity and they mostly ran at maximum capacity and can never go above that.

      The thing was, most places got their testing online shortly after Mar 15. But shelter-in-place orders were put in shortly after, in the Mar 19-Mar 23 range.

      Since there’s a lag from infection to seeking a test, it means the cases per day were affected by the shelter in place orders around April 1 or a little before.

      After April 1 or so, growth of the infection was reduced to linear or for some places, even sublinear (declining per day cases).

      But looking at cases per day for AZ, CA, CO, CT (consecutive pages on my printouts) it seems clear that the shelter in place orders caused new cases per day to flatten to constant (linear growth).

      The right simple metric to understand this pandemic is the exponential growth rate during the period after Mar 17, when most states were finally testing rapidly, and before (shelter-in-place + 7 to 15) days, which was ~ April 1.

      The flattening of the curve put in by shelter in place made the cases grow linearly, and then in the long time asymptotic behavior, the cases and the tests are proportional.
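
      One way to operationalize that metric is a log-linear fit of daily new cases over the chosen window; here is a sketch with synthetic counts standing in for a state’s data:

```python
# Fitting the exponential growth rate to daily new-case counts over a window,
# with synthetic counts standing in for a state's data between ~Mar 17 and ~Apr 1.
import numpy as np

rng = np.random.default_rng(2)
days = np.arange(15)                                # the ~two-week window
true_r = 0.20                                       # assumed underlying growth rate
new_cases = rng.poisson(50 * np.exp(true_r * days)) # noisy observed counts

# Log-linear regression: log(cases) ~ intercept + r * t
slope, intercept = np.polyfit(days, np.log(new_cases), 1)
print(f"estimated r ~ {slope:.2f} per day "
      f"(doubling time ~ {np.log(2) / slope:.1f} days)")
```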

      • Daniel Lakeland said: “When I plot tests and cases, and look at cases per day… in most moderately+ affected areas cases per day grew *exponentially* while tests grew *linearly*.”

        Hi, Daniel. Yes, good idea to plot both #cases and #tests, as you did, and also plot the percentage over the time period. Do you have a data source that will allow you to correctly match the #cases and the #tests (for same date) so that you can correctly calculate the percentage of positive test results (cases) for each day’s batch of tests?

        For some areas, the plot of #tests per day over March is curiously similar to a typical epidemic curve. In that situation then, if the true percentage of cases/tests is fairly constant or slowly increasing (i.e., a slowly increasing or non-increasing epidemic) as has been reported, then the resulting plot of #cases per day will resemble a typical epidemic curve even with no epidemic. So at least, there must be some correction to the epidemic curve to account for the bias introduced by the increasing number of tests. And the modelers must not use the scare curve, based only on increasing #cases, for estimating the model parameters.

        • I’ve been watching this for awhile:

          https://covidtracking.com/api
          https://www.docdroid.net/qKQaNNh/covidstates-pdf

          A relatively simple SIR model (well mixed population, empirical distribution of time until recovered/dead, etc) where the cases had already peaked before the first death there ~ May 1st and the entire growth in reported tests occurred during the downphase of the epidemic can qualitatively reproduce the general pattern at a national level:

          https://i.ibb.co/RPxQYV6/sirresults.png

        • Typo: entire growth in reported cases.

          And I shouldn’t have to say this, but of course that is not sufficient to conclude that is what happened.

        • Let’s try again:

          A relatively simple SIR model (well mixed population, empirical distribution of time until recovered/dead, etc) where the cases had already peaked before the first death ~March 1st and the entire growth in reported cases occurred during the downphase of the epidemic can qualitatively reproduce the general pattern at a national level
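
          For concreteness, here is a bare-bones discrete-time SIR loop of the kind being described, with made-up parameters rather than anything fit to US data:

```python
# Bare-bones discrete-time SIR model with a well-mixed population.
# Parameters are illustrative, not fit to US data.
import numpy as np

N = 330_000_000          # population
beta = 0.35              # transmission rate per day (assumed)
gamma = 1 / 10           # recovery rate per day (assumed ~10-day infectious period)
S, I, R = N - 100, 100, 0

history = []
for day in range(180):
    new_infections = beta * S * I / N
    new_recoveries = gamma * I
    S -= new_infections
    I += new_infections - new_recoveries
    R += new_recoveries
    history.append(I)

print(f"R0 = beta/gamma = {beta / gamma:.1f}, "
      f"infections peak around day {int(np.argmax(history))}")
# Reported *cases* would be this curve filtered through testing capacity and a
# reporting delay, which is why raw case counts can look nothing like the
# underlying infection curve.
```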

    • The trend in positive test results means nothing, because diagnostic Covid-19 tests are not a representative sample, and this is not a survey. On the day that Dr Birx talked about the “logarithmic curve” in the press briefing, she seemed to espouse the same misconception, so you are not alone there.

      Imagine a country with sufficient test capacity. The number of tests that are carried out in this situation is determined by the number of tests that doctors send to the laboratories. If we have twice as many tests, that means we have twice as many suspected cases.

      Assume that the recommended criteria for ordering a test done are exposure to the virus or a high-risk contact to an infected person. Assume further that about 10% of high-risk contacts actually get sick (attack rate). Then, if we tested all contacts, and only contacts, we would always expect a fixed proportion of positive tests that reflects the ability of the virus to infect others, and the test volume reflects how many infected persons are in the population creating high-risk contacts: the test volume rises with the number of (known) infected, the proportion of positive tests stays the same.
      I’m assuming these or similar circumstances prevail in regions where the positive rate is 10% or less.

      If the test positive rate is higher than that, the test capacity is no longer sufficient to fully test all high-risk contacts, and the sample being tested is recruited from a set of patients more likely to be positive, e.g. people admitted to hospitals already with pneumonia “of unknown aetiology”. The positive rate is a measure of how likely the set of people that gets tested is to have been infected, and that reflects the criteria by which these people are chosen; and these reflect how scarce tests are. If your positive rate is >10%, you are likely undertesting and missing cases.

      If the criteria were to choose a representative sample of the population, you’d be correct, and the positive rate would have the meaning you assign to it. But that’s not what happens.

      • *** What was the actual growth rate of the epidemic during March? ***

        Mendel said: “Then, if we tested all contacts, and only contacts, we would always expect a fixed proportion of positive tests that reflects the ability of the virus to infect others, and the test volume reflects how many infected persons are in the population creating high-risk contacts: the test volume rises with the number of (known) infected, the proportion of positive tests stays the same.”

        To Mendel: Thank you for your insight and good explanation. I have used your explanation as H2 below.

        So, it seems that there could be three logical explanations H1 H2 H3 for the epidemic curve (increasing #cases) observed during March.
        We already know that the first explanation H1 is not true, although it is the one most likely believed by the public.

        If we assume no false positives, then the equation is: #cases/day = (%infected in sample) x (#tests/day)

        H1 — The epidemic is spreading in the population…. #cases/day is increasing solely because %infected is increasing in population.

        H2 — The epidemic is spreading in the population…. #cases/day is increasing because #tests/day is increasing, and testing is restricted to exposed people with fairly constant %infected, and #tests/day is increasing because #exposed people is increasing.

        H3 — The epidemic is NOT spreading in the population…. #cases/day is increasing because #tests/day is increasing, and %infected in population is staying about constant (or maybe the percent of false-positives is staying about constant).

        H4 — other ideas? Perhaps some combination of H1-H2-H3?

        How can one determine which explanation is more likely to be true?
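
        A tiny numerical illustration of why H3 is hard to rule out from case counts alone (made-up numbers): hold %infected fixed, let testing ramp up, and the daily case curve still looks like an epidemic curve.

```python
# Under H3: a constant fraction of positives, but daily testing capacity ramping up.
# The reported daily case count then mirrors the testing curve, not the epidemic.
import numpy as np

days = np.arange(31)                               # the month of March
tests_per_day = 100 * np.exp(0.15 * days)          # testing ramping up (assumed)
positive_fraction = 0.10                           # held constant under H3

# "#cases/day = (%infected in sample) x (#tests/day)"
cases_per_day = positive_fraction * tests_per_day

for d in (0, 10, 20, 30):
    print(f"day {d:2d}: {tests_per_day[d]:7.0f} tests, {cases_per_day[d]:6.0f} cases")
# Cases grow exponentially here even though the prevalence in the tested
# population never changed, which is why the positive fraction (and how the
# tested population is selected) has to be examined, not just case counts.
```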

        • H3 can be disproven because %infected is not constant.

          German rt-PCR testing data by calendar week:
          Week – Tests – Positive (%) – Reporting labs
          16 – 323,449 – 21,538 (6.7%) – 161
          15 – 378,881 – 30,700 (8.1%) – 160
          14 – 408,173 – 36,850 (9.0%) – 152
          13 – 361,374 – 31,391 (8.7%) – 150
          12 – 348,619 – 23,820 (6.8%) – 152
          11 – 127,457 – 7,582 (5.9%) – 114
          up to week 10 – 124,716 – 3,892
          Source: https://www.rki.de/DE/Content/InfAZ/N/Neuartiges_Coronavirus/Testzahl.html
          Note that multiple tests may be commissioned for some people, this statistic is about tests, not cases.

          H3 is approximately true in a situation where testing has been deployed late, and is catching up. H1 is true in a situation where testing is sufficiently available. H2 puts H3 in the context of “exposed population” instead of “everyone”, and proposes a sliding transition from an H3 situation to an H1 situation as test capacity is surged, or possibly a transition the other way if the epidemic outpaces growth in test capacity, or test capacity is reduced due to materials running out (week 14 had reports of lab supply shortages in Germany, if I recall correctly, with one lab not being able to test their backlog of 4000 samples before they spoiled because an awaited delivery did not arrive in time).

          With most testing regimes, asymptomatic cases don’t get tested unless they occur in high-risk populations; but if we assume that the proportion of asymptomatic cases to symptomatic cases is constant (or at least, constant by age group), the spread of symptomatic cases as tracked by testing is still indicative of the overall rate of spread.

          P.S.: Even if Germany could sustain a rate of 400 000 PCR tests per week, it would take 4 years to test every person at least once. Testing requires a strategy that ensures that we can find people who are sick at a much better than random rate, and that requires restricting who gets tested. The study about the first German cluster, “Outbreak of COVID-19 in Germany” by Merle Margarete Böhmer, Lancet preprint March 31, identified 217 non-household high-risk contacts (face to face for >15 minutes or contact to secretions or body fluids), and only 11 were infected. That’s a 5% attack rate, and if you look at the data cited above, we couldn’t consistently keep testing that kind of population at that rate. (Household member attack rate was 10% if infected household members isolated after being identified, one household group of 5 stayed together, and 1 member remained not infected.) The strategy is therefore (current RKI recommendation) to primarily test high-risk contacts with symptoms and institutional pneumonia outbreaks, and also test viral pneumonias with no alternative diagnosis, and then test people with respiratory symptoms who work in care or are in a risk group, and at the lowest priority is everyone else with respiratory symptoms at the physician’s discretion. This strategy is continuously being adjusted.
          I expect that competent public health offices pursue similar strategies around the globe. The WHO is helping to make that happen.

    • You’ve made a lot of assertions about errors by “epidemic model” makers, without referencing a single prediction or study. It’s impossible to respond substantively to your comment without knowing what you’re talking about.

      And have you actually read any of the studies? The Imperial College London study proposes a metric for lifting lockdown that appears to me to be superior to either new case count or % of positive tests.

      • They are right though. Have you seen anyone publish a model that incorporates the rate of testing? I’ve been following pretty closely and the only ones I’ve seen are mine and Daniel Lakeland’s, on here a few weeks ago.

        I didn’t even try to model this until the relatively clean US data on number of tests became available because that is the most important factor in number of cases.

        How can a model that ignores the most important factor be any good?

        • Anoneuoid said: “How can a model that ignores the most important factor be any good?”

          To Anoneuoid: Yes. Thank you! You made this crucial point very clearly. Everyone should think about this.

          But unfortunately there was no “relatively clean US data” available for sorting out the true epidemic growth rate apart from the bias due to the increasing rate of testing. The clean raw data existed, but was not easily available . . . so probably not clean on the CovidTracking.com website. We couldn’t see the clean data during March because the CDC website reported the adjusted data, after back-dating the case dates, so those cannot be easily matched to the test date in order to correctly calculate the percent of positive tests (cases).

          And in addition to back-dating the #cases (which causes the calculated %positive to inflate when daily testing is ramping up), the dataset is also complicated by an unknown false-positive percentage (failure to evaluate the test accuracy) and possible double-counting of some positive tests (which the CDC acknowledges for their flu reports).

          https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html

          But now, belatedly, the CDC is saying “therefore, the percentage of specimens testing positive across laboratory types can be used to monitor trends in COVID-19 activity.” Too late.
          As I said earlier: “It would help to unravel these mysteries if the CDC and the states were forced to present the correct metric on their websites, and to explain why they did not do so in a timely way during March.”

  8. I have a question about this paragraph: “As the saying goes, not making a decision is itself a decision. It’s fine to take survey data and not poststratify; then instead of doing inference about the general population, you’re doing inference about people who are like those in your sample. For many purposes that can be fine. If you’re doing this, you’ll want to know what population you’re generalizing to. If you do want to make inferences to the general population, then I think there’s no choice but to poststratify (or to do some other statistical adjustment that is the equivalent of poststratification).”

    @Andrew, I presume that you’re talking about poststratification for a non-random sample (as is the sample in question). If that’s right, then you seem to be saying that poststratification generally improves non-random samples such that this should be the standard practice. Can you point me to some resources on this? I’m aware of specific studies showing impressive results for non-random samples (such as your Xbox survey paper), but I’d like to see evidence that basic raking, etc. on a few key variables (as in this study) generally improves estimates. The closest thing I’ve seen to this is this article by Krosnick et al. (https://pprg.stanford.edu/wp-content/uploads/Mode-04-Published-Online.pdf) in which they find that “post-stratification of non-probability samples did not consistently improve accuracy.”
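
    For what it’s worth, here is what “basic raking on a few key variables” amounts to mechanically: iterative proportional fitting of unit weights to match known margins (toy data, two margins, not the adjustment used in the study):

```python
# Toy raking (iterative proportional fitting): adjust unit weights so the weighted
# sample matches known population margins for two variables. Made-up data.
import numpy as np

sex = np.array(["f", "f", "f", "m", "m", "f", "m", "f", "f", "f"])
age = np.array(["young", "old", "young", "young", "old",
                "young", "young", "young", "old", "young"])
w = np.ones(len(sex))

target_sex = {"f": 0.5, "m": 0.5}        # known population margins (assumed)
target_age = {"young": 0.6, "old": 0.4}

for _ in range(50):                      # iterate until the margins match
    for var, targets in ((sex, target_sex), (age, target_age)):
        total = w.sum()
        for value, share in targets.items():
            mask = var == value
            w[mask] *= share * total / w[mask].sum()

print("weighted sex margin:", {v: round(float(w[sex == v].sum() / w.sum()), 3) for v in target_sex})
print("weighted age margin:", {v: round(float(w[age == v].sum() / w.sum()), 3) for v in target_age})
# The variance cost is visible in the weights: cells that are rare in the sample
# relative to the population get large weights and contribute most of the noise.
```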
