“Inferring the effectiveness of government interventions against COVID-19”

John Salvatier points us to this article by Jan Brauner et al. that states:

We gathered chronological data on the implementation of NPIs [non-pharmaceutical interventions, i.e. policy or behavioral interventions] for several European, and other, countries between January and the end of May 2020. We estimate the effectiveness of NPIs, ranging from limiting gathering sizes, business closures, and closure of educational institutions to stay-at-home orders. To do so, we used a Bayesian hierarchical model that links NPI implementation dates to national case and death counts and supported the results with extensive empirical validation. Closing all educational institutions, limiting gatherings to 10 people or less, and closing face-to-face businesses each reduced transmission considerably. The additional effect of stay-at-home orders was comparatively small.

This seems similar to the hierarchical Bayesian analysis of Flaxman et al. In their new paper, Brauner et al. write:

Our model builds on the semi-mechanistic Bayesian hierarchical model of Flaxman et al. (1), with several additions. First, we allow our model to observe both case and death data. This increases the amount of data from which we can extract NPI effects, reduces distinct biases in case and death reporting, and reduces the bias from including only countries with many deaths. Second, since epidemiological parameters are only known with uncertainty, we place priors over them, following recent recommended practice (42). Third, as we do not aim to infer the total number of COVID-19 infections, we can avoid assuming a specific infection fatality rate (IFR) or ascertainment rate (rate of testing). Fourth, we allow the effects of all NPIs to vary across countries, reflecting differences in NPI implementation and adherence. . . .

Some NPIs frequently co-occurred, i.e., were partly collinear. However, we were able to isolate the effects of individual NPIs since the collinearity was imperfect and our dataset large. For every pair of NPIs, we observed one without the other for 504 country-days on average (table S5). The minimum number of country-days for any NPI pair is 148 (for limiting gatherings to 1000 or 100 attendees). Additionally, under excessive collinearity, and insufficient data to overcome it, individual effectiveness estimates would be highly sensitive to variations in the data and model parameters (15). Indeed, high sensitivity prevented Flaxman et al. (1), who had a smaller dataset, from disentangling NPI effects (9). In contrast, our effectiveness estimates are substantially less sensitive . . . Finally, the posterior correlations between the effectiveness estimates are weak, further suggesting manageable collinearity.
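
The collinearity check quoted above comes down to counting the country-days on which one NPI of a pair is in force and the other is not. The Python below is a minimal sketch of that bookkeeping. It assumes that "one without the other" means either order (an exclusive or) and that an NPI stays in force once introduced. The schedule, the NPI names, and the function names are all hypothetical and are not taken from the paper's table S5.

```python
from datetime import date, timedelta
from itertools import combinations

# Hypothetical implementation dates; None means "never implemented".
# Simplifying assumption: once an NPI starts it stays in force until `end`.
SCHEDULE = {
    "Country X": {"gatherings_1000": date(2020, 3, 10),
                  "gatherings_100": date(2020, 3, 15),
                  "schools_closed": date(2020, 3, 12)},
    "Country Y": {"gatherings_1000": date(2020, 3, 12),
                  "gatherings_100": date(2020, 3, 12),
                  "schools_closed": None},
}


def is_active(start_date, day):
    """True if an NPI with this start date is in force on `day`."""
    return start_date is not None and start_date <= day


def days_one_without_other(schedule, npi_a, npi_b, start, end):
    """Count country-days on which exactly one of the two NPIs is active."""
    n_days = (end - start).days + 1
    total = 0
    for npis in schedule.values():
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            a_on = is_active(npis.get(npi_a), day)
            b_on = is_active(npis.get(npi_b), day)
            if a_on != b_on:  # exclusive or: one NPI observed without the other
                total += 1
    return total


def pair_summary(schedule, npi_names, start, end):
    """Per-pair counts plus their mean and minimum across all NPI pairs."""
    counts = {
        (a, b): days_one_without_other(schedule, a, b, start, end)
        for a, b in combinations(npi_names, 2)
    }
    values = list(counts.values())
    return counts, sum(values) / len(values), min(values)


if __name__ == "__main__":
    npis = ["gatherings_1000", "gatherings_100", "schools_closed"]
    counts, mean, minimum = pair_summary(
        SCHEDULE, npis, date(2020, 3, 1), date(2020, 3, 31))
    for pair, n in counts.items():
        print(pair, n)
    print(f"mean {mean:.1f}, min {minimum}")
```

With this toy schedule the three pairs come out at 5, 22, and 23 country-days. A real analysis would also have to handle NPIs that were lifted, which this sketch ignores.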

I don’t have anything really to say about their multilevel Bayesian analysis, because I didn’t look at it carefully. In general I like the Bayesian approach because the assumptions are out in the open and can be examined and changed if necessary.
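
To make "the assumptions are out in the open" concrete, here is one way the core of a semi-mechanistic model in this family could be written down. The sketch assumes that each active NPI multiplies the reproduction number by a factor exp(-alpha), so that R_t = R0 * exp(-sum of alpha_i * x_i,t). That form is my assumption about the model's structure, not code from Brauner et al. The function names and the alpha values are hypothetical, and the country-level draw only gestures at how "effects vary across countries" might be encoded.

```python
import math
import random


def reproduction_number(r0, alphas, active):
    """R_t = R0 * exp(-sum_i alpha_i * x_i), where x_i is 1 if NPI i is in force."""
    return r0 * math.exp(-sum(a * x for a, x in zip(alphas, active)))


def percent_reduction(alpha):
    """Percentage reduction in R implied by a single effect parameter alpha."""
    return 100.0 * (1.0 - math.exp(-alpha))


def draw_country_alphas(global_alphas, sd, rng):
    """Hierarchical step: each country's NPI effects scatter around the shared means."""
    # A normal draw can go negative, i.e. an NPI that raises R in that country.
    return [rng.gauss(mu, sd) for mu in global_alphas]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical shared effects for three NPIs (not estimates from the paper).
    global_alphas = [0.4, 0.3, 0.1]
    country_alphas = draw_country_alphas(global_alphas, sd=0.1, rng=rng)
    for active in ([0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]):
        r_t = reproduction_number(3.0, country_alphas, active)
        print(active, round(r_t, 2))
    print([round(percent_reduction(a), 1) for a in global_alphas])
```

Because the effects multiply, each NPI's estimated reduction is a fraction of whatever R remains after the others. This is one reason the order and overlap of NPIs matter so much when they are implemented close together.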

Also, I didn’t quite catch how they decided which countries to include in their analyses. They have a bunch of countries in Europe and near Europe, two in Africa, three in Asia, one in North America, and one in Oceania. No United States, Japan, or Korea.

The other thing that comes up with all these sorts of analyses is the role of individual behavior. In New York pretty much everything shut down in March. Some of this was due to governmental actions (including ridiculous steps such as the Parks Department padlocking the basketball courts and taking down the hoops), but a lot of it was just individual choices to stay at home. It’s a challenge to estimate the effects of policies when many of the key decisions are made by individuals and groups.

58 thoughts on “Inferring the effectiveness of government interventions against COVID-19”

  1. There is also the risk of dismissing a solution because a policy was poorly implemented or poorly communicated. Some have implied, with respect to mask mandates and other policies, that if an “intent to treat” study finds no effect, then the policy is wrong (e.g. “the baking temperature is as much a part of the recipe as the batter”). The problem is that the implementation can be modified to be more effective, something that “intent to treat” study boosters seem to forget.

  2. > (including ridiculous steps such as the Parks Department padlocking the basketball courts and taking down the hoops)

    Well, if everyone had made the individual choice to stay at home nobody would have noticed. So maybe it was not so ridiculous after all…

    • In my city they took down the hoops too. And although I have no idea if that was a good idea or not from a public health standpoint, I definitely was personally annoyed that I couldn’t go and put up some shots by myself. There really was not much to do so a safe and healthy outlet would have been nice.

  3. This kind of analysis does not explain how it happens that the infection rate in some area (like a U.S. state, say), rises up to a peak and then declines even though the area government has not instituted any orders such as shutdowns. An example would be the U.S. state of South Dakota.

    It also does not, at least in the chart reproduced in this blog post, say anything about mask-wearing or social distancing. And it does not seem to consider the degree of compliance with any official orders, such as maximum number of people in a gathering.

    So it’s hard for me to see how there’s much to take seriously here. OTOH, I haven’t read the actual article.

    • > This kind of analysis does not explain how it happens that the infection rate in some area (like a U.S. state, say), rises up to a peak and then declines even though the area government has not instituted any orders such as shutdowns.

      I think this point nicely illustrates Andrew’s comment about individual behavior; many folks, perhaps, took it upon themselves to stay at home or wear a mask. (I’m just speculating, of course.) However, even individual choice reflects the autocorrelative and hierarchical nature of the phenomenon. Even if a particular state or country doesn’t enact a particular policy, individual behavior could be driven by the policies implemented in nearby states or countries, or even across the broader region. In South Dakota, individual choices to stay at home could reflect the influence of policy in, say, Minnesota, and the fact that individuals hear and read about the guidance, unofficial though it is, put out by Dr. Fauci. It’s a kind of peer-pressure or keeping-up-with-the-Joneses scenario.

      • Assuming the primary driver of individual behavior independent of interventions is reports of how bad things are, maybe you could use unlagged actual case counts and death counts as a predictor.

      • Sure; I think the question is rather why cases have fallen in the last few weeks (without any measures being taken). Has individual behavior shifted, or is something else going on?

  4. A quick scan fails to pick up any measures of compliance or voluntary changes in behavior.

    A colleague mentioned this week that they were using cell phone mobility data to try to sort this out: in some areas there was a ~70% reduction in mobility in the spring, whereas this fall it was ~20%. This is preliminary, with no details, and likely not publicly available yet, but it does suggest that studies based on the data this group has are dead on arrival.

  5. “No United States, Japan, or Korea.”

    Well, it would be hard to analyze the United States as a unit. We have no national policy, and mitigation mandates have varied widely not only between states but even within them (e.g. some states refuse to close bars and indoor dining, or to require masks, but let local jurisdictions do so if they wish).

  6. Another difficulty (related to Andrew’s point) is how individuals respond to rising cases/deaths. These NPIs are usually introduced in response to big increases in cases, but it’s also possible that people’s behavior responds to those increases directly, and the timing of individuals’ responses coincides with the timing of when those restrictions come into place. This also relates to the composition of regions – jurisdictions that are more likely to implement NPIs might also be composed of more people who are personally sensitive to rising infections.

  7. As a continuation of a thread from last week, I wonder whether there is any way to implement particular NPIs (i.e. experimentally) in order to facilitate ex post analysis. That’s just a question; I don’t have the background to try to answer it.

    As with vaccine rollout, the politics of doing something like this would be dicey in the extreme, but that’s a separate matter.

  8. Much of this discussion is based on counterfactual arguments that assume an ability to assess what would have likely happened if things had been different.

    Such arguments, IMO, require a very high quality of evidence that we don’t yet have (if we ever will).

    Not to say the analysis shouldn’t be done; it should be. We need to try to assess the efficacy of interventions. But for me it’s kind of beside the point right now.

    Policy to address COVID, IMO, is primarily about hedging against extreme downside risk. It’s about decision-making in the face of high uncertainty.

    • >>Policy to address COVID, IMO, is primarily about hedging against extreme downside risk.

      Is that *still* true, this late in the game? I think there is a lot less uncertainty now, and both the very high-impact (e.g. 2.2 million deaths in the US) and very low-impact (e.g. 0.1-0.2% US IFR) scenarios have been pretty thoroughly ruled out.

      We also have a vaccine EUA, which suggests more action in the short term (if it was really going to be 4-5 years, keeping measures going that long would be unworkable).

      On the other hand, in much of the US there is just no political will for any new mandatory measures, and they wouldn’t be followed/enforced if imposed.

      • Confused –

        > Is that *still* true, this late in the game?

        The window is closing, for sure. So sure, the extent of the downside risk is somewhat attenuated.

        At least some of the downside risk is already baked into the cake and unavoidable even with the vaccine and no matter how effective interventions might be. But theoretically, particularly absent evidence of how many people will get vaccinated, and absent evidence of whether the vaccines prevent infection (as opposed to just limit severity), seems to me there’s still a pretty significant range of uncertainty about the potential risk.

        Interventions today could theoretically prevent hospitals from being overwhelmed from some 20 days hence up until the number of people getting vaccinated makes a serious dent in the rate of spread. How long will that take?

        > On the other hand, in much of the US there is just no political will for any new mandatory measures, and they wouldn’t be followed/enforced if imposed.

        Prolly. But isn’t there still the theoretical question to be answered?

        • >>The window is closing, for sure. So sure, the extent of the downside risk is somewhat attenuated.

          At least some of the downside risk is already baked into the cake and unavoidable even with the vaccine and no matter how effective interventions might be.

          Hmm, ok, I guess I somewhat misread your point.

          I was thinking of “extreme downside risk” in the sense that “COVID turns out to be much more deadly than we thought” not “more people get infected”.

          Given that some places in South America have seen 60% or even 70%+ infection rates I think we have a fairly clear picture of what an uncontained outbreak looks like (though admittedly US populations skew older … but medical care availability is also better) and 1918-level conditions seem clearly out of the possible range.

          >> But theoretically, particularly absent evidence of how many people will get vaccinated, and absent evidence of whether the vaccines prevent infection (as opposed to just limit severity), seems to me there’s still a pretty significant range of uncertainty about the potential risk.

          I guess in theory. Personally, though, I think the window is closing even for that. IMO in a year most people will either have been vaccinated

          >>Prolly. But isn’t there still the theoretical question to be answered?

          Sure — but if it is a theoretical question rather than a practical one of what to do now, I’d rather wait until the pandemic is over and its entire course can be looked at, and understanding has matured a bit/time pressure on the science is gone.

          I have a feeling that there are too many factors for there to be good, quick answers. Some places (e.g. Sweden) have curves that look like classic seasonality (spring and fall peaks); others (southern US e.g. Arizona and Texas – peaks in July and December don’t look like any kind of seasonal pattern).

        • > I have a feeling that there are too many factors for there to be good, quick answers.

          That has not stopped you in the last months from giving answers to many questions (often not good, so I guess it supports your point).

          Like your confident prediction in September that “a death peak higher than March/April just isn’t going to happen.”

        • > (often not good, so I guess it supports your point).

          Ouch.

          I will say that confused usually comes back to acknowledge errors and overconfidence, although I’m not sure how much s/he learns from experience.

          It’s interesting how difficult it can be for people on the Interwebs to learn that you can’t decouple infections from deaths.

        • IE – in September, I thought that the high cases/deaths in July/August in the southern US had demonstrated seasonality wasn’t much of a factor for COVID, therefore seasonality shouldn’t be suppressing COVID in the summer and there shouldn’t be a resurgence beyond that related to holiday gatherings in the winter.

          Now it really does look like there is serious seasonality… maybe the southern US is too low-latitude for that to be as significant? But winter is still flu season in places like Texas… It is weird.

        • I admit I was wrong about that.

          The reason I *now* think much useful can’t be said yet, is because I don’t see that any *simple* explanation can account for / reconcile…

          1) very low cases/deaths in Sweden and the Great Plains US states (Dakotas, Nebraska, etc.) in the summer, in the absence of strong measures
          2) current high cases/deaths in both Sweden and (most of) the US
          3) midsummer and (current) late fall/early winter peaks in the southern and southwestern US
          4) no high per-capita mortality rates in Africa or Asia (with Western Asia having a few moderately high mortality rate nations, e.g. Iran, but none in, say, the top 20)

          1) and 2) sound like typical seasonality, but 3) is a radically different pattern. 4) seems even more mysterious given the huge disparities in forms of government, resources, etc. available.

        • What I’m seeing more and more is people saying that lower attack rates amidst interventions and mask-wearing proved that a “herd immunity threshold” had been reached sooner than thought.

          And then when people start relaxing the interventions and social distancing less, attack rates go back up and that proves that the interventions didn’t work.

          Suddenly the “herd immunity” becomes un-herd immunity.

        • The difference in September for me was that we hadn’t seen fall spikes yet in Europe/NE US, so I thought July-August spikes in many parts of US (TX/FL/AZ) etc. had “disproved” seasonality (traditional respiratory virus-style, anyway).

          I think it’s now much harder to reconcile the Europe/US fall resurgence (implying significant seasonality), including in Sweden where restrictions don’t seem to explain low summer numbers, with the high summer surges in some of the *same* parts of the US.

          IE – if only the northern US was surging now, we could say that TX/FL/AZ had a different seasonal pattern than higher latitudes. But surges in both July/August and December don’t look like any form of seasonality.

          Yet without strong seasonality, how can low cases in the summer in some northern places without strong restrictions (Sweden, Great Plains states in the US) be explained?

          @Joshua: I think you are largely right, but I don’t think behavior/interventions alone can explain the observed patterns, given the low cases in summer and high cases in fall in places where not much action was taken (the Dakotas, Nebraska, Sweden, etc.)

          I’m also not sure that can explain the “longitude” pattern – *all* the high per-capita death rates are in the Americas and Europe.

          Measures taken can reasonably explain low rates in developed, coastal East Asia (Japan, Singapore, South Korea, etc.) – but not, I think, *all* the other nations of Africa and Asia with a huge variation in economic development, government types, infrastructure, culture, etc.

          There actually seems to be a sort of “line” at more or less the divide between the Mediterranean watershed and cultural sphere and the South Asian one — Iran and some Eastern European countries have high per-capita death rates, but nothing east of Iran does.

        • IE – granted that it can’t be herd immunity (since there was resurgence in fall/winter), what *does* explain the lower numbers in summer, in places where behaviors didn’t change noticeably between summer and fall (and the surge started well before Thanksgiving, so it can’t just be holiday gatherings)?

          If it’s seasonality, why do places like Florida, Arizona, and Texas have such a “non-seasonality-like” curve?

          I think there is pretty strong evidence to rule out all the “simple” models (purely behavior-driven, purely seasonality-driven, herd immunity in lots of places).

        • @Carlos Ungil: I can’t read that NYT article, but I think “herd immunity” is being misused anyway. I think a lot of people talking about it really mean the transition from “pandemic/epidemic” to “endemic” levels.

          Unfortunately, I’m not sure how well understood that transition actually is. We didn’t have the ability to really track what the virus was doing in the past (and 2009 was no worse than a mild seasonal flu virus to start with).

          But clearly, historically, *something* happens to end these respiratory pandemics, even pre-vaccine ones like 1889 and 1918, within a few years. I don’t think it is actually “herd immunity” though.

        • confused –

          I think there’s enough going on to speculate that it’s complicated, and multi-factorial, with a lot of interaction effects and mediators and moderators.

          > Yet without strong seasonality, how can low cases in the summer in some northern places without strong restrictions (Sweden, Great Plains states in the US) be explained?

          I read that in Sweden, behavioral changes might have come into play via people traveling to remote areas on vacation.

          > I think there is pretty strong evidence to rule out all the “simple” models (purely behavior-driven, purely seasonality-driven, herd immunity in lots of places).

          Sure. No simple models – like latitude.

          A big thing for me is that if you’re going to impute causality based on observing a correlation, you should come to the table with a plausible mechanism of causality. By what mechanism would latitude be causal?

        • Do you mean my comment about longitude? (Latitude, I think, plausibly *could* explain a classic seasonality pattern being observed in subarctic Sweden but not subtropical Florida/Texas.)

          I’m not claiming longitude is causal – I am just using that to describe a pattern: all nations with high per-capita fatality rates* are in the Americas or Europe/western Eurasia (depending on whether you count Iran’s moderately high one).

          I don’t have a good causal explanation of it – nor have I really seen much discussion of it. (There’s been a ton of discussion about successes in East Asia – South Korea, Japan, etc. – and Australia, but not so much about the rest of Asia and Africa.)

          *India has a high absolute number of deaths, but its per-capita is lower than Western countries considered to have done relatively well, such as Canada or Germany.

        • You probably need at least three or four things to really explain it…

          – behavior/interventions

          – seasonality (which *may* be driven by behavior; has anyone really resolved whether usual cold/flu seasonality is driven by behavior changes e.g. schools and holidays or whether it’s related to temperature/humidity/sunlight/vitamin D?)

          – % already infected

          – whatever explains the divide between tropical Americas (largely high death rates) vs tropical Asia/Africa (universally low per-capita death rates); it seems to cut across too many different governments/cultures to be a purely behavioral divide

        • confused –

          I feel like we’ve been down this road quite a bit. Seems to me you’re hunting out correlations (patterns) and squeezing in notions of causation without actual plausible explanatory mechanisms. There’s an infinite set of patterns. Selectively picking out certain ones is suggestive of an expectation of causation. Of course that’s natural. Part of human nature. But I feel that’s why you’ve made a string of errors in getting out in front of the evidence to speculate.

          Anything is possible, of course.

        • “squeezing in notions of causation”

          I am perhaps not phrasing this well. By “seasonality” I don’t necessarily mean a specific weather-influenced cause, but “whatever causes regular cold/flu seasonality” (I think there is still significant uncertainty as to whether this is weather/sunlight/vitamin D/etc. driven or behavior pattern driven or both).

          Even avoiding notions of causation – the difference between the pattern in some parts of Europe, including Sweden, and the Northeast US with clear spring and fall-to-winter peaks, and the southern tier of US states with a mid-late summer peak/surge and a current (winter) one looks striking. The first looks vaguely like a typical respiratory thing, the second doesn’t at all.

          I think this pattern, and the drastic difference in impact in tropical America vs. tropical Asia and Africa, requires an explanation. I don’t know what that would be, as the fall/winter surge in the Northeast seems to have removed the only one that I had in summer (a non-seasonal one-large-surge-per-area “model”).

        • confused –

          > I was thinking of “extreme downside risk” in the sense that “COVID turns out to be much more deadly than we thought” not “more people get infected”.

          ? People getting infected means people dying.

          > Given that some places in South America have seen 60% or even 70%+ infection rates

          From what I’ve seen (particularly from random, not convenience sampling), it’s not likely that high. I think there’s waaaaay too much uncertainty to really have a good idea. At any rate what do you think the fatality rate was in Bergamo, or even more so some of the towns in the province that were hit the hardest? I think there’s too much uncertainty to have a really good idea. You need a different method of investigation, where you look at death certificates and canvass by doing interviews.

          >I think we have a fairly clear picture of what an uncontained outbreak looks like…

          Yeah, well I don’t.

          > I guess in theory. Personally, though, I think the window is closing even for that. IMO in a year most people will either have been vaccinated…

          Have been vaccinated or infected you mean? Most people? Hmmm. Totally depends on vaccination rates.

          > Sure — but if it is a theoretical question rather than a practical one of what to do now, I’d rather wait until the pandemic is over and its entire course can be looked at, and understanding has matured a bit/time pressure on the science is gone.

          Wait to do what?

          Wait to be confident of answers? Or wait to implement interventions? But weren’t you saying that all along, along with being consistently wrong, always on the low side, in terms of the pandemic’s trajectory? Being wrong is fine – but if it’s consistently wrong over an extended period of time, always in the same direction, then there’s prolly a glitch in your model.

          > I have a feeling that there are too many factors for there to be good, quick answers. Some places (e.g. Sweden) have curves that look like classic seasonality (spring and fall peaks); others (southern US e.g. Arizona and Texas – peaks in July and December don’t look like any kind of seasonal pattern).

          What does “seasonality” mean? There were behavior changes (people going on vacation and then returning home). Is that seasonality? They started opening up senior housing communities again, I believe.

          I dunno. Seems to me there’s a whole lot we don’t understand yet. Lots o’ places that were thought by some to be “burned out” or “saturated” saw 2nd waves. Where would a population fatality rate top off? I had people telling me that 0.085% was the top end. Smart people. Way smarter than me. They were quite certain. It’s almost 0.16% in Belgium. 19 countries are above 0.085%.

        • >>? People getting infected means people dying.

          What I meant is that in early March 2.2 million people in the US dying was (arguably) within the possible range (I tend to think not, but if you took the higher IFR values, maybe). Now, that seems clearly impossible given both that the IFR seems to be sub-1% in most of the US and the fact that vaccination is beginning (so the pandemic lasting say 3-4 years is probably not in the range of reasonable possibilities, if it ever was).

          >>From what I’ve seen (particularly from random, not convenience sampling), it’s not likely that high.

          Oh, ok, I’d thought that range for places like Manaus/Iquitos was fairly “solid” (as preliminary data go) – perhaps I misunderstood.

          >>At any rate what do you think the fatality rate was in Bergamo,

          Very high, but Italy’s population is extremely old + poor understanding of care back then meant they got the “worst of both worlds” (developed-world/elderly demographics without much benefit from better medical care), IMO.

          >>Have been vaccinated or infected you mean?

          Yes, that was a mistake. (I actually meant to write “have been vaccinated or exposed”, to avoid the issue of, say, someone’s innate immune system defeating the virus before they become contagious & they don’t develop a serology-measurable level of antibodies — does that “count” as “infected”?)

          >>Totally depends on vaccination rates.

          Not “totally”, I don’t think; given US behavior if vaccination rates were really bad I’d expect most people to be exposed before a year from today.

          >>Wait to be confident of answers?

          Yes, that’s what I meant — understanding what is happening in a broader sense, vs. making rapid decisions.

        • > Now, that seems clearly impossible given both that the IFR seems to be sub-1% in most of the US and the fact that vaccination is beginning (so the pandemic lasting say 3-4 years is probably not in the range of reasonable possibilities, if it ever was).

          I dunno about 2 million… but do we need to reach that number for the damage to be extreme?
          What if hospitals get overwhelmed and IFR, as a consequence, increases markedly?

          Check out at the end what Mina has to say about vaccines. Also, mutations (there’s talk of a mutation in the UK that is complicating policy implementation) could complicate the picture.

          https://nymag.com/intelligencer/amp/2020/12/how-rapid-antigen-tests-could-end-the-pandemic-within-weeks.html?__twitter_impression=true

          > Oh, ok, I’d thought that range for places like Manaus/Iquitos was fairly “solid” (as preliminary data go) – perhaps I misunderstood.

          I went down a Twitter rabbit hole the other day where a fairly credible analysis questioned the high seropositivity from blood donors in Manaus, saying that another random sampling came up with a lower estimate. Can’t find it again. At any rate, I’m just an internet epidemiologist, but I don’t have much faith in seropositivity estimates.

          > Differential carefulness/risk-taking by age might also be a significant factor. Purely hypothetically and based on no real data, I wonder if that is why places in the Midwest like the Dakotas that did practically nothing peaked at notably lower hospitalizations-per-capita than New York? That seems like a very odd data point…

          Prolly too early to draw conclusions about “peaks.”

        • >>Prolly too early to draw conclusions about “peaks.”

          There’s definitely a peak in the Dakotas and Nebraska… that doesn’t mean there won’t be another peak *later*, but the curve of cases or hospitalizations by time absolutely shows a peak.

          Even if there is another peak later, I don’t think that removes the need for some explanation of the current decrease.

        • Manaus – maybe. But it seems to me also hard to believe that with COVID being as infectious as it is, no place has reached that level; precautions aren’t being taken seriously in lots of places.

          >>I dunno about 2 million….but do we need to reach that number to be an extreme damage?

          Well, I was thinking you meant “extreme downside risk” in context of bad historical pandemics, so I was thinking you meant something much worse than current COVID IFR estimates could allow for. Current COVID is quite bad enough, of course; the phrase “extreme downside risk” just sounded to me to imply something more ‘apocalyptic’. I guess I misunderstood.

          Mutations – maybe. A Wall Street Journal article today claimed that the England mutation may be *less* dangerous, though more transmissible (which IMO would make some evolutionary sense!) Worse mutations are possible too, I suppose… though I think COVID actually gets a lot of benefit spread-wise by the majority of cases being mild. If symptoms were *usually* incredibly awful there’d be fewer people out spreading it…

        • Having, it would seem, the worst first wave world-wide is not protecting Manaus from the second wave:

          “At the height of its first wave of Covid-19 cases in April last year, the city buried at most 142 citizens each day — a record that has now been broken every day since Jan 10.”

          https://www.ft.com/content/9cd1d3dd-c9b5-424b-a84c-30b788a15ed1

          Three-quarters attack rate of SARS-CoV-2 in the Brazilian Amazon during a largely unmitigated epidemic

          https://science.sciencemag.org/content/371/6526/288

          Herd immunity by infection is not an option

          https://science.sciencemag.org/content/371/6526/230

        • Joshua,

          I have no opinion. I’ve not even read thoroughly the paper or the critique. I noticed that you had linked to some counterarguments, though, so I hedged my comment with the “it would seem” :-)

        • Carlos –

          Thx. If you do take the time, I’d be curious to know what you think. You seem like a smart fella and I’ve come to enjoy reading your thoughts.

        • As for “where would a population fatality rate top off,” I don’t think there is or can be one globally applicable answer. It depends on both the age structure of the population and the availability of medical care (and presumably the prevalence of other underlying health issues), and presumably herd immunity thresholds would be different in places with different mixing patterns, and…

          New York City will probably be at 0.3% or more when everything is over (I think it was 0.25% just from the spring wave); there are probably tropical countries with very low median ages that wouldn’t reach that if every single person was infected.

          Differential carefulness/risk-taking by age might also be a significant factor. Purely hypothetically and based on no real data, I wonder if that is why places in the Midwest like the Dakotas that did practically nothing peaked at notably lower hospitalizations-per-capita than New York? That seems like a very odd data point…

  9. We (the CDC) did some modeling on the US side of this.

    A relevant quote:
    “School closures and stay-at-home orders were associated with Rt reduction but the large changes in mobility over time cannot be explained by the four NPI we modeled alone. External factors played a larger role as evidenced by the large national time trends. Individuals may have changed their mobility behaviour in response to perceived risk, guidance from community and faith-based organizations, employers providing opportunities to telework, city/county government actions, and media coverage of the pandemic including measures being taken by other countries and states”

    https://arxiv.org/pdf/2007.12644.pdf

  10. The Flaxman et al. paper is so bad that it borders on scientific fraud; see https://necpluribusimpar.net/lockdowns-science-and-voodoo-magic/ for a detailed discussion of their methods. If I understand the criticism in the linked blog correctly, the analysis of Flaxman et al. is set up in a way that whatever intervention is done last ends up with a disproportionate weighting in terms of its calculated “effect”. The “borderline scientific fraud” part is that their model gives a country-specific weight for lockdowns, which they report for all countries except Sweden, and Sweden (which famously had no real lockdown) has the highest country-specific weight for the effectiveness of its ‘lockdown’.

    • Nathaniel:

      I read the linked post by Lemoine and I can’t understand why you’re so harsh regarding Flaxman et al. I agree with several of Lemoine’s points, and I think Flaxman et al. would agree too: the model is indeed a variant of SEIR, there are indeed many real-world complexities that are not included in the model, in reality all coefficients should vary across countries and over time—this involves the usual compromises that are made when fitting statistical models to data—any measured changes are correlational, not necessarily causal, policies vary across counties, there’s lots of variation in behavior and exposure within any country, and it’s impossible to pin down the effects of policy in a setting where people are continually changing their behavior on their own. That last issue is key, and I see this even with my own behavior: back in March we were so scared that we were bringing wipes with us when we went outside so as to swab the supermarket carts. Now we’re not doing that, but there are people outside wearing masks while jogging. The number of human contacts we make is somewhat under our control. There was all sorts of fuss about Thanksgiving gatherings, but you can end up in close proximity to people in a simple subway ride.

      I think the way to interpret results from this sort of model is as statistical averages. I went back to the Flaxman et al. paper and this is what they say in their abstract:

      We study the effect of major interventions across 11 European countries for the period from the start of the COVID-19 epidemics in February 2020 until 4 May 2020 . . . Our model relies on fixed estimates of some epidemiological parameters (such as the infection fatality rate), does not include importation or subnational variation and assumes that changes in Rt are an immediate response to interventions rather than gradual changes in behaviour. . . . We estimate that—for all of the countries we consider here—current interventions have been sufficient to drive Rt below 1 (probability Rt < 1.0 is greater than 99%) and achieve control of the epidemic. . . . Our results show that major non-pharmaceutical interventions—and lockdowns in particular—have had a large effect on reducing transmission. Continued intervention should be considered to keep transmission of SARS-CoV-2 under control.

      This all seems reasonable to me, except for near the end when they say, “Our results show . . .” They really should’ve said, “Our fitted model implies . . .” or “Our model estimates . . .” or “The data are consistent with the conclusion that . . .”

      Other than that, they’re pretty clear. Indeed, right in the abstract they state that their model (a) “relies on fixed estimates of some epidemiological parameters,” (b) “does not include importation or subnational variation,” and (c) “assumes that changes in Rt are an immediate response to interventions rather than gradual changes in behaviour.” These are all points that Lemoine makes, too, but it would help if he were to clarify that these are not any kind of secret—Flaxman et al. say them all right in their abstract!

      Now, a flaw is a flaw. Again, the model is far from perfect, and stating a flaw in the abstract of the paper does not suddenly then make the model correct. My point here is that no sleuthing was needed to find these problems. The problems in the model are there for the usual reasons, which is that there’s a limit to what can be learned from any particular dataset. Policies and behaviors are entangled, and so any estimate will be some sort of average. Again, that’s implied by the Flaxman et al. abstract but I do wish they’d not said “Our results show” near the end.

      The other thing Lemoine says is that they’re hiding data from Sweden. I doubt that “they swept that fact under the rug.” They had tons of results and I’m not sure how they decided what to include in the paper. Lemoine writes: “What I don’t understand, or would not understand if I didn’t know how peer review actually works, is that no reviewer asked for it.”

      I’ve seen a lot of peer review over the years, and I think the best way to summarize peer review is that (a) it tends to be results-oriented rather than methods-oriented (even when reviewers criticize methods it’s often because they don’t believe the claimed results) and (b) it’s random. A paper can have obvious flaws that don’t get caught; other times, reviewers will obsess over something trivial. In this case, I expect that (a) the reviewers believed the result so they wanted to see the paper published, and (b) they probably did suggest some alternative analyses, but I’m guessing that the journal was also putting lots of pressure on the authors to keep the paper short. I’m not sure what restrictions were on the supplementary material, or why they didn’t include a few hundred more graphs there. But I really really doubt that they were trying to hide something. I was giving them some advice when they were doing this work—I’m not sure if my advice was about this particular paper, but in any case I’ve known Flaxman for a while, ever since he spent some time working with our group a few years ago. What was happening is they were acutely aware of the statistical challenges in estimating the effects of policies in this observational setting, also they were acutely aware of the flaws in their model—which is why they mentioned these flaws right in their abstract—and they were doing their best. If they really thought the country-specific effect for Sweden didn’t make sense, I think they would’ve highlighted this point, not hidden it.

      In any case, if you take out some of the speculations and rhetoric, I think it’s good that Lemoine did simulations, as that’s a great way to understand what a model is doing. As I wrote a few months ago, the model is transparent, so you should be able to map back from inferences to assumptions.

      • While the implication of conscious malpractice is perhaps overblown and certainly unverifiable, the Sweden issue is a major flaw in the paper. Whether Flaxman et al. realized it or not, the uniquely huge country-specific factor for Sweden is totally implausible. (It makes sense that they didn’t realize it, though; the flaws in our work are often invisible to us.)

      • Thank you Andrew for taking the time to respond in such detail, and merry Christmas!

        I’d also like to walk back my previous comment a little: I should not have conflated several criticisms. In particular, I think I was too quick to use the f-word (“fraud”). I apologize, and since there is no way to edit comments, I hope anyone reading considers this an update to my previous one.

        That said, there is at least a moderately strong case that at least one of the authors deliberately hid a major flaw in the paper – that Sweden’s “country-specific factor” was too large. In Figure 29 of the supplementary material, the country specific effect is given for *all of the countries considered except Sweden*. I’d like to stress that the question is not about the supplementary material missing more graphs, but that a graph is missing the single data point that would put the conclusions of the paper into question. I don’t want to blame Flaxman for this – after all, the paper has over 10 authors – and it could well be that the co-author doing the supplementary material is to blame. It could also be a weird coincidence, I don’t know.

        That said, there are a number of structural issues with the model which just make it a bad model. Nature has just published a “Matters arising” that goes into the same points as my linked blog post at https://www.nature.com/articles/s41586-020-3025-y. In a politically charged situation, with a model whose flaws you are acutely aware of, you should not be writing “Our results show that…” and then give a policy suggestion. At the very least, say something like “In our toy model (which has a questionable relationship to reality)…”.

        It looks to me like a number of “garden-of-forking-paths” modeling choices were made, and I don’t think mentioning which forks were taken in the abstract absolves the authors of blame here. I don’t know what criterion the authors used to select the forks they took, but it seems to me that it was “low uncertainty in the results”. The effect is to give a false sense of certainty. A number of other strange modeling choices were made, such as including data for Sweden (which had no lockdown) in a way that suggested that it had a lockdown. As Sweden is well known for going a different path from most countries in the first wave, it seems a little strange that the authors do not use Sweden as a test of their counterfactual.

        I agree that Flaxman et al. do some things right. It is great that they published their code/data and also that they at least mention that their model has limitations in the abstract. However, in terms of epistemological value on how to think about lockdowns, their paper is completely useless in my opinion because it (inadvertently or not) finds that whatever intervention was done last has the greatest effect, and we already know what intervention was done last so the paper tells us nothing new.

        • > However, in terms of epistemological value on how to think about lockdowns, their paper is completely useless in my opinion because it (inadvertently or not) finds that whatever intervention was done last has the greatest effect, and we already know what intervention was done last so the paper tells us nothing new.

          Why would any attempt to model something like this be “completely useless?” Some say all models are wrong and some are useful. But that means that models that are wrong can be useful, because they can be used to open up the dialog. IMO, what is effectively useless is when people reach beyond a model’s value to close down the dialog, either by overstating its generalizability or by dismissing it outright without appropriately acknowledging stated caveats, error ranges or confidence intervals, etc.

        • Actually I’m happy to strengthen my statement from “completely useless” to “considered harmful”. In a situation where there just isn’t any good data, any attempt to fit a model that says otherwise makes it look like there is knowledge that doesn’t actually exist.

          Here’s a mathematical example: assume you have a function f : X -> X. Now assume that you know the value y = f^10(x) = f(f(f(…(f(x))))) up to some measurement error eps. You now “model” this to solve for x by applying a (right) inverse of f ten times. If the derivative of f is sufficiently large (e.g. think of f defining a chaotic dynamical system), your reconstruction using your “model” could be completely off: after all, points from essentially anywhere in the domain might end up in an eps-neighborhood of y after 10 iterations. If you’re trying to find out which particular region of X your point x was in, your model is completely useless because your data doesn’t tell you anything.
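The f^10 thought experiment can be made concrete with a small sketch. Assumptions, none of which come from the comment itself: f is taken to be the logistic map f(x) = 4x(1 - x) on X = [0, 1], a textbook chaotic map; the measurement error is eps = 0.02; and instead of inverting f analytically, a brute-force grid search (the hypothetical helper `consistent_starts`) lists every starting point whose 10th iterate lands within eps of the observed y. It is a toy illustration of non-identifiability, not a model of epidemic data.

```python
"""Toy version of the f^10 inversion argument.

Assumption: f is the logistic map f(x) = 4x(1 - x) on [0, 1], chosen only
because it is a standard chaotic map; the original comment names no f.
"""


def f(x: float) -> float:
    """One step of the logistic map with r = 4 (the chaotic regime)."""
    return 4.0 * x * (1.0 - x)


def iterate(x: float, n: int = 10) -> float:
    """Return f^n(x), i.e. f applied n times to x."""
    for _ in range(n):
        x = f(x)
    return x


def consistent_starts(y_obs: float, eps: float, n: int = 10,
                      grid_size: int = 200_001) -> list[float]:
    """Grid points x in [0, 1] whose n-th iterate lies within eps of y_obs.

    Given only a noisy observation of y, every returned x is an equally
    good "reconstruction" of the hidden starting point.
    """
    step = 1.0 / (grid_size - 1)
    return [i * step for i in range(grid_size)
            if abs(iterate(i * step, n) - y_obs) < eps]


if __name__ == "__main__":
    x_true = 0.3               # hidden starting point
    y_obs = iterate(x_true)    # what we "observe" after 10 iterations
    eps = 0.02                 # assumed measurement error
    candidates = consistent_starts(y_obs, eps)
    print(f"{len(candidates)} grid points are consistent with the observation")
    print(f"they range from {min(candidates):.4f} to {max(candidates):.4f}")
```

With x = 0.3 as the hidden start, the consistent candidates are spread across almost the whole interval, from near 0 to near 1, even though the true x is among them. Because f^10 for this map folds [0, 1] onto itself many times over, a small error band around y pulls in points from every part of the domain, which is the commenter's point.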

          I think of models like Flaxman et al. in the same way. There just isn’t any real information to be gleaned just from death numbers in different countries. Any good model should tell the truth, which is “we really don’t know what caused what”. A bad model, however, takes numbers of deaths in countries and timing of various measures, and concludes “lockdowns work” or “lockdowns don’t work”.

          I am not in the camp that all models are useless, but I am in the camp that says that the Flaxman et al. model is useless. I’m not saying this out of principle, but for reasons I have linked to elsewhere, including the critique published by Nature. The model is useless *because* of the stated (and unstated) caveats. I don’t see why saying that the model is useless should be seen as “closing down the dialog”. The dialog is already open; if anything, papers like Flaxman et al. try to close it by making strong claims that are not really supported by the data.

        • To correct my “missing the single data point” claim: it seems I was slightly confused. Figure 29 is missing Sweden because Sweden did not have a lockdown, so what is missing is in fact slightly more than a single data point. I still think the issue is serious, but the case for it being deliberate is weaker than I initially claimed; my bad.

          In Flaxman’s recent rebuttal to some criticism at https://www.nature.com/articles/s41586-020-3026-x, country-specific effects for Sweden are included, which is good.
