Tracking R of COVID-19 & assessing public interventions; also some general thoughts on science

Simas Kucinskas writes:

I would like to share some recent research (pdf here). In this paper, I develop a new method for estimating R in real time and apply it to track the dynamics of COVID-19. The method is based on standard epidemiological theory, but the approach itself is heavily inspired by time-series statistics. I use Stan to estimate the model with Bayesian methods. It’s such a fantastic tool.

I provide an online dashboard where one can compare multiple countries and track the development of R over time (bit.ly/2KiPj9s). Here is a graph of the current estimates of R for the world as a whole:

I [Kucinskas] also use the estimates to take a first pass at evaluating the effectiveness of public health interventions. Here is what the estimates of R look like in the 4 weeks around the imposition of a lockdown (sample of 13 European countries):

There are obviously many caveats, and one should not over-interpret such evidence. But I [Kucinskas] find the graph quite striking.

I’ve only glanced at the paper so I’m not endorsing (or criticizing) its conclusions. The important thing from a statistical standpoint is that the assumptions and methods are transparent, so that if the ideas here are useful, they can be judged by experts and then used as components in other people’s models.

I’m always saying how statistics is the science of defaults. But there is a more general sense in which all of science is the science of defaults. In the exploration-exploitation tradeoff, we should just about always consider ourselves in the exploration stage. The value of a study is almost always in how it gives clues to allow us to do better studies in the future.

P.S. The y-axis should go exactly to zero on both graphs. On the top graph, the y-axis goes below zero, which makes no sense; on the bottom graph, the y-axis has a hard stop at 1, which doesn’t make sense either.

23 thoughts on “Tracking R of COVID-19 & assessing public interventions; also some general thoughts on science”

  1. These sorts of methods seem extremely common but extremely stupid. No one could possibly think R changed smoothly over several weeks when there was a huge step change in policy. Why on earth do people insist on modelling it this way? Not everyone does, but many people do.

    • Don’t be so modest, James:

      > Abstract. We present a simple operational nowcasting/forecasting scheme based on a joint state/parameter estimate of the COVID-19 epidemic at national or regional scale, performed by assimilating the time series of reported daily death numbers into a simple SEIR model. This system generates estimates of the current reproductive rate, Rt, together with predictions of future daily deaths and clearly outperforms a number of alternative forecasting systems that have been presented recently. Our current (14th April 2020) estimates for Rt are, respectively, UK 0.49 (0.0–1.02), Spain 0.55 (0.33–0.77), Italy 0.90 (0.74–1.06) and France 0.67 (0.40–0.94) (mean and 95% credible intervals). Thus, we believe that the epidemics have been successfully suppressed in each of these countries, with high probability. Our approach is trivial to set up for any region and generates results in a few minutes on a laptop. We believe it would be straightforward to set up equivalent frameworks using more complex and realistic models, and hope that some experts in the field of epidemiological modelling will consider investigating this approach further.

      https://bskiesresearch.files.wordpress.com/2020/04/operational.pdf

    • I dunno, there are explanations for how this step change in policy would lead to a smooth drop in R. For instance, it might take time for people to come into compliance (and some people might go into compliance in anticipation of the policy). Or there might be household transmission.

    • Zhou Fang’s comment makes sense. A lockdown is going to slice up your population so it’s much more heavily localized. That’s going to up the R0 in places that have infectious people and lower it in places with no one infectious, and you’re not going to hit the lower, steady-state R0 until the infected groups have hit local herd immunity (e.g. 100% infected), which should happen over what, an infection cycle or two? I’d expect a fairly smooth-ish drop over a couple weeks.

      That said, these numbers in the Kucinskas paper don’t look right at all.

      • Reducing contacts cannot increase the transmission rate.
        The transmission rate is not a function of the size of the infected group: 1 person infecting 3 and 10 people infecting 30 give the same R.

        • It absolutely can, by increasing the intensity and proximity of contact within a smaller group, and that’s what lockdown does. Spending 24 hours a day in a small apartment with 4 people, one of whom has it, can briefly raise the R0 among those 4 people while decreasing it in society as a whole.

    • Yes, it is amazing how many people are willing to blindly publish their model predictions without stopping for a second to think whether their predictions are clearly contradicted by the observed data!

      • Yeah, I’m having a tough time with this one. On page 12, it looks like it’s still estimating R0>1 for Italy as of ~4/18.

        Daily deaths in Italy peaked on 3/27. And ICU usage peaked on 4/4, dropping steadily since then (a little surprising it’s after deaths peaked, but whatever). I don’t see any way to square those with an R0 that is still above 1 all the way out to 4/18.

    • The Swedes say their rate of death outside of elder care facilities is the same as their neighbors’. This suggests that we measure R only crudely when deaths are a function of how far the virus has penetrated specific vulnerable populations.
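The first mechanism raised in the replies above (people coming into compliance gradually, some of them ahead of the policy) can be illustrated with a toy calculation. This is a minimal sketch under loudly hypothetical assumptions: each person's R switches from `r_before` to `r_after` the moment they comply, and the compliant share of the population follows a logistic curve centred on the lockdown date. The function name and every number are made up for illustration and are not estimates.

```python
import math


def population_r(day, r_before=2.5, r_after=0.7, lockdown_day=0, ramp_days=4.0):
    """Population-average R on `day` under a hypothetical logistic compliance ramp.

    Each person's R drops from r_before to r_after the moment they comply,
    but the compliant share rises gradually (and partly before lockdown_day),
    so the average R falls smoothly rather than as a step.
    """
    compliant = 1.0 / (1.0 + math.exp(-(day - lockdown_day) / ramp_days))
    return (1.0 - compliant) * r_before + compliant * r_after


if __name__ == "__main__":
    # Print the average R once a week, from 2 weeks before to 2 weeks after lockdown.
    for day in range(-14, 15, 7):
        print(f"day {day:+d}: R = {population_r(day):.2f}")
```

Individual behaviour changes as a step here, yet the population average declines over a couple of weeks, so a smooth R curve is at least compatible with a sharp policy change.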

  2. > On the top graph, the y-axis goes below zero, which makes no sense

    At least zero is clearly marked with a horizontal line. More than a flaw in the chart, it’s an issue with the model, acknowledged as such in the paper: the confidence interval (and even the point estimate) can go into negative territory.

    • The plot for China has a negative section for “number of days infections” >= 12.
      In general, changing this parameter seems to do little more than to rescale the y-axis.
      The model seems to be fundamentally unaware of the fact that R is a ratio that ought to scale geometrically with the serial interval.

      The simplest way to see if R is greater or less than 1 is to see whether the daily case rate is growing or shrinking. Your estimate of this is dependent on how you smoothed the data.

      Graph the natural logarithm of the daily new cases. R is approximately e to the power of (c times the slope of the graph), where c is the serial interval. Voilà, now your estimate only depends on how well you iron out the statistical errors in the case rate! (And it’s always > 0.) ;-)
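The log-slope shortcut in the comment above can be sketched as follows. This is a minimal sketch, assuming natural-log case counts, a fixed serial interval `c` in days, and the simplest fixed-interval relation R ≈ exp(r·c), where r is the daily exponential growth rate. The smoothing is just an ordinary least-squares line over a trailing window; the function names and the `window` parameter are hypothetical.

```python
import math


def growth_rate(daily_cases, window=7):
    """Least-squares slope of log(daily cases) over the last `window` days.

    Assumes every count in the window is positive (log of zero is undefined)
    and that the window holds at least 2 days.
    """
    ys = [math.log(n) for n in daily_cases[-window:]]
    xs = list(range(len(ys)))
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den


def r_from_slope(slope, serial_interval):
    """R ~= exp(r * c) for a fixed serial interval c; always > 0."""
    return math.exp(slope * serial_interval)


if __name__ == "__main__":
    # Synthetic series that shrinks by 5% per day (illustrative only).
    cases = [1000 * 0.95 ** t for t in range(14)]
    r = growth_rate(cases)
    for c in (4, 7):
        print(f"serial interval {c} d: R = {r_from_slope(r, c):.2f}")
```

Because R here is exp(r·c), changing the assumed serial interval does not rescale R linearly; it raises the same daily growth factor to a different power, which is the geometric scaling the comment refers to.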

  3. “Differently from these authors, we use data on the number of infected individuals, rather than the number of new cases.” I don’t quite understand this line: where does this data come from?

    As the author is affiliated with Humboldt University in Berlin, I am surprised at the complete lack of discussion of the Robert Koch Institute’s method for computing its official estimate of R, as described in “Epidemiologisches Bulletin 17/2020”. The RKI uses nowcasting to account for the lag in the registration of official cases (the German case data typically contain the date of symptom onset). The RKI estimates a mean time from infection to symptom onset of 5 days, and assumes that the majority of transmissions take place 2 days before that, which makes the mean serial interval 4 days. This value is crucial: the lower it is, the higher the estimate of R from case data. Simas’s online tracker does not allow this parameter, imprecisely labeled “infectious period”, to be set lower than 5, and therefore his model’s estimate is always lower than the official German estimate.
    Armed with this adjusted timeline of symptomatic cases, the RKI now simply divides cases(day_n-3 to day_n) by cases(day_n-7 to day_n-4), i.e. the total for the latest 4 days by the total for the 4 days before that, to arrive at its estimate of R for Germany, which is published in the daily situation report and currently stands at 0.9, with some regions still at R>1.

    The RKI’s assumption that there is little transmission after symptom onset rests on the consideration that symptomatic people are more careful in their behaviour than asymptomatic people.
    The value of R depends heavily on what we choose as the serial interval; it can’t stand on its own unless it is exactly 1.

    Why is the author’s method superior to the RKI method? Does its result differ if the “infectious period” is reduced to 4?
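The RKI ratio described in this comment can be written out directly. This is a minimal sketch, assuming `nowcast` is a list of daily case counts indexed by day that has already been nowcast by symptom-onset date, as the comment describes; the nowcasting step itself is not shown, and the function name is hypothetical.

```python
def rki_style_r(nowcast, n):
    """Ratio of cases on days n-3..n to cases on days n-7..n-4.

    `nowcast` is a list of daily case counts indexed by day, assumed to be
    already nowcast by symptom-onset date. Requires days n-7 through n.
    """
    if n < 7 or n >= len(nowcast):
        raise ValueError("need days n-7 through n in the series")
    recent = sum(nowcast[n - 3 : n + 1])    # days n-3, n-2, n-1, n
    previous = sum(nowcast[n - 7 : n - 3])  # days n-7, n-6, n-5, n-4
    return recent / previous


if __name__ == "__main__":
    # Synthetic onset series growing 10% per day (illustrative only).
    series = [100 * 1.1 ** t for t in range(10)]
    print(round(rki_style_r(series, 9), 3))
```

For a series growing by a constant factor q per day, this ratio equals q⁴, the growth over one assumed 4-day generation time. That is why the window length and the assumed serial interval cannot be chosen independently.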

    • Exactly. The thing that is observable in macroscale data is the growth rate. The growth rate is a ratio of a function of R and an average duration of transmission. Since the average duration of transmission is actually completely unknown (and if there are completely asymptomatic cases, it’s actually an average over a two-group mixture distribution with non-fixed proportions), it makes no sense to focus on Reff; what we really want is the growth rate, and we can pick that off the slope of the graph.

      Reff is basically dressing up a slope with some noise.

      • The average duration of transmission is not “completely unknown”. We also know that asymptomatic infected people who test positive by PCR have virus concentrations similar to those of symptomatic infected people, suggesting that their transmissibility is the same, although it would obviously extend for longer than it does for symptomatic infected people (unless they are isolated as high-risk contacts).

        • Yes, exactly, the transmitting people come from two distributions that are mixed together. One of them is a group that’s pretty well characterized: the symptomatic population, and one of them it seems to me is fairly uncharacterized: the asymptomatic population.

          If the duration of transmission is a random variable then it comes from a distribution that looks like

          p*Ds + (1-p)*Da

          Where Ds is a density function for symptomatic transmission and Da is a density function for asymptomatic transmission.

          But p is also a changing function of time, and Da is uncharacterized, so the distribution overall is maybe not “completely unknown”, but it has an enormous amount of uncertainty (i.e., Bayesian uncertainty about the shape of the distribution and about how this shape changes over time and, in particular, as a function of mitigation strategies).

          For example, there are asymptomatic people in China who are still testing positive almost a hundred days after “recovering”. Can they transmit? We don’t know.

          The distribution of this quantity is so uncertain that calculating R, which inherently requires taking an observed growth rate and multiplying it by the average of this transmission-duration quantity, is meaningless. It takes a perfectly well-resolved quantity, the growth rate r, and turns it into a ridiculously unresolved quantity r*tbar, where tbar is realistically any number between 3 and, say, 30.
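The argument in this thread can be made concrete. This is a minimal sketch, assuming the SIR-style relation R = 1 + r·D̄ (the kind of growth-rate transformation the author describes in his reply further down) as a rough stand-in for the thread's "r*tbar", where D̄ is the mean of the mixture p*Ds + (1-p)*Da. The growth rate, the mixing weight p, and the component means are hypothetical inputs chosen only to show the spread, and the function names are hypothetical too.

```python
def mean_duration(p, mean_sym, mean_asym):
    """Mean of the mixture p*Ds + (1-p)*Da, given the two component means."""
    return p * mean_sym + (1.0 - p) * mean_asym


def r_from_growth(r, mean_dur):
    """SIR-style relation R = 1 + r * D, with D the mean infectious duration."""
    return 1.0 + r * mean_dur


if __name__ == "__main__":
    r = 0.05  # hypothetical, well-estimated daily growth rate
    for mean_asym in (3, 10, 30):  # plausible-to-extreme guesses for the mean of Da
        d = mean_duration(p=0.6, mean_sym=5, mean_asym=mean_asym)
        print(f"mean of Da = {mean_asym:>2} d -> mean D = {d:4.1f} d, R = {r_from_growth(r, d):.2f}")
```

The growth rate is held fixed throughout; only the assumed mean duration moves, and R moves with it. Note that R = 1 exactly whenever r = 0, whatever the duration, which matches the earlier remark that R can only stand on its own when it equals 1.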

  4. > The value of a study is almost always in how it gives clues to allow us to do better studies in the future.
    Yup, the only assurance we get from induction is that, if it is persisted in, our misunderstandings will be discovered.

  5. Many thanks for all your comments & discussion — much appreciated!

    To answer a question that has come up a few times: why is it that for some countries new cases & deaths are falling while these estimates of R are still around 1? The short answer is that the current estimation procedure is a bit conservative and likely overestimates R later in the epidemic.

    Longer answer: currently, I am using data on new cases and recoveries (from Johns Hopkins) to construct a time series of how many people are infected with COVID-19 at a given point in time. The estimated R is then just a simple transformation of the growth rate of the number of infected individuals; the exact formula is provided by the SIR model. However, with COVID-19, people take a very long time to recover. So, in later stages of the epidemic, the number of infected individuals tends to be quite stable even though new cases are falling.

    The right way to deal with this, I think, is to construct the time series of infectious individuals differently: Assume that people are infectious for a given number of days and then use a Poisson approximation (now discussed in passing in Section 3.2 of the paper). The current estimation is somewhat inconsistent, since the benchmark estimates assume that people are infectious for 7 days (which is consistent with data on the incubation period of COVID-19), but in the data people take much longer than 7 days to recover.

    I have also figured out a simple way to ensure that the estimates and CIs are always non-negative, while still keeping the linearity of the local-level model for the growth rate.

    I plan to incorporate both changes, as well as some other additions to the applications part, very soon. Many thanks again for your feedback.
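The author's reply describes R as a transformation of the growth rate of the number of currently infected people, with a local-level model for that growth rate, and proposes building the infected series from a fixed infectious period. What follows is a minimal sketch of that pipeline, not the paper's code. The infectious count is a plain rolling sum of new cases over `infectious_days`, a simpler stand-in for the Poisson approximation mentioned in the reply. The filter is a textbook local-level Kalman filter with fixed, hypothetical noise variances `q` and `h` rather than ones estimated in Stan. R = 1 + g/γ with γ = 1/`infectious_days` is the standard SIR relation between the growth rate of infections and R, applied here to a discrete daily growth rate as an approximation. All function names are hypothetical.

```python
import math


def infectious_series(new_cases, infectious_days=7):
    """Number currently infectious, assuming each reported case stays
    infectious for exactly `infectious_days` days (a simplification)."""
    out = []
    for t in range(len(new_cases)):
        start = max(0, t - infectious_days + 1)
        out.append(sum(new_cases[start : t + 1]))
    return out


def growth_rates(infected):
    """Daily growth rate g_t = I_t / I_{t-1} - 1; assumes every I_{t-1} > 0."""
    return [cur / prev - 1.0 for prev, cur in zip(infected, infected[1:])]


def local_level_filter(y, q=0.001, h=0.01, m0=0.0, p0=1.0):
    """Kalman filter for the local-level model
        y_t  = mu_t + eps_t,      eps_t ~ N(0, h)
        mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, q)
    Returns the filtered means and variances of mu_t.
    """
    m, p = m0, p0
    means, variances = [], []
    for obs in y:
        p = p + q                # predict: the random walk adds variance q
        k = p / (p + h)          # Kalman gain
        m = m + k * (obs - m)    # update the mean with the innovation
        p = (1.0 - k) * p        # update the variance
        means.append(m)
        variances.append(p)
    return means, variances


def r_estimates(new_cases, infectious_days=7, z=1.96, q=0.001, h=0.01):
    """(R, lower, upper) per day via the SIR relation R = 1 + g / gamma."""
    gamma = 1.0 / infectious_days
    infected = infectious_series(new_cases, infectious_days)
    means, variances = local_level_filter(growth_rates(infected), q=q, h=h)
    out = []
    for m, v in zip(means, variances):
        half = z * math.sqrt(v)
        out.append((1 + m / gamma, 1 + (m - half) / gamma, 1 + (m + half) / gamma))
    return out


if __name__ == "__main__":
    # Synthetic reports: 30 days of growth, then 30 days of decline (illustrative only).
    peak = 50 * 1.08 ** 29
    cases = [50 * 1.08 ** t for t in range(30)] + [peak * 0.95 ** t for t in range(1, 31)]
    for day, (r, lo, hi) in enumerate(r_estimates(cases)):
        if day % 10 == 0:
            print(f"day {day + 1}: R = {r:.2f} ({lo:.2f} to {hi:.2f})")
```

Because the infectious count here is a rolling window rather than a slowly updated recovered count, it starts falling as soon as new cases fall, which is the behaviour the reply is after. Nothing in this sketch keeps the band above zero, so the lower bound can still go negative when the filtered growth rate is strongly negative, the issue raised in the second comment.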
