
How to track covid using hospital data collection and MRP

Len Covello, Yajuan Si, and I write:

The current way we track the prevalence of coronavirus infections is deeply flawed.

Ideally, health officials would test random samples of citizens in each community in a systematic way. But throughout the pandemic, the United States has lacked the political will or funding to pursue it. Instead, testing tends to ramp up when there are symptomatic outbreaks, which prevents the data from being representative of the population as a whole. Other factors compound that flaw: Outbreaks tend to cluster geographically and ethnically, for instance. What’s more, if a state tests more people, it will tally more cases, and if clinicians get better at identifying the disease, and test the people they think have covid-19, the test-positivity rate (another metric used to gauge viral spread) also increases.

Until we get to widespread random testing, we propose a second-best methodology that — our data shows — outperforms current practices as a predictor of future burdens on the health system. Our method involves testing asymptomatic people who are visiting hospitals for a broad range of outpatient procedures — diagnostic as well as surgical — and then adjusting the rates of positive testing to match the area’s demographics. Using data from the Community Hospital network in northwest Indiana, we analyzed the relationship between rates of positive test results for this outpatient population and covid-related admissions at all hospitals in the five Indiana counties the hospital network serves from late April into this month.

Our method predicted rises (or falls) in hospitalizations seven to 10 days before they occurred. In contrast, official state statistics for positive cases in the area lagged our data by roughly a week. State test-positivity rates — the proportion of tests given that were positive — were even less useful as a predictor of trends. In other words, switching to this new methodology more broadly could give policymakers at least a week’s jump on coronavirus trends, crucial time to prepare.

The idea for this new measure of coronavirus spread arose out of practices undertaken to reopen hospitals last spring after the initial shutdowns: Specifically, in April, Community Hospital in Munster, Ind. — where one of us works — was faced with the need to deliver surgical and diagnostic services to a patient population that obviously contained some infected but asymptomatic people. (Symptomatic people could be screened out more readily.) Only by testing everyone could the hospital be sure it wasn’t exposing staff and other patients to a potentially fatal illness. As a result, patients scheduled for procedures were required to say whether they had symptoms of covid-19 and contact with anyone with the disease. They also had to test negative for viral RNA four days before surgery. That regimen has continued.

As a side benefit, this protocol gave us an outstanding chance to measure viral prevalence. Demographically, these patients have an age, gender and racial-ethnic distribution that is similar to (though not identical to) the community at large. That they were asymptomatic was valuable, too: Testing them was more like testing people out and about at a local mall than like testing people who were already experiencing a fever and a cough. Overall, we collected information on 23,400 asymptomatic outpatients over the period we studied. (We also looked at the test results for more than 9,000 symptomatic outpatients, to explore how well they matched the demographics of the surrounding community; they did not match it well at all.)

The differences between the hospital population and the surrounding community were small but important: These patients are somewhat older and whiter, for instance. But statistical techniques allowed us to appropriately adjust what we found in the hospital setting. Since the hospital outpatient population contains fewer young people than the population at large, and younger people who are infected tend to be asymptomatic, we know our sample has fewer asymptomatic people than it “should.” We reweighted to account for that, and adjusted appropriately for other over- and underrepresented populations.
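The reweighting logic can be sketched as simple poststratification: average the positivity rate observed in each demographic cell, weighting each cell by its share of the surrounding community rather than its share of the hospital sample. A minimal sketch, with entirely hypothetical cells and numbers:

```python
# Poststratification sketch -- hypothetical numbers, illustration only.
# Each cell maps to (positivity rate observed in the hospital sample,
# that cell's share of the surrounding community's population).
cells = {
    "age 18-44": (0.04, 0.45),
    "age 45-64": (0.06, 0.35),
    "age 65+":   (0.08, 0.20),
}

# A naive sample average would overweight older patients, who are
# overrepresented among outpatients; instead, weight each cell's
# observed rate by the cell's population share.
adjusted = sum(rate * share for rate, share in cells.values())
print(round(adjusted, 3))  # -> 0.055, the community-level estimate
```

Full MRP adds a multilevel regression to stabilize the estimates in small or sparsely observed cells, but the population-weighted average above is the poststratification core of the adjustment.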

The method we used — called multilevel regression and poststratification — is simple, easily duplicated in any hospital system (they’re already doing the testing), and inexpensive. (To make implementation elsewhere easy, we have made the statistical demographic adjustment available online.) Our current work implies that the same methodology can also be used to keep track of antibody prevalence, both naturally- and vaccine-acquired, which will be important as the vaccines roll out.

At 3,500 deaths daily, a week’s improvement in forecasting — the advantage our model provides — is an eternity, so this new metric promises to be useful as a viral surge unfolds. And from a policy standpoint, it’s just as important to know when it is safe to open as when to shut down. Our model can help there, too. The new metric is clearly superior to population-wide case numbers and especially positivity rates in showing clinical decrease in the virus. In a recent six-week period, for instance, the area around Munster saw a dramatic downturn in covid-related hospitalizations and ER visits — a trend foreshadowed by our data. This occurred even though positivity rates recorded by the state remained fairly high. That pattern is almost certainly replicated in many areas around the country with persistently high positivity rates.

While Indiana’s policies have kept our restaurants and bars largely open during this interval, other presumably well-meaning state and local governments mandated closures of businesses like these under parallel circumstances. Throughout our country, overreliance on positivity rates to track virus behavior may be economically and socially damaging.

We should still strive for widespread random testing of the general population. But until we have the resources, we can leverage already existing hospital testing to get more reliable information about our adversary — and win this war.

The research paper that describes this work is here, and we also discussed it in this post from last month. I’m posting it again here because we’d like to influence policy, and the above Washington Post article (mostly written by Len) is more direct and accessible than our research paper.


  1. jarvis says:

    quote from above: The current way we track the prevalence of coronavirus infections is deeply flawed.

    AFAIK no U.S. government health officials agree with this statement; suspect very few people here agree that official prevalence stats are deeply in error.

    Random testing analysis of general hospital patients seems an extremely narrow and manifestly unrepresentative sample of the general population, plus very difficult to do effectively in the real world.

    • Andrew says:


      I’m not in contact with any U.S. government officials on this, so I can’t comment on that. But it’s well known that reported positive-test counts and reported positivity rates have problems because they depend so strongly on who gets tested.

      Regarding our data: we don’t have random testing of hospital patients; we have testing of all hospital patients. I agree that hospital patients are not representative of the general population. We adjust for demographics and geography, but those adjustments are not perfect. On the plus side, we test all the hospital patients so there’s no selection bias involving the choice to get tested. No data source here is perfect or even close to perfect, but we’ve found the analysis of hospital data to be useful.

      Regarding your final point: we do this effectively in the real world. At least, assuming that Indiana is part of the real world.

  2. Joshua says:

    Long, long overdue.

    True random sampling for covid would seem a near impossible task. Somewhat random sampling plus stratification adjustments would seem to be the next best (least sub-optimal) choice. It would certainly be nice if you could get this going across a wide variety of hospital settings. I don’t see why it should be very hard to get funding for something like that. Particularly given the importance of getting good sampling not only for this pandemic but also to inform future interventions.

    • Joshua says:

      What I’ve most been surprised by in the investigations so far are the attempts by certain researchers in particular to just treat obviously flawed convenience sampling as representative. That practice violates fundamental tenets of epidemiology (and the scientific method). It’s particularly surprising when it has been done by renowned epidemiological experts. I almost can’t believe that I’ve seen what I’ve seen. I keep thinking I must have gotten something wrong. But I haven’t as yet seen evidence that I have.

      • Lonni says:

        Not an expert here at all, but isn’t it just that the urgency of the situation meant testing had to be implemented at all costs, without much thought given to how to do it properly? Granted, the fact that it’s still a problem today is puzzling, but couldn’t it be more of a logistics problem than a methodological one?

      • Fred says:

        That and the willingness to accept fundamentally flawed research estimating the effects of policy through cross-country comparisons.
        If you took some time to look carefully at how the data is collected and defined, you would quickly realize that even cross-state comparisons in the US can be extremely problematic, not to mention the dozens of relevant confounders that would be impossible to control for.
        The really frustrating part is that there are academics I generally respect, who should know better, who are quick to point out the folly of these exercises when the results don’t match their priors, but still spread them when they like the results, with some hedging language (e.g. “flawed but interesting,” “study has limitations”).
        Obviously not a COVID-specific problem, but annoying nonetheless.

        • Joshua says:

          Fred –

          > That and the willingness to accept fundamentally flawed research of estimating effects of policy through cross-country comparisons

          I completely agree. Attempting to compare across localities with vastly different conditions and wide differences in variables that are highly predictive of health outcomes is a breeding ground for confirmation bias, IMO. I understand the temptation to do it – but it’s been hard for me to see so many skilled scientists give in to that temptation without the appropriate discussion of the limitations.

          Even the obvious need to account for the severity of precipitating conditions when looking to evaluate the efficacy of interventions, or the relative efficacy of interventions in association with their severity, has been ignored. What explains why skilled epidemiologists have attempted to evaluate the severity of outcomes in association with the severity of interventions without even discussing the obvious consideration that the more severe outcomes would necessarily be coupled to the more severe interventions because they were implemented in the locations where the pandemic was most out of control?

    • Joshua says:

      I am baffled why folks seem to overlook as a potential resource for analyzing data collected in hospitals…

      (Of course, publication likely would not be near as easy.)

  3. Zhou Fang says:

    Does this method work when the prevalence of covid in the population is in fact quite low, so the results are likely to be swamped by false positives? What about non-response bias?

    I’ve recently been looking at wastewater RNA testing in relation to case rates, and the correspondence isn’t too bad, actually. The one piece of systematic random testing data I could obtain seemed, in contrast, to be quite nonsensical, for reasons that aren’t entirely clear but probably have something to do with false positives or non-response.
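    The false-positive concern here can be made concrete with a quick Bayes calculation (all numbers below are hypothetical, for illustration): at low prevalence, even a fairly specific test yields positives that are substantially false.

```python
# Positive predictive value at low prevalence -- hypothetical numbers.
prevalence  = 0.005   # assume 0.5% of the tested population is infected
sensitivity = 0.90    # assumed P(test positive | infected)
specificity = 0.995   # assumed P(test negative | not infected)

true_pos  = prevalence * sensitivity               # 0.0045
false_pos = (1 - prevalence) * (1 - specificity)   # 0.004975
ppv = true_pos / (true_pos + false_pos)            # P(infected | positive)
print(round(ppv, 2))  # -> 0.47: roughly half the positives are false
```

    So with these assumed numbers, a raw positivity rate would roughly double the true prevalence, which is why tracking trends (rises and falls) in a consistently tested population is more robust than reading the level directly.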

  4. Daniel L Speyer says:

    Is this data available without jumping through a thousand hoops? Is there any attempt to set up this analysis as an automated and publicly available data source?

