“For the cost of running 96 wells you can test 960 people and accurately assess the prevalence in the population to within about 1%. Do this at 100 locations around the country and you’d have a spatial map of the extent of this epidemic today. . . and have this data by Monday.”

Daniel Lakeland writes:

COVID-19 is tested for using real-time reverse-transcriptase PCR (rRT-PCR). This is basically just a fancy way of saying they are detecting the presence of the RNA by converting it to DNA and amplifying it. It has already been shown by people in Israel that you can combine material from at least 64 swabs and still reliably detect the presence of the RNA.

No one has the slightest clue how widespread SARS-CoV-2 infections really are in the population. We’re wasting all our tests testing sick people, where the Bayesian prior is basically that they have it already, and the outcome of the test mostly doesn’t change the treatment anyway. It’s stupid.

To make decisions about how much physical isolation and shutdown and things we need, we NEED real-time monitoring of the prevalence in the population.

Here’s my proposal:

Mobilize military medical personnel around the country to 100 locations chosen randomly proportional to the population. (the military is getting salaries already, marginal cost is basically zero).

In each location set up outside a grocery store.

Swab 960 people as they enter the grocery store. Sort the swab vials in random order.

From each vial, extract RNA into a tube, and combine the first 10 tubes into well 1, second 10 tubes into well 2 etc… for a 96 well PCR plate (this is a standard sized PCR tray used in every bio lab in the country).

Run the machines and get back a count of positive wells for each tray…

Use a beta(2, 95) prior for the frequency of SARS-CoV-2 infection; this has a high-probability-density region extending from 0 to about 10% prevalence, with the highest density between around 0.5% and 5%, an appropriate prior for this application.

Let f be the frequency in the population, and let ff = 1 - dbinom(0, 10, f); then ff is the frequency with which a randomly selected well containing 10 samples will have *one or more* positive swabs. The likelihood for N wells to come back positive is then dbinom(N, 96, ff).

Doing a couple lines of simulation, for the cost of running 96 wells you can test 960 people and accurately assess the prevalence in the population to within about 1%. Do this at 100 locations around the country and you’d have a spatial map of the extent of this epidemic today.

There is NO reason you couldn’t mobilize military resources later today to do this swabbing, and have this data by Monday.

This kind of pooled sampling is a well known design, so I assume the planners at the CDC have already thought of this. On the other hand, if they were really on top of things, they’d have had a testing plan back in January, so really I have no idea.

The innovation of Lakeland’s plan is that you can use a statistical model to estimate prevalence from this pooled data. When I’ve seen the pooled-testing design in textbooks, it’s been framed as a problem of identifying the people who have the disease, not of estimating prevalence rates.
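To make the arithmetic concrete, here is a minimal sketch in R of the pooled design and the grid posterior described above (the dbinom notation in the proposal is R’s). The assumed true prevalence, the pool size of 10, and the grid resolution are illustrative choices for a fake-data run, not part of Lakeland’s specification:

    set.seed(1)
    f_true <- 0.01   # assumed true prevalence, for simulation only
    k      <- 10     # swabs pooled per well
    wells  <- 96     # wells per PCR plate

    # Simulate one plate: a well is positive if any of its k swabs is positive.
    ff_true <- 1 - (1 - f_true)^k          # P(well contains at least one positive swab)
    N_pos   <- rbinom(1, wells, ff_true)   # observed number of positive wells

    # Grid posterior for f under the beta(2, 95) prior.
    step      <- 1e-4
    f_grid    <- seq(0, 0.2, by = step)
    prior     <- dbeta(f_grid, 2, 95)
    lik       <- dbinom(N_pos, wells, 1 - (1 - f_grid)^k)
    posterior <- prior * lik / sum(prior * lik * step)   # normalized density on the grid

    post_mean <- sum(f_grid * posterior) * step
    post_sd   <- sqrt(sum((f_grid - post_mean)^2 * posterior) * step)
    c(positive_wells = N_pos, post_mean = post_mean, post_sd = post_sd)

With the assumed 1% prevalence this typically yields on the order of 8 to 10 positive wells and a posterior centered near 1% with a standard deviation of a few tenths of a percent, consistent with the “within about 1%” claim above and the ±0.5% figure that comes up in the comments.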


  1. I’ve been thinking along those same lines and trying to advocate for it on Italian Twitter.
    A spatial map of disease prevalence would allow the creation of “safe zones” where disease suppression can ideally start.

  2. “Swab 960 people as they enter the grocery store. Sort the swab vials in random order.”

    Not a random sample of the population. People who feel sick but haven’t sought diagnosis or treatment will likely be sniffling quietly at home (as they are supposed to do) rather than going to the grocery store and exposing others.

    Great idea. Just needs tweaking.

    • And what if a person doesn’t want to be swabbed? What if people who aren’t feeling very well REALLY want to be swabbed?

      (Just a question; not a critique of the methodology)

    • Not a random sample of the population, but still MUCH more representative and informative than “we tested people at a hospital who were coughing and found x individuals had the disease”

      that’s 100% people who feel sick and scared getting tested.

      Of course it needs tweaking, though; there are lots of ways to design sampling procedures to use this kind of thing. This one doesn’t need anything other than someone to say “go,” and we’d have dramatically more actionable knowledge by Monday.

    • With 960 people being swabbed, imagine prevalence is 1%: you’d get around 8, 9, or 10 positive wells; run the Bayesian estimate and it gives 1% ± 0.5% or so.

      Pool 100 of these plates and you’d have the national estimate to within 0.05% or so?

      There would be noise from one place to another, no doubt. You’d ideally do this in a weekly time-series, and fit some more sophisticated models as a geo-spatial time-series model.

      I’m all for serology testing, but we could be doing this PCR stuff for peanuts today compared to 2T stimulus bills, and it’d give us a DRAMATIC improvement on our state of knowledge and planning requirements.

      • Where does the 1% estimate come from? I did a fake data thing and it seems not far off, but not obvious to me how all the design parameters fit together (10 people per tube, etc). This seems like a cool idea though. I like it.

        Also interesting link Carlos. I didn’t realize it was that simple to test if the virus existed previously.

          • Yeah, I ran a quick simulation: assumed there was 1% prevalence, randomly generated some fake data, calculated the posterior (in Maxima, semi-analytically), drew the curve, and it peaked very close to the assumed underlying reality with a tight error bound… good enough, and dramatically better than whatever crap we have right now from the “confirmed cases horse race.”

          that’s from a single PCR plate.

          10 people per well is just a first pass. I’m actually working with a friend to try to formulate some kind of optimization procedure to choose those parameters in a good way, but the fact is that optimization is weeks or months out. It’d be great to have as a general strategy for managing this kind of thing, but RIGHT NOW we need someone to just go do the 10-swab thing.

  3. Unfortunately I think we’ve moved beyond the usefulness of testing.

    Currently, if we test a bunch of people and all the tests come back negative, that can mean one of two things: they’ve never been exposed to the virus, or they’ve already been exposed and recovered and no virus remains. Okay, so in both cases those people aren’t carriers and aren’t a risk to other people, so by that logic they shouldn’t need to quarantine; but in the first scenario they don’t have immunity and can get infected and thus become a carrier despite being given the green light that they aren’t one.

    • We need to plan the deployment of emergency hospital resources, masks, ventilators, hospital ships, mobile army field hospital equipment, decide whether actions taken in certain areas are efficiently slowing the growth of the virus or need to be stepped up… etc etc .

      all of that needs to be driven by knowledge of the *prevalence* of the active infections in the general population.

      Is NY really a major ground zero, or is it just that they’re testing a lot more than LA or Atlanta or New Orleans or Houston?

      this data would tell us a LOT about how to utilize resources.

      • I don’t see how the surge in the number of patients requiring ventilators in NY, for example, could be driven by the fact that they are testing more than LA or Atlanta or New Orleans or Houston.

        • https://nymag.com/intelligencer/2020/03/inside-a-brooklyn-hospital-during-covid-19.html

          “The number of very sick COVID patients coming in is tremendous. I don’t know if the word is exponentially or logarithmically, but the curve goes up steeply. It’s scary. Mount Sinai Brooklyn is a moderate-sized community hospital. We have 220 beds, we’ve planned a surge of up to 240 to 260. At the current moment I have 135 COVID-positive patients. There are probably another 10 or 15 that just don’t have test results back yet. And they are sick. They are the ones who need to be admitted to the hospital. It’s a few debilitated elderly from nursing homes, but there’s a lot of patients who are between the ages of 40 to 60 who may have some underlying health problems like obesity, diabetes and high blood pressure, and their lungs are very inflamed. They go from being moderately sick to crashing and needing to put on ventilators very quickly.”

          [ I know you will dismiss this readily, because people are just dying at the usual rate or whatever, so don’t bother replying. I thought others may find that piece interesting, though, even if it’s just a bunch of anecdotes. ]

        • More anecdotes from Brooklyn:

          https://www.wsj.com/articles/inside-a-new-york-er-as-coronavirus-hits-we-dont-know-how-high-the-peak-is-going-to-be-11585243524

          A shortage of ventilators has worried doctors across New York City, but high-flow machines are also a precious commodity in the expanding battle against the pandemic. They were in short supply Wednesday at Maimonides Medical Center in Brooklyn.

          Dr. Marshall, chairman of emergency medicine at Maimonides, assured the doctor that a dozen or so were coming right away, and that they were pushing to order a lot more.

          The need for respiratory-support machines is acute as Maimonides and other hospitals in New York City—an epicenter of the outbreak—are seeing massive rises in cases of patients with Covid-19, the disease caused by the coronavirus.

          […] Dr. Marshall said the hospital had just acquired 15 ventilators and had a total of 30 extra at the moment.

          An hour earlier, Dr. Marshall had gone outside to meet a voluntary emergency response group called Hatzolah of Boro Park, which unloaded the 15 donated ventilators from the back of an SUV.

          Maimonides, like every hospital in the city, has been scrounging for the devices. Dr. Marshall said the hospital has already bought refurbished ones and is leasing others from home health-care companies that aren’t using them.

          The hospital is also converting medical-grade apnea machines used for people with a sleep disorder into ventilators, he said.

        • I just reread this:

          there’s a lot of patients who are between the ages of 40 to 60 who may have some underlying health problems like obesity, diabetes and high blood pressure, and their lungs are very inflamed. They go from being moderately sick to crashing and needing to put on ventilators very quickly.

          I don’t think medical science has an understanding yet of why some people do so much worse than others. There are theories out there about the viral load and probably some genetic variation. It’s unclear. Certainly underlying things like diabetes and high blood pressure add to the equation. Smoking, lung disease, vaping absolutely doesn’t help. But I don’t think we know enough about the science of this yet to say what makes one person crash and burn when another person just has a fever and aches for a week.

          https://nymag.com/intelligencer/2020/03/inside-a-brooklyn-hospital-during-covid-19.html

          These patients are clearly running out of ascorbate due to the prolonged inflammation (as has been repeatedly documented in critically ill patients) and then crashing as their lungs (the main site of inflammation) oxidize and the tissue liquifies due to a localized kind of scurvy (collagen degradation outpaces synthesis).

          Healthcare experts in the west keep calling the use of vitamin C a myth, when it should have been the first thing they tried when observing that. And they need to keep giving it because these very sick people will be deficient again a few days later even after massive doses:

          Considering that most patients had normal dietary intake before intensive care unit (ICU) admission, there is a clear association between development of critical illness and decreasing vitamin C levels. Patients in multiorgan failure exhibit even lower levels, sometimes reported as low as 3.8 mmol/L despite enteral nutrition [6].

          […]

          Recent pharmacokinetic trials concluded that intravenous administration of 2 to 3 g/d is required only to normalize plasma levels, whereas many of the trials administered inferior doses [5], and even superior doses are required to obtain the supraphysiological levels described by the concept of pharmaconutrition. De Grooth et al. [22] randomly allocated 20 patients in four groups receiving either 2 g or 10 g of intravenous vitamin C, either twice daily as infusion or as continuous perfusion. Similar to previous findings, 2 g was sufficient to normalize serum levels, whereas 10 g was required to maintain serum levels constantly >40 mg/L, with a maximum at 295 mg/L. Interestingly, after 48 h of discontinuation, 15% of patients developed hypovitaminosis.

          https://www.ncbi.nlm.nih.gov/pubmed/30612038

          Also see this specifically for ARDS, where the vitamin c patients were dying at 1/5 the rate of standard treated patients until they stopped the vitamin C, then they began dying at the same rate. They also show most of the patients were deficient again already a few days later (figures 2-3):
          https://jamanetwork.com/journals/jama/fullarticle/2752063

          Then some other group cut the dose in half, added some other random stuff that may or may not have deleterious effects, and didn’t measure blood levels (just assumed they corrected the deficiency). This was claimed as debunking the earlier studies. I mean they even compare blood levels one hour after treatment to 6 hours later in the trials that showed it worked:

          Patients in the current study received lower daily doses of IV vitamin C compared with CITRIS-ALI. However, in the nested cohort study within the intervention group of this trial, the median plasma concentration of vitamin C increased from 28 μmol/L at baseline to 369 μmol/L 1 hour after the first dose and achieved nearly the same plasma level at 6 hours 24 as reported in CITRIS-ALI at 48 hours. 2

          https://jamanetwork.com/journals/jama/fullarticle/2759414

          Just look at the difference in blood levels after a 1.25 mg dose at 1 hr vs 6 hrs here, it is 5-6x lower:
          https://www.ncbi.nlm.nih.gov/pubmed/15068981/

          Besides smoking, I also see the “experts” are still saying asthma is a risk factor: https://www.cdc.gov/coronavirus/2019-ncov/need-extra-precautions/asthma.html

          Just like smoking and other chronic respiratory issues the opposite seems to be the case:

          However, chronic obstructive pulmonary diseases (COPD) are relatively less common in COVID-19 patients, with a prevalence of 1.1%-2.9%. 7-9 In a study involving 140 cases with COVID-19 on the association between allergies and infection, no patients were found to have asthma or allergic rhinitis. 8

          […]

          Given the association between virus infection and asthma, 30 it is worth carefully monitoring asthmatic patients in this coronavirus epidemic. However, in pediatric cases, we did not find COVID-19 patients with a history of asthma (unpublished data). Maybe a distinct type 2 immune response may contribute to this low prevalence of asthma and allergy patients in COVID-19. The interaction between SARS-CoV-2 and asthma remains to be further investigated, especially considering that current medical resources have been mostly focused on COVID-19.

          https://www.ncbi.nlm.nih.gov/pubmed/32196678

          But anyway, like I asked in the other post can you quote the part about a surge in ventilator usage? Afaict this hasn’t materialized in NYC. Your quotes are about people predicting or planning for that to happen, not what is actually happening. He does say:

          Normally I would have five on during a shift. I have two today, at the exact time I have more patients on ventilators than before.

          https://nymag.com/intelligencer/2020/03/inside-a-brooklyn-hospital-during-covid-19.html

          But that doesn’t really imply an overwhelming surge (yet).

        • I think in NY the death rate is getting high enough that people die before they even get tubed.

          Thanks for the info on ascorbate though. I think that makes some sense. I wonder if dosing people with N-acetyl-cysteine to up their glutathione levels, so that glutathione could help recycle ascorbate, would help? (As I understand from your previous discussion, this is the correct direction of recycling?)

        • Not New York, but:

          “Demand for ventilators has doubled in Louisiana, with some 80% of intensive care unit patients now on the breathing machines, said Warner Thomas, chief executive of Ochsner Health System, the state’s healthcare group.

          “Louisiana Governor John Bel Edwards warned that his state, reporting about 1,800 infections, including at least 83 deaths is rapidly running out of beds and ventilators.”

          https://nationalpost.com/pmn/health-pmn/new-york-new-orleans-hospitals-reel-as-u-s-leads-world-in-coronavirus-cases

          Do you think people are requiring ventilators at the usual rate?

        • Demand for ventilators has doubled in Louisiana

          This is, once again, demand for ventilators, not actually being used. Your second quote is better.

          Do you think people are requiring ventilators at the usual rate?

          I said from the beginning that people are dying of respiratory distress instead of whatever else they were about to die from. It is all cause mortality aggregated over a couple weeks or months where there should be little difference. So I always did expect an increase in ventilator use.

          And as of last week all cause mortality for Europe looks like it is 2 SD below average even in the 65+ group. Italy is higher than usual but not more than what happens once a year or so: https://www.euromomo.eu/index.html

          We need to wait a few weeks to see if that is due to a reporting lag or something, but how are we getting daily updates on number of COVID-19 deaths but not weekly of all cause mortality?

        • I think in NY the death rate is getting high enough that people die before they even get tubed.

          Thanks for the info on ascorbate though. I think that makes some sense. I wonder if dosing people with N-acetyl-cysteine to up their glutathione levels, so that glutathione could help recycle ascorbate, would help? (As I understand from your previous discussion, this is the correct direction of recycling?)

          I doubt anything could be cheaper and safer than taking sodium ascorbate itself. Side effects are very rare even in people receiving huge doses IV: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2898816/

          It’s notable that, as a water-soluble small molecule, ascorbate gets filtered by the kidneys automatically and energy must be expended in the form of a sodium gradient to reabsorb/conserve it. So my guess is some tradeoff with energy conservation is why normal blood levels are “only” 50-200 uM, rather than some health concern.

          I don’t know how much effect N-acetyl-cysteine would have… are critically ill people deficient in glutathione or cysteine? Also, plasma glutathione does not appear to be a replacement for plasma ascorbate:

          Plasma devoid of ascorbate, but no other endogenous antioxidant, is extremely vulnerable to oxidant stress and susceptible to peroxidative damage to lipids. The plasma proteins’ thiols, although they become oxidized immediately upon exposure to aqueous peroxyl radicals, are inefficient radical scavengers and appear to be consumed mainly by autoxidation.

          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC297842/

          I’m not really familiar with the supplemental/pharmacologic glutathione literature though. I will see what’s out there, but from what I’ve learned, vitamin C is really in a class all by itself.

        • OK, so you always did expect an increase in ventilator use but find it difficult to believe that there has been a surge. Nevermind, imagine I said “increase” instead of “surge”.

          Anyway, what’s your current position on the strongest form of your point that “people are dying at the usual rate” point?

          In case you’re still waiting for some data on that front, 1759 people have been reported dead from covid-19 in Bergamo. Ignoring that the actual number is likely to be much higher, let’s just assume for the sake of the argument that no one else has died in the province, nobody else is going to die until the end of the month, and by some remarkable coincidence everyone who died for any cause happened to test positive for SARS-CoV-2.

          Deaths for the (full) month of March in 2003-2017 were in the [706, 953] range (average 797, standard deviation 64).

          1759 (if only!) seems quite an outlier, doesn’t it?
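          Spelled out, taking those figures at face value:

            (1759 - 797) / 64   # roughly 15 standard deviations above the 2003-2017 March average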

        • OK, so you always did expect an increase in ventilator use but find it difficult to believe that there has been a surge.

          Where did I say that? I said your sources don’t say it. You are being sloppy with your citations and I pointed it out.

        • In case you’re still waiting for some data on that front, 1759 people have been reported dead from covid-19 in Bergamo. Ignoring that the actual number is likely to be much higher, let’s just assume for the sake of the argument that no one else has died in the province, nobody else is going to die until the end of the month, and by some remarkable coincidence everyone who died for any cause happened to test positive for SARS-CoV-2.

          Deaths for the (full) month of March in 2003-2017 were in the [706, 953] range (average 797, standard deviation 64).

          1759 (if only!) seems quite an outlier, doesn’t it?

          Please include a link to your data sources because it sounds interesting. And you seem to keep ignoring that what I expected is after this there will be fewer deaths in the coming weeks/months so it averages out. I have repeated that about a dozen times now.

        • > And you seem to keep ignoring that what I expected is after this there will be fewer deaths in the coming weeks/months so it averages out. I have repeated that about a dozen times now.

          Would you kindly guide me through the following exchange so I can properly understand it under that light?

          Me, March 18 at 7:39 pm: “I do completely get your point! Everyone who dies was going to die anyway in the coming months or years”

          You, March 18 at 10:23 pm: “No, I dont think you did understand my point at all. The strongest form of my point is people are dying at the usual rate and also testing positive for this virus.”

        • Please include the link for context because “No, I don’t think you did understand” referred to what you think. That is why I referred to “the strongest version”.

        • > I said your sources don’t say it.

          I understand “demand for ventilators has doubled in Louisiana, with some 80% of intensive care unit patients now on the breathing machines” to mean that usage has doubled. That interpretation seems consistent with 40% of intensive care unit patients receiving mechanical ventilation typically (https://www.ncbi.nlm.nih.gov/pubmed/23963122).

          You may of course choose to understand it as you please.

        • I understand “demand for ventilators has doubled in Louisiana, with some 80% of intensive care unit patients now on the breathing machines” to mean that usage has doubled. That interpretation seems consistent with 40% of intensive care unit patients receiving mechanical ventilation typically (https://www.ncbi.nlm.nih.gov/pubmed/23963122).

          There you go! Using proper sources for your conclusions. Thanks. Why would you think we all know the normal percent of ICU patients on “ventilators”?

        • Your full comment follows, one line quoting me, one line of reply. https://statmodeling.stat.columbia.edu/2020/03/18/just-some-numbers-from-canada/#comment-1265858

          >> Don’t worry, I do completely get your point! Everyone who dies was going to die anyway in the coming months or years (decades in some cases, but what does it change) and it’s not like anyone is going to die twice.

          > No, I dont think you did understand my point at all. The strongest form of my point is people are dying at the usual rate and also testing positive for this virus. Do you have evidence otherwise? Did you look until now?

          If your reply doesn’t mean what it seems, then what does it mean?

        • Don’t worry, I do completely get your point! Everyone who dies was going to die anyway in the coming months or years (decades in some cases, but what does it change) and it’s not like anyone is going to die twice.

          This is what I was responding to. This does not indicate an understanding of the point that people who were going to die soon are dying so the overall mortality is going to be little changed when aggregated over a few weeks or months.

        • What does your reply mean? Do you understand it yourself?

          Why do you formulate the strongest form of your point? Why do you ask for evidence against it?

        • What does your reply mean? Do you understand it yourself?

          Why do you formulate the strongest form of your point? Why do you ask for evidence against it?

          If you understood the point you wouldn’t even be thinking about people dying decades later… even years was stretching it but the mention of decades showed you did not understand.

        • So if I understand correctly what you have always said and still maintain is that nobody dies from covid-19 that wasn’t going to die in the next twelve months anyway. Can you confirm that?

        • So if I understand correctly what you have always said and still maintain is that nobody dies from covid-19 that wasn’t going to die in the next twelve months anyway. Can you confirm that?

          No, your constant strawmen are getting really annoying.

        • Carlos, I think you could state Anoneuoid’s point as basically that for all the people dying of COVID-19, the number of actual QALYs lost is less than 1… that is, these are all people who were so elderly, infirm, or sick that their life expectancy was a few weeks to months.

          I don’t think there’s any reason to believe that is true for the numerous people under 50 in the US who have been put on ventilators. I read recently around 50% of ventilator patients are under 50. We will have to wait to see what fraction die… but it’s not reasonable to think that the proper life expectancy of people dying from COVID was ~ 1 year or less.

        • Then you agree that some people who died could have lived for decades? It seems I got your point right the first time after all!

          I would say that you created the strawman yourself when you asked for evidence against the “people are dying at the usual rate” strongest form of your point. And you kept the strawman alive when I offered evidence on that thread and you replied repeatedly that it was not good enough, instead of telling me that that wasn’t your point.

          That clears the issue, thanks.

        • Carlos, I think you could state Anoneuoid’s point as basically that for all the people dying of COVID-19, the number of actual QALYs lost is less than 1… that is, these are all people who were so elderly, infirm, or sick that their life expectancy was a few weeks to months.

          I would not say all, but that definitely seems to apply to most. Like I keep saying, all cause mortality doesn’t look like it will be affected at the month to quarter scale.

          I read recently around 50% of ventilator patients are under 50.

          Source? Is this due to demographics (lots of young wherever it is), age-based triage (maybe they use 50 as a cutoff), etc. Also, mechanical ventilation kills old people much more often than young, so they are less likely to get treated that way:

          Patient mortality is the gravest complication of mechanical ventilation. In our study neither advanced age nor HVT [high tidal volume] ventilation alone significantly increased subject mortality. Only with the combination of advanced age and HVT did our study yield a profound decrease in our subjects’ survival (Fig. 1). Considering the epidemiology of VILI the experimental validation of the age associated increase in ventilator mortality is already of paramount importance. Potentially even more meaningful however was that we were able to completely attenuate the age associated increase in our subject’s HVT mortality with the administration of a low fluid protocol.

          https://www.sciencedirect.com/science/article/pii/S0531556516301401

          It is now well established that over-distention of the alveoli can damage alveolar lining cells and result in local and systemic inflammatory immune responses that can be deleterious to the host, even in the absence of pulmonary infection [2]. This problem, known as ventilator-induced lung injury (VILI), is a major, yet avoidable, complication of mechanical ventilation. Low tidal volume ventilatory strategies have now become the standard of care given the findings of the ARDSnet trial [3] and other supporting studies [4] and are now part of the Surviving Sepsis Campaign guidelines to limit ventilator-associated lung injury [5].

          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3706874/

          The mortality rate for patients requiring mechanical ventilation is about 35% and this rate increases to about 53% for the elderly. In general, with increasing age, the dynamic lung function and respiratory mechanics are compromised, and several experiments are being conducted to estimate these changes and understand the underlying mechanisms to better treat elderly patients.

          http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0183654

          So that number seems to be a poor proxy for how severe COVID-19 is likely to be in healthy people.

        • Then you agree that some people who died could have lived for decades?

          Yes, same for any illness.

          It seems I got your point right the first time after all!

          Nope. Start thinking quantitatively instead of like you were trained in NHST.

        • Sure, at the MOMENT NY may be a more serious situation, but perhaps the quarantines are more effective in NY and in fact, some other place is still spreading the disease widely and in epsilon of time will be an even bigger major catastrophe. On the other hand, maybe some place is doing a great job of containing the disease and doesn’t need the stockpile they’ve got quite as much, and could sell it / transfer it to New York … etc

          the point is, without some kind of at least estimate of prevalence, these kinds of resource allocation decisions are entirely chasing the curve… Knowing what *will* happen in your city in 5 to 10 days is *gold* compared to cowering in the corner and not knowing if it’s going to go up by a factor of 2 or a factor of 32

        • You are not going to know what *will* happen in your city by testing at 100 random locations across the country either. Unless your city is one of those 100 locations (and even then).

        • Choosing the two largest metropolitan areas in each state, you could at least get a reasonable idea for large populations. Obviously you could do more than one plate per city too, given resources. What really needs to happen is for someone to realize this is possible, try it out, publish some results, and have every mayor of a major city tell their public health department to get on it in a copy-cat manner.

        • Carlos, you have a lot of patience! Anoneuoid seems to simply be trolling, or if that’s not the word for it then it’s something almost equally pointless. Push push push push on one point until you finally prove that you’re right, then it turns out that point didn’t matter anyway and what really matters was something else and why do you think you’re right on that one? Push push push push, rinse, repeat.

          I stopped reading this after about five posts by each of you, and I pity the fool who continued longer than I did.

        • Carlos, you have a lot of patience! Anoneuoid seems to simply be trolling, or if that’s not the word for it then it’s something almost equally pointless. Push push push push on one point until you finally prove that you’re right, then it turns out that point didn’t matter anyway and what really matters was something else and why do you think you’re right on that one? Push push push push, rinse, repeat.

          I stopped reading this after about five posts by each of you, and I pity the fool who continued longer than I did.

          Nope, not trolling. Just sticking to the facts instead of proxies chosen by the media to maximize clicks. Also, everything you are concerned about now, I was concerned about 2-8 weeks earlier.

        • Actually, looking at my records I personally started prepping all the way back in mid January for this situation, and started sharing solutions for it online Feb 3rd.

          Were you and Carlos warning people about the problem on Feb 3rd?

    • They could do both PCR and an immunoassay to see if the person has ever been exposed to Covid-19. That gives you both current infections and exposure levels.

  4. If you simply chose the top 2 metropolitan areas in each state, you’d have 100 teams. Pretty much most major metropolitan areas have universities that are SHUTTERED now. All major universities with a bio department have at least one of these PCR machines sitting in a room unused. Contact the president of that university, send out emails to the faculty, identify someone who can go run PCR tests.

    This doesn’t have to be some kind of FDA approved diagnostic test, since it’s not diagnosing any individual person’s disease. The equipment is already in place, the reagents cost literally pennies per test.

  5. It is not clear to me whether the bottleneck of testing capacity is throughput (a quick Google search shows that the Roche 8800 machine, fully automated, can obtain 1056 results in an 8-hour shift) or the swab labor (given that all kinds of construction workers still keep building luxury towers in the city).

    Another quick Google search shows military members do get extra pay on “hardship duty” missions; it was $150 per month plus an additional $250 separation allowance during the Ebola response. So the marginal cost is strictly positive.

    Returning to statistics, I think it is an interesting question to ask what the optimal number of individual samples to mix in a well is (10 in the current design); it seems to be a separate decision theory / active learning problem, and the number could probably be adaptively tuned after each prevalence estimate. A rough sketch of the trade-off is below.
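    One rough way to frame that trade-off, as a sketch rather than a full decision-theoretic treatment: for a pool of size k the well-positive probability is p = 1 - (1 - f)^k, and the Fisher information about f can be compared per well (the scarce PCR resource) and per swab. The 1% prevalence used below is just an assumed operating point:

      # Fisher information about the prevalence f carried by one pooled well of size k
      fisher_per_well <- function(k, f) {
        p  <- 1 - (1 - f)^k          # P(well positive)
        dp <- k * (1 - f)^(k - 1)    # derivative of p with respect to f
        dp^2 / (p * (1 - p))
      }
      k <- c(1, 5, 10, 20, 40, 64)
      info <- sapply(k, fisher_per_well, f = 0.01)
      rbind(pool_size = k, per_well = round(info), per_swab = round(info / k))

    At 1% prevalence the information per swab declines only slowly as pools get larger, while the information per PCR well rises steeply, which is the basic reason pooling pays when wells rather than swabs are the binding constraint; the best pool size shifts with the prevalence you expect.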

    • > It was $150 per month and an additional $250 for separation allowance during the Ebola time.

      This is a 3 day mission *in the US* at most, not flying for a month to Angola.

      But sure, pay 1000 people an extra $100 for 3 days of excess hazard pay… It’s still less than the annual salary of 2 tenured professors doing NHST based noise mining.

  6. Daniel, I think this is a great idea, but please don’t forget about false positives and false negatives!

    Even for a very good test (high sensitivity and specificity), if you run it in a population with low enough actual disease prevalence, then your results will end up reflecting more about the test’s performance than the disease prevalence you want to measure.

    There are some ways to get around this, at least partially. For example, if you track rates of positive test results over time and are willing to assume constant sensitivity and specificity (a questionable assumption), then you can derive estimates of the true disease prevalence from the observed trends. But there are many caveats with that kind of approach.

    I also assume that CDC planners have already thought of this and aren’t doing it for some reason. Could be a good reason …or it could be a bad reason. Their flu tracking systems (see CDC FluView weekly reports) are similar enough that it should be easy for them to deploy this kind of COVID-19 testing through that system rapidly.

    • > Daniel, I think this is a great idea, but please don’t forget about false positives and false negatives!

      This is a case of don’t make the perfect the enemy of the good. Right now we have what? NOTHING that estimates prevalence. It’s all about N people confirmed who have sought testing because they’re sick. It’s like sampling for cats at a dog kennel. You’re not going to find out how many people are out there carrying the virus without symptoms yet by sampling people who have symptoms and are seeking treatment.

      Collect the data, we can refine the models later.

      • Daniel, I don’t think that false positives are just details here. Instead, they may be the largest obstacle to your proposal, and possibly why it has not been tried by the CDC.

        Some illustrative numbers may help to communicate my point. First, we’ll need sensitivity and specificity. It’s difficult to estimate what sensitivity and specificity are for covid-19 tests, especially given the test problems that keep being talked about, and the issue that real-world sensitivity and specificity differ across tested populations. However, glancing at reported values for SARS tests, and considering reports of patients who only test positive for COVID-19 after multiple rounds of testing, let’s say 75% sensitivity and 98% specificity.

        Now let’s consider an example set of community covid-19 prevalences that we’d like to measure, such as 0%, 0.1%, 0.5%, 1%, and 2%. With 75% sensitivity and 98% specificity, testing at these prevalences would be expected to return positive results for 2%, 2.1%, 2.4%, 2.7%, and 3.5% of tests, respectively (the arithmetic is spelled out at the end of this comment).

        These numbers are driven more by the test specificity than the actual disease prevalence. Moreover, trying to take these numbers and back-out the actual disease prevalence would require knowledge of the real-world sensitivity and specificity, which are not really known and are context-dependent. Lastly, there is the issue that because the disease is infectious, covid 19 cases are concentrated in pockets around the nation (that nursing home in the Seattle area is an example) and with random sampling you may simply miss those pockets, which could greatly affect your final estimates.

        Still, I haven’t thought through how your very nice pooling suggestion affects things (does it help or hurt?) and there could well be other solutions.

        As mentioned above, one option may be repeated cross-sectional sampling and analyses of trends, which could provide a firmer basis for obtaining estimates of disease prevalence from positive test rates, though then it may also be necessary to deal with issues like “length bias” from the screening literature.

        It’s also possible that I greatly misjudge the plausible specificity for COVID-19 tests – if anyone thinks so, I’d like to know.
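        To spell out the arithmetic behind those percentages: the expected share of positive tests is just the usual mixture of sensitivity and specificity. A minimal snippet, where the 75%/98% figures are the illustrative guesses above rather than measured properties of any particular assay:

          apparent_rate <- function(prev, sens = 0.75, spec = 0.98) {
            prev * sens + (1 - prev) * (1 - spec)   # true positives plus false positives
          }
          round(100 * apparent_rate(c(0, 0.001, 0.005, 0.01, 0.02)), 1)   # reproduces 2, 2.1, 2.4, 2.7, 3.5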

        • Not exactly my specific domain expertise, but since no DNA jocks have replied, I’ll offer some info from my experience with modeling related to environmental DNA studies utilizing qPCR. Not sure how relevant that might be here though. My understanding is that the rate of false positives is generally very low via qPCR, and related primarily to probe design and/or contamination. Essentially, I think most false positive problems are operator error of some sort and less about what the machine can amplify. Contamination can be monitored via lab/field/travel blanks and internal PCR controls, etc. In our work, false negatives are addressed with occupancy modeling (3-level hierarchical model with replicate PCR and obs per sample unit), which traditionally assume no false positives.

        • > It’s also possible that I greatly misjudge the plausible specificity for COVID-19 tests

          The numbers I’m seeing are better than your estimates, but in low prevalence, not particularly good, if I understand the confidence intervals on the tests.

          The LabCorp EUA (https://www.fda.gov/media/136151/download) reports agreement with positive controls at 100% (95% CI: 91.24%-100%) and with negative at 100% (95% CI: 92.87% -100%).

          [ BTW, I’d appreciate getting some tips on what the expected values from those distributions should be; I’m not comfortable with 1/2 CIs. ]

          Abbott Labs just got an EUA for a COVID version of its Influenza test device. This study (https://pubmed.ncbi.nlm.nih.gov/31558351/?from_single_result=31558351) says, about the Influenza version of the device:

          > The sensitivities of ID NOW 2 for influenza A were 95.9% and 95.7% in NPS and NPA, respectively, and for influenza B were 100% and 98.7% in NPS and NPA, respectively. The specificity was 100% for both influenza A and influenza B in NPS and NPA.

          And includes this table, for two versions of the test using the ID NOW device:

          Virus    Parameter              ID NOW 2           ID NOW 2 VTM
          Type A   Sensitivity (95% CI)   95.7 (89.2-98.8)   96.7 (90.8-99.3)
                   Specificity (95% CI)   100 (89.3-100)     100 (89.3-100)
          Type B   Sensitivity (95% CI)   98.7 (93.0-100)    100 (96.2-100)
                   Specificity (95% CI)   100 (98.5-100)     100 (98.5-100)

          These are all rt-PCR tests; the antigen tests have lower values. This paper (https://pubmed.ncbi.nlm.nih.gov/32104917/) reports, for one test used in China:

          > The overall testing sensitivity was 88.66% and specificity was 90.63%.

          I’ve got a lot of other references, but these are representative.

        • Biologists who do qPCR all the time tell me that when done correctly it’s extremely reliable. Almost all of the false positives or negatives come down to poor technique, which is definitely a thing to consider under emergency conditions etc. But if you partnered with local universities you’d have virologists running these, and they’d be extremely high quality, and at much lower actual cost than commercial results.

        • In particular, essentially the *only* way to get a false positive, is to contaminate the well. Thanks to the combinatorics of DNA, and the very broad sequencing that we’ve seen, it’s easy to design primers that are highly specific to this virus.

          False negative is more of a thing. You might have mutations in the particular virus or you might have damaged the RNA through unclean technique, or you might have swabbed someone with bad technique, etc.

          Virologists could easily provide this data to determine sensitivity and specificity already, if someone with some power asked them to collect this kind of data and provide it in a centralized place, they’d mostly jump at it. My wife (a biologist, but not virologist) gets daily emails from virologists at her university working hard pooling resources on this issue. They’re even stripping equipment out of shuttered labs to move them to virology labs etc.

        • RWM, Eric, & Daniel —

          tl;dr — After more research, I still think false positives are a critical issue for Daniel’s suggestion. Still, there may be good fixes to this problem, perhaps in previous articles on RT-PCR for screening testing.

          Thanks RWM, Eric, & Daniel! As you can probably tell, RT-PCR testing is far from my specialty so what you say is very good to know. However, I am more familiar with the unintuitive pitfalls of trying to apply diagnostic tests as screening tests, which is in effect what Daniel is suggesting.

          Consistent with what RWM and Daniel are saying, many studies in the literature assume RT-PCR specificity is essentially 100% (ie, there are no false positives). However, these are studies of RT-PCR for diagnostic testing — for testing where you already suspect the case has the disease in question with reasonably high probability. If the probability the case has the disease is reasonably high, then it is practically unimportant if specificity is exactly 100% or a few percentage points away from that, so you might as well assume 100% specificity for simplicity’s sake.

          The situation is critically different when performing screening tests for a rare disease — ie, when testing individuals who have low probability of having the disease you are interested in. For COVID-19, we are talking about RT-PCR testing in populations where the actual disease prevalence is below 2% and often below 0.2%. Note, however, that if you apply a test in a population where no one at all has the disease, the proportion of positive test results is 1 minus specificity. So if the specificity is 98%, 2% of tests will be positive; if the specificity is 99%, 1% of tests will be positive; etc. Suddenly, those small differences from 100% specificity matter and prevent you from obtaining a good estimate of COVID-19 prevalence at all. So, what is an ignorable detail in diagnostic testing has become a critical issue for screening testing.

          What then is the actual specificity of RT-PCR? Consistent with what Eric mentions, in reports of RT-PCR testing for infectious diseases I am seeing specificity estimates that are usually 90-100% and often more than 96%. However, these estimates are based on sample sizes of about 20-200, which means that statistically they cannot distinguish between, say, 99% specificity and 98% specificity. Yet a difference of that size would have a dominant effect on estimates of the COVID-19 prevalence in the general population. So specificity data from the literature is of little value for considering RT-PCR screening testing because sample sizes are too small. Really the situation is even worse than this, since for good COVID-19 prevalence testing we would like to distinguish between quantities like 99.5% and 99.75% specificity, but sample sizes needed to do that are prohibitively huge. Moreover, real-world specificity is likely to differ from that in a study.

          But there may still be solutions! As RWM and Daniel have helpfully mentioned, the false positives from RT-PCR are generally lab errors. This suggests that a good approach might be to take 2 swabs from each person and send them to different labs, since whatever false positives occur may then be mostly independent. Another approach could be to follow-up positive cases with more definitive analyses like sequencing, though that would be difficult for the pooled testing approach.

          In general, I would also suggest looking at previous studies of RT-PCR for screening for rare diseases. I’m not familiar with this literature, but assume it is out there. Ideally, it should contain solutions to the problems I raise above — that, or people have just been ignoring these problems, which sadly occurs and leads to a lot of bad medical information.

        • My wife estimates about a 2% false positive error *entirely* from lab contamination / technician error. Thermodynamically it’s essentially impossible to get a positive result without some DNA in the well, so in this sense there’s no “false” positive, every positive tells you there was in fact some DNA in the well… it comes down to contaminating the wells with DNA that shouldn’t have been there.

          As you say, running two separate plates with separate technicians mitigates this and it would be easy to build a more sophisticated data analysis model that incorporates false positives from contamination if you are running independent plates.

          Thanks for expanding on your concerns. I agree with you now, it makes sense to run duplicate plates or some other strategy that incorporates separate technicians.
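          As a sketch of what that more sophisticated model could look like (the 2% contamination rate and the assumption of perfect per-well sensitivity are placeholders, not measured values), the well-level positive probability picks up one extra term and the rest of the grid posterior goes through unchanged:

            # P(a well reads positive): it truly contains at least one positive swab and amplifies,
            # or it is contaminated.
            p_read <- function(f, k = 10, sens = 1, fp = 0.02) {
              ff <- 1 - (1 - f)^k
              sens * ff + fp * (1 - ff)
            }
            lik <- function(N, f, ...) dbinom(N, 96, p_read(f, ...))

          With duplicate plates run by independent technicians you would get two such counts per location, and in principle the contamination rate could then be estimated from the data rather than assumed.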

        • Daniel, I’m glad that was helpful for you!

          It would probably also be good to consider the probability of false positive test results occurring due to cross-reactivity with other coronaviruses in the population, including those that have not been sequenced. I’ve heard this probability is “negligible,” but I don’t know if “negligible” includes values like 1% — which are negligible for diagnostic tests but too large for your proposed screening tests — or whether “negligible” actually implies close enough to 0 to be acceptable for your proposal.

        • Daniel — Ah, great point!

          I think this discussion has now resolved the fundamentals of the problem of false positives.

          The main question that’s left is why the CDC is not doing something similar to this.

          Another question is whether the general population sampling in Iceland has considered these false positive issues, or is potentially mistaking the 1 – specificity of their test for covid-19 prevalence.

  7. I agree that we should be doing widespread randomized testing of the U.S. population, and also, when available, tests for antibodies. (I tweeted this over a week ago, but didn’t get much agreement*.) Sheer prevalence is relevant, but informing individuals would seem important. Rather than send people to locations (older people might not go out), why not inform everyone of the rationale and importance, and explain how it has helped other countries? Minimally, I think individual states where covid-19 hasn’t yet spread that far should take the lead in this type of testing, e.g., where I’m at in VA.

    *also in this blog https://errorstatistics.com/2020/03/26/the-corona-princess-learning-from-a-petri-dish-cruise/

  8. There is some (but not an enormous amount of) literature on using pooled sampling methods for detecting the prevalence of a disease. Perhaps I could point readers to Section 5.3 of our recent survey paper: Aldridge, Johnson, Scarlett, “Group testing: an information theory perspective” https://arxiv.org/abs/1902.06002

    • As Matthew’s co-author, I would add that with such a wide range of uncertainty on the true infection rate, I think you’d get better information (guarantee of concentrated posterior) by not doing such a rigid test strategy.

      That is, I’d do some groups of size 10, some size 20, some size 40, some size 80.

      (If the true prevalence was 1%, you’d get very little Shannon information from your tests of size 10)
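      To put rough numbers on that, here is the probability that a single well reads positive for a few pool sizes and prevalences (the prevalence grid is illustrative); spreading wells across a geometric range of pool sizes keeps some of them informative whatever the true rate turns out to be:

        p_pos <- function(k, f) 1 - (1 - f)^k   # P(well positive) for pool size k, prevalence f
        k <- c(10, 20, 40, 80)
        f <- c(0.001, 0.01, 0.05)
        round(outer(k, f, p_pos), 2)            # rows: pool sizes; columns: prevalences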

      • It’s tricky how well the assay works with larger pools. But it seems to work at up to 64 in an Israeli study. The Shannon information from pools of size 10 on a 96-well plate is quite noticeable relative to the suggested beta(2,95) prior, and realistically the value of this information decays rapidly with time at the moment. If we spend time optimizing the method and wait 2 weeks to get this done, it will be too late to use the data to shuffle resources… The peak surge will be here then anyway. But in principle you’re right, we should work out some optimal pooling strategy for future use.

        • Ok, happy to make top group size 64 not 80 if you like. I certainly agree about not spending weeks optimizing the design.

          However, the literature that we surveyed is pretty well established: group testing will estimate prevalence, but you get a better estimate if you have a ballpark value in mind to start with. If not, then jumping geometrically across a range of values will do a good job.

          In this case, as I say, it’s not implausible that prevalence is 1% or lower, so at most 10 of your 96 tests would be positive. To be clear, I do think your idea is a good one, just that fixing the pool size means you are concentrating on a particular range of parameter values.

        • Definitely, you are correct. I wonder about the feasibility of deploying people to actually swab and collate and process say 6000 swabs. The fixed 10 suggestion was in part practical on such a short timeframe.

  9. “we’re wasting all our tests testing sick people where the bayesian prior is basically that they have it already”

    This doesn’t seem to be true though, since the positive rate of tests is roughly 10%, right? I’ve heard there’s a fairly high false negative rate (difficulty getting good swabs), but still.

    • At risk of entering weeds here, but could consider occupancy modeling approach to get handle on false negatives due to this sort of thing. However, that requires replicate samples per individual, so not sure if worth the effort/samples (or how that would work with pooled samples).

      • Definitely a trade-off. I’ve heard in China they require 4 negative results in a row now to let someone leave quarantine due to this issue though… so maybe there is data somewhere already, although I suppose they are using different tests.

    • The point is that in terms of the epidemic overall, the information gain about the prevalence from testing 960 people at a hospital is that you know about 96 more people who have it. The information gain from testing 960 people at a couple of grocery stores is that you know within 0.5 or 1% or so what the prevalence is in “people willing to go to a grocery store” which is a big swath of the population.

        • I don’t disagree with that, but in a world with scarce tests where 90% of those who have symptoms AND pass whatever hurdles are presently in place (things like negative flu tests) test negative, it seems like the tests are still needed for those with symptoms. So I think the tests do provide imperfect and yet valuable information for those with symptoms.

        But some random sampling would be worthwhile. Hard to know what the right balance is when we’re so short on tests. S Korean data seems to indicate that lots of young adults don’t even know they have it.

        • Why are these tests needed for people with symptoms? They literally don’t change the treatment plan AT ALL. In fact the county of LA has specifically decided NOT to test anyone seeking healthcare for COVID-like symptoms for this very reason, I read.

        • My understanding is that we are so short on tests that testing centers won’t test people with mild symptoms because it won’t change the treatment, i.e. nothing.

          My assumption is that for those with serious symptoms, it does help to know what they are suffering from to anticipate which treatments may be the most helpful, not to mention what precautions the staff need to take when treating the patient.

        • If a person has severe bilateral pneumonia at the moment I bet the frequency of positive tests in that group is 95% or more. So running the test is useless.

          It’s among the “i have a fever and a dry cough” crowd that positive tests are 10%.

          So, among the mild cases, it’s best to simply assume they have the virus, send them home with very clear instructions. and move on. Among the severe cases, you put them on ventilators and just assume they have it, and you’re almost always right…. So testing in hospitals is *information free* in terms of treatment.

          whereas testing the community tells you how much load to expect 5 to 10 days out, and could tell you whether your community countermeasures are effective or need to be heightened.

        • Prognosis for hospitalised patients depends on them having or not having covid-19. As do the therapeutic choices, in particular when considering experimental treatments, and service planning. The need for isolation protocols and protective equipment depends on whether or not you are facing a SARS-CoV-2 infection. Saying that testing in hospitals is *information free* is beyond ridiculous.

        • n of two, but I actually had a member of my team from work go to the hospital 2 weekends ago with respiratory problems. Had pneumonia but tested negative for coronavirus. He had just run a marathon, which had weakened his immune system, and visited his mother, who also had pneumonia (and rhinovirus) but not coronavirus, which is where they think he picked up the infection.

          Coronavirus is more and more likely (even over the past two weeks), but there are still plenty of ways to get pneumonia.

        • Carlos, I admit I’m thinking mainly of during the surge. When we hit 3 people per available ventilator, it doesn’t matter what your pneumonia is from, you’re not getting treatment, and there’s no room in isolation, and there will be no masks available for the doctors, and the patient is dead before the tests come back anyway (round trip is about 24 hours). Calculations over the last several weeks have consistently shown that to be essentially a necessary consequence of having waited this long to shut things down. However, it’s possible they were wrong. I HOPE they were wrong. But I don’t see anything in Italy, Spain, France, NY, New Orleans etc suggesting they were wrong.

          So, yes, pre-surge when the isolation wards are still available etc, there is benefit to testing. That may apply mainly to rural areas in US or states that lagged in infection initiation so were able to shut down at an earlier stage.

          At the moment I’m assuming it’s entirely impossible to avoid overwhelming many large cities in the US. What *could* maybe be done is to mobilize heroic military resources to move medical equipment to the places with the most broad infection levels, but this would require the kind of information this proposal seeks to generate.

        • > When we hit 3 people per available ventilator, it doesn’t matter what your pneumonia is from, you’re not getting treatment,

          It absolutely does matter; the patient with the better prognosis (everything else being equal) should get it. Also, covid-19 patients require ventilators for a longer duration.

        • > It absolutely does matter; the patient with the better prognosis (everything else being equal) should get it. Also, covid-19 patients require ventilators for a longer duration.

          When you hit 3 people per vent, all the prognoses will be the same, except maybe by age. Just as in Italy, it’ll be just putting the youngest people on vents. You do that without tests because the test takes about 24 hours to come back, and you have about 15 minutes to an hour to make the decision. During the surge, anyone who wasn’t a COVID patient immediately becomes one anyway. The ~24 hour lag makes the testing useless under these conditions, except to classify the cause of death.

          These are the things I hear from updates from my medical community friends or online articles. When the surge comes, it’s just moving beds, sorting people by age, and letting people die in the hall. That seems to have started in NY already :-(

          https://www.washingtonpost.com/national/coronavirus-morgue-autopsy-funeral/2020/03/27/7d345478-7057-11ea-aa80-c2470c6b2034_story.html

          The point is, if you devote testing resources to community prevalence, in whatever manner, then you can maybe reduce the chance that the surge reaches this level above capacity by shifting large quantities of supplies to the hardest hit areas BEFORE people show up at the ER so that you expand capacity appropriately.

          Think of it like a military campaign, you don’t want to be sitting around with all your tanks over here, when the enemy is lining up to attack over there…

          At this point, maybe the overload is less, and it does make sense to shift tests to individuals.

          People are acting like Italy and Spain’s experience won’t happen in the US. In the meantime, the US is the top country by confirmed cases, and we’re clearly 10 to 12 days lagging Italy in the dynamics. 10 days at these dynamics means ~ 10x. So the 116k confirmed cases will be ~ 1.1M by the 7th or so.

          It looked like Italy might have hit its daily new case peak, but then yesterday it had a massive surge. That might be due to surges in other parts of the country? I don’t know. I don’t have sub-country data to look at.

          Ideally, you’d have enormous testing capacity with very short turnaround. If you can turn a test around in 10 minutes and you have 1M of them, then yes, you should reserve some for hospitals. If it takes 24 hours and the patient’s average life expectancy on arrival at the ER is 3 hours… the test is totally a waste of time.

  10. “No one has the slightest clue how widespread SARS-Cov-19 infections really are in the population, we’re wasting all our tests testing sick people where the bayesian prior is basically that they have it already, and the outcome of the test mostly doesn’t change the treatment anyway. It’s stupid.”

    It seems to me, as well, that there would be better use for limited testing capacity than testing those who show clear symptoms specific to Covid-19: just assume that they are positive, and go on from there. However, there’s the middle ground of testing those who show some symptoms but not (yet) very specific or clear symptoms.

    One advantage to identifying such cases as early as possible would be that it allows more efficient tracing and severing of transmission chains: for a person who tests positive, take all their social interactions from the previous N days, test the persons involved, quarantine if positive, and continue recursively (a bare-bones version of this loop is sketched at the end of this comment).

    Another example of the benefits of testing unsure cases, although more subtle: I would guess that people are less likely to break the quarantine if they know for sure that their mild symptoms are caused by Covid-19. This could become more significant if exhaustion sets in as the epidemic goes on.

    One more benefit to knowing that the symptoms are caused by Covid-19 is that it allows returning to the society after recovery with an (assumed) immunity.

    It is not clear to me if the benefits of being able to estimate prevalence outweigh the above-mentioned benefits of testing unsure cases, or vice versa. Doubling the testing capacity would allow us to do both…
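
    A bare-bones sketch of the trace-and-test loop described above, in R. The functions contacts_of() and test_person() are hypothetical stand-ins for whatever contact data and assay turnaround a health department actually has; this is only meant to make the recursion concrete, not to describe any agency’s system.

      # Starting from one confirmed case, test everyone they contacted in the
      # last N days, quarantine the positives, and repeat from each new positive.
      trace_and_test <- function(index_case, contacts_of, test_person) {
        to_check  <- contacts_of(index_case)   # contacts from the previous N days
        confirmed <- c(index_case)
        while (length(to_check) > 0) {
          person   <- to_check[1]
          to_check <- to_check[-1]
          if (person %in% confirmed) next      # already handled
          if (test_person(person)) {           # positive: quarantine and recurse
            confirmed <- c(confirmed, person)
            to_check  <- c(to_check, contacts_of(person))
          }
        }
        confirmed                              # everyone found along this chain
      }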

    • I think you’re underestimating the importance of testing in the hospital setting. There is a scarcity of personal protective equipment. Treating every potential patient as a SARS-Cov-2 infection wastes a lot of resources. What is required is faster turnaround times to have a better idea of who really needs to be put in isolation without having to wait for days for the results of the tests. (Informing treatment decisions would be another reason.)

      https://jamanetwork.com/journals/jama/fullarticle/2763590

    • The problem is that the current bolus of patients is far too big to trace contacts or use isolation rooms, etc. The time for that was early February, but testing was an utter failure then. Today there are probably 1M patients easily in the US. We need to work through that caseload while monitoring to see that distancing measures are working and allocating emergency surge capacity to the appropriate locations. These immediate goals are better met with community monitoring. Later, when active infections have dropped to ~ 1000, we can return to individualized testing. Hopefully by then capacity will be 100x larger.

      • As an aside: do we have numbers for total, all-cause mortality and morbidity / ICU / hospitalization rates for these same regions that appear on the covid graphs?

        Would be interesting to see what those trends show.

  11. This is an excellent post. (And again, it’s mind-boggling that we don’t have a ready-made plan for doing something like this.) As Daniel notes in a comment, “Pretty much most major metropolitan areas have universities that are SHUTTERED now.” What’s more, lots of biologists and similar sorts at these universities (mine included) are aching to do something useful. It would be trivial to get volunteers.

  12. Hang on, am I missing something or is there a big pseudoreplication issue here? Disease prevalence is potentially very clustered, so even if you are testing 1000 per location you are only looking at 100 locations, and thus potentially missing individual hot spots completely? Therefore your uncertainty will be much larger than a naive implementation suggests…

    • The strategy by which you choose your locations and the number of different plates to run, etc., is of course open to a lot of choices and tradeoffs. The single-plate strategy gives you an accurate estimate for a particular population: the people who come to a particular spot on Tuesday or whatnot. It is absolutely the case that prevalence will vary in different communities. You certainly couldn’t just run one of these and figure out everything about the whole country, for example. The point is that you can use this kind of strategy to get relatively precise estimates of prevalence within a particular community at vastly lower cost than running ~ 1000 individual tests.
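
      For readers who want to see the estimation step spelled out, here is a minimal sketch in R of turning a count of positive wells from one plate into a posterior over prevalence, using a simple grid approximation. The plate layout (96 wells of 10 swabs each), the Beta(2,95) prior, and the example count of 9 positive wells are illustrative assumptions, and the sketch ignores imperfect sensitivity and specificity (discussed further down this page).

        f_grid <- seq(0.0005, 0.15, by = 0.0005)   # candidate prevalence values
        prior  <- dbeta(f_grid, 2, 95)             # Beta(2,95) prior on prevalence f
        p_well <- 1 - dbinom(0, 10, f_grid)        # P(a 10-swab well is positive)
        n_pos  <- 9                                # e.g. 9 of 96 wells came back positive
        lik    <- dbinom(n_pos, 96, p_well)        # likelihood of the plate result
        post   <- prior * lik
        post   <- post / sum(post * 0.0005)        # normalize on the grid
        cdf    <- cumsum(post * 0.0005)
        c(mean = sum(f_grid * post * 0.0005),      # posterior mean prevalence
          lo   = f_grid[which.min(abs(cdf - 0.025))],
          hi   = f_grid[which.min(abs(cdf - 0.975))])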

  13. A 1% guess of prevalence is hopefully way too high. A prevalence of 0.1% will consume all available ICU beds, and then some. Roughly 4% of positive patients need intensive care, so with 0.1% prevalence an area will need 40 ICU beds per million, just for COVID. There are about 30 ICU beds per million in the US, mostly occupied by non-COVID patients. Since most patients so far have been able to access care, we can be sure that in most places (probably not NYC), actual prevalence of COVID-19 is much less than 0.1%.
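
    The arithmetic behind those bed numbers, with the figures quoted above (0.1% prevalence, ~4% of infections needing intensive care) treated as given rather than endorsed:

      pop        <- 1e6      # people in the area
      prevalence <- 0.001    # 0.1% currently infected
      icu_rate   <- 0.04     # ~4% of infections need intensive care
      prevalence * pop * icu_rate   # ~40 ICU beds needed per million, for COVID alone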

    • Isn’t the 4% estimate based only on confirmed cases, i.e. cases that tested positive, and not the number of infections, where the number of infections is confirmed cases plus any untested cases?

      We already know a lower bound for the prevalence (current confirmed cases/population) in e.g. NY to be about 0.24%, NJ is 0.1% and LA/WA both at 0.05%. These bounds are very likely not tight because there is a lag effect in confirmed cases and an underestimation issue when using confirmed cases as a stand-in for infections.

  14. If the prevalence is that low (e.g. 1%) and the PCR is the bottleneck, then you could also identify the people who are positive by rerunning the test for all samples in the wells that tested positive.

    So by doing 960 swabs for one location you could identify the positive cases with 2-3 PCR trays, instead of 10 trays.
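
    A rough expected-workload check on this two-stage idea, assuming ~1% prevalence, pools of 10, and retesting every swab from each positive well individually (the pool size and prevalence are illustrative, and well-level test error is ignored):

      f           <- 0.01                  # assumed prevalence
      p_well      <- 1 - (1 - f)^10        # chance a 10-swab well is positive (~9.6%)
      exp_pos     <- 96 * p_well           # expected positive wells on one plate (~9.2)
      exp_retests <- exp_pos * 10          # individual retests needed (~92 swabs)
      96 + exp_retests                     # ~188 wells total, i.e. about 2 plates for 960 people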

  15. What about rerunning the PCR test with the individual swabs in the wells that tested positive? Then you could identify the positive cases from 960 samples with 2-3 PCR trays, instead of 10 trays? Assuming around 1% prevalence and that PCR is the bottleneck.

  16. 2 cents for 2 thoughts (and perhaps I am repeating what others already stated — I haven’t read all comments):

    1. Putting 10 samples into 1 well may lead to a high degree of dilution, particularly if it’s just 1 out of 10 who are currently infected instead of 5 or 6 out of 10. The assay may be too close to its lower limit of detection (i.e., the concentration that can still be reliably differentiated from zero) to accurately flag an infection. Thus, you run the danger of ending up with too many false negatives, i.e., you keep underestimating the infection rate of the virus (see the sketch below for how this biases a pooled prevalence estimate).

    2. It appears to me that what we need are (a) a safe and potent vaccine against corona (this may take a while, though, especially if you want to make sure that it’s really safe), (b) massive, random-sample screening for the virus to get a better sense of how quickly it gets transmitted, and (c) a valid assay for antibodies in those who do not appear to have the virus: they could have never been infected OR they could already have been infected and developed a regular immune system response. That’s where a lot of the money should go right now so that those who have already had it can be classified as immunocompetent, resume work, and help get the economy going again.
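
    To make the dilution worry in point 1 concrete: if a well that truly contains virus is only detected with some probability below 1 (call it sens), the well-level positive rate shrinks, and an estimator that assumes perfect detection is biased low. A minimal sketch, with both numbers made up for illustration:

      f    <- 0.02                         # true prevalence (illustrative)
      sens <- 0.85                         # assumed chance a truly positive pooled well is detected
      p_well_ideal <- 1 - (1 - f)^10       # well positivity with perfect detection
      p_well_real  <- sens * p_well_ideal  # what the plate would actually show
      f_naive <- 1 - (1 - p_well_real)^(1/10)   # prevalence implied if the loss is ignored
      c(true_f = f, naive_estimate = f_naive)   # naive estimate comes out too low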

    • A Technion study showed that you could pool 64 swabs already. I chose 10 because it keeps the probability that you get several positives in a well down. It would be good to optimize the whole thing, but we have too little time before our full surge; we need this data by next Wed at the latest…

      • A collaboration of German labs just (pre-)published results from using the 10 well minipool strategy you describe above, although with a different collection method and no attempt to estimate prevalence beyond simple case estimation. https://www.medrxiv.org/content/10.1101/2020.03.30.20043513v1.full.pdf

        The greatest challenge here seems to be bridging the gap between statistical optimality (or at least efficiency) on one side, and the labs’ need for a simple, reliable, quick, and predictable workflow on the other side.

        Colleagues and I have done simulation studies (using prevalence rates covering the range seen in countries currently dealing with COVID-19) that suggest row-and-column pooling on 96-well plates gives reasonable efficiency. https://www.medrxiv.org/content/10.1101/2020.03.27.20043968v1
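
        For readers unfamiliar with the design, a bare-bones illustration of row-and-column pooling (not necessarily the exact scheme in the linked preprint): each sample sits at one row-column intersection of an 8 x 12 plate, each row and each column is pooled and tested (20 reactions for 96 samples), and a sample is flagged for individual retest only if both its row pool and its column pool are positive. The prevalence and the assumption of error-free pooled tests are illustrative.

          set.seed(1)
          f <- 0.02                                               # assumed prevalence
          plate <- matrix(rbinom(96, 1, f), nrow = 8, ncol = 12)  # 1 = infected sample
          row_pos <- rowSums(plate) > 0                           # positive row pools
          col_pos <- colSums(plate) > 0                           # positive column pools
          flagged <- outer(row_pos, col_pos, "&")                 # candidates for retest
          c(pooled_reactions = 8 + 12,
            retests          = sum(flagged),
            true_positives   = sum(plate))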

        • Daniel Klein — Thank you for posting your fascinating and useful manuscript.

          I encourage you to update your analyses to account for the issue of imperfect test specificity. In short, when testing at low disease prevalence (for example, less than 2%), issues of false positives that would be negligibly small in usual practice can become a central, critical obstacle for estimating the disease prevalence in the population. For more on the importance of this issue and potential solutions, kindly see my discussions with Daniel Lakeland elsewhere on this page. 
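
          To put a number on that point: with an assumed well-level false positive rate of 1% (an illustrative figure, not a property of any particular assay), a 96-well plate of 10-swab pools will show false positive wells at a rate comparable to the true positive wells once prevalence falls to a fraction of a percent, so the false positive rate has to enter the likelihood rather than be ignored.

            spec   <- 0.99                     # assumed well-level specificity (illustrative)
            f      <- 0.002                    # true prevalence of 0.2%
            p_true <- 1 - (1 - f)^10           # wells positive because they contain virus (~2%)
            c(expected_true_pos_wells  = 96 * p_true,
              expected_false_pos_wells = 96 * (1 - p_true) * (1 - spec))
            # At f = 0.2%, roughly 1 in 3 positive wells on the plate is a false positive.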

  17. One of the issues in the current mess is that we are shown aggregate data, for example by country. With better prevalence data and associated covariates, the picture would have been sharper. The Princess cruise ship data gives a peek at a population sampled in a census-like data collection, including covariates such as the ventilation system and the staff activity.

    Location-based data, with covariates regarding population behavior and infrastructure characteristics, would help us understand the numbers we see, calibrated per capita or not.

    For an excellent account of the Princess case study see https://errorstatistics.com/2020/03/26/the-corona-princess-learning-from-a-petri-dish-cruise/

  18. What is called here pooled sampling is often called “group testing”. It is used in several types of applications.

    Group testing was originally developed at Bell Laboratories for efficiently inspecting products. M. Sobel and P. A. Groll, Group testing to eliminate all defectives in a binomial sample, Bell System Technical Journal 38(5) (1959) 1179-1252.

    Other important papers include the one by Geoff Watson from Princeton: G. S. Watson, A study of the group screening method, Technometrics 3(3) (1961) 371-388, and R. Dorfman, The detection of defective members of large populations, Annals of Mathematical Statistics 14 (1943) 436-440.

    The blood-testing application is described in: F. M. Finucan, The blood testing problem, Applied Statistics 13 (1964) 43-50.

    We used it in testing web applications: Bai, Kenett, Wu; Risk Assessment and Adaptive Group Testing of Semantic Web Services, International Journal of Software Engineering and Knowledge Engineering Vol. 22, No. 5 (2012) 595-620, https://www.worldscientific.com/doi/abs/10.1142/S0218194012500167

    Identifying affected units and deriving prevalence estimates is also relevant in industry 4.0 applications, among others…

  19. … “Testing” itself does not prevent or cure Covid-19.

    This emotional fixation upon testing seems highly misplaced.

    We have a huge amount of technical & historical experience with infectious diseases, viruses, and flu viruses — step back and calmly look at the big picture.

    What is the overall sober, rational path of action from here?

    • The sober, rational path is to collect data on the extent of your epidemic, rather than on individual patients, and use this extent data to plan the deployment of resources. Hence the focus on *testing the community*, not testing hospital patients.

      Once we have data on where the infections are and how rapidly they’re changing (repeated community sampling through time) we can make much more rational decisions.

      This is particularly important because you can detect an infection easily 5 to 10 days before that person shows up at the ER, so by community sampling, you can figure out what resources are needed before it’s too late and you need them but can’t get them.

  20. This could be a good study, but hopefully people are sharing the data that are already being collected on non-symptomatic populations. NBA players found 7 positives (5 symptomatic) testing roughly ¼ of 450 players by March 17 (teams tied to the Utah Jazz, https://www.forbes.com/sites/kurtbadenhausen/2020/03/18/why-do-so-many-nba-players-have-the-coronavirus-and-why-are-they-getting-tested/), but those players travel a lot and are not super representative. A better and more important dataset should be analyzable from all hospitals in the NY Presbyterian system, which started testing all mothers coming in for delivery because of an infected mother who only showed symptoms rapidly after delivering. That population is young, healthy and diverse, though even they are biased because many are trying to leave the city due to another policy disallowing spouses in labor and delivery. I happen to know that testing sample exists because my spouse works at a NYP hospital, but how many other semi-random samples are out there? What agency has people tracking those ‘natural experiments’ down? Who should we contact to increase the likelihood this data is shared and used?

    • > data that are already being collected on non-symptomatic populations. NBA players found 7 positives (5 symptomatic) testing roughly ¼ of 450 players by March 17

      So 1.5%, that is similar to what was reported for healthcare workers in the Netherlands and the general population in Iceland. Also, 30% are totally asymptomatic, like on the Diamond Princess. IIRC, for that data 85% were asymptomatic/mild in the end (would normally go unreported).

      > because of an infected mother who only showed symptoms rapidly after delivering.

      Very interesting, is there something in the blood of young healthy people that depletes with age, stress, and chronic illness?

    • One of the epidemiologists leading the response in Korea (subtitled):

      https://www.youtube.com/watch?v=gAk7aX5hksU

      20% of infected people have no symptoms. Even the lightest face masks can be effective. Generally not aerosolized but that’s not impossible.

      By far the biggest barrier to dealing with this issue is the lack of any office or agency coordinating information and communicating accurate information back to the public. Different groups and agencies are gathering sub-standard information over and over and over again.

      BTW, Washington State – the place of the first case in the US – is still restricting testing to people with symptoms!!! OMFG. It’s not easy to find this out. You have to actually try to find out how to get a test, then you’ll run into this little problem that no one is talking about.

  21. I guess you are familiar with the Bayesian model by Sunetra Gupta et al., heavily reported in the UK media, saying that just from the death rates the population infection rate is really unclear. Maybe that kind of thing is the background to this post, although I didn’t see it mentioned. Would be interested to know your take on it.

    Fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the SARS-CoV-2 epidemic
    Jose Lourenco, Robert Paton, Mahan Ghafari, Moritz Kraemer, Craig Thompson, Peter Simmonds, Paul Klenerman, Sunetra Gupta
    https://www.medrxiv.org/content/10.1101/2020.03.24.20042291v1

    • Daniel, thank you for this thread. “To make decisions about how much physical isolation and shutdown and things we need, we NEED real-time monitoring of the prevalence in the population.”

      Can you or someone else clarify why we would not also require real-time monitoring of individuals (i.e. the same individuals tracked over time, within a population)? Presumably, dynamics at the level of populations can (and do) differ from dynamics at the level of individuals.

      • In the presence of an infinite supply of fast tests, you need everything, so do everything. But in the present situation where there are *whole states* that have done only a handful of tests… we need prevalence more than individuals because prevalence allows us to know where to physically send medical resources to meet the demands in the coming week to 3 weeks.

        Long term once the immediate case pulse has worked through to resolution, we will need different things, which will definitely include the test and trace methodology, which is very effective but only at small prevalence.

        • “prevalence allows us to know where to physically send medical resources to meet the demands in the coming week to 3 weeks.”

          But the problem is that no one has anywhere near enough materials to meet the current demand anywhere. No matter what you project about demand you won’t be anywhere close to being able to satisfy it. The available materials are so far short of the requirement, and the uncertainty in the predictions high enough that even just trying to optimize the distribution of equipment isn’t going to be effective, especially when, forced to make choices, people are finding local substitutions.

          If this was going on eight to ten weeks ago, it would have some chance. But even six weeks ago it was probably too late.

  22. It’s a reasonable plan and something lots of us are thinking about already and trying to implement. Obviously the sample size here depends on your inferential target (prevalence, incidence, etc.) and particularly depends on the spatial resolution you would like if the spatial distribution of cases is of interest. There already exist some data in the form of cases, hospitalisations, deaths etc. that could be informative. And to improve this sampling design, if spatial prediction were of interest, then you would target areas with high prediction variance (or some other criterion) rather than just purely randomly.

  23. I am certainly curious about what was said about corona pandemics before, on the occasion of work on vaccines – [curious, if not brazen enough to barge into a vast & alien body of work.]

  24. Evidently the CDC plans to do a random sample of some sort this summer. See the URL: https://www.statnews.com/2020/04/04/cdc-launches-studies-to-get-more-precise-count-of-undetected-covid-19-cases/; look in the third paragraph of the news item. I had the idea of a relatively simple stratified random sample, the strata defined by sex and, initially, broad age groups, and perhaps some geographic stratification. I’m certain I’m not the only one to have come up with this idea.

    The goal would be to estimate the proportion not infected, infected and symptom-free, and infected and presumably with mild symptoms. (Those with moderate or severe symptoms would likely not be out and about.) What seems to have been lost is that until prevalence stabilizes, periodic samples are needed to assess the rate of change in these proportions; furthermore, two or three samples would not be enough. The growth of the infected proportion is likely to follow a logistic curve of some sort, and it would be helpful to know where the point of inflection is along with the asymptote (a generic form of such a curve is sketched just below).

    I’m sorry, but it seems to me that the CDC is more than a day late and lots of dollars short.
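
    As a side note on the curve mentioned above, a generic three-parameter logistic makes the quantities explicit: K is the asymptote (the final infected proportion), t0 the inflection point (the time of fastest growth), and r the growth rate. The parameter values below are made up purely to show the shape, not estimates for this epidemic.

      logistic <- function(t, K, r, t0) K / (1 + exp(-r * (t - t0)))
      t <- 0:120
      plot(t, logistic(t, K = 0.4, r = 0.1, t0 = 60), type = "l",
           xlab = "days", ylab = "infected proportion")   # inflection at t0 = 60, asymptote K = 0.4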

  25. I don’t know anything about the subject so I can’t tell if this approach makes sense. But some people here may find it interesting:

    https://www.billiontoone.com/covid-19

    BillionToOne developed a novel qSanger-COVID-19 Test that is more than 30x higher throughput than existing quantitative PCR (qPCR) methods and is highly accurate and cost-effective.

    Combining the unused Sanger sequencer capacity from the Human Genome project and the proprietary machine learning algorithm, BillionToOne’s qSanger-COVID-19 unlocks millions of daily testing capacity worldwide.
