Updated Santa Clara coronavirus report

Joseph Candelora in comments pointed to this updated report on the Santa Clara study we discussed last week.

The new report is an improvement on the first version. Here’s what I noticed in a quick look:

1. The summary conclusion, “The estimated population prevalence of SARS-CoV-2 antibodies in Santa Clara County implies that the infection may be much more widespread than indicated by the number of confirmed cases,” is much more moderate. Indeed, given that there had been no widespread testing program, it would’ve been surprising if the infection were not much more widespread than indicated by the number of confirmed cases. Still, it’s good to get data.

2. They added more tests of known samples. Before, their reported specificity was 399/401; now it’s 3308/3324. If you’re willing to treat these as independent samples with a common probability, then this is good evidence that the specificity is more than 99.2%. I can do the full Bayesian analysis to be sure, but, roughly, under the assumption of independent sampling, we can now say with confidence that the true infection rate was more than 0.5%. (They report a lower bound for the 95% confidence interval of 0.7%, which seems too high, but I haven’t accounted for the sensitivity yet in this quick calculation. Anyway, 0.5% or 0.7% is not so different.)
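Here’s a quick version of that calculation in R — a rough sketch, not the full Bayesian analysis. The raw positive count of roughly 50 out of 3330 tests is an assumption taken from the figures quoted in the comments below (a 1.5% raw rate), so check it against the report:

    # Rough check, treating the 3324 known-negative samples as independent
    # draws with a common specificity (the assumption flagged above)
    spec_ci <- binom.test(3308, 3324)$conf.int  # exact 95% CI for specificity
    fpr_max <- 1 - spec_ci[1]                   # upper bound on the false-positive rate

    # Illustrative raw positive rate: about 1.5%, roughly 50 of 3330 tests
    raw_pos <- 50 / 3330

    # Crude lower bound on the true infection rate, ignoring sensitivity
    # (sensitivity below 100% would only push the true rate higher)
    raw_pos - fpr_max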

3. The section on recruitment says, “We posted our advertisements targeting two populations: ads aimed at a representative population of the county by zip code, and specially targeted ads to balance our sample for under-represented zip codes.” This description seems incomplete, as it does not mention the email sent by the last author’s wife to a listserv for parents at a Los Altos middle school. This email is mentioned in the appendix of the report, though.

4. They provide more details on the antibody tests. I don’t know anything about antibody tests; I’ll leave this to the experts.

5. On page 20 of their report, they give an incorrect response to the criticism that the data in their earlier report were consistent with zero true positives. They write, “suggestions that the prevalence estimates may plausibly include 0% are hard to reconcile with documented data from Santa Clara…” This misses the point. Nobody was claiming that the infection rate was truly zero! The claim was that the data were not sufficient to detect a nonzero infection rate. They’re making the usual confusion between evidence and truth. It does not help that they refer to concerns expressed by “several people” but do not cite any of these concerns. In academic work, or even online, you cite or link to people who disagree with you.

They also say, “for 0 true positives to be a possibility, one needs not only the sample prevalence to be less than (1 – specificity) . . . but also to have no false negatives (100% sensitivity).” I have no idea why they are saying this, as it makes no sense to me.

The most important part of their response, though, is the additional specificity data. I’m in no particular position to judge these data, but this is the real point. They also do a bootstrap computation, but that is neither here nor there. What’s important here is the data, not the specific method used to capture uncertainty (as long as the method is reasonable).

6. I remain suspicious of the weighting they used to match sample to population: the zip code thing bothers me because it is so noisy, and they didn’t adjust for age. But they now present all their unweighted numbers too, so the weighting is less of a concern.

7. They now share some information on symptoms. The previous version of the article said that they’d collected data on symptoms, but no information on symptoms was presented in that earlier report. Here’s what they found: among respondents who reported cough and fever in the past two months, 2% tested positive. Among respondents who did not report cough and fever in the past two months, only 1.3% tested positive. That’s 2% of 692 compared to 1.3% of 2638. That’s a difference of 0.007 with standard error sqrt(0.02*0.98/692 + 0.013*0.987/2638) = 0.006. OK, it’s a noisy estimate, but at least it goes in the right direction; it’s supportive of the hypothesis that the positive test results represent a real signal.
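In R, that arithmetic looks like this (numbers as given above):

    # Difference in positive-test rates: cough-and-fever vs. not
    p1 <- 0.02;  n1 <- 692    # reported cough and fever in the past two months
    p2 <- 0.013; n2 <- 2638   # did not
    diff <- p1 - p2
    se   <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    round(c(difference = diff, std_error = se), 3)   # 0.007 and 0.006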

Table 3 of their appendix gives more detail, showing that people reporting loss of smell and loss of taste were much more likely to test positive. Of the 60 people reporting loss of smell in the two weeks prior to the study, 13 tested positive. Of the 188 reporting loss of smell in the two months prior to the study, 21 tested positive. Subtracting, we find that of the 128 reporting loss of smell in the two months prior but not the two weeks prior, 8 tested positive. That’s interesting: 22% of the people with that symptom in the past two weeks tested positive, but only 6% tested positive among people who had the symptom in the previous two months but not the previous two weeks. That makes sense: I guess you have fewer antibodies if the infection has already passed through you. Or maybe those people with the symptoms one or two months ago didn’t have COVID-19, they just had the regular flu?
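Spelling out the subtraction (counts from Table 3 as quoted above):

    # Loss-of-smell counts from Table 3 of the appendix
    pos_2wk <- 13; n_2wk <- 60     # symptom in the two weeks before the study
    pos_2mo <- 21; n_2mo <- 188    # symptom in the two months before the study

    # Symptom in the two months prior but not the two weeks prior
    pos_old <- pos_2mo - pos_2wk   # 8
    n_old   <- n_2mo - n_2wk       # 128

    c(recent = pos_2wk / n_2wk, older = pos_old / n_old)   # about 22% and 6%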

8. Still no data and no code.

Overall, the new report is stronger than the old, because it includes more data summaries and more evidence regarding the all-important specificity number.

Ultimately, this is just one survey. Whether the infection rate in Santa Clara county in early April was 0.7% or 1.4% or 2.8% or higher, we’ll soon be getting lots more information from all over, so what’s important is for us to learn from what’s worked and what hasn’t in the studies we’ve done so far. Also, as we discussed earlier, all this specificity stuff will become less of an issue for populations with higher underlying infection rates.

P.S. Following up on point 2 above (“I can do the full Bayesian analysis to be sure”), here’s a further analysis allowing for the specificity and sensitivity of the test to vary across testing location. This gives wider uncertainty intervals for the underlying rate. Again, there’s only so much that can be learned from this one study.
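For readers who want the mechanics behind these adjustments: the basic, non-hierarchical correction inverts the identity p_obs = prev*sens + (1 - prev)*(1 - spec). Here is a minimal sketch in R, plugging in the paper’s pooled point estimates (sensitivity 82.8%, specificity 99.5%) and an illustrative 1.5% raw positive rate; the linked analysis goes further by letting sensitivity and specificity vary across sites:

    # Invert p_obs = prev * sens + (1 - prev) * (1 - spec) for the true prevalence
    prevalence <- function(p_obs, sens, spec) (p_obs + spec - 1) / (sens + spec - 1)
    prevalence(p_obs = 0.015, sens = 0.828, spec = 0.995)   # about 1.2%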

55 thoughts on “Updated Santa Clara coronavirus report”

  1. That the researchers would permit a spouse (an MD who should know better) to circulate an advertisement falsely claiming that a) subjects are getting an FDA-approved test, and b) such tests are being used in places to grant immunity passes enabling people to return to work, makes me extremely skeptical of their judgment and their work.

    Also, the NYC data pretty much makes their IFR claims highly unreasonable.

    • >>Also, the NYC data pretty much makes their IFR claims highly unreasonable.

      Probably.

      But how many nursing homes have gotten infected in Santa Clara County?

      Also, if it’s true that NYC overused ventilators, that might raise their IFR out of proportion to the rest of the US. Probably still shouldn’t be a 4x-7x difference though…

        • Quite likely, but NYC had way more critical-care cases relatively early on than anywhere else in the US.

          All I meant was that IF care got better in, say, mid-April, the number of ICU cases that were treated the old way would be overwhelmingly weighted towards NY. And thus IFRs might be lower in interior-US states, since the proportion of ICU cases treated after the change would be higher in interior-US than in NY (and Washington State etc.)

  2. Not clear to me that they’re using the manufacturer’s sensitivity data correctly in this new paper. Their test methodology was to count as positive anyone with a positive result by either IgG OR IgM. The manufacturer reported 75 of 75 known IgG positives to be kit-positive, and 78 of 85 known IgM positives to be kit-positive. For the study’s test characteristic adjustments, they used 78/85 as the mfgr sensitivity data. That would make sense I suppose if a) the mfgr’s data was all one population where for some reason 10 were untested or produced unusable results for IgG, and b) the 7 false negatives by IgM all came from the population of 10 untested/unusable under IgG. If any of those IgM false negatives were in the 75 known IgG positives, then they should be treated as true positives, not false negatives, because the study would’ve counted them as positive.

    I find it odd that they chose to count positives based on either IgG or IgM and not both IgG and IgM. Clearly the limiting factor in this test’s usefulness is its specificity. The test specificity goes up if you switch from either to both.

    I still am concerned about the likelihood of positive enrichment due to self-selection, and this is my biggest concern. A likely way for self-selection to manifest would be if people had limited interest in knowing their status and were likelier to participate if they thought they had been exposed. It would appear that interest was lowest among the Hispanics contacted — despite the study authors’ efforts to get a balanced sample, Hispanics represented only 8% of the test subjects while they are 28% of the county population. And Hispanics had by far the highest raw positive rate in the study — 4.9% vs an overall average of 1.5% for the full population (Asians were next at 1.9%). To me, this suggests there’s a good chance that there was self-selection by the Hispanic population in the form of a lot of healthy people staying home, which would’ve slightly enriched the positive rate in the raw test sample, but would’ve _heavily_ enriched the positive rate in the study’s estimate of population-weighted prevalence (the population weighting brought the rate up from 1.2% to 2.8%).

    The new paper mentions a couple of things that were errors in the first paper — somehow they missed 2 true-positives in the sensitivity testing they did themselves (27 true-positives instead of 25)? Also, they originally misused the mfgr data on specificity, which was 369 of 371 known negatives to be kit-negative for IgG and 368 of 371 known negatives to be kit-negative for IgM. Version 1 of this study used 369/371, which doesn’t make sense given that they did their study counting positives based on either IgG or IgM. In the revised study they used 368/371, which is better and likely correct. However, as with the mfgr’s sensitivity info, it’s still not clear to me that 3 false positives is the correct figure; again, the manufacturer doesn’t say either way whether the 2/371 false positives for IgG were different people than the 3/371 false positives for IgM. I would guess that the two groups overlapped, which would mean the 3 used in this study is correct.
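    To make the overlap question concrete, the two extreme cases can be bounded directly — a sketch in R, using the manufacturer’s counts as described above:

      # Specificity of the "positive if IgG OR IgM" rule, given the mfgr's counts:
      # 2/371 false positives for IgG and 3/371 for IgM on the same 371 known negatives
      fp_igg <- 2; fp_igm <- 3; n <- 371

      # If the IgG false positives are a subset of the IgM ones (full overlap):
      (n - max(fp_igg, fp_igm)) / n    # 368/371, the figure the revised study uses

      # If the two sets of false positives are disjoint (no overlap):
      (n - (fp_igg + fp_igm)) / n      # 366/371, the worst case for the OR rule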

    If anyone else wants to look at spec/sen information reported by the manufacturer, I think this is it (see tables on page 2):
    https://imgcdn.mckesson.com/CumulusWeb/Click_and_learn/Premier_Biotech_COVID19_Package_Insert.pdf

    • It seems to me that the high number of “loss of smell within last two weeks” patients is consistent with the self-selection hypothesis. That, or there was a very recent surge in cases.

      Since those would be recent infectees who would most want to know about having the virus. I don’t think Andrew’s theory about antibodies wearing off within two months squares with other data.

    • “I find it odd that they chose to count positives based on either IgG or IgM and not both IgG and IgM. ”

      It’s because of the biology of these antibodies. IgM appears first. There is a lag, typically several weeks, before IgG antibodies appear. And after IgG antibodies appear, IgM tapers off and, in most cases, disappears. So if you define a positive test as both IgG and IgM you will get false negatives for most people who were either recently or remotely infected and only identify those who were infected within a brief (i.e. several weeks or a few months) window.

      • Thanks, that makes sense. However, the IgM picked up an extra false positive in the manufacturer’s test. With such low prevalence, it hurts to take a specificity hit like that. But maybe the time delay on IgG makes doing the study impractical. Rock and a hard place, I guess.

  3. So (cellwise) the weights vary over almost three orders of magnitude (from 10^-1.2 to 10^1.5) and the unadjusted prevalences over what, 0%–15%?

    Surely something smart has to be done to get good estimates?

  4. I would also add that comparisons that some of these authors have made to the flu in the past are misleading. CDC-quoted flu death rates are not adjusted for asymptomatic infections and include statistical estimates of flu deaths resulting from respiratory and cardiac events where influenza was not reported on the death certificate.

    See https://blogs.scientificamerican.com/observations/comparing-covid-19-deaths-to-flu-deaths-is-like-comparing-apples-to-oranges/

    and

    https://www.cdc.gov/flu/about/burden/faq.htm

  5. Question: Has anyone determined whether people who get a low dosage (mild exposure) of the virus or virus proteins can produce enough antibodies to be positive in the serological tests but not obtain immunity?

    • That’s a good question. However, why should this coronavirus behave all that much differently? Even a vaccine may not guarantee full immunity. Flu vaccines, I understand, are in the 30–60% effectiveness range. And vaccines for coronaviruses have been tried unsuccessfully.

  6. Likely that part of the enrichment of recent loss of sense of smell/taste as positive vis a vis those who had the symptoms remotely (22% positive if in the last two weeks, 8% in the last two months) reflects that the former group is enriched for IgM as well as IgG and the latter group relies on IgG only. A lateral flow immunoassay like this obtains quite a bit of its sensitivity from IgM positivity; that is, early stages of infection.

    Another issue: since early in our experience with this virus, smell/taste disturbance has been identified as a prime presenting symptom. These are not asymptomatic carriers: these are all people an informed clinician would strongly suspect of COVID infection had they presented. Easy to imagine that many such people self-selected after hearing this might be a COVID symptom. One should also note the earlier incidence of smell/taste disturbance may have been pre-COVID and for more pedestrian reasons, such as nasal polyps.

    Happy to see the new update walks back some of their earlier rather sweeping contentions that there is a vast hidden asymptomatic immune-protected group. These further revelations serve to emphasize that the reservations expressed by the statistical community were valid. A true representation of the data appears to support a much more modest contention: namely, that the asymptomatic post-infectious population is small but real and that herd immunity is miles away. As noted by Dr. Gelman above, the notion that we had yet to identify all cases of COVID with the testing to date was never in doubt. I actually think their data seem to suggest that we haven’t missed a whole lot.

  7. It seems the reweighting mistake still persists. (Following the reweighting, they treat 4 samples the same as 1 sample upweighted 4x, even though a single error can obliterate the latter.) They have to either refine their analysis (by carrying the weights through to the bootstrap estimates of sensitivity/specificity), or (crudely) inflate their FPR/FNR by the maximum coefficient of reweighting.

    That being said, with the new data, the study is much more convincing.

  8. Good to see the apparent scientific progress on specificity (item 2). Lucky break that it aligned so well with the point estimate from the previous data!

    Item 5 is pedagogically discouraging. It’s one thing to miss something when drafting a report but quite another to still not see an error after it’s been pointed out.

    The weighting problem remains a mystery to me. The zip code thing feels largely decorative. I still can’t tell what slice of the population these results are supposed to speak to. Facebook users who want a test and can get to the testing site at the right time?

  9. The specificity trials on page 19 are not normal.
    7 trials show 100%, with N=30,70,1102,300,311,500,99, sum 2412.
    The 6 remaining trials:
    368/371 = 99.2% (97.7-99.8)
    198/200 = 99.0% (96.4-99.9)
    29/31 = 93.6% (78.6-99.2)
    146/150 = 97.3% (93.3-99.3)
    105/108 = 97.2% (92.1-99.4)
    50/52 = 96.2% (86.8-99.5)
    Pooling these, I get 896/912=98.3% (97.2-99.0).

    We use the pooled test performance based on the available information:
    Sensitivity: 82.8% (exact binomial 95CI 76.0-88.4%)
    Specificity: 99.5% (exact binomial 95CI 99.2-99.7%)

    There is no trial that has exactly 1 false positive. There are 3 trials that don’t have 99.5% in the 95% CI (4 trials if you include 1102/1102). There is no trial that falls inside the 99.2–99.7 range (one straddles it). The specificity range they’re using is an empty space between the values that the trials are actually at. This is not a normal distribution.
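    One quick way to quantify that heterogeneity — a sketch in R, with counts transcribed from the list above, not from the paper:

      # False positives and sample sizes for the 13 specificity trials listed above
      n_neg <- c(30, 70, 1102, 300, 311, 500, 99,    # the seven 100% trials
                 371, 200, 31, 150, 108, 52)         # the six trials with false positives
      fp    <- c(0, 0, 0, 0, 0, 0, 0,
                 3, 2, 2, 4, 3, 2)

      # Chi-squared test of a common false-positive rate across trials.
      # Expected counts are small here, so the approximation is rough,
      # but the rejection is far from borderline.
      prop.test(fp, n_neg)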

    Andrew pointed out that 187 samples had loss of smell and taste in the past 2 months. This is a very specific indicator for Covid-19: ~70% of patients (well, 33.9–85.6%, depending on the study, e.g. Mons/Belgium, Heinsberg/Germany) have it, and I don’t think this kind of nerve affliction has been reported for any other common illness. Yet only 11% of these samples test positive. For the 59 more recent samples, it’s 22%.
    This looks like the prevalence this study should have measured is 267/3330 = 8%, and the test failed to pick up on that. It would fail to pick up on recent infections, because they wouldn’t have seroconverted (created enough antibodies) yet, and it would fail to pick up on infections that happened too long ago (because the antibody levels would have fallen below the sensitivity of this test). This study really needed a more sensitive test, like an ELISA, which is actually available at Stanford and is able to detect much lower levels of antibodies.

    This kit has not been validated against people who had the infection a month ago.
    The presence of false positives is an indication that cross-reactivity with other cold viruses exists. If you test a sample with few people who have had a severe cold recently, which probably includes most samples taken from people who check into the hospital for elective surgery, or samples taken in the summer months, you get an optimistic specificity that does not apply to the general population in early spring.
    The WHO Early investigation protocol (Unity protocol) for the investigation of population prevalence mandates the use of an ELISA test, or the freezing of samples until a time when such a test becomes available. The WHO does not endorse the use of lateral flow assays for this kind of testing.

    ----

    P.S.: No study that does not measure prevalence in the older population, where the majority of deaths occur, should speak on fatality rates. This study had 2/167 positives in the age-65+ population; that’s 0.1–4.3% (95% CI), a 30-fold spread, and hardly a representative sample, since I don’t expect residents of care homes were able to attend the drive-through testing.
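    For reference, that interval is just the exact binomial CI, e.g. in R:

      binom.test(2, 167)$conf.int   # roughly 0.0015 to 0.043, i.e. the 0.1-4.3% quoted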

    • Another consideration is that asymptomatic/paucisymptomatic cases might actually not have a full antibody response; are there studies on that? In that case, the “no smell” samples probably include a large number of people who had no symptoms otherwise. (There is at least one reported case of the “no smell” symptom appearing before any other symptoms.)

      • I’ve seen a newspaper report that a viral cold or the flu can lead to a loss of smell when the nose is inflamed.
        I haven’t seen a study or any statistics, but my point that loss of smell is characteristic of Covid-19 may not be as valid as I thought.

      • FWIW — I have suffered from chronic sinusitis my entire life; have lost sense of smell and taste many times.

        Mucus and/or inflammation of the nose can block olfactory neurons; it’s common. This doesn’t mean what is happening in COVID-19 is different or the same, though. TBD, it seems.

        • Twain, I have a totally game-changing treatment for you. Find someone who is healthy, who doesn’t suffer from sinusitis, and have them use a standard sterilized NeilMed wash bottle with standard NeilMed sterile saline, catching the effluent in a sterile glass (like a jar or measuring cup with a pour spout). Then immediately put the effluent into the NeilMed wash bottle and wash your own sinuses with it. (yes, wash your sinuses with a healthy person’s mucus, that’s exactly what I mean).

          To pre-sterilize the containers, use a microwave to boil some water in a covered container holding the items to be sterilized, for a minute or so… then pour out the water and let the containers cool before using… all fluid should be normal and at room temperature before use.

          almost ALL chronic sinusitis is aggravated or even entirely caused by imbalance in the bacterial species living in the sinuses. Most people start with allergy issues, and then go through several rounds of antibiotics for infections, and then they’re STUCK because until they replenish the stabilizing bacteria, they never get back to normal.

          Literally a single treatment of this method might be enough to change your life forever. If not, re-treat 3 days later, and then again 3 days later. More than that is unnecessary.

          Obviously, long term, treating allergies should be a big issue as well.

        • Daniel,

          Thank you for the suggestion! Seems similar to fecal transplants for C. diff; I’d be interested to see the literature.

          You mention the imbalance of bacteria — that is precisely my problem. I have had Specific Antibody Deficiency (SAD) to streptococcus pneumoniae since birth; for some reason, my body metabolizes IgG antibodies for this family of bacteria (but produces the IgGs just fine). So I’ve always had some “low level” infection, that conditions aggravate to sinusitis every so often.

          The solution, thankfully, has been receiving Pneumovax every few years when my IgG titer starts to decline or I start getting more infections. As I’ve aged, the problem has gotten better slowly; it seems like each year my body retains more IgGs.

          The above issue is actually quite common; ~1:500 people have SAD at birth. It’s almost all “luck of the genetic draw”.

    • I don’t think low sample sizes on the older end of the demographics is necessarily that big a problem if you are arguing that fatality rates are low; the general argument is that confirmed cases skew older, and a more realistic fatality rate would include unobserved younger infectees.

    • I think you are right about the trial heterogeneity. A simple chi-squared test pretty intensely rejects the null of all of those trials being the same, which makes pooling the sample a very poor way to establish interval estimates.

      A reasonable (in my view) modelling procedure is to use the different trials as a random variable.

      > formula = success ~ f(trial, model = "iid", param = c(0.5, 0.5))
      > inla(formula, data = dat, family = "binomial", Ntrials = ntr, control.compute = list(config = TRUE))

      Incorporating the trial heterogeneity puts about a 25% posterior probability on any given trial having a specificity of <98%.

      • FWIW, a possible objection is that some of the samples come from covid-era PCR-negative samples, and one might suppose that some of those are PCR false negatives. Eliminating all the covid-era trials (and merging the rheumatoid ones, since that seems to be the same trial) improves the values somewhat; there’s about a 10% posterior probability of a specificity less than 98%.

        My prior choice is perhaps not the best here, but there’s not that big a difference in the results. I tried an informative beta(1, 0.1) prior, and that put the 10% point at 0.983. Alternatively there’s a probability of about 0.13 by my estimate that the false positive rate is over 1.5%.

        • PS: If you permute the sample so that the false positives are distributed randomly amongst the trials, doing the same procedure (using only pre-covid trials) gives a false positive rate that is below 1.5% with probability 0.0035. So trial heterogeneity – and assuming it is absent and thus doing the pooling procedure – makes a huge difference here.

        • Thank you for your analysis.

          My argument re: the older people is simply that if you are trying to determine an IFR for the county, you need to weight it by age demographics, since the older people do have the most deaths everywhere – China, Germany, anyplace I’ve seen data for – and there isn’t some big number of “unobserved younger cases”. But I haven’t seen age data for the US, so… when doctors say “younger”, they typically mean 50+ in this context.

          And this IFR depends a lot on where the initial spread was, how well care homes were protected, that sort of thing. This is absolutely noticeable and one of the big reasons why Germany initially had one of the lowest case fatality rates in the world (the other big reason was discovering more cases).

          So unless you know exactly how many older people have been infected, you can’t generalize to an IFR that has any predictive value.

    • Your comment on irregular results for specificity is concerning me as well. I will try to dig deeper and see what validations were done by truly independent actors (not manufacturers nor ones who used tests for similar studies).

      Maybe the tests have great specificity and https://covidtestingproject.org/ are just out to sell their own in-house ELISA.

  10. > Ultimately, this is just one survey.
    Yup, soon they will just be one of the pack, but one of the earliest (aren’t they usually the most unreliable?)

    Anyway, as Mendel points out, expect to need to rule out some inexplicable heterogeneity.

  11. Why I, to some extent, trust Ioannidis’s judgement:

    He apparently won the “National Award of The Greek Mathematical Society” during his time at Athens College. I can’t figure out if this was sort of a community/pre-bachelor college, or if he actually took college courses in mathematics. But clearly the man is intelligent. So I highly doubt that he is ignorant of statistics. Sure, he is not a professor of statistics, but compared to most physicians, he probably is “orders of magnitude” sharper.

      • Because he is quite well known and has been a vocal critic of sloppy studies. His appearance as an author implies his endorsement of the work; at least when I was in academia you would not want to be a co-author of a paper that you couldn’t defend.

        I have been somewhat disappointed by him–both in his difficulty acknowledging that responding to weak evidence by doing nothing is in fact taking a big gamble and in some of the analysis that he has used to support the claim that COVID may not be much worse than the flu.

        • As consumers of statistics, my friends and I follow many statisticians on Twitter. Some of them are editors/writers. So they try to read even more carefully.

          What we all note is that there is a good deal of confirmation, selective, and publication bias. More importantly, it does pay for audiences to read and listen carefully.

  12. Andrew –

    Don’t know if you’ll read this on such an old post… but if this is true, it shows why you don’t extrapolate from samples that are effectively SES outliers (3% African American in Santa Clara).

    –snip–

    Liam Dillon (@dillonliam) Tweeted:
    San Francisco did large-scale coronavirus testing for those living and working in its gentrifying Mission District. More than a third tested were white. None were positive, according to preliminary results.

    More than 95% of those who tested positive were Latino. https://t.co/zZfX7aYD31 https://twitter.com/dillonliam/status/1257444825328168961?s=20

    • Shows that all the talk about *why* there is a racial signal in the morbidity and mortality #s may be way premature.

      Do we have any idea of the racial and/or demographic breakdown of positive test rates?

  13. As an outsider I have two related questions about the study, some of which were implied by the conversation around Joseph Candelora’s comment.

    First, given housing patterns in the US, weighting by zip code and weighting by ethnicity probably overlap; does this present a problem?

    Second, Santa Clara County is about equally white, Asian and Hispanic. Given the early start of the virus in China, the back and forth between that area and China, and the vilification of Asians as a result of US politics, the low number of Asians in the study was perhaps not surprising, and the ability to handle this issue statistically appears, at least to me, to be impossible. Further, of all the counties in the US, Santa Clara probably has the highest number of trips back and forth to Asia because of Silicon Valley, irrespective of ethnicity.

    Would this contribute a few grains of salt to any analysis of COVID19 penetration in the US?

  14. I’m late to the party here, but the authors seem to respond to the criticism of symmetric (delta method) confidence intervals via a “classic” percentile bootstrap. (Generate 10000 bootstrap resamples, discard top and bottom 250, done.) Really? This bootstrap is only valid for symmetric distributions and, because it implicitly estimates the 5th quantile of estimation error by the 95th quantile of bootstrap estimation error, should be actively avoided as a heuristic adjustment for finite-sample skewness. Happy to stand corrected re: my reading of the paper.
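    For readers unfamiliar with the terminology, this is the procedure being described — a sketch in R on illustrative data (the 50-of-3330 raw positive count is an assumption borrowed from the comments above, not the paper’s exact input):

      # Classic percentile bootstrap: resample the data, recompute the estimate,
      # and take empirical quantiles of the bootstrap estimates
      set.seed(1)
      x <- c(rep(1, 50), rep(0, 3280))     # illustrative: 50 raw positives of 3330
      boot_est <- replicate(10000, mean(sample(x, replace = TRUE)))
      quantile(boot_est, c(0.025, 0.975))  # in effect, drop the top and bottom 250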

    • Joerg:

      Often there is a problem, when researchers are asked to reanalyze their data, that they are committed to their original conclusions and hence seek a reanalysis that will not change their earlier results.
