In comments, Joshua Ellinger points to this news article headlined, “Hundreds of thousands in L.A. County may have been infected with coronavirus, study finds,” reporting:
The initial results from the first large-scale study tracking the spread of the coronavirus in [Los Angeles] county found that 2.8% to 5.6% of adults have antibodies to the virus in their blood, an indication of past exposure.
That translates to roughly 221,000 to 442,000 adults who have recovered from an infection, according to the researchers conducting the study, even though the county had reported fewer than 8,000 cases at that time.
“We haven’t known the true extent of COVID-19 infections in our community because we have only tested people with symptoms, and the availability of tests has been limited,” study leader Neeraj Sood, a professor at USC’s Price School for Public Policy, said in a statement. “The estimates also suggest that we might have to recalibrate disease prediction models and rethink public health strategies.” . . .
The early results from L.A. County come three days after Stanford researchers [including Sood] estimated that 2.5% to 4.2% of Santa Clara County residents had antibodies to the coronavirus in their blood by early April. . . .
The Santa Clara study recruited around 3,300 participants from social media . . . The study was composed differently in Los Angeles; 863 adults were selected through a market research firm to represent the makeup of the county. The county and USC researchers intend to repeat the study every two to three weeks for several months, in order to track the trajectory of the virus’ spread.
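As a quick sanity check on the quoted figures (this arithmetic is mine, not from the article or the report), the case counts and the antibody rates should imply the same underlying adult population:

```python
# Back-of-envelope check: do the quoted counts match the quoted rates?
low_rate, high_rate = 0.028, 0.056        # "2.8% to 5.6% of adults"
low_count, high_count = 221_000, 442_000  # "roughly 221,000 to 442,000 adults"

implied_adults_low = low_count / low_rate    # adult population implied by lower bound
implied_adults_high = high_count / high_rate # adult population implied by upper bound

print(f"{implied_adults_low:,.0f} vs {implied_adults_high:,.0f}")
```

Both bounds imply an adult population of about 7.9 million, so the counts are simply the antibody rates scaled by L.A. County’s adult population; the numbers are internally consistent.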
It’s good to see this sort of study. When discussing the Stanford study yesterday, we expressed concern that the estimate could be a statistical artifact. But it’s hard to say: much depends on the error rate of the assay, as well as how the statistical analysis was done.
The “good news” is that, as prevalence rates rise, inferences will be less sensitive to testing errors and other statistical issues. As data quality improves, inferences should be more robust to choices in statistical analysis.
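To see why higher prevalence helps, here is a sketch using the standard Rogan–Gladen correction for test error; the sensitivity and specificity values are hypothetical, chosen only for illustration, not taken from either study:

```python
def rogan_gladen(observed_rate, sensitivity, specificity):
    """Correct a raw positive rate for imperfect test sensitivity/specificity."""
    return (observed_rate - (1 - specificity)) / (sensitivity + specificity - 1)

sens = 0.80  # hypothetical sensitivity
for observed in (0.04, 0.20):  # low vs. higher raw positive rate
    lo = rogan_gladen(observed, sens, specificity=0.98)
    hi = rogan_gladen(observed, sens, specificity=0.995)
    print(f"raw {observed:.0%}: corrected prevalence ~{lo:.1%} to ~{hi:.1%}")
```

With a raw positive rate of 4%, moving the assumed specificity from 99.5% to 98% cuts the corrected prevalence nearly in half (from about 4.4% to about 2.6%); with a raw rate of 20%, the same change barely moves the estimate. That is the sense in which inferences at higher prevalence are more robust to uncertainty about the test’s error rates.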
To get back to the Los Angeles study: the news article refers to “a report released Monday”—that’s today!—but unfortunately I don’t see a link to the report. I hope the researchers share their data and code. This is an important topic, so best to engage the hivemind.
As discussed in yesterday’s post, it would be good to have data on the demographics and locations of the people tested, on the test results, and on what symptoms the people reported. If confidentiality makes it difficult to share all this, just forget the location information and give us age, test results, reported symptoms, and comorbidities.
If someone can find the report, data, and code, please share the links in the comments.
P.S. Shiva Kaul in comments pointed to a link to a version of the report, but I removed the link here because I’ve been told that it’s not an official document from the research project. No data and code yet. The unofficial report stated that 35 of the 863 people in the sample (4%) tested positive, more than twice the 1.5% rate observed in the Santa Clara County study. This time they poststratified based on sex, age, ethnicity, and income, but not on geography. No data were reported on comorbidities or symptoms. They assumed the test has a specificity of 100%, that is, zero false positives.
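To see how much the 100% specificity assumption is doing, here is a small sketch of how many of the 35 positives could be false under less-than-perfect specificity (the specificity values below are hypothetical, not from the report):

```python
n, positives = 863, 35  # counts from the unofficial L.A. County report
for specificity in (1.00, 0.995, 0.99, 0.96):
    expected_false_positives = n * (1 - specificity)
    print(f"specificity {specificity:.3f}: "
          f"~{expected_false_positives:.1f} of the {positives} positives "
          f"could be false")
```

At a specificity of 99.5%, roughly 4 of the 35 positives could be false; at 96%, essentially all of them could be. At a 4% raw positive rate, the assumed specificity drives the entire estimate, which is why reporting the test’s validation data matters so much.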
The unofficial report concluded, “Further population representative serological testing is warranted to track the progress of the epidemic throughout the country and the world.” I agree. As noted above, when tests are done in populations with higher exposure rates, statistical issues arising from imperfect specificity will be less of a concern.
P.P.S. Joseph Candelora points out that the numbers in the unofficial report are not quite consistent with the press release, so I’m not sure what to think. The authors should just post their report, data, and code somewhere. Too bad the L.A. Times reporters didn’t ask for all that as a condition for running their story.