“I don’t want ‘crowd peer review’ or whatever you want to call it,” he said. “It’s just too burdensome and I’d rather have a more formal peer review process.”

Nothing new here; I just happened to come across this post from a couple of years ago, and I think it remains relevant:

I understand the above quote completely. Life would be so much simpler if my work were just reviewed by my personal friends and by people whose careers are tied to mine. Sure, they’d point out problems, but they’d do it in a nice way, quietly. They’d understand that any mistakes I made would never have any major impact on our conclusions.

OK, not really. Actually I want my work reviewed by as many people as possible. Friends and colleagues, yes. But strangers also. The sooner the better.

But I understand that lots of people want the review process restricted. That’s the whole principle of journals like Perspectives on Psychological Science or the Proceedings of the National Academy of Sciences: if you’re well connected, you can publish pretty much whatever you want. And some of these folks get pretty hot under the collar if you dare to question the work they have published and promoted.

It’s a pretty sweet gig. Before publication, you say the work is a fragile butterfly that can’t be exposed to the world or it might get crushed! It’s only safe to be seen by friendly peers, who can in secret give it the guild seal of approval. The data need to be kept secret too. Everything: secret secret secret, like the deliberations before choosing the next pope. And, after publication, the paper is Published! so it can’t be questioned—if you find any flaws, it’s your responsibility to Prove that these flaws materially affect the conclusions, and you can’t ever be so mean as to suggest any lack of competence or care on the part of the researcher or the journal. Also, once it’s published, no need to share the data—who are you, annoying second-stringer, to keep bugging us about our data?? We’re important people, and we’re busy on our next project, when we’re not promoting the current work on NPR.

In this case, there’s no PNAS, but there is the willing news media. But not all the media are willing to play the game anymore.

The source of the above quote

It comes from the BuzzFeed article by Stephanie Lee that I mentioned last night.

Lee’s article recounted the controversies surrounding the Santa Clara and Los Angeles coronavirus prevalence studies that we’ve been discussing lately.

My favorite part was this quote from Neeraj Sood, a professor of public policy at the University of Southern California and one of the authors of the two controversial studies:

Sood said he plans to eventually post a paper online, but only once it has been peer-reviewed and approved for publication.

“I don’t want ‘crowd peer review’ or whatever you want to call it,” he said. “It’s just too burdensome and I’d rather have a more formal peer review process.”

When I say this quote is my favorite part of Lee’s article, I don’t mean that I agree with Sood! Rather, it’s a great line because it reveals in distilled form a big problem with modern competitive science.

“It’s the best science can do.”

Before going on, let me emphasize that I’m not trying to paint Sood as a bad guy. I’m using his name because it’s his quote, and he can take responsibility for what he says (as can I for my own public statements), but my point here is that the attitude in his quote is, unfortunately, all too common in science, or in some corners of science. I’m amused because of how it was stated, but really it bothers me, hence this post.

Now for the background.

Sood and about 15 other people did two serious studies in a short time, sampling and covid-testing over 4000 people in two California counties. They’ve released none of their raw data or code. For one of the studies, they released a report with some summary data; for the other, bupkis. They did, however, make a bunch of claims in the news media. For example, Sood’s coauthor John Ioannidis was quoted in the New York Times as saying of their study, “It’s not perfect, but it’s the best science can do.”

That ain’t cool. They made mistakes in their analysis: that’s understandable given that (a) they were in a hurry, and (b) they didn’t have any experts in sample surveys or statistics on their team. Nobody’s perfect. But it’s not “the best science can do.” It’s the best that a bunch of doctors and medical students can do when they don’t have time to talk with statistics experts.

What’s the point of describing hasty work as “the best science can do”? How hard would it be for him to say, “Yes, we made some mistakes, but what we found is consistent with our existing understanding, and we hope to do better in the future”?

But as long as news reporters will take statements such as “it’s the best science can do” at face value, I guess some scientists will say this sort of thing.

I have no problem with them going straight to the news media

I have no problem with Ioannidis et al. going to the news media. They have done work that they and many others have good reason to believe is important. It’s policy relevant, and it’s relevant right now. Go on NPR, go on Fox News, go on Joe Rogan, go on Statistical Modeling, Causal Inference, and Social Science—hit all the major news media. I’m not one of those people who says, “What do we want? Evidence-based science. When do we want it? After peer review.” If you really think it’s important, get it out there right away.

But I am annoyed at them hyping it.

If that Santa Clara study was really “the best science can do,” then what would you call a version of the Santa Clara study that did the statistics right? The really best science can do? The really really best science can do?

It’s like in the Olympics: if the first gymnast to go out on the mat does some great moves but also trips a few times, and you give her a 10 out of 10 just because, then what do you do when Simone Biles comes out? Give her a 12?

I’m also annoyed that they didn’t share their data. I can see there might be confidentiality restrictions, but they could do something here. For example, in the Times article, Ioannidis says, “We asked for symptoms recently and in the last few months, and were very careful with our adjustments. We did a very lengthy set of analyses.” But none of that is in their report! He adds, “That data will be published in a forthcoming appendix.” That’s good. But why wait? In the Los Angeles study, they not only didn’t share their data, they didn’t even share their report!

“Crowd peer review” is too “burdensome” and they couldn’t put in the effort to share their data or a report of their methods, but they were able to supply “B-roll and photos from the April 10-11, 2020 antibody testing.” Good to know they have their priorities in order!

Here’s what I wrote earlier today, in response to a commenter who wrote that those researchers’ “treatment of significant uncertainties contrasts with basic tenets of the scientific method”:

I disagree. The three biggest concerns were false positives, nonrepresentative sampling, and selection bias. They screwed up on their analyses of all three, but they tried to account for false positives (they just used too crude an approach) and they tried to account for nonrepresentative sampling (but poststratification is hard, and it’s not covered in textbooks). They punted on selection bias, so there’s that. I feel like the biggest malpractices in the paper were: (a) not consulting with sampling or statistics experts, (b) not addressing selection bias (for example, by looking at the responses to questions on symptoms and comorbidity), and (c) overstating their certainty in their claims (which they’re still doing).

But, still, a big problem is that:

1. Statistics is hard.

2. Scientists are taught that statistics is easy.

3. M.D.’s are treated like gods.

Put this together, and it’s a recipe for wrong statistical analyses taken too seriously.

But, that all said, their substantive conclusions could be correct. It’s just hard to tell from the data.

Sood’s a professor of public policy, not an M.D., so point #3 does not apply, but I think my other comments hold.

Back to the peer-review thing

OK, now to that quote:

“I don’t want ‘crowd peer review’ or whatever you want to call it,” he said. “It’s just too burdensome and I’d rather have a more formal peer review process.”

What an odd thing to say. “Crowd peer review” isn’t burdensome at all! Just put your data and code up on GitHub and walk away. You’re done!

Why would you want only three people reviewing your paper (that’s the formal peer review process), if you could get the free services of hundreds of people? This is important stuff; you want to get it right.

As Jane Jacobs says, you want more eyes on the street. Productive science is a bustling city neighborhood, not a sterile gated community.

Conclusion

I’ll end with this quote of mine from the BuzzFeed article: “The fact that they made mistakes in their data analysis does not mean that their substantive conclusions are wrong (or that they’re right). It just means that there is more uncertainty than was conveyed in the reports and public statements.”

It’s about the science, not about good guys and bad guys. “Crowd peer review” should help. I have no desire for these researchers to be proved right or wrong; I’d just like us all to move forward as fast as possible.

Comments

  1. I’m not sure that those Ioannidis papers and press releases make the best illustration of your general point, although the quote you found is perfect. The reason is that these were not garden-variety sloppy errors but systematic errors all supporting Ioannidis et al.’s prior claim that Covid is no big deal and everybody should just get back to work raising stock prices. Even Ioannidis could have easily calculated the conventional confidence interval on the false positive rate. Their failure mode (cooked results for a political agenda) isn’t especially rare, but it’s not the one you seem to be describing, which is just overhyping some standard piece of science (e.g. we’ve found Majorana fermions!) for the usual career/ego reasons.

    • Michael:

      1. The classical confidence interval is not so easy to calculate in this case! The closeness of the boundary creates problems with the usual approaches (see the sketch at the end of this comment for the basic issue). I don’t think the authors of that paper had the background necessary to do it right. Here’s a classical analysis by the statistician Will Fithian: it’s doable but it’s not a simple textbook procedure.

      2. I agree with you that the errors all went in the same direction, but that’s not special to this example. Even when there’s no political agenda, there’s just about always a research agenda. Think of all the times that people say, “We did this study to prove X.” And having a research agenda is not such a bad thing! For example, if a team of researchers comes up with some drug or treatment or intervention that is backed up by some theory, then, yeah, they’ll expect that it will work and they’ll be hoping their data unambiguously shows it. Consider the economists who study early childhood intervention.

      3. I feel like your term “cooked results” is too strong. I think of “cooked results” as when people are deliberately analyzing the data wrong. What happened here seems more like researchers who were out of their statistical depth and picked results that fit the stories they wanted to tell, again in a way similar to the economists who did the early-childhood-intervention study and the air-pollution-in-China study.

      I feel awkward saying that these people were “out of their statistical depth,” as this seems like an invitation for someone to label me as “patronizing,” but, yeah, statistics is hard, and there are lots of wrong turns where confused people can think they got it right—especially if they’re M.D.’s or economists who have been treated as authorities.

      4. I’d rather not focus on Ioannidis. He was the 16th author on that famous paper. More generally, I am concerned that a focus on individuals leads us down blind alleys, as in that terrible article that Scientific American had to flag for inaccuracy.
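
      P.S. Here’s a minimal sketch of the boundary issue from point 1 above. It’s my illustration only: the 80% sensitivity is a made-up placeholder, while the roughly 1.5% raw positive rate and the specificity values are the ones discussed elsewhere in this thread.

      # Illustration only: why the specificity boundary bites.
      raw_rate = 0.015   # roughly the raw positive rate in the survey
      sens = 0.80        # hypothetical placeholder, not a number from the study

      for spec in (0.995, 0.990, 0.985, 0.980):
          # standard test-adjusted prevalence estimate
          est = (raw_rate + spec - 1) / (sens + spec - 1)
          print(f"specificity {spec:.3f}: adjusted prevalence {est:+.4f}")

      # Once the specificity drops to about 98.5%, the estimate hits zero; below
      # that it goes negative and has to be truncated, which is what breaks the
      # usual normal-approximation interval.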

      • Andrew –

        Since my comment from back in the day was linked in the excerpt…

        I’ve already asked you but still haven’t seen your answer as to how extrapolating/generalizing from an unrepresentative convenience sample (without comprehensive post-stratification adjustments, and looking past the other egregious methodological practices, such as email recruitment by the wife of one of the authors) is not a fundamental scientific failure. Seems to me like it is, to a rather notable extent.

        And as for focusing on Ioannidis, yes, he was one of many authors – but he went on a national TV campaign to promote the results of this study, including associated rhetoric where he compared covid to the seasonal flu, predicted a peak in deaths in spring of 2020, and ridiculed other scientists who argued that the pandemic was more serious than his estimation. In that sense, I think he merits focus beyond his role as just one of 16 authors.

        Personally, I think it’s inherently problematic to try to interpret “agendas.” As I said in that thread, I assume his “agenda” is to practice sound science and to mitigate the harm of the pandemic. But that doesn’t exempt him from bad science, likely infected by the biases of “motivated reasoning” just like everyone else can be as well.

      • Andrew- IIRC, they based the false positive rate on a trial run of <100 samples, with zero false positives found. If the overall expectation value were 3/100, then the probability of getting zero is e^-3 =~0.05. So in the most boring frequentist approach, the standard CI would include 3%. Again, IIRC, that was bigger than the nominal positive rate they saw.
        That's in addition to all the other errors, all of the same sign.
        I agree that it's best to minimize attribution of motive, but not to the extent of pretending not to see something that's pretty obvious.

        • Michael:

          Here’s what I wrote in my first post on the topic, a couple years ago:

          So how do they get their estimates? Again, the key number here is the specificity. Here’s exactly what they say regarding specificity:

          A sample of 30 pre-COVID samples from hip surgery patients were also tested, and all 30 were negative. . . . The manufacturer’s test characteristics relied on . . . pre-COVID sera for negative gold standard . . . Among 371 pre-COVID samples, 369 were negative.

          This gives two estimates of specificity: 30/30 = 100% and 369/371 = 99.46%. Or you can combine them together to get 399/401 = 99.50%. If you really trust these numbers, you’re cool: with y=399 and n=401, we can do the standard Agresti-Coull 95% interval based on y+2 and n+4, which comes to [98.0%, 100%]. If you go to the lower bound of that interval, you start to get in trouble: remember that if the specificity is less than 98.5%, you’ll expect to see more than 1.5% positive tests in the data no matter what!

          So it was 399/401, not 100/100, but you’re right that a standard confidence interval would show the concern. The problem is that they folded in this uncertainty with other uncertainties in their analysis using the delta method, which was not correct. All is clear in a Bayesian analysis or, with a bit more effort, in a classical analysis as shown by Fithian. But, again, the authors of the paper didn’t know enough statistics to know how to proceed. They grabbed a formula and went from there.
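
          For what it’s worth, the Agresti-Coull interval quoted above is easy to reproduce. Here’s a minimal sketch using only the 399/401 specificity count, nothing else from the paper:

          import math

          # Specificity data quoted above: 399 negatives out of 401 pre-COVID samples.
          y, n = 399, 401

          # Agresti-Coull 95% interval: add 2 successes and 4 trials, then use the
          # normal approximation around the adjusted proportion.
          p_adj = (y + 2) / (n + 4)
          se = math.sqrt(p_adj * (1 - p_adj) / (n + 4))
          lo, hi = p_adj - 1.96 * se, min(1.0, p_adj + 1.96 * se)
          print(f"specificity 95% interval: [{lo:.3f}, {hi:.3f}]")  # about [0.980, 1.000]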

          It could be that they did more appropriate analyses, saw that these didn’t give the desired result, and so instead did something wrong, but I’m guessing they just asked a local expert (who was not really a statistical expert) what to do, and then went from there. This behavior seems like standard operating practice; I see it all the time.

          The bad thing is that when Fithian and I and others pointed out the problems, the authors of the paper kept doing what they could do to preserve their original conclusions, bringing in new data and doing new wrong analyses.

        • Aha- My memory was off, tho it happened to give about the right lower bound of the CI. That the PIs kept at it after being told of their errors, and that their conclusion confirmed the prior position of the senior authors, and that those authors kept making mistakes that confirmed the same position doesn’t prove that the errors were highly motivated. But it does suggest that so strongly that I think it’s a little disingenuous to just lump this in a “should have known stats better” category.

        • Michael:

          I agree that they didn’t behave well. Here’s what I wrote in my original post on the topic:

          I think the authors of the above-linked paper owe us all an apology. We wasted time and effort discussing this paper whose main selling point was some numbers that were essentially the product of a statistical error.

          I’m serious about the apology. Everyone makes mistakes. I don’t think the authors need to apologize just because they screwed up. I think they need to apologize because these were avoidable screw-ups. They’re the kind of screw-ups that happen if you want to leap out with an exciting finding and you don’t look too carefully at what you might have done wrong.

          I don’t think they “should have known stats better”; rather, I think stats is hard, and they got the stats wrong, which is something I’ve seen happen before. My guess is that they thought the stats were straightforward, and then when their stats were criticized, they just thought of this as some red tape, some paperwork issues to be papered over, as it were. My guess is that they never put themselves in the position of considering they might be wrong in the sense of overclaiming from their data. And, yeah, that’s not a good frame of mind for a scientist or for a policymaker. So I think we’re basically in agreement here.

        • Now memory is coming back, tho of course maybe not reliably. Their own calibration gave 0/30 false +ve, as you remind us. Not even close to enough to set a useful bound. The company had almost enough tries to get the standard CI into useful range, but then you have all the systematic issues of careful controlled use by the developers vs. field use by non-experts. In either case you can just use binomials to get the CIs, because the absolute number is so small.
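
          To make that concrete, here’s a minimal back-of-the-envelope sketch of those exact binomial (Clopper-Pearson) upper bounds, using only the counts discussed in this thread (my own illustration, not anything from the paper):

          from scipy.stats import beta

          # Exact (Clopper-Pearson) 97.5% upper bounds on the false-positive rate.
          for x, n, label in [(0, 30, "authors' own calibration"),
                              (2, 371, "manufacturer's pre-COVID sera")]:
              upper = beta.ppf(0.975, x + 1, n - x)
              print(f"{label}: {x}/{n} false positives -> upper bound {upper:.3f}")

          # 0/30 only bounds the false-positive rate below roughly 11-12%, far above
          # the ~1.5% raw positive rate in the survey; even 2/371 only gets you to
          # about 2%, which is still not clearly below it.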

  2. Love this quote: “Productive science is a bustling city neighborhood, not a sterile gated community.” … I’m reminded of Linus’ Law in the open source community: “given enough eyeballs, all bugs are shallow” …

    It’s unfortunate that many research fields operate like a proprietary software company that produces products with no or limited quality control and then criticizes users for filing bug reports.

  3. Andrew used the term “bupkis” as if this audience would automatically understand or resonate with the term:

    “For one of the studies, they released a report with some summary data; for the others, bupkis.”

    However, we live in a highly sensitive/contentious era where cultural appropriation is hotly debated—>

    https://www.verywellmind.com/what-is-cultural-appropriation-5070458#:~:text=Cultural%20appropriation%20refers%20to%20the%20use%20of%20objects,source%2C%20or%20reinforces%20stereotypes%20or%20contributes%20to%20oppression.

    “Cultural appropriation refers to the use of objects or elements of a non-dominant culture in a way that doesn’t respect their original meaning, give credit to their source, or reinforces stereotypes or contributes to oppression.”

    So, should “bupkis” have been replaced by “nada,” or does that involve even more cultural appropriation?

    So to speak, and more to the point, when scientists use statistics, perhaps inappropriately,

    “2. Scientists are taught that statistics is easy.”

    is this a form of cultural appropriation?

    • I’m an old goy, but I know perfectly well what bupkis means. More generally, if you pay much attention to etymology, there are damn few words in English that weren’t appropriated from some other culture. Let’s not get all postmodern about it.

      • As it happens, unlike “bupkis,” the word “goy” has confusing meanings. Most people believe it means a non-Jew, but from
        https://en.wiktionary.org/wiki/goy

        “The word goy technically refers not to non-Jews, but rather to a nation per se; the Jews are said to constitute a ‘goy’.”

        From
        https://www.dictionary.com/browse/goy

        “a term used by an observant Jew to refer to a Jew who is not religious or is ignorant of Judaism.”

        In today’s blog, Andrew once again claimed “Statistics is hard.” So to speak, other sweeping concepts are as well. Look hard enough and even bupkis will have a deeper postmodern meaning.

        • This is really beside the point, but you are being a bit selective in citing your sources. You cite something Wiktionary gives in discussing the etymology. For a definition, it says:

          Noun goy (plural goyim or goys or goyem)

          A non-Jew, a gentile. (See usage notes)

          Synonyms: akum, gentile, shegetz, shkotz
          Hyponym: (female) shiksa

          You cite what dictionary.com gives as a second meaning. As the first, it says: “A term used by a Jew to refer to someone who is not Jewish.”

  4. Almost 2 years later the US has ~50 million reported cases and ~800 thousand reported deaths “with covid”.

    I’d say it is plausible to assume there were 4x more cases than reported and half the deaths were actually “from covid” rather than “with covid” and/or inept medical treatment. Even doubling both those adjustments still sounds plausible.

    That gives an IFR of 0.2%, or the same as the flu. Maybe as low as 0.05%. If you assume lots of false positives or multiply-counted tests and untested deaths, maybe as high as ~1.5%?

    So basically the level of knowledge has not changed since April 2020.

    • Anon:

      Yeah, the Stanford team should’ve just told us in April 2020 to chill out because covid would only kill a million people in this country. There was a lot of uncertainty at the time, and people were concerned that the eventual total number of deaths could be a lot higher. At the same time, people at Stanford were predicting that the total number of deaths would be just 500 or 5000 or 10,000. So I guess the level of knowledge really has changed. We know it will be more than 10,000.

    • Anoneuoid –

      >… half the deaths were actually “from covid” rather than “with covid” and/or inept medical treatment.

      Can you reference some research supporting that claim, or are you stating personal speculation as “plausible assumption?”

      > That gives an IFR of 0.2%, or same as the flu. Maybe as low as 0.05%.

      Obviously, the age stratification of COVID outcomes makes a single IFR limited in utility, but 0.05% is totally inconsistent with any research I’ve seen. It’s actually off by an order of magnitude (or more) from most of the careful analysis conducted by people who are skilled and experienced in conducting such analysis. (At first I read your comment as saying 0.5% and was surprised that it seemed you were asserting a number that’s actually in line with the existing evidence I’ve seen). Again, do you have evidence for the 0.05% claim? Even your assertion of covid being causal for only 1/2 the deaths attributed to covid (a dubious claim at best), and the multiplier of 4x for cases to infections, would make an IFR of 0.05% a mathematical impossibility in the US.

    • Well, you are quick to dismiss the ascertainment of cases and causes of death–fair enough. But to what do you attribute the excess deaths in 2020 and 2021?

      • A combination of hysteria, stress, and inept medical treatments can easily kill hundreds of thousands per year.

        Eg,

        1) misused/overused mechanical ventilation based on anonymous rumors from China

        2) overdosing hospitalized patients with hydroxychloroquine, causing methemoglobinemia (which has symptoms similar to covid and was reported to be common in 2020)

        3) a sudden increase in oxygen after the body has adapted to lower levels when showing up at the hospital (causing a kind of reperfusion injury)

        4) any variety of standard medical errors that may be more frequent due to the hysteria

        5) not being able to see loved ones (especially for the elderly in hospitals or nursing homes who don’t have family to make sure they are being treated well)

        6) damage to the nasal mucosa due to overtesting

        7) chronic anxiety from any runny nose, etc

        8) job loss, poverty, etc leading to chronic stress

        Lots of things change depending on whether you test positive or people around you are testing positive.

        • Yeah, none of those show up as opaque lungs on x-ray and inability to breathe.

          It’s definitely the case that somewhere between 400k and 1.5M people have died *of* COVID in the US. The existence of a better treatment protocol that doctors aren’t using isn’t a reason to call these not deaths from COVID. And at least 16% of the US has had the disease (that’s the “confirmed cases” so far), so that’s 400000/(.16*320e6) = 0.0078 which is 0.8% or an order of magnitude higher than your low estimate.

          Most likely more than the official death count of about 820k should be attributed to covid (excess deaths are definitely higher), and infections are most likely 3-4x the confirmed cases, so a more likely IFR is

          2*820000/(3*.16*320e6) = .0107 or about 1%

          A decent homogenized IFR credible distribution is something like beta(5,500).
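
          As a quick check of that arithmetic (just replaying the numbers above, nothing new):

          us_pop = 320e6
          confirmed_frac = 0.16              # at least 16% of the US with confirmed cases

          # Low-end IFR: ~400k deaths over confirmed cases only
          print(400_000 / (confirmed_frac * us_pop))           # ~0.0078, i.e. ~0.8%

          # Rougher central guess: ~2x the ~820k official deaths, ~3x the confirmed cases
          print(2 * 820_000 / (3 * confirmed_frac * us_pop))   # ~0.0107, i.e. ~1%

          # For reference, beta(5, 500) has mean 5/505, also about 1%
          print(5 / 505)                                        # ~0.0099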

        • > opaque lungs on x-ray

          I forgot about this one. How many CT scans are these patients getting? Radiation-induced lung injury sounds a lot like covid (smoking may be protective, involvement of the ACE system, ground glass opacities), and apparently the dosages used are non-negligible.

          https://www.nejm.org/doi/full/10.1056/NEJMra072149

          https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8293129/

          I wonder if CT scans are such a great idea on already stressed tissue.

          Anyway, we will never know the contribution of all these factors, but they are assuredly greater than zero, and so far they have been ignored.

        • Raghu –
          Avert your eyes one more time and then I’m done…

          Anoneuoid –

          Typically, you hand-wave at “anxiety” and “hysteria” with no actual quantification or even an attempt to quantify. To what extent are negative outcomes from mistakes made causal? You don’t know. Just as you don’t know how much “hysteria” and anxiety would have led to how much illness and death without interventions and sub-optimal treatments.

          And your “gone ignored” claim is clearly inconsistent with reality in at least some respects. There have been extensive reviews of COVID death counts to quantify the differences between died “with” and died “from,” and how many deaths were missed because people died without being tested.

          The initial treatments have undergone review by knowledgeable doctors, but of course there’s no such thing in the real world as perfect response to something like the pandemic. Handwaving as if there could be some perfect response, in particular with no actual data or evidence provided, certainly seems agenda-driven to me.

          And once again, even if we multiplied the number of cases by 4x, and accepted your assertion that 1/2 of the identified deaths were actually not “from covid,” the low end of your “plausible” IFR range is mathematically impossible (~200,000,000 infections and ~400k deaths).

  5. There’s no justification for hiding data, etc., to avoid crowd peer review, but I can see how it could be time-consuming and aggravating to rebut repeated criticisms that are nonsensical or have already been rebutted, especially if the same criticisms come up again and again. (Think of all the time “Joshua” wastes on rebutting “Anoneuoid.”) So on these sites like PubPeer or even Twitter, if an author doesn’t quickly respond to a criticism, it can give the mistaken impression that the critic is right. Perhaps that doesn’t matter for science per se–scientists should evaluate criticisms on the merits–but I can see how it could be annoying to authors.
