MIT’s science magazine misrepresents critics of Stanford study

I’m disappointed. MIT can and should do better. I know MIT is not perfect—even setting aside Jeffrey Epstein and the Media Lab more generally, it’s just an institution, and all institutions have flaws. But they should be able to run a competent science magazine, for chrissake.

Scene 1

Last month, I received the following query by email:

I have interviewed John Ioannidis and Eran Bendavid regarding the Santa Clara study. I am writing for Undark (MIT’s science magazine) and wonder if you would be willing to chat briefly today or tomorrow morning?


I agreed and spoke with the reporter for about half an hour. In the conversation I emphasized that I had no particular issues with Ioannidis and, indeed, I hadn’t mentioned Ioannidis even once in my post on that study.

After the conversation, I remembered one more thing so I sent the reporter an email:

I remember that I did write one thing about Ioannidis; see here. I disagreed with his statement that their study was “the best science can do.” I also disagreed with coauthor Sood’s statement, “I don’t want ‘crowd peer review’ or whatever you want to call it.” I think crowd peer review is a good thing.

And a few days later I followed up with a link to one of my posts on how to do Bayesian analysis of the study.

Scene 2

The Undark article came out.

And it had problems.

The authors did not talk about the benefits of crowd peer review. That’s fine. It’s their article, not mine.

What bothered me is that they misrepresented the critics of the Stanford study, taking careful scientific criticism (and a bit of annoyance at sloppy science being promoted in the news media) as being political and personal.

1. The Undark article said: “Other critics said the antibody test used in the Santa Clara study was so unreliable that it was possible none of the 50 participants who tested positive had actually been infected. This, despite the fact that almost all surveys to date suffer similar test-reliability problems in low-prevalence areas.”

But the data in that study really are consistent with a very low rate of true positives in that sample. The critics are correct here! In our analysis of these data (https://www.medrxiv.org/content/10.1101/2020.05.22.20108944v2.full.pdf), we summarized as follows:

For now, we do not think the data support the claim that the number of infections in Santa Clara County was between 50 and 85 times the count of cases reported at the time, or the implied interval for the IFR of 0.12–0.2%. These numbers are consistent with the data, but the data are also consistent with a near-zero infection rate in the county. The data of Bendavid et al. (2020a,b) do not provide strong evidence about the number of people infected or the infection fatality ratio; the number of positive tests in the data is just too small, given uncertainty in the specificity of the test.

The fact that other surveys have similar problems with test reliability . . . sure, other surveys have these problems too! The lower the rate of positive tests in the data, the more you have to be concerned about false positives. The Santa Clara study had only 1.5% positive tests. That’s a really low number.

This is a technical point. It has to do with the possible false positive rate. It depends on the numbers. Talky-talk won’t do it. If you’re gonna do journalism on this one, it has to be quantitative journalism. Again, if any magazine should be able to handle this, it’s MIT’s magazine.
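
To put rough numbers on this, here is a minimal sketch using the standard Rogan-Gladen correction, which backs out a prevalence estimate from a raw positive rate. The 1.5% raw rate is from the study as described above; the sensitivity of 0.80 and the grid of specificity values are hypothetical choices for illustration, not the study's reported figures, and the function name is made up for this example.

```python
def corrected_prevalence(raw_positive_rate, sensitivity, specificity):
    """Rogan-Gladen point estimate of true prevalence, clipped to [0, 1]."""
    false_positive_rate = 1.0 - specificity
    estimate = (raw_positive_rate - false_positive_rate) / (
        sensitivity - false_positive_rate
    )
    return min(1.0, max(0.0, estimate))


RAW_RATE = 0.015            # 1.5% positive tests, as stated in the text
ASSUMED_SENSITIVITY = 0.80  # hypothetical value, for illustration only

# Hypothetical specificities: small changes here move the answer a lot.
for specificity in (0.985, 0.990, 0.995, 1.000):
    prevalence = corrected_prevalence(RAW_RATE, ASSUMED_SENSITIVITY, specificity)
    print(f"specificity={specificity:.3f} -> estimated prevalence={prevalence:.4f}")
```

When the false positive rate is as large as the raw positive rate, the point estimate of prevalence drops to zero, which is the sense in which the data are consistent with very few true positives. This is only a point estimate; it ignores the sampling uncertainty in all three inputs, which is what the Bayesian analysis linked above is meant to handle.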

2. The Undark article said: “The attacks on Ioannidis continued to snowball. In a recent blog post about the Stanford study, Columbia University statistician Andrew Gelman wrote that Ioannidis and his co-authors ‘owe an apology not just to us, but to Stanford.’”

The misleading thing about the way this is presented is that they got the time sequence wrong. In the previous paragraph, they linked to a 20 May article from the Nation and a 15 May article from Buzzfeed. Then, after giving some background, they wrote, “The attacks on Ioannidis continued to snowball” and mentioned my post. But that post of mine was from 19 Apr. To present my post as part of a snowball of attacks . . . that’s just wrong.

Also, I did not attack Ioannidis in any way. Indeed, my post does not mention Ioannidis even once! In comments some people bring up Ioannidis, and at one point I noted that he was author #16 of a 17-author paper. This has never been about Ioannidis for me.

I do think the authors of the Stanford paper owe us an apology, but this has nothing to do with politics or Ioannidis or the ideological leanings of Buzzfeed or whatever. As I wrote in my post, “Everyone makes mistakes. I don’t think the authors need to apologize just because they screwed up. I think they need to apologize because these were avoidable screw-ups. They’re the kind of screw-ups that happen if you want to leap out with an exciting finding and you don’t look too carefully at what you might have done wrong.”

3. Undark wrote: “Ioannidis, right or wrong, has raised difficult questions, in the best tradition of science. Silencing him is an enormous risk to take.” Just to say this again: (a) my writings on this topic are not about Ioannidis et al., (b) Bendavid et al. made avoidable statistical errors in their papers, these were errors that many people pointed out, and they did not take the opportunity to reassess their conclusions, and (c) I think the “silencing him” thing makes no sense. Nobody’s about to silence these Stanford professors, who can continue to post their preprints etc.

The Stanford team made strong public claims, and other scientists pointed out errors in their claims. Meanwhile the news media got involved. If someone on Fox news said one thing or someone at the Nation said another . . . that’s fine, it’s the free press, they can feel free to share their takes on the news, but that’s not science. Just cos the Stanford study was featured on Fox news, that doesn’t make it wrong. Also, just cos some critics were praised in the Nation, that doesn’t mean it’s right for MIT’s science magazine to ignore the substance of the criticisms.

Scene 3

I sent off a polite note to the reporter who’d interviewed me with the three points above. The author responded that they did not imply that our statistical points were incorrect, so I followed up:

Thanks for the quick reply. I think it would help if the article clarified the point. Something like, “The critics were right on this one. The Stanford team really did make several mistakes in their statistics. And, just to be clear, the statistical criticisms by Will Fithian, Andrew Gelman, and others did not focus on or even mention Ioannidis. That said, it’s notable that the Stanford research got caught up in a political and media firestorm in a way that other, similar studies did not.”

They ask why that happened. I have three quick answers. First, the Stanford paper got tons of media attention. This has nothing to do with Fox news, it’s just that if a paper gets tons of attention, then tons of people will read it, so if a paper does have problems, they might well be found. Second, the Stanford paper had some clear statistical errors. Third, the fraction of positive tests was low, which makes the results more sensitive to statistical assumptions.

Again, I bring in a technical point, naively thinking that a reporter for MIT’s science magazine will want to get the technical details right.

They also misrepresented the investigative reporting of Stephanie Lee; see this thread.

Scene 4

They edited the article in response to one of my points. But the edit just made things worse!

In the earlier version of the article, they wrote, “The attacks on Ioannidis continued to snowball. In a recent blog post about the Stanford study, Columbia University statistician Andrew Gelman wrote that Ioannidis and his co-authors ‘owe an apology not just to us, but to Stanford.’”

This has been changed to: “Attacks on Ioannidis came early and often. Just days after the study published, Columbia University statistician Andrew Gelman wrote that Ioannidis and his co-authors ‘owe an apology not just to us, but to Stanford.’”

First, I never attacked anyone. I pointed out errors in a much-discussed paper. I wrote that a group of authors made avoidable errors and I thought they should apologize for wasting our time with sloppy work. That is not an attack.

Second, my post was not about Ioannidis. Indeed, I did not mention Ioannidis at all in that post. Nor was my post about “Ioannidis and his co-authors”: Ioannidis was author #16 out of 17. There was no “Ioannidis and his co-authors.”

In addition, in making these changes they still did not anywhere in their article acknowledge that Will Fithian, I, and other critics (including Stanford’s own Trevor Hastie and Rob Tibshirani) were correct that the Stanford paper had errors which invalidated their main claims.

Look, I get it. MIT’s science magazine wants to write a story about Ioannidis. That’s fine. He’s had a busy career. I have nothing against him. What I don’t like is that they are taking open scientific discussion by the scientific community, and investigative reporting by Stephanie Lee and others, and inappropriately labeling them as political.

I don’t think Undark is doing Ioannidis any favors here either. He’s a busy person, he was author #16 on a 17-author paper that had no statisticians on it and that made serious statistical errors. That’s fine! Statistics is hard. Why go to so much effort to misrepresent the critics of this paper? This is how science works: when people make mistakes, we point them out. It’s not personal. It’s not about who is author #16 or whatever.

A more accurate story would state very clearly that (a) the Stanford paper had serious statistical errors, (b) the critics were correct, and (c) the scientific criticisms of that paper had nothing to do with its sixteenth author. Then they could go to town on the whole politics thing.

A science-empty take on science

Who cares?

I care, partly because Fithian, Lee, and many others put in a ton of work. Bob Carpenter and I were inspired to write a whole goddam paper on how to actually analyze this sort of data.

Yah, yah, you’re saying: It’s my fault because I said the authors of the Stanford study should apologize. Well, no, it’s their fault for not checking their statistics. I’m not saying they’re evil, I’m just saying an apology is in order, given all the time they wasted. I’m not saying they did a bad analysis on purpose; I’m saying they should’ve known enough to know that they didn’t know how to analyze these data. If you lend me your car and I try to fix it and instead I make things worse, leaving a pile of random parts and a pool of oil in the driveway, and it turns out I don’t know much about how to fix foreign cars, then, yeah, I should apologize, even if I was really really honestly trying my best.

But the Stanford team messed up. That happens. I’m mad at MIT’s science magazine because they just published the sort of article that sets back science journalism, an article that presents legitimate and open scientific criticism as being political. This is Lysenko-style science reporting: it’s not what you know, it’s who you know that counts.

I don’t think the authors of the Undark article are evil either. They’re journalists, they have a good story, and they want to go with it. They’re just doing that thing that storytellers sometimes do, of folding up the truth so it will fit better in their container. This is bad journalism, in the same way that the Stanford study used bad statistics. That doesn’t make these journalists bad people, any more than those Stanford doctors are bad people. They just got carried away; they’re misrepresenting the data in a way that better tells their story. And I’m not even saying the misrepresentation is on purpose; these things happen when you write an argument and then fill in the facts to make your case.

But I do blame them for not fixing things after I pointed out the problems in their story. This is MIT; the data should matter.

I do a lot of science, and I do a lot of science reporting. I’m not a fan of scientist-as-hero journalism, and I think science is well served by reporters such as Stephanie Lee who dig deep. I don’t think science is well served when the top engineering school in the world (I say as an alum, class of ’85) is promoting a science-empty take on science, a narrative which is full of politics but can’t find the time to establish the scientific facts.

I guess I’m just naive. Last decade I was getting angry that prestigious journals such as PNAS were publishing absolute crap. Now I’m used to it, but it’s time for me to be angry at prestigious science journalism. So laff at me, call me naive to think that MIT’s science magazine would want to get things right. I’m still fuming about Scene 4 above, where they politely listen to my criticism and then double down on their bullshit framing of the story.

P.S. David Hogg sends in the above picture of a NYC cat making its way out of quarantine.

P.P.S. tl;dr version here.

61 Comments

  1. D Kane says:

    “Talky-talk” is a genius phrase. I am stealing this!

    > doing that thing that storytellers sometimes do, of folding up the truth so it will fit better in their container

    Reminds me of the complaint that the secret to Michael Lewis’s success is that he never lets the facts get in the way of a good story.

    Forget it, Andrew. It’s MIT.

  2. Joshua says:

    Andrew –

    > He’s a busy person, he was author #16 on a 17-author paper.

    I think that focusing where he falls on the list of the authors misses important context: Specifically, that he went on a national TV publicity campaign to argue that the results of his study justified extrapolating a national (or global) infection fatality rate, to support a conclusion that COVID-19 is nothing to worry about, is basically like a seasonal flu, and that the interventions being undertaken to limit the spread were “draconian” and counterproductive.

    Although your posts were limited in their focus, and Ioannidis’ publicity campaign isn’t relevant to how the MIT article mischaracterized your input into the debate, his publicity campaign is directly relevant to the topic of the article and the absurd notion that some pure scientist just doin’ his science is being attacked by partisan tribalists.

    • Joshua says:

      I also think that relevant context, with respect to his ranking on the list of authors, is that he has become so well known for his work advocating for better statistical rigor in research, and yet was involved in a study with dubious statistical rigor.

      That doesn’t do anything to detract from his previous work on that issue, but it should serve as a cautionary tale that, as you say, statistics is hard!

      • Joshua,

        I thought the revisions made reflected that one or more co-authors incorporated many of the criticisms. I haven’t read any others that are so reliable either.

        I wonder if we had ramped up testing efforts initially by testing massively and more frequently; that is to have included the asymptomatics, we would have been in a better place. This is my opinion. And I view much knowledge as a consumer of science/statistics and potential patient down the road. Not as an expert.

        • The problem I had with John Ioannidis’s role, especially in the study’s publicity, was that it lent credibility to far less uncertainty than there was in that study at the time. It simply does not matter if the later revisions addressed this, or if with more data it turns out the uncertainty was not as high as before.

          Time matters – you say it’s safe in the theater and then after some delay you say actually there is a fire or that it was just put out (you are safe now).

          What matters, and this may be primarily a statistical concept, is that what you claim should not exceed the information you have and is to be judged by what repeatedly happens when you use that information. What repeatedly happens.

          Now, I regularly refused to be a co-author and withdrew as a co-author from a paper when it seemed to me that the paper would not adequately quantify uncertainty. Others that don’t pick up on this or choose to ignore are subject to public criticism.

        • Joshua says:

          Sameera –

          It’s good that they incorporated responses in their follow-up. Yes, that’s what you’d hope for. But…

          As Keith says below, that doesn’t really address the more fundamental problem of not properly quantifying uncertainty, and then going on a national publicity campaign – and inserting themselves in a discussion of policy – based on that mis-quantification of uncertainty. And to add to that, as I’ve commented on before, Ioannidis doubled down with a one-sided treatment of uncertainty on related issues such as the uncertainty about death counts (he discussed only the uncertainties that might contribute towards over-counting and ignored the arguably more significant uncertainties that would result in under-counting).

          And subsequent to that, I feel that some of the co-authors engaged in similarly questionable treatment of uncertainties related to their findings in the 2nd LA County study and the MLB employees study. Always their errors led toward supporting their political advocacy against certain interventions to reduce the spread of the pandemic.

          But worst of all, imo, they built on perhaps one of the most fundamental errors in science: Extrapolating from non-representative sampling.

          It is their right to engage in such advocacy. I fully support it. But it can only be done responsibly when people take full responsibility for their treatment of uncertainties. People should expect that their science will be biased in favor of their ideological proclivities, and to support their scientific theses. As such, they should act from a place of understanding that you are the easiest person for you to fool. That requirement is only heightened when people insert themselves into political engagement around policies associated with a deadly pandemic.

          These aren’t new issues. They are understandable. They are the kinds of things that people just do. Everyone needs to step back from motive-impugning, imo, and recognize that when we see these biases in others they are reflecting back our own tendencies.

          • Joshua says:

            At the risk of seeming even more unhinged…

            The more I think about this the more it bugs me.

            Not only did they go on a national TV publicity campaign to extrapolate from unrepresentative sampling, when their further studies revealed findings that contradicted their priors, they then engaged with the media to explain that the results of their follow-on work that they didn’t like were explained by the unrepresentativeness of their sampling…

            Grrrrr…

  3. Ernie says:

    Undark is affiliated with MIT? I don’t see any mention of that on the site.

    • Andrew says:

      Ernie:

      As noted in the above post, this is what the author of the article told me:

      I am writing for Undark (MIT’s science magazine) and wonder if you would be willing to chat briefly today or tomorrow morning?

      • Joshua says:

        Sayeth Wikipedia:

        > The magazine is published under the auspices of the Knight Science Journalism Fellowships program at the Massachusetts Institute of Technology.

        • Hi Andrew,

          I read the Undark article and your comments for the 1st time today. As I accessed many Twitter threads, from when John Ioannidis’ article 1st came out, I also saw that many of the criticisms of John Ioannidis on Twitter were deleted. So now it is difficult to gauge the accuracy. I wish I had copied them to Word.

          In my observation, there is a virulence to some critiques coming from experts on Facebook and Twitter that I think showed that experts do not disclose precisely what their own interests are as they are critiquing. My friends who follow the statistics experts have been astounded by what they think are just displays of pettiness and meant to besmirch John’s reputation.

          Actually, one of them asked me why people are so hostile to John Ioannidis. And I didn’t answer that question b/c I have respect for many of the experts who have critiqued him. And even agree with some critiques. But do not care for the sarcasm that accompanies the tweets.

          However,

          • Dave says:

            Hmm, maybe because he first staked a claim that people were overreacting with no evidence, then backed up this position with poor evidence in a way that can affect policy and lead to tens of thousands of unnecessary deaths?

            • Hi Dave,

              Which claim do you mean? After 2 months of reading a fair number of articles and commentaries, I am looking for more in-depth discussion of the assumptions made. So I welcome your elaborating on your earlier post.

              • Dave says:

                The entire tenor of this: https://www.statnews.com/2020/03/17/a-fiasco-in-the-making-as-the-coronavirus-pandemic-takes-hold-we-are-making-decisions-without-reliable-data/

                It is in no way crazy to debate the best way to deal with a situation that is new and rapidly evolving for which many choices will be painful. If this was really his point, he did not advocate for it well. Reading the above in March, it seemed clear to many that Ioannidis was (purposely) playing contrarian and suggesting that everyone is overreacting. If one has a range of possible outcomes based on the available data (however limited), it is dangerous to suggest that the least conservative actions are the ones to follow at the start. To back this up, his Santa Clara coauthors also came forward with similar attitudes in the press. Then, lo and behold, they post a totally craptastic piece of garbage that is exactly the type of study that Ioannidis is known for railing against to support their original views. Then again, instead of backing down, they all double down in various ways, and go to various news sources about how they have been proven right about overreactions. Ioannidis himself starts posting studies with cherry-picked population data (healthy young blood donors?), sloppily saying the IFR of flu is 0.1 percent, ignoring country-wide data (England is nearing a total of 0.1% total country-wide deaths, so come on now), etc. Again, these things matter when it is well-respected people at well-respected places whose poor work and media appearances can alter policy on a large scale. It also makes it harder to actually debate what could have been a related point, that lockdowns cannot be sustained for long periods: is there a better way? People get pissed off when all of this obvious cherry picking and sloppiness comes from the guy who made his name on calling it out.

          • Andrew says:

            Sameera:

            I have no problem with the article’s perspective. My problem is that (a) they omitted the true and relevant fact that the statistical analysis in the Stanford report was bungled, (b) they misrepresented my post as an attack on Ioannidis when it was neither an attack nor about Ioannidis, and (c) they completely misrepresented Lee’s reporting. They could’ve told their story without the misrepresentations. Indeed, they could still do this by fixing their story. It’s completely irresponsible for them to not do this.

            • Andrew says:

              As the saying goes, the authors of that MIT article are entitled to their own opinions, but not their own facts.

              How hard can that be for people to understand? Pretty damn hard, I guess.

              • jim says:

                people…”are entitled to their own opinions, but not their own facts.”

                :) hard to say whether this is true or not. I think the key word is “entitled”.

                People don’t care that they’re not “entitled” to “their own facts”, so they use them anyway, which means in effect they actually are entitled to their own facts.

              • Andrew says:

                Jim:

                Fair enough. Nobody’s entitled to anything. I think it’s a journalistic error to misstate the facts, and journalistic malpractice to not correct the errors when they are pointed out. Lots of journalists do this, though. I think that part of the problem is that “telling a good story” is prized over “getting the facts right.”

              • jim says:

                ‘it’s a journalistic error to misstate the facts, and journalistic malpractice to not correct the errors when they are pointed out. ‘

                Oh, make no mistake, I agree. Everyone else agrees too! Until they need some slightly less truthful facts or some other allusion to misrepresent reality to support their personal canon.

                ‘I think that part of the problem is that “telling a good story” is prized over “getting the facts right.”’

                Absolutely. Facts don’t sell papers or get clicks. Stories do.

              • Jim says:

                Pardon my cynicism, but I believe the battle for science is all but lost.

  4. Ono says:

    The irony is the authors are engaging in exactly what they accuse critics of doing – assuming others are arguing in bad faith by assigning political motives to Ioannidis’s critics instead of discussing the substance of his claims.

    It is not true that discussion of the need for lockdowns has been silenced. ZDogg MD (to name just one person with a mass of followers) has done so several times in YouTube videos that are still up. The difference between Ioannidis and all the others is that only Ioannidis has participated in doing actual science as opposed to simply making videos, tweeting, or writing an Op Ed.

    As you pointed out, there was no discussion in their piece of the scientific criticism of any of the Stanford research. Ironically, they also repeat the claim that the Stanford studies demonstrate the IFR of COVID is similar to that of seasonal influenza (both in the range of 0.1 to 0.2), therefore shutdowns were not needed. The 0.1 to 0.2 figure for influenza is the same as the CFR that can be calculated from influenza burden information on the CDC website, not an IFR. And a CFR derived from some algorithm that guesses at cases. That number has no relationship whatsoever to an IFR determined by any serosurvey. The IFR of influenza isn’t known with any precision. These COVID/flu comparisons could actually be an order of magnitude off.

    Interestingly, Ioannidis gave testimony in front of the US Senate

    https://www.hsgac.senate.gov/imo/media/doc/Testimony-Ioannidis-2020-05-06.pdf

    • Ono says:

      Hit return before comment was finished – in the Senate testimony, he makes the statement that the IFR of COVID and seasonal influenza is about the same for middle aged adults. No numbers are provided, and there is no reference given.

      So I do agree with Joshua about the influence of Ioannidis – it’s gone even further than TV – into the US Senate.

      • My guess is that John bases his analysis on the situation as it unfolded in Italy. I have to check that of course.

        As I posted earlier, a good many of the nastiest of tweets looked like they were deleted. Some individuals have a ‘waddle off in a huff’ demeanor judging from their tweets. It’s actually humorous.

        I think the stat sig wars have also played into the different camps among experts.

        • confused says:

          Italy would be surprising, as it’s one of the hardest hit countries. It seems more like dismissing places like northern Italy, Madrid, and NYC as outliers. Might be regional bias — Stanford is in California, and for some combination of reasons, the Western US has been pretty lightly affected, to the point that the deaths *so far* in most Western states are probably comparable to or less than a bad flu season like 2017-18. (On the other hand, much of the Western US is in a much earlier stage of its COVID epidemic than the Northeast/Midwest).

  5. I belong to the Right Care Alliance.org of which Shannon Brownlee is a founder. But I did not know in advance that Shannon and Jeanne Lenzer were going to write the article. I RT the article. And discovered that they had authored it.

    Shannon Brownlee’s book, Overtreated is a tour de force. A superb book.

  6. Joseph Candelora says:

    In all the criticism of Ioannidis, I haven’t seen anyone take him to task for this in his original STAT article:

    “If we assume that case fatality rate among individuals infected by SARS-CoV-2 is 0.3% in the general population — a mid-range guess from my Diamond Princess analysis — and that 1% of the U.S. population gets infected (about 3.3 million people), this would translate to about 10,000 deaths.”

    1% of the US population gets infected? How could anyone present that as reasonable in a serious discussion of potential outcomes. Absurd.

    • Joshua says:

      > How could anyone present that as reasonable in a serious discussion of potential outcomes. Absurd.

      Well…if you rely on incomplete data…

      Oh, wait, Ioannidis wrote an op-ed about how other people’s analyses were dubious because of incomplete data and then went on to make flawed yet confident projections based on similarly incomplete data.

      It’s interesting that someone who has forged a reputation for statistical robustness would fall into such an obvious trap of self-biased reasoning.

      It goes to show how much diligence is required to guard against the adage that you are the easiest person for you to fool.

      His whole exercise of extrapolating from data that are such a clear outlier – can there be more of an outlier than data collected on a cruise? – seemed absurd to me from the start.

      • confused says:

        Yeah, I think the 1% thing was the key error in that piece.

        >>His whole exercise of extrapolating from data that are such a clear outlier – can there be more of an outlier than data collected in a cruise? – seemed absurd to me from the start.

        Ironically, that extrapolation in and of itself seems to have held up fairly well, comparatively. (He did adjust for demographic differences between the Diamond Princess and the US overall population.)

        Ioannidis took a 1% death rate from the Diamond Princess and came up with 0.3% for the US. The death rate on the Diamond Princess ultimately turned out to be 2%, and I don’t think 0.6% for the US overall IFR will prove to be that far off. (NYC is ~1% but probably higher than much of the US.)

        And US overall IFR may drop; better supportive care/more availability of remdesivir/maybe future treatments depending on how long this thing lasts, younger populations may tire of social distancing faster affecting the age-distribution of infections, better protection of nursing homes (and in some parts of the US a significant proportion of nursing homes already having been hit) etc. I don’t think we are going to see NYC-level strains on medical capacity again in the US.

        If only, say, 1/4 of the people who will ultimately be infected in the US have been infected so far, even 0.3% might not be impossible at the end of this whole thing, though I think it’s highly optimistic.

        The 1% thing is the real critical error; that’s much more than a factor-of-2 error.
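
The arithmetic behind this point can be sketched in a few lines. The 0.3% rate, the 1% infected share, and the roughly 3.3 million infections come from the STAT quote above; the US population of 330 million is implied by those figures. The 0.6% rate is the commenter's own guess, and the 10% and 50% infected shares are purely hypothetical scenarios added here for comparison. The function name is illustrative.

```python
US_POPULATION = 330_000_000  # implied by "1% ... about 3.3 million people"


def projected_deaths(fatality_rate, fraction_infected, population=US_POPULATION):
    """Deaths = fatality rate among the infected x share infected x population."""
    return fatality_rate * fraction_infected * population


# The quoted scenario: 0.3% fatality rate, 1% of the population infected.
print(f"quoted scenario: {projected_deaths(0.003, 0.01):,.0f} deaths")

# Doubling the fatality rate doubles the total; raising the infected share
# from 1% to the hypothetical 10% or 50% multiplies it by 10 or 50.
for fraction_infected in (0.01, 0.10, 0.50):
    for fatality_rate in (0.003, 0.006):
        deaths = projected_deaths(fatality_rate, fraction_infected)
        print(f"rate={fatality_rate:.1%} infected={fraction_infected:.0%} "
              f"-> {deaths:,.0f} deaths")
```

Because the projection is a simple product, an error in the assumed infected share carries through one for one, so an assumption that is off by a factor of ten or more dominates a factor-of-two error in the fatality rate.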

        • Joshua says:

          > Ironically, that extrapolation in and of itself seems to have held up fairly well, comparatively. (He did adjust for demographic differences between the Diamond Princess and the US overall population.)

          Here’s what Ioannidis said:

          > “Cruise ships are like an ideal experiment of a closed population. You know exactly who is there and at risk and you can measure everyone,” says John Ioannidis, an epidemiologist at Stanford University in California. This is very different from trying to study the spread in a wider population, where only some people, typically those with severe symptoms, are tested and monitored.

          That’s absurd. The population is an outlier and the treatment condition is completely non-typical. It depends completely on how people are treated once the infection has been discovered.

          He estimated based on about 1/2 of the total deaths over time (maybe some of them are questionably attributable to COVID, but still), which shows that he got tripped up by exactly what he was complaining about – people extrapolating from incomplete data.

          As far as I can tell he adjusted for age (I welcome being shown I’m wrong about that) – there are many other explanatory variables for which that population would be an outlier.

          Ioannidis took a 1% death rate from the Diamond Princess and came up with 0.3% for the US. The death rate on the Diamond Princess ultimately turned out to be 2%, and I don’t think 0.6% for the US overall IFR will prove to be that far off. (NYC is ~1% but probably higher than much of the US.)

          And US overall IFR may drop; better supportive care/more availability of remdesivir/maybe future treatments depending on how long this thing lasts, younger populations may tire of social distancing faster affecting the age-distribution of infections, better protection of nursing homes (and in some parts of the US a significant proportion of nursing homes already having been hit) etc. I don’t think we are going to see NYC-level strains on medical capacity again in the US.

          If only, say, 1/4 of the people who will ultimately be infected in the US have been infected so far, even 0.3% might not be impossible at the end of this whole thing, though I think it’s highly optimistic.

          The 1% thing is the real critical error; that’s much more than a factor-of-2 error.

          • Joshua says:

            Sorry for making it seem like I wrote what you wrote above… My comment should have stopped after “for which that population would be an outlier…”

          • confused says:

            >>That’s absurd. The population is an outlier

            Not *that* much, I don’t think, once you adjust for age structure of the population (which Ioannidis did). It’s non-representative, sure, but everyone else in mid-March was comparing the US to Italy or Wuhan. Our outbreak evolution hasn’t been remotely similar to either place (NYC is somewhat comparable to northern Italy, sure, but the US as a whole is definitely not).

            It’s certainly better than the Spanish flu comparisons people were throwing around back then!

            >>the treatment condition is completely non-typical.

            Do we know that? I mean, clearly they got better treatment than in the near-overwhelmed NYC hospital system, but Ioannidis was extrapolating to the US, and NYC is nowhere near typical of the US.

            And learning more about how to deal with COVID may cancel out — or more — any effect from greater medical resources per patient.

            >>As far as I can tell he adjusted for age (I welcome being shown I’m wrong about that) – there are many other explanatory variables for which that population would be an outlier.

            Sure, but age is *incredibly* important in COVID. Everything else except preexisting conditions (which are very strongly correlated with age anyway) is practically “details around the edges” by comparison; the IFR is probably something like 100x-200x different between people in their 20s and people in their 80s.

            But I guess this depends on how much you think socioeconomic status and such have an independent effect, vs. basically being proxies for urban density/contact patterns. Since the Diamond Princess was dense, I don’t think much “wealth effect” should show up, but who knows…

            • Carlos Ungil says:

              > comparing the US to Italy or Wuhan. Our outbreak evolution hasn’t been remotely similar to either place (NYC is somewhat comparable to northern Italy, sure, but the US as a whole is definitely not).

              The US as a whole is within a factor of two of Italy as a whole in per capita deaths. The American Northeast (pop: 56mn, 62k deaths reported) is comparable to Northern Italy (pop: 28mn, 29k deaths reported).
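The regional comparison above comes down to deaths per capita. A minimal Python sketch using only the figures quoted in the comment (the function and variable names are hypothetical):

```python
def deaths_per_100k(deaths, population):
    """Reported deaths per 100,000 residents."""
    return deaths / population * 100_000

# (reported deaths, population) as quoted in the comment
regions = {
    "American Northeast": (62_000, 56_000_000),
    "Northern Italy": (29_000, 28_000_000),
}

rates = {name: deaths_per_100k(d, p) for name, (d, p) in regions.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} deaths per 100k")
```

With these inputs the two regions come out within about 10% of each other, which is the sense in which they are "comparable".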

              • confused says:

                This is true, but not really contradictory to what I was saying, since the Northeast numbers are very strongly driven by the NYC metropolitan area (not just NYC itself, but also Long Island and most of the population of New Jersey and Connecticut are included).

                The highly populous US states other than New York – Texas, California, and Florida – show a radically different picture.

                And yes, New York’s epidemic is practically over while it’s not in those other states. But I’m not talking just about severity, but the *pattern* of the epidemic. The NYC metro area showed a very sharp rise, a very high peak, and a relatively rapid decline – like Italy. Texas and Florida (which didn’t take strong measures early, unlike California) show a totally different pattern (slow rise and long plateau; right now cases appear to be rising while deaths are dropping in TX, so it’s not clear if that’s a lag effect or a result of better finding of mild cases).

              • Carlos Ungil says:

                More than a contradiction, I was giving a different way to look at it. Anyway, it’s not just the NYC area: Massachusetts (pop: 7mn, 8k deaths reported) is also comparable to Northern Italy. But I agree that the pattern is different in different places, for multiple reasons.

            • Joseph Candelora says:

              I’m quibbling here, but Ioannidis didn’t come up with 0.3% from his age-adjusted Diamond Princess study — he came up with 0.125%*.

              It’s not clear exactly how he got to 0.3%, but it appeared to be the result of further adjustment. He adjusted his original confidence interval of 0.025%-0.625% up to 0.05%-1.00% in order to account for cruise passengers being healthier than the general population at same ages and also to account for right censoring of deaths**. Then later in the article he picked 0.3% as “a mid-range guess from [the] Diamond Princess analysis”.

              So he already adjusted for deaths not yet measured. Your multiplication by 2 is double-counting those additional deaths. Worth noting that it also reflects his best-estimate adjustment for the population being nonrepresentative.

              But as I said, those are quibbles. You and I broadly agree. That 0.3% (which will likely be low by more than a factor of 2 but certainly less than an order of magnitude) reflects him doing a much better job than he did with the 1% prevalence estimate for an unmitigated outbreak. That part was just absurd.

              The other thing one should not give him too much credit for is having generated an IFR estimate that’s closer to actual than the WHO’s 3.4% was. Every expert knew that the 3.4% was simply the CFR calc, not an estimate of IFR. His arguments, while nominally attempts to clarify, are actually the ones conflating the WHO’s CFR with an IFR. He’s the one who keeps acting as if using an actual IFR estimate is some major correction to what everyone else is/was doing. But his article came out the day _after_ Imperial College London published their study – and they of course didn’t use 3.4%, or anything close to it in their projections. They used best estimates of actual IFR by age, and came up with about 0.82% in the US and about 0.9% in the UK. Yet Ioannidis, in his original STAT article and the many followups, keeps using this 3.4% strawman as if the policy-driving fatality estimates assumed that number. They didn’t.

              To summarize, he was wrong in just about every dimension it’s possible to be wrong. His best estimate of IFR was wrong (too low). His best estimate of ultimate infection rate was wrong (absurdly too low). But he was most wrong in saying that the information that others (far more expert than he) were using was “utterly unreliable”. Those experts actually succeeded in taking spotty data and sussing out some damn solid estimates. As much as he wants his readers to think so, ICL and the like weren’t using 3.4% as the assumed infection fatality rate.

              *”Projecting the Diamond Princess mortality rate onto the age structure of the U.S. population, the death rate among people infected with Covid-19 would be 0.125%.” — Ioannidis, STAT 3/18
              **”It is also possible that some of the passengers who were infected might die later, and that tourists may have different frequencies of chronic diseases — a risk factor for worse outcomes with SARS-CoV-2 infection — than the general population. Adding these extra sources of uncertainty, reasonable estimates for the case fatality ratio in the general U.S. population vary from 0.05% to 1%.” — Ioannidis, STAT 3/18

              • Joseph Candelora says:

                Separately, I’ve not seen anything that suggests excess deaths in NYC due to hospital overload. If you compare the hospitalization data to death data, the rolling average new hospitalization data does a damn good job of predicting the death count, over the entire course of the epidemic in NYC.

                I took Cuomo’s 3-day average hospitalization count and daily death counts. If you look at deaths lagged five days against hospitalizations, the ratio of deaths to hospitalizations is surprisingly steady, with no discernible trend.
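The lagged-ratio check described above could look roughly like the following Python sketch. The function name is hypothetical, and the two series are synthetic numbers invented purely to show the mechanics. They are not the figures from Cuomo's briefings.

```python
def lagged_ratio(hospitalizations, deaths, lag=5):
    """Ratio of deaths on day t+lag to hospitalizations on day t."""
    ratios = []
    for t in range(len(hospitalizations) - lag):
        h = hospitalizations[t]
        d = deaths[t + lag]
        ratios.append(d / h if h else float("nan"))
    return ratios

# SYNTHETIC data for illustration only: deaths are constructed as 20% of
# the hospitalizations five days earlier, so the ratio is flat by design.
hosp = [100, 200, 400, 600, 500, 400, 300, 200, 150, 100]
deaths = [0] * 5 + [round(0.2 * h) for h in hosp[:5]]

ratios = lagged_ratio(hosp, deaths, lag=5)
print(ratios)
```

On real data, a roughly constant series of ratios would be consistent with the commenter's reading, while a rising ratio around the peak would point toward a worsening death rate under hospital strain.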

              • Joshua says:

                Joseph –

                > He adjusted his original confidence interval of 0.025%-0.625% up to 0.05%-1.00% in order to account for cruise passengers being healthier than the general population at same ages and also to account for right censoring of deaths**

                Thanks for that. I hadn’t seen it sooner.

                “0.05-1.0%.” In context, a CI that wide seems pretty useless to me – all the more so because of his subsequent publicity campaign to say it’s basically like the seasonal flu.

                He has a recent publication. Here, he ignores the uncertainty of what the economic impact would have been absent government mandated shelter in place orders.

                https://forecasters.org/blog/2020/06/14/forecasting-for-covid-19-has-failed/

              • Joshua says:

                IMO, adjusting for “different frequency of chronic diseases” – while, like age, important given the nature of COVID-19 – seems rather inadequate given all the ways in which cruise passengers might not be a representative slice of the general public.

              • confused says:

                Hmm, good point. And I think I agree with nearly all of that.

                I do disagree that 0.3% “will likely be low by more than a factor of 2”, at least for the US. The *current* IFR for the US is probably more than 0.6%, but the US isn’t done with COVID, and I think that it will likely drop (and probably already has dropped somewhat from the ~1.1% or thereabouts of New York).

              • confused says:

                I didn’t mean that people were dying in NYC because there was no room for new patients in the hospitals (I agree that didn’t happen), but that the patient / staff ratio increased so healthcare workers were overworked and the quality of care decreased for patients in the hospitals.

              • Zhou Fang says:

                I think it’s plausible that it might in fact not drop but rather increase. Current US covid19 is a mix of states in late stages (where new cases are reducing down to zero) where raw CFR is a fairly decent calculation (apart from unobserved deaths and cases), and states where covid19 is in early stages (where new cases are rising), in which case raw CFR is an underestimate due to the infection-death lag.

              • confused says:

                >>I think it’s plausible that it might in fact not drop but rather increase.

                I mean, there’s still a lot of uncertainty about COVID, so a lot of things are plausible to some degree.

                But being at an early stage of the epidemic would only affect (observed) CFR, not (true) IFR. I’m talking about the percentage of people infected who end up dying from COVID, not the deaths/infections ratio at some specific moment in time.

                What I’m saying is that I expect a lower IFR for people infected today than for people infected March 16th, and I think the IFR for people infected September 16th will probably be significantly lower than that. So unless COVID dies out in the US or drops to very low levels very soon, the final IFR for the US determined at the end of the pandemic will probably be much lower than what we’d get from an estimate today.

              • Joseph Candelora says:

                I just don’t see anything in the data that suggests that death rate changed with time or resource utilization.

                This is very simplistic, but I was grabbing the numbers from Cuomo’s daily briefings until he stopped giving them:
                https://i.redd.it/hdz4cjox6d551.png

          • Joshua says:

            > Not *that* much, I don’t think, once you adjust for age structure of the population (which Ioannidis did).

            SES is a strong predictor of health outcomes. The associated causality is obviously complicated. Is it really SES or comorbidities? Is it really SES or race/ethnicity? Is it really SES or is it access to healthcare? Is it really SES or is it health behaviors? Is it really SES or is it diet? Is it really SES or is it number of adverse childhood experiences? Is it really SES or is it (particularly with reference to COVID 19 mortality) prevalence of multi-generational households (or grandparents caring for grandchildren)? Is it really SES or overall health status? Etc.

            Cruise passengers are non-representative BY SES, and probably many of those other factors and prolly more.

            > It’s non-representative, sure, but everyone else in mid-March was comparing the US to Italy or Wuhan. Our outbreak evolution hasn’t been remotely similar to either place (NYC is somewhat comparable to northern Italy, sure, but the US as a whole is definitely not).

            So what? Comparisons other people made don’t change the fact that he was extrapolating from an outlier with respect to many predictors of health outcomes.

            > Do we know that?

            Do we know if conditions on board a cruise are unlike typical living conditions? Yes, we know that!

            Ability to isolate. Degree of isolation. Ventilation. Access to healthcare. Inter-generational mixing. Everything on board a cruise is atypical. Do we know how those atypical aspects affected outcomes? Nope. We don’t know if they increased spread or decreased spread, increased or decreased viral load, increased or decreased fatality. That’s why you shouldn’t use passengers on a cruise to extrapolate.

            We don’t even know if it was a strain that might have been typical of other samples.

            > And learning more about how to deal with COVID may cancel out — or more — any effect from greater medical resources per patient.

            >>As far as I can tell he adjusted for age (I welcome being shown I’m wrong about that) – there are many other explanatory variables for which that population would be an outlier.

            > Everything else except preexisting conditions (which are very strongly correlated with age anyway) is practically “details around the edges”

            But once you’ve controlled for age, they are very important. You shouldn’t just gloss over them. Bad science.

            > But I guess this depends on how much you think socioeconomic status and such have an independent effect, vs. basically being proxies for urban density/contact patterns. Since the Diamond Princess was dense, I don’t think much “wealth effect” should show up, but who knows…

            If density is high, it could exaggerate the predictive value of SES, as mediated by factors such as lifetime health status, comorbidities, etc.

            • confused says:

              >>SES is a strong predictor of health outcomes. The associated causality is obviously complicated.

              Well, sure, but “SES” itself can’t have any biological effect. It is a useful proxy for a complicated mix of factors, certainly. But a lot of those factors won’t apply in this kind of situation, so the *usually* strong link between SES and outcomes is pretty likely to be broken or at least weakened here.

              Living patterns (work exposures, multi-generational households, shared housing, general crowding, urban density) are probably a huge factor in the SES effect – for COVID at least. But a cruise ship is already about the worst kind of situation for COVID, so a lot of the beneficial effect of SES would probably be gone in this case.

              Sure, lower comorbidities maybe, but the effect should at least still be weakened.

              >>Do we know how those atypical aspects affected outcomes? Nope. We don’t know if they increased spread or decreased spread, increased or decreased viral load, increased or decreased fatality.

              See, this is the part I disagree with. It seems pretty clear that congregate settings are seriously bad news. I really don’t see a plausible case that the Diamond Princess could have been better off than a “general population” sample of the same age… except maybe by better medical care, but only by comparison to the average patient *at the same time*. Care *now* in the US is probably largely better than what they got, and will continue to improve.

              >> If density is high, it could exaggerate the predictive value of SES, as mediated by factors such as lifetime health status, comorbidities, etc.

              Unless a major part of the predictive value of SES *for COVID specifically* is as a proxy for dense living conditions.

              • Joshua says:

                > But a lot of those factors won’t apply in this kind of situation, so the *usually* strong link between SES and outcomes is pretty likely to be broken or at least weakened here.

                I don’t see that. Many of those factors contribute to baseline health status, such as diet, health behaviors, etc. The average grandparent over 70 has X% chance of living in a multi-generational household; the average grandparent over 70 on a cruise has about a 0% chance of living with a grandchild while on board. Etc.

                I think as a general principle it’s a bad idea to extrapolate from non-representative sampling unless you have ways to adjust your sample. In this case, it’s really unrepresentative sampling with very little ability for adjustment.

                > But a cruise ship is already about the worst kind of situation for COVID, so a lot of the beneficial effect of SES would probably be gone in this case.

                So we might think. But the problem is that we really don’t know. I think that such speculation should be avoided. Better to go with sampling where you have relevant data for making adjustments. Too much chance for confirmation bias that way lies, IMO.

                > See, this is the part I disagree with. It seems pretty clear that congregate settings are seriously bad news. I really don’t see a plausible case that the Diamond Princess could have been better off than a “general population” sample of the same age… except maybe by better medical care, but only by comparison to the average patient *at the same time*. Care *now* in the US is probably largely better than what they got, and will continue to improve.

                See above. Yes, congregate living settings as a general condition are seriously bad news. But there’s a lot of detail there that could be relevant. Life in a nursing home is pretty much diametrically opposed to life on a cruise. On the cruise they had an extraordinary ability to isolate people. It seems from what I’ve read they didn’t really maximize that ability at all (it took them days to really lock people down, infected staff roomed with uninfected staff), but in point of fact we really don’t know without a careful analysis. I agree that there’s reason to speculate that the conditions contributed to spread, but even there, I think that we would be making unjustified assumptions in the sense that you just did of comparing life on a cruise to congregate living as a general condition.

                What makes this particularly ironic is that John was right, IMO, to call for better data as a part of projecting. And yes, sometimes you go to war with the epidemiological data that you have. But for me, there is a basic principle: you really, really avoid extrapolating from obviously unrepresentative data and incredibly idiosyncratic treatment conditions.

                > Unless a major part of the predictive value of SES *for COVID specifically* is as a proxy for dense living conditions.

                I’m saying that you shouldn’t compare high-density living on a cruise with cruise passengers (of a high SES) to high-density living in NYC with people living in a public housing block. Even when you control for age, comorbidities and density, it seems to me to be a really bad idea.

              • Zhou Fang says:

                Factors that might make the Diamond Princess patients better off are:

                1. Early detection for everyone. The ship is basically like a perfect contact tracing scenario where we were testing almost everyone before they even exhibited symptoms. This means there’s no additional deaths from a failure to diagnose Covid19 before it’s too late to get to a hospital.

                2. Viral load. It may well be the case that Diamond Princess passengers are exposed to lower viral load than in a non-ship scenario. We can only really speculate about this.

              • confused says:

                I suppose I’m mostly arguing about details here. But I do think the congregate / non-congregate distinction is hugely important for COVID, and that use of SES as an ‘explanatory’ variable (as opposed to a ‘warning signal’ to point out potentially more vulnerable populations, which it is very good for) is problematic.

                A bit of a “prior” which is affecting my argument here: I’ve seen some uses of SES as ‘explanatory’ in environmental health stuff where it is pretty clearly serving as a proxy for specific definable effects, and where working out those specific effects could give a much better picture of what is really happening.

                A non-biological factor like SES can only serve as a proxy for other things — so when you compare to a different population, the effect will only remain if the correlation to the genuinely explanatory factors it serves as a proxy for also remains.

                The only other specific comment I have is…

                >>I think that we would be making unjustified assumptions in the sense that you just did of comparing life on a cruise to congregate living as a general condition

                This seems somewhat counter-Occamian to me. If every kind of congregate setting that’s been looked at* seems to have big problems (nursing homes, prisons, naval ships, worker dormitories in Singapore)…

                *We don’t have good evidence for college dorms since they were closed early. And it might be hard to notice without mass testing, anyway, since the rate of severe cases in the ~18-24 age group seems to be very low.

              • Joshua says:

                confused –

                > But I do think the congregate / non-congregate distinction is hugely important for COVID, and that use of SES as an ‘explanatory’ variable (as opposed to a ‘warning signal’ to point out potentially more vulnerable populations, which it is very good for) is problematic.

                Of course, I agree that it’s hugely important. But, not all congregate living is equal. And, the effect of SES can mediate or moderate the influence of congregate living. Seems to me you are thinking of them as somehow mutually exclusive effects, rather than additive or that they have an interaction effect.

                > I’ve seen some uses of SES as ‘explanatory’ in environmental health stuff where it is pretty clearly serving as a proxy for specific definable effects, and where working out those specific effects could give a much better picture of what is really happening.

                Sure, digging beneath the SES association with health outcomes, to understand the underlying causality there is important. I’m not questioning that at all.

                > If every kind of congregate setting that’s been looked at* seems to have big problems (nursing homes, prisons, naval ships, worker dormitories in Singapore)…

                But again, all congregate living isn’t equal. There is a quality to all of those other settings that isn’t remotely similar to a bunch of high SES folks out on a cruise. They are all settings of lower SES folks with the associated increase in susceptibility to the risk factors from congregate living, living without anything like the accommodations and services that there would be on a cruise (like strict isolation with proper ventilation and food delivered after a first infection was discovered).

                Anyway, I think this horse is sufficiently dead now?

              • confused says:

                >> Anyway, I think this horse is sufficiently dead now?

                Yeah, probably.

          • Here is an article about the Diamond Princess Cruise Ship

            Natural History of Asymptomatic SARS-CoV-2 Infection

            https://www.nejm.org/doi/pdf/10.1056/NEJMc2013020?articleTools=true

    • Paul Johnson says:

      Well, what % of the US population actually *have been* infected?

      • Joseph Candelora says:

        Based on the large-scale serology testing I’ve seen (NY State), about 12% as of 6 weeks ago.

        Based on current case count from Worldometer (2.1m) and Ioannidis’s Santa Clara study estimate of infection undercounting (actual infections = 50 to 85x the case count) somewhere between 30% and 50% today.

        Why do you ask?
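The second estimate above is a one-line multiplication. A rough Python sketch, in which the US population figure of 330 million is an assumption (it is not stated in the comment) and the names are hypothetical:

```python
US_POPULATION = 330_000_000   # ASSUMED round figure, not from the comment
confirmed_cases = 2_100_000   # Worldometer case count quoted in the comment

# Undercount multipliers attributed to the Santa Clara study in the comment
implied_share = {
    m: confirmed_cases * m / US_POPULATION for m in (50, 85)
}

for multiplier, share in implied_share.items():
    print(f"{multiplier}x undercount -> {share:.0%} of the US infected")
```

Under that population assumption the range works out to roughly 32% to 54%, in line with the "between 30% and 50%" quoted above.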

        • Joshua says:

          Based on a more reasonable estimate than John’s estimate when he got so far out in front of the evidence…

          2 million now identified, plus maybe 35% asymptomatic…

          Plus how many who were symptomatic and presented for testing but who were sent home without testing plus how many who were symptomatic but never even went to be tested? And then multiply that sum by another 35%

          Seems that John was clearly off on his prediction of infected, maybe by 100% or more, although perhaps not by as many multiples as his estimate of deaths (again pointing back to his underestimating the fatality rate).

          • confused says:

            New York’s serology-implied IFR is a little bit above 1% (~30k deaths out of ~2.75 million infections). Medical care was most strained in NYC, so the US overall IFR will probably be lower.

            Applying a 1% IFR to the ~115,000 US deaths, that’s 11.5 million infections (3.5% of US population).
            If the US overall IFR is 0.75%, that’s 15.3 million infections (4.6% of US population).

            And since deaths lag infections, and people are still being infected, the total number infected as of today is higher, probably over 5%. The 11.5-15.3 million is probably a decent estimate as of 3 to 4 weeks ago.
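The arithmetic in the comment above (deaths divided by an assumed IFR gives implied infections) can be checked with a short Python sketch. The US population of 330 million is an assumption not stated in the comment, and the names are hypothetical.

```python
US_POPULATION = 330_000_000  # ASSUMED round figure, not from the comment
us_deaths = 115_000          # approximate US deaths quoted in the comment

def implied_infections(deaths, ifr):
    """Number of infections implied by a death count and an assumed IFR."""
    return deaths / ifr

# New York's serology-implied IFR: ~30k deaths out of ~2.75 million infections
ny_ifr = 30_000 / 2_750_000
print(f"NY implied IFR: {ny_ifr:.2%}")

for ifr in (0.01, 0.0075):
    infections = implied_infections(us_deaths, ifr)
    share = infections / US_POPULATION
    print(f"IFR {ifr:.2%}: {infections / 1e6:.1f} million infections "
          f"({share:.1%} of the US population)")
```

Because deaths lag infections, these figures describe infections as of a few weeks earlier, as the comment notes.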

  7. Mike says:

    I remember Andrew’s 2016 post “the winds have changed” and a comment by a reader called “Plucky” who pointed out that many people practicing science have a mistaken notion of science as a community of practitioners (with all the parochial goals and habits that communities develop for protecting their members from criticism or harm). Possibly the Undark reporters also have this view of science as a community rather than of science as a method. If they do then the biased reporting and resistance to improvement that Andrew experienced would make sense: the Undark reporters are just reporting on a disagreement between two communities, and the reporters have taken a particular side in that disagreement (without caring much about which side has a better method for finding out what’s true about coronavirus disease).

  8. Martha (Smith) says:

    Andrew said,
    “This has been changed to: “Attacks on Ioannidis came early and often. Just days after the study published, Columbia University statistician Andrew Gelman wrote that Ioannidis and his co-authors ‘owe an apology not just to us, but to Stanford.’”

    First, I never attacked anyone. I pointed out errors in a much-discussed paper. I wrote that a group of authors make avoidable errors and I thought they should apologize for wasting our time with sloppy work. That is not an attack.”

    We (as a society) need to get to a place where “criticism” is not considered “an attack”. We ain’t there yet.

  9. A Country Farmer says:

    > First, I never attacked anyone. I pointed out errors in a much-discussed paper. I wrote that a group of authors make avoidable errors and I thought they should apologize for wasting our time with sloppy work. That is not an attack.

    I now generally enjoy your content and I’m trying to be objective and learn from you, but I can give you a personal anecdote that I initially found you after seeing your article criticizing Bendavid et al. shared on Hacker News. I very much personally took your article as an attack on Ioannidis and his team due to various phrases. I notice that the URL still has “fatal-flaws” in it? Was the title changed at some point?

    Given their paper wasn’t retracted and you seemed okay with their changes to the paper based on crowd peer review, it would seem “fatal” flaws was attack-ish, along with other phrases in there like “We wasted time and effort discussing this paper whose main selling point was some numbers that were essentially the product of a statistical error”, and others I’d need to re-read to get the feeling of again.

    I know my feelings about your blog post were a bit of an emotional overreaction, but there might be something to it. I still sometimes find the style of your writing insensitive.

    • Andrew says:

      Country:

      The paper did have fatal flaws! It’s not an attack on Ioannidis to say that a paper he was 16th author on had fatal flaws. It’s just the way things are. We all make mistakes. In any case, I don’t give primary responsibility on a paper to its 16th author.

      • Andrew, in Biology and Medicine it’s almost always the case that the person in charge of the research and who is primarily responsible for the major research choices is listed LAST.

        In bench science, first author is given to the person doing most of the bench work. Last author is always the PI in charge of the lab. Middle authors may sometimes be grad students or technicians. If there is a collaboration the PIs are listed towards the end, with the most responsible PI last, and the second most second-to-last etc.

  10. Mel Wahl says:

    Mathematicians can lie or be wrong for only a short time.
    Statisticians can lie or be wrong just long enough to cause a lot of damage.
    – mew

    That’s why I am a mathematician.
