“Evidence-based medicine”: does it lead to people turning off their brains?

Joshua Brooks points us to this post by David Gorski, “The Cochrane mask fiasco: Does EBM predispose to COVID contrarianism?” EBM stands for “evidence-based medicine,” and here’s what Gorski writes:

A week and a half ago, the New York Times published an Opinion piece by Zeynep Tufekci entitled Here’s Why the Science Is Clear That Masks Work. Written in response to a recent Cochrane review, Physical interventions to interrupt or reduce the spread of respiratory viruses, that had over the last month been widely promoted by antimask and antivaccine sources, the article discusses the problems with the review and its lead author Tom Jefferson, as well as why it is not nearly as straightforward as one might assume to measure mask efficacy in the middle of a pandemic due to a novel respiratory virus. Over the month since the review’s publication, its many problems and deficiencies (as well as how it has been unrelentingly misinterpreted) have been discussed widely by a number of writers, academics, and bloggers . . .

My [Gorski’s] purpose in writing about this kerfuffle is not to rehash (much) why the Cochrane review was so problematic. Rather, it’s more to look at what this whole kerfuffle tells us about the Cochrane Collaborative and the evidence-based medicine (EBM) paradigm it champions. . . . I want to ask: What is it about Cochrane and EBM fundamentalists who promote the EBM paradigm as the be-all and end-all of medical evidence, even for questions for which it is ill-suited, that can produce misleading results? . . .

Back in the day, we used to call EBM’s failure to consider the low to nonexistent prior probability as assessed by basic science that magic like homeopathy could work its “blind spot.” Jefferson’s review, coupled with the behavior of EBM gurus like John Ioannidis during the pandemic, made me wonder if there’s another blind spot of EBM that we at SBM have neglected, one that leads to Cochrane reviews like Jefferson’s and leads EBM gurus like Ioannidis to make their heel turns so soon after the pandemic hit . . .

[Regarding the mask report,] perusing the triumphant gloating on social media from ideological sources opposed to COVID-19 interventions, including masks and vaccines, I was struck by how often they used the exact phrase “gold standard” to portray Cochrane as an indisputable source, all to bolster their misrepresentation. . . .

Gorski continues:

I’ve noticed over the last three years a tendency for scientists who were known primarily before the pandemic as strong advocates of evidence-based medicine (EBM), devolving into promoters of COVID-19 denial, antimask, anti-public health, and even antivaccine pseudoscience. Think Dr. John Ioannidis, whom I used to lionize before 2020. Think Dr. Vinay Prasad, of whose work on medical reversals and calls for more rigorous randomized clinical trials of chemotherapy and targeted therapy agents before FDA approval we generally wrote approvingly.

Basically, what Jefferson exhibited in his almost off-the-cuff claim that massive RCTs of masks should have been done while a deadly respiratory virus was flooding UK hospitals was something we like to call “methodolatry,” or the obscene worship of the RCT as the only method of clinical investigation. . . .

But it’s not so simple:

Human trials are messy. It is impossible to make them rigorous in ways that are comparable to laboratory experiments. Compared to laboratory investigations, clinical trials are necessarily less powered and more prone to numerous other sources of error: biases, whether conscious or not, causing or resulting from non-comparable experimental and control groups, cuing of subjects, post-hoc analyses, multiple testing artifacts, unrecognized confounding of data due to subjects’ own motivations, non-publication of results, inappropriate statistical analyses, conclusions that don’t follow from the data, inappropriate pooling of non-significant data from several, small studies to produce an aggregate that appears statistically significant, fraud, and more.

Evidence-based medicine eats itself

For some background on the controversies surrounding “evidence-based medicine,” see this news article from Aaron Carroll from 2017.

Here’s how I summarized things back in 2020, in my post entitled “Evidence-based medicine eats itself”:

There are three commonly stated principles of evidence-based research:

1. Reliance when possible on statistically significant results from randomized trials;

2. Balancing of costs, benefits, and uncertainties in decision making;

3. Treatments targeted to individuals or subsets of the population.

Unfortunately and paradoxically, the use of statistics for hypothesis testing can get in the way of the movement toward an evidence-based framework for policy analysis. This claim may come as a surprise, given that one of the meanings of evidence-based analysis is hypothesis testing based on randomized trials. The problem is that principle (1) above is in some conflict with principles (2) and (3).

The conflict with (2) is that statistical significance or non-significance is typically used at all levels to replace uncertainty with certainty—indeed, researchers are encouraged to do this and it is standard practice.

The conflict with (3) is that estimating effects for individuals or population subsets is difficult. A quick calculation finds that it takes 16 times the sample size to estimate an interaction as it does to estimate a main effect, and given that we are lucky if our studies are powered well enough to estimate main effects of interest, it will typically be hopeless to try to obtain near-certainty regarding interactions. That is fine if we remember principle (2), but not so fine if our experiences with classical statistics have trained us to demand statistical significance as a prerequisite for publication and decision making.
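
The factor of 16 can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions that are not spelled out in the passage above: a balanced design with equal-variance outcomes, an interaction defined as a difference of two differences, and a true interaction half the size of the main effect. The function names are hypothetical.

```python
import math


def se_main_effect(sigma, n):
    # Main effect: difference between two group means, n/2 subjects in each group.
    return sigma * math.sqrt(1 / (n / 2) + 1 / (n / 2))  # = 2 * sigma / sqrt(n)


def se_interaction(sigma, n):
    # Interaction: difference of two differences, built from four cells of n/4 each.
    return sigma * math.sqrt(4 * (1 / (n / 4)))  # = 4 * sigma / sqrt(n)


def sample_size_multiplier(effect_ratio=0.5):
    """Factor by which n must grow so that the interaction is estimated with the
    same signal-to-noise ratio as the main effect.

    effect_ratio is the assumed size of the interaction relative to the
    main effect (0.5 is an assumption, not something the data tell us).
    """
    sigma, n = 1.0, 1000.0  # arbitrary values; they cancel in the ratio
    se_ratio = se_interaction(sigma, n) / se_main_effect(sigma, n)
    # Standard errors scale as 1/sqrt(n), so the required n scales as the square.
    return (se_ratio / effect_ratio) ** 2


print(round(sample_size_multiplier(), 6))
print(round(sample_size_multiplier(effect_ratio=1.0), 6))
```

Under these assumptions the standard error of the interaction is twice that of the main effect, which alone costs a factor of 4 in sample size; an interaction half as large as the main effect costs another factor of 4. If the interaction were as large as the main effect, the multiplier would drop to 4.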

Bridges needed

The above-linked Gorski post was interesting to me because it presents a completely different criticism of the evidence-based-medicine paradigm.

It’s not that controlled trials are bad; rather, the deeper problems seem to be: (a) inferential summaries and decision strategies that don’t respect uncertainty (that was my concern) and (b) research agendas that don’t engage with scientific understanding (that was Gorski’s concern).

Regarding that latter point: a problem with standard “evidence-based medicine” or what I’ve called the “take a pill, push a button model of science” is not that it ignores scientific theories, but rather that it features a gap between theory and evidence. On one side there are theory-stories of varying levels of plausibility; on the other side there are statistical summaries from (necessarily) imperfect studies.

What we need are bridges between theory and evidence. This includes sharper theories that make quantitative predictions that can be experimentally studied, and empirical studies measuring intermediate outcomes, and lab experiments to go along with the field studies.

57 thoughts on ““Evidence-based medicine”: does it lead to people turning off their brains?”

  1. Tom Jefferson himself has been problematic about vaccines

    And what method was used to convince this author whatever it is they believe about vaccines? I bet at the root is EBM, which is in turn based on NHST.

    This is what NHST does: it generates an endless series of conflicting results that never change anyone’s mind. It only measures a wealth/power weighted collective belief.

    It is why people have been forced to drop their standards so low that they can actually believe abusing the frail in nursing homes and hospitals by scaring them and keeping them from family/friends is a valid medical intervention. There is “no evidence” that elder abuse kills people.

    Or that more reported cases and all cause mortality after a massive vaccination program is “success”.

    This includes sharper theories that make quantitative predictions that can be experimentally studied, and empirical studies measuring intermediate outcomes, and lab experiments to go along with the field studies.

    Yes. And regarding the masks, we saw from lab studies that they redirect the airflow into a cloud that gathers around the person’s head while normally it is directed towards the ground. So we would predict they decrease the exposure from, e.g., a standing waiter talking to a sitting customer. But vastly increase it in people standing in line who then step forward to breathe the air just exhaled by the person in front of them. To make it quantitative we’d have to measure the air currents and so on.

    • “…So we would predict they decrease the exposure from eg, a standing waiter talking to a sitting customer. But vastly increase it in people standing in line who then step forward to breathe the air just exhaled by the person in front of them. To make it quantitative we’d have to measure the air currents and so on.”

      You seem to have forgotten to take into account the filtration effectiveness of the mask! I can’t say I’m surprised.

  2. EBP is used as a big carpet you can sweep anything you want under. It is, in many cases, also used to cover one’s ‘assets’: just show that something has been shown to be ‘EBP’. The fact that the medical-industrial complex is producing various PhD degrees at an industrial scale certainly doesn’t help.
    https://www.statista.com/statistics/185353/number-of-doctoral-degrees-by-field-of-research/
    Basically, they keep inventing advanced degrees and giving out diplomas to each other. Most of them are really diploma mills (hey, but there’s a ‘shortage’ of professionals, so have to keep cranking them out). Can’t blame them though. Health business has more money than a few gods combined.

  3. … use the term “Evidence Based Science” instead of “evidence-based medicine” and the basic Knowledge issue becomes more clear. The Scientific Method works in MEDICAL research too.

    We actually seek factual knowledge on effective medical treatments, but usually settle for subjective theories of medical efficacy.

    #2 above of the ‘three commonly stated principles of evidence-based research’ is false.

    Cost/Benefit Analysis is always ultimately a subjective judgment based on variable human values.

  4. I suggest Gorski’s concern that medical researchers don’t engage with current scientific understanding in a field is at least partly behind why they lean on NHST. For sure their training doesn’t help, but they are also filling a gap.

    If you are trained to determine what research question is interesting, and what are the gaps in knowledge entirely by reading Cochrane reviews and meta-analyses, then you can tend to end up relying on p-values to determine what you do.

    Whereas if you are a scientist (perhaps in contrast to a physician or physician-scientist) who has spent undergrad, postgrad, postdoc, and who-knows-how-many years of faculty immersed in an often narrow and specialised field without having to worry about patients, you have a much deeper understanding and feel for what bits of the science are important, what work is right and what work is suspect, and you have a sense for what the big questions are and where the field is moving.

    Maybe I’ll be lambasted for saying it, but I don’t think physicians really get scientific training!

    • Physicians are absolutely not scientists, nor do they get much in the way of scientific training (i.e., design of experiments, technique in mouse genetics or genome sequencing or western blots or PCR or regression analysis of data, etc.).

      An analogy might be between say a physicist / aeronautical engineer (biologist or medical researcher) and an aircraft maintenance technician (clinical doctor).

      Of course some doctors get PhDs and do research, but most doctors are busy learning how to diagnose and treat patients. Like aircraft technicians learn how to diagnose and fix broken aircraft systems.

  5. Gorski also writes (at too great length) for a blog called Science-Based Medicine, named to contrast with evidence-based medicine. He wants to ask: “What is it about Cochrane and EBM fundamentalists who promote the EBM paradigm as the be-all and end-all of medical evidence, even for questions for which it is ill-suited, that can produce misleading results?” Well, they are human. They fell in love with something they developed that works a lot of the time, and they want it to work all of the time. Seems pretty ordinary.

  6. This is such a timely post to find this weekend!

    My current research is focused on developing a model of research synthesis that respects uncertainty, decision-making, and the complexities of the world around us.

    A few thoughts others may find useful:

    1. This post reminds me of your prior blog post from 2018: The gap between 1,2, and 3, is too large
    “What we really need are bridges between the following three things:

    1. Qualitative research, could be N=1 or could be larger N, but the point is to really understand what’s going on in individual cases.

    2. Quantitative research with careful measurement, within-person comparisons, and large N.

    3. The real world. Whatever people are doing when they’re not doing research.

    The gaps between 1, 2, and 3 are just too large.”

    Some applied fields like education and HCI are more likely to bridge the three worlds you mentioned in 2018. Of course, my experience as a junior researcher taught me that due to educational limitations, the constellation of studies that PIs trace in their research programs could be even more varied.

    I’m currently cataloging research methods to weave together into my empirical research (content analyses -> self-experimentation -> interviews and surveys -> small experiments).

    2. The alternative to EBM and meta-analyses seems to be realist reviews/evaluations often used in nursing, education, and program evaluation more broadly. This synthesis method emphasizes contexts, mechanisms, and a range of outcomes (intermediary and distant/final). A closely related idea is Cartwright’s theory of process change (often used in the program evaluation field to avoid overgeneralization from studies). Realist reviews take in a wide variety of inputs (qualitative, quantitative, and mixed methods studies), construct the sort of theory that is contextualized, explanatory, and practical. It seems far more open-minded at each level (inputs, processing/synthesis, and outputs). Note: Chris Blunt’s blog and scholarship are a good place to start if you want to learn more: https://cjblunt.com/category/evidence-based-medicine/hierarchies-of-evidence/

    3. It seemed as though earlier when I began this thread of research into synthesis systems like meta-analyses, realist reviews, and mixed methods reviews, the term “evidence” itself threw me off. Colloquially, evidence is used to signify that we are interested in rational and skeptical argumentation as opposed to ideological advocacy. It became easier for me to critique evidence-based practice once I stopped focusing on the distracting use of “evidence.”

  7. There are sooo many problems with this screed – I only got 1/2 way through before I had to go for a walk.
    Normally I’d suggest people read it for themselves but that would contradict the Hippocratic principle of non-maleficence.

    What I would recommend is a read of Carl Heneghan’s substack “Trust the Evidence”. Carl is the co-author of the Cochrane Mask review and over numerous (but short) posts describes the whole saga (and he’s got receipts).

    But onto the problems with Gorski’s post:
    1. He seems to have some sort of beef with Tom Jefferson and spends a fair whack of the post denigrating him (that is personally, not his work). Strangely, Zeynep Tufekci (who BTW is a sociologist apparently without any knowledge of evidence appraisal), does the same thing.
    2. He claims that the Cochrane Collaboration was “forced to walk back” the conclusions of the review. This is not true – the review is still up at the Cochrane website unaltered. The Cochrane editors did release a statement rather critical of the language of the review’s conclusion but that’s all.
    3. Gorski states that Tom Jefferson has a history of being “problematic about vax”. If calling for good evidence about mass-administered treatments such as the influenza vaccine is “problematic” then we’re in deep trouble.
    4. He criticises the performance of RCTs on alternative treatments such as homeopathy and intercessory prayer on the grounds that it’s impossible that they could work. Fine, his prior is very skeptical (so is mine), but that is not an argument against studying these things. He then laments that these RCTs were equivocal. Well, if your prior is skeptical and the evidence is equivocal then your posterior remains skeptical. I don’t think he understands Bayes.
    5. He gets the meaning of a p-value wrong (sigh, of course).
    6. One of the conclusions of the Cochrane authors was that the mask evidence was very weak and that more effort should have been put in to performing RCTs during the pandemic to improve this. For some reason this sends Gorski into a fit of rage. He says that it would have been too difficult to do during a pandemic. Well, it is pretty hard to perform a trial of an anti-pandemic measure unless you are in the middle of a pandemic.

    I could go on, but I need another walk.

    (PS The contrast between the sober Trust the Evidence substack and Gorski’s is striking)

  8. > “What we need are bridges between theory and evidence. This includes sharper theories that make quantitative predictions that can be experimentally studied, and empirical studies measuring intermediate outcomes, and lab experiments to go along with the field studies.”

    I keep going back and forth between strongly agreeing and strongly disagreeing with this statement. Isn’t the EBM movement at least partly a response to medicine’s tendency to elevate common-sense ideas, often based on a siloed view of one aspect of a biological system, to something like scientific truth? Okay, maybe common sense ideas (eat less fat to be thin!) don’t qualify as “sharper theories that make quantitative predictions,” but I think sharp theories are hard to come by in complex biological systems where the downfall of an idea could turn out to be a system you had no idea was even relevant.

  9. The critical issue for me here wasn’t so much the contrast of EBM vs. SBM or the very real problems with NHST or the frequent over-playing of RCTs.

    The critical issue for me in this whole situation was the politicization and bastardization of scientific principles for use as cheap ideological, identity-based rhetoric.

    Even before the material subsequently published by Cochrane, it should have been, imo, obvious to any serious observer how the Cochrane Review was being played by the likes of Prasad and Jefferson and the many, many people who argued as if the report was conclusive that masks don’t work. Their statements about the study that directly contradicted the wording of the review itself were inexcusable – and mind boggling in Jefferson’s case given that he’s one of the freakin’ authors. It really isn’t much better with Prasad, IMO, as he’s often demonstrated that along with crap advocacy during the COVID era, he’s a sharp contributor to science.

    That’s not to suggest that the crap has been spread around only on their side of the mask debate or the other issues.

    Anyway, although I thought Gorski’s take was interesting I don’t have a problem with EBM, NHST, RCTs per se. They’re tools. They’re information. The problems are with how they’re leveraged to promote crap.

  10. 100% agree with this:

    > The conflict with (2) is that statistical significance or non-significance is typically used at all levels to replace uncertainty with certainty—indeed, researchers are encouraged to do this and it is standard practice.

    There’s a sort of emotional maturity that’s needed to cope with high levels of uncertainty, especially in medicine. A simple “I don’t know” has an almost callous ring to it if you imagine a doctor saying it to a patient trying to make a high-stakes choice. People in that situation need to feel like, win or lose, they’ve made the best decision they could, given the available information. The problem is when experts (and “experts”) crack under the pressure and start throwing around terms like “gold standard” to hide from uncertainty. A statement like, “there’s a lot we don’t know, but the currently available evidence appears to lean in favor of course A” would still give people what they need to feel like they’ve made the best decision they could.

    FWIW, in my personal experiences with medical doctors, at least in the past 10 or so years, I feel like I’ve seen good acknowledgements of uncertainty about half the time. Not ideal, but certainly not nothing, so this messaging must be getting through somewhere in the clinical training pipeline.

    Moving up through the levels from practicing MDs to AMA recommendations, big-science research labs, and funding agencies, I can only imagine that the pressure to paint over uncertainty with gold standards (or whatever) must be pretty great.

    I probably should have posted this under Anoneuoid’s top post because I think technical shortcomings and ubiquitous misuses of NHST are beside the point. This is first and foremost a problem about humans, not statistics.

  11. My reading is that EBM is merely a list of hopeful desiderata, e.g., “Let’s use RCTs if possible; if not, let’s use all the evidence we have”. Unfortunately we are not told how to combine pieces of evidence, say experimental and observational, or several RCTs run under different conditions.
    A theory for such combinations has been developed, but it hasn’t penetrated the EBM literature.
    See “Challenging the Hegemony of Randomized Controlled Trials: Comments on Deaton and Cartwright,” https://ucla.in/2JfdQcv

    And on Data-Fusion: https://ucla.in/2Jc1kdD

    • Judea:

      The statistical method of meta-analysis, which is exactly an approach for combining evidence from multiple sources, is well known in evidence-based medicine, so I disagree with your statement, “A theory for such combinations [of pieces of evidence] has been developed, but it hasn’t penetrated the EBM literature.” Unfortunately, there are various issues of data quality that can cause problems with any method of combining information; see for example the examples discussed here and here.

      Also relevant is the paper on Facilitating Reasoning with Epistemic Uncertainty in Meta-analysis, by Alex Kale, Sarah Lee, Terrance Goan, Elizabeth Tipton, and Jessica Hullman, discussed here by Hullman.

      • Andrew,
        From what I’ve read thus far, the “statistical method of meta-analysis” neglects the heterogeneity of the data sources and, therefore, falls short of combining, say, RCT data with observational studies, as is done here https://ucla.in/2Jc1kdD. Moreover, the task of combining heterogeneous data sources is a causal, not statistical endeavor, for it requires causal modeling of the populations involved, see https://ucla.in/2N7S0K9.
        It is for this reason, I believe, that the problems of “external validity” and “selection bias” have not received adequate treatments in the EBM literature.
        I am eager to read the paper by Hullman et al but, based on the description, I do not see the word “causal” appearing, so I am doubtful.
        Would love to be corrected, if you think my reading is off.

        • Judea:

          No, the statistical method of meta-analysis (no quotation marks needed) does not neglect the heterogeneity of data sources, and it is often used to combine data from randomized trials with data from observational studies. I agree that these are causal questions; they are also statistical because the design and analysis depends on variation. It’s not causal or statistical; it’s causal and statistical. Some research in this area focuses on the causal aspect of the problem, and that’s fine; other research in this area focuses on statistical modeling and inference, and that’s fine too. Both parts are important.

      • Joshua,
        Both you and Andrew assure me that EBM experts routinely combine data from randomized trials with data from observational studies. I am skeptical, because (1) I haven’t seen it done in any of the papers I’ve read and (2) How can it be done if the latter is infected with confounding bias? Reason (2) is my main logical obstacle because, if it’s done after removing the bias, who needs the small RCT data? If it’s done w/o removing the bias, what can biased data tell us w/o contaminating the result? Can you allay my logical obstacle?

        • Judea –

          > Both you and Andrew assure me…

          Seems to be some confusion there, as I didn’t weigh in.

          I will say, however, that I have found Andrew to be a good source for understanding the methodology for conducting meta-analyses.

        • Judea:

          The literature on meta-analysis is huge. (I guess someone should meta-analyze it!) In this literature, there’s much discussion of biases and difficulties with generalization, both for individual studies (because of discrepancies between sample and population, imbalance between treatment and control groups, and measurements that don’t line up to the underlying construct of interest) and in the analysis itself (choice of what studies to include in the meta-analysis, along with selection in what gets studied in the first place); see for example this 1987 paper, “Quantitative methods in the review of epidemiologic literature,” by Sander Greenland. Whether a study has randomized assignment is just one of many issues.

          Regarding the specific question of combining randomized and non-randomized trials in a meta-analysis, a quick google search turned up this 2007 paper, “Should meta-analyses of interventions include observational studies in addition to randomized controlled trials? A critical examination of underlying principles,” by Ian Shrier, Jean-François Boivin, Russell J. Steele, Robert W. Platt, Andrea Furlan, Ritsuko Kakuma, James Brophy, and Michel Rossignol. The point here is that a basic meta-analysis, combining data from different studies without trying to model the various sources of bias mentioned above, is just a starting point, and that’s the case whether or not the meta-analysis includes randomized trials as well as observational studies.

          Regarding your question: all the data are biased. It’s not a matter of “contaminating the result”; it’s just a general issue that we need to consider many sources of generalization. The errors involved in these generalizations can be modeled. The models aren’t perfect, no method is perfect, etc.

        • Andrew,
          Of course, “the literature on meta-analysis is huge.” Of course, “there’s much discussion of biases and difficulties with generalization.” Of course people like Ian Shrier et al. continue to ask:
          “Should meta-analyses of interventions include observational studies?”
          Of course people continue to demand again and again “that we need to consider many sources of generalization.”
          But, for heaven’s sake, with all these “discussions,” “difficulties,” and “demands,” shouldn’t we see all the meta-analysis experts celebrating with trumpets the fact that we can now answer these questions precisely, rather than leave them at the mercy of flimsy judgment?
          I hope we start seeing them celebrate.

    • I think that the idea of Evidence Based Medicine/Practice is not about doing meta analysis, but the argument against doing what you learned to do from whoever taught you 30 years ago or based on your gut instincts. Sure there are different levels of evidence and meta analysis is part of the big picture of understanding what to do when there are many studies. It’s important not just in medicine.

      • Elin:

        One of the strongest arguments in favor of formal meta-analysis, as with cost-benefit analysis and other controversial applications of quantitative methods, is that the alternative to formal meta-analysis is not “no meta-analysis,” it’s “informal meta-analysis.” Given that people will combine existing information somehow, there are advantages to making the procedure transparent.

        Separately, Judea is alluding to the difficulty of generalizing from existing data to new cases. This is indeed a big problem, which is addressed by multilevel modeling or, more generally, by statistical modeling of variation. Judea is talking about the challenges of combining experimental data with observational studies, but really that’s the least of it. Even if all your data are from controlled experiments, or if all your data are from observational studies, it can be a big deal to generalize to new cases. Assumptions are needed in any case.

        I realize that you know all this, so I guess this comment isn’t for you; it’s more of a followup on your comment which I hope will be helpful to others.

        • Yes that is exactly right. As you know the evidence based concept is used in social work, criminal justice, education and other areas beyond medicine. Once you start telling people–front line decision makers like actual doctors or high school math teachers– that they should use evidence based practices you risk them going off and using the first article they find that confirms their preconceptions or is kinda interesting/cool or on NPR (usual cast of characters). And if you get them to read more seriously and critically they still have to make practical decisions often about generalizing to the n=1 patient in front of them. Which is potentially a different kind of decision making than the CDC making recommendations or an insurance company deciding what treatments they should cover.

  12. I am surprised that I have not heard somebody talking about internal and external validity. Certainly masks _should_ work, but in situations where there is no virus there will be no benefit, and it seems at least possible that in situations where there is a great deal of virus present, that even very careful mask-wearers will eventually get infected. Indeed I believe I have seen a mathematical model to this effect as one of the many papers that appeared during covid (along with a study that showed the reduction in aerosol exported from mask-wearing was less than the (very considerable) difference in aerosol produced by talking vs not talking). So it is at least possible that many of the participants of RCTs happened to experience an environment in which there was not an obvious benefit from mask-wearing.

    • A. G. –

      > So it is at least possible that many of the participants of RCTs happened to experience an environment in which there was not an obvious benefit from mask-wearing

      I agree that seems plausible. Just as it seems (to me) plausible that they might have a benefit. Particularly if worn properly, in certain conditions, particularly if we include source control for delivering reduced viral load even if aerosolized particles lead to transmission, as a potential benefit.

      The problem, as the Cochrane Review made explicit (and people like Jefferson and Prasad pretty much ignored) is that beyond theoretical speculation about what might be plausible, the RCT evidence collected was pretty uninformative, although not completely useless for helping to assess probabilities broadly. (E.g., if not always worn correctly or not worn with high compliance pretty much all the time, they aren’t wildly effective. But we prolly could have guessed that anyway).

      • Joshua writes: “(E.g., if not always worn correctly or not worn with high compliance pretty much all the time, they aren’t wildly effective. But we prolly could have guessed that anyway).”

        It was my understanding that this was the problem with the underlying trials: folks would take off their masks during breaks and the like.

        There was a big trial in which masks were worn correctly. Here in Japan, everyone wore masks, pretty much all the time, and pretty much correctly. Covid deaths were SIX TIMES LOWER in Japan than in the US and Europe. And that’s despite Japan’s famous commuter train crowding. (I was expecting Japan to be a cl@sterf@ck because of said crowding and I was wrong (all it takes is leaving the windows open). Still, when I see a crowded train, I look for the least crowded car and take the least crowded car in the next train. Since I can and do avoid peak rush, what’s happening is that the trains are coming every 3 minutes, but a glitch happens, and a train gets delayed for, say, 6 minutes, bunching up trains behind it. The Japanese, being hyper-busy and hyperactive, all crush into the first train that comes, and the next one’s empty. Yay!)

        By the way, masks are now optional here, and Japan has opened back up to tourists, so there are a plethora of unmasked idiots on the trains, but no medical facility in Japan allows anyone to enter without a mask. (It’s now hard to find statistics, but I did find the Covid death counts for this spring, and they seem to have largely fixed the problem that nursing homes had turned into death traps, so now the most deaths are in my age group (70-79). Ouch.)

        • Another interesting aspect of Japan’s relatively low COVID mortality rate is the age of the population – which is a huge risk factor, especially with a lot of old, old.

          That said, IMO it’s all too complicated to reverse engineer from mortality rates to infer the efficacy of masks, given all the many confounds in comparing cross-nationally.

        • Replying to Joshua:

          Ha! You picked a GREAT day to point out Japan’s aging population: It’s a national holiday here.

          What national holiday, you ask??? (Drum Roll Drum Roll): Respect for the Elderly day!!!

          Yay, Japan! As of today (well, this year actually) a full 10% of Japan’s population is over the age of 80. Sheesh. At 71 I’m still not particularly old at all.

          There’s actually been a slowing of the increase in the aging population (depending on how you graph the numbers) since 2020, not because of Covid, but because Japan’s population distribution is wacky. Unlike the baby boom in the West, Japan’s baby boom only lasted 3 years. The postwar economy was a disaster, and as soon as abortion became legal, people stopped having the kids they couldn’t afford to have. So the 1945-1948 or so age cohort is bulging up the 78-81 age group, and the “baby-boom junior” (the period that those folks started having kids) is another pulse in the “violin graph” of the population. Also, there’s a year in the Chinese 60-year cycle that it’s real bad luck to be born in, and there’s a monster drop in the number of births for that year. But that brief and small “baby boom” means that there’s an “aging aging” phenomenon, in which the average age of the elderly is aging (increasing) since there’s a brief reduction in the number of new elderly joining the pool. At least until the baby-boom junior cohort hits 65.

          I dunno about the “too complicated” bit, though. Japan masked and vaxed and didn’t die quite as badly as countries dense of anti-vax anti-mask idiots. Seems pretty clear to me. (That’s not to say Japan had it easy; it’s been a rough 3 years and isn’t over yet. And Japan could have done even better than it did: the government really messed up, and still is messing up, in a variety of ways (IMHO, of course).)

          But the question remains as to how to persuade the population that masking, vaxing, and avoiding crowded interiors is important. That was easier in Japan (it happened almost despite the government) than it was, or will be, in other countries.

        • David –

          > I dunno about the “too complicated” bit, though. Japan masked and vaxed and didn’t die quite as badly as countries dense of anti-vax anti-mask idiots.

          Sure – but you’re mixing in vaxes there, which would complicate the question of masking as causal.

          > But the question remains as to how to persuade the population that masking, vaxing, and avoiding crowded interiors is important. That was easier in Japan (it happened almost despite the government) than it was, or will be, in other countries.

          My bias is that once sorting out all the confounds, one thing that will be left behind as causal is a lack of focus on any (putative) zero sum sacrifice of individual freedom for the sake of behaving in ways that are collectively aligned with the larger community – as would be the case in Japan compared to the US. And that’s part of the reason why I see very limited utility in comparing across countries if you haven’t controlled for that element. Comparing COVID policies between Asian (Confucian-rooted) countries and the US, or comparing the US to a country like Sweden, where people have a strong social safety net and a strong level of trust in public health officials – seems to me to offer little insight into the impact of COVID policies in and of themselves.

    • For several months I’ve been thinking about doing a blog post about the issues of determining mask efficacy or even defining what we mean by that, but I thought it would take a lot of work to make it a good post and I’ve been too lazy to do the work.

      If you wear a mask that captures 100% of the infectious agent so that you don’t inhale any of it, then you won’t get an airborne infection. We know that without any data.

      If you wear a mask that captures x% of the infectious agent, then your probability of being infected is reduced compared to being unmasked…but (a) the magnitude of the reduction depends on the concentration of the agent, and on the duration you are in the environment, and of course on the agent itself (the answer would be different for flu and covid etc.) and (b) even for a given concentration and a given duration we don’t know the magnitude of the effect. There could and should be a lot more research in this area in general: get a bunch of volunteers, expose them to different known concentrations of an infectious agent, see how many get infected; do this with people who are and aren’t wearing different kinds of masks, for different durations, etc. etc. Yeah, sure, it would be time-consuming and expensive, but even in a non-pandemic year tens of thousands of people die from respiratory infections, and zillions of person-hours are lost to illness, and people’s quality of life is harmed. Surely some systematic research on this would be worthwhile.
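
To make point (a) concrete, here is a minimal sketch that assumes a simple exponential dose-response curve, P = 1 - exp(-dose / k). That functional form, the scale parameter `k`, and all the numbers are illustrative assumptions rather than measured values for any real pathogen; point (b) is exactly that we don't know the real curve.

```python
import math


def infection_probability(concentration, minutes, mask_capture=0.0, k=100.0):
    """Assumed exponential dose-response: P = 1 - exp(-dose / k).

    concentration: infectious particles inhaled per minute when unmasked
                   (hypothetical units)
    minutes:       time spent in the environment
    mask_capture:  fraction of the agent the mask captures (the "x%", as 0..1)
    k:             hypothetical scale parameter, not an estimate for any virus
    """
    dose = concentration * minutes * (1.0 - mask_capture)
    return 1.0 - math.exp(-dose / k)


# Same mask (80% capture, an arbitrary choice), same duration, three
# very different concentrations of the agent.
for concentration in (0.01, 1.0, 100.0):
    unmasked = infection_probability(concentration, minutes=60)
    masked = infection_probability(concentration, minutes=60, mask_capture=0.8)
    print(f"concentration {concentration:>6}: "
          f"unmasked {unmasked:.4f}, masked {masked:.4f}, "
          f"absolute reduction {unmasked - masked:.4f}")
```

Under these assumptions the absolute reduction in risk is small when there is almost no agent around (hardly anyone gets infected either way) and small again when there is a great deal of it (nearly everyone gets infected either way), with the largest benefit in between.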

      But even if we knew the effectiveness of masks, against COVID, for experimental conditions as described in the paragraph above — and I reiterate that I think such research should be done — we would still be a long way from knowing the real-world effectiveness of masks in the pandemic. At any given time, some locations will have no airborne virus and others will have a lot. Any individual moves around, generating virus (or not) that remains airborne for some amount of time, and inhaling some amount of virus (mediated by the mask), while other people move hither and thither doing the same thing.

      And even if we somehow knew the effectiveness of masks of a given type in the range of real-world environments, that wouldn’t tell us what we want to know. People wear all manner of masks, or no masks at all. Some people have facial hair that reduces mask effectiveness, some people let the mask dip down below their nose, some people wear knock-off masks, etc. etc. And everyone has to take the mask down to eat and drink, and some people do it to talk if they aren’t being understood. The range of people and masks and behaviors is really wide.

      And so on.

      Putting it all together…it’s complicated.

      That said, I think that it’s borderline ridiculous to suggest that masks don’t help reduce the risk that the wearer will be infected or will infect other people, all else equal. This isn’t 1800, we know about germs and viruses, we aren’t making this stuff up.

      Of course, all else isn’t necessarily equal. If you were going to have your groceries delivered, but you decide that you’ll be safe if you wear your mask to the grocery store, then your infection risk is higher because of the mask. I’d be surprised if the net health effect, society-wide, is not positive (i.e. fewer infections) due to some people wearing masks, but it’s not impossible.

      • Good comment Phil. The biggest variation in designed filtration mask effectiveness (ie. N95/KN95/KF94/respirators) is in terms of the leakage at the seal with your face. This is particularly more variable for adult males due to facial hair, which can literally change hourly as it grows out. If I have to go somewhere like my allergist or to shop at a supermarket I still shave first so I can wear a KN95. I do this to avoid virus still but also because I have pollen allergies so just going out in general at certain times of the year is unpleasant without a mask.

        When it comes to pollen, masks **dramatically** reduce the effect. I can go walk my dog on a very high allergy day with a KN95 and actually feel better after coming back than before I left (I get some exercise, but basically wearing the mask outdoors can expose me to less pollen than not wearing a mask indoors depending on how windy it is and leaky my house is).

        My wife has colleagues at the Dental school who said that throughout the pandemic the dental office saw patients while wearing N95 masks and literally *no one* had a transmission event that was linked to the dental office (as opposed to say family to family). So we have LOTS of evidence out there that places that 100% wore masks had very good success. It’s just not systematic evidence collected into a study.

        • I now use a mask when running the vacuum cleaner, and I sneeze much less than without. And yes on those super pollen-heavy days they make a difference too.

      • I was going to respond with something similar to what Daniel said. I reckon (but don’t know for sure) that there already have been quite a few studies regarding masks and filtration effectiveness for pathogens. Just think of the PPE required to enter hospital rooms of patients with airborne precautions like TB. Healthcare workers must be custom fitted with masks. These do the job well if there is appropriate fit (i.e. seal around the mask with your face).

        So there seems little doubt that, under the necessary assumptions that the correct type of mask is chosen, fitted properly, and worn correctly and at all times, the risk of inhaling the pathogen will be reduced or effectively be eliminated.

        However, as you hint at, whether or not mask mandates have any net benefit to society in terms of reducing transmission seems far more dubious.

        • Jd:

          I think part of the difficulty here is the use of expressions such as “mask mandates.” During 2020-2022, mask requirements were instituted by various governments, businesses, schools, etc. There wasn’t a general “mask mandate”; it was a mix of policies implemented by different private and public organizations. I agree with your general point that it’s not clear where there was benefit to society; I guess this depends, as some policies made more sense than others.

        • Yes. I agree.

          I should have fleshed out a bit more what I meant. Take a counterfactual scenario where the only “mask mandate” implemented anywhere in the US during 2020-2022 would have been to wear a mask at all places outside the home. Given the conditions required (proper mask, correct fit, no facial hair, worn at all times, correct don and doff procedure, etc) for PPE such as a mask to work, it would still be quite dubious whether any net benefit would be seen by such a mandate. Telling people to ‘wear’ a ‘mask’ apparently means many things to many people. Anecdotally, I saw many people wearing masks down on their chin, hanging from one ear, or with nose out, as if this were the middle ages and masks were some sort of amulet to ward off the disease (and I’ve seen this after any kind of requirements expired, so it wasn’t just some sort of ‘hey, look I’m wearing one but not really’ sort of attitude!). But even when attempting to wear one correctly, it must be correctly fitted with proper seal. This is difficult without access to a variety of sizes/shapes. I have no doubts about the efficacy of masks under proper use. I have strong doubts about the efficacy of masks as used by the general populace (i.e. policy/mandates/and such).

        • Jd:

          My favorite along these lines was when I was biking down the street and someone on the sidewalk screamed at me to wear a mask. The funny thing was, the person screaming was not wearing a mask himself!

        • Jd –

          I don’t know how well masks “work,” let alone mask mandates – but this should be seen imo, as a situation of (perhaps low probability but) high damage function risk, with the added condition of a potential benefit compounding from an individual risk scenario to the population level. Further, another factor is what (imo) seems to me to be a low downside risk.

          Adding to all of this is what seems to me to be growing evidence of a “dose-effect” where infection severity is likely proportional to the “viral load.”

      • I was thinking about this just earlier before reading your comment.

        Assessing the efficacy of masks must be really complicated.

        Say a mask filters out x% of viral material in some theoretical sense. But what does that mean in reality? Maybe the tiny particles it doesn’t filter out would never get to you because they’re attached to small water droplets that dry out and then the particles fall to the ground quickly. So then x% of viral particles filtered out means x% + y% of the particles filtered out that might infect you.

        Or maybe the smaller particles are more likely to stay aloft longer, and are thus more likely to get to you.

        But given that there’s a threshold of particles that make you “infected,” and there’s a “dose effect” whereby the amount of virus correlates to the severity of illness, maybe those little particles don’t add up to make you infected or, if infected, very sick.

        And of course there are all the context-specific variables like ambient air currents and the like.

        • Hmmm. Kind of messed that up since maybe the virus particles would be less likely to fall to the ground once the moisture evaporated.

  13. “What we need are bridges between theory and evidence. This includes sharper theories that make quantitative predictions that can be experimentally studied, and empirical studies measuring intermediate outcomes, and lab experiments to go along with the field studies.”

    Two comments on these much needed methodological bridges:

    1. The focus on numbers in contrast to information is prevalent; EBM is about information. An example is informed consent. Is a signature a surrogate to “understanding”? Usually not. Evidence based informed consent (EBIC) aims at solving this discrepancy. Some methods do: https://www.researchgate.net/publication/338307369_Production_of_Evidence-Based_Informed_Consent_EBIC_With_Meaning_Equivalence_Reusable_Learning_Objects_MERLO_An_Application_on_the_Clinical_Setting.

    2. The key value of EBM is in its ability to generalize findings (or not). Methods for doing this are much needed. For one such proposal see https://link.springer.com/article/10.1007/s11192-021-03914-1

  14. I’ve tired a bit of all these battling stats over whether masking is effective or not because one thing seems clear to me:
    Masks ARE a physical barrier to your hands whether or not they effectively block airborne particles!… getting a respiratory virus on your hands and then touching/rubbing your mouth or nose is possibly the most common (and unconscious) way of transference… well, try rubbing or picking your nose with a mask on! Goggles on your eyes (also frequently rubbed) would be further protection. Masks (and discouraged handshaking) thus likely also contributed to the lower incidence of seasonal flu last season or two.
    The problems with RCT and EBM are virtually insurmountable, but masks as a physical barrier to hand contact seems a no-brainer.

    • I always preferred fist bumps to handshakes. They just waste less time. But now due to covid I have come across multiple people who will refuse to do that because they think it’s an anti-covid measure.

      Anyone who reads my comments here could guess that is not the case, but anyway yet another example of an intervention yielding a 180 degree opposite effect due to feedbacks.

      And most people don’t care if other people wear a mask to “protect” themselves (besides that a popular heuristic for impending criminal behaviour has been lost). The issue is whether they stop you from unknowingly spreading a virus.

      And once again, that UK challenge study tells us the “mainstream” idea was totally wrong. Asymptomatic/presymptomatic transmission is rare:

      Two individuals emitted 86% of airborne virus, and the majority of airborne virus collected was released on 3 days. Individuals who reported the highest total symptom scores were not those who emitted most virus. Very few emissions occurred before the first reported symptom (7%) and hardly any before the first positive lateral flow antigen test (2%).

      […]

      No viral contamination was detected in the breath, air, or rooms of uninfected participants.

      https://pubmed.ncbi.nlm.nih.gov/37307844/

      Note that the earlier sister paper reported about half the “uninfected” participants also tested positive on PCR during the study. But whether or not masks work, there is 95%+ chance there is no virus to spread unless you are actively sick.

      Once again, this is *before* addressing whether the masks do anything beyond redirecting airflow.

      • >… there is 95%+ chance there is no virus to spread unless you are actively sick.

        Lol. It’s one study. And you use it, despite any inherent flaws and your constant focus on flaws in research and over-confidence in the reliability of research, to confidently assert absolute certainty in a highly precise evaluation.

        “Having symptoms” is inherently subjective. I would imagine in a study where people are self-reporting symptoms, there could be a biasing effect. And I could imagine myriad contextual factors that could play a role (a particular variant, the length of time of exposure, the distance from the asymptomatically infected, the ambient conditions, the “viral load” of the infected person, the immune status of the potentially infected, etc.).

        • It is the only study that actually uses people who are confirmed infected vs not. And the results have been slowly trickling out 2-3 years after people went wild with hysteria and bad assumptions. Think about how awful that is.

          For sure, someone else should directly replicate it. Then it should be done in an older age group.

          I mean using one positive PCR as confirmed infected included ~50% of people who just randomly got a positive. Then of people who were recently infected, over 50% of the time testing positive was when there was no viable virus left.

          So every other study categorized over half the people who were not currently infected as infected.

          I’ll go with one study that actually measures the independent variable over thousands that do so with so much error that it is wrong more than it is right.

          But indeed, there should be a huge response along the lines of “really, let’s double check that because it will really help us for the next pandemic”. But there isn’t, it is either silence or denial.

      • “Once again, this is *before* addressing whether the masks do anything beyond redirecting airflow.”

        Of course masks do something beyond redirecting airflow. Masks filter the air. Different masks behave differently as far as filtration at a given particle size. All masks have some ‘bypass’ — unfiltered air that goes around the mask. Facial hair and face shape and nose shape affect the amount of bypass. And so on. Nobody should think that in actual use an N95 mask removes 95% of airborne particles of [whatever size distribution the definition of an N95 mask includes]. But it’s even more absurd to suggest that they don’t do anything other than redirect airflow.

        • N95 material is defined to filter more than 95% of all particles larger than 0.3 microns (they filter essentially 100% of particles by maybe 3 microns, and definitely ~100% by the time you’re at pollen sized particles of 5-50 microns) so depending on what you’re trying to filter they absolutely can be much MORE than 95% effective at the material level.

          Of course that’s the *material*. Real world N95 masks (half facepiece) are rated for 10x reduction in harmful particles (so 90% filtration). In reality they filter more than this but this is the “reliable” filtration (ie. some point on the left tail of the distribution). Basically you can use them when exposed to up to 10x the “allowable concentration” of whatever contaminant, like lead paint or asbestos or silica dust or whatever.
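
As a rough illustration of the gap between the material and the mask on your face, here is a sketch of the arithmetic, assuming inhaled air splits into a filtered stream through the material and an unfiltered "bypass" stream around the seal. The 5% bypass figure is a made-up illustrative number, not a measured or rated value, and this is not how the 10x rating is actually determined.

```python
import math


def effective_filtration(material_efficiency, bypass_fraction):
    """Fraction of particles removed from inhaled air.

    material_efficiency: fraction removed from air that passes through the
                         filter material (e.g. 0.95)
    bypass_fraction:     fraction of inhaled air leaking around the seal
                         unfiltered (hypothetical; depends on fit, facial hair)
    """
    # Particles get in either through the leak or through the material.
    penetration = (bypass_fraction
                   + (1.0 - bypass_fraction) * (1.0 - material_efficiency))
    return 1.0 - penetration


def reduction_factor(material_efficiency, bypass_fraction):
    """How many times lower the inhaled concentration is than ambient."""
    penetration = 1.0 - effective_filtration(material_efficiency, bypass_fraction)
    return math.inf if penetration == 0 else 1.0 / penetration


for bypass in (0.0, 0.05, 0.20):
    print(f"bypass {bypass:.0%}: "
          f"filtration {effective_filtration(0.95, bypass):.1%}, "
          f"reduction {reduction_factor(0.95, bypass):.1f}x")
```

With 95% material and an assumed 5% bypass, the overall filtration works out to about 90%, which is roughly a 10x reduction; a leakier fit pulls the number down quickly no matter how good the material is.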

          Of course this is on *inhale*. The quantity of particles filtered on exhale is highly dependent on construction but some have a flapper valve designed to let air flow out freely and unfiltered. Those are going to do very little in terms of reducing viral exhale total, though they will redirect the air, and they may coalesce small particles into condensation, so rather than an aerosol you might wind up with more of large droplets that fall rapidly to the floor? But that’s just speculation on my part. Without the exhaust valve, the exhale air leaks out around the edges of the mask, that almost certainly filters larger droplets but probably allows plenty of aerosol…

          Good masks like KN95 (chinese) and N95 (US) and KF94 (Korean) are quite decent at filtering inhaled air. Much more variable at filtering exhalation.

  15. Cochrane reviews are great if you expect most of the relevant studies to be captured by RCTs published in medical journals. The Jefferson et al review appears to suffer from the same problem as any meta-analysis that unjustifiably restricts a search in ways that are convenient for the analyst (team) but problematic in substance. (In my current associate editor role,* the most common such restriction is to peer-reviewed journals without any substantive justification.)

    To almost anyone who has observed real people in real settings over the past few years, the more relevant questions focus precisely on the heterogeneity: where is masking more/less effective in preventing COVID transmission? where are people using better/worse masks with better/worse fit and consistency? And for goodness sake, how about some good qualitative studies? I am not an ethnographer but I desperately want to go up to everyone who voluntarily wears a blue-baggy type under the nose and ask, “Why? Why are you doing this? If you aren’t going to cover your nose, why use the mask at all?”

    * AE at Review of Educational Research (and my views are just mine, not those of the editorial team or the sponsoring learned society). Many of our authors do their best to include multiple types of studies, to the best of their skills/knowledge.

    • Sherman –

      Not that I think your comment is “wrong,” but I looked at that Cochrane Review as evidence. In that sense it was useful. But it certainly wasn’t conclusive, and in fact the authors were clear and explicit in stating that their review found that the RCT evidence was insufficient for drawing any firm conclusions.

      The problem, imo, is that “medical science communicators” or “content creators” who have substack memberships to sell ignored those caveats.

      Most remarkable was Tom Jefferson attracting attention by making comments in the media that effectively directly contradicted the explicit statements in the review that he co-authored.

      I mean that’s really quite remarkable. Say one thing in your peer-reviewed science and then go out making public statements that are largely in direct contradiction.

      What explains that?

  16. This might be of some interest, given Gorski’s mention of Ioannidis:

    John P.A. Ioannidis, “Evidence-based medicine has been hijacked: a report to David Sackett,” Journal of Clinical Epidemiology 73 (2016) 82-86 https://dx.doi.org/10.1016/j.jclinepi.2016.02.012
    Abstract: This is a confession building on a conversation with David Sackett in 2004 when I shared with him some personal adventures in evidence-based medicine (EBM), the movement that he had spearheaded. The narrative is expanded with what ensued in the subsequent 12 years. EBM has become far more recognized and adopted in many places, but not everywhere, for example, it never acquired much influence in the USA. As EBM became more influential, it was also hijacked to serve agendas different from what it originally aimed for. Influential randomized trials are largely done by and for the benefit of the industry. Meta-analyses and guidelines have become a factory, mostly also serving vested interests. National and federal research funds are funneled almost exclusively to research with little relevance to health outcomes. We have supported the growth of principal investigators who excel primarily as managers absorbing more money. Diagnosis and prognosis research and efforts to individualize treatment have fueled recurrent spurious promises. Risk factor epidemiology has excelled in salami-sliced data-dredged articles with gift authorship and has become adept to dictating policy from spurious evidence. Under market pressure, clinical medicine has been transformed to finance-based medicine. In many places, medicine and health care are wasting societal resources and becoming a threat to human well-being. Science denialism and quacks are also flourishing and leading more people astray in their life choices, including health. EBM still remains an unmet goal, worthy to be attained.

    On page 85, Ioannidis writes:

    “Several years ago, I decided not to practice medicine any longer. I might have caused more harm than good. I could not even think of remedying this by repeating training. Retraining on how medicine is practiced today might make me worse. In some settings, we are close or past the tipping point where medicine diminishes rather than improves well-being in our society. Some truly excellent and committed physicians certainly continue to make positive contributions to health, improve lives, and save lives. However, with 20% of GDP being spent on health and health care so inefficiently, with such limited evidence or with conflicted evidence, medicine and health care can become a major threat to health and well-being.”

    • “Meta-analyses and guidelines have become a factory” The unbearable irony given the ridiculous number of highly dubious articles Ioannidis has been churning out for years and years!
      Gorski also repeats this common suggestion that somehow Ioannidis’ reputation was intact prior to 2020 (as if his ‘transformation’ since then has been somehow shocking). However, he was widely regarded as a self-promoting charlatan well before that, at least amongst people with their eyes open or who had to actually engage with the detail of his work.
      Of course, he wrote some good and important statement articles amongst all that, but there’s a lot of terrible stuff, especially the vast majority of his empirical work. For example, his studies that he uses to back up his assertions that most systematic reviews are a complete waste of space. Of course, the conclusion is probably right, but if you look at his ‘methods’ and the lack of robustness he supposedly used to reach that conclusion it’s just staggering he got away with it for so long.

  17. Speaking of evidence based matters. Zeynep Tufekci is evidence that you don’t need to have any relevant scholarly merits to get an endowed chair in sociology at Princeton.

    • Gb:

      That’s a twitter-like comment, just expressing an attitude with nothing backing it up. We have plenty of space here in the blog comments. Why do you say that Tufekci has no relevant scholarly merits? Over the past two decades, she’s done work on important topics, and this work has been well cited. In recent years, her work has been more public facing work, which is a choice that scholars often make mid-career. So, solid work in the past and the potential for influence in the future. Also she used to teach at Duke and Columbia, so it’s not like hiring her is a big risk.

      You can make an academic case for (or against) just about anybody. Not everyone would want to hire Tufekci, just as not everyone would want to hire me! But it seems like a stretch to go from you not liking her work to you saying she has no relevant scholarly merits.

      I’m sensitive to this kind of attitude on your part because this is kinda what happened to me with the statistics department at the University of California, many years ago. It wasn’t enough for them to say they were uncomfortable with my work and preferred not to have me as a colleague; instead they went on the attack and wrote various flat-out false things about my research (I’d say they lied, but, frankly, some of them were pretty stupid and others were probably smart on a good day but this wasn’t their best day, so let’s just say they wrote false statements and maybe that was because they were confused and couldn’t bring themselves to actually check to see if those statements actually made sense). So I don’t like to see this kind of drive-by comment directed at others, either.
