How to think about “medical reversals”?

Bill Harris points to this press release, “Almost 400 medical practices found ineffective in analysis of 3,000 studies,” and asks:

The intent seems good; does the process seem good, too? For one thing, there is patient variation, and RCTs seem focused on medians or means. Right tails can be significant.

This seems related to the last email I sent you (“What if your side wins?”).

From the abstract of the research article, by Diana Herrera-Perez, Alyson Haslam, Tyler Crain, Jennifer Gill, Catherine Livingston, Victoria Kaestner, Michael Hayes, Dan Morgan, and Adam Cifu:

Through an analysis of more than 3000 randomized controlled trials (RCTs) published in three leading medical journals (the Journal of the American Medical Association, the Lancet, and the New England Journal of Medicine), we have identified 396 medical reversals.

I’m not sure what to think about this! I’m sympathetic to the aims and conclusions of this article, but I can see there can be problems with the details.

In particular, what qualifies as a “medical reversal”? From the linked article:

Low-value medical practices are medical practices that are either ineffective or that cost more than other options but only offer similar effectiveness . . . Medical reversals are a subset of low-value medical practices and are defined as practices that have been found, through randomized controlled trials, to be no better than a prior or lesser standard of care. . . .

The challenge comes in when making this judgment from data. I fear that pulling out conclusions from the published literature will lead to the judgment being made based on statistical significance, and that doesn’t seem quite right. On the other hand, you have to start somewhere, and there’s a big medical literature to look through: we wouldn’t want to abandon all that and start from scratch. So I’m not quite sure what to think.
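To make the worry concrete, here is a minimal sketch of how a significance cutoff can label a trial "negative" even when its estimate points toward benefit. The numbers are hypothetical, not taken from the article. The `classify_trial` rule is an assumed reading of "failed to show statistical superiority" (a two-sided p-value above 0.05 under a normal approximation), not the authors' documented procedure.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a z statistic, using a normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

def classify_trial(estimate, std_error, threshold=0.05):
    """Label a trial by a significance cutoff (an assumed reading of the rule)."""
    z = estimate / std_error
    p = two_sided_p(z)
    half_width = 1.96 * std_error  # approximate 95% interval half-width
    label = "positive" if (p < threshold and estimate > 0) else "negative"
    return {
        "p": p,
        "ci": (estimate - half_width, estimate + half_width),
        "label": label,
    }

# Hypothetical trial: estimated benefit of 3.0 units with standard error 2.0.
result = classify_trial(3.0, 2.0)
print(result)
```

In this made-up case the interval covers zero but also covers sizable benefits, so "negative" summarizes the uncertainty poorly. The data are compatible with no effect and with a meaningful one.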

15 thoughts on “How to think about ‘medical reversals’?”

  1. >I fear that pulling out conclusions from the published literature will lead to the judgment being made based on statistical significance, and that doesn’t seem quite right.

    Yes, it looks like that’s exactly what they did:

    “An article was considered positive if the trial met its primary endpoint and negative if it failed to meet the primary outcome or if the study measured a hard endpoint (quality of life, mortality, etc.) and failed to show statistical superiority over a prior or lesser standard of practice in the control arm.”

    But they don’t define “failed to meet the primary outcome” (I hate that terminology) or “failed to show statistical superiority.” I’d be surprised if this meant anything other than p > 0.05, though. So they are probably counting as “negative” lots of trials that actually did suggest benefit, just not strongly enough to get p < 0.05.

  2. OR “cost more than other options but only offer similar effectiveness”

    That’s hardly a reversal, in the sense readers of this blog would think of it. Sure, it might not be an appropriate treatment now, but that just means it’s more expensive, not that it doesn’t work.

    Besides, relative costs can change over time (e.g. if drugs go off patent, or have only one manufacturer who decides to gouge, or new technology allows easier synthesizing).

  3. I am not sympathetic with this type of work. Humans respond idiosyncratically to all types of medical treatment. It is one thing to declare a treatment effective generally or unsafe for general use, but one drug that is generally effective may be ineffective for a certain patient. A physician can run a real world experiment, and test the drug in a patient, and then see if the patient responds. If we take options off the table because we think a treatment is less effective or only as effective as a cheaper option for the population as a whole, then we are certainly going to diminish patient care. We already take drugs off the market because they are unsafe at the population level even though we know that they are perfectly safe and effective for some sub-populations. Maybe that’s the right thing to do, but restricting options further because of costs or relative effectiveness is a bridge too far.

  4. Well, the basic problem of reaching accurate conclusions from data analysis … extends to all fields of scientific inquiry.

    However, the medical profession enjoys an unusually high level of cultural respect & trust — that gives their research pronouncements & expertise more credibility with the public … beyond their actual level of demonstrated medical expertise and treatment success.

    Also, the direct medical accountability for ineffective, over-priced, and/or harmful treatments & diagnoses is difficult to assess … thus significantly insulating medical researchers/practitioners from an effective feedback loop.

    (as the folklore adages advise — Doctors bury their mistakes & Doctors get their fee whether their patients live or die)

  5. p-value as an appropriate/sole criterion is often castigated in this blog as being a low bar. However, when an expensive, highly invasive treatment compared to an inexpensive, noninvasive existing treatment is unable to pass this low bar, it likely follows that the expensive, highly invasive treatment is not worth it.
    See the 2015 book, “Ending Medical Reversal,” by Vinayak Prasad and Adam Cifu for a general discussion of 146 medical reversals.

    • >p-value as an appropriate/sole criterion is often castigated in this blog as being a low bar. However, when an expensive, highly invasive treatment compared to an inexpensive, noninvasive existing treatment is unable to pass this low bar, it likely follows that the expensive, highly invasive treatment is not worth it.

      It is just a matter of how much money is allocated to the idea… You are guaranteed to “get significance” if you spend enough money. This is a simple fact.

  6. The published data is a subset of all the studies, and there is a clear tendency to publish positive studies over equally well done negative studies. However, when a positive study is published in a big impact journal like the NEJM, a negative study becomes more important to publish. Thus precursor positive studies are an impetus to publish negative results that would be neglected otherwise, and this results in “reversals.”

  7. Where’s the “what if your side wins?” link supposed to go? It’s broken and the archives don’t contain any post of a similar title for 2019-10-11.

  8. Maybe the problem lies in wanting to interpret a trial result “standalone”?

    The standard practice in my field (medicine) is to require a formal presentation of trial methods, results, and analysis; however, the “discussion” part is largely unformalized.

    One could envision a *formal* discussion part, involving a formal retrieval of the relevant literature (with specified criteria), a formal “aggregation” (formal meta-analysis if possible) and a *joint* analysis of the available data. This may lead to a formal decision analysis (and, yes, this more or less entails the *explicit* choice of loss function(s)).

    • Some journals require a systematic review along with the trial report to put the new trial in the context of what was already known (Lancet does I think). Seems a good idea, and sometimes done well, although it’s extra work at the reporting stage.
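
The formal “aggregation” step that comment 8 envisions could be sketched, in its simplest form, as fixed-effect inverse-variance pooling. This is only one of several meta-analytic approaches, and the trial estimates and standard errors below are hypothetical. A real analysis would also need the specified retrieval criteria, a check for heterogeneity, and the explicit loss function the commenter mentions.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of independent trial estimates."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    # Each trial is weighted by its precision, 1 / SE^2.
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical prior trials plus a new one, all measuring the same effect.
estimates = [2.0, 4.0, 1.0]
std_errors = [1.0, 2.0, 1.5]
pooled, pooled_se = pool_fixed_effect(estimates, std_errors)
print(f"pooled estimate = {pooled:.2f}, standard error = {pooled_se:.2f}")
```

Pooling like this puts the new trial in the context of earlier ones instead of judging it alone, which is the point of the proposed formal discussion section.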
