Evidence-based medicine eats itself in real time

Robert Matthews writes:

This has just appeared in BMJ Evidence Based Medicine. It addresses the controversial question of whether lowering LDL using statins leads to reduced mortality and CVD rates.

The researchers pull together 35 published studies, and then assess the evidence of benefit – but say a meta-analysis is inappropriate, given the heterogeneity of the studies. Hmmm, OK, interesting… But what they do instead is astounding. They just tot up the number of studies that failed to reach statistical significance, and interpret each as a “negative” study – i.e., evidence that the LDL reduction had no effect. They ignore the point estimates completely.

They then find that only 1 of the 13 studies (8%) that achieved an LDL target showed a statistically significant benefit in mortality risk, and 5 of the 13 (38%) showed the same for CVD. They do the same for those studies that failed to reach the LDL target.

Incredibly, this leads them to conclude: “The negative results [sic] of numerous cholesterol lowering randomised controlled trials call into question the validity of using low density lipoprotein cholesterol as a surrogate target for the prevention of cardiovascular disease”.

The paper has several other major flaws (as others have pointed out). But surely the stand-out blunder is the “Absence of evidence/evidence of absence” fallacy. I cannot recall seeing a more shocking example in a “serious” medical journal. Whatever the paper does call into question, top of the list must be how this came to be published – seemingly without irony – in BMJ Evidence Based Medicine.

If nothing else, the paper might serve as useful teaching material.

P.S. For what it’s worth, even a simple re-analysis focusing solely on the point estimates produces a radically different outcome.
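To see why vote counting throws away information, here is a toy simulation with invented numbers (not the paper's data, and not Matthews's re-analysis): thirteen underpowered trials of the same modest effect, scored first by counting asterisks and then by simple inverse-variance pooling.

```python
# Toy illustration (invented numbers, not the paper's data): 13 underpowered
# trials of the same modest true effect. Vote counting sees mostly "negative"
# studies; pooling the point estimates recovers the effect.
import math
import random

random.seed(1)

TRUE_EFFECT = 0.15   # hypothetical benefit on some effect scale
SE = 0.10            # per-trial standard error (underpowered for 0.15)
N_TRIALS = 13

estimates = [random.gauss(TRUE_EFFECT, SE) for _ in range(N_TRIALS)]

# Vote counting: a trial counts as "positive" only if |z| > 1.96
significant = sum(1 for est in estimates if abs(est) / SE > 1.96)

# Fixed-effect (inverse-variance) pooling; with equal SEs this is just the
# mean of the point estimates, with standard error SE / sqrt(N_TRIALS)
pooled = sum(estimates) / N_TRIALS
pooled_z = pooled / (SE / math.sqrt(N_TRIALS))

print(f"'significant' trials: {significant} of {N_TRIALS}")
print(f"pooled z statistic: {pooled_z:.1f}")
```

With these settings each trial has only about one-in-three power, so most individual trials look "negative" even though the pooled estimate is decisive.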

BTW, for some reason, the authors have included studies whose mortality benefit is stated as “NR” (Not Reported) in their tables.

This also means the percentages behind their bar charts are incorrect (e.g., among the studies that met their LDL reduction targets, only 10, not 13, allow the mortality benefit to be calculated).

“British Medical Journal Evidence Based Medicine,” huh? Remember our earlier post, Evidence-based medicine eats itself.

Evidence Based Medicine is more than a slogan. It’s also a way to destroy information!

23 thoughts on “Evidence-based medicine eats itself in real time”

  1. Regarding the “Absence of evidence/evidence of absence” fallacy: while the point is well taken, I think it can be taken too far. Beyond the particular meta-analysis at hand, the practical outcome of a body of literature that consistently finds no effect (however you want to define effect) of a particular intervention should be to start thinking that the particular intervention has no effect, not that there is an absence of evidence of an effect. I understand the philosophical issues involved with that interpretation, but at some point I think most people (particularly when it comes to science) are rightfully philosophically pragmatic.

    • JFA,

      Doesn’t it depend strongly on the size and quality of the multiple studies? Taking it to an extreme for purpose of argument, if there are a dozen seriously underpowered studies failing to find evidence, doesn’t it matter that maybe ten of them were so small that they could only detect a huge effect? Or the quality argument, what if many of the published studies used noisy and/or invalid measures or had poor designs?

      I agree with your main point but to my understanding the whole purpose of meta-analysis is to extract information from the accumulated research in toto. Which can’t be done by reducing each study to the presence or absence of a single asterisk.

      I’ve seen analyses of epidemiological studies where some continuous quantity of medical interest is dichotomized and no information is published at all about the findings on the underlying continuous variable. Saying that a drug did not (significantly) decrease the number of people with hypertension is not the same as saying the drug does not decrease blood pressure. The whole significance-filter thing is like a more extreme version of that fallacy.
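      The underpowered-studies point can be made concrete with a quick power calculation for a two-sided, two-arm z-test (the effect size and alpha are illustrative assumptions, not values from any statin trial):

```python
# Power of a two-sided, two-arm z-test for a standardized mean difference.
# Illustrative numbers only: a "modest" effect of 0.2 SD at alpha = 0.05.
import math

def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power(effect_sd, n_per_arm, z_crit=1.96):
    se = math.sqrt(2.0 / n_per_arm)   # SE of the difference in means
    shift = effect_sd / se            # noncentrality
    return (1.0 - normal_cdf(z_crit - shift)) + normal_cdf(-z_crit - shift)

for n in (20, 100, 500):
    print(f"n per arm = {n:3d}: power = {power(0.2, n):.2f}")
```

      At n = 20 per arm the power is under 10%, so a run of small “negative” studies is exactly what a real but modest effect would produce.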

    • What Brent said. It could well be that the substantive conclusion of no real effect is correct. The way to arrive at such a conclusion is to analyze all the data, not to throw away most of the data in an intermediate processing step.

    • My problem with the meta-analysis in the paper is that it does not go far enough. Instead, the “body of literature” should be regarded as consisting of thousands of N=1 studies, each of which fails to achieve statistical significance. This would strengthen their conclusion.

  2. Is any of this sort of similar to the recent approval of a drug for treating Alzheimer’s disease? The disease is associated with plaque formation and the drug reduces the plaque but has little or no effect on the disease itself.
    Another instance of looking at the wrong metric is five-year survival rate instead of mortality. Rudy Giuliani is famous for confusing the two when he touted the (free market) U.S. treatment of prostate cancer compared to other (i.e., socialist) countries. His fame has increased since then.

    • Yes, the FDA approved aducanumab based on its biomarker effect. Same situation as how the statins gained approval 20+ years ago. We can hope that amyloid plaque removal is similar to LDL lowering. Time will tell.

      • Perhaps today’s FDA approves drugs based on surrogate endpoints, but this was not the case for statins. For example, 4S, showing that simvastatin statistically significantly reduced CV and total mortality, was published in 1994, and FDA approval followed on 31/3/1998.

  3. Happens in psycholinguistics and psychology on a daily basis. Examples (top psych journal; a well-known series from a well-known publisher):

    1. “In our review, only 11 of the 22 studies that tested for the interaction indicative of the [effect of interest] found a significant effect.”

    2. “A number of studies have tested whether the parser considers [effect of interest] upon encountering …. Most results converge on the conclusion that it does not.”

    Apparently psychologists and psycholinguists didn’t get the memo about absence of evidence.

    I’m actually feeling relieved that it’s not just my field that has this problem.

  4. Andrew, you left out the juiciest bit. One of the authors, AM, mentions in the paper:

    “AM is a co-author of the ‘Pioppi Diet’ a lifestyle plan book that advocates for the benefits of a Mediterranean diet low in refined carbohydrates to reduce heart disease risk and improve metabolic health. AM is also co-producer of the documentary, The Big Fat Fix.”

  5. The post and comments reminded me of synthetic control charts.

    In statistical process control, people have considered “synthetic control charts”. Suppose that each observed sample mean is classified as either “0” (conforming, or within control limits) or “1” (nonconforming, or outside of control limits). A sequence of means could then be represented by a string of zeros and ones; for example, 1000001000 would indicate that in a sequence of ten samples, the first and seventh sample means were outside of the control limits. In general, the digit on the right end of the string represents what happened with the most recent observation; digits to the left correspond to values observed in earlier samples.
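    The encoding described above is easy to sketch (the control limits and sample means here are made-up numbers):

```python
# Classify each sample mean as "0" (inside the control limits) or "1"
# (outside), most recent value on the right. Limits are illustrative.
LCL, UCL = 9.0, 11.0  # hypothetical lower/upper control limits

def encode(means):
    return "".join("0" if LCL <= m <= UCL else "1" for m in means)

means = [12.1, 10.0, 9.8, 10.3, 9.5, 10.9, 8.7, 10.2, 10.4, 10.1]
print(encode(means))  # first and seventh means are out of limits: "1000001000"
```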

    These charts have proven inferior to non-dichotomized charts: https://www.tandfonline.com/doi/abs/10.1080/00224065.2016.11918158

    Perhaps referring to the BMJ study as a “synthetic” study would help better communicate the “Absence of evidence/evidence of absence” fallacy.

  6. This flawed ‘vote count’ method of analysis was shown to be incorrect by Hedges and Olkin in the classic text Statistical Methods for Meta-Analysis (1985). I agree with Robert Matthews (whose papers I’ve learned a ‘significant’ amount from) — the errors in this paper are instructive. I started a thread on Frank Harrell’s discussion forum about this study. https://discourse.datamethods.org/t/article-hit-or-miss-the-new-cholesterol-targets/4545

    • Thank you for the thread – very interesting. There is another basic question here, which is: why 35? Was the study done sequentially? This applies to meta-analysis in general.

      Do you know if this has been addressed by anyone (a fixed sample approach to the number of studies used in the meta analysis, or a sequential approach, i.e. adding studies and repeating the analysis as you go along)?

      • RE: number of studies for a retrospective meta-analysis. I’ve come to the conclusion that the method of synthesis should be decided upon by the number of studies that can be found, and stated beforehand.

        Andrew Gelman’s distinction between questions about sign and magnitude is helpful. If you have a small number of studies (i.e., up to 8, or 2^3) that you aren’t confident are homogeneous enough to combine, I’d make conservative assumptions about Type II error (treating power as less than 0.5) and just use a p-value combination method to draw a conclusion about the sign. With up to 16 studies, you can start to look for local effects using a p-value ranking method. Once you get to 32+ studies, meta-regression or empirical Bayes estimation, to adjust for factors that may explain heterogeneity, could be useful.

        I base this loosely on information-theoretic considerations, and this interesting paper on ‘Almost Sure’ Hypothesis Testing, which essentially argues for maximizing Chernoff efficiency. https://www.projecteuclid.org/journals/electronic-journal-of-statistics/volume-10/issue-1/Almost-sure-hypothesis-testing-and-a-resolution-of-the-Jeffreys/10.1214/16-EJS1146.full
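        One standard p-value combination method of the kind mentioned above is Fisher's method; here is a minimal sketch with made-up p-values (a generic illustration, not a reanalysis of the statin studies):

```python
# Fisher's method: under the joint null, -2 * sum(log p_i) is chi-squared
# with 2k degrees of freedom. The p-values below are invented for illustration.
import math

def fisher_combine(p_values):
    stat = -2.0 * sum(math.log(p) for p in p_values)
    k = len(p_values)
    # Chi-square survival function with 2k df (closed form for even df):
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (stat / 2.0) / i
        total += term
    return math.exp(-stat / 2.0) * total

# Six individually non-significant studies, all leaning the same way:
p_vals = [0.09, 0.12, 0.07, 0.20, 0.11, 0.15]
print(f"combined p = {fisher_combine(p_vals):.4f}")
```

        Six individually non-significant results combine to a clearly small p-value — exactly the information that vote counting discards.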

    • I don’t own this book but have read meta-analysis articles that most certainly cite it. There are many good approaches, even when the raw data is not accessible. ‘Vote count’ is definitely wrong.

      The problem is cultural. Evidence-based medicine is about getting doctors to think differently beyond their own community. It is not the same as public health or epidemiology. The purpose of a paper like this is to show that there are multiple sources of data about statins, not to show how to combine the evidence. So counting the asterisks is like a garnish on a restaurant plate–for style and emphasis, not the source of nutrition. And then when statisticians criticize, we are told ‘you don’t get it.’

    • There are ways to lessen the defects and misinterpretations outlined here (say, given that all you have is statistical significance and direction). From Modern Epidemiology, Rothman et al., page 680: “Note that many researchers would misinterpret having “only” 5 of 17 studies significant and positive to mean that the preponderance of evidence favors the null; the test shows how mistaken this judgment is. Under the null hypothesis we should expect only 1 in 20 to be “significant” at the 0.05 level in either direction, and only 1 in 40 to be significant and positive. Thus, seeing 5 out of 17 studies significant and positive would be extremely unusual if there were no effect or bias in any of the studies.” https://sgp1.digitaloceanspaces.com/proletarian-library/books/246ede0435010d90ce710415b214f21b.pdf
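      Rothman et al.’s arithmetic can be checked directly with a binomial tail probability (under the null, “significant and positive” has probability 1/40 per study):

```python
# Under a true null, each study is "significant and positive" with
# probability 0.025 (1 in 40), so seeing 5 or more of 17 is very unlikely.
import math

def binom_sf(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_tail = binom_sf(5, 17, 0.025)
print(f"P(>=5 of 17 significant and positive | null) = {p_tail:.2e}")
```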

      As Brian Ripley once put it – statisticians just don’t read [so why expect them to learn the lessons others have?]

  7. There was, at the very start of it, something uncanny and tendentious about the way the word “evidence” got made over into a technical term of astounding narrowness in the hands of the high-church doctors-of-evidence. The clever slogan neatly divides medicine into “evidence-based” and “otherwise-based”. Were ‘evidence’ meant to be read as broadly as possible, who could possibly argue with such a seeming common-sense position?

    What is one to think when a seeming common-sense proposition is transmuted into the chant of a mystery-cult?

    Even peddlers of crackpot cures, be they credentialed or not, assure you that their cure-alls do rest upon evidence — whether this evidence be plain-as-day or dark and recondite, who would say his claims are *not* based on evidence, stricto-sensu? Why, take some street-corner preacher of doom, maddened by empty half-Aramaic formulae and by sunstroke; rest assured: even he will show you the “evidence” if only you knock kindly on his cardboard sign to get his attention. Even dark-age antediluvian “physics” like Isidore of Seville and Maimonides assembled most strenuously the evidence — as they saw it — for their prescriptions.

    The enlightened modern may object that such-and-such a passage where Isidore cites Galen on the cure for quinsy barely rates as evidence at all.

    True, the enlightened modern ought to weigh the reliability of the prior chronicler and thereby rate the value of the evidence assembled from that report. But the kinds of evidence on offer differ vastly; all agree, though, that some evidence must be elicited and made bare, if we should pay any heed at all to a cure that’s being pitched. On this point, the quack-doctor, the scientist-doctor and the clinician are in violent agreement!

    But the movement (the church of EBM) no longer remembers ‘evidence’ in the catholic sense; it is right away gutted and filleted into a mystified, concentrated, desiccated mummy; into the sacrament of a mystery cult: the impenetrable formula provided for the journal-editor and his co-conspirators, the tremulous academics, groping around alike, for the branch they may seize, for their rescue: the salvation of ‘significance’.
