Doug Helmreich writes:
OK, I work at a company that is involved in stents, so I’m not unbiased, but…
http://www3.imperial.ac.uk/newsandeventspggrp/imperialcollege/newssummary/news_2-11-2017-15-52-46 and especially https://www.nytimes.com/2017/11/02/health/heart-disease-stents.html
The research design is pretty cool—placebo participants got a sham surgery with no stent implanted. The results show that people with the stent did have better metrics than those with just the placebo… but the difference was not statistically significant at 95% confidence, so the authors claim there is no effect! (the difference was significant at 80% confidence). So, underpowered study becomes ammunition in the “stents have no material impact” fight.
Here are the relevant quotes. From the press release:
Coronary artery stents are lifesaving for heart attack patients, but new research suggests that the placebo effect may be larger than previously thought. . . .
“Surprisingly, even though the stents improved blood supply, they didn’t provide more relief of symptoms compared to drug treatments, at least in this patient group,” said Dr Al-Lamee, who is also an interventional cardiologist at Imperial College Healthcare NHS Trust.
From the news article:
Heart Stents Fail to Ease Chest Pain . . . When the researchers tested the patients six weeks later, both groups said they had less chest pain, and they did better than before on treadmill tests.
But there was no real difference between the patients, the researchers found. Those who got the sham procedure did just as well as those who got stents. . . .
“It was impressive how negative it was,” Dr. Redberg said of the new study. . . .
Here’s what the research article (Percutaneous coronary intervention in stable angina (ORBITA): a double-blind, randomised controlled trial, by Rasha Al-Lamee et al.) had to say:
Symptomatic relief is the primary goal of percutaneous coronary intervention (PCI) in stable angina and is commonly observed clinically. However, there is no evidence from blinded, placebo-controlled randomised trials to show its efficacy. . . . There was no significant difference in the primary endpoint of exercise time increment between groups (PCI minus placebo 16·6 s, 95% CI −8·9 to 42·0, p=0·200).
Setting aside the silliness of presenting the p-value to three significant digits, that summary is reasonable enough. The press release and the news article got it wrong by reporting a positive but non-statistically-significant change as zero (“they didn’t provide more relief of symptoms compared to drug treatments” and “Those who got the sham procedure did just as well as those who got stents”) or even negative (“It was impressive how negative it was”)! The research article got it right by saying “there is no evidence”—actually, saying “there is no strong evidence” would be a better way to put it, as the data do show some evidence for a difference.
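As a rough check on those reported numbers, here is a minimal sketch in Python that backs out the standard error from the published interval. It assumes a normal approximation; the paper's own calculation may have used a t distribution or a different model, so small discrepancies are to be expected. The variable names are mine, not the paper's.

```python
from statistics import NormalDist

# Figures quoted in the abstract: PCI minus placebo, in seconds of exercise time.
estimate = 16.6
ci_low, ci_high = -8.9, 42.0

# ASSUMPTION: the 95% interval is estimate +/- 1.96 standard errors (normal approximation).
z95 = NormalDist().inv_cdf(0.975)
se = (ci_high - ci_low) / (2 * z95)

# How many standard errors the estimate is from zero, and the two-sided p-value.
z = estimate / se
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"standard error: {se:.1f} s")
print(f"z-score:        {z:.2f}")
print(f"two-sided p:    {p_two_sided:.2f}")
```

Under this approximation the standard error comes out to about 13 seconds and the two-sided p-value to about 0.20, in line with the reported p=0·200. In other words, the estimate is a bit more than one standard error from zero: not strong evidence, but not zero either.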
Getting to the statistics for a moment . . . Tables 1 and 3 show that there were some pre-treatment differences between treatment and control groups. This will happen under randomization, but then it’s a good idea to adjust for those differences when estimating the treatment effect. What they did was compute the gain score for each group, using post-treatment minus pre-treatment as their outcome measure. That’s not horrible but it will tend to overcorrect for pre-test—it’s equivalent to a regression adjustment with a coefficient of 1 on the pre-test, and typically you’d see a coefficient less than 1, for the usual reasons of regression to the mean.
Indeed, had the natural regression adjustment been performed, the observed difference might well have been “statistically significant” at the 5% level. Not that this should make such a difference, but imagine how all the headlines would’ve changed.
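To illustrate the algebra, here is a minimal simulation sketch in Python. Everything in it is made up for illustration (the sample size, the slope of 0.7, the noise levels, the function name compare_estimators); these are not the ORBITA data, and the sketch says nothing about what a reanalysis of the real data would show.

```python
import numpy as np

def compare_estimators(n=200, true_effect=15.0, slope=0.7, seed=0):
    """Simulate one randomized trial (HYPOTHETICAL numbers) and estimate
    the treatment effect by gain score and by regression adjustment."""
    rng = np.random.default_rng(seed)
    treat = rng.permutation(np.repeat([0, 1], n // 2))   # random assignment
    pre = rng.normal(500, 100, size=n)                   # pre-treatment measure
    # Post-treatment measure regresses to the mean: slope on pre-test is below 1.
    post = 150 + slope * pre + true_effect * treat + rng.normal(0, 60, size=n)

    is_t = treat == 1
    diff_pre = pre[is_t].mean() - pre[~is_t].mean()      # chance imbalance
    diff_post = post[is_t].mean() - post[~is_t].mean()

    # Gain score: difference in mean (post - pre) between the two groups.
    gain = (post - pre)[is_t].mean() - (post - pre)[~is_t].mean()

    # Regression adjustment: least squares of post on intercept, treatment, pre.
    X = np.column_stack([np.ones(n), treat, pre])
    coef, *_ = np.linalg.lstsq(X, post, rcond=None)

    return {
        "diff_pre": diff_pre,
        "diff_post": diff_post,
        "gain_score": gain,
        "regression": coef[1],
        "slope_hat": coef[2],
    }

r = compare_estimators()
# Gain score   = diff_post - 1         * diff_pre
# Regression   = diff_post - slope_hat * diff_pre
for name, value in r.items():
    print(f"{name:>10}: {value:8.3f}")
```

The two estimates differ only in the coefficient applied to the pre-test imbalance: 1 for the gain score, the fitted slope for the regression. When the fitted slope is below 1, the gain score subtracts off too much of whatever pre-treatment difference happened to arise.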
Are the raw data from this study available? The answer should be Yes, as a matter of course, but unfortunately I don’t think The Lancet yet requires data and code repositories.
The other thing going on is that there are multiple outcome measures. The research paper unfortunately focuses on whether differences are “statistically significant”—this just makes me want to scream!—and I don’t know enough about the context to be sure, but if the treated patients are improving, on average, for all the outcomes, that’s useful evidence too.
Are stents a good idea?
That’s another question, and it comes down to costs and benefits. Not “Do stents work better than placebo?”, but “How much better do they work, compared to realistic alternatives?” and “What are their risks?” We need numbers here, not just wins and losses.
There’s a science question: What do stents do, what makes them work when they work and what makes them fail when they fail, etc. (I phrase this in a vague way because I know nothing of the science here.) And there are various decision questions, at the level of individual doctors and patients, and higher up when deciding which procedures to recommend and reimburse.
Lots of questions, and it’s a mistake to think that all (or even any) of them can be answered by a single number obtained from this study alone.
The problem here is not just that lots of potentially useful data have been smushed and compressed into a binary summary (p greater than or less than 0.05), but that there’s a disconnect in the way that this study is used to address the scientific and policy questions we care about. There’s a flow of argument in which all scientific information goes into this one number, which then is supposed to answer all questions. And this makes no sense. Indeed, it makes so little sense that we should ask how it is that people could ever have thought this was a good idea. But that’s a subject for another post.
P.S. I wrote this post a while ago, and in the meantime I wrote an article with John Carlin and Brahmajee Nallamothu, expanding on some of these points. The article is called “ORBITA and coronary stents: A case study in the analysis and reporting of clinical trials,” and I have no idea where we will publish it. It can be tough to get an article published that doesn’t draw strong conclusions; indeed, our message is that researchers should express less strong conclusions from their data.