Battle of the biostatisticians

Sanjay Kaul points to this interesting news article by Matthew Herper about statistical controversies involving the drug Vytorin:

A 1,873-patient study called SEAS found there were 50% more cancers among patients who took Vytorin than among those who received placebo. Researchers involved in the study put together a hastily organized, company-funded press conference on July 21 to release the data.

There, Richard Peto, an Oxford University statistician, quieted the cancer scare before it really began. He pooled data from two much larger ongoing studies of Vytorin and said they showed that the cancer risk was a statistical fluke. He called the contention that Vytorin could cause cancer “bizarre.” . . .

Forbes surveyed 16 experts about the SEAS results. None were entirely convinced of a link to cancer, but eight thought Peto had gone too far in completely dismissing any cancer risk. Ten thought there was at least some possibility that Vytorin increases the risk of death for patients who have cancer. . . .

Sander Greenland, a statistician at the University of California, Los Angeles, says the data are “ambiguous” . . . The data “are not definitive at all,” says James Stein, a cardiologist at the University of Wisconsin, Madison. He says Vytorin should remain “a third-line drug” until more data can be collected, even though the drug is less likely to make patients complain of symptoms like aching. Doctors, he says, are admonished to “first do no harm,” not “first do what is easy.” Peto responds: “These three trials do not provide any credible evidence of any side effect.”

Peto serves as statistician for one Vytorin trial, but does not receive direct money from Merck and Schering-Plough. Here’s his basis for dismissing the cancer risk: In SEAS, there were 102 cases of cancer for patients on Vytorin, compared with 67 on placebo. Peto got permission to unblind data from two ongoing studies involving 20,000 patients: SHARP, testing Vytorin vs. placebo and Zocor in kidney patients, and IMPROVE-IT, which compares Vytorin and Zocor in patients at risk for heart attacks. There were 313 cases of cancer for patients taking Vytorin, compared with 326 cancer cases for people taking Zocor or placebo. . . .

The numbers work out differently when one looks at cancer deaths, as opposed to just cancer. In SEAS, 39 Vytorin patients died from cancer, compared with 23 on placebo. In the two larger studies, 97 patients getting Vytorin died, compared with 72 getting Zocor or placebo. The result, Peto says, is close to statistically significant; the odds are about six in 100 that it occurred by chance.

But unlikely things do occur by chance sometimes; that's how people get struck by lightning or win the lottery. Peto points out that hundreds of drugs are being studied in thousands of clinical trials, and sometimes a study will show a drug causes cancer or extends survival just by chance. The hypothesis being tested was that Zetia increases the risk of cancer, not that it increases the risk of cancer deaths. “These continually changing hypotheses are a misuse of statistics,” Peto says.

UCLA’s Greenland disagrees. He thinks that it is entirely proper to look at the issue of cancer deaths, and he also thinks it’s okay to pool all the data from all three trials, an approach that yields a highly statistically significant result. . . .

Donald Berry, head of the division of quantitative sciences at M.D. Anderson Cancer Center, says he finds much of Peto's analysis convincing. But he says it's not impossible that a drug could make a whole swath of cancers worse, and that possibility may not have been ruled out yet.

“My very clear bent is in Peto’s direction,” says Berry, “but I do think one wants to keep looking at this.”

These are some interesting issues. I have just a few comments:

1. It’s funny that 6 experts surveyed did not think there was “at least some possibility that Vytorin increases the risk of death for patients who have cancer.” There’s gotta be some possibility, right?

2. It’s funny that a p-value of .06 is likened to being struck by lightning or winning the lottery. 6% just isn’t so rare as all that.

3. At some point the discussion should move beyond whether a certain result is statistically significant, and move to an assessment of the costs and benefits of different courses of action.
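
As a rough check on the "six in 100" figure (and on point 2 above), here is a minimal sketch in Python. It assumes, purely for illustration, equal numbers of patients in each arm, so that under the null hypothesis each cancer death is equally likely to land in either arm; the article does not give arm sizes, and Peto's actual calculation may well differ.

    # Requires scipy >= 1.7 for binomtest.
    from scipy.stats import binomtest

    # Cancer deaths in the two large ongoing trials (SHARP + IMPROVE-IT):
    # 97 on Vytorin vs. 72 on Zocor or placebo. Under the null (and the
    # assumed equal arm sizes), each death is a fair coin flip between arms.
    result = binomtest(97, n=97 + 72, p=0.5)
    print(f"SHARP + IMPROVE-IT cancer deaths: two-sided p = {result.pvalue:.3f}")
    # comes out near 0.06, consistent with Peto's "six in 100"

    # The same calculation for SEAS alone (39 vs. 23 cancer deaths)
    result_seas = binomtest(39, n=39 + 23, p=0.5)
    print(f"SEAS cancer deaths: two-sided p = {result_seas.pvalue:.3f}")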

10 thoughts on “Battle of the biostatisticians”

  1. Good points, especially the first.

    "These continually changing hypotheses are a misuse of statistics," says Peto. As if you can't use statistics to come up with new hypotheses. This is such a basic issue you'd think there would be more agreement by now.

  2. There really is a mystique about statistical significance that has grown really tiresome, and it extends throughout drug development. So I agree with your last statement, and would add that it should apply throughout.

  3. I was struck by the line "The result, Peto says, is close to statistically significant; the odds are about six in 100 that it occurred by chance."

    Taken at face value, this means that there is a 94% chance that the result did not occur by chance, which is to say, there is a 94% chance that Vytorin increases the risk of cancer. But of course we can't take this at face value: I'm sure that the "6 in 100" line is not in fact an estimate of the probability that Vytorin causes cancer: it's the answer to the question "If Vytorin DOESN'T cause cancer, what is the chance that we would see this number of cancer cases?" Admittedly this is _related_ to the chance that Vytorin increases the risk of cancer, but it is very far from being the same.

    But, as Andrew suggests, much more relevant than the question "does Vytorin increase the risk of cancer" is the question "do the benefits of Vytorin (whatever they are) outweigh the risks of Vytorin (whatever _they_ are)."
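
    To make the distinction concrete, here is a toy two-hypothesis Bayesian calculation. Everything in it is an assumption chosen for illustration, not an analysis of Vytorin: equal arm sizes, a single alternative hypothesis of relative risk 1.5, and the two priors shown.

        from scipy.stats import binom

        k, n = 97, 169    # Vytorin cancer deaths out of all cancer deaths
                          # in the two large trials (equal arms assumed)
        rr = 1.5          # hypothetical relative risk if the drug is harmful

        like_null = binom.pmf(k, n, 0.5)            # P(data | no effect)
        like_harm = binom.pmf(k, n, rr / (1 + rr))  # P(data | harmful)

        for prior_harm in (0.5, 0.1):
            post = prior_harm * like_harm / (
                prior_harm * like_harm + (1 - prior_harm) * like_null)
            print(f"prior P(harm) = {prior_harm}: posterior = {post:.2f}")

        # A 50/50 prior gives a posterior around 0.84; a skeptical 10% prior
        # gives around 0.36. Neither is the "94%" a naive reading of the
        # p-value would suggest, and both depend heavily on the assumptions.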

  4. "The result, Peto says, is close to statistically significant; the odds are about six in 100 that it occurred by chance."

    Nonsense! Whatever results were obtained were obtained by chance. The notion that p-values measure "the probability of obtaining those results by chance," or, even worse, "the probability that the null hypothesis is true," is a widespread but completely wrong-headed meme.

    This is why I hate p-values. I am in complete agreement with Andrew that significance tests are the wrong way to approach such problems. These problems are decision-theoretic in nature and must take into account a realistic loss function.
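
    For what the decision-theoretic framing might look like, here is a minimal sketch; the probability and losses below are invented placeholders, not estimates of anything about Vytorin.

        # Hypothetical posterior probability that the drug raises cancer deaths
        p_harm = 0.4

        # Invented losses, in arbitrary units, for each (action, state) pair
        loss = {
            ("prescribe", "harmful"): 10.0,  # excess deaths outweigh benefit
            ("prescribe", "safe"):     0.0,  # patient gets the cardiac benefit
            ("withhold",  "harmful"):  0.0,  # harm avoided
            ("withhold",  "safe"):     2.0,  # cardiac benefit forgone
        }

        for action in ("prescribe", "withhold"):
            expected = (p_harm * loss[(action, "harmful")]
                        + (1 - p_harm) * loss[(action, "safe")])
            print(f"expected loss of {action}: {expected:.1f}")

        # With these numbers, prescribing wins only when P(harm) < 2/12, about
        # 0.17; statistical significance never enters the decision directly.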

  5. My guess is that the lightning/lotto examples are a result of the news writer missing Peto's point. Somehow the writer took Peto's discussion of how an event that is unlikely in a single case may be very likely across all cases, and turned it into "unlikely things do occur by chance sometimes".
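
    A back-of-the-envelope version of Peto's point (the trial count is an arbitrary assumption; the article says only "thousands"):

        # Among many independent trials of genuinely null drugs, results at
        # the p < 0.06 level are routine rather than lottery-like.
        n_trials = 1000
        cutoff = 0.06
        print(f"expected spurious signals: {n_trials * cutoff:.0f}")   # 60
        print(f"P(at least one p < {cutoff}): {1 - (1 - cutoff) ** n_trials}")
        # the second probability is 1 to within floating-point precision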

  6. Apparently, none of the 16 experts is completely sure Vytorin causes any cancer or any cancer deaths. That's reassuring. The ones who are sure it doesn't have any such effect are too sure. Who died and left them gods?

    I didn't think the article was likening the probabilities of getting hit by lightning or winning a lottery to 6%. It seemed to me that the reporter was saying that even events much less probable than that do, in fact, occur.

    The articles come perilously close to attributing to Richard Peto the major misinterpretation of a p-value other posters caught.

    The articles also come perilously close to attributing acquiescence on Sander Greenland's part to the dubious view that total cancer incidence and total cancer mortality are etiologically meaningful outcomes.

    Direct quotes avoid the potential of coming perilously close to suggesting that someone said something with which he or she might strongly disagree. Unfortunately, there aren't many direct quotes in the articles.

    The scariest one is from Roger Blumenthal: "I believe that [Zetia] is clinically beneficial, but I cannot prove that."

    I don't care if this guy has never taken a ball point pen from the manufacturer. I wouldn't let him within 10 miles of a Zetia efficacy or safety study until after it's done.

    Another interesting aspect to this story is that it's one of many in which hypotheses about etiologic mechanism are non-ignorable. There are pervasive views in epidemiology and public health that understanding mechanism of action is unnecessary or unimportant, that drawing attention to incomplete mechanistic knowledge usually delays needed action, that causality can be established without knowing anything about mechanism, etc.

    Charlie Poole

  7. Concordant data with regard to incident cancer and cancer deaths in the combined SHARP and IMPROVE-IT studies would have strengthened the evidence of harm associated with ezetimibe. However, cancer death is arguably a more reliable endpoint than incident cancer, and it is consistently going in the wrong direction in all three trials. When you combine the cancer death events from all three trials, there is a significant increase in the signal of harm with ezetimibe.

    Sir Peto has dismissed this as "misuse" of statistics on two counts: a) one cannot "continually" change the hypothesis from incident cancer to cancer deaths, and b) one cannot combine data from a hypothesis-generating trial (SEAS) with hypothesis-testing trials (SHARP and IMPROVE-IT). I respectfully disagree with Sir Peto on both counts. These are data-derived observations, and there are no set rules about which data can be chosen to generate a hypothesis. I would argue that the observation that ezetimibe may affect cancer mortality without affecting incident cancer is a plausible hypothesis that needs to be tested (especially given the potential role of phytosterols in tumor immunosurveillance). On the second point, there are numerous examples where a data-derived observation from one study was used both to generate and to test a hypothesis. Accordingly, I see no problem with pooling data from all three trials, an approach that yields a highly statistically significant adverse outcome.
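
    As a rough check of the pooled claim, under a simplifying assumption of equal arm sizes (so that under the null each cancer death splits 50/50 between arms; actual arm sizes are not given here):

        from scipy.stats import binomtest

        deaths_vytorin = 39 + 97   # SEAS plus the two larger trials
        deaths_control = 23 + 72
        result = binomtest(deaths_vytorin,
                           n=deaths_vytorin + deaths_control, p=0.5)
        print(f"pooled cancer deaths: two-sided p = {result.pvalue:.4f}")
        # well under 0.01, consistent with "highly statistically significant"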

  8. I am surprised by this thread: not surprised that people continue to disagree on statistical inference or drug trials, which would be naive indeed, but surprised at how this thread has been conducted.

    It appears that people are taking quite literally the exact phrasing that appears in a news article, one that comes from a source that is not obviously of high impartiality or high quality on statistical matters.

    Have you never had the experience of being quoted in media coverage, and found that coverage to be wrong, distorted, selective, etc., etc.? Have you never heard that journalists can be creative even about supposedly exact quotations?

    Conversely, if the various participants are being quoted for what they say in published articles, or minimally in press statements that they approved, then all well and good, and make criticisms exactly as seems appropriate, but please use referencing of sources in good academic fashion.

    I have no personal brief here, and no expertise that enables me to adjudicate in this debate. But please don't match one kind of semi-technical reporting with something equally mediocre. One simple reason I like this blog a lot is that it generally hits a far higher standard.

    Finally, as a small matter of form, one key participant in this debate is, depending on how formal or informal you want to be, Peto, Richard Peto, Professor Peto, Professor Richard Peto, Professor Sir Richard Peto, but emphatically not Sir Peto. This is put in for those who prefer to be right on linguistic points, with apologies to those who think, understandably, that it is not worth mentioning.
