Howard Wainer points us to a recent news article by Jennifer Couzin-Frankel, who writes about the selection bias arising from the routine use of outcome criteria to exclude animals in medical trials. In statistics and econometrics, this is drilled into us: Selection on x is OK, selection on y is not OK. But apparently in biomedical research this principle is not so well known (or, perhaps, it is all too well known).
Couzin-Frankel starts with an example of a drug trial in which 3 of the 10 mice in the treatment group were removed from the analysis because they had died from massive strokes. This sounds pretty bad, but it’s even worse than that: this was from a paper under review that “described how a new drug protected a rodent’s brain after a stroke.” Death isn’t a very good way to protect a rodent’s brain!
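To get a feel for how much damage this kind of selection on the outcome can do, here's a minimal simulation sketch (Python, with made-up numbers: a drug with no effect at all, a 30% death rate assumed identical in both arms, and death scored as the worst possible outcome). Keeping every animal gives roughly the right answer, zero; quietly dropping the treated animals that died manufactures a healthy-looking "benefit" out of nothing:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10          # mice per arm, as in the story above
p_death = 0.3   # assumed death rate, the same in both arms: the drug does nothing
n_sims = 10_000

kept_all, dropped_dead = [], []
for _ in range(n_sims):
    # Hypothetical outcome: a brain-damage score where lower is better,
    # and an animal that dies gets the worst possible score.
    control = rng.normal(5.0, 1.0, n)
    treated = rng.normal(5.0, 1.0, n)       # identical distribution: no effect
    control[rng.random(n) < p_death] = 10.0
    died = rng.random(n) < p_death
    treated[died] = 10.0
    if died.all():                          # skip the (very rare) all-dead arm
        continue

    # Honest analysis: every animal stays in.
    kept_all.append(control.mean() - treated.mean())
    # Selection on y: drop the treated mice that died.
    dropped_dead.append(control.mean() - treated[~died].mean())

print(f"apparent benefit, everyone kept:        {np.mean(kept_all):+.2f}")
print(f"apparent benefit, dead treated dropped: {np.mean(dropped_dead):+.2f}")
```

The exact numbers are invented, but the point is that the bias is mechanical: it appears whether or not the drug does anything.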
The news article continues:
“This isn’t fraud,” says Dirnagl [the outside reviewer who caught this particular problem], who often works with mice. Dropping animals from a research study for any number of reasons, he explains, is an entrenched, accepted part of the culture. “You look at your data, there are no rules. … People exclude animals at their whim, they just do it and they don’t report it.”
It’s not fraud because “fraud” is a state of mind, defined by the psychological state of the perpetrator rather than by the consequences of the actions.
Also this bit was amusing:
“I was trained as an animal researcher,” says Lisa Bero, now a health policy expert at the University of California, San Francisco. “Their idea of randomization is, you stick your hand in the cage and whichever one comes up to you, you grab. That is not a random way to select an animal.” Some animals might be fearful, or biters, or they might just be curled up in the corner, asleep. None will be chosen. And there, bias begins.
That happens in samples of humans too. Nobody wants to interview the biters. Or, more likely, those people just don’t respond. They’re too busy biting to go answer surveys.
Statisticians are just as bad! (Or maybe we’re worse, because we should know better.)
Of course, we laugh and laugh about this sort of thing, but when it comes to evaluating our own teaching or our own research effectiveness, we not only don’t randomize, we don’t even define treatments or take any reasonable pre-test or outcome measurements at all!
We use non-statistical, really pre-scientific tools to decide what, in our opinion, “works” in our teaching and research. So maybe it should be no surprise that biomedical researchers often work with some pre-scientific intuitions too.
Accepting uncertainty and embracing variation
Ultimately I think many of these problems come from a fundamental, fundamental misunderstanding: lack of recognition of uncertainty and variability. My impression is that people think of a medical treatment as something that “works” or “doesn’t work.” And, the (implicit) idea is that if it works, it works for everyone. OK, not really everyone, but for the bad cases there are extenuating circumstances. From that perspective, it makes perfect sense to exclude treated mice who die early: these are just noise cases that interfere with the signal.
OK, sure, sure, everybody knows about statistics and p-values and all that, but my impression is that researchers see these methods as a way to prove that an effect is real. That is, statistics is seen, not as a way to model variation, but as a way to remove uncertainty. There is of course some truth to this attitude—the law of large numbers and all that—but it’s hard to use statistics well if you think you know the answer ahead of time.
[And, no, for the anti-Bayesians out there, using a prior distribution is not “thinking you know the answer ahead of time.” A prior distribution is a tool, just like a statistical model is a tool, for mapping the information coming from raw data to inferences about parameters and predictions. A prior distribution expresses what you know before you include new data. Of course it does not imply that you know the answer ahead of time; indeed, the whole point of analyzing new data is that, before seeing such data, you remain uncertain about key aspects of the world.]
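In case that sounds abstract, here's a toy beta-binomial sketch (all numbers invented): a prior centered on “no idea, maybe 0.5” gets pulled steadily toward whatever the data say, and the uncertainty shrinks as the data accumulate. The prior is a starting point, not a foregone conclusion:

```python
from scipy import stats

# Hypothetical example: estimating a survival probability theta.
# Beta(5, 5) says "probably somewhere near 0.5, but quite uncertain" --
# a tool for expressing pre-data information, not an answer.
a, b = 5, 5
for survivors, total in [(0, 0), (7, 10), (70, 100)]:
    post = stats.beta(a + survivors, b + (total - survivors))
    lo, hi = post.ppf([0.025, 0.975])
    print(f"data {survivors:>2}/{total:<3} -> posterior mean {post.mean():.2f}, "
          f"95% interval ({lo:.2f}, {hi:.2f})")
```

With no data the posterior is just the prior (mean 0.5, wide interval); with 70 successes out of 100 it has moved to about 0.68 and tightened considerably, prior notwithstanding.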
So, just to say this again, I think that researchers of all sorts (including statisticians, when we consider our own teaching methods) rely on two pre-scientific or pre-statistical ideas:
1. The idea that effects are “real” (and, implicitly, in the expected direction) or “not real.” By believing this (or acting as if you believe it), you are denying the existence of variation. And, of course, if there really were no variation, it would be no big deal to discard data that don’t fit your hypothesis.
2. The idea that a statistical analysis determines whether an effect is real or not. By believing this (or acting as if you believe it), you are denying the existence of uncertainty. And this will lead you to brush aside criticisms and think of issues such as selection bias as technicalities rather than serious concerns.
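A tiny simulation of point 2 (numbers assumed for illustration): the effect here is perfectly “real” in every replication, yet the p-value swings wildly from study to study. A significance test is a noisy summary of the data, not a machine that certifies reality:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumed setup: a modest true effect (0.3 sd) with n = 25 per group,
# replicated 20 times under identical conditions.
n, effect = 25, 0.3
pvals = [
    stats.ttest_ind(rng.normal(effect, 1.0, n), rng.normal(0.0, 1.0, n)).pvalue
    for _ in range(20)
]

print("p-values:", " ".join(f"{p:.3f}" for p in sorted(pvals)))
print(f"'significant' in {sum(p < 0.05 for p in pvals)} of 20 identical experiments")
```

Same truth every time, different verdicts; treating any single p-value as settling whether the effect is “real” is exactly the denial of uncertainty described above.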
P.S. Commenter Rahul writes:
The sad part is stuff like excluding subjects on a whim will get you zero repercussions almost all the time. In most cases you get published and perhaps tenure or more funding. If you are terribly unlucky you get mentioned on a blog like this and a few of us go tsk tsk.
It’s worse than that! Even after this article we still don’t know who discarded the data from those three mice or the lab where it happened. If you read the news article, it was all done confidentially. So, for all we know, there might be dozens of papers published by that research group, all with results based on discarding dead animals from the treatment group, and we have no way of knowing about it. And even the offending paper, the one being discussed here, might well eventually be published.
I guess maybe someone can do a search on all published papers involving mouse trials of drugs for protecting the brain after stroke, just looking at those studies with 10 mice in the control group and 7 mice in the treatment group. There can’t be that many of these, right?