Three unblinded mice

I happened to come across this post from 2013 discussing a news article by Jennifer Couzin-Frankel, who writes about the selection bias arising from the routine use of outcome criteria to exclude animals in medical trials:

Couzin-Frankel starts with an example of a drug trial in which 3 of the 10 mice in the treatment group were removed from the analysis because they had died from massive strokes. This sounds pretty bad, but it’s even worse than that: this was from a paper under review that “described how a new drug protected a rodent’s brain after a stroke.” Death isn’t a very good way to protect a rodent’s brain!
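
Just to see how much a rule like that can distort things, here's a little simulation sketch. All the numbers are made up (ten mice per arm, outcomes in standard-deviation units, a drug with zero true effect, and the three worst outcomes dropped from the treated group only); the point is the mechanism, not the particular paper.

```python
# Minimal sketch of outcome-based exclusion with a drug that does nothing.
# All settings here are hypothetical illustrations, not taken from the paper.
import numpy as np

rng = np.random.default_rng(2013)
n_sims, n_per_group, n_dropped = 10_000, 10, 3

naive_effects = []
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(0.0, 1.0, n_per_group)  # true treatment effect is zero
    treated_kept = np.sort(treated)[n_dropped:]  # drop the 3 worst outcomes, treated arm only
    naive_effects.append(treated_kept.mean() - control.mean())

# Under these made-up settings the apparent "benefit" averages out to roughly
# half a standard deviation, created entirely by the exclusion rule.
print(f"Average apparent benefit of a useless drug: {np.mean(naive_effects):.2f} sd")
```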

The news article continues:

“This isn’t fraud,” says Dirnagl [the outside reviewer who caught this particular problem], who often works with mice. Dropping animals from a research study for any number of reasons, he explains, is an entrenched, accepted part of the culture. “You look at your data, there are no rules. … People exclude animals at their whim, they just do it and they don’t report it.”

It’s not fraud because “fraud” is a state of mind, defined by the psychological state of the perpetrator rather than by the consequences of the actions.

Also this bit was amusing:

“I was trained as an animal researcher,” says Lisa Bero, now a health policy expert at the University of California, San Francisco. “Their idea of randomization is, you stick your hand in the cage and whichever one comes up to you, you grab. That is not a random way to select an animal.” Some animals might be fearful, or biters, or they might just be curled up in the corner, asleep. None will be chosen. And there, bias begins.

That happens in samples of humans too. Nobody wants to interview the biters. Or, more likely, those people just don’t respond. They’re too busy biting to go answer surveys.

Wow. I wonder if this is still happening.

Back in 2013, here’s how I summarized the problem:

So, just to say this again, I think that researchers of all sorts (including statisticians, when we consider our own teaching methods) rely on two pre-scientific or pre-statistical ideas:

1. The idea that effects are “real” (and, implicitly, in the expected direction) or “not real.” By believing this (or acting as if you believe it), you are denying the existence of variation. And, of course, if there really were no variation, it would be no big deal to discard data that don’t fit your hypothesis.

2. The idea that a statistical analysis determines whether an effect is real or not. By believing this (or acting as if you believe it), you are denying the existence of uncertainty. And this will lead you to brush aside criticisms and think of issues such as selection bias as technicalities rather than serious concerns.
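
To illustrate point 2 with a quick sketch (made-up numbers again: a true effect of 0.4 standard deviations, 10 animals per arm, a two-sided t-test at the 0.05 level): rerun the same experiment many times and the "verdict" flips back and forth, even though the underlying effect never changes.

```python
# Sketch of point 2: same true effect, same design, many replications.
# The settings below are hypothetical, not tied to any particular study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, n_per_group, true_effect = 10_000, 10, 0.4

n_significant = 0
estimates = []
for _ in range(n_sims):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_effect, 1.0, n_per_group)
    _, p = stats.ttest_ind(treated, control)
    n_significant += p < 0.05
    estimates.append(treated.mean() - control.mean())

# The effect never flickers in and out of existence; only the noisy verdict does.
print(f"Replications declared 'real' (p < 0.05): {n_significant / n_sims:.0%}")
print(f"Middle 95% of estimates: {np.percentile(estimates, 2.5):.2f} "
      f"to {np.percentile(estimates, 97.5):.2f} sd (true effect is {true_effect} sd)")
```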

It seems we’ve been saying the same damn thing for close to a decade, but many people still haven’t been getting the memo.

22 thoughts on “Three unblinded mice”

  1. Did you hear the one where high-profile researchers at Stanford thought you could get a generalizable sample by sending out emails and promising potential participants they could return to work without fear if they drove to testing sites and tested negative for COVID?

  2. > curled up in the corner, asleep. None will be chosen. And there, bias begins.

    I do remember giving the “fast rabbits” lecture to rehab medicine students in 1987 as one of the lessons on why random selection was really important.

    What was most notable was that these methodological considerations, rather than the statistical distribution considerations, were more readily grasped by the students and were actually enjoyable for them to learn about.

    (It was a strange course: the first term was on methodological considerations, e.g. the need for random selection, for blinding, to identify past research efforts, to assess the quality of the reports of past research, etc., and the second term was on the statistical distribution considerations, e.g. hypothesis tests, confidence intervals, diagnostic errors, etc. After the first term the dean told me I had become the most popular professor in the department. That changed very quickly in the second term. However, almost all the students did very well on what I thought was a challenging exam, mostly on the statistics part, except for a few who stopped coming to class. The following year, the dean told me a new faculty member who knew little about statistics had been assigned the course to help them learn about statistics.)

    • > However, almost all the student did very well on what I thought was a challenging exam mostly
      > on the statistics part, except for a few that stopped coming to class.

      I see what you did there!

  3. Regarding summary points 1 and 2:
    I’ve known very, very few researchers who have any idea what to do in the absence of a statistical approach to, in essence, identifying what’s real and what’s not real. And when funding is on the line and reviewers are complicit….

    I still don’t think that this is a statistical issue. I think it’s an inability to think scientifically that’s the problem.

  4. “The idea that effects are “real” (and, implicitly, in the expected direction) or “not real.” By believing this (or acting as if you believe it), you are denying the existence of variation.”

    If my oscilloscope is on autogain, the screen will display big squiggles of variation in the magnitude of the noise. If I then activate a square wave generator, the noise goes to what looks like a straight line and I get a nice clean square wave. Now I can convert the display to a time series, bracket some time, and ask “was the square wave generator on or off?”

    The argument here is that no statistical analysis can tell me the answer; it can only give me a level of confidence. I should never conclude that the generator was on, only that I am pretty sure it was on, with a probability of 99% followed by a long string of nines.

    While I am aware that the author does not have deeply held postmodern views, the words in the post were torn from the book of postmodernism IMO.

    • Matt:

      I agree that it makes sense to describe effects as “real” or “not real” if these effects are defined precisely enough. For example, either I am holding a baseball in my hand or I am not. If I put an egg in boiling water for 10 minutes it will get hard inside. Etc. But describing average effects as “real” or “not real” will not work so well. For example, it’s common practice in medicine, psychology, policy analysis, etc., to declare that an effect is “real” if it shows a statistically significant pattern in data or that it is “not real” if it does not show such a pattern. I don’t think it makes sense to say that growth mindset, for example, “works” or “does not work.” First, the growth mindset treatment is not precisely defined—there are many such treatments—and, second, any particular growth mindset treatment will have different effects on different people and in different contexts. My words on this subject are torn not from the book of postmodernism but from my own experiences.

    • This seems to be confusing ‘effects’ with ‘any kind of data whatsoever’. Obviously treating a single data point as containing solid information can’t possibly “deny the existence of variation”.

    • When you turn on the square wave generator and calculate the average voltage, the -1V and +1V average out to zero, which, because the p-value for a null hypothesis of mean zero is 0.64, obviously means there is no effect of turning on the generator. You’re just fooling yourself, which is obvious to any outside observer; I mean, you didn’t even blind yourself to whether the generator was on or off.

  5. This reminded me of research I was a part of. Vaginal swabs of mice on some hormone medications. Biters weren’t the biggest issue.
    However, randomization was proper. Collecting the outcome was more challenging (filming, tracking the behavior longitudinally, etc.).

    I think part of the problem is that many people think that an RCT is the same as a ‘random’ opinion poll. There are exclusion/inclusion criteria to even end up in a big pool, before randomization to placebo/intervention begins. I wouldn’t call those criteria selection bias, though.

    An RCT is anything but random, really.

  6. You just have to look at sample sizes. No one designs a study to have 5 mice in one group and 7 in another. If there is no explanation, then you know some selection happened.

    Also watch for sample sizes that change from figure to figure without explanation, e.g., n = 10 for behavior, then n = 7 for histology, etc. And if different groups of animals were used for behavior and histology, why? That isn’t minimizing animal suffering, so unless there is a good reason, the study is technically a federal offense.

    • I was once at a departmental colloquium where the speaker started out with n = 20 pigeons and ended up with 19. When asked about it, he confessed that they had no idea what happened to Pigeon No. 20.

      I have since postulated a hungry grad student.

  7. I think your (2) is the science version of map-territory confusion and cargo cult superstition. If you successfully make the incantations of “randomization”, “statistics”, and “significance”, you get a real effect.

    This type of confusion seems deeply wired into human thinking. But scientists somehow think that “science” automatically elevates them beyond it.

    This is why I think the Red Team idea is so terrific. Competition and teamwork are also deeply wired. Having designated competitors seems to create the psychological room to break free from the momentum of ritual.

  8. Stop creating room for casual bitism like this. Richly ironic in an article about bias.

    Luis Suarez has had a fantastic career despite being out–thankfully the Mediterranean nations are less uptight than the British–but who knows what he might have achieved without the discrimination. (I’m not saying what he did was entirely right, mind you.)

    Most biters have to skulk around, anonymously…

  9. As a lukewarm defense of that mice paper, maybe an excuse (or justification?) would be that the size of the induced stroke is a variable in the experiment, so if they accidentally induced too big a stroke, then maybe it’s reasonable to remove those samples.
