Instead of replicating studies with problems, let’s replicate the good studies. (Consider replication as an honor, not an attack.)

Commenter Thanatos Savehn pointed to an official National Academy of Sciences report on Reproducibility and Replicability that included the following “set of criteria to help determine when testing replicability may be warranted”:

1) The scientific results are important for individual decision-making or for policy decisions.
2) The results have the potential to make a large contribution to basic scientific knowledge.
3) The original result is particularly surprising, that is, it is unexpected in light of previous evidence and knowledge.
4) There is controversy about the topic.
5) There was potential bias in the original investigation, due, for example, to the source of funding.
6) There was a weakness or flaw in the design, methods, or analysis of the original study.
7) The cost of a replication is offset by the potential value in reaffirming the original results.
8) Future expensive and important studies will build on the original scientific results.

I’m ok with items 1 and 2 on this list, and items 7 and 8: You want to put in the effort to replicate on problems that are important, and where the replications will be helpful. One difficulty here is determining whether “The scientific results are important . . . potential to make a large contribution to basic scientific knowledge.” Consider, for example, Bem’s notorious ESP study: if the claimed results are true, they could revolutionize science. If there’s nothing there, though, it’s not so interesting. This sort of thing comes up a lot, and it’s not clear how we should evaluate items 1 and 2 above in the context of such uncertainty.

But the real problem I have is with items 3, 4, 5, and 6, all of which would seem to favor replications of studies that have problems.

In particular consider item 6: “There was a weakness or flaw in the design, methods, or analysis of the original study.”

I’d think about it the other way: If a study is strong, it makes sense to try to replicate it. If a study is weak, why bother?

Here’s the point. Replication often seems to be taken as a sort of attack, something to try when a study has problems, an attempt to shoot down a published claim. But I think that replication is an honor, something to try when you think a study has found something, to confirm something interesting.

ESP, himmicanes, ghosts, Bigfoot, astrology, etc.: all very interesting if true, not so interesting as speculations not supported by any good evidence.

So I recommend changing items 3, 4, 5, and 6 of the National Academy of Sciences criteria. Instead of replicating studies with problems, let’s replicate the good studies.

To put it another way: The problem with the above guidelines is that they implicitly assume that if a study doesn’t have obvious major problems, it should be believed. Thus, they see the point of replications as checking up on iffy claims. But I’d say it the other way: unless a study’s design, data collection, and results are unambiguously clear, we should default to skepticism, hence replication can be valuable in giving support to a potentially important claim.


12 thoughts on “Instead of replicating studies with problems, let’s replicate the good studies. (Consider replication as an honor, not an attack.)”

  1. 5) There was potential bias in the original investigation, due, for example, to the source of funding.
    6) There was a weakness or flaw in the design, methods, or analysis of the original study.

    These are both true 100% of the time. So I guess they agree with me that if a study is worth doing once, it is worth replicating. Anything that will not be replicated can be considered a pilot study, which would be about 99.99% of what has been published in the last few decades.

  2. Hmmm…

    Personally I’m cool with replicating everything. Science isn’t a one-shot oh-we-solved-this-move-on game. Everything needs to be checked and confirmed from every angle. Interesting and important results often come from unexpected places. If a person has the replication bug, they should just pick what they like and get ‘er done.

    This is a good one:
    “3) The original result is particularly surprising, that is, it is unexpected in light of previous evidence and knowledge.”

    But we should also add:
    “9) The result is mundane and everyone thinks it’s a waste of time”

    For years biochemists knew of DNA but thought it was a boring and simple molecule that couldn’t possibly be the molecule that transmits genetic information.

  3. I don’t think item 4 indicates “problems” with a study.

    “Controversy” about a topic can stem from ideological differences or problematic research designs in previous papers. The minimum wage controversy is an example of such a topic. New studies on such controversial issues need not be problematic and they deserve to be replicated in order to settle the controversy more quickly.

  4. I think the question of whether limited replication resources should be devoted preferentially to shaky or solid seeming results depends in part on context concerning how the results are being received pre-replication. Let’s say we have two sets of results, A and B, with A more suspect than B. Let’s also say that the impact of A and B is similar if “true”. If people are acting on both A and B as if they’re true (e.g. using them to support controversial policy positions as in item 4 on the list or conducting followup research as in item 8), I would be more inclined to replicate suspect results A. The reason is that the results are already impacting the world as if they’re “true”, so replication will have much more of an impact if it debunks the results than if it supports them.

    I would hope that more solid seeming research typically has much more impact pre-replication and is therefore more worthy of replication than suspect research. But when suspect research is received the same as solid research, replication should focus on the suspect research.

  5. You might want new defaults, but as you frequently complain, researchers currently seem to act as if the opposite is the case.

    I think it might be helpful to think on the margin. If we imagine an alternate reality where people are replicating studies all the time, and succeeding 95% of the time, it wouldn’t be all that helpful to do yet another replication of a study that already has some successful replications. In your view, it’s wasteful to try replicating a lousy study we had no reason to believe was accurate (although I recall that someone successfully replicated the result of a faked study you said would be pointless to replicate). So presumably the optimal study to replicate is the one whose credibility is most ambiguous, so the next additional replication could do the most to shift our beliefs about its credibility.

    • The best study to do is the one whose utility of doing the study is highest… The utility is a mixture of how much information we gain from the study (to be able to make good decisions) and how important the decisions are we need to make (like decisions about surgical techniques that might kill people are much more important than about power-poses that might or might not slightly improve people’s moods before interviews or whatever).
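One way to make the two comments above concrete is a toy value-of-information calculation. This is only a sketch under stated assumptions, not anything proposed in the thread: it treats a claim as simply true or false, scores information as the expected drop in binary entropy of our belief after one replication, and uses made-up values for the replication's `power`, its `false_positive` rate, and a hypothetical `stakes` weight for how much the decision matters.

```python
import math


def entropy(p):
    """Binary entropy in bits; defined as 0 at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_information_gain(prior, power=0.8, false_positive=0.05):
    """Expected reduction in uncertainty (bits) about whether a claim is true
    after one replication attempt.

    prior          : current probability that the claim is true
    power          : assumed P(replication succeeds | claim true)
    false_positive : assumed P(replication succeeds | claim false)
    """
    p_success = prior * power + (1 - prior) * false_positive
    p_failure = 1 - p_success
    # Posterior belief after each possible outcome (Bayes' rule).
    post_success = prior * power / p_success if p_success > 0 else 0.0
    post_failure = prior * (1 - power) / p_failure if p_failure > 0 else 0.0
    expected_posterior_entropy = (
        p_success * entropy(post_success) + p_failure * entropy(post_failure)
    )
    return entropy(prior) - expected_posterior_entropy


def replication_utility(prior, stakes, power=0.8, false_positive=0.05):
    """Hypothetical utility: information gained, weighted by decision stakes."""
    return stakes * expected_information_gain(prior, power, false_positive)


if __name__ == "__main__":
    for prior in (0.05, 0.25, 0.5, 0.75, 0.95):
        gain = expected_information_gain(prior)
        print(f"prior={prior:.2f}  expected gain={gain:.3f} bits")
```

Under these assumptions the expected gain is largest for claims of intermediate credibility and shrinks toward zero as the prior approaches 0 or 1, which matches the "most ambiguous" argument; the `stakes` weight then lets a high-consequence question outrank a trivial one with the same ambiguity.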

  6. A replication attempt is an ongoing attack on the hypothesis. If you successfully replicate, you’ll be admitting that prior studies needed at least “more support”, which reveals that prior studies didn’t support the hypothesis as much as they thought they did.

    I don’t know why this is not obvious. In Bayesian terms, the scientific community has prior probabilities for theories. There’s no point in replicating studies that you think are highly probably true. If the community thinks that a long established theory is highly probably true, but I disagree with that because I think I have a better theory, then the payoff is immense (Nobel prize, hanging out with Taylor Swift, etc.).

    Every single one of those numbered criteria is just another way of saying “either you or someone whose opinion matters (e.g. policy makers) shows skepticism towards the hypothesis”.

    What we don’t have here is a bookmaker.

    • Koray:

      Maybe to you, “a replication attempt is an ongoing attack on the hypothesis.” But that’s not how I see it!

      Let me put it another way. In psychology, they sometimes talk about exact replications and conceptual replications. But others, including me, have argued that, in practice, all replications are conceptual replications. The conditions of the new experiment will always differ from the original, if in no other way than that they will be performed on different people at a different time.

      Traditionally, the main difference between “replications” and “conceptual replications” is not so much in the data collections as in the analysis. In a “replication,” you are allowed to consider the possibility that the original published study may have serious, even fatal, flaws in design, data collection, and analysis. In a “conceptual replication,” you are supposed to take all the claims in the original published study at face value.

      With that in mind, I guess my recommendation is:

      1. Think of all replications as conceptual replications;
      2. When doing conceptual replications, consider the possibility that the original published study may have serious, even fatal, flaws in design, data collection, and analysis;
      3. Don’t think of this as an “attack on the hypothesis.” Rather, conducting a new study is a way to learn. And we typically can learn more from following up on good studies than from following up on bad studies.

    • “A replication attempt is an ongoing attack on the hypothesis. If you successfully replicate, you’ll be admitting that prior studies needed at least “more support”, which reveals that prior studies didn’t support the hypothesis as much as they thought they did.

      “I don’t know why this is not obvious.”

      This is not the motivation for running replications at all. “Ongoing attack on the hypothesis” sounds like some cargo cult reason made up to explain a behavior people do not comprehend. The real reasons are to:

      1) Ensure you can communicate all the important aspects of the methods, demonstrating you understand the experimental conditions
      2) Check if the observation is stable to whatever differences arise due to varying time/location/etc

      Replications have to do with repeatable observations. The hypotheses motivating the observations are irrelevant.
