Wagenmakers et al. write:
A single experiment cannot overturn a large body of work. . . . An empirical debate is best organized around a series of preregistered replications, and perhaps the authors whose work we did not replicate will feel inspired to conduct their own preregistered studies. In our opinion, science is best served by ruthless theoretical and empirical critique, such that the surviving ideas can be relied upon as the basis for future endeavors. A strong anvil need not fear the hammer, and accordingly we hope that preregistered replications will soon become accepted as a vital component of a psychological science that is both thought-provoking and reproducible.
I don’t feel quite so strongly as E.J. regarding preregistered replications, but I agree strongly with his anvil/hammer quote, which comes at the end of a recent paper, “Turning the hands of time again: a purely confirmatory replication study and a Bayesian analysis,” by Eric-Jan Wagenmakers, Titia Beek, Mark Rotteveel, Alex Gierholz, Dora Matzke, Helen Steingroever, Alexander Ly, Josine Verhagen, Ravi Selker, Adam Sasiadek, Quentin Gronau, Jonathon Love, and Yair Pinto, which begins:
In a series of four experiments, Topolinski and Sparenberg (2012) found support for the conjecture that clockwise movements induce psychological states of temporal progression and an orientation toward the future and novelty.
OK, before we go on, let’s just see where we stand here. This is a Psychological Science or PPNAS-style result: it’s kinda cool, it’s worth a headline, and it could be true. Just as it could be that college men with fat arms have different political attitudes, or that your time of the month could affect how you vote or how you dress, or that being primed with elderly-related words could make you walk slower. Or just as any of these effects could exist but go in the opposite direction. Or, as the authors of those notorious earlier papers claimed, such effects could exist but only in the presence of interactions with socioeconomic class, relationship status, outdoor temperature, and attitudes toward the elderly. Or just as any of these could exist, interacted with any number of other possible moderators such as age, education, religiosity, number of older siblings, number of younger siblings, etc etc etc.
Topolinski and Sparenberg (2012) wandered through the garden of forking paths and picked some pretty flowers.
What happened when Wagenmakers et al. tried to replicate?
Here we report the results of a preregistered replication attempt of Experiment 2 from Topolinski and Sparenberg (2012). Participants turned kitchen rolls either clockwise or counterclockwise while answering items from a questionnaire assessing openness to experience. Data from 102 participants showed that the effect went slightly in the direction opposite to that predicted by Topolinski and Sparenberg (2012) . . .
No surprise. If the original study is basically pure noise, a replication could go in any direction.
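To see why, here’s a minimal simulation sketch. All the numbers below are made-up assumptions for illustration (not the actual study’s data): if the true effect is tiny relative to the measurement noise, the sign of a replication’s estimate is close to a coin flip.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values, assumed for illustration only
true_effect = 0.02   # tiny true difference between conditions
sd = 1.0             # per-observation noise
n = 51               # per group (~102 participants total, as in the replication)

reps = 10_000
flips = 0
for _ in range(reps):
    clockwise = rng.normal(true_effect, sd, n).mean()
    counter = rng.normal(0.0, sd, n).mean()
    if clockwise - counter < 0:   # estimate has the opposite sign to the true effect
        flips += 1

print(flips / reps)   # roughly 0.46: nearly a coin flip
```

With these assumed numbers, nearly half of all replications point the “wrong” way even though the true effect is positive, so a sign reversal in any one replication tells us almost nothing beyond “the effect, if any, is small relative to the noise.”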
Wagenmakers et al. also report a Bayes factor, but I hate that sort of thing so I won’t spend any more time discussing it here. Perhaps I’ll cover it in a separate post but for now I want to focus on the psychology experiments.
And the point I want to make is how routine this now is:
1. A study is published somewhere, it has p less than .05, but we now know that this tells us little to nothing at all.
2. The statistically significant p-value comes with a story, but through long experience we know that these sorts of just-so stories can go in either direction.
3. Someone goes to the trouble of replicating. The result does not replicate.
Let’s just hope that we can bypass the next step:
4. The original authors start spinnin and splainin.
And instead we can move to the end of this story:
5. All parties agree that any effect or interaction will be so small that it can’t be detected with this sort of crude experimental setup.
And, ultimately, to a realization that noisy studies and forking paths are not a great way to learn about the world.
Let me clarify just one thing about preregistered studies. Preregistration is fine, but it helps to have a realistic sense of what might happen. That’s one reason I did not recommend that those ovulation-and-clothing researchers do a preregistered replication. Sure, they could, but given their noise level, it’s doomed to fail (indeed, they did do a replication and it did fail in the sense of not reproducing their original result, and then they salvaged it by discovering an interaction with outdoor temperature). Instead, I usually recommend people work on reliability and validity, that is, on reducing the variance and bias of their measurements. It seems kinda mean to suggest someone do a preregistered replication, if I think they’re probably gonna fail. And, if they do succeed, it’s likely to be a type S error (a statistically significant estimate with the wrong sign), which is its own sort of bummer.
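The type S calculation can be sketched in a few lines, following the design-analysis logic of Gelman and Carlin. The effect size and standard error below are invented for illustration:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical numbers: a tiny true effect measured with lots of noise
true_effect = 0.05
se = 0.5
z = 1.96                      # two-sided .05 significance threshold

p_sig_right = 1.0 - phi(z - true_effect / se)   # significant, correct sign
p_sig_wrong = phi(-z - true_effect / se)        # significant, wrong sign
power = p_sig_right + p_sig_wrong
type_s = p_sig_wrong / power  # P(wrong sign | statistically significant)

print(round(power, 3), round(type_s, 3))   # power ~ 0.05, type S rate ~ 0.39
```

With these assumed numbers, the study has about 5% power, and conditional on reaching statistical significance the estimate has nearly a 40% chance of pointing in the wrong direction. That’s the bummer.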
I guess what I’m saying is:
– Short-term, a preregistered replication is a clean way to shoot down a lot of forking-paths-type studies.
– Medium-term, I’m hoping (and maybe EJ and his collaborators are, too) that the prospect of preregistered replication will cause researchers to moderate their claims and think twice about publishing and promoting the exciting statistically significant patterns that show up.
– Long term, maybe people will do more careful experiments in the first place. Or, when people do want to trawl through data to find interesting patterns (not that there’s anything wrong with that, I do it all the time), that they will use multilevel models and do partial pooling to get more conservative, less excitable inference.
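As a toy illustration of that last point, here’s the basic shrinkage idea behind partial pooling. Everything below is a made-up assumption, and the between-group variation is treated as known rather than estimated as a full multilevel model would do:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 8 subgroup estimates of an effect, each very noisy
tau = 0.1                     # assumed between-group sd (small real variation)
se = 0.5                      # assumed standard error of each raw estimate
true_effects = rng.normal(0.0, tau, 8)
estimates = true_effects + rng.normal(0.0, se, 8)

# Partial pooling: shrink each raw estimate toward the grand mean by the
# classic factor tau^2 / (tau^2 + se^2)
shrink = tau**2 / (tau**2 + se**2)
grand_mean = estimates.mean()
pooled = grand_mean + shrink * (estimates - grand_mean)

print(estimates.round(2))     # raw estimates: all over the place
print(pooled.round(2))        # partially pooled: much less spread out
```

When the measurements are noisy relative to the real between-group variation, the shrinkage factor is close to zero and the pooled estimates hug the grand mean, which is exactly the conservative, less excitable inference I’m talking about.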