Rob Wilbin writes:
I made this quiz where people try to guess ahead of time which results will replicate and which won’t, in order to give them a more nuanced understanding of replication issues in psychology. Based on this week’s Nature replication paper.
It includes quotes and p-values from the original study if people want to use them, and we offer some broader lessons on what kinds of things replicate and which usually don’t.
You can try the quiz yourself. Also, I have some thoughts about the recent replication paper and its reception—I’m mostly happy with how the paper was covered in the news media, but I think there are a few issues that were missed. I’ll discuss that another time. For now, enjoy the quiz.
Well, I got the first two right, but I was somewhat unclear about the questions, the second one in particular.
OK, I went through 10 examples of the total, as you have the option to stop at 10. Each correct answer is worth 2 points, and I scored 18 so far. I compared how I did before seeing the detailed stats and after seeing them, and I answered correctly before seeing the details too. So again, I think some results should be obvious whether or not you have the details.
If the replication results in an effect in the same direction as the original, but of smaller size, shouldn’t that give some pause? The quiz seems to presume that as long as the replication shows a statistically significant effect in the same direction as the original, it’s a successful replication. Wouldn’t an effect of substantially smaller size cast doubt on (a) the importance of the finding and (b) the stability of the results?
So do you mean that there is an inherent validation of significance testing for purposes of replication? That was my question initially.
To add: without first looking at the statistical details, I got the same answers as when I consulted them. That makes me focus on the quality of the hypotheses put forth.
“If the replication results in an effect in the same direction of the original, but of smaller size, shouldn’t that give some pause?”
Not really; that is, in fact, what one would expect. See http://statmodeling.stat.columbia.edu/2009/11/21/type_m_errors_a/
Indeed, because Type I errors are so likely to give inflated effects, one could consider one purpose of a replication to be getting a better estimate of effect size.
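The inflation point above can be seen in a quick simulation (my own illustration, not from the thread; the true effect, sample size, and number of simulations are all made-up numbers): when a true effect is small relative to sampling noise, the estimates that happen to clear the p < 0.05 bar are systematically larger than the truth, so a well-powered replication should be expected to find a smaller effect.

```python
import math
import random

random.seed(1)

TRUE_EFFECT = 0.2   # true standardized effect (assumed for illustration)
N = 25              # per-group sample size, deliberately underpowered
SIMS = 5000

# Standard error of a difference between two group means with unit sd
se = math.sqrt(2.0 / N)

significant_estimates = []
for _ in range(SIMS):
    estimate = random.gauss(TRUE_EFFECT, se)  # noisy estimate of the effect
    z = estimate / se
    if abs(z) > 1.96:  # "statistically significant" at the 5% level
        significant_estimates.append(estimate)

mean_sig = sum(significant_estimates) / len(significant_estimates)
inflation = mean_sig / TRUE_EFFECT
print(f"mean significant estimate is {inflation:.1f}x the true effect")
```

The filtering step is the whole story: conditioning on significance selects the lucky draws, which is why a same-direction-but-smaller replication result is the expected outcome rather than a red flag.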
The Scholar’s Stage blog comments on this…
‘How did I score so well? My predictions followed a rough rule of thumb: if the study 1) involved “priming,” or 2) seemed to fly against my own experience dealing with humans in day to day life, I predicted it would not replicate.’
http://scholars-stage.blogspot.com/2018/09/so-why-did-they-publish-them-few-notes.html
From the Nature article: “Rand cautions that a failure to replicate should not be seen as an automatic invalidation of the original study — who, after all, is to say which of the two outcomes is correct?”
This is true, but if you accept publication bias as a factor in failed replications, that framing makes it sound like more of a shrug and a coin flip than it is. To adapt Andrew’s time-reversal heuristic: if the replication had happened first, how likely is it that it would have been published?
I thought one study had identifiable biases. The one that stuck out was the study about reading “fiction” and looking at a picture.
I think what happens in each field is that, for a time, a circle has disproportionate influence and control by virtue of like-minded members who have gained tenure and prestige. I have observed this throughout my life, and in some cases echo chambers form. There may be debates within those circles, but it is an interesting process to watch how far any member deviates from the main goals and objectives of the circle or its governing bodies. I see this more in political science circles.