My sense is that commonness is assessed via the likelihoods, with the prior set to be common across all studies. Mike Evans disagrees: he thinks the different priors that may have been used should also be assessed for commonness and appropriately pooled.

Broadly speaking, I think that exact replications of positive results (or replications with more data) are not massively useful. At their very best they are a tiny corner of the space of interesting things. And I think this paper bears this out! (Also, it’s not done under a hypothesis testing framework, which is always nice.)

Like chairs, tables, and governments, the real object of interest is stability.

Thanks for the post.

Regarding “x% of experiments in social psychology don’t replicate”: A major concern here, even beyond what you mentioned, is that it’s generally a mistake to think of “replicate” as a true/false statement, in part because whether a result is statistically significant (the usual standard for “successful replication”) is itself so noisy, and in part because it’s typically a mistake to think of an experiment as having just one result. The way we handled this in the replication study we did of one of our earlier papers was to state explicitly that our results did not boil down to a single finding. We wrote:

We began this study with no particular concern about [our earlier published] results, but a bit of replication would, we believe, give us a better sense of uncertainty about the details. In addition, one can always be worried about opportunistic interpretations of statistical results. . . .

And:

[The analysis in our original paper was] exploratory (and thus a replication cannot be simply deemed successful or unsuccessful based on the statistical significance of some particular comparison).

Just in general, we shouldn’t take off our statistical-common-sense hat just because we’re talking about replication. A replication may be preregistered, but being preregistered doesn’t mean turning yourself into a Neyman-style Stepford wife of statistical methods. Just as we can do randomized experiments without restricting our analyses to classical t-tests and Anovas, we can do random sampling without restricting our analyses to narrow classical sampling-theory procedures. Preregistration, random assignment, and random sampling are great *design ideas* that go well with sophisticated and careful analyses as well as with off-the-shelf classical procedures.
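The point above about statistical significance being a noisy replication criterion can be seen in a quick simulation. This is a minimal sketch with made-up illustrative numbers (an effect of 0.4 standard deviations and n = 50 per group are assumptions chosen to give roughly 50% power; they are not from any study discussed here):

```python
# Simulate how noisy "replication success = statistical significance" is.
# The effect size (0.4 sd) and per-group sample size (n = 50) are illustrative
# assumptions, not numbers from any study discussed in the post.
import numpy as np

rng = np.random.default_rng(1)
d, n, sims = 0.4, 50, 100_000

# For a two-sample comparison with known sd = 1, the z statistic for the
# difference in means is distributed Normal(d / se, 1), with se = sqrt(2 / n).
se = np.sqrt(2 / n)
z_orig = rng.normal(d / se, 1.0, sims)  # "original" studies
z_rep = rng.normal(d / se, 1.0, sims)   # exact replications, same design

sig_orig = np.abs(z_orig) > 1.96
sig_rep = np.abs(z_rep) > 1.96

# Among originals that reached significance, what fraction of exact
# replications also reach significance?
rep_rate = sig_rep[sig_orig].mean()
print(f"power of each study:        {sig_orig.mean():.2f}")
print(f"replication 'success' rate: {rep_rate:.2f}")
```

Both numbers come out near 0.5: with roughly 50% power, a perfectly real effect “fails to replicate” about half the time under the significance criterion, which is the sense in which that criterion is itself so noisy.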

(The above paragraphs do not represent any disagreement with your post; rather, take them as elaborations of your points.)
