“Replication initiatives will not salvage the trustworthiness of psychology”

So says James Coyne, going full Meehl.

I agree. Replication is great, but if you replicate noise studies, you’ll just get noise. Hence the beneficial effects on science are (a) to reduce confidence in silly studies that we mostly shouldn’t have taken seriously in the first place, and (b) to provide a disincentive for future researchers and journals to publish exercises in noise mining.
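
To see the noise-in, noise-out point in a toy example, here is a quick simulation sketch (the setup is made up for illustration, not taken from any particular study: a two-group comparison with a true effect of exactly zero, 50 subjects per group, and selection of “original” studies at p < .05). The selected originals look impressive only because of the selection; their replications are just fresh draws of noise centered at zero.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n = 50             # subjects per group (hypothetical)
    n_studies = 10_000

    def one_study():
        """Two-group comparison whose true effect is exactly zero."""
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(0.0, 1.0, n)
        diff = b.mean() - a.mean()
        se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
        p = 2 * norm.sf(abs(diff / se))   # two-sided normal approximation
        return diff, p

    originals = [one_study() for _ in range(n_studies)]
    # the "publishable" originals: significant at p < 0.05 purely by chance
    selected = [d for d, p in originals if p < 0.05]
    # independent replications of those same (null) effects
    replications = [one_study()[0] for _ in range(len(selected))]

    print(f"significant originals: {len(selected) / n_studies:.1%}")                  # about 5%
    print(f"mean |estimate|, selected originals: {np.mean(np.abs(selected)):.2f}")    # inflated by selection
    print(f"mean |estimate|, replications: {np.mean(np.abs(replications)):.2f}")      # sampling noise around zero

Replication here does exactly what it should: it deflates the selected estimates back toward nothing, which is useful, but it can’t turn the underlying noise into a discovery.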

In his article, Coyne discusses lots that we’ve seen before, but I find it helpful to see it all in one place. In particular, you get a sense that junk science is at the core of academic psychology. Yes, we can (and should) laugh at himmicanes etc., but within the field of psychology this is no joke.

To say that psychology has problems is not to let other fields off the hook. Biology is notorious for producing “the gene for X” headlines, political science and economics have informed us with a straight face that elections are determined by football games and shark attacks, and so on.

Or take statistics. I don’t think statistics journals publish so many out-and-out wrong results, but I would say that most articles published in our journals are useless.

And I’m not saying psychology is all bad, or mostly bad, just that the bad stuff is out there, continually promoted and defended by the Association for Psychological Science and leading journals in the field.

The only way I’d change Coyne’s article is to add a slam at the field of statistics, which in many ways has prospered by selling our methods as a way of converting uncertainty into certainty. I think we are very much to blame. See the end of this article (the second column on page 55).

18 thoughts on “Replication initiatives will not salvage the trustworthiness of psychology”

    • I’m not optimistic. The people funding these studies aren’t very careful about how their $$ get spent.

      Often, it is the same set of himmicane / power-pose people sitting on the funding committees. How can we expect change?

      • Not to mention that the huge number of undergrad psych majors undoubtedly accounts for a significant share of departmental revenue, and those students get into the field at least in part because of these (bad) headline studies.

        That reminds me. Can anyone in psych tell me what the field thinks about the landmark “experiments” that are so popular in the undergrad curriculum? I am mostly thinking of the Stanford prison experiment, which seems to be more of an art project than science. At least the way I learned it in intro to psych, there seemed to be no hypothesis at all, no actual data, and the lead researcher actually participated in the study (as the prison superintendent). Yet this was considered a major breakthrough in learning about the dark side of human nature.

        • It’s an interesting result qualitatively even if it’s pretty flawed from a statistical point of view; I (and likely others) think about it the same way I think about interview-based qualitative studies from other social sciences.

        • As a professional professor of psychology (of the cognitive stripe), but with training entirely outside psychology (computer science, primarily), I think it’s about 50/50, in two ways. About half of the people take these results largely uncritically as indicative of… something. And about half of us take about HALF of the classical results seriously. The Stanford Prison Experiment has always struck me as more or less a joke: how could we think that this is real science, or informative of anything? On the other hand, many classic results are very stable and can easily be replicated on demand within individuals “on the street” or in classroom settings (Stroop effects, for instance). So you do have to pick and choose. I try to avoid mentioning the stuff I can’t credit, unless I think it’s particularly important and possible to discredit it.

  1. Is there any academic field in which it’s *not* the case that most of the published articles are wrong, useless, or insignificant? Some fields avoid publishing wrong things (for the most part), but they rarely avoid the other two vices.

  2. I have very mixed views on this.

    Being a statistician with heavy exposure to the Bio world, I can certainly tell you that there is a lot of bogus research out there. Quite frankly, this is a result of the *need* to publish. Your $300k experiment didn’t work out as planned? Well, publish something or lose your job! Bad papers are an almost certain result of this environment. No increase in statistical education will ever fix that if expectations aren’t brought back down to earth at some point.

    On the other hand, despite the fact that we have lots of garbage coming out, we’ve also produced a system that churns out new, real, reliable technology at a mind-numbingly fast rate. To the point that what bio researchers learn as the latest and greatest in grad school becomes almost obsolete by the time they graduate.

    Personally, I don’t like this aspect of the academic environment. But I also have to admit that it does lead to extremely rapid development of real science. It’s just that there’s a whole lot more garbage to sift through.

    • Perhaps insiders have good heuristics for telling the bogus apart from the valid a priori?

      That would let the system still function well, because we can ignore the crap publications when we generate technology. I.e., there’s a set of papers catering to the publish-or-perish / funding-agency / tabloid-publicity markets, and they carry obvious markers that everyone else sees and ignores.

      • I wonder if the problem is made more extreme, though, by clickbait-headline incentives, which are a relatively new pull on scholars in the social media age.

        Yes, there are piles of garbage being put out in all fields as scholars have to publish something or perish to get tenure, but the incentives to produce the dreck that this article discusses seem to be a more specific response to the financial incentive structure created by the (new) potential for 15 minutes of internet fame from your garbage research. There’s a big difference between the social scientist publishing a sloppy study in the Journal of Southeast Asian Development Studies (insert generic 4th-tier journal title here) that nobody ever reads, just so she can fill out her CV, and the social scientist publishing an equally dodgy paper, turning it into a social media sensation, and giving a TED talk based on it. A very different beast entirely.

        We can’t get rid of all the garbage, but maybe we can rein in the public media bullshit projection of the garbage?

    • “On the other hand, despite the fact that we have lots of garbage coming out, we’ve also produced a system that churns out new, real, reliable technology at a mind-numbingly fast rate. To the point that what bio researchers learn as the latest and greatest in grad school becomes almost obsolete by the time they graduate.”

      I don’t see how you can be so confident in this. The primary method of assessing what is successful or not (NHST) is meaningless. For a while, in a field in the process of adopting NHST, this is no big deal; the p-values, etc., are just extra junk in the papers. But eventually the scientific safeguards (replication, precise prediction) are removed and NHST takes over. I’m not sure even most critics of that practice realize how insidious this is. It means people can follow the wrong path for decades without any indication that there is something wrong.

  3. The real reason that replication research won’t help the most problematic areas of psychology is that researchers in those fields have already bought into a cluster of interconnected measurements and methods that render much of the “existing knowledge base” self-sealing. It would require a genuine revolution in the field to criticize it even modestly. Unless the critics are prepared to show that there are no meaningful generalizations in many of the areas now studied, or at least that many of the methods and experiments are not measuring what they purport to measure, the “reform movement” will fail to improve the scientific status of psych (and related fields). The existing replication attempts haven’t gotten anywhere close to criticizing presumed links between the statistics and the inferences. The method on which they rely to show lack of replication (namely, that p-values in attempted replications are not small) does important work for them in suggesting that the original small p-values were due to QRPs, but the researchers are afraid to do more than whisper the possibility of this explanation.
    Coyne is much stronger in his construal (of non-replications). He mentions one thing I hadn’t heard of: he makes it sound as if there’s an (implicit?) agreement between psych societies and the top journals in psych not to pressure the journals to publish replication efforts, which are regarded as insufficiently novel. Is that true? But refuting entire types of psych tests and measurements would be quite novel. Of course, you can’t expect journals to go as far as I’m suggesting would be needed, because it could be suicidal.

    A link to a recent presentation: “The statistical replication crisis: paradoxes and scapegoats.” https://errorstatistics.com/2016/05/10/my-slides-the-statistical-replication-crisis-paradoxes-and-scapegoats/
