
Historical critiques of psychology research methods

David Lockhart writes:

I found these two papers in, of all places, the presentation that Emil Kirkegaard and John Fuerst are giving in London this weekend, which they claim is preventing them from responding to the can of worms they opened by publishing a large, non-anonymized database of OKCupid dating profiles. This seems like it may become an important case in research ethics and data privacy. You may want to look into it. I recommend starting with this post by Oliver Keyes, but Vox, Vice, and Thomas Lumley have all picked up the story.

At any rate, the culprits cite these two papers that look quite good, and the second is the lead-in to a whole special issue on cumulative science from Psychological Methods in 2009.

Frank Schmidt (1996) “Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers”
(Ha! Hosted by Mayo – I hope she’s already pointed you to it)

Patrick Curran (2009) “The seemingly quixotic pursuit of a cumulative psychological science”

In addition, I’d point you to Allen Newell’s work in his paper “You can’t play 20 questions against nature and win” from 1973 and his book _Unified Theories of Cognition_ from 1990 where he revisits and reiterates the idea. Here’s a link to the 1973 paper:

The 20 questions paper is a summing-up from a colloquium and heavily references the papers presented there, with little re-presentation of them, but I think the larger point stands on its own. Also, you may be familiar with the Chase & Simon paper being discussed: it studied chunking and other phenomena in the memory of chess experts, both taking de Groot’s work further and showing that the use of larger chunks of related pieces accounts for much of the memory advantage that expert players have for board positions.

A comment on the enterprise of surveying this intellectual history. We can find papers like Schmidt and Curran that approvingly cite Meehl and note that the criticisms still stand. We can find others like Newell making similar points but not citing Meehl (or being cited by Schmidt or Curran). We might be able to find papers pushing back against Meehl and also look at how much they get cited. But it seems likely that we can’t find the many psychologists who are unaware of Meehl’s criticism or do not think it is relevant to their own work and so don’t bother commenting on it. And I suspect that’s where the real problem lies.

One conjecture I have is that the root of the problem is that exploratory analysis is something people spend more time working on, thinking about and reading about than confirmatory data analysis, for example:

I actually think the graph in that last link is misleading, as it is my impression that the expression “confirmatory data analysis” is a backformation that exists only in reference to “exploratory data analysis.” So it makes sense that people mostly won’t be talking about confirmatory data analysis, even when they’re doing it. They’ll just talk about hypothesis testing or whatever. For more on this, see my 2010 discussion of exploratory and confirmatory data analysis. And here’s my recent discussion of Meehl.

Lockhart continues:

This combines with a change in the standards of what constitutes a publishable unit, allowing publication of what were previously regarded as partial, preliminary results. (I’ve seen this assertion made but don’t know where, and finding a source has been hard.) It’s not necessarily a bad thing to allow some scientists to specialize in EDA, or to let them publish an EDA of potential importance and then decide they would rather pursue something else themselves. But it becomes a problem when the rewards – both extrinsic and intrinsic – are tilted so far towards the EDA, and doing just the EDA becomes so common, that everyone seems to forget that the pre-registered confirmation is a crucial piece of the knowledge-discovery process, not just something to ignore until you get all the way to evaluating real-world interventions.

You might also check out this article, in which Angela Duckworth complains that her work on “grit” as a predictor of success is being unfairly stuck with a failure-to-replicate label based on supposed replications from folks who don’t seem to understand her theory:

And finally if you aren’t aware of it, check out Norm Breslow’s 2003 paper _Are statistical contributions to medicine undervalued?_

Amongst other things, on page 4 Breslow discusses the variety of findings submitted by different teams on a class project to analyse an open question with real epidemiologic data. (I was one of the students in that class and am working on writing up something about that experience.) I know that Nosek has since done something similar, probably with more senior researchers.

I don’t really have anything to say on this, but I thought all the above links might interest some of you.


  1. Jeff McLeod says:

    As I’ve mentioned before, I was fortunate to have taken Meehl’s philosophy of psychology class, and I spent time talking with him about these issues outside of class. I have heard that Meehl met with considerable resistance when publishing his ideas on statistical significance testing. Notice that he tended to publish in law and philosophy journals; I think he did this out of frustration, though I could be wrong. As I said once before, Meehl seemed resigned to the foolishness. He thought an honest grad student would either embrace methodological rigor or go sell shoes. Most chose not to embrace methodological rigor, yet stayed in psychology anyway.

    I asked him once whether Bayesian methods could help fix all of this. He said the answer is not a statistical methodology. The answer is clarity in thinking about the scientific enterprise.

    I suspect that if he were on this topic thread, Meehl would recommend reading the neo-Popperian Imre Lakatos, starting with Lakatos and Alan Musgrave’s *Criticism and the Growth of Knowledge*. It was a book he strongly recommended to me; my copy is marked up beyond recognition. A fantastic philosophy-of-science perspective on how knowledge accumulates (or fails to accumulate) in a discipline. Meehl described himself as a neo-Popperian and thought that Lakatos was raising the right questions. He loved the methodological anarchist Paul Feyerabend, which I always found surprising. But apparently they were friends.

    Also, in the earlier Meehl thread, it was remarked that Meehl’s work was not cited in Kahneman’s book. True, but in journal publications, Kahneman credits Meehl with his (Kahneman’s) career. He said it was reading Meehl’s “Clinical versus Statistical” that made him turn to research on cognitive bias that won him the Nobel Prize.

  2. Keith O'Rourke says:

    Nice classroom demonstration of forking paths in the Breslow paper:

    “When I looked closely at the reasons for the dramatically different conclusions, however, it seemed clear that they were the result of a chain of effectively arbitrary choices made with regard to scales of measurement, cut points for discretization, and variables selected as potential confounders.”
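    The effect in that quote is easy to reproduce in simulation. Below is a minimal, hypothetical sketch (not Breslow’s data or analysis): one dataset with no true relationship between x and y, where each arbitrary cut point chosen to discretize x is a fork that yields a different test statistic, even though the underlying data never change.

```python
# Forking-paths sketch: the same null data, analyzed under different
# arbitrary discretization cut points, gives different z statistics.
import math
import random

random.seed(1)

# Simulate one dataset with NO true relationship between x and y.
n = 200
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]

def two_sample_z(cut):
    """Dichotomize x at `cut`, then compare mean y across the two groups."""
    hi = [yi for xi, yi in zip(x, y) if xi > cut]
    lo = [yi for xi, yi in zip(x, y) if xi <= cut]
    if len(hi) < 2 or len(lo) < 2:
        return None  # a cut this extreme leaves too few observations
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        m = mean(v)
        return sum((vi - m) ** 2 for vi in v) / (len(v) - 1)
    se = math.sqrt(var(hi) / len(hi) + var(lo) / len(lo))
    return (mean(hi) - mean(lo)) / se

# Each cut point is one "effectively arbitrary choice" in the analysis chain.
for cut in (-0.5, 0.0, 0.5, 1.0):
    print(f"cut at {cut:+.1f}: z = {two_sample_z(cut):+.2f}")
```

    With enough such choices (scales, cut points, confounder sets), different teams analyzing the same data can honestly land on noticeably different conclusions without any single choice looking wrong.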

  3. Cory Giles says:

    Is it actually true that in psychology, the “rewards – both extrinsic and intrinsic – are tilted so far towards the EDA”? In biology, EDAs are basically unpublishable, at least in any journal I know of. Although I suppose it depends on the exact definition of the term.

    I would like to see EDAs more easily publishable in biology, but I can easily imagine that the field would become swamped with noise if the incentives actually favored them over experimental data. Does this account for some of the noise coming out of psychology?

  4. Marcus says:

    Angela Duckworth’s claims about the wonders of grit do not replicate at all, as we showed in our meta-analysis of the grit literature.
    She also made some pretty serious statistical errors in her original papers (e.g., misunderstanding odds ratios, reporting results for unidentified models), so she probably led herself astray to some degree.

    • To her credit, Duckworth acknowledges errors in her work. I see a bigger problem when people make errors but claim that they aren’t really errors, or that the supposed hypothesis wasn’t the “real” one, or that everyone else misunderstands the topic, or that hundreds of new unpublished studies support the original findings.

  5. Didn’t notice this post before. Curious that John Fuerst gets the blame here. He was entirely uninvolved with that project.

    For the record, there is an FAQ about that project here, for those curious:
