Why did it take so many decades for the behavioral sciences to develop a sense of crisis around methodology and replication?

“On or about December 1910 human character changed.” — Virginia Woolf (1924).

Woolf’s quote about modernism in the arts rings true, in part because we continue to see relatively sudden changes in intellectual life, not merely from technology (email and texting replacing letters and phone calls, streaming replacing record sales, etc.) and power relations (for example arising from the decline of labor unions and the end of communism) but also ways of thinking which are not exactly new but seem to take root in a way that had not happened earlier. Around 1910, it seemed that the literary and artistic world was ready for Ezra Pound, Pablo Picasso, Igor Stravinsky, Gertrude Stein, and the like to shatter old ways of thinking, and (in a much lesser way) the behavioral sciences were upended just about exactly 100 years later by what is now known as the “replication crisis” . . .

The above is from a new paper with Simine Vazire. Here’s the abstract:

For several decades, leading behavioral scientists have offered strong criticisms of the common practice of null hypothesis significance testing as producing spurious findings without strong theoretical or empirical support. But only in the past decade has this manifested as a full-scale replication crisis. We consider some possible reasons why, on or about December 2010, the behavioral sciences changed.

You can read our article to hear the full story. Actually, I think our main point here is to raise the question; it’s not like we have such a great answer. And I’m sure lots of people have raised the question before. Some questions are just worth asking over and over again.

Our article is a discussion of “What behavioral scientists are unwilling to accept,” by Lewis Petrinovich, for the Journal of Methods and Measurement in the Social Sciences, edited by Alex Weiss.

37 thoughts on “Why did it take so many decades for the behavioral sciences to develop a sense of crisis around methodology and replication?”

  1. It seems to me that you have already outlined a set of assumptions and a select history of crises and methodology tailored to the general question you pose.

    I think it is worth reviewing Steven Goodman, Daniele Fanelli, and John Ioannidis’s “What Does Research Reproducibility Mean?” My understanding, from a few academics in Boston, is that John Ioannidis’s work accelerated the queries to which we have been exposed in the last 15 years.

    Just off the cuff notes for now.

  2. I would spin a variation on the second-to-last paragraph in the article. Perhaps concern or revulsion gradually built among researchers as they saw colleagues pursuing power-pose or Wansink-like career models: leveraging media coverage of sexy claims into promotion within institutions. What is special about 2010 is that this model had become (or appeared to be) more viable, due to changes in the media stemming from the rise of the internet.

    The modest difference in my story is that it is not that the research got worse–it was more or less as dismal as ever–but rather the institutions got worse. And methodological reform was just the tool that was found to reform the institution.

    Which is perfectly fine. But it leaves a risk that perhaps, broadly speaking, researchers are content with a quiet walk through the garden of forking paths, as long as they don’t have to hustle to get on the Today show to get respect. If so, methodological reforms may not stick.

    • Fixed; thanks.

      Also, the Menand article is excellent, as Menand’s articles typically are. I’d forgotten how credulous of junk science were people like Steven Pinker who at the same time had reputations for hard-core skepticism. I once briefly discussed with Pinker the problems with evolutionary psychology research. My impression was that he was open to criticism of such work but that he kinda wanted the empirical claims to be true because they jibed with his theoretical take on human nature. I felt he was being too generous to this work because the empirical evolutionary psychology I’d seen was able to produce arguments that could go in any direction, so I don’t think these Psychological Science or PNAS-style claims were as much in support of his theoretical views as he thought.

        • Matt:

          Menand’s dad was an instructor in the political science department at MIT and I took a class from him. It was my second-favorite class in college, so I think it’s so cool that the son has done such good work.

        • I like this bit:

          “The notion is that a particular arrangement must have been “selected for”—as though the struggles among individuals and groups and ideas were nature’s way of making sure that we end up with the best.”

          Kind of amusing that he’s ridiculing natural selection as a mechanism for explaining human failings: Well, we’re just going to defund the police anyway! Take *that*, natural selection! People aren’t naturally violent! The police are just here to “validate the practices and preferences of [the] regime [that] happens to be sponsoring them.”

          As though! Hilarious.

      • The trouble with being a “big ideas” person is that you have to keep finding them, and finding true, pithy, surprising ideas happens more slowly than the rate at which you need to publish big-idea books to earn a living. And trade publishers rarely bother to fact-check: an editor with a BA in English or History could have caught some of the howlers in “The Better Angels of Our Nature.”

        As far back as psychology has been an academic discipline in the United States, practitioners have been torn between practicing a SERIOUS SCIENCE and providing practical wisdom to solve the sorrows of life. Both the polygraph and William Moulton Marston emerged in the interwar US (and the most famous academic psychologist in North America right now is a Jungian!). If your goal is practical wisdom, rigorous methods get in the way of giving your clients clear, confident instructions (and all the fun stories keep your undergraduates entertained; they serve as what Latin rhetoricians called exempla: they wrap up a moral in a vivid story).

      • “I’d forgotten how credulous of junk science were people like Steven Pinker who at the same time had reputations for hard-core skepticism.”

        +1

        “How the Mind Works” begins with some compelling stuff about Turing Machines and related issues, well written and accessible. Then, having perhaps established some credibility, Pinker flips and spends the rest of the book describing his favorite theories of human behavior as established fact.

        A modest goal for humanity: eventually we need to realize that we will never unravel the mysteries of how we evolved unless the answers lie in the hard science of genetics.

        • Matt said,
          “A modest goal for humanity: eventually we need to realize that we will never unravel the mysteries of how we evolved unless the answers lie in the hard science of genetics.”

          Not sure whether you mean “hard” in the sense of “difficult” or in the sense of “natural sciences” as opposed to “social sciences”.

  3. The internet. People who were mid-20s grad students around 2010 were born in the mid 80s and grew up using the internet to learn.

    They had access to a much wider array of views that the “authoritative” textbooks could not compete with. It really is exactly like the printing press leading to the Reformation and the loss of Catholic Church authority. It is still early days, but this pandemic seems to have catalyzed the process.

  4. I think that some adverse selection is at work. When it is difficult to tell the true quality of work (but the authors have better information), people assume that all work is of average quality (a more nuanced view would break this down along some dimensions, such as prestigious universities, 2nd tier, etc.). This provides a premium for work that is below average quality (the above-average work ends up being taken “off the market,” which in academic terms may mean moving to consulting firms, think tanks, etc.), so the average quality declines. And so on, until it becomes the famous “market for lemons,” where the only research appearing in journals is the worst quality. Since there are some ways to ascertain true quality (subject to considerable uncertainty), that extreme outcome is tempered somewhat. But I do think we have forces at work that are likely to lead to a general worsening of the quality of what is produced. So, the problem has arguably been getting worse. (A small simulation sketch of this unraveling appears at the end of this thread.)

    Add in the ability to access data and code, and we have an increasing ability to monitor the quality, ex post. So we have a better ability to recognize this worsening crisis.

    Much harder is to figure out what to do about it.

    • That is the Gresham’s law effect. Bad research pushes out good because it is so much easier to produce a string of significant p-values than to come up with a reproducible method or a quantitative model that makes precise predictions.

      I’m not sure if the good research is still getting produced elsewhere and “hoarded” or not. But I suspect people who would have produced the good research mostly just end up doing something different. Something that gets rewarded by society.
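    A small simulation sketch of the adverse-selection (“market for lemons”) dynamic described above, under toy assumptions that are mine, not the commenter’s: quality is uniform on [0, 1], readers and journals can only reward work at the current average perceived quality, and anyone whose work is worth more than that reward takes it elsewhere each round.

```python
import numpy as np

rng = np.random.default_rng(0)
quality = rng.uniform(0, 1, 10_000)  # true quality of each project; readers cannot observe it directly

for round_num in range(8):
    if quality.size == 0:
        print("market has unraveled: no work left")
        break
    average = quality.mean()  # readers/journals can only "price" work at the perceived average
    print(f"round {round_num}: {quality.size} projects remain, average quality {average:.3f}")
    # producers whose work is worth more than the average reward take it elsewhere
    quality = quality[quality <= average]
```

    Each round the best remaining work exits, the average falls, and the next tier becomes the new “above average” work to leave; the printed averages roughly halve every round.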

  5. I think there is one factor that you allude to but didn’t clearly identify in your paper. Given Upton Sinclair’s famous quote, “It is difficult to get a man to understand something, when his salary depends on his not understanding it,” I think we have to ask: are there now people who can make a career, or part of a career, out of understanding those errors in behavioral science studies, to offset those whose careers depend on not understanding them? I think there are now. In the past, maybe some researcher could make a career out of debunking a particularly widely accepted theory. But now there are a number of researchers (the Reproducibility Project, Retraction Watch, etc.) who can make a career out of finding the methodological errors in bad studies, and not even particularly prominent studies. And that is a game changer. If a researcher now realizes that, post publication, someone is going to look for errors in her paper and publicize them, she’s going to start to want to know how to avoid those errors.

  6. At the bottom of p. 1 of Andrew’s preprint, they write, “The behavioral sciences are at a different stage of methodological self-awareness than they were in 1990 or 1970, when Campbell, Cronbach, Meehl, and others appeared to represent only a small minority of the profession when lamenting the mismatch between statistical methods and substantive theory in the social and behavioral sciences.” I would love to see data supporting (or, as the case may be, contradicting) the apparent claim that such people no longer “represent only a small minority”.

    • Russ:

      It would be good to get some data on this! Even thinking about potential data is helpful, in that it motivates me to think more carefully about what we meant when we wrote that sentence. I agree that only a small minority of the profession are lamenting anything; maybe the point is that 30 or 50 years ago, most of the profession were either unaware of any methodological controversies or thought of such controversies as peripheral to the main concerns of psychology research. But now I have the impression that many research psychologists are aware that debates over methods are very important, at least for some subfields of psychology. So it’s not that lots and lots of researchers are currently lamenting, but I do think that lots of researchers recognize the salience of these concerns.

      To put it another way: Back in the old days, Cronbach, Meehl, etc., were eminent and respected, but it’s my impression that most researchers in the field thought of their methodological concerns as being fringe issues. Now I think the methodological reformers represent a big chunk of the profession, including many students who don’t want to spend their time producing flashy unreplicable results, as well as many behavioral scientists who aren’t part of the Psychological Science / PNAS loop and don’t really enjoy that air rage, himmicanes, etc., are such prominent aspects of the public face of psychology research.

      • Andrew,

        I’d be very happy to see data related to these (differently expressed) impressions as well. By the way, serious methodological complaints about the use of statistics in the social sciences go back at least to Keynes in 1939.

        • Russ:

          Sure, I didn’t mean to imply that Meehl etc. were the first. The relevance of them to our story is that they were raising these concerns within psychology.

        • Andrew,

          According to https://www.britannica.com/science/behavioral-science , “Behavioral science, any of various disciplines dealing with the subject of human actions, usually including the fields of sociology, social and cultural anthropology, psychology, and behavioral aspects of biology, economics, geography, law, psychiatry, and political science. The term gained currency in the 1950s in the United States; it is often used synonymously with “social sciences,” although some writers distinguish between them. The term behavioral sciences suggests an approach that is more experimental than that connoted by the older term social sciences.”

  7. As someone who worked quite a bit with students in Psychology during this period, and was a fan of Meehl, Cohen, and others from decades prior, I’d like to point out two things I observed that were influential in this change.

    One of the bigger influences was R. Psychologists who had no expertise in it and never actually used it were still strongly influenced. The rise of R meant the rise of easy simulation. The Simmons et al. paper you mention was a critique that relied heavily on simulation, and it was one of the earliest times that non-programmer students could easily verify those simulations and generate variations of them for themselves. I found that students were very moved by arguments from simulation, especially when they could see so many of the moving parts that would have been opaque to a non-programmer in the past. (A sketch of this kind of simulation appears at the end of this comment.)

    The file drawer effect has more repercussions than people think. It’s not just that researchers observed a lot of junky work being done; people who fail to replicate a finding and don’t publish those non-replications also talk among their colleagues. This generates dissent and dissatisfaction and a large, heretofore unpublished group of individuals who know many findings are BS but can’t really say anything, for a variety of reasons, including a lack of language tools they can use to discuss the matter. I think it’s better to consider that many individual researchers had good prior evidence that some particular finding was untrue. Then the False-Positive Psychology papers hit and they had some support. They had better tools that allowed them to argue those individual findings. And then a dam burst. And people realized that what they knew about something very specific turned out to be true across a variety of studies. More and more evidence mounted and… you have the crisis. It’s not that individual researchers were convinced by the articles you mention as foundational to the crisis, but that eventually individual efforts spurred by those papers collectively produced evidence of one.

    I’d also like to point out that, even though there has been a shift, it is often a very problematic one. In the review process I frequently see the abuse and misuse of some of the lessons from replication crisis founders based on a lack of deeper understanding of the issues or, worse, using a shallow interpretation as a cudgel to quash alternative views. So, all is not rosy in this reformation.
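    A minimal sketch of the kind of simulation argument described above, in Python rather than R and with made-up settings (two groups with no true difference, two outcome measures, the freedom to report either one, and the option to collect more data when nothing is significant yet); these parameters are purely illustrative and are not taken from the Simmons et al. paper. Even with every null hypothesis true, the share of “significant” studies comes out well above the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_study(n=20, n_extra=10):
    """One pure-noise study with a few researcher degrees of freedom."""
    a = rng.normal(size=(n, 2))  # control group, two outcome measures, no real effect
    b = rng.normal(size=(n, 2))  # "treatment" group, also pure noise
    # degree of freedom 1: report whichever outcome measure reaches p < .05
    if any(stats.ttest_ind(a[:, k], b[:, k]).pvalue < 0.05 for k in range(2)):
        return True
    # degree of freedom 2: if nothing is significant, collect more data and test again
    a = np.vstack([a, rng.normal(size=(n_extra, 2))])
    b = np.vstack([b, rng.normal(size=(n_extra, 2))])
    return any(stats.ttest_ind(a[:, k], b[:, k]).pvalue < 0.05 for k in range(2))

false_positive_rate = np.mean([one_study() for _ in range(5_000)])
print(f"share of pure-noise studies reporting p < .05: {false_positive_rate:.1%}")
```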

    • “In the review process I frequently see the abuse and misuse of some of the lessons from replication crisis founders based on a lack of deeper understanding of the issues or, worse, using a shallow interpretation as a cudgel to quash alternative views. So, all is not rosy in this reformation.”

      Paul Feyerabend wrote at length about this issue and offers many examples in Against Method where the ‘sound, rational’ scientist in a position of authority used ‘reason, observations, and facts’ to allegedly disprove competing new theories, labelling the champions of such theories as silly. The punchline is that the silly scientists (and the theories they believed) turned out to be closer to the truth than what the eminent scientists were vigorously defending with so-called reason/science.

      This, among other things, led P.F. to believe that when it comes to science anything goes.

      It is a terrible issue to contend with. It gets even worse when one realizes that observations are themselves quite often infected by the theories we already believe (see Against Method…though P.F. is a bit strong and says this is always the case, which is obviously false). When the empirical data we rely on to appraise new theories are already infected by the theories we already hold dear, it’s easy to see how things become a sticky mess. At what point are we being rational, critical scientists versus being dogmatic theorists clinging to our entrenched beliefs (even if we do not believe (or want) to be doing so)? That division is not as clear as people would like it to be (or even as clear as this very blog seems to imply at times).

      • AllanC said,
        “At what point are we being rational, critical scientists versus being dogmatic theorists clinging to our entrenched beliefs (even if we do not believe (or want) to be doing so)? That division is not as clear as people would like it to be (or even as clear as this very blog seems to imply at times).”

        +1

        (says one person who uses nested parentheses in writing to another)

  8. The existence of a “crisis” and the perception of a “crisis” are not necessarily one and the same.

    What makes a crisis in this case might be some combination of greater awareness of a problem, more focus on highlighting the problem, and exploitation of the problem for any variety of reasons.

    I’ve often wondered re the “replication crisis,” just how the descriptor of “crisis” is defined.

    Sure, there’s seemingly more evidence that a lot of studies don’t reproduce, but: (1) there are technical problems with many of the attempts at reproduction, which don’t necessarily “prove” that the original studies (or their conclusions) were invalid. (2) Does that mean that a higher proportion of formal research has become less valid over time, or just that we’re producing more research and more of it is being investigated for (and failing) reproducibility? And (3), given that it often takes time for research findings to have a societal impact, does the existence of a replication “crisis” mean some kind of net increase in “harm” from sub-optimal research over time, or merely that we’re more aware of the methodological problems in research, while in the end the same percentage of solid findings stands the test of time as more or less happened in the past? (A rough back-of-the-envelope calculation relevant to point (2) appears at the end of this thread.)

    I think that there may be a proliferation of “crises” in general, just as there may be a growing sense among people that they’re being victimized, even if, when we decouple the signal from the noise, we’re on an upward trajectory in the positive benefits accrued by society from research, along with an upward trajectory in the number and variety of people who have more agency to influence their life circumstances.

      • > Auto-correct gremlin?

        Nope. Just the unclear writing gremlin. Just saying that I think that the “crisis” label is often over-used, and I think may be over-used in this case in particular. Why is it a “crisis?”
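    On point (2) above: a back-of-the-envelope calculation in the spirit of Ioannidis (mentioned in an earlier comment) shows that a large share of “significant” findings can be false positives even if research practices never changed, so a wave of failed replications need not mean the work got worse. The inputs below (10% of tested hypotheses true, 50% power, a .05 threshold) are illustrative assumptions, not estimates from any study.

```python
# Back-of-the-envelope positive predictive value under assumed, illustrative numbers.
prior_true = 0.10  # assumed share of tested hypotheses that are actually true
power = 0.50       # assumed probability of detecting a true effect
alpha = 0.05       # conventional significance threshold

true_positives = prior_true * power          # true effects that come out significant
false_positives = (1 - prior_true) * alpha   # null effects that come out significant anyway
ppv = true_positives / (true_positives + false_positives)

print(f"share of significant results reflecting real effects: {ppv:.0%}")
# With these inputs, roughly half of the "significant" findings are false positives,
# before adding any p-hacking or publication bias.
```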

  9. It might also be worth looking at the subfield of surveying people about their private lives and presenting the results of the survey as facts about people in general. That kind of research can get published in academic journals and recycled in “Psychology Today” and similar venues, even though thoughtful people have long known that people lie and misremember about their private lives, and so this kind of survey data is worse than useless. Bob Altemeyer is most famous for his work on the authoritarian personality, but his other major field of research was conducting surveys of his students which were designed to avoid the problems of the usual approach (and which, he was explicit, could not be extrapolated beyond undergraduates in Manitoba).

    So this is another case where outsiders have recognized that the methods psychologists are using are not strong enough to back their claims, but insiders ignore the criticisms and publish each other’s papers because people like to gossip about other people’s private lives.

    • Vagans said, “So this is another case where outsiders have recognized that the methods psychologists are using are not strong enough to back their claims, but insiders ignore the criticisms and publish each other’s papers because people like to gossip about other people’s private lives.”

      Snarky and cynical, but (sadly) probably true.

  10. Late to this post, but my two cents: I suspect a major reason things changed is that the people in charge retired/died. Suppose we stipulate that a career is about 40 yrs on average, and that it takes 20 years to become influential. Then, if attitudes began changing in 1970, grad advisors and professors would begin introducing these ideas to their grad students then. The second generation came to power in 1990, and their grad students assumed leadership in 2010. This suggests it took about two generations to propagate the ideas from training to practice, among a critical mass of practitioners. Not bad, in the grand scheme of things. Maybe we’ll be rid of pay journals by 2030?
