The replication and criticism movement is not about suppressing speculative research; rather, it’s all about enabling science’s fabled self-correcting nature

Jeff Leek points to a post by Alex Holcombe, who disputes the idea that science is self-correcting. Holcombe writes [scroll down to get to his part]:

The pace of scientific production has quickened, and self-correction has suffered. Findings that might correct old results are considered less interesting than results from more original research questions. Potential corrections are also more contested. As the competition for space in prestigious journals has become increasingly frenzied, doing and publishing studies that would confirm the rapidly accumulating new discoveries, or would correct them, became a losing proposition.

Holcombe picks up on some points that we’ve discussed a lot here in the past year. Here’s Holcombe:

In certain subfields, almost all new work appears in only a very few journals, all associated with a single professional society. There is then no way around the senior gatekeepers, who may then suppress corrections with impunity. . . .

The bias against corrections is especially harmful in areas where the results are cheap, but the underlying measurements are noisy. . . .

Leek counters:

Wait, I thought there was a big rise in retraction rates that has everyone freaking out. Isn’t there a website just dedicated to outing and shaming people who retract stuff? I think registry of study designs for confirmatory research is a great idea. But I wonder what the effect would be on reducing the opportunity for scientific mistakes that turn into big ideas. This person needs to read the ROC curves of science. Any basic research system that doesn’t allow for a lot of failure is never going to discover anything interesting.

I think Leek might be missing the point here. Nobody is suggesting “not allowing” for the publication of scientific mistakes. The same old trial and error pattern can continue, in which researchers are allowed, even encouraged, to publish speculative, high-risk high-return sorts of research. The point is to suggest there should be fewer barriers to the publication of criticism of this sort of high-profile publication. As it is now, you can make a dramatic claim based on a couple of survey questions filled out by 150 Mechanical Turk participants, publish it in Psychological Science, and get press around the world. Meanwhile, criticism of this work ends up appearing in blogs, etc. The most notorious example was that Bem paper where the journal that published his ridiculous paper refused to publish the failed replications.

So, yes, let’s give people the chance to publish and fail, even to embarrass themselves! In the grand scheme of things, a scientist who publishes 9 failures and one major success may well be making more of a positive contribution than an otherwise equally placed scientist who publishes 10 boring minor incremental contributions, even if they happen to be correct. But let’s move that process forward and make it easier for people to point out the mistakes in those 9 failures sooner. Science can be self-correcting, but that “self-correcting” process is the result of actions of many individuals in the system, and right now I do agree with Holcombe that the barriers to the publication of replications and criticism are too high.

Leek does have a point, though, that critics such as Holcombe (and me!) don’t provide any statistical data to back up our claims. I have a general feeling (supported by data collected by researchers such as Uri Simonsohn) that lots of shaky research gets published, and I have a lot of anecdotal evidence, but that’s about it.

But, hey, that’s fine! As noted above, I don’t oppose the publication (in journals or in blogs) of speculation, as long as the source of the relevant data (in this case, all my anecdotes) is made clear. And others can criticize. (Indeed, this blog has an active comment section, and when people email me with longer comments, I often post them as full entries to give readers both sides of the story.)

Open debate will not solve all problems. But let me get back to the key point: as Holcombe says, the self-correction that is central to the scientific process can be slowed down or it can be helped along. Working to improve the self-correction process is not equivalent to “reducing the opportunity for scientific mistakes that turn into big ideas.” Rather, it’s about getting to those big ideas more effectively.

P.S. One issue that came up in the discussion is that a published criticism could well be a lesser piece of work, compared to original research.

Indeed, if I think a paper is flawed and I go to the trouble of criticizing it, I probably won’t want to put as much effort into the criticism as the original researcher put into the published paper. After all, I think the work is flawed! It’s probably not worth a huge amount of effort. Still, I think the criticism should be published, and I think it’s a big mistake when journals reject criticisms because they’re not major research contributions.

One problem, I think, is that publication in journals is not just about the science, it’s a scarce measure of value that’s used in hiring, promotion, grant review, etc. So journal editors have the same attitude about restricting publication as the U.S. Treasury has about printing a few more dollars.

With that issue in mind, I’d have no problem if journals were to flag corrections and replications, so that tenure committees etc would recognize that these are not original research. So, for example, if you publish a replication or non-replication of a Cell article and this replication or non-replication appears in Cell too (this is under my ideal new regime where it is encouraged for people to publish criticisms and replications in the same journal as published the original article), you won’t get credit for “a Cell article.” For example, the official journal title for the article could be Cell Criticisms and Replications, or something like that. This way, people don’t have to worry that the credit for their original research is being diluted by endless trivial criticisms and replications. But science as a whole would benefit, because these criticisms and replications would be right out there in the literature for anyone to read.

34 thoughts on “The replication and criticism movement is not about suppressing speculative research; rather, it’s all about enabling science’s fabled self-correcting nature”

  1. > critics such as Holcombe (and me!) don’t provide any statistical data to back up our claims

    But you simply do not currently have access to the data, and when that changes (see below) the distribution of shaky research published will almost surely change: “By God Jim, they can check our raw data!”

    “Anecdotally, of course, we hear of other such circumstances [shaky research],” he said, adding: “The truth is we don’t really know what the full scope of the problem is.”

    http://www.medpagetoday.com/PublicHealthPolicy/ClinicalTrials/43995?isalert=1&uun=g646841d638R5778811u&utm_source=breaking-news&utm_medium=email&utm_campaign=breaking-news&xid=NL_breakingnews_2014-01-27

  2. Really interesting post!

    There appears to be a bit of a pattern: the people who lean the hardest on the claim that science is self-correcting simultaneously tend to favor practices that cripple the correction process. There is a somewhat perverse logic at work here, as in “We have decided not to publish your correction to the flawed paper we published but don’t worry….science is self-correcting so your point will surely prevail in the end.” After all, why should anyone waste journal space on corrections which will ultimately be corrected anyway since science is automatically self-correcting? I can see why Andrew devotes time to correcting journalistic errors since these won’t get corrected unless someone like Andrew takes action. But, Andrew, why bother critiquing science? You should focus on creating new science and just let the self-correcting nature of science take care of itself.

    This situation brings to mind the old joke about the economist who doesn’t pick up the money she finds on the sidewalk because if there was money on the sidewalk someone would have already picked it up.

    By the way, I once had a back-and-forth with an editor at Public Opinion Quarterly, which had published a paper filled with substantive errors. This finally ended when he informed me that he had consulted with the top editor and learned that they have a policy of not correcting errors. Well, why should they? The errors I was concerned about have not yet been corrected anywhere else but I’m sure they will be.

    • Mike:

      My impression is that, if you were to ask the people like Leek who feel that I’m overdoing it with the whole criticism thing, they would say that they don’t mind when I criticize bad work within my own subfield (for example, if someone does a crappy analysis of voting patterns and makes a false and unsupported claim about income and voting, or if someone proposes a multiple imputation algorithm that doesn’t make sense, or if someone proposes an estimate for hierarchical variance parameters that has poor statistical properties), but they find it a bit weird that I waste time criticizing work in areas such as social psychology that are far from my own research interests.

      Consider the classic movie Death Wish. In the first half of the movie, Charles Bronson shoots muggers who are threatening him. In the second half, he goes out and looks for muggers to shoot. I think the part of my anti-fraud blogging that seems unseemly to some people is the aspect of vigilantism.

      • “I think the part of my anti-fraud blogging that seems unseemly to some people is the aspect of vigilantism.”

        –The Death Wish reference was not the one I was expecting on this blog. Maybe now I understand why I love coming here just a little bit better.

        +1

  3. ….sorry….one more point.

    I think that the scientific community also has a culture against replications and critiques. It’s not just that journal editors are covering their butts and looking for new, headline-grabbing research. There is also a widespread view that doing critiques or replications is inappropriate behavior. I think this attitude is present in the Mina Bissell article you blogged on recently.

    http://statmodeling.stat.columbia.edu/2013/12/17/replication-backlash/

    The idea seems to be that you might tarnish people’s reputations by challenging their work, so good researchers shouldn’t go there. The truth doesn’t seem to be a big priority of people who think this way.

    The reluctance of journals to publish critiques probably reinforces the moral suasion against writing critiques. There are all sorts of flawed papers out there that don’t draw a critique. So if you then publish a critique of one flawed paper among many, people can ask why you’re picking on just this one research team. Is it personal? And, indeed, if your critique is successful then it probably will tarnish the reputations of the authors of the original paper.

    If research followed by critique were a common pattern, then science could self-correct and people could publish speculative work that eventually fails without there being much of an effect on their reputations.

  4. One point that puzzled me is: even when a critique is published on an article, why don’t journals combine the PDFs? That is, an unsuspecting reader shouldn’t be able to download an article without noticing the critique/rebuttal, etc. Most often the rebuttals are published as separate letters in subsequent issues, compounding the basic problem.

  5. “The point is to suggest there should be fewer barriers to the publication of criticism of this sort of high-profile publication. As it is now, you can make a dramatic claim based on a couple of survey questions filled out by 150 Mechanical Turk participants, publish it in Psychological Science, and get press around the world. Meanwhile, criticism of this work ends up appearing in blogs, etc. The most notorious example was that Bem paper where the journal that published his ridiculous paper refused to publish the failed replications.”

    I’d like to think that this is where new, peer reviewed, open-access journals like PLoS and PeerJ (particularly PeerJ, since it’s reasonably priced) would come into play. PeerJ explicitly states that they encourage replications of experiments.

    “Replication experiments are encouraged (provided the rationale for the replication is clearly described); however, we do not allow the ‘pointless’ repetition of well known, widely accepted results.”

    This, hopefully, is a step in the right direction, although it’s certainly not as good as (failed) replication studies being published in the same highly regarded journal as the original findings. But at least now there are more peer-reviewed venues for these sorts of things. I also hope that these open-access journals take off, such that a person who publishes exclusively in PLoS and PeerJ is regarded as highly as a person who publishes in the standard journals. I don’t know if that’s going to happen, but it’d be nice.

    There’s also the issue that, once an article is retracted, people who read the original article may not know it and may continue to cite it. Notice of retraction may never make it to the reader: http://scholarlykitchen.sspnet.org/2012/08/10/the-secret-life-of-retracted-articles/

  6. I’d think authors should welcome comments and criticisms because it will help with their citation counts.

    Anecdotally speaking, a number of people (including one commenter on this blog I don’t have the patience to look up) have said that their poorer work (not necessarily wrong, just a poor approach to a problem) gets a lot of citations because everyone rushes in to do better and cites the original work. In that case, perhaps the original work is groundbreaking, so really shouldn’t be considered poor in the first place. You have to start somewhere.

    Having worked in the trenches of natural language processing for 20+ years, I can say without a doubt that a lot of shoddy work gets published; some of it’s even had my name on it. It’s quite common to see quantitative corrections to results due to bugs in scoring scripts (I’ve even had to do it myself). And almost none of the reported p-values for “my system is better than system X” hold up, because the authors don’t seem to realize that the items (words, phrases, or sentences) over which they’re calculating are highly correlated because they’re drawn from a single document.
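
    To make the correlation point concrete, here is a minimal simulation sketch (an editorial illustration, not from the comment; all numbers and variable names are invented): when item-level score differences share a document-level effect, a naive per-item test inflates the false-positive rate relative to a test that first aggregates to one score per document.

    ```python
    # Minimal sketch (illustrative assumptions only): per-item score differences
    # between "system A" and "system B" share a document-level random effect,
    # so treating items as independent overstates significance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_docs, items_per_doc, n_sims = 50, 40, 2000
    naive_p, clustered_p = [], []
    for _ in range(n_sims):
        doc_effect = rng.normal(0.0, 1.0, size=n_docs)  # shared within each document
        diffs = doc_effect[:, None] + rng.normal(0.0, 1.0, size=(n_docs, items_per_doc))
        naive_p.append(stats.ttest_1samp(diffs.ravel(), 0).pvalue)           # items as independent
        clustered_p.append(stats.ttest_1samp(diffs.mean(axis=1), 0).pvalue)  # one score per document
    print("false-positive rate, naive per-item test: ", (np.array(naive_p) < 0.05).mean())
    print("false-positive rate, per-document test:   ", (np.array(clustered_p) < 0.05).mean())
    ```

    The true difference between the systems is zero in this simulation, so both rates should sit near 0.05; the naive per-item test comes out far higher.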

  7. The need for a “replication and criticism movement” within the sciences is a bit like the need for a “faith movement” within the church. Nonetheless, it is, sadly, necessary. I can understand why people find it shocking and appeal to the “self-correcting” nature of science. But Andrew is entirely right that the sort of publication issues he’s talking about are precisely the (all-too-absent) mechanism of self-correction. (I liked Mike Spagat’s way of putting this above.) While science is still conducted in the vague “spirit” of critique, it seems to lack a body. It’s forever there in theory, but there is too little practical criticism.

    • Thomas:

      One thing I’d like to address (maybe it should be a separate post) is a subtext of this discussion, which is that I’m continually being pointed to bad studies which then I’ll criticize or even, on occasion, mock; whereas Jeff Leek, Mina Bissell, and the psychology professor mentioned at the beginning of this article would prefer to focus on the good news, the studies that are done by honest, competent, and non-self-deluded researchers who are making legitimate research contributions that are ultimately leading to new technologies and deeper scientific understanding.

      To state it quickly, my basic position is that the good scientists shouldn’t mind that sloppy science is getting exposed for what it is. But their take, I think, is that sloppy science is a distraction: 90% of everything may well be crap, but what’s the point of talking about it? My quick response to that response is that I don’t think the editors of Psychological Science or the American Sociological Review would agree that 90% of everything they publish is crap!

      Ultimately, though, I don’t know if the approach of “the critics” (including myself) is the right one. What if, every time someone pointed me to a bad paper, I were to just ignore it and instead post on something good? Maybe that would be better. The good news blog, just like the happy newspaper that only prints stories of firemen who rescue cats stuck in trees and cures for cancer. But . . . the only trouble is that newspapers, even serious newspapers, can have low standards for reporting “cures for cancer” etc. For example, here’s the Washington Post and here’s the New York Times. Unfortunately, these major news organizations seem often to follow the “if it’s published in a top journal, it must be correct” rule.

      Still and all, maybe it would be best for me, Ivan Oransky, Uri Simonsohn, and all the rest of us to just turn the other cheek, ignore the bad stuff and just resolutely focus on good news. It would be a reasonable choice, I think, and I would fully respect someone who were to blog just on stuff that he or she likes.

      But I do think that we, “the critics,” are making a useful contribution in our criticism. Consider things from the standpoint of the editorial board of Psychological Science, or Science, or Nature, or PNAS, or whatever. One might think it would make even more sense for them to focus on the good news and not publish the crap. But they can’t seem to do it! Which suggests to me that this whole “separate the good stuff from the crap” job is worth some research effort.

      • There’s an even more basic point, I think. The so-called “growth” of scientific knowledge consists largely in the correction of false beliefs previously held. E.g., Copernicus, e.g., Galileo, e.g., Einstein. To say that these people made “positive” contributions is really to miss the whole point of what they did. E.g., Darwin.

        No serious gardener would say “never mind weeding, just focus on growing flowers”.

      • Andrew,

        I think you’re missing a big part of the subtext. Classical statisticians know that classical statistics is going to pick up the lion’s share of the stink from all these failures. They want to mute the perception of failure enough to where they don’t have to reevaluate some of their deepest held philosophical positions.

        I don’t think that is a good move on their part, since the failures are so common and so visible that every non-scientific person has reached the point where they summarily dismiss any “study” or “finding” without even looking at it. For example, nutritionists are in the process of reversing just about every major piece of nutritional advice of the last 50 years – including that famous food pyramid we were all taught in school.

        When the failures are that visible, insisting that everything is OK and that all we need to do is tweak the system a little isn’t going to fly. But hey, if that’s what they want to claim, then fine. It’s not really a Bayesian’s job to save them from the consequences of their errors.

      • Please keep being critical. If people thought “what will Andrew Gelman say about this?” before the paper is sent off, that would be good. That said, I often wonder why politicians say particular things when they know Jon Stewart will be critical.

      • > 90% of everything may well be crap but what’s the point of talking about it
        Until they or a loved one is diagnosed with a serious disease and only weeds are growing in research areas that might have informed treatment of _that_ disease.

        Or as Xiao-Li put it, “Such a “personalized situation” emphasizes that it is my interest/life at stake, which should encourage us to think more critically and creatively, not just to publish another paper or receive another prize.”
        http://www.stat.harvard.edu/Faculty_Content/meng/COPSS_50.pdf

        Also, this is why I am much more interested in strongly encouraging better research (more careful, documented, and shareable) in the future, and nothing provides stronger encouragement than sunshine that can shine into every detail of what was done and why (restricted to qualified third parties if needed).
        (Not going to try to work this into Thomas’ gardening metaphor.)

        Perhaps there is a widely held belief that scientists (like me) do not need such encouragement, and that suggesting they do tarnishes the interests of all scientists?

  8. I think we need a Replication Charter laying out pledges by authors, journals, hiring committees and others in the academic science “system”.

    This would include replication and correction policies at journals, academic depts., donors, grant institutions, etc.

    Kind of a Fair Trade mark or, more obtrusively, a certification process that elements in the system can pledge to. A measure of excellence.

  9. Thanks Andrew, you’re right that in my Edge piece (http://www.edge.org/response-detail/25445) I wasn’t suggesting that ALL studies must be registered to be published; in fact, I myself do largely exploratory research in rapid iterative spurts of experiments that don’t lend themselves so well to preregistration. My point about registration is that it should be more common, to reduce p-hacking, and yes, my main point is that the barriers for criticisms of published and non-replicated work to become available are too high. There are many innovations (e.g. PubMed Commons, and our psychfiledrawer.org) that we must support for science to live up to its self-correcting moniker.

  10. One way to look into this question of how right or wrong scientific papers are is to review progress in fields with enormous improvements in technology, such as genomics. Contemporary 2014 physical anthropologists looking to write the prehistory of humanity have access to tremendous data unavailable to their predecessors. So, how did earlier physical anthropologists do in their big books, such as Carleton Coon’s in 1965 or LL Cavalli-Sforza’s in 1994?

    I’d say not bad to pretty good, but that’s my subjective judgment; others could differ. The point is that we could review a number of different fields that have enjoyed huge progress to get some sense of how wrong people were in the past.

    • Wait, isn’t revisiting old conclusions in light of totally new lines of evidence a completely different thing than correcting outright technical errors in the literature? And aren’t both of those quite different than, say, how skeptical to be of implausible claims based on small samples?

      This is one thing I’m slightly uneasy about with the whole replication/reproducibility/post-publication criticism movement. Too often, people seem to run together totally different issues. I don’t think it’s useful to lump together all reasons why a scientific claim might be wrong under a single heading, whether that heading is “replication” or “reproducibility” or whatever.

      • I agree with you but in one sense (utilitarian) it doesn’t matter. If I need to use your result it only matters if it is right. If wrong, it doesn’t matter so much as to why.

        Whether that regression is wrong due to a typo or to malice, the most important thing is to let past & potential users know.

  11. If wrong findings don’t need to be corrected, if we don’t care if they are wrong or right, then social science must be inconsequential.

    I can think of some Congressmen willing to stop wasting tax money financing inconsequential pursuits.

  12. Regarding the desire for something more than anecdotal critiques, I ran a “test for excess success” on all papers in Psychological Science with 4 or more experiments. The key finding is that 82% (36 of 44) of the articles appear to have too much success, relative to a standard criterion. Details at

    http://www1.psych.purdue.edu/~gfrancis/Publications/Francis2014PBR.pdf

    There is a link within the article to supplemental material that describes the full analysis for each article.
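
    For readers who want the flavor of the calculation, here is a rough sketch of the general idea only (not the exact procedure in the linked paper; the effect size and sample sizes are hypothetical): estimate each experiment’s power from a pooled effect size, then ask how probable it is that every experiment in the set would come out significant.

    ```python
    # Rough sketch of a "test for excess success": if the product of the
    # estimated powers is small, a run of all-significant experiments is
    # surprising. Effect size and sample sizes below are hypothetical.
    import numpy as np
    from scipy import stats

    def two_sample_power(d, n1, n2, alpha=0.05):
        """Approximate power of a two-sided two-sample t-test for standardized effect d."""
        ncp = d / np.sqrt(1.0 / n1 + 1.0 / n2)   # noncentrality parameter
        df = n1 + n2 - 2
        crit = stats.t.ppf(1 - alpha / 2, df)
        return stats.nct.sf(crit, df, ncp) + stats.nct.cdf(-crit, df, ncp)

    d_pooled = 0.45                                          # hypothetical pooled effect size
    experiments = [(20, 20), (25, 25), (30, 30), (22, 22)]   # per-group n's for four experiments
    powers = [two_sample_power(d_pooled, n1, n2) for n1, n2 in experiments]
    print("estimated powers:", np.round(powers, 2))
    print("P(all four significant):", round(float(np.prod(powers)), 3))
    # A small probability (e.g. below 0.1) flags "excess success."
    ```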

  13. Very interesting/insightful blog, Andrew!

    Is it possible that this conversation points towards a 20th/21st century intellectual tragedy of the commons? Criticizing other work may be against a scientist’s immediate best interest (i.e. playing nice can make life easier), although in the long run the disincentives for “conflict” catch up with the whole group by making it acceptable to churn out shaky results, and emboldening the idea that all science is highly biased. Also, as Bob suggested, the one who may benefit from questioning of a paper (in the context of the importance of citation metrics) is the original paper’s author… It takes time to formulate a critique of flawed work, and unless one has a proper fix to the issues they raise, posing criticism may be seen as wasted energy at best, and aggressive at worst.

    It may be that academia is not crowded enough to incentivize individuals to publicly dispute others’ work (or their own previous work). Unlike many business arenas where the most efficient producer/servicer can continue to accumulate market share, academia seems to allow for many people to carry on doing parallel work with incremental (and sometimes contradictory) advances. This last point, I would argue, encourages people to ‘toe the line’ and keep criticism to themselves… possibly out of fear that a challenge can be taken personally and turned into a political conflict.

    As a student I have only a few anecdotes to draw from when pondering such issues, but I think this type of conversation is important, and that a “self-correcting” scientific environment is not a given.

  14. Sorry to come in late, but I have to correct a factual misstatement in Andrew’s piece.

    “The most notorious example was that Bem paper where the journal that published his ridiculous paper refused to publish the failed replications.”
    I am the current editor of that journal (Journal of Personality and Social Psychology: Attitudes and Social Cognition), although I was not the editor when the Bem paper was published. In fact, in 2012 this journal published a paper reporting several failures to replicate, and also including a meta-analysis of not only the authors’ own studies but also all the attempted replications they could locate at the time. The meta-analytic combined effect size was not reliably different from zero.

    (In my own opinion, publishing this type of corrective paper is much more useful for the progress of science, compared to publishing numerous attempted replication papers individually, each one of which would be picked up in the press as: “The Bem effects replicate!” “No they don’t!”)

    Galak, Jeff; LeBoeuf, Robyn A.; Nelson, Leif D.; Simmons, Joseph P. Correcting the past: Failures to replicate psi.
    Journal of Personality and Social Psychology, Vol 103(6), Dec 2012, 933-948.
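
    For readers unfamiliar with the terminology, here is a minimal sketch of the kind of “combined effect size” calculation described above, a fixed-effect, inverse-variance-weighted meta-analysis; the effect sizes and standard errors are invented for illustration, not values from the Galak et al. paper.

    ```python
    # Fixed-effect, inverse-variance-weighted meta-analysis (illustrative numbers only).
    import numpy as np
    from scipy import stats

    effects = np.array([0.10, -0.05, 0.02, -0.08, 0.04])  # per-study effect sizes (hypothetical)
    ses = np.array([0.06, 0.07, 0.05, 0.08, 0.06])        # per-study standard errors (hypothetical)

    w = 1.0 / ses**2
    combined = np.sum(w * effects) / np.sum(w)
    combined_se = np.sqrt(1.0 / np.sum(w))
    z = combined / combined_se
    p = 2 * stats.norm.sf(abs(z))
    print(f"combined effect = {combined:.3f} (SE {combined_se:.3f}), z = {z:.2f}, p = {p:.3f}")
    # "Not reliably different from zero" means the interval combined +/- 1.96*SE includes zero.
    ```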

    • Why the asymmetry? Why only publish meta-analyses for replications? To be consistent, why not do the same for original hypotheses?

      I mean, right now the press reads: “Women wear red when menstruating”, “women wear blue when menstruating”, “women wear [pick color] when menstruating”. You can also apply this to health: “Eating x amount of [favourite food] a day reduces [pick disease] significantly in a sample of 400,000 nurses”. And so on.

      Moreover, who cares what the press says?

    • Eliot:

      Here’s what I was talking about. As Fernando writes, the real question is, why is the Bem paper publishable while each individual non-replication isn’t? In part, I suspect, because the Bem results are newsworthy, that is, surprising, that is, unlikely to be true (that is, to represent patterns in the general population under replication). Hence the paradox, that an untrue, unreplicable claim is more publishable than a true, replicable (but boring) result. It’s understandable (we wouldn’t, for example, want physics journals to be full of confirmations of gravity, that objects fall down and not up) but still somehow disturbing.
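
      One way to make the “surprising, that is, unlikely to be true” point concrete is a positive-predictive-value calculation in the style of Ioannidis; the prior, power, and alpha values here are illustrative assumptions, not estimates for any particular literature.

      ```python
      # P(claim is true | significant result) as a function of the prior probability
      # that the claim is true. Power and alpha are illustrative assumptions.
      def ppv(prior, power=0.8, alpha=0.05):
          return (power * prior) / (power * prior + alpha * (1 - prior))

      for prior in (0.5, 0.1, 0.01):   # boring claim, surprising claim, Bem-level surprising
          print(f"prior = {prior:.2f} -> P(true | significant) = {ppv(prior):.2f}")
      ```

      Even with decent power, a significant result for a claim with a 1% prior probability is still probably false, which is one way to cash out the paradox described above.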

  15. Interesting article. But is Leek not using “scientific mistakes” to refer to failed hypotheses and leaving himself with no name for REAL mistakes? Crappy work from low standards, weird ideas, or lousy, mindless off-the-shelf experiments that everybody does and that don’t mean anything, plus fraud. Apropos your point, Andrew, about criticisms of papers involving a lot less time and effort than the paper criticized, and thus the quality concerns there: how ’bout a science specialty in methodology? People whose only work is making rigorous, valid critiques. They would have the time. Crit making would be professionalized (in the good, not busywork, sense), and what constitutes a competent critique would come to be known to everyone. For instance, the difference between giving an accurate explication of the ideas in a critiqued paper before saying what the critic thinks is wrong with it, and a hatchet job. These methodology and critique specialists would do only that. They would not be in professional competition with the scientists doing the research. They would train as scientists and then do this, kind of like scientists who go into science history and are history profs. Methodology, a specialty within science.

  16. Pingback: Post-publication “review”: signs of the times | Dynamic Ecology

  17. Pingback: All that is published does not replicate

  18. Pingback: Weekend reads: A psychology researcher’s confession, a state senator’s plagiarism | Retraction Watch
