Post-publication peer review: How it (sometimes) really works

In an ideal world, research articles would be open to criticism and discussion in the same place where they are published, in a sort of non-corrupt version of Yelp. What is happening now is that the occasional paper or research area gets lots of press coverage, and this inspires reactions on science-focused blogs. The trouble here is that it’s easier to give off-the-cuff comments than detailed criticisms.

Here’s an example. It starts a couple years ago with this article by Ryota Kanai, Tom Feilden, Colin Firth, and Geraint Rees, on brain size and political orientation:

In a large sample of young adults, we related self-reported political attitudes to gray matter volume using structural MRI. We found that greater liberalism was associated with increased gray matter volume in the anterior cingulate cortex, whereas greater conservatism was associated with increased volume of the right amygdala. These results were replicated in an independent sample of additional participants. Our findings extend previous observations that political attitudes reflect differences in self-regulatory conflict monitoring . . .

My reaction was a vague sense of skepticism, but I didn’t have the energy to look at the paper in detail so I gave a sort of sideways reaction that did not criticize the article but did not take it seriously either:

Here’s my take on this. Conservatives are jerks, liberals are wimps. It’s that simple. So these researchers can test their hypotheses by more directly correlating the brain functions with the asshole/pussy dimension, no?

A commenter replied:

Did you read the paper? Conservatives are more likely to be cowards/pussies as you call it – more likely to jump when they see something scary, so the theory is that they support authoritarian policies to protect themselves from the boogieman.

The next month, my coblogger Erik Voeten reported on a similar paper by Darren Schreiber, Alan Simmons, Christopher Dawes, Taru Flagan, James Fowler, and Martin Paulus. Erik offered no comments at all, I assume because, like me, he did not actually read the paper in question. In our blogging, Erik and I were publicizing these papers and opening the floor for discussion, although not too much discussion actually happened.

A couple years later, the paper by Schreiber et al. came out in a journal and Voeten reblogged it, again with no reactions of his own. This time there was a pretty lively discussion with some commenters objecting to interpretations of the results, but nobody questioning the scientific claims. (The comment thread eventually became occupied by a troll, but that’s another issue.)

More recently, Dan Kahan was pointed to this same research article on “red and blue brains,” blogged it, and slammed it to the wall:

The paper reports the results of an fMRI—“functional magnetic resonance imagining”— study that the authors describe as showing that “liberals and conservatives use different regions of the brain when they think about risk.” . . .

So what do I think? . . . the paper supplies zero reason to adjust any view I have—or anyone else does, in my opinion—on any matter relating to individual differences in cognition & ideology.

Ouch. Kahan writes that Schreiber et al. used a fundamentally flawed statistical approach in which they basically went searching for statistical significance:

There are literally hundreds of thousands of potential “observations” in the brain of each study subject. Because there is constantly varying activation levels going on throughout the brain at all time, one can always find “statistically significant” correlations between stimuli and brain activation by chance. . . .

Schreiber et al. didn’t discipline their evidence-gathering . . . They did initially offer hypotheses based on four precisely defined brain ROIs in “the right amygdala, left insula, right entorhinal cortex, and anterior cingulate.” They picked these, they said, based on a 2011 paper [the one mentioned at the top of the present post] . . .

But contrary to their hypotheses, Schreiber et al. didn’t find any significant differences in the activation levels within the portions of either the amygdala or the anterior cingulate cortex singled out in the 2011 Kanai et al. paper. Nor did Schreiber et al. find any such differences in a host of other precisely defined areas (the “entorhinal cortex,” “left insula,” or “Right Entorhinal”) that Kanai et al. identified as differing structurally among Democrats and Republicans in ways that could suggest the hypothesized differences in cognition.

In response, Schreiber et al. simply widened the lens, as it were, of their observational camera to take in a wider expanse of the brain. “The analysis of the specific spheres [from Kanai et al.] did not appear statistically significant,” they explain, “so larger ROIs based on the anatomy were used next.” . . .

Even after resorting to this device, Schreiber et al. found “no significant differences . . . in the anterior cingulate cortex,” but they did manage to find some “significant” differences among Democrats’ and Republicans’ brain activation levels in portions of the “right amygdala” and “insula.”
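To get a feel for the search problem Kahan is describing, here is a minimal sketch in Python. It is not the analysis from Schreiber et al. or from Kahan's post: the subject count, the voxel count, and the function names (`pearson_r`, `count_chance_hits`) are all made up for illustration, the "voxels" are pure noise, and the test uses the Fisher z approximation rather than an exact t-test.

```python
import math
import random

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def count_chance_hits(n_subjects=40, n_voxels=2000, z_crit=1.96, seed=0):
    """Count pure-noise voxels that pass a two-sided test against party.

    All numbers here are hypothetical; nothing is taken from the paper.
    """
    rng = random.Random(seed)
    # Alternate party labels so the two groups are the same size.
    party = [i % 2 for i in range(n_subjects)]
    hits = 0
    for _ in range(n_voxels):
        # Each "voxel" is independent noise, unrelated to party by construction.
        voxel = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
        r = pearson_r(voxel, party)
        # Fisher z approximation to the significance test for a correlation.
        z = math.atanh(r) * math.sqrt(n_subjects - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits

if __name__ == "__main__":
    print(count_chance_hits(), "of 2000 noise voxels pass the 5% threshold")
```

With a 5% threshold, roughly one noise voxel in twenty should pass, so any large enough search will turn up "significant" regions even when there is nothing there.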

And it gets worse. Here’s Kahan again:

They selected observations of activating “voxels” in the amygdala of Republican subjects precisely because those voxels—as opposed to others that Schreiber et al. then ignored in “further analysis”—were “activating” in the manner that they were searching for in a large expanse of the brain. They then reported the resulting high correlation between these observed voxel activations and Republican party self-identification as a test for “predicting” subjects’ party affiliations—one that “significantly out-performs the longstanding parental model, correctly predicting 82.9% of the observed choices of party.”

This is bogus. Unless one “use[s] an independent dataset” to validate the predictive power of “the selected . . .voxels” detected in this way, Kriegeskorte et al. explain in their Nature Neuroscience paper, no valid inferences can be drawn. None.
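The circularity problem can be sketched with simulated noise. This is an illustration under stated assumptions, not a reconstruction of the Schreiber et al. pipeline: the sample sizes, the choice of the 10 most correlated voxels, the sum-of-signed-voxels classifier, and every function name below are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_voxels(voxels, party, subjects, k):
    """Pick the k voxels most correlated with party among `subjects`."""
    labels = [party[i] for i in subjects]
    scored = []
    for j, voxel in enumerate(voxels):
        r = pearson_r([voxel[i] for i in subjects], labels)
        scored.append((abs(r), j, 1.0 if r > 0 else -1.0))
    scored.sort(reverse=True)
    return [(j, sign) for _, j, sign in scored[:k]]

def score(voxels, selected, i):
    """Signed sum of the selected voxels for subject i."""
    return sum(sign * voxels[j][i] for j, sign in selected)

def accuracy(voxels, party, selected, fit_subjects, eval_subjects):
    """Set the cutoff on fit_subjects, then classify eval_subjects."""
    cutoff = sum(score(voxels, selected, i) for i in fit_subjects) / len(fit_subjects)
    correct = sum(
        (score(voxels, selected, i) > cutoff) == (party[i] == 1)
        for i in eval_subjects
    )
    return correct / len(eval_subjects)

def compare(n_subjects=40, n_voxels=2000, k=10, seed=1):
    """Return (circular accuracy, held-out accuracy) on pure noise."""
    rng = random.Random(seed)
    party = [i % 2 for i in range(n_subjects)]
    voxels = [[rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
              for _ in range(n_voxels)]
    everyone = list(range(n_subjects))
    train, test = everyone[: n_subjects // 2], everyone[n_subjects // 2:]
    # Circular: select and evaluate on the same subjects.
    circular = accuracy(voxels, party,
                        select_voxels(voxels, party, everyone, k),
                        everyone, everyone)
    # Honest: select on one half, evaluate on the untouched half.
    honest = accuracy(voxels, party,
                      select_voxels(voxels, party, train, k),
                      train, test)
    return circular, honest

if __name__ == "__main__":
    circular, honest = compare()
    print("circular:", circular, "held-out:", honest)
```

Because the data contain no signal at all, the held-out accuracy should hover around a coin flip, while the circular number looks impressive. That gap is the reason an independent dataset is needed before calling anything a prediction.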

Kahan follows up on one of my favorite points, the way in which multiple comparisons corrections exacerbate the statistical significance filter:

Pushing a button in one’s computer program to ramp up one’s “alpha” (the p-value threshold, essentially, used to avoid “type 1” errors) only means one has to search a bit harder; it still doesn’t make it any more valid to base inferences on “significant correlations” found only after deliberately searching for them within a collection of hundreds of thousands of observations.
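The significance filter can be sketched with a quick simulation. The true effect, the standard error, and the two thresholds below are made-up numbers chosen for illustration, and `exaggeration_ratio` is a hypothetical helper, not something from Kahan's post or the paper.

```python
import random

def exaggeration_ratio(true_effect, se, z_crit, n_sims=200_000, seed=2):
    """Average |estimate| among 'significant' results, divided by the truth.

    Estimates are drawn from a normal distribution centered on the true
    effect; only those with |estimate| > z_crit * se survive the filter.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > z_crit * se:
            kept.append(abs(estimate))
    if not kept:
        return float("nan")  # nothing passed the filter
    return (sum(kept) / len(kept)) / true_effect

if __name__ == "__main__":
    # Hypothetical setup: a small true effect measured noisily.
    loose = exaggeration_ratio(true_effect=0.5, se=1.0, z_crit=1.96)
    strict = exaggeration_ratio(true_effect=0.5, se=1.0, z_crit=3.29)
    print("exaggeration at the usual threshold:", loose)
    print("exaggeration at a stricter threshold:", strict)
```

Every estimate that survives has to be at least `z_crit * se` in magnitude, so when the true effect is small relative to the noise, tightening the threshold raises the floor on what gets reported and the published estimates overstate the truth by more, not less.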

Wow. Look what happened. Assuming Kahan is correct here, we all just accepted the claimed results. Nobody actually checked to see if they all made sense.

I thought a bit and left the following comment on Kahan’s blog:

Read between the lines. The paper originally was released in 2009 and was published in 2013 in PLOS-One, which is one step above appearing on Arxiv. PLOS-One publishes some good things (so does Arxiv) but it’s the place people place papers that can’t be placed. We can deduce that the paper was rejected by Science, Nature, various other biology journals, and maybe some political science journals as well.

I’m not saying you shouldn’t criticize the paper in question, but you can’t really demand better from a paper published in a bottom-feeder journal.

Again, just because something’s in a crap journal, doesn’t mean it’s crap; I’ve published lots of papers in unselective, low-prestige outlets. But it’s certainly no surprise if a paper published in a low-grade journal happens to be crap. They publish the things nobody else will touch.

Some of my favorite papers have been rejected many times before finally reaching publication. So I’m certainly not saying that appearance in a low-ranked journal is definitive evidence that a paper is flawed. But, if it’s really been rejected by 3 journals before getting to this point, that could be telling us something.

One of the problems with traditional pre-publication peer review is that it’s secret. What were the reasons that those 3 journals (I’m guessing) rejected the paper? Were they procedural reasons (“We don’t publish political science papers”), or irrelevant reasons (“I just don’t like this paper”), or valid criticisms (such as Kahan’s noted above)? We have no idea.

As we know so well, fatally flawed papers can appear in top journals and get fawning press; the pre-publication peer-review process is far from perfect. Post-publication peer-review seems like an excellent idea. But, as the above story indicates, it’s not so easy. You can get lots of “Andy Gelmans” and “Erik Voetens” who just post a paper without reading it, and only the occasional “Dan Kahan” who takes a detailed examination.

38 thoughts on “Post-publication peer review: How it (sometimes) really works”

  1. One key problem in getting an academic version of Yelp going is that there’s a lot more people interested in posting a cogent review of my next door hole-in-the-wall Mexican take-out than the typical journal article.

    Let’s not kid ourselves: most published papers are hardly ever read; leave alone read with motivation enough for someone to post a meaningful critique.

    • “there’s a lot more people interested in posting a cogent review of my next door hole-in-the-wall Mexican take-out than the typical journal article”

      On top of that, while the average Yelp contributor can give you reasonably informative comments about that Mexican place, the group that can meaningfully address technical topics (like “activation levels within the portions of either the amygdala or the anterior cingulate cortex”) is much smaller. Between interest and qualifications, we’re talking tiny numbers here.

  2. I think the point about sharing previous peer reviews from journals that rejected is a good one. What a waste of information!

    Journals have a policy to limit concurrent submissions. Why not also ask: a) Did you submit this elsewhere? b) Where? c) What were the reviews?

    This is like Bayesian adaptive testing. As the Nth editor you can narrow down on the problem, and send manuscript to one reviewer who is expert on that problem. It might take them five minutes to make a decision.

    • Undoable. First, everyone receives terrible unjustified reviews at some point, with the Editor saying “I read the paper carefully myself, and I agree with the reviewers”. It makes me laugh every time. So, the “new” journal has to check your paper + the terrible unjustified reviews. Then, what happens when the paper has been rejected and I modify the paper (or not) according to the reviewers’ suggestions? Do I have to inform the “new” journal that yes, I received bad reviews, but then I thought about it and modified the paper, and here is the result? What happens if I lie? What a clumsy system (not that the current one is not clumsy).

      • Terrible unjustified reviews (TURs) are informative. If Nature rejects on the basis of TURs then I have reason to pay attention to the manuscript (and pay less attention to Nature…). Meanwhile constructive reviews can lead to changes in the paper. Just mention it in the cover letter, and point to the page.

        I think this would speed up review. The current system is independent sampling 2-3 reviewers at a time ignoring all previous information. If the average road to publication is say 2-3 rejections we are probably doubling the amount of wasted effort. Talk about clumsy.

        To wit, the system is already in place with BMC journals (I think), where you can simply refer your (rejected) file to another BMC journal.

        #academic #publishing #middle-ages :-)

      • Peer review cascades of the sort Andrew suggests already exist. A bunch of neuroscience journals (from various publishers) have had one for years, though apparently few authors have chosen to use it. Various publishers in my own field of ecology have them, though again I’m unsure how heavily used they are. eLife, BMC, Plos, and EMBO just announced a new one:

        Another model is the system used by law review journals, in which authors submit papers to a centralized system to which all journals subscribe, and then journals “bid” on the papers:

        And then there’s Axios Review, which is a new idea in evolutionary biology. Basically it’s an independent editorial board unaffiliated with any journal, from which authors can get independent peer reviews that can be used at multiple journals.

    • For what it’s worth, the American Economic Association journals have an opt-in model of this. From their website,

      The American Economic Journals will consider papers previously reviewed by the American Economic Review. Authors of manuscripts previously reviewed by the AER may request that, subject to the referees’ agreement, the full correspondence files of the AER referees (including cover letters as well as referees’ reports) be shared with the editors of the AEJ to which the author is applying (AEJ: Applied Economics, AEJ: Economic Policy, AEJ: Macroeconomics, or AEJ: Microeconomics), who will have the discretion to make a decision based on these reports or to request additional reports.

      • I have the feeling that they want the technically correct papers rejected by AER, but maybe not novel enough for AER or something like that. The problem is that if it works the reader may have the feeling that “yes, nice, but it has been rejected by the more prestigious journal”, which may or may not make sense, but we should be aware that most of the papers are barely cited and barely read.

        • When I do research I just hit google scholar and other online databases. Some of the papers I find most useful are in obscure journals, online repositories, edited volumes, etc. Where it is published is not AT ALL something that enters into my search. But maybe I am an outlier.

  3. I also agree with Kahan’s on PloSOne (I published there, btw). Has anybody ever submitted to PLoSOne as a first option? If yes, why?

    Some days ago I wrote in my blog (the field is Ecology and Evolution, but it is true across fields) “PLoS One for papers that have been rejected at least a couple of times by more relevant journals (PNAS, rejected –> Ecology Letters, rejected –> American Naturalist, rejected –> ok I am fed up, let’s go PLoS One, accepted. Just ask).”

    Rahul above is right: “Let’s not kid ourselves: most published papers are hardly ever read; leave alone read with motivation enough for someone to post a meaningful critique.” Most of the time a terrible waste of time, money and effort, unfortunately.

  4. I have often submitted papers to PLoSONE as a first choice. Relatively quick review and lack of BS reasons to reject paper because of “news appeal”, and total open access are my priorities. In my field (human evolution/genetics), the associate editors/reviewers have been very competent and I find the rigor of review is basically the same as in field-specific journals (which have lower impact factors). The idea that PLoSONE papers “must have been rejected by other journals” is an incorrect assumption.

    In the case described here, I am beyond incredulous that any intelligent person would believe the claims of the grey matter/political orientation connection. I’m not saying it is an impossibility, just that it is so clearly in the “extraordinary claims” category relative to earlier work and known environmental factors influencing political orientation.

  5. John, yes mine was an incorrect assumption. In my field, every time I asked I received the “it has been rejected somewhere” answer (especially if there is a “famous” name among the authors) and n was not 1, so I thought I could generalize (btw, I submitted two times there and both times it was not first submission). It would be “interesting” to know (maybe just for me) whether it is true (or never true) across disciplines that papers have been rejected somewhere before a PloSOne submission.

  6. I actually like the PLoS ONE format. It publishes papers fast, it is open access, and it cares less about the politics of academia than traditional journals. Some work is unpublishable in the regular journals, for instance because the outcome of the study goes against the ideas of one or two prominent reviewers. By the way, the PLoS ONE impact factor is about twice as high as that of the leading stats journal, and higher than all but a few psychology journals.

    The key question is: are PLoS ONE papers on average of lower quality than papers published in the standard journals? Maybe, but I predict one would be hard pressed to show this empirically. The variance in quality may be higher. But I personally prefer a publication in PLoS ONE over a publication in one of the C journals. When you publish in PLoS ONE you want other researchers to notice your work (otherwise you could publish in the Quarterly Journal of Danish Psychobehavioral Research).

    Unfortunately the academic system is obsessed with publishing in high-impact journals and I’d jeopardize the academic future of my students if our lab published in open access journals too often.

    • Your point about PLoS ONE’s IF being higher than most psych journals is a reminder of how broken the IF is as a metric. It doesn’t make any sense to me that a journal should be high-impact in one field and low in another, unless it’s for rather different reasons than what the IF is measuring (e.g. I can see why it might be low impact in, say, Film Studies — but only because it’s presumably not read by anybody in that field).

    • EJ Wagenmakers: “for instance because the outcome of the study goes against the ideas of one or two prominent reviewers”

      The problem comes in spades when you are replicating their work and finding their conclusions don’t hold.

  7. 1: I find it weird that anyone could think that universal norms would be in play for a journal like PLoS One. Of course some will use it as a last resort, but differences will exist across all the disciplines that publish there. Which is also why the IF is tricky for such a journal.

    2: The journal Sociological Science is trying to open up a post-publication discussion, by allowing comments directly linked to the actual paper. Extracts from their website:
    “Open access: Accepted works are freely available, and authors retain copyright
    Timely: Sociological Science will make editorial decisions within 30 days; accepted works appear online immediately upon receipt of final version”
    “A community: The journal’s online presence is intended as a forum for commentary and debate aimed at advancing sociological knowledge and bringing into the open conversations that usually occur behind the scenes between authors and reviewers”

  8. Between 2009 and 2013 the paper has supposedly been submitted & rejected several times. Why did the peer reviewers’ comments not improve the paper along the way?

  9. Here’s the only time I’ve really been peed off by a paper and commented on the open-access journal’s website:
    The authors never replied (hmmm). Maybe they were too busy working for the agency hired by the Dutch government to advise on, yup you guessed it…

    How many stats-savvy people actually comment though? Not many, in the same way we don’t bother commenting on good or bad visualizations. We’re too busy doing pointless sample size calculations for mathphobic colleagues to wave at mathphobic ethics committees.

    • Robert: Why not use simulations to get (more realistic) sample size calculations?

      OK, most of them will also be computationally phobic but they do (seem to) understand more with simulations.

      (I don’t understand why for real applications folks still use over simplified formulas.)

      • K, your reply made me chuckle because it chimes in so exactly with what I’m doing, trying to integrate a simulation-based approach from teaching across to researcher support. Yes – of course it should be simulation-based unless it really is for some very simple randomised controlled trial with a beautifully Gaussian outcome. I couldn’t agree more about the formulas! But in many cases the whole idea of estimating sample size / power is ridiculous however you do it because it is such a shot in the dark.

        I mention sample size calcs just as a glib example of something I waste time on – and I couldn’t very well have said “reading Andrew Gelman’s blog” now could I!

  10. Pingback: Somewhere else, part 72 | Freakonometrics

  11. As an occasional reviewer for PLoS One (and I’ve published one paper there), I’ll offer a qualified defense and some reasons for why its IF is relatively good. The PLoS One charter is to publish essentially anything that is methodologically solid. Clearly, this doesn’t always happen (any more than it happens with more prestigious journals), as can be seen by the comments about the paper that prompted this discussion. Most of the papers I’ve reviewed for PLoS One have been rejected because of methodological flaws. Because prestigious journals place a strong emphasis on novelty, PLoS One is a good venue for substantial but not necessarily exciting work. Take, for example, publishing a large volume of data on an animal model of disease which has been described previously but not necessarily with a lot of good prospective data. This is important for many research communities but hard to get into prestigious journals. Or take publishing a negative result, maybe not exciting but important to prevent others from pursuing a blind alley. You can argue that the more prestigious journals ought to be demanding and publishing this kind of work, but they don’t, opening a useful place for PLoS One.

    As mentioned in one of the prior comments, the journal is run efficiently and publishes promptly. The scientist-run open access model is very attractive to many investigators.

    • Albin:

      I have no problem with Plos-One. In this particular case, given this set of authors, I have a feeling the paper was rejected by other places before going there.

      • Really, is “I’m not saying you shouldn’t criticize the paper in question, but you can’t really demand better from a paper published in a bottom-feeder journal” how you define “I have no problem with Plos One”? You are a strong authority and you send a clear signal to young/other scholars: Don’t send anything there, it’s at the bottom and willing to publish crap.

        • Anon:

          But Plos-One is a lower-tier journal (at least, compared to the alternative places where I’m pretty sure these authors sent their paper) and that journal is willing to publish crap. I’m only stating the truth! That said, they also publish lots of good stuff.

          Regarding the signal you think I’m sending: let me clarify. I’ve published stuff in all sorts of bottom-feeder journals. I think it’s perfectly ok to do that. In fact, in a recent post I recommended that authors of speculative research studies send their papers to lower-ranked venues rather than confusing things by publishing their random p-values in top-ranked journals. Those of us who do research often end up with inconclusive findings or very specialized results, and I think that low-ranked journals are a perfect place to send such things.

        • Andrew:

          I disagree on two counts.

          First, as discussed before on this blog, lead journals balance sensational with solid. The former often leads them to be “air feeders”, perhaps a fluffier version of horse manure.

          Second, I have a different take on what qualifies as a top scientific journal. Yours appears to be based on IF, name recognition, etc… I would put more emphasis on, inter alia:

          1. Focus on quality over and above sensational;
          2. Enforce pre-registration of empirical studies (observational and experimental);
          3. Be willing to consider replications (AJPS has a policy of not accepting unsolicited rejoinders, which in my view disqualifies it as a scientific journal, period).
          4. Open access.
          5. Publishes peer reviews online, plus additional materials.

          In conclusion, “top journals” also publish a lot of air, and some do not even pass the laugh test of a scientific journal.

          Plos One is a leader in many respects, and, in my view, one of the few truly scientific journals around.

        • I agree. I understand what you are trying to say, and why PLoS One may be seen as a bottom feeder.

          But I think that says more about the discipline than about PLoS One.

  12. I think there should be criticisms but I think Kahan and others are wrong to focus on the multiple comparisons issue, i.e. “There are literally hundreds of thousands of potential “observations” in the brain of each study subject. Because there is constantly varying activation levels going on throughout the brain at all time, one can always find “statistically significant” correlations between stimuli and brain activation by chance”

    I wouldn’t be surprised if different patterns of thinking within conservative vs. liberal populations really did correlate with actual differences in patterns in brain activation. The _real_ critique, the reason this adds very little to our knowledge, is that the study design does not isolate a causal explanation for what’s observed. This is a point Kahan brings up in the next post. I don’t have a hard time believing that there will be an actual correlation to be found which differentiates these two populations; the question is, so what? What have we learned? In this kind of study design, very little.

  13. Pingback: Special Event: “The End of IR Theory” Symposium » Duck of Minerva

  14. Pingback: Special Event: “The End of IR Theory” Symposium | Symposium Magazine

  15. Pingback: Friday links: liberal arts ecologists, potato beetle bombs, good modelers “lie, cheat, and steal”, and more | Dynamic Ecology

  16. Pingback: What we’re reading: Compressed genomes, drafting genes, and the third post-publication peer reviewer | The Molecular Ecologist

  17. Late comment here: One thing I love about astronomy is that we all publish in the bottom-feeder of arXiv, even if we are also submitting to strongly refereed journals. So arXiv becomes a union of all papers, where the good and bad coexist and judgement must be applied. That’s good. It is especially good because (in my view) refereeing doesn’t really protect us from mediocre papers; it only really protects us from very good papers and very bad ones, or perhaps is very much random.

    Another effect in astronomy is that we tend to have long reference lists and long introductions. That makes the literature itself a not-very-corrupt yelp.
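On the simulation-based sample size calculations raised in comment 9 above, here is a minimal sketch of what that can look like. Everything in it is an assumption for illustration: a two-arm trial with a skewed (exponential) outcome, made-up group means, a large-sample z-test on the difference in means, and the hypothetical helpers `simulated_power` and `smallest_n`.

```python
import math
import random

def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance

def simulated_power(n_per_arm, mean_control, mean_treated,
                    n_sims=1000, z_crit=1.96, seed=3):
    """Fraction of simulated trials in which the z-test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # Skewed outcomes: exponential draws with the assumed group means.
        control = [rng.expovariate(1.0 / mean_control) for _ in range(n_per_arm)]
        treated = [rng.expovariate(1.0 / mean_treated) for _ in range(n_per_arm)]
        m_c, v_c = mean_and_variance(control)
        m_t, v_t = mean_and_variance(treated)
        se = math.sqrt(v_c / n_per_arm + v_t / n_per_arm)
        if abs((m_t - m_c) / se) > z_crit:
            rejections += 1
    return rejections / n_sims

def smallest_n(target_power, candidates, **kwargs):
    """First candidate sample size whose simulated power meets the target."""
    for n in candidates:
        if simulated_power(n, **kwargs) >= target_power:
            return n
    return None  # no candidate was large enough

if __name__ == "__main__":
    n = smallest_n(0.8, [25, 50, 100, 200, 400],
                   mean_control=1.0, mean_treated=1.5)
    print("smallest candidate n per arm with simulated power >= 0.8:", n)
```

The appeal is that the outcome distribution, the analysis, and any messiness (dropout, clustering, and so on) can be swapped in directly instead of being forced into a formula. The answer is still only as good as the assumed effect size, which is the shot-in-the-dark part.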
