When does peer review make no damn sense?

Disclaimer: This post is not peer reviewed in the traditional sense of being vetted for publication by three people with backgrounds similar to mine. Instead, thousands of commenters, many of whom are not my peers—in the useful sense that, not being my peers, your perspectives are different from mine, and you might catch big conceptual errors or omissions that I never even noticed—have the opportunity to point out errors and gaps in my reasoning, to ask questions, and to draw out various implications of what I wrote. Not “peer reviewed”; actually peer reviewed and more; better than peer reviewed.

Last week we discussed Simmons and Simonsohn’s survey of some of the literature on the so-called power pose, where they wrote:

While the simplest explanation is that all studied effects are zero, it may be that one or two of them are real (any more and we would see a right-skewed p-curve). However, at this point the evidence for the basic effect seems too fragile to search for moderators or to advocate for people to engage in power posing to better their lives.

Also:

Even if the effect existed, the replication suggests the original experiment could not have meaningfully studied it.

The first response of one of the power-pose researchers was:

I’m pleased that people are interested in discussing the research on the effects of adopting expansive postures. I hope, as always, that this discussion will help to deepen our understanding of this and related phenomena, and clarify directions for future research. . . . I respectfully disagree with the interpretations and conclusions of Simonsohn et al., but I’m considering these issues very carefully and look forward to further progress on this important topic.

This response was pleasant enough but I found it unsatisfactory because it did not even consider the possibility that her original finding was spurious.

After Kaiser Fung and I publicized Simmons and Simonsohn’s work in Slate, the power-pose author responded more forcefully:

The fact that Gelman is referring to a non-peer-reviewed blog, which uses a new statistical approach that we now know has all kinds of problems, as the basis of his article is the WORST form of scientific overreach. And I am certainly not obligated to respond to a personal blog. That does not mean I have not closely inspected their analyses. In fact, I have, and they are flat-out wrong. Their analyses are riddled with mistakes, not fully inclusive of all the relevant literature and p-values, and the “correct” analysis shows clear evidential value for the feedback effects of posture.

Amy Cuddy, the author of this response, did not explain anywhere how Simmons and Simonsohn were “flat-out wrong,” nor did she list even one of the mistakes with which their analyses were “riddled.”

Peer review

The part of the above quote I want to focus on, though, is the phrase “non-peer-reviewed.” Peer-reviewed papers have errors, of course (does the name “Daryl Bem” ring a bell?). Two of my own published peer-reviewed articles had errors so severe as to destroy their conclusions! But that’s ok, nobody’s claiming perfection. The claim, I think, is that peer-reviewed articles are much less likely to contain errors, as compared to non-peer-reviewed articles (or non-peer-reviewed blog posts). And the claim behind that, I think, is that peer review is likely to catch errors.

And this brings up the question I want to address today: What sort of errors can we expect peer review to catch?

I’m well placed to answer this question as I’ve published hundreds of peer-reviewed papers and written thousands of referee reports for journals. And of course I’ve also done a bit of post-publication review in recent years.

To jump to the punch line: the problem with peer review is with the peers.

In short, if an entire group of peers has a misconception, peer review can simply perpetuate error. We’ve seen this a lot in recent years, for example that paper on ovulation and voting was reviewed by peers who didn’t realize the implausibility of 20-percentage-point vote swings during the campaign, peers who also didn’t know about the garden of forking paths. That paper on beauty and sex ratio was reviewed by peers who didn’t know much about the determinants of sex ratio and didn’t know much about the difficulties of estimating tiny effects from small sample sizes.

OK, let’s step back for a minute. What is peer review good for? Peer reviewers can catch typos, they can catch certain logical flaws in an argument, they can notice the absence of references to the relevant literature—that is, the literature that the peers are familiar with. That’s why the peer reviewers for that psychology paper on ovulation and voting didn’t catch the error of claiming that days 6-14 were the most fertile days of the cycle: these reviewers were peers of the people who made the mistake in the first place!

Peer review has its place. But peer reviewers have blind spots. If you want to really review a paper, you need peer reviewers who can tell you if you’re missing something within the literature—and you need outside reviewers who can rescue you from groupthink. If you’re writing a paper on himmicanes and hurricanes, you want a peer reviewer who can connect you to other literature on psychological biases, and you also want an outside reviewer—someone without a personal and intellectual stake in you being right—who can point out all the flaws in your analysis and can maybe talk you out of trying to publish it.

Peer review is subject to groupthink, and peer review is subject to incentives to publish things that the reviewers are already working on.

This is not to say that a peer-reviewed paper is necessarily bad—I stand by over 99% of my own peer-reviewed publications!—rather, my point is that there are circumstances in which peer review doesn’t give you much.

To return to the example of power pose: There are lots of papers in this literature and there’s a group of scientists who believe that power pose is real, that it’s detectable, and indeed that it can help millions of people. There’s also a group of scientists who believe that any effects of power pose are small, highly variable, and not detectable by the methods used in the leading papers in this literature.

Fine. Scientific disagreements exist. Replication studies have been performed on various power-pose experiments (indeed, it’s the null result from one of these replications that got this discussion going), and the debate can continue.

But my point here is that peer review doesn’t get you much. The peers of the power-pose researchers are . . . other power-pose researchers. Or researchers on embodied cognition, or on other debatable claims in experimental psychology. Or maybe other scientists who don’t work in this area but have heard good things about it and want to be supportive of this work.

And sometimes a paper will get unsupportive reviews. The peer review process is no guarantee. But then authors can try again until they get those three magic positive reviews. And peer review—review by true peers of the authors—can be a problem, if the reviewers are trapped in the same set of misconceptions, the same wrong framework.

To put it another way, peer review is conditional. Papers in the Journal of Freudian Studies will give you a good sense of what Freudians believe, papers in the Journal of Marxian Studies will give you a good sense of what Marxians believe, and so forth. This can serve a useful role. If you’re already working in one of these frameworks, or if you’re interested in how these fields operate, it can make sense to get the inside view. I’ve published (and reviewed papers for) the journal Bayesian Analysis. If you’re anti-Bayesian (not so many of these anymore), you’ll probably think all these papers are a crock of poop and you can ignore them, and that’s fine.

(Parts of) the journals Psychological Science and PPNAS have been the house organs for a certain variety of social psychology that a lot of people (not just me!) don’t really trust. Publication in these journals is conditional on the approval of peers who believe the following equation:

“p less than .05” + a plausible-sounding theory = science.

Lots of papers in recent years by Uri Simonsohn, Brian Nosek, John Ioannidis, Katherine Button, etc etc etc., have explored why the above equation is incorrect.

But there are some peers that haven’t got the message yet. Not that they would endorse the above statement when written as crudely as in that equation, but I think this is how they’re operating.

And, perhaps more to the point, many of the papers being discussed are several years or even decades old, dating back to a time when almost nobody (myself included) realized how wrong the above equation is.

Back to power pose

And now back to the power pose paper by Carney et al. It has many garden-of-forking-paths issues (see here for a few of them). Or, as Simonsohn would say, many researcher degrees of freedom.

But this paper was published in 2010! Who knew about the garden of forking paths in 2010? Not the peers of the authors of this paper. Maybe not me either, had it been sent to me to review.

What we really need (and, luckily, can get) is post-publication review: not peer reviews, but outside reviews, in this case reviews by people who are outside of the original paper both in research area and in time.

And also this, from another blog comment:

It is also striking how very close to the .05 threshold some of the implied p-values are. For example, for the task where the participants got the opportunity to gamble, the reported chi-square is 3.86, which has an associated p-value of .04945.

Of course, this reported chi-square value does not seem to match the data because it appears from what is written on page 4 of the Carney et al. paper that 22 participants were in the high power-pose condition (19 took the gamble, 3 did not) while 20 were in the low power-pose condition (12 took the gamble, 8 did not). The chi-square associated with a 2 x 2 contingency table with this data is 3.7667 and not 3.86 as reported in the paper. The associated p-value is .052 – not less than .05.

You can’t expect peer reviewers to check these sorts of calculations—it’s not like you could require authors to supply their data and an R or Stata script to replicate the analyses, ha ha ha. The real problem is that the peer reviewers were sitting there, ready to wave past the finish line a result with p less than .05, which provides an obvious incentive for the authors to get p less than .05, one way or another.
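
In fact, the check described in that comment takes just a couple of lines. Here is a minimal sketch in R, assuming the cell counts quoted above (19 of 22 participants in the high-power condition and 12 of 20 in the low-power condition taking the gamble) are the full 2 x 2 table:

# Cell counts as reported on page 4 of Carney et al. (2010)
tab <- matrix(c(19, 3,    # high-power pose: took the gamble, did not
                12, 8),   # low-power pose: took the gamble, did not
              nrow = 2, byrow = TRUE)
chisq.test(tab, correct = FALSE)           # X-squared = 3.7667, df = 1, p = 0.052
pchisq(3.86, df = 1, lower.tail = FALSE)   # p implied by the chi-square reported in the paper: 0.049

(The correct = FALSE is there because chisq.test applies a continuity correction to 2 x 2 tables by default, which would push the p-value even further above .05.)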

Commenters also pointed out an earlier paper by one of the same authors, this time on stereotypes of the elderly, from 2005, that had a bunch more garden-of-forking-paths issues and also misreported two t statistics: the actual values were something like 1.79 and 3.34; the reported values were 5.03 and 11.14! Again, you can’t expect peer reviewers to catch these problems (nobody was thinking about forking paths in 2005, and who’d think to recalculate a t statistic?), but outsiders can find them, and did.
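
Recomputing a t statistic from reported summary numbers is similarly quick for an outside reader. Here is a sketch of the standard pooled two-sample formula in R; the means, standard deviations, and sample sizes below are hypothetical placeholders, since the 2005 paper’s summary statistics aren’t reproduced here:

# Pooled two-sample t statistic from reported means, SDs, and group sizes
t_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)   # pooled variance
  (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
}
t_from_summary(m1 = 5.5, s1 = 2, n1 = 20,    # hypothetical group 1
               m2 = 4.5, s2 = 2, n2 = 20)    # hypothetical group 2; gives t of about 1.58

If the t reported in a paper is nowhere near what its own means and standard deviations imply, something has gone wrong, and that is exactly the sort of check an outside reader can run in a minute.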

At this point one might say that this doesn’t matter, that the weight of the evidence, one way or another, can’t depend on whether a particular comparison in one paper was or was not statistically significant—but if you really believe this, what does it say about the value of the peer-reviewed publication?

Again, I’m not saying that peer review is useless. In particular, peers of the authors should be able to have a good sense of how the storytelling and theorizing in the article fit in with the rest of the literature. Just don’t expect peers to do any assessment of the evidence.

Linking as peer review

Now let’s consider the Simmons and Simonsohn blog post. It’s not peer reviewed—except it kinda is! Kaiser Fung and I chose to cite Simmons and Simonsohn in our article. We peer reviewed the Simmons and Simonsohn post.

This is not to say that Kaiser and I are certain that Simmons and Simonsohn made no mistakes in that post; peer review never promises that sort of perfection.

But I’d argue that our willingness to cite Simmons and Simonsohn is a stronger peer review than whatever was done for those two articles cited above. I say this not just because those papers had demonstrable errors which affect their conclusions (and, yes, in the argot of psychology papers, if a p-value shifts from one side of .05 to the other, it does affect the conclusions).

I say this also because of the process. When Kaiser and I cite Simmons and Simonsohn in the way that we do, we’re putting a little bit of our reputation on the line. If Simmons and Simonsohn made consequential errors—and, hey, maybe they did, I didn’t check their math, any more than the peer reviewers of the power pose papers checked their math—that reflects badly on us: we trusted something untrustworthy. In contrast, the peer reviewers of those two papers are anonymous. The peer review that they did was much less costly, reputationally speaking, than ours. We have skin in the game; they do not.

Beyond this, Simmons and Simonsohn say exactly what they did, so you can work it out yourself. I trust this more than the opinions of 3 peers of the authors in 2010, or 3 other peers in 2005.

Summary

Peer review can serve some useful purposes. But to the extent the reviewers are actually peers of the authors, they can easily have the same blind spots. I think outside review can serve a useful purpose as well.

If the authors of many of these PPNAS or Psychological Science-type papers really don’t know what they’re doing (as seems to be the case), then it’s no surprise that peer review will fail. They’re part of a whole peer group that doesn’t understand statistics. So, from that perspective, perhaps we should trust “peer review” less than we should trust “outside review.”

I am hoping that peer review in this area will improve, given the widespread discussion of researcher degrees of freedom and garden of forking paths. Even so, though, we’ll continue to have a “legacy” problem of previously published papers with all sorts of problems, up to and including t statistics misreported by factors of 3. Perhaps we’ll have to speak of “post-2015 peer-reviewed articles” and “pre-2015 peer-reviewed articles” as different things?

102 thoughts on “When does peer review make no damn sense?”

  1. On the subject of elementary statistical errors in peer-reviewed research, have you looked into this latest example:

    On the Viability of Conspiratorial Beliefs
    http://www.plosone.org/article/comments/info%3Adoi%2F10.1371%2Fjournal.pone.0147905

    The paper had a graph showing the probability of a conspiracy breaking down increasing over time, then decreasing. The reviewers thought this was fine.

    There seems to be a tendency for conspiracy theory papers to contain basic errors. That’s the third example I’m aware of.

    • I’m not sure which figure you’re talking about, but it looks to me like all those figures have some kind of explanation which isn’t necessarily problematic.

      The probability *per unit time* can certainly decrease. The cumulative probability obviously can’t. All the decreasing figures I saw looked like they were intended as *per unit time* and the vertical scale seems to make sense in that way, though it’s possible the figure labels misrepresented this.

        • The curves in Figure 1 are cumulative probabilities. And yet some of them go down at long times. Which is of course utterly impossible, since it implies negative probabilities, but that didn’t seem to trouble the reviewers in the slightest. The author and the editor have now admitted that this was a ghastly mistake, but apparently the fact that the underlying model is hopelessly flawed doesn’t affect the conclusions at all – make of that what you will. How this travesty got through editing and peer review remains unexplained.

        • I think Daniel Lakeland partially answers Jonathan’s question – the graph in figure 1 is not clearly labelled/captioned. It confused me for a while. But however you look at it or interpret it, the blue and yellow curves together make no sense. So there’s really no excuse for the peer reviewers not picking this up.
          And this paper has received huge media publicity.

  2. Cuddy’s response reminded me of this famous quote from a natural philosopher at Oxford about some praise Voltaire had for Newton’s work on gravity, calculus, and mechanics, contained in Newton’s Principia Mathematica:

    The fact that Voltaire is referring to a non-peer-reviewed book, which uses a new mathematical approach that we now know has all kinds of problems, as the basis of his article is the WORST form of scientific overreach.

    Of course that was 300 years ago, and times have moved on. We’re no longer talking about Newton, Voltaire or Principia Mathematica. We’re talking about power-poses and p-values. Think about that for a moment whenever someone uses the phrase “the progress of science”.

  3. The undue faith in peer review seems to be an implicit version of the “appeal to authority” fallacy, in which information is deemed valid simply because of its association with experts. The problem is even worse when considering the politics of peer review (name recognition, unfavorable reviews because it conflicts with one’s own work, etc.). What is that quote? “[Peer-reviewed publications] are like sausages; it is better not to see them being made”…

  4. I am with you 100% about breaking the misconception that peer review is alchemy that turns lead to gold. It is about filtering “well enough”. (I’d frame it using type 1 and type 2 errors.)

    And I agree that peer review should include both experts in the area as well as smart non-experts who provide a “smell test”. But I thought journal editors often did this, especially in general journals. You assign two reviewers, one a subject expert, one a smart non-expert. I suppose increasingly journals assign only one reviewer, and this will be a subject expert, so there’s no chance to have a bigger-picture referee.

    It seems to me your argument can also be framed in terms of number of reviewers vs expertise. You start with 1 reviewer who is the “best expert” to referee the paper. You add a second, who is nearly as good. Now, the average quality falls, but you gain through diversity of evaluation. You can keep adding reviewers, and the average expertise falls, but you gain more diversity of perspectives. There must be some number N at which the tradeoff is optimal. How big is N? I’m not sure that letting just anybody comment is necessarily a good idea–I liked in a previous post how several commenters mentioned reputation and having an anonymous but identifiable commenter name.

    Finally–thousands of referee reports? It takes me one day to write a good report, assuming the paper is not total rubbish. I must be terribly inefficient, or the papers I referee must be very different than yours.

      • The most efficient approach is to just say “no” to refereeing requests. Turns out nothing comes of it other than fewer requests to referee in the future. I always say “no” to non-open-access journals; if they want to own the work and limit its distribution, they can pay referees (not that I’d do it for pay for them, either).

        The second most efficient approach is to limit the time you spend on reviews. Set aside 15 minutes (or an hour), respond in whatever useful way you can, and move on. I believe Andrew and I both follow this policy on reviews we do.

        The third most efficient approach is to delegate to more junior people. This can be number 1.5 if you don’t review what they produce and you don’t need to bug them to finish it.

        The least efficient approach is to spend all day trying to understand a paper. But who has time for that?

        • Bob – “efficiency” here must mean some sort of optimal trade-off between quality of refereeing and time. And in the end, I think that comes down to a question of preferences over where and how we value the contributions we make to our fields. But I would just point out that this “efficiency” of which you speak is really just something about the combination of your preferences and how much you value other contributions you could be making, and that for other people with different preferences and different time demands, your ordering would not be “efficient”.

          For me, I (at least for the moment) like to think that I can make a valuable contribution through careful refereeing, so I spend more time on it than that. I also learn a lot in the process (about evidence, methods and writing). Of course that means I spend less time doing research, preparing for class, attending conferences… you know, whatever else I could be doing that would also make contributions to the field and my own career trajectory. But that is just the optimal trade-off for me.

          Or did you mean efficient just as “least amount of time spent possible”, without commenting on the underlying objective of the researcher dividing up their time?

        • I’m just thinking efficiency for the field. I’d rather get feedback from four people spending an hour each than one person spending eight hours. There’s way too much variance that comes from the small number of reviewers.

          Part of the inefficiency comes from mismatch in what I get sent. I’m only truly an expert on a very narrow range of subjects. Yet I get asked to referee a much broader set of topics. For just about everything outside my narrow speciality, it’s a heroic amount of effort to understand the paper enough to do a careful review on its own merit and basically impossible to understand the context well enough to make an informed judgment. That’s what I did as a grad student when reviews were passed down to me and I think it was a good learning experience.

          Another inefficiency comes from the dead-drop and long-delay nature of the whole thing. I’d much rather give an author feedback on an early draft than to wait for them to polish the whole thing (perhaps wasting a ton of time generating plots, analyses, graphs, etc.), send it to a journal, and then have me write something based just on the paper with no dialogue (other than maybe a disgruntled author writing to the editor), and a long time delay.

          And I have put in my time on program committees for conferences, editorial boards for journals, and panels for grant organizations. All of which has led me to the conclusion that peer review as currently practiced is overrated.

    • Instead of framing the trade off in terms of quantity (number of reviewers) vs. expertise, I think of the trade off as diversity vs. expertise. Peer reviews typically have low diversity of opinion but high expertise, and public reviews have high expertise but low diversity of opinion. The problem is that both are typically needed for good group decisions.

      What’s missing from this discussion, though, is independence of opinion. Peer reviews have high independence, as reviewers are not aware of the opinions of the other reviewers when they produce their first-round reviews. Public reviews, on the other hand, often become echo chambers where commenters are unduly influenced by other commenters. Work by Pentland and others at MIT shows how this typically degrades decision quality.

      So we want all three–diversity, expertise, and independence–but it’s probably impossible to get them all.

      • You are optimistic, I will give you that, but perhaps overly so about the ‘high expertise’ part.

        Consider a well known historical example:

        How many scientists at the time had the expertise to adequately review Einstein’s papers? Were these the people charged with doing so? How does the peer review system constrain the application of expertise, or misapply it, via the rejection of a competitor’s research? How and when is outdated expertise recognized within a system such as this?

        I think Andrew’s point here is right on target:

        “In short, if an entire group of peers has a misconception, peer review can simply perpetuate error. We’ve seen this a lot in recent years, for example that paper on ovulation and voting was reviewed by peers who didn’t realize the implausibility of 20-percentage-point vote swings during the campaign, peers who also didn’t know about the garden of forking paths. That paper on beauty and sex ratio was reviewed by peers who didn’t know much about the determinants of sex ratio and didn’t know much about the difficulties of estimating tiny effects from small sample sizes.”

        • I think that your anecdote is an extreme one that doesn’t describe most peer review situations. Einstein’s papers were scientific revolutions (a la Thomas Kuhn), and these happen extremely infrequently. In the vast majority of situations, peer reviewers do have at least a modicum of expertise that is helpful in critiquing manuscripts.

          I’m not arguing for peer review over public review, by the way. I’m suggesting that they each have their flaws and you trade off some benefits when you choose one over the other.

        • It would be great if we could have both, but I don’t see any feasible way to make that happen. How can we have expertise, diversity of opinion, and independence of opinion all at once? It seems to me that maximizing one of them necessarily involves minimizing at least one of the others.

        • I think the argument is that open-source peer review allows one to have all three. With it we are more likely to have contributions from expert, analytical, and diverse reviewers. It does require a willingness to disregard the nonsensical and the intentionally provocative, but the final revisions after such a process will be more likely to result in better theory and application than under the current system.

        • Michael:

          We currently have both. For example, the power pose study was approved by the peers of Carney, Cuddy, and Yap, and then errors were found in outside review. The main change I would like, compared to current practice, is for people to realize that peer review, without outside review, doesn’t tell us that much. And that, conversely, outside review can be valuable even if it is not published by the Association for Psychological Science or Elsevier or whoever.

        • Andrew:

          The site won’t let me respond to your comment below (too many levels?), so I’m responding to it here. I completely agree that the peer review system commits Type 1 errors and lets in nonsense that can and should be called out in public review. No one should take peer-reviewed articles on blind faith; if they do, they are idiots and not scientists. Cuddy’s response that Simonsohn’s concerns can be dismissed out-of-hand simply because they were not peer-reviewed is ridiculous.

          But I still maintain that public review lacks independence and is often an echo chamber where people are unduly influenced by early comments (Salganik & Watts, 2009). This reduces the benefit of diversity of opinion in perhaps the same way that many statistical analyses are compromised when the cases being studied are not independent. Curious suggests below that we have to be willing to disregard the nonsensical and intentionally provocative comments. I completely agree, but (a) can we, really? System 1 thinking, selective perception, confirmation bias, etc. suggest that maybe we can’t; and (b) how high is the noise-to-signal ratio? Masses of inane comments often drown out the few good ones in online discussions (present site excepted).

        • @Michael J

          A working example of such a system (“expertise, diversity of opinion, and independence of opinion all at once”) I think is the StackExchange model.

          The comments you get on there have, I think, all of the attributes you mention. In particular, I think post-publication review benefits from some sort of credibility indicators.

          One difference to make such a model work for academic papers would be incentives. It is a lot more work to read a paper than to (usually) answer a specific question of the sort asked on SE. Maybe the funding agencies can work out a model so that good review comments (as measured by votes say) would translate into $$.

          One would hope that the better papers attract more reviewer interest. You could have authors, funding agencies or third parties put some bounty to attract reviewer attention to less commented on papers. When funding agencies grant $100,000 to equipment & manpower, a few thousand dollars for review services doesn’t sound too bad.

          I would also hope that the structure of typical papers would improve in response to this new system. Commentators might demand more concise, crisp papers rather than the rambling bloat (e.g. 30 pages) that today often passes as a “good” paper. You might also expect more data to be posted online because of commentator demand.

          Maybe this is too quixotic on the whole but I think it is worth a shot.

        • That anecdote is right in line with my 30+ years of reviewing experience. The point is that “a modicum of expertise” isn’t enough. That’s why it takes people all day to review a paper and it’s a good “learning experience” (though arguably reading accepted papers would be a better strategy for learning if we’re still talking meta-game).

          Right now, nobody can review Michael Betancourt’s papers on geometry of HMC. Not because the mathematical tools aren’t well known to mathematicians and physicists, but because the statisticians doing the reviews don’t know them. And it was the same problem I had back when I was doing linguistic syntax and semantics and applying type theory and topology to language semantics; it was mainstream in logic and in parts of computer science programming language theory, but virtually unknown in linguistics. I had papers rejected with a simple “this isn’t linguistics”. I had a proposal rejected by NSF as being “too European” (Americans didn’t do computation and logic in linguistics in the 1980s if they wanted to get funded or get tenure). I had proposals go to NSF linguistics, only to be handed over to computer science because the linguists didn’t think it was linguistics, then handed back to linguistics because the computer scientists didn’t think it was computer science (the proposal was to write ALE, which was the basis of my first book, which took about 15 years to catch on); that one got high marks from reviewers, but was rejected by the program director out of hand as “not linguistics.”

          Peer review’s been virtually useless in my experience. Whereas I’ve gotten tons and tons of great feedback from people on blog posts, on paper drafts, from colleagues, in response to talks, etc. It’s just too hard to match someone who cares and knows enough to be useful as part of the reviewing process.

          Einstein’s papers were both revolutionary and built on Maxwell and Riemann. Ideas don’t come out of a vacuum, so to speak. I highly recommend Gleick’s bio of Feynman, Genius, which packs a lot of sociological insight into how the sausages are made.

        • Speaking of Einstein, rumor has it he only had one refereed paper, and the referee thought he was wrong (the referee appears to be right).

          During the period of time when science achieved the most with far less resources than today, they didn’t have modern peer review. The norm was for people to submit manuscripts to some bigwig on the board, who would then “communicate” it to the society for publication if they wanted to. The bigwig’s name was publicly attached to the paper. If they had reservations about the paper but thought it should be published they might say “communicated by Mr. Bigwig with the following reservations …”.

          The whole idea that modern peer review is the way science is done is ludicrous. If it was burned down today, by tomorrow everyone would have found better ways to spot good papers among the crap. Things would only improve from there.

        • How about this perspective—and again I’m not arguing FOR peer review, I’m just trying to think through the arguments for and against—peer review can serve as a curator for quality work. In a completely open-access system, every paper gets published regardless of quality. That’s way more than I can sift through; I can’t read every paper and determine whether it is good or not. I’ll have to rely on someone (or a group of people) to sort out the good from the bad, so that I can spend my limited time keeping up with the high-quality work. In our current system, peer review serves that curating function. Editors and reviewers separate the wheat from the chaff, and I can read a few top journals each month and keep up with high-quality work in my field.

          Is it perfect? Obviously not. As another commenter noted, there are Type 1 and Type 2 errors in this system. Stuff gets through that shouldn’t, and other stuff doesn’t get through that should. But this will happen with any system, won’t it?

        • In 30 years of economics, I’ve had lots of useful referee reports—well over half, I would say, have improved my paper. This is entirely consistent with Gelman’s critique in this post, because they catch the flaws people in the subfield can catch, not all the flaws. Peer review is highly useful—especially for young scholars, who can learn a lot from their rejections—it just has limitations.

        • Yes. Einstein’s insights were an example of an outlier, which is why I hesitated to use that as an example and hoped you would look past the name and to the point I was making, which was to reiterate Andrew’s point:

          “if an entire group of peers has a misconception, peer review can simply perpetuate error”

          This is far more common than you are allowing for with your assumption of ‘high expertise’ for peer reviewers.

      • Michael:

        You wrote, “Peer reviews typically have low diversity of opinion but high expertise, and public reviews have high expertise but low diversity of opinion.”

        I am guessing that one of these says the opposite of what you intended. Which one?

  5. I think there’s some middle ground to be had between the excessive closedness of peer review — which suffers by being inadequately thorough and often political — and free-for-all blogospheric “debunking” of conclusions we don’t like or “believe.”

    I say this, only because I’ve been on the other side of this: My colleagues and I published what I consider a thorough, careful paper, with admitted and discussed limitations in data collection. We made the article publicly available, and provided data and code in an R package.

    A researcher whose prior work had come to different conclusions than ours published and publicized a “debunking” of our article, where he identified a few bits of bad data in our sample—claiming that this problem likely explained our entire result, and it was soon taken as fact amongst his readers and other skeptics that our study was “fatally flawed.” Most of the folks working in this field are not highly trained in statistics, and so don’t have quite the tools to independently evaluate these kinds of claims—and at no point did anyone look thoroughly at the data we provided.

    A small bit of math would have easily demonstrated that the problem he identified couldn’t have explained more than a few percentage points of our pretty sizeable result. We ended up spending a lot of money and time going through the entire sample and collecting more data to respond to the “debunking”—at the end of which our result of course still held. And the article is now through peer review and accepted for publication.

    I have mixed feelings about doing science by blog from this experience. On the one hand, the back and forth did make our paper better, which is what we wanted from making it all publicly available. On the other hand, the ease with which people claimed to “refute” or “debunk” an entire piece of research over cherry-picked flaws was very troubling.

    • Carl:

      Rather than a middle ground, I think we need both. As I wrote in the above post: “Peer review can serve some useful purposes. . . . I think outside review can serve a useful purpose as well.”

      • I guess my point is that they can both tend to extremes, and I don’t know if it’s guaranteed, or even likely that the truth will out just because you have two flawed discourses instead of one.

    • Andy:

      Yes, as I wrote in an earlier thread, I am immune to Paul Coster’s charge of hypocrisy as I do not criticize the media for commenting on unpublished research.

      To be fair, I don’t think he’s calling me a hypocrite. I think he’s willfully ignoring evidence that counters his worldview, which is too bad, and I think he’s overestimating the value of peer review, as I discuss in the above post.

  6. Re “To jump to the punch line: the problem with peer review is with the peers.”

    This is part of why sometimes you need people who are outsiders and methodological experts, as in the new statistical review board at Science and Nature:
    http://magazine.amstat.org/blog/2014/08/01/science-review-panel/
    http://www.nature.com/news/science-joins-push-to-screen-statistics-in-papers-1.15509

    Of course, this doesn’t address the problem that the ideal peers might be from the future (when e.g. more is known about some methods).

  7. One of the main problems with “non peer-reviewed” sources is this: in a peer-reviewed journal one can have high (though not perfect) confidence that, at the very least, the reviewer read the paper. When journalists or bloggers review something, they usually just read the press release. Often bloggers will just read surrounding media coverage, or maybe the title and abstract of the paper.

    For instance, out of the many thousands of bloggers heaping scorn on the “himmicanes” paper, how many do you think actually read it? A lot of that scorn is deserved on the grounds that the claims made in the title and abstract are not supported by data. However, they also ran 7 experiments, asking people about various hurricane-related situations, and varying the gender of the hurricane’s name. These experiments were very carefully done, matching names using several characteristics (likeability, attractiveness), and found a significant effect based on the gender of the name. Whether historically this has actually had any effect I don’t know, but their experiments show it’s at least plausible. In all the media coverage around this paper I never saw anyone even address these experiments, only the archival data. Critics focused only on what the authors got wrong, and completely ignored what they got right.

    • Jacob:

      Here’s the abstract of the himmicanes paper, in its entirety:

      Do people judge hurricane risks in the context of gender-based expectations? We use more than six decades of death rates from US hurricanes to show that feminine-named hurricanes cause significantly more deaths than do masculine-named hurricanes. Laboratory experiments indicate that this is because hurricane names lead to gender-based expectations about severity and this, in turn, guides respondents’ preparedness to take protective action. This finding indicates an unfortunate and unintended consequence of the gendered naming of hurricanes, with important implications for policymakers, media practitioners, and the general public concerning hurricane communication and preparedness.

      The claimed finding is about deaths. Laboratory experiments are referred to only as a way of getting insight (“indicating”) into the mechanism. But, as many outside reviewers noted, the claim about the deaths was not supported by the data. I think the outside reviewers, many of whom did show evidence of having examined the paper in detail, got it right: The paper made a bold and attention-getting claim that was not supported by the data. The capstone of their evidence was a p of less than .05, basically zero evidence given the many forking paths in their garden.

      • >The paper made a bold and attention-getting claim that was not supported by the data

        Yes, that’s true, and as a result everybody ignored the entire paper. Meanwhile they have a weaker claim that IS supported by the data but that nobody seems to acknowledge.

        >many of whom did show evidence of having examined the paper in detail

        Again, I never saw anyone address these experiments. So if they did examine the paper they chose to ignore this portion, which is very relevant to the overall question of whether the names of hurricanes cause additional deaths.

        If I were a public policy researcher in hurricanes (or in naming things in general) this would piss me off pretty strongly. I would want to replicate these experiments, but it would be pretty hard to get funding if funding agencies thought the experiment I was trying to replicate was junk. It’s not. It’s contained in a paper that makes overzealous claims.

      • By the way, you mention this study an awful lot, it would make sense to link to substantive criticisms when you do. Particularly because you once said you wouldn’t berate it, but every time you include “the himmicane study” in a list of examples of “bad” studies without providing any nuance at all, it sure looks an awful lot like you’re berating it.

        • From the WaPo link, you say:

          >I wouldn’t “berate” the study

          So what changed your mind?

          My experience echoes your anonymous colleague, quoted as:

          >I’ve now read several blogs that berate the study, but none of them presents a particularly meaningful criticism (just lots of indignation & ridicule). The critical “expert” quoted in Ed Yong’s blog definitely hadn’t read the paper or didn’t understand the analyses.

        • Jacob:

          What changed is that (a) I saw further critiques of the paper, (b) I didn’t see any good arguments in support of the paper’s claims, and (c) the authors’ defense was lame, which suggests to me they didn’t have any good arguments either. As usual, my conclusion here is that the substantive claims in the paper could be true; the authors just don’t present any good evidence.

        • Okay, I’m going to circle back to the original question as to why a blog might be less credible than a peer-reviewed journal article. From my third-party perspective: (a) is hearsay, (b) is your personal experience (and you haven’t acknowledged the lab experiments in this comment chain, which imho are good evidence of a weaker claim), and (c) is ad hominem (strictly speaking the authors’ communication skills are independent of their statistical skills; believing otherwise is a cognitive bias, a sort of anti-halo effect).

          And lastly, for all the times you’ve insulted this paper on this blog, there was never anything informative. I still know nothing about your other frequent punching bags (Bem, Wegman) because I haven’t bothered to google them, and for all their mentions here there’s not a lot of substance. Along those same lines, why do you call it PPNAS?

  8. for example that paper on ovulation and voting was reviewed by peers who didn’t realize the implausibility of 20-percentage-point vote swings during the campaign

    Yet your paper (with others) on changes in polling results based on non-response ignored the fact that polling results often change radically even in well-behaved national campaigns with no non-response issues (see Bush-Gore 2000 Gallup results, where there were close to 20-percentage-point vote swings). In a previous post I mentioned that non-response was implausible as a cause of your variations given this previous historical fact, to which you made a somewhat snarky response to the effect that the reviewers had found your claims implausible as well and had rejected the paper (is this still the case?). I just think political scientists are in their own little collegium and don’t ever deal with real political phenomena (such as polling for campaigns, running campaigns, governance, etc) and so they develop theories that are tangential (to put it mildly) to actual political reality. In other words, I think you’re throwing stones in your own glass house and are somewhat oblivious to the shards around you. I realize it is hard to maintain consistency with the huge amount of product you put out there (note below–one nice thing about a mathematical framework such as statistics is that it forces consistency) but in particular I think you, and quantitatively oriented political scientists, continually underestimate the corrosive effects of race on the American polity (though of course the second law of political science is “everything is different in the South”, so there is simultaneous recognition that it is important and that it is not–Orwell called this doublethink). After this current presidential campaign, maybe this will change.

    • Numeric:

      I can’t remember exactly what we put in our vanishing swing voter paper, but there is a clear contrast to my 1993 paper with King, where we show big historical vote swings during campaigns. Indeed, that was the point of our 1993 paper. I do suspect that some of the swings we saw in the 1993 paper were partly attributable to differential nonresponse, but all of this is happening in the context of two long-term trends: (1) increased partisan polarization, which has meant smaller vote swings within general election campaigns, and (2) decreased survey response, which I assume has meant greater effects of differential nonresponse.

      In short: our paper was about 2012, not 2000. That said, I haven’t looked at the vote swings during the 2000 campaign (I assume when you say 20-percentage-point swings, that’s what I would call a 10-percentage-point swing, in that you’re referring to the difference and I’m referring to one party’s vote share), but I’d expect that some of these swings could be attributable to differential nonresponse.

      It’s worth looking into (in case any political science researchers happen to be reading this deep in the comments).

      Whether this is all a waste of time, that’s another story. I think you could make a very strong argument that there are many more important topics in political science than what I happen to study.

      • http://statmodeling.stat.columbia.edu/2014/09/30/didnt-say/ has our rather extensive discussion for any that are interested (link to that and look for comments by numeric, Andrew’s response, my response, etc, etc). The gist of my criticism is (reproduced from this link):

        If you recall the 2000 election, the polls varied dramatically over the course of the campaign (see http://en.wikipedia.org/wiki/Historical_polling_for_U.S._Presidential_elections#United_States_presidential_election.2C_2000). Yet there was no indication of large non-response problems by party from any of the polling organizations (maybe there was and I didn’t see it/it wasn’t reported, but my impression is there wasn’t and I didn’t see any of this in the polling I was doing). This is why I think your findings are probably not true (slump versus changing responses). Also, when you state that a 5% difference is “major”, you are ignoring that the Pew estimates are random and that the 55 to 47 percent difference in two successive samples may very well be within expected sampling error bounds, depending upon the Pew n (which I don’t know). Isn’t ignoring randomness in estimation a type M error?

        The more meta-comment is that the 2012 paper’s results (not 1993) are difficult to believe (non-response versus slumpers–love it when polysci types come up with nomenclature) due to the previously mentioned 2000 election polls (the word implausibility comes to mind). I offer an alternative explanation (based on the endogeneity of self-reported party identification and described vote choice) in my comments. The point is that rejection of the 2012 paper (maybe it’s been published by now) was valid by Andrew’s criteria if it was reviewed by experts who happen to know something (having published a number of political science papers, and had some rejected, my experience is that political scientists can’t follow mathematical/statistical arguments). I had one paper at APSR that went through 8 reviewers (it ended up about 4 to 4 and was rejected), none of whom could agree on anything, and only one made a valid comment–this reviewer claimed a proof was incorrect and offered a counter-example. This is exactly the correct way to proceed on a proof but his counter-example was incorrect and I demonstrated that. His response was to drop out of the reviewing–the editor explained to me that he refused to reply to her e-mails and he would be punished by not being allowed to review more papers, to which I replied that seemed like a reward, not a punishment. I looked up the topic on the Web of Science a few years ago and I saw a Nobel prize winner in Economics (I know it’s not a Nobel but everyone calls it that) had obtained the same results I had a year or two later–there was one iconic figure that exactly matched mine so I knew I had the right answer. One is better off sticking to an academic field with clear criteria and doing applied work in political science (and I mean real applied work, with campaigns and such, not applied versus theory in academia).

  9. To get all meta here, wouldn’t studies of techniques recommended by motivational speakers for how to subtly influence other people tend to be susceptible to researchers getting different results depending upon which results the researchers wanted?

    A lot of the replication crisis studies seem to fall more into the fields of marketing research than of Science with a capital S in which researchers discover the eternal Laws of Nature. But the history of the fields of marketing and motivational speaking suggests that positive results tend to be highly contingent upon who is promoting a technique, how enthusiastic they are for their gimmick, and a lot of history (e.g., is this technique too new to come into fashion yet because the public isn’t ready for it, or is too old and corny-sounding because it’s been beaten to death in recent years?).

    • Steve:

      Interesting point and highly relevant given that all three of Carney, Cuddy, and Yap teach at business schools! In this case, though, I’ll go with Simmons and Simonsohn and say that there’s no evidence for any effects here, period. That is, I agree with you in theory that (a) there could be real effects of power pose etc., and (b) such effects could vary over time as well as across situations and among people. But in practice I think these experiments have turned out to be nothing but noise mining exercises or tea-leaf reading. All these poses and mantras can have effects, but I think the research designs in the papers under discussion are too crude to get at them.

    • That may be because much of this stuff IS marketing research. What’s being marketed is the academics themselves — to get tenure, to get grants, to get a reputation, etc.

      I’d phrase this a bit differently — scientific research to find Laws of Nature, and advocacy research, to find facts that support My Position. But that’s just semantics.

      • zbicyclist:

        There are good researchers and we need to remember that, but “to get tenure, to get grants, to get a reputation, etc.” does explain much of the experience I had in clinical research and likely any research area where replication takes a lot of work and time (i.e. not math that can be relatively easily and cheaply replicated by other mathematicians) or where “positive results” are hard to discern (e.g. this talk on philosophy – https://www.academia.edu/18875132/The_Fragmentation_of_Philosophy_the_Road_to_Reintegration_in_press_ excerpts below).

        “First: philosophers, like human beings generally, find it hard to resist acting so as to advance what Mill would call their interests “in the vulgar sense of the word” —the desire for prestige, status, worldly success, and, of course, money; and as it is presently organized our profession creates powerful perverse incentives—incentives, that is, not to be alert for and to seize opportunities to advance inquiry, but instead to be alert for and to seize opportunities to advance yourself”

        “once upon a time, philosophy professors were scholars learned in their field, and published only when they believed they had discovered something or figured out something interesting; but, as the publish-or-perish ethos took hold, many soon came to believe that the only way to survive, let alone succeed, in their profession was to find some niche, some clique, some citation cartel where—provided they used the right, i.e., the fashionable, jargon, and cited the right, i.e., the most influential, people—they could publish enough to get tenure, a raise, a promotion.”

        “I really shouldn’t have been shocked—though, I admit, I was—when the young woman I mentioned at the beginning told me that her supervisor had advised her to “publish as much as possible as fast as possible”; nor that she had evidently realized that the easiest way to go about this was to concentrate her attention on a narrow seam of niche literature”

    • Zbicyclist:

      Ugh. A sad demonstration of the divide between the statistical haves (Ranehill, Dreber, Simmons, Simonsohn, Fung, and the readers of Slate) and the statistical have-nots (the power-pose people, their unfortunate students at some of the world’s greatest business schools, the readers of the Chicago Tribune, and the followers of TED talks). What next for the Trib? An astrology column and weekly lotto picks? A regular science column by Daryl Bem? Subliminal smiley faces that will change their readers’ views on immigration??

  10. If you’ve seen unprepared researchers do an AMA, peer review by Reddit can be brutal.

    On the other hand, I wonder if you’re discounting the role of systematic disinformation in public discourse. Seems like there’s a risk of slipping into a “market is always rational”-like “crowd will converge on the truth” assumption.

    This is why the “don’t admit fault” strategy can work. If you don’t admit fault and have enough loud supporters (either many of them or ones with a big megaphone), you can fence off the truth to the minority specialists within your field. Meanwhile, everyone else in the world views the dispute as a wonky disagreement over “technical details”.

    • J:

      Yes, I noted one of those problems in my above post. Being able to miscalculate test statistics and p-values is another set of researcher degrees of freedom. I have no idea if these mistakes were simple transcription errors (someone computing the statistics on a calculator and mis-entering a few numbers), or deliberate changes to correctly-calculated numbers, or something in between such as aggressive rounding of intermediate quantities, rounding up numerators and rounding down denominators in order to get bigger final values.

  11. “What is peer review good for? Peer reviewers can catch typos, they can catch certain logical flaws in an argument, they can notice the absence of references to the relevant literature—that is, the literature that the peers are familiar with.”

    While I agree with most of your points and recognize the many frustrating aspects of peer review, I don’t think you’re being entirely fair about what the review process can reasonably catch. A good reviewer would catch important problems such as design flaws, inadequate measures, model misspecification, misuse of statistical techniques, poor or incorrect interpretation of results, and so on. This doesn’t mean that all reviewers will catch potential problems, and I’m sure the quality of reviews varies by journal/field/chance? But it does mean that peer review can serve as an important vetting process for sound empirical research (or at least more transparent/honest research).

    My bigger gripe with peer review and the publication process is that it is painfully slow, delaying potential contributions to public knowledge by months or years (often until an issue is no longer of interest). Which makes the power pose authors’ excuse far less satisfying: Why shouldn’t they respond to genuine and thoughtful criticism of their work regardless of where it’s raised?

    • Todd:

      I agree that reviewers can catch all these problems you state. But peer reviewers, maybe not. The authors of the papers in question are part of a peer group that doesn’t understand the problems you listed: design flaws, inadequate measures, model misspecification, misuse of statistical techniques, poor or incorrect interpretation of results, and so on. Maybe they are learning about some of those issues now (or maybe not, given Cuddy’s empty dismissal of the work of Simmons and Simonsohn) but certainly not in 2005 and 2010, when their papers were being reviewed and published.

      In the peer-review process you can get an informed non-peer review (these journals even ask me to review a paper from time to time!), but the reviews from actual peers can have big problems.

      Finally, your question, “Why shouldn’t they respond to genuine and thoughtful criticism of their work regardless of where it’s raised?”, is a good one which perhaps I will discuss in a subsequent post. I think it’s for the same reason that a team, having scored a touchdown, would prefer not to have a video review of the play. To them, their touchdown is valid, whatever process led to it being scored.

      • Ah, I see what you mean. Thanks for clarifying.

        I was reading “peers” as a large population of scholars in the relevant field (or in another field but with knowledge of the subject area/methods being employed). But peer reviewers as you define them are definitely problematic. Even if peer reviewers were to spot potential problems (which you note is unlikely), these small circles of reviewers have little incentive to reject like-minded research which cites their own work and makes it easier for future publication efforts. In this sense, I completely agree with you that peer review has limited value.

    • It is rather disturbing to see people so bright engage in such flawed reasoning, entirely ignoring the flaws in the system from which they directly benefit.

      • Not trolling. Just an example of how others really think. The full quote from that article was:

        Those who can, publish. Those who can’t, blog. I understand that blogs can be useful in affording the general public insights into current science, but it often seems those who criticize or spend large amounts of time blogging are also those who don’t generate much publications themselves. If there were any valid criticisms to be made, the correct venue for these comments would be in a similar, peer-reviewed and citable published form. The internet is unchecked and the public often forgets that. They forget or are unaware that a published paper passed rigorous review by experts, which carries more validity than the opinion of some disgruntled scientist or amateur on the internet. Thus, I find that criticism in social media is damaging to science, as it is to most aspects of our culture.

        • Anon:

          I’m not saying you’re the troll. I’m saying that the person who said this is a troll. It disturbs me that people get attention by saying things like this, but that’s the world we live in.

        • Unfortunately, I don’t think it was trolling. I think it’s the start of academics circling the wagons. If you look back at how Bayesians were treated, or scandals in climate science, or the ideological enforcement of fields like sociology, it’s clear that academics who control tenure, departments, funding, and publishing have enormous power to squelch developments they don’t like, and there is nothing they like less than being knocked off their cozy perches.

          It’s very naive to think they’re going to go along with these developments or criticisms in a friendly manner just because you offer them up in a friendly manner. They’re going to fight back against them tooth and nail.

          The best thing to do is to bypass them completely. Start with something like a GitHub for research: something where you can trivially upload LaTeX, code, and data. It can settle priority and be made public with a single command.

          Combine that with tools making it easy to publish, index, notify, and exchange materials.

          Finally, make it easy for ‘super recommenders’ to operate. Much like the best movie reviewers aren’t good movie producers, directors, or even writers, the best academic reviewers often aren’t going to be the best at research. Call these types “Mersenne reviewers,” after the priest who played this role at a critical time in the 1600s. There needs to be a way for these Mersenne reviewers to prove themselves and rise to the top.

          Most research is a straight up scam. I don’t see the point of being polite about it. It’s a direct and cynical attempt to defraud taxpayers. Plumbers are being taxed so that philosophers can earn six-figure salaries being wrong about subjects they’re barely acquainted with. That’s academia in 2016. What is desperately needed is a major shake-up. Science needs to become the free-for-all it once was (at least for a while). It will be hard for most fields to make progress in such an environment, but currently most fields are guaranteed to never make progress. A small chance is better than no chance.

        • “Most research is a straight up scam. I don’t see the point of being polite about it. It’s a direct and cynical attempt to defraud taxpayers.”

          I don’t think so. The current system does little to detect or discourage scammers (e.g., it is a big deal to do a direct replication… that should be SOP), and it even selects for them (being surrounded by BS doesn’t bother them, for whatever reason), but they are still a small minority from what I’ve seen.

          Most people just don’t know what they are doing but want to do a good job. Is it even controversial that basic statistical competence amongst researchers is very low, probably too low to understand anything written on this blog? They just think they are doing a good job because they are doing what everyone else does and get rewarded for it.

        • A small minority? Oh please, there’s even a phrase for the feeling academics get when the facts of their scam intrude on their conscience: “impostor syndrome,” which 95+% of academics feel. They feel that way because they are impostors. Every academic I’ve met has confessed to gaming the system (splitting one paper into three, working on a dud project because they need the grant money, and so on) ad nauseam.

          I don’t think for a minute Cuddy believes she’s doing science or something important. I think she, like most of the rest, views herself as playing a fun little game, which beats doing hard real stuff, and is very rewarding. If she had to fund her experiments with her own money (easily doable in this case) she’d drop the subject in a heartbeat.

          So if everyone tells people like Cuddy that they’re “researchers” doing “science,” their life is great. The illusion never crumbles and they keep on going with their convenient cozy fictions. But when someone threatens to upset the illusion, which is what Gelman is really doing whether he intends to or not, well then people like Cuddy get very upset about it. It’s extremely naive to think they’ll graciously accept the destruction of their worldview.

          Academia could use some plain speaking:

          (1) Most research is a scam.
          (2) Most academics are on welfare.
          (3) Frequentism doesn’t work in practice because Frequentists got it wrong.
          (4) Most fields have achieved nothing in many a decade despite having unprecedented manpower and money.
          (5) Most fields are not slowly accumulating advances which will lead to something important one day. Most are at a standstill.

        • > a small minority from what I’ve seen

          I doubt much is known about the percentage and it will vary by discipline, place and time.

          That there are pressures for the percentage to increase is, I think, clear, as are the difficulties in undoing them.

        • >”Academia could use some plain speaking:

          (1) Most research is a scam.
          (2) Most academics are on welfare.
          (3) Frequentism doesn’t work in practice because Frequentists got it wrong.
          (4) Most fields have achieved nothing in many a decade despite having unprecedented manpower and money.
          (5) Most fields are not slowly accumulating advances which will lead to something important one day. Most are at a standstill.”

          If you replace “scam” with “sham” and “welfare” with “a jobs program”, then this would be consistent with what I have observed.

        • I’m not convinced that it’s either trolling or circling the wagons — it may be just ignorance, or lack of careful thought, or lack of flexibility. Many people don’t adapt quickly to new ways of doing things.

        • From “anonymous”: “I don’t think for a minute Cuddy believes she’s doing science or something important.”

          I think you are off here. The sad fact is I think (so I guess it is not exactly a “fact” but rather a belief) Cuddy feels she is doing science. We train, not educate, our graduate students. Accordingly, most cannot go beyond the stale and logically fallow (failure to distinguish necessity from sufficiency) mantra of “it is science because I use the scientific method”.

          Bottom line: In contemporary upper-level education we credential folk way beyond their capabilities. And, perpetuating the sad cycle, they then become the credentialing agents.

    • A lot of research (e.g., expansive poses) is a scam no better than selling diet pills. The winners who have played the academic game well and risen to the top of the heap in this scam aren’t going to relinquish their position easily.

      The push-back you’re seeing in that quote is just the beginning. As their position, status, and welfare checks (uh…I mean “research grants”) become seriously threatened, the push-back will intensify exponentially.

      • “The push-back you’re seeing in that quote is just the beginning. As their position, status, and welfare checks (uh…I mean “research grants”) become seriously threatened, the push-back will intensify exponentially.”

        Not just a push-back, they are looking to expand:

        http://pps.sagepub.com/content/10/6.toc

        https://s3.amazonaws.com/v3-app_crowdc/assets/0/03/03f4a318bd0e7a62/SympAbstracts_86.original.1452615483.pdf

        They want to be involved in policy making. Imagine all the BS research that can be used as “evidence” by these folks. It’s pretty scary when you think of it. You gotta hand it to social psychology: in the wake of scandals, non-replicability, etc., they are just doubling down! Amazing stuff.

        • P.S.:

          I like some of these people, but I do have mixed feelings, partly because of the academia-media-government hype cycle. It’s tricky. On one hand, I am happy to promote my own work and would like it to influence practical policy; on the other, sometimes the incentives seem all wrong. So, yes, there is something disturbing to me about this “Council of Psychological Science Advisers” initiative, even though I have some sympathy for its general goals.

          I’d feel a little better about Cass Sunstein, say, offering research-based policy advice if he were to take a clear stand against all the untrustworthy research that’s been published in top psychology journals. (Or maybe he has; I haven’t been following all his writings.) Similarly I’d feel better about Daniel Kahneman being treated as an all-around sage if he would stop going around talking about the notorious “embodied cognition” studies and telling us, “You have no choice but to accept that the major conclusions of these studies are true.”

        • It is completely scary.

          The majority of psychologists (more true of social, but generally applicable) have NO IDEA what psychology consists in and NO IDEA of how the “mind” works (much less any clear conception of what a “mind” is).

          There are no principles (much less laws or well-specified theory) informing us how neural behavior is transformed into mental experience or how these mental happenings determine behavioral outcomes. Just a lot of hand waving and rampant stipulation (of unjustified magical mental mechanisms; reinstating, btw, conditions that enabled behaviorism to gain traction 100 years past).

          Psychology has largely devolved (yes it was a more serious discipline in its youth — when critical thinking and a desire to know nature trumped personal aggrandizement and the intellectually lazy comfort afforded by riding with a well-established, but ill-defined, on-going research industry) into demonstrations of regularities (though recent replication efforts make one concerned for the regularity of those regularities).

          To think these folk now want to have a say in public policy is frightening and presumptuous simultaneously (even ignoring the questionable generalizability of policy based on shaky outcomes gleaned from college freshmen). But, self-promotion being the guiding principle, it is not surprising.

    • The g-test gives you a p-value of exactly .05 (so not <.05 if you want to play the NHST game). Also, the g-test really would require them to describe it as a g-test and to state the embedding model.

      • Sorry for the late reply. Actually it gives exactly the quoted p-value. And yes, I would agree that it would be much more proper to report it as a g-test, but it is chi-squared distributed, and we don’t necessarily require researchers to explicitly state the method when using Pearson’s chi-squared. So I’m going to call it: “We all know you were stealing a cookie, but technically your hand could have been in the jar because you thought you saw a bug.”
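
        For readers who have not run into it, the g-test is the likelihood-ratio counterpart of Pearson’s chi-squared: both statistics are referred to the same chi-squared distribution, but they can give slightly different p-values near a threshold. Here is a minimal sketch in Python, using a made-up 2x2 table rather than the counts from the study in question:

        ```python
        from scipy.stats import chi2_contingency

        # Hypothetical 2x2 table of counts (not the data from the study discussed above).
        table = [[18, 12],
                 [10, 20]]

        # Pearson's chi-squared, without Yates' continuity correction.
        chi2, p_pearson, dof, expected = chi2_contingency(table, correction=False)

        # g-test: same contingency-table machinery, but the log-likelihood-ratio statistic.
        g, p_g, _, _ = chi2_contingency(table, correction=False,
                                        lambda_="log-likelihood")

        print(f"Pearson: stat = {chi2:.3f}, p = {p_pearson:.4f}")
        print(f"g-test:  stat = {g:.3f}, p = {p_g:.4f}")
        ```

        The two p-values are typically close but not identical, which is why a bare “chi-squared” label in a paper does not tell the reader which statistic was actually computed unless the authors say so.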

  12. I think it behooves you to acknowledge the possibility that not every academic approaches every peer review with the honest intention of furthering the understanding of the human race. Aside from insidious unconscious bias (“Harvard guys wouldn’t write rubbish,” or “maybe I should look again because that’s an unusual-sounding surname,” variants of which we all must watch ourselves for), nefarious motives do exist, and there are academics who really do harbor such motives. Is it a tiny minority? Is it widely prevalent? I offer no guidance there, nor can I suggest how one might come up with a suitable estimate. In addition, using the publication contact (editor?) as a back channel for a repeated game of approving each other’s papers (e.g., to pad publication metrics or similar) is also possible. This may go in concert with the “I’ll cite yours, you cite mine” metric-padding game.

    The proviso “Assuming that every peer review is performed by a reviewer who has the relevant expertise and when that reviewer is alert, critical, concentrating and with no other intent than to perform the review to the best of their ability. (As opposed say, drunk in front of the tv at 11pm or forced on whichever grad student was in the wrong place that day). In addition we assume that the selection of reviewers and the anonymity of such reviews is what is claimed by the publication.” would seem an appropriate addition. Bob Carpenter’s statement about his own reviewing (which I agree with and don’t criticize him for) suggests that even then, the dimensions of time spent and secondary objectives (support of open access, etc.) mean that it still isn’t quite strong enough.

    I am not trying to smear academia, merely pointing out that assuming the problems in the data do not exist seems like it might not improve the analysis.

    • >”The proviso “Assuming that every peer review is performed by a reviewer who has the relevant expertise and when that reviewer is alert, critical, concentrating and with no other intent than to perform the review to the best of their ability. (As opposed say, drunk in front of the tv at 11pm or forced on whichever grad student was in the wrong place that day).”

      • Are you saying one reason for sub-par peer reviews is that, as the professor writes the review, they are drunk and sexually assaulting students who ended up “in the wrong place that day”? That could possibly be worse than the Catholic priest scandal, if you have any evidence for it.

      • No. I don’t know how you arrived at that construction. Perhaps this is humor?

        I will say I’ve never heard of a grad student volunteering to do a professor’s peer-review work; usually they are volunteered by the professor. Such volunteering is not necessarily evenly distributed among the grad students.

        I’m sure I don’t need to point out the power imbalance in the relationship between grad student and professor. I accept that the fact that I’ve never heard of grad students volunteering themselves doesn’t mean it never happens. I’m also told there are professors who refuse to put their name on papers where they have only acted as editor; to my mind that would be the right thing. But I haven’t seen that behavior myself.

        Given the rampant exploitation of grad students in academia, it is right and proper to be on the lookout for occasions when that exploitation is sexual exploitation. I don’t think this would be the place I’d report it, and I’m not sure how it would directly affect review quality.

        There are a myriad of ways in which a review could be performed very badly. Think of some for yourself, perhaps drawing on your own experiences.

  13. Andrew writes:

    “(Parts of) the journals Psychological Science and PPNAS have been the house organs for a certain variety of social psychology that a lot of people (not just me!) don’t really trust. Publication in these journals is conditional on the peers who believe the following equation:

    “p less than .05” + a plausible-sounding theory = science.

    Lots of papers in recent years by Uri Simonsohn, Brian Nosek, John Ioannidis, Katherine Button, etc etc etc., have explored why the above equation is incorrect.”

    This made me laugh out loud. Thanks.

    There seems to be a sub-equation at work, also:

    “If p is not less than .05, find some “outliers” to discard & make it true.”

  14. On the Amazon web page I wrote a long review of Presence (i.e., the power pose book), challenging Cuddy to defend her (in my view) junk science.

    Total silence (save for the personal attacks on me, my wife, etc., from the fan base).

    Of course, one might offer that Cuddy had not seen my comments. But the same day, another reviewer called into question some minor aspects of her book and she responded angrily (“bet you did not even read my book!”) — so it seems reasonable to assume that a long, nuanced critique of the “substance” of Presence would capture her attention.

    But substance (e.g., what is the “self” that power poses empower…?) is not something this type of a-theoretical (and by theory I mean serious mechanistic (or not) explanation enabling parametric prediction — not a compilation of data in the service of folk intuition that admits exclusively to the binary outcome “effect present/effect absent”) research and its producers are prepared to grapple with.

    I once had an on-line conversation with a neuroscientist who studies consciousness (use “studies” with caution here). He was of the opinion that conceptual clarity of one’s constructs (i.e., consciousness) only muddies the empirical waters (I kid you not).

    Such is the very sorry state of much research coming from the behavioral sciences. User beware.

    • Sbk:

      Some interesting issues in science communication here. On one hand, a researcher certainly has no obligation to respond to critiques on a book review page, or on a blog, or for that matter in a peer-reviewed journal. She can let the work stand for itself. On the other hand, what are we to think when a researcher (in this case, multiple members of a research team) seems not to care about serious methodological errors in their published work, starting with miscalculated test statistics and going on from there? On a purely strategic level, it can make sense to just not respond and to hope that the criticisms do not become widely noticed by peers or NPR producers, but from a scientific perspective I think it’s best to engage serious criticism and recognize the possibility that one might be wrong.

      • My point (in the above post) was not that she was expected or required to respond; rather, that she clearly was monitoring the comments appearing on Amazon and chose to immediately respond (this was when the reviews were still sparse), in a somewhat nasty manner (i.e., “bet you did not read the book!”), to a poor fellow who runs a school (or some such), while avoiding any serious engagement with conceptual concerns.

        Frankly, I did not write my comments for Cuddy’s benefit (I seriously doubt any “benefit” would be in the realm of the possible), but as a way to alert potential readers/buyers to the strong possibility that this so-called “grounded in science” work is actually potted in very loose soil.

  15. Two additional problems not covered by your article: (1) Peer review is often thought of as being blind review or double-blind review. For highly specialized fields, this is an impossibility. You can have blind review, or peer review, but not blind peer review. (2) Peer review is impractical with today’s publication pressures. In order to achieve some sort of parity, you (and your co-authors, collectively) would need to review three papers for every one you *submit* for peer-reviewed publication. Quality reviews are often a casualty of the time economy in the life of a modern academician.

  16. I’m a TOTAL outsider to these debates, with no academic background in science or statistics. And reading through the slog of these posts, I am UTTERLY SHOCKED that you guys would need to be having these arguments in the first place. It seems rather obvious that people outside the field can have significant contributions to make about the understanding of a particular study, or an analysis of the data that illuminates the situation somehow. I was extremely surprised by Amy Cuddy’s second response, and how unprofessional it seemed. I typically do not trust many of these “studies” anyway [which of course can be a bad thing, given a statistically significant result], but this entire process has really opened my eyes to just how biased individual researchers, and how flawed a peer-review procedure, can be.

    I must say, this is the reason I LOVE reading blog posts from professionals. Not only do you get some insight into the thinking of the professionals themselves, but also the educated posts from “outsiders” can often be so illuminating in their own right.

  17. “Peer review can serve some useful purposes. But to the extent the reviewers are actually peers of the authors, they can easily have the same blind spots. I think outside review can serve a useful purpose as well.”

    I am starting to wonder if this reasoning could also apply to psychological science in general (I know the most about psychological science, so I’ll stick with that). Without wanting to sound like a d#ck, I am increasingly starting to wonder (and worry) that psychological science has not been, and is not, selecting for the “best” and “brightest” people. I reason that this in turn may result in currently hired people not having the capabilities and/or characteristics to hire the “best” and “brightest” students, and so the cycle continues.

    If I am not mistaken, I’ve read the following sentence a few times on this blog:

    “The problem with peer review is the peer reviewers” (or something like that).

    I wonder if it would be accurate to say:

    “The problem with psychological science is psychological scientists”…

  18. I don’t understand why it makes scientific sense to let a few reviewers and an editor judge the value of a paper and decide whether or not to publish it.

    It makes more sense to me to just publish papers and let the entire scientific community decide their value.

    Whether, and how, a paper is used and cited is perhaps all the peer review and quality control that is necessary and desirable.

    It also makes no sense to me that reviewers possibly influence and/or improve the final version of a paper without receiving any credit and without it being clear who actually wrote what. It seems to be in direct conflict with reasons for authorship and all that stuff.
