What happened that the journal Psychological Science published a paper with no identifiable strengths?

The other day we discussed that paper on ovulation and voting (you may recall that the authors reported a scattered bunch of comparisons, significance tests, and p-values, and I recommended that they would’ve done better to simply report complete summaries of their data, so that readers could see the comparisons of interest in full context), and I was thinking a bit more about why I was so bothered that it was published in Psychological Science, which I’d thought of as a serious research journal.

My concern isn’t just that the paper is bad—after all, lots of bad papers get published—but rather that it had nothing really going for it, except that it was headline bait. It was a survey done on Mechanical Turk, that’s it. No clever design, no clever questions, no care in dealing with nonresponse problems, no innovative data analysis, no nothing. The paper had nothing to offer, except that it had no obvious flaws. Psychology is a huge field full of brilliant researchers. Its top journal can choose among so many papers. To pick this one, a paper that had nothing to offer, that seems to me like a sign of a serious problem.

A good study does not need to be methodologically original, but it should be methodologically sound. When we do surveys we worry about nonresponse. To take a few hundred people off MTurk and not even look for possible nonresponse bias, this is not serious.

But, again, it’s not so much that this paper was flawed, as that it had nothing much positive to offer.

Just to be clear: I’m really really really really not trying to censor such work, and I’m really really really really not saying this work should not be published. What I’m saying is that the top journal in a field should not be publishing such routine work. They should be publishing the best quality research, not just random things that happen to slip through the cracks.

I mean, sure, the referees should’ve caught the problems with that paper. But I blame the editors for even considering publication. Even setting aside the paper’s methodological flaws, it was nothing special.

And, once you decide to start publishing mediocre papers in your top journal, you’re asking for trouble. You’re encouraging more of the same.

To clarify, let me compare this with some other high-profile examples:

– Christakis and Fowler’s study of the contagion of obesity. This was published in top journals and was later found to have some serious methodological issues. But it’s not a mediocre work. They had a unique dataset, a new idea, and some new methods of data analysis. OK, they made some mistakes, but I can’t fault a leading journal for publishing this work. It has a lot of special strengths.

– Bem’s paper claiming to demonstrate ESP. OK, I wouldn’t have published this one. But I can see where the journal was coming from on this. If the results had held up, it would’ve been the scientific story of the decade, and the journal didn’t want to miss out. The editors didn’t show the best judgment here, but their decision was understandable.

– Kanazawa’s papers on schoolyard evolutionary biology. I’ve written about the mistakes here, and this work has a lot of similarities to the ovulation-and-voting study. The difference is that Kanazawa’s papers were published in a middling place—the Journal of Theoretical Biology—not in a top journal of their field. Don’t get me wrong, JTB is respectable, but it’s in the middle of the pack. They’re not expected to be publishing the best of the best.

– Hamilton’s paper in the American Sociological Review, claiming that college students get worse grades if their parents pay. This paper had a gaping hole (not adjusting for the selection effect arising from less well-funded students dropping out) and I think it was a mistake for it to be published as is—but that’s just something the reviewers didn’t catch. On the plus side, Hamilton’s paper was thoughtful and had some in-depth analysis of large datasets. It had mistakes, but it had strengths too. It was not a turn-the-crank, run-an-online-survey-and-spit-out-p-values job.

My point is that, in all these cases of the publication of flawed work (and one could add the work of Mark Hauser and Bruno Frey as well), the published papers either had clear strengths or else were not published in top journals. When an interesting, exciting, but flawed paper (such as those by Bem, Hauser, etc) is published in a top journal, that’s too bad, but it’s understandable. When a possibly interesting paper (such as those by Kanazawa) is published in an OK journal, that makes sense too. It would not make sense to demand perfection. But when a mediocre paper (which also happens to have serious methodological flaws) is published in a top journal, there’s something seriously wrong going on. There are lots of things that can make a research paper special, and this paper had none of those things (unless anything combining voting and sex in an election year is considered special).

P.S. Let me emphasize that my goal here is not to pile on and slam the ovulation and voting paper. In fact, I’ve refrained from linking to the paper here, just to give the authors a break. They did a little study that happened to be flawed. That’s no big deal. I’ve done lots of little studies that happened to be flawed, and sometimes my flawed work gets published. I’m not criticizing the authors for making some mistakes. I hope they can do better next time. I’m criticizing the journal for publishing a mediocre paper with little to offer. That’s not just a retrospective mistake; it seems like a problem with their policies that they would think that such an unremarkable paper could even be seriously considered for publication in the top journal of their field.

37 thoughts on “What happened that the journal Psychological Science published a paper with no identifiable strengths?”

  1. Possibly there’s a cause-effect issue here. Does being regarded as a good journal imply that it will publish good papers, or does publication of good papers imply that it will be regarded as a good journal?

    • Clark:

      My impression was that the Association for Psychological Science was set up several years ago to be the serious society for psychology research, because people felt that the American Psychological Association wasn’t research-focused enough. That’s why I’m particularly bothered by Psychological Science publishing a paper with no real strengths. They’re supposed to be the serious people!

  2. “Bem’s paper claiming to demonstrate ESP. OK, I wouldn’t have published this one.”

    One thing worth remembering is that post hoc we can point to all these “obvious flaws” in Bem’s methods (e.g., one-sided vs. two-sided tests) only because tons of good people actually went over it with a fine-toothed comb.

    I’m sure many papers with similar or worse methodological flaws get published in top journals every year. Let’s not think of Bem as an exception here. If you publish yet another paper saying smoking kills, red wine helps, or some such, nobody is going to scrutinize your methods to any significant degree.

    There are tons of papers with crappy methods out there; Bem’s was different only in that it was saying things controversial enough for people to care about his methods.

    • Rahul:

      As I wrote above, I think the editors’ decision to publish Bem’s paper was understandable, given that, if Bem had been correct, it would’ve been a huge huge story. Also, Bem did many experiments and put a lot of effort into each one. He had a new experimental protocol (or whatever it’s called). In contrast, the ovulation-and-voting study was a quick little Mechanical Turk thing.

      • Agreed.

        Personally, I’m glad Bem got published. The world (or at least psychology and statistics) is a better place because it did.

        Although he was wrong (obviously), the net impact on methodology has been positive I think. If Bem hadn’t been published we’d have lost a lot of valuable discussion and useful introspection.

        • I agree Rahul. I think this was actually one of the main points of publishing it. The experimental methods and data analyses reported in the paper were not flawed in any more serious a way than a typical paper that would appear in the field. If the main thesis of the paper had not been something so implausible, the paper would surely never have been singled out on methodological grounds. So choosing to publish the paper is, in my opinion, to say something like: “Here is a paper which plays by the standard set of methodological rules in our field, but which finds support for an impossible conclusion. Perhaps we need to re-examine the standard set of methodological rules in our field.”

          The editors themselves had the following to say in their editorial comment appearing in the same issue as the Bem paper: “It is our hope and expectation that the current two papers [the Bem paper and the Wagenmakers et al. commentary] will stimulate further discussion, attempts at replication, and critical further thoughts about appropriate methods in research on social cognition and attitudes.”

        • Isn’t the implicit assumption here that there’s no other way to start that valuable discussion of methodology besides publishing an obviously-wrong paper? I mean, yes, it’s great to have needed methodological discussion and introspection–but wouldn’t it be better if we could prompt it in some other way besides actually publishing an ESP paper?

          For instance, what about something like Simmons et al. 2011: http://people.psych.cornell.edu/~jec7/pcd%20pubs/simmonsetal11.pdf? A paper that uses satire to get the reader’s attention and dramatize and drive home important methodological points, but which doesn’t “contaminate” the peer-reviewed literature with incorrect conclusions about ESP?

          I agree that it can be hard to get people talking and thinking about methodological issues. But I hope that there are other ways to get people talking and thinking about methodological issues besides “publish a seriously-flawed paper”!

        • It’s too easy to say that the methodology was “obviously flawed”. At least three referees, the editors and perhaps Bem himself did not see the “obvious” flaws in the techniques.

          Based on the conclusion, someone with reasonable priors may say the conclusion was obviously wrong, yes.

    • It’s amazing what sometimes does get published. I don’t know anything about the journal but the paper is a hoot. retractionwatch.wordpress.com/2012/12/05/math-paper-retracted-because-some-of-it-makes-no-sense-mathematically/#more-10981

  3. PaulM–

    WRT the moon landing paper, you forgot to add: It was a survey performed using a web-based survey form that took no special pains to prevent people from voting multiple times using free online proxies, open proxies, or merely exploiting the fact that their dynamically assigned IPs vary naturally — a point that was immediately discussed in blog comments at the blogs where the invitation to the survey was announced. Also: the major results depended on just a numerically small number of entries and so could, at least hypothetically, have been the result of a quite small number of jokers who thought the questions about conspiracies might be fun to play around with during happy hour.

    • It’s also interesting to note that Anthony Watts and other bloggers *openly* suggested the survey be gamed, and, IIRC, you yourself wrote a blog post and code that would allow for gaming surveys, all conveniently ones that didn’t support your “side” of this non-debate.

        • I think you are thinking of what happened after a subsequent paper, ‘Recursive Fury’ (Lewandowsky et al, about criticisms of ‘Moon Hoax’), appeared 2 years after the survey took place, when some Watts Up commenters had some fun reproducing the original survey and asking people to participate.

          This was TWO years after the actual survey took place, and could have had no impact on the original data collected, i.e. the paper was in press, having passed peer review, before this happened.

  4. Andrew,
    I have a question about non-response bias,

    A good study does not need to be methodologically original, but it should be methodologically sound. When we do surveys we worry about nonresponse. To take a few hundred people off MTurk and not even look for possible nonresponse bias, this is not serious.

    If a portion of a paper result springs from a survey with a response rate of 14%, what sorts of steps would you consider necessary to test for non-response bias? This does have to do with a recently published climate paper and it does have to do with arguments where someone seems to be suggesting that it’s reasonable to simply assume non-response bias does not exist because that’s the simplest of possible assumptions.

    • Lucia:

      There are internal and external checks. Internally, you have information on how the survey is run. MTurk, for example, is pretty well respected. Externally, you can compare the demographics of respondents to the general population, and you can compare the demographics of subsets that are being compared to each other.
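
      On the external-check side, here is a minimal sketch of what such a comparison could look like. The function and the counts are entirely hypothetical (not from any study discussed here); it is just a standard two-proportion z-test for whether response rates differ between two groups of invitees, using only the Python standard library.

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+


def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test for equality of two response proportions.

    x1/n1 and x2/n2 are responders/invitees in each group.
    Returns the z statistic and its two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 60 of 300 invitees respond in one group,
# versus 30 of 250 in another.
z, p = two_prop_ztest(60, 300, 30, 250)
print(z, p)  # z ≈ 2.53, p ≈ 0.012: response propensity differs
```

      A significant difference like this wouldn’t by itself invalidate a survey, but it is exactly the kind of nonresponse pattern that ought to be reported and discussed in the paper.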

  5. Thanks Andrew,

    I want to see if I understand with an application. (Yes, this will get too close to the particular paper.)

    In this study, the papers were sent to authors of climate papers. The group doing the study had already done a pre-rating of the papers based on the contents of the abstract. They did this:
    1) Assign papers to a category (i.e. mitigation, methods, impacts, not climate, not peer-reviewed, and so on.)
    2) Rate the papers on a 1-7 scale (with 4 broken into 4(a) and 4(b))

    The authors were then asked to repeat the rating (more or less.)

    The internal check should compare demographics of those who returned vs. those who did not? I assume in this context, the “general population” would be everyone sent the survey.

    I’m not sure how this could have been done given the pre-existing survey constraints, but in the very disorganized comments at my blog, I said that given what they collected, it seems to me one could check whether the proportion of returns was affected by category (i.e. mitigation vs. methods) and also whether the return rate varied by pre-assigned rating. For example: were authors of abstracts rated “1” more or less likely to return than those of abstracts rated “4”?

    Is this the sort of thing you mean? Or are you thinking of internal checks that might require more information? (Possibly, return rate could have varied by the year the paper was written, 1991-200? Or it could have varied by author age, tenure status and so forth. The former would represent potentially available data; the latter, either not so much or a huge amount of work.)

    (I ask because there will be people interested in sifting through the data for these tests. I have no idea what the results would be. The author/abstract data pairs needed to run these simple tests don’t seem to be public yet. But given the politics in climate I would not be surprised if someone either pesters the journal or the data gets FOIA’d. I tend to like to know what tests ought to be run before data are available, as it inhibits just data-mining to find what you “want”. My position is merely that those tests which can be done ought to have been done and discussed in the paper. Until they are, we can’t know, and should be cautious about making very strong declarative statements about what the author results tell us.)

    • Lucia,

      It’s a bit difficult to obtain demographics about people of whom you only know:
      a) Their names on an article; and
      b) That they have NOT responded to your email.

      It should be possible to determine the probability of email response as a function of rating class (as determined by team participants). I don’t know if this has been done, but it shouldn’t be difficult. As it happened, the idea of assessing the authors’ self-ratings was almost an afterthought of the project.

  6. I agree with the post, but somehow I think you’re making an inference based on one single case! You said, “To pick this one, a paper that had nothing to offer, that seems to me like a sign of a serious problem”.
    It seems you had a good prior opinion about this journal (say, it’s a very good journal with probability .9). Now, it happens that a paper this bad was published there. Surely we should revise our opinion. But by how much? You seem to believe by a lot, based on one single mistake.

    I mean, I agree with the argument that they should avoid this kind of mistake, based on all you said. But to say that it’s a sign of a serious problem, well, I’m not so sure. I mean, it is a signal, but a very weak one, and it may well be just noise.

    What am I missing?

  7. The fact that the authors received wide media attention for this paper (pre-press) last July/August, but when asked to provide information and links to the survey, the lead author then lied about the key link (the Skeptical Science survey link that the paper depended on to claim a diverse audience) is also a problem!

    I know this first hand, because it was me that Prof Lewandowsky lied to! Email reproduced here.

    That, and the authors are utterly ethically conflicted: antagonistic and publicly hostile to sceptics.

    Then I find myself researched by Lewandowsky and co-author Cook in their follow-up paper (Recursive Fury), identifying critics of the earlier paper as conspiracy theorists! The LOG12 paper was not even published yet, so we could not formally reply to the journal. Now we can…

    When I finally spoke to an editor in Geneva (Frontiers), I pointed out the ethical challenges, and the paper (Fury) has disappeared (again), now in limbo for getting on for 2 months. See the comments under the abstract to see why.


    Very early criticism of the Moon Hoax paper here:

    where Dr Adam Corner had written an article about it for The Guardian. Lewandowsky had sent him a pre-press-release, pre-publication copy, and Adam reproduced the Guardian article on ‘his’ blog Talking Climate, where I and others expressed surprise at Adam’s lack of scepticism, as the paper had just surveyed readers of blogs that hate sceptics.

    And the comments under the surveyed links described what a rubbish survey it was, said that they did not think the deniers would be dumb enough to fall for it, and that some of them had fun with it (i.e. scammed it). That, and Lewandowsky’s name was made known to them. That, and 5 of the other blogs were mate blogs of Skeptical Science (all guest authors there).

    A car crash of a paper.

    Talking Climate is where the criticism really started..

    4 of the critics here ended up in Recursive Fury (Lewandowsky et al), labelled as espousing conspiracy ideation, when in fact it was adults asking reasonable questions about a highly questionable paper. The fact that a certain ‘Richard Betts’ found himself alongside the sceptics in the data was highly amusing, as the psychology researchers Cook and Marriott were completely unaware that this was Prof Richard Betts, UK Met Office, Head of Climate Impacts, and an IPCC AR4 & AR5 lead author. Richard tweeted that Lewandowsky et al were deluded.

    But what sanction is there when the author is known to have lied, and knows we know that he lied, but published anyway?

    And the journal knows he lied too!

    Strong words, but absolute proof here: (the email to me, where he lies, is reproduced in full in the comments)

    But one of the earliest criticisms of ‘Moon Hoax’ is here, including Paul Matthews, Geoff Chambers, ‘Foxgoose’ and me.

    I published (not in the paper) the names of the blogs surveyed, and was very critical.

    • The ‘Moon Hoax’ paper is of course also published in Psychological Science:

      NASA Faked the Moon Landing—Therefore, (Climate) Science Is a Hoax:
      An Anatomy of the Motivated Rejection of Science – Lewandowsky et al


      The title derived from 4 responses (out of over 1,000), which were almost certainly scammed by participants of the online survey; i.e., some individuals commented on the ‘fun’ to be had.

      This paper is also more problematic than just the stats (which have been looked at closely), because the authors have also been caught in a lie that is important to the paper.

      Even the locals that took the survey didn’t think the ‘deniers’ would fall for such a transparent survey…


      “Yeah, those conspiracy theory questions were pretty funny, but does anyone think that hardcore deniers are going to be fooled by such a transparent attempt to paint them as paranoids?”

      Tom Curtis of Skeptical Science, who has collaborated with Lewandowsky in the past and is certainly no friend of genuine climate sceptics, has now called for the paper to be withdrawn — quote:

      “Given the low number of “skeptical” respondents overall, these two scammed responses significantly affect the results regarding conspiracy theory ideation. Indeed, given the dubious interpretation of weakly agreed responses (see previous post), this paper has no data worth interpreting with regard to conspiracy theory ideation. It is my strong opinion that the paper should have its publication delayed while undergoing a substantial rewrite. The rewrite should indicate explicitly why the responses regarding conspiracy theory ideation are in fact worthless, and concentrate solely on the result regarding free market beliefs (which has a strong enough response to be salvageable). If this is not possible, it should simply be withdrawn.” – Tom Curtis

  8. To be fair, Kanazawa has also been published in some top journals (e.g., Psychological Review). Even good journals do mess up. Psychological Science does publish many good papers, but I agree it has focused too much on headline bait.

  9. This is still my favourite psych science paper, in the ‘metaphors gone wild’ line of research, on thinking outside of the box http://intl-pss.sagepub.com/content/23/5/502.full. They actually put people inside a box (!) and found they were less creative inside the box (!) than outside (possibly due to a lack of oxygen?). Notably, there was a box present in the ‘out of the box’ condition over the controls. So, PIs all over the world, the single best investment to induce creativity in your colleagues is to buy a bunch of empty boxes, conveniently placed next to the desks of your colleagues. I just don’t know how you can read (or write) this sentence without bursting into laughter: “Using polyvinyl chloride (PVC) pipe and cardboard, we constructed a box that measured 5 ft by 5 ft and could comfortably seat one individual. As predicted, participants who completed the RAT while they were physically outside the box generated more correct answers (M = 6.73, SD = 0.50) than did participants who were physically inside the box”. That’s spectator sport science right there! (And of course the p-value is teetering precariously on the brink of significance, the F test reported as F(1, 99) = 3.93, p < .05, which is actually p = 0.0502. But let’s put that down to rounding error, as F(1, 99) = 3.939, p < 0.05. See, out-of-the-box thinking right there.)
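
     On the p-value arithmetic in the comment above: for an F statistic with (1, 99) degrees of freedom, the exact tail probability of F = 3.93 is about 0.0502. Computing it exactly requires an F distribution (e.g., scipy.stats.f.sf(3.93, 1, 99)); as a standard-library-only sanity check, here is a Monte Carlo sketch (the function name and simulation count are my own choices) that approximates the tail probability by building F(d1, d2) draws as ratios of scaled chi-square variates:

```python
import random

random.seed(1)  # reproducible draws


def f_tail_prob(f_obs, d1, d2, n_sim=50_000):
    """Monte Carlo estimate of P(F > f_obs) for an F(d1, d2) variate.

    Each F draw is (chi2_d1 / d1) / (chi2_d2 / d2), with the chi-square
    variates built as sums of squared standard normals.
    """
    hits = 0
    for _ in range(n_sim):
        num = sum(random.gauss(0, 1) ** 2 for _ in range(d1)) / d1
        den = sum(random.gauss(0, 1) ** 2 for _ in range(d2)) / d2
        if num / den > f_obs:
            hits += 1
    return hits / n_sim


p = f_tail_prob(3.93, 1, 99)
print(p)  # close to 0.05, consistent with the exact p = 0.0502
```

     This also shows why F(1, 99) = 3.939 squeaks under .05: the .05 critical value of F(1, 99) is the square of the two-sided .05 t critical value with 99 df (about 1.984² ≈ 3.937), so 3.93 sits just above .05 and 3.939 just below it.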

  10. Why do you, and so many other bloggers, persist in this weird idea that bad papers in refereed science journals are rare, a new phenomenon, or even unplanned?

    Many, many years ago, data from Current Contents (possibly even work by the late founder, whose very pompous weekly column was always amusing) showed that most papers have 1 or fewer citations.

    Clearly, most papers are never read by anyone but their authors; we have known this for years.

    Anyone who goes through grad school quickly learns that ~50% of published papers detract from your brain when you read them; another 25-30% are at least neutral; another 5-10% are of moderate interest; and all the real action is in a small percentage of the published papers.

    This is a planned part of the system: since at many schools you can’t get tenure (keep your job) without research, or you are looked down on without research, there must be a large market for research that isn’t very good. It is a built-in part of the system, just like the pyramid-, Ponzi-like nature of the increase per year in students and the increase per year in funding (like most doubling systems, we see sudden crashes, when VERY senior people go to Congress and decry the “crisis” in funding…).

    And we haven’t even gotten to the general loss of jobs in our society, and how people are compensating with more education and certificates, which require, at least at the high end, published papers…

    Not to mention that for-profit publishers, I speculate, engage in various ruses to get authors to publish; the page charges and subscription fees are paid for by the taxpayer, so the publisher and publishee make out like bandits.

    Sydney Brenner was, justifiably, one of the most famous molecular biologists of our time. In his autobiography, which is very entertaining, he recalls how, while spending an extended period in the hospital (Oxford or Cambridge), he would read old journals; one he recalled as being uncut (1) – no one had ever read it. He got the inspiration for his Nobel prize from a paper in that volume.

    1) In the old days, journals were printed on folded sheets that were sometimes left uncut, so you had to use a sharp knife to separate some of the pages.

  11. Pingback: Mechanical Turk and Experiments in the Social Sciences » Duck of Minerva

  12. Pingback: Mechanical Turk and Experiments in the Social Sciences | Symposium Magazine
