How can statisticians help psychologists do their research better?

I received two emails yesterday on related topics.

First, Stephen Olivier pointed me to this post by Daniel Lakens, who wrote the following open call to statisticians:

You would think that if you are passionate about statistics, then you want to help people to calculate them correctly in any way you can. . . . you’d think some statisticians would be interested in helping a poor mathematically challenged psychologist out by offering some practical advice.

I’m the right person to ask this question, since I actually have written a lot of material that helps psychologists (and others) with their data analysis. But there clearly are communication difficulties, in that my work and that of other statisticians hasn’t reached Lakens. Sometimes the contributions of statisticians are made indirectly. For example, I wrote Bayesian Data Analysis, and then Kruschke wrote Doing Bayesian Data Analysis. Our statistics book made it possible for Kruschke to write his excellent book for psychologists. This is a reasonable division of labor.

That said, I’d like to do even more. So I will make some specific suggestions for data analysis in psychology right here in this post, in the context of my next story:

Dan Kahan sent me this note:

The most egregious instance of totally bogus methods I had the misfortune to feel obliged to call foul on involved an econometrics study that purported to find that changes in law that never happened increased homicides by “lowering the cost” of committing them…)

Actually, as you know, often times investigation of a “wtf?!” report like this discloses that the problem is in the news report & not in the study.

I think you agree that many of the “bad statistics/methods” problems & even the “nonreplicability” problem are rooted in the perpetuation of a set of mindless statistical protocols associated with an ossified conception of NHT (one from which all the thought that it might have reflected was drained away & discarded decades ago).

But certainly another problem is the “wtf?!!!!!!” conception of psychology.  Its distinguishing feature is its supposed discovery of phenomena that are shockingly bizarre & lack any coherent theory.

The alternative conception of psychology is the “everything is obvious — once you know the answer.”  The main point of empirical research isn’t to shock people. It’s to adjudicate disputes between competing plausible conjectures about what causes what we see.  More accounts of what is going on are plausible than are true; without valid inference from observation, we will never separate the former from the sea of the latter & will drown in a sea of “just so” story telling.

I have zero confidence in “wtf?!!!” & am convinced that it is a steady stream of bogus, nonreplicable studies that hurt the reputation of psychology.

I have lots of confidence in EIO–OYKTA. It’s not nearly so sexy — which is good, b/c it removes the temptation to cut corners in all the familiar, petty ways that researchers do (usually by coaxing out a shy “p < 0.05” to emerge  w/ one or another data-manipulative come-on line). But it is dealing with matters that reflect real, theorized, validated mechanisms of psychology (the issue in each case is — which one?!), and ones that are important enough for researchers to keep at it essentially forever, revising, correcting, improving our evolving understanding of what’s going on.

Kahan points to a much-mocked and criticized study by Kristina Durante, Ashley Arsena, and Vladas Griskevicius, “The Fluctuating Female Vote: Politics, Religion, and the Ovulatory Cycle,” which CNN reported on and then retracted under the title, “Study looks at voting and hormones: Hormones may influence female voting choices.”

The relevance for the present discussion is that this paper was published in Psychological Science, a top journal in psychology. Here’s the abstract:

Each month many women experience an ovulatory cycle that regulates fertility. Whereas research finds that this cycle influences women’s mating preferences, we propose that it might also change women’s political and religious views. Building on theory suggesting that political and religious orientation are linked to reproductive goals, we tested how fertility influenced women’s politics, religiosity, and voting in the 2012 U.S. presidential election. In two studies with large and diverse samples, ovulation had drastically different effects on single versus married women. Ovulation led single women to become more liberal, less religious, and more likely to vote for Barack Obama. In contrast, ovulation led married women to become more conservative, more religious, and more likely to vote for Mitt Romney. In addition, ovulatory-induced changes in political orientation mediated women’s voting behavior. Overall, the ovulatory cycle not only influences women’s politics, but appears to do so differently for single versus married women.

I took a look at the paper, and what I found was a bunch of comparisons and p-values, some of which were statistically significant, and then lots of stories. The problem is that there are so many different things that could be compared, and all we see is some subset of the comparisons. Many of the reported effects seem much too large to be plausible. And there’s a casual use of causal language (for example, the words “influenced,” “effects,” and “induced” in the above abstract) to describe correlations.
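To see why the number of possible comparisons matters, here is a minimal sketch in Python. It assumes, purely for illustration, that the comparisons are independent and that there are no true effects at all. Neither assumption comes from the paper, and comparisons made on the same respondents would in practice be correlated. The function name is made up for this example.

```python
def prob_at_least_one_false_positive(n_comparisons, alpha=0.05):
    """Chance that at least one of n independent tests of true-null
    hypotheses comes out with p < alpha (an idealized assumption)."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


# How quickly "something significant" becomes likely under pure noise.
for k in (1, 5, 10, 20, 50):
    print(f"{k:3d} comparisons: {prob_at_least_one_false_positive(k):.2f}")
```

Under these assumptions, 20 comparisons give roughly a 64% chance of at least one “statistically significant” result from noise alone, which is why seeing only a subset of the comparisons tells us so little.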

Beyond all that, I found the claimed effects implausibly large. For example, they report that, among women in relationships, 40% in the ovulation period supported Romney, compared to 23% in the non-fertile part of their cycle. Given that surveys find very few people switching their vote preferences during the campaign for any reason, I just don’t buy it. The authors might respond that they don’t care about the magnitude of the difference, just the sign, but (a) with a magnitude of this size, we’re talking noise noise noise, and (b) one could just as easily explain this as a differential nonresponse pattern: maybe liberal or conservative women in different parts of their cycle are more or less likely to participate in a survey. It would be easy enough to come up with a story about that!
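The differential nonresponse story can be made concrete with a little arithmetic. The sketch below is hypothetical throughout: the 30% true support level and the response rates are invented numbers, chosen only to show that a gap of about this size can appear even if nobody changes her mind. It is not a claim about what happened in the study.

```python
def observed_share(true_share, response_rate_supporters, response_rate_others):
    """Share of supporters among survey respondents, given the true share in
    the population and the response rates of supporters and non-supporters."""
    responding_supporters = true_share * response_rate_supporters
    responding_others = (1.0 - true_share) * response_rate_others
    return responding_supporters / (responding_supporters + responding_others)


# Hypothetical: true Romney support is 30% in BOTH parts of the cycle.
TRUE_ROMNEY_SHARE = 0.30

# Hypothetical response rates that differ by cycle phase and by preference.
fertile = observed_share(TRUE_ROMNEY_SHARE, 0.15, 0.10)
nonfertile = observed_share(TRUE_ROMNEY_SHARE, 0.10, 0.15)

print(f"observed Romney support, fertile phase:     {fertile:.0%}")
print(f"observed Romney support, non-fertile phase: {nonfertile:.0%}")
```

With these made-up response rates the survey would show about 39% versus 22%, close to the reported 40% versus 23%, with vote preferences held completely fixed.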

Anyway, my point is not to slam the work of Durante et al. They did a little study, wrote it up, and submitted it to one of the leading journals in their field. It’s not their fault the journal chose to publish it.

Also, let me emphasize that I’m not saying that their claims (regarding the effects of ovulation) are false. I’m just saying that the evidence from their paper isn’t as strong as they make it out to be.

A statistician offers helpful advice for psychology researchers

My real goal here is to address the question that was brought up at the beginning of this post: What recommendations can I, as a statistician, give to psychology researchers? Here are a few, presented in the context of the paper on ovulation and political attitudes:

1. Analyze all your data. For most of their analyses, the authors threw out all the data from participants who were PMS-ing or having their period. (“We also did not include women at the beginning of the ovulatory cycle (cycle days 1–6) or at the very end of the ovulatory cycle (cycle days 26–28) to avoid potential confounds due to premenstrual or menstrual symptoms.”) That’s a mistake. Instead of throwing out one-third of their data, they should’ve just included that other category in their analysis.

2. Present all your comparisons. The paper leads us through a hopscotch of comparisons and p-values. Better just to present everything. I have no idea if the researchers combed through everything and selected the best results, or if they simply made a bunch of somewhat arbitrary decisions along the way about what to look for.

For example, I would’ve liked to see a comparison of respondents in different parts of their cycle on variables such as birth year, party identification, marital status, etc etc. Just a whole damn table (even better would be a graph but, hey, I won’t get greedy here) showing these differences for every possible variable.

Instead, what do we get? Several pages full of averages, percentages, F tests, chi-squared tests, and p-values, all presented in paragraph form. Better to have all possible comparisons in one convenient table.

3. Make your data public. If the topic is worth studying, you should want others to be able to make rapid progress. If there are confidentiality restrictions, remove the respondents’ identifying information. Then post the data online.

4. And now some advice for journal editors. What’s the purpose of a top journal in a field such as psychology? I think they should be publishing the top work. This paper is not top work, by any standard. The researchers asked a few survey questions to a bunch of people on Mechanical Turk, then did who knows how many comparisons and significance tests, reported some subset of the results, then went on to story time. It’s not innovative data collection, it’s not great theory, it’s not great data analysis, it’s not a definitive data source, it’s nothing. What it is, is headline-bait that’s not obviously wrong. But is that the appropriate standard? It’s not obviously wrong to three referees, so publish it? Psychological Science is “the flagship journal of the Association for Psychological Science . . . the highest ranked empirical journal in psychology.”

As a statistician, my advice is: if a paper is nothing special, you don’t have to publish it in your flagship journal. In this case, as with the notorious Daryl Bem article, I feel that the journal almost seemed to feel an obligation to publish a dubious claim, just because some referees didn’t happen to find any flaws in the data collection or analysis. But if Psychological Science does not publish this article, it’s not censorship or suppression; the authors can feel free to submit it to a lesser journal. For the leading journal to have such low standards is bad news for the entire field. For one thing, it encourages future researchers to focus on this sort of sloppy work.
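Returning to points 1 and 2, here is a minimal sketch of what “all possible comparisons in one convenient table” could look like. Everything in it is an assumption for illustration: the six respondents are invented, the variable names are hypothetical, and the only analysis is a simple mean per cycle phase. The point is the layout: every variable, and all three phases, including the PMS/menstrual group that the paper excluded.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondents, invented for illustration only.
# Binary variables are coded 0/1 so their means read as proportions.
RESPONDENTS = [
    {"phase": "ovulation", "birth_year": 1985, "married": 1, "supports_romney": 1},
    {"phase": "ovulation", "birth_year": 1990, "married": 0, "supports_romney": 0},
    {"phase": "post-ovulation", "birth_year": 1982, "married": 1, "supports_romney": 0},
    {"phase": "post-ovulation", "birth_year": 1988, "married": 1, "supports_romney": 0},
    {"phase": "PMS/menstrual", "birth_year": 1979, "married": 0, "supports_romney": 1},
    {"phase": "PMS/menstrual", "birth_year": 1991, "married": 1, "supports_romney": 0},
]

PHASES = ["ovulation", "post-ovulation", "PMS/menstrual"]
VARIABLES = ["birth_year", "married", "supports_romney"]


def comparison_table(rows, phases, variables):
    """Return {variable: {phase: mean}} for every variable and every phase."""
    by_phase = defaultdict(list)
    for row in rows:
        by_phase[row["phase"]].append(row)
    return {
        var: {phase: mean(r[var] for r in by_phase[phase]) for phase in phases}
        for var in variables
    }


table = comparison_table(RESPONDENTS, PHASES, VARIABLES)

# One row per variable, one column per phase, nothing left out.
print(f"{'variable':<18}" + "".join(f"{phase:>16}" for phase in PHASES))
for var in VARIABLES:
    print(f"{var:<18}" + "".join(f"{table[var][phase]:>16.2f}" for phase in PHASES))
```

A real version would add sample sizes and uncertainty for each cell, but even this bare table lets readers see every comparison and draw their own conclusions.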

I’m hoping the above advice will make Stephen Olivier happy. I’m not just sitting there criticizing something, or telling someone to use R instead of SPSS, or lecturing psychologists about their lack of mathematical skills. I’m giving some very specific suggestions that you, the psychology researcher, can use in your next research project (or, if you’re a journal editor, in your next publication decision).

There is, of course, lots and lots of additional advice that I and other statisticians could give. The above is just a start. But I wanted to start somewhere, just to demonstrate to Olivier (and others) that this is indeed possible.

P.S. Blogger Echidne raised similar points last year.


  1. Anonymous says:

    Someone once told me that every data description in a manuscript should have a p-value next to it. It’s unfortunate that some people actually follow this stupid practice.

  2. Jason says:

    The sad thing is that editors at Psych Science have an incentive to publish just that kind of work that you so accurately describe as
    “headline-bait that’s not obviously wrong”.
    It ensures press coverage, presumably citations, and thus an increase in the all-mighty impact factor. A while ago I had a chat with one of the (former) editors of Psych Science and learned that they place a high premium on “sexy data” when it comes to publishing.

    Can somebody now explain to me what “EIO–OYKTA” (above post) stands for?

  3. Dan H says:

    This goes back to an issue that is percolating in my head. Psychology and most of biological research have radically changed in the past couple of decades. Despite this, the math-based course requirements for most undergrad psychology departments (and most other bio majors) are still the pre-med requirements of physics and calculus plus an introductory statistics course that doesn’t go far beyond probability, t-tests, and ANOVA. For a graduate degree in psychology, you’ll have to take one more intro stats course that re-covers probability, t-tests, and ANOVA in a bit more depth. Beyond whether Bayesian statistics should be a requirement, why can’t there just be agreement that more formal quantitative training is required before calling someone competent in a biological field? Personally, I’d love to see two semesters of undergrad statistics or courses with a heavy statistical/data visualization component, and one semester of a course with a heavy programming/scripting component. I’m not sure what to do with graduate students where there is less total coursework, but it would be nice if every grad student had to bring their research ideas to someone with a strong statistical background (since many psych faculty never had this training). The end result of this training should be someone who is able to look at their data or a paper they’re reviewing and know when to say that the results look weird.

    • Pdiff says:

      Dan H, your comment is on the mark. As a long time consulting statistician, all I could think of reading this is “I can only help researchers as far as they can understand the analysis”. Beyond that, I become the researcher myself and that is not my job. Researchers need strong analytical backgrounds to get the most from their research (and perhaps catch bad research as editors and reviewers).

  4. vfm says:

    EIO-OYKTA stands for “everything is obvious – once you know the answer.”

  5. jonathan says:

    You made a comment that you could see the claimed effects were large compared to surveys. That to me is an admonition to do external checks and report those results. If that’s in your “comparisons” point 2, I think it should be explicit. My reading of your point was they should show their work, while I took your comment in the body to mean, “Do this better.”

    My dad had a great example of external checks. He was listening to the radio and heard that Martians had invaded the earth. He ran and told his dad. He said my grandfather thought for a moment and asked, “Is it on every station?” Silly but if you’re showing a 40% to 23% change, then you should check to see if that’s on every station.

    • Andrew says:


      That’s a good point, that it’s a good idea to compare to external checks. The only trouble is, you have to know where to look. I expect the authors of the article in question would say that, indeed, the large effects they find are consistent with other research findings in the psychology literature regarding the large effects of hormones on behavior.

  6. John says:

    I’ve read Psychological Science for some time and it is a headline bait journal pure and simple. If it wasn’t quite general enough for Nature then you go to PsychSci next. I’m sure you don’t have time Andrew, but if you actually look through an issue you’ll find the premiere journal is chock full of garbage that’s newsy.

    It’s my favourite journal to hand students doing independent research. There are so many easy followup or mandatory replication studies that they don’t really have to dig deep to get started on a project that’s newsy and potentially interesting to them. But really, what’s published often rises to no more than the undergraduate student project level.

    • jrkrideau says:

      Psychological Science also may be prone to exaggerating its impact if this wiki entry is to be believed. It’s up there but it’s still not JPSP even if they both seem to publish the occasional weird article.

      Daniel Lakens’s article has a link to an interesting power calculation spreadsheet he put together. It looks like it may be the equivalent of the R package “”. Kinda scares the heck out of me but one can see why it might be useful.

  7. […] and psychology. It is a yearning piece. Statistics, why are you so coy? And, then, my favorite Statistician gets a hold of it and answers. May this be the beginning of a beautiful […]

  8. Lee Sechrest says:

    The problem is not so much one of statistics as of fundamental ignorance of the principles of the “conduct of inquiry,” to use Abraham Kaplan’s term for philosophy of science, along with equally fundamental ignorance of, or maybe just ignoring of, the basic principles of research methodology. The Durante, et al., paper was flawed way beyond its statistics. Not even gentle concern for the tender quantitative sensibilities of psychology students can justify the failure of so many of our graduate programs to ensure that students know how to do research that is worth good statistical analysis.

    • Jay Verkuilen says:

      I absolutely agree. Statistical concerns are rarely neatly separable from issues of study design and substantive theory. Many analytic techniques have hidden substantive assumptions baked in as well, and these can be far from innocuous.

  9. Daniel Lakens says:

    Dear Andrew,

    Thanks for referring to my blogpost. To be sure, there are some great examples of statisticians making practical contributions. Kruschke’s book is definitely one of them, and I think it is an important step towards getting psychologists to actually do Bayesian statistics. I also agree with Dan H’s comment that teaching can improve (but some curricula are so full, it is difficult to make room). In my experience, most psychologists want to do the right thing, but their time to learn how to do this is limited (as for every academic). Our research questions take us all over the statistical landscape, and this means we sometimes need techniques in which we are not experts. We have to weigh learning how to do meta-analyses, learning Bayesian statistics, learning how to create better virtual environments for experiments, improving our understanding of physiological measurements, and reading that relevant theoretical work from the 50’s (to name just a few of the choices to spend your time on). This explains why we are such great fans of online forms (Kruschke’s online Bayesian tests), easy to use programs (G*Power software), or spreadsheets. Perhaps we should have learned more programming skills (but others will tell you we need more signal processing skills, more knowledge of psychological theories, etc), so formulas or even R code are not the best way to get us to use things you create. So if you are interested in improving knowledge-utilisation (which is basically what my post is about, and yes, psychologists can make some big improvements in that department themselves), then keep your audience, and their limitations, in mind. It will get you a lot of appreciation (and if you think of something really useful, such as G*Power, it will easily get you some papers with thousands of citations).

    All the best,

    Daniel Lakens

    • Andrew says:


      Software is great—I write a lot of it myself—but I think it can also be useful to formulate general principles such as Analyze all your data and Present all your comparisons. This leads to an interesting challenge: to develop software that makes it easier for researchers to analyze all their data and present all their comparisons. I’ll have to think about how this could be done.

  10. John says:

    I understand and am sympathetic for the general thesis of this post; however, I think many statisticians and some of the other critics of some of the seemingly sensational papers in Psych Science would find a more receptive audience without repeatedly devaluing the paper by referring to the paper as “story time” or “lots of stories” and the like.

    I am not an expert on Durante et al’s paper or theoretical perspective, as I imagine is the case of Andrew and others commenting here. The position of Durante et al might be a reasonable theoretical position based on past research. Moreover, if you look at Durante’s publication record, she seems to be telling the same basic story consistently across publications. That is, as opposed to the inference invited by the post above that Durante et al made up the stories after seeing the data, the history of her publications invites the inference that these tests follow from her other work on similar topics.

    Of course this doesn’t have anything to do with how convincing the statistics are, if the effect size seems implausibly large, or if there are alternative plausible accounts for results that are consistent with what research knows about ovulation, politics, and their combination. My point is that as non-experts on the theoretical background, it is probably more convincing in the long run to avoid devaluing the perspective along with the statistics.

    I often see critiques of papers in Psych Science (or JPSP, or Science, or whatever) that suggest there is “no theory” behind the effect etc., but whose authors either didn’t read the paper or are unfamiliar with the topic and so do not know that there is, in fact, quite a bit of theory that predicts a particular effect. I think this type of uninformed, seat-of-the-pants, intuitive theoretical critique is not helpful and allows people who do know the theory/research to dismiss the rest of the argument.

    • John#2 says:

      Oops…realized there is another John. So, consider the post above as from John#2

    • Andrew says:


      As I wrote, I’m not saying that their claims (regarding the effects of ovulation) are false. I’m just saying that the evidence from their paper isn’t as strong as they make it out to be. And I do think it ridiculous that in their paper they do not even consider alternative explanations. All in all, that paper has several flaws and no exceptional strengths, and I am unhappy about a system that would let it be published in a top journal.

      I am indeed an expert on voting and political attitudes, but the main purpose of the above post is not to criticize that paper (which I don’t think many people are taking seriously) but rather to offer some general recommendations for psychology researchers. It was just serendipity that I received the request for statisticians to be helpful, on the very same day that someone asked me to comment on the Durante et al. paper. It worked well to use this paper as an example to develop some suggestions.

      • John#2 says:


        I regularly read your blog (never commented until now). I am aware of your expertise and enjoy your academic papers. My comments were not a critique of the substance of the post, but rather a comment on style, such that I think it could be improved so as to be more convincing and less irritating to some of the same people you were trying to convince. I guess to say it another way, if I were in Durante et al’s position, I would be more likely to take the good advice about voting behavior and statistics if I wasn’t also accused of “story telling”. I want people to take this advice, so please interpret my comments in the spirit of helping things get better (I apologize that this wasn’t clear the first time).

        • Andrew says:


          Yes, this makes sense. My intended audience was not Durante et al. but rather the general audience of psychology researchers, most of whom, I think, would feel there’s something wrong with that published article but not quite be able to put their finger on what went wrong, from a statistical perspective. That’s the point of my recommendations, to say that if these general principles had been followed, perhaps these mistakes could have been avoided.

  11. badrescher says:

    As a cognitive psychologist and statistician with pretty extensive training in Bayesian modeling, Structural Equation Modeling, GLM, non-linear, and non-parametric testing, I think I can disagree with you confidently.

    First, although many fields of psychology are notorious for using weak statistical methods, many are LEADERS in the application and even development of very good methods. We need them, so we come up with them AND we can justify their use.

    Second, it seems that you could learn a lot about good methodology from psychologists. For example, and there’s no nice way to put this, your advice sucks. It would actually cause all kinds of problems for psychologists and degrade the validity of what we do. We have reasons for our standard practices that you couldn’t know about because you clearly have little training in the area.

    For example, your very first piece of advice destroys external validity. We exclude subjects for good reasons that have nothing to do with statistical analysis.

    Statistical analysis is a tool that provides us with information about probabilities of events. It is not the foundation for methodology. Some statistical analyses can help us improve validity, but no test can make up for a lack of screening (excluding subjects based on criteria chosen beforehand) or a lack of random assignment.

    And sharing data is not as simple as you seem to think it is. Yes, it would be easier for other researchers to uncover fraud or unintentional errors, but it also provides a means for people to challenge valid conclusions with post-hoc analysis. This is not a good thing. It’s called “fishing”, and it leads directly to Type I errors.

    These are just a couple of examples of what’s wrong with your piece.

    There is much you should learn about what we do before suggesting that we change it, especially when there are PLENTY of people working in the field of psychology whose primary training is in statistics. These are people who have a good handle on both fields and are qualified to discuss how statistical tools might be better used.

    • Andrew says:


      I think you misunderstood my post. I have a huge respect for psychology researchers and psychometricians. In fact, one of my general principles of statistics is, Anything we can think of, some psychometrician did it fifty years earlier.

      In this case I was responding specifically to a self-described “mathematically challenged” psychologist who was appealing to statisticians to help. I’m hardly dissing psychologists to respond to a call for help.

      Regarding the rest of your comment, let me just make three points:

      1. I think there is a problem with the field of psychology when a leading journal in the field can publish a paper that is so weak. But don’t trust me on this: lots of psychology researchers agree with me. The publication of weak results is a well-known crisis in psychology. See, for example, the discussion of work by Uri Simonsohn, Brian Nosek, Gregory Francis, E. J. Wagenmakers, Jelte Wicherts, and others. You can of course feel free to disagree with all these people, but then you should just be aware that you’re disagreeing with some serious psychologists.

      2. No, my advice to analyze all your data does not “destroy external validity.” If you have a particular claim to make about a particular subset of the data, go for it, you can have all the external validity you want. But it doesn’t make sense to throw away the rest of your data. Especially not in an example such as this one where the hypotheses are so speculative.

      3. You write:

      And sharing data is not as simple as you seem to think it is. Yes, it would be easier for other researchers to uncover fraud or unintentional errors, but it also provides a means for people to challenge valid conclusions with post-hoc analysis. This is not a good thing. It’s called “fishing”, and it leads directly to Type I errors.

      I think that paragraph is simply ridiculous. To say that people shouldn’t share their data because someone else might come to a different conclusion? What a horrible, horrible attitude. This is the kind of thinking that leads our government to classify millions of documents a year, not because these are state secrets, but because they don’t want to be embarrassed. When I do an analysis I’m happy to share my data. And if others come to different conclusions, that’s just fine. That’s how it should be.

      Your comment is useful if for no other reason than to remind us that there are people with such defensive, backward-thinking attitudes, to remind us (that is, those of us who are fighting for open data, open analysis, and open review) what we’re struggling against.

      • Anonymous says:

        In this case I was responding specifically to a self-described “mathematically challenged” psychologist who was appealing to statisticians to help. I’m hardly dissing psychologists to respond to a call for help.

        The title of your post suggests otherwise. So does the text of it. BTW, “I’ve written a lot about this” doesn’t make you “the right person to ask”. Anyone can write stuff.

        Regarding #1, I never said that the field’s publishing practices were perfect. If that’s what you got from my comment, you weren’t paying attention.

        Regarding #2, you’re simply wrong.

        Regarding #3, you clearly don’t understand what my comment means. Do you know what “fishing” refers to?

        It’s posts like yours that remind us that there are people whose overconfidence and ignorance threatens to undermine the scientific process.

        • Andrew says:


          Intellectually, I realize that there are a lot of confused and angry people out there, and some of them write blog comments. Emotionally, though, I have to admit that this sort of thing upsets me. I’ve been lucky in that we get very few trolls on this blog. Most of our commenters, even those who disagree with me, get themselves suitably informed on the topic before coming here to argue. In fact, I often learn a lot from well-informed commenters who disagree with me. I suggest that if you want to get anything useful out of commenting here in the future, you inform yourself. Statements on your part such as, “We have reasons for our standard practices that you couldn’t know about because you clearly have little training in the area,” are simply misinformed. Understandably defensive, perhaps, but misinformed, and really a waste of your time as much as mine.

          • Entsophy says:

            Andrew, I know that last line was directed at you, but if you don’t object, I’d like to steal it and use it as my tag line:

            “Overconfidence and Ignorance so great it threatens to undermine the Scientific Process”

            I love it! I’m going to put it in my email signature.

      • Brad Stiritz says:


        >>And sharing data is not as simple as you seem to think it is. Yes, it would be easier for other researchers to uncover fraud or unintentional errors, but it also provides a means for people to challenge valid conclusions with post-hoc analysis. This is not a good thing. It’s called “fishing”, and it leads directly to Type I errors.

        >To say that people shouldn’t share their data because someone else might come to a different conclusion? What a horrible, horrible attitude.

        Thank you for sharing your reaction. I was confused by badrescher’s claim. I don’t see that publishing one’s raw data leads directly to Type I errors. Do you understand where s/he’s coming from? If so, could you please add a few expository words for a very interested non-specialist? Is there sound logic here? If not, why not?

        Also, just a general note to everyone here: this is a serious blog published by a serious person. If you post critical comments anonymously, it reduces your credibility & raises questions about what kind of frame of mind you’re in. People will always & forever disagree about things, so that’s expected. Please have the decency & respect for others to compose your thoughts carefully, take your best shots & sign your name to them, so it’s evident that you’re a serious person as well.

        • Ben Murrell says:

          I think that anon is very clearly badrescher, who *is* posting under her name (follow her name link) but didn’t enter the Name field for her second post.

    • Anonymous says:

      I think you missed the point with the “use all the data” advice. I think he means his usual spiel of using all the data with partial pooling, not complete pooling. Modeling the degree of variation between the groups of particular interest vs. other groups should actually provide some information to help assess external validity.
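To make the distinction between partial and complete pooling concrete, here is a minimal sketch. It is not the analysis of any particular study. The function name `partial_pool` and the between-group standard deviation `tau` are assumptions introduced for illustration. In a real multilevel model `tau` would be estimated from the data rather than supplied by hand.

```python
def partial_pool(estimates, std_errors, tau):
    """Shrink each group estimate toward the precision-weighted grand mean.

    tau is an ASSUMED between-group standard deviation (hypothetical input):
    tau = 0 gives complete pooling; a very large tau gives almost no pooling.
    """
    # Weight each group by the inverse of its total variance.
    weights = [1.0 / (se**2 + tau**2) for se in std_errors]
    grand_mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    pooled = []
    for y, se in zip(estimates, std_errors):
        # Fraction of weight placed on the grand mean; noisier groups shrink more.
        shrink = se**2 / (se**2 + tau**2)
        pooled.append(shrink * grand_mean + (1 - shrink) * y)
    return grand_mean, pooled


# Made-up group estimates and standard errors, for illustration only.
grand_mean, pooled = partial_pool([0.30, 0.05, -0.10], [0.10, 0.10, 0.10], tau=0.10)
print(grand_mean, pooled)
```

Each partially pooled estimate lands between the raw group estimate and the grand mean, so no group is thrown away and no group is treated as identical to the others.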

      The obsession with Type I errors has really done some harm to scientific research. It’s not that choosing to work with a Type I error framework is inherently bad, but it’s just that there’s a tradeoff involved in any choice of cost function. There are some problems that are specific to “Type I error” style thinking and problem solving (e.g. multiple comparisons), but people end up thinking they’re the _only_ problems that exist and forget about other elephants in the room such as effect size overestimation, efficiency, base rates, and confounding.

      • Andrew says:


        Indeed, it is my usual spiel, but in this case I’m trying to say something slightly more general. Even if you’re not planning to use multilevel modeling or Bayesian inference, even if all you’re going to do is make a big table of comparisons with p-values, I’d still like to see the whole table, rather than have 1/3 of the data excluded. I’d prefer to display all three columns (ovulation, post-ovulation, and PMS/menstrual), and then if the author wants to focus just on the first two columns, fine, but at least the reader would be able to see all three columns and draw his or her own conclusion. Even without any multilevel modeling, I think this would be a step forward.

    • Jay Verkuilen says:

      (Disclosure: PhD in psychometrics from UIUC, 2007, among other degrees, currently a faculty member at CUNY Graduate Center)

      I think that Andrew Gelman’s guidelines represent a nice take on ways to foster good practice. I’ll stick just to openness, though:

      Releasing data isn’t an invitation to fishing in any meaningful sense, any more than already plagues areas of psychology, with Psych Science all too easy an example of the bias towards newsworthiness trumping proper scientific process. (Psych Science was also the home of arbitrary rules about statistics better known as P-rep, taken based on what seems largely to be an editorial whim.)

      For instance, a number of fields have generated reporting requirements or norms such that authors need to post data or code. Even if they don’t require it, some fields have very healthy norms about this. DataShop at CMU comes to mind; many datasets are publicly available in known data formats and this helps the field a great deal. People argue about how to format data so others can use it. It’s a public, repeatable process. UCI similarly hosts a number of well-known datasets which have been used to benchmark many new statistical and analytic procedures. And of course there’s the granddaddy of public availability, ICPSR at Michigan, where many disciplines post their data and analysis code. These fields have learned to cope with public data.

      Far too often, psychologists don’t make their data available. This may well be for good reasons, e.g., HIPAA, FERPA, or IRB requirements, but it is also often just because we don’t bother, because that’s what we’ve always done, or might well fear the consequences of doing so. Openness helps others use what you did and check your results. I know that posting analysis code to an article helps get others to use your work and so it’s worth it just on that alone. I also know that I’ve written things with mistakes in them (hopefully nothing too big!). That’s something that we have all done if we are being honest with ourselves.

      Newer tools such as Sweave and making code public and transparent go a long way towards alleviating the impact of mistakes.

  13. RJB says:

    My concern about recommendations 1 (analyze ALL the data) and 2 (report ALL the comparisons) is that doing so will yield incomprehensibly complex and voluminous articles. If part of my sample is fundamentally different from the rest, I will have to conduct a far more complex analysis for little benefit, and I doubt my readers will take much from the paper. If I present every single comparison, I feel like I am abdicating my role as author to help readers direct their attention to the comparisons that are actually relevant to my theory. And isn’t part of the point of sharing data to allow authors to write concise papers that emphasize the key findings, while giving free rein to those who want to explore the details viewed by the author as less valuable?

    Of course, this all presumes that there is a theory being tested, so that the author can make reasonable selections of data points and comparisons. I think you (or maybe it is just Kahan) overstate the atheoretical nature of most social science. Would you stick by points 1 and 2 if you believed the researcher had a strong theoretical foundation for their predictions that a subset of subjects will exhibit a particular set of comparisons? Or is this advice contingent on the assumption that the study lacks a believable theory?

    • Jay Verkuilen says:

      Simple solution? Supplementary web appendix. Those who care can take a look at it. Those who don’t can read the shorter paper.

  14. Andrew says:


    You write, “My concern about recommendations 1 (analyze ALL the data) and 2 (report ALL the comparisons) is that doing so will yield incomprehensibly complex and voluminous articles.”

    I disagree. I think that a well-made graph (or even a table) could express all this information in less space than is currently taken up by the prose in the article giving selected parameters and p-values. Of course the author can help readers direct their attention, but I’d rather have this direction done in the context of seeing all the comparisons. As it is, if I don’t see a comparison, I wonder if it’s been excluded because the authors looked at it and it didn’t fit their story.

    You write, “If part of my sample is fundamentally different from the rest . . .” But in this case the decision to exclude menstruating and PMS days seems arbitrary and inappropriate. Indeed, if women had different preferences during PMS, that would fit the story too, no? I don’t think this particular study is atheoretical; if anything, there’s too much theory. They have enough theory to explain anything.

    • Anonymous says:

      This may be a semantic issue, but I wouldn’t call that “too much theory”, but rather “too flexible of a theory”. To me, too much flexibility is usually a sign of too little theory. “More theory” means being increasingly specific about the underlying mechanism. Usually the more specific one is in regards to the underlying mechanism, the more ways there are to invalidate it and the less flexible the theory will be to explain observations inconsistent with the mechanism.

      The limiting example of this is the claim that “God causes everything” – is the problem with this model too much, or too little theory?

      I agree with the comments that researchers need to stop using prose when a picture would be more appropriate.

    • RJB says:

      Thanks for the quick reply, Andrew. However, your response sounds specific to a particular study, while recommendations 1 and 2 seem directed at a generic psychologist. In that broader context, I fear that you are recommending a lopsided imbalance between Horn’s two pragmatic maxims of language: to communicate as much information as possible (quantity), and to communicate the most relevant information possible (relevance). The two maxims conflict, because the more we say, the harder it is for our audience to know what we think is relevant.

      As a frequent reader, reviewer and editor, I hesitate to advise authors to write extremely long papers filled with analyses that aren’t relevant to the contribution of the story, thereby imposing a tremendous burden on readers.

      Am I still misreading you? Or do we just differ on the relative value of quantity and relevance in academic articles? (Again, note that I am not talking about the specific paper above, but the general approach we should take to writing up our results.)

      • Andrew says:


        I think you can do all the comparisons without taking up more space in the articles. Currently, lots of space is taken up presenting comparisons in prose, one at a time. A table or graph will show more in less space. Of course, some choice still needs to be made: the raw data should be in an online archive, not repeated number-for-number in the paper. But there are lots and lots of cases like the paper under discussion, where natural questions arise because the readers are only shown a few pieces of the puzzle, and then we have to guess what the other pieces look like.

        • K? O'Rourke says:

          My first thought would be something like a meta-analysis forest plot to show all the observed comparisons with confidence/credible intervals as one of 20 lineup plots to give some sense of the real uncertainties involved (dependencies, multiplicities, etc.). Of course, along with a plot of accompanying observed control means/rates with confidence/credible intervals so that nuisance parameters are not forgotten about – so two (maybe large and unwieldy) plots.

          I wonder if there are examples out there of how to show succinctly all the comparisons that could have been made in a study?

          • Anonymous says:

            I’ve seen forest plots on posterior contrasts in “plotmatrix” form that do this, but I don’t think this sort of thing is widely used.
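As a rough sketch of the "show every comparison" idea discussed in this thread, the snippet below lists all pairwise differences with approximate 95% intervals, which is the tabular content a forest plot would draw. The numbers are made up for illustration and do not come from any study. The function name `all_comparisons` is hypothetical. The interval uses a normal approximation with a Welch-style standard error, which is crude for samples this small.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def all_comparisons(groups, z=1.96):
    """Return (label, difference, lower, upper) for every pair of groups."""
    rows = []
    for (name_a, a), (name_b, b) in combinations(groups.items(), 2):
        diff = mean(a) - mean(b)
        # Welch-style standard error of the difference in means.
        se = sqrt(variance(a) / len(a) + variance(b) / len(b))
        rows.append((f"{name_a} - {name_b}", diff, diff - z * se, diff + z * se))
    return rows


# Hypothetical ratings for three cycle phases; illustration only.
groups = {
    "ovulation": [4.1, 3.8, 4.6, 4.0, 4.3],
    "post-ovulation": [3.9, 3.7, 4.2, 3.6, 4.0],
    "PMS/menstrual": [4.0, 4.4, 3.5, 4.1, 3.9],
}

for label, diff, lower, upper in all_comparisons(groups):
    print(f"{label:34s} {diff:+.2f}  [{lower:+.2f}, {upper:+.2f}]")
```

Three groups give three rows, so the full set of comparisons takes less space than a paragraph of prose reporting only the selected ones.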

  15. GL says:

    The original post by Daniel Lakens (unfortunately, I couldn’t find an easy way to comment there) seems based on a bizarre premise, namely that all statisticians should do is provide simple-to-understand software for psychologists. Otherwise it’s difficult to see what the criticism actually is.

    Sure, you can find many articles that do not seem relevant to you personally or that do not come with an implementation but statisticians write tons of software – there are even a few journals entirely devoted to that – and books explaining how to analyze data with many different software packages. SPSS is less popular than other packages but it’s simply not true that statisticians don’t care about making their results applicable. And if you are going to demand that everything be implemented in SPSS, you have to consider that writing software fit for release actually takes a lot of time and that researchers in other fields all have their preferences (SAS, R, Minitab, Stata…) so the notion that it would only take 10 minutes is a joke. (Not to mention the fact that for you, the psychologist, it’s apparently OK not knowing how to read a mathematical formula, use R or program at all but that statisticians apparently should all be expert programmers in half a dozen languages or more even if they are theoreticians not particularly interested in your problems.)

    More importantly, there are some things that are fundamentally wrong in the way many experimental psychologists approach data analysis. There is no easy fix for that and moaning because being criticized does not feel nice does not make the problem disappear. Besides, not everything can be explained in the push-button format of “Discovering statistics with SPSS” or boiled down to a convenient rule of thumb. For example, statistical power depends on many things that weren’t covered in a typical psychology curriculum but there is no way around it (it’s not extremely complicated in fact but I can see how it can feel daunting at first). Recommending 20 participants per cell does not solve any problem.
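To illustrate why a fixed rule such as 20 participants per cell cannot settle the question, here is a minimal sketch of an approximate power calculation for a two-sided, two-group comparison. It uses a normal approximation rather than the exact t-based calculation, and the function name `approx_power` and the effect sizes tried are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized mean difference d.

    Normal approximation for two equal-sized independent groups,
    two-sided test at level alpha.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = d * sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same sample size, different assumed true effect sizes.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power ~ {approx_power(d, 20):.2f}")
```

With 20 per group the power changes a great deal as the assumed effect size changes, and for a medium standardized effect (d = 0.5) it is well under the conventional 80% target. The sample size alone says little without the effect size, the noise level, and the design.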

    You just can’t assume that there must be a painless solution accessible to you with your current software and level of knowledge. Maybe, just maybe, this stuff is complicated and you really need to educate yourself or, failing that, to hire a specialist to help you. You wouldn’t expect, say, neuroimaging to be cheap and doable with only Excel and an afternoon’s time to read a cookbook. Why should statistics work that way?

    I am myself a psychologist by training and I find all this sense of entitlement very annoying. If you don’t understand why, consider this analogy: You are basically acting like a software engineer asking how to design some software screen, a recruiter trying to select the best employee, or a marketing specialist formulating an ad who would demand that you, personally, whatever the subfield of psychology you are in, provide actionable advice now now now, without making any effort to learn anything, no recognition that your knowledge might require some expertise to apply properly, no patience for any distinction you might want to make, and no willingness to examine their own hidden assumptions. This is clearly absurd.

    • Daniel Lakens says:

      Dear GL,

      I think you are missing my point, and you might not have read my post carefully. For example, I’m actually saying that recommending to collect 20 participants per cell is not solving anything, or that writing articles that are not relevant for a large audience is perfectly fine (I do it myself).

      First of all, there is no sense of entitlement. It’s perfectly fine to keep on writing articles with formulas and not provide anything else (such as R code, spreadsheets, etc). I’m just saying that thinking about how others might want to use what you come up with will be very helpful for those people. I don’t think anyone is entitled to get help, but I’m pretty sure people appreciate it now and then.

      I don’t expect anyone to provide code for all types of software, R, SAS, Stata, etc. But I do think it is weird that people don’t even share the code they used themselves when working on an article. Surely, sharing what you used yourself can be done in 10 minutes? I think that just as all psychologists should share their data (something I fully agree with Andrew on), statisticians should share the code, spreadsheets, etc they used to perform calculations. But it is perfectly fine if you disagree.

      I’m also not looking for an easy solution. I’m willing to learn something (see my earlier blogpost on how I spent some time figuring out how to calculate and report effect sizes, an experience that led me to write the blogpost you read). Note that I did provide an easy-to-use spreadsheet, because I know it will make life a lot easier for many of my colleagues. It took me some time, but I’m confident all the time it took me will be compensated in a pay-it-forward kind of manner by the time it will save the scientific community as a whole.

      My main point was simply that in the literature I came across, it would have been very easy to facilitate knowledge utilization, if the authors had spent just a little time on it. Not just for psychologists – people have told me that even statisticians spend quite some time translating formulas back into code to be able to use them. Seems easy to save each other some time, that’s all.


      Daniel Lakens

      • GL says:

        I did notice the point about writing specialized articles but I still think the tone of the post really betrays a sense of entitlement or wishful thinking that I have often felt from psychologists.

        For example, you still seem awfully quick in declaring that things other people should do are “easy”. I think that you seriously underestimate how much time it takes to make a computer program fit for release. Let’s be realistic: Uncommented, hard-coded SAS or R routines are not going to help the “mathematically challenged” non-programmer do anything. Of course, I don’t disagree that it would be great to have easier access to any code written to prepare an article but what you wrote in your post was that “it’s not enough” if it’s in R.

        (Incidentally, all the money universities spend on SPSS licenses plus the time all PhD candidates, post-doc, etc. are wasting figuring all this out could buy some serious statistical consulting but that’s usually not entering into the discussion. We prefer to drink beer chatting about what statisticians should do for us for free.)

        In any case, statisticians are way ahead of psychologists when it comes to releasing stuff or making sure their research is relevant (again, entire journals exist to publish and document statistical software or explain techniques to potential users) so picking on a specific article and insinuating it’s a general problem seems unfair to me.

        As a psychologist, I truly believe that there are lots of practical solutions out there. The issue is not statisticians unwilling to help, it’s psychologists actively resisting anything that could challenge their 2×2-where-is-my-p-value mentality. If an approach is fundamentally wrong and its users don’t want to consider anything other than a drop-in replacement for their current technique (e.g. another way to spit out a p value that will be interpreted as a proof that an effect is “real”), every reasonable criticism can only be perceived as non-constructive. That’s the real problem, not code availability, and moaning about it just becomes a way to avoid recognizing that.

      • Nick Cox says:

        Sharing code covers a multitude here. At the most awkward end is code lost or code that is impractical to send. I have decks of punched cards with old Fortran programs from the 1970s and in the unlikely event of someone asking for the code I would just decline to type it all out again, as the files were lost on the demise of one mainframe. Even for very recent projects there is a spectrum from (a) scripts that are tied to the analysis of particular datasets to (b) programs that are well documented and have enough generality to work with other data. I suspect that much reluctance to provide code stems from diffidence about code of type (a). There can be darker motives in hiding fraud or other indefensible practices, often discussed on this blog, but I’d guess that diffidence is a much bigger factor. Similarly, I guess that very little that was done using SPSS or spreadsheets is easily transferable to others. (That it should be more available is not at issue.)

      • K? O'Rourke says:

        Daniel, I am going to mostly side with GL but first as an aside RA Fisher (one of the most productive statisticians) once made very similar arguments about mathematicians who repeatedly [and callously] refused to help him with the finer math details of his statistical theory.

        Part of the problem, as my MSc advisor once put it, is “do they want help with the problem or just with a technique they (mistakenly?) perceive will address their problem?”. For instance, in one case you indicated that if what was coded in R were coded in SPSS (technique) you could solve problems you perceive as important and worthwhile. Most of us rightly want to primarily “teach people to fish”, not just catch one for a dinner they want to serve that we are not directly interested in attending.

        Helping with techniques is “washing dishes” and we all should do some of that, but also have those problems we perceive as important that we have to focus on. Why not learn “R”, that’s the software where statistical techniques are mostly prototyped and shared in and there are many web resources many statisticians have worked hard on to facilitate that easily.

        The other part of the problem is what Kipling referred to as “the man in the land of the blind being king” – those who build careers making statistics or other fields accessible can’t always resist the temptation to encourage people to misunderstand and accept faulty reasoning that makes perfect sense to them. For instance, encouraging the use of stepwise regression to _sort_ out how to address confounding in observational studies – because it leads quickly to many publications with little statistical work on the statistician’s part. (While suggesting privately that statisticians who do otherwise are unnecessarily complicating things.)

  16. […] written with a view to the interaction between statisticians and psychologists but it applies just as much to statisticians helping scientists with their statistics, which is […]

  17. […] the people present discussed how surprising I thought that Psychological Science is highly ranked (indeed called a top journal scroll down for it), when I thought the hard hitting science could be found in journals like Psych […]