More frustrations trying to replicate an analysis published in a reputable journal

The story starts in September, when psychology professor Fred Oswald wrote me:

I [Oswald] wanted to point out this paper in Science (Ramirez & Beilock, 2010) examining how students’ emotional writing improves their test performance in high-pressure situations.

Although replication is viewed as the hallmark of research, this paper replicates implausibly large d-values and correlations across studies, leading me to be more suspicious of the findings (not less, as is generally the case).


He also pointed me to this paper:

Experimental disclosure and its moderators: A meta-analysis.

Frattaroli, Joanne

Psychological Bulletin, Vol 132(6), Nov 2006, 823-865.

Disclosing information, thoughts, and feelings about personal and meaningful topics (experimental disclosure) is purported to have various health and psychological consequences (e.g., J. W. Pennebaker, 1993). Although the results of 2 small meta-analyses (P. G. Frisina, J. C. Borod, & S. J. Lepore, 2004; J. M. Smyth, 1998) suggest that experimental disclosure has a positive and significant effect, both used a fixed effects approach, limiting generalizability. Also, a plethora of studies on experimental disclosure have been completed that were not included in the previous analyses. One hundred forty-six randomized studies of experimental disclosure were collected and included in the present meta-analysis. Results of random effects analyses indicate that experimental disclosure is effective, with a positive and significant average r-effect size of .075. In addition, a number of moderators were identified.

At the time, Oswald sent a message to the authors of the study, Sian Beilock and Gerardo Ramirez:

I read your Science article yesterday and was wondering whether you were willing to share your data for the sole purpose of reanalysis, per APA guidelines. Thank you for considering this request – your research findings that triangulate across multiple studies are quite compelling and I wanted to examine further out of personal curiosity about this phenomenon.

Beilock replied that they would be happy to share the data once they had time to put it together in an easy package to send him, and they asked what Oswald wanted to do with the data. Oswald replied:

I’m a bit wowed by such large effects found in these studies. I was just wanting to take a look at the score distributions for the d-values and correlations for each study — literally a reanalysis of the data, nothing new. Also, if you had brief advice on getting these anxiety/pressure manipulations to work, that would help one of our graduate students . . . who is implementing this type of manipulation in her dissertation on complex task performance . . .

There is no real hurry on my end, but I do appreciate both your integrity and future efforts in relaying the data to me when you can.

It’s now early December. I was cleaning out my inbox, found this message from September, and emailed Oswald to ask if it was ok with him for me to blog this story. Oswald replied with a yes and with some new information:

Just last Thursday (some 2 months later), I [Oswald] sent Sian Beilock a follow-up to ask for an ETA on the data. This time, Sian replied the next day, telling me that she checked with her IRB that morning, and they encouraged her not to share the raw data [emphasis added]; she then asked what analyses I would like. So…maybe the analyses will come through.

I can’t say this surprises me (the whole IRB thing seems to be set up to discourage openness with data), and let me emphasize that I’m not blaming Beilock or Ramirez. For one thing, IRBs really are difficult, and they usually do have some arguments on their side. After all, they’re studying people here, not chickens. Also, cleaning data takes work. Just the other day, I responded to a request for data and code from an old study of mine by saying no, it would be too much trouble to put it all in one place.

The real issue here, then, is not a criticism of these researchers, but rather:

1. Suspiciously large estimated effect sizes. These could be coming from the statistical significance filter (only the estimates that happen to come out large reach statistical significance and get published, so the published estimates are inflated) or maybe from some unstated conditions in the experiment; a quick simulation of the filter appears after this list.

2. Evil IRBs.

3. The difficulty of getting data to replicate a published analysis. We’ve been hearing a lot about this lately from Jelte Wicherts and others.
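
To see how the statistical significance filter alone can produce point 1, here is a minimal simulation sketch in Python. The numbers are made up for illustration (a true effect of d = 0.2 and 20 subjects per group; neither figure comes from the actual studies): many small studies are simulated, only the statistically significant ones are kept, and the surviving estimates are compared with the truth.

import numpy as np

rng = np.random.default_rng(0)
true_d = 0.2                                  # hypothetical modest true effect
n_per_group = 20                              # hypothetical small-study sample size
n_sims = 100_000
se = np.sqrt(2.0 / n_per_group)               # approximate standard error of the estimated d
d_hat = rng.normal(true_d, se, size=n_sims)   # estimated effects from many replications
published = np.abs(d_hat) > 1.96 * se         # keep only the "statistically significant" results
print("true effect:", true_d)
print("mean of all estimates:", round(float(d_hat.mean()), 2))
print("mean of significant estimates:", round(float(d_hat[published].mean()), 2))

With these made-up inputs the significant estimates average around d = 0.7, several times the true effect, which is exactly the sense in which implausibly large published effects should make a reader more suspicious rather than less.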

What’s really frustrating about this is that Oswald is (presumably) doing a selfless act here. It’s unlikely that you’ll get fame, fortune, or grant funding by replicating or even shooting down a published finding. It’s gotta be frustrating to need to jump through all sorts of hoops just to check some data. The study was published, after all.

P.S. One more time: I’m talking about general problems here, not trying to pick on Beilock and Ramirez. I myself have published a paper based on data we’re not allowed to release.

14 thoughts on “More frustrations trying to replicate an analysis published in a reputable journal”

  1. Andrew – Well done, thanks. Yes, the goal is not to target her research in particular. I’m an individual-differences researcher seeking out important information about individual variability that accompanies these reported large effects. Best – Fred

  2. It would be nice if science journals did the following two things. First, accept or reject papers based on a preview that indicates (a) the motivation for the study, (b) the methods for the study, and (c) the proposed analysis — all before any data is collected. Second, require properly anonymized data be submitted along with the paper and then put that data into a searchable database online so that other researchers can make use of the data collected by their peers.

    • It would be very nice for most everyone, except those already winning or hoping to win the academia game (aka academic racketeering).

    • I cannot imagine a field except drug RCTs where this would be remotely feasible. It bears no resemblance to how science works in the fields I’ve been in. Maybe some strains of physics? Even then, it seems usual for experiments to adapt to preliminary findings, serendipity to be extremely important, methods to be invented as one goes, and data to be recycled given new ideas.

  3. The American Economic Association journals (which I think are the top economics journals) require(!) the authors to put on the AEA website both their data and the complete code needed to reproduce exactly the findings shown in the paper. Of course, sometimes you cannot put the data on the website, in which case they provide the code and say something like “you can get the data if you get permission from X”.
    This practice is very professional and should be the standard in all sciences and social sciences, and I am actually surprised to hear that Science does not require anything like that. I am not an economist but frankly, their discipline seems to be much more advanced in this regard. Preparing your data and cleaning your code should be part of getting a paper published. People might find it annoying, but it should be the standard in science. I think the journals are the ones who should enforce that.

    PS: In sociology (my discipline), ASR just asks you to consider making your code and data available. Of course, no one does it…

  4. The IRB issues may be a story in themselves, but how hard can it be to put together the data from some simple pre/post experiments with 40 participants, given that the data were already analyzed and a paper was published using them? What am I missing?

  5. I can’t say anything to the evil IRBs. Outside of the restrictions imposed by evil IRBs, however, I think scientists should be required to provide their data and code for replication. Perhaps I was fortunate to be recently trained and trained by someone who values replication enough that we spent an academic term understanding the importance of data management (and –most importantly– commenting your code!) with the end goal of making it publicly available upon publication. I personally feel stronger about my findings when I know that the data are there for any skeptic to see for herself. It ties our hands against laziness or, worse yet, dishonesty.

  6. The sample sizes in this paper are too small given the expected effect sizes (typically d=.5 in this line of work) and so the studies are severely underpowered (a rough power calculation is sketched at the end of this comment).

    The ethical issue can be dealt with by anonymizing the data and by signing a contract that specifies the use of the data and includes a promise to destroy it after the reanalyses are finished. If Drs. Ramirez and Beilock continue to refuse to share the data, then it would be good to contact Science.

    Science’s policy on data sharing can be found here: http://www.sciencemag.org/site/feature/contribinfo/prep/gen_info.xhtml#dataavail
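
    As a rough check on the power point above, here is a sketch of the calculation under assumed numbers (d = 0.5 as stated in this comment and roughly 20 participants per group, in line with the “40 participants” figure mentioned in comment 4; neither figure is taken from the paper itself), using a normal approximation to the two-sample test:

    from scipy.stats import norm
    import numpy as np

    d = 0.5                          # assumed true effect size
    n_per_group = 20                 # assumed group size
    se = np.sqrt(2.0 / n_per_group)  # approximate standard error of the estimated d
    z_crit = norm.ppf(0.975)         # two-sided 5% critical value
    power = norm.sf(z_crit - d / se) + norm.cdf(-z_crit - d / se)
    print(round(power, 2))           # roughly 0.35, far below the usual 0.8 target

    In other words, even if the expected effect were real, a study of this size would miss it about two times out of three.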

  7. Andrew, I’m surprised to see you back the excuse that “it’s too much work” to put together a dataset for replication. If there’s any lesson to be learned from the software engineering world, it’s this: if you don’t clearly document all the steps used to create and analyze the data for a particular study, the probability that you’ll make errors along the way is much greater. In other words, the best practice in software engineering–and by extension, I would argue, in data analysis–is to be careful along the way to clearly document your work and keep everything you need to turn raw data into what you analyze.

    Moreover, if someone can clean data enough to publish some analyses from the data, it’s hard to understand the argument that they don’t want to spend time cleaning the data to share it. Cleaning data and putting it in organized, readable form is not just a step people should do in case someone else asks for the data: the person most likely to benefit is the analyst himself, trying to make sense of a project after putting it aside for 6 months or a year. None of this is the norm, I know, but we should be encouraging best practices here.

  8. Pingback: Data sharing update « Statistical Modeling, Causal Inference, and Social Science

  9. The effect of IRBs is reflected in disciplinary norms regarding data exchange. Economists seldom do research on human subjects as defined by IRBs; most of the research we do isn’t subject to their review. Most of it involves archival data, which means that all we must do to satisfy AEA requirements is clearly document the steps used to analyze the data. There is some evidence that this norm has improved the quality of the research reported in leading journals. One of my nearby colleagues requires her doctoral students to replicate the data analysis reported in highly cited articles from the AER. In articles from the period before the data-and-code requirement, the students typically find one or more errors in the analysis (e.g., annualizing quarterly data by multiplying by 12 instead of 4, or calculating indices using equations that put zeros in the numerator or denominator) that materially affect the size of the coefficients reported and/or their significance. These kinds of errors are rare in more recent articles, although the students still often conclude that some of the steps taken were ill advised.

  10. In 21st-century society, there’s a vast demand for methods to raise student test performance, especially among underrepresented minorities. Not surprisingly, that leads to fads and acceptance of magical thinking.
