Jeff Rouder writes:
Although many researchers agree that scientific data should be open to scrutiny to ferret out poor analyses and outright fraud, most raw data sets are not available on demand. There are many reasons researchers do not open their data, and one is technical: it is often time-consuming to prepare and archive data. In response, my [Rouder's] lab has automated the process so that our data are archived the night they are created, without any human approval or action. All data are versioned, logged, time-stamped, and uploaded, including aborted runs and data from pilot subjects. The archive is GitHub (github.com), the world's largest collection of open-source materials. Data archived in this manner are called born open.
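Rouder doesn't describe his lab's actual tooling here, so what follows is only a minimal sketch of the nightly, no-human-in-the-loop archiving he describes. It assumes the lab's data directory is already a git clone of a GitHub repository with push credentials configured; the function names `archive_commands` and `run_nightly_archive` are hypothetical, not Rouder's code.

```python
import os
import subprocess
from datetime import datetime, timezone


def archive_commands(repo_dir, now=None):
    """Build the git commands for one nightly archive run.

    Hypothetical sketch: assumes repo_dir is a git clone whose 'origin'
    remote points at a GitHub repository.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # UTC time stamp for the log
    repo = os.fspath(repo_dir)
    return [
        # Stage everything, new and modified files alike. Nothing is
        # filtered out, so aborted runs and pilot sessions go in too.
        ["git", "-C", repo, "add", "--all"],
        # Commit without human review; --allow-empty records the run in
        # the history even on nights when no new data were collected.
        ["git", "-C", repo, "commit", "--allow-empty", "-m",
         f"Automatic nightly archive {stamp}"],
        # Upload the versioned history to the public remote.
        ["git", "-C", repo, "push", "origin", "HEAD"],
    ]


def run_nightly_archive(repo_dir):
    """Run the commands in order, stopping at the first failure."""
    for cmd in archive_commands(repo_dir):
        subprocess.run(cmd, check=True)
```

Scheduling `run_nightly_archive` from cron (for example, a crontab entry such as `0 2 * * *` that calls a small wrapper script) would give the same approval-free behavior. Git itself supplies the versioning and the time-stamped log, which is one reason a git host like GitHub suits this workflow.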
Psychological science is beset by a methodological crisis in which many researchers believe there are widespread and systemic problems in the way researchers produce, evaluate, and report knowledge. . . . This methodological crisis has spurred many proposals for improvement including an increased consideration of replicability (Nosek, Spies, & Motyl, 2012), a focus on the philosophy and statistics underlying inference (Cumming, 2014; Morey, Romeijn, & Rouder, 2013), and an emphasis on what is now termed open science, which can be summarized as the practice of making research as transparent as possible.
And here’s the crux:
Open data, unfortunately, seems to be a paradox of sorts. On one hand, many researchers I encounter are committed to the concept of open data. Most of us believe that one of the defining features of science is that all aspects of the research endeavor should be open to peer scrutiny. We live this sentiment almost daily in the context of peer review, where our scholarship and the logic of our arguments are under intense scrutiny.
On the other hand, surprisingly, very few of our collective data are open!
Say it again, brother:
Consider all the data that is behind the corpus of research articles in psychology. Now consider the percentage that is available to you right now on demand. It is negligible. This is the open-data paradox: a pervasive intellectual commitment to open data with almost no follow-through whatsoever.
What about current practice?
Many of my colleagues practice what I [Rouder] call data-on-request. They claim that if you drop them a line, they will gladly send you their data. Data-on-request should not be confused with open data, which is the availability of data without any request whatsoever. Many of these same colleagues may argue that data-on-request is sufficient, but they are demonstrably wrong.
Here’s one of my experiences with data-on-request:
Last year, around the time that Eric Loken and I were wrapping up our garden-of-forking-paths paper, I was contacted by Jessica Tracy, one of the authors of that ovulating-women-wear-red study, which was one of several examples discussed in our article. Tracy wanted to let us know about some more research she and her collaborator, Alec Beall, had been doing, and she also wanted us to tell her where our paper would be published so that she and Beall would have a chance to contact the editors of our article before publication. I posted Tracy and Beall's comments, along with my responses, on this blog. But I did not see the necessity for them to be involved in the editorial process of our article (nor, for that matter, did I see such a role for Daryl Bem or any of the authors of the other work discussed therein). In the context of our back-and-forth, I asked Tracy if she could send us the raw data from her experiments. Or, better still, if she could just post her data on the web for all to see. She replied that, since we would not give her the prepublication information on our article, she would not share her data.
I guess the Solomon-like compromise would’ve been to saw the dataset in half.
Just to clarify: Tracy and Beall are free to do whatever they want. I know of no legal obligation for them to share their data with people who disagree with them regarding the claim that women on certain days of their monthly cycle are three times more likely to wear red or pink shirts. I'm not accusing them of scientific misconduct in not sharing their data. Maybe it was too much trouble for them to put their data online; maybe it is their judgment that science will proceed better without their data being available for all to see. Whatever. It's their call.
I’m just agreeing with Rouder that data-on-request is not the same as open data. Not even close.