When we talk about the “file drawer,” let’s not assume that an experiment can easily be characterized as producing strong, mixed, or weak results

Posted on August 28, 2014 7:52 PM by Andrew

Neil Malhotra:

I thought you might be interested in our paper [the paper is by Annie Franco, Neil Malhotra, and Gabor Simonovits, and the link is to a news article by Jeffrey Mervis], forthcoming in Science, about publication bias in the social sciences given your interest and work on research transparency.

Basic summary: We examined studies conducted as part of the Time-sharing Experiments for the Social Sciences (TESS) program, where: (1) we have a known population of conducted studies (some published, some unpublished); and (2) all studies exceed a quality threshold as they go through peer review. We found that having null results made experiments 40 percentage points less likely to be published and 60 percentage points less likely to even be written up.

My reply:

Here’s a funny bit from the news article: “Stanford political economist Neil Malhotra and two of his graduate students . . .” You know you’ve hit the big time when you’re the only author who gets mentioned in the news story!

More seriously, this is great stuff. I would only suggest that, along with the file drawer, you remember the garden of forking paths. In particular, I’m not so sure about the framing in which an experiment can be characterized as producing “strong results,” “mixed results,” or “null results.” Whether a result is strong or not would seem to depend on how the data are analyzed, and the point of the forking paths is that with a given dataset it is possible for noise to appear as a strong result. I gather from the news article that TESS is different in that any given study is focused on a specific hypothesis, but even so I would think there is a bit of flexibility in how the data are analyzed and a fair number of potentially forking paths. For example, the news article mentions “whether voters tend to favor legislators who boast of bringing federal dollars to their districts over those who tout a focus on policy matters.” But of course this could be studied in many different ways.
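To make the forking-paths point concrete, here is a minimal simulation sketch. It is not taken from the paper or the news article. Every detail is an assumption chosen for illustration: three outcome measures, a hypothetical "female" subgroup split, 200 respondents, and a normal-approximation test. The treatment has no effect on anything, and the code compares how often a single prespecified analysis reaches p < 0.05 with how often at least one of nine plausible analyses does. It is a caricature, since the forking-paths argument does not require that a researcher actually run every analysis, only that the analysis chosen could have depended on the data.

```python
import random
from statistics import NormalDist, mean, variance


def p_value(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_study(rng, n=200, n_outcomes=3):
    """One null experiment: treatment is random and affects no outcome."""
    return [
        {
            "treated": rng.random() < 0.5,
            "female": rng.random() < 0.5,  # hypothetical covariate
            "y": [rng.gauss(0, 1) for _ in range(n_outcomes)],
        }
        for _ in range(n)
    ]


def all_paths(rows, n_outcomes=3):
    """p-values for every (sample restriction, outcome) analysis choice."""
    subsets = {
        "all": lambda r: True,
        "women": lambda r: r["female"],
        "men": lambda r: not r["female"],
    }
    pvals = []
    for keep in subsets.values():
        sample = [r for r in rows if keep(r)]
        for k in range(n_outcomes):
            treated = [r["y"][k] for r in sample if r["treated"]]
            control = [r["y"][k] for r in sample if not r["treated"]]
            pvals.append(p_value(treated, control))
    return pvals


def false_positive_rates(n_sims=2000, seed=1):
    """Share of null studies called 'significant' under each reporting rule."""
    rng = random.Random(seed)
    prespecified = forked = 0
    for _ in range(n_sims):
        pvals = all_paths(simulate_study(rng))
        prespecified += pvals[0] < 0.05  # first outcome, full sample only
        forked += min(pvals) < 0.05      # best of all nine analyses
    return prespecified / n_sims, forked / n_sims


if __name__ == "__main__":
    pre, forked = false_positive_rates()
    print(f"prespecified analysis: {pre:.3f}")
    print(f"best of nine paths:    {forked:.3f}")
```

Under these assumptions the prespecified rate should sit near the nominal 5 percent, while the "best of nine" rate should come out well above it, even though every dataset is pure noise. That is the sense in which labeling a study "strong" or "null" depends on the analysis and not only on the data.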

In short, I think this is important work you have done, and I just think that we should go beyond the “file drawer” because I fear that this phrase lends too much credence to the idea that a reported p-value is a legitimate summary of a study.

P.S. There’s also a statistical issue that every study is counted only once, as either a 1 (published) or 0 (unpublished). If Bruno Frey ever gets involved, you’d have to have a system where any result gets a number from 0 to 5, representing the number of different times it’s published.

6 thoughts on “When we talk about the “file drawer,” let’s not assume that an experiment can easily be characterized as producing strong, mixed, or weak results”

question on August 28, 2014 at 8:38 pm said:

Perhaps researchers are publishing the wrong type of result:
Significant result: Two groups were different for “some reason”
Null result: Any effect of treatment was small relative to other factors.

It seems we can learn more from the second case.

L.J Zigerell on August 28, 2014 at 10:10 pm said:

The garden of forking paths is less an issue in TESS studies than in correlational research. TESS typically fields relatively brief survey experiments, so there are fewer paths, and some of these paths don’t go far; for example, tossing in different sets of controls often does not make much difference. The big forking paths for TESS appear to be restricting the sample, non- or selective reporting or use of manipulations and dependent variables, and the use of post-stratification weights. But TESS makes the data available, so it’s possible to assess whether these forking paths matter.

Believe it or not, I’ve been writing up some unpublished TESS studies. The biggest non-reporting problem is that it appears that some researchers have fielded TESS studies to test hypothesis X, the test does not provide evidence for hypothesis X, and the researchers publish an article presenting evidence for hypothesis X from convenience samples but do not mention the null results from the TESS study.

In any event, it seems that it would be a useful exercise for researchers — and especially researchers-in-training — to write up and submit to journals the unpublished TESS studies: the social science community would benefit from this opening of the file drawer, and the researchers-in-training would get an opportunity to practice presenting results (and would maybe get an article) with “free” data and without needing theoretical innovations.

- L.J Zigerell on August 29, 2014 at 1:45 am said:
  
  For what it’s worth, I’ve posted summaries of results for four TESS studies that I’ve been working on; from what I can tell, none of these studies were published. Link here: https://www.ljzigerell.com/?p=2230
  
jonathan on August 28, 2014 at 10:41 pm said:

I really like the linked “garden” paper. It's a topic that needs much more light. Jordan Ellenberg’s book is a fun read that touches on those issues.

Keith O'Rourke on August 29, 2014 at 12:20 pm said:

> any result gets a number from 0 to 5
Surely 1 to 1/5 (on the log likelihood scale)?

anon on September 2, 2014 at 1:44 am said:

This is another compelling article on publication bias, with a great title: Star Wars: The Empirics Strike Back (Abel Brodeur, Mathias Lé, Marc Sangnier, Yanos Zylberberg)

https://ftp.iza.org/dp7268.pdf


Statistical Modeling, Causal Inference, and Social Science
