Douglas Campbell writes:
A new study finding that more than half of psychology studies failed to replicate is a very positive step forward for social science. Could a similar study be undertaken in economics, and what would it find? Most empirical economics research is non-experimental, and thus I suspect that most studies would replicate in the sense that if one used the same data and ran the exact same regressions, the results are unlikely to change. However, if one were also to test the robustness of results to additional (or fewer) control variables, differing estimation approaches, or try out-of-sample testing on new data, I suspect fewer than half would survive. When I was a graduate student, I became frustrated with constantly being assigned to read papers which I felt were clearly wrong. I suspected that the real key to publication for many of these papers was perhaps the right pedigree and a close relationship between the authors and the editor and/or referees. While I knew the conventional wisdom that it’s a bad idea to write “comment papers” in economics, eventually I became curious what would happen if I tried to take down a “seminal” paper published in a top journal.
If you’ve read this blog long enough, you’ll know that I’m sympathetic to Campbell’s argument. Not that this means he’s correct (or, for that matter, that he’s wrong), I’m just letting you know where I stand, where my preconceptions are.
One paper I had been assigned to read in several graduate courses, on “The Diffusion of Development,” published in the QJE, a leading economics journal edited at Harvard, argued that there is a causal link between a society’s “genetic distance to the US” and its GDP per capita. The authors were careful to point out that their results didn’t necessarily indicate a direct impact of genetic traits on economic development, but that genetic distance could be a proxy for a whole host of other cultural traits which could impact the transmission of technology. However, in my view this point was undercut by the authors’ assertion that the apparent impact of genetic distance on GDP per capita survives the inclusion of an ostensibly exhaustive list of geographic and cultural controls. This suggests that genetic distance may not merely proxy differences in cultural traits, but has a direct impact on GDP. Thus black Africa may be poor because of its genetic endowment, and white Europe rich for the same reason.
Except, …, wait a second here before we go leaping to conclusions.
Uh oh. These sorts of arguments (not Campbell’s, but the ones he’s criticizing) are the kind of thing I hate! See here and here for some examples. My problems with these arguments are: (a) correlations being what they are, these studies are typically based on essentially 2 or 3 or 4 data points, not 91 or 132 or whatever is claimed based on the number of countries in the dataset, and (b) I’m suspicious of the whole GDP-as-revealed-virtue thing, given how time-bound such arguments are.
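Point (a) is easy to demonstrate with a toy Monte Carlo (all numbers invented for illustration; this is not any real country dataset). If two completely unrelated variables are each dominated by a handful of region-level shocks, a nominal 5% significance test on the country-level correlation rejects far more often than 5%, because the effective sample size is roughly the number of regions, not the number of countries:

```python
import numpy as np

rng = np.random.default_rng(1)
n_countries, n_regions, n_sims = 90, 3, 500
crit = 1.96 / np.sqrt(n_countries)  # approximate 5% cutoff for |r| under independence

false_pos = 0
for _ in range(n_sims):
    region = rng.integers(0, n_regions, n_countries)
    # Two UNRELATED variables, each dominated by its own region-level shock.
    x = rng.normal(0, 1, n_regions)[region] + rng.normal(0, 0.3, n_countries)
    y = rng.normal(0, 1, n_regions)[region] + rng.normal(0, 0.3, n_countries)
    r = np.corrcoef(x, y)[0, 1]
    false_pos += abs(r) > crit

rate = false_pos / n_sims
print(f"nominal 5% test rejects in {rate:.0%} of simulations")
```

With only three regions driving both series, the "significant" correlation shows up in well over half the simulations, even though x and y are independent by construction; the 90 countries are window dressing on what is effectively a three-observation comparison.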
Campbell might agree with me on this, or maybe not, but in any case he has some very specific criticisms:
How exhaustive were those geographic and cultural controls? My coauthor, Ju Hyun Pyun, and I noticed that the authors did not even control for latitude or for a dummy for sub-Saharan Africa in their cross-country income regressions, even as they argued that their results were robust to controls for geographic regions. When we included these controls (standard in this literature) in the first regression we ran, the correlation between genetic distance to the US and development disappeared. We also found that genetic distance to the US failed to predict income levels even when we just included two dummy variables, one for Europe and one for sub-Saharan Africa, with no other controls. Thus, the original findings were equivalent to the observation that white Europe is rich and black Africa is poor, with no more explanatory power than that. While we felt our results were perfectly straightforward, it took us seven submissions and four years to publish our results in a minor journal. Meanwhile, results similar to those we had critiqued continued to be published in leading journals, including one of the same journals where our paper was rejected. We often had to contend with the original authors as referees – once as the sole referee. (Pro tip: if writing a paper like this, recommend to the editor that they not choose the hostile original authors as referees.) One editor sided with a creative referee who objected to our paper on the grounds that “There is no reason to interpret the sub-Saharan dummy as a ‘geographic variable’”. The same referee also zinged us for not including the exact same sample as the original paper, even though the data and original sample were not publicly (or privately) available. This was hardly the type of hassle-to-reward ratio which would lead me to write a similar paper, at least before tenure.
Instead, had we decided to write an “extension” paper, using the genetic distance data to predict some other variable, publication would have been facilitated, since the original authors would have been likely referees and would have been happy to see our results published. The incentive structure here could be improved.
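Campbell and Pyun's finding is the classic omitted-variable pattern: a regressor that merely tracks region membership looks causal until the region dummies enter the regression. A minimal sketch with simulated data (hypothetical numbers, not the authors' dataset; "gdist" here is a stand-in for genetic distance to the US):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150
region = rng.integers(0, 3, size=n)  # 0 = "Europe", 1 = "sub-Saharan Africa", 2 = other
europe = (region == 0).astype(float)
ssa = (region == 1).astype(float)

# "Genetic distance" tracks region membership plus noise ...
gdist = ssa - europe + rng.normal(0, 0.5, n)
# ... while log GDP is driven ONLY by the region dummies (no direct gdist effect).
log_gdp = 1.5 * europe - 1.5 * ssa + rng.normal(0, 0.5, n)

def coef_on_first(y, X):
    """OLS coefficient on the first column of X, with an intercept appended."""
    Z = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta[0]

naive = coef_on_first(log_gdp, gdist[:, None])
controlled = coef_on_first(log_gdp, np.column_stack([gdist, europe, ssa]))
print(f"coefficient on gdist, no controls:    {naive:+.3f}")
print(f"coefficient on gdist, region dummies: {controlled:+.3f}")
```

The naive regression recovers a large, "significant" coefficient on gdist even though gdist has no effect on income in this simulation; adding the two dummies drives it to noise level, which is exactly the behavior Campbell describes in the real data.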
Campbell also writes:
In defense of the authors, the paper itself was at least an interesting idea, and they deserve credit for trying to tackle a sensitive topic such as the link between genetics and development. They are certainly not alone in publishing papers that turned out not to be robust (this should happen to any ambitious researcher), particularly so for research using spatial data, notorious for spurious correlations. Empirical researchers are under a lot of pressure to find statistically significant results that are seemingly robust.
And he concludes with some general comments about replication:
Academic economics dearly needs replication studies to become sexy. There are encouraging signs. Thomas Herndon became famous after catching Reinhart and Rogoff’s Excel error. There is a new economics replication wiki, and there will be a panel on replication at the AEA meetings. Some journals, such as the AER, require data to be made available online. (Personally, I believe doing empirical research without posting your data online, at least after publication, should be taboo.) Yet, in a field where building close personal relationships is still easily the best path toward publishing and tenure, more needs to be done. One proposal is for someone to calculate new journal rankings that penalize journals which either do not accept comments on the papers they publish, or rarely publish such comments. It would also be helpful if editors, particularly at leading journals, which have substantial market power, would do more to encourage replication. One proposal is that if the QJE or AER wants to encourage comment papers without hurting their own citation ranking, they could start additional journals focused on replication.
P.S. Lots of discussion in comments, including a response by Enrico Spolaore and Romain Wacziarg that begins, ‘Campbell and Pyun’s paper is a completely misguided criticism of our paper “The Diffusion of Development,” published in the Quarterly Journal of Economics in 2009.’