Researcher incentives and empirical methods

Bob Erikson pointed me to this paper by Edward Glaeser:

Economists are quick to assume opportunistic behavior in almost every walk of life other than our own. Our empirical methods are based on assumptions of human behavior that would not pass muster in any of our models. The solution to this problem is not to expect a mass renunciation of data mining, selective data cleaning or opportunistic methodology selection, but rather to follow Leamer’s lead in designing and using techniques that anticipate the behavior of optimizing researchers. In this essay, I [Glaeser] make ten points about a more economic approach to empirical methods and suggest paths for methodological progress.

This is a great point. The paper itself has an unusual format: the ten key points are made in pages 3-5, and then they are expanded upon in the rest of the paper. I think Glaeser’s specific analyses are limited by his focus on classical statistical methods (least-squares regression, p-values, and so forth), but his main points are important, and I’ll repeat them here:

In this essay, I [Glaeser] make ten points about researcher incentives and statistical work. The first and central point is that we should accept researcher initiative as being the norm, and not the exception. It is wildly unrealistic to treat activity like data mining as being rare malfeasance; it is much more reasonable to assume that researchers will optimize and try to find high correlations. This requires not just a blanket downward adjustment of statistical significance estimates but more targeted statistical techniques that appropriately adjust across data sets and methodologies for the ability of researchers to impact results. Point estimates as well as t-statistics need to be appropriately corrected.
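
To get a sense of how much an optimizing researcher can inflate apparent significance, here is a quick simulation (my own toy sketch, not anything from Glaeser's paper): a researcher screens 20 pure-noise predictors and reports only the best-fitting one. The sample size, the number of candidates, and the Bonferroni-style correction at the end are all assumptions of the sketch.

```python
# A minimal sketch: screening many candidate predictors of a pure-noise outcome
# and reporting only the smallest p-value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, n_sims = 100, 20, 2000     # sample size, candidate predictors, simulations

best_p = np.empty(n_sims)
for s in range(n_sims):
    y = rng.normal(size=n)       # outcome unrelated to every predictor
    x = rng.normal(size=(n, k))  # k candidate predictors (pure noise)
    pvals = [stats.pearsonr(x[:, j], y)[1] for j in range(k)]  # p-value of each y ~ x_j
    best_p[s] = min(pvals)       # the researcher reports the best fit

print("share of best picks 'significant' at p < 0.05:    ", np.mean(best_p < 0.05))
print("share still significant after Bonferroni (0.05/k):", np.mean(best_p < 0.05 / k))
```

With 20 candidates the reported result clears p < 0.05 roughly 60 percent of the time even though nothing is there, which is why a blanket discount is not enough and the correction has to depend on how much searching a given setting allows.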

The second point is that the optimal amount of data mining is not zero, and that even if we could produce classical statisticians, we probably would not want to. Just as the incentives facing businessmen produce social value added, the data mining of researchers produces knowledge. The key is to adjust our statistical techniques to realistically react to researcher initiative, not to try and ban this initiative altogether.

The third point is that research occurs in a market where competition and replication matter greatly. Replication has the ability to significantly reduce some of the more extreme forms of researcher initiative (e.g. misrepresenting coefficients in tables), but much less ability to adjust for other activity, like data mining. Moreover, the ability of competition and replication to correct for researcher initiative differs from setting to setting. For example, data mining on a particular micro data set will be checked by researchers reproducing regressions on independent micro data sets. There is much less ability for replication to correct data mining in macro data sets, especially those that include from the start all of the available data points.
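
The value of replication on independent data is easy to see in the same toy setup (again my own sketch, not Glaeser's): take the predictor that a data-mining researcher would report from one sample and test it on a fresh sample. The sample sizes and number of candidates are arbitrary assumptions.

```python
# A minimal sketch: a predictor selected by screening many candidates in one
# sample rarely replicates at the 5% level in an independent sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k, n_sims = 100, 20, 2000     # obs per sample, candidate predictors, simulations

replicated = 0
for _ in range(n_sims):
    y1, x1 = rng.normal(size=n), rng.normal(size=(n, k))      # original data (pure noise)
    pvals = [stats.pearsonr(x1[:, j], y1)[1] for j in range(k)]
    j_best = int(np.argmin(pvals))                             # the variable that gets reported
    y2, x2 = rng.normal(size=n), rng.normal(size=(n, k))       # independent replication data
    replicated += stats.pearsonr(x2[:, j_best], y2)[1] < 0.05  # does it hold up?

print("replication rate of the mined 'finding':", replicated / n_sims)
```

The mined result holds up only about 5 percent of the time in the fresh data, which is why independent micro data sets are such an effective check and why an all-inclusive macro data set offers no such protection.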

Fourth, changes in technology generally decrease the costs of running tests and increase the availability of potential explanatory variables. As a result, the ability of researchers to influence results must be increasing over time, and economists should respond with regular increases in skepticism. At the same time, however, improvements in technology also reduce the cost of competitors checking findings, so the impact of technology on overall bias is unclear.

Fifth, increasing methodological complexity will generally give the researcher more degrees of freedom and therefore increase the scope for researcher activity. Methodological complexity also increases the costs to competitors who would like to reproduce results. This suggests that the skepticism that is often applied to new, more complex technologies may be appropriate.

My sixth point is that data collection and cleaning offer particularly easy opportunities for improving statistical significance. One approach to this problem is to separate the tasks of data collection and analysis more completely. However, this has the detrimental effect of reducing the incentives for data collection, which may outweigh the benefits of specialization. At the least, we should be more skeptical of results produced by analysts who have created and cleaned their own data.

A seventh point is that experimental methods both restrict and enlarge the opportunities for researcher action and consequent researcher initiative bias. Experiments have the great virtue of forcing experimenters to specify hypotheses before running tests. However, they also give researchers tremendous influence over experimental design, and this influence increases the ability of researchers to impact results.

An eighth point is that the recent emphasis on causal inferences seems to have led to the adoption of instrumental variables estimators which can particularly augment researcher flexibility and increase researcher initiative bias. Since the universe of potential instruments is enormous, the opportunity to select instruments creates great possibilities for data mining. This problem is compounded when there are weak instruments, since the distribution of weak instrument t-statistics can have very fat tails. The ability to influence significance by choosing the estimator with the best fit increases as the weight in the extremes of the distribution of estimators increases.
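
The weak-instrument concern is easy to demonstrate with a small simulation of the just-identified IV estimate beta_hat = (z'y) / (z'x) when the true effect is zero and the regressor is endogenous. This is my own sketch; the first-stage coefficients (0.05 for "weak", 1.0 for "strong") and the error correlation are assumed values.

```python
# A minimal sketch: with a weak first stage, the IV estimate has very fat tails,
# so shopping across instruments or specifications can easily manufacture large,
# "significant"-looking effects even when the true effect is zero.
import numpy as np

rng = np.random.default_rng(2)
n, n_sims = 200, 5000

def iv_estimates(pi):
    """Just-identified 2SLS estimates under first-stage coefficient pi."""
    est = np.empty(n_sims)
    for s in range(n_sims):
        z = rng.normal(size=n)                  # instrument
        u = rng.normal(size=n)                  # structural error
        v = 0.8 * u + 0.6 * rng.normal(size=n)  # first-stage error, corr(u, v) = 0.8
        x = pi * z + v                          # endogenous regressor
        y = u                                   # true effect of x on y is zero
        est[s] = (z @ y) / (z @ x)              # beta_hat = (z'y) / (z'x)
    return est

for label, pi in [("weak (pi = 0.05)", 0.05), ("strong (pi = 1.0)", 1.0)]:
    e = np.abs(iv_estimates(pi))
    print(f"{label}: median |beta_hat| = {np.median(e):.2f}, 99th pct = {np.percentile(e, 99):.1f}")
```

The weak-instrument tail is enormous relative to the strong-instrument case, and that extra weight in the extremes is exactly what makes estimator-shopping so rewarding.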

A ninth point is that researcher initiative complements other statistical errors in creating significance. This means that spurious significance rises spectacularly when even modest overestimates of statistical significance are combined with researcher initiative. This complementarity also creates particularly strong incentives not to use more stringent statistical techniques.
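
To make the complementarity concrete, here is a toy simulation (mine, with assumed numbers): understate the standard errors by 20 percent, let the researcher pick the best of 20 tests, and compare the spurious-significance rates alone and in combination.

```python
# A minimal sketch: modestly understated standard errors and selection over many
# tests each inflate spurious significance, but together they do far more damage.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, n_sims = 100, 2000

def spurious_rate(k_tests, se_scale):
    """Share of simulations where the best of k_tests noise regressions clears p < 0.05."""
    hits = 0
    for _ in range(n_sims):
        y = rng.normal(size=n)                 # outcome unrelated to everything
        x = rng.normal(size=(n, k_tests))      # candidate predictors (pure noise)
        t = np.empty(k_tests)
        for j in range(k_tests):
            res = stats.linregress(x[:, j], y)            # simple regression y ~ x_j
            t[j] = res.slope / (res.stderr * se_scale)    # se_scale < 1 understates the SE
        p_best = 2 * stats.t.sf(np.abs(t).max(), df=n - 2)
        hits += p_best < 0.05
    return hits / n_sims

print("one honest test:              ", spurious_rate(1, 1.0))
print("understated SEs only (x0.8):  ", spurious_rate(1, 0.8))
print("selection over 20 tests only: ", spurious_rate(20, 1.0))
print("both combined:                ", spurious_rate(20, 0.8))
```

Each distortion alone is manageable; combined, the spurious-significance rate is far larger than either one suggests, which is the complementarity Glaeser is pointing to.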

My tenth and final point is that model driven empirical work has an ambiguous impact on researcher initiative bias. One of the greatest values of specialization in theory and empirics is that empiricists end up being constrained to test theories proposed by others. This is obviously most valuable when theorists produce sharp predictions about empirical relationships. On the other hand, if empirical researchers become wedded to a particular theory, they will have an incentive to push their results to support that theory.

Again, I think we can do better by moving beyond t statistics and statistical significance. Multilevel modeling should help with some of these problems (notably the multiple comparisons issue, as we discuss here). But I agree that more needs to be done to assess statistical methods, anticipating the behavior of researchers. Glaeser’s fundamental points seem relevant to me.
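
For concreteness, here is a minimal sketch of the partial-pooling idea, using a simple empirical-Bayes normal approximation rather than a full multilevel fit; the numbers (40 groups, 20 observations each, all true effects zero) are made up for illustration.

```python
# A minimal sketch: partial pooling shrinks the most extreme of many group
# estimates toward the grand mean, which is the correction the "best" comparison
# needs when lots of comparisons are being made.
import numpy as np

rng = np.random.default_rng(4)
J, n_per, sigma = 40, 20, 1.0                    # groups, obs per group, residual sd
data = rng.normal(0.0, sigma, size=(J, n_per))   # every true group effect is zero

ybar = data.mean(axis=1)                         # separate (no-pooling) estimates
se2 = sigma**2 / n_per                           # sampling variance of each group mean
grand = ybar.mean()
tau2 = max(ybar.var(ddof=1) - se2, 0.0)          # method-of-moments between-group variance
shrink = tau2 / (tau2 + se2)                     # weight each group's own data receives
pooled = grand + shrink * (ybar - grand)         # partially pooled estimates

j = int(np.argmax(np.abs(ybar)))                 # the comparison a data miner would report
print(f"most extreme raw group mean: {ybar[j]:+.3f} ({abs(ybar[j]) / np.sqrt(se2):.1f} se from zero)")
print(f"after partial pooling:       {pooled[j]:+.3f}")
print(f"estimated shrinkage weight:  {shrink:.2f}")
```

In a typical run the most extreme raw mean sits a couple of standard errors from zero and looks like a discovery; after pooling it is pulled most of the way back toward the grand mean, with no explicit multiple-comparisons penalty needed.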

6 thoughts on “Researcher incentives and empirical methods”

  1. On the one hand, I think that his third point contains the solution: Replication (on other data sets, I mean) is what really convinces.

    On the other hand, I know that replication does not land your paper in a top-tier journal. At least not if it's pure replication.

    I've long thought that there should be a Journal for Replications of Social Scientific Findings (which, of course, would also publish failures to replicate).

  2. Lemmus Lemmus: I agree that the possibility of replication is what makes results more convincing. But it is not the opportunity to publish replications that affects the incentives not to make up results or massage the data too much. It is the opportunity to publish failures to replicate.
    Given that failures to replicate are quite likely to get published (as long as the original results were important enough to get attention), people have incentives to try and break existing results. The credible threat of a published failure to replicate keeps, at least in part, researchers in check…I would not want to land a paper, say, in the APSR, based on heavily massaged data, just to be portrayed as sort of a scammer one year later…
    The main problem, at least in political science, is that replication is less than straightforward, due to the reluctance that scholars sometimes display about sharing their data and clarifying the exact techniques they used.

    Policy implication: rather than a JRSSF, we need stricter standards regarding dissemination of the data (and code) on which published results are based.

  3. There are also costs involved in downweighting all evidence as if it had been massaged: doing so strengthens even further the incentive to massage results. (Replication doesn't seem to suffer this drawback.)

  4. Piero,

    I strongly agree that all journals should have a policy which says that all authors must upload their data to the journal's homepage. I've long thought about writing a post about "The Ideal Journal", and this would be a part of it. Not sure about the code, though – I would prefer people to write their own code; otherwise you might repeat the mistakes the original researchers made. Of course, having a JRSSF and having an "upload policy" are not mutually exclusive.

    Not sure I agree with you on the question of replicable results. I think I read that at some of the most renowned econ departments in the US, having published in one of the top three journals is a necessary condition for tenure (sorry, don't have a cite for this). From which I conclude that it is better to publish an article in, say, AER, and later have it debunked, than not to publish an article in AER at all. Steven Levitt is a case in point: he got the results of two of his papers absolutely trashed after they were published in top journals.

    I have never seen an article debunking a finding in a "higher" journal than the one the original finding was published in.

  5. I agree on the general "unsexiness" of replication. Even debunking, if done as the main activity, gets one a reputation for being a "spoiler" or a coattail-rider. I don't know what to do about this, since I know that I personally find "new" stuff more exciting than replication or debunking. The best solution I have come up with is to extend or find a new spin on existing questions while repeating/testing previous findings.

  6. ad Glaeser's seventh point: I strongly disagree if I'm reading G's last sentence right: "[…] and this influence increases the ability of researchers to impact results" I'm reading the sentence as follows: "[…] and this influence increases the ability of researchers to impact results [compared to survey research]" — otherwise the sentence doesn't make much sense, right? Again, if this is what G. is trying to convey, I strongly disagree! Here is why: We should stop thinking of questionnaires as if they were objective means of data collection — they are not. In fact, they are treatments: question order, phrasing, answer scales/categories, etc. have a tremendous impact on the respondent's reaction to each question. Put differently, the researcher has tremendous impact on the respondent and thereby on the results of the survey.

    A good place to start:
    (1) Schwarz, Norbert (1999). "Self-reports: How the questions shape the answers." American Psychologist 54(2): 93-105.
    (2) Sudman, Bradburn, and Schwarz (1996). Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco, CA: Jossey-Bass.

    Sebastian E. Wenz
