Causal inference in economics

Aaron Edlin points me to this issue of the Journal of Economic Perspectives that focuses on statistical methods for causal inference in economics. (Michael Bishop’s page provides some links.)

To quickly summarize my reactions to Angrist and Pischke’s book: I pretty much agree with them that the potential-outcomes or natural-experiment approach is the most useful way to think about causality in economics and related fields. My main amendments to Angrist and Pischke would be to recognize that:

1. Modeling is important, especially modeling of interactions. It’s unfortunate to see a debate between experimentalists and modelers. Some experimenters (not Angrist and Pischke) make the mistake of avoiding models: Once they have their experimental data, they check their brains at the door and do nothing but simple differences, not realizing how much more can be learned. Conversely, some modelers are unduly dismissive of experiments and formal observational studies, forgetting that (as discussed in Chapter 7 of Bayesian Data Analysis) a good design can make model-based inference more robust.

2. In the case of a “natural experiment” or “instrumental variable,” inference flows forward from the instrument, not backwards from the causal question. Estimates based on instrumental variables, regression discontinuity, and the like are often presented with the researcher having a causal question and then finding an instrument or natural experiment to get identification. I think it’s more helpful, though, to go forward from the intervention and look at all its effects. Your final IV estimate or whatever won’t necessarily change, but I think my approach is a healthier way to get a grip on what you can actually learn from your study.
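To make point 2 concrete, here is a minimal sketch on simulated data. Everything in it is my own assumption for illustration: the names `simulate` and `forward_summary`, the coefficients, a binary randomized instrument `z`, a constant treatment effect, and an exclusion restriction that holds by construction. It starts from the instrument and tabulates what it does to the treatment and to the outcome; the usual IV (Wald) estimate is then just the ratio of those two forward effects.

```python
import random
from statistics import mean


def simulate(n=40000, true_effect=2.0, seed=0):
    """Hypothetical data: z is a randomized instrument, d the treatment, y the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)   # coin-flip instrument
        u = rng.gauss(0, 1)           # unobserved confounder
        # Uptake of the treatment depends on the instrument and on the confounder.
        d = int(0.8 * z + 0.5 * u + rng.gauss(0, 1) > 0.5)
        # The outcome depends on z only through d (exclusion restriction, assumed).
        y = true_effect * d + 1.5 * u + rng.gauss(0, 1)
        rows.append((z, d, y))
    return rows


def diff_in_means(rows, group_idx, value_idx):
    """Mean of column value_idx where column group_idx is 1, minus the mean where it is 0."""
    on = [r[value_idx] for r in rows if r[group_idx] == 1]
    off = [r[value_idx] for r in rows if r[group_idx] == 0]
    return mean(on) - mean(off)


def forward_summary(rows):
    """Go forward from the instrument: its effect on d, its effect on y, then the ratio."""
    z_on_d = diff_in_means(rows, 0, 1)   # first stage
    z_on_y = diff_in_means(rows, 0, 2)   # reduced form (intent-to-treat)
    return {
        "z_on_d": z_on_d,
        "z_on_y": z_on_y,
        "wald_iv": z_on_y / z_on_d,
        # Naive comparison of treated vs. untreated, confounded by u.
        "naive_d_on_y": diff_in_means(rows, 1, 2),
    }


if __name__ == "__main__":
    for name, value in forward_summary(simulate()).items():
        print(f"{name}: {value:.3f}")
```

In this setup the naive treated-versus-untreated comparison is biased upward by the confounder, while the ratio recovers something close to the true effect. The two forward effects are worth reading on their own, though: they are what the randomization directly identifies, and the ratio inherits whatever is weak about the first stage.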

Now on to the articles:

The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics
Joshua D. Angrist and Jörn-Steffen Pischke
Since Edward Leamer’s memorable 1983 paper, “Let’s Take the Con out of Econometrics,” empirical microeconomics has experienced a credibility revolution. While Leamer’s suggested remedy, sensitivity analysis, has played a role in this, we argue that the primary engine driving improvement has been a focus on the quality of empirical research designs. The advantages of a good research design are perhaps most easily apparent in research using random assignment. We begin with an overview of Leamer’s 1983 critique and his proposed remedies. We then turn to the key factors we see contributing to improved empirical work, including the availability of more and better data, along with advances in theoretical econometric understanding, but especially the fact that research design has moved front and center in much of empirical micro. We offer a brief digression into macroeconomics and industrial organization, where progress — by our lights — is less dramatic, although there is work in both fields that we find encouraging. Finally, we discuss the view that the design pendulum has swung too far. Critics of design-driven studies argue that in pursuit of clean and credible research designs, researchers seek good answers instead of good questions. We briefly respond to this concern, which worries us little.

Tantalus on the Road to Asymptopia
Edward E. Leamer
My first reaction to “The Credibility Revolution in Empirical Economics,” authored by Joshua D. Angrist and Jörn-Steffen Pischke, was: Wow! This paper makes a stunningly good case for relying on purposefully randomized or accidentally randomized experiments to relieve the doubts that afflict inferences from nonexperimental data. On further reflection, I realized that I may have been overcome with irrational exuberance. Moreover, with this great honor bestowed on my “con” article, I couldn’t easily throw this child of mine overboard. As Angrist and Pischke persuasively argue, either purposefully randomized experiments or accidentally randomized “natural” experiments can be extremely helpful, but Angrist and Pischke seem to me to overstate the potential benefits of the approach. I begin with some thoughts about the inevitable limits of randomization, and the need for sensitivity analysis in this area, as in all areas of applied empirical work. I argue that the recent financial catastrophe is a powerful illustration of the fact that extrapolating from natural experiments will inevitably be hazardous. I discuss how the difficulties of applied econometric work cannot be evaded with econometric innovations, offering as examples some under-recognized difficulties with instrumental variables and robust standard errors. I conclude with comments about the shortcomings of an experimentalist paradigm as applied to macroeconomics, and some warnings about the willingness of applied economists to apply push-button methodologies without sufficient hard thought regarding their applicability and shortcomings.

A Structural Perspective on the Experimentalist School
Michael P. Keane
What has always bothered me about the “experimentalist” school is the false sense of certainty it conveys. My view, like Leamer’s, is that there is no way to escape the role of assumptions in statistical work, so our conclusions will always be contingent. Hence, we should be circumspect about our degree of knowledge. I present some lessons for economics from the field of marketing, a field where broad consensus has been reached on many key issues over the past twenty years. In marketing, 1) the structural paradigm is dominant, 2) the data are a lot better than in some fields of economics, and 3) there is great emphasis on external validation. Of course, good data always helps. I emphasize that the ability to do controlled experiments does not obviate the need for theory, and finally I address different approaches to model validation.

But Economics Is Not an Experimental Science
Christopher A. Sims
The fact is, economics is not an experimental science and cannot be. “Natural” experiments and “quasi” experiments are not in fact experiments. They are rhetorical devices that are often invoked to avoid having to confront real econometric difficulties. Natural, quasi-, and computational experiments, as well as regression discontinuity design, can all, when well applied, be useful, but none are panaceas. The essay by Angrist and Pischke, in its enthusiasm for some real accomplishments in certain subfields of economics, makes overbroad claims for its favored methodologies. What the essay says about macroeconomics is mainly nonsense. Consequently, I devote the central part of my comment to describing the main developments that have helped take some of the con out of macroeconomics. Recent enthusiasm for single-equation, linear, instrumental variables approaches in applied microeconomics has led many in these fields to avoid undertaking research that would require them to think formally and carefully about the central issues of nonexperimental inference — what I see and many see as the core of econometrics. Providing empirically grounded policy advice necessarily involves confronting these difficult central issues.

Taking the Dogma out of Econometrics: Structural Modeling and Credible Inference
Aviv Nevo and Michael D. Whinston
Without a doubt, there has been a “credibility revolution” in applied econometrics. One contributing development has been in the improvement and increased use in data analysis of “structural methods”; that is, the use of models based in economic theory. Structural modeling attempts to use data to identify the parameters of an underlying economic model, based on models of individual choice or aggregate relations derived from them. Structural estimation has a long tradition in economics, but better and larger data sets, more powerful computers, improved modeling methods, faster computational techniques, and new econometric methods such as those mentioned above have allowed researchers to make significant improvements. While Angrist and Pischke extol the successes of empirical work that estimates “treatment effects” based on actual or quasi-experiments, they are much less sanguine about structural analysis and hold industrial organization up as an example where “progress is less dramatic.” Indeed, reading their article one comes away with the impression that there is only a single way to conduct credible empirical analysis. This seems to us a very narrow and dogmatic approach to empirical work; credible analysis can come in many guises, both structural and nonstructural, and for some questions structural analysis offers important advantages. In this comment, we address the criticism of structural analysis and its use in industrial organization, and consider why empirical analysis in industrial organization differs in such striking ways from that in fields such as labor, which have recently emphasized the methods favored by Angrist and Pischke.

The Other Transformation in Econometric Practice: Robust Tools for Inference
James H. Stock
Angrist and Pischke highlight one aspect of the research that has positively transformed econometric practice and teaching. They emphasize the rise of experiments and quasi-experiments as credible sources of identification in microeconometric studies, which they usefully term “design-based research.” But in so doing, they miss an important part of the story: a second research strand aimed at developing tools for inference that are robust to subsidiary modeling assumptions. My first aim in these remarks therefore is to highlight some key developments in this area. I then turn to Angrist and Pischke’s call for adopting experiments and quasi-experiments in macroeconometrics; while sympathetic, I suspect the scope for such studies is limited. I conclude with some observations on the current debate about whether experimental methods have gone too far in abandoning economic theory.

Geographic Variation in the Gender Differences in Test Scores
Devin G. Pope and Justin R. Sydnor
The causes and consequences of gender disparities in standardized test scores — especially in the high tails of achievement — have been a topic of heated debate. The existing evidence on standardized test scores largely confirms the prevailing stereotypes that more men than women excel in math and science while more women than men excel in tests of language and reading. We provide a new perspective on this gender gap in test scores by analyzing the variation in these disparities across geographic areas. We illustrate that male-female ratios of students scoring in the high ranges of standardized tests vary significantly across the United States. This variation is systematic in several important ways. In particular, states where males are highly overrepresented in the top math and science scores also tend to be states where women are highly overrepresented in the top reading scores. This pattern suggests that states vary in their adherence to stereotypical gender performance, rather than favoring one sex over the other across all subjects. Furthermore, since the genetic distinction and the hormonal differences between sexes that might affect early cognitive development (that is, innate abilities) are likely the same regardless of the state in which a person happens to be born, the variation we find speaks to the nature-versus-nurture debates surrounding test scores and suggests environments significantly impact gender disparities in test scores.

The Gender Gap in Secondary School Mathematics at High Achievement Levels: Evidence from the American Mathematics Competitions
Glenn Ellison and Ashley Swanson
This paper uses a new data source, American Mathematics Competitions, to examine the gender gap among high school students at very high achievement levels. The data bring out several new facts. There is a large gender gap that widens dramatically at percentiles above those that can be examined using standard data sources. An analysis of unobserved heterogeneity indicates that there is only moderate variation in the gender gap across schools. The highest achieving girls in the U.S. are concentrated in a very small set of elite schools, suggesting that almost all girls with the ability to reach high math achievement levels are not doing so.

Explaining the Gender Gap in Math Test Scores: The Role of Competition
Muriel Niederle and Lise Vesterlund
The mean and standard deviation in performance on math test scores are only slightly larger for males than for females. Despite minor differences in mean performance, many more boys than girls perform at the right tail of the distribution. This gender gap has been documented for a series of math tests including the AP calculus test, the mathematics SAT, and the quantitative portion of the Graduate Record Exam (GRE). The objective of this paper is not to discuss whether the mathematical skills of males and females differ, be it a result of nurture or nature. Rather we argue that the reported test scores do not necessarily match the gender differences in math skills. We will present results that suggest that the evidence of a large gender gap in mathematics performance at high percentiles in part may be explained by the differential manner in which men and women respond to competitive test-taking environments. The effects in mixed-sex settings range from women failing to perform well in competitions, to women shying away from environments in which they have to compete. We find that the response to competition differs for men and women, and in the examined environment, gender difference in competitive performance does not reflect the difference in noncompetitive performance. We argue that the competitive pressures associated with test taking may result in performances that do not reflect those of less-competitive settings. Of particular concern is that the distortion is likely to vary by gender and that it may cause gender differences in performance to be particularly large in mathematics and for the right tail of the performance distribution. Thus the gender gap in math test scores may exaggerate the math advantage of males over females.

Empirical Industrial Organization: A Progress Report
Liran Einav and Jonathan Levin
The field of industrial organization has made dramatic advances over the last few decades in developing empirical methods for analyzing imperfect competition and the organization of markets. These new methods have diffused widely: into merger reviews and antitrust litigation, regulatory decision making, price setting by retailers, the design of auctions and marketplaces, and into neighboring fields in economics, marketing, and engineering. Increasing access to firm-level data and in some cases the ability to cooperate with firms or governments in experimental research designs is offering new settings and opportunities to apply these ideas in empirical work. This essay begins with a sketch of how the field has evolved to its current state, in particular how the field’s emphasis has shifted over time from attempts to relate aggregate measures across industries toward more focused studies of individual industries. The second and primary part of the essay describes several active areas of inquiry. We also discuss some of the impacts of this research and specify topics where research efforts have been more or less successful. The last section steps back to offer a broader perspective. We address some current debates about research emphasis in the field, and more broadly about empirical methods, and offer some thoughts on where future research might go.

The Columbian Exchange: A History of Disease, Food, and Ideas
Nathan Nunn and Nancy Qian
This paper provides an overview of the long-term impacts of the Columbian Exchange — that is, the exchange of diseases, ideas, food crops, technologies, populations, and cultures between the New World and the Old World after Christopher Columbus’ voyage to the Americas in 1492. We focus on the aspects of the exchange that have been most neglected by economic studies; namely the transfer of diseases, food crops, and knowledge between the two Worlds. We pay particular attention to the effects of the exchange on the Old World.

Corporate Audits and How to Fix Them
Joshua Ronen
Auditors are supposed to be watchdogs, but in the last decade or so, they sometimes looked like lapdogs — more interested in serving the companies they audited than in assuring a flow of accurate information to investors. The auditing profession is based on what looks like a structural infirmity: auditors are paid by the companies they audit. An old German proverb holds: “Whose bread I eat, his song I sing.” While this saying was originally meant as a prayer of thanksgiving, the old proverb takes on a darker meaning for those who study the auditing profession. This paper begins with an overview of the practice of audits, the auditing profession, and the problems that auditors continue to face in terms not only of providing audits of high quality, but also in providing audits that investors feel comfortable trusting to be of high quality. It then turns to a number of reforms that have been proposed, including ways of building reputation, liability reform, capitalizing or insuring auditing firms, and greater competition in the auditing profession. However, none of these suggested reforms, individually or collectively, severs the agency relation between the client management and the auditors. As a result, the conflict of interest, although it can be mitigated by some of these reforms, continues to threaten auditors’ independence, both real and perceived. In conclusion, I’ll discuss my own proposal for financial statements insurance, which would redefine the relationship between auditors and firms in such a way that auditors would no longer be beholden to management.

Markets: The Credit Rating Agencies
Lawrence J. White
This paper will explore how the financial regulatory structure propelled three credit rating agencies — Moody’s, Standard & Poor’s (S&P), and Fitch — to the center of the U.S. bond markets — and thereby virtually guaranteed that when these rating agencies did make mistakes, these mistakes would have serious consequences for the financial sector. We begin by looking at some relevant history of the industry, including the series of events that led financial regulators to outsource their judgments to the credit rating agencies (by requiring financial institutions to use the specific bond creditworthiness information that was provided by the major rating agencies) and when the credit rating agencies shifted their business model from “investor pays” to “issuer pays.” We then look at how the credit rating industry evolved and how its interaction with regulatory authorities served as a barrier to entry. We then show how these ingredients combined to contribute to the subprime mortgage debacle and associated financial crisis. Finally, we consider two possible routes for public policy with respect to the credit rating industry: One route would tighten the regulation of the rating agencies, while the other route would reduce the required centrality of the rating agencies and thereby open up the bond information process in a way that has not been possible since the 1930s.

My thoughts on Angrist and Pischke are in this review published last year in the Stata Journal (and originally appearing on this blog). My more recent thoughts on causal inference are in this article, which is to appear in the American Journal of Sociology, and you can go here to see various miscellaneous discussions.

The funny thing is that, after I wrote my review of Angrist and Pischke, I searched around for an econ journal to publish it in. I felt I had important ideas to share with that community. But I talked with various influential people who explained to me that econ journals just don’t review textbooks. I might’ve even submitted my review to the Journal of Economic Literature, I can’t remember. The review ended up appearing in the Stata Journal, which was just fine, but in retrospect it would’ve fit in well with this special issue. Too bad they didn’t ask me!


  1. Gustaf says:

    To be provocative: have "scientists" in economics been able to predict anything of substantial meaning (major events/effects in the economy), ever?

  2. dWj says:

    In re your first point, even "descriptive" statistics only make sense within the context of a model. (The "mean" doesn't really describe anything unless there's some reason to think, for example, that it's an informative measure of central tendency or the like.) I think doing statistics without a model is a bit like doing inference without a prior: you almost always have one lurking in the background, and the decision "not to use" one is really a decision not to be aware of it.

  3. K? O'Rourke says:

    Much ado about 3 things?

    Are there not just 3 basic ideas here?

    1. Randomization makes the distributional assumptions under the counterfactual "absolutely no treatment effect" almost certainly not too wrong. (Absolutely no treatment effect is counterfactual because everything does have some effect – but it is a useful representational device. On the other hand it is not as directly useful for getting at the magnitude of or variation in treatment effects.)

    2. When there was not randomization – what combination of manipulations and assumptions provides a credible argument that the manipulated empirical distribution is now – not that much unlike an empirical distribution that would have occurred with randomization.

    3. When is that credible argument in 2 any much more than hopeful or wishful thinking – i.e. should be taken with much more than just a grain of salt?


  4. K? O'Rourke says:

    dWj: Perhaps John Maynard Keynes was one of the first statisticians to explicitly point this out in

    Keynes, J. M. The principal averages and the laws of error which lead to them. Journal of the Royal Statistical Society 74 (1911), 322-331.


  5. Miguelito says:

    If what you are after is a case where we have predicted by way of a consensus model an event that led to large changes in some price, that is theoretically impossible. If there is an algorithm that tells us reliably that the price of an asset will change dramatically, then there is an arbitrage opportunity and people taking advantage of it will cause the event not to occur. For example, had there been an algorithm that could predict (reliably) the 2007 collapse in housing values, no one would have been willing to pay the high prices of 2005, and there would have been no bubble.

    That's simply not the sort of science we do: forecasting is a bunch of judgment calls. There are lots of things we can predict in other areas. For example, if you create a certain sort of job-training program it will significantly increase the pay of women but do almost nothing for men. Or, if you create a school choice program of the sort used in Chicago, it will not lead to significant improvements in standardized test scores.

    These predictions may be wrong in other times and places. They are not universal constants. What they are is what scientific methods can tell us about subjects of interest. It seems a too-common error among those who would understand economics to look for the impossible and then, not finding it, conclude there is nothing to be learned.

  6. Mike says:

    Gustaf's point is a fair one, and we haven't even approached the point where the difficulty you bring up has arisen.
    No one is taking the models so seriously that opportunities are being arbitraged away based on them–at least not the textbook asset pricing models of macro and finance. We're so far away from those models reaching the theoretical problem of people taking them seriously and reacting to them, that it seems like only an incidental point that the models will be self-defeating.
    Even accepting that point, why exempt economists from including the effect of their models on the world? What we as economists are supposedly able to do (or what we are trying to do) is solve for "general equilibrium". Taking as a given that that is a very difficult thing to do anyway, why should we exclude the effect of the model as a part of the equilibrium? That may be difficult, but it's not theoretically impossible.
    I think a better reaction to Gustaf is simply that we, as economists, have not yet developed good predictive models. But we're doing our best.

  7. The Other Mike says:

    Gustaf's point would only be a fair one if there was reason to believe that the economy (or society in general) is predictable. What Miguel is (rightly) saying is that if this is not the case, then: (1) it doesn't make any sense whatsoever to hold predictive success as some sort of benchmark according to which economics ought to be evaluated; and that (2) it doesn't follow from the fact that prediction may not only be difficult, but also impossible, that economics cannot be rigorous and insightful.

    To the extent that economics is neither rigorous nor insightful, moreover, I'd argue that it's likely a direct result of attempting to model it after an imaginary version of physics instead of drawing from fields whose subject matter is of a similarly contingent nature (like, say, biology). The success of any particular methodology is dependent on the way the world works; not the other way around. Yet, despite this obvious fact, one often gets the impression that some people expect the world to just neatly arrange itself after hopes and dreams.