The State of the Art in Causal Inference: Some Changes Since 1972

For the first issue of the journal Observational Studies, editor Dylan Small will reprint William Cochran’s 1972 article on the topic (which begins, “Observational studies are a class of statistical studies that have increased in frequency and importance during the past 20 years. In an observational study the investigator is restricted to taking selected observations or measurements on the process under study. For one reason or another he cannot interfere in the process in the way that one does in a controlled laboratory type of experiment.”) along with discussions from several statisticians.

Here’s what I sent (under the title given above):

William Cochran’s 1972 article on observational studies is refreshing and includes recommendations and warnings that remain relevant today. Also interesting are the ways that Cochran’s advice differs from recent textbook treatments of causal inference from observational data.

Most notable, perhaps, is that Cochran talks about design and estimation and general goals of a study—but almost nothing about causality, devoting only one page out of ten to the topic. In statistical terms, Cochran spends a lot of time on the estimator (and, more generally, the procedure to decide what estimator to use) but never defines the estimand in an observational study. He refers to bias but gives no clear sense of what exactly is being estimated (he does not, for example, define any sort of average causal effect). Modern treatments of causal inference are much more direct on this point, with the benefit of the various formal models of causal inference that were developed by Rubin and others starting in the 1970s. Scholars have pointed out the ways in which the potential-outcome formulation derives from earlier work by statisticians and economists, but Cochran’s chapter reveals what was missing in this earlier era: there was only a very weak connection between substantive concerns of design and measurement, and statistical inference decisions regarding matching, weighting, and regression. In more recent years, the filling in of this gap has been an important research area for Rosenbaum and others; again, seeing Cochran’s essay gives us a sense of how much needed to be done.

One area that Cochran discusses in detail, and which I think could use more attention in modern textbooks (including those of my collaborators and myself), is measurement. Statistics has been described as living in the intersection of variation, comparison, and measurement, and most textbooks in statistics and econometrics tend to focus on the first two of these, taking measurement for granted. Only in psychometrics do we really see measurement getting its due. So I was happy to see Cochran discuss measurement, even if he did not get to all the relevant issues—in particular, external validity, which has been the subject of much recent discussion in the context of laboratory experiments vs. field experiments vs. observational studies for social science and policy.

In reading Cochran’s chapter, I was struck by his apparent lack of interest in causal identification. Modern textbooks (for example, the econometrics book of Angrist and Pischke) discuss the search for natural experiments, along with the assumptions under which an observational study can yield valid causal inference, and various specific methods such as instrumental variables and regression discontinuity that can identify carefully defined causal effects under specified conditions. In contrast, Cochran discusses generic before-and-after designs and restricts himself to analysis strategies that control for pre-treatment covariates in basic ways, via matching and regression. He is not so clear on what variables should be controlled for (which is perhaps to be expected given that he was writing before Rubin codified the concept of ignorability), and this has the practical consequence that he devotes little space to any discussion of the data-generating process. Sure, an experiment is, all else equal, better than an observational study, but we don’t get much guidance on how an observational study can be closer or further from the experimental ideal. Cochran did write, “a claim of proof of cause and effect must carry with it an explanation of the mechanism by which the effect is produced,” which could be taken as an allusion to the substantive assumptions required for causal inference from observational data, but he supplied no specifics: nothing like, for example, the exclusion restriction in instrumental variables analysis.

Another topic that has appeared from time to time in the causal inference literature, notably in work by Leamer in the 1970s and more recently by researchers such as Ioannidis, Button, and Simonsohn in medicine and psychology, is the bias resulting from the search for low p-values and the selective publication of large and surprising results. We are increasingly aware of how the “statistical significance filter” and other sorts of selection bias can distort our causal estimates in a variety of applied settings. Cochran, though, followed the standard statistical tradition of approaching studies one at a time; the terms “selection” and “meta-analysis” do not appear at all in his essay. Just to be clear: In noting this perspective, I am not suggesting that his own analyses were rife with selection bias. It is my impression that, in his work, Cochran was much more interested in improving the highest-quality research around him and was not particularly interested in criticizing the bad stuff. I get the sense, though, that, whatever things may have been like in the 1960s, in recent years selection bias has become a serious problem even in much of the most serious work in social science and medicine, and that careful analysis of individual studies is only part of the picture.

Let me conclude by emphasizing that the above discussion is not intended to be exhaustive. The design and analysis of observational studies is a huge topic, and I have merely tried to point to some areas that today are considered central to causal inference but were barely noted at all by a leader in the field in 1972. Much of the research we are doing today can be viewed as a response to the challenges laid down by Cochran in his thought-provoking essay, which mixes practical concerns with specific statistical techniques.

13 thoughts on “The State of the Art in Causal Inference: Some Changes Since 1972”

  1. > Cochran, though, followed the standard statistical tradition of approaching studies one at a time; the terms “selection” and “meta-analysis” do not appear at all in his essay.

    Hmm, Cochran did more work than almost anyone else on meta-analysis (aka analysis of a series of experiments).

    Covered in detail in O’Rourke, K. (2002). Meta-analytical themes in the history of statistics: 1700 to 1938. Pakistan Journal of Statistics (S. Ejaz Ahmed Special Issue), 18(20), 285-299.

    And this summary from http://jameslindlibrary.org/articles/a-historical-perspective-on-meta-analysis-dealing-quantitatively-with-varying-study-results/

    One of Fisher’s colleagues, William Cochran extended Fisher’s approach and provided a formal random effects framework for it more in line with the earlier approach by Airy (Cochran 1937). Cochran, together with Frank Yates (another colleague of Fisher’s), soon afterwards applied this in practice to agricultural data (Yates and Cochran 1938). Cochran continued to work on methods for the analysis of multiple studies throughout his career. Indeed, the last sentence in his last paper commented on the difficulties in dealing with study effects that vary over time and location (Cochran 1980).

    Cochran also applied the method in medical research in an assessment of the effects of vagotomy (a surgical operation for duodenal ulcers), which was reported in an influential book entitled Costs, Risks and Benefits of Surgery (Cochran et al. 1977). Like Karl Pearson before him (Pearson 1904), Cochran commented on the need for data from controlled trials:

    We could have come across a number of comparisons that were well done but not randomized – the type sometimes called observational studies. … I would have been interested in including the observational studies so as to learn whether they agreed with the randomized studies and if not, why not? But the medical members of our team had been too well brought up by statisticians, and refused to look at anything but randomized experiments.

    • Keith:

      You raise an interesting question. Given that Cochran was an expert on meta-analysis, why did he not consider mentioning the topic in his review article on observational studies? I assume that he either thought of meta-analysis as a specialized topic, or as something that only would be used for prospectively planned experiments, or . . .?

      It is often interesting to consider why some expert seems to be neglecting some relevant and important topic. For another example, I remember over 20 years ago when I gave a talk on statistical models of votes and seats in legislatures, and a statistician in the audience commented that he himself had used the sorts of hierarchical models I was describing, over 30 years earlier, in an unpublished analysis of election returns for a news organization.

      I was not shocked to hear that the structure underlying my statistical methods was not new—after all, hierarchical models were well known in animal breeding studies in the 1950s, and I can only assume that Laplace did some version of partial pooling in his applied work. Rather, what interested me was that this statistician and his colleagues had been sitting on dynamite and had not realized it! It would be as if Tesla Motors or whoever were to design a super-efficient new car battery, and then some guy were to come up to them and say: Yeah, we did the same thing 30 years ago, it worked perfectly, we used it to power some toy cars around a racetrack.

      It makes me think that it’s not enough for a researcher to have a method, or even a method + application + computation + data. You need the whole research infrastructure, including publications and interactions with colleagues. An unpublished project, even if performed by a supergroup, is too much of a dead end.

      • > seems to be neglecting some relevant and important topic

        Not sure, maybe just to limit the scope of the paper or the challenges for a graduate student.

        When I first mentioned Cochran’s meta-analysis papers to Don Rubin he seemed a bit upset that Cochran had not discussed any of that with him (“just made me read all this survey stuff”).

        Fisher, in many of his early papers, was very specific about concerns for multiple studies (excerpt below), but when I talked to one of Fisher’s students, http://en.wikipedia.org/wiki/A._W._F._Edwards, he said he had not noticed this aspect before.

        Yates did accuse Fisher of not knowing how to deal with more than one study at a time, but I believe Yates was just angry.

        Fisher, in his 1925 and 1934 papers in which he mainly developed his theory of statistics, thought through the issues of multiple experiments when addressing the loss of information in summarizing data. In the 1925 paper, he points out that if there is no loss of information in a summary (i.e., when there are sufficient statistics), then the summary of two combined samples from the same population must be some function of the two summaries of the individual samples, without recourse to the individual observations from either sample.
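        A minimal numerical sketch of Fisher’s point (my own illustration, not from the original papers): for a normal population with known variance, the sample size and sample mean are sufficient, so the combined estimate from two samples can be computed from those summaries alone, without returning to the raw observations.

        ```python
        import numpy as np

        rng = np.random.default_rng(0)

        # Two samples from the same normal population (known variance).
        x1 = rng.normal(loc=5.0, scale=2.0, size=30)
        x2 = rng.normal(loc=5.0, scale=2.0, size=50)

        # Sufficient summaries of each sample: (n, sample mean).
        n1, m1 = len(x1), x1.mean()
        n2, m2 = len(x2), x2.mean()

        # Combined estimate computed only from the two summaries...
        from_summaries = (n1 * m1 + n2 * m2) / (n1 + n2)

        # ...matches the estimate from pooling the raw observations.
        from_raw = np.concatenate([x1, x2]).mean()

        print(from_summaries, from_raw)  # identical up to floating-point rounding
        ```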

  2. I fully agree with this, but I have one minor quibble: I disagree with your use of the term “selection bias”. This might sound like a purely semantic objection, but “selection bias” has a well-defined meaning for a class of biases that affect identification, i.e., the internal validity of the analysis (an example of this is Berkson’s bias). I think we risk confusing everyone if we use the same term to describe significance filters.

  3. One need not go all the way back to 1972, nor focus on observational studies.
    There is large variation in semantics across experimental fields even over the last two decades.

    On the one hand, consider these two books on industrial experiments:

    Wu and Hamada, Experiments
    Hinkelmann and Kempthorne, Design and Analysis of Experiments

    Neither ever mentions potential outcomes.

    On the other hand consider this other book on social science experiments:

    Gerber and Green, Field Experiments

    It mentions potential outcomes at every turn.
    Questions:

    Why?
    Was industrial research hampered by the lack of potential outcomes?

    • Fernando:

      The differences between industrial and social experimentation come up from time to time, and I think a key difference is that in industry it can be easier to consider uniquely defined treatments.

      For example, you can set the temperature of the room where the production is happening, or you can alter the humidity of the air. In an industrial experiment, it might not matter so much exactly how these treatments are done.

      In contrast, in a social experiment, the treatment “give someone a year more of education” or “increase the number of police officers on the street” might depend a lot on how the treatment is done (for example, are the police reassigned from desk jobs, are they working longer hours, are new police being hired, if so how is it being paid for, etc).

      P.S. Kempthorne’s book on design and analysis of experiments is from 1952, so it’s no surprise that it does not mention potential outcomes, which were first expressed for observational studies in 1974! Potential outcomes notation may well be overkill for clean designed experiments.

      • Andrew:

        The version of Kempthorne I linked to is the second updated edition. I believe it was published in 2007, and many of the references cited are from the 1980s and 1990s. So it seems potential outcomes have not yet been picked up….

        Not sure how your discussion about the degree of control over treatments relates to potential outcomes. For example, I try to set the temperature in my oven but (a) the internal temperature fluctuates across time and space inside the oven; and (b) if I am cooking silicon chips, such variation, even if minute to the naked eye, may be a huge deal for the quality of the chip. IOW, the degree of control is relative to the precision of the process under study. In these terms I am not sure many manufacturing processes are better controlled than social ones.

        I think the difference between manufacturing and social science lies not in the degree to which we can control causes, but in the extent to which the causes we manipulate explain outcomes. If you consider the graph U_x → X → Y ← U_y, I think in social science the big difference is that Var(U_y) is much higher. I think your point relates more to Var(U_x).
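        A minimal simulation sketch of this point (my own illustration, with made-up variance numbers): in both runs X is set exactly as intended, but when Var(U_y) is large, X explains only a small share of the variation in Y.

        ```python
        import numpy as np

        rng = np.random.default_rng(1)
        n = 100_000

        def r_squared(var_uy):
            # Structural model U_x -> X -> Y <- U_y, with Y = X + U_y.
            x = rng.normal(size=n)                           # X fully determined by U_x
            u_y = rng.normal(scale=np.sqrt(var_uy), size=n)  # unexplained causes of Y
            y = x + u_y
            return np.corrcoef(x, y)[0, 1] ** 2

        print(r_squared(0.1))   # small Var(U_y): X explains nearly all of Y
        print(r_squared(10.0))  # large Var(U_y): X explains little, despite perfect control of X
        ```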

        • Fernando:

          1. Yes, it’s an updated edition, but Kempthorne’s book is a classic of a certain era. It makes sense to me that they would update the references, but it also makes sense to me that the new author of the book would not try to expand its scope to observational studies. A single book can only do so much.

          2. Your oven examples represent treatment variation and measurement error, but it still seems to me that, in these examples, if you can get the temperature to the assigned level, that defines the treatment. In contrast, “more cops on the street” is not a treatment, in that there are so many different ways to get there, and these different “instruments” can have much different effects.

        • Andrew:

          On your point 1, it would be interesting to see whether a newly minted book on industrial experiments uses potential-outcomes notation. I doubt it. Not when you are doing 2^20 factorial experiments at a time, which is not uncommon in manufacturing. The potential-outcomes notation for this would be a total mess.
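          To make the scale concrete (a back-of-the-envelope calculation, not from any of the books above): with 20 two-level factors, each unit has a distinct potential outcome for every combination of factor settings.

          ```python
          # 20 two-level factors: one potential outcome Y(x1, ..., x20)
          # per treatment combination, for every experimental unit.
          n_factors = 20
          print(2 ** n_factors)  # 1048576 potential outcomes per unit
          ```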

          On your point 2 I still don’t see how that relates to potential outcomes.

          Manufacturing experiments also include interventions. Whether these are direct or indirect (via instruments) seems to me to be beside the point for justifying potential outcomes. It is not as if in manufacturing they espouse causation without manipulation (though I have no problem with that).

          I think the main difference is that people in manufacturing use structural equation modeling. And they seem to be doing fine with that. So maybe Cochran was also doing fine with that (though I have not read the article).

        • Also, to continue on the Kempthorne book, I bet it doesn’t have much or anything on instrumental variables and other identification strategies that are standard in inference from observational studies. This is all just a different world.

          I don’t think we can learn much from the fact that an updated textbook from 1952 on designed experiments does not mention a topic related to observational studies. You could just as well ask “Was industrial research hampered by the lack of instrumental variables?” or just about any other of the tools used in social inference.

          In any of these cases, I’d give the answer that, in industrial experiments (or in a field such as pharmacology with its mechanistic models), selection bias is typically not the key issue. Instead, researchers need to spend more of their effort on nonlinear modeling.

        • I agree it is a different world. And as far as I can tell Kempthorne does not include IV, as you predicted.

          And I don’t disagree with your point about selection bias.

          My point is simply to note that the variation in language, topics, etc., that you observe across time in observational studies can be observed, more or less, in the cross-section of experimental studies. I am not sure what to conclude from this. I just found it rather surprising.

        • I do work in pharmacology and we analyze observational data pretty much the same way we analyze experimental data. Selection just doesn’t seem like a big deal. But, in pharmacology, it doesn’t really matter how much sophistication you have regarding identification strategies and causal inference, if your mechanistic model is bad.

        • Andrew and Fernando:

          You might find this helpful

          Chang, Hasok (2014). Epistemic Activities and Systems of Practice: Units of Analysis in Philosophy of Science After the Practice Turn. In Science After the Practice Turn in the Philosophy, History and Social Studies of Science, eds. Léna Soler, Sjoerd Zwart, Michael Lynch, and Vincent Israel-Jost. Routledge, pp. 67–79.

          Also possibly relevant for “You need the whole research infrastructure, including publications and interactions with colleagues.”
