
“Causal Inference: The Mixtape”

A few years ago we reviewed “Mostly Harmless Econometrics,” by Josh Angrist and Jörn-Steffen Pischke.

And now we have another friendly introduction to causal inference by an economist, presented as a readable paperback book with a fun title. I’m speaking of “Causal Inference: The Mixtape,” by Scott Cunningham. I like the book—all the blurbs on the back are correct. My only problem with it is the same problem I have with most textbooks (including much of what’s in my own books), which is that it presents a sequence of successes without much discussion of failures.

For example, the book has a chapter on regression discontinuity designs (RDD). That’s fine. But all we see in the chapter are successes. There are some notes of caution (“In all these kinds of studies, we need data. But specifically, we need a lot of data around the discontinuities, which itself implies that the data sets useful for RDD are likely very large.”), but I still feel that the approach is a bit too encouraging. For example, Cunningham says, “The validity of an RDD doesn’t require that the assignment rule be arbitrary. It only requires that it be known, precise and free of manipulation.” But that’s not right. The validity of a regression discontinuity analysis also requires the validity of its statistical model.
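To make the point about the statistical model concrete, here is a minimal R sketch on simulated data. The data-generating process, the cutoff at 0, and the bandwidth are assumptions chosen for illustration. They are not taken from the book or from any real study. The assignment rule is known, precise, and free of manipulation, and there is no true jump at the cutoff. Even so, straight-line fits on each side report a large "effect", because the outcome curves in the running variable. A local fit in a narrow window does much better in this setup. That is not a general fix either: it trades bias for variance and still rests on a model.

```r
# Hypothetical simulation: clean, sharp assignment, but the outcome is a
# smooth curved function of the running variable with no jump at the cutoff.
set.seed(2021)
n <- 2000
x <- runif(n, -1, 1)                  # running variable, cutoff at 0
treat <- as.numeric(x >= 0)           # known, precise, unmanipulated rule
y <- exp(3 * x) + rnorm(n, sd = 0.5)  # true treatment effect is zero

# Global fit: separate straight lines on each side of the cutoff.
# Because x is centered at the cutoff, the `treat` coefficient is the jump at 0.
global_fit <- lm(y ~ treat * x)

# Local fit: the same model restricted to a narrow window around the cutoff.
h <- 0.1
near <- abs(x) < h
local_fit <- lm(y ~ treat * x, subset = near)

estimates <- c(global = unname(coef(global_fit)["treat"]),
               local  = unname(coef(local_fit)["treat"]))
round(estimates, 2)
```

Under this setup the global estimate lands around -2.7, a large spurious jump, while the local estimate stays near zero.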

We can see this from a couple of examples we’ve discussed here before: the example of air pollution in China and the example of air filters in Los Angeles schools. In both cases, the problem with the analysis was not the treatment assignment but the statistical model, which attempted to predict the outcome of interest with a mostly irrelevant variable whose only interest was that it was used in the treatment assignment. To put this in an observational-study framework, the problem is that there are systematic differences between the treatment and control groups, differences that are not accounted for by adjusting for the running variable. In contrast, when you look at the successful examples of RDD in Cunningham’s chapter (predicting earnings from test scores, predicting mortality from age, predicting employment from age among people in their sixties, predicting political ideology of congressmembers from partisan vote share in the previous election), the running variable has a strong connection to the outcome, enough so that it’s plausible that adjusting for this variable will adjust for systematic differences between treatment and control groups.
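One way to look for systematic differences between the groups is a placebo regression discontinuity on pre-treatment covariates. You fit the same RD model with a covariate in place of the outcome and check whether it jumps at the cutoff. Below is a minimal R sketch on simulated data. The covariates `w1` and `w2`, the cutoff at 0, and the effect sizes are all hypothetical. A clear jump in a pre-treatment variable means that adjusting for the running variable alone is not accounting for the difference between groups. Passing this check for the covariates you happen to have does not rule out imbalance in the ones you don't.

```r
set.seed(11)
n <- 2000
x <- runif(n, -1, 1)          # running variable, cutoff at 0
treat <- as.numeric(x >= 0)
w1 <- 0.8 * treat + rnorm(n)  # hypothetical pre-treatment covariate that differs across the cutoff
w2 <- 0.5 * x + rnorm(n)      # hypothetical pre-treatment covariate that is smooth in x

# Placebo RD: regress a pre-treatment covariate on the treatment indicator and
# the running variable. Treatment cannot change a pre-treatment variable, so a
# nonzero "effect" signals imbalance that x does not adjust away.
placebo_jump <- function(covariate, treat, x) {
  fit <- lm(covariate ~ treat + x)
  summary(fit)$coefficients["treat", c("Estimate", "Std. Error")]
}

balance <- rbind(w1 = placebo_jump(w1, treat, x),
                 w2 = placebo_jump(w2, treat, x))
round(balance, 2)
```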

My concern is that people will read the chapter with these clean examples, read the text, and think that the only concerns are sample size and the treatment assignment rule. This is similar to the concern I have when people learn regression and get the wrong impression that the main things they have to worry about are the normal distribution and equal variance. Actually, these are the least important aspects of linear regression.

Getting back to regression discontinuity, here’s how we put it in section 21.3 of Regression and Other Stories:

We can learn from natural experiments. In a natural experiment, the exposure is assigned externally so we don’t have to worry about selection bias. Many natural experiments have the form of a discontinuity. If so, there’s a potentially serious concern with lack of overlap of the exposed and control groups, and inference will be sensitive to our model for the outcome given the assignment variable, so we should take that aspect of the model seriously. But the assignment variable should not be the only concern. The exposed and control groups can differ in other pre-treatment characteristics. . . .

Regression discontinuity analysis works particularly well when the assignment variable has a strong logical and empirical connection to the outcome variable, for example pre-test and post-test scores in an education study. In such cases, if you adjust for the assignment variable and other relevant covariates, you don’t need to be so concerned about imbalance or lack of overlap on other pre-treatment variables.
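Here is a minimal R sketch of the favorable case on simulated data. The running variable is a pre-test score, students scoring below a cutoff get a program, and the post-test is strongly predicted by the pre-test. The coefficients are assumptions for illustration: a pre-test slope of 0.8, a program effect of 0.5, and noise with SD 0.5. Adjusting for the pre-test recovers the effect. The raw difference in post-test means is badly off, because the two groups started at different levels. The linear model is also correct by construction here, which is doing a lot of the work.

```r
set.seed(19)
n <- 1000
pretest  <- rnorm(n)                    # running variable
program  <- as.numeric(pretest < 0)     # low scorers get the program
posttest <- 0.8 * pretest + 0.5 * program + rnorm(n, sd = 0.5)

# Raw comparison ignores that the program group started lower.
naive <- mean(posttest[program == 1]) - mean(posttest[program == 0])

# RD-style regression adjusting for the strongly predictive running variable.
rd_fit <- lm(posttest ~ program + pretest)
adjusted <- unname(coef(rd_fit)["program"])

estimates <- c(naive = naive, adjusted = adjusted)
round(estimates, 2)
```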

In other settings, such as some examples where the treatment assignment depends on geography, the assignment variable is not particularly predictive of the outcome, and little if anything is gained by focusing on the discontinuity rather than simply considering the problem as a generic observational study. Indeed, performing a regression discontinuity analysis can lead to mistaken conclusions if you mistakenly think that the discontinuity implies that you don’t need to worry about adjusting for other pre-treatment variables.
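To illustrate the geography case, here is a minimal R sketch on simulated data. The setup is hypothetical: signed distance to a border is the running variable, a baseline covariate differs between the two sides, and all coefficients are made up. Distance barely predicts the outcome, but the baseline covariate does. The RD regression that adjusts only for distance mistakes the baseline difference for a treatment effect. Adjusting for the pre-treatment covariate, as you would in any observational study, brings the estimate back near the true value of 0.3.

```r
set.seed(21)
n <- 2000
distance <- runif(n, -1, 1)            # signed distance to the border (running variable)
treated  <- as.numeric(distance >= 0)  # one side of the border is exposed
baseline <- 0.8 * treated + rnorm(n)   # pre-treatment covariate that differs by side
outcome  <- 0.05 * distance + 1.5 * baseline + 0.3 * treated + rnorm(n)

# Adjusts for the running variable only.
rd_only <- lm(outcome ~ treated + distance)
# Also adjusts for the pre-treatment covariate.
rd_adj  <- lm(outcome ~ treated + distance + baseline)

estimates <- c(rd_only  = unname(coef(rd_only)["treated"]),
               adjusted = unname(coef(rd_adj)["treated"]))
round(estimates, 2)
```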

But don’t let me give you the wrong idea. I like Cunningham’s book. Positivity bias is a problem in almost every methods textbook I’ve ever read or written, and such a bias makes sense. Students read these books (and teachers assign them) because students want to learn how to use these methods. It wouldn’t make much sense to study economics or statistics with the goal of learning why it doesn’t work. We learn and teach the successes, then we step back and explain the limitations. I get it. But the result can be to instill an attitude of overconfidence, an attitude among researchers that if you work hard and play by the rules you will get discoveries. This is a problem, and I want to emphasize this problem in the context of a book that I otherwise like a lot. I don’t think Cunningham would himself do or promote a bad analysis such as the air-pollution-in-China or Los-Angeles-schools discontinuity regressions, but I could imagine students reading his book and not catching how this method could go wrong. Same goes for readers of my books too, I’m sure.

P.S. Also I think it was a mistake for Cunningham to go on and on about the so-called Fisher’s sharp null hypothesis. I had the same problem with Imbens and Rubin’s book (see item 2 here). Imbens and Rubin are great, and Cunningham’s great, but even great explainers can detract from the message by explaining an idea that almost never makes any sense.


  1. Brett says:

    If totally unrelated then is this not random assignment?

    • Paul says:

      Not quite. Random assignment means the scientist chooses the values of the variable using a random method. Random assignment (should) ensure they are unrelated, but it is not necessary for being unrelated: things can be unrelated without being randomly assigned. Whether you prefer the color orange or purple is unrelated to whether you voted for Obama, but it was not randomly assigned.

  2. gec says:

    This reminds me of my recent experiences teaching R to undergrads. Often, the most instructive thing to do was for me to make an error or set up a situation in which the students would make one. That way, the students could see what it looked like, what the consequences were, and how easy it typically was to recover from the error. Sometimes, the purpose of this was just to show how important it is not to be sloppy. Other times, the purpose was to emphasize that even when there is no error message, you have to double-check to make sure the result makes sense and means what you think it means.

    Overall, I agree that it is important to show how things can go wrong so that students can at least know what to look for. That’s an important step on the road to being a (self-)critical expert, as opposed to just a helpless button-pusher.

    Maybe there’s a way to inject these “intentional errors” in such a way that it avoids the overconfidence problem? In other words, rather than showing the right path and *then* all of the pitfalls around it, we walk the path step by step and show what happens if you go the wrong way. How to do this without overwhelming someone with info, I don’t know.

    • Ben says:

      Hmm, I’m not super sure about this. I think if you said beforehand “I’ve made sure and injected some errors, so approach this with the level of critique you’d expect to use outside of a classroom setting” that’s okay, but doing it secretly seems bad. Trick questions are just annoying.

      • gec says:

        To clarify, I’m not talking about trick questions or secrets. They are either in the context of a classroom demonstration or a guided lab activity and in both cases, the error is explicitly called out as such. The point is that we all see the error and it is good setup for subsequent discussion that clarifies important concepts.

        To give an early example, we would do some drills to introduce vectors and indexing.

        > x <- ...
        > x[1]
        # 5
        > x[3]
        # 4.3

        and then I might write

        > x[5]

        which doesn’t give the students what they expect: R returns NA for an index past the end of the vector. As a class, we work out why it didn’t work, which helps the students understand the relationship between the number in the brackets and the result you get.

        These students have no prior experience with programming, so this is an efficient way to get across the functional idea of what it means to have a collection of values and how to access them.
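For reference, here is a short sketch of what R does with an out-of-range index. It uses a hypothetical three-element vector: only the 5 and the 4.3 come from the example above, and the middle value is made up. Single brackets return NA past the end of a vector rather than stopping. Double brackets raise an error, which is itself a useful contrast to show a class.

```r
x <- c(5, 2, 4.3)  # hypothetical vector; the 2 is illustrative
x[1]               # [1] 5
x[3]               # [1] 4.3
x[5]               # [1] NA: `[` past the end returns NA, no error

# `[[` is stricter: an out-of-range index is an error.
msg <- tryCatch(x[[5]], error = function(e) conditionMessage(e))
msg                # [1] "subscript out of bounds"
```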

        As another example, later in the course, we do independent sample t-tests using simulated data. In one example, we set the sample size for one group much lower than the other, such that it messes with the pooled variance. The result is a “significant” difference, which doesn’t make sense because we’d already worked through earlier scenarios hammering in the idea that there is no average difference between the groups. Through discussion of this weird result, the students get some understanding of how statistical tests don’t give you sensible results when you go too far outside the realm of their assumptions.

        This latter example is a case where it might make sense to incorporate it *into* the introduction of that test, rather than leave it until afterward. As it is, I fear the students get the sense of overconfidence that Andrew describes. I’m just not sure the best way to do this.
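Below is a minimal R sketch of this kind of t-test demonstration. One point is worth flagging. Unequal sample sizes alone do not break the pooled-variance t-test when the two groups share the same variance. The false positives appear when the smaller group also has the larger variance, because the pooled variance is then dominated by the larger, less variable group and understates the noise in the small group's mean. The sample sizes and SDs below are illustrative choices. The Welch test, which is the default in R's `t.test`, is shown for comparison.

```r
set.seed(67)

# One simulated study with no true mean difference; returns both p-values.
sim_once <- function(n_small = 10, n_large = 100, sd_small = 4, sd_large = 1) {
  a <- rnorm(n_small, mean = 0, sd = sd_small)
  b <- rnorm(n_large, mean = 0, sd = sd_large)
  c(pooled = t.test(a, b, var.equal = TRUE)$p.value,
    welch  = t.test(a, b)$p.value)  # var.equal = FALSE is the default
}

# 2000 simulated studies per scenario (each column is one study).
p_unequal <- replicate(2000, sim_once())              # small group has the larger SD
p_equal   <- replicate(2000, sim_once(sd_small = 1))  # unequal n, equal SDs

# Share of studies with p < 0.05; about 0.05 for a test that is working as advertised.
rates <- rbind(unequal_sd = rowMeans(p_unequal < 0.05),
               equal_sd   = rowMeans(p_equal < 0.05))
round(rates, 3)
```

With these settings, the pooled test rejects far more often than 5% in the unequal-SD scenario, while Welch stays close to 5%. In the equal-SD scenario, both tests sit near 5% despite the unequal sample sizes.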

  3. Thank you Andrew for reading and critically interacting with the book. I agree with your point about estimators. For a long time, I had a punchy phrase I’d say to my students: “all you need for causal inference is data and assumptions.” I was always trying to get people to understand that making assumptions is required. It seems trivial, but you’d often encounter people who thought they could get away without doing it. But these days I’m very focused on diff-in-diff and how various estimators perform. Assumptions are still needed for any estimator to identify parameters of interest, but the phrase isn’t as helpful because some estimators don’t need the same assumptions, or as many. I’ll keep working on a punchy phrase that says more. Thanks again for interacting with the book. I appreciate your comments.

  4. sentinel chicken says:

    Maybe someone needs to write a book called “Regression Failures and Other Sad Stories”.

  5. Bodo Brückner says:

    I am not at all an expert. However, I am currently reading the recently published book “Data Analysis For Business, Economics, And Policy” by Gábor Békés and Gábor Kézdi who provide the following helpful explanation in section 21.10 of their book:

    The RD [regression-discontinuity] method has three important caveats besides the fact that it’s applicable only if there is a threshold that determines treatment.
    The first one is that only the intervention in question should depend on the threshold value of the running variable. […]
    The second additional caveat is that the subjects should not be able to manipulate the running variable. […]
    The third caveat with the RD method is that, even under ideal circumstances, what it can help estimate is the average effect of the treatment for subjects who are around the threshold value of the running variable. […]
    Thus, the RD method is applicable under very specific situations, and its results need to be interpreted carefully. […]
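The second caveat, manipulation of the running variable, is often checked by looking for a pile-up of observations just on the favorable side of the threshold. Below is a crude R sketch of that idea on simulated data. It compares counts in equal-width bins just below and just above the cutoff using a binomial test. The bin width, the cutoff, and the way manipulation is simulated are all assumptions. This is a simplified stand-in for formal density tests such as McCrary's, not an implementation of one. It also assumes the density is roughly flat near the cutoff, which is why the bins are kept narrow.

```r
set.seed(85)
n <- 5000
x_honest <- runif(n, -1, 1)  # smooth density around the cutoff at 0

# Hypothetical manipulation: 60% of units just below the cutoff move just above it.
x_manip <- x_honest
movers <- x_manip > -0.05 & x_manip < 0 & runif(n) < 0.6
x_manip[movers] <- -x_manip[movers]

# Compare counts in equal-width bins on either side of the cutoff. Without
# manipulation, each side should hold about half of the observations in the window.
bin_check <- function(x, cutoff = 0, h = 0.05) {
  below <- sum(x >= cutoff - h & x < cutoff)
  above <- sum(x >= cutoff & x < cutoff + h)
  c(below = below, above = above,
    p_value = binom.test(above, below + above, p = 0.5)$p.value)
}

checks <- rbind(honest = bin_check(x_honest), manipulated = bin_check(x_manip))
round(checks, 4)
```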

    • Andrew says:

      That’s all fine except that it doesn’t cover examples where the running variable is not a good predictor of the outcome, such as those discussed in my above post.

      I think the fundamental error in these sorts of descriptions is that they present regression discontinuity as a method of causal identification rather than as an approach for dealing with certain problems in observational studies. The bad RD examples we’ve discussed on our blog are all cases where I think it would be much clearer to just think of these as observational studies and then work to adjust for pre-treatment differences between the groups. Focusing on one particular predictor, just because it happens to be related to treatment assignment, is a distraction. See here for further discussion of this point.
