Summary
If you have an observational study with outcome y, treatment variable z, and pre-treatment predictors X, and treatment assignment depends only on X, then you can estimate the average causal effect by regressing y on z and X and looking at the coefficient of z. If there is lack of complete overlap in X between the treatment and control groups, then your inference can be highly sensitive to the form of the model for E(y|z,X) as a function of X.
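As a minimal sketch of that first step (all variable names and numbers here are invented for illustration, not from any real study): simulate data where assignment depends only on a single pre-treatment predictor, with good overlap, and read off the coefficient of z:

```python
# Hedged sketch: estimating an average causal effect by regressing y on z
# and X when treatment assignment depends only on X. All numbers invented.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=n)                    # a pre-treatment predictor
p = 1 / (1 + np.exp(-X))                  # assignment depends only on X
z = rng.binomial(1, p)                    # treatment indicator
theta = 2.0                               # true average effect
y = 1.0 + theta * z + 1.5 * X + rng.normal(size=n)

# Regress y on an intercept, z, and X; the z coefficient is the estimate.
D = np.column_stack([np.ones(n), z, X])
coef, *_ = np.linalg.lstsq(D, y, rcond=None)
theta_hat = coef[1]                       # close to 2.0 here
```

With good overlap, as above, the linear adjustment for X is forgiving. The sensitivity the paragraph warns about kicks in when treated and control units occupy different regions of X, so the estimate leans on extrapolation from the assumed functional form.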
A special case is discontinuity analysis, where the treatment assignment depends entirely on one of the pre-treatment variables, call it x, with z=1 or 0 when x is above or below some threshold. Here, when running your regression of y on z, you’ll definitely want to include this “running variable” x among your pre-treatment predictors—but in general you’ll also want to adjust for other X variables. Just because the treatment assignment depends on x doesn’t ensure overlap and balance on the other variables in X. The other thing is that there’s no overlap on x, so your inference is sensitive to the functional form of how x enters the regression model. That’s just the way it is. Deciding to use a local regression or a polynomial or whatever doesn’t resolve this problem; these models are nothing more than tools that allow you to try to construct a reasonable fit, and if the fit is unreasonable, there’s no reason to trust the result. In some settings, you can fit your regression discontinuity analysis adjusting only for x and no other variables in X, but that’s only in the special case where x is a really important predictor and you can assume something close to balance on all the other pre-treatment variables, for example if y is post-test score and x is the score on a highly predictive pre-test. It’s the same as with any observational study: if you have a really good pre-treatment predictor, you might be able to get away with adjusting for just that and nothing else, but this is not a general principle. In general you need to be concerned with balance on all pre-treatment predictors, and when there’s lack of overlap, the form of the regression function can be important.
Looking at many of the bad regression discontinuity analyses we’ve seen in recent years, some common features stand out:
– The running variable x is not a strong predictor of the outcome;
– The fitted functional form for E(y|z,x) lacks face validity;
– The analysis does not always adjust for other pre-treatment variables (what I’m calling the rest of X);
– The people who did the analysis think they’re doing everything right, so they don’t question the results.
The point of this post is (a) to talk about how to do a better analysis using the general perspective of observational studies, and also (b) to free people from thinking that the simplistic regression discontinuity (in which only x is adjusted for, and in which there’s no concern about the fitted functional form of the regression) is the right thing to do. I’m hoping that, once released from that attitude, researchers will be liberated to do better analyses.
All of this is separate from the concerns of forking paths and summaries based on statistical significance. These topics are also important, and they also come up with regression discontinuity analysis, but I won’t be discussing them today.
Background
The other day I was speaking with some economics students and we were discussing problems with regression discontinuity analyses. For background see here, here, here, here, here, here, here, here, here, and here. One interesting thing about these examples is that the analyses are obviously wrong, to the extent that the students are surprised they were ever taken seriously—and yet these examples keep on coming.
The purpose of today’s talk is not to explain what went wrong in all those analyses—you can see the above links for that—but rather to outline the analysis I’d recommend instead.
The trick is to take the good part of the regression discontinuity design but not the bad part.
The good part is that you have a natural experiment: everyone with x below some threshold was exposed, everyone with x above that threshold was unexposed. So no need to worry about selection bias in the way that it is often a concern with observational studies.
The bad part is the idea that you’re supposed to model y given x and the discontinuity and nothing else: y_i = a + theta*z_i + f(x_i, phi) + error_i, where theta is the treatment effect, z_i is the treatment variable (1 if exposed, 0 if not), x_i is the running variable, and phi is the vector of parameters governing E(y|x) in the absence of any treatment.
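To make the fragility of that model concrete, here is a small invented simulation (my own setup, not from any of the papers discussed): the true E(y|x) is a cubic, the true treatment effect is exactly zero, and a misspecified linear f(x) manufactures a “discontinuity” that isn’t there:

```python
# Invented simulation: sharp cutoff at x = 0, true treatment effect of
# zero, and E(y|x) = x^3. A linear f(x) yields a spurious nonzero jump
# at the threshold; a cubic f(x) correctly estimates roughly zero.
import numpy as np

rng = np.random.default_rng(1)
n = 4000
x = rng.uniform(-1, 1, size=n)             # running variable
z = (x > 0).astype(float)                  # treatment assignment
y = x**3 + rng.normal(scale=0.1, size=n)   # no treatment effect at all

def theta_hat(degree):
    """Fit y = a + theta*z + f(x) with f a polynomial of given degree."""
    cols = [np.ones(n), z] + [x**k for k in range(1, degree + 1)]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[1]

theta_linear = theta_hat(1)   # misspecified f: substantial spurious "effect"
theta_cubic = theta_hat(3)    # correctly specified f: estimate near zero
```

The particular curve and cutoff are arbitrary; the point is only that with zero overlap on x, the estimated jump is whatever the assumed f(x) says it is.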
There’s lots of focus on what functional form to use in the above expression, and Guido and I have contributed to this discussion, but really the problem is not with any particular family of curves but rather with the idea that you’re only supposed to adjust for the running variable x and nothing else. That’s the mistake right there.
My advice
So here’s how I recommend attacking the problem of causal inference in a discontinuity design:
1. It’s an observational study. You’re comparing outcomes for exposed and unexposed units, and you want to adjust for pre-treatment differences between the two groups.
2. It’s a natural experiment. The treatment assignment only depends on x. That’s great news! But you still need to adjust for pre-treatment differences between the two groups.
3. Adjusting for a functional form f(x, phi) does not in general adjust for pre-treatment differences between the two groups. It adjusts for differences in x but not for anything else.
4. It makes sense to adjust for x and to fit a reasonable smooth function to do this. The treatment and control groups have zero overlap on x, so you want to think hard about how to do this adjustment. “Think hard” includes using an appropriate functional form and also looking at the fit to see if it makes sense.
The punch line is: Adjust for x and also adjust for other relevant pre-treatment variables. It’s an observational study! No reason to expect balance for pre-treatment characteristics that don’t happen to be captured by the running variable.
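Here’s an invented toy version of that punch line (names and numbers are mine, not from the post’s examples): a pre-treatment covariate w is nonlinearly related to the running variable, the outcome depends on w, and the x-only adjustment is badly biased while also adjusting for w recovers the true effect:

```python
# Invented simulation: sharp discontinuity at x = 0 with true effect 1.0.
# The outcome depends on a pre-treatment covariate w that is a nonlinear
# function of x, so a linear-in-x adjustment alone is biased; adding w
# to the regression fixes it.
import numpy as np

rng = np.random.default_rng(2)
n = 4000
x = rng.uniform(-1, 1, size=n)                 # running variable
z = (x > 0).astype(float)                      # treatment assignment
w = x**3 + rng.normal(scale=0.3, size=n)       # pre-treatment covariate
theta = 1.0                                    # true treatment effect
y = theta * z + 2.0 * w + rng.normal(scale=0.5, size=n)

def fit(*extra_cols):
    """OLS of y on intercept, z, x, and any extra columns; return z coef."""
    D = np.column_stack([np.ones(n), z, x, *extra_cols])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[1]

theta_x_only = fit()     # adjust linearly for x only: badly biased
theta_with_w = fit(w)    # adjust for x and w: close to 1.0
```

This is the observational-study logic in miniature: w is imbalanced across the threshold because it tracks x, and no amount of cleverness about f(x) substitutes for just measuring and adjusting for w.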
We discuss regression discontinuity in section 21.3 of Regression and Other Stories. We have an example there and we give some good advice. But now I’m wishing we had something punchier like what I just wrote above. Sometimes it’s worth putting in some words to dispel misconceptions.
I’m trying to help here!
Sometimes people get annoyed when I criticize these papers, either because they’re written by important people and so who am I to question, or because they’re written by less important people and so why am I picking on them.
The reason why I criticize is the same as the reason why I offer advice. It’s because I think policy analysis is important! I’m glad that youall are uncovering these natural experiments and doing these studies. I just want to help you do a better job of it. What’s the point of making avoidable errors? Sure, in the short term if you do a bad analysis and nobody notices you can get some twitter action and maybe even a published paper out of it. But long term you’re just wasting everyone’s time, and for your own career development it’s better to learn how to do things right.
This has come up before
Here’s what I wrote a couple years ago:
I was talking with some people the other day about bad regression discontinuity analyses . . . The people talking with me asked the question: OK, we agree that the published analysis was no good. What would I have done instead? My response was that I’d consider the problem as a natural experiment: a certain policy was done in some cities and not others, so compare the outcome (in this case, life expectancy) in exposed and unexposed cities, and then adjust for differences between the two groups. A challenge here is the discontinuity—the policy was implemented north of the river but not south—but this sort of thing arises in many natural experiments. You have to model things in some way, make some assumptions; no way around it. From this perspective, though, the key is that this “forcing variable” is just one of the many ways in which the exposed and unexposed cities can differ.
After I described this possible plan of analysis, the people talking with me agreed that it was reasonable, but they argued that such an analysis could never have been published in a top journal. They argued that the apparently clean causal identification of the regression discontinuity analysis made the result publishable in a way that a straightforward observational study would not be.
If so, that’s really frustrating: the idea that a better analysis would have a lower chance of being published in a top journal, for the very reasons that make it better. Talk about counterfactuals and perverse incentives.
What would be helpful
It can be hard to communicate with economists—they use a different language. To really make the points in this article, it would be helpful to translate to econ-speak and write a paper with a couple of theorems. That could make a difference, maybe.