**Summary**

If you have an observational study with outcome y, treatment variable z, and pre-treatment predictors X, and treatment assignment depends only on X, then you can estimate the average causal effect by regressing y on z and X and looking at the coefficient of z. If the treatment and control groups lack complete overlap in X, then your inference can be highly sensitive to the form of the model for E(y|z,X) as a function of X.

A special case is discontinuity analysis, where the treatment assignment depends entirely on one of the pre-treatment variables, call it x, with z=1 or 0 when x is above or below some threshold. Here, when running your regression of y on z, you’ll definitely want to include this “running variable” x among your pre-treatment predictors—but in general you’ll also want to adjust for other X variables. Just because the treatment assignment depends on x doesn’t ensure overlap and balance across the other variables in X. Also, by construction there’s no overlap on x, so your inference is sensitive to the functional form of how x enters the regression model. That’s just the way it is. Deciding to use a local regression or a polynomial or whatever doesn’t resolve this problem; these models are nothing more than tools that allow you to try to construct a reasonable fit, and if the fit is unreasonable, there’s no reason to trust the result.

In some settings, you can fit your regression discontinuity analysis adjusting only for x and no other variables in X, but that’s only in the special case where x is a really important predictor and you can assume something close to balance on all the other pre-treatment variables, for example if y is a post-test score and x is the score on a highly predictive pre-test. This is as with any observational study: if you have a really good pre-treatment predictor, you might be able to get away with adjusting for just that and nothing else, but this is not a general principle. In general you need to be concerned with balance on all pre-treatment predictors, and when there’s lack of overlap, the form of the regression function can be important.

Considering many of the bad regression discontinuity analyses we’ve looked at in recent years, some common features are:

– The running variable x is not a strong predictor of the outcome;

– The fitted functional form for E(y|z,x) lacks face validity;

– The analysis does not always adjust for other pre-treatment variables (what I’m calling the rest of X);

– The people who did the analysis think they’re doing everything right, so they don’t question the results.

The point of this post is (a) to talk about how to do a better analysis using the general perspective of observational studies, and also (b) to free people from thinking that the simplistic regression discontinuity (in which only x is adjusted for, and in which there’s no concern about the fitted functional form of the regression) is the right thing to do. I’m hoping that, released from that attitude, researchers can be liberated to do better analyses.

All of this is separate from the concerns of forking paths and summaries based on statistical significance. These topics are also important, and they also come up with regression discontinuity analysis, but I won’t be discussing them today.

**Background**

The other day I was speaking with some economics students and we were discussing problems with regression discontinuity analyses. For background see here, here, here, here, here, here, here, here, here, and here. One interesting thing about these examples is that the analyses are obviously wrong, to the extent that the students are surprised they were ever taken seriously—and yet these examples keep on coming.

The purpose of today’s talk is not to explain what went wrong in all those analyses—you can see the above links for that—but rather to outline the analysis I’d recommend instead.

The trick is to take the good part of the regression discontinuity design but not the bad part.

*The good part* is that you have a natural experiment: everyone with x above some threshold was exposed, everyone with x below that threshold was unexposed. So no need to worry about selection bias in the way that it is often a concern with observational studies.

*The bad part* is the idea that you’re supposed to model y given x and the discontinuity and nothing else: y_i = a + theta*z_i + f(x_i, phi) + error, where theta is the treatment effect, z_i is the treatment variable (1 if exposed, 0 if not), x_i is the running variable, and phi is the vector of parameters governing E(y|x) in the absence of any treatment.

There’s lots of focus on what functional form to use in the above expression, and Guido and I have contributed to this discussion, but really the problem is not with any particular family of curves but rather with the idea that you’re only supposed to adjust for the running variable x and nothing else. That’s the mistake right there.

**My advice**

So here’s how I recommend attacking the problem of causal inference in a discontinuity design:

1. It’s an observational study. You’re comparing outcomes for exposed and unexposed units, and you want to adjust for pre-treatment differences between the two groups.

2. It’s a natural experiment. The treatment assignment only depends on x. That’s great news! But you still need to adjust for pre-treatment differences between the two groups.

3. Adjusting for a functional form f(x, phi) does *not* in general adjust for pre-treatment differences between the two groups. It adjusts for differences in x but not for anything else.

4. It makes sense to adjust for x and to fit a reasonable smooth function to do this. The treatment and control groups have zero overlap on x, so you want to think hard about how to do this adjustment. “Think hard” includes using an appropriate functional form and also looking at the fit to see if it makes sense.
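To illustrate point 4, here is a toy simulation (my own hypothetical example, not taken from any of the studies discussed): the true treatment effect is zero and the true E(y|x) is smooth but nonlinear, yet a poorly chosen functional form for f(x) manufactures a “discontinuity.”

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1, 1, n)
z = (x > 0).astype(float)                   # sharp assignment at the cutoff x = 0
y = np.exp(2 * x) + rng.normal(0, 0.3, n)   # smooth nonlinear trend, true jump is 0

def rd_theta(degree):
    # global polynomial of the given degree in x, plus the jump term z
    cols = [np.ones(n), z] + [x**k for k in range(1, degree + 1)]
    beta = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)[0]
    return beta[1]                          # coefficient on z = estimated "effect"

for degree in (1, 2, 3):
    print(f"degree {degree}: theta-hat = {rd_theta(degree):+.3f}")
```

In runs like this, the linear fit shows a sizable spurious jump that shrinks as the polynomial captures the curve. The point is not that any particular polynomial is good or bad, but that with zero overlap in x, the estimate rides entirely on the assumed functional form, which is why looking at the fit matters.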

The punch line is: Adjust for x and also adjust for other relevant pre-treatment variables. It’s an observational study! No reason to expect balance for pre-treatment characteristics that don’t happen to be captured by the running variable.
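Here’s a toy simulation of that punch line (my own hypothetical numbers): treatment is a sharp function of the running variable x, but a second pre-treatment covariate x2 differs across the cutoff (think smoking rates differing north and south of the river). Adjusting for x alone leaves the effect estimate biased; also adjusting for x2 recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1, 1, n)             # running variable, cutoff at 0
z = (x > 0).astype(float)             # sharp treatment assignment
x2 = 0.5 * z + rng.normal(0, 1, n)    # pre-treatment covariate imbalanced across the cutoff
y = 1.0 * z + 0.8 * x + 1.5 * x2 + rng.normal(0, 1, n)   # true effect theta = 1.0

# Univariate RD: regress y on z and x only
b_xonly = np.linalg.lstsq(np.column_stack([np.ones(n), z, x]), y, rcond=None)[0]

# Adjusting for the other pre-treatment covariate as well
b_both = np.linalg.lstsq(np.column_stack([np.ones(n), z, x, x2]), y, rcond=None)[0]

print(f"theta-hat adjusting for x only:   {b_xonly[1]:.2f}")  # biased away from 1.0
print(f"theta-hat adjusting for x and x2: {b_both[1]:.2f}")   # close to 1.0
```

The univariate fit absorbs the x2 imbalance into the jump term; the multivariate fit separates the two, which is just ordinary omitted-variable-bias logic applied to the discontinuity design.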

We discuss regression discontinuity in section 21.3 of Regression and Other Stories. We have an example there and we give some good advice. But now I’m wishing we had something punchier like what I just wrote above. Sometimes it’s worth putting in some words to dispel misconceptions.

**I’m trying to help here!**

Sometimes people get annoyed when I criticize these papers, either because they’re written by important people and so who am I to question, or because they’re written by less important people and so why am I picking on them.

The reason why I criticize is the same as the reason why I offer advice. It’s because I think policy analysis is important! I’m glad that you all are uncovering these natural experiments and doing these studies. I just want to help you do a better job of it. What’s the point of making avoidable errors? Sure, in the short term, if you do a bad analysis and nobody notices, you can get some twitter action and maybe even a published paper out of it. But in the long term you’re just wasting everyone’s time, and for your own career development it’s better to learn how to do things right.

**This has come up before**

Here’s what I wrote a couple years ago:

I was talking with some people the other day about bad regression discontinuity analyses . . . The people talking with me asked the question: OK, we agree that the published analysis was no good. What would I have done instead? My response was that I’d consider the problem as a natural experiment: a certain policy was done in some cities and not others, so compare the outcome (in this case, life expectancy) in exposed and unexposed cities, and then adjust for differences between the two groups. A challenge here is the discontinuity—the policy was implemented north of the river but not south—but this sort of thing arises in many natural experiments. You have to model things in some way, make some assumptions, no way around it. From this perspective, though, the key is that this “forcing variable” is just one of the many ways in which the exposed and unexposed cities can differ.

After I described this possible plan of analysis, the people talking with me agreed that it was reasonable, but they argued that such an analysis could never have been published in a top journal. They argued that the apparently clean causal identification of the regression discontinuity analysis made the result publishable in a way that a straightforward observational study would not be.

If so, that’s really frustrating: the idea that a better analysis would have a lower chance of being published in a top journal, for the very reasons that make it better. Talk about counterfactuals and perverse incentives.

**What would be helpful**

It can be hard to communicate with economists—they use a different language. To really make the points in this article, it would be helpful to translate to econ-speak and write a paper with a couple of theorems. That could make a difference, maybe.

Several of the examples you link to already adjust for x and other relevant pre-treatment variables. But great advice.

Fixed; thanks.

“So no need to worry about selection bias in the way that it is often a concern with observational studies.”

Not sure about this. Suppose we are trying to get the effect of a weight reduction class, but the class is limited to those with a BMI of 35 or above. Don’t you still have to worry about sample selection in how people got there? Suppose some people gained weight just before applying in order to get into the class? Or some people lost a few pounds just to avoid being forced into the fatty class. We could treat BMI two months previous (if we had it) as another variable outside the running variable, or something, but there can still be sample selection problems even with a sharp division of treatment if people have the ability to alter their status with respect to the running variable. If people move north of the river to get access to heat before the trial begins, you have a sample selection problem.

Jonathan:

Yes, good point. This makes me think about doing a discontinuity analysis for fighters in different weight classes, as it’s well known that they’ll diet and sweat to get their weight below the threshold, just long enough for the weighing.

A rigorous discussion of under what conditions the regression discontinuity design estimates a real effect and in whom can be found here for anyone interested in that kind of thing: https://arxiv.org/abs/2004.09458. It’s not an inherently flawed design, it just depends on assumptions that in practice are often ignored or combined with crazy curve-fitting assumptions.

I’m not sure if I think Andrew’s twin crusades against IV and regression discontinuity designs (which are actually closely related, both the designs and the crusades) do more harm or good. On the one hand, the crusades wrongly brush aside useful formalism that clearly explains when the designs are valid and when they’re not. On the other hand, it’s clear that many practitioners use the existence of said formalism as license to do whatever they want, even when it runs afoul of what the formalism says is allowed. Maybe unfounded dismissal of the formalism itself is much more effective at decreasing the influence of bad studies than merely pointing out that while the assumptions can be (approximately) valid, they’re not even close to valid in lots of applications.

Z:

These are observational studies, and I think the key problem is that people often seem to think that the fact that the design is based on a particular x variable allows them to ignore all the usual concerns with observational studies.

The fundamental problem is that there can be systematic pre-treatment differences between the treatment and control groups. Regression can be a great way of adjusting for such differences; it’s just important to realize that (a) regression only adjusts for the variables included as predictors (so regression on x does not automatically adjust for other variables in X), and (b) when there’s strong imbalance and lack of overlap, inferences can be very sensitive to the functional form of the regression.

Mathematical formalism is fine—my colleagues and I wrote three editions of a book that’s full of math! One useful role of math in statistics is to come up with solutions for particular problems; another useful role is to clarify the assumptions implicit in particular statistical methods.

> I think the key problem is that people often seem to think that the fact that the design is based on a particular x variable allows them to ignore all the usual concerns with observational studies.

OK, people may often seem to think this, but that’s a straw man if you’re criticizing regression discontinuity in general. The assumptions licensing use of a regression discontinuity design (e.g. https://arxiv.org/abs/2004.09458) require more than that treatment assignment is based on just one x. Surely you agree that if treatment assignment based on x satisfies certain conditions then certain concerns with observational studies (at least approximately) do not apply? For example, suppose x is the treatment arm to which you are assigned in a randomized trial. (What is analysis of data from an RCT but an observational study of patients who had been randomly assigned treatment?) A good regression discontinuity should meet the requirement that among subjects with x values near the cutoff C, it is essentially random which side of C their x value lands on. In the link I gave, they discuss how this assumption can be met when there is measurement error in x, for example.
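The measurement-error point can be seen in a toy simulation (my own construction, loosely in the spirit of the linked paper rather than its actual method): when the observed running variable is a noisy measure of the latent one, units whose latent value sits near the cutoff land on either side essentially at random.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
t = rng.uniform(-1, 1, n)          # latent variable
x = t + rng.normal(0, 0.2, n)      # observed running variable = latent + noise
z = (x > 0).astype(float)          # assignment based on observed x, cutoff at 0

near = np.abs(t) < 0.02            # units whose latent value is near the cutoff
print(f"P(treated | t near cutoff) = {z[near].mean():.2f}")   # approximately 0.5
```

Among those near-cutoff units, treatment is close to a coin flip, which is the sense in which this setup resembles a local randomized experiment.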

> Mathematical formalism is fine—my colleagues and I wrote three editions of a book that’s full of math! One useful role of math in statistics is to come up with solutions for particular problems; another useful role is to clarify the assumptions implicit in particular statistical methods.

I know you’re not averse to math generally. My objection is that there’s an established counterfactual formalism in which it has been proven that certain assumptions are sufficient for regression discontinuity or IV designs to work. These assumptions are not “implicit” in the methods, but rather explicitly proven in the causal inference literature. But you discard these results and assert that the designs inherently ignore confounding bias. It’s like you think or are implicitly (or maybe explicitly at times, I don’t remember) claiming that the counterfactual formalism that produced these results is too disconnected from reality to be worth engaging with. (And I know, you’ve used potential outcomes notation in your work and are not generally against counterfactual reasoning. But you seem to draw a line at the simplest conditional exchangeability scenario and consider everything beyond that line too fancy or artificial or precious to be useful.)

I think we agree completely on what is going wrong in bad regression discontinuity studies or bad IV studies. I’m just objecting to the next step you take in your blog posts of dismissing these designs themselves as therefore fundamentally flawed.

Z:

Thanks for engaging me on this. Let me clarify.

I think the discontinuity design can be great: it’s a natural experiment! There’s a reason we include it in Regression and Other Stories. I think the mistake is more in the analysis than in the design. As I see it, the default analysis should adjust for pre-treatment variables X, in the way of the default analysis in standard observational studies. Compared to the standard observational study, the discontinuity design has one big advantage—you know the treatment assignment mechanism, so you have ignorability (absent some selection-within-the-running-variable issues of the sort discussed by Jonathan in his comment above)—and one big disadvantage, the non-overlap that makes inferences sensitive to your specification of the regression given the running variable. The advantage is big enough to often outweigh the disadvantage, which is why I think the design can be useful. The fact that a design has disadvantages doesn’t make it fatally flawed! The design is made a lot worse when people do bad analyses, for example when they fit a really bad functional form for y given x or when they don’t even try to adjust for other X variables.

I hope this helps! This discussion also I think underscores the potential value in a paper that would make these points in clear and rigorous econometrics language. As you say, part of a useful criticism is to make clear exactly what is being criticized. I don’t want to be nihilistic here.

> A rigorous discussion of under what conditions the regression discontinuity design estimates a real effect and in whom can be found here for anyone interested in that kind of thing: https://arxiv.org/abs/2004.09458

This is about cases where measurement error in the forcing variable simulates random assignment. In other words, when there’s lots of data in a small neighborhood around the boundary and a bandwidth is therefore chosen such that that neighborhood dominates the analysis. I’m not sure how relevant that is to the designs discussed here, where fitting trends over a large neighborhood is necessary to get a coherent functional form and enough data to distinguish from noise, but the discontinuity still models treatment assignment.

Somebody:

I agree that if there’s measurement error in the forcing variable, the analysis is much cleaner. As you say, in most of the bad RD examples we’ve considered, this is not the case. So . . . nobody told all these researchers that measurement error in the forcing variable was an important part of the RD assumptions!

Glad you like this direction. References to measurement error and other noise are often part of intuitions about RD but, apart from Lee’s result about implied continuity, aren’t really made use of in any direct way.

We hope our work clarifies that there are perhaps qualitatively different RD designs here — some with noise (and thus randomization, and sometimes enough randomization to make this the sole basis of inference) and some without noise (or not nearly enough of it).

We’ll try to tell all those researchers :)

Thanks Z for pointing to our paper (https://arxiv.org/abs/2004.09458). We see this as potentially carving out different classes of regression discontinuity designs — with some being based in exogenous noise (and thus local experiments, albeit with unobserved, heterogeneous probabilities of assignment to treatment; cf. https://www.emerald.com/insight/content/doi/10.1108/S0731-905320170000038001/full/html) and others that can’t readily be understood that way (e.g. geographic quasi-experiments, thresholds in grade point averages).

“I’m not sure how relevant that is to the designs discussed here, where fitting trends over a large neighborhood is necessary to get a coherent functional form and enough data to distinguish from noise, but the discontinuity still models treatment assignment.”

So I see our paper as relevant to those cases in precisely the sense that the methods we propose don’t apply — or if applied credibly would result in very, very wide confidence intervals (since there is not enough noise)! And thus it is very hard to argue that the estimation and inference is really grounded in something like a local randomized experiment.

Rather, they make sense if you can lean on some assumption about the conditional expectation function of the potential outcomes. With a continuous running variable and a lot of data, this could be just fine. But if there isn’t much data near the cutoff (or the cases near the cutoff are corrupted by endogenous sorting) then maybe one runs into trouble again.

Typically in practice, at least until very recently with the adoption of bias-aware methods (https://www.mitpressjournals.org/doi/abs/10.1162/rest_a_00793, https://github.com/kolesarm/RDHonest, https://rdpackages.github.io/rdrobust/), empirical applications did not really reckon with the consequences of estimating the CEF with bias.

In response to: “It’s an observational study! No reason to expect balance for pre-treatment characteristics that don’t happen to be captured by the running variable.”

Unless the covariates you want to adjust for are highly nonlinear functions of the running variable, then in fact wouldn’t you expect balance in those covariates with a reasonably large sample? That is the reason the design is liked for causal inference when we cannot reasonably assume ignorability of the treatment conditional on X: we are assuming the unobserved confounders are balanced across the threshold; otherwise the discontinuity would be a pointless exercise.

As an Econ PhD student I agree about the egregious RDDs you post here being wildly inappropriate for serious analysis, but I feel you may understate the case for a vanilla RDD with a good functional form fit on the running variable near the threshold.

Jackson:

No, I don’t see why I’d expect balance even with a large sample. Consider the air pollution in China example. Some cities are north of the river, some are south of the river. The cities that are north of the river are not in the same places as the cities that are south of the river, so even if I were to gather data on lots and lots of cities (I guess in this case one would increase N by including smaller subdivisions of cities, towns, villages, etc.), I would see no reason to expect balance on other pre-treatment variables.

I agree that there are cases where regression discontinuity analysis can make sense—indeed, we include the method in Regression and Other Stories. But I guess there’s a problem that the conditions for what makes a “vanilla” RD are not so clear, given that the egregious examples we’ve discussed have been authored by top economists and published in top journals.

I agree with you about the air pollution example because the cities in question are not sufficiently near the threshold of interest. It is a poor use case. Would we say RDD is bad in the case of a pre-test/post-test where some researchers used students very far on either side of the threshold for their comparison groups? I’d hope not; we would say, “hey, you’re supposed to use students near the threshold.” Why is it valid to use students nearer the threshold rather than further away? Because we presume those students are more similar. It’s like you say, a blind application of theory without sufficient thought put into it, but that is the fault of the carpenter, not the hammer.

My main point is that a vanilla RDD, as “somebody” below points out, has pretty standard and I think often easy to reason out assumptions that lead to valid inference and are usually if not always violated in the examples you post that are poor uses. As someone with data that are often confounded with selection bias I would hate to see RDD get a bad rap because there is a river in China and the cities that sit on either side are not altogether that similar.

Jackson:

That’s just one example—but it was published in a top journal, was featured uncritically in the national news media, and has still never been retracted by the authors.

There was also the RD on governors’ elections and lifespan which had all sorts of problems. Election outcomes can be considered random near the boundary, but in real life what we have there is an observational study with finite N and issues of balance between treatment and control groups, forking paths, data quality, and all the rest. These problems are not unique to RD: if regression discontinuity had never been born, any analyses of these data would still have to address balance between treatment and control groups, forking paths, data quality, and all the rest. I’d say that RD added nothing to the analysis. Rather, it made the analysis worse, first because it gave the authors overconfidence (the attitude is that the hardest thing in causal inference is identification, so once people think they have identification, they have permission to turn their brains off), and second because they were under the impression that if you’re doing RD you don’t need to adjust for differences between treated and control groups, other than in the forcing variable.

The formally correct version of RDD that you’re talking about is when:

1. Potential confounders are all smooth functions of the forcing variable

2. You take the limits of the expectation of the outcome variable at the cutoff from above and below and subtract

1 is clearly not the case in many examples. In the river example, I expect lots of things are changing discontinuously across the river. 2 requires a ton of data in an arbitrarily small neighborhood of the boundary. The moment you’re using data far from the boundary to inform the fit of whatever functional form you’ve assumed, there’s absolutely no reason to expect confounders to be balanced on both sides.

On 1 of course I agree. You can’t type RDD(data) in R and turn off your brain, but it is usually possible to reason out (and even test!) whether or not covariates are continuous.

On 2, you go from taking limits which in reality can never be taken to “absolutely no reason,” which seems like a discontinuous rule applied to a continuous reality. When we have a ton of data near the threshold our inference is more valid, when we have slightly less, our inference is slightly less valid, and so on. I don’t see how it goes any further than that. Of course that is conditional on what “far” means, but dismissing anything beyond a mean difference using “a ton” of data seems pessimistic.
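The “even test!” suggestion above can be sketched as a crude balance diagnostic (a hypothetical example of mine, not one of the formal tests from the RD literature): compare means of a pre-treatment covariate in narrow windows on either side of the cutoff.

```python
import numpy as np

def balance_z(x, w, cutoff=0.0, h=0.1):
    """Two-sample z-statistic comparing covariate w just above vs. just below the cutoff."""
    below = w[(x >= cutoff - h) & (x < cutoff)]
    above = w[(x >= cutoff) & (x <= cutoff + h)]
    diff = above.mean() - below.mean()
    se = np.sqrt(above.var(ddof=1) / len(above) + below.var(ddof=1) / len(below))
    return diff / se

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 5000)                             # running variable, cutoff at 0
w_smooth = 0.2 * x + rng.normal(0, 1, 5000)              # continuous in x: should look balanced
w_jump = (x > 0).astype(float) + rng.normal(0, 1, 5000)  # jumps at the cutoff: imbalanced

print(f"smooth covariate:  z = {balance_z(x, w_smooth):.1f}")  # small
print(f"jumping covariate: z = {balance_z(x, w_jump):.1f}")    # large
```

A large standardized difference for a covariate that should be continuous is evidence against the continuity assumption; a small one is reassuring but, as with any balance check, only for the covariates you can observe.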

By 2 I mean, looking at the plots Gelman linked above, do they really look like a finite sample approximation to the upper and lower limits of a discontinuity? But, on the other hand, is the case where you can get a good approximation of that the only case where discontinuities can be helpful?

I too find it a little odd to strictly maintain that these are observational studies.

Now, a simple question: what if the discontinuity is not actually known? Unlike in policy analysis, to my understanding, this case is common in economics. The classical case they’ve studied is the 1973-75 recession.

To rephrase: what if the whole purpose is to locate a potential discontinuity?

Jukka:

From wikipedia: “In fields such as epidemiology, social sciences, psychology and statistics, an observational study draws inferences from a sample to a population where the independent variable is not under the control of the researcher.” In theory, it would be possible to have a discontinuity design that is experimental, but in practice, the discontinuity designs I’ve seen have all been observational studies, or, if you prefer, natural experiments, where the assignment was already done before the researcher comes to the problem.

But I agree that the important thing here is not whether the researcher controlled the assignment. What’s important is the actual assignment mechanism. Even if these discontinuity studies had been experimental, they’d have all the problems of lack of overlap and imbalance that we’ve been discussing. For example, if some experimenter had decided to give the indoor coal heating to Chinese households north of the river but not to those households south of the river, that would’ve been an experimenter-assigned treatment, but it would not have assured balance across other pre-treatment variables, as the cities are still in different places.

In answer to your other question: what if there is a discontinuity whose location must be estimated? This is not a difficult statistical problem, but (a) in real life I don’t think there will be a sharp discontinuity in that case, and (b) any fitted model will still present a challenge in causal interpretation if there is not a good adjustment for lack of overlap in pre-treatment variables.

“the independent variable is not under the control of the researcher”

That definition highlights that if the researchers didn’t run the experiment, but say some firm did an A/B test you are reanalyzing, it is maybe a natural experiment (and indeed, it might be badly broken or might never really have been randomized).

See for example the failure to mix the balls enough in the first year of the Vietnam draft lottery.

The short answer is that the point of discontinuity then has to be treated as a parameter to be estimated. Moreover, you’d probably also want to compare a model with a discontinuity to one without to see if there is in fact evidence for the discontinuity anywhere in your data (conditional on the choice of model).

I’m not sure there’s an “off the shelf” analysis here, but that’s probably for the best. Changepoint detection is a pretty common problem in various fields, but the particulars of each situation differ. You’d be better off trying to work out the structure of the model specially for each application. There’s an example in the Bayesian Cognitive Modeling book by Lee and Wagenmakers, as well as an example in Stan (https://mc-stan.org/docs/2_26/stan-users-guide/change-point-section.html).
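A minimal sketch of that idea (my own toy example, much simpler than the Stan changepoint model linked above): treat the cutoff location as a parameter, profile it over a grid by least squares, and compare the best jump model to a no-jump model.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x = np.sort(rng.uniform(0, 1, n))
y = 2 * x + 1.0 * (x > 0.6) + rng.normal(0, 0.5, n)   # true jump of 1.0 at x = 0.6

def sse(X, y):
    # residual sum of squares from a least-squares fit
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return resid @ resid

# Profile the jump location over a grid of interior candidate cutoffs
grid = np.linspace(0.1, 0.9, 81)
sses = [sse(np.column_stack([np.ones(n), x, x > c]), y) for c in grid]
c_hat = grid[int(np.argmin(sses))]

sse_nojump = sse(np.column_stack([np.ones(n), x]), y)
print(f"estimated cutoff: {c_hat:.2f}")                # should be near 0.6
print(f"SSE reduction from adding the jump: {sse_nojump - min(sses):.1f}")
```

The comparison to the no-jump model plays, informally, the role of asking whether there is evidence for a discontinuity anywhere in the data; a careful analysis would also account for having searched over the cutoff location.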

Andrew:

While there’s been some recent work on the subject, my understanding is that the asymptotic properties of the RD estimate are not well understood when including other covariates, whereas a univariate RD is consistent, even with confounders. I agree that including control variables has heuristic appeal for finite samples, but is there a formal basis for doing so?

Take the opposite of your example. If we find non-zero effects with controls on a finite sample, but null results without controls, what should we conclude? Why?

M:

No, the univariate RD is not consistent if the treatment and control groups are not balanced with respect to other pre-treatment variables.

In answer to your question: I think you’re making a mistake by framing it as binary: nonzero effects or null effects. In real life, no effects are zero and all are uncertain. But I guess the general question is, what would I do when different estimates differ, and my answer is that it will depend on the applied problem. The univariate analysis is a special case of the multivariate analysis with certain coefficients set exactly to zero. Sometimes that’s ok, sometimes not. This is really a general question of inference from observational studies, not special to regression discontinuity.

“For background see here, here, here, here, here, here, here, here, here, and here. “

This is great stuff and many of the regular readers have read much of the detail. However, as a general principle in your critiques it might also be cool to provide some examples that are well done. You have an outline of appropriate ways to use the method; supporting one or two of those with a positive example from the literature would be a stellar way to illustrate the concepts as well as to quell the criticism that you’re throwing out the entire box because of a few bad apples.

Anon:

See section 21.3 of Regression and Other Stories.

So you are saying that there is “no need to worry about selection bias in the way that it is often a concern with observational studies” to then follow up with the claim that there is no reason “to expect balance for pre-treatment characteristics that don’t happen to be captured by the running variable”.

But these two statements are direct contradictions? How does this make any sense?

Yannick:

Not all bias is selection bias. In a generic observational study, we don’t know the treatment assignment at all, so for example the treatment could be chosen by people who would do particularly well under the treatment. In a discontinuity design, that particular source of selection bias can’t happen. But there can still be lots of imbalance between the treatment and control groups, and not adjusting for that imbalance can result in bias. It’s not selection bias, but it’s still bias. Just for example, suppose that smoking rates are higher on one side of the river than the other; if you don’t adjust for smoking, your life expectancy comparisons won’t be appropriate for estimating the effect of the treatment of indoor coal heating. That’s a bias, but it’s not a selection bias having to do with who decides to get the treatment; it’s just a bias caused by imbalance between these two groups in an observational study.

But if not adjusting for the difference between control and treatment groups in an additional covariate x_2 results in a bias of the treatment effect, then by definition x_2 is correlated with treatment status. Consequently, the assumption of no selection into treatment or – put differently – the claim that only the running variable x determines treatment status is violated.

It might just be that the two of us have a different definition of selection bias and whether it involves only bias introduced by conscious decisions of individuals. In general, I agree that claiming a border or river is a valid discontinuity in the sense of allowing the identification of a causal parameter is pretty far-fetched.

Yannick:

“Selection bias” is just a phrase, so no need for us to worry exactly what is selection bias and what is some other source of bias. But you’re wrong in your first paragraph. Just consider my hypothetical example above. Label x1 as distance north of the river and x2 as smoking status. Yes, treatment assignment is correlated with x2, but it is a function solely of x1. The assumption that only x1 determines treatment status is _not_ violated in this example.
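The distinction can be verified in a toy simulation (numbers invented for illustration): z is a deterministic function of x1 alone, and yet it is still correlated with x2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x1 = rng.uniform(-1, 1, n)              # distance north of the river
z = (x1 > 0).astype(float)              # assignment is a function of x1 only
x2 = rng.binomial(1, 0.2 + 0.4 * z, n)  # smoking rates differ by side of the river

assert np.array_equal(z, (x1 > 0).astype(float))  # z determined entirely by x1
corr = np.corrcoef(z, x2)[0, 1]         # ...yet z and x2 are correlated (about 0.4 here)
print(corr)
```

“Only x1 determines treatment” is a statement about the assignment rule, not about which covariates end up correlated with treatment.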

Well, selection bias is just a phrase that leads to misunderstanding in the subsequent discussion then. You seem to use ‘selection into treatment’ and ‘x determines treatment’ in the narrow sense of the literal assignment rule for treatment status rather than in the broader (and practically relevant) sense of ‘which other characteristics do people in the treatment group possess compared with those in the control group’.

My question then is why make this distinction? RDD comes with clearly stated identifying assumptions that require continuity around the threshold for variables that are predictive of the outcome. If you believe that this is unlikely to be true in the case of the river, is the problem then really with the RDD or with researchers claiming that RDD is an appropriate method in the given context?

If I observe some imbalances around the threshold for other predictors and I don’t have a clear theory in mind as to why the differences should be there for exactly those specific variables, I should probably conclude that the assumptions for an RDD set-up are unlikely to hold and I truly am in an observational regression setting in which I try to net out some confounding effects without being able to claim causality. It is not clear to me why I should call it an RDD analysis then, though.

Yannick:

You’re thinking about it backward. Consider the problem from the applied researchers’ point of view. It’s not like they have some real-world situation that satisfies a bunch of mathematical axioms and then they apply the statistical method. Rather, they have a problem where they notice that the treatment assignment depends entirely on one variable, x: no exposure when x is less than the threshold, exposure when x is greater than the threshold. Then they apply regression discontinuity. Unfortunately, the typical advice is not, “This is an observational study. Unless certain very special conditions hold, you need to adjust for imbalance between control and treatment groups in relevant pre-treatment predictors.” Instead the typical advice is, “You have causal identification! Do a regression discontinuity analysis and all you need to do is adjust for x,” with all the discussion being about whether to adjust for x with a linear function, a piecewise linear function, a polynomial, a nonparametric curve, whatever. All sorts of discussion of how to adjust for x, not nearly enough discussion about all the other X’s in the problem.

As to your last paragraph: No theory is needed for why there are imbalances in pre-treatment predictors. The point is that the cities south and north of the river, or whatever the threshold is, will differ in various ways. That’s life.
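A small sketch of the functional-form point (simulated data, and a deliberately crude global-polynomial adjustment rather than anyone’s recommended estimator): the outcome is a smooth curve with no true jump at the threshold, yet the estimated “discontinuity” depends strongly on how x is modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.uniform(-1, 1, n)            # running variable
z = (x > 0).astype(float)            # treatment assigned entirely by x
# Smooth nonlinear curve with NO true jump at the threshold:
y = np.exp(1.5 * x) + rng.normal(0, 0.2, n)

def jump(degree):
    """Coefficient on z after adjusting for a degree-`degree` polynomial in x
    (same polynomial on both sides of the threshold, for simplicity)."""
    X = np.column_stack([np.ones(n), z] + [x ** k for k in range(1, degree + 1)])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# The true jump is zero, but the low-degree fits report a sizable spurious one
# here, while the cubic fit comes out near zero:
for d in (1, 2, 3):
    print(f"degree {d}: estimated jump = {jump(d):+.3f}")
```

The point is not that higher-degree polynomials are better in general; it’s that the estimate is an artifact of the fit, so an unreasonable fit gives no reason to trust the result.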

Coincidentally I was listening to this talk on tie breaker designs the other week, which may be of interest: https://multithreaded.stitchfix.com/blog/2021/02/15/art-owen-algo-hour-video/