Thanks for the reference.

]]>It was a random observation that I should not have included in my blog post. It’s a matter of personal preference. Personally, I aim to comply with the principles outlined in the tidyverse style guide: https://style.tidyverse.org/

]]>I’m a little late to this post. Have you considered updating your post with a P.S. or something, noting the baseless speculations, or removing them entirely? Otherwise they’ll always be on your blog, and your apology will be buried in the comments section.

]]>Thanks

]]>Thanks for the loess plots and discussion.

]]>Here is my quick take on the data.

http://models.street-artists.org/2020/07/05/regression-discontinuity-fails-again/

Using LOESS with different bandwidths, we can investigate the assumption of smoothness vs. rapid change near 0. I plot 3 LOESS curves with bandwidths of 0.2, 0.1, and 0.05 (the bandwidth is the fraction of the data points used in a given fit).

A bandwidth of 0.2 means we are assuming the function is smooth and slowly changing, so that the overall behavior through a given region is best estimated by smoothing through that large region. A bandwidth of 0.05 means we accept that only the closest 5% of the data is relevant to the behavior of the function at any given point.

If there is a step-like change in the expected value near x=0 then the smaller bandwidth LOESS fits will tend to jump upwards as they transition across the x=0 boundary.
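To make the bandwidth intuition concrete, here is a minimal, self-contained sketch (in Python rather than R, and on synthetic data with a hypothetical 5-year step at x = 0, so none of the numbers refer to the actual election data): a tricube-weighted local linear smoother, evaluated just left and right of the cutoff at each of the three bandwidths.

```python
import random

def loess_fit(xs, ys, x0, span):
    """Local linear fit at x0: tricube-weighted least squares over the
    nearest span-fraction of the data (a minimal one-dimensional LOESS)."""
    n = len(xs)
    k = max(2, int(span * n))                   # number of points in the window
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12   # window radius
    w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
    # weighted least squares for y ~ a + b*(x - x0); the fit at x0 is a
    sw = sum(w)
    sx = sum(wi * (x - x0) for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
    det = sw * sxx - sx * sx
    return sy / sw if abs(det) < 1e-12 else (sy * sxx - sx * sxy) / det

random.seed(1)
# synthetic margins and lifespans with a hypothetical 5-year step at x = 0
xs = [random.uniform(-10, 10) for _ in range(2000)]
ys = [23 + (5 if x > 0 else 0) + random.gauss(0, 2.5) for x in xs]

jumps = {}
for span in (0.2, 0.1, 0.05):
    jumps[span] = loess_fit(xs, ys, 0.5, span) - loess_fit(xs, ys, -0.5, span)
    print(f"span {span}: estimated jump across x = 0 is {jumps[span]:.2f} years")
```

With a real step present, the narrower windows recover more of the jump while the wide window smears it out; on pure noise, the same narrow windows will just as happily manufacture jumps.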

In fact, given the noise in the data, if you choose any two points on the LOESS curve that are, say, 1 point apart in margin_pct_1, you will see a difference in lifespan that fluctuates randomly and wildly in the region of x=0.

The claimed effect size also happens to be the grid-size here… 5 years. The entire function can be approximated by y = 23 + normal(0,2.5) around x=0 or so. There is nowhere on these functions where you can get a jump of 5 to 10 years without cherry picking out two peaks.

So if Erik is correct, it means that the whole of the effect is canceled out, for these individuals, precisely by the covariates he adjusts for. Since the people winning the elections didn’t actually live longer, they must have been much much sicker than the people they were running against, and if they hadn’t won the election they’d have died 5 to 10 years earlier than they did. Good thing they won and extended their life just exactly to the average.

:-(

]]>Erik:

My language is confrontational only in the sense that I don’t think this study adds anything to our understanding, except to the extent that it helps us build understanding as to how people can fool themselves using regression discontinuity analyses. But I agree that the discussion here has been productive. When I was speaking of people taking sides, I was not talking about your post (I criticized your study, it’s completely legitimate, not “taking sides” at all, for you to defend it!) or the comments here; I was talking about forums such as EJMR and twitter which seem more focused on zingers, soundbites, and statements of allegiance rather than on thoughtful commentary and responses.

]]>jim,

Thanks for the comment. No worries, I can handle it. However, I’m not sure the confrontational tone is needed to convey what you think of the data, study and findings. It’s totally fair that you believe that the approach we pursue in our study is “stupid” in so many ways that “it’s hard to count them”, and I simply disagree. I have nothing to add that I haven’t already said in my comments above and in my response (including why I don’t believe we can assess the LATE by looking at the raw data).

]]>Dude, nothing personal, but in ascribing anything meaningful to this data in terms of the effect you’re seeking, you’re grasping at straws. You should see that from the get-go. The distribution of the raw data is random. There’s not even the slightest hint of a pattern, yet you propose to extract a massive causal effect from it. If this effect is so large and prominent, why isn’t it visible in the raw data?

I mean dude if I tried to publish data like that for some chemical and claim an effect on life expectancy, even the EPA would cringe. There’s no effect. There’s nothing in your data. Yet you persist in defending it. Just withdraw it. Issue a mea culpa, learn from it and move on.

There’s every reason Andrew’s language should be ‘confrontational at times’: you’re a professional scientist and this research has fatal and blatant flaws. Work like this erodes the integrity of science. You’re a grown-up, not a child whose self-esteem needs care and nurture. You can handle it.

]]>Andrew,

1-3. I fully agree that precision is crucial. We also agree that we find a huge effect in our study. Again, we would love to have a lot more data. We don’t find much variability in these estimates and it’s similar to what is reported in Borgschulte and Vogler (2019). Accordingly, based upon previous research, our discussion and the tests we provide, my best guess is that the LATE is closer to 5 years than 0 years, but definitely also closer to 5 years than 10 years. You find all of these numbers meaningless here as it’s basically all noise (and no signal). I still believe that there is something useful here. I can understand you find this frustrating given your view on our study (that it never had a chance) and your views on RDD more generally. Again, I am not saying our study is perfect or conclusive (it’s not conclusive and definitely not perfect). The comments above have been very constructive and useful — and I believe that something can be gained from our study.

4. Yes. That’s also the reason we include covariates in our analysis (both state and candidate covariates) and report the results with and without covariates. I also agree that “an attitude that causal identification + statistical significance = discovery” is problematic — and we can learn a lot more about what can be gained from an RDD estimate by treating it as an observational study rather than an experiment (or “natural experiment”).

5. I agree. Group means are informative and a good starting point.

6. I don’t see anybody picking sides — and as I wrote in my response: “this is not about being on one side or the other”. I do find your language confrontational at times, which might force some people to ‘pick a side’ (agree that our study is a scientific failure or a great success). However, I see most people engaging in a healthy discussion without saying that everything’s OK.

]]>Not endorsing EJMR. Just noting that someone posted there saying Andrew doesn’t seem to realize that the forcing variable is the only important variable since it fully determines the treatment assignment, and that post got a large number of upvotes and no downvotes. If that’s indicative, it means they think Andrew’s comments here and elsewhere come from him not understanding how RDD works, when in fact Andrew is making a very valid point they should be reflecting on. They won’t, though, unless this confusion about what Andrew is saying is cleared up, which I hope my summary post semi-accomplished.

]]>Thanks for responding at length, Andrew. This clarifies a lot.

I think we’re in complete agreement about 1-3. On 4, I agree that ignoring other variables is a mistake, though I continue to think the mistake consists in not parsing out more of the residual variance, rather than adjusting for differences between the groups, since if the identification strategy is compelling the latter shouldn’t matter so much. By contrast, the former matters either way because it also helps with the signal-to-noise issue you’re raising. Both of your 5s make more sense now. And on 6, I agree. As I mentioned in an earlier comment, when I see these papers I have the same instinctive reaction as you, and find it odd that people buy these results despite their manifest implausibility. And yet I’ve had difficulty pinpointing where these papers are going awry. I think your diagnoses are basically correct as to what’s going wrong, it’s just taken a few rounds of this for me to correctly understand them. And I worry many of the people who actually employ this design in their work have the same confusions about your analysis that I did, which might be why they continue to disregard it.

]]>Ram:

I’ve been thinking of writing yet another post on this, but it’s so exhausting sometimes. Anyway, here’s the story, which I can put here as a placeholder for now:

1. Given the data available, the variability of any estimate will be huge compared to any realistic effect size, which means that you’re trying to use a bathroom scale to weigh a feather—and the feather is resting loosely in the pouch of a kangaroo that is vigorously jumping up and down. This is my point #4 in the above post. The study never had a chance.

2. The above point arises whether you’re doing regression discontinuity, instrumental variables, or just plain regression. It would even come up if you were doing a randomized controlled trial. To do causal inference, you need identification and you need precision. In this case, you don’t got the precision, and that’s the case whether or not you have identification.

3. All the details of the discontinuity analysis are a distraction—but they can be useful in that they can help us understand what went wrong. In particular, the third and fourth graphs above show a characteristic pattern that we often see when regression discontinuity analyses go wrong: the fitted curve shows an artifactual discontinuity that is there to counteract an artifactual local negative slope.

4. What’s the biggest problem with regression discontinuity? In addition to the false sense of security that it gives to researchers (similar to the problem that comes from trust in statistically significant estimates that come from randomized experiments), the big problem with regression discontinuity is that researchers are encouraged to forget about all the other background variables in the problem. Just because the treatment assignment is determined only from a particular variable X, that doesn’t mean that you can’t use other predictors in your regression. It’s an observational study, and you want to adjust for differences between the treatment and control groups. It’s naive and silly to think that this one X variable will capture all the differences.

5. So why talk about forking paths? Forking paths don’t come up in the four items above. The reason why I talk about forking paths is that this helps us understand how it is that researchers can come up with statistically significant results from data where the noise overwhelms the signal. We talk about forking paths to resolve the cognitive dissonance that otherwise arises from the apparent strength of the published evidence. To put it another way: the top graph in the above post, reproduced from the published article, looks pretty impressive. Forking paths helps us understand how this happens. I purposely use the term “forking paths” rather than “p-hacking” so as not to imply that there was intentional sorting of results.

5. You ask why I recommend first comparing group means and then adding adjustments from there. I recommend this because it helps me better understand what the statistical analysis is doing. In this case, none of it really matters given point 1 above, but for analyses with a higher signal-to-noise ratio I think it’s important to understand what the statistical adjustments are doing.

6. Finally, I find it super-frustrating that people can read these critiques and still think the original studies are OK. I think that some of this is a lack of understanding of subtle statistical concepts, some of it is rule-following, and some of it is people taking sides. In any case, it’s frustrating!

]]>EJMR is transparently careerism over truth

]]>Glad to hear you think so.

There was an old thread on EJMR about a prior RDD post from this blog, and the reception was quite negative. In particular, there seemed to be confusion about how the forcing variable could possibly be an unimportant variable to control for, when in the context of a sharp RDD it is the sole determinant of treatment assignment. If I (now) understand Andrew correctly, the importance he’s concerned with is importance for predicting the observed outcome, not the treatment assignment. If the forcing variable is unimportant in this sense, we will have large residual variance, and the design will accordingly suffer from large type S and type M errors.

I suspect applied microeconomists would be much more sympathetic to Andrew’s concerns (as I’ve characterized them) if they appreciated this distinction, which I have to admit took me some time after reading many posts on this blog.

]]>This is a pretty good summary.

]]>If the forcing variable is only weakly associated with the observed outcome variable, then there will be large residual variance. If, moreover, the average treatment effect at the cutoff is small, then any estimate of it will be extremely noisy. In particular, conditional on statistical significance, there is a high probability that the sign of the estimate will be wrong (‘type S error’), and the magnitude of the estimate will on average be excessively large (‘type M error’). Because of this, even if we have good reason to be interested in the average treatment effect at the cutoff, and even if we believe the semi-continuity assumption holds (at least approximately), we don’t really learn anything from a significant RDD estimate of said effect.
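A quick simulation illustrates the type S / type M point (the numbers are hypothetical: a true effect of 0.5 against a standard error of 5, chosen only to mimic a low signal-to-noise setting):

```python
import random, statistics

random.seed(0)
true_effect, se = 0.5, 5.0   # hypothetical: tiny effect, large standard error
draws = (random.gauss(true_effect, se) for _ in range(100_000))
sig = [e for e in draws if abs(e) > 1.96 * se]   # keep "significant" estimates

type_s = sum(e < 0 for e in sig) / len(sig)                  # wrong-sign rate
exagg = statistics.mean(abs(e) for e in sig) / true_effect   # type M factor
print(f"{len(sig)} of 100000 replications reach significance")
print(f"P(sign is wrong | significant) = {type_s:.2f}")
print(f"average exaggeration factor   = {exagg:.1f}")
```

Conditioning on significance both flips the sign a substantial fraction of the time and inflates the magnitude by an order of magnitude or more.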

Furthermore, since the typical estimator involves fitting separate curves to either side of the cutoff, and comparing their predictions at the threshold, this makes the estimate even noisier because we’re taking a difference of two extrapolative predictions. This problem is made even worse if our curve fitting models have bad extrapolative properties (e.g., high degree polynomials), but remains even if we use methods with linear extrapolation (e.g., LOESS).
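This extrapolation penalty is easy to see by simulation. A sketch on synthetic pure-noise data (so the true jump is zero): compare the spread of the jump estimate from two boundary-extrapolated line fits against a plain difference of side means.

```python
import random, statistics

def line_pred_at_zero(pts):
    """OLS line through pts = [(x, y), ...], extrapolated to x = 0."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return my - (sxy / sxx) * mx   # intercept = prediction at the cutoff

random.seed(2)
jumps_fit, jumps_mean = [], []
for _ in range(2000):
    # pure noise on both sides: any estimated jump is pure estimation error
    left = [(random.uniform(-10, 0), random.gauss(0, 1)) for _ in range(100)]
    right = [(random.uniform(0, 10), random.gauss(0, 1)) for _ in range(100)]
    jumps_fit.append(line_pred_at_zero(right) - line_pred_at_zero(left))
    jumps_mean.append(statistics.mean(y for _, y in right) -
                      statistics.mean(y for _, y in left))

sd_fit = statistics.stdev(jumps_fit)     # difference of boundary extrapolations
sd_mean = statistics.stdev(jumps_mean)   # plain difference of side means
print(f"sd of estimated jump, extrapolated fits: {sd_fit:.3f}")
print(f"sd of estimated jump, side means:        {sd_mean:.3f}")
```

Even with well-behaved linear fits, pushing each fit to the boundary roughly doubles the noise in the jump estimate; higher-degree polynomials make this worse.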

Finally, there is the cultural observation that if we think RDD + statistical significance = profit, we will not pay sufficient attention to whether the forcing variable is strongly related to the observed outcome, whether the average treatment effect at the cutoff is plausibly large, and whether the extrapolative behavior of our curve fitting model is good.

Some analytic correctives include controlling for important (pre-treatment) predictors of the observed outcome to reduce residual variance, and fitting a model that estimates the discontinuity in a way that is interpolative rather than extrapolative (e.g., Daniel Lakeland’s sigmoid smoothing device).

Did I pass this ideological Turing Test? All of this makes good sense to me, I’ve just never seen you lay the full set of considerations out in this way.

What I’m less clear about is what forking paths has to do with this, given that there are pretty standard workflows researchers in this area tend to use. Once you’ve chosen a treatment to study, the forcing variable and the cutoff are already given. The only researcher degree of freedom is choice of outcome, and this is a problem with any study that isn’t pre-registered, not just RDD.

I’m also unclear about why you always talk about observational studies needing to start with the comparison of group means, followed by adjusting for differences between groups. I understand that controlling for (pre-treatment) predictors can reduce residual variance, and so limit type M and type S errors, but if we stipulate semi-continuity there isn’t any issue of confounding by these variables. One of the appeals of this identification strategy is that it doesn’t require us to measure every confounder and correctly specify it in our regression adjustments. Why do you say we need to control for differences between groups?

Anyway, if I’ve at least (mostly) characterized your views correctly this will be very clarifying for me, and hopefully for anyone else similarly lost in trying to understand your perspective here.

]]>Dear Prof. Gelman, dear authors, thanks a lot for the stimulating scientific discussion!

I learned a lot from it! Enormous thanks!

The article reminded me of a famous saying by the Italian politician Giulio Andreotti that I just wanted to share:

https://it.wikipedia.org/wiki/Giulio_Andreotti

Andreotti used to say “Power wears out those who don’t have it…” (“il potere logora chi non ce l’ha…”)

]]>“How can I make my work as transparent and as useful for all the folks who might want to include my findings in a subsequent meta-analysis?”

1) Get excellent data.

2) Compile and store data in a relational database with proper data validation.

3) Develop additional tests beyond database validation to look for invalid data.

4) If data acquisition is ongoing, run the data tests frequently to catch any problems quickly.

Borgschulte and Vogler (2019, https://doi.org/10.1016/j.jebo.2019.09.003) also find a positive effect of winning office on longevity (relying on 20,257 observations and, accordingly, dealing with less noise than our study). I am mentioning this study here as they provide a similar point in footnote 20 (i.e. the relevance of including health-related measures): “An analysis of observed pre-election health-related measures such as the height and weight of candidates, would be valuable in this setting. Unfortunately, such data to our best knowledge is unavailable for the vast majority of candidates.”

]]>My thought process is that if winning or losing an election does not have an impact on longevity, then the magnitude of the win or loss would also not have explanatory power.

]]>Anon:

This could be, but I think the key point is my item 4 above. These data are so noisy, and any realistic effect size will be so small, that all these analyses are just sifting through noise.

]]>It could also be that healthier candidates are more able to engage in the sort of retail politics that wins close elections.

Years ago, there were studies that showed that winning an Oscar made you live longer, until some scholars factored in the inherent bias of the academy toward healthier actors. https://projecteuclid.org/download/pdfview_1/euclid.aoas/1310562204

The replication data (as far as I can tell) cannot balance on the health history and appearance of individual candidates. So, to me, this finding is just an artifact of the Healthy Performer Survivor Bias in US Governor’s races.

]]>That’s fair. Yes, I believe in the study and, sure, why not put money on the central finding… Also, if the odds are very good I would even be willing to bet on studies I don’t necessarily believe in.

For the specific suggestion, there is already another study looking into these elections. I mention it in my response and you can find it here: https://doi.org/10.1016/j.jebo.2019.09.003.

For a study on prediction markets in relation to replications, see this in PNAS (of all places): https://www.pnas.org/content/112/50/15343

]]>Thanks, Michael! Much appreciated. I have saved a local copy and I’ll go through the data and output. I’ll let you know if I have any questions or anything meaningful to add.

]]>Erik,

I’ve posted the code to my github repository: [here](https://github.com/apollostream/Longevity_Elections)

Feel free to explore, and let me know if you have any questions.

One can imagine that losing a close election (by 1 percentage point) is more stressful and traumatic than clearly losing.

]]>Hi Michael,

Thank you for the response. To reiterate, I am not arguing that the (simple) code used to get these coefficients is the best approach — and it is not something we report or use in the paper. My point was that it’s not necessary to use our replication material (or the rdrobust package) to find significant results (though this was the case for Gelman); even the simplest approaches (e.g., the introductory procedure outlined in Gelman and Hill, 2007) will yield significant results.

It sounds like an interesting approach (i.e., the Bayesian analysis on the observations with abs(marg) < 5) and I will be happy to look at your results. Is the code publicly available?

Erik

]]>I am about to embark on pre-registering an RDD that’s going to be relevant to an upcoming change in education policy in my state. I have done the power analysis and I know that I am really only able to detect effect sizes that are around 0.3. All of the meta-analyses suggest that I will be grossly underpowered to find the effect size reported in the existing literature (~0.06). It’s a relevant and ongoing policy question, but I know this individual estimate is likely to be noise. How can I make my work as transparent and as useful for all the folks who might want to include my findings in a subsequent meta-analysis?

]]>“1. Common sense”

The math is excellent but hardly necessary. There are so many ways this can be shown to be stupid that it’s hard to count them.

1) if someone has lost a close election and has cancer, there is no medicine to counteract it then? What specifically is the medical effect?

2) what other life events are playing havoc with our longevity? What about:

getting fired

getting cheated on by a spouse

having an immediate family member murdered

being a victim of sexual or other assault

3a) we already know longevity has a genetic component. So how does losing an election interact with one’s genes? Does it overpower nature? Perhaps the next study should compare gubernatorial life span with gubernatorial parent life span. Is the regression discontinuity retroactive?

3b) continuing with genetics: perhaps it works the other way around: if my family has a history of short lifespan, am I more likely to lose a close election? Or if my family is long-lived, should I run for office?

This kind of study makes alchemy look like science.

]]>Hi Erik,

I think Andrew raises valid points, and I understand why you are vigorously defending your work. However, based solely upon the code you posted on GitHub (“erikgahner/Code to Paolo Inglese”) and some poking around in my own R code using the package “brms” (“Bayesian Regression Modeling using Stan”) by Paul-Christian Bürkner, I’ve come to the conclusion that the effect of “won” on “living_day_imp_post” you’re seeing is simply an artifact due to the functional relationship between “won” and “margin” — the former merely being a discretization of the latter.

Narrowing the dataset to “margin” values of magnitude less than 5 really accentuates the correlation between “won” and “margin”.

The simple regressions I performed gave opposite signs for the coefficients estimated for “won” and “margin”. Plus, the “pairs()” plots of the posterior samples of these coefficients from the estimated model (`formula = living_day_imp_post ~ won + margin + living_day_imp_pre`) also had high negative correlation in the bivariate scatterplot.

So I just took the posterior samples of these coefficients, say b_won & b_margin, multiplied them by the dataset terms won & margin respectively, and added them together to get the contribution that won & margin make to the prediction of living_day_imp_post. I then plotted boxplots of these contributions for won=0 and won=1. The distributions show large overlap — I’m sure a formal statistical hypothesis test of these contribution distributions would not reject the null hypothesis that the contribution of won & margin together is the same for won=0 and won=1.

I went further with models that used either one or the other of the terms won and margin, each yielding coefficients with posterior distributions that largely overlapped zero, and I compared the “loo” metrics (elpd) between the models and found that although the model with both terms gave a larger elpd, it was within 1 std. error of the elpd’s of the two models using either of the terms.

But, I think the clearest indication that there is no real predictive power of won on living_day_imp_post is the fact that even the model that shows a highly significant coefficient for won ends up having a highly significant negative coefficient for margin and that the total contribution of these terms is basically the same whether won=0 or won=1.
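A stylized illustration of this trade-off (in Python, with OLS point estimates standing in for the brms posterior means, and on synthetic data rather than the replication dataset): when the outcome is pure noise, b_won and b_margin can individually wander away from zero, but the joint won-plus-margin contribution difference between groups is, by OLS algebra, exactly the raw group-mean difference, which here is near zero.

```python
import random

def ols(X, y):
    """OLS via normal equations, solved by small Gaussian elimination."""
    k, n = len(X[0]), len(X)
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         for a in range(k)]
    v = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivot
        A[c], A[p], v[c], v[p] = A[p], A[c], v[p], v[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [ar - f * ac for ar, ac in zip(A[r], A[c])]
            v[r] -= f * v[c]
    beta = [0.0] * k
    for c in reversed(range(k)):
        beta[c] = (v[c] - sum(A[c][j] * beta[j]
                              for j in range(c + 1, k))) / A[c][c]
    return beta

random.seed(3)
margin = [random.uniform(-5, 5) for _ in range(500)]
won = [1.0 if m > 0 else 0.0 for m in margin]   # 'won' discretizes 'margin'
y = [random.gauss(23, 2.5) for _ in margin]     # lifespan: pure noise, no effect

b0, b_won, b_margin = ols([[1.0, w, m] for w, m in zip(won, margin)], y)
contrib = [b_won * w + b_margin * m for w, m in zip(won, margin)]
n_w = sum(won)
mean_w = sum(c for c, w in zip(contrib, won) if w) / n_w
mean_l = sum(c for c, w in zip(contrib, won) if not w) / (len(won) - n_w)
print(f"b_won = {b_won:.2f}, b_margin = {b_margin:.2f}")
print(f"joint won+margin contribution: winners {mean_w:.2f}, losers {mean_l:.2f}")
```

The split between the two coefficients is essentially arbitrary under this collinearity; only their combined contribution is pinned down by the data.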

I hope this helps.

]]>Thanks for doing that, I think this is a great illustration of the fact (if it is a fact) that you can’t trust regression discontinuity analysis for this sort of thing. In fact, the discontinuity is nearly never zero and for many vote margins it is substantially larger than for a vote margin very close to zero! I suppose someone can make up a Just So Story for why losing by 5 percentage points should make you live a lot longer but losing by 1 will shorten your life, or whatever (I’ve already forgotten what the plots show) but honestly this just looks like noise-fitting, which is what I think it is.

]]>As soon as you make a realistic accounting of measurement noise in x, there is no converging in sharpness to below the scale of the measurement error. If you pretend the measurement error in x is exactly zero you can do the math, but it’ll be wrong anyway.

Still, I’m just glad you understand that you can treat a step function as a particularly sharply changing continuous function, and I hope you explore that avenue in your models!

]]>Daniel,

Getting the convergence rate right is important if we want to be able to build confidence intervals around this estimator with correct large sample coverage. I realize you’re less interested in this kind of frequentist property, though I suspect that a well specified fully Bayesian approach here will be asymptotically equivalent to some sort of cross validation for choosing the tuning parameter. A fixed tuning parameter will lead to asymptotic bias and will result in confidence intervals with less than advertised coverage (or, if you like, hypothesis tests with higher than advertised type I error rates).

But in any case, I think this is productive. I agree that there is room for improvement in how we estimate the discontinuity, and I think this points to a clever way to achieve such an improvement. I just don’t think this is what Andrew is unhappy about (or not the only thing he’s unhappy about), since I somehow doubt he’d be happy with many RDD papers if they just used this estimator instead of the usual one.

]]>Also, I just don’t work in the p value / null hypothesis testing framework. So for me I’d probably make the scale a parameter and provide a prior over how sharp it should be.

]]>If you make the scale of the sigmoid on the same order of magnitude as the measurement error in your x values you should be fine for all practical purposes… Asymptotic consistency is basically unimportant in most real world scenarios where the number of data points is always less than 100 trillion, usually less than a few thousand.

The phenomenon is similar to so-called “asymptotic series”: these are series representations of functions that diverge as the number of terms increases, but for a finite number of terms they’re often far more accurate approximators of functions than convergent series.

So, in your example, you’re talking about percentage points in an election. When we look at something like the Bush/Gore election and the whole issue of hanging chads and so forth, it became clear that we really can’t count elections meaningfully more accurately than about 3, maybe 4, significant figures. So using something like 1/(1+exp(-x/.05)), where x is expressed in percentage points, is as sharp as you could ever really be, just due to the x-value measurement uncertainty.

You can usually make a similar argument in any other RD context.

]]>somebody, Daniel:

I think what you’re saying is that instead of estimating the size of the discontinuity, we should instead estimate the difference between the asymptotes of a certain sigmoid, which is constructed as a smoothing device in place of the discontinuity. And the tuning parameter that controls the smoothness of the sigmoid can be set to decay as the sample size increases such that asymptotically it becomes a step function and the estimator converges to the discontinuity. Do I have that right?

I think this is a pretty reasonable idea for regularization. Do you have a specific way of setting this tuning parameter to ensure the smoothing diminishes at the right rate for the estimator to converge to the average treatment effect at the cutoff? Just having it decay doesn’t guarantee the estimator is consistent. Maybe some type of cross validation would do the trick.

In any event, if insufficient smoothing of the estimate of the discontinuity is the whole of the issue, then that means the problem with RDD is (1) from my original post. And that’s clarifying, though I have to admit that a lot of what Andrew complains about doesn’t seem to be covered by this complaint.

I should just say for the record I’m neither pro-RDD nor anti-RDD. My view is just that RDD has a logic to it, and if you want to criticize it you should tell us where the logic is failing in any given case. It’s frustrating to read posts like this from Andrew because I can’t figure out what he’s disagreeing with, since he won’t express the complaints in terms of RDD logic. But I think this sub-discussion has been illuminating, in that it’s clear that this specific complaint is about insufficient regularization of the typical estimators used.

]]>Link fixed; thanks.

]]>For me, this is one of the better explanations of what’s going wrong with lots of RD analysis. Thank you.

]]>Peter, Martha:

This one’s a bit different, though, because it seems reasonable to consider the intermediate outcome—whether the election result is just above 50% or just below—to be the equivalent of randomly assigned. Really, though, the model has lots of problems as can be seen by its implications of life expectancies for candidates who just win a close election, win by 5%, etc. It’s the usual story of overfitting patterns in observational data.

]]>It’s just a variant of the old “correlation is not causality” thing — e.g., Shoe size predicts reading exam score for children.

]]>as “somebody” said, the function

inv_logit(x,q) = 1/(1+exp(-x/q)) goes pointwise to a step function *everywhere* except *exactly at x=0* as q goes to zero (or you can do 1/(1+exp(-kx)) as k goes to infinity)

So simply define your treatment effect as f(x+epsilon) – f(x-epsilon) for epsilon finite and small… and you recover this definition of the treatment effect in the limit of large data.
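A tiny numerical check of this definition (hypothetical numbers: a 5-year jump on a baseline of 23 years, smoothed by inv_logit and evaluated at epsilon = 1 margin point): as the scale q shrinks toward the measurement-error scale, f(x+epsilon) - f(x-epsilon) recovers essentially the full jump.

```python
import math

def inv_logit(x, q):
    """Smooth approximation to a step at x = 0; q controls the sharpness."""
    return 1.0 / (1.0 + math.exp(-x / q))

def f(x, q, base=23.0, jump=5.0):
    """Hypothetical outcome curve: baseline plus a smoothed 5-year jump."""
    return base + jump * inv_logit(x, q)

eps = 1.0   # evaluate one margin point to either side of the cutoff
effects = {}
for q in (1.0, 0.2, 0.05):
    effects[q] = f(eps, q) - f(-eps, q)
    print(f"scale q = {q}: f(+{eps}) - f(-{eps}) = {effects[q]:.3f}")
```

So the "treatment effect" read off the smooth curve converges to the step size as q shrinks, with no discontinuous model required.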

]]>> Again, the point here is that if semi-continuity holds, and if there is a nonzero average treatment effect at the cutoff, then there is a discontinuity in the regression function. The size of that discontinuity is precisely the size of said treatment effect. If we’re using a model that assumes that there is no discontinuity, then the model is assuming said treatment effect is zero. Let’s not lose the plot here.

That’s the same as a logistic function as k -> infinity, so you can get an approximate estimate of the treatment effect if the data is convincing of a discontinuity and fits to a high k. It’s true that using smooth functions makes it categorically impossible to estimate the true function with any finite sample, but

1. There is no true model, they’re all wrong anyways

2. Even if there is a true model, and your family of functions includes it, you’re almost surely not going to estimate the true function with a real sample. The only thing that matters is how wrong you are.

> I’m all for regularization, but I generally want the bias to disappear as the sample size gets larger. If the model enforces continuity, the bias never goes to zero, so the estimator is inconsistent.

As n goes to infinity, the logistic function should go to a step function. Unless I’m missing something, it is asymptotically consistent, just biased. Which is fine, I don’t think unbiasedness is lexicographically preferred to low variance, and I think the variance of NPRDD is too high to be useful.

]]>I’m all for regularization, but I generally want the bias to disappear as the sample size gets larger. If the model enforces continuity, the bias never goes to zero, so the estimator is inconsistent. Seems less than ideal.

Again, the point here is that if semi-continuity holds, and if there is a nonzero average treatment effect at the cutoff, then there is a discontinuity in the regression function. The size of that discontinuity is precisely the size of said treatment effect. If we’re using a model that assumes that there is no discontinuity, then the model is assuming said treatment effect is zero. Let’s not lose the plot here.

]]>I feel like you’re thinking in terms of picking a family of functions that contains the “true model”, where a discontinuity is either there or not there, and statistics tells you whether the discontinuity is real. Daniel is saying to pick a set of functions that might not contain a “true model” with a “real” jump discontinuity, but that can approximate such discontinuities to arbitrary sharpness if there’s strong evidence for them. With a set of smooth basis functions including inverse logit, if there’s enough evidence for a discontinuous effect at the cutoff, the fit will still look very close to a discontinuous jump as the sharpness parameter k grows large.

It might be true that picking a basis of smooth functions where “the true effect” is discontinuous biases your effect size downwards relative to the non-parametric local linear RDD, but you get a pretty dramatic variance reduction in return. People sometimes argue that unbiasedness of an estimator is a minimal requirement, but personally I find that an estimator like the NPRDD that’s so unstable that it’s nearly guaranteed to see large effects in pure noise is much less useful.

If you need an estimate of the “average treatment effect”, it can just be the coefficient on the logistic step function in Daniel’s example. That gets iffy if the curve turns out to be pretty shallow, but in that case you should be asking if you have proper causal identification in the first place.

]]>We want to use an estimator that can recover functions that do, and functions that don’t, have discontinuities at the cutoff. It isn’t a question of “changing dramatically”, as this could happen in a continuous fashion. An estimator that enforces continuity at the cutoff is not going to be able to recover a function that does have a discontinuity at the cutoff. And the point of my derivation is that if we think the average treatment effect in the cutoff population is nonzero, and we think the potential regression functions are semi-continuous, then there is a discontinuity there. So you need to use an estimator that allows there to be a discontinuity there, otherwise under semi-continuity you’re assuming there is no average treatment effect in the cutoff population.

]]>That’s an interesting approach! We did consider the relevance of alternative cutoffs. When we estimated the main results with synthetic cutoffs, we found no significant effects (these results are available in appendix 3.6).
