Estimating discontinuity in slope of a response function

Peter Ganong sends me a new paper (coauthored with Simon Jager) on the “regression kink design.” Ganong writes:

The method is a close cousin of regression discontinuity and has gotten a lot of traction recently among economists, with over 20 papers in the past few years, though less among statisticians.

We propose a simple placebo test based on constructing RK estimates at placebo policy kinks. Our placebo test substantially changes the findings from two RK papers (one of which is a revise-and-resubmit at Econometrica by David Card, David Lee, Zhuan Pei and Andrea Weber, and another of which is forthcoming in AEJ: Applied by Camille Landais). If applied more broadly, I think it is likely to change the conclusions of other RK papers as well.

Regular readers will know that I have some skepticism about certain regression discontinuity practices, so I’m sympathetic to this line from Ganong and Jager’s abstract:

Statistical significance based on conventional p-values may be spurious.

I have not read this new paper in detail but, just speaking generally, I’d imagine it would be difficult to estimate a change in slope. It seems completely reasonable to me that slopes will be changing all the time—that’s just nonlinearity!—but unless the changes are huge, they’ve gotta be hard to estimate from data, and I’d think the estimates would be supersensitive to whatever else is included in the model.
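
To make that concern concrete, here is a quick toy simulation (mine, not from the paper; the quadratic curve, the bandwidth, and the noise level are all arbitrary). Fit a local-linear “kink” at a point where nothing special is happening and the estimated slope change comes out “statistically significant” anyway, purely from curvature:

import numpy as np

rng = np.random.default_rng(0)
n, bandwidth, kink = 5_000, 0.5, 0.0
x = rng.uniform(-2, 2, n)
y = 0.3 * x**2 + rng.normal(0, 0.1, n)             # smooth response, no kink anywhere

# local-linear fit with a separate slope on each side of the supposed kink
w = np.abs(x - kink) < bandwidth
X = np.column_stack([np.ones(w.sum()),
                     x[w] - kink,
                     np.maximum(x[w] - kink, 0)])  # slope-change regressor
beta, *_ = np.linalg.lstsq(X, y[w], rcond=None)
resid = y[w] - X @ beta
sigma2 = resid @ resid / (w.sum() - 3)
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
print(f"estimated slope change: {beta[2]:.3f}  (t ≈ {beta[2]/se:.1f})")

The conventional standard error treats the local-linear model as correct, which is exactly the problem.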

The Ganong and Jager paper looks interesting to me. I hope that someone will follow it up with a more model-based approach focused on estimation and uncertainty rather than hypothesis testing and p-values. Ultimately I think there should be kinks and discontinuities all over the place.

18 thoughts on “Estimating discontinuity in slope of a response function”

  1. First – and speaking of “in the air” – I think this is a great exercise in extending permutation test thinking into intuitively pleasing but analytically complex econometric environments. Daniel Lakeland – this is the kind of thing that I’m thinking about when I try and fail to explain my intuitions about re-shuffling data for inference. Page 7 touches on the idea and interpretation, but I’m still not convinced there is a fully formed metaphysical/epistemological framework for extending permutation test thinking into quasi-experimental practice, and using the resulting “p-values” (and knowledge of the empirical environment) to make inferences regarding the world. I’d like to be able to articulate one some day.

    Second – maybe this is mentioned (only had time for a quick read), but it seems to me you should do these placebo tests at least a bandwidth distance from the “real” kink, or, in the case of splines, at least a knot away. You don’t want any real effect of the kink to influence your placebo Betas, right? Or am I missing something here?

    Last point: I think this method could be made more robust by thinking more about variability of the outcome left and right of the kink. Right now (in my quick read) it seems like the assumption about “random kink placement” obscures an additional assumption that observations are equally variable on either side of the kink. In a policy-induced kink, or if the running variable strongly determines the outcome, I suspect that is usually wrong (if there is a real effect). So it may be that randomly assigning the placebo kink on one side v. the other of the real kink might give you different sampling variability, and that strikes me (at least on first thought) as undesirable. One solution might be just to compare the variability of estimated placebo Betas on each side of the kink separately.
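
    A rough sketch of what I have in mind (my own toy data: a linear response with a genuine slope change of 0.4 at zero; the bandwidth and the grid of placebo points are arbitrary): estimate the slope change at placebo points kept at least one bandwidth away from the real kink, then compare the spread of those placebo Betas separately on each side.

    import numpy as np

    rng = np.random.default_rng(1)
    n, bandwidth, real_kink = 10_000, 0.5, 0.0
    x = rng.uniform(-3, 3, n)
    y = 0.2 * x + 0.4 * np.maximum(x - real_kink, 0) + rng.normal(0, 0.2, n)

    def kink_estimate(x, y, k, h):
        """Local-linear estimate of the slope change at candidate kink k, bandwidth h."""
        w = np.abs(x - k) < h
        X = np.column_stack([np.ones(w.sum()), x[w] - k, np.maximum(x[w] - k, 0)])
        return np.linalg.lstsq(X, y[w], rcond=None)[0][2]

    # placebo points are all at least one bandwidth away from the real kink at 0
    left  = [kink_estimate(x, y, k, bandwidth) for k in np.linspace(-2.4, -0.6, 25)]
    right = [kink_estimate(x, y, k, bandwidth) for k in np.linspace(0.6, 2.4, 25)]
    print("sd of placebo estimates, left of kink :", np.std(left))
    print("sd of placebo estimates, right of kink:", np.std(right))
    print("estimate at the real kink             :", kink_estimate(x, y, real_kink, bandwidth))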

    • I’m not a fan of permutation testing as a frequentist panacea. The implication is always the belief that the “shuffling” itself is well defined for a given problem (in the same way people say “is it random?” as if that was a well-defined question). My question is – permutation with what correlation structure? There’s always multiple ways to permute which retain different correlation structures and the choice is often as unsupported and arbitrary as the choice of a point null hypothesis.

      • I totally agree. But sometimes, like when you assign treatment using some re-randomization scheme/rule in an experimental setting, you do know the treatment assignment procedure and can control for it very precisely. And in other cases you can sort of infer/fake it. Suppose you have some state-year time-series as the variable of interest, and micro-data from all 50 states. I think if you just randomly assigned a state-level (for all years) time series you would generate a meaningful counterfactual distribution (a rough sketch of this idea appears at the end of this comment).

        In this case, I think looking at other possible estimates of Beta obtained across the running variable is intuitively appealing, while the analytic estimators we currently have for standard errors are poorly suited for the job. It answers the question “What would the distribution of Beta look like if this cutoff was generated randomly?”

        Randomization inference is like any good statistical method – really powerful for certain kinds of situations, not very useful for other ones. What I particularly like about it is that, when it is useful, it is also very transparent and easily interpretable.
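
        Here’s the kind of state-level reshuffling I mean, as a rough sketch (toy data with made-up numbers; the estimator is a bare difference in means just to keep it short). The point is that whole state-level treatment paths get permuted across states, so the within-state time structure is preserved under the null:

        import numpy as np

        rng = np.random.default_rng(2)
        n_states, n_years, n_per_cell = 50, 10, 40

        treated = rng.integers(0, 2, size=(n_states, n_years))      # state-by-year policy
        state = np.repeat(np.arange(n_states), n_years * n_per_cell)
        year = np.tile(np.repeat(np.arange(n_years), n_per_cell), n_states)
        y = rng.normal(0, 1, size=state.size) + 0.1 * treated[state, year]  # small true effect

        def estimate(t):
            """Stand-in estimator: difference in mean outcome, treated vs. untreated."""
            return y[t == 1].mean() - y[t == 0].mean()

        observed = estimate(treated[state, year])
        # permute whole rows (states) of the treatment matrix, keeping each path intact
        null = np.array([estimate(treated[rng.permutation(n_states)][state, year])
                         for _ in range(1000)])
        print(f"observed = {observed:.3f}, permutation p = {np.mean(np.abs(null) >= abs(observed)):.3f}")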

    • I haven’t had a chance to look carefully at this yet. I’ll take a look Monday, but here’s a thought I had when just skimming the blog post and the abstract and so forth of this paper:

      The idea that there’s a “continuous and smooth” function which could have a “random kink” is already highly suspect. Put another way, a smooth regression curve through the data is a fiction we impose upon our models, and then when we ask if there’s a kink, we’re just imposing a different class of smooth fictions + kinks and asking which one of those fits well… We could just as well ask how well does a gaussian process with covariance function exp(-r/l) fit, and that has sample paths that are nowhere differentiable (if I remember correctly). I think the idea of using “placebo” locations is basically a kind of “goodness of fit” metric.
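
      To put the two fictions side by side in code (my arbitrary toy data and hyperparameters; the exponential covariance exp(-r/l) is the Matern kernel with nu = 1/2 in scikit-learn), here is a quick comparison on held-out error:

      import numpy as np
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import Matern, WhiteKernel

      rng = np.random.default_rng(3)
      x = np.sort(rng.uniform(-2, 2, 300))
      y = np.tanh(3 * x) + rng.normal(0, 0.2, x.size)   # smooth but fast-changing, no true kink

      train = rng.random(x.size) < 0.8                  # simple holdout split
      gp = GaussianProcessRegressor(Matern(nu=0.5) + WhiteKernel(), normalize_y=True)
      gp.fit(x[train].reshape(-1, 1), y[train])

      X_kink = np.column_stack([np.ones(x.size), x, np.maximum(x, 0)])  # piecewise-linear kink at 0
      beta, *_ = np.linalg.lstsq(X_kink[train], y[train], rcond=None)

      gp_rmse = np.sqrt(np.mean((gp.predict(x[~train].reshape(-1, 1)) - y[~train]) ** 2))
      kink_rmse = np.sqrt(np.mean((X_kink[~train] @ beta - y[~train]) ** 2))
      print("GP (exp covariance) holdout RMSE:", gp_rmse)
      print("kink model holdout RMSE         :", kink_rmse)

      Both are just different classes of imposed curves; a comparable fit from either one doesn’t tell you there “is” a kink.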

      So, I am with you, I think if we’re going to use permutations as statistical methods we should develop a “framework” for understanding them. But, I think that framework needs to take a step back from just “how to interpret standard errors” and start thinking about “what does a regression mean”.

      • On the issue of “what does a regression mean”, one thing to keep in mind is that when RD is applied appropriately, it’s not supposed to be a special case of polynomial curve fitting. There should be prior evidence for the boundary itself being well-defined and approximating randomized treatment. The estimation of the magnitude of the step should be analogous to estimation of the effect size in an RCT (with, of course, 0 being a possible effect size). Correctly modeling the smooth component is analogous to covariate adjustment for sources of bias which can occur in a real-world RCT.

        None of this is to say that it shouldn’t be better-used in the literature or that the standard methods can’t be improved.

        • BTW I agree that estimating a “kink” discontinuity in slope is highly problematic given the quantity and quality of data available in these studies. Vanilla RD, though, can be a reasonable approach to making a causal argument (within the limitations of retrospective data) when done well.

        • So for regression discontinuity, we’re estimating essentially

          y ~ f(x) + M * H(x-x_step)

          where M is the “magnitude of the discontinuity”, H is the Heaviside unit step function, x_step is the location of the step, and f(x) is whatever class of function we want to use to define the “normal variability”.

          With the regression kink problem, we’re estimating:

          y ~ integrate(f(x) + M*H(x-x_step), x)

          In both cases, in econometric applications there’s usually some “situation” that suggests x_step should have a certain value: for example, x is time in a time series and a law was passed at time x_step, or, in the Huai River data, x_step was distance from the river and the river formed a political boundary where there was a policy change.
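
          To pin the two estimands down with a toy example (my own numbers; x_step, the cubic f(x), and the jump/kink sizes are all made up): the RD coefficient multiplies the step H(x - x_step), and the RK coefficient multiplies its integral, the hinge max(x - x_step, 0).

          import numpy as np

          rng = np.random.default_rng(4)
          x = rng.uniform(-1, 1, 2_000)
          x_step = 0.0

          step = (x > x_step).astype(float)        # H(x - x_step)
          hinge = np.maximum(x - x_step, 0.0)      # integral of H: the kink regressor

          # truth: a smooth part plus a jump of 0.4 and a slope change of 0.3 at x_step
          y = 0.5 * x - 0.8 * x**2 + 0.4 * step + 0.3 * hinge + rng.normal(0, 0.1, x.size)

          f_basis = np.column_stack([x**k for k in range(4)])   # deliberately incomplete f(x)
          X = np.column_stack([f_basis, step, hinge])
          coef = np.linalg.lstsq(X, y, rcond=None)[0]
          print("estimated jump (RD's M)        :", round(coef[-2], 3))
          print("estimated slope change (RK's M):", round(coef[-1], 3))

          With this deliberately limited f(x), both coefficients are recovered; the next paragraph is about what happens when f(x) is not limited.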

          If the functional form of f(x) is sufficiently flexible then M will always be zero even if there IS a discontinuity! For example, with fourier series or chebyshev polynomials or something, we can approximate the heaviside step function as part of the series. So we’ll only ever find M != 0 if we restrict f(x) to be an incomplete basis. But then, we need to restrict f(x) to be incomplete essentially “only” in the “direction” of step functions, otherwise we don’t know whether we’re getting a nonzero M because of what’s actually going on, or just because our f(x) is insufficiently flexible to represent what is really going on (like maybe there’s oscillation and a slow trend, or whatever).

          But, if our basis is sufficiently flexible, and our data set is finite, we’ll just fit a curve through all the data points exactly! Given the noisy and finite data, we *need* regularization of the fit, which means limiting the basis to be less flexible either by choosing a smaller number of basis functions, or by putting strong priors on the coefficients in a large basis set.

          But when we restrict to a more limited set, that means in essence that we only allow “slower” transitions. So it seems like this entire paper is re-discovering the fact that “discontinuity” can be smoothly approximated. For example

          H(x-x_step) = 1/(1+exp(-(x-x_step)/s))

          for s an infinitesimal number (in nonstandard analysis). But for s a limited number, you get a “step” transition which takes place over distance O(s). If you have data points separated by distance d, then all you have a hope of knowing is that s < d.

          Furthermore, you only have any hope of seeing “persistent” effects. It’s no good looking for something like abs(atan(5x)), where there’s a nice big kink, but it’s entirely local (assume I’m measuring x on a scale where data comes at intervals of O(1)). There’s no way to detect the difference between that and measurement noise.

          So, I feel like maybe econometricians are missing a bigger mathematical picture or something. Even a policy change that takes place on midnight of Jan 1 will have some “duration”. For example if they’re going to drop the speed limit on highways some people will start driving slower days or weeks in advance, and some people will still speed until they’ve been caught a few times…

          Looking for the difference between a “kink” and a fast-changing but smooth transition (as in their figure 1) is completely pointless. And figure 2, where they have no kink, will of course fit well with a kink, because setting s to be an infinitesimal will only cause local badness of fit in the vicinity of age 27. The point, though, isn’t whether there’s a “true kink” at age 27, but whether things are changing in the vicinity of age 27, and it’s obvious that they are. Anyone who needs a statistical test there is a moron.

          Figure 3… what the heck? That figure and legend are completely uninterpretable to me.

          Figure 4: isn’t this what I’m saying? There’s no meaningful distinction for statistics between “a kink” and “a function with largish curvature in a certain region”, simply because ALL data analysis with noisy data MUST have regularization or we will fit an interpolant through all the data points.

        • As I mention above, I think slope discontinuity estimation is mostly pointless for these kinds of data. Regarding RD more generally – “If the functional form of f(x) is sufficiently flexible then M will always be zero even if there IS a discontinuity” – yes, this is why the family of f(x) shouldn’t come from a goodness-of-fit criterion. RD is just another example of how causal inference from retrospective data is inherently underdetermined absent assumptions. The smoothness of the functional form should reflect the plausible smoothness of variation in the absence of a treatment effect. It’s a qualitative argument, but almost all causal interpretations of retrospective data have some qualitative aspect. I tend to interpret these forms of causal inference more as reasoned historical arguments than as mechanistic models.

          In theory, one could derive the “background smoothness” from the data itself, which could be an interesting approach to improve on the pick-a-polynomial heuristic. But the choice of the background sample would still be a modeling assumption.

        • “almost all causal interpretations of retrospective data have some qualitative aspect”

          Yes! But I don’t even think you need the “almost.” I think the fundamental aspect of quasi-experimental, reduced-form econometrics is the union of statistical technique and qualitative understanding. We are always just constructing an argument where quantitative outputs are interpreted through the lens of a qualitative description of the world. In particular, we tend to want to argue (mostly qualitatively) that the world itself is providing us with identifying variation that is “as good as random*”, and that (qualitative) argument provides the basis for the causal interpretation of our statistical analysis.

          *and I agree that the phrase “as good as random” is problematic, but in this case I want it to mean something like “what is not described in the model is uncorrelated with the covariate of interest.” – I don’t know, maybe that formulation isn’t any better.

  2. One thought before digging into the papers: Model selection. If you’re going to do a polynomial fit to your data then you need to justify not only the order of the fit but also why a polynomial as opposed to, e.g., a spline. That’s pretty basic though. More significant, I think: if you’re going to claim a discontinuity at a particular point then you need to demonstrate that the evidence is compelling that the discontinuity is at that location as opposed to a different location – or locations. Look at the Huai River data. Is it really any more plausible to claim a discontinuity at 0 deg as opposed to +1 or 2 deg north of the boundary? There’s a big green circle which certainly looks like it belongs with the “south of Huai” cluster. Looks can be deceiving though. What does the analysis indicate?

    One last thing: if you’re going to do a high-order polynomial then take full responsibility for your actions: show confidence intervals, prediction intervals, and extrapolations (with confidence intervals on the extrapolations). Looking quickly, there do not appear to be any confidence or prediction intervals shown with the fits to the Huai River data.
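
    For what it’s worth, here is roughly what taking responsibility might look like, on synthetic stand-in data (not the actual Huai River numbers; the fifth-order fit and the made-up noise level are just for illustration): report confidence and prediction intervals, including where the polynomial is extrapolated past the data.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    dist = rng.uniform(-10, 10, 150)                       # degrees north of the boundary
    tsp = 300 - 8 * dist + rng.normal(0, 60, dist.size)    # made-up pollution "data"

    south = dist < 0
    X = sm.add_constant(np.column_stack([dist[south]**k for k in range(1, 6)]))
    fit = sm.OLS(tsp[south], X).fit()

    grid = np.linspace(-10, 5, 16)                         # extends 5 degrees past the south-side data
    Xg = sm.add_constant(np.column_stack([grid**k for k in range(1, 6)]))
    pred = fit.get_prediction(Xg).summary_frame(alpha=0.05)
    # mean_ci_* is the confidence interval; obs_ci_* is the prediction interval
    print(pred[["mean", "mean_ci_lower", "mean_ci_upper", "obs_ci_lower", "obs_ci_upper"]])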

    • WWJTD?

      Hold the phone. The Huai River data is associated with geographic coordinates. If you’re going to look for a pattern (or discontinuity), why not start out by applying a method – such as a thin-plate spline or kriging – that is well suited to working with that kind of data. If there’s a discontinuity then it should stick out like a sore thumb. (That the discontinuity follows the river should be a hypothesis that’s tested, not a presumption.) TPS ref: http://www.image.ucar.edu/pub/nychka/manuscripts/OV1.pdf
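
      Something along these lines, say (fake station coordinates and TSP values, and a smoothing level picked by eye): fit a thin-plate spline surface over the 2-D locations and then look for where the fitted surface changes fastest, rather than presuming the change sits on the river.

      import numpy as np
      from scipy.interpolate import RBFInterpolator

      rng = np.random.default_rng(6)
      lonlat = rng.uniform([110, 28], [122, 40], size=(200, 2))      # made-up station locations
      tsp = 250 + 60 * (lonlat[:, 1] > 33) + rng.normal(0, 40, 200)  # fake north-south jump at lat 33

      tps = RBFInterpolator(lonlat, tsp, kernel="thin_plate_spline", smoothing=1e3)

      # evaluate on a grid and look at the fitted surface's north-south gradient;
      # a genuine boundary effect should show up as a ridge in the gradient map
      lon, lat = np.meshgrid(np.linspace(110, 122, 60), np.linspace(28, 40, 60))
      surface = tps(np.column_stack([lon.ravel(), lat.ravel()])).reshape(lat.shape)
      ns_gradient = np.gradient(surface, axis=0)
      peak = np.unravel_index(np.abs(ns_gradient).argmax(), lat.shape)
      print("largest north-south change in the fitted surface near latitude", lat[peak])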

      A few other thoughts:
      1. How do TSPs not jump the river?
      2. What’s the prevailing wind direction?
      3. What do atmospheric transport models predict for TSP distribution given source locations/intensity distribution?

      • With the Huai river study, there was a policy change that was controlled by the river as a political boundary. So the basic idea was to see if that policy change induced a health change. There was no real attempt to model the physics of pollutant distribution.

        • > There was no real attempt to model the physics of pollutant distribution.

          Understood.

          The issue is that it isn’t policy change that affects life expectancy, it’s TSP concentration. What you want to know is the sensitivity of life expectancy to ug/m3 of TSP. In this case, the role of policy change was to create change in the distribution of TSP concentrations which, in principle, enables a more accurate estimation of the sensitivity than would be possible otherwise.

          TSPs are airborne. The Huai River isn’t a wall with respect to airborne pollutants. Atmospheric transport matters. If the wind blows in the right direction particulates created on one side of the river will end up on the other side. If that’s a common occurrence and the sensitivity of LE to TSP concentration is significant then you wouldn’t necessarily expect a discontinuity in LE at the river itself. In order to understand how sharp a boundary to expect you want to understand transport.

          My bottom line: If your goal is to accurately determine the sensitivity of life expectancy to TSP concentration – and the consequences thereof – then you need to be a lot more sophisticated than simply regressing life expectancy vs. distance from the river. Perhaps it turns out to be that simple, but that’s not a good presumption to start from.

        • Oh, no question. My biggest complaint about the Huai river thing when it first hit the blog was that they didn’t model transport, and that they turned a 2D problem into a 1D problem. So, we’re on the same page.
