Ultimately, then, the call comes down to which kinds of errors you are willing to make. And, unlike in a simulation, one must take into account possible deviations from the assumptions of the RD procedure used, forking paths, and whether the estimated effect is all that relevant (perhaps there is in fact a discontinuity at the threshold, but the unconvincing plot could suggest that it has little practical relevance: subjects a little bit away from it may not get such different outcomes after all, despite not being comparable to each other).

]]>He’s saying that a noisy but well-behaved plot without obvious trends doesn’t necessarily imply that the effect isn’t there. The example in which there definitely is an effect (since the data are simulated) but the plot is “uncompelling” makes sense to me.

Though maybe language like “don’t use RD plots to make statistical inference” and describing plots as merely “a powerful way to display an effect” incorrectly minimizes the value of plots.

]]>The piecewise-quadratic curve is indeed horrible, both in general terms (estimating what’s going on at the discontinuity using this global model) and in the actual example as shown in the graph. Using such curves is just poor practice.

The story is slightly more interesting when we go to a locally linear model, which is better in that it eliminates or reduces the global dependence of the fit. But still there’s the problem of what happens with the actual data: if the local linear fit also looks like a mess, then we still can and should be concerned that the fitted model is a data artifact and does nothing for the causal inference of interest except add noise (or bias, depending on how you look at it).
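To make the locally linear idea concrete, here is a minimal sketch, not the procedure used in the paper under discussion. It fits a separate straight line on each side of the cutoff, using only points within a chosen bandwidth, and reports the gap between the two fitted lines at the cutoff. The function names and the unweighted (uniform-kernel) window are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def local_linear_jump(x, y, cutoff, bandwidth):
    """Gap between the right and left local linear fits at the cutoff.

    Only points within `bandwidth` of the cutoff are used, and x is centered
    at the cutoff so that each intercept is the fitted value at the cutoff.
    Assumes each side has at least two distinct x values inside the window.
    """
    left = [(xi - cutoff, yi) for xi, yi in zip(x, y)
            if cutoff - bandwidth <= xi < cutoff]
    right = [(xi - cutoff, yi) for xi, yi in zip(x, y)
             if cutoff <= xi <= cutoff + bandwidth]
    a_left, _ = fit_line(*zip(*left))
    a_right, _ = fit_line(*zip(*right))
    return a_right - a_left
```

Rerunning this over a range of bandwidths and seeing how much the estimated jump moves around is one quick way to check whether the local fit is stable or just chasing noise.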

Also forking paths, which makes it hard for me to take the reported statistical significance seriously.

And the larger issue of disconnect between theory, model, and measurement.

And the meta issue of why people wanted so much to believe and then defend this result. I don’t see where all this trust is coming from.

]]>But how on earth do you justify the shape of the curves on each side of 50%? The concavity is flipped from one side to the other. And the slope flips signs in the middle of each side. Why are the extrema in the middle of the two intervals? I would never have guessed that; I would have guessed the extrema to be at the far ends or at 50%. Why on earth do NCSKEW and DUVOL have a local max at .75? And why is there almost no trend up or down in the two intervals? We are confident that there is important information in the second derivatives when there is nothing of interest in the first derivatives? (Not a rigorous argument, I’ll grant you, but my spidey-sense is tingling on this last point.)

You might (might!) be able to tell a story where extreme values of the vote are associated with extreme values of each variable, so you need a nonlinear model, but the nonlinearity shown is just plain weird. The nonlinearity is a jumble of concavities and extrema … what you would expect to get if your model was the result of arbitrary cherry picking.

]]>In this case I assume that if the vote is more than 50% the firm unionized, and then they want to see whether unionizing caused changes in volatility, basically by comparing firms that had votes like 49% to those with, say, 51%.

]]>Also, my understanding is that the purpose of using an RDD is usually not to investigate whether or not there is some discontinuity, but rather to investigate whether or not there is a discontinuity at a point where there has been some particular event that might cause a discontinuity.

]]>Each technique needs to be illustrated by (at least) two examples: One where it works well, and (at least) one where it doesn’t.

]]>Graphing is more of a challenge when there are multiple continuous predictors.

One thing that we sometimes do is to discretize some predictors. For example, in a regression of y on x1, x2, x3, x4 if you discretize x2, x3, and x4 to have two, three, and four levels, respectively, then you can display the full model using a 3-by-4 grid of plots indexed by x3 and x4, with each plot showing y vs. x1 using different colors for the different values of x2.
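As a rough sketch of the bookkeeping behind such a grid (the function names are hypothetical, and the equal-count binning is an assumption; cutpoints chosen on substantive grounds would work just as well), the following groups the data into up to 3 × 4 panels indexed by the levels of x3 and x4, with one series per level of x2 inside each panel:

```python
from collections import defaultdict


def discretize(values, n_levels):
    """Assign each value a level 0..n_levels-1 using equal-count bins by rank.

    Ties are broken by original order, since Python's sort is stable.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, i in enumerate(order):
        levels[i] = rank * n_levels // len(values)
    return levels


def build_panels(x1, x2, x3, x4, y):
    """Group (x1, y) points for a grid of plots.

    Returns panels[(x3_level, x4_level)][x2_level] -> list of (x1, y) points,
    with x2, x3, x4 cut into two, three, and four levels respectively.
    """
    l2 = discretize(x2, 2)
    l3 = discretize(x3, 3)
    l4 = discretize(x4, 4)
    panels = defaultdict(lambda: defaultdict(list))
    for i in range(len(y)):
        panels[(l3[i], l4[i])][l2[i]].append((x1[i], y[i]))
    return panels
```

Each entry of `panels` can then be handed to whatever plotting tool you prefer: one subplot per key, one color per x2 level, with the fitted curve for that cell overlaid on the points.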

In other settings, we can make an omnibus continuous predictor by using the linear predictor from a fitted model. For example, in a regression of y on x1, x2, x3, x4, …, where x1 is a discrete predictor of interest (for example, a treatment indicator), we can fit the model y = Xb, then create the omnibus predictor z = b2x2 + b3x3 + b4x4 + …, and then plot y vs. z with different colors for the different values of x1.
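Here is a minimal, self-contained sketch of the omnibus-predictor construction. It assumes the model includes an intercept and solves the normal equations directly in plain Python so that it needs no external library; in real work you would use a regression package. The helper names are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def omnibus_predictor(x1, others, y):
    """Fit y on [intercept, x1, x2, x3, ...] and return (z, b).

    `others` holds one row [x2, x3, ...] per observation. z is the part of the
    linear predictor due to those columns only: z = b2*x2 + b3*x3 + ...
    """
    X = [[1.0, t] + list(row) for t, row in zip(x1, others)]
    b = ols(X, y)
    coefs = b[2:]  # skip the intercept and the x1 coefficient
    z = [sum(c * v for c, v in zip(coefs, row)) for row in others]
    return z, b
```

Plotting y against the returned z, colored by x1, then shows the treatment comparison with the other predictors collapsed into a single axis.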

For your basic regression discontinuity problem, it’s more clear what to graph, as there’s a single forcing variable to use on the x-axis. So there’s no excuse *not* to plot the data and fitted model, and that’s good news, because such a plot can reveal problems, as in the example above.
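One common way to draw such a plot is to average the outcome within narrow bins of the forcing variable and overlay the fitted curves on the binned means. The sketch below covers only the binning step; the function name and the choice of fixed-width bins aligned to the cutoff are assumptions for illustration.

```python
import math
from collections import defaultdict


def binned_means(x, y, cutoff, bin_width):
    """Average y within fixed-width bins of the forcing variable x.

    Bin edges are aligned to the cutoff so that no bin straddles it.
    Returns a list of (bin_center, mean_y, count), sorted by bin position.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for xi, yi in zip(x, y):
        k = math.floor((xi - cutoff) / bin_width)  # k < 0 means left of the cutoff
        totals[k][0] += yi
        totals[k][1] += 1
    return [(cutoff + (k + 0.5) * bin_width, total / count, count)
            for k, (total, count) in sorted(totals.items())]
```

Showing the raw points or the bin counts alongside the means helps a reader judge how much of any apparent jump is carried by a handful of observations near the threshold.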

I understand the validity of this point and I have clearly followed similar principles in presenting data from my own research from early on.

BUT:

My hunch is that graphing data is fine as long as you have only 2 variables (resulting in a 2-D display) and perhaps even 3 variables (with 2 variables having main or interaction effects on a third, dependent variable; illustrated by, for instance, regression hyperplanes in 3-D). But even with 3-D models confined to the 2-D surface of your computer screen or a fancy graph in a printed article, it can become difficult to gauge the appropriateness of a model just through visual inspection. And in my opinion the validity of this approach is even more compromised once you have more than 2 predictors that might interact (or not) in complex ways. At that stage you have to break things down, either by splitting one variable (thereby obscuring to a large extent that variable’s properties) or by some other way of reducing the complexity. And that comes at a cost, usually.

Perhaps I am just ignorant of tried & tested approaches to depicting more complex models in graphs. But my sense is that we need more, better, and more clever ways to depict data from complex models and thereby to check the adequacy of our models in light of our data. I am not ready to accept that we should be limited in our ability to understand and correctly model complexity in behavioral and other kinds of data simply because anything that goes beyond 2 or 3 variables involved cannot easily be visualized.

Does anyone have suggestions?

]]>Yes. Cleveland is one of my heroes.

]]>That’s too bad. Again, I think part of the blame goes to statistics and econometrics textbooks, where we tend to give clean examples where the model fits the theory (for example, predicting post-test scores from pre-test scores in a model where pre-test is used as a discontinuity threshold), so you’d expect a strong and persistent relation between the forcing variable and the outcome.

Another way to put it is that we train people to have too much faith in the statistical properties of these canned procedures to provide insight about the real world.

Sometimes it seems that we’re actively encouraging people to set aside their common sense. The advice to look at the p-value and ignore the graph is a pretty stunning example!

]]>To underline this, as William Cleveland wrote in “The Elements of Graphing Data”:

“Data Display is critical to data analysis. Graphs allow us to explore data to see overall patterns and to see detailed behavior; no other approach can compete in revealing the structure of data so thoroughly. Graphs allow us to view complex mathematical models fitted to data, and they allow us to assess the validity of such models.”

]]>