Fatal Lady

Eric Loken writes:

I guess they needed to add some drama to Hermine’s progress.

[background here]

P.S. The above post was pretty short. I guess I should give you some more material. So here’s this, that someone sent me:

You’ve written about problems with regression discontinuity a number of times.

This paper that just came in on the NBER Digest email looks like it has another very unconvincing regression discontinuity. I haven’t read the paper—I’m just looking at the picture. But if you asked me to pick out the discontinuity from the data, I think I’d have a lot of trouble…

Among other things, it looks like if you take their linear specification seriously, we should expect negative patents for firms with assets below 50 million pounds if eligible for SME (left), or for firms below 80 million pounds if ineligible (right). So, um, maybe we shouldn’t take those lines very seriously.

Also the minimum x-value edge looks suspiciously as though it’s been chosen to make the left-side regression line go as high as possible. It wouldn’t be an issue if they had done some kind of LOESS regression, of course.

9 thoughts on “Fatal Lady”

  1. How did that graph ever get published?

    Here we are generating random firms and random exponentially distributed number of patents… and then doing separate linear regressions above and below the breakpoint:

    library(ggplot2)

    # Simulated firm sizes and exponentially distributed patent counts,
    # with no discontinuity anywhere, replicated 10 times
    S <- runif(160, 61, 111)
    P <- rexp(160, 1.0 / 0.05)
    Draw <- as.factor(as.vector(sapply(1:10, FUN = function(x) rep(x, times = 16))))
    DF <- data.frame(size = S, pat = P, rep = Draw)

    # Separate linear fits below and above the breakpoint at 85
    ggplot(DF) +
      geom_point(aes(size, pat, col = rep)) +
      geom_smooth(aes(size, pat, col = rep), data = subset(DF, size < 85),
                  method = "lm", se = FALSE) +
      geom_smooth(aes(size, pat, col = rep), data = subset(DF, size > 85),
                  method = "lm", se = FALSE)

    Basically every single pair of same-colored lines has a radically different slope…

  2. Granted that there were space limitations in the NBER paper (and the bizarre choice to put the graphs in the center of the page is not helping information density), but surely the minimum best practice when pursuing a regression discontinuity is to do a placebo check that confirms that the discontinuity only occurs near the actual discontinuity threshold.

    I think even a basic visual inspection would show that the first graph in the linked paper would fail such a check.
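The placebo check described above can be sketched in a few lines of R. This is a minimal illustration on simulated data (the firm sizes, patent counts, bandwidth, and grid of fake thresholds are all made up, not from the NBER paper): estimate the “jump” at a series of fake cutoffs and see whether the estimate at the actual cutoff stands out.

```r
# Placebo check sketch: estimate the RD "jump" at a grid of fake thresholds.
# The data are simulated with no discontinuity anywhere, so every
# estimated jump here is pure noise.
set.seed(1)
size <- runif(500, 50, 120)   # fake running variable (firm size)
pat  <- rexp(500, 1 / 0.05)   # fake outcome (patents), no true jump

jump_at <- function(c0, x, y, bw = 15) {
  keep <- abs(x - c0) < bw
  d <- data.frame(x = x[keep] - c0, y = y[keep], above = x[keep] >= c0)
  fit <- lm(y ~ x * above, data = d)  # separate lines on each side
  coef(fit)[["aboveTRUE"]]            # estimated jump at the cutoff
}

thresholds <- seq(65, 105, by = 5)
jumps <- sapply(thresholds, jump_at, x = size, y = pat)
round(jumps, 3)
```

If the estimate at the real threshold looks like just another draw from this placebo distribution, that is a bad sign for the design.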

    • Aaron:

      I think the problem, as with that pollution-in-China paper, is that economists (and, I suppose, researchers in other fields too) are trained to think of identification strategies as magic.

      The reasoning goes like this:
      1. Identification is key. Without identification, you can’t learn anything useful.
      2. We have identification!
      3. All our inferences are just fine, and we can proceed straight from statistical significance to story time.

      And, remember, it worked with the pollution-in-China paper! The analysis was discredited but it’s still featured on the author’s website.

  3. Speaking of regression discontinuity designs (and power, beauty, etc.)
    I recently came across this manuscript: http://scholar.harvard.edu/files/jmarshall/files/britain_schooling_reform.pdf
    It claims that
    > each additional year of late high school increases
    > the probability of voting Conservative in later life by 12 percentage points.
    I’m no political scientist, but this struck me as completely unbelievable.
    I looked through the paper to find mistakes, but I couldn’t find any.
    But then I saw figure 4 (p. 20 of the pdf): the estimate is super noisy, and the lower end of the confidence interval can fall on either side of zero, depending on specification. So this is classic type-M errors/exaggeration ratio/significance filter stuff.
    Thought y’all might be interested.
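The type-M/exaggeration-ratio point can be made concrete with a Gelman-and-Carlin-style simulation. The function below is a generic sketch of that calculation; the true effect of 2 percentage points and standard error of 6 are illustrative assumptions, not numbers taken from the schooling paper.

```r
# Type M / type S error sketch, in the style of Gelman & Carlin (2014).
# Assume (illustratively) a true effect of 2 and a standard error of 6.
retrodesign <- function(true_effect, se, alpha = 0.05, n_sims = 1e5) {
  z <- qnorm(1 - alpha / 2)
  est <- rnorm(n_sims, true_effect, se)  # simulated noisy estimates
  signif <- abs(est) > z * se            # the significance filter
  list(
    power        = mean(signif),
    type_s       = mean(est[signif] < 0),               # wrong sign, given signif.
    exaggeration = mean(abs(est[signif])) / true_effect # type M ratio
  )
}
set.seed(1)
retrodesign(2, 6)
```

With power this low, any estimate that clears the significance filter is several times larger than the true effect, and can even have the wrong sign, which is exactly why a noisy-but-significant 12-point estimate deserves suspicion.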

    • Ack I thought I’d be clever and use markdown, but apparently I’m not too good at markdown. “each additional…12 percentage points” is a quote (from the abstract)
