All that said, I actually think this new paper isn’t entirely terrible as an exploratory analysis. Definitely could be wrong but not totally unworthy of publication, if appropriately caveated (which of course it never is).

]]>Two influential econ papers in Science…

http://science.sciencemag.org/content/348/6236/1260799

http://science.sciencemag.org/content/341/6144/1236498

Terry:

If you publish a paper in one of the tabloids (Science, Nature, PNAS), I think you’re much more likely to get lots of media attention. It seems that, for the news media, a paper in one of the tabloids is more important than anything published in the American Economic Review, American Political Science Review, etc. A complicating factor is that the tabloids encourage bold claims and simple statements with minimal qualifications.

So . . . if you’re a social scientist and you think your work is important, it makes sense to publish in the tabloids, if you can, so your work will get more attention. And if you’re the sort of social scientist who’s willing to make bold claims far beyond what is warranted by your data, you just might be able to write the sort of article that the tabloids will want to publish.

In any case, it’s not just the tabloids. The article featured in the P.S. in the above post appeared in the American Journal of Political Science.

]]>Good point about the positive slope from -5 to +5 degrees latitude.

If we take this paper seriously, the positive-slope results are far more important than the Huai River discontinuity. Life expectancy could be increased by about 10 years by just moving north — an effect probably bigger than curing both cancer and heart disease.

Such results would be astounding if they are widely applicable.

]]>I’ve never seen an economist publish anything of interest in Science.

Does Science even have economist editors? Who would they get to review it? Why publish it in Science when economists don’t read Science?

]]>One could perhaps alternatively conclude that the life expectancy value for this city was revised, but that would require acknowledging that datapoints have measurement error, which would lead the authors to realize that that’s yet another thing to consider when fitting models to data, after which the authors would stop themselves from publishing junk. But clearly, that’s not the case.

Q.E.D.

]]>“Is this a journal no one takes seriously?”

Well – do you take the National Academy of Sciences seriously? I think Andrew has flip-flopped on that, they aren’t Prestigious Proceedings anymore, just Proceedings, which, in internet, means that they can be taken seriously again. Or something.

I suspect this would not have gotten published in the top Economics journals, at least not twice (!). My guess is they got a second PNAS out of it because of Andrew’s critique – they do the non-parametric local regressions Gelman/Imbens suggest, and they do placebo tests at other distance cutoffs. Which all ties into Andrew’s point, which is that once people focus primarily on the “identification strategy”, they stop thinking carefully about the world. I can almost see reviewers thinking “well, I’m not really all that convinced, and there are some weird differences here relative to the traditional idea of a running-variable in an RD, but I don’t see anything obviously wrong with the methods, and I can’t say the question isn’t important/interesting, so… ¿accept?” It could also just be an editor who wanted to give them a chance to respond, whether you think they deserved PNAS space for that or not.

If I’ve been surprised by anything I’ve learned about publishing since I became a paid (instead of paying) person in Academia, it’s that the top general interest journals (PNAS, Nature, Science) publish a whole lot of really bad papers that wouldn’t get into the top journals in their respective disciplines. That might be less true in other fields – I’m sure I’d publish my cure for cancer in Science – but in Econ, people actually value a Science less than an American Economic Review or an Econometrica (it’s true – you couldn’t make that s*** up). That usually makes us look dumb to researchers in other fields, but then you see this kinda stuff and you’re like “oh yeah, right, they have really poor taste and discernment in social science research.”

]]>Terry:

I think it’s a sort of ideology, or overconfidence, that various scientists are trained to think that “identification strategies” such as regression discontinuity analysis will give them the answer. And they get a lot of feedback supporting this attitude. Remember, that original air-pollution-in-China paper was published in a top journal and received wide and uncritical press attention. So, lots of reasons for people to think they’re on the right track when they’re doing this sort of thing, even though from a scientific position it’s ridiculous.

]]>Hi Matias:

Thank you for the thoughtful response! That clarifies several questions I had about the method.

Best,

Mark

It seems to me that a radial basis function approach, including some step-like functions such as c*inverse_logit((x-a)/b) would be a good choice of global basis for RD type regressions. When there are localized features expected, including localized functions in your representation makes good sense. It’s unfortunate that many people in social sciences don’t have a lot of math background in areas like function approximation theory.
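
To make the suggestion concrete, here is a minimal sketch in R, not anyone's published method. It fits a handful of Gaussian radial basis functions for the smooth part plus one step-like term c*inverse_logit((x-a)/b). The simulated data, the cutoff a = 0.5, the step width b = 0.02, and the RBF centers and width are all assumptions made up for illustration. With a and b held fixed, c is just a linear regression coefficient.

```r
# Sketch: global fit = smooth Gaussian RBFs + one localized logistic step.
# Assumptions (hypothetical, for illustration only): a = 0.5, b = 0.02,
# RBF centers/width, and simulated data with a true jump of 1 at x = 0.5.
inverse_logit <- function(z) 1 / (1 + exp(-z))

set.seed(1)
n <- 500
x <- runif(n)
y <- 0.5 * x + 1 * (x >= 0.5) + rnorm(n, 0, 0.3)

a <- 0.5                       # assumed location of the discontinuity
b <- 0.02                      # assumed width of the step
centers <- seq(0, 1, by = 0.25)
width <- 0.25

# One column per radial basis function, evaluated at every x.
rbf <- sapply(centers, function(m) exp(-((x - m) / width)^2))
step <- inverse_logit((x - a) / b)

fit <- lm(y ~ rbf + step)
step_height <- coef(fit)[["step"]]   # estimate of c
print(step_height)
```

If a and b are not known in advance, they would have to be estimated too (for example with nls or a Bayesian model), which this sketch does not attempt.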

]]>Matias:

Thanks much for the response.

]]>Hi Mark:

Thanks for your question. Andy invited us to answer it, since it refers directly to our work.

We view our paper Calonico, Cattaneo and Titiunik (2015, JASA) as providing data-driven, principled methods for graphical visualization of RD designs, and for conducting some heuristic specification tests. However, we recommend against using this methodology for estimation and inference of RD treatment effects. The RD plot is a tool to visualize and illustrate, not to formally estimate effects or make statistical inferences. We state this in our paper (pp. 1756-1757): “Global polynomial approximations may not perform well in RD applications and, more generally, in approximating regression functions locally. These polynomial approximations for regression functions tend to (i) generate counterintuitive weighting schemes (Gelman and Imbens 2014), (ii) have erratic behavior near the boundaries of the support (usually known as the Runge’s phenomenon in approximation theory), and (iii) oversmooth (by construction) potential discontinuities in the interior of the support.” Point (ii) is the key when it comes to RD estimation and inference.

We have made this same point multiple times in our work. For example, take a look at our forthcoming Cambridge monograph: http://www-personal.umich.edu/~cattaneo/books/Cattaneo-Idrobo-Titiunik_2018_CUP-Vol1.pdf , where in Section 4.1 (Local Polynomial Approach: Overview) we write “Since the RD point estimator is defined at a boundary point, global polynomial methods can lead to unreliable RD point estimators, and thus the conclusions from a global parametric RD analysis can be highly misleading”. Instead, we advocate for local-to-the-cutoff analysis when estimation and inference of RD treatment effects is the main goal. See our other papers here: https://sites.google.com/site/rdpackages/rdrobust/

Best wishes,

Matias
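
For readers who want to see what "local to the cutoff" means in practice, here is a rough base-R sketch of a local linear RD estimate with a triangular kernel. It is an illustration under assumptions, not the rdrobust implementation. The function name local_linear_rd, the hand-picked bandwidth h = 0.2, and the simulated data (true jump of 2 at a cutoff of 0) are all hypothetical, and there is no data-driven bandwidth selection or bias-corrected inference here.

```r
# Sketch of a local linear RD point estimate with a triangular kernel.
# Assumptions (hypothetical): cutoff at 0, hand-picked bandwidth h,
# simulated data with a true jump of 2. Not the rdrobust procedure.
local_linear_rd <- function(x, y, cutoff = 0, h = 0.2) {
  w <- pmax(0, 1 - abs(x - cutoff) / h)   # triangular kernel weights
  keep <- w > 0                           # only points within h of the cutoff
  d <- data.frame(
    y = y[keep],
    xc = x[keep] - cutoff,                # running variable centered at cutoff
    treated = as.numeric(x[keep] >= cutoff),
    w = w[keep]
  )
  # Separate intercept and slope on each side; the 'treated' coefficient
  # is the estimated jump at the cutoff.
  fit <- lm(y ~ treated * xc, data = d, weights = w)
  coef(fit)[["treated"]]
}

set.seed(1)
x <- runif(2000, -1, 1)
y <- 1 + 0.8 * x + 2 * (x >= 0) + rnorm(2000)
est <- local_linear_rd(x, y, cutoff = 0, h = 0.2)
print(est)
```

Only observations within h of the cutoff enter the fit, and those closest to it get the most weight, so behavior far from the cutoff cannot drive the estimate the way it can with a global polynomial.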

]]>Are these papers severe outliers in this field?

Is this a journal no one takes seriously?

Or, is this a big joke made up by Andrew or his correspondent?

Do journals in this field signal in some way that some papers they publish are pretty crappy? Do they make them the bottom article in the issue?

]]>“What is it with the blog refusing to remember who I am? That’s been going on for a couple months.”

I seem to have only been experiencing it the past couple of weeks.

]]>I think that is a third reason to be skeptical: one has to look at the totality of implications of an analysis, not just the implications for a favored topic.

]]>Sigh, of course moments after I re-post, my original post shows up…

What is it with the blog refusing to remember who I am? That’s been going on for a couple months.

]]>Since the data near the edge of an interval comes from only one side of the interval, a polynomial is virtually guaranteed to wiggle near the edges. Here is R code to generate 10 plots of a 6th-degree polynomial fit to normal(0,1) random noise, whose correct regression line is y=0. See for yourself:

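library(ggplot2)

set.seed(1)
dataset <- list()
xes <- seq(0, 1, .01)

# Write all 10 plots to one PDF, one plot per page.
pdf("test.pdf")
for (i in 1:10) {
  # Pure noise: the correct regression line is y = 0.
  dataset[[i]] <- data.frame(x = xes, y = rnorm(length(xes), 0, 1))
  # Scatterplot with a 6th-degree polynomial regression overlaid.
  print(ggplot(data = dataset[[i]], aes(x, y)) + geom_point() +
    geom_smooth(method = lm, formula = y ~ poly(x, 6)))
}
dev.off()

# Open the PDF (evince is a Linux viewer; substitute your own).
system("evince test.pdf&")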

Run it, and you’ll see that basically every single page of the 10 page pdf has a wiggly regression line that curves strongly at the edges.

]]>The authors claim their results are robust to other bandwidths and kernels, but I don’t see any results (in the paper or SI) that keep the same triangular kernel while substantially varying the bandwidth.
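
The kind of check meant here could look roughly like the sketch below: hold the triangular kernel fixed and re-estimate the jump over a wide grid of bandwidths. Everything in it is an assumption for illustration (simulated data with a true jump of 2, cutoff at 0, an arbitrary bandwidth grid), and the standard errors are the naive ones from weighted least squares, not robust bias-corrected ones.

```r
# Sketch: bandwidth sensitivity of a local linear RD estimate, with the
# triangular kernel held fixed. Assumptions (hypothetical): simulated data,
# cutoff at 0, hand-picked bandwidth grid, naive weighted-lm standard errors.
set.seed(2)
x <- runif(4000, -1, 1)
y <- 1 + 0.8 * x + 2 * (x >= 0) + rnorm(4000)

bandwidths <- c(0.1, 0.2, 0.4, 0.8)

sweep <- do.call(rbind, lapply(bandwidths, function(h) {
  w <- pmax(0, 1 - abs(x) / h)            # triangular kernel weights
  treated <- as.numeric(x >= 0)
  fit <- lm(y ~ treated * x, weights = w, subset = w > 0)
  est <- summary(fit)$coefficients["treated", c("Estimate", "Std. Error")]
  data.frame(h = h, estimate = est[["Estimate"]], se = est[["Std. Error"]])
}))
print(sweep)
```

If the estimated jump moves around a lot, or changes sign, as h varies, that is a sign the headline number depends on the bandwidth choice.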

]]>“Figure 1 utilizes the optimal data-driven RD plots developed by Calonico, Cattaneo, and Titiunik (2015) to allow for a corresponding visual examination of the discontinuity at the cut point. Consistent with the results in Table 1, column 2, Figure 1 shows visual evidence of a clear discontinuity at the cut point for projects proposed in the 2-year period after a state election”

I’m curious to get your thoughts on robust RD (Calonico, Cattaneo, and Titiunik 2015), given that the method often fits data using high-degree polynomials. I increasingly see papers using this approach as a way of minimizing researcher discretion over the number of polynomial terms and bins.

]]>