Gur Huberman writes that he’s been wondering for many years about this question:

One function of protests is to vent out the protesters’ emotions. When do protests affect policy? In dictatorships there are clear examples of protests affecting reality, e.g., in Eastern Europe in 1989. It’s harder to find such clear examples in democracies.

And he sent along a link to this paper from 2013, Do Political Protests Matter? Evidence from the Tea Party Movement, by Andreas Madestam, Daniel Shoag, Stan Veuger, and David Yanagizawa-Drott, who write:

Can protests cause political change, or are they merely symptoms of underlying shifts in policy preferences? We address this question by studying the Tea Party movement in the United States, which rose to prominence through coordinated rallies across the country on Tax Day, April 15, 2009. We exploit variation in rainfall on the day of these rallies as an exogenous source of variation in attendance. We show that good weather at this initial, coordinating event had significant consequences for the subsequent local strength of the movement, increased public support for Tea Party positions, and led to more Republican votes in the 2010 midterm elections. Policy making was also affected, as incumbents responded to large protests in their district by voting more conservatively in Congress. Our estimates suggest significant multiplier effects: an additional protester increased the number of Republican votes by a factor well above 1. Together our results show that protests can build political movements that ultimately affect policy making and that they do so by influencing political views rather than solely through the revelation of existing political preferences.

My reaction: I’m suspicious of any analysis involving rainfall in that way. Ultimately it’s observational data and I think these sorts of tricks are hocus-pocus. That said, I do think protests can make a difference (I’d say this whether or not I’d seen this particular paper you sent me).

The usual story I’ve heard, and which makes sense to me, is that protests energize the base, motivating people to write their congressmember, run for office, harangue their friends and relatives about politics, etc. It’s not about the direct effect of intimidating politicians (although maybe that’s part of it) or even about the effect of swaying public opinion to your side—as many people have noted, protests can often antagonize the average voter. Rather, it’s about getting potential core supporters more involved in politics.

After I sent that response to Gur, he sent me his take on the Madestam et al. paper:

The point of the very clever paper: Just correlating protest size & voter sentiment doesn’t prove that protest size affects voting outcome; possibly (likely) both are affected by voter sentiment. Now, think about two otherwise identical places, A & B running parallel protests. Suppose it rains on A and not on B, and therefore the protest size in A is smaller than that in B. A difference in voting outcome (more protest-sympathetic vote in B) is reasonably attributable to the lower protest turnout in A which in turn is attributable to rain, not to sentiment.
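The logic of that argument can be sketched with a small simulation. Everything here is invented for illustration (the coefficients, the 50% chance of rain, the function name `simulate_rain_iv`); none of it comes from the Madestam et al. data. The sketch also builds in the assumptions at issue: rain is drawn independently of sentiment, and every location is independent of every other.

```python
import random


def mean(values):
    return sum(values) / len(values)


def simulate_rain_iv(n=20000, true_effect=0.5, seed=0):
    """Simulate rain -> turnout -> votes with an unobserved confounder.

    Returns (naive_slope, wald_estimate). All numbers are hypothetical.
    """
    rng = random.Random(seed)
    rain, turnout, votes = [], [], []
    for _ in range(n):
        sentiment = rng.gauss(0, 1)          # unobserved local sentiment
        z = 1 if rng.random() < 0.5 else 0   # rain, independent of sentiment by construction
        t = sentiment - 1.0 * z + rng.gauss(0, 1)                 # rain lowers turnout
        y = true_effect * t + 2.0 * sentiment + rng.gauss(0, 1)   # sentiment also moves votes
        rain.append(z)
        turnout.append(t)
        votes.append(y)

    # Naive slope of votes on turnout: contaminated by sentiment.
    t_bar, y_bar = mean(turnout), mean(votes)
    cov_ty = mean([(t - t_bar) * (y - y_bar) for t, y in zip(turnout, votes)])
    var_t = mean([(t - t_bar) ** 2 for t in turnout])
    naive = cov_ty / var_t

    # Wald estimate: rain's effect on votes divided by rain's effect on turnout.
    y_wet = mean([y for z, y in zip(rain, votes) if z == 1])
    y_dry = mean([y for z, y in zip(rain, votes) if z == 0])
    t_wet = mean([t for z, t in zip(rain, turnout) if z == 1])
    t_dry = mean([t for z, t in zip(rain, turnout) if z == 0])
    wald = (y_wet - y_dry) / (t_wet - t_dry)
    return naive, wald


naive, wald = simulate_rain_iv()
print("naive slope:", round(naive, 2), "Wald estimate:", round(wald, 2))
```

In this idealized setup the naive slope overshoots the true effect while the rain-based ratio lands near it. That only shows the method works when its assumptions hold; it says nothing about whether they hold for real weather.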

OK, I should elaborate: Sure, I realized from the abstract that this is what was being done in this paper, and I agree that the rainfall-instrument is clever and worth trying out. But I don’t think the result is necessarily as clean as it might appear. The difficulty is the “otherwise identical places” thing. Places aren’t really otherwise identical. Or, to put it another way, the rain in different places on that day was not random; the data are correlated. And various other issues too. Looking through the paper, I see that the authors do seem aware of many of these issues, and on first glance it looks much better than, say, the air-pollution-in-China paper. Still, I don’t know that I’d trust the result of this sort of instrumental analysis as much as I’d trust a more conventional analysis, where cities are matched on a bunch of pre-treatment variables and then regression models are constructed to predict the outcomes of interest.

I do find the idea of instrumental variables appealing in this context: after all, the “treatment” (in the sense of a potentially manipulable intervention) has something to do with various key actors in politics and the news media. Or, to put it another way, there are two sorts of decision makers here. There are the political leaders or political entrepreneurs who make the decisions of whether and how to organize rallies, and there are all the individual people and groups that decide whether to participate in a rally. And it makes sense to me to think of the size of a rally as an intermediate outcome, as the result of all those decisions. So in that sense I’m thinking in terms of instruments.

But I don’t really buy the rainfall thing; it strikes me as an attempt to get something for nothing. Remember my trick for thinking through instrumental variables? The usual approach would be to say that the instrument gives us an effect of the effects of crowd size. Instead I’d say that excess rainfall on that day is correlated with crowd size and with some other political outcomes. That’s about it. And given the spatial correlations of weather patterns, it’s more of an N=5 or N=10 thing than N=2758. Again, it’s a serious analysis, and there’s no reason not to look at it. I’m not trying to shoot it down so much as to clarify its interpretation.

The big picture is that the causal mechanisms suggested by Madestam et al. are similar to what I’ve heard people say over the years, based on qualitative evidence. So maybe it’s not so important how seriously we take their particular quantitative claims. Setting research methods aside, our best answer to the question of how protests affect policy is that they do so by mobilizing activists and potential activists who can then go out and do political activity and persuasion.

I am an old person. The 1960s had protests against segregation that I believe did impact policy. The 1968 Chicago protests did move the presidential vote from the 1964 debacle for the GOP to a victory. In the thirteen elections following the Chicago protests, the GOP has won eight. 8/13 might not be outside the realm of statistical probability, but it is a big deal in real life.

This is a summary of a recent article on the impact of lunch counter protests in the South, which found them to be effective: https://contexts.org/articles/protest-works/?_ga=1.211784848.469013543.1456972656

It seems that your main worry is that rainfall on 4/15/2009 was associated with later political outcomes other than through protest attendance. I assume you’re not worried that rain on 4/15/2009 specifically is associated with political outcomes, but rather that places where it tends to rain more may also share certain political characteristics. This is reasonable. But it seems straightforward to adjust for historical rainfall to address this problem, making rain on 4/15/2009 a valid instrument.

You also seem to object to instrumental variables in general, calling them ‘hocus pocus’. I don’t understand this attitude given that you must surely be aware of the formal assumptions under which they are provably valid (one set of which was of course developed in part by your advisor). Do you just think the assumptions rarely hold in practice? Are you ok with using IV methods in examples where the assumptions clearly do hold by design, e.g. assignment to treatment arms in a randomized trial or lotteries to get into charter schools? This rain example (adjusting for historical rainfall) seems like a pretty good one to me.

Never mind the rain being a great instrument; I realized that large weather systems on that day can cause issues. But I’m still interested in an answer to the second paragraph’s questions.

But I think large weather systems go to the heart of the matter. Is the model specified in such a way that it takes into account these correlations? If so, as Andrew said, there would be an effective sample size of 5 or 10, and much less certainty about the effect size.

When you assign treatment *using a verifiably high-quality pseudo-random number generator* (and under no other circumstances) then you can prove that in the limit of large sample size (which 2758 locations would be) the treatment assignment is uncorrelated with *every* other variable (a more accurate statement would be that the probability that the sample-correlation between the treatment and any other sample statistic will be large, is itself small).
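A quick way to see the claim about randomized assignment is to simulate it. This is a minimal sketch, assuming coin-flip assignment and 20 made-up standard-normal covariates; the helper names `pearson` and `max_abs_corr` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def max_abs_corr(n, n_covariates=20, seed=0):
    """Largest |correlation| between a coin-flip treatment and any covariate."""
    rng = random.Random(seed)
    treatment = [rng.randint(0, 1) for _ in range(n)]
    worst = 0.0
    for _ in range(n_covariates):
        # Covariates are generated without any reference to the treatment.
        covariate = [rng.gauss(0, 1) for _ in range(n)]
        worst = max(worst, abs(pearson(treatment, covariate)))
    return worst


for n in (50, 2758):
    print(n, round(max_abs_corr(n), 3))
```

With 2758 units the worst of the 20 correlations should be small, on the order of a few times 1/sqrt(2758). Nothing comparable is guaranteed when the "assignment" is a weather pattern rather than a random number generator.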

The instrument variable approach is only convincing if you believe that the instrument is uncorrelated with all the important explanatory variables, and what’s more you have to believe that you *know what they are* so that you can consider whether it’s uncorrelated. In most of these situations, there isn’t enough substantive theory to be able to identify all the important variables, and hence there’s always the concern that there is some important common variable which invalidates the IV.

One advantage of the matching-and-regression approach is that it explicitly considers a set of explanatory variables instead of sweeping them under the rug by saying “rain is as good as random assignment”.

In the presence of a detailed causal model, an IV becomes a lot more convincing. For example if you want to use an asteroid impact on the moon to measure something about the properties of moon rock, you don’t need to actually design and launch your own asteroid because we believe that we know what the important variables for the asteroid are, such as size, mass, velocity, perhaps something about the material the asteroid is made of. So you can use naturally occurring impacts and study the effect of velocity treating it as if it were randomly assigned by nature because the magnitude of the importance of the stuff that’s left out (such as for example the irregular shape of the asteroids) is explicitly thought to be small in the first place.

yes, large weather patterns are why i took back that rain was a good instrument in this case. my questions in the second paragraph stand, though, because andrew made lots of statements about instrumental variable analyses in general as well.

Daniel:

For your last paragraph (which I believe gets at the real issue here) – Susan Haack has this cross-word metaphor – in your asteroid example, the cross-word is judged to be mostly filled in while for the protests example, the cross-word is judged to be hardly filled in. Both can lead to the same posterior probabilities but what we make of these same posterior probabilities should be very different.

Right, in some sense, when we just write down the model in the “hardly filled in” example there is really a lot of uncertainty ABOUT THE MODEL which we are ignoring. And often, even if we could have some kind of causal model of voting behavior or whatnot, we’re missing almost ALL the data we’d need to put into such a model (for example, for each person involved we’d need some sort of detailed family history and genetics). So the hardly filled in case is usually a stand-in for something else that we don’t know what it should be. That’s very different from the asteroid case usually.

Z:

Regarding your first paragraph, there are different ways of looking at this, but the simplest way to put it is that the relative amount of rain at a particular place on a particular day is not a randomly defined treatment. As I wrote in my post above, the weather is spatially correlated, so I think the true “N” is more like N=5 or N=10 than N=2758. Also there are issues with using the absolute amount of rain, because in some places such as New York the probability of rain is approx 1/3, every day of the year; in other cities it almost never rains. For example, it’s very rare for it to rain in Phoenix in April. Yes, the analysis controls for historical rainfall but such controls end up being sensitive to functional forms. If you have a randomized experiment then results will be robust to such things, but you won’t necessarily have that robustness once you move to these highly nonlinear and skewed distributions and where the “treatment” is not assigned at random.

Regarding your second paragraph: I find the concept of instruments to be helpful when the instruments correspond to the sorts of interventions that might be performed to activate the treatment.

Maybe it would be clearer if I give an example of an instrumental variables argument that I did find convincing: Steven Levitt’s study of the effect of #police officers on crime, using mayoral elections as an instrument. I’ve never read the actual paper but my

Here’s the deal. Suppose we use the notation z -> T -> y, so that z is the instrument, T is the treatment, and y is the outcome. IV’s are generally presented like this: you want to estimate the effect of T on y, and then you find an instrument z and go from there. But, as I discussed in my review of the Angrist and Pischke book a few years ago, that’s not how I think about IV’s. The way I think about IV’s is that you start from z and look at the consequences that flow from there.
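In symbols, the "start from z" view amounts to two regressions on the instrument. The coefficient names below (π, γ, β) are notation assumed here for illustration, not taken from the post:

```latex
\begin{aligned}
T &= \pi_0 + \pi_1 z + \nu
  && \text{(effect of the instrument on the treatment)} \\
y &= \gamma_0 + \gamma_1 z + \varepsilon
  && \text{(effect of the instrument on the outcome)} \\
\hat{\beta}_{\mathrm{IV}} &= \frac{\hat{\gamma}_1}{\hat{\pi}_1}
  = \frac{\widehat{\operatorname{Cov}}(z, y)}{\widehat{\operatorname{Cov}}(z, T)}
  && \text{(with a single instrument)}
\end{aligned}
```

The first two lines describe what follows from z and can be read on their own. Only the third line, the ratio, needs the exclusion restriction before it can be interpreted as the effect of T on y.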

So . . . it makes a lot of sense for me to say something like, When elections are coming up, mayors put more cops on the street and crime goes down, and we can form a structural model (the exclusion restriction or maybe something looser) that tells us the increasing cops in this way has that effect on crime. But it doesn’t make a lot of sense to me to say, When it doesn’t rain in a city on a particular day, there will be larger protests and there will be some different outcomes down the road. It seems like too much of a bank shot. Just about any little thing could go wrong and mess this up. Because, again, remember, rainfall is not randomly assigned. There are systematic differences between cities with more and less rain, which returns us to the point that this ultimately is just an observational study.

I had a similar negative feeling about that air-pollution-in-China paper. The China paper was a million times worse because it relied entirely on that horrible wiggly polynomial, but there was still the larger point that the model didn’t make so much sense. Distance north or south of the river will be a good predictor for life expectancy because . . . what? Again, all sorts of systematic differences between those cities still exist. Pulling out one pre-treatment variable and calling it an “instrument” or a “forcing variable” doesn’t solve the fundamental problem.

Thanks for the response. I completely agree with your concerns about rain in this example now. (I had missed the N=10 point when I originally read the post.) And I obviously agree that it’s inappropriate to take any old pre-treatment variable and call it an instrument. And I’m sure this happens in practice quite a bit.

But to clarify, supposing there is a pretreatment variable that is genuinely haphazard (I know, not common) and satisfies the exclusion restriction, are you saying that you only think it’s appropriate to use that variable as an instrument if it’s manipulable? (“I find the concept of instruments to be helpful when the instruments correspond to the sorts of interventions that might be performed to activate the treatment”) Or are you just saying that you find it hard to believe that rain on a given day could have such large downstream consequences on politics so the results are likely due to confounding of the instrument, whereas it’s easy to see how upcoming elections might influence law enforcement?

I actually think that the police example seems likely to seriously violate the exclusion restriction as upcoming elections might influence law enforcement in ways other than # of police (e.g. effort, or discretion when classifying crimes, etc.)

> Still, I don’t know that I’d trust the result of this sort of instrumental analysis as much as I’d trust a more conventional analysis, where cities are matched on a bunch of pre-treatment variables and then regression models are constructed to predict the outcomes of interest.

Is this a general comment on how credible you find identifying assumptions for instruments vs regression/matching, or specific to this setting?

If general, my understanding is that economists hold the opposite view, though these days, they don’t really find instruments to be credible either and now regression discontinuities are all the rage.

a regression discontinuity is a special case of an instrument

The critique seems to identify two distinct issues, one of which is about this analysis, and the second of which is about the technique.

The first comes down to a sample size issue. The observed correlations are, as Andrew said, based on effectively n=5 or 10. If the data set was over multiple classes of protests, in different years, etc. then a hierarchical model should be able to deliver a much better estimate – though given the additional variables and complexity, the sample size might need to be very large. Then the observed “effect of the effect of crowd size” would be a much better proxy for whether large crowds themselves are causally connected to regional protest efficacy. (Is that right?)

If this is correct, though, we still don’t know whether the effect of a protest movement depends on the participants, because there are effects of protests that are not geographically bound, and politicians presumably react based on their knowledge of the conditional turnout (200 people showed up even though it was pouring vs. 500 people showed up on a sunny Sunday.) This critique is more central to the social science question being addressed, and I’m unsure if there can be any type of clever econometric manipulation that could address it.

> And given the spatial correlations of weather patterns, it’s more of an N=5 or N=10 thing than N=2758.

What do you mean? I don’t know how this quantity would be defined precisely, but I’d guess at least one order of magnitude more than that after looking at Figure 3 in https://www.aeaweb.org/conference/2013/retrieve.php?pdfid=227

In figure 3 I see a spatial scale to the residuals that is easily 500 to 1000 miles (ie. correlation between the color of the residual dots extends over those kinds of distances). There are 3.8 million square miles in the US. This suggests something like on the order of 3.8e6/(1000)^2 = 4 “independent” experiments.
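The back-of-envelope count above is easy to reproduce for both ends of the 500 to 1000 mile range. The function name `independent_patches` is made up for this sketch, and tiling the country with square patches is itself a crude assumption.

```python
US_AREA_SQ_MILES = 3.8e6  # area figure quoted in the comment above


def independent_patches(correlation_length_miles, area_sq_miles=US_AREA_SQ_MILES):
    """Rough count of square patches of the given side length that tile the area."""
    return area_sq_miles / correlation_length_miles ** 2


for length in (1000, 500):
    print(length, "miles:", round(independent_patches(length), 1), "patches")
```

A 1000-mile scale gives about 4 patches and a 500-mile scale gives about 15, so the answer moves by a factor of four across the stated range but stays far below 2758.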

As always, Daniel gave a concrete example (which seems to be his trademark)

But the middle ‘concept’ that he’s leaving to you to fill in the blank is

when observations are correlated, the ‘effective sample size’ is NOT the same as the iid sample size (usually smaller). The point is, effective sample size uses ‘iid’ as a reference
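One standard way to put a number on this is the design effect from cluster sampling, 1 + (m - 1)ρ, where m is the cluster size and ρ the within-cluster correlation. The sketch below assumes equal-sized clusters that are independent of one another, which is a simplification, and the inputs (10 weather systems, ρ = 0.5) are hypothetical rather than estimated from the paper.

```python
def effective_sample_size(n, n_clusters, rho):
    """Effective sample size under equal-sized, mutually independent clusters.

    n          -- number of observations
    n_clusters -- number of clusters (e.g. weather systems), an assumed input
    rho        -- within-cluster correlation, an assumed input
    """
    cluster_size = n / n_clusters
    design_effect = 1 + (cluster_size - 1) * rho
    return n / design_effect


# Hypothetical inputs: 2758 locations falling into 10 weather systems.
print(round(effective_sample_size(2758, 10, 0.5), 1))
```

At ρ = 0 the formula returns the full n, and at ρ = 1 it returns the number of clusters. Anything in between depends heavily on the assumed ρ and cluster count.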

I may be lacking imagination, but I don’t think the pattern of rainfall in the US (relative to historical averages) for a given day can be reduced to four numbers.

Let’s divide the US in four quadrants (that’s easy enough for me to imagine). Let’s use a verifiably high-quality pseudo-random binary number generator to assign a more-rain/less-rain value to each of those quadrants. That’s what I understand by N=4. Figure 3 doesn’t look like that.

And even if N was 4, the spatial correlation of rainfall doesn’t mean anything in itself. To have an effect on the “number of effective observations” for the study there has to be a spatial structure in the other covariates and the impact will depend on how they relate.

Imagine that there is no geographic structure (just for the sake of the argument, I find that easy to imagine as well). All the observations would correspond to places that, apart from the weather, would be drawn from the same underlying distribution of places. It wouldn’t matter that you cannot compare Rochester to Buffalo because both were sunny: you can compare them to Orlando.

On the other hand, if you considered that the “number of effective observations” is reduced because you can only properly compare places within a 100-mile radius, for example, that doesn’t make N=4 true. Taking again the four-quadrants example, maybe the observations deep within each region are useless, but close to the edges of those regions you have rain/no-rain pairs which are close enough to make them comparable.

It’s O(4) meaning some small multiple of that number. Imagine you had to write down the rainfall function as a function of x and y. You do a fourier expansion in X and Y with a small multiple of 4 coefficients. Like say 4, 6, 9, 16 something like that. You’d probably do pretty well.

Of course, there’s also the underlying political process itself which could easily have more variation, but as an instrument variable perturbing the underlying process, the weather is only “worth” as much information as is in the entropy in the joint distribution of the fourier coefficients.

You might easily convince me it’d be say 48 coefficients required to do a good job, still a lot less than the 2000+ individual observations. The point holds that the instrument variable doesn’t operate as an uncorrelated point-process, it operates as a big old smooth continuous function of space that has very little “high frequency” information in it.

Also, it doesn’t operate like the “bins” you are thinking of. We can’t go to the edge of some bin and say “these two cities are in different bins so they have un/low correlated rainfalls” Because the rainfall is a continuous function, any two close together cities will have predictably similar rainfalls. Only by getting far apart so that the low frequency fourier components can act do you get different levels of rainfall.

Think of rainfall as an undulating-in-time fourier-surface which is frozen in time on that day as the treatment. It’s the shape of that fourier-surface that is kind-of sort-of a “random” treatment. More than that, you can imagine averaging the rainfall surface over 12 daylight hours or so. This time-averaging will also smooth out the spatial structure since averages in time are related to averages in space by the convection of the weather.

> Because the rainfall is a continuous function, any two close together cities will have predictably similar rainfalls.

You may live in a very boring place and be unfamiliar with “April showers” and other localized weather phenomena.

I found a calculation of spatial correlation of daily rainfall events (i.e. 1 for rain, 0 for no rain) in Alabama, Florida and Georgia. For the month of April (1990-2004), the average correlation is 0.62 for pairs of weather stations closer than 50km, 0.52 when they are 50-150km apart.

I guess you’ll find that this result doesn’t contradict what you’ve said (if 4 is like 48, 100km may be like 500 to 1000 miles), so I’m happy to leave it at that.

http://plaza.ufl.edu/gbaigorr/GB/Documents/Journals/Understanding%20rainfall%20spatial%20variability%20SEUSA.pdf

Core take homes are basically this: In some places there may be localized weather phenomena, in other places there are vast stretches of absolutely no rain at all or an entire eastern seaboard full of rain. Net result is, the number of independent pairs of “experiments” is far smaller than the total number of observations. Quantifying an effective sample size *is* difficult without going ahead and doing the entire project and I agree with you that calling out a number like “4 to 10” shouldn’t be taken as precise.

50% correlations between 0 or 1 variables at 100km is a lot. How far do you have to go to get correlation of say 5%? It might well be the 500 to 1000 miles I guessed.
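For a rough answer to that question, one can fit an exponential decay through the two correlations quoted above. This is only a sketch under strong assumptions: the form ρ(d) = a·exp(-d/L) is assumed, the bins are represented by guessed midpoints of 25 km and 100 km, and two points cannot test whether the form is right. The function name is hypothetical.

```python
import math

KM_PER_MILE = 1.609344


def distance_to_correlation(target, d1_km, rho1, d2_km, rho2):
    """Fit rho(d) = a * exp(-d / L) through two points and solve rho(d) = target."""
    length_scale = (d2_km - d1_km) / math.log(rho1 / rho2)  # L in km
    a = rho1 * math.exp(d1_km / length_scale)               # value at d = 0
    return length_scale * math.log(a / target)


# 0.62 below 50 km and 0.52 at 50-150 km are from the comment above;
# the 25 km and 100 km midpoints are assumptions.
d_km = distance_to_correlation(0.05, 25, 0.62, 100, 0.52)
print(round(d_km), "km, about", round(d_km / KM_PER_MILE), "miles")
```

Under these assumptions the correlation drops to 5% at roughly 1,100 km, a bit under 700 miles, which sits inside the 500 to 1000 mile guess. A different decay form or different midpoints could shift that considerably.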

Correlations between inches of rain might be even bigger or extend farther (because 0.05 inches of rain is a lot more like 0 than it is like 3 inches for example but both 0.05 and 3 inches will code as YES and only 0.0 inches will code as NO).

Regions where “April showers” are frequent will also generally have different behaviors endemic to the population than regions where an April shower is rare. Working with repeated events could be a lot more effective in determining some of these things instead of a *single* observation (that is, repeated spatial measures of ONE event).

Daniel:

Yes. Or, to put it in statistical terms: places which had more rain on that particular day are different in various ways from places which had less rain. Those differences will be correlated with all sorts of things.

My own summary of the conversation (I’m not sure if there is a disagreement):

Andrew : It’s more of an N=5 or N=10

Carlos : I’d guess at least one order of magnitude more

Daniel : It’s of the order of N=4

Daniel : You might easily convince me that N=48

Daniel : Calling out a number like “4 to 10” shouldn’t be taken as precise.

The only thing I’d add, Carlos, is that there are ways of thinking about effective sample size, or more generally information content, by referring to the number of coefficients you would need to specify the functions involved to within the precision that makes a difference, or by considering dimensionless ratios of scales etc.

As a rough guide these are useful techniques. Much more useful than “I was given data on N cities I’ll pretend they are all independent due to random assignment of rain”

Shot-in-the-dark question here, as I know this post is mostly about methodology. But, I’m very curious about a parallel social science question to the one the paper above asks — do phone calls / letters / email to elected representatives matter? A cursory literature review has turned up nothing. Does anyone know of any studies on this topic? Perhaps a bit easier to actually do a field experiment of a size that could affect something…

As a causal factor in political science, rainfall actually goes at least as far back as John D. Barnhart in 1925: https://www.jstor.org/stable/2939131

Much of the commentary on this paper is misguided and seems to be based on guesses as to what the paper does rather than a careful read of the paper. Some of the concerns brought up are genuine, but every single one of them is addressed in the paper. One could argue that there remain issues or that the manner in which these concerns have been addressed is in some way inadequate, but dismissing the paper as something not quite as bad as that atrocious pollution in China paper, but still “hocus pocus” to be dismissed out of hand, is unwarranted.

The paper does actually address the spatial correlation issue. All the main analyses deploy both state fixed effects and covariance matrix estimates which are robust to clustering at the state level, so within-state correlations are not an issue. The authors point out that across-state correlations in residual rainfall may still be a problem and consider the issue in the robustness section by showing their results are qualitatively unchanged if such spatial correlation is modeled explicitly. Also, as detailed below, it’s not spatial correlation in rainfall itself which is the issue, it’s spatial correlation in rainfall residualized with respect to a large number of covariates.

Critically, everything reported is adjusted for a large number of covariates, so the criticism that “places which had more rain on that particular day are different in various ways from places which had less rain, those differences will be correlated with all sorts of things” is incorrect. We’re not just asking whether rainfall occurred in one place or another, we’re asking whether rainfall residualized with respect to (from page 16): “flexible controls for population size (decile dummies) and other demographic controls: log of population density, log of median income, the unemployment rate, the increase in unemployment between 2005-09, the share of white population, the share of black population, the share of Hispanic population (the omitted category consists of other races and ethnicities), and the share of immigrant population (in 2000)… the county vote share for Barack Obama in the 2008 Presidential election and outcomes from the two preceding U.S. House of Representatives elections (the Republican Party vote share, the number of votes for the Republican Party in total or per capita, the number of votes for the Democratic Party in total or per capita, and turnout in total or per capita).” Critical to the IV estimates, all specifications also include controls for the probability of rain, entered non-parametrically to address concerns such as Andrew’s over functional form.

So the model is not identified by whether it rained or not in a particular place, it’s identified by whether or not it was surprising that it rained on a particular day *given all of the above considerations.* That is, it is identified off the assumption that unobserved causes of various political outcomes are uncorrelated with surprise rainfall, in the sense of residuals from the model described above. One might still be concerned that residual rainfall is for whatever reason correlated with residual political outcomes, which would invalidate the IV. This possibility is addressed in Tables 2.a and 2.b, which show that residual rainfall does not predict pre-rally political outcomes.

Notice this is a much weaker assumption than: after controlling for observed characteristics such as those listed above, rally attendance is correlated with political outcomes. Such an analysis, even if done through matching estimates with weak functional form assumptions, suffers from the problem that unobserved causes of rally attendance are probably correlated with unobserved causes of political events (e.g., people in some county are really mad at the incumbent politician, and both attend rallies and vote for the other party in response). So I don’t see why Andrew thinks such matching estimates would be convincing but turning to quasi-experimental evidence is not. Nonetheless, the paper does actually report the sort of analysis Andrew would prefer, in Table A4, and the results are pretty similar to the IV analysis.

Finally, Andrew, what you refer to as “your trick” for thinking about IV is referred to by everyone else as the “indirect least squares” derivation of the estimator. It’s how IV is commonly introduced in undergraduate econometrics classes. The equations relating the treatment and the outcome to the IV are called the “reduced form” of the model, often presented to highlight exactly what you say: how the endogenous outcomes “flow” from the exogenous variation in the instrument. This is all fundamental stuff, very well-known in the econometrics community.

Chris:

These ideas may be well known in the econometrics community at a theoretical level but I do not think they are well understood by the applied economics community, nor do I think they are clearly taught. For more on this point, see section 4 of my review of Angrist and Pischke’s book. The point of what you refer to as “what you refer to as ‘your trick'” is not the mathematical definition which of course is standard; it’s the interpretation. In the example of the rain and protests, what the researchers are doing is looking at various things that follow from rain happening or not happening on a particular day, which I don’t think is going to tell us much about political outcomes.

Andrew, again, the interpretation that you insist on referring to as “your trick” is referred to as the “reduced form” of the model in every econometrics textbook ever written. This interpretation (that the endogenous outcomes “flow from” the exogenous outcomes) is very well-known and very well-understood by economists and is routinely taught in even undergraduate econometrics courses; it is certainly not “yours” and you really ought to stop claiming that this is something you invented. I have personally taught hundreds of students in econometrics courses “your” “trick”, much like pretty much every other person teaching IV. I have no idea why you think this is something not well-understood or well-taught.

The paper in question in this thread, for example, reports and discusses reduced-form estimates in a number of places, noting in passing, “we also present reduced form effects of protest day rainfall for all outcomes, where the exclusion restriction is not a necessary identifying assumption for our interpretations.” In your notation, they report how (y, T) depend on Z and discuss what can be learned from those estimates even without imposing the exclusion restriction, i.e., they discuss at some length “your” “trick,” much like a majority of other papers using IV in the applied economics literature.

Can you cite examples of well-cited papers in applied economics you think demonstrate your case? I don’t see anything in your review of MHE which bolsters your argument beyond what you’ve said here or in previous blog posts. With regard to the paper: you’ve certainly made it clear you don’t believe the results, but your objections are all actually addressed in the paper so it’s not clear why, and the matching exercise you suggest as a more compelling alternative suffers from severe endogeneity issues. One could reasonably argue that no observational data could rise above suggestive evidence in a context like this, but I don’t understand why you think that matching on observables does reveal a causal effect here whereas the IV estimates are “hocus pocus” which “don’t tell us much about political outcomes.” It seems to me that the assumptions required for your preferred estimator to reflect causation are MUCH less plausible than the assumptions imposed in the paper you’re dismissing out of hand.

Chris:

I have not claimed that my trick is an idea unique to me. It’s a trick that I find useful. It’s a pretty simple idea. It just happens to not be the way that instrumental variables regression is generally presented in textbooks and research articles. Again, I refer you to the relevant section in my review of Angrist and Pischke’s book. The point is not the mathematical existence of the reduced form, but rather how the regression results are interpreted. In calling this “my trick,” I’m not trying to claim ownership of the idea. It can be your trick too!

Also, I’m not dismissing that rainfall paper out of hand, nor am I saying that I don’t believe the results. If I’ve “certainly made it clear I don’t believe the results,” this was a failure of communication on my part! Here’s what I wrote: “I’m not saying that I think the analysis and the conclusion in that paper are wrong, necessarily; I just am not so convinced that it’s necessarily correct.” I hope this clarifies things.

It’s not that I have a “preferred estimator” that I think solves the problem. Rather, I think these sorts of causal questions are inherently difficult, and I resist the idea that there is some “estimator” out there that will solve it. I think the whole framing in terms of estimators is a mistake.

Finally, in all sincerity, I appreciate your comments here. It seems clear that you think I’m wrong and maybe even a bit annoying, but you’ve still gone to the trouble to explain your position in clear terms. This sort of discussion is worth having. We’re still coming at this from different perspectives, but at least you and I and whoever reads this deep into the comment thread can see both perspectives, and I think that’s important. So thank you.

Andrew, in context, claiming something is “your trick” implies that it’s something you made up. You might want to consider different phrasing. I don’t agree that this concept is not usually presented in econometrics textbooks; it’s almost universally discussed, and the stuff in your 2009 blog post is actually routine fare in such textbooks (and I do specifically mean the interpretation you prefer, not merely noting the mathematical existence of the reduced form). Again, I don’t see anything in your book review which bolsters your case, and I’d point out again that ironically the very paper under discussion in this thread repeatedly presents reduced-form estimates. How would you suggest the authors should have interpreted their results?

I am glad you are not dismissing the paper, because it certainly seemed that way! You did start by saying the paper is not “necessarily wrong,” which isn’t exactly a glowing endorsement, but you’ve also described it as “hocus pocus” which uses “tricks” to “get something for nothing” and generate results which you “don’t think are going to tell us much about political outcomes.” I find it hard to read that as an opinion that can’t be paraphrased as “you don’t believe the results.”

I would be genuinely interested if you can supply any criticisms that aren’t addressed at length in the paper, as I have a graduate student following a similar strategy in something they’re working on. To be clear, I don’t think noting spatial correlation in rainfall, possible functional form problems, or that cities with more rain aren’t identical to cities with less rain, is noting actual problems, as these are all potential problems which are addressed. One problem which might remain is we must assume that the *only* way rainfall affects political outcomes, conditional on the other stuff in the model, is through its effect on protest size (suppose for example that people are more likely to read about politics on the internet if they stay home from the protest because of rain, and that reading changes their subsequent behavior). But that’s also a problem, as are functional form considerations, in the Levitt paper you do find convincing (and in fact Levitt went back years later and published a follow-up paper trying to address these and other even more serious issues with the police and crime paper, see Levitt 2002). So I am baffled by your opinion that one of these IV papers is compelling and the other, well, not so much.

It’s not the “estimator” that’s intrinsically “solving” the problem. No one thinks IV is “magic,” or at least, I hope they don’t, and obviously the authors of this paper don’t given the lengths they go to to attempt to address potential problems, test the assumptions they use to the extent possible, and provide robustness checks. Of course the question is inherently difficult, that’s why they’re going to all this trouble to try to provide some evidence on it! The scientific question here is whether we can use the random component of rainfall combined with extensive other data to shed light on how protest activity leads to various political outcomes, given the inherent difficulties that question poses. That question plays out as statements about the properties of estimators in this context, but the context matters: no one thinks merely naming the estimator acts as a spell which solves all problems.

With regard to your last paragraph: I have read your blog for years and even teach some of your papers; I do respect your opinion and often find interesting discussion here. What you’re seeing is essentially a selection issue: I am prompted to post comments most often when I think you’ve jumped in to offer criticism of something you haven’t considered carefully enough, such as this paper, and I think you’re particularly prone to offering such poorly-founded criticism when the authors are economists, which does tend to annoy me.

> And given the spatial correlations of weather patterns, it’s more of an N=5 or N=10 thing than N=2758.

The more I think about that statement, the less sense it makes to me. As an example (not trying to be realistic, I’m interested in the principles), let’s consider the occurrence of events to be a mixture of a large-scale and a small-scale process. Let’s say there is a global event with P(rain)=0.5 and at each site we see either (50/50) the global event or a local event with P(rain)=0.5. The output has 0.25 correlation at any scale.

I understand the effective sample size to be an estimate of how many “independent” observations would give the same sampling error for some estimator of interest. If what we want to estimate is the overall probability of rain, it can be argued that in our case Neff~4 regardless of the number of observations. The actual value we’re looking for is 0.5, but depending on whether the global condition is wet or dry we will be sampling from [1,1,1,0] or [1,0,0,0]. As we add observations, the sample mean will converge to either 0.25 or 0.75 and the sample standard deviation will converge to 0.43. We cannot say that the standard deviation of our estimator is 0.43/sqrt(N), because it doesn’t improve with the sample size and it will be 0.43/sqrt(Neff).

Now, I don’t see why that would have any major effect on the suitability of the rain events as instrumental variables. When the global environment is rainy the probability of rain at each site is 0.75; when the global environment is dry the probability of rain at each site is 0.25. Conditional on the global environment all the events are independent. Maybe the imbalance decreases the efficiency of the “intervention” a bit, but that’s all.
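
The toy process above is easy to simulate. The following is a rough sketch, not a model of real weather: the number of sites, the number of simulated days, and the seed are arbitrary assumptions. It checks the pairwise correlation, an effective sample size for estimating the overall rain probability, and the claim that sites are uncorrelated once you condition on the global state.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days, n_sites = 4000, 500   # arbitrary sizes chosen for the sketch

# One global rain event per day, P(rain) = 0.5.
global_rain = rng.random(n_days) < 0.5
# Each site shows the global event with probability 0.5, otherwise its own
# independent local event with P(rain) = 0.5.
use_global = rng.random((n_days, n_sites)) < 0.5
local_rain = rng.random((n_days, n_sites)) < 0.5
rain = np.where(use_global, global_rain[:, None], local_rain).astype(float)

# Correlation between two sites across days.
pair_corr = np.corrcoef(rain[:, 0], rain[:, 1])[0, 1]

# Effective sample size for the overall rain probability: variance of a single
# observation divided by the variance of the all-sites daily mean.
n_eff = rain[:, 0].var() / rain.mean(axis=1).var()

# Correlation between the same two sites on globally wet days only.
wet = global_rain
cond_corr = np.corrcoef(rain[wet, 0], rain[wet, 1])[0, 1]

print("pairwise correlation:", pair_corr)
print("effective sample size:", n_eff)
print("correlation given wet global state:", cond_corr)
```

Under these assumptions the pairwise correlation should sit near 0.25, the effective sample size near 4 even with 500 sites, and the conditional correlation near zero, which is the distinction the argument turns on.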

Carlos:

It makes a lot of sense to me that, when comparing two similar cities with varying levels of rain, outdoor protests would attract more people in the city with less rain. The thing that concerns me is that cities with and without rain can differ in various ways. As I wrote in my post above, I’m not saying that I think the analysis and the conclusion in that paper are wrong, necessarily; I just am not so convinced that it’s necessarily correct. I’m pushing against the idea that some people seem to have, that instrumental variables analysis is a sort of magic. And I have the same concerns with traditional regression analysis (see here for a particularly ridiculous recent example).

Can this correlation have some impact? Maybe. But I don’t think it can have a “N=5 or N=10 rather than N=2758” kind of impact, whatever that means.

Suppose there are 2000 towns. Now you’re told the rain amount and the attendance at the rally at the first town, along with its demographic data. You write down a Fourier expansion for the rain function, and you can look at the rain function and try to predict the attendance from the rain and the demographics. Next you’re given another town at random… you learn a bit more about the attendance function. You get another town… you learn a bit more about the attendance function…

After several randomly selected towns, Neff, you can accurately predict both the rain, and the attendance at all the other towns. How many randomly selected towns do you need before you’ve converged to within epsilon? How many if you already know the rain function?

I’d argue you’re correct that it’s more than 4, but I’d be pretty surprised if you were still making any progress at all in learning when you got to 100 randomly selected towns, and some small multiple of 5 or 10 seems likely to me, and a small multiple is like 3,4,5,6 etc so somewhere between say 5*3 and 10*6 randomly selected towns would get you there (which is why I said you could probably convince me about the 48 number).

Suppose you were first just given the coefficients of the Fourier expansion, then you’d learn the attendance function even faster. If you’re talking about how much information is in a randomly selected town *for the attendance function* given the rain function already, it’s not a lot more information in the 2000 total towns than is in the first small multiple of 5 or 10 randomly selected towns.

To me, we can argue over the numbers but I think this *way of thinking* is what’s important, not the numbers. And if we carried out this analysis using Stan, and you found that after 100 randomly selected towns that you were still gaining meaningful information, then I’d agree with you that Neff > 100. I’m just making an educated guess that it seems unlikely, and I think Andrew is too. But that educated guess is based on certain techniques that I’ve outlined here and what I’m more committed to is the usefulness of the techniques than precise values of the numbers.

So, now I’m interested, what do you think about this way of looking at it, and does it make sense to you?

No, it doesn’t make much sense. You cannot accurately predict the rain from a small subset. Unless your concept of accuracy is also extremely flexible. You said above that 0.50 correlation at 100km was a lot. What kind of accuracy do you think you get in that case trying to predict if there was rain at some place from the observed rain event 100km away?

Instead of “there was rain at some place” let’s work with inches of rain to the nearest 0.05 inch. Now we have a defined accuracy. Furthermore, since we’re really interested in learning the effect of *attendance* on *politics*, let’s assume for the sake of argument that I give you the Fourier coefficients for the rainfall to the nearest 0.05 inch in the whole country.

How much do you learn *about attendance* from random samples of cities, given their demographics and location and the coefficients of the rainfall function.

How much more information would you learn about the effect of attendance *on politics* if at each city the attendance was assigned *by a cryptographic pseudo-random number generator*?

How much *like a cryptographic pseudo-random number generator* is the rainfall function?

Put another way, if the rain were randomly assigned from some RNG, you’d need 2000 numbers each of which had several bits of resolution to specify the full rainfall vector.

Now, let’s be very precise about what we mean by the rainfall function. The rainfall function will be the local gaussian weighted average of the rainfall at all points around a point of interest using a say 1km gaussian weighting kernel. This is an infinitely smooth function of space and has a “resolving power” of around 1km, and we’ll use the time-averaged rainfall over 6 am to 4 pm on the day of interest.

To specify that function for the whole US takes how many Fourier coefficients? The US is around 3000 miles across, that’s around 5000 km. The function by definition has no appreciable frequency content above 1km wavelength, so worst case is about 10,000 coefficients, 5000 for x and 5000 for y.

But in actual fact, you’re going to need WAY fewer than 10,000 coefficients, because weather systems have sizes of hundreds of miles. So let’s just use your 100km cutoff to get an estimate. In units of 100km we have 5000/100 coefficients in each dimension, that’s 50 in each dimension, or 100 coefficients.

And if we assume that what matters is mainly just the difference between what happened on that one day and what happens on average in that region… even fewer pieces of information need to be specified.

So, it’s reasonable to believe that you’d do a good job predicting what matters for this analysis using just a relatively small number of coefficients, 5, 10, 20, even 48 as I said, but not 100 or 500 or 2000.

It gets even less interesting when you realize that small differences in rainfall amounts probably modulate the function of interest (attendance) fairly little. It’s a big deal if you expected 0 inches and got 3, but it’s probably not a big deal if you expected 0.15 and got 0.22… And so the actual thing of interest… the effect of attendance on political outcomes…. is much less sensitive to knowing the precise rainfall exactly.

Whereas, if you just used a PRNG you’d have a lot more variation and especially a lot more variation between places that are close together and so have similar demographics, local politics, etc. You might still argue that even with a PRNG, since it probably isn’t a uniform PRNG, you’d need less than say 2000×4 bits for the whole thing, but it WOULD scale like O(N), whereas the rainfall function is basically fixed dimensional and requires far fewer than 2000 dimensions.

Daniel, look at table A7 in the paper, which reports a bunch of estimates to gauge how robust the standard errors are to spatial correlation. Note they don’t change much depending on how spatial correlation is modeled, and allowing across-state correlation gives much the same answer as doing what the paper does in all the major specifications: cluster at the state level. Earlier you wondered if the analysis is specified in such a way as to be robust to large weather systems: Yes, it is.

Again, it is critical that the paper controls for the probability of rain in a given place, which is itself spatially correlated. Residual rainfall is less spatially correlated than actual rainfall: look at figure 3, showing the geographic pattern of residual rainfall. Note that, while large weather systems are apparent even in the residuals, there is still lots of variation, and lots and lots of places where one county received more rain than expected while nearby counties received less rain than expected. Even a casual glance at that figure overwhelmingly suggests that we are not looking at an effective sample size of 5 to 10.

How much is attendance like a random number generator? That’s the question, sort of: we don’t need those numbers to actually be random; rather, we merely require that they be conditionally uncorrelated with unobserved causes of political outcomes, a much weaker requirement. Much of the paper concerns presenting evidence on precisely that question.

By the way, “The instrument variable approach is only convincing if you believe that the instrument is uncorrelated with all the important explanatory variables,” is not at all correct: there is no such assumption required, and in fact we usually need to adjust for explanatory variables precisely because they are generally correlated with the instrument.

“The instrument variable approach is only convincing if you believe that the instrument is uncorrelated with all the important explanatory variables”

That should probably say “unknown explanatory variables”, i.e., if there are a bunch of things you don’t even know about, and therefore can’t “adjust for”, it had better be uncorrelated with them.

Also, whatever the situation with the weather, it is basically a single vector of some fixed number of coefficients. It’s a point in a Hilbert space of functions.

If you collect more and more locations you don’t get more and more information that scales with N the number of cities, instead you get closer and closer to the correct K dimensional Hilbert space vector for some fixed relatively small K (small relative to 2000 cities, K/2000 << 1).

Note, I’m not arguing that this analysis is wrong, or impossible or anything like that, I’m arguing that there are ways to think about modeling these things that make it clear how much information you can expect to extract. It’s good to do these things ahead of time to get a feeling for what your study might tell you.

Also, if you look back above I *did* look at figure 3 and that’s where I got my estimate of 4 to 16 effective independent samples by saying that the spatial structure of the residuals had 500 to 1000 mile features, and there are 4 million square miles in the US 4 Million/(1000)^2 = 4 and 4 Million/(500)^2 = 16.
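
The back-of-envelope arithmetic above, written out as a tiny sketch. The function name is made up for illustration, and the 4 million square mile area and the 500 to 1000 mile feature sizes are just the round figures quoted in the comment, not measured values.

```python
def independent_blocks(area_sq_miles, feature_size_miles):
    """Rough count of independent patches: total area / (feature size squared)."""
    return area_sq_miles / feature_size_miles ** 2


US_AREA_SQ_MILES = 4_000_000  # round figure used in the comment above

low = independent_blocks(US_AREA_SQ_MILES, 1000)   # 1000-mile features
high = independent_blocks(US_AREA_SQ_MILES, 500)   # 500-mile features

print(low, high)  # 4.0 16.0
```

Halving the assumed feature size quadruples the count, so the estimate is sensitive mostly to how large you judge the features in the residual map to be.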

People not familiar with this kind of reasoning may find that highly questionable, but I freely admit to perhaps needing a factor of 2*pi in that estimate or something, and so if you told me it was 6 times larger than that I wouldn’t blink an eye. But if you tell me it’s 100 times larger than that we’re going to have difficulty reconciling things.

I’ve now given several examples of why I think the information content in the rainfall residuals has a small number of dimensions: that the rainfall should be averaged over a 1km area to be meaningful for this problem, so you can specify the whole rainfall at every point in the country with on the order of 10k coefficients; that using a “typical” distance of 50 to 100 km between counties, you could grid the whole country at that density with just on the order of 50 numbers (Fourier coefficients); and that looking at the residual plot you mentioned, you can extract feature sizes.

You can’t get something for nothing, and rainfall is *worth a lot less information* than a vector of 2000 numbers out of a PRNG.

Also note that the approach which I think is used here of using dummy variables is TERRIBLE at representing smooth functions, and the result is that there is a tremendous amount of noise in any estimate, and so your residuals are full of noise and hence you’re learning the effect of noise on attendance, not the effect of actual signal.

Here is an example of how bad that is from my blog: http://models.street-artists.org/2016/05/02/philly-vs-bangkok-thinking-about-seasonality-in-terms-of-fourier-series/

With 7 fourier coefficients the model far exceeds the accuracy of the weekly dummy variable approach (52 dummies).

The residuals from 7 fourier coefficients are on the order of .005 hours out of the day or a fraction .005/24 or 2 parts in 10 thousand or 18 seconds out of the day. As expected with these things, the magnitude of the coefficients is decreasing EXPONENTIALLY with frequency. You do just fine really with like 3 fourier coefficients if you are willing to have an accuracy of like 2 or 3 minutes.
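
To illustrate the general point without the data from that post, here is a hedged sketch on a made-up smooth seasonal curve. The curve `12 + 2*exp(sin(theta))` is purely an assumption chosen because it is smooth and periodic; it is not the day-length series from the linked post. The sketch fits the curve by least squares twice, once with 7 Fourier coefficients (an intercept plus 3 harmonics) and once with 52 weekly dummies, and compares the worst-case residual.

```python
import numpy as np

days = np.arange(365)
theta = 2 * np.pi * days / 365

# Hypothetical smooth seasonal signal (an assumption, not real data).
signal = 12 + 2 * np.exp(np.sin(theta))

# Design matrix with 7 Fourier coefficients: intercept + sin/cos of 3 harmonics.
fourier_X = np.column_stack(
    [np.ones_like(theta)]
    + [f(k * theta) for k in (1, 2, 3) for f in (np.sin, np.cos)]
)

# Design matrix with 52 weekly dummies (the last "week" absorbs day 365).
week = np.minimum(days // 7, 51)
dummy_X = (week[:, None] == np.arange(52)).astype(float)


def max_abs_residual(X, y):
    """Least-squares fit of y on X; return the largest absolute residual."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.max(np.abs(y - X @ coef))


fourier_err = max_abs_residual(fourier_X, signal)
dummy_err = max_abs_residual(dummy_X, signal)

print("7 Fourier coefficients, max residual:", fourier_err)
print("52 weekly dummies, max residual:     ", dummy_err)
```

On a curve like this the 7-coefficient fit should beat the 52-dummy fit by a wide margin, because a piecewise-constant function cannot follow a slope within each week. How large the gap is for real rainfall or day-length data depends on how smooth the underlying function actually is.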

This might explain a lot actually, if the residuals are based on a dummy variable approach. Contrary to popular thought that is not some kind of nonparametric functional-form-independent method, it is in fact an assumption that the rainfall in the country is well represented by a piecewise constant discontinuous function of space which happens to have boundaries that totally coincide with the historical decision making process for the layout of the 50 US states and their counties….

So you’d expect that there would be a lot of residual error, that the error would look a bit like random noise, and that it would look like a pretty high dimensional object with an effective sample size much larger than you’d predict from a much more appropriate Fourier or Chebyshev polynomial basis.