Are you ready for some smashmouth FOOTBALL?

Kickoff

This story started for me three years ago with a pre-election article by Tyler Cowen and Kevin Grier entitled, “Will Ohio State’s football team decide who wins the White House?” Cowen and Grier wrote:

Economists Andrew Healy, Neil Malhotra, and Cecilia Mo . . . examined whether the outcomes of college football games on the eve of elections for presidents, senators, and governors affected the choices voters made. They found that a win by the local team, in the week before an election, raises the vote going to the incumbent by around 1.5 percentage points. When it comes to the 20 highest attendance teams—big athletic programs like the University of Michigan, Oklahoma, and Southern Cal—a victory on the eve of an election pushes the vote for the incumbent up by 3 percentage points.

Hey, that’s a big deal (and here’s the research paper with the evidence). As Cowen and Grier put it:

That’s a lot of votes, certainly more than the margin of victory in a tight race.

Upon careful examination, though, I concluded:

There are multiple games in multiple weeks in several states, each of which, according to the analysis, operates on the county level and would have at most a 0.2% effect in any state. So there’s no reason to believe that any single game would have a big effect, and any effects there are would be averaged over many games.

So I wasn’t so disturbed about the legitimacy of the democratic process. That said, it still seemed a little bit bothersome that football games were affecting election outcomes at all.
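
To make the dilution argument concrete, here is a minimal sketch of the arithmetic. The 1.5- and 3-point county-level effects are the figures quoted above; the share of a state's vote cast in the affected counties is a made-up number for illustration, not something taken from the paper.

```python
def statewide_shift(county_effect_pts, county_share_of_state_vote):
    """Shift in the statewide incumbent vote share, in percentage points,
    when an effect of `county_effect_pts` applies only in counties that
    cast `county_share_of_state_vote` of the state's votes."""
    return county_effect_pts * county_share_of_state_vote


# Hypothetical: the team's home counties cast 10% of the state's votes.
hypothetical_share = 0.10

for effect in (1.5, 3.0):  # county-level effects quoted in the article
    shift = statewide_shift(effect, hypothetical_share)
    print(f"county effect {effect} pts -> statewide shift {shift:.2f} pts")
```

Under these assumed shares, even the larger quoted effect moves the statewide total by only a fraction of a point, and that is before averaging over wins and losses across several teams and weeks.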

Hard tackle

The next chapter in the story came a couple days ago, when Anthony Fowler pointed me to a paper he wrote with B. Pablo Montagnes, arguing that this college football effect was nothing but a meaningless data pattern, the sort of thing that might end up getting published in a rag like Psychological Science:

We reassess the evidence and conclude that there is likely no such effect, despite the fact that Healy et al. followed the best practices in social science and used a credible research design. Multiple independent sources of evidence suggest that the original finding was spurious—reflecting bad luck for researchers rather than a shortcoming of American voters.

We might worry that this surprising result is a false positive, arising from some combination of multiple testing (within and across research teams), specification searching, and bad luck.

I discussed this all yesterday in the sister blog, concluding that voters aren’t as “irrational and emotional” as is sometimes claimed.

Short passes

Neil Malhotra, one of the authors of the original paper in question, pointed to two replies that he and his colleagues wrote (here and here).

And Anthony Fowler pointed to two responses (here and here) that he and his colleague wrote to the above-mentioned replies.

And in an email to Malhotra, I wrote: My quick thought is that the framework of “is it or is it not a false positive” is not so helpful, and I prefer thinking of all these effects as existing, but with lots of variation, so that one has to be careful about drawing general conclusions from a particular study.

Three yards and a cloud of dust

Where do the two parties stand now?

Malhotra writes:

I think the findings of the Fowler/Montagnes paper are incorrectly interpreted by the authors. Instead of a framing of “the original effect was a false positive,” I think a more accurate/appropriate framing is:

1. Some independent, separate tests conducted by Fowler/Montagnes are not consistent with the original study. Therefore, in a meta-analytic sense, the overall literature produces mixed results. Fowler/Montagnes’ results in no way mean that the original study was “wrong.” However, we argue that these independent tests are not appropriate to test the hypothesis that mood affects voting. For example, it’s not surprising to us that NFL football outcomes do not influence elections since NFL teams are located in large metropolises where there are many competing sources of entertainment. On the other hand, the fate of the Oklahoma Sooners in Norman, OK, is the main event in the town. Further, single, early regular season games in the NFL are less important than later, regular season games in NCAA football. So the dosage of good/bad mood is much lower in the NFL study. Now readers may agree/disagree with me about the validity of the NFL test. But the important thing to realize is that the NFL study is a different, separate test. It doesn’t tell us anything about whether the original study is a “false positive” or is incorrect.

2. Some tests on sub-samples of the original dataset show that there is heterogeneity in the effect. For example, Fowler/Montagnes show that the effect does not seem to be there when voters appear to have more information (e.g., in open-seat races, and when there is partisan competition). This is very theoretically interesting (and cool) heterogeneity, and it definitely changes our interpretation of the original findings. However, these are not “replications” of the original result, and do not speak to whether the original results are false positives.

In sum, I am very open to criticisms of my research. I think this new paper definitely changes my opinion of the scope of the original findings. However, I do not think it is accurate to say that the original findings are incorrect or that the findings were obtained by “bad luck.” There are some auxiliary tests conducted by Fowler/Montagnes that either support our results (e.g., that geographically proximate locations outside the home county also respond to the team’s wins/losses) or don’t make much sense (e.g., Texas is a minor college football team?), but we will let interested readers weigh the evidence.

And here’s Fowler:

Of course, we wouldn’t conclude that the effect of college football games on elections is exactly zero, but our independent tests suggest that most likely the effect is substantively very small and Healy et al.’s original results were significant overestimates.

If their purported effects are genuine, we would expect them to vary in particular ways, but none of these independent tests are consistent with the notion that football games and subsequent mood influence elections. So by examining treatment effect heterogeneity in theoretically motivated ways, we reassess the credibility of the original result.

Handoff

We’re getting closer. I’ll invoke the Edlin effect and say that I think the originally published estimates are indeed probably too high (and, as noted near the beginning of this post, even the effects as reported would have a much much more minor effect on elections than you might think based on a naive interpretation of the numerical estimates of direct causal effects). Based on my own statistical tastes, I’d prefer not to “test the hypothesis that mood affects voting” but instead to think about variation, and I like the part of Malhotra’s note that discusses this.
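
One general reason to expect published estimates to run high is selection on statistical significance: when the true effect is small relative to the noise, the estimates that clear the significance threshold are necessarily the large ones. Here is a rough simulation of that mechanism. The true effect and standard error are invented for illustration and are not taken from Healy et al. or from Fowler and Montagnes.

```python
import random


def exaggeration_ratio(true_effect, std_error, n_sims=100_000,
                       z_crit=1.96, seed=0):
    """Average magnitude of the estimates that reach significance,
    divided by the true effect. Estimates are drawn from a normal
    distribution centered on the true effect."""
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(abs(estimate))
    mean_significant = sum(significant) / len(significant)
    return mean_significant / true_effect


# Hypothetical numbers: a small true effect measured noisily...
print(exaggeration_ratio(true_effect=0.5, std_error=1.0))
# ...versus a large true effect measured with the same noise.
print(exaggeration_ratio(true_effect=5.0, std_error=1.0))
```

In the noisy case every significant estimate has to exceed 1.96 in magnitude, so it overstates a true effect of 0.5 by a wide margin; in the well-powered case the ratio stays close to 1.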

When it comes to the effects of mood on voting, I think that the point of studying things like football games is not that their political effects are large (that is, let’s ignore that original Slate article) but rather that sporting events have a big random component, so these games can be treated as a sort of natural experiment. As Fowler, Malhotra, and their colleagues all recognize, such analyses can be challenging, not so much because of the traditional “identification” problem familiar to statisticians, econometricians, and political scientists (although, yes, one does have to worry about such concerns), but rather because of the garden of forking paths, all the possible ways that one could chop up these data and put them back together again.
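
A crude way to see why the forking paths matter: suppose there is no effect at all, and the data could reasonably be sliced in several ways. The sketch below treats each slice as an independent test, which is a simplification (real analysis choices are correlated, and the problem arises even when only one analysis is actually run, if that choice depended on the data). The number of paths is an arbitrary assumption.

```python
import random


def any_significant_rate(n_paths, n_sims=20_000, z_crit=1.96, seed=0):
    """Share of simulated null studies in which at least one of
    `n_paths` independent z-statistics exceeds the threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        if any(abs(rng.gauss(0.0, 1.0)) > z_crit for _ in range(n_paths)):
            hits += 1
    return hits / n_sims


for n_paths in (1, 5, 10):  # hypothetical numbers of available analyses
    simulated = any_significant_rate(n_paths)
    analytic = 1 - 0.95 ** n_paths
    print(f"{n_paths:2d} paths: simulated {simulated:.3f}, analytic {analytic:.3f}")
```

With a single pre-specified analysis the false positive rate stays near the nominal 5%; with ten available paths, something "significant" turns up in roughly four out of ten pure-noise studies under these assumptions.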

Two-minute warning

There is no two-minute warning here. There is no game that is about to end. Research on these topics will continue. I agree with both Malhotra and Fowler that the way to think about these problems is by wrestling with the details. And I do think these discussions are good, even if they can take on a slightly adversarial flavor. I’d like the mood-and-politics literature to stay grounded, to not end up like the Psychological-Science-style evolutionary psychology literature, where effects are all huge and where each new paper reports a new interaction. Now that we know that a subfield can be spun out of nothing, we should be careful to put research findings into context—something that both Malhotra and Fowler are trying to do, each in their own way.

To put it another way, they’re disagreeing about effect sizes and strength of evidence, but they both accept the larger principle that research studies don’t stand alone, and Healy et al. are open and accepting of criticism. Which one would think would be a given in science, but it’s not, so let’s appreciate how this is going.

And to get back to football for a moment: As a political scientist this is all important to me because, to the extent that things like sporting events sway people’s votes, that’s a problem—but, to the extent that such irrationalities are magnified by statistical studies and hyped by the news media, that’s a problem too, in that this can be used to disparage the democratic process.

Overtime

Fowler saw the above and added:

Some of Neil’s comments reflect a misunderstanding of our paper. For example, we did not show that “the effect does not seem to be there when voters appear to have more information” and we never wrote anything along those lines. In the test he’s alluding to (Table 1, “By incumbent running”), we find that the purported effect of football games on incumbent party support is no greater when the incumbent actually runs for reelection. One of our predictions is that if football games meaningfully influence incumbent party support by affecting voter mood, we should expect a bigger effect when the incumbent is actually running. However, the interactive point estimate is actually negative and statistically insignificant, meaning that we don’t find variation in the direction one would expect if the effect is genuine, and if anything, the variation goes in the wrong direction.

Neil’s comments also sound to us like ex-post rationalization. He argues that we shouldn’t expect NFL games to influence local mood in the same way that college football games do, but the local television ratings suggest the opposite. In another interview (here), Neil justified the notion that football games influence mood by citing Card and Dahl who find that football games influence domestic violence. But Card and Dahl analyze NFL games, and they provide arguments as to why the NFL provides the best opportunity to estimate the effects of changes in local mood. In our view, Healy et al. contradict themselves by rationalizing the null effect in the NFL while at the same time citing an NFL study as evidence that football games influence mood.

27 thoughts on “Are you ready for some smashmouth FOOTBALL?”

  1. I’m curious why the authors didn’t bother to look at voter turnout here. There are a few things that come to mind (and I’ll preface by noting that I haven’t read all the rebuttals).

    Let’s assume the effect is real (and ignore the size). The original article only looks at vote share. This seems to imply the idea that voters are changing their votes based on the outcome. But it doesn’t attempt to look at voter turnout. If for whatever reason incumbent supporters are more closely tied with their college teams, then I could imagine a scenario where rather than changing votes to the incumbent by existing voters, it may just be that more voters are willing to show up to the polls. Based on my experiences living in Ann Arbor for 5 years, there are almost certainly negative effects on the ability of the locals/students/etc. to get up off the couch after a loss. The entire mood of the town changes. Very few people are out and about in town after the loss, but with a win, there are a huge number of people on the streets. I could see this carrying over during the week, which could result in this effect if these people were disproportionately incumbent supporters.

    Maybe the incumbent support among these people is not unlikely: the ones that are depressed and still going out to vote may be those more interested in getting rid of the incumbent, while those stuck on the couch are less involved in the first place, which I would assume leans them toward the incumbent since the name is known. But those effects are outside my area and I’m making a LOT of conjecture here but all seems relevant to understanding the place where the effect takes place, which seems key. One result would say that mood affects the candidate choice. The other implies that reducing barriers to get to the poll would result in no more effect.

    It’s also surprising to me that economists would not use the deviation from the polls, rather than just raw vote percentage as a robustness check. I can envision a scenario where local football teams are better because the local political/economic structure has been better (or perceived to be better). This structure or environment likely has some (perhaps very small) effect in the recruiting process. In that case, the causal effect is actually reversed, and the team is marginally better because the incumbent actually is a (perceived) marginally better policymaker. So, rather than mood affecting the vote, the incumbent may actually have had a positive effect on the football environment, whether it’s a public school or not. But if the authors looked at the deviation from expectations for the voting outcome (not just the game), they might be able to get at this effect a bit more.

    Further, just controlling for economic/demographic characteristics of the areas wouldn’t be sufficient here from the standpoint of the causal effect. There’s pretty clear endogeneity between the policymaker and these characteristics and the relatively simplistic DiD framework doesn’t seem sufficient.

    Again, plenty of conjecture by me here, but these seem like pretty easy checks to deal with. I do think it’s a neat idea, but I’m also biased toward the sports angle.

    • Hi BMMillsy:

      I encourage you to get into the weeds, but most of the issues you raise have been addressed:

      1. We examined turnout effects and found none on average.

      2. We use numerous approaches to causal inference to get at the causal effect. We do not just control for economic/demographic areas. We also don’t just use a difference-in-difference strategy. We also: (1) condition on win probabilities from betting spreads (like Card and Dahl) to approximate the natural experiment; (2) conduct placebo tests to see if future games affect past vote share; and (3) include fixed effects to examine deviations from normal vote share/team strength.

      • Thanks for responding!

        Interesting and helpful to know about turnout effects (if I missed these as part of the paper, I apologize).

        Will have to think more on the strategies I’m thinking about. On the run right now.

      • On 2. (2), I do really like the placebo test, which I thought was an extension of the DiD framework.

        I guess I’m thinking along the lines of a propensity score with the LHS the probability of winning a game generally (from the data, rather than conditioning on expectations) as a first stage matching procedure being modeled with information on the political/economic climate on the RHS. From here, maybe a second stage where vote proportion (or perhaps deviation from expectations about the election’s vote proportion) is the LHS variable. This could also be conditioned on the deviation from expectations about the game to get the heterogeneous effect of surprise/euphoria of the upset that you have (I think). That sort of setup might address any existing football quality – political climate quality endogeneity. Then again it might not, and the endogeneity may not exist anyway.

        Maybe this is what you did in your comment response 2. (1). I’ll have to go back and read more carefully.

        Thanks again for clarifying. Like I said initially, I find this very interesting as someone particularly interested in the economics of sport (and using sport to inform economics).

  2. Andrew: “this can be used to disparage the democratic process”

    This is a very valid concern. I find a similar problem in many studies of “voter information” where portraying voters as politically ignorant as possible constitutes an “interesting finding” (translation: a highly biased estimate that will be echoed by media).

    In my view the metric used is often totally inappropriate.

    For example, probably the most important thing a pistachio farmer in California needs to know about politics is where political candidates stand on the pricing of water. What is ISIS, where is Libya, or the ability to name three U.S. presidents is, in my view, mostly irrelevant and somewhat paternalistic.

    • I really disagree strongly.

      What the President of the United States thinks about water pricing is completely irrelevant: the office has little or no influence over water pricing. Moreover, this water pricing issue affects relatively few people, and at least in current circumstances, nobody lives or dies by it.

      By contrast, the President will get to decide more or less on his/her own whether we go to war with ISIS, or whether to intervene in the Libyan chaos. And for those decisions, thousands, millions of lives may hang in the balance.

      I will, however, concede that the ability to name three (or any) US presidents is irrelevant.

      • Clyde:

        Presidential elections are not the only elections. I would have thought farmers interested in water regulation would be more involved in local / state / and congressional politics.

  3. I guess I am the only one here who finds all this sports related stuff totally boring? I can’t bring myself to read anything on football and statistics, even though there’s probably something interesting statistically there.

        • So, by that logic, do you think studying *anything* at all is irrelevant to the real world?

          In my opinion, it’s a question of degree. I get the feeling that, just like publishing a counter-intuitive sensational finding has become a path to quick glory, so also we are seeing the glorification of the trivial.

          Nothing wrong in studying sports but there’s just too much scientific coverage these days of stuff like hot-hands etc. What used to be fun and games & sports-journalist or amateur / dilettante statistician territory has gradually turned into respectable serious research programs.

        • I guess I’m lost on why bother trivializing research about sport just because it apparently doesn’t interest you. We’re talking about a possibly multi trillion dollar industry worldwide that accounts for a good chunk of leisure time for individuals and a topic that has a rich cultural history across nations and regions.

          Something of this size, with its ingrained cultural status, that possibly impacts how people feel/function and ultimately election outcomes isn’t particularly trivial, nor is it just “studying anything.”

          While the intricacies of some of the game play and strategic outcomes, such as the hot hand literature, are a good bit overplayed, the fact that it also tends to get many people involved in the statistical process that may otherwise not care also seems rather non-trivial to me.

          As Malhotra notes below, the study is about mood, using a rather salient mood driver (football) to frame the study. The salience of sport makes it rather useful in the context of social phenomena. Sure, it can also go wrong or be trivial. Sure much of the bad stuff gets over-covered in the media. But that’s not much different than other areas.

          You’re not interested in sports. Fine. It’s something a large population can relate to.

  4. It’s possible that some football games can influence some elections but without the relationship being consistent enough over time to be measured. Mass moods are complicated and contingent, dependent upon a host of other factors competing for public attention at the same time. So it could be that, say, the 2004 Ohio St. season had some effect on the 2004 presidential election in Ohio but not the 2008 election, due to whimsical factors.

    Social scientists are looking for relationships as consistent as in the natural sciences, but a lot of success in the world of human affairs comes to those who happen to ride a fad that isn’t as replicable as the laws of physics.

  5. What’s the deal with junior people going after the work of senior people (Lacour/Green scandal being the most prominent)? For those w/o tenure, it seems like a good way to make powerful enemies.

    • Risk:

      Broockman was senior to Lacour. Yes, Broockman was junior to Green but he was doing Green a favor, saving him whatever huge amount of time he would’ve invested in the project had he not been informed of the fraud. If I were Green, I’d be extremely angry at cheater Lacour and very grateful to Broockman for bailing him (Green) out.

      In the above example, Fowler may be junior to Malhotra but they are in different fields. Economist Malhotra can’t do much to political scientist Fowler. Also Malhotra is a serious scientist and there’s no reason he’d want to retaliate against a serious critic.

    • This is an unfortunate view in academia, that young scientists should keep their mouths shut. Honestly, as a younger academic, if discussing papers and being critical of them is going to cost me tenure, it’s not a place I want to stay anyway.

      And I’m fine with being wrong. Without that engagement, though, it’s harder to learn directly from those more senior academics.

      I think Dr. Malhotra’s response to my thoughts above makes clear that the original authors are engaged in thoughtful and interesting conversations regarding the research. There’s nothing I appreciate more in a rather competitive academic climate where insecurity reigns supreme.

        • I’m in a small (niche) area where I don’t see much of this sort of subversion taking place, which I am grateful for.

          I do see it elsewhere with friends and family in other fields and the damage it does to the actual science is unfortunate. Often, the entire reason for going to conferences seems to be to ensure others think their own work is no good, irrespective of the quality. It’s particularly bad at large, impersonal conferences from the stories I hear.

  6. In response to the “Overtime” above:

    1. Our reference to the Card/Dahl analysis was methodological, not substantive. We cited the paper to show that the analysis with the betting spreads was the most rigorous analysis across both papers (because it models the data as a quasi-experiment), but this analysis was not considered in the F/M critique. On the substance, our argument is that NFL games provide a lower dosage of a mood effect than college football games, due to entertainment competition in the local areas. Readers may disagree with that. The key issue is not whether the NFL replication is a good/bad study. Rather, it is an unambiguously *different* study, and does not speak to whether the original result was “wrong” or a “false positive.” The NFL study tests the hypothesis of the original paper and comes to a different conclusion. That is an important addition to the literature for sure, but it is a separate addition.

    2. Many of the analyses in the F/M critique are conducted on or compare subsets of the original data (e.g., the incumbent analysis referenced above, the models with county-year fixed effects). These analyses show that the effects may be heterogeneous, but again, these are not replications that can establish false positives, since they are conducted on different samples. Again, these analyses are relevant and theoretically interesting.

    3. As a general point, I think the issue is that this follow-up critique by F/M is very interesting and useful. However, the framing of the original results as “false positives” is not helpful or accurate. It might be a good idea to think of this in a Bayesian (and not frequentist) fashion. Additional tests and analyses (e.g., the NFL study) should cause us to move our priors, but we are balancing different pieces of evidence. Some studies have shown the influence of sports on election results (our paper, Mike Miller’s paper in SSQ, Jamie Druckman’s survey-based natural experiment). F/M do not find an effect. What we know now is that there is mixed evidence in a meta-analytic sense.

    4. Lastly, our paper is not about the effect of football games on elections. That would be a silly topic for a paper. Rather, we sought to build on a large and extensive literature that found that mood can affect individually high-stakes actions (e.g., consumer behavior). We also found that mood affects individually low-stakes behavior, voting. The consensus of numerous studies is that mood affects how people process information. That seems to us to be uncontroversial.
