
Beyond forking paths: using multilevel modeling to figure out what can be learned from this survey experiment

Under the heading, “Incompetent leaders as a protection against elite betrayal,” Tyler Cowen linked to this paper, “Populism and the Return of the ‘Paranoid Style’: Some Evidence and a Simple Model of Demand for Incompetence as Insurance against Elite Betrayal,” by Rafael Di Tella and Julio Rotemberg.

From a statistical perspective, the article by Di Tella and Rotemberg is a disaster of forking paths, as can be seen even from the abstract:

We present a simple model of populism as the rejection of “disloyal” leaders. We show that adding the assumption that people are worse off when they experience low income as a result of leader betrayal (than when it is the result of bad luck) to a simple voter choice model yields a preference for incompetent leaders. These deliver worse material outcomes in general, but they reduce the feelings of betrayal during bad times. Some evidence consistent with our model is gathered from the Trump-Clinton 2016 election: on average, subjects primed with the importance of competence in policymaking decrease their support for Trump, the candidate who scores lower on competence in our survey. But two groups respond to the treatment with a large (between 5 and 7 percentage points) increase in their support for Donald Trump: those living in rural areas and those that are low educated, white and living in urban and suburban areas.

There are just so many reasonable interactions that one could look at here, and there's no reason at all to expect a “needle in a haystack” situation in which there are one or two very large effects and a bunch of zeroes. So it doesn’t make sense to pull out various differences that happen to be large in these particular data and then spin out stories. The trouble is that this approach has poor statistical properties under repeated sampling: with another dataset sampled from the same population, you could find other patterns and tell other stories.
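To make the repeated-sampling point concrete, here's a minimal simulation sketch in Python. Everything in it is made up for illustration: 12 subgroups, 100 respondents per arm in each subgroup, a baseline support of 45%, and a true treatment effect of exactly zero everywhere. None of these numbers come from the paper. With these sizes the standard error of each subgroup difference is roughly sqrt(2 × 0.45 × 0.55 / 100) ≈ 0.07, so the largest of 12 pure-noise estimates can easily be several percentage points, and which subgroup comes out on top changes from one replication to the next.

```python
import random

def simulate_subgroup_effects(n_groups=12, n_per_arm=100, true_effect=0.0, seed=0):
    """Return the estimated treatment effect (difference in support
    proportions, treated minus control) for each hypothetical subgroup."""
    rng = random.Random(seed)
    base_support = 0.45  # assumed baseline support, identical in every subgroup
    estimates = []
    for _ in range(n_groups):
        control = sum(rng.random() < base_support for _ in range(n_per_arm))
        treated = sum(rng.random() < base_support + true_effect for _ in range(n_per_arm))
        estimates.append(treated / n_per_arm - control / n_per_arm)
    return estimates

# Three replications of the same null experiment: report the "winning" subgroup.
for seed in range(3):
    est = simulate_subgroup_effects(seed=seed)
    top = max(range(len(est)), key=lambda g: est[g])
    print(f"replication {seed}: largest estimate is in subgroup {top}: {est[top]:+.3f}")
```

Any story attached to the top subgroup in one replication would have to be rewritten for the next.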

It’s not that Di Tella and Rotemberg are necessarily wrong in their conclusions (or Cowen wrong in taking these conclusions seriously), but I don’t think these data are helping here: they all might be better off just speculating based on other things they’ve heard.

What to do, then? Preregistered replication (as in 50 shades of gray), sure. But, before then, I’d suggest multilevel modeling and partial pooling to get a better handle on what can be learned from their existing data.

This could be an interesting project: to get the raw data from the above study and reanalyze using multilevel modeling.
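As a rough idea of what partial pooling does to a set of subgroup estimates, here's a small Python sketch. It is not a full Bayesian multilevel model of the sort you'd fit in Stan: it swaps in a simple normal-normal empirical-Bayes approximation with a method-of-moments estimate of the between-subgroup variance. The subgroup estimates and standard errors are hypothetical numbers, not values from the paper, and the function name partial_pool is my own.

```python
import statistics

def partial_pool(estimates, std_errors):
    """Shrink noisy subgroup estimates toward their common mean.

    Normal-normal approximation with a method-of-moments estimate of the
    between-subgroup variance; a stand-in for a full multilevel fit.
    Assumes every standard error is strictly positive.
    """
    grand_mean = statistics.fmean(estimates)
    sampling_var = statistics.fmean([se ** 2 for se in std_errors])
    # Variation beyond what sampling noise alone would produce (floored at 0).
    between_var = max(0.0, statistics.variance(estimates) - sampling_var)
    pooled = []
    for y, se in zip(estimates, std_errors):
        # weight = 0 means complete pooling, weight = 1 means no pooling
        weight = between_var / (between_var + se ** 2)
        pooled.append(grand_mean + weight * (y - grand_mean))
    return pooled

# Hypothetical raw treatment-effect estimates for six subgroups.
raw = [0.06, -0.04, 0.07, -0.02, -0.05, 0.01]
ses = [0.05] * 6

for y, p in zip(raw, partial_pool(raw, ses)):
    print(f"raw {y:+.3f} -> partially pooled {p:+.3f}")
```

When the spread of the raw estimates is about what sampling error alone would produce, as in these made-up numbers, the pooled estimates are pulled most of the way toward the overall mean, which is the sense in which the large subgroup differences stop looking like discoveries.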


  1. Student says:

    Can you point to a Stan case study/example that would be a similar application of hierarchical modeling?


    • Peter says:

      It’s not Stan per se, but I think this paper gives a good conceptual explanation of how you might run these subgroup analyses simultaneously with a multilevel model: . Check out the sections “Multiplicity—estimating the prior” and “Bayes’s theorem for subset analysis”. I hope I’ll be corrected by someone if I’m off base, but I guess the idea is that instead of naively running a ton of variables and their interactions, which would look something like lm( outcome ~ rural + educated + priming + rural:educated + rural:priming + educated:priming + rural:educated:priming ), you could treat the smaller subgroups as though they’re coming from a distribution with a prior pulling their estimates toward the mean.

      So you might create a single “subgroups” variable that encodes all the possible groups of interest (1 = rural and educated, 2 = non-rural and uneducated, 3 = rural and uneducated, 4 = non-rural and educated,…) and run something like stan_lmer( outcome ~ priming + (priming | subgroups) ). With this model you get an estimate for the overall effect of priming, but you can also check out the varying intercepts to get (partially pooled) estimates for how the different subgroups score on the outcome, and look at the varying slopes to get partially pooled estimates for how priming differentially affects these subgroups, if at all.

      I’m posting this for you in part because I hope it baits someone smarter than me into giving you a better answer. I’m curious about this as well. I often see the suggestion but I don’t know that I’ve come across a case study that gives real examples of how it would look in code.

  2. I’m going to have to ask Tyler Cowen for access to that paper.

  3. LemmusLemmus says:

    “The trouble is that this approach has poor statistical properties under repeated sampling: with another dataset sampled from the same population, you could find other patterns and tell other stories.”

    This sounds like a pretty good summary of a lot of your writing in the last few years.

  4. I favor Andrew’s view thus far. I’ll have to re-read the paper as it is very detailed. Could have been reduced by 7 or 8 pages.
