Political attitudes look more stable over time if measured by averaging responses to several issue questions

Jim Snyder is speaking Fri 17 Feb at NYU on “Issue preferences and measurement error,” his paper with Steve Ansolabehere and Jonathan Rodden. I think this is an important paper. First I’ll give the abstract (again, here’s the paper) and then my comments.

Ansolabehere, Rodden, and Snyder write:

We show that averaging a large number of survey items on the same broadly-defined issue area–e.g., government involvement in the economy, or moral issues–eliminates a large amount of measurement error and reveals issue preferences that are well structured. Averaging produces issue scales that are stable over time, and with enough items, these scales are often as stable as party identification. The scales also exhibit within-survey stability when we construct scales made from disjoint subsets of survey items. Moreover, this intertemporal and within-survey stability increases steadily, and in a manner consistent with a standard common measurement error model, as the number of survey items used increases. Also, when we estimate Converse’s “black-white” model, we find that at most 20-25 percent of respondents can be classified as “pure guessers,” and 75-80 percent have stable attitudes over issue areas, the reverse of Converse’s conclusion. Finally, in regressions predicting presidential vote choice, the issue scales appear to have much more explanatory power–relatively large coefficients and much larger t-values–than any of the individual survey items used in constructing the scales.

My comments:

1. I think this paper is important for substantive reasons (it’s interesting that voters are more issue-consistent than researchers have thought), but also statistically: I’m always trying to make the point to students and collaborators that it’s a good idea to combine questions into “scores,” and it’s nice to have this example. Sometimes items can be combined using factor analysis, but often simple averages are effective (as Robyn Dawes and others have pointed out in the context of psychological measurement).
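To show what I mean by combining items into scores, here’s a minimal sketch (my own toy example in Python, not the authors’ code or data) of Dawes-style unit weighting: standardize the items and take a simple average.

```python
# Toy illustration of unit-weighted averaging of survey items into a scale.
# All names and numbers here are made up for illustration.
import numpy as np

def issue_scale(responses):
    """Standardize each item, then average across items for each respondent."""
    z = (responses - responses.mean(axis=0)) / responses.std(axis=0)
    return z.mean(axis=1)

rng = np.random.default_rng(0)
true_attitude = rng.normal(size=1000)                                     # latent attitude
items = true_attitude[:, None] + rng.normal(scale=1.5, size=(1000, 10))  # 10 noisy items
scale = issue_scale(items)
print(np.corrcoef(scale, true_attitude)[0, 1])   # the simple average tracks the latent attitude
```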

2. I wonder if they could talk more about what is meant by “measurement error.” This could be unstable preferences on individual items (e.g., a weak preference that is easily influenced by what was in the newspaper last week), or people not paying attention and answering randomly, or people not understanding the question, or different people having different ideas of what the question means, and so on.

The term “measurement error” could also be misleading: it implicitly assumes that there is a “true response” that is “measured” with “error.” I don’t think the paper stands or falls on this assumption, so it might be good to detach it a bit from the main results.

3. Just to clarify my above comment: I think Section 2 of the paper is fine at a technical level, but perhaps it still assumes too much about where the errors come from.
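To fix ideas on that point, here’s a quick simulation (mine, with made-up numbers) of the standard common-measurement-error story: each item is a stable latent attitude plus independent noise, measured in two waves. The wave-to-wave correlation of the averaged scale rises with the number of items, which is the qualitative pattern the paper reports.

```python
# Simulate two survey waves under a simple item = latent + noise model and
# check how the stability of the averaged scale grows with the number of items.
# The noise level and sample size are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(1)
n, noise_sd = 2000, 1.5
latent = rng.normal(size=n)            # stable attitude, identical in both waves

def scale_in_one_wave(k):
    """A k-item scale for one wave: average of k noisy readings of the latent attitude."""
    items = latent[:, None] + rng.normal(scale=noise_sd, size=(n, k))
    return items.mean(axis=1)

for k in (1, 4, 10, 20):
    r = np.corrcoef(scale_in_one_wave(k), scale_in_one_wave(k))[0, 1]
    print(f"{k:2d} items: wave 1 vs. wave 2 correlation = {r:.2f}")
```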

4. When referring to the black/white model, it would be good to refer also to some more sophisticated versions, for example the paper that Jennifer Hill and others wrote on this a few years back.
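For readers who don’t remember the black-white model, the idea is a mixture: some respondents have fixed attitudes and simply repeat them, while “pure guessers” answer at random each wave. Here’s a toy version (illustrative only; the paper’s estimator and data are more involved) in which the guesser share is backed out from the observed repeat rate across two waves.

```python
# Toy black-white mixture: stable respondents repeat their wave-1 answer,
# guessers answer at random each wave. The guesser share (set to 0.25 here,
# an arbitrary illustrative value) is recovered from the repeat rate.
import numpy as np

rng = np.random.default_rng(2)
n, p_guess, n_options = 5000, 0.25, 2                 # binary item for simplicity

guesser = rng.random(n) < p_guess
wave1 = rng.integers(n_options, size=n)
wave2 = np.where(guesser, rng.integers(n_options, size=n), wave1)

repeat_rate = (wave1 == wave2).mean()
# Under the model, repeat_rate = (1 - p) + p / n_options, so:
p_hat = (1 - repeat_rate) / (1 - 1 / n_options)
print(f"estimated share of pure guessers: {p_hat:.2f}")
```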

And some statistical comments:

– I’d round all correlations to one decimal place (e.g., say 0.6 rather than 0.62).

– p.16: I don’t see why you have to apologize for using issues with only 4 or 6 items. If averaging items is going to help, it’ll help the most, at the margin, when the number of items is small. That is, going from 1 to 4 items should produce a big benefit right away. And by looking at several different issue areas, you can see this.
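Here’s a back-of-the-envelope version of this point using the Spearman-Brown formula for the reliability of a k-item average (the single-item reliability of 0.3 is an assumed number for illustration, not taken from the paper):

```python
# Reliability of an average of k parallel items, each with reliability r1.
def spearman_brown(r1, k):
    return k * r1 / (1 + (k - 1) * r1)

for k in (1, 4, 6, 10, 20):
    print(f"{k:2d} items: reliability = {spearman_brown(0.3, k):.2f}")
# Most of the gain comes early: roughly 0.30 with 1 item, 0.63 with 4, 0.90 with 20.
```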

– p.19, etc.: I’d round coefficient estimates to 1 decimal place (e.g., 1.3 rather than 1.32). The precision shown is illusory.

– I’d get rid of most of the tables. For example, what does Table B.1 give you? If you really want to include these numbers, I’d just put them in parentheses after each question in Appendix A. Also, I would round these to 1 decimal place when including them in the list. But, really, Appendix B is telling the reader nothing. If these loadings are relevant, it would be better to see them with the text of the questions.

– Tables 1, 2, and 3 would work better as time-series plots (a separate plot for each issue area).

– Table 4 needs some interpretation. Are these logits? How do we interpret “1.12,” “.24,” etc.? Again, I’d round to 1 decimal place.
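If they are logits, one quick way to read the magnitudes is the divide-by-4 rule: the maximum change in predicted probability per unit change in the predictor is about the coefficient divided by 4. For example (just plugging in the two numbers quoted above, and assuming they really are logit coefficients):

```python
# Divide-by-4 rule for logit coefficients: the maximum change in predicted
# probability per unit change in the predictor is approximately coef / 4.
for b in (1.12, 0.24):
    print(f"coefficient {b}: at most about {b / 4:.0%} change in Pr(vote = 1)")
```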

– I find Figures 1 and 2 a bit confusing. But if you do want to use them, I’d clean them up as follows (a rough sketch of what I mean appears after the list):
1. Make the two graphs smaller so they fit side by side on one page.
2. Use a common y-axis going from 0 to 1.
3. Label the y-axis at 0, .5, and 1.0; that’s enough.
4. Label the x-axis at 1, 5, 10, and 20. That will make it much more readable.
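To be concrete, here’s what that layout could look like in matplotlib (fake data and placeholder panel titles, not the paper’s numbers):

```python
# Two-panel layout for Figures 1 and 2 with shared, sparsely labeled axes.
# The plotted curves are illustrative Spearman-Brown-style curves, not real data.
import numpy as np
import matplotlib.pyplot as plt

n_items = np.arange(1, 21)

def fake_curve(r1):
    return n_items * r1 / (1 + (n_items - 1) * r1)

fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, title, r1 in zip(axes, ["Economic issues", "Moral issues"], [0.3, 0.4]):
    ax.plot(n_items, fake_curve(r1), "o-")
    ax.set_title(title)
    ax.set_xticks([1, 5, 10, 20])
    ax.set_xlabel("Number of items")
    ax.set_ylim(0, 1)
    ax.set_yticks([0, 0.5, 1.0])
axes[0].set_ylabel("Stability (correlation)")
fig.tight_layout()
fig.savefig("figures_1_2_redone.png")
```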

In summary

This seems like an important paper to me, getting us closer to understanding issue attitudes and polarization. The message I take away is that you get a lot by combining questions.