A while ago I posted some maps, based on the Pew pre-election polls, estimating how Obama and McCain did among different income groups, for all voters and for non-Hispanic whites alone. The next day the blogger and political activist Kos posted some criticisms. I disagree with one of Kos’s suggestions: he wanted me to rely on exit polls, and I don’t see them as any more reliable than the Pew pre-election polls. But he pointed out some serious problems with my maps, and I realized that some fixes were in order. Most importantly:
– My maps would be improved by replacing solid red and blue with continuous shading to distinguish between landslides and narrow margins.
– I needed a more flexible model that would allow the nonlinear pattern of voting and income to vary by state. In the previous model, I fit a nonlinear pattern by including a separate logistic regression coefficient for each of the five income categories, but I let the states vary only in their intercepts and slopes. In the new model, we let all five coefficients vary by state.
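To make the difference between the two model structures concrete, here is a minimal Python sketch of the linear predictor for a single state under each one. All coefficient values are made up, centering the income index at the middle category is an assumption, and the real fit is a multilevel logistic regression rather than the fixed numbers shown.

```python
import math


def inv_logit(u: float) -> float:
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-u))


# Hypothetical national logit-scale coefficients for income categories 0..4.
NATIONAL = [-0.4, -0.1, 0.1, 0.3, 0.4]


def old_logit(intercept: float, slope: float, income: int) -> float:
    """Earlier structure: a shared nonlinear national curve; the state adds
    only an intercept and a linear slope in the income index (centered at 2)."""
    return NATIONAL[income] + intercept + slope * (income - 2)


def new_logit(state_coefs: list[float], income: int) -> float:
    """New structure: the state has its own coefficient for each of the five
    categories (in a multilevel fit these are partially pooled toward NATIONAL)."""
    return state_coefs[income]


# Illustrative state: the old model can only shift and tilt the national curve,
# while the new model lets the state's income pattern take any shape.
old = [inv_logit(old_logit(0.2, -0.1, k)) for k in range(5)]
new = [inv_logit(new_logit([-0.1, 0.3, 0.1, 0.0, 0.5], k)) for k in range(5)]
```

Under the old structure the state's departure from the national curve is always a straight line in the income index; under the new one it can bend however the data suggest.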
During the past couple of months, I’ve been working on this when I’ve had a spare hour or two, and now I think we have something reasonable to share. Here it is:
States colored deep red and deep blue indicate clear McCain and Obama wins; pink and light blue represent wins by narrower margins, with a continuous range of shades going to pure white for states estimated at exactly 50/50.
The maps are based on a model fit to four ethnic categories (non-Hispanic white, black, Hispanic, other), but I’m displaying only all voters and non-Hispanic whites. The other groups are interesting too, but they’re based on much less data: the estimates for them are my current best guesses but rely much more heavily on model extrapolation.
The estimates are based entirely on the Pew data, except that we use Census-based voter turnout estimates to reweight the estimates in each state, and we shift each state’s estimates to be consistent with the actual election outcome there. For example, if our estimate says that Obama got 48% of the total vote in a state (adding up voters from all income and ethnic categories) and he actually got 46%, then we pull down our estimates for each category so that the estimated total is 46%.
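The text doesn’t spell out exactly how the per-category estimates are pulled down. One simple possibility, assumed here purely for illustration, is to add the same shift to every category on the logit scale and solve for the shift that makes the weighted total match the actual result. The function name `shift_to_outcome`, the bisection bounds, and the cell numbers are all hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(u: float) -> float:
    return 1 / (1 + math.exp(-u))


def shift_to_outcome(cell_probs, cell_weights, actual_share, tol=1e-10):
    """Hypothetical helper: find one logit-scale shift delta so that the
    weighted average of inv_logit(logit(p) + delta) equals actual_share,
    then return the shifted per-cell probabilities."""
    total_w = sum(cell_weights)

    def implied_total(delta):
        return sum(w * inv_logit(logit(p) + delta)
                   for p, w in zip(cell_probs, cell_weights)) / total_w

    # implied_total increases with delta, so bisection finds the root.
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if implied_total(mid) < actual_share:
            lo = mid
        else:
            hi = mid
    delta = (lo + hi) / 2
    return [inv_logit(logit(p) + delta) for p in cell_probs]


# Made-up example: five equally weighted cells whose estimates average 48%,
# shifted so that the state total matches an actual result of 46%.
shifted = shift_to_outcome([0.62, 0.52, 0.47, 0.42, 0.37], [1, 1, 1, 1, 1], 0.46)
```

A logit-scale shift keeps every cell inside (0, 1) and preserves the ordering of the categories, which a flat subtraction of two percentage points would not guarantee for cells near 0 or 1.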
Some particular changes
I’ll talk about a couple of states where Kos pointed out issues with my original maps.
New Hampshire. John McCain won 45% of the two-party vote in New Hampshire, a state whose population is 93% non-Hispanic white, 1% black, 2% Hispanic, 2% Asian, and 2% other. Based on the Census survey, we estimate that non-Hispanic whites were 96% of New Hampshire’s voters in 2008. If whites represented 96% of the voters and McCain received 20% of the votes of the other 4%, then his share of the white vote would be 46%. Thus, as Kos pointed out, it’s hard to believe that McCain won four of the five income categories among whites in the state, as my original map had implied. The problem was in the way I’d adjusted things to the national vote.
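The New Hampshire arithmetic just solves one equation, total = (white fraction × white share) + (nonwhite fraction × nonwhite share), for the white share. A small Python sketch (the function name is hypothetical) reproduces the 46% figure and also shows that even if McCain had received no nonwhite votes at all, his white share would still fall short of 50%:

```python
def implied_white_share(total_share: float, white_frac: float,
                        nonwhite_share: float) -> float:
    """Solve total = white_frac*white + (1 - white_frac)*nonwhite for white."""
    return (total_share - (1 - white_frac) * nonwhite_share) / white_frac


print(round(implied_white_share(0.45, 0.96, 0.20), 2))  # 0.46
# Upper bound: assume McCain got 0% of the nonwhite vote.
print(round(implied_white_share(0.45, 0.96, 0.0), 2))   # 0.47
```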
Michigan. As Kos points out, Michigan was closely divided among whites, and so there was something fishy about my original maps, which had Obama winning among whites in four of the five income categories. The new map does not have this problem.
Colorado. This state reveals some problems with the published exit poll data. According to CNN, McCain got 48% of the white vote in Colorado, but, when this was broken down by income, he got 45% of the vote of whites under $50,000 and 47% of the vote of whites over $50,000. This is a mathematical impossibility: the overall white vote is a weighted average of the two income groups, so it must fall between 45% and 47%. Using the exit poll numbers, McCain’s share of the total white vote should be (.19*45% + .62*47%)/(.19+.62) = 46.5%, not 48%. I don’t know which of these numbers, if either, is correct. I assume they all come from the corrected exit polls, adjusted to match the actual vote proportions in each state. Our estimate gives McCain 51% of the white vote in Colorado. I think this is plausible too, and for that matter it’s consistent with the exit poll estimate of 48%, which has a standard error of at least sqrt(.48*(1-.48)/(.81*1254)) ≈ .016, so the exit poll number is within two standard errors of our estimate.
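The Colorado numbers can be checked in a few lines of Python. This sketch simply recomputes the weighted average, the simple-random-sampling standard error, and the gap between our 51% and the exit poll’s 48% in standard-error units; the weights .19 and .62 and the sample size 1254 are taken as given from the text.

```python
import math

# Exit-poll figures for Colorado quoted in the text.
share_under_50k, share_over_50k = 0.45, 0.47  # McCain share among whites
weight_under, weight_over = 0.19, 0.62        # weights as used in the text

implied = ((weight_under * share_under_50k + weight_over * share_over_50k)
           / (weight_under + weight_over))
# Simple-random-sampling standard error for the 48% figure; a real exit poll
# has a design effect, so the true standard error is at least this large.
se = math.sqrt(0.48 * (1 - 0.48) / (0.81 * 1254))
z = (0.51 - 0.48) / se  # distance from our estimate, in standard errors

print(round(implied, 3), round(se, 3), round(z, 2))  # 0.465 0.016 1.91
```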
Estimates and raw data
Here are graphs showing our estimates, along with the weighted averages from the Pew surveys in each group (including only those respondents who expressed a preference for Obama or McCain and also said they were “absolutely certain” they had registered to vote):
You can see the partial pooling from the data to the model, with more pooling in small states such as Wyoming, Rhode Island, and Vermont, and less pooling in states such as California, Texas, and New York where sample sizes were larger. The graphs show estimated McCain vote share, so, unsurprisingly, the lines for whites are higher than the lines for all voters, with differences smaller in states such as Wyoming or Vermont where there are very few nonwhite voters.
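Why small states get pulled harder toward the model can be seen in the textbook normal-approximation version of partial pooling: the estimate is a precision-weighted compromise between a group’s raw average and the model’s prediction, and the data’s precision grows with sample size. This Python sketch is only that simplified illustration, not our actual multilevel model; `sigma`, `tau`, and the example numbers are hypothetical.

```python
def partially_pooled(raw_mean: float, n: int, model_mean: float,
                     sigma: float = 0.5, tau: float = 0.05) -> float:
    """Precision-weighted compromise between a group's raw mean and the model
    prediction. Data precision is n / sigma^2 (sigma ~ sd of a 0/1 vote);
    prior precision is 1 / tau^2 (tau = assumed between-group spread)."""
    data_prec = n / sigma ** 2
    prior_prec = 1 / tau ** 2
    return (data_prec * raw_mean + prior_prec * model_mean) / (data_prec + prior_prec)


# Same raw average (70%) and model prediction (55%), different sample sizes:
small = partially_pooled(0.70, 30, 0.55)    # small state: pulled strongly toward 55%
large = partially_pooled(0.70, 1500, 0.55)  # large state: stays close to 70%
```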
Some technical details
Even after restricting to respondents who are certain they are registered, the pre-election polls don’t do a great job matching the population of voters. To correctly weight to voters (rather than to the general adult population), we used the 2008 Current Population Survey post-election supplement, which has information on voter turnout. We’ll write a technical article describing exactly what we did, but the short version is that the CPS numbers are generally considered to be much more reliable than exit polls or pre-election polls for estimating turnout rates among different groups within a state. What we actually did was to use a multilevel model to smooth the CPS numbers using the latest population totals from the American Community Survey.
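The reweighting step amounts to poststratification with turnout: each demographic cell in a state counts in proportion to its estimated number of voters (population times turnout rate), not its number of adults. This minimal Python sketch assumes the cell populations and turnout rates are already in hand; the multilevel smoothing of the CPS numbers is not shown, and the function name and example cells are hypothetical.

```python
def turnout_weighted_share(cells):
    """cells: list of (population, turnout_rate, mccain_share) for one state.
    Returns McCain's estimated share among voters, weighting each cell by
    population * turnout (its estimated number of voters)."""
    voters = [pop * turnout for pop, turnout, _ in cells]
    return sum(v * share for v, (_, _, share) in zip(voters, cells)) / sum(voters)


# Two equal-sized hypothetical cells with different turnout rates: weighting by
# adults would give 45%, but weighting by voters gives about 49%.
cells = [(1000, 0.7, 0.6), (1000, 0.4, 0.3)]
estimate = turnout_weighted_share(cells)
```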
Yair also came up with a cool color scheme. Instead of going from deep red to deep blue through purple, we divided up the scale as follows: for proportions between 0 and .5, we used shades of blue (deep blue, getting progressively lighter toward white), and from .5 to 1 we used deeper and deeper reds, starting with white, through light pink, to red. (Don’t worry, I’ll post the R code.) This worked much, much better than the purple schemes I was playing with before. It gives more visual resolution, and a key benefit is that it’s immediately clear which states are above and below the 50% threshold. Finally, I did a little trick of my own and used a square-root transformation to spread out the resolution near 0.5 and compress it near 0 and 1. More specifically, if the estimated vote proportion for McCain is x, I defined z = 2*(x-.5) and then worked with sign(z)*sqrt(|z|).
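Here is a minimal Python sketch of the diverging scale with the signed square-root stretch. It is not the R code mentioned above; the endpoint colors (pure blue and pure red) and the linear blend from white are assumptions for illustration.

```python
import math


def mccain_color(x: float) -> str:
    """Map McCain's estimated two-party share x in [0, 1] to a hex color:
    blue below .5, red above, white at exactly .5, with a signed square-root
    stretch that spreads out resolution near .5."""
    z = 2 * (x - 0.5)                                 # z in [-1, 1]
    t = math.copysign(math.sqrt(abs(z)), z)           # sign(z) * sqrt(|z|)
    base = (0, 0, 255) if t < 0 else (255, 0, 0)      # assumed deep blue / deep red
    k = abs(t)                                        # 0 = white, 1 = full color
    rgb = [round(255 + k * (c - 255)) for c in base]  # blend from white toward base
    return "#{:02X}{:02X}{:02X}".format(*rgb)


print(mccain_color(0.5), mccain_color(0.55), mccain_color(1.0))  # #FFFFFF #FFAEAE #FF0000
```

Without the square root, a 55% state would get only a tenth of the full red and look almost white; with it, the same state gets about a third, so narrow margins remain visible.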
One other thing. The Pew organization sent me their raw data and posted them on the web for anyone to use. The exit poll organizations still refuse to release anything but summaries. I don’t see this refusal as a sign of confidence on their part. Please also read my earlier note for further discussion of the Pew and exit polls.
All this work is joint with Yair Ghitza.