My colleagues Joe Bafumi, Bob Erikson, and Christopher Wlezien just completed their statistical analysis of seat and vote swings. They write:
Via computer simulation based on statistical analysis of historical data, we show how generic vote polls can be used to forecast the election outcome. We convert the results of generic vote polls into a projection of the actual national vote for Congress and ultimately into the partisan division of seats in the House of Representatives. Our model allows both a point forecast—our expectation of the seat division between Republicans and Democrats—and an estimate of the probability of partisan control. Based on current generic ballot polls, we forecast an expected Democratic gain of 32 seats with Democratic control (a gain of 18 seats or more) a near certainty.
These conclusions seem reasonable to me, although I think they are a bit over-certain (see below).
Here’s the full paper. Compared to our paper on the topic, the paper by Bafumi et al. goes further by predicting the average district vote from the polls. (We simply determine what vote the Democrats need to get a specified number of seats, without actually forecasting the vote itself.) In any case, the two papers use similar methodology (although, again, with an additional step in the Bafumi et al. paper). In some aspects, their model is more sophisticated than ours (for example, they fit separate models to open seats and incumbent races).
Slightly over-certain?
The only criticism I’d make of this paper is that they might be understating the uncertainty in the seats-votes curve (that is, the mapping from votes to seats). The key point here is that they get district-by-district predictions (see equations 2 and 3 on page 7 of their paper) and then aggregate these up to estimate the national seat totals for the two parties. This aggregation does include uncertainty, but only of the sort that’s independent across districts. In our validations (see section 3.2 of our paper), we found the out-of-sample predictive error of the seats-votes curve to be quite a bit higher than the internal measure of uncertainty obtained by aggregating district-level errors. We dealt with this by adding an extra variance term to the predictive seats-votes curve.
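To make the distinction concrete, here is a minimal simulation sketch. It is not the method of either paper: the district predictions, the error sizes, and the idea of modeling the extra variance as a single shock shared by all districts are assumptions made up for illustration. It compares the spread of simulated seat totals when errors are independent across districts with the spread when a common error is added.

```python
import random
import statistics


def simulate_seats(pred_share, district_sd, shared_sd, n_sims=2000, seed=0):
    """Simulate Democratic seat totals from district-level predictions.

    pred_share  -- predicted Democratic two-party vote share per district
    district_sd -- sd of the error drawn independently for each district
    shared_sd   -- sd of an error common to all districts (0 means none)
    All three inputs are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        # One draw per simulation, applied to every district alike.
        shared = rng.gauss(0.0, shared_sd)
        seats = sum(
            1
            for p in pred_share
            if p + shared + rng.gauss(0.0, district_sd) > 0.5
        )
        totals.append(seats)
    return totals


def prob_majority(totals, needed=218):
    """Share of simulations in which the party wins at least `needed` seats."""
    return sum(s >= needed for s in totals) / len(totals)


# Made-up district predictions: 435 districts centered slightly above 50%.
gen = random.Random(1)
pred_share = [min(max(gen.gauss(0.52, 0.12), 0.05), 0.95) for _ in range(435)]

independent_only = simulate_seats(pred_share, district_sd=0.05, shared_sd=0.0)
with_shared = simulate_seats(pred_share, district_sd=0.05, shared_sd=0.02)

sd_independent = statistics.pstdev(independent_only)
sd_shared = statistics.pstdev(with_shared)

print("sd of seats, independent errors only:", round(sd_independent, 1))
print("sd of seats, with shared error:      ", round(sd_shared, 1))
print("P(majority), independent errors only:", prob_majority(independent_only))
print("P(majority), with shared error:      ", prob_majority(with_shared))
```

Under these assumed numbers, independent district errors largely cancel when summed over 435 districts, so the seat total looks very predictable. Even a modest shared error does not cancel, which is why a forecast built only from independent district-level errors can come out over-certain.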
In summary
I like this paper. It seems reasonable, and I like how they do things in two steps: using the polls to predict the national swing and then using district-level information to estimate the seats-votes curve. I’d like to see the scatterplot that would accompany equation 1, and I think the election outcome (the number of seats for each party) isn’t quite so predictable as they claim, but these are minor quibbles. Their paper goes beyond what we did, and all of this is certainly a big step beyond the usual approach of just taking the polls, not knowing what to do with them, and giving up!
This aggregation does include uncertainty, but only of the sort that's independent across districts. In our validations (see section 3.2 of our paper), we found the out-of-sample predictive error of the seats-votes curve to be quite a bit higher than the internal measure of uncertainty obtained by aggregating district-level errors. We dealt with this
So what happens to these new results if you apply this correction?
Just as a reference, the Intrade contract for GOP house control is currently trading at an implied probability of 32.6–34.5%. Perhaps they're banking on a last-minute shift—or on Diebold voting machines.
This is the authors' mistake, not yours, but Democrats need only pick up 15 seats for the majority. This is presumably a typo — control of the House = 218 seats.
While the Joe Bafumi, Bob Erikson, and Christopher Wlezien paper is interesting, I must say that it is not well written. The informal and rule-of-thumb style is not fun to read.