“How to disrupt the multi-billion dollar survey research industry”

David Rothschild (coauthor of the Xbox study, the Mythical Swing Voter paper, and of course the notorious AAPOR note) will be speaking Friday 10 Oct at the Economics and Big Data meetup in NYC. His title: “How to disrupt the multi-billion dollar survey research industry: information aggregation using non-representative polling data.” Should be fun!

P.P.S. Slightly relevant to this discussion: Satvinder Dhingra wrote to me:

An AAPOR probability-based survey methodologist is a man who, when he finds non-probability Internet opt-in samples constructed to be representative of the general population work in practice, wonders if they work in theory.

My reply:

Yes, although to be fair they will say that they’re not so sure that these methods work in practice. To which my reply is that I’m not so sure that their probability samples work so well in practice either!


  1. Mark Palko says:

    Given the trend in response rates, I think the question is less “what works?” and more “what will work five years from now?”

  2. Rahul says:

    If I am equally unsure about two methods, the legacy method / status quo gets the benefit of the doubt in my mind. Is that wrong?

    Faced with a decades-old competing method, the onus ought to be on the supplanter to show that it works as well or better.

    • Andrew says:


      That makes sense. Indeed, you’ll have to pay more per respondent for a traditional poll than for an opt-in poll, which indicates that people generally agree with you. If you think the additional expense is worth it, go for it. For several years I’ve been affiliated with the General Social Survey, which uses high-quality in-person surveys. I think the GSS is great. For other purposes it’s good to have large samples and panels. It’s good for the polling ecosystem to have all these possibilities.

  3. Steve Sailer says:

    I’ve been surprised by how well Presidential election polls have held up over the decades of changing communications technology.

    • Andrew says:

      Regarding presidential election polls: I’ve written about this, but the short answer is that the general election for president is pretty predictable, both in the aggregate and in the sense that, by the time the election comes around, people pretty much know who they’re going to vote for. Exit polls, on the other hand, are rumored to be way, way off, but we never find out about that because they’re adjusted to the election returns.

  4. Andrew,

    The examples I’ve seen of the opt-in/modelling approach haven’t reported interval estimates (this isn’t a comprehensive assessment, just something I noticed in a few cases). It seems to me that when the modelling is good enough to remove bias, it should be good enough to give an interval estimate rather than just a point estimate. Do you have views on this? Do they usually come with interval estimates and I’ve just seen a non-representative set?

    • Andrew says:


      I have no idea. In political science, MRP always seems accompanied by uncertainty estimates. However, when lots of things are being displayed at once, it’s not always easy to show uncertainty, and in many cases I simply let variation stand in for uncertainty. Thus I’ll display colorful maps of U.S. states with the understanding that the variation between states and demographic groups gives some sense of uncertainty as well. This isn’t quite right, of course, and with dynamic graphics it would make sense to have some default uncertainty visualizations as well.

      But one thing I have emphasized, ever since my first MRP paper with Tom Little in 1997, is that this work unifies the design-based and model-based approaches to survey inference, in that we use modeling and poststratification to adjust for variables that are relevant to design and nonresponse. We discuss this a bit in BDA (chapter 8 of the most recent edition) as well. So there’s certainly no reason not to display uncertainty (beyond the challenges of visualization).

      I’ve recently been told that things are different in epidemiology, that there’s a fairly long tradition in that field of researchers fitting Bayesian models to survey data and not being concerned about design at all! Perhaps that relates to the history of the field. Survey data, and survey adjustment, have been central to political science for close to a century, and we’ve been concerned all this time with non-representativeness. In contrast, epidemiologists are often aiming for causality and are more concerned about matching treatment to control group than about matching sample to population. There’s no good reason for this: even in an experimental context we should ultimately care about the population (and, statistically, this will make a difference if there are important treatment interactions). But it makes sense that the two fields will have different histories, to the extent that a Bayesian researcher in epidemiology might find it a revelation that Bayesian methods (via MRP) can adjust for survey bias, while this is commonplace to a political scientist, as it’s been done in that field for nearly 20 years.

      • Thomas Lumley says:

        Thanks. That makes sense. Epidemiologists have actually known about post-stratification for a long time — they call it ‘direct standardisation’ — but they don’t apply it to regression models, just to simple means and rates.

        The two examples I noticed most recently that were reported without any uncertainty were actually political: the YouGov estimates for the Scottish independence vote, and a project related to the New Zealand elections, called “Vote Compass”.

        Integrating MRP into my survey software is definitely on my to-do list — explaining things to a computer is a great way to make sure you understand them, since the computer won’t pretend you’re making sense if you aren’t.
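
For readers who have not seen poststratification (“direct standardisation”) written out, here is a minimal Python sketch of the simple-means version mentioned above. It is only an illustration: the cell names, sample counts, and population shares are made up, and the function names `raw_mean` and `poststratified_mean` are hypothetical. In full MRP the cell estimates would come from a multilevel regression rather than from raw cell averages, and that modeling step is not shown here.

```python
from typing import Dict


def raw_mean(cell_means: Dict[str, float], cell_counts: Dict[str, int]) -> float:
    """Unadjusted sample mean: each cell is weighted by its share of the sample."""
    total = sum(cell_counts[c] for c in cell_means)
    return sum(cell_means[c] * cell_counts[c] for c in cell_means) / total


def poststratified_mean(cell_means: Dict[str, float],
                        population_shares: Dict[str, float]) -> float:
    """Poststratified mean: each cell is weighted by its share of the population."""
    share_total = sum(population_shares[c] for c in cell_means)
    return sum(cell_means[c] * population_shares[c] for c in cell_means) / share_total


# Hypothetical opt-in sample that over-represents young respondents.
cell_means = {"young": 0.6, "old": 0.4}          # estimated support within each cell
cell_counts = {"young": 800, "old": 200}         # respondents per cell in the sample
population_shares = {"young": 0.3, "old": 0.7}   # assumed known, e.g. from a census

unadjusted = raw_mean(cell_means, cell_counts)
adjusted = poststratified_mean(cell_means, population_shares)
print(f"unadjusted: {unadjusted:.2f}, poststratified: {adjusted:.2f}")
```

With these made-up numbers the unadjusted estimate is pulled toward the over-sampled young cell, and reweighting by population shares pulls it back. The adjustment is only as good as the cell estimates and the population shares fed into it.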

