Survey Statistics: a new paradigm

Over the next posts, let’s dig into Michael Bailey’s A New Paradigm for Polling and the commentary it got from other wonderful survey statisticians. In this post, let’s introduce the response instrument and see what these folks have to say about it.

Bailey begins:

the polling field needs to move to a more general paradigm built around the Meng (2018) equation that characterizes survey error for any sampling approach, including nonrandom samples.

We discussed Meng 2018 “Statistical Paradises and Paradoxes” in this blog series here.
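For reference, Meng's identity says that the error of the respondent mean is a product of three terms, for any response mechanism:

- the correlation ρ(R, Y) between the response indicator and the outcome (the "data defect correlation");
- a data-quantity term sqrt((1 - f)/f), where f = n/N;
- the population standard deviation σ(Y), computed with divisor N.

Under random sampling ρ(R, Y) is close to zero, and when R depends on Y it is not. The sketch below checks the identity numerically on a simulated finite population. The population, the outcome distribution and the response probabilities are arbitrary assumptions chosen for illustration.

```python
import math
import random


def meng_decomposition(y, r):
    """Return (actual error of the respondent mean, rho * sqrt((1 - f) / f) * sigma_y).

    y : outcomes for every unit in a finite population
    r : response indicators (1 = responded, 0 = did not)
    """
    N = len(y)
    n = sum(r)
    f = n / N                                             # response fraction
    y_bar_N = sum(y) / N                                  # population mean
    y_bar_n = sum(yi for yi, ri in zip(y, r) if ri) / n   # respondent mean
    # Finite-population moments with divisor N, following Meng's convention
    sigma_y = math.sqrt(sum((yi - y_bar_N) ** 2 for yi in y) / N)
    sigma_r = math.sqrt(f * (1 - f))                      # SD of a 0/1 indicator with mean f
    cov_ry = sum((ri - f) * (yi - y_bar_N) for yi, ri in zip(y, r)) / N
    rho = cov_ry / (sigma_r * sigma_y)                    # data defect correlation
    return y_bar_n - y_bar_N, rho * math.sqrt((1 - f) / f) * sigma_y


rng = random.Random(2018)
# Hypothetical population: Y ~ Normal(50, 10), and units with Y above 50
# are more likely to respond (5% versus 2%), so R and Y are correlated.
y = [rng.gauss(50, 10) for _ in range(100_000)]
r = [1 if rng.random() < (0.05 if yi > 50 else 0.02) else 0 for yi in y]
error, product = meng_decomposition(y, r)
print(f"actual error = {error:.4f}; rho * sqrt((1 - f) / f) * sigma = {product:.4f}")
```

The identity is algebraic, so the two printed numbers should agree up to floating-point rounding for any choice of y and r.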

Most of what we’ve discussed so far (e.g., poststratification, weighting) assumes that within groups defined by covariates X, response R is independent of outcome Y. Random sampling (within X) achieves this.

In Section 4, Bailey looks at some methods for when R might depend on Y (within X), a “nonrandom” paradigm. These methods rely on a response instrument Z: a variable that affects the probability of response R but does not directly affect the outcome of interest Y.

For example, let Z = 0 if someone is forced to discuss politics (“high response protocol”), and Z = 1 if they are given the choice to discuss politics or opt out and discuss sports (“low response protocol”). From Bailey 2025 (with my additions in green):

Intuitively, the response instrument helps because we can compare observed Y between the low and high response protocols, which gives information about the dependence between Y and R. How this translates into an estimate of the population mean of Y depends on methods and assumptions that Bailey doesn’t fully dive into here.
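To make the intuition concrete, here is a small simulation with made-up numbers:

- Y is binary with P(Y = 1) = 0.5.
- Z is randomized 50/50. Z = 0 is the high response protocol and Z = 1 the low response protocol.
- The response probabilities P(R = 1 | Y, Z) are chosen by hand.

Under MAR, response depends on Z but not on Y, so the respondent means of Y agree across the two arms up to noise. Under MNAR, people with Y = 1 respond more often, and the respondent means differ between arms. The function names and all probabilities are illustrative assumptions, not Bailey's setup.

```python
import random


def simulate(response_prob, n=200_000, p_y=0.5, seed=7):
    """Return a list of (z, responded, y) with Z randomized 50/50.

    response_prob(y, z) gives P(R = 1 | Y = y, Z = z). Y is stored even for
    nonrespondents so the truth can be checked; an analyst would not see it.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < p_y else 0
        responded = rng.random() < response_prob(y, z)
        data.append((z, responded, y))
    return data


def respondent_mean_by_arm(data):
    """Observed mean of Y among respondents, separately for Z = 0 and Z = 1."""
    means = {}
    for arm in (0, 1):
        ys = [y for z, resp, y in data if z == arm and resp]
        means[arm] = sum(ys) / len(ys)
    return means


# MAR: response depends on the protocol Z but not on Y
mar = respondent_mean_by_arm(simulate(lambda y, z: 0.3 if z == 0 else 0.2))

# MNAR: people with Y = 1 are more likely to respond under both protocols
mnar_table = {(1, 0): 0.4, (1, 1): 0.3, (0, 0): 0.2, (0, 1): 0.1}
mnar = respondent_mean_by_arm(simulate(lambda y, z: mnar_table[(y, z)]))

print("MAR :", mar)
print("MNAR:", mnar)
```

With these settings both MAR means should land near 0.5. The MNAR means should land near 0.67 (Z = 0) and 0.75 (Z = 1), because the low response protocol filters out Y = 0 people more heavily. A difference like this is evidence against MAR, but by itself it does not say how to correct the estimate.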

Sharon Lohr’s comments describe a sufficient assumption: no interaction between outcome Y and instrument Z in the model for response R. Confusingly, she uses “X” instead of “Z” for the instrument, so I’ve edited below:

All three models in Figure 1 perfectly fit the data. In Figure 1(a), the high response protocol (Z = 0) is “random” (R does not depend on Y), so our usual methods work. In Figure 1(c), there is no Z*Y interaction in the response model, so the response instrument methods work. In Figure 1(b), it isn’t clear what to do.
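Here is one way to see why the no-interaction assumption buys identification, using a toy version with binary Y and a randomized binary Z.

Assume the response probability is additive with no Y × Z term: P(R = 1 | Y = y, Z = z) = α + βy + γz. This scale is my assumption; Lohr's and Bailey's models may use a different one, such as logit.

Define:

- p = P(Y = 1), the quantity we want;
- r_z = P(R = 1 | Z = z), estimable from each arm;
- a_z = P(R = 1, Y = 1 | Z = z), also estimable from each arm.

Then a_z = p(α + β + γz) and r_z = α + pβ + γz. It follows that a_1 - a_0 = pγ and r_1 - r_0 = γ, which gives p = (a_1 - a_0)/(r_1 - r_0). This is an instrumental-variables-style ratio, and it requires the instrument to actually shift the response rate (r_1 ≠ r_0).

The sketch computes the ratio from population-level quantities, so there is no sampling noise. It compares three hypothetical response models.

```python
def observed_quantities(p, rho):
    """Population-level quantities an analyst could estimate from a randomized-Z survey.

    p   : P(Y = 1), unknown to the analyst
    rho : dict mapping (y, z) -> P(R = 1 | Y = y, Z = z), unknown to the analyst
    Returns (r, a) with r[z] = P(R = 1 | Z = z) and a[z] = P(R = 1, Y = 1 | Z = z).
    """
    r = {z: p * rho[(1, z)] + (1 - p) * rho[(0, z)] for z in (0, 1)}
    a = {z: p * rho[(1, z)] for z in (0, 1)}
    return r, a


def instrument_estimate(r, a):
    """Estimate P(Y = 1) assuming an additive response model with no Y x Z interaction."""
    if r[1] == r[0]:
        raise ValueError("instrument does not shift the response rate; p is not identified")
    return (a[1] - a[0]) / (r[1] - r[0])


def respondent_mean(r, a):
    """Naive estimate: share of Y = 1 among all respondents (assumes Z assigned 50/50)."""
    return (a[0] + a[1]) / (r[0] + r[1])


scenarios = [
    # alpha = 0.2, beta = 0.2, gamma = -0.1
    ("no interaction, p = 0.5", 0.5, {(1, 0): 0.4, (1, 1): 0.3, (0, 0): 0.2, (0, 1): 0.1}),
    # same as above plus a Y x Z interaction of +0.05
    ("interaction, p = 0.5", 0.5, {(1, 0): 0.4, (1, 1): 0.35, (0, 0): 0.2, (0, 1): 0.1}),
    # alpha = 0.15, beta = 0.45, gamma = -0.075
    ("no interaction, p = 1/3", 1 / 3, {(1, 0): 0.6, (1, 1): 0.525, (0, 0): 0.15, (0, 1): 0.075}),
]
for name, p_true, rho in scenarios:
    r, a = observed_quantities(p_true, rho)
    print(f"{name}: r = ({r[0]:.3f}, {r[1]:.3f}), a = ({a[0]:.3f}, {a[1]:.3f}), "
          f"instrument = {instrument_estimate(r, a):.3f}, "
          f"respondent mean = {respondent_mean(r, a):.3f}")
```

The ratio recovers p = 0.5 when the no-interaction model is true. It gives about 0.33 when a +0.05 interaction is present. The observed quantities in that second case are exactly those of the third scenario, a no-interaction model with p = 1/3, so without outside information the data cannot tell these two stories apart.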

Perhaps we should propagate uncertainty about these assumptions to our final results. Rod Little’s comments include a suggestion to “use Bayesian modeling, including a prior distribution for unidentified parameters.” Has anyone seen a good example of Bayesian modeling using response instruments?
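As a rough illustration of what “a prior distribution for unidentified parameters” can look like, here is a minimal pattern-mixture-style sketch for a binary Y. It is not tied to response instruments and is not necessarily what Little has in mind.

The unidentified parameter δ is the log-odds difference in P(Y = 1) between nonrespondents and respondents. The data never update δ, so its posterior equals its Normal(0, τ) prior. Monte Carlo draws then carry that uncertainty into the estimate of P(Y = 1). The Beta(1, 1) priors, the prior on δ, and the poll counts are all hypothetical.

```python
import math
import random


def logit(q):
    return math.log(q / (1 - q))


def expit(x):
    return 1 / (1 + math.exp(-x))


def posterior_draws(k, m, n_contacted, tau, draws=20_000, seed=1):
    """Sorted Monte Carlo draws of the population share P(Y = 1).

    k           : respondents with Y = 1
    m           : respondents
    n_contacted : people contacted (m of them responded)
    tau         : prior SD of delta, the unidentified log-odds shift
                  between nonrespondents and respondents
    """
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        # Identified parameters: Beta(1, 1) priors give Beta posteriors
        q_resp = rng.betavariate(1 + k, 1 + m - k)                # P(Y = 1 | R = 1)
        resp_rate = rng.betavariate(1 + m, 1 + n_contacted - m)   # P(R = 1)
        # Unidentified parameter: the data never update it, so draw from the prior
        delta = rng.gauss(0.0, tau)
        q_nonresp = expit(logit(q_resp) + delta)                  # P(Y = 1 | R = 0)
        out.append(resp_rate * q_resp + (1 - resp_rate) * q_nonresp)
    return sorted(out)


def interval(sorted_draws, level=0.95):
    """Equal-tailed interval from sorted draws."""
    lo = sorted_draws[int((1 - level) / 2 * len(sorted_draws))]
    hi = sorted_draws[int((1 + level) / 2 * len(sorted_draws)) - 1]
    return lo, hi


# Hypothetical poll: 10,000 contacted, 1,000 responded, 600 of them with Y = 1
for tau in (0.0, 0.25, 0.5):
    lo, hi = interval(posterior_draws(600, 1000, 10_000, tau))
    print(f"tau = {tau}: 95% interval ({lo:.3f}, {hi:.3f})")
```

With τ = 0 this reduces to assuming that nonrespondents look like respondents (ignorable nonresponse), and the interval reflects sampling uncertainty only. As τ grows, the interval widens to reflect how little the respondents tell us about the 90% who did not respond. Under assumptions like those discussed above, a response instrument would let the data inform δ instead of leaving it entirely to the prior.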

In his comments, Shiro Kuriwaki says that randomized instruments “give us leverage in the face of unobservable confounders, but no leverage comes for free.” He advocates more research into these methods. Agreed!

We’ve been focused on the randomized response instrument Z. But we can’t forget about the response R itself and how its dependence on Y differs across survey recruitment protocols (e.g., web versus text); see Sharon Lohr’s comments. And conditioning on more covariates X increases the plausibility that Y and R are independent within X. So instead of pursuing a randomized response instrument, we could focus on improved recruitment protocols and on enlarging our set of covariates X. Thoughts?
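Here is a small simulation of the covariate point, with everything hypothetical:

- Response depends on a demographic X and on a second variable W (think “interested in politics”).
- W is also associated with Y.
- The population shares of every X × W cell are assumed to be known.

Poststratifying on X alone leaves R dependent on Y within cells, because respondents in each X cell over-represent W = 1. Poststratifying on X × W makes R independent of Y within cells and removes the bias. In practice, as Andrew and Gustavo point out in the comments, variables like this are often not tabulated in the census, so the known-shares assumption is doing real work.

```python
import random
from collections import defaultdict


def poststratify(sample, population_shares):
    """Poststratified mean: sum over cells of population share x respondent mean of Y."""
    totals, counts = defaultdict(float), defaultdict(int)
    for cell, y in sample:
        totals[cell] += y
        counts[cell] += 1
    if any(counts[c] == 0 for c in population_shares):
        raise ValueError("some cell has no respondents")
    return sum(share * totals[c] / counts[c] for c, share in population_shares.items())


def cell_shares(population, key):
    """Population share of each cell defined by key(x, w), assumed known (e.g. from a census)."""
    counts = defaultdict(int)
    for x, w, _, _ in population:
        counts[key(x, w)] += 1
    return {c: k / len(population) for c, k in counts.items()}


rng = random.Random(42)
population = []
for _ in range(200_000):
    x = int(rng.random() < 0.5)                             # hypothetical demographic
    w = int(rng.random() < 0.3)                             # hypothetical "interested in politics"
    y = int(rng.random() < 0.4 + 0.1 * x + 0.2 * w)         # outcome depends on X and W
    responds = rng.random() < 0.05 + 0.05 * x + 0.25 * w    # response depends on X and W only
    population.append((x, w, y, responds))

truth = sum(y for _, _, y, _ in population) / len(population)
respondents = [(x, w, y) for x, w, y, responds in population if responds]

est_x = poststratify([(x, y) for x, _, y in respondents],
                     cell_shares(population, lambda x, w: x))
est_xw = poststratify([((x, w), y) for x, w, y in respondents],
                      cell_shares(population, lambda x, w: (x, w)))
print(f"truth = {truth:.3f}, adjust for X = {est_x:.3f}, adjust for X and W = {est_xw:.3f}")
```

With these settings the X-only estimate should overshoot the truth by roughly 7 percentage points. The X × W estimate should land within sampling noise of the truth.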

P.S. Bailey’s paper also clarifies why Raphael Nishimura doesn’t like the term “representativeness” (see this blog comment): it’s too vague, and it can be used to mean “matches some population demographics,” which may not guarantee much about the outcomes of interest Y.

10 thoughts on “Survey Statistics: a new paradigm”

  1. Gustavo and I wrote a short discussion of the Bailey paper that was published in the same issue of that journal. We wrote:

    We agree with the general point of Bailey (2023) that random sampling is a distant benchmark for real-world polls that either have very low response rates or that are constructed from panels that do not even purport to be random samples of the population. . . . Whether your data come from random-digit dialing, address-based sampling, the internet, or plain old knocking on doors, you’ll have to do some adjustment to correct for known differences between sample and population. (Bailey speaks of “weighting,” but we prefer the term “adjustment,” which encompasses more general possibilities for population inference.)

    The key contribution of Bailey’s article is to emphasize the relevance of differences between sample and population that have not been included in survey adjustments. When a poll oversamples Democrats or Republicans, adjustment for party identification can make a big difference . . . Indeed, even the notorious Literary Digest poll of 1936 could have been made much more accurate simply by adjusting for respondents’ stated votes in the previous election (Lohr and Brick, 2017). Adjusting for party identification or voting history can be challenging because these variables are not tabulated in the census, but imperfectly adjusting could be better than not even trying.

    Political polls oversample people who are interested in politics. This bias is well known, but pollsters are typically interested in voters more than in the general population, so we have not tended to think too much about it. . . . Bailey provides an interesting clue in his Figure 3, which shows a correlation between interest in politics and support for Biden in the 2020 American National Election Study (ANES). This was not something we had expected to see, especially after all the news coverage of passionate Trump voters. . . .

    A challenge here is that many polls adjust for estimated likelihood to vote, which itself could be highly correlated with interest in politics. So, even the direction of any adjustment for interest in politics is not clear, but in any case we should recognize the potential importance of going beyond conventional adjustment variables.

    Our paper has some pretty graphs, too. Maybe I’ll post something on it directly!

    • Thanks, Andrew! I’d like to write the next posts covering the other discussions of Bailey’s paper, including yours, of course. In this one I focused on the response instrument, which I think isn’t directly covered in your paper?

      • Political polls oversample people who are interested in politics.

        Do you think this is also true of polls that recruit in other ways, e.g., online platforms that pay people to take surveys on all topics, where politics is just part of the mix?

        • Shira:

          I can’t be sure, but, yes, I would guess that polls in general, including those from online panels, oversample people who are interested in politics.

  2. In equation 6, why is there a blank space between z and y instead of a comma? Is this some sort of convention to indicate that the product term zy is not allowed in the model?

  3. Thanks, Shira, for digging into this.

    My big-picture take is that our models should reflect the nature of the problem. Every single pollster recognizes that R may depend on Y, and that this is likely a cause of “misses” regarding Trump over the last decade (and of chronic problems in estimating turnout from polls, etc.). So why should we be content with models that assume that away?

    You’ve provided a nice introduction to randomized response instruments. For folks new to the concept, I like to start with the “tilted fish” (see https://hdsr.mitpress.mit.edu/pub/ejk5yhgv/release/4). With ignorable nonresponse (MAR) the average of Y is the same for all levels of response interest. (Each dot is a person; dots on the right really love to respond.)

    With non-ignorable nonresponse (MNAR) the average of Y differs by level of response interest. A randomized response instrument varies the (effective) response interest. Seeing the values of Y differ across treatment groups would be consistent with non-ignorable nonresponse. Seeing no difference would be consistent with ignorable nonresponse.

    As Kuriwaki, Lohr, and others point out, randomized response instruments require assumptions (I go into more detail in the 2025 paper that you linked to). But let’s not forget that conventional MAR-based models assume that R is independent of Y (within X), a potentially strong assumption. Randomized response instruments can allow the effect of a response instrument to vary by X, but not by Y.

    I don’t think randomized response instruments are a magic bullet or the final word, but they do allow us to stop assuming MAR, and they always (afaik) help reduce bias in polling-based estimates of turnout and often suggest that Trump numbers are too weak due to non-ignorable nonresponse.

    I hope there’s a lot more research on the topic — and more broadly, that we move beyond MAR limits. I’ve got a paper using panel data — we can literally observe differences in response interest by individual and we definitely see differences in Y (e.g., turnout or Trump vote) associated with these differences, even when controlling for X.

    • Thank you, Michael!

      I do feel that the MAR assumption can be made more plausible if we work on improved recruitment protocols and on enlarging the set of covariates X that we adjust for (Andrew and Gustavo’s comments relate to this). I’m not sure how to weigh pursuing those lines of research against pushing beyond MAR. Both seem important!

      • Thanks, Andrew!

        Indeed, enlarging the set of covariates X can make a big difference, as you point out in The mythical swing voter:

        The addition of attitudinal variables in the MRP model corrects for differential response rates by party ID and other attitudinal variables at different points in the campaign. Compared to the demographic-only post-stratification (shown in gray), post-stratification on both demographics and party ID greatly reduces (but does not entirely eliminate) the swings in vote intention after the first presidential debate.
