Survey Statistics: more adventures in mismeasured X

Posted on December 30, 2025 4:00 PM by shira

Last week we considered this simple example of measurement error in auxiliary data X:

Y = current 2025 support
X = true 2024 vote choice
X* = response for 2024 vote choice

All three are binary, coded 1 for Democrats and 0 for Republicans. This cartoon example comes from politics (it isn’t meant to be particularly realistic), but measurement error occurs in almost every survey. When does measurement error in an auxiliary adjustment variable negate the gains from adjusting for it to reduce nonresponse bias ?

Suppose we want E(Y), the current 2025 support in the population. If the true X were enough to handle nonresponse bias, then we could estimate this via poststratification:

P(X=1) E(Y | X = 1, sample) + P(X = 0) E(Y | X = 0, sample)

where we have P(X=1) and P(X=0) from the 2024 election results. But we can’t directly estimate E(Y | X = 1, sample) because we only have X* in the sample.

We considered two choices:

Adjust with mismeasured X*: P(X=1) E(Y | X* = 1, sample) + P(X = 0) E(Y | X* = 0, sample)
No adjustment: E(Y | sample)
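With unit-level respondent data, both choices take only a few lines of R. Below is a minimal sketch. It assumes equal respondent weights, binary vectors `y` and `xstar` (hypothetical names), and the election share P(X=1) = 0.49 used in the R code further down. The helper `poststratify()` is also hypothetical. Passing it the true X instead of X* would give the ideal estimator from the start of the post.

```r
# Hypothetical helper: poststratified mean = sum over groups g of
# P(group = g) * mean(y | group = g, sample), assuming equal respondent weights.
poststratify <- function(y, group, pop_share) {
  cell_means <- tapply(y, group, mean)                     # E(Y | group = g, sample)
  stopifnot(all(names(pop_share) %in% names(cell_means)))  # every group must be observed
  sum(pop_share * cell_means[names(pop_share)])
}

# Toy respondents (made-up numbers): y = 2025 support, xstar = recalled 2024 vote
y     <- c(1, 1, 1, 0, 1, 0, 0, 0, 1, 0)
xstar <- c(1, 1, 1, 1, 1, 1, 0, 0, 0, 0)
PX1   <- 0.49                                              # 2024 election share P(X=1)

xstar_adjust <- poststratify(y, xstar, c("1" = PX1, "0" = 1 - PX1))
no_adjust    <- mean(y)                                    # E(Y | sample)
c(xstar_adjust = xstar_adjust, no_adjust = no_adjust)
```

In this toy sample, 60% of respondents recall voting for Democrats, more than the 49% who actually did. Recalled Democrats are also more supportive (4/6 versus 1/4). Adjusting therefore pulls the estimate down, from 0.50 to about 0.454.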

Questions:

A) Which is closer to the truth, E(Y) ?

B) Which is closer to the previous election result, E(X) ?

C) Which is higher for Democrats ?

The answers depend on the distribution of Y, X, X* in the population and in the sample. For question A, I’d guess that adjusting, even with a mismeasured X*, usually gets us closer to the truth, but as we’ll see below, it doesn’t always. For question B, one might think adjusting for a past election always brings us closer to that election’s result, but as we’ll see below, it doesn’t always. For question C, let’s rewrite the no-adjustment estimator by splitting the sample on X*:

E(Y | sample) = P(X* = 1 | sample) E(Y | X* = 1, sample) + P(X* = 0 | sample) E(Y | X* = 0, sample)

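Both estimators average the same conditional means, E(Y | X* = 1, sample) and E(Y | X* = 0, sample). The unadjusted estimator weights them by P(X* = 1 | sample) and P(X* = 0 | sample); the adjusted one weights them by P(X=1) and P(X=0). Subtracting the unadjusted estimate from the adjusted one gives

[P(X=1) - P(X* = 1 | sample)] [E(Y | X* = 1, sample) - E(Y | X* = 0, sample)]

Suppose people who recall voting for Democrats are more supportive of Democrats in 2025. Then adjusting for X* raises the estimated support for Democrats exactly when P(X=1) > P(X* = 1 | sample). In other words, more people voted for Democrats in 2024 than say they did in the sample. This sounds like winner’s bias, but it’s also comparing apples (population) to oranges (sample), so not quite.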
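Here is a quick numerical check of this comparison in R. It uses example 3's parameters from the code below, copied in so the snippet runs on its own; the variable names are illustrative.

```r
# Example 3's world, copied from the EY_calc() call below
p <- c(p11 = 0.781, p01 = 0.648, p10 = 0.550, p00 = 0.297)  # P(Y=1 | X=i, X*=j)
s <- c(s11 = 0.476, s01 = 0.010, s10 = 0.095, s00 = 0.419)  # P(X=i, X*=j | sample)
PX1 <- 0.49                                                 # P(X=1), the 2024 result
stopifnot(abs(sum(s) - 1) < 1e-12)                          # sample cells sum to 1

PXs1   <- s[["s11"]] + s[["s01"]]                                            # P(X*=1 | sample)
EY_Xs1 <- (s[["s11"]] * p[["p11"]] + s[["s01"]] * p[["p01"]]) / PXs1         # E(Y | X*=1, sample)
EY_Xs0 <- (s[["s10"]] * p[["p10"]] + s[["s00"]] * p[["p00"]]) / (1 - PXs1)   # E(Y | X*=0, sample)

no_adjust    <- PXs1 * EY_Xs1 + (1 - PXs1) * EY_Xs0  # E(Y | sample), split on X*
xstar_adjust <- PX1 * EY_Xs1 + (1 - PX1) * EY_Xs0    # same means, population weights

# Gap = (difference in weights) x (difference in conditional means)
gap <- (PX1 - PXs1) * (EY_Xs1 - EY_Xs0)
all.equal(xstar_adjust - no_adjust, gap)             # TRUE
```

Here 0.49 is just above the sample's recalled Democratic share of 0.486, and recalled Democrats are much more supportive. The gap is therefore small and positive (about 0.0017), so adjusting for X* raises the estimate.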

So the answers to these three questions really do depend !

Here is some R code to simulate your own worlds. I made 4 examples so far. Do you think they’re realistic ?

# Y = 2025 support
# X = 2024 vote
# X* = 2024 recalled vote
# p_ij = P(Y=1 | X=i, X*=j) is 2025 Democrat support by X and X*
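# s_ij = P(X=i, X*=j | sample) is the joint distribution of X and X* in the sample (the four must sum to 1)
# s01 (true Republican, recalls voting Democrat) could come from consistency bias
# s10 (true Democrat, recalls voting Republican) could come from winner's bias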
# P(X=1) = 0.49 is the true election result

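EY_calc <- function(p11,p01,p10,p00, s11,s01,s10,s00, PX1=0.49){
  stopifnot(abs(s11 + s01 + s10 + s00 - 1) < 1e-8) # sample cell shares s_ij must sum to 1
  PX0 <- 1 - PX1
  PY1_X1 <- (s11/(s11+s10))*p11 + (s10/(s11+s10))*p10 # P(Y=1 | X=1, sample)
  PY1_X0 <- (s01/(s01+s00))*p01 + (s00/(s01+s00))*p00 # P(Y=1 | X=0, sample)
  # "Truth" assumes true X is enough to handle nonresponse: E(Y | X, sample) = E(Y | X)
  Truth  <- PX1*PY1_X1 + PX0*PY1_X0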

  PY1_Xs1 <- (s11/(s11+s01))*p11 + (s01/(s11+s01))*p01 # P(Y=1 | X*=1, sample)
  PY1_Xs0 <- (s10/(s10+s00))*p10 + (s00/(s10+s00))*p00 # P(Y=1 | X*=0, sample)
  Xstar_adjust <- PX1*PY1_Xs1 + PX0*PY1_Xs0

  no_adjust <- s11*p11 + s01*p01 + s10*p10 + s00*p00 # P(Y=1 | sample)

  # closeness to Truth E(Y)
  closer_truth <- if (abs(Xstar_adjust-Truth) < abs(no_adjust-Truth)) "Xstar_adjust" else "no_adjust"

  # closeness to E(X) (last election result)
  closer_EX <- if (abs(Xstar_adjust-PX1) < abs(no_adjust-PX1)) "Xstar_adjust" else "no_adjust"

  # higher for Democrats
  higher_for_Democrats <- if (Xstar_adjust > no_adjust) "Xstar_adjust" else "no_adjust"

  est <- c(Truth=Truth, Xstar_adjust=Xstar_adjust, no_adjust=no_adjust) 
  list(estimates=signif(est, 3), 
       closer_truth=closer_truth, 
       closer_EX=closer_EX, 
       higher_for_Democrats=higher_for_Democrats) 
}

# 1) Xstar_adjust is closer to Truth E(Y)
EY_calc(
  p11=0.82, p01=0.68, p10=0.42, p00=0.25,
  s11=0.48, s01=0.06, s10=0.07, s00=0.39,
  PX1=0.49
)

# 2) no_adjust is closer to Truth E(Y): the sample's true-X share s11 + s10 = 0.49 already equals P(X=1), so no_adjust equals Truth
EY_calc(
  p11=0.78, p01=0.66, p10=0.46, p00=0.34,
  s11=0.44, s01=0.10, s10=0.05, s00=0.41,
  PX1=0.49
)

# 3) no_adjust is closer to last election E(X)
EY_calc(
  p11=0.781, p01=0.648, p10=0.550, p00=0.297,
  s11=0.476, s01=0.010, s10=0.095, s00=0.419,
  PX1=0.49
)

# 4) Winner’s bias only: s01 = 0, so no consistency bias
EY_calc(
  p11=0.86, p01=0.74, p10=0.40, p00=0.28,
  s11=0.50, s01=0.00, s10=0.10, s00=0.40,
  PX1=0.49
)
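To check that EY_calc()'s closed-form numbers match what an analyst would compute from unit-level data, here is a rough Monte Carlo sketch for example 1. It draws a large respondent sample from the s_ij cells, draws Y from the p_ij, and then poststratifies. It shares EY_calc()'s assumption that true X fully explains nonresponse, so poststratifying on the simulated true X gives the "Truth". The seed, sample size, and helper name `ps()` are arbitrary, hypothetical choices.

```r
set.seed(2025)
n   <- 1e6
p   <- c("11" = 0.82, "01" = 0.68, "10" = 0.42, "00" = 0.25)  # example 1: P(Y=1 | X=i, X*=j)
s   <- c("11" = 0.48, "01" = 0.06, "10" = 0.07, "00" = 0.39)  # example 1: P(X=i, X*=j | sample)
PX1 <- 0.49

cell  <- sample(names(s), n, replace = TRUE, prob = s)  # each respondent's (X, X*) cell
x     <- as.integer(substr(cell, 1, 1))                 # true 2024 vote
xstar <- as.integer(substr(cell, 2, 2))                 # recalled 2024 vote
y     <- rbinom(n, 1, p[cell])                          # 2025 support given the cell

# Hypothetical helper: poststratify v on a binary grouping g using the election share
ps <- function(v, g) PX1 * mean(v[g == 1]) + (1 - PX1) * mean(v[g == 0])

sim <- c(Truth        = ps(y, x),      # uses true X, only possible in a simulation
         Xstar_adjust = ps(y, xstar),
         no_adjust    = mean(y))
round(sim, 3)
```

With n = 1e6, the three simulated values should land within a few thousandths of EY_calc()'s values for example 1 (about 0.534, 0.535 and 0.561). As in EY_calc(), the X*-adjusted estimate is the one near the truth.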

8 thoughts on “Survey Statistics: more adventures in mismeasured X”

MJT on December 30, 2025 at 7:47 pm said:

This sounds related to the calibrating / poststratifying to ‘estimated controls’ literature.

https://academic.oup.com/jssam/article-abstract/4/3/289/2399680

Your X* would be an estimated control.

- shira on January 5, 2026 at 2:40 pm said:
  
  Thanks, MJT !
  
  In this post, our survey sample has a mismeasured X*, but we know the true population share P(X=1) from the past election:
  
  P(X=1) E(Y | X* = 1, sample) + P(X = 0) E(Y | X* = 0, sample)
  
  In the very interesting paper you link to, Dever and Valliant (2016), the survey has the correct X, but a separate “benchmark survey” is used to estimate P(X=1), something like:
  
  P_hat(X=1) E(Y | X = 1, sample) + P_hat(X = 0) E(Y | X = 0, sample)
  
  They write this for a generalized regression estimator in equation (4).
  
  Does this sound right to you ?
  
  - MJT on January 7, 2026 at 7:25 pm said:
    
    You’re right! The two equations you laid out make it clear. I had misunderstood: your original problem was the mismeasured X* in the sample, not in the population.
    
Raphael Nishimura on December 31, 2025 at 8:49 am said:

I posted this in the previous post, but as a reply to Andrew. I ran a simulation study on this and presented it at AAPOR earlier this year. Here are the slides: https://aapor.confex.com/aapor/2025/mediafile/Handout/Paper4566/AAPOR%202025%20-%20Nishimura%20R.pdf

- shira on January 2, 2026 at 11:06 am said:
  
  Thanks so much, Raphael ! I hadn’t seen this extremely relevant work. So cool !
  
  On slide 17, can you explain the formula here ?
  
  Calibrated the respondent sample by:
  …
  Both Z and recalled vote with measurement error, X1 – X6 (Z + Xi)
  
  - Raphael Nishimura on January 5, 2026 at 10:36 am said:
    
    Thanks, Shira! Sorry I didn’t answer this sooner; I have just seen your post.
    This means that I ran 6 different calibration schemes, using both a covariate Z (with no measurement error) and a covariate X_i (with measurement error, corresponding to recalled vote). In these calibration schemes, I assumed different types and levels of measurement error (this is what X_1-X_6 correspond to). Slides 13-16 illustrate each of the measurement error scenarios I evaluated.
    
    - shira on January 5, 2026 at 1:31 pm said:
      
      Thanks, Raphael !
      
      Ah, so you mean you ran 6 different calibration schemes:
      1. Z and X_1
      2. Z and X_2
      …
      6. Z and X_6
      
      ?
    - Raphael Nishimura on January 5, 2026 at 10:23 pm said:
      
      Hi Shira,
      
      Sorry, it seems like I can’t reply directly to your reply.
      
      That’s right, I ran 6 different calibration schemes, each using an X with some sort of measurement error.

Statistical Modeling, Causal Inference, and Social Science
