Last week we considered this simple example of measurement error in auxiliary data X:
- Y = current 2025 support
- X = true 2024 vote choice
- X* = response for 2024 vote choice
All are binary, = 1 for Democrats and 0 for Republicans. This cartoon example is from politics (not meant to be particularly realistic), but measurement error occurs in almost every survey. When does measurement error in an auxiliary adjustment variable negate the gains from adjusting for it to reduce nonresponse bias ?

Suppose we want E(Y), the current 2025 support in the population. If the true X were enough to handle nonresponse bias, then we could estimate this via poststratification:
P(X=1) E(Y | X = 1, sample) + P(X = 0) E(Y | X = 0, sample)
where we have P(X=1) and P(X=0) from the 2024 election results. But we can’t directly estimate E(Y | X = 1, sample) because we only have X* in the sample.
We considered two choices:
- Adjust with mismeasured X*: P(X=1) E(Y | X* = 1, sample) + P(X = 0) E(Y | X* = 0, sample)
- No adjustment: E(Y | sample)
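To make the two choices concrete, here is a minimal sketch on made-up respondent-level data. The sample size, the recall rate of 0.55, and the support rates 0.80 and 0.30 are all assumptions for illustration, not estimates from any survey; only P(X=1) = 0.49 is taken from the post.

```r
# Hypothetical respondents: xstar = recalled 2024 vote, y = 2025 support.
# All simulation settings below are assumptions for illustration.
set.seed(1)
n     <- 1000
xstar <- rbinom(n, 1, 0.55)
y     <- rbinom(n, 1, ifelse(xstar == 1, 0.80, 0.30))
PX1   <- 0.49                      # known 2024 result, as in the post

# Cell means E(Y | X* = 1, sample) and E(Y | X* = 0, sample)
m1 <- mean(y[xstar == 1])
m0 <- mean(y[xstar == 0])

# Choice 1: adjust with mismeasured X*, using known population shares
xstar_adjust <- PX1 * m1 + (1 - PX1) * m0

# Choice 2: no adjustment
no_adjust <- mean(y)

# The same adjustment written as poststratification weights:
# population share of the cell divided by its sample share
w <- ifelse(xstar == 1, PX1 / mean(xstar), (1 - PX1) / mean(xstar == 0))
c(xstar_adjust = xstar_adjust,
  weighted     = weighted.mean(y, w),
  no_adjust    = no_adjust)
```

The weighted mean and the cell-mean formula give the same number, which is just a reminder that poststratifying on X* only reweights the recalled-vote cells.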
Questions:
A) Which is closer to the truth, E(Y) ?
B) Which is closer to the previous election result, E(X) ?
C) Which is higher for Democrats ?
The answers depend on the distribution of Y, X, X* in the population and in the sample. For question A, I’d guess that adjusting, even with a mismeasured X*, usually gets us closer to the truth, but as we’ll see below, it doesn’t always. For question B, one might think adjusting for a past election always brings us closer to that past election’s results, but as we’ll see below, it doesn’t always. For question C, let’s rewrite the no adjustment estimator:
E(Y | sample) = P(X*=1 | sample) E(Y | X* = 1, sample) + P(X* = 0 | sample) E(Y | X* = 0, sample)
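The two estimators use the same cell means and differ only in the weights on them. So adjusting for X* increases support for Democrats if P(X=1) > P(X* = 1 | sample), provided that respondents who recall voting Democratic are more supportive now, E(Y | X* = 1, sample) > E(Y | X* = 0, sample). In other words, more people voted for Democrats in 2024 than say they did in the sample. This sounds like winner’s bias, but it’s also comparing apples (population) to oranges (sample), so not quite.
So the answers to these three questions really do depend !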
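To see why, subtract the no adjustment estimator from the X*-adjusted one. The cell means are shared, so the difference factors as (P(X=1) - P(X*=1 | sample)) times (E(Y | X*=1, sample) - E(Y | X*=0, sample)). Here is a quick numeric check of that identity, using the p and s values of example 1 from the R code further down in this post; the variable names are mine.

```r
# Values of example 1 from the post
p11 <- 0.82; p01 <- 0.68; p10 <- 0.42; p00 <- 0.25
s11 <- 0.48; s01 <- 0.06; s10 <- 0.07; s00 <- 0.39
PX1 <- 0.49

PXs1 <- s11 + s01                               # P(X*=1 | sample)
m1   <- (s11 * p11 + s01 * p01) / PXs1          # E(Y | X*=1, sample)
m0   <- (s10 * p10 + s00 * p00) / (1 - PXs1)    # E(Y | X*=0, sample)

xstar_adjust <- PX1  * m1 + (1 - PX1)  * m0
no_adjust    <- PXs1 * m1 + (1 - PXs1) * m0

# The gap between the two estimators, written as a product
gap <- (PX1 - PXs1) * (m1 - m0)
all.equal(xstar_adjust - no_adjust, gap)
sign(gap)   # negative here: adjusting lowers Democratic support
```

In this world more respondents recall a Democratic vote than the 49% who cast one, so the adjustment pulls the estimate down rather than up.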
Here is some R code to simulate your own worlds. I made 4 examples so far. Do you think they’re realistic ?
# Y = 2025 support
# X = 2024 vote
# X* = 2024 recalled vote
# p_ij = P(Y=1 | X=i, X*=j) is 2025 Democrat support by X and X*
# s_ij = P(X=i, X*=j | sample) is distribution of X and X* in sample
# s01 could come from consistency bias
# s10 could come from winner's bias
# P(X=1) = 0.49 is the true election result
EY_calc <- function(p11, p01, p10, p00, s11, s01, s10, s00, PX1 = 0.49) {
  PX0 <- 1 - PX1
  PY1_X1 <- (s11/(s11+s10))*p11 + (s10/(s11+s10))*p10 # P(Y=1 | X=1, sample)
  PY1_X0 <- (s01/(s01+s00))*p01 + (s00/(s01+s00))*p00 # P(Y=1 | X=0, sample)
  Truth <- PX1*PY1_X1 + PX0*PY1_X0
  PY1_Xs1 <- (s11/(s11+s01))*p11 + (s01/(s11+s01))*p01 # P(Y=1 | X*=1, sample)
  PY1_Xs0 <- (s10/(s10+s00))*p10 + (s00/(s10+s00))*p00 # P(Y=1 | X*=0, sample)
  Xstar_adjust <- PX1*PY1_Xs1 + PX0*PY1_Xs0
  no_adjust <- s11*p11 + s01*p01 + s10*p10 + s00*p00 # P(Y=1 | sample)
  # closeness to Truth E(Y)
  closer_truth <- if (abs(Xstar_adjust-Truth) < abs(no_adjust-Truth)) "Xstar_adjust" else "no_adjust"
  # closeness to E(X) (last election result)
  closer_EX <- if (abs(Xstar_adjust-PX1) < abs(no_adjust-PX1)) "Xstar_adjust" else "no_adjust"
  # higher for Democrats
  higher_for_Democrats <- if (Xstar_adjust > no_adjust) "Xstar_adjust" else "no_adjust"
  est <- c(Truth=Truth, Xstar_adjust=Xstar_adjust, no_adjust=no_adjust)
  list(estimates=signif(est, 3),
       closer_truth=closer_truth,
       closer_EX=closer_EX,
       higher_for_Democrats=higher_for_Democrats)
}
# 1) Xstar_adjust is closer to Truth E(Y)
EY_calc(
p11=0.82, p01=0.68, p10=0.42, p00=0.25,
s11=0.48, s01=0.06, s10=0.07, s00=0.39,
PX1=0.49
)
# 2) no_adjust is closer to Truth E(Y)
EY_calc(
p11=0.78, p01=0.66, p10=0.46, p00=0.34,
s11=0.44, s01=0.10, s10=0.05, s00=0.41,
PX1=0.49
)
# 3) no_adjust is closer to last election E(X)
EY_calc(
p11=0.781, p01=0.648, p10=0.550, p00=0.297,
s11=0.476, s01=0.010, s10=0.095, s00=0.419,
PX1=0.49
)
# 4) Winner’s bias only: s01 = 0, so no consistency bias
EY_calc(
p11=0.86, p01=0.74, p10=0.40, p00=0.28,
s11=0.50, s01=0.00, s10=0.10, s00=0.40,
PX1=0.49
)
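If you would rather draw many worlds at random than hand-pick them, here is a rough sketch. The helper `estimates` is a compact, vectorised restatement of the three estimates in EY_calc above. The way worlds are drawn is entirely my assumption: sorted uniforms for the p's (so that p11 >= p01 >= p10 >= p00, as in the four examples) and a Dirichlet with shapes (8, 1, 1, 8) for the s's, which puts most of the sample in the cells where recall is correct. Different choices will give different answers, so treat the resulting share as a property of these assumptions, not of real surveys.

```r
# Vectorised version of the three estimates computed in EY_calc
estimates <- function(p11, p01, p10, p00, s11, s01, s10, s00, PX1 = 0.49) {
  PX0 <- 1 - PX1
  truth <- PX1 * (s11 * p11 + s10 * p10) / (s11 + s10) +
           PX0 * (s01 * p01 + s00 * p00) / (s01 + s00)
  xstar_adjust <- PX1 * (s11 * p11 + s01 * p01) / (s11 + s01) +
                  PX0 * (s10 * p10 + s00 * p00) / (s10 + s00)
  no_adjust <- s11 * p11 + s01 * p01 + s10 * p10 + s00 * p00
  cbind(truth, xstar_adjust, no_adjust)
}

set.seed(2025)
n_worlds <- 10000

# p's (assumption): four uniforms per world, sorted in decreasing order,
# giving columns p11, p01, p10, p00
p <- t(apply(matrix(runif(4 * n_worlds), ncol = 4), 1, sort, decreasing = TRUE))

# s's (assumption): Dirichlet(8, 1, 1, 8) via normalised gamma draws,
# giving columns s11, s01, s10, s00
g <- matrix(rgamma(4 * n_worlds, shape = rep(c(8, 1, 1, 8), each = n_worlds)),
            ncol = 4)
s <- g / rowSums(g)

est <- estimates(p[, 1], p[, 2], p[, 3], p[, 4],
                 s[, 1], s[, 2], s[, 3], s[, 4])

# Share of simulated worlds where adjusting on X* lands closer to Truth
adjust_closer <- abs(est[, "xstar_adjust"] - est[, "truth"]) <
                 abs(est[, "no_adjust"]    - est[, "truth"])
mean(adjust_closer)
```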
This sounds related to the calibrating / poststratifying to ‘estimated controls’ literature.
https://academic.oup.com/jssam/article-abstract/4/3/289/2399680
Your X* would be an estimated control.
Thanks, MJT !
In this post, our survey sample has a mismeasured X*, but we know the true population totals P(X=1) from the past election:
P(X=1) E(Y | X* = 1, sample) + P(X = 0) E(Y | X* = 0, sample)
In the very interesting paper you link to, Dever and Valliant (2016), the survey has the correct X, but a separate “benchmark survey” is used to estimate P(X=1), something like:
P_hat(X=1) E(Y | X = 1, sample) + P_hat(X = 0) E(Y | X = 0, sample)
They write this for a generalized regression estimator in equation (4).
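A toy sketch of the contrast, under my own simplifying assumptions: the cell means are held fixed at made-up values (0.80 and 0.30) so that the only moving part is the control total, and the hypothetical benchmark survey is a simple random sample of 500 people. This is not the estimator from the paper, just the two-cell formula above with P_hat plugged in.

```r
set.seed(4)

# Two-cell poststratified estimate for a given control P(X=1)
poststrat <- function(control, m1, m0) control * m1 + (1 - control) * m0

m1 <- 0.80   # hypothetical E(Y | X = 1, sample), assumed
m0 <- 0.30   # hypothetical E(Y | X = 0, sample), assumed

# Known control, as in this post: P(X=1) = 0.49 from the election result
known <- poststrat(0.49, m1, m0)

# Estimated control: P_hat from a hypothetical benchmark survey of size 500,
# repeated 2000 times to see how much the estimate moves around
n_bench   <- 500
P_hat     <- rbinom(2000, n_bench, 0.49) / n_bench
estimated <- poststrat(P_hat, m1, m0)

c(known = known, mean_estimated = mean(estimated), sd_estimated = sd(estimated))
```

With a known control the estimate has no spread from this source; with an estimated control it inherits the benchmark survey's sampling noise, scaled by the gap between the cell means.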
Does this sound right to you ?
You’re right! The two equations you laid out make it clear. I had misunderstood: your original problem is the mismeasured X* in the sample, not in the population.
I posted this in the previous post, but as a reply to Andrew. I ran a simulation study on this and presented at AAPOR earlier this year. Here are the slides: https://aapor.confex.com/aapor/2025/mediafile/Handout/Paper4566/AAPOR%202025%20-%20Nishimura%20R.pdf
Thanks so much, Raphael ! I hadn’t seen this extremely relevant work. So cool !
On slide 17, can you explain the formula here ?
Thanks, Shira! Sorry I didn’t answer this before; I have only just seen your post.
This means that I ran 6 different calibration schemes, each using both a covariate Z (with no measurement error) and a covariate X_i (with measurement error, corresponding to recalled vote). In these calibration schemes, I assumed different types and levels of measurement error (this is what X_1 through X_6 correspond to). Slides 13-16 illustrate each of the measurement error scenarios I evaluated.
Thanks, Raphael !
Ah so you mean you ran 6 different calibration schemes:
1. Z and X_1
2. Z and X_2
…
6. Z and X_6
?
Hi Shira,
Sorry, it seems like I can’t reply directly to your reply.
That’s right, I ran 6 different calibration schemes, each using an X with some sort of measurement error.