Survey Statistics: individualism doesn’t work

Posted on October 21, 2025 4:00 PM by shira

This blog series launched with the proverb “it is the people”. So let’s talk about work by some people I’ve loved talking to over the years: Swen Kuh, Lauren Kennedy, Qixuan Chen, Andrew Gelman, and Aki Vehtari. They’ve been thinking about how to evaluate models for multilevel regression and poststratification (MRP).

The typical machine learning loss looks at one individual i at a time: Loss(y_i, yhat_i).

Our post on poststratification focused on estimating a population mean E[Y] with yhat, that is, on minimizing Loss(E[Y], yhat). This differs from the individual-level Loss(y_i, yhat_i). Indeed, Kuh et al. 2023 caution that these two losses may order models differently. In other words, individual-level loss may not be great for choosing models for MRP. Kennedy et al. 2024 propose a way to get at the population-level loss Loss(E[Y], yhat).
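Here is a toy illustration of how the two losses can disagree. It is only a sketch: squared error is my assumption for Loss, the four individuals are treated as the whole population, and the two "models" are made-up prediction vectors, not anything from the papers.

```python
import numpy as np

def individual_loss(y, yhat):
    """Average squared error, one individual at a time."""
    return float(np.mean((y - yhat) ** 2))

def population_loss(y, yhat):
    """Squared error between the population mean and the mean prediction."""
    return float((np.mean(y) - np.mean(yhat)) ** 2)

# Toy population of four individuals (assumed, for illustration only).
y = np.array([0.0, 0.0, 1.0, 1.0])

# Model A tracks individuals fairly well but its average is too low.
model_a = np.array([0.1, 0.1, 0.6, 0.6])
# Model B ignores individuals entirely but gets the average exactly right.
model_b = np.array([0.5, 0.5, 0.5, 0.5])

for name, yhat in [("A", model_a), ("B", model_b)]:
    print(name, individual_loss(y, yhat), population_loss(y, yhat))
```

Model A wins on the individual-level loss and Model B wins on the population-level loss, so the two criteria rank the models in opposite orders.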

This is one of my less-stellar tent pitches. But I have a neighbor ! This is a post about population-level errors (it is the people !), and our tents average to a good pitch.

Kennedy et al. 2024 point out that the first challenge is that we don’t know E[Y] or E[Y|X]:

However, in practice the population truth (cellwise or at the aggregate level) is not available. One option in this case is to approximate the population truth with the sample observation, .

I assume this means what we wrote as Ehat[Y | X = j, sample] in our post on poststratification, also with clumsy notation. In words, the average Y for folks in the sample with the jth possible combination of covariates X. Aggregating these sample averages across X gives the classical poststratification estimate, which we called yhat_PS. In contrast, the MRP estimate yhat_MRP uses a multilevel model for E[Y|X].
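To make the notation concrete, here is a minimal sketch of the classical poststratification estimate. The sample, the population cell counts, and the function names are all made up for illustration. The "model-based" cell estimates at the end are placeholder numbers standing in for the output of a multilevel model, not a fitted one.

```python
import numpy as np

def sample_cell_means(y, cell, n_cells):
    """Ehat[Y | X = j, sample]: average Y among sampled folks in each cell j."""
    y = np.asarray(y, dtype=float)
    cell = np.asarray(cell)
    sums = np.bincount(cell, weights=y, minlength=n_cells)
    counts = np.bincount(cell, minlength=n_cells)
    if np.any(counts == 0):
        raise ValueError("some cell has no sampled individuals")
    return sums / counts

def poststratify(cell_estimates, cell_pop_counts):
    """Weight cell-level estimates by population cell counts N_j."""
    est = np.asarray(cell_estimates, dtype=float)
    weights = np.asarray(cell_pop_counts, dtype=float)
    return float(np.sum(weights * est) / np.sum(weights))

# Hypothetical sample: cell 0 is overrepresented relative to the population.
y_sample = [1, 1, 0, 1, 0, 1]
cell_sample = [0, 0, 0, 0, 1, 1]
pop_counts = [200, 800]  # assumed known population counts per cell

cell_means = sample_cell_means(y_sample, cell_sample, n_cells=2)
yhat_ps = poststratify(cell_means, pop_counts)

# Placeholder cell estimates standing in for a multilevel model's E[Y|X].
yhat_mrp = poststratify([0.70, 0.52], pop_counts)

print(np.mean(y_sample), yhat_ps, yhat_mrp)
```

The only difference between yhat_PS and yhat_MRP in this sketch is where the cell-level estimates come from: raw sample averages in one case, a model in the other. The aggregation step is the same.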

So if I understand, Kennedy et al. 2024 propose to evaluate the MRP estimate by comparing it to the classical poststratification estimate ?

The second challenge is that the same data is used for both fitting the MRP model and assessing its error. As is typical in machine learning, they propose to address this with cross-validation:

K-fold cross validation partitions the sample into K non-overlapping folds. The kth fold is removed to fit the model and score calculated using this fold.

So in their equation (9), we would need to compute the average Y for folks in the kth fold with X = j. For this to be nonempty in every fold, we need at least K folks in each cell j, with at least one of them landing in each fold. With enough covariates X, this is a tall order. In the paper they propose alternatives.
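A quick way to see the problem is to count how many sampled folks fall in each fold-by-cell combination. This is a sketch under my own assumptions: folds are assigned uniformly at random, and the function name and toy data are hypothetical rather than taken from the paper.

```python
import numpy as np

def fold_cell_counts(cell, n_cells, n_folds, seed=0):
    """Randomly assign individuals to folds; count members per (fold, cell)."""
    rng = np.random.default_rng(seed)
    cell = np.asarray(cell)
    # Shuffle positions, then deal them into folds of near-equal size.
    fold = rng.permutation(len(cell)) % n_folds
    counts = np.zeros((n_folds, n_cells), dtype=int)
    np.add.at(counts, (fold, cell), 1)
    return counts

# Toy sample: cell 0 has 10 folks, cell 1 has only 2.
cell = [0] * 10 + [1] * 2
counts = fold_cell_counts(cell, n_cells=2, n_folds=5)

print(counts)
print("empty fold-by-cell combinations:", int(np.sum(counts == 0)))
```

With K = 5 folds and only 2 folks in cell 1, at least 3 folds must have nobody in that cell, whatever the random assignment. Having K or more folks in a cell is necessary but not sufficient under random assignment; stratifying the folds by cell would be one way to guarantee it.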

But I’m still stuck on the first challenge. Am I correctly understanding their proposal to get at a population-level error ?

2 thoughts on “Survey Statistics: individualism doesn’t work”

Blissex on October 22, 2025 at 6:22 am said:

«”However, in practice the population truth (cellwise or at the aggregate level) is not available. One option in this case is to approximate the population truth”»

I am perplexed by that phrase, because if nothing at all is known about the “population truth” then we do not even know whether the sampling process is biased, or even whether it is ergodic.

In a leap of faith we can just assume ergodicity, but we cannot assume unbiasedness, so to me “approximate the population truth” seems nonsensical.

Under the assumption of ergodicity and repeated sampling we can, however, make estimators of the population of *samples*. This may be what the authors mean by “approximate the population truth” and by the related use of “folds”. But to treat an approximation to the population of samples as an approximation to the population seems to be a bit challenging :-). Somebody long ago was fond of saying “the map is not the territory”.

- shira on October 22, 2025 at 2:06 pm said:
  
  Thanks, Blissex !
  
  This is what Kennedy et al. 2024 assume about the sampling process:
  
  The probability of inclusion in the sample for every jth cell is assumed to be the same and denoted pi_j.
  
  But as Gelman, Goel, Rothschild, and Wang 2016 write:
  
  this assumption of cell-level simple random sampling is only reasonable when the partition is sufficiently fine; on the other hand, as the partition becomes finer, the cells become sparse and the empirical sample averages become unstable
  

Statistical Modeling, Causal Inference, and Social Science
