Survey Statistics: individualism doesn’t work (even when weighted)

Last year we saw that individual-level loss may not be great for choosing models for MRP (“individualism doesn’t work”).

Typical machine learning looks at individual-level Loss(y_i, yhat_i).

But for MRP we care about population-level Loss(E[Y], E[yhat_i]) where E[Y] is the unknown population mean and E[yhat_i] is our MRP estimate.
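To make the distinction concrete, here is a minimal Python sketch with made-up numbers. Everything in it is an assumption for illustration: the cell names, the population shares, the two candidate models, squared error as the loss, and above all the "true" population mean, which we would not know in practice.

```python
# Hypothetical toy setup: two poststratification cells with known population shares.
pop_share = {"young": 0.5, "old": 0.5}

# Hypothetical survey responses as (cell, y_i) pairs.
survey = [("young", 1), ("young", 0), ("old", 1), ("old", 1)]

# Two hypothetical models, each giving one prediction yhat per cell.
model_a = {"young": 0.5, "old": 1.0}  # equals the sample cell means
model_b = {"young": 0.6, "old": 0.8}  # pulled toward the middle

def individual_loss(model, data):
    """Average of Loss(y_i, yhat_i) over respondents, with squared error."""
    return sum((y - model[cell]) ** 2 for cell, y in data) / len(data)

def mrp_estimate(model, shares):
    """Poststratified mean of the predictions: sum over cells of share * yhat."""
    return sum(shares[cell] * model[cell] for cell in shares)

def population_loss(model, shares, true_mean):
    """Loss(E[Y], E[yhat_i]) with squared error; true_mean is assumed known here."""
    return (true_mean - mrp_estimate(model, shares)) ** 2

true_mean = 0.7  # pretend truth, for illustration only

for name, model in [("A", model_a), ("B", model_b)]:
    print(name,
          individual_loss(model, survey),
          population_loss(model, pop_share, true_mean))
```

With these invented numbers, model A has the lower individual-level loss while model B has the lower population-level loss, so the two criteria rank the models in opposite order.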

Earlier this month we saw that the model that minimizes individual-level loss in the sample may not be the model that minimizes individual-level loss in the population.

Kuh et al. 2023 tried an individual-level loss weighted to the population, but found that it still ordered models quite differently from the population-level loss. So the issue isn’t just the weighting; it’s the aggregation.
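A rough Python sketch of what a weighted individual-level loss could look like, and how it can still disagree with the population-level loss. This is not the procedure of Kuh et al. 2023: the weights (population share divided by sample share), the toy data, the two models, and the pretend true mean are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: (cell, y_i) pairs in which the young are oversampled.
survey = [("young", 1), ("young", 0), ("young", 0), ("young", 0),
          ("old", 1), ("old", 1)]
pop_share = {"young": 0.5, "old": 0.5}  # assumed known population shares

def weighted_individual_loss(model, data, shares):
    """Squared error per respondent, weighted by population share / sample share."""
    n = len(data)
    counts = Counter(cell for cell, _ in data)
    weights = [shares[cell] / (counts[cell] / n) for cell, _ in data]
    errors = [(y - model[cell]) ** 2 for cell, y in data]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)

def population_loss(model, shares, true_mean):
    """Squared error between the pretend truth and the poststratified prediction."""
    estimate = sum(shares[cell] * model[cell] for cell in shares)
    return (true_mean - estimate) ** 2

model_a = {"young": 0.25, "old": 1.0}  # the sample cell means
model_b = {"young": 0.30, "old": 0.90}  # a hypothetical competitor
true_mean = 0.6  # pretend truth, unknowable in practice

for name, model in [("A", model_a), ("B", model_b)]:
    print(name,
          weighted_individual_loss(model, survey, pop_share),
          population_loss(model, pop_share, true_mean))
```

In this toy case the weighted individual-level loss still prefers model A, while the population-level loss prefers model B. The weighting changes how much each respondent counts, but the errors are still squared one respondent at a time before being averaged.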

OK, but with individual-level Loss(y_i, yhat_i) we have the ground truth y_i in our survey.

For population-level Loss(E[Y], E[yhat_i]) we don’t have the ground truth E[Y].

Kennedy et al. 2024 replace E[Y] with the classical poststratification estimate E[ybar_X] (see the post on poststratification). But this loss is minimized when the multilevel regression (the “MR” of MRP) is ybar_X, a data summary rather than a regularized model. Choosing that way may overfit to the survey data and generalize poorly. This is analogous to minimizing training error for individual-level loss; see ESL p. 221.
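A small Python sketch of why this reference favors the raw cell means. The data, the population shares, and the shrinkage rule (a crude stand-in for a regularized multilevel regression) are hypothetical; only the idea of comparing against the classical poststratification estimate comes from the text.

```python
from collections import defaultdict

# Hypothetical toy data: (cell, y_i) pairs and assumed population shares.
survey = [("young", 1), ("young", 0), ("young", 0), ("young", 0),
          ("old", 1), ("old", 1)]
pop_share = {"young": 0.5, "old": 0.5}

def cell_means(data):
    """ybar_X: the observed mean of y within each cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cell, y in data:
        sums[cell] += y
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def poststratify(per_cell, shares):
    """Population estimate: sum over cells of share * per-cell value."""
    return sum(shares[cell] * per_cell[cell] for cell in shares)

def shrink(means, grand_mean, lam):
    """Hypothetical regularized model: pull each cell mean toward the grand mean."""
    return {cell: lam * m + (1 - lam) * grand_mean for cell, m in means.items()}

ybar = cell_means(survey)
reference = poststratify(ybar, pop_share)  # classical poststratification estimate
grand_mean = sum(y for _, y in survey) / len(survey)

def loss_vs_reference(model):
    """Squared error between the reference and the model's poststratified estimate."""
    return (reference - poststratify(model, pop_share)) ** 2

loss_cell_means = loss_vs_reference(ybar)
loss_shrunk = loss_vs_reference(shrink(ybar, grand_mean, 0.5))
print(loss_cell_means, loss_shrunk)
```

The cell-means "model" reproduces the reference exactly, so its loss is zero by construction, and any regularized model can at best tie it on the data used to build the reference.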

As in ESL, Kennedy et al. 2024 handle this with cross-validation.
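One way cross-validation could look here, as a hedged Python sketch and not necessarily the procedure of Kennedy et al. 2024: fit on the training folds, then compare the model’s poststratified estimate with the classical poststratification estimate computed from the held-out fold. The data, shares, fold assignment, and shrinkage "model" are all hypothetical, and the sketch assumes every cell has at least k respondents so that no fold has an empty cell.

```python
from collections import defaultdict

# Hypothetical toy data: (cell, y_i) pairs and assumed population shares.
survey = [("young", 1), ("young", 0), ("young", 0), ("young", 0),
          ("old", 1), ("old", 1), ("old", 0), ("old", 1)]
pop_share = {"young": 0.4, "old": 0.6}

def cell_means(data):
    """Observed mean of y within each cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cell, y in data:
        sums[cell] += y
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def poststratify(per_cell, shares):
    """Population estimate: sum over cells of share * per-cell value."""
    return sum(shares[cell] * per_cell[cell] for cell in shares)

def fit_shrunk(data, lam):
    """Hypothetical regularized model; lam=1 gives raw cell means, lam=0 full pooling."""
    grand = sum(y for _, y in data) / len(data)
    return {cell: lam * m + (1 - lam) * grand
            for cell, m in cell_means(data).items()}

def cv_population_loss(data, shares, lam, k=2):
    """Average held-out squared error of the population estimate over k folds."""
    # Stratified folds: within each cell, deal respondents round-robin.
    seen = defaultdict(int)
    folds = [[] for _ in range(k)]
    for cell, y in data:
        folds[seen[cell] % k].append((cell, y))
        seen[cell] += 1
    losses = []
    for i in range(k):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        estimate = poststratify(fit_shrunk(train, lam), shares)
        reference = poststratify(cell_means(folds[i]), shares)
        losses.append((reference - estimate) ** 2)
    return sum(losses) / k

for lam in (1.0, 0.5):
    print(lam, cv_population_loss(survey, pop_share, lam))
```

On this toy data the raw cell means (lam=1.0) no longer get a loss of zero once the reference comes from held-out respondents, and the shrunken model (lam=0.5) ends up with the lower cross-validated loss. Eight invented data points prove nothing, of course; the point is only the mechanics.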

6 thoughts on “Survey Statistics: individualism doesn’t work (even when weighted)”

  1. (I just finished writing the following poem and saw the picture of the scenery accompanying this blogpost which I thought was very fitting. I hope it’s okay to share the poem here. Wonderful view!)

    Perhaps the thing with poetry
    Is that it can show what you can see
    In the exact moment in which it might be
    That the path to the view is momentarily free
