Survey Statistics: Poststratification ?

Lots is written on this blog about “Poststratification”. Andrew addresses it formally with a “Mister“. But when I learned it from Alan Zaslavsky’s course it was casually just “Poststratification”. At the time it sounded to me like damage control after we forgot to stratify.

  • Stratification” so that your sample has the same distribution for variables X as in the population: divide the population into strata (i.e. groups) based on X. Then take a sample in each stratum proportional to its size.
  • Post“: divide the population into strata only after the sample is already selected.

(As Raphael Nishimura and Andrew Gelman pointed out in the comments below, stratification has other uses !)

Fancy graphics from a DOL video I worked on:

How can Poststratification help ?

Suppose we want to estimate E[Y], the population mean. But we only have Y in the survey sample. For example, suppose Y is voting Republican. We can use the sample mean, ybar = Ehat[Y | sample] (I don’t know how to LaTeX on this blog).

But our sample mean is conditional on being sampled. And what if survey-takers are more or less Republican than the population ? As Joe Blitzstein teaches us: “Conditioning is the soul of statistics.” Conditioning on being sampled might bias our estimate. But maybe more conditioning can also somehow help us ?! Joe taught me to try conditioning whenever I get stuck.

If we have population data on X, e.g. racial group, then we can estimate Republican vote share conditional on racial group E[Y|X] and aggregate according to the known distribution of racial groups, invoking the law of total expectation (Joe’s favorite): E[Y] = E[E[Y|X]]. So if our sample has the wrong distribution of racial groups, at least we fix that with some calibration. Replacing “E” with estimates “Ehat”, poststratification estimates E[Y] with E[Ehat[Y | X, sample]].

When our estimate of E[Y|X] is the sample mean of Y for folks with that X, the aggregate estimate is classical poststratification, yhat_PS. When our estimate of E[Y|X] is based on a model that regularizes across X, the aggregate estimate is Multilevel Regression (“Mister“) and Poststratification, yhat_MRP. Gelman 2007 shows how yhat_MRP is a shrinkage of yhat_PS towards ybar.

Which estimate is best for estimating E[Y] ? ybar, yhat_PS, or yhat_MRP ?

To answer this, I’d want to compare Loss(E[Y], yhat), Loss(E[Y], yhat_PS), and Loss(E[Y], yhat_MRP) for some population-level loss. This differs from the typical machine learning individual-level losses Loss(y_i, ybar), Loss(y_i, yhat_PS_i), and Loss(y_i, yhat_MRP_i).

As Kuh et al 2023 write:

it is not individual predictions that need to be good, but rather the aggregations of these individual estimates.

Gelman 2007 ends with “Where to go next”:

A parallel approach is through simulation studies—for greater realism, these can often be constructed using subsamples of actual surveys—as well as theoretical studies of the bias and variance of poststratified estimates with moderate sample sizes.

I found 3 papers that have gone there, but I’d like help finding more.
Holt & Smith 1979 compare population-level Loss(E[Y], ybar) to Loss(E[Y], yhat_PS) in a simulation study. They do not include MRP in the simulation. They find that neither is uniformly best, but poststratification is usually much better.
Wang & Gelman 2014 compare individual-level Loss(y_i, yhat_PS_i) to Loss(y_i, yhat_MRP_i) using cross-validation holding out y_i. They show that MRP does best, but is nearly indistinguishable from complete pooling of interactions (something closer to ybar, complete pooling of everything):
Kuh et al 2023 compare population-level loss to individual-level loss in a simulation study. They caution that these losses may order models differently ! Choose your diagnostics carefully. They only consider MRP.
So I have not yet found the comparison I want: population-level loss for unweighted ybar, classical poststratification, and MRP. I think adding MRP to the Holt & Smith 1979 simulation would be interesting ? Can someone do this (my birthday is in October) ?
Do any MRP papers discuss standard errors theoretically ? Gelman 2007 only discusses standard errors for models with noninformative priors (see below). I also think the formulas here have typos ?

11 thoughts on “Survey Statistics: Poststratification ?

  1. “Stratification” = divide the population into strata (i.e. groups) based on some variables … for representativeness. |

    … seems confusing and departs from normal statistical probability sampling — by subjectively manipulation of the gathered Sample and ‘specific’ Population being sampled.

    • I don’t love that definition either. Both because I *really* don’t like the notion of “representativeness”, but also because this is just a specific use of stratification, i.e., allocate the sample proportionately to the population across the strata with an expectation of gains in precision. But there is more of stratification than this.

      • Thanks, Raphael !

        Great point, which Andrew echos below. Stratification has more uses than trying to gain precision for estimating a population average.

        Curious to hear more about your objections to the notion of “representativeness”.

    • It’s a variation of probability sampling (EPSM) but it requires detailed knowledge of the sampling frame prior to selection, which is only possible in specific cases. You have to adjust the standard error estimations because you’re guaranteed not to have a sample that contains members of just one group.

  2. Shira:

    Thanks as always for the posts. Regarding definitions:

    1. As Raphael says, stratification in sampling is not always done with the goal of representativeness. Often a stratum will be purposely oversampled. This especially comes up when you’re sampling organizations rather than people. You can have a stratum with a small number of large organizations that are likely to be included in the sample and then strata with many small organizations, each with low probability of inclusion. You can even have a certainty stratum in which everything gets sampled.

    2. In many settings, stratification is not a choice: you don’t have a single sampling frame, or for reasons of practicality you will do a separate sampling process within each stratum.

    3. One way I sometimes describe poststratification is the analysis of unstratified sample data as if they were stratified.

    4. In modern poststratification (MRP or RPP–that is, multilevel regression and poststratification or, more generally, regularized prediction and poststratification), we are still estimating the population within each stratum, but these estimates are themselves constructed through model fit to all the sampled data. So it doesn’t look like a classical stratified estimate. That said, these same procedures can be used to analyze stratified data too. For the purpose of the analysis, it doesn’t really matter whether the sampling procedure was stratified or not!

    5. I’d also point everyone to our new paper on MRPW (MRP with sampling weights), but Columbia’s server still seems to be down. Once the server is up again, you can find it at this page.

    • If it is done for oversampling purposes then you would need to apply weights when analyzing the combined data. Another way stratification is interesting is if you see it as a kind of limiting case of cluster sampling in which all the clusters are included.

      I often have students use this https://cress.soc.surrey.ac.uk/samp/index.html and compare the results with 10 samples for each of the methods. Stratified random sampling almost always wins on estimates but loses on how much it costs.

    • Thanks, Andrew !

      1. Indeed, Holt & Smith 1979 begins highlighting at least 2 purposes: “STRATIFICATION is one of the most widely used techniques in sample survey design serving the dual purposes of providing samples that are representative of major sub-groups of the population and of improving the precision of estimators.”

      2. yup ! I think this was the case in our Millennium Village Project ? There were budgets and sampling frames per country.

      3. that’s a nice description !

      4. Holt & Smith 1979 introduction also says: “In one way post stratification is potentially more efficient than stratification before selection since after sampling the stratification factors can be chosen in different ways for different sets of variables in order to maximize the gains in precision.”

      5. Looks like the server is back up ! Thanks for pointing me to this paper. Does it compare these MRP(W) also to classical poststratification and no poststratification ?

  3. The most common calculation method for this is probably by using survey weights. Eg in this case, if you think non-response rates differ by race, you would adjust the design weights to compensate for this. It looks like this is the same as the post-stratification calculation you describe.

    When there are multiple calibration variables, however, there is not a unique way to calculate these weights. Are there links between the literature on creating survey weights and the MRP approach?

    • With post-stratification there *is* a unique way to calculate the weights. Post-stratification is the special case where you just match observed and expected numbers for each cell in a partition of the sample space (a ‘saturated model’, sort of), so all the ways to adjust weights turn out to be the same: estimating sampling probabilities (with any link function) or calibration with any distance metric.

      Often with multiple variables you can’t afford post-stratification because there would be too many post-strata, and that’s when there are all the choices.

      • Your last sentence is approaching what I was thinking of. Researchers (or data providers) often have multiple calibration variables, but not the intersection of these variables. So, iterative methods are used to adjust weights to satisfy all the calibration variables simultaneously. But, there are tradeoffs. If calibration variables are inconsistent, you can end up with very high weights on individual cases. So, I’m wondering if the MRP approach is a better approach here.

      • Thanks, Thomas !

        In your book “Complex Surveys” p.142 you write:

        The calibration constraints in equation 7.3 do not uniquely define the weights. For example, if there is only one auxiliary variable X it is always possible to satisfy equation 7.3 by making a large change to the weight for just one observation. One way to completely specify g_i is to also require that the calibrated weights are as close as possible to the sampling weights. That is, for some distance function d ( , ), the calibration weights are chosen to make weight change = sum_i=1^n d(g_i/pi_i, 1/pi_i) as small as possible while still satisfying the calibration constraints. The regression estimator of the population total corresponds to one choice of d ( , ) …

        In the comment you’re saying that when X are a complete cross-classification of a set of categorical variables, then all d( , ) give the same weights ? Is this “obvious” ?

Leave a Reply

Your email address will not be published. Required fields are marked *