Multilevel regression and poststratification (MRP) vs. survey sample weighting

Marnie Downes and John Carlin write:

Multilevel regression and poststratification (MRP) is a model-based approach for estimating a population parameter of interest, generally from large-scale surveys. It has been shown to be effective in highly selected samples, which is particularly relevant to investigators of large-scale population health and epidemiologic surveys facing increasing difficulties in recruiting representative samples of participants. We aimed to further examine the accuracy and precision of MRP in a context where census data provided reasonable proxies for true population quantities of interest. We considered 2 outcomes from the baseline wave of the Ten to Men study (Australia, 2013–2014) and obtained relevant population data from the 2011 Australian Census. MRP was found to achieve generally superior performance relative to conventional survey weighting methods for the population as a whole and for population subsets of varying sizes. MRP resulted in less variability among estimates across population subsets relative to sample weighting, and there was some evidence of small gains in precision when using MRP, particularly for smaller population subsets. These findings offer further support for MRP as a promising analytical approach for addressing participation bias in the estimation of population descriptive quantities from large-scale health surveys and cohort studies.
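In outline, MRP has two stages: fit a multilevel model to the survey data, then average the model's cell-level predictions weighted by census counts for each poststratification cell. Here is a minimal sketch of the poststratification stage only; the cell predictions, counts, and cell structure below are invented for illustration, not taken from the Ten to Men study.

```python
# Sketch of the poststratification step in MRP (hypothetical numbers).
# Assume a multilevel model has already produced a predicted employment
# rate for each poststratification cell; census counts weight the cells.

# (age_group, state) -> model-predicted employment rate for that cell
cell_predictions = {
    ("18-34", "NSW"): 0.82,
    ("35-54", "NSW"): 0.88,
    ("18-34", "NT"):  0.74,
    ("35-54", "NT"):  0.79,
}

# (age_group, state) -> census population count for that cell
cell_counts = {
    ("18-34", "NSW"): 900_000,
    ("35-54", "NSW"): 1_100_000,
    ("18-34", "NT"):  40_000,
    ("35-54", "NT"):  50_000,
}

def poststratify(predictions, counts, keep=lambda cell: True):
    """Population-weighted average of cell predictions over selected cells."""
    cells = [c for c in predictions if keep(c)]
    total = sum(counts[c] for c in cells)
    return sum(predictions[c] * counts[c] for c in cells) / total

# National estimate uses all cells; a state estimate restricts the cells.
national = poststratify(cell_predictions, cell_counts)
nt_only = poststratify(cell_predictions, cell_counts,
                       keep=lambda cell: cell[1] == "NT")
print(round(national, 3), round(nt_only, 3))  # → 0.849 0.768
```

The same machinery produces national and subgroup estimates from one fitted model, which is why the paper can report performance "for the population as a whole and for population subsets of varying sizes."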

This article appeared in 2020 but I just happened to hear about it now.

Here’s the result from the first example considered by Downes and Carlin:

For the dichotomous labor-force outcome, MRP produced very accurate population estimates, particularly at the national level and for the larger states, where the employment rate was estimated within 1% in each case. For the smallest states of ACT and NT, MRP overestimated the employment rate by approximately 5%. Post-hoc analyses revealed that these discrepancies could be explained partly by important interaction terms that were evident in population data but not included in multilevel models due to insufficient data. For example, based on Census data, there was a much higher proportion of Indigenous Australians living in NT (25%) compared with all other states (<5%), but only 3% (n = 2) of the Ten to Men sample recruited from NT identified as Indigenous. There were also differences in the labor-force status of Indigenous Australians by state according to the Census: 90% of Indigenous Australians residing in ACT were employed compared with 79% residing in NT. Due to insufficient data available, it was not possible to obtain a meaningful estimate of this Indigenous status-by-state interaction effect.

And here’s their second example:

For the continuous outcome of hours worked, the performance of MRP was less impressive, with population quantities consistently overestimated by approximately 2 hours at the national level and for the larger states and by up to 4 hours for the smaller states. MRP still, however, outperformed both unweighted and weighted estimation in most cases. The inaccuracy of all 4 estimation methods for this outcome likely reflects that the 2011 Census data for hours worked was not a good proxy for the true population quantities being estimated by the Ten to Men baseline survey conducted in 2013–2014. It is entirely plausible that the number of hours worked in all jobs in a given week could fluctuate considerably due to temporal factors and a wide range of individual-level covariates not included in our multilevel model. This was also evidenced by the large amount of residual variation in the multilevel model for this outcome.

Downes and Carlin summarize what they learned from the examples:

The increased consistency among state-level estimates achieved by MRP can be attributed to the partial pooling of categorical covariate parameter estimates toward their mean in multilevel modeling. This was particularly evident in the estimation of labor-force status for the smaller states of TAS, ACT, and NT, where MRP estimates fell part of the way between the unweighted state estimates and the national MRP estimate, with the degree of shrinkage reflecting the relative amount of information available about the individual state and all the states combined.
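The shrinkage described here can be illustrated with the standard precision-weighting form of partial pooling: a state's estimate is a compromise between its own sample mean and the national mean, with the weight on the state's own data growing with its sample size. All numbers and variance components below are invented for illustration.

```python
# Illustration of partial pooling (invented numbers, not from the paper).
# A state's multilevel estimate sits between its raw sample mean and the
# national mean; the weight depends on sample size and variance components.

sigma2 = 0.16   # assumed within-state (individual-level) variance
tau2 = 0.002    # assumed between-state variance

def partially_pooled(state_mean, n, national_mean):
    """Precision-weighted compromise between state and national means."""
    w = n / (n + sigma2 / tau2)   # weight on the state's own data
    return w * state_mean + (1 - w) * national_mean

# Large state: the estimate stays close to its own sample mean.
large = partially_pooled(0.84, n=2000, national_mean=0.87)
# Small state (an NT-sized sample): pulled well toward the national mean.
small = partially_pooled(0.95, n=60, national_mean=0.87)
print(round(large, 3), round(small, 3))
```

With these assumed variances, the large state's estimate barely moves from its raw mean, while the small state's estimate lands roughly midway between its raw mean and the national mean, mirroring the behavior the authors report for TAS, ACT, and NT.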

We did not observe, in this study, the large gains in precision achieved with MRP seen in our previous case study and simulation study. The multilevel models fitted here were more complex, including a larger number of covariates and multiple interaction effects. While we have sacrificed precision, this increased model complexity appears to have achieved increased accuracy. We did see small gains in precision when using MRP, particularly for the smaller states, and we might expect these gains to be larger for smaller sample sizes where the benefits of partial pooling in multilevel modeling would be greater.

Also:

The employment outcome measures considered in this study are not health outcomes per se; rather, they were chosen in the absence of any health outcomes for which census data were available to provide a comparison in terms of accuracy. We have no reason to expect MRP would behave any differently for outcome measures more commonly under investigation in population health or epidemiologic studies.

MRP can often lead to a very large number of poststratification cells. Our multilevel models generated 60,800 unique poststratification cells. With a total population size of 4,990,304, almost three-fourths of these cells contained no population data. This sparseness is not a problem, however, due to the smoothing of the multilevel model and the population cell counts used simply as weights in poststratification.
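The point about sparseness is easy to see mechanically: the model produces a prediction for every cell, but a cell with a census count of zero contributes zero weight to the poststratified average, so its prediction never influences the estimate. A toy illustration with hypothetical cells:

```python
# Toy illustration (hypothetical cells): cells with zero census count drop
# out of the poststratified estimate, so sparse cross-classifications are
# harmless as long as the model can predict for every cell.

cells = [
    # (model prediction, census count)
    (0.85, 120_000),
    (0.78, 30_000),
    (0.90, 0),   # no one in the population has this covariate combination
    (0.60, 0),   # this prediction never influences the estimate
]

estimate = sum(p * n for p, n in cells) / sum(n for _, n in cells)
print(round(estimate, 4))  # → 0.836, identical with or without the empty cells
```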

They conclude:

Results of this case-study analysis further support previous findings that MRP provides generally superior performance in both accuracy and precision relative to the use of conventional sample weighting for addressing potential participation bias in the estimation of population descriptive quantities from large-scale health surveys. Future research could involve the application of MRP to more complex problems such as estimating changes in prevalence over time in a longitudinal study or developing some user-friendly software tools to facilitate more widespread usage of this method.

It’s great to see people looking at these questions in detail. Mister P is growing up!

4 thoughts on “Multilevel regression and poststratification (MRP) vs. survey sample weighting”

  1. “This article appeared in 2000 but I just happened to hear about it now.”

    I’m guessing that should be 2020.

    Also, it seems that the more a field has its models tested against real outcomes (e.g., surveys scored against actual election results, or machine learning benchmarks), the more it embraces regularization and Bayesian methods.

  2. I wonder how doubly robust methods might fit in here. (These would generally involve modeling both the outcome and the selection into the sample.)
