Include all design information as predictors in your regression model, then poststratify if necessary. No need to include survey weights: the information that goes into the weights will be used in any poststratification that is done.

David Kaplan writes:

I have a question that comes up often when working with people who are analyzing large-scale educational assessments such as NAEP or PISA. They want to do some kind of multilevel analysis of an achievement outcome such as mathematics ability predicted by individual- and school-level variables. The files contain the actual weights based on the clustered sampling design. There is a strong interest in using Bayesian hierarchical models, and they have asked me how to handle the weights. What I tell them is that if the data set contains the actual weighting variables themselves (the variables that went into creating the weights to begin with), then having those variables in the model along with their interactions is probably sufficient. The only thing is that they would not want an interaction if one or more of the main effects is null. I am not aware of a way to actually use the weights that are provided in a hierarchical Bayesian analysis. I am hoping that the advice I am giving is somewhat sound, because as with all surveys, the sampling design must be accounted for. What are your thoughts on this?

My reply: As you say, I think the right thing to do is to include the design factors as predictors in the analysis, and then poststratify if there is interest in population-level averages. In that case it is better not to use the weights. Or, to put it another way, the information that goes into the weights will be used in any poststratification that is done.
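
To sketch what this looks like in R (just a toy example, not code from any of the analyses discussed here: the data frame svy, the design variables region and educ, and the poststratification table ps with population cell counts N are placeholders):

```r
# Minimal MRP sketch: put the design information in the model as predictors,
# then poststratify the model's predictions to get a population estimate.
library(rstanarm)

# svy: survey data with a binary outcome y and design variables region, educ
fit <- stan_glmer(
  y ~ (1 | region) + (1 | educ),        # design variables as predictors
  family = binomial(link = "logit"),
  data = svy
)

# ps: one row per region x educ cell, with known population counts N
cell_epred <- posterior_epred(fit, newdata = ps)      # draws x cells

# Population estimate: average the cells, weighted by their census counts
popn_draws <- cell_epred %*% ps$N / sum(ps$N)
c(estimate = mean(popn_draws), sd = sd(popn_draws))
```

The survey weights never enter the fit; the variables that would have gone into the weights enter as predictors, and the population cell counts do the rest at the poststratification step.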

18 thoughts on “Include all design information as predictors in your regression model, then poststratify if necessary. No need to include survey weights: the information that goes into the weights will be used in any poststratification that is done.”

  1. Weighting or not weighting? It looks to me like a bias-variance trade-off.

    When the sample is not representative, not weighting yields a biased estimate, but weighting increases variance (which is annoying when you are after statistically significant results…)

    Does post-stratification increase the variance (of the population estimate) as weighting does?

    PS: if it does not, it would help me convince my boss to move to Bayesian analysis and post-stratification.

    • Alain:

      No method corrects bias from unmodeled variables, but, to the extent that weighting corrects for bias, poststratification does so too. No tradeoff needed. Poststrat increases variance if you poststratify the raw data but not if you first do multilevel regression or regularized prediction. See here and here.

      • Thanks for your reply. Is multilevel modeling still needed if the bias comes from a SINGLE variable?

        For example, with respondent-driven sampling (RDS), participants with a larger social network get over-represented in the sample. It is common practice to correct for this bias by using weights that are inversely proportional to the degree of participants (i.e. number of social connections). The adjustment is therefore based on a single variable.

        Assuming the degree distribution of the target population were known (which is not the case in practice…), would controlling for degree and then post-stratifying work? No need for multilevel modeling in that case?

        • Alain:

          I discuss respondent-driven sampling here. The short answer is that in real-world respondent-driven sampling, you don’t actually know the probability of selection. The way I recommend analyzing such data is to poststratify on gregariousness (number of social connections) or some similar variable. The distribution of gregariousness in the population won’t be known, so that itself will have to be estimated from the sample.
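
For illustration, a rough sketch of that workflow in R (all names are placeholders, and the 1/degree adjustment used to estimate the unknown population distribution of gregariousness is just one crude possibility assumed for this example, not a specific recommendation):

```r
# Sketch: bin respondents by degree (number of social connections), fit a
# multilevel model with a degree-bin effect, and poststratify. The population
# shares of the bins are unknown, so here they are crudely estimated from the
# sample by down-weighting each respondent in proportion to their degree
# (assuming selection probability is roughly proportional to degree).
library(rstanarm)

rds$degree_bin <- cut(rds$degree, breaks = c(0, 2, 5, 10, Inf))

fit <- stan_glmer(y ~ (1 | degree_bin), family = binomial(), data = rds)

# Estimated population shares of the degree bins
w  <- 1 / rds$degree
ps <- aggregate(w, by = list(degree_bin = rds$degree_bin), FUN = sum)
names(ps)[2] <- "share"
ps$share <- ps$share / sum(ps$share)

# Poststratify the model's predictions over the estimated bin shares
cell_epred <- posterior_epred(fit, newdata = ps)
popn_draws <- cell_epred %*% ps$share
mean(popn_draws)
```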

        • I infer that your weights model something like probability of response, to account for oversampling, and that’s increasing variance. But that approach does nothing to account for (potential) similarity of responses among those with similarly-sized networks. What if you created “classes” representing different ranges of network size, via median split or some substantive/theoretically-driven partitioning? It seems plausible that people with a similar number of connections might respond similarly (or, if on different sides of an issue, might hold similarly extreme/moderate views). If, in fact, the lower third of network sizes all respond similarly, and the same for the middle and upper thirds, that could soak up a lot of the variance your weights are adding in. Put another way, you currently have up to n classes based on number of connections, with a different weight assigned to each class. You might reduce the number of classes to as few as 2 or 3, then apply the weights within classes. Not only might the classes themselves absorb variance, but the nested weights might also add less variance (since the class itself already explains some of the oversampling). I should disclose that surveys aren’t my area of expertise, but it sounds like a cool idea you could try and check the ICCs.

  2. Recently I have been thinking that survey weights themselves are just some meta-features. Just as we would encode state-level information into a state-level regression, isn’t it sensible to also include survey weights as an input to the multilevel regression?

  3. As someone who does research on PISA, and who tries to be orthodox in terms of following instructions (lack of knowledge?), I would like to briefly explain PISA’s (and other international large-scale assessments’) weights. First, there are two different weights: students’ weights and senate weights. The first represents a complex multilevel probability of being selected (student/class/school/non-response…). Senate weights just equalise countries.

    Previous research on multilevel PISA analysis suggests using weights at level 1, but scaling weights for level 2 (in this case, schools). For example: Rabe‐Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society: Series A (Statistics in Society), 169(4), 805-827. There is a certain consensus recommending not to use weights at level 2, as the student weight (at level 1) already corresponds to the inverse of the joint probability of selection for a particular student in a particular school. See: Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39(2), 142-151.

    Additionally, in the case of PISA, there are 80 replicate weights (BRR) to address sampling variance. In R, for example, BIFIEsurvey can handle two-level regressions with weights, replicate weights (and plausible values as outcome variables). I am not sure if there is any Bayesian ‘ready-to-use’ approach.

    Obviously this is far from MRP but, in some cases, it is worth knowing about when it is not possible to use post-stratification (in my case, I do cross-country models).

    • Or the Fragile Families Survey. There are many other surveys where the sampling design variables are not included in the public use files. What is the procedure in that situation?

    • Thomas, Aleja:

      There are 3 ways of handling this sort of non-census variable in MRP:

      1. Compute weighted averages within poststratification cells, as we do here; see here for clean code (a rough sketch along these lines also appears after this list).

      2. Include these additional variables in the poststratification and do some modeling to estimate the population distribution; see here for an early example.

      3. Adjust for enough census variables that you don’t feel you have to worry about any remaining adjustment variables. Remember that the weights are just a means to an end, which is inference for the general population.
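
For option 1, a quick sketch in R (toy names throughout: a survey data frame svy with outcome y and weights w, census variables sex and region, and a census table census_counts with cell counts N):

```r
# Option 1 sketch: average responses within each poststratification cell using
# the survey weights (to absorb whatever non-census adjustment the weights
# carry), then combine cells with known census counts as usual.
library(dplyr)

cell_means <- svy %>%
  group_by(sex, region) %>%                                  # census variables
  summarise(ybar = weighted.mean(y, w), .groups = "drop")

estimate <- cell_means %>%
  left_join(census_counts, by = c("sex", "region")) %>%      # N = cell count
  summarise(est = sum(ybar * N) / sum(N))
```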

      • Hi Andrew,

        Another related and probably simple question. I understand the general idea of MRP when all of the variables are survey strata variables – e.g. sex, race, region, … etc. However, in practice, these are not the only variables of interest. Other non-strata type variables such as attitudes, opinions, etc., are going to be in a regression model. In my original post, I mentioned student and school factors that are not strata type variables. How are those estimates handled in MRP? Is it reasonable to assume that there is no adjustment that can be made to them and that posterior predictive distributions would still be better when combined with strata-variables that have been post-stratified? I guess what I am really asking is how does one do post-stratification in a Bayesian regression analysis when some of the variables are strata (e.g. sex) for which I can get population estimates, and some are not. I hope that made sense.

        • Could you make predictions conditional on the various variables in the regression, and then weight these predictions according to the strata to get weighted avg population predictions? This seems obvious enough that I imagine I’m missing the point.

  4. Another related and probably simple question. I understand the general idea of MRP when all of the variables are survey strata variables – e.g. sex, race, education, etc. However, in practice, these are not the only variables of interest. Other non-strata type variables such as attitudes, opinions, etc., are going to be in a model. In my original post, I mentioned student and school factors that are not strata type variables. How are those estimates handled in MRP? Is it reasonable to assume that there is no adjustment that can be made to them and that posterior predictive distributions would still be better when combined with strata variables that have been post-stratified? I hope that made sense.

    Thanks.

  5. Hi all, I read somewhere (though I can’t remember where) that the use of sampling weights is a violation of the likelihood principle insofar as new observations are being created that are never observed. I see the logic in that, but was wondering if there was some consensus to that view. Also, if anyone knows a reference for that claim, please pass it on.

    Best,

    David

    • Many legitimate procedures will violate the likelihood principle: posterior predictive checks, for example. Indeed, any generative/predictive procedure is likely to violate the likelihood principle. The sampling weights and the poststratification imply different inference procedures, but would lead to the same post-inference generative predictions, so I think they equally violate the likelihood principle.

  6. This is interesting. I would think PPCs are not a violation of the likelihood principle insofar as the posterior predictive distribution was obtained after accounting for the likelihood of the data in hand. I see this as different from, say, the usual criticism of the p-value as referencing observations that never occurred (as in Jeffreys’).
