Survey weighting is a mess

Dave Judkins writes, regarding my paper “Struggles with Survey Weighting and Regression Modeling”:

I am hoping you might be able to clarify a point in your approach. How does a variable like number of phone lines in the house get used in equation 5? (Given that N.pop and X.pop are not available.) Does your work in Section 3 apply only to X variables with known population distributions?

My reply:

My student and I are working on how to deal with these “non-census variables.” The Bayesian answer is that you need to know the N’s for the crosstabs of these non-census variables and the census variables. Since only the census variables are known, the relative N’s for the non-census variables are unknown: they are random variables, they need a prior distribution, and so on. Inference on them is done by making an assumption about selection probability (e.g., that households with multiple phones are twice as likely to be picked, and households with intermittent service are half as likely to be picked, compared to households with one phone line). My conjecture is that if you have a simple flat prior on the unknown multinomial probabilities, this reduces to some sort of inverse-probability-weighting analysis, and that maybe one can do better using a more structured prior (i.e., a hierarchical model). But for now it’s all talk and no action from me on this!
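To make the inverse-probability-weighting idea concrete, here is a minimal sketch in Python. It is only an illustration of the selection-probability assumption described above, not the method from the paper: the relative selection probabilities (2, 1, 1/2) come from the example in the text, but the category names, the function names, and the sample counts are all made up, and the calculation is assumed to happen within a single cell defined by the census variables.

```python
# Relative selection probabilities, taken from the example in the text:
# multiple-line households are twice as likely to be picked, and
# intermittent-service households half as likely, as one-line households.
# The category labels themselves are hypothetical.
REL_SELECTION = {"one_line": 1.0, "multiple_lines": 2.0, "intermittent": 0.5}


def ipw_population_shares(sample_counts, rel_selection=REL_SELECTION):
    """Estimate population shares of each category from sample counts.

    Each sample count is divided by its relative selection probability,
    and the results are normalized to sum to 1.
    """
    weighted = {k: n / rel_selection[k] for k, n in sample_counts.items()}
    total = sum(weighted.values())
    return {k: w / total for k, w in weighted.items()}


def ipw_mean(values, categories, rel_selection=REL_SELECTION):
    """Weighted mean of a survey response, weighting each respondent
    by the inverse of the relative selection probability."""
    weights = [1.0 / rel_selection[c] for c in categories]
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


# Made-up sample counts within one census cell (illustration only).
sample_counts = {"one_line": 60, "multiple_lines": 30, "intermittent": 10}
shares = ipw_population_shares(sample_counts)
```

With these made-up counts, the multiple-line households make up 30% of the sample but a smaller estimated share of the population, because they were assumed to be overrepresented. A hierarchical model would replace the fixed normalization step with a prior on the unknown shares, which this sketch does not attempt.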

Dave replied with some information on how they adjust for non-census variables at Westat, along with a link to this recent paper of his on sample-based raking, work that started around 1987:

Judkins, D., Nadimpalli, V. and Adeshiyan, S. (2005). Replicate control totals. Proceedings of the Section on Survey Research Methods of the American Statistical Association, pp 3167-3171.

That reminds me of this 2001 paper by Cavan, Jonathan, and me on poststratifying without population-level information.