Mike Larsen asks,

I [Mike] have one specific question about your article in Statistical Science on weighting and multi-level regression models: do the regression results in Table 1 use the procedure you describe in section 1? That is, does it include interactions between X and z in the model, or does it use design variables with main effects for the relation (y on z) of interest and simply report the coefficient for y on z? I couldn’t really tell, but perhaps I missed something.

I guess I have another question: on page 157 in the last full paragraph you state that it is not clear why a simple linear regression of y on z in the entire population would be of interest. That implies that it is not of interest. The first line of 1.4 discusses the regression of y on z. If we had all the data in the population, would we not simply compute the simple linear regression parameter estimates and report those as the relationship between y and z (assuming linearity)? If not, what are we trying to estimate with the E(y|z) function? I understand that it would be more interesting to look at y on z and X if we had tons of data, but that did not appear to be the motivation at the start of 1.4.

Related to this, I see that the population proportions of men and women enter into equation (4) through Bayes’ theorem because you don’t have many people of a single height. In the second example (page 158) you might have E(male|white=1) etc. from population data, such as census data in the geographical area. You could use that, couldn’t you, instead of the proportions white among males in the sample and then Bayes’ theorem?

Finally, about implementing this idea, perhaps we need groups of statisticians inside federal agencies to build recommendations for multilevel models for various outcomes and relationships among variables in place of (or in addition to) the survey statisticians developing complicated weights? What do you think?

My reply:

1. The details are given in the second column of p.158. The model does not include interactions, and we just use the coefficient of z.

2. My point on p.157 that you noted is that, once you consider an additional predictor in the model, you have to consider that the regression of y on z might not be linear. In which case, yes, you can certainly create some summary such as the slope that you’d get by regressing y on z given all the data–but it’s not clear why you’d want it. The E(y|z) function is still clearly defined, though, even if nonlinear.

There’s a paper by Korn and Graubard in the American Statistician several years ago that discusses this point.

3. For equation (4), even if you had many people at any single height, you’d want to adjust using the population distribution of men and women, to correct for differential nonresponse rates. In the Social Indicators Survey example, yes, we poststratified using census numbers.

4. Yes, I think that what is needed is a set of worked examples showing how the hierarchical modeling can work. Once we have the examples, we can have guidelines. But I don’t really have the examples yet–note the “struggles” in the title!
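As a rough illustration of the adjustment in point 3, here is a minimal Python sketch of poststratifying a sample mean to known population shares of men and women. The data, the 50/50 census shares, and the function names are all made up for illustration; this is generic poststratification under those assumptions, not the exact form of equation (4) in the article.

```python
from collections import defaultdict


def cell_means(records):
    """Return the mean outcome within each cell (here, each sex)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cell, y in records:
        totals[cell] += y
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


def poststratified_mean(records, population_shares):
    """Weight each cell's sample mean by its known population share.

    Raises KeyError if a population cell has no respondents in the sample.
    """
    means = cell_means(records)
    total_share = sum(population_shares.values())
    return sum(
        means[cell] * share / total_share
        for cell, share in population_shares.items()
    )


# Hypothetical sample in which women responded at a higher rate than men.
sample = [
    ("male", 10.0), ("male", 12.0),
    ("female", 20.0), ("female", 22.0), ("female", 24.0), ("female", 18.0),
]

# Hypothetical census shares (assumed, not taken from the article).
census_shares = {"male": 0.5, "female": 0.5}

raw_mean = sum(y for _, y in sample) / len(sample)
adjusted_mean = poststratified_mean(sample, census_shares)

print("raw sample mean:     ", raw_mean)
print("poststratified mean: ", adjusted_mean)
```

In this toy sample the raw mean is pulled toward the women's values because they are overrepresented; reweighting the cell means to the census shares removes that particular distortion.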

Andrew, is table 3 in Gelman (2007) mislabeled relative to table 14 in Lu and Gelman, which seems to be the source?

Example: Inverse prob in table 3 starts at 1.9; in table 14 it is 2.5.

[This comment need not be posted.]

Z,

Yes, you're right. That's so frustrating! Assuming inverse-probability gives higher se's because it doesn't make use of the information that the marginal totals are known.

What this blog posting has done for me (a fan letter):

I started reading Gelman (2007, "Struggles with … weighting…") because I do a lot of work developing weights and working with weighted data. I got hooked on the first sentence, "Survey weighting is a mess".

During the course of spending the day with this paper, I made substantial headway on two problems.

First, as part of a project to provide better error ranges on our data, I'd run a bunch of comparisons with simple random sample, jackknife, and two Horvitz-Thompson type (inverse probability) formulas. I was puzzled because the inverse probability methods gave higher variance estimates than the jackknife, and in such circumstances coding errors are the most likely explanation. But we couldn't find any in a detailed trace or in code replicated by another programmer. So, my wanderings into Lu and Gelman were helpful in discovering that the inverse probability methods are likely to overstate the variance (particularly if auxiliary information such as census marginals is known). I'm pretty sure my overstatement is in the same ballpark as in Lu and Gelman. Now I can get this project back off the table.

Second, some years ago I introduced "Effective Sample Size" as a management metric to provide a single number for sample quality tracking. But I lacked a good, fairly understandable explanation of how this metric worked, one provided by a statistician of some repute in published literature rather than something I wrote that I can't publish due to company IP restrictions. The Kish (1992) paper referenced in Gelman (2007) does this nicely.
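For readers who have not seen the metric, the approximation usually attributed to Kish is n_eff = (sum of weights)^2 / (sum of squared weights). Below is a minimal Python sketch of that formula; the function name and the example weights are hypothetical, and the formula is a common rule of thumb rather than a full design-based variance calculation.

```python
def effective_sample_size(weights):
    """Kish-style approximation: (sum of w)^2 / (sum of w^2)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# Equal weights: the effective sample size equals the actual sample size.
equal = effective_sample_size([1.0, 1.0, 1.0, 1.0])

# Unequal weights (hypothetical): the effective sample size drops below 4.
unequal = effective_sample_size([1.0, 1.0, 2.0, 2.0])

print("equal weights:  ", equal)
print("unequal weights:", unequal)
```

The result does not depend on how the weights are scaled, so multiplying every weight by a constant leaves the effective sample size unchanged.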

The really nice thing is that reading this article has helped me substantially with two problems, and I'm only part way through it.