How large a sample is needed for a multilevel model

Shang Ha writes,

I have a quick question on Bayesian hierarchical models.

I’ve never gotten a solid answer about the issue of how many respondents should be nested within level-2 units. If I remember right, Bryk and Raudenbush suggest AT LEAST THREE, but recommend more for conventional HLM. But I cannot find any suggestions in terms of Bayesian hierarchical models.

The survey dataset I have comprises the following:

68 metropolitan areas (the number of respondents ranges from 5 to 37)
187 zip code areas (the # of respondents ranges from 1 to 13)
211 census tracts (the # of respondents ranges from 1 to 10)

Can I build hierarchical models using zip code areas and census tracts? If it is not possible in the case of conventional HLM, then can I apply a Bayesian hierarchical model?

My response: yes, you can do it, using either Bayesian or non-Bayesian hierarchical modeling. The two varieties of hierarchical modeling are almost the same; the main difference is that the Bayesian approach models the variance parameters (averaging over the uncertainty in them) and the non-Bayesian approach uses point estimates of them. Typically there is no real difference between the Bayesian and non-Bayesian results unless there are variance parameters estimated near zero or group-level correlation parameters estimated near 1 or -1. All hierarchical models are Bayesian for the purpose of estimating the varying coefficients.

You don’t need 3 observations per group to estimate a multilevel model. 2 observations per group is fine. In fact, it’s even ok to have some groups with 2 observations and some with only 1 observation. I don’t know why anyone would say that 3 or more observations per group are required.
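
To see why a group with a single observation causes no trouble, here is a minimal sketch in Python of the partial-pooling estimate for one group's intercept. It assumes the simplest varying-intercept model, with the overall mean and both standard deviations treated as known; in a real analysis these would be estimated from all the groups together. The function name `partial_pool` and the numbers are made up for illustration and are not from the survey described above.

```python
import math

def partial_pool(y_group, mu, sigma_y, sigma_alpha):
    """Return (estimate, standard error) for one group's intercept.

    Assumes y_i ~ N(alpha_j, sigma_y^2) within the group and
    alpha_j ~ N(mu, sigma_alpha^2) across groups, with mu, sigma_y,
    and sigma_alpha known (an illustrative simplification).
    """
    n = len(y_group)
    prec_data = n / sigma_y**2        # precision contributed by the group's own data
    prec_group = 1 / sigma_alpha**2   # precision contributed by the group-level model
    ybar = sum(y_group) / n if n else 0.0
    est = (prec_data * ybar + prec_group * mu) / (prec_data + prec_group)
    se = math.sqrt(1 / (prec_data + prec_group))
    return est, se

# Hypothetical groups with 0, 1, 2, and 10 observations, all equal to 4.0
for y in ([], [4.0], [4.0, 4.0], [4.0] * 10):
    est, se = partial_pool(y, mu=0.0, sigma_y=2.0, sigma_alpha=1.0)
    print(len(y), round(est, 3), round(se, 3))
```

Nothing breaks at n = 1, or even at n = 0: the estimate is simply pulled more strongly toward the overall mean, and its standard error is larger. As the group size grows, the estimate moves toward the group's own average and the standard error shrinks.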

4 thoughts on “How large a sample is needed for a multilevel model”

  1. I have read the same thing myself. If you have a large proportion of groups that have only one case in them, what in general will be affected most by this, i.e. will the estimates of the coefficients have inflated standard errors?

  2. I wonder if there is a reference that discusses the number of observations one needs to do multi-level analyses. Is there anything I can cite? I would very much like to keep all my level 2 units and not drop those that have fewer than three observations.

  3. Sandra

    We discuss this in Section 12.9 of our forthcoming book, so you can cite that. But, from my perspective, the burden is on anyone coming from the other direction and saying that you need 3 observations or whatever. As Mark notes, if you have fewer data you'll have bigger standard errors, but that's OK.

  4. I have a national cross-sectional dataset (census) and want to estimate an HLM of determinants of household-level mortality experience. My dependent variable is left-censored at zero. I would like to include a two-stage process by assuming that the censoring is determined endogenously. What's the most efficient way to do this in an HLM framework?
    Felix
