No, you don’t need 20 groups to do a multilevel analysis

Anna Reimondos writes,

I stumbled upon your blog while reading up on multilevel modeling, and have found it really useful. I have been itching to ask a question on it but was not sure if the correct procedure was to email you personally or to write in a comment on the blog. Many apologies if I was meant to type my comment/question directly in the blog.

My question is very simple (stupid?). I am just about to embark on my PhD thesis where I will be looking at a number of research questions relating to demographic behaviour in Europe. I will be conducting secondary analysis on a survey of 11 European countries.
I was hoping to use multilevel modeling, as not much research seems to have been done using this method in my field of research. While I have never used multilevel modeling before, from what little I have read so far it seems that for a lot of questions, including country-level explanatory variables might explain a lot of the behaviour observed.

The level 1 would be the individual respondents and level 2 the countries. Is 11 countries enough to use multi-level modeling? I read that around 20-30 units of analysis were required at the highest level, so I am a bit worried.

My reply:

1. Email is a fine way to send a question.

2. Yes, you can include country-level explanatory variables. No, I don’t think you need 20 or more groups. Track down who told you that and find out why they said it. Maybe they have a good reason; if so, I’d like to hear it…

3. For many examples of analyses across countries, take a look at the recent issue of Political Analysis that contains this article.

8 thoughts on “No, you don’t need 20 groups to do a multilevel analysis”

  1. I was under the impression from somewhere that 20 (or so) were required because the inferences at level 2 rely on large sample theory. This is the case with sandwich estimator corrections (that's in Murray's book Design and analysis of group randomized trials).

    Would you suggest that there was a minimum number (other than the obvious one of a larger level 2 N than the number of level 2 predictors)?

  2. Jeremy,

    There is no minimum number of groups. But if the number of groups is small, you have to put more effort into your prior distribution for the group-level variance. See the "3 schools" example in Section 5.2 of this paper.
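To see why the prior matters more with few groups, here is a minimal sketch, not taken from the paper mentioned above. It assumes a normal hierarchical model with known standard errors and a flat prior on the overall mean, and approximates the posterior of the group-level standard deviation tau on a grid. The three group estimates, the half-Cauchy scale of 25, and the grid cutoff of 100 are all made-up values for illustration.

```python
import math

def tau_log_marginal(tau, y, se):
    """Log marginal likelihood of tau, with the overall mean integrated out
    under a flat prior (normal model, known standard errors)."""
    v = [s ** 2 + tau ** 2 for s in se]          # total variance per group
    w = [1.0 / vj for vj in v]                   # precision weights
    mu_hat = sum(wj * yj for wj, yj in zip(w, y)) / sum(w)
    log_p = -0.5 * math.log(sum(w))              # sqrt of Var(mu | tau, y)
    for yj, vj in zip(y, v):
        log_p += -0.5 * math.log(vj) - (yj - mu_hat) ** 2 / (2 * vj)
    return log_p

def posterior_mean_tau(y, se, log_prior, tau_max=100.0, n=2000):
    """Posterior mean of tau on an evenly spaced grid over (0, tau_max)."""
    grid = [tau_max * (i + 0.5) / n for i in range(n)]
    logs = [tau_log_marginal(t, y, se) + log_prior(t) for t in grid]
    top = max(logs)                              # subtract max for stability
    wts = [math.exp(l - top) for l in logs]
    return sum(t * w for t, w in zip(grid, wts)) / sum(wts)

# Hypothetical estimates and standard errors for 3 groups.
y = [20.0, 5.0, -5.0]
se = [12.0, 10.0, 14.0]

flat = posterior_mean_tau(y, se, lambda t: 0.0)
# Half-Cauchy prior on tau; the scale 25 is an assumption for this sketch.
half_cauchy = posterior_mean_tau(y, se, lambda t: -math.log(1 + (t / 25.0) ** 2))

print("posterior mean of tau, flat prior on (0, 100):", round(flat, 1))
print("posterior mean of tau, half-Cauchy(25) prior: ", round(half_cauchy, 1))
```

With only three groups the data say little about tau, so the flat prior's answer depends heavily on where the grid is cut off, while the half-Cauchy prior pulls the estimate toward smaller values.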

  3. Maas and Hox (2005) conduct some simulations and find that "a small sample size at level two leads to biased estimates of the second-level standard errors." They use ML instead of Bayesian. The full cite is:
    Maas, CJM & Hox, JJ (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1, 86-92.

  4. Is it really correct to treat countries as random effects? Countries are selected rather than sampled.
    I don't know exactly why, but it feels wrong to add a country level variance component.

  5. Ed,

    The question is: what's the alternative? With 8 groups (or whatever), I'd do a multilevel model. The usual non-multilevel models are simply multilevel models with group-level variance set to 0 or infinity.

    George,

    Think of a multilevel model as a 2-stage regression. Regression is ok even if the data are not randomly sampled, as long as the selection is on x-variables and not on y. (See Chapter 7 of Bayesian Data Analysis or various econometrics textbooks.) To put it another way, I don't mind if it feels wrong to you as long as you still do it! The alternative of setting the group-level variance to 0 or infinity seems worse.
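The point about group-level variance of 0 or infinity can be sketched in a few lines. This is an illustration under simplifying assumptions, not a full multilevel fit: it takes a normal model with known standard errors, treats the group-level standard deviation tau as fixed, and plugs in the precision-weighted estimate of the overall mean. The function name `partial_pool` and the numbers are hypothetical.

```python
import math

def partial_pool(group_means, group_ses, tau):
    """Group estimates for a fixed group-level standard deviation tau.

    tau = 0        -> complete pooling (every group gets the pooled mean)
    tau = infinity -> no pooling (every group keeps its own estimate)
    otherwise      -> each group is shrunk partway toward the pooled mean
    """
    if math.isinf(tau):
        return list(group_means)
    # Precision-weighted estimate of the overall mean given tau.
    w = [1.0 / (se ** 2 + tau ** 2) for se in group_ses]
    mu = sum(wj * yj for wj, yj in zip(w, group_means)) / sum(w)
    if tau == 0:
        return [mu] * len(group_means)
    # Weighted average of each group's own estimate and the overall mean.
    return [
        (yj / se ** 2 + mu / tau ** 2) / (1.0 / se ** 2 + 1.0 / tau ** 2)
        for yj, se in zip(group_means, group_ses)
    ]

means = [0.0, 10.0]   # hypothetical group estimates
ses = [1.0, 1.0]      # hypothetical standard errors

print(partial_pool(means, ses, 0.0))       # complete pooling
print(partial_pool(means, ses, 1.0))       # partial pooling
print(partial_pool(means, ses, math.inf))  # no pooling
```

A multilevel model estimates tau from the data instead of fixing it at one of the two extremes, which is the sense in which the usual alternatives are special cases.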

  6. I definitely feel much clearer about the situation now… basically, using multi-level modeling seems the way to go because the alternative of having zero/infinite country-level variance is not very attractive. I originally became worried after reading that Maas article cited by Ed – and a few other ones which had the same argument (i.e., that the level 2 units should be treated as a random sample).

    Thanks a lot Andrew and everyone else who wrote the comments!

  7. Hello

    I recently obtained a copy of Gelman and Hill (2007) and it is much better than many of the other books I have read on the topic – THANK YOU.
    Unfortunately after much mental gymnastics I am left with (what I can only assume to be) a silly question. I have been told that I have a 'mixed effects' problem, but I am not convinced.

    I have 40 yrs of presence/absence data from 5 locations within one region. I am trying to predict the probability of fish presence in each location as a function of habitat quality in each location and temperature, mean body size, and total abundance at the region level. I am treating the years as replicates. My questions are:

    1) Given that most of my data are 'region' level, how do I treat this in my model? For now I have replicated the region level data for each location in my data matrix.

    2) How can this be a multilevel model? While the data are hierarchical, I fail to see how the model describing the response at each location is hierarchical. If I had 2 or more regions (and therefore error at this level) I can see it, but for now I am at a loss!

    I would be very grateful for any help you can offer. Many thanks

    Aislinn

  8. I am foolishly about to embark on a multilevel examination for a doctoral thesis and have three levels (org level, team level and individual level). I've not seen how many orgs, teams and individuals I need to attain statistical power. It is simply not possible to know what the size of the population is.

    Any help would be greatly appreciated!
