No, you don’t need 20 groups to do a multilevel analysis

Posted on August 16, 2007 5:17 AM by Andrew

Anna Reimondos writes,

I stumbled upon your blog while reading up on multilevel modeling, and have found it really useful. I have been itching to ask a question on it but was not sure if the correct procedure was to email you personally or to write in a comment on the blog. Many apologies if I was meant to type my comment/question directly in the blog.

My question is very simple (stupid?). I am just about to embark on my PhD thesis where I will be looking at a number of research questions relating to demographic behaviour in Europe. I will be conducting secondary analysis on a survey of 11 European countries.
I was hoping to use multilevel modeling, as not much research seems to have been done using this method in my field of research. While I have never used multilevel modeling before, from what little I have read so far it seems that, for a lot of questions, including country-level explanatory variables might explain a lot of the behaviour observed.

The level 1 would be the individual respondents and level 2 the countries. Is 11 countries enough to use multi-level modeling? I read that around 20-30 units of analysis were required at the highest level, so I am a bit worried.

My reply:

1. Email is a fine way to send a question.

2. Yes, you can include country-level explanatory variables. No, I don’t think you need 20 or more groups. Track down who told you that and find out why they said it. Maybe they have a good reason; if so, I’d like to hear it…

3. For many examples of analyses across countries, take a look at the recent issue of Political Analysis that contains this article.

8 thoughts on “No, you don’t need 20 groups to do a multilevel analysis”

Jeremy Miles on August 16, 2007 at 10:06 am said:

I was under the impression from somewhere that 20 (or so) were required because the inferences at level 2 rely on large sample theory. This is the case with sandwich estimator corrections (that's in Murray's book Design and analysis of group randomized trials).

Would you suggest that there was a minimum number (other than the obvious one of a larger level 2 N than the number of level 2 predictors)?

Andrew on August 16, 2007 at 1:47 pm said:

Jeremy,

There is no minimum number of groups. But if the number of groups is small, you have to put more effort into your prior distribution for the group-level variance. See the "3 schools" example in Section 5.2 of this paper.
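
As a rough illustration of why the prior matters with few groups, here is a minimal Python sketch. It assumes a normal hierarchical model with known standard errors and a flat prior on the overall mean, then computes the posterior for the group-level standard deviation tau on a grid under two priors. The three estimates, their standard errors, the grid, and the half-Cauchy scale of 25 are illustrative numbers chosen for this sketch; they are not taken from the thread.

```python
import numpy as np

def log_marginal_tau(tau, y, se):
    """Log marginal likelihood of tau, with the overall mean integrated out
    under a flat prior (normal model, known standard errors)."""
    v = se**2 + tau**2                 # total variance of each group estimate
    prec = 1.0 / v
    v_mu = 1.0 / prec.sum()            # conditional posterior variance of the mean
    mu_hat = v_mu * (prec * y).sum()   # precision-weighted mean
    return (0.5 * np.log(v_mu)
            - 0.5 * np.log(v).sum()
            - 0.5 * (prec * (y - mu_hat) ** 2).sum())

def posterior_on_grid(y, se, taus, log_prior):
    """Normalized posterior for tau on a fixed grid, for a given log prior."""
    logp = np.array([log_marginal_tau(t, y, se) + log_prior(t) for t in taus])
    p = np.exp(logp - logp.max())      # subtract the max for numerical stability
    return p / p.sum()

# Hypothetical data: three group estimates and their standard errors.
y = np.array([20.0, 5.0, -4.0])
se = np.array([12.0, 10.0, 14.0])
taus = np.linspace(0.0, 200.0, 4001)

# Two priors on tau: flat, and half-Cauchy with an assumed scale of 25.
flat = posterior_on_grid(y, se, taus, lambda t: 0.0)
half_cauchy = posterior_on_grid(y, se, taus, lambda t: -np.log1p((t / 25.0) ** 2))

for name, p in [("flat", flat), ("half-Cauchy(25)", half_cauchy)]:
    print(name, "mean of tau on this grid:", round(float((taus * p).sum()), 1),
          "| mass above 50:", round(float(p[taus > 50].sum()), 3))
```

With only three groups the data say little about tau, so under the flat prior the posterior keeps a long right tail and the summaries depend heavily on where the grid is cut off. A weakly informative prior pulls that tail in.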

Ed on August 16, 2007 at 8:54 pm said:

Maas and Hox (2005) conduct some simulations and find that "a small sample size at level two leads to biased estimates of the second-level standard errors." They use ML instead of Bayesian. The full cite is:
Maas, CJM & Hox, JJ (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1, 86-92.

George Doukas on August 16, 2007 at 10:57 pm said:

Is it really correct to treat countries as random effects? Countries are selected rather than sampled.
I don't know exactly why, but it feels wrong to add a country level variance component.

Andrew on August 17, 2007 at 7:41 pm said:

Ed,

The question is: what's the alternative? With 8 groups (or whatever), I'd do a multilevel model. The usual non-multilevel models are simply multilevel models with group-level variance set to 0 or infinity.

George,

Think of a multilevel model as a 2-stage regression. Regression is ok even if the data are not randomly sampled, as long as the selection is on x-variables and not on y. (See Chapter 7 of Bayesian Data Analysis or various econometrics textbooks.) To put it another way, I don't mind if it feels wrong to you as long as you still do it! The alternative of setting the group-level variance to 0 or infinity seems worse.
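
To make the "0 or infinity" point concrete, here is a minimal Python sketch for the simplest case: a varying-intercept model in which the overall mean mu, the within-group standard deviation sigma_y, and the group-level standard deviation tau are all treated as known. The function name, the eight group means, and the sample sizes are made up for illustration.

```python
import numpy as np

def partial_pool(ybar, n, sigma_y, tau, mu):
    """Multilevel estimate of each group mean when mu, sigma_y and tau are
    treated as known: a precision-weighted average of the group's own mean
    and the overall mean."""
    ybar = np.asarray(ybar, dtype=float)
    n = np.asarray(n, dtype=float)
    se2 = sigma_y**2 / n               # sampling variance of each group mean
    shrink = se2 / (se2 + tau**2)      # weight on mu: 1 if tau = 0, 0 if tau = inf
    return shrink * mu + (1.0 - shrink) * ybar

# Hypothetical group means and sample sizes for 8 groups.
ybar = np.array([2.1, -0.4, 1.3, 0.2, 3.0, -1.1, 0.8, 1.6])
n = np.array([40, 12, 25, 60, 8, 15, 30, 20])
mu = np.average(ybar, weights=n)       # overall mean, taken as given here

for tau in (0.0, 0.5, np.inf):
    print("tau =", tau, np.round(partial_pool(ybar, n, sigma_y=2.0, tau=tau, mu=mu), 2))
```

Setting tau to 0 returns the overall mean for every group (complete pooling), and setting it to infinity returns each group's own mean (no pooling). Any finite positive tau gives something in between, with smaller groups pulled more strongly toward the overall mean.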

Anna on August 20, 2007 at 5:51 pm said:

I definitely feel much clearer about the situation now. Basically, using multi-level modeling seems the way to go because the alternative of having zero/infinite country-level variance is not very attractive. I originally became worried after reading that Maas article cited by Ed, and a few other ones which had the same argument (i.e., that the level 2 units should be treated as a random sample).

Thanks a lot Andrew and everyone else who wrote the comments!

Aislinn on September 10, 2007 at 2:46 pm said:

Hello

I recently obtained a copy of Gelman and Hill (2007) and it is much better than many of the other books I have read on the topic – THANK YOU.
Unfortunately after much mental gymnastics I am left with (what I can only assume to be) a silly question. I have been told that I have a 'mixed effects' problem, but I am not convinced.

I have 40 yrs of presence/absence data from 5 locations within one region. I am trying to predict the probability of fish presence in each location as a function of habitat quality in each location and temperature, mean body size, and total abundance at the region level. I am treating the years as replicates. My questions are:

1) Given that most of my data are 'region' level, how do I treat this in my model? For now I have replicated the region level data for each location in my data matrix.

2) how can this be a multilevel model? While the data are hierarchical, I fail to see how the model describing the response at each location is hierarchical. If I had 2 or more regions (and therefore error at this level) I can see it, but for now I am at a loss!

I would be very grateful for any help you can offer. Many thanks

Aislinn

Greg on November 8, 2007 at 8:32 am said:

I am foolishly about to embark on a multilevel examination for a doctoral thesis and have three levels (org level, team level and individual level). I've not seen how many orgs, teams and individuals I need to attain statistical power. It is simply not possible to know what the size of the population is.

Any help would be greatly appreciated!
