Barry Quinn writes:
I would like some quick advice on survey design literature, specifically any good references you would have when designing a good online survey to allow for some decent hierarchical modeling?
My quick response is that during the opening you should already be thinking about the endgame. In this case, the endgame is that you want results you believe, and that will stand up to outside criticism. The “middlegame,” then, will involve adjusting the sample to match the population, using Mister P or whatever. And the opening will include the following four steps:
1. Try your best to reach everyone in the target population: minimize selection process in the process of contact and response.
2. Take care when measuring the outcomes of interest. Let your measurements match your goals. For example, if you are interested in individual changes over time, try to get multiple measurements on the same people.
3. Gather enough background information on respondents so you can do a good job adjusting. So gather demographic information and also other relevant variables (such as party identification and past votes, if this is a political survey).
4. If you’re going to do Mister P, then get good group-level predictors. Hierarchical modeling is most effective when it is used to smooth toward a model that predicts well; it doesn’t work magic when you don’t have good group-level information.
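To make the "adjusting the sample to match the population" step concrete, here is a minimal Python sketch of the poststratification half of Mister P. Everything in it is an assumption made for illustration: the three age cells, the respondent counts, and the population counts are invented, and the raw cell means stand in for the estimates that a fitted multilevel model would normally supply.

```python
# Hypothetical numbers throughout; nothing here comes from a real survey.

def poststratify(cell_estimates, population_counts):
    """Average per-cell estimates, weighting each cell by its population count."""
    total = sum(population_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * population_counts[cell]
                   for cell in cell_estimates)
    return weighted / total

# Sample: (number of respondents, number answering "yes") in each cell.
# Older respondents are heavily overrepresented in this made-up sample.
sample = {"young": (20, 14), "middle": (50, 25), "old": (130, 39)}

# Assumed known population counts for the same cells (e.g. from a census).
population = {"young": 300, "middle": 400, "old": 300}

# Unadjusted estimate: pool all respondents together.
raw_mean = sum(yes for _, yes in sample.values()) / sum(n for n, _ in sample.values())

# In full MRP these would be multilevel-model predictions, not raw cell means.
cell_means = {cell: yes / n for cell, (n, yes) in sample.items()}
adjusted = poststratify(cell_means, population)

print(f"raw: {raw_mean:.2f}, poststratified: {adjusted:.2f}")
```

With these invented numbers the raw sample mean is 0.39 and the poststratified estimate is 0.50. The gap comes entirely from reweighting the cells to their population shares, which is why step 3 (collecting the background variables that define the cells) matters so much.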
In answer to your direct question: No, I don’t know that this is addressed in the literature. Unless you count this post as part of the survey design literature, in which case I guess I could point here.
> 1. Try your best to reach everyone in the target population
what does this mean?
> minimize selection process in the process of contact and response.
also, what does this mean?
Is anyone aware of examples of researchers using MrP where the target population is, itself, uncertain due to the nature of the data collection mechanism?
For instance, a clickstream dataset records the behavior of internet users, so everything observed is conditional on someone having access to the internet in the first place (and hence being in the dataset). This makes the measured outcome of interest more frequent in the dataset than in the population, even after adjusting for geography or other group variables.
Answering a question about the American population broadly using this data is not possible without first knowing the number of Americans with internet access (or whatever the data collection mechanism was conditional on).
(i.e., a random user in the clickstream data is more likely to visit a certain website, so just scaling by the total population gives inflated answers).
I suppose the only solution would be to first design a second experiment that estimates the size of the desired target population.
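To put rough numbers on the inflation described above, here is a small Python sketch. The visit rate, population size, and internet coverage share are all invented for illustration, and the function name `scale_to_population` is hypothetical. It also assumes that people without internet access never visit the site, which is what makes the coverage share the right correction factor.

```python
# Hypothetical inputs; none of these figures come from real data.

def scale_to_population(rate_in_data, total_population, coverage_share):
    """Return (naive, adjusted) estimates of the number of visitors.

    naive    : applies the in-data rate to everyone, online or not
    adjusted : applies it only to the share of people the data can cover,
               assuming people outside the data have a rate of zero
    """
    naive = rate_in_data * total_population
    adjusted = rate_in_data * total_population * coverage_share
    return naive, adjusted

naive, adjusted = scale_to_population(
    rate_in_data=0.10,          # share of clickstream users visiting the site
    total_population=1_000_000, # everyone in the target population
    coverage_share=0.80,        # assumed share with internet access
)

print(f"naive: {naive:.0f}, adjusted: {adjusted:.0f}")
```

The naive figure overstates the adjusted one by a factor of 1 / coverage_share, so the estimate is only as good as the outside estimate of coverage. That is the "second experiment" in the comment above.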
For the terminally naive, what is Mister P? I cannot find it in Google (other than this article).
Bill:
Mister P is the first item on the lexicon!
More terminally naive: Lexicon? Where is that? Can’t find it on your blog page.
Bill:
You can search the blog for lexicon or just click on the above link!
Oh, sorry, I didn’t notice that it was a link.
You can google *gelman mister p* for lots more.
This is useful. “Multilevel Regression with Poststratification.”