Juliet Price writes:
I recently came across your blog post from 2009 about how statistical analysis differs when analyzing an entire population rather than a sample.
I understand the part about conceptualizing the problem as involving a stochastic data-generating process. However, I have a question about the paragraph on ‘making predictions about future cases, in which case the relevant uncertainty comes from the year-to-year variation’.
Wouldn’t the random-data-generating-process conceptualization cover the situation where you’re interested in making predictions about future cases? I just wanted to check that I’m not missing the importance of the year-to-year variation. Presumably this wouldn’t be the random variation that’s necessary for inferential statistics to apply, as the year-to-year variation might be systematic rather than random?
My reply:
See for example the Gelman and King JASA paper from 1990. The point is that variation among units within a given year is not the same as variation within a unit from year to year.
We used a multilevel model.
But the real point here is that we were able to transform a somewhat philosophical question (What is the meaning of statistical inference if the entire population is observed?) into a technical question regarding variance within and between years. A lot of progress in statistical methods goes this way: topics that were formerly consigned to philosophy get subsumed into quantitative modeling.
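To make the within-year versus year-to-year distinction concrete, here is a minimal simulated sketch. It is not the model from the 1990 paper. The panel layout (units by years), the function name variance_components, and the standard deviations unit_sd and year_sd are all hypothetical, chosen only for illustration.

```python
import numpy as np


def variance_components(y):
    """Split the variation in a units-by-years panel two ways.

    y has shape (n_units, n_years). Returns a pair:
      - variance among units within a year, averaged over years
      - variance from year to year within a unit, averaged over units
    """
    y = np.asarray(y, dtype=float)
    across_units = y.var(axis=0, ddof=1).mean()
    across_years = y.var(axis=1, ddof=1).mean()
    return across_units, across_years


# Hypothetical setup: every unit in the "population" is observed in every year.
rng = np.random.default_rng(0)
n_units, n_years = 400, 10
unit_sd = 0.10  # assumed persistent differences between units
year_sd = 0.03  # assumed year-to-year fluctuation within a unit

unit_effect = rng.normal(0.0, unit_sd, size=(n_units, 1))
yearly_noise = rng.normal(0.0, year_sd, size=(n_units, n_years))
y = 0.5 + unit_effect + yearly_noise

across_units, across_years = variance_components(y)
print(f"variance among units within a year:   {across_units:.5f}")
print(f"variance within a unit across years:  {across_years:.5f}")
```

Under these assumed numbers the spread among units in any one year is much larger than the spread of a single unit over time. For predicting a given unit next year, the second quantity is the relevant one, even though every unit has been observed.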
Does the modelling approach change? That is, is there a difference between the procedures used for extrapolating (inference?) from the usual sample to a population and those used in the case where you must “think of your ‘entire population’ as a sample from a larger population, potentially including future cases”?
Rahul:
Yes, it makes a difference. See our 1990 paper (mentioned above) for an example.
Please provide a link to the paper. (Or at least the full reference)
http://www.stat.columbia.edu/~gelman/research/published/