
How does inference for next year’s data differ from inference for unobserved data from the current year?

Juliet Price writes:

I recently came across your blog post from 2009 about how statistical analysis differs when analyzing an entire population rather than a sample.

I understand the part about conceptualizing the problem as involving a stochastic data generating process. However, I have a question about the paragraph on ‘making predictions about future cases, in which case the relevant uncertainty comes from the year-to-year variation’.

Wouldn’t the random-data-generating-process conceptualization cover the situation where you’re interested in making predictions about future cases? I just wanted to check that I’m not missing the importance of the year-to-year variation. Presumably this wouldn’t be the random variation that’s necessary for inferential statistics to apply, as the year-to-year variation might be systematic rather than random?

My reply:

See for example the Gelman and King JASA paper from 1990. The point is that variation among units within a given year is not the same as variation within a unit from year to year.
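
To make the distinction concrete, here is a minimal simulation sketch. It is not the analysis from the paper: the additive structure (unit effect plus year effect plus noise), the function names, and all of the numbers are assumptions chosen purely for illustration. It computes the variance across units within a year and the variance across years within a unit, and shows that the two can be very different.

```python
import random
import statistics


def simulate(n_units=200, n_years=30, sd_unit=3.0, sd_year=1.0,
             sd_noise=0.5, seed=0):
    """Simulate y[i][t] = unit effect + year effect + noise (hypothetical setup)."""
    rng = random.Random(seed)
    unit_effects = [rng.gauss(0, sd_unit) for _ in range(n_units)]
    year_effects = [rng.gauss(0, sd_year) for _ in range(n_years)]
    return [[u + g + rng.gauss(0, sd_noise) for g in year_effects]
            for u in unit_effects]


def variance_components(y):
    """Return (variance among units within a year, variance among years within a unit)."""
    n_units, n_years = len(y), len(y[0])
    # Sample variance across units, computed separately for each year, then averaged.
    among_units = statistics.mean(
        [statistics.variance([y[i][t] for i in range(n_units)])
         for t in range(n_years)]
    )
    # Sample variance across years, computed separately for each unit, then averaged.
    among_years = statistics.mean([statistics.variance(row) for row in y])
    return among_units, among_years


among_units, among_years = variance_components(simulate())
print(f"variance among units within a year: {among_units:.2f}")
print(f"variance among years within a unit: {among_years:.2f}")
```

With these made-up settings the first number is driven by `sd_unit` and the second by `sd_year`, so an uncertainty statement calibrated to one kind of variation need not carry over to the other.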

We used a multilevel model.
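
As a generic illustration of what such a model can look like (this is a textbook-style sketch with assumed notation, not the specification from the paper), one could write the outcome for unit $i$ in year $t$ as:

```latex
y_{it} = \mu + \alpha_i + \gamma_t + \epsilon_{it}, \qquad
\alpha_i \sim \mathrm{N}(0, \sigma_\alpha^2), \quad
\gamma_t \sim \mathrm{N}(0, \sigma_\gamma^2), \quad
\epsilon_{it} \sim \mathrm{N}(0, \sigma_\epsilon^2)
```

Under this sketch, inference for an unobserved unit in an observed year involves $\sigma_\alpha$ and $\sigma_\epsilon$, while prediction for an observed unit in a future year involves $\sigma_\gamma$ and $\sigma_\epsilon$.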

But the real point here is that we were able to transform a somewhat philosophical question (What is the meaning of statistical inference if the entire population is observed?) into a technical question regarding variance within and between years. A lot of progress in statistical methods goes this way: topics that were formerly consigned to philosophy get subsumed into quantitative modeling.


  1. Rahul says:

    Does the modelling approach change? That is, is there a difference between the procedures used for extrapolating (inference?) from the usual sample to a population and those used in the case where you must “think of your ‘entire population’ as a sample from a larger population, potentially including future cases”?

  2. Anon says:

    Please provide a link to the paper. (Or at least the full reference)
