How does inference for next year’s data differ from inference for unobserved data from the current year?

Posted on September 6, 2014 9:58 AM by Andrew

Juliet Price writes:

I recently came across your blog post from 2009 about how statistical analysis differs when analyzing an entire population rather than a sample.

I understand the part about conceptualizing the problem as involving a stochastic data-generating process. However, I have a query about the paragraph on ‘making predictions about future cases, in which case the relevant uncertainty comes from the year-to-year variation’.

Wouldn’t the random-data-generating-process conceptualization cover the situation where you’re interested in making predictions about future cases? I just wanted to check that I’m not missing the importance of the year-to-year variation. This, presumably, wouldn’t be the random variation that’s necessary for inferential statistics to apply, as the year-to-year variation might be systematic rather than random?

My reply:

See for example the Gelman and King JASA paper from 1990. The point is that variation among units within a given year is not the same as variation within a unit from year to year.

We used a multilevel model.

But the real point here is that we were able to transform a somewhat philosophical question (What is the meaning of statistical inference if the entire population is observed?) into a technical question regarding variance within and between years. A lot of progress in statistical methods goes this way: topics that formerly were consigned to philosophy get subsumed into quantitative modeling.
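To make the within-year versus year-to-year distinction concrete, here is a minimal simulation sketch. It is not the model from the 1990 paper; it assumes a simple additive structure (unit effect plus year effect plus noise), and the function names, sample sizes, and standard deviations are all hypothetical values chosen for illustration.

```python
import numpy as np


def simulate(n_units=200, n_years=30, sd_unit=1.0, sd_year=0.5,
             sd_noise=0.3, seed=0):
    """Simulate y[i, t] = unit_effect[i] + year_effect[t] + noise[i, t].

    All parameter values are illustrative assumptions, not estimates
    from any real dataset.
    """
    rng = np.random.default_rng(seed)
    unit_effect = rng.normal(0.0, sd_unit, size=(n_units, 1))
    year_effect = rng.normal(0.0, sd_year, size=(1, n_years))
    noise = rng.normal(0.0, sd_noise, size=(n_units, n_years))
    # Broadcasting adds the unit column and the year row to every cell.
    return unit_effect + year_effect + noise


def spread_summary(y):
    """Return (sd among units within a year, sd within a unit over years)."""
    # Spread across units in each year, averaged over years.
    among_units = y.std(axis=0, ddof=1).mean()
    # Spread across years for each unit, averaged over units.
    over_years = y.std(axis=1, ddof=1).mean()
    return among_units, over_years


y = simulate()
among_units, over_years = spread_summary(y)
print(f"sd among units within a year: {among_units:.2f}")
print(f"sd within a unit across years: {over_years:.2f}")
```

Under these assumptions the two spreads target different quantities: the first reflects unit-level variation plus noise, the second reflects year-level variation plus noise. Uncertainty about an unobserved unit in the current year is governed by the first; uncertainty about an observed unit next year is governed by the second. A full multilevel model would estimate these components jointly rather than with the crude averages used here.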

4 thoughts on “How does inference for next year’s data differ from inference for unobserved data from the current year?”

Rahul on September 6, 2014 at 4:15 pm said:

Does the modelling approach change? That is, is there a difference in the procedures used for extrapolating (inference?) from the usual sample to a population versus the case where you must “think of your ‘entire population’ as a sample from a larger population, potentially including future cases”?

- Andrew on September 6, 2014 at 4:44 pm said:
  
  Rahul:
  
  Yes, it makes a difference. See our 1990 paper (mentioned above) for an example.
  
Anon on September 6, 2014 at 5:50 pm said:

Please provide a link to the paper (or at least the full reference).

- Andrew on September 6, 2014 at 10:27 pm said:
  
  https://www.stat.columbia.edu/~gelman/research/published/
  

Statistical Modeling, Causal Inference, and Social Science