Population forecasting for small areas: an example of learning through a social network

Adam Connor-Sax sent this question to Philip Greengard:

Do you or anyone you know work with or know about (human) population forecasting in small geographies (election districts) and short timescales (2-20 years)? I can imagine just looking at the past couple years and trying to extrapolate but presumably there are models of population dynamics which one could fit and that would work better.

The thing that comes up in the literature—cohort component models—seem more geared to larger populations and larger timescales, countries over decades, say. But maybe I’m just finding the wrong sources so far.

Philip forwarded this to me, and I forwarded it to Leontine Alkema, a former Columbia colleague now at the University of Massachusetts who’s an expert in Bayesian demographic forecasting, and Leontine in turn pointed us to Monica Alexander at the University of Toronto and Emily Peterson at Emory University.

Monica and Emily had some answers!

Here’s Emily:

I can just speak to what we work on at Emory biostatistics. Lance Waller and I have worked together with other collaborators in population estimation and forecasting for small areas (e.g., counties and census tracts) accounting for population uncertainty, mainly for US small areas but have also done some work in the UK. We have looked at the use of multiple data sources including official statistics and WorldPop across small administrative units. This work commonly combines small area estimation and measurement error models.

And Monica:

Here’s a few different approaches I’m aware of:

– A common approach is the Hamilton-Perry Method, which is sort of a cut down version of the cohort component projection method. David Swanson and others have published a lot about this, see eg https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2822904/ This is mostly deterministic projection; I’ve started some work with a student incorporating this approach into Bayesian models.

– Emily and others’ work, as she mentioned; my understanding of this is that they incorporate various data/estimates on population counts, and project those, accounting for different error sources (see eg here: https://arxiv.org/abs/2112.09813)

– A recent paper by Crystal Yu (Adrian Raftery’s student) describes a model that builds off the UN population projections approach: https://read.dukeupress.edu/demography/article/60/3/915/364580/Probabilistic-County-Level-Population-Projections

– Classical cohort component models at small geographic areas are tricky, because there’s so many parameters to estimate and the data often aren’t very good (even in the US, particularly for migration), which is why @Adam you aren’t seeing much for small areas — migration makes things hard. Leontine and I did a subnational cohort component model (see paper here: https://read.dukeupress.edu/demography/article/59/5/1713/318087/A-Bayesian-Cohort-Component-Projection-Model-to) but the focus was low- and middle-income countries, and women of reproductive age.
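For readers who have not seen the Hamilton-Perry method Monica mentions, here is a minimal sketch of one projection step. It is a simplified illustration, not code from any of the papers linked above. It assumes two counts by age group taken one age-group width apart (for example, 10-year age groups and censuses 10 years apart), an open-ended oldest group, and, as a simplification, a child-to-parent ratio computed on the total population rather than the usual child-woman ratio. The function name `hamilton_perry_step` and the `parent_groups` argument are hypothetical.

```python
def hamilton_perry_step(base, launch, parent_groups):
    """Project one interval ahead using cohort change ratios (CCRs).

    base, launch: population counts by age group at the earlier and later
        dates, where the dates are one age-group width apart and the last
        group is open-ended (e.g. 70+).
    parent_groups: indices (>= 1) of the age groups treated as parents when
        projecting the youngest group. This is a simplifying assumption; the
        usual method uses a child-woman ratio by sex.
    """
    n = len(launch)
    if len(base) != n or n < 3:
        raise ValueError("base and launch need the same length, at least 3")
    if any(j < 1 or j >= n for j in parent_groups):
        raise ValueError("parent_groups must index groups 1..n-1")

    projected = [0.0] * n

    # Closed cohorts: the people in group i-1 at the base date are in
    # group i at the launch date, so their ratio captures deaths and
    # net migration together.
    for i in range(1, n - 1):
        ccr = launch[i] / base[i - 1]
        projected[i] = ccr * launch[i - 1]

    # Open-ended oldest group: fed by the last two groups.
    ccr_open = launch[-1] / (base[-2] + base[-1])
    projected[-1] = ccr_open * (launch[-2] + launch[-1])

    # Youngest group: hold the launch-date child-to-parent ratio fixed
    # and apply it to the projected parent population.
    ratio = launch[0] / sum(launch[j] for j in parent_groups)
    projected[0] = ratio * sum(projected[j] for j in parent_groups)

    return projected


# Made-up counts for four age groups, purely for illustration.
example = hamilton_perry_step([100, 80, 60, 40], [110, 90, 70, 50], [1, 2])
```

Because the ratios lump mortality and migration together and are held fixed, the projection is deterministic and only as good as the assumption that the last interval resembles the next one, which is the weakness for small areas that Monica describes.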

Adam followed up:

I’m working with an organization that directs political donations to Democratic candidates for US state-legislature. They are interested in improving their longer-term strategic thinking about which state-legislative districts (typically 40,000 – 120,000 people) may become competitive (or cease being competitive) over time and also, perhaps, in the ways that changing demographics might lead to changing campaign strategies.

From the small bit of literature search that I’ve done—which included most helpfully “Methods For Small Area Population Forecasts: State-of-the-Art and Research Needs” on which Dr. Alexander is a co-author!—I see various things I might try, likely starting with a simplified cohort-component model of some sort. I will look at all the references below.

One wrinkle I am struggling with is that for political modeling (in my case, voter turnout and party preference), there are mutable characteristics, e.g., educational-attainment, which are important. But in my brief look so far, none of these models are trying to project mutable characteristics. Which makes sense, since they are not simply a product of births, deaths and migration. So, because this is all new to me, I’m not quite sure if I should be trying to find a model which is amenable to the addition of education in some way, or choose an existing model for the population growth and then somehow try to model educational-attainment shifts separately and then combine them.

BTW, Dr. Peterson, I just looked a bit at your paper “A Bayesian hierarchical small-area population model accounting for data source specific methodologies from American Community Survey, Population Estimates Program, and Decennial Census data” and that is helpful/interesting in a different way! The work Philip and I do together is about producing small area estimates of population by combining various marginals at the scale of the small area (state-legislative district) and then modeling the correlation-structure using data at larger geographic scales (state & national). This is driven by the limits of what the ACS provides at scales below Public Use Microdata Areas. I’ve also thought some about trying to incorporate the more accurate decennial census but hadn’t gotten very far. Your paper will be a good motivation to return to that part of the puzzle!

I’m sharing this conversation with all of you for three reasons:

1. Demographic forecasting is important in social science.

2. The problem is difficult and relates to other statistical challenges.

3. It’s hard to find things in the literature, and personal connections can help a lot.

I guess that the vast majority of you don’t have the personal connections that would allow you to easily find the above references. (You have your own social networks, just not so focused on quantitative social science.) The above post is a service to all of you, also a demonstration of how these connections can work.

5 thoughts on “Population forecasting for small areas: an example of learning through a social network”

  1. “population forecasting in small geographies (election districts) and short timescales (2-20 years)?”

    Ha! That’s one amazingly timely post you’ve got there.

    The lead story in today’s (April 25, 2024) Asahi Shinbun morning edition was “40 PERCENT OF LOCALLY GOVERNED AREAS FACE EXTINCTION”

    It seems that Japan’s population predictions call for the number of women in the 20-39 age bracket to fall by 50% or more in these regions by 2050.

    (Oops: this is for 26 years, not 2 to 20… Close, though. And no references to methodologies.)

    It looked like overmuch overdetailed overheated scaremongering, so I didn’t bother reading the details, though. (I kept the front page because it has a reference to a book that looked fun: “Palindrome Word Play In Japanese”, which I now see is out of print. There’s another one called MA SA KA SA KA SA MA “Wow! Upside Down” (Since Japanese is written vertically, joking about palindrome often involved “top to bottom bottom to top” sorts of things.) But I, as usual, digress…)

    But it was rather relevant for this post, since it was looking at the local regions within the 47 prefectures. For example, 24 of the 25 local governments/municipalities in Akita Prefecture are expected to see a reduction of over 50% in the number of women in the 20 to 39 age bracket. For Tokyo, of the 62 local governments/municipalities, only 2 are expected to see such a reduction. (Tokyo’s a state, about half the size of Rhode Island, with a monster population.)

    (For Okinawa, it’s zero of 41. Yay, Okinawa!)

    Japan’s population was actually growing until just a few years ago, but it’s been falling since then. Tokyo was the only one of the 47 prefectures that saw an increase, not a decrease, in the latest actual population numbers.
     

    • Ouch! That out of print palindrome book is US$200, even at today’s insane exchange rate (The Yen’s crashing against the US dollar.) Ouch.

  2. There might be models of interest at https://applieddemogtoolbox.github.io/. It is an intriguing and difficult problem. I was struck by the traditional approach which seems to be based on using 2 (or more) time periods to make predictions, and more sophisticated approaches that then calibrate these predictions using at least one additional time period. One of the approaches listed in the post calibrated using expert opinion and independent forecasts coming from alternative models. All of these strike me as valuable attempts, although most seem inadequate to deal with issues of migration. Forecasting migration patterns should be relatively easy provided that the past is a good guide to the future – probably a bad assumption for small areas. For example, COVID led to big impacts on migration in small areas – the area where I last lived had an average time on market of houses for sale of more than 2 years – until COVID, after which average time on market was on the order of 1 day (or less, in my case). This is a case where forecasting works well, provided that nothing important changes – but in small areas, forecasts where things change are what matter most. I think that, in such cases, models must be developed from assumptions regarding the fundamental drivers of migration – economic and social changes. Numerical approaches are not likely to be successful unless past performance is a good guide for future performance.

    • Seems like the way forward is to assess the historical growth rates in the area in question, the adjacent areas, and the region as a whole. Then perhaps classify the small pop areas depending on local / regional patterns and/or proximity to large population areas, something like:

      1) local and regional both stable; presumably in some place like central Nebraska you’d find stable growth in adjacent areas, which would allow you to use some simple function of the historical growth rate (pos or neg); might be similar model in urban cores that have already long passed peak growth;

      2) local is stable regional growing; e.g., as-yet non-suburban areas with nearby areas of higher population;

      3) regional stable, local growing; e.g., growing towns in rural areas far from major pop centers

      Then give your local area a different growth model depending on the local/regional pattern.

      Seems like something you could test with historical data for areas that used to have small populations; e.g. turn back the clock somewhere and see how the model fares.
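      One way to make the classify-then-backtest idea concrete is sketched below. Everything in it is an assumption for illustration: the 0.5% per year band for calling a growth rate "stable", the fourth "both changing" class (not in the list above), the blending weights in `LOCAL_WEIGHT`, and all function names. It fits rates on the first of two intervals, forecasts across the second, and reports the relative error against the known count.

```python
def annual_growth_rate(start, end, years):
    """Compound annual growth rate between two counts."""
    return (end / start) ** (1.0 / years) - 1.0


def classify(local_rate, regional_rate, stable_band=0.005):
    """Label the local/regional pattern; stable_band is an assumed cutoff."""
    local_stable = abs(local_rate) <= stable_band
    regional_stable = abs(regional_rate) <= stable_band
    if local_stable and regional_stable:
        return "both stable"
    if local_stable:
        return "local stable, regional changing"
    if regional_stable:
        return "regional stable, local changing"
    return "both changing"


# Hypothetical weight placed on the local rate for each pattern; the
# remainder goes to the regional rate. These numbers are placeholders.
LOCAL_WEIGHT = {
    "both stable": 1.0,
    "local stable, regional changing": 0.5,
    "regional stable, local changing": 1.0,
    "both changing": 0.5,
}


def forecast(pop_now, local_rate, regional_rate, years):
    """Grow pop_now at a blend of local and regional rates."""
    w = LOCAL_WEIGHT[classify(local_rate, regional_rate)]
    rate = w * local_rate + (1.0 - w) * regional_rate
    return pop_now * (1.0 + rate) ** years


def backtest(local_counts, regional_counts, step_years):
    """Fit on the first interval, forecast the second, return relative error.

    local_counts and regional_counts each hold three counts taken
    step_years apart; the third local count is the held-out truth.
    """
    l0, l1, l2 = local_counts
    r0, r1, _ = regional_counts
    local_rate = annual_growth_rate(l0, l1, step_years)
    regional_rate = annual_growth_rate(r0, r1, step_years)
    predicted = forecast(l1, local_rate, regional_rate, step_years)
    return (predicted - l2) / l2
```

      Running `backtest` over many areas and past decades would show which patterns the simple growth models handle well and where they break down; a shock like COVID would show up as large errors no matter how the weights are chosen.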

      The COVID thing strikes me as totally unpredictable. Can’t see how you’d forecast that, seems like one of those impossibilities you have to accept. But some kind of approach integrating regional and local and historical seems like a way forward that could eventually work reasonably well a lot of the time.

  3. One more example. The Office for National Statistics in the UK has been developing a Bayesian model for population estimation at the Local Authority level. The model includes estimation of local-level birth, death, and migration rates (which is the hard part of local-level population forecasting) plus demographic accounting. The ONS website has some information: the model is referred to as the Dynamic Population Model.
