
Improving our election poll aggregation model

Luke Mansillo saw our election poll aggregation model and writes:

I had a look at the Stan code and I wondered if the model that you, Merlin Heidemanns, and Elliott Morris were implementing was not really Drew Linzer’s model but really Simon Jackman’s model. I realise that Linzer published “Dynamic Bayesian Forecasting of Presidential Elections in the States” in 2013, but the model looks a lot like Jackman’s 2006 Australian Journal of Political Science article (that was in chapter 9 of his 2009 textbook) mixed with a hierarchical structure from chapter 8 of his textbook. Nevertheless, I was wondering if there was any component of the model that you would have liked to include but couldn’t? I have commented on GitHub that I imagine MRP would be a fine inclusion, or using a non-linear component in the hierarchical structure such as a spline or a Gaussian process would have been cool, or even the use of a neat prior specification, such as a horseshoe prior for shrinkage in the hierarchical structure or on the bias of a pollster (etc.), or perhaps the use of an ensemble of model specifications.

Whenever I’ve thought about how I’d do a forecast beyond a national state space model, I find myself quickly turning into a kid in a candy store, imagining all sorts of permutations that would be cool but have the cost of time thanks to exploring high dementia all space. Incorporating national and state polling into a dynamic MRP model is an idea I’ve played around with a bit, but I’ve found that I’ve not had enough survey data in Australia to do it confidently.

A longer version of Mansillo’s comment is here on the GitHub page, and here’s an analysis that he did with Jackman of an Australian election campaign.

My quick reply is that, no, we didn’t take this from Jackman’s article, but it’s a pretty straightforward model where we just kept adding in features to explain different aspects of the time evolution of public opinion and poll data. As I wrote in my earlier post, it’s vaguely based on my 2010 paper with Kari Lock on Bayesian combination of state polls and election forecasts, but we pretty much started from scratch. I suppose you can get to our model from various paths. I agree with Mansillo that more can be done. Our model is all open-source, so anyone is free to alter it and do better.
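
To make the general idea concrete, here's a minimal sketch of the simplest member of this family: a random-walk state-space poll aggregator, fit with a Kalman filter. This is not the Stan model discussed above. The function name `aggregate_polls`, the parameter values, and the example polls are all made up for illustration, and the sketch leaves out pollster house effects, state-level structure, and any fundamentals-based prior.

```python
import math


def aggregate_polls(polls, n_days, drift_sd=0.003, prior_mean=0.5, prior_sd=0.1):
    """Filter polls through a random-walk model of latent support.

    polls: list of (day, share, sample_size) with 0 <= day < n_days.
    Returns one (mean, sd) pair per day, using only polls up to that day.
    All default values are illustrative assumptions, not estimates.
    """
    by_day = {}
    for day, share, n in polls:
        by_day.setdefault(day, []).append((share, n))

    mean, var = prior_mean, prior_sd ** 2
    filtered = []
    for day in range(n_days):
        if day > 0:
            # Latent support takes a random-walk step, so uncertainty grows.
            var += drift_sd ** 2
        for share, n in by_day.get(day, []):
            # Binomial sampling variance of the poll estimate.
            obs_var = share * (1 - share) / n
            # Kalman update: precision-weighted average of prior and poll.
            gain = var / (var + obs_var)
            mean += gain * (share - mean)
            var *= 1 - gain
        filtered.append((mean, math.sqrt(var)))
    return filtered


# Hypothetical polls: (day, candidate share, sample size).
polls = [(0, 0.52, 1000), (3, 0.50, 800), (7, 0.53, 1200)]
estimates = aggregate_polls(polls, n_days=10)
for day, (m, s) in enumerate(estimates):
    print(day, round(m, 3), round(s, 4))
```

Each added feature of the kind mentioned above (house effects, correlated state-level walks, a prior on the election-day outcome) would show up as an extra term in either the random-walk step or the observation equation, which is the point where a full Bayesian fit in Stan becomes more convenient than a hand-written filter.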


  1. anon says:

    Would be great to see a longer reply from you. He makes some good points on ideas worth trying.

    • I was really interested in the problem that these data need to be modelled with a quick turnaround, so there was perhaps a premium placed on simplicity of design for computational reasons. I am reminded of when Doug Rivers was running YouGov’s MRP model in Stan for the 2017 British general election: computation took about a day, which is not necessarily ideal.
      I noticed a few similarities between the Stan model that Gelman, Heidemanns and Morris had implemented and the model that Jackman published in 2006. After 15 years, as models become increasingly sophisticated, all sorts of permutations and influences find their way into them. The model just reminded me of Jackman’s earlier work at its heart, and I was a little hungry to know what the temptations were to model the data differently. It’s a bit of a “how long is a piece of string” question, then a “what would you do differently” question, or a “what icing should go on the cake” question.

      • Anon says:

        Sure, but certainly Jackman borrowed ideas from Andrew to begin with, so is the coming full circle really so surprising?

        • Luke Mansillo says:

          I’ve certainly found inspiration in both Jackman’s and Gelman’s models – I guess I wrote to Andrew curious about his inspiration and his thought process on how to execute the task. Thanks to Gelman, I always say to myself that if I cannot state a prior for something, I shouldn’t include it in a model – what are our priors for the transition of voting intentions, conditional on what inexhaustible list of random variables?

    • Andrew says:


      Some of these ideas definitely seem worth trying! As the saying goes, the most important thing is not what you do with the data, but what data you use. So it could be helpful to extend the model in ways that allow you to use more information.

      When I said the model is open source and people should improve it, I meant it! I’m just one person. And, for whatever reason, I never really focused 100% on this particular model. And even if I had, there’d still be lots of room for improvement, I’m sure.

  2. Luke Mansillo says:

    I have recently discovered that my iPad automatically corrects “dimensional” to “dementia” – apparently Apple’s Australian English dictionary doesn’t recognise the adjective form of dimension.
