A Bayesian state-space model for the German federal election 2021 with Stan

I didn’t do anything on this, just stood still and listened while others talked. I’ll share the whole thread with you, just to give you a sense of how these research conversations go.

This post is for you if:

– You’re interested in MRP, or

– You’re interested in German elections, or

– You want to overhear this particular email exchange.

It started when I received a message from Marcus Groß:

We published a project on forecasting the outcome of the German federal election using a Bayesian state-space model. Similar to your forecast for the 2020 US presidential election, the model is written in Stan and freely available on GitHub. A webpage is here (in German; an English version follows soon; you may use Google Translate in the meantime).

The model consists of two parts:

– A model for the election outcome, i.e. the vote share for each party based on poll data.

– A model that assigns probabilities to possible governing coalitions, using expert survey data. For a given election outcome it is not certain which parties will end up in the government coalition, as in most cases multiple options are conceivable and the choice is subject to negotiations between the parties.

On the first part:

I think that everybody concerned with this topic agrees that accounting for important sources of bias and uncertainty is key to deriving sound estimates of election outcomes. The model for the German federal election accounts for the following sources (a toy sketch of how they enter a poll appears after the list):

a) Uncertainty about future events and their correlations across parties, e.g., an event that is positive for party A might have a strongly negative effect on party B, but much less so on party C.

b) Long-term and short-term bias (“house effects”) of a certain pollster for a certain party, i.e., a certain pollster might generally favour a specific party (long-term), while the size of this bias fluctuates over time (short-term).

c) Common pollster bias for specific parties, and correlations between parties. All pollsters together might be wrong or biased. Following an election, pollsters will likely adjust their sampling schemes and correction methods to account for this; afterwards the bias magnitude will slowly creep up again.

d) Polling error due to finite sample size.
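To make these components concrete, here is a toy numpy sketch of how sources (b) through (d) could enter a single published poll. The names and numbers are mine, not the ones in the actual Stan code, and source (a) belongs to the time-series part of the model, sketched further below.

import numpy as np

rng = np.random.default_rng(1)

# latent "true" vote shares for six parties (made-up numbers)
true_share = np.array([0.28, 0.17, 0.19, 0.11, 0.08, 0.10])

# (b) long-term house effect of one pollster, per party
house_bias = rng.normal(0.0, 0.01, true_share.size)
house_bias -= house_bias.mean()

# (c) common bias shared by all pollsters in the current election cycle
common_bias = rng.normal(0.0, 0.015, true_share.size)
common_bias -= common_bias.mean()

# (d) finite-sample noise: one published poll of n respondents
n = 1500
expected = true_share + house_bias + common_bias
poll = rng.multinomial(n, expected / expected.sum()) / n

print(poll.round(3))  # truth + house bias + common bias + sampling noise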

The basic model idea follows a Bayesian state-space model, such as Simon Jackman’s model for the Australian election. A major change, however, is that the latent voter intention is not modeled by a random walk but rather uses a “long-short term memory” extension. The concept behind this is that a specific event influencing the vote share of a specific party (a scandal, the nomination of a candidate, etc.) first spreads through the media, which causes an overreaction or overshoot (short-term memory) with a maximum effect at around four weeks. In the medium to long term, however, this effect declines somewhat as the media move on to other topics and voters forget about the event in question. As the remaining long-term effect is smaller than the initial one, we see regression-to-the-mean behaviour. This is modeled by a time-series structure resembling a mixture of autoregressive processes. The concept can also be confirmed empirically, and it improves the forecast quite a bit (between 5% and 15%, depending on the forecast horizon).
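The overshoot-then-decay behaviour can be pictured with a toy impulse response. The gamma-shaped kernel below is only my illustration of the described pattern, not the actual mixture-of-autoregressive-processes formulation in the model:

import numpy as np

weeks = np.arange(0, 53)

# hypothetical split of an event's effect: a persistent part plus a transient overshoot
w_long, w_short, peak = 0.01, 0.02, 4.0  # long-run effect, overshoot size, peak at ~4 weeks

# transient component rises, peaks at `peak` weeks, then fades again
transient = w_short * (weeks / peak) * np.exp(1.0 - weeks / peak)
effect = w_long + transient  # total shift in the party's latent vote share

print(effect[:6].round(4))  # builds up towards the maximum around week 4...
print(effect[-1].round(4))  # ...then regresses toward the smaller long-run effect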

On the second part:

Here, we asked experts for their opinions on coalition preferences between parties. They had to rank coalitions under the premise that the parties were free to choose, independently of the election result. (Imagine a situation where party representatives are locked in a room to form coalitions. Who would get together first, second, and so on?) Thus, these rankings are orthogonal to the election outcome and vote-share simulations, which makes it quite straightforward (under certain assumptions) to derive probabilities for the final government coalition.

The model description is here and the GitHub link is here.

I sent this to Merlin Heidemanns, who worked with me to set up the Economist model and is also German. I guess Germany is more difficult to model because you have more than two major parties.

Merlin had a question for Marcus:

Something I found weird looking at the website: you say that Jamaica has a 99.8% chance of receiving a majority of the vote share, but that the coalition will be CDU/CSU/Greens with 63%? Do the experts make their predictions based on the model probabilities? The probabilities for the different coalitions don’t necessarily seem congruent with each other. Maybe I am missing something.

Marcus replied:

This is because different questions are being answered. Yes, Jamaica has an estimated 99.8% chance of getting a majority, and CDU/CSU/Greens of 83.5% (see the bottom of the web page). These numbers are not influenced by the expert rankings and are derived from the “first part” of the model, which uses only poll data. The 63% for CDU/CSU/Greens in the first graph does not, however, answer the question of getting a majority, but rather gives the probability of being in charge after the election (i.e., constituting the government coalition). These probabilities therefore sum to 100%! To derive them, the expert rankings are combined with the election outcome simulations (see the last section of the notebook.pdf for a detailed explanation).
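To see how rankings that are orthogonal to the vote shares can be turned into government probabilities, here is one stripped-down Monte Carlo reading of that combination. The coalitions, rankings, and Dirichlet draws below are all made up, standing in for the real simulation output:

import numpy as np

rng = np.random.default_rng(7)
parties = ["union", "greens", "spd", "fdp", "linke", "afd"]

# hypothetical expert ranking: lower rank = parties would "find together" sooner
expert_rank = {
    ("union", "greens", "fdp"): 1,  # Jamaica
    ("union", "greens"): 2,
    ("spd", "greens", "linke"): 3,
    ("union", "spd"): 4,
}

counts = dict.fromkeys(expert_rank, 0)
n_sims = 10_000
for _ in range(n_sims):
    # stand-in for one draw from the vote-share model (Dirichlet just for illustration)
    share = dict(zip(parties, rng.dirichlet([28, 18, 16, 11, 8, 10])))
    feasible = [c for c in expert_rank if sum(share[p] for p in c) > 0.5]
    if feasible:  # among majority-capable coalitions, the experts' favourite forms
        counts[min(feasible, key=expert_rank.get)] += 1

for coalition, k in counts.items():
    print(coalition, round(k / n_sims, 3))  # government probabilities across coalitions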

Merlin then had further thoughts:

1. I had been thinking about the German elections for a while, and so far I’d thought that an MRP-based approach, as YouGov did it, would probably be better than a model based only on, essentially, the second vote choice. Also with regard to the Überhangmandate.

I like the model (and that it’s public)! Some things I would be thinking about:

2. How predictive are polling errors across elections? I have recently begun to think that a mean-zero assumption for polling errors is not sensible. I am obviously unaware of the situation in Germany, but at least here in the US the year-to-year correlation is between 0.4 and 0.6. Maybe one can do better by predicting polling errors for the upcoming election rather than sampling them anew. (A sketch of this idea appears after this list.)

3. Similarly, I’ve been concerned about house effects that are assumed to average to 0. I would look at patterns in the polling errors by pollster, decompose the polling error year by year into an average error and a polling-house-specific error, and see whether there is a pattern one could use to derive priors for the next election cycle from the previous one.

4. I am not a fan of fixed hyperparameters for the short- and mid-term decay. I’d either put priors on them or try to estimate them with auxiliary data, though my fear is that the data you fit will affect them and cause some undesirable behavior that inflates them as the model tries to fit random excess variation in the polls.

5. Both epsilon and pollError can probably be decomposed with the standard-normal Cholesky decomposition, and that could be faster? A week is a lot of time to fit a model like this.

6. What do you gain from estimating the polling error for every election week? Is the assumption that polling error is larger or smaller at different points in the election season?
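Here is a small sketch of what Merlin’s point 2 could look like in practice: instead of drawing the common polling error fresh each cycle, one could center it on a fraction of the previous cycle’s observed error, AR(1)-style. Everything below, including the 0.5 correlation (the midpoint of the US range he mentions), is an assumption for illustration, not part of the actual model:

import numpy as np

rng = np.random.default_rng(3)

last_error = 0.02        # e.g., CDU/CSU was overestimated by 2 points last election
rho, sigma = 0.5, 0.015  # assumed cycle-to-cycle correlation and marginal sd

# current model: common poll error drawn anew, centered at zero
error_fresh = rng.normal(0.0, sigma, 10_000)

# alternative: centered on rho * last_error, with the same marginal variance
error_ar1 = rng.normal(rho * last_error, sigma * np.sqrt(1.0 - rho**2), 10_000)

print(error_fresh.mean().round(4))  # ~0.00
print(error_ar1.mean().round(4))    # ~0.01: part of the old bias persists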

Marcus replied:

1. You mean using polls at the state or county level in an MRP model? At the state level, poll data are quite sparse in Germany, so we used only the second-vote poll data. But yes, this might be something to look at. As for the Überhangmandate, I don’t think there will be a problem, because a change in the voting system some years ago guarantees that the second-vote shares and the Bundestag seat shares are the same (ignoring discretization).

2. That might actually be worth looking into, as the errors are currently assumed to be uncorrelated within the model. My assumption was that pollsters learn after each election and try to recalibrate and correct their polling methods. At first glance, there does not seem to be much correlation, but I’ll look into it in more detail.

3. I think I would run into an identification problem otherwise. The total bias, i.e., the deviation of a specific pollster’s published poll for a certain party from the true vote share, is decomposed as

total_bias[party, pollster, election] = poll_error[party, election] + house_bias[party, pollster] + house_bias_el[party, pollster, election]

So the house bias of a specific pollster for a party has two parts: the long-term house bias (the “housebias” parameter) and the short-term house bias (the “housebiasEl” parameter). A specific pollster can have a house effect != 0 for a specific party, but the average over all pollsters is 0 (otherwise it would not be possible to differentiate it from the poll error and the short-term house effect; see the sketch after this list).

4. I tried that as well, but the model got much less stable. I think, however, that fixing them is okay, as the extent of the decay is estimated by the model; only the half-life parameters are fixed. (I see that as somewhat similar to fixed knots in a spline model. I also tried to introduce a long-term decay, but this didn’t change much.)

5. Actually, the multi_normal_cholesky statement is currently used, or do you mean some other formulation? I tried a lot of other model formulations in Stan, but the current one was the fastest I could come up with. The model currently takes only about 18 hours to fit, not a week; the web page is updated every day. (Maybe you looked at the poster? It shows an outdated version of the model.)

6. The poll error is estimated only per election and party, not by week (see line 54 in https://github.com/INWTlab/lsTerm-election-forecast/blob/master/stan_models/lsModelMulti.stan). This was also changed some months ago (the info on the poster is outdated).
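The identification point in Marcus’s answer 3 can be seen in a few lines: a constant can be shifted between the election-level poll error and the pollster-level house biases without changing what the data see, so the model pins the house biases down by centering them at zero. A toy illustration with made-up numbers:

import numpy as np

poll_error = 0.01                              # common error, one party, one election
house_bias = np.array([0.004, -0.006, 0.002])  # three pollsters, same party; mean 0

total = poll_error + house_bias  # what the observed polls can actually identify

# an alternative decomposition with identical totals: shift 0.005 between the terms
total_alt = (poll_error - 0.005) + (house_bias + 0.005)
print(np.allclose(total, total_alt))  # True: the split alone is not identified...

# ...which is why the house biases are constrained to average to zero over pollsters
print(house_bias.mean())  # 0.0 by construction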

Merlin responded:

That clarifies a lot. Thanks!

YouGov essentially did a very large poll and fit an MRP model, poststratifying on census variables to get win probabilities for each electoral district and for the second vote choice. I am not aware of how frequent vote splitting is, or how voters in Germany are generally distributed with respect to party identification, so I am not certain whether that approach is comparatively better.
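For readers who haven’t seen MRP, the poststratification step in cartoon form: cell-level support estimates (ideally from a multilevel regression fit to the big poll) get reweighted by census cell counts. All numbers here are invented:

import numpy as np

# estimated P(vote for party X) in four demographic cells of one district,
# as they might come out of a multilevel model
cell_support = np.array([0.45, 0.30, 0.55, 0.25])

# how many people the census says live in each cell of that district
census_counts = np.array([120_000, 80_000, 50_000, 150_000])

district_share = np.average(cell_support, weights=census_counts)
print(round(district_share, 3))  # poststratified district-level estimate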

Yes, I looked at the poster. 18 hours is a lot, but if you just have it running on a server somewhere, I guess it doesn’t matter.

We used the non-centered parameterization for our multivariate random variables like so:

mu_b[:, t] = cholesky_cov_mu_b * raw_mu_b[:, t] + mu_b[:, t - 1];

That could help speed things up (or not), but if you haven’t already, it might be worth trying.
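For anyone less familiar with the trick: the line above exploits the fact that correlated multivariate normal draws can be built from independent standard normals via a Cholesky factor, letting the sampler work on the better-behaved raw scale. A quick numpy check of the equivalence (the covariance matrix is, of course, made up):

import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])  # illustrative innovation covariance
L = np.linalg.cholesky(Sigma)

# centered: draw increments directly from MVN(0, Sigma)
centered = rng.multivariate_normal(np.zeros(2), Sigma, size=100_000)

# non-centered: draw iid standard normals and map them through L
raw = rng.standard_normal((100_000, 2))
non_centered = raw @ L.T

print(np.cov(centered.T).round(2))      # both ~ Sigma, up to Monte Carlo error
print(np.cov(non_centered.T).round(2))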

On whether pollsters correct after each election, it’s hard to say. Do you have graphs of pollster-party-level error by election?

And that makes sense on the stability issue when not fixing those decay parameters. It’s probably also related to identifiability.

And then Marcus replied:

To fit an electoral-district-level model, it is not easy to get poll data, and I don’t think it would make much difference at the federal level. The reason is that the first vote has had no influence on the relative seat distribution in parliament since 2013 (https://de.wikipedia.org/wiki/%C3%9Cberhangmandat#Reform_wegen_Verfassungswidrigkeit), when the so-called Ausgleichsmandate were introduced. Thus, for the seat distribution in parliament we effectively have a popular vote (only the second vote counts). Überhangmandate still exist, but the new Ausgleichsmandate increase the parliament size such that the seat distribution is identical to the second-vote result. Still, getting estimates at the electoral-district level, including winning probabilities for the candidates, would be very nice.
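A toy version of that leveling logic, ignoring the real law’s many details (Sainte-Laguë rounding, the 5% threshold, state lists, and so on): the parliament grows until every party’s proportional seat count covers its direct mandates. The numbers are invented:

# second-vote shares and directly won district seats (made-up numbers)
share = {"union": 0.33, "spd": 0.21, "greens": 0.09, "others": 0.37}
direct = {"union": 231, "spd": 59, "greens": 1, "others": 8}

size = 598  # nominal Bundestag size
while True:
    seats = {p: round(s * size) for p, s in share.items()}
    if all(seats[p] >= direct[p] for p in direct):  # overhangs fully leveled
        break
    size += 1  # add Ausgleichsmandate by growing the parliament

print(size, seats)  # seat shares now track the second-vote shares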

I tried that non-centered parameterization, but it didn’t matter performance-wise in this case. These kinds of hacks unfortunately do make the code less readable (e.g., the “epsilon” variable is also multivariate-t, which is anything but obvious), which is one of the drawbacks of Stan. However, I’m really happy that the model runs at all! Stan really was the only thing that worked. State-space models with this level of convolution are notoriously hard to fit.

On the correlations of poll errors, maybe a short explanation would help:

For a given party:

– There is a positive correlation between the errors of a specific pollster, as the (long-term) house-effect parameter is shared (the “housebias” variable), and even more so if the polls are from the same election cycle, in which case the election-specific (short-term) house-effect parameter (the “housebiasEl” variable) is shared as well.

– There is also a positive correlation between the errors of different pollsters within a given election cycle, as they share the common pollster bias variable (“pollError” in the Stan model), which expresses a global bias of all pollsters within that cycle.

The assumed pollster correction after an election is to be interpreted globally, i.e., the common pollster bias is reset. For example, in the 2017 election all pollsters overestimated the CDU/CSU vote share. As of now, the model doesn’t assume any persistence of this kind of bias; i.e., for the 2021 election the model draws this bias from a (multivariate) normal with mean zero. However, I’ll look into that next week; maybe there is some room for improvement.

That’s where it is so far. You can follow the links at the top of the post to see Marcus’s model, data, and documentation.


