Mister P for the 2020 presidential election in Belarus

An anonymous group of authors writes:

Political situation

Belarus is often called the “last dictatorship” in Europe, and rightly so: Aliaksandr Lukashenka has served as the country’s president since 1994. In the 26 years of his rule, Lukashenka has consolidated and extended his power, which is today absolute. Rigging referendums has been an effective means of consolidating power. His re-elections have been no better: he has claimed about 80% of the vote in all of them, and none of them has been acknowledged by the international community as free and fair. A mere half a year ago, Lukashenka’s dictatorial rule seemed unshakeable. Now all of this is history, as Lukashenka is scrambling to prop up his regime under the stress of 100,000- to 250,000-strong protest rallies held every weekend since August 9th. So what happened? In this post, we discuss a preprint that we wrote under the pseudonym Ales Zahorski to analyze the actual support levels for Lukashenka going into the presidential election on August 9th, 2020.

The 2020 presidential campaign proved to be unique for Belarus in many ways: The nonchalant approach of President Aliaksandr Lukashenka to the Covid-19 pandemic caused voluntary civil engagement in countering the threat of Covid-19. In turn, this led to increased political activity, providing fertile soil for the emergence of new political leaders, some of whom became presidential contenders. These new political leaders did not come from the conventional opposition and they had no obvious orientation towards Russia or the West. In addition, the new opposition leaders came from different backgrounds and had experience from a wide variety of professional fields. Thus they appealed to a much broader audience than their earlier counterparts.

Lukashenka, however, eliminated the three strongest candidates from the presidential race. To his dismay, the teams of those candidates united around Sviatlana Tsikhanouskaya, the wife of Siarhei Tsikhanouski, who was the third most popular candidate according to Internet surveys (see Table 1). She registered as a stand-in for her husband after his arrest. The Central Electoral Committee (CEC), a puppet body meant to oversee elections, allowed her to enter the race, probably because Lukashenka did not consider her a real threat. Apart from her, the CEC registered three representatives of the conventional opposition: Siarhei Cherachen, Andrei Dmitriyeu and Hanna Kanapatskaya. None of them had any visible support in the population according to the media polls (see Table 1). From the early stages of the 2020 presidential campaign, it was clear that the fairness of the election would be in question. Independent candidates were barred from entering local election committees, which hinted at planned ballot stuffing. The Belarusian Ministry of Foreign Affairs did not invite any credible international observers.

Sociology on political topics is banned

Since independent sociology and independent surveys are banned in Belarus, we had to be inventive in order to obtain data on the popularity of each presidential candidate. There are some online polls performed by the media (these were also forbidden as of June 1, 2020), but they cannot be trusted as they lack scientific rigour.

The absence of independent polling institutes and the extremely contradictory results coming from different sources provided the impetus for the current study. The results of media polls are summarized in Table 1, while the polls by Ecoom (a company hired by the Belarusian authorities) are presented in Table 2. As one can see, these polls contradict each other. We therefore took the initiative to carry out a national poll and, based on these data, used the multilevel regression with poststratification (MRP) methodology to estimate the popularity of each candidate. With this study, it was our sincere aim to provide a politically unbiased account of what the presidential election results in a counterfactual world – a Belarus with free and fair elections – would likely have been.

Data

We employed two different methods for polling: (1) an online poll using Viber, the most popular messenger application in Belarus; and (2) a street poll taking place at different locations across the country. The questionnaires contained questions about which candidate the respondents intended to vote for, as well as questions about the socio-economic and demographic status of the participants, including age, gender, education level, region of residence and type of area of residence, that correspond to the national census data. The latter allowed us to employ poststratification. We further added questions of common research interest. There were two additional questions in both the Viber and the street surveys, about the family’s total monthly income and about whether the respondent was willing to participate in early voting.

The invitation to participate in the Viber poll was advertised in various communities on social media and was also sent via SMS to random Belarusian phone numbers (see details in the paper). As a result, we obtained around 45,000 answers. After the clean-up, in which we disregarded answers from persons younger than 18 years old, people without Belarusian citizenship, and responses from phone numbers outside of Belarus, 32,108 answers were kept.

For the street poll, we aimed at collecting at least 500 responses to cover all possible categories of citizens with respect to gender, age, region, and type of area of residence. We used the official annual report for 2019 from Belstat (National Statistical Committee of the Republic of Belarus) to calculate the representative size of the statistical group for each category surveyed. As a result, we collected 1,124 responses, providing a decent representativeness of the Belarusian population as compared to the official Belstat census data. Demographic biases in the collected samples against the official 2009 census and 2019 annual report are presented in Figure 1.

After preprocessing the data from the Viber and street polls, we joined the two samples as follows: The filtered Viber sample was randomly divided into two parts consisting of 50% of the data each. One of these parts was kept as a holdout set for testing the predictive uncertainty handling of our MRP model, whilst the other one was merged with the street sample into a training set, where the street data was uniformly upsampled to the size of 50% of the whole Viber sample. By means of doing this kind of preprocessing we equalize the importance of the street and Viber data in the training set, whilst keeping approximately the same amount of information as in the Viber poll data.

The scripts used for merging the data are implemented in R as part of the statistical modelling pipeline and are freely available on the GitHub page of the project.
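To make the merging step concrete, here is a minimal sketch in Python (the actual pipeline is in R; this is not the project's code). It assumes that "uniformly upsampled" means drawing street responses uniformly at random with replacement. The function name `build_training_and_holdout` and the placeholder records are hypothetical.

```python
import random


def build_training_and_holdout(viber, street, seed=0):
    """Split the Viber sample in half, upsample the street sample, merge.

    viber, street: lists of respondent records (any Python objects).
    Returns (training, holdout).
    """
    rng = random.Random(seed)

    # Randomly divide the filtered Viber sample into two halves.
    shuffled = list(viber)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    viber_train, holdout = shuffled[:half], shuffled[half:]

    # Assumption: "uniform upsampling" = uniform draws with replacement,
    # up to 50% of the size of the whole Viber sample.
    street_upsampled = rng.choices(street, k=half)

    # Training set: one Viber half plus the upsampled street sample.
    training = viber_train + street_upsampled
    return training, holdout


# Placeholder records with the sample sizes reported in the post.
viber = [("viber", i) for i in range(32108)]
street = [("street", i) for i in range(1124)]
training, holdout = build_training_and_holdout(viber, street, seed=1)
print(len(training), len(holdout))
```

With these sizes, the street and Viber parts of the training set each contain 16,054 records, so the two sources carry equal weight, and the holdout half never overlaps with the Viber records used for training.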

MRP

In short, the methodology we employ involves building a statistical model that attempts to correct for the fact that our survey respondents are not representative of the population as a whole. By weighting the predictions of our multilevel regression model for each demographic subgroup according to that subgroup’s share of the population, we generalise from the sample to the entire population. The procedure is called multilevel regression with poststratification (MRP). The inference was performed in INLA.
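The poststratification step itself is simple arithmetic. Here is a minimal Python sketch, assuming that per-cell estimates have already been produced by some regression model. The cells, the support values and the census counts below are made-up illustrative numbers, not data from the study, and `poststratify` is a hypothetical helper name.

```python
def poststratify(cell_estimates, cell_counts):
    """Census-weighted average of per-cell model estimates.

    cell_estimates: dict mapping a demographic cell to the model's
                    estimated support in that cell (between 0 and 1).
    cell_counts:    dict mapping the same cells to census population counts.
    """
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell]
                   for cell in cell_estimates)
    return weighted / total


# Hypothetical cells (age group, area type) with made-up numbers.
cell_estimates = {
    ("18-30", "urban"): 0.10,
    ("18-30", "rural"): 0.20,
    ("60+", "urban"): 0.40,
    ("60+", "rural"): 0.50,
}
cell_counts = {
    ("18-30", "urban"): 300,
    ("18-30", "rural"): 100,
    ("60+", "urban"): 400,
    ("60+", "rural"): 200,
}

population_estimate = poststratify(cell_estimates, cell_counts)
print(round(population_estimate, 2))  # 0.31
```

The point of the weighting is that a subgroup that is over-represented among respondents does not dominate the population estimate. Each cell counts in proportion to its census size, not its sample size.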

We also adopted several recently published advancements from Gao et al. [2020] to improve MRP. In particular, the random effects corresponding to the ordinal categorical predictors (age and education) are assumed to have a latent AR1 structure between the categories, whilst other factors as well as the intercept term have an i.i.d. latent structure. Additionally, a latent Gaussian Besag-York-Mollié (BYM2) field is included in the model in order to account for the spatial dependence of the probabilities between the regions and for the variance that is explained neither by the covariates nor by the common latent factors included in the random intercept.

We also employed model selection using criteria including WAIC and MLIK to compare the suggested model to the baselines. The baselines were models without a latent AR1 structure between the categories and additionally without BYM2. The model with both AR1 and BYM2 included was found optimal with respect to these criteria.

Results

We found that the results of the election announced by CEC and the results of the pro-governmental BRSM (BRSM here stands for Belarusian Republican Youth Union) poll strongly disagree with the estimated pre-election ratings of the candidates, whilst the results of the independent polls are much more consistent with our estimated ratings. In particular, we found that both the officially announced results of the election and the officially reported early voting rates are improbable according to the estimates we obtained from the merged Viber and street poll data.

As shown in the following figure, both the officially announced results of the election and early voting rates are highly improbable. With a probability of at least 95%, Sviatlana Tsikhanouskaya’s rating lies between 75% and 80%, Aliaksandr Lukashenka’s rating lies between 13% and 18%, and the early voting rate predicted by the method ranges from 9% to 13% of those who took part in the election. These results contradict the officially announced outcomes, which are 10.12%, 80.11%, and 49.54% respectively and lie far outside even the 99.9% credible intervals predicted by our model. The ratings of other candidates and voting “Against all” are insignificant and correspond to the official results. The same conclusions are valid when comparing the pre-election ratings to the pro-governmental BRSM poll.

As shown below, the only groups of people where the upper bounds of the 99.9% credible intervals of the rating of Lukashenka predicted by MRP are above 50% are people older than 60 and uneducated people.

For all other subgroups, including rural residents, even the upper bounds of 99.9% credible intervals for Lukashenka are far below 50%. The same is true for the population as a whole. Thus, with a probability of at least 99.9%, as predicted by MRP, Lukashenka could not have had enough electoral support to win the 2020 presidential election in Belarus.

Criticism and our responses

Important assumptions that must hold for our conclusions to be valid are discussed by Daniel Simpson in his scientific blogpost:

Assumption 1: The demographic composition of the population is known.

Assumption 2: The people who did not answer the survey in subgroup j correspond to a random sample of subgroup j and to a random sample of the people who were asked.

Regarding Assumption 1, we used precise survey data from the 2009 Belstat census. We had to assume, however, that the demographics of Belarus have not changed significantly since then. In the first figure presented in this blogpost, we show this to be true at least marginally for four of the addressed demographic variables (when compared to the 2019 annual report), but the 2019 data on the fifth (education levels) is not yet available. Assumption 1 will also get an additional check when the results of the 2019 census in Belarus are published. We will then be able to restratify the results if significant changes in the demographics appear.

Regarding Assumption 2, we agree with Simpson that this sort of missing at random assumption is almost impossible to verify in practice. Simpson mentions that there are various things one can do to relax this assumption, but generally this is the assumption that we are making. This assumption is quite likely met for the street survey. Nevertheless, there is room for several sources of bias: (1) selection of respondents by interviewers, i.e. a tendency to select more approachable or friendly-looking people, although we gave explicit instructions to select random people; (2) response or refusal of respondents when approached (those in a hurry, those afraid to answer because of their pro-opposition views, possibly pro-governmental respondents who are not eager to answer due to their distrust in polls and other activities around the election); (3) item non-response, i.e. respondents not answering specific questions (some respondents did not want to report their income levels). The net effect of (1)-(3) is, of course, unknown.

Validating Assumption 2 in the Viber poll is much more difficult. According to Simpson’s blogpost, one option is to assess how well the prediction works on some left-out data in each subgroup. This is useful because poststratification explicitly estimates the response in the unobserved population. This viewpoint suggests that our goal is not necessarily unbiasedness but rather a good prediction of the population. It also means that if we can accept a reasonable bias, we get the benefit of much tighter credible bounds on the population quantity than survey weights can give. Hence, we return to the famous bias-variance trade-off. We have tried to approach this assumption from several perspectives.

First of all, in the Viber poll, we used sampling of random phone numbers to invite respondents and advertised at different venues frequented by people with various demographic and political backgrounds.

Secondly, in an attempt to obtain better results in the bias-variance trade-off sense and to assess the predictive properties of the underlying Bayesian regression, we uniformly upsampled the street data to the size of 50% of the Viber data and randomly divided the Viber data into two halves. One half was merged with the upsampled street data to form the training sample. The other was left as a hold-out set to test predictive uncertainty handling using the modified Brier score introduced in the paper. Here, we aimed at reducing the variance, possibly at the cost of introducing some bias, and at testing the predictive qualities of the model.

Lastly, to assess and confirm our findings on the joint sample, we performed the same analysis with MRP fitted on the street data only. This analysis is much less likely to violate Assumption 2 above; however, the sample is significantly smaller, so in the sense of a bias-variance trade-off we are likely to have significantly increased variation in the posterior distributions of the focus parameters. At the same time, it lets us validate the results obtained by MRP on the joint sample. The resulting posterior quantiles of interest are presented in Figure 4. In short, one can see that even though the level of uncertainty is significantly increased due to the reduced sample size, ultimately all of the conclusions are equivalent to those presented above for the MRP trained on the joint sample, though for some important conclusions the level of significance drops from 99.9% to 99% or 95%. Moreover, the 99.9%, 99%, 95%, and 90% credible intervals of the MRP trained on the joint sample are almost always inside the corresponding credible intervals obtained on the street data. This allows us to conclude that we have obtained a very reasonable bias-variance trade-off on the joint data, corroborating the conclusions we have drawn from the joint sample.
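As an illustration of how a hold-out check of this kind works, here is a minimal Python sketch of the standard multiclass Brier score. This is explicitly not the modified Brier score from the paper, whose definition is not reproduced in this post. The function name and the toy predictions are hypothetical.

```python
def brier_score(predicted, observed):
    """Standard multiclass Brier score (lower is better).

    predicted: list of dicts mapping candidate -> predicted probability,
               one dict per held-out respondent.
    observed:  list of the candidates those respondents actually chose.
    """
    total = 0.0
    for probs, choice in zip(predicted, observed):
        # Squared distance between the predicted probability vector
        # and the one-hot vector of the observed answer.
        total += sum((p - (1.0 if cand == choice else 0.0)) ** 2
                     for cand, p in probs.items())
    return total / len(observed)


# Toy example with two held-out respondents and two candidates.
predicted = [{"A": 0.8, "B": 0.2}, {"A": 0.5, "B": 0.5}]
observed = ["A", "B"]
print(round(brier_score(predicted, observed), 2))  # 0.29
```

A model that assigns probability 1 to every observed answer scores 0. Confident wrong predictions are penalised most heavily, which is why a score of this kind can be used to check predictive uncertainty handling and not only point accuracy.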

The full article is here. It contains some tables and graphs.

I have not checked this analysis myself, and of course all conclusions depend on assumptions, but I like the general approach of adjusting survey data in this way, and even if this analysis has its imperfections it can be the starting point for further work and it can motivate similar studies in other countries.

6 thoughts on “Mister P for the 2020 presidential election in Belarus”

  1. Sociology on political topics is banned

    But the authors had no trouble doing street polls around the country?
    Is Viber controlled within Belarus?

    Secret police may

    Of course the election was rigged. I do find the complete flip unusual.

    Two or three casual commentators on the election seemed to think, given the various arrests of opponents and all sorts of bizarre shenanigans that were going on at the time, that Tsikhanouskaya would almost certainly have lost an honest election but doubted if Lukashenka would have gotten anything much over 60% at most.

    • jrkrideau says:

      “But the authors had no trouble doing street polls around the country?”

      The polls were performed by volunteers who were selected via interviews and then trained and instructed by our sociologists. The ban on independent sociology means that we did not have a license to do this and that, in the worst-case scenario, the volunteers could be fined, but hopefully this has not happened. The whole procedure of training the volunteers, as well as the instructions and questions, is described in full detail in our preprint https://arxiv.org/abs/2009.06615. The raw data obtained on the streets and from the Viber poll are available on GitHub https://github.com/Narodny-Opros-2020/national-poll-analysis.

      jrkrideau says:

      “Is Viber controlled within Belarus? Secret police may”

      This seems to be an extremely bold statement. Viber is a respected international company which fully respects GDPR and privacy, see https://www.viber.com/en/terms/viber-privacy-policy/. There is no reason to believe it can be controlled by the secret police of Belarus. If you have any evidence, we would very much like to hear about that.

      jrkrideau says:
      “Two or three casual commentators on the election seemed to think, given the various arrests of opponents and all sorts of bizarre shenanigans that were going on at the time that Tsikhanouskaya would almost certainly have lost an honest election but doubted if Lukashenka would have gotten anything much over 60% at most.”

      I find this statement very bold. In contrast, in this piece of research we provide, in a fully transparent manner, the whole pipeline of gathering and processing data, modelling and drawing conclusions (see the preprint and our GitHub). I haven’t seen any casual commentators claiming that “Tsikhanouskaya would almost certainly have lost an honest election but doubted if Lukashenka would have gotten anything much over 60% at most.” But I have seen the head of the official Institute of Sociology of the National Academy of Sciences announce, earlier in 2020 (before the peak of the COVID-19 crisis), that the rating of Lukashenka in Minsk did not exceed 24% (see for example https://euroradio.fm/sacyyolag-gramadstva-nemagchyma-perazagruzic-na-zavadskiya-naladki). Unfortunately, Dr. Korshunau lost his job soon after that. There were also independent exit polls (held without a license), which agree with the ratings obtained by MRP in our work (see https://charter97.org/en/news/2020/8/9/388815/). Finally, in September 2020, Chatham House performed its own CAWI survey https://www.chathamhouse.org/2020/10/what-belarusians-think-about-their-countrys-crisis, the results of which also show a likely victory for Sviatlana in the first round of the election. Their results also lie within the credible intervals predicted by MRP on the street data at a reasonable credibility level.

      Of course, our analysis has some imperfections and relies on several assumptions… These aspects are honestly discussed in the last section of the preprint (https://arxiv.org/abs/2009.06615). But we believe these are the best estimates of the pre-election ratings one could obtain given the circumstances and doubt there would have been any serious deviations from them had the election in Belarus been free and fair. Once again, you can see on your own (https://github.com/Narodny-Opros-2020/national-poll-analysis) that the signal we see in the data is very strong.

      Regards, Ales

  2. jrkrideau, would you mind giving any reference to these comments? I have followed the whole buzz around the election in Belarus but haven’t seen any comments like these by any trustworthy experts. But it might be that I missed something.

    I suspect you might be confusing the statement of 60% at most for Lukashenka with the totally misleading interpretations of the results of the report by the Voice platform, ZUBR and Honest People https://bit.ly/ voice-belarus-report. However, this report claims quite a different thing.

    First of all, one must understand that Voice collected photos of ballots from voters willing to share them, across polling stations all over the country. Thus, for every polling station they obtained a subset of the votes, confirmed by photos, which gives a lower bound on the confirmed votes for each candidate. They could then see obvious rigging at the polling stations where these lower bounds exceeded the officially announced results. According to the report above, this happened at no fewer than 30% of the polling stations, even though the lower bounds are not very tight (around 500,000 photos collected in total out of 6,000,000-8,000,000 potential voters)!

    Regarding the magical upper bound of 60% for Lukashenka that somehow appeared in your comment: the report (https://bit.ly/ voice-belarus-report) claims (pages 1-7) that Honest People managed to take photos of the results officially announced (and most likely rigged) by the local election committees (LECs) at all the locations where these results were announced (1,310 polling stations out of 5,767 across the country). At these 1,310 polling stations, according to the results officially announced by the LECs, Lukashenka got 61.7% and Tsikhanouskaya 25.4%. For the polling stations where the results were not announced, this implies that the official results would have to give Lukashenka 88.9% and Tsikhanouskaya 2.97%. Quite an improbable difference, isn’t it? And as you can clearly see, these 61.7% cannot in any sense be a good estimate of the real support for Mr. Lukashenka, which is discussed in detail in the post https://statmodeling.stat.columbia.edu/2020/11/19/mister-p-for-the-2020-presidential-election-in-belarus/.

    Moreover, the Honest People claim (page 5 of the report) the following:

    – In Minsk, according to the official results that were reported by Central Election Commission, 126,861 voters at 731 polling places had voted for Tikhanovskaya. We analyzed official records from 432 polling places in Minsk (accounting for 59% of all polling places in Minsk). In these 432 places, there were 132,941 votes for Tikhanovskaya reported — 4.8% more than the total number of votes reported by the Central Election Commission for all 731 polling places.

    According to the CEC, in the Minsk region, 115,304 voters at 993 voting places
    voted for Tikhanovskaya. We analyzed results that were reported by 257 voting places (26% of all voting
    places in Minsk region) and found that Tikhanovskaya received 114,553 votes
    there. This is 99.3% of the votes reported by CEC. These results imply that in
    the remaining 74% voting places, only 0.7% votes went for Tikhanovskaya. Confusing, isn’t it?
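    The percentages in the two quoted passages can be re-derived from the raw counts. A minimal Python check, using only the numbers quoted above (the variable names are illustrative):

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Minsk (city): counted votes at 432 places vs. the CEC total for all 731.
minsk_cec_total = 126861
minsk_counted = 132941
minsk_excess = percent(minsk_counted, minsk_cec_total) - 100.0   # about 4.8
minsk_coverage = percent(432, 731)                               # about 59

# Minsk region: counted votes at 257 places vs. the CEC total for all 993.
region_cec_total = 115304
region_counted = 114553
region_share = percent(region_counted, region_cec_total)         # about 99.3
region_left_over = 100.0 - region_share                          # about 0.7
region_coverage = percent(257, 993)                              # about 26

print(round(minsk_excess, 1), round(region_share, 1),
      round(region_left_over, 1))
```

    The counts are consistent with the quoted percentages: a 59% subset of the Minsk polling places already exceeds the CEC total for the whole city by about 4.8%, and a 26% subset in the Minsk region accounts for about 99.3% of the CEC total there.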
