
You’ll never guess what I’ll say about this paper claiming election fraud! (OK, actually probably you can guess.)

Glenn Chisholm writes:

As a frequent visitor to your blog (a bit of a long-time-listener, first-time-caller comment, I know) I saw this particular controversy:


Very superficial analysis:

and was interested if I could get you to blog on its actual statistical foundations. This particular paper has at least the appearance of respectability due to the institutions involved. As a permanent resident, not a citizen, I tend to be a little more abstracted from American politics, although I do enjoy the vigor with which it is played in the US. Based on some of the vitriol I have seen from all sides you may not want to touch this with a ten-foot pole, but I thought it would be interesting to get a credible political scientist’s analysis out there publicly. There does seem to be an undercurrent in this cycle where a group genuinely feels that somehow they were disenfranchised. My take is that Occam’s Razor applies here and no manipulation occurred, but my opinion is worthless.

My reply:

I don’t find this paper at all convincing. There can be all sorts of differences between different states, and that pie chart is a joke in that it hides the between-state variation within each group of states. You never know, fraud could always happen, but this is essentially zero evidence. Not that you’d need an explanation as to why a 74-year-old socialist fails to win a major-party nomination in the United States.

Also, regarding your comment about the institutions involved, I wouldn’t take the credentials so seriously; this Stanford guy is not a political scientist. Not that this means it’s necessarily wrong, but he’s not coming from a position of expertise. He’s just a guy; the Stanford affiliation gives no special credibility.
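
To make the pie-chart complaint concrete, here is a minimal sketch with invented numbers. The state-level vote shares below are hypothetical and are not taken from the paper or from the actual primaries. The only point is that two groups of states can differ in their averages while the spread among states within each group is larger than the gap between the groups, which is exactly what a single pie slice per group cannot show.

```python
from statistics import mean, stdev

# Hypothetical Clinton vote shares (percent) by state.
# These numbers are invented for illustration; they are not real results.
no_paper_trail = [82.0, 77.0, 71.0, 66.0, 52.0, 49.0, 43.0]
paper_trail = [73.0, 64.0, 56.0, 50.0, 47.0, 41.0, 33.0]


def summarize(shares):
    """Return the group mean, between-state standard deviation, and range."""
    return mean(shares), stdev(shares), (min(shares), max(shares))


np_mean, np_sd, np_range = summarize(no_paper_trail)
pt_mean, pt_sd, pt_range = summarize(paper_trail)
gap = np_mean - pt_mean

# Fraction of cross-group state pairs in which the paper-trail state
# is actually the more pro-Clinton of the two.
pairs = [(a, b) for a in no_paper_trail for b in paper_trail]
reversed_share = sum(b > a for a, b in pairs) / len(pairs)

print(f"no paper trail: mean {np_mean:.1f}, sd {np_sd:.1f}, range {np_range}")
print(f"paper trail:    mean {pt_mean:.1f}, sd {pt_sd:.1f}, range {pt_range}")
print(f"gap between group means: {gap:.1f}")
print(f"share of state pairs that go the 'wrong' way: {reversed_share:.2f}")
```

A plot of the individual states, or even just the ranges printed here, would let a reader judge the gap against the within-group variation.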


  1. Matt VE says:

    “I don’t find this paper at all convincing. There can be all sorts of differences between different states…”

    Well, hell, they addressed that on the second page of that paper (manuscript? series of disjointed paragraphs? list of stuff?):

    “Are there other variables that could account for our main effect (states without paper trails going overwhelmingly for Clinton)?”

    Wracking their minds, surely giving deep consideration to the host of state-level variables discussed in the universe of political and social analysis, they came up with… proportion non-Hispanic white and presidential voting margins.

    Two controls, main effect remains.

    Case. Closed.

    • Anon says:

      Very interesting paper, they present compelling evidence that the absence of paper trails in a primary predicts Clinton support, which is just what you’d expect if Clinton’s delegate support is, in part, driven by fraud.

      Of course, other explanations could be at work, although the effect remains controlling for whites and presidential voting margins.

      Further research is needed to precisely determine the extent to which Clinton support is driven by the fraud channel, as opposed to these other channels.

      • Bob says:

        Anon suggests: “Of course, other explanations could be at work”

        Well, a quick look at their supplementary materials shows that many of the states with an absence of paper trails are in the South. What are the chances that the wife of a former southern governor and popular president will do better in the South than a Jewish, northeastern, socialist old guy? Always trying to be Bayesian, I cannot assign that event a 100% probability, even though that is my first instinct. It is probably better to consider one’s prior distribution on votes for Clinton minus votes for Sanders, a quantity that ranges between -100% and +100%. It seems to me that the probability mass would clump on the right-hand side.

        Another old guy.

    • Andrew says:

      I’m puzzled by this comment from Anon, which is written in a spammy way (“Very interesting paper . . .”) and has spammy content, yet is not actually word salad and is targeted at this post. I guess it’s propaganda, or whatever you want to call it.

      It was a tough call on whether to screen it out. I screen out spam but allow all real comments (except for the very very rare comments that are seriously abusive). The comment above by Anon falls somewhere on the border between spam and a legitimate comment.

  2. GabbyD says:

    If you were them, how would you make it more convincing? What cuts of the data would bolster or destroy their argument (paper-trail vs. no-paper-trail states)?

  3. Chris Wilson says:

    I see your point, but Bernie’s no socialist. He’s a social democrat, or, in American terms, a mainstream New Dealer. So what IS interesting is the lengths the Democratic Party will go to defeat somebody like him (even if there’s no evidence of outright fraud).

    • Andrew says:


      I don’t think it’s interesting that Clinton will go to such lengths to defeat Sanders. She wants to win the primaries and get the endorsement of superdelegates, then get the Democratic nomination, then win the general election. That’s been her plan all along, no secret at all. If Clinton were running for president and not trying her best to win, that would be the strange thing!

      • Chris Wilson says:

        Hey Andrew, no I meant the entire rest of the Democratic party, including a lot of voters who I think are closer to Bernie on the issues. Clearly, HRC will do whatever it takes to win, and that’s no surprise :)

  4. alex says:

    It doesn’t increase my confidence in their findings that they start off quoting Abraham Lincoln saying something that he almost certainly never said. (The Lincoln entry on wikiquote has a good summary that notes that the first recorded written attribution of the quote to Lincoln came in 1886.)

  5. Frederick Guy says:

    Their dataset lists 13 non-paper-trail primary states for 2016. All but two of these states are southern or border states. They list 18 paper trail states, of which only five are southern or border (here counting MD and OK as southern-or-border). I would be very surprised if there were not regional differences in voting that are not captured in the variables “% non-Hispanic whites” and “redness” (the latter is Republican share in 1996-2012 presidential voting). The comparison with 2008 (Clinton *not* favored in non-paper-trail states in 2008 primaries), which is presented as further evidence of a 2016 problem, fits with the fairly obvious fact that Clinton vs. Obama in Democratic primaries in the South is not the same as Clinton vs. Sanders.

    • Elin says:

      It’s not the population of the state that matters so much as the population of Democratic voters. Yes, Black voters in the South were overwhelmingly for HRC, and voter suppression efforts are also greatest there.

  6. D.O. says:

    Why do they regress on Hispanic, but not on the black vote? Blacks are the core voting bloc for Clinton.

    • Jonathan (another one) says:

      Because it changed the results?

    • GabbyD says:

      It’s “Non-Hispanic” Whites that they include, which should include Blacks, right?

      • D.O. says:

        The percentage of non-Hispanic whites in the state according to the census is not good enough. The Democratic electorate has a much higher percentage of blacks than the population overall (like 30% vs. 13% or something; it is not some small-potatoes nitpick). And, if I understand it right, Southern whites are more Republican than all other types of whites. Another factor in the Sanders/Clinton split is age (see Nate Silver, for example). It is trickier than black/Hispanic/white because how many young vs. older people come to a primary depends on the campaign to a large degree, but it is probably irrelevant for allegations of fraud (as opposed to voter suppression, if someone tried to look into that).

      • Elin says:

        GabbyD what do you mean? Usually there is a 4 way split, White Non Hispanic, Black non Hispanic, Hispanic, Other.

  7. z says:

    How does this compare to your evaluation of your own critiques in fields you’re not “expert” in?

  8. Bob says:

    Not relevant to the topic but I find it interesting.

    LIGO reported detection of a second binary black hole merger. One overview paper gives, among other statistics, P values for each of three events:

    Event:    GW150914   GW151226   LVT151012
    P-value:  7.5E-8     7.5E-8     0.045

    Even though they calculate a P-value for LVT151012 of less than 5%, they don’t claim it as an observation of a black hole merger. They give an alternate calculation of the P-value for GW150914 of 8.8E-12, but they stick with the more conservative 7.5E-8. Maybe if we adopted a P-value standard of 1E-5 instead of 5%, the garden of forking paths would be less of a problem.

    Much of the signal processing described in the paper uses Bayesian inference. See


    • D.O. says:

      Ehhh, a 10^-7 p-value is BS. IMHO.

      • Martha (Smith) says:

        p-values smaller than that are common in gene studies.

        • D.O. says:

          With what null hypothesis? That 2 specimens are completely unrelated and one of them is composed of a random combination of base pairs?

          • Martha (Smith) says:

            As in most uses of statistics, the test and null hypothesis depend on the question being studied.

            • D.O. says:

              It’s hard to discuss things in the abstract, but I hold an unscientific belief that at the 10^-7 level you are well past such events as sample mislabeling, mishandling, contamination, or some equipment fluke. I do not think that people who report such small p-values are idiots, but we should take those p-values for what they are and no more. That is, a random event from a well-established but necessarily narrow list of such events almost certainly didn’t happen.

              • Bob says:

                Well, the LIGO community are not idiots on all dimensions—they have talked funding agencies out of approximately $1E9.

                More seriously, the paper I linked to describes their calculation of the P-values. The underlying physics implies a specific structure for the signal generated by a merger of two black holes. It is a chirp-like signal—the duration of which is related to the masses of the black holes. When such an event is observed at essentially the same time at measuring instruments 3000 km apart, the P-value will be pretty small. Sure, it could be a coincidental fluctuation of noise at both detectors—but, not very likely.

                The biggest flaw in their analysis is that the probability of somebody generating a pair of spurious signals might be in the ballpark of 1E-6. I think that it would be difficult for anyone not an insider to do so with a budget less than $1E7.

                When you see an airplane fly over, do you even bother to calculate the probability that you are just seeing the effects of random fluctuations in the background noise in your eyes and nerves? That’s essentially what they did.


  9. Jack PQ says:

    The Stanford guy is a Psychology PhD student. I’m sure he’s smart, but this may not be his area. As Prof Gelman has shown many times, psychologists are not the best-trained when it comes to observational data statistics.

    • Anonymous says:

      They thought it made sense to show box plots with and without one high and one low outlier removed, on different scales, and then pointed out that the comparison didn’t really change. That indicates to me that they don’t really have a good grasp of their first-semester undergraduate statistics material.
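
The regional point raised in the thread by Frederick Guy can be checked with a little arithmetic. The sketch below uses his state counts (13 non-paper-trail states, 11 of them southern or border; 18 paper-trail states, 5 of them southern or border). Everything else is an assumption made for illustration: the margins of 30 points in southern-or-border states and 0 points elsewhere are hypothetical, and by construction the paper trail has no effect at all. Even so, a naive comparison of the two groups shows a large gap, which disappears once states are compared within region.

```python
from statistics import mean

# State counts as reported in Frederick Guy's comment.
NO_PAPER_COUNTS = {"south_or_border": 11, "other": 2}   # 13 states
PAPER_COUNTS = {"south_or_border": 5, "other": 13}      # 18 states

# ASSUMPTION (hypothetical, not from the paper or real results):
# Clinton's margin over Sanders, in percentage points, depends only on
# region and not on whether the state has a paper trail.
ASSUMED_MARGIN = {"south_or_border": 30.0, "other": 0.0}


def build_states(counts, has_paper_trail):
    """Expand region counts into one record per (hypothetical) state."""
    return [
        {"paper_trail": has_paper_trail, "region": region,
         "margin": ASSUMED_MARGIN[region]}
        for region, n in counts.items()
        for _ in range(n)
    ]


def gap(rows):
    """Mean margin in no-paper-trail states minus mean in paper-trail states."""
    no_paper = [r["margin"] for r in rows if not r["paper_trail"]]
    paper = [r["margin"] for r in rows if r["paper_trail"]]
    return mean(no_paper) - mean(paper)


states = build_states(NO_PAPER_COUNTS, False) + build_states(PAPER_COUNTS, True)

# Naive comparison that ignores region.
raw_gap = gap(states)

# The same comparison made separately within each region.
within_region_gaps = {
    region: gap([r for r in states if r["region"] == region])
    for region in ASSUMED_MARGIN
}

print(f"raw gap (no paper trail minus paper trail): {raw_gap:.1f} points")
for region, g in within_region_gaps.items():
    print(f"gap within {region}: {g:.1f} points")
```

This does not show that region explains the paper's result. It only shows that the imbalance in where the non-paper-trail states are located is enough, on its own, to produce a gap of this kind, so the two controls in the paper would need to absorb regional differences for the main effect to mean much.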
