The p-value is 4.76×10^−264 1 in a quadrillion

Ethan Steinberg writes:

It might be useful for you to cover the hilariously bad use of statistics used in the latest Texas election lawsuit.

Here is the raw source, with the statistics starting on page 22 under the heading “Z-Scores For Georgia”. . . .

The main thing about this analysis that’s so funny is that the question itself is so pointless. Of course Hillary’s vote count is different from Joe’s vote count! They were different candidates! Testing the null hypothesis is really pointless and it’s expected that you would get such extreme z-scores. I think this provides a good example of how statistics can be misused and it’s funny to see this level of bad analysis in a high level legal filing.

Here’s the key bit:

There are a few delightful—by which I mean, horrible—items here:

First off, did you notice how he says “In 2016, Trump won Georgia” . . . but he can’t bring himself to say that Biden won in 2020? Instead, he refers to “The Biden and Trump percentages of the tabulations.” So tacky. Tacky tacky tacky. If you want to maintain uncertainty, fine, but then refer to “the Clinton and Trump percentages of the tabulations” in 2016.

Second, the binomial distribution makes no sense here. This corresponds to a model in which voters are independently flipping coins (approximately; not quite coin flips because the probability isn’t quite 50%) to decide how to vote. That’s not how voting works. Actually, most voters know well ahead of time who they will be voting for. So even if you wanted to test the null hypothesis of no change (which, as my correspondent noted above, you don’t), this would be the wrong model to use.
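A minimal sketch of why the coin-flip model produces such extreme numbers. This is a generic one-sample binomial z-statistic; it is an assumption that it matches the exact formula in the filing. The helper name `binomial_z`, the 45% and 46% shares, and the ballot counts are all hypothetical, chosen only to show the scaling. Under this model the standard error of a vote share shrinks like 1/√n, so a one-point shift that means nothing in a 500-person poll becomes "astronomically significant" once millions of ballots are counted.

```python
import math


def binomial_z(p_old: float, p_new: float, n: int) -> float:
    """z-statistic for an observed share p_new among n ballots, under a null
    in which every ballot is an independent coin flip that lands on the
    candidate with probability p_old (the earlier election's share).

    Hypothetical helper for illustration, not the formula from the filing.
    """
    se = math.sqrt(p_old * (1 - p_old) / n)  # binomial standard error of a share
    return (p_new - p_old) / se


# The same hypothetical one-point shift (45% -> 46%) at different electorate sizes.
for n in (500, 50_000, 5_000_000):
    print(f"n = {n:>9,}: z = {binomial_z(0.45, 0.46, n):.1f}")
```

Each 100-fold increase in n multiplies z by exactly 10, whatever the size of the shift. That scaling, not anything about how the votes were cast, is what drives numbers like "one in a quadrillion."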

Third . . . don’t you love that footnote 3? Good to be educating the court on the names of big powers of ten. Next step, the killion, which, as every mathematician knows, is a number so big it can kill you.

Footnote 3 is just adorable.

What next, a p-value of 4.76×10^−264?

The author of the expert report is a Charles J. Cicchetti, a Ph.D. economist who has had many positions during his long career, including “Deputy Directory of the Energy and Environment Policy Center at the John F. Kennedy School of Government at Harvard University.”

The moral of the story is: just because someone was the director of a program at Harvard University, or a professor of mathematics at Williams College, don’t assume they know anything at all about statistics.

The lawsuit was filed by the State of Texas. That’s right, Texas tax dollars were spent hiring this guy. Or maybe he was working for free. If his consulting fee was $0/hour, that would still be too high.

Given that the purpose of this lawsuit is to subvert the express will of the voters, I’m glad they hired such an incompetent consultant, but I feel bad for the residents of Texas that they had to pay for it. But, jeez, this is really sad. Even sadder is that these sorts of statistical tests continue to be performed, 55 years after this guy graduated from college.

P.S. The lawsuit has now been supported by 17 other states. There’s no way they can believe these claims. This is serious Dreyfus-level action. And I’m not talking about Amity Beach.

154 thoughts on “The p-value is 4.76×10^−264 1 in a quadrillion”

  1. Either this guy is embarrassingly incompetent to a degree that’s almost hard to believe for an economics professor, or he’s deliberately trying to dupe the court. Either way it’s bad and sad!

        • Go to the current issue of any medical journal and click the first paper you see. I bet they do the same thing: testing a null hypothesis no one believes and interpreting a low p-value to mean their favorite theory is true.

        • Nah, this is worse. When you ask an ordinary non-statistician whether it makes sense to test the hypothesis that ibuprofen is exactly as effective as aspirin at treating headache, they’ll say sure, why not? (In my experience you can get a different answer if you phrase this sort of question as “could it be EXACTLY as effective, to within 0.00000001% or closer?” To most people, “exactly” doesn’t mean exactly.)

          But if you ask people whether exactly the same number of voters would be expected to vote for Biden as did for Hillary four years earlier, nobody is going to say that makes sense as a hypothesis.

        • I’m just sayin’ this guy and most of his peers have probably built their entire careers doing exactly this kind of analysis and would be confused to find people think it is idiotic.

        • Anon:

          Yes, and it’s even more than that. These people spent the first half or 2/3 or 5/6 or whatever of their careers being told by statisticians that they need to compute p-values or else they’re being unrigorous. Consider, for example, horrible statistical terminology such as “exact tests.” Now all of a sudden they’re being told by statisticians that p-values and null hypotheses are idiotic. Sounds like mixed messages from the statistical establishment. First they play by the rules and get success. Then the rules change. It makes it hard to be a hack.

        • My experience with many engineers is that they behave exactly as you say. They have learned it is important to test null hypotheses and obtain p-values, and they dutifully do it until they obtain something that they can claim has been validated, which they then publish. It never occurs to them that there is anything wrong with this. As engineers they are trained to use black boxes, and hypothesis testing is just another tool/filter that has to be applied.

          Many posters here misunderstand these behaviors because of a filtering effect: they are good researchers who mostly interact with other good researchers.

        • Maybe NHST can be declared invalid by the Supreme Court? Or they will use all the precedent cases that relied on it to call it valid.

        • > I’m just sayin’ this guy and most of his peers have probably built their entire careers doing exactly this kind of analysis and would be confused to find people think it is idiotic.

          Do you mean they did exactly the same analytics, or maybe 0.0001% different analytics?

        • > Do you mean they did exactly the same analytics, or maybe 0.0001% different analytics?

          It really is the standard BS that peer reviewers try to force you to do.

      • If it would be unusual for the model being tested to be accurate, then an outcome that is unlikely under that model isn’t necessarily unusual.

        I can have a model that predicts that when I let go of a weight, it flies off into the air. If it drops to the ground instead every time, that wouldn’t be unusual, but the p-value would be low.

  2. I think we’re all agreed that the two outcomes aren’t similar — for one thing, Biden won! If that’s really what they want to allege, well, more power to them!

  3. Hi, thanks for the post! I wanted to get something cleared up: if a competent statistician had done this analysis, what would it have looked like? I realize that the premise is ridiculous, and I understand why the binomial distribution doesn’t belong, but what distribution would have been right? Thanks!

    • Dan:

      The right thing to do here, if the goal were to compare candidates’ vote shares in the two elections, would be to use as a baseline other year-to-year differences. You could draw a scatterplot with one dot per state showing the election outcomes in 2020 vs. 2016, and similar plots for 2016 vs. 2012, and 2012 vs. 2008, etc., to establish a baseline of what the changes could look like. And of course you’d see that changes happen in every election; there’s nothing suspicious about that.

    • And I think, more generally, claiming that a null hypothesis has been violated carries a greater burden of proof, especially in fields like public policy or in establishing a treatment protocol. More specifically, the amount by which the null hypothesis is violated also matters.
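The baseline Andrew suggests in his reply to Dan can be sketched in a few lines. This is a minimal illustration, not a complete analysis: it assumes a hypothetical input `shares` that maps election year to a dict of state-level Democratic vote shares (no real election results are included here), and the function names are invented for this example. Each state's swing between two elections is measured against the spread of swings in earlier cycles, rather than against coin-flip sampling error.

```python
import statistics


def state_swings(shares: dict[int, dict[str, float]], year_a: int, year_b: int) -> dict[str, float]:
    """Per-state change in vote share from year_a to year_b (states present in both years)."""
    a, b = shares[year_a], shares[year_b]
    return {state: b[state] - a[state] for state in a.keys() & b.keys()}


def swing_baseline(shares: dict[int, dict[str, float]], years: list[int]) -> tuple[float, float]:
    """Pool state swings over consecutive pairs of earlier elections and
    return their mean and standard deviation as an empirical baseline."""
    pooled: list[float] = []
    for earlier, later in zip(years, years[1:]):
        pooled.extend(state_swings(shares, earlier, later).values())
    return statistics.mean(pooled), statistics.stdev(pooled)


def standardized_swing(shares, baseline_years, year_a, year_b, state) -> float:
    """How unusual one state's swing is relative to historical swings,
    in units of the historical spread (not binomial sampling error)."""
    mean, sd = swing_baseline(shares, baseline_years)
    swing = shares[year_b][state] - shares[year_a][state]
    return (swing - mean) / sd
```

With real data one would still want the scatterplots, since a single summary number can hide structure such as regional swings; the sketch only shows that the yardstick comes from past elections.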

  4. Not sure how good his math is either. No finite number is “almost an infinite number”. Also, the image of a Deputy Directory is lovely. Economic man as a telephone book, but yes he should pay for the privilege of working. Phone books are not very useful any more.

  5. Before we judge the expert’s competence, we need to consider ‘what could a competent statistician do?’
    And when the answer is ‘not much’, we realize that this isn’t so much evidence of his incompetence as it is of his unscrupulousness.

      • The point *is* for the state of Texas to spend its money; the better to stir up the emotions of the rabble; to push a losing cause; the spirit of revenge that will enrich, with some zest and — if they’re lucky, even some action — the final act of the lives of the tens of millions of dead-enders?

  6. “The moral of the story is: just because someone was the director of a program at Harvard University, or a professor of mathematics at Williams College, don’t assume they know anything at all about statistics.“

    I am inclined to entertain the conjecture that PhD statistics programs would produce top-quality statisticians even if they changed their admissions to a “diffuse prior” approach: randomly selecting from the applicant pool without looking at the applications.
    It is probably a less biased procedure on social/race/gender equality metrics.

      • I interpreted your statement as: having a mathematics PhD is no indication that they know “anything at all” about statistics.

        My assumption was that to be a mathematics professor one must at least have a PhD in mathematics, which further implies that they have taken at least an undergraduate class in probability theory and possibly a graduate course in measure theory.

        From which I (unreasonably?) have jumped to the hypothesis that random selection would not do that bad a job after all. The remark about the diversity is simply an extension of the diffuse prior. I was not insinuating anything nefarious on your part. I admire you and your work Dr Gelman!

  7. Taft, Chief Justice: “No litigant is entitled to more than two chances, namely to the original trial and to a review, and the intermediate courts of review are provided for that purpose. When a case goes beyond that, it is not primarily to preserve the rights of the litigants. The Supreme Court’s function is for the purpose of expounding and stabilizing principles of law for the benefit of the people of this country, passing upon constitutional questions and other important questions of law for the public benefit. It is to preserve uniformity of decisions among the intermediate courts of appeal.”

  8. I think that screenshot above is missing one crucial detail: how that economist interpreted the results of his statistical analysis.

    ——————–

    13. There are many possible reasons why people vote for different candidates. However, I find the increase of Biden over Clinton is statistically incredible if the outcomes were based on similar populations of voters supporting the two Democrat candidates. The statistical differences are so great, this raises important questions about changes in how ballots were accepted in 2020 when they would be found to be invalid and rejected in prior elections.

    ——————–

    Note how he basically just goes completely off the deep end here, implying that somehow people liking Joe more than Hillary is proof of some elaborate problem with the voting system in 2020.

  9. They can “reject it many times more than one in a quadrillion times.” What does that even mean? So how much more? Like one in ten? I’m going to reject their work even more, like one in one times.

    This “Z score” is hilarious. The only thing it proves is that this ECONOMICS PHD dude has no understanding of statistics. At least the mathematician consultant affidavit was a meaningful calculation. The calculation here is literally empty of content.

    Applying the same calculation to all states 2020 vs 2016 shows that many have similarly large “Z scores.” Texas and Colorado are the strangest states by this measure. Only Florida and Hawaii are anywhere near a “non-significant Z score.” I’m not certain I’ve ever seen someone who should have known better perform a more meaningless computation.
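This check is a placebo test: run the identical calculation on comparisons where nobody alleges fraud and see how often it "rejects." A minimal sketch, assuming a hypothetical per-state table of (earlier share, later share, ballots cast); the 1.96 cutoff is the conventional two-sided 5% threshold, and both function names are invented for illustration.

```python
import math


def binomial_z(p_old: float, p_new: float, n: int) -> float:
    """Coin-flip z-statistic for a change in vote share, i.e. the
    independent-ballot model criticized in the post (hypothetical helper)."""
    return (p_new - p_old) / math.sqrt(p_old * (1 - p_old) / n)


def placebo_rejections(table: dict[str, tuple[float, float, int]],
                       threshold: float = 1.96) -> dict[str, float]:
    """Apply the binomial z-test to every state and return those whose |z|
    exceeds the threshold. If nearly every state 'rejects', the test is
    picking up ordinary electoral change, not anything specific to one state."""
    flagged = {}
    for state, (p_old, p_new, n) in table.items():
        z = binomial_z(p_old, p_new, n)
        if abs(z) > threshold:
            flagged[state] = z
    return flagged
```

With millions of ballots per state, almost any real change in vote share clears the threshold, which fits the commenter's finding that nearly every state produces a huge z.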

  10. You pick some funny things to criticize. Of course the lawsuit does not say that Biden won Georgia. The whole point of the suit is to dispute that. And if the vote count is being disputed, it is reasonable to call it a tabulation.

    The supporting statistical analysis is in an appendix, and that may not have been released yet. That is where the dubious work is hiding.

    • You say it’s funny, I say it’s infuriating. Other reasonable characterizations of Cicchetti’s work would be sad, or the dumbest fucking thing I’ve read today since the last pro-Trump filing.

      In any case, worth criticizing.

  11. One thing I didn’t understand here is why they chose an economist whose specialties have nothing to do with the topic he addresses here.

    From his Wikipedia (!) page:

    Cicchetti has testified before regulatory agencies in the U.S. and abroad on tariff design, the rate of return, and the organizational structure of the natural gas, electricity, water and telecommunications industries. He has prepared expert testimony for various federal proceedings on a variety of topics.[1] His testimony in environmental litigation has dealt with natural resource damage assessment and cost allocation under the Comprehensive Environmental Response, Compensation and Liability Act (CERCLA) and the Oil Pollution Act. He has worked in developing nations throughout the world and designed the World Bank’s Build, Own, Operate, and Turnover (BOOT) public/private investment program for infrastructure development.

    Or do we see him as the statistical equivalent of Rudy’s star witness? I want to say to him: “Why did you guys do, did you take it and do something crazy to it (the numbers)?” (bites lip).

    • I’ve known of Cicchetti’s work in economics for many years. He is a “well-respected” economist (whatever that may mean). He certainly has a better resume than I do. I don’t know that this example is a general statement about economists vs statisticians, though it might be. But I suspect that we can find well respected academics in virtually all fields that do such shoddy and inexcusable work – and I doubt it is worthwhile debating how common it is in different fields (though I do think it is worth each field at least thinking about how they can reduce it).

      I am astounded, however, that he did this and that they would have the nerve to file this in court. What he has really “tested” is the hypothesis that Biden is really Hillary. And, his conclusion is that they are different people. I’m glad we needed an expert (regardless of what discipline he is in) to tell us that.

      • Dale:

        Interesting. And that Williams professor was considered a well respected mathematician, or so it seems. Perhaps one problem is that these people have the impression that statistics is nothing but a bunch of formulas, so that computing a probability is akin to saying that 2 + 3 = 5.

        From our point of view, these credentialed experts are making fools of themselves by doing ridiculously bad applications of statistics. From their point of view they’re just technical consultants, as if they’re given a document to review and they’re checking that the numbers add up, or that the sentences show subject-verb agreement. We’re saying these documents make no sense, and they’re thinking of themselves as mathematical proofreaders of a sort.

        I’m trying to think of an analogy . . . let’s try this. The expert produces a report saying, “The Loch Ness Monster weighs 243,091 pounds.” We say how ridiculous it is, and the expert replies that it’s really quite simple, he calculated the weight based on the water displacement and the specific gravity of water, and here’s the formula. The numbers in the formula make no sense, but from his perspective it’s not his job to check that; he’s just been hired to multiply the two numbers together.

        But then everything goes off the rails when the results are used to conclude that, because the Loch Ness Monster exists, we can conclude there’s an election fraud. In all seriousness, anyone with any knowledge of empirical economics would know to do a placebo control test, at which point they’d learn that, according to their assumptions, there’s a Loch Ness monster in every one of the world’s lakes, oceans, ponds, and bathtubs, and that by their lights there’s overwhelming evidence of fraud in just about every election ever tabulated.

        • This is but the latest example in a long series of examples you have shown on this blog – made up data, absurd hypotheses, erroneous conclusions, etc. What they seem to share in common is that an “expert” is willing to allow their work to be used for purposes that go beyond what their expertise can support. We all do that – at least, I’ve done it numerous times as an expert witness. You take on a case, knowing full well what your client is using it for, and which usually involves influencing perceptions that go beyond what the science can support. Some people do more of this than others – but I really don’t think any of us are immune (it happens in our teaching all the time).

          What is missing is the ethical responsibility that comes with “expertise.” Most of these examples seem to illustrate a lack of moral compass, and the unwillingness/inability of professions to hold people accountable for their judgements. I’m quite sure Cicchetti will remain a recognized, accomplished economist, just as the guy at Brigham and Women’s Hospital will be largely unscathed by the Surgisphere debacle, and just as Williams College will sidestep the issue that their mathematician put his affiliation on that affidavit without any disclaimer that he was not speaking on their behalf.

          It makes me sad that we have reached this point, and that it appears to be getting worse. Academic freedom was supposed to mean something different, but has become a way to insulate ourselves from holding each other accountable. In an environment where perceptions have become more important than facts (and even the existence of the latter has become questionable), this is a very dangerous state of affairs.

        • Certainly a demonstration of the lack of accountability and acceptance of conflicts of interests. Great exchange Andrew and Dale.

          There are pockets of exceptional expertise that do seem to have influence. But it takes very close monitoring of the epistemic environment that some retired academics and officials have negotiated well. Having exceptional communication skills is a must in such an effort; instead some exceptional experts sabotage themselves in this regard.

        • On the optimistic side, I’ve been following all of these filings on law Twitter and the lawyers aren’t fooled by the statistical garbage being spewed here and elsewhere.

          Certainly they’re predisposed to presume it’s cockeyed bullshit because the quality of the lawyering going into the legal filings is actually even worse than the quality of the statistical analysis in these affidavits. But they’re able to sniff out the bullshit in part because it’s so egregious and, despite (likely) having no statistical training, they’re not idiots, and also because there’s a certain way expert testimony is normally presented, and that has all been thrown out the window here.

          Our legal system has instituted some really coarse filters to keep out bad expert testimony that I’m sure are regularly abused and misused, but they do at least succeed in keeping this crap out.

        • As bad as this testimony is (and I used to work with Charlie about 35 years ago, FWIW) I have often been asked to make “calculate this number” testimony. I’ve turned a fair number down, but I’ve accepted a number as well. Every once in a while, a party actually needs a number which an expert can give in good conscience without buying into a whole case. An example is an estimate of lost wages from a terminated worker, which you can make whether his case is good or bad as to whether his termination is the fault of the defendant or not.

          That said, there is a line I would never cross: when I was making one of these calculations, I insisted on putting in my affidavit that that was all I was doing, and that the number I was calculating shouldn’t be taken as an indication that the side I was testifying for was justified in their claim. In addition, you have to be clear about all the assumptions you are making, and if the side hiring you isn’t comfortable with your statement of those assumptions, you just move on. I think if Charlie had been straightforward in the affidavit itself about the assumptions he was making here, then the State of Texas would have said what we are all saying here: namely, Huh? And this affidavit would never have seen the light of day.

        • Doubtful.

          In part, you’re looking at the wrong step of the process — this isn’t even the craziest affidavit that’s been filed in one of these election challenges this week*. The expert’s affidavit can go in the filing, but without caveats and explanations of methodology like you’ve described, the court will reject it.

          Theoretically there are some standards of pleading whereby the lawyers are opening themselves to sanction by including this stuff in bad faith (knowing the court will throw it out for failing to meet those standards), but apparently that almost never happens.

        • I’m upset about these cases not because these individuals are opining outside their area of expertise, but rather the opposite. In both cases, the “experts” had the credentials and background to do statistical inference. It’s not like only people with a Statistics PhD have the ability to not vomit all over themselves when dealing with data.

          I do not buy the proofreading analogy. Using an independent binomial is wrong from first principles. It would be like using a drag equation to calculate the weight of the Loch Ness Monster rather than displacement. On the plus side, at least he limited himself to saying that it “raised important questions” and listed a couple of non-fraud explanations. The Williams guy (Steve Miller), on the other hand, actually claimed votes were stolen in his junk analysis.

        • Andrew said,
          “according to their assumptions, there’s a Loch Ness monster in every one of the world’s lakes, oceans, ponds, and bathtubs”

          Well, I had to go check my bathtub after reading this. No Nessie (unless it’s microscopic? or my eyesight is even worse than I think it is?)

      • “What he has really “tested” is the hypothesis that Biden is really Hillary. And, his conclusion is that they are different people.”
        Not even that. Even if Biden really were Hillary, the result could still be explained by more voters turning out because they were more opposed to Trump after having seen him in office.

    • FWIW, I was an undergraduate student in Cicchetti’s environmental econ course at the University of Wisconsin back in the 70s! I can’t say I remember much about his work as a teacher; I learned very little, but that may have been me. It’s a bit off-topic (OK, more than a bit), but I vividly remember one of the discussion section meetings. The topic was electricity pricing, in particular the difference between residential and commercial/industrial rates. Sitting there, I sort of recalled monopoly price discrimination theory, but I couldn’t remember the exact terminology. I raised my hand and tried to explain about segmented markets, different demand elasticities but without the right language, and the TA cut me off. In front of the whole class he said — I paraphrase — “Listen to this guy. This is exactly the sort of ignorant, prejudiced thinking you’re studying economics to overcome.” I was furious and decided, at that moment, I would learn enough econ that no one would ever be able to do that to me again. Can’t blame Cicchetti for that, although if he held his weekly TA meeting without discussing monopoly price discrimination as a potential contributor to differential electricity rates, it says something about where he’s coming from.

  12. In addition to different Democratic candidates (and the fact that the Republican candidate now had a record to be judged on), we might add that Georgia probably lost about 10 percent of its voters over 65 between 2016 and 2020, gained a cohort of 14- to 17-year-olds newly eligible to vote, and saw migration in and out, so these aren’t really the same populations at all. QED… Null hypothesis rejected.

  13. “Statistician certain beyond a doubt: Joe Biden is not Hillary Clinton!”

    Reminds me of the old joke about the hot air balloon travelers who got lost; they sailed lower to the ground and shouted to a person, “where are we?” The person seemed to think for a moment and then shouted back, “in a balloon”. “Ah”, said one traveler to the other, “that person must be a theoretical mathematician!” “Why?” “Well, the answer was precise, obviously correct, and completely useless.”

  14. Of all the specious reasoning and mistakes, this is by far the least important, but somehow I can’t get over the fact that in footnote 3, Cicchetti says that the number of zeros in the p-value increases exponentially in Z. It increases (approximately) *quadratically*!
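The commenter's correction is easy to check numerically. For a standard normal Z and large z, the Mills-ratio approximation P(Z > z) ≈ exp(−z²/2)/(z√(2π)) gives −log10 P(Z > z) ≈ z²/(2 ln 10) + log10(z√(2π)), so the number of zeros in the p-value grows like z², not like e^z. A minimal sketch using only Python's `math.erfc`; the helper names are ours, and "number of zeros" is read loosely as −log10 of the one-sided tail probability.

```python
import math


def tail_zeros(z: float) -> float:
    """-log10 of the one-sided normal tail P(Z > z): roughly the number of
    zeros after the decimal point in the p-value. Computed via erfc, which
    works until the tail underflows double precision (z in the high 30s)."""
    return -math.log10(0.5 * math.erfc(z / math.sqrt(2)))


def tail_zeros_asymptotic(z: float) -> float:
    """Large-z approximation from the Mills ratio, P(Z > z) ~ phi(z) / z:
    quadratic in z plus a slowly growing log term."""
    return z * z / (2 * math.log(10)) + math.log10(z * math.sqrt(2 * math.pi))


for z in (5, 10, 20, 30):
    print(f"z = {z:>2}: {tail_zeros(z):6.1f} zeros (approx. {tail_zeros_asymptotic(z):6.1f})")
```

Doubling z from 10 to 20 multiplies the count by a little under 4, as a quadratic predicts; exponential growth in z would multiply it by e^10, roughly 22,000.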

  15. According to wikipedia, the article about Charles J. Cicchetti, https://en.wikipedia.org/wiki/Charles_Cicchetti, has this warning:

    “It is proposed that this article be deleted because of the following concern: the subject of this article meets neither the WP:BIO notability standards or the WP:NACADEMIC notability standards (proposed by 99.112.100.63)

    If you can address this concern by improving, copyediting, sourcing, renaming, or merging the page, please edit this page and do so. You may remove this message if you improve the article or otherwise object to deletion for any reason. Although not required, you are encouraged to explain why you object to the deletion, either in your edit summary or on the talk page. If this template is removed, do not replace it.

    The article may be deleted if this message remains in place for seven days, i.e., after 04:26, 16 December 2020 (UTC).
    If you created the article, please don’t be offended. Instead, consider improving the article so that it is acceptable according to the deletion policy.”

    Noting that “04:26, 16 December 2020 (UTC)” is next week somewhere on the planet, what is going on? What (other?) sin has he committed? Who else is suffering the same shaming?

    • What an interesting development! As offended as I am by his testimony, I don’t like this reaction. Surely there is a better way to hold him accountable. The wikipedia article seems accurate to me. I wouldn’t object to adding in a section about this latest debacle, but it actually holds the profession more accountable to keep his past “achievements” in plain view. You ask “What (other?) sin has he committed?” Well, perhaps many. But how are we to know? Expunging his past is not a good reaction to what he has done. Re-evaluating it would be more appropriate.

  16. It is amazing how easily people are willing to trash their own reputations for “causes”. This goes with the cubic fit for COVID-19, the only difference being the cubic fit turned out to be “correct”, as long as we flipped the curve over to go the other way (well come on – that is essentially correct up to a -1).

    Speaking of COVID, I hope everyone has noticed that there has been a natural experiment going on, in that different countries have followed different policies. There is a pretty consistent set of policies that has driven cases to zero in a significant number of countries, and then there is everywhere else. Maybe not everything in that set of policies was needed, but who wants to experiment to find out? Unfortunately there is no real way to hold accountable the people who likely have caused tens of thousands of needless deaths (if not more). And even though I am likely to get flamed: no, it is not just like the flu; ask any HCW on the front lines of this.

    • The people causing the deaths are those refusing to correct vitamin and oxygen deficiencies with vitamins and HBOT as has been done in China since Feb. The response by the medical community has been epically incompetent. They have been choosing the most expensive and dangerous intervention without evidence of efficacy at every turn, while cheap and safe ones with evidence go ignored.

  17. If I may, odds are the guy is not a total fool but rather is motivated by the money he receives for producing an argument that is defensible but ridiculous. The argument is defensible because he took two things and compared them. He was asked to do this. He received compensation for this. This example stands out because this is a national election, but similar stuff goes on every single day in the legal world. The norm is not: let’s find someone to do an objective analysis. The norm is: we want to make this argument, so we need someone who can give us a way to make that argument. You talk with people you know. You identify who will say what for how much. You aren’t asking them to do a whole, balanced or relatively complete analysis: you want things done your way. You then typically have them do additional work so that it doesn’t look as if this was all you wanted, and so that some other points get made. That is how litigation works.

    Why? Because the model converges on strategy, not on truth. Your job is to represent your client, not the abstraction of truth. Your expectation is that the other side represents its client. It is their job to expose the flaws in what you say.

    In the Trump case, again what stands out is that Trump is paying people to make arguments that can’t get past a judge, that can’t make it to any substantial hearing because they are as paper-thin as this example. That leads me to believe the actual intent isn’t to dispute the election but something else. My guess is that he a) may believe that keeping this grievance open enhances the chances of winning at least one seat in GA and/or b) is making a big deal about being cheated so that any prosecutions of him become tinged with aspects of show trials designed to silence an enemy. I think ‘and’ fits well, but I don’t know the proportions.

    • Jonathan:

      Sure, but:

      1. The quadrillion stuff really is foolish. So this guy is maybe not a complete fool, but he’s enough of an idiot not to know how much of an idiot this makes him look. First rule of clueless is to be so clueless that you don’t know how clueless you are.

      2. Beyond all that, I don’t know this guy, and any judgment I can make of his abilities will be based on what he wrote. If he writes stupid stuff, this makes him look stupid, although I agree there’s an alternative model where he’s not really stupid, he’s just kinda stupid and also kinda corrupt. Or yet another model in which he’s a martyr, courageously torching his own professional reputation so as to serve a political cause, a kamikaze piloting the tiny little fighter plane of his reputation into the massive aircraft carrier that is the certified election results from 50 states.

      Regarding Trump, that’s another story. People do what’s worked for them in the past. Trump’s spent his adult life going back on his word, telling obvious lies, treating everything like a negotiation, etc., so no reason to expect him to stop now. I guess he has various larger goals, but this sort of thing seems like how he operates, whatever the goals might be.

      • A few more things should be taken into consideration when thinking about his possible motivations. I sincerely doubt that money is one of them. He has been a Managing Director at Arthur Andersen and a partner at several large consulting firms. Naivete can be ruled out as well – he has been an expert witness dozens (perhaps hundreds) of times, as well as being a state Public Utilities Commissioner. Ideology is a bit tougher – he was the first economist for the Environmental Defense Fund, but has also represented large corporate interests many times. It is hard for me to think of him as a typical Trumpophile. And, ignorance seems far-fetched to me: he has worked with some of the most capable economists (yeah, I know, that may not be saying much). So, I’m not sure what to make out of this.

        It does remind me of why I got out of environmental economics (a different path than the one taken by Peter Dorman). Much of his career has been as a successful environmental economist – taking the standard neoclassical approach to estimating values for environmental resources that have no readily available market prices. Some of that work is quite creative – and makes a contribution. However, it has always seemed a Faustian bargain to me: you have to agree to play by the economists’ rules, meaning that the only sorts of values you can estimate are based on willingness to pay or willingness to accept compensation. Once you make that assumption, then there is a considerable theoretical and practical apparatus to apply to environmental issues. And, depending on the issue and the client, you can collect from virtually any interest group. In that light, perhaps it is in character for him to do this dirty political work.

        • Dale:

          Maybe “ignorant” would be a better description than “stupid.” The fundamental error seems to be overconfidence. Statistics is easy, right? Dude was a Harvard professor, what could go wrong? In that case, all his experience is a negative, not a positive, in that it just gives him the overconfidence that he knows what he’s doing. If it’s not ideology, then some ignorance must be involved, because otherwise why blow up his reputation for this?

    • They are feeding the grievance and will do so indefinitely.
      Analogy: the failed “Beer-Hall Putsch”.
      This mass psychosis is not going away; not until the real lesson is eventually learned.

      • Where did the shooting fish in a barrel reference come from? I ask because one of my early memories of my parting ways with mainstream environmental economics was when I saw an environmental activist referring to economic valuations of environmental resources as furthering environmental causes, saying it is “as easy as shooting fish in a barrel.” This is the kind of analysis that Cicchetti has spent his career conducting.

        • Dale:

          Nah, nothing specific. I was just saying that when we see something like “p = 4.76×10^−264” or “p is less than 1 in a quadrillion,” it’s so obvious that something’s wrong that it takes no effort to criticize it.
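To see how numbers like this arise, here is a small sketch (the helper names `upper_tail` and `z_for_tail` are hypothetical) that finds the z-score whose one-sided standard normal tail equals a given p-value. It uses `math.erfc`, which stays accurate far into the tail where computing 1 - cdf would round to zero. Whether the report used a one-sided or two-sided tail is an assumption here; the difference barely moves z.

```python
import math


def upper_tail(z: float) -> float:
    """One-sided standard normal tail probability P(Z > z)."""
    # erfc keeps relative accuracy deep in the tail, unlike 1 - cdf.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_for_tail(p: float, lo: float = 0.0, hi: float = 40.0) -> float:
    """Find z with upper_tail(z) == p by bisection (upper_tail is decreasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) > p:
            lo = mid  # tail still too heavy: the answer lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)


for p in (0.05, 1e-15, 4.76e-264):
    print(f"p = {p:g}  ->  z = {z_for_tail(p):.2f}")
```

Run as written, it should show that p = 4.76×10^−264 corresponds to a z-score of roughly 35, which is the kind of value a binomial model with millions of voters produces from an ordinary swing in vote share.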

  18. Can’t we use the reversal-of-time heuristic to argue that it’s really unlikely that Trump won Georgia in 2016, using the same statistical argument? Let’s examine the evidence: In 2016, Trump claimed the election was rigged, even after winning it. Furthermore, the FBI and CIA found evidence that Russians had interfered in the election. By contrast, in 2020 security officials called it the “most secure election ever”. QED :-)

  19. Wait wait wait. Am I reading this right? This guy proposes to test whether the proportion of votes going to Hillary/Biden was the same in 2016 vs 2020. And the data he has is…TWO data points? The null hypothesis is the average of the two, and the alternate is that there are two different means? It can’t be that dumb, can it??

    • It’s very close to being that dumb, but it is not quite that dumb. Still, it is dumb enough that I feel myself getting stupider as I try to explain it. I’d better make it quick.

      Imagine this: there’s a big urn full of voters. A voter will vote for either Trump, or for the Democratic candidate. Voters are pulled out of the urn until the actual number of voters in the election have been selected (either 2016 or 2020), and their votes are tallied. If we were to do this many times for, say, the 2016 election, we would find that the number of votes for the Democrat is different each time: sometimes we happen to sample more Democrats, sometimes more Trump voters. Due to this variability, any single sample does not tell us exactly what fraction of the voters in the urn support the Democrat.

      The hypothesis being tested is: the fraction of voters in the urn who prefer the Democrat was the same in 2020 as it was in 2016. It turns out — shocker! — that even though a single sample does not tell us exactly what fraction of voters in the urn support the Democrat, we can still reject the hypothesis! We can be absolutely sure that more voters preferred Biden this year than preferred Hillary four years ago! Obviously this could never have happened if the election were fair. Stop the steal!
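The test described above can be sketched as a textbook two-proportion z-test with a pooled share under the null. This is an illustrative reconstruction, not the report's exact formula: the function name is hypothetical, the 45.9% and 49.5% shares are the figures quoted elsewhere in this thread, and both elections are given the 2016 Georgia two-party total as n purely for simplicity.

```python
import math


def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for H0: both samples were drawn from an urn with the same share."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)  # common share assumed under the null
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p2 - p1) / se


# ASSUMPTION: shares as quoted in the thread; the 2016 two-party total
# (Clinton + Trump votes) is reused as n for both elections, for illustration.
n = 1877963 + 2089104
z = two_proportion_z(0.459, n, 0.495, n)
print(f"z = {z:.1f}")
```

With millions of ballots the standard error is a few hundredths of a percentage point, so a swing of a few points yields a z-score around 100. All that "confirms" is that the urn changed between elections, which nobody doubted.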

      • Aagugh my brain it hurtsss it!

        But but but…the variance & standard deviation from a binomial distribution would be ludicrously tiny with millions of votes! Surely that can’t be considered an estimator of the variation in vote percentage between elections 4 years apart!

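        # Calculation of SD for binomial distribution
        # based on 2016 Georgia results
        n <- 1877963 + 2089104   # Clinton + Trump votes (two-party total)
        p <- 0.459               # Clinton share as quoted in the report
        q <- 1 - p

        sd_votes <- sqrt(n * p * q)
        sd_votes                 # about 992.5

        # 95% CI half-width is basically
        half_width <- 1.96 * sd_votes
        half_width               # +/- 1945.341

        # Estimate of n*p is
        # 1,877,963 +/- 1945

        # In percentages:
        sprintf("%.2f%% +/- %.3f%%", 100 * 1877963 / n, 100 * half_width / n)
        # "47.34% +/- 0.049%"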

        …so I guess he’s saying that any deviation from Hillary’s percentage by greater than 0.049% means…FRAUD!!!

        (Reading back to the OP, I guess the z-score is based on the average variance across both elections, but this would also be similarly tiny)

        Ugh. Can the American Statistical Association, like, parachute in a strike team or something to say this is the biggest statistical atrocity of the 21st century?

  20. Well, isn’t he fairly accurate up until he starts speculating in note 13? I mean, he explains it in par. 12 that he found “with very great confidence that [he] can reject the hypothesis that the percentage of the votes that Clinton and Biden received in the respective elections are similar”.
    I mean, I think we can all agree with a great degree of confidence that 49.5% and 45.9% aren’t the same (transposition of digits notwithstanding).

    I mean sure, he was driving the crazy train to nowheresville with that analysis, but it doesn’t truly go off the rails until he tries to use the z-statistic from that analysis to say anything about anything else. Specifically, in par. 13 he says “The statistical differences were so great, this raises important questions about changes in how ballots were accepted in 2020 when they would be found to be invalid and rejected in prior elections”.
    No, sir, it doesn’t. It really, really doesn’t.

    Don’t know if you’ve kept up with the other statistical arguments in these friends-of-Trump legal challenges (particularly the ones by Sidney Powell), but any one of them would make you want to put your hand through a wall.
    – There’s Eric Quinnell who *proved fraud* because he can’t get his head around the concept that some voters switched candidates from last cycle.
    – Shiva Ayyadurai (‘The Inventor of Email’) who is incapable of conceiving of ticket-splitting, so ***fraud***!
    – William Briggs (‘Statistician to the Stars!’) who took an obviously highly-enriched subset of 1% of a survey and said “yep, bet everybody looks like that”.
    – Navid Keshavarz-Nia, who is such a data whisperer that he could intuit “that the software algorithm manipulated votes counts forging between 1-2% of the precinct results to favor Vice President Biden”, going on to clarify that the “software performed data alteration in real-time in order to maintain close parity among the candidates and without raising red flags”. All from looking at a feed of NY Times vote count updates.

    And hand to god I’m not making those little catch-phrases up. Those two actually referred to themselves as such in their affidavits.

    • Joseph:

      Ugh!

      Seriously, though, the big problem here is not various bottom-of-the-barrel hacks and random Williams College professors who were willing to join in this clown show, but the public officials of 18 states (!) who signed on to it. There’s an endless supply of hacks with Ph.D.s who will write pretty much whatever you ask them to, but there’s only one attorney general of Texas, only one attorney general of Missouri, etc. Elected officials hold a public trust, and for them to sign on to a lawsuit like this is malpractice, in the same way it would be malpractice for me to not show up to class, or for a doctor not to wash his or her hands before doing an operation, or for the New York Jets to hire a defensive coordinator who doesn’t even know how to throw a game convincingly.

      • They are doing *exactly* what their peculiar public wishes them to do!
        Are they any better in Missouri than they are in — say — Honduras?
        They just haven’t had the opportunity (in many generations) to show what they can do.

      • Absolutely.

        But there’s also some real and justified upset in the legal community. Even in a profession known for being scuzzy, the lawyering going into these has been ghastly and in such transparently bad faith that it’s setting some new lows. The likeliest explanation is that Sidney Powell and Lin Wood et al. simply love the donation money hose they’re being blasted with for pursuing this garbage, and give zero fucks about what they’re doing to the country. I have alternate explanations too, but they’re all worse.

        Sheesh. We all hated politicians and lawyers already. But this, well, somehow these past 4 years have found endless original ways for us as a society to hit new lows.

        • They can no longer be called “politicians” or “lawyers”.
          Was General Somoza a “politician”? Was Hans Frank a “lawyer”?
          Come on folks, these characters have bigger fish to fry than mere politics and lawyering.

        • Anon:

          I disagree. Politicians and lawyers lie and misrepresent the facts all the time. Not everyone and not always, but it happens enough that we can say it’s a routine occurrence. There’s something about the scale of it here that’s disturbing, though. A state attorney general’s not supposed to do this. The reason why it seems Dreyfus-like to me is how blatant the lying and misrepresenting is. Backroom deals I can understand—business has to get done sometimes. But for a state attorney general signing on to a report that’s so obviously bad—I mean, they didn’t even bother to get a semi-competent expert, I guess that Williams prof was busy that day and maybe Mary Rosh had other commitments?—as people have said, it’s a kind of signaling device, these elected officials affirming that they are such dedicated partisans that they’re willing to sign their name to anything. Again, Dreyfus-like: the worse the evidence the stronger they stand by it. A state attorney general is a highly placed legal officer. What does this say about our legal system that a third of the states in this country are endorsing the worst statistical argument this side of PNAS? Nothing good. It’s a good thing we have 3 independent branches of government etc etc but not good when one or two of the branches are acting so irresponsibly. I don’t like it when the Association for Psychological Science acts this way, but it’s a million times worse when it’s elected officers of the U.S. and state governments.

        • That is exactly what I am trying to say: it is nothing at all — *nothing* *at* *all* — like anything we here have a frame of reference for handling. You are correct in the historical sense. It is Dreyfus-like. The anti-Dreyfusards were the direct antecedents of Action Française, Maurras, the Pétain clique. They hated “liberalism” more than they loved France. Four years ago the French had enough of an historical sense to form a common front against that Nazi sow. The French are in that respect both dependable and respectable: “L’hypocrisie est un hommage que le vice rend à la vertu” (“Hypocrisy is the homage vice pays to virtue”). The Americans sadly may be forced to learn it by “the Russian Lesson”, which persuaded the Germans to finally throw in the towel (only after every last brick in the city of Berlin was dust).

        • Here is the first one I see:

          In this randomized trial in which patients with peripheral artery disease received treatment with paclitaxel-coated or uncoated endovascular devices, the results of an unplanned interim analysis of all-cause mortality did not show a difference between the groups in the incidence of death during 1 to 4 years of follow-up.

          https://www.nejm.org/doi/full/10.1056/NEJMoa2005206

          We see the “logic” that “no statistically significant difference” = “no difference”. So, why are these devices coated with paclitaxel if people believed the null hypothesis that it does exactly nothing for mortality made sense? The coating must be expected to do something beneficial for health right?

        • This paper doesn’t test a hypothesis, but I have looked at it.

          If they did it would be “a slight expected decline in titers of binding and neutralizing antibodies”. By slight decline they apparently meant ~70% with no observed plateau 4 months later. Is that really what they expected?

          Amazing how they don’t check statistical significance against their actual hypothesis, even though that would be a valid use.

        • At minimum, I suppose that they *ought* to have published antibody titres of controls alongside the same time history; just as a baseline; even if only “< epsilon”.

        • “At minimum, I suppose that they *ought* to have published antibody titres of controls alongside the same time history; just as a baseline; even if only ‘< epsilon’.”

          This has nothing to do with what I said, but why?

        • Response to your comment below: “That has nothing to do with what I say, but why?”
          [1] Because I’m trying to be agreeable and find fault with the paper.
          [2] Because I think the story is about *contrasts* and there are two sets of contrasts of interest:
          [a] the contrast from one time interval to the next within the treated group.
          [b] the contrast between the treated and the untreated groups

        • Sorry, I don’t think we are on the same page at all.

          Data on a control group is not interesting or relevant at all to the flaw in the paper, which is trying to play off a ~70% decrease in antibody levels/activity after a few months with no apparent plateau as an “expected slight decline”.

        • I responded to this:

          “Can you find one paper in the most recent issue of NEJM that does not do this? Or pick a journal of your choice.”

          and this:

          “Find one. A single one.”

          I found a ‘single’ paper which does not test against an uninteresting null-hypothesis.

          I remarked in passing that perhaps, because it places confidence intervals around the fractions it reports, this could conceivably be regarded, by one who is inclined to see stupid hypothesis tests under every rock, as the surreptitious slipping in of a stupid hypothesis test: the ‘hypothesis’ that said interval is bounded away from some uninteresting alternative.

          I then made some even stupider criticisms of my own, just for the sake of proving that fault can always be found in any work, namely for failing to do something it did not set out to do: include side-by-side examples of the fraction of response in the non-treated group. No, it was not a treat vs. no-treat comparison; but it might be reassuring to extremely skeptical persons that the effect shown in the treated group does not arise at all in an untreated group (who knows, maybe people are walking around with antibody titres to this and we just did not know it).

          Finally I can point out the fault in myself: of being argumentative for the mere reason that I cannot tolerate anyone saying something which I did not think of first!

        • Yes, obviously case studies also aren’t going to test a hypothesis. You chose a paper that doesn’t.

          Like I said they did mention their hypothesis (expected slight decline) but decided not to test it for some reason.

          And you can do NHST with confidence intervals, credible intervals, p-values, Bayes factors, etc. The error in logic does not depend on the specific method used.

          I think the PA lawyers managed to capture the idiocy of the NHST logic quite well:

          https://statmodeling.stat.columbia.edu/2020/12/08/the-p-value-is-4-76×10%e2%88%92264-1-in-a-quadrillion/#comment-1610201

          How many statisticians/researchers are laughing about how stupid this was then going to work the next day and doing the exact same thing? I bet almost all.
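The point that NHST can be run with confidence intervals just as easily as with p-values can be checked directly for a normal estimate: a 95% interval excludes the null value exactly when the two-sided p-value is below 0.05. A minimal sketch, with hypothetical function names and made-up estimates:

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def check_duality(estimate: float, se: float, null: float = 0.0) -> Tuple[bool, bool]:
    """Return (95% CI excludes the null value, two-sided p < 0.05)."""
    lo, hi = estimate - Z_975 * se, estimate + Z_975 * se
    ci_excludes_null = not (lo <= null <= hi)
    p = two_sided_p((estimate - null) / se)
    return ci_excludes_null, p < 0.05


# Made-up (estimate, standard error) pairs; the two verdicts always agree.
for est, se in [(0.5, 0.2), (0.3, 0.2), (-1.0, 0.4), (0.036, 0.00035)]:
    print(est, se, check_duality(est, se))
```

Swapping the p-value for an interval changes the presentation, not the question being asked of the null.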

  21. I think we should note that Paxton’s gloss on the stats is even dumber than the original:

    The probability of former Vice President Biden winning the popular vote in the
    four Defendant States—Georgia, Michigan, Pennsylvania, and Wisconsin—
    independently given President Trump’s early lead in those States as of 3 a.m.
    on November 4, 2020, is less than one in a quadrillion, or 1 in
    1,000,000,000,000,000. For former Vice President Biden to win these four
    States collectively, the odds of that event happening decrease to less than one
    in a quadrillion to the fourth power (i.e., 1 in 1,000,000,000,000,000^4). See Decl.
    of Charles J. Cicchetti, Ph.D. (“Cicchetti Decl.”) at ¶¶ 14-21, 30-31 (App. 4a-7a,
    9a).
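The arithmetic in that passage, sketched with exact fractions: one in a quadrillion is 1/10^15, and raising it to the fourth power gives 1/10^60. Multiplying the four state probabilities is only valid if the four outcomes are independent, which is itself a strong assumption for four states voting in the same national election.

```python
from fractions import Fraction

one_in_quadrillion = Fraction(1, 10**15)

# Multiplying is only legitimate if the four state results are independent.
four_states = one_in_quadrillion ** 4
print(four_states == Fraction(1, 10**60))
```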

    See:

  22. The defense (Pennsylvania) has now filed their responding brief which opposes this statistical argument:

    Texas further claims, again based on Dr. Cicchetti’s analysis, that “[t]he same less than one in a quadrillion statistical improbability” can be found “when Mr. Biden’s performance in each of those Defendant States is compared to former Secretary of State Hilary Clinton’s performance in the 2016 general election.” For this assertion, Dr. Cicchetti simply assumes that the likelihood of a given Pennsylvania voter in 2020 voting for Biden was the same as that of a Pennsylvania voter in 2016 voting for Hillary Clinton—and then concludes, based on that assumption, that the 2020 results were quite improbable. But it should not be necessary to point out that the 2016 and 2020 elections were, in fact, separate events, and any analysis based on the assumption that voters in a particular state would behave the same way in two successive presidential elections is worthless.

    • For this assertion, Dr. X simply assumes that the likelihood of a given outcome in the control treatment group was the same as that of the outcome in the control group—and then concludes, based on that assumption, that the observed results were quite improbable. But it should not be necessary to point out that the treatment and control groups were, in fact, treated differently, and any analysis based on the assumption that two different groups would behave the same way when they were different is worthless.

      Great, hopefully this can get ingrained into law as a precedent.

      • It was too perfect and I had to mess it up with a typo. Replace “control treatment” with “treatment”:

        For this assertion, Dr. X simply assumes that the likelihood of a given outcome in the treatment group was the same as that of the outcome in the control group—and then concludes, based on that assumption, that the observed results were quite improbable. But it should not be necessary to point out that the treatment and control groups were, in fact, treated differently, and any analysis based on the assumption that two different groups would behave the same way when they were different is worthless.

        • Suppose we anonymised medical papers with statistical studies in this way: replace the phrases in H0 with coronavirus related phrases such that it is about efficacy of mask usage. Just changing the language.

          And we let statisticians have a go at it. I wonder if there will be a significant bias in criticisms.

        • Yes. The only thing I see that NHST really measures is prevailing opinion, i.e., the collective prior.

          1) BS is more likely to be questioned when the audience disagrees with the conclusion.

          2) Sample size. If the prior for the expected result is high, it is easier to get funding for a large sample size and less noisy observations.

          In the effort to get rid of transparent priors, the tragic result was to measure, opaquely, a weighted average of everyone’s priors.

  23. Question: I see multiple references in these comments to an unnamed “Williams College professor”, yet I see no reference whatsoever to who that might be. (For instance, I see nothing to suggest that Charles J. Cicchetti ever worked at Williams.)

    So: Who is this supposed Williams College professor?

      • It is interesting how Northwestern has quickly gone out of their way to distance themselves from a former employee who said negative (and foolish) things about Jill Biden (https://www.cnn.com/2020/12/13/politics/jill-biden-dr-first-lady-op-ed-joseph-epstein-northwestern/index.html) while Cicchetti and Miller have not elicited much from their former associations. In my view, the latter two are much worse than the op ed piece about Jill Biden (which I found distasteful, but don’t see any reason/need for the institution to distance itself from).

        • Dale:

          I don’t know, but here’s my guess. All three events were embarrassing for the institutions, but:

          1. Miller is a tenured professor at Williams. It’s hard to get rid of a tenured professor, and even to publicly criticize a tenured professor can invite backlash, in this case from people who support those lawsuits despite their lack of merit. The university doesn’t want to look like the censor in a free speech battle. Meanwhile, outside of Williams College and maybe the mathematics-professor community, Miller’s document is already forgotten. So I can see why the Williams admin would prefer to just hunker down while the controversy blows over. I guess it’s tougher for the students and faculty in the math department there, but maybe this will give them a good lesson regarding the fallibility of math professors and the difficulty of moving from theory to application.

          2. Cicchetti’s document is even worse than Miller’s, but it seems that, despite all the publicity from that lawsuit, there’s not much discussion of Harvard or the other academic institutions where Cicchetti worked. It’s hard to blame an institution for asinine things done by its employees after they leave. Although I guess it does make you wonder if the terrible judgment and statistical incompetence revealed by this report is a latent characteristic of the author that could’ve been apparent decades earlier when this guy was holding full-time employment.

          3. The aspect of the Joseph Epstein case that was different was that Epstein himself highlights his Northwestern University credentials: “I taught at Northwestern University for 30 years . . .” And it seemed that he had still been listed on its website. In this case perhaps the institution judged that the cost was low to sever their connections with a long-retired lecturer. I’m not saying this was a good decision or a bad decision, just that it differs in some relevant ways from the Miller and Cicchetti cases.

  24. This is radical work, I am surprised it hasn’t been given wider publicity. Using a similar analysis, it is possible to show that practically all elections are faulty! In fact, as the expected outcome of an election is a sample from the same binomial distribution as applied 4 years previously, it is a wonder that we waste so much money running actual elections.

  25. Do you think they realise they just proved that every election ever, bar one, was rigged? Now they just have to work out which election was “correct”.

    (OK, it’s also possible every election ever was rigged, but just to different degrees.)
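The joke has real arithmetic behind it. Under the binomial urn model, the smallest change in vote share that clears the “one in a quadrillion” bar shrinks with the square root of the number of ballots. A sketch, assuming a two-proportion z-test, equal turnout in both elections (the 2016 Georgia two-party total is reused), and a worst-case share of 50%; the function name is hypothetical:

```python
import math
from statistics import NormalDist


def smallest_flagged_swing(n_per_election: int, tail_p: float = 1e-15,
                           share: float = 0.5) -> float:
    """Smallest change in vote share that a two-proportion z-test (binomial urn
    model, equal n in both elections) flags at one-sided tail probability tail_p."""
    z_crit = -NormalDist().inv_cdf(tail_p)  # critical z, a bit under 8 for 1e-15
    se = math.sqrt(share * (1 - share) * 2 / n_per_election)
    return z_crit * se


n = 1877963 + 2089104  # ASSUMPTION: 2016 Georgia two-party total for both years
swing = smallest_flagged_swing(n)
print(f"{100 * swing:.2f} percentage points")
```

This comes out to roughly 0.28 percentage points, so under this model a swing of a few tenths of a point would already be “less than one in a quadrillion”.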

  26. I replicated his analysis: assuming that the same proportion of voters are Republican as in 2012, it’s practically impossible that Trump won in 2016! This fraud goes deeper than we thought! Was Trump’s whole presidency fraudulent?

    • Pennsylvania attacks Dr. Cicchetti’s probability analysis calculating that the statistical chances of Mr. Biden’s winning the election in the Defendant States individually and collectively, given the known facts, are less than one in a quadrillion. Penn. Br. 6-8. Pennsylvania argues that Dr Cicchetti did not take into account that “votes counted later were indisputably not ‘randomly drawn’ from the same population of votes” in his analysis. Penn. Br. 6-8. Pennsylvania is wrong.

      First, Dr. Cicchetti did take into account the possibility that votes were not randomly drawn in the later time period but, as stated in his original Declaration, he is not aware of any data that would support such an assertion. See Supplemental Declaration of Charles Cicchetti (“Supp. Cicchetti Decl.”) ¶¶ 2-3. (App. 152a-153a). Second, although Pennsylvania argues that such data is “indisputabl[e]”, Pennsylvania offers in support nothing other than counsel’s assertion. Unsworn statements of counsel, however, are not evidence. See Frazier v. United States, 335 U.S. 497, 503 (1948).

      In fact, Pennsylvania’s rebuttal to Dr. Cicchetti’s analysis consists solely of ad hominem attacks, calling it “nonsense” and “worthless”.

      Welcome to hell, PA counsel. People have been arguing NHST is nonsense for more than half a century now:

      http://stats.org.uk/statistical-inference/Bakan1966.pdf

      https://meehl.dl.umn.edu/sites/g/files/pua1696/f/074theorytestingparadox.pdf

      It is literally bizarro science that does the opposite of what science is supposed to do, and is so ingrained into the thought processes of the people doing it there is no getting through to them.

  27. I was taking a look at your article to try and get more information and outlook on this matter.
    You had a point, the binomial distribution, though the other things mentioned seemed to be simply attacking the guy without counter-evidence or stats.
    If this guy was off by many many 0’s and came up with 1 in a million, it would still be anomalous. Right?
    Given that these same states had the top most anomalous vote tabulation releases that seems to be reason to wonder.

    Curious, could you go through the election probabilities? What numbers have you come up with in your analysis?

    • John:

      That guy’s report was a joke because it had no content. He rejected the null hypothesis that nobody believed, which is that Clinton and Biden would have identical vote shares. Then he decided to reject another null hypothesis that nobody believed, which is that Biden’s share of early votes was the same as his share of late votes. His p-value could be 1 in a million or 1 in 43 billion or 1 in 867 zillion and it wouldn’t matter. To say that an election in 2020 is different than an election in 2016, or that early votes are different from late votes, is not “anomalous,” it’s just the way things are.
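The early-versus-late point can be illustrated with simple weighted-average arithmetic. If ballots counted later come from a different mix of voters than those counted by 3 a.m., a candidate trailing in the early count can finish ahead with nothing anomalous going on. The function name, batch sizes, and shares below are entirely hypothetical, chosen only to show the mechanics:

```python
from typing import Sequence, Tuple


def final_share(batches: Sequence[Tuple[int, float]]) -> float:
    """Overall share for a candidate from (ballot_count, candidate_share) batches."""
    total = sum(count for count, _ in batches)
    return sum(count * share for count, share in batches) / total


# HYPOTHETICAL numbers, for illustration only:
early = (3_000_000, 0.46)  # ballots counted by 3 a.m.; candidate trailing
late = (2_000_000, 0.58)   # later-counted batch from a different pool of voters
print(round(final_share([early]), 3), round(final_share([early, late]), 3))
```

A test that assumes the late batch is a random draw from the same urn as the early one will reject that assumption overwhelmingly, which says nothing about fraud.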

      • Andrew:

        Comparing elections is done all the time, and if there is a great change it should be noted as anomalous. Why? Because elections are generally voted on by people, and people tend towards voting the same way. There are occurrences of change, in fact. However, massive change, specifically in specific areas, specifically in areas most crucial to an election is an oddity at best.
        If, for example, the state of California went Red some eyebrows would be raised and it would be curious indeed. Why? Because CA has been Blue for some time and tends that way. Comparing the past to the present is what we have, so an event can be ‘out of line’ with what has been and that should at least be an indicator that something may be wrong. And, of course, if the #’s are outrageous (in reality, or if methodology aligns with reality I should say) then why simply stop at… “well, that’s different, I guess things change”? Should there not be a curiosity raised, especially given other events bent away from any norm in particular (and particularly) important regions?

        • John:

          The relevant baseline is the historical pattern of changes from election to election. Testing a null hypothesis of zero change is of no interest. The report discussed in the above post did not compare to historical patterns. It compared to the null hypothesis of zero change. You or anyone else can feel free to look at changes from 2016 to 2020 in comparison to previous changes. The above-linked report does not do any of that. The report is useless, but that should not stop people from looking at actual changes. To put it another way: the fact that an incompetent consultant wrote a crappy report should not stop you or anyone else from looking at voting trends.
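A minimal sketch of the comparison suggested here: standardize the 2016-to-2020 change against the spread of earlier election-to-election changes instead of against zero. The function name and the list of past swings are hypothetical placeholders, not real election data; a serious analysis would use actual historical results and account for national trends.

```python
from statistics import mean, stdev
from typing import Sequence


def swing_vs_history(past_swings: Sequence[float], new_swing: float) -> float:
    """How many historical standard deviations the new swing sits from the
    average past swing (a baseline of typical change, not of zero change)."""
    return (new_swing - mean(past_swings)) / stdev(past_swings)


# HYPOTHETICAL past swings in Democratic two-party share (percentage points);
# replace with real historical data before drawing any conclusion.
past = [-2.0, 1.5, 3.0, -1.0, 0.5]
print(round(swing_vs_history(past, 2.4), 2))
```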
