These corruption statistics are literally incredible! (Every political science student should read this article.)

Matthew Stephenson sends along this paper on the reliability of quantitative estimates of the magnitude and impact of global corruption and writes:

There’s no fancy statistical work here, and nothing about causal inference – my coauthor Cecilie Wathne and I were just trying to figure out where some of these oft-cited numbers come from… and we found that they’re often either completely made up, or based on what can most generously be described as gut feelings expressed in quantitative form.

From the paper:

We analysed ten global corruption statistics, attempting to trace each back to its origin and to assess its credibility and reliability. These statistics concern the amount of bribes paid worldwide, the amount of public funds stolen/embezzled, the costs of corruption to the global economy, and the percentage of development aid lost to corruption, among other things. Of the ten statistics we assessed, none could be classified as credible, and only two came close to credibility. Six of the ten statistics are problematic, and the other four appear to be entirely unfounded. . . .

First, using a combination of keyword searches and snowballing, we identified 71 potentially relevant quantitative statistics from a range of sources. . . . we narrowed our original list of 71 statistics to the following ten, which are the focus of our analysis:
1. Approximately US$1 trillion in bribes is paid worldwide every year.
2. Approximately US$2.6 trillion in public funds is stolen/embezzled every year.
3. Corruption costs the global economy approximately US$2.6 trillion, or 5% of global GDP, each year.
4. Corruption, together with tax evasion and illicit financial flows, costs developing countries approximately US$1.26 trillion each year.
5. Approximately 10%–25% of government procurement spending is lost to corruption each year.
6. Approximately 10%–30% of the value of publicly funded infrastructure is lost to corruption each year.
7. Approximately 20%–40% of spending in the water sector is lost to corruption each year.
8. Up to 30% of development aid is lost to fraud and corruption each year.
9. Customs-related corruption costs World Customs Organization members at least US$2 billion per year.
10. Approximately 1.6% of annual deaths of children under 5 years of age (over 140,000 deaths per year) are due in part to corruption.
We attempted to trace each of these statistics back to its original source. . . .
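One cheap check anyone can run on numbers like these is to back out the denominator that a percentage claim implies. The sketch below does that for items 3 and 10, using nothing beyond the figures as quoted. `implied_total` is a hypothetical helper name, and the arithmetic says nothing about whether the 5% or the 1.6% was ever actually measured.

```python
# Back out the total implied by a claim of the form "part is share of total".
# Pure arithmetic on the quoted figures; it does not validate them.

def implied_total(part: float, share: float) -> float:
    """Return the total that `part` would be if it were `share` of it."""
    if not 0 < share <= 1:
        raise ValueError("share must be a fraction in (0, 1]")
    return part / share

# Item 3: US$2.6 trillion described as 5% of global GDP.
global_gdp = implied_total(2.6e12, 0.05)

# Item 10: about 140,000 deaths described as 1.6% of annual under-5 deaths.
under5_deaths = implied_total(140_000, 0.016)

print(f"Implied global GDP: US${global_gdp / 1e12:.0f} trillion")
print(f"Implied annual under-5 deaths: {under5_deaths:,.0f}")
```

If an implied total disagrees with an independent source, that's a red flag. If it agrees, the percentage could still be a guess, which is the paper's point.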

This is amazing. The corruption literature seems to have “Sleep Diplomat”-level problems.

This makes me think it could be useful to have an article collecting a bunch of these made-up or bogus statistics. So far we’ve got:

– The claim that North Korea is more democratic than North Carolina

– The Human Development Index

– The “smallish town” where 75 people a week were supposedly dying because of a lack of information flow between the hospital’s emergency room and the nearby mental health clinic

– The above corrupt corruption statistics.

We could collect a bunch more, no?

Here I’m restricting myself to simple numbers, not even getting into bad statistical analyses, causal confusions, selection bias, etc.

P.S. Following up on Stephenson’s above-linked report, Ray Fisman and I wrote an article with him for The Atlantic which conveyed the general point. That article was fun to write, and I hope it reached some readers who otherwise wouldn’t have thought about the issue, but I’d still like to compile a broader list, as discussed in the above post.

34 thoughts on “These corruption statistics are literally incredible! (Every political science student should read this article.)”

  1. Andrew refers to an article in The Atlantic which contains a reference to the delightfully expressive term “decorative statistics.”

    “These numbers are what we might call ‘decorative statistics.’ Their purpose is not to convey an actual amount of money but to sound big and impressive. That doesn’t keep them from being added, subtracted, divided, or multiplied to yield other decorative statistics.”

    My guess is that the world is overflowing with decorative statistics, including my guess.

    • “These numbers are what we might call ‘decorative statistics.’ Their purpose is not to convey an actual amount of money but to sound big and impressive. That doesn’t keep them from being added, subtracted, divided, or multiplied to yield other decorative statistics.”

      That description perfectly fits extreme weather event attribution.

      • From a linguistic, indeed poetic, point of view, I do like the term “decorative statistics” because it sort of sounds like the familiar term “descriptive statistics.” Each starts with the letter “d”, ends in “ive”, and has three syllables. Well, almost: https://www.syllablewords.net/syllables-in/decorative claims “decorative” has four syllables, but that is not the way I pronounce it. And it’s only off by one.

    • Hear, hear. I would like your opinion on something.

      Your joke at the end is good, but jokes aside, how to decide between “My guess is that the world is X” and “The world is X”?

      I think there’s lots of situations where this problem comes up, including here.

  2. Here’s a good one:

    https://bailey83221.livejournal.com/75717.html

    “… The banana events – Garcia Marquez said – are perhaps my earliest memory. They were so legendary that when I wrote One Hundred Years of Solitude I wanted to know the real facts and the true number of deaths. There was a talk of a massacre, an apocalyptic massacre. Nothing is sure, but there can’t have been many deaths. But even three or five deaths in those circumstances at that time…would have been a great catastrophe. It was a problem for me … when I discovered it wasn’t a spectacular slaughter. In a book where things are magnified, like One Hundred Years of Solitude… I needed to fill a whole railway with corpses. I couldn’t stick to historical reality. I couldn’t say they were three, or seven, or 17 deaths. They wouldn’t even fill a tiny wagon. So I decided on 3,000 dead because that filled the dimension of the book I was writing. The legend has now been adopted as history…”

    Here’s a very active one that circulates constantly.

    https://www.tandfonline.com/doi/abs/10.1080/00396338.2010.494880?journalCode=tsur20

    In this article Adam Roberts does exactly what you recommend in The Atlantic: he traces the claim that 90% of victims in modern wars are civilians back to its source. The claim doesn’t fare well.

  3. “Corruption of language is an essential to oppressive or exploitative politics.” – George Orwell

    … political control has infiltrated most aspects of our society via mass deception, including corrupt manipulation of supposed statistical analysis and reporting.

    The problem is not new, as the famous 19th-century quote indicates: “Lies, damned lies, and statistics.”

  4. I think this is an important point that readers of this blog entry might miss: “These findings do not mean that the problem these statistics are meant to capture is not real and important. We do not doubt that it is. Furthermore, our critical assessment of these statistics should not be mischaracterised as a finding that these numbers are exaggerations or overstatements. Indeed, it is quite possible that in some cases they significantly understate the harm they are meant to describe. The problem is that we do not know. We believe that the anti-corruption community can and should do better in its treatment and presentation of quantitative evidence, and though much of the discussion in this Issue is critical, it is meant in a constructive spirit.”

    Andrew, I think you’ve given this blog post a misleading title. I, for one, thought you were claiming that the numbers themselves are outside the realm of plausibility. That is not the case. The statistics lack credibility, so in that sense they are indeed “incredible”, but in the plain-English usage of incredible they are not incredible. They could be right, or at least some of them could.

    • After posting the comment above, I read the rest of the paper. It’s really good! Not only should every political science student read it, a lot of other people should too.

    • Phil:

      The numbers may be within the realm of plausibility to outsiders such as you and me but not to experts on corruption. Here are some other examples of the same sort:

      – The paper that claimed that beautiful parents were 36% more likely to have girls. This was plausible to the author of the paper (a sociologist) and to the Freakonomics team (an economist and a journalist), but not to me, as I’d read some of the literature on sex ratios.

      – The paper that claimed that single women were 20 percentage points more likely to support Barack Obama during a certain time of their monthly cycles. Plausible to the author and the journal, but not to me, given my understanding of public opinion.

      In other settings, I’m the ignorant one. This came up recently in some discussion of medical research. The point is that there can be a big gap between outsiders and experts on what is plausible.

      • Andrew, no, these numbers are not outside the realm of possibility for experts. Read the quote from the paper! I made it really easy by quoting it for you!

        According to the paper itself, the numbers are not implausible. They’re just unfounded or problematic. That’s not the same at all.

        • Here, I’ll quote just a portion of the quote: “Furthermore, our critical assessment of these statistics should not be mischaracterised as a finding that these numbers are exaggerations or overstatements. Indeed, it is quite possible that in some cases they significantly understate the harm they are meant to describe.”

        • Phil:

          I’ll leave it for Fisman or Stephenson to resolve this one, as they’re the corruption experts. My guess is that some of the numbers in that article are in the right ballpark and that some are unreasonable. But you might be right.

  5. Speaking of statistics out of control, the January 20, 2023 issue of Science has a big article on how a particular statistician has been fighting back against horrific abuses of statistics in various legal systems.

    If you (anyone here) haven’t seen it, I’d bet you’d enjoy reading it…

      • Apparently the same Richard Gill that has been defending JS Bell’s inequality in quantum mechanics against various cranks? An odd crossover between two of my favorite blogs.

        • I read some of the back and forth… These types of philosophical arguments are irrelevant now.

          The walking droplets show how you can get a Bell violation from completely local interactions (non-Markovian dynamics, i.e., particles leave a wake so that they are affected by their own past behavior and that of other particles):

          In these examples, if observers were unaware of the pilot wave field and observed only the droplets, they could only account for the observed correlations by inferring a nonlocal connection between the droplet pairs. The violation of Bell’s Inequality (Eq. 1) in our static Bell test may be rationalized in terms of the wave-mediated coupling between the two subsystems.
          Specifically, the form of the pilot wave and the concomitant particle positions depend on the geometry of both subsystems, and so necessarily violate the assumption of Bell locality.

          https://arxiv.org/abs/2208.08940v2

          Whether that is a good model for how the universe works is a separate issue. But surely it is more consistent with the rest of our experience to start with a model that is local, realist, and (if desired) deterministic.
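For readers who haven’t seen a Bell violation written out, here is a minimal sketch. It assumes the standard CHSH form of the inequality (the droplet paper’s own Eq. 1 may be stated differently) and the textbook spin-singlet correlation E(x, y) = -cos(x - y). It enumerates every local deterministic ±1 assignment to get the classical bound of 2, then evaluates the quantum correlation at the usual angles to get 2√2. The function names are illustrative, not from any library.

```python
import itertools
import math

def chsh(e_ab, e_abp, e_apb, e_apbp):
    """CHSH combination S = E(a,b) - E(a,b') + E(a',b) + E(a',b')."""
    return e_ab - e_abp + e_apb + e_apbp

def local_deterministic_max():
    """Max |S| over all strategies that fix each outcome to +1 or -1 in advance."""
    best = 0
    # A, Ap: Alice's outcomes for settings a, a'; B, Bp: Bob's for b, b'.
    for A, Ap, B, Bp in itertools.product((1, -1), repeat=4):
        s = chsh(A * B, A * Bp, Ap * B, Ap * Bp)
        best = max(best, abs(s))
    return best

def singlet_correlation(x, y):
    """Textbook spin-singlet correlation for analyzer angles x and y (radians)."""
    return -math.cos(x - y)

def quantum_chsh():
    """|S| for the singlet state at the standard CHSH angles."""
    a, ap, b, bp = 0.0, math.pi / 2, math.pi / 4, 3 * math.pi / 4
    E = singlet_correlation
    return abs(chsh(E(a, b), E(a, bp), E(ap, b), E(ap, bp)))

print(local_deterministic_max())  # 2
print(round(quantum_chsh(), 4))   # 2.8284
```

Random mixtures of the deterministic strategies can’t beat 2 either, since |S| is then an average of values each at most 2, provided the hidden variables are independent of the measurement settings. That proviso is what the later discussion of measurement independence is about.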

        • I think Bell himself believed that his theorem was about **local realism**, that is, “an electron is a thing whose path is determined by the value of field variables in the vicinity of its location”. He showed that this is incompatible with QM.

          His own belief, based on careful thought, was that the **nonlocal** realism of the de Broglie–Bohm description of QM was therefore the correct interpretation. In that scheme stuff that happens far away affects the multi-dimensional wave function directly, and therefore affects the path of the electron.

          If, on the other hand, you don’t believe that electrons are a thing that have a path (i.e., non-realism), then you don’t need to have any explanation for what they do in between your “observations”. That’s more or less the “Copenhagen” interpretation.

        • > In that scheme stuff that happens far away affects the multi-dimensional wave function directly, and therefore affects the path of the electron.

          Sure, the idea is that non-locality emerges from locality. So there is nothing spooky about this “non-locality”; the mechanism is as mundane as some boats on a lake influencing each other’s motion with their wakes.

        • > So there is nothing spooky about this “non-locality”; the mechanism is as mundane as some boats on a lake influencing each other’s motion with their wakes.

          Only if you don’t understand quantum non-locality – or think that the movement of a boat will instantaneously change the level of the water elsewhere.

        • Nope, nothing needs to happen instantaneously to get “quantum non-locality”. You can get it due to a shared past (“non-markovian dynamics”, as mentioned above).

          Check the paper quoted and watch some of the walking droplet videos on youtube to see how it works. It will take some minimal amount of effort to familiarize yourself with the topic.

        • My comment was in the context of the remark of Daniel that you quoted (“In that scheme stuff that happens far away affects the multi-dimensional wave function directly”). That seems to be a reference to Bell’s idea of things affecting [non-locally] the [whole] 3N-dimensional wave function [existing in configuration space, not physical space]. You seem to be talking about something else.

          Anyway, it was already clear to Bell half a century ago that there is no non-locality to worry about if everything that happens is determined by a common cause. The problem is explaining the observed correlations under the assumption that the measurements were not pre-determined.

        • > The problem is explaining the observed correlations under the assumption that the measurements were not pre-determined.

          Bell violations are compatible with randomness, locality, realism, and a limit on speed of information transfer. There is no need to invoke anything “spooky”. Where did I, or anyone in the walking droplet literature, claim predetermination as required?

          One possible issue with these experiments is that there is a single “clock” synchronizing the bouncing of all the droplets. So that may be an additional requirement to get non-locality without any spookiness. Perhaps that can be turned into a testable prediction.

        • > Where did I, or anyone in the walking droplet literature, claim predetermination as required?

          “No-Go Theorems Face Background-based Theories for Quantum Mechanics.”

          “In particular, a model for the Bell experiment is proposed that includes variables describing a ‘background’ field or medium. This field mimics the surface wave that accompanies the droplets in the fluid-mechanical experiments. It appears that quite generally such a model can violate the Bell inequality and reproduce the quantum statistics, even if it is based on local dynamics only. The reason is that measurement independence is not valid in such models.”

          (They argue that giving up measurement independence is not so bad – but it’s not nothing either.)

          ((Anyway I was talking about Quantum Mechanics and how a common cause also can solve the problem depending on additional assumptions. These models are not QM.))

        • Yes, you can give up “realism”. But, if you do so, you’ve given up locality too, since locality is about things. No things, no locality. See “Bell’s theorem” by Sheldon Goldstein et al. (2011), Scholarpedia, 6(10):8378, for a clear explanation of this.

        • > The walking droplets show how you can get a bell violation from
          > completely local interactions (non-markovian dynamics, ie particles leave
          > a wake so that they are affected by their own past behavior and that of
          > other particles):

          No. The water is not local.

        • I wrote:

          > No. The water is not local.

          Of course, it is a classical model. What I meant was that the model is not modeling locality the way we need it in Quantum Mechanics.

        • David, I’ve always found Bohm’s theory extremely appealing. In the water example model, waves on the water have a finite wave speed. I’ve always wondered where you get if you hypothesize a finite QM wave speed which is very very fast, say a billion times the speed of light. Is there any experiment we have done that excludes possibilities like that? What is the smallest speed compatible with experiment?

          Yes, I realize that the Bohm theory has in essence infinite speed, as it hypothesizes a 3N-dimensional configuration space, but I think that can be made a special case of a physical wave that travels very fast. Consider that Lagrangian mechanics is the same, a configuration space, but that’s just an asymptotic result because the speed of light is so fast.

        • > I’ve always wondered where you get if you hypothesize a finite QM wave
          > speed which is very very fast.

          Since “locality” is defined with reference to the speed of light, I don’t think this saves “locality”.

          In which reference frame are you measuring the speed?

          Does your wave live in 3-space rather than 3N-space? How does it relate to Schrödinger’s wave?

          Why can’t we use your wave to send signals?

          You will still have the situation where two observers can disagree on which of two events is the cause and which is the effect.

          The advantage of Bohmian Mechanics is that we can write down the formulas. It sounds like you are describing a research project.

        • Yes, definitely describing a research project.

          I guess you’d hypothesize a wave in 3 space which locally controls the momentum of a particle. The shape of that wave would be determined by some PDEs, and the speed with which stuff happening far away would affect stuff nearby would be vastly higher than the speed with which photons move through space. Why can’t we send signals? Because just like Bohm’s theory we can’t know whether our outcome was due to far away signal senders manipulating things or due to lack of knowledge about the precise values of the particle position, which induces different outcomes.

          Anyway as you say, a research problem, and one I’m unlikely to effectively tackle myself unfortunately.

  6. Sometimes it’s not hard to correct the statistics so that they become reliable. Here’s my edit of #8, for example:

    Up to 100% of development aid is lost to fraud and corruption each year.

    However, there is often an inverse relationship between reliability and usefulness…
