Holes in Bayesian statistics (my talk tomorrow at the Bayesian conference, based on work with Yuling)

3 Jul 2024, 11am, at the International Society of Bayesian Inference meeting at Ca’ Foscari University of Venice:

Every philosophy has holes, and it is the responsibility of proponents of a philosophy to point out these problems. Here are a few holes in Bayesian data analysis:

1. The usual rules of conditional probability fail in the quantum realm;

2. Flat or weak priors lead to terrible inferences about things we care about;

3. Subjective priors are incoherent;

4. Bayesian decision picks the wrong model;

5. Bayes factors fail in the presence of flat or weak priors;

6. For Cantorian reasons we need to check our models, but this destroys the coherence of Bayesian inference.

Some of the problems of Bayesian statistics arise from people trying to do things they should not be trying to do, but other holes are not so easily patched. In particular, it may be a good idea to avoid flat, weak, or conventional priors, but such advice, if followed, would go against the vast majority of Bayesian practice and would require us to confront the fundamental incoherence of Bayesian inference. This does not mean that we think Bayesian inference is a bad idea, but it does mean that there is a tension between Bayesian logic and Bayesian workflow, which we believe can only be resolved by considering Bayesian logic as a tool, a way of revealing inevitable misfits and incoherences in our model assumptions, rather than as an end in itself.

This work is joint with Yuling Yao.
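Hole 5 in the list above can be made concrete with a minimal sketch. It is not taken from the paper; it assumes a toy model y ~ normal(theta, 1), comparing H0: theta = 0 against H1: theta ~ normal(0, tau^2). The function names and the value y = 2 are illustrative choices. Under these assumptions the marginal distribution of y under H1 is normal(0, 1 + tau^2), so the Bayes factor in favor of H0 grows without bound as the prior scale tau increases, even though the data stay fixed.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def bayes_factor_01(y, tau):
    """Bayes factor for H0: theta = 0 against H1: theta ~ normal(0, tau^2).

    Assumed toy model (not from the paper): y ~ normal(theta, 1).
    Under H1 the marginal of y is normal(0, 1 + tau^2).
    """
    marginal_h0 = normal_pdf(y, 0.0, 1.0)
    marginal_h1 = normal_pdf(y, 0.0, 1.0 + tau ** 2)
    return marginal_h0 / marginal_h1


y = 2.0  # a single observation, two standard errors from zero (illustrative)
for tau in (1, 10, 100, 1000):
    print(f"tau = {tau:5d}   BF(H0 : H1) = {bayes_factor_01(y, tau):.3f}")
```

With the data held fixed, the printed Bayes factor moves from mildly favoring H1 to strongly favoring H0 purely because the prior under H1 was made weaker.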

This is not an “anti-Bayesian” talk (for my thoughts on anti-Bayesianism, see here and here). Incoherence is a necessary part of any generally useful statistics or data-analytic workflow. One cool feature of Bayesian inference is that we can leverage its local coherence as part of a larger workflow—but to do this most effectively we should understand where the paradigm breaks down. Hence this talk.

Perhaps I’ll also have a chance to make the point that Bayesians are frequentists.

25 thoughts on “Holes in Bayesian statistics (my talk tomorrow at the Bayesian conference, based on work with Yuling)”

  1. I wouldn’t say that advice such as “avoid priors which are inadequate for the problem” requires us to “confront the fundamental incoherence of Bayesian inference”. No more than “avoid models which are inadequate for the problem” does, anyway. If anything, it’s the fundamental coherence that needs to be confronted.

    > Perhaps I’ll also have a chance to make the point that Bayesians are frequentists.

    To make the point clear to them you may want to start by defining “frequentist” – that word means different things in different contexts.

    • From our paper:

      Bayesian statistics has no holes in theory, only in practice. When the model and all model assumptions are taken for granted, the Bayesian procedure is unique. But we prefer to break such enforced coherence by questioning the validity of the model and assumptions. On the other hand, perhaps these practical holes imply the existence of a theoretical hole, if theoretical statistics indeed is the theory of applied statistics.

      • Your paper says many things – some make more sense than others. Even in that short fragment the last sentence contradicts the first.

    • David:

      Click through the link and read section 2 of the paper. This is basic quantum mechanics: uncertainties are combined using superposition of complex amplitudes, not linear combination of probabilities.

      To put it another way, what we in statistics call “probability” is, in physics jargon, called “Boltzmann statistics.” At the quantum level, particles follow “Fermi-Dirac” or “Bose-Einstein” statistics. It’s kinda weird, and you’re not the first person to doubt it, but it’s how the world works.

      • I’m gonna guess that David Marcus is well aware of the way you calculate the frequency with which certain outcomes of certain QM experiments will be certain quantities… I’m guessing he just denies there’s anything wrong with using probability properly.

        Let’s give the simplest example I know of… We have a double slit experiment and individual photons are sent into the system, and then detected at different locations on the “screen”.

        We can cover one hole, and get a spot of light, cover the other hole and get a spot of light at a slightly different place, or uncover both holes and get a well known “interference pattern” with multiple light spots.

        You might argue that “conditional on going through hole A” you get the first spot of light as the distribution of outcomes, and “conditional on going through hole B” you get the shifted spot of light, but “conditional on going through either A or B” you don’t get the p(location | A) + p(location | B) two spots of light that probability theory would predict.

        But that’s not correct, because you aren’t “conditioning on going through either A or B”; you are “conditioning on both holes being open and all the wave function propagation that implies”.

        You will find that if you condition on the wave function, everything obeys standard probability.
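The argument in this comment can be sketched numerically. The block below is a toy model only: the slits are treated as point sources with amplitude exp(2πi r/λ)/r, and the wavelength, slit separation, screen distance, and grid are all made-up values. It computes p(location | apparatus) for three set-ups by summing complex amplitudes over whichever slits are open and normalizing. Each result is an ordinary probability distribution; the two-slit one simply is not the average of the two one-slit ones.

```python
import cmath
import math

# All physical constants below are illustrative assumptions, in arbitrary units.
WAVELENGTH = 1.0
SLIT_SEPARATION = 5.0
SCREEN_DISTANCE = 100.0
SCREEN_XS = [0.5 * i for i in range(-80, 81)]  # detector positions on the screen

SLIT_A = -SLIT_SEPARATION / 2
SLIT_B = SLIT_SEPARATION / 2


def amplitude(slit_x, screen_x):
    """Unnormalized complex amplitude from a point-like slit to a screen position."""
    r = math.hypot(SCREEN_DISTANCE, screen_x - slit_x)
    return cmath.exp(2j * math.pi * r / WAVELENGTH) / r


def screen_distribution(open_slits):
    """p(location | apparatus): sum amplitudes over open slits, square, normalize."""
    weights = [abs(sum(amplitude(s, x) for s in open_slits)) ** 2 for x in SCREEN_XS]
    total = sum(weights)
    return [w / total for w in weights]


p_a = screen_distribution([SLIT_A])             # only A open
p_b = screen_distribution([SLIT_B])             # only B open
p_both = screen_distribution([SLIT_A, SLIT_B])  # both open

# The "either A or B" mixture that the naive argument would predict.
naive = [0.5 * (pa + pb) for pa, pb in zip(p_a, p_b)]
max_gap = max(abs(p - q) for p, q in zip(p_both, naive))
print(f"largest difference between p_both and the naive mixture: {max_gap:.4f}")
```

Nothing here violates the sum rule: `p_both` is conditioned on a different apparatus than `p_a` and `p_b`, so there is no reason for it to equal their mixture.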

        • > I’m gonna guess that David Marcus is well aware of the way you calculate
          > the frequency with which certain outcomes of certain QM experiments will
          > be certain quantities… I’m guessing he just denies there’s anything
          > wrong with using probability properly.

          Good guess.

          Andrew: If you don’t want to believe me that probability works just fine for quantum physics, maybe you will believe Sheldon Goldstein:

          http://www.scholarpedia.org/article/Bell%27s_theorem#Classical_versus_quantum_probability_.28and_logic.29

          “However, as long as the usual meanings of words are kept, there is no need to get rid of classical probability theory (or classical logic).”

          I expect that if you email Professor Goldstein, he’d be happy to explain this to you.

      • Andrew,

        I looked at Section 2 of your paper. Your error is to assume that experiment 4 simply provides more information about experiment 3. You can’t assume the “detectors” don’t affect the particles. The theory tells you whether they do or don’t. If they do (and the theory says that they do in this case), then experiment 4 is not just more information about experiment 3. It is a different experiment.

        You could have the same confusion classically if you used water waves and assumed your “detector” wasn’t affecting the water when it really was.

        When you posted about this paper on February 23, 2020, (https://statmodeling.stat.columbia.edu/2020/02/23/holes-in-bayesian-statistics/) we pointed out the error. I don’t know why you haven’t corrected your paper.

        > The photon does not go through one slit or the other; it goes through
        > both slits,

        This is (of course) nonsense. The particle (being a particle) goes through one slit. But, the wave (that guides the particle) sees both slits.

        • David:

          I get that you disagree with what I wrote. That doesn’t make what I wrote an error, or nonsense, or (of course) nonsense, or whatever. It’s a disagreement. There are different ways to translate quantum mechanics into a description in natural language.

        • Andrew, you can argue about the “goes through one or both slits” terminology, or even argue that particles don’t exist, as some QM people seem to. But I think you have to acknowledge that experiment 4 isn’t more information about experiment 3; it’s a different experiment.

          I’m not an expert in QM, far from it, but I understand that double-slit type experiments involve some material making up the slit, that material is made out of atoms, and the atoms have electrons and things which produce potential energy in the space around them, making it so that the QM wave propagates through the space differently. Detectors are also made out of atoms, and may emit particles or light or whatever ultimately altering the physical reality in the vicinity of the detector. The wave propagates differently in configuration space when the detector is there and when the slit is open vs is closed, etc.

          We can create a perfectly fine classical experimental setup that has the same logical content:

          You can imagine putting a pingpong ball in a river. Halfway downstream are two separate weirs which can be set to different heights, and fully downstream is a row of baskets, aligned next to each other across the river, that catch the pingpong ball. If we set the weirs so that they’re both flowing, the ball floating down the river will, due to the flow of water, pass either through one weir or through the other. We can imagine putting a “detector”, which is like a pipe that has a camera in it, on the outflow of each weir. We can also imagine that the presence of the pipe alters the surface waves of the water flow so that there’s no longer the same kind of interference caused by the surface waves as compared to if it’s just the weirs without the detectors.

          We wouldn’t say that putting the detectors in place gives us more information about what happened when we did an experiment that had no detector, because clearly the detector perturbs the surface waves in the water dramatically and eliminates the interference.

          With either weir blocked we get a single spot that the pingpong ball always goes to, but it’s different spots for blocking different weirs. When both weirs are open, balls are directed by complicated waves on the surface to an interference pattern at the downstream detector. However with a “detector pipe” next to each weir, the interference pattern on the surface of the water disappears because of the pipe, and we get single spots again.

          Each experiment is a physically different set-up where the apparatus itself matters. The probability we need to assign to being in a certain place is **conditional on the physical effects of the apparatus** and this in no way contradicts the laws of probability.

          In fact the whole reason why Physicists like Bayes is because Bayes is about **our knowledge** and not about **the way the world works**. So the truth or accuracy of Bayesian probability itself can’t possibly be dependent on the outcome of experiments. Only the accuracy of our model at explaining the physics is called into question. The Cox theorems are mathematical tautologies that don’t depend on physics.

        • Daniel:

          I agree that experiments c and d in our paper are different experiments. We label them as such!

          P.S. Physicists may like Bayes, but they know better than to apply Boltzmann statistics to electrons, or to combine probabilities using linear averaging rather than superposing complex amplitudes.

        • Andrew, I still disagree with your characterization of combining probabilities by squared amplitudes.

          Suppose we have a double slit apparatus and a high quality unpredictable random number generator. The random number generator controls whether neither slit, one slit, or the other slit is covered. It produces the numbers 1, 2, 3 uniformly and they correspond to the states open-open, open-closed, closed-open.

          The apparatus is in a dark room we are standing outside of, we can press a button outside the room and then a few seconds later a detection will appear on a computer screen corresponding to a photon that went through the apparatus and hit a detector at a certain point along the screen.

          Our information is just that after pressing the button, the RNG runs, converts the apparatus to one of the three states with equal probability, and then sends a particle and the detection was at a certain point.

          We’ll still say

          p(certain_point) = p(certain_point | 1) p(1) + p(certain_point|2)p(2) + p(certain_point|3)p(3)

          each of the p(certain_point | 1) etc will be a calculation in which the probability of a detection at a certain point is calculated conditional on the experimental set-up, just like we’d calculate the pingpong ball with the weirs.

          In the process of doing the physical simulation of the pingpong balls we are free to do some calculations related to fluid mechanics equations. In the process of calculating where the photon hits we are free to do some calculations related to Bohmian mechanics. In both processes we have some uncertainty over the initial conditions of the particles as they enter the apparatus, and those small uncertainties grow in time to produce a variety of possible outcomes, giving us p(certain_point | 1) as integral(p(certain_point | 1, initial_condition)p(initial_condition))

          Despite the mechanics being different (fluids vs DeBroglie), the structure of the mathematics is the same: you run a particle forward in time from its assumed initial condition until it hits the screen, and you do this for a weighted average over the prior for possible initial conditions.

          The DeBroglie equation assumed by Bohmian mechanics doesn’t even involve complex values per-se. Given an initial condition, the dynamics are weird, but **deterministic** like a pingpong ball. And it builds up an interference pattern as the sum of individual outcomes weighted by priors over initial conditions using standard probability calculations we do every day in problems like pharmacodynamics.
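The sum-rule calculation in this comment can be checked with a short simulation. This is a sketch under stated assumptions: the screen is reduced to five bins, and the three conditional distributions in `P_GIVEN_STATE` are invented numbers standing in for whatever the physics would give for each apparatus state. The simulation presses the "button" many times and compares the observed frequencies with p(certain_point | 1) p(1) + p(certain_point | 2) p(2) + p(certain_point | 3) p(3).

```python
import random

N_BINS = 5

# Hypothetical p(location | apparatus state); the values are made up for illustration.
P_GIVEN_STATE = {
    1: [0.30, 0.05, 0.30, 0.05, 0.30],  # open-open: fringe-like pattern
    2: [0.10, 0.60, 0.20, 0.07, 0.03],  # open-closed: one spot
    3: [0.03, 0.07, 0.20, 0.60, 0.10],  # closed-open: shifted spot
}
P_STATE = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # uniform random number generator


def marginal(bin_index):
    """Law of total probability: sum over apparatus states."""
    return sum(P_GIVEN_STATE[k][bin_index] * P_STATE[k] for k in P_STATE)


def simulate(n_presses, seed=0):
    """Press the button n_presses times and return observed bin frequencies."""
    rng = random.Random(seed)
    states = list(P_STATE)
    counts = [0] * N_BINS
    for _ in range(n_presses):
        k = rng.choice(states)  # the RNG sets the apparatus
        location = rng.choices(range(N_BINS), weights=P_GIVEN_STATE[k])[0]
        counts[location] += 1
    return [c / n_presses for c in counts]


observed = simulate(60_000)
for i in range(N_BINS):
    print(f"bin {i}: simulated {observed[i]:.3f}   sum rule {marginal(i):.3f}")
```

The mixture over apparatus states behaves like any other mixture. Where the quantum calculation differs from the fluid one is only in how each row of `P_GIVEN_STATE` would be computed.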

        • Andrew:

          > That doesn’t make what I wrote an error

          The fact that we disagree doesn’t automatically imply you made an error. But, I pointed out what your error is. Conversely, just because you don’t admit it is an error doesn’t make it correct.

          > There are different ways to translate quantum mechanics into a
          > description in natural language.

          Some make sense. Some do not make sense (i.e., are nonsense). If you pick one that doesn’t make sense, it is not too surprising that you conclude that probability is broken.

          As I said, you should contact one of the experts. Or, read some of the many good books on the subject.

        • Hmm to clarify, whether DeBroglie-Bohm involves complex numbers is not quite the right way to think about the question and is a red herring.

          https://en.wikipedia.org/wiki/De_Broglie%E2%80%93Bohm_theory#Derivations

          shows that the original Bohm derivation used two coupled real equations, but that they’re equivalent to a single complex equation. The point is more or less irrelevant. In the pingpong ball example we calculate p_NS(outcome | apparatus) by integrate(p_NS(outcome | apparatus, initial_cond) p(initial_cond)) using Navier-Stokes equations to propagate position forward in time. In quantum 2 slit experiments we can do it by integrate(p_DBB(outcome | apparatus, initial_cond) p(initial_cond)) using DeBroglie-Bohmian mechanics to propagate position forward in time.

          In both situations we get a probability by the sum rule and integration over a prior combined with “some physics”. There’s no “weird probability” involved; everything is standard Bayes.
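The integral over initial conditions described in this comment can be sketched by Monte Carlo. Everything specific below is hypothetical: `propagate` is a made-up deterministic rule standing in for "some physics" (it is neither Navier-Stokes nor de Broglie-Bohm), and the standard normal prior on the initial position is an assumption. Because the dynamics are deterministic, p(outcome | apparatus, initial_cond) is a point mass, so the integral reduces to drawing initial conditions from the prior, running each forward, and tabulating where they land.

```python
import math
import random
from collections import Counter

N_BINS = 8


def propagate(apparatus, x0):
    """Hypothetical deterministic dynamics: initial position -> screen bin.

    A stand-in for real physics; the formulas are arbitrary illustrations.
    """
    if apparatus == "both_open":
        u = 0.5 * (1.0 + math.sin(12.0 * x0))  # outcome very sensitive to x0
    else:
        u = 0.5 * (1.0 + math.tanh(x0))        # outcome varies smoothly with x0
    return min(int(u * N_BINS), N_BINS - 1)


def outcome_distribution(apparatus, n_draws=50_000, seed=1):
    """Monte Carlo estimate of integrate(p(outcome | apparatus, x0) p(x0))."""
    rng = random.Random(seed)
    # Assumed prior over the initial condition: x0 ~ normal(0, 1).
    counts = Counter(propagate(apparatus, rng.gauss(0.0, 1.0)) for _ in range(n_draws))
    return [counts[b] / n_draws for b in range(N_BINS)]


for apparatus in ("both_open", "one_open"):
    probs = outcome_distribution(apparatus)
    print(apparatus, [round(p, 3) for p in probs])
```

Swapping in a different `propagate` changes the resulting distribution but not the probability calculation around it, which is the structural point the comment is making.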

        • Daniel:

          > But I think you have to acknowledge that experiment 4 isn’t more
          > information about experiment 3 it’s a different experiment.

          Exactly!

          Good example with the river.

        • Andrew,

          If you read any of Sheldon Goldstein’s papers

          https://sites.math.rutgers.edu/~oldstein/index.html

          it is obvious that he understands probability. So, if he writes an article with a section “Controversy and common misunderstandings”

          http://www.scholarpedia.org/article/Bell%27s_theorem#Controversy_and_common_misunderstandings

          and one of those common misunderstandings is something that you believe is true

          http://www.scholarpedia.org/article/Bell%27s_theorem#Classical_versus_quantum_probability_.28and_logic.29

          you might want to discuss it with him before assuming that you are right and he is wrong.

        • Apparently de Broglie and Bohm pilot wave theories are *not* the same. And this is a major source of confusion when talking about the topic.

          It is important to clarify the distinction between de Broglie’s double-solution theory and the relatively well-known Bohmian mechanics, or de Broglie–Bohm pilot-wave theory [7,8,9]. According to the latter, quantum particles are guided by the standard wave function; however, the particles are not seen as sources of that wave, whose form is unaltered by the particle position. Recent advances in Bohmian mechanics include the Lagrangian theories of Sutherland [10] and Holland [11]. These Lagrangian approaches bear some similarity to the present work; indeed, both theories fall under the purview of the variational results to be developed in Section 2. However, our focus is on a classical pilot-wave dynamics of the form envisaged in de Broglie’s double-solution theory and engendered in the walking-droplet system, where the particle responds exclusively to a wave of its own making.

          https://www.mdpi.com/2073-8994/16/2/149

      • > To put it another way, what we in statistics call “probability” is, in physics jargon, called “Boltzmann statistics.”

        But in both the mathematics and quantum foundations literature probability is just called “probability”; the simplification of the jargon reflecting the knowledge that “classical” (Kolmogorovian) probability isn’t all of probability (see e.g.).

        • > But in both the mathematics and quantum foundations literature probability is just called “probability”

          Also in physics in general.

          “[Maxwell-]Boltzmann statistics describes the distribution of classical material particles over various energy states in thermal equilibrium.”

          Usually when people “in statistics” call something “probability” they are not thinking of that.

  2. “The usual rules of conditional probability fail in the quantum realm.”

    Yukalov and Sornette have solved this problem…or at least proposed a reasonable way to formulate conditional probability in the quantum realm.

    See:
    V.I. Yukalov and D. Sornette, Positive operator-valued measures in quantum decision theory, Lecture Notes Comput. Sci. 8951, 146-161 (2015)
    (http://ssrn.com/abstract=2579278)

    and especially
    V.I. Yukalov and D. Sornette
    Quantum probabilities of composite events in quantum measurements with multimode states,
    Laser Physics 23, 105502 (14pp) (2013) doi:10.1088/1054-660X/23/10/105502
    (arXiv:1308.5604 and http://ssrn.com/abstract=2316701)

  3. > the usual rules of conditional probability fail in the quantum realm

    Sounds bad.

    > The standard paradigm of Bayesian statistics is to define a joint distribution of all observed and unobserved quantities […]

    Ok, let’s keep that in mind.

    > Consider four possible experiments: […] The problem is that p4, […] is not the same as p3 […]

    Is it a problem that different experiments have different outcomes?

    > This violates the rule of Bayesian statistics, by which a probability distribution is updated by conditioning on new information.

    Does it? Having a probability distribution for the outcomes conditional on the experiment being performed doesn’t seem problematic.

    > At this point, we can rescue probability theory, and Bayesian inference, by including the measurement step in the conditioning. Define a random variable z carrying the information of which slots are open and closed, and the positions of any detectors, and then we can assign a joint distribution on (z, y) and perform Bayesian inference

    Ah, we can rescue probability theory by following the standard paradigm of Bayesian statistics: taking into account all the relevant information about the experiment to create a joint distribution.

    > In essence, the problem with applying probability theory to the slit x and screen position y arises because it is physically inappropriate to consider x as a latent variable.

    So that was it, the model was physically inappropriate.

    > Bayesian inference cannot resurrect a misspecified model, but it works fine to incorporate quantum mechanics within the model.

    Amen.

    > There are two difficulties here. The first is that Bayesian statistics is always presented in terms of a joint distribution of all parameters, latent variables, and observables; but in quantum mechanics there is no joint distribution or hidden-variable interpretation.

    We did just rescue probability theory in your QM experiment(s) by defining the right joint distribution. Where’s the difficulty?

    > The second challenge […] If classical probability theory needs to be generalized to apply to quantum mechanics,

    Why would it need to be generalized if “Bayesian inference can be rescued by careful modeling”?

    • Carlos wrote:

      > So that was it, the model was physically inappropriate.

      Indeed.

      There can’t be anything wrong with conditional probability because it is just math. It is possible to produce a mathematical model that is not a good description of reality. If one does that, then it is not surprising that the model will make statements that do not match reality. This just means the model is poor. In fact, it is the disagreement with reality that tells us that the model is poor.

      In this case, Andrew and Yuling assumed the “detector” does not affect the particle. With this assumption, the model predicted things that did not agree with experiment. The solution is to remove the bad assumption from the model. Once you do that, everything is fine.

      I don’t understand why Andrew thinks this is just a “disagreement” rather than an error. What Andrew is saying and what we are saying are contradictory. They can’t both be true.
