Holes in Bayesian Statistics

With Yuling:

Every philosophy has holes, and it is the responsibility of proponents of a philosophy to point out these problems. Here are a few holes in Bayesian data analysis: (1) the usual rules of conditional probability fail in the quantum realm, (2) flat or weak priors lead to terrible inferences about things we care about, (3) subjective priors are incoherent, (4) Bayes factors fail in the presence of flat or weak priors, (5) for Cantorian reasons we need to check our models, but this destroys the coherence of Bayesian inference.

Some of the problems of Bayesian statistics arise from people trying to do things they shouldn’t be trying to do, but other holes are not so easily patched. In particular, it may be a good idea to avoid flat, weak, or conventional priors, but such advice, if followed, would go against the vast majority of Bayesian practice and requires us to confront the fundamental incoherence of Bayesian inference.

This does not mean that we think Bayesian inference is a bad idea, but it does mean that there is a tension between Bayesian logic and Bayesian workflow which we believe can only be resolved by considering Bayesian logic as a tool, a way of revealing inevitable misfits and incoherences in our model assumptions, rather than as an end in itself.

This paper came from a talk I gave a few months ago at a physics conference. For more on Bayesian inference and the two-slit experiment, see this post by Yuling and this blog discussion from several years ago. But quantum probability is just a small part of this paper. Our main concern is to wrestle with the larger issues of incoherence in Bayesian data analysis. I think there’s more to be said on the topic, but it was helpful to write down what we could now. Also I want to make clear that these are real holes. This is different from my article, “Objections to Bayesian statistics,” which discusses some issues that non-Bayesians or anti-Bayesians have had, which I do not think are serious problems with Bayesian inference. In contrast, the “holes” discussed in this new article are real concerns to me.

106 thoughts on “Holes in Bayesian Statistics”

  1. I disagree that the quantum case is a problem for Bayesian statistics. The issue there is that the physical measuring apparatus changes the physical system. Consider system A, without detectors at the slits, and system A’ with detectors at the slits. System A’ is different from system A, so it’s not a problem for Bayesian statistics that the probability distributions for A and A’ are different.

    John Bell’s theorem, together with many experiments by people like Alain Aspect, gave solid empirical proof that quantum systems are not described by latent variables such as “which slit did the photon pass through”. Thus, trying to describe the statistics of a two-slit experiment in terms of such latent variables is a category error in the characterization of the problem, not a challenge to Bayesian formalism.

    I don’t see a problem applying Bayes to quantum entanglement so long as you characterize the probabilities in terms of an Abelian set of variables (that is, the operators corresponding to the variables commute with one another and can thus simultaneously have well-defined eigenvalues).
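    To make the “Abelian” point concrete, here is a tiny numpy sketch (the use of Pauli matrices is my own illustrative choice, not anything from the paper): noncommuting observables have no joint eigenbasis and hence no joint distribution in a given state, while any commuting set behaves classically.

```python
import numpy as np

# Pauli matrices: a standard example of noncommuting observables.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# sx and sz do not commute, so they cannot be simultaneously diagonalized:
# no joint eigenbasis, hence no joint probability distribution for them.
print(np.allclose(sx @ sz, sz @ sx))                 # False

# Functions of a single observable always commute and behave classically.
print(np.allclose(sz @ (sz @ sz), (sz @ sz) @ sz))   # True
```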

    • I was going to make the same criticism. A dual-slit experiment with detectors at each slit is not a dual-slit experiment: it is a zero-slit, dual detector-emitter experiment.

      Nearly all quantum weirdness stems from either a misunderstanding of quantum mechanics or a misunderstanding of the macro world. NEWTONIAN mechanics, now THAT’S weird.

    • That’s not what John Bell’s theorem says, nor is it John Bell’s interpretation of his own theorem. What his theorem says is that the world can not simultaneously be *local* and have hidden variables. His own position was that it seemed exceedingly likely that the world was non-local and that there were hidden variables (such as where is the photon at any given time). Evidence for this can be found in his book “Speakable and Unspeakable in Quantum Mechanics”.

      • > What his theorem says is that the world can not simultaneously be *local* and have hidden variables.

        No. The only hypothesis needed for the Bell inequality is locality. If you disagree, please reread Bell’s papers. Bell himself says this (unfortunately in a footnote).

        As for the relationship to probability, you can’t ignore the experiment. The experimental setup is not passive (in this case).

        • I haven’t read the footnote, but I believe the one condition you’re referring to is “local realism”. Which is really two conditions folded into a convenient phrase. Non-local hidden variable theories are, as far as I know, not incompatible with our current state of knowledge.

        • Bell’s paper proceeds from the assumption that there exists a latent parameter lambda, the probability distribution of which induces an expectation over the observable measurement. This is realism or hidden variables, defined by EPR as

          “If without in any way disturbing a system, we can predict with certainty (i.e. with probability equal to unity) the value of a physical quantity, then there exists an element of physical reality corresponding to this physical quantity”

          So when you say:

          > No. The only hypothesis needed for the Bell inequality is locality.

          you are incorrect.

          Combining realism with the requirement that measurement at B doesn’t affect measurement at A, or locality, you get a contradiction. Hence, what the paper rules out is “local realism” or local hidden variable theories. This purpose is stated in the introduction of the paper

          “Moreover, a hidden variable interpretation of elementary quantum theory has been explicitly constructed. That particular interpretation has indeed a grossly nonlocal structure. This is characteristic, according to the result to be proved here, of any such theory which reproduces exactly the quantum mechanical predictions.”

          “any such theory” being a hidden variable interpretation.

          Local realism isn’t a term I made up, it’s a standard phrase. There exist local “non-real” theories that aren’t incompatible with experiment, though they’re all New-Agey “consciousness/mentalist” interpretations that aren’t very illuminating and sort of sidestep the issue. Nonetheless, they exist.
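          For anyone who wants to see the locality-plus-realism tension numerically, here is a small simulation sketch (the settings, outcome rule, and sample size are my own illustrative choices): an explicit local hidden-variable model, where each outcome depends only on the local setting and a shared λ, stays within the CHSH bound |S| ≤ 2, while quantum mechanics predicts 2√2 for the singlet state at these settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy *local hidden variable* model: lambda is a random unit vector shared
# by the two sides; each outcome depends only on the local setting and lambda.
lam = rng.normal(size=(200_000, 3))
lam /= np.linalg.norm(lam, axis=1, keepdims=True)

def A(setting):            # outcome on side A, deterministic given lambda
    return np.sign(lam @ setting)

def B(setting):            # outcome on side B, deterministic given lambda
    return -np.sign(lam @ setting)

def E(a, b):               # correlation between the two +/-1 outcomes
    return np.mean(A(a) * B(b))

a1, a2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
b1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
b2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)

S = E(a1, b1) + E(a1, b2) + E(a2, b1) - E(a2, b2)
print(S)   # any such local model satisfies |S| <= 2 (this one saturates it);
           # quantum mechanics predicts |S| = 2*sqrt(2) for the singlet state.
```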

        • The only hypothesis needed for the Bell inequality is locality.

          It isn’t. You have to replace “locality” with Bell’s later “local causality” (see also Landsman). Unfortunately, some proponents of some interpretations / modifications of QM have let their ‘ideology’ blind them to that distinction (and to believe other false things, e.g. that the PBR theorem rules out all psi-epistemic interpretations).

    • Indeed we do not touch the micro-level mechanism: which slit the photon passes through. The narrative of our double-slit experiment is focused on the macro-level measurement: whether, and which, of the slits is open or closed.

      • Hello,

        Long-time follower of this blog, but never commented before — so, firstly, thank you for the constant stream of interesting content and papers. The inconsistency with quantum physics as presented here has piqued my curiosity, so I thought I should express my thoughts (limited as they are).

        Arguably, the latent variable should describe the joint state of both detectors for the likelihood to factorise over it. I.e.: \theta = 00, 01, 10, 11, each one carrying its own likelihood over the observations on the screen.

        Looked at from the perspective of the path integral formulation, the appropriate latent variable for factorisation is the set of all possible paths the particle could have taken. Given such a set of paths (i.e. the state of the slits) the observation variable is independent of anything else, but the entire set is necessary to construct it.
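        A minimal numerical sketch of this point (the scalar-wave model and all numbers below are illustrative assumptions of mine, as shown below): conditioning on the joint slit configuration gives a perfectly well-defined likelihood for each configuration, and the both-slits-open likelihood, built by adding amplitudes, is not the 50/50 mixture of the single-slit distributions.

```python
import numpy as np

# Toy scalar-wave two-slit calculation (all numbers are illustrative):
# slits at +/- d/2, screen at distance L, wavelength lam.
lam, d, L = 500e-9, 50e-6, 1.0
k = 2 * np.pi / lam
x = np.linspace(-0.05, 0.05, 2001)            # positions on the screen

def amplitude(slit_pos):
    r = np.sqrt(L**2 + (x - slit_pos)**2)     # path length from slit to screen point
    return np.exp(1j * k * r) / r             # spherical-wave amplitude

def normalize(p):
    return p / p.sum()

psi1, psi2 = amplitude(-d / 2), amplitude(+d / 2)
p_10 = normalize(np.abs(psi1) ** 2)           # likelihood given "only slit 1 open"
p_01 = normalize(np.abs(psi2) ** 2)           # likelihood given "only slit 2 open"
p_11 = normalize(np.abs(psi1 + psi2) ** 2)    # likelihood given "both open": add amplitudes
p_mix = 0.5 * p_10 + 0.5 * p_01               # Bayesian mixture over "which slit it went through"

# p_11 shows interference fringes; no mixture of p_10 and p_01 can reproduce them.
print(np.max(np.abs(p_11 - p_mix)))
```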

        Hope this makes sense as an alternative resolution.

        MM

    • I agree with this objection. Also, I read Gelman and Yuling’s article to the end, and in the last part they say (rephrasing as I understood it) that there is no problem with quantum mechanics as long as you know quantum mechanics. So why put this problem upfront in the article when it seems the weakest one? When I arrived at “more complex probability theory that allows for quantum entanglement” I had a strong feeling of quantum crap. I do not want to offend; this sensation probably stems from the fact that I’m a physics student and I have heard or read nonsense discussions of quantum mechanics too many times. I just feel that starting with that discussion of the two-slit experiment might steer physicists away from taking the article seriously. I mean, really, I read through only because I knew Gelman’s name; I can imagine my friends reading section 2 and sharing it to make fun of the authors. I found the rest of the insights interesting and useful.

      • Arguably, QM has been plagued by confusion (and nonsense discussions and interpretations!) over the years largely because of the persistence of a lack of appreciation that there is more to probability than classical probability theory.

        QPT is a little more complex than CPT, but I think you’ll find that the perspective it facilitates – that QM is just probabilistic mechanics, with the appropriate PT being QPT rather than CPT because of the noncommuting random variables – more than compensates.

        The real reason it’s hard to do Bayesian inference with QPT in much the same way as is done with CPT has to do with a causal asymmetry in the former.

        • About the first link: I don’t think that the matter of the various interpretations of quantum mechanics (Copenhagen, many worlds…) really changes probability. My take on this is that you have to go through probability to formally make inferences, so your theory of reality has to give rules for writing p(x|theta). So it is not weird that quantum mechanics just starts giving rules for computing p(x|theta) at a fundamental level, instead of e.g. classical statistical mechanics where you think that *in principle* you could go without probability.

          In this perspective I see quantum mechanics the same way you think of a model in statistics: you have a set of empirical/mathematical rules to build your p(x|theta) and to put numbers in the x when you do an experiment. We already know quantum mechanics as it is now is not a theory of everything, so I can live with this picture. I admit I feel puzzled when reasoning about observers, decoherence, many worlds, etc., but this is theory-building, since experiments still do not discern this. (Is this what they call “quantum bayesianism”? I don’t know.) BTW: I feel the right way is something many-worlds-like.

          Regarding the second link (disclaimer: I didn’t fully read the article, I quickly read the intro and glanced at the rest): this is interesting. I always saw the density matrix just as a convenient formalism for combining probability with the fact that your model is quantum mechanics, while this other perspective takes the density matrix as a generalization of probability and says quantum mechanics is a straightforward application of this new probability theory. It vaguely resembles what Einstein did with the Lorentz equations.

          The next step is applying this generalization wherever you applied classical probability. Is this really useful? We can decompose it back and say that if in many contexts statistical inference with a quantum model was effective, you could as well speak of quantum probability by default. But it appears to me it isn’t. Continuing to try a parallel with relativity: ok, we often use Galilei because it works for many cases, but we know it always fits as a special case of relativity. But here it is a bit different: you can’t do relativity = Galilei + something, while quantum probability = classical probability + quantum model. And CP + QM is at the same time equivalent where it matters, more general, and easier to understand.

          The possibly exciting point that Andrew likes to bring up is: ok, but what happens if in practice I try to routinely apply QP + social sciences or QP + psychology or similar? My view is that I strongly believe things in these fields are well described by models that ultimately boil down to classical physics. So I assume reasonable models will “converge” to classical ones; having a good fit with a quantum model would just be an accident of low sample size. (Since long-distance coherence now seems to be feasible, this could change if they started selling “quantum oracles” to people! Or if you believe in the quantum mind crap.)

          P.S. Rereading this, I felt an English speaker would have put “Newton” where I used “Galilei”. I could make a heartfelt argument about the appropriateness of mentioning Galilei instead of Newton in that context, but the truth is this is about me being Italian! :)

        • About the first link: I don’t think that the matter of the various interpretations of quantum mechanics (Copenhagen, many worlds…) really changes probability.

          Indeed, it’s more the other way around: Copenhagen (sometimes*) and neo-Copenhagen (e.g. QBism, or Rovelli’s) interpretations are broadly compatible with the “QM is just probabilistic mechanics” view. Others (e.g. MWI or Bohm) are not. QM is either (applied) generalised probability or weirdened mechanics.

          Regarding the second link […]

          Did you follow the links within the first link? Because that’s where you’ll find more about the generalisation of probability. The second link (to the Leifer and Spekkens article) is about how to do Bayesian inference with it in case anyone wants to. It’s important to understand that it’s not just a QM-motivated “taking the density matrix”. As Terry Tao says:

          However, it is possible to take the abstraction process one step further, and view the algebra of random variables and their expectations as being the foundational concept, and ignoring both the presence of the original sample space, the algebra of events, or the probability measure.

          – to improve on the Kolmogorov foundations:

          In the present paper a somewhat different sort of model is presented which while equivalent to the standard one is more cogent for certain technical purposes, and seems to us to be more in agreement with the historical and conceptual development of probability theory.

          Furthermore, Segal’s way isn’t the only way and QPT isn’t the end of it (see e.g.).

          The next step is applying this generalization wherever you applied classical probability. Is this really useful?

          If all your RVs commute you’re unlikely to want to do that. OTOH it is useful for comparison’s sake:

          It is misleading to compare quantum mechanics with the deterministically formulated classical mechanics; instead, one should first reformulate the classical theory, even for a single particle, in an indeterministic, statistical manner. After that some of the distinctions between the two theories disappear, [while] others emerge with greater clarity.

          –Max Born.

          while quantum probability = classical probability + quantum model.

          No! QP = QP. The physical model to which it is applied is, in the case of ordinary QM, the [Galilei symmetry based] mechanics. QM = QP + M.

          * The CI according to Ray Streater is one and he was unusually explicit about this.

        • > No! QP = QP. The physical model to which it is applied is, in the case of ordinary QM, the [Galilei symmetry based] mechanics. QM = QP + M.

          I still stand by quantum probability = classical probability + quantum model. Maybe this is ambiguous, so let me lay it out. I start by saying that a quantum model is something built this way:
          – You have a complex vector.
          – You can linearly transform the vector.
          – You build all your probabilities for data as squared expansion coefficients of the vector on sets of orthogonal subspaces.

          Then: I apply probability to the initial vector, like “I don’t know exactly what vector it was, let it be a random variable”. It turns out that, if you don’t care about completely representing distributions on complex vectors, but just about keeping what matters for the subsequent computations, the distribution can be summarized as a matrix, the density matrix, and you work out what operations you have to do on this matrix to carry on the usual computations, and they turn out to be elegant linear algebra operations.
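          A short sketch of exactly that construction, with example states and weights that are arbitrary choices of mine: a classical distribution over pure state vectors is summarized by rho = sum_i p_i |psi_i><psi_i|, and every measurement probability computed from rho agrees with the mixture over the ensemble.

```python
import numpy as np

# "I don't know exactly what vector it was": a classical distribution over a
# few pure qubit states (states and weights are arbitrary examples).
def ket(theta, phi):
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

states = [ket(0.3, 0.0), ket(1.2, 0.7), ket(2.5, -1.1)]
probs = [0.5, 0.3, 0.2]

# Summarize the distribution by the density matrix rho = sum_i p_i |psi_i><psi_i|.
rho = sum(p * np.outer(psi, psi.conj()) for p, psi in zip(probs, states))

# Probability of outcome "0" in the computational basis, two equivalent ways:
P0 = np.array([[1, 0], [0, 0]], dtype=complex)               # projector onto |0>
via_rho = np.trace(rho @ P0).real                             # tr(rho P0)
via_mixture = sum(p * abs(psi[0]) ** 2 for p, psi in zip(probs, states))
print(via_rho, via_mixture)   # identical: rho keeps exactly what matters
```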

          Finally, I can take all this and call it “quantum probability”, but mathematically it is the same (not being controversial here, it was as well pointed out in your Tao’s blog link).

          I agree with your QM = QP + M in the sense that quantum mechanics = quantum probability + mechanics. Well we’re not being really formal on these + and =.

          Ack my point was very down to earth, that these generalizations of probability are very exciting, but the only fundamental example is quantum mechanics, which I’m not sure can describe everything, and I can get by with probability being uncertainty on logic, which is very understandable, and quantum mechanics as just another model. So finally for my sanity I can fix my epistemology on “bayesian, classical probability”.

          When an experiment sheds light on quantum mechanics interpretations, maybe I will need to adjust my views, but until then it makes sense that I pragmatically separate generalized probability as theoretical fun from classical probability + models as The Way To Reason About Reality.

          Anyway, thanks for linking the “Post-Classical Probability Theory” paper, I just read the beginning but it seems the most useful of the bunch. Oh, and of course for bringing all this up! I mean, I knew the math, but nobody ever pointed me to considering it a generalization of probability.

        • I see Bohm’s QM as basically strange nonlocal mechanics of particles + Bayesian probability over initial states, which are the positions and momenta of the particles.

          I see Copenhagen as fundamentally not a theory of mechanics, it’s a theory of the behavior of dials and lights and digital readouts and so forth when set up to read off results determined by the behavior of very small objects.

        • It might be better to say that Copenhagen doesn’t really exist, in the sense that there is no well defined interpretation one can point to and say “this is the Copenhagen interpretation”. One point in favor of Bohm is that he in fact spent the time to create an actual theory. It’s a complete and well thought out theory. It’s a theory of actual particles; they move around through 3D space according to an actual rule. They exhibit certain statistical properties as a straightforward consequence of sensitive dependence on initial conditions, lack of knowledge of those initial conditions, and the fact that the dynamics of ensembles converge to a particular stationary distribution regardless of the initial ensemble conditions.

          Anyone familiar with MCMC would find Bohm’s theory delightful.

        • Finally, I can take all this and call it “quantum probability”, but mathematically it is the same (not being controversial here, it was as well pointed out in your Tao’s blog link).

          What you’ve made there looks like the [unit sphere of the] Hilbert space C_n treated as a “phase space” for a classical probability. This is the “formally equivalent hidden variables model” described e.g. here (in §1.7). It’s problematic[*] but something you might want to do – and call “QP” – if you’re not comfortable with “QP proper” and/or you already subscribe to an interpretation that takes the view that the [pure] QP state is something real (“psiontology”).

          [*]

          Whether one can give such “physical” interpretation for the formal construction of the Theorem 1.7.1 in cases other than spin-1/2 particle depends on the possibility of interpreting an element ψ ∈ Σₙ as “input data” for some classical preparation procedure. Without going into detail we mention that for particle with spin j > 1/2 there are state vectors ψ which can hardly be interpreted in terms only of polarizing filter as it was done in Section 1.5 for j = 1/2.

          […]

          Anyhow, the introduction of the classical description is achieved in Theorem 1.7.1 at the price of drastic increase in the dimensionality of the set of states (from 3 to ∞ in the spin-1/2 case). Apparently it does not simplify the description of the object, introducing a lot of details which are not reflected in the measurement statistics. A concise and adequate description of all relevant statistical information is provided by the quantum theory.

        • Oops, we have hit the maximum depth. Answering Paul.

          It’s problematic[*]

          That non-orthogonal subspaces come out from a tensor product with a much larger (possibly infinite) system is not a problem! It is the actual condition of experiments where quantum mechanics applies, in which you study a very small system, with ZILLIONS OF DEGREES OF FREEDOM in your instruments!

          but something you might want to do – and call “QP” – if you’re not comfortable with “QP proper” and/or you already subscribe to an interpretation that takes the view that the [pure] QP state is something real (“psiontology”).

          I admit I have ideas on what makes me comfortable… but I’m trying to make a substantive argument here. I don’t prefer psiontology (but I find the term slightly excessive—because, again, I already know quantum mechanics is broken) because I’ve subscribed to something; it is that if I start calling things “quantum probability”, then, if it is not just a name, it means I should start applying quantum probability everywhere to make inferences. Since it perfectly generalizes probability, everything done so far still holds, no problems there. But since I already expect a huge lot of things are or can at some point be described deterministically, and by experience I trust classical probability, Occam’s razor cuts away carrying around the generalization.

          I’m not 100 % sure, because it is one of those things where I imagine taking a newborn, forcing him to live in an artificial environment where you get to eat only if you understand entanglement, and then when he’s adult he’s like «what’s this “classical” probability? are u dumb LOL»

          Well, if you ever happen to get a far right government, please take the occasion for these interesting social experiments!

        • Giacomo,

          That non-orthogonal subspaces come out from a tensor product with a much larger (possibly infinite) system is not a problem! It is the actual condition of experiments where quantum mechanics applies, in which you study a very small system, with ZILLIONS OF DEGREES OF FREEDOM in your instruments!

          Tensor products weren’t mentioned and aren’t relevant to the particular technical and conceptual problems with that classical [“hidden variables”] construction which Holevo points out there. Tensor products – describing compositions of ‘small’ systems* – expose further problems with such constructions – most notably “spooky action” of course (see e.g. that monograph’s Supplement, which covers “hidden variables” in more depth).

          […]

          “Quantum probability” (or “algebraic probability”) certainly isn’t just a name, and of course it doesn’t mean you should start applying it everywhere. Let me re-emphasise that it contains classical probability and of course you’ll want to keep on using the conceptually and technically simpler Kolmogorovian representation of that part of it wherever it’s appropriate. But the physics of the last century or so has revealed that the assumption that everything “at some point can be described deterministically” – the “classicality assumption” – is extremely dubious. Probably, there will be “no return to classical reality”.

          * The tensor product description of a ‘small’ system of interest with a ‘large’ measuring instrument system is relevant to a very different problem: the “(small) measurement problem”.

        • The point about augmenting the dimensionality: isn’t it mentioned in your very quote from Holevo’s book? (I also went and read that part of the book and part of the addendum before my previous answer.)

          Now, I think I’m missing something here. I’m convinced that, mathematically, at the end, I do exactly the same calculations on the density matrix when applying quantum mechanics, whatever my ontology is. So, I understand what it means that there will not be a return to classical reality, but I don’t need to change probability for that!

          You continue to point out problems in my meta-theoretical choice, but I can’t say I fully understand them. Each time you link a paper or book, I try to read everything, but they are too many and too long and technical to understand everything, and I have to extract the information from the context, and each time I think I found a loophole. But I can’t be sure, at best I understood only a minimal part of all your references!

          For my sanity, could you please make an argument yourself? Something I can be sure I understand, grounded on reality, without taking the “Paul Hayes” exam instead of the ones I have to actually take… You can quote at the end if you mention theorems, but only the proof can be external, the statement must be clear. Otherwise I never know if what I’m saying makes sense, or if I’m not understanding anything at all.

          I understand if this is a weird request, it always takes more time when I have to do that kind of self-contained explanation myself to someone who knows less. In case you don’t have enough time to cope with my ignorance, I concede a technical victory to quantum probability!

        • For my sanity, could you please make an argument yourself? Something I can be sure I understand, grounded on reality, without taking the “Paul Hayes” exam instead of the ones I have to actually take… You can quote at the end if you mention theorems, but only the proof can be external, the statement must be clear. Otherwise I never know if what I’m saying makes sense, or if I’m not understanding anything at all.

          Well I’m not sure. For example, it’s easy to see* why that earlier construction is the (needless) introduction of a ∞-dimensional classical probability state space on top of [the “extreme boundary” of] a 3-dimensional quantum probability state space. But if you don’t already see it, it’s probably simply because you’re not familiar with the relevant elements of QP. Probably I should start there but it would be a bit of a chore in this context.

          * I mean that quite literally: just by looking at the Bloch sphere/ball – an object which I expect you’re already familiar with (from a ‘vanilla’ QM perspective at least).

        • Ok, I said I conceded victory in this case, it turns out I was lying, because it is Saturday and I took the time to fully read the first chapter of Holevo’s book.

          So, yes, that construction is exactly what you mean; I got confused with purification because I thought “infinite-dimensional” referred to the Hilbert space instead of to the probability space over the Hilbert space.

          I can accept that the construction is needless in the sense that, given a density matrix, I don’t need to decompose it in that way to do subsequent calculations, but it is not needless in general.

          Example: I can prepare spins in pure states. I can make a machine that, given a pdf on pure states of spin, generates spins according to that distribution at a given rate, and I can send the spins one at a time and record in which state each of them was.

          From the point of view of someone who does not know how I’m doing this, he can do quantum tomography of the spins coming out and measure the density matrix with arbitrary precision, but no more than the density matrix—he would not be able to get information on which pure state each spin was in. In the end, he gets a probability distribution on the density matrix.

          But, meanwhile, I know the state of each spin, and I can give him the record, and he will know too then.

          So I can physically build a density matrix in a classical probability sense. Each passage is described with classical probability: the fact that I know the spin states, the statistical analysis done to do quantum tomography, possibly the poissonian rate of the spins, the fact that the guy before is ignorant and then knows the individual states.

          If I think quantum, then in each passage I have to use quantum probability. I would have a density operator for the entries of the density matrix! If I do not assume commuting observables and make inference on that, I can explicitly measure small entanglement between me and the guy, or between rho_11 and rho_22, or whatever else.

          In other words, the point that the phase space description adds unnecessary complication is ambiguous. Because the density matrix generalizes a distribution, but can also be derived from a distribution.

          To the critique of unwanted complication, the quantum statistician can reply “but you got complication because you didn’t assume commuting observables when you could”; conversely, the classical statistician says “you got complication because you took the space of all possible probabilities, you never do that, you always have a model!”

          Also: at the end, quantum probability spits out probabilities, in the sense of real numbers related to frequencies of repeated experiments. So, either we go back to the unsatisfactory frequentist paradigm where you always have to say “hafta do enough experiments, so we don’t get fooled by random variations” (THEN WHAT WAS RANDOMNESS AGAIN??) or “assume the system is chaotic, i.e. you really get frequency = probability” (AAAGH) (half-mocking Holevo here), or we have to do something with these numbers. I’m bayesian, so I think ok, these are degrees of belief or plausibility or whatever calibrated on frequencies, i.e. probability, but then wasn’t I a quantum probabilist? Why am I always extracting this special “classical” thing at the end?

          So I need a classical notion of probability to use a quantum probability. This selects classical probability as fundamental. As usual, not really sure, I feel there’s a way out from this “recursion paradox”.

          Example: I can prepare spins in pure states. I can make a machine that, given a pdf on pure states of spin, generates spins according to that distribution at a given rate, and I can send the spins one at a time and record in which state each of them was.

          Or you can make a machine that doesn’t record which pure state is a good description of each system it prepares, or you or someone else can inspect the internal construction of your machine closely enough to determine a mixed state description for each system it prepares.

          If I think quantum, then in each passage I have to use quantum probability […]

          Let me re-emphasise that it contains classical probability and of course you’ll want to keep on using the conceptually and technically simpler Kolmogorovian representation of that part of it wherever it’s appropriate.

          In other words, the point that the phase space description adds unnecessary complication is ambiguous. Because the density matrix generalizes a distribution, but can also be derived from a distribution.

          It isn’t ambiguous; it’s a plain mathematical fact. A quantum probability state [“density matrix”] can’t be derived from a classical probability distribution. Your quantum tomographer is estimating one from the data he/she collects and the knowledge that it’s appropriate to do that rather than estimate a classical probability distribution [over some ‘hidden’ variables].

          So I need a classical notion of probability to use a quantum probability. This selects classical probability as fundamental. As usual, not really sure, I feel there’s a way out from this “recursion paradox”.

          You need the general notion of probability – that is what’s fundamental – and “general probability ⊃ quantum probability ⊃ classical probability”. Surely the way out for a Bayesian, acutely aware of the distinction between a probability and a frequency, is to not use a thought experiment in which both are involved in such a diabolical, frequentist manner? ;-)

        • > Or you can make a machine that doesn’t record which pure state is a good description of each system it prepares, or you or someone else can inspect the internal construction of your machine closely enough to determine a mixed state description for each system it prepares.

          Yes, I agree my example is built ad hoc, it’s a specific example where it is easier to use classical probability and the phase space construction of the density matrix.

          > > If I think quantum, then in each passage I have to use quantum probability […]

          > > > Let me re-emphasise that it contains classical probability and of course you’ll want to keep on using the conceptually and technically simpler Kolmogorovian representation of that part of it wherever it’s appropriate.

          > > In other words, the point that the phase space description adds unnecessary complication is ambiguous. Because the density matrix generalizes a distribution, but can also be derived from a distribution.

          > It isn’t ambiguous; it’s a plain mathematical fact. A quantum probability state [“density matrix”] can’t be derived from a classical probability distribution. Your quantum tomographer is estimating one from the data he/she collects and the knowledge that it’s appropriate to do that rather than estimate a classical probability distribution [over some ‘hidden’ variables].

          Ack, I don’t know what you read in “derived”, I meant that there is the phase space description. In the thought experiment the guy maneuvering the machine derives the density matrix from a probability over the spin states. The other guy can’t ever see the full probability with his measuring apparatus, but he can still think there’s one if he wants, in fact it turns out at the end he gains access to it.

          > You need the general notion of probability – that is what’s fundamental – and “general probability ⊃ quantum probability ⊃ classical probability”. Surely the way out for a Bayesian, acutely aware of the distinction between a probability and a frequency, is to not use a thought experiment in which both are involved in such a diabolical, frequentist manner? ;-)

          (Uhm, whatever I do must make sense in any concrete example. But you are half-joking here.)

          Since there is the construction by which you can always see a quantum probability as classical probability with higher dimensional probability space, I don’t agree that I need the general probability as fundamental. I can pick! And in these cases it is normal to pick the simpler one, like, a simpler rule over more objects.

          It is like saying that the fundamental notion of a Markov chain is the one with 1-step memory. You can see it as a special case of the one with n-step memory and time dependence, but the latter can again be seen as a special case of 1 step on an (n+1)-dimensional vector. But if you happen to work a lot with n-step memory chains, maybe you work more easily by using the other choice as fundamental, so it is an arbitrary choice after all, based on convenience.

          Another, weaker analogy is with Gaussian processes, where I build an infinite-dimensional probability space, but when I have the data points and the points where I want the prediction, I just need to compute the covariance matrix on the points. But sometimes it is convenient to decompose the problem back into a basis of functions.

          The last example I made and called “recursion paradox” is relevant, you won’t get away with a joke! As I read in Holevo’s book: the generalized probability has a “state space” and “measurements”. The measurements produce probabilities out of the state space. Wait: so what are these probabilities? Holevo introduces them in a totally frequentist manner, but that has the usual problems. Anyway, it appears you need some preexisting notion of classical probability that stands under the definition of generalized probability. Then you find out generalized probability can be seen as usual probability over a larger space. So, uhm, it is the classical which is fundamental, because you need it in order to define the generalized one, and the generalized one actually is still a special case of it!

          Maybe it is Holevo’s treatment which is half-assed in this respect. I feel there must be a way to put down generalized probability all at once, so I don’t rely on a preexisting probability concept, and the choice of what is fundamental becomes totally free again.

          It is not literally what you said, but yes frequentists are diabolical creatures.

        • Since there is the construction by which you can always see a quantum probability as classical probability with higher dimensional probability space, I don’t agree that I need the general probability as fundamental. I can pick! And in these cases it is normal to pick the simpler one, like, a simpler rule over more objects.

          I’m afraid I can’t see how it simplifies anything. It’s augmenting a quantum probability space (Α, ϕ) with a classical probability space (Ω, Σ, μ) constructed on the boundary of Α. But μ just amounts to an extra, redundancy-laden set of coordinates for the QP state ϕ. I can’t see any use for it even for the tomographer of your example. He does only know Α and so must estimate ϕ, but that’s just a matter of estimating 3 numbers – the usual sort of coordinates for ϕ – directly.

          I feel there must be a way to put down generalized probability all at once, so I don’t rely on a preexisting probability concept

          The QP space (Α, ϕ) above denotes, in general, a [possibly noncommutative] algebra of “random variables” and [one of the] “states” of its dual. (The algebra Α is M₂ for the specific case of your spin-1/2 tomography example.)
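          A sketch of the “3 numbers” point for the spin-1/2 case (the true state and sample size below are placeholders of mine): the Bloch coordinates r_i = tr(rho sigma_i) are just expectations of ±1 measurements along three axes, so the tomographer can estimate ϕ directly from counts, with no distribution over pure states anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]

# A "true" qubit state, written via its Bloch vector (an arbitrary example).
r_true = np.array([0.3, -0.5, 0.6])
rho = 0.5 * (np.eye(2) + sum(r * s for r, s in zip(r_true, paulis)))

# The "3 numbers": each Bloch coordinate r_i = tr(rho sigma_i) is the expectation
# of a +/-1 measurement along that axis, estimated directly from counts.
n = 100_000
r_hat = []
for s in paulis:
    p_up = np.trace(rho @ (np.eye(2) + s) / 2).real   # prob. of the +1 outcome
    outcomes = rng.choice([1, -1], size=n, p=[p_up, 1 - p_up])
    r_hat.append(outcomes.mean())

print(np.round(r_hat, 3), r_true)
```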

        • I’m afraid I can’t see how it simplifies anything. It’s augmenting a quantum probability space (Α, ϕ) with a classical probability space (Ω, Σ, μ) constructed on the boundary of Α. But μ just amounts to an extra, redundancy-laden set of coordinates for the QP state ϕ. I can’t see any use for it even for the tomographer of your example. He does only know Α and so must estimate ϕ, but that’s just a matter of estimating 3 numbers – the usual sort of coordinates for ϕ – directly.

          You keep insisting on (and emphasizing) the mathematical complication of the construction. I already said many times in various ways that the point is not doing the calculation in the complicated way, I agree e.g. the tomographer should use the density matrix; rather it is philosophical, it is what I put down as fundamental. Like, the von Neumann construction of the integers is arguably more complicated to use than the Peano axioms, but you get the advantage of being founded on set theory, then you probably never use it again.

          I’ll quote the definition I found in Holevo’s book:

          1.3. Definition of a statistical model
          Motivated by consideration in Section 1.1, we define a statistical model as a pair (S,M) where S is a convex set and M is a class of affine maps of S into the collections of probability distributions on some measurable spaces U. The elements of S are called states, and the elements of M measurements. The problem of theoretical description of an object or a phenomenon satisfying the statistical postulate can then be described as a problem of construction of an appropriate statistical model. In more detail, the construction must first give a mathematical description of the set S of theoretical states and the set M of theoretical measurements and second, prescribe the rules for correspondence between the real procedures of preparation and measurement and the theoretical objects, i.e., an injection of the experimental data into the statistical model.

          So, generalized probability is made up of states, and measurements that yield probability to be used in the usual sense out of the states… How can this be fundamental with respect to probability, since the latter is used in defining it?

          Also, it sounds a lot like the state is the parametrization of some distributions. It really sounds like the normal definition of a statistical model. What is generalized here? The generalization is more about the classical vs. nonclassical description of reality, but from the statistics perspective you just care that there are parameters and distributions. Why stick your physics into the definition of probability? What if your physics changes and the elegance of the density matrix crumbles?

          The QP space (Α, ϕ) above denotes, in general, a [possibly noncommutative] algebra of “random variables” and [one of the] “states” of its dual. (The algebra Α is M₂ for the specific case of your spin-1/2 tomography example.)

          Just putting down the definition of the mathematical object is not sufficient here. You have to connect it to reality; the definition above connects it to reality through the preexisting probability concept. (Disclaimer: I don’t remember what an algebra is in general, and I don’t know what phi is or what M_2 is.)

          I want an empirical explanation that introduces the density matrix without standing on the preexisting probability (and which is not frequentist in the bad sense). Otherwise I can accept that I can see every probability as just a special case of a generalized probability, but not that the latter is the fundamental concept.

          I’m not really against quantum probability. As you said at the very beginning of the discussion, it simplifies thinking about quantum mechanics. But I want a clear explanation!

        • P.S. Maybe I’m asking too much. Maybe a “clear” explanation is related to the problem of the interpretations of quantum mechanics, in the sense that I always end up with classical probabilities because in some way classicality emerges from quantum mechanics and humans behave classically.

        • the point is not doing the calculation in the complicated way, I agree e.g. the tomographer should use the density matrix; rather it is philosophical, it is what I put down as fundamental. Like, the von Neumann construction of the integers is arguably more complicated to use than the Peano axioms, but you get the advantage of being founded on set theory, then you probably never use it again.

          I thought we were on the same page – Bayesian – philosophically (I don’t think the flavour – Objective or Subjective or some mixture – matters). All mathematical models of a probability in that sense – classical, algebraic/quantum, etc. – are models of the same concept. So you can’t order them by fundamentality but you can order them by generality.

          So, generalized probability is made up of states, and measurements that yield probability to be used in the usual sense out of the states… How can this be fundamental with respect to probability, since the latter is used in defining it?

          No: Holevo defines a statistical model, which isn’t the same thing as a (quantum) probability space. But a statistical model can be extracted from one: take all the states in A to make S and make M out of [commutative, hermitean] subalgebras of A.

          […] I want a clear explanation!

          Yes, I’m not doing very well, but have we made any progress – in particular on the “fundamentalness” issue?

          No: Holevo defines a statistical model, which isn’t the same thing as a (quantum) probability space. But a statistical model can be extracted from one: take all the states in A to make S and make M out of [commutative, hermitean] subalgebras of A.

          Ok, although the name is different I thought that was what you meant by generalized probability. Please state your definition, because this means I still have not understood precisely what you intend by quantum probability. (Your sentence sounds to my ears like “no, it is not that, but yes, it is that (??)”.) (What I was thinking up to now is: quantum probability = the density matrix represents your state of knowledge, while classical probability = a probability distribution represents your state of knowledge.)

          I thought we were on the same page – Bayesian – philosophically (I don’t think the flavour – Objective or Subjective or some mixture – matters). All mathematical models of a probability in that sense – classical, algebraic/quantum, etc. – are models of the same concept. So you can’t order them by fundamentality but you can order them by generality.

          Uhm, ok, but from Cox’s theorem I obtain probability distributions, not density matrices, right? How do you make the link?

        • What I was thinking up to now is: quantum probability = the density matrix represents your state of knowledge, while classical probability = a probability distribution represents your state of knowledge.

          That’s the basic idea. Remember that QP includes CP [algebras of functions]! You can find definitions in many of the references I’ve already given. Probably the best thing for you to do is peruse the list in the references section on the nLab page and find the one(s) most suitable for you given your background knowledge.

          Uhm, ok, but from Cox’s theorem I obtain probability distributions, not density matrices, right? How do you make the link?

          Cox’s theorem yields only the laws of probability. You can then start assigning (diagonal) density matrices.
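          A tiny sketch of what “assigning (diagonal) density matrices” can look like (the four-outcome distribution is an arbitrary example of mine): a classical distribution sits inside QP as a diagonal rho, with events as diagonal projectors, and the usual probability calculus comes back out because everything commutes.

```python
import numpy as np

# A classical (Cox/Kolmogorov) distribution over four outcomes ...
p = np.array([0.1, 0.2, 0.3, 0.4])

# ... assigned as a *diagonal* density matrix, with events as diagonal projectors.
rho = np.diag(p).astype(complex)
A = np.diag([1, 1, 0, 0]).astype(complex)    # event "outcome in {0, 1}"
B = np.diag([0, 1, 1, 0]).astype(complex)    # event "outcome in {1, 2}"

# Everything here commutes, so ordinary probability calculus is recovered:
print(np.trace(rho @ A).real)         # P(A) = 0.1 + 0.2 = 0.3
print(np.trace(rho @ (A @ B)).real)   # P(A and B) = 0.2, and A @ B == B @ A
```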

        • Cox’s theorem yields only the laws of probability. You can then start assigning (diagonal) density matrices.

          I just don’t understand this sentence. First observation: I can always diagonalize a density matrix. It sounds like you imply that a diagonal density matrix is classical, but that depends on the observables. Maybe you mean in the basis that diagonalizes observables too? I’ll assume the latter.

          Then: if I have understood correctly, without all the mathematical quirks, by (A, phi) you mean something equivalent to A = set of projectors, phi = density matrix. Then maybe what you mean is: Cox’s theorem starts from propositions, but does not implement them; in general a set of propositions need not be decomposable into a set of atomic propositions, so I can implement propositions with projections and phi as the probability map? If so, I’d like to see the implementation: what “P and Q” and “P or Q” are.

      • Giacomo:

        I was a physics student too! And I gave my talk to a room of physicists, and they didn’t think what I was saying was nonsense. It could well be that I presented this more clearly in the talk than I did in the article.

        • How could you be certain? After all, you infer what they thought from a sample of data you collected, your prior expectation about their thinking and some kind of model to put everything together. If your data, prior or model are not adequate your inference may be full of holes!

        • The only way to go further than that is to continue doing that with more and more diverse audiences until you hit one that can point out how you are mistaken. And then repeat over and over again with the less wrong arguments ;-)

        • @Andrew

          Well, my view of the article changed. A “pure” statistician will read the abstract, think “woo quantum magic” and read the article. Someone knowing about quantum probability will be interested and will read the article. Someone like me before this discussion will think “Non-physicists messing with QM? HAS THIS BEEN APPROVED BY THE GUILD?” and mock it. I kind of regret that attitude now.

        • Giacomo:

          We submitted our paper to a physics journal so I hope the reviewers will point out aspects of the paper where we are wrong or are not communicating well to that audience.

  2. Just had to pop in here to agree with Gilligan, though via a different avenue. It’s true that the probabilities in the double-slit experiment don’t work out, but that’s because Gelman/Yuling assume the photon is a classical particle. Within quantum mechanics, though, it’s a wavefunction with indefinite location and momentum.

    Depending on your interpretation of QM, in case four the detector becomes entangled with the photon and thus observes only one of the two states, the other splitting off into a separate universe, or it forces the indefinite location to be more sharply defined and the wavefunction behaves like a point particle. In case three, though, the photon remains indefinite at the slits. This allows the wavefunction to diffract after the slits, altering its shape but without making the location or momentum that much more definite. Only when the photon arrives at the screen does the wavefunction entangle/collapse. Either way, the odds of it becoming definite at any given point are proportional to the square of the wavefunction’s value at that point.

    The key is where this entanglement/collapse occurs. If it is close to where the slits are or before them, the wavefunction does not have time to spread out and remains concentrated in a small space, behaving like a point particle. The further it moves away from those slits, the more it can diffract and spread out, leading to a more wave-like appearance when it entangles/collapses.

    There’s a loose analogy here to the logit transform. MCMC algorithms can have an awful time sampling the domain [0,1], as it is easy to jump out of bounds and have your log likelihood hit negative infinity. If you transform the domain to [-infty,infty] via the logit, however, the sampler will always jump to a finite log likelihood and thus your effective sample rate is much higher. Likewise, if you view the double-slit experiment in terms of classical point-particles you’ll easily come to absurd conclusions. Transform into the space of the quantum wavefunction, though, and everything makes sense.
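    A minimal sketch of the logit trick described above (the Beta(3, 5) target is just a stand-in of mine for a posterior on (0, 1)): sample z = logit(p) on the real line and add the log-Jacobian, so the sampler never runs into a hard boundary.

```python
import numpy as np

def logit(p):
    return np.log(p) - np.log1p(-p)

def inv_logit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Log density of a parameter p constrained to (0, 1) -- a toy Beta(3, 5) target
# here -- reparameterized as z = logit(p) on the whole real line. The change of
# variables needs the log-Jacobian log|dp/dz| = log p + log(1 - p).
def log_post_z(z):
    p = inv_logit(z)
    log_target = 2 * np.log(p) + 4 * np.log(1 - p)   # Beta(3, 5) kernel
    log_jacobian = np.log(p) + np.log(1 - p)
    return log_target + log_jacobian

# No hard boundary in z: the sampler sees a finite log density instead of -inf cliffs.
print(log_post_z(np.array([-3.0, 0.0, 3.0])))
```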

    • After spotting Andrew’s comment, I had to re-read section 7. It seems I missed the second half of it, but that section amounts to little more than “we can rescue Bayesian statistics by viewing the double-slit situation in terms of wavefunctions.” There was nothing to rescue in the first place, as a mis-specified model does not break the probability system it operates under. Likewise:

      “In the pharmacology example, the latent variables seem uncontroversial and we could hope that the direct model of blood concentration could be expressed mathematically as an integral over the unknown internal concentrations, thus following Bayes’ rule. But in the economics example, it has often been argued that preferences do not exist until they are measured (…), hence any joint model would depend on the measurement protocol.”

      Then the measurement protocol must be incorporated into the model. That’s not a problem with Bayesian statistics at all.

    • We mentioned that QM is perfectly described by probability theory in the Hilbert space. Or in other words, there will be fewer holes, if not none, in Bayes if we *know and utilize* the true model a priori.
      In your analogy, the logit transform is bijective, therefore it is more of an efficiency concern, whereas the wrong model just cannot give any reasonable answer even with an infinite amount of data.

      • Suppose I wish to model a coin flip, and include two possibilities: heads or tails. I turn the crank on my model with Bayes, and predict p(heads) = 0.5 to within an incredibly small interval. In real life, though, there’s actually three possibilities: heads, tails, or edge. The true p(heads) is 0.4999something-or-other, so my Bayesian analysis has returned the incorrect result.

        To what degree is Bayesian statistics at fault here? I followed the laws of Bayes, after all, yet I got the wrong result! I, and I wager most people, would say Bayesian statistics is completely blameless in this situation. And yet this is much like how assuming classical mechanics when computing the probabilities in the double-slit experiment leads to incorrect results. Upshot: incorrect results when following a statistical system do not necessarily imply the system is broken.

        We can go in the opposite direction, too. If I use frequentism to turn the crank on this classical model, and it too leads to incorrect results, have I just broken frequentism? In general, what statistical system would give us the correct results if we assume the double-slit experiment followed classical mechanics?

        • HJ:
          Sure, there is no better alternative that can rescue a misspecified model. That said, we pointed out that Bayes by default has to aggregate aleatoric uncertainty by a linear mixture. That is, if the only unknown state is x (which slit is open), both the prior and posterior predictive distributions have to be some linear mixture of p(y|x=1) and p(y|x=0). But at a very high level, despite its mathematical simplicity, the linear mixture is only one of many operators for combining two or more distributions; superposition is another.
          In that sense, the linear mixture is intrinsic to Bayes’ rule to the same extent that superposition is invariant under Schrodinger’s equation. Again, this is not necessarily a limitation, as we can always model other operators to better reflect reality.
          Consider a simpler example: y ~ N(theta, sigma). Bayes’ rule gives p(theta, sigma|y) explicitly, and the posterior predictive distribution of y has to be a Student-t (under the usual noninformative prior). But that is not the largest class of distributions this model can describe: if p(theta, sigma|y) is allowed to be any distribution (given oracularly), then the posterior distribution of y can literally fit any continuous distribution. So, after all, there is some limitation in restricting to the linear mixture given by Bayes’ rule.
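          A small numpy/scipy sketch of that example, using the standard noninformative-prior results for the normal model (the data and seed are placeholders of mine): the posterior predictive is literally the linear mixture of N(theta, sigma^2) over posterior draws, and it matches the analytic Student-t.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.5, size=20)            # some observed data (placeholder)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

# Posterior draws under the noninformative prior p(theta, sigma^2) ~ 1/sigma^2:
# sigma^2 | y ~ Inv-chi^2(n-1, s^2),  theta | sigma^2, y ~ N(ybar, sigma^2 / n).
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=100_000)
theta = rng.normal(ybar, np.sqrt(sigma2 / n))

# The posterior predictive is the *linear mixture* of N(theta, sigma^2) over the
# posterior draws ...
y_rep = rng.normal(theta, np.sqrt(sigma2))

# ... and it matches the analytic Student-t predictive, t_{n-1}(ybar, s^2 (1 + 1/n)).
scale = np.sqrt(s2 * (1 + 1 / n))
grid = np.linspace(-4, 8, 7)
print(np.column_stack([
    [np.mean(y_rep <= g) for g in grid],                 # Monte Carlo mixture CDF
    stats.t.cdf(grid, df=n - 1, loc=ybar, scale=scale),  # analytic t CDF
]))
```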

  3. Fascinating article. However, I am not sure that I understood your Cantor’s corner problem. There are a countably infinite number of decision procedures and an uncountably infinite number of decision problems. We can ask more questions than we can answer. I don’t understand why that is a problem with Bayesian inference as opposed to all inference. But, I think that I am missing something.

    Also, do you not regard the “Old Evidence” problem as a major problem for Bayesian inference? The standard example is that Einstein’s GTR is “confirmed” by finding that his theory is compatible with Mercury’s orbit. But Mercury’s weird orbit was already known when GTR came on the scene. Why should it be used to update our beliefs at all? I have always taken this to be one of the most serious problems with Bayesian inference, but you don’t mention it. Do you have a response to the Old Evidence problem?

    • Steve:

      1. Yes, Cantor’s corner arises for all statistical methods. It’s a particular concern for Bayes, because Bayesian inference has coherence as one of its selling points.

      2. I’m not so concerned with the “old evidence problem” because I think of prior and likelihood just as different pieces of evidence. I am not so concerned about the time order. And I don’t think of Bayesian inference as updating of beliefs; I see it as deductive reasoning given a model. For more on these issues, see this article with Shalizi.

      • Thanks. That was quick. Your answer to 2 makes sense to me. (I’ll have to think about whether I agree, but it makes sense.) I still have to understand what you mean by “coherence as one of its selling points”, but at least I understand that you are not making a claim about uncountable infinities being a unique problem for Bayes. Again, great article. N.B. Stop attributing to Lakatos a position that Quine and Duhem established decades prior. My pet peeve. Ignore if annoying.

  4. Section 5: “This last factor… is called the marginal likelihood of M”. I think the marginal likelihood should be p(y|M) rather than p(M|y)?

    Section 4 (subjective priors): Have there been attempts to condition on the modeller’s understanding of their own prior? One of the things I’ve noticed happening goes like this:

    1. You have an idea in your head of your prior knowledge of some model parameter.
    2. You translate that knowledge into a quantitative prior for the parameter that you think fits your prior knowledge.
    3. You fit the model, and your posterior looks strange.
    4. You find that your quantitative prior caused this strange behaviour, because it implies things that contradict your prior knowledge now that you’re aware of them.

    Reading through e.g. Michael Betancourt’s notebooks on principled modelling, one of the main things he does is to introduce an extra step:

    2a. Generate some fake data, and check the resulting posterior against what you expect the posterior to look like.

    This introduces a check on mismatches between prior knowledge and quantitative prior, which would otherwise be confounded with the effect of the real data. It is the statistical equivalent of section 4 of his Falling notebook (https://betanalpha.github.io/assets/case_studies/falling.html), where the sensors are checked for their detection sensitivity in a controlled experiment without falling bodies.

    I’m not sure how you’d rigorously try to condition on this, but it seems like something that is rarely even discussed.
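    As a rough illustration of step 2a (not Betancourt’s actual workflow, just a toy conjugate normal-mean model with invented numbers): pick a parameter value consistent with your actual prior knowledge, simulate fake data, compute the posterior, and ask whether it looks the way you expected.

    import numpy as np

    rng = np.random.default_rng(1)

    # Step 2: the quantitative prior written down for theta is N(mu0, tau0^2).
    mu0, tau0 = 0.0, 10.0
    sigma = 1.0                     # known observation noise: y_i ~ N(theta, sigma^2)

    # Step 2a: simulate fake data from a theta that the prior knowledge considers plausible.
    theta_sim = 2.5
    y_fake = rng.normal(theta_sim, sigma, size=20)

    # Conjugate posterior for a normal mean with known variance.
    n = len(y_fake)
    post_prec = 1 / tau0**2 + n / sigma**2
    post_mean = (mu0 / tau0**2 + y_fake.sum() / sigma**2) / post_prec
    post_sd = post_prec**-0.5

    # Check: does this look the way you expected the posterior to look, before touching real data?
    print(f"posterior: N({post_mean:.2f}, {post_sd:.2f}^2), simulated truth: {theta_sim}")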

  5. I’m curious if you think there is value (in terms of clarity of discussion) in distinguishing between some of the terms that appear in the quote: “Bayesian data analysis”, “Bayesian inference”, “Bayesian statistics”, “Bayesian logic”, “Bayesian practice”, and “Bayesian workflow.”

    • I think that giving definitions of each of these phrases, and pointing out the differences and similarities between them, would be helpful in the discussion. It’s always a good idea to define your terms, and especially so when speaking to as diverse an audience as this one.

    • Good point. For example:

      Problem (1) (quantum) is really a problem of misunderstanding how QM works. There is no problem for Bayesian analysis; it’s just a problem of regular old scientific model choice. It’s also a problem that we don’t really know how QM works: we know how to do the calculations, but not what the underlying thing is that we’re calculating about.

      Problem (2) is a problem of Bayesian Practice or something… People choose flat priors because they think this is a good thing, keeping them more “objective” or some such thing. When in fact a flat prior is just a prior that says you think it’s overwhelmingly likely that the value of interest is infinite (or extremely large at least). That’s the opposite of objective.

      Problem (3) (subjective priors are incoherent): I’m not even quite sure what this means. Maybe it means that often people don’t encode into informative priors information that actually means what they think it means. This is basically “be careful and check that what you think you’ve done makes sense”. It’s like debugging programs… not in general a problem specific to Bayes.

      Problem (4) (Bayes factors fail with weak priors): I think this is just a corollary of the basic issue that we need to make our best effort to use as much prior information as we actually have, and to include in our analyses all the relevant models that we can think of. Often we don’t really have a discrete set of models, and this is more or less what BFs are for.

      Problem (5) I guess Andrew is saying that we have to step outside Bayes in order to check whether the thing we fit with Bayes makes sense in a scientific sense… I don’t see this as a “problem for bayesian inference” but rather just a general fact about science and formal systems (like Godel’s results).

      So, where do things go wrong? Mainly when people do half-assed Bayes and then claim with religious fervor that they’ve followed the incantations and their paper must be published by the rules set out in the guild handbook.

      If you’re doing science, and you’re NOT using Bayes at all, you are probably doing yourself a serious disservice. If you’re doing half-assed flat-prior sort-of-Frequentist Bayes you’re not getting it… if you’re not checking your fitted models against common sense and background knowledge then you’re not doing a good job. If you leave out possibilities from your analysis your analysis will suffer… None of these are inherent problems with core Bayesian ideas.

      Here on the other hand are some problems with core Bayesian ideas:

      1) Bayes is about the probability that a logical statement is true… This could in theory be the case. But in practice, with real-world analyses, it can’t be the case. For example, fluid viscosity changes rather dramatically with temperature. The temperature of a fluid is not a uniform property; it varies over the spatial extent of the fluid, and it changes as energy is dissipated … due to viscosity! And yet, you can do an analysis of, say, pouring oil through a viscometer and come up with a single viscosity tightly bounded in your posterior distribution. There is no “one true viscosity,” but for all intents and purposes we can act like there is without much problem. So while Bayes *can* be about the probability that a logical statement is true, in most science it isn’t really about that; it’s more like “which values are relatively compatible?”

      2) It’s hard to actually encode your knowledge into a probability distribution for a prior. This is just a fact about the world because the world is complex and we know lots of different kinds of things. This suggests that we need to check our priors in many ways to show that they make sense. Doing this is hard, and takes a bunch of learning. Also see (1) in that a lot of people don’t even know what Bayes is… it’s some math, they don’t have a coherent interpretation of what the math means. So they can’t do a good job of (2).

      3) Bayesian computation can be hard… Sure, we have Stan, but there are lots of scientific models that don’t fit into the smooth, analytically tractable sort that Stan can fit efficiently. For example, think of agent-based models of interactions in ecosystems.

      • I agree that your (1) is a big hole in Bayesian statistics, at least from a philosophical point of view. The standard Bayesian way of interpreting probability does not gel well with practice (even philosophical practice!), as Andrew has also pointed out before. I argue that the solution is that the primary elements of Bayesian analysis — i.e. the prior and likelihood — should be interpreted in a pragmatic way. E.g., p(H) does not represent the degree of plausibility that H is true, but rather the probability that H is best according to one or more measures of approximation (that are often not completely specified). The details can be spelled out in a few different ways (as I do here, for example: http://philsci-archive.pitt.edu/15215/1/New%20semantics%20for%20Bayesianism%20.pdf )

      • “..When in fact a flat prior is just a prior that says you think it’s overwhelmingly likely that the value of interest is infinite (or extremely large at least). That’s the opposite of objective.”

        That is certainly one interpretation. Another is ‘let the data speak as much as possible’.

        Justin

        • > ‘let the data speak as much as possible’
          That presumes any value of interest is as likely as any other and guess what – lots more huge ones than reasonable for a universe any cognoscente being could survive in.

        • Yes, that’s true. But on the other hand, a proper and concentrated prior does not simply express the idea that the value of interest is likely not huge; it also has to make a precise guess about where, among the non-huge possible values, the actual value is. At least the posterior calculated from the improper prior will be peaked on the MLE, which seems to be a good default guess most of the time. But if the proper prior is concentrated on a location that’s very far from the actual value, that will throw off the posterior estimate quite a bit. So there is a “risk” involved in using a proper prior (especially a very concentrated one) that is not there when one uses a flat improper prior, and it’s in that sense that a flat prior “lets the data speak” more than a proper prior does (though, as Daniel Lakeland points out, there’s another sense in which it doesn’t). So there are trade-offs.
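          A minimal numerical sketch of that trade-off, using a conjugate normal-mean model with invented numbers: with a flat prior the posterior mean is just the sample mean (the MLE), a very diffuse proper prior barely moves it, and a tight prior centered far from the truth drags it away.

          import numpy as np

          rng = np.random.default_rng(0)
          theta_true, sigma, n = 3.0, 1.0, 10          # all invented
          ybar = rng.normal(theta_true, sigma, size=n).mean()

          def posterior_mean(mu0, tau0):
              # Posterior mean for a normal mean with known sigma and prior N(mu0, tau0^2):
              # a precision-weighted average of the sample mean and the prior mean.
              w = (n / sigma**2) / (n / sigma**2 + 1 / tau0**2)
              return w * ybar + (1 - w) * mu0

          print("flat prior / MLE:                    ", round(ybar, 3))
          print("diffuse proper prior N(0, 100^2):    ", round(posterior_mean(0.0, 100.0), 3))
          print("mislocated tight prior N(-20, 0.5^2):", round(posterior_mean(-20.0, 0.5), 3))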

        • Sure, but it doesn’t have to be mysterious. There is no flat prior on the real line… And floating point arithmetic makes it proper for you.

          normal(0,1e9)

          is a LOT more concentrated than uniform(-1.79e308,1.79e308) which is the floating point version of a flat prior.

          How many applied problems are there where you can’t tell what a number is to within a couple billion?
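          A quick back-of-the-envelope check of that comparison, taking “a couple billion” to mean 2e9:

          import math

          BIG = 1.79e308   # roughly the largest double, i.e. the bounds of the "flat" prior above
          a = 2e9          # "within a couple billion" of zero
          sd = 1e9

          # uniform(-BIG, BIG): P(|x| < a) = a / BIG (written this way so 2*BIG never overflows)
          p_flat = a / BIG
          # normal(0, sd): P(|x| < a) = erf(a / (sd * sqrt(2)))
          p_normal = math.erf(a / (sd * math.sqrt(2.0)))

          print("uniform(-1.79e308, 1.79e308): P(|x| < 2e9) =", p_flat)     # about 1e-299
          print("normal(0, 1e9):               P(|x| < 2e9) =", p_normal)   # about 0.95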

        • I agree with Daniel. Also, uniform flat priors when dealing with variance components in hierarchical models can sometimes be a nightmare for computation. So, even if you don’t have a ‘belief’ about a parameter (although as Daniel notes, we usually know something!), it doesn’t suggest you should use uniform distributions. Sometimes ‘letting the data speak’ through uniform priors will lead to inaccurate estimates, and indeed often less conservative estimates than through the use of priors that induce some regularization.

        • Yea, Justin is incorrect about the flat prior interpretation. The trouble is that in Bayes we are integrating over the prior measure – “averaging over the prior”. In non-Bayesian methods like maximum likelihood, we are selecting a distinguished point in the set, and then defining a compatibility interval with other methods that rely on information local to just that point. Integrating over values arbitrarily far from zero can be problematic in a limited data setting and is far from “uninformative”.
          Flat priors = uninformative is one of those zombie ideas that it would be nice to see totally slain one of these days :)

        • “> ‘let the data speak as much as possible’
          That presumes any value of interest is as likely as any other and guess what – lots more huge ones than reasonable for a universe any cognoscente being could survive in.”

          Yes, I hear that a lot. Like ‘well, we know X is between 1 and 20 and cannot be 100,000 like you’re implying’. But again, if it can be 100,000 (or even 21?), the data will tell me that, rather than some possibly off-target subjective or other prior generating artificially smaller CIs, IMO.

          Justin

        • Justin, this is just incorrect, or at least may only hold in very specific instances. I would be curious for you to speak to some specific, real examples of the phenomena you’re mentioning. Do some reading on hierarchical models, or really anything involving computation in high dimensions. We simply can’t ‘let the data speak’ if the computations are intractable without the use of prior distributions and other modelling assumptions (e.g., on high dimensional variance components, where estimates can be disastrous without imposing regularization). Of course, if you think models should only be run when we can 100% ‘let the data speak’ without any modelling decisions, limits, regularization, etc., then, I guess we need to do away with large swaths of science.

        • Yes, but you are missing the point. The prior specification can be looked at in many ways. Two of the most useful are: 1) it is a formal way to encode external information not present in the specific dataset being analyzed, and 2) it is a regularization device, and regularized estimates are almost always preferable to unregularized estimates. So, no, as a general principle, it is NOT the best idea to “let the data talk” or assume “the data will tell me…”.

      • “If you’re doing science, and you’re NOT using Bayes at all, you are probably doing yourself a serious disservice.”

        There are Nobel prize winners in science using frequentist methods (as well as Bayesian), for example, so I find the ‘frequentism is bad for science’ line very doubtful and cliché at this point.

        -http://www.statisticool.com/nobelprize.htm
        -http://www.statisticool.com/quantumcomputing.htm
        -“The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century”, by Salsburg
        -“Creating Modern Probability: Its Mathematics, Physics and Philosophy in Historical Perspective”, by von Plato

        Justin

    • Megan:

      Bayesian inference = going from likelihood & prior to posterior.

      Bayesian data analysis = Bayesian model building, inference, and model checking.

      Bayesian statistics = all the things that go into Bayesian data analysis.

      Bayesian logic = Bayesian inference + decision theory.

      Bayesian practice = what Bayesians (especially me) do.

      Bayesian workflow = Bayesian data analysis for a sequence of models.

  6. BTW, you say in that paper

    If classical probability theory needs to be generalized to apply to quantum mechanics, then it makes us wonder if it should be generalized for applications in political science, economics, psychometrics, astronomy, and so forth. […]

    But there are already applications outside of physics, e.g. in cognition.

    • For example, yesterday I came across “A Preliminary Experimental Verification of Violation of Bell Inequality in a Quantum Model of Jung Theory of Personality Formulated with Clifford Algebra” published in the “Journal of Consciousness Exploration & Research” when I was looking for Bell’s theorem.

    • Following one of the “See also” links in that wikipedia entry one learns that:

      “NeuroQuantology is a monthly peer-reviewed interdisciplinary scientific journal that covers the intersection of neuroscience and quantum mechanics. It was established in April 2003 and its subject matter almost immediately dismissed in The Lancet Neurology as “wild invention” and “claptrap”.[1] While the journal had a 2017 impact factor of 0.453, ranking it 253rd out of 261 journals in the category “Neuroscience” as reported in the 2018 edition of Journal Citation Reports,[2] Clarivate Analytics delisted the journal in its 2019 edition.[3]

      “The journal describes itself as focusing primarily on original reports of experimental and theoretical research. It also publishes literature reviews, methodological articles, empirical findings, book reviews, news, comments, letters to the editor, and abstracts.

      “In the Norwegian Scientific Index, NeuroQuantology has been listed as “Level 0” since 2008,[4] which means that it is not considered scientific and publications in the journal therefore do not fulfill the necessary criteria in order to count for public research funding.

      “Neither the editorial board nor the advisory board contain scientists working in the fields of quantum physics or neurology.[5]”

      • Yes, I noticed that there are a lot of references to the cranky* stuff – QM rather than QPT applied to cognition – in that wikipedia entry. I don’t know why they’re there: the crucial distinction is made at the beginning and it should’ve been left at that.

        * In fact QPT has been applied in cranky / crackpot ways too, e.g. to ‘explain’ how homeopathy ‘works’.

  7. This is certainly a strange juxtaposition. We have a blog that often points out where people do not admit their errors. Now we have a paper written by one of the authors of this blog. In it, a claim is made that has been made before and which people have pointed out is wrong.

    The references in the paper do not include any of the items that one should read to understand the physics. The first physics error in the paper is assuming that the detectors in experiment 4 do not affect the trajectory of the photons. While this is a plausible assumption, theory and experiment say that it is false.

    “in quantum mechanics there is no joint distribution or hidden-variable interpretation”: Whether this is true depends on what it means. It is true that the world is not local. It is not true that particles don’t have positions. The formulas say that the trajectory depends on things that are not just along the trajectory, e.g., detectors at slits that the particle does not go through. Experiment confirms this.

    It is true that some of the criticisms of the claim have themselves been incorrect. But, this does not mean that all criticisms are incorrect.

    • Quotes are from the paper:

      “Probability theory isn’t true”: I’m not sure how this could be. The formula for conditional probability is a theorem. It is possible that probability is being applied to the real world incorrectly.

      “This is all standard physics”: Perhaps. But, is this “standard” physics being explained correctly?

      • David:

        What statisticians call “probability theory” is what physicists call “Boltzmann statistics” or “hidden-variable models.” These models are not in general true, in the sense that they do not apply in quantum mechanics. In mathematics we say that a conjecture is false if there are any counterexamples to it. In that sense, probability theory is false. That said, probability theory is very useful! There are some settings such as coin flipping and die rolling where probability theory is evidently true, and other settings such as Bose-Einstein statistics, Fermi-Dirac statistics, and the two-slit experiment where probability theory, as it would be intuitively applied, is false. An open question is the applicability of intuitive probability theory in other settings. As we discuss in our paper, the failings of probability theory can be resolved by changing or expanding the probability model in various ways, but it would not be apparent ahead of time that such extensions would be necessary.

        You write, “The first physics error in the paper is assuming that the detectors in experiment 4 do not affect the trajectory of the photons.” What we actually write when discussing experiment 4 is, “putting detectors at the slits changes the distribution of the hits on the screen.” So, yes, we do say that the detectors affect the trajectory. In any case, this discussion is useful, because if such a basic point was not made clear in our paper, to the extent that you could read a statement as saying its opposite, then we have not written it clearly enough. So I appreciate the feedback.
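        To make the Bose-Einstein point concrete, here is a toy counting example (the standard two-particles-in-two-states textbook exercise, not anything taken from the paper): the intuitive “independent, distinguishable particles” model and Bose-Einstein counting assign different probabilities to the very same occupancy outcomes.

        from itertools import product

        # Two particles, each ending up in one of two states ("boxes").

        # Intuitive (Maxwell-Boltzmann) model: distinguishable, independent particles,
        # so each of the 4 labelled assignments has probability 1/4.
        boltzmann = {}
        for assignment in product([1, 2], repeat=2):
            occ = (assignment.count(1), assignment.count(2))
            boltzmann[occ] = boltzmann.get(occ, 0) + 0.25

        # Bose-Einstein counting: only the occupancy numbers are physical, and each
        # distinct occupancy pattern is equally likely.
        bose_einstein = {occ: 1 / 3 for occ in [(2, 0), (1, 1), (0, 2)]}

        print("Boltzmann:    ", boltzmann)       # {(2, 0): 0.25, (1, 1): 0.5, (0, 2): 0.25}
        print("Bose-Einstein:", bose_einstein)   # each pattern gets probability 1/3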

        • I feel like your only error is the use of a physics example at all. It’s drawn out a lot of people who are less interested in a point about the dangers of building a probability model by intuition and more interested in demonstrating that they once took a course in quantum mechanics.

        • To some extent I agree with somebody. Personally, what set me off was «the usual rules of conditional probability fail in the quantum realm», and now I feel the same about Andrew saying «What statisticians call “probability theory” is what physicists call “Boltzmann statistics” or “hidden-variable models.” These models are not in general true, in the sense that they do not apply in quantum mechanics. In mathematics we say that a conjecture is false if there are any counterexamples to it. In that sense, probability theory is false.»

          It’s the starkness of saying “probability is false because quantum” that feels wrong, because there are various interpretations around, none of which is really preferred by the evidence at the moment, so anyone with a different opinion feels excluded.

          Then, at the end of the article, one thinks “whatever, they acknowledge all this, it was just a statement for effect; the point is a strong example of a misspecified model, and of course probability is not a proven, definitive form of knowledge”.

          It would seem appropriate to state something like “the usual rules of probability fail in the quantum realm if applied naively”, or “quantum mechanics can be more concisely described by quantum probability, so it may be the right way to do probability”.

          But, on the meta-level, what is the right amount of starkness, really? Maybe it is good to make strong statements, as long as they do not make people do actually bad things. If someone comes to me and says “wrong because quantum, because Gelman said so,” I will send a quantum anathema to all of Columbia University! Just kidding, I lack the necessary papers from the church.

        • One could turn the argument on its head and say that the paper seems written by people who are less interested in making a point about the dangers of building a probability model by intuition and more interested in demonstrating that they once took a course in quantum mechanics. :-)

          It’s not very clear to me what the point is. The two main difficulties that quantum physics creates for statistical inference according to the paper seem to be that a) quantum superposition is not Bayesian probabilistic uncertainty and b) the authors have no idea whether nodes, wave behavior, entanglement, and other quantum phenomena could manifest in observable ways in applied statistics, motivating models that go beyond classical Boltzmann probabilities. Both points are true, I guess. Maybe there would be fewer objections if the paper were titled “musings on” rather than “holes in” Bayesian statistics.

          I find the discussion about physics confusing. For example, I would say that |1>, |2> and |1>+|2> are states, not outcomes; the inequality {x=1 or x=2} =/= {x=|1>+|2>} is completely inconsistent. Another thing I don’t understand: “the awkwardness that x can be measured for some conditions of the number of open slits but not others.” I would say that x can be measured whenever we want; we just have to make a measurement.

          I also find the discussion about physics distracting, if the true objective is to discuss statistical issues. Maybe others find the discussion useful, but it made me think about the following quote: “The second law, when formulated in terms of entropy, gives powerful insights into a wide variety of problems. It seems a shame to take such a beautiful and exact idea and blur its meaning by indiscriminately applying it to all sorts of areas that have nothing to do with equilibrium thermodynamics. Such a procedure might disorder our thoughts about other disciplines that are already difficult enough.”

        • Carlos:

          As a person who uses probability models all the time, I find it useful to know that probability is actually false, in the sense described in the paper, that there are problems for which the usual application of conditional probability will not work.

        • > “hidden-variable models.” These models are not in general true, in the sense that they do not apply in quantum mechanics.

          Andrew, this is quite simply false. The big problem with bringing QM into all of this is that *there is no single thing called QM*. Specifically, while lots of people can calculate stuff and get the right answer in example experimental situations, there is no universal agreement on what it is that is being calculated or what the calculations mean about stuff in the world. There are in fact many models of what QM means… the Copenhagen interpretation is not a single interpretation… but its various proponents share some broad similarities. Bohmian interpretations are rejected not because they give the wrong answer, but because they give the right answer for reasons physicists prefer to reject (ie. it’s a political issue, they don’t like the nonlocality, and apparent faster-than-light effects, and more to the point, in the early years they couldn’t let themselves be associated with Bohm because of Joseph McCarthy’s witch hunt). Many Worlds just seems goofy to a lot of people. Decoherence seems like it’s just a way to say “QM is an asymptotic theory that only applies for “short” times, until things get mixed up and entangled with the environment”. Theories involving “spontaneous collapse of the wavefunction” require us to believe things that seem probably false, like subtle failures of conservation of energy, etc.

          There *ARE* many many examples of hidden variable theories in QM. There is even a whole book surveying them, which someone I met from this blog recently bought and read: https://www.amazon.com/Survey-Hidden-Variables-Theories-International-Monographs-ebook/dp/B01DRXPOAM

          The only requirement for a hidden variable theory in QM is that it be nonlocal. This is what Bell’s theorem shows, that there are no local realist theories (ie. theories where the particle has a definite position, and goes through one of the slits but at the same time is not affected by nonlocal facts, like the second slit is open).

          The review from the other commenter said the hidden variables book was a bit funny because it kept complaining about how people keep talking about hidden variable theories being impossible, when in fact there are quite a few of them, and the “proof of impossibility” by Von Neumann was debunked ages ago…

          So, the QM stuff is a huge distraction, if it’s not essential to your larger argument, you should just drop it. If it is essential to your larger argument, you should read enough about some of the other theories of QM. I recently had recommended to me by Chris Wilson from this blog the book by Lee Smolin:

          https://www.amazon.com/Einsteins-Unfinished-Revolution-Search-Quantum-ebook/dp/B07FLK72XC/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=&sr=

          I got it electronically from my library. It’s quite good, and it goes into *all* of this kind of discussion, the different interpretation and what’s generally wrong with them… In particular I found that the main objections he brings up to Bohmian theory didn’t seem even the slightest bit objectionable. In fact one of the objections is that the wave functions go on forever spreading out and affecting all of the universe. This seems to me like a benefit of the theory rather than an objection. It’s obvious that there is only one single universe and not a split between “quantum stuff” and “classical stuff” which is built into many other interpretations.

          The real challenge for Bohm is to understand the role of relativity with QM, but it turns out that this is a challenge for all of QM, and even relativistic theories like Dirac’s equation don’t fully resolve the issue, particularly in-so-far as there is no meshing of anything QM with anything gravitational. Smolin himself works on “loop quantum gravity” but it’s not a full theory.

          If you take Bohm’s theory, then quite simply the world is like an MCMC chain… Little particles go zooming around according to rules that involve complex, high-dimensional calculations over configuration space. As they do this, they rapidly converge into an ensemble that has statistical properties equivalent to Born’s rule. Basically they converge to be distributed asymptotically as Psi^2. Thus Born’s axiom is not an axiom at all, and normal Bayesian probability applies to the positions and momenta of particles. It just happens that, because of the wave the particles interact with, the tiniest errors in our knowledge of position or momentum get blown up through time into the full distribution of possibilities that we see in the outcome (the flashes of light on the screen in a two-slit experiment, for example). This is no weirder than the fact that starting an MCMC chain at 1 vs. at 1.00000001 would, after many iterations, lead to the “particles” being in totally different locations, distributed according to whatever density you’re using in your sampler.

          The weird part of Bohm’s theory is that it requires that a kind of “quantum information” essentially travels faster than light. However, it’s still impossible to send regular information over quantum propagation channels, because it would require controlling the position of particles to well below the resolution that is actually possible.

          Basically saying that “normal probability theory doesn’t apply” to QM is perpetuation of a political position of Bohr’s. Bohr and co browbeat the QM community for decades to try to eliminate all ideological opponents in a typically 20’th century cold war.

          As a political scientist you might find that history interesting, and how much of what’s “weird” about QM is really about political power struggles within academia.

        • I really liked the Feynman biography by James Gleick, Genius, for its insight into the sociology of early quantum mechanics. Of course, there’s also Linguistics Wars; that one I know firsthand from working in its post-apocalyptic aftermath.

        • I think a big part of the reason QM seems so “weird” is that in the past, if you came in with a reasonable theory that doesn’t seem too weird (ie. non-realist, there are no real particles or they have no real location or whatever), like Bohm who had a theory of actual particles moving around, you would be stepping on the toes of all these powerful people who basically asserted the inherent weirdness of QM. Fermi, Bohr, Heisenberg and soforth.

          Another aspect is that as a physicist it would be unusual for you to have encountered ideas like the ones explored by Kevin S Van Horn in his summary of Cox’s theorem: http://ksvanhorn.com/bayes/Papers/rcox.pdf

          That is, you’d be unlikely to have spent much time discussing or understanding basic ideas in probability theory outside what might be called “naive” probability theory (you know, dice, and flipping coins and such). Not impossible, but just relatively uncommon, particularly for say a 3rd year undergrad.

          So, when your professor says “in QM we do this calculation and then Psi^2 is the probability to find the particle at the given point” you’d just accept that without any problem, and you’d have the idea that “probability = how often it happens in repetition” without questioning it. It doesn’t seem problematic. And when you ask about how that happens you’d just hear “it’s an axiom of QM called the Born rule, and it’s been experimentally verified to 18 decimal places” or some such thing. These days undergrad books just give the QM theory as axioms and teach it like it’s unquestionable due to the extraordinary precision of predictions. Never mind that you can question what a calculation means without questioning whether you got the right numbers.

          Of course, this just pushes things under the rug… So there’s usually some discussion, you hear about “collapse of the wave function” or some such thing, it all sounds mysterious, you assume that some high end people in the field probably understand it pretty well, and you spend your time learning how to grind through the formalisms to get reasonable calculations, which is *hard* and requires a crap load of learning. Out the other end of a masters or PhD program and you’re able to go to work for Intel designing semiconductor structures using first principles QM calculations, and you’re happy.

          You will *not* spend your time thinking for a long time “under what circumstances could we define real actual particles with positions and momenta” because if you go down that branch you *will* fail your physics tests… and if you do that you’ll wind up in philosophy or math, where you probably won’t know enough physics to make real progress unless you’re quite exceptional.

          So basically progress in the foundations of QM is left to a very small group of physicists who make it through grad school, get tenure, and then return to earlier questions that bothered them, or to a few high end philosophers of science or mathematicians like Edward Nelson who explored what he called “stochastic mechanics” but eventually gave up on it (though apparently possibly for the wrong reasons, he seems to have convinced himself that you can’t get entanglement from stochastic mechanics, but this isn’t exactly true I think)

          anyway, end of story: we’re stuck in QM in large part because it’s so hard to do useful QM that you have to spend all your time learning how to do it and can’t spend any time thinking about what the heck it means; also, doing that would step on toes, so in the past there have been lots of social reasons not to.

          Now that it’s several decades past when the Fermis and the Feynmans and soforth have died, and their direct students are dead or retired or about to retire… we’re seeing more active work on the foundations of QM because it doesn’t step directly on the toes of “great men”. The end result is that everyone seems to agree that no-one has the slightest idea what QM actually means, it’s a theory that predicts outcomes of experiments, but it doesn’t have a unique underlying physical model. Bohr basically insisted that it *couldn’t* have an underlying model and we *shouldn’t try*… which is why it’s taken so long to get people to think about the underlying model.

          I think the best progress on QM will come from someone who takes a realist approach (particles exist and have positions and momenta), uses Bayesian probability to describe the information we have about the positions and momenta, accepts nonlocality as a fact of life, and tries to make progress directly on the consequences for describing QM in a relativistic context including gravity.

          unfortunately it’ll require a very particular set of skills and background, and a certain kind of aesthetic sense that isn’t clobbered by an established view so it may take a long time.

        • To me, different interpretations of QM are not even directly relevant. Andrew argued earlier by in his blog that “Probability is a mathematical concept encoded in the Kolmogorov axioms. That’s it. No need to argue over a canonical interpretation. It’s just math!”

        • Addition is just a mathematical concept encoded in the Peano axioms, but 1 meter + 1 meter means a different thing than 1 kg + 1 kg

          Knowing what you mean by “the probability of X is” can only be figured out if you know what was meant by the person who did the calculation.

        • Man, that summary of Cox’s theorem is possibly the single most useful thing I gained by following this comment thread!

        • If the point of interpretation is to create an intuitive picture of microscopic reality, it is pointless, merely a naive attempt to salvage current beliefs. But then, this is especially true of Copenhagen, which assumes convention, then salvages this by denying science aims to describe reality, at all. Radical epistemological skepticism is reactionary, like most physicists and other academics, so far as I can tell. (No, being urbane, tolerant, generous etc. may be the idealized self-image of liberals. But social manners are not politics. Liberalism is not leftism, either.) If there is a generally agreed upon basic understanding of how QM/QFT lead to spacetime, or a DeSitter universe, I’m not aware of it. The greatest trend now is study of a universe we don’t live in, an anti-DeSitter universe.

          The point of interpretation is to help us sketch out how, for instance, a quantum object such as the primeval universe, connects to our reality. Much of the critique of Copenhagen no more addresses these issues than Copenhagen does. Another way of saying this is to point out that the problem of how to reconcile QM/QFT with General Relativity has been “resolved” by the determination that, somehow, GR is wrong. The real issue then should be interpreting General Relativity, how to revise its foundations. Even people who critique some aspect of fundamental physics, agree that GR is wrong. The real question, then, is: “How is GR wrong?” Thus far, the answer is, it contradicts QM/QFT.

          I do not see how statistics is relevant to these issues.

        • Steven:

          One of the key problems for me in QM is that there is usually no acknowledgement of the difference in meaning between probability: a measure of information about the state of some particular thing, and probability: the frequency with which things happen in repetition.

          Let me give you an example. We both close our eyes and you throw a coin in the air. We hear it land on the ground. How can we describe our state of information about whether it is heads? Namely as p=0.5 heads and p=0.5 tails. Now, we open our eyes. Magically “the wave function collapses” and suddenly in seeing that it is heads, we now have p=1.0 heads and p=0.0 tails. If we think of this probability as a property *of the coin* then it seems that there is a special role for the observer in modifying or making this property come into being… collapsing the wave function so to speak happens “because of measurement”.

          If in general we mix the ideas of frequency and probability willy nilly, we can wind up failing to understand why mysterious things seem to happen, like this. I just ask that we keep a *sharp bright line* between frequency and probability, and form a QM theory that keeps that bright line always in focus.

          There is no mystery in the heads/tails calculation provided that you recognize that you are measuring your information about the state of the coin, not the actual state of the coin.

          Similar issues arise in QM. When we perform a double-slit experiment and see a flash at a particular spot, at the moment we see this flash, our uncertainty about the location of the “particle” disappears. It is *right there* with probability 1. On the other hand, if there is an actual quantum wave moving through space, or through some high dimensional configuration space, this collapsing of *our* probability assignment for *this particular* particle has no logical consequences for the quantum wave’s shape or extent.

          My assertion is that if we keep these things in mind, we will make better progress on QM.

          This is what “statistics” (or rather probability) have to do with QM.

        • Daniel:

          After we get the reviews back from the journal, I will try to rewrite that section in light of the comments on this post. The point of that example is that a model that seems natural, based on intuitive ideas of physics, does not work. Bayesian inference really does fail in that if you estimate p(x) from one experiment, p(y|x) from two other experiments, and p(y) from a fourth experiment, these distributions are not consistent with each other. Now, sure, you can (correctly) point out that the laws of quantum physics tell us that we cannot estimate these probabilities from four separate experiments, and one can put this in a Bayesian framework by requiring that we condition on the measurement. That’s all fine—but when we’re not doing quantum mechanics, we don’t do that conditioning; we routinely estimate different aspects of a joint distribution from different experiments; indeed, that’s part of the whole likelihood-and-prior thing. So I think there’s definitely something important about this example. I want to convey this without getting tangled in disputes that are occurring within physics.

        • What I agree with is the following: if your model is that y is independent and identically distributed across N experiments, and in fact it’s independent but NOT identically distributed, then you will find that Bayes gives the wrong answer. It’s just so routine that, say, a mouse experiment involving surgeries to see whether skin regrows after a certain kind of injury doesn’t have outcomes that depend on whether you left the window cracked open on your car when you parked it in the lot below the lab building.

          QM is just a scenario where nonlocal state of a system such as whether a slit is open in the “parking lot” *does* directly influence the outcomes of the experiments… But it’s not the only one. Feynman’s example from the cargo cult speech of the guy who was forced to put his rat maze in a bed of sand to keep the rats from “hearing” the sound that their footsteps made when they got near the spot with the reward is a good example. Seemingly irrelevant things like whether the maze was on a table top or a bed of sand actually change the distribution of the outcomes.

        • Bohmian interpretations are rejected not because they give the wrong answer, but because they give the right answer for reasons physicists prefer to reject (ie. it’s a political issue, they don’t like the nonlocality, and apparent faster-than-light effects […])

          They’re (informed) metaphysical preferences, not political. In the light of developments in quantum foundations (including QFT) it’s hard to see why anyone would be spooked into accepting nonlocality / re-accepting action at a distance.

          a split between “quantum stuff” and “classical stuff” which is built into many other interpretations.

          Many? Either way, if you know of any interpreters who’ve got that split built in to their interpretation, and they’ve justified it by an appeal to Bohr, you can refer them to this.

          Basically saying that “normal probability theory doesn’t apply” to QM is perpetuation of a political position of Bohr’s. Bohr and co browbeat the QM community for decades to try to eliminate all ideological opponents in a typically 20’th century cold war.

          As a political scientist you might find that history interesting, and how much of what’s “weird” about QM is really about political power struggles within academia.

          Perhaps a political scientist could explain why demonisations of Bohr along with (sometimes tacit, possibly unwitting) dismissals of much of modern quantum foundations* seem to have become popular with some proponents of some QM interpretations / alternatives. Recently Sean Carroll has put the boot in too, but on the MWI side (usually it’s the Bohmians).

          * This is by far the worst aspect of this sort of thing of course. Fuchs rightly laments it in his review of Becker:

          From this point of view, the tools and concepts of quantum information are what were needed to make sense of the deeper elements of the Copenhagen interpretation all along. Yet, this is an interpretative route hardly mentioned in Becker’s book, though it now commands a significant portion of quantum foundations research worldwide. […] I suspect it would be difficult for pursuers of hidden-variable theories, spontaneous collapse models, and many-worlds interpretations to exhibit a similar quantity of research activity in their own fields.

        • A little bit of an aside here. The book I recommended to Daniel Lakeland above – Einstein’s Unfinished Revolution by Lee Smolin – is actually quite level and generous in consideration of all the existing interpretations of QM. I actually came away from that book still fairly sympathetic to Bohr, and more confirmed in my broadly instrumentalist take on philosophy of science, despite Smolin’s impassioned and cogent arguments on behalf of “realism”. Although, I really enjoyed the principles and new directions that Lee advocates, and have enjoyed thinking about them in relation to fields of study that are more in my wheelhouse…

        • Smolin gives an example of how QM foundations *is* political in his quote of Leon Rosenfeld, a close Bohr collaborator, responding to Bohm:

          “I certainly shall not enter into any controversy with you or anybody else on the subject of complementarity, for the simple reason that there is not the slightest controversial point about it…

          The difficulty of access to complementarity which you mention is the result of the essentially metaphysical attitude which is inculcated to most people from their very childhood by the dominating influence of religion or idealistic philosophy on education. The remedy for this situation is surely not to avoid the issue but to shed off this metaphysics and learn to look at things dialectically.” (Smolin, pg ~100 electronic version).

          It’s hard to see how this is anything but a “You’re not Marxist enough to understand the true meaning of QM”. From the other end of course came the “Bohm’s too Marxist to be allowed into US Universities” thanks to Joseph McCarthy. So to imagine that politics played no role in theoretical physics between 1920 and say 1990 is just too wide of the mark.

          If you read Griffiths’s book, which is one of the most common undergrad / beginning-grad textbooks on QM, you will find him on page 6 advocating the idea that Bell proved that there are no hidden-variable theories, and asserting that the “orthodox” opinion due to Bohr is the only possibility. He also claims, basically, that the collapse of the wave function is just a fact.

          an actual quote: “for now, suffice it to say that the experiments have decisively confirmed the orthodox interpretation. a particle simply does not have a precise position prior to measurement…”

          (you can use “look inside” on amazon to see these pages https://www.amazon.com/Introduction-Quantum-Mechanics-David-Griffiths/dp/1107189632)

          Is it a political position to simply write textbooks asserting that no other ideas about the world are possible, and that this has been proven by Bell? Even when Bell himself published a whole book making the point that Bell’s inequalities don’t mean that hidden variables are impossible? I think yes. That is, this is a choice, a choice to shun alternative possibilities.

          I don’t know enough about QM to know how to proceed from any of the other theories to develop them further. It’s not within my power. What I do recognize is so long as we publish widely read textbooks claiming any other interpretation is mathematically impossible or completely disproven by experiment… we will not encourage any investigation of this question.

        • This sounds like how I have read that Chomsky operated in linguistics: defining any questions unanswerable by his schema as uninteresting and metaphysical. True, Bob Carpenter?

        • The first thing we have to do to make progress in QM in my opinion is to recognize the difference between the frequency of events in an ensemble, and the probability that a given single event will or has occurred.

          Simply stop using the term “probability” for wavefunction(x,t)^2 and say “frequency in replications of this experiment”. This will help because for example we can then make sense of the following experiment and the calculations needed to answer the question:

          suppose someone fires a photon at a two-slit apparatus in which a screen in front of one of the slits oscillates back and forth chaotically, driven by amplified noise from a radio receiver tuned to static background… so that the second slit is either open or closed, but without a recording of the fact we have no way to know which. Nevertheless, in 50% of repetitions it will be open.

          We set up a far screen at the other side of the apparatus to collect photons and show a flash of light.

          When the light flashes on the far screen we also have a detector that beeps to know when the experiment has finished. A camera captures a photograph that shows us where the flash was once we’ve observed the photograph. There is also a camera that observes the second slit and can tell us whether it was in the open or closed position once we’ve observed that photograph.

          Assuming this background, write down the probability

          p(flash at X | Beep)

          Write down the probability

          p(photon went through the first slit | Beep)

          Write down the probability

          p(second slit was open | flash at X, Beep)

          Write down the probability

          p(second slit was open | flash at X, photo of second slit, Beep)

          Write down the probability

          p(photon went through the first slit | flash at X, photo of second slit, Beep)

          Now, there is exactly 1 replication of this experiment. Let F be the frequency with which a given thing occurred in this single replication:

          Write down F(flash at X | Beep) (hint, it is either 1 or 0 but we don’t know which; we haven’t looked at the photo of the screen yet. Therefore it is not a well-known number, though it is in fact a well-defined number)

          Write down F(flash at X | Beep, Photo of screen) (hint, we know this number exactly once we see the photo of the screen)

          Write down F(second slit was open | apparatus) (hint, it’s either 1 or 0 but we don’t know which)

          Write down F(second slit was open | flash at X, apparatus) (again, it’s either 1 or 0 but we don’t know which)

          Write down F(second slit was open | flash at X, photo of slit) (now we know whether it is 1 or 0)

          Write down F(photon went through first slit | flash at X, photo of slit) (depends on whether the photo shows the second slit was open or closed)

          I think you can see that this wedge we drive between what we know, measured as a probability, and what happened, measured as a frequency, is a critical distinction needed to make progress. Until we make this distinction, QM will continue to pun on the two meanings of probability, just as statistics has punned on the two meanings of “significant”.
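          For what it’s worth, here is a minimal numerical sketch of the probability side of that exercise. The flash-position densities under “second slit open” and “second slit closed” are invented placeholders (a smooth bump and a fringed version of it); the only point is the Bayes step from p(flash at X | slit state) to p(second slit was open | flash at X, Beep):

          import numpy as np

          x = np.linspace(-5, 5, 2001)          # flash position on the far screen

          def normalize(p):
              return p / p.sum()                # discrete normalization on the grid

          # Placeholder densities for the flash position, given the state of the second slit.
          p_closed = normalize(np.exp(-0.5 * x**2))                    # slit closed: smooth bump
          p_open = normalize(np.exp(-0.5 * x**2) * np.cos(4 * x)**2)   # slit open: fringes

          prior_open = 0.5                      # the chaotic shutter is open in half of the repetitions

          # p(flash at X | Beep): marginalize over the unknown slit state, i.e. a linear mixture.
          p_flash = prior_open * p_open + (1 - prior_open) * p_closed

          # p(second slit was open | flash at X, Beep): Bayes' rule at each screen position X.
          p_open_given_flash = prior_open * p_open / p_flash

          for X in (0.0, 0.4, 0.8):
              i = np.argmin(np.abs(x - X))
              print(f"X = {X}: p(open | flash at X, Beep) = {p_open_given_flash[i]:.2f}")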

        • Kyle, that’s what it seems like to me. Bohr et al seem to have attempted to redefine *what it means to do physics* so that questions of the type Bohm and Bell asked, such as “what is an electron?” and “does it have a particular location before it hits your detector?” were now *no longer physical questions* but rather some kind of unseemly philosophical question for poseurs. Therefore those asking such questions were no longer physicists, or at least no longer wearing their physics hat, and could be safely ignored.

          The only “physical” questions allowed by Bohr’s group were “what would be the frequency with which light will flash at point x on my screen” or “what would be the frequency with which I would detect a photon if I made my polarizer have this certain angle,” etc. They seem to have claimed a non-instrumentalist view, but I can’t make heads or tails of what that would be; it seems like the frequency of instrumental outcomes is the only thing they really admit as real.

          My own background is doing lots of mathematical modeling in which I build models where an unknown variable is *the most physical thing* and the observations are essentially irrelevant except that they give me information about the physical thing. For example, if I have a video of you doing a cartwheel while wearing reflective dots in a motion capture room, and I want to infer where your center of mass is and what its velocity is at time t, I don’t care how often in repetition you would have dots in a particular position, I want to know given where the dots were at time t where was your CM… and I will always wind up with a Bayesian posterior distribution over the location of this CM.

          The fact is relatively few physicists even seem to know what Bayes is about, and for the most part I would guess that they would almost universally confuse the frequency with which a particle goes through a given slit with the probability that it did in a given case.

          This attempt to redefine what was “allowed” vs what was “not physics” is at the root of why I say things were political (for example attempting to control who could be considered “in the group vs out”)

          I’m neither an expert historian nor an expert physicist, but this is the takeaway I have come up with from a moderate amount of reading (I’ve read Bell’s book, Bohm’s book, some of the textbooks by Griffiths and by Eisberg and Resnick, and most of the Feynman Lectures, all three volumes. I also took a basic undergrad course in intro QM and a grad-level course on stat mech with a section on quantum statistical mechanics. This makes me not even close to competent in actually doing QM, but I know something about what the issues are and how they’ve been described by different authors)

        • Smolin gives an example of how QM foundations *is* political in his quote of Leon Rosenfeld, a close Bohr collaborator, responding to Bohm:

          Yes, and Carroll tells anecdotes about Hugh Everett’s “persecution” by the “establishment”. They may even be true. In the end it’s what these people are [not] saying and writing about the actual content of modern QM foundations that’s of most concern.

          By “metaphysical preferences” I just meant e.g. choosing between accepting nonlocality or not where neither choice has been ruled out.

          Is it a political position to simply write textbooks asserting that no other ideas about the world are possible, and that this has been proven by Bell? Even when Bell himself published a whole book making the point that Bell’s inequalities don’t mean that hidden variables are impossible? I think yes. That is, this is a choice, a choice to shun alternative possibilities.

          Erk! General QM textbooks aimed at undergraduates often aren’t good on foundational matters. I see the preface advertises it as a book intended to teach how to do QM, leaving the “quasi-philosophical” stuff to the end. A remark which leads me to expect errors there! Let’s restore ‘political’ balance with a textbook which advertises itself as a foundations book but which, among other flaws, has Bell ruling out locality.

          #despair

        • When I say it was political, I meant in the period from 1920 to, say, 1980 or so. I have no knowledge of the current sociological state, but I think it likely that the social issues have largely dissipated and that foundations today are a less socially risky area.

  8. Daniel: > I think you can see that this wedge we drive between what we know, measured as a probability, and what happened, measured as a frequency, is a critical distinction needed to make progress

    Now I think I see why we were disagreeing about something we might actually agree on.

    “what we [think we] know, measured as a probability” is an abstract object (math) taken for now to be beyond doubt – a representation.

    “what happened, measured as a frequency” is a distributional summary of what (we think) happened empirically.

    But with respect to what we think will repeatedly happen in future, it is informed by both the abstract object and the empirical summary – a reasoned argument for what to expect but expressed as a possibility (math).

    So, in terms of Peirce’s three categories (yawn): possibility, actuality, and expectation (based on a reasoned argument from possibility and actuality), which becomes the new possibility.

    That which might be, that which has been, and that which is expected to be.

    So thanks.

    • > But with respect to what we think will repeatedly happen in future, it is informed by both the abstract object and the empirical summary

      Exactly. But beyond that, many times what we want to know is not actually a thing we can measure. We can only measure some consequence of that thing. Like, for example, how far away is a lightning strike? The things we can measure are the position of light, the position of sound, and the position of the second hand on our watch (in fact, we can only ever measure the positions of things; no other quantities can be measured directly).

      We can *infer* the distance to the lightning strike because we have a model for what will repeatedly happen when lightning strikes the earth, namely that light will propagate outwards at the speed of light, which is a constant, and sound will propagate outwards at a speed which varies from place to place and time to time, but is basically always in a narrowish range…

      In other words, we need inference even to make sense of what people call measurement. I think this is something neglected in QM as well. Measurement is always the position of things. There is no such thing as measuring momentum, or measuring spin, or measuring energy.
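      A minimal sketch of the lightning example (the observed delay, the timing noise, and the speed of sound are made-up numbers): put a prior on the distance, model the flash-to-thunder delay that we actually measure, and read the distance off the posterior.

      import numpy as np

      # Grid prior over the distance to the strike, in meters (flat over a plausible range).
      d = np.linspace(100.0, 20000.0, 2000)
      prior = np.ones_like(d)

      # Model: the light arrives essentially instantly; the thunder delay is distance / speed
      # of sound, measured with some timing noise. All numbers are invented for the example.
      v_sound = 343.0      # m/s
      t_obs = 3.2          # observed flash-to-thunder delay, in seconds
      sigma_t = 0.2        # timing noise, in seconds

      likelihood = np.exp(-0.5 * ((t_obs - d / v_sound) / sigma_t) ** 2)
      posterior = prior * likelihood
      posterior /= posterior.sum()              # discrete normalization on the grid

      mean = np.sum(d * posterior)
      sd = np.sqrt(np.sum((d - mean) ** 2 * posterior))
      print(f"inferred distance: {mean:.0f} m +/- {sd:.0f} m")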

  9. The paper could use some editing. Some issues that I noticed follow (I apologize in advance if I misunderstood and any of them are actually correct):

    “you could able to come up”

    “the approach can makes sense”

    “will have essentially no inference on the parameters”

    “also makes a claims”

    “we can take consider this is as a prior”

    “the actual result with notes”

    “a modification to a proper by weak prior”

    “but rather as a motivation to better understand than then to improve modeling and inferential workflow”

    The liberal use of commas is mostly a matter of style, but some stand out:

    “could more formally write, p(…)”

    “the default linear mixture (0.5, p(…) + 0.5 p(…))”

    This fragment sounds kind of strange to me, but maybe it’s intentional:

    “… to simply insist on realistic priors priors for all parameters in our Bayesian inferences. Realistically it is not possible to capture all features of reality…”

    • The point about the invalidity of inferring the failure of the conditional probability rule by ‘conditioning’ on different experimental setups has been made before here. Many times, apparently. However, the general situation in QT is not quite as simple as “the rule is valid in QT” (see the discussion of “quantum conditional states” in the linked article).
