The MLE estimate is just a number, of course, so it’s compatible with lots of different Bayesian output. But for many CI construction methods (those based on likelihood functions), the CI for a given alpha level will coincide **exactly** with the posterior probability interval for that alpha under a prior of the form I’m discussing. This is a stronger sense in which the Frequentist result is mathematically the same as a Bayesian result with a particular nonsensical prior.

]]>If a well-tested theory makes a prediction of X, do we take that prediction as “evidence” for X? We certainly take it as “support” for positing X. If the window is broken and the papers are scattered all about the room we may take this as evidence of a burglary. The theory is, burglars are known to break windows and riffle papers. But it might have been the mummy from the neighboring college museum, come to get the names of the archeologists who’d violated his grave. It could have been the dog next door, which charged through the window in pursuit of his own shadow…. why, it even could have been the wind. What makes the difference between the story of the burglarous entry and the other shaggy-dog tales? Experience.

]]>Sure. Reasonable priors are based on evidence/experience; they’re not justified a priori (i.e., independently of experience). The example is just meant to illustrate that “evidence for X” doesn’t have to be something like “most stuff we’ve seen in this reference class is X, therefore probably X”. That’s the simplest case of evidence, but not the only one.

]]>Isn’t prior theoretical knowledge a summary of some prior genera of experience?

]]>A maximum likelihood estimate and its associated confidence interval assumes no prior, of course, but it will produce an answer compatible with an infinite number of Bayesian priors, including some crazy ones like yours.

]]>Lakeland:

“If you are saying that the history of our observations forms our rational beliefs I don’t disagree. If you are saying that Bayes has secret quantitative frequency properties that are swept under the rug I DO disagree”.

Bloom: he does not sweep anything under the rug; he relies upon his experience. His experience tells him, such and such is rare under such and such circumstances; and it is common under other circumstances. He reports this experience by various figures of speech: “I believe this is the case”, “I strongly believe this is the case”, “I doubt it”, “I would not bet on it” …. If he is an ornithologist or a beekeeper, he may even perhaps have book-entries in support of this or that assertion about what is the case. If he is a particle physicist, all the more so.

What he believes — if the belief itself be taken into evidence — rests on what was observed, somewhere, by someone: by him or by his forebears or colleagues; and what was observed are the concrete facts of experience in the world. The fact that such facts may or may not be an adequate basis from which to build a set of relative frequencies is just a reflection of the truism that ‘the investigation can be no more precise than the subject matter admits’.

]]>Ron, there is a difference between saying that I’ve seen a lot of people and many of them were around 150 to 250 lbs, and saying that if you select a random sample of 10,000 American males, such and such percent will be between 100 and 110 lbs, such and such percent between 110 and 120, etc.

Frequentists demand that every probability distribution quantify what fraction of the reference class falls into each subset. Bayesian reasoning just says that we’re willing to place substantially more credence on one subset vs another, independent of whether we think some replications will bear out the frequencies. It also entertains situations where no replication is possible: the probability that the mass of a certain meteorite is such and such. There is only one meteorite. Its mass is a number. My reasons for believing its mass is one thing vs another can be logical without having a repetition class associated and a frequency enumerated.

If you are saying that the history of our observations forms our rational beliefs I don’t disagree. If you are saying that Bayes has secret quantitative frequency properties that are swept under the rug I DO disagree.

]]>I cannot respond to Lakeland below, so I will respond above (here).

If I have a hunch about something being more or less probable, and my hunch is not grounded in a web of experience which I can elucidate, then I am just huffing and puffing, and you should have no reason to listen to me at all. If my huffing and puffing turns out, more often than not, to be proven true; then you *do* have reason to listen; but only *because* my track record is there for you to see!

The reference class is always there, somewhere. Implicit or explicit.

]]>Ron, I don’t disagree with your clarification. Yes, we should have **reasons**, based on a database of knowledge, for the use of our priors. When the final results depend sensitively on the prior, you should have a very convincing argument for the use of the particular one you did use. When they don’t depend so terribly strongly, it can be sufficient to use a few words, like “the density of smoke in the air is definitely dramatically less than the density of the air itself under almost all real-world circumstances. Air is listed as about 1.2 kg/m^3 on wikipedia, so we place a normal prior at about 0.1 kg/m^3 with standard deviation 0.5, truncated at 0, to cover a broad region of plausible densities, including densities as low as 0.”

None of that is really frequency-based, though.
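The smoke-density prior described above can be sketched and sanity-checked in a few lines. This is only an illustration of the comment’s stated prior (normal at 0.1 kg/m^3, sd 0.5, truncated at 0); scipy is assumed to be available and nothing here comes from real data.

```python
# Sketch of the smoke-density prior from the comment above:
# Normal(0.1, 0.5) in kg/m^3, truncated below at 0. Purely illustrative.
import numpy as np
from scipy import stats

mu, sigma = 0.1, 0.5                      # kg/m^3, values from the comment
a = (0.0 - mu) / sigma                    # lower truncation point, standardized
prior = stats.truncnorm(a, np.inf, loc=mu, scale=sigma)

draws = prior.rvs(size=10_000, random_state=1)
print(draws.min() >= 0.0)                 # truncation respected: True
print(prior.cdf(1.2))                     # prior mass below the density of air itself
```

Checking where the prior puts its mass (for example, how much lies below 1.2 kg/m^3, the density of air) is exactly the kind of “a few words of justification” the comment suggests.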

]]>> But in higher dimensions – and especially with non-linearity – these approaches can really start to diverge!

Exactly, and in particular it can be absolutely critical to have a reasonable prior in high dimensions so that you don’t wind up doing something stupid, where you over-fit your model. The prior plays the role of saying “we only think this model makes sense if it has parameters in this certain region, because outside the region the model would be ‘nonphysical’ or ‘meaningless’”. In other words, if you have to use those values, it’s because we built a bad model and need to start over, not because the real world works just like our model but we were really wrong about the parameter values (like, say, the density of smoke in the air REALLY IS greater than neutron star material).

But the prior in high dimensions has ABSOLUTELY NOTHING to do with the frequency with which we saw whatever in some reference set of previous experiments or similar experiments we’re going to do in the future or whatever. It’s just “hey the world either works like this, with parameter values in this region of space… or we need to start over from the drawing board”

]]>I cannot respond below Lakeland’s comment so I will respond here above.

It is not the case that there must be controlled trials of replicates before a judgement of probability can be made. What I am insisting on is that a judgement of “probability”, if it is a *reasonable* judgement, must be based on evidence; and evidence consists (if it has any *reasonable* bearing on the question at hand) of histories of events or constellations of events, prior events, which I or others may have witnessed and which we or they have recorded. If it is so vague that I or they cannot make it over into a strict tabulation of presence and absence of effect, well so be it: some probability judgements are going to be sloppy and some will be tight; and others will be in the middle. But the evidence you bring to bear had better have some grounding in experience if it is intended to be persuasive! This is not to deny that the mere expression of belief on the part — say — of a person of proven veracity carries with it a certain value as evidence, simpliciter. But to erect the whole tower of the theory of probability on belief — simpliciter — is remarkably jejune in its evasion of the simple question: belief on what grounds; on what evidence?

]]>Andrew, I have never fully understood your argument why “Bayesians are frequentists” :) If I don’t, in practice, construct prior models by thinking of the reference set of problems in which I would use said prior, am I still a frequentist using Bayesian clothing? What does this reference set mean mathematically, and how does it map to ‘probabilizing’ the parameter space, establishing the prior measure? If I push some simulated data from a prior predictive and then adjust the specification based on what I see, doesn’t that tuning/modeling process sort of whittle me down to a reference set of one – i.e. the particular problem I am working on?

]]>I think this is mostly right, although Bayes + Flat Priors does not necessarily equal max likelihood, the most common non-Bayesian method for problems similar to those I work on. In the max likelihood framework you are working with the MLE and the Fisher Information at the MLE, whereas in Bayesian inference you would be integrating over the prior. Only in the latter does the unrealistic weight given to non-physical values by the improper prior really matter.

The max likelihood approach relies on asymptotic assumptions to justify using only the local information at MLE.

In the simple, linear, low-dimensional toy examples we generally use to train intuition, these differences are very slight in practice. But in higher dimensions – and especially with non-linearity – these approaches can really start to diverge!

]]>I have no idea what this has to do with the quote, but I’ll answer it anyway: sometimes the rational basis of the belief comes from theoretical knowledge. To give a somewhat simplified story, scientists at some point thought that slamming two high-speed neutrons together would cause an explosion. They weren’t *certain* of that, hence the need for testing, but it was a reasonable prediction based on the best theories at the time. The prediction turned out to be true. But it wasn’t at all based on “phenomena analogous to” slamming two small objects together. Nothing usually happens when you slam two small objects together. If they were to reason by enumerative induction (“nothing usually happens when we slam two small objects together, therefore…”), the prediction would utterly fail. You might say that the comparison is not fair because slamming subatomic particles together is very different from slamming rocks together. But scientists only knew they were different because their theoretical knowledge said so, that’s the point.

]]>“And Bayesians are Frequentists who don’t think they need to keep books on from whence they learned what they say they believe is true about the world.”

That doesn’t seem right at all. In the Bayesian calculus with the proper notation, there is always the knowledge base on which you’re conditioning: p(A | B), where B is some background set of facts. But the **reason** for a belief need not be that in the past the frequency with which x occurred was f. For example, “I think your weight is somewhere in the range of 80 to 450 lbs, with a peak around 200”, modeled by some gamma distribution, has **nothing** to do with me having a database of 200 randomly selected people and their weights. It does have to do with a general knowledge that people under 80 lbs are usually very sick, and that I have never met anyone over 300 pounds, but I do know a few football players or sumo wrestlers are occasionally that large.

I’m fairly confident, though, that if we did randomly select 200 people and then do a chi-squared goodness-of-fit test, the frequency of weights would not match whatever gamma distribution I chose. However, if we did do that, I would recommend using some best-fit gamma distribution as a prior going forward.
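The weight prior and the refit-then-check workflow described above could be sketched like this. The shape and scale values are invented to put the mode near 200 lbs, and the “sample” is simulated rather than real; scipy is assumed.

```python
# Illustrative sketch of the gamma weight prior from the comment above,
# plus a refit and chi-squared check. All numbers are made up.
import numpy as np
from scipy import stats

shape, scale = 10.0, 200.0 / 9.0          # chosen so mode = (shape - 1) * scale = 200 lbs
prior = stats.gamma(shape, scale=scale)

# Stand-in for the hypothetical 200 randomly selected people:
rng = np.random.default_rng(0)
fake_weights = prior.rvs(size=200, random_state=rng)

# Best-fit gamma to the observed sample, as the comment recommends:
fit_shape, fit_loc, fit_scale = stats.gamma.fit(fake_weights, floc=0)

# Chi-squared goodness-of-fit statistic over decile bins:
edges = np.quantile(fake_weights, np.linspace(0, 1, 11))
observed, _ = np.histogram(fake_weights, bins=edges)
expected = len(fake_weights) * np.diff(stats.gamma.cdf(edges, fit_shape, scale=fit_scale))
chi2 = ((observed - expected) ** 2 / expected).sum()
```

With real data the fitted gamma would likely differ from the hand-picked prior, which is the comment’s point: the prior encodes general knowledge, not a tabulated frequency, but one can still update it with frequencies once they exist.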

I also don’t agree with “Bayesians are Frequentists. Full Stop.” The definition of Frequentist is that probability means *only* the frequency with which a thing occurs in repeated trials. Bayesians may be interested in being able to predict frequencies, but they model things with probability **other** than the frequency of recurrence.

]]>Andrew: “Bayesians are Frequentists. Full Stop.”

+1

]]>Rm:

+1. It can be useful to model belief probabilistically, but belief is not the foundation of probability, any more than betting is the foundation, or coin flips and die rolls are the foundation. These are all different settings where probability can be useful.

Just one thing. You say, “And Bayesians are Frequentists who don’t think they need to keep books on from whence they learned what they say they believe is true about the world.” I’d just say “Bayesians are frequentists,” full stop.

]]>What *reason* can there be for belief in X other than our experiences in the world of phenomena analogous to or identical to X? Rather: what *rational* reason can there be?

]]>And Bayesians are Frequentists who don’t think they need to keep books on from whence they learned what they say they believe is true about the world. It may be difficult to do it; it may be impossible. But to the extent they make a big hullaballoo about “belief” being the *primitive* concept, not further reducible, not grounded in experience, simpliciter, they are peddling intellectual error.

]]>Imagine a model for something like the calibration voltage for a circuit designed to measure say oxygen concentration in automotive exhaust (like for a fuel injection computer). You know for example that if you dial in this voltage correctly your circuit will read out percentage oxygen as a digital readout on a gauge, and then the fuel injection computer will accurately supply fuel.

A frequentist can collect, say, 10 measurements from an exhaust manifold which is rigged with both a pre-calibrated sensor and an uncalibrated sensor and then, using a normal measurement-error model, can try to infer the required calibration voltage.

If, as a frequentist, they insist on “not using a prior” and rely on a likelihood-based confidence interval, they may get a confidence interval of, say, [2.7, 3.2] volts with a maximum likelihood estimate of 3.03 volts. However, suppose the CI construction method mathematically gives the same interval you would get if you computed a Bayesian posterior using a “noninformative” improper prior, where the prior is “uniform on the real numbers”. The prior which is “uniform on the real numbers” can be thought of as a nonstandard distribution in nonstandard analysis: the density is 1/(2N) on the range [-N,N], where N is a nonstandard integer. These nonstandard distributions have the property that for any limited number x, all but an infinitesimal amount of probability is located in the region outside [-x,x]. In other words, “frequentists are mathematically like Bayesians who believe that their parameter is almost surely infinite in magnitude but could be either positive or negative”.
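The claimed coincidence is easy to see numerically in the simplest case: for a normal mean with known noise sd, the posterior under a flat (improper) prior and the classical likelihood-based CI are the same formula. A minimal sketch, with invented voltage “measurements” (not the actual experiment from the comment) and an assumed known sd:

```python
# Numeric illustration: flat-prior Bayesian interval == classical 95% CI
# for a normal mean with known sd. The data and sd are invented.
import numpy as np
from scipy import stats

volts = np.array([2.9, 3.1, 3.0, 2.8, 3.2, 3.05, 2.95, 3.1, 3.0, 3.0])
sigma = 0.15                                # assumed known measurement sd
n, xbar = len(volts), volts.mean()
se = sigma / np.sqrt(n)

# Classical 95% confidence interval:
z = stats.norm.ppf(0.975)
freq_ci = (xbar - z * se, xbar + z * se)

# With a flat prior on mu, the posterior is Normal(xbar, se):
bayes_interval = stats.norm(xbar, se).interval(0.95)

print(np.allclose(freq_ci, bayes_interval))   # True: the intervals coincide
```

The divergence between the two approaches only shows up once the prior actually constrains the parameter, which is the point of the calibration-voltage example: a uniform(0,10) volt prior changes the answer, the improper flat prior does not.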

Now, any person who shows up at a calibration lab and says “I’m positive I’m going to need every volt you can give me, is this rig capable of handling 100 billion trillion volts?” should probably be physically restrained until appropriate care can be provided. It makes NO sense.

On the other hand, consider the Bayesian who provides a prior of uniform(0,10) volts, knowing that the circuit is only designed to supply 10v anyway and that any sensor which can’t be calibrated by that range of voltage will just be chucked in the bin as “out of spec”… Well, the hard-core Frequentist believes that this Bayesian is actually insane and believes in fairies, because “probability only applies to measurements not parameters” and the math they’re doing is just improper, like adding volts to meters/second.

If I’m the race car driver and I have to make a choice between these two zealots, I know which of those two zealots I want calibrating the exhaust manifold of my race car.

]]>“In this work, the word chance will refer to events in themselves, independent of our knowledge of them, and we will retain the word probability […] for the reason we have to believe”

Even before Ramsey!

]]>Daniel is talking about priors for continuous parameters, not priors over discrete models a la Bayes Factors…

]]>No, for a dividing hypothesis you can specify prior odds (“a likelihood prior”) and make no claims whatsoever about the shape of the prior probability distribution, only the relative distribution of its mass on either side of the divide.

]]>Exactly. If your confidence interval is equivalent to a Bayesian interval with a uniform prior (confidence intervals constructed from likelihood-based tests, for example), then by using it you are saying essentially that you’re a Bayesian who is confident, before data, that your parameters are all greater than 10^300 in magnitude but don’t have any idea about the sign. That’s enormously silly in essentially 100% of applied problems.

]]>It’s funny, when people talk about “having to specify a prior” as a weakness, my immediate thought is always the weakness of “not specifying a prior”! I wonder what set of asymptotic assumptions are being invoked instead and how they stack up to encoding external information/considerations/constraints in a prior model….

]]>Daniel:

I agree. I think the problem is that all these people are working within a classical tradition in which the goal of a hypothesis test is to reject false hypotheses. But I only work with false hypotheses. All my models are false. That’s why I speak of “model checking” rather than “hypothesis testing.” Rather than there being some null model I want to reject, I’m in the position of having a model I like, but which I know is flawed, and the purpose of these tests is to reveal flaws in the model, in this case flaws of the form that the model is predicting things that are much different from what has been observed in the past.

]]>“The main strengths of p_prior are that it is also based on a proper probability computation… and that it suggests a natural and simple T… the main weakness of p_prior is its dependence on the prior pi(theta)…”

…

“The main strengths of P_posterior are as follows:

a) Improper noninformative priors can readily be used

b) m_post(x|xobs) typically will be much more heavily influenced by the model than by the prior….

c) It is typically … easy to compute … from … MCMC”

I personally find all of these things misguided. As a Bayesian, the prior **is part of the model**: it’s a statement about what we think is more or less likely to be going on in the world. So I don’t see “its dependence on the prior” as a weakness of the prior predictive; in fact, a major thing I’d want to use the prior predictive distribution for is to confirm that the prior correctly spells out what I do really know about the world. It can be difficult to directly specify a prior, but it can be much easier to check the prior predictive against the kind of data I expect (for example, to specify a small fake dataset and then compare the prior predictive distribution with that dataset).

The “strengths” of the P_posterior mentioned are also not consistent with my Bayesian practice. I **never under any circumstances** use improper priors. They are **not** uninformative; in fact, what they say is that the parameter value is almost surely positively or negatively infinite. Furthermore, m_post being “more heavily influenced by the model than by the prior” makes a distinction that doesn’t exist in my mind (the prior **is** part of the model).
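The prior-predictive workflow described above (push draws from a proper prior through the likelihood, then compare with a small fake dataset spelling out what we expect) might be sketched as follows; the Poisson model, the prior values, and the fake data are all invented for illustration.

```python
# Sketch of a prior predictive check against a small fake dataset.
# Model, prior, and "expected" data are all invented for illustration.
import numpy as np

rng = np.random.default_rng(42)

# A proper prior on a positive rate parameter (never an improper one):
theta = rng.gamma(shape=2.0, scale=5.0, size=5_000)

# Prior predictive: simulate 8 Poisson counts per prior draw
y_sim = rng.poisson(theta[:, None], size=(5_000, 8))

# A small fake dataset spelling out the kind of data we expect to see:
fake_data = np.array([6, 9, 11, 7, 12, 8, 10, 9])

# Does the prior predictive put reasonable mass near data like this?
stat_sim = y_sim.mean(axis=1)
coverage = np.mean((stat_sim > fake_data.mean() * 0.5)
                   & (stat_sim < fake_data.mean() * 2.0))
print(coverage)    # fraction of prior predictive draws in a plausible band
```

If the coverage were tiny, that would be a signal that the prior does not actually encode what we claim to know, and we would revise the prior before ever seeing real data.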

]]>I don’t think that QBism (née Quantum Bayesianism) solves QM strangeness. But I cannot say that I really understand what they are trying to say, to be fair. The fact that there is a stream of publications in the last two decades that say different things doesn’t help.

The starting point is reasonable: in general our knowledge of quantum states is incomplete (not pure states) and our measurements imperfect (not projectors). The density matrices representing quantum states are (proper) mixtures, and our probabilistic predictions reflect both our uncertainty about the state and the quantum indeterminacy that would remain even if our knowledge were complete. The changes in our description of the quantum state when we do a measurement represent in part a refinement of our knowledge and in part a physical change. But if the system was in a pure state and our knowledge was already maximal there is no “Bayesian updating”, only strangeness.

Another discussion of the subject can be found in “The quantum Bayes rule and generalizations from the quantum maximum entropy method” by Kevin Vanslette [ https://iopscience.iop.org/article/10.1088/2399-6528/aaaa08 ], which is more in the line of Jaynes (by the way, one can find online a draft of a chapter, “Maximum Entropy: matrix formulation”, that didn’t make it into PTLOS).

From there, they get into metaphysical discussions where I’m lost. To make clear that I’m completely missing the point, I’ll just say that if you prepare some quantum systems (and have some predictions about measurements) and while you look elsewhere I manipulate them (and have my personalistic predictions about measurements), what we may find is that when you do your measurements the outcomes match my expectations and not yours. As if some subjective descriptions were more objective than others…

I wouldn’t say either that the existence of “objective probabilities” makes the “subset claim” invalid. At least in the sense that some of us may use it to defend ourselves against arguments like “You can have epistemic probability, or you can have aleatory/frequentist probability, but you should decide which one you want.”

]]>How can beliefs — if rational — about what is likely or not likely to occur not be grounded in concrete facts of experience in the “real-world” ?

]]>Thanks Sander for mentioning Frank Ramsey. My aunt, a mathematician, kept his name and views alive in our family.

]]>Thanks a lot for the insightful comments! My reading list has grown after reading them.

]]>“Even if physical probabilities exist out in the real world, a statistician or physicist or other human is always reduced to questions about what they know about the world.”

The physical world exists ‘out in the real world’ (and we are a part of it too of course). And what we ‘know’ about it — if that so-called knowledge is to be any pragmatic guide at all to action — must be conditioned on what we believe has happened in that real world in circumstances similar to those in which we stand, when we propose to act.

If our beliefs are to be any guide at all they must be informed by what we or others before us have seen; or at the very least what they presume that they have seen. Probabilities, whether you call them “aleatoric” or “epistemic”, if they are of any relevance as useful guides to action, have to be grounded in summaries: summaries of worldly experience.

]]>“If you ask most people what are the ‘probability that the mother of Lincoln was blonde’…”

If the “epistemic” probability of such an assertion is merely “what I think I know about it” — simpliciter — that is a cop-out. What matters is the *reason* for which I think I know such and such. Logic does not describe *how* we think, but how we *ought* to think … if we seek to be persuasive.

My “epistemic” probability (for such an assertion) will be grounded (if it is not mere verbal nonsense) in my recollections of: hairstyles of wives of (American) politicians, and my recollections of assertions by first- or second-hand chroniclers of the same. Less persuasively, it may be grounded in my recollections of persons I believed were wives of politicians — i.e. proxies for them — but who may, in truth, not have been.

All in all, whoever they may have been, I seem to recall (or think I recall) that some were blonde and some were not. And thus, by this expedient, I have conjured up a retrospective study of the matter; divided my observations into two groups (those with the character in question and those without); and produced a ratio.

The fact that the retrospection is subject to gross errors of memory, an absurdly limited sample, and all other manner of cognitive confusion and bias does not change the fact that if I claim I know about the propensity of politicians’ wives’ hair, I am reporting what I have learned: by recollections, by chronicles. I have created an experimental record. A reliable one? Of course not. But it is the best that I can muster. Just because a retrospective experiment is a poor or weak one does not mean it sought access to a genus of knowledge different from that sought by a well-performed or strong one.

What knowledge can I appeal to; other than my recollections of what I see; and my recollections of what others say they see; and my recollections of the record of the veracity of those who say they see?

]]>Yes, sorry. The “No” was because the QBists don’t view the Born rule as an “out there” law but I forgot that their interpretations of state and rule aren’t tied together as strongly as they are in the “QM as applied generalised probability” context and so you’re right: on its own that quote wasn’t enough.

]]>Paul Hayes: Thanks for the link! But I don’t get your “No, the QBists’ view of QM is (FAQBism FAQ #4)…” – Why the “No”? I don’t see the conflict between what I wrote and what’s in your linked paper or quotation from it, so please clarify where it is in light of the following:

Long before QBism was developed, DeFinetti gave a radical subjective Bayesian view of QM phenomena and their probabilities, which is the earliest version of what I’ve seen called a QBayes interpretation (note that your linked paper distinguishes QBism from the more vague, general QBayes category). But DeFinetti lacked the explanatory and mathematical details (especially about measurement, a major sticking point among QM interpretations) that a particle physicist would rightly demand. I can’t claim any expertise, only that I’ve read it in the work of Fuchs and colleagues and thought of QBism as one refinement which adds such details, allowing it to match aleatoric conceptions (in which Born’s rule is an external probability law “out there” beyond agents or minds and thus beyond radical subjective Bayes). The key difference is that QBism locates the probability rule on the agent side of the physical path from the data generator to the agent’s coherent bet. This move seems central to QBism’s resolution of QM strangeness, so please correct me if it is a misconception.

Consider the start of the quotation you sent, “A quantum state encodes a user’s beliefs about the experience they will have as a result of taking an action on an external part of the world”. That leaves an ambiguity due to being out of context of the paper: The state is the “user’s” belief only if the user equates their belief to the state. An objectivist could say that quote assumes implicitly that a user adheres to Lewis’s “Principal Principle” in transferring the distribution from an aleatoric QM law to the user’s belief, and my reading is that QBism indeed assumes an equivalent principle but earlier in the event path.

Defenders of objectivist interpretations (like many-worlds, which I don’t care for) can point to the hiddenness of this assumption in your quotation as masking and inviting confusion of the world “out there” with our beliefs about it. That is a variant of my initial response to the DeFinetti subset claim espoused by Lakeland (that frequentist/aleatoric probability is a proper subset of subjective/epistemic probability). I don’t hold to that subset view as a dogma, but I have found it’s a most useful way to approach probability in soft sciences like health-med, social, etc. where aleatoric laws are only assumptions (and more often than not, very hypothetical oversimplifications of reality).

As a complete amateur regarding QM, I find QBism the most compelling interpretation system for it I’ve read. So what I find fascinating about QM controversies is not some direct relevance for soft-science applications (I’d guess there’s none), but rather that in QM objectivists can seriously challenge the subjective Bayes view (and in particular the subset claim), and thus stand firm against radical subjective Bayes (RSB) as a religion. And that’s great: As with all such metascientific/epistemic positions, I see RSB as one extremely useful perspective for inference and decision problems; but so are its opponent positions in certain contexts (e.g., apparently in many if not most engineering and physics settings), and they can both be used in the same application to good effect. I think the name for my position in philosophy is perspectivalism, or more specifically epistemological pluralism.

]]>For example, some in quantum mechanics point to the Born rule as providing pure aleatoric probability not subsumed by Bayesian philosophy, since it appears to be a genuine law about relative frequencies that exists out there, in the world beyond our heads, whether anyone or anything knows it; even among QBayes adherents (who, if I understand correctly, see it as a law about what observers experience) it’s still a law that exists outside of any personal mind or bet (and thus in Popper’s world 1 and the ordinary usage of “probability” to represent a physical system’s propensity to produce a certain relative frequency).

No, the QBists’ view of QM is (FAQBism FAQ #4):

]]>A quantum state encodes a user’s beliefs about the experience they will have as a result of taking an action on an external part of the world. Among several reasons that such a position is defensible is the fact that any quantum state, pure or mixed, is equivalent to a probability distribution over the outcomes of an informationally complete measurement [8]. Accordingly, QBists say that a quantum state is conceptually no more than a probability distribution.

And calling probability “aleatoric” cannot invest it with some character that makes it transcend the domain of our knowledge. Does that mean that any distinction is completely arbitrary and useless?

If you ask most people what are the “probability that the mother of Lincoln was blonde” and the “probability that the red ball will be 7 in this Sunday’s lotto draw”, they will agree that these lie on different sides of the divide. They may also say that the former doesn’t make sense, if they reject the idea of epistemic probability.

]]>And if the epistemic category is a strict superset, then what are the rational grounds for beliefs which are not rooted in experience of regularities (in reference classes) of the world, seen, remembered, summarized … as probabilities? I see no such grounds. Yes, there are grounds for beliefs other than these; but why should any arbitrary belief, simpliciter, be admitted as support for any proposition at all? Calling probability “epistemic” cannot invest “probability” with some character that makes it transcend the domain of our experience; for what we wish (rationally) to do with “probabilities” is organize our past experience, so that our past experience better lends itself to anticipating subsequent experience. We organize what “we know”, therefore; but if it is to be useful, it ought to be rooted in regularities of experience. If what we claim “we know” is not so rooted, well, so be it: we are then fooling ourselves and others. But that is to be expected!

]]>Rosenthal, and others above and below:

“so the larger point is just that it kinda doesn’t matter whether it’s aleatoric or epistemic “down there”. Of course, in practice none of this comes for free – we build models and make inferences conditional on assumptions and the various formalizations we’ve decided to be comfortable with…”

If “epistemic” means anything at all, it means of or pertaining to what we know (or think we know). And if what we know (or think we know) means anything at all, it must be anchored in what we have seen, or what we think we have seen. There is no rational, reasonable, pragmatic sense of “epistemic probability” which is not fundamentally rooted in induction from *experience*. “Epistemic” probabilities — if they do indeed stand somehow or other in experience — are necessarily rooted in “reference classes”: no more or less so than the “reference classes” favored by the so-called frequentists.

The “epistemic” and “aleatoric” do not and cannot derive from radically different sources of experience.

“Aleatory” emphasizes that aspect of the reference class not easily subsumed within a deterministic description; the classical system best characterized by this adjective was formerly that of statistical mechanics; wherein the regularities observed were emergent properties of the “aleatory” elements in the molecular stratum. The modern quantum theory is the logical continuation of this mode of description; where the underlying “aleatory” properties are not even properties at all; the theory makes claims only for properties emerging out of nothing, as it were.

The adjective “epistemic” (in connection with the probability model) emphasizes that aspect of our experience which is what we *record* as having seen or learned from our interaction with …. a stratum of events relevant to the question at hand, whatever it might be. A reference class.

]]>+1

and thanks PE for bringing in Ramsey, I was thinking he needs to be cited in this discussion.

To meet the internal consistency criterion (and this responds to one of Russ Wolfinger’s questions) I think of applied probability and stats in terms of information models. I see the latter as a practical refinement of radical Bayes in which radical frequentism is a very special case (consistent with DeFinetti’s strict subset view that Lakeland mentioned, but more detailed). In it, additional constraints are imposed on probability models by causal information (“Bayesian”) networks, as per Pearl, Robins and others. We try to imagine how our probability models follow logically from background information, with basic gambling examples showing how we use information about causal stability and independence to arrive at models like the binomial, and with more complex stories showing how we might discount that information.

More generally, suppose we are not simply running data through stock models in software in the hopes that an algorithm will pick up some pattern. Then our information leads to specifications that involve data (to-be-observeds) Y and unknown parameters B, along with other unknown parameters C which we will only condition on because we have too little time or information to include them in a credible prior specification. We want a linear function E(y,b;c) that summarizes the kind of patterns we expect in (Y,B) based on our input information and the additional hypothetical information of C=c [DeFinetti calls this function “Prevision”, often translated as “Expectation” but conceptually more than just the first moment of a random variable]. These functions can be translated quickly to probability functions P(y,b;c) by using indicators.
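The indicator conversion mentioned here can be sketched in a few lines (a toy die-roll example of my own, purely illustrative): the probability of an event is the prevision (expectation) of its indicator function.

```python
import random

random.seed(1)

# Prevision of an indicator equals the probability of the event:
# P(A) = E[1_A]. Here A = "a fair die shows 5 or 6", so P(A) = 1/3.
def indicator(outcome):
    return 1 if outcome >= 5 else 0

n = 200_000
draws = [random.randint(1, 6) for _ in range(n)]
prevision = sum(indicator(d) for d in draws) / n  # average of the indicator

print(round(prevision, 2))  # close to 1/3
```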

In strict “classical” frequentism B is empty and the only information allowed in P(y;c) is from verified physical properties of the Y-generator, but it can readmit B if B represents a physical “random effect” with a distribution that may have indexes in C, as in hierarchical/mixed models.

In strict radical Bayes C is empty and one pretends it is possible to make practical progress putting all unknowns in B and treating P(y,b) as a coherent betting schedule.

For ecumenical pragmatists (sometimes labeled penalization or shrinkage frequentists, or semi- or partial Bayesians) the whole range between these extremes is accessible, and any source of information can enter anywhere.

But direct joint specification of P(y,b;c) can be impractically difficult, so it is approached piecewise or modularly, and tentatively, with various forms which can and arguably should be indexed explicitly in C. Also, there can be many different useful allocations across B and C. There can also be many factorization possibilities; mostly I see and use the “Bayesian” type P(y,b;c) = P(y|b;c)P(b;c). This seems natural, I think, because the information source for P(y|b;c) is supposedly our knowledge about the actual physical data (Y) generator (the study design and conduct), whereas the source for P(b;c) is supposedly outside the study. That makes the two seem qualitatively different, but that is a category error; in reality anything could be informing either factor, and the source composition can vary quite a bit depending on the allocation between B and C.
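A minimal numerical sketch of this factorization (my own toy example, assuming a conjugate normal-normal form in which c holds the fixed variances and prior mean):

```python
import math

# Toy instance of P(y,b;c) = P(y|b;c) P(b;c):
# data model  y | b ~ Normal(b, sigma_y)   -- "inside the study"
# prior       b     ~ Normal(m0, sigma_b)  -- "outside the study"
# c = (sigma_y, m0, sigma_b) is conditioned on, not modeled.
sigma_y, m0, sigma_b = 1.0, 0.0, 2.0

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def joint(y, b):
    # the joint built piecewise from the two factors
    return normal_pdf(y, b, sigma_y) * normal_pdf(b, m0, sigma_b)

# Conjugacy gives the posterior mean of b after seeing y in closed form:
y_obs = 1.5
prec = 1 / sigma_y**2 + 1 / sigma_b**2
post_mean = (y_obs / sigma_y**2 + m0 / sigma_b**2) / prec
print(round(post_mean, 2))  # 1.2 -- shrunk from 1.5 toward m0 = 0
```

Reallocating, say, sigma_b from C into B (giving it its own prior) changes nothing in the factorization pattern, only in which unknowns are modeled versus conditioned on.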

The math for doing “analysis” once a specification for P(y,b;c) is given is pretty much worked out and programmed, even if not presented or even recognized in the above form – it just becomes algorithmic data processing. That leaves the specification problem and the infamous Garden of Forking Paths, which typically injects much pseudo-information (what justified using age instead of log age as a covariate?). But such arbitrary elements are unavoidable if we want the programs to process the information into contextually interpretable summaries (which is one parsimonious idealization for “inference”). That is one reason I advocate unconditional interpretations of model diagnostics: Any discrepancy or lack thereof may have more possible causes than we accounted for in our model or care to imagine.

By the time I got out of grad school I had been convinced that causal network models were vital for this specification task, in both easing it and making sure the probability conversion and final summaries made sense contextually (my first paper using causal diagrams was published in 1980). Many others have reached and taught the same conclusion, and over recent decades causal modeling tools have exploded in rigor, depth, and breadth. Yet they are still not a central part of applied statistics texts and courses I see, a failure to integrate a topic that I would consider as essential as software use. See more on that lament at https://arxiv.org/abs/2011.02677

]]>He is indeed notorious for saying “probability does not exist!” :) I think the key here is similar to where Lakeland is going above: you can subsume the question of frequencies and physical chances into exchangeable degrees of belief via the Representation Theorem, so the larger point is just that it kinda doesn’t matter whether it’s aleatoric or epistemic “down there”. Of course, in practice none of this comes for free – we build models and make inferences conditional on assumptions and the various formalizations we’ve decided to be comfortable with…including the idea that canonical probability theory is a suitable way to model uncertainty in the problems we’re working on…

]]>To the extent that epistemic probability is influenced by sequences of relevant prior events, it is a degree of rational belief. To the extent it is not so influenced, it is not so rational. If we create a calculus that is grounded in so-called “rational belief” we must be clear: some evidence is relevant and some is, to put it as charitably as possible, less so. Rational beliefs are supported by relevant reference classes of events. If I invent a scheme which assigns a numerical probability to the hypothesis that rabbits live on Mars, and I wish to persuade anyone at all that it is grounded in “rational belief”, I suppose I should be prepared to line up my references (in which rabbits live hither and thither; some concrete set of observations of rabbits in approximately Martian climates, first- or second-hand at any rate). There is a rational reference set behind every rational belief; and, conversely, an “irrational” reference set probably lurks behind many irrational beliefs. The numerical calculus just does what it is told to do with these!

]]>Even if physical probabilities exist out in the real world, a statistician or physicist or other human is always reduced to questions about what they know about the world. What they know about the physical probability is just one such question. If I set up a certain QM experiment in a lab, before doing the experiment I have a question about the frequency distribution of the outcome, and after doing the experiment I have a relatively strong knowledge of what the frequency distribution will be… My probability calculations converge to a delta function around the physical frequency distribution. It all makes perfect sense.
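The convergence described here can be simulated directly (a toy sketch of my own, using a Beta-Bernoulli model rather than an actual QM setup): as trials accumulate, the posterior over the unknown frequency tightens toward a spike at the physical value.

```python
import random

random.seed(42)

# Start from Beta(1, 1) (uniform) beliefs about the long-run frequency p.
# After n Bernoulli(p_true) trials, the Beta(1+k, 1+n-k) posterior's
# standard deviation shrinks -- beliefs approach a delta at p_true.
p_true = 0.3

def beta_sd(a, b):
    return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5

sds = []
for n in (100, 10_000):
    k = sum(random.random() < p_true for _ in range(n))  # observed successes
    sds.append(beta_sd(1 + k, 1 + n - k))

print(sds[0] > sds[1])  # True: more data, sharper posterior
```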

]]>I’m not an expert on anything! Perhaps he changed his views at some point, but de Finetti does repeatedly say that any talk of “physical chance” is metaphysical nonsense, it’s all just a fancy way of describing convergence of beliefs, and “parameters” are just useful fictions for making predictions about observables (you don’t bet on a parameter, you only bet on what can be verified). For a critical discussion, see chapter 4 of Gillies’ “Philosophical Theories of Probability”. But, of course, we can accept de Finetti’s Representation Theorem without subscribing to his interpretation of it.

Physical chance is a weird concept (Jaynes exposes some problems in his book “Probability Theory”). But if you don’t take it too seriously and realize that it’s just a simplification of something much more complex, it can be useful.

]]>Not an expert on de Finetti at all, but I’ve always interpreted his signature contribution here – the Representation Theorem – not as reducing one to the other but as showing a mathematical convergence, an ‘as if’ condition. Thus, we can do Bayesian inference assuming exchangeability of some sort and remain agnostic about the source of uncertainty!
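The ‘as if’ reading can be checked in a toy case (my own example, assuming a uniform mixing measure over p): sequences generated by first drawing p and then flipping iid Bernoulli(p) coins are exchangeable, so (1,0) and (0,1) get the same probability, matching the closed form k!(n-k)!/(n+1)!.

```python
import random

random.seed(0)

# Generate data "as if" de Finetti: draw p from a uniform mixing measure,
# then n iid Bernoulli(p) trials. The resulting sequence is exchangeable.
def draw_sequence(n):
    p = random.random()
    return tuple(int(random.random() < p) for _ in range(n))

trials = 200_000
counts = {}
for _ in range(trials):
    s = draw_sequence(2)
    counts[s] = counts.get(s, 0) + 1

# Closed form under the uniform mixture:
# P(seq with k ones in n trials) = k!(n-k)!/(n+1)!, so P(1,0) = P(0,1) = 1/6.
p_10 = counts[(1, 0)] / trials
p_01 = counts[(0, 1)] / trials
print(round(p_10, 2), round(p_01, 2))  # both near 1/6
```

Nothing in the simulation commits us to p being a “physical chance”; it is just the mixing variable the theorem guarantees exists.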

]]>Daniel: I believe that superset claim is the standard radical subjective Bayes position often attributed to DeFinetti (I think in Popper’s scheme it might be cast as a claim that probability is solely an object in world 3). While I’m sympathetic up to a point, I fear it can blur an important pragmatic distinction between the empirically-based probabilities that our audience and the public wants for its own purposes and the various personal bets that may be pure lunacy.

On the theoretical/philosophical side, the superset assertion also has some parallel principled dissent. For example, some in quantum mechanics point to the Born rule as providing pure aleatoric probability not subsumed by Bayesian philosophy, since it appears to be a genuine law about relative frequencies that exists out there whether anyone or anything knows it, in the world beyond our heads; even among QBayes adherents (who, if I understand correctly, see it as a law about what observers experience) it’s still a law that exists outside of any personal mind or bet (and thus in Popper’s world 1 and the ordinary usage of “probability” to represent a physical system’s propensity to produce a certain relative frequency).

Regardless, I hope no one will object to this quote from DeFinetti about formal theories which should apply to meta-theories as well (whether radical Bayes or radical logicism or radical frequentism or whatever), and which I think can’t be repeated too often:

“…everything is based on distinctions which are themselves uncertain and vague, and which we conventionally translate into terms of certainty only because of the logical formulation…In the mathematical formulation of any problem it is necessary to base oneself on some appropriate idealizations and simplification. This is, however, a disadvantage; it is a distorting factor which one should always try to keep in check, and to approach circumspectly. It is unfortunate that the reverse often happens. One loses sight of the original nature of the problem, falls in love with the idealization, and then blames reality for not conforming to it.” [Theory of Probability vol. 2, 1975, p. 279]

Just to complement what you said, here’s a quote by Frank Ramsey, a Bayesian, in “Truth and Probability” (1926):

“Probability is of fundamental importance not only in logic but also in statistical and physical science, and we cannot be sure beforehand that the most useful interpretation of it in logic will be appropriate in physics also. Indeed the general difference of opinion between statisticians who for the most part adopt the frequency theory of probability and logicians who mostly reject it renders it likely that the two schools are really discussing different things, and that the word ‘probability’ is used by logicians in one sense and by statisticians in another. The conclusions we shall come to as to the meaning of probability in logic must not, therefore, be taken as prejudging its meaning in physics”

(Obviously, statisticians and physicists are much more comfortable with Bayesianism today than they were when he wrote this.)

Ramsey’s point is that you can consistently adopt many interpretations of probability (as long as you are cautious, perhaps by using different notations for the different types of “probability” when not doing so might cause confusion). When people speak of the “probability of a probability”, for example, they’re usually using two different meanings of probability in the same sentence. That’s fine; the many different types of probability can coexist. As you note, physical chances should inform your degrees of belief (or, when you don’t know the chance, you might want to estimate the “plausible” values of the chance parameter). But it would be nice if the concepts were kept distinct (though not completely separable from each other) to avoid confusion.

Not everybody is happy with this pluralism. de Finetti, writing at the same time as Ramsey, tried to completely reduce “physical chances” to subjective probability. But we don’t have to be as radical as him.

(It’s worth noting that frequentist interpretation of probability != frequentist statistics. To give an example, A.W.F. Edwards thought that probability can only ever refer to frequencies, but he criticized most of what we understand by “frequentist statistics” today.)

]]>Did you try JSTOR? They often have articles from some journals without charge after some embargo period (usually 5-10 years) even when those remain paywalled at the original journal site.

]]>I personally see no problem with the statement that Bayes is about epistemic probability. Epistemic probability can be induced by aleatoric considerations, so it completely subsumes Frequentism. It is a strict superset.

]]>Sander, unfortunately these articles are not publicly available and cost $45 each to access, so I won’t be able to read them :-(

]]>