I do *not* in general think it is a good idea to speak of “the probability that a certain hypothesis is true”; see this article for elaboration of this point; Figure 1 illustrates the sort of probabilistic thinking that I don’t generally like in science.

So yah, I’m all for defining probability in general as anything that satisfies the axioms. It avoids the confusion stemming from the assumption that all probabilistic claims refer to the same content. However, it doesn’t relieve us of having to explain what those underlying claims mean.

]]>Of course, that is why I prefaced with primarily.

> Peirce, insightful as ever, tends to the cultural side in that quote

More likely he tends to bounce around – he raises numerous interesting considerations on any question, deliberates these repeatedly (one might even say incessantly and endlessly) and seldom brings any closure that lasts (for him anyway).

It seems that after 1908 Peirce gives a larger role to the object of the sign than to the interpretant of the sign.

If I am understanding him, this gives the reality to which we have no direct access more of a role in shaping how we represent and interpret it.

OK, a bit of moderation of the constructionist perspective, although you may see this differently ;-)

]]>“But don’t you also think that, say, degrees of belief and long-run frequencies are also “probabilities” in another useful sense? ”

Indeed, I see degrees of belief and long-run frequencies as intuitive/conceptual ideas of probability that we try to *model* mathematically.

For elaboration on my view, please see http://statmodeling.stat.columbia.edu/2014/01/16/22571/#comment-153299 (which Andrew gave a link to in his post), and also the further link (https://web.ma.utexas.edu/users/mks/statmistakes/probability.html) given in that comment.

]]>> define concepts within mathematics

These mathematical definitions of the concept are only one aspect (see below) but they do try to make hard pauses in the unfolding process of interpretation.

“The second grade of clarity is to have, or be capable of providing, a definition of the concept. This definition should also be abstracted from any particular experience, i.e., it should be general. So, my ability to provide a definition of gravity (as, say, a force which attracts objects to a point, like the center of the earth) represents a grade of clarity or understanding over and above my unreflective use of that concept in walking, remaining upright, etc.” https://www.iep.utm.edu/peircepr/

Now, Christian, is it culture that primarily drives the multi-dimensional evolving interpretation of the concept or the reality that we have no direct access to but which impacts us in many ways when the concept is applied…

]]>Thank you for the clarification. I suspect, however, that many people take “probability is a mathematical concept” to imply something stronger than “Kolmogorov’s axioms can be useful for many different purposes, and there are challenges to every interpretation of probability.” Perhaps the post would be less controversial if phrased differently? To give one example, Kolmogorov’s axioms may also be useful to calculate the normalized mass/length/area/volume/etc of physical objects; do you think, then, that physical mass is a probability? I mean, it trivially is in the boring sense that it follows the axioms! But don’t you also think that, say, degrees of belief and long-run frequencies are also “probabilities” in another useful sense? If not, then the dispute between Bayesians and frequentists would be very puzzling (akin to statisticians criticizing physicists for using Kolmogorov’s axioms to calculate physical masses; they are just talking about completely unrelated concepts that happen to use the same axiomatic tool).

]]>Especially the last two sentences: “In practice, probability is not a perfect model for any of these scenarios: long-run frequencies are in practice not stationary, betting depends on your knowledge of the counterparty, uncertainty includes both known and unknown unknowns, decision making is open-ended, and statistical inference is conditional on assumptions that in practice will be false. That said, probability can be a useful tool for all these problems.”

]]>If you really want groundbreaking stuff, you shouldn’t be reading blogs . . .

Seriously, though, see the P.S. above for the reason I wrote this post. I do think there’s a lot of confusion out there.

]]>I’m struggling to understand how this could be both true *and* not trivial. Consider:

“The concept of probability has been used long before Kolmogorov; we still use the term, both in science and ordinarily, in ways that do not correspond to a mere mathematical formalism; there are alternative axiomatizations of probability (e.g. Cox’s; some people reject countable additivity, etc.); etc; etc.”

“Yes, but those are different meanings of probability. I’m specifically referring to the mathematical concept.”

So, the mathematical concept of probability is a mathematical concept. Groundbreaking stuff.

]]>“> Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter

Just means there is a *function* theta -> P(A). The parameter space need not have any of the additional structure required to make it a probability space, ie subsets etc of the parameter space need not satisfy the Kolmogorov axioms.”

This is more or less what I was trying to say by

“The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

In this statement, I was not using the phrase “conditional probability” in the “traditional” sense – but in a broader meaning than that: namely, a probability defined with some restriction (assumption) involved.

To (attempt to) describe the situation a little more clearly: In calculating a p-value, one assumes a particular type of distribution and particular values of parameters for that type of distribution, and uses these in calculating the particular probability that is called the p-value. Thus the calculation depends on the type of distribution and the particular values of the parameters of that type of distribution. This is what I meant by a “conditional probability” here: a probability calculated under certain specified conditions.

Having thought about it more, I am inclined to suggest terminology that is not quite what OJM said, but more like this: I use the word “conditional” in two meanings.

One is the traditional probability usage of “conditional on an event A”, where the event is in the sigma-algebra on which the probability function P is defined.

The other is to express that the probability function P itself is defined in terms of (i.e., a function of) other types of “objects”: In the case of calculating the p-value, one uses a posited probability model, so the p-value calculation depends on the type of model and the parameters of the model.

The discussions we’ve had suggest that it’s a good idea to use a different notation for this – perhaps P(A||model, parameters) would help avoid confusion.

(In calculating p-values, there is an added complication: the p-value would be defined in the above terminology as something like P(A|| normal model with mean mu-naught and standard deviation sigma) where sigma is unknown, so the p-value is estimated using an estimate of sigma from the data.)
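A quick sketch of this in code, purely illustrative: the one-sided z-style test setup, the function name, and the numbers are all mine, not anything from the discussion above, but it shows the P(A||model, parameters) idea of a probability computed as a *function* of a posited model and its parameters, with sigma plugged in from the data.

```python
import math
import statistics

def p_value(xbar, mu0, sigma, n):
    """P(Xbar >= xbar || normal model with mean mu0 and sd sigma) -- one-sided."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return 0.5 * (1 - math.erf(z / math.sqrt(2)))

# sigma is unknown in practice, so we use an estimate from the data
data = [4.8, 5.1, 5.3, 4.9, 5.6, 5.2]
sigma_hat = statistics.stdev(data)
p = p_value(statistics.mean(data), 5.0, sigma_hat, len(data))
```

Changing the posited model or the parameter values changes the function, and hence the p-value, which is the sense in which the calculation is “conditional” on them.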

]]>One reason to be pedantic about this is that the formal structure of an additive, normalised measure implies, in all interpretations as far as I can see, that only one model can be ‘true’.

I don’t think it’s the additive normalized measure here but rather the model of probability as a measure over plausibility of the truth values of propositions. It’s not possible for the propositions X=2 and X=3 to both be true.

This is why I started working on a view of probability as a measure over “accordance” where I mean by accordance something like “compatibility with what’s assumed for the model”

It’s possible for “the length of my pencil is 14.0cm” and “the length of my pencil is 14.02cm” and “the length of my pencil is 13.98cm” to all have equally good accordance with the particular theory of measurement that I’m using (for example if I know that there is a near uniform spread of errors in manufacturing the ruler I’m using over this range of errors and I know how good my optical discernment of the marks on the ruler are and I have a certain view on what it means for a pencil to “have a length” such that it isn’t relevant to my measurement whether or not a speck of dust lands on the tip and sticks out a bit).
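A minimal numeric sketch of this, assuming a uniform band of ruler error (the band width, function name, and values are mine, purely illustrative): all three length claims get full “accordance” with the measurement model, even though as propositions they couldn’t all be true, so accordance needn’t be an additive, normalised measure over the claims.

```python
def accordance(claim_cm, reading_cm=14.0, band_cm=0.05):
    # 1.0 if the claimed length is compatible with the measurement model
    # (uniform ruler/manufacturing error of +/- band_cm around the reading)
    return 1.0 if abs(claim_cm - reading_cm) <= band_cm else 0.0

# all three claims are equally in accordance with the model
values = [accordance(14.0), accordance(14.02), accordance(13.98)]
```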

Did you ever skim that document I sent you? It needs a lot of work but at least starts to develop this notion.

]]>Yes, your interpretation of Bell’s Theorem is correct. Most people are not willing to give up locality *or* realism, yet Bell’s Theorem and experimental violation of the inequality demand that one of them be wrong.

I think you are a little quick to dismiss dropping local realism – it’s the most literal interpretation of quantum mechanics (or, better, it’s an interpretation of classical mechanics in terms of quantum mechanics, rather than the other way around). I don’t think there’s any reason classical states need be fundamental, so long as quantum mechanics predicts (as it does) that our observations are always of definite states.

]]>Just means there is a *function* theta -> P(A). The parameter space need not have any of the additional structure required to make it a probability space, ie subsets etc of the parameter space need not satisfy the Kolmogorov axioms.

For example, in an ‘all models are wrong’ context there is no need for theta (models) to come with mutually exclusive, sum to one structure. Bayesians seem to like to call this sort of case ‘M-open’.

To a frequentist, (or likelihoodist or other) a pvalue and related quantities are a *function* of the ‘hypothesis’, yes, evaluated at one or more values, but they reserve the word ‘conditioning’ for formal statistical conditioning.

One reason to be pedantic about this is that the formal structure of an additive, normalised measure implies, in all interpretations as far as I can see, that only one model can be ‘true’.

To me this structure makes reasonable sense over observables, but not over theoretical constructs. But sure a Bayesian can call it a ‘conditional’ probability, it just won’t necessarily make sense to anyone outside of the Bayesian realm.

]]>“Conditional probabilities P(E|H), or conditional previsions, P(X|H), are expressible, in cases where H has nonzero probability, in terms of the unconditional probabilities by means of a formula which, in an abstract, axiomatic treatment, can be taken as a definition: P(E|H)=P(EH)/P(H), P(X|H)=P(XH)/P(H).”

To ensure that the conditional probability remains coherent even when P(H)=0 (where the ratio is undefined), he adds a third axiom:

“Axiom 3 The conditions of coherence (Axioms 1 and 2) must be satisfied, also, by the P_H conditional on a possible H, where P_H(E)=P(E|H), P_H(E|A)=P(E|AH) is to be understood.”

to the first two axioms:

“Axiom 1 Non‐negativity: if we certainly have X ⩾ 0, we must have P(X) ⩾ 0.

Axiom 2 Additivity (finite): P(X+Y)=P(X)+P(Y).”

In de Finetti’s (non-axiomatic) theory, there is a theorem stating that

“A necessary and sufficient condition for coherence in the evaluation of P(X|H), P(H) and P(HX), is compliance with the relation P(HX)=P(H)*P(X|H), in addition to the inequalities inf(X|H) ≤ P(X|H) ≤ sup(X|H), and 0 ≤ P(H) ≤ 1; in the case of an event, X = E, P(HE)= P(H)*P(E|H), is called the theorem of compound probabilities, and the inequality for P(X|H) reduces to 0 ≤ P(E|H) ≤ 1 (being = 0, or = 1, in the case where EH, or E͂H, respectively, is impossible).”
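The quoted relation P(HE) = P(H)*P(E|H) can be checked on a tiny discrete example (a fair die; the choice of events is mine), taking the ratio form as the definition of P(E|H):

```python
from fractions import Fraction

# fair six-sided die; H = "even", E = "greater than 3"
omega = range(1, 7)

def P(event):
    return Fraction(sum(1 for w in omega if event(w)), 6)

H = lambda w: w % 2 == 0
E = lambda w: w > 3
HE = lambda w: H(w) and E(w)

P_E_given_H = P(HE) / P(H)  # the ratio form, taken as a definition
# theorem of compound probabilities: P(HE) = P(H) * P(E|H)
check = P(HE) == P(H) * P_E_given_H
```

Of course this only works because P(H) > 0; the P(H)=0 case is exactly what the extra axiom above is for.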

]]>“… Bayesians will sometimes describe p-values as probabilities conditional on the null parameter value. Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter, and p-values are probabilities computed using the null distribution. Conditioning on parameter values makes no sense when parameters are not random variables.”

“The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

At December 30, 2018 at 10:50 pm, I replied: “The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

In this statement, I was not using the phrase “conditional probability” in the sense that we have been discussing – but in a broader meaning than that: namely, a probability defined with some restriction (assumption) involved. To (attempt to) describe the situation a little more clearly: In calculating a p-value, one assumes a particular type of distribution and particular values of parameters for that type of distribution, and uses these in calculating the particular probability that is called the p-value. Thus the calculation depends on the type of distribution and the particular values of the parameters of that type of distribution. This is what I meant by a “conditional probability” here: a probability calculated under certain specified conditions.

]]>“However, I can elaborate by saying that I would not talk about “the probability of event A given B” unless the system of “probabilities given A” satisfies the first three of Kolmogorov’s axioms. From these, one can derive the formula for conditional probability, provided that (or, as I might say, given that) the system of probabilities given A does satisfy the three axioms.”

I realize now that my last sentence in the quote is wrong. I think I was remembering something else related that I had worked through carefully a few years ago. So I’m going back to what started this discussion:

At December 26, 2018 at 12:30 pm, Ran said

“… Bayesians will sometimes describe p-values as probabilities conditional on the null parameter value. Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter, and p-values are probabilities computed using the null distribution. Conditioning on parameter values makes no sense when parameters are not random variables.”

“The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

At December 30, 2018 at 10:50 pm, I replied: “The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

In this statement, I was not using the phrase “conditional probability” in the sense that we have been discussing – but in a broader meaning than that: namely, a probability defined with some restriction (assumption) involved. To (attempt to) describe the situation a little more clearly: In calculating a p-value, one assumes a particular type of distribution and particular values of parameters for that type of distribution, and uses these in calculating the particular probability that is called the p-value. Thus the calculation depends on the type of distribution and the particular values of the parameters of that type of distribution. This is what I meant by a “conditional probability” here: a probability calculated under certain specified conditions.

]]>seems to point to Andrew actually preferring a different axiom system than Kolmogorov (though he may not have enough formal logic background to be able to choose a particular one for example), which melds with my theory that in fact Kolmogorov’s axioms aren’t well suited to the use that Bayesians put probability to.

I should probably go get de Finetti’s book, but balk a bit at spending $75 for the Kindle edition and I don’t have space to put the hardcover :-(

]]>I’ve argued that all real world measurements are discrete, they ultimately measure things like a count of atoms, electrons, etc. It’s just that those things are so small that we can treat them as continuous for most real-world purposes.

I am interested in the mixture of finite additivity and IST/nonstandard analysis and whether that gets us anything meaningfully different from Kolmogorov and his sigma algebras. Measure theory doesn’t seem to map to science particularly well, whereas extremely close discrete steps whose discreteness we wish to ignore *does*, and that’s exactly what you get from IST.

]]>I agree that conditioning seems to need more axioms than the three Kolmogorov ones Martha linked to at Wiki: https://en.wikipedia.org/wiki/Probability_axioms

In a system where we take conditioning as axiomatic and based on a set of propositions, then we can use axioms of set theory to handle unions of additional propositions that we “condition on”. I think this could put Martha’s statements which OJM interpreted as “if B then p(A)” onto a strong footing.

All that is to say that perhaps to Andrew and Martha and others (myself perhaps) the Kolmogorov axioms aren’t the most intuitive and shouldn’t be taken as primary. Both Martha and Andrew have expressed a preference for conditioning to have a primary role. I don’t think either of them is a logician or set theorist (particularly not Andrew), so I think their informal intuitions are at least an interesting piece of information about how successful axiom systems are at axiomatizing intuitive ideas.

]]>http://bulletin.imstat.org/2018/12/obituary-frank-hampel-1941-2018/

]]>But like Carlos, I don’t see how you take the Kolmogorov axioms and get conditional probability without *defining* it eg via the ratio form.

Which I think comes back to the point about axiomatics – if people have different intuitions about which are the basic concepts then they may prefer different axiom systems.

]]>Accept that p(A|B) is shorthand for p(A|Union(B,K))

From Cox’s axiom about conjunction we have p(A,B) = p(A|B)p(B) essentially axiomatically.

all the stuff about adding up to 1 and negation and so forth comes out of the other Cox axioms…

Now to handle frequentism we restrict the whole thing to sequences of numbers, and K to knowledge of the properties of the sequences.

p(SomeEvent | SomeEvent is a logical statement about certain numbers subsetted in a known way from an infinite sequence of real numbers that passes Per Martin-Lof’s most powerful test for randomness)

and you have Kolmogorov’s probability theory as a special case of Cox’s theorems *waves hands wildly* ;-)

]]>RIP Frank Hampel by the way! :-(

]]>Given a pre-existent definition of “probability” you can “derive” the axioms as well. Kolmogorov’s book has a section titled “The Empirical Deduction of the Axioms.”

]]>If B then P(A),

where P(A) satisfies the Kolmogorov axioms and the above is a logical statement?

Can I multiply the logical statement ‘if B then P(A)’ by a number?

Can you then *always* ‘invert’ the conditioning to give P(B|A)? Is it possible for P(A|B) to make sense while P(B|A) doesn’t?

Maybe it would be helpful to explicitly give the derivation you mention?

]]>https://www.tandfonline.com/doi/abs/10.1080/15598608.2009.10411908

]]>I tried twice last night to respond to ojm’s comment, “This seems to require a formal definition of ‘given’…can you elaborate on your preference in terms of a mathematical definition?”, but both attempts never got posted. So here’s another try:

1) I am using “given” in the sense of Definition 2 of The American Heritage Dictionary, 1985, which reads, “Granted as a supposition.” (This is also the usage that is often used in mathematics to describe the hypothesis of a theorem or conjecture.)

2) I’m not sure I can elaborate on my preference in terms of a mathematical definition – because I’m not sure what you are asking for. Perhaps if you rephrase your question, I can respond to it.

3. I think my response (1) at least in part replies to Carlos’ question, “what is the definition of “the probability of event A given that event B occurs” then?” To elaborate: I don’t claim that there is any isolated definition of “the probability of event A given B”, just as there is no single definition of “the probability of event A”. However, I can elaborate by saying that I would not talk about “the probability of event A given B” unless the system of “probabilities given A” satisfies the first three of Kolmogorov’s axioms. From these, one can derive the formula for conditional probability, provided that (or, as I might say, given that) the system of probabilities given A does satisfy the three axioms.

]]>Take the descriptive ‘reproducibility’. It is sometimes used interchangeably with ‘replication’. Goodman, Fanelli, and Ioannidis explore its definitions in the following article.

and we can say p(A|B,K) is the probability when the state of knowledge is {K} union {“B is True”}, since the probabilities are describing plausibility of boolean propositions in Cox’s view, and the union of sets of propositions is a well defined thing.

]]>I agree with ojm: what is the definition of “the probability of event A given that event B occurs” then?

At some point a more substantial definition is needed which is not just swapping synonyms like “conditional on”, “given that”, “contingent upon”, “subject to”, etc.

I wouldn’t say that the formula is “derived” from that “definition”, I would say that the formula is the definition.

In the continuous case there is a problem conditioning on zero measure events, but it can be handled by taking limits adequately.
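A numeric illustration of that limiting procedure (the joint density f(x,y) = x + y on the unit square and the grid size are my choices, purely for illustration): the conditional probability given X in a shrinking interval settles toward the value one would assign conditional on the zero-measure event X = 0.3.

```python
def prob_box(f, x0, x1, y0, y1, n=200):
    # midpoint Riemann sum for P((X, Y) in the box) under joint density f
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    return sum(f(x0 + (i + 0.5) * dx, y0 + (j + 0.5) * dy)
               for i in range(n) for j in range(n)) * dx * dy

f = lambda x, y: x + y  # a valid joint density on the unit square

def cond_prob(eps):
    # P(Y <= 0.5 | X in [0.3, 0.3 + eps]) via the ratio of box probabilities
    num = prob_box(f, 0.3, 0.3 + eps, 0.0, 0.5)
    den = prob_box(f, 0.3, 0.3 + eps, 0.0, 1.0)
    return num / den

estimates = [cond_prob(eps) for eps in (0.1, 0.01, 0.001)]
# analytically, the limit as eps -> 0 is 0.275 / 0.8 = 0.34375
```

Each finite-eps quantity is an ordinary ratio-form conditional probability; the conditional density given X = 0.3 exactly is only defined through this limit.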

]]>However, probability existed as a concept before Kolmogorov came up with the axioms, and although the axioms were meant to define probability in the realm of mathematics, they were also meant to connect the mathematical definition to a range of ideas about probabilities that were around already. Mathematicians can define concepts within mathematics but their definitions don’t come with any particular authority outside their own field. In model theory mathematicians have a formal definition for what a model is, but this doesn’t mean that everyone who discusses models outside this framework has lost their legitimacy.

The history of probability before Kolmogorov is very instructive, particularly how people arrived at the explicit insight that, when people talked about probability and even published about it, they used two or even more genuinely different interpretations of that concept, and how this was missed or at least not acknowledged by anyone before 1830 or so. I believe that many of the current interpretations of probability can be traced back to historical views of what probability is that were not seen as contradictory for quite some time, but seem contradictory or at least incompatible to us.

Generally I think that terms such as “probability” come with a complex bulk of roots and meanings; they are in use in partly incompatible ways for a long time, and the implications of this don’t go away just because mathematicians try to condense this into a clear unambiguous definition. I’m fine with the mathematicians doing their job there and I will play by their rules most of the time, but I will not grant them absolute definition power (and neither will I grant this to anybody else). Probability is all kinds of things and not just one of them “really”.

]]>As Martha said, cringing at defining p(a|b) = p(a,b)/p(b) is something some professional mathematicians do. It’s possible to axiomatize the system where p(a|b) is primary, and then define p(a,b) = p(a|b)p(b) = p(b|a)p(a). This has good properties: for example, there’s no division by zero when p(b) or p(a) = 0.

]]>Other axiomatic treatments can derive the ratio form *by including conditional probability in the axioms and primitives*. I think de Finetti and Popper etc do this.

But in order to reject the idea that (elementary) conditional probability is *defined* by the ratio form I think you need to work from an alternative axiomatic system where this concept is defined.

So I do think people are prioritising intuitions about eg conditional probability over axiomatics, somewhat contrary to the theme of the post.

Or not? See also below – how do you define conditional probability within the Kolmogorov system?

]]>