Comments on: What is probability?

By: John Xie

John Xie — Thu, 19 Nov 2020 23:29:47 +0000

What about the propensity interpretation of probability? See this article for more details: https://www.sciencedirect.com/science/article/pii/S0307904X81800510.

By: Thanatos Savehn

Thanatos Savehn — Mon, 14 Jan 2019 06:05:57 +0000

Apparently there is not complete overlap between decision theory and probability theory. Hmmmm …

By: Andrew

Andrew — Mon, 14 Jan 2019 04:28:32 +0000

In reply to Peter Gerdes. Peter: I do not in general think it is a good idea to speak of "the probability that a certain hypothesis is true"; see this article for elaboration of this point; Figure 1 illustrates the sort of probabilistic thinking that I don't generally like in science.

By: Peter Gerdes

Peter Gerdes — Mon, 14 Jan 2019 04:21:42 +0000

I very much like this focus on the mathematical formalism, but ultimately we have to decide what it means when we make various applications of probability: for instance, when we claim that the probability that a certain hypothesis is true is such and such, or that our scientific theory predicts that the probability of measuring an electron spin up is such and such. Yes, the fact that we call these things probabilities implies that we believe they obey the relevant mathematical axioms, but we also believe those claims have genuine empirical content (we can give the wrong answer to the question of what is the probability the electron will be measured spin up even if we offer a probability assignment obeying the axioms).

So yah, I’m all for defining probability in general as anything that satisfies the axioms. It avoids the confusion stemming from the assumption that all probabilistic claims refer to the same content. However, it doesn’t relieve us of having to explain what those underlying claims mean.

By: David Marcus

David Marcus — Mon, 07 Jan 2019 14:36:07 +0000

In reply to Brendan. Schrödinger's point was that the cat must be either alive or dead before we look. If you agree that it is alive or dead after we look, then you are not denying perceptual realism. If you insist that it is both alive and dead after we look, then it isn't clear what you mean by "locality". And, we are probably both having and not having this discussion.

By: Brendan

Brendan — Mon, 07 Jan 2019 10:58:10 +0000

In reply to David Marcus. As you said above, realism is the idea that it's absurd for the cat to be both alive and dead. Norsen would call this 'perceptual realism', though I disagree with his conclusions.

By: David Marcus

David Marcus — Sat, 05 Jan 2019 17:05:04 +0000

In reply to Daniel Lakeland.

Bell does define “locality”. See the articles collected in his book or see http://www.scholarpedia.org/article/Bell's_theorem#Bell.27s_definition_of_locality . Here is a quote from the latter: “Bell explained the ‘principle of local causality’ as follows: The direct causes (and effects) of events are near by, and even the indirect causes (and effects) are no further away than permitted by the velocity of light. In relativistic terms, locality is the requirement that goings-on in one region of spacetime should not affect — should not influence — happenings in space-like separated regions.”

By: David Marcus

David Marcus — Fri, 04 Jan 2019 21:07:46 +0000

In reply to Brendan.

For a detailed discussion of “realism” and its relationship to Bell’s Theorem, see “Against ‘Realism’” by Travis Norsen, https://arxiv.org/abs/quant-ph/0607057 .

By: Keith O’Rourke

Keith O’Rourke — Fri, 04 Jan 2019 19:12:08 +0000

In reply to Christian Hennig.

> surely both play a role that is hard to disentangle
Of course, that is why I prefaced with primarily.

> Peirce, insightful as ever, tends to the cultural side in that quote
More likely he tends to bounce around – he raises numerous interesting considerations on any question, deliberates these repeatedly (one might even say incessantly and endlessly) and seldom brings any closure that lasts (for him anyway).

It seems that after 1908 Peirce does give a larger role to the object of the sign rather than the interpretant of the sign.

If I am understanding him, this gives the reality that we have no direct access to more of a role in shaping how we represent and interpret it.

OK a bit of moderation of the constructionist perspective, although you may see this differently ;-)

By: Christian Hennig

Christian Hennig — Fri, 04 Jan 2019 18:39:50 +0000

In reply to Keith O’Rourke. It doesn't have to be *either* culture *or* reality of course; surely both play a role that is hard to disentangle. Peirce, insightful as ever, tends to the cultural side in that quote, I'd think, although you may see this differently (actually the two may be hard to disentangle even in Peirce's words).

By: Martha (Smith)

Martha (Smith) — Fri, 04 Jan 2019 17:25:32 +0000

In reply to Keith O’Rourke. Good points.

By: Martha (Smith)

Martha (Smith) — Fri, 04 Jan 2019 17:18:27 +0000

In reply to Pedro.

Pedro said,
“But don’t you also think that, say, degrees of belief and long-run frequencies are also “probabilities” in another useful sense? ”

Indeed, I see degrees of belief and long-run frequencies as intuitive/conceptual ideas of probability that we try to *model* mathematically.

For elaboration on my view, please see http://statmodeling.stat.columbia.edu/2014/01/16/22571/#comment-153299 (which Andrew gave a link to in his post), and also the further link (https://web.ma.utexas.edu/users/mks/statmistakes/probability.html) given in that comment.

By: Keith O’Rourke

Keith O’Rourke — Fri, 04 Jan 2019 15:27:28 +0000

In reply to Christian Hennig.

Thoughtful comment – as if any concept (sign, representation, symbol, word etc.) could have a single meaning that could be fixed for all time.

> define concepts within mathematics
These mathematical definitions of the concept are only one aspect (see below) but they do try to make hard pauses in the unfolding process of interpretation.

“The second grade of clarity is to have, or be capable of providing, a definition of the concept. This definition should also be abstracted from any particular experience, i.e., it should be general. So, my ability to provide a definition of gravity (as, say, a force which attracts objects to a point, like the center of the earth) represents a grade of clarity or understanding over and above my unreflective use of that concept in walking, remaining upright, etc.” https://www.iep.utm.edu/peircepr/

Now, Christian, is it culture that primarily drives the multi-dimensional evolving interpretation of the concept or the reality that we have no direct access to but which impacts us in many ways when the concept is applied…

By: David Marcus

David Marcus — Fri, 04 Jan 2019 14:32:17 +0000

In reply to Brendan.

I don’t know what you mean by “realism” or “local realism”, but they aren’t hypotheses for Bell’s Theorem. See http://www.scholarpedia.org/article/Bell%27s_theorem#Bell.27s_theorem_proves_the_impossibility_of_.22local_realism.22

By: Pedro

Pedro — Fri, 04 Jan 2019 06:38:23 +0000

In reply to Martha (Smith).

Andrew, Martha:

Thank you for the clarification. I suspect, however, that many people take “probability is a mathematical concept” to imply something stronger than “Kolmogorov’s axioms can be useful for many different purposes, and there are challenges to every interpretation of probability.” Perhaps the post would be less controversial if phrased differently? To give one example, Kolmogorov’s axioms may also be useful to calculate the normalized mass/length/area/volume/etc of physical objects; do you think, then, that physical mass is a probability? I mean, it trivially is in the boring sense that it follows the axioms! But don’t you also think that, say, degrees of belief and long-run frequencies are also “probabilities” in another useful sense? If not, then the dispute between Bayesians and frequentists would be very puzzling (akin to statisticians criticizing physicists for using Kolmogorov’s axioms to calculate physical masses; they are just talking about completely unrelated concepts that happen to use the same axiomatic tool).
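The normalized-mass example can be made concrete with a minimal sketch. The part names and masses below are made up for illustration; the code only checks that mass fractions over a finite set of parts satisfy non-negativity, normalization, and finite additivity, which is the "boring sense" in which they behave like a probability.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical masses (in grams) of the parts of some object; the values are made up.
masses = {"handle": Fraction(120), "blade": Fraction(60), "screws": Fraction(20)}
total = sum(masses.values())


def measure(parts):
    """Normalized mass of a set of parts: the fraction of the total mass they carry."""
    return sum(masses[p] for p in parts) / total


def all_subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def satisfies_axioms():
    """Check non-negativity, normalization, and finite additivity on all subsets."""
    subsets = list(all_subsets(masses))
    nonnegative = all(measure(s) >= 0 for s in subsets)
    normalized = measure(masses.keys()) == 1
    additive = all(
        measure(a | b) == measure(a) + measure(b)
        for a in subsets
        for b in subsets
        if not (a & b)  # only disjoint pairs
    )
    return nonnegative and normalized and additive


print(satisfies_axioms())
```

Nothing in the check says what the numbers mean, which is the point of the question: the axioms are satisfied whether the numbers are read as mass fractions, frequencies, or degrees of belief.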

By: Martha (Smith)

Martha (Smith) — Fri, 04 Jan 2019 03:07:59 +0000

In reply to Andrew.

“… see the P.S. above for the reason I wrote this post.”

Especially the last two sentences: “In practice, probability is not a perfect model for any of these scenarios: long-run frequencies are in practice not stationary, betting depends on your knowledge of the counterparty, uncertainty includes both known and unknown unknowns, decision making is open-ended, and statistical inference is conditional on assumptions that in practice will be false. That said, probability can be a useful tool for all these problems.”

By: Andrew

Andrew — Fri, 04 Jan 2019 01:49:04 +0000

In reply to Pedro.

Pedro:

If you really want groundbreaking stuff, you shouldn’t be reading blogs . . .

Seriously, though, see the P.S. above for the reason I wrote this post. I do think there’s a lot of confusion out there.

By: Pedro

Pedro — Fri, 04 Jan 2019 01:46:40 +0000

“Probability is a mathematical concept.”
I’m struggling to understand how this could be both true *and* not trivial. Consider:

“The concept of probability has been used long before Kolmogorov; we still use the term, both in science and ordinarily, in ways that do not correspond to a mere mathematical formalism; there are alternative axiomatizations of probability (e.g. Cox’s; some people reject countable additivity, etc.); etc.; etc.”
“Yes, but those are different meanings of probability. I’m specifically referring to the mathematical concept.”

So, the mathematical concept of probability is a mathematical concept. Groundbreaking stuff.

By: Martha (Smith)

Martha (Smith) — Thu, 03 Jan 2019 23:30:01 +0000

In reply to Martha (Smith).

At January 3, 2019 at 4:36 am, OJM said:

“> Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter
Just means there is a *function* theta -> P(A). The parameter space need not have any of the additional structure required to make it a probability space, ie subsets etc of the parameter space need not satisfy the Kolmogorov axioms.”

This is more or less what I was trying to say by
“The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

In this statement, I was not using the phrase “conditional probability” in the “traditional” sense – but in a broader meaning than that: namely, a probability defined with some restriction (assumption) involved.

To (attempt to) describe the situation a little more clearly: In calculating a p-value, one assumes a particular type of distribution and particular values of parameters for that type of distribution, and uses these in calculating the particular probability that is called the p-value. Thus the calculation depends on the type of distribution and the particular values of the parameters of that type of distribution. This is what I meant by a “conditional probability” here: a probability calculated under certain specified conditions.”

Having thought about it more, I am inclined to suggest terminology that is not quite what OJM said, but more like this: I use the word “conditional” in two meanings.

One is the traditional probability usage of “conditional on an event A”, where the event is in the sigma-algebra on which the probability function P is defined.

The other is to express that the probability function P itself is defined in terms of (i.e., a function of) other types of “objects”: In the case of calculating the p-value, one uses a posited probability model, so the p-value calculation depends on the type of model and the parameters of the model.
The discussions we’ve had suggest that it’s a good idea to use a different notation for this – perhaps P(A||model, parameters) would help avoid confusion.

(In calculating p-values, there is an added complication: the p-value would be defined in the above terminology as something like P(A|| normal model with mean mu-naught and standard deviation sigma) where sigma is unknown, so the p-value is estimated using an estimate of sigma from the data.)
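A minimal sketch of the second meaning of “conditional”, under stated assumptions: the data are made up, the model is a normal model for the sample mean, and the function name `p_value_two_sided` is hypothetical. The point is only that the p-value is a function of the posited model and its parameters, and that plugging in an estimate of sigma gives an estimated p-value.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def p_value_two_sided(sample_mean, n, mu0, sigma):
    """Two-sided tail probability of the sample mean under a posited
    normal model with mean mu0 and standard deviation sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


data = [5.1, 4.9, 5.6, 5.8, 5.3, 5.4]  # made-up observations
xbar, n = mean(data), len(data)

# Same data, different posited parameters: the result depends on (mu0, sigma).
p_a = p_value_two_sided(xbar, n, mu0=5.0, sigma=0.5)
p_b = p_value_two_sided(xbar, n, mu0=5.3, sigma=0.5)

# Sigma unknown: plug in the sample standard deviation (an assumption of this
# sketch; a t reference distribution is commonly used in this situation).
p_plugin = p_value_two_sided(xbar, n, mu0=5.0, sigma=stdev(data))

print(p_a, p_b, p_plugin)
```

No probability is placed on mu0 or sigma anywhere in the calculation; they enter only as arguments, which is what the P(A||model, parameters) notation is meant to signal.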

By: Daniel Lakeland

Daniel Lakeland — Thu, 03 Jan 2019 18:09:14 +0000

In reply to Martha (Smith).

ojm said>

One reason to be pedantic about this is that the formal structure of an additive, normalised measure implies, in all interpretations as far as I can see, that only one model can be ‘true’.

I don’t think it’s the additive normalized measure here but rather the model of probability as a measure over plausibility of the truth values of propositions. It’s not possible for the propositions X=2 and X=3 to both be true.

This is why I started working on a view of probability as a measure over “accordance” where I mean by accordance something like “compatibility with what’s assumed for the model”

It’s possible for “the length of my pencil is 14.0cm” and “the length of my pencil is 14.02cm” and “the length of my pencil is 13.98cm” to all have equally good accordance with the particular theory of measurement that I’m using (for example if I know that there is a near uniform spread of errors in manufacturing the ruler I’m using over this range of errors and I know how good my optical discernment of the marks on the ruler are and I have a certain view on what it means for a pencil to “have a length” such that it isn’t relevant to my measurement whether or not a speck of dust lands on the tip and sticks out a bit).

Did you ever skim that document I sent you? It needs a lot of work but at least starts to develop this notion.

By: Brendan

Brendan — Thu, 03 Jan 2019 15:23:28 +0000

In reply to David Marcus.

David Marcus,

Yes, your interpretation of Bell’s Theorem is correct. Most people are not willing to give up locality *or* realism, yet Bell’s Theorem and the experimental violation of the inequality demand that one of those be wrong.

I think you are a little quick to dismiss dropping local realism – it’s the most literal interpretation of quantum mechanics (or, better, it’s an interpretation of classical mechanics in terms of quantum mechanics, rather than the other way around). I don’t think there’s any reason classical states need be fundamental, so long as quantum mechanics predicts (as it does) that our observations are always of definite states.

By: ojm

ojm — Thu, 03 Jan 2019 09:36:55 +0000

In reply to Martha (Smith).

> Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter

Just means there is a *function* theta -> P(A). The parameter space need not have any of the additional structure required to make it a probability space, ie subsets etc of the parameter space need not satisfy the Kolmogorov axioms.

For example, in an ‘all models are wrong’ context there is no need for theta (models) to come with mutually exclusive, sum to one structure. Bayesians seem to like to call this sort of case ‘M-open’.
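A small sketch of the "function theta -> P(A)" reading, assuming a hypothetical model X ~ Normal(theta, 1) and the event A = {X > 1}; both choices are illustrative and not taken from the thread. For each fixed theta the model is a probability distribution over data, but the values across theta are just function values with no normalization imposed.

```python
from statistics import NormalDist


def prob_A(theta, threshold=1.0):
    """P_theta(X > threshold) under the hypothetical model X ~ Normal(theta, 1)."""
    return 1 - NormalDist(mu=theta, sigma=1.0).cdf(threshold)


thetas = [0.0, 0.5, 1.0, 2.0]  # an arbitrary grid of parameter values
values = {theta: prob_A(theta) for theta in thetas}

# Each entry is a probability over data for that theta; the entries taken
# together are not a distribution over theta and need not sum to one.
print(values)
print(sum(values.values()))
```

Turning these values into statements about theta itself would require extra structure on the parameter space (for example a prior), which is the step this reading declines to take.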

To a frequentist, (or likelihoodist or other) a pvalue and related quantities are a *function* of the ‘hypothesis’, yes, evaluated at one or more values, but they reserve the word ‘conditioning’ for formal statistical conditioning.

One reason to be pedantic about this is that the formal structure of an additive, normalised measure implies, in all interpretations as far as I can see, that only one model can be ‘true’.

To me this structure makes reasonable sense over observables, but not over theoretical constructs. But sure a Bayesian can call it a ‘conditional’ probability, it just won’t necessarily make sense to anyone outside of the Bayesian realm.

By: Carlos Ungil

Carlos Ungil — Thu, 03 Jan 2019 06:57:00 +0000

In reply to Martha (Smith). For the record, in his axiomatic formulation de Finetti defines conditional probability in the same way as Kolmogorov: “Conditional probabilities P(E|H), or conditional previsions, P(X|H), are expressible, in cases where H has nonzero probability, in terms of the unconditional probabilities by means of a formula which, in an abstract, axiomatic treatment, can be taken as a definition: P(E|H)=P(EH)/P(H), P(X|H)=P(XH)/P(H).”

To ensure that when P(H)=0 the conditional probability (undefined) is coherent he adds a third axiom: “Axiom 3 The conditions of coherence (Axioms 1 and 2) must be satisfied, also, by the P_H conditional on a possible H, where P_H(E)=P(E|H), P_H(E|A)=P(E|AH) is to be understood.” to the first two axioms: “Axiom 1 Non‐negativity: if we certainly have X ⩾ 0, we must have P(X) ⩾ 0. Axiom 2 Additivity (finite): P(X+Y)=P(X)+P(Y).”

In de Finetti’s (non-axiomatic) theory, there is a theorem stating that “A necessary and sufficient condition for coherence in the evaluation of P(X|H), P(H) and P(HX), is compliance with the relation P(HX)=P(H)*P(X|H), in addition to the inequalities inf(X|H) ≤ P(X|H) ≤ sup(X|H), and 0 ≤ P(H) ≤ 1; in the case of an event, X = E, P(HE)= P(H)*P(E|H), is called the theorem of compound probabilities, and the inequality for P(X|H) reduces to 0 ≤ P(E|H) ≤ 1 (being = 0, or = 1, in the case where EH, or E͂H, respectively, is impossible).”

By: Martha (Smith)

Martha (Smith) — Thu, 03 Jan 2019 06:38:12 +0000

In reply to Martha (Smith).

(Apologies if this appears twice — the first time I tried to post it, it didn’t seem to appear.)

“… Bayesians will sometimes describe p-values as probabilities conditional on the null parameter value. Frequentists will insist that there is no conditioning going on. Instead, there is a statistical model, which is a family of data distributions indexed by a parameter, and p-values are probabilities computed using the null distribution. Conditioning on parameter values makes no sense when parameters are not random variables.”
At December 30, 2018 at 10:50 pm, I replied: “The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

In this statement, I was not using the phrase “conditional probability” in the sense that we have been discussing – but in a broader meaning than that: namely, a probability defined with some restriction (assumption) involved. To (attempt to) describe the situation a little more clearly: In calculating a p-value, one assumes a particular type of distribution and particular values of parameters for that type of distribution, and uses these in calculating the particular probability that is called the p-value. Thus the calculation depends on the type of distribution and the particular values of the parameters of that type of distribution. This is what I meant by a “conditional probability” here: a probability calculated under certain specified conditions.

By: Martha (Smith)

Martha (Smith) — Thu, 03 Jan 2019 06:29:51 +0000

In reply to Martha (Smith).

OK, I’ve had a chance to look at this more carefully. On January 1 at 6:52 pm, I wrote:

“However, I can elaborate by saying that I would not talk about “the probability of event A given B” unless the system of “probabilities given A” satisfies the first three of Kolmogorov’s axioms. From these, one can derive the formula for conditional probability, provided that (or, as I might say, given that) the system of probabilities given A does satisfy the three axioms.”

I realize now that my last sentence in the quote is wrong. I think I was remembering something else related that I had worked through carefully a few years ago. So I’m going back to what started this discussion:

At December 26, 2018 at 12:30 pm, Ran said

At December 30, 2018 at 10:50 pm, I replied: “The p-value is a conditional probability: conditioned on the model and the null hypothesis (although one might alternatively consider the null hypothesis part of the model).”

By: Daniel Lakeland

Daniel Lakeland — Wed, 02 Jan 2019 23:14:36 +0000

In reply to Martha (Smith).

In particular, Andrew’s comment from http://statmodeling.stat.columbia.edu/2018/12/26/what-is-probability/#comment-936894

seems to point to Andrew actually preferring a different axiom system than Kolmogorov (though he may not have enough formal logic background to be able to choose a particular one for example), which melds with my theory that in fact Kolmogorov’s axioms aren’t well suited to the use that Bayesians put probability to.

I should probably go get de Finetti’s book, but balk a bit at spending $75 for the Kindle edition and I don’t have space to put the hardcover :-(

By: Daniel Lakeland

Daniel Lakeland — Wed, 02 Jan 2019 23:06:59 +0000

In reply to Christian Hennig.

Countably infinite additivity seems to be a pure-math issue, in the sense that Hamming meant when he said that he’d never fly in an airplane whose design required Lebesgue integration and not Riemann integrals.

I’ve argued that all real world measurements are discrete, they ultimately measure things like a count of atoms, electrons, etc. It’s just that those things are so small that we can treat them as continuous for most real-world purposes.

I am interested in the mixture of finite additivity and IST/nonstandard analysis and whether that gets us anything meaningfully different from Kolmogorov and his sigma algebras. Measure theory doesn’t seem to map to science particularly well, whereas extremely close discrete steps whose discreteness we wish to ignore *does*, and that’s exactly what you get from IST.

By: Daniel Lakeland

Daniel Lakeland — Wed, 02 Jan 2019 22:49:06 +0000

In reply to Martha (Smith).

Carlos, OJM, yes I agree this particular subthread (as opposed to, say, the broader discussion in this post, like Andrew’s original statement about axioms) really is about Kolmogorov rather than axiom systems more generally, so my most recent post was intended, as OJM mentioned, to show a preference for an alternative axiom system that I think can be considered more general than Kolmogorov in the sense that you can get Kolmogorov’s system when you limit it to certain kinds of propositions. But, as with so much on blogs, I don’t deny that my treatment is handwavy and could use some careful thought. Perhaps I’ll write my own blog post about it.

I agree that conditioning seems to need more axioms than the three Kolmogorov ones Martha linked to at Wiki: https://en.wikipedia.org/wiki/Probability_axioms

In a system where we take conditioning as axiomatic and based on a set of propositions, then we can use axioms of set theory to handle unions of additional propositions that we “condition on”. I think this could put Martha’s statements which OJM interpreted as “if B then p(A)” onto a strong footing.

All that is to say that perhaps to Andrew and Martha and others (myself perhaps) the Kolmogorov axioms aren’t the most intuitive and shouldn’t be taken as primary. Both Martha and Andrew have expressed a preference for conditioning to have a primary role. I don’t think either of them is a logician or set theorist (Andrew in particular), so I think their informal intuitions are at least an interesting piece of information about how successful axiom systems are at axiomatizing intuitive ideas.

By: ojm

ojm — Wed, 02 Jan 2019 20:52:12 +0000

In reply to Christian Hennig.

Oh no! :-(

http://bulletin.imstat.org/2018/12/obituary-frank-hampel-1941-2018/

By: Carlos Ungil

Carlos Ungil — Wed, 02 Jan 2019 20:34:45 +0000

In reply to Martha (Smith). Daniel, the discussion here is (I think) about how conditional probabilities need to be defined in Kolmogorov’s theory of probability. Kolmogorov’s axioms really means Kolmogorov’s axioms here, not just any axiomatic definition of probability. Martha even sent a link to remove any ambiguity.

By: ojm

ojm — Wed, 02 Jan 2019 20:10:11 +0000

In reply to Martha (Smith).

There are certainly axiomatisations of probability where conditional probability is taken as basic and the Kolmogorov definition is a theorem – see also de Finetti and Popper, I think.

But like Carlos, I don’t see how you take the Kolmogorov axioms and get conditional probability without *defining* it, e.g. via the ratio form.

Which I think comes back to the point about axiomatics – if people have different intuitions about which are the basic concepts then they may prefer different axiom systems.

By: Daniel Lakeland

Daniel Lakeland — Wed, 02 Jan 2019 19:51:26 +0000

In reply to Martha (Smith).

Fortunately, I think Cox-Bayes gives a solution to all of this at least for the case where we’re discussing boolean statements. Accept that p(A|K) is an assignment of a probability based on a state of information which is a set of true propositions, and that p(A) is a shorthand notation for this where K is implicit.

Accept that p(A|B) is shorthand for p(A|Union(B,K))

From Cox’s axiom about conjunction we have p(A,B) = p(A|B)p(B) essentially axiomatically.

All the stuff about adding up to 1 and negation and so forth comes out of the other Cox axioms…

Now to handle frequentism we restrict the whole thing to sequences of numbers, and K to knowledge of the properties of the sequences.

p(SomeEvent | SomeEvent is a logical statement about certain numbers subsetted in a known way from an infinite sequence of real numbers that passes Per Martin-Lof’s most powerful test for randomness)

and you have Kolmogorov’s probability theory as a special case of Cox’s theorems *waves hands wildly* ;-)

By: Christian Hennig

Christian Hennig — Wed, 02 Jan 2019 18:05:17 +0000

In reply to ojm.

…and de Finetti wanted to get rid of countably infinite additivity…
RIP Frank Hampel by the way! :-(

By: Carlos Ungil

Carlos Ungil — Wed, 02 Jan 2019 14:05:22 +0000

In reply to Martha (Smith). I don’t think you can derive the formula for conditional probability from Kolmogorov’s axioms alone. You need to give some additional meaning to the word “probability”, then you can derive the meaning of “conditional probability”. Given a pre-existent definition of “probability” you can “derive” the axioms as well. Kolmogorov’s book has a section titled “The Empirical Deduction of the Axioms.”

By: Mikhail

Mikhail — Wed, 02 Jan 2019 08:34:59 +0000

I was thinking about the mathematical definition of probability (Kolmogorov axioms) as a bridge between different concepts rather than the foundations. You can define probability aleatorically as a long-run frequency or epistemically as a degree of belief, but as you can use the same mathematical framework for both of them the definition does not matter so much.

By: ojm

ojm — Wed, 02 Jan 2019 04:10:59 +0000

In reply to Martha (Smith). So you mean you take P(A|B) to be short hand for something like: If B then P(A), where P(A) satisfies the Kolmogorov axioms and the above is a logical statement? Can I multiply the logical statement ‘if B then P(A)’ by a number? Can you then *always* ‘invert’ the conditioning to give P(B|A)? Is it possible for P(A|B) to make sense while P(B|A) doesn’t? Maybe it would be helpful to explicitly give the derivation you mention?

By: ojm

ojm — Wed, 02 Jan 2019 03:52:20 +0000

In reply to Christian Hennig.

Hampel has even argued that the idea of non-additive probability has been present from the beginning

https://www.tandfonline.com/doi/abs/10.1080/15598608.2009.10411908

By: Martha (Smith)

Martha (Smith) — Tue, 01 Jan 2019 23:52:13 +0000

In reply to Martha (Smith).

Carlos and ojm,

I tried twice last night to respond to ojm’s comment, “This seems to require a formal definition of ‘given’…can you elaborate on your preference in terms of a mathematical definition?”, but both attempts never got posted. So here’s another try:

1) I am using “given” in the sense of Definition 2 of The American Heritage Dictionary, 1985, which reads, “Granted as a supposition.” (This is also the usage that is often used in mathematics to describe the hypothesis of a theorem or conjecture.)

2) I’m not sure I can elaborate on my preference in terms of a mathematical definition – because I’m not sure what you are asking for. Perhaps if you rephrase your question, I can respond to it.

3) I think my response (1) at least in part replies to Carlos’ question, “what is the definition of “the probability of event A given that event B occurs” then?” To elaborate: I don’t claim that there is any isolated definition of “the probability of event A given B”, just as there is no single definition of “the probability of event A”. However, I can elaborate by saying that I would not talk about “the probability of event A given B” unless the system of “probabilities given A” satisfies the first three of Kolmogorov’s axioms. From these, one can derive the formula for conditional probability, provided that (or, as I might say, given that) the system of probabilities given A does satisfy the three axioms.

By: Sameera Daniels

Sameera Daniels — Tue, 01 Jan 2019 23:41:45 +0000

In reply to Christian Hennig.

As an onlooker to the statistics and epidemiology fields, I have been aware that many terms are defined and contextualized differently. I guess what surprises me a little is that we ignore [often subtle or overt] discrepancies in meaning. Specifically, why has it taken experts so long to acknowledge them? Have there been efforts to standardize definitions? Have there been fruitful exercises in the process?

Take the descriptive ‘reproducibility’. It is sometimes used interchangeably with ‘replication’. Goodman, Fanelli, and Ioannidis explore its definitions in the following article.

http://stm.sciencemag.org/content/8/341/341ps12

By: Daniel Lakeland

Daniel Lakeland — Tue, 01 Jan 2019 23:33:59 +0000

In reply to Martha (Smith). It's not satisfactory though when dealing with long-run-frequency notions and abstract sequences of numerical outcomes (RNGs) because we can't lean on the well defined notion of union of propositions.

By: Daniel Lakeland

Daniel Lakeland — Tue, 01 Jan 2019 23:32:03 +0000

In reply to Martha (Smith). I think this is one situation where the Bayesian notion of conditioning on a "state of information" makes good sense. Then all probabilities are conditioned on something, and conditioning is primitive/axiomatic, p(A,B|K) = p(A|B,K)p(B|K) and we can say p(A|B,K) is the probability when the state of knowledge is {K} union {"B is True"}. Since the probabilities are describing plausibility of boolean propositions in Cox's view, and union of sets of propositions is a well defined thing.

By: Martha (Smith)

Martha (Smith) — Tue, 01 Jan 2019 23:25:04 +0000

In reply to Christian Hennig. +1

By: Christian Hennig

Christian Hennig — Tue, 01 Jan 2019 23:07:18 +0000

In reply to Daniel Lakeland. Just that the word probability is in use for a long time and has inspired quite a bit of useful science among other things doesn't imply that there is any "true" and consistent answer to the question "What is probability?"

By: Carlos Ungil

Carlos Ungil — Tue, 01 Jan 2019 21:00:43 +0000

In reply to Martha (Smith).

“The probability of event A conditional on event B is the probability of event A given that event B occurs”

I agree with ojm, what is the definition of “the probability of event A given that event B occurs” then?

At some point a more substantial definition is needed which is not just swapping synonyms like “conditional on”, “given that”, “contingent upon”, “subject to”, etc.

I wouldn’t say that the formula is “derived” from that “definition”, I would say that the formula is the definition.

In the continuous case there is a problem conditioning on zero measure events, but it can be handled by taking limits adequately.
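One common way to write the limiting construction, offered as a sketch rather than as the only formulation: for a continuous random variable X whose density f_X is positive at x, condition on a shrinking interval around x, where the ordinary ratio definition applies, and pass to the limit.

```latex
P(A \mid X = x) \;=\; \lim_{\varepsilon \to 0^{+}}
  \frac{P\bigl(A \cap \{\,|X - x| < \varepsilon\,\}\bigr)}
       {P\bigl(|X - x| < \varepsilon\bigr)},
\qquad
f_{Y \mid X}(y \mid x) \;=\; \frac{f_{X,Y}(x, y)}{f_X(x)}
\quad \text{when } f_X(x) > 0 .
```

The word “adequately” matters here: the limit can depend on which family of shrinking events is used to approach the zero-measure event (the Borel-Kolmogorov paradox), so the conditioning variable has to be specified, not just the event.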

By: Christian Hennig

Christian Hennig — Tue, 01 Jan 2019 19:55:12 +0000

Once more too late to the party but… in some discussions I’m fully with Andrew on this one, particularly whenever somebody tries to claim that probability is “really” one thing and not the other (as long as both fulfill Kolmogorov’s axioms or do whatever it takes to work as a probability mathematically).

However, probability existed as a concept before Kolmogorov came up with the axioms, and although the axioms were meant to define probability in the realm of mathematics, they were also meant to connect the mathematical definition to a range of ideas about probabilities that were around already. Mathematicians can define concepts within mathematics but their definitions don’t come with any particular authority outside their own field. In model theory mathematicians have a formal definition for what a model is, but this doesn’t mean that everyone who discusses models outside this framework has lost their legitimacy.

The history of probability before Kolmogorov is very instructive, particularly how people arrived at the explicit insight that those who talked about probability, and even published about it, were using two or more genuinely different interpretations of the concept, and how this was missed or at least not acknowledged by anyone before 1830 or so. I believe that many of the current interpretations of probability can be traced back to some historical views of what probability is that were not seen as contradictory for quite some time, but seem contradictory or at least incompatible to us.

Generally I think that terms such as “probability” come with a complex bulk of roots and meanings; they have been in use in partly incompatible ways for a long time, and the implications of this don’t go away just because mathematicians try to condense this into a clear unambiguous definition. I’m fine with the mathematicians doing their job there and I will play by their rules most of the time, but I will not grant them absolute definition power (and neither will I grant this to anybody else). Probability is all kinds of things and not just one of them “really”.

By: Daniel Lakeland

Daniel Lakeland — Tue, 01 Jan 2019 00:27:50 +0000

In reply to ojm. I think "Kolmogorov axioms" is really a stand-in in this discussion for "some acceptable axiomatic treatment". Everyone knows Kolmogorov axiomatized probability, but few people return to the axioms on a regular basis. It's more like they work with the calculations they're used to, satisfied that it has a formal structure they don't quite exactly remember. This is like set theory. I have a Bachelor's degree in Mathematics, and just about every u-grad math text has some stupid unsatisfying chapter on set theory and notation in the beginning, but I never actually read a book on formal set theory until last year (thanks to a certain trouble maker ;-)). As Martha said, cringing at defining p(a|b) = p(a,b)/p(b) is something some professional mathematicians do. It's possible to axiomatize the system where p(a|b) is primary, and then define p(a,b) = p(a|b)p(b) = p(b|a)p(a). This has good properties; for example, there's no division by zero when p(b) or p(a) = 0.
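
A minimal sketch of the division-by-zero point, assuming a hypothetical two-stage experiment (pick an urn, then draw a ball) in which the conditional probabilities are specified directly. The urn names and numbers are made up for illustration and are not part of any particular axiomatic system.

```python
# Hypothetical two-stage experiment: pick an urn, then draw a ball.
# The conditional probabilities are specified directly (treated as primitive),
# even for an urn that is never picked.
p_urn = {"urn1": 1.0, "urn2": 0.0}
p_red_given_urn = {"urn1": 0.5, "urn2": 0.3}

def joint_from_conditional(urn):
    """p(red, urn) = p(red | urn) * p(urn); no division involved."""
    return p_red_given_urn[urn] * p_urn[urn]

def conditional_from_ratio(urn):
    """p(red | urn) = p(red, urn) / p(urn); fails when p(urn) == 0."""
    return joint_from_conditional(urn) / p_urn[urn]

print(joint_from_conditional("urn2"))   # product form is well defined
try:
    conditional_from_ratio("urn2")
except ZeroDivisionError:
    print("ratio form is undefined when p(urn) = 0")
```

With the conditional as the primitive, p(red | urn2) keeps its specified value even though urn2 has probability zero; the ratio form cannot recover it from the joint.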

By: ojm

ojm — Mon, 31 Dec 2018 23:23:56 +0000

In reply to Martha (Smith).

The problem then is that conditional probability is undefined purely based on those. Within the Kolmogorov approach it then needs to be defined in terms of those axioms and primitives, giving the ratio form.

Other axiomatic treatments can derive the ratio form *by including conditional probability in the axioms and primitives*. I think de Finetti and Popper etc do this.

But in order to reject the idea that (elementary) conditional probability is *defined* by the ratio form I think you need to work from an alternative axiomatic system where this concept is defined.

So I do think people are prioritising intuitions about, e.g., conditional probability over axiomatics, somewhat contrary to the theme of the post.

Or not? See also below – how do you define conditional probability within the Kolmogorov system?

By: ojm

ojm — Mon, 31 Dec 2018 22:17:44 +0000

In reply to Martha (Smith). This seems to require a formal definition of ‘given’... can you elaborate on your preference in terms of a mathematical definition?

By: Martha (Smith)

Martha (Smith) — Mon, 31 Dec 2018 21:25:51 +0000

In reply to Carlos Ungil. I prefer the definition "The probability of A conditional on B is the probability of event A given that event B occurs", because this captures the idea that I think is intended by "conditional probability". Then, if P(B) isn't 0, one can derive the formula p(A|B) = p(A,B)/p(B).
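
One way such a derivation might go, sketched on a finite sample space: assume equally likely outcomes and read "given that B occurs" as "restrict attention to the outcomes in B". Both the fair die and that reading of "given" are assumptions made here for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)           # fair six-sided die, equally likely outcomes
A = {2, 4, 6}                    # event A: the roll is even
B = {4, 5, 6}                    # event B: the roll is greater than 3

def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event), len(outcomes))

# "Given that B occurs": restrict attention to B and count A within it.
given_B = Fraction(len(A & B), len(B))

# Ratio form p(A|B) = p(A,B) / p(B).
ratio = prob(A & B) / prob(B)

print(given_B, ratio)
```

Under these assumptions the two computations agree because the common factor 1/len(outcomes) cancels in the ratio; the sketch does not settle how "given" should be defined in general.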

By: Carlos Ungil

Carlos Ungil — Mon, 31 Dec 2018 18:12:15 +0000

In reply to Martha (Smith). Do you mean that you prefer to derive this formula as a result, that you prefer a different definition, or that you accept this definition but you dislike it for some reason?