Regarding the so-called Dutch Book argument for Bayesian inference (the idea that, if your inferences do not correspond to a Bayesian posterior distribution, you can be forced to make incoherent bets and ultimately become a money pump), I wrote:

I have never found this argument appealing, because a bet is a game, not a decision. A bet requires two players, and one player has to offer the bets. I do agree that in some bounded settings (for example, betting on win, place, and show in a horse race), I’d want my bets to be coherent; if they are incoherent (e.g., if my bets correspond to P(A|B)*P(B) not being equal to P(A,B)), then I should be able to do better by examining the incoherence. But in an “open system” (to borrow some physics jargon), I don’t think coherence is possible. There is always new information coming in, and there is always additional prior information in reserve that hasn’t entered the model.
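To make the P(A|B)*P(B) versus P(A,B) case concrete, here is a minimal Python sketch of the standard bookie's strategy. It assumes the agent will buy or sell unit bets at its posted prices, and that the conditional bet on A given B is called off (stake refunded) when B fails. The prices and the helper names (`agent_payoff`, `conditional_dutch_book`) are hypothetical illustrations. When the joint price differs from the product of the conditional and marginal prices, the agent loses the same amount in every possible world.

```python
from itertools import product

def agent_payoff(prices, stakes, a_occurs, b_occurs):
    """Agent's net payoff in the world (A = a_occurs, B = b_occurs).

    prices: betting quotients for three unit bets:
      'AB'  pays 1 if A and B both occur,
      'A|B' pays 1 if A occurs, but is called off (refunded) if B fails,
      'B'   pays 1 if B occurs.
    stakes: units held; positive = agent buys, negative = agent sells.
    """
    total = stakes['AB'] * (float(a_occurs and b_occurs) - prices['AB'])
    if b_occurs:  # the conditional bet only counts when B happens
        total += stakes['A|B'] * (float(a_occurs) - prices['A|B'])
    total += stakes['B'] * (float(b_occurs) - prices['B'])
    return total

def conditional_dutch_book(prices):
    """Stakes a bookie can impose when P(A,B) != P(A|B) * P(B), plus the
    agent's payoff in every possible world (the same sure loss each time)."""
    gap = prices['AB'] - prices['A|B'] * prices['B']
    if abs(gap) < 1e-12:
        return None  # coherent on this point: no book from these three bets
    sign = 1.0 if gap > 0 else -1.0
    stakes = {'AB': sign, 'A|B': -sign, 'B': -sign * prices['A|B']}
    worlds = product([True, False], repeat=2)
    return stakes, {w: agent_payoff(prices, stakes, *w) for w in worlds}

# Hypothetical incoherent prices: 0.30 != 0.50 * 0.40
stakes, payoffs = conditional_dutch_book({'AB': 0.30, 'A|B': 0.50, 'B': 0.40})
for world, value in payoffs.items():
    print(world, round(value, 6))
```

Every world yields a payoff of about -0.1, which is exactly the gap between 0.30 and 0.50 * 0.40. The bookie only needs to see the agent's prices, not any "true" probabilities.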

“argument ***for*** Bayesian inference…there is always additional prior information in reserve that ***hasn’t entered the model***.”

I have heard several types of arguments. Some concern whether Bayesian inference is the ideal against which all inferential methods should be measured; others concern whether inference should be done by discarding everything except the output of one’s Bayesian models, constructed as well as one can make them.

The last sentence of the post clarifies the ambiguity in the first. This counterargument applies only to those who would discard everything outside their formal Bayesian models. Yet I most often see the Dutch Book argument used in support of the first position, the argument from ideals!

This was often a major point of discussion in the Decision Analysis program at Stanford.

Clearly, it’s unlikely that a human could be perfectly coherent all the time in the real world. However, if you always make your major life decisions in a wildly incoherent manner, you will be much worse off. Metaphorically, “life” is using you as a money pump.

Moreover, you could be used as an actual money pump in certain situations. If you’re a rich playboy with a large “posse”, you can bet that consistently making substantially incoherent decisions will pump money from you to your posse. If you’re a business that’s consistently incoherent on major initiatives, you can also bet your partners and competitors will pump away your profits.

Exactly! The Dutch Book argument assumes that there is someone who knows the real probabilities and uses them to exploit you, but this is a factual assumption that is usually empirically false. The only thing we might conclude is that the better you know the real probabilities, the higher you will score in the game of life, but this is consistent with everybody systematically failing to know them, particularly if the probabilities of singular events are considerably unpredictable.


Imagine an analogous argument showing that your skin must be harder than predators’ claws, because if it is not, prey will be used as a flesh pump.

I always regarded the Dutch Book argument as a way of justifying the axioms of probability theory, not as a justification of Bayesian inference. In my book, all it does is show that if you do not follow the axioms of probability theory in a betting situation, you will surely lose. So, for example, assigning P(x) < 0 or P(x) > 1 forces you into incoherence, REGARDLESS of whether anyone knows what the real probabilities are. Your opponent doesn’t have to know what the real probabilities are; if you make assignments such as those above, he can make Dutch Book against you.
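The point that the opponent needs no knowledge of the real probabilities can be checked directly. Below is a minimal Python sketch, assuming the agent will trade unit bets (paying 1 if the event occurs) at whatever prices it posts. The helper names are hypothetical. One function exploits a single price outside [0, 1]; the other exploits prices on mutually exclusive, exhaustive outcomes that do not sum to 1.

```python
def range_dutch_book(q):
    """Agent prices a unit bet on x (pays 1 if x occurs) at q.
    Returns the agent's payoff (if x occurs, if x fails) under the
    bookie's exploiting trade, or None if q alone gives no book."""
    if q > 1:
        stake = 1.0    # bookie sells the bet to the agent at q
    elif q < 0:
        stake = -1.0   # bookie buys the bet from the agent at q
    else:
        return None
    return tuple(stake * (outcome - q) for outcome in (1.0, 0.0))

def partition_dutch_book(prices):
    """Prices for unit bets on mutually exclusive, exhaustive outcomes.
    The agent buys (or sells) one of each; exactly one bet pays 1,
    so the net payoff is the same whichever outcome occurs."""
    s = sum(prices)
    stake = 1.0 if s > 1 else -1.0
    return stake * (1.0 - s)   # equals -abs(1 - s): never positive

print(range_dutch_book(1.2))    # both entries negative
print(range_dutch_book(-0.1))   # both entries negative
print(partition_dutch_book([0.5, 0.4, 0.3]))  # about -0.2
```

In both functions the bookie's strategy depends only on the posted numbers, never on which outcome is actually more likely.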

Example: The operator of a pari-mutuel betting scheme at a racetrack doesn’t know the probability of any horse winning. But she can arrange the payoffs in such a way that the track will have a positive return regardless of which horse wins. She makes Dutch Book against the ensemble of bettors, which as an ensemble are incoherent (that is, if a single bettor made the bets that the ensemble makes, he would be violating the axioms of probability in one or another respect). The payoffs, of course, are set when the betting window closes and before the horses take off.
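A minimal Python sketch of the pari-mutuel arithmetic in a win pool, assuming a simple proportional takeout and ignoring breakage (rounding of payouts), minimum payouts, and other pools. The pool sizes and the 15% takeout are hypothetical. The track keeps the same amount whichever horse wins, and the odds implied by the ensemble's bets sum to 1/(1 - takeout), which is more than 1.

```python
def parimutuel(pools, takeout):
    """pools: amount bet on each horse; takeout: fraction the track keeps.
    Returns the payout per $1 on each horse (stake included), the track's
    net return if that horse wins, and the sum of implied probabilities."""
    if any(amount <= 0 for amount in pools.values()):
        raise ValueError("this sketch assumes every horse has some money on it")
    total = sum(pools.values())
    net_pool = total * (1.0 - takeout)
    payout = {h: net_pool / amount for h, amount in pools.items()}
    track_return = {h: total - payout[h] * pools[h] for h in pools}
    implied_sum = sum(1.0 / p for p in payout.values())
    return payout, track_return, implied_sum

payout, track, implied = parimutuel({'A': 500, 'B': 300, 'C': 200}, takeout=0.15)
print(track)    # about 150 whichever horse wins
print(implied)  # about 1.176 = 1 / (1 - 0.15) > 1
```

The payouts are fixed only after betting closes, which is why the operator can guarantee the takeout without knowing any horse's chances.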

Anyone who doesn’t like the Dutch Book way of doing this can consider instead Cox’s (1946) argument, which shows that probability theory is the unique extension of ordinary logic to a multi-valued system. This is the approach that Jaynes preferred.

In my comment, it came out as “P(x)1” when I wrote “P(x) less than 0 or P(x) greater than 1”.

This is rather annoying. The blog ought to support inequalities typed into text.

It’s another case where Bayesian methods deliver “more than they should”. By this I mean, loosely speaking, that Bayesian solutions have great or optimal properties in ways that weren’t envisioned when the methods were first presented or discovered. Bayesian methods often turn out to be very good according to criteria completely different from those that motivated them.

Frequentist methods, on the other hand, seem to consistently deliver “less than they should”. Take p-values, for example. They were chosen as a kind of measure of strength of evidence. They don’t quite live up to that goal, but more importantly they don’t have any surprising optimality or other desirable properties. In fact, does anybody know of any Frequentist methods that later turned out to have surprising properties (other than the ones that motivated the method in the first place)?

This is relevant because researchers often use this as a heuristic to guide research. For example, a pure mathematician may set down some intuitive axioms for a mathematical structure. If that structure contains surprising results that weren’t “put into the axioms”, they take that as a sign their research is on track. On the other hand, if the axioms only imply things they already knew or guessed at, then they take that as evidence they’re wasting time.

By this heuristic, then, developing the Bayesian approach is more promising than developing the Frequentist one. No doubt Frequentists would reject this reasoning, but I’d love to know what arguments they would use to do so.

@Entsophy, can you give some more examples of Bayesian methods working out so well? There are lots of Bayesian methods that, arguably, don’t have great properties; Bayes factors come to mind, as do situations where default priors turn out to have much more influence than one naively expects. Now, it may be that such examples don’t behave well because the problems they address are ones where nothing _can_ work well, but they’re still not delivering “more than they should”.

I wasn’t talking about “working out well” exactly. I was talking about them having great properties in ways that weren’t explicitly “put in” and that they weren’t engineered to deliver. The Dutch Book result is one that is surprising to someone who first encounters Bayesian Statistics. But I don’t know of any properties of p-values that are surprising (other than the ones that p-values fail to have).

Here’s another example of what I’m talking about (quoting from http://en.wikipedia.org/wiki/Admissible_decision_rule):

“According to the complete class theorems, under mild conditions every admissible rule is a (generalized) Bayes rule (with respect to some prior — possibly an improper one—that favors distributions where that rule achieves low risk). Thus, in Frequentist decision theory it is sufficient to consider only (generalized) Bayes rules.”

The point is that Bayes rules weren’t chosen to have anything like the property of admissibility.

So my question remains: what surprising properties do Frequentist methods have other than the ones that motivated them in the first place? For example do p-values and Confidence Intervals have some surprising connections to completely different optimality requirements? Do they turn out to be the perfect answer in some context very different from the one that motivated them? It’s not a rhetorical question. I actually want to know.

@Entsophy, for a frequentist method (or rather, a method that is traditionally justified by appealing to its frequentist properties) that behaves better than it should, try the bootstrap. It’s not perfect, but works remarkably well in many situations. The lasso also has appealing properties that weren’t obviously “put in” when it was developed.

Fred:

Indeed, I was skeptical of lasso when it first came out because the model didn’t make sense. I was not forward-thinking enough to realize that the method could have good statistical properties nonetheless. More recently, my colleagues and I have been working on prior distributions for point estimation that have good statistical properties even though they would not make sense as priors in a fully Bayesian analysis.

@Andrew … re: lasso skepticism, me too! Hope you’ll be highlighting the work on priors here, it sounds interesting.

Entsophy: But Frequentists _have to_ avoid having an explicit probability distribution on unknowns, while still attempting to be appropriately conditional (to get a reference set that does not have relevant subsets).

Given that hardship: it’s simply not a fair comparison.

Perhaps a sword-fighting analogy might help (and be fun). At some point swords were fitted with small pistols in the handle that could be fired at one’s opponent. So in the analogy, the Bayesian has a pistol [the prior] in their sword handle, while the Frequentist doesn’t. All the Bayesian has to worry about is the pistol misfiring, perhaps causing them to drop their sword at the worst possible time.

For elaboration see: Two Cheers for Bayes. Controlled Clinical Trials 1996 Aug;17(4):350-2.

Now, it was clear to me in the late 80s that there were obvious career advantages in taking a Bayesian approach in the statistics discipline. But I was very concerned that clinical researchers were so inexperienced in dealing with priors (the pistol above) that they would almost always meet their end from the pistol critically misfiring (they would be safer without them). Today, I think those risks can be better controlled, though I wonder how often they actually are.

Fred/O’Rourke

Thanks for the reply, and I’ll definitely look up “lasso”, which is unfamiliar to me. I actually wasn’t thinking about which is better for applied statistics, but rather from the point of view of a theoretician deciding where to put their effort in correcting problems.

In a less controversial area of pure mathematics (or even theoretical physics), the existence of deep, surprising theoretical connections (Bayes/Dutch Book or Bayes rules/admissibility) and the seeming absence (to me, anyway) of such connections on the Frequentist side would be taken as a strong heuristic directing one toward developing Bayes.

The great mathematicians and physicists of the past used heuristics like this all the time to direct their research efforts, which partly accounts for why they made so many of the key discoveries.

Entsophy:

I’ll try, while I finish my coffee, to give a loose sense of where the challenges lie for Bayes versus Frequentist, using a simple example: a single-group sample of size 20 (independent and notionally identical) with interest in location and spread.

Begin with a data model with two unknown parameters. The specification has dimension D^2 * X^20.

Subjective Bayes has a prior on D^2 (which cannot be questioned by anyone!) and the observed sample identifies a single point in X^20 and all there is to do is to determine/approximate the distribution on D^2 given that X point (the posterior) and possibly summarize that.

Frequentist has to somehow make an informed guess about a function on X^20 that provides a good estimate of something in D^2 (or some function of D^2) and cleverly determine the distribution of that function over all points in X^20. Then iterate between checking some arguably important but vague properties of that function as an estimate of something in D^2 and coming up with yet more clever variations.
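To make “the distribution of that function over all points in X^20” concrete, here is a minimal Python sketch of one familiar frequentist answer to the size-20 location problem: the Student t interval for the mean, with its coverage checked by simulating many repeated samples. The normal data model, the parameter values, and the helper names (`t_interval`, `coverage`) are illustrative assumptions, not anything from the comment. The constant 2.093 is the rounded 97.5% point of the t distribution with 19 degrees of freedom, so the sketch only applies to n = 20.

```python
import math
import random
import statistics

T_975_DF19 = 2.093  # 97.5% quantile of Student's t with 19 df (rounded)

def t_interval(sample):
    """Frequentist 95% interval for the mean: a function on X^20."""
    n = len(sample)
    if n != 20:
        raise ValueError("the hard-coded t quantile assumes n = 20")
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return m - T_975_DF19 * se, m + T_975_DF19 * se

def coverage(mu, sigma, n=20, reps=4000, seed=1):
    """Estimate how often the interval covers mu over repeated normal samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = t_interval(sample)
        hits += lo <= mu <= hi
    return hits / reps

print(coverage(mu=10.0, sigma=2.0))  # close to 0.95
```

Here the simulation stands in for the “over all points in X^20” step; for the t interval under normality that distribution is also known in closed form, which is what makes this case easy compared with most.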

The bootstrap is a good example: the function on X^20 is the empirical distribution function; “cleverly determine the distribution of that function over all points in X^20” was done by sampling with replacement from the empirical distribution function, _rather_ than by determining all the possible points in X^20 and then sampling without replacement from those; and the “yet more clever variations” are the BCa refinements. This need not be obvious to most people, or to anyone. Peter McCullagh missed the “sampling without replacement” point in a talk he gave at the University of Toronto on the mathematics of the bootstrap, but was quick to realize there was almost no loss of efficiency in sampling with replacement.
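For readers who have not seen it, here is a minimal Python sketch of the plain percentile bootstrap for a single-group sample of size 20. Each resample is drawn with replacement from the observed data, the statistic is recomputed, and the interval is read off the empirical quantiles. It does not implement the BCa corrections mentioned above, and the simulated data and the helper name `bootstrap_percentile_ci` are hypothetical.

```python
import random
import statistics

def bootstrap_percentile_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Plain percentile bootstrap (not BCa): resample the data WITH
    replacement, recompute stat, and read off empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical single-group sample of size 20
gen = random.Random(42)
sample = [gen.gauss(5.0, 1.5) for _ in range(20)]
print(bootstrap_percentile_ci(sample, statistics.mean))    # location
print(bootstrap_percentile_ci(sample, statistics.stdev))   # spread
```

`random.choices` draws with replacement, which is the shortcut described above in place of enumerating every point of X^20.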

My memory of LASSO (and don’t forget LARS) is not very good, but I think it mostly involved realizing clever ways to calculate things; perhaps check out Rob Tibshirani’s web site for this stuff.

Now, when people start to question the priors (i.e., check them) or worry about relevant sampling properties for Bayes, they again have to deal with the same ill-defined problems. Coffee is over.

Andrew: Not sure if it’s of any interest, but Rob Tibshirani discussed the Lasso in seminars very early in its development, and I recall someone speculating about a Bayesian version based on a Laplace prior. Not sure how that might have affected what Rob did with it.

Copied from Wikipedia: “In a Bayesian context, this [LASSO] is equivalent to placing a zero-mean Laplace prior distribution on the parameter vector.”
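The quoted equivalence concerns the posterior mode, not the posterior mean. Here is a minimal one-parameter check in Python. It assumes a Gaussian likelihood with known sigma and a Laplace(0, b) prior on the slope. Under those assumptions, minimising the negative log posterior is the same as minimising 0.5 * sum((y - beta*x)**2) + lam * |beta| with lam = sigma**2 / b, whose solution is a soft-thresholded least-squares estimate. The data and helper names are hypothetical, and the grid search is only a brute-force cross-check.

```python
def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, snapping to exactly 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_1d(x, y, lam):
    """Closed-form minimiser of 0.5 * sum((y - beta*x)**2) + lam*|beta|."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return soft_threshold(sxy, lam) / sxx

def neg_log_posterior(beta, x, y, sigma, b):
    """Gaussian likelihood (known sigma) plus Laplace(0, b) prior, up to a constant."""
    rss = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))
    return rss / (2 * sigma ** 2) + abs(beta) / b

def map_by_grid(x, y, sigma, b, lo=-5.0, hi=5.0, steps=100001):
    """Brute-force posterior mode over an evenly spaced grid of beta values."""
    grid = (lo + (hi - lo) * k / (steps - 1) for k in range(steps))
    return min(grid, key=lambda beta: neg_log_posterior(beta, x, y, sigma, b))

x = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical data
y = [0.8, 2.3, 2.7, 4.4, 5.1]
sigma, b = 1.0, 0.1
print(lasso_1d(x, y, lam=sigma ** 2 / b), map_by_grid(x, y, sigma, b))
```

The two numbers agree up to the grid spacing. With a small enough prior scale b, both give exactly zero, which is the variable-selection behaviour the lasso is known for.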

I have some notes on my blog. E.g. at http://djmarsay.wordpress.com/bibliography/economics/de-finettis-standpoint/ I discuss de Finetti’s take on the Dutch Book and develop a generalization that copes with uncertainty. In your open system, some uncertainty is inevitable, so it seems to me (and Cedric Smith) that a modified Dutch Book is needed.

In my view, the Dutch book argument is a device that allows one to formalise a much older intuition about fair betting rates, which lies behind the probability axioms. It acts as a bridge between reality and mathematical formalism. As such, it can only be an idealisation (as is the idea of relative frequencies under infinite repetition). In real life there normally wouldn’t be an opponent who makes a Dutch book against you, and sometimes specifying betting rates that would allow this may be better than following Bayes’s rule dogmatically (for example, if information comes in that makes your prior look very foolish).

I still think, though, that Bayesians (unless they use the frequentist interpretation of probability, which doesn’t necessarily force them into frequentist methods such as p-values) do well to have a “bridge to reality” such as the Dutch book argument in order to give their probabilities meaning. In that respect I find the “extension of ordinary logic to a multi-valued system” approach much weaker; it doesn’t tell me what exactly is meant by “the probability of A is 0.42”. By the way, regarding Cox/Jaynes, you get uniqueness by coming up with axioms that enforce uniqueness on purpose, not because these axioms are god-given, “natural”, or whatever.

(By the way, @Bill Jefferys, what do you mean by “the real probabilities”?)

–By the way, @Bill Jefferys, what do you mean by “the real probabilities”?

I was repeating the words used by Jesus, to whom I was responding. You’ll have to ask him.

” … for a frequentist method (or rather, a method that is traditionally justified by appealing to its frequentist properties) that behaves better than it should, try the bootstrap. It’s not perfect, but works remarkably w … “

The bootstrap can be seen as a non-parametric likelihood approach. Methods based on likelihoods often are “surprisingly good”.