I had a discussion with Christian Robert about the mystical feelings that seem to be sometimes inspired by Bayesian statistics. Christian began by describing this article that was on the web about constructing Bayes’ theorem for simple binomial outcomes with two possible causes as “indeed funny and entertaining (at least at the beginning) but, as a mathematician, I [Christian] do not see how these many pages build more intuition than looking at the mere definition of a conditional probability and at the inversion that is the essence of Bayes’ theorem. The author agrees to some level about this . . . there is however a whole crowd on the blogs that seems to see more in Bayes’s theorem than a mere probability inversion . . . a focus that actually confuses–to some extent–the theorem [two-line proof, no problem, Bayes’ theorem being indeed tautological] with the construction of prior probabilities or densities [a forever-debatable issue].”
I replied that there are several different points of fascination about Bayes:
1. Surprising results from conditional probability. For example, if you test positive for a disease with a 1% prevalence rate, and the test is 95% effective, you probably don't have the disease.
2. Bayesian data analysis as a way to solve statistical problems. For example, the classic partial-pooling examples of Lindley, Novick, Efron, Morris, Rubin, etc.
3. Bayesian inference as a way to include prior information in statistical analysis.
4. Bayes or Bayes-like rules for decision analysis and inference in computer science, for example identifying spam.
5. Bayesian inference as coherent reasoning, following the principles of Von Neumann, Keynes, Savage, etc.
6. [added at Larry’s suggestion; see comments] Bayesian inference as a method of coming up with classical statistical estimators.
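The "surprising result" in item 1 can be checked directly with Bayes' theorem. Here is a minimal Python sketch, under the assumption (not pinned down in the original wording) that "95% effective" means both 95% sensitivity and 95% specificity:

```python
# Bayes' theorem for the disease-testing example in item 1.
# Assumes "95% effective" = 95% sensitivity and 95% specificity.
prevalence = 0.01          # P(disease)
sensitivity = 0.95         # P(test positive | disease)
specificity = 0.95         # P(test negative | no disease)

# Total probability of a positive test: true positives + false positives.
p_positive = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)

# Bayes' theorem: P(disease | positive test).
p_disease_given_positive = prevalence * sensitivity / p_positive

print(round(p_positive, 4))                # 0.059
print(round(p_disease_given_positive, 3))  # 0.161
```

So a positive tester has only about a 16% chance of actually having the disease: the false positives from the 99% healthy majority swamp the true positives from the 1% who are sick.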
My impression is that people have difficulty separating these ideas. In my opinion, all of the above items are cool, but they don't always go together in any given problem. For example, the conditional probability laws in point 1 above are always valid, but not always particularly relevant, especially in continuous problems. (Consider the example in chapter 1 of Bayesian Data Analysis of empirical probabilities for football point spreads, or the example of kidney cancer rates in chapter 2.) Similarly, subjective probability is great, but in many, many applications it doesn't arise at all.
Anyway, all of the items above are magical, but a lot of the magic comes from the specific models being used–and, for many statisticians, from the willingness to dive into the unknown by using an unconventional model at all–not just from the simple formula.
To put it another way, the influence goes in both directions. On one hand, the logical power of Bayes’ theorem facilitates its use as a practical statistical tool (i.e., much of what I do for a living). From the other direction, the success of Bayes in practice gives additional backing to the logical appeal of Bayesian decision analysis.
The story continues . . .
The article that Christian had found on the web, which got the discussion going in the first place, was by Eliezer Yudkowsky, a person I've never met but who shares a blog with Robin Hanson, a person whom I have met and who invited me to contribute to said blog, which I have done on occasion.
It’s always a strange experience for me to be involved in Robin and Eliezer’s blog because the readers, and many of the contributors, have such radically different perspectives than I have, on just about everything. It’s not a bad thing, it’s just strange: my impression is that they generally don’t get the point of what I’m trying to say. And certainly the reverse is true: when I’ve read blog entries there that aren’t by Robin, I generally can’t see their point at all.
Anyway, I posted the above discussion (basically, all except for the previous two paragraphs) to their blog and got the strangest comments. Not that people were saying anything wrong; it's just that they were coming from a traditional theoretical computer science perspective. For them, Bayesian statistics is all about code lengths; for me it's all about hierarchical models. Which I guess is consistent with my original point. Still, it's frustrating for me (and perhaps frustrating to some of these people from the other side, that statisticians see Bayes as about models rather than philosophy and code lengths). I thought that communicating with econometricians and non-Bayesian statisticians was tough, but this is a whole new level of difficulty!
You forgot the most important one!
6. Bayes as a way to find minimax estimators.
if you test positive for a disease with a 1% prevalence rate, and the test is 95% effective, you probably don't have the disease.
I use this example when teaching, but I have to confess I am not very happy with it. Generally speaking, people who are tested for rare diseases are not randomly pulled from the population at large! We need the base rate for the disease among those selected for testing in the first place. Of course there are other applications where it does work (e.g., universal surveillance).
I agree. But I'm even more bothered by the misunderstanding in which people think that Bayes has to be about discrete parameter spaces. This is one thing, I believe, that misleads people into thinking that Bayes is all about computing "the posterior probability that a hypothesis is true," which is something I can't stand.
Part of the confusion is from the writing. There's usually a strong incentive to write in a way that's considered "stylistically good", or concise, or entertaining, or something like that, even if it makes the writing more likely to confuse or mislead.
You write, "1. Surprising results from conditional probability. For example, if you test positive for a disease with a 1% prevalence rate, and the test is 95% effective, you probably don't have the disease." But this depends on how "95% effective" is defined.
A lot of laypeople, if not most, may think it means that 95% of the time what the test tells you is true: that is, over many trials, if it says you have the disease, then 95% of the time you really do.
Your definition, however, is different. If you want to avoid confusing people who aren't experts in this area, you have to sacrifice some brevity and other elements of what's considered "good writing style," and just very carefully explain that this is the rate at which the test is simply mechanically off, then write out a probability tree and go over it carefully.
I don't see that this is necessarily Bayesian, though. A frequentist could conclude that in infinitely repeated trials with random patients, 5.9% of the time the test will say the disease is present, and 4.95 of those 5.9 percentage points will be cases where the person really doesn't have the disease.
It's interesting that you'd have to give this test at least three times, with a positive result each time, before you'd get odds of truly having a serious disease that most doctors would be comfortable with for accepting the disease as present.
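The three-tests claim can be sketched numerically, under the strong (and debatable) assumption that the errors of repeated tests on the same person are independent, so each positive multiplies the odds of disease by the same likelihood ratio:

```python
# Posterior P(disease) after successive positive test results,
# assuming test errors are independent across repetitions
# (a strong assumption for retests on the same person).
prior_odds = 0.01 / 0.99         # prior odds of disease at 1% prevalence
likelihood_ratio = 0.95 / 0.05   # P(+ | disease) / P(+ | no disease) = 19

odds = prior_odds
for n in range(1, 4):
    odds *= likelihood_ratio     # each positive multiplies the odds by 19
    print(n, round(odds / (1 + odds), 3))
# 1 0.161
# 2 0.785
# 3 0.986
```

Under that independence assumption, one positive gives about 16%, two give about 79%, and only after three positives does the probability reach roughly 99%.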
Also, you write, "the theorem [two-line proof, no problem, Bayes' theorem being indeed tautological]". What's tautological? The fact that the theorem starts with axioms that appear very reasonable and are well accepted? That definition of tautological would make all theorems tautological.
1. I don't know that giving a test three times on the same person would be effective. I doubt the errors would be independent.
2. Regarding your last paragraph: you'll have to take that one up with Christian Robert, whom I was quoting. I fixed up the above entry by closing the quotation marks so this is clear. But, yeah, all theorems are tautological. When the proof is short enough, a theorem is sometimes called an "identity."
the misunderstanding in which people think that Bayes has to be about discrete parameter spaces
I've never encountered that one!
This is one thing, I believe, that misleads people into thinking that Bayes is all about computing "the posterior probability that a hypothesis is true," which is something I can't stand.
But the idea that the point of statistical inference ought to be finding "the probability that a hypothesis is true" is extensively propagated by many people who call themselves Bayesians. There are whole schools of philosophy of science devoted to it, for instance. And it is very natural, witness things like Laplace's calculation of the probability that the sun will rise tomorrow. (It was high.) In fact, I don't really see a way for someone who takes your (5) seriously to avoid it (though a Savage personalist would presumably talk about their probability of truth, etc.).
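For what it's worth, Laplace's sunrise calculation is just the rule of succession; a minimal sketch, assuming the standard setup of a uniform prior on the unknown sunrise probability and n straight observed sunrises:

```python
# Laplace's rule of succession: after n successes in n trials,
# with a uniform prior on the success probability,
# P(success on the next trial) = (n + 1) / (n + 2).
def prob_sunrise(n_days):
    return (n_days + 1) / (n_days + 2)

# Laplace used roughly 5000 years of recorded history.
print(round(prob_sunrise(5000 * 365), 6))  # 0.999999
```

The output is the probability of an event (tomorrow's sunrise), not the probability that some cosmological hypothesis is true, which is the distinction drawn below.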
Code length is nothing other than the negative log of the MAP probability.
The problem is how to construct priors that come with good guarantees – such as a guarantee that, for any kind of data that might come into the model, there will be parameter values under which the posterior won't fall below some bound.
Now, one thing that theoretical computer scientists are good at is priors that can eat any kind of data, and models where you update the parameters one case at a time, using the previous cases to predict the subsequent case.
Predictability, Complexity and Learning might be a good introduction to this line of thinking.
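That one-case-at-a-time idea can be sketched minimally for a binary data stream, using a Laplace (add-one) predictor as the illustrative model: each observed symbol costs -log2 of its one-step predictive probability, and the costs sum to the total code length.

```python
import math

# Sequential ("prequential") coding of a binary sequence.
# Predictor: Laplace's rule of succession, P(next = 1) = (ones + 1) / (n + 2).
# Each observed symbol costs -log2(p) bits under the current prediction.
def code_length(bits):
    ones, total_bits = 0, 0.0
    for n, b in enumerate(bits):
        p_one = (ones + 1) / (n + 2)      # predict before seeing the case
        p = p_one if b == 1 else 1 - p_one
        total_bits += -math.log2(p)       # pay for the symbol actually seen
        ones += b                         # then update on the new case
    return total_bits

print(round(code_length([1, 1, 1, 1, 0, 1, 1, 1]), 2))  # 6.17 bits
```

The sequence's total probability under this scheme works out to 1/72, so the code length is log2(72) ≈ 6.17 bits, no matter how the data arrive; that "eats any kind of data" property is the kind of guarantee the comment describes.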
Regarding the "discrete parameter spaces" misunderstanding, go to the second link at the top of the above blog entry.
Regarding the "probability that a hypothesis is true": Exactly. I agree that a lot of people write about it. That's what bugs me about it. If it were a misconception that almost nobody had, I wouldn't care so much.
Regarding the probability the sun will rise tomorrow: This is the probability of an _event_–which I have no problem with–not the probability of a _hypothesis_. To give another example, I have no problem with someone giving the probability that some particle-accelerator experiment will produce some specified observational result. But I do have a problem with someone computing the posterior probability that quantum electrodynamics is true.
And, yes, I can take (5) seriously without talking about the probability that a model is true.
CS Peirce divided things into 3 piles that might be helpful here
1. could do (choose a certain probability model to represent the problem under consideration – always wrong)
2. must do (implications of a model – deduction – always true "the tautologies")
3. should do (what "critical common sense" makes of 1 and 2 – i.e., how to interpret p-values or posterior probabilities (or whether they are of interest at all) – the "pragmatics")
I do believe people often confuse these three things, and maybe that's happening here?
Bayesian Methods "to free the parameters".
Frankly, who believes that the data are random but the parameters determining the data are deterministic?
All logical and mathematical proofs are tautological; this is one of the points made by Wittgenstein in his Tractatus.
Andrew, I would really love to hear you expand upon "5. Bayesian inference as coherent reasoning, following the principles of Von Neumann, Keynes, Savage, etc."
I know that as a social scientist, none of my models are true. But some physicists models could be true couldn't they?
Michael – not an answer for Andrew, but it's too hard for me to refrain from paraphrasing Ramsey's comment on reviewing Wittgenstein's Tractatus:
'the author would do well to read the writings of CS Peirce'
and Peirce's answer to your question would be a definite no.
(and Peirce did research in physics jointly with Rutherford and hated arguments by authority like this post)
Anonymous: Very interesting if true.
Charles Sanders Peirce (1839 – 1914) and
Ernest Rutherford (1871 – 1937) clearly did overlap in time. But when and where and on what did they collaborate?
Peirce lived in near poverty and mostly secluded in the latter years of his life. He certainly knew a lot about mathematics and physical science, but I doubt this detail.
Nick – I was not expecting a homework assignment this weekend.
Hope this suffices
Charles Sanders Peirce: A Life
and search for rutherford
p.s. If you read Brent's book, it's worth knowing that (in contrast to Brent) some think Peirce's best work was done in the latter years of his life.
I've a personal copy of that book.
A physicist named Lewis M. Rutherfurd is mentioned. No sign of Ernest Rutherford.
Thanks Nick – my personal copy of the book is in another city, but searching the book online, there are both spellings – Rutherford and Rutherfurd.
I had not given a first name, but if that led people to think it was the more famous one, that was lending too much authority to my post.
> He certainly knew a lot about mathematics and physical science
That was my point, but again, arguments by authority are not the most ideal – but for blog posts?
Ian Hacking credited Peirce with developing the Neyman-Pearson theory of confidence intervals
Stev(ph)en Stigler credited Peirce with developing the Fisher randomization test
Now, was Egon Pearson a nephew of Peirce and did Fisher read Peirce??
"But I do have a problem with someone computing the posterior probability that quantum electrodynamics is true."
What's the problem with that?
If you have an alternative theory, a set of experiments, and probabilities for the outcomes of those experiments dictated by quantum electrodynamics and the alternative theories, you can compute those probabilities.
I do not see the problem. We do not know if quantum electrodynamics is true.
There might be problems if the alternative theories predict exactly the same outcomes, but in that case they are not separate theories, or you need a different experimental approach.