## Why Bayes?

Richard Zur has some questions about the motivation for Bayesian statistics. I’ll give his questions, then my response.

I’ve also recently started working with Bayesian statistics for, hopefully, a thesis chapter in my medical physics PhD. I’m not a statistician by training, so it’s not always easy for me. Yesterday I approached our very smart, very well-regarded statistics guy about what I’ve been doing, and he hit me with “I don’t like Bayesian statistics.” Some of the details he brought up I wasn’t able to address well, and I was wondering if you could help. Also, are there any newsgroups that are useful for these sorts of things?

One big issue was “why not just use uniform priors on everything, so we have the likelihood and don’t have to worry about making up priors?” I said that uniform priors are improper, i.e., non-integrable, and therefore the posterior may not be integrable. But he said that since the likelihood is by definition integrable and as the priors become more uniform the posterior becomes more like the likelihood, the posterior will be proper for uniform priors. What have I missed?

The other issue was what we were really doing with the fitting. We’re working on ROC analysis, so he says we’re running an experiment to infer values… any introduction of priors will just bias our experiment. But, then he went on to praise the asymptotic interpretation of errors on estimates as the most desirable because we want population estimates, not just the errors for our single experiment. I was pretty frustrated by then, and the meeting had gone on too long, so I let that slide.

I’d appreciate any help you can give me. If you can point me in the right direction for answers to other similar questions I’d appreciate that, too.

Best,
Richard Zur

P.S. Have you heard of anyone else trying the sugar water diet from the self-experimentation paper? I gave it a shot for a while, but I think I eat out of boredom, and the results weren’t as impressive as in the paper.

My response to the person who says, “I don’t like Bayesian statistics”

First off, there are a lot of ways to solve a statistical problem. As Hal Stern says, the key is not the statistical method so much as its ability to make use of relevant information. Bayesian methods have worked well for me in dozens of examples, but others have had just as much success in their own application areas using other methods. And, in fact, a few of my own favorite papers (for example, one on public opinion in the U.S. and one on arsenic in Bangladesh) don’t use any Bayesian methods at all!

So, if this guy really doesn’t like Bayesian statistics, and he’s doing well with what he does use, then that’s fine. However, I certainly don’t think he should be discouraging others from using Bayesian methods.

Why not just use uniform priors?

Going to the specific questions . . . first, “why not just use uniform priors on everything, so we have the likelihood and don’t have to worry about making up priors?” The short answer is that often we have a lot of information available that is ignored in uniform priors. For example, if I have a regression forecast, and then some noisy data, then I’d like my new prediction to be some compromise between them. Again, this compromise can be constructed non-Bayesianly; the key is to use both pieces of information in some way. For a longer discussion of the role of the prior distribution in a hierarchical setting (estimating rates of a rare disease), see Section 2.8 of the second edition of Bayesian Data Analysis.
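The compromise between a forecast and noisy data can be sketched, assuming both are summarized as normal distributions with known standard deviations, as a precision-weighted average. The function name `combine_normal` and the numbers are hypothetical, chosen only for illustration:

```python
def combine_normal(prior_mean, prior_sd, data_mean, data_sd):
    """Combine a normal prior (e.g. a regression forecast) with a normal
    summary of noisy data by precision weighting.

    Returns the posterior mean and posterior standard deviation under the
    normal-normal model with known variances (an assumption of this sketch).
    """
    w_prior = 1.0 / prior_sd ** 2   # precision of the forecast
    w_data = 1.0 / data_sd ** 2     # precision of the data
    post_prec = w_prior + w_data
    post_mean = (w_prior * prior_mean + w_data * data_mean) / post_prec
    return post_mean, post_prec ** -0.5


# Hypothetical numbers: forecast 50 +/- 5, noisy measurement 60 +/- 10.
mean, sd = combine_normal(50.0, 5.0, 60.0, 10.0)
print(f"posterior mean {mean:.1f}, sd {sd:.2f}")  # posterior mean 52.0, sd 4.47
```

Letting `prior_sd` grow without bound sends the result to the data alone, which is what a uniform prior gives; the forecast only helps if it is actually used.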

And as we discuss in Chapter 5, we’re not “making up priors”; we’re estimating them from the data! That’s what hierarchical modeling is all about.
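A crude way to see "estimating the prior from the data" is an empirical-Bayes sketch: treat the group-level estimates as draws from a normal population, estimate that population's mean and variance by the method of moments, and shrink each estimate toward the mean. This is a simplification assumed here for illustration, not the full Bayesian treatment in the book; the function name and the data are hypothetical.

```python
from statistics import fmean


def empirical_bayes_shrink(estimates, ses):
    """Estimate a normal population N(mu, tau^2) for the group effects from
    the group estimates themselves, then shrink each estimate toward mu.

    Method-of-moments sketch; assumes known, positive standard errors.
    """
    mu = fmean(estimates)
    # Spread of the estimates = true between-group variance + noise variance.
    observed_var = sum((y - mu) ** 2 for y in estimates) / (len(estimates) - 1)
    noise_var = fmean(s ** 2 for s in ses)
    tau2 = max(observed_var - noise_var, 0.0)  # truncate at zero
    shrunk = []
    for y, s in zip(estimates, ses):
        b = s ** 2 / (s ** 2 + tau2)  # shrinkage factor; equals 1 when tau2 == 0
        shrunk.append(b * mu + (1 - b) * y)
    return mu, tau2, shrunk


# Hypothetical group estimates, each with standard error 4.
estimates = [12.0, 3.0, 8.0, -2.0, 15.0, 6.0]
ses = [4.0] * 6
mu, tau2, shrunk = empirical_bayes_shrink(estimates, ses)
print(mu, round(tau2, 2), [round(z, 2) for z in shrunk])
```

The prior's location and width come from the spread of the groups, not from anyone's guess; noisier estimates get pulled further toward the common mean.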

“Priors will just bias our experiment”

Now for the second comment: “we’re running an experiment to infer values… any introduction of priors will just bias our experiment.” My answer is that it depends on what your goals are. If you just want to summarize experimental results, it can be helpful to ignore prior information. As I put it, a Bayesian wants everybody else to be non-Bayesian.

Generally, however, you are not just doing an experiment once (as your colleague noted). You’re doing it over and over, under varying conditions. In that case you’re trying to infer some underlying structure, and the “prior distribution” is actually a regression-type model of this structure, along with the possibility of experiment-to-experiment variation.
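Along the same lines, here is a minimal sketch of a "regression-type" prior. It fits a line across hypothetical past experiments indexed by a condition variable `x` and treats the leftover variance beyond measurement noise as experiment-to-experiment variation. The result then serves as the prior for a new experiment. The function names, the data, and the moment-based variance estimate are all assumptions for illustration, and the sketch ignores uncertainty in the fitted line itself.

```python
from statistics import fmean


def regression_prior(xs, ys, ses):
    """Fit y = a + b*x across past experiments by ordinary least squares and
    estimate experiment-to-experiment variance tau^2 as residual variance
    minus average measurement variance (a rough moment estimate)."""
    xbar, ybar = fmean(xs), fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    resid_var = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 2)
    tau2 = max(resid_var - fmean(s ** 2 for s in ses), 0.0)
    return a, b, tau2


def posterior_for_new(a, b, tau2, x_new, y_new, se_new):
    """Combine the prior N(a + b*x_new, tau2) with a new experiment's
    estimate y_new (standard error se_new) by precision weighting."""
    prior_mean = a + b * x_new
    if tau2 == 0.0:
        # Model says experiments do not vary; ignores uncertainty in a and b.
        return prior_mean, 0.0
    w_prior, w_data = 1.0 / tau2, 1.0 / se_new ** 2
    mean = (w_prior * prior_mean + w_data * y_new) / (w_prior + w_data)
    return mean, (w_prior + w_data) ** -0.5


# Hypothetical past experiments run under conditions x = 1..6.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 5.0, 5.0, 9.0, 9.0, 13.0]
ses = [0.5] * 6
a, b, tau2 = regression_prior(xs, ys, ses)
new_mean, new_sd = posterior_for_new(a, b, tau2, x_new=7.0, y_new=16.0, se_new=1.0)
print(f"prior mean {a + 7.0 * b:.2f}, tau^2 {tau2:.2f}, posterior {new_mean:.2f} +/- {new_sd:.2f}")
```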

Also

You might want to take a look at Exercise 8.8 in Bayesian Data Analysis (2nd edition) and its solution.

This anti-Bayesian seems to have some elementary misconceptions: "…he said that since the likelihood is by definition integrable and as the priors become more uniform the posterior becomes more like the likelihood, the posterior will be proper for uniform priors." This is wrong. There is no guarantee that the likelihood is integrable with respect to the parameters. It's integrable with respect to the data, of course, but that's irrelevant. Moreover, even if you have a likelihood that happens to be integrable, you can't just say "use a uniform prior", because there is no unique concept of "uniform": it varies with the arbitrary choice of base measure.
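The integrability point can be checked numerically with a hypothetical binomial example (0 successes in 10 trials, an assumption for illustration). A prior that is flat on the log-odds leaves an unnormalizable posterior, because the likelihood tends to 1 as the log-odds goes to minus infinity. A prior that is flat on the probability itself gives a proper one. The same comparison shows that "uniform" depends on the parameterization:

```python
import math


def expit(phi):
    """Numerically stable inverse logit: theta = 1 / (1 + exp(-phi))."""
    if phi >= 0:
        return 1.0 / (1.0 + math.exp(-phi))
    e = math.exp(phi)
    return e / (1.0 + e)


def likelihood_logit(phi, y, n):
    """Binomial likelihood of y successes in n trials, as a function of
    the log-odds phi = logit(theta)."""
    return expit(phi) ** y * expit(-phi) ** (n - y)


def midpoint_integral(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation to the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


y, n = 0, 10  # hypothetical data: no successes in ten trials

# Flat prior on phi: the area keeps growing with the range, so it diverges.
areas = {A: midpoint_integral(lambda phi: likelihood_logit(phi, y, n), -A, A)
         for A in (10, 100, 1000)}
for A, area in areas.items():
    print(f"flat on logit, phi in [-{A}, {A}]: area {area:.1f}")

# Flat prior on theta: the same likelihood integrates to 1 / (n + 1).
theta_area = midpoint_integral(lambda t: t ** y * (1.0 - t) ** (n - y), 0.0, 1.0)
print(f"flat on theta: area {theta_area:.4f}")
```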

What is perhaps an even more fundamental misconception is hinted at by "he went on to praise the asymptotic interpretation of errors on estimates as the most desirable because we want population estimates, not just the errors for our single experiment". The standard error of an estimate is not at all the same as the population standard deviation. If one is interested in the population standard deviation, then one estimates it. One doesn't try to figure it out backwards from the standard error in the estimate of the mean. And one would NEVER prefer to have a standard error derived from asymptotic theory if one could instead have the correct standard error for the specific finite sample that was actually used. This has nothing to do with Bayesianism (in fact we're talking frequentist standard errors here). It's just common sense.
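A small simulation separates the two ideas, assuming normally distributed data and a sample of size 5 (both hypothetical choices). The sample standard deviation estimates the population spread, while the standard error describes uncertainty in the mean. The simulation also compares an interval built from the asymptotic normal approximation with one using the exact finite-sample t reference distribution. With n = 5, the asymptotic interval covers the true mean noticeably less often than its nominal 95%.

```python
import random
import statistics

Z_975 = 1.96        # 97.5% point of the standard normal (asymptotic reference)
T_975_DF4 = 2.776   # 97.5% point of Student's t with 4 df (exact for n = 5, normal data)


def coverage(crit, n=5, mu=10.0, sigma=2.0, reps=20_000, seed=1):
    """Fraction of intervals mean +/- crit * SE that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        se = statistics.stdev(xs) / n ** 0.5   # standard error of the mean
        hits += abs(statistics.fmean(xs) - mu) <= crit * se
    return hits / reps


# One hypothetical sample: sd estimates the population spread, while se
# measures the uncertainty of the sample mean. They answer different questions.
rng = random.Random(0)
sample = [rng.gauss(10.0, 2.0) for _ in range(5)]
sd = statistics.stdev(sample)
se = sd / len(sample) ** 0.5
print(f"sample sd {sd:.2f}, standard error of mean {se:.2f}")

z_cov = coverage(Z_975)
t_cov = coverage(T_975_DF4)
print(f"coverage with asymptotic z: {z_cov:.3f}; with finite-sample t: {t_cov:.3f}")
```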

2. Andrew says:

I agree with you about asymptotic estimates being an approximation, not a goal in itself.

But I disagree with you, a little, about uniform prior distributions. Although the choice of uniform prior distribution is not unique, in many cases a reasonable range of choices of noninformative uniform priors will give similar posteriors. Sometimes I refer to such a uniform prior as a "placeholder". If it works, fine; if the resulting posterior distribution doesn't make sense, that's motivation for putting more information into the prior.
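Here is a quick sensitivity check of this kind, assuming a hypothetical binomial outcome of 30 successes in 100 trials. It compares three common "noninformative" priors: uniform on the probability, Jeffreys, and uniform on the log-odds. The last is improper but yields a proper posterior here because 0 < y < n. Through the conjugate Beta update, all three give nearly identical posteriors:

```python
import math


def beta_summary(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1)))
    return mean, sd


y, n = 30, 100  # hypothetical data
priors = {
    "uniform on theta, Beta(1, 1)": (1.0, 1.0),
    "Jeffreys, Beta(1/2, 1/2)": (0.5, 0.5),
    "uniform on logit(theta), Beta(0, 0)": (0.0, 0.0),
}

results = {}
for name, (a, b) in priors.items():
    # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + y, b + n - y)
    results[name] = beta_summary(a + y, b + n - y)
    m, s = results[name]
    print(f"{name:38s} mean {m:.4f}  sd {s:.4f}")
```

With fewer data (or y = 0, as in the previous comment's concern) the choices can diverge, which is the signal to put more information into the prior.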

3. Prem says:

I had a question while going through the above posts.

Why does one need to use Bayesian prior-posterior analysis to draw inferences from limited samples, especially when one can directly use a simple regression analysis?

For example, if there are 5 data points, why not fit them all and predict the behaviour, rather than using the first 4 points for prior modeling and the 5th one as evidence to draw inferences?
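One way to look at Prem's example is a normal linear regression with known noise variance and a flat initial prior, using hypothetical data (all assumptions made here for illustration). Forming the prior from the first 4 points and then updating with the 5th reproduces the least-squares fit to all 5 points exactly. The split by itself adds nothing; a prior makes a difference when it brings in information from outside these five points.

```python
def solve2(A, v):
    """Solve the 2x2 linear system A @ beta = v."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * v[0] - b * v[1]) / det, (a * v[1] - c * v[0]) / det]


def normal_equations(xs, ys):
    """Return X'X and X'y for a design matrix with an intercept column."""
    XtX = [[0.0, 0.0], [0.0, 0.0]]
    Xty = [0.0, 0.0]
    for x, y in zip(xs, ys):
        row = (1.0, x)
        for i in range(2):
            Xty[i] += row[i] * y
            for j in range(2):
                XtX[i][j] += row[i] * row[j]
    return XtX, Xty


xs = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
ys = [1.1, 1.9, 3.2, 3.9, 5.1]
sigma2 = 0.25                   # assumed known noise variance

# (a) Ordinary least squares on all five points.
XtX, Xty = normal_equations(xs, ys)
ols_all = solve2(XtX, Xty)

# (b) Flat prior + first four points -> normal posterior with
#     precision P4 = X4'X4 / sigma2 and mean m4 = OLS fit to the first four.
XtX4, Xty4 = normal_equations(xs[:4], ys[:4])
P4 = [[v / sigma2 for v in r] for r in XtX4]
m4 = solve2(XtX4, Xty4)

# Use that posterior as the prior and update with the fifth point.
row5 = (1.0, xs[4])
P5 = [[P4[i][j] + row5[i] * row5[j] / sigma2 for j in range(2)] for i in range(2)]
rhs = [sum(P4[i][j] * m4[j] for j in range(2)) + row5[i] * ys[4] / sigma2
       for i in range(2)]
sequential = solve2(P5, rhs)

print("OLS on all 5:      ", ols_all)
print("prior(4) + point 5:", sequential)
```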