Christian:

As Rubin has long pointed out, there is no reason to say, “whether to use a frequentist or a Bayesian approach.” One can use Bayesian methods to obtain inferences and frequentist methods to evaluate them.

Taking this further: two definitions of probability based on the (same) mathematical concept of probability may be disjoint, having nothing to do with each other, or they may be related in that they describe similar real-world things or are used for similar purposes. With Bayesian and frequentist statistics being two approaches to relating data to probability models, it seems clear that the definitions are comparable and not disjoint. Given two related definitions of probability, the question is then whether they are compatible and describe different aspects of the real world in a consistent way, or whether they contradict each other. In

http://dx.doi.org/10.6084/m9.figshare.867707

I try to argue that it is possible to define Bayesian and frequentist probabilities in a way that the definitions are compatible. To show this, I derive frequentist definitions starting from Bayesian definitions and vice versa. To me this was an interesting and new insight. I’m interested to hear other opinions.

Having shown that the definitions can be compatible does not imply that results of the two approaches are or should be identical on a trivial level. The two approaches ask different questions. For a given problem, one may decide which of the questions is most appropriate and whether to use a frequentist or a Bayesian approach.

Judea:

I have a whole book full of Bayesian analyses that are better than flipping a coin, also an article where we did an extensive cross-validation study. Or if you want something simple, check out the article by Agresti and Coull. It is no secret that Bayesian methods work. We didn’t sell 40,000 copies because researchers had a burning desire for subjectivity.

More generally, see section 26.2 of this article. There are many ways of knowing.

In any case, my original point in the above post, which still stands, is that I don’t think it makes sense for Larry, in criticizing Bayesian methods, to characterize them so narrowly. Naive readers might think that when Larry writes, “Bayes is the analysis of subjective beliefs,” that Larry is speaking in his role as statistics expert and giving a neutral description. Actually, to the extent that “Bayes” includes what is in our book and in much of the literature, Bayes is not the analysis of subjective beliefs. To put it another way, Agresti and Coull’s confidence intervals based on (y+2)/(n+4) are no more subjective than the classical intervals based on y/n. My Bayesian estimates of public opinion are no more subjective than taking raw survey means (which indeed correspond to Bayesian analyses under unreasonable flat priors).

If you want to say that you don’t want to use my methods, fine. Or if you want to say you’d like to use Bayesian methods but interpret them using frequentist principles, fine. Or if you want to say that you think my models are ok but they need to be supplemented with causal modeling, fine. Go for it. But please don’t describe what I do as subjective, at least not if you don’t want to also describe almost all the rest of statistics using that same word. It just doesn’t help. Especially since I went to great trouble in chapter 1 of our book to explain how probabilities can be, and are, estimated from data.

Andrew,

I don’t know how I gave you the impression that I have a desire to be “rigorous,” or that I keep coming up with objection after objection to Bayesian methods. I have not seen ANY such objection in anything I ever wrote.

In my reply to Konrad, above, I wrote:

———————————
That is why I am all in favor of “enlightened Bayesianism,” defined by accepting as input any useful knowledge, in whatever format it comes, combining it with data, and getting useful conclusions whose merit is guaranteed to be as good as the veracity of the assumptions. What could be simpler than that?
——————

I did ask for guarantees, yes, but these are not necessarily mathematical guarantees. All I am asking for is some verbal statement that tells me: “If you follow this method, you are likely to do better than flipping a coin, where ‘likely’ comes from the veracity of your assumptions.” Is that asking too much?

Martha,

I confessed that my skepticism about such sweeping statements reflects my own personal and limited experience. I am eager to enrich my experience with yours, which promises to be more accommodating. Can you share some with us? Or, to be concrete, can you tell us what tools Casella finds to be INADEQUATE for a given problem? I am using the negative form (i.e., INADEQUATE) because useful information usually comes in the form of negation; for example, “let one thousand flowers bloom” is not as informative as “the square root of 2 is IRRATIONAL.”

Konrad,

By all means, it may be possible to give such a characterization without referring to the causal extensions, but it will not be complete. Let me explain why, by first asking: why are we sweating so hard to characterize a social phenomenon called Bayesianism (i.e., “who calls himself a Bayesian?” or “who is entitled to call himself a Bayesian?”)? Shouldn’t we first characterize what features make Bayesian methods useful?

Larry identified one such feature: subjectivism, namely the ability to go coherently from a set of judgmental beliefs called “assumptions” to another set of judgmental beliefs called “conclusions.” I can see why this is useful, because we do have useful knowledge in our heads that we do not want to leave untapped and, if we have a coherent way of combining it with data to get useful conclusions — viva Bayes.

Andrew said that this is too narrow, since (1) there is more to Bayesianism than subjectivism, and (2) even “frequentists” use subjective assumptions. According to Andrew, assigning priors to parameters is the defining feature of Bayesianism, not subjectivism. This may be true in practice, but it does not answer our question. “Assigning priors to parameters” may characterize what Bayesians normally do, but it does not tell us why it is useful.

Usefulness comes when we consider the exercise of “assigning priors to parameters” in the context of “model averaging”: we have subjective beliefs that one model is more likely to be correct than another, we express this belief in the form of priors, and the rest is mathematics. But now we are back to subjectivism. And I still do not see intrinsic value in “assigning priors to parameters” outside the context of model averaging. I am willing to learn.

The story does not end here. Once we focus on subjectivism as a useful feature of Bayesianism, we need to ask ourselves, “where does this useful judgmental knowledge come from?” And when we do that, we find from many examples and psychological experiments that, you guessed right, it comes from causal knowledge. So now comes the obvious question: is the grammar of priors and posterior probabilities adequate for capturing this new source of knowledge? You can guess my answer: no.

That is why I am all in favor of “enlightened Bayesianism,” defined by accepting as input any useful knowledge, in whatever format it comes, combining it with data, and getting useful conclusions whose merit is guaranteed to be as good as the veracity of the assumptions. What could be simpler than that?

In short, I am not against defining Bayesianism; I am against defining it too narrowly, by techniques that mechanically assign priors to parameters without attending to the assumptions involved.

Anonymous:

Yes, exactly. That’s why I wrote that the analogy to the statement, “Bayesian statistics is the study of subjective beliefs” is the statement, “frequentist statistics is the study of simple random samples.”

Interesting that those examples are all basically examples where the likelihood has to be inferred by observing nature. I think there’s a fundamental difference between those and cases where the randomization process is under the control of the experimenter/engineer. The likelihood function when treatment assignment is randomized, or for a device with designed properties, tends to be less controversial.

Thus, I think frequentist guarantees / testing procedures are often fine in the latter case, but get misapplied in observational settings. I think focusing on estimation is more useful in observational settings, and hence Bayesian “default” procedures with their regularization/shrinkage tend to be a better fit.

For linear regression I’ve had good results using a Geman-McClure prior. (I also used a G-M function rather than L2 to weight data-model deviations.) G-M isn’t as soft as Cauchy – which could be good or bad depending upon how well-matched your model is to your data. G-M seemed appropriate for my data – more so than Huber and Tukey functions at least – and made for a computationally-convenient M-estimator. It would be interesting to go back and apply a Cauchy prior and see how results were affected.
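For anyone who wants to experiment along these lines: this is not the commenter’s actual setup, just a generic sketch of a Geman-McClure M-estimator for linear regression via iteratively reweighted least squares. The weight w(r) ∝ 1/(1 + (r/s)²)² is the standard one implied by the G-M loss ρ(u) = u²/(1 + u²); the toy data and the scale value are invented for illustration.

```python
import numpy as np

def gm_weights(r, scale=1.0):
    """IRLS weight implied by the Geman-McClure loss rho(u) = u^2 / (1 + u^2),
    with u = r / scale: w(r) is proportional to 1 / (1 + u^2)^2."""
    u = r / scale
    return 1.0 / (1.0 + u * u) ** 2

def gm_linear_fit(X, y, scale=1.0, iters=50):
    """Robust linear fit by iteratively reweighted least squares
    with Geman-McClure weights; starts from the ordinary LS solution."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(iters):
        r = y - X @ beta                     # current residuals
        w = gm_weights(r, scale)             # redescending weights
        WX = X * w[:, None]                  # weighted design matrix
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * y))
    return beta

# Toy data: y = 1 + 2x plus small noise, with one gross outlier.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.05 * rng.standard_normal(50)
y[10] += 10.0  # the outlier that would wreck an L2 fit
print(gm_linear_fit(X, y, scale=0.2))  # close to [1, 2]
```

Because the G-M weight redescends, the outlier’s influence is driven essentially to zero, which is the computational convenience the comment alludes to.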

Indeed, see fig 6 on p. 1378 of this paper for an example.

When comparing the two regression functions, I take it you would ask which function gives better predictive performance on out-of-sample data? Why would this not be possible for the predictions of Bayesian models?

If you asked ten sector experts to independently write down their likelihood functions for [molecular evolution / genotype-phenotype maps / EEG dynamics / speech-to-text / weather patterns / stock market dynamics / consumer behaviour / voter behaviour / etc], how close would they be?

Judea: As you have shown convincingly over the last two decades, probability theory (whether Bayesian or not) does not describe all scientific knowledge – for that, it needs to be extended by adding causal language. The causal extension to either Bayesian or non-Bayesian probability theory is certainly useful. However, this extension is not required in all applications or for all purposes, and both Bayesian and non-Bayesian probability theory continue to be used without it in many applications. Surely, then, it must be possible to give an accurate characterization of Bayesianism without referring to the causal extension?

But it is not a case of having to choose just one of these models in a given application. The boxer-wrestler example is a case in point: there, it is useful to quantify _both_ the long-run frequency _and_ the Cox-Jaynes probability. These two quantities are not in general equal to each other, but in a particular case of that particular example they happened to be numerically equal, leading to confusion because of a failure to distinguish between them conceptually.

This is a nice analogy Andrew/Martha.

I do “+1” sometimes.

But my usual problem is that I respond to the comments that I disagree with, perhaps thus not drawing enough attention to the comments I agree with. It creates a positive incentive for people to argue with me!

Konrad:

Probability is a mathematical concept. I think the analogy to points, lines, and arithmetic is a good one. Probabilities are probabilities to the extent that they follow the Kolmogorov axioms. (Let me set aside quantum probability for the moment.) The different definitions of probability (betting, long-run frequency, etc.) can be usefully thought of as models rather than definitions. They are different examples of paradigmatic real-world scenarios in which the Kolmogorov axioms (thus, probability) apply.

Probability is a mathematical concept. To define it based on any imperfect real-world counterpart (such as betting or long-run frequency) makes about as much sense as defining a line in Euclidean space as the edge of a perfectly straight piece of metal, or as the space occupied by a very thin thread that is pulled taut. Ultimately, a line is a line, and probabilities are mathematical objects that follow Kolmogorov’s laws. Real-world models are important for the application of probability, and it makes a lot of sense to me that such an important concept has many different real-world analogies, none of which are perfect.

> That’s easy for you to say.

I do what I can ;-) The comment needed to be emphasized, and there was no ‘like’ button to be found.

>This.

That’s easy for you to say.

Martha: no, they don’t. My point is that we should use different words to capture the different meanings. That way we can refer to more than one of these meanings in the same context (e.g. long-run frequency and Cox-Jaynes probability are not, in general, equal – and one may want to calculate both in the same application). For the kind of confusion we get by using the same word to refer to these different concepts, see

http://statmodeling.stat.columbia.edu/2013/08/27/bayesian-model-averaging-or-fitting-a-larger-model/

and the comments in that thread.

Andrew: those are not imprecisions in the definitions, they are practical reasons for choosing to work with one concept rather than another in a given application. But I don’t see how it helps to give all of these concepts the same name.

> It sounds like you have a prior belief (namely, “people who resort to such sweeping statements usually mean to say: “I have no idea what tool is adequate for what problem, but I must say something as if I do ..””) that I do not share.

My curiosity piqued: what’s the data that supports that prior?

Judea:

> boundaries between ‘testable’ and ‘nontestable”

These may be vague – I used to think data models could be better checked than priors, but Mike Evans and others have convinced me not to be so sure. (This might not be of interest to Andrew, as I think he prefers to check them jointly in posterior predictive checks.)

Andrew: Maybe I’ll start using this as _my_ definition of frequentist: “having guarantees that do not explicitly involve the use of priors (e.g. vague acceptance of regularization)”.

One thing that I think most will agree is “nontestable” would be the form of the loss function (e.g. trying to decide between regularizations based on expected losses).

David:

Given the politeness of your comments, I hate to express disagreement. But . . .

You write of, “Gelman’s presentation of Bayesian Data Analysis as if it were a complete and flawless statistical philosophy.”

But I don’t do this! See, for example, section 6 of this paper, where I explicitly discuss “holes” in my philosophical approach.

Also, if you’re interested in work that “wrestles with the philosophical issue of randomization from a Bayesian perspective,” I recommend you take a look at chapter 8 of BDA. I don’t claim we resolve all the issues or anything close to it, but we do make some connections. And the result of this work is not just theoretical; our integration of Bayesian inference with traditional design-based sampling ideas has resulted in multilevel regression and poststratification (MRP), which is becoming a big deal in political science.

I think a certain level of cognitive dissonance, i.e. believing contradictory things, is almost required for a good practitioner of statistical analysis. Ideas such as randomization and model checking, while necessary for a practitioner, necessarily raise difficult philosophical issues, I think. On the other hand, a pragmatist has no choice but to embrace these methods.

I see Gelman’s work as all about furthering best practice in the here and now. Bayesian data analysis is more than willing to compromise the Bayes to further the data analysis. In particular Gelman rails against subjective Bayesian dogmatism that sees no role for model checking or randomization, but in order to do that he develops a framework that is a bit imprecise philosophically.

I think Wasserman plays a very different role. He makes precise theoretical statements (e.g. definition of Bayes) and provocative comments (not only about statistics). It seems to me Wasserman articulates the cognitive dissonance within statistics, and I like that a lot.

I think both roles are very useful.

Strangely, I feel no need whatsoever to voice my (many) disagreements with Wasserman’s comments.

On the other hand, I am sometimes concerned that Gelman’s presentation of Bayesian Data Analysis as if it were a complete and flawless statistical philosophy might discourage researchers from thinking about the contradictions within statistical practice.

For example:

It is indeed progress that subjective Bayesian dogmatists who reject randomization in practical settings are rejected by both Gelman and Wasserman.

On the other hand there is serious and practical work such as Bayesian Monte Carlo ( http://mlg.eng.cam.ac.uk/zoubin/papers/RasGha03.pdf ) which wrestles with the philosophical issue of randomization from a Bayesian perspective. I think this work is also progress, and we don’t want this to stop.

Judea:

1. Just for example, asymptotically, (y+2)/(n+4) and its associated confidence interval work just as well as y/n. But the interval based on (y+2)/(n+4) gives better answers, as discussed in the (justly) much-cited paper by Agresti and Coull. This is fine—the Agresti and Coull method doesn’t need to be thought of as Bayesian, it can just be thought of as a mathematical trick to get a good regularized frequentist estimate—but the point is that many, many Bayesian estimates can be thought of in this way. Using a regularizer (i.e., a prior) doesn’t violate any frequentist principles. There’s no special statistical property that is possessed by unregularized estimates.
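To make the comparison concrete, here is a quick sketch of the two intervals for a binomial proportion (function names are mine; z = 1.96 for a nominal 95% interval). The edge case y = 0 shows why the regularized version behaves better in finite samples:

```python
import math

def wald_interval(y, n, z=1.96):
    """Classical (Wald) interval centered at y/n."""
    p = y / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def agresti_coull_interval(y, n, z=1.96):
    """Agresti-Coull interval: add 2 successes and 2 failures
    (appropriate for z near 2), then compute a Wald-style interval
    centered at (y + 2) / (n + 4)."""
    n_t = n + 4
    p_t = (y + 2) / n_t
    se = math.sqrt(p_t * (1 - p_t) / n_t)
    return p_t - z * se, p_t + z * se

# With y = 0 the Wald interval collapses to a single point at 0,
# while the Agresti-Coull interval remains sensible.
print(wald_interval(0, 20))           # (0.0, 0.0)
print(agresti_coull_interval(0, 20))  # roughly (-0.03, 0.19)
```

Asymptotically the two centers agree, as the comment says; the difference is all in the finite-sample behavior.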

I think what is going on is as follows: You (and others) have a quite reasonable desire for mathematical rigor. You consider Bayesian methods to be less rigorous, and so you keep coming up with objection after objection to Bayesian methods, you (and Larry) speak of “guarantees” that you would like to have. But if you take a Bayesian method and throw away the prior, all you have is an unregularized estimate, such as least squares or maximum likelihood.

As I wrote last year, what I like about the lasso world is that it frees a whole group of researchers—those who, for whatever reason, feel uncomfortable with Bayesian methods—to use regularization, acting in what I consider an open-ended Bayesian way.

A desire for mathematical rigor is fine—I recognize we have different tastes here, and I do not want to be dismissive of you and others who want more mathematical rigor, even in places where I do not find it particularly helpful—but there’s no need for you to tie a desire for rigor to a preference for unregularized estimation.

Konrad: These definitions are not so precise:

1. The Kolmogorov definition is exact but circular, describing probabilities in terms of their mathematical structure, not in terms of what they refer to.

2. The long-run frequency definition is at best approximate given that nothing can be replicated infinitely many times. And in many cases the long run frequency definition has big problems in defining a relevant reference set that is small enough to be relevant to the particular probability of interest, and large enough to allow the long-run frequency to be reasonably accurately defined.

3. The subjective Bayesian definition requires some method of calibration or betting, but in real betting you need someone to take the other side of the bet, and that person will typically have access to other information than you do. To put it another way, bookies have a vig.

4. The Cox-Jaynes definition is precise, conditional on the rules you want to include to define your probabilities. But there is no precision in deciding which rules to apply.

Konrad: But do they each (or even collectively) capture all the ways that the word “probability” is usefully used?

Judea,

My view is not as cynical as yours. I take the quote from Casella to mean that one should use whatever tools are best for the problem (which is not to say that deciding which tools are best is always easy — it is not).

Or, to put it another way: It sounds like you have a prior belief (namely, “people who resort to such sweeping statements usually mean to say: ‘I have no idea what tool is adequate for what problem, but I must say something as if I do ..’”) that I do not share.

It refers to a previous comment made by Entsophy in this comment thread.

Andrew and Martha: but all four definitions I can think of (Kolmogorov definition, long-run frequency definition, subjective Bayesian definition and Cox-Jaynes definition) are crisp and precise!

CK,

It is indeed not a good idea to set Bayesian/Frequentist boundaries. But it is important to set boundaries between “testable” and “nontestable” assumptions, as well as between “cognitively meaningful” and “cognitively opaque” assumptions. This has been my aim since entering this discussion.

Martha,

I confess to finding zero information in Casella’s statement: “But in the End … We should use all of our tools … Bring what is needed to Solve the Problem.” I do not know Casella, but in my limited experience, people who resort to such sweeping statements usually mean to say: “I have no idea what tool is adequate for what problem, but I must say something as if I do ..” In principle, such statements would be harmless but, practically, some people use them to justify doing what they are accustomed to do, even when shown that tool-1 is more adequate than tool-2 for a given task.

Using a counter example a Bayesian may argue that a frequentist is assuming that alpha is uniformly distributed. Perhaps it is not a good idea to set Bayesian/Frequentist boundaries.

Dear Anonymous,

Your “fox eats rooster” argument works perfectly, for me and the other Homo sapiens, and that is why we are all committed to the proposition that “the rooster’s crow does not cause the sun to rise.”

But when we try to put our reasoning process in the formal scheme of Bayes analysis we face a snag: the very notion of “A causes B” cannot be expressed in the language of standard probability theory (unless we supplement it with hypothetical counterfactual variables). Which means that we cannot arrive at our belief about roosters not affecting sunrise by the strategy of going from priors to posteriors, given the observed data.

BTW, your argument about the “fox eating rooster” does not rule out the possibility that the sun listens to all the other roosters that crow before dawn. (Unless the fox eats them all; an event that is not in our data.)

Andrew,

You write: “I’m not talking about asymptotically. I’m talking about finite data such as are encountered in the sorts of problems I work on.”

But surely if an assumption is proven to be asymptotically untestable it must also be untestable in finite data. Are you saying that testability of assumptions plays no role in the reluctance of frequentists to adopt Bayesian methods?

Most frequentists I know see a fundamental difference between the standard statistical assumption “X is normally distributed,” which is testable in principle, and the typical Bayesian assumption mentioned before, that “alpha is normally distributed,” where alpha is a parameter in the prior distribution of beta, which is a parameter in the prior distribution of gamma, which is … etc., etc.

Andrew says he can’t give a sharp definition of “probability.” I’d go further, and say that it’s a concept like time or point that one can’t give a precise definition of. Nonetheless, it is (like point and time) a useful concept. Here’s the way I approach it with students: http://www.ma.utexas.edu/users/mks/statmistakes/probability.html

@Martha

Just wait. Soon we’ll have sectarian non-sectarians!

To my mind, the best part of the Casella talk is on p. 33: “But in the End … We should use all of our tools … Bring what is needed to Solve the Problem.”

A colleague of mine calls himself a non-sectarian statistician. If one wishes to talk about Bayesians and Frequentists (as distinct from Bayesian statistics and frequentist statistics), then it’s important to acknowledge the existence of non-sectarian statisticians — those who advocate, as does Casella, using whatever tools fit the problem.

Of course, I meant to say I have no objection to empirical Bayes, not that I have no objective. :-)

First, I should clarify why I used the word “subjective” when I spoke of subjective priors: I have no objective at all to empirical Bayes methods. Second, I don’t have time to look at those book chapters now regarding “assessing fit,” but I must say I can’t imagine what exactly you might mean by that term; your statement about assessing fit can’t literally be correct, it seems to me. Third, by saying “propriety” rather than, say, “correctness,” I was simply alluding to the fact that no model is 100% correct. (Of course, note too that, being statisticians, we can’t say for sure whether the model is appropriate.)

Now, what I meant in, say, the logistic context is this. We can, for instance, apply nonparametric regression techniques to compare the regression function as fitted by the logit to one that doesn’t make any parametric assumptions, and assess propriety of the logit accordingly.

I do agree that there are SOME frequentist methods that make assumptions of questionable testability. I suggest not using them!

Is that a koan or is there supposed to be a link there?

> Regarding (3): So a theorem proven by a Physicist isn’t a valuable addition about that topic, but verbiage and opinions by a Philosopher are?

This.

Consider two states of the world:

A: The rooster is what causes the sun to rise.

B: The rooster does not cause the sunrise.

Put priors on states. Observe rooster. One day fox eats rooster. Sun rises.

Does this not work?
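As a toy sketch, the proposed update can be written out as an ordinary Bayes calculation, if one is willing to hand-encode each causal hypothesis as a likelihood for the observation (which is, of course, where all the causal content hides; the 50/50 prior is an arbitrary choice for illustration):

```python
# Hypotheses: A = "the rooster's crow causes the sun to rise",
#             B = "it does not".
prior = {"A": 0.5, "B": 0.5}

# Likelihood of the observation "fox ate the rooster, yet the sun rose":
# under A (read strictly), sunrise without the crow is impossible;
# under B, sunrise happens regardless of the rooster.
likelihood = {"A": 0.0, "B": 1.0}

evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
print(posterior)  # {'A': 0.0, 'B': 1.0}
```

The arithmetic works, but note that writing down those two likelihoods is exactly the step that required causal knowledge outside the probability calculus.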

Anonymous:

Unfortunately the clarity you are looking for does not exist under *any* definition of probability, as they are all idealizations. Betting doesn’t work in general because if someone offers you a bet it is quite possible they have access to additional information that you don’t have. Long-run frequency doesn’t work in general because we don’t have infinite data, and, even when we have large data sets, conditions change over time.

In the particular example you cite, when I say I don’t believe the 84% probability, I mean, both that (a) I would not bet 5:1 odds on the claim that theta>0, and (b) under many repeated replications of this procedure under these conditions, theta would be greater than 0 less than 84% of the time. This is typical, that a probability statement can be interpreted in many different ways. But all these interpretations are idealizations. That’s one reason why we like to have external validation where possible.

Doesn’t the lack of a single definition make the discussion of “correctness” problematic?

For example, here http://statmodeling.stat.columbia.edu/2013/11/21/hidden-dangers-noninformative-priors/ you say that a posterior probability of 84% is too high. But how can we say that 84% is wrong when the 84% can mean more than one thing? It could be correct under one definition and incorrect under another, so it seems like you must have an implicit definition in mind. Otherwise I don’t know how to interpret the statement that 84% is “wrong”.

Maybe there’s a limited set of “correct” definitions and 84% is wrong under all of them? In this case, I’d agree with one of Larry’s previous suggestions to use different terms for different definitions. If subjective belief is an inadequate description of Bayesian probability, perhaps we need 5, 10, or 20 terms. Regardless, using one term per definition seems like it would avoid a lot of imprecise arguments.

]]>