Against overly restrictive definitions: No, I don’t think it helps to describe Bayes as “the analysis of subjective beliefs” (nor, for that matter, does it help to characterize the statements of Krugman or Mankiw as not being “economics”)

I get frustrated when people use overly restrictive definitions of something they don’t like. [I originally used the term “aggressive definitions,” but I think the “aggressive” part was misleading because it implies aggressive intent, which I did not mean. So I changed it to “overly restrictive definition.”]

Here’s an example of an overly restrictive definition that got me thinking about all this. Larry Wasserman writes (as reported by Deborah Mayo):

I wish people were clearer about what Bayes is/is not and what frequentist inference is/is not. Bayes is the analysis of subjective beliefs but provides no frequency guarantees. Frequentist inference is about making procedures that have frequency guarantees but makes no pretense of representing anyone’s beliefs.

I’ll accept Larry’s definition of frequentist inference. But as for his definition of Bayesian inference: No no no no no. The probabilities we use in our Bayesian inference are not subjective, or, they’re no more subjective than the logistic regressions and normal distributions and Poisson distributions and so forth that fill up all the textbooks on frequentist inference. See chapter 1 of BDA for lots of examples of prior distributions that are objectively assigned from data. Here’s my definition of “Bayesian.” Science in general has both subjective and objective aspects. Science is always full of subjective human choices, and it’s always about studying larger questions that have an objective reality.

Now, don’t get me wrong—there are lots of good reasons for wanting to avoid the use of prior distributions or to use various non-Bayesian methods in different applications. Larry writes, “In our world of high-dimensional, complex models I can’t see how anyone can interpret the output of a Bayesian analysis in any meaningful way,” and I have no doubt of his sincerity. I myself have difficulty interpreting the output of non-Bayesian analyses in the high-dimensional, complex models that I work on—I honestly find it difficult to think about non-Bayesian estimates of public opinion in population subgroups, or a non-Bayesian estimate of the concentration of a drug in a complex pharmacology model—but I accept that Larry’s comfort zone is different from mine, and I think it makes a lot of sense for him to continue working using methods that he feels comfortable with. (See here for more of this sort of talk.) So, it’s fine with me for Larry to report his discomfort with Bayesian inference in his experience. But please please please don’t define it for us! That doesn’t help at all.

To get back to Larry’s definition: Yes, “the analysis of subjective beliefs” is one model for Bayes. You could also label classical (or frequentist) statistics as “the analysis of simple random samples.” Both definitions are limiting. Yes, Bayes can be expressed in terms of subjective beliefs, but it can also be applied to other settings that have nothing to do with beliefs (except to the extent that all scientific inquiries are ultimately about what is believed about the world). Similarly, classical methods can be applied to all sorts of problems that do not involve random sampling. It’s all about extending the mathematical model to a larger class of problems.

I do think, by the way, that these sorts of interactions can be helpful. I don’t agree with Larry’s characterization of Bayesian inference, but, conditional on him believing that, I’m glad he wrote it down, because it gives me the opportunity to express my disagreement. Sometimes this sort of exchange can help. For example, between 2008 and 2013 Larry updated his beliefs regarding the relation of randomization and Bayes. In 2008: “randomized experiments . . . don’t really have a place in Bayesian inference.” In 2013: “Some people say that there is no role for randomization in Bayesian inference. . . . But this is not really true.” This is progress! And I say this not as any kind of “gotcha.” I am sincerely happy that, through discussion, we have moved forward. Larry is an influential researcher and explicator of statistics, and I am glad that he has a clearer view of the relation between randomization and Bayes. From the other direction, many of my own attitudes on statistics have changed over the years (here’s one example).

In case this helps, here are some of my thoughts on statistical pragmatism from a couple years ago.

And, just to be clear, I don’t think Larry is being personally aggressive or intentionally misleading in how he characterizes Bayesian statistics. He’s worked in the same department as Jay Kadane for many years, and I think Jay sees Bayesian statistics as being all about subjective beliefs. And once you get an idea in your head it can be hard to dislodge it. But I do think the definition is aggressive, in that it serves to implicitly diminish Bayesian statistics. Once you accept the definition (and it is natural for a reader to do so, as the definition is presented in a neutral, innocuous manner), it’s hard to move forward in a sensible way on the topic.

P.S. Larry also writes of “Paul Krugman’s socialist bullshit parading as economics.” That’s another example of defining away the problem. I think I’d prefer to let Paul Krugman (or, on the other side, Greg Mankiw) define his approach. For better or worse, I think it’s ridiculous to describe what Krugman (or Mankiw) does as X “parading as economics,” for any X. Sorry, but what Krugman and Mankiw do is economics. They’re leading economists, and if you don’t like what they do, fine, but that just means there’s some aspect of economics that you don’t like. It’s silly to restrict “economics” to just the stuff you like. Just to shift sideways for a moment, I hate the so-called Fisher randomization test, and I also can’t stand the inverse-gamma (0.001, 0.001) prior distribution—but I recognize that these are part of statistics. They’re just statistical methods that I don’t like. For good reasons. I’m not saying that my dislike of these methods (or Larry’s dislike of Krugman’s economics) is merely a matter of taste—we have good reasons for our attitudes—but, no, we don’t have the authority to rule that a topic is not part of economics, or not part of statistics, just because we don’t like it.

Oddly enough, I don’t have nearly as much of a problem with someone describing Krugman’s or Mankiw’s writing as “bullshit” (even though I don’t personally agree with this characterization, at least not most of the time) as I do with the attempt to define it away by saying it is “parading as economics.” Krugman’s and Mankiw’s writing may be bullshit, but it definitely is economics. No parading about that.

113 Comments

1. JSE says:

I have nothing to add except that this is a great post and you should consider posting it once a year.

2. Econometrician says:

>I’ll accept Larry’s definition of frequentist inference. But as for his definition of Bayesian inference: No no no no no.

lol.

3. Entsophy says:

That was a very illuminating post by Mayo. After many paragraphs and comments wherein Philosophers comment on Statistics, and Statisticians comment on Macroeconomics, Wasserman responded thusly to a question about a Physicist who had the temerity to prove a theorem that may be relevant to statistics:

“Baez is a physicist. Why would we want to know what he thinks about a statistical issue?”

When pressed on the matter he went on to say:

“That would be like me [Wasserman] offering opinions on string theory”

No Larry, it would be you offering opinions on Macroeconomics, or Mayo offering opinions on Statistics (i.e. perfectly legitimate assuming the math is right).

• K? O'Rourke says:

Nicely put.

As JG Gardin said, “you can’t rule out an hypothesis by the way it was generated – but you might choose not to entertain it for that reason (you have to spend your time efficiently)”.

In that light, “Why would we want to know what he thinks about a statistical issue?” perhaps should have been reworded as “I would not choose to spend my time finding out what he thinks about a statistical issue.”

• Corey says:

I wish I’d thought of that reply.

(I was the one to whom Larry was responding.)

• Baez is not just a physicist, he’s a fantastic author. When I asked Michael Betancourt what I should read to try to understand differentiable manifolds (so that I could understand Riemann manifold HMC), he recommended Baez’s insanely awesome book Gauge Fields, Knots, and Gravity.

I concur with the other Amazon reviewers that it’s one of the best intros to a mathematical subject I’ve ever read, if not the best. (Warning: You’ll have to know a bit about abstract algebra, analysis, and topology for it to make sense, but it should be relatively easy and fun reading for other undergrad math majors like me.)

4. Nick Cox says:

There is a perfect loop or meta-something lurking here. The definition of “aggressive definition” is itself an aggressive definition, of something defined or implied to be unacceptable, i.e. it is “restrictive” (evidently a bad thing) and “aggressive” (clearly a very bad thing; who votes for aggression?), and which was introduced as the source of frustration….

More seriously, what is a definition anyway? There is a whole spectrum of aims behind definition, from claims about the real, essential, or correct meaning of something (what is justice? what is the American way?), to purely local statements which boil down to making plain the terminology you are going to use in some document or talk. I’d suggest that definitions are often akin to notation choices in mathematics; they should be made clear if lack of clarity would be a problem, and we judge them largely on their clarity, helpfulness, etc., but recognize that at bottom there is large scope for arbitrary choices. Nothing new in this, of course; the debate on nominalism and essentialism (or whatever other terms you prefer) stretches way back to the ancient Greeks if not earlier.

Wasserman’s definition of Bayesian does indeed seem surprising, but he made it pretty clear. Better that clarity, which at least can be discussed, than the word fog endemic in so many disciplines at present….

• Andrew says:

Nick:

Sure, it’s good for Larry to be clear, but it’s a bit awkward for him to be criticizing an entire school of statistics, representing lots of researchers, based on a narrow and outdated definition. My point is that it makes more sense for the definition to come from the practitioners within a field, rather than from an opponent.

Regarding my second example: Paul Krugman is a prominent economist. It seems silly to describe his work as “parading as economics.” It is economics; it’s just a form of economics that Larry doesn’t like.

• Nick Cox says:

I largely agree on the examples. But there is a long line between Krugman’s more technical statements on economics and his social and political arguments, and I doubt that economists would draw a line in the same place between what was economics and what was something else. So falling back on the idea that there is an economics that is obvious to practitioners isn’t going to help much.

Would you think that (people you regard as) racists are the best people to define racism? They are the practitioners.

• Andrew says:

Nick:

1. I agree if we are talking about Krugman’s non-economic writing, but Larry wrote of “Paul Krugman’s socialist bullshit parading as economics.” The socialist stuff (to the extent you want to characterize Krugman and socialism in that way) is certainly on the economic end of the spectrum!

2. I’m not sure about the racists. I do think their self-definition could be useful but this is a tricky case because I assume that most racists would not describe themselves as racists. In contrast, I am happy to have my Bayesian methods described as Bayesian (even if I am uncomfortable with that term for some reasons).

• Larry Wasserman says:

Andrew
Where did I use that definition to criticize Bayes? I was not criticizing Bayes at all. Read what I wrote. I was commenting on a lack of clarity about the differences between Bayesian and frequentist inference. I find that in many papers, blogs, etc. there is confusion about the differences. But my definition, while over-simplified, was not intended as a criticism.
Larry

• Andrew says:

Larry:

It looked to me like a criticism when you wrote: “In our world of high-dimensional, complex models I can’t see how anyone can interpret the output of a Bayesian analysis in any meaningful way. . . . In the high dimensional world, you have to choose: objective frequency guarantees or subjective beliefs. . . . Of course, one can embrace objective Bayesian inference. . . . But this is just frequentist inference in Bayesian clothing.” To me, you’re tying certain criticisms of Bayesian inference directly to its purported subjectivity. But, as I’ve said before, I don’t see prior distributions as any more subjective than additive models, normal distributions, logistic regressions, and all the other routine assumptions of classical statistics.

• konrad says:

“this is just frequentist inference in Bayesian clothing” Actually, I prefer to think of maximum likelihood (in examples where the likelihood function is strongly peaked) as Bayesian inference in frequentist clothing.
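konrad's remark can be made concrete in the simplest case. The sketch below assumes a binomial model and a flat Beta(1, 1) prior, neither of which comes from the thread, and the function names are hypothetical. Under that prior the posterior mode equals the maximum likelihood estimate exactly, and the posterior mean moves toward it as the likelihood becomes more sharply peaked.

```python
from fractions import Fraction

def mle(successes, trials):
    """Maximum likelihood estimate of a binomial proportion: s / n."""
    return Fraction(successes, trials)

def posterior_mode_flat(successes, trials):
    """Mode of the Beta(1 + s, 1 + n - s) posterior under a flat Beta(1, 1) prior.
    For Beta(a, b) with a, b > 1 the mode is (a - 1) / (a + b - 2)."""
    a = 1 + successes
    b = 1 + trials - successes
    return Fraction(a - 1, a + b - 2)

def posterior_mean_flat(successes, trials):
    """Mean of the same posterior: (s + 1) / (n + 2)."""
    return Fraction(successes + 1, trials + 2)

# Same observed proportion (0.7), increasingly peaked likelihood.
for s, n in [(7, 10), (70, 100), (7000, 10000)]:
    print(n, mle(s, n), posterior_mode_flat(s, n), float(posterior_mean_flat(s, n)))
```

With a non-flat prior the mode and the MLE no longer coincide, but the gap still shrinks as n grows, which is the sense in which a strongly peaked likelihood makes the two approaches hard to tell apart.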

• Mayo says:

People here are entirely overlooking that Wasserman’s reaction was in the spirit of my deconstruction of him and the metaphors/whimsical analogies within: http://errorstatistics.com/2013/12/27/deconstructing-larry-wasserman/

5. Kevin Dick says:

The Krugman case may be different. Many economists like to point out that what Krugman as a blogger and columnist writes _directly_ contradicts things that Krugman published in a professional economics context. I’m not saying that this was necessarily Larry’s argument, but it would be a different class of assertion than simply dismissing someone through re-labeling.

• Andrew says:

Kevin:

Fair point, but then I still would think the appropriate term would be “bad economics” or “economics that Larry disagrees with,” not “parading as economics.”

6. Quartz says:

I find your framing back in terms of different comfort zones truly brilliant, if applied beyond statistical frameworks!
The “translations” of our preferences and comfort zones into rigid, overcharged and unwarranted “conclusions&rules” is always soooo tempting… So many useless (if not counterproductive) fights would be avoided if we were able to keep such a good standard. It has a more positive spin than agnosticism, while keeping the same rigour available… Thanks for providing such a tool!

[…] see what a somewhat more level-headed statistics professor — Andrew Gelman — has to say on his colleague’s, to say the least, somewhat bizarre […]

8. […] Gelman says that as a philosopher, I should appreciate his blog today in which he records his frustration: “Against aggressive definitions: No, I don’t think it […]

9. Sean Matthews says:

Larry Wasserman really wrote ‘Krugman’s socialist bullshit parading as economics’? He would have some credibility (though, I imagine, fewer friends) if he were to write about ‘Andrew Gelman’s Bayesian bullshit parading as statistics.’ In the future I’ll know to discount more or less to the level of noise anything he might say about social policy or political science, on the assumption that it is contaminated by ideological prejudice. Not a good thing for a statistician, I would have thought. [Note: not a left-right judgement – I don’t take, e.g., Theodor Adorno seriously either, precisely because he too was happy to subordinate facts to ideological prejudice.]

• Larry Wasserman says:

Yes I really did write that. It was a joke, but I have learned since then that there is no sense of humor in the blogosphere. I do think, however, that he gave up economics long ago to become a partisan hack. He often writes things that blatantly contradict his own textbooks. Here is a small example:

http://cafehayek.com/2014/01/i-dont-know-what-to-call-this-post.html

• Chris G says:

> Here is a small example…

I hope that was a joke too.

• Popeye says:

Krugman Derangement Syndrome. Apologies for the derail but whatever.

That CafeHayek post is bizarre. Krugman is quoted as saying that while you can debate the wisdom of unemployment insurance, there are prominent right-wingers (Robert Barro) who completely dismiss the Keynesian idea that UI can actually increase employment. Krugman basically argues that these people are unreasonable ideologues.

Roberts completely flips this around. He digs up some Krugman textbook quotes where he writes that UI *can* increase unemployment. This is entirely consistent with the current Krugman quotes. But Roberts draws the completely opposite conclusion: because the wisdom of UI is debatable, only a hack would dismiss Barro’s position, and so it is Krugman who is the unreasonable ideologue.

I don’t even.

Roberts is the host of the Econtalk podcast, by the way, which is really great. I shied away for a while because I hadn’t been impressed by his libertarian blogging but he’s a phenomenal interviewer and the 3 minutes in every interview where the interviewee has to delicately step around Roberts’s politics is actually pretty funny.

• Manoel Galdino says:

I agree with Popeye. Also, quoting an undergraduate textbook by Krugman to criticize Krugman’s view is a non sequitur. It’s like someone criticizing Gelman for his criticism of naive regressions by pointing out that in his book he regressed earnings on height.

• Corey says:

Skip this comment if you don’t care about economics.

Russ Roberts doesn’t know what to call it when Krugman the textbook author writes:

If unemployment becomes more attractive because of the unemployment benefit, some unemployed workers may no longer try to find a job, or may not try to find one as quickly as they would without the benefit.

whereas Krugman the NYTimes blogger writes,

…enhanced UI actually creates jobs when the economy is depressed. Why? Because the economy suffers from an inadequate overall level of demand, and unemployment benefits put money in the hands of people likely to spend it, increasing demand.

So: there’s no contradiction here; the key phrase is “when the economy is depressed”. Inadequate demand and high unemployment go hand-in-hand — companies don’t hire when people aren’t buying and growth prospects are dim. Increasing UI makes employment less attractive, but only on the margin; when the economy is depressed, it’s not operating on that margin — even people who aren’t satisfied to be on “enhanced” UI can’t find jobs immediately. But they’ll spend those UI benefits (rather than investing them), raising aggregate demand, and companies will hire to meet that increased demand.

To sum up: in a depressed economy, increasing UI benefits will create jobs by raising demand. This is true even though there is a notional marginal individual who would seek employment at the original UI level but not at the enhanced level.

See? Simple.

• Andrew says:

Larry:

You write that Krugman “gave up economics long ago to become a partisan hack.” The problem here, I think, is in the implicit assumption that “economics” and “partisan hack” are mutually exclusive. Krugman (or anyone else) could certainly give up research to become a partisan hack, but Krugman hasn’t left economics.

10. Larry Wasserman says:

Wow.
ok, let me respond to a few things.

1. By subjective I meant “non-frequency.” That is, I was trying to distinguish the Bayesian notion of probability from long-run-frequencies.

2. I would say I was being terse, but not aggressive.

3. My reply to Corey was based on his suggestion that John Baez would somehow sort out the results in the recent papers by Houman Owhadi, Clint Scovel and Tim Sullivan. Despite the fact that Baez is a great physicist and mathematician, I am quite confident that we understand the work of Owhadi, Scovel and Sullivan just fine without his help. I don’t mean that as a criticism of him, but he’s not a statistician. He is an expert on loop quantum gravity (I mistakenly said string theory).

Larry Wasserman

• Corey says:

I do have to admit that it was a statistician (Dave Higdon) who wrote the blog comment that enabled me to understand just what Owhadi et al. had demonstrated.

• Judea Pearl says:

Larry,
I applaud your posting on Bayesian/frequentist analysis and on the need to clarify the differences.
In an article published in 2001, I responded to that need with “Why I am only half Bayesian?”
http://ftp.cs.ucla.edu/pub/stat_ser/r284-reprint.pdf
where I wrote:
“I turned Bayesian in 1971, as soon as I began reading Savage’s monograph The Foundations of Statistical Inference [Savage, 1962]. The arguments were unassailable: (i) It is plain silly to ignore what we know, (ii) It is natural and useful to cast what we know in the language of probabilities, and (iii) If our subjective probabilities are erroneous, their impact will get washed out in due time, as the number of observations increases. Thirty years later, I [Pearl] am still a devout Bayesian in the sense of (i), but I now doubt the wisdom of (ii) and I know that, in general, (iii) is false.”
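Point (iii) in the passage above, that erroneous priors get washed out as observations accumulate, can be checked numerically in its favorable case. The sketch below assumes a binomial model with conjugate Beta priors; the two priors, the true rate, and the idealized data are illustrative values, not Pearl's. In this model the gap between the two analysts' posterior means is exactly 18 / (22 + n), so it vanishes as n grows. This works only because every observation bears directly on the parameter; Pearl's claim that (iii) is false in general concerns assumptions that the data cannot test.

```python
def beta_posterior_mean(a, b, successes, trials):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

# Two hypothetical analysts with sharply different priors.
optimist = (20.0, 2.0)   # prior mean about 0.91
pessimist = (2.0, 20.0)  # prior mean about 0.09

true_rate = 0.3
for n in [10, 100, 1000, 100000]:
    s = round(true_rate * n)  # idealized data: observed rate equals the true rate
    gap = beta_posterior_mean(*optimist, s, n) - beta_posterior_mean(*pessimist, s, n)
    # Algebraically, gap = 18 / (22 + n) for these two priors, whatever s is.
    print(f"n={n:>6}  gap between posterior means = {gap:.4f}")
```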

Today, seeing how the boundaries between being and not being a Bayesian are shifting all over (e.g., Andrew even claims that Bayesian analysis is not more subjective than frequentist), I am not so sure that it is important to decide who is a Bayesian and who is not.
Instead, what is important is to decide:
(i) Should we ignore what we knew prior to obtaining data? (ii) In what language should we encode what we know? In probabilities only? and (iii) What kind of guarantees would satisfy us that the decisions we make as a result of our analysis are better than flipping a coin or consulting an astrologer?

I think it would be more helpful if researchers labeled their methodological philosophies along these three dimensions, instead of the fuzzy Bayesian/frequentist dichotomy.

• Larry Wasserman says:

Judea
I agree. The Bayes/freq dichotomy is indeed fuzzy, and I often complain about that (although people seem to think I am criticizing Bayes when I do that).

It reminds me of politics. The idea of dividing things into left versus right leads to all kinds of confusion. The simple idea of adding one more dimension led to the famous Nolan chart:
http://en.wikipedia.org/wiki/Nolan_chart
which is at least a small improvement on the simplistic left vs right dichotomy. Perhaps we need a Nolan chart for statistics?
Larry

• Andrew says:

Larry:

1. You write: “By subjective I meant ‘non-frequency.’ That is, I was trying to distinguish the Bayesian notion of probability from long-run-frequencies.” That’s interesting. I actually think there is not such a difference; in either case we are talking about frequency within a reference set. We discuss this in chapter 1 of BDA in the context of several different examples. But, next time, I’d recommend replacing “subjective” with “finite-sample” or something like that, since there’s nothing particularly subjective about Bayesian probabilities (at least, the way I work with them). In Bayes or non-Bayes you need to define the reference set.

2. As I wrote above, I don’t think you were at all being personally aggressive, but I think the definition you used is (implicitly) aggressive in that it limits and restricts what Bayesian inference is.

• Judea Pearl says:

Andrew,
You write: “There is nothing particularly subjective about Bayesian probabilities.” Fine. But this does not answer the question: “How should we encode subjective knowledge in case we really need to use such knowledge in the analysis?” Is the prior distribution p(theta) our only option? What do we do, for example, if we wish to express our strong belief that “the rooster crow does not cause the sun to rise”? Is p(theta) a good way of expressing it? What theta should we use for the poor rooster?

• Andrew says:

Judea:

This rooster thing is not really close to anything I ever work on. But I do work on problems where I encode prior knowledge; see, for example, this paper. I just don’t think of this prior knowledge as particularly “subjective.” It’s certainly no more subjective than are the standard assumptions such as normal distributions and logistic regressions of classical statistics.

To put it another way: your questions are interesting in their own right, but they are separate from my original point, which is that Larry, in the process of criticizing Bayesian statistics (which is just fine, of course I have no problem with him giving his criticisms and sharing his perspective), is using a highly restrictive definition of Bayes.

• Rahul says:

I just don’t think of this prior knowledge as particularly “subjective.”

If you asked ten sector experts to independently write down their priors how close would they be?

• konrad says:

If you asked ten sector experts to independently write down their likelihood functions for [molecular evolution / genotype-phenotype maps / EEG dynamics / speech-to-text / weather patterns / stock market dynamics / consumer behaviour / voter behaviour / etc], how close would they be?

• Anonymous says:

Interesting that those examples are all basically examples where the likelihood has to be inferred by observing nature. I think there’s a fundamental difference between cases where the randomization process is under the control of the experimenter/engineer and cases where it is not. The likelihood function when treatment assignment is randomized, or for a device with designed properties, tends to be less controversial.

Thus, I think frequentist guarantees / testing procedures are often fine in the latter case, but get misapplied in observational settings. I think focusing on estimation is more useful in observational settings, and hence Bayesian “default” procedures with their regularization/shrinkage tend to be a better fit.

• Andrew says:

Anonymous:

Yes, exactly. That’s why I wrote that the analogy to the statement, “Bayesian statistics is the study of subjective beliefs” is the statement, “frequentist statistics is the study of simple random samples.”

• Judea Pearl says:

Andrew,
I will try to convince you that the Rooster-Sun example (i.e., that the rooster crow does not cause the sun to rise) is very close to the work that you, as well as other data analysts, are doing. The example illustrates three key features in the Bayesian/frequentist discussion.

First, it illustrates the type of subjective, commonsense knowledge that every data analyst uses in a typical study. Second, it illustrates that this knowledge is not normally expressed in the Bayesian schema of going from priors to posteriors, which you proposed as the distinctive feature of Bayesianism. Third, this knowledge has no testable implications; that is, no matter how many samples we take (in observational studies), the posterior will equal the prior.
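Pearl's third feature can be illustrated with a toy observational setting. The sketch below is an assumption-laden stand-in, not Pearl's rooster example: two binary variables, two candidate causal directions (X -> Y or Y -> X), and BDeu parameter priors (Bayesian Dirichlet equivalent uniform, equivalent sample size 1). Under BDeu the two directions have identical marginal likelihoods for any data set, so the posterior probability of "X causes Y" comes out equal to the prior, even though the simulated data were generated with X driving Y. The function names and simulation settings are hypothetical, and the exact equality depends on the BDeu choice.

```python
import math
import random
from collections import Counter

def log_bdeu_root(counts, ess):
    """Log marginal likelihood of a parentless binary node under BDeu."""
    n = sum(counts)
    out = math.lgamma(ess) - math.lgamma(ess + n)
    for c in counts:
        out += math.lgamma(ess / 2 + c) - math.lgamma(ess / 2)
    return out

def log_bdeu_child(table, ess):
    """Log marginal likelihood of a binary child given a binary parent.
    table[j][k] counts cases with parent = j and child = k."""
    out = 0.0
    for row in table:
        n_j = sum(row)
        out += math.lgamma(ess / 2) - math.lgamma(ess / 2 + n_j)
        for c in row:
            out += math.lgamma(ess / 4 + c) - math.lgamma(ess / 4)
    return out

def log_evidence(pairs, x_causes_y, ess=1.0):
    """Log marginal likelihood of binary (x, y) data under X -> Y or Y -> X."""
    if not x_causes_y:
        pairs = [(y, x) for x, y in pairs]  # relabel so the cause comes first
    joint = Counter(pairs)
    table = [[joint[(j, k)] for k in (0, 1)] for j in (0, 1)]
    cause_counts = [sum(row) for row in table]
    return log_bdeu_root(cause_counts, ess) + log_bdeu_child(table, ess)

def posterior_x_causes_y(pairs, prior=0.5):
    """Posterior probability of X -> Y versus Y -> X."""
    log_a = math.log(prior) + log_evidence(pairs, True)
    log_b = math.log(1 - prior) + log_evidence(pairs, False)
    top = max(log_a, log_b)  # subtract the max to avoid underflow
    a, b = math.exp(log_a - top), math.exp(log_b - top)
    return a / (a + b)

# Hypothetical observational data in which X really does drive Y.
rng = random.Random(1)
data = []
for _ in range(5000):
    x = int(rng.random() < 0.3)
    y = int(rng.random() < (0.9 if x else 0.2))
    data.append((x, y))

for prior in (0.5, 0.8, 0.1):
    # Each line prints the prior back, up to floating-point rounding.
    print(prior, round(posterior_x_causes_y(data, prior), 6))
```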

I agree with you that Larry, in equating Bayes with subjectivity, is using a restrictive definition of Bayes. I do not agree, however, with your statement that “this prior knowledge… is certainly no more subjective than are the standard assumptions such as normal distributions… of classical statistics.” There is a fundamental difference between the standard assumptions (e.g., of normality) and the ones invoked in Bayes analysis (e.g., assuming a prior on sigma), in that the former are asymptotically testable and the latter are not. (Norm Satloff made this point below.)

From my perspective, it is helpful to classify researchers into three categories, depending on how they use subjective assumptions:
1. Frequentists: use subjective assumptions, but only those that can be tested (asymptotically) from data.
2. Formal Bayesians: use subjective assumptions, regardless of whether they are testable, but only those that can be cast in the form of P(theta) and P(theta|data).
3. Pragmatic Bayesians: use subjective assumptions, both testable and untestable, but only those that are transparent, namely, they can be scrutinized by one’s peers or defended on scientific grounds. This means that my confidence in the veracity of those assumptions should be sufficient to assure me that I do better than flipping a coin if I follow the recommendations of the analysis.

For example, my confidence in the veracity of the assumption that alpha is normally distributed is very low if alpha is some parameter in the prior distribution of beta, which is some parameter in the prior distribution of gamma, which is … etc., etc. On the other hand, my confidence in “the rooster crow does not cause the sun to rise” is quite high, despite the fact that I cannot express this assumption in the traditional Bayes language P(theta) or P(theta|data).

In short, I think that your characterization of Bayesianism as going from priors to posteriors is too restrictive, for it rules out most scientific knowledge.

• konrad says:

Judea: As you have shown convincingly over the last two decades, probability theory (whether Bayesian or not) does not describe all scientific knowledge – for that, it needs to be extended by adding causal language. The causal extension to either Bayesian or non-Bayesian probability theory is certainly useful. However, this extension is not required in all applications or for all purposes, and both Bayesian and non-Bayesian probability theory continue to be used without it in many applications. Surely, then, it must be possible to give an accurate characterization of Bayesianism without referring to the causal extension?

• judea pearl says:

Konrad,
By all means, it may be possible to give such a characterization without referring to the causal extensions, but it will not be complete. Let me explain why, by first asking: Why are we sweating so hard to characterize a social phenomenon called Bayesianism (i.e., “who calls himself Bayesian?” or “who is entitled to call himself Bayesian?”)? Shouldn’t we first characterize what features make Bayesian methods useful?

Larry identified one such feature: subjectivism, namely the ability to go coherently from a set of judgmental beliefs called “assumptions” to another set of judgmental beliefs called “conclusions.” I can see why this is useful, because we do have useful knowledge in our heads that we do not want to leave untapped and, if we have a coherent way of combining it with data to get useful conclusions, viva Bayes.
Andrew said that this is too narrow, since (1) there is more to Bayesianism than subjectivism, and (2) even “frequentists” use subjective assumptions. According to Andrew, assigning priors to parameters is the defining feature of Bayesianism, not subjectivism. This may be true in practice, but it does not answer our question. “Assigning priors to parameters” may characterize what Bayesians normally do, but it does not tell us why it is useful. Usefulness comes when we consider the exercise of “assigning priors to parameters” in the context of “model averaging,” namely, we have subjective beliefs that one model is more likely to be correct than another, express this belief in the form of priors, and the rest is mathematics. But now we are back to subjectivism. And I still do not see intrinsic value in “assigning priors to parameters” outside the context of model averaging. I am willing to learn.

The story does not end here.
Once we focus on subjectivism as a useful feature
of Bayesianism, we need to ask ourselves “where does this
useful judgmental knowledge come from?” And when we do that,
we find from many examples and psychological experiments
that, you guessed right, it comes from causal
knowledge. So now comes the obvious question: Is
the grammar of priors and posterior probabilities
adequate for capturing this new source of knowledge?
You can guess my answer: No.

That is why I am all in favor of “enlightened Bayesianism”
defined by accepting as input any useful knowledge,
in whatever format it comes, combining it with data,
and getting useful conclusions whose merit is
guaranteed to be as good as the veracity of
the assumptions. What could be simpler than that?

In short, I am not against defining Bayesianism, I am against
defining it too narrowly by techniques that mechanically
assign priors to parameters without attending to the assumptions involved.

• Anonymous says:

Consider two states of the world:

A: The rooster is what causes the sun to rise
B: The rooster does not cause the sun to rise

Put priors on states. Observe rooster. One day fox eats rooster. Sun rises.

Does this not work?
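A minimal sketch of the two-state update described above, under assumptions that are not in the comment: equal prior probabilities for A and B, and hypothetical likelihoods for a sunrise on a day with no rooster (0.01 under A, 1.0 under B). Whether an update of this kind captures the causal claim is exactly what Pearl's reply disputes.

```python
def posterior_A(prior_A, p_rise_given_A, p_rise_given_B):
    """Posterior probability of state A after observing a sunrise on a
    day with no rooster, via Bayes' rule over the two states A and B."""
    numerator = prior_A * p_rise_given_A
    evidence = numerator + (1 - prior_A) * p_rise_given_B
    return numerator / evidence

# Hypothetical inputs: even prior on A vs B; under A the sun rarely rises
# without the rooster (0.01), under B it rises regardless (1.0).
post = posterior_A(0.5, 0.01, 1.0)
print(round(post, 4))  # 0.0099
```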

• Judea Pearl says:

Dear Anonymous,
Your “fox eats rooster” argument works perfectly
for me and the other Homo sapiens, and that is why we are all
committed to the proposition that “the rooster’s crow does not
cause the sun to rise”.
But when we try to put our reasoning process into the formal
scheme of Bayesian analysis, we face a snag:
the very notion of “A causes B” cannot be expressed in the
language of standard probability theory (unless we supplement
it with hypothetical counterfactual variables). This means that we cannot
arrive at our belief about roosters not affecting sunrise by
the strategy of going from priors to posteriors, given the observed data.

BTW, your argument about “fox eating rooster” does not
rule out the possibility that the sun listens to all the
other roosters that crow before dawn. (Unless the fox eats them all;
an event that is not in our data.)

• Entsophy says:

Regarding (3):

So a theorem proven by a Physicist isn’t a valuable addition about that topic, but verbiage and opinions by a Philosopher are?

• Chris G says:

> Regarding (3): So a theorem proven by a Physicist isn’t a valuable addition about that topic, but verbiage and opinions by a Philosopher are?

This.

• dab says:

Is that a koan or is there supposed to be a link there?

• Entsophy says:

>This.

That’s easy for you to say.

• Chris G says:

> That’s easy for you to say.

I do what I can ;-) The comment needed to be emphasized, and there was no ‘like’ button to be found.

• Andrew says:

I do “+1” sometimes.

But my usual problem is that I respond to the comments that I disagree with, perhaps thus not drawing enough attention to the comments I agree with. It creates a positive incentive for people to argue with me!

• Andrew says:

+1

I don’t agree with everything in that presentation but it’s a pretty good summary. For more on the history, there’s also this and this.

• K? O'Rourke says:

I must strongly disagree (and am very surprised George did this after attending part of the SAMSI Meta-analysis 2008 summer program).

It has to be a two way street.

I don’t think it helps to describe [Bayes as “the analysis of subjective beliefs”] frequentist meta-analysis as “the analysis of fixed parameters” when, in fact, the default for most people today would be random parameters [random-effects meta-analysis].

Here’s an old letter to the editor on those issues (I likely don’t agree with everything in it today): http://www.ncbi.nlm.nih.gov/pubmed/16118810

p.s. The authors were also sent a pre-print of your [Andrew’s] paper on priors to use instead of gamma(.0001,.0001), but that was left out of my letter to the editor.

• konrad says:

Decent presentation. It has a slide titled “Where do priors come from?” listing things like “previous studies”, “researcher intuition” and “convenience”.

I would have titled it “Where do models come from?”

• Martha Smith says:

To my mind, the best part of the Casella talk is on p. 33: “But in the End … We should use all of our tools … Bring what is needed to Solve the Problem.”

A colleague of mine calls himself a non-sectarian statistician. If one wishes to talk about Bayesians and Frequentists (as distinct from Bayesian statistics and frequentist statistics), then it’s important to acknowledge the existence of non-sectarian statisticians — those who advocate, as does Casella, using whatever tools fit the problem.

• Anonymous says:

@Martha

Just wait. Soon we’ll have sectarian non-sectarians!

• Judea Pearl says:

Martha,
I confess to finding zero information in Casella’s statement: “But in the End … We should use all of our tools … Bring what is needed to Solve the Problem.” I do not know Casella, but
in my limited experience, people who resort to such sweeping statements usually mean to say: “I have
no idea what tool is adequate for what problem, but I must say something as if I do ..”

In principle, such statements would be harmless, but in practice some people use them to justify
doing what they are accustomed to doing, even when shown that tool-1 is more adequate
than tool-2 for a given task.

• Martha Smith says:

Judea,
My view is not as cynical as yours. I take the quote from Casella to mean that one should use whatever tools are best for the problem (which is not to say that deciding which tools are best is always easy — it is not).

Or, to put it another way: It sounds like you have a prior belief (namely, “people who resort to such sweeping statements usually mean to say: “I have
no idea what tool is adequate for what problem, but I must say something as if I do ..””) that I do not share.

• Chris G says:

> It sounds like you have a prior belief (namely, “people who resort to such sweeping statements usually mean to say: “I have no idea what tool is adequate for what problem, but I must say something as if I do ..””) that I do not share.

My curiosity is piqued: what data support that prior?

• judea pearl says:

Martha,
I confessed that my skepticism about such sweeping statements reflects my own personal and limited experience.
I am eager to enrich my experience with yours, which promises to be more accommodating.
Can you share some with us? Or, to be concrete, can you tell us what tools Casella finds to be INADEQUATE for a given
problem? I am using negative format (i.e., INADEQUATE), because useful information usually comes in the form of
negation; for example, “let one thousand flowers bloom” is not as informative as “square-root of 2 is IRRATIONAL”

11. Anonymous says:

@Andrew – you define “Bayesian” as “using inference from the posterior distribution, p(theta|y)”, but what’s your definition of the posterior distribution, or more generally speaking, a probability?

• Andrew says:

Anon:

See chapter 1 of Bayesian Data Analysis. We give lots of examples. I find it hard to give a sharp definition of the term “probability,” as probabilities can be defined in different ways. Subjective beliefs and drawing balls from an urn are two extreme examples, but they’re not the only examples.

• konrad says:

Andrew: I understand that refusing to define probability is a deliberate choice in BDA, and that it allows the book to be read from different perspectives. But doesn’t it bother you that the word refers to at least two very distinct concepts?

• Andrew says:

Konrad:

You write: “doesn’t it bother you that the word [probability] refers to at least two very distinct concepts?”

No, it doesn’t bother me at all. Betting and long-run frequency are two different settings where probability can be applied, or, to put it another way, two different paradigmatic models of probability. In applied statistical work, our probabilities typically have some aspects of each. The betting model of probability does not always work because real bets can involve deception, and the long-run frequency model of probability does not always work because it can be difficult to define a large enough reference set (which leads to partial pooling, which is one reason why I see hierarchical models as fundamental to the concept of probability).
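To make the reference-set point concrete, here is a minimal sketch of partial pooling in a normal-normal hierarchical model. Assumptions not in the comment: the within-group standard deviation `sigma` and the between-group standard deviation `tau` are treated as known, the population mean is plugged in as the size-weighted average of the group means, and the numbers are hypothetical. Each group's estimate is a precision-weighted compromise between its own mean and the population mean, so groups with small reference sets are pulled further toward the population mean.

```python
def partial_pool(group_means, group_sizes, sigma, tau):
    """Posterior means of group effects in a normal-normal hierarchical
    model with known within-group sd `sigma` and between-group sd `tau`.

    Simplification (assumed): the population mean mu is plugged in as the
    size-weighted average of the group means, i.e. the complete-pooling mean.
    """
    total = sum(group_sizes)
    mu = sum(m * n for m, n in zip(group_means, group_sizes)) / total
    prior_prec = 1.0 / tau**2  # precision of the population distribution
    estimates = []
    for m, n in zip(group_means, group_sizes):
        data_prec = n / sigma**2  # precision of this group's sample mean
        estimates.append((data_prec * m + prior_prec * mu) / (data_prec + prior_prec))
    return estimates

# Hypothetical groups: 5, 200, and 50 respondents.
print(partial_pool([0.62, 0.40, 0.55], [5, 200, 50], sigma=0.5, tau=0.1))
```

With these inputs the 5-person group is pulled most of the way toward the pooled mean, while the 200-person group barely moves. A full hierarchical model would estimate `sigma` and `tau` from the data rather than fixing them.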

• Anonymous says:

Doesn’t the lack of a single definition make the discussion of “correctness” problematic?

For example, here http://statmodeling.stat.columbia.edu/2013/11/21/hidden-dangers-noninformative-priors/ you say that a posterior probability of 84% is too high. But how can we say that 84% is wrong when the 84% can mean more than one thing? It could be correct under one definition and incorrect under another, so it seems you must have an implicit definition in mind. Otherwise I don’t know how to interpret the statement that 84% is “wrong”.

Maybe there’s a limited set of “correct” definitions and 84% is wrong under all of them? In this case, I’d agree with one of Larry’s previous suggestions to use different terms for different definitions. If subjective belief is an inadequate description of Bayesian probability, perhaps we need 5, 10, or 20 terms. Regardless, using one term per definition seems like it would avoid a lot of imprecise arguments.

• Andrew says:

Anonymous:

Unfortunately the clarity you are looking for does not exist under any definition of probability, as they are all idealizations. Betting doesn’t work in general because if someone offers you a bet it is quite possible they have access to additional information that you don’t have. Long-run frequency doesn’t work in general because we don’t have infinite data, and, even when we have large data sets, conditions change over time.

In the particular example you cite, when I say I don’t believe the 84% probability, I mean both that (a) I would not bet 5:1 odds on the claim that theta>0, and (b) across many replications of this procedure under these conditions, theta would be greater than 0 in fewer than 84% of them. This is typical: a probability statement can be interpreted in many different ways. But all these interpretations are idealizations. That’s one reason why we like to have external validation where possible.
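The arithmetic linking the 84% probability to the 5:1 bet: odds in favor are p/(1 - p), so 0.84 corresponds to about 5.25:1, while 5:1 odds correspond to a probability of 5/6, about 83%. A small sketch (the function names are illustrative):

```python
def prob_to_odds(p):
    """Odds in favor implied by probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_to_prob(odds):
    """Probability implied by odds in favor of `odds`:1."""
    return odds / (1 + odds)

print(round(prob_to_odds(0.84), 2))  # 5.25
print(round(odds_to_prob(5), 3))     # 0.833
```

So someone who accepted the 84% should be willing to stake 5 to win 1 on theta>0; declining that bet signals a probability below 5/6.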

• Martha Smith says:

Andrew says he can’t give a sharp definition of “probability.” I’d go further and say that it’s a concept, like time or point, that can’t be given a precise definition. Nonetheless, like point and time, it is a useful concept. Here’s the way I approach it with students: http://www.ma.utexas.edu/users/mks/statmistakes/probability.html

• konrad says:

Andrew and Martha: but all four definitions I can think of (Kolmogorov definition, long-run frequency definition, subjective Bayesian definition and Cox-Jaynes definition) are crisp and precise!

• Martha Smith says:

konrad: But do they each (or even collectively) capture all the ways that the word “probability” is usefully used?

• Andrew says:

Konrad: These definitions are not so precise:

1. The Kolmogorov definition is exact but circular, describing probabilities in terms of their mathematical structure, not in terms of what they refer to.

2. The long-run frequency definition is at best approximate given that nothing can be replicated infinitely many times. And in many cases the long run frequency definition has big problems in defining a relevant reference set that is small enough to be relevant to the particular probability of interest, and large enough to allow the long-run frequency to be reasonably accurately defined.

3. The subjective Bayesian definition requires some method of calibration or betting, but in real betting you need someone to take the other side of the bet, and that person will typically have access to other information than you do. To put it another way, bookies have a vig.

4. The Cox-Jaynes definition is precise, conditional on the rules you want to include to define your probabilities. But there is no precision in deciding which rules to apply.

• konrad says:

Martha: no, they don’t. My point is that we should use different words to capture the different meanings. That way we can refer to more than one of these meanings in the same context (e.g. long-run frequency and Cox-Jaynes probability are not, in general, equal, and one may want to calculate both in the same application). For the kind of confusion we get by using the same word to refer to these different concepts, see
http://statmodeling.stat.columbia.edu/2013/08/27/bayesian-model-averaging-or-fitting-a-larger-model/
and the comments in that thread.

Andrew: those are not imprecisions in the definitions, they are practical reasons for choosing to work with one concept rather than another in a given application. But I don’t see how it helps to give all of these concepts the same name.

• Andrew says:

Konrad:

Probability is a mathematical concept. I think the analogy to points, lines, and arithmetic is a good one. Probabilities are probabilities to the extent that they follow the Kolmogorov axioms. (Let me set aside quantum probability for the moment.) The different definitions of probabilities (betting, long-run frequency, etc.) can be usefully thought of as models rather than definitions. They are different examples of paradigmatic real-world scenarios in which the Kolmogorov axioms (and thus probability) apply.

Probability is a mathematical concept. To define it based on any imperfect real-world counterpart (such as betting or long-run frequency) makes about as much sense as defining a line in Euclidean space as the edge of a perfectly straight piece of metal, or as the space occupied by a very thin thread that is pulled taut. Ultimately, a line is a line, and probabilities are mathematical objects that follow Kolmogorov’s laws. Real-world models are important for the application of probability, and it makes a lot of sense to me that such an important concept has many different real-world analogies, none of which are perfect.

• hjk says:

This is a nice analogy Andrew/Martha.

• konrad says:

But it is not a case of having to choose just one of these models in a given application. The boxer-wrestler example is a case in point: there, it is useful to quantify _both_ the long-run frequency _and_ the Cox-Jaynes probability. These two quantities are not in general equal to each other, but in a particular case of that particular example they happened to be numerically equal, leading to confusion because of a failure to distinguish between them conceptually.

• Christian says:

Taking this further: Two definitions of probability based on the (same) mathematical concept of probability may be disjoint, having nothing to do with each other, or they may be related in that they describe similar real-world things or are used for similar purposes. With Bayesian and frequentist statistics being two approaches to relating data to probability models, it seems clear that the definitions are comparable and not disjoint. Given two related definitions of probability, the question is then whether they are compatible and describe different aspects of the real world in a compatible way, or whether they contradict each other. In
http://dx.doi.org/10.6084/m9.figshare.867707
I try to argue that it is possible to define Bayesian and frequentist probabilities in a way that the definitions are compatible. To show this, I derive frequentist definitions starting from Bayesian definitions and vice versa. To me this was an interesting and new insight. I’m interested to hear other opinions.
Having shown that the definitions can be compatible does not imply that results of the two approaches are or should be identical on a trivial level. The two approaches ask different questions. For a given problem, one may decide which of the questions is most appropriate and whether to use a frequentist or a Bayesian approach.

• Andrew says:

Christian:

As Rubin has long pointed out, there is no reason to say, “whether to use a frequentist or a Bayesian approach.” One can use Bayesian methods to obtain inferences and frequentist methods to evaluate them.

12. Christian Hennig says:

I like Andrew’s linked general definition of “Bayesian”, particularly because it’s descriptive, it avoids loaded terms, it doesn’t imply anything about quality and it leaves open a wide range of possible interpretations of Bayesian probabilities and analyses.
I also like Larry’s insistence that people should be clearer and more precise about what they’re doing when doing Bayesian analyses.
These two may look like a contradiction, but to me they are not. The thing is that although “Bayesian” can be given such a general broad definition, this definition is not enough to explain what people are doing when they are Bayesian, because there are many different options to interpret this, one of which, but not the only one, is de Finetti’s subjectivism.
So what I think is really needed is that people who write papers with Bayesian analyses explain how they interpret their specific Bayesianism. There are far too many papers in which Bayesian formalism is used and prior distributions are introduced, but with no explanation of the extent to which these priors are meant to be subjective, or, if not, what else they are meant to be. I think this is a major motivation for Larry’s quote, and in this respect I’m with him.

• Andrew says:

Christian:

Yes, I feel that for many years the idea of subjective or objective priors was something of an ideology, with practitioners wanting to be purely subjective or purely objective without realizing that in general science has aspects of both subjectivity and objectivity. (In short: we are subjective in choosing what information to use, and objective in the use of that information to make our decisions.) That is one reason that, in chapter 1 of BDA, we include so many examples of priors derived from data (rather than priors derived from subjective introspection or derived without reference to any data at all). I do think that informative priors are important, and indeed my own thinking on this has changed over the years (with the current version of BDA not being completely up-to-date regarding my current views).

• Larry Wasserman says:

You hit the nail on the head, Christian.
I don’t care what statistical philosophy people use.
That’s their business.
I just want them to be clear.
What I like about subjective Bayes
and frequentist statistics is that each is clear about
the meaning of P(A).
In the first case it is my betting rate; in the second case it is a long-run
frequency.
Andrew seems to prefer a different definition of probability, but I don’t have a clear
understanding of what it is.

• Andrew says:

Hi, Larry. You are a theorist and I can see how it makes sense for you to work with precise definitions. I do applied stuff and for me no single definition works. I recommend you read chapter 1 of BDA. We discuss different motivations for probability, including betting and long-run frequency. Bayes is compatible with both, but to me, betting and long-run frequency are two necessarily imperfect models of the underlying concept of probability.

In some settings, betting is an excellent model for probability; in other settings (in which negotiation is involved), betting is not an ideal model for probability.

In some settings, long-run frequency is an excellent model for probability; in other settings (in which no clear large reference sets are available), long-run frequency is not an ideal model for probability.

I think both these models can be useful, but my larger principle is that probability models should be based on data. They should map to betting odds where available, and they should map to long-run frequencies where available, but probability is larger than either of these metaphors. Again, that is why we illustrate in our chapter with several different examples of empirically-assigned probabilities.

In any case, thanks for getting the discussion started. Perhaps you can see why I did not like the restrictiveness of the definition, “Bayes is the analysis of subjective  beliefs,” and it is helpful for me to be forced to specify my ideas of probability in more detail.

• Christian Hennig says:

“I do applied stuff and for me no single definition works. (…) In some settings, betting is an excellent model for probability; in other settings (in which negotiation is involved), betting is not an ideal model for probability.(…)”

I’d ask people to specify as clearly as they can what they mean “setting-wise”. I think that in a paper treating a single applied study, a single definition should be used and explained (the same author may use a different one for a different setting, all fine by me as long as it’s explained). This may not be controversial!? But from my readings, it seems to be a very rare exception, not a rule.

• Andrew says:

Christian:

Regarding your question above: I think we do a pretty good job in our book and in our research articles.

In any case, my problem was with Larry’s original restrictive definition, which I think led him astray. As noted above, a few years ago, Larry wrote, “randomized experiments . . . don’t really have a place in Bayesian inference.” More recently, he realized that statement was a mistake. Similarly, in this comment thread he is now recognizing that Bayes is not just “the analysis of subjective  beliefs.” That seems like a step forward, too.

• Larry Wasserman says:

ok those are good points.

By the way, I am not just a theorist.
I do lots of applied work too.
Most of my applied work is with astrophysicists.
There is enormous confusion in astrophysics about Bayes
versus frequentist. This is in fact what motivates a lot of my comments.
–Larry

• Andrew says:

Larry:

OK, I stand corrected on my overly restrictive definition of what you do!

Also, I’d love to see a discussion between you and David van Dyk regarding different statistical methods in astrophysics. Perhaps the two of you could write a joint paper, or pair of papers, or a back-and-forth, presenting your different perspectives? Then we could get some leading astrophysicists as discussants.

This could be good for JASA or AOAS or JRSS, no?

• David Rohde says:

I think a certain level of cognitive dissonance, i.e., believing contradictory things, is almost required for a good practitioner of statistical analysis. Ideas such as randomization and model checking, while necessary for a practitioner, necessarily raise difficult philosophical issues, I think. On the other hand, a pragmatist has no choice but to embrace these methods.

I see Gelman’s work as all about furthering best practice in the here and now. Bayesian data analysis is more than willing to compromise the Bayes to further the data analysis. In particular Gelman rails against subjective Bayesian dogmatism that sees no role for model checking or randomization, but in order to do that he develops a framework that is a bit imprecise philosophically.

I think Wasserman plays a very different role. He makes precise theoretical statements (e.g. definition of Bayes) and provocative comments (not only about statistics). It seems to me Wasserman articulates the cognitive dissonance within statistics, and I like that a lot.

I think both roles are very useful.

Strangely, I feel no need whatsoever to voice my (many) disagreements with Wasserman’s comments.

On the other hand, I am sometimes concerned that Gelman’s presentation of Bayesian Data Analysis as if it were a complete and flawless statistical philosophy might discourage researchers from thinking about the contradictions within statistical practice.

For example:

It is indeed progress that subjective Bayesian dogmatists who reject randomization in practical settings are rejected by both Gelman and Wasserman.

On the other hand there is serious and practical work such as Bayesian Monte Carlo ( http://mlg.eng.cam.ac.uk/zoubin/papers/RasGha03.pdf ) which wrestles with the philosophical issue of randomization from a Bayesian perspective. I think this work is also progress, and we don’t want this to stop.

• Andrew says:

David:

Given the politeness of your comments, I hate to express disagreement. But . . .

You write of, “Gelman’s presentation of Bayesian Data Analysis as if it were a complete and flawless statistical philosophy.”

But I don’t do this! See, for example, section 6 of this paper, where I explicitly discuss “holes” in my philosophical approach.

Also, if you’re interested in work that “wrestles with the philosophical issue of randomization from a Bayesian perspective,” I recommend you take a look at chapter 8 of BDA. I don’t claim we resolve all the issues or anything close to it, but we do make some connections. And the result of this work is not just theoretical; our integration of Bayesian inference with traditional design-based sampling ideas has resulted in multilevel regression and poststratification (MRP), which is becoming a big deal in political science.

• konrad says:

Larry: several of the regular commenters on this blog (not Andrew, though) are supportive of the definition and axiomatic development of probability advocated in the book by Jaynes. It is (a) clear, and (b) not well described as subjective.

• K? O'Rourke says:

Christian:

“explain how they interpret their specific Bayesianism”

My data-informed prior would be that the percentage who could do that is small (< .1), the percentage who would is even smaller, and doing it well will be hard for anyone.

So I am going to have to agree with Larry's last sentence below, but what’s easy to explain is not necessarily best.

• Christian Hennig says:

“My data informed prior would be that the percentage who could do that is small (< .1), the percent who would is even smaller and to do it well, will be hard for anyone. (…) but whats easy to explain – is not necessarily best."
Fair enough but is it good scientific practice to do (and publish!) things you can't explain?
Would you agree with Larry (and me) that one good thing about being a subjectivist Bayesian is that they can do it properly?

• K? O'Rourke says:

Seems a bit _aggressive_ to dichotomise my “easy” into “can’t” ;-)

> subjectivist Bayesian is that they can do it properly?

But that is just math: given an expressed (unquestionable) prior and data model, the posterior can only be exactly this. What does that mean for learning about this world???

Only math can be properly explained.
(Way too many distractions today!)

• Anonymous says:

“Fair enough but is it good scientific practice to do (and publish!) things you can’t explain?” I have heard there is “a gentleman’s agreement” not to probe too deeply into the probabilistic meanings in Bayesian positions that are flexible enough to be all things to all people.

• konrad says:

Christian: the problem is worse than that. In multi-author applied papers, it’s generally possible to achieve consensus on methodology, but far harder to achieve consensus on interpretation.

13. Norm Matloff says:

The analogy to the “logistic and Poisson models that fill our textbooks” doesn’t work. We can use our data to assess the propriety of such models. We can’t do so for subjective priors in Bayesian analysis.

• Andrew says:

Norm:

I’m not quite sure what you mean by “propriety,” but yes we can assess the fit of Bayesian models, see chapters 6 and 7 of Bayesian Data Analysis. And, conversely, take a look at the subjective logistic regressions etc that people do in applied non-Bayesian statistics. These models are full of assumptions that can’t be checked. Bayesian or non-Bayesian: either way, there is an unavoidable level of subjectivity in the choice of the model and the variables to include in it, and a fairly objective procedure of inference that is done, once the model and data inclusion rules have been chosen. In some settings, a Bayesian model represents one more assumption compared to an existing non-Bayesian approach; in other settings, I’d think of a Bayesian model as being a bit more objective than the non-Bayesian equivalent. I don’t mind the use of the term “assumptions” but I don’t think it helps to throw the word “subjective” at every Bayesian analysis.

• Judea Pearl says:

Andrew,
In my posting above, I interpreted “propriety” to mean “asymptotic testability”.
Are you saying that there are analysts who call themselves “frequentists” or “non-Bayesian”
and would embrace assumptions that are not testable asymptotically? Then what makes these
analysts different from Bayesians? They seem to fit my definition of “Pragmatic Bayesians”.

You said that “in some settings, a Bayesian model represents one more assumption compared to an existing non-Bayesian approach”. I wonder what this extra assumption is.

• Andrew says:

Judea:

I’m not talking about asymptotically. I’m talking about finite data such as are encountered in the sorts of problems I work on, where even when data sets are large, models get large too. For example I might have a survey with 100,000 respondents but then I am interested in slices of the population such as middle-aged college-educated married white women in New Hampshire.

• Judea Pearl says:

Andrew,
You write: “I’m not talking about asymptotically. I’m talking about finite data such as are encountered in the sorts of problems I work on,”
But surely if an assumption is proven to be asymptotically untestable it must also be
untestable in finite data.

Are you saying that testability of assumptions plays no role in the reluctance of
frequentists to adopt Bayesian methods?

Most frequentists I know see a fundamental difference between the standard statistical assumption “X is normally distributed”, which is testable in principle,
and the typical Bayesian assumption mentioned before, that
” alpha is normally distributed”, where alpha
is a parameter in the prior distribution of beta, which
is a parameter in the prior distribution of gamma, which
is …, and so on.

• CK says:

As a counterexample, a Bayesian may argue that a frequentist is assuming that alpha is uniformly distributed. Perhaps it is not a good idea to set Bayesian/Frequentist boundaries.

• Judea Pearl says:

CK,
It is indeed not a good idea to set Bayesian/Frequentist boundaries.
But it is important to set boundaries between “testable” and “nontestable”
assumptions, as well as between “cognitively meaningful” and “cognitively
opaque” assumptions.
This has been my aim since entering this discussion.

• Andrew says:

Judea:

1. Just for example, asymptotically, (y+2)/(n+4) and its associated confidence interval work just as well as y/n. But the interval based on (y+2)/(n+4) gives better answers, as discussed in the (justly) much-cited paper by Agresti and Coull. This is fine—the Agresti and Coull method doesn’t need to be thought of as Bayesian, it can just be thought of as a mathematical trick to get a good regularized frequentist estimate—but the point is that many, many Bayesian estimates can be thought of in this way. Using a regularizer (i.e., a prior) doesn’t violate any frequentist principles. There’s no special statistical property that is possessed by unregularized estimates.
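A minimal sketch of the comparison in point 1, computing exact (not simulated) coverage of nominal 95% intervals for a binomial proportion. Assumptions not in the comment: the simple “add two successes and two failures” form of the Agresti-Coull interval with z = 1.96, and the illustrative case n = 20, true p = 0.1; other cases give different numbers.

```python
import math

Z = 1.96  # normal quantile for a nominal 95% interval

def wald_interval(y, n):
    """Classical interval centered at y/n."""
    p = y / n
    half = Z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def plus_four_interval(y, n):
    """Agresti-Coull style: add 2 successes and 2 failures, then use the Wald form."""
    p = (y + 2) / (n + 4)
    half = Z * math.sqrt(p * (1 - p) / (n + 4))
    return p - half, p + half

def exact_coverage(interval, n, p_true):
    """Sum the Binomial(n, p_true) pmf over outcomes whose interval contains p_true."""
    total = 0.0
    for y in range(n + 1):
        lo, hi = interval(y, n)
        if lo <= p_true <= hi:
            total += math.comb(n, y) * p_true**y * (1 - p_true) ** (n - y)
    return total

print(exact_coverage(wald_interval, 20, 0.1))
print(exact_coverage(plus_four_interval, 20, 0.1))
```

For this case the y/n interval covers the true value roughly 88% of the time, while the (y+2)/(n+4) interval covers it roughly 96% of the time, even though the two are asymptotically equivalent.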

I think what is going on is as follows: You (and others) have a quite reasonable desire for mathematical rigor. You consider Bayesian methods to be less rigorous, and so you keep coming up with objection after objection to Bayesian methods. You (and Larry) speak of “guarantees” that you would like to have. But if you take a Bayesian method and throw away the prior, all you have is an unregularized estimate, such as least squares or maximum likelihood.

As I wrote last year, what I like about the lasso world is that it frees a whole group of researchers (those who, for whatever reason, feel uncomfortable with Bayesian methods) to use regularization, acting in what I consider an open-ended Bayesian way.

A desire for mathematical rigor is fine—I recognize we have different tastes here, and I do not want to be dismissive of you and others who want more mathematical rigor, even in places where I do not find it particularly helpful—but there’s no need for you to tie a desire for rigor to a preference for unregularized estimation.
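A minimal sketch of the point that a prior acts as a regularizer and that dropping it leaves the unregularized estimate. Assumed model (not from the comment): a one-predictor regression through the origin, y ~ Normal(b x, sigma^2), with prior b ~ Normal(0, tau^2). The posterior mode minimizes the ridge objective with penalty lam = sigma^2 / tau^2; lam = 0, i.e. a flat prior, gives least squares. The lasso corresponds in the same way to a Laplace prior. The data are made up.

```python
def ridge_slope(xs, ys, lam):
    """Slope through the origin minimizing sum((y - b*x)^2) + lam * b^2.

    Bayesian reading (assumed model): y ~ Normal(b*x, sigma^2) and
    b ~ Normal(0, tau^2); this is the posterior mode when
    lam = sigma^2 / tau^2. lam = 0 (no prior) gives least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [1.2, 1.9, 3.4]
print(ridge_slope(xs, ys, 0.0))  # least squares
print(ridge_slope(xs, ys, 5.0))  # shrunk toward 0 by the prior
```

Here least squares gives 15.2/14, about 1.086, and lam = 5 shrinks the slope to 0.8.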

• K? O'Rourke says:

Judea:
> boundaries between “testable” and “nontestable”

These may be vague. I used to think data models could be better checked than priors, but Mike Evans and others have convinced me not to be so sure. (This might not be of interest to Andrew, as I think he prefers to check them jointly in posterior predictive checks.)

Andrew: Maybe I’ll start using this as _my_ definition of frequentist: “having guarantees that do not explicitly involve the use of priors (e.g., vague acceptance of regularization)”.

One thing that I think most would agree is “nontestable” is the form of the loss function (e.g., when trying to decide between regularizations based on expected losses).

• judea pearl says:

Andrew,
I don’t know how I gave you the impression that I have a desire to be “rigorous”, or that
I keep coming up with objection after objection to Bayesian methods. I have not seen
ANY such objection in anything I have ever written.

In my reply to Konrad, above, I wrote:
———————————
That is why I am all in favor of “enlightened Bayesianism”,
defined by accepting as input any useful knowledge,
in whatever format it comes, combining it with data,
and getting useful conclusions whose merit is
guaranteed to be as good as the veracity of
the assumptions. What could be simpler than that?
——————
I do ask for guarantees, yes, but these are not necessarily mathematical
guarantees. All I am asking for is some verbal statement that tells me:
“If you follow this method, you are likely to do better than flipping a coin, where
‘likely’ comes from the veracity of your assumptions.”
Is that asking too much?

• Andrew says:

Judea:

I have a whole book full of Bayesian analyses that are better than flipping a coin, as well as an article where we did an extensive cross-validation study. Or if you want something simple, check out the article by Agresti and Coull. It is no secret that Bayesian methods work. We didn’t sell 40,000 copies because researchers had a burning desire for subjectivity.

More generally, see section 26.2 of this article. There are many ways of knowing.

In any case, my original point in the above post, which still stands, is that I don’t think it makes sense for Larry, in criticizing Bayesian methods, to characterize them so narrowly. Naive readers might think that when Larry writes, “Bayes is the analysis of subjective beliefs,” he is speaking in his role as a statistics expert and giving a neutral description. Actually, to the extent that “Bayes” includes what is in our book and in much of the literature, Bayes is not the analysis of subjective beliefs. To put it another way, Agresti and Coull’s confidence intervals based on (y+2)/(n+4) are no more subjective than the classical intervals based on y/n. My Bayesian estimates of public opinion are no more subjective than taking raw survey means (which indeed correspond to Bayesian analyses under unreasonable flat priors).
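The parenthetical about raw survey means can be illustrated with the normal–normal model. A minimal sketch, assuming each subgroup's raw mean ybar_j has sampling variance sigma²/n_j and each subgroup parameter has prior N(mu, tau²), with mu, tau and sigma treated as known for simplicity (in a real multilevel model they would be estimated from the data; the function name and numbers are hypothetical). The posterior mean is a precision-weighted average of the raw mean and mu; a flat prior (zero prior precision) returns the raw mean unchanged.

```python
def partially_pooled_means(ybar, n, sigma, mu, tau=None):
    """Posterior means of theta_j in the normal-normal model
        ybar_j ~ N(theta_j, sigma^2 / n_j),  theta_j ~ N(mu, tau^2).
    tau=None stands for a flat prior (zero prior precision)."""
    prior_precision = 0.0 if tau is None else 1.0 / tau**2
    estimates = []
    for yj, nj in zip(ybar, n):
        data_precision = nj / sigma**2
        # Precision-weighted average of the raw mean and the prior mean.
        estimates.append((data_precision * yj + prior_precision * mu)
                         / (data_precision + prior_precision))
    return estimates

ybar = [0.6, 0.4]   # raw means: a small subgroup and a large one
n = [10, 1000]
print(partially_pooled_means(ybar, n, sigma=0.5, mu=0.5))            # flat prior: the raw means
print(partially_pooled_means(ybar, n, sigma=0.5, mu=0.5, tau=0.05))  # about [0.509, 0.409]
```

The small subgroup is pulled strongly toward 0.5 while the large one barely moves, which is the behavior one usually wants when estimating opinion in sparse population subgroups.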

If you want to say that you don’t want to use my methods, fine. Or if you want to say you’d like to use Bayesian methods but interpret them using frequentist principles, fine. Or if you want to say that you think my models are ok but they need to be supplemented with causal modeling, fine. Go for it. But please don’t describe what I do as subjective, at least not unless you also want to describe almost all the rest of statistics using that same word. It just doesn’t help, especially since I went to great trouble in chapter 1 of our book to explain how probabilities can be and are estimated from data.

• Norm Matloff says:

First, I should clarify why I used the word “subjective” when I spoke of subjective priors: I have no objective at all to empirical Bayes methods. Second, I don’t have time to look at those book chapters now regarding “assessing fit,” but I must say I can’t imagine what exactly you might mean by that term; your statement about assessing fit can’t literally be correct, it seems to me. Third, by saying “propriety” rather than, say, “correctness,” I was simply alluding to the fact that no model is 100% correct. (Of course, note too that, being statisticians, we can’t say for sure whether the model is appropriate.)

Now, what I meant in, say, the logistic context is this. We can, for instance, apply nonparametric regression techniques to compare the regression function as fitted by the logit to one that doesn’t make any parametric assumptions, and assess the propriety of the logit accordingly.
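A check of this kind might look like the sketch below. It assumes a single predictor, fits the logit by maximum likelihood with Newton-Raphson, fits a Gaussian-kernel Nadaraya-Watson smoother with a hand-picked bandwidth as the assumption-free comparison, and summarizes the disagreement as the largest gap over a grid. All of these choices, the function names, and the simulated data are illustrative assumptions, not the specific procedure described above; how large a gap counts as evidence against the logit remains a judgment call.

```python
import math
import random

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def fit_logit(x, y, iters=25):
    """Maximum likelihood fit of P(y = 1 | x) = expit(a + b * x) by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            g0 += yi - p           # score for the intercept
            g1 += (yi - p) * xi    # score for the slope
            h00 += w               # entries of the Fisher information
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += information^{-1} times score
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def kernel_smooth(x, y, x0, bandwidth):
    """Nadaraya-Watson estimate of E[y | x = x0]; no parametric form assumed."""
    weights = [math.exp(-0.5 * ((xi - x0) / bandwidth) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

def max_gap(x, y, grid, bandwidth=0.4):
    """Largest absolute gap between the fitted logit curve and the smoother."""
    a, b = fit_logit(x, y)
    return max(abs(expit(a + b * g) - kernel_smooth(x, y, g, bandwidth)) for g in grid)

# Simulated data from a true logistic model (an assumption for illustration).
rng = random.Random(1)
xs = [rng.uniform(-2.0, 2.0) for _ in range(300)]
ys = [1 if rng.random() < expit(0.5 + 1.5 * xi) else 0 for xi in xs]
grid = [-1.5 + 0.25 * k for k in range(13)]
print("max gap between logit and smoother:", round(max_gap(xs, ys, grid), 3))
```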

I do agree that there are SOME frequentist methods that make assumptions of questionable testability. I suggest not using them!

• Norm Matloff says:

Of course, I meant to say I have no objection to empirical Bayes, not that I have no objective. :-)

• konrad says:

When comparing the two regression functions, I take it you would ask which function gives better predictive performance on out-of-sample data? Why would this not be possible for the predictions of Bayesian models?

• Andrew says:

Indeed, see fig 6 on p. 1378 of this paper for an example.

• Chris G says:

For linear regression I’ve had good results using a Geman-McClure prior. (I also used a G-M function rather than L2 to weight data-model deviations.) G-M isn’t as soft as Cauchy – which could be good or bad depending upon how well-matched your model is to your data. G-M seemed appropriate for my data – more so than Huber and Tukey functions at least – and made for a computationally-convenient M-estimator. It would be interesting to go back and apply a Cauchy prior and see how results were affected.
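For readers who have not met these functions, here is a sketch of the four losses mentioned, in common textbook parameterizations. The tuning constants (1.345 for Huber, 4.685 for Tukey) are the usual defaults, and the unit scale for Cauchy and Geman-McClure is an assumption; conventions and scalings vary across references. Used as a penalty on a coefficient, a loss ρ corresponds to a prior density proportional to exp(−ρ); for the Cauchy form below that is exactly a Cauchy(0, c) prior.

```python
import math

def huber(r, c=1.345):
    """Quadratic for |r| <= c, linear beyond: unbounded, but only linear growth."""
    a = abs(r)
    return 0.5 * r * r if a <= c else c * (a - 0.5 * c)

def tukey_biweight(r, c=4.685):
    """Redescending: flat at c^2/6 once |r| > c, so large residuals get zero weight."""
    if abs(r) > c:
        return c * c / 6.0
    u = (r / c) ** 2
    return (c * c / 6.0) * (1.0 - (1.0 - u) ** 3)

def cauchy(r, c=1.0):
    """log(1 + (r/c)^2): the negative log Cauchy(0, c) density up to a constant.
    Unbounded, but grows only logarithmically."""
    return math.log1p((r / c) ** 2)

def geman_mcclure(r, c=1.0):
    """(r/c)^2 / (1 + (r/c)^2): bounded above by 1."""
    u = (r / c) ** 2
    return u / (1.0 + u)

for r in (0.5, 1.0, 2.0, 5.0, 20.0):
    print(f"r={r:5.1f}  huber={huber(r):7.3f}  tukey={tukey_biweight(r):6.3f}  "
          f"cauchy={cauchy(r):6.3f}  gm={geman_mcclure(r):5.3f}")
```

The table makes the qualitative difference visible: Huber and Cauchy keep growing with the residual, while Tukey and Geman-McClure level off, so very large deviations stop influencing the fit.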

14. […] field collecting your own data, in order to be a good ecologist. Sorry, but this strikes me as an overly-restrictive definition. Not every ecologist is a great natural historian. Nor is every ecologist out in the field […]

15. […] 7. Who is a Bayesian? Another lively debate (105 comments) addressed the 250 year old question: “Who is a Bayesian?” http://statmodeling.stat.columbia.edu/2014/01/16/22571/ […]

16. […] see what a somewhat more level-headed statistics professor — Andrew Gelman — has to say on his colleague’s, to say the least, somewhat bizarre […]

17. […] This came up in a discussion a few years ago, where people were arguing about the meaning of probability: is it long-run frequency, is it subjective belief, is it betting odds, etc? I wrote: […]
