What is a Bayesian?

Deborah Mayo recommended that I consider coming up with a new name for the statistical methods that I used, given that the term “Bayesian” has all sorts of associations that I dislike (as discussed, for example, in section 1 of this article).

I replied that I agree on Bayesian, I never liked the term and always wanted something better, but I couldn’t think of any convenient alternative. Also, I was finding that Bayesians (even the Bayesians I disagreed with) were reading my research articles, while non-Bayesians were simply ignoring them. So I thought it was best to identify with, and communicate with, those people who were willing to engage with me.

More formally, I’m happy defining “Bayesian” as “using inference from the posterior distribution, p(theta|y)”. This says nothing about where the probability distributions come from (thus, no requirement to be “subjective” or “objective”) and it says nothing about the models (thus, no requirement to use the discrete models that have been favored by the Bayesian model selection crew). Based on my minimal definition, I’m as Bayesian as anyone else.
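As a minimal sketch of what that definition looks like in code (a conjugate beta-binomial example with made-up data; nothing here depends on where the prior came from):

```python
# Minimal sketch of "inference from the posterior p(theta|y)":
# a conjugate beta-binomial model with made-up data.
import numpy as np

y, n = 7, 10             # observed successes out of n trials (illustrative)
a, b = 1.0, 1.0          # Beta(1, 1) prior on theta; any prior would do here

# Conjugacy gives the posterior in closed form: Beta(a + y, b + n - y).
post_a, post_b = a + y, b + n - y

# All inference then flows from p(theta|y), e.g. by simulation:
rng = np.random.default_rng(1)
draws = rng.beta(post_a, post_b, size=100_000)
print(draws.mean())                       # posterior mean, ~0.67
print(np.percentile(draws, [2.5, 97.5]))  # 95% posterior interval
```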

56 thoughts on “What is a Bayesian?”

  1. Is it too vague or general to replace “Bayesian” with “probabilistic”?

    I realize that all methods in statistics are fundamentally probabilistic, but the primary sense in which I am a “Bayesian” is that I use probability calculus (like the sum and product rules) to transform noise in my observations into uncertainties in my parameters.

    But, of course, as we have argued here before, I even think that computing full likelihood functions (i.e., not just optimizing them to get point estimates) is the “Bayesian” or “probabilistic” thing to do; that is, it is not obvious that you even need to produce a posterior PDF to be seen as a “probabilistic” (or “Bayesian”) reasoner, in my sense.
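    As a rough sketch of that distinction (my made-up numbers, a binomial measurement model): the full likelihood function is a curve over the parameter, while the point estimate keeps only its maximizer.

    ```python
    # Sketch: a full likelihood function versus its maximizer alone,
    # for a binomial model with made-up data.
    import numpy as np
    from scipy.stats import binom

    y, n = 7, 10
    theta = np.linspace(0.001, 0.999, 999)  # grid over the parameter

    like = binom.pmf(y, n, theta)           # the full likelihood function
    mle = theta[np.argmax(like)]            # the point estimate, ~0.7

    # The curve carries uncertainty that the single number discards:
    supp = theta[like > like.max() * 0.15]  # crude "plausible range" from the curve
    print(mle, supp.min(), supp.max())
    ```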

    I’m just an astronomer, so please delete this comment.

    • David:

      There are nonprobabilistic statistical methods (those without a data-generating mechanism), but there are also probabilistic statistical methods that are not Bayesian. Maximum likelihood, for example, is such a method: it is not fully probabilistic, but I would still consider it probabilistic.

  2. I’m familiar with this problem. I’ve tried using the term “neo-Bayesian” in lectures and seminars, to refer to the combination of weakly informative (regularizing) priors, hierarchical models, and use of the posterior to simulate predictions and critique model structure. That’s what the biologists I teach want to learn.

    But I feel silly trying to coin a new term, since I had nothing to do with forming the approach. If Dr Gelman will coin it for me, I’ll gleefully adopt it.

    • My understanding is that NeoBayesian is usually reserved for the Ramsey-Savage-DeFinetti-Lindley (mark 2) form rather than the Laplace-Jeffreys-Lindley (mark 1) form. Thus “Neo-Bayesian” uses real priors rather than regularizing priors.

      If one wants to be positive about the philosophy, how about YesBeian?

    • Recall that Jaynes’ book is entitled “Probability Theory: The Logic of Science”.

      Jaynes regarded probability theory as an extension of logic, via the theorem of Cox (and, independently, Jack Good).
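      (A toy way to see the flavor of that claim, not the theorem itself: at the certainty extremes 0 and 1, the sum and product rules reduce to Boolean logic.)

      ```python
      # Toy check: at probabilities 0 and 1, the product and sum rules
      # reduce to Boolean AND/OR, i.e. probability extends two-valued logic.
      for a in (0.0, 1.0):
          for b in (0.0, 1.0):
              assert a * b == (a and b)         # product rule -> AND (independent case)
              assert a + b - a * b == (a or b)  # sum rule (inclusion-exclusion) -> OR
      print("degenerate probabilities behave like Boolean truth values")
      ```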

      I don’t know if that can be turned into a name to replace “Bayesian”, though.

      • Maybe Jaynes would have liked “Jeffreysian”?

        Since you can’t get two statisticians to agree on the color of oranges, it doesn’t matter too much what people call themselves now. It’s where they’re going that counts.

        If statisticians ever succeed in de-muddling the subject, everyone will likely just be called “scientists”. Jaynes would have preferred that name best of all.

      • The idea that scientific reasoning is captured by mathematical probability, or is best seen as an extension of formal logic (a tradition from the logical positivists), turns out to be an inadequate conception of scientific learning. I think Gelman agrees.

        • That’s a strong claim, and one that doesn’t sound like it will be popular among the commenters in this thread – is it based on a published critique of the arguments advanced by Jaynes and/or Jeffreys that you can link to? (I’m not sure if Andrew agrees – the last time we discussed it here he said that he had not read the argument by Jaynes.)

          I agree with most of Andrew’s points in his recent papers on this, and do not think that they are inconsistent with Jaynes’s position.

        • One looks in vain at the statistical writings of Laplace, Harold Jeffreys, and Jaynes for any sign of influence by, or even awareness of, the Logical Positivists.

          Do you suppose it’s possible that a philosopher and a statistician can’t use probability this way, but a mathematical physicist can?

        • So Mayo’s claim is that one particular version of this idea (advanced in the philosophy literature) turned out to be inadequate, but the claim does not extend to the version advanced by actual practitioners (the one that the commenters on this thread are more likely to be familiar with), which by implication has not been critiqued by philosophers? If so, doesn’t that make the stated claim disingenuous?

          Also, _why_ would philosophers not have critiqued Jaynes? Surely it’s well known that his approach is regarded as a pretty big deal in many Bayesian circles (to the extent of making it into textbooks aimed at senior undergraduates)?

        • Konrad,

          I’ll save you some trouble here. Mayo’s blog is on Gelman’s blog roll. You can find her papers there. It’s a great resource for building a much-needed deeper understanding of the Frequentists (there’s some genuine insight you won’t find elsewhere). You can also find criticisms of subjective Bayesians, which are uninteresting, because most people just completely ignore subjective Bayes altogether.

          But what you won’t find there is any comprehension of Jaynes. I can’t find any indication that she’s understood a single argument, technical or philosophical, he ever made. Indeed his mindset is so totally alien to hers it would be difficult for her to do so. If you try to engage her on the subject, you’ll just be talking past each other to no effect.

          Gelman, on the other hand, is mostly uninterested in Jaynes. As Gelman has said, “Jaynes is no guru”. Well, of course not. Arguably Gelman is more of a guru than Jaynes, and “Bayesian Data Analysis” contains fewer technical errors than “Probability Theory: The Logic of Science”. The only reason to read the latter book over the former is that it opens more fruitful avenues for future investigation. Of course lots of people won’t believe there are such opportunities in Jaynes’s work, but you don’t need to convince them they’re wrong. You can exploit those opportunities heedless of what anybody thinks about anything.

        • @konrad, The two (related but distinct) criticisms I can most easily recall Mayo making of this idea are, first, that mathematical probability applies to exhaustive sets, so we can’t use it to generate new hypotheses (a key task of scientific reasoning), and second, that if we “complete” the hypothesis space with the catchall hypothesis “something I haven’t thought of”, then we’re unable to update on account of having no way to calculate the probability of the data under the catchall hypothesis.
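          (To make the second criticism concrete, here is a toy discrete update with invented numbers: the machinery needs p(data|H) for every hypothesis, and a catchall supplies no such number.)

          ```python
          # Toy illustration of the catchall problem, with invented numbers.
          # A discrete Bayes update needs p(data | H) for every hypothesis H.
          priors = {"H1": 0.5, "H2": 0.4, "catchall": 0.1}
          likelihoods = {"H1": 0.20, "H2": 0.05}  # computable from explicit models
          # likelihoods["catchall"] = ???         # no model, hence no number to plug in

          # Dropping the catchall, the update goes through over {H1, H2} only:
          evidence = sum(priors[h] * likelihoods[h] for h in likelihoods)
          posterior = {h: priors[h] * likelihoods[h] / evidence for h in likelihoods}
          print(posterior)  # {'H1': 0.833..., 'H2': 0.166...}
          ```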

        • @Entsophy: Thanks, I appreciate the summary – I’m not about to wade through a bunch of writing on frequentist philosophy just to see whether objective Bayesianism is addressed along the way.

          I do not think one should read either BDA or PT:TLOS over the other: the two books have completely different aims. BDA is a guide to practical methodology, while PT:TLOS is about fundamentals. Chapter 1 of BDA starts with “By Bayesian data analysis, we mean practical methods…” and deliberately avoids defining probability: “we assume that the reader has a fair degree of familiarity with basic probability theory” (section 1.5). This allows the authors to target a broad readership who may have very diverse interpretations of what probability actually is. By contrast, PT:TLOS is specifically _about_ what probability is and how methodology is driven by this interpretation.

          @Corey: The first point strikes me as irrelevant – is anyone under the impression that probability theory can be used to generate new hypotheses, or that scientific reasoning consists _solely_ of probability theory? Jaynes addresses the second point in section 4.4 of PT:TLOS, arguing that attempting to complete the hypothesis space is undesirable. Instead, he advocates using probability theory as a tool to assist our human reasoning process.

          This is consistent with Gelman’s falsificationist approach: to Jaynes, probability theory formalises inductive reasoning in a context where explicitly stated information is available – in practical applications, such information typically comes in the form of strong assumptions, and the generation of those assumptions is outside the scope of probability theory. As Gelman has pointed out, the assumption generation / model construction process actually used by Jaynes seems to be a falsificationist mode of reasoning. This can be contrasted to the formal probabilistic reasoning that follows once explicit assumptions have been written down.

          So neither of these criticisms applies to Jaynes’s approach.

        • Corey:

          It does seem a bit unfair to reject someone’s work because you read that they came to a position contrary to what you wanted/hoped.

          Peirce always admitted he was wrong and might very well have arrived at a different position in the modern context – recall what Bayesian meant before 1900.

          And actually getting Peirce right – what his actual position was – is very, very hard.

          People have told me before that they don’t want to read Peirce because they heard he was not a Bayesian – and that seems wrong.

        • K?, I’m not familiar enough with Peirce’s positions to reject them! I mentioned Peirce’s position on probability only to emphasize the confusion that would result from applying the label “Pragmatist” to a person professing AG’s philosophy of statistics.

  3. Given that the good reverend used only a very restricted subset of priors, while Laplace generalized the rule to take into account arbitrary priors, I say we go with Laplacian (historical assertions based on the early sections of this book).

    Then we’d have “Laplacian Data Analysis,” “Laplacian Methods for Data Analysis,” and “Doing Laplacian Data Analysis,” and the like. But then maybe that’d be lumping unlikes together again…

    • Noah:

      I’d prefer Laplacian to Bayesian but making the change doesn’t seem worth the effort, given that I don’t like the sound of any “name”-ian, which to me sounds like a religious sect rather than an approach to science.

      • Plus “Laplacian” is already the name for a differential operator. This could lead to some amusing misunderstandings in that interesting part of mathematics where Probability Theory meets Partial Differential Equations.

    • Stigler has argued more towards Francis Galton, as he was one of the earliest to use the prior to represent some knowledge rather than just a lack of it – so Galtonian?

      As an aside, Peirce substituted “pragmaticism” for “pragmatism”, saying it might be ugly enough to prevent others from stealing the term. So Bayesianicistic?

      Or maybe I should just read Lewis Carroll more carefully ;-)

  4. It’s just a name assigned to people who do statistical inference using the posterior distribution. Its special property is that everyone, everywhere, will understand how you think (and work) when they hear that you are a Bayesian.

    • “everyone, everywhere, will understand how you think (and work) when they hear that you are a Bayesian”. That presumption is exactly why (or one reason) I recommend the name change. I don’t think it informs how Gelman thinks/works.

  5. I quite like the old term “inverse probability”, but it’s hard to turn that into an adjective that would apply to a person.

    As for David’s comment:

    “I’m just an astronomer, so please delete this comment.”

    Astronomers and physicists have no need to apologise for being involved in this! We have Laplace, Jeffreys, Cox, Metropolis, Jaynes, Skilling, etc etc. :)

    • Most of the best expositions of probability theory I’ve seen were written by physicists. And a notable member of the list is David Hogg :-)

  6. I’ve found that as soon as I talk to scientists other than statisticians, the term ‘statistician’ is generally enough of a turn-off to put an end to some discussions, and ‘Bayesian’ is simply a different shade of the same colour (if the average chemist/engineer has even heard of the term). Most people want to get more from their data but don’t want to know that this is statistics. ‘Data Scientist’ is a bit trendy/management but it is more descriptive and does not cause people to run screaming from the room.

  7. To me, “Bayesian” denotes someone who understands “probability” as a measure of an information state – to be calculated conditional on data and assumptions rather than measured or estimated. By contrast, “frequentist” denotes someone who understands “probability” as an empirical quantity – to be measured or estimated rather than calculated. (In cases where frequentist methods apply, frequentist probabilities are well approximated by Bayesian probabilities; in cases where frequentist methods do not apply – because data sets are not sufficiently informative – they are not.)

    This is a broad definition which includes many different schools of thought, but that’s no reason to avoid the label “Bayesian” – people who want to describe their position more precisely just need to be more specific. My personal preference for distancing myself from the associations Andrew dislikes is “objective Bayesian”.

    I do _not_ think the distinction should be based on methodology – Bayesian methods (e.g. working with posterior distributions rather than point estimates) are just what emerges as the correct solution once you phrase problems in terms of the Bayesian concept of probability. There are many people who use Bayesian methods without really thinking about (or agreeing with) the underlying justification, and I would prefer _not_ to call them Bayesians. I often use frequentist methods, but that doesn’t make me a frequentist.

    • Konrad:

      Regarding your first paragraph: Noooooooooooooooooo! Bayesians such as myself understand probability as an empirical quantity. See chapter 1 of Bayesian Data Analysis for much discussion and some examples.

      • From section 1.5 of BDA: “In Bayesian statistics, probability is used as the fundamental measure or yardstick of uncertainty. Within this paradigm, it is equally legitimate to discuss the probability of ‘rain tomorrow’ or…”

        I do not see how the probability of ‘rain tomorrow’ can be thought of as an empirical quantity? It certainly cannot be sensibly measured or estimated. However, the probability of ‘rain tomorrow’ conditional on a data set and a set of strong model assumptions (which need to be specified precisely) can be _calculated_. This would be an exact calculation of a precisely defined mathematical quantity, not an estimate of an empirical quantity.

        Also in 1.5 of BDA you claim (I think? the text is “many would agree that…”) that the probability of heads for a coin known to be either double-headed or double-tailed is 0.5. If you were talking about an empirical quantity, wouldn’t it be equal to either 0 or 1?

      • In case it helps to clarify the point I’m trying to make, below is an excerpt from an email discussion I recently had with a collaborator (my text):

        In statistics, “estimate” is used to refer to a calculation that only converges (and only if it happens to be unbiased) to the right answer as data set size increases – our posterior is not that kind of estimate.

        To illustrate the issue here, consider a coin-tossing experiment, where D is an observation sequence and theta is the generative model stating that the outcomes are iid samples from a binomial distribution with bias P(H)=f drawn from a uniform prior. In the case where the observation sequence consists of a single outcome, H, we can write P(f|D,theta)=2f. This is not an estimate, but an _exact_ expression for the posterior distribution of f. We could also write the posterior probability for the outcome of the next toss D’ landing heads, which is the mean of the previous expression: P(D’=H|D,theta)=2/3. Again this is an exact calculation of the specified quantity, not an estimate. One could of course _use_ it as an estimate and write \hat{f} = 2/3, but this would be silly: we would have no reason to believe that f is close to \hat{f} – the problem would be not the probability calculation (which is just a statement of fact about the probability in question) but the dubious decision to construct a point estimator on such a small data set.
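        (A quick numerical check of those two numbers, as a sketch: the posterior p(f|D,theta) = 2f is the Beta(2,1) density, and the predictive 2/3 is its mean.)

        ```python
        # Numerical check of the coin example: uniform prior on f, one observed H.
        # The posterior p(f|D,theta) = 2f is the Beta(2, 1) density, and the
        # posterior predictive P(D'=H|D,theta) is its mean, 2/3.
        import numpy as np

        rng = np.random.default_rng(0)
        f = rng.beta(2, 1, size=1_000_000)  # draws from the exact posterior
        print(f.mean())                     # ~0.6667

        # The same answer by brute force: keep prior draws that reproduce the data.
        f0 = rng.uniform(0, 1, size=1_000_000)
        kept = f0[rng.uniform(0, 1, size=f0.size) < f0]  # accept with prob P(H|f) = f
        print(kept.mean())                  # again ~2/3
        ```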

      • Why? We want a crisp definition allowing us to decide to whom the “Bayesian” label does or does not apply. Basing it on the definition/interpretation of probability is crisp (and relevant, because this is a fundamental disagreement between the two world views; perhaps even the only fundamental disagreement). I’m not sure that basing it on other distinctions will yield a crisp definition.

  8. ‘More formally, I’m happy defining “Bayesian” as “using inference from the posterior distribution, p(theta|y)”.’

    Doesn’t that come up against Brad Efron’s point that ‘using Bayes rule doesn’t make one a Bayesian’? Which I think is meant to mean that pretty much everyone, including all the ‘frequentists’ (Fisher, Neyman, Pearson, etc.), would agree that if you are given objective prior probabilities from a known setup or as part of the problem, then you’d use Bayes. It’s just conditional probability. Frequentism, in a sense, was thought of as a way to make do when you couldn’t be sure you had the full information.
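    (For instance, with a known base rate handed to you as part of the problem, everyone computes the same conditional probability. A sketch with invented screening-test numbers:)

    ```python
    # Uncontroversial Bayes: the prior is a known base rate given as part of
    # the problem setup. All numbers are invented for illustration.
    base_rate = 0.01        # P(condition), known from the setup
    sensitivity = 0.95      # P(test positive | condition)
    false_pos = 0.05        # P(test positive | no condition)

    p_pos = base_rate * sensitivity + (1 - base_rate) * false_pos
    print(base_rate * sensitivity / p_pos)  # P(condition | positive) ~ 0.161
    ```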

    • Of course, there is a Bayesian way to accomplish this, namely, to use an “automatic” prior appropriate to the problem: a Jeffreys prior, a reference prior, etc. Very often, use of such a prior will give results numerically similar to what you’d get using a frequentist approach (although the interpretation would be different).
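      (As a sketch with made-up data: the Jeffreys prior for a binomial proportion is Beta(1/2, 1/2), and its posterior interval typically lands close to the frequentist confidence interval.)

      ```python
      # Sketch: Jeffreys prior Beta(1/2, 1/2) for a binomial proportion; its
      # 95% posterior interval is numerically close to a frequentist interval,
      # though the interpretations differ. Data are made up.
      from scipy import stats

      y, n = 40, 100
      jeffreys_post = stats.beta(0.5 + y, 0.5 + n - y)
      print(jeffreys_post.ppf([0.025, 0.975]))     # ~[0.31, 0.50]

      p_hat = y / n                                # compare the Wald interval
      se = (p_hat * (1 - p_hat) / n) ** 0.5
      print(p_hat - 1.96 * se, p_hat + 1.96 * se)  # ~[0.30, 0.50]
      ```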

  9. The cited article is a really great paper, but for those who don’t know it yet, be sure to read footnote 1. In my opinion, that footnote is in some ways more important than the rest of the paper.

    • Isn’t a Bayesian someone who views probability as an epistemological quantity, whereas a frequentist views probability as ontological? At least for me that seems to be the key distinction. A Bayesian finds a probability *for* something, whereas a frequentist finds the probability *of* something. Ontology and epistemology are on two different levels: ontology is what the world *is*, whereas epistemology is a *description* of the world. Bayesians believe that probability is there to describe the world; frequentists believe probabilities are themselves fundamental.

      • That is the point of view I was advocating above. Andrew doesn’t seem to like it, but I don’t see how one can read Ch 1 of BDA with an ontological interpretation of probability.

  10. Sounds a lot like the distinction between “thick” (content given by assumption) and “thin” (agnostic w.r.t. content) rationality. Perhaps you’re a thin Bayesian….
