# Philosophy of Bayesian statistics: my reactions to Cox and Mayo

The journal Rationality, Markets and Morals has finally posted all the articles in their special issue on the philosophy of Bayesian statistics.

My contribution is called Induction and Deduction in Bayesian Data Analysis. I’ll also post my reactions to the other articles. I wrote these notes a few weeks ago and could post them all at once, but I think it will be easier if I post my reactions to each article separately.

To start with my best material, here’s my reaction to David Cox and Deborah Mayo, “A Statistical Scientist Meets a Philosopher of Science.” I recommend you read all the way through my long note below; there’s good stuff throughout:

1. Cox: “[Philosophy] forces us to say what it is that we really want to know when we analyze a situation statistically.”

This reminds me of a standard question that Don Rubin (who, unlike me, has little use for philosophy in his research) asks in virtually any situation: “What would you do if you had all the data?” For me, that “what would you do” question is one of the universal solvents of statistics.

2. Mayo defines scientific objectivity as concerning “the goal of using data to distinguish correct from incorrect claims about the world” and contrasts this with so-called objective Bayesian statistics. All I can say here is that the terms “subjective” and “objective” seem way overloaded at this point. To me, science is objective in that it aims for reproducible findings that exist independent of the observer, and it’s subjective in that the process of science involves many individual choices. And I think the statistics I do (mostly, but not always, using Bayesian methods) is both objective and subjective in that way.

## 17 thoughts on “Philosophy of Bayesian statistics: my reactions to Cox and Mayo”

• I think the hypothetical question, “What would you do if you had all the data?” is specious. In cases where some of the variables are quantitative one can never have all the data. If all the variables are categorical and the population is finite and small enough to take a reliable census then no inference is required. One just describes the population using traditional methods.

Why is Rubin’s question deep, or is the emperor missing his clothes?

The “what would you do if you had all the data” question has been helpful to me in many statistical applications I’ve worked on over the years. It may not be helpful to you, perhaps because you have already internalized the principle, but for me it is an always-helpful reminder. In regard to your point above, it is not always clear how to “describe the population using traditional methods.” First, even if one had all the possible data, there is often a goal of generalizing to new settings. Second, description itself can be a challenge in high dimensions, or even in two dimensions: consider Red State, Blue State.

1. I love the discussions and just thought I would chime in something.

I was recently reading “Objectivity” by Daston and Galison after Galison gave a talk at my University. I only had a couple days and was busy otherwise, so I didn’t get through it, but it did have some interesting points about the history of objectivity as an epistemic virtue that is relevant to this discussion.

They write that objectivity is a fairly recent virtue, developing from the ability of scientists to take photographic images and to use other instruments in which the observer plays a passive role in the observation. Previously, the guiding ethic was “truth-to-nature,” in which the scientist would try to best replicate the symmetries and common properties underlying the observed, without the flaws and imperfections of any particular specimen. The authors then discuss the emergence of objectivity and the desire of scientists to remove the impact of the observer on the observed. They also discuss the limitations of objectivity, however: data contain unwarranted artifacts and outliers, and judgment must be used to deal with them.

I bring this up because it seems connected to your statement:
“If one’s only goal is to summarize the data, then taking the difference of 8% (along with a confidence interval and even a p-value) is fine. But if you want to generalize to the population—which was indeed the goal of the researcher in this example—then it makes no sense to stop there.”

The idea is that one must prioritize one’s epistemic virtues: being objective and being true to underlying nature are sometimes in conflict, and trained judgment is needed to bridge the gap between them.

2. Thank you for taking the time to write this. I appreciate you stating clearly the point that “inference about a population” is “between” the two goals of (a) Extracting information from the data and (b) A “personalistic theory” of “what you should believe.”

Also, your clear statement of the need for tough choices when working on thorny and thrilling problems where only weak data are available, and where necessarily informative priors could irrationally (or rationally) prejudice decisions about the objectivity of the analysis, along with your solution of sensitivity analysis against different priors. My guess is that using a prior different from the researcher’s preferred prior will just show that no inference at all can be done: dispensing with the tiniest amount of information will prevent any work from being done, such is the skill, sound judgment, and good faith of the researcher in his problem space, typically. Infinite sensitivity to even a minimally deficient prior. The benefit of your stance is a total commitment to openness, regardless.
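The kind of prior sensitivity analysis being discussed can be sketched concretely. Below is a minimal conjugate Beta-Binomial example (all numbers and prior choices are hypothetical, chosen only for illustration, not drawn from the post): the same data are combined with several candidate priors, and one simply inspects how far the posterior moves.

```python
# Prior sensitivity check in a conjugate Beta-Binomial model.
# Hypothetical data: 8 successes out of 20 trials.
successes, n = 8, 20

# Candidate priors Beta(a, b): flat, Jeffreys, and two informative ones.
priors = {
    "flat Beta(1, 1)": (1, 1),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "optimistic Beta(8, 2)": (8, 2),
    "skeptical Beta(2, 8)": (2, 8),
}

for name, (a, b) in priors.items():
    # Conjugate update: posterior is Beta(a + successes, b + failures).
    post_a, post_b = a + successes, b + (n - successes)
    post_mean = post_a / (post_a + post_b)
    print(f"{name:<25} posterior mean = {post_mean:.3f}")
```

If the four posterior means roughly agree, the data dominate; if they disagree substantially (as the commenter predicts they will with weak data), the analysis is telling you that the prior, not the data, is carrying the inference.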

3. For what little it’s worth, I agree with what you’ve written!

When you talk about sensitivity analysis, might this move things in the direction of Wald’s Statistical Decision Theory?

Insofar as humans conduct science and draw inferences, it is obvious that “human judgments” are involved in science. This is true enough, but far too trivial an observation to distinguish between accounts that manage to afford reliable ways to learn about the world and those that don’t. An objective account of inference is deliberately designed to avoid being misled, despite limitations, biases, and limited information. By being deliberately conscious of the ways biases and prejudices lead us astray, an adequate account of inference develops procedures to set up stringent probes of errors while teaching us new things. I am not saying this is true for Gelman (it isn’t), but I have constantly heard, over the years, “we cannot get rid of human judgments” as a prelude to (a) throwing up one’s hands and saying that all science/inference is subjective, and/or (b) regarding as false or self-deceived any claim that a method is objective in the scientifically relevant sense of accomplishing reliable learning about the world. This is dangerous nonsense that has too often passed as a deep insight as to why scientific objectivity is impossible. In the trivial sense (science is done by humans), it is true. The idea of objectivity as the disinterested scientist (as Mike mentions, citing Galison) is indeed absurd. Objectivity demands a lot of interest: interest in not being fooled and deceived.
Methods and procedures for objective learning exist; we should be developing more of them rather than looking for cop-outs. An account is inadequately objective not because it contains human judgments (that it does is a tautology), but because it does not connect up with the real world adequately, readily allows ad hoc saves of our pet theories in the face of failed predictions, and/or commits any number of the errors humans know all too well they will fall into without stringent checks. You can search “objectivity” on my blog, errorstatistics.com, or, even better, in my work.

• I’m all for holding up objectivity as something to strive toward, but the fact that the choice of hypotheses we even consider is intertwined with “biases and prejudices” means that true objectivity is an inherently unattainable fantasy.

Maybe this is what you mean by “it is obvious that “human judgments” are involved in science”.

• We humans would not desire an “objectivity” that was irrelevant to humans in their desire to find things out, avoid being misled, and block dogmatic authoritarian positions where relevant criticism is barred, or claimed to be irrelevant or impossible. Who wants “true objectivity” (whatever that might mean, though perhaps “ask the robot” would do) when it hasn’t got a chance of producing the results we want and need in order to intelligently relate to the world? I really don’t get it….

5. For both issues in the discussion, it seems to me that we need to avoid the extremes.

In terms of objective/subjective, we can’t throw in the towel and say that “subjective humans are in the loop at every step, so there can be no objectivity”, but we also cannot say, “I used a technique that does not use the word ‘subjective’, so therefore my investigation is objective”. Every decision from what topic to investigate to what and how much data to gather, from what techniques are applicable to whether answers are physically significant and plausible, involves decisions that are made based on a researcher’s experience and agreed-upon conventions (which can vary between disciplines) rather than some objective standard. We attempt to be as objective as possible by developing and following conventions and by being open with our methodology so that others can decide for themselves if we’ve let our subjective decisions twist the result. (I wonder if subjectivity, in the “you let your personal views twist your science” sense, is somehow analogous to overfitting, and objectivity is our attempt at avoiding overfitting.)

In terms of prior information, it seems obvious that all of those decisions I’ve mentioned, are made based on a researcher’s prior information. It’s called “knowledge”. The key is making prior information (assumptions) explicit and visible so that they may also be examined by others. Trying to claim that methodology X does not use prior information is ridiculous: everything else about the investigation uses prior information in one form or another, and methodology X also depends on many assumptions which are judged “met” based on prior/snooped information.

In both cases, explicitly expressing prior information, assumptions, and decisions made in the investigative process is the key. It allows others to examine them for personal biases or oversights. People don’t replicate research, in general, to find arithmetic errors but rather to investigate what prior information, assumptions, and decisions might have been in error. The easier that process is, the more objective the investigation can ultimately be.

It seems to me that the strength of the Bayesian approach is that it attempts to make key assumptions (priors) clearer, and to allow for different assumptions to be more easily tested. That’s the difference between, say, a political discussion and a scientific one: in the political discussion most participants do not make their assumptions clear and often are not aware of them, so the discussion can go round and round with no resolution because it all occurs too high up the reasoning ladder. A scientific discussion can’t occur at the first-principles level, of course, but various proposals must have their roots exposed and available for inspection.

6. Pingback: Links of 2012-02-04 | Bad Simplicity

7. Not to presume to speak for David, but my take on his apparent lack of enthusiasm for formally modelling prior information as probabilities is a bit different: it seems to me to be based on an assessed benefit/cost of doing so (the gain of information, weighed against the risk of getting misinformation and all of the work involved by all parties to do this well).

This may be evinced by his writing on partial Bayes (OK to go ahead with priors just for unimportant nuisance parameters) and his verbal criticisms of full probability modeling of multiple biases à la Greenland. He seems to prefer to allow all those who are informed to have input into the formulation of the design and the structuring of the analysis, so that (further!) prior information becomes unnecessary or at least unimportant. This is perhaps most fully achieved in those convenient cases where higher-order asymptotics allows the nuisance parameters to be made independent (and then “any” prior on them gives the “same” answer [quotes to evade technicalities]).

That fits in with my favourite definition of frequentist statistics: a fervent attempt to avoid formal prior probability models at (almost) any cost. Of course, one of the motivations for the avoidance is a discomfort with things that cannot be empirically checked (and the reluctance of many Bayesians to re-engage with such discomforts perhaps explains all that reluctance to check priors). Newer work on prior checking may help here.

But then there is what happens in practice, how people in the field implement Bayes, and my first-hand experience of most of that has been really horrible. I have difficulty explaining this to Bayesian colleagues, who seem to think I am referring to subtle technical mistakes. No: it is someone taking a researcher’s data set, running WinBUGS with default priors (none of which, nor any prior at all, is discussed with the researcher), and then the percentage of the marginal credible interval for a guessed parameter of interest that is positive is conjured into _the_ relevant probability of that parameter being positive, for that researcher _and_ all of their colleagues.

They usually have a PhD in statistics, were sometimes even supervised by a very capable Bayesian, and they get personally upset when I call them on this sort of thing. As an instance, there was a biologist who was thrilled and impressed that the prior contained all the background knowledge on protein structure interactions and on how human biology processed and eliminated the substance. When I asked to see that prior and pointed out that the prior actually used, a data-equivalent prior of one observed exponential survival of 16 months, was unlikely to capture all of that, they were very embarrassed (but they had already written up the findings and discussion for their paper).
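For readers wondering how weak such a prior is: under the standard conjugate setup for an exponential survival model, a prior equivalent to a single observed survival of 16 months is a Gamma(shape = 1, rate = 16) distribution on the rate parameter. A rough sketch of what that implies (the 16-month figure is from the comment above; the conjugate framing and the 95% interval are illustrative assumptions, not the commenter’s calculation):

```python
import math

# Prior "equivalent to one observed exponential survival of 16 months":
# for an exponential rate lam, one observation t = 16 gives the conjugate
# prior Gamma(shape=1, rate=16), which is just Exponential(rate=16).
t = 16.0  # months, the single prior "observation"

def gamma1_quantile(p, rate):
    # Quantile of Gamma(shape=1, rate), i.e. the exponential distribution:
    # solve 1 - exp(-rate * x) = p for x.
    return -math.log(1 - p) / rate

lo_rate = gamma1_quantile(0.025, t)
hi_rate = gamma1_quantile(0.975, t)

# Translate the rate interval back to mean survival times (1 / lam).
print(f"95% prior interval for mean survival: "
      f"{1 / hi_rate:.1f} to {1 / lo_rate:.1f} months")
```

The implied mean survival time is spread over more than two orders of magnitude, which is exactly why a prior like this cannot plausibly encode detailed background knowledge about protein interactions and human metabolism.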

Anyways, no one wants their work evaluated by the worst it has ever been applied, but in choosing whether to be enthusiastic about an approach, it is worth considering.

• O’Rourke: a couple of people asked what I thought of your remark, and I hadn’t seen it, so I came back to it. I actually am not sure I understand it though. But I do think it is good to let people speak for themselves and assume they mean what they say. Cox is very wise.

On your last points, it’s good that you call out the Bayesians you mention on their presumed priors. I have increasingly been hearing stories like this, and it helps me to understand what J. Berger and others have been speaking out against. Perhaps, in an ironic twist, some Bayesian methods are being discredited as a product of their own success (in being integrated into easy-to-run programs that make them automatic). But here’s a scary thought: will the statistics Ph.D.s, with all their technical wizardry, be able to go back (to what underlies frequentist and other methods) and think things through? I think it might be forgotten knowledge, some of it… at least going by some (many?) textbooks nowadays.