## Bayesian statistical pragmatism

Rob Kass’s article on statistical pragmatism is scheduled to appear in Statistical Science along with some discussions. Here are my comments.

I agree with Rob Kass’s point that we can and should make use of statistical methods developed under different philosophies, and I am happy to take the opportunity to elaborate on some of his arguments.

I’ll discuss the following:
– Foundations of probability
– Confidence intervals and hypothesis tests
– Sampling
– Subjectivity and belief
– Different schools of statistics

Foundations of probability. Kass describes probability theory as anchored upon physical randomization (coin flips, die rolls and the like) but being useful more generally as a mathematical model. I completely agree but would also add another anchoring point: calibration. Calibration of probability assessments is an objective, not subjective process, although some subjectivity (or scientific judgment) is necessarily involved in the choice of events used in the calibration. In that way, Bayesian probability calibration is closely connected to frequentist probability statements, in that both are conditional on “reference sets” of comparable events. We discuss these issues further in chapter 1 of Bayesian Data Analysis, featuring examples from sports betting and record linkage.
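To make the calibration idea concrete, here is a minimal sketch in Python (all numbers are hypothetical, not taken from the examples in Bayesian Data Analysis): simulate probability forecasts and binary outcomes, then check that, within each bin of forecast probability (the “reference set”), the empirical frequency of the event matches the stated probability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical forecaster: the true event probabilities are known and are
# issued directly as forecasts, so the forecaster is calibrated by construction.
p_true = rng.uniform(0, 1, size=10_000)
outcomes = rng.binomial(1, p_true)
forecasts = p_true

# Calibration check: within each bin of forecast probability (the reference
# set), the empirical frequency of the event should be close to the bin's
# midpoint.
bins = np.linspace(0, 1, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (forecasts >= lo) & (forecasts < hi)
    print(f"forecasts in [{lo:.1f}, {hi:.1f}): "
          f"empirical frequency {outcomes[mask].mean():.2f}")
```

A real calibration study would of course use actual forecasts and outcomes; the point of the sketch is that, once the reference sets are chosen, the check itself is an objective comparison of stated probabilities to observed frequencies.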

Confidence intervals and hypothesis tests. I agree with Kass that confidence and statistical significance are “valuable inferential tools.” They are treated differently in classical and Bayesian statistics, however. In the Neyman-Pearson theory of inference, confidence and statistical significance are two sides of the same coin, with a confidence interval being the set of parameter values not rejected by a significance test. Unfortunately, this approach falls apart (or, at the very least, is extremely difficult) in problems with high-dimensional parameter spaces that are characteristic of my own applied work in social science and environmental health.
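The Neyman-Pearson duality can be seen in a small sketch (a normal mean with known scale; the data are simulated and all numbers are hypothetical): the set of candidate parameter values not rejected by a level-0.05 two-sided z-test coincides with the usual 95% confidence interval.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
sigma, n = 1.0, 100
x = rng.normal(0.3, sigma, size=n)   # hypothetical data, true mean 0.3
xbar = x.mean()
se = sigma / np.sqrt(n)
z = NormalDist().inv_cdf(0.975)      # two-sided 5% critical value

# A candidate value mu0 is "not rejected" by the z-test if the standardized
# discrepancy |xbar - mu0| / se does not exceed the critical value.
def not_rejected(mu0):
    return abs(xbar - mu0) / se <= z

# Scan a grid of candidate values; the accepted set is an interval that
# matches the directly computed 95% confidence interval.
grid = np.linspace(-1, 1, 2001)
accepted = grid[[not_rejected(m) for m in grid]]
print("accepted set:", accepted.min(), accepted.max())
print("direct 95% CI:", xbar - z * se, xbar + z * se)
```

In one dimension the inversion is a trivial scan; the difficulty Gelman alludes to is that in high-dimensional parameter spaces there is no such simple grid to scan, and the inversion becomes impractical.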

In a modern Bayesian approach, confidence intervals and hypothesis testing are both important but are not isomorphic; they represent two different steps of inference. Confidence statements, or posterior intervals, are summaries of inference about parameters conditional on an assumed model. Hypothesis testing, or, more generally, model checking, is the process of comparing observed data to replications under the model if it were true. Statistical significance in a hypothesis test corresponds to some aspect of the data that would be unexpected under the model. For Bayesians as for other statistical researchers, both these steps of inference are important: we want to use the mathematics of probability to make conditionally valid statements about unobserved quantities, and we also want to use this same probability theory to reveal areas in which our models do not fit the data.
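A minimal sketch of these two steps, under assumptions of my own choosing (hypothetical heavy-tailed data, a normal likelihood with known scale, and a flat prior): step 1 summarizes inference with a posterior interval; step 2 compares the observed data to replicated datasets via a posterior predictive p-value.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data with heavy tails that the assumed normal model will miss.
y = rng.standard_t(df=3, size=100)
n = len(y)
ybar = y.mean()

# Step 1: inference. Normal model with known sd 1 and flat prior on mu gives
# the posterior mu | y ~ N(ybar, 1/n); a 95% posterior interval:
interval = (ybar - 1.96 / np.sqrt(n), ybar + 1.96 / np.sqrt(n))
print("95% posterior interval for mu:", interval)

# Step 2: model checking. Simulate replicated datasets under the fitted model
# and compare a test statistic (here, the largest absolute observation).
def T(data):
    return np.max(np.abs(data))

t_obs = T(y)
t_rep = []
for _ in range(1000):
    mu = rng.normal(ybar, 1 / np.sqrt(n))   # draw mu from the posterior
    y_rep = rng.normal(mu, 1, size=n)       # replicate data under the model
    t_rep.append(T(y_rep))

p_value = np.mean(np.array(t_rep) >= t_obs)
print("posterior predictive p-value for max |y|:", p_value)
```

The test statistic, the maximum absolute value, is chosen to be sensitive to the tails, which is exactly the aspect of the data the assumed normal model gets wrong; a small p-value flags that misfit without invalidating the step-1 summary of the model's parameters.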

Sampling. Kass discusses the role of sampling as a model for understanding statistical inference. But sampling is more than a metaphor; it is crucial in many aspects of statistics. This is evident in the analysis of public opinion and health, where analyses rely on random-sample national surveys, and in environmental statistics, where continuous physical variables are studied using space-time samples. But even in areas where sampling is less apparent, it can be important. Consider medical experiments, where the object is invariably inference for the general population, not merely for the patients in the study. Similarly, the goal of Kass and his colleagues in their neuroscience research is to learn about general aspects of human and animal brains, not merely to study the particular creatures on which they have data. Ultimately, sample is just another word for subset, and in both Bayesian and classical inference, appropriate generalization from sample to population depends on a model for the sampling or selection process. I have no problem with Kass’s use of sampling as a framework for inference, and I think this will work even better if he emphasizes the generalization from real samples to real populations (not just mathematical constructs) that is central to so much of our applied inference.
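As a sketch of how a model for the selection process enters the generalization from sample to population, consider a hypothetical two-stratum population sampled unevenly (all names and numbers invented for illustration): reweighting stratum means by the known population shares, a simple poststratification estimate, recovers the population mean where the pooled sample mean does not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population: two strata with known shares and different means.
pop_share = {"young": 0.6, "old": 0.4}
true_mean = {"young": 0.2, "old": 0.8}

# The sample over-represents the "old" stratum (say, a phone survey).
n_sample = {"young": 100, "old": 300}
sample = {g: rng.normal(true_mean[g], 0.1, size=n_sample[g]) for g in pop_share}

# Naive estimate: pool the sample, ignoring the selection process.
pooled = np.concatenate(list(sample.values()))
naive = pooled.mean()

# Poststratified estimate: reweight stratum means by known population shares.
post = sum(pop_share[g] * sample[g].mean() for g in pop_share)

print(f"naive: {naive:.2f}, poststratified: {post:.2f}")
# True population mean is 0.6 * 0.2 + 0.4 * 0.8 = 0.44.
```

The point is not the particular estimator but that the correction is only possible because the selection process (which stratum was over-sampled, and by how much) is itself modeled.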

Subjectivity and belief. The only two statements in Kass’s article that I clearly disagree with are the following two claims: “the only solid foundation for Bayesianism is subjective,” and “the most fundamental belief of any scientist is that the theoretical and real worlds are aligned.” I will discuss the two statements in turn.

Claims of the subjectivity of Bayesian inference have been much debated, and I am under no illusion that I can resolve them here. But I will repeat my point, made at the outset of this discussion, that Bayesian probability, like frequentist probability, is, except in the simplest of examples, a model-based activity that is mathematically anchored by physical randomization at one end and calibration to a reference set at the other. I will also repeat the familiar, but true, argument that most of the power of a Bayesian inference typically comes from the likelihood, not the prior, and that a person who is really worried about subjective model-building might profitably spend more effort thinking about the assumptions inherent in additive models, logistic regressions, proportional hazards models, and the like. Even the Wilcoxon test is based on assumptions! To put it another way, I will accept the idea of subjective Bayesianism when this same subjectivity is acknowledged for other methods of inference. Until that point, I prefer to speak not of “subjectivity” but of “assumptions” and “scientific judgment.” I agree with Kass that scientists and statisticians can and should feel free to make assumptions without falling into a “solipsistic quagmire.”

Finally, I am surprised to see Kass write that scientists believe that the theoretical and real worlds are aligned. It is from acknowledging the discrepancies between these worlds that we can (a) feel free to make assumptions without being paralyzed by fear of making mistakes, and (b) feel free to check the fit of our models (those hypothesis tests again, though I prefer graphical model checks, supplemented by p-values as necessary). All models are false, etc.

I assume that Kass is using the word “aligned” in a loose sense, to imply that scientists believe that their models are appropriate to reality even if not fully correct. But I would not even want to go that far. Often in my own applied work I have used models that have clear flaws, models that are at best “phenomenological” in the sense of fitting the data rather than corresponding to underlying processes of interest–and often such models don’t fit the data so well either. But these models can still be useful: they are still a part of statistics and even a part of science (to the extent that science includes data collection and description as well as deep theories).

Different schools of statistics. Like Kass, I believe that philosophical debates can be a good thing, if they motivate us to think carefully about our unexamined assumptions. Perhaps even the existence of subfields that rarely communicate with each other has been a source of progress, in allowing different strands of research to develop in a pluralistic environment, in a way that might not have happened so easily if statistical communication had been dominated by any single intolerant group. Ideas of sampling, inference, and model checking are important in many different statistical traditions, and we are lucky to have so many different ideas on which to draw for inspiration in our applied and methodological research.

Go to the link above to read Rob’s original article. The other discussions, and Rob’s response to the discussions, are at the journal’s website.

1. Drew Tyre says:

Interesting discussion! As someone who is passing from frequentist training, through pragmatic Bayesianism, to ???, I found your discussion of subjectivity and belief particularly interesting. I found Donald Gillies' book "Philosophical Theories of Probability" quite enlightening, and I wonder if it is fair to characterize your position as pluralist? That is, there is an objective reality generating the events we see and measure, but our knowledge of those events is subjective. Our knowledge is subjective because it is conditional on the models we choose to use to quantify the distributions of events.

That may not be the best summary of pluralism.

2. Manuel Moe G says:

Thank you for writing this. This is the clearest and most comprehensive (even though short and to the point) statement of your philosophical views that I have read, and I am inclined to agree on all counts.

I read Gelman and Shalizi 2010 and enjoyed it a lot, or at least the parts my novice brain could understand. But the summary above hits and handles all the difficulties, and is easy to read.

I would recommend people read Gelman and Shalizi 2010, "Philosophy and the practice of Bayesian statistics" [ http://www.stat.columbia.edu/~gelman/research/unp… ], for the section on Mayo's "severe" testing of models, Section 4 "Model checking" – the only gap in the summary above that I can see.

3. Ewan says:

> I assume that Kass is using the word "aligned" in a loose sense, to imply that scientists believe that their models are appropriate to reality even if not fully correct. But I would not even want to go that far. Often in my own applied work I have used models that have clear flaws, models that are at best "phenomenological" in the sense of fitting the data rather than corresponding to underlying processes of interest–and often such models don't fit the data so well either.

Kass said that "the most fundamental belief of any scientist is that the theoretical and real worlds are aligned." My impression formed on the basis of reading statisticians is that they tend to accept "all models are false, but some are useful" somewhat uncritically. But you're bound to get a lot of pushback from scientists if you're suggesting that the only criterion against which we can judge a model is whether it is useful for making predictions.

As a scientist, I personally want to know how the world actually works. If there were no knowable fact of the matter, I just wouldn't bother. To say that we're only ever going to have an approximation to the truth, where we say that we don't care about certain details, is okay. Instead of saying that all models are false, but some are useful, the scientific realist would say, "any model will be false with probability one; but some will be closer to the truth than others." But to say that we're only interested in prediction as an end in itself, and that we wouldn't mind if the universe turned out to be a big platypus that just happened to conspire to be describable using physics and chemistry – I think most people just can't make sense of it. If the universe is really a platypus, we should in principle be able to find out and model that; maybe we won't ever see the evidence, but in that case, we would just say that our model is as close to the truth as we can get, not that it's "merely useful." Now, of course, some models are "merely useful"! And that's fine; but that's not where we want to be at the end of the day.

4. Andrew Gelman says:

Ewan:

By "useful," I don't just mean "useful for making predictions." I also mean useful for data reduction, useful for decision making, useful for scientific understanding, etc.

I agree that some models are closer to the truth than others. But don't forget that "closeness to the truth" is only a partial ordering.

5. Ewan says:

Agreed. I still would have been happy to read (or write) his original statement as is, because I understood it to mean that the fundamental belief is that "the theoretical and the real worlds can be aligned and our ultimate scientific goal is to align them". I think this is a point that bears repeating for scientists when thinking about statistics. But I agree that this is only true for the "ideal" scientist who only ever tests and refines theories, and not of actual scientists who also need to do descriptive work.