## Bayes and Popper

Is statistical inference inductive or deductive reasoning? What is the connection between statistics and the philosophy of science? Why do we care?

The usual story

Schools of statistical inference are sometimes linked to philosophical approaches. “Classical” statistics–as exemplified by Fisher’s p-values and Neyman’s hypothesis tests–is associated with a deductive, or Popperian, view of science: a hypothesis is made and then it is tested. It can never be accepted, but it can be rejected (that is, falsified).

Bayesian statistics–starting with a prior distribution, getting data, and moving to the posterior distribution–is associated with an inductive approach of gradually moving forward, generalizing from data to learn about general laws as expressed in the probability that one model or another is correct.

Our story

Our progress in applied modeling has fit the Popperian pattern pretty well: we build a model out of available parts and drive it as far as it can take us, and then a little farther. When the model breaks down, we take it apart, figure out what went wrong, and tinker with it, or else try a radically new design. In either case, we are using deductive reasoning as a tool to get the most out of a model, and we test the model–it is falsifiable, and when it is falsified, we alter or abandon it. To give this story a little Kuhnian flavor, we are doing “normal science” when we apply the deductive reasoning and learn from a model, or when we tinker with it to get it to fit the data, and occasionally enough problems build up that a “new paradigm” is helpful.

OK, all fine. But the twist is that we are using Bayesian methods, not classical hypothesis testing. We do not think of a Bayesian prior distribution as a personal belief; rather, it is part of a hypothesized model, which we posit as potentially useful and abandon to the extent that it is falsified.

Subjective Bayesian theory has no place for falsification of prior distributions–how do you falsify a belief? But if we think of the model just as a set of assumptions, they can be falsified if their predictions–our deductive inferences–do not fit the data.

Why does this matter?

A philosophy of deductive reasoning, accompanied by falsifiability, gives us a lot of flexibility in modeling. We do not have to worry about making our prior distributions match our subjective knowledge, or about our model containing all possible truths. Instead we make some assumptions, state them clearly, and see what they imply. Then we try to falsify the model–that is, we perform posterior predictive checks, creating simulations of the data and comparing them to the actual data. The comparison can often be done visually; see chapter 6 of Bayesian Data Analysis for lots of examples.
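As a rough illustration of a posterior predictive check, here is a minimal sketch in Python. Everything specific in it is an assumption made for the example and not taken from the book: the normal model with the prior p(mu, sigma^2) ∝ 1/sigma^2, the choice of sample skewness as the test quantity, the simulated datasets, and the function names. It uses a numerical summary where a plot of the replicated datasets would often serve just as well.

```python
import numpy as np

def posterior_predictive_pvalue(y, stat, n_sims=2000, seed=0):
    """Posterior predictive p-value Pr(T(y_rep) >= T(y) | y) under a normal
    model with the noninformative prior p(mu, sigma^2) ∝ 1/sigma^2.
    The model and prior are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n, ybar, s2 = y.size, y.mean(), y.var(ddof=1)
    # sigma^2 | y follows a scaled inverse chi-square(n - 1, s^2) distribution
    sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_sims)
    # mu | sigma^2, y follows Normal(ybar, sigma^2 / n)
    mu = rng.normal(ybar, np.sqrt(sigma2 / n))
    # One replicated dataset of size n for each posterior draw
    y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_sims, n))
    t_rep = np.apply_along_axis(stat, 1, y_rep)
    return float(np.mean(t_rep >= stat(y)))

def skewness(x):
    """Sample skewness, used here as the test quantity T."""
    d = x - x.mean()
    return float(np.mean(d**3) / np.mean(d**2) ** 1.5)

# Hypothetical data: one skewed sample the normal model should fail on,
# and one sample that really is normal.
rng = np.random.default_rng(1)
y_skewed = rng.exponential(scale=1.0, size=200)
y_normal = rng.normal(0.0, 1.0, size=200)

p_bad = posterior_predictive_pvalue(y_skewed, skewness)
p_ok = posterior_predictive_pvalue(y_normal, skewness)
print(f"skewed data: p = {p_bad:.3f}; normal data: p = {p_ok:.3f}")
```

A p-value near 0 or 1 for the skewed sample says that the replicated datasets almost never look as asymmetric as the real data, which is the sense in which the normal model is falsified here.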

I associate this “objective Bayes” approach–making strong assumptions and then testing model fit–with the work of E. T. Jaynes. As he has illustrated, the biggest learning experience can occur when we find that our model does not fit the data–that is, when it is falsified–because then we have found a problem with our underlying assumptions.

Conversely, a problem with the inductive philosophy of Bayesian statistics–in which science “learns” by updating the probabilities that various competing models are true–is that it assumes that the true model is one of the possibilities being considered. This does not fit our own experiences of learning by finding that a model doesn’t fit and needing to expand beyond the existing class of models to fix the problem.
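To make the concern concrete, here is a small sketch, again in Python, of what updating model probabilities can look like when none of the candidates is right. The two candidate models, the data-generating distribution, and the function names are all hypothetical choices for this example. Because the candidates have no free parameters, each model's marginal likelihood is simply its likelihood.

```python
import numpy as np

def log_normal_pdf(y, mean, sd):
    """Log density of Normal(mean, sd^2) evaluated at each element of y."""
    return -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * ((y - mean) / sd) ** 2

def posterior_model_probs(y, models, prior=None):
    """Posterior probabilities of fully specified normal models.
    `models` is a list of (mean, sd) pairs; the prior defaults to uniform."""
    log_lik = np.array([log_normal_pdf(y, m, s).sum() for m, s in models])
    if prior is None:
        prior = np.full(len(models), 1.0 / len(models))
    log_post = np.log(np.asarray(prior, dtype=float)) + log_lik
    log_post -= log_post.max()  # subtract the max for numerical stability
    w = np.exp(log_post)
    return w / w.sum()

# Hypothetical setup: the data come from Normal(0.5, 2^2), which is
# neither of the two candidates being compared.
rng = np.random.default_rng(0)
y = rng.normal(0.5, 2.0, size=1000)
candidates = [(0.0, 1.0), (2.0, 1.0)]  # (mean, sd) for each model

probs = posterior_model_probs(y, candidates)
print("posterior model probabilities:", probs)
print("sample sd:", y.std(ddof=1), "vs. sd of 1 assumed by both models")
```

In this setup the posterior probability piles up on the first candidate even though both candidates badly understate the spread of the data. The updating is doing its job within the class of models; only a check of the favored model against the data reveals that the class itself needs to be expanded.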

I fear that a philosophy of Bayesian statistics as subjective, inductive inference can encourage a complacency about picking or averaging over existing models rather than trying to falsify and go further. Likelihood and Bayesian inference are powerful, and with great power comes great responsibility. Complex models can and should be checked and falsified.

Postscript

These ideas are also connected to exploratory data analysis (see this paper in the International Statistical Review).

Also, Sander Greenland has written on the philosophy of Bayesian inference from a similar perspective–he has less sympathy with Popperian terminology but similarly sees probabilistic inference within a model as deductive (“Induction versus Popper: substance versus semantics,” International Journal of Epidemiology 27, 543-548 (1998)).

### One Comment

1. Sam Cook says:

Sander Greenland commented:

Looks like you are converging on an "objective" Bayes that does not just decay into mere replication of frequentist approaches using "noninformative" priors that would not be remotely credible given the underlying science. Quite the opposite — and the point I'd emphasize is: strong priors can be strongly refuted (there, I even used a Popperian term!).

The problem left unanswered by this view is: what happens when no data set we have or could get is capable of refuting most reasonable models? In my research areas, at least, too much of the Popperian prescription comes across to me as naively heroic, as if the falsificationist program can be helpful with the limited available data. What good is model rejection when the only models you can reject were never serious contenders?

Also, I wonder how you see your view as squaring with Phil Dawid's recent Stat Science article on his Bayes-Popper synthesis (2004; 19:44-57).

It looks to me like his "alternative" to counterfactuals (potential outcomes) is nothing more than marginal modeling of counterfactuals, without using that label (i.e., it is isomorphic to marginal structural modeling of potential outcomes).