Bayesian data analysis, as my colleagues and I have formulated it, has a human in the loop.

Here’s how we put it on the very first page of our book:

The process of Bayesian data analysis can be idealized by dividing it into the following three steps:

1. Setting up a full probability model—a joint probability distribution for all observable and unobservable quantities in a problem. The model should be consistent with knowledge about the underlying scientific problem and the data collection process.

2. Conditioning on observed data: calculating and interpreting the appropriate posterior distribution—the conditional probability distribution of the unobserved quantities of ultimate interest, given the observed data.

3. Evaluating the fit of the model and the implications of the resulting posterior distribution: how well does the model fit the data, are the substantive conclusions reasonable, and how sensitive are the results to the modeling assumptions in step 1? In response, one can alter or expand the model and repeat the three steps.
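In symbols, the three steps can be sketched as follows. The notation is an assumption added here, not part of the quoted passage: $y$ stands for the observed data, $\theta$ for the unobserved quantities, and $y^{\mathrm{rep}}$ for replicated data simulated from the fitted model.

```latex
\begin{align*}
\text{Step 1 (model):}\quad & p(\theta, y) = p(\theta)\, p(y \mid \theta) \\
\text{Step 2 (inference):}\quad & p(\theta \mid y) = \frac{p(\theta)\, p(y \mid \theta)}{p(y)} \;\propto\; p(\theta)\, p(y \mid \theta) \\
\text{Step 3 (checking):}\quad & p(y^{\mathrm{rep}} \mid y) = \int p(y^{\mathrm{rep}} \mid \theta)\, p(\theta \mid y)\, d\theta,
  \quad \text{then compare } y^{\mathrm{rep}} \text{ to } y
\end{align*}
```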

How does this fit in with goals of performing statistical analysis using artificial intelligence? Lots has been written on “machine learning” but in practice this often captures just part of the process. Here I want to discuss the possibilities for automating the entire process.

Currently, human involvement is needed in all three steps listed above, but in different amounts:

1. Setting up the model involves a mix of look-up and creativity. We typically pick from some conventional menu of models (linear regressions, generalized linear models, survival analysis, Gaussian processes, splines, BART, etc.). Tools such as Stan allow us to put these pieces together in unlimited ways, in the same way that we can formulate paragraphs by putting together words and sentences. Right now, a lot of human effort is needed to set up models in real problems, but I could imagine an automatic process that constructs models from parts, in the same way that there are computer programs to write sports news stories.

2. Inference given the model is the most nearly automated part of data analysis. Model-fitting programs still need a bit of hand-holding for anything but the simplest problems, but it seems reasonable to assume that the scope of the “self-driving inference program” will gradually increase. Just for example, we can automatically monitor the convergence of iterative simulations (that came in 1990!) and, with NUTS, we don’t have to tune the number of steps in Hamiltonian Monte Carlo. Step by step, we should be able to make our inference algorithms more automatic, also with automatic checks (for example, based on adaptive fake-data simulations) to flag problems when they do appear.

3. The third step—identifying model misfit and, in response, figuring out how to improve the model—seems like the toughest part to automate. We often learn of model problems through open-ended exploratory data analysis, where we look at data to find unexpected patterns and compare inferences to our vast stores of statistical experience and subject-matter knowledge. Indeed, one of my main pieces of advice to statisticians is to integrate that knowledge into statistical analysis, both in the form of formal prior distributions and in a willingness to carefully interrogate the implications of fitted models.
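For a concrete sense of what the automatic convergence monitoring in step 2 involves, here is a minimal sketch of the split-R-hat diagnostic in plain Python. The function name `split_rhat` and the toy chains are made up for illustration, and production software uses more refined versions of this calculation, so treat it as the basic idea only.

```python
import random
import statistics

def split_rhat(chains):
    """Basic split-R-hat for a list of equal-length chains of scalar draws."""
    half = len(chains[0]) // 2
    # Split every chain in two so that drift within a chain also shows up.
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])  # W
    between = n * statistics.variance(means)                             # B
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5

# Toy chains (illustrative only): four that agree, and four where two are stuck elsewhere.
rng = random.Random(0)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 1) for _ in range(1000)] for mu in (0, 0, 3, 3)]

rhat_mixed = split_rhat(mixed)  # should come out close to 1
rhat_stuck = split_rhat(stuck)  # should come out well above 1
```

A check like this needs no human judgment beyond choosing a threshold, which is part of why step 2 is the easiest to automate.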

How would an AI do step 3? One approach is to simulate the human in the loop by explicitly building a model-checking module that takes the fitted model, uses it to make all sorts of predictions, and then checks this against some database of subject-matter information. I’m not quite sure how this would be done, but the idea is to try to program up the Aha process of scientific revolutions.
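The mechanical half of such a module is easy enough to write down. Here is a minimal sketch of a posterior predictive check in plain Python: simulate replicated datasets from the fitted model and see how often a test statistic is at least as extreme as in the real data. Everything named here (`ppc_pvalue`, the toy data, the normal model with its scale fixed at the sample standard deviation) is an assumption for illustration. The hard part, choosing which statistic to look at, is still supplied by a human.

```python
import random
import statistics

def ppc_pvalue(y, draw_posterior, simulate, stat, n_rep=1000, seed=0):
    """Share of replicated datasets whose statistic is at least the observed one."""
    rng = random.Random(seed)
    t_obs = stat(y)
    exceed = 0
    for _ in range(n_rep):
        theta = draw_posterior(rng)           # one draw of the parameters
        y_rep = simulate(theta, len(y), rng)  # one replicated dataset
        exceed += stat(y_rep) >= t_obs
    return exceed / n_rep

# Toy data (illustrative only): 50 well-behaved points plus one gross outlier.
data_rng = random.Random(1)
y = [data_rng.gauss(0, 1) for _ in range(50)] + [8.0]

# Assumed model: normal with scale fixed at the sample sd (a plug-in shortcut)
# and a flat prior on the mean, so mu | y ~ normal(ybar, s / sqrt(n)).
ybar, s = statistics.fmean(y), statistics.stdev(y)

def draw_mu(rng):
    return rng.gauss(ybar, s / len(y) ** 0.5)

def sim_normal(mu, n, rng):
    return [rng.gauss(mu, s) for _ in range(n)]

def largest(data):
    return max(abs(v) for v in data)

# A p-value near 0 says the model almost never reproduces a point this extreme.
p_value = ppc_pvalue(y, draw_mu, sim_normal, largest)
```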

**The conscious brain: decision-making homunculus or interpretive storyteller?**

There is another way to go, though, and I thought of this after seeing Julien Cornebise speak at Google about a computer program that his colleagues wrote to play video games. He showed the program “figuring out” how to play a simulation of the 1970s arcade classic game, Breakout. What was cool was not just how it could figure out how to position the cursor to always get to the ball on time, but how the program seemed to learn strategies: Cornebise pointed out how, after a while, the program seemed to have figured out how to send the ball up around the blocks to the top where it would knock out lots of bricks.

OK, fine. What does this have to do with model checking, except to demonstrate that in this particular example no model checking seems to be required as the model does just fine?

Actually, I don’t know on that last point, as it’s possible the program required some human intervention to get to the point that it could learn on its own how to win at Breakout.

But let me continue. For many years, cognitive psychologists have been explaining to us that our conscious mind doesn’t really make decisions as we usually think of it, at least not for many regular aspects of daily life. Instead, we do what we’re gonna do, and our conscious mind is a sort of sportscaster, observing our body and our environment and coming up with stories that explain our actions.

To return to the Breakout example, you could imagine a plug-in module that would observe the game and do some postprocessing—some statistical analysis on the output—and notice that, all of a sudden, the program was racking up the score. The module would interpret this as the discovery of a new strategy, and do some pattern recognition to figure out what’s going on. If this happens fast enough, it could feel like the computer “consciously” decided to try out the bounce-the-ball-along-the-side-to-get-to-the-top strategy.

That’s not quite what the human players do: we can imagine the strategy without it happening yet. But of course the computer could do so too, via a simulation model of the game.

Now let’s return to step 3 of Bayesian data analysis: model checking and improvement. Maybe it’s possible for some big model to be able to learn and move around model space, and to suddenly come across better solutions. This could look like model checking and improvement, from the perspective of the sportscaster part of the brain (or the corresponding interpretive plug-in to the algorithm) even though it’s really just blindly fitting a model.

All that is left, then, is the idea of a separate module that identifies problems with model fit based on comparisons of model inferences to data and prior information. I think that still needs to be built.

I feel so influential!

Your description of the discovery of a new strategy in Breakout is a pretty accurate account of reinforcement learning.

To translate to model building/checking, you’d have to frame the model building/checking process in the terms given by the Introduction section of this article:

https://en.m.wikipedia.org/wiki/Reinforcement_learning

You’d have to define a finite set of possible actions and some notion of reward.

In the Atari games, the model had knowledge of the game’s score, which served as the reward.

To be fair to Cornebise and his colleagues at DeepMind, reinforcement learning is exactly the technique they used in this work.
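To make the "finite set of actions plus a reward" framing concrete, here is a toy sketch in plain Python. It is an epsilon-greedy bandit, about the simplest reinforcement-learning setup there is, and not what DeepMind used for Atari. The three candidate models, the simulated data, and the choice of reward (negative held-out squared error on a random split) are all assumptions made up for illustration.

```python
import random

def fit_line(u, y):
    """Least-squares intercept and slope of y on a single feature u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    suu = sum((a - mu) ** 2 for a in u)
    suy = sum((a - mu) * (b - my) for a, b in zip(u, y))
    slope = suy / suu if suu > 0 else 0.0
    return my - slope * mu, slope

# Finite action set: each action picks a candidate model (a feature map).
ACTIONS = {
    "constant": lambda x: 0.0,
    "linear": lambda x: x,
    "quadratic": lambda x: x * x,
}

def reward(action, x, y, rng):
    """Negative held-out mean squared error on a random half/half split."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    train, test = idx[: len(idx) // 2], idx[len(idx) // 2 :]
    f = ACTIONS[action]
    a, b = fit_line([f(x[i]) for i in train], [y[i] for i in train])
    mse = sum((y[i] - a - b * f(x[i])) ** 2 for i in test) / len(test)
    return -mse

def epsilon_greedy(x, y, rounds=300, eps=0.1, seed=0):
    rng = random.Random(seed)
    names = list(ACTIONS)
    total = {k: 0.0 for k in names}
    count = {k: 0 for k in names}
    for t in range(rounds):
        if t < len(names):
            k = names[t]               # try every action once first
        elif rng.random() < eps:
            k = rng.choice(names)      # explore
        else:
            k = max(names, key=lambda a: total[a] / count[a])  # exploit
        total[k] += reward(k, x, y, rng)
        count[k] += 1
    return max(names, key=lambda a: total[a] / count[a])

# Simulated data with a quadratic trend (illustrative only).
data_rng = random.Random(1)
x = [data_rng.uniform(-2, 2) for _ in range(200)]
y = [xi * xi + data_rng.gauss(0, 0.3) for xi in x]
best = epsilon_greedy(x, y)  # name of the action with the highest average reward
```

The sketch only searches a menu that was fixed in advance, and the reward is plain prediction error, so it says nothing yet about how to score the open-ended parts of model checking.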

“Now let’s return to step 3 of Bayesian data analysis: model checking and improvement. Maybe it’s possible for some big model to be able to learn and move around model space, and to suddenly come across better solutions. This could look like model checking and improvement, from the perspective of the sportscaster part of the brain (or the corresponding interpretive plug-in to the algorithm) even though it’s really just blindly fitting a model.”

One hot topic in deep learning which comes to mind in this context is generative adversarial networks: two networks are paired in a minimax game and trained to generate data and classify fake/real data, respectively. If the generator is missing some key detail, the discriminator will hopefully pick up on that. Since they are NNs, they are extremely flexible & powerful, and can produce almost arbitrarily complicated data like whole images and also pick up on almost arbitrarily small flaws (at least in theory). This looks a lot like a human statistician inspecting the output of a Bayesian model, no? Now replace the human statistician with a little neural network/statistician-in-a-box. You could imagine doing ‘posterior neural checks’ by taking your best Bayesian (generative) model, training a (probably small) Bayesian neural network to classify posterior-predictive-sample vs real, and… you probably wouldn’t use it to automatically change your Bayesian model because you’re using a Bayesian model for its interpretability & semantic content, but you could at least use the confidence of the classifier as an indication of how bad your models are, use it to choose between models (which of your models does the classifier do worst on?), and use it to flag specific parts of the posterior samples (where is your model performing worst?).

More specifically, I expanded on my suggestion there a little by demonstrating how you could use a random forest to do model criticism by comparing the original data to posterior predictive samples to get a measure of model quality:

http://www.gwern.net/Statistical%20notes#model-criticism-via-machine-learning

A simple linear regression model with rounding of covariates, right-censoring of outcome, and a quadratic trend in one variable; the random forest can spot the fake data.

One could imagine extending this further. For example, the random forest just gives yes/no in classification, but a Bayesian neural network could get confidences and highlight *which* datapoints are most obviously fake, which would be helpful when you don’t already know how the model being checked is wrong.
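Here is a stripped-down sketch of that idea in plain Python. To keep it dependency-free it swaps in a k-nearest-neighbour classifier for the random forest, and it simulates fake data from plug-in least-squares fits rather than from real posterior draws, so it is a simplification of the linked demonstration and not a reproduction of it. All function names and the simulated data are hypothetical.

```python
import random

def fit_line(u, y):
    """Least-squares intercept and slope of y on a single feature u."""
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    suu = sum((a - mu) ** 2 for a in u)
    slope = sum((a - mu) * (b - my) for a, b in zip(u, y)) / suu
    return my - slope * mu, slope

def simulate_from_fit(x, y, feature, rng):
    """Fake (x, y) pairs from a fitted regression of y on feature(x)."""
    u = [feature(v) for v in x]
    a, b = fit_line(u, y)
    resid = [yi - a - b * ui for ui, yi in zip(u, y)]
    s = (sum(r * r for r in resid) / (len(y) - 2)) ** 0.5
    return [(xi, a + b * ui + rng.gauss(0, s)) for xi, ui in zip(x, u)]

def knn_accuracy(real, fake, k=5, seed=0):
    """Held-out accuracy of a k-NN classifier telling real from fake pairs."""
    rng = random.Random(seed)
    pts = [(p, 1) for p in real] + [(p, 0) for p in fake]
    rng.shuffle(pts)
    half = len(pts) // 2
    train, test = pts[:half], pts[half:]
    correct = 0
    for p, label in test:
        nearest = sorted(
            train,
            key=lambda q: (q[0][0] - p[0]) ** 2 + (q[0][1] - p[1]) ** 2,
        )[:k]
        votes = sum(lab for _, lab in nearest)
        correct += (1 if 2 * votes > k else 0) == label
    return correct / len(test)

# Simulated "real" data with a quadratic trend (illustrative only).
rng = random.Random(2)
x = [rng.uniform(-2, 2) for _ in range(200)]
y = [xi * xi + rng.gauss(0, 0.3) for xi in x]
real = list(zip(x, y))

# Accuracy near 0.5 means the classifier cannot tell fake from real;
# accuracy well above 0.5 flags a model whose simulations look wrong.
acc_linear = knn_accuracy(real, simulate_from_fit(x, y, lambda v: v, rng))
acc_quadratic = knn_accuracy(real, simulate_from_fit(x, y, lambda v: v * v, rng))
```

Comparing the two accuracies is the "which of your models does the classifier do worst on?" use: the model the classifier struggles with is the one whose fake data look most like the real thing.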

So to make the Breakout-playing AI conscious, would we just have to plug in a sports-commentating NN module narrating the plays?

It seems wrong to me to describe inference as contained entirely in Step 2. Drawing inferences from a fitted model is directly related to the implications of the resulting posterior distribution and whether or not the substantive conclusions are reasonable. Calculating the posterior given the data is a necessary but not sufficient component of statistical inference.

Or maybe this is yet another case of statisticians redefining a word in an unnecessarily confusing way. See also “significance,” “confidence,” and probably some others that escape me at the moment.

I’m sure you’re aware of the Automatic Statistician project at Cambridge: https://www.automaticstatistician.com/index/

They claim to be “reasoning over an open-ended language of nonparametric models using Bayesian inference”, to be doing model checking, and also generating automatically written reports.

This was my first choice of your ‘pendings’ to do! Thanks!

Must take a close look…

My other choices of slightly lower urgency were:

The new quantitative journalism

The challenge of constructive criticism

If I have not seen far, it’s cos I’m standing on the toes of midgets

What should be in an intro stat course?

When is a knave also a fool?

Scientific and scholarly disputes

BTW – that election thing…

It wasn’t your fault mate ;-)

Strange:

Don’t thank me. Thank my past self. I completed this post months ago and it just happened to appear yesterday.

Regarding step 3: we judge statistical inferences also on how they fit with a (usually post hoc) causal explanation. Peter Lipton’s book “Inference to the Best Explanation” builds on Peirce’s abductive reasoning (which brings us also to your previous post on exploratory studies) and he characterises good explanations as likely (we know about this bit) and lovely (explaining other stuff too, possibly simple, elegant and all that mathematical stuff). He thought that maybe Bayesian analysis could help through priors that promote lovely explanations. It’s a tough call, but in an ML context, not inconceivable. I’d be interested to hear your thoughts on that.

I view the steps

1. as the speculation of sensible probability distributions to represent how unobserved values (such as parameters) were set or determined and then how observations came about given particular unobserved values. The representation is seldom literal but rather idealized as if from some physical probabilistic mechanisms.

2. as simply the (approximate) deduction of what those probabilities in 1. are given the observations in hand (two-stage rejection sampling, though usually not feasible, is a very direct way to approximately deduce those probabilities).

3. as the critical evaluation, in light of the observations, about the continued sensibility of probability distributions to represent unobserved and observed values that fully considers possible modifications.
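The two-stage rejection sampling mentioned under step 2 can be sketched in a few lines of plain Python: draw the unobserved value from its prior, simulate observations given that value, and keep the draw only when the simulated observations match the real ones. The example is an assumption chosen to keep things feasible: a uniform prior on a success probability and 7 successes observed in 10 trials, for which the exact posterior is known to be Beta(8, 4).

```python
import random

def rejection_posterior(y_obs, n_trials, n_draws=50_000, seed=0):
    """Posterior draws of a success probability by two-stage rejection sampling."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        theta = rng.random()  # stage 1: draw theta from the uniform prior
        # stage 2: simulate the observations given that theta
        y_sim = sum(rng.random() < theta for _ in range(n_trials))
        if y_sim == y_obs:    # keep theta only if it reproduces the data
            kept.append(theta)
    return kept

draws = rejection_posterior(y_obs=7, n_trials=10)
post_mean = sum(draws) / len(draws)  # should be close to 8 / 12, the Beta(8, 4) mean
```

Only about one draw in eleven survives even in this tiny example, which is why the approach is "usually not feasible" once the data are richer than a single count.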

So I think computer programs are stuck in deduction (though more generally than deducing the prior) as the code implicitly defines all possible representations that could flow from that code. So with deep learning, though the nested probability models can represent a vast variety of representations which the program can search to find a surprisingly good one for some purpose – it cannot get outside to representations not implied in the code.

Also, those representations are very literal – I don’t see any sense of how what is represented is to be distinguished from its representation. So we will always have the right to laugh at them ;-)

“The word butterfly is not a real butterfly. There is the word and there is the butterfly. If you confuse these two items people have the right to laugh at you.” Leonard Norman Cohen, (September 21, 1934 – November 7, 2016)

Oops – though more generally than deducing the _posterior_

How does this model-checking AI differ from e.g. n-fold cross-validated lasso?

Eric:

Cross-validation tends to be focused on minimizing some estimate of prediction error within some fixed set of choices. When I say “model checking,” I’m talking about identifying model misfit and, in response, figuring out how to improve the model. This is not the same as minimizing error; it’s more like exploratory data analysis in that I’m looking for patterns in the data that were not predicted by the model.

> I think that still needs to be built.

I think this phrase is a classical flag for what I heard Stuart Russell, a few months ago, call an ‘expected breakthrough’.

As some others have written already, explicitly or implicitly, the key thing seems to be the reward.

Playing games is ultimately a supervised problem.

I don’t really see how real data analysis is supervised, apart from the good old boring prediction error (which covers some DA but not all).

But the thing is, if a machine has too much freedom optimising, say, cross-validated prediction error, this becomes invalidated as measurement of quality (i.e., reward). Forking paths and all that.