Ben Holmes writes:

I’m a machine learning guy working in fraud prevention, and a member of some biostatistics and clinical statistics research groups at Wright State University in Dayton, Ohio.

I just heard your talk “Theoretical Statistics is the Theory of Applied Statistics” on YouTube, and was extremely interested in the idea of a model-space for exploring and choosing from possibilities in ‘model space’.

I was wondering if you knew of work on any R (or Python, or whatever, I’m not picky!) packages that was being done on this, or could recommend a place to start reading more about the theory/concept.

My reply:

I love this idea of the network of models but I’ve never written anything formal on it, nor do I have any software implementations. Here’s a talk on the topic from 2011, and here’s a post from 2017 with some comments from others too.

I still think this is an important topic—it relates to the idea of a generative grammar for building statistical models, and it should fit in well with Stan. So I’m posting this in the hope that someone will follow up and do it in some way.

Apologies if this has been mentioned before, but this idea seems rather like the Automatic Statistician work by Zoubin Ghahramani and others (https://www.automaticstatistician.com/index/). I believe the idea there is to model time series data using a Gaussian process model, and then to search through the “network” of possible models. A model is defined by a kernel, and these kernels are composable, so as the search continues, more complex models are explored. (It does appear to be restricted to univariate time series, however.)
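To make the "composable kernels" idea concrete, here is a toy sketch of how a search could grow kernel expressions by adding or multiplying in a base kernel. This is not the Automatic Statistician's code: the base kernel names (SE, PER, LIN), the string representation, and the `expand` helper are all assumptions made for illustration, and no Gaussian process is actually fit or scored here.

```python
from typing import List

# Hypothetical base kernels: squared-exponential, periodic, linear.
BASE_KERNELS = ("SE", "PER", "LIN")


def expand(expr: str) -> List[str]:
    """Return the kernel expressions one composition step away from `expr`.

    Each neighbor adds or multiplies in exactly one base kernel.
    """
    neighbors = []
    for base in BASE_KERNELS:
        neighbors.append(f"({expr} + {base})")
        neighbors.append(f"({expr} * {base})")
    return neighbors


# Grow the frontier two steps out from a single starting kernel.
frontier = ["SE"]
sizes = []
for _ in range(2):
    frontier = [e for k in frontier for e in expand(k)]
    sizes.append(len(frontier))

print(sizes)        # number of candidate kernels at each depth
print(frontier[0])  # one example of a depth-2 expression
```

Even in this stripped-down version the frontier multiplies by six at every step, which is why a real search would need to score candidates and keep only the promising ones rather than enumerate everything.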

Maybe I am missing something, but the letter writer never mentioned a network?

Andrew’s thinking of a network in the graphical sense, where models are nodes and edges connect “adjacent” models. As an example the other day, Andrew mentioned that if you’re doing a compartment ODE with 3 compartments, you have natural 2 compartment and 4 compartment models that are adjacent. If you have ten predictors and you’re using 5 of them, there are 5 you can drop and 5 you can add at that point, to give you ten adjacent models. Oh, and you can have interactions. So there’s really infinitely many possible sets of predictors you can produce and that’s just considering polynomials.
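The predictor example can be sketched in a few lines. This is only an illustration of the adjacency idea, under the assumption that a model is identified by its set of main-effect predictors (no interactions); the predictor names `x1`..`x10` and the function `adjacent_models` are made up for the example.

```python
from typing import FrozenSet, List


def adjacent_models(
    current: FrozenSet[str], candidates: FrozenSet[str]
) -> List[FrozenSet[str]]:
    """Return the models that differ from `current` by exactly one predictor."""
    # Drop one predictor that is currently in the model.
    drops = [current - {p} for p in sorted(current)]
    # Add one predictor that is available but not yet in the model.
    adds = [current | {p} for p in sorted(candidates - current)]
    return drops + adds


# Ten hypothetical predictors, five of them in the current model.
candidates = frozenset(f"x{i}" for i in range(1, 11))
current = frozenset({"x1", "x2", "x3", "x4", "x5"})

neighbors = adjacent_models(current, candidates)
print(len(neighbors))  # five drops plus five adds
```

Allowing interactions or polynomial terms would mean replacing the set of names with a richer term representation, at which point the neighborhood of each model stops being finite in any useful sense.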

This was the first thing Matt Hoffman and I got set to work on when we started working with Andrew in 2010/11. We pretty quickly decided it would be impractical to explore automatically given the difficulty of model comparison and the combinatorial explosion of potential models. I don’t see how it could help in a practical setting.

What Andrew dreams of is that some kind of IDE would solve all the fussy model naming and exploration. We tend to make moves like this in model space, so it seems like it’d be nice to have tools to support it. Something that’d munge all the data, give us a menu of predictors and outcomes with convenient discovery of types (categorical, ordinal, vector/scalar, etc.). None of us likes having to keep a series of models like logistic.stan, hierarchical-logistic.stan, hierarchical-logistic-correlated-prior.stan, ad infinitum.

I’m not sure about the granularity of model in Andrew’s picture of this. If we have constant parameters for priors in our model, then there’s a neighborhood around the parameters of those models that brings us into something much more point-set topological than graphical. Then we can start looking at things like sensitivity of inference w.r.t. changes in these constant parameters.
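As a small illustration of sensitivity to a constant prior parameter, here is a sketch using a conjugate normal model with known data standard deviation, where the posterior mean has a closed form. The data values, the prior mean of 0, and the grid of prior scales are all invented for the example; a real analysis would refit the actual model (for instance in Stan) at each setting.

```python
from typing import Dict, Sequence


def posterior_mean(
    y: Sequence[float], sigma: float, mu0: float, tau0: float
) -> float:
    """Posterior mean of mu for y_i ~ Normal(mu, sigma), mu ~ Normal(mu0, tau0).

    Uses the standard precision-weighted average of prior mean and sample mean.
    """
    n = len(y)
    ybar = sum(y) / n
    prior_precision = 1.0 / tau0**2
    data_precision = n / sigma**2
    return (prior_precision * mu0 + data_precision * ybar) / (
        prior_precision + data_precision
    )


y = [1.2, 0.8, 1.5, 1.1]  # made-up data, sample mean 1.15

# Move through the continuum of models indexed by the prior scale tau0.
sweep: Dict[float, float] = {
    tau0: posterior_mean(y, sigma=1.0, mu0=0.0, tau0=tau0)
    for tau0 in (0.1, 1.0, 10.0)
}
for tau0, mean in sweep.items():
    print(f"tau0={tau0:>4}: posterior mean = {mean:.3f}")
```

With a tight prior the estimate is pulled almost all the way to the prior mean, and with a wide prior it sits near the sample mean, so each value of the prior scale is in effect a neighboring model in the continuous sense described above.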

Yup really infinite and actually a continuum but finite approaches are likely all that’s needed ;-)

We are working up something like this for a network of ecologic models. Right now it’s all by hand on index cards. I think I am planning on two figures with the first showing how the models are connected and then maybe how the estimates played out? Not really sure if that second part is useful.

Here is the draft version on the dining room table.

https://github.com/bioinfonm/pystan_musings/blob/master/beginners_excercise/network_of_models.jpg