Ripley on model selection, and some links on exploratory model analysis

This is really fun. I love how Ripley thinks, with just about every concept considered in broad generality while being connected to real-data examples. He’s a great statistical storyteller as well.

. . . and Wickham on exploratory model analysis

I came across Ripley’s slides in a reference from Hadley Wickham’s article on exploratory model analysis. I’ve been interested for a while in statistical graphics for understanding fitted models (which is different from the usual use of graphics to visualize data or to understand discrepancies of data from models). Recently I’ve started using the term “exploratory model analysis,” and it seemed like such a natural phrase that I thought I’d google it and see what’s up. I found the above-linked paper by Hadley, which in turn refers to a paper by Antony Unwin, Chris Volinsky, and Sylvia Winkler that defines “exploratory modelling analysis” as “the evaluation and comparison of many models simultaneously.” That’s not exactly what I had in mind, but it’s pretty close.

P.S. I was curious to see what research Ripley’s been up to lately. His webpage doesn’t seem to have been updated in many years, but a Google Scholar search revealed this article on estimating disease prevalence. I have no idea how he got involved in that project, but I hope he is getting deep enough into the problem to inspire further insights. (The search also revealed a bunch of articles and patents on electric household appliances, but that seems to be a different Brian D. Ripley.)

11 thoughts on “Ripley on model selection, and some links on exploratory model analysis”

  1. Your point about Ripley’s talent for connecting generalities to real data is an interesting one, since the ability (or lack thereof) to connect the abstractions of statistical theory to concrete real-world problems seems to be a real stumbling block for students trying to use statistics to address such problems. Is that consistent with your experience in teaching statistics? Does the Bayesian approach facilitate bridging the gap between theory and practice?
    I also like the comment in Wickham’s article about how researchers could use exploratory model analysis to gain insights from “bad” models, another area that students and researchers alike might benefit from examining further.

  2. I just dug out this post of yours. Thanks for the link to Brian Ripley’s slides! I was interested in the apparent contrast between the two approaches he mentioned there: black-box vs. transparent-box statistics, prediction vs. explanation, data-driven machine learning vs. theory-driven hypothesis testing, etc.

    For anyone interested in the subject, I can highly recommend Leo Breiman’s paper “Statistical Modeling: The Two Cultures,” on random forests and other things, with discussion (including a reply from Sir David Cox himself), in Statistical Science vol. 16, issue 3 (2001).

    I would be delighted to see more of Andrew’s comments on that topic!

    • Michal:

      Re Breiman see here. Like many a productive researcher, Breiman was on much more solid ground when discussing his own ideas than when trying (or, in his case, not trying) to understand the work of others.

      Ripley, in contrast, does a good job describing all sorts of methods. His writings often reveal unexpected connections between disparate areas of statistics.

    • Michal: Interesting. When Brian Ripley presented these or very similar slides at John Nelder’s festschrift, he acknowledged David’s role in convincing him of the value of considering explanation for the sake of getting better (more likely to generalize) prediction.
