**Winston Churchill said that sometimes the truth is so precious, it must be attended by a bodyguard of lies. Similarly, for a model to be believed, it must, except in the simplest of cases, be accompanied by similar models that either give similar results or, if they differ, do so in a way that can be understood.**

In statistics, we call these extra models “scaffolding,” and an important area of research (I think) is incorporating scaffolding and other tools for confidence-building into statistical practice. So far we’ve made progress in developing general methods for building confidence in iterative simulations, debugging Bayesian software, and checking model fit.

My idea for formalizing scaffolding is to think of different models, or different versions of a model, as living in a graph, and to consider operations that move along the edges of this graph of models, both as a way to improve fitting efficiency and as a way to better understand models by making informative comparisons. The graph of models connects to some fundamental ideas in statistical computation, including parallel tempering and particle filtering.

P.S. I want to distinguish scaffolding from model selection or model averaging. Model selection and averaging address the problem of uncertainty in model choice. The point of scaffolding is that we would want to compare our results to simpler models, *even if we know that our chosen model is correct*. Models of even moderate complexity can be extremely difficult to understand on their own.

I wrote a small R package along those lines. Description here: http://www.stat.iastate.edu/preprint/articles/200…

I often use this while building models. With this approach one can get a feeling for how the model "behaves" while adding or removing some effect(s).

So if even "wronger" models give similar results, this supports the less wrong model, and it also helps with the computing for the less wrong model.

Keith

Just to clarify, are you referring to model specification tests (e.g. F and J tests, implemented via anova() in R), where we look for significant differences between, say, y = b0 + b1*X1 + b2*X2 and y = b0 + b1*X1?

Are you suggesting either a) that these tests are too simplistic and we need more sophisticated (or at least visual) ways to make these comparisons, and/or b) that we need a more systematic, theoretically grounded way to make them?
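Since the comparison described here is just anova(fit1, fit2) in R, here is a minimal Python sketch of the same nested-model F-test, with simulated data and a tiny hand-rolled least-squares solver (all names and data here are made up for illustration, not taken from the discussion):

```python
# Hand-rolled F-test between nested linear models, mimicking what
# anova(lm(y ~ x1), lm(y ~ x1 + x2)) reports in R.
# All data are simulated purely for illustration.
import random

random.seed(1)
n = 50
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
# True model: y depends on x1 only, so x2's coefficient is really zero.
y = [1.0 + 2.0 * a + random.gauss(0, 1) for a in x1]

def ols_rss(X, y):
    """Least squares via the normal equations (Gauss-Jordan elimination);
    returns the residual sum of squares. Fine for tiny design matrices."""
    k = len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(len(y))) for q in range(k)]
         + [sum(X[i][p] * y[i] for i in range(len(y)))] for p in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

rss_small = ols_rss([[1.0, a] for a in x1], y)
rss_big = ols_rss([[1.0, a, b] for a, b in zip(x1, x2)], y)
# One extra parameter => 1 numerator df; n - 3 denominator df.
F = (rss_small - rss_big) / (rss_big / (n - 3))
print("RSS drop:", rss_small - rss_big, "F:", F)
```

The larger model can never fit worse, so the question the F statistic answers is whether the drop in residual sum of squares is bigger than chance alone would give; that is exactly the comparison anova() prints.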

Hadley – neat stuff, and thanks for the link to Ripley's talk (as an aside, you might find Nelder's "There are no outliers in the outliers data set" an interesting example to use).

Frank – I believe people tend to be more sure of their work/abilities than they "should" be. Over the years I have encountered:

while trying to appraise a dramatic treatment effect in an observational study – "rather than giving a histogram of effect coefficients over all possible linear covariate adjustments, we will find the BEST model and just report that coefficient"

"a professional satistician would not need to undertake an "understudy analysis" (i.e. use a simpler model)- they need only to carefully plot the data."

"you need not plot the raw data if you have picked the RIGHT summaries"

i.e. "climbing up" the computations from simpler to more complex analyses and admitting (and displaying) uncertainty about different models – seems "lame" and somewhat "unprofessional".

Terms like "least wrong" or "less wrong", rather than "best", might help somewhat with "allowing" the admission of inherent modeling uncertainty.

Keith