Timothy Brathwaite sends along this wonderfully-titled article (also here, and here’s the replication code), which begins:

Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated model represents reality. Such methods usually stop at checking parameter signs, model elasticities, and ratios of model coefficients. In this paper, I [Brathwaite] greatly expand the discrete choice modelers’ assessment toolkit by introducing model checking procedures based on graphical displays of predictive simulations. . . . a general and ‘semi-automatic’ algorithm for checking discrete choice models via predictive simulations. . . .

He frames model checking in terms of “underfitting,” a connection I’ve never seen before but which makes sense. To the extent that there are features in your data that are not captured in your model—more precisely, features that don’t show up, even in many different posterior predictive simulations from your fitted model—then, yes, the model is underfitting the data. Good point.
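
To make the underfitting idea concrete, here is a minimal sketch of a predictive check for a binary choice model. It is not Brathwaite's algorithm or his replication code. The simulated data, the helper names `fit_logit` and `ppc_pvalue`, the "share choosing when |x| > 1.5" statistic, and the normal approximation to the posterior are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_logit(X, y, n_iter=25):
    """Newton-Raphson MLE for a binary logit; returns (beta_hat, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    return beta, np.linalg.inv(hess)

def ppc_pvalue(X, y, statistic, n_sims=1000):
    """Compare an observed statistic with its distribution under simulated data."""
    beta_hat, cov = fit_logit(X, y)
    # Assumption: approximate the posterior by a normal centered at the MLE.
    draws = rng.multivariate_normal(beta_hat, cov, size=n_sims)
    probs = 1.0 / (1.0 + np.exp(-draws @ X.T))   # shape (n_sims, n_obs)
    y_rep = rng.random(probs.shape) < probs      # one replicated dataset per draw
    t_rep = np.array([statistic(X, yr) for yr in y_rep])
    t_obs = statistic(X, y)
    return t_obs, t_rep, float(np.mean(t_rep >= t_obs))

# Hypothetical data: the true utility is U-shaped in x,
# but the working model is only linear in x.
n = 2000
x = rng.uniform(-2.0, 2.0, size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * x**2)))).astype(float)
X = np.column_stack([np.ones(n), x])

def share_at_extremes(X, y):
    """Share choosing the alternative among observations with |x| > 1.5."""
    return y[np.abs(X[:, 1]) > 1.5].mean()

t_obs, t_rep, p_value = ppc_pvalue(X, y, share_at_extremes)
print(f"observed: {t_obs:.3f}, simulated mean: {t_rep.mean():.3f}, p-value: {p_value:.3f}")
```

If the observed statistic sits far in the tail of the simulated ones, that feature of the data is not being reproduced by the fitted model, which is the sense of underfitting described above. In practice one would plot the simulated distribution against the observed value rather than rely on a single number.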

Interesting! Relatedly, I’ve generally found posterior predictive checks bad at detecting ‘overfitting’.

I suppose this raises the question of which “features” of the data may be anomalies in the particular sample you have, ones that wouldn’t necessarily generalize to other samples (or to the population from which all the samples are drawn). In one sense, I like that this allows the researcher to say, in effect, “how come this obviously important thing isn’t being caught by the model?”

Yet, in another sense, models are supposed to help us separate out what’s empirically generalizable from what may be idiosyncratic deviations, e.g., in the sense of regularization. So, I can imagine doing this for an observable that has a clear economic or other discernible meaning, but less so for some multivariate relationship, and probably not at all for an intermediate level in a hierarchical model specifying latent utilities.

But I’ve only skimmed the paper thus far.

Thanks for the comments, Fred!

In terms of separating generalizable relationships from idiosyncratic deviations, I think that, alongside the posterior predictive checks, one should definitely make use of techniques specifically designed to detect overfitting (e.g., cross-validation). An example of this is given in Table 3, p. 19 of the paper linked above.

I imagine model building would be an iterative process (à la Box) of (1) estimating a current working model, (2) running a set of posterior predictive checks, (3) altering the model to correct the deficiencies those checks reveal, and (4) checking that the changes were actually beneficial using out-of-sample diagnostics such as cross-validation.
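
As a rough illustration of step (4), the sketch below compares a working specification with a revised one using K-fold cross-validated log-loss. The simulated data, the two specifications, and the helper names `fit_logit` and `cv_log_loss` are hypothetical, chosen only to show the shape of the comparison. This is not the procedure behind Table 3 of the paper.

```python
import numpy as np

def fit_logit(X, y, n_iter=25):
    """Newton-Raphson MLE for a binary logit; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        beta = beta + np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
    return beta

def cv_log_loss(X, y, k=5, seed=0):
    """Mean held-out log-loss over k folds (lower is better)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    losses = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        beta = fit_logit(X[train], y[train])
        p = 1.0 / (1.0 + np.exp(-X[test_idx] @ beta))
        p = np.clip(p, 1e-12, 1.0 - 1e-12)   # guard against log(0)
        y_test = y[test_idx]
        losses.append(-np.mean(y_test * np.log(p) + (1.0 - y_test) * np.log(1.0 - p)))
    return float(np.mean(losses))

# Hypothetical data: the true utility is quadratic in x.
rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-2.0, 2.0, size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 1.5 * x**2)))).astype(float)

X_current = np.column_stack([np.ones(n), x])         # working model: linear in x
X_revised = np.column_stack([np.ones(n), x, x**2])   # revision suggested by a check

loss_current = cv_log_loss(X_current, y)
loss_revised = cv_log_loss(X_revised, y)
print(f"current: {loss_current:.3f}, revised: {loss_revised:.3f}")
```

Using the same fold assignment for both specifications (the fixed `seed`) keeps the comparison paired. A revision that only chases noise in the sample would tend not to lower the held-out loss.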

If you are at the same time continuously checking for overfitting, why wouldn’t you perform posterior predictive checks for a multivariate relationship or an intermediate level in a hierarchical model? For instance, individual-level taste parameters would be an intermediate level in a hierarchical choice model, and posterior predictive checks of the distribution of those intermediate parameters were usefully developed in Gilbride and Lenk (2010).

References:

Gilbride, Timothy J., and Peter J. Lenk. “Posterior predictive model checking: An application to multivariate normal heterogeneity.” Journal of Marketing Research 47.5 (2010): 896-909.