Modeling coefficients for large numbers of related predictors

Paul Gustafson and Sander Greenland have a preprint entitled “The Performance of Random Coefficient Regression in Accounting for Residual Confounding.” From their paper:

The problem studied in detail by Greenland (2000) involves a case-control study of diet, food constituents, and breast cancer. The exposure variables are intakes of 35 food constituents (nutrients and suspected carcinogens), each of which is computed from responses to an 87-item dietary questionnaire. An analysis based on the 35 food constituents alone assumes that the 87 diet items have no effect beyond that mediated through the food constituents. Greenland (2000) comments that this is a strong and untenable assumption. As an alternative he included both the food constituents and the diet items in a logistic regression model for the case-control status, while acknowledging that this model is formally nonidentified since each food constituent variable is a linear combination of the diet variables. To mitigate the lack of identifiability, a prior distribution is assigned to the regression coefficients for the diet variables, i.e., random coefficient regression is used. The prior distribution has mean zero with small variance, chosen to represent the belief that these coefficients are typically quite small, as they represent “residual confounding” effects of diet beyond those represented by the food constituents. Greenland argued that, however questionable this prior may be, it is surely better than the standard frequentist analysis of such data, which omits the diet variables entirely, equivalent to using the random-coefficient model with a prior distribution that has variance (as well as mean) zero.
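To make the setup concrete, here is the model as I read it from that description (my notation, not theirs), writing x_1, ..., x_35 for the food constituents and z_1, ..., z_87 for the diet items:

logit Pr(y_i = 1) = alpha + sum_{j=1}^{35} beta_j x_ij + sum_{k=1}^{87} gamma_k z_ik,   gamma_k ~ N(0, tau^2),

with tau set small. The standard analysis that omits the diet items is the special case tau = 0, and straight maximum likelihood would correspond to tau = infinity, which doesn't work here because each x_j is an exact linear combination of the z's.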

I have long felt that hierarchical modeling is the way to go in regression with large numbers of related predictors, but I was not familiar with the Greenland (2000) paper. Section 5.2.3 of my 2004 Jasa paper on parameterization and Bayesian modeling presents a similar idea, but I’ve never actually carried it out in a real application. So I’d be interested in seeing more about Greenland’s example.
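The basic mechanics are easy enough to sketch, even without seeing Gustafson and Greenland’s computations. With a flat prior on the constituent coefficients and a normal(0, tau^2) prior on the diet coefficients, the posterior mode is just a penalized maximum-likelihood estimate, with a ridge penalty on the diet terms only. Here’s a minimal sketch in Python on simulated data; the dimensions mirror the example above, but the weights and coefficient values are made up for illustration:

import numpy as np
from scipy.optimize import minimize

# Simulated stand-in data: 87 diet items, and 35 constituents that are
# exact linear combinations of the diet items (as in the example above).
rng = np.random.default_rng(0)
n, n_diet, n_constit = 500, 87, 35
Z = rng.normal(size=(n, n_diet))                        # diet items
W = rng.normal(size=(n_diet, n_constit)) / np.sqrt(n_diet)  # made-up weights
X = Z @ W                                               # food constituents

beta_true = rng.normal(0, 0.3, size=n_constit)          # constituent effects
gamma_true = rng.normal(0, 0.05, size=n_diet)           # small residual effects
eta_true = X @ beta_true + Z @ gamma_true
y = rng.binomial(1, 1 / (1 + np.exp(-eta_true)))

tau = 0.1  # prior sd for the diet coefficients; tau = 0 would recover the
           # standard analysis that omits the diet items entirely

def neg_log_post(theta):
    alpha = theta[0]
    beta = theta[1:1 + n_constit]                       # flat prior
    gamma = theta[1 + n_constit:]                       # N(0, tau^2) prior
    eta = alpha + X @ beta + Z @ gamma
    loglik = np.sum(y * eta - np.logaddexp(0, eta))     # Bernoulli-logit
    logprior = -0.5 * np.dot(gamma, gamma) / tau**2     # ridge on gamma only
    return -(loglik + logprior)

fit = minimize(neg_log_post, np.zeros(1 + n_constit + n_diet), method="L-BFGS-B")
gamma_hat = fit.x[1 + n_constit:]
print("largest estimated |gamma|:", np.abs(gamma_hat).max())

The penalty on gamma alone is what restores identification: any shift of the constituent coefficients that could be absorbed by the diet coefficients gets resolved in favor of the smallest gamma. A full Bayesian version would put a hyperprior on tau rather than fixing it, which is where the P.P.S. below comes in.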

P.S. I like the following quote in the abstract of Greenland’s paper:

The argument invokes an antiparsimony principle attributed to L. J. Savage, which is that models should be rich enough to reflect the complexity of the relations under study. It also invokes the countervailing principle that you cannot estimate anything if you try to estimate everything (often used to justify parsimony). Regression with random coefficients offers a rational compromise . . .

This accords with my views on parsimony and inference (see also here and here).

P.P.S. On a technical level, I’m disturbed that Gustafson and Greenland use inverse-gamma prior distributions for their variance parameters. I think this is too restrictive as a parametric family. Ironically, the family of prior distributions I’ve proposed has the same mathematical form as the multiplicative models that can be used for varying coefficients.
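To unpack that last sentence (this is a sketch of the general idea, not of Gustafson and Greenland’s model): write the varying coefficients multiplicatively,

gamma_k = xi * eta_k,   xi ~ N(0, B^2),   eta_k ~ N(0, sigma_eta^2),

so that the implied scale of the gamma’s is sigma_gamma = |xi| * sigma_eta. If sigma_eta^2 is given an inverse-gamma(nu/2, nu/2) prior, the induced prior on sigma_gamma is half-t with nu degrees of freedom and scale B (a folded noncentral t, more generally, if xi is given a nonzero mean). So the more flexible prior family falls out of exactly the sort of multiplicative model one might use for the coefficients themselves, which is the irony I have in mind.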