It also reminds me that at some point I was thinking about how to do model averaging in causal models, or under causal assumptions. The potential outcome is still a predictive quantity, so I think it is still possible to define the utility based on predictions. The only change in a causal model is that you have to model the covariate shift (the “do” operator, as you mentioned).

I have a question about Figure 8 (the table of coefficients) in the paper, which shows estimates of the model coefficients using OLS and the various averaging methods discussed in the paper.

Ever since at least Hoeting et al. 1999, there has been the occasional offhand comment in the statistics literature that it makes no sense to average coefficients across nested submodels, since these “estimate different parameters” (the effect conditional on different sets of covariates). See the quotes below.

I disagree. I would argue that, at least for “explanatory” modeling, or for predictive modeling where we hope to use the model to intervene in the system, model averaging averages biased estimates of the same parameter, not consistent estimates of different parameters. A bit more formally: a regression coefficient for X_j estimates the causal effect (Pearl’s do-operator or Rubin’s potential outcomes) of X_j on Y. While its estimate is conditional on the other covariates in the model, its interpretation, or “meaning”, is not. It takes its meaning from “causal conditioning”, not “probabilistic conditioning”, sensu Shalizi’s text (p. 505) or section 4.7 here: https://plato.stanford.edu/entries/causal-models/
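To make the “biased estimates of the same parameter” reading concrete, here is a minimal simulation sketch. Everything in it is an assumption added for illustration, not taken from the paper: a linear data-generating process, a causal graph in which x2 is a common cause of x1 and y (x2 -> x1, x2 -> y, x1 -> y), the helper name `ols`, and a hypothetical 50/50 model weight rather than any of the weights behind Figure 8. It also checks the standard omitted-variable identity, which holds exactly for in-sample OLS.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients [intercept, slopes...] for y regressed on X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 10_000
beta1, beta2 = 1.0, 2.0  # assumed causal effects of x1 and x2 on y

# Assumed causal graph: x2 -> x1, x2 -> y, x1 -> y (x2 confounds x1's effect)
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

b_full = ols(np.column_stack([x1, x2]), y)  # y ~ x1 + x2
b_reduced = ols(x1, y)                      # y ~ x1
delta = ols(x1, x2)[1]                      # slope of x2 ~ x1

# Omitted-variable identity, exact for in-sample OLS:
# reduced slope = full slope on x1 + (full slope on x2) * delta
ovb_check = b_full[1] + b_full[2] * delta

w = 0.5  # hypothetical weight on the full model
b_avg = w * b_full[1] + (1 - w) * b_reduced[1]

print(f"y ~ x1 + x2: {b_full[1]:.2f}")     # close to beta1 = 1.0
print(f"y ~ x1:      {b_reduced[1]:.2f}")  # close to 1.0 + 2.0 * 0.4 = 1.8
print(f"averaged:    {b_avg:.2f}")         # between the two, about 1.4
```

Under this assumed graph, both slopes target the same causal effect beta1, and the reduced model's slope is a confounded (biased) estimate of it, so the averaged coefficient is a shrunk-toward-biased estimate of one parameter. If x2 were instead a mediator (x1 -> x2 -> y), the reduced model's slope would estimate the total effect, so which reading applies depends on the causal structure one is willing to assume.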

This isn’t an issue when averaging predictions, since everyone agrees on the quantity being estimated. But Figure 8 is a table of coefficients obtained by combining (averaging) information from different nested models. Is this apples and lawn tractors (estimates of different parameters), or is it combining coefficients that estimate the same thing?
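One way to see how a table of averaged coefficients can relate to averaged predictions, for linear models only: if each submodel's coefficient vector is padded with zeros for the covariates it omits, then the weighted average of the submodels' predictions equals the prediction from the weighted average of the padded coefficient vectors. This is a minimal sketch under assumptions of my own (simulated data, a hypothetical helper `ols`, hypothetical weights 0.3/0.7); the paper may construct Figure 8 differently.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients [intercept, slopes...] for y regressed on X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x1, x2])
b_full = ols(np.column_stack([x1, x2]), y)  # y ~ x1 + x2
b_reduced = np.append(ols(x1, y), 0.0)      # y ~ x1, padded with 0 for x2

w = np.array([0.3, 0.7])  # hypothetical weights for (reduced, full)
pred_avg = w[0] * (X_full @ b_reduced) + w[1] * (X_full @ b_full)
coef_avg = w[0] * b_reduced + w[1] * b_full

# By linearity, averaging predictions == predicting with averaged coefficients
print(np.allclose(pred_avg, X_full @ coef_avg))  # True
```

The identity is purely algebraic, so it says nothing by itself about whether the averaged coefficient on x1 has a single interpretation; that is still the question above.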

Some quotes.

“(It is probably also worth mentioning regarding HMRV’s equations (1–3) that Δ needs to have the same meaning in all “models” for the equations to be straightforwardly interpretable; the coefficient of x1 in a regression of y on x1 is a different beast than the coefficient of x1 in a regression of y on x1 and x2.)” – Draper, comment in Hoeting et al. 1999, p. 405

“It is important to appreciate that model averaging makes sense only if the quantities being averaged have the same interpretation for all the models under consideration. Thus averaging of parameter values or estimates over different models is not usually useful, as parameters typically pertain to particular models, even if the same Greek letter is used.” – Candolo, Davison, and Demétrio 2003, p. 166

“What model averaging does not mean is averaging parameter estimates, because parameters in different models have different meanings and should not be averaged, unless you are sure you are in a special case in which it is safe to do so.” – McElreath, Statistical Rethinking, p. 196

“This is concerning because the interpretation of partial regression coefficients can depend on other variables that have been included in the model, so averaging regression coefficients across models may not be practically meaningful.” – Banner and Higgs 2017

“In general, the only case for which partial regression coefficients associated with a particular explanatory variable hold the same interpretation across models is when the explanatory variables are orthogonal.” – Banner and Higgs 2017
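The orthogonal special case in the last quote is easy to check numerically. In this minimal sketch (simulated data and the helper name `ols` are my own assumptions), x1 and x2 are centered and then made exactly orthogonal in the sample, so the OLS slope on x1 is the same whether or not x2 is in the model, and any weighted average of the two slopes returns that same number.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients [intercept, slopes...] for y regressed on X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x1 -= x1.mean()
x2 = rng.normal(size=n)
x2 -= x2.mean()
# Remove the component of x2 along x1 so the centered predictors are orthogonal
x2 -= (x2 @ x1) / (x1 @ x1) * x1
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

slope_reduced = ols(x1, y)[1]                      # y ~ x1
slope_full = ols(np.column_stack([x1, x2]), y)[1]  # y ~ x1 + x2
print(np.isclose(slope_reduced, slope_full))       # True
```

With correlated predictors the two slopes differ, which is the situation the quoted authors have in mind.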

Banner, Katharine M., and Megan D. Higgs. “Considerations for Assessing Model Averaging of Regression Coefficients.” Ecological Applications 27, no. 1 (January 1, 2017): 78–93. https://doi.org/10.1002/eap.1419.

Candolo, C., A. C. Davison, and C. G. B. Demétrio. “A Note on Model Uncertainty in Linear Regression.” Journal of the Royal Statistical Society: Series D (The Statistician) 52, no. 2 (2003): 165–177.

Hoeting, Jennifer A., David Madigan, Adrian E. Raftery, and Chris T. Volinsky. “Bayesian Model Averaging: A Tutorial.” Statistical Science 14, no. 4 (1999): 382–417.

McElreath, Richard. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press, 2018.
