Hi Eric, yes, it should be $i$. It makes sense to stack the centered and non-centered parameterizations together, but in most experiments I have run, I didn't find much improvement from the additional inclusion. It also takes extra effort in practice to code both the centered and non-centered parameterizations.
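For readers who haven't seen the two parameterizations being discussed, here is a minimal sketch (plain NumPy, not from the paper; the hyperparameter values are made up) of a hierarchical normal parameter written both ways. The two draws target the same distribution; they differ only in the geometry the sampler sees, which is why coding both is extra work for the same model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000
mu, tau = 1.0, 2.0  # hypothetical hyperparameters

# Centered parameterization: theta is sampled directly on its own scale.
theta_c = rng.normal(mu, tau, size=n_draws)

# Non-centered parameterization: sample a standardized variable eta,
# then shift and scale. Same distribution, different sampling geometry.
eta = rng.normal(0.0, 1.0, size=n_draws)
theta_nc = mu + tau * eta

# Both should have mean near mu and standard deviation near tau.
print(theta_c.mean(), theta_nc.mean())
print(theta_c.std(), theta_nc.std())
```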

Oops, for the last question, I left out that it was part of an inline equation on page 18.

Finally, is the first sum $\sum_{n=1}^{n_{val}}$ supposed to be $\sum_{i=1}^{n_{val}}$, using the index $i$ rather than $n$?

Steven, I think the starting point for population Monte Carlo methods is unbiasedness at every iteration, with the goal of computing the integral with respect to the exact posterior density. But the chain-stacking approach is not the same, even asymptotically, as the exact posterior.

Ben:

For PSIS, the hope is that even when it has problems, it's getting us in the right direction. When it fails, I guess a key question is how often it fails. For example, if you have a million data points and the PSIS diagnostics show problems 10,000 times, what do you do about it? You're not gonna re-fit the model 10,000 times. If it's only messing up on 1% of data points, maybe it's no big deal? Sometimes when PSIS has problems, Aki recommends K-fold cross-validation with K = 5 or 10.
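To make the "how often does it fail" question concrete, here's a hedged sketch (plain NumPy; the diagnostic values are made up, and 0.7 is the commonly used Pareto-k cutoff) of counting the fraction of observations flagged by the PSIS diagnostic and deciding whether a K-fold fallback is warranted:

```python
import numpy as np

# Hypothetical Pareto-k diagnostics, one per data point (made-up numbers).
pareto_k = np.array([0.1, 0.45, 0.72, 0.3, 0.9, 0.05, 0.65, 0.71])

# k > 0.7 is the usual rule of thumb for "PSIS is unreliable here".
bad = pareto_k > 0.7
frac_bad = bad.mean()
print(f"{bad.sum()} of {bad.size} points flagged ({frac_bad:.0%})")

# A simple policy along the lines of the comment: if only a tiny
# fraction of points are flagged, maybe ignore it; otherwise fall
# back to K-fold cross-validation (K = 5 or 10) rather than
# re-fitting once per flagged point.
if frac_bad > 0.01:
    print("many failures -> consider K-fold CV with K = 5 or 10")
```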

To get BMA weights, you compute or approximate the posterior mass corresponding to each chain. The usual problem with BMA is strong dependence on aspects of the prior that have essentially no impact on the posterior distribution of the parameters in the model. One problem is that people with traditional Bayesian training often think that BMA is the right thing to do. In the setting of this new paper, we’re kind of off the hook on that one because we’re just approximating. One of the goals of the new paper is to understand how stacking can outperform approximate BMA.
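Since pseudo-BMA and stacking weights keep coming up, here is a hedged toy sketch (plain NumPy; the pointwise densities are made up, and real implementations would use PSIS-LOO estimates rather than these raw numbers) of how the two weight types are computed: pseudo-BMA weights are proportional to the exponentiated log score of each model, while stacking maximizes the log score of the pointwise mixture over the simplex. With the component predictive densities held fixed, the stacking objective is a mixture-weight likelihood, so a simple EM-style update converges to the stacking weights:

```python
import numpy as np

# Hypothetical pointwise predictive densities p(y_i | model k):
# rows = data points, columns = candidate models/chains (made-up numbers).
# Models 0 and 1 are exact duplicates; model 2 is different.
p = np.array([
    [0.9, 0.9, 0.2],
    [0.8, 0.8, 0.3],
    [0.1, 0.1, 0.7],
    [0.2, 0.2, 0.6],
])
n, K = p.shape

# Pseudo-BMA: weights proportional to exp(elpd_k).
elpd = np.log(p).sum(axis=0)
pseudo_bma = np.exp(elpd - elpd.max())
pseudo_bma /= pseudo_bma.sum()

# Stacking: maximize sum_i log sum_k w_k * p[i, k] over the simplex.
# EM-style multiplicative updates for fixed mixture components.
w = np.full(K, 1.0 / K)
for _ in range(2000):
    resp = w * p                          # unnormalized responsibilities
    resp /= resp.sum(axis=1, keepdims=True)
    w = resp.mean(axis=0)

print("pseudo-BMA:", pseudo_bma.round(3))
print("stacking:  ", w.round(3))
```

Note how the duplicated models simply split their share of the weight; adding more copies of model 0 would inflate the pseudo-BMA weight of that family, while the stacked mixture's predictive distribution would be unchanged.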

When PSIS fails, you do need to re-fit the model, right? Or is there justification for ignoring these things in this context?

I’d expect a chain stuck in a bad place in the model to have bad PSIS diagnostics.

Something that strikes me, looking at these plots, is that I've read through this stuff a bunch, and I never remember what the BMA weights are, and I don't think I ever understood what pseudo-BMA weights were at all.

Not to say the comparison isn't worthwhile if there's some historical context there. But the little example that starts "One of the benefits of stacking is that it manages well if there are many similar models" on this page, https://cran.r-project.org/web/packages/loo/vignettes/loo2-weights.html#example-oceanic-tool-complexity, is the thing I always come back to in my head when I think of BMA, and it's the reason I'm always happy to dismiss it without caring about it. -10 vs. 0 doesn't mean as much.
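If I'm reading the "-10 vs. 0" remark as a log-score (elpd) difference between two models, a quick sketch shows why softmax-style weighting makes a gap like that essentially all-or-nothing (made-up values, not from the vignette):

```python
import numpy as np

# Hypothetical elpd values for two models, 10 log-score units apart.
elpd = np.array([-10.0, 0.0])
w = np.exp(elpd - elpd.max())
w /= w.sum()
print(w)  # the worse model's weight is exp(-10)/(1 + exp(-10)), tiny
```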
