Anirban Bhattacharya, Debdeep Pati, Natesh Pillai, and David Dunson write:

Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such priors encounter daunting computational problems in high dimensions. This has motivated an amazing variety of continuous shrinkage priors, which can be expressed as global-local scale mixtures of Gaussians, facilitating computation. In sharp contrast to the corresponding frequentist literature, very little is known about the properties of such priors. Focusing on a broad class of shrinkage priors, we provide precise results on prior and posterior concentration. Interestingly, we demonstrate that most commonly used shrinkage priors, including the Bayesian Lasso, are suboptimal in high-dimensional settings. A new class of Dirichlet Laplace (DL) priors are proposed, which are optimal and lead to efficient posterior computation exploiting results from normalized random measure theory. Finite sample performance of Dirichlet Laplace priors relative to alternatives is assessed in simulations.
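The "global-local scale mixtures of Gaussians" representation that the abstract mentions is easy to simulate from. Here is a minimal sketch (my own illustration, not the paper's Dirichlet–Laplace construction) of the generic form theta_j | lambda_j, tau ~ N(0, lambda_j^2 tau^2), where the choice of prior on the local scales lambda_j recovers familiar shrinkage priors such as the Bayesian Lasso and the horseshoe:

```python
import numpy as np

rng = np.random.default_rng(0)

def global_local_draws(p, local="horseshoe", tau=1.0, rng=rng):
    """Draw p coefficients from a global-local scale mixture of Gaussians:
    theta_j | lambda_j, tau ~ N(0, lambda_j**2 * tau**2).
    The local-scale prior determines the shrinkage profile."""
    if local == "horseshoe":
        # Half-Cauchy local scales give the horseshoe prior.
        lam = np.abs(rng.standard_cauchy(p))
    elif local == "lasso":
        # Exponential mixing on lambda_j**2 makes theta_j marginally
        # double-exponential (Laplace), i.e. the Bayesian Lasso prior.
        lam = np.sqrt(rng.exponential(scale=2.0, size=p))
    else:
        raise ValueError(f"unknown local prior: {local}")
    return rng.normal(0.0, lam * tau)

# 10,000 prior draws under the Bayesian Lasso local prior
draws = global_local_draws(10_000, local="lasso", tau=0.5)
```

The point of the representation is computational: conditional on the scales, the coefficients are Gaussian, so Gibbs-type updates stay tractable.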

I have just a few comments (along with my immediate reaction that the tables in the article should be replaced by graphs. Really, does anyone care that a certain squared error is “493.03”???).

First, I’m happy to see research on theoretical Bayesian statistics looking at the limit of increasing complexity of the model, what I sometimes call “infill asymptotics” (fixed sample, build more and more model, as opposed to “sprawl asymptotics” where you keep adding more and more data to a fixed model). And I’m happy to see connections to lasso (see here and here for my earlier thoughts on lasso and its popularity).

Also, they talk a lot about how to implement the models using Gibbs sampling, but now that we have Stan, this is much less of an issue. What matters are the models and what they do, and that is indeed most of what’s in the paper at hand.

Basically I think this stuff is great. In the old days I used to be upset when other people made progress on ideas in which I had persistent but unformed thoughts (just as others were perhaps upset when I would publish articles that happened to coincide with their unformed ideas). But now as I get older, I am just happy to see that progress is being made.

I agree about those tables! Even if someone cared about the squared error, he’d probably be happy enough knowing it was 493. Does anyone at all care that it is 493.03? Especially when 0.03 is ~0.005% of the range spanned by numbers in that table?

PS. Is this high symbol density typical of statistics papers? They seem to be unusually light on trying to explain what it all means and why it matters and quite heavy on rigor, or at least apparent rigor.

Then again, I’m no statistician, so perhaps they are indeed deriving some pretty profound results…

Andrew,

I really do not know how you keep this up.

As long as you do, I will publicise it down under.

Blogging is a form of procrastination. It’s more relaxing than doing my real work.

Have you been having good luck with Stan in high dimensions? I found block Gibbs from MCMCglmm to be faster (ESS per unit of CPU time) on “high”-dimensional regression problems (2–3 variance components, 2–3 hundred total variables) using the default settings. That was around Stan 1.0, though, so maybe it’s better now. My understanding was that this was the folklore for HMC: on non-trivial high-dimensional problems the gradient calculations kill you, and NUTS loses its rotation-invariance advantage unless you do more tuning on the mass matrix.

Ryan:

Could you please send a message with your model to the Stan users list? I’m guessing that the current Stan will work well on your problem, but, if it doesn’t, we’d like to know! The gradients are calculated very efficiently using a tree structure, so I don’t think the folklore you cite applies here.

Infill asymptotics usually refers to “sampling increasingly dense observations in a fixed bounded region” (also called fixed-domain asymptotics), as opposed to expansion asymptotics, where the density of observations stays approximately the same and the region grows. I don’t know of a specific term for increasing-model-complexity asymptotics. The infill and expansion terms are used when discussing models where the number of unknowns increases with n (e.g., Gaussian processes), and the effective number of parameters increases differently in these two cases.