Same old same old

In an email I sent to a colleague who’s writing about lasso and Bayesian regression for R users:

The one thing you might want to add, to fit with your pragmatic perspective, is to point out that these different methods are optimal under different assumptions about the data. These assumptions are never true in practice (even in the rare cases where you have a believable prior, it won’t really follow the functional form assumed by bayesglm; even in the rare cases where you have a real loss function, it won’t really follow the mathematical form assumed by lasso, etc.), but the methods can still be useful and can be given the interpretation of regularized estimates.
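
To make that concrete, here is a minimal sketch (my illustration, not from the email): fit the same simulated data with least squares, lasso via glmnet, and bayesglm from the arm package, and read the three columns of coefficients as differently-regularized answers to the same question. The simulated data, the prior settings, and the cross-validated lambda are all arbitrary choices for the example.

```r
library(glmnet)  # lasso
library(arm)     # bayesglm

set.seed(123)
n <- 100; p <- 10
x <- matrix(rnorm(n * p), n, p)
beta <- c(2, -1, 0.5, rep(0, p - 3))          # a few real effects, the rest zero
y <- as.vector(x %*% beta + rnorm(n))
dat <- data.frame(y, x)

fit_ols   <- lm(y ~ ., data = dat)             # "unbiased" least squares
fit_lasso <- cv.glmnet(x, y, alpha = 1)        # lasso, lambda chosen by cross-validation
fit_bayes <- bayesglm(y ~ ., data = dat,
                      prior.scale = 2.5, prior.df = 1)  # illustrative Cauchy prior on coefficients

# Three sets of estimates for the same data: none of the underlying assumptions
# hold exactly, but each column can be read as a regularized estimate.
round(cbind(ols   = coef(fit_ols),
            lasso = as.vector(coef(fit_lasso, s = "lambda.min")),
            bayes = coef(fit_bayes)), 2)
```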

Another naive view is that regularization is fine but that “unbiased” is somehow the most honest choice. In practice, if you stick to “unbiased” methods such as least squares, you’ll have to restrict the number of variables you include in your model, so in reality you suffer from omitted-variable bias. There is no safe home base. It’s not as if the user can simply do unregularized regression and then think of regularization as a frill. The practitioner who uses unregularized regression has already made a compromise with the devil by restricting the number of predictors in the model to a “manageable” level (whatever that means).
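
A small simulation makes the point; this is my sketch, not from the email, and the data-generating values and the use of glmnet are arbitrary choices. Dropping a correlated predictor to keep the “unbiased” model small biases the retained coefficient, while a regularized fit that keeps both predictors shrinks a little but doesn’t absorb one effect into the other.

```r
library(glmnet)

set.seed(456)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.5)   # x2 correlated with x1
y  <- 1 * x1 + 1 * x2 + rnorm(n)      # both predictors matter

coef(lm(y ~ x1))            # "unbiased" small model: x1 absorbs x2's effect (omitted-variable bias)
coef(lm(y ~ x1 + x2))       # full least squares: fine here, but only if you can keep every predictor

fit <- cv.glmnet(cbind(x1, x2), y, alpha = 1)
coef(fit, s = "lambda.min") # regularized: estimates shrink a bit, but both effects stay in the model
```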