Many Ways to Lasso

Jared writes:

I gave a talk at the Washington DC Conference about six different tools for fitting lasso models. You’ll be happy to see that rstanarm outperformed the rest of the methods.

That’s about 60 more slides than I would’ve used. But it’s good to see all that code.


  1. Richard Hardy says:

    I have checked out the slides but did not get the idea. What is driving the differences between the different methods of fitting lasso?

    • Yuling says:

      Besides the conceptual difference between a MAP estimate and the full posterior distribution in Stan, I imagine it is mostly the default tuning of hyperparameters in the different software that makes the difference; otherwise we should be pretty shocked if different algebra solvers led to diverging answers for the same strictly convex problem.

  2. Adam B says:

    Can anyone recommend a good introductory tutorial for fitting Bayesian penalized regression with Stan/rstanarm/brms? Thank you.

  3. Jared says:

    The various methods either tuned the hyperparameters automatically (like rstanarm) or used a value that was heuristically chosen.

  4. Jackson Monroe says:

    Keras used MAE as the criterion function. Isn’t that a mistake?

  5. jd2 says:

    No mention of regularized horseshoe, which is what I would use rstanarm sparse regression for over better Lasso defaults, anyway.

    • Jared says:

      Would have done the horseshoe but wanted to keep it all lasso.

      • Richard Hardy says:

        Very interesting topic. As mentioned above, I have checked out the slides but did not get the idea. What is actually driving the differences in performance between the different methods of fitting lasso? Do you have any for-dummies type of summary available?

      • Richard Hardy says:

        Oh, I see I missed the talk that you link to in one of the comments above. I guess that could be a starting point. (Have not checked it out yet.) Thanks for the link!

      • Richard Hardy says:

        OK, so I watched the talk. I liked that it was easy to follow, but what is the takeaway (besides how to pronounce “glmnet” – that was a good one)? You have illustrated that lasso can be implemented in several different ways using existing packages. You have shown that the results were different, but have not really explained why. I think it could very well have been due to different values of the tuning parameters used in the different implementations. Some implementations shared the values, and incidentally their results were similar, e.g. glmnet and lars. Still not quite sure what conclusion to draw from all this.

  6. Adrian says:

    Why are they using `intercept=FALSE, standardize=FALSE` with glmnet?
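
Several comments above suggest that the differences between implementations come from the tuning parameter rather than from the solver. Here is a minimal sketch of that point in Python with NumPy, not taken from the talk. Everything in it is an assumption made for illustration: the simulated data, the function names `lasso_cd` and `lasso_ista`, the penalty values, and the objective scaling (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1. Packages may scale the penalty differently, which is itself one way two fits can appear to disagree.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the proximal map of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=500):
    """Cyclic coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # running residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]          # take feature j out of the fit
            rho = X[:, j] @ r / n        # its correlation with the partial residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * b[j]          # put the updated feature back
    return b

def lasso_ista(X, y, lam, n_iter=5000):
    """Proximal gradient descent (ISTA) for the same objective."""
    n, p = X.shape
    b = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b

# Simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 0, 0, 1.5, 0, 0, 0, 0, 0])
y = X @ beta_true + rng.normal(size=n)

b_cd = lasso_cd(X, y, lam=0.1)
b_ista = lasso_ista(X, y, lam=0.1)
b_cd_big = lasso_cd(X, y, lam=1.0)

# Same penalty, different solvers versus same solver, different penalty.
max_gap_solvers = np.max(np.abs(b_cd - b_ista))
max_gap_lambda = np.max(np.abs(b_cd - b_cd_big))
print("largest gap between solvers at lam=0.1:", max_gap_solvers)
print("largest gap between lam=0.1 and lam=1.0:", max_gap_lambda)
```

With the objective and the penalty held fixed, the two solvers should agree up to numerical tolerance, while changing only the penalty moves the coefficients noticeably. The sketch says nothing about which defaults any of the packages in the talk actually use.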