Many Ways to Lasso

Jared writes:

I gave a talk at the Washington DC Conference about six different tools for fitting lasso models. You’ll be happy to see that rstanarm outperformed the rest of the methods.

That’s about 60 more slides than I would’ve used. But it’s good to see all that code.

19 Comments

  1. Richard Hardy says:

    I have checked out the slides but did not get the idea. What is driving the differences between the different methods of fitting lasso?

    • Yuling says:

      Besides the conceptual difference between MAP and the full posterior distribution in Stan, I imagine it is mostly the default tuning of the hyperparameters in the different software packages that makes the difference; otherwise we should be pretty shocked if different solvers gave diverging answers for the same strictly convex problem.
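      [Yuling's point can be sketched in code. The talk used R packages, but as an illustration here is a Python/scikit-learn analogue (an assumption, not the talk's code): two different algorithms minimizing the same strictly convex lasso objective agree to numerical tolerance, while changing the penalty default changes the answer.]

      ```python
      import numpy as np
      from sklearn.linear_model import Lasso, LassoLars

      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 10))
      beta = np.zeros(10)
      beta[:3] = [2.0, -1.5, 1.0]            # three true signals, seven noise features
      y = X @ beta + rng.normal(scale=0.5, size=100)

      # Same objective, two different algorithms:
      cd = Lasso(alpha=0.1).fit(X, y)        # coordinate descent
      lars = LassoLars(alpha=0.1).fit(X, y)  # LARS path algorithm

      # With an identical penalty, the solutions agree up to solver tolerance
      print(np.max(np.abs(cd.coef_ - lars.coef_)))   # tiny

      # With a different default penalty, the fitted model differs
      loose = Lasso(alpha=0.01).fit(X, y)
      print(np.sum(cd.coef_ != 0), np.sum(loose.coef_ != 0))
      ```

      [So if two implementations disagree materially, the first suspect is their default choice (or tuning) of the penalty, not the optimizer.]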

  2. Adam B says:

    Can anyone recommend a good introductory tutorial for fitting Bayesian penalized regression with Stan/rstanarm/brms? Thank you.

  3. Jared says:

    The various methods either tuned the hyperparameters automatically (like rstanarm) or used a value that was heuristically chosen.

  4. Jackson Monroe says:

    Keras used MAE as the criterion function. Isn’t that a mistake?
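    [The substance of Jackson's objection: MAE and MSE target different summaries of the conditional distribution, so a model trained under MAE is not optimizing the same criterion as the squared-error lasso fits it is being compared against. A minimal NumPy sketch (my illustration, not the talk's code): the constant that minimizes MSE is the mean, while the constant that minimizes MAE is the median.]

    ```python
    import numpy as np

    # A skewed sample, so the mean and median disagree
    y = np.array([1.0, 1.0, 2.0, 2.0, 10.0])

    grid = np.linspace(0, 10, 1001)
    mse = [np.mean((y - c) ** 2) for c in grid]
    mae = [np.mean(np.abs(y - c)) for c in grid]

    best_mse = grid[np.argmin(mse)]  # minimized at the mean, 3.2
    best_mae = grid[np.argmin(mae)]  # minimized at the median, 2.0
    print(best_mse, best_mae)
    ```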

  5. jd2 says:

    No mention of the regularized horseshoe, which is what I would reach for when doing sparse regression in rstanarm rather than better lasso defaults, anyway.

    • Jared says:

      Would have done the horseshoe but wanted to keep it all lasso.

      • Richard Hardy says:

        Very interesting topic. As mentioned above, I have checked out the slides but did not get the idea. What is actually driving the differences in performance between the different methods of fitting lasso? Do you have any for-dummies type of summary available?

      • Richard Hardy says:

        Oh, I see I missed the talk that you link to in one of the comments above. I guess that could be a starting point. (Have not checked it out yet.) Thanks for the link!

      • Richard Hardy says:

        OK, so I watched the talk. I liked that it was easy to follow, but what is the takeaway (besides how to pronounce “glmnet” – that was a good one)? You have illustrated that lasso can be implemented in several different ways using existing packages. You have shown that the results were different, but have not really explained why. I think it could very well have been due to different values of tuning parameters used in different implementations. Some implementations shared the values, and incidentally the results were similar, e.g. glmnet and lars. Still not quite sure what conclusion to make out of all this.

  6. Adrian says:

    Why are they using `intercept=FALSE, standardize=FALSE` with glmnet?
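    [Adrian's question matters because glmnet standardizes predictors by default, and the L1 penalty is scale-dependent: it charges every coefficient the same price, so a feature measured on a large scale (small raw coefficient) is effectively penalized less. A scikit-learn sketch of the effect (an assumption for illustration; the talk used glmnet in R):]

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    n = 200
    x1 = rng.normal(scale=1.0, size=n)     # unit-scale feature
    x2 = rng.normal(scale=100.0, size=n)   # large-scale feature
    X = np.column_stack([x1, x2])
    # Both features have the same effect after standardizing (coef * sd = 1)
    y = 1.0 * x1 + 0.01 * x2 + rng.normal(scale=0.5, size=n)

    raw = Lasso(alpha=0.5).fit(X, y)
    std = Lasso(alpha=0.5).fit(StandardScaler().fit_transform(X), y)

    # Unstandardized: the unit-scale coefficient is shrunk hard,
    # while the large-scale feature's tiny coefficient is barely touched
    print(raw.coef_)
    # Standardized: both coefficients receive comparable shrinkage
    print(std.coef_)
    ```

    [So turning off `standardize` (and the intercept) only makes sense if the predictors were already centered and scaled upstream; otherwise the penalty treats the features unevenly.]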
