This comment from Ben reminded me that lots of people are running nonlinear regressions using least squares and other unstable methods of point estimation.

You can do better, people!

Try stan_nlmer, which fits nonlinear models and also allows parameters to vary by groups.

I think people have the sense that maximum likelihood or least squares is this rigorous, well-defined thing, and that Bayesian inference is flaky. The (mistaken) idea is that when using Bayesian inference you’re making extra assumptions and you’re trading robustness for efficiency.

Actually, though, Bayesian inference can be more robust than classical point estimation. They call it “regularization” for a reason!

**P.S.** I’m not kidding that this can make a difference. Consider this bit from an article cited in the above-linked post:

The point here is not that there’s anything wrong with the above steps, just that they represent a lot of effort to get something that’s kinda clunky and unstable. Just a lack of awareness of existing software.

**P.S.** Earlier post title was wrong: it’s stan_nlmer you want for this purpose, not stan_lmer. Damn confusing function names!

sklearn in python sets the default regularization parameter in their LogisticRegression to a non-zero value, which I think is a step forward in this regard. I think setting regularization parameters to a non-zero default, even a bad one, should be standard for frequentist MLE packages. At least if it’s a bad one, it forces people to think about what a sensible value would be. Otherwise everyone will just leave it at zero and argue “tHiS wAy iT’s ObJeCtIvE”.

Somebody:

Yup. In addition to the “social” reason that you give, there are (at least) three other good reasons to have default regularization:

1. Better predictions.

2. No problems anymore with separation.

3. Now that you have proper priors, you can automatically do prior predictive simulation and fake-data checking.

I’m skeptical that 2 is an advantage, wouldn’t it be better to warn/output an error if there’s separation and make the user make decisions? In the case of a separation your choice of regularisation/prior will virtually completely determine estimates…

Zhou:

It would be fine to warn about separation too. More generally, I think our software should be flagging al those cases where the prior is highly informative; see here. I want to do more work on this topic. One subtle point here is that the informativeness of the prior is not a function of the prior alone; it also depends on the data model and the data.

With a flat prior, you will get a warning in Stan if you run a logistic regression with separable data, because it will quickly explore coefficient values past +/- 10^300 and then overflow.

Even with a weakly informative, separable data, especially if there’s a lot of it, will lead to extreme parameter estimates that are easily spotted (even in bulk by plotting histograms of parameter estimates).

I think the point of defaults is to cater to those who don’t want to make decisions. Then the question is whether our software should be enabling not making decisions.

I suspect if you did away with defaults, most of those people would Google around or look on GitHub or whatever to find an example of what someone else used in some model, somewhere. Then they’d make it their personal “default”.

That’s certainly what happened with BUGS and the gamma(epsilon, epsilon) prior.

I think it matters whether the goal is primarily prediction or whether it’s inference about the components of the regression model. The (frequentist) bias caused by regularization seems to make it very challenging to construct confidence intervals and it’s even worse if there is variable selection as in lasso. sklearn is a library of machine learning models for which the focus is prediction so it doesn’t really care about that (and doesn’t compute standard errors). I, and I imagine many others, would be disturbed if, eg lm regularized by default.

But I think this is another point in favor of Bayes. It can do regularization plus inference.

As an industry data scientist I found myself going back and forth with rstanarm and brms a lot. Both are great! I sometimes wish one would dominate more to simplify my workflow but it’s good to have options.

Jd2:

Yes, brms is great too. At first I was thinking that it was too bad that there are these two packages which are so similar, but right now I think a bit of pluralism helps. Recently I’ve been talking with the rstanarm and brms developers about making sure our default priors make sense.

As an industry data scientist in a GPL-averse organization who would prefer to use rstanarm and/or brms, I just wish both had different licenses so I could use them more often. At least brms is GPL-2. But like jd2, I like the ability to go back and forth.

R itself is GPL-ed, as is the tidyverse.

CmdStan, PyStan 3, and CmdStanPy have non-copyleft licenses. I don’t know about CmdStanR.

Andrew, the sentiment and link are right but the function name in the title and text is wrong. I was referring to stan_nlmer — which does logistic growth models and other nonlinear stuff but people don’t know it exists — rather than stan_lmer (which is pretty popular).

In any event, if anyone has questions about stan_nlmer (or stan_lmer for that matter) post on Discourse.

Fixed, thanks.

There is still a typo, I think. Now it says nmler.

D’oh!

Hi from the University of Iceland! We’ve been helping our government with short term predictions and even though we haven’t been using stan_lmer() we’ve been using Stan! We saw early that the ML fits were not robust enough so we devised a hierarchical model to help increase stability. Here are some links to our official page and a technical report on the model.

https://covid.hi.is/english/

https://rpubs.com/bgautijonsson/HierarchicalLogisticGrowthCurves

Brynjólfur,

I took a quick look at your report and your code, and I think I found a small error. In Stan, it’s normal(mu, sigma), not normal(mu, sigma^2). In your code it looked like you were passing the variance rather than the sd for the scale parameter of the normal.

Thank you, Andrew. I realised that a couple of days ago and believe I have fixed all those errors. Thank you for the response!