## Stan Roundup, 27 October 2017

I missed two weeks and haven’t had time to create a dedicated blog for Stan yet, so we’re still here. This is only the update for this week. From now on, I’m going to try to concentrate on things that are done, not just in progress so you can get a better feel for the pace of things getting done.

Not one, but two new devs!

This is my favorite news to post, hence the exclamation.

• Matthijs Vákár from University of Oxford joined the dev team. Matthijs’s first major commit is a set of GLM functions for negative binomial with log link (2–6 times speedup), normal linear regression with identity link (4–5 times), Poisson with log link (factor of 7) and bernoulli with logit link (9 times). Wow! And he didn’t just implement the straight-line case—this is a fully vectorized implementation as a density, so we’ll be able to use them this way:
```int y[N];  // observations
matrix[N, K] x;                  // "data" matrix
vector[K] beta;                  // slope coefficients
real alpha;                      // intercept coefficient

y ~ bernoulli_logit_glm(x, beta, alpha);
```

These stand in for what is now written as

```y ~ bernoulli_logit(x * beta + alpha);
```

and before that was written

```y ~ bernoulli(inv_logit(x * beta + alpha));
```

Matthijs also successfully defended his Ph.D. thesis—welcome to the union, Dr. Vákár.

• Andrew Johnson from Curtin University also joined. In his first bold move, he literally refactored the entire math test suite to bring it up to cpplint standard. He’s also been patching doc and other issues.

Visitors

• Kentaro Matsura, author of Bayesian Statistical Modeling Using Stan and R (in Japanese) visited and we talked about what he’s been working on and how we’ll handle the syntax for tuples in Stan.

• Shuonan Chen visited the Stan meeting, then met with Michael (and me a little bit) to talk bioinformatics—specifically about single-cell PCR data and modeling covariance due to pathways. She had a well-annotated copy of Kentaro’s book!

Other news

• Bill Gillespie presented a Stan and Torsten tutorial at ACoP.

• Charles Margossian had a poster at ACoP on mixed solving (analytic solutions with forcing functions); his StanCon submission on steady state solutions with the algebraic solver was accepted.

• Krzysztof Sakrejda nailed down the last bit of the standalone function compilation, so we should be rid of regexp based C++ generation in RStan 2.17 (coming soon).

• Ben Goodrich has been cleaning up a bunch of edge cases in the math lib (hard things like the Bessel functions) and also added a chol2inv() function that inverts the matrix corresponding to a Cholesky factor (naming from LAPACK under review—suggestions welcome).

• Bob Carpenter and Mitzi Morris taught a one-day Stan class in Halifax at Dalhousie University. Lots of fun seeing Stan users show up! Mike Lawrence, of Stan tutorial YouTube fame, helped people with debugging and installs—nice to finally meet him in person.

• Ben Bales got the metric initialization into CmdStan, so we’ll finally be able to restart (the metric used to be called the mass matrix—it’s just the inverse of a regularized estimate of global posterior covariance during warmup)

• Michael Betancourt just returned from SACNAS (diversity in STEM conference attended by thousands).

• Michael also revised his history of MCMC paper, which as been conditionally accepted for publication. Read it on arXiv first.

• Aki Vehtari was awarded a two-year postdoc for a joint project working on Stan algorithms and models jointly supervised with Andrew Gelman; it’ll also be joint between Helsinki and New York. Sounds like fun!

• Breck Baldwin and Sean Talts headed out to Austin for the NumFOCUS summit, where they spent two intensive days talking largely about project governance and sustainability.

• Imad Ali is leaving Columbia to work for the NBA league office (he’ll still be in NYC) as a statistical analyst! That’s one way to get access to the data!

1. Andrew says:

2. David J. Harris says:

I really enjoy seeing these posts. It’s pretty amazing how much the team accomplishes every week or so. The new GLM functions look especially slick.

3. With regard to “y ~ bernoulli_logit_glm(x, beta, alpha);”, does this have corresponding lpdf, rng functions as well?

• Yes, there is an _lpmf function. No rng yet, but we should have one for every probability function.

• I would guess that for RNG, it wouldn’t be a notable performance loss to just wrap the _glm_rng around the _rng functions, no?

• Writing

```vector[N] y;
for (n in 1:N)
y[n] = normal_rng(x[n] * beta + alpha, sigma);
```

isn’t going to be that much slower than

```vector[N] mu = x * beta + alpha;
for (n in 1:N)
y[n] = normal_rng(mu[n], sigma);
```

Ben Bales is almost done vectorizing, so that we’d be able to write it as

```vector[N] y = normal_rng(x * beta + alpha, sigma);
```

which will be just as efficient as a vectorized version because the vectorization only improves performance when (a) there are big matrix ops involved (one matrix-vector multiply is more efficient than N row-vector/vector products because of memory locality), and (b) when there are gradients involved (there are no gradients in generated quantities).

Nevertheless, we really wnat to write the RNG, so this all looks like:

```vector[N] y = normal_rng(x, beta, alpha, sigma);
```