Sorry to blow you up with a gazillion comments, but this thread on Discourse seems to suggest something different:

https://discourse.mc-stan.org/t/zero-inflated-models-in-stan/19496/12

“integrating over the set R – {0} gives 1”

Sorry, I mean gives (1 – lambda)

Integral over {0} + Integral over R – {0} = lambda + (1 – lambda) Normal(0) + (1 – lambda) = 1 + (1 – lambda) Normal(0) > 1?

I’ve rethought my question. Okay, so a dominating measure should be

mu(A) = Indicator(0 in A) + Lebesgue(A)

Integrating your density:

Integral lambda Indicator(x == 0) + (1 – lambda) Normal(x) d mu

over the set {0} gives lambda + (1 – lambda) Normal(0), and integrating over the set R – {0} gives 1, so we end up with a probability greater than 1?
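To make the arithmetic in the question concrete, here is a quick numerical check (a Python sketch, assuming a standard normal slab and a made-up value for lambda):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # density of Normal(mu, sigma) evaluated at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

lam = 0.3  # hypothetical zero-inflation rate

# Integrating the density against mu(A) = Indicator(0 in A) + Lebesgue(A):
# over {0}, the atom picks up the full value of the density at zero
mass_at_zero = lam + (1.0 - lam) * normal_pdf(0.0)
# over R - {0}, only Lebesgue contributes, and the normal density integrates to 1
mass_elsewhere = (1.0 - lam) * 1.0

total = mass_at_zero + mass_elsewhere
print(total)  # 1 + (1 - lam) * Normal(0), which exceeds 1
```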

I see why my formulation doesn’t make sense, but I’m having trouble wrapping my mind around the likelihood of a mixed distribution, maybe because I’ve never taken measure theory. I think the density should be

lambda * dirac_delta(x) + (1 – lambda) f_theta(x)

where parameters are (lambda, theta). I’m having trouble going from this density to the log likelihood

log(1 – lambda) + log f_theta(x) + x == 0 ? log(lambda) : 0

Hi, Sean! It’s already in the user’s guide: https://mc-stan.org/docs/2_26/stan-users-guide/reparameterizations.html

What we need is a good index. I found it with Google search [beta prior parameterization stan user’s guide].

Bob, would love to see your hierarchical beta regression added to the SUG

“A transformation samples a parameter, then transforms it, whereas a change of variables transforms a parameter, then samples it. Only the latter requires a Jacobian adjustment.”

Exactly what I needed at exactly the right time :-) Thanks Stan team!!!

Technically, if you zero-inflate a continuous distribution, it’s no longer continuous. But that doesn’t mean you can’t do it. The result is just a mixed distribution.

Here’s the typical “spike and slab” formulation using lambda in (0, 1) as the mixing rate, which is also the zero-inflation rate.

p(x) = lambda * bernoulli(x | 0) + (1 - lambda) * normal(x | mu, sigma)

In Stan, you can code this mixture directly,

target += log_mix(lambda, bernoulli_lpmf(x | 0), normal_lpdf(x | mu, sigma));
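For readers who haven’t seen `log_mix`: it evaluates the log of a two-component mixture directly on the log scale. A minimal Python sketch of the same computation (function names are mine, not Stan’s):

```python
import math

def log_mix(theta, lp1, lp2):
    # log(theta * exp(lp1) + (1 - theta) * exp(lp2)), computed stably
    # by factoring out the larger of the two log-scale terms
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# agrees with the naive direct computation when nothing underflows
print(log_mix(0.25, -1.2, -0.3))
```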

Or you can code it in a way that is more efficient and reflects the inflation.

target += log1m(lambda) + normal_lpdf(x | mu, sigma);
if (x == 0)
  target += log(lambda);
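A literal Python transcription of the fragment above may help readers follow the control flow (the helper `normal_lpdf` and the standalone function are my names, not Stan’s):

```python
import math

def normal_lpdf(x, mu, sigma):
    # log density of Normal(mu, sigma) at x
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def zero_inflated_lp(x, lam, mu, sigma):
    # mirrors the Stan fragment: always add the slab term on the log scale,
    # and add the spike term log(lam) only when x is exactly zero
    lp = math.log1p(-lam) + normal_lpdf(x, mu, sigma)
    if x == 0:
        lp += math.log(lam)
    return lp
```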

P.S. I added <pre></pre> tags around your code—Wordpress doesn’t do markdown.

What do we want to interpret? The distribution? Its parameters? Its priors?

Any parameterization of negative binomial will let you define expectation and standard deviation as derived quantities.

How about using a mean/variance parameterization? All we require is that the scale satisfy sd >= sqrt(mean), i.e., variance >= mean. That’s super easy to interpret, but it pushes us to a joint prior in order to be able to normalize w.r.t. the constraint. So what do we do? Well, we could take the scale to just be psi * sqrt(mean) for psi >= 1. Or we could take the scale to be sqrt(mean) + psi * sqrt(mean) for psi >= 0.
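To make the two candidate scale parameterizations concrete, a small sketch (Python, function names mine); both keep sd >= sqrt(mean), i.e., variance >= mean:

```python
import math

def scale_v1(mean, psi):
    # scale = psi * sqrt(mean), with psi >= 1
    return psi * math.sqrt(mean)

def scale_v2(mean, psi):
    # scale = sqrt(mean) + psi * sqrt(mean), with psi >= 0
    return math.sqrt(mean) + psi * math.sqrt(mean)

mean = 4.0
print(scale_v1(mean, 1.0))  # psi = 1 sits at the Poisson boundary sd = sqrt(mean)
print(scale_v2(mean, 0.0))  # psi = 0 sits at the same boundary
```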

Andrew took the approach that’s popular in the regression literature, which does something similar, only with variance rather than scale. That is, variance is parameterized in terms of the mean and a parameter psi >= 0 as variance = mean + psi * mean^2, making the scale sd = sqrt(mean + psi * mean^2). This is the most popular approach in the regression literature, but I don’t personally find variance very easy to interpret. Also, for reasons that escape me, Andrew parameterized in terms of phi = 1 / psi, which I would think would be more challenging for prior formulation. What priors did you use for psi, Andrew?

Alain:

If you have binary outcomes and you’re using the log model, then I guess the rates are very close to zero, so Poisson is the same as logistic.

Bob:

Negative binomial can be interpretable, but (a) you have to use the alternative parameterization for the negative binomial, not the standard parameterization, and (b) even with the alternative parameterization, the extra parameter is not so easy to interpret. In section 15.2 of Regression and Other Stories, we define phi as a “reciprocal dispersion parameter” so that sd(y|x) = sqrt(E(y|x) + E(y|x)^2/phi): “In this parameterization, the parameter phi is restricted to be positive, with lower values corresponding to more overdispersion, and the limit phi -> infinity representing the Poisson model (that is, zero overdispersion).” Our description has the virtue of being unambiguous, but it’s still not so easy to interpret phi.
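The quoted formula is easy to sanity-check numerically: small phi inflates the sd well past the Poisson value, and large phi recovers it (a Python sketch, function name mine):

```python
import math

def nb_sd(mean, phi):
    # sd(y|x) = sqrt(E(y|x) + E(y|x)^2 / phi), the reciprocal-dispersion form
    return math.sqrt(mean + mean ** 2 / phi)

mu = 5.0
print(nb_sd(mu, 0.5))  # heavy overdispersion: sqrt(55), far above sqrt(5)
print(nb_sd(mu, 1e9))  # phi -> infinity approaches the Poisson sd sqrt(5)
```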

Can’t we just model the zeros separately from the non-zeros and treat the continuous model as a separate likelihood, with some finite probability p of a zero?

model {
  p_zero ~ beta(prior_zero, prior_nonzero);
  n_zero ~ binomial(N, p_zero);
  mu ~ normal(mu_prior_mu, mu_prior_sd);
  sigma ~ lognormal(log_sigma_prior_mu, log_sigma_prior_sd);
  non_zeros ~ normal(mu, sigma);
}

I feel like this should be okay, since under any continuous model the probability of hitting an exact real number is zero, so you’ll almost surely never see an observation with `is_zero = False` whose value happens to be exactly zero by chance from the continuous distribution.
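The two-part likelihood being proposed can be written out directly; here is a Python sketch (function name mine) that treats each zero as a Bernoulli event and each non-zero as a normal draw:

```python
import math

def two_part_loglik(y, p_zero, mu, sigma):
    # each observation is exactly zero with probability p_zero;
    # otherwise it comes from Normal(mu, sigma), which assigns
    # probability zero to the single point {0}
    lp = 0.0
    for x in y:
        if x == 0:
            lp += math.log(p_zero)
        else:
            lp += math.log1p(-p_zero)
            lp += -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return lp
```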

[editor: code escaped]

What’s the problem for interpretability? The inflation/hurdle models just add extra probability for zero values and the negative binomial adds an independent control for variance > mean (but can’t handle underdispersion where variance < mean).

You could create a zero-inflated negative binomial the same way you create a zero-inflated Poisson. Or you can rethink the way the negative binomial is coded. That is,

y ~ neg_binomial(alpha, beta);

just marginalizes out the lambda in

lambda ~ gamma(alpha, beta);
y ~ poisson(lambda);
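The marginalization claim is easy to check numerically: integrating Poisson(y | lambda) against a Gamma(alpha, beta) density recovers the closed-form negative binomial pmf. A pure-Python sketch with crude midpoint quadrature (all function names mine):

```python
import math

def nb_pmf(k, alpha, beta):
    # closed-form gamma-Poisson marginal: NegBinomial(k | alpha, beta)
    return math.exp(math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
                    + alpha * math.log(beta / (beta + 1.0)) - k * math.log(beta + 1.0))

def gamma_pdf(lam, alpha, beta):
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(lam) - beta * lam)

def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def marginal_pmf(k, alpha, beta, n=20000, upper=60.0):
    # midpoint-rule integral of Poisson(k | lam) * Gamma(lam | alpha, beta) over lam
    h = upper / n
    return sum(poisson_pmf(k, (i + 0.5) * h) * gamma_pdf((i + 0.5) * h, alpha, beta)
               for i in range(n)) * h

print(nb_pmf(3, 2.0, 1.0))        # 0.125 exactly
print(marginal_pmf(3, 2.0, 1.0))  # matches to quadrature accuracy
```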

So an alternative for overdispersed regression is to just add an overdispersion term epsilon[n],

y[n] ~ poisson_log(x[n] * beta + epsilon[n]);

Then you can give the overdispersion parameter a hierarchical prior,

epsilon[n] ~ normal(0, sigma[group[n]]);

You could probably do something like this using the negative binomial, but I find the parameterization trickier in the same way I find writing hierarchical beta models tricky.

I’m not thrilled with this being the example in the user’s guide.

The joy of open source is that you can submit a revision as a pull request. It’s all just R markdown.

For everyone else, Andrew and I have an ongoing debate over whether to write simple hello-world-type computer scientist examples or complicated narrative what-if statistician-type examples. I opted for the former. Andrew’s books opt for the latter.

I hope to spend a month or so this year rewriting the whole thing to bring it up to our current coding practices. And maybe write a shorter getting started with Stan modeling document that tries to convey the overall way things work more succinctly.
