Sorry to blow you up with a gazillion comments, but this thread on Discourse seems to suggest something different:

https://discourse.mc-stan.org/t/zero-inflated-models-in-stan/19496/12

“integrating over the set R – {0} gives 1”

Sorry, I mean gives (1 – lambda)

Integral over {0} + Integral over R – {0} = lambda + (1 – lambda) Normal(0) + (1 – lambda) = 1 + (1 – lambda) Normal(0) > 1?

I’ve rethought my question. Okay, so a dominating measure should be

mu(A) = Indicator(0 in A) + Lebesgue(A)

Integrating your density:

Integral lambda Indicator(x == 0) + (1 – lambda) Normal(x) d mu

over the set {0} gives lambda + (1 – lambda) Normal(0), and integrating over the set R – {0} gives 1, so we end up with a probability greater than 1?
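To make the arithmetic in the question concrete, here is a quick numerical check (a Python sketch, assuming a standard normal slab and a made-up value for lambda):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # density of Normal(mu, sigma) evaluated at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

lam = 0.3  # hypothetical zero-inflation rate

# Integrating the density against mu(A) = Indicator(0 in A) + Lebesgue(A):
# over {0}, the atom picks up the full value of the density at zero
mass_at_zero = lam + (1.0 - lam) * normal_pdf(0.0)
# over R - {0}, only Lebesgue contributes, and the normal density integrates to 1
mass_elsewhere = (1.0 - lam) * 1.0

total = mass_at_zero + mass_elsewhere
print(total)  # 1 + (1 - lam) * Normal(0), which exceeds 1
```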

I see why my formulation doesn’t make sense, but I’m having trouble wrapping my mind around the likelihood of a mixed distribution, maybe because I’ve never taken measure theory. I think the density should be

lambda * dirac_delta(x) + (1 – lambda) f_theta(x)

where parameters are (lambda, theta). I’m having trouble going from this density to the log likelihood

log(1 – lambda) + log f_theta(x) + x == 0 ? log(lambda) : 0

Hi, Sean! It’s already in the user’s guide: https://mc-stan.org/docs/2_26/stan-users-guide/reparameterizations.html

What we need is a good index. I found it with Google search [beta prior parameterization stan user’s guide].

Bob, would love to see your hierarchical beta regression added to the SUG

“A transformation samples a parameter, then transforms it, whereas a change of variables transforms a parameter, then samples it. Only the latter requires a Jacobian adjustment.”

Exactly what I needed at exactly the right time :-) Thanks Stan team!!!

Technically, if you zero-inflate a continuous distribution, it’s no longer continuous. But that doesn’t mean you can’t do it. The result is just a mixed distribution.

Here’s the typical “spike and slab” formulation using lambda in (0, 1) as the mixing rate, which is also the zero-inflation rate.

p(x) = lambda * bernoulli(x | 0) + (1 - lambda) * normal(x | mu, sigma)

In Stan, you can code this mixture directly,

target += log_mix(lambda, bernoulli_lpmf(x | 0), normal_lpdf(x | mu, sigma));
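For readers who haven’t seen `log_mix`: it evaluates the log of a two-component mixture directly on the log scale. A minimal Python sketch of the same computation (function names are mine, not Stan’s):

```python
import math

def log_mix(theta, lp1, lp2):
    # log(theta * exp(lp1) + (1 - theta) * exp(lp2)), computed stably
    # by factoring out the larger of the two log-scale terms
    a = math.log(theta) + lp1
    b = math.log1p(-theta) + lp2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

# agrees with the naive direct computation when nothing underflows
print(log_mix(0.25, -1.2, -0.3))
```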

Or you can code it in a way that is more efficient and reflects the inflation.

target += log1m(lambda) + normal_lpdf(x | mu, sigma);
if (x == 0)
  target += log(lambda);
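A literal Python transcription of the fragment above may help readers follow the control flow (the helper `normal_lpdf` and the standalone function are my names, not Stan’s):

```python
import math

def normal_lpdf(x, mu, sigma):
    # log density of Normal(mu, sigma) at x
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def zero_inflated_lp(x, lam, mu, sigma):
    # mirrors the Stan fragment: always add the slab term on the log scale,
    # and add the spike term log(lam) only when x is exactly zero
    lp = math.log1p(-lam) + normal_lpdf(x, mu, sigma)
    if x == 0:
        lp += math.log(lam)
    return lp
```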

P.S. I added <pre></pre> tags around your code—Wordpress doesn’t do markdown.

What do we want to interpret? The distribution? Its parameters? Its priors?

Any parameterization of negative binomial will let you define expectation and standard deviation as derived quantities.

How about using a mean/variance parameterization? All we require is that the scale satisfy sd >= sqrt(mean), i.e., variance >= mean. That’s super easy to interpret, but it pushes us to a joint prior in order to be able to normalize w.r.t. the constraint. So what do we do? Well, we could take the scale to just be psi * sqrt(mean) for psi >= 1. Or we could take the scale to be sqrt(mean) + psi * sqrt(mean) for psi >= 0.
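To make the two candidate scale parameterizations concrete, a small sketch (Python, function names mine); both keep sd >= sqrt(mean), i.e., variance >= mean:

```python
import math

def scale_v1(mean, psi):
    # scale = psi * sqrt(mean), with psi >= 1
    return psi * math.sqrt(mean)

def scale_v2(mean, psi):
    # scale = sqrt(mean) + psi * sqrt(mean), with psi >= 0
    return math.sqrt(mean) + psi * math.sqrt(mean)

mean = 4.0
print(scale_v1(mean, 1.0))  # psi = 1 sits at the Poisson boundary sd = sqrt(mean)
print(scale_v2(mean, 0.0))  # psi = 0 sits at the same boundary
```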

Andrew took the approach that’s popular in the regression literature, which does something similar, only with variance rather than scale. That is, variance is parameterized in terms of the mean and a parameter psi >= 0 as variance = mean + psi * mean^2, making the scale sd = sqrt(mean + psi * mean^2). This is the most popular approach in the regression literature, but I don’t personally find variance very easy to interpret. Also, for reasons that escape me, Andrew parameterized in terms of phi = 1 / psi, which I would think would be more challenging for prior formulation. What priors did you use for psi, Andrew?

Alain:

If you have binary outcomes and you’re using the log model, then I guess the rates are very close to zero, so Poisson is the same as logistic.

Bob:

Negative binomial can be interpretable, but (a) you have to use the alternative parameterization for the negative binomial, not the standard parameterization, and (b) even with the alternative parameterization, the extra parameter is not so easy to interpret. In section 15.2 of Regression and Other Stories, we define phi as a “reciprocal dispersion parameter” so that sd(y|x) = sqrt(E(y|x) + E(y|x)^2/phi): “In this parameterization, the parameter phi is restricted to be positive, with lower values corresponding to more overdispersion, and the limit phi -> infinity representing the Poisson model (that is, zero overdispersion).” Our description has the virtue of being unambiguous, but it’s still not so easy to interpret phi.
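The quoted formula is easy to sanity-check numerically: small phi inflates the sd well past the Poisson value, and large phi recovers it (a Python sketch, function name mine):

```python
import math

def nb_sd(mean, phi):
    # sd(y|x) = sqrt(E(y|x) + E(y|x)^2 / phi), the reciprocal-dispersion form
    return math.sqrt(mean + mean ** 2 / phi)

mu = 5.0
print(nb_sd(mu, 0.5))  # heavy overdispersion: sqrt(55), far above sqrt(5)
print(nb_sd(mu, 1e9))  # phi -> infinity approaches the Poisson sd sqrt(5)
```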

Can’t we just model the zeros separately from the non-zeros and treat the continuous model as a separate likelihood, with some finite probability p of a zero?

model {
  p_zero ~ beta(prior_zero, prior_nonzero);
  n_zero ~ binomial(N, p_zero);
  mu ~ normal(mu_prior_mu, mu_prior_sd);
  sigma ~ lognormal(log_sigma_prior_mu, log_sigma_prior_sd);
  non_zeros ~ normal(mu, sigma);
}

I feel like this should be okay, since under any continuous model the probability of hitting an exact real number is zero, so you’ll almost surely never see an observation with `is_zero = False` whose value happens to be exactly zero by chance from the continuous distribution.
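The two-part likelihood being proposed can be written out directly; here is a Python sketch (function name mine) that treats each zero as a Bernoulli event and each non-zero as a normal draw:

```python
import math

def two_part_loglik(y, p_zero, mu, sigma):
    # each observation is exactly zero with probability p_zero;
    # otherwise it comes from Normal(mu, sigma), which assigns
    # probability zero to the single point {0}
    lp = 0.0
    for x in y:
        if x == 0:
            lp += math.log(p_zero)
        else:
            lp += math.log1p(-p_zero)
            lp += -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return lp
```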

[editor: code escaped]

What’s the problem for interpretability? The inflation/hurdle models just add extra probability for zero values and the negative binomial adds an independent control for variance > mean (but can’t handle underdispersion where variance < mean).

You could create a zero-inflated negative binomial the same way you create a zero-inflated Poisson. Or you can rethink the way the negative binomial is coded. That is,

y ~ neg_binomial(alpha, beta);

just marginalizes out the lambda in

lambda ~ gamma(alpha, beta);
y ~ poisson(lambda);
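The marginalization claim is easy to check numerically: integrating Poisson(y | lambda) against a Gamma(alpha, beta) density recovers the closed-form negative binomial pmf. A pure-Python sketch with crude midpoint quadrature (all function names mine):

```python
import math

def nb_pmf(k, alpha, beta):
    # closed-form gamma-Poisson marginal: NegBinomial(k | alpha, beta)
    return math.exp(math.lgamma(k + alpha) - math.lgamma(alpha) - math.lgamma(k + 1)
                    + alpha * math.log(beta / (beta + 1.0)) - k * math.log(beta + 1.0))

def gamma_pdf(lam, alpha, beta):
    return math.exp(alpha * math.log(beta) - math.lgamma(alpha)
                    + (alpha - 1.0) * math.log(lam) - beta * lam)

def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def marginal_pmf(k, alpha, beta, n=20000, upper=60.0):
    # midpoint-rule integral of Poisson(k | lam) * Gamma(lam | alpha, beta) over lam
    h = upper / n
    return sum(poisson_pmf(k, (i + 0.5) * h) * gamma_pdf((i + 0.5) * h, alpha, beta)
               for i in range(n)) * h

print(nb_pmf(3, 2.0, 1.0))        # 0.125 exactly
print(marginal_pmf(3, 2.0, 1.0))  # matches to quadrature accuracy
```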

So an alternative for overdispersed regression is to just add an overdispersion term epsilon[n],

y[n] ~ poisson_log(x[n] * beta + epsilon[n]);

Then you can give the overdispersion parameter a hierarchical prior,

epsilon[n] ~ normal(0, sigma[group[n]]);

You could probably do something like this using the negative binomial, but I find the parameterization trickier in the same way I find writing hierarchical beta models tricky.

I’m not thrilled with this being the example in the user’s guide.

The joy of open source is that you can submit a revision as a pull request. It’s all just R markdown.

For everyone else, Andrew and I have an ongoing debate over whether to write simple hello-world-type computer scientist examples or complicated narrative what-if statistician-type examples. I opted for the former. Andrew’s books opt for the latter.

I hope to spend a month or so this year rewriting the whole thing to bring it up to our current coding practices. And maybe write a shorter getting started with Stan modeling document that tries to convey the overall way things work more succinctly.
