21 thoughts on “Mixture models in Stan: you can use log_mix()”

  1. Is there a way to implement a hierarchical mixture model longitudinally using Stan in this same framework? (i.e., classifying subjects based on observations made over time.) I’m looking to use a Stan equivalent to something like proc traj in SAS.

    I’m still a Stan novice, so I’m trying to think it out.

    • Yep, you can. You can basically mixture-ize anything in Stan.

      I tend to forgo the log_mix method and use, e.g., target += log_sum_exp(log(theta[1]) + pdf(whatever | params), log(theta[2]) + pdf(whatever | params)), etc., because it’s easier to see what is going on, it’s more programmable (you can construct each component in a loop and sum across the vector with log_sum_exp), and it extends to k > 2 mixtures.

      If you want two trajectory states, just define two separate sets of parameters, order them in some way for identifiability (ordering the intercepts tends to work well enough, but it depends on the data), and split the likelihood into k component likelihoods weighted by log(theta[k]); see the sketch below.
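
      As a sketch of how that might look for the longitudinal case (the data layout, variable names, and priors here are my own assumptions, not from the thread): each subject contributes a whole series, the latent class is marginalized out with log_sum_exp, and the ordered intercepts provide the identifiability constraint.

      data {
        int<lower=1> S;                 // subjects
        int<lower=1> T;                 // time points per subject
        matrix[S, T] y;                 // one row of observations per subject
        row_vector[T] time;             // common measurement times
      }
      parameters {
        simplex[2] theta;               // class proportions
        ordered[2] alpha;               // class intercepts, ordered for identifiability
        vector[2] beta;                 // class slopes
        real<lower=0> sigma;
      }
      model {
        alpha ~ normal(0, 5);
        beta ~ normal(0, 2);
        sigma ~ normal(0, 2);
        for (s in 1:S) {
          vector[2] lp;
          for (k in 1:2)
            lp[k] = log(theta[k])
                    + normal_lpdf(y[s] | alpha[k] + beta[k] * time, sigma);
          target += log_sum_exp(lp);    // marginalize over the subject's latent class
        }
      }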

    • I think mixture modeling is a terrific approach. However, I would caution against using proc traj in SAS. Research by Diallo and colleagues has shown how its assumptions about the latent variance–covariance and residual matrices can lead to misspecified models. This article in particular is really informative in this regard:

      Diallo, T. M., Morin, A. J., & Lu, H. (2016). Impact of misspecifications of the latent variance–covariance and residual matrices on the class enumeration accuracy of growth mixture models. Structural Equation Modeling: A Multidisciplinary Journal, 23(4), 507-531.

  2. It would be a great feature to expand this beyond two mixture components (as far as I am aware, that is not yet something log_mix covers). With two mixture components the log_sum_exp solution is still not too unpleasant; it is for, say, 5-component mixtures that writing out all those nested log_sum_exps and all the brackets without any mistakes gets really painful. For that kind of situation a more general log_mix would be super useful.

    • [OK, let’s try again. My login must have expired while writing the post.]

      That was in the original feature request, but nobody’s ever gotten to it. The more-than-two-component version would use a simplex to parallel what we already have.

      I’d also like to see a log-odds parameterization for both of these, as we do with bernoulli_logit and categorical_logit. These are relatively easy functions to add to the Stan math library.

      Longer term, we’ll be looking at even better ways to do this with higher-order types. I want to add simple types and lambdas with closures to Stan. Mitzi’s already got the basic infrastructure plumbed through to let us start generalizing the type system. She’ll be adding tuples first, but we’ll also be looking at adding higher-order types in the future.

    • … but there is log_sum_exp defined for vectors, 1d arrays and even matrices. So just do

      real lmix[5];

      lmix[1] = …;
      lmix[2] = …;

      And then

      target += log_sum_exp(lmix);

      will do it. Not as nice as a log_mix with multiple entries, but better than nesting.

      • This is exactly how I’m doing it for a 7-component mixture prior I have in one of my models: just make a local variable inside the loop to accumulate the individual components of the model, and then at the end call log_sum_exp(lmix), as in the sketch below.
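
        A sketch of that pattern (the names theta, mu, sigma, and y are hypothetical stand-ins, assumed to be declared elsewhere, and the normal components are just placeholders):

        model {
          vector[7] lmix;
          for (k in 1:7)
            lmix[k] = log(theta[k]) + normal_lpdf(y | mu[k], sigma[k]);
          target += log_sum_exp(lmix);   // marginalize over the 7 components
        }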

        • You mean you want to generate from a mixture posterior? Just at each cycle, for each thing you want to generate, randomly select one of the mixture components with probability equal to its mixture weight, and generate from that.

        • Uhm… for prediction you should probably read the Stan manual. You will need the *posterior* probabilities of each mixture component which is given by the prior predictive of that component and its prior weight. You end up needing the log_sum_exp sum as a normalization constant and the weight of each component is given by each lmix… (so lmix[i]/log_sum_exp(lmix) is the weight for component i) but again, it’s well described in the manual.

        • Both Daniel Lakeland’s approach and Sebastian’s are presented in the manual. Where possible, the method Sebastian recommends will be a lot more accurate in the sense of computing posterior expectations with lower MCMC error (the Rao-Blackwell theorem provides the theory, but it’s easy to see in practice, as in the example in the manual).

          The first change-point example in the latent discrete parameters chapter explains how to do what Sebastian is recommending. Just be careful: it’s a subtraction, not a division as Sebastian wrote, because it’s on the log scale:

          log Pr[z = i] = lmix[i] - log_sum_exp(lmix)
          
        • I’m not quite clear; I’m pretty sure “You will need the *posterior* probabilities of each mixture component which is given by the prior predictive of that component and its prior weight” has a typo somewhere, because prior predictive of the component * prior weight = prior model.

          If you’re trying to create a parameter that samples from the posterior of the mixture then you create a parameter “FakeData” and give it the same distribution as your real data.

          p(FakeData | Component = 1, other parameters…) p(Component = 1) + … + p(FakeData | Component = n, other parameters…) p(Component = n)

          Since each component will be calculated on the log scale, remember that to multiply on the regular scale you add logarithms, and to add things you take the log of the sum of the exponentials… log_sum_exp.

          If p(Component = 1) is itself a simplex parameter, then you can just use it; if these parameters don’t already sum to 1, then you need the normalization described above.

          I hope that helped someone, at least maybe me. ;-)
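
          A sketch of both recipes in a generated quantities block (assuming the fitted model is a simple two-component normal mixture with parameters theta, mu, and sigma and data vector y; y_rep and Pr_y1 are made-up names): y_rep is a fresh draw obtained by picking a component and generating from it, and Pr_y1 holds the posterior membership probabilities for y[1], computed with the log-scale subtraction and then exponentiated.

          generated quantities {
            real y_rep;        // fresh draw from the fitted mixture
            vector[2] Pr_y1;   // posterior membership probabilities for y[1]
            {
              int k;
              k = categorical_rng(theta);           // pick a component by its weight
              y_rep = normal_rng(mu[k], sigma[k]);  // then draw from that component
            }
            {
              vector[2] lmix;
              for (k in 1:2)
                lmix[k] = log(theta[k]) + normal_lpdf(y[1] | mu[k], sigma[k]);
              Pr_y1 = exp(lmix - log_sum_exp(lmix));   // subtract on the log scale, then exp
            }
          }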

        • You can predict in all sorts of ways.

          You can technically predict individual weights directly, with some prespecified allowable error. You can predict the posterior weights. You can predict the prior weights as well. All of these tend to work well, to be honest. I actually think most people predict log(p(theta_i)), and then compute the posterior p(theta_i|data) afterward. Seems like most mixture packages work that way.

      • This is the case where it would be nice to have the feature Aki requested, which is a vector-return log pdf. Then a k-mixture is a one-liner.

        int N;
        int K;
        simplex[K] theta;
        real y[N];
        ordered[K] mu;
        vector[K] sigma;
        ...
        vector[K] Pr[N];
        for (n in 1:N) {
          vector[K] lp = log(theta) + normal_vec_lpdf(y[n] | mu, sigma);
          real log_Z = log_sum_exp(lp);
          target += log_Z;
          Pr[n] = lp - log_Z;
        }
        

        and then Pr[n, k] gives the log of the probability that the n-th item is assigned to mixture component k (it’s log Pr[z[n] = k] if you think in terms of the latent responsibility parameter z[n] in 1:K).
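
        Until a vector-return lpdf like that exists, the same thing can be written with an inner loop over components; a sketch using the same (assumed) declarations as above, with only the lp computation changed:

        for (n in 1:N) {
          vector[K] lp;
          real log_Z;
          for (k in 1:K)
            lp[k] = log(theta[k]) + normal_lpdf(y[n] | mu[k], sigma[k]);
          log_Z = log_sum_exp(lp);
          target += log_Z;
          Pr[n] = lp - log_Z;   // log Pr[z[n] = k]
        }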

  3. I am trying to write a model to estimate the proportions of at least three food categories for 10 species of aquatic insects. How should I write the code to estimate these proportions for each species?
