Dean Eckles writes:

I have a hopefully interesting question about methods for analyzing varying coefficients as a way of describing similarity and variation between the groups.

In particular, we are studying individual differences in susceptibility to different influence strategies. We have quantitative outcomes (related to buying books), and influence strategy (7 levels, including a control) is a within-subjects factor with two implementations of each strategy (also within-subjects).

So I have fit non-nested multilevel models to the data (using lmer) with subject (varying-intercept and influence strategy slopes) and book (varying-intercept) as grouping factors. Following the suggestions in Part 3 of Gelman & Hill, the standard deviation of the coefficients for the influence strategies has been helpful in characterizing the (quite substantial) individual variation. I have also examined the correlations of those varying coefficients and computed the shrinkage/pooling factor.

However, I am thinking that looking at this overall correlation matrix allows for insufficient flexibility in identifying ways in which susceptibility to multiple strategies is related or not. This has motivated trying to identify clusters of people with related susceptibilities using, e.g., k-means, soft (Gaussian mixture) k-means, and hierarchical clustering applied to the point estimates for the influence strategy coefficients for each subject. I am wondering if you have used this or similar approaches to analyzing varying coefficients — and whether you think such methods are appropriate at all. I’d be interested in other approaches you would suggest in applied contexts like this one.

Using these estimates from a multilevel model seems to make sense in this context, but I’ve been having trouble finding any examples of this having been done before; perhaps I’m not using the right terminology. It also seems undesirable to use only the point estimates of the coefficients rather than propagating their uncertainty into the clustering, so perhaps one could run the clustering on the output of some simulations.
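One way to read the "cluster the simulations" idea is sketched below in Python: run k-means separately on each simulation draw of the subject-level coefficients and record how often each pair of subjects lands in the same cluster. This is only an illustration under stated assumptions, not a method taken from the email. The draws here are fabricated from made-up "posterior means" plus Gaussian noise (in practice they might come from simulations of the fitted model, for example via `sim()` in the R package arm), and the names `kmeans`, `coclustering`, and `posterior_mean` are hypothetical.

```python
import random


def sqdist(p, q):
    """Squared Euclidean distance between two coefficient vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sqdist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coclustering(draws, k, seed=0):
    """Share of draws in which subjects i and j fall in the same cluster."""
    rng = random.Random(seed)
    n = len(draws[0])
    counts = [[0] * n for _ in range(n)]
    for coefs in draws:
        labels = kmeans(coefs, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(draws) for c in row] for row in counts]


# ASSUMPTION: fabricated example. Seven subjects, two coefficients each;
# the last subject sits between the two groups on purpose.
sim_rng = random.Random(1)
posterior_mean = [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1],
                  [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],
                  [2.5, 2.5]]
draws = [[[m + sim_rng.gauss(0, 0.4) for m in subj] for subj in posterior_mean]
         for _ in range(200)]

co = coclustering(draws, k=2)
for row in co:
    print(" ".join(f"{v:.2f}" for v in row))
```

Working with pairwise co-clustering frequencies avoids having to match cluster labels across draws, since labels are arbitrary in each run. Pairs with frequencies near 0 or 1 are clustered consistently; intermediate values flag subjects whose assignment is sensitive to the uncertainty in their coefficients.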

My reply: This reminds me of some work in psychology by Iven Van Mechelen, Paul De Boeck and others on hierarchical classification models. Actually, the data shown on the cover of Bayesian Data Analysis came from a study that was analyzed using such methods. For a start, you can go to this article and follow the references back from there.

I've thought about doing similar things, and one alternative to the 2-stage process of (1) fitting a model, then (2) clustering on model parameter estimates, is to fit a latent class model (or maybe an HMM for time series data). The latent classes represent the clusters, and you infer their characteristics from the data, as well as the class membership of each data point. A paper that discusses this is "Hidden Markov Models for Longitudinal Comparisons" by Scott et al. (JASA, 2005, Vol. 100). In the paper they discuss how their method propagates parameter uncertainty into the latent class membership.
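To make the latent class idea concrete, here is a deliberately stripped-down Python sketch: a two-class Gaussian mixture on a single number per subject, fit by EM. It is not the model from the paper cited above, and a real analysis would embed the classes inside the multilevel regression rather than fit them to one summary per subject. The data and the function name `fit_two_class_mixture` are made up for illustration. What it does show is the key output: class characteristics and per-subject membership probabilities are estimated together.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and sd sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_class_mixture(y, n_iter=200):
    """EM for a two-class, one-dimensional Gaussian mixture."""
    # Crude deterministic start: one class at each end of the data.
    mu = [min(y), max(y)]
    spread = (max(y) - min(y)) / 4.0
    sigma = [spread if spread > 0 else 1.0] * 2
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each observation belongs to each class.
        resp = []
        for x in y:
            joint = [weight[c] * normal_pdf(x, mu[c], sigma[c]) for c in range(2)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update class weights, means, and sds from those probabilities.
        for c in range(2):
            n_c = sum(r[c] for r in resp)
            weight[c] = n_c / len(y)
            mu[c] = sum(r[c] * x for r, x in zip(resp, y)) / n_c
            var = sum(r[c] * (x - mu[c]) ** 2 for r, x in zip(resp, y)) / n_c
            sigma[c] = max(math.sqrt(var), 1e-3)  # floor keeps the sd positive
    # resp holds the membership probabilities from the final E-step.
    return weight, mu, sigma, resp


# ASSUMPTION: fabricated data, e.g. one susceptibility score per subject.
y = [-0.2, 0.1, 0.0, 0.3, -0.1, 2.9, 3.2, 3.0, 3.1, 2.8, 1.5]
weight, mu, sigma, resp = fit_two_class_mixture(y)
for x, r in zip(y, resp):
    print(f"y = {x:5.2f}   P(class 0) = {r[0]:.3f}   P(class 1) = {r[1]:.3f}")
```

The membership probabilities are soft, so a subject near the boundary is not forced into one class. A fully Bayesian fit would also carry uncertainty in the class means and variances, which this point-estimate EM does not.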

One shortfall is that the models are computationally demanding, and a second is that you have to pre-set the number of latent classes in the model you choose to fit. Optimally choosing the number of latent classes has been written about a lot, but I don't think any general solutions exist yet.

Since asking this question, I've begun working with mixtures of Dirichlet processes. This does not require specifying the number of classes in advance (and then doing model comparison). However, the components of the mixture cannot generally be interpreted and will be numerous when there are many participants.
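The remark that the components become numerous with many participants can be illustrated with the Chinese restaurant process, the prior over partitions implied by a Dirichlet process. The Python sketch below computes the prior expected number of occupied components for n participants and simulates one partition. It covers the prior only, since the posterior number of components also depends on the data. The concentration parameter `alpha` and both function names are assumptions made for illustration.

```python
import random


def expected_num_components(n, alpha):
    """Prior expected number of occupied components among n participants.

    Under a Chinese restaurant process, participant i + 1 opens a new
    component with probability alpha / (alpha + i), so the expectation
    is the sum of those probabilities.
    """
    return sum(alpha / (alpha + i) for i in range(n))


def simulate_crp(n, alpha, rng):
    """Draw one partition from the CRP; returns the component sizes."""
    sizes = []
    for i in range(n):
        u = rng.random() * (alpha + i)
        if not sizes or u < alpha:
            sizes.append(1)  # open a new component
            continue
        u -= alpha
        for t, s in enumerate(sizes):
            if u < s:  # join component t with probability s / (alpha + i)
                sizes[t] += 1
                break
            u -= s
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


for n in (10, 100, 1000):
    print(n, round(expected_num_components(n, alpha=1.0), 2))

print(simulate_crp(100, alpha=1.0, rng=random.Random(0)))
```

With `alpha = 1.0` the expectation is the harmonic number, which grows roughly like log(n): about 5 components at n = 100 and between 7 and 8 at n = 1000. Larger values of `alpha` make the growth faster.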