Anouschka Foltz writes:

One of my students has some data, and there is an issue with multiple comparisons. While trying to find out how to best deal with the issue, I came across your article with Martin Lindquist, “Correlations and Multiple Comparisons in Functional Imaging: A Statistical Perspective.” And while my student’s work does not involve functional imaging, I thought that your article may present a solution for our problem.

My student is interested in the relationship between vocabulary size and different vocabulary learning strategies (VLS). He has measured each participant’s approximate vocabulary size with a standardized test (scores between 0 and 10000) and asked each participant how frequently they use each of 37 VLS on a scale from 1 through 5. The 37 VLS fall into five different groups (cognitive, memory, social, etc.). He is interested in which VLS correlate with or predict vocabulary size. To see which VLS correlate with vocabulary size, we could run 37 separate correlation analyses, but then we run into the problem that we are doing multiple comparisons and the issue of false positives that goes along with that.

Do you think a multilevel Bayesian approach that uses partial pooling, as you suggest in your paper for functional imaging data, would be appropriate in our case? If so, would you be able to provide me with some more information as to how I can actually run such an analysis? I am working in R, and any information as to which packages and functions would be appropriate for the analysis would be really helpful. I came across the brms package for advanced Bayesian multilevel modeling, but I have not worked with this particular package before and I am not sure if it is exactly what I need.

My reply:

I do think a multilevel Bayesian approach would make sense. I’ve never worked on this particular problem, so I am posting it here on the blog in the hope that someone might have a response. This seems like the exact sort of problem where we’d fit a multilevel model rather than running 37 separate analyses!

What’s the official name of the standardized vocabulary size test?

Would factor analysis be appropriate?

It sounds very much like this is an exploratory scenario where you really need to let go of the desire to control decision errors and instead do your best to construct and evaluate useful models to motivate subsequent, harder confirmatory work for decisions.

I’d start with modelling the 37 VLS responses as ordinal outcomes associated with latent propensities, modeled as a multivariate normal with vocabulary size as an additional outcome. Put a prior on the correlation matrix such as lkj_corr(3) for moderate skepticism of high correlations. You can even then do exploratory factor analysis on the posterior samples to confirm your expectation of 5 VLS subscales (example here: https://www.youtube.com/watch?v=S9xkZQ1CHvk&index=15&list=PLu77iLvsj_GO4aGFHng78t1Fd14vPJu0l). You could then follow up with a more SEM-like structure with a latent mean propensity for each subscale from which each item of the subscale deviates slightly (achieving pooling within subscale), then correlations amongst the scales and vocabulary size.
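A minimal sketch of one piece of this in brms, showing partial pooling of item-level associations with an ordinal likelihood and an LKJ prior on the varying-effects correlation matrix. All variable names (`rating`, `item`, `subject`, `vocab_z`) and the simulated data are made up for illustration; the full multivariate ordinal model with vocabulary size as a joint outcome would take more work than this.

```r
library(brms)

# Hypothetical long-format data: one row per (participant, VLS item),
# with the 1-5 rating, an item id, and the participant's standardized
# vocabulary score. Replace with the student's real data.
set.seed(1)
n_subj <- 120; n_item <- 37
long <- expand.grid(subject = factor(1:n_subj), item = factor(1:n_item))
long$vocab_z <- rnorm(n_subj)[long$subject]
long$rating  <- sample(1:5, nrow(long), replace = TRUE)

# Each item's slope on vocabulary size is partially pooled toward the
# population mean, instead of running 37 separate analyses.
fit <- brm(
  rating ~ vocab_z + (vocab_z | item) + (1 | subject),
  data   = long,
  family = cumulative("logit"),          # ordinal likelihood for the 1-5 scale
  prior  = prior(lkj(3), class = "cor"), # moderate skepticism of high correlations
  chains = 4, cores = 4
)

# Item-specific (pooled) slopes on vocabulary size:
ranef(fit)$item[, , "vocab_z"]
```

Note this only pools the item-by-vocabulary associations; the multivariate-normal-latent-propensity formulation described above is a richer model than this sketch.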

oops, didn’t intend this to be a reply to Donald’s comment, but does converge with their suggestion of factor analysis.

That’s a really cool approach, which I am going to play with for some other problems myself. Thanks! But, from the problem description, it does sound like the student is at a pretty preliminary point in their theory development, and I wonder if the real Bayesian solution might be just to compute the 37 correlations and correct the p-values using the False Discovery Rate (which seems to implicitly deal with the latent-dimensions problem, which would make, say, Bonferroni too conservative). By “real Bayesian,” I am being a bit cheeky and referring to decision theory rather than just the computational bit of being Bayesian. Doing the cool approach that you describe is going to take a lot of time for a student; there is the increased risk of making a mistake, and the pain that comes when the person who reviews the thesis/paper gets confused and rejects the work because they don’t like being confused.
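For what it’s worth, the FDR route is a few lines of base R. This is a toy version with simulated data (all names and numbers here are made up); `p.adjust` with `method = "BH"` does the Benjamini-Hochberg correction.

```r
# Toy FDR workflow: 37 correlation tests, then Benjamini-Hochberg adjustment.
set.seed(123)
n <- 100
vocab <- rnorm(n)                                     # stand-in for vocabulary scores
vls   <- matrix(sample(1:5, n * 37, replace = TRUE),  # stand-in for 37 VLS ratings
                nrow = n)

# Spearman is a reasonable default for 1-5 ordinal ratings.
p_raw <- apply(vls, 2, function(x)
  cor.test(x, vocab, method = "spearman", exact = FALSE)$p.value)

p_fdr <- p.adjust(p_raw, method = "BH")

which(p_fdr < 0.05)  # strategies surviving the FDR correction
```

With this pure-noise simulation, typically nothing survives the correction, which is the point.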

I’d probably run a factor analysis on the data and see if items can be collapsed together, probably pruning out some bad items in the process. Then I’d analyze the result with structural equation modelling if the sample were large.

Basically, though you have 37 items, it sounds like you actually have only 5 constructs. So presumably, you should be analyzing 5 correlations after combining items together and doing some basic psychometric work to assess reliability of measurement.

The lavaan package in R does structural equation modelling and confirmatory factor analysis, if you have an a priori idea of what items should cluster together.
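A hedged sketch of what that could look like in lavaan. The item names (`cog1` … `soc3`), the three-factor structure, the data frame `d`, and the column `vocab_size` are all placeholders for the student’s actual 37 items, 5 groups, and data; the student would write one `=~` line per VLS group.

```r
library(lavaan)

# Hypothetical CFA: latent VLS constructs, each measured by its items.
# Extend to all five groups and the real item names.
model <- '
  cognitive =~ cog1 + cog2 + cog3
  memory    =~ mem1 + mem2 + mem3
  social    =~ soc1 + soc2 + soc3
'

# ordered = TRUE treats the 1-5 ratings as ordinal rather than continuous.
fit <- cfa(model, data = d, ordered = TRUE)
summary(fit, fit.measures = TRUE, standardized = TRUE)

# If the fit is acceptable, relate the factor scores to vocabulary size:
scores <- lavPredict(fit)
cor(scores, d$vocab_size)
```

This gives the 5 construct-level correlations rather than 37 item-level ones, which sidesteps most of the multiple-comparisons problem.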