“Handling Multiplicity in Neuroimaging through Bayesian Lenses with Hierarchical Modeling”

Donald Williams points us to this new paper by Gang Chen, Yaqiong Xiao, Paul Taylor, Tracy Riggins, Fengji Geng, Elizabeth Redcay, and Robert Cox:

In neuroimaging, the multiplicity issue may sneak into data analysis through several channels . . . One widely recognized aspect of multiplicity, multiple testing, occurs when the investigator fits a separate model for each voxel in the brain. However, multiplicity also occurs when the investigator conducts multiple comparisons within a model, tests two tails of a t-test separately when prior information is unavailable about the directionality, and branches in the analytic pipelines. . . .

More fundamentally, the adoption of dichotomous decisions through sharp thresholding under NHST may not be appropriate when the null hypothesis itself is not pragmatically relevant because the effect of interest takes a continuum instead of discrete values and is not expected to be null in most brain regions. When the noise inundates the signal, two different types of error are more relevant than the concept of FPR: incorrect sign (type S) and incorrect magnitude (type M).
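To make the type S / type M idea concrete, here is a toy retrodesign-style calculation for a simple two-sided z-test — my own made-up numbers, nothing from the paper:

```python
import numpy as np
from scipy import stats

def type_s_m(true_effect, se, alpha=0.05, n_sim=200_000, seed=1):
    """Type S and type M error rates for a two-sided z-test, conditional
    on the estimate reaching statistical significance (a toy sketch)."""
    z_crit = stats.norm.ppf(1 - alpha / 2)
    power = (1 - stats.norm.cdf(z_crit - true_effect / se)
             + stats.norm.cdf(-z_crit - true_effect / se))
    # Simulate estimates around the true effect and keep the "significant" ones.
    rng = np.random.default_rng(seed)
    est = true_effect + se * rng.normal(size=n_sim)
    sig = np.abs(est) > z_crit * se
    type_s = np.mean(np.sign(est[sig]) != np.sign(true_effect))  # wrong sign
    type_m = np.mean(np.abs(est[sig])) / abs(true_effect)        # exaggeration
    return power, type_s, type_m

# A noisy setting: true effect 0.1, standard error 0.3.
# Roughly: power ~ 0.06, type S ~ 0.17, type M (exaggeration) ~ 7.
print(type_s_m(0.1, 0.3))
```

When the noise swamps the signal like this, the estimates that clear the significance threshold are disproportionately the ones with the wrong sign or a wildly exaggerated magnitude — which is just the point the authors are making.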

Excellent! Chen et al. continue:

In light of these considerations, we introduce a different strategy using Bayesian hierarchical modeling (BHM) to achieve two goals: 1) improving modeling efficiency via one integrative (instead of many separate) model and dissolving the multiple testing issue, and 2) turning the focus of conventional NHST on FPR into quality control by calibrating type S errors while maintaining a reasonable level of inference efficiency.

The performance and validity of this approach are demonstrated through an application at the region of interest (ROI) level, with all the regions on an equal footing: unlike the current approaches under NHST, small regions are not disadvantaged simply because of their physical size. In addition, compared to the massively univariate approach, BHM may simultaneously achieve increased spatial specificity and inference efficiency. The benefits of BHM are illustrated in model performance and quality checking using an experimental dataset. In addition, BHM offers an alternative, confirmatory, or complementary approach to the conventional whole brain analysis under NHST, and promotes results reporting in totality and transparency. The methodology also avoids sharp and arbitrary thresholding in the p-value funnel to which the multidimensional data are reduced.
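To get a feel for what “one integrative (instead of many separate) model” buys you, here is a toy partial-pooling sketch — plain empirical Bayes in Python with made-up ROI numbers, not the full Bayesian hierarchical model they fit in Stan:

```python
import numpy as np

# Toy partial-pooling sketch (empirical Bayes, normal-normal model) for
# per-ROI effect estimates y_j with standard errors s_j.  This is NOT the
# model from the paper (they fit a full Bayesian hierarchical model in Stan);
# it just illustrates how pooling across ROIs shrinks noisy estimates.
rng = np.random.default_rng(0)
n_roi = 20
true_effects = rng.normal(0.3, 0.2, size=n_roi)   # hypothetical ROI effects
s = rng.uniform(0.2, 0.5, size=n_roi)             # per-ROI standard errors
y = true_effects + s * rng.normal(size=n_roi)     # noisy per-ROI estimates

# Crude moment estimates of the overall mean and between-ROI variance tau^2
mu_hat = np.average(y, weights=1 / s**2)
tau2_hat = max(np.var(y, ddof=1) - np.mean(s**2), 0.0)

# Each ROI is shrunk toward the overall mean, more so when its own
# standard error is large relative to the between-ROI variation.
shrink = tau2_hat / (tau2_hat + s**2)
pooled = mu_hat + shrink * (y - mu_hat)

print(np.mean(np.abs(y - true_effects)))       # error of the separate estimates
print(np.mean(np.abs(pooled - true_effects)))  # typically smaller after pooling
```

The noisier an ROI’s estimate, the harder it gets pulled toward the overall mean; that regularization across regions is exactly what the many-separate-models approach can’t give you.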

I haven’t read this paper in detail but all of this sounds great to me. Also I noticed this:

The difficulty in passing a commonly accepted threshold with noisy data may elicit a hidden misconception: A statistical result that survives the strict screening with a small sample size seems to gain an extra layer of strong evidence, as evidenced by phrases in the literature such as “despite the small sample size” or “despite limited statistical power.” However, when the statistical power is low, the inference risks can be perilous . . .

They’re pointing out the “What does not kill my statistical significance makes it stronger” fallacy!
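Here’s a quick simulation of that fallacy — one true effect, two sample sizes, and we look only at the results that cross p < 0.05 (again, made-up numbers):

```python
import numpy as np
from scipy import stats

# Same true effect, two sample sizes.  Among the results that reach p < 0.05,
# the small-sample estimates are the ones that overstate the effect --
# surviving the threshold with low power is not extra evidence.
rng = np.random.default_rng(0)
true_effect, sd, n_sim = 0.2, 1.0, 20_000

for n in (20, 400):
    x = rng.normal(true_effect, sd, size=(n_sim, n))
    res = stats.ttest_1samp(x, 0.0, axis=1)
    sig = res.pvalue < 0.05
    mean_sig_estimate = np.abs(x.mean(axis=1))[sig].mean()
    print(n, sig.mean(), mean_sig_estimate / true_effect)
# n = 20:  power ~ 0.13, and the significant estimates overstate the effect ~3x
# n = 400: power ~ 0.98, and the significant estimates sit close to the truth
```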

And they fit their models in Stan. This is just wonderful. I really hope this project works out and is useful in imaging research. It feels so good to think that all this work we do can make a difference somewhere.

10 thoughts on ““Handling Multiplicity in Neuroimaging through Bayesian Lenses with Hierarchical Modeling””

  1. I have little experience with brain imaging data, but as far as I know it does poorly with regression (logistic or normal), so I’m not sure that this is a good idea. Brain imaging data is best fitted with convolutional neural networks (makes sense, since this is how our brain does it). Looking directly at voxels as variables is far from ideal and generally requires a lot of feature engineering (including only certain voxels, using lots of variable selection and weird tricks). If I’m wrong, please correct me.

    • It may be true that brain imaging data does poorly with regression, but it is definitely true that regression is the primary method for analyzing brain imaging data (at least in psychology). There are a number of pre-processing steps to prepare the voxel-level data for regression, but I’m not sure they are the feature engineering you’re talking about.

    • There were a number of studies in the 90s by Heeger and Boynton (e.g. http://www.cns.nyu.edu/heegerlab/content/publications/Boynton-jneurosci96.pdf) showing that the linear model was pretty reasonable for fMRI research, especially in vision. I don’t know of anyone fitting imaging data with logistic regression (it doesn’t really make sense, since the BOLD signal is continuous). If anyone is successfully fitting a convnet directly to imaging data and interpreting the parameters, that’s news to me. It seems like the wrong tool for the job for most studies; it’s unclear what theoretical understanding you could get from a fitted network even if you could regularize it enough to fit a small imaging dataset. The studies that have been done (by Kriegeskorte, van Gerven and others) that I think you might be referring to typically use the responses of units in a convnet as predictors in a GLM, so the model is still essentially a linear model. Regression may not be the best model, but so far it’s been working pretty well…

    • This is not to say that the massively univariate GLM approach is ideal. There are lots of better ways forward, and the paper linked above is certainly a good step. But I don’t think you are going to see anyone analyzing their imaging data with a convnet any time soon, and for good reason.
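To put some code behind the GLM point above, here is a minimal single-voxel sketch of the standard fMRI regression: a sparse event schedule convolved with a canonical-style double-gamma HRF, fit by ordinary least squares. The data are synthetic and the HRF parameters are just the usual textbook values, not anything from the Boynton paper:

```python
import numpy as np
from scipy import stats

# Minimal single-voxel fMRI GLM sketch: convolve a sparse event schedule with
# a canonical-style double-gamma HRF and fit by least squares.  Synthetic data.
tr, n_scans = 2.0, 200
t = np.arange(0, 32, tr)
hrf = stats.gamma.pdf(t, 6) - stats.gamma.pdf(t, 16) / 6   # double-gamma shape
hrf /= hrf.max()                                           # scale to unit peak

stim = np.zeros(n_scans)
stim[10::20] = 1.0                                  # one event every 40 seconds
regressor = np.convolve(stim, hrf)[:n_scans]        # predicted BOLD response

rng = np.random.default_rng(0)
bold = 100 + 2.0 * regressor + rng.normal(0, 0.5, size=n_scans)  # fake voxel

X = np.column_stack([np.ones(n_scans), regressor])  # intercept + task regressor
beta_hat, *_ = np.linalg.lstsq(X, bold, rcond=None)
print(beta_hat)   # second coefficient should land near the true value of 2.0
```

The mass-univariate approach discussed in the post is essentially this same regression repeated independently at every voxel, which is where the multiplicity problem comes from.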

    • “Brain imaging data is best fitted with convolutional neural networks (makes sense, since this is how our brain does it)”

      Those are really separate questions. The brain as an information processor is very well modeled by convolutional nets (at least, certain subsystems are, like the early visual system), but that’s very different from saying that convolutional nets are the best way to do statistical analysis of neuroimaging data.

      The whole voxel-by-voxel mass-univariate thing is a pretty stupid way to analyze neuroimaging data, but there’s a lot of work on hierarchical models using different spatial/temporal smoothness priors that solve a lot of the multiple-comparison issues (e.g. https://arxiv.org/pdf/1710.01434.pdf).

      The feature engineering issue is pretty spot on, though. The standard approach seems to be to take a bunch of voxels in a region (which may be known a priori, or defined using a moving searchlight) and then either throw them directly into a (sometimes kernel) SVM, or maybe do PCA first. That kind of prediction has multiple-comparison issues of its own, though, since the CV error is still a noisy estimate, so if you do a thousand comparisons you’re almost guaranteed to find something. I wish this were more widely acknowledged in the literature.
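A quick illustration of the point above about noisy cross-validation estimates: decode pure noise in many hypothetical “regions” and the best cross-validated accuracy still looks impressive. (Toy setup, using scikit-learn for the classifier.)

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Pure-noise "decoding": no region carries any signal, yet searching over
# many regions for the best cross-validated accuracy turns up apparent hits,
# because the CV estimate itself is noisy.
rng = np.random.default_rng(0)
n_trials, n_voxels, n_regions = 40, 10, 200
y = np.repeat([0, 1], n_trials // 2)               # fake condition labels

best = []
for _ in range(n_regions):
    X = rng.normal(size=(n_trials, n_voxels))      # noise-only voxel patterns
    acc = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
    best.append(acc)

print(max(best))   # often ~0.7 "accuracy" even though there is zero true signal
```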

    • Not exactly true. Convnets preserve the spatial relationship between inputs to an extent, by making objects within an image translation invariant. It does retain certain similarities to the visual cortex, but issues like rotation can cause a convnet to completely misclassify its input. (Unless it’s trained on rotated data, but then they seem to learn features for each aspect of rotation, failing to classify the object as a meaningful whole. No citation, just some experiments I did with MNIST and data augmentation.) In particular, precise spatial relationships get lost due to pooling.

      To see how convnets don’t learn a strictly realistic internal representation of reality, google some of the GAN papers that have been trained on ImageNet. Sure, some things look kind of like a dog, or a boat, or a plane…but why does this dog have two mouths and three legs? Why does that plane have only one wing? The model learns roughly what a dog should be and should look like, but never learns that a dog has legs below (NOT OUT OF ITS HEAD OH GOD THE HORROR). These facts aren’t needed to determine that a dog is a dog. The extent of this problem depends on how similar your input data is, though. The celeb-faces one seems to have worked because the images generally have the same spatial relationships.

      Hinton seems to have identified the issue, and he’s working on a new class of autoencoder to try to resolve it. If I’m reading the paper right, it’s trying to learn general spatial invariance for object recognition based on specialised units that build an internal representation of the object within the frame. The biggest problem is that it gets confused by multiple instances of the same object (i.e., two cars facing different directions), but hey, early days.

      @Corson: On the question of whether neural nets are the best way of doing statistical analysis, I share your skepticism. Even though I know little (okay, nothing) about how neuroimaging is done, would it potentially be useful to use denoising convolutional autoencoders to map three-dimensional imaging data to a one-dimensional vector? It’s still “black box”-y, but it’s based on pretty solid fundamental ideas. While the transform itself is still hidden, and the components won’t reliably be ordered as in PCA, the vector itself describes a point in a lower-dimensional space not bound by linearity assumptions. Multiple comparisons could be avoided, as the autoencoder is trained unsupervised, meaning you can train it with all the neuroimaging data you can find, then run the analysis on your new data. Or perhaps I’ve missed the point! If I have, how would you approach the problem of feature engineering?

      • These kinds of models are a bit out of my area of expertise, but I know that those kinds of approaches have been used for preprocessing in e.g. medical imaging. When it comes to statistics, I generally like to stay in “brain-space” as much as possible, just because the structure of the brain is well enough understood that we generally have a lot of prior information to go on (e.g. effects are likely to be smooth within certain well-defined functional regions, and discontinuities are likely to occur across boundaries that are fairly well known and easily identified). When you transform too early, you lose a lot of that prior knowledge, and it gets harder to reason about the effects.

        • Interesting. I’m relatively new to Bayesian inference, and priors for spatial relationships are something I’ve not really considered before. I was expecting dimensionality reduction to be an essential step, as I’m used to thinking in machine learning (not statistical) terms. That being said, I absolutely grasp your point and feel dumb for not seeing it myself. Thanks for taking the time to explain it to me!

        • Dimensionality reduction (usually PCA or ICA) is very popular, but for general statistical modelling, linear models are still by far the most common, if only because they’re what most people are trained to use.
