Peter Clayson writes:
I have spent much of the last 6 months or so of my life trying to learn Bayesian statistics on my own. It’s been a difficult, yet rewarding experience.
I have a question about a research debate that is going on in my field.
Briefly, the debate between some very prominent scholars in my area surrounds the question of whether the dorsal anterior cingulate cortex (dACC) is selective for pain, i.e., whether pain is the best explanation compared to the numerous other terms commonly studied with regard to dACC activation. The paper reached these conclusions by using reverse inference on studies included in the NeuroSynth database.
What is getting many researchers riled up is the statistical approach the paper used (as well as the potential anatomical errors). The original paper in PNAS, by Matt Lieberman and Naomi Eisenberger here at UCLA, used z-scores to summarize dACC activation instead of posterior probabilities. Explanation below:
Our next goal was to quantify the strength of evidence for different processes being the psychological interpretation for dACC activity and how the evidence for different psychological processes compared with one another. We wanted to explore this issue in an unbiased way across the dACC that would allow each psychological domain to show where there is more or less support for it as an appropriate psychological interpretation. To perform this analysis, we extracted reverse inference statistics (Z-scores and posterior probabilities) across eight foci in the dACC for the terms “pain” (410 studies), “executive” (531 studies), “conflict” (246 studies), and “salience” (222 studies).
The foci were equally spaced out across the midline portion of the dACC (see Fig. 5 for coordinates). We plotted the posterior probabilities at each location for each of the four terms, as well as an average for each psychological term across the eight foci in the dACC (Fig. 5). Because Z-scores are less likely to be inflated from smaller sample sizes than the posterior probabilities, our statistical analyses were all carried out on the Z-scores associated with each posterior probability (21).
The paper goes on to compare z scores in various dACC voxels for different psychological terms. The paper was slammed by Tal Yarkoni, the creator of the NeuroSynth database, for not using Bayesian statistics (here) as well as for other reasons. Lieberman posted a snarky, passive-aggressive reply defending his statistical analysis (here), and Yarkoni brazenly responded to that post (here). Then things “got real”, and some heavy hitters in my field published a commentary in PNAS (here), to which Lieberman responded (here). (I show you all this to demonstrate how contentious things have gotten about this PNAS paper.)
Lieberman et al. defend their statistical approach and emphasize hit rates:
Imagine a database consisting of 100,000 attention studies and 100 pain studies. If a voxel is activated in 1,000 attention studies and all 100 pain studies, we would draw two conclusions. First, a randomly drawn study from the 1,100 with an effect would likely be an attention study. Second, because 100% of the pain studies produced an effect and only 1% of attention studies did, we would also conclude that this voxel is more selective for pain than attention. Hit rates (e.g., the number of pain studies that activate a region divided by the total number of pain studies in Neurosynth) are more important for assessing structure-to-function mapping than the historical tendency to conduct more studies on some topics than others.
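To make the two conclusions in that passage concrete, here is a minimal Python sketch that recomputes them from the hypothetical counts given (100,000 attention studies, 100 pain studies). The variable names are ours, not the authors'; the point is only that the hit rates and the reverse probability P(pain | activation) answer different questions.

```python
# Hypothetical database from the quoted passage.
n_attention, n_pain = 100_000, 100
active_attention, active_pain = 1_000, 100  # studies that activate the voxel

# Hit rates (forward inference): P(activation | term).
hit_attention = active_attention / n_attention
hit_pain = active_pain / n_pain

# Likelihood ratio: how much more often pain studies activate the voxel.
likelihood_ratio = hit_pain / hit_attention

# Reverse inference: P(term | activation), among the 1,100 activating studies.
total_active = active_attention + active_pain
p_pain_given_active = active_pain / total_active
p_attention_given_active = active_attention / total_active

print(f"hit rate, pain:        {hit_pain:.2f}")
print(f"hit rate, attention:   {hit_attention:.2f}")
print(f"likelihood ratio:      {likelihood_ratio:.0f}")
print(f"P(pain | active):      {p_pain_given_active:.3f}")
print(f"P(attention | active): {p_attention_given_active:.3f}")
```

Both conclusions hold at once: the voxel activates 100 times more often in a pain study, yet fewer than one in ten activating studies is about pain. Which number matters depends on the question being asked.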
They go on to analyze a subset of the data, matching the number of studies included in the analyses for each of the terms.
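Here is a related sketch of what matching study counts does, under the simplifying assumption that each term's studies activate the region at exactly that term's hit rate (so we work with expected counts rather than an actual random subsample). We don't know exactly how Lieberman et al. drew their matched subset, and the function name `reverse_inference` is ours. With equal counts the study numbers cancel, which is equivalent to a uniform prior over terms: the reverse probability reduces to each term's hit rate divided by the sum of the hit rates.

```python
def reverse_inference(hit_rates, n_studies):
    """P(term | activation) from expected counts, assuming each term's studies
    activate the region at exactly that term's hit rate."""
    expected_active = {t: hit_rates[t] * n_studies[t] for t in hit_rates}
    total = sum(expected_active.values())
    return {t: a / total for t, a in expected_active.items()}

# Hit rates from the hypothetical example: P(activation | term).
hit_rates = {"pain": 1.0, "attention": 0.01}

full = reverse_inference(hit_rates, {"pain": 100, "attention": 100_000})
matched = reverse_inference(hit_rates, {"pain": 100, "attention": 100})

# With equal counts, the result is just the normalized hit rates.
normalized = {t: h / sum(hit_rates.values()) for t, h in hit_rates.items()}

print(f"full database:  P(pain | active) = {full['pain']:.3f}")
print(f"matched counts: P(pain | active) = {matched['pain']:.3f}")
```

With the full database P(pain | activation) is 1/11; with 100 studies per term it becomes 100/101. The matched analysis effectively asks which term would best explain activation if every topic had been studied equally often.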
It seems like the appropriate thing to do is to analyze the posterior probability of each psychological term given dACC activation. I think an appropriate analogy would be a doctor diagnosing smallpox. Say patients with smallpox have a 99% probability of having spots, whereas patients with chickenpox have a 70% probability of having spots. Given how rare smallpox currently is (the prior), a doctor who ignored the posterior probabilities would incorrectly diagnose patients with spots as having smallpox, reasoning that patients who have smallpox are more likely to show spots.
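Clayson’s analogy can be worked through with Bayes’ rule. The 99% and 70% likelihoods are from the letter; the one-in-a-million prior is a hypothetical number chosen for illustration, and we assume for simplicity that every patient with spots has one of the two diseases.

```python
def posterior(prior, p_e_given_h, p_e_given_alt):
    """P(H | E) for two exhaustive hypotheses H and alt, by Bayes' rule."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_alt * (1 - prior))

p_spots_given_smallpox = 0.99    # from the letter
p_spots_given_chickenpox = 0.70  # from the letter
prior_smallpox = 1e-6            # hypothetical: one in a million

post = posterior(prior_smallpox, p_spots_given_smallpox, p_spots_given_chickenpox)
print(f"P(smallpox | spots) = {post:.2e}")
```

The spots multiply the odds of smallpox by only 0.99/0.70 ≈ 1.41, far too little to overcome the prior.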
I think the same thing is happening in this PNAS paper. Ignoring the posterior probabilities is like just focusing on whether a patient has spots and ignoring the prior probabilities.
Am I reaching a correct conclusion? I think the PNAS paper is bad for numerous other reasons, but I want to understand Bayesian statistics better.
My response: It’s interesting how different fields have different terminologies. From the abstract of the paper under discussion:
No neural region has been associated with more conflicting accounts of its function than the dorsal anterior cingulate cortex (dACC), with claims that it contributes to executive processing, conflict monitoring, pain, and salience. However, these claims are based on forward inference analysis, which is the wrong tool for making such claims. Using Neurosynth, an automated brainmapping database, we performed reverse inference analyses to explore the best psychological account of dACC function. Although forward inference analyses reproduced the findings that many processes activate the dACC, reverse inference analyses demonstrated that the dACC is selective for pain and that pain-related terms were the single best reverse inference for this region.
I’d never before heard of “forward” or “reverse” inference. Here’s how they define it:
Forward inference, in this context, refers to the probability that a study or task that invokes a particular process will reliably produce dACC activity [e.g., the probability of dACC activity, given a particular psychological process: P(dACC activity|Ψ process)]. . . . Reverse inference, in the current context, refers to the probability that dACC activity can be attributed to a particular psychological process [i.e., the probability of a given psychological process, given activity in the dACC: P(Ψ process|dACC activity)].
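The two definitions are linked by Bayes’ rule. Writing $A$ for “dACC activity” and $\Psi_1, \dots, \Psi_K$ for the candidate processes (our notation, not the paper’s), and assuming for simplicity that each activation is attributable to exactly one of these processes:

$$
P(\Psi_k \mid A) = \frac{P(A \mid \Psi_k)\,P(\Psi_k)}{\sum_{j=1}^{K} P(A \mid \Psi_j)\,P(\Psi_j)}
$$

In odds form, for two processes,

$$
\frac{P(\Psi_1 \mid A)}{P(\Psi_2 \mid A)} = \frac{P(A \mid \Psi_1)}{P(A \mid \Psi_2)} \times \frac{P(\Psi_1)}{P(\Psi_2)}
$$

The first factor on the right is the ratio of forward-inference probabilities (the hit rates Lieberman et al. emphasize); the second is the ratio of base rates. In their hypothetical database of 100 pain and 100,000 attention studies, the hit-rate ratio is $1/0.01 = 100$ and the base-rate ratio is $100/100{,}000 = 1/1{,}000$, so the posterior odds of pain versus attention are $1/10$, i.e., $P(\text{pain} \mid A) = 1/11$.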
I agree with Clayson that these reverse probabilities will depend on base rates. “The probability of a given psychological process” depends crucially on how this process is defined and how often it is happening. For example, some people are in pain all the time. If pain is the “Ψ process” under consideration here, then P(Ψ process|dACC activity) will be 1 for those people, automatically. Other people are just about never in pain, so P(Ψ process|dACC activity) will be essentially 0 for them.
I’m not saying that this reverse inference is necessarily a bad idea, just that much will depend on what scenarios are in the database they are using. In his discussion, Yarkoni writes, “Pain has been extensively studied in the fMRI literature, so it’s not terribly surprising if z-scores for pain are larger than z-scores for many other terms in Neurosynth.” I think this is pretty much the same thing I was saying (but backed up by actual data): this reverse-inference comparison will depend strongly on what’s in the database.
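Yarkoni’s point about z-scores can be illustrated with a sketch. We assume, for illustration only, that a reverse-inference z-score behaves like a test statistic for the association between a term and activation; here we use a pooled two-proportion z-test (its square equals the Pearson chi-square statistic for the corresponding 2×2 table), which may differ from Neurosynth’s exact computation. All counts are hypothetical: studies tagged with the term activate the voxel 30% of the time, other studies 10% of the time, and only the number of tagged studies changes.

```python
import math

def two_proportion_z(active_1, n_1, active_0, n_0):
    """Pooled two-proportion z statistic comparing activation rates of two groups."""
    p_1 = active_1 / n_1
    p_0 = active_0 / n_0
    p_pooled = (active_1 + active_0) / (n_1 + n_0)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_1 + 1 / n_0))
    return (p_1 - p_0) / se

# Hypothetical: 10,000 studies without the term, 10% of which activate the voxel.
n_other, active_other = 10_000, 1_000

z_by_n = {}
for n_term in (50, 200, 800):
    active_term = 3 * n_term // 10  # 30% activation rate among tagged studies
    z_by_n[n_term] = two_proportion_z(active_term, n_term, active_other, n_other)
    print(f"{n_term:4d} tagged studies: z = {z_by_n[n_term]:.2f}")
```

The difference in activation rates is 20 percentage points in every case, yet z roughly doubles each time the number of tagged studies quadruples. A heavily studied term can therefore produce larger z-scores without being any more selective.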
I also have some problems with how both these inferences are defined, in that “dACC activity” is defined discretely, as if the dorsal anterior cingulate cortex is either on or off. But it’s my impression that things are not so simple.
Finally, there are the major forking-paths problems with the study, which are addressed in detail by Yarkoni. I agree with Yarkoni that the right way to go is to perform a meta-analysis or fit a hierarchical model with all possible comparisons, rather than just selecting a few things of interest and using them to tell dramatic stories.
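As a rough illustration of what a hierarchical approach buys, here is a minimal empirical-Bayes sketch (a normal approximation with a method-of-moments estimate of the between-term variance). It is not a full Bayesian model and not anything Yarkoni specifically proposed. Each term’s activation rate is pulled toward the average across all terms, and rates estimated from few studies are pulled hardest. The function name `partial_pool` and all counts are hypothetical.

```python
from statistics import fmean, pvariance

def partial_pool(counts):
    """Shrink per-term activation rates toward their common mean.

    counts maps term -> (active_studies, total_studies). Returns a dict
    mapping term -> (raw_rate, partially_pooled_rate).
    """
    raw = {t: a / n for t, (a, n) in counts.items()}
    mu = fmean(list(raw.values()))
    # Approximate binomial sampling variance of each raw rate, evaluated at the
    # grand mean so that rates of exactly 0 or 1 still get a nonzero variance.
    v = {t: mu * (1 - mu) / n for t, (_, n) in counts.items()}
    # Method-of-moments estimate of the between-term variance, floored at zero.
    tau2 = max(0.0, pvariance(list(raw.values())) - fmean(list(v.values())))
    pooled = {}
    for t in counts:
        # Weight on the term's own data: near 1 for precise estimates, small for noisy ones.
        w = tau2 / (tau2 + v[t]) if tau2 + v[t] > 0 else 0.0
        pooled[t] = (raw[t], mu + w * (raw[t] - mu))
    return pooled

# Hypothetical counts: (studies activating the region, studies tagged with the term).
counts = {
    "term A": (40, 400),
    "term B": (3, 10),
    "term C": (20, 250),
    "term D": (1, 15),
}

result = partial_pool(counts)
for term, (raw, est) in result.items():
    print(f"{term}: raw {raw:.3f} -> partially pooled {est:.3f}")
```

Term B, with a striking raw rate from only 10 studies, is pulled much further toward the mean than term A, whose rate rests on 400 studies. A full analysis would model all terms jointly, but even this crude version shows how partial pooling tempers dramatic estimates that come from small samples.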
On the other hand, Matthew Lieberman, one of the authors of the paper being discussed, has a Ted talk (“The Social Brain and its Superpowers”) and has been featured on NPR, and Tal Yarkoni hasn’t. So there’s that.