I’ve often seen confusion between interactions in a regression model and correlations among the predictors. To keep it simple, consider the model y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error, and assume the predictors have been signed so that both b1 and b2 are positive. Then b3 represents the interaction. This has nothing to do with the joint distribution of x1 and x2 in the data, or in the population. (For simplicity, assume the data to which the model is being fit are a random sample from the population of interest.)
The interaction depends on the model of y given x1 and x2, while the correlation depends on the model for x1 and x2. These are two completely different parts of the model. And yet, they often seem connected.
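To make the separation concrete, here is a minimal simulation sketch in Python with NumPy. The coefficient values, the sample size, and the function name fit_interaction are all made up for illustration. The correlation between x1 and x2 is set in one place (the joint distribution of the predictors) and b3 in another (the model for y given x1 and x2), and the fitted interaction should come out about the same whatever the correlation is.

```python
import numpy as np

def fit_interaction(rho, n=20_000, seed=0):
    """Simulate correlated predictors and return the estimated interaction b3."""
    rng = np.random.default_rng(seed)
    # Joint distribution of the predictors: bivariate normal with correlation rho.
    cov = [[1.0, rho], [rho, 1.0]]
    x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    # Model for y given x1 and x2; these coefficients are hypothetical.
    b0, b1, b2, b3 = 1.0, 0.5, 0.5, 0.3
    y = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2 + rng.normal(0.0, 1.0, size=n)
    # Least-squares fit of y on (1, x1, x2, x1*x2).
    X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]

# Same model for y, three different correlations between the predictors.
estimates = {rho: fit_interaction(rho) for rho in (-0.6, 0.0, 0.6)}
for rho, b3_hat in estimates.items():
    print(f"corr(x1, x2) = {rho:+.1f}   estimated b3 = {b3_hat:.3f}")
```

This only shows that the two quantities can be set independently in a simulation; it says nothing about whether they tend to go together in real data.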
I have the general impression that I’d be more likely to expect a positive interaction of x1 and x2 when predicting y, if x1 and x2 are positively correlated in the population.
For example, when predicting income from height and sex, being taller and being male both predict higher income, and they also interact: the coefficient for height is higher for men than for women. And of course the two predictors, height and male, are positively correlated in the population.
I’m not sure how to think about this connection or even whether it’s a real pattern! But there might be something there so I wanted to share it with you.
The issue of interactions arises in the context of intersectionality, which is a form of interaction that comes up in sociology. It started for me with this email from Elin Waring:
I’ve been working on data on intersectionality and retention of students in STEM majors. My little group is specifically looking at data from Lehman College and trying to model graduation with a STEM degree. There are a lot of details, but basically we have come to the conclusion that the right way to describe this is with a discrete time competing risk model (the competing risks being graduation with a STEM degree and graduation with a non-STEM degree). I won’t go into all the details. We have data for between 1 and 20 semesters enrolled for students starting as freshmen. For us, intersectional identity is defined by 5 variables that yield 32 distinct combinations or strata as used in the next articles.
In trying to think about how to account for intersectional identities we came across the “MAIHDA Method.” I was wondering if you had seen this discussion before or have any thoughts about it.
Evans, Clare R., George Leckie, and Juan Merlo. 2020. “Multilevel versus Single-Level Regression for the Analysis of Multilevel Information: The Case of Quantitative Intersectional Analysis.” Social Science & Medicine (1982) 245:112499. doi:10.1016/j.socscimed.2019.112499.
They essentially argue for treating the strata as random effects in a multilevel model, with the individual components of the combinations introduced as fixed effects describing the combinations.
The next article criticizes that approach and argues for fixed effects all around.
Wilkes, Rima, and Aryan Karimi. 2024. “What Does the MAIHDA Method Explain?” Social Science & Medicine 345:116495. doi:10.1016/j.socscimed.2023.116495.
Responded to here:
Evans, Clare R., Luisa N. Borrell, Andrew Bell, Daniel Holman, S. V. Subramanian, and George Leckie. 2024. “Clarifications on the Intersectional MAIHDA Approach: A Conceptual Guide and Response to Wilkes and Karimi (2024).” Social Science & Medicine 350:116898. doi:10.1016/j.socscimed.2024.116898.
I was wondering if you have any thoughts about this? For me, intersectionality as a theoretical approach does mean that it makes sense to look at the strata rather than thinking of the strata as just the most complex level of creating statistical models of the intersection of the variables. But then it seems as though treating this as a random effect more or less undermines its centrality to the theory. And is treating both the strata and the individual characteristics as variables at the same level basically a way to decompose?
In the end, I feel like the pro-MAIHDA people retreat to “we are just descriptive” in a way that isn’t very helpful. That said, they are right that this seems to have some traction in the world of health disparity research.
I replied that I’d never heard of this method before. I couldn’t actually muster the energy to read the above articles, as all this debate seems to be missing the key issues. I don’t really care if something is called a fixed effect or a random effect (see here); my current preferred way of thinking of these problems is by framing them as a generative model.
Regarding intersectionality, the natural way I would see it is that this would show up as an interaction term, the idea that the interaction is more than the sum of its parts? For a simple example, if there are 5 binary variables and each has the same effect on its own (which they wouldn’t, this is just a simple hypothetical example), then you could create a variable which is the total number of identities, thus a number from 0 to 5, and “intersectionality” would show up as a super-linear or convex relation between the outcome and this total predictor?
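Here is a small sketch of that hypothetical in Python. The effect sizes (1.0 per identity, plus 0.5 for every pair of identities in the second version) are invented for illustration, as are the function names. It enumerates all 32 combinations of 5 binary variables, averages the outcome by the total count, and checks the second differences: zero for the purely additive version, positive (convex) when there are pairwise interactions.

```python
from itertools import product

def mean_by_count(outcome):
    """Average the outcome over all 32 strata, grouped by the total number of identities."""
    totals = {k: [] for k in range(6)}
    for combo in product([0, 1], repeat=5):
        totals[sum(combo)].append(outcome(combo))
    return [sum(v) / len(v) for k, v in sorted(totals.items())]

def additive(combo):
    # Each identity shifts the outcome by the same amount; no interactions.
    return 1.0 * sum(combo)

def superadditive(combo):
    # Same main effects, plus a hypothetical 0.5 for every pair of identities.
    k = sum(combo)
    n_pairs = k * (k - 1) / 2
    return 1.0 * k + 0.5 * n_pairs

def second_differences(values):
    """Discrete curvature: positive entries mean the sequence is convex there."""
    return [values[i + 1] - 2 * values[i] + values[i - 1]
            for i in range(1, len(values) - 1)]

additive_curvature = second_differences(mean_by_count(additive))
superadditive_curvature = second_differences(mean_by_count(superadditive))
print(additive_curvature)       # [0.0, 0.0, 0.0, 0.0]
print(superadditive_curvature)  # [0.5, 0.5, 0.5, 0.5]
```

The outcome here depends only on the count, so averaging within each count is trivial; with real data the 32 strata would differ within a count and the averages would be noisy.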
Waring responded:
Sure, but the idea you suggested about intersectionality itself isn’t right. You can’t just sum the number of identities; everyone has identities, and the idea is that it is not just about concentrated disadvantage of having all or some specific identities. If we have 5 dichotomous identity/group variables, everyone has 5 dimensions of identity. Intersectionality is about the idea that something like “white, native born, woman, high income” shapes what happens because of how those come together to shape (in the case of my analysis) whether, as an undergraduate, you persist in STEM fields.
I replied as follows:
Yes, I was actually thinking this when I wrote that! I was imagining that each of the 5 factors has an “off” and “on” setting, and intersectionality kicks in when there are multiple “on” settings, where “on” represents the group that faces more difficulty (nonwhite, non-native born, female, low income, gender nonconformist, etc.). Once you allow arbitrary possibilities for intersectionality, then my simple superadditive model wouldn’t fit. On the other hand, if you were to allow all 32 possibilities to take on any value, then realistically you would not be able to estimate anything much at all: this is the usual problem in sociology of approximating a complex social structure by a simple model that explains most of the variance. For predicting persistence in STEM (or any academic field), one possible factor that could enter in a complicated way is conservative political ideology, in that for many attitudes and behaviors its predictive effect goes in the opposite direction from the “on” categories listed above, but grad students, in STEM and other fields, are predominantly politically on the left. I could well imagine that conservative political ideology, like the other “on” categories, is predictive of not persisting in STEM but that this could interact in unexpected ways with those other categories.
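To see why letting all 32 possibilities take on any value is so demanding, it helps to count the terms. The sketch below (plain Python; the variable names are made up for illustration) breaks the fully saturated model for 5 binary factors into an intercept, main effects, and interactions of each order.

```python
from math import comb

n_factors = 5

# Number of terms of each order: order 0 is the intercept, order 1 the main
# effects, order 2 the two-way interactions, and so on up to the five-way term.
terms_by_order = {order: comb(n_factors, order) for order in range(n_factors + 1)}

n_cells = sum(terms_by_order.values())
n_interactions = sum(count for order, count in terms_by_order.items() if order >= 2)

print(terms_by_order)    # {0: 1, 1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
print(n_cells)           # 32 free parameters, one per stratum
print(n_interactions)    # 26 of them are interactions
```

So 26 of the 32 parameters are interactions, many of them high-order terms that a typical dataset would have little information to pin down, which is where a simpler approximating model comes in.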
From a statistical perspective, my main message is to choose such a model based on its explanatory power, recognizing that it’s an approximation, rather than using methods such as statistical significance or Bayes factors, which in different ways are driven by sample size, as we discussed in this 1995 paper.
Another interesting statistical feature of this and similar discussions is that it’s natural for the discussion to go back and forth between the correlation between two predictors in the data (or the population) and the interaction between their predictive effects, as discussed at the top of this post.
I’m not sure if this interaction thing is a general pattern that has some statistical explanation, or just a faulty intuition of mine based on a couple of special cases. But I have noticed a general confusion: when people talk about interactions, often they seem to be talking about correlation between the predictors.