There’s a meta-principle of mathematics that goes as follows. Any system of logic can be written in various ways that are mathematically equivalent but can have different real-world implications, for two reasons: first, because different formulations can be applied more directly in different settings or are just more understandable to different people; second, because different formulations more conveniently allow different generalizations.

Familiar examples from classical physics include Newton’s laws, conservation rules, the least-time principle, and Hamiltonian dynamics, which (if you’re careful with the details) represent different ways of saying the same thing, but which can be effective in different ways for problem-solving and understanding, and which lead to different generalizations. Similarly, quantum mechanics can be expressed using different formalisms: Schrödinger’s equation, path integrals, etc. I don’t remember my physics very well so I’m probably garbling some of the above details, but I think the general principle holds.

In Euclidean geometry, it’s been well known for a long, long time that the famous 5 postulates can be reformulated in different ways, which can make a difference when we want to generalize to non-Euclidean systems (for example, the sum of the angles of a triangle on a sphere will always be greater than 180 degrees). Similarly, in analysis, the Bolzano-Weierstrass theorem can instead be taken as a postulate if you want to do things that way.

With that history in mind, we realize that sometimes it can be helpful in understanding a problem to bring up a different formulation, even if mathematically equivalent to existing frameworks, because of the connections it makes to particular areas of applications, or because of how it conveniently generalizes.

**Causal inference**

And that brings us to the topic of causal inference, which already can be expressed in several mathematically equivalent ways using regression modeling, potential outcomes, or graphical models. See the book by Morgan and Winship for review and discussion of all these frameworks.

Here I want to talk about another way of looking at causal inference, this time using multilevel modeling.

The basic idea is as follows: *causal* comparisons are *within* a person, but we can often only directly make *descriptive* comparisons *between* people. (More generally, we could replace “people” by “causal units of analysis,” which could be schools or cell cultures or countries or whatever.)

I was thinking about this when reading the discussion thread on this post, and also when attending a presentation recently which featured the estimation of a causal effect without, it seemed to me, a clear sense of what was being estimated.

In the above-linked thread, people were saying that predictive inference is different from causal inference, I was saying that causal inference *is* predictive inference, and everyone seemed to be talking past each other.

It struck me that some of the confusion is arising from a lack of clarity, not about causal inference but about predictive inference.

There are lots of (mathematically equivalent) definitions of causal inference, but what exactly is “predictive inference”? It depends on what is being predicted.

Causal inference is, I believe, unambiguously about comparisons within people (or, more generally, within units), but prediction can be about anything.

When correspondents on the thread were saying that predictive and causal inference are different, they were thinking about predictions between people. For example, if you measure (x_i, y_i) on a bunch of people, i=1,…,n, and then you run a regression of y on x, and then you use this to predict y_i for new people, that’s predictive inference between people and does not directly address any causal question—not without some strong assumptions. Not just distributional assumptions about p(x,y) in the population, but assumptions about variation in y *within* a person. I think that’s what Judea Pearl is talking about when he says that causal inference goes beyond statistics. All the statistics in the world on p(x,y) in the population—data, model, theory, whatever—isn’t enough to answer questions about variation in y within a person. It’s as if statistics is living on a flat surface, and causal inference is the third dimension. No amount of movement on the floor, no matter how efficient, will take you into the air. So I think that’s what Pearl is getting at. Econometricians are making a similar point with their notation, by looking at various approaches for estimating causal effects using regression, and pointing out that these all require assumptions about potential outcomes.
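To make the between-person setup concrete, here is a minimal sketch in Python (all data and numbers are simulated for illustration): we fit a regression of y on x across people and use it to predict y for a new person. Nothing in this fit, by itself, tells us what would happen to y if we intervened on x within a person.

```python
# A sketch of "predictive inference between people": fit y on x across
# people, then predict y for a new person. The data are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(0, 1, n)
y = 2 + 0.5 * x + rng.normal(0, 1, n)   # between-person association

# Least-squares fit of y on x (slope b, intercept a):
b, a = np.polyfit(x, y, 1)

# Prediction for a new person with x = 1.5:
x_new = 1.5
y_pred = a + b * x_new
print(y_pred)  # close to 2 + 0.5 * 1.5 = 2.75

# This prediction is purely descriptive: it says nothing about what
# would happen to y if we *changed* x for a given person.
```

The fitted line summarizes p(y|x) in the population, which is exactly the kind of between-person quantity the post argues is insufficient, on its own, for causal claims.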

OK. Now suppose you have multiple observations on each person, or multiple potential observations. (For the purpose of probability modeling, it doesn’t matter if these additional measurements are observed or just latent or potentially observable.) Causal inference refers to particular statements about these potential observations, and causal questions about these multiple measurements per person can be addressed using statistical models.

To put it another way, we reach the “third dimension” by considering within-person comparisons. All the between-person analysis in the world won’t take you to that third dimension, not without some strong assumptions. (Even a clean randomized experiment can only tell you about average effects, not anything about individual effects unless you’re willing to assume something about the distribution of these within-person comparisons.)

This is one reason that Eric Loken and I say that, in a psychology experiment studying effects on individual people, you should do within-person comparisons as this is the only direct way to study causal effects. (As Rubin wrote, in causal inference, design trumps analysis.) If all you have are between-person comparisons, you have to make big assumptions to get into the air.

To slightly adapt Pearl’s framing, the distinction is not between “a statistical quantity” and “a causal quantity,” but rather between a between-person comparison and a within-person comparison. It just happens that in statistics we typically learn about causal inference for within-person treatments in the context of data that only allow between-person comparisons.

I don’t know that Pearl fully realizes the way in which, using multilevel modeling, we can combine within- and between-person inference and incorporate causal structures into statistical models. I think it’s been a mistake of the field of statistics (including my own textbooks) to present causal inference as if between-person comparisons from randomized experiments are a “gold standard” for within-person causal inference, and I can see why this would legitimately frustrate Pearl and others.

“If all you have are within-person comparisons, you have to make big assumptions to get into the air.” That should be “between,” right?

Fixed; thanks.

“If all you have are within-person comparisons, you have to make big assumptions to get into the air” should be “If all you have are between-person comparisons”?

Even more generally, a within-person comparison refers to a hyper-population formulated by delta functions. So conducting within-person analysis from between-person data is equivalent to learning from a non-representative sample, which makes it very much a statistical, and a multilevel-modeling, problem.

Yuling:

Sure, you can make within-person inferences from between-person comparisons; you just need to make assumptions.

I’m not really sure that calling it “within persons vs. between persons” really clarifies the issue. Take the classic hospitalization example: taking the difference in health status at two time points is a within-person comparison that can be used to infer that hospitals cause bad disease. Of course, it’s also a between-person comparison, since you’re comparing the difference within persons who were hospitalized to the difference within persons who were not hospitalized. What you would really mean here is that you want comparisons within persons in the sense of comparing someone who was hospitalized to that same exact person had they not been hospitalized. But I feel that the phrase “within persons” can easily be misinterpreted, and I’m not clear what it elucidates over the language of potential outcomes and manipulations.

Good points. One more reason to respect IASS (It Ain’t Simple Stupid) (which is interesting as an acronym that when pronounced as a word is somewhat relevant to what the acronym is describing).

I’m pretty sure within and between person comparisons are more complicated from a PO perspective. But I think this is more clear assuming a decision theory point of view (Phil Dawid’s). The good thing is DAGs are compatible with both.

I’m not really sure I understand how multi-level modeling allows for within person comparison, would you mind expanding on how that’d be the case?

I’m not sure what Andrew would say, but I think the key is counterfactuals. Sometimes we make the assumption that what happened to one person acts as information about what would have happened to another. The key is matching important causal factors and having a realistic view of how much uncertainty remains after this matching.

One (perhaps overly) simple example of this would be a straightforward individual growth model extended to as many levels as needed. The outcome is modeled across time at level one, clustered by individual at level two, clustered by whatever other meaningful levels one might need.

Multilevel models are exciting to me because they allow you to explore within person (or whatever your cluster variable(s) represents) and between person associations with the outcome for those predictors that have both within and between person variation. Key to this insight is that the multilevel model partitions the outcome into within and between person variance components and we can do the very same thing with our predictors.

Building from Shane’s example, I find this easiest to think about when the cluster is person and each person is observed on multiple occasions. For those variables that are measured at each occasion, we calculate a variable that is the mean across time for each person and include that mean variable in our multilevel model along with its time-varying version. The time-varying version represents the within-person association between that variable and the time-varying outcome, and the mean version represents the between-person association between the outcome and the individual’s average value on the predictor. Technically, the coefficient on the person mean in this setup tests the difference between the within- and between-person associations, which is sometimes called a contextual effect in the multilevel modeling literature.

And this is all before allowing the time-varying slope to vary across persons! That adds a powerful dimension to the modeling, because you relax the assumption that the time-varying within-person association is the same across persons.
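As a rough illustration of the decomposition described in the last two comments, here is a sketch with simulated data. For simplicity it uses plain least squares rather than a fitted multilevel model (so the slope estimates are consistent but the standard errors are not the point): the time-varying predictor is split into a person mean and a deviation from that mean, and the within- and between-person slopes are estimated separately.

```python
# Person-mean centering: split a time-varying predictor x into a
# between-person component (the person mean) and a within-person
# component (deviations from that mean). All data are simulated.
import numpy as np

rng = np.random.default_rng(2)
n_people, n_times = 500, 10

# Person-level component of x, plus occasion-level variation:
x_between = rng.normal(0, 1, (n_people, 1))
x = x_between + rng.normal(0, 1, (n_people, n_times))

# Outcome with different within- and between-person slopes:
beta_within, beta_between = 0.3, 1.2
alpha = rng.normal(0, 0.5, (n_people, 1))      # person intercepts
y = (alpha + beta_between * x_between
     + beta_within * (x - x_between)
     + rng.normal(0, 1, (n_people, n_times)))

# Decompose x into observed person means and deviations:
x_mean = x.mean(axis=1, keepdims=True)         # between component
x_dev = x - x_mean                             # within component

# Regress y on [1, x_dev, x_mean]:
X = np.column_stack([np.ones(x.size), x_dev.ravel(),
                     np.broadcast_to(x_mean, x.shape).ravel()])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print(coef[1])  # ~0.3: within-person slope
print(coef[2])  # between-person slope, near 1.2 but slightly attenuated,
                # because the observed person mean is a noisy measure of
                # the true person-level component
```

A full multilevel fit (random intercepts, and random slopes as the comment above suggests) would use the same decomposition but model the person-level variation explicitly rather than leaving it in the residual.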

This is a deep topic, since it ultimately comes down to what we mean by causation. As a sort of pragmatist, I think a model is causal if it does two things, satisfy our demand for a story (process) of how a set of outcomes were arrived at and provide a reliable basis for choosing between models in situations in which more than one might be plausible. This second is about the generality of prediction. If I can be predictively successful only in situations virtually identical to the ones I’ve already analyzed and can’t say whether or how my model should change as the situations vary, my model is not explanatory in this sense.

A classic case is the epidemiology and toxicology of cigarette smoking. The epi data were powerful long before the labs kicked in. The problem was that, without biologically based mechanisms, one of the criteria for belief, the presence of a satisfying process account (or in this case multiple such accounts), wasn’t satisfied, and the epi inferences could never be extended beyond the precise situations (populations, distributions of covariates, etc.) from which they were drawn.

If I understand you correctly, Andrew, you’re saying it comes down to inference at the level of the individual unit of analysis: how would varying the amount and type of smoking affect the same individual, holding all else constant? But I don’t see how this serves the purpose of predictive generality, since causal understanding should extend across different individuals (including future individuals) and the myriad situations any individual might find themselves in. It also seems to reject the desire for a describable causal process, yes?

It seems to me, however, that causation is inescapably about the process by which outcomes eventuate. In that sense they are meta-statistical. Statistical analysis can help undermine or foster belief in the presence and efficacy of such processes, but their formulation and theoretical validation come from substantive knowledge about the kind of problem at hand: biology, chemistry, social science, etc. Also, from a statistical point of view, a common component of establishing a process is direct description, where measurement alone is sufficient. For instance, case study research can play a powerful role in moving from associative to causal analysis if the process is visible (actually or metaphorically) at the case level. Of course, statistical work is required to determine the generality of the process across cases.

And then there’s the problem of network causation. Causation sometimes takes the form of “A causes B”, but it often appears as a process that operates on constellations of factors. Technically, each causal factor operates conditionally on the operation of the others, but the system is patterned in ways that enable it to be understood at a lower level of complexity. If you accept some version of network causation of this sort, I think you want the concept of causation to be process-based, not unit-based.

I’m not well schooled in this area and have come to it only out of frustration with the approach of most of my discipline (economics) to causal analysis. That is, my views have developed in reaction to certain forms of (what I regard as) error and not from an engagement with the literature. Still, I’d like to know where this approach goes wrong, if it does.

Post below by me referencing the agent based modeling book was supposed to be reply to Peter here… this blog is hard to post on by phone sometimes

Should we be thinking about causal models? Or about causal processes that we can represent with causal models?

Are those different? Are they the same? Does it matter?

Just this morning before this post appeared, I was reading a book “Agent-Based and Individual-Based Modeling: A Practical Introduction” by Steven F. Railsback, Volker Grimm.

In chapter 17 they discuss pattern based modeling, basically the idea that a model should reproduce multiple aspects of the system being studied. For a model to be good, it shouldn’t just predict what happens in one type of case as measured by one type of measurement… it should also recapitulate many other related measurements.

This is the opposite of the “causal identification plus a regression with statistically significant coefficient” paradigm common enough today.

I do think it’s possible to make progress without such process based models, but they are the ideal.

Of course they are ideal… if assumptions are met as Ellie Murray explains here:

https://www.ncbi.nlm.nih.gov/pubmed/28838064

Thanks for the link. I think I pulled some things out of the fog in my brain caused by not really knowing the jargon in Epidemiology. If I were going to summarize what I think they were saying it’s that g-formulas, whatever they are, do not cause biases in extrapolation because they don’t claim to be able to extrapolate, whereas ABMs are used for extrapolation, but they won’t extrapolate well when they don’t correctly model the altered conditions where they are asked to extrapolate.

This seems not particularly controversial but is stated in a lot of technical language. Models that don’t work well in alternative contexts shouldn’t be used in those contexts.

I don’t think there exists a magical model that can be fit to data in condition A and then magically work in condition B where there are other additional changing and unmodeled factors that affect the outcomes.

To me, the importance of ABM is not so much its ability to produce precise and accurate estimates, as its ability to provide insight into which processes are important for the outcomes. When we use them in a new environment and find that they don’t work as well, it is an invitation to find out what process is missing or different.

Daniel,

Thank you for your thoughtful and insightful comments. I am already reviewing my notes.

I have not read this post. However, anyone who reads this post should read Yongnam Kim’s piece on graphical views of “fixed” and “random” effects.

Causal Graphical Views of Fixed Effects and Random Effects Models

I am not sure I am convinced, and I think my view is similar to “somebody”‘s above.

In the counterfactual model, the individual causal effect is the difference in the outcome between a person (or other unit) who was treated and that same person, at the same point in time, had they not been treated. In a between-person comparison, you give up the identity of the people. In a within-person comparison, you give up the identity of the point in time.

(By “point in time” I do not mean that treatment and measurement literally have to occur at the same moment. In practice, this will often not be the case, and there will be the assumption that the small differences in this respect will not matter. It should probably be roughly at the same time, though, and a crucial difference is whether the person has received the treatment before.)

So, saying that individual causal effects refer to within-person comparisons is true, but they refer to within-person comparisons at the same time — one person compared to their hypothetical clone. Neither a within- nor a between-person comparison gives you that. The question then becomes: Which deviation from the counterfactual ideal causes a bigger deviation of the measured from the causal effect? I don’t see how this can be decided on a general basis without reference to the specific cases at hand. For example, “poisoning-the-well” effects seem likely to be large if we are talking about taking IQ tests repeatedly, but small if we are talking about temperature effects on aggressive behavior.

Lemmus:

I think we’re in agreement. I agree that, whatever data are gathered, assumptions need to be made in order to make inference about the individual causal effect (the within-person comparison) of interest. With between-person data, you need to make assumptions about what would happen in a within-person comparison. With a within-person comparison, you still need to make assumptions about the counterfactual. My point is that assumptions are needed either way, which is counter to the usual presentations of causal inference (including in my own books) which consider controlled experiments to be a gold standard, glossing over the big big step from the estimation of average causal effects to inference for individual effects.

Andrew said,

” … assumptions are needed either way, which is counter to the usual presentations of causal inference (including in my own books) which consider controlled experiments to be a gold standard, glossing over the big big step from the estimation of average causal effects to inference for individual effects.”

+1

LemmusLemmus: well said.

> predictive inference between people […] does not directly address any causal question—not without some strong assumptions. Not just distributional assumptions about p(x,y) in the population, but assumptions about variation in y within a person.

> Now suppose you have multiple observations on each person, or multiple potential observations. […] Causal inference refers to particular statements about these potential observations, and causal questions about these multiple measurements per person can be addressed using statistical models.

Don’t you still need those strong “assumptions about variation in y within a person”?

Carlos:

Yes, but they’re different assumptions. What I’m pushing against is the attitude that between-person comparisons give assumption-free or minimal-assumption causal inference.

I’m afraid it’s not very clear what your point is, as your replies seem to weaken your original claims. Maybe some of us misunderstood them to be stronger than they were.

If causal questions “can be addressed using statistical models” in a within-person setting then surely it’s also possible to do so in a between-people setting with additional assumptions?

> the distinction is not between “a statistical quantity” and “a causal quantity,” but rather between a between-person comparison and a within-person comparison

I’d say you can still have “statistical quantities” and “causal quantities” in a within-person comparison. Again, maybe we all agree but your post seemed to suggest otherwise.

Carlos:

From my original post, here are my points:

The bit in the discussion regarding within-person assumptions: that relates to the parenthetical in the last bit I just quoted. Modeling is needed to make statements about additional measurements that are just latent or potentially observable.

If you mean that “causal inference is a special case of within-person predictive inference” I don’t necessarily disagree… In the absence of a definition of “predictive inference” one may imagine that it could include causal considerations.

Carlos:

By “predictive inference,” I mean inference about an observed or potentially observed quantity (as distinguished from inference about a model parameter which is not directly observable). Statisticians often like to think in terms of predictive quantities because they can be defined without reference to a particular model (even though an underlying model is needed to make the inference).

Ok, predictive inference using a causal model is causal inference. I guess one may say as well that causal models are a particular case of statistical models if the latter are defined broadly enough.

Carlos:

The idea of causal inferences being predictive was a key insight of Rubin, to define the individual causal effect in terms of observable potential outcomes, which can then be aggregated to define average causal effects, if there is interest in these. From this perspective, predictive inference is fundamental, and inference about model parameters is just a means to that end.