Eric Brown asks:

How do Stan and its Bayesian modeling relate to structural equation modeling? Do you know of a resource that attempts to explain the concepts behind SEM in terms of Stan nomenclature and concepts?

Some research that I’ve looked into uses SEM to evaluate latent factors underlying multiple measurements with associated errors, or uses SEM to relate different measurements of the same physical property. I have a hard time wrapping my head around that analysis and would prefer to use what I know (Stan) to investigate the same issues.

Any suggestions?

My reply:

There are two aspects to a structural equation model: the statistical model and the causal interpretation.

The statistical model is a big multivariate distribution, and there should be no problem fitting it in Stan. I haven’t fit such models myself, but my guess is that if you post a query on the Stan Discourse forum asking whether anyone has fit a structural equation model in Stan, you’ll get some responses.
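To make “a big multivariate distribution” concrete, here is a minimal Python sketch of the simplest case, a one-factor measurement model of the kind Eric describes: each observed measurement is y_j = λ_j·η + e_j, with an unobserved factor η ~ N(0, φ) and independent errors e_j ~ N(0, θ_j). Integrating out η gives a multivariate normal for the measurements with covariance Σ = λλᵀφ + diag(θ). The function names and the loadings and error variances are made up for illustration; the code only checks that simulated data reproduce the implied covariance. It is not how blavaan or any other package fits the model.

```python
import random

def implied_covariance(loadings, error_vars, factor_var=1.0):
    """Covariance of y when y_j = loadings[j] * eta + e_j, Var(eta) = factor_var,
    and the errors e_j are independent with Var(e_j) = error_vars[j]."""
    p = len(loadings)
    return [[loadings[i] * loadings[j] * factor_var + (error_vars[i] if i == j else 0.0)
             for j in range(p)]
            for i in range(p)]

def simulate(loadings, error_vars, n, factor_var=1.0, seed=1):
    """Draw n rows of measurements, generating the latent factor and then discarding it."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        eta = rng.gauss(0.0, factor_var ** 0.5)  # latent factor, never observed
        rows.append([lam * eta + rng.gauss(0.0, ev ** 0.5)
                     for lam, ev in zip(loadings, error_vars)])
    return rows

def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]

if __name__ == "__main__":
    loadings, error_vars = [1.0, 0.8, 0.6], [0.5, 0.5, 0.5]  # made-up values
    implied = implied_covariance(loadings, error_vars)
    observed = sample_covariance(simulate(loadings, error_vars, n=20000))
    for i in range(3):
        print([round(v, 2) for v in implied[i]], [round(v, 2) for v in observed[i]])
```

In Stan you could write the same model either with η as an explicit parameter for each observation or with a multi_normal likelihood on Σ. Either way, the fit is driven by this distribution of the observed data alone.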

The causal interpretation is a separate issue from the fitted model. I think the usual causal interpretations of structural equation models are typically over-ambitious: without making lots of assumptions, there’s a limit to how much causal knowledge you can get from observational data, and traditional structural equation modeling does not make a lot of formal assumptions. Fitting a structural equation model in Stan won’t solve this problem, because even if you put strong priors on the parameters in the model, this doesn’t give you priors on the causal inferences. From a statistical perspective, causal inference corresponds to predictions about potential outcomes, and structural equation models, as traditionally written, just model the data; they don’t model potential outcomes. Some of these concerns are discussed in the causal inference chapters of my book with Jennifer Hill. We don’t talk about structural equation models there, but our general discussion of causal inference should be relevant to understanding these issues.

tl;dr: I think Stan’s an excellent way to fit a structural equation model if you consider it as a probability model: a math problem of fitting a model to data. To go causal (which is the usual purpose of structural equation modeling), you might not want to fit a structural equation model at all!

There’s one other thing, which is that these models can be so big that people often try to simplify the model, or to estimate some underlying structure, by using rules such as statistical significance or Bayes factors to remove links from the model. I generally don’t like this practice of trying to estimate causal structure from data. I discuss this a bit in my 2011 paper, Causality and Statistical Learning:

The R packages blavaan and ctsem both use Stan to estimate different types of structural equation models.

I posted over on Stan Discourse about this. Here is a nice introduction to Bayesian SEM in brms: https://www.imachordata.com/bayesian-sem-with-brms/ You can dump out the Stan model code from there to play with.

Conceptually, a non-frequentist probability model appears to be the equivalent of a “subjective” causal model rather than a non-causal model, because a probability is assigned based on features or characteristics rather than by random sampling within a category. Predictive models always rely on causality; they simply proceed as if it is not important to understand it. Interpretations are always made, and the problem arises when the interpretations are in fact causal despite the pretense that they are not.

To see that they’re not equivalent, consider regressing lung cancer on smoking versus the opposite. It’s just as easy (if not easier) to predict smoking status from cancer as it is to predict cancer from smoking. Clearly the causation goes from smoking to cancer, not the reverse.
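Here is a small Python simulation of that point, with made-up probabilities (30% smokers, cancer risk 15% for smokers and 2% for nonsmokers) and a hypothetical generative story in which smoking causes cancer and not the reverse. The correlation is symmetric, and a regression in either direction gives a clearly nonzero slope, so the fitted regressions alone don’t tell you which way the arrow points.

```python
import math
import random

def simulate(n, seed=0):
    """Hypothetical data in which smoking causes cancer (and not the reverse)."""
    rng = random.Random(seed)
    smoking, cancer = [], []
    for _ in range(n):
        s = 1 if rng.random() < 0.30 else 0
        c = 1 if rng.random() < (0.15 if s else 0.02) else 0  # risk depends on smoking
        smoking.append(s)
        cancer.append(c)
    return smoking, cancer

def _centered_sums(y, x):
    """Cross-product and sums of squares about the means."""
    n = len(y)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy, sxx, syy

def slope(y, x):
    """Least-squares slope from regressing y on x (with an intercept)."""
    sxy, sxx, _ = _centered_sums(y, x)
    return sxy / sxx

def correlation(y, x):
    """Pearson correlation; symmetric in its two arguments."""
    sxy, sxx, syy = _centered_sums(y, x)
    return sxy / math.sqrt(sxx * syy)

if __name__ == "__main__":
    smoking, cancer = simulate(50000)
    print("cancer on smoking:", round(slope(cancer, smoking), 3))
    print("smoking on cancer:", round(slope(smoking, cancer), 3))
    print("correlation:", round(correlation(cancer, smoking), 3))
```

With these made-up numbers, the slope of cancer on smoking approaches 0.15 − 0.02 = 0.13 in large samples, while the slope of smoking on cancer approaches P(smoker | cancer) − P(smoker | no cancer) ≈ 0.76 − 0.27 ≈ 0.49. The two slopes always multiply to r², whichever variable is the cause.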

Good point.

I’ve often used the height vs. weight example to distinguish predictive from causal inference. What’s left is the question of what can or cannot be inferred, given the extent of our knowledge about the causal process.

My impression is that this is true as a matter of notation but not of practice: the SEM tradition does commonly give these models a causal interpretation.

I think a boon of recent work in causal modelling is making it notationally easy to say whether

Y = a·T + b·X + ε

is meant to be interpreted as

Y(t,x) = a·t + b·x + ε

or, more conservatively, as something like

E(Y(t) | x) = a·t + b·x

or

E(Y | t, x) = a·t + b·x.
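A minimal Python simulation can show why the distinction matters. Assume, hypothetically, a structural equation Y(t, x) = a·t + b·x + c·u + noise, where u is an unmeasured confounder that also drives T in the observational data; the values a = 1, b = 0.5, c = 2 are arbitrary. Least squares estimates the coefficients of E(Y | t, x). When T is randomized, the fitted coefficient on t matches the structural a. When T depends on u, it doesn’t: here it converges to a + c·Cov(U, T)/Var(T) = 1 + 2·0.5 = 2.

```python
import random

def simulate(n, randomized, a=1.0, b=0.5, c=2.0, seed=0):
    """Draw (y, t, x) from the hypothetical structural model Y(t, x) = a*t + b*x + c*u + noise."""
    rng = random.Random(seed)
    ys, ts, xs = [], [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder
        x = rng.gauss(0.0, 1.0)  # measured covariate, independent of u
        # Randomized: t ignores u. Observational: t is partly driven by u.
        t = rng.gauss(0.0, 1.0) if randomized else u + rng.gauss(0.0, 1.0)
        ys.append(a * t + b * x + c * u + rng.gauss(0.0, 1.0))
        ts.append(t)
        xs.append(x)
    return ys, ts, xs

def ols_two(y, x1, x2):
    """Least-squares slopes (b1, b2) for y ~ 1 + x1 + x2, via centered normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    yc = [v - my for v in y]
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    s11 = sum(v * v for v in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(p * q for p, q in zip(c1, c2))
    s1y = sum(p * q for p, q in zip(c1, yc))
    s2y = sum(p * q for p, q in zip(c2, yc))
    det = s11 * s22 - s12 * s12
    # Cramer's rule on [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y]
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

if __name__ == "__main__":
    for randomized in (True, False):
        a_hat, b_hat = ols_two(*simulate(20000, randomized))
        print(f"randomized={randomized}: coef on t = {a_hat:.2f}, coef on x = {b_hat:.2f}")
```

In the observational run the fitted equation is a perfectly good description of E(Y | t, x); it is only wrong if you read it as Y(t, x).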

You’re right that SEM is problematic with observational data, which is why you’re only supposed to use it with experiments.