How to reconcile that I hate structural equation models, but I love measurement error models and multilevel regressions, even though these are special cases of structural equation models?

Andy Dorsey writes:

I’m a graduate student in psychology. I’m trying to figure out what seems to me to be a paradox: One issue you’ve talked about in the past is how you don’t like structural equation modeling (e.g., your blog post here). However, you have also talked about the problems with noisy measures and measurement error (e.g., your papers here and here).

Here’s my confusion: Isn’t the whole point of structural equation modeling to have a measurement model that accounts for measurement error? So isn’t structural equation modeling actually addressing the measurement problem you’ve lamented?

The bottom line is that I really want to address measurement error (via a measurement model) because I’m convinced that it will improve my statistical inferences. I just don’t know how to do that if structural equation modeling is a bad idea.

My reply:

I do like latent variables. Indeed, when we work with models that don’t have latent variables, we can interpret these as measurement-error models where the errors have zero variance.
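To make the zero-variance point concrete, here is a minimal simulation sketch. It is an illustration, not something from the original exchange. It assumes a single latent predictor `x` with variance 1, a noisy measurement `w = x + u`, and a measurement-error standard deviation `sigma_u` that is known; all names and parameter values are hypothetical. With `sigma_u = 0` the ordinary regression and the measurement-error model coincide; with `sigma_u > 0` the naive slope is attenuated by the reliability ratio.

```python
import numpy as np


def ols_slope(w, y):
    """Least-squares slope of y on w (intercept included implicitly)."""
    return np.cov(w, y, ddof=1)[0, 1] / np.var(w, ddof=1)


def simulate_slopes(sigma_u, n=200_000, b=2.0, seed=0):
    """Return (naive slope, reliability-corrected slope) for one simulated data set.

    Assumptions (hypothetical, for illustration only):
      - latent predictor x ~ Normal(0, 1)
      - outcome y = 1 + b * x + Normal(0, 1) noise
      - observed measurement w = x + Normal(0, sigma_u) noise
      - sigma_u is treated as known when correcting the slope
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n)             # latent predictor
    y = 1.0 + b * x + rng.normal(0.0, 1.0, n)
    w = x + rng.normal(0.0, sigma_u, n)     # noisy measurement of x

    naive = ols_slope(w, y)
    # Reliability = var(x) / (var(x) + sigma_u^2); var(x) is 1 by construction.
    reliability = 1.0 / (1.0 + sigma_u ** 2)
    corrected = naive / reliability
    return naive, corrected


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0):
        naive, corrected = simulate_slopes(s)
        print(f"sigma_u={s}: naive={naive:.3f}, corrected={corrected:.3f}")
```

The correction here is the classical errors-in-variables adjustment with a known error variance. In real applications that variance has to be estimated or modeled, which is where the measurement model earns its keep.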

And I have no problem with structural equation modeling in the general sense of modeling observed data conditional on an underlying structure.

My problem with structural equation modeling as it is used in social science is that the connections between the latent variables are just too open-ended. Consider the example on the second page of this article.

So, yes, I like measurement-error models and multilevel regressions, and mathematically these are particular examples of structural equation models. But I think that when researchers talk about structural equation models, they’re usually talking about big multivariate models that purport to untangle all sorts of direct and indirect effects from data alone, and I don’t think this is possible. For further discussion of these issues, see Sections 19.7 and B.9 of Regression and Other Stories.

One other thing: I think they should be called “structural models” or “stochastic structural models.” The word “equation” in the name doesn’t seem quite right to me, because the whole point of these models is that they’re not equating the measurement with the structure. The models allow error, so I don’t think of them as equations.

P.S. Zad’s cat, above, is dreaming of latent variables.

24 thoughts on “How to reconcile that I hate structural equation models, but I love measurement error models and multilevel regressions, even though these are special cases of structural equation models?”

  1. For the life of me, I have not been able to find a succinct definition of what a structural equation model is. Is it any model that isn’t y = X beta?

    • Like, for example, Angrist and Pischke write

      Simultaneous Equations Models, an econometric framework in which causal relationships between variables are described by several equations

      Wikipedia has:

      Although each technique in the SEM family is different, the following aspects are common to many SEM methods, as it can be summarized as a 4E framework by many SEM scholars like Alex Liu, that is 1) Equaltion (model or equation specification), 2) Estimation of free parameters, 3) Evaluation of models and model fit, 4) Explanation and communication, as well as execution of results.

      I’m having serious trouble seeing how these descriptions don’t apply to every parametric model I’ve ever seen. Like, wouldn’t GLMs be a special case where the number of equations goes to infinity? Hidden Markov models would be an SEM where each equation is linear, so it can be succinctly represented by a transition matrix. What parametric model isn’t an SEM if interpreted causally? The only distinctive feature I can see is those DAG-like path diagrams, but even those are, uh, like DAGs.

      • I think it is more helpful to say that those models “can be expressed/estimated as” an SEM (or Bayesian model, neural network, etc.). It is technically true that the models you list are specific configurations of a more general SEM, but better in my mind to be practical in highlighting that different methods come with different practices associated with them.

        SEMs break down a model into “measurement” and “structural” components. The measurement component describes how latent variables are related to indicator variables. The structural component describes how latent variables and un-modelled observed variables relate to one another.

        One overlooked aspect of SEM is that it is only concerned with fitting the covariance matrix, not individual observations. All of the numerous SEM fit statistics estimate the difference between the sample and model-implied covariance matrices. And, under the hood, SEM only estimates variances and covariances (and intercepts, if you want them). The regression coefficients are then derived from those terms.
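The covariance-fitting idea in the comment above can be sketched in a few lines. This is a minimal illustration, not any particular package's implementation. It assumes a one-factor measurement model, for which the model-implied covariance is `factor_var * lam @ lam.T + diag(unique_vars)`, and it uses the standard maximum-likelihood discrepancy between a sample covariance `S` and the implied `Sigma`. Function names and numbers are hypothetical.

```python
import numpy as np


def implied_cov(loadings, factor_var, unique_vars):
    """Model-implied covariance of the indicators in a one-factor model."""
    lam = np.asarray(loadings, dtype=float).reshape(-1, 1)
    return factor_var * (lam @ lam.T) + np.diag(np.asarray(unique_vars, dtype=float))


def ml_discrepancy(S, Sigma):
    """ML fit function: log|Sigma| + tr(S Sigma^-1) - log|S| - p.

    It is zero when Sigma equals S and positive otherwise.
    """
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_s - p


if __name__ == "__main__":
    # Hypothetical "sample" covariance generated from known parameters.
    S = implied_cov([1.0, 0.8, 0.6], 1.0, [0.5, 0.5, 0.5])
    # A candidate model with the wrong loadings fits worse (larger discrepancy).
    candidate = implied_cov([1.0, 0.5, 0.5], 1.0, [0.5, 0.5, 0.5])
    print(ml_discrepancy(S, S), ml_discrepancy(S, candidate))
```

Note that only `S` enters the fit function; the individual observations never appear, which is the point the comment is making.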

      • I believe:
        A SEM (aka path analysis by Sewall Wright and a SCM by Pearl) is a Bayesian Network whose internal (endogenous) nodes are deterministic (i.e., their CPT is diagonal) and its external (exogenous) nodes are non-deterministic and usually root nodes.

        I’m not sure but I think Andrew was saying that his main beef with SEM is that he doesn’t like the usual assumption that the external nodes are root nodes (i.e., they may be correlated in real life).

        In my book Bayesuvius, I call SEM by yet another name. I call them DEN (Deterministic with External Noise) models if the SEM’s internal nodes are related non-linearly. I call them LDEN if the internal nodes are related Linearly.
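As a rough sketch of the "deterministic internal nodes, external noise" description above, the following simulates a three-variable linear model in which each internal node is a deterministic function of its parents plus its own external noise term. The structure (`x -> m -> y` plus a direct `x -> y` path), the coefficients `a`, `b`, `c`, and the correlation `rho` between two of the external noise terms are all hypothetical choices for illustration. Setting `rho` to a nonzero value shows what happens when the external nodes are not independent root nodes.

```python
import numpy as np


def simulate_linear_den(n=100_000, a=0.5, b=0.7, c=0.2, rho=0.0, seed=1):
    """Simulate x -> m -> y with a direct x -> y path.

    External noise terms u_x, u_m, u_y each have unit variance.
    rho is the correlation between u_x and u_y (0 means independent roots).
    """
    rng = np.random.default_rng(seed)
    u_x = rng.normal(size=n)
    u_m = rng.normal(size=n)
    # Build u_y with unit variance and correlation rho with u_x.
    u_y = rho * u_x + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)

    # Internal nodes: deterministic given parents and their own noise.
    x = u_x
    m = a * x + u_m
    y = b * m + c * x + u_y
    return x, m, y


def slope(pred, out):
    """Least-squares slope of out on pred."""
    return np.cov(pred, out, ddof=1)[0, 1] / np.var(pred, ddof=1)


if __name__ == "__main__":
    x, m, y = simulate_linear_den(rho=0.0)
    print("independent noise, slope of y on x:", slope(x, y))
    x, m, y = simulate_linear_den(rho=0.3)
    print("correlated noise,  slope of y on x:", slope(x, y))
```

With independent noise, the slope of `y` on `x` estimates the total effect `a * b + c`. With correlated external noise, the same regression picks up an extra `rho`, and nothing in the data alone signals that this has happened.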

  2. At ResearchGate (see below) sits a conference paper of mine which discusses the problems of effect sizes in structural equation modeling (SEM) and why we should be concerned about what happens in SEM. My main concern is the potential overcorrection of the manifest constructs we are interested in via communality-based estimates of reliability.
    Best regards

  3. My main issue with how SEM typically deals with measurement error (thinking from a social science perspective) is the usual zero-mean, constant-variance, uncorrelated-with-other-covariates assumptions about how the measurement process affects our readings of the intended variable. My physicist friends swear that white-noise error is good enough for classical physics because, after accounting for it, experimental data usually fit the proposed theoretical models very well. Without proper theoretical models in the social sciences, how are we supposed to know if this model for measurement error is enough, or even a small step in the right direction? Should we really be comfortable with these assumptions for our weird variables, which are usually bounded and full of variance and correlation artifacts due to their discrete measurement level?

    And then, there’s the usual common factor structure of latent variables, i.e., that a single latent variable or a handful of latent variables causally affect the indicators or observed variables. The most usual case is that the desired number of latent variables is in gross disagreement with the data, but we pretend it’s not, or we employ ad hoc methods to empirically define the ‘best’ number of latent variables.

    There’s a really nice paper by Rhemtulla and Borsboom that addresses how misspecification in SEM can lead to heavily biased estimates, making any merit in ‘dealing with measurement error’ nothing more than wishful thinking.

    In my opinion, if the OP really wants to deal with measurement error, he should ditch the default classical test theory measurement model embedded in SEM and think really hard about how the measurement process and the variable of interest are related; if he is dealing with psychological constructs, the literature on conjoint measurement seems like a good place to start, as hard as it is to find interesting applied examples of it.

  4. Structural equation modeling is the modeling analogue of a specialized MCMC sampler: sometimes we need to write our own complicated computer programs to solve problems specific to one model and one application, and these are often more efficient on big, tangled, discrete models than a generic Stan program.

  5. In my tutoring experience, I’ve seen people put a lot of stock into DAGs. There’s something about seeing a bunch of boxes and arrows adorned with asterisks that makes the underlying statistical model seem more *real*: it “immanentizes the eschaton,” as it were. I remember reading an article on how putting colorful fMRI brain figures into papers makes people trust the research in the papers more. I wonder if there’s perhaps a similar effect at play here. It should be pretty easy to test it out experimentally. Unfortunately I’m not at a place where I can run any experiments right now, so if someone wants to pick this up please go ahead!

  6. “they’re usually talking about big multivariate models that purport to untangle all sorts of direct and indirect effects from data alone, and I don’t think this is possible”

    Agree!! The ultimate statistical free lunch! A quantitative do-over; an algebraic Mulligan, a statistical upgrade from observational data to “as-if-by experiment” causal inferences. In the social and behavioral sciences, at least, there is no combination of colliders, mediators, disturbance terms, and backdoor/frontdoor adjustments that can alter the stubborn reality that structural equation modeling is merely slicing spam with a laser beam.

    When Clark Hull introduced the first automated correlation-calculating machine in 1925, he offered the next best thing to a free lunch: a convenient lunch!

    “while the machine is doing its various calculations … the attention of the operator is not required by the machine at all … the writer on several occasions has started the machine on a long and difficult column of computations, locked the laboratory and gone out to lunch … Upon his return the computation was found completed … the machine having stopped itself at the conclusion of the operation” (Hull, 1925, p. 530).

    Hull, C. L. (1925). An automatic correlation calculating machine. Journal of the American Statistical Association, 20, 522-531.

  7. Hi Andrew,

    I was struck by what you said here:

    “they’re usually talking about big multivariate models that purport to untangle all sorts of direct and indirect effects from data alone, and I don’t think this is possible”

    I’m wondering if social scientists are trying to untangle these effects by data alone or whether they’re using “structural equation models” to encode causal assumptions which, in given cases, may or may not be plausible. So it’s data in conjunction with causal assumptions instead of data alone.

    Of course, causal assumptions, especially when one is working with observational data, are hard to directly assess; so in a sense, ultimately, there is an attempt to do something based on data alone. All I’m pointing out is that causal notions, rightly or wrongly, are often brought to bear too. Is this what you were getting at or am I saying something different from what you intended?

    • I think Andrew’s caveat “from data alone” is key. I think Andrew is saying that he believes, just like Judea Pearl believes, that the dataset alone does not fully determine the causal model. But I don’t think anyone that uses causal models believes that it does. So that part of Andrew’s criticism doesn’t make too much sense to me either.

      • Robert:

        No generalizations, causal or otherwise, can ever be made from data alone! When I said “from data alone,” I’m also assuming that people are using some model or another. For example, forget about causality. Suppose you’re just doing regression, estimating E(y|x). It’s generally accepted that you can estimate this function from data alone, along with generic assumptions such as continuity and the assumption that the observed data are a representative sample of the population of interest.

        There are researchers who similarly believe that you can untangle causal structure from data alone, in the same way that I’m saying you can estimate E(y|x) from data alone, that is, along with some generic assumptions. See for example this statement from the statistician Cosma Shalizi. I’ve collaborated with Shalizi, but not on this! I disagree with his claim that it makes sense to try to uncover causal structure from data alone and generic assumptions.

        • Andrew, I agree with most of what you just said, but would like to add my 2 cents.

          I think a lot of the deficiencies people perceive in SEMs will magically disappear if they replace SEMs by fully fledged Bayesian Networks/DAGs

          (Henceforth, I will be using the terms DAG and Bayesian Network (B net) interchangeably.)

          A SEM is just a special case of a B net. A very useful special case, but still a special case. A SEM is a B net with deterministic internal nodes and external nodes that are non-deterministic root nodes. Replace a SEM by a general B net, and all your worries disappear. No need to use voodoo external noise. No need to use deterministic internal variables. No need to linearize if you don’t want to. No need to use linear regression. For example, in my book Bayesuvius, I explain all of Donald Rubin’s theory of Potential Outcomes using only fully fledged B nets.

        • It sounds like you’re referring to the causal discovery or causal search literature. You mentioned Shalizi, but Daniel Malinsky, Clark Glymour, and others do this kind of work too. There’s even a chapter on it in Pearl’s book Causality. I don’t know enough about this stuff to have much of an opinion on whether it “makes sense to try to uncover causal structure….” This is something I want to learn more about.

  8. Andrew, et al.
    A colleague called my attention to your discussion on SEM, so I am happy to share my recently revised chapter for the new edition of the Handbook of Structural Equation Modeling.
    I believe the causal perspective presented in this chapter should resolve some of the foundational issues discussed above.

    • Judea:

      Thanks for sharing. I just noticed one thing. In your linked article, you write of “combining results from many experimental and observational studies, each conducted on a different population and under a different set of conditions, so as to synthesize an aggregate measure of effect size in yet another environment, different than the rest. This fusion problem has received enormous attention in the health and social sciences, where it is typically handled inadequately by a statistical method called ‘meta analysis’ which ‘averages out’ differences instead of rectifying them.”

      It’s hard for me to know exactly what you’re referring to regarding meta analysis because you didn’t give a citation there, but I don’t think it’s accurate for you to describe meta analysis as averaging out differences. Modern meta-analysis is all about variation. Rubin wrote about this in a 1989 book chapter that I can’t find on the internet, so let me link to a journal article from 1992 where he writes about meta-analysis as estimating an “effect-size surface.” Along with this is a model of variation of the effect size conditional on individual and study-level predictors. Lauren Kennedy and I applied this idea recently in the context of psychology experiments; see this article.

      To say all this is not to criticize your methods; as you note in your paper, there can be many ways of expressing similar statistical ideas, and different approaches offer different advantages, depending on context.
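To illustrate the difference between averaging differences away and modeling them, here is a minimal sketch of a random-effects meta-analysis. It uses the DerSimonian-Laird moment estimator, chosen only because it is short; it is not the hierarchical "effect-size surface" model referred to above, and it has no study-level predictors. The function name and the numbers in the usage example are made up for illustration.

```python
import numpy as np


def dersimonian_laird(effects, std_errors):
    """Return (fixed-effect pooled estimate, random-effects pooled estimate, tau^2).

    tau^2 is the estimated between-study variance of the true effects.
    """
    y = np.asarray(effects, dtype=float)
    v = np.asarray(std_errors, dtype=float) ** 2   # within-study variances
    w = 1.0 / v                                    # inverse-variance weights

    pooled_fixed = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - pooled_fixed) ** 2)        # Cochran's Q statistic
    k = y.size
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)             # truncated at zero

    w_star = 1.0 / (v + tau2)                      # weights including tau^2
    pooled_random = np.sum(w_star * y) / np.sum(w_star)
    return pooled_fixed, pooled_random, tau2


if __name__ == "__main__":
    # Made-up study estimates and standard errors, for illustration only.
    fixed, random_, tau2 = dersimonian_laird([0.1, 0.5, -0.2, 0.8], [0.1, 0.1, 0.1, 0.1])
    print(f"fixed={fixed:.3f}, random={random_:.3f}, tau^2={tau2:.3f}")
```

The quantity of interest here is `tau2` as much as the pooled mean: a large value says the studies are estimating different things, which is the variation that a fuller model would then try to explain with individual-level and study-level predictors.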

      • That point is even brought out on

        “In addition to providing an estimate of the unknown common truth, meta-analysis has the capacity to contrast results from different studies and identify patterns among study results, sources of disagreement among those results, or other interesting relationships that may come to light in the context of multiple studies.”

        However, that is in the second paragraph, not the first, and a casual reader on meta-analysis may be misled into thinking it is all about averaging out differences instead of rectifying them.

  9. “…untangle all sorts of direct and indirect effects from data alone…” One can only agree…and it gets worse when there are recipes for conducting tests to tell you to add or remove an effect, independent of any sort of logical reasoning for the effect!

    But for demonstrating the power of encoding a logical explanation as a path diagram, and seeing if the data are consistent with the correlation structure that the explanation encoded in that diagram implies, the examples in Wright’s “The Method of Path Coefficients” were really illuminating for me.
