
Bayesian models, causal inference, and time-varying exposures

Mollie Wood writes:

I am a doctoral student in clinical and population health research. My dissertation research is on prenatal medication exposure and neurodevelopmental outcomes in children, and I’ve encountered a difficult problem that I hope you might be able to advise me on.

I am working on a problem in which my main exposure variable, triptan use, can change over time; e.g., a woman may take a triptan during the first trimester, not take one during the second trimester, and then take one again during the third trimester, or any other permutation thereof. I am particularly concerned about time-varying confounding of this exposure, as there are multiple other medications (such as acetaminophen or opioids) whose use also changes over time, and which are therefore both confounders and mediators.

I’m fairly familiar with the causal inference literature, and have initially approached this using marginal structural models and stabilized inverse probability of treatment weights (based mainly on Robins’ and Hernan’s work). I am interested in extending this approach using a Bayesian model, especially because I would like to be able to model uncertainty in the exposure variable. However, I have had little luck finding examples of such an approach in the literature. I’ve encountered McCandless et al’s work on Bayesian propensity scores, in which the PS is modeled as a latent variable, but as of yet have not encountered an example that considered time-varying treatment and confounding. In principle, I don’t see any reason why an MSM/weighting approach would be inadvisable… but then, I’m a graduate student, and I hear we do unwise things all the time.
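For readers who have not met the stabilized weights mentioned in the letter, here is a minimal sketch in Python. It is not Wood's analysis: the two-period toy data (`a0`, `l1`, `a1`, `y`), the helper names, and the use of raw cell frequencies in place of fitted logistic models are all assumptions made for illustration. Each weight is P(A1 | A0) / P(A1 | A0, L1); the first-period factor is 1 because nothing precedes `a0` in this toy setup.

```python
import random
from collections import Counter


def simulate(n, seed=0):
    """Hypothetical toy data: A0 -> L1 -> A1; L1 and both exposures raise P(Y=1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a0 = int(rng.random() < 0.5)                   # first-period exposure
        l1 = int(rng.random() < (0.6 if a0 else 0.3))  # confounder affected by a0
        a1 = int(rng.random() < (0.7 if l1 else 0.2))  # second-period exposure depends on l1
        y = int(rng.random() < 0.1 + 0.1 * a0 + 0.2 * a1 + 0.3 * l1)
        rows.append((a0, l1, a1, y))
    return rows


def stabilized_weights(rows):
    """sw = P(A1=a1 | A0=a0) / P(A1=a1 | A0=a0, L1=l1), from cell frequencies."""
    n_a0 = Counter(a0 for a0, _, _, _ in rows)
    n_a0_a1 = Counter((a0, a1) for a0, _, a1, _ in rows)
    n_a0_l1 = Counter((a0, l1) for a0, l1, _, _ in rows)
    n_a0_l1_a1 = Counter((a0, l1, a1) for a0, l1, a1, _ in rows)
    weights = []
    for a0, l1, a1, _ in rows:
        numerator = n_a0_a1[(a0, a1)] / n_a0[a0]
        denominator = n_a0_l1_a1[(a0, l1, a1)] / n_a0_l1[(a0, l1)]
        weights.append(numerator / denominator)
    return weights


def weighted_risk(rows, weights, regime):
    """Weighted mean of Y among people whose exposure history equals `regime`."""
    num = sum(w * y for (a0, _, a1, y), w in zip(rows, weights) if (a0, a1) == regime)
    den = sum(w for (a0, _, a1, _), w in zip(rows, weights) if (a0, a1) == regime)
    return num / den


rows = simulate(20000)
sw = stabilized_weights(rows)
print("mean weight:", sum(sw) / len(sw))
print("always vs never exposed:",
      weighted_risk(rows, sw, (1, 1)) - weighted_risk(rows, sw, (0, 0)))
```

In this toy setup the true always-versus-never risk difference is 0.39 by construction, so the weighted contrast should land near that value. The mean weight is a common sanity check: stabilized weights are expected to average about 1, and with fitted models a mean far from 1 would hint at a misspecified exposure model.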

My reply:

My short answer is that, while I recognize the importance of the causal issues, I’d probably model things in a more mechanistic way, not worrying so much about causality but just modeling the output as a function of the exposures, basically treating it as a big regression model. If there is selection (for example, someone not taking the drug because of negative side effects that are correlated with the outcome of interest), this can bias your estimates, but my guess would be that a straightforward model of all the data (not worrying about propensity scores, weighting, etc) might work just fine. That is, if the underlying phenomenon can be described well by some sort of linear model and there’s not big selection in the nonresponse, you can just model the data directly and just interpret the parameter estimates as is.

To which Wood continued:

I’m mainly hesitant to trust the results of a multivariable-adjusted model because there’s some evidence from the bit of my dissertation I’m working on now that there is some amount of selection happening. Previously, I’ve fit a marginal structural model and compared the results with the MV-adjusted model, and the parameter estimates change by 10-20%, depending on which outcome measure I’m looking at. (I’m interpreting it as selection; I realize one can’t directly compare the results of the MSM and regression model.)

I see what she’s saying, and from this perspective, yes, it makes sense to include selection in the model. In theory, my preference would be to model the selection directly by adding to the model a latent variable defined so that exposure could be taken as ignorable given available information plus this latent variable. That said, I’ve never actually fit such a model; it just seems to me to be the cleanest approach. For the usual Bayesian reasons, I’d generally not be inclined to use weights based on the probability of treatment (or exposure), but, again, I can see how such methods could be useful in practice if applied judiciously.

P.S. On a related topic, Adan Becerra writes:

Do you or readers of the blog know of anyone currently working on Bayesian or penalized likelihood estimation techniques for marginal structural models? I know MSMs are not a Bayesian issue but I don’t see why you couldn’t estimate the inverse probability weights within a Bayesian framework.

Regular readers will know that I don’t think inverse probability weights make much sense in general (see the “struggles” paper for more on the topic), but maybe one of you in the audience can offer some help on this one.


  1. Keith O'Rourke says:

    Mollie might also have to worry (if she isn’t already) about time-varying treatment effects, in addition to being “concerned about time-varying confounding”.

    For instance, folks trying to use g-estimation to remove the treatment benefit received by placebo patients given rescue treatment at time t ran into this problem. With some drugs, those placebo patients get the treatment at time t primarily for ethical or compassionate reasons; no one really thinks the treatment would work at that time point in their disease. But the program ran fine and gave nice (but unfortunately very misleading) answers.

  2. Z says:

    I’ve tried in the past to develop a Bayesian MSM. The problem I ran into was that MSMs are semiparametric, i.e. the full likelihood is not specified in terms of the parameters you actually estimate.

    If you want to handle time varying confounding Bayesianly and use a Jamie Robins approach and are not married to MSMs, it’s completely straightforward to do Bayesian G computation.

    • Z says:

      I should clarify. MSMs themselves are not semiparametric. They are just specifications of the marginal structural relationship between exposure over time and response. The inverse probability weighting method of estimating the parameters in an MSM is semiparametric. Robins has actually come up with a likelihood based method for estimating MSM parameters, but it’s unwieldy.

    • Ed says:

      Forgive my ignorance, but what’s a Bayesian G computation?
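One way to read "Bayesian G computation" (a sketch, not necessarily what Z has in mind): put a Bayesian model on each piece of the g-formula, draw those pieces from their posterior, and plug every draw into the formula to get a posterior for the counterfactual mean. The Python below does this for a hypothetical two-period problem with binary variables, using Beta(1, 1) priors on each conditional probability. The toy data generator, the variable names (`a0`, `l1`, `a1`, `y`), and the saturated (cell-by-cell) models are all assumptions chosen to keep the example short.

```python
import random
from collections import Counter


def simulate(n, rng):
    """Hypothetical toy data: A0 -> L1 -> A1; L1 and both exposures raise P(Y=1)."""
    rows = []
    for _ in range(n):
        a0 = int(rng.random() < 0.5)
        l1 = int(rng.random() < (0.6 if a0 else 0.3))  # confounder affected by a0
        a1 = int(rng.random() < (0.7 if l1 else 0.2))  # later exposure depends on l1
        y = int(rng.random() < 0.1 + 0.1 * a0 + 0.2 * a1 + 0.3 * l1)
        rows.append((a0, l1, a1, y))
    return rows


def g_formula_draws(rows, regime, n_draws, rng):
    """Posterior draws of E[Y^(a0,a1)] = sum_l P(Y=1 | a0, l, a1) * P(L1=l | a0)."""
    a0, a1 = regime
    l_counts = Counter(l1 for x0, l1, _, _ in rows if x0 == a0)
    y_counts = Counter((l1, y) for x0, l1, x1, y in rows if (x0, x1) == (a0, a1))
    draws = []
    for _ in range(n_draws):
        # Beta(1, 1) priors updated with observed counts (conjugate posteriors).
        p_l1 = rng.betavariate(1 + l_counts[1], 1 + l_counts[0])
        p_y = {l: rng.betavariate(1 + y_counts[(l, 1)], 1 + y_counts[(l, 0)])
               for l in (0, 1)}
        draws.append(p_y[1] * p_l1 + p_y[0] * (1 - p_l1))
    return draws


rng = random.Random(1)
rows = simulate(20000, rng)
always = g_formula_draws(rows, (1, 1), 1000, rng)
never = g_formula_draws(rows, (0, 0), 1000, rng)
# The two regimes use disjoint cells of the data, so pairing draws is valid here.
effect = sorted(a - b for a, b in zip(always, never))
print("posterior mean:", sum(effect) / len(effect))
print("approx. 95% interval:", effect[25], effect[974])
```

The toy generator fixes the true always-versus-never difference at 0.39, so the posterior should concentrate near that value. In a real analysis the cell-by-cell Beta models would presumably be replaced by regression models for the confounders and the outcome, with the same draw-then-plug-in logic.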

  3. Rahul says:

    Wow. Sounds like attempting to answer a very complicated question. With so much noise and confounders.

    Whatever complex way you model this, is there a reasonable chance of getting an answer you’d believe? How big is the study size? How large is the expected effect?

  4. Anders_H says:

    Saarela et al had a very recent paper about Bayesian Marginal Structural Models at

    Robins, Hernan and Wasserman responded at

    (I’m a graduate student in epidemiology working with Miguel and Jamie, I wish I could be more help but I don’t yet understand the relationship between Bayes and inverse probability weighting well enough to give meaningful advice.)

  5. jrc says:

    I wish I could help (I’m struggling with similar problems from a different perspective), but I don’t think there are a ton of great answers on this. I mean, you could put in a dummy for every permutation of took-it/didn’t-take-it, but Andrew and many others would probably think this is trading too much bias for variance. If you think there are no interactions (meaning the effect of taking the drug in the second trimester does not change depending on what you did in the first trimester) then dummy variables for yes/no exposed in each age period would work.
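To make the two codings in the paragraph above concrete, here is a small Python sketch. The trimester labels and function names are hypothetical; the point is only the parameter count: a dummy for every took-it/didn't-take-it pattern gives 2^3 = 8 cells, while the no-interaction version needs just one indicator per trimester.

```python
from itertools import product

TRIMESTERS = ("t1", "t2", "t3")  # hypothetical labels for the three exposure periods


def saturated_coding(history):
    """One indicator per full exposure pattern: 2**3 = 8 cells."""
    return {
        "".join(map(str, pattern)): int(tuple(history) == pattern)
        for pattern in product((0, 1), repeat=len(TRIMESTERS))
    }


def additive_coding(history):
    """One indicator per trimester; assumes no interactions between periods."""
    return dict(zip(TRIMESTERS, history))


history = (1, 0, 1)  # exposed in the first and third trimesters only
print(saturated_coding(history))
print(additive_coding(history))
```

Under the saturated coding, on-off-on and on-on-off each get their own coefficient; under the additive coding, the effect of a pattern is just the sum of its per-trimester effects.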

    But that said, I do want to make one substantive point: whenever I read this kind of thing and see someone refer to “time-varying effects” or “exposures” I think “No. I think you mean age-varying effects or exposures.” I make the point because so often people conflate changing effects over time with changing effects over the age-cycle. These are fundamentally different things and depending on your data, using time controls versus age controls can make a huge, huge difference in the actual identifying variation your model latches on to. I also think this is a problem in theory where so many models subscript things by “t” when they would be so much more coherent if they subscripted by “a”.

    If you think tylenol/opiate use is changing over time in the sense that from year to year people as a whole take more or less of it, OK. But my guess is that the age-profile is the thing you are talking about… but maybe I’m wrong?

    • Martha says:

      I agree that people often fail to distinguish between “time varying” and “age varying,” but this case helps point out that “age varying” can refer to a number of things. In many cases, “age varying” refers to “age cohort,” whereas in others it refers to differences in biological age, independent of time of birth. But in this case, it refers not to the biological age or cohort of the person (the pregnant mother) using the tylenol/opiate/whatever, but to the timing within the gestation period (which could be considered the prenatal age of the fetus).

      • jrc says:


        I was re-reading Grossman’s “On the Concept of Health Capital…” today, and the lack of clarity is there too. His writing usually refers to the dynamics of health capital accumulation in terms of age, but he talks about his model in terms of “time periods” and those periods are indexed by ‘i’, which is like hitting the trifecta of ambiguity in these models [biological world (age), economic world (time) and mathematical world (i), all describing the *same* aspect of the model].

        I honestly think that the lack of clarity in the rhetoric we use to describe these models is leading to misguided thinking on how we specify and interpret statistical analyses of human development.

        • Martha says:

          The problem of different interpretations of words is more widespread than just human development.

          For example, “statistical significance” (quite naturally) sounds to many people as meaning “significant and backed up by data.” So we need to make the point continually that “statistical significance” is not the same as “practical significance” — but the distinction is in practice rarely emphasized.

          Then there’s “95% confident,” which a lot of people like to use because it gives them a feeling of understanding — but when pressed, they can’t explain it, or even distinguish between a correct and an incorrect explanation (especially if the incorrect one is shorter).

          And what the heck do weather forecasters mean by “40% chance of rain tomorrow”? I suspect you would get a variety of answers if you polled a bunch of weather reporters.

    • Anders_H says:

      When Robins talks about “time-dependent exposures”, this is not an attempt to discuss whether the parameter for the effect is constant with time, nor is it an attempt to distinguish between age/cohort/time effects.

      Rather, he assumes that the data on any individual is structured according to some meaningful time scale (for example “time since randomization/baseline”) which he labels with the subscript t. He wants to estimate the effect of treatment, but he recognizes that individuals may change their treatment status over the course of follow-up, leading to confounding and selection bias. Marginal structural models are designed to control for biases that arise as data is generated sequentially along the t time scale.

      It is certainly possible that exposure has varying effects depending on age or calendar time. If you want to allow that, it is easy to use interaction terms in the outcome model. However, the primary time scale is still going to be time since baseline – the t subscript is absolutely essential to the logic behind the models.

      • jrc says:

        I see what you are saying – I think of that as “event time”.

        But in this case, I still think event time more closely relates to “age” in the sense of a developing fetus having an “age” (thanks Martha!) because we know that inputs at different points in the gestational cycle have different effects. So it isn’t like having on-on-off is the same as having on-off-on, and it isn’t the same because the function that goes from maternal input to fetal health is different in trimester 1 than in trimester 2. Subscripting as “t” is fine, but I just think it obscures the fact that the thing you should be thinking about isn’t “t” it is “a”.

        Think of it like this: suppose I take pregnant women of all gestational ages and randomly assign treatment as yes/no each month for the next 9 months. If I’m thinking “t” as in “time since treatment began” I’m missing the point. If I think “a” as in “were you treated at gestational age a or not” then I’m thinking about the right problem. If you start treatment for all women at conception, then the problem disappears because you’ve joined “calendar time” and “age time” together.

  6. Jay Kaufman says:


    On Bayesian estimation of marginal structural models

    Olli Saarela, David A. Stephens, Erica E. M. Moodie, and Marina B. Klein

    Article first published online: 10 FEB 2015

    DOI: 10.1111/biom.12269

    The purpose of inverse probability of treatment (IPT) weighting in estimation of marginal treatment effects is to construct a pseudo-population without imbalances in measured covariates, thus removing the effects of confounding and informative censoring when performing inference. In this article, we formalize the notion of such a pseudo-population as a data generating mechanism with particular characteristics, and show that this leads to a natural Bayesian interpretation of IPT weighted estimation. Using this interpretation, we are able to propose the first fully Bayesian procedure for estimating parameters of marginal structural models using an IPT weighting. Our approach suggests that the weights should be derived from the posterior predictive treatment assignment and censoring probabilities, answering the question of whether and how the uncertainty in the estimation of the weights should be incorporated in Bayesian inference of marginal treatment effects. The proposed approach is compared to existing methods in simulated data, and applied to an analysis of the Canadian Co-infection Cohort.
