Varying treatment effects, again

This time from Bernard Fraga and Eitan Hersh. Once you think about it, it’s hard to imagine any nonzero treatment effects that don’t vary. I’m glad to see this area of research becoming more prominent. (Here’s a discussion of another political science example, also of voter turnout, from a few years ago, from Avi Feller and Chris Holmes.)

Some of my fragmentary work on varying treatment effects is here (Treatment Effects in Before-After Data) and here (Estimating Incumbency Advantage and Its Variation, as an Example of a Before–After Study).


  1. Jonathan (a different one) says:

    “Once you think about it, it’s hard to imagine any nonzero treatment effects that don’t vary.”


  2. K? O'Rourke says:

    Andrew: I very much agree that parameter (treatment) variation (or lack of commonness) has been too much neglected in the literature, but this does seem to be changing fairly quickly right now (i.e., ATE being refined and clarified into many four-word acronyms).

    Neglecting it between studies (meta-analysis) was not so easy, given that the concerns about heterogeneity were so obvious and pressing there. And there was that neat apples-and-oranges metaphor.

    Within studies: not sure how extensively it has been looked for. I tried to get some interest in carrying this out using that “ugly” plotting method I worked out, since one of its primary motivations was to discover heterogeneity by individual observation and unit of analysis. As many on this blog will know, I have had a lot of difficulty getting any interest in that.

    The last entry is here

    Two failed journal submissions both suggest the material does not have enough technical innovation to merit publication in a professional journal (and I actually mostly agree), and one suggested a lack of clarity of purpose. And admittedly none of the examples was real or had much substance.

    In the hopes of tricking people into trying the graphs out on real applications, the software will be slowly upgraded and is available at

  3. Mark says:

    This is something that I’ve been thinking about for quite a long time, particularly in the context of medical research. Up front, I have to confess that I am becoming more and more a “randomization-based inference only” type of statistician (at least in my thinking… in my collaborative applications, I have to be more pragmatic and diplomatic).

    So, given this, I just have to ask: isn’t it possible that using statistical models, even fancy hierarchical models, to estimate a “treatment effect” might be barking up the wrong tree altogether? This just perplexes me every time I hear someone say that we need to present confidence intervals because randomized comparisons cannot stand by themselves (i.e., the very reasonable conclusion, to me at least, that *some people* in our study did better on Drug A than on Drug B), or every time I see a meta-analysis based on the strong assumption that there is a reasonable “treatment effect” to be estimated in the first place.

    I mean, if we accept the fact that “it’s hard to imagine any nonzero treatment effects that don’t vary” across people, why would we assume that they wouldn’t also vary within people? If we do accept that, then how can we possibly define anything like a “treatment effect”? That seems rather akin to defining an individual’s reaction any time the phone rings. Anyway, I know that I’m a long way from home, outing my model-free fantasy on Andrew’s blog (which I really dig, by the way… even if the exclusive focus on modeling does sometimes make me wince). People are just so afraid of the concept of uncertainty, they want their medicine to have a degree of certainty!

    • Andrew says:


      I agree that treatment effects can vary within people. This is related to Walter Mischel’s classic point about personality traits varying by situation. With repeated treatments and measurements on people, it should be possible in many settings to estimate the sort of varying treatment effects that you are discussing.

      On your other point: randomization-based inference is fine. You can use “randomization-based inference only” in the subset of problems for which this is possible and appropriate; this will leave lots of work for modelers such as myself who are studying problems in toxicology, public opinion, climate, etc., where randomization inference won’t do the job!
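
To make the repeated-measurements idea concrete, here is a minimal simulation sketch. Everything in it is assumed for illustration (the number of people and periods, the effect and noise sizes, the normal distributions); none of it comes from the post or from any real study. Each simulated person has their own treatment effect, treatment is randomized over that person's periods, and the effect is estimated separately for each person with a plain difference in means. As a simplification, each person's effect is held fixed over time; effects that also vary from occasion to occasion would need a further level of variation.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)

# Hypothetical design and effect sizes, chosen only for illustration.
n_people, n_periods = 50, 20
avg_effect, effect_sd, noise_sd = 1.0, 0.5, 1.0

true_effects, estimates = [], []
for _ in range(n_people):
    effect = rng.gauss(avg_effect, effect_sd)  # this person's own effect
    baseline = rng.gauss(0.0, 1.0)             # this person's own baseline
    # Randomize which half of this person's periods receive treatment.
    treated_periods = set(rng.sample(range(n_periods), n_periods // 2))
    y_treated, y_control = [], []
    for t in range(n_periods):
        is_treated = t in treated_periods
        y = baseline + (effect if is_treated else 0.0) + rng.gauss(0.0, noise_sd)
        (y_treated if is_treated else y_control).append(y)
    true_effects.append(effect)
    # No-pooling estimate: each person analyzed separately.
    estimates.append(mean(y_treated) - mean(y_control))

print(f"average of per-person estimates: {mean(estimates):.2f}")
print(f"sd of per-person estimates:      {stdev(estimates):.2f}")
print(f"sd of true per-person effects:   {stdev(true_effects):.2f}")
```

The raw spread of the per-person estimates mixes real variation in effects with sampling noise, so it will tend to overstate how much the effects vary. That is one motivation for partially pooling the estimates with a hierarchical model rather than analyzing each person separately as this sketch does.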

      • Mark says:

        Thanks Andrew,

        I appreciate everything you said, and especially agree with your last point that randomization is not always applicable! Totally happy to let the modelers have their day, so to speak, when randomization is out of the picture.

        I guess my point was more in the case where randomization is possible and is in fact the norm (e.g., testing medical or behavioral interventions). My strong feeling is that most people have no idea that they are jumping into a territory where strong assumptions (or MUCH stronger assumptions, anyway) are required when they insist either that manuscripts present confidence intervals or that meta-analyses are somehow higher up the “evidence pyramid” than individual RCTs. I can happily analyze an RCT without assuming a constant treatment effect, without the need for SUTVA, etc. The only big, unverifiable assumption that I really need, and it’s plenty big enough, is that any loss is non-differential between arms. The moment I estimate a “treatment effect”, I’m in a whole new territory.

        • K? O'Rourke says:

          Had I waited, most of my first comment would have been unnecessary…

          But I worry about the randomization not actually being done correctly (e.g. an investigator holding back certain patients given they can guess what the assignment will be), a loss of independence in the measurement of the response (e.g. a highly correlated measurement instrument problem within just, say, the treatment group), a loss of independence in the response itself (e.g. most of the control group being in that day when one of the examiners forgot to wash their hands and infected most of the control group with a nasty GI infection), etc., etc. [thought that was SUTVA related].

          Sort of glad I don’t work in that area any more.

          I do very much agree with your point that the randomization test [with the strong Fisher null] has the fewest possible assumptions that are likely to be violated. [Believe Fisher once stated that was why he thought it was an ideal design.]
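
For readers who have not seen one, here is a minimal sketch of a randomization test of Fisher's sharp null (no effect of treatment on any unit). It assumes a completely randomized two-arm design and uses the difference in means as the test statistic; both are choices made for this illustration, not anything specified in the thread. The function name `randomization_test` and the outcome values are made up. Rather than enumerating every possible assignment, it draws random re-assignments, so the p-value is a Monte Carlo approximation.

```python
import random
from statistics import mean

def randomization_test(treated, control, n_draws=5000, seed=0):
    """Two-sided Monte Carlo randomization test of the sharp null.

    Under the sharp null every outcome is unchanged by treatment, so the
    observed outcomes can be re-assigned to arms exactly as the original
    randomization could have assigned them.
    """
    rng = random.Random(seed)
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = mean(treated) - mean(control)
    extreme = 0
    for _ in range(n_draws):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        # Small tolerance so ties are not lost to floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-9:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value above zero.
    return observed, (extreme + 1) / (n_draws + 1)

# Made-up outcomes, purely for illustration.
drug_a = [7.1, 6.4, 8.0, 7.7, 6.9, 8.3]
drug_b = [5.9, 6.2, 6.8, 5.5, 6.1, 7.0]

observed_diff, p_value = randomization_test(drug_a, drug_b)
print(f"difference in means = {observed_diff:.2f}, p = {p_value:.4f}")
```

The only probabilistic ingredient here is the random assignment itself: no outcome model, no constant-effect assumption. The price is that the test addresses only the sharp null and says nothing directly about the size of an average effect in some wider population.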

    • K? O'Rourke says:

      Mark: Some good points, but

      Yes it nails “that *some people* in our study did better on Drug A than on Drug B”.

      But – not much for generalizing the average effect to expect in populations that will be treated.

      In general, your stance seems a mix of “you can’t put your foot twice in the same river” and “all models are wrong [with the sometimes useful left out]”.

      Indeed, in biologics (e.g. antibiotics) the variation within patient may be extremely critical. Fortunately this is often not the case and it can safely be ignored. If one had to deal with dynamic variation, they might wish to check out Susan Murphy’s work on dynamic treatment designs.

      Also, for meta-analyses, if you thought about “that *some study* in the _published_ literature showed patients did better on Drug A than on Drug B”, again pretty much all the same reasoning would apply (but with drastically heightened concerns about variation), and you might wish to check out Ingram Olkin’s methods on how to address this (giving up on modelling treatment effects through some data model/likelihood [I have not thought much yet about doing plots in the absence of likelihoods to plot]).