Avi Feller and Chris Holmes sent me a new article on estimating varying treatment effects. Their article begins:

Randomized experiments have become increasingly important for political scientists and campaign professionals. With few exceptions, these experiments have addressed the overall causal effect of an intervention across the entire population, known as the average treatment effect (ATE). A much broader set of questions can often be addressed by allowing for heterogeneous treatment effects. We discuss methods for estimating such effects developed in other disciplines and introduce key concepts, especially the conditional average treatment effect (CATE), to the analysis of randomized experiments in political science. We expand on this literature by proposing an application of generalized additive models to estimate nonlinear heterogeneous treatment effects. We demonstrate the practical importance of these techniques by reanalyzing a major experimental study on voter mobilization and social pressure and a recent randomized experiment on voter registration and text messaging from the 2008 US election.
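The conditional average treatment effect in the abstract can be written `CATE(x) = E[Y(1) − Y(0) | X = x]`. It is the average effect among units whose pretreatment covariates equal `x`, and averaging `CATE(x)` over the whole population gives the ATE. The sketch below uses simulated data, not the experiments in the paper. The helper `binned_cate` and the data-generating process are hypothetical. Binning `x` and taking differences in means is a much cruder stand-in for the generalized additive models that Feller and Holmes propose.

```python
import numpy as np

def binned_cate(x, treated, y, edges):
    """Treated-minus-control difference in mean outcome within each bin of x.

    A crude stand-in for smoother CATE estimators (such as generalized
    additive models); bins are the half-open intervals [edges[k], edges[k+1]).
    """
    estimates = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (x >= lo) & (x < hi)
        estimates.append(y[in_bin & treated].mean() - y[in_bin & ~treated].mean())
    return np.array(estimates)

# Hypothetical randomized experiment: the true effect of treatment is 2*x,
# so it is near 0 for small x and near 2 for large x (average effect about 1).
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0.0, 1.0, n)          # pretreatment covariate in [0, 1)
treated = rng.random(n) < 0.5         # randomized assignment
y = 1.0 + 0.5 * x + 2.0 * x * treated + rng.normal(0.0, 1.0, n)

edges = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
cate = binned_cate(x, treated, y, edges)
ate = y[treated].mean() - y[~treated].mean()
print("ATE:", round(ate, 2))
print("CATE by quarter of x:", np.round(cate, 2))
```

The four bin estimates should come out near 0.25, 0.75, 1.25, and 1.75 (the bin averages of `2x`), up to sampling noise. A single ATE of about 1 therefore summarizes effects that differ considerably across units.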

This is a cool paper—they reanalyze data from some well-known experiments and find important interactions. I just have a few comments to add:

1. As I wrote in my review of Angrist and Pischke, once you start talking about average treatment effects, you’re implicitly talking about interactions and varying treatment effects. Were the treatment effect truly constant across all units, we could just speak of “the treatment effect” without having to specify which cases we are averaging over. Any discussion of particular average treatment effects is relevant because treatment effects vary; that is, the treatment interacts with pretreatment variables. It frustrates me to see all this talk of ATE, LATE, CACE, and all the other average treatment effects without even an attempt to model the interactions. So I’m really happy to see this article by Feller and Holmes.

2. I think variation in treatment effects is crucial (as we've already discussed), but it can take a lot of data to get good estimates of this variation. In the incumbency example we had hundreds of elections per year, but even so, the year-on-year estimates were pretty noisy. I wrote an article a few years ago on varying treatment effects; it appeared in the book I edited with Xiao-Li Meng. I talked about some models there, but it's hard to estimate them unless you have hundreds of data points. I'm still not sure what to do about all this, but the focus on “average treatment effects” does seem like a dead end to me.
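A back-of-the-envelope calculation shows why variation takes so much data. Take the simplest case, with these assumptions made purely for illustration: a binary pretreatment variable splits the sample in half, the arms are of equal size, and the outcome has a common standard deviation σ. The main effect is a difference of two means with n/2 units each, so its standard error is 2σ/√n. The interaction is the difference between two subgroup effects, each estimated from n/4 units per arm, so its standard error is 4σ/√n, twice as large. The function names are hypothetical.

```python
import math

def se_main_effect(n, sigma):
    """SE of a difference in means with n units split equally between two arms."""
    return math.sqrt(sigma**2 / (n / 2) + sigma**2 / (n / 2))

def se_interaction(n, sigma):
    """SE of the difference between two subgroup treatment effects, where each
    subgroup holds half of the n units (so n/4 units per arm per subgroup)."""
    se_subgroup = math.sqrt(sigma**2 / (n / 4) + sigma**2 / (n / 4))
    # The two subgroup estimates are independent, so their variances add.
    return math.sqrt(2 * se_subgroup**2)

n, sigma = 1000, 1.0
print(se_main_effect(n, sigma), se_interaction(n, sigma))
print(se_interaction(n, sigma) / se_main_effect(n, sigma))
```

Suppose an interaction is half the size of the main effect. Its z-statistic is then a quarter as large, so estimating it as precisely, relative to its size, takes 16 times the sample size.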

See also this blog entry and this presentation.

**P.S.** Chris responds:

While I agree that, as statisticians, we should strive to characterize as much of the variation in the data as possible, I certainly don’t see the central role played by average treatment effects (ATE) as a “dead end”. I would agree/argue that ATE is an important (and descriptive) statistic that should be placed within a fuller analysis of variance. An interesting example arises, I believe, in drug trials, where it is not uncommon for a drug with a significantly beneficial ATE to be shelved because of increased variation in the treated group (or because of side effects on other outcomes). The decision to shelve arises from an asymmetric loss function, which protects the harmed more than the helped. Of course, the interest then is in explaining the unexplained variation and in identifying subtypes of patients at risk under the treatment (this being the holy grail of pharmacogenetics and genomic biomarkers). This then leads us to conditional average treatment effect modeling and the methods we discuss in our paper.

Regarding the problem of estimation, is your example due to the lack of a suitable low-dimensional statistical model to parameterize the variation? It feels like under a (suitable) hierarchical model framework you should get reasonable estimates, though I may not understand your example.
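Chris's drug-trial example can be made concrete. Suppose the effect were constant and additive under randomization, so that Y(1) = Y(0) + c for every patient. The treated and control outcome distributions would then be shifts of each other with equal variances. A larger spread in the treated arm is therefore a sign that effects vary. The converse does not hold, however, and the two marginal distributions cannot tell us which patients were harmed, which is why subtyping patients matters. The outcomes below are made up, and `summarize_arms` is a hypothetical helper.

```python
import numpy as np

def summarize_arms(y_control, y_treated):
    """Return the difference in means (treated minus control) and the ratio
    of sample variances (treated / control) for a two-arm trial."""
    y_control = np.asarray(y_control, dtype=float)
    y_treated = np.asarray(y_treated, dtype=float)
    diff = y_treated.mean() - y_control.mean()
    var_ratio = y_treated.var(ddof=1) / y_control.var(ddof=1)
    return diff, var_ratio

# Hypothetical outcomes (higher is better): the treated arm is better on
# average, but more spread out, and some treated outcomes fall below the
# entire control range.
y_control = [4.0, 5.0, 5.0, 6.0, 5.0, 5.0]
y_treated = [8.0, 9.0, 2.0, 8.0, 9.0, 3.0]
diff, var_ratio = summarize_arms(y_control, y_treated)
print("difference in means:", diff)
print("variance ratio (treated/control):", var_ratio)
```

The treated mean is 1.5 higher, but the treated variance is many times the control variance. Under an asymmetric loss function, that combination could justify shelving the drug despite the positive ATE.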

To which I replied that I agree that ATE is an important and useful concept, and I respect that ideas such as LATE, CACE, and all the rest are ways of quantifying the ways in which the data are informative about treatment effects. That said, I fear that people can over-focus on the different varieties of ATE. Ultimately, if the particular definition of ATE makes a difference, then there are interactions that are important in their own right.

In answer to the second point: yes, I set it up as a hierarchical model, but it was difficult to estimate the hierarchical variance parameters.
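The difficulty with hierarchical variance parameters shows up even in the simplest setting. The post does not say which model was used, so the setup here is hypothetical. There are J groups whose true treatment effects vary with standard deviation τ, and each group's effect is estimated with standard error s. τ² is estimated by the method of moments: the variance of the group estimates minus the average sampling variance, truncated at zero. This is a deliberately simple estimator, not the model from the article.

```python
import numpy as np

def moment_tau2(estimates, ses):
    """Method-of-moments estimate of the between-group variance tau^2:
    observed variance of the group estimates minus the average sampling
    variance, truncated at zero."""
    estimates = np.asarray(estimates, dtype=float)
    ses = np.asarray(ses, dtype=float)
    return max(0.0, estimates.var(ddof=1) - np.mean(ses**2))

# Hypothetical setup: J groups whose true effects vary with sd tau,
# each estimated with standard error s; repeat the experiment many times.
rng = np.random.default_rng(1)
J, tau, s, n_sims = 10, 0.5, 1.0, 2000

tau2_hats = []
for _ in range(n_sims):
    true_effects = rng.normal(0.0, tau, J)
    estimates = true_effects + rng.normal(0.0, s, J)
    tau2_hats.append(moment_tau2(estimates, np.full(J, s)))
tau2_hats = np.array(tau2_hats)

print("true tau^2:", tau**2)
print("share of estimates stuck at 0:", np.mean(tau2_hats == 0.0))
print("5%-95% range of estimates:", np.quantile(tau2_hats, [0.05, 0.95]))
```

The true τ² is 0.25, yet more than a quarter of the simulated estimates land exactly at zero, and the upper tail reaches several times the true value. With only ten groups, the data carry little information about τ whichever estimator is used.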

Or, alternatively, that there is a purely random component to the treatment effect, or a component of the treatment effect that depends on unobserved pretreatment variables (which seems to amount to the same thing).

In this sense I think it's reasonable to talk about the average treatment effect without necessarily modeling interactions.

Non-constant/common/additive seems to imply varying ;-)

The paper is perhaps a bit too kind/optimistic about what is usually done in clinical research.

Daniel: this quote, "[modeling interactions provides] a partial base for hope that any conclusion is generalizable to new situations and applicable to specific individuals," from "Interpretation of Interaction: A Review" by Amy Berrington de González and D. R. Cox (The Annals of Applied Statistics, 2007, Vol. 1, No. 2, 371–385), does suggest that some explicit consideration of interactions is always "helpful."

But more generally "varying effects" seems like a less distracting term than "interactions"…

Keith