“Behavioural science is unlikely to change the world without a heterogeneity revolution”

Christopher Bryan, Beth Tipton, and David Yeager write:

In the past decade, behavioural science has gained influence in policymaking but suffered a crisis of confidence in the replicability of its findings. Here, we describe a nascent heterogeneity revolution that we believe these twin historical trends have triggered. This revolution will be defined by the recognition that most treatment effects are heterogeneous, so the variation in effect estimates across studies that defines the replication crisis is to be expected as long as heterogeneous effects are studied without a systematic approach to sampling and moderation. When studied systematically, heterogeneity can be leveraged to build more complete theories of causal mechanism that could inform nuanced and dependable guidance to policymakers. We recommend investment in shared research infrastructure to make it feasible to study behavioural interventions in heterogeneous and generalizable samples, and suggest low-cost steps researchers can take immediately to avoid being misled by heterogeneity and begin to learn from it instead.

We posted on the preprint version of this article earlier. The idea is important enough that it’s good to have an excuse to post on it again.

P.S. This also reminds me of our causal quartets.

5 thoughts on ““Behavioural science is unlikely to change the world without a heterogeneity revolution”

  1. I missed this the first time around so I’m glad you posted it again. I liked it a great deal. I wonder to what extent medical research avoids the heterogeneity they highlight. Many clinical trials are multi-site and the elaborate protocols are designed to remove much of the cross-site variability – or, at least, to capture data about that variability. Issues remain about applying the results out-of-sample (such as under-representation of race or gender), but many of the salient characteristics that could produce heterogeneous effects are standardized in the protocols. Behavioral research certainly seems sloppier in terms of careful protocols. And, it would be difficult to establish protocols for observational studies (though it might still serve as a template for the kinds of data that should be collected and documented).

    • The difference is that medical researchers get more funding, so sample sizes are an order of magnitude higher, which allows even smaller average differences to reach significance. That is why the typical NNT (number needed to treat) is now around 50-100, rather than 1-10 as for treatments developed before about 1950. I.e., of roughly 100 people given a drug, only one experiences a net benefit. No one knows which patient will win that lottery.

      https://www.thennt.com/

      To progress we need to turn our focus away from summary statistics and back to the individual, and observe the time course for each individual, if not the dose-response. Then we have data that can be used to guess rational models of what is going on. Comparing a snapshot of two averages has proven to be infertile ground for science.
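For readers unfamiliar with the NNT arithmetic in the comment above, here is a minimal sketch. It uses the standard definition (NNT is the reciprocal of the absolute risk reduction); the function name `nnt` and the event rates are hypothetical, chosen only to illustrate the scale being discussed, and are not taken from any particular trial.

```python
def nnt(control_event_rate: float, treated_event_rate: float) -> float:
    """Number needed to treat: 1 / absolute risk reduction (ARR).

    Both arguments are proportions of patients with the bad outcome.
    """
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("treatment shows no benefit; NNT is undefined")
    return 1.0 / arr


# Hypothetical rates, for illustration only.
# A drug that lowers the event rate from 5% to 4% has an NNT of about 100.
print(round(nnt(0.05, 0.04)))

# A treatment that lowers the event rate from 60% to 10% has an NNT of about 2.
print(round(nnt(0.60, 0.10)))
```

Note that the NNT is still a summary of two averages: it says nothing about which individual benefits, which is the commenter's point.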

  2. Another thing that I have seen too often, which can be related to the individual heterogeneity issue, is that measurement variance for a single individual is not reported. For example, the New York Times recently reported on a study of the effects of High Intensity Training on blood pressure (BP). The published work claimed a significant reduction. They even charted before vs after systolic BP values for each individual in the study (HIT participants and controls).

    Many but not all of the participants showed a drop in BP (some increased). But the variance of each person’s BP readings was not reported. So it is impossible to assess whether anyone’s before/after difference has any statistical value. For example, the individuals who showed an increase in before/after BP may very well not have “actually” increased, just because of normal measurement variation.

    Perhaps the researchers assumed that the between-individual variance was the same as the within-individual variance. That would have been a very questionable assumption unless it had been tested, but the subject was not covered in the paper so far as I could see.
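One way to make this comment's point concrete is a rough check of whether a single person's before/after change exceeds what measurement noise alone could produce. The sketch below assumes one reading before and one after, with independent, normally distributed measurement errors of equal within-person standard deviation. The function names and the 8 mmHg figure are hypothetical; the study discussed above did not report this quantity.

```python
import math


def minimal_detectable_change(within_sd: float, z: float = 1.96) -> float:
    """Smallest before/after difference distinguishable from noise.

    The difference of two independent readings has standard deviation
    sqrt(2) * within_sd, so the threshold is z * sqrt(2) * within_sd.
    """
    return z * math.sqrt(2.0) * within_sd


def change_exceeds_noise(before: float, after: float,
                         within_sd: float, z: float = 1.96) -> bool:
    """True if |after - before| is larger than the noise threshold."""
    return abs(after - before) > minimal_detectable_change(within_sd, z)


# Hypothetical within-person SD of systolic BP readings (mmHg).
within_sd = 8.0
print(round(minimal_detectable_change(within_sd), 1))

# A 10 mmHg drop for one person is within the noise under this assumption.
print(change_exceeds_noise(140.0, 130.0, within_sd))

# A 30 mmHg drop is not.
print(change_exceeds_noise(150.0, 120.0, within_sd))
```

Under these assumptions, many of the individual ups and downs in a before/after chart could be noise, even if the group average shift is real. Averaging several readings per person at each time point would shrink the threshold.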
