Joshua Vogelstein pointed me to this post by Michael Nielsen on how to teach Simpson’s paradox.
I don’t know if Nielsen (and others) are aware that people have developed some snappy graphical methods for displaying Simpson’s paradox (and, more generally, aggregation issues). We do some of this in our Red State Blue State book, but before that there was the BK plot, named by Howard Wainer after a 2001 paper by Stuart Baker and Barnett Kramer, although it apparently appeared earlier in a 1987 paper by Jeon, Chung, and Bae, and doubtless was invented by various other people before then.
Here’s Wainer’s graphical explication from 2002 (adapted from Baker and Kramer’s 2001 paper):
Here’s the version from our 2007 article (with Boris Shor, Joe Bafumi, and David Park):
But I recommend Wainer’s article (linked to above) as the first thing to read on the topic of presenting aggregation paradoxes in a clear and grabby way.
P.S. Robert Long writes in:
I noticed your post about Simpson’s paradox and wanted to let you know about another nice teaching approach using DAGs, based on a paper by Pearl but implemented in the fantastic DAGitty tool:
I have recently used this to teach Simpson’s paradox to master’s-level students. The module (led by Mark Gilthorpe) teaches advanced modelling concepts.
The DAGitty tool simulates the data, which you can give to the students and ask them to explore. You have a main exposure X and outcome Y, and various “potential confounders” Z1, Z2, etc. The beauty of this is that by running models that successively add more of the potential confounders, the estimate for the main exposure X changes from positive to negative and back again:
Y ~ X + Z1 gives the correct estimate
Y ~ X + Z1 + Z2 gives a biased estimate
Y ~ X + Z1 + Z2 + Z3 gives the correct estimate
and so on.
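To make this concrete, here is a minimal simulation sketch in R. The DAG is my own invention for illustration, not the one actually used in the DAGitty module: Z1 is a genuine confounder, Z2 is a collider on the path X ← U → Z2 ← Z3 → Y (so adjusting for it opens that path), and adding Z3 closes the path again. Under these assumed coefficients the estimate for X swings between wrong and right as covariates are added:

# Illustrative DAG (assumed for this sketch, not the actual DAGitty example):
#   Z1 -> X, Z1 -> Y    (Z1 is a real confounder: adjust for it)
#   U  -> X, U  -> Z2   (U is unobserved)
#   Z3 -> Z2, Z3 -> Y   (Z2 is a collider: adjusting for it opens the
#                        path X <- U -> Z2 <- Z3 -> Y; adding Z3 closes it)
set.seed(123)
n  <- 1e5
z1 <- rnorm(n)
u  <- rnorm(n)
z3 <- rnorm(n)
z2 <- 3*u + z3 + rnorm(n)
x  <- 2*z1 + 2*u + rnorm(n)
y  <- x - 6*z1 + 4*z3 + rnorm(n)   # true effect of x on y is 1

fits <- list(
  "y ~ x"                = lm(y ~ x),
  "y ~ x + z1"           = lm(y ~ x + z1),
  "y ~ x + z1 + z2"      = lm(y ~ x + z1 + z2),
  "y ~ x + z1 + z2 + z3" = lm(y ~ x + z1 + z2 + z3)
)
round(sapply(fits, function(f) coef(f)[["x"]]), 2)
# approximately: -0.33  1.00  -0.26  1.00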
So the students get very confused by that. Then you show them the DAG and it all falls nicely into place, with the bottom line of “be careful what variables you throw into a model.”
I have mixed feelings about this particular tool as I often work in settings where the concept of “true causal effect” doesn’t mean much. On the other hand, causal reasoning is often a central goal for researchers, in which case this tool could be helpful.
P.P.S. Let me clarify the above point about causal inference as it seems to have led to a lot of confusion in the comments:
There are examples of conditioning paradoxes in which causal reasoning does not arise. Red-blue is an example: there’s no treatment involved at all; I’m just looking at different sorts of comparisons in the data. Comparisons can be interesting and important even without a causal question. I’m not dismissing the importance of causal inference (obviously not; look at the title of this blog!); I’m just saying that puzzles of conditioning arise even in non-causal settings, which suggests to me that causal reasoning is not necessary for understanding these problems, even though in many settings it can be useful.
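To illustrate that point with a purely descriptive sketch in the red-blue spirit (invented numbers, not real election data): within every state, richer individuals are more likely to vote Republican, yet richer states have lower Republican vote shares. There is no treatment anywhere; the “paradox” is entirely about which comparison you choose to make.

# Descriptive aggregation reversal, no causal model anywhere (all numbers invented)
set.seed(1)
n_states <- 50
state_income <- rnorm(n_states)                               # state-level average income
state_base   <- -1.5*state_income + rnorm(n_states, sd = 0.3) # rich states lean Democratic

df <- do.call(rbind, lapply(1:n_states, function(s) {
  inc <- state_income[s] + rnorm(200)     # individual incomes within state s
  p   <- plogis(state_base[s] + inc)      # richer individuals lean Republican
  data.frame(state = s, income = inc, rep_vote = rbinom(200, 1, p))
}))

# Individual-level comparison: positive (within-state) coefficient on income
coef(glm(rep_vote ~ income + factor(state), family = binomial, data = df))["income"]

# State-level comparison: negative slope of Republican share on average income
agg <- aggregate(cbind(rep_vote, income) ~ state, data = df, FUN = mean)
coef(lm(rep_vote ~ income, data = agg))["income"]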
A helpful analogy here might be to statistics and decision analysis. Statistics is sometimes called the science of decisions, and statistical inference is sometimes framed as a decision problem. I often find this helpful (hence the inclusion of a decision analysis chapter in BDA) but I also have seen many examples of statistical analyses where there is no corresponding decision, where the goal is to learn rather than to decide. Hence, although I often find decision analysis to be useful, I don’t feel that it is a necessary part of the formulation of a statistical problem. My attitude toward causal reasoning is similar.