Simpson’s Paradox not always such a paradox

I’m on an email list of media experts for the American Statistical Association: from time to time a reporter contacts the ASA, and their questions are forwarded to us. Last week we got a question from Cari Tuna about the following pattern she had noticed:

Measured by unemployment, the answer appears to be no, or at least not yet. The jobless rate was 10.2% in October, compared with a peak of 10.8% in November and December of 1982.

But viewed another way, the current recession looks worse, not better. The unemployment rate among college graduates is higher than during the 1980s recession. Ditto for workers with some college, high-school graduates and high-school dropouts.

So how can the overall unemployment rate be lower today but higher among each group?

Several of us sent in answers. Call us media chasers or educators of the populace; whatever. Luckily I wasn’t the only one to respond: I sent in a pretty lame example that I’d recalled from an old statistics textbook; whereas Xiao-Li Meng, Jeff Witmer, and others sent in more up-to-date items that Ms. Tuna had the good sense to use in her article.

There’s something about this whole story that bothers me, though, and that is the implication that the within-group comparisons are real and the aggregate is misleading. As Tuna puts it:

The Simpson’s Paradox in unemployment rates by education level is but the latest example. At a glance, the unemployment rate suggests that U.S. workers are faring better in this recession than during the recession of the early 1980s. But workers at each education level are worse off . . .

This discussion follows several examples where, as the experts put it, “The aggregate number really is meaningless. . . . You can’t just look at the overall rate. . . .”

Here’s the problem. Education categories now do not represent the same slices of the population that they did in 1976. A larger proportion of the population are college graduates (as is noted in the linked news article), and thus the comparison of college grads (or any other education category) from 1982 to the college grads today is not quite an apples-to-apples comparison. Being a college grad today is less exclusive than it was back then.

In this sense, the unemployment example is different in a key way from the other Simpson’s paradox examples in the news article. In those other examples, the within-group comparison is clean, while the aggregate comparison is misleading. In the unemployment example, it’s the aggregate that has a cleaner interpretation, while the within-group comparisons are a bit of a mess.
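To make the mechanism concrete, here is a minimal sketch in Python. The shares and rates are made-up numbers chosen for illustration, not actual unemployment figures, and the names (`share_then`, `rate_now`, `aggregate_rate`) are hypothetical. It shows how every education group can have a higher unemployment rate in the later period while the overall rate is lower, simply because the mix of groups shifts toward the low-unemployment categories.

```python
# Illustrative numbers only: these are NOT actual unemployment statistics.
groups = ["no HS diploma", "HS graduate", "some college", "college graduate"]

# Share of the labor force in each education group (each list sums to 1).
share_then = [0.30, 0.40, 0.15, 0.15]
share_now = [0.10, 0.30, 0.28, 0.32]

# Unemployment rate within each group; every group is one point worse "now".
rate_then = [0.16, 0.11, 0.08, 0.04]
rate_now = [0.17, 0.12, 0.09, 0.05]


def aggregate_rate(shares, rates):
    """Overall rate as the share-weighted average of the group rates."""
    return sum(s * r for s, r in zip(shares, rates))


overall_then = aggregate_rate(share_then, rate_then)
overall_now = aggregate_rate(share_now, rate_now)

for name, r_then, r_now in zip(groups, rate_then, rate_now):
    print(f"{name:>16}: {r_then:.1%} -> {r_now:.1%}")
print(f"{'overall':>16}: {overall_then:.1%} -> {overall_now:.1%}")
```

With these invented inputs the overall rate falls from 11.0% to about 9.4% even though each group's rate rises. The groups in the two periods are different slices of the population, which is exactly why the within-group comparison is hard to interpret.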

As a statistician and statistical educator, I think we have to be very careful about implying that the complicated analysis is always better. In this example, the complicated analysis can mislead! It’s still good to know about Simpson’s paradox, to understand how the within-group and aggregate comparisons can differ, but I think it’s highly misleading in this case to imply that the aggregate comparison is wrong in some way. It’s more of a problem of groups changing their meaning over time.

7 thoughts on “Simpson’s Paradox not always such a paradox”

  1. Exactly – not an apples-to-apples comparison.

    The same thing arises in the meta-analysis of diagnostic tests versus RCTs.

    In RCTs the subgroups within studies are well defined and equal in distribution. In diagnostic tests patients are not randomized to have the disease and different studies often sample populations with different disease prevalences.

    But the inertia in the literature was so strongly inclined to _stratify_ as the obviously right thing to do that I created an anti-Simpson's paradox example: study populations were selectively sampled so that the stratified analysis gave the most wrong answer while the aggregate analysis gave the least wrong answer.

    But as an aside on the usual Simpson's paradox, I believe there are 2 1/2 key confusions: dimension reduction (going from numerator and denominator to a percent, that is, from vector addition to addition), the "noting versus changing" thing Pearl puts so well, and confounding.


  2. One useful way to think about a Simpson's Paradox example is to ask, "Which situation would I prefer, assuming that I don't know my underlying status?" In the kidney stones example cited in the WSJ article, suppose I don't know whether my kidney stone is small or large. I want the traditional surgery since it worked better on either large or small stones. In the airline example, I prefer Continental over United because they did better at every (studied) airport: if I know where I'm flying from, Continental does better at that airport, and if (for some odd reason) I don't know which airport I'll be using, Continental is still the better choice.

    But in the baseball example, I want Jeter over Justice because the appropriate measure of ability is overall, aggregate, batting average, not batting average in a smaller, one-year, sample. If I don't know which year it is, I still prefer Jeter, since I believe that he is a better hitter based on the aggregate data. Any Justice advantage in a given year can be attributed to chance. As always, in statistics the context matters. This is a Simpson's Paradox example, but one in which the aggregate picture is the one to consider.

    For the unemployment situation, I would rather be looking for work in 1982 than in 2009 — whether I know my educational status or not — since employment rates were better at each level of education in 1982 than in 2009. (This is not to deny Andrew's point that being a college grad today is less exclusive than it was in the past.)

    Jeff Witmer

  3. Jeff: But the point is that the demographic equivalent of "you" today, mapped back to 1982, might not have been a college grad.

  4. Great post. The lesson of Simpson's paradox is to always look at the data at several levels of aggregation. If the analyses lead to different conclusions, that will lead us to explore more, and understand the data better. If aggregation is always bad, then we really don't need to do statistics!

  5. I totally agree that there's a problem with the groups changing their meaning and not being an apples-to-apples comparison. I think the other issue here is that this is a pretty clear cut case of 'adjusting for an intermediate outcome' or 'controlling for an endogenous variable' or 'overadjustment' (depending on what tradition you're coming from)?

    The motivation for analysing by subgroup is to try and remove differences that are not due to the effect of the exposure we're worrying about – in this case variations in unemployment risk between 1982 and 2009.

    If we were talking about something like gender, I imagine there wouldn't be a problem stratifying. We can think of gender as a risk factor for unemployment and not being affected by unemployment or risk of unemployment. If we stratify we remove the confounding effect of gender, and look at the real within-group comparison of changes in unemployment across time.

    The problem with education is that unemployment risk is going to influence which stratum you get slotted into (particularly if there's a natural rate of unemployment). People are going to select or be selected into education based on employability or unemployability. So by controlling for education status I worry that we're going to be removing some of the real effect of unemployment risk over time that we're interested in.

  6. Andrew:

    I agree that the demographic equivalent of "me" today, mapped back to 1982, might not have been a college grad, but it isn't as if the change in the distribution of educational achievement is an interesting but independent factor in the story. Educational achievement has changed over time largely because of the changing workplace environment.

    If we replaced no HS/HS/some college/college by bottom 25%/next 25%/next 25%/top 25% (that is, if we did not change the weights in the weighted averages), then Simpson's Paradox would be mathematically impossible. Any example of Simpson's Paradox depends on a distribution changing, and that change is what is (potentially) interesting. One could say of the famous Berkeley admissions example that men and women have (or had) different departmental aspirations, but rather than saying "this is no longer apples-to-apples" I would say "Thus we aren't comparing apples-to-apples! That's extremely interesting and important!"

    Here is a hypothetical conversation that might capture the (dis?)agreement:

    Person A: "Things are worse today than 30 years ago b/c unemployment is worse at every level of education."
    Person B: "Things are not worse. Thank goodness we've pushed more people into the highest (college grad) category where unemployment is relatively low; thus the aggregate number is better today than it was 30 years ago."
    Person A: "But 30 years ago I could have gotten a decent job with a HS diploma. Now I have to be a college grad to get that same job. So my father ('me mapped back in time 30 years') got a job without going to college. It took me 4 years, lots of tuition money, lots of income forgone (the _real_ cost of college) to end up where my father ended up."

    I agree with your central point that in the unemployment example "the within-group comparisons are a bit of a mess." I even directed other people to read your post on this aspect of Simpson's Paradox, because I think it is a point well taken and often overlooked. But I maintain that I would rather be looking for work in 1982 than in 2009. The fact that the demographic equivalent of "me" today, mapped back to 1982 might not be a college grad only makes this more pronounced.

  7. You're right about preferring to look for work in 1982 rather than now, but for the wrong reason. The real reason is that at the height of the recession in 1982 the average length a person was unemployed was 16.6 weeks. Now it's 28.5 weeks. The median then was about 12 weeks and now it is 20 weeks.
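
The claim in comment 6 that Simpson's Paradox is impossible when the weights are held fixed can be checked in one line. The notation below is an assumption introduced for this sketch and does not appear in the discussion: $w_i$ is the fixed share of group $i$, and $r_i^{\text{then}}$ and $r_i^{\text{now}}$ are that group's unemployment rates in the two periods.

```latex
\[
\sum_i w_i \, r_i^{\text{now}} - \sum_i w_i \, r_i^{\text{then}}
  = \sum_i w_i \left( r_i^{\text{now}} - r_i^{\text{then}} \right) > 0
\quad \text{whenever } w_i \ge 0,\ \sum_i w_i = 1,\ \text{and }
r_i^{\text{now}} > r_i^{\text{then}} \text{ for every } i .
\]
```

So if every group's rate is higher and the weights are the same in both periods, the aggregate rate must be higher too. A reversal between the group-level and aggregate comparisons can only come from the weights themselves shifting.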
