
The difference between …

Bruce McCullough points out this blog entry from Eric Falkenstein:

Recently the Wall Street Journal has had several articles about estrogen’s link to heart disease in women, highlighting a recent New England Journal of Medicine article showing that it lowers risk of arterial sclerosis. Then last week, the Journal did a story concentrating on how the Women’s Health Initiative (WHI) misread the data by focusing on the increased heart attack risk for women over 70, while neglecting the lowered rate of heart attack for women under 60 (since the WHI’s 2002 report arguing that estrogen therapy actually raised heart disease–opposite sign to previous findings–hormone sales plummeted 30%). The WHI shot back in a letter to the WSJ, arguing they stand by their interpretation of the data, which they think is somewhat mixed, and in their words, the differences in heart disease between the older and younger (one up, one down!) is not ‘statistically significant’. If the difference isn’t statistically significant, I can’t see how the old cohort can be thought to have a higher than average risk (eg, if the sample estimate for the old is +14%, for the young, -30%, if the difference is noise, the +14% is certainly noise). As Paul Feyerabend argued, there are no definitive tests in science, as people just ignore evidence that goes against them, emphasizing the consistent results.

I don’t really have anything new to say about the Women’s Health Initiative, but I did want to point this out since it’s an interesting reminder about the difficulty of using statistical significance as a measure of effect size.

Just a couple of weeks ago I was meeting with some people who were doing a health study where effect A was positive and not statistically significant, effect B was negative and not statistically significant, but the difference was statistically significant. They had another comparison in their study where A was positive and statistically significant, B was negative and not statistically significant, and the difference was not statistically significant. They were struggling to figure out how to explain all these things. Rather than give some sort of “multiple comparisons correction” answer, I suggested the opposite: to graphically display all their comparisons of interest in a big grid, to get a better understanding of what their study said. Then they could go further and fit a model if they wanted.
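
To see how both patterns can arise, here is a minimal Python sketch. The numbers are made up for illustration (they are not from that study or from the WHI), and the function name summarize is hypothetical. It assumes the two estimates are independent and approximately normal, so the standard error of the difference is the square root of the sum of the squared standard errors, and it uses the conventional 1.96 cutoff.

```python
import math


def summarize(est_a, se_a, est_b, se_b, z_crit=1.96):
    """Report which of A, B, and A - B clear the |z| > z_crit threshold.

    Assumes the two estimates are independent, so their variances add
    when forming the standard error of the difference.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return {
        "A": abs(est_a / se_a) > z_crit,
        "B": abs(est_b / se_b) > z_crit,
        "A - B": abs(diff / se_diff) > z_crit,
    }


# Hypothetical case 1: neither effect is significant, but the difference is.
# A: z = 1.5, B: z = -1.5, difference: 3.0 / sqrt(2), about 2.12.
case1 = summarize(est_a=1.5, se_a=1.0, est_b=-1.5, se_b=1.0)

# Hypothetical case 2: A is significant, B is noisy, the difference is not.
# A: z = 2.5, B: z = -0.33, difference: 3.5 / sqrt(10), about 1.11.
case2 = summarize(est_a=2.5, se_a=1.0, est_b=-1.0, se_b=3.0)

print(case1)
print(case2)
```

In the second case the large standard error on B swamps the difference, which is why a significant A and a nonsignificant difference can coexist. The three yes/no answers are poor summaries of the three estimates and their uncertainties, which is the argument for plotting the estimates themselves.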

Falkenstein also writes,

Estrogen therapy helps women with symptoms of menopause, including hot flashes, bone loss, but also depression, wrinkles, vaginal dryness, and lower sexual desire. Though not mentioned in the WSJ articles, I think it is the latter issues are what really bothers the WHI. Women’s groups are fond of coming up with pretexts to desexualize women…

I don’t know enough about the WHI to answer this one, but I imagine that they want to be extra careful when assessing estrogen therapy, given the problems with earlier recommendations on this.


  1. ed says:

    I don't have anything to say about WHI either, but Falkenstein's post seems too simple-minded to me.

    He asserts that "If the difference isn't statistically significant, I can't see how the old cohort can be thought to have a higher than average risk." But of course this is quite possible, especially if the standard error for the younger cohort is large.

    And since we might expect the incidence of heart disease in the young cohort to be much lower than in the older cohort, this would lead to much less precise estimates of the sign of the treatment effect in the younger group. So a situation where the old group effect was stat signif, the young group showed the opposite sign, but the difference was not stat signif, wouldn't be that surprising.

    (But I totally agree with you about the perils of confusing statistical and substantive significance.)

  2. David Lloyd-Jones says:

    "depression, wrinkles, vaginal dryness, and lower sexual desire," hunh?

    And exactly which of these has an objective measure?