# Correcting statistical biases in “Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century”: We need to adjust for the increase in average age of people in the 45-54 category

In a much-noticed paper, Anne Case and Angus Deaton write:

This paper documents a marked increase in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change reversed decades of progress in mortality and was unique to the United States; no other rich country saw a similar turnaround.

Here’s the key figure:

I have no idea why they label the lines with three-letter abbreviations when there’s room for the whole country names, but maybe that’s some econ street code thing I don’t know about.

Anyway, the graph is pretty stunning. And for obvious reasons I’m very interested in the mortality of white Americans in the 45-54 age range.

But could this pattern be an artifact of the coarseness of the age category? A commenter here raised this possibility a couple days ago, pointing out that, during the period shown in the above graph (1989 to the present), the 45-54 bin has been getting older as the baby boom has been moving through. So you’d expect an increasing death rate in this window, just from the increase in average age.

How large is this effect? We can make a quick calculation. A blog commenter pointed out this page from the Census Bureau, which contains a file with “Estimates of the Resident Population by Single Year of Age, Sex, Race, and Hispanic Origin for the United States: April 1, 2000 to July 1, 2010.” We can take the columns corresponding to white non-Hispanic men and women. For simplicity I just took the data from Apr 2000 and assumed (falsely, but I think an ok approximation for this quick analysis) that this age distribution translates by year. So, for example, if we want people in the 45-54 age range in 1990, we take the people who are 55-64 in 2000.

If you take these numbers, you can compute the average age of people in the 45-54 age group during the period covered by Case and Deaton, and this average age does creep up, starting at 49.1 in 1989 and ending up at 49.7 in 2013. So the increase has been about .6 years of age.
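The shift-and-average step can be sketched in a few lines of Python. This is a minimal sketch, not the post's actual code. It assumes the April 2000 counts for white non-Hispanics are already loaded into a dict that maps single year of age to population; the loading step and the name `pop_2000` are hypothetical. Like the post, it ignores deaths and migration between 2000 and the target year.

```python
def shifted_counts(pop_2000, year, lo=45, hi=54):
    """Approximate counts by single year of age in `year` for ages lo..hi.

    Translates the April 2000 age distribution: people aged `a` in `year`
    are taken to be the people aged a + (2000 - year) in 2000. Deaths and
    migration in between are ignored, as in the post's rough approximation.
    """
    offset = 2000 - year
    return {age: pop_2000.get(age + offset, 0) for age in range(lo, hi + 1)}


def mean_age(counts):
    """Population-weighted mean of completed age over the bin."""
    total = sum(counts.values())
    return sum(age * n for age, n in counts.items()) / total


# Toy input (made up, NOT Census data): a flat population with extra
# people born 1946-1964, i.e. aged 36-54 in April 2000.
toy_pop_2000 = {age: (2 if 36 <= age <= 54 else 1) for age in range(0, 101)}

for year in (1990, 2013):
    print(year, round(mean_age(shifted_counts(toy_pop_2000, year)), 2))
```

The toy numbers only show the mechanism: as the larger birth cohorts move into the upper ages of the bin, its mean age rises. With the real counts, the post reports a rise from 49.1 in 1989 to 49.7 in 2013.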

How does this translate into the death rate? We can look up the life table at this Social Security website. At age 45, Pr(death) is .003244 for men and .002069 for women. At age 54, it’s .007222 for men and .004301 for women. So, in one year of age, Pr(death) is multiplied by approximately a factor of (.007222/.003244)^.1 = 1.08 for men and (.004301/.002069)^.1 = 1.08 for women; that is, an increase in Pr(death) of 8% per year of age.
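The per-year factor is a geometric-mean growth rate, which is easy to check in Python using only the four life-table values quoted above. The exponent .1 treats the span from 45 to 54 as ten years. Because ages 45 and 54 are nine years apart, an exponent of 1/9 gives a slightly larger factor (roughly 1.09 for men and 1.085 for women), so the rough conclusion of 8 to 9% per year of age still holds.

```python
# Pr(death) within one year, from the life-table values quoted in the post.
q = {
    "men": {45: 0.003244, 54: 0.007222},
    "women": {45: 0.002069, 54: 0.004301},
}


def per_year_factor(q_young, q_old, span):
    """Constant multiplicative increase per year of age that takes q_young
    to q_old over `span` years (geometric-mean growth)."""
    return (q_old / q_young) ** (1.0 / span)


for sex, rates in q.items():
    as_in_post = per_year_factor(rates[45], rates[54], 10)  # exponent .1
    nine_years = per_year_factor(rates[45], rates[54], 9)   # 45 and 54 are 9 years apart
    print(f"{sex}: {as_in_post:.3f} with exponent .1, {nine_years:.3f} with exponent 1/9")
```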

The above calculations are only approximate because they’re using life tables for 2011, and for the correct analysis you’d want to use the life table for each year in the study. But I’m guessing it’s close enough.

To continue . . . in the period graphed by Case and Deaton, average age increases by about 0.6 years, so we’d expect Pr(death) to increase by about .6*8%, or about 5%, in the 45-54 age group, just from the increase of average age within the cohort as the baby boom has passed through.
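The .6*8% shortcut is a linear approximation to compounding a constant 8% increase per year of age over 0.6 years. A short check in plain Python, using only the numbers from the post, shows that the two versions differ by about a tenth of a percentage point and both round to about 5%.

```python
shift_years = 0.6  # rise in mean age of the 45-54 bin (49.1 -> 49.7)
per_year = 0.08    # ~8% rise in Pr(death) per year of age

linear = shift_years * per_year                 # back-of-envelope .6 * 8%
compounded = (1 + per_year) ** shift_years - 1  # constant 8%/year, compounded

print(f"linear: {linear:.1%}, compounded: {compounded:.1%}")
```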

Doing the calculation a bit more carefully using year-by-year mortality rates, we get this estimate of how much we’d expect death rates in the 45-54 age range to increase, just based on the increase in average age as the baby boom passes through:
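The year-by-year version amounts to weighting single-year-of-age death rates by the bin's age distribution in each year while holding the rates fixed. This sketch rests on two assumptions. First, the single-year rates are interpolated log-linearly between the two male values quoted above, as a stand-in for the full life table the post used. Second, `counts_by_year` is a hypothetical input you would build from the shifted Census counts; the numbers below are made up.

```python
import math


def loglinear_rates(q45, q54):
    """Stand-in single-year Pr(death) for ages 45-54: log-linear interpolation
    between the two life-table endpoints (not the actual SSA table)."""
    step = math.log(q54 / q45) / 9
    return {age: q45 * math.exp(step * (age - 45)) for age in range(45, 55)}


def expected_bin_rate(counts, rates):
    """Death rate of the 45-54 bin with age-specific rates held fixed:
    the population-weighted average of the single-year rates."""
    total = sum(counts.values())
    return sum(counts[age] * rates[age] for age in counts) / total


def relative_to_first_year(counts_by_year, rates):
    """Expected bin rate in each year divided by the first year's rate,
    i.e. the change attributable purely to age composition."""
    years = sorted(counts_by_year)
    base = expected_bin_rate(counts_by_year[years[0]], rates)
    return {y: expected_bin_rate(counts_by_year[y], rates) / base for y in years}


rates = loglinear_rates(0.003244, 0.007222)  # men, values quoted in the post
# Hypothetical age mixes: flat in the first year, tilted older later.
counts_by_year = {
    1990: {age: 100 for age in range(45, 55)},
    2013: {age: 100 + 5 * (age - 45) for age in range(45, 55)},
}
print(relative_to_first_year(counts_by_year, rates))
```

With real counts for each year from 1989 to 2013 and actual single-year rates, this kind of ratio traces out a curve like the one described here. The toy numbers only show that an older-tilted bin has a higher expected death rate even when no single-year rate changes.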

This is actually not so different from the “US Whites” line in the Case-Deaton graph shown above: a slight decrease followed by a steady increase, with a net increase in death rate of about 5% for this group. Not identical—the low point in the actual data occurs around 1998, whereas the low point is 1993 in my explain-it-all-by-changes-in-age-composition graph—but similar, both in the general pattern and in the size of the increase over time.

But Case and Deaton also see a dramatic drop in death rates for other countries (and for U.S. Hispanics), declines of about 30%. When compared to these 30% drops, a bias of 5% due to increasing average age in the cohort is pretty minor.

Summary

According to my quick calculations, the Case and Deaton estimates are biased because they don’t account for the increase in average age of the 45-54 bin during the period they study. After we correct for this bias, we no longer find an increase in mortality among whites in this category. Instead, the curve is flat.

So I don’t really buy the following statement by Case and Deaton:

If the white mortality rate for ages 45−54 had held at their 1998 value, 96,000 deaths would have been avoided from 1999–2013, 7,000 in 2013 alone. If it had continued to decline at its previous (1979‒1998) rate, half a million deaths would have been avoided in the period 1999‒2013.
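Counterfactual counts like these come from comparing observed deaths with the deaths implied by holding the bin's death rate at a baseline value. The following is a minimal sketch with hypothetical inputs, not Case and Deaton's code. Because it holds the crude 45-54 rate fixed, any rise in that rate caused by an aging bin is counted as excess deaths.

```python
def excess_deaths(observed_rates, populations, baseline_rate):
    """Deaths above those implied by holding the bin's death rate at baseline.

    observed_rates: {year: deaths per person aged 45-54}
    populations:    {year: number of people aged 45-54}
    baseline_rate:  rate held fixed in the counterfactual (e.g. the 1998 rate)
    """
    return sum(
        (observed_rates[year] - baseline_rate) * populations[year]
        for year in observed_rates
    )


# Made-up illustration: two years of a bin of one million people.
print(round(excess_deaths({1999: 0.0041, 2000: 0.0042},
                          {1999: 1_000_000, 2000: 1_000_000},
                          baseline_rate=0.0040)))
```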

According to my above calculation, the observed increase in death rate in the 45-54 cohort is roughly consistent with a constant white mortality rate for each year of age. So I think it’s misleading to imply that there were all these extra deaths.
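One standard way to remove this composition effect is direct age standardization, which applies each year's age-specific rates to a single fixed standard population. The example below is a generic illustration with made-up numbers, not the post's or Case and Deaton's calculation. It shows a crude 45-54 rate rising even though the rate at every single age falls by 3%.

```python
def weighted_rate(rates, pop):
    """Average of age-specific rates, weighted by the counts in `pop`."""
    total = sum(pop.values())
    return sum(rates[age] * n for age, n in pop.items()) / total


ages = range(45, 55)
# Hypothetical age-specific rates: every age is 3% lower in the later year.
rates_early = {a: 0.003 * 1.08 ** (a - 45) for a in ages}
rates_late = {a: 0.97 * r for a, r in rates_early.items()}
# Hypothetical age mix: the bin is younger early on and older later.
pop_early = {a: 100 + 10 * (54 - a) for a in ages}
pop_late = {a: 100 + 10 * (a - 45) for a in ages}
standard = {a: 100 for a in ages}  # one fixed standard population

# Crude rates use each year's own age mix; directly standardized rates use
# the same standard population in both years.
crude_change = weighted_rate(rates_late, pop_late) / weighted_rate(rates_early, pop_early)
std_change = weighted_rate(rates_late, standard) / weighted_rate(rates_early, standard)
print(f"crude: {crude_change - 1:+.1%}, standardized: {std_change - 1:+.1%}")
```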

However, Case and Deaton find dramatic decreases in mortality rates in other rich countries, decreases on the order of 30%. So, even after we revise their original claim that death rates for 45-54’s are going up, it’s still noteworthy that they haven’t sharply declined in the U.S., given what’s happened elsewhere.

So, one could rewrite the Case and Deaton abstract to something like this:

This paper documents a marked ~~increase~~ flattening in the all-cause mortality of middle-aged white non-Hispanic men and women in the United States between 1999 and 2013. This change ~~reversed~~ ended decades of progress in mortality and was unique to the United States; no other rich country saw a similar ~~turnaround~~ stasis.

Still newsworthy.

P.S. Along similar lines, I’m not quite sure how to interpret Case and Deaton’s comparisons across education categories (no college; some college; college degree), partly because I’m not clear on why they used this particular binning but also because the composition of the categories has changed during the period under study. The group of 45-54-year-olds in 1999 with no college degree is different from the corresponding group in 2013, so it’s not exactly clear to me what is learned by comparing these groups. I’m not saying the comparison is meaningless, just that the interpretation is not so clear.

P.P.S. See here for a response to some comments by Deaton.

P.P.P.S. And still more here.

## 57 thoughts on “Correcting statistical biases in “Rising morbidity and mortality in midlife among white non-Hispanic Americans in the 21st century”: We need to adjust for the increase in average age of people in the 45-54 category”

1. What does the age-adjusted US Hispanic line look like, is that also falling by ~30%? Or are they getting younger on average within the 45-54 bracket?

• Poochie:

I don’t know—it would be more effort to get this because it wouldn’t be so reasonable to just take the 1990 population and shift it—but I can’t imagine the bias correction would be much more than 5% in either direction.

2. >”At age 45, Pr(death) is .003244 for men and .002069 for women. At age 54, it’s .007222 for men and .004301 for women. So, in one year of age, Pr(death) is multiplied by approximately a factor of (.007222/.003244)^.1 = 1.08 for men and (.004301/.002069)^.1 = 1.08 for women—that is, an increase in Pr(death) of 8% per year of age.”

Can you explain why you do this calculation rather than (.007222-.003244)/(54-45) = .000442?

I.e., mortality rates increase by ~44 deaths per 100,000 males for each year of age. I don’t mean the conversion to per 100k, but why did you choose to look at percentage increase?

• Anon:

Percentage change just seemed like the safest thing to look at. I don’t have much intuition about the absolute mortality rate but I understand what 8% is. But if you use raw numbers you should get similar results.
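A quick way to see that the two choices give similar answers is to compute the bias both ways for men. This sketch uses the life-table values quoted above. The reference age of 49.5 (the middle of the bin) is an assumption, and the multiplicative factor here uses the nine-year gap between ages 45 and 54.

```python
q45, q54 = 0.003244, 0.007222  # men, life-table values quoted in the post
shift = 0.6                    # rise in mean age of the bin, from the post
ref_age = 49.5                 # assumed reference age: middle of the bin

# Additive model: Pr(death) rises by a constant amount per year of age.
slope = (q54 - q45) / (54 - 45)   # about .000442 per year
q_ref = q45 + slope * (ref_age - 45)
additive = slope * shift / q_ref  # relative increase over 0.6 years

# Multiplicative model: Pr(death) rises by a constant percentage per year.
factor = (q54 / q45) ** (1 / (54 - 45))
multiplicative = factor ** shift - 1

print(f"additive: {additive:.1%}, multiplicative: {multiplicative:.1%}")
```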

3. Didn’t other countries have baby booms at a similar time also? It was the end of a *world* war. I guess you’d have to adjust for changes in composition across all of the countries studied to do a good comparison.

• This was my first thought too. It seems that the real difference that was observed would be that US mortality, adjusted for age, is flat – and all the other countries are declining on that basis.

• Dave:

I didn’t have such easy access to the data from other countries. I’m guessing the baby boom was less in those countries compared to the U.S., but even if it’s the same size, you’d end up with something like the same 5% bias, which is pretty small compared to the 30% declines shown in Case and Deaton’s graph.

• I would have expected the proportion of baby boomers to be higher in the UK, Canada, NZ, and Oz simply because a higher proportion of men from those countries were involved in the fighting. Maybe proportionately more American soldiers came home alive.

The graph seems a little hard to believe – my suspicion would be that the definition of a USW changed in 1998.

• If other countries are decreasing despite the upward bias of increasing average age in the age bracket the result would be even stronger, right?

• >”the result would be even stronger”

Z, the (scientifically legitimate) goal is not to see whether there is a difference between different groups. It is to accurately measure and describe the difference. This paper did not do that, so their result cannot get “stronger”.

But yes, if this binning artifact also exists in the international data, the year-on-year decrease in mortality rate would be greater. In a later comment in that previous thread I looked at single-age mortality and population for US Hispanics; there was not a visible baby boom effect in that data. I don’t know about the others:

• Many countries had heavy infrastructure losses and a lot of poverty and displacement in the immediate aftermath of the war, so if there were a boom, I’d expect it to be a little later once people’s lives had stabilized. The US suffered much less and thus definitely had an immediate boom.

• Daniel’s comments make sense to me. Also, I would guess that the death toll on men of child-bearing age in the UK, France, and Germany (possibly also Sweden) during WWII was higher than for men from the US, Canada, and Australia, which would also contribute to a smaller, later, or non-existent baby boom in the European countries.

• Canada and Australia, from the above graph, had the same lack of infrastructure losses as the USA (you could add the New Zealand data to that group as well). All these countries shared the “war was fought overseas” experience. Eyeballing birth rates from national-level sources, they seem to have had very similar baby boom patterns.

• There were probably nutritional differences between the USA and Can/Oz/NZ/UK in the years after the war. NZ rationing didn’t end until 1950, not because we didn’t have enough food, but because we were still shipping tons of food to Europe. Meanwhile, I remember a story about doctors noticing a difference in artery health in young men between the Korean and Vietnam Wars, which they put down to the emergence of fast food in the US.

• I don’t know about Continental Europe, but the British baby boom wasn’t as dramatic as the American one.

• Dave,

Canada had a baby boom, but with a noticeable lag in starting compared to the US boom. No idea about other countries.

While it is fairly accurate to define the U.S. baby boom as having taken place in the period between 1946 and 1964, that is definitely not true for Canada. When one graphs the number of live births in Canada, it is quite clear that the “boom” years went from 1952 to 1965 (inclusive).


• I’m not seeing a lag in the rate of births v. total population, and it seems to track the same as the US. I don’t know enough about Canadian immigration history etc. to know whether large-scale adult immigration changed, but since the rate from the late 40s is higher than the rate from the 50s, it suggests to me that for the raw numbers to be highest in the 50s (as per the article you linked to), there must have been a lot more total population around as well (and it increased in proportion to the number of added babies).

• Figure 12 doesn’t substantially disagree with my (model based) prediction that other countries would have lower and/or later booms. Also, that the booms would be related to degree of damage.

It would be interesting to see a “figure 12” like graph for: France, Denmark, Austria, Germany, Norway… and to compare to the eastern bloc etc: Russia, Poland, East Germany, Hungary, Czechoslovakia etc.

Obviously what happened in the aftermath of WWII varied a lot, and there was still plenty of time for lots of political instability, economic instability, etc. between 1945 and 1985.

4. With regards to your postscript, my coauthors and I have a paper in which we test for changes in the mortality-education gradient while accounting for changes in the distribution of education over time (https://dl.dropboxusercontent.com/u/993359/JHE_revision.pdf). After we attempt to correct for the biases raised in your post, we find that the change in the gradient over the period 1984-2006 is not as strong as previous research suggests.

5. Great stuff. I’d also love for someone to see whether changes in the share of Hispanics who are ID’ed as white on death certificates can explain anything. If fewer Hispanics were ID’ed as white by coroners or their families, that would explain the decline in death rates for Hispanics and the rise among non-Hispanic whites (because Hispanics have lower death rates in this age group). As Amitabh Chandra noted on Twitter, if true then we should see regional differences depending on how prevalent Hispanics are and how their numbers changed over time.

6. A wrinkle in this discussion is looking at the causes of death that have increased. Interestingly, the two things that have changed drastically are suicides and poisoning (alcohol and drug overdoses). Drug overdoses are probably being driven by the increased use of painkillers. Neither of these causes has a strong age gradient, not in the way, say, that cardiovascular disease does (but because we can now treat some diseases like cancer better, more people may live with chronic pain and thus use painkillers, and older people will be more likely to have survived these events…).

And the probabilities of death per year have declined from 1990 to 2010, which means that perhaps the results are not as biased as suggested. The increase in the age of the 45-54 year old cohort over time may be offset by the fact that the probability of dying has decreased. Here are the numbers for the whole population (I don’t have quick access to life tables by race for 1990).

The probability of death from the 1990 SS life table is 0.00416 for males at 45 and 0.00900 at 54, so the probability of death increases 8% per year of age. For females, 0.00218 and 0.00514 give 9%; the average across genders is 8.5%. The probability increases by about 0.00039 per year, averaged across genders.

From the 2010 life tables: 0.00363 for males at 45 and 0.00632 for males at 54, so the probability of death increases 5.7% per year for males. For women, 0.00201 and 0.00403 give 7%; the average across genders is 6.4%. The probability increases by about 0.000236 per year.

So, assuming race isn’t interfering with these numbers (which it may!), (0.000236*4.7)/(0.00039*4.1) = .7, which means that average age increase in the cohort might be more than offset by the decline in the probability of dying for each year longer you live. Maybe the numbers are downwardly biased. Someone needs to do this by race to see what’s actually going on.
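The per-year figures in this comment can be reproduced from the quoted life-table values. This check assumes, as the numbers imply, that each figure was computed over a ten-year span, the same exponent .1 used in the post.

```python
# Single-year Pr(death) quoted in the comment above (whole population,
# SSA period life tables), as (age 45, age 54) pairs.
tables = {
    1990: {"male": (0.00416, 0.00900), "female": (0.00218, 0.00514)},
    2010: {"male": (0.00363, 0.00632), "female": (0.00201, 0.00403)},
}
SPAN = 10  # the comment's figures match a ten-year span (exponent .1)

for year, by_sex in tables.items():
    # Percentage increase per year of age, by sex.
    pct = {s: (q54 / q45) ** (1 / SPAN) - 1 for s, (q45, q54) in by_sex.items()}
    # Absolute increase per year of age, averaged across sexes.
    absolute = [(q54 - q45) / SPAN for q45, q54 in by_sex.values()]
    print(
        year,
        {s: f"{p:.1%}" for s, p in pct.items()},
        f"mean absolute increase: {sum(absolute) / len(absolute):.6f}",
    )
```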

7. I’ve been working on this, too. I’m about to post my estimates showing age composition change accounts for about half of the increase they report.

8. Good point. When I was a child, Hispanics were considered white. Sure, some were a little dark skinned, but no more so than some Greeks or Middle Easterners. We didn’t even think of Hispanics as a group — ethnicity was mostly a matter of the country of ancestors, with groupings Scandinavian or Middle Eastern used for some purposes but not others. “Hispanic” came later. Even in the mid-sixties, Mexican, Puerto Rican, and Cuban were considered different ethnic groups.

9. It looks to me like age composition change accounts for about half of the rise in mortality they report. They really should have adjusted for age.

10. This effect is likely related to something I’ve observed about Case and Deaton’s numbers: the increases in death rates are worse for Baby Boomers than for those born before and after:

http://www.unz.com/isteve/is-there-a-generational-explanation-for-rising-white-death-rates/

The increase in death rates for each cohort from the Big 3 rising causes (overdoses, suicides, and liver troubles) is closely related to being a Baby Boomer. For example, the sharpest rise in death rates from the Big 3 was seen among the 50-to-54 age group: in 1998 its members had been born between 1944 and 1948, but by 2013 the 50-to-54 group had been born between 1959 and 1963.

This is probably due in part to cultural changes related to sex, drugs, and rock and roll (people born in 1959-1963 were exposed to a lot more drugs in high school than people born in 1944-1948), but also in part to this post’s statistical point about the cohort aging slightly on average due to different birth cohort sizes.

11. Andrew,
I just did a quick check using CDC WONDER data, and your idea checks out nicely. White Non-Hispanic all-cause mortality for 45-47 year-olds is flat; for 52-54 year-olds it’s pretty flat (tiny increase). Non-Hispanic Blacks and Asians, and Hispanics, all show nice declines. The one group with real increases (like 25%) was American Indians. You’ll probably like playing with the data yourself at http://wonder.cdc.gov/ucd-icd10.html If you need any help, email me at cbarber@hsph.harvard.edu The increases in middle-aged deaths by suicide and by drug overdose are definitely real, though. Personally, I think it makes sense to think through when age-adjusting (as opposed to stratifying) makes sense and when it doesn’t. When examining things that have a biological basis for increasing with age (like heart disease), age-adjusting makes sense. But when examining things that are only associated with age via cultural factors and where the association varies across groups (e.g., for whites suicide rates are highest among elderly males but not females; among American Indians they’re highest among young males) it probably doesn’t, and instead it probably makes sense to examine trends within fine strata.
I tried to paste in the graphs here, but it looks like I can’t. Email me if you’d like me to send them.

• The sharp increase in overdose death rates is probably not just a statistical artifact, although this post has provided a public service in pointing out the purely statistical reason for a partial explanation.

There were policy changes around the turn of the century that made it easier to get synthetic opiate pain killers, followed by more recent tightening up of availability that caused some to shift to heroin (e.g., the late Philip Seymour Hoffman).

Moreover, the cohorts hit hardest by the causes of death identified by Case and Deaton as the big 3 (overdoses, suicides, and liver failures) tended to be those who turned 18 around 1969-1979, i.e., the peak of the Drug Era.

So you had a white cohort who, at age 18, saw drugs as cool, as something that celebrities did, as something that belonged to their Generation rather than to their alcoholic elders. As they hit middle-aged aches and pains, they are more likely to turn to prescription painkillers and then to heroin.

12. Quickly graphed the single year age death rates (each age from 45 to 54) from the human mortality database for the US from 1999 to 2013 for men, women and total. No clear trend of rising rates at any age, mainly flat for women and some downward trends for men. Admittedly this is whole population not just whites.

• Yes, you need to look at non-Hispanic whites specifically.

A general lesson from this disturbing story is how almost nobody noticed the general problem that white death rates for the middle-aged weren’t continuing to fall, unlike most other groups and cohorts. Why did this big story get so little attention until 2015, even though the big spike was in 1999-2002?

There are countless organizations devoted to scouring the statistics looking for inequalities harming blacks and other minorities. But for an organization to have a mission of keeping an eye out for the welfare of whites seems kind of disreputable in this day and age, the kind of thing that might get you blacklisted by the SPLC as a hate group. If Deaton hadn’t just won the Nobel quasi-Prize for Econ, this story might have gotten ignored as well.

• Steve:

Luckily, 45-54-year-old white men are overrepresented among the blogging class.

• But, unluckily, there’s not a lot of humanitarian solidarity between white male members of the blogging class and white male members of the working class, so this story was largely missed for many years by us writer folks on the right side of the bell curve. For example, Charles Murray tweeted yesterday that he missed this death rate problem even though he wrote a 2012 book, Coming Apart, on the struggles of the white working class.

13. Buckles and Hungerman: SEASON OF BIRTH AND LATER OUTCOMES: OLD QUESTIONS, NEW ANSWERS

http://www.mitpressjournals.org/doi/pdf/10.1162/REST_a_00314

Wherein it is argued that socioeconomic groups “select” into certain months of birth (err…conception, thus birth) and this is what drives seasonal differences in labor market outcomes in the U.S. They show that the teen birth rate is relatively high in some months compared to others, and that there are differences by racial group.

As for the teens: There are more births in some months than other months, going way back. Thus, the population of, say, 16 year olds is older (on average) at some points in the year than at other points in the year (because of the relative weighting of 16y/o and 1 month compared to 16 y/o and 11 months, and the like).

Fertility hazard is rising throughout the teens, so in some months when the 16 y/os are “old” they are more likely to give birth than in other months. Repeat for 17, 18 and 19 y/os. If seasonal fertility is stable, we would EXPECT more teenage births in some months than in others, based on the age-within-years differentials for the whole teenager group across month of potential birth.

I also think there is a chance that differential fertility patterns across months (in the past) by racial/ethnic group could explain those apparent seasonal effects too. I’ve never had the time to check it, but using older natality files it would be easy to predict the differences in ages for “16 y/o’s” in each month, and then predict the differential fertility based on hazard rates, and then predict seasonality of teen births caused by the variation in the age of teenagers across potential birth months.
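The age-within-year mechanism in this comment can be sketched directly. This is a minimal illustration under three assumptions: everyone is born on the first of their birth month, ages are measured on the first of each calendar month, and the same seasonal birth pattern repeats every year; the monthly counts are made up. Whenever births are seasonal, the mean time since last birthday for a single year of age varies across calendar months. With a fertility hazard that rises through the teens, that variation alone would produce some seasonality in teen births.

```python
def mean_months_past_birthday(births_by_month, calendar_month):
    """Average months since last birthday, on the first of `calendar_month`,
    among people of one single year of age (e.g. all 16-year-olds).

    births_by_month: 12 relative birth counts, index 0 = January.
    Assumes births fall on the first of the month and the same seasonal
    pattern holds in every birth year.
    """
    total = sum(births_by_month)
    return sum(
        n * ((calendar_month - birth_month) % 12)
        for birth_month, n in enumerate(births_by_month)
    ) / total


flat = [1] * 12
# Hypothetical pattern with a late-summer/early-fall birth peak.
seasonal = [8, 8, 8, 8, 9, 9, 10, 11, 11, 10, 9, 9]
for month in range(12):
    print(month, round(mean_months_past_birthday(flat, month), 2),
          round(mean_months_past_birthday(seasonal, month), 2))
```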

14. I would have thought the same increase in average age would have been occurring in all the western countries, so the argument would have to be that it is occurring at a faster rate in the US – and that seems implausible.

• Margaret:

As discussed in the comments, the baby boom was not the same in the U.S. as in other countries. In any case, the differences between the U.S. and other countries are much bigger than the aggregation bias. The bias correction changes the “increasing mortality rate in the U.S.” story but it doesn’t change the comparison to other countries.

• If the baby boom happened at different times in other countries, couldn’t that actually increase the difference when comparing the mortality patterns?

• Israel:

Yes, the bias correction will induce some changes to the between-country comparisons, but these changes are small compared to the big difference we see between the U.S. and other countries.

15. “I have no idea why they label the lines with three-letter abbreviations when there’s room for the whole country names, but maybe that’s some econ street code thing I don’t know about.”

Love the attitude, better to be clear and explain well than to look cool.

16. I too wondered about effects of the age group bin selection and possible age adjustment bias, so I appreciate the discussion on these questions. I’m a little perturbed about the categories used in the analysis and Fig. 1. Is it fair to compare the US whites and US Hispanics with entire populations of the six countries? I would like to see how the mortality trend of the entire US population (45-54 group) compares to those of the other countries. What are the trends for blacks and Asians? Also, are there ethnic/educational/geographic subgroups in the other countries with mortality trends that differ strongly from the averages shown in Fig. 1?

The similarity of mortality trends for men and women noted on page 2 is to me counter-intuitive. Is this the case only for non-Hispanic whites, or is it true across sub-groups and for the US population combined?

Finally, I wonder whether occupation is a factor. In particular, Americans who were 45-54 during period when mortality increased the fastest (around 1998-2008) were pretty likely to be Vietnam veterans.

• “Americans who were 45-54 during period when mortality increased the fastest (around 1998-2008) were pretty likely to be Vietnam veterans.”

I’d wondered something related but not the same: a large portion of the men aged 45-54 in 1998-2008 were of draft age during the Vietnam War. That would have affected the characteristics of this age cohort in (at least) the following ways:
- Some able-bodied men were killed during the war, while those classified 4-F, which includes many in poor health, survived to middle age.
- Many men who served in the war were injured, exposed to Agent Orange or similar health hazards, or mentally scarred during the war; these would be at higher than average risk for early death.

17. “Many men who served… these would be at higher than average risk for early death.”

I agree. And it’s not only those who served and were damaged that may have been at higher risk, but their spouses as well. It may be possible to test the hypothesis.

18. Are there finer ethnic groupings within USW? Or is it possible to look at US-born white and non-US born white?

The definition for “white” is “having origins in any of the original peoples of Europe, the Middle East, or North Africa. It includes people who indicated their race(s) as ‘White’ or reported entries such as Irish, German, Italian, Lebanese, Arab, Moroccan, or Caucasian”. If lots of Middle Eastern people have come to the USA after the wars in the Middle East, then perhaps they are skewing the mortality rates (assuming they are dying younger because of their experiences in the wars).

19. So does this mean that there was no Baby Boom in births for Hispanics and Blacks?

If so, that’s a bit of data I did not hear about. Could someone update me on this?

• lightly:

“So does this mean that there was no Baby Boom in births for Hispanics and Blacks?”

It’s not clear what the “this” in this sentence is referring to. Please clarify.

20. Paul Krugman in today’s Times (Monday 11/9) uses this study to frame his column.

http://www.nytimes.com/2015/11/09/opinion/despair-american-style.html

“In particular, I know I’m not the only observer who sees a link between the despair reflected in those mortality numbers and the volatility of right-wing politics.”

I think he’s more persuasive when he uses good economics to support his conclusion.

• Krugman’s gonna Krug.

21. Why did nobody comment on the most obvious explanation for this graph: the use of misleading subgroup analyses? Why is the total population of other countries compared with only a select subgroup of the US? Why are Hispanics not considered white in the US, when in all other countries they may well be? If you combine the two lines, and throw in all the other Americans left out (African Americans, Asians, etc.), we may very well see a line that is consistent with the other regions. The fact that this was not presented here seems to point to a typical example of how to lie with statistics.