“Light Privilege? Skin Tone Stratification in Health among African Americans”

Kevin Lewis points us to this article by Taylor Hargrove, which states:

Although skin color represents a particularly salient dimension of race, its consequences for health remains unclear. The author uses four waves of panel data from the Coronary Artery Risk Development in Young Adults study and random-intercept multilevel models to address three research questions critical to understanding the skin color–health relationship among African American adults (n = 1,680): What is the relationship between skin color and two global measures of health (cumulative biological risk and self-rated health)? . . .

The findings indicate that dark-skinned women experience more physiological deterioration and self-report worse health than lighter skinned women. These associations are not evident among men, and socioeconomic factors, stressors, and discrimination do not explain the dark-light disparity in physiological deterioration among women. Differences in self-ratings of health among women, however, are generally explained by education and income.

When I read this, my first thought was that this is a topic worth studying (I don’t know the literature on this stuff, so I can’t comment on what’s come before), and my second thought was that the difference between “significant” and “not significant” is not itself statistically significant.
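To make that second thought concrete, here is a minimal sketch with made-up numbers (they are not taken from the paper). It assumes two independent estimates with the same standard error, one for women and one for men, and compares each to zero and then to each other. The names `z_score` and `difference_z` are just illustrative helpers.

```python
import math

def z_score(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error

def difference_z(est_a, se_a, est_b, se_b):
    """z-score for the difference of two independent estimates."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (est_a - est_b) / se_diff

# Hypothetical estimates (estimate, standard error); NOT from the paper.
women = (25.0, 10.0)
men = (10.0, 10.0)

z_women = z_score(*women)            # 2.5: "significant" at the 5% level
z_men = z_score(*men)                # 1.0: "not significant"
z_diff = difference_z(*women, *men)  # about 1.06: the gap itself is not significant

print(f"women: z = {z_women:.2f}, men: z = {z_men:.2f}, difference: z = {z_diff:.2f}")
```

One group clears the 1.96 threshold and the other does not, yet the comparison between them is well within the noise.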

Having skimmed through the paper, I have a lot of problems with the details. Just for example, one of the biomarkers is “waist circumference (1 = >88 cm in women and >120 cm in men).” But that doesn’t sound right. If the issue is being fat, wouldn’t you want some more continuous measure? It’s also not clear to me why they discretize skin color into only three levels.

More generally, I think the strategy of going through results and pulling out statistically significant comparisons isn’t going to work. I’m not exactly talking about “p-hacking” here—it’s not that I think these researchers are fishing around looking for something statistically significant to sell—my problem is more that statistical-significance filtering is a noise amplifier.
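A small simulation can illustrate the noise-amplifier point. The sketch below assumes a hypothetical true effect of 1.0 measured with a standard error of 2.0 (both numbers are invented for illustration), draws many noisy estimates, and keeps only those that pass the usual 5% threshold. The function name `significance_filter` is made up for this example.

```python
import random
import statistics

def significance_filter(true_effect, std_error, n_sims=20000, seed=1):
    """Simulate noisy estimates and keep only the 'significant' ones.

    Assumes true_effect > 0, so that a negative kept estimate has the wrong sign.
    """
    rng = random.Random(seed)
    threshold = 1.96 * std_error
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_sims)]
    kept = [e for e in estimates if abs(e) > threshold]
    return {
        "share_significant": len(kept) / n_sims,
        "mean_all": statistics.fmean(estimates),
        "mean_abs_kept": statistics.fmean(abs(e) for e in kept),
        "share_wrong_sign": sum(e < 0 for e in kept) / len(kept),
    }

# Hypothetical setting: small true effect, noisy measurement.
result = significance_filter(true_effect=1.0, std_error=2.0)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Averaged over all the simulated estimates, the true effect is recovered. Every estimate that survives the filter, though, must be at least 3.92 in absolute value, so the reported effects overstate the true effect of 1.0 by a factor of nearly four or more, and some of them have the wrong sign.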

I think it would make more sense to use these data to answer specific questions in a focused way, or to perform more clearly exploratory analyses that display all the data.

The current paper is a mix of both, which I don’t think works at all. Just for example:

Additionally, the coefficient for dark skin among women in Model 2 is reduced in magnitude (by approximately 37 percent) and to statistical nonsignificance at the .05 α level, suggesting that skin color differences in education and income explain the dark-light gap in self-rated health among women.

From a statistical point of view, this analysis doesn’t really make sense. To put it another way: this sort of statistical procedure has poor frequency properties, in that if it is used repeatedly, it will often give wrong answers.
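Here is a rough simulation of what "poor frequency properties" can look like. It is a sketch under strong simplifying assumptions, not a reanalysis of the paper: plain least squares instead of the paper's multilevel models, a single predictor `x` with a hypothetical true coefficient of 0.2, and a control `z` that is correlated with `x` but has no effect on the outcome at all. The rule being tested is "if the coefficient is significant in Model 1 and not in Model 2, the controls explain the gap."

```python
import math
import random

def cross(u, v):
    """Centered sum of cross-products of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def one_trial(rng, n, beta, rho):
    """Return (significant in Model 1, significant in Model 2) for the x coefficient."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    # z is correlated with x (correlation rho) but does not affect y.
    z = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
    y = [beta * xi + rng.gauss(0, 1) for xi in x]

    sxx, szz, sxz = cross(x, x), cross(z, z), cross(x, z)
    sxy, szy, syy = cross(x, y), cross(z, y), cross(y, y)

    # Model 1: y on x.
    b1 = sxy / sxx
    se1 = math.sqrt((syy - b1 * sxy) / (n - 2) / sxx)

    # Model 2: y on x and z.
    det = sxx * szz - sxz ** 2
    b2 = (szz * sxy - sxz * szy) / det
    bz = (sxx * szy - sxz * sxy) / det
    rss2 = syy - b2 * sxy - bz * szy
    se2 = math.sqrt(rss2 / (n - 3) * szz / det)

    # 1.96 is the normal approximation to the 5% two-sided cutoff.
    return abs(b1 / se1) > 1.96, abs(b2 / se2) > 1.96

def false_explained_rate(n=200, beta=0.2, rho=0.7, n_sims=1000, seed=1):
    """Among runs significant in Model 1, the share that lose significance in Model 2."""
    rng = random.Random(seed)
    sig1 = lost = 0
    for _ in range(n_sims):
        s1, s2 = one_trial(rng, n, beta, rho)
        if s1:
            sig1 += 1
            lost += not s2
    return lost / sig1

rate = false_explained_rate()
print(f"share of 'explained away' results when nothing was explained: {rate:.2f}")
```

In this setup the control explains none of the effect, so any run where the coefficient drops to nonsignificance is a wrong answer under the rule. Adding a correlated control inflates the standard error, and that alone is enough to make it happen in a substantial share of runs.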

I feel kinda bad about saying this, as the paper in question does not seem like a bunch of hype, nor do I see any sign of research misconduct. Still, honesty and transparency are not enough. If the ultimate goal of this research is to learn what we can about skin color and health, I recommend looking at all the data, not selecting on statistical significance, and using multilevel models to get better estimates (see here).


  1. Bill Spight says:

    My first thought was that social factors might be difficult to identify. It might be interesting to research the same questions in other cultures, such as Brazil and India, in which light skin is favored.

  2. LemmusLemmus says:

    “Just for example, one of the biomarkers is ‘waist circumference (1 = >88 cm in women and >120 cm in men).’ But that doesn’t sound right. If the issue is being fat, wouldn’t you want some more continuous measure?”

    That’s standard operating procedure in medicine. I remember a paper in which researchers had a measure of subjective back pain which ranged from 0 to 100 as their dependent variable. Naturally, before running their regressions, they dichotomized it into high and low back pain, without any explanations given.

    • Joshua Pritikin says:

      I’m not sure about “standard procedure,” but the threshold that I’m familiar with is waist circumference < 0.5 height.

      Browning LM, Hsieh SD, Ashwell M. A systematic review of waist-to-height ratio as a screening tool for the prediction of cardiovascular disease and diabetes: 0·5 could be a suitable global boundary value. Nutr Res Rev. 2010 Dec;23(2):247-69.
