
Asking the question is the most important step

In statistics, the glamour often comes to those who perform a challenging data analysis that extracts signal from noise, as in Aki Vehtari’s decomposition of the famous birthday data which led to the stunning graphs on the cover of BDA3.

But, from a social-science point of view, the biggest credit has to go to whoever asked the question in the first place. For the birthday example, that credit goes to Becca Levy, Pil Chung, and Martin Slade, who were the ones who noticed the pattern of excess births on Valentine’s Day and fewer on Halloween. From a statistics perspective, the challenge began there. But nothing would’ve been done without the question being asked.

I’ve been thinking about this in the context of the recent discussion of trends in mortality rates among middle-aged whites:


I worked super-hard to make the graph above, along with lots of other displays like this grid that helped me understand what was going on in the data. And people such as this anonymous commenter have been looking carefully into issues of data quality.

All of this work is useful and relevant, I do believe. But, from the social science perspective, what’s far more important is asking the question in the first place, which is what Case and Deaton did in their recent article. The data have been out there for any of us to grab and graph and analyze. But we didn’t. Case and Deaton did, and that’s what got the ball rolling. (And, to be fair, they also rolled the ball most of the way.) I’m happy to have refined their analyses and, as noted yesterday, I wasn’t so thrilled by one of Case’s offhand remarks, but let me emphasize that all this discussion is predicated on their effort, on their knowing what to look at, which in turn derives from their justly well-respected research on public health and economic development.

That’s the big picture. Acknowledging a statistical bias correction is fine, and statisticians such as myself have our place in the research ecosystem, but all the bias correction and modeling and clever graphics in the world won’t help you if you don’t know what to look at. And in this particular example, I had no idea of looking at any of this until I was pointed to Case and Deaton’s work. Aggregation bias was an entry point for me into this problem, just as analyzing all 366 days was an entry point for Aki into the birthday problem. If we can correct and improve and expand existing analyses, that’s great, but in these cases none of our contributions could’ve happened without the work by the original authors.

It’s not Us vs. Them. It’s never Us vs. Them. It’s Us and Them. Or, perhaps more accurately, THEM followed by a little bit of us. And that’s one reason I want them to respect and understand us, not to fear us and be defensive. We want to be useful, which we can do by building upon their work and motivating them (not just the original researchers, but the whole field) to do more.

The point of a bias correction is not “gotcha!” Rather, the point of the bias correction is that the original researchers are studying something interesting and important, and we want to help them do better.


  1. John Mashey says:

    Since you included a picture without naming him, it seems apropos to select from WikiQuote:

    “Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.”

  2. numeric says:

    Gee, I always ask the question while American presidential elections since 1964 have had a large, sometimes overwhelming racial component (1968 obviously, but 1988 (“I’ll make people think Willie Horton is Dukakis’s running mate”), 2008, etc.) but quantitative academic political scientists never manage to get this in their models (Sears is the Maytag repairman). Krugman lists a number of such people in his column today as his “heroes” (shout out to you, Andrew), but Krugman subscribes to the racial bait and switch theory of Republican elites and those he listed deemphasize, ignore or suppress it. Oh well, parison is the only true crime, as Orwell would say. But it says something when the critical race theorists, with their non-quantitative approaches, have portrayed electoral politics more accurately than the rational choice group ensconced in the top political science departments.

    • Andrew says:


      I’m not so well acquainted with the field of political science (I’m primarily a statistician), but it’s my impression that the mainstream of American Politics research (and I put my own research within the mainstream) is based neither on rational choice nor on critical race theory. I think we’re mostly atheoretical. Yes, we have some default models, but the models have enough fudge factor that they don’t really fall within a theory. We discuss this a bit in Red State Blue State. I’m not saying that our tenuous connection to theory is necessarily a good thing (there are aspects of politics that our field systematically underrates); I just wouldn’t say that we work within any particular theoretical framework, rational choice or otherwise.

  3. ezra abrams says:

    “this grid”: I learned that this is a trellis plot from Naomi Robbins’s great book, which, imo, is much better than the flashier, more expensive books from the men

    also, imo, this is why, in effect, excel is a disaster: if your software won’t do trellis plots, you don’t do them..

  4. Anoneuoid says:

    They report the main cause of increased mortality for non-Hispanic whites is poisoning (~67%; the rest mostly suicides). The vast majority of poisoning deaths fall under ICD-10 codes X42 and X44; unfortunately, I don’t think it gets any more specific. It looks strange, like there is a cohort effect:

    X42: Accidental poisoning by and exposure to narcotics and psychodysleptics [hallucinogens], not elsewhere classified

    cannabis (derivatives)
    lysergide [LSD]
    opium (alkaloids)

    X44: Accidental poisoning by and exposure to other and unspecified drugs, medicaments and biological substances

    agents primarily acting on smooth and skeletal muscles and the respiratory system
    anaesthetics (general)(local)
    drugs affecting the:

    cardiovascular system
    gastrointestinal system

    hormones and synthetic substitutes
    systemic and haematological agents
    systemic antibiotics and other anti-infectives
    therapeutic gases
    topical preparations
    water-balance agents and drugs affecting mineral and uric acid metabolism

    Data from CDC wonder:

  5. Josh Garoon says:

    Andrew, are you familiar with S. Jay Olshansky’s work from 2012?

    Case and Deaton don’t cite it, which is one reason I bring it up.

    • Martha says:

      The title, “Differences In Life Expectancy Due To Race And Educational Differences Are Widening, And Many May Not Catch Up,” sounds, shall we say, not well thought out: It makes sense to talk about possible differences in life expectancy resulting from race differences (thinking not of genetic differences but societal differences). But talking about “differences in life expectancy resulting from educational differences” is iffy, since there are possibly factors that could influence both life expectancy and educational differences.
