Describing descriptive studies using descriptive language, or the practical virtues of statistical humility

When writing my book with Jennifer, I learned to be super-careful in my use of causal language. For example, when describing a regression coefficient, instead of saying “the effect of x on y,” I trained myself to say, “the average difference in y, comparing people who differed by one unit in x.” Or, in a multiple regression, “the average difference in y, comparing people who differed by one unit in x while being identical in all other predictors.”

At first it’s a struggle to speak this way, but eventually, I have found, this constraint has improved my thinking.

Application to the studies that purport to show that “real-life voters must also have based their choice of candidate on looks”

Yesterday I discussed an article that claimed (misleadingly, in my opinion) that people decide how to vote based on candidates’ physical appearance.

Let’s try to describe the study using Jennifer’s non-causal approach. OK, here goes:

Winning politicians are judged to be more attractive, on average, than losing politicians.

Or, if there is some controlling for background variables:

Comparing two political candidates, one who won and one who didn’t, but who are the same age, sex, and …, the winner was, on average, judged to be more attractive than the loser.

At first glance, this might not seem to give us anything beyond the usual summary. But I find its precision helpful. Once the results are expressed as a difference, it’s clear that there’s no direct relevance to the question of how people vote; rather, it’s a statement about a way in which successful and unsuccessful politicians differ. Which, among other things, perhaps makes it clearer that there are a lot of ways this could happen.

More generally

The “comparisons” way of describing regressions has helped me in other ways. Iain Pardoe and I wrote an article on average predictive comparisons, in which we focused on the question of what it means to compare two people who differ on one input variable while being identical on all the others. Among other things, this helped clarify for me the distinction between inputs and predictors in a regression model. (For example, in a model with age, sex, and age*sex, there are four predictors–the three items just listed, along with the constant term–but only two inputs: age and sex. It’s a challenge to try to compare two people who differ in age but are identical in sex and age*sex–but people do this sort of thing all the time when they look at regression coefficients.)
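To make the inputs-vs-predictors distinction concrete, here is a minimal sketch in Python (my illustration, not code from the article; the data are made up):

```python
import numpy as np

# Two inputs (age, sex) expand into four predictor columns once we
# include the interaction and the constant term.
rng = np.random.default_rng(0)
n = 100
age = rng.uniform(20, 80, size=n)
sex = rng.integers(0, 2, size=n)  # 0/1 indicator

X = np.column_stack([
    np.ones(n),  # constant term
    age,         # input 1
    sex,         # input 2
    age * sex,   # interaction: a new predictor, but not a new input
])
print(X.shape)  # (100, 4): four predictor columns from only two inputs

# The difficulty mentioned above: whenever sex = 1, you cannot change
# age while holding age*sex fixed, because the interaction column
# moves one-for-one with age.
```

The design matrix has four columns, but any hypothetical comparison between two people can only vary the two underlying inputs.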

The interpretation of coefficients as comparisons also helped clarify my thinking regarding the scaling of regression inputs. Now my default is to rescale continuous inputs to have standard deviation 1/2, which makes a comparison of one unit comparable to the difference between 0 and 1 for a binary variable. (Actually, I have to admit that I’m starting to wish, for comparability with standard deviations in other examples, that I’d set the default rescaling to a standard deviation of 1, and rescaled binary inputs to be +/-1. I don’t know if I have it in me to shift everything in this way, though.)
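A minimal sketch of that rescaling default (my own illustration; `rescale` is a hypothetical helper name, not a function from any particular package):

```python
import numpy as np

def rescale(x):
    """Center a continuous input and divide by two standard deviations,
    so the rescaled input has sd 1/2 and a one-unit comparison matches
    the 0-to-1 comparison for a binary input."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (2 * x.std())

# Made-up ages, just to show the effect of the transformation.
rng = np.random.default_rng(1)
age = rng.normal(45, 15, size=1000)
z = rescale(age)
print(round(float(z.std()), 3))  # 0.5 by construction
```

After this transformation, a one-unit difference in the rescaled input corresponds to a two-standard-deviation difference in the original, roughly a "low" versus "high" comparison.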

My recommendation

When describing comparisons and regressions, try to avoid “effect” and other causal terms (except in clearly causal scenarios) and instead write or speak in descriptive terms. It might seem awkward at first, but give it a try for a week. In the amorphous world of applied statistics, it can be oddly satisfying to speak precisely.

13 thoughts on “Describing descriptive studies using descriptive language, or the practical virtues of statistical humility”

  1. What constitutes a "clearly causal scenario" in the social sciences? I'm asking out of curiosity and not to be a pain in the rear.

  2. Nice comments and I agree that most models should not be interpreted causally (except maybe for well planned longitudinal studies and survival analyses). However, I am not sure I follow your comment about input and predictors:

    "It's a challenge to try to compare two people who differ in age but are identical in sex and age*sex…"

    It seems clear that in that situation they cannot be identical in age*sex. When faced with interactions, I never try to interpret the "main" effects without talking about the interaction (interpreting by predictors). Instead, I would only interpret inputs: for example, to compare two people who differ in age but have the same sex, the difference would include the coefficients of both age and age*sex (at least in a regression setting).

    Am I missing something here, or was that the point you were trying to make with that example?
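The arithmetic behind this comment can be checked with a toy calculation (made-up coefficient values, purely for illustration): with y = b0 + b1*age + b2*sex + b3*age*sex, the predicted difference between two people who differ by one year of age at sex = s is b1 + b3*s, never b1 alone.

```python
b0, b1, b2, b3 = 1.0, 0.2, 0.5, 0.1  # made-up coefficients

def predict(age, sex):
    # Linear predictor with an age*sex interaction.
    return b0 + b1 * age + b2 * sex + b3 * age * sex

for s in (0, 1):
    diff = predict(31, s) - predict(30, s)
    print(s, round(diff, 3))  # b1 when s=0, b1 + b3 when s=1
```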

  3. Taking your age and sex example, and taking the no-interaction model age + sex, it is still a challenge to explain to someone that we are comparing two people who differ by age but have the "average sex" or people who differ by sex but have the identical "average age". The latter assumes equal weights among the age groups as defined, not the realistic age distribution.

  4. Actually, your statements are imprecise as well. As a female friend of mine once said, completely honestly, "I think X is much better looking now that he's been promoted to functional manager…" but of course he wasn't, not really. The key element left out of "Winning politicians are judged to be more attractive, on average, than losing politicians" and the more detailed version is that the people making the judgment did not know whether the politicians were winners or not. Without that qualifier, it's a "duh, given human nature of course they are."

    A more accurate, but not flawless, statement would be "Politicians who are judged more attractive win more often in head-to-head election contests", which at least gets the sequence of the judgers' judgments / knowing who won correct.

  5. Andrew, that sounds very similar to a lesson out of organization development work: speak at the bottom level of the ladder of abstraction / ladder of inference. When we speak at high levels of inference, we make claims that are often imprecise and often lead to "he said / she said" types of disputes. When we speak at a much more concrete level, we often find we really understand what each other is saying, and we either get fewer disputes or they're at least more clearly focused, it seems.

    Another case where a maxim for statistics may align with a maxim for working with people (like the proposed connection between convergent interviewing and MCMC I once made here). Thanks for the note!

  6. Anon: A clearly causal scenario would be an experiment or a natural experiment.

    Pierre: Indeed, that was the point I was trying to make: if you try to interpret the coefficient directly, you get into difficulties that force you to think more clearly about the model.

    Kaiser: I agree, and that's why I didn't talk about an average age or average sex.

    John: I agree that the conditions of the study are important, including who are the people being studied, what information they were given, etc. My point is that the numerical information from the regression can be expressed as a comparison; beyond this, yes, much auxiliary info is needed in order to make the comparisons. But, no, I don't agree with your final sentence. In this example, it looks like the election came before the rating.

    Bill: Interesting analogy; thanks.

  7. Somewhere along the way, I was taught to write "is associated with," when talking about the relationship between DVs and IVs in regression analysis.

  8. Benjamin Whorf, an insurance adjuster who wrote on linguistic relativity (perhaps overstated), had a neat example where he claimed a fire was caused by employees smoking by the empty gas cans rather than by the full gas cans – empty gas cans, because of the fumes, are much more flammable, but the "empty" in the sign "empty gas cans" semantically suggests "harmless"

    There are probably many areas of statistics that are rife with semantically "dangerous" descriptions.

    Perhaps we need to develop a set of less dangerous descriptions.

    p-value – the probability, given that the null hypothesis is true, of an observation as unlikely or more unlikely (as supportive or more supportive of the alternative) than what was observed

    posterior probability – probability given the joint model of both parameters (used/motivated to represent?) and observations conditioned on what was observed …

    On the other hand, I was never able to think of a semantically safe sign for empty versus full gas cans …

    But they likely will be awkward and a struggle to use

  9. So I've ended up here by following along from fivethirtyeight.com (I like your style over there by the way – the numbers and raw data are why I started reading that website – although I'd like more informative axes on your graphs and I'd love the R code you're using to MAKE the graphs).

    As a behavioral scientist starting to dabble in regression… I'm curious about why you find scaling makes it easier to interpret your results. Forcing age to a scale of 0-1, sd .5 seems like it would be convenient for input into the model but hard for thinking about how much change in age causes a change in your DV in a real way. Do you turn around and unscale them after getting the model results?

  10. In my training in psychology, we often had the opposite thing come up (sort of)… Throughout grad school, I had it hammered into me that unless you have run a randomized experiment, you cannot use causal language. Much of the time, forcing yourself to think about non-experimental data in purely associational terms is a useful exercise — because it reduces the kind of mistaken interpretations you're talking about.
    But when it is an automatic habit (rather than the result of a thoughtful deliberation about causality), it has a downside. As just one example, I have seen people dismiss carefully controlled, prospective longitudinal studies as "just correlational data." When I've argued that such a study can narrow down the set of plausible causal interpretations (e.g., ruling out that Y caused X), they look incredulous — because they were trained to believe that if it ain't a randomized experiment, it's useless.

  11. If one has sex = 0 then age*sex = 0, so you just have to assume that the subjects in these causal models have 0 sex. That may differ a lot from what happens in the real world…

  12. I agree these practices would improve the intellectual honesty of the social sciences and reduce journalists' tendency to oversell findings as well.
