When doing regression (or matching, or weighting, or whatever), don’t say “control for,” say “adjust for”

Posted on January 25, 2019 9:36 AM by Andrew

This comes up from time to time. We were discussing a published statistical blunder, an innumerate overconfident claim arising from blind faith that a crude regression analysis would control for various differences between groups.

Martha made the following useful comment:

Another factor that I [Martha] believe tends to promote the kind of thing we’re talking about here is use of language in ways that obscure that the devil is in the details. This can be illustrated in this particular case by the following quote from Marc’s original post:

“controlling for a confounder in a model does not resolve the problem? That is, if I put all covariates into a statistical model and compare it to a model with all covariates + target predictor, I thought I was able to test whether the additional target predictor can account for additional variance in the criterion?”

A big part of the problem here is using the word “control” in a technical meaning that is only vaguely related to the way the word is used in everyday situations. My experience is that the use of “control” here leads people to believe (innocently) that the procedure in question does something stronger than it really does. I think it would be more helpful (communicate more clearly) if the process were called “attempt to adjust for” or “attempt to take into account” rather than “control for”.

I’ve felt this for a while. For example, in revising our book for the new edition, Jennifer and I went through and changed “control for” to “adjust for,” wherever we could find the phrase. (We also removed the term “statistically significant” except when explaining what it means so that readers know to be wary of it.)

Commenter Mikhail added:

I guess “attempt to adjust for” can be further expanded to “attempt to adjust for using unrealistic linear assumption.”

All adjustments are attempted adjustments and all assumptions are unrealistic. So I don’t mind saying “adjust for,” with the understanding that any adjustment is necessarily an approximation.
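
Here is a minimal simulation sketch of what "adjustment as approximation" can look like. Everything in it is an illustrative assumption rather than an analysis from this post: the data-generating process, the sample size, and the helper names `slope`, `residuals`, and `adjusted_slope`. A confounder `z` drives both `x` and `y` through `z**2`, and `x` has no effect on `y`. Under these assumptions the population coefficient on `x` is 2/3 with no adjustment, still 2/3 after a linear adjustment for `z`, and 0 after adjusting for `z**2`.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after a least-squares fit on z (with an intercept)."""
    b = slope(z, y)
    a = mean(y) - b * mean(z)
    return [yi - a - b * zi for yi, zi in zip(y, z)]

def adjusted_slope(x, y, z):
    """Coefficient on x when y is regressed on x and z together.

    Uses the Frisch-Waugh-Lovell result: residualize x and y on z,
    then regress one set of residuals on the other.
    """
    return slope(residuals(x, z), residuals(y, z))

rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
# Assumed setup: z acts on both variables through z**2; x does NOT affect y.
x = [zi ** 2 + rng.gauss(0, 1) for zi in z]
y = [zi ** 2 + rng.gauss(0, 1) for zi in z]

b_unadjusted = slope(x, y)
b_linear = adjusted_slope(x, y, z)                       # "adjust for z", linearly
b_squared = adjusted_slope(x, y, [zi ** 2 for zi in z])  # adjust for z**2 instead

print("no adjustment:       ", round(b_unadjusted, 3))
print("linear adjustment:   ", round(b_linear, 3))
print("adjustment for z**2: ", round(b_squared, 3))
```

The linear adjustment is a perfectly legitimate regression, and in this made-up example it removes essentially none of the confounding. That is the sense in which "adjust for" promises less than "control for."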

50 thoughts on “When doing regression (or matching, or weighting, or whatever), don’t say “control for,” say “adjust for””

yyw on January 25, 2019 9:47 AM at 9:47 am said:

“Attempt” seems to be the keyword here. With it, neither “adjust” nor “control” would be misleading.

Keith O'Rourke on January 25, 2019 10:03 AM at 10:03 am said:

> the understanding that any adjustment is necessarily an approximation.
It would be nice to have a sense of the percentage of working scientists, versus, say, journalists, who share that understanding.

- Ben Hanowell on January 25, 2019 12:59 PM at 12:59 pm said:
  
  By this standard, the book Identity Crisis by John Sides, Lynn Vavreck, and Michael Tesler is really bad at saying it’s controlling for stuff it’s actually attempting to adjust for.
  
  - Andrew on January 25, 2019 1:12 PM at 1:12 pm said:
    
    Ben:
    
    It could still be a good book. I unthinkingly used the phrase “control for” for many years until I suddenly realized that it made no sense in this context.
    
    In an experiment, you really can “control” a factor by manipulating it. That’s one reason why it’s not appropriate to use the word “control” for an adjustment in a statistical analysis. But until you think of it this way, you can use the word “control” to mean “adjust” all the time, and the actual analyses you’re doing might be just fine.
    
Clyde Schechter on January 25, 2019 10:36 AM at 10:36 am said:

Actually, the abuse of language goes even farther. I often see people use the term “control variable” when referring to a covariate in a linear model.

- Martha (Smith) on January 25, 2019 3:29 PM at 3:29 pm said:
  
  Similarly, I’ve seen people use “independent variables” to refer to the variables they are regressing on, when those variables might have a lot of dependencies among themselves.
  
  - Joshua Pritikin on January 30, 2019 2:03 PM at 2:03 pm said:
    
    Great insight! +1
    
Robert on January 25, 2019 10:47 AM at 10:47 am said:

When will the new book, Regression and Other Stories, be launched?

Andy Seaton on January 25, 2019 10:47 AM at 10:47 am said:

I’ve felt similarly RE use of the word “explain” in “predictor X explains b% of the variation in response Y”.

This does not match the usual meaning of an explanation, which is generally a much stronger claim. An explanation implies we have some reasons; in this case we need not have any.
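
A small sketch of why "explains" promises too much. It assumes simulated data in which a hypothetical common cause `w` drives both `x` and `y`, and neither affects the other. The function name `r_squared` and all the numbers are illustrative choices. In a simple regression the "percent of variance explained" is just the squared correlation, so it comes out the same whichever variable is treated as the predictor.

```python
import random

def r_squared(x, y):
    """R^2 of the least-squares line predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_res / syy

rng = random.Random(3)
n = 5000
# Assumed setup: w is a common cause; x and y do not affect each other.
w = [rng.gauss(0, 1) for _ in range(n)]
x = [wi + rng.gauss(0, 1) for wi in w]
y = [wi + rng.gauss(0, 1) for wi in w]

r2_y_on_x = r_squared(x, y)  # "x explains this share of the variance in y"
r2_x_on_y = r_squared(y, x)  # "y explains this share of the variance in x"

print("R^2, y regressed on x:", round(r2_y_on_x, 3))
print("R^2, x regressed on y:", round(r2_x_on_y, 3))
```

Both statements are equally true of these data and neither supplies a reason, which fits the point that the number describes an association rather than an explanation.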

- Alan O'Callaghan on January 25, 2019 11:37 AM at 11:37 am said:
  
  I agree – the phrase “X explains Y% of the variance in Z”, to me, strongly implies a causal relationship.
  
- Martha (Smith) on January 25, 2019 3:35 PM at 3:35 pm said:
  
  Me too, although I find it hard to find a better word. “Predict” comes to mind, but that may also sound like suggesting causation. Possibly “are associated with”? (as in “These variables are associated with a% of the variation in response Y”)
  
  - Jan on January 26, 2019 12:45 PM at 12:45 pm said:
    
    Would “describe” work?
    
Noah Motion on January 25, 2019 10:50 AM at 10:50 am said:

On the one hand, “control for” just has the one vowel (repeated three times). That’s pretty boring. Not much variation in the spectrotemporal energy, though the consonants help some with that. Seminars can be pretty soporific, though, so this is a mark against “control for”.

On the other hand, “adjust for” has three distinct vowels, a consonant cluster (“st”), and an articulatorily and acoustically complex consonant (the “j” sound is an affricate, which is kind of a combination of a stop and a fricative). It’s got a lot more going on!

Add to all this the fact that “adjust for” is also better statistically, and I think it’s clear which one should go through to round two.

This is a post in the seminar bracket thing, right?

- zbicyclist on January 25, 2019 11:10 AM at 11:10 am said:
  
  Yes, “adjust” will make it to the next round, and take on Steve Martin :)
  
  More on point: One can see a long-term theme here of humility. Adjust is more humble than control. Significant was always a term of hubris. We now find A model, not THE model to explain a phenomenon. We replicate / crossvalidate, etc. because we might not have gotten it right the first time. You can think of a choice of priors as a way of making explicit our prior assumptions, rather than hiding them. We’re perhaps more likely to ask “Hey, does that effect magnitude make sense?” (e.g. yesterday’s post about Fox News resulting in 6 point swings).
  
  Humility is good for the soul, and probably good for the statistics profession (p>.95)
  
  - Martha (Smith) on January 25, 2019 3:38 PM at 3:38 pm said:
    
    +1 for humility! (and for Noah’s last sentence)
    
- Garnett on January 25, 2019 11:45 AM at 11:45 am said:
  
  Noah:
  
  I liked your comment so much that I ran it by my hearing scientist colleagues. Here’s what one person said:
  
  “…in general American English pronunciation there are three distinct vowels in “control for” (/kənˈt(ʃ)ɹoʊl fɔɹ/) and either two or three in “adjust for,” depending on whether one draws a distinction between /ə/ and /ʌ/ (/əˈdʒʌst fɔɹ/). Plus, “control” has an allophonic affricate and a consonant cluster as well. So, the scorecard is not quite so clear-cut for us…”
  
  I’m not sure what all this means, but I think it sounds super-cool.
  
  - Noah Motion on January 25, 2019 12:08 PM at 12:08 pm said:
    
    Garnett:
    
    Should I double down and insist that this doesn’t change my conclusions? I mean, sure, what I wrote was at best an oversimplification, but we all agree that “adjust for” should make it to round two, so no harm no foul, right? Okay, fine. In keeping with one of many running themes on this blog, and because I genuinely agree with Andrew’s approach to this, I will own up to my mistake. I was rigging the game in favor of “adjust for”, referring mostly, and when convenient for my argument, to the orthographic form rather than the actual acoustic-phonetic form.
    
    Truth be told, “control for” is just as interesting as “adjust for” if you look closely at what people really do. As your colleague(s) point out, the vowels are variable in both phrases. Unstressed vowels in English are reduced and influenced quite a bit by neighboring consonants, the sequence of [t] and [ɹ] typically ends up being a phonologically-weird alveo-palatal affricate (weird for English, anyway), and [l] can do all kinds of funny things.
    
    Someone should probably alert the Retraction Watch guys…
    
    - Garnett on January 25, 2019 12:21 PM at 12:21 pm said:
      
      Noah,
      I have but one thing to say:
      
      //𝒻∞”Љ”_ _ //ОПϸ/
    - Martha (Smith) on January 25, 2019 3:49 PM at 3:49 pm said:
      
      I’m tempted to say, “Same to you,” but not sure if that would be using some expletive that I would find embarrassing.
    - Martha (Smith) on January 25, 2019 3:45 PM at 3:45 pm said:
      
      Now that you mention it, I “see” (or hear? and feel physically?) that I pronounce the three o’s in “control for” differently: the first as “uh”, the second long but “open”, and the third “closed”. (I don’t know if “open” and “closed” are official linguistic terms or not, but they’re what a singing teacher once used to distinguish between some English and Italian pronunciations.)
Garnett on January 25, 2019 10:59 AM at 10:59 am said:

This comment made my day!! :)

Martin Modrák on January 25, 2019 12:18 PM at 12:18 pm said:

Recently I have used “when X is included in the model” which I find more humble than “adjust for”, but without the negative connotation of “attempt”. It also happens to be exactly what I did, which is IMHO a plus :-)

Jonathan Baron on January 25, 2019 12:30 PM at 12:30 pm said:

Actually it is usually worse than an “approximation”. “Statistical control” usually errs on the side of failing to remove the variance that is supposedly controlled. The word “adjusted” may not help, because it still may leave the impression that the coefficient of the main predictor has been adjusted to take into account a possible confound. Thus, statements like “X affects Y above and beyond the effect of Z” are usually over-estimates of the effect of X.

The paper by Westfall and Yarkoni reviews some of the literature (at the beginning).
http://dx.doi.org/10.1371/journal.pone.0152719
I’m not sure their solution is all that useful.
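
One mechanism that can produce this kind of under-adjustment is measurement error in the adjusting variable. The sketch below is a made-up simulation, not taken from the paper. The data-generating process, the noise levels, and the helper names are all assumptions. The predictor `x` has no effect on `y`, and both depend on a confounder `z` that we only observe as the noisy `z_noisy`. Under these assumptions the population coefficient on `x` is 1/2 with no adjustment, 1/3 after adjusting for the noisy measure, and 0 after adjusting for the true `z`.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after a least-squares fit on z (with an intercept)."""
    b = slope(z, y)
    a = mean(y) - b * mean(z)
    return [yi - a - b * zi for yi, zi in zip(y, z)]

def adjusted_slope(x, y, z):
    """Coefficient on x when y is regressed on x and z together
    (residual-on-residual regression, Frisch-Waugh-Lovell)."""
    return slope(residuals(x, z), residuals(y, z))

rng = random.Random(2)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # true confounder (unobserved)
x = [zi + rng.gauss(0, 1) for zi in z]         # predictor with NO effect on y
y = [zi + rng.gauss(0, 1) for zi in z]         # outcome
z_noisy = [zi + rng.gauss(0, 1) for zi in z]   # the measure we actually have

b_raw = slope(x, y)                      # no adjustment
b_noisy = adjusted_slope(x, y, z_noisy)  # "controlling for" the noisy measure
b_true = adjusted_slope(x, y, z)         # adjusting for the true confounder

print("no adjustment:         ", round(b_raw, 3))
print("adjusted for noisy z:  ", round(b_noisy, 3))
print("adjusted for true z:   ", round(b_true, 3))
```

In this setup a report that “X affects Y above and beyond the effect of Z” would rest on the middle number, which is well away from zero even though the true effect is zero.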

Jeff Walker on January 25, 2019 12:36 PM at 12:36 pm said:

Why not just say “conditional on,” which only implies what we expect to see in Y given that *we have seen* X=x and doesn’t imply, in either stats jargon or everyday language, anything more than that?

Terry on January 25, 2019 1:00 PM at 1:00 pm said:

Sounds like we need a style guide listing preferred word usages in statistics.

I’d bet there are common themes in the preferred words that would provide useful statistical lessons. The themes would also make it easier to organize the preferred words and make them easier to remember.

- LauraK on January 25, 2019 2:02 PM at 2:02 pm said:
  
  I like this idea. I think in general the lack of operational definitions is really a challenge in statistics. Other fields use statistical terms, but I think many of us focus on our mathematical training, so we know what a term means mathematically and write it up using whatever we were taught. I am sure I have written “control for” even though I knew it was not really true in the sense of my controlling anything.
  
  A style guide would be a very nice thing. Especially given so many statistical terms are lexically ambiguous with common usage. I wonder if one could be agreed upon?
  
LauraK on January 25, 2019 2:14 PM at 2:14 pm said:

What word do you think would be better? “Accounts for”? “Explains” does not seem that causal to me, but maybe I am just so used to it. I actually just did an activity with my students about “causal language” that I found at http://datalit.sites.uofmhosting.net/books/book/#toc in chapter 2. It is really interesting to think about these issues.

http://datalit.sites.uofmhosting.net/wp-content/uploads/2016/01/Chapter_2_Bergson-Michelson.pdf

- LauraK on January 25, 2019 2:21 PM at 2:21 pm said:
  
  My apologies: this comment was in response to the comment by Andy S above: “I’ve felt similarly RE use of the word “explain” in “predictor X explains b% of the variation in response Y”.”
  
  What do people suggest instead?
  
Dan on January 25, 2019 2:46 PM at 2:46 pm said:

I agree with this interpretation of regression but I don’t think it’s necessary for some matching estimators. If you are exactly matching on X, then you’re not adjusting for X, you are indeed holding X fixed. (I think.)

- Andrew on January 25, 2019 2:51 PM at 2:51 pm said:
  
  Dan:
  
  With matching, you’re comparing units that have similar (or even identical) values of x. But I don’t think it’s helpful to say that this is “holding x fixed.” To me, “holding x fixed” represents something that you’re doing to an individual unit, while “comparing units with similar values of x” is a between-units analysis. This has come up a lot in our discussions of problematic social psychology studies that apply different treatments to different people and then use this to make claims about within-person effects or changes.
  
  - Garnett on January 25, 2019 3:00 PM at 3:00 pm said:
    
    In my experience, investigators believe that matching is equivalent to “control” or “adjustment”, so that analysis can proceed by safely ignoring the matching criteria. This is at best inefficient, as described by Bland and Altman in one of their BMJ statistics notes:
    
    https://www.bmj.com/content/309/6962/1128
    
  - Dan on January 26, 2019 3:05 PM at 3:05 pm said:
    
    I think the main point I was trying to make is that standard regression adjustment involves extrapolation and linearity assumptions in a way that exact matching does not. I guess reasonable people can disagree about what exactly “holding X fixed” means. Maybe it is safer not to use the language.
    
Mark Schaffer on January 25, 2019 6:09 PM at 6:09 pm said:

What should the noun be? It’s convenient to be able to say “I included XXX as controls”, but “I included XXX as adjusters” sounds a bit odd.

- Clyde Schechter on January 25, 2019 11:39 PM at 11:39 pm said:
  
  “I included XXX as covariates” will do just fine.
  
  - Mark Schaffer on January 26, 2019 7:30 AM at 7:30 am said:
    
    “Covariate” is more general, no? And it gets used in different ways by different people. I just had a google around, and the first website that came up said “A covariate can be an independent variable (i.e. of direct interest) or it can be an unwanted, confounding variable.” Another one said “A covariate may be of direct interest or it may be a confounding or interacting variable.”
    
    I’d like to be able to say “I adjusted for XXX” or “I included XXX as a [something]”. Maybe just avoiding the use of control as a verb is OK, i.e., “I adjusted for XXX” or “I included XXX as a control”, but not “I controlled for XXX”.
    
- Martha (Smith) on January 26, 2019 5:58 PM at 5:58 pm said:
  
  How about something like, “I included XXX as possible confounding factors”?
  
  (I see “possible” as being important to include, to help keep the focus on the ever-presence of uncertainty.)
  
  - Mark Schaffer on January 26, 2019 7:11 PM at 7:11 pm said:
    
    “Possible confounding factor” works, or “possible confounder”. You’re right that it’s important to include “possible” with confounder. I hadn’t thought about it before, but I don’t think you need “possible” with “control” – the uncertainty is implied, no?
    
    It would be nice if there were a standard one-word alternative noun for “control”, but I can’t think of one (other than the odd-sounding “adjuster”).
    
    - Daniel Lakeland on January 26, 2019 10:09 PM at 10:09 pm said:
      
      I like “predictors”
    - Martha (Smith) on January 26, 2019 11:32 PM at 11:32 pm said:
      
      The problem I see with “predictors” is that many people understand this to imply causation and/or certainty.
    - Mark Schaffer on January 27, 2019 5:45 AM at 5:45 am said:
      
      “Predictor” has the same problem as covariate – it covers anything. And you can have a variable that is a good “predictor” but a bad “control” (as in a bad choice for inclusion to address possible confounding).
    - Martha (Smith) on January 26, 2019 11:29 PM at 11:29 pm said:
      
      ” I don’t think you need “possible” with “control” – the uncertainty is implied, no?”
      
      Huh? My impression is that to many people, “control” implies certainty– not uncertainty!
    - Mark Schaffer on January 27, 2019 5:36 AM at 5:36 am said:
      
      I meant like “possible presence of confounding”, as in confounding is something to worry about but sometimes we’re not sure it is actually a problem in a particular application, so we’re uncertain about it. That’s why I like your “possible confounding factor”. People don’t say “possible control” when they have this in mind as a reason for including a covariate; they just call it a “control”. The possibility of there being a confounding problem is understood. But maybe I am reading too much into all this….
    - Martha (Smith) on January 27, 2019 3:37 PM at 3:37 pm said:
      
      “People don’t say “possible control” when they have this in mind as a reason for including a covariate; they just call it a “control”.”
      
      And I think that this practice is a mistake, because this makes it sound like they aren’t thinking in uncertain terms.
Lydia on January 26, 2019 2:54 AM at 2:54 am said:

“all assumptions are unrealistic”

I hope you’re not giving advice to scientists. Are you aware that theoretical assumptions are supposed to be realistic, that is, to be consistent with known facts? I think you’re confusing the fact that mathematicians don’t need to care about the validity of their assumptions with the question of whether or not they are, in fact, realistic.

- Martha (Smith) on January 26, 2019 6:19 PM at 6:19 pm said:
  
  “Are you aware that theoretical assumptions are supposed to be realistic, that is, to be consistent with known facts?”
  
  To me, “realistic” isn’t the same as “consistent with known facts”; to me it means consistent with reality — and to me, reality includes known facts, but also includes things that are not yet known.
  
Lydia on January 26, 2019 2:56 AM at 2:56 am said:

Follow-up to my previous comment:
“any adjustment is necessarily an approximation.” An approximation of what? A phenomenon whose description is based on “unrealistic assumptions”?

Dale Lehman on January 26, 2019 8:34 AM at 8:34 am said:

Pertinent to these issues is the CLAIMS study (https://www.metacausal.com/claims/). We all know of (and have repeatedly discussed) a variety of reasons why study results are overstated – first by poor design, then by the authors’ writeup, and finally by the media reports. What is not clear is what to do about it. If it is indeed a problem, which I believe it is, then surely some community of researchers needs to do something. I am skeptical that we can find an agreed-upon language for describing study results – while the discussion of caused by, related to, adjusted for, controlled for, connected with, associated with, etc etc is interesting and worthwhile, it just doesn’t seem to lead to reducing the problems. The pressure to overstate appears to grow faster than any attempt to instill humility and good scientific practice. The phrase “tilting at windmills” comes to mind.

I have no solutions in mind – I just don’t see much real progress against overwhelming forces (resulting from decreasing attention, increasing complexity, pressures to get attention, jobs, publications, and grants, etc.).

Brian on January 26, 2019 9:49 AM at 9:49 am said:

Similar argument from Greenland et al. (p42):

“The failure of intuitions in the above examples may arise because common intuitions about confounding control arise from experiments, in which “control” may mean direct physical control (manipulation) of a variable…By definition, physical blocking of a path is not an option in observational studies. Instead, we can only “control” C in a sense of adjustment; that is, we restrict our analyses (and perhaps our data collection) to a stratum of C; we may then combine results across different strata or employ some regression analogue of this process.”

http://publicifsv.sund.ku.dk/~pka/epi18/causaldiagrams.pdf
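
The stratify-then-combine procedure in that quote can be sketched in a few lines. This is an illustration on simulated data under assumptions of my own choosing, not code from Greenland et al. The covariate C takes three levels, treatment is more likely at higher C, the true treatment effect is set to 1.0, and the function names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate(n, seed=0):
    """Assumed data: C raises both the chance of treatment and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.choice([0, 1, 2])
        t = 1 if rng.random() < 0.2 + 0.3 * c else 0
        y = 1.0 * t + 2.0 * c + rng.gauss(0, 1)  # true effect of t is 1.0
        rows.append((c, t, y))
    return rows

def crude_difference(rows):
    """Treated-minus-untreated difference in means, ignoring C."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def stratified_difference(rows):
    """Compare within each stratum of C, then combine across strata,
    weighting each stratum by its share of the sample."""
    total = 0.0
    for level in sorted({c for c, _, _ in rows}):
        stratum = [(t, y) for c, t, y in rows if c == level]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        weight = len(stratum) / len(rows)
        total += weight * (mean(treated) - mean(control))
    return total

rows = simulate(9000)
crude = crude_difference(rows)
adjusted = stratified_difference(rows)

print("crude difference:     ", round(crude, 2))
print("stratified difference:", round(adjusted, 2))
```

Nothing here manipulates C. The code only compares different units that happen to share a value of C, which is the distinction between adjustment and physical control that the quote draws. It also recovers the assumed effect only because C is discrete, measured without error, and the sole confounder in the simulation.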

Kaiser on January 26, 2019 4:19 PM at 4:19 pm said:

For a good laugh – I was once asked to provide technical feedback on a study of school performance in which they use a matching criterion (with only a few covariates!) but instead of “controlling for” or “adjusting for”, they call the matches “virtual twins”.

- Martha (Smith) on January 26, 2019 11:35 PM at 11:35 pm said:
  
  Ah, the perennial problem with metaphors: Although they are often intended to explain a concept, they all too often foster misunderstanding.
  

Statistical Modeling, Causal Inference, and Social Science
