Good examples of lurking variables?

Posted on November 17, 2011 6:58 PM by Andrew

Rama Ganesan writes:

I have been using many of your demos from the Teaching Stats book . . . Do you by any chance have a nice easy dataset that I can use to show students how ‘lurking variables’ work using regression? For instance, in your book you talk about the relationship between height and salaries – where gender is the hidden variable.

Any suggestions?

30 thoughts on “Good examples of lurking variables?”

Eli on November 17, 2011 7:02 PM at 7:02 pm said:

I always liked the WWII bombing analysis — I think this was in some old textbook (by Tukey?). After the war they studied the accuracy of strategic bombing with regressions. Some things made sense (different types of bombers had different accuracy levels, higher altitude meant less accuracy). But one variable was whether enemy fighters opposed the bombers, and this had the *opposite* effect from what anyone would expect (fighter opposition meant more accuracy). Want to try to guess the hidden variable?

Cloud cover. If the weather was cloudy the enemy wouldn’t bother to send up fighters, and accuracy was terrible because in that era bombing depended on sighting landmarks on the ground.
- Andrew on November 17, 2011 7:10 PM at 7:10 pm said:
  
  Eli:
  
  That’s good. What would complete the analysis would be the construction of some fake data to make the point. It would be really cool if the students in the class could themselves learn how to create the fake data.
  - Jonathan on November 17, 2011 7:41 PM at 7:41 pm said:
    
    You could maybe falsify the data yourself and make them try and find the issue? That would be fun too for both you and them!
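
One way to build the fake data Andrew asks for is sketched below. Every number in it (the chance of cloud, how often fighters go up, the size of each effect on an accuracy score) is invented for illustration and is not taken from any wartime study. The helper names `ols_slope` and `simulate_raids` are likewise hypothetical. In the simulation fighters truly hurt accuracy a little and cloud hurts it a lot, yet the regression that leaves out cloud cover gives fighters a positive coefficient.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_raids(n=5000, seed=1):
    """Fake bombing raids; all parameters are made up for the demo."""
    rng = random.Random(seed)
    raids = []
    for _ in range(n):
        cloudy = 1 if rng.random() < 0.5 else 0
        # Assumption: fighters rarely go up when it is cloudy.
        p_fighters = 0.1 if cloudy else 0.8
        fighters = 1 if rng.random() < p_fighters else 0
        # Assumption: cloud costs 40 points of accuracy, fighters cost 5.
        accuracy = 60 - 40 * cloudy - 5 * fighters + rng.gauss(0, 10)
        raids.append((cloudy, fighters, accuracy))
    return raids


raids = simulate_raids()

# Regression that ignores cloud cover: accuracy on fighter opposition only.
naive_slope = ols_slope([f for _, f, _ in raids], [a for _, _, a in raids])

# Same regression run separately on clear and cloudy days.
slope_by_weather = {}
for label, flag in (("clear", 0), ("cloudy", 1)):
    subset = [(f, a) for c, f, a in raids if c == flag]
    slope_by_weather[label] = ols_slope(
        [f for f, _ in subset], [a for _, a in subset]
    )

print("fighters coefficient, cloud ignored:", round(naive_slope, 1))
for label, slope in slope_by_weather.items():
    print("fighters coefficient on", label, "days:", round(slope, 1))
```

Students can change the made-up probabilities and effect sizes to see how strongly cloud cover has to drive both variables before the sign flips.
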
Chainsaw Riot on November 17, 2011 9:49 PM at 9:49 pm said:

UCB admissions data (UCBAdmissions in the datasets package of R) on the relationships among gender (exposure), admission (outcome) and department (confounding/lurking).
- Andrew on November 17, 2011 10:17 PM at 10:17 pm said:
  
  I don’t like that example because it’s hard to do it as a regression.
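
Even without a regression, the reversal in the UCB admissions data shows up in a plain tabulation. The sketch below assumes the counts are transcribed correctly from R's `UCBAdmissions` table (six largest departments, 1973), so they are worth checking against R before classroom use. The names `ucb`, `overall`, `by_dept` and `women_ahead` are just illustrative.

```python
# (admitted, rejected) counts by department and gender,
# transcribed from R's UCBAdmissions table (verify before relying on them).
ucb = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}
genders = ("Male", "Female")


def rate(admitted, rejected):
    """Share of applicants admitted."""
    return admitted / (admitted + rejected)


# Admission rate per gender with department ignored.
overall = {
    g: rate(
        sum(ucb[d][g][0] for d in ucb),
        sum(ucb[d][g][1] for d in ucb),
    )
    for g in genders
}

# Admission rate per gender within each department.
by_dept = {d: {g: rate(*ucb[d][g]) for g in genders} for d in ucb}

# Departments where women were admitted at a higher rate than men.
women_ahead = [d for d in ucb if by_dept[d]["Female"] > by_dept[d]["Male"]]

print("overall:", {g: round(r, 3) for g, r in overall.items()})
for d in ucb:
    print(d, {g: round(r, 3) for g, r in by_dept[d].items()})
print("departments where women have the higher rate:", women_ahead)
```

Pooled over departments, men have the higher admission rate, while in most individual departments the women's rate is higher, because women applied more often to the departments that admitted few applicants.
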
Reader on November 18, 2011 3:13 AM at 3:13 am said:

What about trade and conflict, with distance as a lurking variable? The bivariate relationship is positive, but this simply reflects that closer pairs of states tend to trade more and fight more.
derek on November 18, 2011 3:26 AM at 3:26 am said:

To me, height is the hidden variable in discussions of the relationship between sex and salaries, not the other way round. Can both those statements be valid?
- Rama on November 18, 2011 10:18 AM at 10:18 am said:
  
  Gender is closer to being the cause of salary discrepancy than height. Now, there might be various lurking variables between gender and salary, and Andrew has another blog post on that (Andrew, could you please link? Thanks!)
  - jimmy on November 18, 2011 11:07 AM at 11:07 am said:
    
    “closer to being the cause?” what does that mean?
    - Daniel Lakeland on November 18, 2011 12:27 PM at 12:27 pm said:
      
      It means that the “cause” is something like the long-term biases built into societal norms, and those biases are tied more to gender than to height. I would guess that height might be an issue within each gender, but even among people of the same height you’ll find a gender discrepancy. Combine that with the fact that the two genders tend to have different average heights, and you get a large portion of the bias.
    - derek on November 18, 2011 1:53 PM at 1:53 pm said:
      
      How do you distinguish that statement from “within height, gender might be an issue, but that across heights you’ll find that even people of the same gender have a height discrepancy”?
    - Rama on November 18, 2011 12:55 PM at 12:55 pm said:
      
      In consumer research, an area that I am getting to know, they are always looking for intervening variables or ‘process measures’. A causes C via B. So they set up experiments, where they show that A causes C. Then they measure/manipulate B to show the causality.
    - jimmy on November 19, 2011 2:24 AM at 2:24 am said:
      
      dan, thanks for the reply. by more related, do you mean more correlated? and when you talk about looking within genders vs between genders, i think that is an example of when andrew talks about the all else equal fallacy.
      
      i preface this with the disclaimer that i don’t really know what i am talking about.
      rama, is that consumer research example your definition of a lurking variable? would you define a lurking variable as a confounder too? (again, i have no clue what any of these terms mean.) if yes, then something seems off. definitions of confounding usually say that the confounder cannot be in the causal path.
    - Rama on November 19, 2011 6:45 PM at 6:45 pm said:
      
      Jimmy (There is no reply button on your post so I’m replying to myself) A confound is something that you need to get rid of to get your paper published. An intervening variable/mediator is something you need to have to get your paper published. That just about sums it up for me.
Matt on November 18, 2011 6:15 AM at 6:15 am said:

I seem to remember one from university: the number of storks in an area was a fantastic predictor of the number of babies born in areas of Oslo. Turns out the hidden variable was the number of chimneys in the area, as storks like nesting there!
- Rama on November 18, 2011 10:19 AM at 10:19 am said:
  
  I love this image of storks and babies and chimneys. However, why would number of chimneys be related to number of babies born in an area? Now that question leads me to other kinds of images: fireplaces are aphrodisiac?
  - Phil on November 18, 2011 12:12 PM at 12:12 pm said:
    
    Perhaps there are more chimneys in more densely populated areas.
  - Daniel Lakeland on November 18, 2011 12:28 PM at 12:28 pm said:
    
    number of chimneys is closely related to number of people.
    - Rama on November 18, 2011 12:56 PM at 12:56 pm said:
      
      So in this case, number of people is the lurking variable.
    - Daniel Lakeland on November 18, 2011 3:13 PM at 3:13 pm said:
      
      people with chimneys. In a region where chimneys are uncommon the number of people would be less related.
Tom on November 18, 2011 7:10 AM at 7:10 am said:

Aren’t “lurking” or moderator variables in social research really the same thing as instrumental variables in econometrics?
- George Papaioannou on November 18, 2011 4:12 PM at 4:12 pm said:
  
  I actually have the same question.
  - Soren on November 20, 2011 7:48 PM at 7:48 pm said:
    
    To be clear, I understand a ‘lurking variable’ to be an ‘omitted variable’ in the metrics sense.
    
    The answer, then, is no. Recall that instrumental variables (IV) are used to ‘instrument’ for variables that we believe *are* correlated with the error term. For instance, consider a model that describes earnings in terms of education, age, and age^2. Ability likely affects wages, but it is unobserved and correlated with education. [So ability is the omitted variable, and thus the lurking variable to which you were referring?] Since ability ends up in the error term, u, the estimated coefficient on education is biased. To avoid this, we look for a variable Z that
    
    (1) is correlated with education, so that Cov(educ, Z) ≠ 0
    
    (2) is not correlated with the error, so that Cov(Z,u) = 0.
    
    Then by estimating the model:
    
    educ = A + B*Z + u_2
    
    and taking the predictions (which are notably *not* correlated with ability), we are able to estimate the effect of education that is *not* correlated with ability, which is what we wanted.
    
    So a ‘lurking variable’ is not the same thing as an ‘instrumental variable’ because they have strictly different definitions, but an IV is sometimes used in response to a well specified lurking variable.
    - George Papaioannou on November 21, 2011 5:32 AM at 5:32 am said:
      
      Thank you. That helped a lot to distinguish the two concepts. I’ll keep in mind that “an IV is sometimes used in response to a well specified lurking variable”.
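
The two-stage procedure Soren describes can be tried on fake data. The sketch below is not Card's data or his analysis: the variable names, the true return to education (`true_effect`), and every coefficient in the data-generating step are assumptions chosen for illustration. Ability is simulated but then treated as unobserved, and a binary instrument `z` stands in for something like proximity to a college.

```python
import random


def ols(x, y):
    """Intercept and slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)
n = 20000
true_effect = 0.10  # assumed effect of one year of education on log wage

# Ability is the lurking variable: it raises both education and wages.
ability = [rng.gauss(0, 1) for _ in range(n)]
# Instrument: shifts education, generated independently of ability.
z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
educ = [12 + 1.0 * zi + 1.5 * ai + rng.gauss(0, 1) for zi, ai in zip(z, ability)]
log_wage = [
    1.0 + true_effect * ei + 0.3 * ai + rng.gauss(0, 0.3)
    for ei, ai in zip(educ, ability)
]

# Regression with ability omitted: the education coefficient absorbs its effect.
_, naive_effect = ols(educ, log_wage)

# Stage 1: educ = A + B*Z + u_2, then keep the fitted values.
A, B = ols(z, educ)
educ_hat = [A + B * zi for zi in z]

# Stage 2: regress log wage on the fitted values from stage 1.
_, iv_effect = ols(educ_hat, log_wage)

print("true effect:              ", true_effect)
print("OLS with ability omitted: ", round(naive_effect, 3))
print("two-stage (IV) estimate:  ", round(iv_effect, 3))
```

Running the second stage by hand like this gives the right point estimate but not valid standard errors, so for real data a dedicated IV routine is the better choice.
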
jimmy on November 18, 2011 11:57 AM at 11:57 am said:

hi andrew, (eli’s example is pretty good.) do you think it would be worthwhile to define some of the terms? i always find this conversation confusing, because i do not ever really know what people mean when they use these terms above. what is a lurking variable? is it a confounder? if yes, what is confounding? people will sometimes say (as part of their definition) that the confounder is associated with both exposure and outcome. however, others will strengthen this requirement and say that the confounder has to cause exposure and outcome. if so, is the uc berkeley admissions example an example of confounding? but why does it make sense to say that department causes gender? if a lurking variable is not a confounder, what is it? and then, how do other things such as interaction, instrumental variables, etc fit? how do you think about this?
Soren on November 18, 2011 1:58 PM at 1:58 pm said:

What about David Card’s classic dataset on return to schooling, where he instruments education with a dummy for proximity to a four-year college? The lurking variable here, as usual, is ability, where ability is correlated with education (meaning that an explanatory variable is correlated with the error term). The dataset is available from Wooldridge’s Intermediate metrics text, and is available for download at the following link:

https://academic.reed.edu/economics/parker/s11/312/asgns/data/CARD.DTA
- Rama on November 19, 2011 10:05 AM at 10:05 am said:
  
  I can’t figure out how to open this dataset – please help!
  - tgs on November 20, 2011 1:32 AM at 1:32 am said:
    
    From the fellow’s website:
    
    Note: The .dta file is a Stata dataset that should be downloaded and opened in Stata. The .def file is a text file with definitions of the variables and sample statistics. Open the latter in any text editor.
afoss on November 20, 2011 11:03 AM at 11:03 am said:

My favorite is the (possibly apocryphal) finding that ice cream consumed and number of drownings are correlated, with the lurking variable being hot weather.
Nick Cox on November 21, 2011 1:42 PM at 1:42 pm said:

Brian L. Joiner. 1981.
Lurking variables: some examples.
The American Statistician 35(4): 227-233.
