
Using y.bar to predict y: What’s that all about??

Toon Kuppens writes:

After a discussion on a multilevel modeling mailing list, I came across this one-year-old blog post written by you.

You might be interested to know that in social psychology, using the aggregate outcome variable to predict the outcome variable has been used as a test of ‘convergence’, the phenomenon that people’s responses converge to those of others in the group. The first to use this method was Eliot Smith (see the 2007 paper, “Can Emotions Be Truly Group Level? Evidence Regarding Four Conceptual Criteria,” by Smith, Seger, and Mackie). I think this is just plain wrong, but I had the hardest time convincing editors and reviewers of the error made by Smith (in fact, I failed to convince anyone). In the end my critique of the method had to be buried in a paper on predicting variability.

I wrote back: If the goal is to understand group effects, could you do a cross-validation-like thing where you use the mean of everyone else’s responses to predict each person’s response? Would that get around the problem?

Kuppens responded:

That solves the problem in that the coefficient for the group mean no longer exactly equals 1, and is no longer significant in randomly generated data.
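To make the "exactly equals 1" point concrete, here is a minimal simulation sketch. It is not from the original exchange: the helper names (`ols_slope`, `group_means`, `leave_one_out_means`), the group sizes, and the seed are all illustrative assumptions. The data are pure noise with no group effects at all, and y is regressed once on the ordinary group mean and once on the leave-one-out group mean.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a least-squares regression of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))

def group_means(y, g, n_groups):
    """Group mean of y, repeated so each person gets their own group's mean."""
    sums = np.bincount(g, weights=y, minlength=n_groups)
    counts = np.bincount(g, minlength=n_groups)
    return (sums / counts)[g]

def leave_one_out_means(y, g, n_groups):
    """Mean of everyone else in the person's group (person i excluded)."""
    sums = np.bincount(g, weights=y, minlength=n_groups)
    counts = np.bincount(g, minlength=n_groups)
    return (sums[g] - y) / (counts[g] - 1)

# Hypothetical setup: 50 groups of 10, outcome is pure noise (no group effects).
rng = np.random.default_rng(0)
n_groups, n_per_group = 50, 10
g = np.repeat(np.arange(n_groups), n_per_group)
y = rng.normal(size=n_groups * n_per_group)

slope_full = ols_slope(group_means(y, g, n_groups), y)
slope_loo = ols_slope(leave_one_out_means(y, g, n_groups), y)

print("slope on full group mean:        ", slope_full)  # 1 up to rounding error
print("slope on leave-one-out group mean:", slope_loo)  # no longer pinned at 1
```

The first slope is 1 by construction: the covariance between y and its own group mean equals the variance of the group mean, whatever the data look like. The leave-one-out slope has no such algebraic constraint, which is the sense in which excluding the individual "solves the problem."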

The thing is that the “outcome group mean” as predictor picks up on all differences between groups. The more groups differ, the stronger the effect of the “outcome group mean” will be. When the individual is excluded from the calculation of the group mean, however, that variable will assess how extremely the overall pattern of group differences is reflected in particular individuals. If there’s a type of individual that has extreme scores in the direction in which the group differs from other groups, such people would show stronger effects of such a variable (i.e., the group mean calculated without the individual’s contribution).

Maybe there are cases in which such an analysis is perfectly reasonable. But I could not see any such cases mentioned in response to your blog, or on the multilevel mailing list. Importantly, the group mean does not assess how similar individuals are to the group, but how much they differ from the group in the same direction that the group differs from the overall mean. That’s complicated and I can’t think of any immediate use for such an analysis, so your “I think you have to be careful” response is very sensible.
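The claim that the “outcome group mean” picks up on all differences between groups can also be sketched numerically. The snippet below is an illustration under assumed settings (hypothetical function name `r_squared_on_group_mean`, made-up group sizes and group-effect standard deviations), not an analysis from any of the papers mentioned. Because the slope on the group mean is always 1 and the intercept 0, the fitted values are just the group means, so the R² of that regression is the between-group share of the total sum of squares.

```python
import numpy as np

def r_squared_on_group_mean(y, g, n_groups):
    """R^2 from regressing y on its own group mean: SS_between / SS_total."""
    counts = np.bincount(g, minlength=n_groups)
    ybar_g = (np.bincount(g, weights=y, minlength=n_groups) / counts)[g]
    ss_between = np.sum((ybar_g - y.mean()) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    return float(ss_between / ss_total)

# Hypothetical setup: same noise throughout, group effects scaled by tau.
rng = np.random.default_rng(1)
n_groups, n_per_group = 50, 10
g = np.repeat(np.arange(n_groups), n_per_group)
u = rng.normal(size=n_groups)                 # standardized group effects
e = rng.normal(size=n_groups * n_per_group)   # individual-level noise

r2_by_tau = {}
for tau in (0.0, 1.0, 3.0):                   # sd of the true group effects
    y = tau * u[g] + e
    r2_by_tau[tau] = r_squared_on_group_mean(y, g, n_groups)
    print(f"group-effect sd = {tau}: R^2 = {r2_by_tau[tau]:.2f}")
```

In this sketch the apparent strength of the group-mean "effect" tracks how much the groups differ, and it is positive even when the true group-effect sd is zero, since sampling noise alone makes group means differ.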

To which I wrote: Yes, I see your point. It’s related to the issue of causality vs. unmodeled heterogeneity, but in a slightly more subtle way.

Kuppens adds:

I forgot to add that you could also refer to a more recent JPSP paper on collective nostalgia that also uses the ‘convergence’ analysis used by Smith, Seger, and Mackie (2007):
Wildschut et al. (2014) http://psycnet.apa.org/journals/psp/107/5/844/

The original Smith paper can be found here: http://psycnet.apa.org/journals/psp/93/3/431/

5 Comments

  1. Mikkel says:

    Josh Angrist has a paper on this issue in the context of peer effects models in economics:
    http://www.sciencedirect.com/science/article/pii/S0927537114000712

    • Andy W says:

      Neat reference Mikkel! (Manski has a related paper as well, _Identification of Endogenous Social Effects: The Reflection Problem_)

      Not to leave a social science out, this discussion reminds me of “frog-pond” effects in sociology, although those were about how individuals’ perceptions stood relative to the group, not about assessing convergence within the group. (So kind of the opposite of convergence?) See _The Effect of Different Forms of Centering in Hierarchical Linear Models_ (Kreft, de Leeuw, and Aiken, 1995) and _Groups as contexts and frog ponds_ (Firebaugh, 1980) as two examples.

  2. David says:

    Josh Angrist has a nice recent overview paper on this exact issue, including what can and can’t be learned from the ‘leave-one-out’ type method. See http://economics.mit.edu/files/9800

  3. Elin says:

    I find it kind of amazing that people still do the drop-one-out model. It was a way that we thought about trying to model this when I was in grad school in sociology in the 1980s, back when you had to leave a logistic regression to run overnight, and we all knew it was problematic then. It is problematic to treat a group mean, with or without the individual, as fixed, and I would think you would want to conceptualize it as something more like spatial autocorrelation.
