If you want to measure differences between groups, measure differences between groups.

Les Carter points to the article, Coming apart? Cultural distances in the United States over time, by Marianne Bertrand and Emir Kamenica, which states:

There is a perception that cultural distances are growing, with a particular emphasis on increasing political polarization. . . .

Carter writes:

I am troubled by the inferences in the paper.

The authors state: “We define cultural distance in media consumption between the rich and the poor in a given year by our ability to predict whether an individual is rich or poor based on her media consumption that year. We use an analogous definition for the other three dimensions of culture (consumer behavior, attitudes, and time use) and other group memberships. We use a machine learning approach to determine how predictable group membership is from a set of variables in a given year. In particular, we use an ensemble method that combines predictions from three distinct approaches, namely elastic net, regression tree, and random forest (Mullainathan and Spiess 2017).” And come up with this:

This looks akin to ANOVA or a discriminant analysis. It seems to show (unvalidated) predictability, but I can’t get from there to a measurement of trait differences among categories, especially as they include traits that are poorly predictive of category, namely time use. Is there reason to infer that they measure the degree of distinctiveness, or some kind of distance? Or is this analogous to the speaker I once witnessed using p-values to rank the effectiveness of various therapies?

My reply:

I took a look at the article and I too am unhappy with the indirect form of the analyses there. The questions asked by the authors are interesting, but if they want to study the differences in cultural behaviors of different groups, I’d rather see that comparison done directly, rather than this thing of using the behaviors to predict the group. I can see that the two questions are mathematically connected, but I find it confusing to use these indirect measures. When, in Red State Blue State, we were comparing the votes of upper- and lower-income people, we just compared the votes of upper- and lower-income people; we didn’t try to use votes to predict people’s income.
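
To make that concrete, here is a toy version of the direct comparison, with simulated data (the column names and numbers are made up for illustration, not taken from the paper):

    # A sketch of the direct comparison: for each behavior, estimate the
    # rich-vs.-poor difference in rates along with a rough standard error.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 2000
    rich = rng.integers(0, 2, n)  # 1 = top income quartile, 0 = bottom quartile
    df = pd.DataFrame({
        "rich": rich,
        "watches_news": rng.binomial(1, 0.35 + 0.15 * rich),  # built-in group difference
        "watches_sports": rng.binomial(1, 0.5, n),             # no group difference
        "watches_reality_tv": rng.binomial(1, 0.4, n),         # no group difference
    })

    for col in ["watches_news", "watches_sports", "watches_reality_tv"]:
        p1, p0 = df.loc[df.rich == 1, col].mean(), df.loc[df.rich == 0, col].mean()
        n1, n0 = (df.rich == 1).sum(), (df.rich == 0).sum()
        se = np.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
        print(f"{col}: rich minus poor = {p1 - p0:+.2f} (se {se:.2f})")

Each behavior gets its own estimated difference and its own uncertainty, which is the comparison I’d want to see directly.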

Again, the research conclusions in that paper could be correct, and I assume the authors put their data and code online so anyone can recover these results and then go on and do their own analyses. I just find it hard to say much from this indirect data analysis that’s been done so far.
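
For readers curious what the paper’s indirect measure looks like in practice, here is a minimal sketch with simulated data and scikit-learn. It is a reconstruction under assumptions (in particular, the simple averaging of the three models’ predictions), not the authors’ code:

    # A sketch of the predictability-based measure: train elastic net, tree,
    # and random forest classifiers to predict rich vs. poor from media
    # variables, then look at out-of-sample accuracy of the combined predictions.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    rng = np.random.default_rng(0)
    n = 2000
    rich = rng.integers(0, 2, n)                  # 1 = top quartile, 0 = bottom quartile
    signal = rng.uniform(0, 0.15, 20)             # how strongly each "show" tracks income
    media = rng.binomial(1, 0.4 + rich[:, None] * signal)  # 20 binary media indicators

    X_train, X_test, y_train, y_test = train_test_split(media, rich, random_state=0)
    models = [
        LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=5000),
        DecisionTreeClassifier(max_depth=5),
        RandomForestClassifier(n_estimators=200, random_state=0),
    ]
    probs = np.column_stack(
        [m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in models]
    )
    pred = probs.mean(axis=1) > 0.5               # average the three models' predictions

    # The "cultural distance" is, roughly, how far this accuracy exceeds 50%.
    print("out-of-sample accuracy:", round(accuracy_score(y_test, pred), 2))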

P.S. I also noticed one little thing. The authors write, “We focus on the top and the bottom quartile (as opposed to, say, the top and the bottom half or the top and the bottom decile) to balance a desire to make the rich and the poor as different in their income as possible and the pragmatic need to keep our sample sizes sufficiently large.” That’s right! If you want to make simple comparisons (rather than running a regression), it’s a good move to throw out those middle cases. For more detail, see my paper with David Park, Splitting a predictor at the upper quarter or third and the lower quarter or third, which we wrote because we were doing this sort of comparison in Red State Blue State.
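
Here is a toy version of that upper-quarter vs. lower-quarter split, with simulated income and vote data (all numbers made up; this is the spirit of the approach, not the procedure from the Gelman and Park paper):

    # Compare the top and bottom income quartiles directly, dropping the middle half.
    import numpy as np

    rng = np.random.default_rng(2)
    income = rng.lognormal(10, 1, 5000)
    vote = rng.binomial(1, 0.4 + 0.2 * (income > np.median(income)))  # simulated votes

    lo, hi = np.quantile(income, [0.25, 0.75])
    bottom = vote[income <= lo]   # bottom quartile of income
    top = vote[income >= hi]      # top quartile; the middle half is thrown out
    print("top-quartile minus bottom-quartile vote share:",
          round(top.mean() - bottom.mean(), 2))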

P.P.S. Please round those percentages in the tables. “71.4%,” indeed.

9 thoughts on “If you want to measure differences between groups, measure differences between groups.”

  1. … human “Group(s)” and group-behaviors are an intellectual abstraction.
    In the real world, only “individuals” act and individual behaviors are observed.
    So what is the fundamental basis of measurement?

    • Nesbitt:

      You can measure individual survey responses or behaviors and compare them across groups. In addition, certain behaviors are characteristics of groups. For example, the smoking rate is not just the product of individual choices; it is also a product of a social and economic environment including the cost and availability of cigarettes, legal restrictions, social expectations, etc.

    • Mikhail:

      I don’t know what it means to measure cultural difference. It can be measured in different ways. In the above paper, they’re comparing different groups based on media consumption: what TV shows they watch, things like that, I guess. Such things are worth studying. My problem with the above-linked research is not that it’s difficult to measure cultural difference. My problem is that their measure is indirect, hard to interpret, and subject to artifacts of measurement. That’s why I’d like to see direct comparisons. I’m not a fan of the general approach to data analysis of taking a big dataset, running it through a big computer program, and then pulling out results. I have a similar feeling about path analysis: it’s a way of extracting conclusions from any data, but the interpretations of these conclusions always seem so indirect.

      • Andrew,

        I do agree that using machine learning could be problematic in this case. I just think that using more traditional metrics could be problematic as well.

        “Media consumption” is a pretty complex subject involving concepts that are not easily quantified. Any metric summarizing the “media consumption difference” between two groups into a single number would be imperfect one way or another. So I don’t immediately see why a “machine learning separability” distance would be worse than “Hamming distance between shows watched”, “cosine distance between time spent per show”, or “mean squared Game of Thrones”, or something (a toy example of a few such summaries appears at the end of this comment). Every metric would be subject to artifacts and could be interpreted the wrong way.

        I do agree that a lot of analysts seem to be using machine learning algorithms as a substitute for domain knowledge. And this is wrong.

        ML is not a shortcut to discovery. It is more like… getting wasted to survive aerophobia.
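
        To illustrate, here is a toy example of a few such summaries (the shows, hours, and indicators are made up, not from any dataset):

            # Three candidate one-number summaries of "media consumption difference"
            # between two groups; each throws away different information.
            import numpy as np
            from scipy.spatial.distance import hamming, cosine

            shows = ["news", "sports", "reality", "drama", "documentary"]  # labels for the five positions below
            rich_watch = np.array([1, 0, 0, 1, 1])   # does the typical rich respondent watch it?
            poor_watch = np.array([1, 1, 1, 0, 0])
            rich_hours = np.array([3.0, 1.0, 0.5, 2.0, 1.5])  # mean hours per week per show
            poor_hours = np.array([2.0, 4.0, 3.0, 0.5, 0.2])

            print("Hamming distance (shows watched):", hamming(rich_watch, poor_watch))
            print("Cosine distance (hours per show):", round(cosine(rich_hours, poor_hours), 2))
            print("Mean squared difference (hours):", round(np.mean((rich_hours - poor_hours) ** 2), 2))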

  2. I’m afraid the whole issue is captured in one phrase, right up front.

    “We measure cultural distance between two groups as the ability to infer an individual’s group…”

    It conflates “cultural distance”, apparently meant to be a characteristic of groups, with “ability to infer”, which is a characteristic of the statistical analyst. I guess we await a future study where someone measures the treatment effect in a clinical trial by how difficult it was to get MCMC convergence.

  3. “We define cultural distance in media consumption between the rich and the poor in a given year by our ability to predict whether an individual is rich or poor based on her media consumption that year. We use an analogous definition for the other three dimensions of culture (consumer behavior, attitudes, and time use) and other group memberships.”

    When I read this, I get, to restate: We define a real-world difference in group behavior in media consumption between the rich and the poor in a given year by our ability to predict whether an individual is rich or poor based on their media consumption that year. We use an analogous approach to rate whether the other dimensions of culture (consumer behavior, attitudes, and time use) and other proposed group traits are genuinely diagnostic of a real group distinction.

    If you start with the assumption that a group is a true group, that membership in the group gives meaningful information about the distribution of individual traits among the selected population, then isn’t it likely that direct comparisons will find statistically significant differences, some of them due to chance? And will real differences make any real difference? Starting with groups defined at will seems apt to fall into pattern seeking. Would it even be p-hacking to focus on statistically significant differences between groups?

    But perhaps I’ve completely misunderstood, and this isn’t an effort to build a valid taxonomy.

  4. Re: 71.4%. Isn’t three significant digits the rule of thumb? I agree that it is a little absurd to claim such precision, but I think this idea is deeply set. I get annoyed when I see, e.g., 53.63128903. But three digits seems OK.

    • Jack:

      No way you’d expect to see 71.4% in a replication. Even with straight simple random sampling, you’d need a sample size of around a million to trust an estimate to an accuracy of one-tenth of a percentage point. But even if you had a simple random sample of a million, there’s much more nonsampling error and variation out there.
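
      For a quick order-of-magnitude check, here is the standard back-of-the-envelope calculation for a proportion near 71% under simple random sampling (sampling error only, ignoring everything else):

          # How wide is a rough 95% interval for an estimated proportion of 0.714
          # at various sample sizes, assuming simple random sampling?
          import math

          p = 0.714
          for n in [1_000, 10_000, 100_000, 1_000_000]:
              se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
              print(f"n = {n:>9,}: roughly +/- {1.96 * se * 100:.2f} percentage points")

      Only at around a million does the interval shrink to about a tenth of a percentage point, which is the point above.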
