Next open blog spots are in April but all these are topical so I thought I’d throw them down right now for ya.
1. Alex Durante writes:
I noticed that this study on how Trump supporters respond to racial cues is getting some media play, notably over at Vox. I was wondering if you have any thoughts on it. At first glance, it seems to me that its results are being way overhyped. Thanks for your time.
Here’s a table showing one of their analyses:
My reaction to this sort of thing is: (a) I won’t believe this particular claim until I see the preregistered replication. Too many forking paths. And (b) of course it’s true that “Supporters and Opponents of Donald Trump Respond Differently to Racial Cues” (that’s the title of the paper). How could that not be true, given that Trump and Clinton represent different political parties with way different positions on racial issues? So I don’t really know what’s gained by this sort of study that attempts to scientifically demonstrate a general claim that we already know, by making a very specific claim that I have no reason to think will replicate. Unfortunately, a lot of social science seems to work this way.
Just to clarify: I think the topic is important and I’m not opposed to this sort of experimental study. Indeed, it may well be that interesting things can be learned from the data from this experiment, and I hope the authors make their raw data available immediately. I’m just having trouble seeing what to do with these specific findings. Again, if the only point is that “Supporters and Opponents of Donald Trump Respond Differently to Racial Cues,” we didn’t need this sort of study in the first place. So the interest has to be in the details, and that’s where I’m having problems with the motivation and the analysis.
2. A couple of people pointed me to this paper from 2006 by John “Mary Rosh” Lott, “Evidence of Voter Fraud and the Impact that Regulations to Reduce Fraud Have on Voter Participation Rates,” which is newsworthy because Lott has some connection to this voter commission that’s been in the news. Lott’s empirical analysis is essentially worthless because he’s trying to estimate causal effects from a small dataset by performing unregularized least squares with a zillion predictors. It’s the same problem as this notorious paper (not by Lott) on gun control that appeared in the Lancet last year. I think that if you were to take Lott’s dataset you could, with little effort, obtain just about any conclusion you wanted just by fiddling around with which variables go into the regression.
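To see the problem in miniature, here’s a toy simulation (my own made-up numbers, nothing to do with Lott’s actual data): with a small sample and lots of candidate predictors, the coefficient on your variable of interest can swing around depending on which other variables you toss into the regression, even when the outcome is pure noise.

```python
# Toy demo: the coefficient on a focal predictor moves around as the
# control set changes, even though the true effect of everything is zero.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 20                       # small n, lots of candidate predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)              # pure noise: all true effects are zero

def coef_of_x0(cols):
    """OLS coefficient on column 0 when the model includes columns `cols`."""
    Z = np.column_stack([np.ones(n), X[:, cols]])   # intercept plus controls
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta[1]                  # coefficient on X[:, 0]

# Same outcome, same focal predictor, four different control sets:
for cols in ([0], [0, 1, 2], list(range(10)), list(range(20))):
    print(len(cols), "predictors -> coef on x0:", round(coef_of_x0(cols), 3))
```

Each regression is a defensible-looking specification, and none of them is estimating anything real.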
3. Andrew Jeon-Lee points us to this post by Philip Cohen regarding a recent paper by Yilun Wang and Michal Kosinski that uses a machine learning algorithm and reports, “Given a single facial image, a classifier could correctly distinguish between gay and heterosexual men in 81% of cases, and in 71% of cases for women. Human judges achieved much lower accuracy: 61% for men and 54% for women.”
Hey, whassup with that? I can get 97% accuracy by just guessing Straight for everybody.
Oh, it must depend on the population they’re studying! Let’s read the paper . . . they got data on 37,000 men and 39,000 women, approximately 50/50 gay/straight. So I guess my classification rule won’t work.
More to the point, I’m guessing that the classification rule that will work will depend a lot on what population you’re using.
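You can see the base-rate arithmetic in a few lines. Treating the 81% figure loosely as both sensitivity and specificity (my simplification, not the paper’s exact setup):

```python
# Overall accuracy = P(correct | gay)*P(gay) + P(correct | straight)*P(straight)
def accuracy(sens, spec, base_rate):
    return sens * base_rate + spec * (1 - base_rate)

for base in (0.5, 0.03):            # the paper's 50/50 sample vs. a rough population rate
    clf = accuracy(0.81, 0.81, base)
    trivial = accuracy(0.0, 1.0, base)   # "guess straight for everyone"
    print(f"base rate {base}: classifier {clf:.2f}, trivial rule {trivial:.2f}")
```

In the balanced sample the classifier’s 0.81 looks good against the trivial rule’s 0.50; at a 3% base rate the trivial rule hits 0.97 and the comparison flips. Which is the point: the headline accuracy number is a property of the sample, not of the classifier alone.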
I had some déjà vu on this one because last year there was a similar online discussion regarding a paper by Xiaolin Wu and Xi Zhang demonstrating an algorithmic classification of faces of people labeled as “criminals” and “noncriminals” (which I think makes even less sense than labeling everybody as straight or gay, but that’s another story). I could’ve sworn I blogged something on that paper but it didn’t show up in any search so I guess I didn’t bother (or maybe I did write something and it’s somewhere in the damn queue).
Anyway, I had the same problem with that paper from last year as I have with this recent effort: it’s fine as a classification exercise, and it can be interesting to see what happens to show up in the data (lesbians wear baseball caps!), but the interpretation is way over the top. It’s no surprise at all that two groups of people selected from different populations will differ from each other. That will be the case if you compare a group of people from a database of criminals to a group of people from a different database, or if you compare a group of people from a gay dating website to a group of people from a straight dating website. And if you have samples from two different populations and a large number of cases, then you should be able to train an algorithm to distinguish them at some level of accuracy. Actually doing this is impressive (not necessarily an impressive job by these researchers, but an impressive job by whoever wrote the algorithms that these people ran). It’s an interesting exercise, and the fact that the algorithms outperform unaided humans is interesting too. But then there’s this kind of thing: “The phenomenon is, clearly, troubling to those who hold privacy dear—especially if the technology is used by authoritarian regimes where even a suggestion of homosexuality or criminal intent may be viewed harshly.” That’s just silly, as it completely misses the point that the success of these algorithms depends entirely on the data used to train them.
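Here’s a synthetic illustration of that last point (made-up numbers, no faces involved): any systematic difference between two sampling sources, even one that has nothing to do with the labels you care about, lets a classifier separate them at above-chance rates given enough data.

```python
# Two "populations" that differ only in a nuisance feature -- say, image
# brightness differing between two websites' photo pipelines.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
site_a = rng.normal(loc=0.0, scale=1.0, size=n)
site_b = rng.normal(loc=0.3, scale=1.0, size=n)

# Classify by thresholding at the midpoint of the two sample means:
threshold = (site_a.mean() + site_b.mean()) / 2
acc = ((site_a < threshold).mean() + (site_b >= threshold).mean()) / 2
print(round(acc, 3))   # above 0.5, purely from the sampling difference
```

The classifier “works,” and it has learned nothing except that the two samples came from different places.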
Also Cohen in his post picks out this quote from the article in question:
[The results] provide strong support for the PHT [prenatal hormone theory], which argues that same-gender sexual orientation stems from the underexposure of male fetuses and overexposure of female fetuses to prenatal androgens responsible for the sexual differentiation of faces, preferences, and behavior.
Huh? That’s just nuts. I agree with Cohen that it would be better to say that the results are “not inconsistent” with the theory, just as they’re not inconsistent with other theories such as the idea that gay people are vampires (or, to be less heteronormative, the idea that straight people lack the vampirical gene).
There’s also some goofy stuff about the fact that gay men in this particular sample are less likely to have beards.
In all seriousness, I think the best next step here, for anyone who wants to continue research in this area, is to do a set of “placebo control” studies, as they say in econ, each time using the same computer program to classify people chosen from two different samples, for example college graduates and non-college graduates, or English people and French people, or driver’s license photos in state X and driver’s license photos in state Y, or students from college A and students from college B, or baseball players and football players, or people on straight dating site U and people on straight dating site V, or whatever. Do enough of these different groups and you might get some idea of what’s going on.
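The placebo-control idea can be sketched as a loop over group pairs. The helper names here (`load_photos`, `train_and_score`) are hypothetical stand-ins for whatever image pipeline and classifier a researcher would actually use:

```python
# Placebo pairs: groups where nobody is positing a deep biological signal.
placebo_pairs = [
    ("college graduates", "non-college graduates"),
    ("English people", "French people"),
    ("driver's license photos, state X", "driver's license photos, state Y"),
    ("students from college A", "students from college B"),
    ("baseball players", "football players"),
    ("straight dating site U users", "straight dating site V users"),
]

def run_placebo_checks(load_photos, train_and_score):
    """Run the same classification pipeline on each placebo pair.

    `load_photos(group)` should return that group's images and
    `train_and_score(a, b)` should return held-out classification accuracy.
    """
    results = {}
    for group_1, group_2 in placebo_pairs:
        results[(group_1, group_2)] = train_and_score(
            load_photos(group_1), load_photos(group_2))
    return results
```

If the same pipeline gets comparably high accuracies on these pairs, that would suggest the gay/straight result mostly reflects sampling differences between the two sources rather than anything like prenatal hormone exposure.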