Combining two of my interests

Paul Alper writes:

Hi Andrew (or Andy or even Gelman [17 of them]):

Go to this link and have some fun with (useless? powerful?) data mining.

As the authors say, it is addictive.
Paul (no other way to spell it) Alper [215 of us]

I’m reminded of this discussion from 2012, “Michael’s a Republican, Susan’s a Democrat.” As I wrote at the time:

It’s no surprise that men give more to Republicans and women to Democrats, or that the average contribution to a Republican has a larger dollar value than the average contribution to a Democrat, nor perhaps should we be surprised that “Tom” splits his support between the two parties while “Thomas” is a strong Republican. Still, it’s fun to see the data.

Overall, I think this graph understates contributions to Republicans because it doesn’t include those new super PACs.

But the new tool seems to be based on a different dataset, opinion polls rather than campaign contributions. Playing around a bit, I see a lot less variability in party ID by name (estimated using the survey database) than in partisanship of campaign contributions by name (using the campaign contribution database).

I’m not sure what to make of this. But perhaps both data sets have some flaws. In both cases, I’d say the data are fun and worth exploring but we should be careful before assuming the numbers are correct.
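A minimal sketch of one way to put a number on "variability by name": weight each name's percentage by how many people have that name, then report the weighted standard deviation across names. The frequency weighting and the function name `spread_by_name` are assumptions for illustration. The example inputs are the survey-based Democratic percentages and name counts that a commenter posted in the thread. Running the same function on contribution-based shares for the same names would give the comparison described above; those figures are not reproduced here.

```python
from math import sqrt


def spread_by_name(shares, counts):
    """Return the frequency-weighted mean and standard deviation of a
    percentage (e.g. % Democratic) across names.

    shares: dict mapping name -> percentage for that name
    counts: dict mapping name -> number of people with that name (the weights)
    """
    total = sum(counts[name] for name in shares)
    mean = sum(shares[name] * counts[name] for name in shares) / total
    variance = sum(counts[name] * (shares[name] - mean) ** 2 for name in shares) / total
    return mean, sqrt(variance)


# Survey-based "% Democrat" and name counts as posted in the comments
# (names truncated to seven letters in that output).
dem_share = {
    "david": 47.1, "john": 46.7, "steven": 46.1, "james": 47.1, "robert": 46.7,
    "susan": 50.3, "jennife": 53.1, "mary": 54.7, "patrici": 54.4, "linda": 52.4,
}
name_count = {
    "david": 3064839, "john": 3687229, "steven": 1087825, "james": 3647373,
    "robert": 3591894, "susan": 1213848, "jennife": 1587428, "mary": 2618387,
    "patrici": 1602754, "linda": 1446826,
}

mean, sd = spread_by_name(dem_share, name_count)
print(f"weighted mean {mean:.1f}%, spread across names {sd:.1f} points")
```

A raw spread like this mixes real differences between names with sampling noise, so rare names (few respondents or few donors) will tend to look more variable than common ones.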

18 thoughts on “Combining two of my interests”

  1. Hi Andrew: Can you please tell us which names you tried that show less variability by name in the survey data than in the contribution data? I think it’s the other way round, and perhaps it depends on how rare the name is in the registered-voters list.

  2. It is well known that unlike other industrial nations, the U.S. is loaded with guns in the home. But it is still surprising that no matter what name I enter (and I tried many), the web site shows that for

    “Have a gun in their house”

    the estimate keeps coming out at around 50% or more. With that in mind, be sure to read “When may I shoot a student,” which appeared today in the NYT; the op-ed piece is hilarious when it isn’t knee-buckling.

    http://www.nytimes.com/2014/02/28/opinion/when-may-i-shoot-a-student.html?hp&rref=opinion

  3. Sorry, I feel like I should be sharing more in the fun but I’m having a hard time. Querying the database with a name gets a prediction with very limited precision. The prediction is driven by implicit correlations with gender, ethnicity, age, SES, and other possible demographic factors. We are quite sure, aren’t we, that there is no worthwhile predictive relationship qua the name itself independent of the missing data concerning the traits of the person who owns the name?

    So can’t we get that “missing” data some other way? Let’s decouple the names and the survey database. For the survey database, we build a predictive model on available demographics. Given any name, we elicit a posterior distribution on a set of demographic variables, then we generate predictions. The question is how little effort we have to expend to achieve similarly accurate predictions. Subjective one-shot guesstimates of gender and ethnicity, which may carry a lot of surface information? Or would we need a relatively simple database of names and 5 or 6 demographic factors?

    I get that this is the database we have, and so it might be fun to play with it. But I’m just saying that combining two simpler databases might yield similar information. I guess I’m being humourless on this. The name game just seems like a proxy for the complete data problem, which has nothing whatsoever to do with the person’s actual name. Compare to probing a genetic database. There, if you do push past the noise, there’s a chance you’ve struck gold by actually linking a gene with a trait.

    • The name game could be taken as just a lighthearted way into thinking about these issues. And most people identify with their name and think it is fun to have some data attached to it, and to find things like a 61.5% Democrat identification for “Ayn”. At least I do.

  4. Steve Sailer has been talking about this concept (not the site) for a long time. Some of the trends are pretty obvious: ghetto baby names, white trash baby names, etc. And that’s just first names; last names will correlate with ethnicity. The whole thing seems way too obvious and politically incorrect.

    On the science-y stats side, I wonder how such predictors do if you deconfound ethnicity. For instance are Alfred Morris and Robert Griffin more likely to be Republican than LeSean McCoy and DeSean Jackson?

      • I looked at her post but did not read all the commenters’ points. Again, I think she is making a point about deconfounding variables. Basically she said that Palin was lower class compared to her NY neighbors. (True in a way.) But…Red Staters also serve their country more than snobby Manhattan middlebrows (sorry Columbia). And I’d probably sooner have some redneck or biker or whatever stop to help me change a tire or give me a lift to the gas station than one of her social set.

        I guess you have to ask how much extra the names give you versus other variables (region, income, education, etc.) Although, to support her work, perhaps the names can be used as a proxy for other information that may not be available (income, education, etc.) Of course the whole thing then ends up treading awfully close to some non-PC areas like should an insurance company or a mortgage company or an HR resume screener use names to guide the first cut…

        • Or to get really Moneyballish with it: how about a hedge fund that bets on tranches of mortgage debt based on the first/last names. Maybe the mortgage company authorizing the loan is not allowed to make ethnicity based decisions (despite statistical rationale). Maybe race is not recorded. But who can stop a hedge fund just deciding to bet differently on the Mary versus Mykayla versus Latoya portfolios.

        • Nony:

          I don’t see any place where Wattenberg described Palin as “lower class,” nor did I see Palin being compared to “NY neighbors.” I didn’t see any mention of New York in Wattenberg’s post at all. Indeed, neither Palin nor Wattenberg lives in New York, as far as I know. So all this is coming from you, not from Palin or Wattenberg.

        • “The most liberal and conservative parts of the country differ on key style-shaping variables, like income, education level, and the age when women marry and have children. A community where the typical first-time mother is a 22-year-old high-school grad is going to have a very different style climate from the community where the typical new mom is a 28-year-old with a college degree.”

          That’s what I was coming from…

          P.s. (segue) You have seen this, right?

          https://www.youtube.com/watch?v=_UTnE6DVLq0

        • Dude, she lives in Massachusetts. I mean sheesh…same flipping difference. Ever looked at regional patterns of culture and politics? ;)

          Oh…and here are the first two sentences of her book!!!!

          “When my first baby was a girl, I noticed a curious phenomenon. It seemed every baby girl we met in Riverside Park in New York was named either Hannah or Olivia.”

          http://www.amazon.com/Baby-Name-Wizard-Magical-Finding-ebook/dp/B002GPGZ0G/ref=la_B001H6EMM8_1_1?s=books&ie=UTF8&qid=1393719587&sr=1-1#reader_B002GPGZ0G

          Oh…and before you go there, I didn’t even notice her last name (until now). Cross my heart. I just knew that she was a writer, and I knew the tone of the blog. She’s gotta be a New Yorker are a wannabe New Yorker. Remember what Wolff said in Burn Rate: it’s basically just an understood thing among the writing and editing crowd. Shared values, interests.

        • Nony:

          I think the difference between us in this conversation is that, as a statistician, I take things more literally than you do. When you wrote of Wattenberg that “Basically she said that Palin was lower class compared to her NY neighbors,” I thought you were saying that (1) Wattenberg said that Palin was lower class and (2) that Palin or Wattenberg had neighbors in New York. But what you actually meant was that (1) Wattenberg gave demographic statistics, and (2) Wattenberg lives in Massachusetts. If you’d written that “Basically she connects Palin to a demographic profile that has lower education levels than Wattenberg’s Massachusetts neighbors,” I’d have no problem. Of course these are blog comments, so I’m not saying you should proofread your statements for accuracy before posting. All I’m saying is that, as a literalist, I was puzzled by your statements as they didn’t seem to describe what I knew about Palin or Wattenberg. Hence my correction of what you posted.

        • I meant Wattenberg’s NY neighbors:

          “When my first baby was a girl, I noticed a curious phenomenon. It seemed every baby girl we met in Riverside Park in New York was named either Hannah or Olivia.”

        • Dude, come on. I won this round of the Internet. You’re rich and smart. Let me have my little victory. ;)

          P.s. The ‘are’ was supposed to be an ‘or’. (It doesn’t make sense otherwise.)

          P.s.s. Anyhow, I just picked up on it. I didn’t even know that the FIRST TWO SENTENCES in her book would say “New York”.* That was just sweet sweet karma bailing me out. But I could tell…just some sort of Malcolm Gladwell, Steve Sailer hair on the back of the neck. We could call it “subconscious Bayesian insight” so it’s stah-tist-i-cal. :)

          Anyhow, she’s a smart interesting lady. And I think plenty smart enough even to be self-aware. When she wrote those sentences it wasn’t just because that’s the water she swims in as a fish…it was because they were GOOD SENTENCES to bring out the picture and she knew what she was doing.

          *It might have been in one of her blog posts that I skimmed too, but I’m too lazy to go back and use the ctrl-F to check. :(

  5. Male-female diff in attending religious services = 15%??
    Last year’s GSS estimated that 35% of households have guns. Not sure how that relates to the individual-level figures by name.
    Party by gender looks about right.
    When it comes to guns, “there’s something about Mary…”

    name       Dem   gun  relig  college        N
    david     47.1  48.7   53.4     55.7  3064839
    john      46.7  49.5   51.9       55  3687229
    steven    46.1  47.3   54.1     56.3  1087825
    james     47.1  50.4   54.8     52.5  3647373
    robert    46.7  49.3   52.8     53.4  3591894

    susan     50.3  53.2   38.3     57.7  1213848
    jennife   53.1  51.5   35.1     57.9  1587428
    mary      54.7  56.8   36.1     49.3  2618387
    patrici   54.4    55   36.7       50  1602754
    linda     52.4  55.5   40.6     49.6  1446826
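To check the "15%??" reading in the last comment against its own figures, here is a minimal Python sketch that averages each column separately for the five male and five female names and takes the difference. Weighting each name by its N is an assumption (the site may aggregate differently); the column meanings are taken at face value from the posted header, and the names `weighted_means`, `MEN`, and `WOMEN` are illustrative.

```python
COLUMNS = ("Dem", "gun", "relig", "college")

# Rows as posted in the comment: (Dem, gun, relig, college, N).
# Names are truncated to seven letters exactly as in the original output.
MEN = {
    "david": (47.1, 48.7, 53.4, 55.7, 3064839),
    "john": (46.7, 49.5, 51.9, 55.0, 3687229),
    "steven": (46.1, 47.3, 54.1, 56.3, 1087825),
    "james": (47.1, 50.4, 54.8, 52.5, 3647373),
    "robert": (46.7, 49.3, 52.8, 53.4, 3591894),
}
WOMEN = {
    "susan": (50.3, 53.2, 38.3, 57.7, 1213848),
    "jennife": (53.1, 51.5, 35.1, 57.9, 1587428),
    "mary": (54.7, 56.8, 36.1, 49.3, 2618387),
    "patrici": (54.4, 55.0, 36.7, 50.0, 1602754),
    "linda": (52.4, 55.5, 40.6, 49.6, 1446826),
}


def weighted_means(rows):
    """Average each percentage column, weighting every name by its count N."""
    total = sum(row[-1] for row in rows.values())
    return {
        col: sum(row[i] * row[-1] for row in rows.values()) / total
        for i, col in enumerate(COLUMNS)
    }


men = weighted_means(MEN)
women = weighted_means(WOMEN)
diff = {col: men[col] - women[col] for col in COLUMNS}

for col in COLUMNS:
    print(f"{col:8s} men {men[col]:5.1f}  women {women[col]:5.1f}  men - women {diff[col]:+5.1f}")
```

On these ten names the religious-attendance column differs by roughly 16 points between the male and female groups (an unweighted average gives nearly the same gap), close to the commenter's rough 15%. Ten common names are not a sample of the population, so this only summarizes the posted figures.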
