David Blei points me to this report by Lars Backstrom, Jonathan Chang, Cameron Marlow, and Itamar Rosenn on an estimate of the proportion of Facebook users who are white, black, hispanic, and asian (or, should I say, White, Black, Hispanic, and Asian).
Facebook users don’t specify race/ethnicity, but they do give their last name, and Backstrom et al. use Census data on the ethnic breakdowns of last names to estimate the proportion of Facebook users in each of several Census-defined ethnic categories. They present their results for several snapshots of Facebook from 2006 through 2009.
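To make the idea concrete, here is a minimal sketch of this sort of surname-based calculation. It is not Backstrom et al.'s actual code, and the probabilities in `SURNAME_TABLE` are invented for illustration; real values would come from the Census surname tables. The function name `estimate_proportions` and the choice to skip unmatched surnames are also my assumptions. The sketch averages, over users, the probability of each category given the user's last name:

```python
from collections import defaultdict

# Hypothetical P(category | surname) rows. These numbers are made up for
# illustration; they are NOT taken from the Census surname files.
SURNAME_TABLE = {
    "SMITH":  {"white": 0.70, "black": 0.20, "hispanic": 0.05, "asian": 0.05},
    "GARCIA": {"white": 0.05, "black": 0.00, "hispanic": 0.90, "asian": 0.05},
}

def estimate_proportions(surnames, table):
    """Estimate category shares by averaging P(category | surname) over users.

    Surnames missing from the table are skipped (an assumption of this
    sketch; one could instead assign them an "all other names" row).
    """
    totals = defaultdict(float)
    matched = 0
    for name in surnames:
        row = table.get(name.strip().upper())
        if row is None:
            continue  # surname not in the lookup table
        matched += 1
        for category, prob in row.items():
            totals[category] += prob
    if matched == 0:
        return {}
    return {category: total / matched for category, total in totals.items()}

if __name__ == "__main__":
    users = ["Smith", "Garcia", "Smith", "Unlisted"]
    for category, share in sorted(estimate_proportions(users, SURNAME_TABLE).items()):
        print(f"{category}: {share:.3f}")
```

The key assumption is that the ethnic breakdown of, say, the Smiths on Facebook matches the breakdown of Smiths in the Census.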
Their analysis seems reasonable enough to me, even if it won’t be exactly right since the Facebook population is not a random sample of Americans within each ethnic category. The next step is to break things down by other variables, most obviously age, sex, education, and state of residence. Does the Census give last name data for any of these subcategories of the population?
And then there’s lots more you can do, once you have these numbers; for example, you can estimate how often people in different groups (categorized by age, sex, ethnicity, etc.) log into Facebook, how many Facebook friends they have, and so forth. You can get all sorts of details, far beyond anything my collaborators and I have learned about social connections.
Also, a few minor comments:
1. Backstrom et al. appear to use the terms “white” and “Caucasian” interchangeably, which, as I’ve noted before, isn’t quite right, as most South Asians are “Caucasian” but not white. It’s not clear whether South Asians fall in the “Caucasian” or “Asian/Pacific Islander” category in this analysis.
2. The dotted lines in their very first graph are labeled as “the proportion of the Internet population” for each ethnic group. I’m just wondering: where did they get these numbers?
3. Also, along the same lines, could they give the link to the public data they used? I followed the link they did give, but it was a general Census website, and I wasn’t sure where one would go to find the full tables.