Aleks pointed me to this recent article by Pablo Mateos, Paul Longley, and David O’Sullivan on one of my favorite topics.
The authors produced a potentially cool naming network of the city of Auckland, New Zealand. I say “potentially cool” because I have such difficulty reading the article–I speak English, statistics, and a bit of political science and economics, but this one is written in heavy sociologese–that I can’t quite be sure what they’re doing. However, despite my (perhaps unfair) disdain for the particulars of their method, it’s probably good that they’re jumping in with this analysis. Others can take their data (and similar datasets from elsewhere) and do better. Ya gotta start somewhere, and the basic idea (to cluster first names that are associated with the same last names, and to cluster last names that are associated with the same first names) seems good.
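To make that basic idea concrete, here is a minimal Python sketch of one way to cluster forenames by the surnames they share. It is not the method Mateos, Longley, and O’Sullivan used. It assumes a deliberately simple approach chosen for illustration. Each forename is represented by counts of the surnames it co-occurs with. Forenames whose profiles have cosine similarity at or above a threshold are linked, and the connected components are taken as clusters. The toy data, the `threshold` value, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy data: (forename, surname) pairs, one per person.
people = [
    ("Aroha", "Ngata"), ("Aroha", "Parata"),
    ("Wiremu", "Ngata"), ("Wiremu", "Parata"),
    ("Wei", "Chen"), ("Wei", "Wang"),
    ("Li", "Chen"), ("Li", "Wang"),
]


def cooccurrence_profiles(pairs):
    """Map each first element (e.g. a forename) to a Counter of the
    second elements (e.g. surnames) it appears with."""
    profiles = defaultdict(Counter)
    for name, other in pairs:
        profiles[name][other] += 1
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_by_cooccurrence(pairs, threshold=0.5):
    """Link names whose profiles have cosine similarity >= threshold and
    return the connected components, each as a sorted list."""
    profiles = cooccurrence_profiles(pairs)
    names = sorted(profiles)
    neighbours = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)
    clusters, seen = [], set()
    for start in names:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first search over the similarity graph
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


print(cluster_by_cooccurrence(people))
# [['Aroha', 'Wiremu'], ['Li', 'Wei']]
```

Swapping each pair covers the other half of the idea, clustering surnames by the forenames they share. On this toy data, `cluster_by_cooccurrence([(s, f) for f, s in people])` gives `[['Chen', 'Wang'], ['Ngata', 'Parata']]`. A real names network would need a more careful similarity measure and clustering method than a fixed threshold.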
I have to admit, though, that I was amused by the following line, which, amazingly, led off the paper:
Personal naming practices exist in all human groups and are far from random.
Far from random, huh? Who’d a thunk it?
And also this:
Researchers have automatically classified the 2.5 million users of a mobile phone operator in Belgium into French and Flemish speaking communities based exclusively on the topological network structure of their 800 million phone calls and texts interactions [9]. In doing so they have demonstrated the enduring importance of linguistic and geographical barriers in the age of global mobile communications, and more importantly, that they can automatically be detected using network analysis.
OK, sure, any analysis of 2.5 million users is impressive on computational grounds alone, but . . . it’s hard to be impressed that you can automatically partition phone calls and texts from two different languages, right? It’s fine to do, but it’s hardly news that people like to talk in their own language.
This is partly what goes into the “sociologese” style of writing: a sort of flattening of affect, in which seemingly strange behaviors or findings are presented deadpan, while unremarkable observations can be touted as important.
P.S. [just added] This was just a coincidence (I wrote the above post about a month ago, and it was waiting its turn in the queue, whereas the item from yesterday was more recent), but it’s funny that I slammed economists one day and sociologists the next. I’m just full of stereotypes this week, I guess!
I don’t dissent from anything you say about sociology – to which I can add countless examples – but these guys, judging from their affiliation, are geographers, which perhaps makes it even worse…
“It’s fine to do, but it’s hardly news that people like to talk in their own language.”
Agree (!), but still, you have to remember that this was said in Belgium, a member of the European Union. In the EU, saying such an obvious thing is revolutionary, since the EU barons seem to have forgotten it.
The authors are all geographers, so you’ve applied your own stereotype to a different group.
Discovering the obvious is better than proving a false theorem :)
Andrew, it’s almost like you missed the fact that they determined who is in which group SOLELY from the network structure. Yeah, it wouldn’t have been hard to do it by looking at the languages they used in the texts, but that’s not how they did it!
I’m still not sure the result is surprising — it seems much like the example of separating liberal and conservative books by looking at the “people who bought X also bought Y” relationships on Amazon, which turned out to be pretty easy I think — but at least it’s a lot harder than saying “hey, I bet the people who send texts in Flemish are Flemish.”
Phil:
I caught that. But I’m not impressed that phone calls in two different languages can be separated into two clusters.
Look, I’m part of the team that wrote this Belgium paper. I agree with you: the cluster thing is not such a big deal, and it was not the point.
The paper was about the probability of mobile phone communication between two individuals, which we found to be roughly proportional to the inverse square of the distance between them. Not such a big deal either… once you have checked it ;)
(We also devised a dynamical model that yields this probability distribution, but now we’re getting far from your post about names.)
The paper is there btw ;)
http://lanl.arxiv.org/abs/0802.2178
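The inverse-square relationship described in this comment can be illustrated with a short Python sketch. It estimates the exponent α in p(d) ∝ d^(-α) by ordinary least squares on a log-log scale. The data below are synthetic and noise-free, generated to follow an exact inverse square. They are not results from the paper. The sketch assumes that the link probability at each distance has already been estimated from real call records, for example by binning pairs of users by distance, and does not show that step. The function and variable names are hypothetical.

```python
import math


def fit_power_law_exponent(distances, probabilities):
    """Ordinary least-squares slope of log(p) against log(d).
    If p is proportional to d**(-alpha), the returned slope is -alpha."""
    xs = [math.log(d) for d in distances]
    ys = [math.log(p) for p in probabilities]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free illustration (not data from the paper): p(d) = 0.3 / d**2.
distances_km = [1, 2, 5, 10, 20, 50, 100]
probabilities = [0.3 / d ** 2 for d in distances_km]
print(round(fit_power_law_exponent(distances_km, probabilities), 6))  # -2.0
```

With real, noisy data a straight-line fit on a log-log plot is only a rough diagnostic, not strong evidence of a power law by itself.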
Hi—I wasn’t trying to criticize your work. My criticism was of the linked paper that seemed to be making a big deal of the less-important part of what you’d done.
Here is a similar observation of mine using a Twitter network of 40 million users:
http://www.akshaybhat.com/LPMR/
Also, analyzing a network of 2.5 million users is not as impressive as you might think; it’s actually quite routine in the field of network analysis.
Yeah, once you’ve found the poor guy who will spend a few weeks cleaning up the data ;)
@Andrew, I got your point and agreed with it. I was even quite amused to see us cited that way.
As for your jab at Mateos et al.’s paper, it made me want to read it, to check whether you’re just being an as__ole (look, I’m kidding, OK?)
I’ve always enjoyed your blog, Andrew (although given the time lapse you can tell I’m not an everyday reader…), so it was with mixed feelings that I came across your comment on our paper – but hey, all publicity is good publicity, right?!
Suffice to say that the paper went around the houses to get published – PLoS ONE was our fourth port of call. The language had been flattened (in several metaphorical senses) by the time it had done the rounds of the revision process multiple times. Something to do with the difficulties of publishing more quantitative sociology (or for that matter geography) which is trying to build on new approaches in ‘network science’… if you’re not a physicist-mathematician-statistician, that is (or so it sometimes seems).
My feeling (I speak only for myself here, not my co-authors, who may have different views) on our work (and Christophe’s) is that being able to confirm that such structures exist in the networks everyone is currently so excited about is (i) interesting in itself, and (ii) an important first step, even if it doesn’t tell us anything we didn’t already know. Our intended contribution is the concept of a names network, and the clustering exercise on Auckland was intended to establish that such a network does indeed have enough interesting structure to make it worth spending further time on.
What we’d like to do next is see if we can use network structure to infer the cultural-ethnic-linguistic affiliations of names in a network for which we don’t have pre-classified information. In fact, that’s where we got started on this idea anyway.
I’m glad you like the using-forenames-to-cluster-surnames, surnames-to-cluster-forenames idea, even if we don’t do the latter in the paper. Another direction we’d like to go in is running co-clustering methods to simultaneously do both, which seems like it would make sense.