
The statistical properties of smart chains (and referral chains more generally)

Louis Mittel writes:

The premise of the column this guy is starting is interesting: Noah Davis interviews a smart person and then interviews the smartest person that smart person knows and so on.

It reminded me of you mentioning the survey design strategy of asking people about other people, like “How many people do you know named Stuart?” or “How many people do you know that have had an abortion?”

Ignoring the interview aspect of what this guy is doing, I think there are some cool questions about the distribution/path behavior of smartest-person-I-know chains (say, seeded at random). Do they loop? If so, how long do they run before looping, and how large are the loops? What parts of the population do they explore? Do you know of anything that’s been done on something like this?

My reply: Interesting question. It could be asked of any referral chain, for example asking a sequence of people, “Who’s the tallest person you know?” or “Who’s the best piano player you know” or “Who’s the weirdest person you know” or whatever. But let’s stick with the “who’s the smartest” chain.

In answer to Louis’s first question: yes, such a chain would have to loop, as there’s only a finite number of people. Some of the loops might be pretty short. For example, if you ask Stephen Hawking for the smartest person he knows, and then ask that next person, you’ll probably loop back to . . . Stephen Hawking. The distribution of lengths of the loops, that I have no idea.

I’m trying to think how one could measure the distribution of this sort of referral network.
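
One way to get a rough feel for this is simulation. Here is a minimal sketch, not a claim about real social networks. It assumes that everyone agrees on a single "smartness" score, that each person knows a fixed number of acquaintances drawn uniformly at random, and that nobody can name themselves. The function names `smartest_known` and `follow_chain` are made up for illustration.

```python
import random
from collections import Counter


def smartest_known(n_people, n_friends, rng):
    """Build a referral map: person -> the smartest of their acquaintances.

    Assumptions (not from the post): one agreed-upon score per person,
    acquaintances drawn uniformly at random, no self-referral.
    """
    score = [rng.random() for _ in range(n_people)]
    referral = []
    for person in range(n_people):
        # Sample from the other n_people - 1 indices, then shift to skip `person`.
        picks = rng.sample(range(n_people - 1), n_friends)
        known = [p if p < person else p + 1 for p in picks]
        referral.append(max(known, key=score.__getitem__))
    return referral


def follow_chain(referral, seed):
    """Follow referrals from `seed` until someone repeats.

    Returns (tail, loop): the number of steps taken before entering the
    loop, and the number of people in the loop.
    """
    first_seen = {}
    person, step = seed, 0
    while person not in first_seen:
        first_seen[person] = step
        person = referral[person]
        step += 1
    tail = first_seen[person]
    return tail, step - tail


if __name__ == "__main__":
    rng = random.Random(0)
    referral = smartest_known(n_people=1000, n_friends=20, rng=rng)
    loops = Counter(follow_chain(referral, seed)[1] for seed in range(1000))
    print(sorted(loops.items()))  # (loop length, number of seeds ending there)
```

Because each person names exactly one other person, every chain must eventually repeat, and with self-referral ruled out the shortest possible loop has two people. Changing `n_friends`, or adding noise to the scores each person perceives, would show how sensitive the loop lengths are to those assumptions.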

P.S. All in all, the guy in the above-linked interview seemed reasonable, but I was struck by this one bit, where he writes of one of his early business experiences:

Our head of IT at the time was adamant that we should start an Internet Service Provider because it was hard to get onto the Internet if you weren’t at college, and ISPs were growing something like 1,000 percent a month. He tried to convince me to invest $10,000 to start an ISP in Cambridge. . . . I thought it was the stupidest thing ever. That was the end of my Internet foray. If I had listened to him, I would have been like Zuckerberg or something. I completely missed the boat back then.

Everything’s relative. The guy is rich, successful, can do anything he wants. But he thinks he missed the boat.

P.P.S. One thing that came up in comments is, can people refer to themselves? I assume not, otherwise all chains would eventually dead-end at Stephen Hawking, Scott Adams, and that albedo guy.

17 Comments

  1. Willem says:

    The questions seem a part of Social Network Analysis and directed toward some model that has the Small world property [1], but also some discriminative feature based on (for example) IQ. (If you are the dumbest person in the network, nobody will mention you, etc.)

    The hard part is the number of potential networks with anything above a few players. Many proofs are based on mean field approximations, so they evade questions on the true distribution.

    The number of loops seems to depend on things like the distribution of “smart”, the distribution of “number of links” and whether or not people fail to name the same person “smartest”. My feeling is that, even when the error rate is fairly small, the number of loops will diminish dramatically.

    On the other hand, in fairly connected networks without errors, the small-world effect will dominate.

    For a model that estimates a wide range of distributions (from preferential attachment to purely random models), see [2].

    [1] http://en.wikipedia.org/wiki/Small-world_network
    [2] http://www.kellogg.northwestern.edu/faculty/rogers_b/personal/assets/netpower.pdf

  2. jonathan says:

    The original page rank idea – speaking without technical precision about the basic Google approach – viewed connections to web pages and sites as a proxy for usefulness. Maybe one way to look at the distribution would be to examine differences in results as you change the query.

    Have you ever posted about citation indices?

  3. rdm says:

    If I were to answer this kind of question, the answer would not always be the same, in part because my opinions change, but also because when I do have an idea of “smartness of someone” that sort of idea depends on the context.

    But quite often I would have no answer.

  4. Tom says:

    This is an interesting approach to what will no doubt be a fairly interesting set of interviews.

    I couldn’t help thinking that the reality tv version would be to ask someone who was the dumbest person they knew, follow that chain and televise the ensuing chaos.

  5. John Mashey says:

    I think I’d want to categorize such questions, since tallest has a clear simple ranking metric, but smartest doesn’t, in practice.

  6. jonathan says:

    I have no way of thinking as deeply as you about this material so pardon the idiocy. But I have a question. It’s related to the weakly informative prior work you linked to earlier. How do you determine in a case like this the bias driving the relationship and the extent to which that fit is dependent? By extent, I am in fact meaning spatial.

    Here’s a case. Music resolves. I play my work in my head until it reaches a spot where it needs work, which may be a harmony but may mean a reset of the entire piece – which means I’ve created a part of a whole. But I sometimes wonder how my work would resolve if I grew up in a different tonal system. (Substitute whatever you want for tonal.) And sometimes I wonder about how tuning affects these perceptions: shift away from equal temperament to any weighted tuning, like one for strings, and you get a different world in which my model is distorted by my priors. And then the unfamiliarity, the surprise, incorporates into the structure, as we’ve seen with the history of dissonance and atonality, but within that structure there is still this essential “pick this and you get this” issue. So if I ask this guy who the smartest guy is and then that guy, how do I know the extent to which this chain is determined by a shared prior of some sort? I assume then you could dump random processes on top and that you’d find normal distributions … but is this something well thought out in your field? I hope I was clear; it’s clearly multi-level but biased at some levels in some ways that are hard to contain.

    Sorry to post what must to you be drivel.

  7. lemmy caution says:

    The chain concept is the key to the hardest brain teaser that I know of:

    http://blogs.scientificamerican.com/observations/2012/07/19/puzzling-prisoners/

  8. Lord says:

    Under the proviso, it is not themself?

    • Andrew says:

      I can only assume. Otherwise all chains would eventually dead-end at Stephen Hawking, Scott Adams, and that albedo-guy.

      • Wonks Anonymous says:

        The joke doesn’t quite work because Scott Adams likes to refer to himself as an idiot.

        • Andrew says:

          Yah, but remember that Adams also described himself as having “a certified genius IQ” and also that he “can open jars with [his] bare hands” and is “able to lift heavy objects.” He pretty much dominates Stephen Hawking, being equal on the IQ thing and then clearly superior in the jar-opening and object-lifting dimensions.

  9. You know why this post pleases me!

    • Andrew says:

      I’m glad that some people around here appreciate the recurring characters. Even if I couldn’t work Dr. Anil Potti into this one.

  10. […] Andrew Gelman asks about the statistical properties of referral chains. […]

  11. J Bulbulia says:

    Just a stray thought about estimating the distribution of referrals. Probably there’s an easier way. I’m not a mathematician.

    First, identify the chain length using the chain product rule.* Here’s a simple linear formulation but it could be done in parallel with a different counting method.

    A GENERAL METHOD FOR COUNTING WITHIN A POPULATION
    A chain is seeded in a connected population. A progenitor counts “1” and passes that number to (any) child who says “2” and passes that number… until everyone has a number. Only one number is allowed for each n of N. At some point, a child, the “youngest”, will say “N”, and unable to find a target, returns “N” to a parent…. back down the chain to the progenitor. Interestingly, everyone in the population will know the population count, “N”. Nifty.

    COUNTING REFERRALS
    Here’s a distribution of referrals using a similar principle. Let each n of N cite one “smartest.” No self-citing. Seed as before. This time, the youngest child of N sends “N” + her citation information (# of referrals + referral record… how much information?) Parents keep a running register, which is passed to grandparents. The progenitor then knows “a smartest of all” distribution for N combining a very dumb rule with a running index.

    There are other distributions. Allow two citations. Allow N-1. Allow self-citations…

    Generally I think the key to learning about the distribution is to allow indexing but to avoid looping. Said differently, look at this problem like you would a tree.
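
The counting scheme described above can be sketched as a depth-first pass over a connected population, with each person numbered once and a running register of citations. This is only an illustrative reading of the comment, not the method in MacKay's chapter. The names `count_and_tally`, `neighbors`, and `cites` are hypothetical, and the tiny four-person population is invented for the example.

```python
from collections import Counter


def count_and_tally(neighbors, cites, seed):
    """Number each reachable person exactly once, starting from `seed`.

    neighbors: person -> list of people they can pass the count to
    cites:     person -> the one person they name as "smartest"
    Returns (population count, Counter of citations received).
    """
    number = {}        # person -> the number they called out
    tally = Counter()  # cited person -> citations received so far
    stack = [seed]
    while stack:
        person = stack.pop()
        if person in number:
            continue   # only one number is allowed per person
        number[person] = len(number) + 1
        tally[cites[person]] += 1
        stack.extend(n for n in neighbors[person] if n not in number)
    return len(number), tally


if __name__ == "__main__":
    # Invented toy population: a knows b and c, b also knows d.
    neighbors = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    cites = {"a": "b", "b": "d", "c": "b", "d": "b"}
    n, tally = count_and_tally(neighbors, cites, seed="a")
    print(n, tally.most_common())
```

Since each person is visited once, the pass never loops, which is the "treat it like a tree" idea. Here a single traversal carries the register directly, rather than handing messages back through parents to the progenitor.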

    But again, this isn’t my area. Might be way off. Fun to think about!

    *I just learned about the chain product rule from my friend Marcus, by sitting in on his class for fun.
    This is a chapter from a book by his colleague David MacKay on the chain rule.

    https://ecs.victoria.ac.nz/foswiki/pub/Courses/COMP307_2013T1/Lect18/message-passing-DJCM-Chapter16.pdf

    The book:
    MacKay, D. J. (2003). Information theory, inference and learning algorithms. Cambridge University Press.

  12. […] Gelman asks about the statistical properties of the “smartest person you know” network. That is, if you asked every one of a large number of people “Who is the smartest person you […]

  13. Calum says:

    An enjoyable version of this is to enter a bar. Ask the barman his favourite bar and his favourite drink in that bar. Journey to said bar, order said drink, and repeat question. Repeat until you are served a Pan-Galactic Gargle-Blaster, or the game is terminated in some other way.