pourquoi michael jackson est blanc

Alex Tabarrok links to these amusing partial Google searches found by Dan Ariely:

ariely1.png

ariely2.png

Ariely pretty much takes these at face value, labeling them “What Boyfriends and Girlfriends Search for on Google” and writing: “This shows Google’s remarkable power as a source of data on a range of human behaviors, emotions, and opinions. It gives us insights into what people might care the most about concerning a given topic. . . .”

I followed a link in Ariely’s comments to a blog whose entire content is partial Google searches. Seems like a bit of a niche market to me, but the results were so weird (for example, one of the top ten searches for “my rob” is “my robot friend is pregnant”) that I started to get skeptical.

So I tried the simplest thing I could think of on my own computer, and here’s what came out:

jackson.png

(Click to see a larger version of this image.)

My second choice was est-ce-que, which also yielded some strange results.

So my current thought is not to take these Google partial searches so seriously. I wonder if the algorithm purposely spits out wacky searches in order to make the search function more fun to play with.

Maybe some of the Google employees who read this blog can enlighten us (anonymously, if necessary) about how seriously we should interpret these?

P.S. Ariely’s blog is pretty cool–a mix of some basic intro stuff (as is appropriate, since the blog is attached to his popular book) and some deeper ideas too. When Predictably Irrational came out, we received from his publicist several emails, a copy of the book (which Juli reviewed), and a suggestion that we could interview Ariely or he could guest blog for us. We said yes on both but never heard back. I can understand it: publicists get busy, and we did get a free book out of it. But, Dan, if you’re reading this, get in touch: we’d still be glad to have you guest blog for us!

P.P.S. Our earlier discussion of googlefights as a teaching tool.

P.P.P.S. Ariely seems to have moved his blog. Here’s an updated link.

16 thoughts on “pourquoi michael jackson est blanc”

  1. Something else that is interesting with the Google auto complete results is when you try searching for:

    "Christianity is " vs "Judaism is " vs "Islam is"

    Pretty surprising results.

  2. As usual, your results vary by geographic locale (think mixture model). For [pourquoi], I get more prosaic completions and for [pourquoi mi] I get the single suggestion "pourquoi michael jackson est mort". Completions for [how can i get my gi] are different, but equally funny. They show our lack of sex ed in the United States, with completions like "how can i get my girlfriend pregnant"!

    I'd guess Google's simply using a Shannon-style noisy channel model to find the most likely completions in their vast query logs. And weighting locales or doing it on a per-locale basis.

    What's cool about the algorithm is that it includes errors. I get the same suggestion for [porquoi mi]. Given the tolerance to errors, it's amazing it runs so fast, even given the vast caches and computer power in Google's server farms.

    We have a module that implements auto-completion with errors using a noisy channel model. We describe the whole thing in our "did-you-mean?"-type spelling correction tutorial:
    http://alias-i.com/lingpipe/demos/tutorial/queryS

  3. Bob: Of course I realize that the results vary geographically. My point was that they were just weird: no way are they the most common searches (as in fact you can tell from the numerical counts that you see if you click on the image). Which makes me wonder whether Ariely was correct in taking his results seriously. What do you think?

  4. A lot of those questions come from Yahoo Answers and WikiAnswers. These question and answer sites are being used by sorta-clever search engine optimizers to try and drive traffic to their (often bogus) sites. As part of that, they're stuffing queries into google to prime the autocomplete.

    It seems really clever, but is only sort of: you'll note that these illegitimate sites aren't showing up in the results nearly as often as they should be, given all the effort to get these (mostly useless) things to show up like that.

    (One of these that really took off: "How is babby formed????" It became a bit of an internet meme, complete with flash animations of dramatic readings of the incoherent replies.)

  5. Andrew – fun game. I do think you're almost certainly right that these aren't the most common searches (and knowing Google, I wouldn't be surprised if they have, in addition to geographic factors, time-of-day factors: probably, in this case, more sex and less marriage after 10pm).
    Google also wants to stay up to date, so recent searches, especially in your vicinity, will tend to show up more.
    For those reasons I fear this "data" as Ariely claims it is really just entertaining and not usable data.

    Finally "(as in fact you can tell from the numerical counts that you see if you click on the image)." is incorrect. While I'm sure they are correlated, the frequency of a search question and the number of google results it produces are two entirely different things.

  6. Yeah I think at this point taking Google at face value on anything is a bit of a no-no. On a related note:

    1. I assume you've seen this Language Log post on the weirdness of Google hit counts: http://languagelog.ldc.upenn.edu/nll/?p=1992

    2. I tried a variant on this lately and found a really weird systematic effect: http://glassbottomblog.blogspot.com/2010/01/count

    In principle I imagine there are ways to get reliable information out of Google hit counts, suggestions, etc. but I have no idea of what they are.

  7. Suggested completions for "why" on google.com are rather strange, too, including "why do dogs eat poop" and "why can't i own a canadian". The latter is also a suggestion on google.ca.

    However, "Why can't I own a Canadian?" is actually the title of a humor piece, and as a result might actually be something people search for, whereas "pourquoi Michael Jackson est blanc" is not the title of anything; when put in quotes it yields only people wondering why Google gives this as a result.

  8. Following on from what xi'an said, some of these things are usually quotes from films/books/tv etc, and so they look bizarre, but are legit. I don't know about the ones in particular in your post, but unless there's something obvious that comes up when you hit enter, then I don't know what to suggest.

  9. No, you can't take Google hit counts seriously for queries of more than one word.

    I hadn't seen the counts on the auto-complete, but I'd suspect that it's still a combination of noisy-channel auto completer and search errors. They have to cut lots of corners making things go fast. And multi-term counts are only lousy approximations.

    Also, if they geographically narrow enough, then you'd expect a crazy level of variance. I learned that from the awesome cancer cluster map examples in BDA!

    And as someone else pointed out, all sorts of people try to game the system by submitting botnet queries. Google then tries to filter these out.

    I seriously doubt Google's trying to modify the results for humor. They tend to keep their organic search pretty direct in order to avoid conflicts with their advertising (maybe this is inevitable if you sell ads with content, like newspapers and Google do).

  10. amazon best buy craigslist dictionary.com ebay facebook gmail hotmail imdb jcpenney kohls lowes myspace netflix office depot pandora qvc realtor.com southwest airlines target usps verizon wireless walmart xbox 360 youtube zillow.

    Are these just the commonest searches in my area, or is advertising involved?

  11. Here's an interesting phenomenon:

    Type in any unpronounceable combination of three letters — nqp, zlw, vqo, whatever — and it always turns up something on the list.

    But type "fuc" and the list goes blank.

    Clearly Google has deliberately shut down that combination.

    Not just that. Other shut-down combos can be found: assh, nigge, motherf, and several more you can probably find for yourself.
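
A note on the noisy-channel guess in comment 2: the idea can be sketched in a few lines of Python. This is only a toy illustration, not Google's method and not the LingPipe module mentioned above. The query log and its counts are invented, and the names `prefix_edit_distance`, `complete`, `error_penalty`, and `max_edits` are hypothetical. Each logged query is scored by its log frequency (the "source" model) minus a fixed penalty per typing error (a crude "channel" model), where errors are counted as the edit distance between what was typed and the closest prefix of the query.

```python
import math
from collections import Counter

# Invented toy query log; the counts are made up for illustration only.
QUERY_LOG = Counter({
    "pourquoi michael jackson est mort": 50,
    "pourquoi michael jackson est blanc": 30,
    "pourquoi le ciel est bleu": 80,
    "how can i get my girlfriend back": 40,
})


def prefix_edit_distance(typed: str, query: str) -> int:
    """Smallest Levenshtein distance between `typed` and any prefix of `query`."""
    # prev[j] holds the distance between typed[:i] and query[:j].
    prev = list(range(len(query) + 1))
    for i, typed_char in enumerate(typed, start=1):
        curr = [i]
        for j, query_char in enumerate(query, start=1):
            substitution = prev[j - 1] + (0 if typed_char == query_char else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    # Taking the minimum over the last row lets the match end at any prefix.
    return min(prev)


def complete(typed, query_counts, error_penalty=4.0, max_edits=2, k=3):
    """Rank logged queries by log P(query) minus a per-edit penalty (assumed values)."""
    total = sum(query_counts.values())
    scored = []
    for query, count in query_counts.items():
        edits = prefix_edit_distance(typed, query)
        if edits > max_edits:
            continue  # too far from what was typed to be a plausible completion
        score = math.log(count / total) - error_penalty * edits
        scored.append((score, query))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [query for _, query in scored[:k]]


if __name__ == "__main__":
    # The misspelled prefix from comment 2 still finds the "pourquoi mi..." queries.
    print(complete("porquoi mi", QUERY_LOG))
```

Even in this toy version, the ranking depends entirely on whose queries are in the log and how heavily errors are penalized, which is one more reason not to read the suggestions as a direct measure of what people search for most.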
