Ted Dunning writes:

Google Analytics normally does a pretty good job of dealing with statistical issues. For instance, the Google Website Optimizer product does a correct logistic regression complete with error bars and (apparently) a Bayesian analysis of how likely one setting is to actually be better than another.

But their demo of their latest visualization product is worth a write-up. They seem to ascribe volumes of meaning to variations in small-count statistics.

Check out the video.

As Aleks knows, I can’t bear to watch videos. I like the idea of dynamic graphics, but I can’t stand the lack of control that comes from watching a video. I like to read something that I can see all of at once.

But the Google tool looks pretty cool. Also, I didn’t know they did Bayesian logistic regression. I wonder what prior distribution they use? This is a topic that my colleagues and I have thought about.

Ted continues:

I don’t know that they do Bayesian logistic regression. I have compared their results on a few cases to your R code (bayesglm) and to non-Bayesian results (glm). The results from all three (Google, glm, bayesglm) were indistinguishable. I suppose that is good news from the standpoint of choice of prior. The only advantage in practice to a Bayesian process would be that having a decent prior would prevent the system from having bad transients when data collection is begun.
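Ted's point about priors preventing bad transients can be illustrated with a small sketch. This is not Google's (or bayesglm's) actual computation; it just contrasts the raw maximum-likelihood conversion-rate estimate with the posterior mean under an assumed Beta prior, using made-up numbers, in the small-sample regime where transients occur.

```python
import math

def raw_rate(successes, trials):
    """Maximum-likelihood estimate; extreme or undefined with tiny samples."""
    return successes / trials if trials else math.nan

def posterior_mean(successes, trials, a=2.0, b=18.0):
    """Posterior mean under a hypothetical Beta(a, b) prior centered on 10%."""
    return (successes + a) / (trials + a + b)

# After only 3 visitors, 2 of whom converted:
print(raw_rate(2, 3))        # 0.666... -- a wild early transient
print(posterior_mean(2, 3))  # ~0.174 -- shrunk toward the prior
```

With more data the two estimates converge, which matches Ted's observation that glm, bayesglm, and Google's numbers were indistinguishable on his test cases.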

The Bayesian-ness that I was referring to was the computation of whether a design option was a good choice. They present these results in terms of the probability that a design option would be better than the original design and the probability that the design option is the best choice of the ones shown. The computation of these probabilities is inherently Bayesian since it involves integration over the posterior. I have long advocated this way of presenting results to decision makers and am happy to see Google using it. In fact, I generally go one step further, since business decision makers usually don’t care whether they have the absolute best answer and do care about getting answers sooner. The number that I present in these problems is usually the probability that a design option will be within x% of the best design (I call this score viability). This score has the nice property that when the choice really doesn’t matter, the viability of all the comparable options will increase to 100% as more data is collected.
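The viability score Ted describes can be sketched by Monte Carlo integration over the posterior. The following is a hypothetical implementation, not Ted's or Google's code: it assumes binomial conversion data with independent Beta(1, 1) priors, draws joint posterior samples, and estimates the probability that each option's rate is within x% of the best option's rate. All the counts are invented for illustration.

```python
import random

def viability(data, x=0.05, draws=20000, rng=None):
    """data: list of (successes, trials) per option.
    Returns, for each option, the estimated posterior probability that its
    conversion rate is at least (1 - x) times the best option's rate."""
    rng = rng or random.Random(42)
    wins = [0] * len(data)
    for _ in range(draws):
        # One joint posterior draw: Beta(s + 1, t - s + 1) for each option.
        rates = [rng.betavariate(s + 1, t - s + 1) for s, t in data]
        bar = (1 - x) * max(rates)
        for i, r in enumerate(rates):
            if r >= bar:
                wins[i] += 1
    return [w / draws for w in wins]

# Two nearly identical options and one clearly worse one:
print(viability([(50, 1000), (52, 1000), (20, 1000)]))
```

Note the property Ted mentions: the two near-identical options both end up with substantial viability, and as their sample sizes grow both scores climb toward 100%, signaling that the choice between them doesn't matter.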

I completely agree with Ted’s point. There’s lots of statistical writing on how to estimate rankings. But, from a decision-analytic point of view, it’s very rare that you’d care about rankings at all! Especially in settings with many options and noisy data.