Silly old chi-square!

Brian Mulford writes:

I [Mulford] ran across this blog post and found myself questioning the relevance of the test used.

I’d think Chi-Square would be inappropriate for trying to measure significance of choice in the manner presented here; irrespective of the cute hamster. Since this is a common test for marketers and website developers – I’d be interested in which techniques you might suggest?

For tests of this nature, I typically measure a variety of variables (image placement, size, type, page speed, “page feel” as expressed in a factor, etc.) and use LOGIT, Cluster and possibly a simple Bayesian model to determine which variables were most significant (chosen). Pearson Chi-squared may be used to express relationships between variables and outcome but I’ve typically not used it to simply judge a 0/1 choice as statistically significant or not.

I like the decision-theoretic way that the blogger (Jason Cohen, according to the webpage) starts:

If you wait too long between tests, you’re wasting time. If you don’t wait long enough for statistically conclusive results, you might think a variant is better and use that false assumption to create a new variant, and so forth, all on a wild goose chase! That’s not just a waste of time, it also prevents you from doing the correct thing, which is to come up with completely new text to test against.

But I agree with Mulford that chi-square is not the way to go. I’d prefer a direct inference on the difference in proportions. Take that inference–the point estimate and its uncertainty, estimated using the usual (y+1)/(n+2) formulas–and then carry that uncertainty into your decision making. Balance costs and benefits, and all that.
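Here is a minimal sketch of what that direct inference could look like for a two-variant test. The counts are made up for illustration and are not taken from Cohen's post. I read (y+1)/(n+2) as the posterior mean under a uniform prior, so the code pairs it with the matching posterior standard deviation, sqrt(p(1-p)/(n+3)); that pairing, the normal approximation for the difference, and the function names are my assumptions, not a prescription.

```python
import math


def adjusted_proportion(y, n):
    """Posterior mean and sd of a proportion under a uniform prior.

    y successes out of n trials gives a Beta(y+1, n-y+1) posterior,
    whose mean is (y+1)/(n+2) and whose variance is p(1-p)/(n+3).
    """
    p = (y + 1) / (n + 2)
    sd = math.sqrt(p * (1 - p) / (n + 3))
    return p, sd


def difference_in_proportions(y_a, n_a, y_b, n_b):
    """Point estimate and uncertainty for p_b - p_a, treating the arms as independent."""
    p_a, sd_a = adjusted_proportion(y_a, n_a)
    p_b, sd_b = adjusted_proportion(y_b, n_b)
    return p_b - p_a, math.sqrt(sd_a ** 2 + sd_b ** 2)


def prob_positive(diff, sd):
    """Normal-approximation probability that the true difference exceeds zero."""
    return 0.5 * (1 + math.erf(diff / (sd * math.sqrt(2))))


# Hypothetical data: variant A converts 10 of 98 visitors, variant B 20 of 98.
diff, sd_diff = difference_in_proportions(y_a=10, n_a=98, y_b=20, n_b=98)
p_b_better = prob_positive(diff, sd_diff)

print(f"estimated difference: {diff:.3f} +/- {sd_diff:.3f}")
print(f"approx. probability that B beats A: {p_b_better:.3f}")
```

The estimate and its uncertainty, rather than a yes/no verdict, are what you would feed into the cost-benefit calculation: multiply the plausible range of differences by what a conversion is worth and compare that to the cost of switching or of running the test longer.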

Moving forward, you’re probably making lots and lots of this sort of comparison, so put it into a hierarchical model and you’ll get inferences that are more reasonable and more precise.
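To give a rough sense of what the hierarchical model buys you, here is a crude stand-in: a normal-approximation, empirical-Bayes version of partial pooling across several tests. It is not a full hierarchical model (it plugs in point estimates of the group mean and between-test variance and ignores their uncertainty), and the input numbers and the function name partial_pool are hypothetical.

```python
import math


def partial_pool(estimates, std_errors):
    """Shrink per-test differences toward their precision-weighted mean.

    Assumes each estimate is roughly normal around its true effect and the
    true effects are normal around a common mean mu with variance tau^2.
    tau^2 is estimated by a simple method of moments, floored at zero.
    Returns (mu, tau2, list of (pooled_estimate, pooled_sd)).
    """
    k = len(estimates)
    weights = [1 / s ** 2 for s in std_errors]
    mu = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

    observed_var = sum((e - mu) ** 2 for e in estimates) / (k - 1)
    mean_sampling_var = sum(s ** 2 for s in std_errors) / k
    tau2 = max(observed_var - mean_sampling_var, 0.0)

    pooled = []
    for e, s in zip(estimates, std_errors):
        # Fraction of the way each estimate is pulled toward mu.
        shrink = s ** 2 / (s ** 2 + tau2)
        est = shrink * mu + (1 - shrink) * e
        # Conditional on mu and tau2, so this understates the real uncertainty.
        sd = math.sqrt((1 - shrink) * s ** 2)
        pooled.append((est, sd))
    return mu, tau2, pooled


# Hypothetical differences in conversion rate from four separate A/B tests.
raw = [0.10, 0.02, -0.03, 0.05]
ses = [0.05, 0.05, 0.05, 0.05]
mu, tau2, pooled = partial_pool(raw, ses)

for r, (est, sd) in zip(raw, pooled):
    print(f"raw {r:+.3f} -> pooled {est:+.3f} (sd {sd:.3f})")
```

With noisy tests like these, most of each estimate gets pulled toward the overall average, which is the "more reasonable" part; the smaller standard deviations are the "more precise" part, with the caveat that a proper hierarchical fit would also propagate uncertainty in the group-level parameters.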

But . . . who knows? Maybe Cohen’s advice is a net plus. Ignoring the chi-square stuff, the key message I take away from the above-linked blog is that, with small samples, randomness can be huge. And that’s an important lesson–really, one of the key concepts in statistics. Don’t overreact to small samples. If the silly old chi-square test is your way of coming to this conclusion, that’s not so bad.

1. zbicyclist says: