Joe Simmons writes:
I asked MTurk NFL fans to consider an NFL game in which the favorite was expected to beat the underdog by 7 points in a full-length game. I elicited their beliefs about sample size in a few different ways (materials .pdf; data .xls).
Some were asked to give the probability that the better team would be winning, losing, or tied after 1, 2, 3, and 4 quarters. If you look at the average win probabilities, their judgments look smart.
But this graph is super misleading, because the fact that the average prediction is wise masks the fact that the average person is not. Of the 204 participants sampled, only 26% assigned the favorite a higher probability to win at 4 quarters than at 3 quarters than at 2 quarters than at 1 quarter. About 42% erroneously said, at least once, that the favorite’s chances of winning would be greater for a shorter game than for a longer game.
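To make the gap between the average prediction and the average person concrete, here is a minimal Python sketch. The numbers are made up for illustration and are not the survey responses, and the helper names `strictly_increasing` and `has_reversal` are hypothetical. It shows how the averaged curve can rise with every quarter even when most individual respondents do not give rising answers.

```python
# Hypothetical toy data, NOT the survey responses: each row is one
# respondent's stated probability (in percent) that the favorite is
# winning after 1, 2, 3, and 4 quarters.
responses = [
    [55, 60, 65, 70],  # rises with every quarter
    [60, 60, 70, 75],  # never drops, but has a tie
    [70, 55, 65, 80],  # drops once
    [50, 75, 70, 60],  # drops twice
]


def strictly_increasing(probs):
    """True if each quarter's probability exceeds the previous one."""
    return all(a < b for a, b in zip(probs, probs[1:]))


def has_reversal(probs):
    """True if a shorter game ever gets a higher probability than a longer one.

    Checking adjacent quarters is enough: any drop between two
    non-adjacent quarters implies a drop at some adjacent step.
    """
    return any(a > b for a, b in zip(probs, probs[1:]))


n = len(responses)
# Average across respondents, one value per quarter.
average_curve = [sum(col) / n for col in zip(*responses)]
share_increasing = sum(strictly_increasing(r) for r in responses) / n
share_reversal = sum(has_reversal(r) for r in responses) / n

print("average curve:", average_curve)
print("share strictly increasing:", share_increasing)
print("share with at least one reversal:", share_reversal)
```

The two shares need not add up to 100%, because a respondent who gives the same probability for two game lengths falls into neither group. That is consistent with the 26% and 42% figures above leaving a remainder.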
How good people are at this depends on how you ask the question, but no matter how you ask it they are not very good.
The explicit warning, “This Graph is Super Misleading,” is a great idea.
But don’t stop there! You can do better. The next step is to follow it up with a spaghetti plot showing each person’s estimates. If you click through the links, you see there are about 200 respondents, which is a lot to show in a single spaghetti plot, but you could handle this by breaking up the people into categories (for example, based on age, sex, and football knowledge), allowing a grid of smaller graphs, each of which wouldn’t have too many lines.
P.S. Jeff Leek points out that sometimes a spaghetti plot won’t work so well because there are too many lines to plot and all you get is a mess (sort of like the above plate-o-spag image, in fact). He suggests the so-called lasagna plot, which is a sort of heat map, and which seems to have some similarities to Solomon Hsiang’s “watercolor” uncertainty display.
A heat map could be a good idea, but let me also remind everyone that there are ways to reduce overplotting in a spaghetti plot, keeping the spaghetti structure while losing some of the messiness. Here are some strategies, in increasing order of complexity:
1. Simply plot narrower lines. Graphics devices have improved, and thin lines can work well.
2. Just plot a random sample of the lines. If you have 100 patients in your study, just plot 20 lines, say.
3. Small multiples: for example, a 2×4 grid broken down by male/female and 4 age categories. Each sub-plot then has far fewer lines, so overplotting is less of a problem.
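The three strategies so far can be sketched in Python. This is a rough illustration under stated assumptions, not a recommended implementation. The record layout (dicts with `sex`, `age_group`, `x`, `y`), the simulated patients, and the helper names `sample_lines`, `facet_lines`, and `plot_panels` are all hypothetical. The data preparation uses only the standard library, and the drawing step assumes matplotlib is installed.

```python
import random
from collections import defaultdict


def sample_lines(lines, k, seed=0):
    """Strategy 2: keep a random subset of k lines (all of them if fewer)."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))


def facet_lines(lines, key):
    """Strategy 3: group lines into panels by a categorical key."""
    panels = defaultdict(list)
    for line in lines:
        panels[key(line)].append(line)
    return dict(panels)


def plot_panels(panels, ncols=4, linewidth=0.3):
    """Strategies 1 and 3 together: thin lines, one small panel per group."""
    import matplotlib.pyplot as plt  # assumed installed; imported only here

    nrows = -(-len(panels) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, sharex=True, sharey=True,
                             squeeze=False)
    for ax, (label, group) in zip(axes.flat, sorted(panels.items())):
        for line in group:
            ax.plot(line["x"], line["y"], color="black", linewidth=linewidth)
        ax.set_title(str(label), fontsize=8)
    return fig


# Simulated stand-in data, not from any real study: 100 patients,
# each with one line of four measurements.
rng = random.Random(1)
patients = [
    {
        "sex": rng.choice(["F", "M"]),
        "age_group": rng.choice(["18-29", "30-44", "45-64", "65+"]),
        "x": [1, 2, 3, 4],
        "y": [rng.random() for _ in range(4)],
    }
    for _ in range(100)
]

subset = sample_lines(patients, 20)  # 20 of the 100 lines
panels = facet_lines(patients, key=lambda p: (p["sex"], p["age_group"]))
```

Calling `plot_panels(panels)` would then draw up to eight small panels with shared axes, and `plot_panels({"sample": subset}, ncols=1)` would draw the 20 sampled lines in a single panel.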