We haven’t had one of these in awhile, having mostly switched to the “chess trivia” and “bad p-values” genres of blogging . . .
But I had to come back to the topic after receiving this note from Raghuveer Parthasarathy:
Here’s another bad graph you might like. It might (arguably) be even worse than the “worst graphs of the year” you’ve blogged about, since rather than being a poor representation of data, it is simply the plotting of a tautology that mistakenly gives the impression of being data. (And it’s in Nature.)
On the vertical axis we have the probability of being Type 2 Diabetic (T2D). On the horizontal axis we have the probability of being normal. There’s a clear, important trend evident, right? No! The probability of being normal is trivially one minus the probability of being T2D! The graph could not possibly be anything other than a straight line of slope -1. (For the students out there: the complete lack of scatter in the graph is a strong hint of something wrong.) What about the colors? They assign the data points for people with a > 50% probability of being T2D to be red, and the opposite to be green. The graph is simply plotting a tautology, that the probability of x is one minus the probability of not-x, together with a color scheme for labeling x. Paraphrasing Tufte, it has an information-to-ink ratio of approximately zero.
Not quite zero: what we seem to have here is a highly inefficient two-dimensional multicolor display of a one-dimensional set of 49 numbers, using dots that are so blurry that we can’t actually get much of a sense of their distribution. All joking aside, I’m guessing this graph would be much better if the x-axis were used for some relevant continuous variable (for example, people’s ages) and the colors used for some discrete variable (for example, some other indicator of health status).
Parthasarathy does add: “I’ll stress that the study itself is fascinating.” So, just to be clear, he’s criticizing the graph, not the underlying research.