Going beyond the rainbow color scheme for statistical graphics

Yesterday in our discussion of easy ways to improve your graphs, a commenter wrote:

I recently read and enjoyed several articles about alternatives to the rainbow color palette. I particularly like the sections where they show how each color scheme looks under different forms of color-blindness and/or in black and white.

Here are a couple of them (these are R-centric but relevant beyond that):

The viridis color palettes, by Bob Rudis, Noam Ross and Simon Garnier

Somewhere over the Rainbow, by Ross Ihaka, Paul Murrell, Kurt Hornik, Jason Fisher, Reto Stauffer, Claus Wilke, Claire McWhite, and Achim Zeileis.

I particularly like that second article, which includes lots of examples.

7 thoughts on “Going beyond the rainbow color scheme for statistical graphics”

  1. I have to admit that I found the second article a bit dubious. They say they’re going to demonstrate how inferior the rainbow color scheme is… but only two of the five comparisons actually use the rainbow scheme.

    Of the two examples which do use it, the second — the influenza severity in Germany — is plausibly a good example. I mean, their statement that “it is hard to grasp intuitively which areas are most affected by influenza” is kind of silly, since all I have to do is glance once at the colorbar and it’s obvious which areas are the most affected — the red ones! — but, yes, if you really want to emphasize just the high-severity areas and not also emphasize the low-severity areas, the second color scheme is a bit better.

    But when I look at the first example, the bivariate kernel density estimate plot, what I see is that the rainbow scheme is clearly superior, because it tells you more about the data. The rainbow scheme shows me five or six different levels, and lets me see that the lower-left density peak is definitely denser (solid red) than the upper-right peak (orange-ish). But their preferred “sequential-heat” scheme basically shows me just three levels (I’m ignoring the yellow outline, since that’s almost impossible to see and comes across as though it might be just an optical illusion of some kind). Is the lower-left density peak denser than the upper-right? Well, it’s really hard to tell; they’re both almost uniformly dark red-brown. If you look carefully, you might decide the upper-right peak was maybe a little lighter, but it’s dubious. Whereas the rainbow scheme makes the difference clear.

    (This, in a sense, explains why the sequential-heat scheme translates so well into grayscale — it’s not really giving you any more information than the grayscale, so there’s not much point in using it other than pure aesthetics: it’s like a duotone version of a plot.)

    So, yes, if you want to simplify your data and suppress details and variations — don’t use the rainbow scheme. There are going to be cases where this is the right thing to do. But there are going to be other cases where showing more of your data is useful (and even honest), and the rainbow is a pretty good way of doing that.

    • I think part of the knock on the kernel density plot is that: a) each color band represents a different and unpredictable value band; and b) the narrow bands are so narrow that they effectively don’t provide any info. I don’t have any trouble recognizing the difference between the two peak / high intensity areas on either map. To my eye they’re equally effective on both plots.

      But I’ve used many magnetic anomaly and mag gradient maps in the rainbow scheme and never had a problem because of the colors. You also make a good point about the comparison with grayscale. If the color is a single hue, why bother with color at all? And I don’t like the brown-yellow color scheme at all. A deterrent to readers.

      I definitely agree that people shouldn’t get too caught up in a bunch of rules about what colors to use. Just do what makes the data shine.

  2. As a partially red-green colorblind person myself, I appreciate when someone goes to some trouble to take color vision into consideration. One thing that I’d like to bring out is that there are several properties that can all be varied, and the best course is to make use of more than one of them.

    There are at least hue, saturation, and intensity. If all of them are used, and with a perceptually monotonic variation, there will be the greatest chance for a viewer to perceive the intended message of a diagram, even a viewer with color vision aberrations. One complication is that the number of distinguishable gradation steps is not necessarily the same for all these characteristics. For example, most people may be able to distinguish more hue steps than saturation steps. Such a diagram also will fare well enough when reproduced in gray-scale, since at least the intensity axis will remain.
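    To make the point above concrete, here is a minimal Python sketch (not taken from either linked article) that compares a plain rainbow with a palette that varies hue, saturation, and lightness together. The function names and the specific hue, lightness, and saturation ranges are assumptions chosen for illustration. HLS lightness is not a perceptual quantity, so the sketch checks the sRGB relative luminance of each color rather than assuming monotonicity, and the result holds only for the parameters shown.

```python
import colorsys


def relative_luminance(rgb):
    """Relative luminance of an sRGB color with channels in [0, 1]."""
    def linearize(c):
        # Undo the sRGB transfer curve before weighting the channels.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def rainbow(n):
    """Hue sweeps red -> blue at full saturation and value (only hue varies)."""
    return [colorsys.hsv_to_rgb((2 / 3) * i / (n - 1), 1.0, 1.0) for i in range(n)]


def multi_property(n):
    """Illustrative palette: hue, lightness, and saturation all change together.

    The ranges below are arbitrary choices for this sketch, not a recommendation.
    """
    colors = []
    for i in range(n):
        t = i / (n - 1)
        hue = 1 / 6 + t * (2 / 3 - 1 / 6)   # yellow -> blue
        lightness = 0.9 - 0.7 * t           # light -> dark
        saturation = 0.4 + 0.6 * t          # muted -> vivid
        colors.append(colorsys.hls_to_rgb(hue, lightness, saturation))
    return colors


def is_monotonic(values):
    """True if the sequence is strictly increasing or strictly decreasing."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)


for name, palette in [("rainbow", rainbow(5)), ("multi-property", multi_property(5))]:
    lum = [relative_luminance(c) for c in palette]
    print(name, [round(v, 2) for v in lum], "monotonic:", is_monotonic(lum))
```

    With these five steps, the rainbow's luminance rises and falls along the scale, while the second palette darkens steadily, which is why it would still order correctly in a grayscale reproduction. A different number of steps or different ranges may not behave the same way, so it is worth rerunning the check after any change.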
