What are some common but easily avoidable graphical mistakes?

John Kastellec writes:

I was thinking about writing a short paper aimed at getting political scientists to not make some common but easily avoidable graphical mistakes. I’ve come up with the following list of such mistakes. I was just wondering if any others immediately came to mind?

– Label lines directly

– Make labels big enough to read

– Small multiples instead of spaghetti plots

– Avoid stacked barplots

– Make graphs completely readable in black-and-white

– Leverage time as clearly as possible by placing it on the x-axis.

That reminds me . . . I was just at a pharmacology conference. And everybody there—I mean everybody—used the rainbow color scheme for their graphs. Didn’t anyone send them the memo, that we don’t do rainbow anymore? I prefer either a unidirectional shading of colors, or a bidirectional shading as in figure 4 here, depending on context.

60 thoughts on “What are some common but easily avoidable graphical mistakes?”

        • Yeah, figure 6 not only commits to red vs green. It also violates advice 5) from the post above. If you will print it in black and white, I doubt you will see anything.

        • Isn’t the one about graphs readable in black and white obsolete? Monochrome printers are increasingly rare, and most journals accept color figures already. Yes there are color blind people, but you can just use the right palette without throwing color out completely. It often takes 10x as long to make sense of a complex chart that was mindlessly converted to black and white after reading advice like this.

        • There are different kinds of color blindness, and going full b/w is the safest way to handle it. Though it would not help blind people =(

          My argument for making pictures readable in b/w is that it makes it better in color as well. It is easier to visually distinguish two colors if they have different value, not just hue.

          Here is an unrelated read: https://www.outdoorpainter.com/painting-basics-understanding-value/

          Having said that, I think that “Make graphs completely readable in black-and-white” is good advice rather than an absolute axiom. To improve Figure 6 from Andrew’s paper, I could change colors from red and green (to handle the major type of colorblindness) but keep the value the same. Because it looks nice this way.
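The point about value versus hue can be checked numerically. Below is a minimal sketch, assuming colors are given as 8-bit RGB triples and using the Rec. 601 luma weights as a rough stand-in for how a color prints in grayscale. The function names, the sample colors, and the `min_gap` threshold are illustrative assumptions, not a standard.

```python
def luma(rgb):
    """Approximate grayscale brightness (0-255) using Rec. 601 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def distinguishable_in_gray(color_a, color_b, min_gap=50):
    """True if two colors differ enough in brightness to survive b/w printing.

    min_gap is an arbitrary illustrative threshold, not an established cutoff.
    """
    return abs(luma(color_a) - luma(color_b)) >= min_gap


# Hypothetical palette entries: two pairs that differ in hue,
# but only the second pair also differs strongly in value.
red = (220, 50, 50)
green = (50, 160, 50)
dark_blue = (20, 40, 120)
light_orange = (250, 190, 90)

print(distinguishable_in_gray(red, green))               # similar brightness
print(distinguishable_in_gray(dark_blue, light_orange))  # very different brightness
```

A pair that fails this kind of check relies on hue alone, which is exactly the case that gets lost in a monochrome print.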

        • Monochrome printers are not rare in libraries and computer labs where students print, or in low-budget locations. I always print monochrome if possible because I have a budget and color ink is very expensive. In my university you need to make a strong case to get a printer with color for normal use. Perhaps not an issue in wealthier settings.

        • I also find the inconsistency between Figures 4 and 6 in that paper a bit problematic: in the first figure, red is used to indicate the upper end of the scale (positive percentages), and then in the second figure it’s used to represent the lower end of the scale (low percentages).

          (And you can only get away with schemes where white indicates a value if you can — as in these figures — outline the regions with that (or other) values, so they’re distinguishable from the white background. You certainly wouldn’t want to use color schemes like these in scatter plots, for example.)

      • I’m a form of red-green colorblind and prefer the rainbow palette with range 0 to 4/5.

        The palettes designed for colorblind people look unreadable to me…

  1. Since we’re on the subject of clear communication (graphical or otherwise), the author says he’s compiled a “list of such mistakes” while the list itself doesn’t contain the mistakes, but rather the correctives TO the mistakes. This might be nitpicky in the extreme, but it still irks me when I see people do this. The description of a (figure, list, table) should match its contents!

      • I hope he IS advocating small multiples – and not spaghetti plots. Indeed, consistent with his confusion of corrections and mistakes, he appears to be saying use small multiples rather than spaghetti plots. I’m ok with that advice.

        • I don’t really get it. Small multiples are for showing different but related curves, spaghetti plots are for showing multiple realizations of the same curve… so they serve different functions. Obviously what’s needed is small multiples of spaghetti plots.

        • Heck, use small multiples of the same spaghetti plot, just printed over and over again. What are the odds the reviewer will notice?

    • Fair point, my bad. In my defense, this query was originally sent as an email to Andy, who I knew would know what I meant; he generously offered to post it, and I didn’t think to clarify.

    • My bad. I meant “corrections for common mistakes.” (I sent the query to Andy as an email intended for his advice, and I knew he would understand what I meant. He generously offered to post it, and I didn’t think to clarify further).

  2. You can also use different degrees of saturation – like shades of gray but for any color. I have found that it’s hard to distinguish too many different saturation levels, so better to use only 3 or maybe 4 max. And order them in the direction of the data magnitude.

    As a bonus, these colored areas or lines of different saturation will translate well to grayscale.
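As a rough illustration of the saturation idea above, here is a sketch that builds a small single-hue ramp with Python's standard `colorsys` module and checks that its grayscale brightness changes in one direction. The hue value, the saturation range, and the helper names are assumptions chosen for the example.

```python
import colorsys


def saturation_ramp(hue, levels=4, s_min=0.25, s_max=1.0):
    """Return `levels` hex colors of one hue, ordered from pale to fully saturated.

    Assumes levels >= 2. Brightness (HSV "value") is held at 1.0.
    """
    colors = []
    for i in range(levels):
        s = s_min + (s_max - s_min) * i / (levels - 1)
        r, g, b = colorsys.hsv_to_rgb(hue, s, 1.0)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


def gray_level(hex_color):
    """Approximate grayscale brightness of a '#rrggbb' color (Rec. 601 weights)."""
    r = int(hex_color[1:3], 16)
    g = int(hex_color[3:5], 16)
    b = int(hex_color[5:7], 16)
    return 0.299 * r + 0.587 * g + 0.114 * b


ramp = saturation_ramp(hue=0.6)           # a blue, four levels
grays = [gray_level(c) for c in ramp]     # should fall steadily from pale to saturated
print(list(zip(ramp, grays)))
```

Mapping the palest color to the smallest data value and the most saturated to the largest keeps the ordering aligned with the data magnitude.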

  3. No double y-axes! No time series as bar plots! No only using color for distinguishing groups – use different line dashes or point shapes, too! Start your bar plots at 0! Graph percentage values on a 0-100 axis – if it makes your effect look a lot smaller, that’s because your effect is very small.

    One I’m much less dogmatic about is no 3D plots. Your graph shouldn’t let you hide (or dramatize) your results based on how you angle it. But I’ve had people I deeply respect disagree with me on this one, so I’m sure there must be a value to 3D graphs (other than “they look cool”) I don’t quite get. I’ll keep using heatmaps.
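The "start your bar plots at 0" rule comes down to simple arithmetic: a bar's drawn height is its value minus the axis baseline, so a truncated axis changes the ratio the eye sees. A small sketch with made-up values (the numbers 50, 55, and 45 are purely illustrative):

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)


true_ratio = apparent_ratio(50, 55)                    # axis starts at 0
truncated_ratio = apparent_ratio(50, 55, baseline=45)  # axis starts at 45

print(true_ratio, truncated_ratio)
```

With the baseline at 0 the second bar is 10% taller than the first; with the baseline at 45 it is drawn twice as tall, even though the data are the same.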

    • Also, I’d argue that stacked bar plots are the most effective way to show the relative proportion of 2 values at multiple levels – if the two stacked bars always add to the same value, it makes it very easy to see how they change. More than 2 values, or bars summing to different amounts, is where stacked bar plots go wrong.

    • Hi, Mike. These are guidelines that are true much of the time but I don’t agree with the exclamation points.

      – Double y axes can be okay if they express the same values in different terms: https://www.nhs.uk/live-well/healthy-weight/height-weight-chart/
      – Decent bars that don’t start at 0: https://darksky.net
      – A variable can correlate with a large practical effect even if its range is only 10 percentage points. Compressing it to a tenth the height of a graph doesn’t necessarily serve a purpose: https://www.philosophicaleconomics.com/2016/02/uetrend/

    • 3D plots can be great. I’m pretty good at reading heat maps and contour maps, but when I’m looking at terrain for hiking or biking I get a much better idea of what it will be like by looking at a 3D projection than by looking at a contour map. And there’s no reason to think that’s true only when y represents elevation.

      Actually I think that just about all of these rules have exceptions. But I endorse them as standard practice, to be violated only if there’s good reason. It’s just that I think there sometimes are good reasons.

      • For sure. There’s a big difference between a 3D plot that represents three physical dimensions and the ones you get out of Excel that are 2D charts with a decorative third dimension. I think it’s the latter that such lists of rules warn you against.

  4. Edward Tufte’s books on graphical communication offer a kind of graphical ideal — simplicity admitting only those elements that are absolutely necessary for clarity. Does anyone still read them?

  5. I like the diverging color scheme in Andrew’s paper but I prefer it broken into discrete segments.

    Making the diverging scheme discrete makes it easier to track the color of an object to its corresponding value in the legend, and it also makes it easier to compare graphic objects, especially when there are many objects and the spread is smooth.

    In Tableau you can choose a diverging scheme and fix the number of color segments. For myself I find 7 is about the max that’s useful, otherwise you may as well use the continuous scheme.

    • I dislike discrete color schemes because a relatively large difference within a color level will appear as no difference, while a relatively small difference across levels shows up as a large difference.
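The objection to discrete schemes can be made concrete with a small binning sketch. The class breaks and the three values below are hypothetical, chosen only to show the effect:

```python
from bisect import bisect_right

# Hypothetical class breaks for a five-color scheme, in percent.
BREAKS = [20, 40, 60, 80]


def color_class(value):
    """Index (0-4) of the color bin that `value` falls into."""
    return bisect_right(BREAKS, value)


a, b, c = 21, 39, 41
print(color_class(a), color_class(b), color_class(c))
```

Here `a` and `b` differ by 18 points but land in the same bin and get the same color, while `b` and `c` differ by only 2 points and are drawn in different colors.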

      • There may or may not be small differences across the color breaks. Depends on how much data you have and how continuous it is.

        Personally I’d rather see the basic breakdown at a glance then label the units and look at the details.

        IMO the appropriate use of continuous color schemes is to show gradients, not to find values. A gradient scheme isn’t at its best functionality on a map of 50 cells (states at the national level or zips at the county level). If you’re using zip codes at the state level or counties at the national level, then a gradient is appropriate.

  6. I’ve seen several cases lately where I wished for a rainbow color scheme instead of a one-color-fine-shades-7-lines plot where it literally took me a minute to figure out which curve belongs to which legend item. All of these were graphs laid out by a professional graphics designer (in books, popular magazines and corporate whitepapers) and did look very aesthetically pleasing…

  7. Some of this advice is context-specific.

    In Archeology (I think) time is traditionally represented by the Y-axis, because that is how fossils lie.

    Rainbow colors have a lot of drawbacks, but if you have about 7 distinct groups to represent, hey, why not use rainbow?

    • Rainbow colormaps are pretty widespread in astronomy because:

      1. Tradition and inertia;

      2. Astronomers really know the electromagnetic spectrum, so the rainbow is (for us) the only obvious, logical ordering of colors;

      3. There are relevant physical systems that match with rainbow color schemes, such as temperatures and colors of stars;

      4. It’s part of our metaphorical technical language. E.g., we talk about radial velocities in terms of “redshift” and “blueshift” (which means that using the rainbow colormap to plot velocity fields is a no-brainer). In plots of galaxy “color” versus mass, the two main groups are “the red sequence” and “the blue cloud”; about ten or fifteen years ago, people started focusing on the galaxies in between, and that region is now called “the green valley” — not because the galaxies really look green, but because it’s obvious (to us) that green is in between red and blue. (“Valley” because there are fewer galaxies there, so in a contour plot of density it’s a low point between the red and blue ridges.)

    • The rationale against rainbow is that some colors are closer perceptually than others (orange and yellow imply closer relation than green and blue, for instance), and that the rainbow isn’t really colorblind-friendly. If you are graphing seven groups, I think these issues are going to come up anyway, so rainbow is probably not a bad bet – but I think this is where small multiples come in handy, when possible.

      • Peter, Mike:

        A big problem with the rainbow color scheme is its discrete nature (which we know about, as the rainbow itself is a continuous range of frequencies that appears discrete to our visual system). Thus, when reading a graph with a rainbow scheme, we need to mentally go back and forth between the continuous thing being mapped and the discrete color scheme. In contrast, a unidirectional or bidirectional color scheme can be read directly as continuous.

        Mikhail:

        If you have 7 discrete groups to represent, I’d recommend using small multiples, i.e. 7 separate graphs. Using the colors requires that back-and-forth between graph and legend which is so confusing.

        • > If you have 7 discrete groups to represent, I’d recommend using small multiples, i.e. 7 separate graphs. Using the colors requires that back-and-forth between graph and legend which is so confusing.

          Not if you follow advice 1 and “Label lines directly”!

          I think if you want to show 7 time series on a single plot, well, it would be a messy plot no matter how you try. So probably you should not do it. Do small multiples instead.

          If you have 7 blobs (i.e., you want to show the results of clustering), why not? Just remember to label blobs directly.

          And, by the way, you can do color coding even if you use small multiples. Like I’m using blue to show the numbers of mild infections and red to show the number of severe infections. And I would use this color even if I’m showing only one of these lines.

        • Mikhail:

          Sure, if you have 7 blobs, give them different colors and label them directly. And I agree with the use of color to make it easier to follow small multiples, as for example figure 6 of this paper. (Sorry about the red and green lines; we should fix that.)

        • In contrast, a unidirectional or bidirectional color scheme can be read directly as continuous.

          Really? Can you give me an example? (Because I find that such color schemes just provide a more limited set of quasi-discrete colors than what the rainbow provides.)

      • IMO the problem with the rainbow scheme is that the colors are non-complementary. The general rule is whatever colors you use, use complementary colors – Not red-yellow-blue, but red-orange-yellow; not red-green-purple but green-blue-purple.

        • Interesting! thanks! I stand corrected.

          I guess the colors I suggested are gradients along…something?? :)

          The real complementary colors in the Wikipedia article are frequently used as divergent gradients, as in Andrew’s maps referenced above.

        • jim said:

          “I guess the colors I suggested are gradients along…something?? :)”

          Along the Color Wheel (https://en.wikipedia.org/wiki/Color_wheel) (or Color Circle, as Newton called it), which is also something I learned in grade school, along with the mnemonic “Roy G. Biv” for “red, orange, yellow, green, blue, indigo, violet”, which also describes the colors of the rainbow in order; see https://en.wikipedia.org/wiki/ROYGBIV.

        • I’ve never used mnemonics. Lots of people remember Every Good Boy Deserves Favors but they can’t play a lick of music much less construct a chord from the tones. Roy G. Biv seems like the most useless one of all because the progression of colors is obvious – how could a person not know orange comes between red and yellow?

          But it’s true that beyond the basic spectrum and what’s obvious to my eye, and even though I’ve done lots of graphic work that’s generally been well received, I don’t really get color. I understand how to manipulate palettes like RGB or HSL or whatever – to me, HSL makes the most intuitive sense – and I recognize that there is a numerical axis in the color model for each component of the model. But even HSL seems odd to me: how exactly are “saturation” and “lightness” distinguished? They seem in part variation of the same thing.
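One way to see how “saturation” and “lightness” differ in HSL is to sweep each one while holding the other fixed. The sketch below uses Python's standard `colorsys` module (which orders its arguments as H, L, S); the helper name and the choice of red as the hue are assumptions for illustration.

```python
import colorsys


def hsl_to_hex(h, s, l):
    """Convert HSL components (each in 0..1) to a '#rrggbb' string."""
    r, g, b = colorsys.hls_to_rgb(h, l, s)  # note: colorsys takes H, L, S in that order
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255))


RED = 0.0

# Lightness sweep at full saturation: runs from black, through pure red, to white.
lightness_sweep = [hsl_to_hex(RED, 1.0, l) for l in (0.0, 0.5, 1.0)]

# Saturation sweep at mid lightness: runs from neutral gray to pure red.
saturation_sweep = [hsl_to_hex(RED, s, 0.5) for s in (0.0, 0.5, 1.0)]

print(lightness_sweep)
print(saturation_sweep)
```

In this model lightness moves a color between black and white, while saturation moves it between gray and the vivid hue. Both sweeps also change how bright the color looks, which may be why the two can feel like variations of the same thing.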

  8. I tend to disagree with this

    “Make graphs completely readable in black-and-white”

    because it basically says that color shouldn’t be used to convey additional information. In which case, what’s the point of using color (other than pure aesthetics)? It’s a bit like saying that all color movies should have exactly the same visual effect and meaning if seen on a B&W TV.

    • Thanks. But what I meant is that most journals won’t publish color graphs. So if you design a graph that uses color to distinguish between items (say, using different color lines to distinguish groups), then the graph won’t be interpretable upon publication.

      • How many people read print vs digital anyways these days? Why are Journals trapped in a B/W age? At best they should go full color online and B/W for print.

      • If you use color in such a way that a grayscale rendering is understandable, then the color version is even more understandable. For example, low saturation blue in contrast with high saturation red would render reasonably well in both grayscale and color.

      • what I meant is that most journals won’t publish color graphs

        I suppose that depends on the field. In mine, all the journals will publish color figures in the online version of the paper (whether it’s HTML or PDF), and of course there are no color restrictions in the preprint versions authors put on the arXiv.

        It’s true that if you want color figures in the physical, printed edition, it costs extra money — but almost all the main astronomical journals are now online-only, so that’s pretty much irrelevant; and I would venture that virtually no one reads the print edition of the only one or two that aren’t yet online-only.

    • I’m with you on this. It is way too limiting to ensure B&W readability. Let’s move on from 1950.

      Reasonable accommodation should be made for Red/Green difficulties, but not if it gets in the way of the main point being conveyed. Your paper on the other hand should be structured such that it is readable and well argued even if your reader is blind.

  9. Avoid bar graphs for relative measures (anything calculated as a ratio) or when the axis value is unrealistic (ex: blood pressure will never be zero so a bar starting at an axis of zero makes no sense). In general, I prefer to avoid bar charts completely and use a marker with error bars instead. One exception could be when looking at differences where a true axis of zero makes sense and the estimates could go in either direction.

  10. I recently read and enjoyed several articles about alternatives to the rainbow color palette. I particularly like the sections where they show how each color scheme looks under different forms of color-blindness and/or in black and white.

    Here’s a couple of them (these are R-centric but relevant beyond that):
    https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html
    http://colorspace.r-forge.r-project.org/articles/endrainbow.html
