A checklist for data graphics

Christian Hennig offers the following checklist for people who are making data graphics:

1. Is the aim of the graph to find something out (“analysis graph”), or to make a point to others?

2. What do you want to find out?

3. Who is the audience for the graph? (It may be yourself.)

For each of these, also ask: …and does the graph work for this aim? (2b: Could you make a different or simpler graph from which the same thing could be learned?)

4. Do all the graphical elements make sense? This concerns proper use of colours, plot symbols, order, axes, lines, annotations, text; it also involves questioning default choices of the software!

5. Is the graph informative but not overloaded?

6. Is the graph easy to understand? (. . . and well enough explained?)

7. Does the graph respect the “logic of the data”? This concerns whether graphical elements are used in ways that correspond to the meaning of the data, such as whether lines connect observations that really belong together and should be seen as connected; whether values with a particular meaning, such as (often) zero, can be seen as such and are treated appropriately; whether variables (or objects) are standardised to make them comparable if the graph invites comparing them; whether orderings of observations or variables (e.g., along the x- or y-axis) are meaningful and helpful; etc.
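To make the standardisation point in item 7 concrete, here is a minimal sketch, assuming that "standardised" means z-scores (subtract the mean, divide by the sample standard deviation). That is only one possible reading; the checklist does not prescribe a method. The function name `standardise` and both example variables are hypothetical.

```python
from statistics import mean, stdev


def standardise(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical variables on very different scales (made-up numbers).
heights_cm = [160.0, 172.0, 181.0, 168.0, 175.0]
incomes = [21000.0, 54000.0, 38000.0, 45000.0, 30000.0]

# After standardising, both variables have mean 0 and standard deviation 1,
# so plotting them on a shared axis no longer lets the larger unit dominate.
z_heights = standardise(heights_cm)
z_incomes = standardise(incomes)

for zh, zi in zip(z_heights, z_incomes):
    print(f"{zh:6.2f} {zi:6.2f}")
```

Whether z-scores are the right choice depends on the data: for skewed variables a robust scaling or a transformation may respect the "logic of the data" better.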

And here’s the background, from Christian:

I have done two sessions in a course on data visualisation (statistics students), and apart from some examples and discussions on certain details I have one “baseline” slide with a few questions that are meant to help with making data graphs. I didn’t want them to be very specific; rather, my idea was that people could ask themselves these questions when making more or less any data graph, in order to check whether the graph is good.

I just thought I’d share them with you in case you are in need of ideas for the blog, or at least you’d be interested. I was somehow expecting that such a thing already exists on the blog, or in other places, but I haven’t really found what I wanted, so I thought I had to do it myself. There is an obvious limitation in that I wanted it to fit on one slide, so for sure this can be added to. Probably it can also be improved while sticking to the one-slide limit, so I’d be really curious what you or your blog audience thinks. (Also, if you have some place in mind where this kind of thing exists already, I’d be happy about a pointer.)

In fact it’s slightly longer than a slide as I have added a few things that I would say orally in the course.

All this reminds me of the advice I give in my Communicating Data and Statistics class, which is to think about:

– Goals

– Audience.

14 thoughts on “A checklist for data graphics”

  1. Regarding 4 and 6, if the audience is people other than myself, then I usually show the plot to a few people to get their input. If I make the plot, then it always makes sense and is easily understood to me, so point 6 is difficult to answer for myself.

    • Asking other people is good advice for sure… “If I make the plot, then it always makes sense and is easily understood to me” – good for you, but many use defaults without questioning them consciously, and many students have to learn to exercise their own judgment rather than just doing what they believe is expected from them.

      • I see. Good point.
        When I make a plot, I’ll think hard about and go through a lot of these great steps you have mentioned. Then I will hand it to a coworker (especially one not familiar with the subject matter), and they will sometimes point out some bit of information that they gleaned from the plot that I did not necessarily intend or foresee, as a result of the way that the plot is displayed (not necessarily the information in the data). It is amazing to me how different people get different information out of the same plot. So I think it is always a good idea to have other people look at your plot and describe it to you. I actually think it is a good idea to find someone who is not necessarily familiar with the subject or with analyzing data, if your audience for the plot is a broad one.

        • I’ll give an example that happened to me:
          I made a plot that consisted of a map with sample sites located on the map. The sample sites were denoted by bubbles, with size proportional to the number of samples taken from the site, and the color of the bubble was a gradient that denoted the proportion of positive samples taken from the site. I really thought this map was good – it showed the proportion by location of site on the map; it showed the approximate size of the site (number of samples was a proxy – larger sites had higher numbers of samples taken); it allowed comparison of both proportion and size at the same time for different regions on the map. Note that the sites (i.e. bubbles) were not evenly distributed across the map – they showed up in certain parts of the map in higher concentrations, and most of the sites had a low proportion, actually zero (meaning similarly colored bubbles).
          I showed the map to a coworker, and her first comment was, “oh, it looks like there isn’t much of the outcome in these regions”, as she pointed to blank parts of the map with no bubbles. …But that wasn’t the information I was trying to convey. Blank space didn’t mean there was a lower proportion of the outcome; it simply meant no sample sites were located in those areas. Something that helped a little bit was to change the color of zero proportion to the same color as the background on the map (i.e. just a circle as the bubble).

          You never know how someone is going to interpret your plot.

  2. I have some similar rubrics I used in my vis course for a first assignment in which I give the students a big dataset and ask them to create a static visualization to communicate the important structure to others. They include things like what we call ‘Expressiveness’ in vis research (does it show the data and only the data, without implying false inferences; do the mappings align well with the semantics of the data; is the most important information given visual priority; are the mappings consistent, e.g. respecting color mappings in subplots), ‘Effectiveness’ (how accurately can one read the data: do the mappings facilitate accurate decoding of values and comparisons; does the design minimize cognitive overhead for making important comparisons; are most comparisons possible through comparing positions of points rather than colors, sizes, etc.), organization and transformation (does the grouping and sorting of the data facilitate answering the important questions; are any transformations or aggregations done to the data appropriate), and non-data elements/guides (does it have a descriptive title and caption; are labels clear; is the data source and any important methodology noted; are there annotations to guide the reader to see the intended patterns; are there gridlines and is the contrast appropriate; if there are legends, are they necessary).

  3. It is always positive news to hear about data visualisation being included in courses. Despite what you write, Christian, there is lots of good advice around. Marc mentions several sources and you might even find my book helpful if you just pick some of the “Main points” summaries. Unfortunately authors often act in the spirit of Oscar Wilde: “I always pass on good advice. It is the only thing to do with it. It is never of any use to oneself.”

    Good graphics are deceptive. To paraphrase Dolly Parton: “You have no idea how difficult it is to draw a graphic that looks easy.”

    • “Good graphics are deceptive. To paraphrase Dolly Parton: ‘You have no idea how difficult it is to draw a graphic that looks easy.’”
      In fact, one thing I do in this course is talk the students through a number of considerations and decisions I went through in coming up with certain innocent-looking graphs, and I emphasize how much work it is (they then have the chance to make such an effort themselves, and indeed some of the outcomes are impressive).

  4. I liked Michael Betancourt’s suggestion in one of his case study livestreams: a good plot has to be the “right” kind of ugly (or maybe “a good plot is ugly in a neat way” or something like that, can’t remember the exact quote now).

  5. Nice, but given the forest plot, why not also show a L’Abbé plot? Variation in control rates/means can be critical.

    The L’Abbé plot is simply a raw data plot of study group estimates (e.g. control proportion and treatment proportion), while the forest plot just shows the study effect estimates, which presumes that variation in control rates is ignorable.

    Just one of the things that was suggested in L’Abbé KA, Detsky AS, O’Rourke K (1987) Meta-analysis in clinical research. Annals of Internal Medicine.

    Again just a raw data plot usually on the logit scale – hilarious that it was given the name of the first author.
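As a rough illustration of the difference described above, the following sketch computes, for each study, the coordinates a L’Abbé plot would show (control and treatment proportions on the logit scale) next to the single number a forest plot would show (here assumed to be the log odds ratio). The study counts are made up, and the function names `logit` and `labbe_points` are hypothetical. There is no continuity correction, so a study with zero events, or with events in every subject, would need extra handling.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))


# Made-up studies: (label, control events, control n, treatment events, treatment n)
studies = [
    ("A", 10, 100, 5, 100),
    ("B", 40, 100, 25, 100),
    ("C", 70, 100, 54, 100),
]


def labbe_points(studies):
    """Return per-study L'Abbe coordinates plus the log odds ratio."""
    points = []
    for label, c_events, c_n, t_events, t_n in studies:
        control = logit(c_events / c_n)
        treatment = logit(t_events / t_n)
        points.append({
            "label": label,
            "control_logit": control,      # x-coordinate of the L'Abbe plot
            "treatment_logit": treatment,  # y-coordinate of the L'Abbe plot
            "log_odds_ratio": treatment - control,  # what a forest plot shows
        })
    return points


points = labbe_points(studies)
for p in points:
    print(f'{p["label"]}: control={p["control_logit"]:.2f} '
          f'treatment={p["treatment_logit"]:.2f} '
          f'logOR={p["log_odds_ratio"]:.2f}')
```

In this made-up example the three log odds ratios are close to each other while the control rates range from 10% to 70%. A forest plot of the effect estimates alone would hide that spread; plotting `control_logit` against `treatment_logit` makes it visible.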
