Kaiser Fung on Axis Zero and Spiral Charts

1. Axis Zero

Regarding the perennial question of whether to have your y-axis start at zero, I wrote, “If zero is in the neighborhood, invite it in.” Kaiser improves this advice by pointing out, first, that you should only worry about zero if it makes sense. I’d illustrated that point with the example of degrees Fahrenheit; Kaiser uses the example of a test score where nobody actually scores anything close to zero. Another example would be adult heights: zero height is meaningful in a literal sense but is not at all in the neighborhood of the data. More formally we could make this a statement about dynamic range, using a rule such as, “For all-positive data y, include zero if max(y)/min(y) > 2.” But only if “zero” has some meaning. In my experience, it usually does, but not always, and I’ve seen some pretty silly examples where people miss the point. For example, taking survey responses that are on a 1-5 scale and plotting on a graph that goes down to zero. Don’t do that! In that case, better to put the axis at 3.
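To make the dynamic-range rule concrete, here is a minimal Python sketch. The function name include_zero and the zero_is_meaningful flag are made up for illustration; the threshold of 2 is the one from the rule above. The code can't judge whether zero means anything for your measurement, so you have to supply that yourself.

```python
def include_zero(y, ratio_threshold=2.0, zero_is_meaningful=True):
    """Return True if the axis should extend to zero under the dynamic-range rule.

    Hypothetical helper: the rule is only stated for all-positive data, and
    zero_is_meaningful is a judgment call supplied by the person making the graph.
    """
    values = list(y)
    if not values:
        raise ValueError("y must contain at least one value")
    if min(values) <= 0:
        raise ValueError("the rule is stated for all-positive data")
    if not zero_is_meaningful:
        # e.g. degrees Fahrenheit, where zero is an arbitrary point on the scale
        return False
    # Large dynamic range: zero is "in the neighborhood", so invite it in.
    return max(values) / min(values) > ratio_threshold
```

With this rule, data running from 1000 to 1001 keep a tight axis (ratio 1.001), while data running from 3 to 10 get zero invited in (ratio about 3.3).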

In his post, Kaiser also points out that, if your x-axis isn’t at y=0, you should take a look at where it is, in case you’re losing an opportunity to convey some information. Somehow this doesn’t seem like such a big deal for the x-position of the y-axis, maybe because x often represents time, with a continuous data stream so there’s no natural starting point.

2. Spiral Charts

Kaiser discusses an eye-catching graph of new covid cases that recently appeared in the newspaper and apparently was much discussed on Twitter. Here it is:

I agree with Kaiser that a straight-up time series graph conveys the data much better: the time series clearly displays each peak, as well as subtle details such as the fact that the initial peak was slow to decline, whereas later peaks were more symmetric:

I don’t think Kaiser’s graph is perfect, either: for my taste, he’s got too many numbers on the y-axis, which seems like a relic from the old days when graphs were used as look-up tables, and I find the horizontal axis labeling very confusing: I’d prefer some clarity showing 2020, 2021, and 2022 rather than these selected dates which then I need to decode.

Kaiser does a good job of explaining what got lost when going from the time series to the spiral: in short, the spiral has a lot less resolution. Unless you really really stare at it, you can’t learn much more than the fact that the rate of new covid cases started at 0 and then had some cycles of increasing and decreasing, and is now high, which any reader of the newspaper already knew before seeing the graph! The curving and reflecting and stretching make it just about impossible to see anything more than that (again, compare to the time series), and they also add this weird artifact by which it looks like the line in 2022 is about to intersect the line from 2021, but that has no meaning as it’s entirely dependent on arbitrary scaling parameters. Also it’s not clear why the spiral doesn’t start at the center! Lots of puzzling things there.

But . . . what is the purpose of the spiral, exactly? It’s not to convey the data. For that, we have the time series graph! The spiral is there to look cool, to get our attention. That’s not such a bad goal. As Antony Unwin and I wrote, we can think of such infographics not as competing with statistical data visualizations but rather as illustrations:

Newspaper and magazine articles are often illustrated by photographs and cartoons which are pretty or shocking or attention-grabbing or interesting and in some way complement the substance of the article. We don’t generally criticize newspaper illustrations as being noninformative; they’re not really expected to convey substantive information in the first place. From that perspective, an infographic can be a good choice even if it does not clearly display patterns in the data.

We recommend the click-through solution: Start with the eye-catching infoviz, then click on it to get the statistical visualization (in this case, the time series plot), then click again to get the spreadsheet with the raw data.

Ideally you can have a graph that is both statistically informative and surprisingly beautiful, as with these recent examples from Danny Dorling. But usually we get just one or the other. I guess we can appreciate a striking infoviz for what it is, while also following Kaiser’s lead and being aware of what it’s missing.

Also, there is one thing that the spiral graph can give you, in principle, which is that it lines up different parts of the year, so in theory you could see patterns recurring each spring or whatever. In this particular example, though, we don’t see any notable annual-cycle patterns so that doesn’t really give us anything.
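Here is a rough Python sketch of the geometry behind that lining-up. It is not how the newspaper built its graph; it only places the baseline of a one-turn-per-year spiral, and the names spiral_position, base_radius, and ring_spacing are assumptions for illustration. The angle depends only on the time of year, so the same calendar date lands on the same ray every year, while the radius grows by an arbitrary spacing per turn.

```python
import math
from datetime import date

def spiral_position(d, start_year, base_radius=1.0, ring_spacing=1.0):
    """Map a date to (x, y) on an Archimedean spiral with one turn per year.

    base_radius and ring_spacing are arbitrary scaling choices (hypothetical
    parameters); changing them changes how close successive rings appear.
    """
    year_start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - year_start).days
    # Fraction of the year elapsed: 0 on Jan 1, just under 1 on Dec 31.
    fraction = (d - year_start).days / days_in_year
    turns = (d.year - start_year) + fraction
    angle = 2 * math.pi * fraction          # depends only on time of year
    radius = base_radius + ring_spacing * turns
    return radius * math.cos(angle), radius * math.sin(angle)

# April 1 in two different years: same direction from the center, one ring apart.
x1, y1 = spiral_position(date(2021, 4, 1), start_year=2020)
x2, y2 = spiral_position(date(2022, 4, 1), start_year=2020)
```

The fact that ring_spacing is a free choice is also why near-intersections between successive years carry no meaning: pick a different spacing and they go away.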

14 thoughts on “Kaiser Fung on Axis Zero and Spiral Charts”

      • Sherlock Holmes solves a case by noting “the strange behavior of the dog in the night.” What’s strange is that the dog didn’t bark (proving an inside job). People like Governor DeSantis thought Covid was seasonal. The spiral is pretty clear contrary evidence.

      • The seasonality would be shown better in a time series plot similar to Kaiser’s but instead of a single chart, plot a pair of stacked one-year charts starting March 1 with months on the horizontal axis. The chart could also be shaded for seasons.

        I don’t think the spiral chart shows the seasonality well at all, and the month labels are bad: there is a line labeled “Oct”, but “Oct” is a region, not a line. This chart would be improved with seasonal shading as well.

        • Anon:

          I agree that from a statistical graphics perspective, the spiral plot is not a very good way to show seasonal patterns. I had a similar reaction to Florence Nightingale’s circular plot. The strong point of these graphs is that they are eye-catching visualizations. To see data patterns, I think the straight time series is the way to go, along with alternative visualizations such as stacked one-year plots to show time patterns.

          From a conceptual point of view, the spiral plot respects seasonality, but this is more of a theoretical point than something that is practically useful.

        • “The strong point of these graphs is that they are eye-catching visualizations”

          It is eye-catching – it looks like one of those horns that the Whos down in Whoville play on the Grinch! :)) Seussian for sure. But I guess I don’t think “eye-catching” is a strong point if it doesn’t come through on the utility – all the worse for novices, who will then justifiably be confused by how to use it sensibly. When they try to read the graph and find it lacking I know just what they’ll do. Their mouths will hang open a minute or two, then the Whos down in Whoville will all cry, “Boo Hoo.”

  1. “For all-positive data y, include zero if max(y)/min(y) 2? As in, the minimum is closer to 0 than it is to the maximum, so you might as well show its distance to 0 as well to get an appropriate sense of scale? In contrast, if max(y) = 1001 and min(y) = 1000, max(y) / min(y) = 1.001, which is < 2 (but starting the vertical axis at 0 would obscure all the variation in y).

    On the broader use of polar coordinate systems for data viz, I've liked and used them in some contexts, like re-visualizing a high-dimensional heatmap with lots of internal structure (similar to 'hive plots'), though even then, the result was pretty hairball-y, and I felt it really worked best interactively, i.e. when you could hover over / click particular edges or nodes to get more information or subset the graph to strip away all but some aspect.

    Otherwise, my primary exposure to them comes from circular & fan depictions of phylogenies (or more broadly, dendrograms in hierarchical clustering), e.g. here: https://guangchuangyu.github.io/ggtree-book/ggtree-book_files/figure-html/layout-1.png which I've broadly found harder to parse than the standard rectangular layout… maybe they can help visually ID clusters if birth rates are much higher at the roots than at the tips, but otherwise I've always seen their primary application to be in the realm of spiffy looking wall art & t-shirts, or when you're drawing a robot and want to give its eyeball awesome looking sclera.

    • Weird, it cut off a sentence from my first paragraph — guess I didn’t escape the angled brackets and formatting ate everything between them? Damn alligators. Should have read “…max(y)/min(y) ‘less than’ 2. Shouldn’t this be max(y) / min(y) ‘greater than’ 2?”

    • One advantage of circular layouts for phylogenies etc. is that they economize on space, by a factor of pi. When it’s not convenient to use a long, skinny space to show your phylogeny, rolling it up is convenient.

      • Ah, good point. I do wonder if there’s a good way to collapse clades down to their parent node and draw a bunch of little subtrees in panels around the main tree (no off-the-shelf functions to do that afaik, but it should be fairly easy to implement from scratch). You’d maybe lose some ability to compare relative tree heights, but idk how much you have that in the radial view anyway, and with everything still rectangular you could still economize on space by tiling (since papers / screens are rectangles themselves). Or maybe it would just be super confusing! idk

  2. > he’s got too many numbers on the y-axis, which seems like a relic from the old days when graphs were used as look-up tables

    I look up numbers in graphs all the time! If you need the numbers, not having them is very annoying. If you don’t need those numbers, I get that it can be mildly annoying, but that’s just mildly annoying. Very annoying > mildly annoying, so I’m happy with lots of labels here.
