Labeling the x and y axes: Here’s a quick example where a bit of care can make a big difference.

I’ll have more to say about the above graph some other time—it comes from this excellent post from Laura Wattenberg. Here I just want to use it as an example of statistical graphics.

Overall, the graph is great: clear label, clean look, good use of color. I’m not thrilled with the stacked-curve thing—it makes it hard to see exactly what’s going on with the girls’ names in the later time period—I think I’d prefer a red line for the girls, a blue line for the boys, and a gray line with total so you can see that if you want. Right now the graph shows boys and total, so we need to do some mental subtraction to see the trend for girls. Also, a very minor thing: the little squares with M and F on the bottom could be a bit bigger and just labeled as “boys” and “girls” or “male” and “female” rather than just “M” and “F.”
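The mental subtraction can be made explicit. Here is a minimal sketch, assuming the data behind the stacked chart is available as a boys series and a total series. The function name and the numbers are made up for illustration and are not read off the graph.

```python
def unstack(boys, total):
    """Recover the girls series from a stacked chart's boys and total series."""
    if len(boys) != len(total):
        raise ValueError("series must have the same length")
    # In a stacked chart the top curve is the total, so girls = total - boys.
    girls = [t - b for b, t in zip(boys, total)]
    return {"girls": girls, "boys": list(boys), "total": list(total)}

# Hypothetical rates per million, purely illustrative.
series = unstack(boys=[1000, 4000], total=[5000, 9000])
print(series["girls"])  # [4000, 5000]
```

Each of the three resulting series can then be drawn as its own line, so no subtraction is left to the reader.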

Also, I think the axis labels could be improved.

Labeling the x-axis every four years is kind of a mess: I’d prefer just labeling 1900, 1920, 1940, etc., with tick marks every 5 and 10 years.
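As a rough sketch of that x-axis scheme, here is one way to compute which years get a label and which get only a tick mark. The function name, the 5-year tick spacing, and the 1900 to 2020 range are assumptions for illustration; the actual range of the graph may differ.

```python
def year_ticks(start, end, label_every=20, tick_every=5):
    """Return (labeled, unlabeled) tick positions for a year axis."""
    # Round the start year up to the nearest tick position.
    first = -(-start // tick_every) * tick_every
    ticks = range(first, end + 1, tick_every)
    labeled = [y for y in ticks if y % label_every == 0]
    unlabeled = [y for y in ticks if y % label_every != 0]
    return labeled, unlabeled

labeled, unlabeled = year_ticks(1900, 2020)
print(labeled)        # [1900, 1920, 1940, 1960, 1980, 2000, 2020]
print(unlabeled[:3])  # [1905, 1910, 1915]
```

Most plotting packages accept explicit lists like these for major and minor tick positions.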

As for the y-axis, it’s not so intuitive to have labels at 4, 8, 12, etc. I’d prefer 5, 10, 15, etc., or really just 10, 20, 30 would be fine. Also, I don’t find numbers such as 10,000 per million to be so intuitive. I’d prefer expressing these as percentages, so that 10,000 per million would be 1%. Labeling the y-axis as 1%, 2%, etc., should be clearer, no? Lots less mysterious than 4000, 8000, etc. Right now, about 3% of baby names end in “i.” At the earlier peak in the 1950s-70s, it was about 2%.
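The conversion is just a division by 10,000, since one percent of a million is 10,000. A small sketch, with hypothetical function names, of how the per-million values could be turned into percentage labels:

```python
def per_million_to_percent(rate_per_million):
    """Convert a rate per million into a percentage (10,000 per million = 1%)."""
    return rate_per_million / 10_000

def percent_label(rate_per_million):
    """Format a per-million rate as a compact percentage label."""
    pct = per_million_to_percent(rate_per_million)
    # The 'g' format drops trailing zeros, so 1.0 prints as "1" and 0.5 as "0.5".
    return f"{pct:g}%"

for rate in (10_000, 20_000, 30_000):
    print(rate, "->", percent_label(rate))
```

This prints the labels 1%, 2%, and 3% for 10,000, 20,000, and 30,000 per million.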

Just to be clear: I like the graph a lot. The criticisms point to ways to make it even better (I hope).

The point of this post is how, with a bit of thought, we can improve a graph. We see lots of bad graphs that can be improved. It’s more interesting to see how even a good graph can be fixed up.

13 thoughts on “Labeling the x and y axes: Here’s a quick example where a bit of care can make a big difference.”

  1. Axis tick choice and labeling is usually done by plotting software, and is itself a tricky problem; moreover, most software packages make it difficult to customize in a robust way (i.e., if the data changes, the adjustment usually needs to be done again). For real numbers (including years as integers) k·4·10ⁿ is an uncommon choice, k·5·10ⁿ and k·2·10ⁿ are acceptable, but k·10ⁿ should be preferred unless there is a strong reason against it.

    Personally above I would prefer the x axis labelled at 1900, 1950 and 2000, with smaller ticks (no labels) at 1920 etc, but I understand why some people prefer 1900, 1920, … Between the choice of 12 or 3 labels, I would take 3. For the y axis, I agree that percentages would be more informative, and then labels at 1%, 2%, 3%, with unlabeled ticks at 0.5% etc.

    I agree that stacked plots are hard to interpret. I would also prefer simple line plots. And some very faint gridlines behind them, less than 0.5pt wide, and an unobtrusive light gray.
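The tick-step preference described in comment 1 can be sketched as a search over steps of the form m·10ⁿ. This is only an illustration of the idea, not how any particular plotting package picks ticks; the function name, the multiplier set (1, 2, 5), and the target of at most five intervals are all assumptions.

```python
import math

def nice_step(span, max_intervals=5, multipliers=(1, 2, 5)):
    """Smallest step m * 10**n (m in multipliers) that splits span
    into at most max_intervals intervals."""
    if span <= 0 or max_intervals < 1:
        raise ValueError("span must be positive and max_intervals at least 1")
    raw = span / max_intervals
    # Start one decade low to guard against rounding in log10.
    exponent = math.floor(math.log10(raw)) - 1
    while True:
        for m in multipliers:
            step = m * 10 ** exponent
            if step >= raw:
                return step
        exponent += 1

print(nice_step(120))     # years 1900 to 2020 -> 50
print(nice_step(30_000))  # rates up to 30,000 per million -> 10000
```

With these assumed inputs the sketch lands on labels every 50 years and every 10,000 per million (that is, every 1%), which happens to match the sparse labeling the comment asks for.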

  2. I agree with your suggested improvements, but don’t agree about liking the graph. For the reasons you mention, I find it a particularly poor display of the data. I made the adjustments you suggest and it is much improved (wish I could post it here). For example, it isn’t at all clear from the graph above that the rate of male names ending in i actually exceeds that for females in the last decade – but it does. Also, there is a lot of ink in that colored graph that is simply not informative, and I find it actually distracts from the data – a simple line graph works better in my opinion. So, I only question your statement “Overall, the graph is great.” It seems not so great to me.

  3. Numerology and “namerology” seem to pop up on this blog every once in a while. “Dennis the dentist” and then if you are keeping score, “K” as in Ken, somehow mysteriously causes you to strike out in baseball. Even more “inside baseball” is that “W” for whiff and “B” for William (as in Bill) will also cause more desire to strike out.

  4. Only problem I see: neither axis has a label: Y: “Number of Babies with i-ending Names”; X: “Year”. That’s pretty fundamental stuff and it should be on the chart ‘cuz there isn’t a figure caption with an explanation.

    Beyond that, the purpose of a chart is to support the discussion and this chart is well designed for the discussion. In a brief skim of the post, I didn’t find a single reference to the values on the Y axis of the above chart. The discussion is about the relative amount of names over time. There’s no reason to convert the Y axis to percentages since they aren’t discussed in the text. From the POV of the post, it would be a confusing abstraction, not an improvement, to use percentages on the Y axis. The chart is just a general chart supporting the claim that i-ending names have increased dramatically in popularity since 1900. It does that well enough. She provides a second chart to support her discussion of the m/f ratio. The X-axis is fine too.

    My personal preferences are slightly different. I agree w/ Andrew regarding the x-axis: tens and ticks would be better. On the Y-axis my preference is for the axis to be labeled “thousands” and the numbers adjusted accordingly, in 5s. The legend should spell “male” and “female” and should be larger and placed more prominently on the chart, in the white space in the upper left quadrant. Oh, well, the longer I look… The title is actually what should be the label of the Y axis. The chart is about the number of names *over time*: “US Babies with i-ending Names Since 1900”.

  5. I guess it’s how you look at it. When I looked yesterday I found it pretty confusing ‘tho I didn’t comment.

    But it’s a pretty bad presentation if, should you wish to know the female i-name count at 2018 (say) you can’t just read across to the Y-axis. Is the i-female count at 2018 about 31,000? No, it’s 31,000 minus the male count at 2018. It’s a pretty poor presentation that forces one to do some mental gymnastics to get some really simple information. Elio’s straightforward presentation is a huge improvement in that respect IMO.

    • Chris:

      I too prefer Elio’s graph (and I’ll add it to the above post). But I think you’re being too hard on Wattenberg’s original plot. Its purpose is to show trends, not to be a look-up table. For showing trends, it does the job well. I think Elio’s graph shows the trends better (it’s good to have that anchoring at 1%, and it’s easier to understand the trends over time with a sparer x-axis), but Wattenberg’s original plot is not bad at all.
