The R code for those time-use graphs

By popular demand, here’s my R script for the time-use graphs:

# The data
a1 <- c(4.2,3.2,11.1,1.3,2.2,2.0)
a2 <- c(3.9,3.2,10.0,0.8,3.1,3.1)
a3 <- c(6.3,2.5,9.8,0.9,2.2,2.4)
a4 <- c(4.4,3.1,9.8,0.8,3.3,2.7)
a5 <- c(4.8,3.0,9.9,0.7,3.3,2.4)
a6 <- c(4.0,3.4,10.5,0.7,3.3,2.1)
a <- rbind(a1,a2,a3,a4,a5,a6)
avg <- colMeans (a)
avg.array <- t (array (avg, rev(dim(a))))
diff <- a - avg.array   # each country's hours minus the six-country average
country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")

# The line plots

par (mfrow=c(2,3), mar=c(4,4,2,.5), mgp=c(2,.7,0), tck=-.02, oma=c(3,0,4,0),
  bg="gray96", fg="gray30")
for (i in 1:6){
  plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n",
    bty="l", type="n")
  lines (1:6, diff[i,], col="blue")
  points (1:6, diff[i,], pch=19, col="black")
  # Odd and even tick labels go in separate axis() calls so that R does not
  # drop labels it deems too close together
  if (i>3){
    # bottom row: two-line labels
    axis (1, c(1,3,5), c ("Work,\nstudy", "Eat,\nsleep",
      "Leisure"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
    axis (1, c(2,4,6), c ("Unpaid\nwork",
      "Personal\nCare", "Other"), mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
  } else {
    # top row: shorter one-line labels
    axis (1, c(1,3,5), c ("Work", "Eat,sleep ",
      "  Leisure"), mgp=c(2,.5,0), tck=0, cex.axis=1.2)
    axis (1, c(2,4,6), c ("Unpaid   ",
      "", "Other"), mgp=c(2,.5,0), tck=0, cex.axis=1.2)
  }
  if (i%%3==1) mtext ("Excess or deficit hours/day", 2, 2.5, cex=.9, col="black")
  axis (2, c(-1,0,1), c("-1 hr", "Avg", "+1 hr"), cex.axis=1.3)
  mtext (country.name[i], 3, -1.5, col="black", cex=.9)
  abline (0, 0, col="gray")
}
mtext ("Excess or deficit hours/day spent at each activity (compared to avg country)", side=3, outer=TRUE, line=2, col="black")
mtext ("Redrawing of graph from the Economist, using line rather than circle plots to facilitate comparisons within and between countries", side=1, line=1.5, outer=TRUE, cex=.7, col="black")

# The bar plots

par (mfrow=c(1,7), mar=c(1,0,1,0), mgp=c(1,.5,0), tck=-.04, oma=c(3,0,4,0),
  bg="gray96", fg="gray30")
# First panel holds only the activity names
plot (c(0,1),c(.5,6.5), xlab="", ylab="", xaxt="n", yaxt="n", bty="n", type="n")
labels <- c ("Work, study", "Unpaid work", "Eat, sleep", "Pers. care", "Leisure", "Other")
text (0, 6:1, labels, adj=0, col="black", cex=1.4)
for (i in 1:6){
  plot (c(-1.2,2.2), c(.5,6.5), xlab="", ylab="", xaxt="n", yaxt="n", bty="n", type="n")
  abline (v=0, col="gray")
  # Activity k is drawn at height 7-k, so the first activity is at the top
  for (j in 6:1){
    lines (c(0, diff[i,7-j]), c(j,j), lwd=3, col="blue")
  }
  axis (1, c(-1,0,1), c("-1 hr", "Av", "+1 hr"), cex.axis=1.05)
  mtext (paste(country.name[i],"   "), 3, .5, col="black", cex=.9)
}
mtext ("Excess or deficit hours/day spent at each activity (compared to avg country)", side=3, outer=TRUE, line=2, col="black", cex=.9)
mtext ("Redrawing of graph from the Economist", side=1, line=1.5, outer=TRUE, cex=.7, col="black")
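
The centering step in the script (build an array of column averages, then subtract) can also be written with base R's sweep(). Here is a minimal sketch of that alternative, not the code that made the graphs. The row and column names are taken from the country and activity labels used in the script, and the names diff.sweep and diff.orig are made up for this illustration.

```r
# Same numbers as in the script: rows are countries, columns are activities.
a <- rbind(
  France  = c(4.2, 3.2, 11.1, 1.3, 2.2, 2.0),
  Germany = c(3.9, 3.2, 10.0, 0.8, 3.1, 3.1),
  Japan   = c(6.3, 2.5,  9.8, 0.9, 2.2, 2.4),
  Britain = c(4.4, 3.1,  9.8, 0.8, 3.3, 2.7),
  USA     = c(4.8, 3.0,  9.9, 0.7, 3.3, 2.4),
  Turkey  = c(4.0, 3.4, 10.5, 0.7, 3.3, 2.1)
)
colnames(a) <- c("Work, study", "Unpaid work", "Eat, sleep",
                 "Pers. care", "Leisure", "Other")

# sweep() subtracts (its default operation) the j-th average from column j.
diff.sweep <- sweep(a, 2, colMeans(a))

# The construction used in the script, for comparison.
avg.array <- t(array(colMeans(a), rev(dim(a))))
diff.orig <- a - avg.array

print(round(diff.sweep, 2))
```

Both versions give the same 6 x 6 matrix of excess or deficit hours; the sweep() version just skips the intermediate array.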

Some notes:

1. Yes, I know the code is ugly. To start, I should've entered the data into a file rather than put them in the R script. And the sloppiness goes on from there.

2. On the plus side, I was pleasantly surprised by how easy it was to make the second, coefplot-style, graph.

3. The biggest kluge with the coefplot graph was that there was exactly as much space for the words on the left as for each of the six country graphs. My limiting factor was how much space I needed for the names of the activities.

4. Rather than make the graphs using png() or pdf() or whatever, I made them on the console and then resized the graphics device to make the images more compact. This is a convenient feature in R: the image changes size but the text fonts and sizes remain fixed. When I try to make graphs directly using png() or pdf(), I have to spend a lot of time monkeying around with the font sizes.

5. A commenter suggested making the negative values red and the positive values blue in the bar chart. I tried it but I didn't like how it looked. My eye got drawn to the pattern of red and blue, and I found it hard to follow.

6. The bars are narrower than you might be used to seeing. I think narrow bars work well here. When you have fat bars your eye gets drawn to the negative spaces between the bars.

7. I was surprised how well it worked to plot absolute rather than relative differences. Absolute differences are easier to understand; but, in addition, in this case I think they're more informative too.

8. Perhaps we can formalize the ideas in these plots and put them into the arm package along with everything else.
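
On note 3: one possible way around the equal-width kluge is base R's layout(), which accepts a separate width for each panel, in place of par(mfrow=...). What follows is a rough sketch under that assumption, not the code used for the graphs above. The 1.8 width ratio is an arbitrary illustrative value and n.panels is a made-up name.

```r
# Same data as in the script, entered row by row (countries x activities).
a <- matrix(c(4.2, 3.2, 11.1, 1.3, 2.2, 2.0,
              3.9, 3.2, 10.0, 0.8, 3.1, 3.1,
              6.3, 2.5,  9.8, 0.9, 2.2, 2.4,
              4.4, 3.1,  9.8, 0.8, 3.3, 2.7,
              4.8, 3.0,  9.9, 0.7, 3.3, 2.4,
              4.0, 3.4, 10.5, 0.7, 3.3, 2.1), nrow=6, byrow=TRUE)
country.name <- c("France", "Germany", "Japan", "Britain", "USA", "Turkey")
labels <- c("Work, study", "Unpaid work", "Eat, sleep", "Pers. care",
            "Leisure", "Other")
diff <- sweep(a, 2, colMeans(a))   # hours minus the six-country average

par(mar=c(1,0,1,0), mgp=c(1,.5,0), tck=-.04, oma=c(3,0,4,0))
# One row of seven panels; the label panel is 1.8 times as wide as each
# country panel (1.8 is an arbitrary choice for illustration).
n.panels <- layout(matrix(1:7, nrow=1), widths=c(1.8, rep(1, 6)))

# Panel 1: activity names only.
plot(c(0,1), c(.5,6.5), xlab="", ylab="", xaxt="n", yaxt="n", bty="n", type="n")
text(0, 6:1, labels, adj=0, cex=1.4)

# Panels 2-7: one narrow-bar plot per country.
for (i in 1:6) {
  plot(c(-1.2,2.2), c(.5,6.5), xlab="", ylab="", xaxt="n", yaxt="n",
       bty="n", type="n")
  abline(v=0, col="gray")
  segments(0, 6:1, diff[i,], 6:1, lwd=3, col="blue")
  axis(1, c(-1,0,1), c("-1 hr", "Av", "+1 hr"), cex.axis=1.05)
  mtext(country.name[i], 3, .5, cex=.9)
}
```

With the label column decoupled from the country columns, the space for the activity names no longer dictates the width of every country panel.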

9 thoughts on “The R code for those time-use graphs”

  1. I think there is a simpler way of making the second plot. We made a much more complicated one in our MSZ 2010 paper using one simple R command.

  2. Hadley:

    Thanks! I think my graphs are clearer but they did take me 2 hours. Yours probably took you about 5 minutes. So, at the very least, ggplot2 seems like the way to go to make some graphs fast to see what they look like. You can always go back later and spend an hour tinkering to make things cleaner. For example, an experienced ggplot2 user might spend 20 minutes creating 4 quick graphs (the two you showed plus maybe two others), then choose his or her favorite and spend an hour in base graphics (or maybe lattice or ggplot2) to make something really nice. Total time in this scenario: 1 hour 20 minutes, an improvement on the 2 hours it took me, and a more feasible long-term strategy.

    I take issue with a few of your default choices in ggplot2:

    In the first plot, I really don't like that your x-axis and y-axis labels are in the "outer" portion of the image. Trellis does this too and I don't like that either. For logical as well as practical reasons, I much prefer the labels for the individual x and y axes to be attached to the individual graphs. I also don't see the gain from putting the country names in gray bars at the top of each sub-plot. I'd prefer to put the names inside (as I do in my version).

    Another thing that bugs me with both graphs–but particularly the second–is that you have way way too many grid divisions and axis labels for my taste. I have the same problem with base graphics. I just about always redo the axes by hand to have coarser labeling. I suspect the hyperfine labeling is a relic from the old days when graphs were sometimes used as look-up tables, and I strongly recommend you play around with coarser divisions yourself.

    In the second graph, I don't like the look of the dots at the end of the lines. I actually tried that myself (this was part of the 1 hour per graph, that I tried out some different ideas) but ultimately decided that the bars worked better on their own.

    Finally, I don't like that your package seems to have automatically reordered the two dimensions in alphabetical order. Alphabetical order is horrible! I'm not quite sure what was the ordering that the Economist designers were using, but in general I think almost anything is better than alphabetical, almost always.

    Also, a couple minor comments:

    – Your code doesn't run as is. The library(ggplot2) call has to come before the call to ddply().

    – You misspelled "color" and "gray." What's that all about??

  3. Andrew: Enter ??colour and ??grey at the R prompt.

    I guess this fuzzy/regular expression matching facilitates multicultural spelling.


  4. Totally agreed with your general comments – particularly given that my main emphasis is on graphics for exploration not communication. The faster your iteration time, the more variations you can try, and the more likely you are to come up with a really good graphic that you can then spend time polishing for presentation/communication.

    On the specific issues:

    1) I see repeating labels as a waste of space. But that may be a difference in emphasis on exploration (where I'm familiar with the axis labels) and exposition (where you're not familiar with the labels). Either way should be easy to do in ggplot2, but isn't yet.

    2) Agreed – the default tick labelling is too fine. I will try to do better automatically (using the labeling package) but this is something I think you often need to tweak for presentation. ggplot2 can't know the desired output size of your plot.

    3) It's easy to remove the points – just remove geom_point and maybe adjust the size of the segments.

    4) I couldn't easily see where the order came from in your code, but again that's easy to fix.

    5) ggplot2 understands common misspellings so that you can use color and grey if you are lexicographically challenged ;)

  5. Below are a few style changes to Hadley's work.

    Using ggplot for 5 minutes to see what the graphs look like leaves you about 11/12 of an hour to tinker with the ggplot options. I still use lattice, especially for spatial statistics. My point is simply that one can get a lot of miles out of ggplot without much additional effort.

    Plus, I think the code is easier to read.

    My tinkering was about 15 minutes.

    # Plot (expects a long-format data frame `time` with columns
    # country, activity, and excess)
    ggplot(time, aes(excess, activity))+
    geom_segment(aes(xend= 0, yend= activity), size= 1.5, colour= "cornflowerblue")+
    geom_vline(xintercept= 0, colour= "grey50")+
    facet_wrap(~ country, nrow= 1)+
    scale_x_continuous(breaks=c(-0.5, 0.5, 1.5))+
    labs(x= "Excess or Deficit Hours per Day", y= "Activity",
    title= "Excess Hours per Day Spent in Each Activity\n(compared to mean over all countries)")+
    theme(
    panel.grid.minor= element_blank(),
    panel.grid.major= element_blank(),
    axis.text.x= element_text(size= 8, family= "serif"),
    axis.text.y= element_text(size= 8, family= "serif"),
    axis.title.x= element_text(family= "serif"),
    axis.title.y= element_text(family= "serif", angle= 90),
    strip.text.x= element_text(family= "serif"),
    plot.title= element_text(family= "serif"))

    ggsave(filename= "MyPlot.pdf", width= 8, height= 5)


  6. Hadley:

    Regarding my point on the axis labels: often I make a grid of graphs in which there are internal and external axes. For example, suppose I'm plotting y vs. x for several countries and several years. Each individual plot is y vs. x, and the plots are arranged in, say, a 6×5 grid, with each row representing a different country and each column representing a different year. I'd like the country and year names on the outside margins and the names of x and y on the inside margins. The standard trellis/ggplot2 approach does not seem to automatically allow it. Rather, there seems to be an implicit assumption that the locations of the subgraphs are not informative.

  7. P.S. My ugly code for the axes on the line plots came up because I needed to get around an annoying "feature" of R: it doesn't print all the axis labels when it deems they are too close together. So I had to do that stupid overwriting trick. And, yeah, I should've just written a function–that would've taken about 2 seconds.
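
The function mentioned in the P.S. might look something like the sketch below. It is a hypothetical helper, not part of the original script, and the name draw.all.labels is made up. It relies on the same idea as the overwriting trick: axis() only thins out labels within a single call, so issuing one call per label leaves nothing to be dropped.

```r
# Hypothetical helper (not in the original script): draw each tick label with
# its own axis() call so that no label is suppressed for crowding a neighbor.
draw.all.labels <- function (side, at, labels, ...) {
  for (k in seq_along(at)) {
    axis (side, at=at[k], labels=labels[k], ...)
  }
  invisible (length(at))   # number of labels drawn
}

# Usage: an empty panel with the same ranges as the line plots, all six labels.
plot (c(1,6), c(-1,1.7), xlab="", ylab="", xaxt="n", yaxt="n",
  bty="l", type="n")
n.drawn <- draw.all.labels (1, 1:6,
  c("Work,\nstudy", "Unpaid\nwork", "Eat,\nsleep",
    "Personal\nCare", "Leisure", "Other"),
  mgp=c(2,1.5,0), tck=0, cex.axis=1.2)
```

The cost is that labels really can overprint each other when the panel is narrow, so the label text still has to be chosen to fit.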

Comments are closed.