More and more I’m thinking that line plots are great. More specifically, two-way grids of line plots on common scales, with one, two, or three lines per plot (enough to show comparisons but not so many that you can’t tell the lines apart). Also dot plots, of the sort that have been masterfully used by Lax and Phillips to show comparisons and trends in support for gay rights.

There’s a big step missing, though, and that is to be able to make these graphs as a default. We have to figure out the right way to structure the data so these graphs come naturally.

Then when it’s all working, we can talk the Excel people into implementing our ideas. I’m not asking to be paid here; all our ideas are in the public domain and I’m happy for Microsoft or Google or whoever to copy us.

P.S. Drew Conway writes:

This could be accomplished with ggplot2 using various combinations of the grammar. If I am understanding what you mean by line plots, here are some examples with code.

In fact, that website is a tremendous resource for all things data viz in R.

Well, R's got ggplot2, which is a terrific package for making line plots and dot plots and most other sorts of plots, traditional or novel. Admittedly, though, it requires a mental model of the Grammar of Graphics to be used effectively.

For the benefit of those who haven't used it, here's what the syntax for a two-way grid of line plots looks like. Suppose you have a data frame (table) of data with columns Predictor, Covariate1, Covariate2, Covariate3 and Result. The covariates are nominal (factors), while the predictor and result are continuous (numbers).

ggplot(data, aes(Predictor, Result, color=Covariate1)) + geom_line() + facet_grid(Covariate2 ~ Covariate3)
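To try that one-liner without a dataset at hand, here is a minimal sketch that builds a made-up data frame with the column names described above. The covariate levels, the random seed and the linear relationship are all assumptions for illustration, not real data.

```r
library(ggplot2)

# Hypothetical data: 3 x 2 x 2 covariate levels, 10 predictor values each.
# expand.grid() turns the character vectors into factors by default.
set.seed(1)
data <- expand.grid(
  Predictor  = 1:10,
  Covariate1 = c("a", "b", "c"),
  Covariate2 = c("low", "high"),
  Covariate3 = c("north", "south")
)
# Made-up response: a linear trend plus noise.
data$Result <- 0.5 * data$Predictor + rnorm(nrow(data))

# Same call as above: one panel per Covariate2 x Covariate3 cell,
# three coloured lines (one per Covariate1 level) in each panel.
p <- ggplot(data, aes(Predictor, Result, color = Covariate1)) +
  geom_line() +
  facet_grid(Covariate2 ~ Covariate3)
print(p)
```

Because facet_grid() uses common scales across panels by default, the lines can be compared directly from one panel to the next.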

What does it mean to "make these graphs as a default"? I'm not sure that makes sense in R. It does potentially make sense in something like Excel, where you could use the existing mental models of a pivot table to drag and drop columns into the slots of a Grammar of Graphics structure.
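One way to picture the "slots" idea in R is a small wrapper whose arguments are column names, much as a pivot table takes fields for rows, columns and values. This is only a sketch: line_grid() is a hypothetical helper, not part of ggplot2, and the built-in CO2 dataset is used purely because it happens to have two crossed factors with three plants in each cell.

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): each argument is a slot
# that is filled with the name of a column in `data`.
line_grid <- function(data, x, y, color, rows, cols) {
  ggplot(data, aes(.data[[x]], .data[[y]], color = .data[[color]])) +
    geom_line() +
    # reformulate() builds the formula rows ~ cols from the two names
    facet_grid(reformulate(cols, response = rows)) +
    labs(x = x, y = y, color = color)
}

# Fill the slots: one line per plant, panels laid out as Type x Treatment.
p <- line_grid(CO2, x = "conc", y = "uptake", color = "Plant",
               rows = "Type", cols = "Treatment")
print(p)
```

Changing what goes in a slot is then a matter of changing one string, which is roughly what dragging a field to a different slot would do in a spreadsheet interface.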

Andrew, you really should look at Graph Builder in JMP.

(Free trial download available on their website.) There's a demo video here: http://www.jmp.com/software/jmp8/new.shtml

In a certain sense Tableau http://www.tableausoftware.com does what you ask. However, I at least find that the way it structures the data to make the visualizations come out naturally is inconvenient for other analyses.

I second Harlan's recommendation of ggplot2. Not only does it do those simple line plots well, it also provides statistical summaries for the data, e.g. there is a direct link to Hmisc's smean.cl.boot (wrapped in ggplot2 as mean_cl_boot) to display bootstrap intervals on data. Check out http://had.co.nz/ggplot2
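As a rough illustration of the bootstrap summaries, the sketch below uses stat_summary() with mean_cl_boot, the ggplot2 wrapper around the Hmisc function. The built-in CO2 dataset is an arbitrary choice for the example, and drawing the plot assumes the Hmisc package is installed.

```r
library(ggplot2)

# Mean uptake with a bootstrap confidence interval for each treatment,
# one panel per plant origin. mean_cl_boot() calls Hmisc::smean.cl.boot().
p <- ggplot(CO2, aes(Treatment, uptake)) +
  stat_summary(fun.data = mean_cl_boot, geom = "pointrange") +
  facet_grid(. ~ Type)

# Only draw the plot if Hmisc is available, since the summary needs it.
if (requireNamespace("Hmisc", quietly = TRUE)) print(p)
```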

Yes, I think, maybe, it requires a mental model for any graphics software package to be used effectively. My mental model of graphics (whether for good or bad) developed in the context of R's native graphics devices.

This is the principal reason I haven't adopted the Grammar of Graphics yet, though I have heard great things about ggplot2. It's like an internal `regime change' for graphics. Is it worth it?

If I understand correctly, what you're describing is for when we have, say, 50 lines and want to spread them over a grid of, say, 12 panels, where each line's panel is chosen so that the lines within a panel are comparable on some measure?

The code below plots %dem presidential vote in this way, with each panel grouping states that had a similar %dem in 2008.

library(lattice)

library(directlabels)

library(quadprog)

data(presidentialElections, package="pscl")

pE

Good for you! I'm accused of "over-using" line charts by certain Junk Charts readers from time to time. But I'm not relenting…

matplot() in base graphics is not half as bad.
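For anyone who hasn't tried it, a minimal matplot() sketch looks like this. The three series are simulated random walks, not real vote shares, and the series names are placeholders.

```r
# Simulated data: one column per series, one row per election year.
set.seed(1)
years <- seq(1932, 2008, by = 4)
votes <- sapply(1:3, function(i) 50 + cumsum(rnorm(length(years))))
colnames(votes) <- c("A", "B", "C")

# matplot() draws one line per column of the matrix on a common scale.
matplot(years, votes, type = "l", lty = 1, col = 1:3,
        xlab = "Year", ylab = "Simulated share (%)")
legend("topleft", legend = colnames(votes), col = 1:3, lty = 1)
```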

Packages lattice and ggplot2 are more powerful. The plots produced by ggplot2 are well-designed out of the box, and great-looking.

Guess I should have escaped the angle brackets above:

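library(lattice)
library(directlabels)
data(presidentialElections, package="pscl")
# Drop DC, then order the states by the mean of their last three demVote values
pE <- subset(presidentialElections, state != "DC")
pE$state <- as.factor(pE$state)
pE$state <- reorder(pE$state, pE$demVote, function(x) mean(tail(x, 3)))
# Neighbouring states in that ordering share a panel (about three per panel)
pE$panel <- round(as.integer(pE$state) / 3)
# One line per state, labelled directly at the left end of each line
direct.label(xyplot(demVote ~ year | panel, groups=state, pE, type=c("l"), adjust=.1, xlim = c(1890, 2010)), first.qp)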
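The panel-assignment step can be tried on its own in base R, without pscl or directlabels. The sketch below uses simulated series (the names s1 to s12 and the values are made up) and swaps round() for ceiling() on the rank, an assumption made here so that every panel gets exactly the same number of lines.

```r
# Simulated stand-in for state-by-year vote shares (not real data).
set.seed(42)
years  <- seq(1980, 2008, by = 4)
series <- paste0("s", 1:12)
d <- expand.grid(year = years, series = series, stringsAsFactors = FALSE)
d$value <- 50 + rnorm(nrow(d), sd = 5)

# Score each series by the mean of its three most recent values.
d <- d[order(d$series, d$year), ]
score <- tapply(d$value, d$series, function(x) mean(tail(x, 3)))

# Rank the series by score, then cut the ranking into panels of three.
per_panel <- 3
panel_of  <- ceiling(rank(score, ties.method = "first") / per_panel)
d$panel   <- unname(panel_of[d$series])

# Number of series assigned to each panel.
table(panel_of)
```

The resulting panel column can then be used as the conditioning variable in xyplot() or as a facet in ggplot2.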