A question on graphics

Boris writes,

Andy,

I reread your 2002 paper on plots on the subway, and I was inspired to transform the tables in our TSCS paper to plots, given that we are explicitly doing comparisons, the core of your recommendation for graphical displays of results.

Check out the plot version of table 2, and compare it to the table 2 on page 13 of the paper, and the same for .

I took your advice about doing multiple plots, keeping zero as a reference point, pointing out significant points (as I have by drawing a grey line at 100, an important value for the statistics we use), and of course leaving out legends.

I have a question about another plot replacement for a table. Check out the
, and compare it to the second half of table 2 on page 3.

It’s not important that you understand what rel. efficiency is, but I’m interested in hearing your advice in the spirit of your 2002 paper. I am comparing two columns of numbers here (OLS/BML Eff and FE/BML Eff), and one column’s crazy values screw up the graphical display of the values in the second column.

Specifically, look at row 9 of table 2. There are two values here, 1350 and 150. Compare that to the third plot here (“Efficiency: Varying Heteroscedasticity”). The 1350 gives the whole plot a huge y range, which obscures the important value of 150 in the second column of row 9.

Is there some way to keep the linear scale for the y axis (which I like) and just chop out a whole range of values (say, 300-1200), so that the 150 will be clear? (It’s not that the exact number is important, but the fact that it is comfortably more than 100.) I see this in plots in the newspaper all the time, where skyrocketing values are chopped in the middle and replaced with 2 wavy lines. Is this something you recommend, and if so, how do you do it in R?

My reply:

In the first plot (rmse.pdf), don’t use those goofy squares, triangles, and circles, and don’t use those goofy dotted and dashed lines. Just plot them all as thin lines, and label the lines directly on the graphs. The relations are monotonic (square > triangle > circle for all the graphs), so it should be easy enough to label them clearly.

Also on these graphs, make 0 a hard bound at the bottom (use the yaxs="i" option in R, with the y limits starting at 0), since RMSE can’t be negative. By the way, I like that you use RMSE and not MSE.
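Here is a rough sketch of what I mean by thin, directly labeled lines with a hard bound at zero. In base R the relevant graphical parameter is yaxs (axis interval style): yaxs = "i" makes the axis span exactly the ylim you give it. The x values, the RMSE numbers, and the names "method A/B/C" are all made up for illustration; they are not from Boris's paper.

```r
# Hypothetical data: RMSE for three estimators as one factor is varied.
# All numbers and the "method" names are placeholders, not real results.
x <- c(10, 20, 30, 40, 50)
rmse <- cbind(
  "method A" = c(1.20, 1.00, 0.85, 0.75, 0.70),
  "method B" = c(0.90, 0.75, 0.62, 0.55, 0.50),
  "method C" = c(0.60, 0.48, 0.40, 0.35, 0.32)
)

pdf(NULL)  # null device, nothing is written; use pdf("rmse.pdf") to save
# Set up an empty plot, leaving extra room on the right for the labels.
plot(range(x), c(0, 1.5), type = "n",
     xlim = c(min(x), max(x) + 0.25 * diff(range(x))),
     ylim = c(0, 1.5), yaxs = "i",  # y axis runs exactly from 0 to 1.5
     xlab = "T (hypothetical values)", ylab = "RMSE", bty = "l")
# Thin solid black lines: no plotting symbols, no dotted or dashed lines.
matlines(x, rmse, lty = 1, lwd = 0.5, col = "black")
# Label each line at its right end instead of using a legend.
text(max(x), rmse[nrow(rmse), ], labels = colnames(rmse), pos = 4, cex = 0.8)
usr <- par("usr")  # c(x1, x2, y1, y2) of the plot region; y1 is exactly 0
dev.off()
```

Because the lines never cross, the labels at the right edge stay in the same order as the lines, which is what makes direct labeling work here.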

You also might want to put them all on the same vertical scale (0 to 1.5, I guess). Or maybe not. I don’t know enough about your example to know whether that would make sense here.

In the second plot (“optimism.pdf”), again make 0 a hard bound at the bottom. Also, again, ditch the symbols and goofy lines, and just label the 5 lines directly on each graph. Either in the middle of the graphs, or extend the rightmost axis a bit so there is room for labels there. Again, the monotonicity will make these labels easy to read.

Same points for your third plot (“efficiency.pdf”).

But . . . hey! I just realized something. You’re not using one of your dimensions! Each of your pages of plots has 5 conditions, which are placed in 3 rows. But hey, just make them a little smaller and you can have all 5 plots in 1 row. It might seem tough to fit 5 graphs, but there’s room to make them narrower, and you can use landscape format if you’d like.

What does that get you? Well, now you can have 3 rows of 5 graphs. Row 1 is RMSE, row 2 is optimism, and row 3 is efficiency. Each of the 5 columns of this 3×5 grid corresponds to a factor being varied: varying T, varying N, varying heterosc, varying cont corr, varying unit effects. You can tell your whole story all on one page! And I think patterns will be clearer.
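The 3×5 layout can be sketched in base R with par(mfrow = ...), which fills the grid row by row. This is only a skeleton under my own assumptions: the panels are empty placeholders, the page size and margins are guesses, and the row and column names are just the ones listed above.

```r
# Rows are the three measures; columns are the five factors being varied.
measures <- c("RMSE", "Optimism", "Efficiency")
factors  <- c("T", "N", "heterosc", "cont corr", "unit effects")

pdf(NULL, width = 11, height = 8.5)  # landscape page; give a file name to save
# 3 rows by 5 columns, filled row by row, with tight margins so 15 panels fit.
par(mfrow = c(3, 5), mar = c(3, 3, 2, 1), mgp = c(1.8, 0.6, 0))
for (m in measures) {
  for (f in factors) {
    # Placeholder panel: replace with the real curves for measure m, factor f.
    plot(0:1, 0:1, type = "n", xlab = f, ylab = m, yaxs = "i",
         main = if (m == measures[1]) paste("Varying", f) else "")
  }
}
layout_used <- par("mfrow")  # c(3, 5)
dev.off()
```

Putting the column titles only on the top row and the measure name on each y axis avoids repeating the same text 15 times.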

Finally, you had one other question, which was that the 1350 screwed up your yrange in efficiency.pdf (in my reformulation, your third row of plots). I’d actually try putting them all on that same scale and seeing if things are really so unreadable as all that. Another option is to use a smaller range (e.g., 0-500) and then just let those lines go up off the top of the graphs. It depends how important it is to you to show exactly how high they go. I wouldn’t recommend the squiggly-line trick.
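As a sketch of the smaller-range option: fix ylim at 0 to 500 and let base R clip whatever goes above the plot region (its default behavior). Only the 1350 and the 150 below come from Boris's table; the other values, the x levels, and the note in the margin are placeholders I made up so the example runs.

```r
# Hypothetical levels of the varied factor; only the last row uses real values.
x <- 1:5
eff <- cbind(
  "OLS/BML" = c(110, 180, 320, 700, 1350),  # only the 1350 is from the post
  "FE/BML"  = c(105, 115, 125, 140, 150)    # only the 150 is from the post
)

pdf(NULL)  # null device; use pdf("efficiency.pdf") to save
plot(range(x), c(0, 500), type = "n", ylim = c(0, 500), yaxs = "i",
     xlab = "Heteroscedasticity (hypothetical levels)",
     ylab = "Relative efficiency", bty = "l")
abline(h = 100, col = "grey")  # grey reference line at 100
# Lines are clipped at the plot region, so OLS/BML simply runs off the top.
matlines(x, eff, lty = 1, col = "black")
text(x[2], eff[2, "OLS/BML"], "OLS/BML", pos = 2, cex = 0.8)
text(max(x), eff[5, "FE/BML"], "FE/BML", pos = 3, cex = 0.8)
# Optional note in the top margin saying how high the clipped line goes.
mtext(paste("OLS/BML rises to", max(eff[, "OLS/BML"])), side = 3, cex = 0.7)
usr <- par("usr")  # the upper y limit of the plot region is exactly 500
dev.off()
```

With this range the 150 sits visibly above the grey line at 100, and the margin note keeps the information about how high the other line goes.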