The old, old story: Effective graphics for conveying information vs. effective graphics for grabbing your attention

Posted on March 21, 2010 8:21 AM by Andrew

One thing that I remember from reading Bill James every year in the mid-1980s was that certain topics came up over and over, issues that would never really be resolved but appeared in all sorts of different situations. (For Bill James, these topics included the so-called Pesky/Stuart comparison of players who had different areas of strength, the eternal question (associated with Whitey Herzog) of the value of foot speed on offense and defense, and the mystery of exactly what it is that good managers do.)

Similarly, on this blog–or, more generally, in my experiences as a statistician–certain unresolvable issues come up now and again. I’m not thinking here of things that I know and enjoy explaining to others (the secret weapon, Mister P, graphs instead of tables, and the like) or even points of persistent confusion that I keep feeling the need to clean up (No, Bayesian model checking does not “use the data twice”; No, Bayesian data analysis is not particularly “subjective”; Yes, statistical graphics can be particularly effective when done in the context of a fitted model; etc.). Rather, I’m thinking about certain tradeoffs that may well be inevitable and inherent in the statistical enterprise.

Which brings me to this week’s example.

At the sister blog, Erik Voeten posted a graph from Adam Bonica summarizing campaign contributions from several different categories of people:

This graph has been all over the web in the last few days–people loooove it–but from my perspective, it has some obvious, obvious flaws. As I suggested in a comment to Erik’s blog:

A simple dotplot (with occupations listed in order of their average “conservatism” of contributions, however this is measured) would be much better–at least for conveying the information. Better still might be a scatterplot of avg $ contributed vs. avg ideology.
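For readers who want to try the dotplot or scatterplot version, here is a minimal sketch in Python of the summary behind both, assuming each record is a hypothetical (occupation, ideology score, dollars contributed) tuple. The occupation labels are borrowed from the graph, but the numbers are made-up toy values, not Bonica's data, and nothing here reflects how his ideology scores were actually estimated. Each occupation reduces to one point: its mean ideology (the dotplot position and the scatterplot's x-coordinate) and its mean contribution (the scatterplot's y-coordinate).

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical toy records: (occupation, ideology score, dollars contributed).
# Assumption: negative scores are more liberal, positive more conservative.
RECORDS = [
    ("Professor", -1.2, 250.0),
    ("Professor", -0.8, 150.0),
    ("Oil and gas", 0.9, 1000.0),
    ("Oil and gas", 1.1, 2000.0),
    ("Lawyer", -0.4, 500.0),
    ("Lawyer", 0.0, 700.0),
]


def occupation_points(records):
    """Return (occupation, mean ideology, mean dollars) rows, sorted from
    most liberal to most conservative by mean ideology."""
    ideology = defaultdict(list)
    dollars = defaultdict(list)
    for occupation, score, amount in records:
        ideology[occupation].append(score)
        dollars[occupation].append(amount)
    rows = [(occ, fmean(ideology[occ]), fmean(dollars[occ])) for occ in ideology]
    rows.sort(key=lambda row: row[1])  # dotplot order: by mean ideology
    return rows


if __name__ == "__main__":
    # Dotplot: one line per occupation, in order of mean ideology.
    # Scatterplot: plot mean dollars (y) against mean ideology (x).
    for occupation, mean_score, mean_dollars in occupation_points(RECORDS):
        print(f"{occupation:>12}  ideology={mean_score:+.2f}  avg ${mean_dollars:,.0f}")
```

The sorting step is what makes the dotplot readable: occupations appear in order of their average position rather than in an arbitrary order.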

There are also a couple of huge selection effects going on in the above graph:

1. 2008 was an unusual year in which Democrats received more campaign contributions than Republicans. By focusing on 2008, you’re drawing a misleading picture of the historical pattern of partisanship of contributions.

2. It’s not at all clear to me how the categories map to the colors, or what happened to the people who don’t fall into any of the occupation/industry categories shown.

By bringing all this up, I’m certainly not trying to slam Adam Bonica, an enterprising student who went to the trouble of making this graph for free for all of us! Bonica and others have the opportunity to take this forward from here, and that’s what research is all about.

At a more political-sciency level, there’s a lot in Bonica’s article and a lot more to look at. As Erik noted, following our discussion of Bonica’s mixing of occupations (“professors”) with industries (“oil and gas”): “Further complicating things is that for research questions related to lobbying influence you would want to know about industries but if you want to make more sociological statements about ideology, then profession may well be the more appropriate category.”

I’d also refer interested readers (and researchers) to the work of Thomas Ferguson, who’s done some historical studies of campaign contributions in the U.S. since the 1930s.

What I want to focus on here, though, are questions of graphical presentation. From the standpoint of conveying information, I have no doubt that a dotplot or scatterplot would be far superior. Bonica’s graph looks cool but it is filled with distracting vertical lines, and the occupation/industry categories are extremely hard to read, with no sense of whether these categories are intended to be arbitrary, or exhaustive, or something in between.

On the other hand, if Bonica had simply made a dotplot or scatterplot, it wouldn’t have looked so cool to many, and it might well have been lost in the

P.S. As I said at the beginning, this topic–the tension between conveying information and grabbing attention–has come up again and again, most notably here.

12 thoughts on “The old, old story: Effective graphics for conveying information vs. effective graphics for grabbing your attention”

LemmusLemmus on March 21, 2010 at 9:14 AM said:

Proofreading alert: Your pt. 1 above. (Feel free not to publish this comment.)
Andrew Gelman on March 21, 2010 at 9:39 AM said:

Wow–I had 3 typos in one paragraph! How did that happen??
Phil on March 21, 2010 at 8:43 PM said:

I like it, though — the graph, I mean.

Not having seen anything except the graphic as you link to it, I'd assume that blue is the distribution of Democrats and red is the distribution of Republicans…which, if true, means there are some conservative Democrats but no liberal Republicans.

As it is, we get no distributional information within job categories. But if they did as you suggest — dotchart — we'd have no distributional information _at all_. At least this way, we can see that profs and filmmakers really are shifted way to the left, not just compared to other occupations but compared to most people. That's interesting, and I'm glad we can see it.

I agree that there are better ways to present the same information…and, as usual, I would simply like to see _more_ information. How about showing an interval for each occupation, a horizontal bar, so we can see the 10th-50th-90th percentile within each occupation (as in the plot you showed a few posts ago, with the experts' uncertainties in collapse time of a dam)?

And I agree that this is an odd assemblage of occupations and I'd be interested in either seeing more or seeing how these were selected…but perhaps that's discussed in the caption or report accompanying this plot, if there is one.

I also agree that using 2008 data seems a bit misleading, but that's an issue of data selection rather than graphical display.

Basically, I share your desire to have these guys go back to the drawing board to draw a better plot, but I think this one is better than you do, and I also don't like your specific suggestion, assuming I understand it.
Andrew Gelman on March 21, 2010 at 11:09 PM said:

Phil: It's the same thing I discussed above: should the graph convey information or look cool? I'm not clear on exactly what the colors mean, and I'm also not clear on what the x-axis is. I agree with you that it would be most desirable to include that information, but I can't figure out how to do it until I know what it all means! I think it would be easy enough to include such information as part of the dotplot, either with strips of red and blue colors for each occupation/industry category or as a separate colored picture. Another way of putting it is that the graph as shown is already a scatterplot; it just has an arbitrary y-axis.

In any case, my point is not to criticize Bonica's graph–this is an unpublished paper and he has lots of opportunity to do more, partly based on feedback such as ours–but to meta-comment on what it takes for the graph to get noticed. I think that people were attracted to something about the appearance of this graph and its general message rather than the details of the information it was conveying. In that sense, the graph is more of an illustration of a simple point than a way to convey quantitative information.

And a more direct and effective approach to conveying the information might well, I think, have received much less attention. To convey information most effectively, I believe a graph should be transparent and, to as large an extent as possible, self-explanatory. In contrast, I think one thing people like about the graph shown above is its novelty and air of mystery. It is somewhat of a puzzle to figure out what it is saying, and it gives the reader a warm feeling that there are unplumbed depths underlying its surface.

To put it even more directly: bells and whistles and fancy tricks attract attention but can make it harder to convey the quantitative information. I'm not saying that a simple, clear graph would be best here: such a graph might never have gotten noticed in the first place. The best, I suppose, would be something flashy to get noticed and then, back at the main page, something clearer that would bear further examination.
Michael on March 22, 2010 at 2:46 AM said:

Perhaps I'm just dramatizing your point here, but you don't seem to mention the most glaring flaw of all — what is that vertical axis measuring?

The humped things look kind of like sample distributions. Does each integrate to one? Or is this telling me that total contributions are about as equal as those two areas?

Or is it something completely different? Perhaps the answer is there in a caption you omit in order to focus on your point.
Andrew Gelman on March 22, 2010 at 5:35 AM said:

Michael: I'm omitting nothing. I put in the graph as I saw it on the blog, along with links that allow the reader to research the topic further. I think the best graphs are self-explanatory (or close to that ideal), but I know from experience that this can be done. Certainly when I was a student I made lots of graphs that are much worse than Bonica's posted above. So let me emphasize once again that I am not trying to criticize his efforts; rather, I'm suggesting that there is an inherent tension between clarity and coolness, and maybe it's no coincidence that this particular hard-to-follow graph got attention where a plainer, more direct presentation might not have been noticed.
NU on March 22, 2010 at 5:50 AM said:

I don't like that the graph lacks distributional information about each profession. I like Phil's suggestion (or my understanding of it): replace each vertical line with a horizontal line whose width represents an interval (e.g., 5/95 percentile), maybe with a dot in the middle as a measure of central tendency.

Vertically sort these horizontal lines by central measure of ideology (e.g., from least to most conservative, top to bottom).

Optionally include the population distribution on the bottom, or one population interval. Or the two distributions from the original graph, if I understood what they were. (Are they the ideological rankings of registered Democrats/Republicans?)
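The interval display Phil and NU describe can be sketched in a few lines. Below is a minimal Python sketch, assuming each contributor already has a single ideology score; the occupation labels come from the graph, but the scores are made-up toy numbers, not Bonica's data. It computes a 5th/50th/95th percentile interval per occupation, sorts the rows by median from least to most conservative (top to bottom), and draws each interval as a line of text, a stand-in for a real plotting library. The names `interval_rows` and `text_interval` are hypothetical helpers introduced here for illustration.

```python
from statistics import median, quantiles

# Hypothetical toy ideology scores per occupation (made-up numbers, not
# Bonica's data). Assumption: negative = more liberal, positive = more conservative.
SCORES = {
    "Professor": [-1.6, -1.4, -1.2, -1.1, -0.9, -0.3],
    "Oil and gas": [-0.2, 0.4, 0.8, 1.0, 1.1, 1.3],
    "Lawyer": [-1.0, -0.7, -0.5, -0.2, 0.1, 0.6],
}


def interval_rows(scores_by_group, lo_pct=5, hi_pct=95):
    """Return (group, low, median, high) rows sorted by median ideology."""
    rows = []
    for group, scores in scores_by_group.items():
        # With n=100, quantiles() returns the 99 cut points for the
        # 1st through 99th percentiles, so cuts[p - 1] is the p-th percentile.
        cuts = quantiles(scores, n=100, method="inclusive")
        rows.append((group, cuts[lo_pct - 1], median(scores), cuts[hi_pct - 1]))
    rows.sort(key=lambda row: row[2])  # least to most conservative, top to bottom
    return rows


def text_interval(label, low, mid, high, axis_min=-2.0, axis_max=2.0, width=41):
    """Draw one horizontal interval as text: dashes from low to high, 'o' at mid."""

    def column(x):
        x = min(max(x, axis_min), axis_max)  # clip to the plotted range
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    cells = [" "] * width
    for c in range(column(low), column(high) + 1):
        cells[c] = "-"
    cells[column(mid)] = "o"
    return f"{label:>12} |{''.join(cells)}|"


if __name__ == "__main__":
    for row in interval_rows(SCORES):
        print(text_interval(*row))
```

Passing `lo_pct=10, hi_pct=90` gives Phil's 10th-50th-90th version. Centering on the median is a choice made in this sketch; the original figure ranked occupations by the mean ideal point.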
Krzysztof Sakrejda-L on March 22, 2010 at 7:55 AM said:

Assuming these are density plots, why is there an absence of labels to the right of "Oil and Gas"? I'm curious about the correct labels to put on these mystery conservative industries/professions.
Phil on March 22, 2010 at 8:59 AM said:

Although I, too, like self-contained graphics, I don't mind having to read the caption, which in this case is:

"As a first cut, I recovered ideal point estimates for the 3125 PACs and 131,000 individual contributors that gave to two or more unique candidates during the 2007-2008 election cycle and scaled them using the IMWA procedure. The figure below ranks a subset of occupations from left to right based on the mean ideal point of the members of each occupation. As a point of reference, the occupation ideal points are imposed over the density plots for all Democratic and Republican candidates."
Adam on March 22, 2010 at 4:51 PM said:

When I posted that figure, I expected no more than a few dozen people to see it. When the plot started to attract some attention, I started to regret not putting more care into it. Although I disagree that a simple dotplot would have been more informative, I agree with you on most of your points about presentation.

In retrospect, I should have been more careful in mixing the groupings between occupations and industries. I was careful to select professions with large N’s without any industry cross-over so that the groups remained mutually exclusive–for example, professors are all coded as employees in the Education industry–but including a group for salesmen would span multiple industries.

One might chalk this up to a beginner’s mistake, but I now realize that describing the figure only in the text of the blog post is a very bad idea. I know this should have seemed obvious beforehand, but when someone re-posts the plot on another blog, they generally just take the image and leave the text.

Anyway, I made another attempt at making the graphs, which can be found here. There are still a few kinks I need to work out in the ggplot2 code (for instance, making the candidate distributions the same height in both graphs), but I think it's a step in the right direction.
Andrew Gelman on March 23, 2010 at 12:19 AM said:

Adam:

Very nice. Using the y-axis to show the total helps a lot. It makes the graph into a sort of enhanced scatterplot. Good stuff. That's one reason for posting things on the web–to get people's free comments!

And, just to say it one more time, I thought your original graph was fine; I was just reflecting on the question of what it is about a graph that makes people notice it. When you consider the features of a graph that people find the most awesome, these are often not the information-conveying parts but rather the attention-grabbing parts. It's a message that I should be aware of, given that in Red State, Blue State we had lots and lots of graphs that were mostly functional without being grabby. Having a few more unusual pictures (rather than lineplot after lineplot and scatterplot after scatterplot) might have helped us attract more interest.
Ed. on March 23, 2010 at 6:49 AM said:

A European would color the liberals red and the conservatives blue.

In fact, this was what my brain was telling me before I studied the labelling on the x-axis more closely…
