More on data visualization, beauty, etc.

Nathan Yau makes some good points in response to my belated comments on his “5 Best Data Visualization Projects of the Year.”

First off, I’d like to apologize for saying the projects “suck.” That was just rude. Would I like it if somebody said that the examples in Bayesian Data Analysis “suck” because they’re not completely realistic, or if somebody said that the demos in Teaching Statistics “suck” because they’re not tied closely enough to the lecture material? A better thing for me to say would’ve been: “I don’t particularly like these as data displays, but I’m impressed by the effort that went into them, and I’m glad to see these sorts of data-based displays getting a broad audience.”

In the interest of constructive discussion, I’d like to make a few points.

I would characterize all these graphs as visually attractive and data-related, so at the very least they can serve as inspiration to statisticians and other designers who are thinking about future data display challenges.

But . . . one of my big problems with labeling these as “best of the year” is that I don’t want them to be seen as completed data displays in and of themselves. I gave some detailed reasons for this attitude here, but briefly, I was, on the whole, unhappy with the graphics that Nathan posted because I felt that they attracted attention to themselves more than they displayed the data.

To take a specific example (and recognizing that we may just disagree about what we find beautiful or interesting or helpful), I found the Baby Name Wizard to be far superior, as a tool for conveying information, to any of the examples linked to by Yau. If, for a moment, you accept this judgment, it leads to the surprising conclusion that the 5 best developments from 2008 were lower in quality than something that was done in 2005, which is a bit of a disappointment given the improvements in technology.

To get to some specifics:

– Wordle conveys a small but important amount of information (the most common words in a document and their relative frequencies) in what I see as a confusing way. Again, if it’s eye-catching, I can see that it is doing a service, is making the world a better place. After all, the alternative to people using Wordle is not, in general, people using something better, but rather people not using Wordle and thus not learning what the most common words (other than “the,” etc.) are in their documents. But I don’t have to like it!

– I dislike the Decision Tree because I dislike the model it is based on, and I think it leads people to a confused understanding of voting. Here, I think the world would be better if nobody were to see this graph. I’m not really complaining about the display, more about what it’s displaying.
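To make the Wordle point concrete: the information a word cloud carries (the most common words and their relative frequencies, leaving out words like “the”) can also be shown as a plain ranked table. Here is a minimal Python sketch of that idea. It is not how Wordle itself works; the tiny stop-word list, the function name top_words, and the sample sentence are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def top_words(text, n=5):
    """Return the n most common non-stop words with their relative frequencies."""
    # Lowercase the text and pull out runs of letters and apostrophes as words.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    total = len(words)
    # Frequencies are relative to the number of non-stop words that remain.
    return [(word, count / total) for word, count in Counter(words).most_common(n)]


# Made-up sample text, just to show the shape of the output.
sample = "the data and the display of the data: display the data"
for word, share in top_words(sample):
    print(f"{word:<10} {share:.2f}")
```

On this sample the table has two rows, “data” at 0.60 and “display” at 0.40. It is less eye-catching than a cloud, but the ranking and the proportions can be read off directly.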

To draw another analogy, I see pie charts as the high-tech, snazzy, attention-grabbing, beautiful graphical tool of the 1970s. It should be possible to argue both of the following:

(1) Pie charts are great–they’ve introduced millions of people to data, giving people a physical sense of numerical relationships.

(2) Pie charts are a dead end–elaborations on pie charts (3-d pie charts, exploding pie charts, and all the rest) make things worse, and they can stand in the way of more direct data displays.

For that matter, Excel’s graphics can be great. The problem occurs when people assume that the Excel output is enough. I think of all the research papers in economics where the authors must have spent dozens of hours trying all sorts of different model specifications, dozens of hours writing and rewriting the prose of the article, . . . and 15 minutes making the graphs. They just don’t realize that more can be done. And, from this perspective, Wordle and all the rest don’t really help. I guess what I’m saying is that there is still a place for traditional display tools such as line plots, and there’s a place for thinking seriously about the connection of these methods to the data and inferential problems at hand. It’s not all about making something that looks pretty and has data in it.

One could even make an analogy to literature. My taste in graphics is similar to George Orwell’s preferences in literature, for prose to be like a windowpane or whatever it was that he said. But two qualifications are needed here. First, it can take a lot of work to write clear prose, just as it can take a lot of work and a lot of practice to make clear graphs. Second, the pyrotechnic writing of a Martin Amis or T.S. Eliot can be fun in itself and also point the way forward: yesterday’s experiments can be tomorrow’s standards. Much of Ezra Pound is not so readable today but he had a big influence.

So let me re-emphasize that I am not, and was not, criticizing the general idea of snazzy graphics–after all, we link to Flowing Data on our blog. It was more that I had problems with the specific displays Nathan labeled as Best of the Year. Let’s praise the innovators who design wacky, eye-catching tools such as Wordle, but let’s also think about how to use these tools to give us a fuller understanding of the world around us, as is done by Hans Rosling and the Baby Name Wizard designers.

Wow–that was a lot! Perhaps I’ll elaborate (or simply repeat) some of these points in future blog entries.

14 thoughts on “More on data visualization, beauty, etc.”

  1. What you should have said in the original post is "Britain From Above" is really cool, and is pretty good at giving a feel for the data it displays. All of the rest of them suck.

  2. i think this tradeoff between graphics as art vs. explanatory tool is interesting.

    i work in an environment where bar charts etc are the norm, and it's almost impossible to convince anyone that looking at it in a different way may be helpful.

    on the other hand, i think many of the really visually interesting graphics that have been coming out are based on "boring" data (or don't tell the story like you describe).

    the "commercial" value of facebook link data, wikipedia activity and music listening history is questionable. this makes it hard to use these as examples of why building something new would have some value (ie, be worth the resources spent to make it), and so we get stuck on the pie charts and bar charts, because we "already know how to read those".

  3. I recently did a topic classification project for a class where I used LDA for modeling topics and author information. I was very excited to use word clouds to visualize the results. And as interesting as they looked, the word clouds were not very helpful. They provided some other interesting insights into the data but they were not particularly helpful when trying to determine the themes of the discovered topics. I always tended to switch back to a table view, which brings up the divide between utility and aesthetics.

  4. Andrew –

    The best analogy may be to architecture. Modernists have long held, following Adolf Loos, that all ornamentation is superfluous. He wrote in a 1908 essay that "The evolution of culture marches with the elimination of ornament from useful objects." And the style of the American modernist Louis Sullivan is embodied in his maxim that "form ever follows function."

    Even for those who agree that information graphics ought to be functional (and I suspect Flowing Data's Nathan Yau, being a graduate student at UCLA in statistics, is one of them), the argument is in defining "function". Some ornamentation can be functional in how it draws the eyes to the page.

  5. Your Orwell reference on prose as a windowpane [1] is interesting, thank you. I had missed that, and am amused, since I was trained in the aesthetics of typographic design from "The Crystal Goblet, or Printing Should Be Invisible"[2] to expose the thoughts like fine wine in a crystal goblet, that "Type well used is invisible as type". This is in agreement with your (and Tufte's) data-graphics aesthetics from a second distinct vantage, but on the same metaphor. I suspect Orwell expected his more cultured readers to catch his recursive allusion, that the prose should be just as transparent a window on the ideas as the typography was on the prose — they all read each other back then, allusion was a virtue.

    [1] Orwell, "Why I Write" (1946)
    [2] Beatrice Warde (1932, 1955 UK/1956 US), originally as 'Paul Beaujon'

  6. Orwell’s comments on style are excellent, as one would expect. Jacques Barzun elaborated on a similar idea in Simple and Direct: a rhetoric for writers:

    What a fuss over a word! Yes, but let me say it again: the price of learning to use words is the development of an acute self-consciousness. Nor is it enough to pay attention to words only when you face the task of writing – that is like playing the violin only on the night of the concert. You must attend to words when you read, when you speak, when others speak. Words must become ever present in your waking life, an incessant concern, like colour and design if the graphic arts matter to you, or pitch and rhythm if it is music, or speed and form if it is athletics. Words, in short, must be there, not unseen and unheard, as they probably are and have been up to now. It is proper for the ordinary reader to absorb the meaning of a story or description as if the words were a transparent sheet of glass. But he can do so only because the writer has taken pains to choose and adjust them with care. They were not glass to him, but mere lumps of potential meaning. He had to weigh and assemble and fuse them before his purposed meaning could shine through.

  7. Isn't it ironic to praise Orwell for using a great turn of phrase, when the point he was making is that readers shouldn't notice turns of phrase because the writing should be transparent and only the ideas should be noticeable? I'll answer my own question: yes, it is ironic.

    There are times when you want the writing to stand out, and there are times when you want the data display (and not just the data) to stand out. In both cases, you sometimes want a little "wow" factor. There's nothing wrong with that, in the right place.

    That said, I think four of the "five best" data displays are really bad.

  8. I think in some sense that we're all talking at cross-purposes. The word "graph" or "visualization" can refer to many different kinds of chart, each with its own application. Comparing a graph designed for data exploration with one used for data display doesn't make a whole lot of sense, IMO. Given the discussion has taken a literary turn, it's a shame we don't have better words to distinguish the various uses of the words for data graphics. More here:

  9. I think that it is important to admit these visualizations not just as workaday tools of data analysis, but as art that glorifies data and analysis.

    Just as once Titian painted to glorify God in a world he or his clients might have viewed as insufficiently godly, we now live in a world where I, and others I hope, feel that insufficient reverence is given to empiricism.

    Good art is thought provoking. What thoughts it provokes is not constrained to topics medieval or fashionable in the 19th century. So beautiful visualizations may well serve their purpose as art even if they fail in a posited analytical role.

  10. In what way is circos better than, say, clustering or one of the correlation matrix visualizations we spoke about a few days ago?

  11. I just came in quite late to this, but after reading some of this "debate," David Smith above is right. I think everyone's suffering from category problems. I DO think the Afghanistan war flow-chart network thingy "sucks," though. Illegible AND ugly.
