## “Data is Personal” and the maturing of the literature on statistical graphics

1. Exhortations to look at your data, make graphs, do visualizations and not just blindly follow statistical procedures.

2. Criticisms and suggested improvements for graphs, both general (pie-charts! double y-axes! colors! labels!) and specific.

3. Instruction and examples of how to make effective graphs using available software.

4. Demonstration or celebration of particular beautiful and informative graphs.

5. Theorizing about what makes certain graphs work well or poorly.

We’ve done lots of all these over the years—this blog has about 600 posts on statistical graphics—as have Kaiser Fung and others, following the lead of Tukey, Cleveland, Tufte, Wainer, etc. When writing about graphics, the above five things are what we do.

Almost always when we and others write about statistical graphics, it is in the spirit of exhortation, criticism, celebration, demonstration, or instruction—but not of open inquiry.

Yes, my views on graphics have evolved, I’m open to new ideas, and in some of my writings I’ve thought hard about the virtues of other perspectives (as in this paper with Antony Unwin on different goals in visualization)—but just about always we’re writing to advance some argument or to simply celebrate the virtues of graphical display.

There’s been some literature on comparative evaluation of different graphical approaches, but much of what I’ve seen in that area hasn’t been so impressive. It’s hard to quantitatively evaluate something as slippery as statistical graphics, given the many goals that graphs serve.

With that as background, I was very happy to read this post, “Data is Personal. What We Learned from 42 Interviews in Rural America,” where Evan Peck describes a study he did with Sofia Ayuso and Omar El-Etr:

We asked 40+ people from rural Pennsylvania to rank a set of 10 graphs. Then we talked about it.

At a farmers market, a construction site, and in university dining facilities, we interviewed 42 members of our community about graphs and charts to understand how they understand and engage with data.

We showed people 10 data visualizations about drug use that varied in their visual encodings, their style, and their source.

We asked them to rank the 10 graphs (without source information!) based on their usefulness.

After revealing the sources of the graphs, people were given an opportunity to rerank their visualizations.

The people we talked to weren’t just young and weren’t just in college. They were diverse in their education (60% never completed college) and age (26% were 55+, 33% were between 35–44). Through many hours of conversations, here is what we found . . .

The point I want to make here, beyond that I found the stories fascinating, is that Peck is demonstrating a new way of writing about statistical graphics, going beyond the five standard approaches listed above.

This suggests to me that our thinking about statistical graphics is moving to a new level of sophistication, and I think that’s very important, that we can go beyond the usual tropes of exhortation, celebration, criticism, instruction, and theorizing.

This is big news for those of us who work in this field.

1. Garnett says:

The linked blog post is very interesting with lots to think about.

I might be reading too much into this small sample, but it *seems* in the last figure that liberals down-grade graphs from Breitbart while conservatives down-grade graphs from the NYT.

However, I don’t see the opposite quite so much: liberals don’t necessarily up-grade the NYT and conservatives don’t up-grade Breitbart.

Maybe they don’t care about up-voting their “allies” as much as down-voting their “enemies”.

2. I’m not sure what you think is news in this, Andrew. An empirical or focus-group approach to visualization is standard in computer-human interaction literature. For example, the following paper is an introduction to a special issue of a journal.

This is just the computer science computer-human interaction literature. I searched with “computer human interaction” quoted specifically to find that literature, since I knew they did this kind of thing. I have to imagine the psychology, education, and marketing literature is all over this problem.

• Jeff says:

The difference that I see is not the methodology of the study in the Peck article but the observations about the many different and unexpected things that caused people to engage with or prefer certain charts.

I’ve read papers that explore various types of visualizations in an attempt to isolate the usability (or utility, or preferability, etc.) of a visualization as a function of its form–the “chart type.” There’s doubtless some value in that, but as a guy whose job involves watching people struggle to understand things I’m happy to see a growing recognition that this process is messy and also dependent on what the people in the audience bring to the table in terms of what they know, what they believe, and what they need at that moment.

(This may or may not overlap with what Andrew finds interesting but it’s how I understand it based on what I know, believe and need in this moment.)

• The keyword you want to find that kind of thing in the computer-human interaction literature is “focus group” or “A/B testing”. You’ll find all sorts of work on how people engage with graphics of all kind and how that varies by culture/demographics. There are even big ongoing projects for evaluating things like cultural differences in graphical engagement.

When I worked in spoken dialogue systems in the early 00s, the big issue with every customer was whether the computer voice was going to be male or female, friendly or businesslike, stern or helpful, etc. There were a gazillion decisions and we ran focus groups all the time to see how potential customers would react subjectively. It was informative, but not particularly scientific. Quite often, the CEO of the customer’s company would jump in and say things like, “I know my customers want a friendly woman’s voice helping them reserve long-haul trucking equipment.” or “Our customers need a businesslike man, and he shouldn’t ever apologize for mistakes because that’s our company policy.” The big linguistic problems were issues like what “next Thursday” meant when it’s Monday or when “today” is at 11:30 pm or 1:05 am or from different time zones or whether a customer would understand a question framed from the company’s perspective, like, “Are you an OEM customer?”

• jim says:

I noticed Kroger’s automated check-out system has what I call the “urgent voice”, which speaks faster and in a slightly strident or urgent tone. It comes on if you make scanning mistakes or pass things over the scanner that don’t register. Then it says “Place the ITEM IN THE BAG!”

I bet they spent a year working on the exact tone for the urgent voice – strident enough to trigger the customer’s subliminal awareness, but not enough to consciously piss the customer off.

• Andrew says:

Bob:

Thanks for the link. I’m sure there’s been lots of great work done in information visualization that I’m not aware of. I wouldn’t think that Peck et al. did their work in a vacuum. I have seen some papers on evaluations of new visualizations, or comparisons of different visualization methods, but I haven’t felt I’ve learned anything from them. I like the perspective of Peck’s post, and it would be great to see more of that. For whatever reason, almost everything I’ve seen on visualization falls into the categories 1,2,3,4,5 above, or else just doesn’t seem interesting to me (as with the chartjunk paper linked above).

• That shows how much culture matters. Andrew and I have different opinions about what is “good” for (2: criticisms and improvements), as do a lot of people—it’s a classic bikeshed issue and the easiest thing to obsess about. There’s a ton of widely replicated work on it in the computer-human interaction literature on interface components, such as Fitts’s law. I only see (4: celebrations of nice graphics) in the popular literature (like Tufte’s books, or similar ones on visualizing the world-wide web that were popular in the 90s). (3: software guides) is for manuals and tutorials. To publish on (5: theorizing about what works) in computer science has required empirical results for the last 30+ years.

It’s going to be a lot of work to port all the results from CS, which often involve interactive graphics and user actions (like Fitts’s law, which governs how long it takes to click on a graphical component given its size and distance). That means there’s a cottage-industry opportunity if anyone cares.
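For readers unfamiliar with Fitts’s law, here is a minimal sketch of its standard Shannon formulation, MT = a + b·log2(D/W + 1), where D is the distance to the target and W its width. The function name and the constants a and b below are illustrative placeholders; in practice a and b are fit empirically per device and user.

```python
import math

def fitts_movement_time(distance, width, a=0.2, b=0.1):
    """Predicted time (seconds) to acquire a target of width `width`
    at distance `distance`, per the Shannon formulation of Fitts's law:
    MT = a + b * log2(D/W + 1).
    The intercept a and slope b are device/user-specific constants;
    the defaults here are hypothetical, not fitted values."""
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# A small, distant target takes longer to hit than a large, nearby one.
t_small_far = fitts_movement_time(distance=800, width=10)
t_big_near = fitts_movement_time(distance=100, width=50)
```

The practical upshot for interface (and interactive-graphics) design is that making clickable elements bigger or closer reduces acquisition time logarithmically, which is why the law shows up so often in the empirical HCI literature Bob mentions.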

But I’m just reporting on the literature I know. Is anyone familiar with work in education and psychology on these problems? I see lots of hits, but don’t have the patience to try to piece it together.

3. Martha (Smith) says:

This reminds me of how one needs to be careful to word survey questions to (try to) avoid getting answers to questions that weren’t the ones you were trying to ask.

For some examples, see https://web.ma.utexas.edu/users/mks/statmistakes/morewording.html

• jim says:

+1!

Also, always be aware that no matter how carefully questions are worded, people’s experience might lead them to view a question in a different light and answer differently than expected.

4. I’m very familiar with the empirical work in visualization because it’s what I do (I started the blog that published Peck’s post). Yes, qualitative methods are popular in human-computer interaction. Qualitative approaches have also been used to understand how visualizations are used and interpreted, but they remain rarer in visualization research. Since most people in the field aren’t trained in how to do actual open-ended inquiry, it’s not always that informative (though that may be slowly changing). To me, Peck’s work is unique among this type of qualitative study in that he isn’t setting out with some narrowly scoped agenda of identifying how people use such-and-such plot for such-and-such specific expert task. Instead he interviewed people of a demographic that most visualization research rarely considers, and what he finds does not integrate well with most visualization research, which I personally think is great.

5. Ron Kenett says:

Andrew – Thanks for posting this. What might be also worth considering is what people understand from the visualization alternatives, as opposed to what they liked.

An approach to assessing conceptual understanding is the application of meaning equivalence reusable learning objects (MERLO). The idea is to use alternative representations, some with meaning equivalence, some with surface similarity. It is used as a pedagogical tool in K-12 education: https://library.iated.org/view/ROBUTTI2016MEA
and https://www.igi-global.com/chapter/learning-in-the-digital-age-with-meaning-equivalence-reusable-learning-objects-merlo/140750

Combining what people like with what people understand adds an important dimension to the topic in this thread.

In an interview based study of 58 educators and policymakers, Hambleton (2002) found that the majority misinterpreted the official statistics reports on reading proficiency that compare results across school grades and across years.

What people see, what they like and what they understand are important complementary dimensions with sometimes significant impact. In the Ron Hambleton case it was about funding schools. The reference to this is:
Hambleton, R.K. (2002) How Can We Make NAEP and State Test Score Reporting Scales and Reports More Understandable?, in Assessment in Educational Reform, Lissitz, R.W. and Schafer, W.D. (editors), Allyn & Bacon, Boston, MA, pp. 192–205.

6. jim says:

There’s a lot more in the study than what’s discussed in the paper but the main results don’t seem that surprising and really seem consistent with what’s taught about data visualizations in particular and communication in general.

Somewhat surprisingly, the paper doesn’t acknowledge that each graphic probably is designed for a specific audience and a specific purpose and it therefore doesn’t make any attempt to assess the effectiveness of the graphics within that context. That’s a bummer because they do appear to have been somewhat effective in reaching their target audience, at least as far as these data can tell.

Graphic E was clearly the most popular. Unlike the others, it gives county-specific information for Pennsylvania. The information is also practical and actionable – what kind of treatment programs are available in my county? It’s not surprising that people in Pennsylvania find it more interesting. What is surprising is that people rated it highly even though they also rated it as confusing and cluttered. That suggests that people take the time to read the graphic and figure out how to use it if they care about the content.

Graphic J is the only other graphic that provides practical guidance. It provides risk factors for addiction to certain substances. This graphic gets the largest number of top ratings, but also the largest number of bottom ratings. Interestingly, though it gets the largest number of bottom ratings, it gets no negative comments.

The rest of the graphics are designed for a national audience. They don’t provide local or practical information. Instead, they try to communicate the context of the opioid epidemic or drug use in general. In that respect, they’re of academic interest.

It seems like most of the graphics really are reaching their intended audience, not missing the mark as the paper seems to imply. The Economist isn’t seeking to influence or deter potential drug users or provide advice to parents or relatives on how to deal with drug use problems. It’s seeking to inform policy makers & voters by providing information on the national scope of the issue. The lack of local information is intentional. No one in Nevada cares about county level drug treatment in Pennsylvania.

For the most part also basic chart choice doesn’t seem to be a major factor in people’s perception. The three time-series charts received significantly different ratings, but the more colorful chart received *lower* ratings. One colorful infographic (j) gets the largest number of top ratings, while the other (f) gets among the lowest overall ratings. Interestingly, the title of the high-rated colorful infographic is a practical question: “What is addiction?”, while the title of the low-rated colorful infographic is an abstraction: “National Statistics”.

7. Ralph Winters says:

Well, I am definitely one of those 1-5 people. Finally the genie is out of the bottle: people do prefer graphs based upon their experience, objectivity aside. Maybe the beer drinkers can get a plain black-and-white bar graph showing the social determinants of health (along with a little pilsner icon in the corner), if that is what it will take to get them to take a closer look?

8. Kaiser says:

Thanks for linking to this article. Like Bob, I tried to tease out what Andrew meant by big news.

Here are some aspects of the paper that I find interesting:

1. The use of ranking on an all-inclusive, vaguely defined metric, “usefulness”, rather than a composite metric made up of rigorously defined sub-components. This is very interesting given the typically futile exercise of coming up with a proper evaluation metric.

2. Comparing charts of all kinds, controlling only for a high-level topic (“drug use”), when one would typically want to control for the data, and most likely also the message. The charts being compared have different underlying data, different chart forms, different everything! This opens up a new world, although in this world one is hard-pressed to explain what aspect of the chart causes the response.

3. I’d like to show people these results that confirm that some of the most trendy graphics have the least impact – these include the concentric circles, and the “purple” maps.

That said, there is a contradiction in the core message of the author. On the one hand, he espouses a world in which we communicate data “to all people”, and warns against “deepening divides if [graphics are] not designed for everyone.” On the other hand, a takeaway from this research is the futility of an objective standard for judging data visualization, given that people’s responses to a chart are personal (and political). A further complication is the discovery that people subjectively believe that data visualization is objective.

In the world I’m more familiar with—focus groups used in marketing research—focus group research is seen as a precursor to more reliable survey research, helping to refine the questions. This might point to a path beyond this paper. In my talks, I tend to present graphical evaluation as poll results: X percent of people prefer chart A versus Y percent who prefer chart B.

• Jeff says:

I don’t see an inherent contradiction in aspiring to design for broad audiences while recognizing that individual interpretations will vary. That just means it’s hard. Peck uses words like “everyone” and “all people” to encourage designers to look beyond the easily available participant pools for such research. That doesn’t mean he expects you can design something that every person will understand.

Data visualization is a communication skill. It’s like writing. It’s messy, and you rely on your audience to meet you somewhere in the middle. There will always be people who hear something different than what you are trying to say. Ultimately, the standard by which a communication should be judged is its effectiveness in conveying its intended message, or in supporting its intended analysis. People who critique data visualization tend to focus on the technical aspects of how the information is presented and perceived, but as you note and as Peck’s paper illustrates, it’s more complicated than that.

9. Mikhail Shubin says:

I feel it is much harder to give talks about data visualization now. Five years ago you could talk for two hours without preparation about how everything is shit and how you should not use 3d red-green pie charts.

But now defaults have improved, general visualization literacy is higher, and there is much less to bash. You have to give positive suggestions!