Hullman’s theorem of graphical perception

Any experimental measure of graphical perception will inevitably not measure what it’s intended to measure.

I extracted this “theorem” from various comments Jessica has made regarding her skepticism about empirical studies of the effectiveness of statistical graphics. Of course we should be doing empirical studies all the time, but you-know-who is in the details, as discussed for example here. And then similar issues arose in Jessica’s recent discussion of explainable AI and in our recent paper, To design interfaces for exploratory data analysis, we need theories of graphical inference.

There has to be more to say about all this, but I’ll leave it here for now.

20 thoughts on “Hullman’s theorem of graphical perception”

  1. A general problem is that “graphical perception” is neither a clearly defined nor useful term. When papers describe “graphical perception”, they almost always mean “visualization effectiveness”. They rarely base hypotheses on visual perception, and the findings rarely extend our understanding of visual perception. Moreover, perception is far from the only limitation in graph effectiveness, as attentional selection, working memory, long term memory, cultural semantics, etc. all often play a larger role.

    I’m not necessarily saying that people should change their research approach. They should just be honest about it and not claim that the results of every A/B study with no generalizable conclusions are caused by perception. Any graphical perception study will be ineffective because the term itself is ineffective.

      • A possible follow up on the explainable AI.

        If I play the devil’s advocate here, some of the explainable AI literature resembles the Big Tobacco playbook of asking for more research on harms before giving up on smoking, on the grounds that the harms are not fully understood. Here, the analogous move is asking for more clarification and research on what it means to be explained or interpretable before giving up on black boxes merely because they are not that much more accurate.

  2. In case anyone is interested, here are a few papers that give some background on why I say things like this and have a lot less faith in experiments on statistical graphics: https://mucollective.northwestern.edu/project/2020-vis-effect-size-judgements
    https://mucollective.northwestern.edu/project/uncertainty-eval-survey

    Danielle Navarro proposes the Navarro conjecture that generalizes this to “Any experimental measure of any psychological construct will inevitably not measure what it’s intended to measure.” I think she’s probably right.

  3. What content does this proposition contain that “all models are wrong” does not contain?

    Or despair that “but some are useful” does not ameliorate as usual?

    • All human behavior with graphs as studied in controlled experiments actually measures how people satisfice/optimize within the specific artificial environment they find themselves in. It’s an exaggeration of course, but the difference is that this is about human behavior and how well we can study it in a controlled way.

  4. And all this assumes that the people using the graph actually give a damn about understanding it.

    I hope no one is surprised to learn that lots of people don’t give a damn about understanding anything; what they care about is getting the “right” answer – i.e., the one that corresponds to the answer of the Superior Person looking over their shoulder, who controls their pay, promotions, workload, respectability at work, etc. For these people there’s no equation that connects “understanding” to “right answer”. The “right answer” is found by guessing what Superior Person’s preference might be, not by understanding and reasoning.

    Now there’s a social science project for you: figure out how to distinguish people who actually want to understand the problem and get the right answer from those whose interest is only social.

    • Interesting – just be in line with those you take as authoritative.

      “The method of authority will always govern the mass of mankind … Following the method of authority is the path of peace … The peaceful and sympathetic man will, therefore, find it hard to resist the temptation to submit his opinions to authority”

      The Fixation of Belief. Popular Science Monthly 12 (November 1877), pp. 1-15 Charles Sanders Peirce – http://www.bocc.ubi.pt/pag/peirce-charles-fixation-belief.html

      Outside our own areas of expertise, I don’t think we can often escape the method of authority.

      • Interesting! An old quote but still relevant.

        The “Authority” can also be a person’s social group or the larger social group of society.

        The situation arises in politics, but in the case of politics the “Superior Person” is the electorate. So, for example, what if rents are rising faster than incomes for some part of the electorate? The politician wants to get elected and knows that rent controls are popular. S/he may or may not know that rent control is destructive over the long term. But why should s/he care? His/her goal is to get elected. Problem solved! Advocate for rent control!

    • Interesting, you seem to be pointing to a difference between responding to incentives and intrinsic motivation to understand. It seems reasonable to me to assume that for any person in a target population, there is some situation you could construct in which they will actually try hard to interpret a graph. The stakes just might need to be pretty extreme (e.g., get the interpretation of this graph correct or die), and like you suggest, for many people this response to the stakes could come without any intrinsic motivation to understand. If this assumption is reasonable, it implies that in the real world, incentives matter a lot. So then if the experiment incentives don’t capture real world situations well (which they usually do not, because it’s a contrived situation, people know they are in an experiment, etc.), it’s unclear how much the experimental observations really tell us about behavior with the graph/data/display.

      And yet, in the lab, especially in online experiments, people often talk about how the incentives don’t really matter that much. So that’s a problem right there. I would describe a lot of the data I have collected through controlled experiments over the years as a weird mess of seeming satisficing and shallow visceral reactions to the stimuli. I don’t think my data is any worse than what other people are getting. One can interpret this mess as what many people do when they are shown data, and assume it describes real world behavior, but if the responses are so shallow and driven by visceral and associational reactions to the experimental task, then wouldn’t we expect different things to happen under different circumstances? Smaller, less controlled studies where experts are observed using some interface can provide better data, but generally it means we have to give up control and our desire to make precise statements, which can be scientifically very unsatisfying.

      • Hmmm…

        I guess two quick thoughts:

        When you say “the incentives”, what do you mean? Right off the bat I can think of three classes of incentives: psychological, social, and tangible (e.g., monetary). But how to break these apart isn’t obvious, because – for example – the social approval of a superior or co-workers might have all three kinds of benefits. Chances are you won’t be getting raises or plum jobs if your boss thinks you’re stupid and your coworkers hate you, so even though most people wouldn’t admit to it, everyone knows social approval at work yields financial benefits.

        The fact that people are trying hard to interpret a graph doesn’t mean they’re trying to do it through understanding what’s represented on a graph. They might, for example, remember some previous similar examples they’ve seen, remember those particular interpretations, and spit that out. They’re just mentally comparing graphical forms, not analyzing the data on the graph.

        • In an experimental paradigm, it’s usually monetary incentives that are controlled. But I agree, there are other types of incentives in the real world and they can be intertwined.

          And yes, I agree effort doesn’t mean intrinsic motivation to actually understand the content. Maybe there’s some sort of “mental leap” that some people are willing to make but others aren’t when it comes to actually trying to understand versus falling back on some sort of heuristic. I’ve heard behavioral economists mention how at least with an online experiment you can expect about 15% of the responses to appear random; I’ve observed that much or more. In a recent paper on incentivized decision making from graphs we observed all sorts of heuristic responses and concluded that if we’re going to study visualizations at all we should be trying to understand what sort of strategies people are falling back on to help them answer the questions we think are testing their understanding.

        • “if we’re going to study visualizations at all we should be trying to understand what sort of strategies people are falling back on to help them answer the questions we think are testing their understanding.”

          I picked up my guitar again a few years ago after not having played seriously for quite a while. I find that minor thought disturbances really screw up my playing. I’ll be, say, practicing some arpeggios or something, seemingly cruising on the pattern, then my mind slowly drifts into some regular daily worry and *pow*, my fingers go astray.

          It’s really hard to know what’s going on in people’s minds. You have an idea of how information should be processed and that’s what you’re basing your observations on, but people are extremely complex and have a bazillion things going on in their minds all the time, most of which have nothing to do with what they’re doing at the moment.

          Also the idea of studying how to teach people by studying how they respond to stimuli bothers me. A lot. When I took my first petrology course, I had no idea that there even was such a thing as the granitic eutectic, let alone that it could be depicted on a 2-D or 3-D diagram of the chemical components and mineralogical phases in a granite. I was taught that. People *learn* how to interpret graphs and everything else. They aren’t born knowing that. What you’re getting when you try to measure this stuff is not intuitive ability, but a mash-up of what people have been taught before.

          Whatever the case, you definitely have your work cut out for you! :)

        • Jim:

          Your comment reminds me of something I’ve noticed about my chess playing. I think I understand chess a lot better than when I was 17 but I’m a worse player. I say I understand the game better because I feel like I have a better sense of where I want my pieces to be, I can make plans, and I have a much better sense of how I can use my pieces together. I think the reason I play worse is that I find it harder to concentrate, so I play a lot more by gut and a lot less by calculation. Just to calibrate: I’ve never been very good and I’ve never played competitively; I just enjoy the game. It’s kinda fun to play by gut and see what happens, but it’s not the same as really focusing on each move.
