What can the anthropic principle tell us about visualization?

[Image: three bears and three chairs, one broken]

Andrew’s post on the anthropic principle implies that statistical problems fall into one of three types:

  1. Those that are so easy that you don’t need stats (the signal is very strong relative to noise).
  2. Those that require stats because there’s some noise or confounding to be dealt with to recover the signal.
  3. Those that can’t be helped by stats because the signal is too small or noise/confounding too big.

One way you know you’re in region 1 is when the signal is so obvious it hits you right between the eyes. This is the “interocular traumatic test”, attributed to Joe Berkson.

It makes me wonder: where does visualization usually fall on this spectrum? Our assumptions about which of the three regions we’re in will influence how we use visualization in our own work and (from a research perspective) how we advise people to use it, how we design visualization tools, and what problems we invest our time in.

One observation that “hits me right between the eyes” when I think about how visualization is perceived is that many people associate it primarily with finding and displaying “signal.” The lack of emphasis on the role visualization plays in statistical modeling, and on depicting uncertainty, is noticeable even in research.

To give some examples: if you look at how visualizations are used in the public sphere, it’s rather striking how comfortable government agencies, data journalists, and others seem to be with omitting depictions of uncertainty alongside visualized estimates in reports. Economist Chuck Manski has called this “the lure of incredible certitude.” A few years back, I sampled about 450 visualizations from OECD reports, and only about 3% visually expressed uncertainty or variation, and that includes counting the cases where raw data was plotted. Averages and other point estimates abound. As part of the same project I asked a number of data journalists and other well-known visualization designers how they felt about the idea of addressing uncertainty more directly in their work. A common response was something like, “Oh, well, we go through a process to analyze the data and determine whether the pattern is real before we visualize it for other people. And other people don’t really need to see that process; they trust us.”

And then when it comes to visualization research and software development, as we’ve written about and as has come up on this blog, there often seems to be a similar fixation on pattern finding. Popular visualization systems like Tableau aggregate by default, so you have to know what you’re doing if you want to see variation or uncertainty. And then there’s how we tend to evaluate visualizations in research, like looking at how much people like using a technique, or how well they can read the data, with much less focus on how it influences decision making under uncertainty. A few years ago I looked at about 90 experiments that evaluated uncertainty visualizations, where you would think the evaluations would be about decision making under uncertainty. But instead what predominates are measures of how well people can read values from the visualization, which we find in other research doesn’t necessarily predict good decision making, and self-reported measures like satisfaction and perceived effort.
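
To make the aggregation-by-default point concrete, here is a minimal sketch in Python with made-up data (it is not meant to reproduce any particular tool’s internals): the same two groups summarized as bare means versus shown with their raw variation.

```python
# A minimal sketch of why default aggregation matters: the same synthetic
# data summarized as group means versus shown with its raw variation.
# The data and column names here are made up for illustration.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], 200),
    "value": np.concatenate([rng.normal(10.0, 5.0, 200),
                             rng.normal(10.8, 5.0, 200)]),
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)

# Left: aggregate-by-default view -- two tidy bars, no hint of overlap.
means = df.groupby("group")["value"].mean()
ax1.bar(means.index, means.values)
ax1.set_title("Means only (default aggregation)")

# Right: the raw observations, jittered, with the means overlaid.
for i, (name, g) in enumerate(df.groupby("group")):
    x = i + rng.uniform(-0.15, 0.15, len(g))
    ax2.scatter(x, g["value"], s=8, alpha=0.3)
    ax2.plot([i - 0.2, i + 0.2], [g["value"].mean()] * 2, color="black")
ax2.set_xticks([0, 1])
ax2.set_xticklabels(["A", "B"])
ax2.set_title("Raw data + means")

plt.tight_layout()
plt.show()
```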

All this might seem to imply region 1, where the differences and trends are so obvious we don’t need to worry about probability and uncertainty. We’re hit between the eyes with all the signal. But that doesn’t seem quite right. Why do we have all these fancy interfaces to let us cross-filter and facet by multiple variables and make complex selections by brushing interactively? And why, in visualization research, do we care so much about getting the encodings and interactions right? Often evaluations involve comparing multiple ways of visualizing or interacting with the same data to find the one that’s best. If the signal were so overwhelming, surely it wouldn’t matter that much; we could just use any technique and it would still hit us between the eyes.

So instead we might assume most visualization applications will fall in region 2, where we need stats because there’s some noise or confounding. Under the assumption of confounding, having the ability to slice and dice and filter the data quickly makes sense; we need to separate out sources of correlation. But then what about the lack of emphasis on directly modeling or visualizing uncertainty, and the prevalence of visualization tools that aren’t well suited to statistical modeling? If we assume we’re in region 2, and we consider a visualization as a test statistic, are there times when we think these statistics are sufficient? That is, times when we can rely on visual recognition of patterns without explicitly modeling probability, or at least incorporating it into our graphs?
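
One existing way to make “visualization as a test statistic” concrete is the “lineup” protocol from the graphical inference literature, where the plot of the real data is hidden among decoy plots generated under a null. A minimal sketch, with synthetic data and a simple permutation null:

```python
# A minimal sketch of treating the plot as a test statistic, in the spirit of
# the lineup protocol: hide the real scatterplot among decoys generated under
# a null of no association. The data here are synthetic; in practice you would
# permute or simulate from a null appropriate to your own data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
n, n_panels = 60, 20

# "Real" data: a weak linear signal buried in noise.
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)

real_slot = rng.integers(n_panels)  # which panel hides the real data
fig, axes = plt.subplots(4, 5, figsize=(10, 8), sharex=True, sharey=True)

for i, ax in enumerate(axes.flat):
    # Null of "no association": shuffle y relative to x.
    y_shown = y if i == real_slot else rng.permutation(y)
    ax.scatter(x, y_shown, s=8)
    ax.set_title(str(i), fontsize=8)

plt.tight_layout()
plt.show()
print("The real data were in panel", real_slot)
```

If a viewer can reliably pick the real panel out of twenty, the visual pattern is doing roughly the work of a test at the one-in-twenty level; if they can’t, we’re leaning on a visual test statistic that can’t separate signal from noise.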

I think most researchers would agree that it seems pretty unlikely that people are really good at detecting when they are in this particular region of noisy-but-not-noisy-enough-to-need-more-than-visualization. Instead I suspect the question of when visual analysis is enough versus when it needs to be part of a larger statistical workflow is seen as out of scope to people who research visualization or interactive analysis. The goal in most computer science is to solve problems, so if there’s some evidence a problem exists for which a fancy interface would be helpful, that’s often all we need to know.  

But lately I find myself wanting to take a step back to consider what class of problems visualization solves. And more specifically, how much prioritizing pattern finding over inference impacts various realistic scenarios, defined by the type of decision or judgment (e.g., a choice between risky alternatives, an allocation problem) and type of data generating process (including specifying some reasonable approximation of a sampling prior for that problem, which is extremely rare in visualization research). Utility functions will also matter, in the sense that there might be some questions which require more than comparisons of visual patterns to answer satisfactorily, but which are inconsequential enough that more error can be tolerated. 
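
To be clearer about the kind of scenario I mean, here is a rough sketch. Everything in it (the two-alternative choice, the normal sampling prior, the noise levels, the payoff being the true mean of the chosen option) is an assumption for illustration, not a result:

```python
# A minimal sketch of a decision scenario: a choice between two risky
# alternatives, a sampling prior over their true payoffs, a point-estimate
# heuristic ("pick the higher sample mean"), and a utility equal to the true
# mean of whatever was picked. All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def simulate(n_obs=30, noise_sd=1.0, prior_sd=0.5, n_sims=20_000):
    # Data generating process: true means drawn from a N(0, prior_sd) prior,
    # with n_obs noisy observations around each true mean.
    true = rng.normal(0.0, prior_sd, size=(n_sims, 2))
    obs = true[:, :, None] + rng.normal(0.0, noise_sd, size=(n_sims, 2, n_obs))
    sample_means = obs.mean(axis=2)

    pick = sample_means.argmax(axis=1)   # the heuristic
    best = true.argmax(axis=1)           # what an oracle would pick

    payoff = true[np.arange(n_sims), pick]
    oracle = true[np.arange(n_sims), best]
    return payoff.mean(), oracle.mean(), (pick == best).mean()

for noise_sd in [0.5, 2.0, 8.0]:
    got, best, agree = simulate(noise_sd=noise_sd)
    print(f"noise_sd={noise_sd}: heuristic payoff {got:.3f} vs oracle {best:.3f},"
          f" agreement {agree:.2f}")
# Whether the gap between heuristic and oracle matters depends on the utility
# function, which is exactly the part that rarely gets specified.
```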

Maybe we even need complexity classes like we use for decision problems in computer science. The constrained resource we usually care about in complexity classification is time or memory, but for visualization maybe it should instead be cast as elementary visual operations like comparisons. And the model of computation could account for the representation of the data, e.g., what default aggregation is applied. There has been some push for development of benchmarks in the visualization/interactive analysis research communities, which is one way of taking a step towards this. Selecting benchmarks implies defining and constraining the broader problem space. All this requires more thought, and at some point getting so theoretical might cease to seem like visualization research at all, but I guess that’s in the eye of the beholder!
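
Just to gesture at what a cost model over elementary visual operations could look like, here is a toy sketch; the operation counts are entirely made up and only meant to show how the data representation changes the “resource” needed to answer the same query:

```python
# A toy, hypothetical cost model over elementary visual operations. The query
# is "which group has the largest average?", and the cost is a crude count of
# the comparisons a viewer has to make, which depends on whether the data are
# shown pre-aggregated (one bar per group) or raw (one point per observation).
def cost_aggregated(n_groups: int) -> int:
    # One bar per group: scan the bars, keeping a running max.
    return n_groups - 1

def cost_raw(n_groups: int, n_per_group: int) -> int:
    # Raw points: the viewer first forms a visual average per group
    # (charitably, ~1 operation per point), then compares the groups.
    return n_groups * n_per_group + (n_groups - 1)

for k, n in [(3, 50), (10, 200)]:
    print(f"{k} groups x {n} points: "
          f"aggregated ~{cost_aggregated(k)} ops, raw ~{cost_raw(k, n)} ops")
# The representation changes the "complexity" of the same query by orders of
# magnitude, but the aggregated view also hides exactly the variation needed
# to judge whether the difference in means is worth acting on.
```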

21 thoughts on “What can the anthropic principle tell us about visualization?”

  1. Jessica:

    You mention that many graphs don’t display uncertainty. I have a few thoughts on this. One thought is that statistical graphics is typically framed as “graphing the data,” so if “the data” being sent to the graphing program don’t include uncertainty, then the uncertainty won’t get graphed. Another thought is that sometimes when uncertainty is graphed, the uncertainty measure itself is taken too seriously. An example is points with error bars where the ends of the error bars are emphasized with big perpendicular lines (see for example the graph here which I found with a google image search); emphasizing the endpoints of the confidence interval is a statistical error because these endpoints are themselves noisy (see the sketch after this comment). My third thought is that it can be ok not to display uncertainty when the same information is captured using variation. This is the principle of the secret weapon (actually that linked example does display uncertainties, but it doesn’t really have to) and it also arises in the idea that a graph should contain the seeds of its own destruction.

    Let me also add that this “anthropic principle” idea comes up all the time in statistics, including preposterior analysis (choosing a prior based on how it would combine with plausible data that might arise), classical decision analysis (deducing probabilities and utilities from actions), and many other settings.
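
    A minimal matplotlib sketch of the error-bar point above, with made-up estimates: the same intervals drawn with heavy caps, which pull the eye to the noisiest part of the display, versus without caps, letting the point estimate carry the visual weight.

    ```python
    # Made-up estimates and standard errors, purely for illustration.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(4)
    est = np.array([1.2, 0.4, 0.9, 1.6])
    se = np.array([0.35, 0.4, 0.3, 0.5])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3), sharey=True)

    # Left: big perpendicular caps emphasize the interval endpoints.
    ax1.errorbar(x, est, yerr=1.96 * se, fmt="o", capsize=8, capthick=2)
    ax1.set_title("Caps emphasize endpoints")

    # Right: no caps; the point estimate carries the visual weight.
    ax2.errorbar(x, est, yerr=1.96 * se, fmt="o", capsize=0)
    ax2.set_title("No caps")

    plt.tight_layout()
    plt.show()
    ```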

    • Regarding graphs not displaying uncertainty … I’ve also been thinking recently about how robust visually-based heuristics for certain decisions/judgments, like ‘just pick the one with the higher sample mean/proportion/etc’, can be, and how this may excuse a lack of emphasis on uncertainty in many settings. Last summer we took a decision task, choosing between several risky alternatives, specified some different sampling assumptions, then generated data while varying parameters of the sampling process and looked at how well one could expect to do in terms of payoffs using simple heuristics on point estimates. Under seemingly reasonable assumptions about how the data was generated, it can be surprisingly hard to create “difficult” decisions, ones where the simple heuristics don’t work.

      The secret weapon thing is interesting – the idea of checking separate models seems related to the types of faceting and cross-filtering we do all the time with visualization, exactly the kind of thing I wonder about formalizing.

  2. I’d be interested to know how the US education system teaches uncertainty visualization at a high school/middle school level. Maybe we just need to make it part of the math or science curriculum. I’d be curious to read a history on the topic to know how we got to the error-bars land we live in.

    • Anon said, “I’d be interested to know how the US education system teaches uncertainty visualization at a high school/middle school level. Maybe we just need to make it part of the math or science curriculum. ”

      First, I don’t think there is any “overall” US education system.

      Second, I wonder if there would be pushback from parents (and maybe some teachers?) who consider the goal to be obtaining certainty (i.e., a lot of people are uncertainty-averse).

  3. > An example is points with error bars where the ends of the error bars are emphasized with big perpendicular lines (see for example the graph here which I found with a google image search); emphasizing the endpoints of the confidence interval is a statistical error because these endpoints are themselves noisy.

    It’s uncertainty all the way down: https://xkcd.com/2110/

    • Interesting xkcd. Whatever you may think of Taleb, one of his arguments for fat tails comes from taking the limit that arises when each estimate itself has error, all the way down.

  4. > A few years ago I looked at about 90 experiments that evaluated uncertainty visualizations, where you would think the evaluations would be about decision making under uncertainty. But instead what predominates are measures of how well people can read values from the visualization

    Do you mind adding more about why I would think this? It’s not so obvious to me but I don’t work in visualization either.

    I can see why you might say let’s study the perception + decision making pipeline instead of trying to break it down into a separate perception stage and then a separate decision stage (cuz everything is lumped together in the first and maybe it’s just too hard to connect all the details when things are separate).

    It seems like there’d be decisions that destroy information, though, and make this not such a good idea.

    Like, maybe the task requires a left-right sorta decision, but the decision-maker understands from the visualization that this is mainly a guess already. So you wouldn’t know people were actually guessing until you looked at a ton of individual decisions — but you’d know that much more quickly if you asked what the decision makers were thinking.

    • Yeah, that’s a good point, you want to understand the mechanism by which a decision is made, and ideally the pipeline from perception to some interpretation/semantics to decision. My problem is more that researchers are motivating their uncertainty-related work using the idea that people use data to make decisions under uncertainty, then gravitating towards a few measures that rarely include any kind of decision to evaluate techniques, and concluding (based on their assumptions about what that mechanism might be – e.g., better perception = better decision, more confident judgment = better decision) which technique is best for uncertainty.

      Sometimes I use the NYT election needle as an example. Many people HATED that visualization, and to be fair it had some issues, namely that it was rolled out on a night when many were pretty anxious, after people had gotten used to displays that made it easy to ‘round up’ and ignore the uncertainty. Plus the jitter was a function of two qualitatively different sources – the dataset updating as more results came in, but also random draws within a 50% CI, if I remember correctly. In an experiment, you might ask people to estimate probabilities with that display, and maybe they can do it ok, but they are worse than if you gave them some static visualization that maps probability to position, height, etc. (which also makes it easier to ignore uncertainty if you want, but if you are asking them explicitly for probability values in an experiment, you may miss that). You could ask them how much they like using the needle, and I bet they’d say it’s hard and would prefer something static. But when it comes to ensuring that any judgments and decisions made after that point are based on understanding uncertainty in the election outcome, it seems hard to top something as visceral as the needle. I haven’t actually run experiments on the needle so this is conjecture, but the fact that it’s possible to think of various cases where better perception or satisfaction don’t correlate with or even run counter to usefulness in decision making is why I’m distrustful of certain results in the lit.

      • > motivating their uncertainty related work using the idea that people use data to make decisions under uncertainty

        Yeah, I’ve used that loosely before.

        > where better perception or satisfaction don’t correlate with or even run counter to usefulness in decision making

        I can see satisfaction not being the right objective, but it’s hard for me to think that better perception should lead to worse decision making.

        For that to happen it seems like we’d need:
        1. The person to not understand the decision they’re making such that worse perception of reality accidentally leads to a safer choice
        2. The person to be making a different decision than we think they are (different constraints that we didn’t lay out, or whatever)
        3. Some weird thing happens with groups and averages — if some people perceive better and some worse, then maybe the payouts from the decision making are such that the average change is a loss.

        But this, I guess, is your first point: if the promise is that the uncertainty improves decision making, then maybe it’s better to address that directly.

        I’m not sure I like the needle example here because I don’t really see that there is a decision to make.

  5. I enjoyed this post immensely. Reminded me of a piece I wrote 20+ years ago in my PhD thesis about animal science research with cows:

    Outcomes are frequently reduced to least-squares means, which are then plotted, often without error bars. With or without error bars, this does little to communicate the magnitude of the effects relative to the variance in the outcome of interest. While standard errors are useful in communicating the precision of an estimate, dispersion in the data is better communicated by standard deviations. For example, consider Figures 1.1 and 1.2, which are based on data reported in chapter 7.
    In the top panel of Figure 1.1, the adjusted means for plasma NEFA in postpartum transition cows are plotted without error bars, as is frequently done in published journal articles; no information is conveyed about the variance, or about whether the differences in the treatment groups are real. In the bottom panel of Figure 1.1, error bars were added, improving the interpretation of the existence of treatment, day, and treatment-by-day differences. Most transition studies present treatment effects in one of these plot formats. The differences are very real (P < 0.01), but they belie the extensive variance present, as observed in Figure 1.2, where the same plots have the raw data added as a scatterplot. Figure 1.2 demonstrates the failure of the plots in Figure 1.1 to communicate the classic CUF-period variance. Scatterplots are not being recommended for routine use here, but rather to demonstrate why the use of alternatives, such as standard-deviation bars, might better reveal the relative importance of treatment differences in transition cows. In Figure 1.2, the treatment differences are far less clear, though no less real, but one gains an appreciation that there is a lot of variance associated with things other than treatment differences. Indeed, the treatment differences are significant but explain a very small portion (15%) of the variance. A consequence of the failure to decompose and interpret the variance is that the focus in reporting transition studies has often been on determining whether treatment differences are real, rather than whether they are real and important.
    The precedent for reporting probability values and standard errors in plots and tables is well established. However, some have communicated the extensive variance in transition-cow biology by reporting and plotting standard deviations rather than, or in addition to, standard errors (Meijer et al., 1995). For similar reasons, I have elected to report P values and, in many cases, 95% confidence intervals. In some plots, standard deviations are included rather than standard errors. In all cases, this was done because it illuminates the variance present in transition cow data, and makes interpretation easier. Typical graphics in Animal Science are at this link: https://www.dropbox.com/s/bk1hmnksepset2f/Hidden%20Variance%20in%20Transition%20Cow%20Data.pdf?dl=0
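
    A minimal sketch of the contrast described above, with made-up numbers standing in for the transition-cow data (illustrative only, not a reproduction of Figures 1.1 and 1.2): the same group means plotted alone, with standard-error bars, and with the raw observations added.

    ```python
    # Hypothetical two-group, seven-day data: small mean difference,
    # large dispersion. All values are made up for illustration.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(3)
    days = np.arange(1, 8)
    group_means = {"control": 0.6 - 0.03 * days, "treated": 0.5 - 0.03 * days}
    n_cows, sd = 25, 0.25

    fig, axes = plt.subplots(1, 3, figsize=(11, 3), sharey=True)
    for name, mu in group_means.items():
        obs = mu + rng.normal(0, sd, size=(n_cows, len(days)))
        m = obs.mean(axis=0)
        se = obs.std(axis=0, ddof=1) / np.sqrt(n_cows)

        axes[0].plot(days, m, marker="o", label=name)               # means only
        axes[1].errorbar(days, m, yerr=se, marker="o", label=name)  # means +/- SE
        axes[2].plot(days, m, marker="o", label=name)               # means + raw data
        axes[2].scatter(np.tile(days, n_cows), obs.ravel(), s=5, alpha=0.15)

    for ax, title in zip(axes, ["Means only", "Means +/- SE", "Means + raw data"]):
        ax.set_title(title)
        ax.set_xlabel("day postpartum")
    axes[0].legend()
    plt.tight_layout()
    plt.show()
    ```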

  6. I think I’d draw a distinction between graphs we as statisticians produce and graph-generating interactive interfaces. In the former case, we’re usually using the graph to illustrate some kind of statistical conclusion. The goal is to turn something potentially sophisticated and complicated (a type 2 situation) into a type 1 situation where the conclusion is so obvious that the reader does not need stats knowledge themselves to understand what we are saying. The full details of everything, and the assurance that all the i’s are dotted and the t’s are crossed, can be found elsewhere. For example, we can choose to omit uncertainty intervals if we have already assured ourselves that uncertainty isn’t important, or if the variability from observation to observation dwarfs intrinsic variation; we can choose to marginalise out a confounding variable, or condition on a specific factor level, provided those choices are all justified to our satisfaction. We might choose deliberately to emphasise uncertainty if that’s our core conclusion, that the uncertainty swamps any estimated effects. And if the reader doubts us, they can go look in the supplementary material and fiddle with the raw data. The visualisation is essentially a storytelling tool.

    I think the case of interactive visualisation, such as Tableau, is similar but also very different. At worst, we are kinda giving the user the rope to hang themselves with, divesting ourselves of responsibility for the user’s own mistakes. We could alternatively have tried to anticipate usual errors, supplied suitable defaults, and constrained the options to produce good results.

    • I have issues with your statement “In the former case, we’re usually using the graph to illustrate some kind of statistical conclusion.” I agree that this is one important role for visualization. But when I teach visualization it is as a necessary step everywhere along the analysis process, not just in communicating a conclusion. I think many of the examples of poor analysis can be partially traced to a lack of visualization early on in the analysis. Andrew has many times asked for just showing the data before performing much processing – and I am almost always looking for that. In fact, I think visualization is one of the primary ways we can determine whether we are in a type 1, 2, or 3 situation.

  7. A related question would be: how much progress has there been in data visualization research? I know I’ve read some papers from some applied statistics / data visualization journal, the name of which I have forgotten. I mean: sure, things have gotten more beautiful and easier for laymen, but I suppose the research was done already decades ago. Like the 1970s, 1980s, and Tukey and the like. I might be wrong, though.

  8. I’ve been thinking about using this article as a writing prompt for low-income college freshmen.
    https://www.nytimes.com/interactive/2015/05/28/upshot/you-draw-it-how-family-income-affects-childrens-college-chances.html

    But someone said to me that it would be too depressing for them. My thinking had been that we’d be talking about how they have already beaten the prediction. But the reveal of the real data doesn’t include anything about variation. And here I’m not just talking about having a confidence band for the estimate of the slope; more importantly, there’s nothing that tells the reader/viewer that the line is defining probabilities.
