The title of this post is a line that Jeff Lax liked from our post the other day. It’s been something we’ve been talking about a long time; the earliest reference I can find is here, but it had come up before then, I’m sure. The above histograms illustrate. The upper left plot averages away […]

**Statistical graphics** category.

## size of bubbles in a bubble chart

(This post is by Yuling, not Andrew.) We like bubble charts. In particular, they are the go-to visualization template for binary outcomes (voting, election turnout, mortality…): stratify observations into groups, draw a scatter plot of proportions versus group feature, and use the bubble size to communicate the “group size”. To be concrete, below is a graph […]
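As a rough illustration of the recipe in that excerpt (stratify, compute proportions, size the bubbles by group size), here is a minimal sketch in Python. The toy data, the function name `bubble_data`, and the choice to make bubble *area* proportional to group size are all assumptions for illustration; the post itself may well argue for a different sizing rule.

```python
from collections import defaultdict
from math import sqrt

def bubble_data(records):
    """Group binary outcomes by a group feature.

    records: iterable of (group_feature, outcome) pairs, outcome in {0, 1}.
    Returns a list of (group_feature, proportion, group_size), sorted by feature.
    """
    counts = defaultdict(lambda: [0, 0])  # feature -> [successes, total]
    for feature, outcome in records:
        counts[feature][0] += outcome
        counts[feature][1] += 1
    return [(f, s / n, n) for f, (s, n) in sorted(counts.items())]

# Hypothetical toy data: (age-group midpoint, voted? 1 = yes, 0 = no)
records = [(25, 0), (25, 1), (25, 0), (25, 0),
           (45, 1), (45, 1), (45, 0),
           (65, 1), (65, 1)]

points = bubble_data(records)

# Assumed convention: bubble area proportional to group size,
# so the radius scales with the square root of the group size.
radii = [sqrt(n) for _, _, n in points]

for (feature, prop, n), r in zip(points, radii):
    print(f"group={feature}  proportion={prop:.2f}  n={n}  radius={r:.2f}")
```

Each tuple in `points` is one bubble: x is the group feature, y is the observed proportion, and the radius carries the group size.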

## Whassup with the weird state borders on this vaccine hesitancy map?

Luke Vrotsos writes: I thought you might find this interesting because it relates to questionable statistics getting a lot of media coverage. HHS has a set of county-level vaccine hesitancy estimates that I saw in the NYT this morning in this front-page article. It’s also been covered in the LA Times and lots of local […]

## When can a predictive model improve by anticipating behavioral reactions to its predictions?

This is Jessica. Most of my research involves data interfaces in some way or another, and recently I’ve felt pulled toward asking more theoretical questions about what effects interfaces can or should have in different settings. For instance, the title of the post is one question I’ve started thinking about: In situations where a statistical […]

## Hullman’s theorem of graphical perception

Any experimental measure of graphical perception will inevitably not measure what it’s intended to measure. I extracted this “theorem” from various comments Jessica has made regarding her skepticism about empirical studies of the effectiveness of statistical graphics. Of course we should be doing empirical studies all the time, but you-know-who is in the details, as […]

## Let them log scale

This post may seem like it’s on a six month delay, but actually it’s not! Alexey Guzey sends a link to this blog post about a study done by some researchers at LSE and Yale earlier in pandemic history on how well understood log scales are. They randomly assigned 2000 American adults recruited online to […]

## “Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks”

Lee Wilkinson recommends this book by Jonathan Schwabish: I [Lee] think most books on “business” charts are junk, but this one is different. Schwabish does his work instead of incessantly quoting Tufte and online rants about pie charts. He’s one of the only writers on pie charts who seems to have read the research on […]

## Tukeyian uphill battles

It seems that at least once a year, I find myself begging someone to make exploratory plots of some experimental data. I say begging because I have found that often when I’m being presented with some analysis and I ask questions like Did you plot all the variables first? or Did you look at this […]

## Subtleties of discretized density plots

Many people are familiar with the idea that reformatting a probability as a frequency can sometimes help people better reason with it (such as on classic Bayesian reasoning problems involving conditional probability). In a visualization context, discretizing a representation of uncertainty, or really any probability distribution, can be useful for other reasons. For instance, by […]

## Sketching the distribution of data vs. sketching the imagined distribution of data

Elliot Marsden writes: I was reading the recently published UK review of food and eating habits. The above figure caught my eye as it looked like the distribution of weight had radically changed, beyond just its mean shifting, over past decades. This would really change my beliefs! But in fact the distributional data wasn’t available […]

## xkcd: “Curve-fitting methods and the messages they send”

We can’t go around linking to xkcd all the time or it would just fill up the blog, but this one is absolutely brilliant. You could use it as the basis for a statistics Ph.D. I came across it in this post from Palko, which is on the topic of that Dow 36,000 guy who […]

## Most controversial posts of 2020

Last year we posted 635 entries on this blog. Above is a histogram of the number of comments on each of the posts. The bars are each of width 5, except that I made a special bar just for the posts with zero comments. There’s nothing special about zero here; some posts get only 1 […]
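The binning described in that excerpt (bars of width 5, plus a separate bar for posts with zero comments) can be sketched as follows. This is only a guess at the scheme: the bin edges (1-5, 6-10, and so on), the function names, and the comment counts are all made up for illustration and are not the blog's actual data or code.

```python
from collections import Counter

def bin_label(n_comments, width=5):
    """Return the histogram bin for a comment count.

    Zero gets its own bin; positive counts fall into bins of the given
    width, assumed here to be 1-5, 6-10, 11-15, and so on.
    """
    if n_comments == 0:
        return "0"
    lo = ((n_comments - 1) // width) * width + 1
    return f"{lo}-{lo + width - 1}"

def comment_histogram(comment_counts, width=5):
    """Count how many posts land in each bin."""
    return Counter(bin_label(n, width) for n in comment_counts)

# Hypothetical comment counts, one per post
counts = [0, 0, 1, 5, 6, 12, 40]
hist = comment_histogram(counts)

for label, n_posts in hist.items():
    print(f"{label:>6}: {'#' * n_posts}")
```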

## How many infectious people are likely to show up at an event?

Stephen Kissler and Yonatan Grad launched a Shiny app, Effective SARS-CoV-2 test sensitivity, to help you answer the question, How many infectious people are likely to show up to an event, given a screening test administered n days prior to the event? Here’s a screenshot. The app is based on some modeling they did with […]

## Is there a middle ground in communicating uncertainty in election forecasts?

Beyond razing forecasting to the ground, over the last few days there’s been renewed discussion online about how election forecast communication again failed the public. I’m not convinced there are easy answers here, but it’s worth considering some of the possible avenues forward. Let’s put aside any possibility of not doing forecasts, and assume the […]

## I like this way of mapping electoral college votes

This post is by Phil Price, not Andrew. I like maps — everybody likes maps; who doesn’t like maps? — but any map involves compromises. For mapping electoral votes, one thing you sometimes see is to shrink or expand states so they have area proportional to electoral votes (or to population, which is almost, but […]

## Why is this graph actually ok? It’s the journey, not just the destination.

Josh Miller was in my office and started flipping through Kieran Healy’s book on data visualization, a book that I like a lot—I even use it in my class, replacing Cleveland’s Elements of Graphing Data which is wonderful but things have changed in 35 years so time for a new book. Josh noticed Figure 8.17 […]

## Interactive analysis needs theories of inference

Jessica Hullman and I wrote an article that begins, Computer science research has produced increasingly sophisticated software interfaces for interactive and exploratory analysis, optimized for easy pattern finding and data exposure. But assuming that identifying what’s in the data is the end goal of analysis misrepresents strong connections between exploratory and confirmatory analysis and contributes […]

## Follow-up on yesterday’s posts: some maps are less misleading than others.

Yesterday I complained about the New York Times coronavirus maps showing sparsely-populated areas as having a case rate very close to zero, no matter what the actual rate is. Today the Times has a story about the fact that the rate in rural areas is higher than in more densely populated areas, and they have […]

## All maps of parameter estimates are (still) misleading

I was looking at this map of coronavirus cases, pondering the large swaths with seemingly no cases. I moused over a few of the gray areas. The shading is not based on counties, as I assumed, but on some other spatial unit, perhaps zip codes or census blocks or something. (I’m sure the answer is […]

## Sleep injury spineplot

Antony Unwin sends along the above graph in response to this recent post. The data are kinda crap, but I agree with Antony that this plot is a good way of showing the number of cases corresponding to each histogram bar.