Archive of posts filed under the Statistical graphics category.

Is there a middle ground in communicating uncertainty in election forecasts?

Beyond razing forecasting to the ground, over the last few days there’s been renewed discussion online about how election forecast communication again failed the public. I’m not convinced there are easy answers here, but it’s worth considering some of the possible avenues forward. Let’s put aside any possibility of not doing forecasts, and assume the […]

I like this way of mapping electoral college votes

This post is by Phil Price, not Andrew.  I like maps — everybody likes maps; who doesn’t like maps? — but any map involves compromises. For mapping electoral votes, one thing you sometimes see is to shrink or expand states so they have area proportional to electoral votes (or to population, which is almost, but […]

Why is this graph actually ok? It’s the journey, not just the destination.

Josh Miller was in my office and started flipping through Kieran Healy’s book on data visualization, a book that I like a lot—I even use it in my class, replacing Cleveland’s Elements of Graphing Data which is wonderful but things have changed in 35 years so time for a new book. Josh noticed Figure 8.17 […]

Interactive analysis needs theories of inference

Jessica Hullman and I wrote an article that begins, Computer science research has produced increasingly sophisticated software interfaces for interactive and exploratory analysis, optimized for easy pattern finding and data exposure. But assuming that identifying what’s in the data is the end goal of analysis misrepresents strong connections between exploratory and confirmatory analysis and contributes […]

Follow-up on yesterday’s posts: some maps are less misleading than others.

Yesterday I complained about the New York Times coronavirus maps showing sparsely-populated areas as having a case rate very close to zero, no matter what the actual rate is. Today the Times has a story about the fact that the rate in rural areas is higher than in more densely populated areas, and they have […]

All maps of parameter estimates are (still) misleading

I was looking at this map of coronavirus cases, pondering the large swaths with seemingly no cases. I moused over a few of the gray areas. The shading is not based on counties, as I assumed, but on some other spatial unit, perhaps zip codes or census blocks or something. (I’m sure the answer is […]

Sleep injury spineplot

Antony Unwin sends along the above graph in response to this recent post. The data are kinda crap, but I agree with Antony that this plot is a good way of showing the number of cases corresponding to each histogram bar.

Misrepresenting data from a published source . . . it happens all the time!

Following up on yesterday’s post on an example of misrepresentation of data from a graph, I wanted to share a much more extreme example that I wrote about a while ago, about some data misrepresentation in an old statistics textbook: About fifteen years ago, when preparing to teach an introductory statistics class, I recalled an enthusiastic […]

Alexey Guzey plays Stat Detective: How many observations are in each bar of this graph?

How many data points are in each bar of the top graph above? (See here for background.) It’s from this article: Milewski MD, Skaggs DL, Bishop GA, Pace JL, Ibrahim DA, Wren TA, Barzdukas A. Chronic lack of sleep is associated with increased sports injuries in adolescent athletes. Journal of Pediatric Orthopaedics. 2014 Mar 1;34(2):129-33. […]

Information, incentives, and goals in election forecasts

Jessica Hullman, Christopher Wlezien, Elliott Morris, and I write: Presidential elections can be forecast using information from political and economic conditions, polls, and a statistical model of changes in public opinion over time. However, these “knowns” about how to make a good presidential election forecast come with many unknowns due to the challenges of […]

“Pictures represent facts, stories represent acts, and models represent concepts.”

I really like the above quote from noted aphorist Thomas Basbøll. He expands: Simplifying somewhat, pictures represent facts, stories represent acts, and models represent concepts. . . . Pictures are simplified representations of facts and to use this to draw a hard and fast line between pictures and stories and models is itself a simplified […]

An example of a parallel dot plot: a great way to display many properties of a list of items

I often see articles that are full of long tables of numbers and it’s hard to see what’s going on, so then I’ll suggest parallel dot plots. But people don’t always know what I’m talking about, so here I’m sharing an example. Next time when I suggest a parallel dot plot, I can point people […]

Know your data, recode missing data codes

We had a class assignment where students had to graph some data of interest. A pair of students made the above graph, as a reminder that some data cleaning is often necessary. The students came up with the excellent title as well!

Coding and drawing

Some people like coding and they like drawing too. What do they have in common? I like to code—I don’t looove it, but I like it ok and I do it a lot—but I find drawing to be very difficult. I can keep tinkering with my code to get it to look like whatever I […]

The history of low-hanging intellectual fruit

Alex Tabarrok asks, why was the game Dungeons and Dragons, or something like it, not invented in ancient Rome? He argues that the ancient Romans had the technology (that would be dice, I guess) so why didn’t someone think of inventing a random-number-driven role-playing game? I don’t have an answer, but I think we can […]

Roll Over Mercator: Awesome map shows the unreasonable effectiveness of mixture models

I’m not gonna link to all the great xkcd drawings cos if I did, I’d just be linking to xkcd every day, but today’s is just too good to pass by: He could’ve thrown in some Pacific islands and Scandinavia too, but it’s amazing in any case. The relevant statistical point here is how good […]

New report on coronavirus trends: “the epidemic is not under control in much of the US . . . factors modulating transmission such as rapid testing, contact tracing and behavioural precautions are crucial to offset the rise of transmission associated with loosening of social distancing . . .”

Juliette Unwin et al. write: We model the epidemics in the US at the state-level, using publicly available death data within a Bayesian hierarchical semi-mechanistic framework. For each state, we estimate the time-varying reproduction number (the average number of secondary infections caused by an infected person), the number of individuals that have been infected and […]

Hey, I think something’s wrong with this graph! Free copy of Regression and Other Stories to the first commenter who comes up with a plausible innocent explanation of this one.

Paul Alper points us to this column by Dana Milbank discussing the above graph from Georgia’s Department of Public Health: Ok, the comb-style bar graph is, as always, a bad idea, as it multiplexes two dimensions (county and time) on a single x-axis. The graph should be a lineplot, with one line per county, and […]

“Stay-at-home” behavior: A pretty graph but I have some questions

Or, should I say, a pretty graph and so I have some questions. It’s a positive property of a graph that it makes you want to see more. Clare Malone and Kyle Bourassa write: Cuebiq, a private data company, assessed the movement of people via GPS-enabled mobile devices across the U.S. If you look at movement […]

Uncertainty and variation as distinct concepts

Jake Hofman, Dan Goldstein, and Jessica Hullman write: Scientists presenting experimental results often choose to display either inferential uncertainty (e.g., uncertainty in the estimate of a population mean) or outcome uncertainty (e.g., variation of outcomes around that mean). How does this choice impact readers’ beliefs about the size of treatment effects? We investigate this question […]
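
To make the distinction in that excerpt concrete, here is a minimal sketch, not anything from the paper itself. It assumes simulated normal data and a hypothetical helper called summarize, and it uses the standard deviation as a stand-in for outcome variation and the standard error of the mean as a stand-in for inferential uncertainty. The point is only that the first stays roughly constant as the sample grows while the second shrinks.

```python
import math
import random
import statistics


def summarize(sample):
    """Return (mean, sd, se) for a list of numbers.

    sd: spread of individual outcomes around the mean (outcome variation).
    se: uncertainty in the estimate of the mean (inferential uncertainty).
    Hypothetical helper for illustration only.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)   # sample standard deviation
    se = sd / math.sqrt(n)          # standard error of the mean
    return mean, sd, se


# Simulated outcomes (an assumption, not real data): same population,
# two different sample sizes.
random.seed(0)
small = [random.gauss(100, 15) for _ in range(25)]
large = [random.gauss(100, 15) for _ in range(2500)]

for label, sample in (("n=25", small), ("n=2500", large)):
    mean, sd, se = summarize(sample)
    print(f"{label}: mean={mean:.1f}, sd={sd:.1f}, se={se:.2f}")
```

An error bar drawn from se would look about ten times narrower for the larger sample, even though the spread of individual outcomes is essentially unchanged, so a reader who mistakes one kind of bar for the other can come away with a very different impression of the size of an effect.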