Archive of posts filed under the Statistical graphics category.

Roll Over Mercator: Awesome map shows the unreasonable effectiveness of mixture models

I’m not gonna link to all the great xkcd drawings cos if I did, I’d just be linking to xkcd every day, but today’s is just too good to pass by: He could’ve thrown in some Pacific islands and Scandinavia too, but it’s amazing in any case. The relevant statistical point here is how good […]

New report on coronavirus trends: “the epidemic is not under control in much of the US . . . factors modulating transmission such as rapid testing, contact tracing and behavioural precautions are crucial to offset the rise of transmission associated with loosening of social distancing . . .”

Juliette Unwin et al. write: We model the epidemics in the US at the state-level, using publicly available death data within a Bayesian hierarchical semi-mechanistic framework. For each state, we estimate the time-varying reproduction number (the average number of secondary infections caused by an infected person), the number of individuals that have been infected and […]

Hey, I think something’s wrong with this graph! Free copy of Regression and Other Stories to the first commenter who comes up with a plausible innocent explanation of this one.

Paul Alper points us to this column by Dana Milbank discussing the above graph from Georgia’s Department of Public Health: Ok, the comb-style bar graph is, as always, a bad idea, as it multiplexes two dimensions (county and time) on a single x-axis. The graph should be a lineplot, with one line per county, and […]

“Stay-at-home” behavior: A pretty graph but I have some questions

Or, should I say, a pretty graph and so have some questions. It’s a positive property of a graph that it makes you want to see more. Clare Malone and Kyle Bourassa write: Cuebiq, a private data company, assessed the movement of people via GPS-enabled mobile devices across the U.S. If you look at movement […]

Uncertainty and variation as distinct concepts

Jake Hofman, Dan Goldstein, and Jessica Hullman write: Scientists presenting experimental results often choose to display either inferential uncertainty (e.g., uncertainty in the estimate of a population mean) or outcome uncertainty (e.g., variation of outcomes around that mean). How does this choice impact readers’ beliefs about the size of treatment effects? We investigate this question […]
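To make the distinction in this excerpt concrete, here is a minimal sketch, not taken from the paper. The numbers are made up for illustration, and the standard deviation and standard error stand in for "outcome variation" and "inferential uncertainty" respectively; the authors may well use other summaries.

```python
import math
import statistics

# Hypothetical outcome measurements, invented purely for illustration.
outcomes = [4.1, 5.3, 6.0, 4.8, 5.5, 6.2, 3.9, 5.0, 5.7, 4.5]

n = len(outcomes)
mean = statistics.mean(outcomes)

# Outcome variation: how much individual outcomes spread around the mean.
sd = statistics.stdev(outcomes)

# Inferential uncertainty: how precisely the mean itself is estimated.
# This shrinks as n grows, while the spread of outcomes does not.
se = sd / math.sqrt(n)

print(f"mean = {mean:.2f}")
print(f"SD (outcome variation) = {sd:.2f}")
print(f"SE (uncertainty in the mean) = {se:.2f}")
```

Error bars drawn from `se` will look much tighter than bars drawn from `sd`, even though both describe the same data.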

Make Andrew happy with one simple ggplot trick

By default, ggplot expands the space above and below the x-axis (and to the left and right of the y-axis). Andrew has made it pretty clear that he thinks the x axis should be drawn at y = 0. To remove the extra space around the axes when you have continuous (not discrete or log […]

We need better default plots for regression.

Robin Lee writes: To check for linearity and homoscedasticity, we are taught to plot residuals against y fitted value in many statistics classes. However, plotting residuals against y fitted value has always been a confusing practice that I know that I should use but can’t quite explain why. It is not until this week I […]

Tracking R of COVID-19 & assessing public interventions; also some general thoughts on science

Simas Kucinskas writes:

10 on corona

Here are some things people have sent me lately. They are in no particular order, except that I put the last item last so we could end with some humor. After this, I’ll write a few more blog posts, then it’ll be time to do some real work. Table of contents 1. Suspicious coronavirus numbers […]

Number of deaths or number of deaths per capita

Pablo Haya writes: Currently, there is a lot of data analysis in the news media showing multiple aspects of the COVID-19 crisis. Many of them compare the virus spread and evolution between different countries, or between different regions within each country. They tend to compare the absolute frequency of several metrics such as confirmed cases […]
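The difference between the two measures in the title can be seen with a little arithmetic. The following is only a sketch: the region names and counts are hypothetical, and the `per_100k` helper is invented for this example.

```python
# Hypothetical regions with made-up counts; these are not real data.
regions = {
    "Region A": {"deaths": 1200, "population": 60_000_000},
    "Region B": {"deaths": 300, "population": 5_000_000},
}


def per_100k(count, population):
    """Convert an absolute count into a rate per 100,000 people."""
    return count / population * 100_000


# Rate per 100,000 for each region.
rates = {
    name: per_100k(r["deaths"], r["population"])
    for name, r in regions.items()
}

# The region with the most deaths need not be the one with the highest rate.
by_total = max(regions, key=lambda name: regions[name]["deaths"])
by_rate = max(rates, key=rates.get)

print("Highest absolute count:", by_total)
print("Highest per-capita rate:", by_rate)
```

With these invented numbers the two rankings disagree, which is why the choice of measure matters when comparing places of very different size.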

A better way to visualize the spread of coronavirus in different countries?

Joel Elvery writes: Long-time listener, first-time caller. I’m an economist at the Federal Reserve Bank of Cleveland. I think I have stumbled on to a very effective way to visualize and compare the trajectories of COVID-19 epidemics. This short post describes the approach and what we learn from it, but the graph above is enough […]

Interesting y-axis

Merlin sent along this one: P.S. To be fair, when it comes to innumeracy, whoever designed the above graph has nothing on these people. As Clarissa Jan-Lim put it: Math is hard and everyone needs to relax! (Also, Mr. Bloomberg, sir, I think we will all still take $1.53 if you’re offering).

Corona virus presentation by the Dutch CDC, also some thoughts on the audience for these sorts of presentations

Anne Pier Salverda writes: I’ve attached a corona virus presentation by the RIVM, the Dutch equivalent of the CDC. This briefing is happening as I send this email. The presentation is comprehensive and comprehensible, and it hits all the marks in the data visualization and communication department. Are you aware of any comparable presentations by […]

“Older Americans are more worried about coronavirus — unless they’re Republican”

Philip Greengard points us to the above-titled news article by Philip Bump. The article was just fine, a reminder of modern-day political polarization. The only thing that bothered me were the graphs. I redrew them above. Here were the original versions: I see a few problems with these graphs. First, the information is duplicated because […]

An article in a statistics or medical journal, “Using Simulations to Convince People of the Importance of Random Variation When Interpreting Statistics.”

Andy Stein writes: On one of my projects, I had a plot like the one above of drug concentration vs response, where we divided the patients into 4 groups. I look at the data below and think “wow, these are some wide confidence intervals and random looking data, let’s not spend too much time more […]

Graphs of school shootings in the U.S.

Bert Gunter writes: This link is to an online CNN “analysis” of school shootings in the U.S. I think it is a complete mess (you may disagree, of course). The report in question is by Christina Walker and Sam Petulla. Gunter lists two problems: 1. Graph labeled “Race Plays A Factor in When School Shootings […]

“Bullshitters. Who Are They and What Do We Know about Their Lives?”

Hannes Margraf writes: I write to make you aware of a paper with the delightful title “Bullshitters. Who Are They and What Do We Know about Their Lives?” [by John Jerrim, Phil Parker, and Nikki Shure]. The authors examine “teenagers’ propensity to claim expertise in three mathematics constructs that do not really exist” and “find […]

My talk on visualization and data science this Sunday 9am

Uncovering Principles of Statistical Visualization Visualizations are central to good statistical workflow, but it has been difficult to establish general principles governing their use. We will try to back out some principles of visualization by considering examples of effective and ineffective uses of graphics in our own applied research. We consider connections between three goals […]

Rachel Tanur Memorial Prize for Visual Sociology

Judith Tanur writes: The Rachel Tanur Memorial Prize for Visual Sociology recognizes students in the social sciences who incorporate visual analysis in their work. The contest is open worldwide to undergraduate and graduate students (majoring in any social science). It is named for Rachel Dorothy Tanur (1958–2002), an urban planner and lawyer who cared deeply […]

Hey, look! The R graph gallery is back.

We’ve recommended the R graph gallery before, but then it got taken down. But now it’s back! I wouldn’t use it on its own as a teaching tool, in that it has a lot of graphs that I would not recommend (see here), but it’s a great resource, so thanks so much to Yan Holtz […]