Archive of posts filed under the Statistical graphics category.

R fixed its default histogram bin width!

I remember hist() in R as having horrible defaults, with the histogram bars way too wide. (See this discussion: A key benefit of a histogram is that, as a plot of raw data, it contains the seeds of its own error assessment. Or, to put it another way, the jaggedness of a slightly undersmoothed histogram […]

“Principles of posterior visualization”

What better way to start the new year than with a discussion of statistical graphics. Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles: Principle 1: Uncertainty should be visualized. Principle 2: Visualization of variability ≠ Visualization of uncertainty. Principle 3: Equal probability = Equal […]

“Check yourself before you wreck yourself: Assessing discrete choice models through predictive simulations”

Timothy Brathwaite sends along this wonderfully-titled article (also here, and here’s the replication code), which begins: Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated […]

Exploring model fit by looking at a histogram of a posterior simulation draw of a set of parameters in a hierarchical model

Opher Donchin writes in with a question: We’ve been finding it useful in the lab recently to look at the histogram of samples from the parameter combined across all subjects. We think, but we’re not sure, that this reflects the distribution of that parameter when marginalized across subjects and can be a useful visualization. It […]
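
To make the two summaries in play here concrete, below is a minimal sketch in Python. It assumes the posterior output is stored as `draws[s][j]` (simulation draw `s` of subject `j`'s parameter); that layout, the simulated numbers, and all function names are hypothetical and only for illustration. `pooled_across_subjects` is the histogram input described in the question (all draws from all subjects combined), while `single_draw` is the one in the post title (one simulation draw of the whole set of subject-level parameters).

```python
import random

random.seed(1)
n_draws, n_subjects = 200, 8

# Hypothetical stand-in for real posterior output (NOT from any fitted model):
# draws[s][j] is simulation draw s of the parameter for subject j.
draws = [[random.gauss(j, 1.0) for j in range(n_subjects)] for _ in range(n_draws)]


def pooled_across_subjects(draws):
    """All draws for all subjects combined into one flat list."""
    return [value for row in draws for value in row]


def single_draw(draws, s):
    """One simulation draw of the full set of subject-level parameters."""
    return list(draws[s])


def bin_counts(values, edges):
    """Count values in half-open bins [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return counts


pooled = pooled_across_subjects(draws)   # n_draws * n_subjects values
one_draw = single_draw(draws, 0)         # n_subjects values
```

Either list can be passed to `bin_counts` (or to any plotting library's histogram function) to compare the two pictures side by side.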

Graphs and tables, tables and graphs

Jesse Wolfhagen writes: I was surprised to see a reference to you in a Quartz opinion piece entitled “Stop making charts when a table is better”. While the piece itself makes the case that there are many kinds of charts that are simply restatements of tabular data, I was surprised that you came up as […]

Perhaps you could try a big scatterplot with one dot per dataset?

Joe Nadeau writes: We are studying variation in both means and variances in metabolic conditions. We have access to nearly 200 datasets that involve a range of metabolic traits and vary in sample size, mean effects, and variance. Some traits differ in mean but not variance, others in variance but not mean, still others in […]

“Fudged statistics on the Iraq War death toll are still circulating today”

Mike Spagat shares this story entitled, “Fudged statistics on the Iraq War death toll are still circulating today,” which discusses problems with a paper published in a scientific journal in 2006, and errors that a reporter inadvertently included in a recent news article. Spagat writes: The Lancet could argue that if [Washington Post reporter Philip] […]

How to graph a function of 4 variables using a grid

This came up in response to a student’s question. I wrote that, in general, you can plot a function y(x) on a simple graph. You can plot y(x,x2) by plotting y vs x and then having several lines showing different values of x2 (for example, x2=0, x2=0.5, x2=1, x2=1.5, x2=2, etc). You can plot y(x,x2,x3,x4) […]
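
Here is a rough sketch in Python of the grid idea from the title. The excerpt does not say how the grid is laid out, so the layout here is an assumption: one small panel per (x3, x4) pair, with x3 indexing rows and x4 indexing columns, and within each panel one line per value of x2 plotted against x. The function `example_y` and the helper names are made up for illustration; `plot_grid` needs matplotlib, while the rest uses only the standard library.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Curve = List[float]
Panel = Dict[float, Curve]  # maps an x2 value to the y values along xs


def grid_of_curves(
    f: Callable[[float, float, float, float], float],
    xs: Sequence[float],
    x2_values: Sequence[float],
    x3_values: Sequence[float],
    x4_values: Sequence[float],
) -> Dict[Tuple[float, float], Panel]:
    """Evaluate f on the grid: one panel per (x3, x4), one curve per x2."""
    return {
        (x3, x4): {x2: [f(x, x2, x3, x4) for x in xs] for x2 in x2_values}
        for x3 in x3_values
        for x4 in x4_values
    }


def plot_grid(panels, xs, x3_values, x4_values):
    """Draw the panels as small multiples (assumed layout: rows = x3, columns = x4)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for drawing

    fig, axes = plt.subplots(
        len(x3_values), len(x4_values), sharex=True, sharey=True, squeeze=False
    )
    for i, x3 in enumerate(x3_values):
        for j, x4 in enumerate(x4_values):
            ax = axes[i][j]
            for x2, ys in panels[(x3, x4)].items():
                ax.plot(xs, ys, label=f"x2={x2}")
            ax.set_title(f"x3={x3}, x4={x4}", fontsize=8)
    axes[0][0].legend(fontsize=6)
    return fig


# Hypothetical function of four variables, chosen only for illustration.
def example_y(x, x2, x3, x4):
    return x3 * x + x2 + x4 * x * x


xs = [i / 10 for i in range(11)]   # x from 0 to 1
x2_values = [0, 0.5, 1, 1.5, 2]    # the x2 values mentioned in the text
x3_values = [0, 1]                 # assumed values, one row each
x4_values = [-1, 0, 1]             # assumed values, one column each
panels = grid_of_curves(example_y, xs, x2_values, x3_values, x4_values)
```

Calling `plot_grid(panels, xs, x3_values, x4_values)` then gives a 2-by-3 grid of panels on shared axes, so the lines can be compared across panels.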

Don’t get fooled by observational correlations

Gabriel Power writes: Here’s something a little different: clever classrooms, according to which physical characteristics of classrooms cause greater learning. And the effects are large! Moving from the worst to the best design implies a gain of 67% of one year’s worth of learning! Aside from the dubiously large effect size, it looks like the […]

Against Arianism 2: Arianism Grande

“There’s the part you’ve braced yourself against, and then there’s the other part” – The Mountain Goats My favourite genre of movie is Nicole Kidman in a questionable wig. (Part of the sub-genre founded by Sarah Paulson, who is the patron saint of obvious wigs.) And last night I was in the same room* as […]

Who spends how much, and on what?

Nathan Yau (link from Dan Hirschman) constructed the above excellent visualization of data from the Consumer Expenditure Survey. Lots of interesting things here. The one thing that surprises me is that people (or maybe it’s households) making more than $200,000 only spent an average of $160,000. I guess the difference is taxes, savings (but not […]

What’s gonna happen in the 2018 midterm elections?

Following up on yesterday’s post on party balancing, here’s a new article from Joe Bafumi, Bob Erikson, and Chris Wlezien giving their predictions for November: We forecast party control of the US House of Representatives after the 2018 midterm election. First, we model the expected national vote relying on available generic Congressional polls and the […]

Awesome MCMC animation site by Chi Feng! On Github!

Sean Talts and Bob Carpenter pointed us to this awesome MCMC animation site by Chi Feng. For instance, here’s NUTS on a banana-shaped density. This is indeed super-cool, and maybe there’s a way to connect these with Stan/ShinyStan/Bayesplot so as to automatically make movies of Stan model fits. This would be great, both to help […]

Should the points in this scatterplot be binned?

Someone writes: Care to comment on this paper’s Figure 4? I found it a bit misleading to do scatter plots after averaging over multiple individuals. Most scatter plots could be “improved” this way to make things look much cleaner than they are. People are already advertising the paper using this figure. The article, Genetic analysis […]

Opportunity for Comment!

(This is Dan) Last September, Jonah, Aki, Michael, Andrew and I wrote a paper on the role of visualization in the Bayesian workflow. This paper is going to be published as a discussion paper in the Journal of the Royal Statistical Society Series A and the associated read paper meeting (where we present the paper and […]

“Choose the data visualization that best serves your audience.”

Tian Zheng prepared the above slide which very clearly displays an important point about statistical communication. The maps are squished to be too narrow, and the scatterplot has too many numbers on the axes (better to have income in thousands and percentages in tens), also given the numbers it seems that the data must be […]

Awesome data visualization tool for brain research

When I was visiting the University of Washington the other day, Ariel Rokem showed me this cool data visualization and exploration tool produced by Jason Yeatman, Adam Richie-Halford, Josh Smith, and himself. The above image gives a sense of the dashboard but the real thing is much more impressive because it’s interactive. You can rotate […]

The current state of the Stan ecosystem in R

(This post is by Jonah) Last week I posted here about the release of version 2.0.0 of the loo R package, but there have been a few other recent releases and updates worth mentioning. At the end of the post I also include some general thoughts on R package development with Stan and the growing number of […]

Taking perspective on perspective taking

Gabor Simonovits writes: I thought you might be interested in this paper with Gabor Kezdi of U Michigan and Peter Kardos of Bloomfield College, about an online intervention reducing anti-Roma prejudice and far-right voting in Hungary through a role-playing game. The paper is similar to some existing social psychology studies on perspective taking but we […]

“The problem of infra-marginality in outcome tests for discrimination”

Camelia Simoiu, Sam Corbett-Davies, and Sharad Goel write: Outcome tests are a popular method for detecting bias in lending, hiring, and policing decisions. These tests operate by comparing the success rate of decisions across groups. For example, if loans made to minority applicants are observed to be repaid more often than loans made to whites, […]
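
The basic comparison behind an outcome test, as described in the excerpt, can be sketched in a few lines of Python. This is only an illustration of "comparing the success rate of decisions across groups": the records are toy data invented for the example, the group labels and the `success_rates` helper are hypothetical, and nothing here is taken from the paper's own method or code.

```python
from collections import defaultdict


def success_rates(records):
    """Return {group: share of decisions that turned out successful}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for group, succeeded in records:
        totals[group] += 1
        successes[group] += int(succeeded)
    return {group: successes[group] / totals[group] for group in totals}


# Toy records (invented): (group, was the granted loan repaid?).
toy_loans = [
    ("A", True), ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False), ("B", True),
]

rates = success_rates(toy_loans)
for group, rate in sorted(rates.items()):
    print(f"group {group}: repayment rate {rate:.2f}")
```

Note that the rates are computed only over decisions that were actually made (here, loans that were granted), which is what the outcome test compares.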