Archive of posts filed under the Statistical graphics category.

“The most mysterious star in the galaxy”

Charles Margossian writes: The reading for tomorrow’s class reminded me of a project I worked on as an undergraduate. It was the planet hunter initiative. The project shows light-curves to participants and asks them to find transit signals (i.e. evidence of a transiting planet). The idea was to rely on human pattern recognition capabilities to […]

What if that regression-discontinuity paper had only reported local linear model results, and with no graph?

We had an interesting discussion the other day regarding a regression discontinuity disaster. In my post I shone a light on this fitted model: Most of the commenters seemed to understand the concern with these graphs, that the upward slopes in the curves directly contribute to the estimated negative value at the discontinuity leading to […]

“Data is Personal” and the maturing of the literature on statistical graphics

Traditionally there have been five ways to write about statistical graphics: 1. Exhortations to look at your data, make graphs, do visualizations and not just blindly follow statistical procedures. 2. Criticisms and suggested improvements for graphs, both general (pie-charts! double y-axes! colors! labels!) and specific. 3. Instruction and examples of how to make effective graphs […]

What pieces do chess grandmasters move, and when?

The above image, from T. J. Mahr, is a cleaned-up version of this graph: which in turn is a slight improvement on a graph posted by Dan Goldstein (with R code!) which came from Ashton Anderson. The original looks like this: This is just fine, but I had a few changes to make. I thought […]

Going beyond the rainbow color scheme for statistical graphics

Yesterday in our discussion of easy ways to improve your graphs, a commenter wrote: I recently read and enjoyed several articles about alternatives to the rainbow color palette. I particularly like the sections where they show how each color scheme looks under different forms of color-blindness and/or in black and white. Here’s a couple of […]

What are some common but easily avoidable graphical mistakes?

John Kastellec writes: I was thinking about writing a short paper aimed at getting political scientists to not make some common but easily avoidable graphical mistakes. I’ve come up with the following list of such mistakes. I was just wondering if any others immediately came to mind? – Label lines directly – Make labels big […]

Do regression structures affect research capital? The case of pronoun drop. (also an opportunity to quote Bertrand Russell: This is one of those views which are so absurd that only very learned men could possibly adopt them.)

A linguist pointed me with incredulity to this article by Horst Feldmann, “Do Linguistic Structures Affect Human Capital? The Case of Pronoun Drop,” which begins: This paper empirically studies the human capital effects of grammatical rules that permit speakers to drop a personal pronoun when used as a subject of a sentence. By de‐emphasizing the […]

What’s the upshot?

Yair points us to this page, The Upshot, Five Years In, by the New York Times data journalism team, listing their “favorite, most-read or most distinct work since 2014.” And some of these are based on our research: There Are More White Voters Than People Think. That’s Good News for Trump. (Story by Nate Cohn. […]

Ballot order update

Darren Grant writes: Thanks for bringing my work on ballot order effects to the attention of a wider audience via your recent blog post. The final paper, slightly modified from the version you posted, was published last year in Public Choice. Like you, I am not wedded to traditional hypothesis testing, but think it is […]

“The Long-Run Effects of America’s First Paid Maternity Leave Policy”: I need that trail of breadcrumbs.

Tyler Cowen links to a research article by Brenden Timpe, “The Long-Run Effects of America’s First Paid Maternity Leave Policy,” that begins as follows: This paper provides the first evidence of the effect of a U.S. paid maternity leave policy on the long-run outcomes of children. I exploit variation in access to paid leave that […]

David Weakliem on the U.S. electoral college

The sociologist and public opinion researcher has a series of excellent posts here, here, and here on the electoral college. Here’s the start: The Electoral College has been in the news recently. I [Weakliem] am going to write a post about public opinion on the Electoral College vs. popular vote, but I was diverted into […]

Maybe it’s time to let the old ways die; or We broke R-hat so now we have to fix it.

“Otto eye-balled the diva lying comatose amongst the reeds, and he suddenly felt the fire of inspiration flood his soul. He ran back to his workshop where he futzed and futzed and futzed.” –Bette Midler Andrew was annoyed. Well, annoyed is probably too strong a word. Maybe a better way to start is with The […]

R fixed its default histogram bin width!

I remember hist() in R as having horrible defaults, with the histogram bars way too wide. (See this discussion: A key benefit of a histogram is that, as a plot of raw data, it contains the seeds of its own error assessment. Or, to put it another way, the jaggedness of a slightly undersmoothed histogram […]

“Principles of posterior visualization”

What better way to start the new year than with a discussion of statistical graphics. Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles: Principle 1: Uncertainty should be visualized Principle 2: Visualization of variability ≠ Visualization of uncertainty Principle 3: Equal probability = Equal […]

“Check yourself before you wreck yourself: Assessing discrete choice models through predictive simulations”

Timothy Brathwaite sends along this wonderfully-titled article (also here, and here’s the replication code), which begins: Typically, discrete choice modelers develop ever-more advanced models and estimation methods. Compared to the impressive progress in model development and estimation, model-checking techniques have lagged behind. Often, choice modelers use only crude methods to assess how well an estimated […]

Exploring model fit by looking at a histogram of a posterior simulation draw of a set of parameters in a hierarchical model

Opher Donchin writes in with a question: We’ve been finding it useful in the lab recently to look at the histogram of samples from the parameter combined across all subjects. We think, but we’re not sure, that this reflects the distribution of that parameter when marginalized across subjects and can be a useful visualization. It […]

Graphs and tables, tables and graphs

Jesse Wolfhagen writes: I was surprised to see a reference to you in a Quartz opinion piece entitled “Stop making charts when a table is better”. While the piece itself makes the case that there are many kinds of charts that are simply restatements of tabular data, I was surprised that you came up as […]

Perhaps you could try a big scatterplot with one dot per dataset?

Joe Nadeau writes: We are studying variation in both means and variances in metabolic conditions. We have access to nearly 200 datasets that involve a range of metabolic traits and vary in sample size, mean effects, and variance. Some traits differ in mean but not variance, others in variance but not mean, still others in […]

“Fudged statistics on the Iraq War death toll are still circulating today”

Mike Spagat shares this story entitled, “Fudged statistics on the Iraq War death toll are still circulating today,” which discusses problems with a paper published in a scientific journal in 2006, and errors that a reporter inadvertently included in a recent news article. Spagat writes: The Lancet could argue that if [Washington Post reporter Philip] […]

How to graph a function of 4 variables using a grid

This came up in response to a student’s question. I wrote that, in general, you can plot a function y(x) on a simple graph. You can plot y(x,x2) by plotting y vs x and then having several lines showing different values of x2 (for example, x2=0, x2=0.5, x2=1, x2=1.5, x2=2, etc). You can plot y(x,x2,x3,x4) […]
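A rough sketch of the idea, in Python. The excerpt is cut off before it describes the four-variable case, so the layout here is an assumption based on the post's title: a grid of small plots, one panel per (x3, x4) pair, each panel showing y against x with one line per value of x2. The function `example_function`, the helper names, and the chosen x3 and x4 values are all hypothetical, picked only for illustration. The plotting helper assumes matplotlib is installed.

```python
from itertools import product


def example_function(x, x2, x3, x4):
    # Hypothetical function of four variables, chosen only for illustration.
    return x3 * x + x2 * x ** 2 - x4


def build_panels(f, xs, x2_values, x3_values, x4_values):
    """Return {(x3, x4): {x2: [y at each x]}}, i.e. one panel per (x3, x4) pair."""
    panels = {}
    for x3, x4 in product(x3_values, x4_values):
        # Within a panel, each x2 value gets its own curve of y against x.
        panels[(x3, x4)] = {
            x2: [f(x, x2, x3, x4) for x in xs] for x2 in x2_values
        }
    return panels


def plot_panels(panels, xs, x3_values, x4_values):
    # Assumes matplotlib is available; imported here so the rest runs without it.
    import matplotlib.pyplot as plt

    # Rows index x3, columns index x4; shared axes make panels comparable.
    fig, axes = plt.subplots(
        len(x3_values), len(x4_values), sharex=True, sharey=True, squeeze=False
    )
    for i, x3 in enumerate(x3_values):
        for j, x4 in enumerate(x4_values):
            ax = axes[i][j]
            for x2, ys in panels[(x3, x4)].items():
                ax.plot(xs, ys, label=f"x2={x2}")
            ax.set_title(f"x3={x3}, x4={x4}")
    axes[0][0].legend()
    return fig


xs = [i / 10 for i in range(21)]   # x from 0 to 2
x2_values = [0, 0.5, 1, 1.5, 2]    # the x2 values listed in the post
x3_values = [0, 1, 2]              # assumed grid rows
x4_values = [0, 1]                 # assumed grid columns
panels = build_panels(example_function, xs, x2_values, x3_values, x4_values)
```

Calling `plot_panels(panels, xs, x3_values, x4_values)` would draw the 3-by-2 grid. Sharing the axes across panels is a design choice that makes it easier to compare curves from one (x3, x4) setting to the next.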