Following up on our recent discussion of visually-weighted displays of uncertainty in regression curves, Lucas Leeman sent in the following two graphs:

First, the basic spaghetti-style plot showing inferential uncertainty in the E(y|x) curve:

Then, a version using even lighter intensities for the lines that go further from the center, to further de-emphasize the edges:

P.S. More (including code!) here.

So you literally shaded the lines based on their distance from the mean? Did you shade each part of each line, or the entire line?

Are the “error lines” some kind of confidence interval?

I would guess these are drawn from samples from the joint posterior. Each sample of parameters is used to draw a set of three regression lines. Do that a bunch of times and you get the cloud of predictions.

That is the approach explained in Gelman and Hill, using the sim function. I think it’s in the arm package. If I remember right, all it does is use the MLE and var-cov matrix to define a multivariate normal posterior, then samples parameters from it using mvrnorm (from the MASS package).

If you fit the model with BUGS/JAGS/STAN, then you have the samples already, and you just compute predictions for each set of samples and draw them over one another.
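A minimal sketch of the sim/mvrnorm recipe described above: approximate the posterior of the coefficients with a multivariate normal centered at the point estimate, draw many coefficient vectors, and overplot one fitted curve per draw. The quadratic data, the model, and the alpha value are all invented for illustration, not taken from the original graphs.

```r
library(MASS)  # for mvrnorm

# Invented data and model, just to have something to fit
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 + 0.5 * x - 0.03 * x^2 + rnorm(length(x), sd = 0.5)
fit <- lm(y ~ x + I(x^2))

# Draw coefficient vectors from N(beta_hat, Vcov(beta_hat))
n_draws <- 1000
beta_draws <- mvrnorm(n_draws, mu = coef(fit), Sigma = vcov(fit))

# One predicted curve per draw: an n_draws x length(x) matrix
X <- cbind(1, x, x^2)
curves <- beta_draws %*% t(X)

# Overplot the curves as very light lines to get the spaghetti cloud
plot(x, y, pch = 16, col = "gray60")
for (i in 1:n_draws) lines(x, curves[i, ], col = rgb(0, 0, 0, alpha = 0.02))
```

With BUGS/JAGS/Stan output you would skip the mvrnorm step and use the posterior draws directly in place of beta_draws.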

If this is a Bayesian analysis and the curves are drawn from the posterior density on the model parameters, the alpha at each point should be proportional to the conditional posterior density of Y given X. So the credible intervals of Y given X are implied by the figure (so long as you can do a reasonable job integrating intensities by eyeball).

This is assuming the lightness is achieved with an alpha < 1 and not just a light color choice. In my opinion, using the alpha channel would be preferable, since the built-up intensity then encodes the conditional pdf. Otherwise, with enough curves and a finite resolution, you lose the density information and just have a block of color around the curve again.

As @M says below, to obtain the originally desired perceptual effect, you would need to get rid of the center curve, at which point you’re basically just looking at the conditional pdfs implied by the model. Another way of making his point is that if the intensities represent the conditional pdfs, then including a dark curve in the middle makes the entropy implied by the intensities the opposite of what it should be (low where Y is uncertain and high where Y is certain, whereas it should be the other way around). The downside of removing the curve in the middle is that you’re left to eyeball a representative curve, but then again maybe the variation in the model predictions should be emphasized over a summarizing curve anyway.

Yes, that’s why I suggested the white line possibility in the (accidental) “anonymous” post.

I’d like to see the line, but have it be minimally obtrusive.

If continuous shades don’t work, perhaps ±1 SD, 2 SD, and 3 SD bands might be tried.
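One way to sketch the discrete-band suggestion: draw nested ±1, 2, 3 SD polygons with the same light fill, so overlap makes the inner bands darker, plus the minimally obtrusive white center line mentioned above. Everything here (data, model, colors) is an invented example, not the original code.

```r
library(MASS)  # for mvrnorm

# Invented data and model for illustration
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 + 0.5 * x - 0.03 * x^2 + rnorm(length(x), sd = 0.5)
fit <- lm(y ~ x + I(x^2))
beta_draws <- mvrnorm(1000, coef(fit), vcov(fit))
curves <- beta_draws %*% t(cbind(1, x, x^2))

m <- colMeans(curves)      # pointwise mean of the simulated curves
s <- apply(curves, 2, sd)  # pointwise SD

plot(x, y, type = "n")
for (k in 3:1)  # widest band first so the narrower bands stack on top
  polygon(c(x, rev(x)), c(m + k * s, rev(m - k * s)),
          col = rgb(0, 0, 1, alpha = 0.15), border = NA)
lines(x, m, col = "white", lwd = 2)  # unobtrusive white center line
```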

Wow, that looks pretty neat. Could the (presumably R) code also be published?

I like this kind of graphics but:

a) most statistical journals would rather have b&w or, if you’re lucky, shaded gray lines;

b) some referees prefer just CI limits.

Any suggestions on what to do in these situations?

I second that. It would be fantastic if you could share the R code. The graphs look very neat and the concept is promising.

Yes! Much better, especially the second graph. Now we’re talking.

After all, while a Gaussian cannot be assumed, if a vertical slice approximately represents a Gaussian PDF, the intensity should in some sense match that: if one were drawing error bars, one would like bars that got lighter and then faded out. The second chart certainly approximates that. I’ve long disliked error bars, since visually they *look* like a uniform distribution.

If one had software that did error bars that way (hence had continuous shading per pixel, not a simulation thereof by lines), it would be tempting to experiment with drawing the line in white, and maybe try ±1 SD lines in narrower white.

(Akin to Tufte’s first book, pp.127-128, where he uses white for gridlines.)

I still think, visually, that the strong lines can be misleading.

As an example of combining multiples, see IPCC diagram mentioned here.

Quantum-mechanics folks and others have long experimented with interesting visualizations: for pictures and movies, Google “quantum mechanics visualization.”

A nice movie is this one, showing the likelihood of electron states, dynamically.

Sigh, the old Rutherford model of the atom was a lot simpler and easier to display. It’s a lot easier to think of electrons as little moons flying around a planet, in definite places.

Do you happen to have a direct link for the “IPCC diagram mentioned here”? (The link in the linked comment doesn’t go anywhere.)

Sorry, here is that link: IPCC AR4 WG I, 6.6.1.1, “What Do Reconstructions Based on Palaeoclimatic Proxies Show?”

See the last graph on that page, which looks fuzzy; this is the expanded version.

Aha — thanks!

Another request for the R code! These are pretty. (The second is prettier but I’m not sure it conveys the information as well as the first.)

I am not quite sure these plots actually achieve the goal. In the visually weighted regression plots that have been posted previously, one’s attention really got drawn to the center of the regression curve (i.e., the part of the curve with the least estimation uncertainty). In these ‘spaghetti-style’ plots, the light intensities at the edges actually draw attention to the point estimate of the E[Y|X] curve itself. In other words, the center of the curve, and not the edges, seems ‘blurry’ to me. Or am I missing something?

The second plot looks nice, but that is how I would expect the spaghetti-style plot to look when an appropriate number of sample paths is plotted with suitable transparencies, assuming the sample paths tend to pile up at the pointwise means or medians.

If some entire paths are made more transparent based on some notion of being an outlier (integrated squared deviation from the pointwise mean path…), then that may emphasize concentration around the mean path more than a basic spaghetti plot.

If the weighting/extra transparency is applied pointwise by making extreme quantiles of the pointwise distribution more transparent or by increasing transparency as a function of distance from the mean path, then it may give a pretty picture, but I would have to fight the urge to interpret it as if it were a basic spaghetti plot with a lot of concentration around the mean path.

Nice! I don’t think there is a rule about how to shade it, all the options are worth experimenting with. I wonder about shading each line according to its Mahalanobis distance from the point estimates in parameter space? The biggest hurdle for everyday analysts will be that the covariance matrix of parameters is not output by most standard estimation software. In fact, non-parametric bootstrap is probably worth doing even if it’s a really simple regression model because that way you do the inference and get your lines ready to go.
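A rough sketch of the Mahalanobis idea: give each simulated line a transparency that decreases with its Mahalanobis distance from the point estimate in parameter space. The data, model, and the distance-to-alpha mapping below are all ad-hoc assumptions for illustration; a bootstrap sample of coefficients could stand in for the mvrnorm draws.

```r
library(MASS)  # for mvrnorm

# Invented data and model for illustration
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 + 0.5 * x - 0.03 * x^2 + rnorm(length(x), sd = 0.5)
fit <- lm(y ~ x + I(x^2))
beta_draws <- mvrnorm(1000, coef(fit), vcov(fit))
curves <- beta_draws %*% t(cbind(1, x, x^2))

# Squared Mahalanobis distance of each draw from the point estimate
d2 <- mahalanobis(beta_draws, center = coef(fit), cov = vcov(fit))

# Ad-hoc mapping from distance to alpha: the closest lines are darkest
alpha <- 0.25 * exp(-d2 / (2 * median(d2)))

plot(x, y, pch = 16, col = "gray60")
for (i in order(d2, decreasing = TRUE))  # far lines first, near lines on top
  lines(x, curves[i, ], col = rgb(0, 0, 0, alpha[i]))
```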

I have been using graduated shading in Excel to represent a proxy for probability density in confidence intervals for some time. Details can be found at

http://www.lho.org.uk/ViewResource.aspx?id=16015

Marc: thanks! That was just the sort of thing I was looking for.

I especially like the examples on pp. 12-13, such as the grey one with the white line.

Related to this, Sun, Y. and Genton, G., “Functional Boxplots,” 2011 (http://www.tandfonline.com/doi/abs/10.1198/jcgs.2011.09224) is definitely worth reading.

If you use alpha < 1 (partly transparent plots), how can a graph be exported from R so that it can be imported into MS Word as vector graphics? (.wmf, the Windows Metafile format, seems not to support transparency.)

(Sorry for being off topic but this problem appeared exactly when I tried using graphs with uncertainty bands plotted transparently.)

There is no need to use differential shading of lines. This employs an arbitrary evaluation by the presenter, instead of letting the reader interpret the data.

In a previous post, you wrote:

“Usually when we do this we just let the lines overwrite, but suppose that instead we make each of the 1000 lines really light gray but then increase the darkness when two or more lines overlap. Then you’ll get a graph where the curve is automatically darker where the uncertainty distribution is more concentrated and lighter where the distribution is more vague.”

This approach is objective, but neither of the two graphs above follows it. Instead of increasing the transparency of lines that “go further from the center” (the arbitrary part), reduce the lightness (e.g., value, tone) and increase the transparency of all lines uniformly. This will provide a similar effect without introducing subjectivity. I am a newbie to R, but I know you can do this in photo-editing software.
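For what it’s worth, the uniform-transparency version of the quoted approach is a one-liner in R: every line gets the same very light, partly transparent color, so darkness builds up only where lines overlap, with no per-line judgment call. The setup below is the same invented example as in the earlier sketches.

```r
library(MASS)  # for mvrnorm

# Invented data and model for illustration
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- 2 + 0.5 * x - 0.03 * x^2 + rnorm(length(x), sd = 0.5)
fit <- lm(y ~ x + I(x^2))
beta_draws <- mvrnorm(1000, coef(fit), vcov(fit))
curves <- beta_draws %*% t(cbind(1, x, x^2))

# One uniform color and alpha for all 1000 lines
line_col <- adjustcolor("gray20", alpha.f = 0.02)

plot(x, y, pch = 16, col = "gray60")
for (i in seq_len(nrow(curves))) lines(x, curves[i, ], col = line_col)
```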

[…] Andrew Gelman: graphs showing uncertainty in a fitted curve. […]

[…] our discussion of visual displays of regression uncertainty, I asked Solomon Hsiang and Lucas Leeman to send me […]

Late to the party, but here is one extremely straightforward approach to this type of graph, done in R: http://is-r.tumblr.com/post/32193893263/simple-visually-weighted-regression-plots