Watercolor regression

Solomon Hsiang writes:

Two small follow-ups based on the discussion (the second/bigger one is to address your comment about the 95% CI edges).

1. I realized that if we plot the confidence intervals as a solid color that fades (e.g., using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fades near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically.

2. You and your readers didn’t like that the edges of the filled CI were so sharp and arbitrary. But I didn’t like that the contrast between the spaghetti lines and the background had so much visual weight. So to meet in the middle, I smoothed the spaghetti plot to get a nonparametric estimate of the probability that the conditional mean is at a given value (see “visually_weighted_fixed_ink_smoothed_spaghetti” attached). To do this, after generating the spaghetti through bootstrapping, I estimate a kernel density of the spaghetti in the Y dimension for each value of X. I set the visual-weighting scheme so it still “preserves ink” along a vertical line-integral, so the distribution dims where it widens since the ink is being “stretched out”. To me, it kind of looks like a watercolor painting — maybe we should call it a “watercolor regression” or something like that.
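The steps in point 2 can be sketched in a few lines of Matlab. This is only an illustration of the idea, not the vwregress source: the quadratic polyfit standing in for the smoother, the grid sizes, the Silverman rule-of-thumb bandwidth, and the white center line borrowed from point 1 are all assumptions made here for the example.

```matlab
% Sketch only: a hypothetical re-creation of the idea, not the vwregress source.
x = randn(100,1);
y = 2*x + x.^2 + 4*randn(100,1);

% Step 1: bootstrap "spaghetti". A quadratic polyfit stands in for the smoother.
xg = linspace(min(x), max(x), 200);
B  = 500;
fits = zeros(B, numel(xg));
for b = 1:B
    idx = randi(numel(x), numel(x), 1);          % resample rows with replacement
    fits(b,:) = polyval(polyfit(x(idx), y(idx), 2), xg);
end

% Step 2: kernel density of the curves in Y, separately for each value of X.
yg = linspace(min(fits(:)), max(fits(:)), 300)'; % shared Y grid (column vector)
dy = yg(2) - yg(1);
D  = zeros(numel(yg), numel(xg));
for j = 1:numel(xg)
    h = 1.06 * std(fits(:,j)) * B^(-1/5);        % Silverman's rule (assumed choice)
    u = bsxfun(@minus, yg, fits(:,j)') / h;      % 300-by-B standardized distances
    D(:,j) = mean(exp(-0.5*u.^2), 2) / (h*sqrt(2*pi));
    D(:,j) = D(:,j) / (sum(D(:,j)) * dy);        % fixed ink: each column integrates to 1
end

% Step 3: map density to color, from white (no ink) to the base color (most ink).
base = [.5 0 0];
cmap = 1 - linspace(0,1,256)' * (1 - base);      % 256-by-3 ramp, white to base
imagesc(xg, yg, D); axis xy; colormap(cmap);
hold on; plot(xg, median(fits,1), 'w', 'LineWidth', 1.5);  % white center line
```

Because every column of D integrates to the same total, a column where the bootstrapped curves spread out has a lower peak density and is drawn paler, which is the "stretched ink" effect described above.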

The watercolor regression turned out to be more of a coding challenge than I expected, because the bandwidth for the kernel smoothing has to adjust to the width of the CI. And since several people seem to like R better than Matlab, I attached 2 figs to show them how I did this. Once you have the bootstrapped spaghetti plot (step1.jpg), I defined a new coordinate system that spanned the range of bootstrapped estimates for each value in X (step2.jpg). The kernel smoothing is then executed along the vertical columns of this new coordinate system.
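A rough sketch of that coordinate change, under assumptions of my own: the spaghetti here is synthetic (made-up curves whose spread widens away from the center), the rescaled coordinate t and the bandwidth value 0.05 are hypothetical, and none of it is taken from vwregress. The point is only that one bandwidth in the rescaled coordinate becomes a bandwidth in Y units that grows with the width of the bundle.

```matlab
% Hypothetical stand-in for bootstrapped spaghetti: B curves on n grid points,
% with a spread that widens away from the center (made-up numbers).
B = 500; n = 50;
xg = linspace(-2, 2, n);
spread = 0.2 + xg.^2;                                    % 1-by-n
fits = bsxfun(@plus, xg.^2, bsxfun(@times, spread, randn(B, n)));

ymin = min(fits, [], 1);                                 % lowest curve at each X
rngY = max(fits, [], 1) - ymin;                          % height of the bundle in Y units

% New coordinate: t = 0 at the lowest curve and t = 1 at the highest, for every X.
T = bsxfun(@rdivide, bsxfun(@minus, fits, ymin), rngY);

tg = linspace(0, 1, 100)';                               % shared vertical grid in t
h  = 0.05;                                               % one bandwidth in t units (assumed value)
Dt = zeros(numel(tg), n);
for j = 1:n
    u = bsxfun(@minus, tg, T(:,j)') / h;                 % 100-by-B standardized distances
    Dt(:,j) = mean(exp(-0.5*u.^2), 2) / (h*sqrt(2*pi));  % density along column j, in t units
end

hY = h * rngY;                                           % implied bandwidth in Y units
Dy = bsxfun(@rdivide, Dt, rngY);                         % change of variables back to Y units
```

Here hY is small where the curves are tightly bundled and large where they fan out, so the smoothing adapts to the local width without choosing a separate bandwidth for each column by hand.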

I’ve updated the code posted online to include this new option. This Matlab code will generate a similar plot using my vwregress function:

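% Simulated data: a quadratic signal plus Gaussian noise
x = randn(100,1);
e = randn(100,1);
y = 2*x+x.^2+4*e;

bins = 200;
color = [.5 0 0];   % RGB triplet (dark red)
resamples = 500;    % number of bootstrap resamples
bw = 0.8;

% 'SMOOTH' requests the new smoothed-spaghetti option described above
vwregress(x, y, bins, bw, resamples, color, 'SMOOTH');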

This has been a really helpful/fun process. Thanks to you and your readers for all the feedback. I don’t think I’ll ever plot a simple/solid regression line again :)

5 Comments

  1. John Mashey says:

    I’m slightly confused about this post. (trying again)

    ‘. I realized that if we plot the confidence intervals as a solid color that fades (eg. using the “fixed ink” scheme from before) we can make sure the regression line also has heightened visual weight where confidence is high by plotting the line white. This makes the contrast (and thus visual weight) between the regression line and the CI highest when the CI is narrow and dark. As the CI fade near the edges, so does the contrast with the regression line. This is a small adjustment, but I like it because it is so simple and it makes the graph much nicer. (see “visually_weighted_fill_reverse” attached). My posted code has been updated to do this automatically.’

    I like all that, IF that meant that the solid color fades vertically, and especially if the regression line were white …
    but I didn’t actually see an example of that in rummaging around Simon’s site. Was there an attached figure somewhere? Did I miss it?

    I still think the density should change along the vertical scale, as per Felix’s examples, especially the next to last. I still wish for a white or gray line.

    Ideally, the vertical density profile should convey the probability, and any plausible shading scheme from dense to white would do that. Ideally, we’d get a good scheme that would get used widely and replace the misleading CI bars that have been around so long. It might be a good idea to consult the right sorts of cognitive scientists about the most effective mappings from probability to density, although that’s fine-tuning.

  2. Andrew has posted the email without the figures, so it’s a bit confusing to read. I’ve posted the figures along with the original email here. I hope it helps.

    ps. I called it “watercolor regression” on the fly in the email, since that’s what it looked like to me. But seeing it in the post title makes me like the name and hope that it sticks.

  3. John Mashey says:

    Thanks, Solomon, I’ve commented over at your blog. Definitely worth looking at, i.e., 2) is the one I like the best so far.
    [Actual visuals are always better than descriptions of visuals by words :-)]