## “Principles of posterior visualization”

What better way to start the new year than with a discussion of statistical graphics?

Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles:

Principle 1: Uncertainty should be visualized

Principle 2: Visualization of variability ≠ Visualization of uncertainty

Principle 3: Equal probability = Equal ink

Principle 4: Do not overemphasize the point estimate

Principle 5: Certain estimates should be emphasized over uncertain

And this caution:

These principles (as any visualization principles) are contextual, and should be used (or not used) with the goals of this visualization in mind.

And this is not just empty talk. Shubin demonstrates all these points with clear graphs.

Interesting how this complements our methods for visualization in Bayesian workflow.

1. Apu says:

I’m puzzled by the statement “Boxplot is a perfect tool for showing variability in the data, but it should not be used for visualizing the posterior distribution.” Doesn’t the provided example show that a boxplot is bad for *both*? What makes it “perfect” for showing variability (if, for example, the data being shown had something like a gamma distribution)?

• My argument was like this:

Imagine you measure some value several times with a noisy measurement tool. The measurements look like samples from a normal (or gamma) distribution. It makes sense to believe that the measurements in the middle of the distribution are closer to the true value, while the measurements at the margins are errors. Therefore, when visualizing these measurements, it makes sense to emphasize the interval in the middle of the distribution.

Now imagine you used MCMC to estimate the same value. When plotting MCMC samples there is no reason to emphasize one sample over another, as they are all equally possible by definition!

Would a boxplot be appropriate when visualizing variability in a population? Well, I don’t know. I guess it depends on whether you think average values are more important than the marginal ones.

So yeah, I made a mistake by calling the boxplot “a perfect tool for showing variability in the data”. I’d better write “boxplots may be appropriate sometimes”.
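The contrast above can be made concrete with a small sketch. This is only an illustration, not code from Shubin's post: it assumes the posterior is available as a list of equally weighted draws (here faked with gamma-distributed random numbers), cuts them into bands of equal probability, and gives every band the same total ink. A boxplot, by contrast, spends most of its ink on the central 50%.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Stand-in for MCMC output (an assumption): 10,000 draws from a skewed "posterior".
draws = [random.gammavariate(2.0, 1.0) for _ in range(10_000)]

# Cut the draws into 10 bands that each hold about 10% of the probability.
n_bands = 10
cuts = statistics.quantiles(draws, n=n_bands)  # 9 interior cut points
edges = [min(draws)] + cuts + [max(draws)]

# Equal probability = equal ink: every band gets the same total ink,
# so ink per unit length is inversely proportional to the band's width.
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
narrowest = min(widths)
ink_density = [narrowest / w for w in widths]  # 1.0 for the narrowest band

for lo, hi, d in zip(edges, edges[1:], ink_density):
    print(f"[{lo:6.2f}, {hi:6.2f}]  relative ink density = {d:.3f}")
```

The `ink_density` values could be used as opacity for each band in whatever plotting tool is at hand; the band count of 10 is an arbitrary choice.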

• Andrew says:

Just to add to this discussion: I hate boxplots!

2. Wow, nice to see someone still reads it now… or at least someone was reading it half a year ago =)

One point of criticism I have of my post from 2015: Principle 3, “Equal probability = Equal ink”, does not work well for visualizing heavy-tailed distributions. Sometimes you have relevant probability mass spread across a very large interval, so if you follow the principle and spread the ink correspondingly, the ink becomes invisible.
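A rough way to see how quickly the ink thins out, sketched under assumptions that are not in the original post: simulated standard normal and standard Cauchy draws stand in for a light-tailed and a heavy-tailed posterior, and `band_width_ratio` is a hypothetical helper that compares the widest and narrowest equal-probability bands. If every band gets the same ink, this ratio is also how much fainter the widest band is than the narrowest.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def cauchy_draw():
    """One standard Cauchy draw via inverse-CDF sampling."""
    return math.tan(math.pi * (random.random() - 0.5))


def band_width_ratio(draws, n_bands=20):
    """Ratio of the widest to the narrowest equal-probability band."""
    cuts = statistics.quantiles(draws, n=n_bands)
    edges = [min(draws)] + cuts + [max(draws)]
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    return max(widths) / min(widths)


normal_draws = [random.gauss(0.0, 1.0) for _ in range(10_000)]
cauchy_draws = [cauchy_draw() for _ in range(10_000)]

normal_ratio = band_width_ratio(normal_draws)
cauchy_ratio = band_width_ratio(cauchy_draws)

print(f"normal: widest band is {normal_ratio:,.0f}x the narrowest")
print(f"cauchy: widest band is {cauchy_ratio:,.0f}x the narrowest")
```

With the heavy-tailed draws the outermost bands stretch over a huge range, so their share of ink per unit length becomes tiny compared with the bands near the center.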

yeah, maybe I should revive the blog…

• Kaiser says:

Good post. I actually like Principle 3 as is. I think when values are spread out over a wide range, we are more uncertain about the values, so it makes sense that the ink is less visible. For that reason, I prefer panel B to panel C under your point 5.

Under point 5, is the y-axis on a probability scale? Why does the histogram at the very bottom shrink drastically from panel A to panel B?

• Mikhail says:

Even wide, uncertain distributions have some summary statistics (mean, credible interval) that could be important and relevant for the reader, so it would be useful to make them visible in the figure.
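As a minimal sketch of the summaries mentioned here, assuming the posterior is available as a list of draws: the function name `summarize`, the 95% level, and the lognormal stand-in data are all illustrative choices, and the interval is a simple equal-tailed one read off the sorted draws without interpolation.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Stand-in for a wide, uncertain posterior (an assumption for illustration).
draws = [random.lognormvariate(0.0, 1.5) for _ in range(20_000)]


def summarize(draws, level=0.95):
    """Mean, median, and equal-tailed credible interval from posterior draws."""
    ordered = sorted(draws)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    # Order statistics at the two tail positions (no interpolation).
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int((1.0 - tail) * (n - 1))]
    return {
        "mean": statistics.fmean(draws),
        "median": statistics.median(draws),
        "lower": lower,
        "upper": upper,
    }


summary = summarize(draws)
for name, value in summary.items():
    print(f"{name:>6}: {value:8.3f}")
```

These few numbers can be marked on top of a faint density so that the reader still finds them when the ink is spread thin.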

As for the bottom histogram… eh, I don’t remember.

• Expanded answer: It all depends on what message you want to send with your figure.

If it is “Some estimates are very uncertain” then panel B would work the best.
If it is “These are our estimates, but some are very uncertain” then panel C would work the best.
If it is “These are our estimates, uncertainty does not matter” (maybe because the variances of these different estimates are not directly comparable) then panel A would work the best.
If it is “These are our mean estimates” then maybe you should use a table.

If you are doing exploratory data visualization and have no message to send… In this case, I would prefer panel C as a default choice.

3. Nat says:

An alternative to the standard boxplot is the Tufte boxplot, which I think is a nice visual even if it does not satisfy “Equal probability = Equal ink”.

In the linked blog post by Mikhail Shubin, figure 4B is very similar to a violin plot. These plots are easy to make in R.

In the linked blog post by Mikhail Shubin, figure 5 A to C are ridgeline plots (formerly joyplots). These are also easy to make in R if you like that sort of thing.
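For readers who do not use R, here is a rough Python sketch of the curve that violin and ridgeline plots typically draw: a kernel density estimate of the draws. It is not the R approach mentioned above. The helper `gaussian_kde`, the simulated draws, and the normal-reference bandwidth rule are all assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Stand-in for posterior draws (an assumption for illustration).
draws = [random.gauss(0.0, 1.0) for _ in range(2_000)]


def gaussian_kde(draws, grid, bandwidth):
    """Gaussian kernel density estimate of the draws, evaluated on a grid."""
    norm = 1.0 / (len(draws) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in draws)
        for x in grid
    ]


# Normal-reference rule of thumb for the bandwidth (one common default).
bandwidth = 1.06 * statistics.stdev(draws) * len(draws) ** (-1 / 5)

# Evaluate on 200 points, padded by 3 bandwidths beyond the extreme draws.
lo = min(draws) - 3 * bandwidth
hi = max(draws) + 3 * bandwidth
step = (hi - lo) / 199
grid = [lo + i * step for i in range(200)]
density = gaussian_kde(draws, grid, bandwidth)

# A violin mirrors the density around a center line;
# a ridgeline plot stacks one such density per group with a vertical offset.
violin_left = [-d for d in density]
violin_right = density

# Sanity check: the density should integrate to roughly 1 (trapezoid rule).
area = sum((a + b) / 2 * step for a, b in zip(density, density[1:]))
print(f"bandwidth = {bandwidth:.3f}, area under curve = {area:.3f}")
```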