“Principles of posterior visualization”

Posted on January 1, 2019 9:00 AM by Andrew

What better way to start the new year than with a discussion of statistical graphics.

Mikhail Shubin has this great post from a few years ago on Bayesian visualization. He lists the following principles:

Principle 1: Uncertainty should be visualized

Principle 2: Visualization of variability ≠ Visualization of uncertainty

Principle 3: Equal probability = Equal ink

Principle 4: Do not overemphasize the point estimate

Principle 5: Certain estimates should be emphasized over uncertain

And this caution:

These principles (as any visualization principles) are contextual, and should be used (or not used) with the goals of this visualization in mind.

And this is not just empty talk. Shubin demonstrates all these points with clear graphs.

Interesting how this complements our methods for visualization in Bayesian workflow.

12 thoughts on ““Principles of posterior visualization””

Apu on January 1, 2019 at 11:23 AM said:

I’m puzzled by the statement “Boxplot is a perfect tool for showing a variability in the data, but it should not be used for visualizing the posterior distribution.” Doesn’t the provided example show that a boxplot diagram is bad for *both*? What makes it “perfect” for showing variability (if, for example, the data being shown had something like a gamma distribution)?

- Mikhail Shubin on January 1, 2019 at 7:17 PM said:
  
  My argument was like this:
  
  Imagine you measure some value several times with a noisy measurement tool. The measurements look like samples from a normal (or gamma) distribution. It makes sense to believe that the measurements in the middle of the distribution are closer to the true value, while the measurements at the margins are more likely to be errors. Therefore, when visualizing these measurements, it makes sense to emphasize the interval in the middle of the distribution.
  
  Now imagine you used MCMC to estimate the same value. When plotting MCMC samples, there is no reason to emphasize one sample over another, as they all carry equal weight by definition!
  
  Would a boxplot be appropriate for visualizing variability in a population? Well, I don't know. I guess it depends on whether you think the average values are more important than the marginal ones.
  
  So yeah, I made a mistake in calling the boxplot “a perfect tool for showing variability in the data”. I should have written “boxplots may be appropriate sometimes”.
  
- Andrew on January 1, 2019 at 8:03 PM said:
  
  Just to add to this discussion: I hate boxplots!
  
  - Keith O’Rourke on January 2, 2019 at 8:47 AM said:
    
    I hate them too!
    
    (OK, they worked well for Tukey when he was trying to do statistical analyses by hand on plane trips – but that need has passed.)
    
  - Donald Szlosek on January 2, 2019 at 9:28 PM said:
    
    Justin Matejka and George Fitzmaurice made a wonderful illustration of how much variability boxplots can hide, which is a great educational piece.
    
    https://www.autodeskresearch.com/publications/samestats
    
    - Mikhail on January 3, 2019 at 7:13 AM said:
      
      My problem with boxplots is not that they hide details (all plots based on summary statistics do this), but that they misrepresent the data by making the probability distribution look much narrower than it is.
    - Mikhail Shubin on January 3, 2019 at 7:37 AM said:
      
      https://www.autodeskresearch.com/publications/samestats is a wonderful article explaining why visualizing full distributions is important for exploratory research. But I think there is a difference between exploratory and communicative visualization
      (see https://ctg2pi.wordpress.com/2014/09/18/single-axiom-of-visualization/).
      
      Imagine I visualized my posterior distributions and found no dinosaurs. In fact, the vast majority of the posterior distributions I have ever obtained look like boring Gaussians. When communicating my results to someone else, could I go straight to the point and show only my summary statistics? Yeah, I think I can. But even in this case a boxplot would be misleading.
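Mikhail makes two arguments about boxplots in this thread. First, the box deliberately emphasizes the middle of a noisy-measurement distribution. Second, that emphasis makes the distribution look narrower than it is.

A few lines of Python put rough numbers on both. This is a minimal sketch under stated assumptions:

- the data are exactly standard normal;
- the whiskers reach Tukey's usual 1.5 × IQR fences.

Real whiskers stop at the most extreme observation inside the fences, so this is an idealized case.

```python
from statistics import NormalDist

z = NormalDist()  # assume the data (or posterior) are standard normal
q1, q3 = z.inv_cdf(0.25), z.inv_cdf(0.75)
iqr = q3 - q1
fence = q3 + 1.5 * iqr  # Tukey's upper fence; the lower one is -fence by symmetry

box_mass = z.cdf(q3) - z.cdf(q1)            # mass drawn as the heavy box
fence_mass = z.cdf(fence) - z.cdf(-fence)   # mass within the whisker fences

print(f"box spans ±{q3:.3f} sd and holds {box_mass:.0%} of the mass")
print(f"fences at ±{fence:.3f} sd hold {fence_mass:.1%} of the mass")
```

Under these assumptions, the visually dominant box covers only about ±0.67 standard deviations. Nearly half of the probability (about 49%) sits in the thin whisker region, which extends out to roughly ±2.7 sd.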
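Mikhail's MCMC point is that every posterior draw carries the same weight, 1/N. A density-normalized histogram is one simple way to honour “Equal probability = Equal ink”. Each bin's area equals the fraction of draws in it, so the total ink is 1 however the bins are chosen.

A minimal Python sketch follows. The Gaussian "draws" stand in for real MCMC output, and the function name and 20-bin default are arbitrary choices for illustration.

```python
import random

def equal_ink_histogram(draws, n_bins=20):
    """Density histogram: each bin's area is the fraction of draws in it."""
    lo, hi = min(draws), max(draws)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in draws:
        # clamp the maximum draw into the last bin
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    n = len(draws)
    # every draw contributes the same 1/n of probability mass (and of ink)
    heights = [c / (n * width) for c in counts]
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, heights

random.seed(2)
# stand-in for posterior draws from an MCMC run (an assumption)
draws = [random.gauss(0.0, 1.0) for _ in range(4000)]
edges, heights = equal_ink_histogram(draws)
total_ink = sum(h * (edges[k + 1] - edges[k]) for k, h in enumerate(heights))
print(round(total_ink, 6))  # → 1.0
```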
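The Matejka and Fitzmaurice example and Mikhail's "no dinosaurs" remark both rest on the fact that a basic boxplot encodes only five numbers. A hand-built toy pair shows how little that constrains the data. This pair is not taken from their paper, which generates its datasets differently.

An evenly spread sample and a clumpy sample with a hole in the middle get identical five-number summaries. The quartile convention used here (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; other conventions can shift the quartiles slightly.

```python
import statistics

def five_number_summary(xs):
    """Min, Q1, median, Q3, max: everything a basic boxplot encodes."""
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return (min(xs), q1, med, q3, max(xs))

def count_inside_box(xs, q1, q3):
    """Number of observations strictly between the quartiles."""
    return sum(1 for x in xs if q1 < x < q3)

# hypothetical toy data: 17 evenly spread values vs. 17 values in clumps
spread = list(range(17))                               # 0, 1, ..., 16
clumped = [0] * 4 + [4] * 4 + [8] + [12] * 4 + [16] * 4

summary_spread = five_number_summary(spread)
summary_clumped = five_number_summary(clumped)
print(summary_spread == summary_clumped)  # → True

# ...but the box hides very different data
_, q1, _, q3, _ = summary_spread
print(count_inside_box(spread, q1, q3), count_inside_box(clumped, q1, q3))  # → 7 1
```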
Mikhail Shubin on January 1, 2019 at 6:35 PM said:

Wow, nice to see someone still reads it now… or at least someone was reading it half a year ago =)

I have one criticism of my own post from 2015. Principle 3, “Equal probability = Equal ink”, does not work well for visualizing heavy-tailed distributions. Sometimes the relevant probability mass is spread across a very large interval, so if you follow the principle and spread the ink accordingly, the ink becomes invisible.
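The heavy-tail caveat is easy to quantify. The sketch below makes two illustrative assumptions, neither taken from Mikhail's post:

- the standard Cauchy distribution serves as the heavy-tailed case;
- the tail region of the plot is |x| between 3 and 100.

It asks how much probability lies in that region, and how tall that probability would be drawn if its ink were spread evenly across that stretch of axis.

```python
import math
from statistics import NormalDist

def cauchy_cdf(x):
    """CDF of the standard Cauchy distribution, a textbook heavy-tailed case."""
    return 0.5 + math.atan(x) / math.pi

normal = NormalDist()   # standard normal, for comparison
lo, hi = 3.0, 100.0     # hypothetical tail region of the plot, |x| in [3, 100]

# probability in both tails between |x| = lo and |x| = hi
cauchy_tail = 2 * (cauchy_cdf(hi) - cauchy_cdf(lo))
normal_tail = 2 * (normal.cdf(hi) - normal.cdf(lo))

# equal ink: that mass is spread evenly over 2 * (hi - lo) units of axis
cauchy_avg_height = cauchy_tail / (2 * (hi - lo))
cauchy_peak = 1 / math.pi  # standard Cauchy density at x = 0

print(f"tail mass: Cauchy {cauchy_tail:.3f}, normal {normal_tail:.4f}")
print(f"Cauchy average tail height: {cauchy_avg_height / cauchy_peak:.2%} of its peak")
```

Roughly a fifth of the Cauchy's mass lies in that region, versus about 0.3% for a standard normal. Yet with equal ink its average height is only about 0.3% of the peak density, effectively invisible at ordinary figure sizes.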

Yeah, maybe I should revive the blog…

- Kaiser on January 2, 2019 at 11:43 AM said:
  
  Good post. I actually like Principle 3 as is. I think when values are spread out over a wide range, we are more uncertain about the values, so it makes sense that the ink is less visible. For that reason, I prefer panel B to panel C under your point 5.
  
  Under point 5, is the y-axis on a probability scale? Why does the histogram at the very bottom shrink drastically from panel A to panel B?
  
  - Mikhail on January 3, 2019 at 7:44 AM said:
    
    Even wide uncertain distributions have some summary statistics (mean, credible interval) which could be important and relevant for the reader, so it would be useful to make them visible in the figure.
    
    As for the bottom histogram… eh, I don't remember.
    
  - Mikhail Shubin on January 7, 2019 at 4:40 AM said:
    
    Expanded answer: it all depends on what message you want to send with your figure.
    
    If it is “Some estimates are very uncertain”, then panel B would work best.
    If it is “These are our estimates, but some are very uncertain”, then panel C would work best.
    If it is “These are our estimates; uncertainty does not matter” (maybe because the variances of these different estimates are not directly comparable), then panel A would work best.
    If it is “These are our mean estimates”, then maybe you should use a table.
    
    If you are doing exploratory data visualization and have no message to send… in that case, I would prefer panel C as a default choice.
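Mikhail notes in this thread that even wide, uncertain posteriors have summaries worth showing, such as the mean and a credible interval. Here is a minimal sketch of computing both. It assumes equally weighted draws, such as MCMC output, and a central (equal-tailed) 95% interval. The simulated draws are hypothetical.

```python
import random
import statistics

def posterior_summary(draws):
    """Posterior mean and central 95% credible interval from equally weighted draws."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.fmean(draws), (cuts[0], cuts[-1])

random.seed(3)
# hypothetical wide posterior (normal, sd 5) standing in for real MCMC draws
draws = [random.gauss(2.0, 5.0) for _ in range(10_000)]
mean, (lower, upper) = posterior_summary(draws)
print(f"mean {mean:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")
```

These two numbers could then be drawn prominently, for example as a point and a bold interval line, on top of a faint equal-ink density. That keeps them visible in the figure, as Mikhail suggests.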
    
Nat on January 7, 2019 at 2:26 PM said:

An alternative to the standard boxplot is the Tufte boxplot, which I think is a nice visual even if it does not satisfy “Equal probability = Equal ink”.
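For readers unfamiliar with the Tufte boxplot, here is a minimal sketch of the elements in one common variant (details differ between implementations):

- whisker lines running from the extremes to the quartiles;
- white space where the box would be;
- a single dot for the median.

The helper name and the returned dictionary layout are hypothetical. The function only computes what to draw, not the drawing itself.

```python
import statistics

def tufte_boxplot_elements(xs):
    """What to draw for one common Tufte-style minimal boxplot (hypothetical helper).

    Whisker lines run from the extremes to the quartiles, the interquartile
    range is left as white space, and the median is a single dot.
    """
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {
        "segments": [(min(xs), q1), (q3, max(xs))],  # the two whisker lines
        "gap": (q1, q3),                             # white space instead of a box
        "point": med,                                # median dot
    }

elements = tufte_boxplot_elements([1, 2, 3, 4, 5, 6, 7, 8, 9])  # hypothetical data
print(elements)
```

This design inverts the boxplot's emphasis: the middle half of the data gets no ink at all. That is one way to see why it still does not satisfy “Equal probability = Equal ink”.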

In the linked blog post by Mikhail Shubin, figure 4B is very similar to a violin plot. These plots are easy to make in R.
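A violin plot is essentially a kernel density estimate mirrored about a center line. In R, ggplot2's `geom_violin()` is one way to draw it. The plain-Python sketch below computes the outline without a plotting library.

It makes three assumptions; plotting packages offer other choices for each:

- a Gaussian kernel;
- the normal-reference bandwidth 1.06 · sd · n^(-1/5);
- scaling so the widest point equals a fixed half-width.

```python
import math
import random
import statistics

def gaussian_kde(xs):
    """Gaussian kernel density estimate, returned as a function of x.

    Bandwidth uses the normal-reference rule 1.06 * sd * n**(-1/5) (an assumption).
    """
    n = len(xs)
    h = 1.06 * statistics.stdev(xs) * n ** (-1 / 5)
    norm = 1 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs)

    return density

def violin_outline(xs, n_points=50, half_width=0.4):
    """Left and right edges of one violin: the KDE mirrored about x = 0."""
    kde = gaussian_kde(xs)
    lo, hi = min(xs), max(xs)  # trim the violin to the data range
    grid = [lo + (hi - lo) * k / (n_points - 1) for k in range(n_points)]
    dens = [kde(y) for y in grid]
    scale = half_width / max(dens)  # widest point of the violin = half_width
    right = [(d * scale, y) for d, y in zip(dens, grid)]
    left = [(-d * scale, y) for d, y in zip(dens, grid)]
    return left, right

random.seed(5)
draws = [random.gauss(0.0, 1.0) for _ in range(300)]  # hypothetical posterior draws
left, right = violin_outline(draws)
print(max(w for w, _ in right))  # widest point, approximately half_width
```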

In the linked blog post by Mikhail Shubin, figures 5A to 5C are ridgeline plots (formerly known as joyplots). These are also easy to make in R if you like that sort of thing.
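A ridgeline plot stacks one density curve per group on evenly spaced baselines, usually with a little overlap. In R, the ggridges package's `geom_density_ridges()` is one common way to draw them. The Python sketch below computes the curves only. The bandwidth rule, spacing, overlap factor and simulated groups are illustrative assumptions.

All ridges share one vertical scale, so each encloses roughly the same area. As a result, wide, uncertain groups come out flatter, in line with “Equal probability = Equal ink”.

```python
import math
import random
import statistics

def kde_on_grid(xs, grid):
    """Gaussian KDE (normal-reference bandwidth, an assumption) evaluated on a grid."""
    h = 1.06 * statistics.stdev(xs) * len(xs) ** (-1 / 5)
    c = 1 / (len(xs) * h * math.sqrt(2 * math.pi))
    return [c * sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs) for g in grid]

def ridgelines(groups, grid, spacing=1.0, overlap=1.5):
    """One (x, y) polyline per group, each drawn on its own baseline.

    All densities share one vertical scale, chosen so the tallest ridge
    rises overlap * spacing above its baseline and overlaps the next one.
    """
    densities = [kde_on_grid(xs, grid) for xs in groups]
    scale = overlap * spacing / max(max(d) for d in densities)
    lines = []
    for i, dens in enumerate(densities):
        baseline = i * spacing
        lines.append([(g, baseline + d * scale) for g, d in zip(grid, dens)])
    return lines

random.seed(4)
# hypothetical posterior draws for three parameters with different spreads
params = [(0.0, 1.0), (2.0, 0.5), (1.0, 2.0)]
groups = [[random.gauss(mu, sd) for _ in range(500)] for mu, sd in params]
grid = [-6 + 12 * k / 199 for k in range(200)]
lines = ridgelines(groups, grid)
print(len(lines), len(lines[0]))  # → 3 200
```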


Statistical Modeling, Causal Inference, and Social Science
