On a proposal to scale confidence intervals so that their overlap can be more easily interpreted

Greg Mayer writes:

Have you seen this paper by Frank Corotto, recently posted to a university repository?

It advocates a way of doing box plots using “comparative confidence intervals” based on Tukey’s HSD in lieu of traditional error bars. I would question whether the “Error Bar Overlap Myth” is really a myth (i.e. a widely shared and deeply rooted but imaginary way of understanding the world) or just a more or less occasional misunderstanding, but whatever its frequency, I thought you might be interested, given your longstanding aversion to box plots and your challenge to the world to find a use for them. (I, BTW, am rather fond of dox plots.)

My reply: Clever but I can’t imagine ever using this method or recommending it to others. The abstract connects the idea to Tukey, and indeed the method reminds me of some of Tukey’s bad ideas from the 1950s involving multiple comparisons. I think the problem here is in thinking of “statistical significance” as a goal in the first place!

I’m not saying it was a bad idea for this paper to be written. The concept could be worth thinking about, even if I would not recommend it as a method. Not every idea has to be useful. Interesting is important too.
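For readers who want to see the mechanics, here is a minimal sketch of how Tukey-style comparison intervals are usually built for a balanced one-way design (equal group size n, pooled mean squared error mse). This is an assumption about the general construction, not a reproduction of Corotto’s procedure, which may handle unequal sizes or other details differently; the helper names are hypothetical. The studentized-range critical value q_crit must be supplied, from a table or, in recent SciPy versions, from scipy.stats.studentized_range.ppf(1 - alpha, k, df). Each interval has half-width (q_crit / 2) * sqrt(mse / n), so two intervals fail to overlap exactly when Tukey’s HSD would flag the pair.

```python
import math

def comparison_intervals(means, mse, n, q_crit):
    """Hypothetical helper: Tukey-style comparison intervals for a balanced
    one-way design. Each interval is mean +/- (q_crit / 2) * sqrt(mse / n)."""
    half_width = 0.5 * q_crit * math.sqrt(mse / n)
    return [(m - half_width, m + half_width) for m in means]

def hsd_pairs(means, mse, n, q_crit):
    """Hypothetical helper: index pairs whose difference exceeds Tukey's HSD."""
    hsd = q_crit * math.sqrt(mse / n)
    k = len(means)
    return {(i, j) for i in range(k) for j in range(i + 1, k)
            if abs(means[i] - means[j]) > hsd}

def overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Made-up example: k = 3 groups, n = 5 per group, so error df = 12.
means = [10.0, 12.0, 15.0]
mse, n = 4.0, 5
q_crit = 3.77  # tabled studentized range for k = 3, df = 12, alpha = .05
intervals = comparison_intervals(means, mse, n, q_crit)
for i, j in [(0, 1), (0, 2), (1, 2)]:
    status = "overlap" if overlap(intervals[i], intervals[j]) else "separate"
    print(i, j, status)
print(hsd_pairs(means, mse, n, q_crit))
```

With these made-up numbers only the first and third intervals separate, matching the single pair flagged by the HSD. Computing a second set of intervals with a different alpha gives the paired display Corotto describes in his comment below.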

15 thoughts on “On a proposal to scale confidence intervals so that their overlap can be more easily interpreted”

  1. According to today’s blog, Greg Mayer wrote,

    “I, BTW, am rather fond of dox plots.”

    Is this a typo, or is there yet another field of endeavor that I am totally unaware of? Is it related to the plotting of “doxing”?

    “The meaning of DOX is to publicly identify or publish private information about (someone) especially as a form of punishment or revenge.”

    • A dox plot is a box plot combined with a symmetric dot plot (“d”ot plus b“ox”). I don’t know if Leland Wilkinson coined the term, but it was one of Systat’s graphing options. For small data sets, it gives a nice set of summary information (median, hinges, range: the box plot part) along with a view of the entire distribution (the dots). For large samples, the dots can become too numerous and obscure what’s going on.

    • On the subject of portmanteau words, back in the 1980s Ed Dudewicz of Syracuse University came up with “selestimation”: simultaneous selection (across several populations) and estimation of the largest quantile.

  2. Suggested revision to the title: “1001 ways to use p-values to beat an analysis into submission.”
    I think the referee was kind.

    And, although many on this blog will agree with the paper’s statement about incorrect interpretation of confidence intervals (“I am x% confident that….”) I stand by my belief that such a statement is not terrible. The correct interpretation of a confidence interval is a mouthful, and while accurate, basically leaves the audience perplexed about why the analysis was done at all.

  3. I don’t think it is a new idea but I haven’t read the full paper.

    When two 95% CIs do not overlap, we can conclude that the two means are different with about 99% confidence (p < .01; roughly .006 for independent means with equal standard errors).

    To make non-overlap correspond to 95% confidence (p ≈ .05), we can use CIs of roughly 83–85%.

    One could easily show both intervals using whiskers at values corresponding to 85% and 95%.
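The arithmetic behind these rules of thumb can be checked with a short sketch. It assumes two independent estimates with normal sampling distributions and known standard errors (equal by default); the function names are illustrative only. Under those assumptions, two 95% intervals that just touch correspond to p ≈ .006, and non-overlap matches a two-sided test at α = .05 when each interval is computed at about the 83.4% level.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def p_when_touching(level, se_ratio=1.0):
    """Two-sided p-value for the difference of two independent means whose
    `level` confidence intervals (normal theory, known SEs) just touch.
    se_ratio is se2 / se1."""
    z = Z.inv_cdf(0.5 + level / 2)           # interval half-width in SE units
    se1, se2 = 1.0, se_ratio
    gap = z * (se1 + se2)                    # distance between means when the intervals touch
    z_diff = gap / (se1**2 + se2**2) ** 0.5  # that gap in units of the SE of the difference
    return 2 * (1 - Z.cdf(z_diff))

def level_for_alpha(alpha=0.05, se_ratio=1.0):
    """CI level at which 'no overlap' matches a two-sided z-test at alpha."""
    se1, se2 = 1.0, se_ratio
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z = z_crit * (se1**2 + se2**2) ** 0.5 / (se1 + se2)
    return 2 * Z.cdf(z) - 1

print(round(p_when_touching(0.95), 4))  # 0.0056
print(round(level_for_alpha(0.05), 3))  # 0.834
```

With unequal standard errors (se_ratio far from 1) the required level climbs toward 95%, which is one reason the 85% figure is only approximate.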

  4. I’m not sure how many examples are needed to make a myth, but here’s one.

    In the Wisconsin deer trustee report, Kroll et al. write (page 50):

    “Figure 14 presents graphs used in the planning document. The graphs imply (using fitted exponential trend lines) an upward trend in infection rates, even for yearlings. Yet, the graphs also present 95% confidence limits for each year; and, in every case these limits overlap. From a statistical standpoint, this means there were no significant differences between years!”

    https://www.sco.wisc.edu/wp-content/uploads/2012/07/2012WisconsinDeerTrusteeReport.pdf
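The conclusion in that quote does not follow: overlapping 95% intervals are compatible with a significant difference. A minimal sketch with made-up numbers (not taken from the trustee report), assuming two independent estimates with normal sampling distributions and known standard errors; the helper name is hypothetical.

```python
from statistics import NormalDist

def overlap_and_p(m1, se1, m2, se2, level=0.95):
    """Hypothetical helper: do the two `level` CIs overlap, and what is the
    two-sided z-test p-value for m1 - m2? Assumes independent estimates with
    normal sampling distributions and known standard errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    overlaps = (m1 - z * se1 <= m2 + z * se2) and (m2 - z * se2 <= m1 + z * se1)
    z_diff = abs(m1 - m2) / (se1**2 + se2**2) ** 0.5
    p = 2 * (1 - NormalDist().cdf(z_diff))
    return overlaps, p

# Made-up infection rates for two years: 10% and 13%, each with SE of 1 point.
print(overlap_and_p(0.10, 0.01, 0.13, 0.01))  # overlaps is True, yet p is about 0.034
```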

  5. One dumb comment is that if you want to advertise something, you should provide it in some format other than a Word document.

    More substantively, I very much agree it is often useful to plot confidence intervals for differences compared with some reference level: even if you want to plot things on the original scale, the inference for comparisons is often more important. I’ve done this many times. For an example in a current paper see Figure 4A & B here: https://arxiv.org/abs/1810.03579.

    • To be clear, my point about advertising is prompted by “Others have published on these intervals (the mathematical basis goes back to John Tukey) but here I advertise comparative confidence intervals in the hope that more people use them.”
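Plotting comparisons against a reference level amounts to computing an interval for each difference rather than for each group. A minimal sketch, assuming independent estimates with normal sampling distributions and known standard errors; the helper name is hypothetical, and the figures in the linked arXiv paper may well be computed differently (for example, from a fitted model’s covariance matrix).

```python
from statistics import NormalDist

def diffs_from_reference(estimates, ses, ref=0, level=0.95):
    """Hypothetical helper: interval for each (estimate - reference estimate).
    Assumes independent estimates with normal sampling distributions and
    known standard errors; model-based contrasts would instead use the
    fitted covariance matrix."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    rows = []
    for i, (est, se) in enumerate(zip(estimates, ses)):
        if i == ref:
            continue  # the reference level has no contrast with itself
        d = est - estimates[ref]
        se_d = (se**2 + ses[ref]**2) ** 0.5  # SE of a difference of independent estimates
        rows.append((i, d, d - z * se_d, d + z * se_d))
    return rows

# Made-up estimates and standard errors for three levels; level 0 is the reference.
for row in diffs_from_reference([1.0, 2.0, 1.2], [0.5, 0.5, 0.4]):
    print(row)  # (index, difference, lower, upper)
```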

  6. I’ve used these kinds of bars frequently in the past in applied contexts where the client wants only to make quick visual comparisons. For general science it’s not a good idea. We called them comparison bars. I’m pretty sure our group has a published paper using them. I know I’ve seen people using half-LSD bars.

  7. Thank you, Dr. Gelman, for your courteous opinion on the Corotto paper. I’m very sensitive to criticism. With comparative confidence intervals, you don’t have to think about “statistical significance”. I advocate plotting pairs of CCIs, each with a different alpha. For this, box and whisker plots are kindest to the eye. With pairs of CCIs, you can sit back, take in the big picture, and make up your own mind about what to believe, provisionally. Consider figure 2 in the linked paper. Imagine instead a table of paired differences in which you have to think about what is subtracted from what to determine the direction. Thanks for your time. I never understood type S errors until I saw one of your posts. –Frank S. Corotto https://digitalcommons.gaacademy.org/gjs/vol81/iss2/11/
