Lorraine Denby and Colin Mallows write:

It is usual to choose to make the bins in a histogram all have the same width. One could also choose to make them all have the same area. These two options have complementary strengths and weaknesses: the equal-width histogram oversmooths in regions of high density and is poor at identifying sharp peaks; the equal-area histogram oversmooths in regions of low density and so does not identify outliers. We describe a compromise approach which avoids both of these defects. We argue that relying on asymptotics of the Integrated Mean Square Error leads to inappropriate recommendations.
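The trade-off described in the abstract is easy to see in code. Here is a small sketch (in Python rather than the authors' R; the simulated data and the bin count are made up for illustration): equal-width edges come from evenly spacing the data range, while equal-area edges are just sample quantiles.

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed sample: a sharp peak near zero plus a long right tail.
data = np.concatenate([rng.normal(0.0, 0.1, 900), rng.exponential(2.0, 100)])

k = 10  # number of bins (arbitrary, for illustration)

# Equal-width bins: uniform spacing over the data range.
width_edges = np.linspace(data.min(), data.max(), k + 1)

# Equal-area bins: edges at sample quantiles, so each bin holds
# roughly the same number of points.
area_edges = np.quantile(data, np.linspace(0.0, 1.0, k + 1))

counts_w, _ = np.histogram(data, bins=width_edges)
counts_a, _ = np.histogram(data, bins=area_edges)
```

With this sample, nearly all of the equal-width counts pile into the one bin holding the sharp peak (oversmoothing it), while the equal-area counts are flat by construction and the widest bins stretch across the sparse tail.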

I’m so glad they wrote this article (it appeared recently in the Journal of Computational and Graphical Statistics)! I’ve thought for a long time that (a) histogram bars are typically too wide (for example, as set by default in software packages such as S and R), and (b) that the underlying problem was that people think of the goal of the histogram as to closely approximate the density function.

A key benefit of a histogram is that, as a plot of raw data, it contains the seeds of its own error assessment. Or, to put it another way, the jaggedness of a slightly undersmoothed histogram performs a useful service by visually indicating sampling variability. That’s why, if you look at the histograms in my books and published articles, I just about always use lots of bins. I also almost never like those kernel density estimates that people sometimes use to display one-dimensional distributions. I’d rather see the histogram and know where the data are.
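The "seeds of its own error assessment" point can be made concrete: a bin with count n has sampling noise on the order of sqrt(n) (Poisson-style), so the jaggedness of a finely binned histogram is itself a rough visual error bar. A minimal numpy sketch (the sample and bin count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=2000)

# Many narrow bins: the bar-to-bar jitter shows sampling variability.
counts, edges = np.histogram(data, bins=50)

# A rough standard error for each bar is just sqrt(count).
approx_se = np.sqrt(counts)
```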

Denby and Mallows go far beyond my vague thoughts by considering histograms with varying widths and coming up with a particular algorithm. I'd like to try out their method on my own problems. Is there an R package out there?

Link's busted. Should be http://pubs.research.avayalabs.com/pdfs/ALR-2007-…

I really like kernel density estimates, but like you, I prefer to use a slightly under-smoothed representation, so I almost always reduce the bandwidth from the default to include a little more wiggle, which helps to understand the actual data points.
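Reducing the bandwidth below the default is a one-liner in most KDE implementations. Here is a self-contained Python sketch (a hand-rolled Gaussian KDE; Silverman's rule stands in for whatever default your software uses, and the halving factor is just an example):

```python
import numpy as np

def gaussian_kde(data, xs, h):
    """Evaluate a Gaussian kernel density estimate with bandwidth h at xs."""
    z = (xs[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(2)
data = np.concatenate([rng.normal(-2, 0.3, 300), rng.normal(2, 0.3, 300)])

# Silverman's rule-of-thumb bandwidth, then a deliberately smaller one.
h_default = 1.06 * data.std() * len(data) ** (-1 / 5)
xs = np.linspace(-4, 4, 401)
smooth = gaussian_kde(data, xs, h_default)        # default smoothing
wiggly = gaussian_kde(data, xs, 0.5 * h_default)  # under-smoothed: more wiggle
```

The under-smoothed curve has sharper peaks and more local wiggle, which hints at where the individual data points actually sit.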

I've often wondered why there are no adaptive-bandwidth KDE methods in R. Perhaps someone has a mixture-of-Gaussians function?
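For what it's worth, one classical adaptive-bandwidth scheme (Abramson's square-root law) is short enough to hand-roll. This is a numpy sketch rather than an existing R function; `adaptive_kde` and `h0` are my own names:

```python
import numpy as np

def adaptive_kde(data, xs, h0):
    """Abramson-style adaptive KDE: bandwidths shrink where a pilot
    density estimate is high and widen in sparse regions."""
    n = len(data)
    # Pilot: fixed-bandwidth Gaussian KDE evaluated at the data points.
    z = (data[:, None] - data[None, :]) / h0
    pilot = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h0 * np.sqrt(2 * np.pi))
    # Local bandwidths h_i = h0 * sqrt(g / pilot_i), g = geometric mean of pilot.
    g = np.exp(np.mean(np.log(pilot)))
    h = h0 * np.sqrt(g / pilot)
    u = (xs[:, None] - data[None, :]) / h[None, :]
    return (np.exp(-0.5 * u ** 2) / h).sum(axis=1) / (n * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
data = rng.normal(size=400)
h0 = 1.06 * data.std() * len(data) ** (-1 / 5)  # pilot bandwidth (Silverman)
xs = np.linspace(-10, 10, 801)
dens = adaptive_kde(data, xs, h0)
```

(A mixture-of-Gaussians fit via EM is the other natural route and is available in R through packages such as mclust.)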

See here for code: http://gist.github.com/217245

Neat. The article's supplement has the R code in a tar file, in case it hasn't been packaged. There's a dhist in ggplot2, but it's not identical.

The report version

http://pubs.research.avayalabs.com/pdfs/ALR-2007-…

refers to an R function they have.

Thanks. I'll try it out.

This link is not working either :(

Interesting. I have recently been thinking about histograms too. Define a histogram as a measurable function from the domain into some interval 1..n, and look for an information-preserving optimum. I have a sketch of a method that starts with a kernel density estimate and uses the relative-entropy information projection to find a best fit onto n points.

This is a bit elaborate (!) as a method for producing pictures; however, if you really need an optimal discretization of data (which can be important in, e.g., data mining or compression algorithms), then it might be worth doing. I haven't really dotted the i's or crossed the t's; it was just something I was thinking about recently before falling asleep.

I guess if you could see an equal-area histogram from above, it would look like a box-and-whisker plot?

(while an equal-width histogram seen from above would give no information, except the choice of widths)

From the point of view that histograms are condensed yet informative representations of data, I think it may be reasonable to try to use the minimum description length (MDL) principle to choose the intervals.

This has actually been done in a paper by Kontkanen and Myllymäki.
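For illustration only, here is a toy penalized-likelihood version of that idea in Python. It uses a simple BIC-style penalty, not the normalized maximum likelihood (NML) code length that Kontkanen and Myllymäki actually compute, so treat it as a stand-in for the real MDL criterion, not their method:

```python
import numpy as np

def penalized_bins(data, max_bins=50):
    """Pick an equal-width bin count by maximizing the histogram
    log-likelihood minus a BIC-style complexity penalty.
    (A toy stand-in for a true MDL/NML criterion.)"""
    n = len(data)
    lo, hi = data.min(), data.max()
    best_m, best_score = 1, -np.inf
    for m in range(1, max_bins + 1):
        counts, _ = np.histogram(data, bins=m)
        w = (hi - lo) / m                      # bin width
        nz = counts[counts > 0]
        loglik = np.sum(nz * np.log(nz / (n * w)))
        score = loglik - 0.5 * (m - 1) * np.log(n)
        if score > best_score:
            best_m, best_score = m, score
    return best_m

rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(-3, 0.5, 500), rng.normal(3, 0.5, 500)])
m = penalized_bins(data)  # picks more than one bin for clearly bimodal data
```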

Would this be a problem where parsimony is ok?