I was thinking more about axes that extend beyond the possible range of the data, and I realized that it’s not simply an issue of software defaults but something more important, and interesting, which is the way in which graphics objects are stored on the computer.
R (and its predecessor, S) is designed to be an environment for data analysis, and its graphics functions are focused on plotting data points. If you’re just plotting a bunch of points, with no other information, then it makes sense to extend the axes beyond the extremes of the data, so that all the points are visible. But then, if you want, you can specify limits to the graphing range (for example, in R, xlim=c(0,1), ylim=c(0,1)). The defaults for these limits are the range of the data.
What R doesn’t allow, though, are logical limits: the idea that the space of the underlying distribution is constrained. Some variables have no constraints, others are restricted to be nonnegative, others fall between 0 and 1, others are integers, and so forth. R (and, as far as I know, other graphics packages) just treats data as lists of numbers. You also see this problem with discrete variables; for example when R is making a histogram of a variable that takes on the values 1, 2, 3, 4, 5, it doesn’t know to set up the bins at the correct places, instead setting up bins from 0 to 1, 1 to 2, 2 to 3, etc., making it nearly impossible to read sometimes.
What I think would be better is for every data object to have a “type” attached: the type could be integer, nonnegative integer, positive integer, continuous, nonnegative continuous, binary, discrete with bounded range, discrete with specified labels, unordered discrete, continuous between 0 and 1, etc. If the type is not specified (i.e., NULL), it could default to unconstrained continuous (thus reproducing what’s in R already). Graphics functions could then be free to use the type; for example, if a variable is constrained, one of the plotting options (perhaps the default, perhaps not) would be to have the constraints specify the plotting range.
Lots of other benefits would flow from this, I think, and that’s why we’re doing this in “mi” and “autograph”. But the basic idea is not limited to any particular application; it’s a larger point that data are not just a bunch of numbers; they come with structure.