Statistics is the science of defaults.

One of the differences between statistics and other branches of engineering is that we have a special love for default procedures, perhaps because so many statistical problems are routine (or, at least, people would like them to be). We have standard estimates for all sorts of models, books of statistical tests, and default settings for everything. Recently I’ve been working on default weakly informative priors (which are *not* the same as the typically noninformative “reference priors” of the Bayesian literature). From a Bayesian point of view, the appropriate default procedure could be defined as that which is appropriate for the population of problems that one might be studying.

More generally, much of our job as statisticians is to come up with methods that will be used by others in routine practice. (Much of the rest of our job is to come up with methods for evaluating new and existing statistical methods, and methods for coming up with new statistical methods.)

I was recently reminded of the importance of defaults when reading this from sociologist Fabio Rojas on the presidential election:

My [Rojas’s] hypothesis is that the popular vote is only close because of extreme anti-Obama sentiment in the south. . . . My theory of the election is that Obama will slightly outperform the “fundamentals.” Normally, it’s really, really hard for the incumbent party to win the White House with nearly 8% unemployment. But I think non-Southern voters like Obama and don’t blame him that much for the slow recovery. There’s also Romney’s less than effective campaign (other than debate #1). That’s why he’s doing well outside the South. And in the South, there’s an unusually large drop in Obama support that’s hard to explain.

As a political scientist who’s worked on and popularized the idea of “the fundamentals,” I think Rojas’s attitude is just right. The fundamentals are indeed just a starting point. The idea is that, instead of taking a baseline of 50/50, or a baseline of a redo of the last election, or a baseline of some arbitrary historical comparison, or a baseline of a random walk, you take the baseline as some fundamentals-based forecast. And then you can go from there, as you do.

Here’s another way of putting it: There’s always a default. Choose your default, or your default will choose you. Fundamentals-based election forecasts are not perfect (in statistics jargon, their standard errors are not zero), but if you look carefully, you’ll see that people who don’t use these forecasts are using other default starting points, typically defaults that don’t make much sense from a theoretical or an empirical standpoint.

P.S. Hey: this is a new item for the lexicon!

I am reminded of a scatterplot you posted after the 2008 election, showing state-by-state results from 2008 and 2004 and noting that the Obama vote was roughly uniformly 4% higher than the Kerry vote (except for obvious differences such as HI and AZ).

I wonder if you will be able to post a similar result after this election.

http://statmodeling.stat.columbia.edu/2009/01/state-by-state/#comments

The uniformity of the 4% increase from 2004 to 2008, and the extent to which the states move roughly evenly at the margin with less variability over time (another argument in your earlier post), both support the importance of fundamentals.

Isn’t the idea of “choosing” your default a bit paradoxical? As I understand defaults, they’re essentially conventions for causal inference (e.g. the 5% threshold for p-values), for curve-fitting (e.g. linear dose-response curves for carcinogens), for weighting evidence (e.g. inverse variance weighting in meta-analysis), and so on.

Some are more arbitrary than others, some have a decent empirical or theoretical basis, and some are basically just there for ensuring consistency. But the point is that defaults are entrenched to a significant degree such that they’re hard to depart from in individual cases. They act to constrain the discretion of the analysts – they narrow the choices available.

I’m talking more of science-for-policy (e.g. risk assessment), so things are perhaps very different in less applied areas of statistics (in the sense that default practices will be less constraining).

Brian:

As a textbook writer, I choose the defaults. But practitioners get their choice too: they get to decide what textbook to choose!

Good points. As they say, the good thing about standards is that there are so many to choose from!

That said, if you’re conducting a risk or decision analysis for regulatory purposes, say within the EPA or FDA, then you don’t get to pick your inference rules or defaults. You have to play by their methodologies. Rule-bound analysis carries a lot of connotations that are attractive to regulatory agencies: it looks systematic, objective, consistent, and, in a way, scientific. So it’s not surprising that those agencies, and the federal courts, often take a dim view of any attempt to depart from the prescribed methodologies.