Yesterday I spoke at the Princeton economics department. The title of my talk was:

“Unbiasedness”: You keep using that word. I do not think it means what you think it means.

The talk went all right—people seemed ok with what I was saying—but I didn’t see a lot of audience involvement. It was a bit like pushing at an open door. Everything I said sounded so reasonable, it didn’t seem clear where the interesting content was.

The talk went like this: I discussed various examples where people were using classical unbiased estimates. There was one example with a silly regression discontinuity analysis controlling for a cubic polynomial, where it’s least squares so it’s unbiased but the model makes no sense. And another example with a simple comparison in a randomized experiment, where selection bias (the statistical significance filter and various play-the-winner decisions) push the estimate higher so that the reported estimate is biased, even though the particular statistical procedure being reported is nominally unbiased.
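To make that second point concrete, here's a toy simulation (all numbers invented) of the statistical significance filter: each individual estimate is unbiased, but if results are only reported when they clear 2 standard errors, the reported estimates are biased sharply upward:

```python
import random

random.seed(1)
true_effect = 0.1   # hypothetical small true effect
se = 0.2            # hypothetical standard error of each study's estimate

# Each estimate is individually unbiased...
estimates = [random.gauss(true_effect, se) for _ in range(100_000)]
# ...but suppose only "significant" results (|estimate| > 2 SE) get reported
published = [e for e in estimates if abs(e) > 2 * se]

print(f"mean of all estimates:      {sum(estimates) / len(estimates):.3f}")
print(f"mean of reported estimates: {sum(published) / len(published):.3f}")
print(f"fraction reported:          {len(published) / len(estimates):.3f}")
```

With these made-up numbers the reported estimates average roughly four times the true effect, even though the estimator itself is nominally unbiased.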

My point was: Here are these methods that respected researchers (including economists) use, that get published in top journals, but which are clearly wrong, in the sense of giving estimates and uncertainty statements that we don’t believe.

Why are people using such methods, in one case using a clearly inappropriate model and in the other case avoiding clearly appropriate adjustments?

I think part of the problem is a prioritizing of “unbiasedness” and a misunderstanding of what this really means in the practical world of data analysis and publication. The idea is that unbiased estimates are seen as pure, and that it’s ok to use an analysis that’s evidently flawed, if it does not “bias” the estimate. So, in a regression discontinuity setting, it’s considered ok to control for a high-degree polynomial because this fits into the general idea that, if you control for unnecessary predictors in a regression, you’re fine: it adds no bias and all that might happen is that your standard error gets bigger. Now, ok, nobody wants a big standard error, but remember that the usual goal in applied work is “statistical significance.” So . . . as long as your estimate is more than 2 standard errors away from 0, you’re cool. The price you paid in terms of variance was, apparently, not too high.
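Here's a small simulation of the "it adds no bias, only variance" logic, using my own toy setup (not taken from any of the papers discussed): the world is truly linear with a discontinuity, so both the linear and the cubic specifications are unbiased for the jump, but the unnecessary cubic terms inflate the variance of the jump estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_jump = 200, 0.5   # made-up sample size and true discontinuity

def jump_estimate(degree):
    """Draw one dataset and estimate the jump, controlling for a polynomial."""
    x = rng.uniform(-1, 1, n)
    d = (x >= 0).astype(float)                     # treatment indicator
    y = 0.8 * x + true_jump * d + rng.normal(0, 1, n)
    X = np.column_stack([x**k for k in range(degree + 1)] + [d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]                                # coefficient on the jump

linear = [jump_estimate(1) for _ in range(2000)]
cubic = [jump_estimate(3) for _ in range(2000)]
print(f"mean jump estimate, linear control: {np.mean(linear):.3f}")
print(f"mean jump estimate, cubic control:  {np.mean(cubic):.3f}")
print(f"sd of jump estimate, linear control: {np.std(linear):.3f}")
print(f"sd of jump estimate, cubic control:  {np.std(cubic):.3f}")
```

Both specifications recover the true jump on average; the cubic buys nothing here and pays a substantially larger standard deviation, which in practice means bigger estimates survive the significance filter.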

In my talk, I then continued by briefly describing our Xbox analysis, as an example of how we can succeed by adjusting. Instead of clinging to a nominally unbiased procedure, we can reduce actual bias by modeling.

As I said above, the people in the audience (mostly economists and political scientists) pretty much agreed with everything I said, except that they disagreed with my claim that “minimizing bias is the traditional first goal of econometrics.”

Or, maybe they accepted that this was a *traditional* first goal but they said that econometrics has moved on. In particular, I was informed that econometricians are much more interested in interval estimation than point estimation, and their typical first goal now is coverage. In fact, I was told that I could pretty much keep my talk as it was and just replace “unbiasedness” with “coverage” everywhere and it would still work. Thus, various conventional approaches for obtaining 95% intervals are believed to be ok because they have 95% coverage. But, because of selection and omitted variables, these intervals *don’t* have that nominal coverage. And that’s good to know.

The other point that was made to me after the talk was that, yes, some of the work I criticized was by respected economists—but this work was not published in econ journals. One of the papers was published in the tabloid PPNAS, for example. And these Princeton people assured me that had the work I’d criticized been presented in their seminar, they would’ve seen the problems—the omitted variable bias in one example and the selection bias in the other.

The point I made which still holds, I think, is the critique of what is commonly viewed as inferential conservatism. I feel that a central stream in econometric thinking is to play it safe, to favor unbiasedness and robustness over efficiency. And my central point is that the choices that look like “playing it safe” (for example, using least squares with no shrinkage, or taking simple comparisons with no adjustments) are, in practice, only used when the resulting estimates are more than 2 standard errors away from 0—and this selection sets us up for lots of problems.

So, I agree that it’s misleading to think of unbiasedness as the first goal in modern econometrics, but it remains my impression that there’s a misguided tendency in econometrics to downplay methods that increase statistical efficiency.

A lot of this comes from statistics teaching. When one learns that regression is BLUE, the fact that other estimators might be preferable (if not U, or not L) is rarely emphasized, if it’s taught at all. And the terminology doesn’t help, particularly when your job is to explain statistical concepts to laymen. Who wants to admit that their methodology is biased? You can explain it, of course, but it takes a lot of precious time while impatient people are waiting for your conclusion, and they won’t really understand the difference anyway. There are several statistics terms whose English-language connotations are quite different from their statistical meanings: unbiased, consistent, and robust are my three favorites.

For mathematically more sophisticated audiences, I pull out some of the examples given in Romano and Siegel, Counterexamples in Probability and Statistics, particularly 7.15 in which one takes a sample of one (X) from a Poisson and wants to estimate exp(-3*lambda). The *only* unbiased estimate is T(X)=(-2)^X, which is always worse than the biased estimator max(0,T). This example doesn’t really *convince* anyone that unbiasedness isn’t required for *their* problem, but at least throws cold water on the idea that unbiasedness is *required*.
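For anyone who wants to check this numerically, here's a quick simulation (taking lambda = 1, my arbitrary choice) that verifies both claims: max(0,T) is never farther from the truth than T, and its mean squared error is substantially smaller:

```python
import math
import random

random.seed(0)
lam = 1.0
target = math.exp(-3 * lam)   # the quantity we want to estimate

def poisson(l):
    # inverse-CDF sampling; fine for small lambda
    u, k, s, p = random.random(), 0, math.exp(-l), math.exp(-l)
    while u > s:
        k += 1
        p *= l / k
        s += p
    return k

draws = [poisson(lam) for _ in range(200_000)]
T = [(-2.0) ** x for x in draws]      # the only unbiased estimator
T_plus = [max(0.0, t) for t in T]     # biased, but never farther from the truth

def mse(ests):
    return sum((e - target) ** 2 for e in ests) / len(ests)

print(f"target exp(-3*lam) = {target:.4f}")
print(f"mean of T          = {sum(T) / len(T):.4f}")
print(f"MSE of T           = {mse(T):.2f}")
print(f"MSE of max(0, T)   = {mse(T_plus):.2f}")
```

The "unbiased" T bounces among values like -2, 4, and -8 while trying to estimate a number between 0 and 1; truncating at zero introduces bias and roughly halves the mean squared error at this lambda.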

I’ll add “significant” to your list of favorites.

Forgot that one. Leaving off my all-time favorite would be significant, but it’s only a sample of one.

Andrew – thanks for this helpful follow-up here to the earlier, related, post where I objected to your premise about econometrics. I feel much more comfortable now. I have just one remaining quibble – your last sentence. Historically, and currently, econometricians have been very concerned about efficiency in estimation. For instance, the emphasis on “full system” estimators such as 3SLS and FIML (rather than “single equation” estimators such as 2SLS) for simultaneous systems. Asymptotic efficiency is very much a concern when applying IV/GMM estimators, etc. Other than that, I’m with your audience.

Econs always say that. They are still struggling with the very simple concept of statistical significance. When you present the problem, they look at you, bored, saying: “Everybody knows that. And nobody does that anymore.” Then you take the main papers in AER and, BAM, there it is: people thinking that getting statistical significance is the main goal of their paper, people rejecting models based on significance, people comparing a significant and an insignificant estimate when the difference may not itself be significant, people saying that study A contradicts study B because one was significant and the other wasn’t . . . and so on. So don’t believe them when they say that “had the work I’d criticized been presented in their seminar, they would’ve seen the problems.” Next time, take a couple of papers from AER. Want a famous one? Take the Acemoglu, Johnson and Robinson (2001) paper on institutions.

I don’t know about economics, but in beginning regression there is a lot of emphasis on BLUE and the assumptions related to BLUE. I think that is because the G-M assumptions are pretty easy to understand in English and also one can understand the basics pretty well visually or with simple examples. I think that especially when you have a semester it’s pretty easy to say we are just going to concentrate on understanding OLS really well. So if OLS is BLUE under these assumptions, that’s pretty darn elegant, right? In one nice neat package you’ve talked about the relationship between assumptions, characteristics of estimators, and how to run (and check the assumptions on) a pretty darn powerful and practical model. And you’ve also brought in the possibility of thinking about what you do when you don’t have independence, or you have nonlinearity or autocorrelation. Those open up the discussion of biased models, not to mention various non-BLUE approaches. Perfect work of a semester. (And the added bonus that other approaches with the right assumptions will yield the same results and you can build on the tests the students learned previously.)

That’s not necessarily a bad thing (better to understand it really well than to rush through), but the temptation is not to explain what the B, L, U, and E stand for, lest you reveal that students didn’t really absorb some of the vocabulary from their earlier classes, or even to discuss other possibilities. I actually always do a bit where I say “when we are talking about bias here, we’re not talking about racism,” and I actually think that the everyday US meaning of “bias” could be part of the reason that some people think the idea of giving up unbiasedness is beyond the pale.

Elin,

I agree that “I actually think that the everyday US meaning of ‘bias’ could be part of the reason that some people think the idea of giving up unbiasedness is beyond the pale” — I also say something like “bias in statistics doesn’t mean that you are personally biased,” along with the technical definition. But I usually found time in a regression course to spend some time talking about the tradeoff between bias and variance: showing how mean squared error is a measure that takes both bias and variance into account, and discussing Mallows’ Cp as one tool to take into account in model selection. I agree that there’s not enough time to do a more thorough discussion of various model selection techniques, but I think Mallows’ Cp is a good choice if you only have time to do just one, because it’s less likely than AIC, etc., to be taken as “a rule,” and because it lends itself to explaining the bias-variance tradeoff. (Admittedly, I made up my own notes on this because I didn’t think the textbook did a good enough job. In case you’re interested, they’re at http://www.ma.utexas.edu/users/mks/384Gfa08/selterms.pdf.)
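A few lines of simulation (with invented numbers) make the decomposition concrete: shrinking the sample mean toward zero by a factor c adds bias but cuts variance, and the simulated MSE of c * ybar matches bias^2 + variance term for term:

```python
import random
import statistics

random.seed(3)
theta, sigma, n = 1.0, 2.0, 10   # hypothetical mean, sd, and sample size

def mse_of_shrunk_mean(c, sims=20_000):
    """Monte Carlo MSE of the estimator c * (sample mean)."""
    errs = []
    for _ in range(sims):
        ybar = statistics.fmean(random.gauss(theta, sigma) for _ in range(n))
        errs.append((c * ybar - theta) ** 2)
    return statistics.fmean(errs)

for c in (1.0, 0.8, 0.6):
    # bias^2 + variance, in closed form
    closed_form = (1 - c) ** 2 * theta**2 + c**2 * sigma**2 / n
    print(f"c={c}: simulated MSE {mse_of_shrunk_mean(c):.3f}  "
          f"(bias^2 + variance = {closed_form:.3f})")
```

With these numbers the biased choice c = 0.8 has a lower MSE than the unbiased c = 1, which is exactly the tradeoff students rarely see worked out.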

However, I’m inclined to disagree with “but in beginning regression there is a lot of emphasis on BLUE.” Many beginning regression textbooks don’t emphasize BLUE; maybe you’re limiting your choices.

I wonder, did the idea of “unbiasedness” actually enter the minds of the guys who wrote the “silly regression discontinuity analysis controlling for a cubic polynomial” study?

Rahul:

I think it’s indirect. The applied economists follow what the econometricians tell them, and the econometricians are influenced by the idea of unbiasedness.

I think you are detecting a subliminal pattern.

> The applied economists follow what the econometricians tell them

Oh, if only…

I am sort of curious where you’re meeting all of these finite-sample econometricians, though. I can probably count on one hand the number of non-bayesian econometrics talks I’ve seen that weren’t entirely about asymptotic properties.

At least they’re consistent!

A few comments based on my experience with the publication process in economics:

i. It is known and accepted by econometricians that unbiasedness shouldn’t always be the goal.

ii. Unbiasedness is usually good enough to get a paper accepted.

iii. A lot of referees would never go along with publication if you didn’t have unbiasedness.

There is a prescient quote by Tukey which articulates some of these issues rather well (with added stars for emphasis):

“If we ask for a ‘regression’ formula to give optimum prediction, admitting both bias and variability as measurable, comparable and combinable evils, we are driven to answers which today are, at the very least, considered non-standard and may indeed even be considered heretical. In Anderson’s example classical statisticians would take either the low-variance conventional estimate of the linear regression, or the high-variance conventional estimate of the quadratic regression, ***each of which is unbiased in terms of its own model***. But any linear regression is surely biased with respect to a quadratic model. ***And if the real world be quadratic, then we shall not eliminate real bias by talking of linear regression.***” – Tukey
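Tukey's last sentence is easy to demonstrate: fit a nominally unbiased linear regression to a world that is actually quadratic (coefficients invented here) and the fitted line is systematically off, no matter how much data you average over:

```python
import numpy as np

rng = np.random.default_rng(7)

def draw_data(n=100):
    """A quadratic 'real world' with made-up coefficients."""
    x = rng.uniform(0, 2, n)
    y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 0.5, n)
    return x, y

# Average the fitted linear prediction at x = 0 over many repeated samples
preds = []
for _ in range(2000):
    x, y = draw_data()
    b = np.polyfit(x, y, 1)           # "unbiased in terms of its own model"
    preds.append(np.polyval(b, 0.0))

print(f"true value at x=0:         {1.0:.2f}")
print(f"average linear prediction: {np.mean(preds):.2f}")
```

The linear fit is unbiased for the best linear approximation, but that approximation is not the quadratic truth, so the averaged prediction at x = 0 sits well below the true value of 1: real bias, exactly as Tukey says.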

Jaynes has a good discussion of this topic in _Probability Theory: The Logic of Science_.

I’m not sure about established econometricians, but among almost all of the PhD Econ students I know (including some at Princeton) there definitely is a lack of understanding/interest in biased estimation techniques. None of them are specializing in econometrics, so maybe there’s some disconnect between what the econometricians understand and what’s being taught to non-specialists in grad school.

It just seems to me that the underlying problem is always model specification, particularly in a field that doesn’t have theories with well-defined differentiation of observed results (and that’s you, economics), and the fact that extremely different models can give excellent “fits.” It can happen in the physical sciences also: Spanos’ example of regressing planetary motion using both the Ptolemy and Kepler models through linear regression gives excellent R-squareds for both models. Typically any misspecified model is “biased,” and all models are misspecified (but some are useful!). I haven’t looked at econometrics in a number of years, but I recall all the hoopla over 2SLS and 3SLS and full information maximum likelihood when it was clear none of these SE models could be correct (note, however, that simultaneous equations, once reserved to biological and econometric work, have made a comeback because of Pearl). Anyway, my advice to Andrew is to not speak to econometricians but rather to work on analysis of model specification.

To get the message across that unbiasedness shouldn’t be the first goal, perhaps statistics could take something from the way machine learning talks about it — overfitting! Doesn’t sound so desirable now does it :)

I gave you a hard time about this before, but then yesterday I saw a paper by Colin Cameron & Douglas Miller (Economics, UC Davis) which opens with the sentence “In an empiricist’s day-to-day practice, most effort is spent on getting unbiased or consistent point estimates.” They don’t think it’s a good thing, but they agree that it’s what most economists do. See: http://cameron.econ.ucdavis.edu/research/Cameron_Miller_JHR_2015_February.pdf

The idea that econometricians are obsessed with unbiasedness seems to me to be at least two decades out of date. Even when teaching undergraduates, the emphasis on finite-sample properties has shifted dramatically (at least for those of us who haven’t been teaching the same material for the past two decades!). For example, when teaching IV estimation, we rarely even discuss finite-sample properties such as unbiasedness, but instead focus solely on establishing consistency (and asymptotic normality). As another example, we tend to advocate H(A)C covariance matrix estimators, which are (usually) biased.

It is absolutely astonishing to see such a statistically literate and distinguished group of commenters reach a consensus that is so dangerously wrong. You all seem to be in agreement that “unbiasedness” should not be the primary goal in applied economics research. This is a truly disturbing conclusion.

I think what is going on is that you all fail to grasp what investigators actually mean when they use the word “unbiased” in the context of causal inference. The kind of bias that we are discussing – such as confounding bias, selection bias etc – simply cannot be traded away at any cost without making the results impossible to interpret.

Do:

We should put you in a room with all the econometricians who tell me that they don’t actually think of bias as paramount, and see who survives!

And, regarding your main point, no, there are no unbiased estimators in practice. In the real world it is not possible to get confounding bias, selection bias etc to zero. That makes results difficult (not “impossible”) to interpret, but that’s just the way it is.

I think that what Do-Operator is confused about here is that there are two notions of bias thrown around in the social sciences. First, bias in the intuitive sense: we are estimating a different object from the one we actually care about, perhaps due to confounding or selection problems. Second, bias in the statistical sense: we estimate the object we care about, but to do so we use an estimator whose expected value is not that object. Economists usually do not call the first kind of bias “bias”; we call it “endogeneity problems” (I should know, I am an economist). So an audience of economists understands that when Andy says “bias” he means statistical bias.

I think that the audience Andy is describing was a little too generous to the economics profession. Econometricians know and teach the bias-variance tradeoff, yet I have never seen a single one teach their students an example that shows the value of choosing a biased-but-more-precise estimator. Consequently, you will not find many graduate students in economics who believe it is ok to use a biased estimator when an unbiased one is available. They understand that in many situations there is not one available (like doing IV). But they do not realise that – as Andy says – unbiasedness requires the model to be exactly true and so in fact we never have access to an unbiased estimator. I have tried to convince my colleagues of this point several times, without success. So I would love to meet the Princeton econometricians in Andy’s audience that day, who apparently believe that the obsession with unbiasedness is no longer a problem. Perhaps economics is more balkanised across departments than I realised!
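For any colleagues who want such an example, the classic one is James-Stein: when estimating three or more means at once, shrinking the unbiased sample means toward zero lowers the total mean squared error no matter what the true means are. A quick check, with an arbitrary truth of my choosing:

```python
import numpy as np

rng = np.random.default_rng(42)
d, sims = 10, 20_000
theta = rng.normal(0, 1, d)   # an arbitrary fixed vector of true means

X = theta + rng.normal(0, 1, (sims, d))        # one noisy observation per sim
shrink = 1 - (d - 2) / (X**2).sum(axis=1)      # James-Stein shrinkage factor
js = shrink[:, None] * X

mse_mle = ((X - theta) ** 2).sum(axis=1).mean()   # unbiased estimator: X itself
mse_js = ((js - theta) ** 2).sum(axis=1).mean()   # biased shrinkage estimator
print(f"total MSE, sample means: {mse_mle:.2f}")
print(f"total MSE, James-Stein:  {mse_js:.2f}")
```

The unbiased estimator's total MSE is about d; the biased shrinkage estimator does strictly better, and Stein's result says this holds for every possible theta once d is at least 3. If even this textbook case isn't taught, it is no wonder graduate students treat unbiasedness as sacred.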