## There’s No Such Thing As Unbiased Estimation. And It’s a Good Thing, Too.

Following our recent post on econometricians’ traditional privileging of unbiased estimates, there were a bunch of comments echoing the challenge of teaching this topic, as students as well as practitioners often seem to want the comfort of an absolute standard such as best linear unbiased estimate or whatever. Commenters also discussed the tradeoff between bias and variance, and the idea that unbiased estimates can overfit the data.

I agree with all these things but I just wanted to raise one more point: In realistic settings, unbiased estimates simply don’t exist. In the real world we have nonrandom samples, measurement error, nonadditivity, nonlinearity, etc etc etc.

So forget about it. We’re living in the real world.
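To make one item on that list concrete, here is a minimal simulation sketch of classical measurement error. Everything in it is an assumption chosen for illustration (the true slope of 2, the noise levels, the sample sizes, and the helper names `ols_slope` and `average_slope`); none of it comes from a real data set. When the predictor is observed cleanly, least squares averages out to the true slope; when the predictor is observed with noise of the same variance, theory says the same procedure averages out to about half the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


def average_slope(n_sims=2000, n=200, beta=2.0, noise_sd_x=0.0):
    """Average OLS slope over repeated samples when x is observed with noise.

    beta and noise_sd_x are hypothetical values picked for this sketch.
    """
    slopes = []
    for _ in range(n_sims):
        x_true = rng.normal(size=n)
        y = beta * x_true + rng.normal(size=n)
        # The analyst only sees x_observed, not x_true.
        x_observed = x_true + noise_sd_x * rng.normal(size=n)
        slopes.append(ols_slope(x_observed, y))
    return float(np.mean(slopes))


slope_clean = average_slope(noise_sd_x=0.0)  # textbook assumptions hold
slope_noisy = average_slope(noise_sd_x=1.0)  # x measured with error
print(f"true slope 2.0 | clean x: {slope_clean:.2f} | noisy x: {slope_noisy:.2f}")
```

The estimator is the same in both runs; only the data-collection process changes, and that is enough to remove the unbiasedness.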

P.S. Perhaps this will help. It’s my impression that many practitioners in applied econometrics and statistics think of their estimation choice kinda like this:

1. The unbiased estimate. It’s the safe choice, maybe a bit boring and maybe not the most efficient use of the data, but you can trust it and it gets the job done.

2. A biased estimate. Something flashy, maybe Bayesian, maybe not, it might do better but it’s risky. In using the biased estimate, you’re stepping off base—the more the bias, the larger your lead—and you might well get picked off.

This is not the only dimension of choice in estimation—there’s also robustness, and other things as well—but here I’m focusing on the above issue.

Anyway, to continue, if you take the choice above and combine it with the unofficial rule that statistical significance is taken as proof of correctness (in econ, this would also require demonstrating that the result holds under some alternative model specifications, but “p less than .05” is still key), then you get the following decision rule:

A. Go with the safe, unbiased estimate. If it’s statistically significant, run some robustness checks and, if the result doesn’t go away, stop.

B. If you don’t succeed with A, you can try something fancier. But . . . if you do that, everyone will know that you tried plan A and it didn’t work, so people won’t trust your finding.

So, in a sort of Gresham’s Law, all that remains is the unbiased estimate. But, hey, it’s safe, conservative, etc, right?

And that’s where the present post comes in. My point is that the unbiased estimate does not exist! There is no safe harbor. Just as we can never get our personal risks in life down to zero (despite what Gary Becker might have thought in his ill-advised comment about deaths and suicides), there is no such thing as unbiasedness. And it’s a good thing, too: recognition of this point frees us to do better things with our data right away.

1. Jonathan (another one) says:

Sure, but a lot of economics doesn’t try and get at what’s true, but at what’s “as if” true. Thus, it’s not that we think the effects of X on Y are really linear, but what we want is the implications of behavior if the data are “reasonably consistent” with linearity. Other commenters made the point that there are very few instances in which economics dictates functional form; that’s true, but even more important is that what economists are looking for (often, not always) is the “economic” model, not the “true” model. To take a simple example that I know you hate, no one really believes in utility, but a properly identified model will identify behavior that, to some extent, behaves as if people are motivated by utility maximization. That counts as a win whether there’s a more complicated (atheoretic) model that does a better job or not.

So if you’re trying not to explain reality, but to explain how much “linear reality” explains the world, you are led naturally to BLUE models and throwing the rest of your “but reality isn’t really like this” observations into a big pile called “parts of the real world I’m not trying to explain here.”

Again, there are gigantic swaths of economics in which what you’re saying is exactly right, but there are also big swaths in which it is irrelevant.

• Andrew says:

Jonathan:

For questions such as, “What are the effects of early childhood intervention” or “What are the effects of air pollution on health outcomes” or “What percentage of people support policy X” or “How many people would buy product Y” or . . . for any of these, there is no such thing as unbiased estimation.

I agree that mathematical models are helpful—I do lots of math myself!—and I agree with earlier commenters about tradeoffs between bias and variance (even if I might not always frame things in that way), but I think it’s good to remember that in real situations, there are no unbiased estimates.

So, if you’re deciding how to analyze some data, it’s not like you have the choice between a safe unbiased estimate and a biased estimate that might be better but is risky. I added a P.S. to the above post to elaborate.

2. D.O. says:

But aren’t you answering a different question from what you asked in the PS? The safeness of unbiased estimates (as you describe it, I have no first-hand knowledge) comes not from their factual unbiasedness, but from their acceptance as the default choice. In other words, they are safe not as a method of inquiry, but as a method of publication.

• Andrew says:

D.O.:

No, I meant that unbiased estimation is seen as safe from a statistical perspective, that it might not be the most efficient thing but the resulting inferences have the correct statistical properties.

• D.O. says:

Thank you, I think I’ve got it now. But what can help is some realistic estimate of how much bias one can tolerate depending on how unsure one is about the relevancy of the statistical method etc. In the bias-variance trade-off, this is a well-understood situation that is dealt with numerically and so forth. If your objection is more abstract, it’s hard to heed it without some estimate of how much there is to lose.

• Andrew says:

D.O.:

Yes, I agree completely. I want to move away from a lexicographic attitude that unbiasedness is privileged. Once we recognize that unbiasedness doesn’t even exist, it’s easier for us to move on to discuss tradeoffs. But as long as people think that the safe, unbiased estimate is there for them as an option, I fear this is holding them back.

3. Martha says:

The phrase “unbiased estimate” that you use is strange to me — I don’t think I’ve ever used it, just “unbiased estimator”. The statistical definition (at least, that I’m familiar with) of “unbiased” refers to an estimator (which is a random variable); it says that the expected value of the estimator is the parameter being estimated. So “unbiased estimate” doesn’t make sense to me, because an estimate (which is a number, not a random variable) doesn’t have an expected value.

• P Dant says:

Don’t be such a pedant. It’s clear from context that the “unbiased estimate” is the estimate produced by an unbiased estimator.

• Martha says:

I was assuming from context that, as you say, “unbiased estimate” is the estimate produced by an unbiased estimator for a particular sample. But my point was that I don’t recall encountering the term, except in Andrew’s discussion. Maybe economists do use it, but I don’t generally read economics literature. Do others use it routinely?

After writing the above paragraph, I was about to suggest it would be more accurate to call me naive than pedantic — but then I noticed belatedly your handle “P Dant” — the belatedness being further evidence that I indeed tend to be naive.

In any event, the term “unbiased estimate” sure seems like an abuse of notation to me, and an unfortunate one, since it leads to misunderstanding.

• Andrew says:

Martha:

I understand what you mean, but I think the term “estimate” is unambiguous here. The estimate theta.hat(y) is a function of data y. Thus, when considering its sampling distribution, theta.hat is a random variable with distribution induced by the distribution of y. The estimate theta.hat is “a number” (as you put it) conditional on y; it is a random variable unconditional on y. The expectation involved in calling theta.hat an “unbiased estimate” is the expectation averaging over y, conditional on theta. Or, in the situations where theta is considered a predictive quantity, not an estimate, classical statisticians refer to an unbiased prediction, in which case the expectation averages over y and theta and is conditional on some hyperparameters phi.
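The two readings of theta.hat(y) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule is taken to be the sample mean, the data are taken to be normal, and the true value `theta = 3.0` and the function name `theta_hat` are hypothetical choices, not anything from the discussion above.

```python
import numpy as np

rng = np.random.default_rng(1)


def theta_hat(y):
    """The estimation rule; here, assumed to be the sample mean."""
    return float(np.mean(y))


theta = 3.0  # hypothetical true parameter, chosen for illustration

# Conditional on one observed data set y, theta_hat(y) is a single number.
y_observed = rng.normal(loc=theta, scale=1.0, size=50)
estimate = theta_hat(y_observed)

# Averaging over the sampling distribution p(y | theta), the same function
# produces a whole distribution of values; its mean minus theta is the bias.
draws = np.array([
    theta_hat(rng.normal(loc=theta, scale=1.0, size=50))
    for _ in range(5000)
])
bias = float(draws.mean() - theta)

print(f"one estimate, conditional on y: {estimate:.3f}")
print(f"simulated bias, averaging over y: {bias:.4f}")
```

The same function `theta_hat` appears in both halves; what changes is whether `y` is held fixed or redrawn from its distribution.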

• Z says:

Andrew, I think there’s another distinction between how you use ‘estimate’ and how Martha uses ‘estimator’. When you (Andrew) say an ‘estimate’ is biased you seem to take into account its interpretation. For example, if you’re trying to estimate a causal quantity in the presence of confounding then your estimate will be biased. It will be biased even if it is an unbiased estimator for the non-causal association between treatment and outcome adjusting for whatever covariates you adjusted for. In other words, I think estimates can be biased for extra-statistical reasons while estimators cannot.
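A small simulation can illustrate this distinction. It is a sketch under assumed numbers: the causal effect of 1.0, the confounder strength of 2.0, and the variable names are all hypothetical. Under these particular assumptions the population regression slope of outcome on treatment works out to 2.0, so least squares recovers the association well while missing the causal effect by about 1.0.

```python
import numpy as np

rng = np.random.default_rng(2)


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    x_centered = x - x.mean()
    return float(x_centered @ (y - y.mean()) / (x_centered @ x_centered))


n = 100_000
causal_effect = 1.0  # hypothetical effect of treatment on outcome

u = rng.normal(size=n)              # unobserved confounder
treatment = u + rng.normal(size=n)  # confounder raises treatment
outcome = causal_effect * treatment + 2.0 * u + rng.normal(size=n)

# Regression of outcome on treatment, without adjusting for u.
slope = ols_slope(treatment, outcome)

print(f"OLS slope: {slope:.2f}")
print(f"gap from the causal effect: {slope - causal_effect:.2f}")
```

Nothing is wrong with the arithmetic of the estimator here; the bias comes entirely from interpreting the slope as a causal quantity.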

4. Haynes Goddard says:

While it has been some time since I taught econometrics, Martha has it right: it is the “estimator”, i.e., OLS, not the estimate, that is unbiased under the standard assumptions: linearity in parameters, random sampling, zero conditional mean. And just checking a recent and popular text, Wooldridge, Introductory Econometrics, confirms this.

The actual estimate can be “wrong” or biased in the sense that Andrew uses for all the reasons he mentioned, and more.

So is Andrew’s take a matter of subdisciplines of statistics using different vocabularies?

• Andrew says:

Haynes:

The point is that there is no need in a mathematical sense to have two different words, one for the number and one for the random variable. They’re the same thing. The number is just the random variable, conditional on data. And the random variable is just the number, considering the data as a random variable from their sampling distribution. So I don’t think it is a good idea to use two different words for this one thing.

• Martha says:

Andrew,

I don’t understand this. Not sure whether it’s that I need to think about it more, or that your explanation is not good enough, or both. Here’s an attempt to explain my puzzlement at least in part, in the hopes that it can help bring some clarity to one or the other or both of us:

As I understand it, the random variable in question is the estimator, say the estimator “sample mean”. To each (suitable) data set, it assigns a number, in this case by plugging the data into the formula for sample mean. Am I correct that that (i.e., the value of the random variable for that particular data set) is what you mean by the random variable, conditional on the data?

But then you say, “the random variable is just the number, considering the data as a random variable from their sampling distribution.” I’m not sure what you’re saying here; in particular, you seem to be talking about “the sampling distribution of the data.” Is this indeed what you are talking about? However, I don’t know what this means. I am familiar with “sampling distribution of a statistic,” but haven’t encountered the notion of “sampling distribution of the data.”

Or are you talking about the data as a value of the vector-valued random variable (X1, X2, …, Xn) where the Xi are iid, and then saying that the estimator is a function of this vector-valued rv? But this is still not a number, unless you take the particular data. Yeah, I’m really confused by this.

• Andrew says:

Martha:

It’s simple. What you’re calling the estimate is a function of the data, y, considering y as a fixed vector. What you’re calling the estimator is a function of a function of the data, y, considering y as a random variable with distribution p(y|theta), which I was calling the sampling distribution. Either way it’s the same function, theta.hat(y).

• Martha says:

We still seem to be on different wavelengths, so I’ll try again to see if I can clarify:

1) You said at 3:57 pm, “The point is that there is no need in a mathematical sense to have two different words, one for the number and one for the random variable. They’re the same thing.”

To me, the number and the random variable are *not* the same thing. The number is a particular *value* of the random variable.

2) You said at 8:00 pm, “What you’re calling the estimate is a function of the data, y, considering y as a fixed vector.” OK, I can see this if you use “function of” in a certain way. But I’d say, “the estimate is the *value* of a certain function evaluated at the data.” Let’s call that function F. (So, for example, if the estimator is “sample mean,” then F(x1, …, xn) = (x1 + … + xn)/n, so the estimate would be (y1 + … + yn)/n.)

You also said, “What you’re calling the estimator is a function of a function of the data, y, considering y as a random variable”. OK. But I’ll use the notation I try to use to distinguish between a random variable and a particular value of the random variable: I use capital letters to stand for random variables and lower case letters to stand for values of the random variable. So in my notation, I’ll use Y = (Y1, …, Yn) where you use y in the last quoted sentence. So the estimator in this sentence is F(Y1, …, Yn), which in the example would be (Y1 + … + Yn)/n. My notation emphasizes that the estimator is not the same as the estimate. Sure, they both involve the same function F, but that doesn’t mean they’re the same mathematical object: the estimator is itself a random variable, but the estimate is a number (that happens to be one value of the estimator, the value for the data at hand). Those are not the same thing.

5. Haynes Goddard says:

Andrew:

I don’t disagree, but I think there is a communication issue here. I decided to drag out my very old econometrics text by Judge, Hill, Griffiths, Lütkepohl and Lee, “Introduction to the Theory and Practice of Econometrics” to see how I, and econometricians then (the ’80s), covered it. On page 27, they say, with frequentist reasoning:

“Returning to the problem of evaluating the least squares estimates, it is clear that we must evaluate the *rule* [their italics] provided by the least squares criterion, because *particular* estimates cannot be judged good or bad”.

I take it that Andrew would disagree with that latter part of the sentence for the reasons he gives, and I would agree with him.

They go on: “What we want is a rule or estimator that will yield estimates as close as possible to the true parameter value regardless of the true parameter value.”

I recall that I would try to get the intuition of this across with the example of a faulty eyeglasses prescription. As it happened, my ophthalmologist had decided that I needed a prismatic correction in my new lenses. Once my brain had adjusted to the new lenses, I discovered that, as I stood in the shower, without glasses of course, the water in the bottom of the tub ran upward to the drain, and out. I likened my new lenses to a new and distorted rule with which to perceive or take the measure of the world, that is, to a biased estimator, and that analogy seems to get the point across.

That is all I see Martha saying, and thus my agreement with her.

Now I do agree with Andrew that really there is no such thing as an unbiased estimate of the world, as he notes: “In realistic settings, unbiased estimates simply don’t exist. In the real world we have nonrandom samples, measurement error, nonadditivity, nonlinearity, etc etc etc.”.

But apparently, econometricians are still making the distinction. So I would conclude that if the estimator is biased in addition to the reasons Andrew mentions, then we have an additional reason to conclude that the estimates are biased views of reality. However, if we can eliminate or greatly reduce nonrandomness, measurement error, etc., then our measuring stick, OLS, can be viewed as giving us fairly true estimates of the world, and is unbiased.

• Martha says:

Haynes,

I agree that there’s probably a communication issue here. But I’m not sure that what you think I’m saying is what I’m saying.

To elaborate: Your book says, “What we want is a rule or estimator that will yield estimates as close as possible to the true parameter value regardless of the true parameter value,” but I don’t see that as what an unbiased estimator actually gives.

• jrc says:

Martha,

Agreed. As one example – imagine an estimator that is mildly biased but incredibly efficient and produces confidence intervals with appropriate coverage. Then the CI will cover the true mean 95% of the time, and the estimator, though biased, will be very close to the true mean in each of those cases (because the CIs are small because the estimator is efficient).

Compare that with an incredibly inefficient but unbiased estimator. That estimator converges to the true mean, but any particular estimate will be incredibly imprecise (it will have very large CIs). Obviously, the first estimator is going to get you “as close as possible” to the true mean (meaning expected distance from true mean) even though it is biased (due to the relative variances of the BetaHats).

Now – I think the hard bit for applied economists to stomach is the idea that you could actually get good coverage rates from the first estimator. But supposing you could, then it is a better estimate in almost every sense other than “statistically unbiased”.
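The “expected distance from the true mean” comparison can be sketched numerically. The two estimators below are hypothetical stand-ins chosen only to make the contrast visible: an unbiased but wasteful rule (the mean of just the first 5 of 100 observations) and a mildly biased but efficient rule (the full sample mean shrunk toward zero by an assumed factor of 0.95). The sketch does not address confidence interval coverage, only bias and root mean squared error.

```python
import numpy as np

rng = np.random.default_rng(3)

true_mean = 1.0  # hypothetical true value
n, n_sims = 100, 5000
samples = rng.normal(loc=true_mean, scale=1.0, size=(n_sims, n))

# Unbiased but inefficient: ignores 95 of the 100 observations.
unbiased_noisy = samples[:, :5].mean(axis=1)

# Mildly biased but efficient: full sample mean, shrunk toward zero.
biased_efficient = 0.95 * samples.mean(axis=1)


def bias(estimates):
    """Average estimate minus the true value."""
    return float(estimates.mean() - true_mean)


def rmse(estimates):
    """Root mean squared distance from the true value."""
    return float(np.sqrt(np.mean((estimates - true_mean) ** 2)))


for name, est in [("unbiased, noisy", unbiased_noisy),
                  ("biased, efficient", biased_efficient)]:
    print(f"{name:18s} bias = {bias(est):+.3f}   rmse = {rmse(est):.3f}")
```

Under these assumptions the shrunken estimator carries a bias of about -0.05 yet sits much closer to the truth on average, because its variance is far smaller.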

• Anonymous says:

@jrc I would go beyond that and say this is more than a bias-variance tradeoff metaphor or argument that variance is just as bad or worse than bias. I think the claim can be made in a literal sense.

Look at it this way – shrinkage estimators are sometimes characterized as “biased” because borrowing information between parameters produces so-called “biased” estimates of the individual parameters.

That’s acceptable as a description, although not valid as a critique of shrinkage due to the bias-variance argument. However, by that argument, assigning an “unbiased” estimator as a description of a population implicitly borrows information across subjects to ascribe a single value to members of the population. Thus if you’re willing to call shrinkage methods biased, then assigning a single statistic to a population is a form of bias at the level of the individual subject.

I think Andrew has said something along these lines previously drawing a (IMO poor) analogy to notions of intrinsic uncertainty in physics, but I don’t think the point has been adequately communicated or understood.

• Andrew says:

Anon:

Yes, well put. I particularly like your point that the so-called unbiased estimate also pools across subjects to ascribe a single value to members of the population. An econometrician might define this away by saying that his or her goal is simply the population mean, but then this raises two more problems:

1. In real life we don’t have simple random samples so, in fact, the mean from the data (or any purportedly unbiased estimate from the data) won’t be an unbiased estimate of the population mean of interest.

2. Why is the goal the population mean? That only makes sense if there is some interest in pooling information across this population.

So, yes, I agree with you, and you put it well.
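Point 1 can be sketched with a toy survey. All numbers are hypothetical: true support for a policy is set to 50%, and supporters are assumed to respond with probability 0.6 versus 0.4 for opponents. With those assumptions the respondents are 60% supporters in expectation, so the plain sample proportion overshoots by about 10 points no matter how many surveys are averaged.

```python
import numpy as np

rng = np.random.default_rng(4)

true_support = 0.5       # hypothetical share supporting the policy
p_respond_support = 0.6  # hypothetical: supporters answer more often
p_respond_oppose = 0.4   # hypothetical: opponents answer less often


def survey_estimate(n_contacted=1000):
    """Share of supporters among those who respond to one survey."""
    supports = rng.random(n_contacted) < true_support
    p_respond = np.where(supports, p_respond_support, p_respond_oppose)
    responded = rng.random(n_contacted) < p_respond
    return float(supports[responded].mean())


# Repeat the survey many times to approximate the expected estimate.
estimates = np.array([survey_estimate() for _ in range(2000)])
bias = float(estimates.mean() - true_support)

print(f"average estimate: {estimates.mean():.3f}")
print(f"bias relative to true support: {bias:+.3f}")
```

The sample proportion would be unbiased under simple random sampling; the bias here comes only from who ends up in the data.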

• Haynes Goddard says:

Martha:

I did not of course quote everything. The authors go on in that section to indicate that an unbiased estimator yields E[b hat] = beta, the parameter to be estimated.

Is that not what you were saying?

• Martha says:

Yes, in the sense that this is “What we get” as opposed to “What we want”. Indeed, frequentist statistics seems to be filled with, “This is what we want … but this is what we get” situations — p-values and confidence intervals are other examples.