Andrew, I think there’s another distinction between how you use ‘estimate’ and how Martha uses ‘estimator’. When you (Andrew) say an ‘estimate’ is biased you seem to take into account its interpretation. For example, if you’re trying to estimate a causal quantity in the presence of confounding then your estimate will be biased. It will be biased even if it is an unbiased estimator for the non-causal association between treatment and outcome adjusting for whatever covariates you adjusted for. In other words, I think estimates can be biased for extra-statistical reasons while estimators cannot.

Yes, in the sense that this is “What we get” as opposed to “What we want”. Indeed, frequentist statistics seems to be filled with, “This is what we want … but this is what we get” situations — p-values and confidence intervals are other examples.

We still seem to be on different wavelengths, so I’ll try again to see if I can clarify:

1) You said at 3:57 pm, “The point is that there is no need in a mathematical sense to have two different words, one for the number and one for the random variable. They’re the same thing.”

To me, the number and the random variable are *not* the same thing. The number is a particular *value* of the random variable.

2) You said at 8:00 pm, “What you’re calling the estimate is a function of the data, y, considering y as a fixed vector.” OK, I can see this if you use “function of” in a certain way. But I’d say, “the estimate is the *value* of a certain function evaluated at the data.” Let’s call that function F. (So, for example, if the estimator is “sample mean,” then F(x1, …, xn) = (x1 + … + xn)/n, so the estimate would be (y1 + … + yn)/n.)

You also said, “What you’re calling the estimator is a function of a function of the data, y, considering y as a random variable”. OK. But I’ll use the notation I try to use to distinguish between a random variable and a particular value of the random variable: I use capital letters to stand for random variables and lower case letters to stand for values of the random variable. So in my notation, I’ll use Y = (Y1, …, Yn) where you use y in the last quoted sentence. So the estimator in this sentence is F(Y1, …, Yn), which in the example would be (Y1 + … + Yn)/n. My notation emphasizes that the estimator is not the same as the estimate. Sure, they both involve the same function F, but that doesn’t mean they’re the same mathematical object: the estimator is itself a random variable, but the estimate is a number (that happens to be one value of the estimator, the value for the data at hand). Those are not the same thing.
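The distinction between the number F(y1, …, yn) and the random variable F(Y1, …, Yn) can be made concrete in a short simulation (a sketch with made-up numbers; F here is the sample mean, as in the example above):

```python
import random

random.seed(1)

def F(xs):
    """The function F: the sample mean."""
    return sum(xs) / len(xs)

# The ESTIMATE: F evaluated at one observed data set y = (y1, ..., yn).
# It is a single number.
y = [2.1, 1.9, 2.4, 2.0, 1.6]
estimate = F(y)

# The ESTIMATOR: F applied to the random vector Y = (Y1, ..., Yn).
# Simulating many draws of Y shows the estimator has a distribution of its own.
def draw_Y(n, mu=2.0, sigma=0.5):
    return [random.gauss(mu, sigma) for _ in range(n)]

draws = [F(draw_Y(5)) for _ in range(10_000)]
mean_of_estimator = sum(draws) / len(draws)

print(f"estimate (a number): {estimate:.3f}")
print(f"mean of the estimator's sampling distribution: {mean_of_estimator:.3f}")
```

Both objects involve the same F, but the first is one number and the second has a whole sampling distribution.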

Martha:

I did not of course quote everything. The authors go on in that section to indicate that an unbiased estimator yields E[b-hat] = beta, where beta is the parameter to be estimated.

Is that not what you were saying?
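The unbiasedness property quoted above (E[b-hat] = beta) can be checked by simulation (a sketch with an assumed linear model y = beta*x + noise and illustrative values): averaging the least-squares slope over many simulated samples recovers beta.

```python
import random

random.seed(4)

beta, n, reps = 2.0, 30, 5_000
x = [i / n for i in range(n)]
x_bar = sum(x) / n

def ols_slope(y):
    """Least-squares slope of y regressed on x (with intercept)."""
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den

# Each replication draws a fresh data set and computes b-hat.
slopes = [ols_slope([beta * xi + random.gauss(0, 1) for xi in x])
          for _ in range(reps)]
avg_slope = sum(slopes) / reps
print(f"average of b-hat over {reps} samples: {avg_slope:.3f}  (beta = {beta})")
```

Any single slope estimate here can be far from beta; it is the average over repeated samples that matches it.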

Anon:

Yes, well put. I particularly like your point that the so-called unbiased estimate also pools across subjects to ascribe a single value to members of the population. An econometrician might define this away by saying that his or her goal is simply the population mean, but then this raises two more problems:

1. In real life we don’t have simple random samples so, in fact, the mean from the data (or any purportedly unbiased estimate from the data) won’t be an unbiased estimate of the population mean of interest.

2. Why is the goal the population mean? That only makes sense if there is some interest in pooling information across this population.

So, yes, I agree with you, and you put it well.

@jrc I would go beyond that and say this is more than a bias-variance tradeoff metaphor or an argument that variance is just as bad as or worse than bias. I think the claim can be made in a literal sense.

Look at it this way – shrinkage estimators are sometimes characterized as “biased” because borrowing information between parameters produces so-called “biased” estimates of the individual parameters.

That’s acceptable as a description, although not valid as a critique of shrinkage on bias-variance grounds. However, by that argument, using an “unbiased” estimator as a description of a population implicitly borrows information across subjects to ascribe a single value to members of the population. Thus if you’re willing to call shrinkage methods biased, then assigning a single statistic to a population is a form of bias at the level of the individual subject.
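The parallel between shrinkage and pooling can be sketched numerically (illustrative values; the shrinkage weight here is fixed by hand, not estimated): partial pooling pulls each group mean toward the grand mean, and assigning the grand mean to everyone is simply the extreme case.

```python
import random

random.seed(2)

# Three groups with different true means; a few noisy observations each.
true_means = [1.0, 2.0, 3.0]
n_per_group, sigma = 5, 1.0

def mean(xs):
    return sum(xs) / len(xs)

groups = [[random.gauss(mu, sigma) for _ in range(n_per_group)]
          for mu in true_means]
group_means = [mean(g) for g in groups]            # "unbiased" per-group estimates
grand_mean = mean([x for g in groups for x in g])  # complete pooling

# Partial pooling: shrink each group mean toward the grand mean.
w = 0.5  # illustrative shrinkage weight
shrunk = [w * gm + (1 - w) * grand_mean for gm in group_means]

# Each shrunken estimate sits between its group mean and the grand mean;
# the grand mean itself assigns one value to every subject.
for gm, s in zip(group_means, shrunk):
    print(f"group mean {gm:.2f} -> shrunken {s:.2f} (grand mean {grand_mean:.2f})")
```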

I think Andrew has said something along these lines previously, drawing a (IMO poor) analogy to notions of intrinsic uncertainty in physics, but I don’t think the point has been adequately communicated or understood.

Martha,

Agreed. As one example – imagine an estimator that is mildly biased but incredibly efficient and produces confidence intervals with appropriate coverage. Then the CI will cover the true mean 95% of the time, and the estimator, though biased, will be very close to the true mean in each of those cases (because the CIs are small because the estimator is efficient).

Compare that with an incredibly inefficient but unbiased estimator. That estimator converges to the true mean, but any particular estimate will be incredibly imprecise (it will have very large CIs). Obviously, the first estimator is going to get you “as close as possible” to the true mean (meaning expected distance from true mean) even though it is biased (due to the relative variances of the BetaHats).

Now – I think the hard bit for applied economists to stomach is the idea that you could actually get good coverage rates from the first estimator. But supposing you could, then it is a better estimate in almost every sense other than “statistically unbiased”.
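The comparison above can be checked with a quick simulation (a sketch with made-up stand-ins: a slightly shrunken sample mean for the biased-but-efficient estimator, and a single observation for the unbiased-but-inefficient one):

```python
import random

random.seed(3)

mu, sigma, n, reps = 5.0, 2.0, 50, 20_000

def mean(xs):
    return sum(xs) / len(xs)

def sample():
    return [random.gauss(mu, sigma) for _ in range(n)]

# Stand-in for the biased-but-efficient estimator: a slightly shrunken mean.
def biased_efficient(xs):
    return 0.95 * mean(xs)

# Stand-in for the unbiased-but-inefficient estimator: a single observation.
def unbiased_inefficient(xs):
    return xs[0]

def rmse(estimator):
    """Root-mean-squared distance from the true mean over repeated samples."""
    sq_errs = [(estimator(sample()) - mu) ** 2 for _ in range(reps)]
    return (sum(sq_errs) / reps) ** 0.5

rmse_eff = rmse(biased_efficient)
rmse_unb = rmse(unbiased_inefficient)
print(f"biased but efficient:   RMSE = {rmse_eff:.3f}")
print(f"unbiased, inefficient:  RMSE = {rmse_unb:.3f}")
```

The biased estimator lands much closer to the true mean on average, which is the “as close as possible” sense jrc describes.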

Martha:

It’s simple. What you’re calling the estimate is a function of the data, y, considering y as a fixed vector. What you’re calling the estimator is a function of a function of the data, y, considering y as a random variable with distribution p(y|theta), which I was calling the sampling distribution. Either way it’s the same function, theta.hat(y).

Haynes,

I agree that there’s probably a communication issue here. But I’m not sure that what you think I’m saying is what I’m saying.

To elaborate: Your book says, “What we want is a rule or estimator that will yield estimates as close as possible to the true parameter value regardless of the true parameter value,” but I don’t see that as what an unbiased estimator actually gives.

Andrew,

I don’t understand this. Not sure whether it’s that I need to think about it more, or that your explanation is not good enough, or both. Here’s an attempt to explain my puzzlement at least in part, in the hopes that it can help bring some clarity to one or the other or both of us:

As I understand it, the random variable in question is the estimator — say the estimator “sample mean”. To each (suitable) data set, it assigns a number, in this case by plugging the data into the formula for the sample mean. Am I correct that that (i.e., the value of the random variable for that particular data set) is what you mean by the random variable, conditional on the data?

But then you say, “the random variable is just the number, considering the data as a random variable from their sampling distribution.” I’m not sure what you’re saying here — in particular, you seem to be talking about “the sampling distribution of the data.” Is this indeed what you are talking about? However, I don’t know what this means. I am familiar with “sampling distribution of a statistic,” but haven’t encountered the notion of “sampling distribution of the data.”

Or are you talking about the data as a value of the vector-valued random variable (X1, X2, … , Xn) where the Xi are iid, and then saying that the estimator is a function of this vector-valued rv? But this is still not a number, unless you take the particular data. Yeah, I’m really confused by this.

I don’t disagree, but I think there is a communication issue here. I decided to drag out my very old econometrics text by Judge, Hill, Griffiths, Lütkepohl and Lee, “Introduction to the Theory and Practice of Econometrics” to see how I, and econometricians then (the ’80s), covered it. On page 27, they say, with frequentist reasoning:

“Returning to the problem of evaluating the least squares estimates, it is clear that we must evaluate the *rule* [their italics] provided by the least squares criterion, because *particular* estimates cannot be judged good or bad”.

I take it that Andrew would disagree with that latter part of the sentence for the reasons he gives, and I would agree with him.

They go on: “What we want is a rule or estimator that will yield estimates as close as possible to the true parameter value regardless of the true parameter value.”

I recall that I would try to get the intuition of this across with the example of a faulty eyeglasses prescription. As it happened, my ophthalmologist had decided that I needed a prismatic correction in my new lenses. Once my brain had adjusted to the new lenses, I discovered that, as I stood in the shower, without glasses of course, the water in the bottom of the tub ran upward to the drain, and out. I likened my new lenses to a new and distorted rule with which to perceive or take the measure of the world, that is, to a biased estimator, and that analogy seems to get the point across.

That is all I see Martha saying, and thus my agreement with her.

Now I do agree with Andrew that really there is no such thing as an unbiased estimate of the world, as he notes: “In realistic settings, unbiased estimates simply don’t exist. In the real world we have nonrandom samples, measurement error, nonadditivity, nonlinearity, etc etc etc.”.

But apparently, econometricians are still making the distinction. So I would conclude that if the estimator is biased in addition to the reasons Andrew mentions, then we have an additional reason to conclude that the estimates are biased views of reality. However, if we can eliminate or greatly reduce nonrandomness, measurement error, etc., then our measuring stick, OLS, can be viewed as giving us fairly true estimates of the world, and is unbiased.

Haynes:

The point is that there is no need in a mathematical sense to have two different words, one for the number and one for the random variable. They’re the same thing. The number is just the random variable, conditional on data. And the random variable is just the number, considering the data as a random variable from their sampling distribution. So I don’t think it is a good idea to use two different words for this one thing.

The actual estimate can be “wrong” or biased in the sense that Andrew uses for all the reasons he mentioned, and more.

So is Andrew’s take a matter of subdisciplines of statistics using different vocabularies?

I was assuming from context that, as you say, “unbiased estimate” is the estimate produced by an unbiased estimator for a particular sample. But my point was that I don’t recall encountering the term, except in Andrew’s discussion. Maybe economists do use it — but I don’t generally read economics literature. Do others use it routinely?

After writing the above paragraph, I was about to suggest it would be more accurate to call me naive than pedantic — but then I noticed belatedly your handle “P Dant” — the belatedness being further evidence that I indeed tend to be naive.

In any event, the term “unbiased estimator” sure seems like an abuse of notation to me, and an unfortunate one, since it leads to misunderstanding.

Martha:

I understand what you mean, but I think the term “estimate” is unambiguous here. The estimate theta.hat(y) is a function of data y. Thus, when considering its sampling distribution, theta.hat is a random variable with distribution induced by the distribution of y. The estimate theta.hat is “a number” (as you put it) conditional on y; it is a random variable unconditional on y. The expectation involved in calling theta.hat an “unbiased estimate” is the expectation averaging over y, conditional on theta. Or, in the situations where theta is considered a predictive quantity, not an estimate, classical statisticians refer to an unbiased prediction, in which case the expectation averages over y and theta and is conditional on some hyperparameters phi.

Don’t be such a pedant. It’s clear from context that the “unbiased estimate” is the estimate produced by an unbiased estimator.

D.O.:

Yes, I agree completely. I want to move away from a lexicographic attitude that unbiasedness is privileged. Once we recognize that unbiasedness doesn’t even exist, it’s easier for us to move on to discuss tradeoffs. But as long as people think that the safe, unbiased estimate is there for them as an option, I fear this is holding them back.

Thank you, I think I’ve got it now. But what could help is some realistic estimate of how much bias one can tolerate, depending on how unsure one is about the relevance of the statistical method, etc. In the bias-variance trade-off, the situation is well understood and can be dealt with numerically. If your objection is more abstract, it’s hard to heed it without some estimate of how much there is to lose.
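The usual numerical handle on “how much bias one can tolerate” is the decomposition MSE = bias^2 + variance: extra bias is worth accepting whenever it buys a larger drop in variance. A toy calculation (illustrative numbers only):

```python
# MSE = bias^2 + variance: a concrete handle on how much bias is tolerable.
def mse(bias, variance):
    return bias ** 2 + variance

unbiased_mse = mse(bias=0.0, variance=1.0)   # unbiased but noisy
shrunken_mse = mse(bias=0.3, variance=0.5)   # biased but more stable

print(unbiased_mse, shrunken_mse)
```

Here the biased estimator wins (0.59 vs 1.0) because its squared bias of 0.09 is far smaller than the 0.5 of variance it saves.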

D.O.:

No, I meant that unbiased estimation is seen as safe from a statistical perspective, that it might not be the most efficient thing but the resulting inferences have the correct statistical properties.

The PS helps. Thanks.

Jonathan:

For questions such as, “What are the effects of early childhood intervention?” or “What are the effects of air pollution on health outcomes?” or “What percentage of people support policy X?” or “How many people would buy product Y?” or . . . for any of these, there is no such thing as unbiased estimation.

I agree that mathematical models are helpful—I do lots of math myself!—and I agree with earlier commenters about tradeoffs between bias and variance (even if I might not always frame things in that way), but I think it’s good to remember that in real situations, there are no unbiased estimates.

So, if you’re deciding how to analyze some data, it’s not like you have the choice between a safe unbiased estimate and a biased estimate that might be better but is risky. I added a P.S. to the above post to elaborate.

So if you’re trying not to explain reality, but to explain how much “linear reality” explains the world, you are led naturally to BLUE models and throwing the rest of your “but reality isn’t really like this” observations into a big pile called “parts of the real world I’m not trying to explain here.”

Again, there are gigantic swaths of economics in which what you’re saying is exactly right, but there are also big swaths in which it is irrelevant.
