The talk is tomorrow, Tues 24 Feb, 2:40-4:00pm in 200 Fisher Hall:

“Unbiasedness”: You keep using that word. I do not think it means what you think it means.

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University

Minimizing bias is the traditional first goal of econometrics. In many cases, though, the goal of unbiasedness can lead to extreme claims that are both substantively implausible and not supported by data. We illustrate with several examples in areas ranging from public opinion to social psychology to public health, using methods including regression discontinuity, hierarchical models, interactions in regression, and data aggregation. Methods that purport to be unbiased, aren’t, once we carefully consider inferential goals and select on the analyses that are actually performed and reported. The implication for econometrics research: It’s best to be aware of all sources of error, rather than to focus narrowly on reducing bias with respect to one particular aspect of your model.

This work reflects collaboration with Guido Imbens and others. Here are the slides, and people can read the following papers for partial background:

Why high-order polynomials should not be used in regression discontinuity designs. (Andrew Gelman and Guido Imbens)

[2015] Evidence on the deleterious impact of sustained use of polynomial regression on causal inference. Research and Politics. (Andrew Gelman and Adam Zelizer)

[2014] Beyond power calculations: Assessing Type S (sign) and Type M (magnitude) errors. Perspectives on Psychological Science 9, 641-651. (Andrew Gelman and John Carlin)

Not to mention that minimizing bias usually comes with the side effect of increasing variance; one could argue that a distinct but well-understood and well-defined bias in conjunction with a narrow variance might be preferable.

Great title … hopefully the talk lives up to the movie :-)

+1 Your best seminar title to date …

After reading those papers on Regression Discontinuity analysis, I have a couple of questions if you (or someone else) don’t mind:

1. Why fit independent polynomials on either side of the discontinuity? Couldn’t you model the relationship as a single global polynomial plus a step function at the threshold? (I noticed that, in many of the models in these papers, the polynomials often take sudden-looking turns close to the threshold; e.g. Fig. 3 in Gelman & Zelizer.) Wouldn’t this make the discontinuity measurement a bit more stable?

2. In Gelman & Imbens you talk about how the polynomial fits give bad weights to the x-values farther from the threshold (among other problems), and show that local fits do better. Would it be possible to do a weighted polynomial fit, with the weights defined by (say) a normal curve centred at the threshold? I would have thought that would be better than discarding data farther than the bandwidth and keeping the rest at full strength.

(I don’t mean to derail the thread, though. If this is too off-topic, please feel free to delete this comment.)
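For readers who want to see what the weighting idea in question 2 looks like in practice, here is a minimal sketch, not the method of either paper. It fits a separate straight line on each side of the threshold, with weights taken from a normal curve centred at the threshold, and reads the jump off the two fitted values at the threshold. The function names, the bandwidth of 0.5, and the simulated data (a linear trend with a jump of 2.0) are all assumptions made up for illustration.

```python
import math
import random

random.seed(2)

def weighted_line_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def rd_jump(xs, ys, threshold=0.0, bandwidth=0.5):
    """Jump at the threshold from separate Gaussian-weighted linear fits on each side."""
    intercepts = []
    for on_right in (False, True):
        # Centre x at the threshold so each intercept is the fitted value there.
        side = [(x - threshold, y) for x, y in zip(xs, ys)
                if (x >= threshold) == on_right]
        cx = [x for x, _ in side]
        cy = [y for _, y in side]
        # Normal-curve weights centred at the threshold, as the comment proposes.
        w = [math.exp(-0.5 * (x / bandwidth) ** 2) for x in cx]
        intercept, _ = weighted_line_fit(cx, cy, w)
        intercepts.append(intercept)
    return intercepts[1] - intercepts[0]

# Simulated data with a known jump of 2.0 at x = 0 (all numbers are made up).
true_jump = 2.0
xs = [random.uniform(-2.0, 2.0) for _ in range(4000)]
ys = [1.0 + 0.5 * x + (true_jump if x >= 0 else 0.0) + random.gauss(0.0, 0.5)
      for x in xs]

estimated_jump = rd_jump(xs, ys)
print(f"estimated jump: {estimated_jump:.2f} (true value {true_jump})")
```

This is local linear regression with a Gaussian kernel in place of a hard cutoff: distant points are downweighted smoothly rather than discarded. The sketch says nothing about whether that beats a fixed bandwidth in real applications, and the bandwidth still has to be chosen somehow.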

Erf:

Guido and I really need to add a bit more to our paper. But the short answer is that I think the original sin of regression discontinuity analysis is the monomaniacal focus on the forcing variable. The issue isn’t so much that the functional form is wrong, but that in many applications the whole idea of a “functional form”—the idea that the outcome is determined (with error) by the forcing variable—is wrong. The big big problem is omitted variable bias, in particular omitted variables that are correlated or otherwise dependent with the forcing variable. All the careful nonparametric smoothing in the world won’t solve that.

Also, above and beyond all this, polynomials typically don’t make sense. If you do want to fit a 4-parameter family, why a cubic polynomial? Where did that come from??? Linear as a starting point I can see: linear is understandable, and lots of things are locally linear. Cubic polynomials, not so much, which is one reason the graph in that China paper was such a joke.

Naive question: What are other options in the 4 parameter family? Splines? Are they better?

Actually, the typical goal in econometrics used to be either minimizing mean square error or related criteria (especially after James-Stein) or the usual goal of minimizing the variance subject to unbiasedness ... at least when dealing with relatively basic linear models and deterministic regressors. But in general econometricians care most about asymptotic properties, consistency and some type of efficiency in asymptotic distributions. I am not so sure about who has the goal of minimizing bias ...

+1

I’m a complete outsider here, with no dog in this fight. But just out of curiosity, can you explain how your “…usual goal of minimizing the variance subject to unbiasedness” is distinct from Gelman’s claim that “Minimizing bias is the traditional first goal of econometrics.”? It seems to me that if you mandate unbiasedness and then seek to minimize variance, then your first goal was minimizing bias (in fact, making it zero) and your second goal was minimizing variance.

This is actually the classical approach in statistics; old econometrics built on that. But certainly this has not been a central issue in econometrics for a long time, or at least not any more than in mainstream statistics.

That is one way, but minimizing mean square error is another, and other loss functions have been considered. But this is not exclusive to econometrics. It happens in statistics in general.

Thanks for the replies. My impression was that our host was implying that econometrics was hanging on to the more traditional approaches (like MVUE) longer than other statistics-heavy disciplines had, but from what you say above and in other threads, that does not appear to be the case (and quite possibly, that’s not what Gelman was implying at all). I appreciate your clearing that up for me.

My impression is that when we talk about robustness of an inference procedure with respect to any deviation from the usual assumptions, one usually needs to balance the ability of the procedure to bound the inconsistency error against the cost of higher variance when the assumption actually holds, so that trade-offs are taken into account. This is done in many branches, for example nonparametrics and estimation that is robust to outliers.

“Minimizing bias is the traditional first goal of econometrics.” Really?? I must have missed something over the past 40 years!

Dave:

Yes, I think you have missed something. But maybe you’re better off having missed it, as I think unbiasedness is a misguided notion. But, just in case you need convincing, here’s something from the Wikipedia entry on Econometrics. As always, I use Wikipedia not as a measure of absolute truth but as a signal of consensus. Anyway, here it is, under the heading Theory:

Unbiasedness is listed first, and my impression is that unbiasedness is the first goal, with efficiency coming second. And, indeed, this sounds reasonable, that efficiency is secondary, what’s the point of being efficient if you’re biased? But, as I discuss in my talk, this attitude leads to lots of problems. (The third item on the Wikipedia list is consistency, but in most of the models we consider, that comes for free and is not really a concern.)

Andrew:

Have you published on Econometrics?

Just curious. I know you’ve published tons of work on Poli Sci, Radon, Groundwater etc. but didn’t know if you dabbled in Econometrics too.

Yes, I’ve published in economics and econometrics, but not much. I’m not an expert in the field, so these are just my impressions. But I think I have something useful to add, just as, conversely, econometricians have something to add by sharing their perspectives with statisticians.

Andrew – Oh come on. Wikipedia?? I haven’t missed anything at all. The attitude that you refer to is NOT the norm in terms of how econometrics is taught. I have no argument with your point that unbiasedness should not be given high priority – and none of my students would disagree with you either. My objection is to your sweeping statement that purports to tell us how econometricians think. I’m afraid you’re wrong about the latter.

Dave:

I never purported to tell you how econometricians think. What I wrote was that minimizing bias is the traditional first goal of econometrics.

And, yes, Wikipedia. I’m glad that your students are not taught that way. I don’t think Guido Imbens teaches econometrics that way either. There are many strands in econometrics, but I do think that the Wikipedia entry gives a sense of what a lot of people think.

And, for that matter, I describe minimizing bias as a traditional first goal, and I know that lots of people are going beyond tradition. Still, tradition is a starting point. And, as I discuss in my talk, I think that a lot of practical mistakes that people make in statistics are connected with this bias-first attitude, even for methods that are not explicitly framed in terms of bias. An example are those silly high-degree polynomial regression discontinuity analyses: they are not generally presented as unbiased but I think the goal of unbiasedness is related to people making the error of doing these analyses.

Most discussions of bias I have seen in econometrics, since the first course I took in the subject about 30 years ago, have framed it as a question of trading off bias against variance. This is not a question of ‘people going beyond the tradition’ – it is the tradition. I think that if you talk about minimizing bias as *the* traditional objective or the primary objective you will appear naive to most people who have studied more than a very small amount of econometrics.

Frederick:

Thanks for the comment. I’d like to avoid appearing more naive than necessary, so this sort of pre-talk feedback is helpful.

In response to your comment: Yes, I agree that people are aware of the tradeoff. And, if you frame the problem as, Would you prefer bias=0 and se=100, or bias=.1 and se=1, then, yes, people would go with the latter. But I still think that bias is the traditional first goal; I did not say the only goal. If you were to put it in bias-variance terms, bias is given too high a priority. But I think that in practice this is not stated explicitly but it is how things turn out.

Andrew – Semantics! It is NOT the traditional first goal. Sorry, but you’re misrepresenting the discipline. Too bad.

I’ll back Andrew up a little bit, but I think it needs qualification:

The focus of reduced-form econometrics aimed at estimating causal effects has been primarily on unbiasedness – at least in my very recent training. That is, the major concern of practitioners estimating causal effects from observational data has been to get an unbiased estimate first, and one with the smallest possible standard errors given unbiasedness second.

That is not true in the theoretical econometrics literature (or structural empirical work), where I think the focus has been much more on finding efficient estimators. But that is always “efficient given some set of assumptions”; it’s just that the assumptions in the theoretical literature, even the weaker ones, are often not accepted by empirical researchers in economics. Empirical reduced-form economists are usually trying to convince the reader that their model latches on to “good” identifying variation in the world, not that their model is the most efficient means of estimating that thing in the world if such and such condition holds.

I get that this is a little wishy-washy, in that, in order to convince someone you have good identifying variation, you are making assumptions, and that those assumptions imply some efficient estimator, but more often in practice you see people using things like cluster robust (arbitrary) SE estimators, when an FGLS (and more efficient) estimator might still satisfy the explicit assumptions of their model.

So I think Andrew is right, at least regarding a dominant mode of economic empiricism, but not necessarily right about the aims of econometric theory.

Does that strike you as a reasonable interpretation?

Dave:

I gave my talk, I spoke with the econometricians, and I realize you’re right. More in a future post, once I get organized to write it.

Andrew – thank you for the feedback – I’ll look forward to what you have to say later.

Aris Spanos is an example of an active academic econometrician who, based on my outsider’s impressions, seems to put more emphasis on unbiasedness as a desirable optimality property and less emphasis on MSE minimization than is common in the econometrics mainstream as represented by Dave Giles and Frederick Guy here in these comments.

I disagree with Andrew here – it is absolutely essential that unbiasedness is the primary objective for applied statisticians doing causal inference. Bias (in the causal inference sense of the word) is a function of whether the thing you are estimating corresponds to the research question. How can you possibly be willing to accept deviations from whether you are estimating the right thing?

I don’t even understand the purpose of thinking about other properties (such as variance) unless you have eliminated bias. The SE is a weird, non-Bayesian measure of the strength of the evidence. If you are willing to accept bias just to make the evidence appear stronger, you are committing a serious epistemic sin which will massively degrade the validity of the evidence.

To roughly illustrate the idea of the bias-variance tradeoff: Suppose you are trying to estimate a parameter, and suppose the true value is 0. Consider two possible estimators: One is unbiased and has sampling distribution normal with mean 0 and sd 1. The other is biased, and has sampling distribution normal with mean 0.5 and sd 0.5. Draw a picture to see: for about 95% of samples, the second estimator (the biased one) gives estimates between -0.5 and 1.5, which are within 1.5 of the true value. But noticeably fewer than 95% of samples will give values of the unbiased estimator that are within 1.5 of the true value.
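The picture can also be checked with a few lines of arithmetic. The sketch below uses only the two sampling distributions stated in the example (true value 0; unbiased estimator normal with mean 0 and sd 1; biased estimator normal with mean 0.5 and sd 0.5). The helper names are made up for illustration.

```python
import math

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_within(mean, sd, truth=0.0, tol=1.5):
    """Probability that an estimator ~ Normal(mean, sd) lands within tol of the truth."""
    return normal_cdf(truth + tol, mean, sd) - normal_cdf(truth - tol, mean, sd)

def rmse(mean, sd, truth=0.0):
    """Root mean square error: sqrt(bias^2 + variance)."""
    return math.sqrt((mean - truth) ** 2 + sd ** 2)

p_unbiased = prob_within(mean=0.0, sd=1.0)
p_biased = prob_within(mean=0.5, sd=0.5)
rmse_unbiased = rmse(0.0, 1.0)
rmse_biased = rmse(0.5, 0.5)

print(f"P(within 1.5 of truth): unbiased {p_unbiased:.3f}, biased {p_biased:.3f}")
print(f"RMSE: unbiased {rmse_unbiased:.3f}, biased {rmse_biased:.3f}")
```

Under these numbers the unbiased estimator lands within 1.5 of the truth roughly 87% of the time, against roughly 98% for the biased one, and the biased estimator also has the smaller root mean square error (about 0.71 against 1).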

I am not using the word “bias” in the statistical sense (ie, whether when you apply an estimator to a sample, the expected value is equal to the parameter) but in the causal inference sense (ie, whether the parameter is a valid identification of the causal effect). It is unfortunate that we use the same word for both concepts.

Let’s say you are interested in estimating the causal effect X, where the true value of X is 0. What happens if you accept bias, is that you implicitly decide to instead estimate Y, whose true value is 0.5. This may reduce your variance, but it is intellectually dishonest to present it as an estimate of X unless you have some very transparent reasoning for why the difference between X and Y has to be equal to 0.5. Perhaps you can do this using sensitivity analysis or bounds, but that is generally not what scientists do.

You have to put yourself in the shoes of the investigator and ask yourself what information they have access to. They are clearly not going to know that the exact magnitude of bias is 0.5; where would such information possibly come from? Therefore, when they accept bias, they have no idea how much it affects their analysis. My view is that the standard error then loses its value as a measure of the strength of the evidence.

“I am not using the word “bias” in the statistical sense [] but in the causal inference sense [].”

Since AG is using the word “bias” in the statistical sense, this makes your original comment (“I disagree with Andrew here…”) a non sequitur, no?

I think the econometricians he is responding to are trying to get at a causal inference concept of bias. Also note that Andrew’s slides refer to “omitted variable bias”, which is clearly a causal inference bias rather than a statistical bias. (To illustrate, if you run a regression model with an omitted variable, the maximum likelihood parameter estimates are still consistent; the proof of consistency does not depend on including the right variables.)

Fair enough.

Anders:

I would agree with you in the context where idealized un-confounded randomized trials are feasible – why not avoid confounding if it’s possible (it’s extremely hard to quantify).

But

1. People here are using the term bias which is a math/stat term that can arise estimating treatment effects even from non-confounded studies (Greenland once qualified confounding as asymptotic bias).

2. There are very few contexts where idealized un-confounded randomized trials are feasible (none in human studies).

Now, I did have fun arguing your take (infinite penalty on bias) with Sander Greenland when we were writing http://biostatistics.oxfordjournals.org/content/2/4/463.abstract but I expected to lose and did.

Anders: asking “whether the thing you are estimating corresponds to the research question” is a great starting point. But, faced with finite samples, asking whether you can estimate that “thing” better allowing some bias in order to reduce variance is also a non-trivial concern, in both Bayesian and non-Bayesian analyses.

Also… “I don’t even understand the purpose of thinking about other properties” – this is argument from incredulity. Just because you can’t imagine or comprehend such an argument doesn’t mean that there isn’t one, or can’t be one.

George, I may have phrased that awkwardly but I stand by my main point: If you allow bias in order to reduce the standard error, this has the effect of reducing the reported uncertainty about the research findings. It doesn’t reduce the actual uncertainty. The whole procedure is therefore an example of a broken epistemology.

Anders: sorry, but it’s not clear what you mean by actual uncertainty.

Perhaps a measure of spread in a Bayesian posterior for the “thing” that corresponds to the research question?

Or the same measure but also allowing for sensitivity over a class of priors?

Or perhaps you are making the distinction, in frequentist analysis, between naive standard errors that ignore shrinkage/model-selection/etc, vs those that take the actual analytic steps into account?

OK, I recognize that I’ve been imprecise and vague here. This is roughly what I mean:

Imagine you are interested in the causal parameter X. You want a point estimate and a confidence interval for X. You know that the statistical parameter A is a valid identification of X. If you use the estimator A-hat and its standard error, you can construct a valid confidence interval for X (ie such that with resampling, 95% of the intervals will capture X)

However, my claim is that if you accept bias, your research algorithm is equivalent to this: If there exists a statistical parameter B such that B-hat has smaller standard error than A-hat, then you are under certain circumstances willing to substitute B for A.

I am willing to bet that if you use this algorithm, the confidence interval for X is going to be invalid; the resampling interpretation is going to be false. This is a problem because in this weird non-Bayesian inductive process, the resampling interpretation is the justification for using the variance as a measure of the strength of the evidence
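One way to make this claim concrete is to compute the coverage of a nominal 95% interval when B-hat is substituted for A-hat. The numbers below are borrowed from the illustrative two-estimator example earlier in the thread (X = 0, A-hat normal with mean 0 and sd 1, B-hat normal with mean 0.5 and sd 0.5); they are assumptions for this sketch, not anything estimated from data, and the function names are made up.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def coverage(est_mean, est_sd, target, z=1.96):
    """Probability that the interval estimate +/- z*sd contains the target,
    when the estimate is Normal(est_mean, est_sd)."""
    # The interval covers the target iff |estimate - target| <= z * est_sd.
    upper = (target + z * est_sd - est_mean) / est_sd
    lower = (target - z * est_sd - est_mean) / est_sd
    return normal_cdf(upper) - normal_cdf(lower)

X = 0.0  # assumed true causal parameter
coverage_a = coverage(est_mean=0.0, est_sd=1.0, target=X)  # A-hat: unbiased for X
coverage_b = coverage(est_mean=0.5, est_sd=0.5, target=X)  # B-hat: biased for X

print(f"coverage using A-hat: {coverage_a:.3f}")
print(f"coverage using B-hat: {coverage_b:.3f}")
```

With these assumed numbers, the interval built around B-hat covers X about 83% of the time rather than 95%, even though B-hat is closer to X on average in the root-mean-square sense. Both sides of the argument in this thread are visible in that one comparison.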

Anders:

Your goal of zero bias is admirable but, in almost all practical settings, impossible. People don’t respond to surveys, they drop out of studies, they don’t follow the assigned treatment, there’s systematic measurement error, the population changes between when the study is performed and when the results are going to be used, etc etc etc. So it’s all about modeling, it’s all about doing the best we can. We need to adjust adjust adjust adjust. And . . . an adjustment with low bias but huge variance isn’t so helpful!

Imagine you are doing some kind of study, and a simulation overlord or other higher being appears before you and says “I can make the standard error smaller by introducing bias”. You have two ways of responding: You can either respond that this is helpful, because you will have less uncertainty about the truth. Or you can respond that that will not be helpful, because it will make your beliefs stronger than what is justified by the data. I argue that the honest approach is to avoid the bias, that anything else will lead to false beliefs.

I am not so much arguing for “always zero bias” as saying that, if you are willing to accept bias in your study, you have to make a very transparent and rigorous argument that lays out exactly what assumptions you are making. This has to be done in terms of your beliefs about the data generating mechanism (the causal graph) for your particular study, NOT in terms of the statistical properties of the estimator. This is difficult and relies on strong assumptions, so always avoiding bias seems like a very good heuristic for applied statisticians.

I think this discussion is confusing because most other commenters have been trained to use the statistical definition of bias. When we are talking about attempts at causal inference such as regression discontinuity, this is simply not the type of bias we are worried about. Perhaps causal bias should have a different name, but we are stuck with an imprecise and confusing language.

Anders, what exactly do you mean by the “causal inference definition of bias”? I have never seen an article or textbook on causal inference define the term bias any differently than the statistical definition of the term. When they say bias they mean bias (i.e. whether the estimator equals the estimand on average over repeated random sampling/assignment), regardless of whether the estimand is a causal quantity or not.

Anonymous:

I agree that the distinction between these classes of bias is insufficiently discussed in textbooks. Also, a majority of the academic literature is confused about the distinction.

Like I’ve said previously on this blog, imagine you have a data generating mechanism that creates a joint distribution of observed variables. Then you sample from the joint distribution and get data. Statistics allows you to use the sample to learn about the joint distribution. Causal inference allows you to use the joint distribution to learn about the data generating mechanism. Each of these steps could potentially have bias, but the statistical definition only captures those biases that occur during the statistical step.

When I use the term “Causal inference definition of bias” I mean that the parameter you are estimating is not equal to the causal effect. You can still estimate it without statistical bias, the problem is that the parameter itself is junk.

If you want to convince yourself that this distinction is real and that there are types of bias that cannot be captured by the statistical definition of bias, think about confounding. If you are interested in the effect of B on A, you could try to estimate Pr(A|B) and compare it to Pr(A|Not B). You can easily find a consistent estimator for the joint distribution of A and B. In other words, your estimators are statistically unbiased (at least asymptotically), yet if there are common causes of A and B you still have bias for the causal effect.
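A small simulation may help make the confounding example concrete. In the sketch below, a hypothetical common cause C raises the probability of both A and B, while B has no effect on A at all, so the true causal effect is zero. All the probabilities are made-up numbers chosen so the answer can be worked out by hand: Pr(A|B) = 0.60 and Pr(A|Not B) = 0.30.

```python
import random

random.seed(0)
n = 200_000

# Hypothetical data-generating mechanism: C causes both A and B,
# and B has NO causal effect on A (true causal effect = 0).
rows = []
for _ in range(n):
    c = random.random() < 0.5
    b = random.random() < (0.8 if c else 0.2)
    a = random.random() < (0.7 if c else 0.2)
    rows.append((a, b, c))

def risk(rows, b_value, c_value=None):
    """Sample estimate of Pr(A | B=b_value), optionally also conditioning on C."""
    subset = [a for a, b, c in rows
              if b == b_value and (c_value is None or c == c_value)]
    return sum(subset) / len(subset)

# Naive contrast: converges to Pr(A|B) - Pr(A|Not B) = 0.60 - 0.30 = 0.30.
naive = risk(rows, True) - risk(rows, False)

# Standardize over the confounder: average the within-stratum contrasts,
# weighted by the sample share of each C stratum.
share_c = sum(c for _, _, c in rows) / n
adjusted = sum(
    weight * (risk(rows, True, c_val) - risk(rows, False, c_val))
    for c_val, weight in ((True, share_c), (False, 1.0 - share_c))
)

print(f"naive difference:    {naive:.3f}")
print(f"adjusted difference: {adjusted:.3f}")
```

The naive difference is a perfectly good estimate of a feature of the joint distribution of A and B, and more data only pins it down more tightly at about 0.30. It is the wrong number for the causal question. The adjustment works here only because C is observed and the simulation was built so that C is the only common cause, which is exactly the kind of assumption a causal graph makes explicit.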

The entire point of causal graphs is to give you a transparent language for reasoning about this type of bias.

Some references:

Jamie Robins and Sander Greenland’s 1986 article “Identifiability, exchangeability and epidemiological confounding” at http://www.tc.umn.edu/~alonso/Greenland_IJE_1986.pdf

Jamie Robins and Miguel Hernan’s textbook on causal inference at http://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/ , particularly chapter 10

Judea Pearl’s textbook on Causal Inference (which exclusively deals with issues of identification, ie the relationship between the data generating mechanism and the joint distribution)

If you cash out “actual uncertainty” as root mean square error, then reducing actual uncertainty is exactly what is enabled by allowing bias in order to reduce the standard error.

Corey: sure, one might perhaps call MSE actual uncertainty, but then Anders’ comment wouldn’t make any sense at all.

My guess: if the only inference we’re allowed is to give the posterior for the “thing” of interest under a single prior, it would make some sense to say that bias-variance tradeoffs are irrelevant, broken epistemology, etc; one should just give the posterior – under those rules it contains all the uncertainty that could be relevant.

But maybe something else was meant. Hope we find out.

exactly. What’s more “actual”… the root mean square error as it is now, or an imaginary sequence of infinite perfectly replicated experiments?

Take a look at standard textbooks from the eighties, e.g., The Theory and Practice of Econometrics by Judge et al., and you can see how much emphasis ridge estimators received, accepting some bias in exchange for variance reduction to improve the mean square error. Econometricians did as the rest of the statisticians did after the impressive work by Stein.
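For anyone who has not seen the Stein result referred to here, a minimal simulation sketch follows. It is not taken from Judge et al.; the dimension, the true means, and the number of replications are arbitrary choices for illustration. It compares the unbiased estimator of several normal means (the observations themselves) with the James-Stein estimator, which shrinks all coordinates toward zero and is therefore biased.

```python
import random

random.seed(1)
p = 10                      # number of means estimated at once (needs p >= 3)
theta = [1.0] * p           # hypothetical true means, unknown to the estimator
reps = 20_000

def squared_error(estimate, truth):
    """Sum of squared errors across all coordinates."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth))

mle_loss = 0.0
js_loss = 0.0
for _ in range(reps):
    # One noisy observation per mean, unit variance.
    x = [random.gauss(t, 1.0) for t in theta]
    # Unbiased estimator: use the observations as they are.
    mle_loss += squared_error(x, theta)
    # James-Stein: shrink every coordinate toward zero by a common factor.
    shrink = 1.0 - (p - 2) / sum(v * v for v in x)
    js_loss += squared_error([shrink * v for v in x], theta)

mle_risk = mle_loss / reps
js_risk = js_loss / reps
print(f"unbiased estimator, average squared error: {mle_risk:.2f}")
print(f"James-Stein,        average squared error: {js_risk:.2f}")
```

The unbiased estimator's average squared error should come out close to p = 10, and the shrinkage estimator's should be lower. Ridge regression applies the same trade in a regression setting: a little bias is accepted for a larger reduction in variance.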

The limits of finite sample sets should imply some missing variables. Mainly the expectation function will of necessity be a rational due to finiteness, and combinatorics alone means the extreme powers should be much less precise, the combinations needed to keep them stable are simply not there. Just thinking on the fly.

Just a bit more thought. Not even getting esoteric, but just doing precision analysis like a computer nerd. One example mentioned had 22,000 students and 44,000 students. That is 14 and 16 bits, that is all you get, the data supports no more. Then you make a polynomial of sixth order, expect to get at least three bits of precision per weighting, and you have run out of precision. Then the problem seems more severe when the samples have to have some reasonable dispersion, meaning some require ‘bandwidth’ to have some smooth, bounded spectrum. I thought I saw a test like this on one of Dave Giles’s blogs, but it may have been different.

Andrew:

Going through your slides, the problems with the papers you use to make the point seem too deep and too wide to pin the blame on unbiasedness.

I got the feeling that the examples didn’t exactly bolster the specific point about unbiasedness but rather sampled a variety of problems.

Rahul:

You’re wrong on that. But to see that you’d have to see the talk. I follow recommended practice by keeping the slides simple and then saying a lot of words that are not on the slides. I explain the connections to unbiasedness in my speech, but there’s no way for someone to see this from the slides.

I think you only talk with a subset of econometricians, specifically the ones that have a very narrow view of “real” econometrics. Frank Diebold’s comments on this are a good summary of that camp’s perspective:

“All told, Mostly Harmless Econometrics: An Empiricist’s Companion is neither “mostly harmless” nor an “empiricist’s companion.” Rather, it’s a companion for a highly-specialized group of applied non-structural micro-econometricians hoping to estimate causal effects using non-experimental data and largely-static, linear, regression-based methods. It’s a novel treatment of that sub-sub-sub-area of applied econometrics, but pretending to be anything more is most definitely harmful, particularly to students, who have no way to recognize the charade as a charade.”

http://www.fxdiebold.blogspot.com/2015/01/mostly-harmless-econometrics.html

I totally agree with this.

Could you guys please expand on this? Since I’m a political scientist, I took Mostly Harmless as a typical view of econometricians! Is there any applied textbook (I don’t think Greene is really applied, is it?) that would be more representative?

The NBER summer institute econometrics lectures give some scope: http://www.nber.org/SI_econometrics_lectures.html

so do the discussion papers for the “… taking the Con out of Econometrics” JEP paper: https://www.aeaweb.org/articles.php?doi=10.1257/jep.24.2

Those aren’t very satisfying replies, though, because there isn’t a great single overview of different distinct applied research areas. For everything except the program evaluation literature, the relevant economic theory has a huge effect on the estimation tools (sometimes explicitly, sometimes implicitly).

I really think that unbiasedness is not very important in some occasions.

Even unbiased variance is not the best estimator when the purpose of the estimation is prediction. “Third variance”, which I found a few years ago, is better than unbiased variance in terms of prediction.

The detail of the “third variance” is described in my paper:

A Revision of AIC for Normal Error Models

Open Journal of Statistics Vol.2 No.3, July 6, 2012

http://www.scirp.org/journal/PaperInformation.aspx?PaperID=20651