
More on significance testing in economics

After I posted this discussion of articles by McCloskey, Ziliak, Hoover, and Siegler, I received several interesting comments, which I’ll address below. The main point I want to make is that the underlying problem–inference for small effects–is hard, and this is what drives many of the struggles with statistical significance. See here for more discussion of this point.

Statisticians and economists not talking to each other

Scott Cunningham wrote, surprised that I’d not heard of these papers before:

I wasn’t expecting anything like what you wrote. I live in a bubble, and just assumed you were familiar with the papers, because in grad school, whenever I presented results and said something was significant (meaning statistically significant), I would *always* get someone else responding, “but is it _economically_ significant” meaning, at minimum, is the result basically a very precisely measured no effect? The McCloskey/Ziliak stuff was constantly being thrown at you by the less quantitatively inclined people (that set is growing smaller all the time), and I forgot for a moment that those papers probably didn’t generate much interest outside economics.

I live in a bubble too, just a different bubble than Scott’s. He and others might be interested in this article by Dave Krantz on the null hypothesis testing controversy in psychology. Dave begins his article with:

This article began as a review of a recent book, What If There Were No Significance Tests? . . . The book was edited and written by psychologists, and its title was well designed to be shocking to most psychologists. The difficulty in reviewing it for [statisticians] is that the issue debated may seem rather trivial to many statisticians. The very existence of two divergent groups of experts, one group who view this issue as vitally important and one who might regard it as trivial, seemed to me an important aspect of modern statistical practice.

As noted above, I don’t think the issue is trivial, but it is true that I can’t imagine an article such as McCloskey and Ziliak’s appearing in a statistical journal.

Rational addiction

Scott also writes,

BTW, the rational addiction literature is a reference to Gary Becker and Kevin Murphy’s research program that applies price theory to seemingly “non-market phenomena”, such as addiction. Rational choice would seem to break down as a useful methodology when applied to something like addiction. Becker and Murphy have a seminal paper on this from 1988. It’s been an influential paper in the area of health economics, as numerous papers have followed it by estimating various price elasticities of demand, as well as testing the more general theory.

My reply to this: Yeah, I figured as much. It’s probably a great theory. But, ya know what? If Becker and Murphy want to get credit for being bold, transgressive, counterintuitive, etc etc., the flip side is that they have to expect outsiders like me to think their theory is pretty silly. As I noted in my previous entry, there’s certainly rationality within the context of addiction (e.g., wanting to get a good price on cigarettes), but “rational addiction” seems to miss the point. Hey, I’m sure I’m missing the key issue here, but, again, it’s my privilege as a “civilian” to take what seems a more commonsensical position here and leave the counterintuitive pyrotechnics to the professional economists.

The paradigmatic example in economics is program evaluation?

Mark Thoma “disagreed mildly” with my claim that the null hypothesis of zero coefficient is essentially always false. Mark wrote:

I don’t view the “paradigmatic example in economics” to be program evaluation. We do some of that, but much of what econometricians do is test the validity of alternative theories and in those contexts the hypothesis of a zero coefficient can make sense. For example, New Classical models imply that expected changes in the money supply should not impact real variables. Thus, a test of a zero coefficient on expected money in an equation with a real activity as the dependent variable is a test of the validity of the New Classical model’s prediction. These tests require sharp distinctions between models, i.e. to find variables that can impact other variables in one theory but not another, and that’s something we try hard to find, but when such sharp distinctions exist I believe classical hypothesis tests have something useful to contribute.

Hmmm . . . . I’ll certainly defer to Mark on what is or is not the paradigmatic example in economics. I can believe that theory testing is more central. I’ll also agree that important theories do have certain coefficients set to zero. I doubt, however, that in actual economic data, such coefficients really would be zero (or, to be more precise, that coefficient estimates would asymptote to zero as sample sizes increase). To wander completely out of my zone of competence and comment on Mark’s money supply example: I’m assuming this is somewhat of an equilibrium theory, and short-term fluctuations in expected money supply could affect individual actors in the economy, which could then create short-term outcomes, which would show up in the data in some way (and then maybe, in good “normal science” fashion, be explained in a reasonable way to preserve the basic New Classical model). What I’m saying is: in the statistics, I don’t think you’d really be seeing zero, and I don’t think the Type 1 / Type 2 error framework is relevant.
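To make the "you wouldn't really be seeing zero" point concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption for illustration: the true slope of 0.02, the unit-variance noise, and the two sample sizes are hypothetical, not taken from any economic dataset. It fits a simple least-squares regression at a small and a large sample size and reports the estimate, its standard error, and their ratio.

```python
import math
import random


def ols_slope(x, y):
    """Return (slope, standard error) for a least-squares fit of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares, used for the usual standard-error formula.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def simulate(n, true_slope, rng):
    """Simulate n points from y = true_slope * x + noise and fit the regression."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_slope * xi + rng.gauss(0, 1) for xi in x]
    return ols_slope(x, y)


rng = random.Random(0)   # fixed seed so the run is reproducible
TRUE_SLOPE = 0.02        # hypothetical: small but not exactly zero

results = {n: simulate(n, TRUE_SLOPE, rng) for n in (100, 200_000)}
for n, (slope, se) in results.items():
    print(f"n={n:>7}: slope={slope:+.4f}  se={se:.4f}  slope/se={slope / se:+.1f}")
```

Under these assumptions the standard error shrinks roughly like 1/sqrt(n), so at the larger sample size the same tiny coefficient sits many standard errors from zero. Rejecting the null then says more about the sample size than about whether the theory's zero restriction is a useful approximation.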

Getting better? And a digression on generic seminar questions

Justin Wolfers writes that “the meaningless statements of statistical rather than economic significance are declining.” Yeah, I think things must be getting better. Many years ago, Gary told me that his generic question to ask during seminars was, “What are your standard errors?” Apparently in poli sci, that used to stop most speakers in their tracks. We’ve now become much more sophisticated–in a good way, I think. (By the way, it’s good to have a few of these generic questions stored up, in case you fall asleep or weren’t paying attention during the talk. My generic questions include: “Could you simulate some data from your fitted model and see if they look like your observed data?” and “How many data points would you have to remove for your effect estimate to go away?”)
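The first of those generic questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "observed" data are made up (skewed, nonnegative draws), the fitted model is a plain normal distribution, and the check plugs in the point estimates rather than accounting for uncertainty in the fitted parameters. The test statistic, the sample minimum, is also just one hypothetical choice.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical "observed" data: skewed and nonnegative.
observed = [rng.expovariate(1.0) for _ in range(200)]

# Fitted model (an assumption for illustration): normal with the sample mean and sd.
mu = statistics.fmean(observed)
sigma = statistics.stdev(observed)


def replicate(n):
    """Simulate one fake dataset of size n from the fitted normal model."""
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Compare a test statistic (here, the minimum) between real and simulated data.
n_reps = 1000
obs_min = min(observed)
rep_mins = [min(replicate(len(observed))) for _ in range(n_reps)]

# Share of simulated datasets whose minimum is at least as large as the observed one.
frac_as_large = sum(m >= obs_min for m in rep_mins) / n_reps

print(f"observed minimum: {obs_min:.3f}")
print(f"median simulated minimum: {statistics.median(rep_mins):.3f}")
print(f"share of simulations with minimum >= observed: {frac_as_large:.3f}")
```

If the simulated datasets routinely produce values the real data never show (here, negative values from a model fit to nonnegative data), the fitted model is missing something about the data, and that is usually worth raising in the seminar.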

Justin uses a lot of bold type in his blog entries. What’s with that? Maybe a good idea? I use bold for section headings, but he uses it all over the place.

Sports examples

Also, since I’m responding to Justin, let me comment on his use of sports as examples in his classes. I do this too–heck, I even wrote a paper on golf putting, and I’ve never even played macro-golf–but, as people have noted on occasion, you have to be careful with such examples because they exclude many people who aren’t interested in the topic. (And, unlike examples in biology, or economics, or political science, it’s harder to make the case that it’s good for the students’ general education to become more familiar with the statistics of basketball or whatever.) So: keep the sports examples, but be inclusive.


  1. bccheah says:

    On rational addiction: Economists who adopt the rational utility maximizing model as a theoretical construct feel that they have to address the issue as to why someone would do something that is essentially bad for them. To this end they came up with the rational addiction model, which derives results that are obvious to a lot of people except economists. Not only that, they also had to come up with a catchy phrase for this, for "marketing" purposes, so they called it rational addiction. Economics is probably not the only field that loves its catchy phrases and terminology that only those on the "inside" can understand.
    Tim Harford has an entertaining piece on rational addiction:

  2. notsneaky says:

    I agree with bccheah. It's the "rational" in rational addiction that is making non-economists shake their heads, but all it really means for economists is that the demand curves for addictive products are still downward sloping, more or less. So the theory of rational addiction is pretty much along your lines of "wanting to get a good price on your cigarettes". There are some implications and issues, though, that come out once you think it through. For example, if you try using taxes to lower smoking you won't see much immediate effect. But it can still be a successful long-term policy to reduce smoking by preventing the creation of new addicts (teenagers smoking). So if you're going to look for an effect of taxes on the smoking rate, the theory tells you that you better put in some serious lags there.
    Again, this just reflects the fact that economists use the word "rational" in a different (and precisely defined) sense than the general public.