Laurent Belsie writes:

An economist formerly with the Consumer Financial Protection Bureau wrote a paper on whether a move away from forced arbitration would cost credit card companies money. He found that the results are statistically insignificant at the 95 percent (and 90 percent) confidence level.

But the Office of the Comptroller of the Currency used his figures to argue that although statistically insignificant at the 90 percent level, “an 88 percent chance of an increase of some amount and, for example, a 56 percent chance that the increase is at least 3 percentage points, is economically significant because the average consumer faces the risk of a substantial rise in the cost of their credit cards.”

The economist tells me it’s a statistical no-no to draw those inferences and he references your paper with John Carlin, “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors,” Perspectives on Psychological Science, 2014.

My question is: Is that statistical mistake a really subtle one and a common faux pas among economists – or should the OCC know better? If a lobbying group or a congressman made a rookie statistical mistake, I wouldn’t be surprised. But a federal agency?

The two papers in question are here and here.

My reply:

I don’t think it’s appropriate to talk about that 88 percent etc. for reasons discussed in pages 71-72 of this paper.

I can’t comment on the economics or the data, but if it’s just a question of summarizing the regression coefficient, you can’t say much more than to give the 95% confidence interval (estimate +/- 2 standard errors) and go from there.
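To see where numbers like that "88 percent chance" come from: they are what you get if you treat the point estimate and standard error as a flat-prior Bayesian posterior for the coefficient. A minimal sketch, using made-up numbers (an estimate of 3.5 percentage points with a standard error of 3.0 — illustrative only, not the actual values from either paper):

```python
from math import erf, sqrt

def normal_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical regression summary (illustrative, not from the paper):
estimate = 3.5  # percentage points
se = 3.0

# Flat-prior posterior: coefficient ~ normal(estimate, se).
p_positive = normal_cdf(estimate / se)              # P(increase of some amount)
p_at_least_3 = 1 - normal_cdf((3 - estimate) / se)  # P(increase >= 3 points)

print(f"P(coef > 0)  = {p_positive:.2f}")    # ~0.88
print(f"P(coef >= 3) = {p_at_least_3:.2f}")  # ~0.57

# The 95% interval (estimate +/- 2 se) makes the uncertainty explicit:
print(f"95% CI: ({estimate - 2*se:.1f}, {estimate + 2*se:.1f})")
```

The interval here runs from well below zero to well above it, which is exactly the situation where "88 percent chance of an increase" sounds much more solid than the data warrant: the probability statement inherits all the fragility of the noninformative prior behind it.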

Regarding your question: Yes, this sort of mistake is subtle and can be made by many people, including statisticians and economists. It’s not a surprise to see even a trained professional making this sort of error.

The general problem I have with noninformatively-derived Bayesian probabilities is that they tend to be too strong.

Yes, but even a 1 in 10^500 chance is just too great a risk. ;-)

This is probably a dumb question, but after reading many of your blog posts, I was wondering: should the prior be more informative and conservative the smaller the sample size is? For example, in a multiple linear regression, a small-n study would use a more informative, conservative prior like normal(0,1) on the regression parameters, while a large study would use a less informative normal(0,10). The prior recommendations would be scaled based on conservative conjectures about the effect size and the sample size. That way smaller studies have a ‘governor’ on them, and any real effect would have to be rather strong to show through. Are there general recommendations like that, or is it too case-by-case? It seems it wouldn’t be any more approximate or subjective than the assumptions in a typical power analysis. Is it not possible to recommend a general scale of priors for various sample sizes, given a plausible conservative effect size?

In general you want to use as much information as you’re comfortable with. So if you think normal(0,1) makes some sense, you probably should use it in both studies. On the other hand, it can be easier to convince people that “this thing is probably smaller in size than 10 or 20” than it is to convince people “this thing is probably smaller in size than 1 or 2”. In the higher-N study you can get away with the broader prior because the likelihood will dominate the posterior, and it’s less work to convince people your prior makes sense.

With the small N study you can’t afford to throw away the information you have that tells you normal(0,1) is ok.
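The point that "the likelihood will dominate the posterior" in the higher-N study can be made concrete with the conjugate normal-normal model with known data standard deviation. A sketch with made-up numbers (sample mean 2.0, data sd 5.0 — assumptions for illustration, not from any study discussed here): with a normal(0, prior_sd) prior on the effect, the posterior mean is the sample mean shrunk toward zero by the factor prior_sd^2 / (prior_sd^2 + sigma^2/n).

```python
def posterior_mean(ybar, sigma, n, prior_sd):
    """Posterior mean of the effect under a normal(0, prior_sd) prior
    and a normal likelihood with known data sd sigma (conjugate update)."""
    data_var = sigma**2 / n  # variance of the sample mean
    shrink = prior_sd**2 / (prior_sd**2 + data_var)
    return shrink * ybar

ybar, sigma = 2.0, 5.0  # hypothetical observed mean effect and data sd

# Small study (n=10): the prior scale matters a lot.
print(posterior_mean(ybar, sigma, n=10, prior_sd=1))   # heavy shrinkage toward 0
print(posterior_mean(ybar, sigma, n=10, prior_sd=10))  # nearly the raw estimate

# Large study (n=1000): the likelihood dominates under either prior.
print(posterior_mean(ybar, sigma, n=1000, prior_sd=1))
print(posterior_mean(ybar, sigma, n=1000, prior_sd=10))
```

With n=10 the posterior mean moves from roughly 0.57 to roughly 1.95 depending on which prior you pick; with n=1000 both priors land essentially on the raw estimate. This is the sense in which the broad prior is "cheap" in a large study and costly in a small one.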

A very interesting discussion. The issue ultimately turns on deeply human issues of innate distrust and the slipperiness of language.

Basically, a non-informative prior is your prior if you were born yesterday and all you know is this study. In that case, yes, the naïve statements by the economists are true. But most humans weren’t born yesterday and impose priors to guard against the myriad dishonesties and uncertainties in the world. (In this case, “dishonesties” = all the biases and shortcomings of academic research.) Humans innately need much more than a small amount of evidence that tips the scale one way or the other. We need something compelling to make us get up off our backsides and do something and flimsy evidence is not enough. Hence, humans innately apply conservative priors that require claimants to offer strong proof.

Another reason for conservative priors is that there is a fixed cost to taking any action and there are huge numbers of possible actions, so we need the benefits of any action to be clear and compelling. In this instance, there is a large fixed cost to crank up the regulatory machinery to move away from forced arbitration. Plus, if the regulators do this, they will probably forego doing something else which, hopefully, is worthwhile.

I think this is a basic flaw in the Trolley Problem. Trolley problems assume away error and dishonesty. For instance, the version where you push a fat person off a bridge to stop a train assumes you know for certain that doing so will save lives. But how do you actually know that? If a stranger runs up to you on the bridge and shouts that information to you, are you really going to believe it? Of course not. What if you “know” you will save lives because you looked down the track and saw people you think will be killed by the trolley? How certain are you? Probably not very certain. Trolleys rarely kill people. So the human prior is toward inaction when doing something so drastic. Humans naturally believe you need a damn good reason to go pushing people off bridges to their death. The Trolley Problem is a trick question because it asks you to make decisions assuming ridiculously high levels of certainty. Philosophers then go “Ah hah!” when people respond by applying moral principles based on a deep understanding of real-world uncertainty and dishonesty that the Trolley Problem assumes away.