A couple people pointed me to this recent econometrics paper, which begins:

In the single IV model, current practice relies on the first-stage F exceeding some threshold (e.g., 10) as a criterion for trusting t-ratio inferences, even though this yields an anti-conservative test. We show that a true 5 percent test instead requires an F greater than 104.7. Maintaining 10 as a threshold requires replacing the critical value 1.96 with 3.43. We re-examine 57 AER papers and find that corrected inference causes half of the initially presumed statistically significant results to be insignificant. We introduce a more powerful test, the tF procedure, which provides F-dependent adjusted t-ratio critical values.

I don’t like this sort of thing, as it focuses on binary decisions in a way that seems inappropriate to me. To me, this sort of paper is the rough equivalent of a Talmudic argument about whether God can dig a ditch so wide he can’t jump across it. I just don’t buy into the premise, so it’s hard for me to go further. But maybe the paper is useful for people who work within that framework.

How do you quantify whether your instrument is too weak to use in a two-stage model?

The paper argues, though, that the AER papers under review should have reported Anderson-Rubin confidence sets (for a one-dimensional IV, this gives rather “sharp” inference), in addition to, or instead of, the tF procedure. AR-based inference won’t lead to “binary” inference or thinking. If an instrument is strong, the AR confidence set adapts to be an optimal confidence region; if it isn’t, it still provides a valid confidence region. There is probably a Bayesian take on this: we can view the AR criterion as a negative log-likelihood for the empirical IV moments (motivated by approximate normality), so the confidence set obtained by inverting the AR test is a credible region under a flat prior.
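The test-inversion idea here is mechanical enough to sketch in a few lines. Below is a minimal, illustrative construction of an AR confidence set for a single-instrument model: for each candidate beta0 on a grid, test whether the instrument is uncorrelated with the implied residual, and keep the candidates that survive. I use the LM form of the AR statistic (n times the R-squared from regressing the residual on the instrument), which is one standard asymptotic version; the function name and setup are my own, not from the paper.

```python
import numpy as np

def ar_confidence_set(y, x, z, beta_grid, crit=3.84):
    """Anderson-Rubin confidence set for a single-instrument IV model,
    built by test inversion: keep every candidate beta0 for which we
    cannot reject that the instrument z is uncorrelated with the
    structural residual y - beta0 * x.

    crit = 3.84 is the 95% chi-squared(1) critical value for the
    LM-form AR statistic n * R^2 used below."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])  # constant + instrument
    accepted = []
    for b0 in beta_grid:
        e = y - b0 * x                    # residual under H0: beta = b0
        coef, *_ = np.linalg.lstsq(Z, e, rcond=None)
        resid = e - Z @ coef
        r2 = 1.0 - resid @ resid / np.sum((e - e.mean()) ** 2)
        if n * r2 <= crit:                # fail to reject: b0 stays in the set
            accepted.append(b0)
    return np.array(accepted)
```

With a strong instrument, the accepted grid points form a short interval around the 2SLS estimate; with a weak one, the set widens (and can even be unbounded), which is exactly the adaptive behavior described above.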

Victor:

Thanks. What you say makes sense, in that, conditional on these methods being used, it makes sense for them to be used well. Also, I could imagine some of these ideas working their way into Bayesian IV. My take on Bayesian IV is that we need to think more seriously about the possibility of zero or negative effects on the intermediate variable. But I haven’t done such analyses myself; I’ve just talked about it.

Personally I think this is quite valuable, as I have reviewed a fair number of papers that take the “F > threshold; do 2SLS” approach. The literature supporting the thresholds is fairly thin, and this will end up in review comments asking for alternative procedures or permutation-based checks on 2SLS coverage.
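The screening quantity in that “F > threshold; do 2SLS” workflow is easy to make concrete. For a single instrument, the first-stage F is just the squared t-ratio on the instrument in the regression of the endogenous variable on a constant and the instrument. A minimal sketch, assuming homoskedastic errors (the function name is illustrative):

```python
import numpy as np

def first_stage_F(x, z):
    """First-stage F statistic for a single instrument: the squared
    t-ratio on z in the OLS regression of the endogenous regressor x
    on a constant and z (homoskedastic standard errors assumed)."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)      # error-variance estimate
    se = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
    return (coef[1] / se) ** 2
```

The abstract’s point is that comparing this quantity to 10 does not deliver a true 5 percent test: the corrected threshold is 104.7, or, keeping the F > 10 screen, the t-ratio critical value must rise from 1.96 to 3.43.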