My Columbia political science colleague shares “What Has Been Learned from the Deworming Replications: A Nonpartisan View”:
Last month there was another battle in a dispute between economists and epidemiologists over the merits of mass deworming. In brief, economists claim there is clear evidence that cheap deworming interventions have large effects on welfare via increased education and ultimately job opportunities. It’s a best buy development intervention. Epidemiologists claim that although worms are widespread and can cause illnesses sometimes, the evidence of important links to health is weak and knock-on effects of deworming to education seem implausible. . . .
So. Deworming: good for educational outcomes or not?
You’ll have to click through to read the details, but here’s Macartan’s quick summary:
The conclusions that I take away though are that (a) the magnitude and significance of spillover effects are in doubt because of the measurement issues and the inference issues; (b) the inferences on the main effects are also in doubt because of the problems with identification and explanation. Neither of the main claims is demonstrably incorrect, but there are good grounds to doubt both of them.
What about policy? Macartan continues:
A number of commentators have argued that the policy implications are more or less unchanged. This includes organizations that focus specifically on the evidence base for policy (such as CGD and GiveWell).
Perhaps the most important point of confusion is what policy conclusions this discussion could affect. Many are defending deworming for non-educational reasons. But the discussion of the MK [Miguel and Kremer] paper really only matters for the education motivation. And perhaps primarily for the short-term school attendance motivation. Like much other literature in this area it finds only weak evidence for direct health benefits (beyond the strong evidence for the removal of worms). It also does not claim to find evidence on actual performance. Although many groups endorse deworming for health reasons, and rank it as a top priority, this, curiously, goes against the weight of evidence as summarized in the Cochrane reports at least. If the consensus for deworming for health reasons still stands it is not because of this paper.
Does the challenge to this paper weaken the case for deworming for educational reasons? I find it hard to see how it cannot.
I have a few comments of my own, not on deworming—I know nothing about that—but on some of the statistical points raised by Macartan’s post.
– The 800-pound gorilla in the room is opportunity cost, or cost-benefit analysis. As you say, who could be against de-worming kids? I’m reminded of Jeff Sachs’s argument that all of these sorts of interventions are worth doing, and that rather than trying so hard to rank the cost-effectiveness of different health and economic interventions, the rich countries should just kick in that 1% of GDP or whatever and do all of them. I’m not saying Sachs is necessarily right on this, I’m just saying that most of the discussion seems to be on traditional statistical grounds (Is there an effect? Is it statistically significant? Has it been proven beyond a reasonable doubt?) and the cost-benefit or opportunity cost calculations are implicit. Once or twice, cost-benefit calculations do get done, but not in a serious way. For example, Macartan points to a “60 to 1” benefit-to-cost ratio for deworming claimed by the Copenhagen Consensus, but apparently those guys just took the point estimate of effectiveness (which is a biased estimate, possibly hugely biased; see more on this below) and ran with it.
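To make the point about running with a point estimate concrete: if benefits scale roughly linearly with the effect size, any exaggeration in the estimate passes straight through to the benefit-to-cost ratio. Here is a back-of-the-envelope sketch; the 60:1 figure is from the post, but the exaggeration factor is a made-up number for illustration only.

```python
# Hypothetical illustration: a benefit-to-cost ratio inherits any bias in the
# effect estimate it is built on. The 60:1 ratio comes from the Copenhagen
# Consensus figure quoted above; the exaggeration factor is invented here.

claimed_ratio = 60          # benefit-to-cost ratio built on the raw point estimate
exaggeration_factor = 5     # hypothetical type M exaggeration of that estimate

# If benefits are proportional to the effect, the ratio scales down the same way.
adjusted_ratio = claimed_ratio / exaggeration_factor
print(f"ratio using the raw point estimate: {claimed_ratio}:1")
print(f"ratio if the estimate is {exaggeration_factor}x too large: "
      f"{adjusted_ratio:.0f}:1")
```

Even under this crude linear assumption, the intervention might still pass a cost-benefit test, but at 12:1 rather than 60:1 the ranking against other interventions could easily change.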
– Macartan talks about multiple comparisons, which is fine (though I’d prefer hierarchical modeling rather than classical corrections; see here and here). Macartan also mentions the statistical significance filter: statistically significant estimates tend to overestimate the magnitude of true effects (we call it the type M error, or exaggeration factor, here). This can be a big deal, especially once things get to the decision stage.
– Macartan mentions development economist Paul Gertler. I’ve only encountered his work once, and it was a case where he hyped and exaggerated (unintentionally, I’m sure) an effect size. I contacted him about it and asked him if he was concerned about the statistical significance filter, and he did not reply. Apparently he was happy reporting an overestimate. It was an early-childhood intervention experiment in Jamaica. Again, who could object to helping poor kids?
– I share Macartan’s skepticism about the spillovers. One problem here is that researchers have an incentive to make a “discovery.” Deworming helps kids, ok, that’s fine. But a spillover effect, that’s news. The paradox is that these surprising findings are the ones most subject to the statistical significance filter: the headline claims can be the biggest overestimates. And this is completely consistent with the calculation in section 3.4.1 of Macartan’s report. It is similar to the calculation that Eric Loken and I did regarding the notorious claim that women in a certain part of their monthly cycle were more likely to wear red. The researchers were proud of making this discovery with such a noisy measuring instrument, but if you back out how large the effect would’ve had to be to show up in their data, it would have to be unrealistically huge. And of course this happened with that horrible LaCour study—the claimed effects in the aggregate implied huge effects in the subgroup of the population who would’ve been affected by the treatment.
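The “back out the implied effect” reasoning can be sketched numerically. With a small sample and a rare outcome, the smallest difference in proportions that could reach statistical significance is already enormous relative to the base rate, so any “significant” finding implies an implausibly huge effect. The group sizes and base rate below are hypothetical, not taken from any actual study.

```python
# Hypothetical back-of-the-envelope version of the "implied effect" calculation:
# what is the smallest difference in proportions that would be statistically
# significant, given small groups and a rare outcome? Numbers are made up.
import math

n1, n2 = 30, 70     # hypothetical group sizes
p = 0.10            # hypothetical base rate of the behavior

# Standard error of a difference in two proportions, evaluated at the base rate.
se_diff = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

# The smallest difference that clears the 5% two-sided significance threshold.
min_significant_diff = 1.96 * se_diff
print(f"smallest 'significant' difference: {min_significant_diff:.2f} "
      f"(vs. a base rate of {p})")
```

With these numbers the rate would have to more than double just to register as significant, which is the sense in which a “significant” result from a noisy instrument implies an unrealistically large effect.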
– I don’t like Macartan’s section 4.2, “Can we be a bit more Bayesian?” I guess I’d like him to be a bit more Bayesian. In particular, I really don’t like the sort of binary thinking in which deworming works or doesn’t work for some purpose. To me, the concern is not that deworming or whatever is a “dud” but rather that it is not as effective as the published record might suggest. For a Bayesian decision analysis I’d prefer to do it straight, with costs, benefits, and a continuous parameter that represents the effectiveness of the treatment. Even setting the decision analysis aside, you can do Bayesian inference: just say there’s a true (population, average) causal effect and that you have a prior for it. Then it’s simple inference, an inverse-variance weighted average of the data and the prior information, no need for tricky probability formulas.
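The inverse-variance weighting mentioned above is just the conjugate normal-normal update, and it fits in a few lines. The sketch below uses hypothetical numbers: a skeptical prior centered at zero and a noisy but nominally “significant” estimate.

```python
# Minimal sketch of the Bayesian inference described above: with a normal prior
# on a continuous treatment effect and a normal likelihood for the estimate,
# the posterior mean is an inverse-variance weighted average of the prior mean
# and the data. All numbers are hypothetical.

def posterior_normal(prior_mean, prior_sd, est, est_se):
    """Conjugate normal-normal update for a single estimate."""
    w_prior = 1 / prior_sd**2        # precision of the prior
    w_data = 1 / est_se**2           # precision of the data estimate
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, post_var**0.5

# Hypothetical: skeptical prior centered at 0 (sd 0.1), and an estimate of 0.5
# with standard error 0.2 (significant at conventional levels on its own).
mean, sd = posterior_normal(prior_mean=0.0, prior_sd=0.1, est=0.5, est_se=0.2)
print(f"posterior mean {mean:.3f}, posterior sd {sd:.3f}")
```

Under these assumptions the “significant” estimate of 0.5 shrinks to a posterior mean of 0.1: the continuous analysis gives a graded answer (probably a modest effect) rather than a binary works/doesn’t-work verdict.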
Finally, I appreciate the way that, in his report, Macartan moves back and forth between the details and the big questions. These connections are a key part of any methodological analysis.