Michael Spagat notifies me that his article criticizing the 2006 study of Burnham, Lafta, Doocy and Roberts has just been published. The Burnham et al. paper (also called, to my irritation (see the last item here), “the Lancet survey”) used a cluster sample to estimate the number of deaths in Iraq in the three years following the 2003 invasion. In his newly-published paper, Spagat writes:
[The Spagat article] presents some evidence suggesting ethical violations to the survey’s respondents including endangerment, privacy breaches and violations in obtaining informed consent. Breaches of minimal disclosure standards examined include non-disclosure of the survey’s questionnaire, data-entry form, data matching anonymised interviewer identifications with households and sample design. The paper also presents some evidence relating to data fabrication and falsification, which falls into nine broad categories. This evidence suggests that this survey cannot be considered a reliable or valid contribution towards knowledge about the extent of mortality in Iraq since 2003.
There’s also this killer “editor’s note”:
The authors of the Lancet II Study were given the opportunity to reply to this article. No reply has been forthcoming.
Now on to the background:
More than six-and-a-half years have elapsed since the US-led invasion of Iraq in late March 2003. The human losses suffered by the Iraqi people during this period have been staggering. It is clear that there have been many tens of thousands of violent deaths in Iraq since the invasion. . . . The Iraq Family Health Survey Study Group (2008a), a recent survey published in the New England Journal of Medicine, estimated 151,000 violent deaths of Iraqi civilians and combatants from the beginning of the invasion until the middle of 2006. There have also been large numbers of serious injuries, kidnappings, displacements and other affronts to human security.
Burnham et al. (2006a), a widely cited household cluster survey, estimated that Iraq had suffered approximately 601,000 violent deaths, namely four times as many as the IFHS estimate, during almost precisely the same period as covered by the IFHS study. The L2 data are also discrepant from data provided by a range of other reliable sources, most of which are broadly consistent with one another. Nonetheless, there remains a widespread belief in some public and professional circles that the L2 estimate may be closer to reality than the IFHS estimate.
But Spagat says no; he suggests "the possibility of data fabrication and falsification." He also documents some contradictory descriptions of the sampling methods, which are interesting enough that I will copy them here (they're from pages 11-12 of Spagat's article):
The L2 authors [Burnham et al.] have often dismissed the possibility of sampling bias by stating that they did not actually follow the sampling procedures that they claimed to have followed in their Lancet publication. For example, Burnham and Roberts (2006a) write that they had removed the following sentence from their description of their sampling methodology at the suggestion of peer reviewers and the editorial staff at the Lancet:
As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have. (Burnham and Roberts, 2006a)
Thus [according to Spagat], this part of the description of sampling methodology should have read:
The third stage consisted of random selection of a main street within the administrative unit from a list of all main streets. A residential street was then randomly selected from a list of residential streets crossing the main street. As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have. (Original text from Burnham et al., 2008, with new text italicised)
Combining this with Gilbert Burnham’s New Scientist interview already quoted (Biever, 2007) would imply that at each location:
(1) Field teams wrote names of main streets on pieces of paper and selected one street at random.
(2) The field teams then walked down this street writing down names of cross streets on pieces of paper and selected one of these at random.
(3) The field teams then became aware of all other streets in the area that did not cross the main avenues and may have selected one of these instead of one of the cross streets written on pieces of paper. This wider selection was done according to an undisclosed procedure.
The Biever (2007) description of Burnham does outline a sampling procedure that could have been followed and is broadly consistent with the published methodology. If other types of streets, beyond those that would be covered by the published methodology, were included in the sampling procedures then the authors need to specify how these streets were included. More fundamentally, how did the field teams discover the existence of such streets that could not be seen by walking down principal streets as described by Burnham in Biever (2007)?
The L2 field teams would not have brought detailed street maps with them into each selected area or else it would not have been necessary to walk down selected principal streets writing down names of surrounding streets on pieces of paper. We can also rule out the possibility that the teams completely canvassed entire neighbourhoods and built up detailed street maps from scratch in each location. Developing such detailed street maps would have been very time consuming and the L2 field teams had to follow an extremely compressed schedule that required them to perform 40 interviews in a day (Hicks, 2006).
In Giles (2007), an article in Nature, Burnham and Roberts suggested one possible explanation on how the field teams had managed to augment their street lists beyond streets that could be seen by walking down a main street, but this suggestion was rejected by an L2 field team member interviewed by Nature:
But again, details are unclear. Roberts and Gilbert Burnham, also at Johns Hopkins, say local people were asked to identify pockets of homes away from the centre; the Iraqi interviewer says the team never worked with locals on this issue. (Giles, 2007)
Even if locals had identified such ‘pockets of homes away from the centre’ the authors still would have to specify how these were included in the randomisation procedures. Indeed, involving local residents in selecting the streets to be sampled would seem to be at odds with the random selection of households. Locals could, for example, lead the survey teams to particularly violent areas.
Burnham and Roberts have induced further confusion about their sample design by issuing a series of contradictory statements.
The sites were selected entirely at random, so all households had an equal chance of being included. (Burnham et al., 2006b, emphasis added)
Our study team worked very hard to ensure that our sample households were selected at random. We set up rigorous guidelines and methods so that any street block within our chosen village had an equal chance of being selected. (Burnham and Roberts, 2006b, emphasis added)
. . . we had an equal chance of picking a main street as a back street. (The National Interest, 2006)
These statements contradict each other and the methodology published in the Lancet. Some streets are much longer than others. Some streets are much more densely populated than others. Such varied units cannot all have equal probability of selection. If, for example, every street block had an equal chance of selection then households on densely populated street blocks would have lower selection probabilities than households on a sparsely populated street block. If main streets are more densely populated on average than are back streets and main streets and back streets have equal selection probabilities then households on main streets would have lower selection probabilities than households on back streets.
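To make that last point concrete, here is a quick simulation, in Python, of the "equal chance per street" reading. The street names and household counts are invented for illustration; nothing here comes from the survey itself:

```python
import random
from collections import Counter

# A made-up neighbourhood: one long main street and two short back
# streets. The household counts are invented for illustration; they
# are not from the survey.
streets = {"main street": 200, "back street A": 20, "back street B": 20}

n_sims = 100_000
picks = Counter()
for _ in range(n_sims):
    # "Every street has an equal chance": choose one street uniformly.
    picks[random.choice(list(streets))] += 1

for name, n_households in streets.items():
    p_street = picks[name] / n_sims  # roughly 1/3 for each street
    # If one start household is then drawn uniformly on the chosen
    # street, each household's chance of being the start household is:
    p_household = p_street / n_households
    print(f"{name:14s}  P(street)={p_street:.3f}  P(household)={p_household:.5f}")
```

Each street comes up about a third of the time, so a household on the 200-household main street is roughly ten times less likely to be the start household than one on a 20-household back street. "Every street block had an equal chance" and "all households had an equal chance" cannot both be true unless all streets hold the same number of households.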
Spagat has clearly done a lot of work here and I haven’t read his paper in detail, nor have I carefully studied the original articles by Burnham et al. Also, some of Spagat’s criticisms seem less convincing than others. When I saw the graph on page 16 (in which three points fall suspiciously close to a straight line, suggesting at the very least some Mendel’s-assistant-style anticipatory data adjustment), I wondered whether these were just three of the possible points that could be considered. Investigative blogger Tim Lambert made this point last year, and having seen Lambert’s post, I don’t see Spagat’s page 16 graph as being so convincing.
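Lambert's point is a multiple-comparisons argument, and a toy simulation shows why it has force: if you are free to pick any three points from a larger set after the fact, a nearly-collinear triple will usually exist by chance alone. This sketch, with a point count and tolerance I chose arbitrarily, just counts how often:

```python
import itertools
import random

def dist(a, b):
    return ((a[0] - b[0])**2 + (a[1] - b[1])**2) ** 0.5

def nearly_collinear(p, q, r, tol=0.01):
    # Twice the triangle's area, via the cross product.
    area2 = abs((q[0]-p[0])*(r[1]-p[1]) - (q[1]-p[1])*(r[0]-p[0]))
    # Dividing by the longest side gives the triangle's height: the
    # distance from the remaining point to the line through the others.
    base = max(dist(p, q), dist(p, r), dist(q, r))
    return area2 / base < tol

n_points, n_sims, hits = 10, 2000, 0
for _ in range(n_sims):
    pts = [(random.random(), random.random()) for _ in range(n_points)]
    if any(nearly_collinear(*t) for t in itertools.combinations(pts, 3)):
        hits += 1
print(f"near-collinear triple found in {hits / n_sims:.0%} of simulations")
```

With ten points there are 120 candidate triples, so near-collinear triples turn up in most simulated datasets of pure noise; spotting one after scanning a graph is weak evidence of anything.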
In any case, I'm happy to see a high-profile survey subjected to this sort of scrutiny. When I looked at the Burnham paper a few years ago, I searched without success for details of their sampling and estimation procedures. But, as I wrote in response to Spagat in 2008, it's surprisingly difficult for people to write down exactly what they did (see also this discussion with Phil and others).
If Burnham et al. are giving contradictory descriptions of their sampling methods, this could be evidence of fraud, or evidence that they don’t fully understand cluster sampling (which actually is a complicated topic that lots of researchers have trouble with), or evidence that their sampling was a bit of a mess (which happens to the best of us) and that they didn’t do a great job explaining it. I hope the discussion surrounding Burnham, Spagat, etc., will push future survey researchers to describe their methods more explicitly.
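For readers who haven't worked with cluster samples, the estimate itself is the easy part; the trouble starts with the variance, because households in the same cluster are not independent. Here's a minimal sketch of the design-based calculation, using invented cluster counts (not the survey's data) and a simple ratio estimator:

```python
# Invented data: (deaths, person-years of exposure) for each of five
# sampled clusters. Illustrative numbers only, not from Burnham et al.
clusters = [(3, 950), (0, 1010), (12, 980), (1, 1000), (5, 990)]

deaths = sum(d for d, _ in clusters)
pyears = sum(y for _, y in clusters)

# Ratio estimator of the death rate: total deaths over total exposure.
rate = deaths / pyears

# Design-based variance: the cluster, not the household, is the
# sampling unit, so the variance comes from between-cluster residuals.
resid = [d - rate * y for d, y in clusters]
n = len(clusters)
var_cluster = (n / (n - 1)) * sum(r**2 for r in resid) / pyears**2

# Naive variance, pretending the deaths are independent events spread
# over the total exposure (i.e., ignoring the clustering entirely).
var_naive = deaths / pyears**2

print(f"estimated rate: {rate:.5f} deaths per person-year")
print(f"SE, cluster-based: {var_cluster**0.5:.5f}")
print(f"SE, ignoring clustering: {var_naive**0.5:.5f}")
print(f"design effect (variance ratio): {var_cluster / var_naive:.1f}")
```

With these made-up numbers, the between-cluster variation inflates the variance several-fold over the naive calculation. It's the clusters, not the raw household count, that drive the uncertainty, and that's exactly the sort of thing that's easy to garble when writing up one's methods.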
Beyond all this, it can be difficult to get people to respond to a survey. Countries such as the U.S. are saturated with junk polls (recall my recent rant on the topic) to the extent that hanging up on all survey interviewers is probably the optimal strategy for most.
Blame the statistics teachers
To some extent, the culprits here are not just Burnham, Lafta, Doocy and Roberts, but also statistics education in general. We do cover these topics in more detail in our classes on sample surveys than in our introductory courses, but in the statistics departments I've been involved with, very few students take such classes, and many statistics faculty (even those who should really know better) are unaware of the practical and conceptual difficulties of sampling human populations. This material is a lot more interesting than asymptotic theory, in my opinion, but it occupies a pretty small place in the statistics curriculum. As a result, you have people going out in the field, just winging it, then struggling later to define and justify what they've done.
P.S. Les Roberts, one of the authors of the disputed study, apparently teaches at Columbia! I don’t recall ever having met him, though. Perhaps we (the Applied Statistics Center and Center for the Study of Development Strategies) can throw a miniconference on the topic of Statistical Sampling in Developing Countries and invite Roberts to participate.