Thomas Basbøll points to this ten-year-old article from Anne-Wil Harzing on the consequences of sloppy citations. Harzing tells the story of an unsupported claim that is contradicted by published data but has been presented as fact in a particular area of the academic literature. She writes that “high expatriate failure rates [with “expatriate failure” defined as “the expatriate returning home before his/her contractual period of employment abroad expires”] were in fact a myth created by massive misquotations and careless copying of references.” Many papers claimed an expatriate failure rate of 25-40% (according to Harzing, this is much higher than the actual rate as estimated from empirical data), with this overly high rate supported by a complicated chain of references leading to . . . no real data.
Harzing reports the following published claims:
Harvey (1996: 103): ‘The rate of failure of expatriate managers relocating overseas from United States based MNCs has been estimated to range between 25–40 per cent (Tung, 1982, 1988; Mendenhall and Oddou, 1985; Gray 1991; Wedersphan, 1992; Solomon, 1994; Dowling, Schuler and Welch, 1994; Swaak, 1995).’
Shay and Bruce (1997: 30): ‘Cross-industry studies have estimated US expatriate failure, defined as premature return from an overseas assignment, at between 25–40 per cent for developed countries (Baker and Ivancevich, 1971; Tung, 1981).’
Ashamalla (1998: 54): ‘According to a number of recent [emphasis added] studies, the rate of failure among American expatriates ranges from 25 to 40 per cent depending on the location of the assignment (Dumaine, Fortune, 1995; McDonald, 1993; Ralston et al., 1995).’
Harzing writes that, despite the profusion of references which appear to show multiple confirmations, these claims all come from a single publication from 1979, which itself gives no source for its numbers.
If you believe Harzing on this (and I see no reason not to), this is not about one or two sloppy researchers; rather, it seems to be general practice for people to throw in citations without reading the original articles, thus creating a statistical problem: the apparent N is inflated by treating multiple instances of the same claim as if they were independent pieces of supporting evidence.
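The statistical problem can be made concrete with a toy calculation (all numbers hypothetical, not from Harzing's article): if one primary study's estimate is copied k times and the copies are naively treated as independent replications, the apparent standard error shrinks by a factor of sqrt(k), even though the true uncertainty never changes.

```python
import math

# Hypothetical scenario: a single primary study reports a 30% failure
# rate with a standard error of 8 percentage points. Later papers copy
# the claim through chains of citations.
primary_se = 0.08

def apparent_se(se, n_citations):
    """Standard error a naive reader infers if n_citations secondary
    papers are mistakenly treated as independent replications."""
    return se / math.sqrt(n_citations)

for k in (1, 5, 10, 20):
    print(f"{k:2d} 'independent' sources -> apparent SE = {apparent_se(primary_se, k):.3f}")

# There is still only one data source, so the real SE stays 0.08 no
# matter how many times the claim is re-cited.
```

The point of the sketch is just that citation-copying masquerades as replication: the nominal precision grows with the number of citing papers while the evidential base stays fixed at one study.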
I never thought of this as a statistical problem before! Thanks.
Being a statistician who ended up in public health, I can only confirm that kind of “evidence” creation. In one case, concerning the prevalence of a not-too-rare disease (I can’t give you a percentage because nobody knows the true number), the scientific “authority” on this disease published an article roughly every other year, each giving the same range and citing the previous article. Somewhere in the eighties a second strand of literature developed, giving a very similar value. It turned out that the second strand originated from the first and had just taken one value within the cited range. In the end, the whole literature could be traced back to a study from the 1950s of questionable methodology. By not citing the original source, a fake timeliness of the data was created, and, in my view, quite delicately.
I thought this was something drilled into most people in grad school: only cite data from its primary source. I guess the explosion of meta-analysis makes that harder in some fields. It leads to other hilarities (like continued citation of a long-improved estimate), but at least you know where it’s coming from.
Neat (though unfortunately I have a few times run down nested references to find a misquote).
Here is a nasty example where the SAS programmer misquoted Greenacre’s publications, claiming that there were two adjustments (there was only ever one) and identifying the non-existent one as the preferred adjustment (the error was like confusing sigma with sigma squared). Numerous articles and books referenced that preference, and as it added non-existent structure to the visualization, there is a nice collection of esteemed academics making sense of artifacts.
Can anyone discern that kind of error in the note and SAS bug fix (which, I must say, was extremely prompt and immediately implemented)? For various reasons it took me almost two years to actually find out about the error.
On a related note, this also reminds me of when (some) academics simply made up their citations, as it used to be very hard to verify. The dean at my university at the time bought an early version of a citation-tracking software package and had much fun listening to those who claimed 100+ citations explain why this service had found only 5 (one real case, anyway).
So perhaps a bot to automatically find and scan for key phrases in all references in submitted papers before they are sent out for review!
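A minimal sketch of what such a bot's first pass might look like (function names and example text are hypothetical, and simple substring matching is obviously much cruder than anything a real screening tool would do): given key phrases from a claim in a submission and the text of the cited source, flag citations whose phrases cannot be found, so a human reviewer knows where to look first.

```python
def supports_claim(claim_keyphrases, reference_text):
    """Crude screen: does the cited text contain each key phrase of the
    claim? A miss proves nothing either way; it only flags the citation
    for human review."""
    text = reference_text.lower()
    return {phrase: phrase.lower() in text for phrase in claim_keyphrases}

# Hypothetical example: a paper cites a source for a 25-40% failure rate,
# but the cited text never mentions failure rates at all.
cited_source = "Survey responses on selection procedures for overseas assignments ..."
report = supports_claim(["failure rate", "25"], cited_source)
flagged = [phrase for phrase, found in report.items() if not found]
print("phrases not found in cited source:", flagged)
```

Even this crude screen would have caught the expatriate-failure chain, since several of the cited papers apparently contain no failure-rate data at all.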
So this can have a serious effect on the journal impact factor?
I have no problem with the concept that some authors have not read a cited paper. In some cases I am not even sure that they read the abstract. I clearly remember tracking down one reference from a peer-reviewed study on the effects of bicycle helmets, only to find that the reference was to things like elbow pads in downhill skiing.
While not peer-reviewed papers (neither mine nor the one critiqued), this paper points out that there is some sloppy referencing and use of information. There were errors, omissions or inaccuracies in 10 of 22 references.
A Critique of the Canadian Academy of Sport Medicine
Position Paper: Mandatory Use of Bicycle Helmets
Blast it, I keep messing up the links. Raw link is http://www.vehicularcyclist.com/casm.doc
“False facts are highly injurious to the progress of science, for they often endure long; but false views, if supported by some evidence, do little harm, for every one takes a salutary pleasure in proving their falseness.”
C.R. Darwin, The descent of man
I can recall another instance of this in the biomedical literature–the claim that “only 50% of cases of coronary heart disease are attributable to established risk factors.” (meaning smoking, cholesterol, hypertension). Magnus and Beaglehole wrote a nice paper in Archives of Internal Medicine (2001;161:2657-60, available at http://archinte.ama-assn.org/cgi/content/full/161/22/2657) describing their failure to find any empirical evidence for this claim, despite its being consistently cited in the literature. A quote from their paper parallels that of Harzing above: “The only-50% claim goes back at least to 1975, and we could find no reference or published source that plausibly supports it with empirical data. The claim consists of simple assertions with no supporting data or rationale, or assertions made with inappropriate citations. In both cases, the claim has been secondarily quoted, perpetuating the myth.” [I removed the citations from the quote for clarity].
What I really love about that article is that she presents 12 rules for good referencing and shows how the received sense of the expat failure rate is maintained by breaking every last one of them.
Thomas Basbøll seems to have a nice supply of interesting things in his files, and I hope he will be an occasional feature here.
Reminds me of this fabulous bit of science, How citation distortions create unfounded authority: analysis of a citation network, by Steven Greenberg, based on a Google-style graph-theoretical analysis of citations.
From the abstract:
A complete citation network was constructed from all PubMed indexed English literature papers addressing the belief that β amyloid, a protein accumulated in the brain in Alzheimer’s disease, is produced by and injures skeletal muscle of patients with inclusion body myositis. Social network theory and graph theory were used to analyse this network…
Unfounded authority was established by citation bias against papers that refuted or weakened the belief; amplification, the marked expansion of the belief system by papers presenting no data addressing it; and forms of invention such as the conversion of hypothesis into fact through citation alone.
I love the idea of letting Google do our systematic reviews!
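The core of such a citation-network analysis can be caricatured in a few lines: represent citations as a directed graph and trace every claim back to its roots, the papers that cite nothing further, then ask whether any root actually contains primary data. The graph below is a hypothetical miniature modeled loosely on the expatriate-failure example, not real citation data.

```python
from collections import deque

# Hypothetical citation graph: paper -> papers it cites for the claim.
cites = {
    "H1996": ["T1982", "M1985"],
    "S1997": ["B1971", "T1981"],
    "T1982": ["T1979"],
    "M1985": ["T1979"],
    "B1971": ["T1979"],
    "T1981": ["T1979"],
    "T1979": [],          # cites nothing: the chain ends here
}
has_primary_data = {"T1979": False}  # the root source reports no data

def primary_sources(paper):
    """Follow citation chains from `paper` to their roots via BFS."""
    roots, seen, queue = set(), {paper}, deque([paper])
    while queue:
        p = queue.popleft()
        refs = cites.get(p, [])
        if not refs:
            roots.add(p)
        for r in refs:
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return roots

for start in ("H1996", "S1997"):
    roots = primary_sources(start)
    print(start, "ultimately rests on", roots,
          "| any primary data?",
          any(has_primary_data.get(r, False) for r in roots))
```

Two seemingly independent literatures collapse onto the same data-free root, which is exactly the amplification pattern Greenberg documents at scale.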
I can’t resist, so at the risk of reminding Andrew of a certain statistician, Strange Scholarship in the Wegman Report devoted Appendix W.8 (pp.165-186) to analysis of the bizarre scholarship, in which 40 of 80 references were not actually ever cited. The real references often contradicted the claims and some of them cannot have been read. My favorite was (p.180):
‘Valentine, Tom (1987) “Magnetics may hold key to ozone layer problems,” Magnets, 2 (1) 18-26.’
This uncited reference alone raises a serious question of basic scholarly competence. It is utterly bizarre, especially in a report criticizing the quality of review elsewhere. I could not find an online copy, but a 1987 ozone article is at best irrelevant bibliography-padding.
“MAGNETS In Your Future” was an obscure fringe-science magazine, for which Valentine wrote articles and later served as Editor. He had a long history of writing on fuel-less engines, psychic surgery (books; see Amazon) and conspiracy theories for a tabloid, The National Tattler. His bio states of that work:
“(Miracle editor—had to come up with a miracle a week!)”
Some examples and background are:
web.archive.org/web/20050208000510/tomvalentine.com/html/about_tom1.html his Biography
http://www.rexresearch.com/evgray/1gray.htm#1 “Man Creates Engine That Consumes No Fuel…”
http://www.rexresearch.com/elxgnx/elxgenx.htm “electrogenic agriculture”
http://www.rexresearch.com/nemes/1nemes.htm#magnets invention suppression
His later talk show often promoted “black helicopters” conspiracies:
For more discussion, and credits to various people, see:
The WR’s authors should be asked if someone else gave them this reference, if they found it themselves, if anyone actually read it, why it is mentioned at all, and why it is labeled an academic paper alongside papers in Science or Nature.
There is a nice paper on this sort of distortion by Vicente and Brewer:
Vicente, K.J. & Brewer, W.F. (1993). Reconstructive remembering of the scientific literature. Cognition, 46, 101–128.
I wrote a short piece (partly) about it in the context of statistical rules/myths: