Copying from Wikipedia but introducing an error in the process . . . how tacky is that??
I’ll discuss another minor outrage and then consider the more general question of what motivates researchers to plagiarize and otherwise break the rules of scholarship.
If you’re gonna steal from Wikipedia, remember to preserve formatting or you might end up embarrassing yourself
John Mashey pointed me to this:
From “Roadmap for optimization,” by Yasmin H. Said and Edward J. Wegman, from Wiley Interdisciplinary Reviews, Computational Statistics:
Klee and Minty3 developed a linear programming problem in which the polytope P is a distortion of a d-dimensional cube. In this case, the simplex method visits all 2d vertices before arriving at the optimal vertex. Thus the worst-case complexity for the simplex algorithm is exponential time.
From an old Wikipedia article on the simplex algorithm:
Klee and Minty gave an example of a linear programming problem in which the polytope P is a distortion of an n-dimensional cube. They showed that the simplex method as formulated by Dantzig visits all 2^n vertices before arriving at the optimal vertex. This shows that the worst-case complexity of the algorithm is exponential time.
I see four possibilities here:
1. Wegman and Said wrote the Wikipedia article based on their existing (but at that time unpublished) paper. This seems unlikely given that they are not cited in the Wikipedia article.
2. The Wikipedia author stole from Wegman and Said’s article.
3. Wegman and Said stole from the Wikipedia article or from some other similar source such as the article at wordiq.com.
4. Wikipedia and Wegman/Said stole from a common source.
I’m guessing it was 3. Why? For three reasons. First, Wegman and Said have a long track record of plagiarism from Wikipedia and elsewhere—and it’s standard procedure for plagiarists to make minor changes in the blocks of text that they copy without attribution. From the other direction, if the Wikipedia writer was going to steal, he or she could just steal verbatim—there’d be no reason to hide the copying.
Second, the above-linked Wikipedia edit predated the publication of the Wegman and Said article, so if the Wikipedia editor stole, he or she would’ve had to steal from an unpublished manuscript. Which doesn’t seem so likely, especially given that the Wegman and Said article is nothing like an authoritative source on the simplex algorithm.
The third reason why I suspect Wegman and Said of copying blocks of text without attribution (whether from Wikipedia or some other source) is that, as Mashey notes, they botched the math. Wikipedia correctly states that an n-dimensional cube has 2^n vertices. Wegman and Said think a cube has 2n vertices. And . . . here’s the kicker . . . the “2^n” in the Wikipedia article becomes “2n” if you cut and paste into plain text. (Go to the links and try it yourself.)
It’s also possible that Wegman and Said got the 2^d correct in their manuscript and then it was mistakenly changed to 2d by an ignorant copy editor. But, if that happened, it doesn’t explain the similarity to the Wikipedia article. It seems unlikely to me that the Wikipedian would steal from this obscure unpublished manuscript, make minor changes in the wording, correct a mistake, and then not bother to cite it.
Possibility 4 above also does not seem so likely. I guess it’s possible that Wegman/Said and the Wikipedia editor stole the above paragraph from the same textbook (with Wegman and Said taking the extra step of garbling the 2^n). But, given the other instances of Wikipedia borrowing on Wegman’s part, it seems more plausible to me that they took the easy way out and copied from the Wik. After all, stealing from a textbook is more work: you can’t just cut and paste, you actually have to type!
2^n != 2n
This is not the first time that Wegman and his crew have introduced errors as a byproduct of apparent plagiarism. As I wrote in that earlier discussion:
Doing the right thing is easy, easy, easy, easy. All you have to write is something like, “Scholar X wrote a clear summary of topic Y. We paraphrase Scholar X’s summary as follows…”
The only bad thing about this is . . . maybe people who read this will realize you’re not much of an expert, and maybe they’ll ask Scholar X to write that expert report instead. But that’s the honest thing to do. That, or become an expert yourself.
Let me say it again: There’s not much mystery to plagiarism. If you take the work of person X and claim it as yours, you get credit for that work. A common defense of plagiarists is that the work being copied without attribution is not so important. But, if so, how much would it hurt to write, “Scholar X wrote a clear summary of topic Y. We paraphrase Scholar X’s summary as follows…”? The answer is: it could hurt a lot, because it could quickly become obvious that you didn’t do the work, and then the question arises, why should you be considered the expert? Why indeed?
But if you’re not careful, you end up looking like an idiot who can’t tell the difference between 2^n and 2n. That sort of thing can happen, especially if you publish in journals that don’t have serious peer review.
[Note to Drs. Wegman and Said: You can replace “2^n” by “2n” only if n=1 or 2. I checked by following the principles of statistical computation and making a graph in R: curve(2^x - 2*x, from=-2, to=5). I know it’s a pain to do superscripts in Word, but next time you should really put in the effort to do it right.]
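If you don’t have R handy, here is a minimal sketch of the same check in Python (my choice for illustration; the note itself uses R). The function names are made up for this example, and the range of dimensions, 1 through 10, is an arbitrary assumption. It simply lists 2^n next to 2n and records where the two agree.

```python
def vertices_of_cube(n: int) -> int:
    """Number of vertices of an n-dimensional cube: 2 to the power n."""
    return 2 ** n


def garbled_count(n: int) -> int:
    """What the count turns into when the superscript is lost: 2 times n."""
    return 2 * n


# Positive integer dimensions (1..10, an arbitrary illustrative range)
# at which the two expressions happen to coincide.
agree = [n for n in range(1, 11) if vertices_of_cube(n) == garbled_count(n)]

# Print the two counts side by side to show how fast they diverge.
for n in range(1, 11):
    print(n, vertices_of_cube(n), garbled_count(n))

print("agree at n =", agree)  # agree at n = [1, 2]
```

By n=10 the cube has 1024 vertices while the garbled version claims 20, which is the difference between exponential and linear growth.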
Wiley Interdisciplinary Reviews, Computational Statistics
From the journal’s webpage:
Editors in Chief:
Edward J. Wegman, Bernard J. Dunn Professor of Data Sciences and Applied Statistics, George Mason University
Yasmin H. Said, Professor, George Mason University
David W. Scott, Noah Harding Professor of Statistics, Rice University
Editorial Advisory Board:
Jianqing Fan, Princeton University
Jerome H. Friedman, Stanford University
Michael Friendly, York University
Genshiro Kitagawa, Institute of Statistical Mathematics
Carlo N. Lauro, University of Naples “Federico II”
Jae C. Lee, Korea University
Xiao-Li Meng, Harvard University
James L. Rosenberger, Pennsylvania State University
Luke Tierney, University of Iowa
D. Michael Titterington, University of Glasgow
Antony Unwin, University of Augsburg
The editorial board has some big, big names in statistics (including two of my own coauthors!), and I don’t think any of them are complicit in the cheating that’s been going on. To clarify for the non-academics in the audience: being on an editorial advisory board doesn’t really mean anything (if you’re well known in some academic field, publishers will ask you to do it, and it’s easier to say yes than to say no), but, still, I don’t think I would want to remain on the board of a journal where two-thirds of the editors in chief are serial plagiarists.
A negative social value
As far as I can see, Wegman’s cut-and-paste jobs have no redeeming social value. Actually, they have a negative value: they steal from others’ writing and introduce errors. Bruno Frey is writing interesting articles and just wasting the time of people who might encounter them in different places. Doris Kearns Goodwin can argue that she’s fashioning others’ material into more readable prose. Frank Fischer can claim that he’s adding necessary background material to otherwise insightful articles and books. Even Mark Hauser can make the argument, tenuous as it may be, that his scientific theories are true and that he is guilty of nothing more than a debatable interpretation of data. But what can Wegman claim? The world is a better place because he copied and pasted other people’s material into empty review articles, adding errors along the way?
Wikipedia already exists, it’s free, and everyone knows where it is. What is contributed by copying bits of Wikipedia and adding errors? The point of academic publishing should be to add to knowledge, not to copy and degrade existing work. Just disgraceful. It pollutes our scientific discourse. There are enough honestly crappy papers out there—I’m sure I’ve written a few. No need to deliberately add noise to the system.
And, yes, if you write a review article on a topic on which you’re not an expert, that qualifies as noise. Unless you’re Doris Kearns Goodwin and can do a rewrite that’s more readable than the original.
The question then arises: why did Wegman do it? It’s not like Wiley Interdisciplinary Reviews was paying him a $500,000 advance for the article. It’s not like he needed it for tenure. It didn’t serve any political goal, nor do I think it helped him get a research grant. Why cheat when there was nothing to gain?
To get at an answer to this question, I’ll consider some of the other cheating cases we have discussed recently.
Why did they do it?
Here are my guesses:
Bruno Frey is interested in his academic status, which in his case is associated with having a high rate of publications in top journals. In his writings he’s expressed the belief that it’s common practice to manipulate the journal game, and so he does it too.
Doris Kearns Goodwin is proud of her writing ability and, when she found it convenient to take from somebody else’s book, did not want to acknowledge the plagiarism because she felt that a long quotation from a secondary source would add awkwardness to that passage of her book. (To keep the book smooth, her choice was to plagiarize or to rewrite entirely in her own words, and she didn’t have it in her to digest the stolen material and process it herself.)
Frank Fischer is a similar case. He wanted to include some background material but he was too lazy to digest it and write it in his own words. (Here, I’m not talking about changing a phrase or two here and there to avoid being caught in a google search, I’m talking about actually rewriting to give it from your own perspective.) Unlike Goodwin, he has (so far) refused to admit that he copied without sufficient attribution. Given the low stakes—I doubt Fischer’s books have much influence beyond his circle of friends and colleagues—and lack of continuing attention to the case, I suspect he’ll get away with denying it for the rest of his life if he feels like it.
Mark Hauser is a true believer. He’s a super-confident guy who is sure his theories are correct. He sees in the data the confirming evidence that he wants to see. At the same time, he knew he was breaking the rules and he knew that other people wouldn’t read his data the same way, which is why he didn’t want to share his tapes.
I know the least about the Diederik Stapel case, but from the level of fraud it sounds like he went beyond Hauser-type overzealousness. I have no idea whether his entire career was a fraud or whether he more recently got into the data-faking game. If the latter, his motivation may have been greed for fame and reputation, possibly also simple greed for money if he was planning to use the faked findings to get research grants.
Ed Wegman, the only statistician in the bunch, is the most interesting case to me. I see several different motivations for his plagiarism:
– He did the report for the U.S. Congress out of a sense of duty, then he and his students found themselves overwhelmed with the technical challenges and, in order to get the report out in time, copied and pasted. For political reasons they couldn’t bring themselves to admit they were copying from their scientific and ideological adversaries, and, in their efforts to cover up the plagiarism, they introduced errors which (inadvertently) revealed their cluelessness about the science they were purportedly discussing.
– The motivation for the followup report (the one in Computational Statistics and Data Analysis) was political; the plagiarism came up because Wegman and his coauthors wanted to summarize a research area they knew nothing about. They were too lazy to learn it themselves so they copied it.
– But what’s the motivation for Wegman to plagiarize in review articles, for chrissake? Here I wonder if it’s just a vague sense of duty to the profession, that Wegman feels he’s supposed to write these articles, but again he’s too lazy to actually do it.
But . . . then why doesn’t he just cite the sources he’s stealing from? It’s not like the congressional report where he’s copying and distorting from the opposition. I have no idea why he doesn’t just do exact copies with quotation marks. I’ll say this, though: his desire to avoid looking like he’s using the work of others appears to be greater than his desire to avoid scientific errors in his manuscripts.
Wegman’s is a fascinating case, in that he’s breaking the rules, destroying his own reputation, and not getting anything out of it personally. He already was well respected with a comfortable job, and the cut-and-paste jobs were bringing him neither fame nor fortune. The sad thing is that I think Wegman may have done it out of a sense of obligation to his country, his profession, and his students. He promised more than he had the ability or inclination to do, and then he didn’t see any reasonable way of backing out. I hope that at some point he has the decency to apologize to the people whose work he ripped off and distorted.
The above imputed motivations are all just guesses. I met Bruno Frey once and exchanged a couple of emails (I liked the guy but I wouldn’t say I know him at all), and the others I’ve never met. So really I’m using these cases as an opportunity for some general speculations.
P.P.S. Wegman is listed as the “Bernard J. Dunn professor” at George Mason University. Unfortunately, the very accomplished Dunn (check out his Wikipedia page!) died 2 1/2 years ago. I wonder what he would think if he knew that his donation to the university went to paying the salary of the author of papers that bear a striking similarity to, but are worse than, Wikipedia articles.
P.P.P.S. You might very well ask why I keep writing about this. I write about it because it makes me mad. Being a scientist is a privilege, and academic jobs are hard to find. So, yeah, I get angry if repeat offenders can get good jobs, just because they’re connected. It looks a bit like the mob, or like Tammany Hall. Maybe a bit of graft is needed to grease the system, but one way of keeping the corruption under control is to point it out when we see it.
Also, I work hard in my research, and I don’t appreciate those people who spew out meaningless nonsense (or, even worse, fake their data) making it that much harder to find the signal amidst the noise.
I’d certainly be annoyed if someone were to ever take work of mine, introduce errors, and then publish it as if it were his own idea.