Plagiarism means never having to say you’re clueless.

In a blog discussion on plagiarism ten years ago, Rahul wrote:

The real question for me is, how I would react to someone’s book which has proven rather useful and insightful in all aspects but which in hindsight turns out to have plagiarized bits. Think of whatever textbook, say, you had found really damn useful (perhaps it’s the only good text on that topic; no alternatives) and now imagine a chapter of that textbook turns out to be plagiarized.

What’s your reaction? To me that’s the interesting question.

It is an interesting question, and perhaps the most interesting aspect to it is that we don’t actually see high-quality, insightful plagiarized work!

Theoretically such a thing could happen: an author with a solid understanding of the material finds an excellent writeup from another source—perhaps a published article or book, perhaps something on wikipedia, maybe something written by a student—and inserts it directly into the text, not crediting the source. Why not credit the source? Maybe because all the quotation marks would make the resulting product more difficult to read, or maybe just because the author is greedy for acclaim and does not want to share credit. Greed is not a pretty trait, but, as Rahul writes, that’s a separate issue from the quality of the resulting product.

So, yeah, how to think about such a case? My response is that it’s only a hypothetical case, that in practice it never occurs. Perhaps readers will correct me in the comments, but until that happens, here’s my explanation:

When we write, we do incorporate old material. Nothing we write is entirely new, nor should it be. The challenge is often to put that old material into a coherent framework, which requires some understanding. When authors plagiarize, they seem to do this as a substitute for understanding. Reading old material and integrating it into the larger story takes work. If you insert chunks of others’ material verbatim and credit them, it becomes clear that you haven’t digested it all; not acknowledging the source is a way of burying that meta-information. To flip it around: as a reader, that hypertext—being able to trace back to the original source—can be very helpful. Plagiarists don’t want you to be aware of the copying in large part because they don’t want to reveal that they have not put the material together themselves.

To use statistical terminology, plagiarism is a sort of informative missingness: the very fact that the use of outside material has not been acknowledged provides information that the copyist has not fully integrated it into the story. That’s why Basbøll and I referred to plagiarism as a statistical crime. Not just a crime against the original author—and, yeah, as someone whose work has been plagiarized, it annoys me a lot—but also against the reader. As we put it in that article:

Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work.

To return to Rahul’s question above: have I ever seen something “useful and insightful” that’s been plagiarized? In theory it could happen: just consider an extreme example such as an entirely pirated book. Take a classic such as Treasure Island, remove the name Robert Louis Stevenson and replace it with John Smith, and it would still be a rollicking good read. But I don’t think this is usually what happens. The more common story would be that something absolutely boring is taken from source A and inserted without checking into document B, and no value is added in the transmission.

To put it another way, start with the plagiarist. This is someone who’s under some pressure to produce a document on topic X but doesn’t fully understand the topic. One available approach is to plagiarize the difficult part. From the reader’s perspective, the problem is that the resulting document contains undigested material: the copied part could actually be in error or could be applied incorrectly. By not disclosing the source, the author is hiding important information that could otherwise help the reader better parse the material.

If I see some great material from another source, I’ll copy it and quote it. Quotations are great!

Music as a counterexample

In his book, It’s One for the Money, music historian Clinton Heylin gives many examples of musicians who’ve used material from others without acknowledgment, producing memorable and sometimes wonderful results. A well-known example is Bob Dylan.

How does music differ from research or science writing? For one thing, “understanding” seems much more important in science than in music. Integrating a stolen riff into a song is just a different process than integrating an explanation of a statistical method into a book.

There’s also the issue of copyright laws and financial stakes. You can copy a passage from a published article, with quotes, and it’s no big deal. But if you take part of someone’s song, you have to pay them real money. So there’s a clear incentive not to share credit and, if necessary, to muddy the waters to make it more difficult for predecessors to claim credit.

Finally, in an academic book or article it’s easy enough to put in quotation marks and citations. There’s no way to do that in a song! Yes, you can include it in the liner notes, and I’d argue that songwriters and performers should acknowledge their sources in that way, but it’s still not as direct as writing, “As X wrote . . .”, in an academic publication.

What are the consequences of plagiarism?

There are several cases of plagiarism by high-profile academics who seem to have suffered no consequences (beyond the occasional embarrassment when people like me bring it up or when people check them on wikipedia): examples include some Harvard and Yale law professors and this dude at USC. The USC case I can understand—the plagiarist in question is a medical school professor who probably makes tons of money for the school. Why Harvard and Yale didn’t fire their law-school plagiarists, I’m not sure; maybe it’s a combination of “Hey, these guys are lawyers, they might sue us!” and a simple calculation along the lines of: “Harvard fires prof for plagiarism” is an embarrassing headline, whereas “Harvard decides to do nothing about a plagiarist” likely won’t make it into the news. And historian Kevin Kruse still seems to be employed at Princeton. (According to Wikipedia, “In October 2022, both Cornell, where he wrote his dissertation, and Princeton, where he is employed, ultimately determined that these were ‘citation errors’ and did not rise to the level of intentional plagiarism.” On the plus side, “He is a fan of the Kansas City Chiefs.”)

Other times, lower-tier universities just let elderly plagiarists fade away. I’m thinking here of George Mason statistician Ed Wegman and Rutgers political scientist Frank Fischer. Those cases are particularly annoying to me because Wegman received a major award from the American Statistical Association and Fischer received an award from the American Political Science Association—for a book with plagiarized material! I contacted the ASA to suggest they retract the award and I contacted the APSA to suggest that they share the award with the scholars who Fischer had ripped off—but both organizations did nothing. I guess that’s how committees work.

We also sometimes see plagiarists get canned. Two examples are Columbia history professor Charles Armstrong and Arizona State historian Matthew Whitaker. Too bad for these guys that they weren’t teaching at Harvard, Yale, or Princeton, or maybe they’d still be gainfully employed!

Outside academia, plagiarism seems typically to have more severe consequences.

Journalism: Mike Barnicle, Stephen Glass, etc.
Pop literature: that spy novelist (also here), etc.

Lack of understanding

The theme of this post is that, at least regarding academics, plagiarism is a sign of lack of understanding.

A common defense/excuse/explanation for plagiarism is that whatever had been copied was common knowledge, just some basic facts, so who cares if it’s expressed originally? This is kind of a lame excuse given that it takes no effort at all to write, “As source X says, ‘. . .’” There seems little doubt that the avoidance of attribution is there so that the plagiarist gets credit for the words. And why is that? It has to depend on the situation—but it doesn’t seem that people usually ask the plagiarist why they did it. I guess the point is that you can ask the person all you want, but they don’t have to reply—and, given the record of misrepresentation, there’s no reason to expect a truthful answer.

But, yeah, sometimes it must be the case that the plagiarist understands the copied material and is just being lazy/greedy.

What’s interesting to me is how often it happens that the plagiarist (or, more generally, the person who copies without attribution) evidently doesn’t understand the copied material.

Here are some examples:

Weggy: copied from wikipedia, introducing errors in the process.

Chrissy: copied from online material, introducing errors in the process; this example of unacknowledged copying was not actually plagiarism because it was stories being repeated without attribution, not exact words.

Armstrong (not the cyclist): plagiarizing material by someone else, in the process getting the meaning of the passage entirely backward.

Fischer (not the chess player): OK, I have to admit, this one was so damn boring I didn’t read through to see if any errors were introduced in the copying process.

Say what you want about Mike Barnicle and Doris Kearns Goodwin, but I think it’s fair to assume that they did understand the material they were ripping off without attribution.

In contrast, academic plagiarists seem to copy not so much out of greed as from laziness.

Not laziness as in, too lazy to write the paragraph in their own words, but laziness as in, too lazy to figure out what’s going on—but it’s something they’re supposed to understand.

That’s it!

You’re an academic researcher who is doing some work that relies on some idea or method, and it’s considered important that you understand it. This could be a statistical method being used for data analysis, it could be a key building block in an expository piece, it could be some primary sources in historical work, something like that. Just giving a citation and a direct quote wouldn’t be enough, because that wouldn’t demonstrate the required understanding:
– If you’re using a statistical method, you have to understand it at some level or else the reader can’t be assured that you’re using it correctly.
– In a tutorial, you need to understand the basics, otherwise why are you writing the tutorial in the first place?
– In historical work, often the key contribution is bringing in new primary sources. If you’re not doing that, a lot more of a burden is placed on interpretation, which maybe isn’t your strong point.

So, you plagiarize. That’s the only choice! OK, not the only choice. Three alternatives are:
1. Don’t write and publish the article/book/thesis. Just admit you have nothing to add. But that would be a bummer, no?
2. Use direct quotes and citations. But then there may be no good reason for anyone to want to read or publish the article/book/thesis. To take an extreme example, is Wiley Interdisciplinary Reviews going to publish a paper that is a known copy of a wikipedia entry? Probably not. Even if your buddy is an editor of the journal, he might think twice.
3. Put in the work to actually understand the method or materials that you’re using. But, hey, that’s a lot of effort! You have a life to lead, no? Working out math, reading obscure documents in a foreign language, actually reading what you need to use, that would take effort! OK, that’s effort that most of us would want to put in, indeed that’s a big reason we became academics in the first place: we enjoy coding, we enjoy working out math, understanding new things, reading dusty old library books. But some subset of us doesn’t want to do the work.
If, for whatever reason, you don’t want to do any of the above three options, then maybe you’ll plagiarize. And just hope that, if you get caught, you receive the treatment given to the Harvard and Yale law professors, the USC medical school professor, and the Princeton history professor or, if you do it late enough in your career, the George Mason statistics professor and the Rutgers political science professor. So, stay networked and avoid pissing off powerful people within your institution.

As I wrote last year regarding scholarly misconduct more generally:

I don’t know that anyone’s getting a pass. What seems more likely to me is that anyone—left, center, or right—who gets more attention is also more likely to see his or her work scrutinized.

Or, to put it another way, it’s a sad story that perpetrators of scholarly misconduct often “seem to get a pass” from their friends and employers and academic societies, but this doesn’t seem to have much to do with ideological narratives; it seems more like people being lazy and not wanting a fuss.

The tell

The tell, as they say in poker, is that the copied-without-attribution material so often displays a lack of understanding. Not necessarily a lack of ability to understand—Ed Wegman could’ve spent an hour reading through the Wikipedia passage he’d copied and avoided introducing an error; Christian Hesse could’ve spent some time actually reading the words he typed, and maybe even done some research, and avoided errors such as this one, reported by chess historian Edward Winter:

In 1900 Wilhelm/William Steinitz died, a fact which did not prevent Christian Hesse from quoting a remark by Steinitz about a mate-in-two problem by Pulitzer which, according to Hesse, was dated 1907. (See page 166 of The Joys of Chess.) Hesse miscopied from our presentation of the Pulitzer problem on page 11 of A Chess Omnibus (also included in Steinitz Stuck and Capa Caught). We gave Steinitz’s comments on the composition as quoted on page 60 of the Chess Player’s Scrap Book, April 1907, and that sufficed for Hesse to assume that the problem was composed in 1907.

Also, I can only assume that Korea expert Charles Armstrong could’ve carefully read the passage he was ripping off and avoided getting its meaning backward. But having the ability to do the work isn’t enough. To keep the quality up in the finished product, you have to do the work. Understanding new material is hard; copying is easy. And then it makes sense to cover your tracks. Which makes it harder for the reader to spot the mistakes. Etc.

In his classic essay, “Politics and the English Language,” the political journalist George Orwell drew a connection between cloudy writing and cloudy content, which I think applies to academic writing as well. Something similar seems to be going on with copying without attribution. It happens when authors don’t understand what they’re writing about.

P.S. I just came across this post from 2011, “A (not quite) grand unified theory of plagiarism, as applied to the Wegman case,” where I wrote, “It’s not that the plagiarized work made the paper wrong; it’s that plagiarism is an indication that the authors don’t really know what they’re doing.” I’d forgotten about that!

“Reading Like It’s 1965”: Fiction as a window into the past

Raghu Parthasarathy writes:

The last seven books I read were all published in 1965. I decided on this literary time travel after noticing that I unintentionally read two books in a row from 1965. I thought: Why not continue? Would I get a deep sense of the mid-1960s zeitgeist? I don’t think so . . .

Contra Raghu, I do think that reading old books gives us some sense of how people used to live, and how they used to think. I have nothing new to offer on this front, but here are some relevant ideas we’ve discussed before:

1. The Speed Racer principle: Sometimes the most interesting aspect of a scientific or cultural product is not its overt content but rather its unexamined assumptions.

2. Storytelling as predictive model checking: Fiction is the working out of possibilities. Nonfiction is that too, just with more constraints.

3. Hoberman and Deliverance: Some cultural artifacts are striking because of what they leave out. My go-to example here is the book Deliverance, which was written during the U.S.-Vietnam war and, to my mind, is implicitly all about that war even though I don’t think it is mentioned even once in the book.

4. Also, Raghu mentions Stoner so I’ll point you to my post on the book. In the comments section, Henry Farrell promises us an article called “What Meyer and Rowan on Myth and Ceremony tells us about Forlesen.” So, something to look forward to.

5. And Raghu mentions Donald Westlake. As I wrote a few years ago, my favorite Westlake is Killing Time, but I also like Memory. And then there’s The Axe. And Slayground’s pretty good too. And Ordo, even if it’s kind of a very long extended joke on the idea of murder. Overall, I do think there’s a black hole at the center of Westlake’s writing: as I wrote a few years ago, he has great plots and settings and charming characters, but nothing I’ve ever read of his has the emotional punch of, say, Scott Smith’s A Simple Plan (to choose a book whose plot would fit well into the Westlake canon). But, hey, nobody can do everything. Also see here and here.

Russell’s Paradox of ghostwriters

A few months ago we discussed the repulsive story of a USC professor who took full credit for a series of books that were ghostwritten. It turned out that one of the books had “at least 95 separate passages” of plagiarism, including “long sections of a chapter on the cardiac health of giraffes.”

You’d think you’d remember a chapter on the cardiac health of giraffes. Indeed, if I hired someone to write a chapter under my name on the cardiac health of giraffes, I think I’d read it, just out of curiosity! But I guess this guy has no actual curiosity. He just wants another bestselling book so he can go on TV some more and mingle with rich and famous people.

OK, I’ve ranted enough about this guy. What I wanted to share today is a fascinating story from a magazine article about the affair, where the author, Joel Stein, writes, “Nearly all experts and celebrities use ghostwriters,” and then links to an amusing magazine article from 2009 subtitled, “If Sarah Palin can write a memoir in four months, can I write my life story in an afternoon?”:

When I heard that Sarah Palin wrote her upcoming 400-page autobiography, Going Rogue: An American Life, in four months, I thought, What took her so long? To prove that introspection doesn’t need to be time-consuming, I decided to try to write my memoir in one day. Since Palin had a ghostwriter, I figured it was only fair that I have help too, so I called Neil Strauss, who co-wrote the best-selling memoirs of Marilyn Manson, Mötley Crüe, Dave Navarro and Jenna Jameson. . . .

The whole article is fun. They wrote a whole memoir in an afternoon!

That particular memoir-book was a gag, but it got me thinking of this general idea of recursive writing. A writer hiring a ghostwriter . . . what a great idea! Of course this happens all the time when the writer is a brand name, as with James Patterson. But then what if Patterson’s ghostwriter is busy and hires a ghostwriter of his own . . .

Perhaps the most famous ghostwritten book is The Autobiography of Malcolm X, by Alex Haley. After Roots came out, the Malcolm X autobiography was promoted heavily based on the Haley authorship. On the other hand, parts of Roots were plagiarized, which is kind of like a ghostwriter hiring a ghostwriter.

A writer hiring a writer to do his writing . . . that sounds so funny! But should it? I’m a professional writer and I call upon collaborators all the time. Collaborative writing is very rare in literary writing; it sometimes happens in nonliterary writing (for example here, or for a less successful example, here), but usually there it follows a model of asymmetric collaboration, as with Freakonomics, where Levitt supplied the material and Dubner supplied the writing, though I assume that both the content and the writing benefited from conversations between the authors.

One of the common effects of ghostwriting is to give a book a homogenized style. Writers of their own books will have their original styles—most of us cannot approach the caliber of Mark Twain, Virginia Woolf, or Jim Thompson, but style is part of how you express yourself—and nonprofessional writers can have charming idiosyncratic styles of their own. The homogenized, airport-biography style comes from writers who are talented enough to produce this sort of thing on demand, while having some financial motivation not to express originality. In contrast, Malcolm Gladwell deserves credit for producing readable prose while having his own interesting style. I doubt he uses a ghostwriter.

Every once in a while, though, there will be a ghostwriter who adds compelling writing of his own. One example is the aforementioned Alex Haley; another is the great Leonard Shecter. I’d say Stephen Dubner too, but I see him as more of a collaborator than a hired gun. Also Ralph Leighton: much of the charm in the Feynman memoirs is that voice, and you gotta give the ghostwriter some of the credit here, even if only to keep that voice as is and not replace it with generic prose.

There must be some other ghostwriters who added style rather than blandness, although I can’t think of any examples right now.

More generally, I remain interested in the idea that collaboration is so standard in academic writing (even when we are writing fiction) and for Hollywood/TV scripts (as discussed in comments) and so unusual elsewhere, with the exception of ghostwriting.

Hey! Here’s how to rewrite and improve the title and abstract of a scientific paper:

Last week in class we read and then rewrote the title and abstract of a paper. We did it again yesterday, this time with one of my recent unpublished papers.

Here’s what I had originally:

title: Unifying design-based and model-based sampling inference by estimating a joint population distribution for weights and outcomes

abstract: A well-known rule in practical survey research is to include weights when estimating a population average but not to use weights when fitting a regression model—as long as the regression includes as predictors all the information that went into the sampling weights. But it is not clear how to apply this advice when fitting regressions that include only some of the weighting information, nor does it tell us what to do when analyzing already-collected surveys where the weighting procedure has not been clearly explained or where the weights depend in part on information that is not available in the data. It is also not clear how one is supposed to account for clustering in such analyses. We propose a quasi-Bayesian approach using a joint regression of the outcome and the sampling weight, followed by poststratification on the two variables, thus using design information within a model-based context to obtain inferences for small-area estimates, regressions, and other population quantities of interest.

Not terrible, but we can do better. Here’s the new version:

title: MRP using sampling weights

abstract: A well-known rule in practical survey research is to include weights when estimating a population average but not to use weights when fitting a regression model—as long as the regression includes as predictors all the information that went into the sampling weights. But what if you don’t know where the weights come from? We propose a quasi-Bayesian approach using a joint regression of the outcome and the sampling weight, followed by poststratification on the two variables, thus using design information within a model-based context to obtain inferences for small-area estimates, regressions, and other population quantities of interest.

How did we get there?

The title. The original title was fine—it starts with some advertising (“Unifying design-based and model-based sampling inference”) and follows up with a description of how the method works (“estimating a joint population distribution for weights and outcomes”).

But the main point of the title is to get the notice of potential readers, people who might find the paper useful or interesting (or both!).

This pushes the question back one step: Who would find this paper useful or interesting? Anyone who works with sampling weights. Anyone who uses public survey data or, more generally, surveys collected by others, which typically contain sampling weights. And anyone who’d like to follow my path in survey analysis, which would be all the people out there who use MRP (multilevel regression and poststratification). Hence the new title, which is crisp, clear, and focused.

My only problem with the new title, “MRP using sampling weights,” is that it doesn’t clearly convey that the paper involves new research. It makes it look like a review article. But that’s not so horrible; people often like to learn from review articles.

The abstract. If you look carefully, you’ll see that the new abstract is the same as the original abstract, except that we replaced the middle part:

But it is not clear how to apply this advice when fitting regressions that include only some of the weighting information, nor does it tell us what to do when analyzing already-collected surveys where the weighting procedure has not been clearly explained or where the weights depend in part on information that is not available in the data. It is also not clear how one is supposed to account for clustering in such analyses.

with this:

But what if you don’t know where the weights come from?

Here’s what happened. We started by rereading the original abstract carefully. That abstract has some long sentences that are hard to follow. The first sentence is already kinda complicated, but I decided to keep it, because it clearly lays out the problem, and also I think the reader of an abstract will be willing to work a bit when reading the first sentence. Getting to the abstract at all is a kind of commitment.

The second sentence, though, that’s another tangle, and at this point the reader is tempted to give up and just skate along to the end—which I don’t want! The third sentence isn’t horrible, but it’s still a little bit long (starting with the nearly-contentless “It is also not clear how one is supposed to account for” and ending with the unnecessary “in such analyses”). Also, we don’t even really talk much about clustering in the paper! So it was a no-brainer to collapse these into a sentence that was much more snappy and direct.

Finally, yeah, the final sentence of the abstract is kinda technical, but (a) the paper’s technical, and we want to convey some of its content in the abstract!, and (b) after that new, crisp, replacement second sentence, I think the reader is ready to take a breath and hear what the paper is all about.

General principles

Here’s a general template for a research paper:
1. What is the goal or general problem?
2. Why is it important?
3. What is the challenge?
4. What is the solution? What must be done to implement this solution?
5. If the idea in this paper is so great, why wasn’t the problem already solved by someone else?
6. What are the limitations of the proposed solution? What is its domain of applicability?

We used these principles in our rewriting of my title and abstract. The first step was for me to answer the above 6 questions:
1. Goal is to do survey inference with sampling weights.
2. It’s important for zillions of researchers who use existing surveys which come with weights.
3. The challenge is that if you don’t know where the weights come from, you can’t just follow the recommended approach to condition in the regression model on the information that is predictive of inclusion into the sample.
4. The solution is to condition on the weights themselves, which involves the additional step of estimating a joint population distribution for the weights and other predictors in the model (see the sketch just after this list).
5. The problem involves a new concept (imagining a population distribution for weights, which is not a coherent assumption, because, in the real world, weights are constructed based on the data) and some new mathematical steps (not inherently sophisticated as mathematics, but new work from a statistical perspective). Also, the idea of modeling the weights is not completely new; there is some related literature, and one of our contributions is to take weights (which are typically constructed from a non-Bayesian design-based perspective) and use them in a Bayesian analysis.
6. Survey weights do not include all design information, so the solution offered in the paper can only be approximate. In addition the method requires distributional assumptions on the weights; also it’s a new method so who knows how useful it will be in practice.
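
To make item 4 a bit more concrete, here is a toy numerical sketch of the general idea (model the outcome given the weight, estimate the population distribution of the weights, then poststratify), using a simulated survey. This is only my own illustration of the logic, not the paper's actual quasi-Bayesian model; the simulated design, the weight cells, and all the names below are made up.

# Toy sketch (not the paper's model): outcome summarized within weight cells,
# then poststratified to an estimated population distribution of the weights.
import numpy as np

rng = np.random.default_rng(0)

# Simulate a population in which one variable drives both selection and outcome
N = 100_000
x = rng.normal(size=N)
y_pop = 1.0 + 0.8 * x + rng.normal(scale=0.5, size=N)
p_incl = 1 / (1 + np.exp(-(-4 + 1.2 * x)))       # inclusion probability depends on x
in_sample = rng.random(N) < p_incl
y, w = y_pop[in_sample], 1 / p_incl[in_sample]   # the analyst sees only y and the weight w

# Step 1: "regression" of y on the weight, here just cell means within weight bins
n_cells = 10
edges = np.quantile(w, np.linspace(0, 1, n_cells + 1))
cell = np.clip(np.digitize(w, edges[1:-1]), 0, n_cells - 1)
cell_mean = np.array([y[cell == k].mean() for k in range(n_cells)])

# Step 2: estimate the population share of each weight cell
# (each sampled unit stands in for roughly w units in the population)
cell_total = np.array([w[cell == k].sum() for k in range(n_cells)])
cell_share = cell_total / cell_total.sum()

# Step 3: poststratify the cell means to the estimated population shares
estimate = float(np.sum(cell_share * cell_mean))

print(f"unweighted sample mean:  {y.mean():.3f}")   # biased: high-x units are oversampled
print(f"poststratified estimate: {estimate:.3f}")
print(f"true population mean:    {y_pop.mean():.3f}")

As I read the abstract, the real method replaces the crude weight bins with an actual regression of the outcome on the weight (and whatever other predictors are available), plus a model for the joint population distribution of weights and predictors; the poststratification logic, though, is the same.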

We can’t put all of that in the abstract, but we were able to include some versions of the answers to questions 1, 3, and 4. Questions 5 and 6 are important, but it’s ok to leave them to the paper, as this is where readers will typically search for limitations and connections to the literature.

Maybe we should include the answer to question 2 in the abstract, though. Perhaps we could replace “But what if you don’t know where the weights come from?” with “But what if you don’t know where the weights come from? This is often a problem when analyzing surveys collected by others.”

Summary

By thinking carefully about goals and audience, we improved the title and abstract of a scientific paper. You should be able to do this in your own work!

I disagree with Geoff Hinton regarding “glorified autocomplete”

Computer scientist and “godfather of AI” Geoff Hinton says this about chatbots:

“People say, It’s just glorified autocomplete . . . Now, let’s analyze that. Suppose you want to be really good at predicting the next word. If you want to be really good, you have to understand what’s being said. That’s the only way. So by training something to be really good at predicting the next word, you’re actually forcing it to understand. Yes, it’s ‘autocomplete’—but you didn’t think through what it means to have a really good autocomplete.”

This got me thinking about what I do at work, for example in a research meeting. I spend a lot of time doing “glorified autocomplete” in the style of a well-trained chatbot: Someone describes some problem, I listen and it reminds me of a related issue I’ve thought about before, and I’m acting as a sort of FAQ, but more like a chatbot than a FAQ in that the people who are talking with me do not need to navigate through the FAQ to find the answer that is most relevant to them; I’m doing that myself and giving a response.

I do that sort of thing a lot in meetings, and it can work well, indeed often I think this sort of shallow, associative response can be more effective than whatever I’d get from a direct attack on the problem in question. After all, the people I’m talking with have already thought for awhile about whatever it is they’re working on, and my initial thoughts may well be in the wrong direction, or else my thoughts are in the right direction but are just retracing my collaborators’ past ideas. From the other direction, my shallow thoughts can be useful in representing insights from problems that these collaborators had not ever thought about much before. Nonspecific suggestions on multilevel modeling or statistical graphics or simulation or whatever can really help!

At some point, though, I’ll typically have to bite the bullet and think hard, not necessarily reaching full understanding in the sense of mentally embedding the problem at hand into a coherent schema or logical framework, but still going through whatever steps of logical reasoning that I can. This feels different than autocomplete; it requires an additional level of focus. Often I need to consciously “flip the switch,” as it were, to turn on that focus and think rigorously. Other times, I’m doing autocomplete and either come to a sticking point or encounter an interesting idea, and this causes me to stop and think.

It’s almost like the difference between jogging and running. I can jog and jog and jog, thinking about all sorts of things and not feeling like I’m expending much effort, my legs pretty much move up and down of their own accord . . . but then if I need to run, that takes concentration.

Here’s another example. Yesterday I participated in the methods colloquium in our political science department. It was Don Green and me and a bunch of students, and the structure was that Don asked me questions, I responded with various statistics-related and social-science-related musings and stories, students followed up with questions, I responded with more stories, etc. Kinda like the way things go here on the blog, but spoken rather than typed. Anyway, the point is that most of my responses were a sort of autocomplete—not in a word-by-word chatbot style, more at a larger level of chunkiness, for example something would remind me of a story, and then I’d just insert the story into my conversation—but still at this shallow, pleasant level. Mellow conversation with no intellectual or social strain. But then, every once in awhile, I’d pull up short and have some new thought, some juxtaposition that had never occurred to me before, and I’d need to think things through.

This also happens when I give prepared talks. My prepared talks are not super-well prepared—this is on purpose, as I find that too much preparation can inhibit flow. In any case, I’ll often find myself stopping and pausing to reconsider something or other. Even when describing something I’ve done before, there are times when I feel the need to think it all through logically, as if for the first time. I noticed something similar when I saw my sister give a talk once: she had the same habit of pausing to work things out from first principles. I don’t see this behavior in every academic talk, though; different people have different styles of presentation.

This seems related to models of associative and logical reasoning in psychology. As a complete non-expert in that area, I’ll turn to wikipedia:

The foundations of dual process theory likely come from William James. He believed that there were two different kinds of thinking: associative and true reasoning. . . . images and thoughts would come to mind of past experiences, providing ideas of comparison or abstractions. He claimed that associative knowledge was only from past experiences describing it as “only reproductive”. James believed that true reasoning could enable overcoming “unprecedented situations” . . .

That sounds about right!

After describing various other theories from the past hundred years or so, Wikipedia continues:

Daniel Kahneman provided further interpretation by differentiating the two styles of processing more, calling them intuition and reasoning in 2003. Intuition (or system 1), similar to associative reasoning, was determined to be fast and automatic, usually with strong emotional bonds included in the reasoning process. Kahneman said that this kind of reasoning was based on formed habits and very difficult to change or manipulate. Reasoning (or system 2) was slower and much more volatile, being subject to conscious judgments and attitudes.

This sounds a bit different from what I was talking about above. When I’m doing “glorified autocomplete” thinking, I’m still thinking—this isn’t automatic and barely conscious behavior along the lines of driving to work along a route I’ve taken a hundred times before—; I’m just thinking in a shallow way, trying to “autocomplete” the answer. It’s pattern-matching more than it is logical reasoning.

P.S. Just to be clear, I have a lot of respect for Hinton’s work; indeed, Aki and I included Hinton’s work in our brief review of 10 pathbreaking research articles during the past 50 years of statistics and machine learning. Also, I’m not trying to make a hardcore, AI-can’t-think argument. Although not myself a user of large language models, I respect Bob Carpenter’s respect for them.

I think that where Hinton got things wrong in the quote that led off this post was not in his characterization of chatbots, but rather in his assumptions about human thinking, in not distinguishing autocomplete-like associative reasoning from logical thinking. Maybe Hinton’s problem in understanding this is that he’s just too logical! At work, I do a lot of what seems like autocomplete—and, as I wrote above, I think it’s useful—but if I had more discipline, maybe I’d think more logically and carefully all the time. It could well be that Hinton has that habit or inclination to always be in focus. If Hinton does not have consistent personal experience of shallow, autocomplete-like thinking, he might not recognize it as something different, in which case he could be giving the chatbot credit for something it’s not doing.

Come to think of it, one thing that impresses me about Bob is that, when he’s working, he seems to always be in focus. I’ll be in a meeting, just coasting along, and Bob will interrupt someone to ask for clarification, and I suddenly realize that Bob absolutely demands understanding. He seems to have no interest in participating in a research meeting in a shallow way. I guess we just have different styles. It’s my impression that the vast majority of researchers are like me, just coasting on the surface most of the time (for some people, all of the time!), while Bob, and maybe Geoff Hinton, is one of the exceptions.

P.P.S. Sometimes we really want to be doing shallow, auto-complete-style thinking. For example, if we’re writing a play and want to simulate how some characters might interact. Or just as a way of casting the intellectual net more widely. When I’m in a research meeting and I free-associate, it might not help immediately solve the problem at hand, but it can bring in connections that will be helpful later. So I’m not knocking auto-complete; I’m just disagreeing with Hinton’s statement that “by training something to be really good at predicting the next word, you’re actually forcing it to understand.” As a person who does a lot of useful associative reasoning and also a bit of logical understanding, I think they’re different, both in how they feel and also in what they do.

P.P.P.S. Lots more discussion in comments; you might want to start here.

P.P.P.P.S. One more thing . . . actually, it might deserve its own post, but for now I’ll put it here: So far, it might seem like I’m denigrating associative thinking, or “acting like a chatbot,” or whatever it might be called. Indeed, I admire Bob Carpenter for doing very little of this at work! The general idea is that acting like a chatbot can be useful—I really can help lots of people solve their problems in that way, also every day I can write these blog posts that entertain and inform tens of thousands of people—but it’s not quite the same as focused thinking.

That’s all true (or, I should say, that’s my strong impression), but there’s more to it than that. As discussed in my comment linked to just above, “acting like a chatbot” is not “autocomplete” at all, indeed in some ways it’s kind of the opposite. Locally it’s kind of like autocomplete in that the sentences flow smoothly; I’m not suddenly jumping to completely unrelated topics—but when I do this associative or chatbot-like writing or talking, it can lead to all sorts of interesting places. I shuffle the deck and new hands come up. That’s one of the joys of “acting like a chatbot” and one reason I’ve been doing it for decades, long before chatbots ever existed! Walk along forking paths, and who knows where you’ll turn up! And all of you blog commenters (ok, most of you) play helpful roles in moving these discussions along.

Hey, check this out! Here’s how to read and then rewrite the title and abstract of a paper.

In our statistical communication class today, we were talking about writing. At some point a student asked why it was that journal articles are all written in the same way. I said, No, actually there are many different ways to write a scientific journal article. Superficially these articles all look the same: title, abstract, introduction, methods, results, discussion, or some version of that, but if you look in detail you’ll see that you have lots of flexibility in how to do this (with the exception of papers in medical journals such as JAMA which indeed have a pretty rigid format).

The next step was to demonstrate the point by going to a recent scientific article. I asked the students to pick a journal. Someone suggested NBER. So I googled NBER and went to its home page.

I then clicked on the most recent research paper, which was listed on the main page as “Employer Violations of Minimum Wage Laws.” Click on the link and you get this more dramatically-titled article:

Does Wage Theft Vary by Demographic Group? Evidence from Minimum Wage Increases

with this abstract:

Using Current Population Survey data, we assess whether and to what extent the burden of wage theft — wage payments below the statutory minimum wage — falls disproportionately on various demographic groups following minimum wage increases. For most racial and ethnic groups at most ages we find that underpayment rises similarly as a fraction of realized wage gains in the wake of minimum wage increases. We also present evidence that the burden of underpayment falls disproportionately on relatively young African American workers and that underpayment increases more for Hispanic workers among the full working-age population.

We actually never got to the full article (but feel free to click on the link and read it yourself). There was enough in the title and abstract to sustain a class discussion.

Before going on . . .

In class we discussed the title and abstract of the above article and considered how it could be improved. This does not mean we think the article, or its title, or its abstract, is bad. Just about everything can be improved! Criticism is an important step in the process of improvement.

The title

“Does Wage Theft Vary by Demographic Group? Evidence from Minimum Wage Increases” . . . that’s not bad! “Wage Theft” in the first sentence is dramatic—it grabs our attention right away. And the second sentence is good too: it foregrounds “Evidence” and it also tells you where the identification is coming from. So, good job. We’ll talk later about how we might be able to do even better, but I like what they’ve got so far.

Just two things.

First, the answer to the question, “Does X vary with Y?”, is always Yes. At least, in social science it’s always Yes. There are no true zeroes. So it would be better to change that first sentence to something like, “How Does Wage Theft Vary by Demographic Group?”

The second thing is the term “wage theft.” I took that as a left-wing signifier, the same way in which the use of a loaded term such as “pro-choice” or “pro-life” conveys the speaker’s position on abortion. So I took the use of that phrase in the title as a signal that the article is taking a position on the political/economic left. But then I googled the first author, and . . . he’s an “Adjunct Senior Fellow at the Hoover Institution.” Not that everyone at Hoover is right-wing, but it’s not a place I associate with the left, either. So I’ll move on and not worry about this issue.

The point here is not that I’m trying to monitor the ideology of economics papers. This is a post on how to write a scholarly paper! My point is that the title conveys information, both directly and indirectly. The term “wage theft” in the title conveys that the topic of the paper will be morally serious—they’re talking about “theft,” not just some technical violations of a law—; also it has this political connotation. When titling your papers, be aware of the direct and indirect messages you’re conveying.

The abstract

As I said, I liked the title of the paper—it’s punchy and clear. The abstract is another story. I read it and then realized I hadn’t absorbed any of its content, so I read it again, and it was still confusing. It’s not “word salad”—there’s content in that abstract—; it’s just put together in a way that I found hard to follow. The students in the class had the same impression, and indeed they were kinda relieved that I too found it confusing.

How to rewrite? The best approach would be to go into the main paper, maybe start with our tactic of forming an abstract by taking the first sentence of each of the first five paragraphs. But here we’ll keep it simple and just go with the information right there in the current abstract. Our goal is to rewrite in a way that makes it less exhausting to read.
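
(As an aside, that first-sentences tactic is mechanical enough that you could script a rough first draft of it. Here is a toy version of my own; the naive period-based sentence splitting is a shortcut for illustration, not a serious text-processing tool.)

# Toy version of the "first sentence of each of the first five paragraphs" tactic.
def draft_abstract(paper_text: str, n_paragraphs: int = 5) -> str:
    paragraphs = [p.strip() for p in paper_text.split("\n\n") if p.strip()]
    firsts = [p.split(". ")[0].rstrip(".") + "." for p in paragraphs[:n_paragraphs]]
    return " ".join(firsts)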

Our strategy: First take the abstract apart, then put it back together.

I went to the blackboard and listed the information that was in the abstract:
– CPS data
– Definition of wage theft
– What happens after minimum wage increase
– Working-age population
– African American, Hispanic, White

Now, how to put this all together? My first thought was to just start with the definition of wage theft, but then I checked online and learned that the phrase used in the abstract, “wage payments below the statutory minimum wage,” is not the definition of wage theft; it’s actually just one of several kinds of wage theft. So that wasn’t going to work. Then there’s the bit from the abstract, “falls disproportionately on various demographic groups”—that’s pretty useless, as what we want to know is where this disproportionate burden falls, and by how much.

Putting it all together

We discussed some more—it took surprisingly long, maybe 20 minutes of class time to work through all these issues—and then I came up with this new title/abstract:

Wage theft! Evidence from minimum wage increases

Using Current Population Survey data from [years] in periods following minimum wage increases, we look at the proportion of workers being paid less than the statutory minimum, comparing different age groups and ethnic groups. This proportion was highest in ** age and ** ethnic groups.

OK, how is this different from the original?

1. The three key points of the paper are “wage theft,” “evidence,” and “minimum wage increases,” so that’s now what’s in the title.

2. It’s good to know that the data came from the Current Population Survey. We also want to know when this was all happening, so we added the years to the abstract. Also we made the correction of changing the tense in the abstract from the present to the past, because the study is all based on past data.

3. The killer phrase, “wage theft,” is already in the title, so we don’t need it in the abstract. That helps, because then we can use the authors’ clear and descriptive phrase, “the proportion of workers being paid less than the statutory minimum,” without having to misleadingly imply that this is the definition of wage theft, and without having to lugubriously state that it’s a kind of wage theft. That was so easy!

4. We just say we’re comparing different age and ethnic groups and then report the results. This to me is much cleaner than the original abstract which shared this information in three long sentences, with quite a bit of repetition.

5. We have the ** in the last sentence because I’m not quite clear from the abstract what are the take-home points. The version we created is short enough that we could add more numbers to that last sentence, or break it up into two crisp sentences, for example, one sentence about age groups and one about ethnic groups.

In any case, I think this new version is much more readable. It’s a structure much better suited to conveying, not just the general vibe of the paper (wage theft, inequality, minority groups) but the specific findings.

Lessons for rewriters

Just about every writer is a rewriter. So these lessons are important.

We were able to improve the title and abstract, but it wasn’t easy, nor was it algorithmic—that is, there was no simple set of steps to follow. We gave ourselves the relatively simple task of rewriting without the burden of subject-matter knowledge, and it still took a half hour of work.

After looking over some writing advice, it’s tempting to think that rewriting is mostly a matter of a few clean steps: replacing the passive with the active voice, removing empty words and phrases such as “quite” and “Note that,” checking for grammar, keeping sentences short, etc. In this case, no. In this case, we needed to dig in a bit and gain some conceptual understanding to figure out what to say.

The outcome, though, is positive. You can do this too, for your own papers!

“Modeling Social Behavior”: Paul Smaldino’s cool new textbook on agent-based modeling

Paul Smaldino is a psychology professor who is perhaps best known for his paper with Richard McElreath from a few years ago, “The Natural Selection of Bad Science,” which presents a sort of agent-based model that reproduces the growth in the publication of junk science that we’ve seen in recent decades.
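
(For readers who haven't seen that paper, here is a deliberately crude toy version, mine rather than Smaldino and McElreath's actual model, of the kind of selection dynamic it studies: labs that cut corners publish more, publication drives the founding of new labs, and average rigor drifts downward even though no individual lab ever decides to get sloppier.)

# Crude toy dynamic in the spirit of "The Natural Selection of Bad Science";
# the update rule and all parameters here are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n_labs, n_generations = 200, 100
effort = rng.uniform(0.1, 1.0, size=n_labs)    # more effort = more rigor, fewer papers

for _ in range(n_generations):
    papers = rng.poisson(5 * (1.1 - effort))   # low-effort labs publish more
    fitness = papers + 0.01                    # publication count drives "reproduction"
    parents = rng.choice(n_labs, size=n_labs, p=fitness / fitness.sum())
    # new labs inherit their parent lab's effort level, with a little noise
    effort = np.clip(effort[parents] + rng.normal(scale=0.02, size=n_labs), 0.01, 1.0)

print(f"mean effort after selection: {effort.mean():.2f}")   # drifts toward the minimum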

Since then, it seems that Smaldino has been doing a lot of research and teaching on agent-based models in social science more generally, and he just came out with a book, “Modeling Social Behavior: Mathematical and Agent-Based Models of Social Dynamics and Cultural Evolution.” The book has social science, it has code, it has graphs—it’s got everything.

It’s an old-school textbook with modern materials, and I hope it’s taught in thousands of classes and sells a zillion copies.

There’s just one thing that bothers me. The book is entertainingly written and bursting with ideas, and it does a great job of raising concerns about the models it’s simulating, not just acting like everything’s already known. My concern is that nobody reads books anymore. If I think about students taking a class in agent-based modeling and using this book, it’s hard for me to picture most of them actually reading the book. They’ll start with the homework assignments and then flip through the book to try to figure out what they need. That’s how people read nonfiction books nowadays, which I guess is one reason that books, even those I like, are typically repetitive and low on content. Readers don’t want the book to offer a delightful reading experience, so authors don’t deliver it, and then readers don’t expect it, etc.

To be clear: this is a textbook, not a trade book. It’s a readable and entertaining book in the way that Regression and Other Stories is a readable and entertaining book, not in the way that Guns, Germs, and Steel is. Still, within the framework of being a social science methods book, it’s entertaining and thought-provoking. Also, I like it as a methods book because it’s focused on models rather than on statistical inference. We tried to get a similar feel with A Quantitative Tour of Social Sciences but with less success.

So it kinda makes me sad to see this effort of care put into a book that probably very few students will read from paragraph to paragraph. I think things were different 50 years ago: back then, there wasn’t anything online to read, you’d buy a textbook and it was in front of you so you’d read it. On the plus side, readers can now go in and make the graphs themselves—I assume that Smaldino has a website somewhere with all the necessary code—so there’s that.

P.S. In the preface, Smaldino is “grateful to all the modelers whose work has inspired this book’s chapters . . . particularly want to acknowledge the debt owed to the work of,” and then he lists 16 names, one of which is . . . Albert-László Barabási!

Huh?? Is this the same Albert-László Barabási who said that scientific citations are worth $100,000 each? I guess he did some good stuff too? Maybe this is worthy of an agent-based model of its own.

Academia corner: New candidate for American Statistical Association’s Founders Award, Enduring Contribution Award from the American Political Science Association, and Edge Foundation just dropped

Bethan Staton and Chris Cook write:

A Cambridge university professor who copied parts of an undergraduate’s essays and published them as his own work will remain in his job, despite an investigation upholding a complaint that he had committed plagiarism. 

Dr William O’Reilly, an associate professor in early modern history, submitted a paper that was published in the Journal of Austrian-American History in 2018. However, large sections of the work had been copied from essays by one of his undergraduate students.

The decision to leave O’Reilly in post casts doubt on the internal disciplinary processes of Cambridge, which rely on academics judging their peers.

Dude’s not a statistician, but I think this alone should be enough to make him a strong candidate for the American Statistical Association’s Founders Award.

And, early modern history is not quite the same thing as political science, but the copying thing should definitely make him eligible for the Aaron Wildavsky Enduring Contribution Award from the American Political Science Association. Long after all our research has been forgotten, the robots of the 21st century will be able to sift through the internet archive and find this guy’s story.

Or . . . what about the Edge Foundation? Plagiarism isn’t quite the same thing as misrepresenting your data, but it’s close enough that I think this guy would have a shot at joining that elite club. I’ve heard they no longer give out flights to private Caribbean islands, but I’m sure there are some lesser perks available.

According to the news article:

Documents seen by the Financial Times, including two essays submitted by the third-year student, show nearly half of the pages of O’Reilly’s published article — entitled “Fredrick Jackson Turner’s Frontier Thesis, Orientalism, and the Austrian Militärgrenze” — had been plagiarised.

Jeez, some people are so picky! Only half the pages were plagiarized, right? Or maybe not? Maybe this prof did a “Quentin Rowan” and constructed his entire article based on unacknowledged copying from other sources. As Rowan said:

It felt very much like putting an elaborate puzzle together. Every new passage added has its own peculiar set of edges that had to find a way in.

I guess that’s how it felt when they were making maps of the Habsburg empire.

On the plus side, reading about this story motivated me to take a look at the Journal of Austrian-American History, and there I found this cool article by Thomas Riegler, “The Spy Story Behind The Third Man.” That’s one of my favorite movies! I don’t know how watchable it would be to a modern audience—the story might seem a bit too simplistic—but I loved it.

P.S. I laugh but only because that’s more pleasant than crying. Just to be clear: the upsetting thing is not that some sleazeball managed to climb halfway up the greasy pole of academia by cheating. Lots of students cheat, some of these students become professors, etc. The upsetting thing is that the organization closed ranks to defend him. We’ve seen this sort of thing before, over and over—for example, Columbia never seemed to make any effort whatsoever to track down whoever was faking its U.S. News numbers—, so this behavior by Cambridge University doesn’t surprise me, but it still makes me sad. I’m guessing it’s some combination of (a) the perp is plugged in, the people who make the decisions are his personal friends, and (b) a decision that the negative publicity for letting this guy stay on at his job is not as bad as the negative publicity for firing him.

Can you imagine what it would be like to work in the same department as this guy?? Fun conversations at the water cooler, I guess. “Whassup with the Austrian Militärgrenze, dude?”

Meanwhile . . .

There are people who actually do their own research, and they’re probably good teachers too, but they didn’t get that Cambridge job. It’s hard to compete with an academic cheater, if the institution he’s working for seems to act as if cheating is just fine, and if professional societies such as the American Statistical Association and the American Political Science Association don’t seem to care either.

The Ministry of Food sez: Potatoes are fattening

As is often the case, I was thinking about George Orwell, and some googling led me to this post from 2005 that had some fun food-related quotes from Bernard Crick’s biography of Orwell, including this:

Just before they moved to Mortimer Crescent [in 1942], Eileen [Orwell’s wife] changed jobs. She now worked in the Ministry of Food preparing recipes and scripts for ‘The Kitchen Front’, which the BBC broadcast each morning. These short programmes were prepared in the Ministry because it was a matter of Government policy to urge or restrain the people from eating unrationed foods according to their official estimates, often wrong, of availability. It was the time of the famous ‘Potatoes are Good For You’ campaign, with its attendant Potato Pie recipes, which was so successful that another campaign had to follow immediately: ‘Potatoes are Fattening’.

Good stuff.

I looked up Crick on wikipedia and it turns out that early in his career he wrote a book that “identified and rejected [the] premises that research can discover uniformities in human behaviour, that these uniformities could be confirmed by empirical tests and measurements, that quantitative data was of the highest quality, and should be analysed statistically, that political science should be empirical and predictive . . .”

Interesting. Too bad he’s no longer around, as I’d have liked to talk with him about this, given how different it is from my own perspective. Crick’s Orwell biography is great, so much so that I’d hope he and I would have been able to find some common ground. He died in 2008, three years after the appearance of the above-linked post, so at least in theory I could’ve tried to reach him and have a discussion of quantitative political science and goats.

The Freaky Friday that never happened

Speaking of teaching . . . I wanted to share this story of something that happened today.

I was all fired up with energy, having just taught my Communicating Data and Statistics class, taking notes off the blackboard to remember what we’d been talking about so I could write about it later, and students were walking in for the next class. I asked them what it was, and they said Shakespeare. How wonderful to take a class on Shakespeare at Columbia University, I said. The students agreed. They love their teacher—he’s great.

This gave me an idea . . . maybe this instructor and I could switch classes some day, a sort of academic Freaky Friday. He could show up at 8:30 and teach my statistics students about Shakespeare’s modes of communication (with his contemporaries and with later generations including us, and also how Shakespeare made use of earlier materials), and I could come at 10am to teach his students how we communicate using numbers and graphs. Lots of fun all around, no? I’d love to hear the Shakespeare dude talk to a new audience, and I think my interactions with his group would be interesting too.

I waited in the classroom for awhile so I could ask the instructor when he came into the room, during the shuffling-around period before class officially starts at 10:10. Then 10:10 came and I stood outside to wait as the students continued to trickle in. A couple minutes later I see a guy approaching, about my age, and I ask if he teaches the Shakespeare class. Yes, he does. I introduce myself: I teach the class right before, on communicating data and statistics, maybe we could do a switch one day, could be fun? He says no, I don’t think so, and goes into the classroom.

That’s all fine, he has no obligation to do such a thing, also I came at him unexpectedly at a time when he was already in a hurry, coming to class late (I came to class late this morning too. Mondays!). His No response was completely reasonable.

Still . . . it was a lost opportunity! I’ll have to brainstorm with people about other ways to get this sort of interdisciplinary exchange on campus. We could just have an interdisciplinary lecture series (Communication of Shakespeare, Communication in Statistics, Communication in Computer Science, Communication in Medicine, Communication in Visual Art, etc.), but it would be a bit of work to set up such a thing, also I’m guessing it wouldn’t reach so many people. I like the idea of doing it using existing classes, because (a) then the audience is already there, and (b) it would take close to zero additional effort: you’re teaching your class somewhere else, but then someone else is teaching your class so you get a break that day. And all the students are exposed to something new. Win-win.

The closest thing I can think of here is an interdisciplinary course I organized many years ago on quantitative social science, for our QMSS graduate program. The course had 3 weeks each of history, political science, economics, sociology, and psychology. It was not a statistics course or a methods course; rather, each segment discussed some set of quantitative ideas in the field. The course was wonderful, and Jeronimo Cortina and I turned it into a book, A Quantitative Tour of the Social Sciences, which I really like. I think the course went well, but I don’t think QMSS offers it anymore; I’m guessing it was just too difficult to organize a course with instructors from five different departments.

P.S. I read Freaky Friday and its sequel, A Billion for Boris, when I was a kid. Just noticed them on the library shelves. The library wasn’t so big; I must have read half the books in the children’s section at the time. Lots of fond memories.

Advice on writing a discussion of a published paper

A colleague asked for my thoughts on a draft of a discussion of a published paper, and I responded:

My main suggestion is this: Yes, your short article is a discussion of another article, and that will be clear when it is published. But I think you should write it to be read on its own, which means that you should focus on the points you want to make, and only then talk about the target article and other discussions.

So I’d do it like this:

paragraph 1: Your main point. The one takeaway you want the reader to get.

the next few paragraphs: Your other points. Everything you want to say.

a few paragraphs more: How this relates to the articles you are discussing. Where you agree with them and where you disagree. If there are things in the target article you like, say so. Readers will in part use the discussion to make their judgment on the main article, so if your discussion reads as purely negative, that will take its toll. Which is fine, if that’s what you want to do.

final paragraph: Summary and pointers to future work.

I hope this is helpful. This advice might sound kinda generic but I actually wrote it specifically with your article in mind!

Awhile ago I gave some advice on writing research articles. This is the first time I recall specifically giving advice on writing a discussion.

My two courses this fall: “Applied Regression and Causal Inference” and “Communicating Data and Statistics”

POLS 4720, Applied Regression and Causal Inference:

This is a fast-paced one-semester course on applied regression and causal inference based on our book, Regression and Other Stories. The course has an applied and conceptual focus that’s different from other available statistics courses.
Topics covered in POLS 4720 include:
• Applied regression: measurement, data visualization, modeling and inference, transformations, linear regression, and logistic regression.
• Simulation, model fitting, and programming in R.
• Causal inference using regression.
• Key statistical problems include adjusting for differences between sample and population, adjusting for differences between treatment and control groups, extrapolating from past to future, and using observed data to learn about latent constructs of interest.
• We focus on social science applications, including but not limited to: public opinion and voting, economic and social behavior, and policy analysis.
The course is set up using the principles of active learning, with class time devoted to student-participation activities, computer demonstrations, and discussion problems.

The primary audience for this course is Poli Sci Ph.D. students, and it should also be ideal for statistics-using graduate students or advanced undergraduates in other departments and schools, as well as students in fields such as computer science and statistics who’d like to get an understanding of how regression and causal inference work in the real world!

STAT 6106, Communicating Data and Statistics:

This is a one-semester course on communicating data and statistics, covering the following modes of communication:
• Writing (including storytelling, writing technical articles, and writing for general audiences)
• Statistical graphics (including communicating variation and uncertainty)
• Oral communication (including teaching, collaboration, and giving presentations).
The course is set up using the principles of active learning, with class time devoted to discussions, collaborative work, practicing and evaluation of communication skills, and conversations with expert visitors.

The primary audience for this course is Statistics Ph.D. students, and it should also be ideal for Ph.D. students who do quantitative work in other departments and schools. Communication is sometimes thought of as a soft skill, but it is essential to statistics and scientific research more generally!

See you there:

Both courses have lots of space available, so check them out! In-person attendance is required, as class participation is crucial for both. POLS 4720 is offered Tu/Th 8:30-10am; STAT 6106 will be M/W 8:30-10am. These are serious classes, with lots of homework. Enjoy.

Artificial intelligence and aesthetic judgment

This is Jessica. In a new essay reflecting on how we get tempted to aestheticize generative AI, Ari Holtzman, Andrew, and I write: 

Generative AIs produce creative outputs in the style of human expression. We argue that encounters with the outputs of modern generative AI models are mediated by the same kinds of aesthetic judgments that organize our interactions with artwork. The interpretation procedure we use on art we find in museums is not an innate human faculty, but one developed over history by disciplines such as art history and art criticism to fulfill certain social functions. This gives us pause when considering our reactions to generative AI, how we should approach this new medium, and why generative AI seems to incite so much fear about the future. We naturally inherit a conundrum of causal inference from the history of art: a work can be read as a symptom of the cultural conditions that influenced its creation while simultaneously being framed as a timeless, seemingly acausal distillation of an eternal human condition. In this essay, we focus on an unresolved tension when we bring this dilemma to bear in the context of generative AI: are we looking for proof that generated media reflects something about the conditions that created it or some eternal human essence? Are current modes of interpretation sufficient for this task? Historically, new forms of art have changed how art is interpreted, with such influence used as evidence that a work of art has touched some essential human truth. As generative AI influences contemporary aesthetic judgment we outline some of the pitfalls and traps in attempting to scrutinize what AI generated media “means.”

I’ve worked on a lot of articles in the past year or so, but this one is probably the most out-of-character. We are not exactly humanities scholars. And yet, I think there is some truth to the analogies we are making. Everywhere we seem to be witnessing the same sort of beauty contest, where some interaction with ChatGPT or another generative model is held up for scrutiny, and the conclusion drawn that it lacks a certain emergent “je ne sais quoi” that human creative expressions  like great works of art achieve. We approach our interactions as though they have the same kind of heightened status as going to a museum, where it’s up to us to peer into the work to cultivate the right perspective on the significance of what we are seeing, and try to anticipate the future trajectory of the universal principle behind it.   

At the same time, we postulate all sorts of causal relationships where conditions under which the model is created are thought to leave traces in the outputs – from technical details about the training process to the values of the organizations that give us the latest models  – just like we analyze the hell out of what a work of art says about the culture that created it. And so we end up in a position where we can only recognize what we’re looking for when we see it, but what we are looking for can only be identified by what is lacking. Meanwhile, the artifacts that we judge can be read as a signal of anything and everything at once.

If this sounds counterproductive (because it is), it’s worth considering why these kinds of contradictory modes of reading objects have arisen in the past over the history of art: to keep fears at bay. By making our judgments as spectators seem essential to understanding the current moment, we gain a feeling of control.  

And so, despite these contradictions, we see our appraisals of model outputs in the current moment as correct and arising from some innate ability we have to recognize human intelligence. But aesthetic judgments have never been fixed – they have always evolved along with innovations in our ability to represent the world, whether through painting or photography or contemporary art. And so we should expect the same with judgments of generative AI. We conclude by considering how the idea of taste and aesthetic judgment might continue to shape our interactions with generative model outputs, from “wireheading” to generative AI as a kind of art historical tool we can turn toward taste itself.

Blogging is “destroying the business model for quality”?

I happened to come across this post from 2011 about a now-forgotten journalist who was upset that bloggers were “destroying the business model for quality” in writing by flooding the market with free and crappy content.

It’s all so quaint. To be a journalist and to think that Public Enemy #1 is blogging . . . wow!

Here we are 12 years later and blogging has pretty much disappeared. This makes me sad. But that dude might well be happy about this state of affairs!

(from 2017 but still relevant): What Has the Internet Done to Media?

Aleks Jakulin writes:

The Internet emerged by connecting communities of researchers, but as Internet grew, antisocial behaviors were not adequately discouraged.

When I [Aleks] coauthored several internet standards (PNG, JPEG, MNG), I was guided by the vision of connecting humanity. . . .

The Internet was originally designed to connect a few academic institutions, namely universities and research labs. Academia is a community of academics, which has always been based on the openness of information. Perhaps the most important to the history of the Internet is the hacker community composed of computer scientists, administrators, and programmers, most of whom are not affiliated with academia directly but are employed by companies and institutions. Whenever there is a community, its members are much more likely to volunteer time and resources to it. It was these communities that created websites, wrote the software, and started providing internet services.

“Whenever there is a community, its members are much more likely to volunteer time and resources to it” . . . so true!

As I wrote a few years ago, Create your own community (if you need to).

But it’s not just about community; you also have to pay the bills.

Aleks continues:

The skills of the hacker community are highly sought after and compensated well, and hackers can afford to dedicate their spare time to the community. Society is funding universities and institutes who employ scholars. Within the academic community, the compensation is through citation, while plagiarism or falsification can destroy someone’s career. Institutions and communities have enforced these rules both formally and informally through members’ desire to maintain and grow their standing within the community.

Lots to chew on here. First, yeah, I have skills that allow me to be compensated well, and I can afford to dedicate my spare time to the community. This is not new: back in the early 1990s I wrote Bayesian Data Analysis in what was essentially my spare time, indeed my department chair advised me not to do it at all—master of short-term thinking that he was. As Aleks points out, there was a time when a large proportion of internet users had this external compensation.

The other interesting thing about the above quote is that academics and tech workers have traditionally had an incentive to tell the truth, at least on things that can be checked. Repeatedly getting things wrong would be bad for your reputation. Or, to put it another way, you could be a successful academic and repeatedly get things wrong, but then you’d be crossing the John Yoo line and becoming a partisan hack. (Just to be clear, I’m not saying that being partisan makes you a hack. There are lots of scholars who express strong partisan views but with intellectual integrity. The “hack” part comes from getting stuff wrong, trying to pass yourself off as an expert on topics you know nothing about, ultimately being willing to say just about anything if you think it will make the people on your side happy.)

Aleks continues:

The values of academic community can be sustained within universities, but are not adequate outside of it. When businesses and general public joined the internet, many of the internet technologies and services were overwhelmed with the newcomers who didn’t share their values and were not members of the community. . . . False information is distracting people with untrue or irrelevant conspiracy theories, ineffective medical treatments, while facilitating terrorist organization recruiting and propaganda.

I’ve not looked at data on all these things, but, yeah, from what I’ve read, all that does seem to be happening.

Aleks then moves on to internet media:

It was the volunteers, webmasters, who created the first websites. Websites made information easily accessible. The website was property and a brand, vouching for the reputation of the content and data there. Users bookmarked those websites they liked so that they could revisit them later. . . .

In those days, I kept current about the developments in the field by following newsgroups and regularly visiting key websites that curated the information on a particular topic. Google entered the picture by downloading all of Internet and indexing it. . . . the perceived credit for finding information went to Google and no longer to the creators of the websites.

He continues:

After a few years of maintaining my website, I was no longer receiving much appreciation for this work, so I have given up maintaining the pages on my website and curating links. This must have happened around 2005. An increasing number of Wikipedia editors are giving up their unpaid efforts to maintain quality in the fight with vandalism or content spam. . . . On the other hand, marketers continue to have an incentive to put information online that would lead to sales. As a result of depriving contributors to the open web with brand and credit, search results on Google tend to be of worse quality.

And then:

When Internet search was gradually taking over from websites, there was one area where a writer’s personal property and personal brand were still protected: blogging. . . . The community connected through the comments on blog posts. The bloggers were known and personally subscribed to.

That’s where I came in!

Aleks continues:

Alas, whenever there’s an unprotected resource online, some startup will move in and harvest it. Social media tools simplified link sharing. Thus, an “influencer” could easily post a link to an article written by someone else within their own social media feed. The conversation was removed from the blog post and instead developed in the influencer’s feed. As a result, carefully written articles have become a mere resource for influencers. As a result, the number of new blogs has been falling.
Social media companies like Twitter and Facebook reduced barriers to entry by making so easy to refer to others’ content . . .

I hadn’t thought about this, but, yeah, good point.

As a producer of “content”—for example, what I’m typing right now—I don’t really care if people come to this blog from Google, Facebook, Twitter, an RSS feed, or a link on their browser. (There have been cases where someone’s stripped the material from here and put it on their own site without acknowledging the source, but that’s happened only rarely.) Any of those legitimate ways of reaching this content is fine with me: my goal is just to get it out there, to inform people and to influence discussion. I already have a well-paying job, so I don’t need to make money off the blogging. If it did make money, that would be fine—I could use it to support a postdoc—but I don’t really have a clear sense of how that would happen, so I haven’t ever looked into it seriously.

The thing I hadn’t thought about was that, even if it doesn’t matter to me where our readers are coming from, this does matter to the larger community. Back in the day, if someone wanted to link or react to something on a blog, they’d do it in their own blog or in a comment section. Now they can do it from Facebook or Twitter. The link itself is no problem, but there is a problem in that there’s less of an expectation of providing new content along with the link. Also, Facebook and Twitter are their own communities, which have their strengths but which are different from those of blogs. In particular, blogging facilitates a form of writing where you fill in all the details of your argument, where you can go on tangents if you’d like, and where you link to all relevant sources. Twitter has the advantage of immediacy, but often it seems more like community without the content, where people can go on and say what they love or hate but without the space for giving their reasons.

“They got a result they liked, and didn’t want to think about the data.” (A fish story related to Cannery Row)

John “Jaws” Williams writes:

Here is something about a century-old study that you may find interesting, and could file under “everything old is new again.”

In 1919, the California Division of Fish and Game began studying the developing sardine fishery in Monterey. Ten years later, W. L. Scofield published an amazingly thorough description of the fishery, the abstract of which begins as follows:

The object of this bulletin is to put on record a description of the Monterey sardine fishery which can be used as a basis for judging future changes in the conduct of this industry. Detailed knowledge of changes is essential to an understanding of the significance of total catch figures, or of records of catch per boat or per seine haul. It is particularly necessary when applying any form of catch analysis to a fishery as a means of illustrating the presence or absence of depletion or of natural fluctuations in supply.

As detailed in this and subsequent reports, the catch was initially limited by the market and the capacity of the fishing fleet, both of which grew rapidly for several decades and provided the background for John Steinbeck’s “Cannery Row.” Later, the sardine population famously collapsed and never recovered.

Sure enough, just as Scofield feared, scientists who did not understand the data subsequently misused it as reflecting the sardine population, as I pointed out in this letter (which got the usual kind of response). They got a result they liked, and didn’t want to think about the data.

The Division of Fisheries was not the only agency to publish detailed descriptive reports. The USGS and other agencies did as well, but generally they have gone out of style; they take a lot of time and field work, are expensive to publish, and don’t get the authors much credit.

This comes to mind because I am working on a paper about a debris flood on a stream in one of the University of California’s natural reserves, and the length limits for the relevant print journals don’t allow for a reasonable description of the event and a discussion of what it means. However, now I can write a separate and more complete description, and have it go as on-line supplementary material. There is some progress.
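To see Scofield’s worry in toy form, here’s a little simulation sketch (my own illustration, not from Scofield or Williams, with invented numbers): if the catch is capped by market demand and fleet capacity rather than by fish abundance, then the total-catch series tracks the growth of the fleet, not the population, and reading it as a population index is exactly the mistake Williams describes.

# Toy model in R: constant fish abundance, growing fleet/market capacity (invented numbers)
years     <- 1919:1950
abundance <- rep(100, length(years))            # assume the population is roughly steady
capacity  <- pmin(5 * 1.15^(years - 1919), 80)  # fleet/market capacity grows ~15%/year, then saturates
catch     <- pmin(capacity, abundance)          # catch limited by capacity, not by the fish
plot(years, catch, type = "l", xlab = "Year", ylab = "Total catch (arbitrary units)")
# The catch curve rises steadily even though abundance never changes.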

The Ten Craziest Facts You Should Know About A Giraffe:

Palko points us to this story:

USC oncologist David Agus’ new book is rife with plagiarism

The publication of a new book by Dr. David Agus, the media-friendly USC oncologist who leads the Lawrence J. Ellison Institute for Transformative Medicine, was shaping up to be a high-profile event.

Agus promoted “The Book of Animal Secrets: Nature’s Lessons for a Long and Happy Life” with appearances on CBS News, where he serves as a medical contributor, and “The Howard Stern Show,” where he is a frequent guest. Entrepreneur Arianna Huffington hosted a dinner party at her home in his honor. The title hit No. 1 on Amazon’s list of top-selling books about animals a week before its March 7 publication.

However, a [Los Angeles] Times investigation found at least 95 separate passages in the book that resemble — sometimes word for word — text that originally appeared in other published sources available on the internet. The passages are not credited or acknowledged in the book or its endnotes. . . .

The passages in question range in length from a sentence or two to several continuous paragraphs. The sources borrowed from without attribution include publications such as the New York Times and National Geographic, scientific journals, Wikipedia and the websites of academic institutions.

The book also leans heavily on uncredited material from smaller and lesser-known outlets. A section in the book on queen ants appears to use several sentences from an Indiana newspaper column by a retired medical writer. Long sections of a chapter on the cardiac health of giraffes appear to have been lifted from a 2016 blog post on the website of a South African safari company titled, “The Ten Craziest Facts You Should Know About A Giraffe.”

Never trust a guy who wears a button down shirt and sweater and no tie.

The author had something to say:

“I was recently made aware that in writing The Book of Animal Secrets we relied upon passages from various sources without attribution, and that we used other authors’ words. I want to sincerely apologize to the scientists and writers whose work or words were used or not fully attributed,” Agus said in a statement. “I take any claims of plagiarism seriously.”

From the book:

“I’m not pitching a tent to watch chimpanzees in Tanzania or digging through ant colonies to find the long-lived queen, for example,” he writes. “I went out and spoke to the amazing scientists around the world who do these kinds of experiments, and what I uncovered was astonishing.”

All good, except that when he said, “I went out and spoke to the amazing scientists around the world,” he meant to say, “I went on Google and looked up websites of every South African safari company I could find.”

“The Ten Craziest Facts You Should Know About A Giraffe,” indeed.

And here are a few relevant screenshots:

I have no idea what that light bulb thingie is doing in that last image, but here’s some elaboration:

“Research misconduct,” huh? I guess if USC ever gives Dr. Agus a hard time about that, he could just move a few hundred miles to the north, where they don’t care so much about that sort of thing.

Why is every action hero named Jack, John, James, or, occasionally, Jason, but never Bill, Bob, or David?

Demetria Glace writes:

I wasn’t the first to make the connection, but once I noticed it, it was everywhere. You walk past a poster for a new movie and think, Why is every action hero named Jack, John, James, or, occasionally, Jason?

I turned to my friends and colleagues, asking desperately if they had also noticed this trend, as I made my case by listing off well-known characters: John Wick, Jason Bourne, Jack Reacher, John McClane, James Bond, Jack Bauer, and double hitter John James Rambo. . . .

As a data researcher, I [Glace] had to get to the bottom of it. What followed was months of categorizing hundreds of action movies, consulting experts in the field of name studies, reviewing academic papers and name databases, and seeking interviews with authors and screenwriters as to the rationale behind their naming decisions. . . .

Good stuff. It’s fun to see a magazine article with the content of a solid blog post.

Don’t get me wrong, I enjoy reading magazines. But magazine articles, even good magazine articles, follow a formula: they start off with a character and maybe an anecdote, then they ease into the main topic, they follow through with a consistent story, ending it all with a pat summary. By contrast, a blog post can start anywhere, go wherever it wants, and, most importantly, does not need to come to a coherent conclusion. The above-linked article on hero names was like that, and I was happy to see it running in Slate.

Cheating in science, sports, journalism, business, and art: How do they differ?

I just read “Lying for Money: How Legendary Frauds Reveal the Workings of Our World,” by Dan Davies.

I think the author is the same Dan Davies who came up with the saying, “Good ideas do not need lots of lies told about them in order to gain public acceptance,” and also the “dsquared” who has occasionally commented on this blog, so it is appropriate that I heard about his book in a blog comment from historian Sean Manning.

As the title of this post indicates, I’m mostly going to be talking here about the differences between frauds in three notoriously fraud-infested but very different fields of human endeavor: science, sports, and business.

But first I wanted to say that this book by Davies is one of the best things about economics I’ve ever read. I was trying to think what made it work so well, and I realized that the problem with most books about economics is that they’re advertising the concept of economics, or they’re fighting against dominant economics paradigms . . . One way or another, those books are about economics. Davies’s book is different in that he’s not saying that economics is great, he’s not defensive about economics, and he’s not attacking it either. His book is not about economics; it’s about fraud, and he’s using economics as one of many tools to help understand fraud. And then when he gets to Chapter 7 (“The Economics of Fraud”), he’s well situated to give the cleanest description I’ve ever seen of economics, integrating micro to macro in just a few pages. I guess a lot of readers and reviewers will have missed that bit because it’s not as lively as the stories at the front of the book, also, who ever gets to Chapter 7, right?, and that’s kinda too bad. Maybe Davies could follow up with a short book, “Economics, what’s it all about?” Probably not, though, as there are already a zillion other books of this sort, and there’s only one “Lying for Money.” I’m sure there are lots of academic economists and economics journalists who understand the subject as well or better than Davies; he just has a uniquely (as far as I’ve seen) clear perspective, neither defensive nor oppositional but focused on what’s happening in the world rather than on academic or political battles for the soul of the field. (See here and here for further discussion of this point.)

Cheating in business

Cheating in business is what “Lying for Money” is all about. Davies mixes stories of colorful fraudsters with careful explanations of how the frauds actually worked, along with some light systematizing of different categories of financial crime.

In his book, Davies does a good job of not blaming the victims. He does not push the simplistic line that “you can’t cheat an honest man.” As he points out, fraud is easier to commit in an environment of widespread trust, and trust is in general a good thing in life, both because it is more pleasant to think well of others and also because it reduces transaction costs of all sorts.

Linear frauds and exponential frauds

Beyond this, one of the key points of the book is that there are two sorts of frauds, which I will call linear and exponential.

In a linear fraud, the fraudster draws money out of the common reservoir at a roughly constant rate. Examples of linear frauds include overbilling of all sorts (medical fees, overtime payments, ghost jobs, double charging, etc.), along with the flip side of this, which is not paying for things (tax dodging, toxic waste dumping, etc.). A linear fraud can go on indefinitely, until you get caught.

In an exponential fraud, the fraudster needs to keep stealing more and more to stay solvent. Examples of exponential frauds include pyramid schemes (of course), mining fraud, stock market manipulations, and investment scams of all sorts. A familiar example is Bernie Madoff, who raised zillions from people by promising them unrealistic returns on their money, but as a result incurred many more zillions of financial obligations. The scam was inherently unsustainable. Similarly with Theranos: the more money they raised from their investors, the more trouble they were in, given that they didn’t actually ever have a product. With an exponential fraud you need to continue expanding your circle of suckers—once that stops, you’re done.

A linear fraud is more sustainable—I guess the most extreme example might be Mister 880, the counterfeiter of one-dollar bills who was featured in a New Yorker article many years ago—but exponential frauds can grow your money faster. Embezzling can go either way: in theory you can sustainably siphon off a little bit every month without creating noticeable problems, but in practice embezzlers often seem to take more money than is actually there, giving them unending future obligations to replace the missing funds.
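To see the difference in toy form, here’s a minimal R sketch (my own illustration; all the numbers are invented): a linear fraud skims a constant amount each month, while a Ponzi-style fraud that promises a fixed monthly return sees its outstanding obligations compound until it needs ever more new money just to stand still.

# Toy comparison of linear vs. exponential fraud (invented numbers, illustration only)
months      <- 1:60
linear_take <- cumsum(rep(10, length(months)))  # skim a constant 10 units per month
ponzi_owed  <- 100 * 1.05^months                # raise 100, promise 5% per month: obligations compound
plot(months, ponzi_owed, type = "l", log = "y",
     xlab = "Month", ylab = "Cumulative take or amount owed (log scale)")
lines(months, linear_take, lty = 2)
legend("topleft", legend = c("Exponential (Ponzi) obligations", "Linear skim"), lty = 1:2)

Run it and the dashed line creeps up steadily while the solid line races away, which is why the exit-strategy question below matters so much more for the exponential case.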

With any exponential fraud, the challenge is to come up with an exit strategy. Back in the day, you could start a pyramid scheme or other such fraud, wait until a point where the scam had gone long enough that you had a good profit but before you reach the sucker event horizon, and then skip town. The only trick is to remember to jump off the horse before it collapses. For business frauds, though, there’s a paper trail, so it’s harder to leave without getting caught. The way Davies puts it is that in your life you have one chance to burn your reputation in this way.

Another way for a fraudster to escape, financially speaking, is to go legit. If you’re a crooked investor, you can take your paper fortune to the racetrack or the stock market and make some risky bets: if you win big, you can pay off your funders and retire. Unfortunately, if you win big, and you’re already the kind of person to conduct an exponential fraud in the first place, it seems likely you’ll just take this as a sign that you should push further. Sometimes, though, you can keep things going indefinitely by converting an exponential into a linear scheme, as seems to have happened with some multilevel marketing operations. As Davies says, if you can get onto a stable financial footing, you have something that could be argued was never a fraud at all, just a successful business that makes its money by convincing people to pay more for your product than it’s worth.

The final exit strategy is recidivism, or perhaps rehabilitation. Davies shares many stories of fraudsters who got caught, went to prison, then popped out and committed similar crimes again. They kept doing what they were good at! Every once in awhile you see a fraudster who managed to grease enough palms that after getting caught he could return to life as a rich person, for example Michael Milken.

One other thing. Yes, exponential frauds are especially unsustainable, but linear frauds can be tricky to maintain too. Even if you’re cheating people at a steady, constant rate, so you have no pressing need to raise funds to cover your past losses, you’re still leaving a trail of victims behind, and any one of them can decide to be the one to put in the effort to stop you. More victims = greater odds of being tracked down. There’s all sorts of mystique about “cooling off the mark,” but my impression is that the main way that scammers get away with their frauds is by maintaining some physical distance from the people they’ve scammed, and by taking advantage of the legal system to make life difficult for any whistleblowers or victims who come after them. Again, see Theranos.

Cheating in science

Science fraud is a mix of linear and exponential. The linear nature of the fraud is that it’s typically a little bit of cheating in paper after paper, grant proposal after grant proposal, Ted talk after Ted talk, a lie here, an exaggeration there, some data manipulation, some p-hacking, each time doing whatever it takes to get the job done. The fraud is linear in that there’s no compounding; it’s not like each new research project requires an ever-larger supply of fake data to make up for what was taken last time.

On the other hand, there’s a potentially exponential problem that, if you use fraud to produce an important “discovery,” others will want to replicate it for themselves, and when those replications fail, you’ll need to put in even more effort to prop up your original claims. In business, this propping-up can take different forms (new supplies of funds, public relations, threats, delays, etc.), and similarly there are different ways in science to prop up fake claims: you can ignore the failed replications and hope for the best, you can attack the replicators, you can use connections in the news media to promote your view and use connections in academia to publish purported replications of your own, you can jump sideways into a new line of research and cheat to produce success there . . . lots of options. The point is, fake scientific success is hydra-headed: it will spawn continuing waves of replication challenges. As with financial fraud, the challenge, after manufacturing a scientific success, is to draw a line under it, to get it accepted as canon, something they can never take away from you.

Cheating in sports

Lance Armstrong is an example of an exponential fraud. He doped to win bike races—apparently everybody was doping at the time. But Lance was really really good at doping. People started to talk, and then Lance had to do more and more to cover it up. He engaged in massive public relations, he threatened people, he tried to wait it out . . . nothing worked. Dude is permanently disgraced. It seems that he’s still rich, though: according to wikipedia, “Armstrong owns homes in Austin, Texas, and Aspen, Colorado, as well as a ranch in the Texas Hill Country.”

Other cases of sports cheating have more of a linear nature. Maradona didn’t have to keep punching balls into the net; once was enough, and he still got to keep his World Cup victory. If Brady Anderson doped, he just did it and that was that; no escalating behavior was necessary.

Cheating in journalism

Journalists cheat by making things up in the fashion of Mike Barnicle or Jonah Lehrer, or by reporting stories that originally appeared elsewhere without crediting the original source, which I’ve been told is standard practice at the New York Times and other media outlets. Reporting an already-told story without linking to the source is considered uncool in the blogging world but is so common in regular journalism that it’s not even considered cheating! Fabrication, though, remains a bridge too far.

Overall I’d say that cheating in journalism is like cheating in science and sports in largely being linear. Every instance of cheating leaves a hostage to fortune, so as you continue to cheat in your career, it seems likely you’ll eventually get found out for something or another, but there’s no need for an exponential increase in the amount of cheating in the way that business cheaters need to recoup larger and larger losses.

The other similarity of cheating in journalism to cheating in other fields is the continuing need for an exit strategy, with the general idea being to build up reputational credit during the fraud phase that you can then cash in during the discovery phase. That is, once enough people twig to your fraud, you are already considered too respectable or valuable to dispose of. Mike Barnicle is still on TV! Malcolm Gladwell is still in the New Yorker! (OK, Gladwell isn’t doing fraud, exactly: rather than knowingly publishing lies, he’s conveniently putting himself in the position where he can publish untrue and misleading statements while setting himself behind some sort of veil of ignorance where he can’t be held personally to blame for these statements. He’s playing the role of a public relations officer who knows better than to check the veracity of the material he’s being asked to promote.)

Art fraud

I don’t have anything really to say about cheating in art, except that it’s a fascinating topic and much has been written about it. Art forgery involves some amusing theoretical questions, such as: if someone copies a painting or a style of a no-longer-living artist so effectively that nobody can tell the difference, is anyone harmed, other than the owners of existing work whose value is now diluted? From a business standpoint, though, art forgery seems similar to other forgery in being an essentially linear fraud, again leading to a linearly increasing set of potentially incriminating clues.

Closely related to art fraud is document fraud, for example the hilarious and horrifying (but more hilarious than horrifying) gospel of Jesus’s wife fraud, and this blurs into business fraud (the documents are being sold) and science fraud (in this case, bogus claims about history).

Similarities between cheating in business, science, sports, and journalism

Competition is a motivation for cheating. It’s hard to compete in business, science, sports, and journalism. Lots of people want to be successes and there aren’t enough slots for everyone. So if you don’t have the resources or talent or luck to succeed legitimately, cheating is an alternative path. Or if you are well situated for legitimate success, cheating can take you to the next level (I’m looking at you, Barry Bonds).

Cheating as a shortcut to success, that’s one common thread in all these fields of endeavor. There’s also cheating in politics, which I’m interested in as a political scientist, but right now I’m kinda sick of thinking about lying cheating political figures—this includes elected officials but also activists and funders (i.e., the bribers as well as the bribed)—so I won’t consider them here.

Another common thread is that you’re not supposed to cheat, so the cheater has to keep it hidden, and sometimes the coverup is, as they say, worse than the crime.

A final common thread is that business, science, sports, journalism, and art are . . . not cartels, necessarily, but somewhat cooperative enterprises whose participants have a stake in the clean reputation of the entire enterprise. This motivates them to look away when they see cheating. It’s unpleasant, and it’s bad all around for the news to spread, as this could lead to increased distrust of the entire enterprise. Better to stick to positivity.

Differences

The key difference I see between these different areas is that in business it’s kinda hard to cheat by accident. In science we have Clarke’s Law: Any sufficiently crappy research is indistinguishable from fraud. In business or sports we wouldn’t say that. OK, there might be some special cases, for example someone sells tons of acres of Florida swampland and is successful because he (the salesman) sincerely thinks it’s legitimate property, but in general I think of business frauds as requiring something special, some mix of inspiration, effort, and lack of scruple that most of us can’t easily assemble. A useful idiot might well be useful as part of a business fraud, but I wouldn’t think that ignorance would be a positive benefit.

In contrast, in research, a misunderstanding of scientific method can really help you out, if your goal is to produce publishable, Gladwell-able, Freakonomics-able, NPR-able, Ted-able work. The less you know and the less you think, the further you can go. Indeed, if you approach complete ignorance of a topic, you can declare that you’ve discovered an entire new continent, and a pliable news media will go with you on that. And if you’re clueless enough, it’s not cheating, it’s just ignorance!

In this dimension, sports and art seem more like business, and journalism seems more like science. Yes, you can cheat in sports without realizing it, but knowing more should allow you to be more effective at it. I can’t think of a sporting equivalent to those many scientists who produce successful lines of research by wandering down forking paths, declaring statistical significance, and not realizing what they’ve been doing.

With journalism, though, there’s a strong career path of interviewing powerful people and believing everything they say, never confronting them. To put it another way, there’s only one Isaac Chotiner, but there are lots and lots of journalists who deal in access, and I imagine that many of them are sincere, i.e. they’re misleading their readers by accident, not on purpose.

Other thoughts inspired by the book Lying for Money

I took notes while reading Davies’s book. Page references are to the Profile Books paperback edition.

p.14, “All this was known at the time.” This comes up again on p.71: “At this point, the story should have been close to its conclusion. Indeed, the main question people asked in 1982, when OPM finally gave up and went bankrupt, is why didn’t it happen three years earlier? Like a Looney Toons character, nothing seemed to stop it. New investors were brought in as the old ones gave up in disgust.” This happens all the time; indeed, one of the things that struck me about the Theranos story was how the company thrived for nearly a decade after various people in the company realized the emptiness of the company’s efforts.

A fraud doesn’t stay afloat all by itself; it takes a lot of effort to keep it going. This effort can include further lies, the judicious application of money, and, as with Theranos, threats and retaliation. It’s a full-time job! Really there’s no time to make up the losses or get the fictional product to work, given all the energy being spent to keep the enterprise alive for years after the fact of the fraud is out in the open.

p.17, “Fraudsters don’t play on moral weaknesses, greed or fear; they play on weaknesses in the system of checks and balances.” I guess it’s a bit of both, no? One thing I do appreciate, though, is the effort Davies puts in to not present these people as charming rogues.

I want to again point to a key difference between fraud in business and fraud in science. Business fraud requires some actual talent, or at least an unusual lack of scruple or willingness to take risks, characteristics that set fraudsters apart from the herd. In contrast, scientific misconduct often just seems to require some level of stupidity, enough so that you can push buttons, get statistical results, and draw ridiculous conclusions without looking back. Sure, ambition and unscrupulousness can help, but in most cases just being stupid seems like enough, and also is helpful in the next stage of the process when it’s time to non-respond to criticism.

p.18, “Another thing which will come up again and again is that it is really quite rare to find a major commercial fraud which was the fraudster’s first attempt. An astonishingly high proportion of the villains of this book have been found out and even served prison time, then been placed in positions of trust once again.” I’m reminded of John Gribbin and John Poindexter.

Closer to home, there was this amazing—by which I mean amazingly horrible—story of a public school that was run jointly by the New York City Department of Education and Columbia University Teachers College. The principal of this school had some issues. From the news report:

In 2009 and 2010, while Ms. Worrell-Breeden was at P.S. 18, she was the subject of two investigations by the special commissioner of investigation. The first found that she had participated in exercise classes while she was collecting what is known as “per session” pay, or overtime, to supervise an after-school program. The inquiry also found that she had failed to offer the overtime opportunity to others in the school, as required, before claiming it for herself.

The second investigation found that she had inappropriately requested and obtained notarized statements from two employees at the school in which she asked them to lie and say that she had offered them the overtime opportunity.

After those findings, we learn, “She moved to P.S. 30, another school in the Bronx, where she was principal briefly before being chosen by Teachers College to run its new school.”

So, let’s get this straight: She was found to be a liar, a cheat, and a thief, and then, with that all known, she was hired to two jobs as school principal?? An associate vice president of Teachers College said, “We felt that on balance, her recommendations were so glowing from everyone we talked to in the D.O.E. that it was something that we just were able to live with.” In short: once you’re plugged in, you stay plugged in.

p.47: Davies talks about how online drug dealers eventually want to leave the stressful business of drug dealing, and at this point they can cash in their reputation by taking a lot of orders and then disappearing with customers’ money. An end-of-career academic researcher can do something similar if they want, using an existing reputation to promote bad ideas. Usually though you wouldn’t want to do that, as there’s no anonymity so the negative outcome can reflect badly on everything that came before. The only example I can think of offhand is the Cornell psychology researcher Daryl Bem, who is now indelibly associated with some very bad papers he wrote on extra-sensory perception. I was also gonna include Orson Welles here, as back in the 1970s he did his very best to cash in his reputation on embarrassing TV ads. But, decades later, the ads are just an amusing curiosity and Orson’s classic movies are still around: his reputation survived just fine.

p.50: “When the same features of a system keep appearing without anyone designing them, you can usually be pretty sure that the cause is economic.” Well put!

p.57: Regarding Davies’s general point about fraud preying upon a general environment of trust, I want to say something about the weaponization of trust. An example is when a researcher is criticized for making scientific errors and then turns around, in a huff, and indignantly says he’s being accused of fraud. The gambit is to move the discussion from the technical to the personal, to move from the question of whether there really is salad oil in those tanks to the question of whether the salad oil businessman can be trusted.

p.62: Davies writes, “fraud is an unusual condition; it’s a ‘tail risk.'” All I can say is, fraud might be an unusual “tail risk” in business, but in science it’s usual. It happens all the time. Just in my own career, I had a colleague who plagiarized; another one who published a report deliberately leaving out data that contradicted the story he wanted to tell; another who lied, cheated, and stole (I can’t be sure about that one as I didn’t see it personally; the story was told to me by someone who I trust); another who smugly tried to break an agreement; and another who was conned by a coauthor who made up data. That’s a lot! It’s two cases that directly affected me and three that involved people I knew personally. There was also Columbia faking its U.S. News ranking data; I don’t know any of the people involved but, as a Columbia employee, I guess that I indirectly benefited from the fraud while it was happening.

I’d guess that dishonesty is widespread in business as well. So I think that when Davies wrote “fraud is an unusual condition,” he really meant that “large-scale fraud is an unusual condition”; indeed, that would fit the rest of his discussion on p.62, where he talks about “big systematic fraud” and “catastrophic fraud loss.”

This also reminds me of the problems with popular internet heuristics such as “Hanlon’s razor,” “steelmanning,” and “Godwin’s law,” all of which kind of fall apart in the presence of actual malice, actual bad ideas, and actual Nazis. The challenge is to hold the following two ideas in your head at once:

1. In science, bad work does not require cheating; in science, honesty and transparency are not enough; just cos I say you did bad work it doesn’t mean I’m accusing you of fraud; just cos you followed the rules as you were taught and didn’t cheat it doesn’t mean you made the discovery you thought you did.

2. There are a lot of bad guys and cheaters out there. It’s typically a bad idea to assume that someone is cheating, but it’s also often a mistake to assume that they’re not.

p.65: Davies refers to a “black hole of information.” I like that metaphor! It’s another way of saying “information laundering”: the information goes into the black hole, and when it comes out its source has been erased. Traditionally, scientific journals have functioned as such a black hole, although nowadays we are more aware that, even if a claim has been officially “published,” it should still be possible to understand it in the context of the data and reasoning that have been used to justify it.

As Davies puts it on p.71, “People don’t check up on things which they believe to have been ‘signed off.’ The threat is inside the perimeter.” I’ve used that analogy too! From 2016: “the current system of science publication and publicity is like someone who has a high fence around his property but then keeps the doors of his house unlocked. Any burglar who manages to get inside the estate then has free run of the house.”

p.76: “The government . . . has some unusual characteristics as a victim (it is large, and has problems turning customers away).” This reminds me of scientific frauds, where the scientific community (and, to the extent that the junk science has potential real-world impact, the public at large) is the victim. Scientific journals have the norm of taking every submission seriously; also, a paper that is rejected from one journal can be submitted elsewhere.

p.77: “If there is enough confusion around, simply denying everything and throwing counter-accusations at your creditors can be a surprisingly effective tactic.” This reminds me of the ladder of responses to criticism.

p.78: Davies describes the expression “cool out the mark” as having been “brought to prominence by Erving Goffman.” That’s not right! Cooling out the mark was already discussed in great detail in linguist David Maurer’s classic book from 1940, The Big Con. More generally, I find Goffman irritating for reasons discussed here, so I really don’t like to see him credited for something that Maurer already wrote about.

p.114: “Certain kinds of documents are only valid with an accountant’s seal of approval, and once they have gained this seal of validity, they are taken as ‘audited accounts’ which are much less likely to be subjected to additional verification or checking.” Davies continues: “these professions are considered to be circles of trust. The idea is partly that the long training and apprenticeship processes of the profession ought to develop values of trust and honesty, and weed out candidates who do not possess them. And it is partly that professional status is a valuable asset for the person who possesses it.”

This reminds me of . . . academic communities. Not all, but much of the time. This perspective helps answer a question that’s bugged me for awhile: When researchers do bad work, why do others in their profession defend them? Just to step away from our usual subjects of economics and psychology for a moment, why were the American Statistical Association and the American Political Science Association not bothered by having given major awards to plagiarists (see here and here)? You’d think they’d be angry about getting rooked, or at least concerned that their associations are associated with frauds. But noooo, the powers that be in these organizations don’t give a damn. The Tour de France removed Lance Armstrong’s awards, but ASA and APSA can’t be bothered. Why? One answer is that they—we!—benefit from the respect given to people in our profession. To retract awards is to admit that this respect is not always earned. Better to just let everyone quietly go about their business.

On p.124, Davies shares an amusing story of the unraveling of a scam involving counterfeit Portuguese banknotes: “While confirming them to be genuine, the inspector happened to find two notes with the same serial numbers—a genuine one had been stacked next to its twin. Once he knew what to look for, it was not too difficult to find more pairs. . . .” The birthday problem in the wild!
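To make the birthday-problem point concrete, here’s a toy calculation of my own (nothing from the book; the pool size and sample sizes are made up): even when serial numbers are drawn from a huge pool, the chance of seeing at least one matching pair in a modest stack of notes climbs much faster than intuition suggests.

```python
# A minimal sketch of the birthday-problem intuition behind the banknote story.
# The numbers below are hypothetical, chosen only for illustration.

def p_at_least_one_duplicate(n_serials, k_notes):
    """Probability that k notes drawn uniformly at random from a pool of
    n_serials distinct serial numbers contain at least one repeated serial."""
    p_all_distinct = 1.0
    for i in range(k_notes):
        p_all_distinct *= (n_serials - i) / n_serials
    return 1.0 - p_all_distinct

# With a million possible serials, a stack of a few thousand notes is already
# very likely to contain a collision.
for k in (100, 1_000, 5_000):
    print(k, round(p_at_least_one_duplicate(1_000_000, k), 3))
```

So once duplicated serials are circulating at all, an inspector who looks at enough notes will eventually stumble on a pair, which is just what happened.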

p.126: “mining is a sector of the economy in which standards of honesty are variable but requirements for capital are large, and you can keep raising money for a long time before you have to show results.” Kind of like some academic research and tech industries! Just give us a few more zillion dollars and eventually we’ll turn a profit . . .

p.130: “The key to any certification fraud is to exploit the weakest link in the chain.” Good point!

p.131: “It’s often a very good idea to make sure that one is absolutely clear about what a certification process is actually capable of certifying . . . Gaps like this—between the facts that a certification authority can actually make sure of, and those which it is generally assumed it can—are the making of counterfeit fraud.”

This reminds me of scientific error—not usually fraud, I think, but rather the run-of-the-mill sorts of mistakes that researchers, journals, and publicists make every day because they don’t think about the gap between what has been measured and what is being claimed. Two particularly ridiculous examples from psychology are the 3-day study that was called “long term” and the paper whose abstract concluded, “That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications,” even though the reported studies had no measures whatsoever of anyone “becoming more powerful,” let alone any actionable implications of such an unmeasured quantity. Again, I see no reason to think these researchers were cheating; they were just following standard practice of making strong claims that sound good but were not addressed by their data. Given that experimental scientists—people whose job is to connect measurement to a larger reality!—regularly make this sort of mistake, I guess it’s not a surprise that the same problem arises in business.

p.134: Davies writes that medical professionals “have a long training program, a strong ethical code and a lot to lose if caught in a dishonest act.” But . . . Surgisphere! Dr. Anil Potti! OK, there are bad apples in every barrel. Also, I’m sure there’s some way these dudes rationalize their deeds. Ultimately, they’re just trying to help patients, right? They’re just being slowed down by all those pesky regulations.

p.136: Davies writes, “The thing is, the certification system for pharmaceuticals is also a safety system.” I love that “The thing is.” It signals to me that Davies didn’t knock himself out writing this book. He wrote the book, it was good, he was done, it got published. When I write an article or book, I get obsessive about the details. Not that I don’t make typos, solecisms, etc., but I’m pretty careful to keep things trim. Overall I think this works; it makes my writing easier to read. But I do think Davies’s book benefits from this relaxed style, not overly worked over. No big deal, just something I noticed in different places in the book.

p.137: “Ranbaxy Laboratories . . . pleaded guilty in 2013 to seven criminal charges relating to the generic drugs it manufactured . . . it was in the habit of using substandard ingredients and manufacturing processes, and then faking test results by buying boxes of its competitors’ branded product to submit to the lab. Ranbaxy’s frauds were an extreme case (although apparently not so extreme as to throw it out of the circle of trust entirely; under new management it still exists and produces drugs today).” Whaaa???

p.145: Davies refers to “the vital element of time” in perpetuating a fraud. A key point here is that uncovering the fraud is never as high a priority to outsiders as perpetuating the fraud is for the fraudsters. Even when money is at stake, the amount of money lost by each individual investor will be less than what is at stake for the perpetrator of the fraud. What this means is that sometimes the fraudster can stay alive by just dragging things out until the people on the other side get tired. That’s a standard strategy of insurance companies, right? To delay, delay, delay until the policyholder just gives up, making the rational calculation that it’s better to just cut your losses.

I’ve seen this sort of thing before, that cheaters take advantage of other people’s rationality. They play a game of chicken, acting a bit (or a lot) crazier than anyone else. It’s the madman theory of diplomacy. We’ve seen some examples recently of researchers who’ve had to deal with the aftermath of cheating collaborators, and it can be tough! When you realize a collaborator is a cheater, you’re dancing with a tiger. Someone who’s willing to lie and cheat and make up data could be willing to do all sorts of things, for example they could be willing to lie about your collaboration. So all of a sudden you have to be very careful.

p.157: “In order to find a really bad guy at a Big Four accountancy firm, you have to be quite unlucky (or quite lucky if that’s what you were looking for). But as a crooked manager of a company, churning around your auditors until you find a bad ‘un is exactly what you do, and when you do find one, you hang on to them. This means that the bad auditors are gravitationally drawn into auditing the bad companies, while the majority of the profession has an unrepresentative view of how likely that could be.”

It’s like p-hacking! Again, a key difference is that you can do bad science on purpose, you can do bad science by accident, and there are a lot of steps in between. What does it mean if you use a bad statistical method, people keep pointing out the problem, and you keep doing it? At some point you’re sliding down the Clarke’s Law slope from incompetence to fraud. In any case, my point is that bad statistical methods and bad science go together. Sloppy regression discontinuity analysis doesn’t have to be a signal that the underlying study is misconceived, but it often is, in part because (a) regression discontinuity is a way to get statistical significance and apparent causal identification out of nothing, and (b) if you are doing a careful, well-formulated study, you might well be able to model your process more carefully. Theory-free methods and theory-free science often go together, and not in a good way.
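To make the churning analogy concrete, here’s a toy simulation of my own (not anything from Davies): if you keep trying analyses on pure-noise data until one crosses the significance threshold, you will report a “finding” far more than 5 percent of the time, just as a crooked manager who keeps churning auditors will eventually land on a bad one.

```python
# A toy p-hacking simulation: each "study" gets 10 tries at a noise-only
# two-group comparison and reports success if any try hits nominal p < 0.05
# (approximated here by |z| > 1.96 on the difference in group means).
import math
import random
import statistics

def one_study(n_per_group=20, n_analyses=10):
    for _ in range(n_analyses):
        a = [random.gauss(0, 1) for _ in range(n_per_group)]
        b = [random.gauss(0, 1) for _ in range(n_per_group)]
        se = math.sqrt(statistics.variance(a) / n_per_group +
                       statistics.variance(b) / n_per_group)
        z = (statistics.mean(a) - statistics.mean(b)) / se
        if abs(z) > 1.96:
            return True
    return False

random.seed(1)
n_studies = 2_000
hits = sum(one_study() for _ in range(n_studies))
print("share of noise-only studies reporting a 'finding':", hits / n_studies)
# With 10 tries per study this is roughly 1 - 0.95**10, around 0.4, not 0.05.
```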

p.161: “The problem is that spotting frauds is difficult, and for the majority of investors not worth spending the effort on.” Spotting frauds is a hobby, not a career or even a job. And that’s not even getting into the Javert paradox.

p.173: “The key psychological element is the inability to accept that one has made a mistake.” We’ve seen that before!

p.200: “The easier something is to manage—the more possible it is for someone to take a comprehensive view of all that’s going on, and to check every transaction individually—the more difficult it is to defraud.” This reminds me of preregistration in science. It’s harder to cheat in an environment where you’re expected to lay out all the steps of your procedure. Cheating in that context is not impossible, but it’s harder.

p.204: Davies discusses “the circumstances under which firms would form, and how the economy would tend not to the frictionless ideal, but to be made up by islands of central planning linked by bridges of price signals.” Well put. I’ve long thought this but, without having a clear formulation in words, it wasn’t so clear to me. This is the bit that made me say the thing at the top of this post, about this being the best economics book I’ve ever read.

p.229: “as laissez-faire economics was just getting off the ground, the Victorian era saw the ideology of financial deregulation grow up at the same time as, and in many cases faster and more vigorously than, financial regulation itself.” That’s funny.

p.231: “The normal state of the political economy of fraud is one of constant pressure toward laxity and deregulation, and this tends only to be reversed when things have got so bad that the whole system is under imminent threat of losing its legitimacy.” Sounds like social psychology! Regarding the application to economics and finance, I think Davies should mention Galbraith’s classic book on the Great Crash, where this laxity and deregulation thing was discussed in detail.

p.243: Davies says that stock purchases by small investors are very valuable to the market because, as a stockbroker, you can “be reasonably sure that you’re not taking too big a risk that the person selling stock to you knows something about it that you don’t.” Interesting point, I’m sure not new to any trader but interesting to me.

p.251: “After paying fines and closing down the Pittston hole, Russ Mahler started a new oil company called Quanta Resources, and somehow convinced the New York authorities that despite having the same owner, employees, and assets, it was nothing to do with the serial polluter that they had banned in 1976.” This story got me wondering: were the authorities asleep at the switch, or were they bribed, or did they just have a policy of letting fraudsters try again?

As Davies writes on p.284, “comparatively few of the case studies we’ve looked at were first offenses. . . . there’s something about the modern economic system that keeps giving fraudsters second chances and putting people back in positions of responsibility when they’ve proved themselves dishonest.” I guess he should say “political and economic system.”

Davies continues: “This is ‘white-collar crime’ we’re talking about after all; one of its defining characteristics is that it’s carried out by people of the same social class as those responsible for making decisions about crime and punishment. We’re too easy on people who look and act like ourselves.” I guess so, but also it can go the other way, right? I think I’m the same social class as Cass Sunstein, but I don’t feel any desire to go easy on him; indeed, it seems to me that, with all the advantages he’s had, he has even less excuse to misrepresent research than someone who came in off the street. From the other direction, he might see me as a sort of class traitor.

p.254: “It’s a crime against the control system of the overall economy, the network of trust and agreement that makes an industrial economy livable.” That’s how I feel about Wolfram Research when they hire people to spam my inbox with flattering lies. If even the classy outfits are trying to con me, what does that say about our world?

p.254: “Unless they are controlled, fraudulent business units tend to outcompete honest ones and drive them out of business.” Gresham!

p.269: “Denial, when you are not part of it, is actually a terrifying thing. One watches one’s fellow humans doing things that will damage themselves, while being wholly unable to help.” I agree. This is how I felt when corresponding with the ovulation-and-clothing researchers and with the elections-and-lifespan researchers. The people on the other side of these discussions seemed perfectly sincere; they just couldn’t consider the possibility they might be on the wrong track. (You could say the same about me, except: (1) I did consider the possibility I could be wrong in these cases, and (2) there were statistical arguments on my side; these weren’t just matters of opinion.) Anyway, setting aside whether I was right or wrong in these disputes, the denial (as I perceived it) just made me want to cry. I don’t think graduate students are well trained in handling mistakes, and then when they grow up and publish research, they remain stuck in this attitude. I can see how this could be even more upsetting if real money and livelihoods are on the line.

Finally

In the last sentence of the last page of his book, Davies writes, “we are all in debt to those who trust; they are the basis of anything approaching a prosperous and civilised society.”

To which I reply, who are the trusters to whom we are in debt? For example, I don’t think we are all in debt to those who trust scams such as Theranos or the Hyperloop, nor are we in debt to the Harvard professor who fell for the forged Jesus document and then tried to explain away its problems rather than just listening to the critics. Nor are we in debt to the administrations of Cornell University, Ohio State University, the University of California, etc., when they did their part to defuse criticism of bad work being done by their faculty who had been so successful at raising money and getting publicity for their institutions.

I get Davies’s point in the context of his book: if you fall for a Wolfram Research scam (for example), you’re not the bad guy. The bad guy is Wolfram Research, which is taking advantage of your state of relaxation, tapping into the difficult-to-replenish reservoir of trust. In other settings, though, the sucker seems more complicit, not the bad guy, exactly—ultimately the responsibility falls on the fraudsters, not the promoters of the fraud—but their state of trust isn’t doing the rest of us any favors, either. So I’m not really sure what to think about this last bit.

P.S. Sean Manning reviews the book here. Perhaps surprisingly, there’s essentially no overlap between Manning’s comments and mine.

Annals of Spam

OK, this one baffles me. It came in my inbox one day:

Dear Dr. Gelman,

I am writing to inquire about the availability of obtaining a visiting scholar position in your institution. I’m a lecturer in ** Institute **. And I’m in my final year of doctoral study in ** University. Currently I’m working on my dissertation on the personal growth and national imagination in American romantic Bildungsroman. After my graduation in January 2024, I plan to start studying abroad for one year and this study program will be sponsored by **. If I could have the honor to be accepted, what I need from you is an invitation letter and your signed short CV for the ** approval process to visit your university.

I’ve been interested in American Literature and history for more than a decade. The close internal relationship between novel and history has always been attractive to me, which enables me to observe literary work from a wider perspective. As a Ph.D. candidate, I have narrowed down my research to the early or the first half of 19th century American Literature and mainly focus on the Bildungsormans during that period. The reasons are as follows: firstly, the personal growth of the protagonist can reflect the national imagination of the author, so through the development of a fictional character the history of a country can be demonstrated; secondly, too many scholars focus on the second half of 19th century novels and 20th century works, neglecting the beginning of American literature. When working in this area, I find that there has been a great gap between research in America and **. While only a couple of researchers in recent years have devoted their study in this period in **, a considerable amount of research has already been dedicated to Charles Brockden Brown and Catharine Maria Sedgwick in America. And in **, it’s very difficult to get first hand material about American writers. Theses drive me to look forward to studying in the US as a visiting scholar.

I have been learning and teaching English for over two decades. I got my B. A. in English Education in 2005 from ** University, the hometown of **. I studied as an English Language and Literature major and got M. A. in 2007 from ** University of **, **. And since then, I have been teaching College English courses in ** Institute **, **. This means that there won’t be a serious language barrier in my communication and study with you.

Desire for knowledge and academic progress pushes me to further study. From August 2007 to July 2020, I also developed some other academic interests besides literature, including college English teaching and multi-modal metaphor, the first of which mainly serves for my teaching work and the second was inspired by some teaching material. These extended my academic horizon on the one hand, but also showed my lack of adequate academic training on the other. In order to improve my academic performance, I applied to be a Ph.D. candidate and now to be a visiting scholar. I sincerely hope that I can study under your supervision.

In order to make you know me better, I enclose my CV. Any comment from you will be highly appreciated. I’m looking forward to hearing from you. Thank you very much for your time and consideration.

Sincerely yours,

**
Lecturer
School of Foreign Languages
** Institute ** **

Don’t they know that my real expertise is on Freud? OK, it seems I’m an expert on North Korea, too. But Charles Brockden Brown and Catharine Maria Sedgwick? I’d never heard of them!