Extinct Champagne grapes? I can be even more disappointed in the news media

Happy New Year. This post is by Lizzie.

Over the end-of-year holiday period, I always get the distinct impression that most journalists are on holiday too. I felt this more acutely when, returning to my inbox after a few days away, I found an “urgent” media request. Someone at a major reputable news outlet wrote:

We are doing a short story on how the climate crisis is causing certain grapes, used in almost all champagne, to be on the brink of extinction. We were hoping to do a quick interview with you on the topic….Our deadline is asap, as we plan to run this story on New Years.

It was late on 30 December, so I had missed helping them, but I still had to reply that I hoped they found some better information, because ‘the climate crisis is causing certain grapes, used in almost all champagne, to be on the brink of extinction’ was not good information in my not-so-entirely-humble opinion, as I study this and can think of zero-zilch-nada evidence to support it.

This sounded like insane news I would expect from more insane media outlets. I tracked down what I assume was the lead they were following (see here), and found that it seems to relate to some AI start-up (which I will not do the service of naming) that is just looking for more press. They seem to put out splashy-sounding agricultural press releases often — and so they must have put out one about Champagne grapes being on the brink of extinction to go with New Year’s.

I am on a bad roll with AI just now, or — more exactly — the intersection of human standards and AI. There’s no good science showing that “the climate crisis is causing certain grapes, used in almost all champagne, to be on the brink of extinction.” The whole idea of this is offensive to me when human actions are actually driving species extinct. And it ignores tons of science on winegrapes and the reality that they’re pretty easy to grow (growing excellent ones? Harder). So, poor form on the part of the zero-standards-for-our-science AI startup. But I am more horrified by the media outlets that cannot see through this. I am sure they’re inundated with lots of crazy bogus stories every day, but I thought their job was to report on the ones that matter and that they have some evidence are true.

What did they do instead of that? They gave a platform to “a highly adaptable marketing manager and content creator” to talk about some bogus “study,” and a few soundbites to a colleague of mine who actually knew the science (Ben Cook from NASA).

Here’s a sad post for you to start the new year. The Onion (ok, an Onion-affiliate site) is plagiarizing. For reals.

How horrible. I remember when The Onion started. They were so funny and on point. And now . . . What’s the point of even having The Onion if it’s running plagiarized material? I mean, yeah, sure, everybody’s gotta bring home money to put food on the table. But, really, what’s the goddam point of it all?

Jonathan Bailey has the story:

Back in June, G/O Media, the company that owns A.V. Club, Gizmodo, Quartz and The Onion, announced that they would be experimenting with AI tools as a way to supplement the work of human reporters and editors.

However, just a week later, it was clear that the move wasn’t going smoothly. . . . several months later, it doesn’t appear that things have improved. If anything, they might have gotten worse.

The reason is highlighted in a report by Frank Landymore and Jon Christian at Futurism. They compared the output of A.V. Club’s AI “reporter” against the source material, namely IMDB. What they found were examples of verbatim and near-verbatim copying of that material, without any indication that the text was copied. . . .

The articles in question have a note that reads as follows: “This article is based on data from IMDb. Text was compiled by an AI engine that was then reviewed and edited by the editorial staff.”

However, as noted by the Futurism report, that text does not indicate that any text is copied. Only that “data” is used. The text is supposed to be “compiled” by the AI and then “reviewed and edited” by humans. . . .

In both A.V. Club lists, there is no additional text or framing beyond the movies and the descriptions, which are all based on IMDb descriptions and, as seen in this case, sometimes copied directly or nearly directly from them.

There’s not much doubt that this is plagiarism. Though A.V. Club acknowledges that the “data” came from IMDb, it doesn’t indicate that the language does. There are no quotation marks, no blockquotes, nothing to indicate that portions are copied verbatim or near-verbatim. . . .

Bailey continues:

None of this is a secret. All of this is well known, well-understood and backed up with both hard data and mountains of anecdotal evidence. . . . But we’ve seen this before. Benny Johnson, for example, is an irredeemably unethical reporter with a history of plagiarism, fabrication and other ethical issues that resulted in him being fired from multiple publications.

Yet, he’s never been left wanting for a job. Publications know that, because of his name, he will draw clicks and engagement. . . . From a business perspective, AI is not very different from Benny Johnson. Though the flaws and integrity issues are well known, the allure of a free reporter who can generate countless articles at the push of a button is simply too great to ignore.

Then comes the economic argument:

But in there lies the problem: if you want AI to function like an actual reporter, it has to be edited, fact checked and plagiarism checked just like a real human.

However, when one does those checks, the errors quickly become apparent and fixing them often takes more time and resources than just starting with a human author.

In short, using an AI in a way that helps a company earn/save money means accepting that the factual errors and plagiarism are just part of the deal. It means completely forgoing journalism ethics, just like hiring a reporter like Benny Johnson.

Right now, for a publication, there is no ethical use of AI that is not either unprofitable or extremely limited. These “experiments” in AI are not about testing what the bots can do, but about seeing how much they can lower their ethical and quality standards and still find an audience.

Ouch.

Very sad to see an Onion-affiliated site doing this.

Here’s how Bailey concludes:

The arc of history has been pulling publications toward larger quantities of lower quality content for some time. AI is just the latest escalation in that trend, and one that publishers are unlikely to ignore.

Even if it destroys their credibility.

No kidding. What next, mathematics professors who copy stories unacknowledged, introduce errors, and then deny they ever did it? Award-winning statistics professors who copy stuff from wikipedia, introducing stupid-ass errors in the process? University presidents? OK, none of those cases were shocking, they’re just sad. But to see The Onion involved . . . that truly is a step further into the abyss.

Explaining that line, “Bayesians moving from defense to offense”

Earlier today we posted something on our recent paper with Erik van Zwet et al., “A New Look at P Values for Randomized Clinical Trials.” The post had the provocative title, “Bayesians moving from defense to offense,” and indeed that title provoked some people!

The discussion thread here at the blog was reasonable enough, but someone pointed me to a thread at Hacker News where there was more confusion, so I thought I’d clarify one or two points.

First, yes, as one commenter puts it, “I don’t know when Bayesians have ever been on defense. They’ve always been on offense.” Indeed, we published the first edition of Bayesian Data Analysis back in 1995, and there was nothing defensive about our tone! We were demonstrating Bayesian solutions to a large set of statistics problems, with no apology.

As a Bayesian, I’m kinda moderate—indeed, Yuling and I published an entire, non-ironic paper on holes in Bayesian statistics, and there’s also a post a few years ago called What’s wrong with Bayes, where I wrote, “Bayesian inference can lead us astray, and we’re better statisticians if we realize that,” and “the problem with Bayes is the Bayesians. It’s the whole religion thing, the people who say that Bayesian reasoning is just rational thinking, or that rational thinking is necessarily Bayesian, the people who refuse to check their models because subjectivity, the people who try to talk you into using a ‘reference prior’ because objectivity. Bayesian inference is a tool. It solves some problems but not all, and I’m exhausted by the ideology of the Bayes-evangelists.”

So, yeah, “Bayesians on the offensive” is not new, and I don’t even always like it. Non-Bayesians have been pretty aggressive too over the years, and not always in a reasonable way; see my discussion with Christian Robert from a few years ago and our followup. As we wrote, “The missionary zeal of many Bayesians of old has been matched, in the other direction, by an attitude among some theoreticians that Bayesian methods were absurd—not merely misguided but obviously wrong in principle.”

Overall, I think there’s much more acceptance of Bayesian methods within statistics than in past decades, in part from the many practical successes of Bayesian inference and in part because recent successes of machine learning have given users and developers of methods more understanding and acceptance of regularization (also known as partial pooling or shrinkage, and central to Bayesian methods) and, conversely, have given Bayesians more understanding and acceptance of regularization methods that are not fully Bayesian.
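
To make that jargon concrete, here is a minimal sketch of partial pooling / shrinkage, my own toy example rather than anything from the papers mentioned above: noisy group estimates get pulled toward a common mean, and the noisier they are, the more they get pulled. All of the numbers, and the assumption that the between-group variance is known, are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_effects = rng.normal(0.0, 1.0, size=8)      # unknown group effects (made up)
se = np.full(8, 2.0)                             # sampling sd of each group's estimate
estimates = true_effects + rng.normal(0.0, se)   # noisy, unpooled estimates

tau2 = 1.0                                       # between-group variance, assumed known here
shrink = tau2 / (tau2 + se**2)                   # weight on the raw estimate, between 0 and 1
partially_pooled = shrink * estimates + (1 - shrink) * estimates.mean()

print("raw estimates:    ", np.round(estimates, 2))
print("partially pooled: ", np.round(partially_pooled, 2))
```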

OK, so what was I talking about?

So . . . Given all the above, what did I mean with my crack about “Bayesians moving from defense to offense”? I wasn’t talking about Bayesians being positive about Bayesian statistics in general; rather, I was talking about the specific issue of informative priors.

Here’s how we used to talk, circa 1995: “Bayesian inference is a useful practical tool. Sure, you need to assign a prior distribution, but don’t worry about it: the prior can be noninformative, or in a hierarchical model the hyperparameters can be estimated from data. The most important ‘prior information’ to use is structural, not numeric.”

Here’s how we talk now: “Bayesian inference is a useful practical tool, in part because it allows us to incorporate real prior information. There’s prior information all around that we can use in order to make better inferences.”

My “moving from defense to offense” line was all about the changes in how we think about prior information. Instead of being concerned about prior sensitivity, we respect prior sensitivity and, when the prior makes a difference, we want to use good prior information. This is exactly the same as in any statistical procedure: when there’s sensitivity to data (or, in general, to any input to the model), that’s where data quality is particularly relevant.
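
Here is a minimal sketch of what that prior sensitivity looks like in the simplest conjugate setting. It is my own illustration with made-up numbers, not an example from the paper: the same noisy estimate is left essentially alone by a diffuse prior and pulled sharply toward zero by an informative one, which is exactly the situation where the quality of the prior information matters.

```python
import numpy as np

def posterior(ybar, se, prior_mean, prior_sd):
    """Posterior mean and sd for a normal mean with known sampling sd (conjugate update)."""
    prec = 1 / prior_sd**2 + 1 / se**2
    post_mean = (prior_mean / prior_sd**2 + ybar / se**2) / prec
    return post_mean, np.sqrt(1 / prec)

ybar, se = 0.8, 0.5                      # a noisy estimate from a small study (made up)
print(posterior(ybar, se, 0.0, 100.0))   # near-flat prior: posterior ~ (0.80, 0.50)
print(posterior(ybar, se, 0.1, 0.2))     # informative prior: posterior ~ (0.20, 0.19)
```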

This does not stop you from using classical p-values

Regarding the specific paper we were discussing in yesterday’s post, let me emphasize that this work is very friendly to traditional/conventional/classical approaches.

As we say right there in the abstract, “we reinterpret the P value in terms of a reference population of studies that are, or could have been, in the Cochrane Database.”

So, in that paper, we’re not saying to get rid of the p-value. It’s a data summary, people are going to compute it, and people are going to report it. That’s fine! It’s also well known that p-values are commonly misinterpreted (as detailed for example by McShane and Gal, 2017).

Given that people will be reporting p-values, and given how often they are misinterpreted, even by professionals, we believe that it’s a useful contribution to research and practice to consider how to interpret them directly, under assumptions that are more realistic than the null hypothesis.

So if you’re coming into all of this as a skeptic of Bayesian methods, my message to you is not, “Chill out, the prior doesn’t really matter anyway,” but, rather, “Consider this other interpretation of a p-value, averaging over the estimated distribution of effect sizes in the Cochrane database instead of conditioning on the null hypothesis.” So you now have two interpretations of the p-value, conditional on different assumptions. You can use them both.
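
Here is a toy simulation of those two interpretations, just to make the contrast concrete. It is not the analysis in the paper: the normal distribution of signal-to-noise ratios below is a made-up stand-in for the effect-size distribution that van Zwet et al. actually estimate from the Cochrane database.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sim = 200_000

# Interpretation 1: condition on the null hypothesis. The p-value is uniform by construction.
z_null = rng.normal(0.0, 1.0, n_sim)
p_null = 2 * stats.norm.sf(np.abs(z_null))
print("P(p < 0.05 | null) =", (p_null < 0.05).mean())        # ~0.05

# Interpretation 2: average over an assumed distribution of true effects.
snr = rng.normal(0.0, 1.6, n_sim)          # assumed signal-to-noise ratios (illustrative only)
z_obs = snr + rng.normal(0.0, 1.0, n_sim)  # observed z-statistic = true SNR + noise
p_obs = 2 * stats.norm.sf(np.abs(z_obs))
sig = p_obs < 0.05
wrong_sign = np.sign(z_obs) != np.sign(snr)
print("P(p < 0.05) =", sig.mean())
print("P(wrong sign | p < 0.05) =", wrong_sign[sig].mean())
```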

Plagiarism means never having to say you’re clueless.

In a blog discussion on plagiarism ten years ago, Rahul wrote:

The real question for me is, how I would react to someone’s book which has proven rather useful and insightful in all aspects but which in hindsight turns out to have plagiarized bits. Think of whatever textbook, say, you had found really damn useful (perhaps it’s the only good text on that topic; no alternatives) and now imagine a chapter of that textbook turns out to be plagiarized.

What’s your reaction? To me that’s the interesting question.

It is an interesting question, and perhaps the most interesting aspect to it is that we don’t actually see high-quality, insightful plagiarized work!

Theoretically such a thing could happen: an author with a solid understanding of the material finds an excellent writeup from another source—perhaps a published article or book, perhaps something on wikipedia, maybe something written by a student—and inserts it directly into the text, not crediting the source. Why not credit the source? Maybe because all the quotation marks would make the resulting product more difficult to read, or maybe just because the author is greedy for acclaim and does not want to share credit. Greed is not a pretty trait, but, as Rahul writes, that’s a separate issue from the quality of the resulting product.

So, yeah, how to think about such a case? My response is that it’s only a hypothetical case, that in practice it never occurs. Perhaps readers will correct me in the comments, but until that happens, here’s my explanation:

When we write, we do incorporate old material. Nothing we write is entirely new, nor should it be. The challenge is often to put that old material into a coherent framework, which requires some understanding. When authors plagiarize, they seem to do this as a substitute for understanding. Reading that old material and integrating it into the larger story, that takes work. If you insert chunks of others’ material verbatim, it becomes clear that you didn’t understand it all, and not acknowledging the source is a way of burying that meta-information. To flip it around: as a reader, that hypertext—being able to track to the original source—can be very helpful. Plagiarists don’t want you to be aware of the copying in large part because they don’t want to reveal that they have not put the material all together.

To use statistical terminology, plagiarism is a sort of informative missingness: the very fact that the use of outside material has not been acknowledged itself provides information that the copyist has not fully integrated it into the story. That’s why Basbøll and I referred to plagiarism as a statistical crime. Not just a crime against the original author—and, yeah, as someone whose work has been plagiarized, it annoys me a lot—but also against the reader. As we put it in that article:

Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work.

To return to Rahul’s question above: have I ever seen something “useful and insightful” that’s been plagiarized? In theory it could happen: just consider an extreme example such as an entirely pirated book. Take a classic such as Treasure Island, remove the name Robert Louis Stevenson and replace it with John Smith, and it would still be a rollicking good read. But I don’t think this is usually what happens. The more common story would be that something absolutely boring is taken from source A and inserted without checking into document B, and no value is added in the transmission.

To put it another way, start with the plagiarist. This is someone who’s under some pressure to produce a document on topic X but doesn’t fully understand the topic. One available approach is to plagiarize the difficult part. From the reader’s perspective, the problem is that the resulting document has undigested material: the copied part could actually be in error or could be applied incorrectly. By not disclosing the source, the author is hiding important information that could otherwise help the reader better parse the material.

If I see some great material from another source, I’ll copy it and quote it. Quotations are great!

Music as a counterexample

In his book, It’s One for the Money, music historian Clinton Heylin gives many examples of musicians who’ve used material from others without acknowledgment, producing memorable and sometimes wonderful results. A well known example is Bob Dylan.

How does music differ from research or science writing? For one thing, “understanding” seems much more important in science than in music. Integrating a stolen riff into a song is just a different process than integrating an explanation of a statistical method into a book.

There’s also the issue of copyright laws and financial stakes. You can copy a passage from a published article, with quotes, and it’s no big deal. But if you take part of someone’s song, you have to pay them real money. So there’s a clear incentive not to share credit and, if necessary, to muddy the waters to make it more difficult for predecessors to claim credit.

Finally, in an academic book or article it’s easy enough to put in quotation marks and citations. There’s no way to do that in a song! Yes, you can include it in the liner notes, and I’d argue that songwriters and performers should acknowledge their sources in that way, but it’s still not as direct as writing, “As X wrote . . .”, in an academic publication.

What are the consequences of plagiarism?

There are several cases of plagiarism by high-profile academics who seem to have suffered no consequences (beyond the occasional embarrassment when people like me bring it up or when people check them on Wikipedia): examples include some Harvard and Yale law professors and this dude at USC. The USC case I can understand—the plagiarist in question is a medical school professor who probably makes tons of money for the school. Why Harvard and Yale didn’t fire their law-school plagiarists, I’m not sure; maybe it’s a combination of “Hey, these guys are lawyers, they might sue us!” and a simple calculation along the lines of: “Harvard fires prof for plagiarism” is an embarrassing headline, whereas “Harvard decides to do nothing about a plagiarist” likely won’t make it into the news. And historian Kevin Kruse still seems to be employed at Princeton. (According to Wikipedia, “In October 2022, both Cornell, where he wrote his dissertation, and Princeton, where he is employed, ultimately determined that these were ‘citation errors’ and did not rise to the level of intentional plagiarism.” On the plus side, “He is a fan of the Kansas City Chiefs.”)

Other times, lower-tier universities just let elderly plagiarists fade away. I’m thinking here of George Mason statistician Ed Wegman and Rutgers political scientist Frank Fischer. Those cases are particularly annoying to me because Wegman received a major award from the American Statistical Association and Fischer received an award from the American Political Science Association—for a book with plagiarized material! I contacted the ASA to suggest they retract the award and I contacted the APSA to suggest that they share the award with the scholars who Fischer had ripped off—but both organizations did nothing. I guess that’s how committees work.

We also sometimes see plagiarists get canned. Two examples are Columbia history professor Charles Armstrong and Arizona State historian Matthew Whitaker. Too bad for these guys that they weren’t teaching at Harvard, Yale, or Princeton, or maybe they’d still be gainfully employed!

Outside academia, plagiarism seems typically to have more severe consequences.

Journalism: Mike Barnicle, Stephen Glass, etc.
Pop literature: that spy novelist (also here), etc.

Lack of understanding

The theme of this post is that, at least regarding academics, plagiarism is a sign of lack of understanding.

A common defense/excuse/explanation for plagiarism is that whatever had been copied was common knowledge, just some basic facts, so who cares if it’s expressed originally? This is kind of a lame excuse given that it takes no effort at all to write, “As source X says, ‘. . .’” There seems little doubt that the avoidance of attribution is there so that the plagiarist gets credit for the words. And why is that? It has to depend on the situation—but it doesn’t seem that people usually ask the plagiarist why they did it. I guess the point is that you can ask the person all you want, but they don’t have to reply—and, given the record of misrepresentation, there’s no reason to expect a truthful answer.

But, yeah, sometimes it must be the case that the plagiarist understands the copied material and is just being lazy/greedy.

What’s interesting to me is how often it happens that the plagiarist (or, more generally, the person who copies without attribution) evidently doesn’t understand the copied material.

Here are some examples:

Weggy: copied from wikipedia, introducing errors in the process.

Chrissy: copied from online material, introducing errors in the process; this example of unacknowledged copying was not actually plagiarism because it was stories being repeated without attribution, not exact words.

Armstrong (not the cyclist): plagiarizing material by someone else, in the process getting the meaning of the passage entirely backward.

Fischer (not the chess player): OK, I have to admit, this one was so damn boring I didn’t read through to see if any errors were introduced in the copying process.

Say what you want about Mike Barnicle and Doris Kearns Goodwin, but I think it’s fair to assume that they did understand the material they were ripping off without attribution.

In contrast, academic plagiarists seem to copy not so much out of greed as from laziness.

Not laziness as in, too lazy to write the paragraph in their own words, but laziness as in, too lazy to figure out what’s going on—even though it’s something they’re supposed to understand.

That’s it!

You’re an academic researcher who is doing some work that relies on some idea or method, and it’s considered important that you understand it. This could be a statistical method being used for data analysis, it could be a key building block in an expository piece, it could be some primary sources in historical work, something like that. Just giving a citation and a direct quote wouldn’t be enough, because that wouldn’t demonstrate the required understanding:
– If you’re using a statistical method, you have to understand it at some level or else the reader can’t be assured that you’re using it correctly.
– In a tutorial, you need to understand the basics; otherwise why are you writing the tutorial in the first place?
– In historical work, often the key contribution is bringing in new primary sources. If you’re not doing that, a lot more of a burden is placed on interpretation, which maybe isn’t your strong point.

So, you plagiarize. That’s the only choice! OK, not the only choice. Three alternatives are:
1. Don’t write and publish the article/book/thesis. Just admit you have nothing to add. But that would be a bummer, no?
2. Use direct quotes and citations. But then there may be no good reason for anyone to want to read or publish the article/book/thesis. To take an extreme example, is Wiley Interdisciplinary Reviews going to publish a paper that is a known copy of a Wikipedia entry? Probably not. Even if your buddy is an editor of the journal, he might think twice.
3. Put in the work to actually understand the method or materials that you’re using. But, hey, that’s a lot of effort! You have a life to lead, no? Working out math, reading obscure documents in a foreign language, actually reading what you need to use, that would take effort! OK, that’s effort that most of us would want to put in; indeed, that’s a big reason we became academics in the first place: we enjoy coding, we enjoy working out math, understanding new things, reading dusty old library books. But some subset of us doesn’t want to do the work.
If, for whatever reason, you don’t want to do any of the above three options, then maybe you’ll plagiarize. And just hope that, if you get caught, you receive the treatment given to the Harvard and Yale law professors, the USC medical school professor, and the Princeton history professor or, if you do it late enough in your career, the George Mason statistics professor and the Rutgers political science professor. So, stay networked and avoid pissing off powerful people within your institution.

As I wrote last year regarding scholarly misconduct more generally:

I don’t know that anyone’s getting a pass. What seems more likely to me is that anyone—left, center, or right—who gets more attention is also more likely to see his or her work scrutinized.

Or, to put it another way, it’s a sad story that perpetrators of scholarly misconduct often “seem to get a pass” from their friends and employers and academic societies, but this doesn’t seem to have much to do with ideological narratives; it seems more like people being lazy and not wanting a fuss.

The tell

The tell, as they say in poker, is that the copied-without-attribution material so often displays a lack of understanding. Not necessarily a lack of ability to understand—Ed Wegman could’ve spent an hour reading through the Wikipedia passage he’d copied and avoided introducing an error; Christian Hesse could’ve spent some time actually reading the words he typed, and maybe even done some research, and avoided errors such as this one, reported by chess historian Edward Winter:

In 1900 Wilhelm/William Steinitz died, a fact which did not prevent Christian Hesse from quoting a remark by Steinitz about a mate-in-two problem by Pulitzer which, according to Hesse, was dated 1907. (See page 166 of The Joys of Chess.) Hesse miscopied from our presentation of the Pulitzer problem on page 11 of A Chess Omnibus (also included in Steinitz Stuck and Capa Caught). We gave Steinitz’s comments on the composition as quoted on page 60 of the Chess Player’s Scrap Book, April 1907, and that sufficed for Hesse to assume that the problem was composed in 1907.

Also, I can only assume that Korea expert Charles Armstrong could’ve carefully read the passage he was ripping off and avoided getting its meaning backward. But having the ability to do the work isn’t enough. To keep the quality up in the finished product, you have to do the work. Understanding new material is hard; copying is easy. And then it makes sense to cover your tracks. Which makes it harder for the reader to spot the mistakes. Etc.

In his classic essay, “Politics and the English Language,” the political journalist George Orwell drew a connection between cloudy writing and cloudy content, which I think applies to academic writing as well. Something similar seems to be going on with copying without attribution. It happens when authors don’t understand what they’re writing about.

P.S. I just came across this post from 2011, “A (not quite) grand unified theory of plagiarism, as applied to the Wegman case,” where I wrote, “It’s not that the plagiarized work made the paper wrong; it’s that plagiarism is an indication that the authors don’t really know what they’re doing.” I’d forgotten about that!

Source of the line, “One should always beat a dead horse because the horse is never really dead”

Remember that quote, “One should always beat a dead horse because the horse is never really dead”?

Paul Alper gives further background:

That dead horse quotation originated with a British colleague of mine during the time I was working for the OECD but based in London in the mid 1960s; our project concerned determining the consequences of raising the school leaving age. Peter was employed by the British Civil Service and thus, he was in frequent contact with people who always wanted to go back to the comfort of square one, invariably the status quo. Thus, the remark illustrating that nothing is ever truly settled and interred with its bones. He was quite a gifted mimic and although he had a northern accent, he could imitate our supervisors’ upper-class accents and speech patterns.

Speaking of northern accents, we watched In the Loop the other day. It was really funny!

Celebrity scientists and the McKinsey bagmen

Josh Marshall writes:

Trump doesn’t think of truth or lies the way you or I do. Most imperfect people, which is to say all of us, exist in a tension between what we believe is true and what is good for or pleasing to us. If we have strong character we hew closely to the former, both in what we say to others and what we say to ourselves. The key to understanding Trump is that it’s not that he hews toward the latter. It’s that the tension doesn’t exist. What he says is simply what works for him. Whether it’s true is irrelevant and I suspect isn’t even part of Trump’s internal dialog. It’s like asking an actor whether she really loved her husband like she claimed in her blockbuster movie or whether she was lying. It’s a nonsensical question. She was acting.

The analogy to the actor is a good one.

Regarding the general sort of attitude and behavior discussed here, though, I don’t think Trump stands out as much as Marshall implies. Even setting aside other politicians, who in the matter of lying often seem to differ from the former president more in degree than kind, I feel like I’ve seen the same sort of thing with researchers, which is one reason I think Clarke’s law (“Any sufficiently crappy research is indistinguishable from fraud”) is so often relevant.

When talking about researchers who don’t seem to care about saying the truth, I’m not just talking about various notorious flat-out data fakers. I’m also talking about researchers who just do unreplicable crap or who make claims in the titles and abstracts of their papers that aren’t supported by their data. We get lots of statements that are meaningless or flat-out false.

Does the truth matter to these people? I don’t know. I think they believe in some things they view as deeper truths: (a) their vague models of how the world works are correct, and (b) they are righteous people. Once you start there, all the false statements don’t matter, as they are all being made in the service of a larger truth.

I don’t think everyone acts this way—I have the impression that most people, as Marshall puts it, “exist in a tension between what we believe is true and what is good for or pleasing to us.” There’s just a big chunk of people—including many academic researchers, journalists, politicians, etc.—who don’t seem to feel that tension. As I’ve sometimes put it, they choose what to say or what to write based on the music, not the words. And they see the rest of us as “schoolmarms” or “Stasi”—pedants who get in the way of the Great Men of science. Not the same as Donald Trump by a longshot, but I see some similarities in that it’s kinda hard to pin them down when it comes to factual beliefs. It’s much more about whose-side-are-you-on.

Also incentives: it’s not so much that people lie because of incentives, as that incentives affect the tough calls they make, and incentives affect who succeeds in climbing the greasy pole of success.

Concerns about misconduct at Harvard’s department of government and elsewhere

The post below addresses a bunch of specifics about Harvard, but for the key point, jump to the last paragraph of the post.

Problems about Harvard

A colleague pointed me to this post by Christopher Brunet, “The Curious Case of Claudine Gay,” and asked what I thought. It was interesting. I know almost all the people involved, one way or another (sometimes just by email). Here’s my take:

– There’s a claim that Harvard political science professor Ryan Enos falsified a dataset. I looked at this a while ago. I thought I’d blogged it but I couldn’t find it in a Google search. There’s a pretty good summary by journalist Jesse Singal here. I corresponded with both Singal and Brunet on this one. As I wrote, “I’d say that work followed standard operating procedure of that era which indeed was to draw overly strong conclusions from quantitative data using forking paths.” I don’t think it’s appropriate to say that someone falsified data, just because they did an analysis that (a) had data issues and (b) came to a conclusion that doesn’t make you happy. Data issues come up all the time.

– There’s a claim that Gay “swept this [Enos investigation] under the rug” (see here). This reminds me of my complaint about the University of California not taking seriously the concerns about the publications of Matthew “Why We Sleep” Walker (see here). A common thread is that universities don’t like to discipline their tenured professors! Also, though, I wasn’t convinced by the claim that Enos committed research misconduct. The Walker case seems more clear to me. But, even with the Walker case, it’s possible to come up with excuses.

– There’s a criticism that Gay’s research record is thin. That could be. I haven’t read her papers carefully. I guess that a lot of academic administrators are known more for their administration than their research records. Brunet writes, “A prerequisite for being a Dean at Harvard is having a track record of research excellence.” I guess that’s the case sometimes, maybe not other times. Lee Bollinger did a lot as president of the University of Michigan and then Columbia, but I don’t think he’s known for his research. He published some law review articles once upon a time? Brunet refers to Gay as an “affirmative action case,” but that seems kind of irrelevant given that lots of white people reach academic heights without doing influential research.

– There’s a criticism of a 2011 paper by Dustin Tingley, which has the line, “Standard errors clustered at individual level and confidence intervals calculated using a parametric bootstrap running for 1000 iterations,” but Brunet says, “when you actually download his R code, there is no bootstrapping.” I guess, maybe? I clicked through and found the R code here, but I don’t know how the “zelig” package works. Brunet writes that Tingley “grossly misrepresented the research processes by claiming his reported SEs are bootstrap estimates clustered at the individual level. As per the Zelig documentation, no such bootstrapping functionality ever existed in his chosen probit regression package.” I googled and it seemed that zelig did have bootstrapping, but maybe not with clustering. I have no idea what’s going on here: it could be a misunderstanding of the software on Brunet’s part, a misunderstanding on Tingley’s part, or some statistical subtlety. I’m not really into this whole clustered standard errors thing anyway. (For what a generic cluster bootstrap looks like, see the sketch after this list.) My guess is that there was some confusion regarding what is a “bootstrap,” and it makes sense that a journalist coming at this from the outside might miss some things. The statistical analysis in this 2011 paper can be questioned, as is usually the case with anyone’s statistical analysis when they’re working on an applied research frontier. For example, from p.12 of Tingley’s paper: “Looking at the second repetition of the experiment, after which subjects had some experience with the strategic context, there was a significant difference in rejection rates across the treatments in the direction predicted by the model (51% when delta = 0.3 and 63% when delta = 0.7) (N = 396, t = 1.37, p = .08). Pooling offers of all sizes together I find reasonable support for Hypothesis 1 that a higher proportion of offers will be rejected, leading to both players paying a cost, when the shadow of the future was higher.” I’m not a fan of this sort of statistical-significance-based claim, including labeling p = .08 as “reasonable support” for a hypothesis, but this is business as usual in the social sciences.

– There’s a bunch of things about internal Harvard politics. I have zero knowledge one way or another regarding internal Harvard politics. What Brunet is saying there could be true, or maybe not, as he’s relying on various anonymous sources and other people with axes to grind. For example, he writes, “Gay did not recuse herself from investigating Enos. Rather, she used the opportunity to aggressively cover up his research misconduct.” I have no idea what standard policy is here. If she had recused herself, maybe she could be criticized for avoiding the topic. For another example, Brunet writes, “Claudine Gay allowed Michael Smith to get away scot-free in the Harvard-Epstein ties investigation — she came in and nicely whitewashed it all away. Claudine Gay has Epstein coverup stink on her, and Michael Smith has major Epstein stink on him,” and this could be a real issue, or it could just be a bunch of associations, as he doesn’t actually quote from the Harvard-Epstein ties investigation to which he refers. Regarding Jorge Dominguez: as Brunet says, the guy had been around for decades—indeed, I heard about his sexual harassment scandal back when I was a Ph.D. student at Harvard, I think it was in the student newspaper at the time, and I also remember being stunned, not so much that it happened, but that the political science faculty at the time just didn’t seem to care—so it’s kind of weird how first Brunet (rightly) criticizes the Government “department culture” that allowed a harasser to stay around for so long, and then he criticizes Smith for “protecting Dominguez” and criticizes Gay for being “partly responsible for having done nothing to address Dominguez’s abuses”—but then he also characterizes Smith as having “decided to throw [Dominguez] under the bus.” You can’t have it both ways! Responding to a decades-long harassment campaign is not “throwing someone under the bus.” Regarding Roland Fryer, Brunet quotes various politically-motivated people complimenting Fryer, which is fine—the guy did some influential research—but no context is added by referring to Fryer as “a mortal threat to some of the most powerful black people at Harvard” and referring to Gay as “a silky-smooth corporate operator.” Similarly, the Harvey Weinstein thing is something that can go both ways: if Gay criticizes a law professor who chooses to defend Weinstein, then she “was driven by pure spite. She is a petty and vicious little woman.” If she had supported the prof, I can see the argument the other way: so much corruption, she turns a blind eye to Epstein and then to Weinstein, why is she attacking Fryer but defending the law professor who is defending the “scumbag,” etc.
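
For readers who, like me, are left guessing what was meant in that exchange, here is a generic sketch of a cluster (subject-level) bootstrap for a probit regression. This is not Tingley’s code, it is not how the zelig package works, and the data are fabricated; it is only meant to make the terms “bootstrap,” “clustered at individual level,” and “1000 iterations” concrete.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Fake panel: 100 subjects, 4 binary decisions each (e.g., reject an offer or not).
n_subj, n_rep = 100, 4
subj = np.repeat(np.arange(n_subj), n_rep)
x = rng.normal(size=n_subj * n_rep)               # a made-up predictor (e.g., offer size)
u = np.repeat(rng.normal(0, 0.5, n_subj), n_rep)  # subject-level noise, so rows are clustered
y = (0.5 * x + u + rng.normal(size=n_subj * n_rep) > 0).astype(int)
X = sm.add_constant(x)

def slope(rows):
    return sm.Probit(y[rows], X[rows]).fit(disp=0).params[1]

all_rows = np.arange(n_subj * n_rep)
boot = []
for _ in range(1000):                                  # 1000 bootstrap iterations
    picked = rng.choice(n_subj, n_subj, replace=True)  # resample whole subjects, not rows
    rows = np.concatenate([all_rows[subj == s] for s in picked])
    boot.append(slope(rows))

print("probit slope:", round(slope(all_rows), 3),
      " cluster-bootstrap SE:", round(float(np.std(boot)), 3))
```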

It’s everywhere

Here’s my summary. I think if you look carefully at just about any university social-science department, you’re likely to find some questionable work, some faculty who do very little research, and some administrators who specialize in administration rather than research, as well as lots and lots of empirical papers with data challenges and less than state-of-the-art statistical analyses. You also might well find some connections to funders who made their money in criminal enterprises, business and law professors who work for bad guys, and long-tolerated sexual harassers. I also expect you can find all of this in private industry and government; we just might not hear about it. Universities have a standard of openness that allows us to see the problems, in part because universities have lots of graduates who can spill the beans without fear of repercussions. Also, universities produce public documents. For example, the aforementioned Matthew Walker wrote Why We Sleep. The evidence of his research misconduct is right out there. In a government or corporate context, the bad stuff can be inside internal documents.

Executive but no legislative and no judicial

There’s also the problem that universities, and corporations, have an executive branch but no serious legislative or judicial branches. I’ve seen a lot of cases of malfeasance within universities where nothing is done, or where whatever is done is too little, too late. I attribute much of this problem to the lack of legislative and judicial functions. Stepping back, we could think of this as a problem with pure utilitarianism. In a structural system of government, each institution plays a role. The role of the judicial system is to judge without concern about policy consequences. In the university (or a corporation), there is only the executive, and it’s hard for the executive to make a decision without thinking about consequences. Executives will accept malfeasance of all sorts because they decide that the cost of addressing the malfeasance is greater than the expected benefit. I’m not even just talking here about research misconduct, sexual harassment, or illegal activities by donors; other issues that arise include misappropriation of grant money, violation of internal procedures, and corruption in the facilities department.

To get back to research for a moment, there’s also the incentive structure that favors publication. Many years ago I had a colleague who showed me a paper he’d written that was accepted for publication in a top journal. I took a look and realized it had a fatal error—not a mathematical error, exactly, more of a conceptual error so that his method wasn’t doing what he was claiming it was doing. I pointed it out to him and said something like, “Hey, you just dodged a bullet—you almost published a paper that was wrong.” I assumed he’d contact the journal and withdraw the article. But, no, he just let the publication process go as scheduled: it gave him another paper on his C.V. And, back then, C.V.’s were a lot shorter; one publication could make a real difference! That’s just one story; the point is that, yes, of course a lot of fatally flawed work is out there.

So, yeah, pull up any institutional rock and you’re likely to find some worms crawling underneath. It’s good for people to pull up rocks! So, fair enough for Brunet to write these posts. And it’s good to have lots of people looking into these things, from all directions. The things that I don’t buy are his claims that there is clear research misconduct by Enos and Tingley, and his attempt to tie all these things together to Gay or to Harvard more generally. There’s a paper from 2014 with some data problems, a paper from 2011 by a different professor from the same (large) department that used some software that does bootstrapping, a professor in a completely different department who got donations from a criminal, a political science professor and an economics professor with sexual harassment allegations, a law professor who was defending a rich, well-connected rapist . . . and Brunet is criticizing Gay for being too lenient in some of these cases and too strict in others. Take anyone who’s an administrator at a large institution and you’ll probably find a lot of judgment calls.

Lots of dots

To put it another way, it’s fine to pick out a paper published in 2014 with data problems and a paper published in 2011 with methods that are not described in full detail. Without much effort it should be possible to find hundreds of examples from Harvard alone that are worse. Much worse. Here are just a few of the more notorious examples:

“Stereotype susceptibility: Identity salience and shifts in quantitative performance,” by Shih, Pittinsky, and Ambady (1999)

“This Old Stereotype: The Pervasiveness and Persistence of the Elderly Stereotype,” by Cuddy, Norton, and Fiske (2005)

“Rule learning by cotton-top tamarins,” by Hauser, Weiss, and Marcus (2006)

“Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end,” by Shu, Mazar, Gino, Ariely, and Bazerman (2012)

“Jesus said to them, ‘My wife…’”: A New Coptic Papyrus Fragment,” by King (2014)

“Physical and situational inequality on airplanes predicts air rage,” by DeCelles and Norton (2016).

“the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” (not actually in a published paper, but in a 2016 press release featuring two Harvard professors)

“Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis,” by Mehra, Ruschitzka, and Patel (2020).

It’s not Brunet’s job, or mine, or anyone’s, to look at all these examples, and it’s fine for Brunet to focus on two much more ambiguous cases of problematic research papers. The point of my examples above is to put some of his connect-the-dots exercises into perspective. Harvard, and any other top university, will have hundreds of “dots”—bad papers, scandals, harassment, misconduct, etc.—that can be connected in many different ways.

A problem of quality control

We can see this as a problem of quality control. A large university is going to have some rate of iffy research, sexual harassment, tainted donations (and see here for a pointer to a horrible Harvard defense of that), faculty who work for bad people, etc., and it’s not really set up to handle this. Indeed, a top university such as Harvard or USC could well be more likely to have such problems: Its faculty are more successful, so even their weak work could get publicity; its faculty are superstars, so they might be more likely to get away with sexual harassment (but it seems that even the non-tenure-track faculty at such places can be protected by the old boys’ network); top universities could be more likely to get big donations from rich criminals; and they could also have well-connected business and law professors who’d like to make money defending bad guys (back at the University of California we had a professor who was working for the O. J. Simpson defense team!). I’ve heard a rumor that top universities can even cheat on their college rankings. And, again, universities have no serious legislative or judicial institutions, so the administrators at any top university will find themselves dealing with an unending stream of complaints regarding research misconduct, sexual harassment, tainted donations, and questionable outside activities by faculty, not to mention everyday graft of various sorts. I’m pretty sure all this is happening in companies too; we just don’t usually hear so much about it. Regarding the case of Harvard’s political science department, I appreciate Brunet’s efforts to bring attention to various issues, even if I am not convinced by several of his detailed claims and am not at all convinced by his attempt to paint this all as a big picture involving Gay.

In July 2015 I was spectacularly wrong

See here.

Also interesting was this question that I just shrugged aside:

If a candidate succeeds in winning a nomination and goes on to win the election and reside in the White House do they have to give up their business interests as these would be seen as a conflict of interest? Can a US president serve in office and still have massive commercial business interests abroad?

“Other than when he treated Steve Jobs, Agus, 58, had never been told anything besides that he’s awesome . . .”

Remember when we talked about the problem with the “scientist-as-hero” narrative?

Here’s another example. Celebrity doctor / USC professor David Agus had a side gig putting his name on plagiarized books that he never read.

I get that some people are busy, but talk about lazy! Putting your name on a book you didn’t write is one thing, but not even reading it! C’mon, dude.

But—hey—you can see what happened, right? The title of this post is a line from a magazine article about the story. I don’t actually like the article because it seems like it was written by Agus’s publicist (it credits Agus as being “very involved” with the books that have his name on them, without explaining how a “very involved” author didn’t notice an entire section of plagiarized material about giraffes—how exactly did he “dictate the substance” of that bit??), but that one line about “never been told anything besides that he’s awesome” is a good one, and I think it captures a big problem with how science is reported.

Similar problems arise with non-scientist celebrity academics too. Alan Dershowitz and Steven Pinker got nearly uniformly-positive coverage until it came out that they were doing favors for Jeffrey Epstein. Cass Sunstein was swaggering around saying he’d discovered a new continent (i.e., he wrote a book about a topic he knew next to nothing about). Edgelord Marc Hauser was riding high until he wasn’t, etc etc etc.

These people get so much deference that they just take it as their due. I’d prefer more historically-informed paradigms of scientific progress.

P.S. How was it that Los Angeles Magazine decided to run an article presenting the plagiarizing professor as a good guy, an innocent victim of his ghostwriter? A clue comes from Google . . . an article by that same author in that same magazine from 2022, describing the not-yet-acknowledged plagiarist as “genius Forrest Gump, a soft-spoken and menschy cancer researcher,” and an article from 2014 where “Pioneering biomedical researcher David Agus reveals which clock stoppers excite him most,” and another from 2016 featuring “Longevity expert Dr. David Agus,” and, on the plus side, this article from 2021 encouraging people to take the covid vaccine.

Also this:

Considering that this guy is “always uncomfortable being the focus of any media,” it’s funny that he has a page full of clips of TV and promotional appearances:

Including . . . a picture of a giraffe! Dude has giraffes on his mind. I’m starting to be suspicious of his implicit claim that he never read the chapter in his latest book that was plagiarized from “a 2016 blog post on the website of a South African safari company titled, ‘The Ten Craziest Facts You Should Know About A Giraffe.'”

“Guns, Race, and Stats: The Three Deadliest Weapons in America”

Geoff Holtzman writes:

In April 2021, The Guardian published an article titled “Gun Ownership among Black Americans is Up 58.2%.” In June 2022, Newsweek claimed that “Gun ownership rose by 58 percent in 2020 alone.” The Philadelphia Inquirer first reported on this story in August 2020, and covered it again as recently as March 2023 in a piece titled “The Growing Ranks of Gun Owners.” In between, more than two dozen major media outlets reported this same statistic. Despite inconsistencies in their reporting, all outlets (directly or indirectly) cite as their source an infographic based on a survey conducted by a firearm industry trade association.

Last week, I shared my thoughts on the social, political, and ethical dimensions of these stories in an article published in The American Prospect. Here, I address whether and to what extent their key statistical claim is true. And an examination of the infographic—produced by the National Shooting Sports Foundation (NSSF)—reveals that it is not. Below, I describe six key facts about the infographic that undermine the media narrative. After removing all false, misleading, or meaningless words from the Guardian’s headline and Newsweek’s claim, the only words remaining are “Among,” “Is,” “In,” and “By.”

(1) 58.2% only refers to the first six months of 2020

To understand demographic changes in firearms purchases or ownership in 2020, one needs to ascertain firearm sales or ownership demographics from before 2020 and after 2020. The best way to do this is with a longitudinal panel, which is how Pew found no change in Black gun ownership rates among Americans from 2017 (24%) to 2021 (24%). Longitudinal research in The Annals of Internal Medicine also found no change in gun ownership among Black Americans from 2019 (21%) through 2020/2021 (21%).

By contrast, the NSSF conducted a one-time survey of its own member retailers. In July 2020, the NSSF asked these retailers to compare estimated demographics from the first six months of 2020 to those from the first six months of 2019. A full critique of this approach and its drawbacks would require a lengthy discussion of the scientific literature on recency bias, telescoping effects, and so on. To keep this brief, I’d just like to point out that by July 2020, many of us could barely remember what the world was like back in 2019.

Ironically, the media couldn’t even remember when the survey took place. In September 2020, NPR reported—correctly—that “according to AOL News,” the survey concerned “the first six months of 2020.”  But in October of 2020, CNN said it reflected gun sales “through September.” And by June 2021, CNN revised its timeline to be even less accurate, claiming the statistic was “gun buyers in 2020 compared to 2019.”

Strangely, it seems that AOL News may have been one of the few media outlets that actually looked at the infographic it reported on. The timing of the survey—along with other critical but collectively forgotten information on its methods—is printed at the top of the infographic. The entire top quarter of the NSSF-produced image is devoted to these details: “FIREARM & AMMUNITION SALES DURING 1ST HALF OF 2020, Online Survey Fielded July 2020 to NSSF Members.”

But as I discuss in my article in The American Prospect, a survey about the first half of 2020 doesn’t really support a narrative about Black Americans’ response to “protests throughout the summer” of 2020 or to that November’s “contested election.” This is a great example of a formal fallacy (post hoc reasoning), memory bias (more than one may have been at work here), and motivated reasoning all rolled into one. To facilitate these cognitive errors, the phrase “in 2020” is used ambiguously in the stories, referring at times to the first six months of 2020 and at times to specific days or periods during the last seven months. This part of the headlines and stories is not false, but it does conflate two distinct time periods.

The results of the NSSF survey cannot possibly reflect the events of the Summer and Fall of 2020. Rather, the survey’s methods and materials were reimagined, glossed over, or ignored to serve news stories about those events.

(2) 58.2% describes only a tiny, esoteric fraction of Americans

To generalize about gun owner demographics in the U.S., one has to survey a representative, random sample of Americans. But the NSSF survey was not sent to a representative sample of Americans—it was only sent to NSSF members. Furthermore, it doesn’t appear to have been sent to a random sample of NSSF members—we have almost no information on how the sample of fewer than 200 participants was drawn from the NSSF’s membership of nearly 10,000. Most problematically—and bizarrely—the survey is supposed to tell us something about gun buyers, yet the NSSF chose to send the survey exclusively to its gun sellers.

The word “Americans” in these headlines is being used as shorthand for “gun store customers as remembered by American retailers up to 18 months later.” In my experience, literally no one assumes I mean the latter when I say the former. The latter is not representative of the former, so this part of the headlines and news stories is misleading.

(3) 58.2% refers to some abstract, reconstructed memory of Blackness

The NSSF doesn’t provide demographic information for the retailers it surveyed. Demographics can provide crucial descriptive information for interpreting and weighting data from any survey, but their omission is especially glaring for a survey that asked people to estimate demographics. But there’s a much bigger problem here.

We don’t have reliable information about the races of these retailers’ customers, which is what the word “Black” is supposed to refer to in news coverage of the survey. This is not an attack on firearms retailers; it is a well-established statistical tendency in third-party racial identification. As I’ve discussed in The American Journal of Bioethics, a comparison of CDC mortality data to Census records shows that funeral directors are not particularly accurate in reporting the race of one (perfectly still) person at a time. Since that’s a simpler task than searching one’s memory and making statistical comparisons of all customers from January through June of two different years, it’s safe to assume that the latter tends to produce even less accurate reports.

The word “Black” in these stories really means “undifferentiated masses of people from two non-consecutive six-month periods recalled as Black.” Again, the construct picked out by “Black” in the news coverage is a far cry from the construct actually measured by the survey.

(4) 58.2% appears to be about something other than guns

The infographic doesn’t provide the full wording of survey items, or even make clear how many items there were. Of the six figures on the infographic, two are about “sales of firearms,” two are about “sales of ammunition,” and one is about “overall demographic makeup of your customers.” But the sixth and final figure—the source of that famous 58.2%—does not appear to be about anything at all. In its entirety, that text on the infographic reads: “For any demographic that you had an increase, please specify the percent increase.”

Percent increase in what? Firearms sales? Ammunition sales? Firearms and/or ammunition sales? Overall customers? My best guess would be that the item asked about customers, since guns and ammo are not typically assigned a race. But the sixth figure is uninterpretable—and the 58.2% statistic meaningless—in the absence of answers.

(5) 58.2% is about something other than ownership

I would not guess that the 58.2% statistic was about ownership, unless this were a multiple-choice test and I were asked to guess which answer was a trap.

The infographic might initially appear to be about ownership, especially to someone primed by the initial press release. It’s notoriously difficult for people to grasp distinctions like those between purchases by customers and ownership in a broader population. I happen to think that the heuristics, biases, and fallacies associated with that difficulty—reverse inference, base rate neglect, affirming the consequent, etc.—are fascinating, but I won’t dwell on them here. In the end, ammunition is not a gun, a behavior (purchasing) is not a state (ownership), and customers are none of the above.

To understand how these concepts differ, suppose that 80% of people who walk into a given gun store in a given year own a gun. The following year, the store could experience a 58% increase in customers, or a 58% increase in purchases, but not observe a 58% increase in ownership. Why? Because even the best salesperson can’t get 126% of customers to own guns. So the infographic neither states nor implies anything specific about changes in gun ownership.
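To make that arithmetic concrete, here is a minimal sketch in Python. The numbers are the hypothetical ones from the example above (an 80% baseline ownership share among customers and the reported 58% increase), not anything from the survey itself:

```python
# Hypothetical numbers from the example above: 80% of a store's
# customers already own a gun. If "ownership" among customers rose
# by the 58% reported in the infographic, we'd need an impossible share.
baseline_ownership = 0.80   # assumed share of customers who own a gun
reported_increase = 0.58    # the increase reported in the infographic

implied_ownership = baseline_ownership * (1 + reported_increase)
print(f"Implied ownership share: {implied_ownership:.0%}")  # about 126%

# A 58% increase in customers or in purchases is arithmetically possible;
# a 58% increase in an ownership share that starts at 80% is not.
```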

(6) 58.2% was calculated deceptively

I can’t tell if the data were censored (e.g., by dropping some responses before analysis) or if the respondents were essentially censored (e.g., via survey skip logic), but 58.2% is the average guess only of retailers who reported an increase in Black customers. Retailers who reported no increase in Black customers were not counted toward the average. Consequently, the infographic can’t provide a sample size for this bar chart. Instead, it presents a range of sample sizes for individual bars: “n=19-104.”

Presenting means from four distinct, artificially constructed, partly overlapping samples as a single bar chart without specifying the size of any sample renders that 58.2% number uninterpretable. It is quite possible that only 19 of 104 retailers reported an increase in Black customers, and that all 104 reported an increase in White customers—for whom the infographic (but not the news) reported a 51.9% increase. Suppose 85 retailers did not report an increase in Black customers, and instead reported no change for that group (i.e., a change of 0%). Then if we actually calculated the average change in demographics reported by all survey respondents, we would find just a 10.6% increase in Black customers (19/104 x 58.2%), as compared to a 51.9% increase in white customers (104/104 x 51.9%).
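For readers who want the back-of-the-envelope calculation spelled out, here is a minimal sketch in Python. The 19/85 split is an assumption for illustration; the infographic reports only the range “n=19-104”:

```python
# Back-of-the-envelope version of the calculation above. The sample
# sizes are assumptions: the infographic reports only "n=19-104".
n_total = 104                # assumed respondents for this item
n_black_increase = 19        # assumed retailers reporting an increase in Black customers
mean_increase_black = 0.582  # average increase among those who reported one
mean_increase_white = 0.519  # average increase reported for White customers

# If the other 85 retailers reported no change (0%), the average over
# all respondents is far below the headline 58.2%.
avg_black_all = (n_black_increase * mean_increase_black) / n_total
avg_white_all = (n_total * mean_increase_white) / n_total  # if all 104 reported an increase

print(f"Average change, Black customers, all respondents: {avg_black_all:.1%}")  # ~10.6%
print(f"Average change, White customers, all respondents: {avg_white_all:.1%}")  # 51.9%
```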

A proper analysis of the full survey data could actually undermine the narrative of a surge in gun sales driven by Black Americans. In fact, a proper calculation may even have found a decrease, not an increase, for this group. The first two bar charts on the infographic report percentages of retailers who thought overall sales of firearms and of ammunition were “up,” “down,” or the “same.” We don’t know if the same response options were given for the demographic items, but if they were, a recount of all votes might have found a decrease in Black customers. We’ll never know.

The 58.2% number is meaningless without additional but unavailable information. Or, to use more technical language, it is a ceiling estimate, as opposed to a real number. In my less-technical write-up, I simply call it a fake number.

This is kind of in the style of our recent article in the Atlantic, “The Statistics That Come Out of Nowhere,” but with a lot more detail. Or, for a simpler example, consider a claim from a few years ago about the political attitudes of the super-rich, which came from a purported survey about which no details were given. As with some of those other claims, the reported number of 58% was implausible on its face, but that didn’t stop media organizations from credulously repeating it.

On the plus side, a few years back a top journal (yeah, you guessed it, it was Lancet, that fount of politically motivated headline-bait) published a ridiculous study on gun control and, to their credit, various experts expressed immediate skepticism.

To their discredit, the news outlets reporting that 58% number did not even bother to run it by any experts, skeptical or otherwise. Here’s another example (from NBC), here’s another (from Axios), here’s CNN . . . you get the picture.

I guess this story is just too good to check, it fits into existing political narratives, etc.

Some people have no cell phone and never check their email before 4pm.

Paul Alper points to this news article, “Barely a quarter of Americans still have landlines. Who are they?”, by Andrew Van Dam, my new favorite newspaper columnist. Van Dam writes:

Only 2 percent of U.S. adults use only landlines. Another 3 percent mostly rely on landlines and 1 percent don’t have phones at all. The largest group of holdouts, of course, are folks 65 and older. That’s the only demographic for which households with landlines still outnumber wireless-only households. . . . about 73 percent of American adults lived in a household without a landline at the end of last year — a figure that has tripled since 2010.

Here are some statistics:

“People who have cut the cord” — abandoning landlines to rely only on wireless — “are generally more likely to engage in risky behaviors,” Blumberg told us. “They’re more likely to binge drink, more likely to smoke and more likely to go without health insurance.” That’s true even when researchers control for age, sex, race, ethnicity and income.

OK, they should say “adjust for,” not “control for,” but I get the idea.

The article continues:

Until recently, we weren’t sure that data even existed. But it turns out we were looking in the wrong place. Phone usage is tracked in the National Health Interview Survey, of all things, the same source we used in previous columns to measure the use of glasses and hearing aids by our fellow Americans.

Here are just some of the factors that have been published in the social priming and related literatures as having large effects on behavior.

This came up in our piranha paper, and it’s convenient to have these references in one place:

Here are just some of the factors that have been published in the social priming and related literatures as having large and predictable effects on attitudes and behavior: hormones (Petersen et al., 2013; Durante et al., 2013), subliminal images (Bartels, 2014; Gelman, 2015b), the outcomes of recent football games (Healy et al., 2010; Graham et al., 2022; Fowler and Montagnes, 2015, 2022), irrelevant news events such as shark attacks (Achen and Bartels, 2002; Fowler and Hall, 2018), a chance encounter with a stranger (Sands, 2017; Gelman, 2018b), parental socioeconomic status (Petersen et al., 2013), weather (Beall and Tracy, 2014; Gelman, 2018a), the last digit of one’s age (Alter and Hershfield, 2014; Kühne et al., 2015), the sex of a hurricane name (Jung et al., 2014; Freese, 2014), the sexes of siblings (Blanchard and Bogaert, 1996; Bogaert, 2006; Gelman and Stern, 2006), the position in which a person is sitting (Carney et al., 2010; Cesario and Johnson, 2018), and many others.

These individual studies have lots of problems (see references below to criticisms); beyond that, the piranha principle implies that it would be very difficult for many of these large and consistent effects to coexist in the wild.

References to the claims:

Kristina M. Durante, Ashley Rae, and Vladas Griskevicius. The fluctuating female vote: Politics, religion, and the ovulatory cycle. Psychological Science, 24:1007–1016, 2013.

Larry Bartels. Here’s how a cartoon smiley face punched a big hole in democratic theory. Washington Post, https://www.washingtonpost.com/news/monkey-cage/wp/2014/09/04/heres-how-a-cartoon-smiley-face-punched-a-big-hole-in-democratic-theory/, 2014.

A. J. Healy, N. Malhotra, and C. H. Mo. Irrelevant events affect voters’ evaluations of government performance. Proceedings of the National Academy of Sciences, 107:12804–12809, 2010.

Matthew H. Graham, Gregory A. Huber, Neil Malhotra, and Cecilia Hyunjung Mo. Irrelevant events and voting behavior: Replications using principles from open science. Journal of Politics, 2022.

C. H. Achen and L. M. Bartels. Blind retrospection: Electoral responses to drought, flu, and shark attacks. Presented at the Annual Meeting of the American Political Science Association, 2002.

Anthony Fowler and Andrew B. Hall. Do shark attacks influence presidential elections? Reassessing a prominent finding on voter competence. Journal of Politics, 80:1423–1437, 2018.

Melissa L. Sands. Exposure to inequality affects support for redistribution. Proceedings of the National Academy of Sciences, 114:663–668, 2017.

Michael Bang Petersen, Daniel Sznycer, Aaron Sell, Leda Cosmides, and John Tooby. The ancestral logic of politics: Upper-body strength regulates men’s assertion of self-interest over economic redistribution. Psychological Science, 24:1098–1103, 2013.

Alec T. Beall and Jessica L. Tracy. The impact of weather on women’s tendency to wear red or pink when at high risk for conception. PLoS One, 9:e88852, 2014.

A. L. Alter and H. E. Hershfield. People search for meaning when they approach a new decade in chronological age. Proceedings of the National Academy of Sciences, 111:17066–17070, 2014.

Kiju Jung, Sharon Shavitt, Madhu Viswanathan, and Joseph M. Hilbe. Female hurricanes are deadlier than male hurricanes. Proceedings of the National Academy of Sciences, 111:8782–8787, 2014.

R. Blanchard and A. F. Bogaert. Homosexuality in men and number of older brothers. American Journal of Psychiatry, 153:27–31, 1996.

A. F. Bogaert. Biological versus nonbiological older brothers and men’s sexual orientation. Proceedings of the National Academy of Sciences, 103:10771–10774, 2006.

D. R. Carney, A. J. C. Cuddy, and A. J. Yap. Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science, 21:1363–1368, 2010.

References to some criticisms:

Andrew Gelman. The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. Journal of Management, 41:632–643, 2015a.

Andrew Gelman. Disagreements about the strength of evidence. Chance, 28:55–59, 2015b.

Anthony Fowler and B. Pablo Montagnes. College football, elections, and false-positive results in observational research. Proceedings of the National Academy of Sciences, 112:13800–13804, 2015.

Anthony Fowler and B. Pablo Montagnes. Distinguishing between false positives and genuine results: The case of irrelevant events and elections. Journal of Politics, 2022.

Andrew Gelman. Some experiments are just too noisy to tell us much of anything at all: Political science edition. Statistical Modeling, Causal Inference, and Social Science, https://statmodeling.stat.columbia.edu/2018/05/29/exposure-forking-paths-affects-support-publication/, 2018b.

Andrew Gelman. Another one of those “Psychological Science” papers (this time on biceps size and political attitudes among college students). Statistical Modeling, Causal Inference, and Social Science, https://statmodeling.stat.columbia.edu/2013/05/29/another-one-of-those-psychological-science-papers/, 2013.

Andrew Gelman. When you believe in things that you don’t understand. Statistical Modeling, Causal Inference, and Social Science, https://statmodeling.stat.columbia.edu/2014/04/15/believe-things-dont-understand/, 2018a.

Simon Kühne, Thorsten Schneider, and David Richter. Big changes before big birthdays? Panel data provide no evidence of end-of-decade crises. Proceedings of the National Academy of Sciences, 112:E1170, 2015.

Jeremy Freese. The hurricane name people strike back! Scatterplot, https://scatter.wordpress.com/2014/06/16/the-hurricane-name-people-strike-back/, 2014.

Andrew Gelman and Hal Stern. The difference between “significant” and “not significant” is not itself statistically significant. American Statistician, 60:328–331, 2006.

J. Cesario and D. J. Johnson. Power poseur: Bodily expansiveness does not matter in dyadic interactions. Social Psychological and Personality Science, 9:781–789, 2018.

Lots more out there:

The above is not intended to be an exhaustive or representative list, or even a full list of the examples we’ve covered here on the blog! There’s the “lucky golf ball” study, the case of the missing shredder, pizzagate, . . . we could go on forever. The past twenty years have featured many published and publicized claims about essentially irrelevant stimuli having large and predictable effects, along with quite a bit of criticism and refutation of these claims. The list above is just a paragraph giving a small sense of the wide variety of stimuli that are supposed to have been demonstrated to have large and consistent effects, and it’s relevant to our general point that it’s not possible for all these effects to coexist in the world. Again, take a look at the piranha paper for further discussion of this point.

In all seriousness, what’s USC gonna do about its plagiarizing professor?

His plagiarism unambiguously violates the USC Integrity and Accountability Code:

Our Integrity and Accountability Code is anchored to USC’s Unifying Values and aligns our everyday decisions with the institution’s mission and compliance obligations. The Code is a vital resource for all faculty and staff throughout USC, including those at Keck Medicine at USC and the Keck School of Medicine.

The university policy continues:

To protect our reputation and promote our mission, every Trojan must do their part and act with integrity in our learning, teaching and research activities. This includes . . .

Never tolerating acts of plagiarism, falsification or fabrication of data, or other forms of academic and research misconduct.

I guess his defense would be that he didn’t “tolerate” these acts of plagiarism, because he didn’t know that they happened. But that would imply that he did not read his own book, which violates another part of that policy:

Making sure that all documentation and published findings are accurate, complete and unbiased.

Also it implies he was not telling the truth when he said the following in his book: “I went out and spoke to the amazing scientists around the world who do these kinds of experiments, and what I uncovered was astonishing.” Unless of course he never said this and his ghostwriter made it up, in which case he didn’t read that part either.

At some point you have to take responsibility for what is written under your name, right? I understand that in collaborative work it’s possible for coauthors to include errors or even fabrications without the other authors knowing, but he was the sole author of this book.

As the official USC document says:

Plagiarism is the appropriation of another person’s ideas, processes, results, or words without giving appropriate credit.

Here’s what the USC medical school professor said:

I am grateful that my collaborator has confirmed that I did not contribute to, nor was I aware of, any of the plagiarized or non-attributed passages in my books . . . I followed standard protocols and my attorney and I received several verbal [sic] and written assurances from this highly respected individual that she had run the book through multiple software checks to ensure proper attributions.

Ummmm . . . what about the “long sections of a chapter on the cardiac health of giraffes [that] appear to have been lifted from a 2016 blog post on the website of a South African safari company titled, ‘The Ten Craziest Facts You Should Know About A Giraffe'”? That didn’t look odd at all??

My “I have run the book through multiple software checks” T-shirt has people asking a lot of questions already answered by my shirt.

Also weird that the ghostwriter gave an assurance that she had run “multiple software checks.” This sounds like the author of record and his attorney (!) already had their suspicions. Who goes around asking for “several verbal and written assurances”? I get it: the author of record didn’t just pay for the words in the book; he also paid for an assurance that any plagiarism wouldn’t get caught.

I’m completely serious about this question:

What if a student at the USC medical school (oh, sorry, the “Keck School of Medicine”) were to hand in a plagiarized final paper? Would that student be kicked out of the program? What if the student said that he didn’t know about the plagiarism because he’d hired a ghostwriter to write the paper? And the ghostwriter supplied several verbal and written assurances that she had run the paper through multiple software checks. Then would it be ok?

I have no personal interest in this one; I’m not going to file a formal complaint with USC or whatever. I just think it’s funny that USC doesn’t seem to care. What ever happened to “To protect our reputation and promote our mission . . .”? “Every Trojan,” indeed. To paraphrase Leona Helmsley, only the little people have to follow the rules, huh?

Why this matters

Junk science pollutes our discourse, Greshamly overwhelming the real stuff out there. Confident bullshitters suck up attention, along with TV and radio time (NPR, of course) and space on airport bookshelves across the nation. When this regurgitated crap gets endorsements from Jane Goodall, Al Gore, Murray Gell-Mann, and other celebrities, it crowds out whatever is really being learned about the world.

There’s room for pop science writing and general health advice, for sure. This giraffe crap ain’t it.

On the other hand

Let’s get real. All this is much better than professors who engage in actual dangerous behavior such as conspiracy theorizing, election denial, medical hype, etc. I guess what bothers me about this USC case is the smugness of it all. The professors who push baseless conspiracy theories or dubious cures typically have an air of desperation to them. OK, not all of them. But often. Even Wansink at the height of his fame had a sort of overkinetic nervous aspect. And presumably they believe their own hype. Even those business school professors who made up data think they’re doing it in support of some true theory, right? But USC dude had to have known he was contracting out his reputation, just so he could get one more burst of fame with “The Ten Craziest Facts You Should Know About A Giraffe” or whatever. In any case, yeah, Alex Jones is a zillion times worse so let’s keep our perspective here.

Also, the USC doc is a strong supporter of vaccines, so he’s using his reputation for good rather than trying to use political polarization to score political points. I guess he can forget about moving to Stanford.

More on possibly rigor-enhancing practices in quantitative psychology research

In a paper entitled “Causal claims about scientific rigor require rigorous causal evidence,” Joseph Bak-Coleman and Berna Devezer write:

Protzko et al. (2023) claim that “High replicability of newly discovered social-behavioral findings is achievable.” They argue that the 86% rate of replication observed in their replication studies is due to “rigor-enhancing practices” such as confirmatory tests, large sample sizes, preregistration and methodological transparency. These findings promise hope as concerns over low rates of replication have plagued the social sciences for more than a decade. Unfortunately, the observational design of the study does not support its key causal claim. Instead, inference relies on a post hoc comparison of a tenuous metric of replicability to past research that relied on incommensurable metrics and sampling frames.

The article they’re referring to is by a team of psychologists (John Protzko, Jon Krosnick, et al.) reporting “an investigation by four coordinated laboratories of the prospective replicability of 16 novel experimental findings using rigor-enhancing practices: confirmatory tests, large sample sizes, preregistration, and methodological transparency. . . .”

When I heard about that paper, I teed off on their proposed list of rigor-enhancing practices.

I’ve got no problem with large sample sizes, preregistration, and methodological transparency. And confirmatory tests can be fine too, as long as they’re not misinterpreted and not used for decision making.

My biggest concern is that the authors or readers of that article will think that these are the best rigor-enhancing practices in science (or social science, or psychology, or social psychology, etc.), or the first rigor-enhancing practices that researchers should reach for, or the most important rigor-enhancing practices, or anything like that.

Instead, I gave my top 5 rigor-enhancing practices, in approximately decreasing order of importance:

1. Make it clear what you’re actually doing. Describe manipulations, exposures, and measurements fully and clearly.

2. Increase your effect size, e.g., do a more effective treatment.

3. Focus your study on the people and scenarios where effects are likely to be largest.

4. Improve your outcome measurement.

5. Improve pre-treatment measurements.

The suggestions of “confirmatory tests, large sample sizes, preregistration, and methodological transparency” are all fine, but I think all are less important than the 5 steps listed above. You can read the linked post to see my reasoning; also there’s Pam Davis-Kean’s summary, “Know what the hell you are doing with your research.” You might say that goes without saying, but it doesn’t, even in some papers published in top journals such as Psychological Science and PNAS!

You can also read a response to my post from Brian Nosek, a leader in the replication movement and one of the coauthors of the article being discussed.

In their new article, Bak-Coleman and Devezer take a different tack than me, in that they’re focused on challenges of measuring replicability of empirical claims in psychology, whereas I was more interested in the design of future studies. To a large extent, I find the whole replicability thing important to the extent that it gives researchers and users of research less trust in generic statistics-backed claims; I’d guess that actual effects typically vary so much based on context that new general findings are mostly not to be trusted. So I’d say that Protzko et al., Nosek, Bak-Coleman and Devezer, and I are coming from four different directions. (Yes, I recognize that Nosek is one of the authors of the Protzko et al. paper; still, in his blog comment he seemed to have a slightly different perspective). The article by Bak-Coleman and Devezer seems very relevant to any attempt to understand the empirical claims of Protzko et al.

Dorothy Bishop on the prevalence of scientific fraud

Following up on our discussion of replicability, here are some thoughts from psychology researcher Dorothy Bishop on scientific fraud:

In recent months, I [Bishop] have become convinced of two things: first, fraud is a far more serious problem than most scientists recognise, and second, we cannot continue to leave the task of tackling it to volunteer sleuths.

If you ask a typical scientist about fraud, they will usually tell you it is extremely rare, and that it would be a mistake to damage confidence in science because of the activities of a few unprincipled individuals. . . . we are reassured [that] science is self-correcting . . .

The problem with this argument is that, on the one hand, we only know about the fraudsters who get caught, and on the other hand, science is not prospering particularly well – numerous published papers produce results that fail to replicate and major discoveries are few and far between . . . We are swamped with scientific publications, but it is increasingly hard to distinguish the signal from the noise.

Bishop summarizes:

It is getting to the point where in many fields it is impossible to build a cumulative science, because we lack a solid foundation of trustworthy findings. And it’s getting worse and worse. . . . in clinical areas, there is growing concern that systematic reviews that are supposed to synthesise evidence to get at the truth instead lead to confusion because a high proportion of studies are fraudulent.

Also:

[A] more indirect negative consequence of the explosion in published fraud is that those who have committed fraud can rise to positions of influence and eminence on the back of their misdeeds. They may become editors, with the power to publish further fraudulent papers in return for money, and if promoted to professorships they will train a whole new generation of fraudsters, while being careful to sideline any honest young scientists who want to do things properly. I fear in some institutions this has already happened.

Given all the above, it’s unsurprising that, in Bishop’s words,

To date, the response of the scientific establishment has been wholly inadequate. There is little attempt to proactively check for fraud . . . Even when evidence of misconduct is strong, it can take months or years for a paper to be retracted. . . . this relaxed attitude to the fraud epidemic is a disaster-in-waiting.

What to do? Bishop recommends that some subset of researchers be trained as “data sleuths,” to move beyond the current whistleblower-and-vigilante system into something more like “the equivalent of a police force.”

I don’t know what to think about that. On one hand, I agree that whistleblowers and critics don’t get the support that they deserve; on the other hand, we might be concerned about who would be attracted to the job of official police officer here.

Setting aside concerns about Bishop’s proposed solution, I do see her larger point about the scientific publication process being so broken that it can actively interfere with the development of science. In a situation parallel to Cantor’s diagonal argument or Russell’s theory of types, it would seem that we need a scientific literature, and then, alongside it, a vetted scientific literature, and then, alongside that, another level of vetting, and so on. In medical research this sort of system has existed for decades, with a huge number of journals for the publication of original studies; and then another, smaller but still immense, set of journals that publish nothing but systematic reviews; and then some distillations that make their way into policy and practice.

Clarke’s Law

And don’t forget Clarke’s Law: Any sufficiently crappy research is indistinguishable from fraud. All the above problems also arise with the sorts of useless noise mining we’ve been discussing in this space for nearly twenty years now. I assume most of those papers do not involve fraud, and even when there are clearly bad statistical practices such as rooting around for statistical significance, I expect that the perpetrators think of these research violations as merely serving the goal of larger truths.

So it’s not just fraud. Not by a longshot.

Also, remember the quote from Bishop above: “those who have committed fraud can rise to positions of influence and eminence on the back of their misdeeds. They may become editors, with the power to publish further fraudulent papers in return for money, and if promoted to professorships they will train a whole new generation of fraudsters, while being careful to sideline any honest young scientists who want to do things properly. I fear in some institutions this has already happened.” Replace “fraud” by “crappy research” and, yeah, we’ve been there for awhile!

P.S. Mark Tuttle points us to this news article by Richard Van Noorden, “How big is science’s fake-paper problem?”, that makes a similar point.

“Open Letter on the Need for Preregistration Transparency in Peer Review”

Brendan Nyhan writes:

Wanted to share this open letter. I know preregistration isn’t useful for the style of research you do, but even for consumers of preregistered research like you it’s essential to know if the preregistration was actually disclosed to and reviewed by reviewers, which in turn helps make sure that exploratory and confirmatory analyses are adequately distinguished, deviations and omissions labeled, etc. (The things I’ve seen as a reviewer… are not good – which is what motivated me to organize this.)

The letter, signed by Nyhan and many others, says:

It is essential that preregistrations be considered as part of the scientific review process.

We have observed a lack of shared understanding among authors, editors, and reviewers about the role of preregistration in peer review. Too often, preregistrations are omitted from the materials submitted for review entirely. In other cases, manuscripts do not identify important deviations from the preregistered analysis plan, fail to provide the results of preregistered analyses, or do not indicate which analyses were not preregistered.

We therefore make the following commitments and ask others to join us in doing so:

As authors: When we submit an article for review that has been preregistered, we will always include a working link to a (possibly anonymized) preregistration and/or attach it as an appendix. We will identify analyses that were not preregistered as well as notable deviations and omissions from the preregistration.

As editors: When we receive a preregistered manuscript for review, we will verify that it includes a working link to the preregistration and/or that it is included in the materials provided to reviewers. We will not count the preregistration against appendix page limits.

As reviewers: We will (a) ask for the preregistration link or appendix when reviewing preregistered articles and (b) examine the preregistration to understand the registered intention of the study and consider important deviations, omissions, and analyses that were not preregistered in assessing the work.

I’ve actually been moving toward more preregistration in my work. Two recent studies we’ve done that have been preregistered are:

– Our project on generic language and political polarization

– Our evaluation of the Millennium Villages project

And just today I met with two colleagues on a medical experiment that’s in the pre-design stage—that is, we’re trying to figure out the design parameters. To do this, we need to simulate the entire process, including latent and observed data, then perform analyses on the simulated data, then replicate the entire process to ensure that the experiment will be precise enough to be useful, at least under the assumptions we’re making. This is already 90% of preregistration, and we had to do it anyway. (See recommendation 3 here.)
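To give a sense of what that looks like, here is a minimal sketch of design-by-simulation in Python. The sample size, effect size, and noise levels are made-up placeholders, not the parameters of the actual experiment:

```python
# Minimal sketch of design-by-simulation with made-up parameters:
# simulate latent and observed data, run the planned analysis, and
# repeat to see whether the design is precise enough to be useful.
import numpy as np

rng = np.random.default_rng(123)

def simulate_once(n_per_arm=100, true_effect=0.3, sigma_latent=1.0, sigma_meas=0.5):
    # Latent outcomes under control and treatment
    latent_control = rng.normal(0, sigma_latent, n_per_arm)
    latent_treated = rng.normal(true_effect, sigma_latent, n_per_arm)
    # Observed outcomes add measurement error
    y_control = latent_control + rng.normal(0, sigma_meas, n_per_arm)
    y_treated = latent_treated + rng.normal(0, sigma_meas, n_per_arm)
    # Planned analysis: difference in means and its standard error
    est = y_treated.mean() - y_control.mean()
    se = np.sqrt(y_treated.var(ddof=1) / n_per_arm + y_control.var(ddof=1) / n_per_arm)
    return est, se

estimates, ses = zip(*(simulate_once() for _ in range(1000)))
print("mean estimate:", np.mean(estimates))
print("typical standard error:", np.mean(ses))
# If the typical s.e. is too large relative to effects we care about,
# change n_per_arm or the measurement protocol and re-run.
```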

So, yeah, given that I’m trying now to simulate every study ahead of time before gathering any data, preregistration pretty much comes for free.

Preregistration is not magic—it won’t turn a hopelessly biased, noisy study into something useful—but it does seem like a useful part of the scientific process, especially if we remember that preregistering an analysis should not stop us from performing later, non-preregistered analyses.

Preregistration should be an addition to the research project, not a limitation!

I guess that Nyhan et al.’s suggestions are good, if narrow in that they’re focused on the very traditional journal-reviewer system. I’m a little concerned with the promise that they as reviewers will “examine the preregistration to understand the registered intention of the study and consider important deviations, omissions, and analyses that were not preregistered in assessing the work.” I mean, sure, fine in theory, but I would not expect or demand that every reviewer do this for every paper that comes in. If I had to do all that work every time I reviewed a paper, I’d have to review many fewer papers a year, and I think my total contribution to science as a reviewer would be much less. If I’m gonna go through and try to replicate an analysis, I don’t want to waste that on a review that only 4 people will see. I’d rather blog it and maybe write it up in some other form (as for example here), as that has the potential to help more people.

Anyway, here’s the letter, so go sign it—or perhaps sign some counter-letter—if you wish!

Oooh, I’m not gonna touch that tar baby!

Someone pointed me to a controversial article written a couple years ago. The article remains controversial. I replied that it’s a topic that I’ve not followed in any detail and that I’ll just defer to the experts. My correspondent pointed to some serious flaws in the article and asked that I link to the article here on the blog. He wrote, “I was unable to find any peer responses to it. Perhaps the discussants on your site will have some insights.”

My reply is the title of this post.

P.S. Not enough information is given in this post to figure out what is the controversial article here, so please don’t post guesses in the comments! Thank you for understanding.

What happens when someone you know goes off the political deep end?

Speaking of political polarization . . .

Around this time every year we get these news articles of the form, “I’m dreading going home to Thanksgiving this year because of my uncle, who used to be a normal guy who spent his time playing with his kids, mowing the lawn, and watching sports on TV, but has become a Fox News zombie, muttering about baby drag shows and saying that Alex Jones was right about those school shootings being false-flag operations.”

This all sounds horrible but, hey, that’s just other people, right? OK, actually I did have an uncle who started out normal and got weirder and weirder, starting in the late 1970s with those buy-gold-because-the-world-is-coming-to-an-end newsletters and then getting worse from there, with different aspects of his life falling apart as his beliefs got more and more extreme. Back in 1999 he was convinced that the year 2K bug (remember that?) would destroy society. After January 1 came and nothing happened, we asked him if he wanted to reassess. His reply: the year 2K bug would indeed take civilization down, but it would be gradual, over a period of months. And, yeah, he’d always had issues, but it did get worse and worse.

Anyway, reading about poll results is one thing; having it happen to people you know is another. Recently a friend told me about another friend, someone I hadn’t seen in awhile. Last I spoke with that guy, a few years back, he was pushing JFK conspiracy theories. I don’t believe any of these JFK conspiracy theories (please don’t get into that in the comments here; just read this book instead), but lots of people believe JFK conspiracy theories, indeed they’re not as wacky as the ever-popular UFOs-as-space-aliens thing. I didn’t think much about it; he was otherwise a normal guy. Anyway, the news was that in the meantime he’d become a full-bore, all-in vaccine denier.

What happened? I have no idea, as I never knew this guy that well. He was a friend, or I guess in recent years an acquaintance. I don’t really have a take on whether he was always unhinged, or maybe the JFK thing started him on a path that spiraled out of control, or maybe he just spent too much time on the internet.

I was kinda curious how he’d justify his positions, though, so I sent him an email:

I hope all is well with you. I saw about your political activities online. I was surprised to see you endorse the statement that the covid vaccine is “the biggest crime ever committed on humanity.” Can you explain how you think that a vaccine that’s saved hundreds of thousands of lives is more of a crime committed on humanity than, say, Hitler and Stalin starting WW2?

I had no idea how he’d respond to this, maybe he’d send me a bunch of Qanon links, the electronic equivalent of a manila folder full of mimeographed screeds. It’s not like I was expecting to have any useful discussion with him—once you start with the position that a vaccine is a worse crime than invading countries and starting a world war, there’s really no place to turn. He did not respond to me, which I guess is fine. What was striking to me was how he didn’t just take a provocative view that was not supported by the evidence (JFK conspiracy theories, election denial, O.J. is innocent, etc.); instead he staked out a position that was well beyond the edge of sanity, almost as if the commitment to extremism was part of the appeal. Kind of like the people who go with Alex Jones on the school shootings.

Anyway, this sort of thing is always sad, but especially when it happens to someone you know, and then it doesn’t help that there are lots of unscrupulous operators out there who will do their best to further unmoor these people from reality and take their money.

From a political science perspective, the natural questions are: (1) How does this all happen?, and (2) Is this all worse than before, or do modern modes of communication just make us more aware of these extreme attitudes? After all, back in the 1960s there were many prominent Americans with ridiculous extreme-right and extreme-left views, and they had a lot of followers too. The polarization of American institutions has allowed some of these extreme views to get more political prominence, so that the likes of Alex Jones and Al Sharpton can get treated with respect by the leaders of the major political parties. Political leaders have always accepted the support of extremists—a vote’s a vote, after all—but I have the feeling that in the past they kept such supporters more at arm’s length.

This post is not meant to be a careful study of these questions, indeed I’m sure there’s a big literature on the topic. What happened is that my friend told me about our other friend going off the deep end, and that all got me thinking, in the way that a personal connection can make a statistical phenomenon feel so much more real.

P.S. Related is this post from last year on Seth Roberts and political polarization. Unlike my friend discussed above, Seth never got sucked into conspiracy theories, but he had this dangerous mix of over-skepticism and over-credulity, and I could well imagine that he could’ve ended up in some delusional spaces.

Another reason so much of science is so bad: bias in what gets researched.

Nina Strohminger and Olúfémi Táíwò write:

Most of us have been taught to think of scientific bias as a distortion of scientific results. As long as we avoid misinformation, fake news, and false conclusions, the thinking goes, the science is unbiased. But the deeper problem of bias involves the questions science pursues in the first place. Scientific questions are infinite, but the resources required to test them — time, effort, money, talent — are decidedly finite.

This is a good point. Selection bias is notoriously difficult for people to think about, as by its nature it depends on things that haven’t been seen.

I like Strohminger and Táíwò’s article and have only two things to add.

1. They write about the effects of corporations on what gets researched, using as examples the strategies of cigarette companies and oil companies to fund research to distract from their products’ hazards. I agree that this is an issue. We should also be concerned about influences from sources other than corporations, including the military, civilian governments, and advocacy organizations. There are plenty of bad ideas to go around, even without corporate influence. And, setting all this aside, there’s selection based on what gets publicity, along with what might be called scientific ideology. Think about all that ridiculous research on embodied cognition or on the factors that purportedly influence the sex ratio of babies. These ideas fit certain misguided models of science and have sucked up lots of attention and researcher effort without any clear motivation based on funding, corporate or otherwise. My point here is just that there are a lot of ways that the scientific enterprise is distorted by selection bias in what gets studied and what gets published.

2. They write: “The research on nudges could be completely unbiased in the sense that it provides true answers. But it is unquestionably biased in the sense that it causes scientists to effectively ignore the most powerful solutions to the problems they focus on. As with the biomedical researchers before them, today’s social scientists have become the unwitting victims of corporate capture.” Agreed. Beyond this, though, that research is not even close to being unbiased in the sense of providing accurate answers to well-posed questions. We discussed this last year in the context of a fatally failed nudge meta-analysis: it’s a literature of papers with biased conclusions (the statistical significance filter), with some out-and-out fraudulent studies mixed in.

My point here is that these two biases—selection bias in what is studied, and selection bias in the studies themselves—go together. Neither bias alone would be enough. If there were only selection bias in what was studied, the result would be lots of studies reporting high uncertainty and no firm conclusions, and not much to sustain the hype machine. Conversely, if there were only selection bias within each study, there wouldn’t be such a waste of scientific effort and attention. Strohminger and Táíwò’s article is valuable because they emphasize selection bias in what is studied, which is something we haven’t been talking so much about.

Simulations of measurement error and the replication crisis: Update

Last week we ran a post, “Simulations of measurement error and the replication crisis: Maybe Loken and I have a mistake in our paper?”, reporting some questions that neuroscience student Federico D’Atri asked about a paper that Eric Loken and I wrote a few years ago. It’s one of my favorite papers so it was good to get feedback on it. D’Atri had run a simulation and had some questions, and in my post I shared some old code from the paper. Eric and I then looked into it. We discussed with D’Atri and here’s what we found:

1. No, Loken and I did not have a mistake in our paper. (More on that below.)

2. The code I posted on the blog was not the final code for our paper. Eric had made the final versions. From Eric, here are:
(a) the final code that we used to make the figures in the paper, where we looked at regression slopes, and
(b) a cleaned version of the code I’d posted, where we looked at correlations.
The code I posted last week was something in my files, but it was not the final version of the code, hence the confusion about what was being conditioned on in the analysis.

Regarding the code, Eric reports:

All in all we get the same results whether it’s correlations or t-tests of slopes. At small samples, and for small effects, the majority of the stat sig cors/slopes/t-tests are larger in the error than the non error (when you compare them paired). The graph’s curve does pop up through 0.5 and higher. It’s a lot higher if r = 0.08, and it’s not above 50% if the r is 0.4. It does require a relatively small effect, but we also have .8 reliability.
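For readers who want to see the idea in code, here is a minimal sketch in Python of the kind of paired simulation described above. It is not the code linked in item 2; the true correlation (0.15), sample size (25), reliability (about 0.8), and the choice to keep only significant correlations in the direction of the true effect are all assumptions for illustration:

```python
# Paired comparison sketch: simulate an error-free dataset and an
# error-prone version of the same data, keep the error-prone
# correlations that reach statistical significance, and ask how often
# they exceed the corresponding error-free correlation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def one_pair(n=25, rho=0.15, reliability=0.8):
    # Latent (error-free) variables with true correlation rho
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    # Add measurement error; reliability = 1 / (1 + error variance)
    sd_err = np.sqrt(1 / reliability - 1)
    x_obs = x + rng.normal(0, sd_err, n)
    y_obs = y + rng.normal(0, sd_err, n)
    r_ideal = stats.pearsonr(x, y)[0]
    r_noisy, p_noisy = stats.pearsonr(x_obs, y_obs)
    return r_ideal, r_noisy, p_noisy

results = [one_pair() for _ in range(20000)]
selected = [(ri, rn) for ri, rn, p in results if p < 0.05 and rn > 0]
frac_larger = np.mean([rn > ri for ri, rn in selected])
print(f"{len(selected)} selected noisy correlations; "
      f"{frac_larger:.0%} exceed the paired error-free correlation")
```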

3. Some interesting questions remain. Federico writes:

I don’t think there’s an error in the code used to produce the graphs in the paper; rather I personally find that certain sentences in the paper may lead to some misunderstandings. I also concur with the main point made in their paper, that large estimates obtained from a small sample in high-noise conditions should not be trusted and I believe they do a good job of delivering this message.

What Andrew and Eric show is how often the correlation obtained from noisy measurements, once selected for statistical significance, is larger than the estimate one would obtain in the same scenario without measurement error and without selecting for statistical significance. What I had initially thought was that there was an equal level of selection for statistical significance applied to both scenarios. They essentially show that, under conditions of insufficient power to detect the true underlying effect, enough selection based on statistical significance can produce an overestimation much larger than the attenuation caused by measurement error.

This seems quite intuitive to me, and I would like to clarify it with an example. Consider a true underlying correlation in ideal conditions of 0.15 and a sample size of N = 25, and the extreme scenario where measurement error is infinite (so the noisy x and y will be uncorrelated). In this case, the measurements of x and y under ideal conditions will be totally uncorrelated with those obtained under noisy conditions, and hence so will the correlation estimates in the two scenarios. If I select for significance the correlations obtained under noisy conditions, I am only looking at correlations greater than 0.38 (for α = 0.05, two-tailed test), which I’ll be comparing to an average correlation of 0.15, since the two estimates are completely unrelated. It is clear then that the first estimate will almost always be greater than the second. The greater the noise, the more uncorrelated the correlation estimates obtained in the two different scenarios become, making it less likely that obtaining a large estimate in one case would also result in a large estimate in the other case.

My criticism is not about the correctness of the code (which is correct as far as I can see), but rather about how relevant this scenario is in representing a real situation. Indeed, I believe it is very likely that the same hypothetical researchers who selected for statistical significance under ‘noisy’ measurement conditions would also select for significance under ideal measurement conditions, and in that case they would obtain an even higher frequency of effect overestimation when selecting for statistical significance (once selecting for the direction of the true effect), as well as greater ease in achieving statistically significant results.

However, I think it is possible that in research environments where measurement error is greater (and isn’t modeled), there might be a greater incentive for, or a greater co-occurrence of, selection for statistical significance and poor research practices. Without evidence of this, though, I find it more interesting to compare the two scenarios assuming similar selection criteria.

Also, I’m aware that in situations deviating from the simple assumptions of the case we are considering here (simple correlation between x and y and uncorrelated measurement errors), complexities can arise. For example, as you probably know better than me, in multiple regression scenarios where two predictors, x1 and x2, are correlated and their measurement errors are also correlated (which can occur with certain types of measures, such as self-reporting, where individuals prone to overestimating x1 may also tend to overestimate x2), and only x1 is correlated with y, there is an inflation of Type I error for x2 and asymptotically β2 is biased away from zero.

Eric adds:

Glad we resolved the initial confusion about our article’s main point and associated code. When you [Federico] first read our article, you were interested in different questions than the one we covered. It’s a rich topic, with lots of work to be done, and you seem to have several ideas. Our article addressed the situation where someone might acknowledge measurement error, but then say “my finding is all the more impressive because if not for the measurement error I would have found an even bigger effect.” We target the intuition that if a dataset could be made error free by waving a wand, that the data would necessarily show a larger correlation. Of course the “iron law” (attenuation) holds in large samples. Unsurprisingly, however, in smaller samples, data with measurement error can have a larger realized correlation. And after conditioning on the statistical significance of the observed correlations, a majority of them could be larger than the corresponding error free correlation. We treated the error free effect (the “ideal study”) as the counterfactual (“if only I had no error in my measurements”), and thus filtered on the statistical significance of the observed error prone correlations. When you tried to reproduce that graph, you applied the filter differently, but you now find that what we did was appropriate for the question we were answering.

By the way, we deliberately kept the error modest. In our scenario, the x and y values have about 0.8 reliability—widely considered excellent measurement. I agree that if the error grows wildly, as with your hypothetical case, then the observed values are essentially uncorrelated with the thing being measured. Our example though was pretty realistic—small true effect, modest measurement error, range of sample sizes. I can see though that there are many factors to explore.

Different questions are of interest in different settings. One complication is that, when researchers say things like, “Despite limited statistical power . . .” they’re typically not recognizing that they have been selecting on statistical significance. In that way, they are comparing to the ideal setting with no selection.

And, for reasons discussed in my original paper with Eric, researchers often don’t seem to think about measurement error at all! They have the (wrong) impression that a “statistically significant” result gives retroactive assurance that their signal-to-noise ratio is high.

That’s what got us so frustrated to start with: not just that noisy studies get published all the time, but that many researchers seem to not even realize that noise can be a problem. Lots and lots of correspondence with researchers who seem to feel that if they’ve found a correlation between X and Y, where X is some super-noisy measurement with some connection to theoretical concept A, and Y is some super-noisy measurement with some connection to theoretical concept B, that they’ve proved that A causes B, or that they’ve discovered some general connection between A and B.

So, yeah, we encourage further research in this area.