About 80 people pointed me to this post by Uri Simonsohn, Joe Simmons, and Leif Nelson about a 2012 article, “Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end.” Apparently some of the data in that paper were faked; see for example here:
Uri et al. report some fun sleuthing:
The analyses we have performed on these two fonts provide evidence of a rather specific form of data tampering. We believe the dataset began with the observations in Calibri font. Those were then duplicated using Cambria font. In that process, a random number from 0 to 1,000 (e.g., RANDBETWEEN(0,1000)) was added to the baseline (Time 1) mileage of each car, perhaps to mask the duplication. . . .
The evidence presented in this post indicates that the data underwent at least two forms of fabrication: (1) many Time 1 data points were duplicated and then slightly altered (using a random number generator) to create additional observations, and (2) all of the Time 2 data were created using a random number generator that capped miles driven, the key dependent variable, at 50,000 miles.
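To make the two forms of fabrication concrete, here is a minimal sketch of what such data would look like next to something more realistic. It is not the analysis from Uri et al.'s post and it does not use their dataset. The "plausible" distribution (a lognormal with made-up parameters), the sample sizes, and all function names are assumptions for illustration only.

```python
import random

CAP = 50_000  # cap on miles driven, as reported in the post


def fabricated_miles(n, rng):
    """Uniform integers from 0 to CAP, like a spreadsheet RANDBETWEEN(0, 50000)."""
    return [rng.randint(0, CAP) for _ in range(n)]


def plausible_miles(n, rng):
    """Hypothetical stand-in for real driving: right-skewed, median near 20,000.

    The lognormal parameters are assumptions, not estimates from any data.
    """
    return [rng.lognormvariate(9.9, 0.6) for _ in range(n)]


def bin_counts(values, width=10_000, n_bins=5):
    """Count values in [0, width), [width, 2*width), ...; larger values are skipped."""
    counts = [0] * n_bins
    for v in values:
        idx = int(v // width)
        if idx < n_bins:
            counts[idx] += 1
    return counts


def share_above(values, threshold):
    """Fraction of values strictly greater than threshold."""
    return sum(v > threshold for v in values) / len(values)


def duplicate_with_offset(baseline, rng, max_offset=1_000):
    """Copy each baseline reading and add a random integer from 0 to max_offset."""
    return [b + rng.randint(0, max_offset) for b in baseline]


rng = random.Random(2021)
fake = fabricated_miles(10_000, rng)
real = plausible_miles(10_000, rng)

print("fabricated bins:", bin_counts(fake))  # roughly flat across bins
print("plausible bins: ", bin_counts(real))  # peaked and skewed
print("share above cap:", share_above(fake, CAP), share_above(real, CAP))

# Hypothetical baseline odometer readings, then "duplicates" with small offsets.
baseline = [rng.randint(0, 150_000) for _ in range(5)]
copies = duplicate_with_offset(baseline, rng)
print("copy minus original:", [c - b for b, c in zip(baseline, copies)])
```

The point is only that a flat histogram with a hard cutoff and no values at all above the cap is the sort of thing that shows up in the most basic summary of the data, which is why posting the data matters.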
This is basically the Cornell Food and Brand Lab without the snow.
Uri et al. summarize:
We have worked on enough fraud cases in the last decade to know that scientific fraud is more common than is convenient to believe, and that it does not happen only on the periphery of science. Addressing the problem of scientific fraud should not be left to a few anonymous (and fed up and frightened) whistleblowers and some (fed up and frightened) bloggers to root out. The consequences of fraud are experienced collectively, so eliminating it should be a collective endeavor. What can everyone do?
There will never be a perfect solution, but there is an obvious step to take: Data should be posted. The fabrication in this paper was discovered because the data were posted. If more data were posted, fraud would be easier to catch. And if fraud is easier to catch, some potential fraudsters may be more reluctant to do it. . . .
Until that day comes, all of us have a role to play. As authors (and co-authors), we should always make all of our data publicly available. And as editors and reviewers, we can ask for data during the review process, or turn down requests to review papers that do not make their data available. A field that ignores the problem of fraud, or pretends that it does not exist, risks losing its credibility. And deservedly so.
Their post concludes with letters from four of the authors of the now-discredited 2012 paper. All four of these authors agree that Uri et al. presented unequivocal evidence of fraud. Only one of the authors handled the data; this was Dan Ariely, who writes:
I agree with the conclusions and I also fully agree that posting data sooner would help to improve data quality and scientific accuracy. . . . The work was conducted over ten years ago by an insurance company with whom I partnered on this study. The data were collected, entered, merged and anonymized by the company and then sent to me. . . . I was not involved in the data collection, data entry, or merging data with information from the insurance database for privacy reasons.
Some related material
Lots of people sent me the above-linked post by Uri, Joe, and Leif. Here are a few related things that some people sent in:
– Kevin Lewis pointed to the betting odds at Polymarket.com (see image at top of post).
– Gary Smith pointed to this very recent article, “Insurance Company Gives Sour AI promises,” about an insurance company called Lemonade:
In addition to raising hundreds of millions of dollars from eager investors, Lemonade quickly attracted more than a million customers with the premise that artificial intelligence (AI) algorithms can estimate risks accurately and that buying insurance and filing claims can be fun . . . The company doesn’t explain how its AI works, but there is this head-scratching boast:
A typical homeowners policy form has 20-40 fields (name, address, bday…), so traditional insurers collect 20-40 data points per user.
AI Maya asks just 13 Q’s but collects over 1,600 data points, producing nuanced profiles of our users and remarkably predictive insights.
This mysterious claim is, frankly, a bit creepy. How do they get 1,600 data points from 13 questions? Is their app using our phones and computers to track everywhere we go and everything we do? The company says that it collects data from every customer interaction but, unless it is collecting trivia, that hardly amounts to 1,600 data points. . . . In May 2021 Lemonade posted a problematic thread to Twitter (which was later deleted):
When a user files a claim, they record a video on their phone and explain what happened. Our AI carefully analyzes these videos for signs of fraud. [AI Jim] can pick up non-verbal cues that traditional insurers can’t, since they don’t use a digital claims process. This ultimately helps us lower our loss ratios (aka how much we pay out in claims vs. how much we take in).
Are claims really being validated by non-verbal cues (like the color of a person’s skin) that are being processed by black-box AI algorithms that the company does not understand?
There was an understandable media uproar since AI algorithms for analyzing people’s faces and emotions are notoriously unreliable and biased. Lemonade had to backtrack. A spokesperson said that Lemonade was only using facial recognition software for identifying people who file multiple claims using multiple names.
I agree with Smith that this sounds fishy. In short, it sounds like the Lemonade people are lying in one place or another. If they’re really only using facial recognition software for identifying people who file multiple claims using multiple names, then that can’t really be described as “pick[ing] up non-verbal cues.” But yeah, press releases. People lie in press releases all the time. There could’ve even been some confusion, like maybe the nonverbal cues thing was a research idea that they never implemented, but the public relations writer heard about it and thought it was already happening.
The connection to the earlier story is that Dan Ariely works at Lemonade; he’s their Chief Behavioral Officer, or at least he had this position in 2016. I hope he’s not the guy in charge of detecting fraudulent claims, as it seems that he’s been fooled by fraudulent data from an insurance company at least once in the past.
– A couple people also pointed me to this recent Retraction Watch article from Adam Marcus, “Prominent behavioral scientist’s paper earns an expression of concern,” about a 2004 article, “Effort for Payment: A Tale of Two Markets.” There were inconsistencies in the analysis, and the original data could not be found. The author said, “It’s a good thing for science to put a question mark [on] this. . . . Most of all, I wish I kept records of what statistical analysis I did. . . . That’s the biggest fault of my own, that I just don’t keep enough records of what I do.” It actually sounds like the biggest fault was not a lack of records of the analysis, but rather no records of the original data.
Just don’t tell me they’re retracting the 2004 classic, “Garfield: A Tail of Two Kitties.” I can take a lot of bad news, but Bill Murray being involved in a retraction—that’s a level of disillusionment I can’t take right now.
Why such a big deal?
The one thing I don’t quite understand is why this latest case got so much attention. It’s an interesting case, but so were the Why We Sleep story and many others. Also notable is how this seems to be blowing up so fast, as compared with the Harvard primatologist or the Cornell Food and Brand episode, each of which took years to play out. Maybe people are more willing to accept that there has been fraud, whereas in these earlier cases lots of people were bending over backward to give people the benefit of the doubt? Also there’s the dramatic nature of this fraud, which is similar to that UCLA survey from a few years ago. The Food and Brand Lab data problems were so messy . . . it was clear that the data were nothing like what was claimed, but the setup was so sloppy that nobody could figure out what was going on (and the perp still seems to live in a funhouse world in which nothing went wrong). I’m glad that Uri et al. and Retraction Watch did these careful posts; I just don’t quite follow why this story got such immediate interest. One person suggested that people were reacting to the irony of fraud in a study about dishonesty?
The other interesting thing is that, as reported by Uri et al., the results of the now-discredited 2012 article failed to show up in an independent replication. And nobody seems to even care.
Here’s some further background:
Ariely is the author of the 2012 book, “The Honest Truth About Dishonesty: How We Lie to Everyone—Especially Ourselves.” A quick google search finds him featured in a recent Freakonomics radio show called, “Is Everybody Cheating These Days?”, and a 2020 NPR segment in which he says, “One of the frightening conclusions we have is that what separates honest people from not-honest people is not necessarily character, it’s opportunity . . . the surprising thing for a rational economist would be: why don’t we cheat more?”
But . . . wait a minute! The NPR segment, dated 17 Feb 2020, states:
That’s why Ariely describes honesty as something of a state of mind. He thinks the IRS should have people sign a pledge committing to be honest when they start working on their taxes, not when they’re done. Setting the stage for honesty is more effective than asking someone after the fact whether or not they lied.
And that last sentence links directly to the 2012 paper; indeed, it links to a copy of the paper sitting at Ariely’s website. But the new paper with the failed replications, “Signing at the beginning versus at the end does not decrease dishonesty,” by Kristal, Whillans, Bazerman, Gino, Shu, Mazar, and Ariely, is dated 31 Mar 2020, and it was sent to the journal in mid-2019:
Ariely, as a coauthor of this article, had to have known for at least half a year before the NPR story that this finding didn’t replicate. But in that NPR interview he wasn’t able to spare even a moment to share this information with the credulous reporter? This seems bad, even aside from any fraud. If you have a highly publicized study, and it doesn’t replicate, then I think you’d want to be damn clear with everyone that this happened. You wouldn’t want the national news media to go around acting like your research claims held up, when they didn’t.
I guess that PNAS might retract the paper (that’s what the betting odds say!), NPR will eventually report on this story, and Ted might take down the talks (no over-under action on this one, unfortunately), but I don’t know that they’ll confront the underlying problem. What I’d like is not just an NPR story, “Fraud case rocks the insurance industry and academia,” but something more along the lines of:
We at NPR were also fooled. Even before the fraud was revealed, this study which failed to replicate was reported without qualification. This is a problem. Science reporters rely on academic scientists. We can’t vet all the claims told to us, but at the very least we need to increase the incentives for scientists to be open to us about their failures, and reduce the incentives for them to exaggerate. NPR and Ted can’t get everything right, but going forward we endeavor to be part of the solution, not part of the problem. As a first step, we’re being open about how we were fooled. This is not just a story we are reporting; it’s also a story about us.
P.S. Recall the Armstrong Principle.
P.P.S. More on the story from Stephanie Lee.
P.P.P.S. And more from Jonatan Pallesen, who concludes, “This is a case of fraud that is completely bungled by ineptitude. As a result it had signs of fraud that were obvious from just looking at the most basic summary statistics of the data. And still, it was only discovered after 9 years, after someone attempted a replication. . . . This makes it seem likely that there is a lot more fraud than most people expect.”