I have that one in my collection of PDFs. I see I downloaded it on January 7, 2017, which was 3 days before our preprint went live. Probably I skimmed it and didn’t pay much further attention. I don’t know if my coauthors looked at it. Let’s give it five minutes’ worth of attention:
1. I notice right off the bat that the first numerical statement in the Method section contains a GRIM inconsistency:
“Data collection took place in 60 distinct FSR ranging from large chains (e.g., AppleBees®, Olive Garden®, Outback Steakhouse®, TGIF®) to small independent places (58.8%).”
58.8% is not possible. 35 out of 60 is 58.33%. 36 out of 60 is 60%.
2. The split of interactions by server gender (female 245, male 250) does not add up to the total of 497 interactions. The split by server BMI does. Maybe they couldn’t determine server gender in two cases. (However, one would expect far fewer servers than interactions. Maybe with the reported ethnic and gender percentage splits of the servers we can work out a plausible total number of servers that matches those percentages when correctly rounded. Maybe.)
3. The denominator degrees of freedom for the F statistics in Table 1 are incorrect (N=497 implies df2=496 for the first two, 495 for the third; subtract 2 if the real N is in fact 495 rather than 497).
4. In Table 5, the total observations with low (337) and high (156) BMI servers do not match the numbers (low, 215, high, 280) in Table 2.
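The arithmetic behind items 1 and 2 is easy to check by machine. What follows is a minimal sketch, not anyone's official GRIM tool: the function names `grim_percent_matches` and `possible_sample_sizes` are made up for illustration, and the code assumes percentages were reported to one decimal place and accepts either rounding direction at the half-unit boundary. The second function is a rough version of the idea floated in item 2, namely searching for denominators that could have produced a reported percentage.

```python
def grim_percent_matches(reported_pct, n, decimals=1):
    """Return the counts k in 0..n whose percentage 100*k/n rounds to reported_pct."""
    # Half a unit in the last reported decimal place, plus a tiny slack so that
    # values sitting exactly on the rounding boundary are accepted either way.
    tol = 0.5 * 10 ** (-decimals) + 1e-9
    return [k for k in range(n + 1) if abs(100 * k / n - reported_pct) <= tol]


def possible_sample_sizes(reported_pct, max_n, decimals=1):
    """Return the sample sizes n in 1..max_n for which reported_pct is achievable."""
    return [n for n in range(1, max_n + 1)
            if grim_percent_matches(reported_pct, n, decimals)]


# Item 1: no whole number of restaurants out of 60 gives 58.8%.
print(grim_percent_matches(58.8, 60))  # → []
print(grim_percent_matches(58.3, 60))  # → [35]

# Item 2: the gender split falls short of the stated 497 interactions.
gender_total = 245 + 250
print(gender_total, 497 - gender_total)  # → 495 2

# Which denominators up to 60 could have produced 58.8% at all?
print(possible_sample_sizes(58.8, 60))
```

If the last call returns some denominator other than 60, that only says the percentage is arithmetically possible for that sample size; it does not tell us what the authors actually did.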
There are errors right at the surface, and errors all the way through: the underlying scientific model (in which small, seemingly irrelevant manipulations are supposed to have large and consistent effects, a framework which is logically impossible because all these effects could interact with each other), the underlying statistical approach (sifting through data to find random statistically-significant differences which won’t replicate), the research program (in which a series of papers are published, each contradicting something that came before but presented as if they are part of a coherent whole), the details (data that could never have been, incoherent descriptions of data collection protocols, fishy numbers that could never have occurred with any data), all wrapped up in an air of certainty and marketed to the news media, TV audiences, corporations, the academic and scientific establishment, and the U.S. government.
What’s amazing here is not just that someone publishes low-quality research—that happens, journals are not perfect, and even when they make terrible mistakes they’re loath to admit it, as in the notorious case of that econ journal that refused to retract that “gremlins” paper which had nearly as many errors as data points—but that Wansink was, until recently, considered a leading figure in his field. Really kind of amazing. It’s not just that the emperor has no clothes, it’s more like the emperor has been standing in the public square for fifteen years screaming, I’m naked! I’m naked! Look at me! And the scientific establishment is like, Wow, what a beautiful outfit.
A lot of this has to be that Wansink and other social psychology and business-school researchers have been sending a message (that easy little “nudges” can have large and beneficial effects) that many powerful and influential people want to hear. And, until recently, this sort of feel-good message has had very little opposition. Science is not an adversarial field—it’s not like the U.S. legal system where active opposition is built into its processes—but when you have unscrupulous researchers on one side and no opposition on the other, bad things will happen.
P.S. I wrote this post in Sep 2017 and it is scheduled to appear in Mar 2018, by which time Wansink will probably be either president of Cornell University or the chair of the publications board of the Association for Psychological Science.
P.P.S. We’ve been warning Cornell about this one for a while.