“It’s not just that the emperor has no clothes, it’s more like the emperor has been standing in the public square for fifteen years screaming, I’m naked! I’m naked! Look at me! And the scientific establishment is like, Wow, what a beautiful outfit.”

Somebody pointed Nick Brown to another paper by notorious eating behavior researcher Brian Wansink. Here’s Brown:

I have that one in my collection of PDFs. I see I downloaded it on January 7, 2017, which was 3 days before our preprint went live. Probably I skimmed it and didn’t pay much further attention. I don’t know if my coauthors looked at it. Let’s give it five minutes’ worth of attention:

1. I notice right off the bat that the first numerical statement in the Method section contains a GRIM inconsistency:
“Data collection took place in 60 distinct FSR ranging from large chains (e.g., AppleBees®, Olive Garden®, Outback Steakhouse®, TGIF®) to small independent places (58.8%).”
58.8% is not possible. 35 out of 60 is 58.33%. 36 out of 60 is 60%. [A quick check of this and the next point appears after the list.]

2. The split of interactions by server gender (female 245, male 250) does not add up to the total of 497 interactions. The split by server BMI does. Maybe they couldn’t determine server gender in two cases. (However, one would expect far fewer servers than interactions. Maybe with the reported ethnic and gender percentage splits of the servers we can work out a plausible number of total servers that match those percentages when correctly rounded. Maybe.)

3. The denominator degrees of freedom for the F statistics in Table 1 are incorrect (N=497 implies df2=496 for the first two, 495 for the third; subtract 2 if the real N is in fact 495 rather than 497).

4. In Table 5, the total observations with low (337) and high (156) BMI servers do not match the numbers (low, 215, high, 280) in Table 2.
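
As a quick check on Brown’s points 1 and 2, here is a minimal sketch in Python. It uses only the figures quoted above; nothing else from the paper is assumed.

    # Which one-decimal percentages can an integer count out of n produce?
    def attainable_percentages(n, decimals=1):
        return {round(100 * k / n, decimals) for k in range(n + 1)}

    # Point 1: 58.8% is not attainable with 60 restaurants.
    print(58.8 in attainable_percentages(60))                      # False
    print(sorted(p for p in attainable_percentages(60) if 58 <= p <= 61))
    # [58.3, 60.0] -- nothing in between rounds to 58.8

    # Point 2: the reported gender split falls short of the reported total.
    print(245 + 250, "vs. reported total", 497)                    # 495 vs. 497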

There are errors right at the surface, and errors all the way through: the underlying scientific model (in which small, seemingly irrelevant manipulations are supposed to have large and consistent effects, a framework which is logically impossible because all these effects could interact with each other), the underlying statistical approach (sifting through data to find random statistically-significant differences which won’t replicate), the research program (in which a series of papers are published, each contradicting something that came before but presented as if they are part of a coherent whole), the details (data that could never have been, incoherent descriptions of data collection protocols, fishy numbers that could never have occurred with any data), all wrapped up in an air of certainty and marketed to the news media, TV audiences, corporations, the academic and scientific establishment, and the U.S. government.

What’s amazing here is not just that someone publishes low-quality research—that happens, journals are not perfect, and even when they make terrible mistakes they’re loath to admit it, as in the notorious case of that econ journal that refused to retract that “gremlins” paper which had nearly as many errors as data points—but that Wansink was, until recently, considered a leading figure in his field. Really kind of amazing. It’s not just that the emperor has no clothes, it’s more like the emperor has been standing in the public square for fifteen years screaming, I’m naked! I’m naked! Look at me! And the scientific establishment is like, Wow, what a beautiful outfit.

A lot of this has to be that Wansink and other social psychology and business-school researchers have been sending a message (that easy little “nudges” can have large and beneficial effects) that many powerful and influential people want to hear. And, until recently, this sort of feel-good message has had very little opposition. Science is not an adversarial field—it’s not like the U.S. legal system where active opposition is built into its processes—but when you have unscrupulous researchers on one side and no opposition on the other, bad things will happen.

P.S. I wrote this post in Sep 2017 and it is scheduled to appear in Mar 2018, by which time Wansink will probably be either president of Cornell University or the chair of the publications board of the Association for Psychological Science.

P.P.S. We’ve been warning Cornell about this one for a while.

34 thoughts on “It’s not just that the emperor has no clothes, it’s more like the emperor has been standing in the public square for fifteen years screaming, I’m naked! I’m naked! Look at me! And the scientific establishment is like, Wow, what a beautiful outfit.”

  1. As long as we are being fanciful (Wansink as “either president of Cornell University or the chair of the publications board of the Association for Psychological Science”), try this now that Easter/Passover is at hand and there has been some snow in the Northeast:

    https://www.washingtonpost.com/local/dc-politics/dc-lawmaker-says-recent-snowfall-caused-byrothschilds-controlling-the-climate/2018/03/18/daeb0eae-2ae0-11e8-911f-ca7f68bff0fc_story.html?utm_term=.ad27d8994eda

    “Man, it just started snowing out of nowhere this morning, man. Y’all better pay attention to this climate control, man, this climate manipulation,” he says. “And D.C. keep talking about, ‘We a resilient city.’ And that’s a model based off the Rothschilds controlling the climate to create natural disasters they can pay for to own the cities, man. Be careful.”

    [D.C. Council member Trayon White Sr. (D-Ward 8)] expressed surprise that his remarks might be construed as anti-Semitic. Asked to clarify what he meant, he wrote, “The video says what it says.”

    • For more on the control of the weather by the Rothschilds

      https://www.washingtonpost.com/news/retropolis/wp/2018/03/19/the-rothschilds-a-pamphlet-by-satan-and-conspiracy-theories-tied-to-a-battle-200-years-ago/?utm_term=.0e70aa5feff9

      “It wasn’t the first time White repeated anti-Semitic conspiracy theories about the Rothschilds. During a Feb. 27 meeting with Mayor Muriel E. Bowser and other top city officials, he was captured on video claiming that the family controls the World Bank and the federal government.”

      “The conspiracy theories about the Rothschilds — and other groups, Jewish or not — are spread in online forums, self-published books, right-wing and religious radio programs, and especially YouTube, where seemingly normal and harmless people spin complicated, illogical theories that viewers seeking to confirm their own views can easily find.”

      And here is a lengthy video by a young religious woman explaining everything about the Rothschilds’ control of climate and pedophilia:

      https://www.youtube.com/watch?v=6aXA0e7k87Q

      As is often remarked, “it is turtles all the way down” because one of her many followers commented that “The Rothschilds are a smokescreen for the Vatican, they are merely the Treasurer for the Vatican.”

      • The machine that controls the weather is in Agartha (inner earth continent). It is inhabited by the beings usually called “greys” and mistaken for extraterrestrials. There is a huge psyop to get people looking in the exact wrong direction. They are actually “intraterrestrials”; they’ve got you looking *up* for the source of UFOs when you should be looking *down*.

        The Nazis (descendants of those chased down there by Admiral Byrd after WWII) are currently prevented from launching a large-scale UFO attack via the icecaps, so they made a treaty with the Greys and Republicans to melt the icecaps with global warming. Joe Biden was sent to negotiate a treaty in 2009, but that fell through. As a plan B, the Rothschilds are trying to prevent the Nazi attack by lowering the temperature with chemtrails (aluminum sulfate). Following the Brookings Report, they want to do this without revealing to the sheeple that there is a secret war going on, thus the CO2 theory that was sold to the public. Unfortunately, they used statistical significance[0] when developing the chemtrails, so it doesn’t work right, thus causing this snowstorm.

        [0] Another psyop developed to slow nuclear proliferation until a large enough neutrino beam can be constructed by the Japanese on the dark side of the moon to deactivate all the world’s nukes.

  2. The framework of large, consistent effects triggered by small nudges keeps reminding me, for some reason, of the hierarchy problem in physics. The idea there is that we predict all these extremely large positive and negative corrections to some fundamental value, but somehow they all manage to precisely cancel out. It seems like doing an experiment in the highly leveraged world of small nudges and big effects requires cancellations like that for all the nudges you aren’t looking at!

    • “And all matter is a mixture of positive protons and negative electrons which are attracting and repelling with this great force. So perfect is the balance however, that when you stand near someone else you don’t feel any force at all. If there were even a little bit of unbalance you would know it. If you were standing at arm’s length from someone and each of you had one percent more electrons than protons, the repelling force would be incredible. How great? Enough to lift the Empire State building? No! To lift Mount Everest? No! The repulsion would be enough to lift a “weight” equal to that of the entire earth!” — Richard Feynman

  3. The Song Remains the Same – relevant snippets from the Tilburg report

    “5.2 Generalizability of the findings from local to national and international culture

    The discovery of the methodological defects, which constitutes an unintended and unexpected finding of this inquiry, did raise the crucial question for the Committees as to whether this research culture … is also rife throughout the field … Could it be that in general some aspects of this discipline’s customary methods should be deemed incorrect from the perspective of academic standards and scientific integrity?

    It was extremely rare for his extraordinarily neat findings to be subjected to serious doubt …

    [S]everal co-authors … defended the serious and less serious violations of proper scientific method with the words: ‘that is what I have learned in practice; everyone in my research environment does the same, and so does everyone we talk to at international conferences.’

    5.3 Verification bias and missing replications

    One of the most fundamental rules of scientific research is that an investigation must be designed in such a way that facts that might refute the research hypotheses are given at least an equal chance of emerging as do facts that confirm the research hypotheses.

    Verification bias is not the same as the ‘usual’ publication bias … Verification bias refers to something more serious: the use of research procedures in such a way as to ‘repress’ negative results by some means. [What follows this discussion dovetails nicely with the recently released emails between Wansink and his researchers].

    5.6 Failure of scientific criticism

    [T]he urgent question that remains is why this fraud and the widespread violations of sound scientific methodology were never discovered in the normal monitoring procedures in science.

    The data and findings were in many respects too good to be true… The effects were improbably large… Highly conspicuous impossible findings went unnoticed.

    It is almost inconceivable that … reviewers of the international ‘leading journals’, who are deemed to be experts in their field … did not notice the reporting of impossible statistical results … Virtually nothing of all the impossibilities, peculiarities and sloppiness mentioned in this report was observed by all these local, national and international members of the field.”

    Any bets on whether Cornell can live up to Tilburg’s standards?

    • I am very happy that people still pay attention to the Tilburg report! :-)

      At the time it came out, social psychologists were not happy AT ALL.

      I am quite proud to tell you that since then the Tilburg social psychologists have in general been front runners in trying to do science in a better way.

    • Here you can find the Levelt Report (also available in English):

      https://www.tilburguniversity.edu/nl/over/profiel/kwaliteit-voorop/commissie-levelt/

      What I find interesting is that:

      1) it apparently took 3 young researchers to dig deeper and be the whistleblowers

      2) their names were kept a secret at the time, as far as I know, and I have not heard anything since about these (what could be seen as) scientific heroes. If this is (still) correct: what are the odds that this information about these 3 whistleblowers has not been leaked by anyone or that they have come forward themselves?

  4. The “emperor has no clothes” phenomenon is all too common in psychology. I know of two very highly cited and impactful papers in my discipline (together they have almost 3,000 citations) in which the central findings are mathematically impossible, and obviously so for anyone with even a cursory understanding of the statistical methods used in the papers. The authors know that they are impossible, the editors know that they are impossible, and yet neither one has been corrected.

  5. Not that these details are really important, but regarding the 58.8%, it doesn’t just rule out 60 restaurants — it rules out everything near 60. The closest I could find was 30 out of 51. And note that elsewhere in the paper, Wansink writes, “To accomplish this goal, 497 interactions between diners and servers have been observed in more than 50 restaurants across the United States.” Describing 51 as “more than 50” seems reasonable; describing 60 this way seems less so.

    On the other hand, perhaps the 58.8% refers to 292 out of 497? This would make sense of the 94.0% and the 45.7% in the following two sentences.

    Damnit, now I’ve wasted 15 minutes of my day thinking about Wansink.
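
    For anyone who wants to redo that search, here is a rough brute-force sketch in Python (the 40-to-80 window around the reported 60 restaurants is an arbitrary choice for illustration):

        # Which (count, total) pairs in a window around 60 round to 58.8%?
        matches = [
            (k, n)
            for n in range(40, 81)
            for k in range(n + 1)
            if round(100 * k / n, 1) == 58.8
        ]
        print(matches)   # nothing with n == 60; (30, 51) is the closest at or below 60

        # The alternative reading: 58.8% of the 497 observed interactions.
        print(round(100 * 292 / 497, 1))   # 58.8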

  6. The first paper is by Walumbwa et al. (2008) in which the authors claim that authentic leadership is a higher-order construct. To test this they present data that a second-order model with four first-order factors exhibits significantly better fit (lower chi-square) than a first-order model with four correlated factors. This is impossible because the second-order model is nested within the first-order model. At best, the second-order model can exhibit fit that is not significantly worse than the first-order model. The fit indexes reported for the preferred second-order model are also inconsistent with each other and there are plenty of other impossible findings reported in that paper. This paper forms the foundation of much of the literature on authentic leadership.

    Walumbwa, F. O., Avolio, B. J., Gardner, W. L., Wernsing, T. S., & Peterson, S. J. (2008). Authentic leadership: Development and validation of a theory-based measure. Journal of Management, 34(1), 89-126. DOI: 10.1177/0149206307308913
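
    To see why that comparison is impossible, here is a small illustration with made-up numbers (not the paper’s data). With standardized first-order factors, the second-order structure has to reproduce the 6 correlations among the 4 factors as products of 4 loadings, so it is a constrained version of simply letting the factors correlate freely, and its fit can only be equal or worse:

        # Hypothetical 4x4 factor correlation matrix (illustration only).
        import numpy as np
        from itertools import combinations
        from scipy.optimize import minimize

        phi = np.array([
            [1.00, 0.55, 0.60, 0.40],
            [0.55, 1.00, 0.50, 0.45],
            [0.60, 0.50, 1.00, 0.65],
            [0.40, 0.45, 0.65, 1.00],
        ])
        pairs = list(combinations(range(4), 2))   # the 6 distinct factor correlations

        def discrepancy(loadings):
            # Gap between the free correlations and those implied by the
            # second-order structure, phi_ij ~ loading_i * loading_j.
            return sum((phi[i, j] - loadings[i] * loadings[j]) ** 2 for i, j in pairs)

        best = minimize(discrepancy, x0=np.full(4, 0.7))
        print(best.fun)   # > 0: even the best 4 loadings miss the 6 correlations,
                          # while the correlated-factors model reproduces them exactly.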

    The second paper is by Duckworth and Quinn (2009). Here the authors argue that grit is a second-order factor with two first-order factors (perseverance and passion). They test this model and provide fit indexes and parameter estimates, but the model is not identified at the higher-order level because they are essentially trying to estimate two parameters a and b to reproduce a single correlation/covariance c such that a*b=c. Obviously there is no unique solution for a and b, and stats software will inform the user that aspects of the model are unidentified. The authors also fail to test the alternative model in which the two first-order factors are simply kept distinct from each other. This paper is used as a foundation for the entire literature on grit that made Angela Duckworth famous, but there is simply no evidence that it is appropriate to combine perseverance and passion to form a construct called grit.

    https://www.ncbi.nlm.nih.gov/pubmed/19205937
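
    The identification problem described there is easy to see with a toy calculation (the correlation is made up, not the paper’s estimate): the single correlation c between the two first-order factors must be reproduced as the product of two higher-order loadings a and b, and a*b = c pins down neither one.

        # Toy illustration of the a*b = c identification problem.
        c = 0.6                                     # hypothetical factor correlation
        candidates = [(0.6, 1.0), (0.75, 0.8), (0.8, 0.75), (1.0, 0.6)]
        for a, b in candidates:
            print(f"a={a}, b={b}, implied correlation={a * b:.3f}")
        # Every pair reproduces c = 0.6 equally well, so a and b are not identified.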

    The editors of both journals have been informed of these errors. In the case of the first paper the editors have known for over five years and one even admitted that I was “obviously correct” when I pointed out these errors.

    • How were the editors informed?

      Do those journals have letters sections that accept papers pointing out the invalidity of their published papers?

      This problem is a recurring theme on this blog.

      • The editors were informed in a variety of ways. In the case of the first paper I submitted an article outlining the errors. The action editor told me that I was “obviously correct” in my assertions but still ended up rejecting the paper. I’ve also corresponded with subsequent editors at the journal about the errors.
        For the second paper I contacted the editor via e-mail and he asked that I write and submit a paper outlining the errors.

      • Thanks, Marcus.

        It seems a big problem. Someone (in a different field from yours) said you’re up against both the editor and the flawed paper’s authors, who are sent the paper. The editor’s criterion is e.g. that the flaw must be ‘central to the paper’s main conclusions’, etc., and it’s easy for him to claim it isn’t. The editor is not on your side, because he and the reviewers were the ones who accepted the flawed work, and he won’t want to admit making a mistake.

        Did you try sending the paper to a different journal?

  7. Andrew: “Science is not an adversarial field—it’s not like the U.S. legal system where active opposition is built into its processes”

    In some fields, it sort of is. Do you think it would be better if it were (moderately) adversarial?

  8. You write: “the research program (in which a series of papers are published, each contradicting something that came before but presented as if they are part of a coherent whole)”

    This really comes out if you read his book “Mindless Eating” (I don’t recommend it). Each finding is somehow woven into his overall narrative even if initially the finding seems contradictory (not to mention it is impossible to find evidence that some of the studies he cites ever took place).

    Reading his book I was reminded of “How We Decide”, which similarly wove apparently contradictory scientific findings into a consistent narrative. To my delight, I then discovered “How We Decide” has been pulled from the shelves due to “falsifications”: https://en.wikipedia.org/wiki/How_We_Decide

    • Jordan:

      I think what’s going on is that Wansink, and people who read his book, and even some of the people spending millions of dollars of government funding to implement his ideas, are parsing these ideas on the shallowest level. The basic idea is that nudges work. Just about anything can be considered to be a nudge, and just about anything can be said to work, so anything goes. Smaller plates cause people to eat less of something? Great. Goofy names cause people to eat more of something else? Great. It reminds me of sociologist Jeremy Freese’s description of some theories as “more vampirical than empirical—unable to be killed by mere evidence.” A related set of examples comes from the studies of social priming. Anything can be a “prime,” and anything can have any effect. Priming you with words connecting to old people can make you walk slower—or it can remind you of your inevitable decline, and motivate you to walk faster. Etc. As I keep saying, “p-hacking” is the least of the problems here.

      • I have found that if you raise questions with a psychology grad student about the validity of social priming, stereotype threat, and related concepts, you tend to get similarly shallow, shifting answers. They have already learned the routine — Well, this is all so fascinating; Every little bit of information helps us understand people a little more; and of course, the biggie, Yes, I understand the niggling math concerns, no study is perfect, but to me this seems entirely plausible.

        • Kyle:

          Recall our discussion regarding, “When does research have active opposition?” With a silly psychology study, there’s little active opposition (at least, typically not much until the research gets heavily promoted and people start to get irritated) so lots of iffy claims can slide by.

          But in the recent thread about the paper on drug abuse, the research is discussed in the context of active policy, and the claims are in opposition to what seems to be generally believed by people who work in the drug-abuse field, so it makes sense that there’s a lot more pushback. A statement such as “In most of the graphs I think there’s clearly a change in the mean at time zero, and we have a zillion robustness checks that convince us the patterns are real. I also wish the data were less noisy but such is life” (see link here) might be enough to satisfy some other academic economists but I don’t think it will work for a general audience.
