“Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab”

Tim van der Zee, Jordan Anaya, and Nicholas Brown posted this very detailed criticism of four papers published by food researcher and business school professor Brian Wansink. The papers are all in obscure journals and became notorious only after Wansink blogged about them in the context of some advice he was giving to graduate students. Last month someone pointed out Wansink’s blog, and I took a look at the papers and criticized them here. The short story is that all four papers come from a single experiment that Wansink himself had characterized as “flawed.” And, although the four papers were all based on the same data, they differed in all sorts of details, which suggested that the authors opportunistically used data exclusion, data coding, and data analysis choices to obtain publishable (that is, p less than .05) results.

I also disagreed with Wansink’s claim that “doing the analyses for a paper” was more valuable than “Facebook, Twitter, Game of Thrones, Starbucks, spinning class.” I think that watching a season’s worth of episodes of Game of Thrones is more valuable than writing a paper such as “Eating Heavily: Men Eat More in the Company of Women.” Views on this may differ, however.

Van der Zee, Anaya, and Brown did a more thorough job than I did. They went through all four papers in excruciating detail, and they found literally dozens of errors. Some of these errors are small. For example, the following table has inconsistencies marked in red:

I thought I’d check:

> (43.67 - 44.55)/sqrt(18.5^2/43 + 14.3^2/52)
[1] -0.2551872
> (68.65 - 66.51)/sqrt(3.67^2/43 + 9.44^2/52)
[1] 1.503114
> (184.83 - 178.38)/sqrt(63.7^2/43 + 45.71^2/52)
[1] 0.556064

OK, the first one isn’t so bad. The t statistic is 0.26 (to two decimal places) but they mark it as 0.25. The next two aren’t so far off either—1.38 and 0.52 instead of 1.50 and 0.56—still, it’s kind of impressive, in some sort of perverse way, that they got something wrong in every line of their table. The next two tables, too, have errors in every line. How can you manage that? Really this is kind of amazing. This Wansink guy is giving Richard Tol a run for his money.

I guess Tol is worse because he has a political agenda, whereas I assume that Wansink just wants to churn out publications.

But wait . . . there’s more!

And there are other inconsistencies even beyond what’s in that PeerJ article. See here:

1. Wansink flat-out admits to selection bias. He writes that “Plan A [of his data analysis] had failed” and then he shares four papers from this study—these must be the results of Plans B, C, D, and E, or something like that—but we never hear about the failed Plan A. It was never published, and in none of his four papers (or in his subsequent blog post) does he ever say what Plan A was. I don’t think I’ve ever heard of such a blatant and acknowledged example of selection bias. I can only assume that Wansink doesn’t realize how serious this is, or else he wouldn’t’ve so blithely mentioned it. Fake it till you make it, indeed.

2. There’s a fundamental inconsistency between two of Wansink’s statements: (a) He describes his experiment as a “failed study which had null results” and (b) he seems to think the data are so great that they’re worth 4 papers. If he can get 4 papers out of a failed study, it kinda makes you wonder what he can get out of a successful study. 8 papers, maybe? 16?

3. See the P.P.P.S. at the above-linked blog post. The Authors’ contributions note to one of the papers states: “OS collected and analyzed data, and helped draft the manuscript.” But in his blog post, Wansink clearly states that the study had been designed and the data collected before Ozge Sigirci (“OS”) arrived. So this is a flat-out contradiction.

Haters gonna hate?

Finally, I would not be surprised if Wansink’s response to all this is to shrug it off: Don’t feed the trolls, Haters gonna hate, Who cares about picky people, etc. (He seems too mellow to go the “replication terrorists” route.) So I’d pre-empt this by saying: Nobody forced Wansink to write and publish a series of false statements. He could’ve written the truth at any point. It is kind of exhausting to sift through published work and find error after error, but I’d say the responsibility for these errors falls on the authors.

60 thoughts on ““Statistical heartburn: An attempt to digest four pizza publications from the Cornell Food and Brand Lab””

  1. Just a note on the t test replications in this post: Andrew seems to be using the formula for Welch’s t test, which does not assume equal variances and typically returns non-integer degrees of freedom. We believe that these articles used Student’s original t test formula, which assumes equal variances.

    We base this belief on the observation that most of the test statistics reported in the articles are from ANOVAs, and the degrees of freedom (which are usually, but not always, reported) are integers in all cases; hence, we assume that Welch’s ANOVA (which, similarly, gives non-integer degrees of freedom) was not used. We would therefore be surprised if Welch’s t test had been used. SPSS typically reports both results, labelled as “Equal variances assumed” and “Equal variances not assumed”.

    Moreover, if Welch’s t test is used to calculate the few reported t statistics in the articles (eight in what we called Article 3, and three in what we called Article 4), in most cases the discrepancy compared to the numbers reported in the articles is greater. In no case that we have checked does the presence or absence of a discrepancy depend on which version of the t test was used.

    For readers of the blog who have not yet read the article, I would also add that we took as much care as we could to eliminate the consequences of rounding error (i.e., the fact that the means and SDs going into the t tests and ANOVAs that we performed had typically been rounded to two decimal places, so that a mean of 68.35 could have corresponded to any value from 68.345 through 68.355). We examined every combination of the most extreme possible values of these means, and used either the minimum or maximum possible standard deviations, to ensure that we covered the full range of possible test statistics that were compatible with these rounded means and SDs. Our code (R and Python) is available and we would welcome critical comments on this method (as well as any other aspects of the preprint).
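
    As a minimal illustration of that bounds check, here is a hedged sketch in R (illustrative only, not the actual R and Python code mentioned above), applied to the first row of the table discussed in the post (means 43.67 vs. 44.55, SDs 18.50 and 14.30, group sizes 43 and 52, reported |t| = 0.25):

    # Sketch only: bound the pooled-variance t statistic when the reported means
    # and SDs have been rounded to 'digits' decimal places.
    t_range <- function(m1, s1, n1, m2, s2, n2, digits = 2) {
      eps <- 0.5 * 10^(-digits)
      t_stat <- function(m1, s1, m2, s2) {
        sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
        (m1 - m2) / (sp * sqrt(1 / n1 + 1 / n2))
      }
      # every combination of the most extreme means and SDs consistent with rounding
      grid <- expand.grid(m1 = m1 + c(-eps, eps), s1 = s1 + c(-eps, eps),
                          m2 = m2 + c(-eps, eps), s2 = s2 + c(-eps, eps))
      range(with(grid, mapply(t_stat, m1, s1, m2, s2)))
    }
    t_range(43.67, 18.50, 43, 44.55, 14.30, 52)  # roughly -0.264 to -0.258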

    • Nick:

      So, like this (for the first line of the table)?

      > sd_pooled <- sqrt((42*18.50^2 + 51*14.30^2)/(42 + 51))
      > se <- sd_pooled*sqrt(1/43 + 1/52)
      > print((43.67 - 44.55)/se)
      [1] -0.2614218
      
      • Yes, that’s it.

        In fact I just checked and with a certain amount of rounding error, it is actually possible for that figure of 0.25 (the t statistic for difference in ages across conditions) to be correct, if they used Welch’s t test. So we should perhaps have been extra-conservative and not highlighted that one number. The second number (for height) comes out far worse than the article if you use Welch’s t test; the third number comes out a little better.

        Basically, it seems to be a crap shoot. We scratched our heads at length, but couldn’t work out any pattern that would explain the various errors. And not all the problems are with the reporting of statistics; there seem to be several contradictions in the story of what happened during the field study.

        • There have been some very interesting recent developments.

          First, it appears that the authors have addressed your question at PubPeer: https://pubpeer.com/publications/92B836EDBA3F705300E46467F6E4F5

          Second, Dr. Wansink recently posted a second Addendum to his blog post, and responded to questions: http://www.brianwansink.com/phd-advice/the-grad-student-who-never-said-no

          I find it interesting that they wouldn’t respond to our emails, that Dr. Wansink would not engage in conversations on Twitter, and that posts on his blog were ignored for a week. Perhaps the Cornell Food and Brand Lab saw that the preprint was downloaded 2800 times and that this wasn’t something they could ignore. Or maybe they were contacted by some of the journals and decided to try to get ahead of the story.

          There’s a lot to digest here (pun intended).

          The data-sharing thing is complete BS: they keep saying they can’t share the data because it contains diners’ names. Umm, do they think we are asking them to photocopy the data set for us? They can just remove any sensitive information.

          In his addendum he states, “We’ve always been pleased to be a group that’s accurate to the 3rd decimal point.” Looking at other publications from his lab suggests otherwise, and I’m getting tempted to take a closer look.

          They say they are redoing all the analyses and submitting erratums to the journals. They are leaving this task to a “Stats Pro”. This person might just have the most difficult job in America. I’m very curious to see how the corrections will resolve all of the impossible sample sizes across papers. And if it somehow does, will this be the largest erratum in history?

          Perhaps most importantly, there still has not been any acknowledgment of the clear HARKing that occurred and its serious implications.

          This is the story that just keeps on giving, kind of like his “bottomless bowls” (hint, hint, I’ve taken a quick look at that paper).

        • Jordan:

          Yes, Wansink has been contacted by a journalist so he must feel that it’s time to get ahead of the story. And I’m glad that he is polite in his responses; he’s not lashing out in the manner of Richard Tol or John Bargh or Susan Fiske. That’s really good.

          My impression remains that Wansink is a naif when it comes to research methods, and that he has not understood, and still does not understand, the issue of forking paths. There’s no shame in that—it just puts him in the position of the editors of Psychological Science and JPSP five years ago, or the editors of PPNAS, NPR, and Ted today.

          The bad news is that, so far, Wansink seems to have turned down the opportunity to learn. A bunch of commenters on his blog have pointed out the obvious problems with his research methods, but he just responds blandly in an in-one-ear-and-out-the-other kind of way.

          Here’s a representative example:

          Anthony St. John writes:

          “With field studies, hypotheses usually don’t “come out” on the first data run. But instead of dropping the study, a person contributes more to science by figuring out when the hypo worked and when it didn’t.” [quoting Wansink]

          I suggest you read this xkcd comic carefully: https://xkcd.com/882/

          It provides a great example of learning from a “deep dive”. [quoting Wansink]

          Brian Wansink replies:

          Hi Anthony,

          I like it. Thanks for the link. (Makes me grateful I’m more of a purple jelly bean guy).

          Best,

          Brian

          Anyone who looks at that famous xkcd jelly-bean cartoon will immediately realize that it’s slamming the “deep dive and look for statistical significance” approach to research. But Wansink follows the link and . . . doesn’t get the point? Doesn’t realize that St. John, like most of the other commenters on the blog, is saying he’s doing everything exactly wrong?

          What gives? Is Wansink that clueless or is he just pretending not to understand? I have no idea.

          I still hope, however, that Wansink or others in his field can use this controversy to move forward in his research methods.

          The first step is to recognize that there’s a big problem: he still has never stated what his original Plan A was, the plan that bombed so badly that he characterized this experiment as a “failed study which had null results.”

          The second step is to recognize that the data from those four papers are basically noise. The problem is not just the 150 (!) errors—although that certainly is a problem. It’s also his attitude! When you publish four papers from a “failed study,” and the statistical methods in those papers are criticized by experts, and when an outside team finds 150 errors in the papers, the appropriate response is not to say you’re gonna go fix some little things and “correct some of these oversights.” No. The appropriate response is to consider that maybe, just maybe, the data in those papers don’t support your claims.

          Let me put it this way. At some point, there must be some threshold where even Brian Wansink might think that a published paper of his might be in error—by which I mean wrong, really wrong, not science, data not providing evidence for the conclusions. What I want to know is, what is this threshold? We already know that it’s not enough to have 15 or 20 comments on Wansink’s own blog slamming him for using bad methods, and that it’s not enough when a careful outside research team finds 150 errors in the papers. So what would it take? 50 negative blog comments? An outside team finding 300 errors? What about 400? Would that be enough? If the outsiders had found 400 errors in Wansink’s papers, then would he think that maybe he’d made some serious errors?

          The whole thing just baffles me. On one hand, Wansink seems so naive, but on the other hand, who could be so clueless as to not suspect a problem when hundreds of errors have been found in these papers? Most scientists I know would get concerned if someone found one error.

          And what can it possibly mean when he writes, “We’ve always been pleased to be a group that’s accurate to the 3rd decimal point”? That makes no sense given the incredible density of errors on those four papers.

          The whole thing is so baffling to me. I just can’t figure out any good explanation for what we’re seeing. How do you get so many errors, followed up by such a serene sense of certainty that the conclusions can’t possibly change?

        • This does remind me of a clinician who had set up a database of patient hospital outcomes; my director invited him to present to our research group and requested a subset of entries from the database for a clinical resident to verify.

          After their presentation arguing how comprehensive and accurate the database was, the resident summarized all the errors he had found in verifying just 30 patients (including one patient who was alive but had been recorded as dead).

          They continued to claim and protest that it was comprehensive and accurate and that we just did not appreciate this. The only sense I could make of this was that his database had a large range of clinical measures, and that this was somehow supposed to make up for the distracting entry errors, which we should just overlook.

          Whatever they were thinking, they were clear that they thought we did not understand something…

        • Keith:

          Yeah, I’m kinda guessing that Wansink feels he already knows all the answers and that he defines a successful experiment as one that confirms his views (or, more generally, one that has any “p less than .05” pattern around which he can wrap a story and get a publication). Since he thinks he already knows the answer, the data in a paper could literally be 100% wrong—every number could be false—but he’d still think the conclusion was correct.

          And, who knows, maybe all (or almost all) his conclusions really are correct. But then why bother doing the experiments at all? I guess it’s just what he’s in the habit of doing.

          I’m still struggling to put myself in the position of someone who’s had 150 errors pointed out and just thinks of this as a mild squall that certainly won’t upset any ships.

        • Yes, his responses have been unusual. When I was alerted to a problem with OncoLnc I acted very differently.

          I must admit, most emails I get about problems are, let’s just say, ill-informed, so my first reaction to someone pointing out a problem is “here we go again, someone who didn’t read the paper.”

          But one time the problem was legit, and when I realized that I leaped into action to see how big the problem was and fixed it as quickly as I could. Basically some patients were listed as Alive when they should have been listed as Dead–the TCGA has really inconsistent files, and it wasn’t a super serious error, but it is something I should have noticed and it was embarrassing.

          You can read how I responded to this error here: https://peerj.com/preprints/1780/#feedback

          Within two days of being notified of the problem I had everything fixed and noted the problem as Feedback on both the published paper and the preprint.

        • Andrew said “I’m still struggling to put myself in the position of someone who’s had 150 errors pointed out and just thinks of this as a mild squall that certainly won’t upset any ships.”

          Wisdom I once found in a Chinese fortune cookie: “In order to put yourself in someone else’s shoes, you first have to take off your own.” (Good advice, but often very difficult to apply.)

        • Just speculation, but could they have been doing some kind of old-school table / chart lookup?

          I often find this sort of thing in engineering, where some older guy uses a table lookup while everyone else uses code; then all the latter answers agree exactly and we scratch our heads wondering where the one different value came from.

        • I had a quick look at the papers and they don’t seem to disclose any info on packages used.
          One paper (LOWER BUFFET PRICES LEAD TO LESS TASTE SATISFACTION) cites FIELD, A. 2005. Discovering Statistics Using SPSS, 2nd Ed., Sage Publications, London, UK, so this might suggest the authors used SPSS (though, sure, perhaps they needed the book to look something up, and then went back to Excel or a notepad).

        • @Rahul: I think it’s unlikely that someone would do ANOVAs with 100 people using a pocket calculator. And to get some of these errors (e.g., a mean score of 1.18 on a one-item, 1-9 scale with a sample size of 10; “Eating Heavily”, Table 2, column 4, line 4) they would have had to be using a rusty slide rule. (A quick check of why that particular mean is impossible is sketched at the end of this comment.)

          @Bo: Good catch. Indeed, SPSS isn’t mentioned, so I was making assumptions. Maybe we were primed by the Field reference. (Maybe there is an R script somewhere that pumps out every one of the statistics that we flagged up across the articles, including a mention of each and every data point exclusion that changed the DFs, and an explanation of why that exclusion was justified. Maybe.)

          The point where they mention Field’s book is in “Lower buffet prices lead to less taste satisfaction”, where they state that there were “moderate to high effect sizes” from the ANOVAs, and give the formula for calculating an effect size in terms of r. As far as I know, SPSS doesn’t report an r-effect for an ANOVA directly, so it would make sense if that was done by hand if they were using SPSS. But probably some other packages (e.g., some of the 18 different ways to do ANOVAs in R, ha ha) don’t report r-effects either.
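
          As a quick illustration of the granularity point above (a sketch of the arithmetic only, nothing taken from the papers or the preprint): with ten responses on an integer 1-9 scale, the sum is an integer between 10 and 90, so the mean can only be a multiple of 0.1, and 1.18 is not an achievable value.

          possible_means <- (10:90) / 10          # all achievable means for n = 10 on a 1-9 integer scale
          any(abs(possible_means - 1.18) < 1e-9)  # FALSE: a reported mean of 1.18 is impossible here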

  2. Is this guy really a Cornell professor? See here. If so, note that in Germany people have lost professorships and funding for this level of misconduct: lying in a paper about who did what, publishing the same null results four times after a lot of data munging to make p come out less than 0.05, etc. The case of Jens Foerster comes to mind. As far as I know, he lost a 5 million Euro grant from the Humboldt Foundation and his job at Amsterdam, and then maybe also at Bochum, as a consequence of producing results that were too good to be true.

    I think Foerster does comedy shows now.

    • Shravan:

      The weird thing is that Wansink seems completely dissociated from his own choices here.

      I get the impression that Diederik Stapel, Jens Foerster, and Bruno Frey knew exactly what they were doing: they were gaming the system. Given Frey’s writing about how the system was too much like a game, he in particular may have felt justified—perhaps he felt that everyone else was doing it, so he should too.

      Moving to someone like Marc Hauser: I expect that he knew exactly what he was doing too. Yes, he was cheating, but I’m guessing he felt that he was cheating in support of a larger truth. In his mind, he knew what was scientifically correct and he didn’t want the messiness of real data to stand in the way or to distract people from the underlying truth.

      But Wansink . . . he’s kind of amazing. He just seems like a pure careerist, having 100% bought into the idea that the purpose of the scientific endeavor is publications, grants, fame, successful students, etc. Based on his blog post and his responses to comments, he seems genuinely interested in making people happy. The quality of the work doesn’t matter at all.

      Check out the comment thread on Wansink’s post: Commenter after commenter slams him for doing bad science, and each time he responds in a completely clueless way. For example, commenter Tina lays out a bunch of statistical criticisms, concluding, “I have always been a big fan of your research and reading this blog post was like a major punch in the gut,” and Wansink replies, “Thank you for your very thoughtful note. You make a really important point,” along with some paragraphs suggesting that he did not process her comment at all. Commenter Sergio writes, “There is something catastrophically wrong in the genesis of the papers you describe,” and Wansink replies, “You were very thoughtful to take that time to share your story and to share those two papers. I’ll make good use of them,” again not seeming to recognize that to “make good use” of this advice would imply retracting those four papers and entirely rethinking his approach to research.

      I think that some of this cluelessness on Wansink’s part just has to be strategic—perhaps he feels that fighting the criticism will just make it worse so he’ll try to bend with the wind until the storm goes away—but there’s something about his tone which makes me feel that, even now, he doesn’t understand the problem with publishing papers based on garbled data, or telling contradictory stories about the same incident in different places. He’s just so open about it all; I’d think if he were fully aware of what he was doing, that he’d be more furtive, more Wegman-like.

      But maybe I’m just missing the whole point. If this guy were a businessman of some sort, for example, umm, I dunno, a vendor of restaurant equipment who’d become a big success but then started becoming sloppy, claiming to deliver products that he didn’t really have, selling used products as new, ripping customers off in various ways, then we probably wouldn’t be trying to “psychologize” him; we’d just say that he learned that he could become a bigger success with less effort by cutting corners. And, for the purpose of evaluating him as a potential supplier, it wouldn’t particularly matter that he was a sweet guy and wanted every one of his customers to be happy: the salient fact would be that he was misrepresenting the condition and quality of his products and services. I expect that research ethics and personal ethics have some positive correlation but it’s not 100%, and it’s naive of me to expect that, just because someone is a scientist and seems to be a nice person, he can’t be cheating. In some fields of business, cheating is so standard that a businessman might feel that it’s standard operating procedure, that he can’t survive and feed his family without misrepresenting things to customers. Maybe that’s how Wansink feels about his field of research. Sure, at this point he’s tenured and could do honest work from here on in, but it’s hard to change once you’re in a groove, and I also get the impression from his post that Wansink wants to help out his students and young colleagues, and he’ll be able to provide that help much more effectively by cheating. At least in the short term.

      • The analogy to business reminds me of the British entrepreneur (Gerald Ratner) who ran a very successful chain of budget jewellery stores. Then he made a speech to the Institute of Directors which was too candid:
        — “How can you sell this for such a low price?”, I say, “because it’s total crap.”
        The business then collapsed.

        One might say that Wansink’s findings, as Ratner said of his products, are “cheaper than an M&S prawn sandwich but probably wouldn’t last as long.”

      • I was astounded by his responses too. The only explanation I have is that maybe a lab manager takes care of the comment sections, and he doesn’t actually write them himself. I walk past his lab at least once a week to get coffee at the library, and Brian has a huge glass showcase with what I can only describe as “scientific merchandise” – memorabilia from famous experiments, a video screen with rolling interviews, papers with comments pasted on them, current newspaper clippings, photos from the latest lab excursion. It’s really quite impressive, but clearly there is a hired person who takes care of all this, so maybe that person also takes care of the blog. On the surface this is an incredibly smooth-running lab with major (!) successes – like inventing the 100-calorie pack, etc.
        That’s why it was so devastating to hear about what is going on behind the scenes.

      • Andrew said…

        >If this guy were a businessman of some sort, for example, umm, I dunno, a vendor of restaurant equipment who’d become a big success but then started becoming sloppy, claiming to deliver products that he didn’t really have, selling used products as new, ripping customers off in various ways, then we probably wouldn’t be trying to “psychologize” him; we’d just say that he learned that he could become a bigger success with less effort by cutting corners.

        But Wansink *is* a businessman of sorts. Here he’s pushing his “Slim by Design” book and personal brand. Here he’s touting his appearance with Rachael Ray. He’s seeming a lot like a huckster with tenure and I’m thinking that the faculty senate of Cornell’s College of Agriculture and Life Sciences might want to consider how Wansink’s research malpractice in the service of personal aggrandizement reflects on their institution.

        • >So, can we say that he is the Gwyneth Paltrow of food fads?

          Except that we generally hold Ivy League professors/laboratory directors to a tad higher standard than we do wacky actors who promote nutty cure-alls.

          It’s a debatable point whether you’re doing more harm to society by selling women a $66 rock with directions for vaginal insertion, or by cheerleading for p-hacking as a route to academic success. I haven’t read Wansink’s $20 book, but a quick skim of the contents suggests that several of his suggestions may be relatively benign (lots of stuff on the psychological effect of smaller portion sizes). However, all of this appears to be based on the hundreds of papers his lab has churned out, and it would take a lot of effort to determine which of those have stronger credibility than the four dubious pizza analyses he convinced his hapless grad student to undertake.

          Actually, maybe describing her as hapless is letting Özge Sığırcı off a little too easily. There’s no reason anyone conducting research or analysis shouldn’t be able to grasp the fatal flaw inherent in slicing and dicing a data set until you find a “significant” correlation and then failing to externally validate the resulting hypothesis. Even in our brave new world of alternative facts, it’s not hard to point out the idiocy of rock-insertion. But how hard would it be to try to reverse policy or commercial decisions based on research such as “Eating Heavily: Men Eat More in the Company of Women”?

          Wansink is the guy behind 100-calorie snack-food packs. So it’s not as if the impact of his work is small. How much confidence do we now have that only these four or five papers were p-hacked out of the 500 he’s produced? At what point along the spectrum from genuine statistical significance to alternative fact has he been setting his threshold for truthiness?

      • Unfortunately this kind of attitude is common among engineers and applied scientists, particularly in lower quality institutions (like Cornell). They are practical and pragmatic by orientation. Having taken to heart the engineer’s disrespectful maxim that those who can’t do, teach, they make their university employment into a surrogate for their failed commercial dreams. They conceptualize the goal as meeting certain quotas, particularly with respect to grants and publications. Because they have failed at commercializing useful products, they don’t realize that a business built on selling an inferior product won’t last, so they are happy publishing mediocre to fraudulent papers, and more so when the impact factor of the journal is high in some ranking they can convince the dean to add to the evaluation metrics employed by the university. The difference with the business world is that once you make full professor, you stay full professor unless you get caught (and even then). But they are hard to catch because they are good at surrounding themselves with even lower quality lackeys who defend them ferociously in exchange for refereeing collaborators’ junk papers favorably. They view grants as an end in themselves, and see little need to produce anything while traveling to pay-your-own-way conferences offering attendees publication in refereed indexed journals, except insofar as they later have to justify the next grant; but that is why they have students to take with them on these trips, to present posters.

  3. I can’t believe the post today
    Oh, I can’t close my eyes
    And make it go away
    How long…
    How long must we endure these errors
    How long, how long…
    ’cause today…we find them endless
    Unremitting…

    Broken reality under scholar’s feet
    Fabrication for promotion strewn across exalted journal pages
    But I won’t heed the superficial scholars’ call
    It puts my back up
    Puts my back up against the wall

    • “Fabrication for promotion strewn across exalted journal pages”–that’s an awful lot of syllables to squeeze into the beat. That works, actually, as onomatopoeia of sorts, as it *sounds* like squeezing data to fit the hypothesis.

      I would have gone for something like “Bad stats strewn across the TED-end street,” but that would have been corny (and inaccurate, since the TED street apparently has no end). I prefer yours.

      • Lyrics are not my forte!

        But I also think “Bad stats” is just a surface feature – it’s bad scientific practice that’s become the academic default.

        For instance, in an area with far less of a role for statistical calculations:

        “Those who propound such absurdities presumably hope – consciously or, more likely, in a convenient fog of self-deception – that this will make them famous, or at least notorious; and the not-so-ambitious who happily climb aboard one fashionable bandwagon or another presumably hope – consciously or, more likely, in a convenient fog of self-deception – that this will provide opportunities to join a clique and, better yet, a publication cartel. And as for taking such pleasure (not to mention profit) in endlessly arguing “around and over and about” the same issue that you’d be really put out if it were actually resolved”

        From S Haack https://www.academia.edu/30425540/Serious_Philosophy_2016_

        • The fun has to end, but here are a couple more from the same source.

          “The academic environment today, with its constant demands for abstracts, proposals, reports of results, and lists of achievements, encourages self-aggrandizement and exaggeration – and, inevitably, the vice Peirce calls “the vanity of cleverness””

          “But clever undergraduates are encouraged into graduate programs; clever graduate students land academic jobs in fancy places; clever professors build impressive résumés, snag “prestigious” grants, publish in the most “prestigious” journals and with the most “prestigious” publishers.”

          “Why the scare quotes? Because in our hunger to make a name for ourselves we forget the etymological connection between “prestige” and “prestidigitation,” i.e., sleight of hand, conjuring.”

  4. Whatever the criticism of Wansink’s work, at least nobody but statisticians is offended. He gains some fame, his students likewise, the public chuckles. Far more egregious stuff with real consequences appears in the medical world. At least Wansink deals with food, a substance we all need. A substance some people believe is miraculous is, of all things, a “jade egg.” On the off chance that others are as ignorant as I am on the subject of jade eggs, go to

    http://goop.com/better-sex-jade-eggs-for-your-yoni/

    For an analysis of jade eggs

    http://www.healthnewsreview.org/2017/01/vaginal-jade-egg-gwyneth-paltrow-goop-news-media-scrutiny/

    where you can find

    “The Goop.com article also did not miss the scrutiny of Dr. Jen Gunter, a sharp-eyed ob/gyn with a big online following of her own, who wrote an open letter to Paltrow last week.”

    “I can tell you it is the biggest load of garbage I have read on your site since vaginal steaming.”

    Should you also be ignorant of vaginal steaming, go to

    https://drjengunter.wordpress.com/2015/01/27/gwyneth-paltrow-says-steam-your-vagina-an-obgyn-says-dont/

  5. A slight change of topic: here is Wansink on CBS This Morning talking about some of his research:

    https://youtu.be/80nQheINpe4?t=35s

    Wansink: “(…) if you sit near a window you’re about 80% more likely to order salad, [if] you sit in a dark corner booth you’re about 80% more likely to order dessert.”

    I know we’re not talking sex-ratio-at-birth effect sizes, but really … 80%?

    He also says that if you sit up straighter “you are more likely to order chicken, seafood, and less likely to order ribs.”

    • Nick:

      This bit in the blurb is interesting: “Instead of focusing on theory, the focus is on asking and answering practical research questions.” This won’t work too well with noisy data. In the absence of theory, effect sizes will be low, and anything statistically significant is likely to be a huge overestimate of any effect and also likely to be in the wrong direction (that’s type M and type S errors; a small simulation sketch follows at the end of this comment).

      In many ways I blame the statistics profession for the mistaken attitudes of people such as Wansink. For decades we’ve been telling people that statistics can reject the null hypothesis in the absence of substantive theory. So it makes sense that these dudes will believe us!
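
      A minimal simulation sketch of that type M / type S point (the true effect and standard error below are assumptions chosen purely for illustration, not numbers from any of these studies):

      set.seed(1)
      true_effect <- 0.2                  # hypothetical small true effect
      se <- 1                             # hypothetical standard error of the estimate
      est <- rnorm(1e5, true_effect, se)  # many hypothetical replications of the study
      sig <- abs(est) > 1.96 * se         # the "statistically significant" ones
      mean(sig)                           # power: only about 5-6%
      mean(abs(est[sig])) / true_effect   # type M: significant estimates exaggerate by roughly 12x
      mean(est[sig] < 0)                  # type S: roughly 28% of them have the wrong sign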

      • As a statistician working in industry, I would agree that we in the statistics community have been telling people that statistics can reject the null hypothesis in the absence of theory. But this begs the question of how a theory is to be developed, in the absence of understanding the basic mechanism of the result.

        • Aaron:

          I’m no expert in food science, but let me just take an example: Wansink’s claim, mentioned by a commenter above, that “if you sit near a window you’re about 80% more likely to order salad.” There has to be some theory underlying this, right? I don’t know what it is, not having read Wansink’s book or watched his video, but whatever it is, I imagine one could take measurements on some of the intermediate steps. You develop theories, make testable predictions, the usual story. I’d think that if this is someone’s full-time job and he’s supposed to be a world authority on the topic, he can do this. If there’s really no theory at all—zero—then my guess is that the whole thing is a waste of time, that he’s just chasing noise and learning nothing at all.

        • I don’t think he has a theory which would extend to windows. His basic theory is that of “mindless choice”: that you eat without thinking, making choices because of proximity and habits. That’s not exactly controversial. The basic recommendations are also sensible: use smaller plates, put stuff out of sight, etc. (As I’ve mentioned, that’s how they developed the small, typically 100-calorie packs for industry, because they are, in fact, in business, as many people associated with universities are.) I suspect the weird idea about windows and salads comes from observations and that he either doesn’t get or doesn’t care about how that’s phrased and about the causation issues.

          I mean, for example, his last book was written out of work done for food clients, meaning consulting on restaurant design, menus, etc., and it included a section about the choices thin people make when they eat at buffets (and thus a recommendation of how to emulate that). But theory? I can’t see one; he says thinner people do this or that for some reason and he tries to count their behaviors, and does the same for heavy people, in a sort of old-fashioned time/motion-study form of consulting from decades ago. None of his work should even be thought of statistically, IMO, unless there’s further work done by his clients: the observations are small and thus noisy and, like any consulting, the recommendations get implemented, and it is at that level that you might be able to judge effects, not at his observational level.

          I can’t imagine he would say there’s a reason why someone who sits at a window eats salad, but he might recommend to a client that if you have windows with seating then you should add some pricier salads. At that level, is it different from designing ambiance? He’s done lots with ambiance – change the lighting, serve the same food, check responses – and work by him and others in that field is seen when you eat out. But a lot of it is A vs B, or A vs B, C or D, testing at sample sizes that you can’t really analyze until you see an effect on sales. I think what I’m reading here is a disconnect between that form of academic work, which is really a series of business consultancies run through a university, and what people may think of as academic work.

        • > how a theory is to be developed, in the absence of understanding the basic mechanism
          Speculatively, with expectations of many false steps until there is some understanding that turns out not to be too wrong.

          (Not running default stat analyses looking for any p < .05)

    • Wow, I see what others mean when they ask what pills he has been popping. This guy should have a comedy show rather than being a Cornell professor. If someone told me this was actually a Saturday Night Live clip I would have believed it.

    • The Cornell Food and Brand lab also posted a short video (1m 25s) on these particular Pizza Buffet studies: https://www.youtube.com/watch?v=9OzunhdW2Qk

      They discuss some findings (or research questions?) at 0:55; “environmental cues, how many times you get up, how many different items you take […] affects how much you eat”. Not sure if that’s discussed in any of the 4 papers, but maybe that’s the original Plan A? (Fancy finding a null result on the correlation between “number of items on the plate” and “amount eaten” – that’d be surprising indeed!).

      • That video appears to be from a different data collection effort from the same lab at the same restaurant. The articles say that the lab collected data for two weeks in the spring, whereas the video says they collected data for (at least) three weeks, and it was apparently filmed in November 2007, unless the TV station sat on it for 7 or 8 months. You can see the video at the TV station’s web site, with the date on it, here http://www.twcnews.com/archives/nys/central-ny/2007/11/17/the-psychology-of-food-NY_37384.old.html

        I know all this because I phoned the restaurant to ask if the lab had collected data on more than one occasion. I was told “Yes, they did two studies, a couple of years apart.” I have no idea what happened to the data from the first study (the one shown in the video).

    • Wow, and thanks for saving me the time so I could skip to that critical section. While there are many things to cringe about with his research philosophy, what bothers me the most is not that he proceeds from a hunch without any (overt) theory. Such inductive research could be useful. But there are literally millions of potential “hunches” I can have about how environmental, physical, social, etc. factors might influence eating behavior. It is a sure bet that if you pick one hunch and run one of his studies with a relatively small number of participants, you will find some result that gives you p<.05. And, when the study does not replicate, you can do more published studies showing that some of those other millions of factors provide a mediating influence which, once accounted for, gives you another p<.05. (A quick simulation of this arithmetic appears at the end of this comment.)

      Thus are successful academic careers built. Makes me proud to have only published a couple of dozen.
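
      A rough simulation of that “sure bet” (the sample size and the number of analyses below are assumptions for illustration only): with no real effects anywhere, running a modest number of plausible-looking comparisons on a small study turns up at least one p < .05 most of the time.

      set.seed(1)
      one_study <- function(n = 30, n_tests = 20) {
        # pure noise: two groups of n, and n_tests unrelated outcomes/subgroups
        pvals <- replicate(n_tests, t.test(rnorm(n), rnorm(n))$p.value)
        any(pvals < 0.05)
      }
      mean(replicate(2000, one_study()))  # about 0.64, i.e. roughly 1 - 0.95^20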

  6. Has anyone studied the history of the academic as media star? I wonder if the rise of the Gwyneth-Paltrow academic (or should I say the jade-eggs academic) is associated with the perceived need to be a media celebrity.

  7. Wansink apparently won the Ig Nobel in 2007. In an interview he said, “I love research that does 3 things: 1) Answers an important question, 2) Does so in a vivid, memorable way, and 3) Is published in a highly prestigious journal.” The 2nd and 3rd do, indeed, seem almost like a parody of the sort of research we often find criticized here: i.e., something with a catchy press release published in Psychological Science or PPNAS.

  8. I don’t recall that Wansink asked his students’ permission before he bragged on his blog about how well they had learned a willingness and ability to spin useless data into junk science. Now thousands of copies of the Van der Zee critique have been downloaded and everyone knows that “Hey, these students make up their statistical results!”
    Great career move.

    • >”how well they had learned a willingness and ability to spin useless data into junk science…Great career move.”

      “No one” cares, or understands well enough to care. There are a lot of people doing junk research, far outnumbering those who are not (99 to 1?, 999 to 1?). That is literally being taught as the scientific method in grad schools all over the world, right now.

      • Anon:

        Don’t overstate it. Yes, just about everything gets published, but lots of these papers got rejected too. There are lots of people who hate junk science. But the practice is to keep submitting a paper until it finally appears somewhere, and then, from the outside, none of that criticism is visible. This is yet another reason I prefer a post-publication review process.

        • I don’t think I am overstating it, actually I was toning down what I really think.

          Based on checking out posters at a big conference a few years ago, I may be understating the ratio by a factor of 10 (~1/10k posters seemed to go beyond NHST, IIRC). When I was working in an NHST-afflicted community, I heard “is it significant?” so many times it led to nightmares. That is really all anyone cared about at the end of the day. The closest analogy I can think of is the TPS reports from here: https://www.youtube.com/watch?v=TXk5VOmIvGY (Office Space, so maybe NSFW)

          That was a few years ago, so maybe things have improved. However, even though I try to avoid it, I will hear from clients that people (from prestigious institutions) tell them that to publish papers you need to keep reanalyzing/transforming/filtering/etc the data until you find statistical significance. It is being taught as the way science works “in practice”.

  9. Well, that is exactly the point. If Cornell does not live up to Ivy League standards, who is going to stop this silliness? A public correction by the researcher is what is needed.
    If they do not enforce it, they do not live up to their “Ivy League standards.”
