Dear Cornell University Public Relations Office

I received the following email, which was not addressed to me personally:

From: ** <**@cornell.edu>
Date: Wednesday, April 5, 2017 at 9:42 AM
To: “[email protected]
Cc: ** <**@cornell.edu>
Subject: Information regarding research by Professor Brian Wansink

I know you have been following this issue, and I thought you might be interested in new information posted today on Cornell University’s Media Relations Office website and the Food and Brand Lab website.

**
**@cornell.edu
@**
office: 607-***-****
mobile: 607-***-****

The message included two links: Media Relations Office website and Food and Brand Lab website.

You can click through. Wansink’s statement is hardly worth reading; it’s the usual mixture of cluelessness, pseudo-contrition, and bluster, referring to “the great work of the Food and Brand Lab” and minimizing the long list of problems with his work. He writes, “all of this data analysis was independently reviewed and verified under contract by the outside firm Mathematica Policy Research, and none of the findings altered the core conclusions of any of the studies in question,” but I clicked through, and the Mathematica Policy Research person specifically writes, “The researcher did not review the quality of the data, hypotheses, research methods, interpretations, or conclusions.” So if it’s really true that “none of the findings altered the core conclusions of any of the studies in question,” then it’s on Wansink to demonstrate this. Which he did not.

And, as someone pointed out to me, in any case this doesn’t address the core criticism of Wansink having openly employed a hypotheses-after-results-are-known methodology which leaves those statistics meaningless, even after correcting all the naive errors.

Wansink also writes:

I, of course, take accuracy and replication of our research results very seriously.

That statement is ludicrous. Wansink has published hundreds of errors in many many papers. Many of his reported numbers make no sense at all. To say that Wansink takes accuracy very seriously is like saying that, umm, Dave Kingman took fielding very seriously. Get real, dude.

And he writes of “possible duplicate use of data or republication of portions of text from my earlier works.” It’s not just “possible,” it actually happened. This is like the coach of the 76ers talking about possibly losing a few games during the 2015-2016 season.

OK, now for the official statement from Cornell Media Relations. Their big problem is that they minimize the problem. Here’s what they write:

Shortly after we were made aware of questions being raised about research conducted by Professor Brian Wansink by fellow researchers at other institutions, Cornell conducted an internal review to determine the extent to which a formal investigation of research integrity was appropriate. That review indicated that, while numerous instances of inappropriate data handling and statistical analysis in four published papers were alleged, such errors did not constitute scientific misconduct (https://grants.nih.gov/grants/research_integrity/research_misconduct.htm). However, given the number of errors cited and their repeated nature, we established a process in which Professor Wansink would engage external statistical experts to validate his review and reanalysis of the papers and attendant published errata. . . . Since the original critique of Professor Wansink’s articles, additional instances of self-duplication have come to light. Professor Wansink has acknowledged the repeated use of identical language and in some cases dual publication of materials. Cornell will evaluate these cases to determine whether or not additional actions are warranted.

Here’s the problem. It’s not just those 4 papers, and it’s not just those 4 papers plus the repeated use of identical language and in some cases dual publication of materials.

There’s more. A lot more. And it looks to me like serious research misconduct: either outright fraud by people in the lab, or such monumental sloppiness that data are entirely disconnected from context, with zero attempts to fix things when problems have been pointed out.

If Wansink did all this on his own and never published anything and never got any government grants, I guess I wouldn’t call it research misconduct; I’d just call it a monumental waste of time. But to repeatedly publish papers where the numbers don’t add up, where the data are not as described: sure, that seems to me like research misconduct.

From the NIH link above:

Research misconduct is defined as fabrication, falsification and plagiarism, and does not include honest error or differences of opinion. . .

Fabrication: Making up data or results and recording or reporting them.

Falsification: Manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.

Plagiarism: The appropriation of another person’s ideas, processes, results, or words without giving appropriate credit.

I have no idea about fabrication or plagiarism, but Wansink’s work definitely seems to have a bit of falsification as defined above.

Let’s talk falsification.

As I discussed in my very first post on Wansink, from 15 Dec 2016, he seems to have published four papers from the same experiment, not citing each other, with different sample sizes and incoherent data-exclusion rules. This falls into the category, “omitting data or results such that the research is not accurately represented in the research record.”

Next, Tim van der Zee​, Jordan Anaya​, and Nicholas Brown found over 150 errors in just four papers. If this is not falsification, it’s massive incompetence.

How incompetent do you have to be to make over 150 errors in four published papers?

And, then, remember this quote from Wansink:

Also, we realized we asked people how much pizza they ate in two different ways – once, by asking them to provide an integer of how many pieces they ate, like 0, 1, 2, 3 and so on. Another time we asked them to put an “X” on a scale that just had a “0” and “12” at either end, with no integer mark in between.

As I wrote at the time, “how do you say ‘we realized we asked . . .’? What’s to realize? If you asked the question that way, wouldn’t you already know this?” To publish results under your name without realizing what’s gone into it . . . that’s research misconduct. OK, if it just happens once or twice, sure, we’ve all been sloppy. But with Wansink it happens over and over again.

And this is not new. Here’s a story from 2011. An outside researcher found problems with one of Wansink’s published papers. Someone from Wansink’s lab fielded the comments, responded politely . . . and did nothing.

And then there’s the carrots story from Jordan Anaya, who shares this table from a Wansink paper from 2012:

As Anaya points out, these numbers don’t add up. None of them add up! The numbers violate the Law of Conservation of Carrots. I guess they don’t teach physics so well up there at Cornell . . .

Anaya reports that, as part of a ridiculous attempt to defend his substandard research practices, Wansink said the values were, “based on the well-cited quarter plate data collection method referenced in that paper.” However, Anaya continues, “The quarter plate method Wansink refers to was published by his group in 2013. Although half of the references in the 2012 paper were self-citations, the quarter plate method was not referenced as he claims.” In addition, according to that 2012 paper from Wansink, “the weight of any remaining carrots was subtracted from their starting weight to determine the actual amount eaten.”

What’s going on? Assuming it’s not out-and-out fabrication, it’s “Manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.” In short: what it says in the published paper is not what happened, nor do Wansink’s later replies add up.
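
For what it’s worth, checking this requires no statistics at all: under the procedure the 2012 paper describes (uneaten weight subtracted from starting weight), the three reported quantities must satisfy eaten = taken - uneaten, up to rounding. Here is a minimal sketch in Python, using made-up illustrative numbers rather than the paper’s actual values:

```python
# Consistency check for a "taken / eaten / uneaten" table.
# The rows below are hypothetical illustrative values, NOT the numbers from
# the 2012 carrots paper; substitute the published figures to rerun the check.
rows = [
    {"condition": "control",         "taken": 15.0, "eaten": 9.0, "uneaten": 4.0},
    {"condition": "attractive name", "taken": 12.0, "eaten": 7.5, "uneaten": 4.5},
]

for r in rows:
    implied_eaten = r["taken"] - r["uneaten"]
    consistent = abs(implied_eaten - r["eaten"]) < 0.05  # tolerance for rounding
    print(f"{r['condition']}: reported eaten = {r['eaten']}, "
          f"implied eaten = {implied_eaten:.2f}, consistent = {consistent}")
```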

But wait, there’s more. Here’s a post from 21 Mar 2017 from Tim van der Zee, who summarizes:

Starting with our pre-print describing over 150 errors in 4 papers, there is now an ever-increasing list of research articles (co-)authored by BW which have been criticized for containing serious errors, reporting inconsistencies, impossibilities, plagiarism, and data duplications.

To the best of my [van der Zee’s] knowledge, there are currently:

37 publications from Wansink which are alleged to contain minor to very serious issues,
which have been cited over 3300 times,
are published in over 20 different journals, and in 8 books,
spanning over 19 years of research.

Here’s just one example:

Wansink, B., & Seed, S. (2001). Making brand loyalty programs succeed. Brand Management, 8, 211–222.

Citations: 34

Serious issue of self-plagiarism which goes beyond the Method section . . . Furthermore, both papers mention substantially different sample sizes (153 vs. 643) but both have a table with results which are basically entirely identical.

That’s a big deal. OK, sure, self-plagiarism, no need for us to be Freybabies about that. But, what about that other thing?

Furthermore, both papers mention substantially different sample sizes (153 vs. 643) but both have a table with results which are basically entirely identical.

Whoa! That sounds to me like . . . “changing or omitting data or results such that the research is not accurately represented in the research record.”

And here’s another, “Relation of soy consumption to nutritional knowledge,” which Nick Brown discusses in detail. This is one of three papers reporting three different surveys, each with exactly 770 respondents, but one based on a survey sent to 1002 people, one sent to 1600 people, and one sent to 2000 people. All things are possible but it’s hard for me to believe there are three different surveys. It seems more like “changing or omitting data or results such that the research is not accurately represented in the research record.” As van der Zee puts it, “Both studies use (mostly) the same questionnaire, allowing the results to be compared. Surprisingly, while the two papers supposedly reflect two entirely different studies in two different samples, there is a near perfect correlation of 0.97 between the mean values in the two studies.”
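
Van der Zee’s 0.97 figure is the kind of thing any reader can recompute: type in the two columns of published means and correlate them. A minimal sketch (the vectors below are placeholders, not the actual table values):

```python
import numpy as np

# Placeholder item-level means from the two papers; replace these with the
# published table values to reproduce the check.
means_paper_1 = np.array([3.2, 4.1, 2.8, 5.0, 3.9, 4.4])
means_paper_2 = np.array([3.3, 4.0, 2.9, 5.1, 3.8, 4.5])

r = np.corrcoef(means_paper_1, means_paper_2)[0, 1]
print(f"Correlation between the two sets of reported means: r = {r:.2f}")
# Some correlation is expected even across independent samples answering the
# same questionnaire; the flag here is a near-perfect 0.97 between papers that
# are presented as two entirely different surveys.
```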

Brown then gets granular:

A further lack of randomness can be observed in the last digits of the means and F statistics in the three published tables of results . . . Here is a plot of the number of times each decimal digit appears in the last position in these tables:

These don’t look so much like real data, but they do seem consistent with someone making up numbers and not wanting them to seem too round, and not being careful to include enough 0’s and 5’s in the last digits.

Again, all things are possible. So let me just say that, given all the other misrepresentations of data and data collection in Wansink’s papers, I have no good reason to think the numbers from those tables are real.
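
Brown’s last-digit check is also easy to reproduce. For most measured quantities reported to a few decimal places, the final digit should be roughly uniform over 0-9; a shortage of 0s and 5s is the signature he describes. A minimal sketch of the tally plus a chi-squared test against uniformity (the digit list is invented for illustration, not harvested from the actual tables):

```python
from collections import Counter
from scipy.stats import chisquare

# Final reported digits from a table of means and F statistics.
# Hypothetical digits for illustration; replace with the published values.
last_digits = [3, 7, 1, 9, 4, 8, 2, 6, 7, 3, 1, 9, 8, 4, 6, 2, 7, 3, 9, 1,
               4, 8, 6, 2, 7, 9, 3, 1, 8, 4]

counts = Counter(last_digits)
observed = [counts.get(d, 0) for d in range(10)]
stat, p = chisquare(observed)  # null hypothesis: digits uniform over 0-9
print("digit counts (0-9):", observed)
print(f"chi-squared = {stat:.2f}, p = {p:.3f}")
# With the real tables you would use every last digit; a glaring deficit of
# 0s and 5s is what the plot in Brown's post shows.
```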

Oh, wait, here’s another from van der Zee:

Wansink, B., Cheney, M. M., & Chan, N. (2003). Exploring comfort food preferences across age and gender. Physiology & Behavior, 79(4), 739-747.

Citations: 334

Using the provided summary statistics such as mean, test statistics, and additional given constraints it was calculated that the data set underlying this study is highly suspicious. For example, given the information which is provided in the article the response data for a Likert scale question should look like this:

Furthermore, although this is the most extreme possible version given the constraints described in the article, it is still not consistent with the provided information.

In addition, there are more issues with impossible or highly implausible data.

Looks to me like “changing or omitting data or results such that the research is not accurately represented in the research record.”
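
One way to make “not consistent with the provided information” concrete: when responses are integers on a Likert scale, any reported mean must equal a whole-number sum divided by the sample size. That is the idea behind the GRIM test of Brown and Heathers, which is in the same spirit as the reconstruction van der Zee describes (I am not claiming it is his exact method). A minimal sketch, with a hypothetical reported mean and N:

```python
import math

def grim_consistent(reported_mean: float, n: int, decimals: int = 2) -> bool:
    """GRIM-style check: can a mean reported to `decimals` places arise as
    (sum of n integer responses) / n for some integer sum?"""
    target = round(reported_mean, decimals)
    for total in (math.floor(reported_mean * n), math.ceil(reported_mean * n)):
        if round(total / n, decimals) == target:
            return True
    return False

# Hypothetical example, not from the paper: a mean of 3.48 reported for n = 45.
print(grim_consistent(3.48, 45))  # False: no 45 integer responses average to 3.48
print(grim_consistent(3.47, 45))  # True: 156/45 = 3.4667, which rounds to 3.47
```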

Or this:

Wansink, B., Van Ittersum, K., & Painter, J. E. (2006). Ice cream illusions: bowls, spoons, and self-served portion sizes. American journal of preventive medicine, 31(3), 240-243.

Citations: 262

A large amount of inconsistent/impossible means and standard deviations, as well as inconsistent ANOVA results, as can be seen in the picture.

If this is not “changing or omitting data or results such that the research is not accurately represented in the research record,” then what is it? Perhaps “telling your subordinates that they are expected to get publishable results, then never looking at the data or the analysis?” Or maybe something else? If the numbers don’t add up, this means that some data or results have been changed so that the research is not accurately represented in the research record, no?
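
“Inconsistent ANOVA results” is itself a checkable claim: given the group sizes, means, and standard deviations printed in a table, the one-way F statistic is fully determined, so anyone can recompute it and compare it with the reported value. A minimal sketch with hypothetical summary numbers (not the ice-cream paper’s):

```python
import numpy as np

# Hypothetical per-group summaries (n, mean, sd); replace with published values.
groups = [(28, 4.38, 2.05), (29, 5.07, 2.25), (28, 5.81, 2.49)]

ns = np.array([g[0] for g in groups], dtype=float)
means = np.array([g[1] for g in groups])
sds = np.array([g[2] for g in groups])

grand_mean = np.sum(ns * means) / np.sum(ns)
ss_between = np.sum(ns * (means - grand_mean) ** 2)
ss_within = np.sum((ns - 1) * sds ** 2)
df_between = len(groups) - 1
df_within = np.sum(ns) - len(groups)

F = (ss_between / df_between) / (ss_within / df_within)
print(f"Recomputed F({df_between}, {df_within:.0f}) = {F:.2f}")
# If this disagrees with the published F by more than rounding of the reported
# means and SDs can explain, the table cannot be internally consistent.
```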

What is this plausible deniability crap? Do we now need a RICO for research labs??

OK, just one more from van der Zee’s list and then I’ll quit:

Wansink, B., Just, D. R., & Payne, C. R. (2012). Can branding improve school lunches?. Archives of pediatrics & adolescent medicine, 166(10), 967-968.

Citations: 38

There are various things wrong with this article. The authors repeatedly interpret non significant p values as being significant, as well as miscalculating p values. Their choice of statistical analysis is very questionable. The data are visualized in a very questionable manner which is easily misinterpreted (such that the effects are overestimated). In addition, the visualization is radically different from an earlier version of the same paper, which gave a much more modest impression of the effects. Furthermore, the authors seem to be confused about the participants, as they are school students aged 8-11 but are also called “preliterate children”; in later publication Wansink mentions these are “daycare kids”, and further exaggerates and misreports the size of the effects.

This does not sound to me like mere “honest error or differences of opinion.” From my perspective it sounds more like “changing or omitting data or results such that the research is not accurately represented in the research record.”

The only possible defense I can see here is that Wansink didn’t actually collect the data, analyze the data, or write the report. He’s just a figurehead. But, again, RICO. If you’re in charge of a lab which repeatedly, over and over and over and over again, changes or omits data or results such that the research is not accurately represented in the research record, then, yes, from my perspective I think it’s fair to say that you have been changing or omitting data or results such that the research is not accurately represented in the research record, and you have been doing scientific misconduct.

I said I’d stop but I have to share this additional example. Again, here’s van der Zee’s summary:

Sığırcı, Ö., Rockmore, M., & Wansink, B. (2016). How traumatic violence permanently changes shopping behavior. Frontiers in Psychology, 7.

Citations: 0

This study is about World War II veterans. Given the mean age stated in the article, the distribution of age can only look very similar to this:

The article claims that the majority of the respondents were 18 to 18.5 years old at the end of WW2 whilst also having experienced repeated heavy combat. Almost no soldiers could have had any other age than 18.

In addition, the article claims over 20% of the war veterans were women, while women only officially obtained the right to serve in combat very recently.

What does that sound like? Oh yeah, “changing or omitting data or results such that the research is not accurately represented in the research record.”
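
The age claim is just a pigeonhole argument: if the youngest possible respondent is age a and the reported mean is barely above a, then only a tiny fraction of the sample can be much older than a. A minimal sketch of the bound (the numbers are placeholders, not the article’s):

```python
def max_fraction_at_least(mean_age: float, min_age: float, threshold: float) -> float:
    """Upper bound on the fraction of respondents aged >= threshold, given the
    reported mean age and the minimum possible age (Markov's inequality applied
    to the nonnegative quantity age - min_age)."""
    if threshold <= min_age:
        return 1.0
    return min(1.0, (mean_age - min_age) / (threshold - min_age))

# Placeholder numbers, not the article's: suppose the youngest possible combat
# veteran was 18 at the end of WW2 and the reported figures imply a mean of 18.3.
print(max_fraction_at_least(mean_age=18.3, min_age=18.0, threshold=20.0))
# -> at most 0.15 of the sample could have been 20 or older at the end of the
#    war; nearly everyone is pinned at the minimum, which is the shape of the
#    distribution van der Zee reconstructs.
```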

So . . . I decided to reply.

Dear Cornell University Media Relations Office:

Thank you for pointing me to these two statements. Unfortunately I fear that you are minimizing the problem.

You write, “while numerous instances of inappropriate data handling and statistical analysis in four published papers were alleged, such errors did not constitute scientific misconduct (https://grants.nih.gov/grants/research_integrity/research_misconduct.htm). However, given the number of errors cited and their repeated nature, we established a process in which Professor Wansink would engage external statistical experts to validate his review and reanalysis of the papers and attendant published errata. . . . Since the original critique of Professor Wansink’s articles, additional instances of self-duplication have come to light. Professor Wansink has acknowledged the repeated use of identical language and in some cases dual publication of materials.”

But there are many, many more problems in Wansink’s published work, beyond those 4 initially-noticed papers and beyond self-duplication.

Your NIH link above defines research misconduct as “fabrication, falsification and plagiarism, and does not include honest error or differences of opinion. . .” and defines falsification as “Manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.”

This phrase, “changing or omitting data or results such that the research is not accurately represented in the research record,” is an apt description of much of Wansink’s work, going far beyond those four particular papers that got the ball rolling, and far beyond duplication of materials. For a thorough review, see this recent post by Tim van der Zee, who points to 37 papers by Wansink, many of which have serious data problems: http://www.timvanderzee.com/the-wansink-dossier-an-overview/

And all this doesn’t even get to criticism of Wansink having openly employed a hypotheses-after-results-are-known methodology which leaves his statistics meaningless, even setting aside data errors.

There’s also Wansink’s statement which refers to “the great work of the Food and Brand Lab,” which is an odd phrase to use to describe a group that has published papers with hundreds of errors and massive data inconsistencies that represent, at worst, fraud, and, at best, some of the sloppiest empirical work—published or unpublished—that I have ever seen. In either case, I consider this pattern of errors to represent research misconduct.

I understand that it’s natural to think that nothing can ever be proven, Rashomon and all that. But in this case the evidence for research misconduct is all out in the open, in dozens of published papers.

I have no personal stake in this matter and I have no plans to file any sort of formal complaint. But as a scientist, this bothers me: Wansink’s misconduct, his continuing attempts to minimize it, and the fact that it is happening at a major university.

Yours,
Andrew Gelman

P.S. I have no interest in Wansink being prosecuted for any misdeeds, I don’t think there’s any realistic chance he’ll be asked to return his government grants, nor am I trying to get Cornell to fire him or sanction him or whatever. Any such effort would take huge amounts of time that I’m sure could be better spent elsewhere. And who am I to throw the first stone? I’ve messed up in data analysis myself.

But it irritates me to see Wansink continue to misrepresent what is happening, and it irritates me to see Cornell University minimize the story. If they don’t want to do anything about Wansink, fine; I completely understand. But the evidence is what it is; don’t understate it.

Remember the Ed Wegman story? Weggy repeatedly published articles with his name on them that copied material written by others without attribution; people notified his employer, which buried the story. George Mason University didn’t seem to want to know about it.

The Wansink case is a bit more complicated, in that there do seem to be people at Cornell who care, but there also seems to be a desire to minimize the problem and make it go away. Don’t do that. To minimize scientific misconduct is an insult to all of us who work so hard to present our data accurately.

105 thoughts on “Dear Cornell University Public Relations Office”

  1. I still think you are giving too much credit to the general state of science education. I’ve had professors look at me sideways when I mentioned “we should be trying to falsify our hypothesis” because they thought I meant “make things up”.

    It wouldn’t surprise me if the people writing these papers needed to be told what an error bar looks like (“like a capital I that is squished or stretched out”), etc. There is no way someone with such poor training will understand the importance of correctly reporting sample sizes or methods; they will consider it a “typo” since it “does not affect the conclusions”.

  2. Hmm, regarding the mystical numbers of carrots which don’t seem to add up… I think you are being unfair by not considering all possibilities, I think the numbers are entirely plausible: for example 17 taken but 11 eaten and 7 uneaten is simply explained by the child regurgitating one carrot back and then eating it again.

      • I think “a carrot pancake” is just another of Kitty’s pseudonyms — perhaps Kitty is miffed because the image at the top of this post is a pizza (not a cat) and it looks kinda like a carrot pancake?

        • Hah. I think I imagined the scenario the very first time I read about that carrot study, when was it, a month or even more ago? After that every time the subject’s been brought up on this blog or elsewhere, the image of some kids re-eating carrots has been awakened in my brain and I’ve been snickering at it by myself. Finally I had to let it out, to roam free in the wild!

      • One of the things that finally convinced me that this whole affair involves more than just sloppiness was when Dr. Wansink, in his second interview with Retraction Watch, not only invoked the quarter plate method (which, to be fair, could have been “in development” when the carrot research was done; but the article doesn’t mention it, and the draft of the “carrots” article that we found online explicitly states that the uneaten carrots were *weighed*), but also claimed that the discrepancy between “taken”, “eaten”, and “uneaten” was due to some of the carrots being thrown in a food fight or dropped.

        This sounds like the kind of amusing anecdote you might say to an interviewer, who you hope will chuckle and move on, but it falls apart under any sort of scrutiny, because /a/ the article explicitly states that they calculated Eaten = Taken – Uneaten, and /b/ unless they had some kind of carrot-o-meter attached to each kid’s oesophagus, they *could not possibly* have had any measure of “eaten” that did *not* include carrots that were dropped, or thrown, or beamed up by aliens. Or it might be something that someone with no knowledge of the experiment might say if you stopped them in the street, gave them a 30-second explanation of the procedure, and asked them why the numbers didn’t add up (“I dunno, maybe some of the carrots got lost or something”). But it seems absolutely inconceivable that the principal investigator and lead author would construct an explanation like this, given that the reality of the study was so simple.

        A similar flashing light and loud siren went off for me when I read the comment about “preliterate children”. I cannot imagine under what circumstances one would write up a study that was conducted in seven elementary schools with children aged 8-11 and then describe those 208 children as “preliterate”. I simply don’t see how that word could accidentally emerge from the author’s thought processes. The choice of Elmo as a character to appeal to this age group also appears less than obvious. It is tempting to imagine that the discussion section of this (brief) article was written for a completely different (and, apparently, as yet unpublished) study and then “repurposed”. (And that is about the most charitable interpretation I can come up with.)

        • Nick:

          Of all the evidence so far, the most striking to me are: (1) the survey where someone’s supposed to be 105 years old and everyone else is exactly 18; and (2) that set of numbers that so rarely end in 0 or 5. Of course, these could’ve been created by someone wanting to please Wansink, not Wansink himself. Indeed, coming up with data summaries seems below Wansink’s pay grade, as it were.

          I agree with you that the carrot story and the elementary schools story are horrible. The most charitable interpretation I can come up with is that Wansink has no idea what is going on in these studies, that he’s not closely involved in design of the studies, nor in data collection, nor in data analysis, nor in writing the papers. (He’s already on record saying he’s not the one who submits the papers to the journals.) So he starts bullshitting about food fights and the quarter-plate method because he himself has no idea what’s going on. And the claim that Elmo appeals to 8-year-olds is coming from some third party who was writing up the paper by putting together paragraphs of previously-written material (which, as we know, is one of Wansink’s habits).

          So, at best, nobody knows what’s going on, and Wansink is still behaving unprofessionally (in my opinion) by spouting off meaningless pseudo-explanations.

          After all, if he really didn’t collect the data, or he really didn’t remember how the data were collected, and if he really didn’t write the paper, or he really didn’t remember what the paper said, how can he be so sure that what he or his colleagues did was correct? Especially given the zillions of errors that have already been revealed in his work.

        • > He’s already on record saying he’s not the one who submits the papers to the journals

          Are you thinking of the Pubpeer comment, where Wansink explained the false description of authors’ contributions as being the fault of the secretary (or whoever) who finished and submitted the paper? Who was following a standard template which in this case was not correct?

        • It was a pity that the person who was trusted to finish and submit papers could not be trusted to reply to PubPeer critiques.

        • Smut Clyde

          The four pizza papers were published in 2014 and 2015. It is possible that the person who now handles editing journal submissions (Lindsey Brill) was not the person who handled these four papers.

        • “After all, if he really didn’t collect the data, or he really didn’t remember how the data were collected, and if he really didn’t write the paper, or he really didn’t remember what the paper said, how can he be so sure that what he or his colleagues did was correct? Especially given the zillions of errors that have already been revealed in his work.”

          I thought this video was interesting:

          https://www.youtube.com/watch?v=NKCS8c8OJpY

          Seems like a research/publication machine over there at Cornell, possibly with lots of early career researchers doing all the work.

  3. Andrew,

    > And, as someone pointed out to me, in any case this doesn’t address the core criticism of Wansink having openly employed a hypotheses-after-results-are-known methodology which leaves those statistics meaningless, even after correcting all the naive errors.

    I often don’t have hypotheses before results. Are you saying my statistics are meaningless? Can you clarify your meaning?

    • Eric:

      Yes, I was not clear. Hypothesizing after results are known is just fine; we do it all the time. The problem is that when you hypothesize after results are known, your p-values are invalidated. And Wansink’s claims were all based on p-values.
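
      To see the mechanics, here is a minimal simulation sketch (my own illustration, nothing to do with Wansink’s actual data): generate pure noise for many outcomes, pick whichever comparison happens to look biggest, and test only that one. The nominal 5% error rate is nowhere to be found.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, n_outcomes, alpha = 2000, 30, 20, 0.05
hits = 0

for _ in range(n_sims):
    # Two groups, many outcome variables, no true effect anywhere.
    a = rng.normal(size=(n_per_group, n_outcomes))
    b = rng.normal(size=(n_per_group, n_outcomes))
    # "Hypothesize after seeing the results": test only the outcome
    # with the largest observed group difference.
    k = int(np.argmax(np.abs(a.mean(axis=0) - b.mean(axis=0))))
    _, p = stats.ttest_ind(a[:, k], b[:, k])
    hits += p < alpha

print(f"Nominal alpha: {alpha}")
print(f"Share of simulations declared significant under the null: {hits / n_sims:.2f}")
```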

      • I’m not sure if “p-values are invalidated” is the right way of putting it. The null hypothesis assumes the data was collected a certain way, and if it was not, then the p-value should correctly detect this deviation from the hypothesis being tested.

        It is like giving group A a drug and group B placebo, then testing the hypothesis you gave group A a drug. You already know group A got the drug, so such behaviour can serve no scientific purpose. BTW I have actually seen almost exactly that done before as well. A drug was injected, then they detected the drug in some blood tests, and finally the news reported a cure for MS is around the corner. (Of course studying the pharmacokinetics is of interest, but not the mere detection of the drug at some point)

        • The problem ultimately is that the p value doesn’t map to a thing of interest.

          I go and look at my data and see that more carrots are eaten, so I hypothesize that “giving carrots a cool sounding name causes people to eat more carrots”. I then formulate some meaningless null “eating carrots is a random number generation process in which the same random number generator will be at work regardless of what you name the carrots” (this is the actual mathematical meaning of the more common “mean number of carrots is the same in both conditions”)

          Then I test to see if the RNGs were the same (note, I don’t concern myself with the fact that eating carrots isn’t an RNG process). I detect that they weren’t, a fact I knew I would detect because I chose which hypothesis to test after I already knew I would see a difference.

          I take this rejection of the null as evidence that my substantive hypothesis *is true* “giving carrots a cool sounding name causes people to eat more carrots” even though this has the same false logical structure as IF (there are no unicorns) THEN (Eric owes me a million dollars)… (There are no unicorns)… Therefore SEND THE DAMN CHECK.

          Now, we KNOW that observing carrots with different names *is NOT an RNG process…* and therefore, we should expect the hypothesis test to reject it with a small p value, the fact that it does so is just exactly what it was supposed to do mathematically. So, we’ve got the logical structure

          IF (thing we know is false is detected to be false by a test that detects this kind of falseness) THEN (SHOW ME THE MONEY)

          In Anonuoid’s injection example, you KNOW you injected the drug, so when you do the test, your null hypothesis is “My blood testing machinery always puts out the same random reading regardless of what I inject people with”, and when you reject it as you should… you discover what exactly? That your blood testing machine works better than an RNG.

          > Now, we KNOW that observing carrots with different names *is NOT an RNG process…* and therefore, we should expect the hypothesis test to reject it with a small p value, the fact that it does so is just exactly what it was supposed to do mathematically.

          This is going on too, but in the case where people do early stopping, multiple testing, etc and don’t include that in their model, the procedure is even more obviously wrong*. The entire paradigm is a mess splattered onto existing mess.

          *As a positive example, the LIGO team’s attempt to deal with it is pretty good (https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.061102). However even in that case it seemed that “5 sigma rejection of null model” got incorrectly attached to “inspiralling black hole binary”.

        • I don’t understand your observation: “However even in that case it seemed that “5 sigma rejection of null model” got incorrectly attached to “inspiralling black hole binary”.”

          They noted that the relevant event had a SNR “equivalent to a significance greater than 5.1σ.” But, they reported credible intervals and used Bayesian software. The significance level was really a side observation and not the heart of their presentation.

          “. Shaded areas show 90% credible regions for two independent waveform reconstructions. One (dark gray) models the signal using binary black hole template waveforms [39]. The other (light gray) does not use an astrophysical model, but instead calculates the strain signal as a linear combination of sine-Gaussian wavelets.”

          There was a boatload of structure in the signal and it matches the signal from a pair of inspiralling black holes.

          I find some of their claims hard to believe: their system has more than “10 orders of magnitude of isolation from ground motion for frequencies above 10 Hz”; the signal in the interferometer “is further increased to 100 kW circulating in each arm cavity” and they are measuring changes in length of the arms by something like 1 part in 10^21. It’s much easier to believe power pose or ESP than to believe people build things like this!

          Bob

        • @Bob. The engineering behind the interferometer is pretty amazing stuff.

          https://www.ligo.caltech.edu/page/ligo-technology

          I agree with you that the LIGO result looks at first glance to be a powerful Bayesian analysis of a predicted signal, rather than a simple p value based filtration of noise.

          With these things, “measurement” is not always a straightforward term. For example, I bet they have some portion of their apparatus/model that accommodates thermal expansion and removes the effect of this expansion on the signal. But often this kind of thing is “removed” *before* the data enters the “black hole detection model” so the input data is not “pure measurement”

        • “However even in that case it seemed that “5 sigma rejection of null model” got incorrectly attached to “inspiralling black hole binary”.”

          This was more in the media coverage and discussion surrounding the paper, but even in the paper they gave too much space to rejection of the “background” model relative to one or two sentences about “alternative explanations” that got ruled out. The background model stuff should have just been put in the same “secondary” paper as the rest, I don’t see why it got special treatment unless someone was confused about the meaning of that 5 sigma.

          The more convincing aspect is that apparently no one has been able to come up with an alternative explanation for that signal. There is some stuff about the US power grid supplying both sites (http://www.scirp.org/journal/PaperInformation.aspx?paperID=71246), but that is about it. On the other hand it really would be nice to have one of these GW detections accompanied by gamma rays, neutrinos, or whatever that are measured independently before accepting the machine is measuring the right thing.

        • @Daniel Lakeland

          According to a reference I found, they don’t filter out the variations in length from the data; rather, they move the mirrors to cancel out variations in the length of the cavity. They move the mirrors by shining a one watt laser on them. Photon recoil does the rest. See
          http://ligo.org/science/Publication-GW150914Calibration/index.php

          A quote:
          The Advanced LIGO detectors sense this induced length variation of the interferometer arms. These length variations are tiny—at the same scale as those of the passing gravitational waves, about 1×10−19 m. It is remarkable that the force caused by the photons from a small, one-watt laser beam is able to move a 40 kg (88 pound) mirror by an amount that is easily measurable by the LIGO detectors!

        • @Bob, yes a feedback control system that applies forces on the mirrors so that as the cavity expands due to thermal changes, the mirrors don’t get farther apart. Well, how do you know that the changes are due to thermal expansion? The fact is the feedback control system is sensitive to *some* changes in cavity length (such as those caused by a slow trend of expansion caused by heating), and insensitive to other changes (such as those caused by gravitational waves involving an oscillating change with frequencies within some band)

          So, the output signal that is being analyzed is not a pure measurement of “what happened to the cavity” it’s a measurement of “what happened to the cavity given that a sophisticated pre-model was used to control the length of the cavity in a certain way”.

          Which is in many ways as good as “we measured the length of the cavity, and then subtracted out the signal caused by a known cause such as thermal expansion.” At least, at the point of the Bayesian analysis of the data, you’re assuming p(Signal | AllTheKnownEngineeringInTheDetector, SomeAstroPhysicalParameters), where Signal = ActualChange – CompensatedChange, and CompensatedChange has a very precisely known form thanks to your AllTheKnownEngineering…

        • > you discover what exactly? That your blood testing machine works better than an RNG.

          On the other hand, that’s pretty nice to know!

        • When they chose their null model, it was deduced from a hypothesis that included stuff like “the data is IID” (usually this is called an “assumption”, but really it is just another part of their hypothesis).

          Then, when they looked at the data many different ways, slicing and dicing it, the subsets cannot be IID because they chose which data to include based on whatever summary statistics were generated (which includes info from the other datapoints).

          So they set up a null model to test, purposefully rendered the model false, and then claimed some kind of “success” for detecting the deviation from the null model. NB: This is all 100% out in the open, no one is being deceived here.

    • In the UK there’s a current political debate about whether we should fund universal free school meals in all primary schools. It’s an important question: so ignoring the politics, I was curious to look at the research being cited, especially because the originators of it were sounding rather sensibly careful about the interpretation of their work.

      On page 21 of the original paper (https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/184047/DFE-RR227.pdf) is this:

      “It is also worth noting that, unless otherwise stated, all impacts discussed in this report are statistically significant at the 5 per cent level; that is, in 95 per cent of randomly drawn samples of pupils, we would expect the impact of the pilot (the difference observed between pupils in pilot and comparison areas) to be different from zero. In other words, there is a high degree of certainty that statistically significant results do not occur due to chance.”

      Not quite a definition I’d heard before…and maybe I shouldn’t have expected too much more value from this.

      But then the report marks up the results throughout the 150-page expedition with a ** [or *] on any result indicating “significance at the 1 per cent [or 5 per cent] level.” And bullets these results in an Executive Summary.

      #nonethewiser

      • That’s a pretty bad interpretation in that quote, but honestly, p values are so confused to begin with that it’s not surprising you hear that kind of thing.

        The p value is *not* a statement about how often repeated trials will show sample means different from zero.

        The p value is a statement about how often the particular result you saw in your particular sample (or something more extreme) would occur if the overall repeated experimentation scheme is “as good as” a random number generator that *does* have a zero mean.

        You might be able to rewrite it as “If the treatment *is* useless, 95% of the time that we run a future experiment we will get a less impressive result than the one we have here”

        That doesn’t sound very good though right?
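
        A few lines of simulation make that definition concrete: the p-value is just the frequency with which the null “random number generator” produces a sample mean at least as extreme as the one observed. A minimal sketch (all numbers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n, observed_mean = 25, 0.45  # invented sample size and observed sample mean

# Null model: the data are draws from an RNG with mean zero (sd = 1 here).
null_means = rng.normal(loc=0.0, scale=1.0, size=(100_000, n)).mean(axis=1)
p = np.mean(np.abs(null_means) >= abs(observed_mean))
print(f"Estimated two-sided p-value: {p:.3f}")
# Note this says nothing about how often future samples would have nonzero
# means (with continuous data they all will); it only says how surprising
# this sample mean is *if* the null RNG were generating the data.
```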

        • That definition of the p value sounds more like an attempt to define a confidence interval. Sometimes I feel that there are quite a lot of people who think that a CI is just another needlessly complicated (“but currently fashionable, so we have to pay lip service to it, nudge, wink”) way of writing a p value (“I’ve worked out that if it doesn’t contain zero then you have a significant result, which is what matters”).

        • But even the CI doesn’t tell you how often future samples of the same size will have sample means that aren’t zero… or whatever crap they tried to say.

          Note: 100% of the time you do an experiment with a finite sample size the sample mean of an approximately continuous outcome measurement will be different from exactly 0 and this is true even if the experiment is “pull random numbers from an RNG whose mean value is exactly 0”

        • I seem to get two different vibes here concerning p-values; one is “p-values are bad” and the other seems to be “p-values are bad if they are used to make a yes/no decision” (i.e., p<.05=significant). I thought the inescapably-bad thing about a p-value was that “it doesn’t tell us what we want to know” – the whole “inverting the conditional probability” thing. That doesn’t have anything to do with whether or not the p-value is evaluated dichotomously at some arbitrary point. I used to say that a p-value contained no quantitative information concerning the truth or falsity of the null. Now I say that “by itself, a p-value contains no…” since p(Data|null is true) would be a part of Bayes Theorem that gives p(null is true|Data). Anyway…all that seems to imply that, save its use in obtaining p(null is true|Data), a p-value is pretty useless. Some things I have heard here seem to indicate that some here think a p-value is meaningful given “proper practices” – but not as an arbitrary cutoff.

        • GS: “I thought the inescapably-bad thing about a p-value was that “it doesn’t tell us what we want to know” ”

          There are lots of problems with p-values — many falling under the general category of “people don’t really understand what they are and what they aren’t”

        • “people don’t really understand what they are and what they aren’t”

          GS: But that isn’t “inescapable.” Which is it? “Inescapably bad” or just “bad for the improperly-educated”?

        • Aaron:

          These discussions by Shalizi and Stark seem reasonable to me. They don’t emphasize everything that I emphasize, but they’re fine for what they are.

        • The Stark commentary seems good as a brief “cautionary guide” for laypersons. I haven’t had a chance to read the entire Shalizi commentary yet, but the introduction sounds good — I get the impression that he is focusing more on teaching than addressing his comments to laypersons.

        • It depends how you use the p-value.

          1) To test a strawman nil null hypothesis if p is less than some value: worthless, this is pure pseudoscience
          2) To test a prediction of a theory you believe: OK, but difficult to do right (why no one believes the ESP results)
          3) As a summary statistic that (along with sample size) indexes a likelihood function: OK, but of dubious usefulness imo

          Maybe there are more possible uses, but the main one (#1) needs to stop. Some useful references:
          https://meehl.dl.umn.edu/sites/g/files/pua1696/f/074theorytestingparadox.pdf
          https://arxiv.org/abs/1311.0081

        • p values work for basically one thing:

          http://models.street-artists.org/2015/07/29/understanding-when-to-use-p-values/

          So when you’re doing that thing, there’s nothing particularly bad

          The problem is, this is a tiny fraction of what p values are used for. In the case where they’re used for inference on a parameter… the thing that’s bad about them is that they don’t mean what you think they do (you really want the Bayesian p(Parameter | Data)), and they are often used to make a decision (even though you can’t make any kind of good decision without considering the consequences of your actions, and p values don’t inform you about consequences). So, there’s lots wrong with p values.

          The thing is, if you’re not doing the filtering thing described in that link, then… you’re not using p values for what they actually mean which is “this is a strange data point if I assume some particular substantive ‘null’ model”

  4. > And, as someone pointed out to me, in any case this doesn’t address the core criticism of Wansink having openly employed a hypotheses-after-results-are-known methodology which leaves those statistics meaningless, even after correcting all the naive errors.

    There does seem to be a consistent “meta-hypothesis” behind Wansink’s various papers, which is that humans don’t know what we’re doing… we’re mindless eaters, whose dietary choices are controlled by factors other than hunger. Like experiences in wartime, or the size of the food-packaging, or the word “power”, or the presence of a potential sexual partner who might be impressed by our stomach capacity. With the corollary that a wise benevolent government can fine-tune our food consumption for our own good, by providing the appropriate environmental triggers (so long as they seek the advice of sufficiently well-informed experts).

    This is especially obvious with the ‘restaurant study’, where Wansink blithely boasted of the fact that they collected the data with the plan of proving one environmental hypothesis, and ended up falsifying it, so instead of publishing the falsification, they scoured the data in search of some other cause that could fit into the meta-hypothesis.

  5. At the risk of bringing coals to Newcastle, I have some additional issues with the “X-ray vision carrot” study.

    http://foodpsychology.cornell.edu/research/attractive-names-sustain-increased-vegetable-intake-schools
    “In Study 1, 147 students were selected from 5 diverse schools in New York. During one week, carrots were offered three times for lunch. On Monday and Friday, carrots were served as a pre- and post-test. On the second day of the study (Thursday), carrots were served under three conditions: unnamed (control), with a simple name “The Food of the Day,” or with an attractive name “X-ray vision carrots.” For the students that were present for all three lunches (113), lunch choices were recorded and amounts eaten were calculated from ending weights. Our results showed that naming the vegetables did not affect the amount of carrots chosen between the three naming conditions, but did influence how much was eaten. The intake of “X-ray Vision Carrots” was greater than the other naming conditions. 66% of “X-ray Vision Carrots” were eaten, whereas 32% of carrots named “Food of the Day” and 35% of unnamed carrots were consumed. Results also indicated an effect of naming on future vegetable choice. Students exposed to “X-ray Vision Carrots” were more likely to take carrots on the post-test Friday. Students who were not exposed to X-ray Vision Carrots” were less likely to choose carrots on the post-test day than during the pre-test on Monday.

    “In Study 2, two months of purchase data was collected on two similar neighboring elementary schools outside of New York City. Both schools had identical lunch menus. For the first 20 days, both schools offered carrots, green beans and broccoli without names, as usual. In one school, the following 20 days featured the same hot vegetables served with attractive names such as “X-ray Vision Carrots,” “Silly Dilly Green Beans,” and “Power Punch Broccoli.” The other school served as a control and served the same items, but without any names. Our results showed a 99% increase in vegetable purchase from the treatment school. In the control school, hot vegetable purchases declined 16.2% from the first to second month of the study. Green bean and broccoli purchases increased significantly, 176.9% and 109.4% respectively. In summary, the number of students taking hot vegetables increased by 12%.”

    From a marketing standpoint, the first part has a really bizarre result, the kind of thing that would be very interesting if true. A rather trivial piece of branding produces a massive effect on consumption but no effect (technically slightly worse than that, but never mind) on likelihood to ask for carrots. This is theoretically possible but it raises all sorts of questions. The other way around would be easy to explain (we’ve all ordered something off the menu because it sounded good only to be disappointed), but it is difficult to see how a name would make you less likely to order something but more likely to finish it.

    On top of that, there is an effect on selection that, if I’m reading this correctly, only cuts in 24 hours later, when the carrots no longer have the cute name. There seems to be something about the vegetable formerly known as “X-ray Vision Carrots.”

    And the second study seems to ignore the cleaning-your-plate issue entirely and only looks at how much is being selected (presumably that’s what drives purchasing). Now the effect on selecting is huge.

    But then, all of Wansink’s effects seem to be colossal.

  6. >>>now for the official statement from Cornell Media Relations. Their big problem is that they minimize the problem.<<<

    Isn't that the main job of a PR cell? Assuming "Media Relations" is just a fancy name for PR.

    Damage control. Positive spin on disaster. Distraction. Downplaying events that show the organization in a bad light. Et cetera.

    • Rahul:

      Sure, but these guys don’t work for Wansink. They work for Cornell. And I don’t see this ducking and weaving being in Cornell’s interest. Until the recent defenses of Wansink, Cornell wasn’t looking so bad. Sure, they had a bad apple on their faculty, but these things happen. It’s when they keep defending Wansink, that’s what’s making Cornell, the organization, look bad.

      • Andrew:

        I think this sort of thing often happens in PR:

        Say the media catches a CEO with his pants down or drunken driving or something. The first response is to deny or downplay the scandal or waffle. Most times these defense tactics work and the issue dies down and everyone’s happy.

        But sometimes you cannot suppress the disaster, and then such initial defenses can look silly or worse. In those cases in hindsight the PR team wishes they had not tried the ducking n weaving.

        I’m not justifying Wansink’s conduct; just saying that this seems typical PR strategy.

  7. I’ve gotten a chance to briefly look over Wansink’s explanation of the errors, and here’s what bothers me: he downplays BOTH the severity and the number of the errors.

    Okay, so if you want to say none of the numbers in your papers make sense because you listed the wrong sample sizes, that’s fine. But isn’t that kind of an important thing to report accurately? How do you do a study and not even know how many rows of data you have?

    In biology a frameshift mutation is a single mistake in the genome, but it is still one of the most devastating mutations you can have since it messes up the rest of the protein.

    You can’t say that not knowing how much data you have is a small problem. Either there are many small mistakes, or a few large ones. A few small mistakes is not one of the options.

    P.S. It seems like you might be a 76ers fan. I really hope you are aware of the raise the cat movement:
    http://www.sbnation.com/lookit/2017/1/21/14343984/sixers-fans-raise-the-cat-meow

    • Hi Jordan,

      I agree 100% that Wansink is downplaying the errors. In his table of responses to your preprint (http://foodpsychology.cornell.edu/sites/default/files/unmanaged_files/Response%20to%20%27Statistical%20Heartburn%27%20Table%20of%20150%20Inconsistencies%20FINAL.pdf) he classifies them all using a combined “Unique and Valid” variable. Setting aside what “valid” even means here (how can an incorrectly reported number not be a valid “inconsistency”?), this is a misleading way to classify them. It results in errors like the following having “Unique and Valid” = No:

      “The original statistic was a result of improper elimination of outliers. The corrected statistic appears in the updated analysis.”

      So he thinks that improperly eliminating outliers should only count as 1 error even though it results in many numbers being reported incorrectly. I can see that, but I think a better way to show that would be to have Unique and Valid as separate variables, so the above could be classified as valid but not unique. If I counted correctly he only has 21 of the 151 as Unique and Valid! His whole response seems to be trying to make it look like there were actually only a handful of rounding errors, which does not seem very honest or a good faith response to me.

      Did you see that he posted the data and code? I believe it is the code for his corrected versions, not the originals, but I imagine looking through it would still be quite enlightening.

      • I took a quick look at the data and the glance was enough to see some very interesting things.

        I’m traveling back from philly today (I gave a talk on this topic at UPenn), but I’ll take a closer look at the data this weekend and see what my collaborators want to do (update our preprint or write a new preprint or blog about it etc).

      • Will:

        It’s not always clear to me that there were any raw data for these papers. Or, if there were raw data, the published summaries might just be made up, with no connection to any data that might be around. Consider the last-digit summaries in the above post, or the summaries of ages from the supposed survey of veterans. Perhaps these numbers are just entirely made up here. I’m not accusing Wansink of anything here—maybe there are some real data somewhere in the loop—but we should at least be open to the possibility that there are no data at all, or that the summaries are created out of whole cloth, not by manipulation of raw data.

        • Andrew:

          I definitely agree in general, and clearly something is up with those examples you mention. But in this case with the data, the code for the corrected analyses, and his descriptions of what caused the 150 errors, it should be easy to replicate those errors. Assuming that can be done, I would be convinced that this is the real data, since it would be really sophisticated to fake data that can match the original and corrected analyses.

          Of course, this only addresses one aspect of these 4 papers, and even if the 150 errors were all honest mistakes and sloppy data analysis, there still would be the greater issue of generating hypotheses based on the data, and all the issues in his other papers.

          But who can pass up the chance for some forensic data analysis?

        • Will:

          Yes, it’s possible there are data for some studies and not others. Or that in some papers, the data were slightly massaged in order to get the reported numbers, whereas for other papers, the numbers in the tables were entirely made up. And of course there are all sorts of intermediate possibilities.

      • Can you summarize? I tried to watch those but, I don’t know how else to describe this, the information transfer rate was so slow it was painful. It was like waiting for an image to load in the dial-up days.

        • Someone from Fox News was interviewing students and was asked by Cornell’s media relations to stop, the same media relations office that released the statement about Wansink. When the reporter requested permission to interview the students, the request was denied. When the reporter asked why, he wasn’t given an answer. The media relations team said they would email the reporter an explanation, but apparently the email did not contain any useful information.

        • Jordan, I think your characterization of this event is a bit misleading. This “reporter” is not a journalist, he is an entertainer and a troll. Unless you call his race-baiting and deliberately offensive trolling form of entertainment “journalism”: https://www.nytimes.com/2016/10/07/business/media/fox-reporter-accused-of-racism-for-chinatown-interviews-on-trump-clinton-and-china.html

          Cornell is perfectly within their rights to not engage with trolls. This is no different from asking a film crew for the Daily Show or one of the late night programs to not film. It is not shutting down journalism; it is about not being a foil for an entertainer to make a buck off you.

          Furthermore, the entire premise of the piece is bogus. The alleged injustice is that 96% of political donations from the faculty went to liberal/progressive/democratic (whatever word you want to use) candidates. While this is perhaps evidence of the political leanings of faculty, it is not evidence of deliberate discrimination in hiring based on political viewpoints. If Cornell were to ask candidates what political causes they donate to and then make hiring decisions based on that information, then there would be a story here. But it seems that is exactly what Bill O’Reilly and his buddy are suggesting they do to “remedy” this situation.

          We don’t criticize scientists without at least reading the entire paper. Let’s not laud a journalist and demonize a reporting subject without at least looking a bit into the context.

        • I agree that Fox News is trolling; however, I found Cornell’s response very interesting, since you could argue that this is exactly how they have responded to pizzagate, which I think we can all agree is a legitimate story.

          When dealing with a troll, you typically start by trying to address their concerns; once they continue to troll regardless of what explanations you provide, that’s when you start ignoring them. I’m sure the Fox footage was heavily edited, but given my experience with Cornell it would not surprise me if Cornell never provided any meaningful answers to Fox News.

          Cornell’s policy seems to be to treat everyone as if they are trolls who don’t deserve legitimate responses, but when someone first contacts you, how do you know for a fact that they are a troll? Perhaps Fox News didn’t deserve a response in this case, but I don’t think Fox was looking for a response from the administration in the first place; they were simply interviewing students. Cornell presumably thought that those interviews would hurt their reputation, so they kicked Fox News off the campus, which I guess they have the right to do, and maybe it was warranted, but I don’t really see how the news crew was causing the students any harm (who doesn’t want to be on TV?).

          Admittedly, the goal of Fox News was probably to make Cornell look bad, but Cornell did them a favor with the manner in which the media relations department reacted, and made it easy for Fox News to make them look even worse. Maybe the United Airlines media relations department and Cornell’s follow the same handbook or something.

          You could argue that the way Cornell has responded to pizzagate has also made the story much worse. If they had told us they were launching a full investigation into Wansink’s work when we first contacted them, then maybe we wouldn’t have had to launch our own investigation, which has led to one retraction and counting. And their continuing portrayal of huge numbers of impossible values and duplicate publications as minor criticisms that don’t amount to academic misconduct paints Cornell as a university which probably shouldn’t be taken seriously by the scientific community.

          On the one hand, I do think their response has some merit. When the Lakers made Kobe Bryant the highest-paid player in the league even though he was injured and old, some called it a terrible contract. But in doing so the Lakers showed that they were loyal to their superstars. As a result, other superstars may view the Lakers favorably and want to play for them. Cornell’s response basically shows that, no matter what happens, they will not discipline tenured faculty, which is great if you are a tenured faculty member. As a result, other “superstar” faculty may see Cornell’s response and think, “hey, that’s a place I’d want to work, since I know the university has my back, ride or die!” The problem is that the “superstars” this type of thing attracts could team up to form the Mighty Ducks; quack quack.

        • I’m very hesitant to paint with such a broad brush, the perils of aggregation and all that. For one, I think it’s a little unfair not to distinguish between Fox News broadly and Bill O’Reilly/Jesse Watters narrowly. Fox News has some good journalists and some good entertainers and provocateurs, just as NBC has some good journalists and some good comedians. From a media relations perspective you wouldn’t give the same response to Lester Holt and Jimmy Kimmel. Watters is in the latter category at Fox News and has repeatedly demonstrated as much. From a media relations perspective I would come into any engagement with him with a very strong prior of “This man is a troll.”

          For some of these same reasons, I strongly disagree with the statement that this “paints Cornell as a university which probably shouldn’t be taken seriously by the scientific community.” Cornell’s Media Relations probably shouldn’t be taken seriously by the scientific community, but I think that is more because it is an object of the class “PR Department” than an object of the class “Cornell Organ”. The PR departments of many universities shouldn’t be taken seriously by the scientific community. I agree that Brian Wansink should definitely not be taken seriously by the scientific community; exactly insofar as he paints himself as a scientist, it is the duty of other scientists to tear that masquerade apart. The response so far by Cornell does make it seem like somebody higher up the food chain shares some of the blame, but it’s a matter of how high up we take that. Probably the rot doesn’t extend much beyond the School of Applied Economics and Management, which is the next higher level of organization above the Food and Brand Lab. That school is itself part of the College of Agriculture and Life Sciences, which also contains the (very fine) Ecology and Evolutionary Biology and Biological Statistics and Computational Biology departments. Brian Wansink’s sins do not make me doubt the work being done in those other areas. What I want to see is a response from Brian Wansink’s boss, not some poor PR peon tasked with responding to the entire ecosystem of outside media.

          For some of these same reasons, I’m not surprised that Cornell didn’t immediately launch an investigation into Wansink. We live in a very information-dense, partisan, and sometimes hostile scientific environment. Unfortunately it’s perhaps difficult to sort out valid criticism (yours) from less valid criticism (e.g., Watters’ World or the criticisms directed at Penn State’s Michael Mann). So I think your last paragraph with the Kobe Bryant analogy is perhaps apt here, but there is another side to the coin. If I were a climate scientist, or otherwise working on some topic with policy implications (e.g., endangered species, environmental contamination and human health, etc.), I would surely hope that my employer has my back and presumes my innocence unless proven otherwise.

          That said, you and Andrew and others are doing good work to point out these flaws in the research. It’s also not a surprise that you’re encountering resistance. Inertia is a bitch. But I think presuming ill will on the part of the larger body that employs a scientist with questionable publications may work to undercut your own case, by allowing room for the counterargument that you yourself are a troll. (You’re not.) Implicitly defending true trolls also may not be a good rhetorical strategy. I think you will have to be persistent, but it doesn’t hurt to also be polite.

        • Dalton: You make a lot of good points.

          Yeah, of the three of us (Nick, Tim, and I) I’m the loose cannon of the bunch with the least amount of patience. At some point during this investigation Nick Brown also got fed up. I think Tim van der Zee is getting there.

          I posted a video on YouTube about the statistical tools we use and the timeline of our investigation from my perspective (a rough sketch of one such consistency check is at the end of this comment):
          https://www.youtube.com/watch?v=5YjZOKNM7zA

          I gave a similar talk recently at UPenn and I think it was well received. I believe we followed pretty good practices in our investigation, but Wansink’s and Cornell’s responses have led me to use a little bit of explosive rhetoric here and there.
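
          For readers who don’t want to sit through the video, here is a minimal sketch, in Python, of the kind of granularity check it describes: a GRIM-style test of whether a reported mean is even arithmetically possible given the sample size. The function name and the example numbers are mine, for illustration only, and are not taken from any of the papers in question.

            def grim_consistent(reported_mean, n, decimals=2):
                """Return True if a mean reported to `decimals` places could have
                been produced by averaging n integer-valued responses."""
                # The underlying sum of integer responses must itself be an
                # integer, so test the integer totals closest to
                # reported_mean * n and see whether any of them rounds back
                # to the reported value.
                target = round(reported_mean * n)
                return any(
                    round(total / n, decimals) == round(reported_mean, decimals)
                    for total in (target - 1, target, target + 1)
                )

            # Made-up illustrative values:
            print(grim_consistent(3.45, 20))  # True:  69 / 20 = 3.45
            print(grim_consistent(3.47, 20))  # False: no integer total over n = 20 gives 3.47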

        • Jordan Anaya: Very interesting video. You would make a good teacher. But why is there no mention of Tim van der Zee? You make it sound as though you and Nick Brown did all the work. I’ve seen van der Zee’s summary (dossier, he calls it) and it’s clear that he has contributed a lot of boring, painstaking work to this project.

        • Jordan Anaya

          “If they [Cornell University] told us they were launching a full investigation into Wansink’s work when we first contacted them”

          “paints Cornell as a university which probably shouldn’t be taken seriously by the scientific community”

          You are naive in the extreme to think that Cornell would do much of anything when first contacted by three graduate students. It just doesn’t work that way.

          Why should one bad apple cast doubt on the research of the many fine scientists and academic programs at Cornell?

        • Anonymous: I didn’t mean to downplay anyone’s contributions. The video was meant to be from my perspective, I’m sure if Nick or Tim made a video the story would be slightly different.

          Tim has made very important contributions to this investigation. Because my talk was limited to an hour, including discussion, I could not cover everything. Maybe I should have made a presentation with no time limit…

          I actually do think it is somewhat valid to cast doubt on the “scientists” at Cornell because of one bad apple, or at least on the people in Wansink’s department. Wansink is not some obscure Cornell faculty member: he is featured prominently on Cornell’s websites, and he teaches courses and gives lectures at Cornell. No one at Cornell ever raised concerns about his obvious p-hacking? No one at Cornell ever noticed that his numbers don’t add up?

          And I’m sure students and professors at Cornell are aware of the current developments, so why haven’t they done anything? If I were a faculty member and saw that a fellow faculty member had committed clear academic misconduct, I would start a petition to have him disciplined. All the real scientists (if they exist) at Cornell should be deeply embarrassed. And if I were a student in one of Wansink’s courses, I would have been extremely disruptive, peppering him with questions throughout the lecture. He would never dare set foot in that class again.

          I have to imagine that if misconduct at this level had happened at my alma mater, Berkeley, the students and faculty would have done something. I would have.

        • Jordan:

          What’s interesting is that, in terms of quality of work, Daryl Bem in his ESP research is just about as bad an “apple” as Brian Wansink. But Bem doesn’t bother people as much, perhaps because he’s viewed as more of a lovable eccentric, whereas Wansink is going on government panels and garnering millions of dollars of government grants.

          Regarding the reaction, or non-reaction, of Cornell students and faculty: I think they just have better things to do. Here I am at Columbia University, where our most famous faculty member is the notorious Dr. Oz. I’m not happy that Dr. Oz is at Columbia, but I’m not distributing petitions to get him fired. I’m just ignoring him. If the Columbia public information office started promoting Dr. Oz, this wouldn’t make me happy, but I don’t know that there’s much I could really do about it.

        • I’m a tenured scientist at Cornell. I work in a different department from Wansink, in a different college, on a different part of campus. In the decade plus that I’ve been here, I’ve never met the guy. AEM is only on my radar because it has some excellent development economists, and because the last (and really only) act of a now-deceased president was to merge AEM with two other units into a new undergraduate business college, which got a lot of press in the campus newspaper.

          I’m not proud that Wansink is at Cornell, but that doesn’t make me responsible for his shoddy research, nor does it give me special powers to do anything about it. I also don’t think that all United Airlines employees are imbeciles who are responsible for the decisions of the gate agents of flight 3411, or for the CEO’s and the PR department’s tone-deaf response to the “overbooking” incident.

        • Cardinal Sins: I think the United comparison is interesting. Yes, being dragged off your flight for no reason is a rare occurrence and doesn’t reflect on the employees, but when it happens it raises questions about why and how often it occurs. Since the event, stats have come out showing that United overbooks flights more often than its competitors, and based on the CEO’s response it was clear that the employees were following standard operating procedures. As a result, I think it’s safe to say that if you flew United you were at higher risk.

          Similarly, I’m sure a Wansink is a rare event, but given Cornell’s response I have no reason to believe they view Wansinks as a problem, and I have no reason to believe Cornell isn’t enriched in Wansinks. In fact, on this very blog Gelman talks about Daryl Bem and Thomas Gilovich, who are both from Cornell. Based on all the evidence it’s not hard to imagine that Cornell specifically recruits these types of “scientists”. Perhaps this is limited to one or two departments, but until I have reason to believe otherwise, any time I see the Cornell affiliation on a paper I’m going to view it as a huge red flag.

          Extending the bad apple analogy, imagine you went to the store and found a rotten apple in clear view of all the employees. They clearly see it, but you mention it to them anyway. They tell you they don’t see anything wrong with the apple. You contact their manager, and he or she says there’s nothing wrong with the apple. The CEO of the store chain releases a statement saying they investigated the case and found no problems with the apple. Are you really going to keep shopping at a store that pawns off rotten apples as fresh fruit? Do you really want to stick around to see the quality of the other food?

        • Jordan: the problem with the bad apple analogy is that it uses our disgust response to obscure the fact that the mechanism by which apples rot doesn’t clearly analogize to how “bad actors” taint the whole group.

          That is, Wansink’s research malpractice is quite disgusting. But for the bad apple analogy to hold, there must be a corruptive influence that Wansink exerts, and I’d suggest that at least within Cornell it would be limited to the Food and Brand Lab (perhaps somewhat more broadly, if you accept the argument that Wansink’s work has a central theme of mindlessness when it comes to eating). But I wouldn’t label the whole university as rotten.

          Bem, on the other hand, has had, I think, a much wider corruptive influence. Consider this passage from his “Writing the Empirical Journal Article,” which I was required to read in my first year of graduate school:

          “Analyzing Data. Once upon a time, psychologists observed behavior directly, often for sustained periods of time. No longer. Now, the higher the investigator goes up the tenure ladder, the more remote he or she typically becomes from the grounding observations of our science. If you are already a successful research psychologist, then you probably haven’t seen a participant for some time. Your graduate assistant assigns the running of a study to a bright undergraduate who writes the computer program that collects the data automatically. And like the modern dentist, the modern psychologist rarely even sees the data until they have been cleaned by human or computer hygienists.

          To compensate for this remoteness from our participants, let us at least become intimately familiar with the record of their behavior: the data. Examine them from every angle. Analyze the sexes separately. Make up new composite indexes. If a datum suggests a new hypothesis, try to find additional evidence for it elsewhere in the data. If you see dim traces of interesting patterns, try to reorganize the data to bring them into bolder relief. If there are participants you don’t like, or trials, observers, or interviewers who gave you anomalous results, drop them (temporarily). Go on a fishing expedition for something—anything—interesting.

          No, this is not immoral. The rules of scientific and statistical inference that we overlearn in graduate school apply to the “Context of Justification.” They tell us what we can conclude in the articles we write for public consumption, and they give our readers criteria for deciding whether or not to believe us. But in the “Context of Discovery,” there are no formal rules, only heuristics or strategies. How does one discover a new phenomenon? Smell a good idea? Have a brilliant insight into behavior? Create a new theory? In the confining context of an empirical study, there is only one strategy for discovery: exploring the data.

          Yes, there is a danger. Spurious findings can emerge by chance, and we need to be cautious about anything we discover in this way. In limited cases, there are statistical techniques that correct for this danger. But there are no statistical correctives for overlooking an important discovery because we were insufficiently attentive to the data. Let us err on the side of discovery.”

          Bem, D. J. (2003). Writing the empirical journal article. In The Compleat Academic: A Practical Guide for the Beginning Social Scientist, 2nd Edition.

          src: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.687.6970&rep=rep1&type=pdf

          Andrew: some of us are very pissed with Bem.

        • AnonAnon: Maybe Wansink was just getting advice from Bem this entire time!

          I still think the bad apple analogy has some merit. Not only tolerating but actively promoting bad apples suggests a problem with the culture at Cornell. I’ve only attended two different universities, so maybe I’m not in the best position to judge this, but I immediately noticed a huge difference in culture between Berkeley and UVA. It’s not hard to imagine that if I enrolled in a program at Cornell I would also detect a difference in culture. You could say I might smell what they are cooking.

          P.S. In case you were wondering, UVA is where science goes to die.

        • Anonymous: Kind of funny you should mention that. During my recent visit to UPenn I learned people there refer to the COS as the “Center for Open”. They leave out a key word.

        • Jordan Anaya: Berkeley is an outlier in terms of activism, even today, although it’s a far cry from what it was in the 60s (Mario Savio, etc.). You cannot expect other schools to operate the same way.

  8. > “The researcher did not review the quality of the data, hypotheses, research methods, interpretations, or conclusions.”

    I’m aware of but not particularly following the Wansink business; however, that sentence got my attention. In the context of the letter it’s taken from, the takeaway is: “With respect to Wansink’s Stata code, the researcher could not distinguish between ‘runs to completion’ and ‘works’ if their life depended on it.” If you’re not going to attempt to establish whether the result is worth a damn, then why bother? Your time would be just as well spent picking your nose and drinking beer. (A toy example of the “runs to completion” versus “works” distinction is below.)
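
    To make that distinction concrete, here is a toy illustration in Python rather than Stata, with made-up numbers of my own; the only point is that a script can finish without raising any error while the statistic it reports is simply wrong.

      def mean_rating(ratings):
          # Bug: divides by a hard-coded 10 instead of len(ratings), so the
          # function "runs to completion" on any non-empty input but only
          # "works" when there happen to be exactly 10 ratings.
          return sum(ratings) / 10

      print(mean_rating([4, 5, 3, 4]))  # prints 1.6 with no error; the true mean is 4.0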

  9. Is it safe to say that Cornell University should no longer be considered Ivy League? Because c’mon, protecting this kind and amount of misconduct tells you everything you need to know about an organization.

      • Harvard has those “the replication rate in psychology is quite high—indeed, it is statistically indistinguishable from 100%” guys. Princeton had that guy who gave Hillary Clinton a 98% chance of winning. And, here at Columbia, we’ve got Dr. Oz!

  10. I see that the article by Wansink and Park (2002, Journal of Sensory Studies) has been retracted now, although there is no explanatory note attached. Presumably the retraction occurred because the article duplicated the earlier article by Wansink, Park, Sanka, and Morganosky (2000, International Food and Agribusiness Management Review), as detailed by Nick on his steamtraen blog.

    Kudos.

    Interesting name: Se-Bum Park.

    I wonder if Nick would share with us the letter to the editor of JoSS that resulted in the retraction?

    How could Cornell University not consider a duplicate publication to be research misconduct? Maybe it hasn’t gotten to anything but the four “pizzagate” papers yet?

    • (Anonymous)
      >>I wonder if Nick would share with us the letter to the editor
      >>of JoSS that resulted in the retraction?

      I didn’t contact the journal. In fact this particular article only appeared on my radar quite recently (I blogged about it on March 20), so the journal has acted quite rapidly. As I noted in my blog post, the 2002 article (which has two authors, Wansink and Park) acknowledges the contribution of two other authors to “the original version of this project”, by which Wansink and Park appear to mean the 2000 version of the article, which had four authors. Indeed, the 2000 version is cited on p. 484 of the 2002 version: “In extending past research (Wansink et al. 2000)”.

    • There is now an explanatory note (“Retraction Statement”) attached to the retraction at JoSS. The article is retracted due to “major overlap with a previously published article.” There is no indication that Wansink agreed to the retraction.
