Pizzagate update: Don’t try the same trick twice or people might notice

[cat picture]

I’m getting a bit sick of this one already (hence the image above; see also this review from Jesse Singal), but there are a couple of interesting issues that arose in recent updates.

1. One of the weird things about the Brian Wansink affair (“Pizzagate”) was how he responded to such severe criticism (serious in its content and harsh in its tone) in such an equanimous way.

Critic A:

You pushing an unpaid PhD-student into salami slicing null-results into 5 p-hacked papers . . . Because more worthless, p-hacked publications = obviously better….? . . . I really hope this story is a joke. If not, your behaviour is one of the biggest causes of the proliferation of junk science in psychology and you are the one who should be shamed, not the postdoc.

Wansink’s reply:

I understand the good points you make. . . .

Critic B:

I’m very grateful to you for exposing how these valueless and misleading studies are generated. . . .

Wansink’s reply:

Hi George, You are right on target. . . .

Critic C:

There is something catastrophically wrong in the genesis of the papers you describe . . . the moment you start exploring your data, the p-value becomes worse than worthless: it is positively misleading. It misled you, the entire field of psychology and many more. To be honest, it’s even worse than this . . .

Wansink’s reply:

Outstanding! Thank you so much for point out those two paper. (I downloaded the draft form and will be making it required reading for my team). You make outstanding points. . . .

As I wrote the other day, Wansink’s reactions just don’t add up. When people point out major problems with your work and say to you that your work is worthless, you might react with anger, or you might realize they’re right and go clean up your act, or you might change your career.

But Wansink’s responses are just . . . off. He doesn’t act annoyed or offended at all; instead he thanks the critics. That’s fine, except that he then doesn’t react to the criticisms. Wansink’s “You are right on target” and “I understand the good points you make” are empty: in effect he’s admitting to repeated research incompetence, and perhaps to unethical behavior in pressuring students and postdocs into poor research practices, without engaging with any of it.

Similarly puzzling is Wansink’s reaction to the 150 errors that an outside research team found in his four papers. Who has 150 errors in four papers? When does that ever happen? But Wansink’s like, no big deal, he’ll “correct some of these oversights.” 150 errors is not an “oversight”; it’s an absolute disaster!

So what’s going on?

I got some insight by reading this post by Tim Smits, who experienced the very same behavior from Wansink, five years ago, regarding a completely different paper! Here’s Smits:

A series of academic social media posts and a critical article target the lab’s inferior methodology and old school approach to rendering null-effects into (a set of) publishable papers. In this post, I [Smits] want to give my account of a previous similar situation that I had with the same lab in 2012. . . .

In 2011, the Cornell researchers published an article (Zampollo, Kiffin, Wansink & Shimizu, 2011) on how children’s preferences for food are differentially affected by how the foods are presented on a plate, compared to adults. . . . some of the findings were incomprehensible from the article . . . I wrote a polite email, asking for some specific information about the statistics. This was the response I got.

Dear Tim, Thank you for being interested in our paper. Actually there are several errors in the results section and Table 1. What we did was two step chi-square tests for each sample (children and adults), so we did not do chi-square tests to compare children and adults. As indicated in the section of statistical analysis, we believe doing so is more conclusive to argue, for example, that children significantly prefer six colors whereas adults significantly prefer three colors (rather than that children and adults significantly differ in their preferred number of color). Thus, for each sample, we first compared the actual number of choices versus the equal distribution across possible number of choices. For the first hypothesis, say #1=0, #2=0, #3=1, #4=0, #5=2, #6=20 (n=23), then we did a chi-square test (df=5) to compare those numbers with 3.83 — this verified the distribution is not equal. Then, we did second chi-square test (df=1) to compare 20 and 0.6 (the average of other choices), which should yield 18.3. However, as you might already notice, some of values in the text and the table are not correct — according to my summary notes, the first 3 results for children should be: 18.3 (rather than 40.4), 16.1 (rather than 23.0), 9.3 (rather than 26.88). Also, the p-value for .94 (for disorganized presentation) should not be significant apparently. I am sorry about this confusion — but I hope this clarify your question.

Well, that was interesting. Just one email, and immediately a bunch of corrections followed. Too bad the answer was nonsensical. So I wrote back to them (bold added now):

When reading the paper, I did understand the first step of the chi-square tests. I was puzzled by the second step, and to be honest, I still am a bit. The test you performed in that second step boils down to a binomial test, examining the difference between the observed number of counts in the most preferred cell and the H0 expected number of counts. Though this is informative, it does not really tell you something about how significant the preferences were. For instance, if you would have the following hypothetical cell counts [0 ; 0 ; 11; 0; 0 ; 12], cell 6 would still be preferred the most, but a similar binomial test on cell 3 would also be strongly significant. In my opinion, I thus believe that the tests do not match their given interpretations in the article. From a mathematical point of view, your tests on how much preferred a certain type of plate is raise the alpha level to .5 instead of .05. What you do test on the .05 level is just the deviation in the observed cell count from the hypothesized count in that particular cell, but this is not really interesting

Then, this remarkable response came. . . . they agree with the “shoddy statistics” . . . Moreover, they immediately confess to having published this before.

I carefully read your comments and I think I have to agree with you regarding the problem in the second-step analysis. I employed this two-step approach because I employed similar analyses before (Shimizu & Pelham, 2008, BASP). But It is very clear that our approach is not appropriate test for several cases like the hypothetical case you suggested. Fortunately, such case did not happen so often (only case happened in for round position picture for adults). But more importantly, I have to acknowledge that raising the p-value to .5 in this analysis has to be taken seriously. Thus, like you suggested, I think comparing kids counts and adults counts (for preferred vs rest of cells) in 2×2 should be better idea. I will try to see if they are still significant as soon as I have time to do.

You see what happened? Wansink did the exact same thing years ago! Someone sent him a devastating criticism, he or someone in his lab responded in a cordial and polite way, gave tons of thanks, and then they did essentially nothing. As Smits put it, these are “old school researchers and labs, still empowered by the false feedback of the publishing system that tends to reward such practices. . . . But how on earth can you just continue with “shoddy methodology” after someone else pointed that out for you?”
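To see concretely what is being disputed, here is a quick sketch (in Python with scipy; the counts are the hypothetical ones from the emails above, and the code is mine, not the lab’s) of the two-step procedure and of Smits’s counterexample:

    from scipy.stats import chisquare, binomtest

    # Step 1: goodness-of-fit test of the six cell counts against a uniform
    # distribution (expected count 23/6 = 3.83 per cell, df = 5).
    counts = [0, 0, 1, 0, 2, 20]  # n = 23
    print(chisquare(counts).statistic)  # 82.7: "the distribution is not equal"

    # Step 2: the most-preferred cell (20) vs. the average of the other five
    # cells (0.6), df = 1. This reproduces the 18.3 from the first email.
    print(chisquare([20, 0.6]).statistic)  # 18.3

    # Smits's counterexample, counts [0, 0, 11, 0, 0, 12]: cell 6 is still the
    # mode, but the same style of test (which, as he notes, boils down to a
    # binomial test on a single cell) is also strongly significant for cell 3.
    print(binomtest(11, n=23, p=1/6).pvalue)  # < .001, yet cell 3 "lost"

The second-step test only asks whether one cell deviates from its uniform expectation; it says nothing about whether that cell is actually the preferred one, which is Smits’s point about the interpretations not matching the tests.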

Smits concludes:

Just to take this one article as an example: Their own press releases and outreach about that study did not show a single effort of self-correction. You can still find some of that material on their website. Similarly, despite the recent turmoil, I have seen them just continue their online communication efforts.

Indeed, here are the most recent items on Wansink’s “Healthier and Happier” blog:

Behind-the-Scenes with Rachael Ray

Keeping the Change [about someone’s weight loss]

Foreign Weight [inviting people to participate in an online survey]

Congratulations, You’re Already Losing Weight

First Seen is First Eaten – The Solution

The Solution to Mindless Eating

I’ll decline the opportunity here to make a joke like, “The Solution to Mindless Publishing and Promoting.”

The point is:

(a) Over the past few months, Wansink has received harsh and on-the-mark criticism about his research methods and his claimed results. In his words, he’s accepted much of this criticism, but in his actions he’s ignoring it, minimizing it; indeed, he seems to be using his words in an attempt to defuse the criticism without ever addressing it.

(b) The same thing happened five years ago. Back then, the strategy worked, in the sense that the revelation of research incompetence in that earlier paper did not stop him from continuing full steam ahead with his group, getting funding, publishing in respected journals, going on national TV, etc.

Hearing about (b) gives me a lot more insight into (a). I’m no longer puzzled by Wansink’s recent behavior. Now it all makes sense: he’s following the same strategy that worked before.

That said, from my perspective it would seem like less effort to just not write papers that have 150 errors. I still can’t figure out how that happened. But by Wansink’s own admission he puts his students and postdocs under a huge amount of pressure, and accuracy is clearly not high on anybody’s priority list over there. People don’t always respond to incentives, but responding to incentives is usually a lot easier than not responding to incentives.

2. Jesse Singal’s aforementioned news article had this wonderfully revealing quote from Cornell University’s public relations officer:

Recent questions have arisen regarding the statistical methods utilized by Professor Brian Wansink

That’s misleading. The issue is that Wansink doesn’t seem to be using any statistical method at all, as no known statistical method can produce the numbers in his tables.

Yah, yah, sure, the P.R. guy’s job is not to spread truth, it’s to promote (or, in this case, to minimize damage to) Cornell University. The whole thing’s kind of sad, though. Who’d want a job like that?

The P.R. guy also says, “we respect our faculty’s role as independent investigators to determine the most appropriate response to such requests, absent claims of misconduct.”

Which makes me wonder: If you publish four different papers on the same failed experiment, and you make 150 errors in the process, and in the meantime you’re pressuring your students and postdocs to do this work . . . does this count as “misconduct”? Recall Clarke’s Law.

Sure, all of this put together still only rates a zero on the Richter scale compared to what’s happening every day in Washington, D.C., but it still bothers me that this is standard operating procedure in big-time science. From a certain point of view, it’s just kind of amazing.

90 thoughts on “Pizzagate update: Don’t try the same trick twice or people might notice”

  1. Ironically, Wansink published an article in Public Understanding of Science last year, “Blinded with science: Trivial graphs and formulas increase ad persuasiveness and belief in product efficacy”. The abstract begins: “The appearance of being scientific can increase persuasiveness.”

    See here: http://s3.amazonaws.com/academia.edu.documents/44102408/scientificposturing.pdf?AWSAccessKeyId=AKIAIWOWYYGZ2Y53UL3A&Expires=1486663861&Signature=6YKsM4gMbud8Pn9JKDhPbWmoZJo%3D&response-content-disposition=inline%3B%20filename%3DBlinded_with_science_Trivial_graphs_and.pdf

    • Neil:

      I’m reminded of Bruno “Arrow’s Theorem” Frey who wrote all sorts of articles about corruption-inducing incentives in scientific publication, and then up and broke the rules himself. Frey’s own behavior must have made him painfully aware of problems with the system, and the same thing may be the case with Wansink. At this point, I’m pretty sure that he is actually aware that it’s bad form to publish papers with hundreds of errors.

        • Tor:

          Yes, I guess I do think Frey was doing something wrong. When writing that earlier post several years ago, I was trying to give him the full benefit of the doubt, but in retrospect it does seem like Frey was behaving unethically. Publishing 4 essentially identical papers without citing each other is something you’re not supposed to do, even though it does less harm, in my opinion, than publishing 4 papers which between them have over 150 errors.

  2. It sounds like Wansink has been following something along the lines of Harriet Lerner’s (dubious) advice, in her book “Why Won’t You Apologize?” on how to deal with “big-time criticism” (I am listing the steps here but omitting the explanations):

    1. Recognize your defensiveness.
    2. Breathe.
    3. Listen only to understand.
    4. Ask questions about whatever you don’t understand.
    5. Find something you can agree with.
    6. Apologize for your part.
    7. Let the offended party know he or she has been heard and that you will continue to think about the conversation.
    8. Thank the critical party for sharing his or her feelings.
    9. Take the initiative to bring the conversation up again.
    10. Draw the line at insults.
    11. Don’t listen when you can’t listen well.
    12. Define your differences.

    What really gets me is #7, “Let the offended party know he or she has been heard and that you will continue to think about the conversation.” That seems like a way out of addressing the problem. “Thank you so much. Please know that I take your criticism seriously and will be thinking about it in the days to come.”

    What’s missing is any indication that the offending party will address the problem itself. Something here seems dreadfully voicemail-recording-ish.

    I have no idea whether Wansink reads Lerner, but if he does, his responses are “by the book.” He hits at least half of the points, but also captures the essence of the whole by appearing open, concerned, grateful, unflummoxed, and unmoved.

  3. Andrew: You missed the fact that the earlier (e-mail) exchange with Tim Smits was not with Brian Wansink himself, but with Mitsuru Shimizu, who was a postdoc in Wansink’s lab.

    • “However, P-hacking shouldn’t be confused with deep data dives – with figuring out why our results don’t look as perfect as we want.”

      I think that Wansink’s singular contribution to science might be the wonderful phrase “deep data dives”. Wahnsinn! (German for “madness.”)

      • What would be an appropriate answer for you? I mean literally: let’s say I took one of your papers and said there’s a problem there. Could you quote what you would say?

        As for me I would say: “Thank you for pointing that out, we will work on that” or something related.

        • As I said, there are more likely explanations, and there is no need for insinuation about someone’s mental state.
          And yes, I could imagine someone replying in exactly that manner, if they consider the criticism unfounded and they are not interested in fanning the fire, or (from their perspective) feeding trolls.

        • John,

          If somebody described my work as “valueless and misleading” or “catastrophically wrong” or said to me, “your behaviour is one of the biggest causes of the proliferation of junk science in psychology and you are the one who should be shamed,” then I might not reply at all or I might express disagreement or I might even apologize (if I was convinced of the correctness of the criticism), but I wouldn’t blandly reply, “I understand the good points you make,” “You are right on target,” etc. without actually engaging with the criticism.

          And if somebody found 150 errors in my papers, I’d be like, Oh shit, what happened? I wouldn’t give a chipper response on how I’m gonna improve my workflow and then make a plan to start giving advice to other people on the topic.

          Wansink’s responses make no sense if taken literally. But if considered as a damage-avoidance strategy (replicating what worked several years earlier), then I can see it.

        • @Shravan and Prof Gelman,

          1.) I think we all agree Prof Wansink’s (or whoever replied in his stead) replies are *weird*, and sometimes come across as nonsensical, but my point is that: (a) even though you and I might choose to reply meaningfully, he has no obligation to (some researchers do not believe in post-publication review or discussion), and (b) ad hominem has no place in a serious (mostly!) discussion forum.

          2.) It is important to distinguish the obvious errors (N=89 vs. N=95, or whatever) from the less obvious garden-of-forking-paths errors, which most researchers seem not to understand (or do not want to recognize). For many researchers, sadly, slicing & analyzing the same data M different ways to yield M different models (and even M papers) is not an error, it’s how you get lots of papers…

        • Fair enough about the ad hominem point—my apologies. I was reacting to the utter weirdness of his responses. But you are certainly right that he has no obligation to reply meaningfully.

        • Andrew:

          “And if somebody found 150 errors in my papers, I’d be like, Oh shit, what happened?”

          The casual use of smutty, scatological language is becoming more prevalent in this blog. Wansink as a whipping boy does not require shoot-from-the-hip descent into potty English. Vulgarities should be few and far between in order to maintain civil discourse. As 2016 came to a close, you once wrote, regarding previous transgressions of language:

          http://statmodeling.stat.columbia.edu/2016/12/30/kevin-lewis-paul-alper-send-much-material-think-need-blogs/

          “OK. No guarantees but I’ll try my best.”

          Wansink is not the only one who makes an “occasional” error.

        • I disagree with Paul. I certainly respect that it might not be what he prefers to read, but I don’t think there’s anything wrong or uncivil or distasteful about using “potty English” (and pretty tame potty English) in that way on your blog. It would be one thing if you had used “smutty, scatological language” in a direct insult towards someone, but you were describing your hypothetical reaction to the 150 errors. What you wrote doesn’t seem at all inappropriate to me in that context, and I think the way you worded it is actually better writing than if you had put “Oh shoot!”. It’s more genuine. I understand if you want to avoid it in order to not offend readers like Paul, but I think your blog is enjoyable to read in part because you let yourself write freely. That and all the power pose ‘shoot’.

        • Andrew, you are a saint, a certified G, a bona fide stud. And you can’t teach that. I would have responded very differently to paul alper.

          P.S. I don’t really know the context here, maybe you and paul are friends.

        • Thanks Paul, that’s exactly what I’m worried about. This sometimes feels like a witch hunt, and many of those throwing stones live in glass houses. Keep up the good work debunking bad research, but let’s calm down a little bit, people!

        • I’m referring to exchanges like this one:

          Commentator: “You pushing an unpaid PhD-student into salami slicing null-results into 5 p-hacked papers and you shame a paid postdoc for saying ‘no’ to doing the same.
          Because more worthless, p-hacked publications = obviously better….? The quantity of publications is the key indicator of an academic’s value to you?
          I really hope this story is a joke. If not, your behaviour is one of the biggest causes of the proliferation of junk science in psychology and you are the one who should be shamed, not the postdoc.”

          Wansink: “I understand the good points you make. There isn’t always a quantity and quality trade-off, but it’s just really important to make hay while the sun shines. If a person doesn’t want to one, they need to do the other. Unfair as it is, academia is really impatient. ”

          This response strikes me as utterly incoherent. I certainly would not reply to someone who says to me he hopes my post is a joke by saying “I understand the good points you make.” The rest of the response makes no sense to me at all. Make hay while the sun shines? What the heck does that even mean in the present context?

          Anyway, I agree that he is probably not literally out of his mind. I was referring to the incoherent nature of his responses.

        • “The rest of the response makes no sense to me at all. Make hay while the sun shines? What the heck does that even mean in the present context?”

          I think I could rephrase what he is saying as:

          There is a quantity-quality trade-off. I’m not making any judgment about which you should emphasize, but you have to produce something. If you don’t want to do quality, then you have to go with quantity. It’s not fair, but academia is really impatient.

          Which seems like a blatant endorsement of blasting out a stream of low-quality papers to appease the powers that be in academia rather than be forced out for doing nothing.

  4. One version had “PhD Statistician.”

    Why checking the accuracy of the statistics presented needs “a Stats Pro,” “a PhD statistician,” or “a PhD econometrician” is puzzling. The statistics used are simple. Someone just needs to check the SAS/SPSS/R/Minitab/Stata/whatever output against the results reported in the articles.

    I agree that Wansink’s responses to the comments on his blog are very strange, but perhaps Wansink is not actually the person writing the responses. Wansink’s website lists several staff members who might be responding on Wansink’s behalf without really understanding. For example, there is a “communications specialist” who is responsible for editing website content, coordinating media outreach, and handling social media. Another staff member does editing. And so on.

    • Carol: A sinister view of why they need an expert is they want to redo the analyses in such a way that they still give the same conclusions, but no longer contain any errors that we can detect. For a single paper this might not be difficult, but the sample sizes and other statistics need to be consistent across all four papers.

      Posting the data set would be very bold of them. The data set they post may be consistent with their new analyses, but there is no way it will also be consistent with the results which were published. I assume they only mentioned that they will try to post the data set in an attempt to further defuse the situation.

  5. “If you publish four different papers …make 150 errors in the process, …does this count as “misconduct”?”

    OK but you (or other commenters) cannot play the DA, jury, and judge. What bugs me is that there is no way to mediate or arbitrate situations like this one. Let’s say I’m an outsider such as a journalist. I have one famous professor publishing research that passes peer review, and he does not admit to any significant errors. I have another famous professor, and other bloggers or commenters, pointing out errors (or alleged errors–innocent until proven guilty?). How do I know who is right? I’ve asked this before, but commenters replied, essentially, “Well the errors are obvious.” Well no–not to a third party. Not to a journalist, or a Dean, or a Provost. And the original researcher can say, Look, the editors and referees didn’t see any problems.

    My point is, something’s missing. Post-publication review is great, but we need some way to get to a decision, like a post-publication editor’s decision. Otherwise researchers get defensive and some even end up calling their critics “terrorists”, which is ridiculous, but I’m looking for a way to break through because we have a stalemate right now.

    • Umm, the total number of diners for “Low prices and high regret: how pricing influences regret at all-you-can-eat buffets” is stated as 95.

      In Table 2 the sample sizes only add up to 89.

      Are you saying journalists don’t know how to add?

      • Jordan:

        I take the point that these are simple enough calculations, but are you somehow implying that a journalist should know what numbers to add together from the table to match the sample size previously reported? Or, put differently, why should a layperson like a journalist believe that their addition is correct, in comparison to an authority figure like Wansink? Or, put differently again, your argument pushes responsibility onto the journalist, dean, etc., to actually read the articles, when in reality that responsibility should fall on the original reviewers.

        The substance that I take from Jack’s point is this: how does a layperson resolve disputes between two putative authorities, a subject-matter expert like Wansink and stats/methods experts like Andrew? Your reply doesn’t solve that problem, because it ignores the subject-matter expertise that is required to know which numbers to sum.

    • Jack:

      What Jordan said. In my earlier posts I gave specific examples of inconsistencies in those publications. Not every journalist can run R, but some of them can, and others have friends who can check the numbers for them.

      I don’t know who the dean or provost are at Cornell but I bet they can do these simple calculations or they know people who can.

      But, sure, everyone has to make their own decision. Just as I wonder what it would take to get Brian Wansink worried about his research methods—apparently, having 150 errors pointed out to him is not enough to get him to seriously reassess—I also wonder what it would take for the Ted talk people to stop promoting power pose. You’d think that a detailed retraction by the first author of the original paper would be enough, but apparently not.

      To me, the scariest thing about the whole Wansink episode is that almost the exact same thing happened 5 years ago, Wansink dodged, and just about nobody was the wiser. This time seems different but only because of the intense level of publicity.

    • First thought: +1 for the points you’re raising.

      Follow-up thought: I don’t believe there will be some formal mechanism for adjudicating these conflicts. Absent that, there are a variety of heuristics one might rely upon. First, Gelman and other recognized statistical/methodological authorities should carry greater weight than the claims of the original author if one assumes/knows that the original author (or team of authors) does not include a statistical/methodological expert.

    • I’ve thought about this a lot. I think one of the most plausible explanations is that they wrote all four papers at the same time and maybe mixed up the analyses, but this doesn’t explain errors in their non-pizza papers. So yes, maybe they are doing calculations by hand.

      • My best guess is that there is missing data which, when coupled with procedures that vary in their use of listwise vs. pairwise deletion, could produce some of the inconsistencies.
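        As a toy illustration of that mechanism (made-up numbers, not the lab’s data), the same five-row dataset yields three different sample sizes depending on how missingness is handled:

          import numpy as np
          import pandas as pd

          df = pd.DataFrame({"slices": [2, 3, np.nan, 4, 1],
                             "rating": [5, np.nan, 3, 4, 2]})

          # Pairwise deletion: each statistic uses whatever rows are available
          # for that column, so different tables get different effective n's.
          print(df["slices"].count(), df["rating"].count())  # 4 4, but different rows

          # Listwise deletion: only rows complete on every variable survive.
          print(len(df.dropna()))  # 3, while the full file has 5 rows

        None of these n’s need match the total reported in the participants section, which is just the kind of inconsistency the critics keep finding.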

        • Anon:

          Could be. For example, in the table I pasted into this post, perhaps the research assistant used a different dataset for the third column than for the first two.

          It also seems likely that nobody in the lab knew what they were doing. They could’ve made all sorts of mistakes such as dividing by n rather than sqrt(n), or using the wrong n’s, or mistaking standard deviation for standard error, or sometimes dividing by n and sometimes by n-1 when computing standard deviations, or coding missing data as -9’s and accidentally including them when computing averages, or . . . there are just endless ways of getting things wrong.

          As Tolstoy never wrote, all competent data analyses are alike, each incompetent data analysis is incompetent in its own way.

          It’ll be a good thing once Wansink writes up this guide to research practices that he’s planning. Who better to do it than someone who managed to get 150 errors out of 4 papers from a single failed study? He can perhaps team up with Richard “no data point left standing” Tol.
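          Getting back to that list of possible mistakes: here’s a toy sketch (hypothetical numbers, not from any of these papers) of how cheaply a few of them generate divergent published values:

            import numpy as np

            x = np.array([1.0, 4.0, 6.0, 9.0])
            n = len(x)

            sd = x.std(ddof=1)                  # sample SD (divides by n-1)
            print(sd / np.sqrt(n))              # 1.68: the standard error, done right
            print(sd / n)                       # 0.84: dividing by n rather than sqrt(n)
            print(sd)                           # 3.37: reporting SD where SE was meant
            print(x.std(ddof=0) / np.sqrt(n))   # 1.46: mixing the n and n-1 conventions

            # Coding missing data as -9 and accidentally averaging it in:
            y = np.array([1.0, 4.0, -9.0, 9.0])  # the -9 was meant to mean "missing"
            print(y.mean())                      # 1.25 instead of 4.67

          Four different “standard errors” and two different means from one tiny dataset; multiply that across dozens of table cells and a large error count stops being so mysterious.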

        • Andrew: I’d be very surprised if there were not already a “guide to research practices.” Perhaps SAS has one? Or SAGE, which publishes those “little green books.” I’ll look around.

        • Here’s one: Data Management for Researchers: Organize, Maintain, and Share Your Data for Research Success by Kristin Briney (see Amazon.com).

        • Carol:

          In one of the revisions to his notorious post, Wansink wrote, “When we finish these new SOPs (and test them and revise them), I hope to publish them (along with implementation tips) as an editorial in a journal so that they can also help other research groups.”

    • anon: Well, one possibility is an inexperienced graduate student who perhaps has not had much statistical training and is not a native speaker, combined with one or more professors who are not paying much attention.

      Many years ago, a graduate student in social psychology came to my office eager to show me the wonderful results of her study. I flipped through page after page of her fat line-printer output. Nada. Not one p value > .05. I still don’t understand it.

    • There are a lot of ways you can get these sort of things wrong. Here is a list of things I could think up.

      – Simple data errors where there is a minor miscoding of some sort that leads to data exclusion.
      – Unreported conditions for inclusion that reduced the sample size.
      – Copying and pasting non-automated table generation and mixing it up.
      – Fraud but sloppy fraud.

      Since these papers appear to have been pushed out fast, I’m a fan of some minor errors from code or processes that were never double-checked. Toss in some junior grad or undergrad students doing grunt work and there is plenty of room for mistakes. Since the results look good, why bother double-checking? This seems like a simple explanation.

        • I’m not sure. I think it seems crazy at first sight because, in general, we never audit papers much. 150 errors might be more common than we think.

        • It is. One consideration is that 150 errors in the results is not necessarily the result of 150 independent mistakes. Since many of the errors were of the same type or in the same table, they could be the result of a few mistakes, each breaking an entire table. This seems to happen a lot with my code and projects, where a single mistake causes multiple errors down the line.

          If it’s true, this would mean either no quality control, causing lots of little mistakes, or something that corrupts the whole dataset from the start. I’m thinking along the lines of Reinhart and Rogoff, where an error near the beginning of a project with the dataset breaks everything past that point.

          Regardless of why it happened, it shows that the work produced by this lab is not credible until they can provide a transparent explanation of what went wrong, review/correct old work, and adopt new quality-control methods, statistical or otherwise.

      • Paul Fisher: I agree. One of the problems is surely missing data, which the GRIM programs don’t handle, if I understand correctly. But other problems are evident, too, such as reporting a df larger than the number of subjects. As I wrote earlier, my guess is that this is sloppy work by an inexperienced and rushed graduate student who may not have much statistics under her belt and was not supervised properly by the professor(s) involved. I’ve often helped graduate students and sometimes undergraduate students with their research projects, and the number of mistakes that they can make is astounding. And sometimes those mistakes cascade.

        I doubt that there is fraud involved (or perhaps I just really hope not), with the possible exception of the authors not acknowledging that they were using the same dataset in all four articles. The graduate student may not have known to do this, but the professors involved surely did.
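        (To make the GRIM point concrete: the core check is simple enough to sketch in a few lines. This is a simplified sketch of the idea, not Brown and Heathers’s actual code; it assumes integer-valued responses and a known n, which is exactly why missing data trips it up.)

          def grim_consistent(reported_mean, n, decimals=2):
              """Can some integer total k satisfy round(k/n, decimals) == reported_mean?"""
              k = round(reported_mean * n)  # nearest candidate integer total
              return round(k / n, decimals) == round(reported_mean, decimals)

          print(grim_consistent(3.44, 25))  # True: 86/25 = 3.44 exactly
          print(grim_consistent(3.45, 25))  # False: no integer total gives 3.45 when n = 25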

  6. Wansink is unfortunately just one of a growing number of academic psychologists who have made their name and very substantial fortune off the publicity that they attained from a specific paper or small number of papers that were later found to be characterized by various fatal flaws or that simply could not be replicated (e.g., Fredrickson, Cuddy, Duckworth). The general pattern seems to be to ignore the criticism and simply go on as if nothing had happened – continuing to cash in on the speaking and book circuit while making occasional placating noises in academic circles. Their host institutions seem to be quite happy to go along.

    • It seems to me that a further common factor in cases such as these is that those researchers often have a substantial number of followers — whether these be individuals, institutions (such as school systems), or simply publishers who are hoping to earn back the 6- or 7-figure advance they paid for the self-help book — who have a substantial intellectual, emotional, and/or financial investment in the pronouncements of these researchers being “true”, and hence continue to drink the Kool-Aid for rather longer than the evidence might justify.

    • Marcus: The senior psychologists often bring substantial grant money into their institutions, so those institutions have a strong incentive to look the other way.

  7. “Sure, all of this put together still only rates a zero on the Richter scale compared to what’s happening every day in Washington, D.C., but it still bothers me that this is standard operating procedure in big-time science. From a certain point of view, it’s just kind of amazing.”

    How much is nature and how much is nurture? We don’t really know. There are a lot of people in the world who seem pretty crazy to me because they see the world so differently than I do. But they might symmetrically see me as crazy — and might see responses like Wansink’s as being kind and tolerant to a crazy person who doesn’t “get it”.

  8. About the data release issue, Andrew argued that nobody is under the obligation to release data, and Wansink maintained that he had promised anonymity.

    The German Research Foundation (DFG) takes a hard-line position on this point. Here is the relevant text from the Good Scientific Practice document that every researcher who gets funded by the DFG (and this must be pretty much everyone in academia in Germany?) must agree to follow. The document has this explosive statement:

    “The published reports on scientific misconduct are full of accounts of vanished original data and of the circumstances under which they had reputedly been lost. This, if nothing else, shows the importance of the following statement: The disappearance of primary data from a laboratory is an infraction of basic principles of careful scientific practice and justifies a prima facie assumption of dishonesty or gross negligence (9).” (pages 75–76)

    Here is the document: the English section comes after the German text.
    http://www.dfg.de/download/pdf/dfg_im_profil/reden_stellungnahmen/download/empfehlung_wiss_praxis_1310.pdf

    • In this case I don’t think anyone has claimed that the data have been lost; apparently they exist (although as Andrew has noted, they cannot possibly be consistent with the reported results) and the lab is simply declining to share them.

      But I agree with the wider point that the alleged physical unavailability of primary data within less than five years of collection (and frankly, now that we don’t use diskettes any more, that could be longer) is very likely to be evidence of either some kind of malpractice, or else a level of massive incompetence that suggests that nothing that comes out of the lab in question can be trusted. I cannot believe that so many people who call themselves scientists appear to be unable to operate a computer to the point where they can insert a USB stick, create a directory whose name is today’s date, and drag a file to it.
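      Just to underline how low that bar is, the whole “protocol,” as a sketch (the drive path and file name are hypothetical):

        import shutil
        from datetime import date
        from pathlib import Path

        # Create a directory named for today's date on the backup drive and
        # copy the raw data file into it. That's the entire archiving task.
        backup_dir = Path("/media/usb") / date.today().isoformat()
        backup_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy("study_data.csv", backup_dir)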

      In many cases, the cost of collecting those data may have been where half or more of the grant money went. If I ever play the lottery and discover I have a winning ticket, I’m not going to stand outside in the wind and wave the ticket around; I’m going to place it inside a very thick book and drive very slowly down to the claims office.

      As Anthony Bourdain writes (https://www.theguardian.com/books/2000/aug/12/features.weekend1), “I won’t eat in a restaurant with filthy bathrooms. This isn’t a hard call. . . . If the restaurant can’t be bothered to replace the puck in the urinal or keep the toilets and floors clean, then just imagine what their refrigeration and work spaces look like.”

      • I guess it’s not unheard of that people lose their data (data loss does happen), so I actually think that the DFG statement is a bit too sweeping. I have had people who were not able to release their data, for good reasons that effectively amount to data loss; I don’t believe there was any intent to hide anything. In one case, a researcher just didn’t have the time to release the data, which is highly plausible because it can take a long time to pull together data that is 5 or so years old. I myself try to put up all the data and code online as soon as the paper is published. I know that some other labs in psycholinguistics do that too. It would be great if a paper just came with the data + code, so one didn’t have to ask.

        Related: Amy Cuddy deserves a lot of credit for releasing the data from her original Ted-talk study. I think that she should be recognized for having done the right thing in this whole power pose discussion.

      • Nick: Unfortunately, in psychology at least, graduate courses in statistics and methods do not usually teach practical skills like managing your data and your research projects, etc. Sometimes a graduate student will learn these skills from the professor heading the lab, or from the more senior students in the lab, but often this is not so. I once threatened to teach such a course; I thought that I would be insulting the professor to whom I made this suggestion, but his eyes lit up.

        It is highly unlikely that “In many cases, the cost of collecting those data may have been where half or more of the grant money went.” The bulk of grant money in the USA goes for (1) equipment, materials, and services, (2) graduate research assistant salaries, tuition/fee waivers, and benefits, (3) conference travel for the PI and graduate students, (4) summer salary for the PI, (5) indirect costs, which can be 60-80% of the direct costs, and so on. At large research universities, there is usually a substantial subject pool (undergrad students who get course credit rather than payment for participating in faculty/student research projects). And nowadays there is Mechanical Turk, etc.

  9. We’ve just had some very interesting developments.

    First, we have noticed that all of our archived versions of Wansink’s blog at the Wayback Machine have been taken down.

    We had the blog archived at two different urls:
    http://www.brianwansink.com/phd-advice
    http://www.brianwansink.com/phd-advice/the-grad-student-who-never-said-no

    Both are now missing all previously saved versions.

    This isn’t really that big of a deal, since we have been saving PDFs of the blog, but it is interesting. This entire time Wansink has either been ignoring us or playing defense; this appears to be his first proactive measure.

    Second, the Food and Brand Lab deleted their Throw Back Thursday tweet about the pizza paper.

    I kind of wonder if they read this blog to stay on top of what we’re doing. If so, I don’t mind; they need all the help they can get.

  10. Jordan Anaya: I’m curious. How can *someone else* remove, or ask to have removed, files that *you* saved like this, without notification to you? I’ve heard of the Wayback Machine, but I don’t know how it works.

    Wansink’s lab has a “communications specialist” who handles social media. Perhaps she is monitoring the situation at Wansink’s request.
