Organizations that defend junk science are pitiful suckers get conned and conned again

[cat picture]

So. Cornell stands behind Wansink, and Ohio State stands behind Croce. George Mason University bestows honors on Weggy. Penn State trustee disses “so-called victims.” Local religious leaders aggressively defend child abusers in their communities. And we all remember how long it took for Duke University to close the door on Dr. Anil Potti.

OK, I understand all these situations. It’s the sunk cost fallacy: you’ve invested a lot of your reputation in somebody; you don’t want to admit that, all this time, they’ve been using you.

Still, it makes me sad.

These organizations—Cornell, Ohio State, etc.—are victims as much as perpetrators. Wansink, Croce, etc., couldn’t have done it on their own: in their quest to illegitimately extract millions of corporate and government dollars, they made use of their prestigious university affiliations. A press release from a “Cornell professor” sounds so much more credible than a press release from some fast-talking guy with a P.O. box.

Cornell, Ohio State, etc., they’ve been played, and they still don’t realize it.

Remember, a key part of the long con is misdirection: make the mark think you’re his friend.

301 thoughts on “Organizations that defend junk science are pitiful suckers get conned and conned again”

  1. I’m curious whether going after the schools’ or institutions’ largest donors might be the best alternative. I imagine the individuals who fund a lot of these results are removed from the problems in the work (with the exception of government sources). But I would also like to believe that individual donors DO NOT want to fund shoddy research. If a donor were presented with alarming findings or results, that might push the needle for universities to respond. That said, doing something like this could be seen as unfair, given that the original researcher is unable to defend themselves.

  2. The PACE trial seems another good example of this, but here it is a wide range of institutions who have not just invested their own reputation in this research, but were also involved in attempts to smear those patients who were the first to raise concerns. Not only do a wide range of UK research and media organisations now have an incentive to turn a blind eye to the problems with this research, but it is also tied to an ideologically driven approach to cutting disability benefits that had support from all three of the major UK political parties, despite these cuts having recently been condemned by a UN inquiry as having led to “grave and systematic violations” of disabled people’s rights. Misleading research into disability, ill-health and rehabilitation can have a profound effect on how people are treated, and I fear that many researchers are oblivious to the harm being done outside of the bubble of academia. There’s a real moral responsibility to get these things right.

    As with Wansink, it was only when academics from outside of the field started looking at problems with PACE that any real attention was paid. If it were not for the campaigning of patients, or Wansink’s blog, it is easy to imagine that this work would not have attracted any real critical attention from within academia.

  3. As a professor, surely you know the purpose of an institution such as a university is the perpetuation of itself, because only through its own existence can it then fulfill additional missions such as teaching undergraduates, adding to knowledge, and paying salaries that make life in a college environment something people prefer. You want to re-center the distribution to something more akin to a standard of intellectual honesty in research. But where in the set of reasons for existence is research itself? I’d say it’s behind perpetuation, behind many of the attributes associated with perpetuation – from money to prestige – and that it conflicts with some of those elements: an example would be that prestige can be achieved by publicity (even if for crap work, as long as it doesn’t involve certain but not all instances of outright theft, meaning those that manage to attract general opprobrium), so prestige may be the number of papers printed or a citation ranking. I can’t see how you could re-center the distribution the way you want. Anyone modeling the behavior would say this result describes the system well, that you’d expect defenses from institutions because they easily see pathways to their self-perpetuation in defense and only risks in acceptance (which in some ways becomes treated as defeat). I’d argue the identification of acceptance with defeat is a huge problem generally, but that’s not the point.

    The thing I really like about what you’re doing is the way you’re saying this matters so much. There is a deep relationship between statistics and ethics, though perhaps not to morals. I think in many of the cases the people involved are both attention- and validation-seeking but also generally what one might call well-intentioned. Does anything show the concept of multi-level modeling better than the shift from ‘well-intentioned = larger moral choice’ (like “I’m not killing someone”) to ‘well-intentioned = context of research studies’ (and then into or up to the next level of statistically reported research studies)? And you have one thread running out of ‘larger moral choice’ from ‘it isn’t hurting x’ to ‘I believe it’ (and sub-threads like ‘I believe it absolutely even under torture’ to closer to ‘well, at least it isn’t hurting x’). Is there a word ‘upto’? Should be.

  4. Oh no, all this talk has reminded me that my own alma mater of UC Berkeley has its own con artist!

    I was only an undergrad at the time but I spent most of my time around grad students and I heard them talk about him. Peter Duesberg holds some very controversial scientific views, and has tried to get them published:

    Basically he doesn’t believe HIV causes AIDS and he believes cancer is due solely to aneuploidy, or something…I haven’t bothered wasting my time reading his work:

    And guess what? Apparently UC Berkeley conducted an investigation into his controversial HIV work but found “insufficient evidence…to support a recommendation for disciplinary action.”

    To be honest, I don’t know quite how bad this researcher is. I don’t know if he belongs in the same circle of hell as Wakefield. I remember some discussion at Berkeley about there being value in someone voicing contrarian opinions.

    So unlike Wansink, who just told everyone what they wanted to hear, this guy has been going against a mountain of scientific evidence. I suppose you’ve got to admire that to a degree. If you have to quack, you should make sure you quack loudly and clearly.

    • Well, I think there are very important differences between holding unpopular/incorrect views, and misguiding people about data/level of evidence, etc.

      I think it’s very important to have unpopular views still be heard in the scientific community, especially if the researcher can provide valid evidence. Even if the researcher cannot provide evidence, I think there is nothing wrong with them holding their views. It’s just that the scientific community should (at least eventually) stop paying attention when they realize they cannot provide valid evidence to support their hypothesis.

      Which is entirely why it is so important that researchers do not misrepresent their level of evidence.

    • Jordan:

      Duesberg spoke in the stat dept back when I was teaching at Berkeley, decades ago. It was a really weird talk. I think the guy who invited him had some sort of pride that he was inviting someone so controversial. The whole thing was strange, in that, sure, Duesberg had a right to speak, but it wasn’t clear what this had to do with statistics.

    • To be honest, I don’t know quite how bad this researcher is. I don’t know if he belongs in the same circle of hell as Wakefield

      Duesberg’s mavericky denial advocacy made it easier for a few vitamin pimps to convince Mbeki that South Africa could defeat the AIDS problem with vitamins and beetroot and garlic rather than with anti-retroviral therapy.

      But it would be hard to determine how many of the 343,000-365,000 preventable deaths should be attributed directly to Duesberg’s tenure and his proud independence of thought. Ultimately the scammers and the South African authorities had more agency.

    • Don’t forget, almost all this stuff is based on stringing together NHST tests done by people who think p-hacking is normal. So I wouldn’t be so quick to accept the mainstream opinion…

      At this point, cancer research has devolved into “cancer is many diseases so we can never find that cure we were promising”, and now it is believed that HIV “mutates so fast it may be impossible to make that vaccine we were promising”. So I’d say both research programmes have been disappointing, and definitely have not delivered on initial promises after the hundreds of billions of dollars funneled their way. I wouldn’t be at all surprised if there were some major conceptual errors sitting at the center of current understanding.

      • +1
        Yet it might be the case that the hundreds of billions of dollars spent were necessary for people to really see that the original programs were fairy tales.

      • Yes, I’m a biologist and I’m very aware that p-hacking is not limited to psychology. I also really value questioning what we know, but I haven’t found any merit in Duesberg’s views.

        I think a recent controversy in the NBA may be illustrative here. Kyrie Irving claimed that the Earth is flat, and some NBA players backed him up. When people challenged him he said things like “how do you know the Earth isn’t flat”.

        I actually found his position more defensible than most. A lot of people said things like “how do you fly to Australia if the Earth is flat”. I didn’t find this argument convincing; of course you could fly anywhere if the Earth were flat. A more convincing argument would have been something like “how can you fly in the same direction and end up back where you started”.

        To me the simplest argument is that when you look up at the moon or sun they appear circular, so by deduction the Earth, being another celestial body, is also probably spherical.

        I think we could use a lot of similar simple arguments to show that Duesberg is simply a quack, and a stupid quack at that.

        Just because each cancer is unique (why wouldn’t they be unique), that doesn’t in any way imply cancer is not due to unique mutations and instead is due to aneuploidy. We have developed specific drugs such as imatinib or Herceptin that target specific alterations, and they are very effective. Why do these drugs work if the only thing that matters is aneuploidy? Similarly, we have various drugs for HIV which are very effective. You can even be “cured” if you get a bone marrow transplant from a donor with a CCR5/CXCR4 mutation. If HIV doesn’t cause AIDS then why do these treatments work?

        So yeah, I guess I value contrarian opinions, but only if they make some good points. If Duesberg said something like “we need to be careful about what we think we know because of the reproducibility crisis” I would take him a lot more seriously.

        • Just because each cancer is unique (why wouldn’t they be unique), that doesn’t in any way imply cancer is not due to unique mutations and instead is due to aneuploidy. We have developed specific drugs such as imatinib or Herceptin that target specific alterations, and they are very effective. Why do these drugs work if the only thing that matters is aneuploidy?

          I think you are way underestimating how easy it is to come up with an explanation for why some drug “works” (i.e., that A is correlated with B). A major thing I learned from my time in biomed research is that this is simply too vague a “prediction” to be of any use in distinguishing between explanations. For example, this is from a paper with Duesberg as author:

          Recently, we have demonstrated that centrosome defects are an early event in the transformation process of CML and occur prior to chromosome instability at the earliest identifiable step in CML development.18,19 Since CML is considered to be caused by a single genetic event, some authors argue that early centrosome aberrations are the consequence of BCR-ABL expression, leading to karyotype alterations, aneuploidy, and genetic instability.19 The occurrence of similar karyotypic alterations in BCR-ABL-negative progenitor cells under imatinib raises the question whether centrosome-associated mechanisms possibly involving ABL or other imatinib-sensitive kinases may be responsible.

          So… their idea is apparently that BCR-abl activity causes aneuploidy, which causes cancer, thus imatinib “works”. Not hard at all.

          Another thing, regardless of whether aneuploidy causes cancer, it is widely recognized as a property of cancer cells. So if we found a way to target aneuploid cells, we would have a general cure for cancer. For that reason, I would say that cancer is “one disease”.

          Regarding HIV, it seems your “cure” argument is based on a single person. I looked into it a bit and it wasn’t clear to me whether the AIDS symptoms also disappeared at the same time (beyond testing positive on HIV), so I am not sure that is relevant to Duesberg’s arguments. I couldn’t quickly find what Duesberg’s argument about the effectiveness of HAART is exactly, perhaps he just claims those studies are flawed? I haven’t checked that, but I bet we could pretty easily come up with some alternative narratives for those too (once again, all we have to explain is a vague “increase/decrease” to compete with the mainstream).

        • Again, I value people questioning established beliefs, but I haven’t found Duesberg’s arguments (that I’ve seen) at all convincing, and definitely not convincing enough to put decades of work into doubt.

          About the CCR5 “cure”. Yes, it was a single person, but people with this mutation do not contract HIV. They have a natural immunity, and there are a lot more than just one of them. I think it is hard to come up with an explanation for why people with a mutation in the receptor for HIV do not get HIV/AIDS. But I suppose that wouldn’t stop Duesberg from coming up with one. Perhaps the CCR5 gene is strongly genetically linked to another gene, and this gene is what prevents these people from getting HIV/AIDS. AIDS has nothing to do with HIV; it is just a huge coincidence that people without CCR5 don’t get AIDS. So what is this magical genetically linked gene? Gee, I don’t know, maybe it has something to do with aneuploidy and cancer as well, and this is the secret to curing cancer. We might as well just go ahead and hand this guy his Nobel Prize now.

        • I think it is hard to come up with an explanation for why people with a mutation in the receptor for HIV do not get HIV/AIDS.
          it is just a huge coincidence that people without CCR5 don’t get AIDS.

          I assume you are talking about the “delta 32” allele, in that case your premise is wrong:

          The wild-type/delta 32 heterozygous and delta 32/delta 32 homozygous conditions were represented in 10.7 and 0.8% of healthy controls and in 9.8 and 0.7% of HIV-1-infected subjects, respectively.

        • Similar results here:

          We examined 901 healthy individuals from several Sicilian provinces. We found a mean (+/- standard deviation) delta32 allele frequency (fr) of 0.04 +/- 0.012. The highest value was observed in the province of Messina, with a mean delta32 allele frequency of 0.06 +/- 0.024, where we collected samples from a cohort of 114 HIV-1-infected individuals. The observed frequency amongst these patients was quite low (fr = 0.03 +/- 0.031) compared to the healthy population, although the difference was not statistically significant.

          Actually this is the first I’ve looked into the CCR5-delta32 aspect. I am surprised to see results like this in the first few primary research articles I came across.
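As a rough sanity check, here is a back-of-the-envelope two-proportion z-test on the quoted Sicilian numbers. The allele counts are reconstructed from the reported frequencies (two alleles per person), not from the raw data, so treat this as an approximate sketch rather than a reanalysis; it agrees with the paper that the difference is nowhere near significant.

```python
import math

# Back-of-the-envelope two-proportion z-test on the quoted Sicilian
# delta32 frequencies. Allele counts are reconstructed from the reported
# frequencies (2 alleles per person), so these are approximations.
n1, p1 = 2 * 901, 0.04  # healthy individuals: allele count, delta32 frequency
n2, p2 = 2 * 114, 0.03  # HIV-1-infected individuals: allele count, frequency

x1, x2 = p1 * n1, p2 * n2           # implied delta32 allele counts
p_pool = (x1 + x2) / (n1 + n2)      # pooled frequency under the null
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(f"z = {z:.2f}")  # well below the 1.96 cutoff for p < 0.05
```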

        • Sorry, forgot the link to the Sicilian paper:

          I guess it’s just a coincidence these people get infected by CXCR4 tropic HIV strains instead.

          Sure, we would then have to look at that data too. Do you have a link? They don’t measure this in the two papers I looked at.

          I am just trying to clarify what exactly you are claiming first. In particular, people without (ie homozygous for the delta32 allele of) CCR5 apparently do get diagnosed as HIV positive (not sure about AIDS, which requires other symptoms, from either of those papers).

        • I guess you and Duesberg are going to be sharing a Nobel Prize once you guys prove HIV doesn’t cause AIDS. I’m really looking forward to reading your Nature publication.

          To help you get a head start here’s a great video:

          Duesberg says HIV can’t be the cause of AIDS because it’s the only case ever of a microbe causing disease after antibodies have formed. Oh dear, how can we possibly argue with logic like that!?! It couldn’t be because it takes a while for HIV to deplete our CD4 counts, could it? And his statement isn’t even accurate: when you are first infected, the immune response causes flu-like symptoms. This guy is a clear quack, and without watching more of the video I’m just going to put him in the Wakefield drawer.

          This discussion has been interesting, I guess it shows how dangerous lay people can be when they think they have enough information to comment on something. I’m not a statistician and I don’t bother to read or comment on any of the statistical discussions on this blog because I know my opinions are probably going to be uneducated and wrong. HIV and biology are complicated topics, and if you take the time to learn more biology I don’t think you will side with Duesberg, but until then I think it would be better for you to trust people who are more knowledgeable on this topic than to make uninformed statements.

          Yes, I believe there is a reproducibility crisis in biology, and results supported by a single paper with limited experimental results should be taken with a huge grain of salt, but I don’t believe that work with decades of supporting evidence in biology is wrong. I have not personally done any work on HIV, so I can’t personally confirm any of the work in the field, however I have been able to independently reproduce the results of several high profile publications in small RNA biology. If all of the work on HIV is wrong I would have to imagine that would mean most of what we think we know in biology is also wrong, and I have not found that to be the case.

        • HIV and biology are complicated topics, and if you take the time to learn more biology I don’t think you will side with Duesberg, but until then I think it would be better for you to trust people who are more knowledgeable on this topic than to make uninformed statements.

          Sorry, can you quote what I have said to make you think I have “sided with Duesberg”? Also, quote the “uninformed statements” you refer to?

          I don’t believe that work with decades of supporting evidence in biology is wrong.

          I do; in fact, this is easy to do in our world of institutionalized NHST (+ p-hacking). The goggles are NHST, and they offer zero protection against the proliferation of misinformation:

          I have been able to independently reproduce the results of several high profile publications in small RNA biology.

          Figuring out reproducible methods is the (relatively) easy part. The hard part comes after. Once you have data to “hang your hat on”, the orders of magnitude more difficult task of interpreting the results correctly begins. We do know that most of biomed can’t even do the first step yet, but it does happen.

          If all of the work on HIV is wrong I would have to imagine that would mean most of what we think we know in biology is also wrong, and I have not found that to be the case.

          How did you determine what was right or wrong? The problem is that the usual method being used to do this is not fit for purpose: NHST.

        • Jordan, Jordan. You’ve got it backwards. You’ve been punk’d your whole professional life and everything you know is false. In fact, you have negative knowledge. Now is your red pill moment!

          In retrospect, I don’t know why people (including former me) think this is so exceptional. It is some kind of cognitive bias.

          Think about the thousands of years during which the most educated members of society spent their lives arguing theology. Today many of us would consider all that to have amounted to literally meaningless drivel that led to mass suffering. Back then, they would have scoffed at that possibility just as you are at the possibility it is currently happening to you.

        • Anoneuoid: I finally see that you are correct.

          Yes, because we don’t have a cure for cancer all of biology must be wrong.

          Similarly, because we don’t have time travel all of physics must be fundamentally wrong.

          We need to take all of the knowledge we have accumulated the last several thousand years and burn it. It’s all tainted by that filthy NHST.

          It might appear that some of our knowledge has led to technological advances, but it was all just a massive coincidence. If you try enough random things something is bound to work.

        • Anoneuoid: My point is that you throw out the baby with the bathwater. As far as I can tell, your position is that when NHST is involved it’s not possible to develop any useful knowledge. My position is to grant that the problems you rightly raise create gross, massive distortions, but that we find out real stuff sometimes in spite of it. That’s because the durable knowledge we do get is not actually being built by reliance on NHST, but a more distributed diffuse process where the core finding has been replicated and checked over and over.

          There is more in biomed than the current hot ideas in cancer and Alzheimer’s and HIV and whatever. It’s true as it ever was that both specific findings and whole frameworks will turn out wrong. Some areas, I think we would probably agree, are already obviously wrong. But I’m pretty confident we aren’t going to find out that DNA bases don’t pair, or that neurons don’t have electrical activity, or that the estrogen receptor doesn’t bind any small molecule at all, or… or…

        • Am I being Punk’d right now?

          No, you just haven’t realized the true nature and extent of the problem yet. Reproducibility issues are the tip of the iceberg. It took me a long time to accept too, the damage done is really unbelievable.

          Regarding Duesberg, I am not too familiar with HIV research but I know a few things that really convince me the standard narrative is wrong. For example, it is supposed to be very difficult to transmit sexually (~1-10 per thousand exposures) and upon a new infection it seems that (despite the transmitter having huge diversity of variants) only 1-3 viruses ever get transmitted.[eg ref 1 seems to be a good summary]

          This seems very unlike a virus, so Duesberg is right that something seems very wrong here. To me though, it seems much more like an entire cell that gets transmitted and then the virus passes directly from cell-to-cell. In that case, these continued efforts to create a vaccine are a waste.


        • Just wait until Anoneuoid tells you that CRISPR is not real…and that neither is sequencing

          These are both strawmen. The CRISPR-Cas9 issue is that double strand breaks are known to kill the vast majority of cells or lead to growth arrest (NHEJ is just a minor occurrence)[eg, 1], and the papers always report that 1/100 to 1/1000 cells are already mutants at the target site in the control group[eg, 2]; therefore their results can very well be explained by selection for pre-existing mutants.

          [1] table 2
          [2] figure 3

          The sequencing issue is related to the above in that if you get back a single sequence from a population of cells, it may very well not correspond to the actual sequence of any cell. I need to look into the consequences of this more though, it may not be a big deal.

        • If you get back a single sequence from a population of cells, it is almost guaranteed not to correspond to the actual sequence of any one cell. This is exactly the point about the “typical set” that we discussed in Bob’s post a while back. If there are mutations at various points with a certain probability then the “consensus” sequence is very much like the “zero” sequence in a sequence of normal(0,1) random variables. It never happens.

          The question though, is whether that matters much. We have several issues:

          1) “very bad” mutations kill the cell
          2) “meaningless” mutations don’t even change the protein coding or regulatory environment. Everything works the same.
          3) “minor” mutations change things in minor ways that cause individual cells to survive in their environment in better or worse condition but not by a large amount

          4) “major” mutations that don’t kill the cell do things like cause cancer, and then it’s a matter of whether the immune system clears the cell or not

          etc etc.

          It would be insane to think that a gigabase of DNA sequence is identical in every cell in your body, just on the face of it mathematically.
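A quick toy simulation makes this point concrete. All parameters here are made up for illustration, not biological estimates: draw a population of cells from a reference sequence with a small per-base mutation probability, take the majority-vote consensus at each position, and count how many cells match the consensus exactly.

```python
import random

# Toy simulation: the consensus of a mutating cell population is
# essentially never the exact sequence of any single cell.
# All parameters below are illustrative, not biological estimates.
random.seed(0)
BASES = "ACGT"
seq_len, n_cells, mut_prob = 1000, 200, 0.01

reference = [random.choice(BASES) for _ in range(seq_len)]

def mutate(ref):
    # each base independently mutates with probability mut_prob
    return [random.choice(BASES) if random.random() < mut_prob else b
            for b in ref]

cells = [mutate(reference) for _ in range(n_cells)]

# majority-vote consensus at each position
consensus = [max(BASES, key=lambda b, i=i: sum(c[i] == b for c in cells))
             for i in range(seq_len)]

matches = sum(c == consensus for c in cells)
print(f"cells identical to the consensus: {matches} of {n_cells}")
```

With 1000 bases and a 1% per-base mutation rate, each cell carries roughly 10 mutations, so essentially none of them coincide with the consensus, exactly as the normal(0,1) analogy predicts.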

        • The shadow of the earth on the moon is always circular no matter where in the sky the moon is during syzygy; this is how the ancients knew the Earth was a sphere.

  5. Andrew, the idea that the universities are victims here is one possibility. But, what about an alternative view. As you probably know, the university administration is the fastest growing employment category on campus: eg.

    Now for the sake of argument, let’s say in the distant past the actual function of organizations like Cornell, Princeton, Harvard, and even less prestigious organizations like say UCLA or University of Minnesota or whatever was to educate undergraduates about things they wouldn’t have otherwise had an opportunity to learn. Whether it’s physics or journalism or food science or whatever, they actually taught classes and gained money by individuals paying tuition out of their own pocket.

    Fast forward to today, and the main sources of funds are large government grants to do biomedical research, and tuition paid not directly out of the pockets of families that want educations, but out of the pockets of families that have access to federal student loans that you can’t escape from via bankruptcy and have weird terms (you can defer payments for example, but the loan capitalizes the interest or whatnot).

    Now, add in massive increases in administration, a couple of decades of business schools teaching “Rent seeking for dummies”, and you have a recipe for Cornell and Harvard and wherever to become enormous tax-advantaged hedge funds whose purpose is to suck down grant money and tuition subsidies (student loans, federal student grants, etc.) and then use the prestige of the number of grants produced to further suck down tax-advantaged donations from other rent seekers who don’t have anything better to do with their massive piles of cash (names like Keck, and Broad, and Murdoch, and Gates, and Trump, etc.).

    Now, if you’re a large organization full of bright and cheerful newly minted Masters in Public Rent Seeking graduates, how do you accomplish this task of Hoovering up piles of cash from the government? Do you:

    1) Hire careful researchers who propose multi-decade projects that set up several new animal model systems for studying diet and satiety regulation through multiple avenues including behavioral, genetic, and hormonal feedback systems?

    2) Hire a slew of people who don’t even know that what they’re doing is wrong, who have a cargo-cult view of science and are willing to go out with some minor preliminary grant and a couple of undergrads, sit around at an all-you-can-eat buffet and record how many times people who sit next to windows eat sausage, etc. If they don’t like what they get, they “rescue” their data collection by publishing 4 papers full of p-filtered crap in 4 different journals, then apply for big government grants claiming a lot of preliminary data for studying how offering fish next to pizza increases the consumption of omega-3s, but only in overweight Caucasian males and underweight black females who are within their peak fertility period, but only if the fish has red sauce for the women and the fish is pasty white for the men.

    I’ll give you three guesses which one of these two options I think, and since we’ve never been in the same room together if you guess correctly we’ll count it towards a Bem ESP replication.

    From this perspective, Wansinks are just tools; the goal of today’s typical Uni administrator is “Hoover up the Cash”, and you can either claim to have large numbers of cutting-edge researchers pumping out multiple amazing true discoveries per year about problems that affect real live Americans, or you can have a faculty full of deep thinkers who study one basic important problem for decades until they finally have a truly scientifically sound body of evidence that some particular phenomenon is caused by an interaction between the foo receptor under a particular single nucleotide polymorphism but only when combined with the loss of the bar transcription factor binding site within a promoter region of baz… or whatever

    well, I know which one will be listed in the flow chart of the newly minted administrator’s Rent Seekers R U Powerpoint Deck.

    From this perspective, Wansink is just a tool, and I think that double entendre describes him pretty well. The real evil / con here is whoever attracted him to Cornell so they could use their new tool to suck down more funds.

    • Daniel
      I don’t think your characterization is far off, but it is far too simplistic. It requires the cooperation of colleagues, promotion and tenure committees, editors, referees, students, etc. So either they are all “tools” of these administrators, or the problem is more complex than you make out. In some ways they are all tools, just as we are all tools of a number of “rigged” economic and political systems. But if we use that to escape responsibility by blaming those “administrators”, then we let ourselves off the hook too easily.

    • Your explanation reminds me of a Hollywood maxim: actors/actresses aren’t paid to act. They’re paid to promote.

      Does the acting really matter if the flick is a commercial success?…Does the quality of research really matter if grants/cash are tumbling in?

      Long run implications may vary but certainly in the short run…

    • Daniel:

      I see your point. Still, at even the most crassly instrumental level, at this point I’d think Cornell would be better to cut their losses and stop defending Wansink. But I guess we could take your analysis one step further and see Cornell’s loyalty toward Wansink, or Ohio State’s loyalty toward Croce, as a result of a commitment device, signaling “we’ve got your back” to other current and future rainmakers.

      • Andrew

        I’m not aware that Cornell U. has made any judgments on Wansink’s research other than on the original 4 pizza articles. With the subsequent avalanche of flawed papers (38? more) turned up by van der Zee, Anaya, and Brown, Cornell’s views may change.

  6. “It’s the sunk cost fallacy: you’ve invested a lot of your reputation in somebody; you don’t want to admit that, all this time, they’ve been using you.”

    It’s not about sunk cost or being used. It’s about admitting that they themselves suck. It’s the very real cost of admitting their own failure to monitor the quality of their members; it’s the preservation of their own reputation that they are after.

    Check out UN’s Malcorra hiding pedophile cases in Africa. She went to the extent of firing the highly respected whistleblower Anders Kompass, 3rd in rank in its Human Rights division. It was all about Malcorra & Co preserving their own reputation. In classic Vatican style: the whistleblower gets fired, the protector of the reputation of the institution is considered for a promotion: head of the UN! The previous Pope also “proved his worth” to the institution by protecting pedophiles.

  7. Also, I have read the responses of Wansink to van der Zee et al.’s 150 pizza-paper errors. Many if not most of the errors appear to be due to inaccurately reported sample sizes (due to missing values) and so on. Very sloppy research, surely. But how is this to be considered research misconduct? An accusation of research misconduct is a serious business. Cornell may just be moving slowly and carefully before making such an accusation. I don’t blame it.

    The biggest strike against Wansink so far is one journal’s retraction of an article for duplicate publication, not the errors in the 4 pizza papers.

    • Anon:

      You ask how Wansink’s papers can be considered research misconduct.

      The answer is in this post, which summarizes a bunch of problems from various sources, including:
      – the table of carrots that did not add up;
      – the two tables with basically identical numbers but with sample sizes reported as 153 in one case and 643 in another;
      – the reports of three different surveys sent to 1002 people, 1600 people, and 2000 people, each one having exactly 700 respondents;
      – the table that has suspiciously few numbers whose last digits are 0 or 5;
      – the many many many tables that are not consistent with any possible data;
      – the study with students aged 8-11 who are also characterized as “preliterate children” and “daycare kids”;
      – and others.

      “The biggest strike against Wansink” is not duplicate publication, not at all. The biggest strike is that he’s publishing numbers which do not correspond to reality. This is indeed a serious business. It’s a disgrace.

      • Andrew, Anon:
        Yes, the pizza papers at this point do not appear to be fabricated, and if you believe that all of the non-sample size issues are due to incompetence/unusual statistics, then there is also no falsification and I guess no clear research misconduct.

        However, I think you could argue that not responding to emails and not sharing your data is a type of research misconduct. When did it become okay to not reply to emails?

        I could add to the list Andrew provided which hints that some sort of falsification/fabrication is going on, but accusing someone of research misconduct is very serious and maybe we should stick to clear cut cases of misconduct.

        Throughout this investigation I have noticed Wansink make inaccurate statements. Whether this is lying or incompetence I don’t know, but I also don’t know that it matters. Not being able to accurately talk about your work should be seen as a type of misconduct.

        Case 1:
        Wansink mentions he didn’t share the data with us because it was proprietary and because of IRB issues. However, in his interview with Tom Bartlett he says he thought about sharing it but decided not to because he thought it would make him look bad.

        Case 2:
        After Nick Brown’s blog post showing duplicate publication he released this statement:
        “a master’s thesis was intentionally expanded upon through a second study which offered more data that affirmed its findings with the same language, more participants and the same results”

        He didn’t just get the “same results”, he got the exact same results. It’s just mathematically impossible. Perhaps they just accidentally used the wrong table of results in the paper, but if that’s the case they should say that instead of lying.

        Case 3:
        In his defense of the carrot numbers he mentioned to Retraction Watch the quarter-plate method was used. According to the paper this is not the case. Either the paper is not accurate or he is misremembering or lying.

        Case 4:
        The pizza papers also have bowls of salad recorded. We noticed in our preprint how unusually large the numbers were. In the data release this variable is described as:
        “Mark the amount of salad you ate (continuous rating scale)”

        But in the paper they describe:
        “In the case of salad, customers used a uniformly small bowl to self-serve themselves and, again, research assistants were able to observe how many bowls were filled and, upon cleaning by the waitstaff, make appropriate subtractions for any uneaten or half-eaten bowls at a location outside of the view of the customers.”

        So either their data release is wrong or the methods in the paper are incorrect. Given the large values I have to believe the data release contains the correct description.

        Case 5:
        In his Retraction Watch interview he says:
        “These people were eating lunch and they could skip any question they wanted – maybe they were eating with their best buddy or girlfriend and didn’t want to be distracted. So that explains many of the differences in the base figures some readers had noticed.”

        This makes it sound as if the diners filled out the questionnaires as they ate. But in the papers it is clear they filled them out after they ate, and some of the questions wouldn’t make sense unless they were answered at the end.

        Case 6:
        In his book Slim by Design he mentions doing a survey of Chinese buffet diners using secret agent tools, but says they didn’t work and they ended up using a visual method, which is also what the paper says. However, in this YouTube video (at 22:20) he says that they actually did use lasers and hidden scales.

        Perhaps he just said that in his talk to make the study sound more interesting than it was, but is this science or science fiction?

        In that video he also mentions that they recorded 70 variables. Umm, p-hacking anyone?

        Okay, maybe all of that was just incompetence, and harmless incompetence at that. But once in a while he makes potentially dangerous statements. Here’s a quote from one of the pizza papers:
        “our isolated focus on the behavior of eating is particularly justified and valuable since radical interventions such as gastric bypass surgery offer the potential for people to engage in the behavior of overconsumption without the typical consequences for body shape.”

        This is a person who talks at grand rounds at Cornell. I wonder if he teaches them that shrinking your stomach is a great way to be able to eat as much as you want.

        P.S. Unlike Andrew I’m not sure what the biggest strike is, there’s just so much to choose from. I think there are valid concerns about falsification given the strange numbers, but in the end is that really worse than his rampant p-hacking? P-hacking and falsification have the same result. Then there’s been his response to these serious concerns, which at every step has been to downplay the problems, act as if they don’t affect his conclusions, and he somehow always manages to throw in a statement about how amazing his work is. Maybe we haven’t made the problems clear enough. Maybe we need to be a little more bold in exactly what we are accusing him of, I don’t know.

        • Jordan:

          You write, “if you believe that all of the non-sample size issues are due to incompetence/unusual statistics, then there is also no falsification and I guess no clear research misconduct.”

          Just to be clear: (1) I don’t think all the issues are due to incompetence/unusual statistics. The carrots, for example: how can that happen? Or the tables that correspond to no possible data?

          In their letter, the Cornell Media Relations Office linked to an NIH page which defined research misconduct as follows:

          Research misconduct is defined as fabrication, falsification and plagiarism, and does not include honest error or differences of opinion. . .

          Fabrication: Making up data or results and recording or reporting them.

          Falsification: Manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.

          Plagiarism: The appropriation of another person’s ideas, processes, results, or words without giving appropriate credit.

          As I wrote in my above-linked post, I think many of Wansink’s papers can be characterized as falsification in that he and his collaborators were changing or omitting data or results such that the research is not accurately represented in the research record.

          “Changing or omitting data or results such that the research is not accurately represented in the research record”: that describes all those tables that can’t correspond to any actual data.

        • I think it’s an interesting question (clearly, some of us have different approaches to the word “interesting” to most of the population!) whether it constitutes *actual research misconduct* to do a boring study where you estimate people’s BMI by looking at them, maybe using the height of the buffet counter as a helper to work out their height — as described in the article on eating habits at Chinese buffets — and then to describe the exact same study in a popular talk while claiming that you used lasers to measure their height, hidden mats to weigh them, etc.

          Assuming that the study took place as described in the article, is it a misconduct-level misrepresentation of the scientific record to then claim you did it a different way? And if so, who is the offended party? The article is just what it is (again, presuming it’s accurate). The talk is just a talk the guy gave to some random bunch of people who want to be entertained by the wacky professor from Cornell — who, incidentally (cf. the description of the fish sticks in the YouTube video linked to by Jordan, circa 11:50) seems to have very little compunction about mocking either the people whom he is deceiving, or the person cooking them. What authority is going to step in here, and on the basis of what violation of (written) norms?

        • Jordan, Andrew:

          Again, my point was that, to date, Cornell has issued only a statement regarding the 150 errors in the first four “pizzagate” articles, which it considers sloppy work, not research misconduct. I am not asserting that Wansink is not guilty of research misconduct, only that there is insufficient evidence from these first four papers for Cornell to have decided on research misconduct, which is a very serious accusation.

          Looking over all 42 papers, I think Wansink is in fact guilty of research misconduct, and I think that when Cornell has had a chance to review everything, it will come to the same conclusion. More than one journal-article retraction will be damning outside evidence.

        • Hi Jordan,

          Re “not responding to e-mails and not sharing your data is a type of research misconduct”:

          Perhaps, but this has been the norm. I’ll send you over private e-mail a couple of examples of messages that I sent to authors that never received a response.

        • I agree (based on my own experience) that not responding to emails seems to be the norm — at least, emails that point out problems with a published paper.

  8. Andrew

    My point was that Cornell, as far as we know, has not gotten past the 4 pizza papers yet, and so, *THUS FAR*, there is evidence of very sloppy research but not research misconduct. That may change eventually as Cornell gets to the other papers.

    • Jordan:

      I followed your link. What baffles me is his claim that the data actually exist. If his data exist, how come he keeps writing papers where the purported data summaries are not consistent with any actual data. How many carrots were there? Were the soldiers all exactly 18 years old? Etc. It seems like a big leap of faith to suppose that there are real data in all his studies.

      • Andrew:
        We called the pizza buffet and they confirmed that the lab had visited multiple times. There is also a YouTube video of them at the pizzeria.

        Given the date of the video I’m not sure that is the exact data in question.

        Looking through the released data I have not yet found evidence that it is fabricated.

        My theory is that if there was fabrication it occurred in studies pre-Cornell.

        Let’s imagine a purely hypothetical scenario. A researcher doesn’t get tenure twice. That researcher then realizes he can claim he has data from his previous institutions and no one would be any the wiser. The ensuing success allows him to obtain enough money and grants to actually do real studies. But there is still no reason a little fudging can’t be done here or there, I mean, who would notice? Dual publication? Sure, who reads this stuff anyways?

        • Jordan:

          I wonder what was going on with the carrots study, the numbers-with-not-enough-0’s-and-5’s study, the everyone’s-age-18 study, etc. One possibility is that these studies were entirely fabricated; another is that they had actual data, made their tables and did their calculations, and then altered the numbers in their tables to get the results they wanted, without being careful enough, Lacour-style, to go back and alter their raw data. In other cases it seems that they had one dataset (the survey with 770 responses) but they wanted multiple publications so they fabricated different scenarios for it.

          Or there could be other explanations.

          Offhand, though, if I try to think of any explanations for repeatedly reporting summaries that can’t correspond to any real data, all the explanations that come to mind involve “Changing or omitting data or results such that the research is not accurately represented in the research record.” And it’s not an encouraging sign that, when Wansink is pressed on the specifics, he comes up with nonsensical explanations (as in the carrots study) or no explanations at all.

          The scary thing is that all this happened five years ago and the duck-and-weave strategy actually worked.

          The whole thing gives me a new respect for Diederik Stapel. Dude actually came clean and admitted what he did. Which I don’t think Mark Hauser, Michael Lacour, Ed Wegman, Matthew Whitaker, etc., ever did.

        • AG said: “Or there could be other explanations.”

          One is that they just don’t have a mind for details or accuracy; they just think in vague terms, and can’t distinguish between one instance of the vague term and another. (I’ve known people like that.)

        • Martha:

          Sure, but the numbers in the tables had to come from somewhere. I can believe that they didn’t bother to check that the carrot numbers added up, but someone had to have put the numbers into the table in the first place.

    • Hi Jordan,

      I was able to pick up the text of the article from The Times (London). Most amazing was this statement: “He [Wansink] denied that any of his researchers had been careless.”

      • Carol: When Wansink continues to deny any wrongdoing it does make me second guess the inconsistencies we have found. For example, every granularity error can be excused by not reporting that there were missing values, and this does account for many of the errors we find. Maybe it’s common in the food industry to have a bunch of a missing data, and also to not report that, I don’t know.

        And there are cases where I have made incorrect assumptions, which you have kindly pointed out. So it does make me wonder if there are also innocent explanations for the other problems we find.

        But when I look at the different anomalies it’s just hard to understand how they can all be due to a little bit of sloppiness/laziness/incorrect reporting.

        Then there’s the case of him not being able to make accurate public statements. I listed 6 in my previous comment, here’s another:

        He doesn’t get the degrees correct! Or the height correct! Those numbers are literally in the abstract of the paper! Where does he even get his numbers from? Does he think no one is going to read the paper and check if what he says is correct? Do numbers just not matter?

        For me the most damning evidence is how they have responded, or I should say have not responded. The corresponding authors never replied to our initial emails, and we didn’t even say we had found problems. When journalists contact coauthors they don’t reply. The lab wouldn’t share the pizza data with us initially, and other people have requested other data sets to no avail. Is this how you respond to scientific criticism when you are innocent?

        If this whole thing has just been a giant misunderstanding on our part they should have emphatically said so instead of hoping the problem would just go away. Everything is pointing to a very guilty party that is simply trying to draw as little attention to the story as possible. Where there’s smoke there’s fire, and they are hoping as few people as possible notice the smoke.

        Even in their data release they tried to paint most of the errors as a misunderstanding on our part. And for 4 of the approximately 150 problems they are correct! We should not have tried to calculate BMI from the averages of height and weight. It is fundamentally incorrect to calculate the BMI this way. A nonlinear transform of the average is not equal to the average of the nonlinear transforms. I’ve actually made this mistake two other times in my blog post! But a reader kindly pointed out the error and I corrected it (in the post), and I’m working on figuring out how to correct the preprint/submitted article.

        But guess what? In their response they also said this error applies to the cm to in and kg to lbs conversions. But those are linear operations! So those are fine. They were probably so happy to see we had a mistake (and at least we made it clear how we calculated BMI in the preprint) they jumped on it and assumed the mistake applied to the other transformations. But it doesn’t. So 4 out of the 150 errors are wrong on our part (actually they don’t provide height and weight in the data release so the BMI could still be wrong), but with the data release we could easily report another 4 problems to replace those errant errors with even more problems.
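
        To make the averaging point concrete, here’s a minimal sketch with made-up numbers (not the actual pizza data): a nonlinear transform of averages is not the average of the transforms, but linear unit conversions commute with averaging just fine.

```python
# Hypothetical heights/weights for illustration only.
heights_cm = [160.0, 175.0, 190.0]
weights_kg = [55.0, 80.0, 110.0]

def mean(xs):
    return sum(xs) / len(xs)

# BMI = weight (kg) / height (m)^2 -- nonlinear in height
bmis = [w / (h / 100) ** 2 for h, w in zip(heights_cm, weights_kg)]

bmi_of_means = mean(weights_kg) / (mean(heights_cm) / 100) ** 2
mean_of_bmis = mean(bmis)
print(abs(bmi_of_means - mean_of_bmis) > 0.01)  # True: they differ

# cm -> in is linear, so it commutes with the mean (up to float error)
heights_in = [h / 2.54 for h in heights_cm]
print(abs(mean(heights_in) - mean(heights_cm) / 2.54) < 1e-9)  # True
```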

        Really the biggest problems that can’t be explained by incompetence/laziness/sloppiness are the massive amounts of reused text, several cases of reused data, and the very interesting citations:

        His citations don’t make any sense, he cites studies which appear to have no public evidence of ever taking place, and he hides this fact by citing some other paper or his books.

        So yeah, I don’t know if we are ever going to find out exactly what goes on in that lab. We emailed the “postdoc who said no” and got a no comment. Other members of the lab also seem unwilling to comment. Wansink clearly is never going to provide any meaningful explanations, Cornell isn’t going to investigate, what do they have to gain? If Cornell finds that they have the American counterpart to Stapel that would follow them for years.

        • Hi Jordan,

          I very much appreciate how difficult and frustrating this is for you. I have a situation now with a different set of authors that is literally keeping me awake at night.

          Viewed over all the information that you and Nick and Tim have compiled with such dedication (not just the four pizza-parlor articles), the situation seems to have resulted from a messy complicated combination of incompetence, carelessness, lack of professionalism, disorganization, hubris, and yes, probably dishonesty, too.

          But you need to give Cornell time to wade through it all.

          If the journals in which these articles are published issue any more retractions, Wansink will look very bad indeed.

    • I love this line: “But this data took two months to collect. Every day we had to drive 45 minutes to get it. It’s too big of a data set and too rich of an opportunity to waste.”

      Reminds me of a buddy I had in grad school. His project was comparing the effects of herbivory by deer, birds and herbicide on regrowth after clear cutting on tree farms in Oregon. He had to drive up to four hours, then hike up steep slopes with hand tools and fencing on his back, to install deer fences, bird netting or just fence poles with no fencing or netting (the control). The plots were selected in consultation with a statistician using some variation of something out of Thompson’s book and he had to hike to those exact spots whether they were 30 feet or a half-mile from the nearest road. After all the setup he still went back regularly over four years to carefully measure growth as percent cover, estimate biomass, and keep track of various species.

      He didn’t mind the physical work so much as meeting with the department statistician (a petite woman who barely stood 5′ tall and who regularly scared grad students shitless by calmly asking them “What’s your scope of inference?”) on setting up his analysis. The analysis was of course mostly decided in advance, but it was still really tricky for him because it combined aspects of repeated measures as well as a multilevel framework for the multiple plots on multiple clearcuts.

      That was one study where I had no issue believing in the p-values. But I guess we should give poor old Brian a break because he had to drive 45 minutes (one way!) listening to the TED Radio Hour in his Prius. I believe while my buddy was waiting out a lightning storm he had time to whittle the world’s tiniest violin out of some good old Oregon Doug fir. We should play the world’s saddest song on that for Brian.

      • Dalton: One thing that always confused me was how in his original blog post he said:

        “this cost us a lot of time and our own money to collect”

        How much money could this have cost? It appears they used undergraduate volunteers to collect the data who didn’t end up on the papers. Okay, maybe they provided the undergraduates with transportation. And sure, I guess they had to pay the pizzeria to do the study, how much could that have been? Even if they paid for the meal of every single diner there were only 139 diners. Although I guess it’s possible the 139 is not every diner that needed to be comped. Still, this is a lab that gets millions in funding, or at least claims they do.

        If this study was “expensive” for them and took a lot of time how do they do studies where they survey over 1000 people? I think we can come up with a few possibilities.

        The worst part about this whole thing is the quality of the data. If you look at the data release there are a bunch of missing values or entries that just don’t make sense and should be thrown out. And yet Wansink continues to refer to this as a “rich” data set. If this is an example of one of his “rich” data sets I don’t even want to see what the other data sets look like.

        • Just the distribution of “bowls of salad eaten” in the released dataset is hilarious. Apparently 10 people each ate exactly 5.5 bowls and 8 people each ate exactly 7.9 bowls. You would expect the number of bowls consumed to follow something like a Poisson distribution (which is what the slices of pizza do), but the histogram looks like a child scribbled it.
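
          A quick way to see how strange that is (with hypothetical values mimicking the pileups described above, not the actual released file): a genuine count variable can only take whole-number values, so repeated entries like 5.5 and 7.9 shouldn’t exist at all.

```python
# Hypothetical data mimicking the reported pileups at 5.5 and 7.9
salad_bowls = [5.5] * 10 + [7.9] * 8 + [1.0, 2.0, 3.0, 2.0]

# Entries that cannot possibly be a count of bowls eaten
impossible = [v for v in salad_bowls if v != int(v)]
print(len(impossible))  # 18 of the 22 values aren't whole bowls
```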

        • Nick: Those must be ounces of salad, not bowls. But yes, the distribution is odd. I wonder if it could have something to do with the bowl size?

        • Never mind. The article *does* say bowls (“uniformly small bowls”). They must have been tiny for some people to eat 7.9 bowls of salad.

        • Although the article says bowls, the STATA code in the data release describes that variable as:
          “Mark the amount of salad you ate (continuous rating scale)”

          So it is unclear to me exactly what the numbers correspond to. Regardless, the pileups at strange numbers such as 7.9 are extremely odd.

        • Jordan,

          Thanks! I wasn’t able to open the Stata code on the computer I was using earlier. Now I can, and the code does say that a continuous rating scale was used. It’s impossible to know what the scale was from this description, as you suggest.

        • Hi Jordan:

          There is now (April 24) an editorial note on Wansink’s “Eating Heavily” article stating that salad consumption was measured on a 13-point rating scale.

        • Thanks Carol, I’ve successfully managed to replicate most of the numbers in the original papers along with the values in the response tables in addition to the values outputted by their STATA scripts. So I don’t think the data or original statistics were fabricated, but that doesn’t mean there aren’t problems. I’m working on a response.

        • Hi Jordan,

          I have no doubt that even if the data are correct, many design/methodology/statistics problems would be discovered in these articles, if someone were to dig deeper than the top-level analysis that your GRIM/GRIMMER/etc. software does.
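
          For readers unfamiliar with GRIM, here’s a rough sketch of the idea (my own simplified version, not the actual GRIM/GRIMMER code): a mean of n integer-valued responses must be a multiple of 1/n, so many reported means are arithmetically impossible for the reported sample size.

```python
def grim_consistent(mean_reported, n, decimals=2):
    """Simplified GRIM-style check: can some integer total of n
    responses produce a mean that rounds to the reported value?"""
    target = round(mean_reported, decimals)
    k = round(mean_reported * n)  # nearest candidate integer total
    # Checking the neighbors covers rounding at the reported precision
    return any(round(total / n, decimals) == target
               for total in (k - 1, k, k + 1))

# Hypothetical reported values, for illustration only:
print(grim_consistent(5.19, 28))  # False: no 28 integers average to 5.19
print(grim_consistent(5.18, 28))  # True: 145 / 28 rounds to 5.18
```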

        • Hi Jordan,

          If it’s a 13-point continuous rating scale, how could a person get a score of 7.9? Maybe we should take this to e-mail.

        • Hi Jordan,

          “our own money to collect”: It shouldn’t make any difference who paid for the research. Wansink is implying that he would be willing to waste someone else’s money (by not doing the extra analyses that resulted in the four pizza papers) but not to waste his own. Ugh.

        • I’m very curious what “our own money” means.

          In the biomedical sciences typically you pay for everything with grants, with the possible exception of personal computers/conferences/etc. The idea of running an experiment literally with your own salary is pretty much unheard of, although my previous PI did suggest we donate to the lab (which I wouldn’t mind if the PI also donated a similar fraction of his salary).

          So does his “own money” mean his salary? Does it mean some pool of money the lab has for a rainy day? Is it money from one of his private clients?

          I’ve also been confused why the Turkish grad student was unpaid. If the lab has these huge grants in addition to large private clients why didn’t they have the money to pay her? I’m beginning to think this lab is just one giant house of cards and actually doesn’t have nearly as much money/clients as they claim.

        • Hi Jordan,

          It isn’t clear what “our own money” means. Clearly this is not public or private grant funding. I doubt that it is Wansink’s salary, either. Possibly he is using money he was paid to give a talk. I see that he is listed at the speakers bureau allamericanspeakers.com with a fee range of $30K to $50K per talk (yes, per talk). He is also listed at the HarperCollins Speakers Bureau and a couple of others.

          Another possibility is that he is devoting proceeds from sales of his book(s) to his research projects.

          A third possibility is consulting fees.

          I have no idea about the Turkish graduate student. It’s possible that she came with funding from her own country.

        • Carol wrote: “I see that he is listed at the speakers bureau allamericanspeakers.com with a fee range of $30K to $50K per talk (yes, per talk).”

          !! This sure makes it hard to muster up any sympathy for him.

  9. Jordan Anaya wrote:

    Anoneuoid: I finally see that you are correct.

    Yes, because we don’t have a cure for cancer all of biology must be wrong.

    Similarly, because we don’t have time travel all of physics must be fundamentally wrong.

    We need to take all of the knowledge we have accumulated the last several thousand years and burn it. It’s all tainted by that filthy NHST.

    It might appear that some of our knowledge has led to technological advances, but it was all just a massive coincidence. If you try enough random things something is bound to work.

    It hasn’t been tainted for thousands of years. NHST is something that started in the 1940s and has been spreading gradually. Once everyone trained by people in the properly-trained pre-NHST generation has retired/died off, that is when you really get to see what NHST does. We see this now in education research, social psych, etc., which were first to adopt it. Biomed is headed straight at the disaster, right on track.

    And we know the problem, it is really simple. The original idea was to test your hypothesis and compare it to other people’s hypotheses. Instead, NHST compares your hypothesis to the strawman of coincidence/chance/whatever that no one believes. The old way encouraged coming up with precise quantitative predictions to allow distinguishing between various theories. The new way encourages leaving your hypothesis as vague as possible so it can be consistent with the max range of possible outcomes.

    Jason Yamada-Hanff wrote:

    Anoneuoid: My point is that you throw out the baby with the bathwater. As far as I can tell, your position is that when NHST is involved it’s not possible to develop any useful knowledge. My position is to grant that the problems you rightly raise create gross, massive distortions, but that we find out real stuff sometimes in spite of it. That’s because the durable knowledge we do get is not actually being built by reliance on NHST, but a more distributed diffuse process where the core finding has been replicated and checked over and over.

    There is more in biomed than the current hot ideas in cancer and Alzheimer’s and HIV and whatever. It’s true as it ever was that both specific findings and whole frameworks will turn out wrong. Some areas, I think we would probably agree, are already obviously wrong. But I’m pretty confident we aren’t going to find out that DNA bases don’t pair, or that neurons don’t have electrical activity, or that the estrogen receptor doesn’t bind any small molecule at all, or… or…

    For sure people can figure things out despite NHST. It is just an obstacle placed in the way. I don’t think those examples you give (e.g., neurons firing) work well here, though; that was all figured out before NHST was adopted. You can check my comments on HIV and CRISPR upthread that presume certain basic things are correct (cells exist, double-stranded breaks in DNA are toxic to them, etc.).

    • Anoneuid:

      What I find particularly odd about your argument is that you are suggesting that all the researchers who were taught NHST think that “p less than 0.05 equals true, otherwise false” and that’s it. If that’s what every scientist believed, I would have no faith in science either.

      But every scientist I’ve ever consulted seems to understand that (very sadly) “p less than 0.05 equals publication”, and fully grasps that this is very different from “p less than 0.05 equals true”. Everyone I worked with seemed to understand and want to fully explore the theories of how things worked, and realized that achieving p less than 0.05 was a necessary hurdle for publication rather than an uncovering of God’s truth.

      Yes, this means you shouldn’t believe every new idea freshly published. But that’s how science has been since the beginning of time. Even mathematical “theorems” get disproven all the time.

        • Peers…who also understand that p less than 0.05 doesn’t mean God’s truth.

          As I have said, I don’t think there’s a researcher out there who believes that just because something got published means it’s absolutely true. That includes reviewers. But it does mean that they’ve crossed a minimal threshold of necessary evidence (assuming no fraud/bad statistics/etc, of course) to be published.

          And believe it or not, publication usually also means that the theory behind the paper SHOULD hold some water without the empirical evidence as well. Of course, that gets very subjective…which is exactly why we run experiments to test our hypotheses.

          In your posts, you seem to believe that, because of NHST, scientists run around with their eyes closed, only making decisions based on biased die rolls, without a single further thought. Perhaps that was true of the people you worked with, but it really does not describe the people I’ve worked with.

        • But it does mean that they’ve crossed a minimal threshold of necessary evidence

          Evidence regarding what (in general)?

        • Also, I prefer “Omniscient Jones” to “God” in such examples because the latter carries a lot of other baggage.

        • Cliff AB. I agree with you mostly. There are many things wrong with biology, but I don’t think NHST rates above things like grant funding incentive structures, problems with publication and peer review, wasting money on things that didn’t really have much chance of doing anything much useful in the first place, lack of data availability, lack of generalizability of findings outside a narrow model organism, funding research on highly visible health issues that affect only very few people, politically motivated earmarked funding… etc

          NHST facilitates a lot of crap, but biologists tend to do informal replications as the first step in trying to use something published to do something new, and if things don’t replicate approximately, they get ignored. Where you’re likely to find real serious problems with NHST noise is in fields where no one much cares about the published results and so no one is motivated to study them further starting with a basic replication.

          Also my impression is that under-funded areas of biology have generally higher quality on average. If you study the interaction of various populations of wildflowers in mountain areas or the growth of saplings in fire-cleared areas or the mating behavior of stickleback fish or whatever… you probably do this out of a love for science and an interest in basic mechanisms, not out of a desire to run a big lab with lots of grad students and tens of millions of dollars per year in funding.

        • Daniel: Yes, I completely agree. I don’t want to come across as saying the field of biology is an optimal well oiled machine…but I also don’t think it’s full of dolts. And I agree with you, I think the issues that arise are caused much less from the use of NHST, but rather the politics and incentives that arise from the whole grant process.

        • Cliff:

          I can’t be sure, but when it comes to medical research and policy research, I really am concerned that null hypothesis significance testing is rendering a lot of research useless.

          Treatments typically have small effects, and the standard approach seems to be to compare treatment to control on some people in a between-subjects design and then hope for p less than .05 (or to find p less than .05 in some forking path). This can result in research reports that have essentially zero value (or even negative value to the extent that they waste people’s time), even if the researchers are completely honest and well-intentioned and have solid grounding in their substantive fields.

          That is, even in a world without “dolts” or hacks or publicity hounds or corruption or bad incentives, if researchers in a field are studying small effects using null hypothesis significance testing, everything could be a disaster.

          This conclusion, to the extent it is correct, has two important implications:

          1. We should be very concerned about all this published research, even from people whose integrity and substantive expertise we respect;

          2. Fixing the incentives for publication is fine, but we should expect this to have only very indirect impacts on research quality (if, after a while, researchers realize they can’t get their “power = .06” studies published, they might start taking measurements more carefully and doing more within-subject studies; but I fear it will take people a while to figure this one out).
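          The “power = .06” scenario above can be made concrete with a quick simulation. The numbers below are illustrative only (not drawn from any study discussed in this thread): with a small true effect and a small between-subjects sample, the test rarely reaches p less than .05, and the estimates that do cross that threshold exaggerate the true effect severalfold.

```python
import math
import random
import statistics

# Hypothetical simulation (parameters are illustrative, not from any
# study discussed above). The true effect is small relative to noise,
# so a two-group comparison with n = 20 per arm has power near .06,
# and the estimates that happen to cross p < .05 overstate the effect
# severalfold (a "Type M" error).

random.seed(1)

true_effect = 0.1   # true difference, in standard-deviation units
n = 20              # per-group sample size (an underpowered design)
sims = 20000

significant = []
for _ in range(sims):
    control = [random.gauss(0, 1) for _ in range(n)]
    treated = [random.gauss(true_effect, 1) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(control) / n
                   + statistics.variance(treated) / n)
    if abs(diff / se) > 1.96:   # crude normal-theory p < .05 cutoff
        significant.append(diff)

power = len(significant) / sims
exaggeration = statistics.mean(abs(d) for d in significant) / true_effect
print(f"power ~ {power:.2f}; significant estimates overstate "
      f"the true effect ~{exaggeration:.1f}x")
```

          The point of the sketch: even with honest, well-intentioned researchers, a literature built from designs like this consists almost entirely of noise, and the published (significant) estimates are far larger than the effects being studied.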

        • Anoneuoid: For the record, I pretty much agree with all of that. It’s the leap from “most of this is crap” to “everything we think we know from the last 50 years is wrong because of NHST” that goes too far (or at least comes off as too simplistic in the context of a blog comment). For instance, I still think the “old way” of directly comparing competing hypotheses is still what people like best and what the people who I most respect shoot for—that’s the stuff that gets a lot of excited chatter from my corners at least. Of course, as you say, there are other schools of thought… so it goes.

        • Andrew, I think your concerns regarding direct medical and policy research are well founded. My comments were more directed at biological research on mice and the like, where direct experimentation is more straightforward. On humans, people seem to love the RCT for NHST-on-difference-in-means BS.

        • I also agree with Andrew’s comments re RCTs — there are big problems there. My impression is that the label “gold standard” is one part of the problem — too many people seem to see RCTs as a machine where you set it up, turn the crank, and get an answer, with no real thinking involved (e.g., about quality of measures, blinding, how to deal with drop-outs, etc., etc.), and with plenty of opportunities to “nudge” the outcome.

        • All I can say is that these papers coming out claiming something like half of biomed research is unreproducible in principle (the description in the paper is worthless and the actual methods have been lost even to the original authors), and that half of what remains does not reproduce when checked “formally”, are wholly consistent with my experience.

          But as I keep emphasizing, the reproducibility problem is a relatively minor issue. The bigger problems arise when you do get a replication but misinterpret the results, which is extremely easy to do when all you do is compare how well your vague favorite explanation explains them rather than “chance”.

        • Here is a project for someone who thinks biomedical research is currently functioning acceptably. Check out the below paper presenting a nice model connecting a signal transduction pathway with a cell behavior. Unfortunately the model can’t really be tested because, as they write:

          Although the signals that transduce the external cues to the GTPase network are becoming clear (21), most of the chemical parameters remain unknown. Because many of the reaction coefficients in Fig. 1 B are also unknown, we allocated a number of possible parameter sets to qualitatively analyze the kinetics of these reactions.

          The paper is from 2005; I am sure many other people have models that could use constraints like that too. Has anyone published such data in the last 12 years?

          I didn’t check, but I bet not. I bet rather than collecting useful information like the reaction coefficients, there has been much bumbling about checking whether p is less than 0.05 after they do this or that.

        • I’d add to Daniel’s list of things in biology that are bigger problems than NHST: Poor design and analysis of experiments. (Not to say that all biology experiments are poorly designed or inappropriately analyzed — just that I’ve seen problems here often enough.)

        • This is what separates the real scientists from the people with PhDs and no clue. Phrases like:

          a) You show that you’ve knocked out the gene by the fact that you don’t get a PCR band, but I don’t see any bands at all on this gel, where is your positive control?

          b) How did you show that the cells didn’t just die due to generalized toxicity of the compound?

          c) What other kinds of cells does this compound stain? Have you done it in whole mount to show that it doesn’t also stain X?

          etc etc.

          I’ve heard the stories: people with PhDs and postdoctoral training and everything spend six months or a year on a really confusing project with lots of different assays involved, and when they present their data at a group meeting it turns out the whole project was based on a false premise right at the beginning, caused by not having sufficient controls to figure out what produced the initial “interesting” result, which wasn’t interesting at all: they had just left out some step in the reaction, or failed to consider that a compound might simply be toxic, or didn’t actually knock out the gene, or designed the PCR primers wrong, or whatever.

        • As more evidence that people do actually believe their NHST results, I learned earlier that for most of the 20th century infants were thought to be incapable of experiencing pain. This led to surgery without anesthesia, use of (“non-”)pain response as a diagnostic tool, and other awful things. It continued until the 1980s, when a mother noticed that her son had not been anesthetized during surgery.

          Amazingly, the first article I read on this topic puts blame on NHST:

          For example, unthinking adherence to null hypothesis testing, with its requirement that the scientist disprove the assertion that nothing has happened as a result of the experimentation, undoubtedly contributed to infant pain skepticism.

          NHST literally led to widespread, institutionalized baby torture.

        • NHST is used to justify anything people think up; it does nothing to filter good ideas from bad. I haven’t seen anything to indicate baby seal murder is an example, but it would not surprise me.

          Anyway, did you read the paper or look into this at all before making your dismissive comment?

          PS. I do respect what you are doing with regards to Wansink’s ridiculous level of incompetence and/or BS (keep at it!), but I am surprised to have failed to elicit any worthwhile discussion or insight from you about any other topic (your only reference used this whole time was a youtube video… just consider how that looks).

    • Jordan wrote: “And we know the problem, it is really simple. The original idea was to test your hypothesis and compare it to other people’s hypotheses. Instead, NHST compares your hypothesis to the strawman of coincidence/chance/whatever that no one believes. The old way encouraged coming up with precise quantitative predictions to allow distinguishing between various theories. The new way encourages leaving your hypothesis as vague as possible so it can be consistent with the max range of possible outcomes.”


  10. These institutions are no different than Fox News. They have a star who generates revenue who has failed professionally. They defend him as long as he keeps generating revenue and the bad press related to his professional failures doesn’t endanger them in other ways.

    • I like this analogy. I guess Fox News recently gave Bill O’Reilly a bunch of money to go away. I wonder how much Cornell would pay to make its Wansink problem go away. Or alternatively, I wonder how much they would have paid us to never investigate his work in the first place. Presumably, despite the backlash on his blog post, he would have continued to happily p-hack/falsify his data for the next couple decades and continued going on TV shows and generating good press for Cornell.

      • Jordan:

        I’ve seen this happen at Berkeley and Columbia, where a prof gets involved in some scandal and then the university effectively pays him to leave. It’s funny: you’d think that if the prof is in a scandal, that he’d have zero bargaining power and the university could just push him out, threatening to call the cops if he complains. But in practice it seems that the reverse is the case: the university is so afraid of bad publicity that it cedes all the bargaining power to the scandal-plagued prof. I suppose that similar things happen in industry—“golden parachutes” and the like—with the main difference being that the dollar values are much higher in the corporate sector.

        • This has happened several times at UPenn, which apparently has a lower tolerance for unethical behavior than some other schools. It may very well be concern for bad publicity at the Ivy League schools and other big-name schools. Another consideration is that it is so difficult and so expensive to get rid of a tenured professor that some schools would rather go for a money settlement.

  11. I’m very curious about this statistic. Apparently Amy Cuddy claims she has been mentioned over 600 times on this blog:

    I have to assume she did a google search of her name and this blog and saw 600 results, which would include mentions of her in the comments. For what it’s worth I tried this and only got 73 results. Although I only searched specifically for “Amy Cuddy”. I guess the search should be widened to also include “power pose” etc. which technically would also be mentions.

    I must admit though, the idea of her reading this blog every day and keeping a tally of every time she gets mentioned is pretty amusing.

    • Hi Jordan,

      If you count the number of times that Amy Cuddy has been mentioned in Andrew’s postings, plus the number of times that she has been mentioned in comments on those postings, the sum would greatly exceed 73, I’m sure.

      She once sent me private e-mail about a comment that I wrote on this blog, so she apparently does monitor the blog to some extent.

    • Jordan:

      Recall that Cuddy once wrote a paper whose main claims were supported by t statistics reported as 5.03 and 11.14, but upon recalculation the values were actually 1.8 and 3.3. So I wouldn’t take any numbers from her at face value. Beyond this, perhaps Cuddy feels offended-by-proxy when other researchers are criticized for statistical mistakes, so perhaps she’s also counting all mentions of Bem, Hauser, Kanazawa, Wansink, Tol, Stapel, etc etc etc. Add all these up and you could definitely get to 600.

      • Andrew: Yeah, in support of her argument that these “new” types of scientists are despicable she mentioned getting death threats:

        Unfortunately I don’t think there is a live stream of her talk, but it doesn’t appear she went into any details. Her supporters, i.e. apparently everyone in the audience and a few people on Twitter, seemed to take her statement to mean that these death threats came from one of these nasty bloggers.

        Sure, I guess anything is possible, but I find it very difficult to imagine these death threats came from a scientist, let alone one of the bloggers such as you or Uri who have criticized her work. It seems much more likely that one of the 40 million people who watched her TED talk sent her a death threat, and it’s not even clear to me if the threat was in any way tied to criticism of her work (this could have happened even if everyone was drinking her Kool-Aid).

        As further evidence for how impersonal these attacks are she mentions that you (I assume she is talking about you) never met her:

        Umm, I don’t really know how this is relevant. Before posting the Wansink criticism as a preprint, should I have flown to New York and given him a chance to explain himself in person? We emailed his lab and the coauthors before posting. Was that not enough? I like the thought that if only I had visited the Cornell Food and Brand Lab and seen first hand all the amazing work that they do, I would have been compelled to look past the hundreds of problems in their papers.

        I also like all this talk about how we “target” people, and perhaps pick on women more:

        And how we specifically pick on people out of “jealousy”:

        Umm, I’m pretty sure there isn’t “targeting” going on. I think we are mostly just going about our business and once in a while something comes across our desk that we can’t ignore. I didn’t know who Wansink was before his blog post. Heck, I didn’t even know about your blog before Wansink’s blog post.

        It is pretty amazing that not only does Amy Cuddy still get invited to conferences, but people eat up what she has to say. I guess she knows her audience though, because she seems to know that not everyone was going to be enthralled by her talk:

        If anything, her talk generated a bunch of discussion on Twitter. Didn’t really see anything too compelling, although it looks like it might have inspired James Heathers to work on a long blog post. Here are some excerpts of his draft:

        Maybe I’ll write a blog post at some point as well.

        • Jordan:

          Some people sent me those links but because I have no idea if Cuddy was speaking about me in particular, it’s hard for me to respond to anything she said. Of course I have never sent anyone a death threat. I do think it’s ok to criticize published work without having met the author. Indeed, one of the key purposes of publication, in all its forms, is to reach people whom the author has never met.

          Cuddy might consider contacting the editors at Slate to complain about this article. Maybe she could get Slate to introduce a new policy that the authors of any articles referring to living people be required to personally meet those people before publishing anything negative about their work. It could cost a lot in plane fare, but maybe it’s worth it!

        • Jordan Anaya:

          1. You cannot possibly have knowledge of all the communication that came to Cuddy, and about what, so it is unclear how you are drawing any conclusion about death threats that is different from her claim.

          2. I suspect you might be a bit more open to the claim if someone were claiming that Cuddy had made the threat toward someone attacking her work.

          3. You may not like the term applied to your behavior, but you have clearly targeted Wansink, Cuddy, et al. That said, the same goes for those who claim to understand your motivations for having targeted those you have.

        • Jordan,

          I can certainly understand why Cuddy would be worried about a culture of fear and distrust when reading this blog.

          Jordan is skeptical about Cuddy’s claim of having been mentioned 600 times on this blog and confesses to being amused by “the idea of her reading this blog every day and keeping a tally of every time she gets mentioned”. Andrew then joins in with the fun, remarking that Cuddy is unable to tell the difference between two very different pairs of t-values, so he “wouldn’t take any numbers from her at face value.”

          This, of course, is not in the least condescending and is all in the service of improving scientific methodology.

          Jordan also finds it “pretty amazing that not only does Amy Cuddy still get invited to conferences, but people eat up what she has to say”. Obviously, it is impossible that someone who’s done a bad study, got criticized for it and then refused to acknowledge the problems could have anything of value to say to a scientific audience.

          Well, it also seems quite obvious that you guys (as well as others here) are not exactly Mr. Sensitive, and your hard-ass style of pointing fingers may be perceived as intimidating or even bullying by others, particularly women. Cuddy (according to Katie Corker’s live tweet) makes a great statement: that “we need to find ways to let people safely be wrong”. This blog certainly does not always contribute to this goal (and I include myself, insofar as I have probably also made a few derogatory remarks).

        • Alex,

          There is already a way for people to “safely be wrong”. It’s called tenure. It allows you to make many, many mistakes and have them pointed out. Keeping tenure, of course, requires you to be honest about your methods and analysis and, to an extent, to acknowledge your mistakes and correct them. Blogs don’t “endanger” anyone’s tenure as long as the mistakes are honest and acknowledged.

          So the (I presume) sarcasm of this sentence is somewhat misplaced: “Obviously, it is impossible that someone who’s done a bad study, got criticized for it and then refused to acknowledge the problems could have anything of value to say to a scientific audience.” Actually, given that last part about acknowledging problems, it IS sort of obvious that scientific audiences should take her less seriously.

          PS. Is it really true that we should be less “hard-ass” when criticizing “particularly” women? There’s something about that idea that I find disorienting. If this became a norm there’d be a presumption that research done by women had been given a softer touch in review, or that, as more women enter the sciences, the tone of criticism will change. I don’t see why gender should be a factor here.

        • Alex:

          1. There’s no evidence that Cuddy was mentioned 600 times on this blog. As far as I know, Cuddy never mentioned me in her talk so it’s not clear that this 600 refers to me in any case.

          2. Cuddy really did state two t-values as 5.03 and 11.14, even though they were actually 1.8 and 3.3. To the best of my knowledge she never admitted that this change might affect the conclusions of her paper, indeed one of her collaborators expressly stated that the change made no difference in the conclusions, which makes one wonder why the t-values were calculated at all. Then again, Cuddy is a coauthor of a famous paper, the last sentence of whose abstract reads, “That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.”—despite the fact that the paper in question had zero measures of anyone “becoming more powerful.” There’s a consistent pattern here of someone making strong statements with no evidence, or being unconcerned about the accuracy of any claimed evidence.

          3. I don’t see why it’s “condescending” to point out errors in published work. People have pointed out errors in my published work and I thank them for it. A serious researcher should be glad when someone points out mistakes in their work. I’m sure that Satoshi Kanazawa, Daryl Bem, John Bargh, Amy Cuddy, Brian Wansink, etc etc are sincere and serious researchers in some way — but they share the feature that when people have pointed out serious errors in their work, they dodge the criticisms, both in their details and in their larger implications.

          4. I’m not sure what you mean by “pointing fingers.” I’m pointing out problems with the published work and the public statements of Kanazawa, Bem, Bargh, Cuddy, Wansink, etc. As far as I can tell, adding the word “fingers” after “pointing” is just a way to make things sound worse, somehow? I suppose one could imagine an alternative world in which blogging were like the U.S. Senate and we prefaced every criticism with something like, “My esteemed colleague from Massachusetts . . .” I don’t really have the energy for such discourse, but, sure, maybe it would help. I guess there’s a reason for such formalities.

        • Hi Jordan, Alex, and Andrew:

          I think that there must be 600 mentions, and probably more, of Amy Cuddy on this blog. I counted occurrences of the string “Cuddy” for a few threads (is that what they are called?):

          Gremlins in the Work of … 39 instances
          The time-reversal heuristic … 124 instances
          Organizations that defend … 42 instances
          Low-power pose update … 50 instances
          The Association for Psychological Pseudoscience … 35 instances
          Michael Lacour … 7 instances
          Powerpose update … 26 instances

          “Cuddy” appears in many more threads than just these seven.

          But so what? Frequent mentions of one’s name does not ipso facto mean that one is being shamed or bullied. It’s the content that matters.


        • I think a big question about this number is do the comments really count? I mean, if the post is about Cuddy, is it really fair to count every single comment as a separate mention?

          I just did this search on google:

          After clicking through to the end and clicking to list all results, I got 112. Some of these are not actual posts though.

          So yes, I obviously agree Cuddy is mentioned a lot, I’m just curious where the number came from. Is it literally every single mention, including comments? If so, how was it calculated? Did she scrape all the google hits to count every single time her name comes up?

        • Hi Jordan,

          The tweet by Katie Corker said “A popular blog mentioned me over 600 times.” My tally counted only “Cuddy” strings within a thread and its comments, but not “she” or “her” or “power pose” even when obviously referring to Cuddy.

          I have no idea what Cuddy counted, or how.

          Yes, I think mentions in Andrew’s postings and in comments on those postings count as “mentions.” She could have written “I was the subject of xxx postings on a popular blog” and then the number would be many fewer.

          But again, so what? Mentions are not in and of themselves indicative of shaming or bullying.

          I was more interested in her comment about wanting to do a network analysis. Does she think there is some kind of organized conspiracy against her?

        • I was also interested to hear that Cuddy claims that a backlash to her talk has already begun. Backlash? I know that some people disagreed with what she said during her talk, but how is disagreement a backlash?

        • Carol: Yeah, backlash was an interesting comment in one of her recent tweets. In her talk she seems to argue that these nasty criticisms of her work shouldn’t be downplayed as “Haters gonna hate”:

          But by referring to the smallest sign of disagreement with her talk as “backlash” she is using the “Haters gonna hate” mentality! A real backlash would have been if every blogger immediately wrote a post about Amy Cuddy’s talk, the story got picked up by the New York Times, Andrew Gelman gave a TED talk about how silly Cuddy’s argument is, etc.

        • Not 100% relevant here, but relevant nonetheless: Something Daniel Lakeland just wrote in the N = 300 vs N = 900 discussion:

          “And there are people who do all the trappings of science but don’t do the hard work of actual science, and they collect a good paycheck and lots of prestige.”

        • Hi Martha (Smith):

          I just watched the Laura Arnold video. Wow. Thanks for providing the link. But the video was published on March 28, 2017. Amy Cuddy’s comment about backlash referred to backlash to her talk at the Midwestern Psychological Association, which was not until April 20, 2017. So she could not have meant the video.

        • Carol:

          One of the annoying things about the arguments of Amy Cuddy and Susan Fiske in this area is that they are short on specifics. I can’t really blame Cuddy in this case, as she was just delivering a talk, not publishing a paper. But I find it more than a bit frustrating that we’re having all this discussion about a Twitter report of a talk that had no details. This is kind of the opposite of my posts, where I am very specific about the published statements that I am criticizing.

        • Carol:

          The link to Arnold’s post was originally posted by Keith O’Rourke in the comment section of another of Andrew’s blog posts, but I think it’s worth spreading.

          Thanks for providing the information about dates of the Arnold talk and Cuddy’s backlash comment. I am wondering, though, about the possibility that Arnold’s talk might have helped prompt “backlash” to Cuddy’s later talk.

        • Martha: The backlash Cuddy referred to was undoubtedly on Twitter. Apparently during her talk there was a lot of applause, and the questions asked were like “why are these people so awful?”. I’d be interested to see what would happen if she gave that talk to an open science audience, or really any audience that hasn’t had its head up its ass.

        • I really think we’re doing ourselves a disservice by relying on tweets as evidence. It’s a bit like interpretive archaeology, but for the present, not the past. So much of the context is lost and misunderstandings and over-simplification are pre-programmed. (Absurdly, we’re deliberately doing this to ourselves).

        • Andrew: Yes, these accusations from Susan Fiske, Amy Cuddy, etc. are short of evidence. And so many people seem to believe them without that evidence.

          Susan Fiske explicitly admitted that only a few people are doing all this shaming and blaming on social media. Why then do we keep hearing about the toxic “culture” if it is only a few people?

        • Jordan Anaya:

          As your labeling me a “Cuddy supporter” indicates, you consistently engage in the same type of sloppy reasoning you criticize others for. I am not a defender of Cuddy’s research or methods, but I am a defender of anyone who is being unfairly maligned. If you want to engage in your targeting of Cuddy, Wansink, et al., fine, that is your choice. But don’t think your sloppy reasoning about it gets a pass.

          Correct answers to the “death threat” issue are: 1. “I don’t know and would have to accept her at her word.” or 2. “Cuddy has claimed a death threat, but has provided no evidence, which does not disprove it but makes it more difficult to believe.” Outright denial has no more weight of evidence than an assertion lacking evidence.

          When there are legitimate explanations for errors in research, those should be explicated as well. The decision not to makes the targeting look personal and reckless.

        • Well I guess Cuddy supporters do monitor this blog.

          1. As I said, I didn’t get a chance to see the talk and I don’t know the complete context. My point is just why even mention death threats? It’s like how Wansink in one of his many blog updates mentioned previous rumors about his sexuality. What does that have to do with criticisms of his work? He clearly was just trying to get some sympathy.

          2. For a claim as ridiculous as a death threat I’m going to need to see some hard evidence before believing it came from someone in the scientific community, regardless of which side is accused of the threat.

          3. I don’t care what people call me. Terrorist sounds a lot cooler than “careful reader of the literature”.

          Alex Gamma:
          I have no doubt Cuddy has been mentioned many times on this blog. I’ve only been following this blog since December (because of the Wansink post) and I have noticed her mentioned many times. I was just curious where that number came from.

          I’m open to anyone’s ideas for improving science; even a broken clock is correct twice a day. I just don’t think the status quo is working, either for how we identify and correct problems with the literature or, more broadly, for how performing sloppy/questionable/fake science benefits the individual at the expense of knowledge/public good/etc. The second problem is very complicated and may not be fixed in my lifetime, but the first problem really is not that hard.

          Currently, the standard procedure for correcting the literature goes something like this:
          1. Contact the authors.
          -They often don’t respond, won’t share data, or deny problems.
          2. Contact the journal.
          -They ask if you’ve contacted the authors already and if yes they go ahead and contact the authors themselves. The authors then usually respond to the journal downplaying the problems.
          3. Publish a letter detailing the problems.
          -I’m not sure how hard it is to get these letters published, probably depends on your prestige. The authors then often get invited to write a response to the letter and it usually just ends up sounding like a “your word against mine” argument.

          I don’t know about you, but that doesn’t seem like an acceptably rapid method of raising issues and getting problems addressed/corrected. Maybe that would have been fine back when letters were delivered by horseback, but not in 2017.

          So I really don’t understand any reason why someone can’t criticize an article in a blog post, or more recently, in a preprint, and how that could be interpreted as bullying/terrorism.

          Preprints/blog posts usually have comment sections, so the authors of the original article are free to post a rebuttal. Or better yet they can make their own blog post/preprint! However, that isn’t what is usually done. They are typically ignored by the authors because of the lack of peer review, lack of editorial standards, etc.

          It’s really not that hard to be “safe” in science. Do you see 600 mentions of Dana Carney on this blog? If there is criticism of your work either respond with evidence about why the criticism is invalid, or acknowledge that you made a mistake. Don’t ignore the criticism, or discredit the source, or change the topic by bringing up death threats/chilling behavior, harassment, etc. There is a lot of actual bullying and harassment that occurs in science, but that occurs within the walls of the ivory towers, not on blogs.

          P.S. I don’t want to get lumped together with these other terrorists, so just to be clear, I have not criticized any of Cuddy’s work. I didn’t even know who she was until I read Andrew’s blog, but I am a coauthor of Nick Brown, who I understand has pointed out some problems in her work. My only victim is Wansink, whose work provides a great case study for testing methods to detect fraud that I’ve developed/am developing.

        • Thomas,

          I am not criticizing the relentless pointing out of bad research methodology on this blog. I’m neither saying that pointing out errors is condescending (Andrew) nor that criticizing an article in a blog post amounts to bullying (Jordan).

          What I’m saying, however, is that this blog is sometimes slipping into facile and dismissive remarks about the *researchers themselves*, thereby entering into the dubious territory of social group phenomena where even rational scientists can succumb to base instincts like banding together to pick on a single individual.

          It is *this* kind of behavior I’m calling condescending; that I can see as creating a climate of intimidation, fear and distrust; that I can see as being perceived as bullying; and that is quite the opposite of making a person in error feel safe to be wrong.

          When I added “particularly woman”, I was referring to the fact that women (incl. in science) on average might care more than men about the social quality of their collaborations and might correspondingly be more strongly disturbed by social climates dominated by male toughness.

          However, in reply to Thomas, that doesn’t mean we should pull our punches when criticizing research conducted by women. My point rather is that whenever such research criticism degenerates into a more primitive mode of person-centered derogatory remarks, this should rightfully be called out as hindering an open and honest discourse between critics and those criticized, and I can understand that some of the latter would describe this as a culture of fear or bullying.

          Again in reply to Thomas, I understand Cuddy’s call for a way to be safely wrong as a call for a social-emotional climate in which *mistakes* are targeted, not those who make them. Ideally, such a climate would permeate science on all levels, with tenure being only one aspect.

          Finally, Andrew, about “pointing fingers”. You often enumerate examples of questionable research. Sometimes you refer to such studies by impersonal descriptors like the “clothing and ovulation” study or the “himmicanes” study, but often enough you make some sweeping derogatory remarks about the researchers themselves. (I once counted your mentions of Bem over a certain time period and found plenty of examples where not his research, but his person was the target of some more or less facile/derogatory remark, branding him personally as someone who was definitely out of the game, no longer to be trusted, a synonym for bad research.) So yes, this way of pointing fingers takes things beyond what I consider a fruitful, productive engagement with an “opponent” on matters of scientific methodology.

        • I just want to add that at a certain point a person becomes tied to their work. Let’s use a basketball analogy:

          If a player has a bad game we should just criticize that specific performance instead of the player in general. But if the player constantly performs poorly at some point we are warranted in labeling them as a poor shooter/defender or player in general.

          Similarly, at some point if a researcher is revealed to have performed enough bad science or has defended enough bad science we should be able to label them as bad scientists. Or do bad scientists not exist? Why can’t we call a spade a spade?

        • Curious: Exactly!

          I’ve actually been thinking about this, is there a single case where a scientist has wrongly had their work lambasted by us terrorists?

          Sure, I’m not batting a thousand on every single criticism of Wansink’s work, but overall I’m pretty sure I hit a home run by pointing out the initial pizza errors and subsequent problems.

          I haven’t personally verified that many of the papers, himmicanes, color red, power pose, etc., on this blog are justly criticized, but I haven’t seen any evidence that they aren’t, and the papers sound like complete BS, so I’m willing to believe that they are!

          Sure, you might say “that’s not very scientific of you”. But science is all about trust! How often do you personally verify that something is done correctly? We have to trust each other. And I don’t have any reason not to trust the critics of these works, and given some of the public statements I’ve seen from people like Cuddy, I have many reasons not to trust them. Also, I don’t trust anyone who doesn’t preprint their work, and doesn’t provide data or code, so that also helps.

        • I just want to add this quote to show how difficult it is to get journals to do anything:

          “A long and heated comment thread opened with disbelief from Paul Kirschner, a professor of educational psychology at the Open University of the Netherlands: “Is this a tongue-in-cheek satire of the academic process or are you serious? I hope it’s the former.” Kirschner contacted the journals that had published the papers Wansink mentioned, but told Ars that only one replied to him. That journal told him that there was nothing they could, or would, do. As a journal editor himself, Kirschner was shocked by the lack of action.”

          Even with the blatant salami slicing and p-hacking in plain sight a journal editor couldn’t get 3 out of the 4 journals to even respond to him.

          The quote is from:

    • Something that troubles me is that in her talk Amy Cuddy has taken things out of context. For example, she had a slide on which was written “… public shaming [of people like Amy Cuddy and Angela Duckworth] seems entirely appropriate.”

      This seems to have been cut from a two-paragraph comment on 7 July 2016, 12:26PM, by “mark” on Andrew’s 5 July 2016 blog posting entitled “Gremlins in the Work of Amy J. C. Cuddy, Michael I. Norton, and Susan T. Fiske,” the last line of which reads “a little public shaming seems entirely appropriate.” I remembered this because I responded to mark directly following.

      mark makes a good point in his two paragraphs, I think. The “shaming” in his comment comes across quite differently from the way it does in Cuddy’s slide. Duckworth wasn’t even mentioned in mark’s comment, although I know that he is critical of her research and her tactics. Cuddy completely ignored the fact that the climate can be just as hostile, if not more so, for the critiquers as it is for the critiqued.

      Also, apparently Cuddy said in her talk that she had been the victim of sexual harassment and death threats. If so, that’s terrible, of course, but she seems not to have distinguished between the people doing these things and the people making serious and valid scientific criticisms of her work but lumped all of us together. I just cannot imagine any of the people on this blog (or the other blogs where Cuddy’s work had been discussed) making death threats or sending her porn (as she stated on Twitter). Such people are probably from the general public.


      • Carol,
        Your comments about Cuddy taking things out of context add to my impression of Cuddy as an example of “people very different from me”. I don’t mean this pejoratively, but just that she seems to belong to a group of people whose thinking, worldview, reality, whatever, seems so different from mine. I have had the same impression of Wansink. (To be honest, I think of them as “people like my mother”.) I in some sense feel sorry for them, but still find them very difficult to deal with.

  12. Continuing against the idea that no one actually uses NHST (they just do it under duress to get published)…I found a nice example in an archaeology article:

    Of course, an astronomical interpretation is not mandated by the presence of the scorpion; one might attempt interpretation instead in terms of hunting or migration patterns, mythology, or any other coherent system or framework. Indeed, we must also consider the possibility that the symbols on Pillar 43 were not intended to convey any specific meaning, beyond depictions of common animals. However, our basic statistical analysis (see later) indicates our astronomical interpretation is very likely to be correct. We are therefore content to limit ourselves to this hypothesis, and logically we are not required to pursue others.


    Our hypothesis is that the low-relief symbols at GT (except snakes) usually represent star asterisms, and particularly on Pillar 43 they are used to represent a date close to 10,950 BC. Our challenge is to provide evidence to support this.
    Here we tackle this problem by asking the following question: ‘Given the symbolism at GT, by considering all permutations of the available animal symbols at GT how many configurations of Pillar 43 can be found that are a better fit than the current one’. A ‘better fit’ is any permutation of symbols that provides a closer fit to the asterisms we suggest are symbolised in Pillar 43. In simple terms, we are asking ‘Given all the possible animal symbols that could have been carved on Pillar 43, how likely is it that the actual ones were chosen by pure chance, considering that they fit the asterisms for the date stamp better than almost any other combination?’


    Table 1 shows the result of our analysis. That is, it shows the number of animal symbols at GT that we consider at least as good a fit as the existing symbol at that position on Pillar 43. We therefore conclude that the probability that Pillar 43 does not represent the date 10,950 BC is around one in 100 million, or one in 5 million if we neglect permutations with repeated symbols on Pillar 43. Considering these odds, it seems extremely likely that Pillar 43 does indeed represent the date 10,950 BC, to within 250 years.

    You can see they explicitly say that it is unnecessary to consider any other explanation than their favorite, because the positioning of the carvings is unlikely to be due to chance. This is the real “logic” people are using. It is pretty amazing to see it written out so clearly step-by-step like this though, maybe it is new to archaeology?
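    To make the quoted “logic” concrete, here is a minimal sketch of the permutation calculation the paper describes: score how well each arrangement of animal symbols fits the proposed asterisms, then count the arrangements that fit at least as well as the one actually carved. All numbers, symbol names, and fit scores below are invented for illustration; the point is that a small resulting probability only says the arrangement is unusual under random placement, it does not license ignoring every rival explanation.

    ```python
    from itertools import permutations

    # Hypothetical "fit" of each symbol to each of 3 pillar positions
    # (higher = closer match to the proposed asterism at that position).
    fit = {
        ("scorpion", 0): 0.9, ("scorpion", 1): 0.2, ("scorpion", 2): 0.1,
        ("bird", 0): 0.3, ("bird", 1): 0.8, ("bird", 2): 0.2,
        ("fox", 0): 0.1, ("fox", 1): 0.3, ("fox", 2): 0.7,
        ("snake", 0): 0.2, ("snake", 1): 0.1, ("snake", 2): 0.3,
    }

    symbols = ["scorpion", "bird", "fox", "snake"]
    observed = ("scorpion", "bird", "fox")  # the arrangement actually carved

    def score(arrangement):
        # Total fit of an arrangement: sum of per-position fit scores.
        return sum(fit[(sym, pos)] for pos, sym in enumerate(arrangement))

    obs = score(observed)
    perms = list(permutations(symbols, 3))  # all candidate arrangements
    as_good = sum(score(p) >= obs for p in perms)
    p_value = as_good / len(perms)  # fraction fitting at least as well
    ```

    Note that the computed fraction depends entirely on the analyst’s chosen fit scores and on treating “random placement of symbols” as the only alternative, which is exactly the gap the comment above is pointing at.
    
    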

    • Carol:

      Singal’s article is fine but the whole thing annoys me—it’s not Singal’s fault—that there’s this whole online discussion being driven by a third-party report of Cuddy’s apparently evidence-free claims. As I wrote in some other comment in one of these threads, it’s hard for anyone to even reply, given that Cuddy seems to have offered no specifics. It was the same thing when Fiske wrote about terrorism: she gave no examples. I feel that in a just world those sorts of empty statements would just be ignored, but instead it seems like everybody has to offer their take. Again, I don’t blame Singal—he didn’t get the discussion started. The whole thing just seems weird to me.

      • Well if you find that frustrating wait till you hear about this.

        Lakens asked someone for evidence of this “vile bullying” that Cuddy mentioned in her evidence-free talk. As evidence, someone mentioned that I had joked about death threats, but conveniently, had deleted the tweet:

        I never delete tweets unless I immediately realize a spelling error, so I was interested in what this deleted tweet was they were referring to. Apparently they are referring to this tweet, which is not deleted:

        I was simply responding to Lakens that it is fair for you to continue to talk about Cuddy since she stays in the public eye. Is it wrong to talk about public figures?

        So just to recap, Cuddy gave a talk and mentioned death threats while giving no evidence. As evidence that there is a clear culture of bullying someone brought up a tweet that never happened. Sad!

        • Jordan:

          Wow, what a horrible thread. I really do hate twitter. We have someone saying that it’s ok for Cuddy to call me an ass—but Cuddy never called me an ass. She never even mentioned my name, as far as I can tell. We have someone else who accuses you of joking about death threats, and then when you point out that you never did such a thing, he never goes back to apologize to you.

          These people—not Cuddy, but these twitter people—are behaving terribly. In all seriousness, I don’t think Cuddy is behaving so badly here. From what I’ve heard of her latest speech, I don’t think it was particularly productive, but it sounds like she was polite and was not accusing any specific people of anything at all. In contrast, these twitter commenters are going out of their way to insult you, me, and others, even though Cuddy never mentioned any of us. This is not all Cuddy’s fault; it’s the fault of people who feel the need to jump into a discussion and start throwing around insults.

        • Andrew, Jordan: Reading through this twitter discussion, it seems to me that Daniel Lakens stated that Cuddy gets too much focus, and Jordan responded that it seems fair because she keeps herself in the spotlight. Nicholas Christakis seemed to think that Jordan’s “seems fair” was a response to someone’s earlier mention of death threats; it wasn’t.

        • Twitter is where rationality goes to die.

          To repeat my previous post: I really think we’re doing ourselves a disservice by relying on tweets as evidence. It’s a bit like interpretive archaeology, but for the present, not the past. So much of the context is lost and misunderstandings and over-simplification are pre-programmed. (Absurdly, we’re deliberately doing this to ourselves).

        • Andrew said: “… it’s the fault of people who feel the need to jump into a discussion and start throwing around insults.”

          Yes, it does seem as though there are people who “see a fight” and jump in, possibly just for the sake of fighting (or bullying). Sad.

        • I’m really trying to understand exactly what we can be doing better, but these criticisms have not included any evidence of bullying or suggestions for how to properly point out someone’s life work is actually wrong without seeming like a bully.

          I don’t know that Amy Cuddy should get a complete pass here. I think Donald Trump is currently involved in a lawsuit about inciting violence at a rally. When you use explosive rhetoric in a talk aren’t you at least a little responsible for the resulting angry mob?

          Obviously I did not see the talk, but it’s pretty much a fact she said she was mentioned 600 times on a blog. Okay, sure, maybe if you count every single mention the number might be right, but regardless, the number was thrown out just to exaggerate the problem. And if mentioning someone is wrong maybe she should complain to TED, I hear they have a video of her that has been seen 40 million times, how dare they!

          And it seems to be a fact that she had a slide that mentioned death threats. Again, probably true, but what was the purpose of including that? The threats likely had nothing to do with the bloggers. Public figures get death threats all the time.

          I just feel that her audience and supporters would have reacted very differently if she gave a more accurate description of what has happened to her. Does it suck to be her these last couple years? I don’t know, probably. Does it feel like everyone is attacking her? I don’t know, maybe. But just because one popular blog repeatedly pokes fun at power posing doesn’t mean there is a culture of fear.

          P.S. Social psychology researchers who have been fabricating results should be very afraid of recent developments, so if that is the fear that the social science community is experiencing I’m all for it.

          P.P.S. In case anyone says I’m hypocritical for talking about explosive rhetoric, because after all, I did call Brian Wansink the “Donald Trump of Food Research”, I’m okay with all kinds of rhetoric as long as the situation warrants it. The problems with Wansink’s work cannot be overstated. I could write an entire book about them. If anything I have not used strong enough rhetoric in my criticisms of his work.

          I just don’t think having your work criticized on a few blogs warrants the declaration that a culture of fear exists that is slowing scientific progress.

        • You should focus on whether they can predict something precise. You know, how smart people used to prove they knew stuff before NHST allowed them to trick people without consequence.

        • I just realized that Paul Coster, who has been objecting to the tweets by Daniel Lakens (who wants evidence), is Amy Cuddy’s husband.

        • Luckily someone gave me a heads up about that. And guess what? He blocked me on Twitter! And he’s the one who accused me of blocking him!

          This is so amazing I just have to recap it.

          I never delete tweets or block people, but I woke up yesterday to accusations of a deleted tweet and having blocked someone.

          One of the people accusing me of deleting a tweet deleted their accusation, and the person accusing me of blocking them blocked me.


        • Jordan:

          I just hope that, at some point, that contagion-of-obesity dude apologizes to you for saying to the world that you were jokingly approving of death threats. His mistake was understandable in the heat of the moment but once you pointed out he’d got it wrong, he really should say he’s sorry.

        • I don’t know, maybe he really believes there’s a deleted tweet out there somewhere.

          Now that I have people on the TIME 100 list spreading lies about me I feel like I’ve earned some sort of methodological terrorism badge. I leveled up!

        • Jordan:

          It does seem that, when people are on twitter, they say things they otherwise might think better of.

          It’s funny because there’s that saying: Don’t put anything on email that you wouldn’t want the world to see. On twitter, the world really does see it, yet people still throw around rash accusations. On the plus side, twitter has that here-today-gone-tomorrow feeling, so it’s possible that the contagion-of-obesity guy just never bothered to follow up and find out he’d been wrong about you.

          One thing this whole episode has demonstrated to me is the role of the medium of communication in how the story develops. When people first emailed me about Cuddy’s speech, a few days ago, I was thinking about blogging on it, but that seemed pointless given that Cuddy never even named me specifically. I could hardly defend myself given that there were no particular charges. So it annoyed me to see the twitter feed and find that I’d been dragged into it anyway.

          None of this will make power pose work, though. The tone police could round up all the methodological terrorists, they could retake control of Psychological Science, etc etc., but there’d still be no evidence that “That a person can, by assuming two simple 1-min poses, embody power and instantly become more powerful has real-world, actionable implications.”

          Somehow it seems that people are saying that it’s rude to laugh at a statement like that. So maybe I’ll just say that I think it is poor scientific practice to conclude the abstract of a paper with a statement for which zero evidence has been offered. (Those who want to google can read the paper and find that nowhere did those experiments have anything on people “becoming more powerful.”)

        • Ah, Nicholas Christakis, the “contagion of obesity dude”! I knew that name sounded familiar but couldn’t place it. I remember reading that article about a decade ago. It got a lot of media attention but was statistically flawed, if memory serves. It’s a small world, isn’t it?

        • Carol:

          Christakis has had a much longer career than Cuddy, but he shares with her that he did research with a mind-over-matter flavor that was widely publicized (and which he presented in a Ted talk!) and was then widely criticized on statistical grounds.

          You can search this blog for discussion of those criticisms: I didn’t do the critical analysis myself; rather, I discussed the arguments of various critics along with the responses by Christakis and his collaborator, James Fowler.

          I don’t know if Christakis knows Cuddy personally, or if it’s just that he sympathizes with her, having done widely celebrated research that was gradually dismantled by critics.

        • Andrew: I will concede one point about this culture of fear Cuddy is talking about. It does make sense for people in psychology who support Cuddy to remain silent, because a lot of the terrorists come from fields outside of psychology, and are only made aware of most work once it gets a certain amount of attention.

          Look at Wansink, none of us had any idea who he was until his blog post.

          Look at the positivity ratio and Fredrickson, I’m pretty sure Nick Brown just randomly came across that paper.

          These people (social psychologists peddling snake oil) are in a very difficult spot right now because they need to promote their work, but if it gets noticed by the wrong person the house of cards they built could quickly collapse.

          And if a social psychology researcher decides to publicly endorse work under fire there’s a good chance one of us terrorists will notice, and might decide to take a closer look at some of their work. For example, if I didn’t have my hands full with pizzagate I’d be tempted to look at the work of this Christakis guy.

          However, I still don’t think this culture of fear is our fault, or if it is it’s not a bad thing. If you’ve been doing good work, great, you have nothing to be afraid of. If you’ve been p-hacking, fudging statistics, or fabricating data, then yes, you should probably try to draw as little attention to yourself as possible.

          It’s similar to why people are afraid to post their data and code. The real reason is they are embarrassed by the quality of the data and code (or neither exists). Just look at the pizza papers, that data set is a mess (the Stata code the econometrician wrote seems good, but that wasn’t the code used in the original papers).

          People who share their data and code and preregister their studies do so because they have nothing to hide, they are confident you can try and tear apart their conclusions but won’t succeed. The people who hide data and code do so because they have something to hide. And these same people are the ones who are afraid to stand up for poor methods because they know their own work won’t withstand criticism.

        • Hi Jordan,

          I’ve met Wansink, briefly. I don’t recall ever reading his articles or hearing him speak, though.

          I believe Nick Brown discovered the 2.9013 positivity ratio article of Fredrickson and Losada (2005) in a course that he was taking as part of his master’s degree program in applied positive psychology at the University of East London. Nick, please correct me if I’m wrong.

          Some of the people critical of social-psychology research (e.g., Ulrich Schimmack) are psychologists; others are from other disciplines.

      • Looking through the data set is like going through a house of horrors.

        I thought the Stata code looked good, but now I’ve noticed that it appears they counted people’s responses to questions like “how was the middle piece of pizza” even if that person didn’t provide a response for how many pieces they ate. We don’t know if these people even ate pizza and yet the responses are happily taken at face value. I guess when you do a deep data dive you grab whatever you can reach.
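        A minimal sketch of the issue, with invented field names and numbers (not the actual pizza data): if you average taste ratings without first checking that the respondent reported eating anything, respondents with missing consumption data silently contaminate the result.

        ```python
        # Hypothetical survey rows; None means the respondent left
        # the consumption question blank.
        responses = [
            {"slices_eaten": 3, "middle_piece_rating": 7},
            {"slices_eaten": None, "middle_piece_rating": 9},  # no consumption reported
            {"slices_eaten": 2, "middle_piece_rating": 5},
        ]

        def naive_mean(rows):
            # Takes every rating at face value, missing consumption or not.
            vals = [r["middle_piece_rating"] for r in rows]
            return sum(vals) / len(vals)

        def filtered_mean(rows):
            # Only counts ratings from people who reported eating pizza.
            vals = [r["middle_piece_rating"] for r in rows
                    if r["slices_eaten"] is not None and r["slices_eaten"] > 0]
            return sum(vals) / len(vals)
        ```

        On this toy data the naive average is 7.0 while the filtered one is 6.0: including the respondent who may never have eaten a middle piece shifts the answer.
        
        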

        • Jordan:

          We just have to be careful here. If we mention the name Wansink more than 600 times in total, I think this qualifies as “bullying” or “public shaming” or some other bad behavior. Oh, why can’t bloggers be more polite?

        • Anon:

          Where’s the bloody face? What I see is a guy who’s a tenured professor at an Ivy League university, who regularly appears on widely-watched TV shows, who’s held a U.S. government position and has consulted with major corporations, and who recently testified before Congress.

          What’s the rule on this that you’d like to implement? We’re only allowed to write X number of blog posts about Wansink? What is X? To blame Anaya etc. for turning up more and more problems with Wansink’s work, to blame me for reporting on it . . . that’s just blaming the messenger, dude.

        • So you don’t believe there are better and worse ways to deliver a message? All methods are equal? Throwing a brick through a window is the same as sending a letter through the mail?

        • Anon:

          I see no bloody faces, I see no bricks, I see no broken windows. People have contacted Wansink in quieter ways and nothing has come of it. See for example this story from 2012! I think open letters and blog posts are a very appropriate way of addressing research misconduct.

        • I agree as long as they are accurate critiques and as long as they are done with a modicum of respect. Neither of which appears to be Jordan’s forte.

        • Anon:

          You write, “as long as they are accurate critiques and as long as they are done with a modicum of respect. Neither of which appears to be Jordan’s forte.”

          I agree that what is relevant are the accurate critiques, and if there are enough inaccurate critiques, that presents a problem. The critiques I’ve shared in this blog seem reasonable to me and I’ve not seen any evidence that they are inaccurate. Regarding “a modicum of respect”: at this point, I don’t know how much respect Wansink’s work deserves. But, sure, that’s a judgment call. I do think this is a far cry from “God, you both are fucking pompous assholes. . . . Let’s spend some more time rubbing this guys bloodied face into the pavement.”

        • At some point differences of degree become differences of kind. Excessive critique of the same person begins to look personal and bullying, whether you think it should appear so or not.

        • Anon:

          You write of “excessive critique.” But what is “excessive”? You’re entitled to your own opinion, of course, but, again, I honestly don’t see why you think that criticizing published work more than X times makes someone a “pompous asshole.” Wansink’s research behavior irritates me, and I’ll agree if you tell me that responding to blog comments on the topic is probably not the most productive use of my time, but I don’t get the whole “bullying” thing. I’m expressing my annoyance in an open forum, open to comments such as yours, and based on specifics of published work. I get that this annoys you, but I think that my writing can annoy you without it being “bullying,” “fucking pompous,” “rubbing this guy’s bloodied face,” etc. I’m just pointing out flaws in published work and responding to comments!

        • Except that you aren’t simply criticizing the flaws in the research, Andrew. You are also supporting error filled and over-inferred critiques. You seem to miss all of the flaws in Jordan’s analysis. Why is that?

        • Anon:

          You write that I am “supporting error filled and over-inferred critiques.” I’m afraid you’ll have to be specific here. I’ve made mistakes—I make mistakes all the time—but it’s kinda hard for me to correct the mistakes if I don’t know what they are. All the critiques that I’ve written about seem reasonable to me. So if there are some errors in my posts you should let me know!

        • Let’s focus on the post you linked to. Please step through the logic that allows an inference of “misconduct” or “misrepresentation” from the “evidence” provided by Jordan. Because it is not there. The study is not worth the time and money put into it, I agree. But Jordan’s inference is well beyond the data at hand.

          There may be additional information that allows for this inference, but it is not in the blog post nor is it linked to through the post.

        • Anon:

          According to the NIH, at a webpage linked to by the Cornell University Media Relations Office, “Research misconduct is defined as fabrication, falsification and plagiarism, and does not include honest error or differences of opinion,” and they define “falsification” as “Manipulating research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record.”

          In the link above, Brown, Anaya, et al. report a paper in which Wansink and his collaborators clearly misstated how they collected their data. And, in the context of that paper, the misstatement was consequential: the authors wrote that they collected the data “unobtrusively,” making appropriate subtractions “outside of the view of the customers.” Actually, though, if you believe their later report, they simply asked people how many slices of pizza they ate. This seems to me to be a clear case of misrepresentation, and it’s hard to see how it could’ve occurred by accident. I don’t see how this could be classified as an honest error or a difference of opinion. In addition, there were several other problems with the paper, indicating that the numerical data, however they were gathered, were not accurately reported. The research was not accurately represented in the research record. This is fatal to science.

        • I hate that I am arguing about this absurd study, but here goes:

          All of the following are possible in the context of this study:

          1. To visually observe people eating pizza.
          2. To collect self-reported data after the fact.
          3. To correct the self-reported data using the visually observed data.

        • Anon:

          Sure, all things are possible but the point is that your three-step procedure is not what they reported in the paper. Their paper misreported how the data were actually collected.

        • My 3 step procedure is consistent with what was reported. The tie doesn’t go to the critic. If you want to criticize the lack of specificity, fine. If you want to criticize the reviewers, the editor, and all of the authors for not being more explicit in the description, fine. But, there is nothing about the description that is inconsistent with what I wrote.

        • And it is not that “all things are possible” it is that this very reasonable and obvious explanation for the same information should be given its just due by a critic alleging “misconduct”.

        • Anon:

          You write, “My 3 step procedure is consistent with what was reported.” No it’s not. The paper said nothing about asking people how many slices of pizza they ate. Rather, it said that the measurements of consumption were taken indirectly.

          In addition, if there was such a three-step procedure—which, again, was never claimed, either in the original paper or in the correction—there is then no statement about how the three numbers are combined to get the data.

          When I say “all things are possible,” I mean that you or I or Anaya or anyone else could make up stories about where the data came from. It’s the responsibility of the authors to accurately state how the data were gathered, and they did not do that. Rather, they misrepresented. And of course there were other problems of misrepresentations in several other Wansink papers.

        • Andrew:

          You are inferring well beyond the data at hand as well. There is nothing inconsistent about the information they wrote and a procedure that cannot reasonably be described as “misconduct”.

          Not explicit enough? Sure. Misconduct is a huge overstatement.

        • I find it interesting that in “Peak-end pizza: prices delay evaluations of quality” they wrote:

          “Unfortunately, given the field setting, we were not able to accurately measure consumption of non-pizza food items.”

          But in “Eating Heavily: Men Eat More in the Company of Women” they wrote:
          “In the case of salad, customers used a uniformly small bowl to self-serve themselves and, again, research assistants were able to observe how many bowls were filled and, upon cleaning by the waitstaff, make appropriate subtractions for any uneaten or half-eaten bowls at a location outside of the view of the customers.”

          But hey, I guess we’re just nitpicking.

        • Anon:

          I guess everything’s a judgment call. I agree with Brown, Anaya, van der Zee, Heathers, and Chambers that what was done in this paper qualifies as “changing or omitting data or results such that the research is not accurately represented in the research record,” which fits the NIH definition of research misconduct. Data were collected by asking people directly, but the paper says that the data were collected without the participants’ knowledge: that seems like a big deal to me, an inaccurate representation. But, sure, you can label this differently. I’m just glad we’ve moved beyond terms like “assholes” and images of bloody faces and bricks being thrown through windows.

        • Andrew:

          Jordan’s and your determined effort to conclude misconduct in the face of substantial ambiguity is troubling. I don’t expect more of Jordan, but I do expect more of you.

        • Anon:

          It seems clear to me that there was changing or omitting data or results such that the research is not accurately represented in the research record. But I respect that you have a different opinion, and I’m glad this has moved beyond discussions of assholes and bloodied faces.

        • Anon: Hey, Andrew is the real terrorist here, I’m just a junior terrorist or terrorist in training. Can you believe Andrew has mentioned Amy Cuddy 600 times on this blog and she is getting death threats! Why should you expect more of him than of me?

        • I just realized something. Critics often get accused of being jealous of the famous people they are criticizing, but here we have someone talking about how “pompous” the critics are! Definitely seems Anon here with his attacks is a little jealous we are getting so much attention and credit. Oh how the tables have turned.

          It’s not my fault that my criticism of Wansink has been covered by Slate, New York Magazine, The Times, The Chronicle of Higher Education, Ars Technica, and many other outlets. Okay fine, we did contact some journalists when we weren’t happy with Wansink’s initial response, but it’s not like we put a gun to their head and forced them to cover the story.

          But maybe this is a slippery slope. If I take calls from journalists what’s to say I wouldn’t accept a TED talk if they contacted me? If I gave a TED talk why would I stop there? I might as well write a book called “Mindless Research” while I’m at it. I’m really no better than Wansink, Cuddy, etc. Oops, I mentioned Cuddy again, I forgot that wasn’t allowed.

        • Jordan —

          You love to blame “Cornell” and say that we [serious scholars] can’t trust anything coming out of “Cornell”, even though it’s been pointed out to you many times that Wansink is not Cornell, Cornell is not Wansink, and that it’s unfair to characterize all research by Cornell scholars as if it is of the same shoddy quality as Wansink’s.

          All the self-righteous preening about how careful you are as a scholar and how reluctant you are to generalize beyond your data is for naught when you refuse to stop generalizing beyond your data.

        • Anon:

          Can you be specific? We can’t do much with “asshole” as it is so subjective, but I did look up “pompous” and got this: “affectedly and irritatingly grand, solemn, or self-important.” I’m certainly not intending to be “grand, solemn, or self-important.” Quite the opposite. I can’t speak for Jordan Anaya, but, as a statistician, I’m bothered by the misuse of statistics, and I’m annoyed when people try to take the focus off the technical questions by raising what I see as irrelevant other issues. Again, I have no intention of being “grand, solemn, or self-important.” It’s not about me at all; it’s about the research, and about the system by which cargo-cult pseudoscience gets treated as the real thing by respected organs including NPR and the National Academy of Sciences. This bugs the hell out of me.

        • Jordan’s approach is nothing short of “self-important” and your willingness to uncritically support his comments gives the impression that you seek the same.

        • Anon:

          I can’t speak for Jordan Anaya (or Nick Brown, Tim van der Zee, James Heathers, or Chris Chambers) but I will say that they seem to have put a lot of work into all this, so I don’t begrudge them some claiming of credit.

          I agree that finding research misconduct in some studies of dietary experiments is not the biggest thing in the world—there are a lot bigger scandals out there—but to me it is symptomatic of larger problems in published research. Science relies on trust, and people like Wansink betray that trust, and organizations like Cornell betray that trust once again by shielding the Wansinks in their employ.

          I think Anaya and his colleagues have done a service by putting in the effort to dig up the details on what Wansink did. If Jordan wants to be a bit self-important about it (in your judgment), that doesn’t bother me so much. I’m much more bothered by Wansink’s behavior and by the behavior of university officials and journal editors who sometimes seem to just want to close their eyes and hope this all goes away.

        • Jordan is not only self-important, but he is wrong about many, many of his criticisms and that does not seem to bother you as long as he is criticizing Wansink or Cuddy or whoever is the goat of the moment.

        • Anon:

          I’ve not read everything that Anaya has written—actually, I hadn’t realized he’d written anything about Cuddy. The criticisms by him, Brown, etc., that I’ve relayed on my blog all seem reasonable to me.

          When I relay things from Anaya, Brown, etc., this should not be taken as an endorsement of other things that they have written.

          Conversely, when I criticize people’s writings, this should not be taken as a criticism of other things they have written. Indeed, I made that point explicitly in my long post last year responding to Susan Fiske: there was an opinion piece of hers that really bothered me, there were some research articles by her with statistical errors, there were some terrible papers she published as a journal editor—but I very clearly was not saying that I thought all or even most of her work was bad. I was making specific criticisms of specific papers and of specific behaviors.

        • Anon: Any time someone mentions a problem with a blog post of mine I immediately post an addendum.

          The biggest problem I had in my first post was assuming you can check a mean of values that have undergone a nonlinear transformation (you can’t). That accounted for a few of the errors I mentioned here:

          I was clearly in the wrong there.

          Other addenda I’ve added are not as serious. I stand by the criticisms that have no attached addenda, and I stand by almost all the criticisms in the original preprint (BMI suffers from the nonlinear-transformation problem).
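          To make the nonlinear-transformation point concrete, here is a minimal sketch with made-up numbers (hypothetical values, not data from any of the papers): for a nonlinear quantity like BMI (weight/height²), the mean of the individual BMIs is generally not the BMI computed from the mean weight and mean height, so a reported mean cannot be checked that way.

```python
# Minimal illustration (made-up numbers): the mean of a nonlinear
# transform is generally not the transform of the means, so a reported
# mean BMI cannot be checked from mean weight and mean height alone.
weights_kg = [60.0, 80.0, 100.0]   # hypothetical subjects
heights_m = [1.60, 1.75, 1.90]

bmis = [w / h**2 for w, h in zip(weights_kg, heights_m)]
mean_of_bmis = sum(bmis) / len(bmis)

mean_w = sum(weights_kg) / len(weights_kg)
mean_h = sum(heights_m) / len(heights_m)
bmi_of_means = mean_w / mean_h**2

print(round(mean_of_bmis, 2), round(bmi_of_means, 2))  # → 25.75 26.12
```

          The gap here is small, but it is enough that a consistency check built on the transform of the means would flag correct data, or pass incorrect data.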

        • I would never defend Wansink’s research methods. There is nothing of value that can be inferred from them. My reading of Jordan’s criticisms is almost the same.

        • Anon:

          You write that almost nothing of value can be inferred from Anaya’s criticisms. I disagree. For example, I think this recent post by Brown, Anaya, van der Zee, Heathers, and Chambers is both informative and insightful.

        • Where is the link to the specific description of the study that definitively shows that the study was all “self-reported” data which is the primary claim of the post?

        • I will say one more thing. Perhaps some of the assumptions in a couple of my blog posts should have been more carefully thought through, but the main criticisms that people usually reference are the pizza publications, and despite the response from Cornell that downplays the problems, I believe my colleagues and I did an amazing job in that paper.

          For example, without the data set we determined that it was impossible for the mode of slices to be 3. And guess what? We were right. And we were right about many other things. So many in fact that I believe a second preprint is warranted to show just how accurate our methods can be.

          I’m not sure how many people could have done as accurate of a job as us on the pizza papers, so I’m not sure saying critiques are not my forte is accurate.

          Not to throw Nick Brown under the bus, but he also has addenda on his posts sometimes. The difference between us and Wansink is we immediately acknowledge errors (and we have a few as opposed to hundreds. Oh, and we share our data/code. Oh, and we don’t blatantly lie. Oh, and we respond to emails. Oh, and we don’t turn one publication into 4 or submit the same paper to multiple journals. etc. etc.).
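          For what it’s worth, a claim like “the mode cannot be 3” can be checked by brute force when the sample is small: enumerate every sample of n integers in the plausible range and keep those matching the reported summary statistics. This is a toy sketch with made-up numbers (not the actual pizza data, and far cruder than the reconstruction methods used in the preprint):

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev, multimode

# Brute-force sketch (hypothetical numbers): enumerate all samples of n
# integers in [lo, hi] and keep those whose rounded mean and SD match the
# reported summary; the modes of the survivors are the only modes the
# underlying data could have had.
def possible_modes(rep_mean, rep_sd, n, lo=0, hi=12, decimals=1):
    modes = set()
    for sample in combinations_with_replacement(range(lo, hi + 1), n):
        if (round(mean(sample), decimals) == rep_mean
                and round(stdev(sample), decimals) == rep_sd):
            modes.update(multimode(sample))
    return modes

# A degenerate but checkable case: mean 2.0 with SD 0.0 forces every
# value to be 2, so 2 is the only possible mode.
print(possible_modes(2.0, 0.0, 5))  # → {2}
```

          Full enumeration only scales to small n; for realistic sample sizes one needs heuristic search over candidate samples, which is essentially what the published reconstruction methods do.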

        • Anon:

          As diners completed their meal, they were intercepted after they paid at the cash register following the meal, and each was given a short questionnaire that asked for demographic information along with a variety of questions asking how much they believed they ate, and their taste and quality evaluations of the pizza. Other than questions involving numerical estimates, most questions asked their agreement with a number of statements on 9-point Likert scales (1 = “strongly disagree”; 9 = “strongly agree”). The specific wording for each Likert question appears in Table 2. Demographics of the two conditions are provided in Table 1.

        • Without that link, it is impossible to determine from the post whether the post is accurate. It might be. But there is no evidence provided to support it.

        • Anon:

          You ask, “Where is the link to the specific description of the study that definitively shows that the study was all “self-reported” data which is the primary claim of the post?”

          There was no link but it was not hard to find. Google *“Eating Heavily: Men Eat More in the Company of Women,” by Kevin M. Kniffin, Ozge Sigirci, and Brian Wansink (Evolutionary Psychological Science, 2016, Vol. 2, No. 1, pp. 38–46).* and you come to this link: and below the abstract you see this:

          A comment to this article is available at

          and if you follow that link you see this Editorial Note:

          In the article “Eating Heavily: Men Eat More in the Company of Women,” by Kevin M. Kniffin, Ozge Sigirci, and Brian Wansink (Evolutionary Psychological Science, 2016, Vol. 2, No. 1, pp. 38–46), the authors report that the units of measurement for pizza and salad consumption were self-reported in response to a basic prompt “how many pieces of pizza did you eat?” and, for salad, a 13-point continuous rating scale. The authors further report that a robustness check that affirms the main findings that men tend to eat (i) more pizza and (ii) more salad in the company of women is now available at the CISER Data Archive at, with raw data as well as Stata command lines that reproduce the signature results of the article.

          The last sentence of that note is, to my mind, in poor taste. The authors admit that they completely misrepresented how the data were collected, which to my mind makes the so-called robustness check pretty meaningless. In any case, I agree with Brown et al. that, if you accept the correction, then the original paper completely misrepresented the data collection in a way that is fatal to the main conclusions of the paper.

        • We do not disagree about the absurdity of the research questions nor about the methods. But, we seem to disagree about what can be inferred from those words.

        • Jordan is once again drawing a strong conclusion without enough information with which to do so. That is my problem with his critiques.

        • Anon:

          It’s not just that Wansink’s methods were “absurd.” It’s that he said one thing and did another. In the paper he said “the number of slices of pizza that diners consumed was unobtrusively observed by research assistants and appropriate subtractions for uneaten pizza were calculated after waitstaff cleaned the tables outside of the view of the customers” but (if you believe his correction) what he actually did was just ask people directly how many pizza slices they ate.

          To collect your data in one way, and then report that you collected it in a much more sophisticated way, that’s uncool. One might even call it research misconduct.

        • There is nothing about that description of “observation of pizza eating” that is incompatible with also collecting “self-reported pizza eating” data.

        • Anon:

          1. You write, “There is nothing about that description of ‘observation of pizza eating’ that is incompatible with also collecting ‘self-reported pizza eating’ data.” Yes, it’s incompatible, given that the paper did not report two sets of data. The original paper explicitly said “the number of slices of pizza that diners consumed was unobtrusively observed.” This is not the same thing as directly asking people! Just go through and read the original paper; it’s online.

          2. I hate twitter. I’m evaluating the Brown, Anaya, et al. post on its own merits; I don’t really care if people love it or hate it on twitter.

        • In the open letter, Nick and company wrote: “Nobody observed the number of slices of pizza. Nobody counted ….” May I suggest that asserting this with such certainty is ill-advised at this point? It is possible that Wansink’s research assistants did observe. One of the other articles states that the particular restaurant was chosen because it was “necessary to find an AYCE [all you can eat] restaurant where diners could be unobtrusively served and observed in a natural manner.” There is no elaboration as to the observation but this is a hint that there was some sort of observation going on.

        • Carol:
          Sure, anything is possible.

          But maybe they just like writing the phrase “unobtrusively recorded”. In the carrots paper they write:

          “For the 113 students who were present for all three study days, their choices at each meal were unobtrusively recorded. Following lunch, the weight of any remaining carrots was subtracted from their starting weight to determine the actual amount eaten.”

          In my opinion “unobtrusively recorded” would suggest a visual method, but the very next sentence contradicts this by talking about weights. In his retraction watch interview Wansink claimed a visual method, “the quarter plate method”, was used in this paper, and in fact, was cited in the paper. The method wasn’t cited as claimed, but I suppose the method could have been used despite the clear mention of weights. I also suppose all the numbers could be made up, anything is possible after all.

          A very disturbing pattern is emerging. When it is pointed out that numbers don’t add up the default response seems to be that the method of the paper wasn’t accurately reported. Assuming this is true and explains the numbers we are seeing, how does this happen? Do they just randomly copy and paste methods without even thinking about what actually was done in the study?

          Given the number of contradictions our investigation is revealing it is difficult to know what we can and can’t believe when it comes to Cornell and this lab.

        • Hi Jordan,

          I’m just suggesting a little more caution, so that you don’t leave yourself open to counter-attack.

        • Hi Glen M. Sizemore and Curious:

          If you are interested in the topic of between-subjects analysis vs. within-subject analysis, Peter Molenaar has written some wonderful articles on this.

        • This is going to be an obscure analogy, but whatever:

          Your comment reminded me how in Gurren Lagann the population of a planet of spiral people can’t reach 1 million or else the antispirals will come and destroy them.

          Because you see, spiral people (people who can continue to evolve and gain knowledge–the term spiral is a symbol for DNA), are dangerous to the universe. They can become so powerful that they could one day destroy everything.

          To stop the Earth’s population from reaching 1 million, which would result in the antispirals coming in and destroying Earth, there is a spiral king who keeps the population under control (through brutal means).

          I’m just imagining the tone police as the spiral king in this analogy. Maybe they really have our best interests in mind. Ignorance is bliss after all. Maybe if the terrorist numbers got out of control we would realize everything we’ve grown to believe is a lie, and the antispirals, in this case Donald Trump and the government, would swoop in and take away all our funding for doing terrible science the last several decades.

      • One thing I’ve noticed that could be problematic. Andrew occasionally argues by conceivability: “This seems to me to be a clear case of misrepresentation, and it’s hard to see how it could’ve occurred by accident” or “I can believe that they didn’t bother to check”.

        In a similar vein, Jordan et al in their letter to Shackelford write “Even if we ignore what appears to have been a deliberately misleading description of the method”, suggesting to me that they might think it has to be deliberate because they can’t imagine this to be the result of an accident or carelessness.

        But obviously, what I or Andrew or Jordan can imagine will be strongly influenced by how we ourselves think we would act in the situations in question. And since Andrew and Jordan (not sure about myself) are certainly well-structured, meticulous people aware of the importance of methodological rigor, they might have a hard time imagining that something not done in that spirit could be unintentional.

        My experience has been that it’s often quite unimaginable and incomprehensible what goes on in other people’s minds and that basically every kind of irrational weirdness or mess can come out of it. Inferring from the results of human action to its causes or motives is therefore inherently difficult and caution should be exercised in any interpretation, even if – and perhaps particularly if – one can only imagine one explanation. (And Jordan is probably going to tell me that’s not at all why they suspected the inconsistency was deliberate.)

        • Alex:
          You’re right, we don’t know their intentions.

          It could be that they intentionally altered the methods of that paper. It’s not difficult to imagine that the paper was rejected by a journal, a criticism being how they collected the data, so they changed the methods.

          Or it could be they accidentally inserted an entire account that simply did not happen.

          Anything is possible, and I’m open to hearing from them how this type of thing happens. Unfortunately, the Editorial Note did not explain how this mistake was made. In fact, we don’t have explanations for how many of the strange findings/mistakes in the papers occur.

          If they don’t want us to wrongly assume how/why something was done the solution is really simple: tell us how these mistakes are being made.

        • Curious:

          No, Jordan is not doing a “When did you stop beating your wife?” kind of tactic. The paper in question misreported the data collection. That’s for real. A bunch of papers by Wansink had summaries that were not consistent with any possible data. That’s for real too.
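          The “summaries not consistent with any possible data” point can be made mechanical. A granularity check in the spirit of Brown and Heathers’ GRIM test: with n integer responses, the exact mean must equal k/n for some integer total k, so a reported rounded mean is achievable only if some k/n rounds to it. A minimal sketch with made-up numbers:

```python
# GRIM-style granularity check (sketch, made-up numbers): with n integer
# responses, the exact mean must equal k/n for some integer total k, so a
# reported (rounded) mean is only possible if some k/n rounds to it.
def grim_consistent(reported_mean, n, decimals=2):
    """Return True if some integer total k gives a mean that rounds
    to the reported value at the given precision."""
    return any(
        round(k / n, decimals) == round(reported_mean, decimals)
        for k in range(0, n * 100)  # generous bound on the total
    )

# Hypothetical example: with n = 7 integer responses, the achievable
# two-decimal means end only in .00, .14, .29, .43, .57, .71, .86.
print(grim_consistent(3.19, 7))  # → False
print(grim_consistent(3.29, 7))  # → True
```

          A check like this says nothing about intent, of course; it only says whether the reported summary could have come from any integer data at all.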

        • Curious/Anonymous:

          Also, please use a consistent handle. If you don’t want to use an actual name, “Curious” is fine, but it’s not so helpful to call yourself “Anonymous” on part of the thread and “Curious” on another part. Not quite sock-puppeting, but it does pollute the discourse. I have no problem with anonymous or pseudonymous comments, but one name per person, please.

        • Alex makes a good point here — somewhat related to a point I initially had a hard time making with Andrew on another discussion (I forget which). If I remember correctly, it went something like this: Andrew was speculating about what someone else would do based on what he would do (or perhaps on what was important to him?). I tried to point out another possibility; he said we were both speculating; I replied, yes, but his speculations were phrased as “would,” whereas mine were phrased as “might.” I also phrased this as he seemed to be considering a prior with all its weight on one possibility, whereas I was suggesting a prior with non-zero probability on more than one possibility.

    • I’ve just read the “open letter” and I feel a little bit responsible because I brought this editorial notice to the attention of Jordan and company (although I was in no way involved in the writing of this open letter and didn’t know about it until I clicked on Jordan’s link).

      In my opinion, Nick, Jordan, Tim, and the two new co-authors are way out of line telling the editor how to deal with the situation: “In view of these problems, we believe that the only reasonable course of action is to retract the article, and to invite the authors if they wish, to submit a new manuscript with an accurate description of the methods used, including a discussion of the consequences of their use of self-report measures for the validity of their study.” Retraction of the article and invitation of a new submission is the decision of the journal. Regardless of the seriousness of the problems in the article (and I agree that they do seem serious), the open letter comes across as high-handed and dictatorial.

      Also, given that there are four articles based on the same dataset, what to do now should probably involve the editors of all four journals, not just Shackelford.

    • Reading the discussion between Anonymous on one side and Jordan and Andrew on the other, I’ve decided to take a close look at the article, the data, and the program code myself. But the website with the materials that Wansink posted does not seem to contain a copy of the survey (the questionnaire or whatever one wants to call it — I mean the form that the diners at the pizza parlor completed) or a copy of the experimental protocol. Having these would allow determination of exactly what went on. Were these materials ever requested by anyone? Even if Wansink doesn’t have a copy now, Cornell’s IRB should have a copy because these materials would have been required by the IRB before approving the study.

      • Carol:
        We never requested the experimental protocol. I’m not sure that they have it. The first time we emailed the lab they told us that if we wanted the data we should perform a replication of the experiment and that all the necessary information to do so was in the papers (this clearly is not the case).

        Also, in his retraction watch interview Wansink said he realized they asked people how much they ate in two different ways. So if there does exist a survey form there appears to be multiple versions of it.

        Currently I’m taking the description of the variables in the Stata scripts at face value, which indicates diners self-reported how much they ate.

        • Hi Jordan,

          The two different ways described in the Retraction Watch interview are not the same as the two different ways identified now.

          In the RW interview, Wansink states the diners were asked how many pieces of pizza they ate (an integer number) and also to put an X on a scale with a 0 anchor at one end and a 12 anchor at the other end and no numbers in between. The latter cannot be claimed to be a 13-point continuous rating scale, as is stated in the editorial notice. Perhaps, though, this is just sloppiness in description. If they used the scale with 0 to 12 end-anchors, instead of a 13-point rating scale with the 13 points marked or verbal labels or both, this would account for the non-integer values (e.g., 7.9) but not why so many diners have exactly the same non-integer values.

          Both types of rating scales could have been used on the same survey form. Asking the same question two different ways is not uncommon in survey research (e.g., ask age and ask date of birth). This does not mean that there were necessarily two different forms.

  13. I know you hate Twitter but this came up in my feed and it concerns one of your favorite people so I couldn’t help but share.

    This preprint was recently posted:

    In it, Gilbert is quoted as saying:
    “The reproducibility of psychological science is quite high.”

    In Gilbert’s original paper he says:
    “Indeed, the data are consistent with the opposite conclusion, namely, that the reproducibility of psychological science is quite high.”

    Apparently this difference was enough for him to tweet this:

    “Completely misrepresents our meaning. You broke a basic rule of honest reporting & this is not debatable. You’ve now been notified.”


    He seems to think that breaking up the sentence and adding a capital letter to make it seem like a complete sentence is a huge ethics violation.

    To be fair, I do understand why he feels his quote is being taken out of context. I’m not a statistician, but I believe in his paper he used your “all things are possible” mantra. The reproducibility of psychology could be high, or it could be low, we just can’t tell with the data that we have.

    With that said, even if the authors used his full quote I feel most would interpret the sentence the same way as the half sentence, so him nitpicking about how he was quoted seems petty. If you don’t want people to interpret your paper incorrectly you shouldn’t make it so easy for them with misleading and loaded statements in the abstract. But I guess the paper wouldn’t have caused as much of a splash if it said something like “we don’t know the state of the reproducibility crisis”.

    • Jordan:
      I think that’s one good example why Twitter is such a waste of time. Or, more to the point: why people using Twitter are wasting their and possibly other people’s time.

      What you’re doing here is gossiping. “Oh dear, did you hear, Gilbert complained about how his words got twisted around, but I think he’s sooo petty. I mean, like breaking up a sentence and adding a capital letter would be a huuuge violation of ethics!” – “Completely agree, love, I mean if you don’t want people to misinterpret your paper, you shouldn’t bla bla bla.”

      It’s pure gossip. Whether the people involved happen to be scientists or not doesn’t matter. It’s still Klatsch & Tratsch.

      We don’t need to hear this because all it’s ultimately about is your own social need for reputation building and self-affirmation by seeking others to agree with you.

      It’s also exactly what I meant in my comment about Cuddy and the culture of fear and distrust. If you start talking about people that way, you’re going beyond just criticizing their work on scientific grounds. You’re starting to draw a *social* line between you and them. Don’t think they won’t feel that. And it will make them much less likely to listen to your scientific arguments, so you’re undermining your own goals.

      • I don’t know, I think I gave a pretty neutral summary of the dust-up.

        Maybe I should have added that the quote is definitely used in a context not intended by the original authors. And the fact that the authors of the preprint on the status of reproducibility think Gilbert’s Science commentary concluded that there is evidence for high reproducibility in psychology suggests they may not have read the paper, or may not understand it, which makes you question their qualifications for writing a review on the state of reproducibility in psychology (I’m definitely not qualified).

        However, it is interesting that instead of trying to explain how it is taken out of context Gilbert first complained about how it was quoted. I don’t really think how it was quoted matters, most people will reach the same conclusion if they see the full sentence.

        And this gets to the real irony of the whole thing. These people never complain when their ridiculous statements in the abstracts of their papers are splashed across the New York Times or lead to TED talks, but as soon as you use these misleading, potentially exaggerated statements in a scientific criticism, they are up in arms.

        Was it wrong for me to post something on this blog that I noticed on Twitter? I don’t know. Seemed interesting and somewhat relevant since Gilbert’s paper has been mentioned numerous times on this blog.

        And I like gossip, I can’t help it.

      • Alex:

        +1. Even though I do it too.

        The whole point-scoring thing is a dead end. I could score all the points I want in an argument, but if my Type M error paper made no sense, all the point scoring would be irrelevant. Similarly, Daniel Gilbert can get off zinger after zinger on twitter but that doesn’t resolve the serious problems with his research and his advocacy that we’ve discussed in detail on this blog. Zingers have their place—ideally, they are distillations of more complex ideas; indeed I have a whole page of zingers here, but the mix of zingers and affirmations does seem like a dead end.


        Gossip is fun; I like it too sometimes, and I’m still angry about what happened to Gawker. Still, I recognize Alex’s point that gossipy interactions can degrade discourse. I don’t blame you for Daniel Gilbert’s aggressive attitude—he’s been that way for a while—but it could very well be that all these twitter interactions just make things worse.

        • If I were a religious person, I’d pray for both Jordan and Andrew that they would get over their liking for gossip. ;>)

        • Martha:

          It’s my impression that most people like gossip! You may be unusual in showing no interest in the topic. I’m not saying that liking gossip is a good thing: most people like sugary drinks, too, and too much of those can cause lots of problems. My biggest problem perhaps is that I read and respond to blog comments as an odd form of relaxation…

        • I wouldn’t say that I have no interest in the gossip, but I guess it’s fair to say that my values prompt me to limit that interest somewhat.

        • I’ve often been accused of being a “gossiper” but I’ve never really understood what is wrong with it. There does seem to be a conflation of gossiping with spreading rumors, and that is certainly never my intention. If someone tells me they heard something why can’t I listen to what they have to say? I like to “chat it up” and I’m not ashamed of it.

          In the present case which started this conversation I’m not sure I would classify talking about Gilbert’s latest Twitter rampage as “gossip”. There’s hard evidence. The tweets are there for all to see.

          Speaking of Twitter, and relevant to the recent discussion about bullying spurred by Cuddy’s recent talk, I saw Daniel Lakens on Twitter voice opposition to your mentioning of Wansink in a post that wasn’t about Wansink:

          Obviously I don’t agree with Lakens, but it does seem like this is the sort of thing people consider “bullying”.

          Wansink for years got positive publicity for his pseudoscience. Now he is getting negative publicity. If negative publicity is “bullying” then what do we call the positive publicity?

  14. Anon:
    I would also like to point out that I’ve been contacting the journal editors about papers I’ve criticized, criticisms which, according to you, I am wrong about. See:
    “[Jordan is] wrong about many, many of his criticisms”

    Specifically, two of the journals I’ve contacted from this post (a post which has multiple addenda) say they have been in contact with Wansink and will be issuing corrections. So if Andrew is wrong for taking my criticisms seriously, these journals, and apparently Wansink himself, do not agree with you.

    • If we designed this as an item on a Critical Thinking assessment, my inference would be the correct answer, and yours, Andrew’s, and the journal editors’ would be the incorrect answer.

      You do not seem capable of checking your bias. You are willing to critique editors who let Wansink’s papers in, but not editors who miss the erroneous inferences you are making.

      • If Wansink acknowledges a mistake, I accept that Wansink knows it was one. That does not change the fact that your blog post did not provide a valid argument with enough evidence to logically come to that conclusion. It means, like those who rely on NHST, you got it right despite your faulty method.

        • Can you give one example of someone who used NHST, then followed up with a precise prediction that was later shown accurate? I highly doubt it.

        • No. It is not. Claiming that the methods as described were impossible was faulty critical reasoning.

          Simply because you got one part of it correct does not mean you got another aspect of it correct.

        • That’s like someone claiming that because one p-value of 20 tested was significant, the entire theory was supported.

  15. Speaking of Wansink studies that couldn’t have taken place as described, this is a fun little puzzle:

    “During the Tuesday and Friday lunch of each of the six test weeks, two of the items were presented with their regular name (e.g., grilled chicken); two items were presented with a descriptive name; and two items were not offered. For each of the next two weeks, the items and the conditions were systematically rotated until all menu items were presented in all conditions. The rotation was repeated in weeks four through six. The rotation was planned in order to minimize any unexpected variations that might affect either preferences or participation (such as blizzards, religious holidays, or game days). During a six-week period, each item was available six
    • Fair enough. What can you infer from this?

      1. Sloppy description missed by all authors, reviewers, and editors.

      2. Confirmation of researcher misconduct.

      This is where critical thinking gets challenging, and one of the reasons social psych is in its sorry state is that many make the same errors you tend to make, Jordan.

      Finding errors in methods and description is important. Finding our own errors in reasoning is of even greater importance when we are making accusatory statements.

      • I’m not sure what I can infer from this one instance. I actually can’t blame the authors, reviewers, and editors for missing it since I didn’t notice it either. I contacted the journal about granularity errors (my only area of statistical expertise), and they took a look at the paper and noticed it.

        However, I think a definite pattern is emerging in Wansink’s papers where the study was not performed as indicated in the paper. Whether that is how salad and pizza consumption was recorded, or what time of year the study took place, or how many weeks the study took place, whether carrots were weighed or estimated visually, etc.

        From these cases I think it is safe to infer that accurate reporting is not a high priority of the lab. This is further confirmed by the myriad of incorrect public statements Wansink has made about his published work. Whether all of this is deliberate or careless remains to be seen. And I haven’t even mentioned any of the hundreds of statistical problems in the papers. What does all of this mean? I don’t know, but I’m trying to find out.

        Is it incompetence at a massive scale? Is it malicious? I want answers and am doing my best to use “critical thinking” to get there.

        From my close reanalysis of the pizza papers there doesn’t appear to be anything malicious with the data or the statistics. But how the authors accidentally reported the incorrect method for one of the papers remains an open question.

        At this point I’m willing to believe anything. For the carrots paper they had a high school volunteer collect the data. If you told me all the errors in the papers are due to the papers being written by unpaid undergraduates in the lab I’d believe you. If you told me someone in the lab is randomly pulling methods and numbers out of a hat I’d believe you.

        Ideally scientific papers would contain all the information necessary to not only understand and replicate the study, but also reproduce the analyses in the paper (by providing data and code). But with this lab we’ve gotten to the point where literally anything is possible. Nothing in the papers can be taken as a fact. Up could actually be down, left could be right.

        Why does any of this matter? Personally I don’t care about psychology or food research, but it seems very disturbing that most of the journals don’t seem to see a problem, and Cornell also doesn’t seem to think this is a problem. And if this work is seen as acceptable by the scientific institutions how much other work just like it is out there?

        • This is the problem. There are very specific things that can and cannot be inferred from those words. There are very specific assumptions that underlie the different plausible inferences. These can and should be clearly explicated.

          All of the extraneous discussion is similar to a Trumpian distraction. It is important to maintain focus on what was written, what was claimed, and whether those can be logically supported.

    • Hi Jordan,

      I think that this was just sloppy writing with poor attention to detail, not dishonesty. If we number the 6 food items from 1 to 6, always pair the same two items, and label the conditions F (fancy name), P (plain name), and N (not offered):

      Week 1: 1F 2F 3P 4P 5N 6N
      Week 2: 3F 4F 5P 6P 1N 2N
      Week 3: 5F 6F 1P 2P 3N 4N
      Week 4: 1F 2F 3P 4P 5N 6N
      Week 5: 3F 4F 5P 6P 1N 2N
      Week 6: 5F 6F 1P 2P 3N 4N

      Then weeks 4 to 6 duplicate weeks 1 to 3, as stated in the article.

      Each food item is offered 4 times (not 6, as stated) across the 6 weeks, twice with a fancy name and twice with a plain name. Each item is also in the not-offered condition twice, so each item appears in each of the three conditions twice across the study, which equals 6 appearances. 6 items x 3 conditions x 2 rotations = 36, which is correct. I think that’s what they meant, although that isn’t what they wrote.
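      Carol’s schedule can be checked mechanically. Here is a minimal sketch (the item numbering and pairing are Carol’s hypothetical reconstruction, not anything from the paper) confirming that each item lands in each condition exactly twice, and so is actually offered only four times:

```python
from collections import Counter

# Carol's reconstructed rotation: 6 items, conditions F (fancy name),
# P (plain name), N (not offered); weeks 4-6 duplicate weeks 1-3.
base = [
    {1: "F", 2: "F", 3: "P", 4: "P", 5: "N", 6: "N"},  # week 1
    {3: "F", 4: "F", 5: "P", 6: "P", 1: "N", 2: "N"},  # week 2
    {5: "F", 6: "F", 1: "P", 2: "P", 3: "N", 4: "N"},  # week 3
]
schedule = base * 2  # six weeks total

counts = Counter()
for week in schedule:
    for item, cond in week.items():
        counts[(item, cond)] += 1

# Each (item, condition) pair occurs exactly twice across the six weeks...
assert all(counts[(i, c)] == 2 for i in range(1, 7) for c in "FPN")

# ...so each item is actually *offered* (F or P) only four times, not six.
offered = counts[(1, "F")] + counts[(1, "P")]
print(offered)  # 4
```

      The check also confirms the arithmetic: 6 items x 3 conditions x 2 rotations = 36 item-condition slots, but only 24 of those are actual offerings.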

      • Carol, Jordan, Curious:

        This is all a good advertisement for open science. If researchers were by default to publish survey forms, data collection protocols, raw data, and code, then this sort of guessing game would not be necessary.

        • Jordan:

          In all seriousness, I doubt that anyone goes into this business wanting to cheat. So I think an expectation of openness, which should make it harder to cheat, should benefit everyone in the long run.

        • I’m with you. Don’t hate the player, hate the game.

          I’m just hoping if we expose the worst players the rules of the game will change.

          It’s like how in the NBA people make compilations of players flopping to try and get the league to do something about it.

          If we can show enough evidence that how science is being done is not only inefficient, but also encourages fraudulent work, maybe things will change.

        • Hi Andrew and Jordan,

          Some of the problem is surely due to the incentives and the competition in academia. To “get ahead” at the good research universities, one generally must publish quite a bit and do so in the best journals; some people are undoubtedly tempted to cut corners. Another problem is incompetence, which to some extent is due to poor training in statistics and methodology in substantive fields. And still another problem is that some people are just not detail oriented. I think that this is the problem with some of the Wansink articles. I described an example elsewhere in this thread.

          But, of course, in any walk of life there will be *some* dishonest people.

        • I am sympathetic to how people were trained, or how they were just doing what was “standard” in the field at the time.

          However, how Wansink has responded to this whole thing is all on him. He hasn’t responded to emails, refused to share data, flat out lied, downplayed concerns (I swear, if I hear about the great work the Food and Brand Lab has done the past 20 years one more time and how recent events do not change the conclusions I’m going to lose it), and frankly has acted much more like a politician than a scientist through this whole thing.

          And that’s the problem, academia isn’t filled with scientists, it’s filled with politicians. Facts are secondary, or tertiary, or in the case of Cuddy, Fiske, etc., completely irrelevant.

          (I had to get a Cuddy reference in there after reading Nick’s blog post about the t tests)

        • Hi Jordan,

          I was commenting on the general issue of the reasons for poor quality research.

          Also, note that I said “some of the Wansink articles” there. I was not commenting on the entire Wansink situation, which is complex, and growing increasingly so.

        • Hi Jordan,

          A postscript to my last note: I myself am not very sympathetic to arguments that the authors were just doing what was standard at the time.

          Take, for example, statements like this recent one from Amy Cuddy: “By today’s improved methodological standards, the studies in that paper [the 2010 PSYCHOLOGICAL SCIENCE article] — which was peer-reviewed — were underpowered,” meaning they should have included more participants. “I wish we had conducted those studies with the rigor of today’s methodological standards ….”

          This is such nonsense. Jacob Cohen’s first psychology article on power was published in 1962. Tversky and Kahneman’s article about belief in the law of small numbers was published in 1971. The first edition of Cohen’s big power analysis book was published in 1969, and the second edition in 1988. Cohen’s “power primer” was published in PSYCHOLOGICAL BULLETIN in 1992.

          PSYCHOLOGICAL SCIENCE should have rejected Carney, Cuddy, and Yap (2010) — with its n of 42 divided into two groups — on the basis of its small sample size alone, never mind its other problems. It was well-known in 2010 that this sample size was too small.

          And PSYCHOLOGICAL SCIENCE is still accepting articles with tiny sample sizes. The recent article by Sarah Hill et al. (2016) on poverty and eating had an n of 31 with something like 5 or 6 predictors in the regression model.



        • Carol: Re your postscript remark, “I myself am not very sympathetic to arguments that the authors were just doing what was standard at the time.”

          A big part of the problem is that standards in the field *were* indeed poor at the time — that the sources you cite (e.g., Cohen’s articles) had not really influenced the standards in the field. And the question remains: How much (if at all) have standards improved in the past few years?

          The takeaway may be something like, “the price of good science is eternal vigilance.”

        • Hi Martha (Smith),

          I’m not quite sure what you mean. Perhaps I should not have just repeated Jordan’s “standard in the field at that time.” What I meant to convey was that a lot of the supposedly new statistical/methodological developments in psychology have in fact been around a loooooong time and there is really no excuse for people not knowing about them. Cohen’s first article on power was published 50 years ago! But yes, it’s still common — perhaps even standard — for psychology studies to have sample sizes that are too small.

          As long as journals allow studies to have small sample sizes, studies will have small sample sizes. Alas.



        • Carol,

          You said (April 30, 3:31 pm): “Perhaps I should not have just repeated Jordan’s “standard in the field at that time.” … there is really no excuse for people not knowing about them”.

          Unfortunately, life being what it is, a lot of people either didn’t know about them, forgot about them, or ignored them. It really is “the standards in the field” that have influence, not what has been known by some and recommended by some.

          In fact, Cuddy’s 2010 paper (which you pointed out had too small a sample size) was published before Simmons et al’s 2011 paper, “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” which was a major influence in getting people to realize that standards/practices need to be improved. Yet Simmons et al’s second “requirement for authors” reads, “Authors must collect at least 20 observations per cell or else provide a compelling cost-of-data-collection justification.”

          In other words, a year later, Simmons et al published, in a much-lauded paper, a sample-size recommendation that the 2010 paper you justifiably criticize for having too small a sample size followed.

          I think the price of good science is indeed eternal vigilance. Even a paper such as Simmons et al’s that had a lot of positive points and was in many ways a positive influence also promoted some poor standards.

        • Hi Martha (Smith),

          I’m sticking to my guns. After so many years, psychologists ought to know better. And I think some do know better but continue to use samples that are too small because they can get away with it. (BTW, I am a quantitative psychologist but I interact and/or consult frequently with substantive psychologists, especially social psychologists. I know whereof I speak. It’s depressing.)

          But you are right, eternal vigilance is necessary. Even Daniel Kahneman, who co-authored the 1971 article with Amos Tversky, made the same mistake, as Ulrich (Uli) Schimmack pointed out on February 2, 2017.

          Kahneman admitted the mistake (scroll down through the comments) on February 14, 2017.

          Quite a read!



        • Carol:

          Yes, people have known about Kahneman’s mistake for a while; for example Hal Pashler and Christine Harris talked about it a few years ago. It’s a credit to Kahneman that he accepted that he’d got this one wrong.

        • “After so many years, psychologists ought to know better. And I think some do know better but continue to use samples that are too small because they can get away with it.”

          Not sure where this particular response will wind up – I’m hoping below Carol’s comment from which the above quote was taken. Anyway, do you think that there are enough subjects in the study linked to below?

        • Glen:

          I consistently find your comments edifying and think it is important to point out the differences in types of research in psychology and behavior.

          I agree that for many phenomena, where the between-subject variation in response is small, a few subjects with high numbers of within-subject data points are appropriate and even the ideal.

          However, I am sure you will agree this is not the case for many areas of psychology where the between subject variation in both predictor and response is substantial and inconsistent.

        • To clarify (in connection with Glen and Curious’s comments): I see Simmons et al’s mistake as giving a “one rule for all” recommendation on sample size — the reality being that appropriate sample size depends on lots of factors, and so needs to be considered on a case-by-case basis.

        • Another problem develops when areas that are used to dealing with small, consistent between-subject variation begin to incorporate measures and assessments from areas in which there is large variation and inconsistency in response, but continue to operate under the assumptions of the small-variation model, and thus end up with far too few subjects.

        • Hi Glen M. Sizemore and Curious:

          If you are interested in the topic of between-subjects analysis vs. within-subject analysis, Peter Molenaar has written some wonderful articles on this.

        • This is a reply to “Curious.”

          C: I agree that for many phenomena, where the between-subject variation in response is small, a few subjects with high numbers of within-subject data points are appropriate and even the ideal.

          GS: The key is within-subject variability since, in SSDs, the effect (in the simple case) or functional relation is judged in each subject. Of course, in this kind of research, the within-subject variability is subject to experimental control (as opposed to statistical control). One must exert sufficient experimental control over the behavior of each subject in order to do experiments.

          C: However, I am sure you will agree this is not the case for many areas of psychology where the between subject variation in both predictor and response is substantial and inconsistent.

          GS: Well, most psychologists do between-group experiments rather than exert rigorous experimental control over the subject matter. The need for precise experimental control and the fulfillment of this goal marks behavior analysis as a natural science and it is why behavior analysis makes cumulative progress.

      • Dear Carol,

        “I think that’s what they meant, although that isn’t what they wrote.” is the core of the issue.

        Blue is not black and red is not yellow. So let’s give an example of a researcher who states in a scientific paper that something, for example the colour of the tail of a specific species of bird, is yellow. This implies that the colour of the tail of this specific bird is not red (or green or blue). That’s how science works in my field of research (ornithology), and that’s of course also how science works in the field of research about the type and amount of food that humans are eating.

        I have just finished reading a very recently published paper on some aspects of the provisioning of food to young Wood Warblers (‘Provisioning rate and prey load in relation to nestling age in Wood Warblers’, in Dutch), in which the author refers to a paper by H. v. Treuenfels (published in 1937, in German). The author states that he was very easily able to compare all of his findings with the findings of H. v. Treuenfels, because H. v. Treuenfels had followed the proper scientific method and described exactly which kind of measurements he had taken.

        The author of this recent short paper states that in the past it was common in this field of research to publish all your details, both all the methodological details and all results in tables and in figures, so ‘this sort of guessing game would not be necessary’ (a quote from Andrew).

        In my opinion it was an excellent step of Jordan et al. to publish this open letter. I fully agree with their proposal that there are solid grounds that the paper must be retracted (and the authors are of course always free to submit a new paper about this topic).

        I would like to emphasize that we are living in a free world and that thus anyone, including Jordan et al., can publish open letters requesting the retraction of this paper. I also would like to emphasize that there are many examples (at Retraction Watch) where journals have very quickly retracted papers when they were contacted by third parties (not being the authors and/or their institutes) with more or less comparable concerns. So there is no fixed rule stating that a paper cannot be retracted when there is no underlying investigation by the affiliated institute. Excuse me very much, but that’s not how it works.

        • Hi Klaas,

          Of course Nick and his co-authors can write an open letter, although I myself would have contacted the editor (Todd Shackelford) directly.

          My point was that I think dictating to the editor (or the journal or the university, etc.) how the situation should be handled is inappropriate. Different journals have different policies for dealing with these situations, both across disciplines and within disciplines. Also, some journals and journal editors will want to follow the COPE guidelines for retraction, for whistleblowers who complain directly and for whistleblowers who complain on social media, for plagiarism, for duplicate publication, for fabricated data, data theft, and so on.

          I did not state that the article(s) should not be retracted, nor did I state that the article could not be re-submitted. I stated that I thought that it was inappropriate to tell the editor how the situation should be handled.

          Best wishes,


  16. Hi Glen M. Sizemore,

    That’s a within-subject across-stimuli design. I think most psychological research should be within-subject and have written about this before. Example: doi: 10.1037/a0036961.

    Do you know Cattell’s “databox”? Very useful but it was published in the 1940s by a personality psychologist and many people seem unaware of it.



  17. OK…I have given up trying to figure out where to respond for particular posts to which one can no longer respond due to the “end of the hierarchy” – so I’ll just respond at the bottom. This is a response to Carol.

    Carol: If you are interested in the topic of between-subjects analysis vs. within-subject analysis, Peter Molenaar has written some wonderful articles on this.

    GS: Does he mention behavior analysis in any of these papers…or just ignore it as is usual for most psychologists?

    • Hi Glen,

      “Does he mention behavior analysis?” Not that I recollect. Here’s a reference to his work:

      Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into psychology, this time forever. MEASUREMENT: INTERDISCIPLINARY RESEARCH AND PERSPECTIVES, 2, 201-218, with comments by Tuerlinckx; von Eye; Thum; Rogosa; Nesselroade; Curran & Wirth; a discussion by Molenaar; and a later rejoinder to Rogosa.

      • “Does he mention behavior analysis?” Not that I recollect.

        Hi Carol,

        Thanks for the response. I already figured that he did not mention behavior analysis, a natural science that has used single-subject designs since its inception in the early ’30s. I did glance at his stuff and saw no indication that he mentioned it. And look at the title of his paper that you posted: “A manifesto on psychology as idiographic science: Bringing the person back into psychology, this time forever.”

        Perhaps the key is his claim that what he is after is “idiographic.” Is that what he really means? Or, rather, how is he using this term? Either way, not mentioning behavior analysis counts as poor scholarship, IMO. Clearly, behavior analysis focuses on individuals (and is therefore worth mention), whether human or nonhuman. It is, however, a natural science and interested in finding general explanatory principles. Perhaps that is why he finds behavior analysis irrelevant? From the little I saw of what he writes, he really is interested in general principles, but I suspect that “idiographic” is used somewhat idiosyncratically!

        • Hi Glen,

          Molenaar may simply not be aware of behavior analysis. He began as a mathematical psychologist and then entered developmental psychology, I think.

          If memory serves, he also doesn’t mention the single-subject research designs sometimes used in medicine and clinical psychology.

          Why don’t you take it up with him? He would probably be delighted to find that someone else is a fan of within-subject studies. In Todd Rose’s book, THE END OF AVERAGE, Rose mentions that Molenaar has had difficulty convincing psychologists of his ideas.

          [email protected]


        • Hi Carol,

          I may drop him a line. But…not aware of behavior analysis? Perhaps he’s a psychologist and never heard of Fred Skinner? That itself is an indictment – not necessarily of him, but of psychology in general. But…as you might have guessed, I think 97% of experimental psychology is absolute trash.

        • Hi Glen,

          Behaviorism has been out of fashion in psychology, having been eclipsed by the “cognitive revolution.” I’ve been associated with several different universities and have almost never heard it mentioned at any of them. Also, Molenaar is Dutch, I think, even though he works in the USA now, so perhaps that has something to do with it.


        • I’m aware of the status of “behaviorism” in mainstream psychological “science.” But, hey, look at it this way: 90% of what is published in mainstream experimental psychology is crap (and that is being charitable), either the sort of NHST nonsense criticized here, or stuff that stems from the indirect realism adopted by mainstream psychology, whose concepts (being, in reality, merely names for behavior – can you say “dormitive virtue”?) are obtained via circular reasoning and justified through the use of its moronic interpretation of operationism. But… other than that, I don’t have strong feelings one way or the other.

    • Jordan:

      I clicked thru and read the long post. I agree with much of what Heathers writes, but I disagree when he writes of Susan Fiske, “I like her style. She isn’t afraid to speak her damn mind . . .” I remain upset by the “methodological terrorists” remark, and I remain angry that she never, to my knowledge, apologized for it. (Heathers wrote that there was “an apology” for it, but I don’t recall ever seeing that apology.)

      Overall, though, what Heathers says rings true to me. In particular, I’d never heard of Satoshi Kanazawa, Amy Cuddy, Susan Fiske, Daryl Bem, Roy Baumeister, John Bargh, Brian Wansink, Neil Anderson, Deniz Ones, Karl Weick etc etc—not to mention the ovulation-and-clothing researchers, the ovulation-and-voting researchers, the ages-ending-in-9 researchers, the himmicanes researchers, the air rage researchers, etc etc.—until people asked me what I thought of the wacky things they’d published (or, in the cases of Bargh, Baumeister, Anderson, and Ones, their stunning refusal to accept the possibility that their big claims may have been mistaken).

      I should also add that I have several motivations for writing about these things, beyond the usual rationales of (1) “someone is wrong on the internet” and (2) avoiding real work. Probably my most important motivation for investigating research errors and weak responses to criticism is to better understand the role of statistical information in scientific discovery. Believe it or not, careful study and argument about these cases has improved my understanding of important aspects of prior distributions, Bayesian inference, replication, and other issues. “The garden of forking paths,” for example, is not just a slogan; to me, it represents a step beyond earlier ideas such as the file-drawer effect. And I’d not really thought about how huge the Type M error could be until my colleagues and I thought carefully about examples such as the ovulation-and-voting analysis. The understanding I gathered from these silly examples is relevant to lots of more important problems, such as the early-childhood-intervention study which we’ve discussed on this blog.
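      The Type M (magnitude) error mentioned here can be illustrated with a quick simulation. The true effect and standard error below are made-up numbers for illustration, not taken from any of the studies discussed:

```python
import random
import statistics

random.seed(1)

# Hypothetical noisy study: true effect 0.2, standard error 0.5.
true_effect, se = 0.2, 0.5
estimates = [random.gauss(true_effect, se) for _ in range(100_000)]

# Keep only the "statistically significant" results (|estimate| > 1.96 * se),
# i.e., the estimates that would survive a significance filter.
significant = [abs(e) for e in estimates if abs(e) > 1.96 * se]

# Type M error: by how much do significant estimates overstate the truth?
exaggeration = statistics.mean(significant) / true_effect
print(round(exaggeration, 1))
```

      With these assumed numbers the exaggeration ratio comes out around six: conditional on reaching significance, the published estimate is several times larger than the true effect.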

      • Yeah, looking over Wansink’s work mainly feels like a waste of time, but I have a hunch that there is a lot of work out there on more important research that is done just as carelessly, but because of the lack of open data and tools to detect inconsistencies has thus far gone undetected. Hopefully after perfecting some of these techniques on Wansink’s large body of research I’ll find problems with important work.

        I also hope that the publicity our work has gotten inspires other people to do simple checks such as granularity testing, or serves as a model by which people might express concerns with a paper (i.e. just post a preprint instead of waiting years to get your comment accepted by a journal).
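        The granularity idea can be sketched in a few lines. This is a simplified version of a GRIM-style check (the published test handles rounding conventions more carefully), asking whether a reported mean could have arisen from n integer-valued responses; the example numbers are hypothetical:

```python
def grim_consistent(mean, n, decimals=2):
    """Simplified GRIM-style check: could a mean reported to `decimals`
    places come from n integer-valued responses?"""
    reported = round(mean, decimals)
    # Achievable means are total/n for integer totals; test the totals
    # nearest the implied sum mean * n.
    k = round(mean * n)
    return any(
        round(total / n, decimals) == reported
        for total in (k - 1, k, k + 1)
    )

print(grim_consistent(5.21, 28))  # True: 146/28 = 5.2142... rounds to 5.21
print(grim_consistent(5.19, 28))  # False: no integer total over 28 gives 5.19
```

        A reported mean that fails a check like this (given the stated sample size and integer-scale items) cannot have been computed from the data as described.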

        • Interesting.

          In the past I would have second hand information about researchers _likely_ being sloppy and no way of verifying that.

          Even when I got access to data entry facilities they used and found them defective, one could never be sure if some researchers had the heads up or diligence to fix these defects. For instance, one particular researcher personally self verified all the records by hand. Pretty sure no one else did that and suspect many just went with what they got as _data_ but I could not know for sure.

          So you sort of have an X-ray machine for sloppily done research.

      • Andrew, you keep forgetting to include Richard Tol in your lists! :) Unlike some of these folks, his work has very pernicious consequences for our largest environmental crisis…

        • And Michael Lacour. And Ed Wegman. And the air-pollution-in-China researchers. And Bruno Frey. And Michael Whitaker. And Karl Weick. And Dr. Anil Potti. But I think we’ve still forgotten a few. It’s hard to keep track of all of them. There’s also the journalistic division: Gregg Easterbrook, David Brooks, the crew at NPR, etc.

        • :) Maybe you should compile a master list and then just use a random sample function when you are writing your blog posts. That way we can at least have a good frequentist coverage guarantee ;)
