Fast analysis, soft statistics, and junk data intake is unrelated to research quality for 0% of American scientists

Under the heading, “Yet another bad analysis making the rounds,” John Mount writes:

This won’t waste much of your time—because there really isn’t much there. But I thought you would be disturbed by this new paper. Here’s my (Mount’s) commentary on what we can surmise about the methods.

Mount is pretty scathing. He starts with a bang:

The following article is getting quite a lot of press right now: David Just and Brian Wansink (2015), “Fast Food, Soft Drink, and Candy Intake is Unrelated to Body Mass Index for 95% of American Adults”, Obesity Science & Practice, forthcoming (upcoming in a new pay-for-placement journal). Obviously it is a sensational contrary position (some coverage: here, here, and here).

I thought I would take a peek to learn about the statistical methodology (see here for some commentary). I would say the kindest thing you can say about the paper is: its problems are not statistical.

At this time the authors don’t seem to have supplied their data preparation or analysis scripts and the paper “isn’t published yet” (though they have had time for a press release), so we have to rely on their pre-print. Read on for excerpts from the work itself (with commentary).

He continues from there.

The media outlets that took the bait and ran with the press release are Fortune, MedicalXpress, and Pacific Standard. Fortune is a shell of its former self and will probably run with just about any content that is supplied to them, MedicalXpress does not appear to be a real anything, and Pacific Standard we’ve already discussed as a serious media outlet with old-school science reporting of the heroic-researcher variety. Last time we noted a Pacific Standard series that described itself as follows:

Findings is a daily column by Pacific Standard staff writer Tom Jacobs, who scours the psychological-research journals to discover new insights into human behavior, ranging from the origins of our political beliefs to the cultivation of creativity.

And this latest hyped study appears in this column:

Quick Studies is an award-winning series that sheds light on new research and discoveries that change the way we look at the world.

That’s right: a magazine with two separate columns about new insights, research, and discoveries. At this point they can’t just mine Psychological Science and PPNAS; they have to dip a bit lower into the publication pool.

But at this point there are so many media outlets that I guess any junk study will get coverage. Given that the paper is appearing in a low-reputation journal, I guess the credibility is coming from Cornell’s reputation. I see that the senior author of this paper is “the John S. Dyson Professor of Marketing in the Charles H. Dyson School of Applied Economics & Management at Cornell University and is the Director of the Cornell Food and Brand Lab. He is the author of the best-selling book Mindless Eating: Why We Eat More Than We Think (Bantam Dell 2006). Between 2007 and 2009 he was the Executive Director of the Center for Nutrition Policy and Promotion in Washington DC, leading the development of the USDA 2010 Dietary Guidelines. He is the President Elect of the Society for Nutrition Education.” And the junior author is “a professor and Director of Graduate Studies in the Charles H. Dyson School of Applied Economics and Management at Cornell University. In addition he serves as co-director of the Cornell Center for Behavioral Economics in Child Nutrition Programs.” So I guess these guys are well respected.

Maybe the issue is that, once you’re an expert, you start to believe your own theories a bit too much, and evidence becomes just a way to support a theory rather than a way to learn. Or maybe they’re just bad at statistics and have ventured beyond their research competence. In any case it’s too bad they had to drag their public relations department and Cornell University into this mess. Dragging fine reputations into the dirt just for a publication in Obesity Science & Practice and a quick blurb in Pacific Standard. It’s hardly worth it.

Then again, Daryl Bem is a Cornell professor so it’s not like that institution upholds the highest standards in quantitative research.

P.S. Perhaps it’s worth emphasizing that I have no reason to think these researchers are doing anything unethical. I’d guess it’s simple incompetence. Statistics is hard, and it doesn’t help when you’re already the Mr. Big Professor and the Director of the Bigtime Lab—then it’s easy to believe your own hype. Maybe they should’ve taken a hint when all the good journals rejected their paper—you don’t think Obesity Science & Practice was their first choice, do you?—but then again I get papers rejected from good journals all the time, and my usual instinct is to blame the %$%^*%^* reviewers, not to question the quality of my own work. And of course the editors at Pacific Standard are busy trying to fill up their magazine, and the author of the news article wants to get published, and this is what he knows how to do. And we can hardly blame the P.R. professionals who helped with the press release; that’s their job. Each player in the hype cycle plays his role.

Nobody’s a bad guy here. But the result—that is bad. And if researchers get reputational bumps from publishing high-quality work, and if they get reputational bumps from public dissemination of high-quality work—and I think they should get such benefits—then damn straight their reputation should take a hit if they publish and promote crap. Fair is fair. To not slam people for low-quality work is implicitly to hurt all the serious researchers out there who don’t just publish anything, who don’t hype their work, who are careful maybe to run their statistics by an expert (yes, Cornell has many excellent statisticians) rather than trying to sneak substandard work into print. I show my respect for researchers who show care, by not going easy on those who don’t.

P.P.S. Just to clarify one more step: I have nothing personal against these researchers, neither of whom I’d ever heard of before. There’s no reason for one bad paper to count more than two illustrious careers. All of us have our bad days and even our bad research projects. Should we judge Isaac Newton based on his work on alchemy, or John Maynard Keynes for his advocacy of the gold standard, or Jean Piaget on his work on embodied cognition, or Niall Ferguson for whatever outrageous thing he said last week? No, of course not. I’m sure if you go through my published papers you’ll find a false theorem or two. In all seriousness, I have no reason to doubt that Brian Wansink’s service to the USDA was illustrious or that the students at Cornell are lucky to have him and David Just as instructors and research leaders. And why shouldn’t a couple of business-school marketing professors be giving out nutritional advice? Not every research project involves statistical inference, and if these people are subject-matter experts, that’s fine. No need to judge all their work based on one incident.

31 thoughts on “Fast analysis, soft statistics, and junk data intake is unrelated to research quality for 0% of American scientists”

  1. Brian Wansink won the 2007 Ig Nobel Prize in Nutrition (Annals of Improbable Research, Harvard University). Maybe he’s gunning for the 2016 prize? Could be a spoof article, to expose the gullibility of the press, like the chocolate-for-weight-loss study.

    • Shravan:

      That’s an interesting thought. I could ask him, I guess, except that (a) if it’s not a hoax he could be offended that I’m asking, and (b) if it is a hoax he might want to continue it by not admitting it!

    • I haven’t seen the Ig Nobel paper. But it actually sounds like potentially a very good experiment: people were served tomato soup in a bowl that sneakily refilled itself through hidden tubes. If done right, this sounds like a very good way to study portion control and perception of satiety. Also it looks like Brian Wansink had the good humor to attend the ceremony.

      This was always my issue with the long-famous Senator Proxmire: about half the stuff he attacked was legitimate good research that needed a moment to be appreciated, and half of it was downright fraud and chicanery. But the uncritical grouping killed the value of the criticism.

  2. “Maybe the issue is that, once you’re an expert, you start to believe your own theories a bit too much, and evidence becomes just a way to support a theory rather than a way to learn.”

    Not just when you’re an expert. The main thing I learned from Bill James is what statistics is *for,* namely to provide evidence rather than support. As soon as you want to support a theory rather than test it severely, you’re doomed.

    • Personally, I always use statistics to confirm what I already believed before I run my study. Any other approach would be career-ending. When a funding agency looks at my record to decide whether to fund me, they are going to be totally unimpressed that I keep finding out that my previous theories were shown to be wrong by me myself.

      • > unimpressed that I keep finding out that my previous theories were shown to be wrong by me myself.
        I would be very impressed (given all was credible) – though I understand your dilemma :-(

        As CS Peirce (roughly) said “good thing we die, otherwise we would learn that anything we thought we understood – we didn’t”.

        Someone who repeatedly shows their previous theories are wrong is just being very efficient!

  3. So depressing. But there are actually many people in nutrition upset that the correlation between intake and BMI is sometimes negative in large surveys. Something is badly wrong, they say: wrong level of analysis, wrong interpretation.

    If you have any doubt that the “bedevilled” diet studied in this paper is problematic for weight status then spend a month doubling your intake of such foods. You’ll have your answer. But that is a within person analysis. The population surveys are between person variation, and the inferences are completely different. Weight status is a non-stationary system and people are not homogeneous energy processing units. Nutrition has been through this debate before in its history but the lesson doesn’t seem to stick. It’s not just these business guys who make the mistake.
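
    To make the between/within distinction concrete, here is a toy simulation (my own sketch, not from the paper; all the numbers are invented and have nothing to do with NHANES). Raising any one simulated person’s intake raises that person’s BMI, yet across people the intake–BMI correlation comes out negative, because people who burn more also eat more:

        import numpy as np

        rng = np.random.default_rng(0)
        n = 10_000

        # Between-person heterogeneity: people who burn more energy
        # (higher activity/metabolism) also tend to eat more.
        metabolism = rng.normal(0, 1, n)
        intake = 2000 + 300 * metabolism + rng.normal(0, 100, n)  # kcal/day

        # Within any one person, extra intake pushes BMI up a little,
        # but metabolism pushes BMI down more strongly.
        bmi = 27 + 0.002 * (intake - 2000) - 2.0 * metabolism + rng.normal(0, 1, n)

        # Between-person (cross-sectional) correlation: clearly negative.
        print(np.corrcoef(intake, bmi)[0, 1])  # about -0.7

        # Within-person effect of 500 extra kcal/day: BMI goes up, not down.
        print(np.mean((bmi + 0.002 * 500) - bmi))  # +1.0, positive

    The cross-sectional survey answers the between-person question; the dieting advice is a within-person question.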

  4. Not to nitpick or anything, but when did Keynes advocate for the gold standard? His most famous quote on the subject called it a “barbarous relic”.

    wrt the paper under discussion, I look forward to the press release announcing the discovery of negative p-values. Or maybe a new study from David Dunning (also at Cornell).

  5. You should make some substantive criticism here. I have no idea why you think their paper is bad or why you think you are allowed to criticize them. What are the main problems?? This kind of post is weird; criticism should have substance!

    • Jack:

      Actually, no, in my opinion the burden is on the authors of a paper to make their case. If you want to see some specific criticisms, follow the first link in the above post. For this post, my interest is not in the particular paper and its particular flaws but rather the more general question of how it is that researchers can get expert credentials while producing this sort of work.

        • Andrew, I mean what I say for real. Really, I used to be a fan, but your blog is getting weirder every day. It seems like you are writing whatever you want with no filter. I just keep reading because everybody still reads it; it is a network thing. But if it keeps going this way I will just stop. I mean it: your blog used to be one of the best. Write real content, otherwise you will be famous for the wrong reasons.

        • John:

          Not a problem, there’s no reason for you to read stuff that you don’t find interesting. You can also feel free to just read the posts that interest you. It’s your call—there’s lots of free content on the web! Or if you really want filtered stuff, you could subscribe to Psychological Science—every paper there is peer reviewed!

        • John:

          Just to elaborate slightly: There’s no way of keeping 10,000 readers all happy all the time. Some of my readers love my posts on voting and public opinion, others find political science utterly boring. Some people love the sports examples, others don’t see the point. Some people want me never to post on plagiarism, others are fascinated by this aspect of science communication. Sometimes I post on technical matters and most of the readers can’t follow—but one of the reasons I post here is to get ideas down that I can come back to later. I’ve heard people tell me that they love it when I take apart graphs, that this helps them in their own work; other people are sick and tired of hearing about grids of line plots.

          We have over 400 posts here a year, and probably the only person who is interested in all these topics is . . . me. Actually, sometimes I’m not so interested either! A few days ago I posted what seemed to me to be possibly the most boring thing I’ve ever posted, and Jordan Ellenberg commented that he didn’t think it was boring at all!

          Is it worth it for you to follow a blog that you find irritating most of the time? Ultimately it’s your call. Of course I’d like to make all 10,000 readers happy all the time (ok, maybe only 9999 readers, if Weggy’s a subscriber), but given that each of you has your own utility function, I can’t really hope to do so!

          The way you can make this blog better, if you’d like, is to contribute interesting comments, or even to suggest topics for posting. We have space available for new posts in June.

      • Andrew,
        I think you are missing Jack’s point here. You have linked to John Mount’s post and ridiculed the paper by Just and Wansink apparently without reading it. I also have not read their paper, but I read John Mount’s critique, and while some of it is legitimate, some of it is just absurd. For example he quotes the paper:

        “Likewise, those with normal BMIs consume an average of 1.1 salty snacks over two days, while overweight, obese, and morbidly obese consume an average 0.9, 1.0, and 0.9 salty snacks, respectively.”

        and then comments that since the independent variable doesn’t vary, “you are not going to be able to see if it drives an outcome.” What the heck does he expect the researchers to do? If there is no effect, then, no, you will not see an effect.

        At other points, he argues that their methods are wrong, while admitting he doesn’t actually know what they did and is only guessing at it.

        I don’t think it is reasonable for you, Andrew, to blast researchers for doing junk science when your conclusion is based on junk criticisms that someone else has made.

        • Eric:

          I did look at the paper and I don’t find it at all convincing. Here’s the conclusion of their abstract:

          For 95% of this study’s sample, the association between the intake frequency of fast food, soft drinks, and candy and BMI was negative. This result suggests that a strategy that focuses solely on these problem foods may be ineffective in reducing weight. Reducing the total calories of food eaten at home and the frequency of snacking may be more successful dieting advice for the majority of individuals.

          I don’t see their statistical calculations as offering much support for these claims, and I don’t like the hype.

        • If the independent variable isn’t varying, then the best you can say is “I can’t test for a relation,” not the stronger claim “there isn’t a relation.” The reason we are guessing at what they did is that their methods section doesn’t tell us: an inadequacy of the original paper itself.
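
          As a concrete illustration, here is a toy sketch (my own, in Python; the numbers are invented and have nothing to do with their data) of how restricting a predictor’s range makes a real relation undetectable:

              import numpy as np

              rng = np.random.default_rng(1)
              n = 10_000

              # A genuine positive relation between predictor x and outcome y.
              x = rng.normal(0, 1, n)
              y = 0.5 * x + rng.normal(0, 1, n)

              # Full-range sample: the association is obvious.
              print(np.corrcoef(x, y)[0, 1])  # about 0.45

              # Range-restricted sample: keep only |x| < 0.1, so x barely varies.
              keep = np.abs(x) < 0.1
              # The true relation is unchanged, but the correlation is near zero:
              # with this little variation in x, noise swamps the signal.
              print(np.corrcoef(x[keep], y[keep])[0, 1])  # close to 0

          So a near-zero association in a sample where the predictor barely varies is evidence of nothing much, either way.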

        • John,
          The fact that mean snacking frequency was similar in the 4 groups does not tell you that it does not vary among individuals. It does tell you that it is not related to the outcome. If in fact it does not vary among individuals, then you are correct that this is a foregone conclusion, but it is still true that it is not related to the outcome among the study participants (who are, in NHANES, a representative sample of US residents). You have shown no evidence that it does not vary among individuals, but regardless of that, because of the nature of the sample, the researchers’ results suggest snacking frequency is not related to BMI category in the US (ignoring other weaknesses in their study).

        • To be a bit more precise:

          It is self-reported snacking frequency, which could easily be confounded with the weight variable.
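
          For instance (a toy sketch of my own, in Python; the numbers are invented, and this is just one way such a reporting bias could work): suppose true snacking pushes BMI up, but heavier respondents shave snacks off their answers while lighter ones pad them slightly. The observed association is then pulled well below the true one:

              import numpy as np

              rng = np.random.default_rng(2)
              n = 10_000

              # True snacking frequency genuinely drives BMI upward.
              true_snacks = rng.poisson(3, n).astype(float)  # snacks per two days
              bmi = 24 + 0.8 * true_snacks + rng.normal(0, 2, n)

              # Reporting error correlated with weight: heavier people
              # underreport, lighter people slightly overreport.
              reported = np.clip(true_snacks - 0.5 * (bmi - 24), 0, None)

              print(np.corrcoef(true_snacks, bmi)[0, 1])  # clearly positive
              print(np.corrcoef(reported, bmi)[0, 1])     # much weaker, can even flip sign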

        • John,
          We’re all aware of textbook examples in which a population varies on some factor, but someone studies an unrepresentative sample that does not vary on that factor and incorrectly concludes that the factor is not related to an outcome. But these data are from NHANES – a representative sample of the US. Even if the researchers did not look at unaggregated data – and we don’t know that – their results hold. Of course snacking frequency does vary at least a little bit, and the fact that they used a representative sample (ignoring other weaknesses) means that they are able to analyze that relationship.

          I too am skeptical of their conclusions, but I don’t think your criticism of this point is valid.

  6. Wansink’s work is valuable and his books are excellent, but it’s important to note they’re based on field work: real-world experiments about how people perceive food, how much they eat, how that can be managed, etc. I’ve read many of his papers. They don’t require much statistical work, so I don’t know where this paper comes from (or, obviously, what it actually says yet). For example, to study how people eat at buffets, they worked with a chain of Chinese buffets and observed where people sat, how much they took, what their behaviors were (as in, walk around the buffet first or just grab a plate), etc. This is very much like the observational work done by William Whyte about the use of public spaces. And like Whyte’s, Wansink’s work has resulted in redesigns, from the Chinese buffet client rearranging things to save cost to the adoption of 100-calorie packs by other clients of the food lab at Cornell.

    Another example: they gave people old, really stale popcorn at movie theaters, measured how much they ate, and discovered they could manipulate the amount eaten by changing the size of the container. In another test, they set up two rooms with different ambiences, served the same food and wine, and measured how that affected perception of food and wine quality, which goes directly to price. As you can infer, his work tends to be small-scale and thus rather low-powered, which is one reason why statistics doesn’t typically play that much of a role in his papers.

    It may be that Wansink is operating outside his normal field-work area, but I respect his work and that he does field work rather than relying on manipulation of data sets. To be honest, his reporting on the effect of reducing the size of your plate has changed my life for the better.

    • Jonathan:

      Wansink’s work could well be valuable. There’s a fundamental idea in statistics—that some experiments are just too noisy to be useful—that’s not well taught and not well understood. I could well believe that Wansink does excellent, Whyte-style qualitative research, some of which is backed up by larger quantitative studies—while when Wansink ventures into quantitative work, he can make the same sort of mistake made by John Bargh, Amy Cuddy, etc etc.

      • Exactly. In fact, I’d say that too many small studies resort to statistical analysis when they should present the findings or, rather, that they should spend more time on their study design so they can report their findings without fiddling with noisy data. I understand publication is an issue but I think there’s a huge problem in study design generally and, to an extent, the reliance on statistical findings for publication reflects that lack.

  7. The acknowledgements are odd:

    Acknowledgements: BW and DJ conceived of and carried out the study. DW analyzed
    data. Both authors were involved in writing the paper and approve its submission.

    Who is DW? I guess this is a typo.

  8. I always have to fight off nihilistic despair about social science when I read these posts of yours, Andrew. Today, I was reminded of Camus’ Myth of Sisyphus: “The fecundity and importance of a literary form are often measured by the trash it contains.” He was talking specifically about novels. The publication of a bad novel, even by a reputable press, even by an established author, does not make us give up hope for the genre, or even the author. (“Storytelling is hard,” you might say, “and it doesn’t help when you’re already the Mr. Big Author—then it’s easy to believe your own hype.”) I need to develop the same healthy sort of attitude about bad science. But journalists should too.

    And there’s the question of whether the most important task is writing another good novel or identifying another sham.

  9. Really sloppy work. There were several instances of negative p-values in the table. The negative values remain in the enhanced html version at the journal’s website http://onlinelibrary.wiley.com/enhanced/doi/10.1002/osp4.14 although they were not in the pdf version: http://onlinelibrary.wiley.com/doi/10.1002/osp4.14/epdf. The pdf available from SSRN at the time of Mount’s blog was a pre-peer-review version, so one would have hoped that the reviewers were able to pick up those errors. More disturbing was that the analyses were insufficiently described (nothing other than mentioning Stata) to know what tests were done, since they just said they were “the standard tests”.
