More on why Cass Sunstein should be thanking, not smearing, people who ask for replications

Recently we discussed law professor and policy intellectual Cass Sunstein’s statement that people who ask for social science findings to be replicated are like the former East German secret police.

In that discussion I alluded to a few issues:

1. The replication movement is fueled in large part by high-profile work, lauded by Sunstein and other Ivy League luminaries, that did not replicate.

2. Until outsiders loudly criticized the unreplicated work, those unreplicated claims were essentially uncriticized in the popular and academic press. And the criticism had to be loud, Loud, LOUD. Recall the Javert paradox.

3. That work wasn’t just Gladwell and NPR-bait, it also had real-world implications.

For example, check this out from the Nudge blog, several years ago:

As noted above, Sunstein had no affiliation with that blog. My point is that his brand was, unwittingly, promoting bad research.

And this brings me to my main point for today. Sunstein likens research critics to the former East German secret police, echoing something that a psychology professor wrote a few years ago regarding “methodological terrorists.” But . . . without these hateful people who are some cross between the Stasi and Al Qaeda, those destructive little second-stringers etc. . . . without them, Sunstein would I assume still be promoting claims based on garbage research. (And, yes, sure, Wansink’s claims could still be true, research flaws notwithstanding: It’s possible that the guy just had a great intuition about behavior and was right every time—but then it’s still a mistake to present those intuitions as being evidence-based.)

For example, see this recent post:

The link states that “A field study and a laboratory study with American participants found that calorie counts to the left (vs. right) decreased calories ordered by 16.31%.” 16.31%, huh? OK, I’ll believe it when it’s replicated for real, not before. The point is that, without the research critics—including aggressive research critics, the Javerts who annoy Sunstein and his friends so much—junk science would expand until it entirely filled up the world of policy analysis. Gresham, baby, Gresham.

So, again, Sunstein should be thanking, not smearing, people who ask for replications.

The bearer of bad tidings is your friend, not your enemy.

P.S. Probably not a good idea to believe anything Brian Wansink has ever written, at least not until you see clearly documented replication. This overview by Elizabeth Nolan Brown gives some background on the problems with Wansink’s work, along with discussions of some political concerns:

For the better half of a decade, American public schools have been part of a grand experiment in “choice architecture” dressed up as simple, practical steps to spur healthy eating. But new research reveals the “Smarter Lunchrooms” program is based largely on junk science.

Smarter Lunchrooms, launched in 2010 with funding from the U.S. Department of Agriculture (USDA) . . . is full of “common sense,” TED-talk-ready, Malcolm Gladwell-esque insights into how school cafeterias can encourage students to select and eat more nutritious foods. . . . This “light touch” is the foundation upon which Wansink, a former executive director of the USDA’s Center for Nutrition Policy and Promotion and a drafter of U.S. Dietary Guidelines, has earned ample speaking and consulting gigs and media coverage. . . .

The first serious study testing the program’s effectiveness was published just this year. At the end of nine weeks, students in Smarter Lunchroom cafeterias consumed an average of 0.10 more fruit units per day—the equivalent of about one or two bites of an apple. Wansink and company called it a “significant” increase in fruit consumption.

But “whether this increase is meaningful and has real world benefit is questionable,” Robinson* writes.

Nonetheless, the USDA claims that the “strategies that the Smarter Lunchrooms Movement endorses have been studied and proven effective in a variety of schools across the nation.” More than 29,000 U.S. public schools now employ Smarter Lunchrooms strategies, and the number of school food service directors trained on these tactics increased threefold in 2015 over the year before.

Also this:

One study touted by the USDA even notes that since food service directors who belong to professional membership associations were more likely to know about the Smarter Lunchrooms program, policy makers and school districts “consider allocating funds to encourage [directors] to engage more fully in professional association meetings and activities.”

But now that Wansink’s work has been discredited, the government will back off and stop wasting all this time and money, right?

Ummm . . .

A spokesman for the USDA told The Washington Post that while they had some concerns about the research coming out of Cornell, “it’s important to remember that Smarter Lunchrooms strategies are based upon widely researched principles of behavioral economics, as well as a strong body of practice that supports their ongoing use.”

Brown summarizes:

We might disagree on whether federal authorities should micromanage lunchroom menus or if local school districts should have more control, and what dietary principles they should follow; whether the emphasis of school cafeterias should be fundraising or nutrition; or whether school meals need more funding. But confronting these challenges head-on is a hell of a lot better than a tepid consensus for feel-good fairytales about banana placement.

Or celebrating the “coolest behavioral finding of 2019.”

P.P.S. I did some internet searching and came across this tweet by Sunstein from 2018:

On one hand I appreciate that dude linked to our blog. On the other hand . . . I hate twitter! What does Sunstein mean by saying my post was “ill-considered and graceless”? I’d appreciate knowing what exactly I wrote that was “ill-considered” and what exactly was “graceless.” I’m soooo sick of the happy-talk Harvard world where every bit of research is “cool” and scientists are all pals, promoting each other’s work and going on NPR.

Again:

Science is hard. I appreciate the value of public intellectuals like Sunstein, who read up on the latest science and collaborate with scientists and do their best to adapt scientific ideas to the real world, to influence public and private decision making in a positive way. I don’t agree with every one of Sunstein’s ideas, but then again I don’t agree with every one of any policy entrepreneur’s ideas. That’s fine: it’s not their job to make me agree with them. Policy is controversial, and that’s part of the point too.

OK, fine. The point is, science-based policy advice ultimately depends on science—or, at least it should. And if you want to depend on science, you need to be open to criticism. Not every “cool” or “masterpiece” idea is correct. Even Ivy League professors make mistakes, all the time. (I know it: I’ve been an Ivy League professor forever, and I make lots of mistakes all the time, all by myself.)

So, to take an example of legitimate criticism: someone sent me a New York Times article that included the following passage:

Knowing a person’s political leanings should not affect your assessment of how good a doctor she is — or whether she is likely to be a good accountant or a talented architect. But in practice, does it? Recently we conducted an experiment to answer that question. Our study . . . found that knowing about people’s political beliefs did interfere with the ability to assess those people’s expertise in other, unrelated domains.

and I pointed out that, no, the research article in question never said anything about doctors, accountants, architects, or any professional skills . . . When I did that, I was helping out Sunstein. I was pointing out that he was over-claiming. If you want to make science-based policy and you want to do it right, you want to avoid going beyond what your data are telling you. Or, if you want to extrapolate, then do so, but make the extrapolation step clear.

Is it “ill-considered and graceless” to point out an error in published work? No, it’s not. I can’t speak to graceless, but I think it was ill-considered to refer to Brian Wansink’s work as “masterpieces” and I think it was ill-considered to write something in the New York Times that mischaracterized his own research. I can understand how such things happen—we all make mistakes, even in our areas of expertise—but, again, criticism can help us do better.

Again, I hate twitter because it encourages people to sling insults (“ill-considered,” “graceless,” “Stasi,” etc.) without backing them up. If you knew you had to back it up, you might reconsider the insults.

P.P.P.S. Check this out:

Lacking information of our own and seeking the good opinion of others, we often follow the crowd . . . when individuals suppress their own instincts about what is true and what is right, it can lead to significant social harm. While dissenters tend to be seen as selfish individualists, dissent is actually an important means of correcting the natural human tendency toward conformity and has enormous social benefits in reducing extremism, encouraging critical thinking, and protecting freedom itself.

While much of the time it is in the individual’s interest to follow the crowd, it is in the social interest for individuals to say and do what they think is best.

Interesting. Or is it just “cheap talk”? Maybe celebrity law professors need “a libertarian paternalism commitment device” to stop themselves from attacking their critics.

32 thoughts on “More on why Cass Sunstein should be thanking, not smearing, people who ask for replications”

  1. I think it would be great to have a contest for the coolest behavioral finding!

    But why have just one? Why not have the Causal Inference and Social Science Grammies – the Gelman Awards! That would be awesome! Here are some great categories in which papers could get a Gelman:

    Coolest Robust Behavioral Finding – individuals
    Coolest Robust Behavioral Finding – groups/cultures
    Most Important Robust Policy Finding
    Most Expected Replication Failure
    Least Expected Replication Failure
    Most Likely Future Replication Failure (will probably need many subcategories for this one)

    Wow, you could both inspire great science and slam junk science at the same time, and get the community to participate!

  2. The calorie labeling finding was replicated in Israel in the same paper. Because the reading direction in Hebrew is right to left, they hypothesized (and found) that labels on the right would be effective. I don’t see why this would be a suspicious finding: that people overweight the first piece of information seems well established – consistent with the finding that people update their beliefs less than the Bayesian benchmark.

        • You really think Sunstein is dumb?

          I never said anything about malice by the way. But if your prior predictive is so off, so often, shouldn’t you look again at your priors?

        • Bob:

          I was only invoking Hanlon in a rough way. I don’t think Sunstein is either stupid or malicious. My impression is that he is heavily invested, both intellectually and in his career, in the policies of “libertarian paternalism” and in social psychology of the sort that implies that people can be consistently manipulated by seemingly irrelevant stimuli.

          Given this, he has some short- and medium-term incentives to hype such research (for example, describing Brian Wansink’s experiments as “masterpieces,” or exaggerating the applicability of his own research in that New York Times article), to ignore problems with this research (for example, not taking stock of how the replication crisis in social psychology should alter his views on social policy), and to lash out against critics (for example, calling them Stasi, graceless, etc.).

          Longer-term, though, it seems to me that the scientific truth will out, and so Sunstein has an incentive to try to get things right, and to formulate policy ideas that are consistent with the real world, not just the fake world of Wansink papers.

          So I guess what I’m saying is that Sunstein is acting in his short-term interest but not in his long-term interest, and maybe he needs some incentives to help him move in a direction that will be better for him. In that way, I have a Sunstein-like perspective myself!

  3. I haven’t looked at the underlying study, but the biggest problem with “the coolest behavioral finding of 2019” is the credibility of such a large effect, a 16% change in calories ordered, as a result of such a small intervention. There are two mechanisms at play in the study, the effect of a calorie label on ordering behavior and the effect of placement on the awareness of the label on the part of consumers. Each is likely to be modest by itself, and taken together very modest. I’d have to see a lot of replication before I believed something like that.
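    The “two modest mechanisms” point can be made concrete with a back-of-the-envelope calculation. All the numbers below are made up for illustration (they are not estimates from the study); the structure is the point: placement only affects the subset of customers it newly causes to notice the label.

    ```python
    # Toy calculation with assumed numbers, not figures from the paper.
    label_effect = 0.05   # assumed: calorie reduction among people who notice the label
    notice_right = 0.40   # assumed: share who notice a right-side label
    notice_left = 0.60    # assumed: share who notice a left-side label

    # The placement effect is the label effect times the extra share of
    # people who notice the label because of its position.
    placement_effect = label_effect * (notice_left - notice_right)
    print(f"{placement_effect:.1%}")  # about 1%, far below the reported 16.31%
    ```

    Even with generous assumptions, multiplying two modest mechanisms gives an effect an order of magnitude smaller than the headline number.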

    What strikes me is that Sunstein, at the very pinnacle of public intellectual-dom in the US, seems so devoid of skepticism about studies like this. You’d think he would have learned by now that even the most rigorous sciences proceed through a maze of false starts and promising dead ends.

    • Peter:

      I agree. Falling for Brian Wansink’s “cool research” shtick back in 2008 when just about everyone believed it, that’s one thing. Falling for something similar in 2019: that’s more disturbing.

    • You’ve hit on what I’ve long felt is a key component of these stories. Almost invariably they pair a plausible directional effect with an impossible magnitude. Example: making snacking slightly less convenient can cause you to lose twenty pounds. The first part lulls the skepticism, the second drives the clicks.

      • making snacking slightly less convenient can cause you to lose twenty pounds

        This is plausible to me under certain circumstances. If you reduce your carbs to under 20 g per day you can easily lose 10+ lbs in a week. This is almost all water, but it will show up on the scale.

    • The post and comments motivated me to take a close look at the paper on calorie labels. Consistent with the concerns expressed here, the results seem too good to be true. That is, if the effects are real and similar in magnitude to what is reported, then it should be very uncommon for experiments with similar sample sizes to be uniformly successful (which they were reported to be).

      A colleague and I did a formal analysis and submitted it to the journal that published the original paper. The editor there declined to send it out for review. Instead, we got it published at Meta-Psychology, and it just appeared there:

      https://open.lnu.se/index.php/metapsychology/article/view/2266
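      The intuition behind that “too good to be true” check can be sketched with a toy calculation (this is only an illustration of the excess-significance logic, not the analysis in the linked paper): even when a real effect gives each study fairly high power, the probability that every one of several independent studies comes up significant shrinks quickly.

      ```python
      # Illustration of excess-significance reasoning with an assumed power value.
      def prob_all_significant(power: float, n_studies: int) -> float:
          """Probability that all n independent studies reach significance,
          if each has the given power."""
          return power ** n_studies

      for n in (2, 4, 6, 8):
          print(n, round(prob_all_significant(0.8, n), 3))
      ```

      With 80% power per study, a uniformly successful run of six or eight studies is already improbable, which is why an unbroken string of significant results is itself a red flag.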

      • I didn’t read the article you linked, but appreciate the reference to the journal itself, which I had not heard of before, but which sounds like it is filling a need.

  4. “My point is that his brand was, unwittingly, promoting bad research.”

    Unwittingly, my foot. It’s not a bug, it’s a feature. There’s a lot of tax dollars and prestige positions available for anyone who can position their surfboard on the wave of interventionist policies that are currently in vogue.

    When the tide changes and something else becomes the new big deal they’ll hop on that one without a backwards glance, while the bean counters are pointing out methodological flaws in a paper from years ago.

    It’s why in the UK there’s constant campaigning about a nonexistent childhood obesity crisis. It means you can hoover up as much taxpayers’ money as possible, while positioning yourself as the good guy to immunize yourself against any opposition.

    That’s why he reacts so strongly to criticism. It’s got nothing to do with good or bad science, it has the potential to hurt his bottom line.

    • “currently….”

      e.g., since the new deal. :)

      It is “unwitting” in the sense of Simon and Garfunkel: “still a man sees what he wants to see and disregards the rest” hmmm mmmmhm

        I thought the video was quite good. Sure, “nudge theory” is a lot of hype and a new label on old ideas, but that doesn’t make it a ridiculous concept. What I find interesting about nudges is that they clearly work sometimes – engineering people’s choices can certainly have noticeable effects on their behavior – but at other times a little nudge is not likely to have much impact (as Andrew keeps convincing me, it is not easy to change a voter’s preference from their party affiliation, despite plenty of attempts to nudge them). It should be relatively easy to produce “evidence” that nudges work – and I suspect much of that research could be replicated. Where things go awry is when that evidence is used to extrapolate to other circumstances (in a sense, exactly the fundamental statistical question of using evidence from a sample to make inferences about a population, often a somewhat different population than the sample was drawn from).

        Thus, it seems to me that we have a mixture of claims and evidence about nudges. Some work and may produce large effects, others do not. I would speculate that the difference involves whether our underlying behavior is more or less “hard-wired” (I’m thinking about System 1 behavior, to use Kahneman’s term here). Our party affiliations may have tribal characteristics that small nudges are not likely to impact. Our use of default settings on printers (one sided or two) may not involve anything other than the intrinsic laziness of human behavior – so a little nudge can have a big impact.

        Weight loss would seem to me to be an intermediate case. I suspect that we can nudge people to lose large amounts of weight in the short-run, but not in the long-run. Similarly, I am prone to believe the research (discussed on this blog recently, such as “mindset” interventions) that relatively small interventions can improve scholastic performance in the short run, but I’m more doubtful about the long run sustainability and size of the impact.

        Perhaps more research needs to be devoted to parsing out where nudging does and does not work, and why. The headlines are too easily grabbed by showing a case where nudging works and then speculating on how this means it can work everywhere else.

          It seems interesting to try to study what kinds of nudges work in which situations. However, with the current “orthodoxy” of use of statistical inference in many fields, it seems highly unlikely that these questions could be answered in the near future.

        • It once happened that I had to decide how to receive a small accessory part of my salary separately, either in cash or as an insurance premium for a —not very interesting— retirement scheme.

          I forgot to turn in the form before the deadline, so the amount went to the insurance.

          Is that nudging? A subtle psychological “intervention” that made me decide in my best interest, against my better judgment?

  5. Late addition to the lineup:

    Is it time to ditch the term “evidence based”? It doesn’t have any real purpose. All science is “evidence based.” “Evidence-based science” is kind of like the “Rio Grande River”: it’s an extra word that makes it less. In fact it’s worse than that. Ever heard of a “deluxe Mercedes”? No, because if you have to say it, it isn’t so.

    • I partly agree — I think that the phrase “evidence-based” can mean too many different things to different people, to the point that it doesn’t say anything. On the other hand, we do need a term to express something like “based on high quality evidence”. Or perhaps, “based on high quality evidence, including high quality reasoning.”

    • I have a post coming up on internal contradictions within the “evidence-based medicine” movement. . . . Hmm, I thought I did, but searching the upcoming schedule, I don’t see it. So I’ll write the post now. It should appear in Feb.

  6. The extended Sunstein quote from ‘Conformity’ put me in mind of Charlan Nemeth’s ‘In Defense of Troublemakers: The Power of Dissent in Life and Business.’ I found it more useful to my thinking than ‘Nudge.’

  7. The corollary to all this, and closely related to Javert’s paradox, is the social law: Whistleblowers always get punished. Early in my career, a friend/colleague and I tried to replicate (obsessively) a famous senior colleague’s signature work. After two years of trying, we were able to demonstrate that the findings were junk. The data, which had been generated via computer simulation, didn’t replicate, nor did the work measure what it was claimed to be measuring. We approached the senior star with our work and had the sad reality explained to us. If we tried to publish our results, they would at best end up in a third-rate journal. In doing so, we would earn the permanent enmity, not just of the star, but of all his acolytes. The alternative was to extend his work by improving his code, and then publish the “replication and extension” in a second-tier journal. The denouement to it all occurred a couple of years later, when at two separate job interviews, even more luminous stars asked why we had wasted so much time in the most productive years of our careers on replicating and improving the work of others. Now you might ask, why haven’t the names of the three luminaries been revealed? After all, two of them are now deceased. Go back and read the first sentence of this note. It still applies. Probably more so now than when we were just out of grad school, 30+ years ago. It’s no fun being a pariah.
