Further debate over mindset interventions

Warne

Following up on this post, “Study finds ‘Growth Mindset’ intervention taking less than an hour raises grades for ninth graders,” commenter D points us to this post by Russell Warne that’s critical of research on growth mindset.

Here’s Warne:

Do you believe that how hard you work to learn something is more important than how smart you are? Do you think that intelligence is not set in stone, but that you can make yourself much smarter? If so, congratulations! You have a growth mindset.

Proposed by Stanford psychologist Carol S. Dweck, mindset theory states that there are two perspectives people can have on their abilities. Either they have a growth mindset–where they believe their intelligence and their abilities are malleable–or they have a fixed mindset. People with a fixed mindset believe that their abilities are either impossible to change or highly resistant to change.

According to the theory, people with a growth mindset are more resilient in the face of adversity, persist longer in tasks, and learn more in educational programs. People with a fixed mindset deny themselves these benefits.

I think he’s overstating things a bit: First, I think mindsets are more continuous than discrete: Nobody can realistically think, on one hand, that hard work can’t help you learn, or, on the other hand, that all people are equally capable of learning something if they just work hard. I mean, sure, maybe you can find inspirational quotes or whatever, but no one could realistically believe either of these extremes. Similarly, it’s not clear what is meant by hard work being “more important” than smarts, given that these two attributes would be measured on different scales.

But, sure, I guess that’s the basic picture.

Warne summarizes the research:

On the one side are the studies that seriously call into question mindset theory and the effectiveness of its interventions. Li and Bates (2019) have a failed replication of Mueller and Dweck’s (1998) landmark study on how praise impacts student effort. Glerum et al. (in press) tried the same technique on older students in vocational schools and found zero effect. . . .

The meta-analysis from Sisk et al. (2018) is pretty damning. They found that the average effect size for mindset interventions was only d = .08. (In layman’s terms, this would move the average child from the 50th to the 53rd percentile, which is extremely trivial.) Sisk et al. (2018) also found that the average correlation between growth mindset and academic performance is a tiny r = .10. . . .
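The percentile conversion in that parenthetical is just the standard normal CDF: an effect of d standard deviations moves a student at the 50th percentile to roughly the 100·Φ(d) percentile. A quick check:

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# A treatment effect of d = .08 SD moves a student at the 50th
# percentile to about the 53rd, as the parenthetical says.
d = 0.08
print(round(100 * normal_cdf(d)))  # 53
```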

On the other hand, there are three randomized controlled studies that suggest that growth mindset can have a positive impact on student achievement. Paunesku et al. (2015) found that teaching a growth mindset raised the grade point averages of at-risk students by 0.15 points. (No overall impact for all students is reported.) Yeager et al. (2016) found that at-risk students’ GPAs improved d = .10, but students with GPAs above the median had improvements of only d = .03. . . .

So, mixed evidence. But I think that we can all agree that various well-publicized claims of huge benefits of growth mindset are ridiculous overestimates, for the same reason that we don’t believe that early childhood intervention increases adult earnings by 42%, etc etc etc. On the other hand, smaller effects in some particular subsets of the population . . . that’s more plausible.

Then Warne lays down the hammer:

For a few months, I puzzled over the contradictory literature. The studies are almost evenly balanced in terms of quality and their results.

Then I discovered the one characteristic that the studies that support mindset theory share and that all the studies that contradict the theory lack: Carol Dweck. Dweck is a coauthor on all three studies that show that teaching a growth mindset can improve students’ school performance. She is also not a coauthor on all of the studies that cast serious doubt on mindset theory.

So, there you go! Growth mindsets can improve academic performance—if you have Carol Dweck in charge of your intervention. She’s the vital ingredient that makes a growth mindset effective.

I don’t think Warne really believes that Dweck can make growth mindset work. I think he’s being ironic: what he’s really saying is that the research published by Dweck and her collaborators is not to be trusted.

Yeager

I sent the above to David Yeager, first author of the recent growth-mindset study, and he replied:

I [Yeager] don’t see why there has to be a conflict between mindset and IQ; there is plenty of variance to go around. But that aside, I think the post reflects a few outdated ways of thinking that devoted readers of your papers and your blog would easily spot.

The first is a “vote-counting” approach to significance testing, which I think you’ve been pretty clear is a problem. The post cites Rienzo et al. as showing “no impact” for growth mindset and our Nature paper as showing “an impact.” But the student intervention in Rienzo showed an ATE of .1 to .18 standard deviations (pg. 4 https://files.eric.ed.gov/fulltext/ED581132.pdf). That’s anywhere from 2 to 3.5X the ATE from the student intervention in our pre-registered Nature paper (which was .05 SD). But Rienzo’s effects aren’t significant because it’s a cluster-randomized trial, while ours are because we did a student-level randomized trial. The minimum detectable effect for Rienzo was .4 to .5 SD, and I’ve never done a mindset study with anywhere near that effect size! It’s an under-powered study.
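The point about minimum detectable effects is easy to see numerically. As a rough sketch (the sample sizes, cluster counts, and ICC below are hypothetical illustrations, not the actual study designs), the MDE of a 50/50 trial scales with the standard error of the treatment effect, which gets inflated by the design effect when randomization happens at the school rather than the student level:

```python
from math import sqrt

def mde(n_students: float, icc: float = 0.0, cluster_size: float = 1.0,
        multiplier: float = 2.8) -> float:
    """Approximate minimum detectable effect (in SD units) for a 50/50 trial.

    multiplier ~ 2.8 corresponds to 80% power at two-sided alpha = .05.
    The design effect 1 + (m - 1) * icc inflates the variance when
    randomization is done at the cluster (school) level.
    """
    design_effect = 1.0 + (cluster_size - 1.0) * icc
    se = (2.0 / sqrt(n_students)) * sqrt(design_effect)
    return multiplier * se

# Hypothetical numbers for illustration only:
# student-level randomization with ~12,000 students...
print(round(mde(12_000), 3))                            # ~0.05 SD
# ...vs. school-level randomization: 30 schools of 100 students, ICC = 0.2
print(round(mde(3_000, icc=0.2, cluster_size=100), 2))  # ~0.47 SD
```

The same nominal sample delivers an order-of-magnitude larger MDE once clustering is accounted for, which is why a cluster trial can show a larger point estimate than a student-level trial and still be nonsignificant.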

In a paper last year, McShane argued pretty persuasively that we need to stop calling something a failed replication when it has the same or larger effect as previous studies, but wider confidence intervals. The post you sent didn’t seem to get that message.

Second, the post uses outdated thinking about standardized effect sizes for interventions. The .1 to .18 in Rienzo are huge effects for adolescent RCTs. When you look at the I3 evaluations, which have the whole file drawer and pre-registered analyses, you can get an honest distribution of effects, and almost nothing exceeds .18 (Matt Kraft did this analysis). The median for adolescent interventions is .03. If the .18 is trustworthy, that’s massive, not counterevidence for the theory.

Likewise, the post says that an ATE of .08, which is what Sisk et al. estimated, is “extremely trivial.” But epidemiologists know really well (e.g. Rose’s prevention paradox) that a seemingly small average effect could mask important subgroup effects, and as long as those subgroup effects were reliable and not noise, then depending on the cost and scalability of the intervention, an ATE of .08 could be very important. And seemingly small effects can have big policy implications when they move people across critical thresholds. Consider that the ATE in our Nature paper was .05, and the effect in the pre-registered group of lower-achievers was .11. That corresponded to an overall 3 percentage point decrease in failing to make adequate progress in 9th grade, and a 3 point increase in taking advanced math the next year, both key policy outcomes. This is pretty good considering that we already showed the intervention could be scaled across the U.S. by third parties, and could be generalized to 3 million students per year in the U.S. I should note that Paunesku et al. 2015 and Yeager et al. 2016 also reported the D/F reduction in big samples, and a new paper from Norway replicated the advanced math result. So these are replicable, meaningful policy-relevant effects from a light-touch intervention, even if they seem small in terms of raw standard deviations.

Unfortunately, unrealistic thinking about effect sizes is common in psychology, and it is kept alive by the misapplication of effect size benchmarks, like you see in Sisk et al. Sisk et al. stated that the “average effect for a typical educational intervention on academic performance is .57” (pg. 569), but Macnamara is citing John Hattie’s meta-analysis. As Slavin put it, “John Hattie is wrong.” And in the very paper that Macnamara cites for the .57 SD “typical effect,” Hattie says that those are immediate, short-term effects; when he subsets on longer-term effects on academic outcomes, which the mindset interventions focus on, it “declined to an average of .10” (pg. 112). But Sisk/Macnamara cherry-pick the .57. I don’t see how Sisk et al. reporting .08 for the ATE and more than twice that for at-risk or low-SES groups is “damning.” An .08 ATE seems pretty good, considering the cost and scalability of the intervention and the robust subgroup effects.

The third outdated way of thinking is that it is focused on main effects, not heterogeneous effects. In a new paper that Beth Tipton and I wrote [see yesterday’s post], we call it a “hetero-naive” way of thinking.

One way this post is hetero-naive is by assuming that effects from convenience samples, averaged in a meta-analysis, give you “the effect” of something. I don’t see any reason to assume that meta-analysis of haphazard samples converges on a meaningful population parameter of any kind. It might turn out that way by chance sometimes, but that’s not a good default assumption. For instance, Jon Krosnick and I show the non-correspondence between meta-analyses of haphazard samples and replications in representative samples in the paper I sent you last year.

The post’s flawed assumption really pops out when this blog post author cites a meta-analysis of zero-order correlations between mindset and achievement. I don’t see any reason why we care about the average of a haphazard sample of correlational studies when we can look at truly generalizable samples. The 2018 PISA administered the mindset measure to random samples from 78 OECD nations, with ~600,000 respondents, and they find mindset predicts student achievement in all but three. With international generalizability, who cares what Sisk et al. found when their meta-analysis averaged a few dozen charter school kids with a few dozen undergrads and a bunch of students on a MOOC?

Or consider that this post doesn’t pay attention to intervention fidelity as an explanation for null results, even though that’s the very first thing that experts in education focus on (see this special issue). I heard that, in the case of the Foliano study, up to 30% of the control group schools already were using growth mindset and even attended the treatment trainings, and about half of the treatment group didn’t attend many of the trainings. On top of that, the study was a cluster-randomized trial and had an MDE larger than our Nature paper found, which means they were unlikely to find effects even with perfect fidelity.

I don’t mean to trivialize the problems of treatment fidelity; they are real and they are hard to solve, especially in school-level random assignment. But those problems have nothing to do with growth mindset theory and everything to do with the challenges of educational RCTs. It’s not Carol Dweck’s fault that it’s hard to randomize teachers to PD.

Further, the post is turning a blind eye to another important source of heterogeneity: changes in the actual intervention. We have successfully delivered interventions to students, in two pre-registered trials: Yeager et al., 2016, and Yeager et al., 2019. But we don’t know very much at all yet about changing teachers or parents. And the manipulation with no effects in Rienzo was the teacher component, and the Foliano study also tried to change teachers and schools. These are good-faith studies but they’re ahead of the science. Here’s my essay on this. I think it’s important for scientists to dig in and study why it’s so hard to create growth mindset environments, ones that allow the intervention to take root. I don’t see much value in throwing our hands up and abandoning lots of promising ideas just because we haven’t figured out the next steps yet.

In light of this, it seems odd to conclude that Carol Dweck’s involvement is the special ingredient to a successful study, which I can only assume is done to discredit her research.

First, it isn’t true. Outes et al. did a study with the World Bank and found big effects (.1 to .2 SD), without Carol, and there’s the group of behavioral economists in Norway who replicated the intervention (I gave them our materials and they independently did the study).

Second, if I was a skeptic who wondered about independence (and I am a skeptic), I would ask for precisely the study we published in Nature: pre-registered analysis plan, independent data collection and processing, independent verification of the conclusions by MDRC, re-analysis using a multilevel Bayesian model (BCF) that avoids the problems with null hypothesis testing, and so on. But we already published that study, so it seems weird to be questioning the work now as if we haven’t already answered the basic questions of whether growth mindset effects are replicable by independent experimenters and evaluators.

The more sophisticated set of questions focuses on whether we know how to spread and scale the idea up in schools, whether we know how to train others to create effective growth mindset content, etc. And the answer to that is that we need to do lots of science, and quickly, to figure it out. And we’ll have to solve perennial problems with teacher PD and school culture change—problems that affect all educational interventions, not just mindset. I suspect that will be hard work that requires a big team and a lot of humility.

Oh, and the post mentions Li and Bates, but that’s just not a mindset intervention study. It’s a laboratory study of praise and its effects on attributions and motivation. It’s not that different from the many studies that Susan Gelman and others have done on essentialism and its effects on motivation. Those studies aren’t about long-term effects on grades or test scores so I don’t understand why this blog post mentions them. A funny heterogeneity-related footnote to Li and Bates is that they chose to do their study in one of the only places where mindset didn’t predict achievement in the PISA — rural China — while the original study was done in the U.S., where mindset robustly predicts achievement.

15 Comments

  1. Brenton Wiernik says:

    Certainly psychology has wildly optimistic views about the size of standardized effects. But that doesn’t mean that d = .18 represents a big effect. There isn’t any need to standardize effect sizes here at all. The outcome of interest was GPA—that is an inherently meaningful metric. The effects for the at-risk students they report correspond approximately to a student getting a C- instead of a D in one class. Now, that might be meaningful if the student is really on the margin of graduating versus not and this is the critical class they need credits for. But that’s a lot of ifs: if the benefit is realized in a class that isn’t their critical point, or if the student isn’t marginal on graduating, this intervention wasn’t useful for them. In most cases, based on their results, this intervention is going to have no meaningful benefit.

    As a last point, I don’t think a no-treatment comparison is really appropriate. Rather than spending 30 minutes on this intervention, what would the effect of 30 minutes of tutoring in a class they are struggling with have been? What would the effect be of an intervention focused on more obvious academic skills like critical reading, time management, or organization?
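The arithmetic behind the “C- instead of a D in one class” reading can be made explicit. As a sketch (assuming a hypothetical six-class schedule and standard 4.0-scale grade points):

```python
# Hypothetical grading arithmetic: raising one of six classes from a D
# to a C- on a standard 4.0 scale moves the overall GPA by roughly the
# 0.1 to 0.15 points reported for the at-risk subgroups.
grade_points = {"D": 1.0, "C-": 1.7}
n_classes = 6  # assumed course load, for illustration

gpa_change = (grade_points["C-"] - grade_points["D"]) / n_classes
print(round(gpa_change, 2))  # 0.12 GPA points
```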

    • Andrew says:

      Brenton:

      It’s just not realistic to expect a small intervention to have an average effect of 0.2 of a grade. Consider that for many students the intervention will have zero effect. Improving grades is hard work. In teaching you have to do lots of little things. There’s no pill to take or button to push.

    • Luis Zambrano says:

      We need to have in mind that average effects usually mean no effect for some and a bigger than average effect for others, as Andrew says.

      Besides that, the economist in me cannot do anything but think that a student getting a C- instead of a D is an amazing outcome for such a cheap intervention.

      Spending 30 minutes on tutoring does not seem to be an appropriate comparison, as it is more individualized (from my understanding of the intervention).

      P.S.: there are possibly other beneficial outcomes for this intervention. For example, for students, regularly having “different” activities can greatly reduce how much some students dread school.

    • Michael Nelson says:

      We know that the effect of one extra half hour of regular instruction over the course of a school year is not meaningful. We also have meta-analyses that show (at least to the extent that heterogeneous samples can) that spending one extra half-hour on these other skills doesn’t do anything, either. Otherwise, believe me, there’d be someone out there pushing the “One-time, Half Hour Skills Booster” and making bank. After all, mindset is a new idea (relatively) but the “more obvious academic skills” have had a lot of time to be refined.

      In fact, to the extent that they add value to student learning, the other skills you refer to are probably already part of the business-as-usual condition in most schools. One day, if the science holds up, mindset will be just like peer-tutoring and group work–teachers will be trained in it in their college ed coursework, and it will be built into curricula and pedagogy. Consider that the non-trivial “effect” of a year’s worth of education, above mere maturation and at-home enrichment, is almost entirely due to the “traditional” instructional interventions with “trivial” .1 effects that teachers bundle together. This is most true for the students who don’t get the out-of-school enrichment they need.

  2. I would be interested in comparing Dweck’s insights with those of Tversky & Kahneman, Philip Tetlock, and others. After all, who has constituted the credentialed expert class? Quite probably they were inclined to a ‘growth mindset’. Maybe the term ‘fixed’ is not optimal. Besides, we go through all kinds of learning cycles, depending on context/circumstance.

    I do find it really annoying how parents characterize their children’s intelligence. It can debilitate an otherwise energetic and curious child.

    However, in my experience, parents mislead their kids into thinking they are geniuses. Not everyone is ‘amazing’. Overindulgence of compliments is no substitute for encouraging kids to think on their own. I tried hard to skip school as much as I could. I mostly preferred the company of adults when it came to learning.

    • And so what did I learn? That so much of our knowledge base is the result of the sociology of expertise to which we are exposed. I saw that some large % of academics had quite fragile egos.

      Here and there a few mature and humble ones too. We don’t reward maturity and humility all that well.

  3. Anonymous says:

    Yeah, I don’t want to burst anyone’s bubble, but:

    “the intervention changed grades when peer norms aligned with the messages of the intervention. “

    isn’t this saying “the intervention works where people already believe it”?

    • Michael Nelson says:

      It’s implying that students who are out of step with the views of their peers can be brought back to those norms, and improve performance, to an extent that doesn’t happen without the intervention. It would suck if that’s all we get out of it–only helping people who come from certain backgrounds. But it ain’t nothin. The cumulative (negative) effect of a non-growth mindset among students in this “sub-group” over years might be large and could be almost entirely preventable or even reversible–a very practical finding. And in terms of the science, it would help us explain some of the heterogeneity in student trajectories.

      If someone came up with a vaccine for COVID that only worked for people who’d been sick with it before, by awakening a natural but latent immune response, would you trash it because “the vaccine works where people already have the antibodies?”

  4. Phil says:

    I haven’t read anything about this other than the blog post — I have not read the original studies or even their abstracts — so I’m really just commenting on what I see here.

    Yeager makes (at least) two good points, both of which tie in with themes highlighted on this blog in the past: (1) an intervention can work for some people but not others, so merely quoting the mean effect size may be misleading; and (2) even if the benefit of an intervention is small in every way we care about—if there is a benefit at all—it can still be worthwhile if the cost of the benefit is also small. (I assume point 2 is what Yeager has in mind when he says “…considering the cost…of the intervention.”)

    It’s not a priori ridiculous that changing someone’s “mindset” can change their behavior and thus their performance. Indeed, plenty of people sign up for coaching and classes and counseling and therapy (sorry, I ran out of c words) under the assumption that these things will help them make changes. A lot of it may be hooey but I don’t think it’s all hooey.

    Figuring out the optimal material and methods for any individual hour seems almost impossible to do by experiment. To some extent teachers have to rely on their understanding — their mental models — of what ought to help students. We’ve all been students ourselves, we all try to learn things and try to explain things and we have some understanding of how our minds work and how they differ from others. One of the things that makes a difference (I assert without evidence, but I’m sure) is how much effort the student is willing to put into something. That is certainly true of me: there are things I’m willing to work hard to learn and whaddya know, I’m likely to learn them better and faster than the things I don’t work at. I’m not at all skeptical about the value of being willing to work hard to learn. My skepticism is all about how much you can change attitudes with an hour of instruction, however it is carried out.

    All in all, I’m skeptical that an hour of cheerleading about a ‘growth mindset’ leads to a substantial effect on the average ninth-grader or on the average student from any pre-defined subset of students. But that’s also true of an hour spent studying geometry or history or biology or anything else. A kid spends, what, something like 10,000 hours in school I guess, between the first day of first grade and graduation from high school, and if any single one of those hours is enough to move the needle that would be almost shocking. Students learn a ton from hour 1 to hour 10,000, but they only learn a teeny tiny bit in any given hour of school (and they learn half of that ton while they’re not in school at all).

    So my gut feeling is that you probably can’t achieve a noticeable improvement in student performance with one hour of instruction of any kind…but if I’m wrong, and I certainly might be wrong, then…well, if an educational outcome can really truly be improved noticeably by doing a specific thing for an hour, then hey, try it for a dozen hours (maybe once per year?) and see if you can get even more of it. Anything on the order of a 0.1 point improvement in GPA on average, or even a 0.01 point improvement in GPA on average, if you remove an hour of something else and replace it with an hour of teaching about a “growth mindset,” seems like a great deal. Do it again.

    And even if it’s true that Dweck is associated with all of the positive results and none of the negative results, I don’t consider that damning in itself. Maybe she’s just good at designing interventions, or at designing studies that reduce other sources of noise and thus allow the effect to be seen. This is the kind of thing that should make us suspicious but not dismissive, absent other evidence.

    • Michael Nelson says:

      I’ll play devil’s advocate and say: Not only is it plausible that a brief intervention could have such lasting effects, but there are an absolute ton of positive behavioral interventions that only take half an hour but have a meaningful and lasting effect. So many that I can literally rattle off a dozen in a couple of minutes.

      Introduce a little boy who likes firetrucks to a real fireman. Show a kid from an upper-middle class family a documentary about the consequences of real poverty. Have a student read an amazing short story. Inform a teen about her options for family planning. Show a depressed gay teen an “It gets better” video. Teach a man to fish. Etc. True, not every little boy or rich kid or reader or young woman or young man or non-sportsman will change, but one in ten? Two in a class of twenty? That’s a big effect for a pretty big subgroup.

      Taking my horns off, I can honestly say that many things most students learn in school are learned in no more than an hour’s instruction, because that’s all the time teachers have to teach individual state standards. Those things then get reinforced over time, which is (arguably) what’s going on inside a student’s head each time they do schoolwork and apply this mindset.

      • I am aligned with many of your observations. Nevertheless, I am more focused on the quality of interactions between children, parents, and community: community being the wider range of interactions with people. I guess I thrive in spontaneous, chance, and eclectic encounters. That is not what much of our education encourages. Particularly given the structure of Common Core curricula and simply the physical constraints of school learning.

        Some children have had the benefit, for example, of aiding their parents, who were country doctors, in their offices or on home visits. That is pretty amazing. So perhaps what I’m pointing to is that depth of engagement does matter. We have introduced all sorts of toys, games, and learning tools that have begun to substitute for engagement with real live people. I see this all around me here in DC.

      • jim says:

        “there are an absolute ton of positive behavioral interventions that only take half an hour but have a meaningful and lasting effect. “

        Sorry, not a ghost’s chance in hell. If it were that easy, Egypt would be in its 500th dynasty. Check the counterfactual, man, there’s just no way it can be true.

    • jim says:

      “It’s not a priori ridiculous that changing someone’s “mindset” can change their behavior and thus their performance. “

      Phil, I absolutely agree. In fact it’s something we should be striving for in every educational environment. But I’m sure there are millions of parents and thousands of teachers who can attest that they work on this constantly with all their children and students, and a half-hour intervention isn’t going to do the job.

      I don’t see anything wrong with such an intervention. It costs almost nothing and definitely sends the right message. But sorry, I’ll just need to see much more powerful evidence to be convinced that a single 30-minute video intervention is responsible for creating perpetual improvement for struggling students.

  5. Adrian says:

    The continued use of relative standardised effect size as an indicator of relative effectiveness of interventions in education research astonishes me. It worries me that it leads practitioners to make ill-informed decisions.

    The same intervention compared to the same control treatment on the same sample (or a sample chosen randomly from the same population) can have very different effect sizes depending on the outcome. An intervention on teaching fractions might have an effect size of, say, 0.5 with a fractions outcome, 0.2 with a maths outcome and 0 with an IQ test. Some argue that we can deal with this by only accepting “standardised tests” – but all three outcomes in this example might be standardised tests!

    A big effect size doesn’t mean an important intervention. With the most proximal measure, you can even get an infinite effect size for a simple, brief, near-zero-cost intervention with near-zero educational value (see https://bit.ly/2Wlkl72).

    If you then consider different populations, different comparison treatments, different treatment/outcome measure delays and so on – each can result in hugely heterogeneous effect sizes. The same might apply for growth mindset – it may work for some people, in some circumstances, for some outcomes, compared to some alternatives. Choose one balance of these for a big effect size, choose a different balance for a small or even negative one. Having Dweck on your team might help you choose the best possible balance; being a growth mindset skeptic might lead you (consciously or not) to a different set of choices. If we stopped thinking of effect size as a proxy for educational importance and stopped expecting effect sizes to be the same when the circumstances/populations/measures etc. are different, we might not be concerned.

    It might make more sense to think of standardised effect size as something akin to a signal-to-noise ratio – it tells you how clear the experimental outcome was, but it doesn’t tell you how important the intervention is nor how it compares to other interventions.
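Adrian’s point about outcome proximity can be illustrated with made-up numbers: hold the intervention and the raw gain fixed, and the standardized effect size still varies enormously with how much intervention-irrelevant variance the outcome measure contains (all values below are hypothetical):

```python
# Hypothetical illustration: the same raw improvement produces very
# different standardized effect sizes depending on the outcome's SD,
# i.e. how much variance unrelated to the lesson the test picks up.
raw_gain = 2.0  # points gained on the fractions material taught

outcome_sds = {
    "fractions test (proximal)": 4.0,   # most variance is fractions skill
    "general maths test":        10.0,  # fractions is a small component
    "IQ test (distal)":          30.0,  # barely touched by the lesson
}

for outcome, sd in outcome_sds.items():
    d = raw_gain / sd
    print(f"{outcome}: d = {d:.2f}")
# Same intervention, same students: d runs from 0.50 down to 0.07.
```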
