
Discussion of effects of growth mindset: Let’s not demand unrealistic effect sizes.

Shreeharsh Kelkar writes:

As a regular reader of your blog, I wanted to ask you if you had taken a look at the recent debate about growth mindset [see earlier discussions here and here].

Here’s the first salvo by Brooke Macnamara, and then the response by Carol Dweck herself. The debate seems to come down to how to interpret “effect sizes.” It’s a little bit out of my zone of expertise (though, as a historian of science, I find the growth of growth mindset quite interesting) but I was curious what you thought.

I took a look, and I found both sides of the exchange to be interesting. It’s so refreshing to see a public discussion where there is robust disagreement but without defensiveness.

Here’s the key bit, from Dweck:

The effect size that Macnamara reports for growth mindset interventions is .19 for students at risk for low achievement – that is, for the students most in need of an academic boost. When you include students who are not at risk or are already high achievers, the effect size is .08 overall. [approximately 0.1 on a scale of grade-point averages which have a maximum of 4.0]

An effect of one-tenth of a grade point: is that large or small? For any given student, it’s small. Or maybe it’s an effect of 1 grade point for 10% of the students and no effect for the other 90%; we can’t really know from this sort of study. The point is, yes, it’s a small effect for any individual student, and of course it’s a small effect. It’s hard to get good grades, and there’s no magic way to get there!
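The point about averages hiding heterogeneity can be sketched in a few lines. This is a toy simulation, not data from either study: two hypothetical populations with identical mean gains, one uniform and one concentrated in a small subgroup, are indistinguishable to a study that estimates only the average effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical student population

# Scenario A: every student gains one-tenth of a grade point
gains_uniform = np.full(n, 0.1)

# Scenario B: 10% of students gain a full grade point, the rest gain nothing
gains_concentrated = np.where(rng.random(n) < 0.10, 1.0, 0.0)

# Both scenarios produce (essentially) the same average effect,
# so a mean-difference estimate cannot tell them apart
print(gains_uniform.mean(), gains_concentrated.mean())
```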

This is all a big step forward from the hype we’d previously been seeing, such as this claim of a 31 percentage point improvement from a small intervention. It seems that we can all agree that any average effects will be small. And that’s fine. Small effects can still be useful, and we shouldn’t put the burden on developers of new methods to demonstrate unrealistically huge effects. That’s the hype cycle at work; it’s the Armstrong principle, which can push honest researchers toward exaggeration just to compete in the world of tabloid journals and media.

Here’s an example of how we need to be careful. In the above-linked comment, Macnamara writes:

Our findings suggest that at least some of the claims about growth mindsets . . . are not warranted. In fact, in more than two-thirds of the studies the effects were not statistically significantly different from zero, meaning most of the time, the interventions were ineffective.

I respect the general point that effects are not as large and consistent as has been claimed, but it’s incorrect to say that, just because an estimate was not statistically significantly different from zero, the intervention was ineffective.
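A quick simulation shows why non-significance is not the same as ineffectiveness. With a small but real effect (I use d = 0.08 as in the meta-analysis) and a per-arm sample size of 100 (my assumption, purely for illustration), most studies will fail to reach significance even though the effect is genuinely nonzero:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, sims = 0.08, 100, 2000   # true effect, per-arm sample size, replications

significant = 0
for _ in range(sims):
    treat = rng.normal(d, 1.0, n)   # the true effect really is d, not zero
    ctrl = rng.normal(0.0, 1.0, n)
    se = np.sqrt(treat.var(ddof=1) / n + ctrl.var(ddof=1) / n)
    z = (treat.mean() - ctrl.mean()) / se
    if abs(z) > 1.96:               # two-sided test at the 5% level
        significant += 1

# Only a small fraction of studies detect the (real) effect
print(f"share of studies reaching significance: {significant / sims:.2f}")
```

At this sample size the power is well under 50%, so “more than two-thirds of the studies were not statistically significant” is exactly what we’d expect even if every intervention had the same small true effect.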

Similarly, we have to watch out for statements like this, from Macnamara:

In our meta-analysis, we found a very small effect of growth mindset interventions on academic achievement – specifically, a standardized mean difference of 0.08. This is roughly equivalent to less than one-tenth of one grade point. To put the size of this effect in perspective, consider that the average effect size for a psychological educational intervention (such as teaching struggling readers how to identify main ideas and to create graphic organizers that reflect relations within a passage) is 0.57.

Do we really believe that “0.57”? Maybe not. Indeed, Dweck gives some reasons to suspect this number is inflated. From talking with people at the Educational Testing Service years ago, I gathered the general impression that, to a first approximation, the effect of an educational intervention is proportional to the amount of time the students spend on it. Given that, according to Dweck, growth mindset interventions last only an hour, I’m actually skeptical even of the claim of 0.08. It’s a good thing to tamp down extreme claims made for growth mindset; maybe not such a good thing to compare to possible overestimates of the effects of other interventions.
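The proportional-to-time heuristic can be put as back-of-envelope arithmetic. The 20-hour figure below is purely my assumption for illustration (neither the post nor the meta-analysis gives a time budget for the 0.57 benchmark):

```python
# Hypothetical scaling: if effects grow roughly with time on task, and the
# 0.57 benchmark came from interventions totaling ~20 hours (an assumption),
# then a one-hour intervention would be expected to yield far less than 0.08.
hours_benchmark = 20.0          # assumed time for the comparison interventions
d_benchmark = 0.57
d_per_hour = d_benchmark / hours_benchmark
d_one_hour = d_per_hour * 1.0   # a one-hour mindset intervention
print(d_one_hour)
```

Under this (crude) assumption, a one-hour intervention scales to roughly 0.03, which is why even 0.08 can look surprisingly large rather than small.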


  1. Jonathan says:

    This comes home if you watch TV. There are two or more ads for drugs for dying people, one for metastatic breast cancer and another cancer med. There may also be one for heart failure. These medications, which I assume are really expensive, offer the promise of ‘longer life’, which is measured not in years but in weeks or maybe months, and I mean single-digit months. The ads are sad: dying people looking at their kids, trying to share moments, being observed by their loved ones. They hit all the notes. One even characterizes the fight to stay alive as being ‘relentless’ in the face of a relentless adversary, but that doesn’t change the ultimate reality that these drugs may extend life for a few months, and the ads gloss over how people actually look and feel when they’re dying (because the actors are healthy people looking sad, not dying people in pain of all sorts). Small effect magnified by the stakes.

    And the expense of the drugs feeds into your concerns about research that generates small effects: because the stakes are large, does that make the cost and relative ineffectiveness less important than the potential for progress? Much research is justified by what it might lead to, but rarely in such stark terms. One could argue the money would be better spent on people who aren’t dying, or perhaps on younger people who have more life potentially ahead of them (which gets into how we value life, a subject of not only actuarial interest but a reminder of the valuation of life after 9/11), but then you wonder if there might someday be a breakthrough … and even though that breakthrough, if it occurs, may have little connection to this expensive medication, people will say it was worth it, won’t they?

    Context matters for determining effect sizes, and it seems much research is actually designed to get you to buy into a proposed context in which the effects matter. I find educational work to be particularly prone to this: researchers are almost always looking at dependent measures as proxies for some greater, often difficult-to-define concept, which they frame as though their definition of context is ‘proper’ when it mathematically isn’t. Having worked with kids, and being married to someone who works with emergent curriculum, I’m extremely skeptical of all educational claims of effect.

    Here’s an odd example: I have a friend who, like me, studied Latin (except she’s better at it), and we know the skills that studying it develops and how it affects the way you organize and work through other problems. It’s a great thing to study, but only a few take it because the context of their lives says Spanish or another living language is more valuable. The effect of studying Latin might be large, but it is a self-selecting group, so odds are good it appeals to those who tend to think that way already. The effect might be measurable as a positive in the aggregate but actually be concentrated in a relatively small group, and then only when you don’t consider the other contextual effects, like what students may lose by not studying Spanish. I could chew this over until it becomes a series of perspectives that have value when you look at them on their own or in carefully selected groups, but which conflict when not so carefully grouped.

  2. Dean Eckles says:

    Andrew’s commentary is right on in my opinion.

    I would have hoped for more statistically careful commentary from the authors of this meta-analysis. Perhaps it’s a reminder that even when people are trying to be careful about quantitative social science, there is a lot that is confusing or misunderstood.

  3. Adrian Simpson says:

    I have my usual concerns about people comparing effect sizes without thinking very carefully about the nature of the studies: effect size is, after all, a property of the study as a whole, not just the intervention. It is entirely possible that some study designs will give larger effect sizes: use a non-active control treatment instead of an active one; use a test highly sensitive to the difference between the intervention and control mechanisms (a phonics test rather than a general language test for a phonics study); and use a restricted range (both by making the group more homogeneous and by choosing the slice of the wider population we think will be most affected by the difference between the treatments). So it’s entirely possible that “the average effect size for a psychological educational intervention (such as teaching struggling readers how to identify main ideas and to create graphic organizers that reflect relations within a passage) is 0.57”, while average effect sizes for interventions where it is harder to design such clear-cut studies (e.g. long-term behaviour interventions) will be much smaller. Contrary to arguments from the Education Endowment Foundation, John Hattie, the WWC, etc., comparing effect sizes (without checking for study design comparability) will not distinguish better from worse interventions.
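    The range-restriction point is easy to demonstrate with a toy simulation (my own sketch, with made-up numbers): the same raw treatment gain yields a larger standardized effect size when the sample is restricted to a homogeneous ability band, simply because the outcome standard deviation shrinks.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
baseline = rng.normal(0, 1, n)   # pre-existing ability, heterogeneous population
raw_gain = 0.15                  # identical raw treatment gain for every student

# For simplicity, simulate each student's counterfactual outcomes
outcome_ctrl = baseline + rng.normal(0, 0.5, n)
outcome_treat = baseline + raw_gain + rng.normal(0, 0.5, n)

def cohens_d(a, b):
    # standardized mean difference with a pooled standard deviation
    pooled = np.sqrt((a.var() + b.var()) / 2)
    return (a.mean() - b.mean()) / pooled

# Full, heterogeneous sample
d_full = cohens_d(outcome_treat, outcome_ctrl)

# Restricted sample: only students within a narrow ability band
band = np.abs(baseline) < 0.25
d_restricted = cohens_d(outcome_treat[band], outcome_ctrl[band])

print(d_full, d_restricted)   # same intervention, much larger d when restricted
```

    Nothing about the intervention changed between the two numbers; only the study design did, which is the reason cross-study effect-size comparisons need design comparability first.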

    As just one example, Slavin notes that effect sizes from standardised tests can be very much lower than those from researcher-designed ones (mentioned again in his recent blog post). Of course, even within standardised tests, a researcher can still choose a more focussed instrument (a standardised fractions test for a study of different forms of fractions teaching, instead of a more general maths test) and so get a larger effect size for an identical intervention on an identical sample.

    We need to stop using relative effect size as a measure of the relative effectiveness of interventions, or assuming that different effect sizes from different studies of apparently similar interventions need explanation, until study design differences are accounted for.
