John Hattie’s “Visible Learning”: How much should we trust this influential review of education research?

Dan Kumprey, a math teacher at Lake Oswego High School, Oregon, writes:

Have you considered taking a look at the book Visible Learning by John Hattie? It seems to be permeating and informing reform in our K-12 schools nationwide. Districts are spending a lot of money sending their staffs to conferences by Solution Tree to train their schools to become PLC communities which also use an RTI (Response To Intervention) model. Their powerpoint presentations prominently feature John Hattie’s work. Down the chain, then, if all of these school districts attending are like mine, their superintendents, assistant superintendents, principals, and vice principals are constantly quoting John Hattie’s work to support their initiatives, because they clearly see it as a powerful tool.

I am asking not as a proponent or opponent of Hattie’s work. I’m asking as a high school math teacher who found that there does not seem to have been much critical analysis of his work (except by Arne Kåre Topphol and Pierre-Jérôme Bergeron, as far as I can tell from a cursory search.) This seems strange given its ubiquitous impact on educational leaders’ plans for district and school-wide changes that affect many students and teachers. An old college wrestling teammate of mine, now a statistician, encouraged me to ask you about this.

The reason educational leaders have latched onto this book so much, I believe, is Hattie’s synthesis of over 1,000 meta-analyses. This is, no doubt, a very appealing thing. I’m glad to see educational leaders using data to inform their decisions, but I’m not glad to see them treating it as an educational research bible, of sorts. I wonder about the statistical soundness (and hence value) of synthesizing so many studies of so many designs. I wonder about a book where there are only two statistics primarily used, one of them incorrectly. And, finally, I wonder about these things because this book is functioning as fuel for educational Professional Development conferences over multiple years in multiple states (i.e., it’s a significant component in a very profitable market) as well as the primary resource used by administrators in individual districts to effect change, often without teachers as change-agents. Regardless of these concerns, I also appreciate conversations the book elicits, and am open to the notion that perhaps there are some sound statistical conclusions from the book, ignoring Hattie’s misuse of the CLE stats. (Similarly, I should note, I like a lot about the RTI model that Solution Tree teaches/sells.) I’m sending you this email from a place of curiosity, not of cynicism.

My reply: I’ve not heard of this book by Hattie. I’m setting this down here as a placeholder, and if I have a chance to look at the Hattie book before the scheduled posting date, six months from now, I’ll give my impressions below. Otherwise, maybe some of you commenters know something about it?

33 thoughts on “John Hattie’s “Visible Learning”: How much should we trust this influential review of education research?”

  1. I believe Hattie’s book is fatally flawed for reasons set out here: https://academiccomputing.wordpress.com/2013/08/05/book-review-visible-learning/ The synthesis of meta-analyses is frequently done by discarding vital context that means the synthesised results cannot be relied upon. E.g. the work averages pre-/post-test comparisons with control/intervention group comparisons to get one average effect size, but surely you can’t just average together two such different effects. Many people besides me have now pointed out such issues in blog posts (I’m surprised the original commenter didn’t turn these up), but still many years on the work seems to get a lot of attention.

    • As far as I remember, Hattie responded to these criticisms in his book and pointed out that selecting good or bad meta-studies according to extra criteria would not significantly change the outcome. He quoted some studies on this issue. He then chose to use meta-studies without applying any extra criteria.
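To make the pre-/post-test versus control-group point in comment 1 concrete, here is a minimal arithmetic sketch. All numbers are hypothetical (they are not taken from Hattie's book), and the two helper functions are illustrative names only. The sketch assumes that effects are additive and measured in pooled standard deviation units.

```python
# Hypothetical numbers in pooled-SD units; NOT taken from Hattie's book.
developmental_gain = 0.15  # assumed growth a student shows with no intervention
intervention_gain = 0.25   # assumed extra growth caused by the intervention


def prepost_d(dev, extra):
    """Pre/post design: the effect size absorbs ALL growth over the period."""
    return dev + extra


def controlled_d(dev, extra):
    """Control-group design: developmental growth cancels between the groups."""
    return (dev + extra) - dev


d_prepost = prepost_d(developmental_gain, intervention_gain)
d_controlled = controlled_d(developmental_gain, intervention_gain)

# Averaging the two designs gives a number that matches neither quantity.
d_averaged = (d_prepost + d_controlled) / 2

print(f"pre/post d    = {d_prepost:.3f}")
print(f"controlled d  = {d_controlled:.3f}")
print(f"naive average = {d_averaged:.3f}")
```

Under these assumptions the same intervention yields two different effect sizes depending on the study design, and their average answers neither "how much did students grow?" nor "how much did the intervention add?".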

  2. Andrew,

    This is a great opportunity to discuss math and statistics education. From my observation there is a tendency to latch on to the curriculum, without sufficient discussion. I gather the ASA is offering a webinar series about high school statistics curricula. I look forward to your observations.

    BTW, I’m not opposed to trying out new stuff. We need more of it.

  3. As a school board member and somebody who teaches statistics at a graduate level, I find it an unusually strong contribution. It is, as Dan says, a summary of all the meta-analyses in the education literature. I like it because it focuses on effect size rather than significance. Every intervention is boiled down to a scale of standardized effect size and, to help non-statisticians calibrate, three points on the scale are identified: d=0.40 is the effect size of who your teacher is, d=0.15 is the effect size of what a student not in school achieves (so-called developmental), and zero is an obvious boundary. And as an example of the clear-eyedness, d between 0 and 0.15 is not considered a good outcome even though it is positive. Then the book just marches through several dozen educational interventions ranging from curriculum to instructional methods to behavioral management methods. Each chapter starts with an overall standardized effect size on the aforementioned scale and then basically a review of the literature and looking at subfactors, concerns with the literature, etc. I have found it to be a marvelous tool to cut through a very wide and deep pile of literature. It’s my first go-to.

    The CLE that Dan mentions is only found 5 times in the book – I haven’t bothered to check whether I think they’re used rightly or wrongly, because they are not at all central to the book. The SES d is what is emphasized and it is, I think, unusually well interpreted for non-statisticians by its comparison to other d’s.

    Could I find some nits in some of the detailed discussions? Sure. And it walks the line of simplifying enough to be accessible which will be oversimplifying to those with a stronger stats background. But to me it focuses on the right questions.

    Now to Dan’s point: just because the book is well done, does that mean it is always being used well in the education world? I would have to bet not. The education world is just as (more?) subject to fads, trends, and shallow thinking as most fields. I can easily imagine the book being abused. But that doesn’t mean the book is wrong or bad. And one of the main takeaways from the book is that the cold hard reality is that the home a student comes from and the quality of the teacher we put in the classroom matter more than most educational reforms. You don’t hear educators admitting or addressing that very often! And the trends it identifies with large effect sizes worth pursuing are not necessarily the ones that are trendy in education.

    I have grown to have a dim view of the educational consulting world, which pushes things on educators who lack the training to truly judge what they are being sold. But Hattie’s book is part of the solution, not the problem. I couldn’t speak to the consulting/training conglomerate that has grown up around it, but it wouldn’t surprise me if it was ineffective like most of that world is.
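For readers unfamiliar with the two statistics discussed in comment 3, the following is a minimal sketch of how a standardized effect size (Cohen's d with a pooled standard deviation) is computed, and how it maps to the common language effect size (CLE). The conversion CLE = Φ(d/√2) assumes normally distributed scores with equal variances in both groups. The function names and the toy data are illustrative assumptions, not anything from the book.

```python
import math
from statistics import NormalDist, mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = (
        (n1 - 1) * variance(treated) + (n2 - 1) * variance(control)
    ) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled_var)


def cle_from_d(d):
    """Probability that a random treated score exceeds a random control score.

    Assumes both groups are normal with equal variance, so the difference of
    two independent draws has standard deviation sqrt(2) in pooled-SD units.
    """
    return NormalDist().cdf(d / math.sqrt(2))


# Toy data, made up for illustration only.
print(cohens_d([2, 3, 4], [1, 2, 3]))

# The calibration points mentioned in the comment, expressed as probabilities.
for d in (0.0, 0.15, 0.40):
    print(f"d = {d:.2f} -> CLE = {cle_from_d(d):.3f}")
```

Because the CLE is a probability, any correctly computed value must lie between 0 and 1, which gives a quick sanity check on reported figures.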

  4. Hattie is a similar star of professional development in UK schools. My impression is that ultimately, Hattie argues for growth mindset (Dweck etc.) and this has been called into question more recently. There’s a blogger, David Didau, who has some things to say on this.

  5. The most notable flaw to me is perhaps a little difficult to put into words: that the concept doesn’t directly consider that the system pushes back. That is, after we accept there have been a large number of studies about educational methods, and after we accept that we’ve tried various educational methods in various settings, that tends to the conclusion that we’re describing something that should be considered as resisting reduction. I mean that as actively, not passively resisting the imposition of some cause and effect relationship. Think of a bacteria or a virus: we try to pin one down and it adapts, even hides out for long periods of time, and then it reappears in a mutated form, sometimes beneficially to a higher organism, sometimes not. We’re not that much higher up than bacteria: try and educate us and we’ll find a way to not work at being educated and many of us will work to make sure others aren’t educated, even if this means inflicting our internal idiocy or personal pain on our own children (especially on our own children, right?).

    My point is deeply Bayesian: if you can accept that this is true about the educational system, and if you can accept that the educational system is made up of flawed people like yourself, then studies and attempts reveal that the educational system reflects the flaws of ourselves across the population involved. If we then accept that there are flawed families, that leads to conditional branch which has sign: one direction imposes cause from education toward families, stated essentially as a belief you can educate a child and treat that as having either zero or only positive effects on the child’s family. Another perspective attempts to impose all the negative characteristics of that family and that family’s understanding of culture, of expectation (by person, by gender, by birth order, etc.). There are other perspectives pushing back as well: societal expectations imposed by both your immediate physical contexts and the people beyond your family who impose on you, whether that’s through some emulative model or through mental and physical intimidation. You treat a bunch as opposite sign and then you rapidly get into reverberations of positive and negative effects, which is of course a big reason why educational studies aren’t worth much and why you can do a statistically valid analysis that still has no larger meaning other than this is your statistically valid analysis.

    I’m not sure if this is an aside or not, but the universe is set up this way: we calculate physical results of light using layers of couplings that relate layer after layer of probable outcomes. That gives us answers, sometimes exact, sometimes within a range that we can define. Now translate that to any ranking of educational methods. You can substitute something really easy: college rankings. They use a fairly small number of data points which represent deeper meanings and they generate lists, like if you ranked every Friends episode or every Twilight Zone story from 1 to gazillion as if that ranking has meaning beyond this group then that group then that group, with all of those groups being subject to additional layers of meaning, like whether your priors pre-determined results, whether your priors perpetuate results, whether your priors affect how you see the results, etc. In the VL lists I’ve seen, some fairly specific topics are included along with huge concepts like ‘teachers working together’. To me, I’d rank #1 as giving each child exactly what that child needs to overcome whatever issues that child has both internally and externally but then I’m not the fairy godfather and don’t grant wishes.

    Thinking about it, this is one reason why simple metric solutions are often proposed: test and fail if not pass works great … until society complains about failed kids. The complexity makes choices tough: a number of people lose their scholarships because they don’t maintain the required average which means we track the people who are likely to lose scholarships and that group is less likely to get them, and isn’t this just like lending or setting car insurance rates and don’t all these things both reflect and perpetuate circumstances well beyond the control of any single individual, so aren’t we penalizing those who most need help but are being brought down by the world around them or by the system or by racism or whatever argument you want to make about why things don’t work ‘better’. Is that a long enough sentence? Maybe we should just go pass/fail … oh, but then failing people is …

    To my way of thinking, the attraction of Visible Learning is that it gives administrators a way to make what they hope are helpful presentations that teach teachers how to teach, as well as the chance to instill a sense of common cause for their educational mission. Sensible. But also to my way of thinking, because it’s largely removed from actual teaching and from the actual relationships within and across classrooms, it’s mostly putting larger words on what is typically done within the limits of a highly complex system that reacts dynamically. Example: lots of people put their faith in early childhood education and yet teachers (and people everywhere) know that kids’ lives change both predictably and unpredictably as they move through life. That kid who was so interested in everything can become disillusioned or maybe is simply more interested in being popular in a milieu where learning doesn’t make you popular. The idea included a positive sign that early childhood education would affect family, would affect milieu, culture, etc. This easily becomes the ‘we didn’t spend enough/give it enough time’ argument and then you get stuck arguing over which tiny effect you want to believe, which is pretty much what Visible Learning is really about: relative rankings of small effects. You see one problem: you get administration with a checklist that now needs to be done in which you go over the various ‘educational intervention’ strategies and then you can’t say in public anything that indicates the kid’s family is bleeped as though obfuscation is either an actual positive or at least doesn’t mean you aren’t treated as a negative because your use of certain words means you aren’t fit to teach or administer. But to get back to early education, sure, it helps some kids: it helps the ones who benefit from it.

    Sorry about the sentiments but there’s a part of me that gets pissed off when people use their reactions to words to justify their own failures. That goes back to the top of this comment: you think there aren’t issues within families about education? That parents don’t take out their failures on their children? That parents may love their children even as they do the wrong things toward them? That people don’t blame others for their own failings? That parents don’t blame some other kid for leading our kid astray? That all this doesn’t show up in the classroom? It’s not that I’m a bad parent, that my kid doesn’t work but that the teachers don’t like …

    I guess the reason for this excessively long comment, other than coffee and boredom on my end, is that you can see the reluctance of the educational system to view itself as resisting reduction, the same as you can see psychology and sociology resisting reduction. If we take the Visible Learning charts, I’m not even sure you could say that #3 is really not #57 or #117 – except that some, particularly at the top, are obviously complex concepts – but I’d at a minimum say I’d lump them in big bunches and that alone makes the idea just a way of talking about stuff.

    • Your long comment makes me think you are making the wrong point – or at least that a different point may be in order. I should say that I have not read Hattie’s work at all so I am not commenting on that. But I think the ranking of interventions on the basis of average effect sizes may be the wrong approach. Your comment about individual circumstances is important – even if the average effect size is small, it may be large for particular individuals or groups. I believe this point has been made a number of times on this blog (by Andrew and others), particularly with respect to medical interventions. Measuring average effects across a heterogeneous set of individuals makes sense if the residual effects are random. When there are subgroups that systematically vary, then the average effect can be misleading (and likely wrong in both magnitude and/or sign). Our educational system too often fails to treat students as individuals.

    • I find it interesting that your comment, like much debate on education policy, moved from an acknowledgment of the complexity of the problem of improving a large dynamic system to moral chastising of individual failings.
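The reply above about averaging over heterogeneous groups can be illustrated with a small calculation. The subgroup shares and effect sizes below are entirely hypothetical; the sketch only shows how a modest average effect can coexist with large effects of opposite sign in subgroups.

```python
# Hypothetical subgroups: name -> (share of students, effect size in SD units).
# These values are invented for illustration; they come from no real study.
subgroups = {
    "benefits a lot": (0.30, 0.60),
    "unaffected": (0.50, 0.00),
    "harmed": (0.20, -0.40),
}

# Shares must describe the whole population.
total_share = sum(share for share, _ in subgroups.values())

# The population-average effect is the share-weighted mean of subgroup effects.
average_effect = sum(share * effect for share, effect in subgroups.values())

print(f"average effect = {average_effect:.2f}")
for name, (share, effect) in subgroups.items():
    print(f"{name:>15}: {share:.0%} of students, d = {effect:+.2f}")
```

In this made-up population an intervention ranked by its average effect would look weak, even though it matters a great deal, in opposite directions, for half of the students.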

  6. I find it fascinating that educators (or their administrators) rush into changing their teaching methods on such a regular basis. We all want to do a good job, but the newest and shiniest isn’t necessarily the way to go, nor may it be appropriate for where we are. What seems to be missing is in situ testing of new methods. Given that school districts have many schools, it seems obvious to this natural scientist that we should be doing designed experiments where some schools are getting the intervention and some are not. While some policy experiments are carried out, this is not the usual approach. Indeed, I have been faced with arguments between “not possible” and “not ethical” when asking this question. A final advantage is that startup costs are lower if only a portion (and that would be more than one) of all possible sites are part of the new intervention group.

    • “Indeed, I have been faced with arguments between “not possible” and “not ethical” when asking this question. A final advantage is that startup costs are lower if only a portion (and that would be more than one) of all possible sites are part of the new intervention group.”

      A “not ethical” argument is understandable — if people truly believe the new method will be better than the old, then it will seem unethical to them to withhold the treatment from some schools. At the same time, if they truly believe that the old method is better, it will seem unethical to them to “force” some schools to try it. Perhaps a solution would be to allow schools to volunteer to try the new method — there would, of course, be confounding (of “think new method better” with “using new method”) — but the confounding also works the other way (of “using old method” with “thinks old method better”). However, this particular confounding might not be so bad — if the method turned out to be the opposite of what was expected by one group, then this might be considered stronger evidence for the outcome being caused by the method than if schools had been randomly assigned.
      (Of course, there is also a question about just who thinks the new method is better or worse than the old — such a decision might be made by the principal rather than by the teachers involved, and principal’s belief might conflict with teachers’ beliefs.)

    • Indeed, in many situations the question should be not “Will this intervention work?” but “In what ways, in what context, at what costs, and to what ends might it work?” Often when schools and school systems rush to embrace some new idea, they implement it in a dogmatic, overblown, and silly manner. Then eventually people get fed up with it and exchange it for another fad. Why not take the time to implement it intelligently? That’s another reason to test out an idea before applying it on a large scale. Not only do you get a sense of its efficacy, but you can tune and calibrate it.

  7. If you look in the international journal of haruspicy and do a meta analysis of the effect of an all corn diet on the accuracy of chicken entrail based stock market prediction, you will get a meta-analysed estimate of the effect. Does anyone believe this will yield a good method of investing?

  8. The Hattie book sounds like an excellent compendium of educational research … someone looked at all the studies and has written a handy guide to summarize what works and how important each educational technique is.

    I’m quite surprised this hasn’t been done before. (Or has it?) How have the millions of people in the education industry been shaping their teaching methods without an understanding of what works and what doesn’t work and which techniques are more effective than others? Is it fads all the way down? An evaluation of teaching techniques is implicitly encoded in the education of teachers. Are many of these implicit evaluations wrong? Hattie seems to think so. (I personally don’t know.)

    Hattie himself sounds pretty sane. Here is an interview of him: https://soundcloud.com/bridging-the-gaps/education-what-works-and-what-does-not-with-john-hattie

      • Sameera:

        I am not a natural teacher, which is why I wrote that book on Teaching Statistics. I needed a bag of tricks to teach statistics well, and I wanted to share these tricks with others. My coauthor, Deb Nolan, is an excellent teacher, so I thought we were a good team for writing that book.

  9. Dan, there are a lot more critiques of Hattie and his methods if you know where to look. Some early reviews of his work point out some problems (e.g. https://tinyurl.com/ybkk28c2 and https://tinyurl.com/y86jznp8). More recently, Robert Slavin has very boldly come out to say Hattie is just wrong (https://tinyurl.com/y8mezk3m). A key issue appears to be his misunderstanding of effect size – thinking that a bigger effect size means a more important intervention or a bigger influence. This is the same issue that Brian is making. A recent paper by Simpson shows that this is a fundamental error (https://tinyurl.com/y79we6at).

    Terry, you might like to listen to a pair of podcasts – one by Simpson raising his problems with Hattie’s effect size error (btw – Hattie’s not the only person to make this error) and the other by Hattie trying to defend his work (Simpson: https://tinyurl.com/y8kd8dtg Hattie: http://www.ollielovell.com/errr/johnhattie/). They’re both long, but at the end of it you might ask yourself if Hattie still seems as believable.

    Key is that teachers and school districts are using this as ‘evidence’ to direct practices, but the argument appears flawed from the beginning.

    • Susie B:

      I do not want to spend the time on those links but I did scan through the Bergeron critique (link above) partly as I met him years ago in Ottawa.

      Initially, I found the claim of obtaining something informative from a synthesis of over 1,000 meta-analyses somewhat amazing.

      There may be real information in published papers but it’s usually hard to discern exactly what it is. Then discerning what is common over different studies is even harder. Unfortunately, it’s often largely skipped over by anxious authors who latch on to whatever “summary” measure other authors have gotten away with using. Bergeron claims that’s just what Hattie did.

      Meta-analysis should be first and foremost about discerning what is common over different studies but unfortunately that is not made explicit enough e.g. https://en.wikipedia.org/wiki/Meta-analysis

  10. This has been an eye opening discussion.

    Has anyone looked into these issues with respect to the widely cited 2014 Freeman meta-analysis in PNAS? It’s called “Active learning increases student performance in science, engineering, and mathematics”.

    Every popular article I’ve read or talk I’ve heard on active learning cites Freeman; some have claimed that its conclusions show that traditional lecture should now be considered unethical, or “the pedagogical equivalent of bloodletting”. I’ve not been able to find any substantial criticisms of it. But it seems to contain the same flaws that are being brought up here regarding Visible Learning – in particular the averaging of effect sizes from disparate studies and the dismissing of publication bias having an inflationary effect on published estimates. It also finds that there is *not* heterogeneity of effects between studies of different methodological rigor, and that there is *not* heterogeneity of effects among different types of active learning interventions, which range from using clickers to class worksheets to occasional group discussions to full-on flipped classrooms. That seems strange.

    I personally use some active learning tools in my own classes and intuitively they are appealing. But the degree to which Freeman 2014 is held up as empirical proof of their efficacy bothers me.
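The concern in comment 10 about publication bias inflating published estimates can be sketched with a small simulation. Everything here is an assumption made for illustration: the true effect size, the study size, the rule that only significantly positive results get published, and the function name `simulate_publication_bias`. It does not model the Freeman meta-analysis or any real literature.

```python
import math
import random


def simulate_publication_bias(true_d=0.10, n_per_group=30, n_studies=2000, seed=0):
    """Simulate many small studies of one true effect and a crude publication filter.

    Returns (mean of all estimates, mean of published estimates, number published).
    """
    rng = random.Random(seed)
    # Approximate standard error of d for two equal groups when d is small.
    se = math.sqrt(2 / n_per_group)
    estimates = [rng.gauss(true_d, se) for _ in range(n_studies)]
    # Assumed filter: only significantly positive results (z > 1.96) are published.
    published = [d for d in estimates if d / se > 1.96]
    all_mean = sum(estimates) / len(estimates)
    published_mean = sum(published) / len(published)
    return all_mean, published_mean, len(published)


all_mean, published_mean, n_published = simulate_publication_bias()
print(f"mean of all {2000} estimates : {all_mean:.3f}")
print(f"mean of {n_published} published      : {published_mean:.3f}")
```

Under this filter every published estimate must exceed 1.96 standard errors, so the published average sits far above the assumed true effect regardless of the random seed. A meta-analysis that averages only the published studies inherits that inflation.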

  11. As someone who works in the K-12 sector doing research and analytic work, the first issue I would have – based solely on the discussion here – is the likelihood that anyone would find 1,000 studies in the education literature that were sufficiently comparable for inclusion in a meta-analysis. Slavin basically does nothing but IES-funded meta-analyses that are all located in the What Works Clearinghouse (no need for the compendium if you can use a database like that to identify interventions with substantive evidence behind them). Alex Bowers (Teachers College) did a sensitivity analysis of early warning indicator systems several years ago. Of the several hundred articles they found on that topic, only 40-60 were sufficiently comparable to satisfy any type of requirement that would be analogous to that needed for meta-analysis.

    If meta-analysis is all that is needed, why wouldn’t schools/districts gravitate to the work of Marzano – also meta-analysis based?

    The preservice pipeline for education does a horrible job of training professionals how to read, interpret, and leverage research and this results in authors making supposedly research based claims that are based on horrible research and analysis. I wish more thorough and careful study was implemented as the norm, but our reality is still a long way from there.

  12. Thank you, Andrew, for presenting this. And thank you to all of the thoughtful, provocative comments & replies. Also thanks for the funny parts (I’m looking at you, Daniel Lakeland and Jeff :) .)

  13. Education comes down to ideology, not very different from any other belief system. Once we believe in something, we can always find justification for our belief. Coming from a science background and teaching stats myself, I think Hattie’s work is pseudoscience. Unfortunately, belief in pseudoscience seems to be not uncommon in the education industry.
