6 Questions Hattie Didn’t Ask But Could Have

by Terry Heick

If you’re familiar with John Hattie’s meta-analyses and they haven’t given you fits, they may be worth a closer look.

If you missed it, in 2009 John Hattie released the results of a massive amount of work (work he updated in 2011). After poring over thousands of studies, Hattie sought to separate the wheat from the chaff–what works and what doesn’t–similar to Marzano’s work, but on a much (computationally) larger scale.

Hattie (numerically) figures that .4 is an “average” effect–a hinge point that marks performance: anything higher is “not bad,” and anything lower is “not good.” More precisely, Grant Wiggins aggregated all of the strategies that resulted in a .7 or better–what is considered “effective.” The top 10?

  1. Student self-assessment/self-grading
  2. Response to intervention
  3. Teacher credibility
  4. Providing formative assessments
  5. Classroom discussion
  6. Teacher clarity
  7. Feedback
  8. Reciprocal teaching
  9. Teacher-student relationships fostered
  10. Spaced vs. mass practice

So what’s below these top 10? Questioning, student motivation, quality of teaching, class size, homework, problem-based learning, mentoring, and dozens of other practices educators cherish. Guess what ranks below direct instruction and the esoteric “study skills”? Socioeconomic status. Of course, it’s not that simple.

While I leave it to Hattie and those left-brain folks way smarter than I am to make sense of the numbers, I continue to wonder how the effect of one strategy–problem-based learning, for example–can be measured independently of other factors (assessment design, teacher feedback, family structure, and so on). It can also be difficult to untangle one strategy (inquiry-based learning) from another (inductive teaching).

Hattie’s research is stunning from a research perspective, and noble from an educational one, but there are too many vague–or downright baffling–ideas for it to be used as so many schools and districts will be tempted to use it. Teacher Content Knowledge has an effect size of .09–worse, supposedly, than doing nothing at all? Really? So how does it make sense to respond?

As always, start with some questions–and you may be left with one troubling implication.

What Should You Be Asking?

Recently, we shared a list of these effect sizes, shown in ascending order. Included in Grant’s original post is a well thought-out critique of Hattie’s work (which you can read here), where the author first questions Hattie’s mathematical practice of averaging, and then brings up other issues, including comparing apples and oranges (which another educator does here). Both are much more in-depth criticisms than I have any intention of offering here.
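To make the averaging concern concrete, here is a toy sketch (the numbers are invented for illustration, not drawn from Hattie or his critics): averaging effect sizes from very different contexts can wash out both results.

```python
# Two hypothetical studies of the same strategy in different contexts:
# a strong positive effect in one, a negative effect in the other.
study_effects = [0.8, -0.6]

# The simple average sits near zero, "hiding" both findings.
average = sum(study_effects) / len(study_effects)
print(round(average, 2))  # 0.1
```

A reader of the averaged figure alone would conclude the strategy barely matters, when the underlying studies say something far more interesting about context.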

There are multiple languages going on in Hattie’s work–statistical, pedagogical, educational, and otherwise. The point of this post is to ask some questions out loud about what the takeaways should be for an “average teacher.” How should teachers respond? What kinds of questions should they be asking to make sense of it all?

1. What’s the goal of education? 

Beyond any “fringe benefits” we “hope for,” what exactly are we doing here? That, to me, is the problem with so many new ideas, trends, educational technologies, research efforts, and more–what’s the goal of education? We can’t claim to be making or lacking progress until we know what we’re progressing toward.

The standards-based, outcomes-based, data-driven model of education has given us bravely narrow goals for student performance in a very careful-what-you-wish-for fashion.

2. How were the effect sizes measured exactly?

How are we measuring performance here so that we can establish “effect”? Tests? If so, is that ideal? We need to be clear here. If we’re saying this and this and this “work,” we should be on the same page about what that means. And what if a strategy improves test scores but stifles creativity and ambition? Is that still a “win”?
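For reference, effect sizes of this kind are typically standardized mean differences in the spirit of Cohen’s d: the difference between a treatment group’s mean score and a control group’s, divided by a pooled standard deviation. A minimal sketch with made-up test scores (not Hattie’s data or his exact method):

```python
import statistics

def cohens_d(treatment, control):
    """Cohen's d: standardized difference between two group means."""
    mean_diff = statistics.mean(treatment) - statistics.mean(control)
    n1, n2 = len(treatment), len(control)
    var1 = statistics.variance(treatment)  # sample variance (n-1 divisor)
    var2 = statistics.variance(control)
    # Pooled standard deviation across both groups
    pooled_sd = (((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)) ** 0.5
    return mean_diff / pooled_sd

# Hypothetical test scores: a class taught with a strategy vs. without
with_strategy = [78, 85, 82, 90, 74, 88]
without_strategy = [72, 80, 75, 83, 70, 79]
print(round(cohens_d(with_strategy, without_strategy), 2))  # 1.14
```

Notice what the number depends on entirely: the test. If the test measures something narrow, the “effect” is narrow too, which is exactly the worry raised above.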

3. What do the terms mean exactly?

Some of the language is either vague or difficult to understand. I am unsure what “Piagetian programs” are (though I can imagine), or what “Quality Teaching” (.44 ES) means. “Drugs”? “Open vs Traditional”? This is not a small problem.

4. How were those strategies locally applied?

Also, while the “meta” function of the analysis is what makes it powerful, it also makes me wonder–how can Individualized Instruction demonstrate only a .22 ES? There must be “degrees” of individualization, so that saying “Individualized Instruction” is like saying “pizza”: what kind? With 1,185 listed effects, the sample size seems large enough that you’d think an honest picture of what Individualized Instruction looked like would emerge, but it just doesn’t.

5. How should we use these results?

Problems aside, this much data has to be useful. Right? Maybe. But it might be that so much effort is required to localize and recalibrate it to a specific context that it’s just not–especially when it keeps schools and districts from becoming “researchers” on their own terms, leaning instead on Hattie’s list. Imagine “PDs” where this book has been tossed down in the middle of every table in the library and teachers are told to “come up with lessons” that use the strategies in the “top 10.” Then, on walk-throughs for the next month, teachers are constantly asked about “reciprocal teaching” (.74 ES, after all), while project-based and inquiry-based learning with diverse assessment forms and constant metacognitive support is met with silence (as said administrator flips through Hattie’s book to “check the effect size” of those strategies).

If you consider the analogy of a restaurant, Hattie’s book is like a big book of cooking practices that have been shown to be effective within certain contexts: Use of Microwave (.11 ES), Chef’s Academic Training (.23 ES), Use of Fresh Ingredients (.98 ES). The problem is, without the macro-picture of instructional design, they are simply context-less, singular items. If teachers use them as a starting point to consider while planning instruction, that’s great, but that’s not how I’ve typically seen them used. Instead, they often become items to check off, along with learning target, essential question, and evidence of data use.

Which brings me to the most troubling question of all…

6. Why does innovation seem unnecessary?

Scroll back up and look at the top 10. Nothing “innovative” at all. A clear, credible teacher who uses formative assessment to intervene and give learning feedback should be off the charts. But off the charts how? Really good at mastering standards? If we take these results at face value, innovation in education is unnecessary. Nothing blended, mobile, connected, self-directed, or user-generated about it. Just good old-fashioned solid pedagogy. Clear, attentive teaching that responds to data and provides feedback. That’s it.

Unless the research is miles off and offers flat-out incorrect data, that’s the path to proficiency in an outcomes-based learning environment. The only way we need innovation, then, is if we want something different.

Image attribution: flickr user usarmycorpofengineerssavannahdistrict; previously published at TeachThought.com


  1. George Lilley

    Thank you for this analysis. Another major issue is Hattie’s misrepresentation of meta-analyses. For example, the number 1 influence – Student self-assessment/self-grading. The meta-analyses Hattie used do not claim to measure this. They are mostly measuring students’ reporting their High School GPA to a College entry interview a year or so later. The high effect size indicates that students are reporting a higher GPA than what they achieved! Details here – http://visablelearning.blogspot.com.au/p/self-report-grades.html

    Misrepresentation appears to be a common problem in Hattie’s 2009 book. Another striking example is the research used for Teacher Training, giving a low effect size of 0.11 (note many of the same studies are used for Teacher Content Knowledge). On closer analysis, NONE of the studies looked at teacher training; rather, they looked at a particular sort of US teacher certification. This certification is done many years after teachers have gained their university degree and many years after they have started teaching, so the low effect size indicates only that certification does not improve teaching. It is similar to what we do here in Australia, where we have to create a dossier of lesson plans and present evidence of what we do in the classroom in order to get a promotion. Details here – http://visablelearning.blogspot.com.au/p/teacher-training.html

  2. Dr. Angela Peery

    I do believe the lack of explanation of what each thing studied actually is/means is a huge problem. For example, “microteaching” has been held up by many as a wonderful thing to do, but no one understands the microteaching that was cited in the meta-analysis. From what I’ve been able to find (and I’m an educational author/researcher by trade), microteaching is a specific method used in teacher training in New Zealand. It involves videotaping a teacher teaching a very small segment of a lesson and then that teacher debriefing it with his/her supervisor. I have NEVER seen this method used in the US as part of teacher training or ongoing professional development. What I have seen is teachers recording whole lessons and using them to reflect personally or to discuss with colleagues, coaches, or supervisors. Another example: teacher clarity. What does this mean exactly, and how many ways can it be enacted? How is it different from what Madeline Hunter laid out years ago in lesson design? Or is it? Who knows? What I see over and over again in the many schools in which I consult is that a “sound bite” from Hattie is latched onto–like “We’re working on teacher clarity”–and confusion ensues as the application of that term and idea varies from room to room.

  3. George Lilley

    Yes, I totally agree with you, Dr. Peery.

    The different definitions of each influence matter. Hattie just jumbles together anything that is remotely related. There are so many examples.

    In the general category of feedback, he jumbles together studies that use music to calm down students with behavioural issues (I guess that’s feedback in the broadest sense), feedback given to teachers, and feedback given to university students. The notion of feedback is not defined in any sense – was it verbal, written, other? Who was giving it, and who was it for?

    Other clear examples are inquiry learning, problem-based learning and welfare.

    More details on feedback here – https://visablelearning.blogspot.com/p/feedback.html

