This post is based on a video I recorded for the Chartered College of Teaching’s Chartered Teacher (Leadership) Programme, and explores the challenges and possibilities of quality assuring the practice of individual teachers through lesson observation. I have discussed what I mean by quality assurance, and some of its pitfalls, in a previous post.
Throughout my career lesson observation has been used both to evaluate teaching and to try to improve the practice of teachers. For most of that time, the evaluation element was logged formally as some sort of grade or rating. It took me a long time to question this; I assumed that the managers who visited my lessons knew what they were doing and that they were capable of judging the quality of what they saw. I even quite liked the process, valuing it as an opportunity to show off and win another ‘Outstanding’ trophy to add to my collection.
Lesson observation, however, was not a positive experience for everyone I knew. I saw colleagues for whom it was a thing of terror, and this problem was only magnified when other methods of teacher evaluation were added into the mix: learning walks, book looks, student consultation and use of pupil performance data. I even knew teachers who ended up leaving the profession as a result of the professional climate generated by quality assurance. This did not sit well with me, and it was a source of considerable tension in my mind for a number of years after becoming a senior leader.
There is nothing intrinsically wrong with evaluation, of course, and some form of it is inevitable. To make judgements is human. We cannot help appraising what we see from the second we walk into somebody’s classroom. More formal evaluation is also necessary and important in certain circumstances: we have a responsibility to students and when we see practice which lets them down we have a duty to do something about it.
Lesson observation commonly goes way beyond this, however, and it is frequently used to quality assure the practice of all teachers as a formal element of their management, sometimes playing a role in determining whether or not they get a pay increase. I believe that fewer schools nowadays are awarding Ofsted-style grades to lessons (thankfully), but even without them there can still be an evaluative mindset around observation. Ratings, descriptors, criteria, checklists, scores and the like all come under this umbrella, whether openly shared with the teacher or logged behind the scenes. Is this activity worth the time spent on it? In order to try to answer this question I want to probe the assumptions on which lesson observation rests and ask if it is truly fit for purpose.
I would argue that, in order to be worth pursuing, quality assurance through lesson observation (and any other attempt to evaluate teacher effectiveness) must pass two tests:
- Can we do it accurately? It is a form of assessment, so it must offer reliability and validity in order to be fair to those who are assessed and to provide useful information to those who are carrying it out.
- Are the consequences desirable? Ultimately schools exist to educate students, so teacher evaluation must have a beneficial impact on student learning in order to be worthwhile.
Can we evaluate teaching accurately through lesson observation?
Does quality assurance through lesson observation pass the first test? In order for it to do so, we would need to be able to use it to identify effective and ineffective teachers. This concept is not simple in itself, because people have different views about what teaching aims to achieve. For the sake of a straightforward definition, however, by an effective teacher I mean one whose practice leads to students learning what the teacher intends them to learn. This immediately presents us with a problem, since we are unable to read every student’s mind and therefore cannot observe learning directly. Evidence generated by assessment activities (e.g. oral responses to questions, written work in books) can give us an insight, but even with excellent evidence, we can only observe short-term performance, not long-term learning.
Therefore the question becomes whether we can use what we see in the classroom to predict accurately whether or not the teacher’s practice will lead to successful long-term learning. For what it is worth, I suspect we can do this with a pretty good degree of accuracy at the very poor end of the spectrum. This is because barriers to learning, such as a high degree of disruption in the lesson or explanations which are inaccurate, are much more visible than learning itself. As I indicated before, we have a duty to make judgements in these instances and to take steps to try to improve provision.
Thankfully the vast majority of what we observe is not in this very poor territory, so we need to ask whether we can accurately identify effective teaching in better lessons. This is a question which lends itself to research, and while I am far from an expert on the evidence base, it seems to me that the evidence is decidedly mixed. Most studies I am aware of come from the USA, measuring how much progress students made during their time with different teachers (assuming, of course, that the progress measures are valid). One study found a statistically significant correlation between various features of teaching identified through observation and students’ eventual value added scores (Gill et al, 2016). The positive correlation was especially strong for effective classroom management. This would seem to offer support to the practice of lesson observation for teacher quality assurance.
Another study, however, found that observers did less well than chance at identifying teachers whose students would go on to score well in tests (Strong et al, 2011). This provides a salutary reminder of our evaluative limitations: a coin toss might well do a better job of identifying good teaching than a lesson observation.
So which study is our school observation practice more likely to replicate? I do not know for certain, but I am not optimistic. This is partly because other things in schools are likely to reduce the accuracy of our judgements, such as the fact that observers are rarely given much training. The only observation guidance I remember receiving when I first took on a management role was in how to deliver feedback, not in how to make the judgements about which I would be feeding back. We also have to take into account the fact that managers often observe outside their subject specialism and that teachers are not observed very frequently, so the lessons which are seen may well be unrepresentative of their typical practice. We probably all have memories of putting on a special performance for an observation.
Does quality assurance through lesson observation have desirable consequences?
So it is highly doubtful whether we can evaluate teaching accurately through observation, but some might argue that it would still be worthwhile if the consequences were desirable. This brings me to my second test. Unfortunately research fails to offer an overwhelmingly hopeful picture here. One EEF study, for example, suggests that a programme of lesson observation had no positive impact on student outcomes (Worth et al, 2017).
Why might this be the case? In order to answer that, I think it is helpful to think about what happens when an observer forms a judgement about the effectiveness of a teacher. I want to pick up here on an idea expressed by Dylan Wiliam (1998) in a paper about the assessment of students, but I think it applies to evaluation of teachers as well. Wiliam draws on J.L. Austin’s distinction between perlocutionary and illocutionary speech acts. Perlocutionary speech simply describes what has been, is or will be (e.g. a statement about the weather), whereas illocutionary speech makes something so just by saying it. An example of the latter type of speech is when a trial jury pronounces a guilty verdict. That pronouncement makes the defendant guilty in the eyes of the law, regardless of whether or not they actually committed the crime.
What does this have to do with lesson observation? If I observe you and rate your teaching as ‘Good’ (or anything else, for that matter), that has a force a little like a jury’s verdict, regardless of how effective your teaching actually is. I am pronouncing it to be ‘Good’ from a position of authority, and thus bringing the state of ‘Goodness’ into existence. As a result, my judgement has implications for the future. If I observe you again you will almost certainly repeat elements of what you have done, because I liked them previously. This does not seem healthy to me. Even if my judgement is correct, your attention is likely to become focussed on the surface features I comment on, rather than the substance which underpins them. For example, I might mention your excellent questioning, leading you to dutifully serve up questioning on a plate every time I visit your classroom. Questioning becomes a proxy for quality, which potentially distracts you from whatever made your questioning good in the first place.
We see this when we use rubrics or criteria for lesson observation. These criteria are developed with the best of intentions, but what comes next is nicely illustrated by Greg Ashman’s image (above), applied by him to the use of rubrics with students, but equally apt in this scenario. We start with the complex domain of pedagogy and identify key features of the practice of those we see as expert teachers, using them to create criteria to help us observe others. These criteria are proxies, and even if they are good ones, it is hard not to lose sight of the fact that effective teaching consists of much more than proxies. When teachers plan observed lessons they respond to those rubrics, tending to treat them as a checklist: they seek to include evidence that they are meeting each criterion so that the box gets ticked. Before long the whole thing becomes about surface features and the complex domain of expert teaching has been lost almost entirely.
I remember this issue arising several years ago when a set of criteria for lesson observation placed a heavy emphasis on whether or not students made progress in the lesson. These days I would have a number of additional objections to this, but setting them aside for now, it certainly had a distorting effect on the way teachers thought about and planned for progress when they were being observed. What they focussed on was making progress visible, leading to an array of mini-plenaries, colour-coded cards for students to show their understanding, thumbs up, fingers to indicate the degree of confidence and the like. These activities made good observation fodder, but I doubt they revealed very much of substance about student learning.
Let us be charitable for a moment, however, and assume that an observer provides feedback to a teacher which is spot on and identifies the most valuable things they could do to improve. Why might this advice not translate sustainably into better practice on the part of the teacher? I think the answer is that teachers’ habits are deeply embedded and schools are not environments conducive to changing them, as Mike Hobbis has explained very persuasively. If we want teachers to act on feedback, we need to get serious about devoting time and resources to creating a supportive environment in which we maximise the likelihood of this happening. This will require putting deliberate practice at the heart of CPD, building INSET around it and probably conducting significantly more, tightly focussed visits to lessons. Occasional observations, which we encourage teachers to treat as performances, followed by the hectic business as usual of school life, are highly unlikely to have the desired effect.
In addition to these problems, the culture engendered by judgemental lesson observation affects the dynamic in the classroom. It is much less likely that teachers will be open and treat being observed as a learning opportunity. Instead they will try to cherry-pick their best lessons and take few risks. When feedback is given, they will listen only for the verdict (especially, but not exclusively, if it comes in the form of a grade). In this environment teachers will not welcome managers to their classrooms as supportive colleagues, but will be on their guard as soon as somebody enters. We have to face up to the fact that if we choose to use lesson observations to evaluate teachers, those observations have less power to help teachers learn and improve.
My conclusion, therefore, is that while we might possibly be able to identify effective teaching more accurately through observation than by flipping a coin (hardly a resounding endorsement in itself), we cannot do so anything like accurately enough to base high-stakes decisions (e.g. pay increases) on the judgements. I also think that, in most circumstances, the likely costs of doing so outweigh any benefits which might accrue.
Is there a better way?
So what should we do instead? If we need to evaluate, I think it makes more sense to look for something specific, such as the consistency with which a particular aspect of the curriculum is being implemented in the classroom, or fidelity to a new school policy, rather than to attempt to form holistic judgements on lessons or a teacher’s practice. This can become a useful information-gathering activity, in line with the good bets I outlined in my post on quality assurance of the curriculum. When we identify evidence that things are not going as we hoped, this should raise questions about why that might be the case, rather than automatically reflecting poorly on the teacher in question.
When it comes to a focus on teachers’ individual practice, I much prefer lesson observation to be formative, with the purpose being to develop teachers’ capabilities rather than to rate them. Informal evaluation will inevitably still take place, of course, because it is part of our nature to judge, but it will be confined to the background and not recorded unless there is particular cause for concern.
It is important to state that this is not an easy option. Conducting formative observation effectively is a huge challenge, demanding a great deal of expertise on the part of observers. It will also require a significant investment of time and resource, as I indicate above, in order to create an environment in which there is a good chance of habits being changed. Leaders need to look to themselves and step up on this front. There is an argument for creating a specific role for observers, with tailored training provided, without it necessarily coming as part of the job when somebody takes a step up the career ladder in a school.
Ultimately the aim is to create a coaching culture, in which visits to the classroom are welcomed as a way of getting feedback and developing professionally, rather than feared as a means of judgement. I am far from being an expert on coaching, but I have been impressed by accounts written by those who have embedded practice like this across their schools, such as Jon Hutchinson (2020). In my school we have certainly taken significant steps, under the leadership of my colleague Sarah Hosegood, to make lesson observation a formative process. She and I wrote a short article about it (2020), although Sarah has done more work since then to build on this and take things further in a coaching direction.
Does that mean we should ditch accountability altogether and have a free-for-all? Of course not. Accountability is important and the formative nature of lesson observation should enshrine the highest of expectations for teachers: that they can improve their practice and should seek to do so throughout their careers.
What should this form of accountability look like when we observe teachers? My preferred way is to focus on individual goals which the teachers themselves have played the lead role in creating. As professionals, teachers should be able to identify areas of their own practice which they wish to develop. They should be able to set the agenda with their observers by directing the focus of the observation in advance, asking for feedback on how successfully they are achieving these objectives and how they could do so more effectively. This is a very significant change of dynamic in the observation process, putting the teacher in a much stronger position, and I have seen its power in my school.
Schools can build their appraisal systems and procedures around this formative approach, and many have done so, including my own. The work of Chris Moyse in this area has been especially influential. I strongly believe in the value of lesson observation, but in order to harness its formative potential we need to stop trying to have our cake and eat it, ridding ourselves of the addiction to measuring, labelling and pigeon-holing teachers.
Rather than thinking in terms of quality assurance of teaching, it might be more helpful to consider our efforts as quality nurture. We can aim to promote a virtuous circle of improvement in which the culture of observation revolves around developing our practice, both individually and collectively, and the actions of leaders are not seen as an unwelcome imposition by teachers, but as supportive contributions to a mutual endeavour.
At least, that’s the dream. Pursuing it means that teachers will have to cope without the prospect of an ‘Outstanding’ badge, but I think they’ll get something more valuable in return: a professional environment in which they can keep getting better.
In the talk on which this post is based I also discussed evaluation of teachers through the use of pupil performance data, but since I blogged about this a couple of years ago, covering similar ground, I have omitted this section here.
- Gill, B., Shoji, M., Coen, T. and Place, K. (2016) ‘The content, predictive power, and potential bias in five widely used teacher observation instruments’ (REL 2017–191). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Mid-Atlantic. Retrieved from: http://ies.ed.gov/ncee/edlabs
- Hosegood, S. and Mountstevens, J. (2020) ‘Challenging the Status Quo: A Coherent Approach to Teacher Development’ in Impact 9 (Chartered College of Teaching: London). Retrieved from: https://impact.chartered.college/article/challenging-the-status-quo-coherent-teacher-development/
- Hutchinson, J. (2020) ‘Professional development through instructional coaching’ in Lock, S. (ed.) The ResearchED Guide to Leadership (John Catt: Woodbridge)
- Strong, M.A., Gargani, J. and Hacifazlioglu, O. (2011) ‘Do We Know a Successful Teacher When We See One? Experiments in the Identification of Effective Teachers’ in Journal of Teacher Education 62(4): 367-382. Retrieved from: https://www.researchgate.net/publication/258160263_Do_We_Know_a_Successful_Teacher_When_We_See_One_Experiments_in_the_Identification_of_Effective_Teachers
- Wiliam, D. (1998) ‘The Validity of Teachers’ Assessments’. Retrieved from: https://www.dylanwiliam.org/Dylan_Wiliams_website/Papers.html
- Worth, J., Sizmur, J., Walker, M., Bradshaw, S. and Styles, B. (2017) ‘Teacher Observation: Evaluation Report and Executive Summary’ (Education Endowment Foundation). Retrieved from: https://educationendowmentfoundation.org.uk/projects-and-evaluation/projects/teacher-observation/