Originally published on the Parents and Teachers for Excellence blog here.
Two trends have dominated how British exams have changed over recent decades: they have become more high-stakes, and they have become more skills-based. The two have combined to create a perfect storm that slows down learning and makes school less joyful. School leaders are under pressure to achieve good exam results, and so orient their schools around exam performance. They measure pupils in all year groups against the assessment objectives from exams, and expect teachers to teach to these objectives too. Every piece of work is a mini-GCSE exam.
This would make sense if the assessment objectives could be taught directly, but they can’t because they’re based on generic skills. Skills can only be acquired indirectly: by learning the component parts that build up to make the whole, such as pieces of contextual knowledge, rules of grammar, or fluency in procedures. These components look very different to the skill being sought – just as doing drills in football practice looks very different to playing a football match, and playing scales on a violin looks very different to giving a recital. Yet in these analogies exam objectives would be something like “play with flair”, “keep possession” or “hit notes accurately”, and the instruction given to teachers is to directly teach these skills. Not to spend time on passing drills and scales, but to spend time on “having more flair”.
Most teachers see that the emperor isn’t wearing any clothes. Consider the plight of a typical English teacher. They’re told that their pupils aren’t good enough at understanding the author’s purpose, so as a result they need to teach more lessons on understanding the author’s purpose. They’re given lesson plans that tell their pupils to identify words which illustrate the author’s purpose, and to write paragraphs explaining why they do so. Maybe they include a handy mnemonic for remembering the model “author’s purpose” paragraph. But it doesn’t work. And it doesn’t work because you cannot teach generic skills directly.
To become better at understanding the author’s purpose you need to know more words, so you can understand the fullness of what the author has written, and you need to know more contexts, so you can understand the significance of those words to the author’s life and times. If you know that the Gunpowder Plot happened in 1605 and that Macbeth was first performed to an audience in 1606, then The Scottish Play becomes a warning against regicide. If you know that “to twist” was Victorian slang for “to hang”, then Oliver Twist becomes a tale about a boy destined for the gallows. If you know that Dickens first came up with the plot when appalled by the experience of attending a young criminal’s public hanging, then it becomes a campaign for social justice. You cannot infer this from practising to understand the author’s purpose. You can only infer it if you have the knowledge.
Becoming a better reader requires investing time in learning a wider vocabulary and building deeper contextual knowledge, but it would be a brave teacher who puts this maxim wholly into action in today’s schools. With the pressure of high-stakes exams there is no room to teach anything except the assessment objectives being examined, and the assessment objectives only measure generic skills. Instead of exciting lessons where pupils learn knowledge that opens up new worlds of history and literature, their teachers are pressured to push them through yet more rounds of dry and soulless skills practice. Pupils and teachers suffer with frustration as they try to become better at inference by doing lots of failed inferring. They rarely have the chance to learn the knowledge they’d need to imagine what was in the author’s head. Both pupils and teachers leave school unhappy as a result.
The same problem occurs in mathematics. Pupils fail exam questions involving problem-solving, so their teachers are told to teach them problem-solving. They’re expected to make their classes discover Pythagoras’ Theorem at the start of the lesson, as if the great breakthrough of a pioneering mathematician could be reliably and spontaneously reproduced by every fourteen-year-old on a given Thursday afternoon. Having to do this gives them less time to teach Pythagoras’ Theorem, and so jeopardises their pupils’ chance of successfully solving a problem about it in the future. Once again the pressure to teach generic, skill-based exam objectives directly undermines teachers’ attempts to make their pupils better at their subjects – and better in exams as a result.
We now need to realise what high-stakes, skills-based exams have done to our schools and how to recover from it. This will involve moving away from trying to teach skills directly, and from focusing on measuring them at every juncture. Instead we should plan the knowledge (e.g. vocabulary, historical context) and specific micro-skills (e.g. recognising whether the result of an addition will be negative, or re-writing a sentence to be active not passive) that our pupils need to learn in order to perform at a high level in their exams. We can still target strong exam performance, but we should do so without expecting every lesson to resemble a mini-exam task. Doing so will mean creating schools where pupils learn more tangible things they can go home proud of, and where teachers teach more of the exciting content that brought them into teaching in the first place.
In many ways, this will be a Parliament of consolidation at the Department for Education. The policies of the last five years are coming into force, and Nicky Morgan will need to put her political energy into seeing them through. But there is one area that does need reforming, and it needs it now. It is possibly the biggest opportunity to improve education in this Parliament, and one that would last well beyond 2020. It doesn’t sound glamorous or exciting, and won’t make the headlines. But its potential should not be underestimated. Nicky Morgan should use this Parliament to set a curriculum for teacher training.
Teacher workload is already extremely high, as Morgan has publicly recognised. This means that government can’t improve outcomes in a way that puts pressure on schools – there are no more gains to be made from making teachers work harder. Instead, government has to look for ways to help teachers be more effective; and it should start by making sure every new teacher gets the training they deserve.
When I did my teacher training we spent laughably little time learning about learning. We discussed what made a good lesson (in the lecturer’s opinion…) but rarely why those components were good. We were often given quasi-moral justifications, like the assertions that “it is better to discover things for yourself” or “children learn better when they work in groups”, but I cannot recall a single time I heard something explained in terms of how a child’s brain would be responding.
Imagine I told you there was a way to make our children perform 10% better in their exams after just four weeks of study. It involves changing a school’s timetable and teaching style, but still leaving plenty of room for leadership opportunities and extra-curricular activities. You’d expect to hear a clamour insisting that we roll this out in all schools immediately. Instead, Chinese School has earned itself a long list of critics. They don’t like Chinese education because of its values. Or more precisely, because it values knowledge.
They argue that we should not be seeking to learn from Chinese teaching, despite its superior results. They concede that doing so would make our children learn more, but that this would come at too high a cost. Any improvement in our teaching of knowledge, they argue, would stop pupils being creative thinkers or challengers of the status quo. Yes, Chinese teaching may improve the learning of rules and information, but it does nothing to teach originality.
They seriously appear to be arguing that in a system in which 35% of 16-year-olds failed English GCSE this year our problem is learning too much vocabulary, knowing the laws of grammar too well, and sticking too rigidly to the traditions of the literary canon. Otherwise why complain that Chinese teaching is good at helping pupils learn information?
Character is the new fad in education. We all want to develop good character in our children, but the policy that achieves this has proven elusive. Proponents of every conceivable activity have queued up to explain how their pet project develops character (and so should get to dip their hands in the pot of government gold). But while many of these are perfectly good things, building our children’s character requires much more fundamental change.
So instead of looking for new projects to fund, let us ask a different question: why is there a deficit that needs to be made up in the first place?
The deficit exists because the core activity of schools – lessons – can become too easy and too self-consciously fun to need any character at all. Take resilience as an example. A child learns resilience by practising. They try tasks that are difficult, fail at them, and keep trying again. Eventually they learn that you do not need to give up when you face difficulty but can be successful if you invest enough effort.
Teaching is a tough job, with a tough workload. It isn’t easy, and it isn’t going to become easy either. But it can and should be manageable. Sadly in too many schools workload can become excessive, and can do so without improving teaching. But there is one policy that would reduce teacher workload and improve lessons in English schools:
Abolish the Quality of Teaching judgment in Ofsted inspections.
We should do this because:
It incentivises bad leadership
It is easier to tick QoT tick boxes than it is to actually improve results. It is easier to produce an evidence trail than it is to produce an impact. And it is easier to force teachers to work ever harder than it is to make their effort more productive. School leaders who face a grilling from external inspectors, be they Ofsted or otherwise, will find it much easier to create an illusion of performance and score well on QoT than to create actual performance and score well on Achievement.
So why not insist that all staff plan all lessons in detail and in writing on a school proforma? It might not improve learning, but it’s a good way to demonstrate QoT. Why not make all staff mark all books every night in four different colours of pen? It might not improve feedback, but it’s a good way to demonstrate QoT. And before you know it, terrified leaders in many schools are imposing dreadful policies on their staff; because it’s easier to put on a show for an inspector than it is to improve results.
It makes teachers teach worse lessons
QoT judges teaching by how it looks, not by what it achieves. This makes teachers ensure teaching looks better, even if that doesn’t make it achieve more. So teachers spend their time on appearances. They buy books on 100 ways to make their lesson look outstanding, and trade Chinese whispers about what Ofsted want to see. The time they would spend increasing the impact of their teaching they spend implementing new fads; not because they work, but because they look good.
It harms the quality of teaching – and causes excessive workload
The QoT grade doesn’t improve impact, just appearances. But even if some of this pressure does rub off on impact, that impact doesn’t come for free. There is a huge opportunity cost to everything that a teacher does. If they’re spending their time on appearances then they’re not spending their time improving learning. And that matters. It matters because teachers can’t work infinite hours, and so something has to give. When that something is improving the actual quality of teaching, not the illusory QoT, it’s children who lose out.
So instead we should lose the QoT grade. Judge schools on the impact they have, not on how they look. Then we can lose the smoke and mirrors policies that look good but make real improvement harder. Leaders and teachers will have one aim – to improve the impact they have on children. And there will be no perverse incentive to distract them from it.
But it’s not because I couldn’t write one. I have quite strong views on private schools, and I even think they’re well-reasoned. But I’m not going to write about them.
Why?
Because this debate doesn’t matter. Will forcing reluctant private schools to share their theatres change education? Would an extra £150 million for the DfE budget really improve the system? No.
One day we might reach the point where the next most important change is reform of private schools, in which case we can have this debate. But until then we should focus on bigger things. Like:
Over a third of schools either Require Improvement or are Inadequate
Over a third of 16-year-olds don’t get at least 5 A*-C with English and Maths
Our alternative provision fails almost everyone who needs it
Teacher Training fails to teach you about how people learn
CPD tells teachers piles of rubbish pseudo-science
Teachers spend much of their time doing things without an evidence-base to please various interest groups
Sharing theatres might be nice, but it won’t solve these.
After a Twitter debate yesterday on whether removing National Curriculum levels was a good thing (hint: it was), I realised that the main point of contention lay in what we actually thought the role of government in education is. Rather than continue skirting around the issue, I’m going to lay out what I think.
The education system is going through a period of flux. Where the empire of government once ruled, now increasingly autonomous schools pick up the power to rule themselves. To adjust to a new system we need to understand the roles of all the players. Our new system might be school-led, but what does that actually tell us about the rightful role of government and teachers?
The role of government: to decide on the ends of education
What is education for? You can’t run an education system without an answer to this question, and yet it’s probably the most contentious question out there. The problem is that there isn’t a demonstrably correct answer. This question cannot be delegated to a double-blind randomised controlled trial or conclusively resolved by a panel of experts. It is far too fundamental for that.
Given we need an answer, but cannot seek one from science or consensus, the best place to turn is to democracy. A democratically-elected government should decide whether and what to examine, what schools should be aiming to achieve through education, and how to hold them to account for achieving it. Practically this means that they should set terminal exams, accountability frameworks, and little else.
The role of schools: to decide how best to achieve these ends
Once a government has set the ends of education, it is up to schools to decide how best to achieve them. This must be left to schools, and any mission creep by government or its agencies into this area should be hastily challenged. This role is for schools (rather than for government or teachers) because:
Schools see the whole education of each student, and have to balance the competing demands of different subjects.
Schools know the specific circumstances of their students and their intake, and are best-placed to respond to these.
Once the end goals are set, schools should determine the curriculum needed to achieve these, and the assessment system needed to keep on track. Where schools have their own set of values they are free to go beyond the expectations set by government, but may not drop below.
The role of teachers: to continually improve the quality of teaching
Schools will set curricula and assessment, but it is teachers who translate these into lessons. To quote the now old adage, the quality of an education system cannot exceed the quality of its teachers. But improving the quality of teaching is not a job for schools or for government, it is a job for teachers. Only we teach our lessons. Only we are in the classrooms with ourselves all day. Only we know our greatest strengths and our greatest challenges. Only we can actually make our teaching great. But this is not the current culture.
The current culture, developed through decades of National Strategies, government guidance, and school policies, sees the quality of teaching as the preserve of schools and governments. Teachers are not fit to make decisions about teaching, but are mere enactors of policy and followers of instructions.
We must break the culture of professional development being done to us by schools, of research being given to us by government, and of good ideas being handed down in quango-branded folders. Good teaching is our responsibility, and we should reclaim it.
Last year I attended the Evening Standard’s debate on the future of London Schools. I can’t remember much about the evening, but one phrase sticks in my mind. “The romance of the poor but bright”*.
Education is a marvellous sector. It is full of innovation and entrepreneurship: with schools, charities and social enterprises popping up all over. New solutions to old problems emerge almost like clockwork; but there is a worrying pattern. Our effort and resources, of schools but particularly of business and charitable enterprise, are directed disproportionately at students who are already high achieving – the poor but bright.
Huge effort is expended on access to the top universities, with great sums being spent to make marginal improvements to a small set of students at the top of the disadvantaged spectrum. They cite the gap in entry, often to Oxbridge, as a significant problem that blights our society.
But the gap in Oxbridge entry is the pretty face of the problem. The far uglier face is the gap in life outcomes for those who take least well to education. With rich parents they may go to a non-Russell Group university. With poor parents they go to prison or the job centre. It is this face of the problem we most urgently need to confront.
Popular discourse is easily caught up in the romance of the poor but bright. It’s such a great story – the brilliant child shackled by poverty, just waiting to be set free by a summer school or inspirational speech. This story has so captivated education that we end up ignoring the more pressing problem – of students for whom our efforts will determine whether or not they will ever have a job or contribute to society. We hear about the poor but bright all the time. When did you last hear someone advocate for the poor but dim?**
Here is my attempt. The gap most damaging to society is in life outcomes for the children who perform least well at school. They are most at risk of not being able to engage with society – of not having the real or paper qualifications needed to enter meaningful employment. There would be a phenomenally positive impact on business, public services and communities if all those children who fall through school now were instead supported to exceed basic standards and find fulfilling futures.
For education to have a real impact on society and the economy we need to focus on the tail***, not just the top. I’d begin this by looking at three areas:
1. Alternative provision
Many of the students at risk of significantly underachieving at school have complex behavioural needs. Often these have been built up by a long period of underperformance, coupled with very challenging environments. Teachers lack the time and expertise to best support these students. They do their very best, and often make great headway, but they are not trained or equipped for dealing with complex psychological needs. This is why we have alternative provision. Unfortunately this sector operates as a shadow school system, largely unknown and wholly underappreciated. Developing a national network of high-quality alternative provision that works closely with schools to support students at risk of exclusion must be a priority if we are to close the gap at the bottom.
2. Consistency in SEN support
Many of the students at risk of significantly underachieving at school also have special educational needs. Once again schools are often ill-equipped to cope with these, and often manage only because of the extraordinary effort of dedicated staff. The inconsistency in funding and support between local authorities is well known, and means that a student in a less generous (or more stretched) council area will receive far less support than they deserve.
3. Rigorous gateways in assessment
Too often underachievement is allowed to settle and persist because it can be dealt with later. This is incredibly dangerous – as knowledge accumulates in a compound way, falling behind early makes for an ever bigger gap. One way to help stop this attitude of putting off catching up is to have clearer assessments where basic skills act as a gateway. The present assessment regime, for example, allows students to achieve a Level 4 at KS2 by compensating for poor performance in the basics with higher performance on easier, less fundamental skills. Reforming assessment so that a student could not appear to be performing well unless they have mastered the basics would send a clearer message where gaps exist. Proficient use of each of the four operations, for example, could be a gateway for maths assessment, and clarify the importance of solidifying these foundational skills.
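The difference between a compensatory score and a gateway can be made concrete with a small sketch. This is only an illustration under stated assumptions: the topic names, the mastery threshold of 80, the pass mark of 60 and both function names are hypothetical, and are not taken from any real KS2 mark scheme.

```python
# Hypothetical gateway skills: the four operations mentioned above.
GATEWAY_SKILLS = ("addition", "subtraction", "multiplication", "division")
MASTERY = 80    # assumed score (out of 100) that counts as mastering a basic skill
PASS_MARK = 60  # assumed overall pass mark


def compensatory_pass(scores: dict[str, int]) -> bool:
    """Pass on the average alone, so strong topics can mask weak basics."""
    return sum(scores.values()) / len(scores) >= PASS_MARK


def gateway_pass(scores: dict[str, int]) -> bool:
    """Pass only if every gateway skill is mastered AND the average is high enough."""
    basics_secure = all(scores.get(skill, 0) >= MASTERY for skill in GATEWAY_SKILLS)
    return basics_secure and compensatory_pass(scores)


# An invented pupil: weak on multiplication and division, strong elsewhere.
pupil = {
    "addition": 90,
    "subtraction": 85,
    "multiplication": 40,
    "division": 35,
    "shape": 95,
    "data handling": 95,
}

print(compensatory_pass(pupil))  # the average hides the gaps
print(gateway_pass(pupil))       # the gateway exposes them
```

Under the compensatory rule this invented pupil appears to be performing well; under the gateway rule the gaps in multiplication and division stay visible until they are fixed.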
Any student failing to meet their potential is a dreadful thing, even worse when it happens due to factors totally outside of their control. This is not just the case for the poor but bright, the students with whom we so easily sympathise, and are so quick to support. It matters too for the student with incredibly challenging behaviour, but who is absolutely capable of achieving academically. It matters for the student with complex special needs, who is not a potential Oxbridge applicant but who does have a tremendous amount to offer society. Their successes have the power to change the British economy, far more so than those of their brighter peers.
Don’t just get caught up in the romance of the poor but bright. The other students need our investment too.
*I believe it was used by Lucy Heller.
**I do not believe in either bright or dim, only differences in epigenetic coding or accumulated lifetime practice, but that is a discussion for another day. Here I use dim as the logically necessary opposite to bright, as popularly used in discussions about education.
***This term is borrowed from the book Paul Marshall edited on outcomes for the bottom 20% of students.
I tried to discuss this on Twitter with Sam Freedman, but as his blog title points out, sometimes 140 characters isn’t enough…
The NAHT recently released the findings of their commission on assessment. They have attempted to set out a general framework for assessing without levels, including 21 recommendations, their principles of assessment, and a design checklist for system-building. All in all the report is a good one, capturing some of the most important principles for an effective system of assessment. However there are some significant problems to be fixed. Firstly, the report relies on ‘objective criteria’ to drive assessment, without recognising that criteria cannot be objective without assessments bringing them to life. Secondly, the report places a heavy emphasis on the need for consistency without recognising the need for schools to retain the autonomy to innovate in both curriculum and assessment. Thirdly, the report advocates assessment that forces students into one of only three boxes (developing, meeting or exceeding), instead of allowing for a more accurate spectrum of possible states. Here are my comments on some of the more interesting aspects of the report.
Summary of recommendations
4. Pupils should be assessed against objective and agreed criteria rather than ranked against each other. This seems eminently sensible – learning is not a zero sum game. The potential problem with this, however, is that ‘objective criteria’ are very rarely objective. In “Driven by Data”, Paul Bambrick-Santoyo makes a compelling case that criteria alone are not enough, as they are always too ambiguous on the level of rigour demanded. Instead, criteria must be accompanied by sample assessment questions that demonstrate the required level of rigour. So whilst I agree with the NAHT’s sentiment here, I’d argue that a criteria-based system cannot be objective without clear examples of assessment to set the level of rigour.
5. Pupil progress and achievement should be communicated in terms of descriptive profiles rather than condensed to numerical summaries (although schools may wish to use numerical data for internal purposes). Dylan Wiliam poses three key questions that are at the heart of formative assessment.
Where am I?
Where am I going?
How am I going to get there?
A school assessment system should answer these three questions, and a system that communicates only aggregated numbers does not. Good assessment should collect data at a granular level so that it serves teaching and learning. Aggregating this data into summary statistics is an important, but secondary, purpose.
7. Schools should work in collaboration, for example in clusters, to ensure a consistent approach to assessment. Furthermore, excellent practice in assessment should be identified and publicised, with the Department for Education responsible for ensuring that this is undertaken. The balance between consistency and autonomy will be the biggest challenge of the post-levels assessment landscape. Consistency allows parents and students to compare between schools, and will be particularly important for students who change schools during a key stage. Autonomy allows schools the freedom to innovate and design continually better systems of assessment from which we all can learn. I worry about calls for consistency, that they will degenerate into calls for homogeneity and a lowest common denominator system of assessment.
18. The use by schools of suitably modified National Curriculum levels as an interim measure in 2014 should be supported by government. However, schools need to be clear that any use of levels in relation to the new curriculum can only be a temporary arrangement to enable them to develop, implement and embed a robust new framework for assessment. Schools need to be conscious that new curriculum is not in alignment with the old National Curriculum levels. Can we please stick the last sentence of this to billboards outside every school? I really don’t think this message has actually hit home yet. Students in Year 7 and 8 are still being given levels that judge their performance on a completely irrelevant scale. This needs to stop, soon. I worry about this recommendation, which seems sensible at first, leading to schools just leaving levels in place for as long as possible. Who’s going to explain to parents that Level 5 now means Level 4 and a bit (we think, but we haven’t quite worked it out yet so just bear with us)?
Design Checklist
Assessment criteria are derived from the school curriculum, which is composed of the National Curriculum and our own local design. As above, it’s not a one way relationship from curriculum to assessment – the curriculum means little without assessment shedding light on what criteria and objectives actually mean. The difference between different schools’ curricula is another reason that the desired consistency becomes harder to achieve.
Each pupil is assessed as either ‘developing’, ‘meeting’ or ‘exceeding’ each relevant criterion contained in our expectations for that year. This is my biggest problem with the report’s recommendations. Why constrain assessment to offering only three possible ‘states’ in which a student can be? In homage to this limiting scale, I have three big objections:
Exceeding doesn’t make sense: The more I think about ‘exceeding’, the less sense it makes. If you’ve exceeded a criterion, haven’t you just met the next one? Surely it makes more sense to simply record that you have met an additional criterion than try to capture that information ambiguously by stating that you have ‘exceeded’ something lesser. For the student who is exceeding expectations, recording it in this way serves little formative purpose. The assessment system records that they’ve exceeded some things, but not how. It doesn’t tell them which ‘excess’ criteria they have met, or how to exceed even further. If it does do this because it records additional criteria as being met, what was the point of the exceeding grade in the first place?
I’m also struggling to see how you measure that a criterion has been exceeded. To do this you’d need questions on your assessment that measure more than the criterion being assessed. Each assessment would also have to measure something else, something in excess of the current criterion. The implication of all this is that when you’re recording a mark for one criterion, you’re also implicitly recording a mark for the next. Why do this? Why not just record two marks separately?
The NAHT report suggests using a traffic light monitoring system. Presumably green is for exceeding, and amber is for meeting. Why is meeting only amber? That just means expectations were not high enough to start with.
Limiting information: The system we use in our department (see more here) records scores out of 100. My ‘red’ range is 0-49, ‘amber’ is 50-69, and ‘green’ is 70-100. I have some students who have scored 70-75 on certain topics. Yes they got into the green zone, but they’re only just there. So when deciding to give out targeted homework on past topics, I’ll often treat a 70-75 score like a 60-70 score, and make sure they spend time solidifying their 70+ status. Knowing where a student lies within a range like ‘meeting’ is incredibly valuable. It’s probably measured in the assessment you’d give anyway. Why lose it by only recording 1, 2 or 3?
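As a rough illustration of why the full score is worth keeping, here is a minimal Python sketch of the banding described above. The band boundaries (0-49, 50-69, 70-100) and the 70-75 "only just green" range come from the paragraph; the function names, the topic names and the example scores are assumptions made up for the sketch, not the department's actual system.

```python
GREEN_MIN = 70      # 'green' is 70-100
AMBER_MIN = 50      # 'amber' is 50-69, 'red' is 0-49
BORDERLINE_MAX = 75  # scores of 70-75 are green, but only just


def band(score: int) -> str:
    """Collapse a score out of 100 into a traffic-light band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < AMBER_MIN:
        return "red"
    if score < GREEN_MIN:
        return "amber"
    return "green"


def topics_to_revisit(scores: dict[str, int]) -> list[str]:
    """Topics for targeted homework, weakest first.

    Includes every red and amber topic, plus greens that are only
    just green (70-75), which the band alone would hide.
    """
    weak = [topic for topic, score in scores.items() if score <= BORDERLINE_MAX]
    return sorted(weak, key=lambda topic: scores[topic])


# Invented scores for one student.
student = {"fractions": 72, "algebra": 88, "ratio": 64, "angles": 45}

print({topic: band(score) for topic, score in student.items()})
print(topics_to_revisit(student))
```

If only the band were recorded, fractions (72) and algebra (88) would look identical, and the decision to set consolidation homework on fractions could not be made from the data.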
One high-stakes threshold: Thresholds always create problems. They distort incentives, disrupt measurement and have a knack for becoming way more important than they were ever intended to be. This proposed system requires teachers to decide if students are ‘developing’ or ‘meeting’. There is no middle ground. This threshold will inevitably be used inconsistently.
The first problem is that ‘meeting’ a criterion is really difficult to define. All teachers would need to look for a consistent level of performance. If left to informal assessment there is no hope of consistency. If judged by formal assessment then keep the full picture rather than squashing a student’s performance into the boxes of meeting or developing.
The second problem is that having one high-stakes threshold creates lots of dreadful incentives for teachers. Who wouldn’t be tempted to mark as ‘meeting’ the student who’s worked really hard and not quite made it, rather than putting them in a category with the student who couldn’t care less and didn’t bother trying? And what about the incentive to just mark a borderline student as ‘meeting’ rather than face the challenges of acknowledging that they’re not? The farce of the C/D borderline may just be recreated.
A better system expects a range of performance, and prepares to measure it. A Primary School system I designed had five possible ‘states’, whereas the Secondary system we use is built on percentages. By capturing a truer picture of student performance we can guide teaching and learning in much greater detail.
Conclusion
I agree with most of the NAHT’s report, and am glad to see another such strong contribution to the debate on assessment. However there are three main amendments that need to be made:
Acknowledge the two-way relationship between curriculum and assessment, and that criteria from the curriculum are of little use without accompanying assessment questions to bring them to life.
Consider the need for autonomy alongside the desire for consistency, lest we degenerate into a national monopoly that quashes innovation in assessment.
Remove the three ‘states’ model and encourage assessment systems that capture and use more information to represent the true spectrum of students’ achievements.
On performance related pay I am a believer in principle but a sceptic in practice. After reading Policy Exchange’s report published yesterday, “Reversing the Widget Effect”, I remain so. However I am coming to believe that PRP can be rescued, and that a more flexible and transparent system could help teachers to improve by improving the quality of professional development in schools.
This is a heated topic of conversation, and far too closely tied to mistrust of the political establishment and insinuations about privatising education. This much is evidenced by the disparity between two recent polls on PRP: when YouGov asked on behalf of Policy Exchange 89% of teachers were in favour of PRP in principle; when YouGov asked on behalf of the NUT in a survey about the government’s reforms, 81% were against PRP. Context here is king, and separating PRP from opinions about Michael Gove’s personal integrity is essential if we’re to have any semblance of rational debate.
PRP in Principle
The foreword to Matthew Robb’s report is written by George Parker, a former US union leader turned advocate of PRP. Branded a traitor by teaching unions in the States, Parker recounts a lightbulb moment he had after delivering a speech at a “high poverty primary school”. He writes that:
“Afterwards, a little girl came up to me and hugged me, and said that no-one had ever said that before. No-one had ever been fighting for them to get a better education. And in the car on the way back, I realised: you lied. You lied to that little girl. Because I didn’t really care about her, and getting good teachers in front of her. In fact, I’d just spent $10,000 to overturn a firing and keep a bad teacher in that school – a bad teacher I would not want anywhere near my own granddaughter…”
The PX Report devotes a lot of time to addressing this ‘in principle’ case, that it is almost morally wrong to reward poor or mediocre performance in the same way as good and excellent performance. I do strongly agree with their argument here. We should be doing everything possible to ensure that all children receive the best education, and as the biggest determinant of that is the teacher they have, we should be putting all of our effort into improving teaching. If tying together pay and accountability makes even a marginal difference to student outcomes, then in principle we should be accepting PRP.
The Status Quo is Inadequate
The first step in Robb’s argument is that the apparently performance related status quo has ceased to reward performance. He references a report finding no relationship between the Ofsted quality of teaching grade a school is given and the average teaching salary in that school, and shows us the distribution of pay bands within schools of different Ofsted ratings. This evidence is damning. A pay system that has no relationship with performance is wasting taxpayers’ money.
Nor can it be argued that experience or tenure is a good proxy for performance. Do First Impressions Matter?, a recent paper by Atteberry, Loeb and Wyckoff, shows that of teachers whose first year performance is in the lowest quintile, 62% remain in the bottom two quintiles five years later. More worryingly they show that although the gap between the top and bottom quintiles closes, this is not just because the bottom quintile get better but because the top quintile actually get worse, with those in between largely stagnating.
With no evidence to suggest that the current system either is or should be working as we desire, in principle we should be looking for a new one.
The In-Principle Argument for PRP
There seems to me to be a reasonable causal chain, backed up by evidence, from well-implemented PRP to better student outcomes. PRP raises teachers’ extrinsic motivation, causing them to exert greater effort. This leads to more deliberate practice, which leads to improved student outcomes.
i. Raising extrinsic motivation As Robb recognises, “it is not in doubt that for the majority of teachers, the primary motivation is to help their pupils progress”. Nonetheless even the most virtuous of teachers can be influenced to some extent by external factors, of which pay is one. The actual evidence on the relationship between teacher pay and teacher effectiveness is mixed. Few teachers cite pay as a motivation for entering the teaching profession, yet many cite it as a reason for leaving. Comparative international studies show that countries where teacher pay is higher have better student outcomes, but they do not conclusively show that a performance aspect of this pay is significant.
This is definitely the weakest link in the PRP causal chain. The most robust element of Robb’s argument is that higher pay, through PRP, would attract and retain good teachers who would otherwise either not enter or leave teaching. This is undoubtedly a positive effect, but I question whether this effect alone is enough to warrant the effort that implementing PRP would require. Rather I am compelled by Dylan Wiliam’s argument that improving the quality of entrants into the teaching profession will take a long time to have a relatively small effect, and therefore that “the key to improvement of educational outcomes is investment in teachers already working in our schools”. I am unaware of any evidence suggesting that there would be a sufficiently large influx of suitably talented new teachers under a new pay regime to undermine Wiliam’s argument.
More compelling, but less well evidenced, is the claim that PRP could increase the extrinsic motivation of teachers in schools. Nonetheless it seems to me that building teacher performance into the formal accountability proceedings of a school, tied to a teacher’s progression up the pay scale, cannot fail to increase the incentives for teachers to improve their performance. Not only this, but it places a much greater pressure on the school to improve its teachers (more on this later on). I believe, as I will argue later, that even if the impact on the motivation of teachers were to be minimal (although much evidence does suggest otherwise, as Robb discusses), the impact on school processes would be enough to drive the improvement we seek.
ii. Deliberate practice The second causal leap in the above chain is that increased motivation leads to increased deliberate practice. Much has been written about the role of deliberate practice in improving performance across domains. The canonical violinists study showed how practice, not talent, was the determinant of a great violinist, and although more recent evidence has shown the role of innate talent in some physical pursuits, deliberate practice still reigns in most other domains. Teaching, for example, is one of these, as discussed in Alex Quigley’s blog on applying deliberate practice to become a better teacher.
If deliberate practice improves teaching quality then the leap to better student outcomes is a straightforward one. Robb references research showing that the difference between a teacher in the 25th percentile and a teacher in the 75th percentile is 0.4 GCSE points per subject, whilst the difference between the 5th and 95th percentiles is 1 whole GCSE point per subject.
The causal chain from PRP to better student outcomes works in principle, and as George Parker argues, we have a moral obligation to take that very seriously indeed.
PRP in Practice
Robb’s argument for PRP hinges on a school’s ability to accurately measure teacher performance. Using the results of the Measures of Effective Teaching (MET) project, Robb dismisses the claim that teaching quality cannot accurately be measured. He does so too hastily.
The MET results are certainly positive, and have taught us a great deal about measuring effective teaching. Of particular interest for me was the significant predictive power of student surveys, something I’m confident would not be particularly popular with teaching unions. Robb argues, based on the MET results, that an appropriately weighted basket of measures, preferably averaged over two years, would be sufficiently accurate to determine a teacher’s pay.
I am less convinced. Robb’s report includes a table (below) comparing teacher effectiveness by quintile in two consecutive years. It finds that “the variance is such that only half the teachers assessed as being in the lowest quintile of performance in one year are in the lowest two quintiles the following year – and a third of those assessed as being in the top quintile in one year have moved to the lowest two quintiles as well!”
Even the most reliable measure in the MET study (an equally weighted basket of state test results, observations, and student surveys) only had a reliability of 0.76, and this is using observations where observers have been specially trained and certified in a far more rigorous system than anything commonly used in Britain. Indeed Wiliam quotes research showing that to achieve a reliability of 0.9 in assessing teacher quality from observation a teacher would have to be observed teaching six different classes by five independent observers. This is hardly a viable proposition.
Although Robb is willing to write off these difficulties by arguing for averages over greater periods of time, or focusing on extreme performance, neither of these is a good enough solution to the reliability problem. As he himself argues, for PRP to be workable it needs “a solid performance evaluation system that teachers support”. A system where a third of teachers fluctuate from the top to the bottom each year is neither solid, nor likely to be supported.
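To get a feel for how an unreliable measure produces this kind of quintile churn, here is a minimal simulation sketch in Python. It is not the MET project's method or Robb's analysis. It assumes each teacher has a fixed 'true' effectiveness, that each year's measured score is that true value plus independent normal noise, and that reliability is the share of score variance due to true effectiveness. The function names and parameters are hypothetical, chosen only for illustration.

```python
import random


def quintiles(scores):
    """Return each teacher's quintile (0 = lowest, 4 = highest) by rank."""
    n = len(scores)
    order = sorted(range(n), key=scores.__getitem__)
    result = [0] * n
    for rank, idx in enumerate(order):
        result[idx] = rank * 5 // n
    return result


def simulate_churn(reliability, n_teachers=20000, seed=0):
    """Simulate two years of noisy measurement of stable teacher quality.

    Returns (share of year-1 bottom-quintile teachers who are in the bottom
    two quintiles in year 2, share of year-1 top-quintile teachers who are
    in the bottom two quintiles in year 2).
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    rng = random.Random(seed)
    # With true variance 1, this noise level gives the requested reliability.
    noise_sd = ((1 - reliability) / reliability) ** 0.5
    true = [rng.gauss(0, 1) for _ in range(n_teachers)]
    year1 = quintiles([t + rng.gauss(0, noise_sd) for t in true])
    year2 = quintiles([t + rng.gauss(0, noise_sd) for t in true])
    bottom = [i for i in range(n_teachers) if year1[i] == 0]
    top = [i for i in range(n_teachers) if year1[i] == 4]
    bottom_stay = sum(year2[i] <= 1 for i in bottom) / len(bottom)
    top_fall = sum(year2[i] <= 1 for i in top) / len(top)
    return bottom_stay, top_fall


if __name__ == "__main__":
    for r in (0.3, 0.5, 0.76, 0.9):
        stay, fall = simulate_churn(r)
        print(f"reliability={r}: bottom stay low={stay:.2f}, top fall low={fall:.2f}")
```

Under these simplifying assumptions nobody's underlying quality changes between years, so every movement between quintiles is measurement noise. Running it at different reliabilities shows how quickly apparent stability degrades as the measure gets noisier.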
Squaring the Circle: Professional Development Targets
Although I am sceptical of PRP as suggested in the Policy Exchange report because of its reliance on unreliable measures of teacher quality, I am reluctant to throw away the potential to improve student outcomes through the use of pay reform. The clearest lever by which this would work is improving professional development.
Wiliam identifies that teachers, on the whole, stop improving after two or three years in the profession. He suspects, as do I, that this is strongly linked to the poor availability of good-quality feedback for teachers post-qualification. Deliberate practice is hard without feedback. Where we differ is on how to improve the feedback cycle for teachers to better support good quality deliberate practice. Wiliam so far is relying on the goodwill of schools. Although this might be enough for some schools, it will not be enough for all. PRP could be the way to radically improve the support schools give their staff in order to become more effective teachers. The combination of upward pressure from teachers demanding the support they need to improve, and downward pressure from regulators demanding an improvement in more accurately measured teacher quality, is significant and powerful enough to change the face of professional development in most schools.
i. Upward pressure from teachers As Robb argues, teachers who are judged on their performance will demand better feedback, coaching and training. They will insist on frequent, good-quality feedback that helps them to improve, and schools will be compelled to provide this. Once a teacher is given appropriate feedback they are much more able to improve through a cycle of deliberate practice, and to therefore improve the performance of the students they teach.
ii. Downward pressure from administrators Robb writes that “The implementation of performance-related pay will require Heads and senior managers to undertake more rigorous performance evaluations of their staff…[this] will also force managers to more explicitly acknowledge the range of teacher performance in their school and act on it.” Once a school has explicitly measured the quality of teaching in the school as part of a more rigorous framework, they will be compelled – by Ofsted and by governors – to do more to improve it.
My question is whether a system of PRP can be designed that replaces the attempted measurement of objective performance with more of a focus on development. Could we, for example, set and more accurately measure specific targets related to a teacher’s improvement, rather than try to measure their ethereal ‘effectiveness’? Poorly measured effectiveness is not transparent, so does not help a teacher to improve. The measure fails Robb’s own criterion. Drawing up a set of clear but demanding targets, on the basis of student performance data, (better) observation and student surveys would provide transparent objectives for teachers to meet. The involvement of pay would cause teachers to demand, and schools to offer, the support and feedback needed for deliberate practice, which in turn would improve student outcomes.
Conclusion
Performance related pay works in principle. It has great potential to improve student outcomes by encouraging and supporting deliberate practice amongst teachers. However systems attempting to measure teacher effectiveness are not sufficiently reliable for pay to be based on. Their unreliability would create confusion and unpopularity, which undermine the central arguments for PRP. A better system is for schools to take advantage of PRP powers to strengthen performance management, and use clear, demanding and evidence-based targets to improve teacher effectiveness. By combining teachers’ increased extrinsic motivation and schools’ increased pressure to provide good-quality support, teachers will become more effective and student outcomes will improve.