Performance Related Pay

On performance related pay I am a believer in principle but a sceptic in practice. After reading Policy Exchange’s report published yesterday, “Reversing the Widget Effect”, I remain so. However, I am coming to believe that PRP can be rescued, and that a more flexible and transparent system could help teachers to improve by raising the quality of professional development in schools.

This is a heated topic of conversation, and far too closely tied to mistrust of the political establishment and insinuations about privatising education. This much is evidenced by the disparity between two recent polls on PRP: when YouGov asked on behalf of Policy Exchange, 89% of teachers were in favour of PRP in principle; when YouGov asked on behalf of the NUT in a survey about the government’s reforms, 81% were against PRP. Context here is king, and separating PRP from opinions about Michael Gove’s personal integrity is essential if we’re to have any semblance of rational debate.

PRP in Principle

The foreword to Matthew Robb’s report is written by George Parker, a former US union leader turned advocate of PRP. Branded a traitor by teaching unions in the States, Parker recounts a lightbulb moment he had after delivering a speech at a “high poverty primary school”. He writes that:

“Afterwards, a little girl came up to me and hugged me, and said that no-one had ever said that before. No-one had ever been fighting for them to get a better education. And in the car on the way back, I realised: you lied. You lied to that little girl. Because I didn’t really care about her, and getting good teachers in front of her. In fact, I’d just spent $10,000 to overturn a firing and keep a bad teacher in that school – a bad teacher I would not want anywhere near my own granddaughter…”

The PX Report devotes a lot of time to addressing this ‘in principle’ case, that it is almost morally wrong to reward poor or mediocre performance in the same way as good and excellent performance. I do strongly agree with their argument here. We should be doing everything possible to ensure that all children receive the best education, and as the biggest determinant of that is the teacher they have, we should be putting all of our effort into improving teaching. If tying together pay and accountability makes even a marginal difference to student outcomes, then in principle we should be accepting PRP.

The Status Quo is Inadequate

The first step in Robb’s argument is that the apparently performance related status quo has ceased to reward performance. He references a report finding no relationship between the Ofsted quality of teaching grade a school is given and the average teaching salary in that school, and shows us the distribution of pay bands within schools of different Ofsted ratings. This evidence is damning. A pay system that has no relationship with performance is wasting taxpayers’ money.

Nor can it be argued that experience or tenure is a good proxy for performance. Do First Impressions Matter?, a recent paper by Atteberry, Loeb and Wyckoff, shows that of teachers whose first year performance is in the lowest quintile, 62% remain in the bottom two quintiles five years later. More worryingly they show that although the gap between the top and bottom quintiles closes, this is not just because the bottom quintile get better but because the top quintile actually get worse, with those in between largely stagnating.

With no evidence to suggest that the current system either is or should be working as we desire, in principle we should be looking for a new one.

The In-Principle Argument for PRP

There seems to me to be a reasonable causal chain, backed up by evidence, from well-implemented PRP to better student outcomes. PRP raises teachers’ extrinsic motivation and so causes them to exert greater effort. This leads to more deliberate practice, which leads to improved student outcomes.

i. Raising extrinsic motivation
As Robb recognises, “it is not in doubt that for the majority of teachers, the primary motivation is to help their pupils progress”. Nonetheless even the most virtuous of teachers can be influenced to some extent by external factors, of which pay is one. The actual evidence on the relationship between teacher pay and teacher effectiveness is mixed. Few teachers cite pay as a motivation for entering the teaching profession, yet many cite it as a reason for leaving. Comparative international studies show that countries where teacher pay is higher have better student outcomes, but they do not conclusively show that a performance aspect of this pay is significant.

This is definitely the weakest link in the PRP causal chain. The most robust element of Robb’s argument is that higher pay, through PRP, would attract and retain good teachers who would otherwise either not enter or leave teaching. This is undoubtedly a positive effect, but I question whether this effect alone is enough to warrant the effort that implementing PRP would be. Rather I am compelled by Dylan Wiliam’s argument that improving the quality of entrants into the teaching profession will take a long time to have a relatively small effect, and therefore that “the key to improvement of educational outcomes is investment in teachers already working in our schools”. I am unaware of any evidence suggesting that there would be a sufficiently large influx of suitably talented new teachers under a new pay regime to undermine Wiliam’s argument.

More compelling, but less well evidenced, is the claim that PRP could increase the extrinsic motivation of teachers in schools. Nonetheless it seems to me that building teacher performance into the formal accountability proceedings of a school, tied to a teacher’s progression up the pay scale, cannot fail to increase the incentives for teachers to improve their performance. Not only this, but it places a much greater pressure on the school to improve its teachers (more on this later on). I believe, as I will argue later, that even if the impact on the motivation of teachers were to be minimal (although much evidence does suggest otherwise, as Robb discusses), the impact on school processes would be enough to drive the improvement we seek.

ii. Deliberate practice
The second causal leap in the above chain is that increased motivation leads to increased deliberate practice. Much has been written about the role of deliberate practice in improving performance across domains. The canonical violinists study showed how practice, not talent, was the determinant of a great violinist, and although more recent evidence has shown the role of innate talent in some physical pursuits, deliberate practice still reigns in most other domains. Teaching, for example, is one of these, as discussed in Alex Quigley’s blog on applying deliberate practice to become a better teacher.

If deliberate practice improves teaching quality then the leap to better student outcomes is a straightforward one. Robb references research showing that the difference between a teacher in the 25th percentile and a teacher in the 75th percentile is 0.4 GCSE points per subject, whilst the difference between the 5th and 95th percentiles is 1 whole GCSE point per subject.

The causal chain from PRP to better student outcomes works in principle, and as George Parker argues, we have a moral obligation to take that very seriously indeed.

PRP in Practice

Robb’s argument for PRP hinges on a school’s ability to accurately measure teacher performance. Using the results of the Measures of Effective Teaching (MET) project, Robb dismisses the claim that teaching quality cannot accurately be measured. He does so too hastily.

The MET results are certainly positive, and have taught us a great deal about measuring effective teaching. Of particular interest for me was the significant predictive power of student surveys, something I’m confident would not be particularly popular with teaching unions. Robb argues, based on the MET results, that an appropriately weighted basket of measures, preferably averaged over two years, would be sufficiently accurate to determine a teacher’s pay.

I am less convinced. Robb’s report includes a table (below) comparing teacher effectiveness by quintile in two consecutive years. It finds that “the variance is such that only half the teachers assessed as being in the lowest quintile of performance in one year are in the lowest two quintiles the following year – and a third of those assessed as being in the top quintile in one year have moved to the lowest two quintiles as well!”
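To get a feel for how noisy a measure must be to produce churn on this scale, here is a rough simulation sketch. It rests on assumptions of my own, not on anything in Robb’s report or the MET data: each teacher has a fixed true effectiveness, each year’s score is that value plus independent normal noise, and `reliability` is the share of score variance that reflects true effectiveness. The function name and its parameters are hypothetical.

```python
import random


def share_top_to_bottom_two(reliability, n_teachers=20000, seed=0):
    """Share of year-1 top-quintile teachers who land in the bottom two quintiles in year 2.

    Assumed toy model (illustrative only): observed score = true effectiveness + noise,
    with reliability = var(true) / var(observed), and true effectiveness unchanged
    between the two years.
    """
    rng = random.Random(seed)
    true_sd = reliability ** 0.5
    noise_sd = (1.0 - reliability) ** 0.5

    year1, year2 = [], []
    for _ in range(n_teachers):
        true_effect = rng.gauss(0.0, true_sd)
        year1.append(true_effect + rng.gauss(0.0, noise_sd))
        year2.append(true_effect + rng.gauss(0.0, noise_sd))

    # Cut-offs: top 20% of year-1 scores, bottom 40% of year-2 scores.
    top_cut = sorted(year1)[int(0.8 * n_teachers)]
    bottom_two_cut = sorted(year2)[int(0.4 * n_teachers)]

    top_teachers = [i for i in range(n_teachers) if year1[i] >= top_cut]
    fallen = [i for i in top_teachers if year2[i] < bottom_two_cut]
    return len(fallen) / len(top_teachers)


if __name__ == "__main__":
    for r in (0.0, 0.2, 0.5, 0.76, 0.9):
        share = share_top_to_bottom_two(r)
        print(f"reliability {r:.2f}: {share:.1%} of the top quintile fall to the bottom two")
```

With `reliability` set to 0 the scores are pure noise, and roughly 40% of the top quintile land in the bottom two purely by chance, since those two quintiles hold 40% of teachers. Seen against that baseline, a reported third is not far from what a measure carrying very little signal would produce, at least under this simplified model.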

Even the most reliable measure in the MET study (an equally weighted basket of state test results, observations, and student surveys) only had a reliability of 0.76, and this is using observations where observers have been specially trained and certified in a far more rigorous system than anything commonly used in Britain. Indeed Wiliam quotes research showing that to achieve a reliability of 0.9 in assessing teacher quality from observation a teacher would have to be observed teaching six different classes by five independent observers. This is hardly a viable proposition.
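One textbook way to see why extra reliability is expensive is the Spearman-Brown prediction formula, which estimates the reliability of an average of k parallel measurements from the reliability of a single one. The sketch below is only an illustration under that formula’s assumptions (independent, equally reliable readings of a quality that does not change). It is not the method used in the MET study or in the research Wiliam quotes, and the function names are hypothetical.

```python
import math


def pooled_reliability(single, k):
    """Spearman-Brown: reliability of the mean of k parallel measurements."""
    return k * single / (1.0 + (k - 1.0) * single)


def measurements_needed(single, target):
    """Smallest whole number of parallel measurements whose mean reaches `target`."""
    if not (0.0 < single < 1.0 and 0.0 < target < 1.0):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    # Spearman-Brown solved for k.
    k = target * (1.0 - single) / (single * (1.0 - target))
    return max(1, math.ceil(k - 1e-9))  # small tolerance for floating-point error


if __name__ == "__main__":
    k = measurements_needed(0.76, 0.9)
    print(k, round(pooled_reliability(0.76, k), 3))
```

Plugging in 0.76 and a target of 0.9 gives three measurements, which under these idealised assumptions would mean three full rounds of the whole basket for each pay decision. The assumptions are generous, though: the formula treats each round as an independent reading of an unchanging teacher, whereas the Atteberry, Loeb and Wyckoff findings above suggest teachers’ relative performance shifts over time.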

Although Robb is willing to write off these difficulties by arguing for averages over greater periods of time, or focusing on extreme performance, neither is a good enough solution to the reliability problem. As he himself argues, for PRP to be workable it needs “a solid performance evaluation system that teachers support”. A system where a third of top-quintile teachers drop into the bottom two quintiles the following year is neither solid, nor likely to be supported.

Squaring the Circle: Professional Development Targets

Although I am sceptical of PRP as suggested in the Policy Exchange report because of its reliance on unreliable measures of teacher quality, I am reluctant to throw away the potential to improve student outcomes through the use of pay reform. The clearest lever by which this would work is improving professional development.

Wiliam identifies that teachers, on the whole, stop improving after two or three years in the profession. He suspects, as do I, that this is strongly linked to the poor availability of good-quality feedback for teachers post-qualification. Deliberate practice is hard without feedback. Where we differ is on how to improve the feedback cycle for teachers to better support good quality deliberate practice. Wiliam so far is relying on the goodwill of schools. Although this might be enough for some schools, it will not be enough for all. PRP could be the way to radically improve the support schools give their staff in order to become more effective teachers. The combination of upward pressure from teachers demanding the support they need to improve, and downward pressure from regulators demanding an improvement in more accurately measured teacher quality, is significant and powerful enough to change the face of professional development in most schools.

i. Upward pressure from teachers
As Robb argues, teachers who are judged on their performance will demand better feedback, coaching and training. They will insist on frequent, good-quality feedback that helps them to improve, and schools will be compelled to provide this. Once a teacher is given appropriate feedback they are much more able to improve through a cycle of deliberate practice, and to therefore improve the performance of the students they teach.

ii. Downward pressure from administrators
Robb writes that “The implementation of performance-related pay will require Heads and senior managers to undertake more rigorous performance evaluations of their staff…[this] will also force managers to more explicitly acknowledge the range of teacher performance in their school and act on it.” Once a school has explicitly measured the quality of teaching in the school as part of a more rigorous framework, they will be compelled – by Ofsted and by governors – to do more to improve it.

My question is whether a system of PRP can be designed that replaces the attempted measurement of objective performance with more of a focus on development. Could we, for example, set and more accurately measure specific targets related to a teacher’s improvement, rather than try to measure their ethereal ‘effectiveness’? Poorly measured effectiveness is not transparent, so does not help a teacher to improve. The measure fails Robb’s own criterion. Drawing up a set of clear but demanding targets, on the basis of student performance data, (better) observation and student surveys would provide transparent objectives for teachers to meet. The involvement of pay would cause teachers to demand, and schools to offer, the support and feedback needed for deliberate practice, which in turn would improve student outcomes.


Performance related pay works in principle. It has great potential to improve student outcomes by encouraging and supporting deliberate practice amongst teachers. However, systems attempting to measure teacher effectiveness are not sufficiently reliable to base pay on. Their unreliability would create confusion and unpopularity, which undermine the central arguments for PRP. A better system is for schools to take advantage of PRP powers to strengthen performance management, and use clear, demanding and evidence-based targets to improve teacher effectiveness. By combining teachers’ increased extrinsic motivation and schools’ increased pressure to provide good-quality support, teachers will become more effective and student outcomes will improve.

1 thought on “Performance Related Pay”

  1. Laura

    To me, the diagram that shows the ‘drop off’ is actually the crux of this debate. In the end, getting really high scores is basically unsustainable. There’s a great moment in ‘The Unteachables’ shown about ten years ago now on Channel 4 in which Phil Beadle (super teacher!) pointed out that the only reason he was making the kids learn was because he was trading their sanity for his own.

    To be sustainable as a teacher the really bad get better, and the really awesome tone it down a bit. What you end up with by Year 3 is quite a small variation between teachers, particularly in ELA (English Language Arts), and it is only really the 4th quintile who are doing really badly by their kids. The question is the extent to which this variance is then a matter of motivation, or whether it is simply a matter of a small proportion of people who were never great at the job having already gotten as good as they are likely to get. If so, we are better off moving this group out – and quickly – than we are trying to re-energise the top lot, as the evidence suggests that even if they get better it will be at the expense of their sanity and be unsustainable. 70k doesn’t bring you extra hours or make you into a superhuman.
