Business books are littered with soundbite quotations about measurement: “What you measure is what you get”; “If you can’t measure it you can’t improve it”; “You are what you measure”; etc. Although there are some risks in being over-reliant on certain measurements, the principle is true. Measurement brings you both information and accountability.
However, some things are hard to measure. How do we measure intangible things like behaviour? The most obvious answer, measuring the number of sanctions, doesn’t work. It would create perverse incentives that ultimately worsen behaviour – you could make the numbers look better simply by not addressing poor behaviour, which would end in chaos.
When deciding how to measure behaviour, we stopped and thought about the particular areas we need to focus on. If “what you measure is what you get”, then we want to measure the outcomes we most need to improve. This led us to two measures we’re launching this week:
1. Timing transitions
Our Assistant Heads of Year now have stopwatches to time and record how quickly we go through the end-of-break routine to move from social time into lessons. With three breaks a day, shaving a minute off this routine would reclaim 9.5 hours of lesson time a year – the equivalent of almost two school days.
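For the curious, here is the arithmetic behind that figure as a quick sketch. It assumes a standard 190-day school year and roughly five hours of lessons a day; neither number is stated above, so adjust to taste.

```python
# Back-of-envelope: lesson time reclaimed by shaving one minute per transition.
# Assumes a 190-day school year and ~5 hours of lessons per day (my assumptions).
breaks_per_day = 3
minutes_saved_per_break = 1
school_days = 190

minutes_saved = breaks_per_day * minutes_saved_per_break * school_days
hours_saved = minutes_saved / 60

print(hours_saved)       # 9.5 hours of lessons a year
print(hours_saved / 5)   # ~1.9 school days' worth
```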
2. Surveying staff satisfaction
I now send out a weekly one-question survey to every member of staff who had to send a student to the behaviour team, asking how satisfied they are with the resolution of that situation on a scale of 1-10. Our systems are only working if our staff feel supported to teach great lessons without interruption. If they don’t, their morale will be low and our students will learn less.
Of course there are dangers lurking when we become over-reliant on certain measures. The pitfalls of the public sector target-driven culture are well-documented, and we don’t want to become a place where the only thing that matters is a slim set of numbers. Our intention is to avoid this by changing the measures we use on a regular basis. When we are trying to measure something that we can only get at indirectly, like behaviour, then every measurement gives us a different angle on it. Switching between measurements gives us a more holistic view, and prevents us from working towards a distorted version of our end goal.
A final consequence of picking measurements is that they communicate what you care about. We choose to measure staff satisfaction because we care about it. I could repeat that we care about it every day, but that would have less power than deciding to measure satisfaction and using that measurement to hold ourselves accountable for how good a job we’re doing.
Over the coming few weeks we’ll see how well this works, and start thinking about how we measure other elements of the school.
PS: One thing we’re not yet sure of is whether and how we should use student insights as a measurement. We haven’t thought of the right question to ask yet, or the most efficient way to ask it, but we’re by no means closed to the idea.
Mastery learning is the belief that students should master a skill before moving on to learn a new one. In contrast to the classic spiral curriculum, where students raced between topics without properly learning any of them, a mastery curriculum gives students the space to learn a skill, understand it conceptually, and practise until it’s automatic.
This approach matters because of its effect on working memory. Students who have mastered previous skills have their working memory freed to learn new ones, while students who haven’t get bogged down in the basics and don’t have the working memory space to learn something new.
There are some important subtleties of definition that Steve Chinn picks up on. What it means to have mastered a topic must be clearly defined from the outset, or confusion will ensue. Because understanding deepens as students develop their conceptual map of maths and draw links between topics, mastery early in school cannot mean perfection. For me, mastery means two things:
1. The student can demonstrate or explain the concept orally, concretely, visually and abstractly.
2. The student can apply the concept automatically, so that it is not dominating their working memory.
Chinn does not engage with these fundamentals of mastery learning.
His first criticism is that mastery learning will not help children catch up, and that they should instead be taught with an emphasis “on understanding maths concepts”. Given that Singapore Maths and its mastery model is renowned for its focus on developing understanding, this seems like an odd criticism. Conceptual understanding is at the heart of mastery learning, especially of Singapore Maths and its concrete-pictorial-abstract model of learning mathematical concepts.
His second criticism is that mastery learning is flawed because the ordering of skills for teaching is imperfect. This is true – there is no universally accepted hierarchy of all skills. But it does not detract from the obvious fact that some skills depend on others, and that these dependencies constrain the order in which we teach, as the toy sketch below shows. Adding fractions requires a knowledge of lowest common multiples, which in turn requires a knowledge of times tables. We may disagree on whether names of shapes or bar charts should be taught first in the gaps between those skills, but we know the three of them have to come in that order.
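Here is that dependency idea as a toy sketch: a topological sort over a tiny, purely illustrative graph. Any valid teaching order must run times tables, then lowest common multiples, then adding fractions, while independent topics can slot in anywhere.

```python
# Toy sketch: derive a valid teaching order from skill dependencies.
# The graph is illustrative, not a real curriculum hierarchy.
from graphlib import TopologicalSorter

prerequisites = {
    "lowest common multiples": {"times tables"},
    "adding fractions": {"lowest common multiples"},
    "names of shapes": set(),  # independent: can be taught at any point
    "bar charts": set(),
}

print(list(TopologicalSorter(prerequisites).static_order()))
# e.g. ['times tables', 'names of shapes', 'bar charts',
#       'lowest common multiples', 'adding fractions']
```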
The next criticism is that mastery learning is flawed because some people, for unknown reasons, appear to learn things differently. Even if we accept this argument, I cannot see where it leads. Is the implication that we therefore don’t need to care about the order in which we teach topics, and should pull them from a hat? If order doesn’t matter for some people, why deprive the others of being taught in a logical sequence?
It is particularly dangerous to support such arguments with anecdotal success stories like the dyslexic maths student whose times table recall was not perfect. Anecdotes do not a policy make. This anecdote seems compelling precisely because it is so rare, and it is so rare because it is an exception to a large body of well-established research. This student succeeded in spite of imperfect times tables, not because of them. That they succeeded against the odds is not a reason for us to stack the odds against everybody else.
I tried to discuss this on Twitter with Sam Freedman, but as his blog title points out, sometimes 140 characters isn’t enough…
The NAHT recently released the findings of their commission on assessment. They have attempted to set out a general framework for assessing without levels, including 21 recommendations, their principles of assessment, and a design checklist for system-building. All in all the report is a good one, capturing some of the most important principles for an effective system of assessment. However there are some significant problems to be fixed. Firstly, the report relies on ‘objective criteria’ to drive assessment, without recognising that criteria cannot be objective without assessments bringing them to life. Secondly, the report places a heavy emphasis on the need for consistency without recognising the need for schools to retain the autonomy to innovate in both curriculum and assessment. Thirdly, the report advocates assessment that forces students into one of only three boxes (developing, meeting or exceeding), instead of allowing for a more accurate spectrum of possible states. Here are my comments on some of the more interesting aspects of the report.
Summary of recommendations
4. Pupils should be assessed against objective and agreed criteria rather than ranked against each other. This seems eminently sensible – learning is not a zero sum game. The potential problem with this, however, is that ‘objective criteria’ are very rarely objective. In “Driven by Data”, Paul Bambrick-Santoyo makes a compelling case that criteria alone are not enough, as they are always too ambiguous on the level of rigour demanded. Instead, criteria must be accompanied by sample assessment questions that demonstrate the required level of rigour. So whilst I agree with the NAHT’s sentiment here, I’d argue that a criteria-based system cannot be objective without clear examples of assessment to set the level of rigour.
5. Pupil progress and achievement should be communicated in terms of descriptive profiles rather than condensed to numerical summaries (although schools may wish to use numerical data for internal purposes). Dylan Wiliam poses three key questions that are at the heart of formative assessment.
Where am I?
Where am I going?
How am I going to get there?
A school assessment system should answer these three questions, and a system that communicates only aggregated numbers does not. Good assessment should collect data at a granular level so that it serves teaching and learning. Aggregating this data into summary statistics is an important, but secondary, purpose.
7. Schools should work in collaboration, for example in clusters, to ensure a consistent approach to assessment. Furthermore, excellent practice in assessment should be identified and publicised, with the Department for Education responsible for ensuring that this is undertaken. The balance between consistency and autonomy will be the biggest challenge of the post-levels assessment landscape. Consistency allows parents and students to compare between schools, and will be particularly important for students who change schools during a key stage. Autonomy allows schools the freedom to innovate and design continually better systems of assessment from which we all can learn. I worry that calls for consistency will degenerate into calls for homogeneity and a lowest-common-denominator system of assessment.
18. The use by schools of suitably modified National Curriculum levels as an interim measure in 2014 should be supported by government. However, schools need to be clear that any use of levels in relation to the new curriculum can only be a temporary arrangement to enable them to develop, implement and embed a robust new framework for assessment. Schools need to be conscious that the new curriculum is not in alignment with the old National Curriculum levels. Can we please stick the last sentence of this to billboards outside every school? I really don’t think this message has actually hit home yet. Students in Year 7 and 8 are still being given levels that judge their performance on a completely irrelevant scale. This needs to stop, soon. I worry about this recommendation, which seems sensible at first, leading to schools just leaving levels in place for as long as possible. Who’s going to explain to parents that Level 5 now means Level 4 and a bit (we think, but we haven’t quite worked it out yet so just bear with us)?
Design Checklist
Assessment criteria are derived from the school curriculum, which is composed of the National Curriculum and our own local design. As above, it’s not a one-way relationship from curriculum to assessment – the curriculum means little without assessment shedding light on what criteria and objectives actually mean. Differences between schools’ curricula are another reason that the desired consistency becomes harder to achieve.
Each pupil is assessed as either ‘developing’, ‘meeting’ or ‘exceeding’ each relevant criterion contained in our expectations for that year. This is my biggest problem with the report’s recommendations. Why constrain assessment to offering only three possible ‘states’ in which a student can be? In homage to this limiting scale, I have three big objections:
Exceeding doesn’t make sense: The more I think about ‘exceeding’, the less sense it makes. If you’ve exceeded a criterion, haven’t you just met the next one? Surely it makes more sense to simply record that you have met an additional criterion than to try to capture that information ambiguously by stating that you have ‘exceeded’ something lesser. For the student who is exceeding expectations, recording it in this way serves little formative purpose. The assessment system records that they’ve exceeded some things, but not how. It doesn’t tell them which ‘excess’ criteria they have met, or how to exceed even further. If it does do this because it records additional criteria as being met, what was the point of the exceeding grade in the first place?
I’m also struggling to see how you measure that a criterion has been exceeded. To do this you’d need questions on your assessment that measure more than the criterion being assessed. Each assessment would also have to measure something else, something in excess of the current criterion. The implication of all this is that when you’re recording a mark for one criterion, you’re also implicitly recording a mark for the next. Why do this? Why not just record two marks separately?
The NAHT report suggests using a traffic light monitoring system. Presumably green is for exceeding, and amber is for meeting. Why is meeting only amber? That just means expectations were not high enough to start with.
Limiting information: The system we use in our department (see more here) records scores out of 100. My ‘red’ range is 0-49, ‘amber’ is 50-69, and ‘green’ is 70-100. I have some students who have scored 70-75 on certain topics. Yes, they got into the green zone, but they’re only just there. So when deciding to give out targeted homework on past topics, I’ll often treat a 70-75 score like a 60-70 score, and make sure they spend time solidifying their 70+ status. Knowing where a student lies within a range like ‘meeting’ is incredibly valuable. It’s probably measured in the assessment you’d give anyway. Why lose it by only recording 1, 2 or 3?
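As a minimal sketch of that banding logic: the band boundaries below are the ones our department uses, while the ‘consolidation’ cut-off of 75 just follows my worked example, and the function names are mine.

```python
# Map a topic score (out of 100) to a band, and flag 'just green' scores
# for consolidation homework, as described above.
def band(score: int) -> str:
    if score < 50:
        return "red"
    if score < 70:
        return "amber"
    return "green"

def needs_consolidation(score: int) -> bool:
    """Treat a just-green score (70-75) like a high amber when setting homework."""
    return score < 75

for s in (45, 68, 72, 90):
    print(s, band(s), "consolidate" if needs_consolidation(s) else "secure")
```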
One high-stakes threshold: Thresholds always create problems. They distort incentives, disrupt measurement and have a knack for becoming way more important than they were ever intended to be. This proposed system requires teachers to decide if students are ‘developing’ or ‘meeting’. There is no middle ground. This threshold will inevitably be used inconsistently.
The first problem is that ‘meeting’ a criterion is really difficult to define. All teachers would need to look for a consistent level of performance. If left to informal assessment, there is no hope of consistency. If judged by formal assessment, then we should keep the full picture rather than squash a student’s performance into the boxes of ‘meeting’ or ‘developing’.
The second problem is that having one high-stakes threshold creates lots of dreadful incentives for teachers. Who wouldn’t be tempted to mark as ‘meeting’ the student who’s worked really hard and not quite made it, rather than putting them in a category with the student who couldn’t care less and didn’t bother trying? And what about the incentive to just mark a borderline student as ‘meeting’ rather than face the challenges of acknowledging that they’re not? The farce of the C/D borderline may just be recreated.
A better system expects a range of performance, and prepares to measure it. A Primary school system I designed had five possible ‘states’, whereas the Secondary system we use is built on percentages. By capturing a truer picture of student performance we can guide teaching and learning in much greater detail.
Conclusion
I agree with most of the NAHT’s report, and am glad to see another strong contribution to the debate on assessment. However there are three main amendments that need to be made:
Acknowledge the two-way relationship between curriculum and assessment, and that criteria from the curriculum are of little use without accompanying assessment questions to bring them to life.
Consider the need for autonomy alongside the desire for consistency, lest we degenerate into a national monopoly that quashes innovation in assessment.
Remove the three ‘states’ model and encourage assessment systems that capture and use more information to represent the true spectrum of students’ achievements.
At Google, employees have their famed 20% time, where they work on projects of their choice that fall outside the scope of their usual job.
In Drive, Daniel Pink tells the story of Atlassian, a software company who run quarterly FedEx days. On each of these days employees have 24 hours to work on any project of their choosing that relates to the company’s products.
Institutional innovation seems common in the computing sector – so why is it not in education?
Barriers to Innovation
Hierarchy – most schools are built on a hierarchical structure. They will differ in how rigid this is, but I’ve not heard of many where dissent and challenge are actively encouraged. Innovation is a process of “creative destruction”. It can only take place where the hierarchy allows elements of the status quo to be challenged and creatively destroyed.
Time – innovation takes time. Unlike a software company, schools cannot opt to stop working for a day, or afford to reduce timetables by 20% on account of speculative endeavours.
Orthodoxy – almost all teachers have been trained in the same dominant orthodoxy, and are used to being told that particular strategies are ‘right’. As a profession we have, until quite recently, been discouraged from thinking independently and challenging orthodoxy.
Silos – teachers tend to work in silos. Whether they be classrooms, departments, or even whole schools, the physical and organisational structures of education encourage teachers to work in silos rather than cross boundaries into other areas.
The Desire to Innovate
Schools are full of creative potential. Teachers know their students and their needs better than anyone else, and are best placed to drive the ideas and initiatives needed to improve their life chances. These barriers to innovation must be overcome. At WA we strongly believe that the best ideas will come from staff, and have begun trying to shape our culture of innovation.
Innovation Day
The first inset day this January was our first Innovation Day. Every member of staff was given the day to work on an idea or project of their choice. The only constraints were that:
It must contribute to the mission of the school.
It must have an impact beyond an individual teacher’s practice.
It cannot assume funding from the school budget.
Staff were in one of two streams. Developers submitted a project in advance, and got the day to work on bringing it closer to fruition. Over twenty-five projects were submitted, involving over sixty staff.
Innovators began the day with problem-solving around our five strategic priority areas, looking for the biggest underlying barriers and ways to combat them. Groups formed around good ideas, and they spent the rest of the day building these into more concrete plans.
The range and quality of innovations was incredible. To give a quick flavour we had:
A cross-curricular think tank founded, to develop stronger links and synergies across subjects.
A community World Cup programme designed, to open the school up as a community hub and take the opportunity to enthuse children of all ages about different subjects through football.
A programme for improving questioning developed, using a specially designed structure of lesson observation to pick out key successes and areas for development.
A new programme to develop presentation skills to be delivered through tutor time.
An improved induction programme for vulnerable students to make sure they settle in and succeed to the best of their abilities.
And about thirty more!
Lifting Barriers
Innovation Day worked because it lifted the above barriers.
Hierarchy – we explicitly said that anything goes, and no area of the school was off limits. To make this easier, senior leaders did not join other staff in their rooms, so that challenge could flow more freely. Instead, groups that wanted the advice of leadership booked consultation slots to go through their ideas.
Time – we freed up one day. One day is not enough, but it is a start!
Orthodoxy – all teachers were encouraged to challenge orthodoxy. Innovators had displays of prompts for their problem-solving, including things such as a table of Hattie’s effect sizes, and Prof Rob Coe’s great scatter graph. These prompted a challenge to some of the orthodoxies we have grown used to accepting.
Silos – staff chose the groups they worked in, but never fell back into silos during the day. Developers were roomed with projects tackling similar problems from different departments, and innovators were mixed from the start. Activities such as Idea Speed Dating created opportunities for further discussion outside of traditional school silos.
Lifting these barriers, just for a day, unleashed a huge amount of creative energy and has led to fantastic innovations to improve our students’ futures. Our challenge now is further minimising barriers in the longer term, so that innovation becomes part of our culture rather than an annual event.
On performance related pay I am a believer in principle but a sceptic in practice. After reading Policy Exchange’s report published yesterday, “Reversing the Widget Effect”, I remain so. However I am coming to believe that PRP can be rescued, and that a more flexible and transparent system could help teachers to improve by improving the quality of professional development in schools.
This is a heated topic of conversation, and far too closely tied to mistrust of the political establishment and insinuations about privatising education. This much is evidenced by the disparity between two recent polls on PRP: when YouGov asked on behalf of Policy Exchange, 89% of teachers were in favour of PRP in principle; when YouGov asked on behalf of the NUT in a survey about the government’s reforms, 81% were against PRP. Context here is king, and separating PRP from opinions about Michael Gove’s personal integrity is essential if we’re to have any semblance of rational debate.
PRP in Principle
The foreword to Matthew Robb’s report is written by George Parker, a former US union leader turned advocate of PRP. Branded a traitor by teaching unions in the States, Parker recounts a lightbulb moment he had after delivering a speech at a “high poverty primary school”. He writes that:
“Afterwards, a little girl came up to me and hugged me, and said that no-one had ever said that before. No-one had ever been fighting for them to get a better education. And in the car on the way back, I realised: you lied. You lied to that little girl. Because I didn’t really care about her, and getting good teachers in front of her. In fact, I’d just spent $10,000 to overturn a firing and keep a bad teacher in that school – a bad teacher I would not want anywhere near my own granddaughter…”
The PX Report devotes a lot of time to addressing this ‘in principle’ case, that it is almost morally wrong to reward poor or mediocre performance in the same way as good and excellent performance. I do strongly agree with their argument here. We should be doing everything possible to ensure that all children receive the best education, and as the biggest determinant of that is the teacher they have, we should be putting all of our effort into improving teaching. If tying together pay and accountability makes even a marginal difference to student outcomes, then in principle we should be accepting PRP.
The Status Quo is Inadequate
The first step in Robb’s argument is that the apparently performance related status quo has ceased to reward performance. He references a report finding no relationship between the Ofsted quality of teaching grade a school is given and the average teaching salary in that school, and shows us the distribution of pay bands within schools of different Ofsted ratings. This evidence is damning. A pay system that has no relationship with performance is wasting taxpayers’ money.
Nor can it be argued that experience or tenure is a good proxy for performance. Do First Impressions Matter?, a recent paper by Atteberry, Loeb and Wyckoff, shows that of teachers whose first year performance is in the lowest quintile, 62% remain in the bottom two quintiles five years later. More worryingly they show that although the gap between the top and bottom quintiles closes, this is not just because the bottom quintile get better but because the top quintile actually get worse, with those in between largely stagnating.
With no evidence to suggest that the current system either is or should be working as we desire, in principle we should be looking for a new one.
The In-Principle Argument for PRP
There seems to me to be a reasonable causal chain, backed up by evidence, from well-implemented PRP to better student outcomes: PRP raises teachers’ extrinsic motivation, so they exert greater effort; this leads to more deliberate practice, which in turn leads to better student outcomes.
i. Raising extrinsic motivation
As Robb recognises, “it is not in doubt that for the majority of teachers, the primary motivation is to help their pupils progress”. Nonetheless even the most virtuous of teachers can be influenced to some extent by external factors, of which pay is one. The actual evidence on the relationship between teacher pay and teacher effectiveness is mixed. Few teachers cite pay as a motivation for entering the teaching profession, yet many cite it as a reason for leaving. Comparative international studies show that countries where teacher pay is higher have better student outcomes, but they do not conclusively show that a performance aspect of this pay is significant.
This is definitely the weakest link in the PRP causal chain. The most robust element of Robb’s argument is that higher pay, through PRP, would attract and retain good teachers who would otherwise either not enter teaching or leave it. This is undoubtedly a positive effect, but I question whether this effect alone is enough to warrant the effort that implementing PRP would require. Rather I am compelled by Dylan Wiliam’s argument that improving the quality of entrants into the teaching profession will take a long time to have a relatively small effect, and therefore that “the key to improvement of educational outcomes is investment in teachers already working in our schools”. I am unaware of any evidence suggesting that there would be a sufficiently large influx of suitably talented new teachers under a new pay regime to undermine Wiliam’s argument.
More compelling, but less well evidenced, is the claim that PRP could increase the extrinsic motivation of teachers in schools. Nonetheless it seems to me that building teacher performance into the formal accountability proceedings of a school, tied to a teacher’s progression up the pay scale, cannot fail to increase the incentives for teachers to improve their performance. Not only this, but it places a much greater pressure on the school to improve its teachers (more on this later on). I believe, as I will argue later, that even if the impact on the motivation of teachers were to be minimal (although much evidence does suggest otherwise, as Robb discusses), the impact on school processes would be enough to drive the improvement we seek.
ii. Deliberate practice
The second causal leap in the above chain is that increased motivation leads to increased deliberate practice. Much has been written about the role of deliberate practice in improving performance across domains. The canonical violinists study showed how practice, not talent, was the determinant of a great violinist, and although more recent evidence has shown the role of innate talent in some physical pursuits, deliberate practice still reigns in most other domains. Teaching is one of them, as discussed in Alex Quigley’s blog on applying deliberate practice to become a better teacher.
If deliberate practice improves teaching quality then the leap to better student outcomes is a straightforward one. Robb references research showing that the difference between a teacher in the 25th percentile and a teacher in the 75th percentile is 0.4 GCSE points per subject, whilst the difference between the 5th and 95th percentiles is 1 whole GCSE point per subject.
The causal chain from PRP to better student outcomes works in principle, and as George Parker argues, we have a moral obligation to take that very seriously indeed.
PRP in Practice
Robb’s argument for PRP hinges on a school’s ability to accurately measure teacher performance. Using the results of the Measures of Effective Teaching (MET) project, Robb dismisses the claim that teaching quality cannot accurately be measured. He does so too hastily.
The MET results are certainly positive, and have taught us a great deal about measuring effective teaching. Of particular interest for me was the significant predictive power of student surveys, something I’m confident would not be particularly popular with teaching unions. Robb argues, based on the MET results, that an appropriately weighted basket of measures, preferably averaged over two years, would be sufficiently accurate to determine a teacher’s pay.
I am less convinced. Robb’s report includes a table (below) comparing teacher effectiveness by quintile in two consecutive years. It finds that “the variance is such that only half the teachers assessed as being in the lowest quintile of performance in one year are in the lowest two quintiles the following year – and a third of those assessed as being in the top quintile in one year have moved to the lowest two quintiles as well!”
Even the most reliable measure in the MET study (an equally weighted basket of state test results, observations, and student surveys) only had a reliability of 0.76, and this is using observations where observers have been specially trained and certified in a far more rigorous system than anything commonly used in Britain. Indeed Wiliam quotes research showing that to achieve a reliability of 0.9 in assessing teacher quality from observation a teacher would have to be observed teaching six different classes by five independent observers. This is hardly a viable proposition.
Although Robb is willing to write off these difficulties by arguing for averages over greater periods of time, or focusing on extreme performance, neither of these are good enough solutions to the reliability problem. As he himself argues, for PRP to be workable it needs “a solid performance evaluation system that teachers support”. A system where a third of teachers fluctuate from the top to the bottom each year is neither solid, nor likely to be supported.
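A rough back-of-the-envelope calculation illustrates why averaging over two years helps less than hoped. If we make the generous assumption that successive years behave like parallel measurements, the Spearman–Brown formula (my illustration, not a figure from the report) gives the reliability of a two-year average:

```latex
\rho_k = \frac{k\rho}{1 + (k-1)\rho}, \qquad
\rho_2 = \frac{2 \times 0.76}{1 + 0.76} \approx 0.86
```

Even then, two years of the MET study’s best composite measure would still fall short of the 0.9 mentioned above.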
Squaring the Circle: Professional Development Targets
Although I am sceptical of PRP as suggested in the Policy Exchange report because of its reliance on unreliable measures of teacher quality, I am reluctant to throw away the potential to improve student outcomes through the use of pay reform. The clearest lever by which this would work is improving professional development.
Wiliam identifies that teachers, on the whole, stop improving after two or three years in the profession. He suspects, as do I, that this is strongly linked to the poor availability of good-quality feedback for teachers post-qualification. Deliberate practice is hard without feedback. Where we differ is on how to improve the feedback cycle for teachers to better support good quality deliberate practice. Wiliam so far is relying on the goodwill of schools. Although this might be enough for some schools, it will not be enough for all. PRP could be the way to radically improve the support schools give their staff in order to become more effective teachers. The combination of upward pressure from teachers demanding the support they need to improve, and downward pressure from regulators demanding an improvement in more accurately measured teacher quality, is significant and powerful enough to change the face of professional development in most schools.
i. Upward pressure from teachers
As Robb argues, teachers who are judged on their performance will demand better feedback, coaching and training. They will insist on frequent, good-quality feedback that helps them to improve, and schools will be compelled to provide this. Once a teacher is given appropriate feedback they are much more able to improve through a cycle of deliberate practice, and to therefore improve the performance of the students they teach.
ii. Downward pressure from administrators
Robb writes that “The implementation of performance-related pay will require Heads and senior managers to undertake more rigorous performance evaluations of their staff…[this] will also force managers to more explicitly acknowledge the range of teacher performance in their school and act on it.” Once a school has explicitly measured the quality of teaching as part of a more rigorous framework, it will be compelled – by Ofsted and by governors – to do more to improve it.
My question is whether a system of PRP can be designed that replaces the attempted measurement of objective performance with more of a focus on development. Could we, for example, set and more accurately measure specific targets related to a teacher’s improvement, rather than try to measure their ethereal ‘effectiveness’? Poorly measured effectiveness is not transparent, so does not help a teacher to improve. The measure fails Robb’s own criterion. Drawing up a set of clear but demanding targets on the basis of student performance data, (better) observation and student surveys would provide transparent objectives for teachers to meet. The involvement of pay would cause teachers to demand, and schools to offer, the support and feedback needed for deliberate practice, which in turn would improve student outcomes.
Conclusion
Performance related pay works in principle. It has great potential to improve student outcomes by encouraging and supporting deliberate practice amongst teachers. However systems attempting to measure teacher effectiveness are not sufficiently reliable for pay to be based on. Their unreliability would create confusion and unpopularity, which undermine the central arguments for PRP. A better system is for schools to take advantage of PRP powers to strengthen performance management, and use clear, demanding and evidence-based targets to improve teacher effectiveness. By combining teachers’ increased extrinsic motivation and schools’ increased pressure to provide good-quality support, teachers will become more effective and student outcomes will improve.
This is the second of three posts reflecting on my first term as Curriculum Lead for Maths. The last post, on our new post-levels assessment system, can be found here.
A New Key Stage 3 Curriculum
The curriculum is so much more than a statement of what is to be taught and when. It embodies a school’s vision for its students and its philosophy of learning. I can look at a school’s mathematics curriculum and tell you all about the person who wrote it – their expectations of students, their hopes for their futures, their beliefs about how to get there. The curriculum is the embodiment of all these things, and it is crucial to get it right.
To begin writing a curriculum you must begin from a vision of the mathematicians you want your students to become. Mine is that I want our students to become “knowledgeable problem-solvers who relish the challenge Mathematics offers”. They should, at the end of their time with us, be able to independently tackle an unfamiliar mathematical problem and create a meaningful solution to it.
I am mindful, when considering this vision, of the roaring debate around discovery and project-based learning, and how it can fall foul of Willingham’s novice-expert distinction. My view is this:
Knowing that students begin novices, the purpose of education is to make them into experts.
The curriculum needs to train students. We cannot assume expert qualities of them from the start, and plunge them into investigations where many or most students will fail to learn. Similarly, we cannot dogmatically write off any activity involving discovery, investigation or project work. I emphatically want my students to be capable of expert investigation when they leave school, and so our curriculum must explicitly prepare them for that. The last part of this post in particular looks at how we manage this in a practical way.
From the vision of how students should leave school, I drew up three design principles:
1) The curriculum must develop fluency.
2) The curriculum must develop conceptual understanding.
3) The curriculum must teach students to solve problems.
Principle 1: Developing Fluency
When I wrote the last iteration of our school’s KS3 Maths curriculum, I abandoned the traditional spiral structure and opted for a depth before breadth approach. We probably halved the amount of content covered in a year, as we wanted to give students the time they needed to develop fluency. This year we’ve cut it again. Each of the six terms covers a maximum of three ‘topics’, most of which are closely linked. Terms in Year 7, for example, look like this:
Mental addition and subtraction; Decimal addition and subtraction; Rounding
Mental multiplication and division; Decimal multiplication and division; Factors and multiples
Understanding fractions; Operations with fractions
Generalising with algebra (expressions and functions only)
Properties of 2D shapes; Angle rules
Equivalence between fractions, decimals and percentages
Smaller concepts that tie closely with the big ones above are taught alongside them. For example, perimeter is taught in Term 1 alongside addition, while area and the mean average are both taught in Term 2 alongside multiplication and division.
Since we give so much time to teaching each mathematical skill, we expect a high degree of fluency. To take the National Curriculum’s definition, fluency is students’ ability to “recall and apply their knowledge rapidly and accurately to problems”. It means not just being able to do something, but being able to reliably do it well and quickly. I would add that a necessary condition for fluency in a skill or operation is that it is embedded in your long-term memory.
This is exceptionally valuable in mathematics. A student may learn to be able to multiply decimals, but not become fluent in it. When multiplying decimals they have to slow down, to stop and think, and may make mistakes. This means that in Term 4 when they are learning to substitute into formulae with decimal numbers, they will face two severe problems. Firstly, their working memory will be occupied thinking about multiplying decimal numbers together, and not about substituting into formulae. Secondly, their reduced pace will mean that they have less exposure to substituting into formulae in each lesson. Overall they will spend less time thinking about the new concept they are supposed to be learning, and will learn it less well as a consequence (after all, “memory is the residue of thought”).
Developing fluency then, means lots of practice time with well thought out problems. Practice has got a bad reputation in mathematics, with too many people having been turned off maths by pages of repetitive textbook questions. My response is that practice need not mean making maths dry or uninspiring. Our students appreciate the value of practice as something that gives them the skills to do fun maths, and to achieve things they are proud of. Practice is invaluable, but can be dangerous if not used alongside meaningful and motivating problems.
If fluency is about rapid and accurate recall of knowledge, then Kris Boulton will tell you that fluency depends on high storage strength and high retrieval strength. A depth-focused curriculum gives us storage strength, but could easily sabotage retrieval strength if knowledge is not revisited. This is probably our biggest area to work on. The curriculum includes notes about what content to revisit when (thanks Kris!) and our assessments presume previous content as mastered prior knowledge. However we haven’t yet found a more structured way of revisiting content consistently across classes.
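As one possible shape for that structure, here is a minimal sketch of an expanding-interval revisit schedule. The intervals and topics are purely illustrative; this is not a system we currently run.

```python
# Hypothetical sketch: expanding-interval revisit points for each topic,
# counted in teaching weeks. Intervals and topics are illustrative only.
def revisit_weeks(taught_week: int, gaps=(1, 3, 7, 14)) -> list[int]:
    """Weeks in which a topic first taught in `taught_week` gets revisited."""
    return [taught_week + gap for gap in gaps]

topics_taught = {"Operations with fractions": 2, "Angle rules": 5}
for topic, week in topics_taught.items():
    print(topic, "->", revisit_weeks(week))
# Operations with fractions -> [3, 5, 9, 16]
# Angle rules -> [6, 8, 12, 19]
```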
Principle 2: Conceptual Understanding
It is not enough for a curriculum to say what to teach. A meaningful curriculum also says how to teach it. At WA we’re big believers in the Singaporean approach of concrete-pictorial-abstract (CPA), and use this to structure our teaching. One of the reasons mathematical understanding in Britain is historically so poor is that students have been immediately confronted with abstract representations, far removed from any concrete reality, without enough support in understanding them.
A favourite example of mine is ratio. I meet strikingly few students who can answer a question of the following type correctly:
“Bill and Ben share sunflower seeds in the ratio 3:2. If Ben has 20 sunflower seeds, how many does Bill have?”
I’d love to do some research and rely on less anecdotal evidence, but I’d guess that more British 16 year olds would say 12 than would say the correct answer of 30. Why? Because they were taught ratio in a completely abstract way, where they learned to apply a method but didn’t ever receive the support needed to understand the concept of ratio.
In our curriculum, however, the pictorial bar model is central to teaching ratio. In fact I don’t teach my students an abstract method (they’re perfectly capable of coming up with it for themselves by doing the bars mentally and writing down calculations). For the unfamiliar, a bar model to represent the above problem would look like this:
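In rough text form, with each block worth 10 seeds:

```
Bill  [10][10][10]   = 30
Ben   [10][10]       = 20
```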
Students draw the ratio, label what they know, work out the size of each block and then the size of Bill’s bar. I am yet to find a student who doesn’t understand this method, or who can’t do considerably harder problems using it. This is the benefit of having a pictorial representation to help students understand the concept they are learning, and to soften the jump into pure abstraction. Every topic in our curriculum comes with CPA guidance to develop strong conceptual understanding in all students.
Also key to developing conceptual understanding are the links between areas of mathematics. I am eternally frustrated by how students see maths as broken down into small discrete chunks that have little or no relationship with one another. Even when two topics are just different representations of identical concepts (sequences and linear graphs, for example), few British students will ever see them as linked. At the core of our curriculum, then, is a sequence carefully designed to make every concept learned useful to a later one. More than this, it guides teachers to make links, and uses assessment to make sure students are comfortable making these links.
Principle 3: Problem-solving
Mathematics is essentially the study of problem-solving. The process of mathematical abstraction has been followed for millennia because it is so useful for generalising and solving what the National Curriculum calls “some of history’s most intriguing problems”. If our students are to become the experts we want them to be when they leave, we need to train them in problem-solving now.
For me, problem-solving is a skill to be taught, and it should be taught like any other. Adept problem-solvers have not come to be so through innate talent, but because they have seen the solutions to many problems before and are able to spot similarities and apply familiar techniques. Our curriculum aims to teach students the most powerful problem-solving techniques by exposing them to a carefully selected sequence of problems, some of which are taught and some of which are independently worked on.
Each term has a problem-solving focus. For example, Term 1 was “Working systematically”. Students began with a problem where they had to work out how many different possible orders there were for a two-course and then a three-course set menu at a restaurant. They began using ordered lists to write out combinations, before speculating on general rules and checking them on new possibilities. Through a range of different problems in the term students learned (a) how to work systematically in different contexts, and (b) the value of doing so.
Conclusion
Our curriculum has definitely met the three design principles set out, and is working well for our students. Depth before breadth has given them time to become fluent, to develop conceptual understanding and to solve problems. They see the value in mathematics as they’re exposed to interesting and meaningful problems. However this is done in a deliberate and structured way to make sure they are learning throughout. By applying the concrete-pictorial-abstract principle throughout we make sure that all students can interact with the concepts they’re learning and develop their understanding to a deeper level.
For me, we have two key things to work on after Christmas. Firstly, the revisiting of prior knowledge. We need to keep retrieval strength high, and must find a more structured way of doing this. Secondly, developing the guidance we give for teaching, particularly around drawing links between areas of maths. Although this happens well it is not yet a big enough part of our formal curriculum documents, which risks it slipping away in future.
I’ve been fairly absent from blogging/Twitter since the summer – an inevitable consequence of taking up a few new roles amidst the discord of new systems and specifications emerging from gov.uk with increasing regularity. But I don’t mean that as a complaint. Much that was there was broken, and much that is replacing it is good. Although life in the present discord is manic and stressful, it is also a time of incredible opportunity to improve on what went before, and to rework many of the systems in teaching that went unquestioned in schools for too long.
This Christmas I’m stopping to reflect on the term gone by, and on our efforts to improve three areas: Assessment, Curriculum, and Teaching & Learning. There are many failures, many ideas that failed to translate from paper to practice, but also a good number of successes to learn from and develop in January.
A Blank Slate
KS3 SATs died years ago. National Curriculum levels officially die in September, but can be ‘disapplied’ this year. With tests and benchmarks gone, there is a blank slate in KS3 assessment. This is phenomenally exciting. Levels saturated schools with problems – they were a set of ‘best fit’ labels, good only for summative assessment, that got put at the heart of systems for formative assessment. No wonder they failed.
At WA we decided to try building a replacement system, trialled in Maths, that could ultimately achieve what termly reporting of NC levels never could. We began with three core design principles:
1) It has to guide teaching and learning (it must answer the question “what should I do tonight to get better at Maths?”).
2) It has to be simple for everyone to understand.
3) It has to prepare students for the rigour of tougher terminal exams and challenging post-16 routes.
Principle 2 led us to an early decision – we wanted a score out of 100. This would be easy for everyone to understand, and by scoring out of 100 rather than out of a small number we are less likely to have critical thresholds where students’ scores are bunched and where disproportionate effort is concentrated. Scoring out of 100, we felt, would always encourage a bit more effort on the margin in a way that a GCSE with eight grades fails to do.
Principle 1 led us to another early decision – we need data on each topic students learn. Without this, the system will descend into level-like ‘best fit’ mayhem, where students receive labels that don’t help them to progress. Yet there’s a tension here between principles 1 and 2. Principle 1 would have data on everything, separated at an incredibly granular level. However this would soon become tricky to understand and would ultimately render the system unused.
For me, Principle 3 ruled out using old SATs papers and past assessment material. These were tied to an old curriculum that did not adequately assess many of the skills we expect of our students. They also left too much of assessment to infrequent high-stakes testing, which does not encourage the work ethic and culture of study we value.
These three principles guided our discussions to the system we have now been running since September.
Our System
The Maths curriculum in Years 7-9 (featured in the next post) has been broken down into topics – approximately 15 per year. Each of these topics is individually assessed and given a score out of 100. This score is computed from three elements: an in-class quiz, homework results, and an end-of-term test. Students then get an overall percentage score, averaged from all of the topics they have studied so far. This means that for each student we have an indication of their overall proficiency at Maths, as well as detailed information on their proficiency at each individual topic. This is recorded by students, stored by teachers, and reported to parents six times a year.
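As a minimal sketch of the arithmetic, assuming the three elements are weighted equally (the actual weighting isn’t specified here):

```python
# Sketch of the topic-score and overall-score arithmetic, assuming equal
# weighting of quiz, homework and test (an assumption, not our exact formula).
from statistics import mean

def topic_score(quiz: float, homework: float, test: float) -> float:
    """Combine the three marked elements (each out of 100) into a topic score."""
    return mean([quiz, homework, test])

def overall_score(topic_scores: list[float]) -> float:
    """A student's running average across every topic studied so far."""
    return mean(topic_scores)

topics = [topic_score(80, 70, 75), topic_score(60, 65, 55)]
print(topics)                 # [75, 60]
print(overall_score(topics))  # 67.5
```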
Does it work?
Principle 1: Does it guide teaching and learning?
Lots of strategies have been put in place to make sure that it does. For example, the in-class quiz is designed to be taken after the material in a topic has been covered but before teaching time is over. The results are used to guide reteaching in the following lessons so that the students can retake with another quiz on that topic and increase their score. Teachers also produce termly action plans as a result of their data analysis, which highlight the actions needed to support particular students as well as adjustments needed to combat problematic whole class trends.
Despite this, we haven’t yet developed a culture of assessment scores driving independent study. Our vision is that students know exactly what they have to do each evening to improve at Maths, and I believe that this system will be integral to achieving that. We need a bigger drive to actively develop that culture, rather than expecting it to come organically.
Extract from the Year 7 assessment record sheet.
I’m also concerned that assessment at this level has not yet become seen as a core part of teaching and learning. Teachers are dedicated in their collection and recording of data, and have planned some brilliant strategies for extending their students’ progress. But it still just feels like an add-on, something additional to teaching rather than at the heart of it. One of our goals as a department next term must be to embed assessment data further into teaching; not to be content with it assisting from the side.
Principle 2: Is it easy to understand?
Unequivocally yes. Feedback from parents, tutors and students has been resoundingly positive. Each term we report each student’s overall score, as well as their result for each topic studied that term. One question for the future is how to make all past data accessible to parents, as by Year 9 there will be 40+ topics worth of information recorded.
Principle 3: Is it rigorous enough?
By making the decision to produce our own assessments from scratch we allowed ourselves to set the level of rigour. I like to think that if anything we’ve set it too high. We source and write demanding questions to really challenge students, and to prepare them to succeed in the toughest of exams. A particular favourite question of mine was asking Year 8 to find the Lowest Common Multiple of pqr and pq^2, closely rivalled by giving them the famed Reblochon cheese question from a recent GCSE paper.
The Reblochon cheese question – a Year 8 favourite.
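For anyone checking the LCM question: take the highest power of each factor appearing in either expression, so

```latex
\operatorname{lcm}(pqr,\; pq^{2}) = pq^{2}r
```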
Following the advice of Paul Bambrick-Santoyo (if you haven’t read Leverage Leadership then go to a bookshop now) we made all assessments available when teachers began planning to teach each topic. This has been a great success, and I’ve really seen the Pygmalion effect in action. By transparently raising the bar in our assessments, teachers have raised it in their lessons; and students have relished the challenge.
Verdict
This assessment system works. It clearly tells students, teachers and parents where each individual is doing well and where they need to improve. Nothing is obscured by a ‘best fit’ label, yet the data is still easy to understand. Freeing ourselves from National Curriculum levels freed us from stale SATs papers and their lack of ambition. Instead we set assessments that challenge students at a higher level – a challenge they have met. The next step is making data and assessment a core part of teaching. Just like NC levels were once a part of every lesson (in an unhelpful labelling way), the results of assessment should be central to planning and delivering each lesson now.
“If you know yourself but not the enemy, for every victory gained you will also suffer a defeat.”
Sun Tzu, The Art of War
The persistence of the achievement gap is in part down to its mysterious nature. Teachers, new and old, battle their way through classrooms trying to defeat this enemy, doing everything they can to close the gap. But do we really know what we’re fighting? We can all give reasons why the achievement gap exists – hearing more words when growing up, fewer adverse experiences, more opportunities, greater intellectual stimulation from parents, better surroundings for working, etc, etc, etc. Lists like these give us a sense of the scale and variety of the problem. However they do little to help us solve it. They’re too big, full of too many vaguely related things, and far too complex for an individual teacher to use to build a strategy.
It’s time for a bit of synthesis. We need to boil the problem down into one simple idea; one that is straightforward enough to apply in every classroom, yet powerful enough to close a tremendously persistent gap.
I would argue that the achievement gap is little more than a practice gap.
In recent years our understanding of what it takes to be successful has come a long way, and numerous pieces of research* point to one single defining cause of success – deliberate practice. We know that natural talent, whatever that may be, is a fairly insignificant factor in success. What matters more is the volume and quality of practice in a field. From Tiger Woods to Mozart, the world’s most prodigious talents are actually the world’s most committed practicers.
All of the influences listed at the start of this post, the influences often blamed for the achievement gap, are in some way influences on practice. They shape either its quantity or its quality. Rather than trying to tackle each of these separately and being overwhelmed by the scale of the problem, teachers should be empowered by seeing the problem for what it is – a practice gap. Children from lower socio-economic backgrounds get worse academic results than their wealthier peers because they have less deliberate practice.
Defining the problem in these terms gives teachers a new challenge:
How do I maximize the quantity and quality of practice my students get in my subject?
Doing so has three major advantages:
A clear problem is easier to solve. Looking separately at all of the different aspects of a problem is confusing and overwhelming. Looking straight to its heart is empowering. Teachers closing an achievement gap have to undo a host of past problems and effects. Teachers closing a practice gap have to maximize deliberate practice.
Two criteria to judge solutions. Every idea, new or old, is judged by asking two questions. How much does this increase the quantity of practice? How much does this increase the quality of practice? If there’s not a resoundingly positive answer to one of these questions, it’s not closing the gap.
It unites teachers around a common problem. When the problem is unclear teachers all see it differently, and use different criteria to judge solutions. Sharing a common understanding of the problem makes conversation more productive, improves the quality of ideas, and aligns teachers towards one specific aim.
I’m going to write two follow-up posts about the practice gap – one on quantity and one on quality. The aim is to provoke some thought about priorities for the classroom, and to guide myself into next year with some specific targets.
The nation is faced with a huge gap. It is a gap into which millions have fallen, and will fall, unless we are able to close it. It is why the UK has some of the worst social mobility in the world, and why your parents’ wealth is such a powerful predictor of your educational success. It can be represented in many ways, and seen through many lenses, but at its heart, it is just a practice gap.
*Some great books in this genre include Practice Perfect by Doug Lemov, Talent is Overrated by Geoff Colvin, and Outliers by Malcolm Gladwell. Dweck’s work on mindset is also pretty influential.
Next year I am launching a character development curriculum in school. I am incredibly excited. To me, this represents a step change in what we do – progressing from a school that gets people good qualifications to a school that ensures every child will become a successful adult.
But when I talk to people about this curriculum I fill with frustration when their response almost invariably includes a phrase such as “wishy-washy”, “hand-wavey” or “soft and fluffy”. This blog is my response to those comments. Primarily, it is about why not running this curriculum is the hand-waviest option of all.
What is character development?
The character development curriculum is designed to ensure every student leaves school prepared to become a successful adult. Just as we are failing children if they leave without an adequate level of literacy, we are failing them if they cannot be resilient in the face of difficulty. Our curriculum is focused on developing the key traits we believe students need to become successful adults in the future.
The ‘hand-wavey’ status quo
The status quo is the very definition of hand-wavey. This pejorative is usually levelled at the sort of lessons that ask students to work, without guidance or rigour, in some faddish way that assumes absorption of skills or content simply by osmosis.
This is our system’s current approach to character traits. We ask students to work on something else, without guiding them or applying rigour, and assume that by doing this they will develop the traits they need to be successful. Students are expected to learn resilience without their teachers thinking about it, teaching for it, or assessing it. No wonder so many fail to develop it.
A rigorous curriculum
A rigorous curriculum is one that is uncompromising. It has challenging objectives, and demands that students meet them. It will not leave this process to chance, but focuses relentlessly on preparing students to be successful. This is the basis of the character development curriculum.
The curriculum is our commitment that we will not leave our students’ development of the most crucial skills to chance. We will teach them key content – through theory and examples – so that they have the founding knowledge to understand what the traits really are. We will tweak and arrange lessons and routines so that students are applying these traits to different situations, creating new experiences where necessary. We will also assess their progress, grading them clearly and taking their performance seriously.
This curriculum is not wishy-washy, hand-wavey or soft and fluffy. Students must know content from Aristotle to Seligman, understand it in a variety of contexts, and apply it to the vast range of situations they face. They must be able to examine their thoughts and actions in great detail, drawing comparisons between themselves and the theory they know. It encompasses a myriad of content and skills, and involves practising the use of them all in a very deliberate fashion.
This is a rigorous curriculum. Anything less is hand-wavey.