Monthly Archives: February 2014

Good, But Not There Yet – A verdict on the NAHT report on assessment

I tried to discuss this on Twitter with Sam Freedman, but as his blog title points out, sometimes 140 characters isn’t enough…

The NAHT recently released the findings of their commission on assessment. They have attempted to set out a general framework for assessing without levels, including 21 recommendations, their principles of assessment, and a design checklist for system-building. All in all the report is a good one, capturing some of the most important principles for an effective system of assessment. However there are some significant problems to be fixed.

Firstly, the report relies on ‘objective criteria’ to drive assessment, without recognising that criteria cannot be objective without assessments bringing them to life. Secondly, the report places a heavy emphasis on the need for consistency without recognising the need for schools to retain the autonomy to innovate in both curriculum and assessment. Thirdly, the report advocates assessment that forces students into one of only three boxes (developing, meeting or exceeding), instead of allowing for a more accurate spectrum of possible states.

Here are my comments on some of the more interesting aspects of the report.

Summary of recommendations

4. Pupils should be assessed against objective and agreed criteria rather than ranked against each other.
This seems eminently sensible – learning is not a zero sum game. The potential problem with this, however, is that ‘objective criteria’ are very rarely objective. In “Driven by Data”, Paul Bambrick-Santoyo makes a compelling case that criteria alone are not enough, as they are always too ambiguous on the level of rigour demanded. Instead, criteria must be accompanied by sample assessment questions that demonstrate the required level of rigour. So whilst I agree with the NAHT’s sentiment here, I’d argue that a criteria-based system cannot be objective without clear examples of assessment to set the level of rigour.

5. Pupil progress and achievement should be communicated in terms of descriptive profiles rather than condensed to numerical summaries (although schools may wish to use numerical data for internal purposes).
Dylan Wiliam poses three key questions that are at the heart of formative assessment. 

  • Where am I? 
  • Where am I going? 
  • How am I going to get there?

A school assessment system should answer these three questions, and a system that communicates only aggregated numbers does not. Good assessment should collect data at a granular level so that it serves teaching and learning. Aggregating this data into summary statistics is an important, but secondary, purpose.

7. Schools should work in collaboration, for example in clusters, to ensure a consistent approach to assessment. Furthermore, excellent practice in assessment should be identified and publicised, with the Department for Education responsible for ensuring that this is undertaken.
The balance between consistency and autonomy will be the biggest challenge of the post-levels assessment landscape. Consistency allows parents and students to compare between schools, and will be particularly important for students who change schools during a key stage. Autonomy allows schools the freedom to innovate and design continually better systems of assessment from which we all can learn. I worry that calls for consistency will degenerate into calls for homogeneity and a lowest-common-denominator system of assessment.

18. The use by schools of suitably modified National Curriculum levels as an interim measure in 2014 should be supported by government. However, schools need to be clear that any use of levels in relation to the new curriculum can only be a temporary arrangement to enable them to develop, implement and embed a robust new framework for assessment. Schools need to be conscious that the new curriculum is not in alignment with the old National Curriculum levels.
Can we please stick the last sentence of this to billboards outside every school? I really don’t think this message has actually hit home yet. Students in Years 7 and 8 are still being given levels that judge their performance on a completely irrelevant scale. This needs to stop, soon. I worry that this recommendation, which seems sensible at first, will lead to schools just leaving levels in place for as long as possible. Who’s going to explain to parents that Level 5 now means Level 4 and a bit (we think, but we haven’t quite worked it out yet so just bear with us)?

Design Checklist

Assessment criteria are derived from the school curriculum, which is composed of the National Curriculum and our own local design.
As above, it’s not a one-way relationship from curriculum to assessment – the curriculum means little without assessment shedding light on what criteria and objectives actually mean. The differences between schools’ curricula are another reason that the desired consistency becomes harder to achieve.

Each pupil is assessed as either ‘developing’, ‘meeting’ or ‘exceeding’ each relevant criterion contained in our expectations for that year.
This is my biggest problem with the report’s recommendations. Why constrain assessment to offering only three possible ‘states’ in which a student can be? In homage to this limiting scale, I have three big objections:

  1. Exceeding doesn’t make sense. The more I think about ‘exceeding’, the less sense it makes. If you’ve exceeded a criterion, haven’t you just met the next one? Surely it makes more sense to simply record that you have met an additional criterion than to try to capture that information ambiguously by stating that you have ‘exceeded’ something lesser. For the student who is exceeding expectations, recording it in this way serves little formative purpose. The assessment system records that they’ve exceeded some things, but not how. It doesn’t tell them which ‘excess’ criteria they have met, or how to exceed even further. If it does do this, because it records additional criteria as being met, then what was the point of the exceeding grade in the first place?

    I’m also struggling to see how you measure that a criterion has been exceeded. To do this you’d need questions on your assessment that measure more than the criterion being assessed. Each assessment would also have to measure something else, something in excess of the current criterion. The implication of all this is that when you’re recording a mark for one criterion, you’re also implicitly recording a mark for the next. Why do this? Why not just record two marks separately?

    The NAHT report suggests using a traffic light monitoring system. Presumably green is for exceeding, and amber is for meeting. Why is meeting only amber? That just means expectations were not high enough to start with.

  2. Limiting information. The system we use in our department (see more here) records scores out of 100. My ‘red’ range is 0-49, ‘amber’ is 50-69, and ‘green’ is 70-100. I have some students who have scored 70-75 on certain topics. Yes, they got into the green zone, but they’re only just there. So when deciding to give out targeted homework on past topics, I’ll often treat a 70-75 score like a 60-70 score, and make sure they spend time solidifying their 70+ status (there’s a short sketch of this below). Knowing where a student lies within a range like ‘meeting’ is incredibly valuable. It’s probably measured in the assessment you’d give anyway. Why lose it by only recording 1, 2 or 3?

  3. One high-stakes threshold. Thresholds always create problems. They distort incentives, disrupt measurement and have a knack for becoming way more important than they were ever intended to be. This proposed system requires teachers to decide if students are ‘developing’ or ‘meeting’. There is no middle ground. This threshold will inevitably be used inconsistently.

    The first problem is that ‘meeting’ a criterion is really difficult to define. All teachers would need to look for a consistent level of performance. If it is left to informal assessment there is no hope of consistency. If it is judged by formal assessment, then we should keep the full picture rather than squash a student’s performance into the boxes of ‘meeting’ or ‘developing’.

    The second problem is that having one high-stakes threshold creates lots of dreadful incentives for teachers. Who wouldn’t be tempted to mark as ‘meeting’ the student who’s worked really hard and not quite made it, rather than putting them in a category with the student who couldn’t care less and didn’t bother trying? And what about the incentive to just mark a borderline student as ‘meeting’ rather than face the challenges of acknowledging that they’re not? The farce of the C/D borderline may just be recreated.

A better system expects a range of performance, and prepares to measure it. A Primary School system I designed had five possible ‘states’, whereas the Secondary system we use is built on percentages. By capturing a truer picture of student performance we can guide teaching and learning in much greater detail.
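To make this concrete, here is a minimal sketch of the kind of record-keeping I mean. The band boundaries are the ones from our department’s system; the pupil names, topics and scores are made up purely for illustration.

```python
# Record raw percentage scores per topic; derive red/amber/green bands from them,
# never the other way round. Pupil names, topics and scores are hypothetical.
RED_MAX = 49    # 0-49  -> red
AMBER_MAX = 69  # 50-69 -> amber, 70-100 -> green

scores = {
    "Pupil A": {"fractions": 72, "algebra": 58},
    "Pupil B": {"fractions": 91, "algebra": 74},
    "Pupil C": {"fractions": 45, "algebra": 83},
}

def band(score: int) -> str:
    """Collapse a raw score into a band, for reporting only."""
    if score <= RED_MAX:
        return "red"
    if score <= AMBER_MAX:
        return "amber"
    return "green"

# Because the raw scores are kept, we can still pick out pupils who are only just
# into the green zone (70-75) and target consolidation homework at them.
# A three-state developing/meeting/exceeding record throws that information away.
for pupil, topics in scores.items():
    for topic, score in topics.items():
        if 70 <= score <= 75:
            print(f"{pupil}: consolidate {topic} (scored {score}, only just {band(score)})")
```

The bands are derived from the scores rather than recorded instead of them, so nothing is lost when it comes to deciding who needs intervention.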

Conclusion

I agree with most of the NAHT’s report, and am glad to see another strong contribution to the debate on assessment. However, there are three main amendments that need to be made:

  1. Acknowledge the two-way relationship between curriculum and assessment, and that criteria from the curriculum are of little use without accompanying assessment questions to bring them to life.
  2. Consider the need for autonomy alongside the desire for consistency, lest we degenerate into a national monopoly that quashes innovation in assessment.
  3. Remove the three ‘states’ model and encourage assessment systems that capture and use more information to represent the true spectrum of students’ achievements.

Trying is Risky

This blog is about the most powerful pedagogical lesson I’ve ever learned.

In my first year of teaching I had to write an essay about two underperforming students I taught. I chose two Year 9 boys, both of whom had potential but whose behaviour was stopping them from achieving. I followed the behaviour policy, experimented with all the standard behaviour advice, and had great support from more senior staff, but their learning just wasn’t good enough. In my frustration with the lack of help from the recommended education literature, I turned to a reliable old friend: game theory.

The Model

When coming into a lesson students can make one of two choices: to exert effort, or not to exert effort. In a school with a solid behaviour policy the students who choose not to exert effort may avoid work, complete only the bare minimum, or not spend enough time thinking to remember. In a school without a solid behaviour policy they may cause carnage.

The lesson they are coming into can be one of two things: it can be a good lesson, or it can be a bad lesson. A good lesson is one where a student will learn if they exert effort; a bad lesson is one where they may not.

These two sets of options give us a two by two matrix like this:

[2×2 matrix: the student’s choice (exert effort / don’t exert effort) against the lesson (good / bad), with each cell giving the resulting academic and social outcomes.]

For each pair of inputs there are two outcomes: the student’s level of academic success and their level of social success.

Consider the student’s choice. If they choose to exert effort, they will get either the best or the worst outcome. If the lesson is a good one then they will be both academically and socially successful, having learned in class and appeared capable/talented in front of their peers. However if the lesson is a bad one then they will be both an academic and a social failure. They will not only have failed in learning, but by trying and failing they will be embarrassed as an incapable or unintelligent person.

If a student chooses not to exert effort they receive a certain outcome – academic failure and social success. They have no chance of succeeding academically as they do not try to learn, however their rejection of learning guarantees that they never try and fail – their social status is secure.

So how does a student make their choice? It depends on how likely they think the lesson is to be a good one. Call the student’s perceived probability of the lesson being good p. If p is high, then they’re more likely to choose to exert effort, as it’s more likely they will get the best available outcome.

Risk Aversion

Imagine p = 0.5; that is, the probability of the lesson being good is 50%. In this case would a student choose to exert effort (gambling between the best and worst outcomes) or not to exert effort (accepting a certain, albeit mediocre, outcome)? Most students would, quite rationally, opt not to exert effort. The reason is that they’re risk averse. They’d much rather choose a strategy that guarantees them an okay outcome than one that gambles between a good outcome and a bad one.

Because students are risk averse, p will have to be a high value before they would consider taking the risk of trying in class. Otherwise they’d rather settle for the poor yet certain outcome of academic failure complemented by social success.
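To see how risk aversion raises the bar, here is a minimal sketch of the decision using assumed, purely illustrative utility values. Nothing in the model fixes these particular numbers; the only point is that the more a student values the safe no-effort outcome, the higher p must climb before effort becomes rational.

```python
# Illustrative only: the utility values below are assumptions, not part of the model.
U_BEST = 1.0       # exert effort in a good lesson: academic and social success
U_WORST = 0.0      # exert effort in a bad lesson: academic and social failure
U_NO_EFFORT = 0.7  # never exert effort: academic failure but guaranteed social success
                   # (a risk-averse student values this certain outcome highly)

def expected_utility_of_effort(p: float) -> float:
    """Expected utility of exerting effort when the lesson is good with probability p."""
    return p * U_BEST + (1 - p) * U_WORST

def will_exert_effort(p: float) -> bool:
    """Effort is only rational if its expected utility beats the certain no-effort payoff."""
    return expected_utility_of_effort(p) > U_NO_EFFORT

for p in (0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}: exert effort? {will_exert_effort(p)}")
# With these numbers the student only tries once p climbs above 0.7;
# a student who prizes the safe social payoff even more needs an even higher p.
```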

The goal for teachers is making p as high as possible so that all students, no matter how risk averse they may be, exert effort in school.

What makes p?

Remember that p is the student’s perception of the probability that the lesson will make sure they learn, if they exert effort. It’s not a measure of how good the lesson actually is, or anything to do with the actual quality of teaching. All that matters for the decision to exert effort is the student’s perception. This can be affected by a huge number of variables way beyond the teacher’s control. A very non-exhaustive list is:

  • the student’s self-esteem (p is low if “I can’t do it”)
  • the student’s prior experience of the subject (p is low if “I’ve never been able to learn this”)
  • stereotypes around learning (p is low if “people like me don’t do well at this”)
  • the school culture (p is low if “our school’s no good at this”)

Teacher quality plays a part (p is also low if “this teacher’s rubbish”), but is by no means the whole picture, and is often not the dominant factor.

Raising p

Students reason by induction. Just as they believe that the sun will rise tomorrow because it has always risen before, they believe that they’ll do badly in Maths because they’ve always done badly before. Raising p is about breaking this damaging chain of reasoning, and the only way to do that is by forcing them to experience success. This means planning your lesson to make sure that if they exert any effort at all, they will have some measurable success.

A personal tale

At the start of January I took over a new class, who were pretty disengaged from Maths. Our first lesson wasn’t great – they came in expecting to do badly, and largely met their expectations. p was low. Our lessons since then have been an all-out war of attrition to raise p, and to make sure they believe that if they exert effort they absolutely will succeed. My p-raising lessons have a very distinct structure:

  1. Clearly defined, ambitious lesson objective that seems daunting and will be rewarding if met.
  2. Sub-skills or steps broken down, almost list-like.
  3. Super-clear, often rehearsed explanation of the first step.
  4. Guided practice on mini-whiteboards until everyone can do it.
  5. Independent (timed) practice in books.
  6. Short assessment to prove to them they have achieved that step.
  7. Repeat 3-6 for next steps.
  8. Final assessment to prove to them they have achieved the whole skill.
  9. Repetition of my p-raising mantra – that everything in Maths looks scary and confusing at first, but easy once you’ve learned it.

If this looks remarkably like archetypal Direct Instruction, that’s because it is. The aim of these lessons is not to excite or engage in the popular sense. The aim is to convince all students that if they try, they will learn. Discovery and inquiry have their place, but not when building confidence in fragile learners. Right now, I can’t risk any student not understanding at the end of the lesson.

I worry that too often teachers are encouraged to deal with disengaged classes by engaging them in expert-type activities that leave them too open to the risk of failure, and entrench many students’ pre-existing beliefs that they will not learn even if they try. I emphatically aim to build up to meaningful mathematical inquiry with all my students, but only when they have the confidence to cope with the very real prospect of failure in this.

A Warning

Teaching a student whose p is low is very different to teaching a student whose p is high. The former needs nurturing, confidence-building treatment where they are protected from failure and practically forced to succeed. The latter needs to build their confidence by trying, failing and trying again. Where one type of student needs a tight structure, the other often needs a more open one. The trick is in identifying each type of student, and teaching appropriately to both.

Conclusion

Trying is risky. Lots of students quite rationally decide not to bother in their lessons, because the evidence they have tells them the probability of them doing well isn’t high enough. They’d rather take the certain path of failing academically, but with the social kudos of never having tried. To tackle this disengagement we need to take the risk out of trying. Turning around disengagement means relentlessly ensuring that every lesson ends in success, until confidence is built sufficiently high that trying no longer seems risky.

Innovation Day

How does your school innovate?

At Google, employees have their famed 20% time, where they work on projects of their choice that fall outside the scope of their usual job.

In Drive, Daniel Pink tells the story of Atlassian, a software company that runs quarterly FedEx Days. On each of these days employees have 24 hours to work on any project of their choosing that relates to the company’s products.

Institutional innovation seems common in the computing sector – so why not in education?

Barriers to Innovation

  1. Hierarchy – most schools are built on a hierarchical structure. They will differ in how rigid this is, but I’ve not heard of many where dissent and challenge are actively encouraged. Innovation is a process of “creative destruction”. It can only take place where the hierarchy allows elements of the status quo to be challenged and creatively destroyed.
  2. Time – innovation takes time. Unlike a software company, schools cannot opt to stop working for a day, or afford to cut timetables by 20% to make room for speculative endeavours.
  3. Orthodoxy – almost all teachers have been trained in the same dominant orthodoxy, and are used to being told that particular strategies are ‘right’. As a profession we have, until quite recently, been discouraged from thinking independently and challenging orthodoxy.
  4. Silos – teachers tend to work in silos. Whether they be classrooms, departments, or even whole schools, the physical and organisational structures of education encourage teachers to work in silos rather than cross boundaries into other areas.

The Desire to Innovate

Schools are full of creative potential. Teachers know their students and their needs better than anyone else, and are best placed to drive the ideas and initiatives needed to improve their life chances. These barriers to innovation must be overcome. At WA we strongly believe that the best ideas will come from staff, and have begun trying to shape our culture of innovation.

Innovation Day

The first INSET day this January was our first Innovation Day. Every member of staff was given the day to work on an idea or project of their choice. The only constraints were that:

  • It must contribute to the mission of the school.
  • It must have an impact beyond an individual teacher’s practice.
  • It cannot assume funding from the school budget.

Staff were in one of two streams. Developers submitted a project in advance, and got the day to work on bringing it closer to fruition. Over twenty-five projects were submitted, involving over sixty staff.



Innovators began the day with problem-solving around our five strategic priority areas, looking for the biggest underlying barriers and ways to combat them. Groups formed around good ideas, and they spent the rest of the day building these into more concrete plans.



The range and quality of innovations was incredible. To give a quick flavour we had:

  • A cross-curricular think tank founded, to develop stronger links and synergies across subjects.
  • A community World Cup programme designed, to open the school up as a community hub and take the opportunity to enthuse children of all ages about different subjects through football.
  • A programme for improving questioning developed, using a specially designed structure of lesson observation to pick out key successes and areas for development.
  • A new programme to develop presentation skills to be delivered through tutor time.
  • An improved induction programme for vulnerable students to make sure they settle in and succeed to the best of their abilities.
  • And about thirty more!

Lifting Barriers

Innovation Day worked because it lifted the above barriers.

  1. Hierarchy – we explicitly said that anything goes, and that no area of the school was off limits. To make this easier, senior leaders did not sit in the rooms with other staff, so that challenge could flow more freely. Instead, groups that wanted to seek the advice of leadership booked consultation slots to go through their ideas.
  2. Time – we freed up one day. One day is not enough, but it is a start!
  3. Orthodoxy – all teachers were encouraged to challenge orthodoxy. Innovators had displays of prompts for their problem-solving, including things such as a table of Hattie’s effect sizes, and Prof Rob Coe’s great scatter graph. These prompted a challenge to some of the orthodoxies we have grown used to accepting.
  4. Silos – staff chose the groups they worked in, but never fell back into silos during the day. Developers were roomed with projects tackling similar problems from different departments, and innovators were mixed from the start. Activities such as Idea Speed Dating created opportunities for further discussion outside of traditional school silos.


Lifting these barriers, just for a day, unleashed a huge amount of creative energy and has led to fantastic innovations to improve our students’ futures. Our challenge now is further minimising barriers in the longer term, so that innovation becomes part of our culture rather than an annual event.