
Assessing Without Levels

Exam season does not a good blogger make. However, I have attached below the presentation that a colleague from school and I gave at the National Mathematics Conference in Kettering on Saturday. It covers the principles of our approach to assessing without levels, which will be expanded on when our software is built and ready for wider use!

140608 Assessing Without Levels (Kettering)

Good, But Not There Yet – A verdict on the NAHT report on assessment

I tried to discuss this on Twitter with Sam Freedman, but as his blog title points out, sometimes 140 characters isn’t enough…

The NAHT recently released the findings of their commission on assessment. They have attempted to set out a general framework for assessing without levels, including 21 recommendations, their principles of assessment, and a design checklist for system-building. All in all the report is a good one, capturing some of the most important principles for an effective system of assessment. However there are some significant problems to be fixed.

Firstly, the report relies on ‘objective criteria’ to drive assessment, without recognising that criteria cannot be objective without assessments bringing them to life. Secondly, the report places a heavy emphasis on the need for consistency without recognising the need for schools to retain the autonomy to innovate in both curriculum and assessment. Thirdly, the report advocates assessment that forces students into one of only three boxes (developing, meeting or exceeding), instead of allowing for a more accurate spectrum of possible states.

Here are my comments on some of the more interesting aspects of the report.

Summary of recommendations

4. Pupils should be assessed against objective and agreed criteria rather than ranked against each other.
This seems eminently sensible – learning is not a zero sum game. The potential problem with this, however, is that ‘objective criteria’ are very rarely objective. In “Driven by Data”, Paul Bambrick-Santoyo makes a compelling case that criteria alone are not enough, as they are always too ambiguous on the level of rigour demanded. Instead, criteria must be accompanied by sample assessment questions that demonstrate the required level of rigour. So whilst I agree with the NAHT’s sentiment here, I’d argue that a criteria-based system cannot be objective without clear examples of assessment to set the level of rigour.

5. Pupil progress and achievement should be communicated in terms of descriptive profiles rather than condensed to numerical summaries (although schools may wish to use numerical data for internal purposes).
Dylan Wiliam poses three key questions that are at the heart of formative assessment. 

  • Where am I? 
  • Where am I going? 
  • How am I going to get there?

A school assessment system should answer these three questions, and a system that communicates only aggregated numbers does not. Good assessment should collect data at a granular level so that it serves teaching and learning. Aggregating this data into summary statistics is an important, but secondary, purpose.

7. Schools should work in collaboration, for example in clusters, to ensure a consistent approach to assessment. Furthermore, excellent practice in assessment should be identified and publicised, with the Department for Education responsible for ensuring that this is undertaken.
The balance between consistency and autonomy will be the biggest challenge of the post-levels assessment landscape. Consistency allows parents and students to compare between schools, and will be particularly important for students who change schools during a key stage. Autonomy allows schools the freedom to innovate and design continually better systems of assessment from which we all can learn. I worry about calls for consistency, that they will degenerate into calls for homogeneity and a lowest common denominator system of assessment.

18. The use by schools of suitably modified National Curriculum levels as an interim measure in 2014 should be supported by government. However, schools need to be clear that any use of levels in relation to the new curriculum can only be a temporary arrangement to enable them to develop, implement and embed a robust new framework for assessment. Schools need to be conscious that the new curriculum is not in alignment with the old National Curriculum levels.
Can we please stick the last sentence of this to billboards outside every school? I really don’t think this message has actually hit home yet. Students in Years 7 and 8 are still being given levels that judge their performance on a completely irrelevant scale. This needs to stop, soon. I worry that this recommendation, which seems sensible at first, will lead to schools simply leaving levels in place for as long as possible. Who’s going to explain to parents that Level 5 now means Level 4 and a bit (we think, but we haven’t quite worked it out yet, so just bear with us)?

Design Checklist

Assessment criteria are derived from the school curriculum, which is composed of the National Curriculum and our own local design.
As above, it’s not a one way relationship from curriculum to assessment – the curriculum means little without assessment shedding light on what criteria and objectives actually mean. The difference between different schools’ curricula is another reason that the desired consistency becomes harder to achieve.

Each pupil is assessed as either ‘developing’, ‘meeting’ or ‘exceeding’ each relevant criterion contained in our expectations for that year.
This is my biggest problem with the report’s recommendations. Why constrain assessment to offering only three possible ‘states’ in which a student can be? In homage to this limiting scale, I have three big objections:

  1. Exceeding doesn’t make sense. The more I think about ‘exceeding’, the less sense it makes. If you’ve exceeded a criterion, haven’t you just met the next one? Surely it makes more sense simply to record that you have met an additional criterion than to try to capture that information ambiguously by stating that you have ‘exceeded’ something lesser. For the student who is exceeding expectations, recording it in this way serves little formative purpose. The assessment system records that they’ve exceeded some things, but not how. It doesn’t tell them which ‘excess’ criteria they have met, or how to exceed even further. If it does do this because it records additional criteria as being met, what was the point of the exceeding grade in the first place?

    I’m also struggling to see how you measure that a criterion has been exceeded. To do this you’d need questions on your assessment that measure more than the criterion being assessed. Each assessment would also have to measure something else, something in excess of the current criterion. The implication of all this is that when you’re recording a mark for one criterion, you’re also implicitly recording a mark for the next. Why do this? Why not just record two marks separately?

    The NAHT report suggests using a traffic light monitoring system. Presumably green is for exceeding, and amber is for meeting. Why is meeting only amber? That just means expectations were not high enough to start with.

  2. Limiting information. The system we use in our department (see more here) records scores out of 100. My ‘red’ range is 0-49, ‘amber’ is 50-69, and ‘green’ is 70-100. I have some students who have scored 70-75 on certain topics. Yes, they got into the green zone, but they’re only just there. So when deciding to give out targeted homework on past topics, I’ll often treat a 70-75 score like a 60-70 score, and make sure they spend time solidifying their 70+ status. Knowing where a student lies within a range like ‘meeting’ is incredibly valuable. It’s probably measured in the assessment you’d give anyway. Why lose it by only recording 1, 2 or 3?

  3. One high-stakes threshold. Thresholds always create problems. They distort incentives, disrupt measurement and have a knack for becoming way more important than they were ever intended to be. This proposed system requires teachers to decide if students are ‘developing’ or ‘meeting’. There is no middle ground. This threshold will inevitably be used inconsistently.

    The first problem is that ‘meeting’ a criterion is really difficult to define. All teachers would need to look for a consistent level of performance. If left to informal assessment there is no hope of consistency. If judged by formal assessment, then we should keep the full picture rather than squashing a student’s performance into the boxes of ‘meeting’ or ‘developing’.

    The second problem is that having one high-stakes threshold creates lots of dreadful incentives for teachers. Who wouldn’t be tempted to mark as ‘meeting’ the student who’s worked really hard and not quite made it, rather than putting them in a category with the student who couldn’t care less and didn’t bother trying? And what about the incentive to just mark a borderline student as ‘meeting’ rather than face the challenges of acknowledging that they’re not? The farce of the C/D borderline may just be recreated.

A better system expects a range of performance, and prepares to measure it. A Primary School system I designed had five possible ‘states’, whereas the Secondary system we use is built on percentages. By capturing a truer picture of student performance we can guide teaching and learning in much greater detail.
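To illustrate the granularity argument above: the red/amber/green bands in our department’s system can be sketched as a simple mapping (the boundaries are as given earlier; the function name and the example scores are purely illustrative).

```python
def rag_band(score):
    """Map a topic score out of 100 to a red/amber/green band.
    Boundaries follow the department system described above:
    red 0-49, amber 50-69, green 70-100."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score < 50:
        return "red"
    if score < 70:
        return "amber"
    return "green"

# A 72 is 'green', but only just -- keeping the raw score lets a teacher
# treat it like a high 'amber' when setting targeted homework. Recording
# only the band (1, 2 or 3) throws that information away.
print(rag_band(72))  # green
print(rag_band(65))  # amber
```

The point of keeping the raw score alongside the band is exactly the one made above: the band is a summary for reporting, but the underlying number still drives the teaching decisions.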


I agree with most of the NAHT’s report, and am glad to see another strong contribution to the debate on assessment. However there are three main amendments that need to be made:

  1. Acknowledge the two-way relationship between curriculum and assessment, and that criteria from the curriculum are of little use without accompanying assessment questions to bring them to life.
  2. Consider the need for autonomy alongside the desire for consistency, lest we degenerate into a national monopoly that quashes innovation in assessment.
  3. Remove the three ‘states’ model and encourage assessment systems that capture and use more information to represent the true spectrum of students’ achievements.

An Assessment System That Works

I’ve been fairly absent from blogging/Twitter since the summer – an inevitable consequence of taking up a few new roles amidst the discord of new systems and specifications emerging from gov.uk with increasing regularity. But I don’t mean that as a complaint. Much that was there was broken, and much that is replacing it is good. Although life in the present discord is manic and stressful, it is also a time of incredible opportunity to improve on what went before, and to rework many of the systems in teaching that went unquestioned in schools for too long.

This Christmas I’m stopping to reflect on the term gone by, and on our efforts to improve three areas: Assessment, Curriculum, and Teaching & Learning. There are many failures, many ideas that failed to translate from paper to practice, but also a good number of successes to learn from and develop in January.

A Blank Slate

KS3 SATs died years ago. National Curriculum levels officially die in September, but can be ‘disapplied’ this year. With tests and benchmarks gone, there is a blank slate in KS3 assessment. This is phenomenally exciting. Levels saturated schools with problems – they were a set of ‘best fit’ labels, good only for summative assessment, that got put at the heart of systems for formative assessment. No wonder they failed.

At WA we decided to try building a replacement system, trialled in Maths, that could ultimately achieve what termly reporting of NC levels never could. We began with three core design principles:

1) It has to guide teaching and learning (it must answer the question “what should I do tonight to get better at Maths?”).
2) It has to be simple for everyone to understand.
3) It has to prepare students for the rigour of tougher terminal exams and challenging post-16 routes.

Principle 2 led us to an early decision – we wanted a score out of 100. This would be easy for everyone to understand, and by scoring out of 100 rather than a small number we are less likely to have critical thresholds where students’ scores are bunched and where disproportionate effort is concentrated. Scoring out of 100, we felt, would always encourage a bit more effort on the margin in a way that GCSEs with their eight grades fail to do.

Principle 1 led us to another early decision – we need data on each topic students learn. Without this, the system will descend into level-like ‘best fit’ mayhem, where students receive labels that don’t help them to progress. Yet there’s a tension here between principles 1 and 2. Principle 1 would have data on everything, separated at an incredibly granular level. However this would soon become tricky to understand and would ultimately render the system unused.

For me, Principle 3 ruled out using old SATs papers and past assessment material. These were tied to an old curriculum that did not adequately assess many of the skills we expect of our students. They also left too much of assessment to infrequent high-stakes testing, which does not encourage the work ethic and culture of study we value.

These three principles guided our discussions to the system we have now been running since September.

Our System

The Maths curriculum in Years 7-9 (featured in the next post) has been broken down into topics – approximately 15 per year. Each of these topics is individually assessed and given a score out of 100. This score is computed from three elements: an in-class quiz, homework results, and an end-of-term test. Students then get an overall percentage score, averaged from all of the topics they have studied so far. This means that for each student we have an indication of their overall proficiency at Maths, as well as detailed information on their proficiency at each individual topic. This is recorded by students, stored by teachers, and reported to parents six times a year.
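The structure of the calculation can be sketched as below. Note that the post doesn’t specify how the three elements are weighted, so the weights here are an assumption for illustration only; the shape of the system (one score per topic, averaged into an overall percentage) follows the description above.

```python
def topic_score(quiz, homework, test, weights=(0.3, 0.2, 0.5)):
    """Combine the three assessed elements into one topic score out of 100.
    The weighting here is hypothetical -- the real split isn't given."""
    wq, wh, wt = weights
    return wq * quiz + wh * homework + wt * test

def overall_score(topic_scores):
    """Overall proficiency: the average across all topics studied so far."""
    return sum(topic_scores) / len(topic_scores)

# Two topics studied so far, each with its three assessed elements.
scores = [topic_score(80, 70, 75), topic_score(60, 65, 55)]
print(round(overall_score(scores), 1))  # 67.0
```

Keeping the per-topic scores (rather than only the overall average) is what lets the system answer “which topics should this student work on tonight?” as well as “how are they doing overall?”.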

Does it work?

Principle 1: Does it guide teaching and learning?

Lots of strategies have been put in place to make sure that it does. For example, the in-class quiz is designed to be taken after the material in a topic has been covered but before teaching time is over. The results are used to guide reteaching in the following lessons so that students can retake another quiz on that topic and increase their score. Teachers also produce termly action plans as a result of their data analysis, which highlight the actions needed to support particular students as well as adjustments needed to combat problematic whole-class trends.

Despite this, we haven’t yet developed a culture of assessment scores driving independent study. Our vision is that students know exactly what they have to do each evening to improve at Maths, and I believe that this system will be integral to achieving that. We need a bigger drive to actively develop that culture, rather than expecting it to come organically.

Extract from the Year 7 assessment record sheet.

I’m also concerned that assessment at this level has not yet become seen as a core part of teaching and learning. Teachers are dedicated in their collection and recording of data, and have planned some brilliant strategies for extending their students’ progress. But it still just feels like an add-on, something additional to teaching rather than at the heart of it. One of our goals as a department next term must be to embed assessment data further into teaching; not to be content with it assisting from the side.

Principle 2: Is it easy to understand?

Unequivocally yes. Feedback from parents, tutors and students has been resoundingly positive. Each term we report each student’s overall score, as well as their result for each topic studied that term. One question for the future is how to make all past data accessible to parents, as by Year 9 there will be 40+ topics worth of information recorded.

Principle 3: Is it rigorous enough?

By making the decision to produce our own assessments from scratch we allowed ourselves to set the level of rigour. I like to think that if anything we’ve set it too high. We source and write demanding questions to really challenge students, and to prepare them to succeed in the toughest of exams. A particular favourite question of mine was asking Year 8 to find the Lowest Common Multiple of pqr and pq^2, closely rivalled by giving them the famed Reblochon cheese question from a recent GCSE paper.
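For anyone who wants to check the working on that LCM question, the answer follows from taking the highest power of each factor appearing in either expression:

```latex
\operatorname{lcm}(pqr,\ pq^2) = p \cdot q^2 \cdot r
```

The challenge for Year 8 is that the reasoning has to be done symbolically, with no concrete numbers to fall back on.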

The Reblochon cheese question – a Year 8 favourite.

Following the advice of Paul Bambrick-Santoyo (if you haven’t read Leverage Leadership then go to a bookshop now) we made all assessments available when teachers began planning to teach each topic. This has been a great success, and I’ve really seen the Pygmalion effect in action. By transparently raising the bar in our assessments, teachers have raised it in their lessons; and students have relished the challenge.


This assessment system works. It clearly tells students, teachers and parents where each individual is doing well and where they need to improve. Nothing is obscured by a ‘best fit’ label, yet the data is still easy to understand. Freeing ourselves from National Curriculum levels freed us from stale SATs papers and their lack of ambition. Instead we set assessments that challenge students at a higher level – a challenge they have met. The next step is making data and assessment a core part of teaching. Just like NC levels were once a part of every lesson (in an unhelpful labelling way), the results of assessment should be central to planning and delivering each lesson now.