I tried to discuss this on Twitter with Sam Freedman, but as his blog title points out, sometimes 140 characters isn’t enough…
The NAHT recently released the findings of their commission on assessment. They have attempted to set out a general framework for assessing without levels, including 21 recommendations, their principles of assessment, and a design checklist for system-building. All in all, the report is a good one, capturing some of the most important principles for an effective system of assessment. However, there are some significant problems to be fixed.
Firstly, the report relies on ‘objective criteria’ to drive assessment, without recognising that criteria cannot be objective without assessments bringing them to life. Secondly, the report places a heavy emphasis on the need for consistency without recognising the need for schools to retain the autonomy to innovate in both curriculum and assessment. Thirdly, the report advocates assessment that forces students into one of only three boxes (developing, meeting or exceeding), instead of allowing for a more accurate spectrum of possible states.
Here are my comments on some of the more interesting aspects of the report.
Summary of recommendations
4. Pupils should be assessed against objective and agreed criteria rather than ranked against each other.
This seems eminently sensible – learning is not a zero sum game. The potential problem with this, however, is that ‘objective criteria’ are very rarely objective. In “Driven by Data”, Paul Bambrick-Santoyo makes a compelling case that criteria alone are not enough, as they are always too ambiguous on the level of rigour demanded. Instead, criteria must be accompanied by sample assessment questions that demonstrate the required level of rigour. So whilst I agree with the NAHT’s sentiment here, I’d argue that a criteria-based system cannot be objective without clear examples of assessment to set the level of rigour.
5. Pupil progress and achievement should be communicated in terms of descriptive profiles rather than condensed to numerical summaries (although schools may wish to use numerical data for internal purposes).
Dylan Wiliam poses three key questions that are at the heart of formative assessment.
- Where am I?
- Where am I going?
- How am I going to get there?
A school assessment system should answer these three questions, and a system that communicates only aggregated numbers does not. Good assessment should collect data at a granular level so that it serves teaching and learning. Aggregating this data into summary statistics is an important, but secondary, purpose.
7. Schools should work in collaboration, for example in clusters, to ensure a consistent approach to assessment. Furthermore, excellent practice in assessment should be identified and publicised, with the Department for Education responsible for ensuring that this is undertaken.
The balance between consistency and autonomy will be the biggest challenge of the post-levels assessment landscape. Consistency allows parents and students to compare between schools, and will be particularly important for students who change schools during a key stage. Autonomy allows schools the freedom to innovate and design continually better systems of assessment from which we can all learn. I worry that calls for consistency will degenerate into calls for homogeneity and a lowest-common-denominator system of assessment.
18. The use by schools of suitably modified National Curriculum levels as an interim measure in 2014 should be supported by government. However, schools need to be clear that any use of levels in relation to the new curriculum can only be a temporary arrangement to enable them to develop, implement and embed a robust new framework for assessment. Schools need to be conscious that the new curriculum is not in alignment with the old National Curriculum levels.
Can we please stick the last sentence of this to billboards outside every school? I really don’t think this message has actually hit home yet. Students in Year 7 and 8 are still being given levels that judge their performance on a completely irrelevant scale. This needs to stop, soon. I worry that this recommendation, which seems sensible at first, will lead to schools just leaving levels in place for as long as possible. Who’s going to explain to parents that Level 5 now means Level 4 and a bit (we think, but we haven’t quite worked it out yet so just bear with us)?
Assessment criteria are derived from the school curriculum, which is composed of the National Curriculum and our own local design.
As above, it’s not a one way relationship from curriculum to assessment – the curriculum means little without assessment shedding light on what criteria and objectives actually mean. The difference between different schools’ curricula is another reason that the desired consistency becomes harder to achieve.
Each pupil is assessed as either ‘developing’, ‘meeting’ or ‘exceeding’ each relevant criterion contained in our expectations for that year.
This is my biggest problem with the report’s recommendations. Why constrain assessment to offering only three possible ‘states’ in which a student can be? In homage to this limiting scale, I have three big objections:
- Exceeding doesn’t make sense: The more I think about ‘exceeding’, the less sense it makes. If you’ve exceeded a criterion, haven’t you just met the next one? Surely it makes more sense to simply record that you have met an additional criterion than to try to capture that information ambiguously by stating that you have ‘exceeded’ something lesser. For the student who is exceeding expectations, recording it in this way serves little formative purpose. The assessment system records that they’ve exceeded some things, but not how. It doesn’t tell them which ‘excess’ criteria they have met, or how to exceed even further. If it does do this because it records additional criteria as being met, what was the point of the exceeding grade in the first place?
I’m also struggling to see how you measure that a criterion has been exceeded. To do this you’d need questions on your assessment that measure more than the criterion being assessed. Each assessment would also have to measure something else, something in excess of the current criterion. The implication of all this is that when you’re recording a mark for one criterion, you’re also implicitly recording a mark for the next. Why do this? Why not just record two marks separately?
The NAHT report suggests using a traffic light monitoring system. Presumably green is for exceeding, and amber is for meeting. Why is meeting only amber? That just means expectations were not high enough to start with.
- Limiting information: The system we use in our department (see more here) records scores out of 100. My ‘red’ range is 0-49, ‘amber’ is 50-69, and ‘green’ is 70-100. I have some students who have scored 70-75 on certain topics. Yes they got into the green zone, but they’re only just there. So when deciding to give out targeted homework on past topics, I’ll often treat a 70-75 score like a 60-70 score, and make sure they spend time solidifying their 70+ status. Knowing where a student lies within a range like ‘meeting’ is incredibly valuable. It’s probably measured in the assessment you’d give anyway. Why lose it by only recording 1, 2 or 3?
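To make the loss of information concrete, here is a minimal sketch of the banding described above (red 0–49, amber 50–69, green 70–100). The student names and scores are hypothetical, purely for illustration; the point is that two students can land in the same band while needing very different follow-up.

```python
def band(score):
    """Collapse a 0-100 score into the three traffic-light bands
    used in the post: red 0-49, amber 50-69, green 70-100."""
    if score < 50:
        return "red"
    elif score < 70:
        return "amber"
    else:
        return "green"

def needs_consolidation(score, margin=6):
    """Flag 'just green' scores (e.g. 70-75) for targeted homework,
    treating them like amber scores when planning revision."""
    return score < 70 + margin

# Hypothetical students: same band, very different positions within it.
scores = {"Student A": 72, "Student B": 95}
for name, score in scores.items():
    print(name, score, band(score), needs_consolidation(score))
```

Both students are recorded as ‘green’, but only the raw score reveals that Student A is barely over the threshold. A three-state system throws that distinction away at the point of recording, so it can never inform decisions like who gets the targeted homework.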
- One high-stakes threshold: Thresholds always create problems. They distort incentives, disrupt measurement and have a knack for becoming way more important than they were ever intended to be. This proposed system requires teachers to decide if students are ‘developing’ or ‘meeting’. There is no middle ground. This threshold will inevitably be used inconsistently.
The first problem is that ‘meeting’ a criterion is really difficult to define. All teachers would need to look for a consistent level of performance. If left to informal assessment, there is no hope of consistency. If judged by formal assessment, then why not keep the full picture rather than squashing a student’s performance into the boxes of meeting or developing?
The second problem is that having one high-stakes threshold creates lots of dreadful incentives for teachers. Who wouldn’t be tempted to mark as ‘meeting’ the student who’s worked really hard and not quite made it, rather than putting them in a category with the student who couldn’t care less and didn’t bother trying? And what about the incentive to just mark a borderline student as ‘meeting’ rather than face the challenges of acknowledging that they’re not? The farce of the C/D borderline may just be recreated.
A better system expects a range of performance, and prepares to measure it. A Primary School system I designed had five possible ‘states’, whereas the Secondary system we use is built on percentages. By capturing a truer picture of student performance we can guide teaching and learning in much greater detail.
I agree with most of the NAHT’s report, and am glad to see another strong contribution to the debate on assessment. However, there are three main amendments that need to be made:
- Acknowledge the two-way relationship between curriculum and assessment, and that criteria from the curriculum are of little use without accompanying assessment questions to bring them to life.
- Consider the need for autonomy alongside the desire for consistency, lest we degenerate into a national monopoly that quashes innovation in assessment.
- Remove the three ‘states’ model and encourage assessment systems that capture and use more information to represent the true spectrum of students’ achievements.
Very good blog.
Key questions: Formative or summative? Workload implications? How easily gamed (stakes, low or high)?
Yes, the 3 levelled new ‘levels’ are nonsense. We already have them at the end of reception. They are meaningless. Is a child at ‘emerging’ level a hair’s breadth away from ‘expected’, or do they have profound SEN? No way of knowing, let alone setting the Year 1 teacher a target for the % increase in the proportion at expected level. Same at the other end.
How about 14 levelled criteria in primary and 10 in secondary (up to the end of Year 11), i.e. 2 per year? This way we could say a child was performing at a Year 6 summer level and we’d all know the sort of thing this meant whether they were 7 or 13.
Thanks for the comment. You’re absolutely right about the loss of information in an effectively binary system – I can’t see it making sense or being applied properly. My worry about levelling people as Year 6 summer, etc, is that you’d go back to a best fit model of assessment and have lots of the problems of old levels. It also wouldn’t allow for flexibility around each school’s curriculum. What we need is a precise and flexible system.