Why homework is bad for you

Laura McInerny’s third touchpaper problem is:

“If you want a student to remember 20 chunks of knowledge from one lesson to the next, what is the most effective homework to set?”

After a day of research at the problem-solving party, I came to this worrying conclusion:

Setting homework to help students remember knowledge from one lesson to the next could actually be bad for their memory.

So stop setting homework on what you did in that lesson – at least until you’ve read this post.

Components of Memory

Bjork says that memories have two characteristics: their storage strength and their retrieval strength. Storage strength describes how well embedded a piece of information is in long-term memory, while retrieval strength describes how easily it can be accessed and brought into working memory. The most remarkable implication of Bjork’s research concerns how storage strength is built.

Storage and Retrieval strength – courtesy of Kris Boulton

Retrieval as a ‘memory modifier’

Good teaching of a piece of information can get it into the top left-hand quadrant, where retrieval strength is high but storage strength is low. Once a chunk of knowledge is known (in the high-retrieval-strength sense of knowing), thinking about it further does not develop its storage strength. Rather, storage strength is enhanced by the act of retrieving that chunk from long-term memory. This is really important. Extra studying of material you can already recall doesn’t improve retention; memory is improved by the act of retrieval.

The ‘Spacing Effect’

Recalling a chunk of knowledge from long-term memory increases its storage strength. However, for this to be effective, the chunk’s retrieval strength must have diminished. ‘Recalling’ a chunk ten minutes after studying it isn’t going to be very effective, as your brain doesn’t have to search for such a recent memory. Only when a memory’s retrieval strength is low will the act of recall increase storage strength. This gives rise to the spacing effect: the well-established finding that distributing practice across time builds stronger memories than massing practice together.

Rohrer & Taylor (2006) go a step further and compare overlearning (additional practice at the time of first learning) with distributed practice. They find no effect of overlearning, and ‘extremely large’ effects of distributed practice on future retention.

Optimal intervals

There is an optimal point at which to recall a memory in order to maximise its storage strength. At this point, the memory’s retrieval strength has dropped enough for the act of retrieval to significantly increase storage strength, but not so far that it can no longer be accurately recalled. Choosing the correct point can improve future recall by up to 150% (Cepeda et al., 2009).

Most studies of optimal spacing share a common design. Subjects learn a set of information in a first study session. After a gap, they retrieve the learned information in a second study session. A fixed retention interval (RI) then elapses before a final test. Studies such as Cepeda et al. (2008) show that the optimal gap is a function of the length of the RI, and that longer RIs demand longer gaps between study sessions. However, this function is not linear: for shorter RIs the optimal gap is 20-40% of the RI, whereas for longer RIs it is only 5-10% of the RI.

Better too long than not long enough

Cepeda et al.’s 2008 study looks at four RIs: 7, 35, 70 and 350 days. The optimal gaps for maximising future recall were 1, 11, 21 and 21 days respectively, and these gaps improved recall by 10%, 59%, 111% and 77%.
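Because the percentages above are easy to misread, here is a minimal Python sketch that expresses each optimal gap as a share of its RI. It assumes only the four RI/gap pairs quoted from Cepeda et al. (2008); the dictionary and function names are illustrative and do not come from the study.

```python
# RI/gap pairs as quoted above from Cepeda et al. (2008), both in days.
# Keys are retention intervals (RI); values are the gaps that maximised recall.
OPTIMAL_GAP_DAYS = {7: 1, 35: 11, 70: 21, 350: 21}


def gap_as_fraction_of_ri(ri_days: int, gap_days: int) -> float:
    """Express a gap between study sessions as a fraction of the RI."""
    return gap_days / ri_days


if __name__ == "__main__":
    for ri, gap in sorted(OPTIMAL_GAP_DAYS.items()):
        share = gap_as_fraction_of_ri(ri, gap)
        print(f"RI {ri:>3} days: optimal gap {gap:>2} days = {share:.0%} of the RI")
```

Worked through, the gaps come to roughly 14%, 31%, 30% and 6% of their RIs. The longer RIs fall inside the 20-40% and 5-10% bands, but the single measured point for the 7-day RI sits a little below its band. The ranges are therefore best read as approximate.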

Perhaps their most important finding is the shape of the curves relating the gap to later retention. For all RIs, these curves begin by climbing steeply, reach a maximum, and then decline very slowly or plateau. The implication is that, when setting a gap between study sessions, it is better to err on the side of making it too long than to risk making it too short. Too long an interval has only small negative effects; too short an interval is catastrophic for storage strength.
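Below is a Python sketch of one way to act on the ‘err long’ advice. It is a hypothetical rule of thumb, not something Cepeda et al. propose. It interpolates linearly between the four measured RI/gap pairs, which is an assumption because the true optimum between those points was not reported. Outside the measured range it clamps to the nearest measured gap, and it rounds up to a whole day so that any error falls on the longer side. The name `suggest_gap` is illustrative.

```python
import math

# RI -> optimal gap (days), as quoted above from Cepeda et al. (2008).
OPTIMAL_GAP_DAYS = {7: 1, 35: 11, 70: 21, 350: 21}


def suggest_gap(target_ri_days: float) -> int:
    """Hypothetical helper: estimate a gap (whole days) for a target RI.

    Linearly interpolates between measured points, clamps to the first/last
    measured gap outside that range, and rounds up so the estimate errs long.
    """
    points = sorted(OPTIMAL_GAP_DAYS.items())
    if target_ri_days <= points[0][0]:
        return points[0][1]
    for (ri_lo, gap_lo), (ri_hi, gap_hi) in zip(points, points[1:]):
        if target_ri_days <= ri_hi:
            frac = (target_ri_days - ri_lo) / (ri_hi - ri_lo)
            estimate = gap_lo + frac * (gap_hi - gap_lo)
            # Subtract a tiny epsilon so float noise on an exact whole number
            # (e.g. 11.000000000000002) is not rounded up an extra day.
            return math.ceil(estimate - 1e-9)
    return points[-1][1]


if __name__ == "__main__":
    for ri in (7, 21, 50, 180, 365):
        print(f"Remember for {ri} days -> next retrieval after about {suggest_gap(ri)} days")
```

Rounding up adds less than a day. It is a nudge in the direction the curves favour, not a safety margin, and a teacher could reasonably add more.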

Why homework could be bad

Homework is usually set as a continuation of classwork: students complete exercises that evening on what they learned in school that day. This creates a gap of less than a day between study sessions. We know that where information is to be retained for a week, the optimal gap is a day, and that where this is not possible it is better to leave a longer gap than a shorter one. For longer RIs, the sort of periods over which we want students to remember knowledge, the optimal gap can be longer than a week.

Therefore, if you want students to remember twenty chunks of knowledge for longer than just from one lesson to the next, the best homework to set that evening is no homework!

Setting homework prematurely actually harms the storage strength of the information learned that day, because it prompts retrieval before students have reached the optimal gap. In this case, students who don’t do their homework are better off than those who do!

Why I might be wrong, and what we need to do next

There is not enough good evidence on how to stagger multiple study sessions with multiple gaps. For example, we know where it would be best to place a second study session, but not a third. However, we do know that retrieval is a memory modifier, so additional retrieval should strengthen memories as long as the gap is large enough for retrieval strength to have diminished. Given that retrieving newly learned information after a gap of one day is good for storage strength, it may be that study sessions separated by gaps of, say, 1, 3, 10 and 21 days are better for storage strength than a single further session after 21 days when the RI is long (350 days or more). In that case, for teachers who have only one or two lessons a week, homework could help achieve the optimal gaps by providing study sessions between lessons.
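To make the last suggestion concrete, here is a small Python sketch. It lays the speculative gaps of 1, 3, 10 and 21 days onto a calendar and marks which sessions land on a timetabled lesson and which would have to be homework. The Monday-only timetable, the start date and the function names are all hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta


def session_dates(first_lesson: date, gaps_days: list[int]) -> list[date]:
    """Date of each study session: the first lesson, then one session after each gap."""
    dates = [first_lesson]
    for gap in gaps_days:
        dates.append(dates[-1] + timedelta(days=gap))
    return dates


def plan_sessions(first_lesson: date, gaps_days: list[int],
                  lesson_weekdays: set[int]) -> list[tuple[date, str]]:
    """Label each session 'lesson' if it falls on a timetabled weekday
    (Monday=0 ... Sunday=6, as in date.weekday()), otherwise 'homework'."""
    return [
        (d, "lesson" if d.weekday() in lesson_weekdays else "homework")
        for d in session_dates(first_lesson, gaps_days)
    ]


if __name__ == "__main__":
    # Hypothetical class taught on Mondays only, starting Monday 8 January 2024.
    for day, kind in plan_sessions(date(2024, 1, 8), [1, 3, 10, 21], {0}):
        print(day.strftime("%a %d %b"), kind)
```

With a Monday-only timetable, the sessions on day 1 and day 4 (a Tuesday and a Friday) fall between lessons and would have to be homework. The later sessions happen to land on Mondays. This is the sense in which homework might fill the short early gaps, provided it retrieves material from an earlier lesson rather than that day's.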

The optimal arrangement of multiple gaps is a priority for research. We need to understand better how study sessions should be staged, so that we can begin to set homework schedules that support memory rather than undermine it. Until then, only set homework on previously learned knowledge, and err on the side of longer delays. My students will be getting homework on old topics only from now on.


Joe Kirby on memory this weekend
EEF Neuroscience Literature Review
Dunlosky et al., 2013. Improving Students’ Learning With Effective Learning Techniques: Promising Directions From Cognitive and Educational Psychology
Rohrer & Taylor, 2006. The Effects of Overlearning and Distributed Practise on the Retention of Mathematics Knowledge
Cepeda et al., 2009. Optimizing Distributed Practice
Cepeda et al., 2008. Spacing Effects in Learning: A Temporal Ridgeline of Optimal Retention
Everything Kris Boulton writes

Performance Related Pay

On performance-related pay (PRP) I am a believer in principle but a sceptic in practice. After reading Policy Exchange’s report published yesterday, “Reversing the Widget Effect”, I remain so. However, I am coming to believe that PRP can be rescued, and that a more flexible and transparent system could help teachers improve by raising the quality of professional development in schools.

This is a heated topic, and one far too closely tied to mistrust of the political establishment and insinuations about privatising education. This much is evident from the disparity between two recent polls on PRP: when YouGov asked on behalf of Policy Exchange, 89% of teachers were in favour of PRP in principle; when YouGov asked on behalf of the NUT, in a survey about the government’s reforms, 81% were against PRP. Context here is king, and separating PRP from opinions about Michael Gove’s personal integrity is essential if we’re to have any semblance of rational debate.

PRP in Principle

The foreword to Matthew Robb’s report is written by George Parker, a former US union leader turned advocate of PRP. Branded a traitor by teaching unions in the States, Parker recounts a lightbulb moment he had after delivering a speech at a “high poverty primary school”. He writes:

“Afterwards, a little girl came up to me and hugged me, and said that no-one had ever said that before. No-one had ever been fighting for them to get a better education. And in the car on the way back, I realised: you lied. You lied to that little girl. Because I didn’t really care about her, and getting good teachers in front of her. In fact, I’d just spent $10,000 to overturn a firing and keep a bad teacher in that school – a bad teacher I would not want anywhere near my own granddaughter…”

The Policy Exchange report devotes a lot of space to this ‘in principle’ case: that it is almost morally wrong to reward poor or mediocre performance in the same way as good and excellent performance. I strongly agree with this argument. We should be doing everything possible to ensure that all children receive the best education, and since the biggest in-school determinant of that is the teacher they have, we should be putting all of our effort into improving teaching. If tying pay to accountability makes even a marginal difference to student outcomes, then in principle we should accept PRP.

The Status Quo is Inadequate

The first step in Robb’s argument is that the nominally performance-related status quo has ceased to reward performance. He cites a report finding no relationship between the quality-of-teaching grade Ofsted gives a school and the average teacher salary in that school, and shows the distribution of pay bands within schools of different Ofsted ratings. This evidence is damning: a pay system that has no relationship with performance is wasting taxpayers’ money.

Nor can it be argued that experience or tenure is a good proxy for performance. “Do First Impressions Matter?”, a recent paper by Atteberry, Loeb and Wyckoff, shows that of teachers whose first-year performance is in the lowest quintile, 62% remain in the bottom two quintiles five years later. More worryingly, they show that although the gap between the top and bottom quintiles closes, this is not just because the bottom quintile gets better but because the top quintile actually gets worse, with those in between largely stagnating.

With no evidence to suggest that the current system either is or should be working as we desire, in principle we should be looking for a new one.

The In-Principle Argument for PRP

There seems to me to be a reasonable causal chain, backed up by evidence, from well-implemented PRP to better student outcomes. PRP raises teachers’ extrinsic motivation, causing them to exert greater effort. This leads to more deliberate practice, which in turn leads to better student outcomes.

i. Raising extrinsic motivation
As Robb recognises, “it is not in doubt that for the majority of teachers, the primary motivation is to help their pupils progress”. Nonetheless, even the most virtuous of teachers can be influenced to some extent by external factors, of which pay is one. The actual evidence on the relationship between teacher pay and teacher effectiveness is mixed. Few teachers cite pay as a motivation for entering the profession, yet many cite it as a reason for leaving. Comparative international studies show that countries where teacher pay is higher have better student outcomes, but they do not conclusively show that any performance-related element of this pay is significant.

This is definitely the weakest link in the PRP causal chain. The most robust element of Robb’s argument is that higher pay, through PRP, would attract and retain good teachers who would otherwise either not enter teaching or leave it. This is undoubtedly a positive effect, but I question whether it alone is enough to justify the effort of implementing PRP. Rather, I am persuaded by Dylan Wiliam’s argument that improving the quality of entrants into the teaching profession will take a long time to have a relatively small effect, and therefore that “the key to improvement of educational outcomes is investment in teachers already working in our schools”. I am unaware of any evidence suggesting that a new pay regime would bring a large enough influx of suitably talented new teachers to undermine Wiliam’s argument.

More compelling, but less well evidenced, is the claim that PRP could increase the extrinsic motivation of teachers already in schools. Nonetheless, it seems to me that building teacher performance into a school’s formal accountability processes, tied to a teacher’s progression up the pay scale, cannot fail to increase the incentives for teachers to improve. Not only this, but it places much greater pressure on the school to improve its teachers. As I will argue later, even if the impact on teachers’ motivation were minimal (although, as Robb discusses, much evidence suggests otherwise), the impact on school processes would be enough to drive the improvement we seek.

ii. Deliberate practice
The second causal leap in the chain above is that increased motivation leads to more deliberate practice. Much has been written about the role of deliberate practice in improving performance across domains. The canonical study of violinists showed that practice, not talent, was what distinguished great violinists, and although more recent evidence has shown the role of innate talent in some physical pursuits, deliberate practice still reigns in most other domains. Teaching is one of these, as Alex Quigley discusses in his blog post on applying deliberate practice to become a better teacher.

If deliberate practice improves teaching quality, then the leap to better student outcomes is a straightforward one. Robb cites research showing that the difference between a teacher at the 25th percentile and one at the 75th percentile is 0.4 GCSE points per subject, whilst the difference between the 5th and 95th percentiles is a whole GCSE point per subject.

The causal chain from PRP to better student outcomes works in principle, and as George Parker argues, we have a moral obligation to take that very seriously indeed.

PRP in Practice

Robb’s argument for PRP hinges on a school’s ability to accurately measure teacher performance. Using the results of the Measures of Effective Teaching (MET) project, Robb dismisses the claim that teaching quality cannot accurately be measured. He does so too hastily.

The MET results are certainly positive, and have taught us a great deal about measuring effective teaching. Of particular interest to me was the significant predictive power of student surveys, something I’m confident would not be particularly popular with teaching unions. Robb argues, based on the MET results, that an appropriately weighted basket of measures, preferably averaged over two years, would be sufficiently accurate to determine a teacher’s pay.

I am less convinced. Robb’s report includes a table comparing teacher effectiveness by quintile in two consecutive years. It finds that “the variance is such that only half the teachers assessed as being in the lowest quintile of performance in one year are in the lowest two quintiles the following year – and a third of those assessed as being in the top quintile in one year have moved to the lowest two quintiles as well!”

Even the most reliable measure in the MET study (an equally weighted basket of state test results, observations and student surveys) had a reliability of only 0.76, and this was achieved with observers who had been specially trained and certified in a far more rigorous system than anything commonly used in Britain. Indeed, Wiliam cites research showing that, to achieve a reliability of 0.9 in assessing teacher quality from observation, a teacher would have to be observed teaching six different classes by five independent observers. This is hardly a viable proposition.
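The Spearman-Brown prophecy formula gives a rough sense of why so many observations are needed. If a single measurement has reliability r, the average of n parallel measurements has reliability nr / (1 + (n - 1)r). The Python sketch below is a back-of-envelope calculation, not a result from the MET study or from the research Wiliam cites. It assumes that the 30 lesson-observer ratings (six classes, five observers) are interchangeable parallel measures, which a proper generalisability analysis would not assume. On that basis it works backwards to the single-observation reliability implied by the 0.9 figure. The function names are illustrative.

```python
import math


def spearman_brown(single_reliability: float, n: float) -> float:
    """Reliability of the average of n parallel measurements (Spearman-Brown)."""
    r = single_reliability
    return n * r / (1 + (n - 1) * r)


def implied_single_reliability(target: float, n: float) -> float:
    """Invert Spearman-Brown: the single-measurement reliability for which
    averaging n parallel measurements reaches the target reliability."""
    return target / (n - (n - 1) * target)


def measurements_needed(single_reliability: float, target: float) -> int:
    """Smallest whole number of parallel measurements reaching the target."""
    r = single_reliability
    n = target * (1 - r) / (r * (1 - target))
    # Small epsilon stops float noise (e.g. 27.000000000000004) adding one.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    # Assumption: six classes x five observers = 30 parallel ratings.
    r_single = implied_single_reliability(0.9, 30)
    print(f"Implied reliability of one observation: {r_single:.2f}")
    for n in (1, 2, 5, 10, 30):
        print(f"{n:>2} observations -> reliability {spearman_brown(r_single, n):.2f}")
```

Under these assumptions, a single observation would have a reliability of only about 0.23 (0.9 / 3.9). That is why the average must span many lessons and observers before it approaches 0.9.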

Although Robb is willing to write off these difficulties by arguing for averages over longer periods, or for focusing on extreme performance, neither is a good enough solution to the reliability problem. As he himself argues, for PRP to be workable it needs “a solid performance evaluation system that teachers support”. A system in which a third of top-quintile teachers drop to the bottom two quintiles the following year is neither solid nor likely to be supported.

Squaring the Circle: Professional Development Targets

Although I am sceptical of PRP as proposed in the Policy Exchange report, because of its reliance on unreliable measures of teacher quality, I am reluctant to throw away the potential of pay reform to improve student outcomes. The clearest lever through which it could work is better professional development.

Wiliam observes that teachers, on the whole, stop improving after two or three years in the profession. He suspects, as do I, that this is strongly linked to the scarcity of good-quality feedback for teachers once they have qualified. Deliberate practice is hard without feedback. Where we differ is on how to improve the feedback cycle so that it better supports deliberate practice. So far, Wiliam is relying on the goodwill of schools. Although this might be enough for some schools, it will not be enough for all. PRP could be the way to radically improve the support schools give their staff to become more effective teachers. The combination of upward pressure from teachers demanding the support they need to improve, and downward pressure from school leaders and regulators demanding improvement in more accurately measured teacher quality, is powerful enough to change the face of professional development in most schools.

i. Upward pressure from teachers
As Robb argues, teachers who are judged on their performance will demand better feedback, coaching and training. They will insist on frequent, good-quality feedback that helps them improve, and schools will be compelled to provide it. Once teachers are given appropriate feedback, they are much better able to improve through a cycle of deliberate practice, and therefore to improve the performance of the students they teach.

ii. Downward pressure from administrators
Robb writes that “The implementation of performance-related pay will require Heads and senior managers to undertake more rigorous performance evaluations of their staff…[this] will also force managers to more explicitly acknowledge the range of teacher performance in their school and act on it.” Once a school has explicitly measured the quality of its teaching as part of a more rigorous framework, it will be compelled, by Ofsted and by governors, to do more to improve it.

My question is whether a system of PRP can be designed that replaces the attempt to measure performance objectively with a greater focus on development. Could we, for example, set and more accurately measure specific targets related to a teacher’s improvement, rather than try to measure their ethereal ‘effectiveness’? Poorly measured effectiveness is not transparent, so it does not help a teacher to improve; such a measure fails Robb’s own test of “a solid performance evaluation system that teachers support”. Drawing up a set of clear but demanding targets based on student performance data, (better) observation and student surveys would give teachers transparent objectives to meet. The involvement of pay would cause teachers to demand, and schools to offer, the support and feedback needed for deliberate practice, which in turn would improve student outcomes.


Performance-related pay works in principle. It has great potential to improve student outcomes by encouraging and supporting deliberate practice amongst teachers. However, current systems for measuring teacher effectiveness are not reliable enough to base pay on. Their unreliability would create confusion and unpopularity, undermining the central arguments for PRP. A better approach is for schools to use PRP powers to strengthen performance management, setting clear, demanding and evidence-based targets to improve teacher effectiveness. By combining teachers’ increased extrinsic motivation with schools’ increased pressure to provide good-quality support, teachers will become more effective and student outcomes will improve.