Early trials of automarking software for maths papers indicate 99% accuracy

Automarking software developed by Cambridge AI company Blutick achieves 99% accuracy across the marks awarded in two GCSE papers when checked by a human marker.

Ofqual research suggests that, in maths, 30,000 students (4%) receive a mark that does not align with the ‘definitive grade’ given by the principal examiners. In a recent trial, however, an automated marking tool developed by Blutick was found to be 99% accurate when assessing scripts across two GCSE exam papers. This is an important step towards significantly reducing the margin of error and ensuring students receive the grades they deserve.

Blutick is a Cambridge AI software company focused on teaching, learning and assessment in maths. The organisation is currently working with exam boards to improve marking consistency by augmenting and supporting the work done by examiners and, ultimately, to champion a fair system for all students.

Once the automated marking had been carried out across the exam papers, an examiner reviewed the students’ responses alongside the Blutick marks and corrected any marks where necessary. This review gave the AI software an accuracy rate of between 98.6% and 98.75%.

Currently, a sample of only 1.2% of questions is double marked (Ofqual 2018, p. 9). However, a 2013 Ofqual review of the literature on marking reliability points to the value of multiple marking for exam scripts. The same review nonetheless cites examiner recruitment, cost, time constraints and logistics as barriers to introducing multiple marking across the board. With other experts also disputing the accuracy of the current examination and grading system, this new automated marking software would help remove these barriers and make multiple marking more feasible.

Rob Percival, Blutick’s CEO and a former maths teacher, said:

“With so few questions double marked, there is a lot of scope for error. A system like this can review 100% of marked papers and flag potentially erroneous responses for further checking.

“It in no way replaces the work done by examiners, but instead acts as a safety net in ensuring more students get the grade they deserve with almost no extra cost or increased workload for examiners.”

Beyond eradicating errors, greater automation of marking processes is a growing focus for exam boards and for Ofqual. With the difficulty of recruiting suitable examiners exacerbated by Covid, automarking software offers a solution that benefits students, teacher–examiners and exam boards.

Simon Armitage, Deputy Head at The Perse School, Cambridge, said:

“Whilst examination grades should never be the sole measure of ‘output’ from a school or the nature of any student’s achievements, it is self-evident that any grades must be fair.

“Anything that helps exam boards to deliver accurate results more reliably is good news for everyone – students, schools, universities and employers. It is one of the reasons why The Perse School has been pleased to be involved in the Blutick Maths project.

“If an Artificial Intelligence system is part of this improvement, then it also helps reduce inevitable human error and could help exam boards to circumnavigate the difficulties of finding well-qualified markers.”