The Practicalities of Keeping the Human in the Loop: insights from the AI in Marking and Feedback Pilot

In 2025, Jisc launched the AI in Marking and Feedback Pilot, a year-long initiative bringing together colleges and universities to explore whether AI can meaningfully reduce marking and feedback workload in a way that is acceptable to key stakeholders.

The project spans two strands: tools designed specifically for educational purposes (Graide, Keath and TeacherMatic) and general-purpose AI tools (such as ChatGPT, Gemini and Copilot), where the project focuses on custom assistants.

Across the initial months of the pilot, insights have been collected via regular community sessions, feedback forms and one-to-one interactions. As common themes appear, we’d like to share them with the wider Jisc membership, so that what we learn translates into direct value for the sector.

As such, we’re publishing this series of blogs, which we hope will give you a useful window into what the pilot has revealed about the role of AI within marking and feedback. You can read all the blogs in this series by following the links on this parent page.

The Practicalities of Keeping the Human in the Loop

Given the pilot’s remit of establishing what it means to use AI acceptably in the context of marking and feedback, we decided not to be too prescriptive in mandating rules and standards for implementation. That said, one of the parameters we did put in place was that wherever AI was used to mark and give feedback on students’ work, a human should retain oversight of the process.

This principle aligns with regulatory positions: Ofqual’s approach to regulating AI, for instance, stresses that AI cannot determine marks autonomously, and that fairness, validity and public confidence depend on active human oversight.

Yet the pilot has demonstrated that keeping the human in the loop is more complex than one might initially think.

One of the clearest challenges is the problem of unintentional steering – the issue of educators being influenced by AI feedback, rather than reviewing it critically. To what extent can educators ensure their judgement cuts through when the AI’s feedback is presented as a comprehensive evaluation of the student’s work?

A related challenge is the risk of minimal human checking. Could the human-in-the-loop be reduced to simply scanning one’s eyes over the student’s work and the AI feedback before clicking ‘next’?

At first glance, this problem seems to be driven by a lack of diligence, but further insights revealed a more interesting logic. As per the objectives of the pilot, the motivation for using AI in this context is to reduce workload. As such, there is an expectation that the AI’s outputs can be trusted and, to some extent, relied upon without full re-checking. If the human in the loop is putting significant effort into reviewing the student’s work and the AI’s feedback, then what is the purpose of using AI at all?

Despite these challenges, the pilot is also revealing practical ways to resolve this tension, allowing for both human oversight and time savings.

One successful approach is that of dual marking. This is where the educator reviews and makes notes on the student’s work before seeing the AI’s feedback. The educator is then in a stronger position to review the AI’s first draft of feedback discerningly, and to give the platform their notes on both the student’s work and the AI’s feedback – inputs that can be used to contribute to a second draft.

In terms of time savings, the key here is for the human input to be focused on the overall effectiveness of the student’s work and some of its standout aspects. The AI, meanwhile, plays the role of analysing fulfilment of the marking criteria and writing up the feedback in rich detail.
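
For those building a custom assistant or workflow around this logic, the ordering described above could be sketched roughly as follows. This is only an illustration under our own assumptions, not a description of how any pilot platform is implemented: the class name `DualMarkingSession` and all of its methods are hypothetical. The point it shows is that the AI's first draft stays hidden until the educator has recorded their own notes.

```python
from dataclasses import dataclass, field


@dataclass
class DualMarkingSession:
    """Hypothetical model of one submission moving through dual marking."""

    ai_first_draft: str
    educator_notes: list[str] = field(default_factory=list)
    notes_on_ai_draft: list[str] = field(default_factory=list)

    def add_educator_note(self, note: str) -> None:
        # Step 1: the educator records their own view of the student's work.
        self.educator_notes.append(note)

    def reveal_ai_draft(self) -> str:
        # Step 2: the AI draft is only shown once an independent note exists,
        # which is the safeguard against unintentional steering.
        if not self.educator_notes:
            raise PermissionError(
                "Record your own notes before viewing the AI draft."
            )
        return self.ai_first_draft

    def add_note_on_ai_draft(self, note: str) -> None:
        # Step 3: the educator comments on the AI's first draft.
        self.reveal_ai_draft()  # raises if step 1 was skipped
        self.notes_on_ai_draft.append(note)

    def second_draft_inputs(self) -> dict[str, object]:
        # Step 4: bundle everything the platform would need for a second draft.
        self.reveal_ai_draft()  # raises if step 1 was skipped
        return {
            "ai_first_draft": self.ai_first_draft,
            "educator_notes": list(self.educator_notes),
            "notes_on_ai_draft": list(self.notes_on_ai_draft),
        }
```

In a sketch like this, the educator's notes can stay short and focused on overall effectiveness and standout aspects, while the detailed write-up against the marking criteria is left to the AI's second draft.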

(As a brief aside, the pilot has also highlighted the importance of having clear, detailed rubrics to input into AI platforms. There’s an old saying in the AI field: “garbage in, garbage out”. In our context, this translates to vague or ambiguous marking criteria leading to confused, unhelpful feedback.)

This dual marking approach can be facilitated by the features of TeacherMatic, and custom assistants (for example Custom GPTs, Copilot Agents and Gemini Gems) can also be constructed around this logic.

Graide, however, is designed with a distinct approach to keeping the human in the loop. With Graide, the AI platform does not utilise large language models. AI feedback is achieved by spotting patterns between features of the student’s work and previous feedback comments given directly by an educator. The AI, in essence, recommends each granular piece of feedback based on what it has learned about the feedback the educator has given to similar pieces of work. The educator’s role, therefore, is to verify that the feedback reflects their previous judgements. The process is comparable to a master checking that their apprentice is carrying out a task as instructed.
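
To make the general idea concrete, here is a toy sketch of recommending a past educator comment for a similar piece of work without any large language model. It is not Graide's algorithm, which is not described in this post: the word-overlap (Jaccard) similarity, the function names and the `threshold` parameter are all assumptions chosen for illustration.

```python
import re
from typing import Optional


def tokens(text: str) -> set[str]:
    # Lowercase the text and split it into a set of words and numbers.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Share of words the two answers have in common (0.0 to 1.0).
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_feedback(
    new_answer: str,
    history: list[tuple[str, str]],
    threshold: float = 0.5,  # assumed cut-off, not taken from any real tool
) -> Optional[str]:
    """Suggest the educator's comment from the most similar past answer.

    `history` holds (past_answer, educator_comment) pairs. Returns None when
    nothing is similar enough, so the educator writes the feedback themselves.
    """
    new_tokens = tokens(new_answer)
    best_comment: Optional[str] = None
    best_score = 0.0
    for past_answer, comment in history:
        score = jaccard(new_tokens, tokens(past_answer))
        if score > best_score:
            best_comment, best_score = comment, score
    return best_comment if best_score >= threshold else None
```

Whatever the matching method, the suggestion is only a recommendation: a similar-looking answer can still deserve different feedback, which is why the educator's verification step matters.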

Ultimately, the pilot is reinforcing an important lesson: human oversight works best when it is designed intentionally into the workflow. When educators are given a clear role in shaping, verifying and refining AI-generated feedback, the technology becomes more of a collaborator than a substitute. As the pilot continues, this design question (how to structure meaningful human involvement without undermining the productivity benefits of AI) is likely to remain central to how institutions approach AI-supported marking and feedback.

Find out more by visiting our Artificial Intelligence page to explore publications and resources, learn more about our communities and sign up for our AI Literacy training.

For regular updates from the team sign up to our mailing list.
Get in touch with the team directly at AI@jisc.ac.uk

By Tom Moule