EduMark AI: A generative AI marking and feedback solution from Queen Mary University of London (QMUL)

This academic year, a key focus for Jisc’s AI team is understanding AI’s potential to support marking and feedback. We’re currently running three parallel pilots of AI solutions (Keath, Graide and TeacherMatic). We will also soon launch a project to evaluate the impact that general-purpose AI tools (e.g. ChatGPT and Gemini) can have on marking and feedback.

Given this focus, we’ve also been talking to colleges and universities who’ve developed innovative approaches in this area. Queen Mary University of London, for instance, has developed their own dedicated platform, EduMark.

While not all institutions will choose to develop products in-house, Queen Mary’s example provides useful lessons in how thoughtful design, testing, and educator oversight can make AI a genuine partner in improving feedback quality and consistency.

The following is a guest blog from the project’s lead, Dr. Deepshikha of the School of Engineering and Materials Science, Queen Mary University of London.

The Challenge of Assessment in Higher Education

Traditional assessment methods in higher education are under mounting pressure. At Queen Mary University of London (QMUL), as at many institutions globally, educators struggle to balance teaching and research with time-intensive tasks such as exam marking and providing comprehensive feedback. Marking written assignments by hand is not only time-consuming but can also lead to inconsistent grading standards. For instance, within the School of Engineering and Materials Science (SEMS) at QMUL, marking and feedback for assessments is estimated to consume approximately 3,400 hours annually, or about 67 hours per educator for a class of 50 students. This extensive workload often affects the timeliness and consistency of feedback provided to students.

Sound familiar? What began as our quest to solve this time management problem evolved into EduMark AI, but more importantly, it revealed a practical pathway that any institution can follow using readily available AI tools.

Pioneering EduMark AI: Our Learning Journey

Recognising the potential of AI to revolutionise assessment, we embarked on the EduMark AI project with a precise aim:

to select and implement the most effective AI tool for grading and providing personalised student feedback, thereby significantly reducing educator workload and enhancing the quality of feedback.

Our journey began with a rigorous evaluation of multiple large language models (LLMs), followed by the careful design of a prompt framework, assessment criteria, and feedback structures that could align AI-generated outputs with academic expectations and pedagogical integrity.

While we ultimately developed EduMark AI as a comprehensive system, one of our most valuable discoveries was that the core methodology can be replicated using tools like ChatGPT, Custom GPTs, Gemini, or Copilot. The secret isn’t in the technology; it’s in the systematic approach: reviewing and selecting the right AI model for your needs, developing effective prompt templates, evaluating performance, and adapting based on insights. Let me share exactly how we did this, and how you can adapt our approach using tools you likely already have access to.

Phase 1 – LLM Selection

The first step in the EduMark AI journey involved a structured evaluation of leading Large Language Models (LLMs) to determine which one would be most effective in an educational assessment context. We compared multiple models, including ChatGPT, Gemini, and Graide, against criteria such as grading accuracy, ease of integration, language fluency, feedback clarity, cost-effectiveness and response time.

You can replicate this process safely using institutionally approved AI platforms by following these steps:

Test with 5-10 anonymised student assignments from your course
Use the same rubric for both AI and human grading
Compare outputs for consistency and quality
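As a rough illustration of the comparison step, the sketch below summarises how closely AI marks track human marks on a small pilot set. It is not part of EduMark AI: the function name `compare_marks`, the 10-mark tolerance, and the sample marks are all assumptions made for this example.

```python
from statistics import mean


def compare_marks(human, ai, tolerance=10.0):
    """Summarise agreement between paired human and AI marks (0-100 scale).

    `tolerance` is a hypothetical threshold for "close enough"; choose one
    that suits your own marking scheme.
    """
    if len(human) != len(ai) or not human:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [a - h for h, a in zip(human, ai)]
    return {
        # Average size of the disagreement, ignoring direction.
        "mean_abs_diff": mean(abs(d) for d in diffs),
        # Positive means the AI marks higher than the human on average.
        "mean_bias": mean(diffs),
        # Fraction of scripts where the two marks differ by <= tolerance.
        "share_within_tolerance": sum(abs(d) <= tolerance for d in diffs) / len(diffs),
    }


# Made-up marks for five anonymised scripts, for illustration only.
human_marks = [42, 55, 63, 71, 88]
ai_marks = [50, 57, 62, 70, 80]
summary = compare_marks(human_marks, ai_marks)
print(summary)
```

A mean bias near zero can hide opposite errors at the two ends of the mark range, so it is worth reading the individual differences as well as the averages.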

After extensive internal benchmarking using anonymised student assignments, ChatGPT emerged as the most suitable model for our purposes. It offered the best overall performance in generating assessment-aligned feedback, showed strong alignment with rubric-based criteria, and was readily adaptable across diverse assignment types and marking schemes.

This selection process also considered student data privacy, cost, model responsiveness, and adaptability across various disciplines. Ultimately, we chose ChatGPT not only for its technical strength but also for its reliability in real-world testing during early pilot phases across multiple modules. However, your results may vary depending on your discipline and assessment types, which is why testing is crucial.

Phase 2 – Prompt Engineering (The Real Game-Changer)

This is where the magic happens. After selecting the LLM, we shifted our focus to shaping and refining its outputs for academic use through prompt design. We discovered that effective AI assessment hinges on three key prompt components:

Structured Assessment Prompts

We developed template prompts aligned with our marking rubrics. For example:

“You are an experienced academic assessor. Evaluate this [assignment type] using the provided rubric. Focus on [specific criteria]. Provide a numerical score and detailed feedback covering: what went well, areas for improvement, and specific suggestions for enhancement.”
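One way to reuse a template like this across modules is to fill its bracketed slots programmatically. The following is a minimal sketch under that assumption; `build_assessment_prompt` and its parameters are hypothetical names, and the rubric and submission text are placeholders.

```python
# The wording mirrors the template prompt quoted above; the bracketed
# slots become named placeholders.
ASSESSMENT_TEMPLATE = (
    "You are an experienced academic assessor. "
    "Evaluate this {assignment_type} using the provided rubric. "
    "Focus on {criteria}. "
    "Provide a numerical score and detailed feedback covering: what went well, "
    "areas for improvement, and specific suggestions for enhancement."
)


def build_assessment_prompt(assignment_type, criteria, rubric, submission):
    """Fill the template and attach the rubric and the anonymised submission."""
    instruction = ASSESSMENT_TEMPLATE.format(
        assignment_type=assignment_type,
        criteria=", ".join(criteria),
    )
    return f"{instruction}\n\nRubric:\n{rubric}\n\nSubmission:\n{submission}"


# Placeholder inputs, for illustration only.
prompt = build_assessment_prompt(
    assignment_type="lab report",
    criteria=["experimental method", "data analysis"],
    rubric="Method (40 marks); Analysis (60 marks)",
    submission="<anonymised student text>",
)
print(prompt)
```

The resulting text can be pasted into whichever institutionally approved AI tool you are testing.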

Exemplar Integration

We calibrated the AI using exemplar answers at different grade boundaries. You can do this by:

Providing 2-3 examples of excellent, good, and poor submissions
Including these in your prompt as reference standards
Asking the AI to calibrate its responses accordingly
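The exemplar steps above could be sketched as follows. This is an illustrative helper rather than the EduMark AI code: `add_exemplars`, the band labels, and the exemplar text are assumptions for the example.

```python
def add_exemplars(base_prompt, exemplars):
    """Append graded exemplar answers to a marking prompt as reference standards.

    `exemplars` is a list of (band, text) pairs, such as ("excellent", "...").
    """
    lines = [
        base_prompt,
        "",
        "Reference exemplars (use these to calibrate your marks):",
    ]
    for number, (band, text) in enumerate(exemplars, start=1):
        lines.append(f"Exemplar {number} - standard: {band}")
        lines.append(text.strip())
        lines.append("")
    lines.append("Calibrate your score and feedback against these exemplars.")
    return "\n".join(lines)


# Placeholder exemplars, for illustration only.
calibrated_prompt = add_exemplars(
    "Evaluate this lab report using the provided rubric.",
    [
        ("excellent", "<anonymised excellent submission>"),
        ("good", "<anonymised good submission>"),
        ("poor", "<anonymised poor submission>"),
    ],
)
print(calibrated_prompt)
```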

Feedback Structure Templates

We designed systematic feedback formats that ensure consistency:

Strengths: What the student did well
Areas for improvement: Specific weaknesses identified
Action points: Concrete suggestions for enhancement
Overall grade: With justification
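If you want every piece of feedback to follow the same four-part layout, one option is to hold it in a small data structure and render it consistently. The sketch below assumes this approach; the `Feedback` class and its field names are hypothetical, and the sample comments are invented.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """One student's feedback in the four-part structure listed above."""
    strengths: list[str]
    areas_for_improvement: list[str]
    action_points: list[str]
    overall_grade: float
    justification: str

    def render(self):
        """Return the feedback as plain text with a fixed section order."""
        def section(title, items):
            return [f"{title}:"] + [f"  - {item}" for item in items]

        lines = []
        lines += section("Strengths", self.strengths)
        lines += section("Areas for improvement", self.areas_for_improvement)
        lines += section("Action points", self.action_points)
        lines.append(f"Overall grade: {self.overall_grade} ({self.justification})")
        return "\n".join(lines)


# Invented comments, for illustration only.
example = Feedback(
    strengths=["Clear method section"],
    areas_for_improvement=["Error analysis is thin"],
    action_points=["Add uncertainty estimates to the results table"],
    overall_grade=62,
    justification="meets most rubric criteria",
)
rendered = example.render()
print(rendered)
```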

Significantly, these templates were refined through iterative testing, educator validation, and student focus group feedback. The result was an AI-driven system that not only produced scores but also detailed, personalised feedback comments structured around “what went well” and “areas for improvement”, ensuring a student-friendly and pedagogically sound experience.

What We Learned: The Critical Success Factors

With the prompt framework and model selection in place, the EduMark AI system was deployed as a shadow marker alongside traditional educator grading. Over 200 diverse student submissions, including lab reports, coursework essays, and research posters, revealed important insights that inform how you should approach AI assessment.

The results revealed several notable differences between AI and educator marking. While AI-generated marks tended to align well within the mid-performance range, deviations emerged at the extremes. The AI was observed to slightly over-mark lower-performing students, possibly due to its tendency to reward partial structure or the use of terminology, and to under-mark high achievers, where subtle reasoning, originality, or implicit understanding played a significant role. Unlike educators, who can interpret context, cross-reference prior performance, and recognise deeper insight beyond rubric descriptors, AI operates strictly within the limits of its prompt logic and rubric constraints.
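To check for a similar pattern in your own pilot data, you could group scripts by the educator's mark and look at the average AI-minus-educator difference in each band. The sketch below assumes hypothetical band cut-offs of 50 and 70 and uses invented marks; it does not reproduce our data.

```python
from collections import defaultdict
from statistics import mean


def band_for(mark, low_cutoff=50, high_cutoff=70):
    """Assign a performance band from the educator's mark (cut-offs are assumptions)."""
    if mark < low_cutoff:
        return "low"
    if mark < high_cutoff:
        return "mid"
    return "high"


def bias_by_band(pairs):
    """Mean (AI mark - educator mark) per band; positive means the AI over-marks."""
    grouped = defaultdict(list)
    for educator_mark, ai_mark in pairs:
        grouped[band_for(educator_mark)].append(ai_mark - educator_mark)
    return {band: mean(diffs) for band, diffs in grouped.items()}


# Invented (educator, AI) mark pairs, for illustration only.
pilot = [(40, 46), (45, 49), (58, 59), (62, 61), (75, 70), (82, 75)]
bias = bias_by_band(pilot)
print(bias)
```

A positive value in the low band together with a negative value in the high band would correspond to the over-marking and under-marking pattern described above.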

Moreover, the tone and richness of feedback varied. AI feedback was often clear, structured, and quick, but occasionally lacked pedagogical sensitivity, developmental scaffolding, or tailored encouragement that educators naturally provide. This highlighted the critical importance of educator oversight in reviewing and refining AI-generated outputs. To address this, we embedded an “Educator Review” stage into our workflow, ensuring that AI feedback could be amended or enriched before being released to students.

This hybrid model preserved the efficiency gains of automation, saving around 60% of marking time, while maintaining academic integrity and personalisation. For instance, marking 40 reports using AI took approximately 230 minutes, compared to 610 minutes when done manually by a human marker (a saving of roughly 62%), without compromising the core elements of effective feedback. Efficiency with AI increases as cohort size increases, because the up-front training time remains constant regardless of the number of students.
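The arithmetic behind these figures, and the reason savings grow with cohort size, can be sketched as below. The 230 and 610 minute figures come from the text; the split of the AI time into a 90-minute fixed setup and 3.5 minutes per script is purely an assumption chosen to add up to 230 minutes for 40 reports.

```python
def time_saving(ai_minutes, manual_minutes):
    """Fraction of marking time saved by the AI-assisted workflow."""
    return 1 - ai_minutes / manual_minutes


def ai_total_minutes(n_students, setup_minutes, per_script_minutes):
    """Fixed setup cost plus a per-script cost (both values are assumptions)."""
    return setup_minutes + n_students * per_script_minutes


# Figures reported above for 40 reports.
saving_40 = time_saving(230, 610)

# Hypothetical cost model: 90 + 40 * 3.5 = 230 minutes for 40 reports.
SETUP, PER_SCRIPT = 90, 3.5
MANUAL_PER_SCRIPT = 610 / 40  # 15.25 minutes per report when marked manually

# The same model applied to a hypothetical cohort of 200 students.
saving_200 = time_saving(
    ai_total_minutes(200, SETUP, PER_SCRIPT),
    200 * MANUAL_PER_SCRIPT,
)

print(f"40 reports: {saving_40:.1%} saved")
print(f"200 reports (hypothetical): {saving_200:.1%} saved")
```

Because the setup cost is spread over more scripts, the saving rises with cohort size under this model, although the exact numbers depend entirely on the assumed split.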

Challenges and How We Tackled Them

One of the primary challenges was ensuring consistency between AI and human grading. Our analysis revealed that while AI marks aligned closely with educator marks in the mid-range, the AI tended to over-mark lower-performing students and under-mark top-performing work, possibly due to a stricter adherence to rubric boundaries. This highlighted the importance of the ‘Educator Review’ step in our workflow, allowing for validation and adjustments to AI-generated scores and feedback.

Another significant aspect was addressing educator familiarity and confidence with AI technologies. We recognised that many educators are still new to these tools and may lack confidence in selecting and utilising them effectively. To overcome this, our project includes a robust training component with materials on how to use the recommended AI model, as well as continuous support to build faculty competence and confidence.

Vision for Scalable, Ethical AI in Assessment

The results from EduMark AI are encouraging, demonstrating a quantifiable impact on workload and feedback quality. We reduced grading time by 50-60%.

Beyond efficiency, the quality of feedback was a key focus. Student feedback consistently praised the AI’s specificity, highlighting how it helped them identify what went well and what needed improving, with concrete examples and suggested corrections. Students weren’t just receiving faster feedback; they were receiving better feedback. The AI’s ability to provide detailed, specific, and actionable comments consistently exceeded their expectations. Our data also showed strong grading consistency, with mark differences between educators and AI clustered within 0-10%.

Our vision for EduMark AI is one of scalable and ethical integration within educational settings. The project prioritises educator control, positioning AI as a powerful assistant rather than a replacement for human judgment. The promising results underscore the potential for a user-friendly, web-based application that seamlessly integrates with existing learning management systems, such as QMPlus and Turnitin.

Beyond Efficiency: The Broader Impact

EduMark AI represents more than a time-saving tool. It serves as proof of concept for integrating ethical AI in education. Our experience demonstrates that AI can enhance, rather than diminish, the human elements of education. It shows how to:

Enhance consistency in assessment practices
Provide personalised learning experiences at scale
Free educators to focus on higher-value activities like mentoring and innovation
Improve accessibility of detailed feedback for all students

The future of assessment isn’t human versus AI; it’s human with AI. EduMark AI shows what’s possible when we thoughtfully integrate technology with pedagogical expertise, always keeping student success at the centre of our efforts. We believe the findings and methodologies from EduMark AI can be readily scaled across Queen Mary University of London and potentially other institutions, promoting widespread adoption of AI in education, particularly within STEM disciplines. This directly supports our institutional goal of embedding AI literacy across our programmes and contributes to a future-oriented, high-impact educational ecosystem.

Want to learn more? The approach we’ve outlined here can be adapted across disciplines and institutions. The most important investment isn’t in technology; it’s in developing a systematic approach to prompt design, validation, and continuous improvement that makes AI assessment truly effective. If you’d like to find out more, you can contact us for guidance on implementing these approaches in your institution, or to try our beta application, where you’ll get to see the whole system in action.

Dr. Deepshikha is a Teaching Fellow in the School of Engineering and Materials Science at Queen Mary University of London. Her research focuses on AI applications in education and sustainable learning technologies. She can be reached at d.deepshikha@qmul.ac.uk

Find out more by visiting our Artificial Intelligence page to explore publications and resources, learn more about our communities and sign up for our AI Literacy training.

For regular updates from the team sign up to our mailing list.

Get in touch with the team directly at AI@jisc.ac.uk