
AI Detection – Latest Recommendations

We’ve had a few member institutions contact us to ask for updated advice on AI detection software, including recommendations for alternatives to Turnitin, so we thought it would be useful to produce some updated information.

First, the information in our previous blog post ‘AI writing detectors – concepts and considerations’ still holds, so we won’t repeat it here, other than to reiterate the four key points:

  • No AI detection software can conclusively prove text was written by AI
  • It is easy to defeat AI detection software
  • All AI detection software will give false positives
  • We need to consider what we are actually trying to detect as AI-assisted writing is becoming the norm.

Can we recommend other more reliable AI detection tools?

We’ve been asked this question a number of times, particularly as institutions come to the end of any trial of Turnitin’s tool.

Our answer is that we wouldn’t recommend looking at other AI detection tools at the moment – we have seen none that are reliable enough, and that’s likely to remain the case for now. Those who are using Turnitin AI detection need to invest in educating staff that the results will miss much AI writing and will also give false positives, claiming work was written by AI when it wasn’t. Institutions therefore shouldn’t rely on AI detection. If they do opt to use it, the results should form part of a discussion with students where academic misconduct is suspected, with the understanding that they might not be accurate.

This is backed up by an analysis of AI detectors by Weber-Wulff et al., ‘Testing of Detection Tools for AI-Generated Text’. This is a pre-print but appears robust, and includes Turnitin’s AI detector alongside the other main players. It notes:

“The researchers conclude that the available detection tools are neither accurate nor reliable and have a main bias towards classifying the output as human-written rather than detecting AI-generated text. Furthermore, content obfuscation techniques significantly worsen the performance of tools.”

It should be noted that, of the detectors analysed, this study shows Turnitin’s to be the most effective by most measures, including accuracy and false positives (none were identified in this test) – hence our recommendation that, for institutions committed to AI detection, there is little value in looking at alternatives unless for cost reasons. It’s perhaps not surprising that Turnitin’s solution is the most accurate in this study, given the volume of relevant training data they have access to, although we haven’t been able to find full details of this data set.

It should also be noted that whilst this particular study didn’t show false positives in Turnitin, others have – see for example ‘We tested a new ChatGPT-detector for teachers. It flagged an innocent student’ in the Washington Post (April 2023). Turnitin themselves acknowledge that the false-positive rate was higher than expected in their update in May, and they explain the steps they are taking to try to address this.
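To see why even a low false-positive rate matters at institutional scale, a quick back-of-the-envelope calculation helps. The numbers below are purely hypothetical illustrations – they are not figures from Turnitin, the Weber-Wulff study, or any other source:

```python
# Base-rate sketch: even a small false-positive rate produces many false
# accusations at scale. All numbers here are hypothetical illustrations.

def expected_false_positives(num_submissions, human_fraction, false_positive_rate):
    """Expected number of genuinely human-written submissions wrongly flagged as AI."""
    human_submissions = num_submissions * human_fraction
    return human_submissions * false_positive_rate

# e.g. 10,000 submissions per year, 90% genuinely human-written,
# and a hypothetical 1% false-positive rate:
flagged = expected_false_positives(10_000, 0.9, 0.01)
print(f"Expected wrongly flagged submissions: {flagged:.0f}")  # prints 90
```

In other words, a rate that sounds negligible on paper can still mean dozens of students a year facing misconduct conversations over work they wrote themselves – which is why clear processes around detection results matter.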

Finally, we need to bear in mind that AI detectors, even when accurate, won’t be able to help us discern how AI was used — for example, whether it wrote the entire text or just assisted with phrasing and clarity.

Are issues with false positives happening in the real world?

The issue of false positives does seem to be having a real-world impact. We haven’t seen any full studies of this yet, but a few stories in the press and on Reddit give an indication that it is happening, though not a feel for the scale of the issue. The following articles are examples:

There are also quite a few threads on Reddit started by students who have been falsely accused – we won’t attempt to list them all, but here are a couple that give a flavour:

What do OpenAI say about AI detectors?

OpenAI shut their AI classifier down in July, citing poor performance, noting:

“… currently researching more effective provenance techniques for text, and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated.”

OpenAI also say the following in their guidance ‘How can educators respond to students presenting AI-generated content as their own?’:

“Do AI detectors work? In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.”

Obviously, OpenAI are not a neutral player in this space, but they have committed to researching other approaches to determining the provenance of text.

Do we need to be concerned about bias in AI Detectors?

There’s mounting evidence that AI detection tools are more likely to give false positives when classifying work by non-native English speakers. Intuitively, this isn’t surprising: when writing in another language, we are more likely to use a more formal, simpler style, similar to that produced by ChatGPT and similar tools. You can read more about this in the article ‘GPT detectors are biased against non-native English writers’. One thing worth noting is that researchers rarely have access to Turnitin, so this study, like others we have seen, doesn’t look specifically at their software.

What are universities and colleges doing?

At the moment we have no firm data on what institutions are doing for the next academic year. We are picking up a mixed picture, with some institutions that had previously enabled the Turnitin detector disabling it and vice versa, and we are hearing similar stories internationally. We are not aware of any university or college in the UK that has formally adopted any AI detection other than Turnitin – if your institution has, we’d love to hear from you.

What next?

Our view has been that relying on AI writing detection as a primary mechanism for maintaining academic integrity is futile, and nothing has changed that. We have talked previously about the futility of an arms race between AI writing tools and detectors; nonetheless, that race is now starting to happen – see ‘How to bypass Turnitin’ as an example of the technology battle.

Students we speak to want skills that are relevant to an AI-enabled workplace, and this includes making use of generative AI.  Techniques for defeating AI detection will, for the foreseeable future, outpace the development of AI detection software.

Assessment redesign is key – see this blog post from Dr Isobel Bowditch on designing assessment in an AI-enabled world for an example of the kinds of work that will help with this. We appreciate that, in some circumstances, such as when assessment is set by an external awarding body, assessment design won’t be in your control. Awarding bodies, sector bodies and institutions are going to need to work together to give clearer guidance.

In the meantime, if you have decided to use AI detection, it’s important to make sure staff are fully aware of the software’s weaknesses and that you have clear processes for dealing with the results of AI detection. Alongside this, providing clear, transparent information and guidance to students about what detection software is being used, and how its results are evaluated, is key.

Find out more by visiting our National centre for AI page to view publications and resources, join us for events and discover what AI has to offer through our range of interactive online demos.

For regular updates from the NCAI sign up to our mailing list.

Get in touch with the team directly at



