An introduction to copyright law and practice in education, and the concerns arising in the context of Generative AI

Introduction

Legislation and developing case law will, in time, clarify the legal obligations for Generative Artificial Intelligence tool providers as well as for rightsholders and users.

In the meantime, Generative AI is evolving rapidly and, as debate continues around its role within teaching, learning and assessment, the issue of ownership, and of the laws of copyright in particular, raises important concerns. Challenging questions arise that do not yet have clear answers. These include:

If I use Generative AI tools to generate content, am I at risk of infringing the rights of someone whose work was used to train the model?

Will I as a user of the tool own the work that is created by my prompts? Or do authors whose works are used to train AI systems have an ownership claim? Or is the AI tool the author?

Does the material that I input to a Generative AI tool become part of the Generative AI tool database for others to use?

If I input someone else’s content into a Generative AI tool and content is generated, could this be an infringement of their copyright?

Court cases that are currently underway will, in time, resolve many of these questions. If we in education want to engage with the transformative changes that Generative AI brings, we may have to live with a degree of risk and uncertainty until then.

AI-literacy – Protecting intellectual assets and creativity

In July 2023 the Russell Group published a new set of principles to help universities ensure students and staff are ‘AI literate’. The principles recognise the opportunities of Generative AI while drawing particular attention to the risks of plagiarised content and copyright infringement.

Understanding the legal aspects of how copyright operates in the context of Generative AI is central to supporting staff and students to become more Generative AI literate.

“AI literacy is just another part of digital/information literacy”.

Alex Fenlon, Consideration for a copyright advisor – ALT COOLSIG – Webinar 63 – Copyright and AI in Education and Research.

Copyright infringement

In the UK it is fairly settled law that unless a work is licensed, out of copyright, or used under a specific exception, making copies of it will be an infringement.

Copyright is a legally enforceable intellectual property right that gives a person ownership rights over the things they create. It makes it possible for the rights holder to profit from a work such as a book by preventing others from exploiting the work without their agreement for a period of time.

Is the Generative AI tool copying?

A key question that arises is whether Generative AI is in fact copying the input that it uses to train its models, or whether it is just analysing a lawfully accessed copy of that input material. The argument goes that Generative AI finds subtle patterns in the works it analyses, which it then uses to create new material (or a “derivative work”), guided by the prompts that are provided by users. On this view, there is no actual copying. This is contentious, and cases that are currently being litigated in the US and the UK will, eventually, clarify whether copying is taking place. They will also set the boundaries of what a “derivative work” is under intellectual property law in the Generative AI context. This will help those in education who are navigating the Generative AI revolution. In the meantime, an understanding of the technology as well as the risks of copyright infringement can provide a pathway forward for those involved in teaching, learning and research.

Training data

Universities are both substantial creators and consumers of copyrighted materials and make extensive use of the work of others. This “third-party” material used in teaching and research has always required certainty about its source and accuracy.

Generative AI creates text, images, music, speech, code or video based on learning from existing available content that is likely to be owned by someone and copyright protected. Where the Generative AI tool is able to provide the necessary assurance about the provenance of its output then the risk of infringement of others’ rights can be greatly reduced. This both enables the lecturer to credit the appropriate source and will ensure that the learning materials produced can more confidently be used as an asset by the university going forward.

Without that certainty the risk of infringement of the rights of others can undermine the entire process as well as diminish the value of learning materials as assets of the institution.

Liability for infringement

Part of the solution to this involves AI vendors reassuring their paying customers by promising legal support in the face of future legal threats. These indemnity clauses are often quite limited and act like an insurance policy, designed to reassure customers that it is safe to use the technology for commercial (and presumably educational) purposes. Some examples are looked at below.

Adobe

In June 2023 Adobe introduced an indemnity clause designed to ease its enterprise users’ fears about AI-generated art. The company undertook to pay out any claims should a customer lose a lawsuit over the use of Firefly-generated content.

Microsoft

In September 2023 Microsoft announced the Copilot Copyright Commitment for customers of the various Microsoft business-focused Copilots and Bing Chat Enterprise. Microsoft has undertaken to assume responsibility for the potential legal risks involved and will pay any legal damages if a third party sues a commercial customer for infringing their copyright by using Copilot.

OpenAI

In November 2023 OpenAI announced the “Copyright Shield” that will apply to the generally available features of ChatGPT Enterprise, the paid-for business tier of ChatGPT. Although this is designed to protect its business customers against copyright infringement lawsuits in certain circumstances, it does not apply to the free versions of ChatGPT.

Limitations on liability

It is clear that there are limits on the indemnification being offered. An example cited by Adobe is that the indemnification covers only the specific Firefly-generated output. Anything else added to the output that could infringe copyright, such as a likeness of Spider-Man added to the artwork, would not be covered.

Good practice then is to review the terms and conditions of the contract that is being entered into with the Generative AI tool provider to see if there is an indemnification clause in the contract and to understand what that covers.

Are licences the answer?

Publishers (rights holders) and the creative industries see challenges to their existing business models with the emergence and widespread use of Generative AI tools. In the world before Generative AI became widespread, academic practice largely dealt with the ownership rights of others by:

licensing third-party-owned content from publishers, either directly or by collective licences, or
using exceptions to copyright law that are contained in legislation, in particular the Copyright, Designs and Patents Act 1988 (CDPA).

One such licence widely used in Higher Education is managed by the Copyright Licensing Agency (CLA).

This sector agreement between rights holders and education providers has delivered certainty for rights holders as well as for lecturing staff who, for example, produce lesson plans and learning materials. Rights holders are protected as the CLA licence agreement prohibits the copying of substantial amounts of licensed content. It also bars making licensed content available to others without permission.

As part of what we can call Generative AI literacy, those using licences need to understand what they are permitted to do as licensees and stick to those terms and conditions. It would be beneficial to clarify for those in education how CLA-licensed materials, for example, can be used with Generative AI tools.

Altering existing licences such as the CLA licence is unnecessary and could have the effect of restricting existing rights to use content for legitimate non-commercial research, teaching and learning.

It remains to be seen how licence terms can be enforced should a publisher contend that Generative AI tools are being used to circumvent or expand the licensed use in ways that were not anticipated when such licences were agreed.

What terms and conditions are needed in licences?

Currently, educators who want to use Generative AI for teaching, learning and research are required to deal directly with the tool providers and agree to whatever terms and conditions are in the individual contracts.

There is an argument that educational use of Generative AI requires different terms and conditions in such licences. There is a call for agreements to provide assurances on the provenance of training data used. This transparency would then enable academics to uphold an acknowledgement-based approach. Such education agreements would also need to clarify how outputs created from content that academic staff submit to Generative AI tools can be acknowledged and protected.

Without special educational agreements for Generative AI, innovation in teaching and learning may be constrained. Meanwhile, institutions need to continue to support academics, students and researchers to understand what they can and can’t do with third-party licensed materials when using Generative AI tools.

Copyright exceptions

In education, using the existing exceptions to copyright law is also put forward as part of the solution. One exception that is pivotal to Generative AI is s.29A of the CDPA – Text and Data Mining (TDM). This allows researchers to make copies of any copyright material for the purpose of “computational analysis”, subject to certain restrictions.

“If the government’s aim is to promote an innovative AI industry in the UK, it should enable mining of available data, text, and images (the input) and utilise existing protections of copyright and IP law on the output of AI.”

Pro-innovation Regulation of Technologies Review Digital Technologies – March 2023 Response to Sir Patrick Vallance’s Pro-Innovation Regulation of Technologies Review

To achieve this, in late 2023 the UK Intellectual Property Office (IPO) brought together representatives from AI companies, as well as arts and news organisations, to produce guidance and a “code of practice” on how the mining of text and data for AI models could be authorised.

As of February 2024, it is reported that this “code of practice” is delayed because of failure to reach agreement on a set of rules. The consequent uncertainty affects all of those who could benefit from this exception, including those in education and research, as well as artists, authors and musicians who fear their work will be used and copied without compensation.

The copyright exceptions in the CDPA, including TDM, remain relevant to how AI is used. However, the uncertainty about an agreement on a new AI copyright code of practice is particularly damaging to education, which requires assurances about the sources and accuracy of teaching and research materials.

Plagiarism

Students/researchers

In the same way that the use of Generative AI tools by staff and students has consequences for copyright practice in institutions, their use presents significant plagiarism challenges in teaching, learning and assessment.

“Plagiarism means copying or paraphrasing someone else’s work or ideas or information without giving them proper credit. The source should be acknowledged and cited correctly.”

Attribution

Educators need to know where content has originated. The need to have rightsholders’ works accurately attributed when included in Generative AI outputs is recognised. Given the iterative and often piecemeal way that outputs are created from multiple sources, this remains complex and difficult to achieve at this time.

Authenticity

There is a need for clarity on which outputs from Generative AI are authentic and which are not.

“Any manipulation and / or use of copyright-protected works by Generative AI systems should not undermine the integrity, accuracy or original meaning of the original works.”

CLA – Principles for Copyright and Generative AI

So, how can the risks associated with students and researchers using Generative AI tools in their learning, research and assessment work be mitigated?

Once again, the answer lies in being able to have assurances about the provenance of Generative AI outputs to enable appropriate citation and acknowledgement of third-party work.

Without clarity about the source of training data, the reliability and quality of Generative AI outputs are put in question, and the risk of plagiarism hangs over the work.

Legislation

The approach to regulating Generative AI varies across different jurisdictions. However, it is widely recognised that the ownership rights in work generated by Generative AI, and how those outputs are used, are contentious and must be resolved in order to encourage innovation and support the opportunities that Generative AI presents.

United Kingdom

Are new Generative AI laws needed?

In the National AI Strategy, the UK government indicates that it considers that regulation is not always the most effective way to support responsible innovation. An AI Regulation Roadmap proposes a framework underpinned by principles to guide and inform the responsible development and use of AI. These principles include transparency, fairness and accountability.

UK law already protects computer-generated works which do not have a human creator (s.178 CDPA). However, it was accepted (during the National AI Strategy consultation process) that some clarification was needed on the interpretation of words such as “author” and “creator” in the context of AI-assisted works.

As stated earlier, part of promoting an innovative AI industry in the UK involves certainty in terms of mining of available data, text, and images (the input). It means utilising existing protections of copyright and IP law on the output of AI.

For AI firms, there is an urgent need to prioritise practical solutions to the barriers faced in accessing copyright and database materials.
For rights holders, compensation is demanded for the use of their content by Generative AI.
For those in education, certainty is required about the source and accuracy of materials used in teaching and research.

Achieving the correct balance between rights holders and those who want to innovate using Generative AI is a priority. Part of the solution could be the delayed IPO guidance and “code of practice” on the mining of text and data for AI models, as mentioned above. Hopefully this IPO guidance and the “code of practice” can be progressed and made available soon.

United States

In the US, generally no copyright exists for the outputs of AI systems. The US differs from the UK in that it has a copyright registration system administered by the United States Copyright Office (USCO). Applicants have a duty to disclose the inclusion of AI-generated content in a work submitted for registration. In a significant ruling, the USCO has stated that if a work’s traditional elements of authorship were produced by a machine, the work lacks human authorship and cannot be registered.

On 30 August 2023 USCO announced that it will undertake a study of the copyright law and policy issues raised by Generative AI and will assess whether legislative or regulatory steps are required.

European Union

The EU has taken a wide-ranging and precautionary approach to regulation of Generative AI. The EU AI Act (text not yet published) is intended to provide legal certainty for investment and innovation in AI. It adopts a tiered approach where Generative AI systems will have to be more transparent. All “general-purpose AI systems/models” will be obliged to comply with EU copyright law and provide detailed summaries of the content used for training AI models.

Legislation summary

In this time of legislative change and progress, clearly a more comprehensive review and study of emerging and developing legislation is called for. In particular, how will this legislation govern and regulate the use of Generative AI in UK education?

Case law

The outcome of a number of legal cases will shape the Generative AI landscape going forward.

In the UK the courts have yet to rule on the copyright issues related to content created with Generative AI. In the US, AI companies have faced multiple lawsuits from figures in the creative industry who believe the copyright in their work has been breached to train AI models. It is argued that the companies infringed copyright by using authors’ materials to train AI language models, and separately that the models’ output also violates their copyrights.

New York Times case

In December 2023, the New York Times sued OpenAI and its investor Microsoft, accusing them of using “millions” of its articles to train the Generative AI program. The arguments hinge on whether OpenAI’s use of copyrighted works to train its technologies is “fair use” under US law. Clearly the volume of material used to train the AI model has a bearing on whether “fair use” can apply.

AI companies claim that they can legally use such content to train their technologies without paying for it because the material is public and they are not reproducing the material in its entirety.

This is a crucial case that will have consequences for all AI systems and users worldwide. At this time there is no indication as to when a determination will be made. Many people assume that a negotiated settlement will be the outcome where payment for training content will be made to rights holders of copyrighted content.

Getty Images

In the UK, Getty Images raised legal proceedings against Stability AI before the High Court in London alleging infringement of its IP rights. Getty claims that its images are used as data inputs for the purposes of training the Stability AI tool, and that the outputs generated reproduce substantial parts of Getty-owned copyright works.

Summary – What’s next?

Generative AI is evolving rapidly and, in terms of teaching, learning and research, universities are aware of the risks with regard to academic integrity, plagiarised content and copyright infringement. Court cases that are currently being litigated will, eventually, help us avoid these risks. In the meantime, some universities are adapting to the challenges by taking the approach that copyright law applies to the use of Generative AI tools.

Inputs

Generative AI tool providers are coming under increased scrutiny for the use of copyright protected works as training material. The New York Times case above is one example. More transparency is being called for by regulators and rightsholders about the source of the training content that is being used. This is the approach taken in the EU AI Act.

There is a growing need to develop innovative licensing models that will set the conditions and payment models for use of training materials belonging to rights holders. This can provide the sort of certainty that education users in particular require.

Outputs

All of those who want to use Generative AI outputs for either innovative or educational purposes need to have the confidence that what they are using has been obtained lawfully and that the outputs don’t breach copyright.

The stakes are high. It is accepted that common-sense negotiated agreements which take into account the ownership rights of authors, creative artists, news organisations and other rights holders are desirable, although not necessarily inevitable.

How and when these agreements will be achieved remains uncertain.

Education

Can existing licensing agreements that have been painstakingly and delicately negotiated and established with rights holders continue to serve the interests of both publishers and education providers in the era of Generative AI?

There is certainly a need for clarity and recognition of outputs created from content that academic staff submit to Generative AI tools, for example. There is also a need for assurance that outputs, and the sharing of those outputs with students, do not breach copyright.

Supporting staff and students in terms of their teaching, learning and research in this fast-changing environment is challenging. An understanding of the Generative AI technology itself and the associated risks (‘Generative AI literacy’) will help stimulate learning and development of key skills for life and the workplace.

Thanks

Our thanks to the HE collaborative copyright working group for their direction, support, and input to this blog:

Manya Sikombe, Junior AI Specialist, Jisc
David Callaghan, Liverpool School of Tropical Medicine
Georgina Dimmock, University of Northampton
Michele Smith, University of the Highlands and Islands
Mike Reddy, University of South Wales
Rob Howe, University of Northampton
Tim Hall, University of London
Robert Pashley, The University of Law
Ruth Powell, University of the Arts London
Darren Flynn, University of Northampton
Andrew Gray, University of the Arts London
Catherine Luck, University of London
Ben Taplin, Contracts Specialist, Jisc

Acknowledgements

Microsoft announces new Copilot Copyright Commitment for customers

UK AI copyright code initiative abandoned – Pinsent Masons LLP

Navigating the Complexities of Generative AI in Intellectual Property: Challenges and Opportunities

Getty Images v Stability AI: copyright claims can proceed to trial – Pinsent Masons LLP

Guidance to Staff on the use of Artificial Intelligence – University of Leeds

Dawn of the EU’s AI Act: political agreement reached on world’s first comprehensive horizontal AI regulation

About Plagiarism.org

AI Raises Complicated Questions About Authorship

