Reflections and outcomes from our chatbot pilots

Introduction

We’ve just finished our extended chatbot pilots, started in 2022, in collaboration with Bolton College. A lot has changed in the world of chatbots since we started, particularly with the release of ChatGPT. We think a lot more is going to change over the next year as well.

The chatbot pilots were well received, but because of changes in the technology, and the way chatbots such as ChatGPT have changed user expectations, we’ve decided not to proceed further with these. Instead, we are going to monitor how generative AI chatbots and services evolve over the next few months before making further recommendations.

We’d very much like to thank Bolton College for working with us on this, and the pilot colleges: Ayrshire College, Sandwell College, Yeovil College and Blackpool and The Fylde College.

There were still useful lessons learnt from the pilot, so in the rest of this post we’ll outline the pilot and evaluation approach, and discuss some of the key outcomes.

About the pilots

The chatbot approach used in this pilot was based on a ‘traditional’ chatbot, in this instance using Amazon Web Services’ Lex, although similar tools are available from IBM, Microsoft and many others. The aim was to create a general-purpose chatbot that could answer a range of questions about the college. These chatbots rely on being given a set of question-and-answer pairs, with each question posed in multiple ways. Each of the colleges customised a question-and-answer set, based on those used by Bolton College in Ada, typically with around 200-300 question-and-answer pairs, with each question phrased between 3 and 5 different ways, meaning the chatbots could respond to over 1,000 differently worded questions. These were typically a range of general questions about the institution and its services and systems, for example:

“what time does the library open?”
“where can I buy coffee?”
“How do I reset my password?”
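
The question-and-answer structure described above can be illustrated with a minimal sketch. This is not how Amazon Lex works internally and is not the pilot's code: the answers, the `QA_PAIRS` layout, the `best_match` function and the 0.6 threshold are all hypothetical, and simple string similarity from Python's standard library stands in for the matching a real chatbot service would do.

```python
from difflib import SequenceMatcher

# Hypothetical question-and-answer set: each answer is paired with
# several phrasings of the same question. Answers are made up.
QA_PAIRS = [
    {
        "answer": "The library opens at 9am on weekdays.",
        "phrasings": [
            "what time does the library open?",
            "when is the library open?",
            "library opening hours",
        ],
    },
    {
        "answer": "You can buy coffee in the cafe next to reception.",
        "phrasings": [
            "where can I buy coffee?",
            "where is the cafe?",
        ],
    },
    {
        "answer": "Use the IT self-service page to reset your password.",
        "phrasings": [
            "how do I reset my password?",
            "I forgot my password",
            "change password",
        ],
    },
]


def normalise(text: str) -> str:
    """Lower-case, trim trailing punctuation and collapse whitespace."""
    return " ".join(text.lower().strip("?!. ").split())


def best_match(question: str, threshold: float = 0.6):
    """Return the answer whose phrasing is most similar to the question,
    or None if nothing is similar enough (threshold is an assumption)."""
    best_score, best_answer = 0.0, None
    for pair in QA_PAIRS:
        for phrasing in pair["phrasings"]:
            score = SequenceMatcher(
                None, normalise(question), normalise(phrasing)
            ).ratio()
            if score > best_score:
                best_score, best_answer = score, pair["answer"]
    return best_answer if best_score >= threshold else None


def total_phrasings() -> int:
    """Count every phrasing the chatbot has been given."""
    return sum(len(pair["phrasings"]) for pair in QA_PAIRS)


if __name__ == "__main__":
    print(best_match("What time does the library open?"))
    print(best_match("qqqq"))  # nothing similar, so no answer
    print(total_phrasings())
```

At the scale described, 200 to 300 pairs with 3 to 5 phrasings each give somewhere between 600 and 1,500 phrased questions, and every one of them has to be written and maintained by hand.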

Three of the chatbots were general, and one chatbot was focused more on answering questions about student finances.

We evaluated the project by looking at a number of different aspects including:

The student perceptions of the chatbots
Technical effectiveness of the chatbots
The time savings associated with having a chatbot

The student perceptions of the chatbots

We explored student perceptions of the chatbot through focus groups, with students from three out of the four colleges in the pilot. Broadly, the feedback from the students was positive, although they noted that the chatbot was better at simpler questions. A number of the students also said that they preferred interacting with the chatbot to a person, particularly for basic questions, and found it easier than searching the college’s website.

A number of students mentioned that it didn’t quite meet their expectations in terms of responsiveness and ability to answer questions the first time, without the need to rephrase.

Most of the students would welcome a chatbot that was integrated into college IT systems and could give personalised answers.

Overall, the feedback indicated that chatbot services were a useful and welcome addition, and with more functionality would be even more valuable.

Technical effectiveness of the chatbots

The questions and responses to the chatbot were stored anonymously and analysed to see how frequently the chatbot responded with the right answer. This was typically around 40%-50% of the time. We found students often put in ‘test’ questions to play with the chatbot, which would be very unlikely to give a valid response. We, therefore, put more weight on the student workshops when assessing effectiveness. Either way, we saw that further work, by adding in more questions and answers, would be needed to approach a more useful level of response rate. This is a challenging and time-consuming activity and one that we think may well be partially solved by generative AI chatbots, which should be able to answer questions based on existing documents and information sources without the need for manually curating question sets.
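
To illustrate why ‘test’ questions make the raw figure hard to interpret, here is a minimal sketch of a response-rate calculation. The log format, the `LogEntry` fields and the sample entries are all hypothetical and are not the pilot data; in particular, we assume each entry has already been reviewed and flagged by a person.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    question: str             # anonymised question text
    answered_correctly: bool  # assumed to be judged by a reviewer
    is_test_question: bool    # assumed to be flagged by a reviewer


def response_rate(entries, exclude_tests: bool = False) -> float:
    """Share of questions answered correctly, optionally ignoring
    questions flagged as 'test' input."""
    considered = [
        e for e in entries if not (exclude_tests and e.is_test_question)
    ]
    if not considered:
        return 0.0
    correct = sum(1 for e in considered if e.answered_correctly)
    return correct / len(considered)


# Made-up sample log for illustration only.
SAMPLE_LOG = [
    LogEntry("what time does the library open?", True, False),
    LogEntry("where can I buy coffee?", True, False),
    LogEntry("how do I get a bus pass?", False, False),
    LogEntry("can I change my course?", False, False),
    LogEntry("are you a robot?", False, True),
]

if __name__ == "__main__":
    print(f"All questions: {response_rate(SAMPLE_LOG):.0%}")
    print(f"Excluding test questions: {response_rate(SAMPLE_LOG, True):.0%}")
```

In this made-up sample, one test question out of five is enough to move the measured rate by ten percentage points, which is one reason a log-based figure alone can understate how useful the chatbot is for genuine queries.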

As an aside, as we had the ability to analyse the questions asked (anonymised), it also gave us insights into the specific questions that were important to students. This varied between colleges but has the potential to be a useful data source in improving the information provided to students.

Time saved by implementing the chatbot

We attempted to measure the saving in staff time from implementing the chatbot. We did this by estimating the staff time it would take if a person were answering the queries. Following the student focus groups, it became clear this wasn’t necessarily an accurate comparison, as a lot of the use cases were about making things more convenient for the student, rather than preventing the query from going to a staff member.

One thing that was clear from discussions with staff in the pilots was that, as mentioned before, the time spent creating and curating the question-and-answer sets was fairly substantial, and for the pilots likely outweighed any time saving.

Moving Forward

The pilots showed that a chatbot that could answer general questions would be valued by students in colleges. However, setting it up was a time-consuming activity, and required substantial investment to get a good range of questions and answers. The pilots have shown this was possible, but the technology landscape changed substantially with the release of ChatGPT.

ChatGPT is based on a Large Language Model, pre-trained on vast amounts of information from the internet. This has resulted in two very significant changes in the chatbot landscape:

User expectations about a chatbot have shifted significantly
New, lower-effort techniques for training chatbots are emerging

User expectations

Whilst the chatbot approach used in the pilot can provide useful functionality, a generative AI approach completely changes the user experience, so the user can chat to the bot in a very natural way, ask follow-up questions, ask questions in many different ways and always get a fairly natural response. This is very different to the experience with a traditional chatbot, with its much more limited training set. User expectations have therefore shifted, and the approach used in this pilot would no longer match users’ view of chatbot capability.

New Techniques for Training

Generative AI chatbots can answer questions about given texts with no extra training. For example, you can point one at a PDF or web page and then ask questions with no preparation, unlike traditional chatbots, where question-and-answer pairs are required.
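
One common pattern behind this kind of tool is to find the passage of existing text most relevant to the question and hand it to a language model along with the question. The sketch below shows only that first step, under stated assumptions: the passages are made up, word overlap stands in for the more sophisticated search real tools use, and the call to a language model is deliberately left out. It is not a description of how any particular product works.

```python
import re

# Made-up passages standing in for existing college documents or web pages.
PASSAGES = [
    "The library is open from 9am to 5pm on weekdays.",
    "Coffee is sold in the cafe next to reception.",
    "Passwords can be reset from the IT self-service page.",
]


def tokens(text: str) -> set:
    """Split text into a set of lower-case words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def most_relevant_passage(question: str, passages: list) -> str:
    """Pick the passage sharing the most words with the question."""
    question_words = tokens(question)
    return max(passages, key=lambda p: len(question_words & tokens(p)))


def build_prompt(question: str, passages: list) -> str:
    """Combine the best passage and the question into text that could be
    sent to a language model (the model call itself is not shown)."""
    context = most_relevant_passage(question, passages)
    return (
        "Answer the question using only this text:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("When does the library open?", PASSAGES))
```

Note that nothing here requires a hand-written list of question phrasings: the source text itself is the only thing that has to be maintained.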

Tools are starting to emerge where a chatbot can be created by just pointing it at a group of pre-existing documents or webpages, and we expect this space to grow hugely over the next year, especially with the announcements from Microsoft and Google around Copilot and Workspace.

This approach should remove one of the major barriers to chatbot adoption, namely set-up time, and we expect it to mature over the next year.

Recommendations

The pilots showed that staff and students liked the idea of a chatbot that could answer questions about their college, and valued their implementations. There were, however, significant concerns about the effort required to maintain the datasets behind them.

Given that ChatGPT and large language models have completely changed the chatbot landscape, our decision is not to proceed with the approach used in this pilot, and to watch chatbot technology over the next few months before deciding the next steps. It is likely that this will be guidance on how to use pre-existing large language model chatbots effectively in institutions rather than us creating a tool, but this will become clearer fairly quickly.

Find out more by visiting our Artificial Intelligence page to view publications and resources, join us for events and discover what AI has to offer through our range of interactive online demos.

For regular updates from the team sign up to our mailing list.

Get in touch with the team directly at AI@jisc.ac.uk