Exploring GPT-5: Advances, Insights, and Early Takeaways

Last week, we saw the release of several new models from the main providers of frontier large language models (LLMs). The most anticipated was GPT-5 from OpenAI, which has been in the works for many months. Its release has, however, been met with mixed reactions. GPT-5 differs from other LLMs currently on offer by abstracting the task of model choice away from the user. It is described as a unified system comprising multiple models alongside a real-time router that directs each request to the appropriate one, selecting from smart and fast models or slower, “thinking” models.

How is GPT-5 different?

All interactions with the GPT-5 option within ChatGPT are assessed and directed to the most appropriate model for the use case. Plus and Pro users can also manually select between GPT-5 (gpt-5-main) and GPT-5 Thinking (gpt-5-thinking), should they want to prioritise a more in-depth response over response speed. If users hit their usage limits, all requests are sent to smaller versions of these models, called gpt-5-main-mini and gpt-5-thinking-mini, until their usage resets. Free users can send GPT-5 10 messages every 5 hours, Plus users get 160 messages every 3 hours, and Team and Pro plans receive unlimited access.
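To make the fallback behaviour concrete, here is a minimal sketch of how a usage limit with a "mini" fallback could work. It is purely illustrative: OpenAI has not published how limits are enforced, so the rolling window, the UsageWindow class and the pick_model function are our own assumptions. Only the model names and the message limits come from the figures above.

```python
from collections import deque
from dataclasses import dataclass, field

# Limits quoted in the article: (messages, window in hours).
# Team and Pro plans are described as unlimited, so they are not modelled.
PLAN_LIMITS = {"free": (10, 5), "plus": (160, 3)}


@dataclass
class UsageWindow:
    """Hypothetical rolling-window counter; not OpenAI's implementation."""
    max_messages: int
    window_seconds: float
    sent: deque = field(default_factory=deque)

    def allow(self, now: float) -> bool:
        # Forget messages that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_messages:
            self.sent.append(now)
            return True
        return False


def window_for(plan: str) -> UsageWindow:
    messages, hours = PLAN_LIMITS[plan]
    return UsageWindow(messages, hours * 3600)


def pick_model(window: UsageWindow, now: float, wants_thinking: bool = False) -> str:
    """Return the full model while under the limit, else its mini variant."""
    base = "gpt-5-thinking" if wants_thinking else "gpt-5-main"
    return base if window.allow(now) else base + "-mini"
```

In this sketch, a Free user's eleventh message inside five hours would be answered by gpt-5-main-mini, and the full model becomes available again once the oldest message drops out of the window.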

The GPT-5 system card provided by OpenAI states the router will allocate a model based on conversation type, complexity, tool needs and explicit intent. Whilst it isn’t disclosed exactly how this decision is made, the onward methodology likely reflects requests made via the Application Programming Interface (API) version of GPT-5. These requests allow you to manually set the “reasoning” effort, the “verbosity” and the ability to use tools. Unless you are focusing heavily on an agentic workflow, the first two are most likely to dictate the quality of your response. A high reasoning effort will return a more in-depth, well-thought-out response, whereas the “minimal” setting will favour a fast, concise response. The verbosity parameter steers how many output tokens are spent on your query. Similarly, low verbosity will result in a quick, concise response, and high verbosity responses will be thorough in their explanations. We assume these parameters, or something similar, are being assigned to user requests in ChatGPT, with the embedded router using this information to decide the most appropriate model to handle the response.
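As a rough illustration of the API parameters described above, the sketch below builds the keyword arguments for a GPT-5 request with an explicit reasoning effort and verbosity. The build_gpt5_request helper is our own, hypothetical name. The parameter layout reflects our understanding of OpenAI's Responses API at the time of writing, and the "medium" verbosity level and the defaults shown are assumptions, so check the current API reference before relying on it.

```python
# Values the article mentions for reasoning effort; the "medium" verbosity
# level is assumed from the API documentation rather than the article.
REASONING_EFFORTS = ("minimal", "low", "medium", "high")
VERBOSITY_LEVELS = ("low", "medium", "high")


def build_gpt5_request(prompt: str,
                       effort: str = "medium",
                       verbosity: str = "medium",
                       model: str = "gpt-5") -> dict:
    """Return keyword arguments for a Responses API call (hypothetical helper)."""
    if effort not in REASONING_EFFORTS:
        raise ValueError(f"effort must be one of {REASONING_EFFORTS}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"verbosity must be one of {VERBOSITY_LEVELS}")
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},    # depth of "thinking"
        "text": {"verbosity": verbosity},   # length and detail of the answer
    }


# A fast, concise request versus a slower, more thorough one.
quick = build_gpt5_request("Summarise this paragraph.", "minimal", "low")
thorough = build_gpt5_request("Compare these two policies.", "high", "high")

# With the official `openai` package installed and an API key configured,
# the request could then be sent with something like:
#   from openai import OpenAI
#   response = OpenAI().responses.create(**quick)
```

Keeping the request as a plain dictionary makes it easy to compare the same prompt across the four reasoning levels.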

The Artificial Analysis Intelligence Index, which incorporates 8 evaluations, including MMLU-Pro, Humanity’s Last Exam and other respected AI benchmarks, allows us to compare GPT-5 with previous OpenAI models and LLMs from other popular AI providers. Benchmarks assessing the performance of closed-source LLMs, like those included in GPT-5, are normally carried out with requests to the model via the API, not the web or app user interface. As noted above, ChatGPT works differently, with a GPT-5 based model being automatically assigned for the user unless one is manually selected. Without knowing exactly how OpenAI’s model switching technology works, the GPT-5 API benchmarking data is the closest we can come to assessing model intelligence in ChatGPT. Below is the current Artificial Analysis Intelligence Index for GPT-5 across the 4 levels of reasoning effort (minimal, low, medium and high), accessed via the API.

A graph showing the Artificial Analysis Intelligence Index data from Tuesday 12th of August 2025. This data is subject to change with the release of model updates and is available via the Artificial Analysis website.

All iterations of GPT-5 should perform better in intelligence than GPT-4o, but to what extent is determined by the model switching technology. Some users, who have never strayed away from using GPT-4o, may experience higher-quality responses from these new models, because the model switcher allocates their requests to a higher-performing model with reasoning capabilities than the one they previously opted for. However, only requests assigned high reasoning or forwarded to a GPT-5 Thinking model are likely to outperform OpenAI’s previous reasoning model, o3, and Google’s Gemini 2.5 Pro on intelligence-based tasks.

User First Impressions

As part of the rollout of GPT-5, OpenAI initially retired their previous generation of models, including GPT-4o, GPT-4.5, o3, o4-mini and o4-mini-high. This resulted in mixed responses, with several users mourning the loss of their access to the legacy models and stating that they do not find the new model to be performing to the same standard. One possible cause is that ChatGPT was not initially transparent about which model it was using to generate each response. This has now been rectified: the model used can be viewed by hovering your mouse over the circular arrows button under each generated response.

Screenshot of ChatGPT showing the LLM used for a generated response via a pop-up window. The user asked "What is the capital of scotland" and ChatGPT responded "The capital of Scotland is Edinburgh." Below the response, the user's cursor hovers over a button of two arrows in a circular configuration, with a pop-up stating "Try again... Used GPT-5".

Speculatively, this might also be attributed to the model switcher failing for some time upon release and allocating most requests to a lower-performing model than required. This was reported by Sam Altman, CEO at OpenAI, on X. For users familiar with models such as o3 or Gemini 2.5 Pro, this would have been a significant reduction in performance. Sam Altman also stated further updates were being made to the decision-making boundaries within the model switcher in a bid to improve the efficacy of the automated model selection.

Additional concerns were raised about ChatGPT Plus users receiving a reduction in the number of reasoning requests they were able to make, which OpenAI responded to by upping weekly requests for GPT-5 Thinking from 200 to 3,000. These usage limits do not apply to requests automatically sent to the GPT-5 Thinking model from interactions with the base GPT-5 model.

The main promotional messaging around the release of GPT-5 was that it would be a lower-latency model with reasoning capabilities than any we have seen from OpenAI previously. In addition to this, OpenAI claim it is less likely to hallucinate. It is reported that GPT-5 responses are ~45% less likely to contain a factual error than those from GPT-4o and, when utilising “thinking”, ~80% less likely than those from o3. GPT-5 is also purportedly better at following instructions and shows less sycophancy than previous models, something users had complained about.
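These percentages are relative reductions, not absolute error rates. The short sketch below shows the arithmetic. The 10% baseline is an invented figure for illustration only and is not a published error rate for GPT-4o or o3; the reduced_error_rate function is likewise our own.

```python
def reduced_error_rate(baseline: float, relative_reduction: float) -> float:
    """Apply a relative reduction (e.g. 0.45 for 45%) to a baseline error rate."""
    return baseline * (1.0 - relative_reduction)


# Hypothetical baseline: 10% of responses contain a factual error.
hypothetical_baseline = 0.10

vs_gpt4o = reduced_error_rate(hypothetical_baseline, 0.45)  # "~45% less likely"
vs_o3 = reduced_error_rate(hypothetical_baseline, 0.80)     # "~80% less likely"

print(f"45% reduction from 10%: {vs_gpt4o:.1%}")
print(f"80% reduction from 10%: {vs_o3:.1%}")
```

Under that assumed baseline, a 45% relative reduction would still leave errors in roughly one response in eighteen, which is why the claim should not be read as "45% of errors are gone in absolute terms".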

There are many complaints relating to the loss of GPT-4o in particular, with users comparing it to the feeling of losing a friend because of the model's adaptable and agreeable nature. Users are reporting that GPT-5 feels cold and lacks creativity within its responses, and that too much emphasis has been placed on intelligence within GPT-5 as opposed to emotional user experience. OpenAI responded to these complaints and reinstated access to GPT-4o for all paid users by default. Other legacy models can be enabled by switching on “Show additional models” within user settings. They also claim they will be working to make GPT-5 warmer than its current iteration but less so than GPT-4o, and have learnt from this release the importance of more personality customisation options within future models.

The release of GPT-5 did come with the option, currently within research preview, to assign a personality to your interactions with the system. The default is described as being “Cheerful and adaptive”, with 4 alternative options being offered: Cynic (critical and sarcastic), Robot (efficient and blunt), Listener (thoughtful and supportive) and Nerd (exploratory and enthusiastic). If you opt for this setting, it will apply to all new chats you open within ChatGPT.

GPT-5 marks a step forward for OpenAI’s language models, introducing smarter model switching, improved reasoning capabilities, and lower latency. Yet, it has sparked debate by retiring much-loved legacy models and altering the tone many users valued. While some celebrate its intelligence and reduced hallucinations, others miss the warmth and creativity of its predecessors. As the technology matures and people become accustomed to its differences, GPT-5’s significance in the AI landscape will become clearer. For now, it stands as both an interesting progression in capability and an insight into how users value consistency in LLMs.

Find out more by visiting our Artificial Intelligence page to view publications and resources, join us for events and discover what AI has to offer through our range of interactive online demos.

For regular updates from the team, sign up to our mailing list.

Get in touch with the team directly at AI@jisc.ac.uk
