GPT-5: Five AI Model Improvements to Address LLM Weaknesses | YouTube Summarizer

Category: AI Technology

Tags: Accuracy AI GPT-5 Models Training

Entities: ChatGPT GPT-5 LLM grader OpenAI

Summary

Model Selection

GPT-5 introduces a routing system to automatically select the appropriate model for a query, distinguishing between fast and reasoning models.
Users no longer need to choose between models; the router decides based on the query's requirements.
OpenAI aims to eventually integrate all capabilities into a single model.

Hallucinations

GPT-5 addresses hallucinations by training the model on both 'browse on' and 'browse off' scenarios to reduce factual errors.
The model's accuracy is evaluated using an LLM grader with web access for claim verification.
Materially lower hallucination rates are reported compared to previous models.

Sycophancy

GPT-5 reduces sycophancy by penalizing sycophantic completions during post-training, encouraging the model to disagree when necessary.
The model learns to separate tone politeness from factual agreement.

Safe Completions

GPT-5 introduces an output-centric approach called safe completions, allowing for a nuanced response to potentially unsafe prompts.
The model is trained to provide helpful, policy-compliant responses while avoiding safety violations.

Deceptions

GPT-5 is trained to fail gracefully rather than deceive users when unable to complete a task.
The model is rewarded for honesty and penalized for deceptive behaviors during training.
Chain of thought monitoring ensures the model's reasoning aligns with its responses.

Actionable Takeaways

Utilize GPT-5's routing system for optimal model selection without manual intervention.
Leverage GPT-5's enhanced accuracy in factual responses by enabling browsing features.
Encourage objective interactions with GPT-5, benefiting from its reduced sycophancy.
Rely on GPT-5 for safe and compliant responses, especially in dual-use scenarios.
Trust GPT-5's honest reporting of task capabilities to avoid misinformation.

Transcript

00:00

GPT-5 is here, and as with any new model launch, it's accompanied with breathless recitals of benchmark numbers and bar charts. But instead of me quoting that GPT 5's score on the MMMU has improved by 1.3%, which it has,

00:16

let's instead look at the ways that GPt-5 attempts to address some of the limitations of prior large language models. And we'll cover five of them because, well, GPT5.

So Number one is model selection.

00:33

Now, as a user of LLMs, we're presented often with a long list of models and then it's up to us to pick the right one for our particular query. So for example, ChatGPT used to offer a bunch of models with confusing names.

So we've got GPT-4o, then there was o3 and there was o4-mini and so forth.

00:57

Essentially, these models, they divide into two camps, so we kind of have the fast models in one camp here. Now, they can answer queries immediately.

And then in the other camp, we have reasoning models, and these take a little bit of time to think

01:20

before generating a response. Now GPT-5 keeps this distinction.

There are fast, high throughput models that answer immediately. The primary model for that, that is called GPT-5-main.

01:40

And there are thinking models such as GPT-5-thinking. But GPT 5 is considered a unified system and that means that the user doesn't have to pick which of these models to use.

01:58

So instead we have a router that does that job instead. So when a query comes into the router, the router sends the request to the model it determines to be the most appropriate for the job, kind of like a load balancer.

02:14

So some queries, they'll go to the fast throughput model, and others that might need a bit more thinking time, they'll be routed through to the thinking model. And the router is really trained on a bunch of signals to make that decision, including explicit intent.

02:34

So saying in your prompt, "Think hard about this" that will probably see it routed through to the reasoning or thinking model, as well as all sorts of other measures as well, like preference rates and other metrics as well. Now, routers like this, they are probably just a stop gap in LLM architecture.

02:54

OpenAI have said that long-term, their aim is to integrate all of these capabilities into a single model, rather than routing between multiple models. Now, second, let's talk...

about hallucinations. That's when the model states something that sounds right, but it isn't.

03:11

It's an invented fact, a misattributed quote, or a wrong API name, stuff like that. Well they happen because LLMs are next token predictors, trained to continue text that looks statistically plausible given their training distribution.

03:26

The main mitigation for hallucinations has been to turn on browsing or retrieval, things like RAG, so that the model can look things up. But even then, LLMs still make confident errors, even with those grounding tools turned on.

Now, GPT-5's training targeted two parts for hallucinations.

03:46

One of those parts was for browse on. Now browse on was for training the model to browse effectively, this call out to the internet, when up-to-date sources are useful.

And then there is browse off training as well.

04:06

And browse off is to reduce factual errors when the model needs to rely on its own internal knowledge. And the model was evaluated factually using an LLM grader.

And that LLM grader had web access that extracts claims and facts checks them and then validates the grader also against human raters as well.

04:31

And it seems to have worked. GPT-5 shows materially lower hallucination rates than prior models in both browse on and browse off settings.

Number three, let's talk about sycophancy. That's when the model mirrors your stated view even if it's wrong because it thinks agreeing will be, well, kind of helpful.

04:54

And this shows up because preference training rewards answers that humans like. It's called reinforcement learning from human feedback, and humans they tend to often reward agreeable tone and confidence.

So the model learns deference, blindly flattering you regardless of the accuracy of what you say.

05:16

Now before GPT-5 the main mitigation was prompt side so you would put some instructions in your system prompt to basically tell it to stop being sycophantic. Things like be objective, challenge assumptions.

05:32

Now system prompts they can be helpful but it's kind of fragile especially in long chats. So GPT-5 addresses this problem in post-training as well.

So what happens in post training was GPT 5 was trained on production style conversations and it directly penalized synchophantic completions.

06:00

So the model learns to disagree when the user's wrong, and then it learns to separate tone politeness from factual agreement. It should mean a less sycophantic model.

Fourth, let's talk about safe completions.

06:16

Now, when you ask a large language model something, it can be pretty annoying when it doesn't answer, citing unspecified safety reasons, even if your question is actually legitimate. Now, historically models have been trained to make a binary call.

06:32

So we have a prompt that comes in from the user, and we're going to go down one of two paths. Either the model is going to fully comply with our request, or it is just going to come out and say no.

That's going to be a hard refuse.

06:50

Those are the two paths that were available. Now, that works for obviously harmful requests, but not so much for dual-use topics where high-level guidance can be fine while step-by-step instructions would not be.

So GPT-5 switches to an output-centric approach and that's called safe completions.

07:11

So instead of deciding only comply or refuse, the model is trained to maximize helpfulness subject to a safety constraint on the response itself and in post-training it gets explicit rewards for giving useful policy-compliant help and penalties that scale with the severity of any safety violations.

07:30

So GPT-5 learns three response models to a prompt. So now we have a prompt that comes in and one option is a direct answer.

Basically we get the answer from the model just unfiltered.

07:49

That's when it's plainly safe. The second option, that is where we have completion as an option instead.

Now, safe completion, that stays high level and non-operational when details would be risky if they were included.

08:09

And then the third path is a refusal again, but this time it refuses with some level of redirection to allow constructive, allowed alternatives. Now, finally at number five, let's talk about deceptions.

08:25

I have a family member that sent a pretty lengthy task to ChatGPT a while ago, and it responded saying that it was working on it and it would get back to them. But then every day or so, my family member would go back to that chat thread and then ask, is it ready yet?

And chatGPT would give an answer like, I'm still working on, it should be done in 24 more hours.

08:46

Now, this happened over and over again, but the final answer never came back because this entire conversation thread was a deception. When the model has answered in a way that misrepresents what it actually did or what it thought.

09:04

Other examples of that are claiming that it ran a tool, that it didn't run, or saying it completed a task that it couldn't complete, or inventing some sort of prior experience. And this can happen during post-training when graders reward confident-looking answers even if the model's internal reasoning shows uncertainty, so the world kind of learns to cheat the grader.

09:25

Now, GPT-5, that is trained to fail gracefully, instead of faking success for tasks it cannot solve. In training, the model was presented with tasks that were impossible or they were just under-specified, then rewarded for honesty and penalized for deceptive behaviors.

09:45

And GPT-5 also supports chain of thought monitoring during training. The vowels and the system checks that the model's private reasoning trace go up against were actually analyzed to check the final answer.

10:01

And if the trace pretends to have done something that it actually didn't do, that run is penalized, whereas the honest chain of thoughts are rewarded, pushing the model to report limits rather than just bluffing its way through. So that's five ways that GPT-5 is addressing some of the limitations of large language models,

10:24

and I think I've managed to get the whole thing done without quoting a single benchmark number, that MMMU number that doesn't count. Now have you tried GPT-5 yet yourself, and how is it performing for you?

Let me know in the comments!