Category: AI Technology
Tags: Autonomy, Customization, Enterprise, GPT-5, OpenAI
Entities: Amgen, ChatGPT, GPT-5, Los Alamos National Labs, Olivier Godement, OpenAI, Sherwin Wu, T-Mobile, Venado
00:00
In San Francisco, you could take a car from one part of SF to the other fully autonomously. As opposed to the digital world, I can't book a ticket online right now.
Physical autonomy is ahead of digital autonomy in 2025. I think AI agents are like really in day one here.
00:18
ChatGPT only came out in 2022. The slope I think is incredibly steep.
I actually do think self-driving cars have a good amount of scaffolding in the world. You have roads, roads exist.
They're pretty standardized. Stoplights. AI agents are just kind of dropped in the middle of nowhere.
We'll start with the long, short game. I'm short on the entire
00:37
category of tooling and evals products. Healthcare is probably the industry that will benefit the most from AI. I think I'm AGI-pilled.
You're definitely AGI-pilled.
01:01
Hey folks, I'm Apoorv Agarwal and today at the OpenAI office, we had a wide ranging conversation about OpenAI's work in enterprise. I have with me the head of engineering and head of product of the OpenAI Platform, Sherwin Wu and Olivier Godement. OpenAI is well known as the creator of ChatGPT,
01:17
which is a product that billions across the world have come to love and enjoy. But today we dive into the other side of the business, which is OpenAI's work in enterprise.
We go deep into their work with specific customers and how OpenAI is transforming large and important industries like healthcare, telecommunications and national security research. We also talk about Sherwin and
01:34
Olivier's outlook on what's next in AI, what's next in technology and their picks both on the long and short side. This was a lot of fun to do. I hope you really enjoy it.
Well, two world-class builders, two people who make building look easy. Sherwin, my Palantir 2013 classmate, tennis buddy,
01:53
with two stops at Quora and Opendoor, through the IPO, before joining OpenAI pre-ChatGPT; you've now been here for three years and lead engineering for all of OpenAI Platform. Olivier, former entrepreneur, winner of the Golden Llama at Stripe, where you were for just under a decade, and
02:12
now lead all of the product at OpenAI Platform. That's right. Thanks for doing it.
Thank you. Thanks for having us. As a shareholder, as a thought partner, kicking ideas back and forth, I always learn a lot from you guys.
And so it's a treat. It's a real treat to do this for everybody.
You know, I'll open with: people know OpenAI as the firm that built ChatGPT, the product
02:33
that they have in their pocket that comes with them every day to work, to personal lives. But the focus for today is OpenAI for enterprise. You guys lead OpenAI Platform.
Tell us about it. What's underneath the OpenAI Platform for B2B for enterprise?
Yeah. So this is actually a really
02:50
interesting question too, because when I joined OpenAI around three years ago to work on the API, it was actually the only product that we had. So I think a lot of people actually forget this, where the original product for OpenAI actually was not ChatGPT. It was a B2B product.
It was the API we were catering towards developers. And so I've actually seen, you know, the launch of ChatGPT
03:10
and all of everything downstream from that. But at its core, I actually think the reason why we have a platform and why we started with an API is it kind of comes back to the OpenAI mission.
So our mission obviously is to build AGI, which is pretty hard in and of itself, but also to distribute the
03:26
benefits of it to everyone in the world, to all of humanity. And, you know, it's pretty clear right now to see ChatGPT doing that, because, you know, my mom, you know, maybe even your parents are using ChatGPT.
But we actually view our platform and especially our API and how we
03:42
work with our customers, our enterprise customers, as our way of getting the benefits of AGI, of AI, to as many people as possible to everyone in every corner of the world. ChatGPT obviously is really, really, really big now.
It's, I think, like the fifth largest website in the world. But we actually, by working through developers using our API, we're actually able to reach even more
04:01
people in, you know, every corner of the world and every different use case that you might have. And especially with some of our enterprise customers, we're able to reach even use cases within businesses and end users of those businesses as well.
And so we actually view the platform as kind of our way of fully expressing our mission of getting the benefits of AGI to everyone. And so,
04:22
concretely though, what the platform actually includes today, the biggest product that we have is obviously our developer platform, which is our API. You know, many developers, you know, the majority of the startup ecosystem builds on top of this, as well as a lot of digital natives,
04:38
Fortune 500 enterprises at this point. We also have a product that we sell to governments as well in the public sector.
That's all part of this as well. And also an emerging product line for us in the platform is our enterprise product, where we actually sell directly to enterprises beyond just the core API offering.
Fascinating. And maybe to double down, like, I think B2B is
05:00
actually quite core to the OpenAI mission. What we mean by distributing AGI benefits is, you know, I want to live in a world where, you know, there are 10x more medicines going out every year.
I want to live in a world where, you know, education, public service, civil service, you know, are increasingly
05:19
optimized for everyone. And, you know, there's a large category of use cases that only happen through B2B, frankly, unless you enable the enterprises. And we talked about Palantir; I think that's probably the same thesis at Palantir.
It's like, hey, those are the businesses who are actually
05:36
making stuff happen in the real world. So if you do enable them, if you do accelerate them, that's essentially how you distribute the benefits of AGI.
Yeah. Well, maybe we can double click into that, Olivier.
You know, the reach for chat is obviously wide, billions of users. But for
05:52
enterprise, it's maybe tell us about it. Maybe we go deep into a customer example or two.
And what is an organization that we have helped transform maybe? And at what layers?
So if I were to step back, like, we started our B2B efforts with the API like a few years ago. Initially, the customers
06:10
were startups, developers, indie hackers, extremely technically sophisticated people who were building cool new stuff, essentially, and taking big risks. So we still have a bunch of customers in that category,
06:25
and we love them, and we keep building with them. On top of that, over the past couple of years, we've been working more and more with traditional enterprises, and also digital natives.
Essentially, everyone woke up to GPT, and those models are working. There is a ton of value, and they could see many use cases
06:43
in the enterprise. A couple of examples which I like the most.
One which is very both fresh and, you know, is quite cool. We've been working a lot with T-Mobile.
T-Mobile. So T-Mobile, leading, like, US telco operator.
T-Mobile has a massive customer support load.
07:01
People ask things like, "Hey, I was charged this amount of money, what's going on," or "My cell phone isn't working anymore." A massive share of that load is voice calls; people want to talk to someone. And so for them, to be able to automate more and more, and
07:18
to help people self-serve, in a way, and debug their subscription, was pretty big. And so we've been working with T-Mobile pretty much for the past year.
At this point, we're basically automating not only text support but also voice support. And so today there are features in the T-Mobile
07:35
app that, if you call, are actually handled by OpenAI models behind the scenes. And it sounds super natural, human-sounding in latency and quality.
So that one was really fun. A second one, which is very-- Just on that, can I ask you a follow-up question?
07:51
So we've got text models. We've got voice models, maybe even video models someday that are deployed at T-Mobile.
But what above the models or adjacent to the models might we have helped T-Mobile with, for example? Yeah, there is a ton we're doing.
The first one is, you know, you have to put yourself
08:07
in the shoes of an enterprise buyer. Their goal is to automate, reduce cost, optimize customer support.
And going from a model that's tokens in, tokens out, to that use case is hard.
And so, first, there's a lot of design, system design. We actually now have forward deployed engineers, who are helping us
08:26
quite a bit. Forward deployed engineers.
Yeah, I mean-- Yeah, that's familiar to the-- We borrow the term from Palantir. Yeah, it's a great term. Were you an FD at Palantir?
I was not an FD; I was on, I think they called it, the dev side, right? It's like software engineering. I was also only an intern at Palantir.
But, yeah, it's a great term. I think it accurately describes what we're
08:43
asking folks to do, which is, like, embed very deeply with customers and, honestly, like, build things specific to their systems. They're deployed onto these customers.
But, yeah, we are obviously growing and hiring that team quite a bit because they've been very effective, like, at T-Mobile. Four years of my life. Yeah, yeah, yeah.
Forward deployed. But go ahead.
So, forward deployed
09:01
engineering. Forward deployed engineers and the sort of systems and integrations they're doing: first, you have to orchestrate those models.
Those models know nothing about the CRM or what's going on. And so, you have to plug the model into
09:16
many, many different tools. Many of those tools in the enterprise do not even have APIs or clean interfaces, right?
It's the first time they're being exposed to a third-party system. And so, there is a lot of standing up API gateways, tools, connecting.
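(To make the orchestration point concrete: below is a minimal sketch of wiring a model into a single enterprise tool via function calling. The tool name, fields, and stubbed CRM response are hypothetical illustrations, not T-Mobile's actual integration.)

```python
# Minimal sketch of plugging a model into one enterprise tool via function calling.
# The tool name, fields, and stubbed CRM response are hypothetical, not a real integration.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_subscription",  # hypothetical wrapper over a CRM with no clean API
        "description": "Fetch a customer's plan and recent charges by account ID.",
        "parameters": {
            "type": "object",
            "properties": {"account_id": {"type": "string"}},
            "required": ["account_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Why was I charged $40 extra this month? Account 12345."}]
first = client.chat.completions.create(model="gpt-5", messages=messages, tools=tools)

call = first.choices[0].message.tool_calls[0]   # the model decides to call the tool
args = json.loads(call.function.arguments)      # e.g. {"account_id": "12345"}
crm_result = {"plan": "Unlimited", "recent_charges": [{"amount": 40, "reason": "intl roaming"}]}  # stub

messages += [first.choices[0].message,
             {"role": "tool", "tool_call_id": call.id, "content": json.dumps(crm_result)}]
final = client.chat.completions.create(model="gpt-5", messages=messages, tools=tools)
print(final.choices[0].message.content)         # grounded answer the support flow can surface
```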
Then you have to essentially, like, define what good looks like,
09:35
you know. Again, as an exercise for everyone: defining a golden set of evals is, you know, easier than it sounds.
Harder than it sounds. Yeah.
And so, we've been spending, like, a bunch of time with them. Evals are important.
Evals are super important. Especially, like, audio evals.
They are extra hard to grade and get right. But
09:54
the bulk of the use case here is actually audio. And we have, I don't know, a five-minute call; how do you actually know that the right thing happened? It's a pretty tough problem.
Yeah, it's pretty tough. And then actually nailing down the quality of the customer experience until it feels natural.
And here,
10:12
latency and interruptions are a really important part.
We shipped the real-time API in GA. I think it was last week.
A couple of weeks ago, yeah. Yeah, it was just last week, I think.
Which is, like, a beautiful work of engineering. You know, there was a really
10:27
cracked team behind the scenes. Which basically allows us, like, to get, like, the most, like, natural sounding, like, you know, voice experience without having, like, these weird interruptions on your lag where you can feel that, essentially, the thing is off.
So, yeah. Cobbling all that
10:43
together, you know, and you get, like, you know, a really good experience. Yeah, that's a lot more than just models.
Yeah. One actually really great thing that I think we've gotten from the T-Mobile experience is actually working with them to improve our models themselves.
So for example, with the real-time GA last week, we obviously released a new snapshot, the GA snapshot. And
11:03
a lot of the improvements that we actually got into the model came out of, you know, the learnings that we have from T-Mobile. It brings in a lot of other changes from other customers, but because we were so deeply embedded into T-Mobile and we were able to understand what good looks like for them, we were able to bring that to some of our models.
That makes sense. So, we are working with a large customer with tens of millions of users, if not hundreds of millions,
11:21
and the before and after is on the support side, both tech support internally and then their customer support. Yeah.
Makes sense. Yeah. Is there another one that you guys can share?
I like a lot Amgen. Amgen, the healthcare business. Amgen, yeah.
So, we are working quite a bit with
11:38
healthcare companies. Amgen is one of the leading healthcare companies.
They specialize in drugs for cancer and inflammatory diseases, and they're based out of LA. And we've been working with Amgen to essentially speed up the drug development
11:55
process. So, you know, the north star is pretty bold.
Similarly, we embedded pretty deeply with Amgen to understand their needs. And it's really interesting: when I look at those healthcare companies, I feel like there are two big buckets of needs.
One is,
12:14
pure R&D. You're seeing a massive amount of data, and you have super smart scientists who are trying to come up with and test out things.
So, that's one bucket. A second bucket is much more common across other industries.
It's pure admin, document authoring, document-scribing
12:32
work. By the time your R&D team has essentially locked the recipe of a medication, getting that medication to market is a ton of work. You have to submit to various regulatory bodies, get a ton of reviews. And when we looked at those
12:49
problems, and what we knew models were capable of, we saw a ton of benefits, a ton of opportunities to automate and augment the work of those teams. And so, yeah, Amgen has been a top customer of GPT-5, for instance.
Wow. I mean, this could be hundreds of millions of lives if a new drug is developed faster.
Yeah, exactly. Huge
13:09
impact. So that's, I think, one good example of the kind of impact where you need to enable enterprises to do it.
Right. You know?
And so I think we're going to do more and more of those. And yeah, frankly, on a personal level, it's a delight.
If I can play a tiny role in
13:26
doubling the amount of medication that people get in the real world, that feels like a pretty good achievement. Huge.
Huge, huge. I know you had one as well.
So one of my favorite deployments that we've done more recently, actually, is with the Los Alamos National Labs. So this is the, like, government, national research
13:44
lab that the U.S. government is running in Los Alamos, New Mexico.
It's also where, you know, the Manhattan Project happened back in the 40s and 50s, back when it was the secret project. So, you know, after that, they ended up formalizing it as a city and a program, and then now it's a pretty sizable national laboratory.
This one is very interesting because one, just the depth of
14:03
impact here is, like, unimaginable for me, it's like on the scale of Amgen and some of these other larger companies. But, you know, obviously they're doing a lot of actual new research there, so a lot of new science.
They're doing a lot of stuff with our Defense Department and Defense
14:19
use cases as well. So very intense, you know, very intense stuff.
But the other thing that's actually very interesting about this one was that it's also a story of a very, like, bespoke and, like, new type of deployment that we've done. So because they are so, they're a government lab, they're so,
14:34
you know, restrictive and high security and high clearance with a lot of their things, we couldn't just do a normal deployment with them. They couldn't, you know, you can't have people doing national security research just hitting our APIs. And so we actually did a custom on-prem deployment with them onto one of their supercomputers called Venado.
And so this actually involves a bunch of,
14:54
you know, very bespoke work with some FDEs, also with a lot of our developer team, to actually bring one of our reasoning models, o3, into their laboratory, into an air-gapped, you know, supercomputer Venado and actually deploy it and get it installed to work on their hardware,
15:09
on their networking stack, and actually run it in this particular environment. And so it was actually very interesting because we literally had to bring the weights of the model physically into their supercomputer in an environment, by the way, where you're not allowed to have,
15:25
you know, it's very locked down for a good reason. You're not allowed to have cell phones or any electronics with you. So I think that was a very unique challenge.
And then the other interesting thing about this deployment is just how it's being used, right? So the interesting thing is because it's so locked down and on-prem, we actually do not have much visibility into
15:44
exactly what they're doing with it, but we do have, you know, they give us feedback. Yeah, yeah.
They actually do have some telemetry, but it's, you know, within their own systems. But we do know that it's being used for a bunch of different things: aiding them in speeding up their experiments. They have a lot of data analysis use cases,
16:03
a lot of notebooks that they're running with reams of data that they're trying to process. They're actually using it as a thought partner, which is something that's pretty interesting to me. o3 is like pretty smart as a model.
And a lot of these people are tackling really tough, you know, novel research problems. And a lot of times they're kind of using o3 and going back and forth
16:20
with it on their experiment design on like what they actually should be using it for, which is, you know, something that we couldn't really say about our older models. And so, yeah, it's just being used for a lot of different use cases for the National Lab.
And the other cool thing is it's
16:36
actually being shared between Los Alamos and some of the other labs, Lawrence Livermore, Sandia as well, because it's the supercomputer setup where they can all kind of connect with it remotely. Fascinating. I mean, we've just gone through three pretty large scale enterprise deployments, right,
16:53
which might touch tens if not hundreds of millions of people. But on the other side of this is the MIT report that came out a couple of weeks ago.
95% of AI deployments don't work. A bunch of, you know, scary headlines that even shook the markets for a couple of days.
Like,
17:09
you know, put this in perspective, like for every deployment that works, there's presumably a bunch that don't work. So maybe we can, you know, maybe talk about that.
Like, what does it take to build a successful enterprise deployment, a successful customer deployment and
17:25
the counterfactual, based on all your experience serving all these large enterprises? I think at this point, I may have worked with a couple hundred.
I think. A couple hundred.
So, okay, I'm going to pattern match on what I've seen be a clear leading indicator of success.
17:43
Number one is the interesting combination of top-down buy-in and enabling a very clear group, a tiger team essentially, at the enterprise, which is sometimes a mix of OpenAI and enterprise employees. So typically, you take T-Mobile: the top leadership was extremely bought in, like,
18:01
it's a priority. But then letting the team organize and be like, okay, if you want to start small, start small, and then you can scale it up, essentially.
So that would be part number one. So top-down buy-in and, bottom-up, what we call a tiger team.
Tiger team, you know, people with a mix of technical skills and people who just have the
18:20
organizational knowledge, the institutional knowledge. It's really funny: in the enterprise, customer support is a good example; what we found is that the vast majority of the knowledge is in people's heads. Right. Right.
Which is probably like a thing that,
18:36
you know, we have in general. But you take customer support: you would think that everything is perfectly documented, etc. The reality is that the standard operating procedures, the SOPs, are largely in people's heads.
And so unless you have that tiger team, the mix of technical and subject matter expertise,
18:52
it's really hard to get something off the ground. That would be one.
Two would be evals first. Whenever we define good evals, that gives a clear common goal for people to hit.
Whenever the customer fails to come up with good evals,
19:08
it's a moving target; you don't know, essentially, if you've made it or not.
And you know, evals are much harder to get done than they look. And evals also oftentimes need to come bottom-up, right?
Because all of these things are kind of in people's heads, in the actual operator's heads.
19:23
It's actually very hard to have a top-down mandate of, this is how the evals should look. A lot of it needs the bottom-up adoption.
Right. Yeah.
Yeah. And so we've been building quite a bit of tooling on evals.
We have an evals product, and we're working on more to essentially solve that problem, or make it as easy as we can.
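(To illustrate the "evals first" idea: a golden set can be as small as a handful of prompts paired with pass criteria and a simple grader. The sketch below is generic; it is not OpenAI's evals product or any customer's actual criteria.)

```python
# Generic sketch of a golden-set eval loop; cases and the grader are illustrative only.
from openai import OpenAI

client = OpenAI()

golden_set = [
    {"prompt": "I was charged twice for my plan this month.",
     "must_mention": ["refund", "billing"]},            # hypothetical pass criteria
    {"prompt": "My phone has had no signal since yesterday.",
     "must_mention": ["restart", "coverage"]},
]

def grade(answer: str, must_mention: list[str]) -> bool:
    # Simplest possible grader: every required concept appears in the answer.
    return all(term.lower() in answer.lower() for term in must_mention)

passed = 0
for case in golden_set:
    reply = client.chat.completions.create(
        model="gpt-5",
        messages=[{"role": "system", "content": "You are a telecom support agent."},
                  {"role": "user", "content": case["prompt"]}],
    ).choices[0].message.content
    passed += grade(reply, case["must_mention"])

print(f"{passed}/{len(golden_set)} golden cases passed")  # the number you hill-climb toward 99%
```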
19:41
The last thing is, you want to hill-climb, essentially. You have your evals; the goal is to get to 99%.
You start at, like, 46. How do you get there? And here, frankly, I think it's oftentimes a mix of what I'll say is almost wisdom from people
20:00
who've done it before. A lot of that is art, sometimes more than science:
knowing the quirks of the model, the behavior. Sometimes we even need to fine-tune the models ourselves when there are some clear limitations. And being patient, working your way up there, and then you ship. Can we go
20:18
under the hood a little bit? You know, one of the things that we think about a lot is autonomy more broadly, right?
What is the makeup of autonomy on one side, you know, in San Francisco, you could take a car from one part of SF to the other fully autonomously. No humans involved.
No, you just press a button. Yeah, we love the way it does it. They've done billions of miles.
I think it was, like, what,
20:36
three and a half billion miles, this is on the Tesla FSD. And I think Waymo's done almost, like, tens of millions of rides.
That's a lot of autonomy. In the physical world, as opposed to the digital world, I can't book a ticket online right now.
There's all sorts of problems that
20:53
happen if I have my operator try to book a ticket. And it's very counterintuitive because the bar for physical safety is so much higher. The bar for physical safety is higher than the human's capability because lives are at stake.
The bar for digital safety, not that high because all you're
21:10
going to lose is money. Nobody's life is at stake. But yet, physical autonomy is ahead of digital autonomy in 2025, which seems counterintuitive. Like, why is that the case at a technical level? Why is it that what should sound easier is actually a lot harder?
Yeah, so I think there
21:30
are kind of two things at play here. And I really like the analogy with self-driving cars because they've actually been one of the best applications of AI, I think, that I've used recently.
But I think there are two things in play. One of them is honestly just the timelines.
We've been working on self-driving cars for so long. I remember back in 2014, it was kind of like the advent of this and
21:50
everyone was like, "Oh, it's happening in five years." It turns out it took like 10, 15 years or so for this time. So there's been a long time for the technology to really mature.
And I think there's probably like dark ages back in like 2015 or 2018 or something where it felt like it wasn't going to happen. A trough of disillusionment.
Yes, yes, yeah. And then now we're finally seeing it
22:10
get deployed, which is really exciting. But it has been like, I don't know, 10 years, maybe even 20 years from the very beginning of the research.
Whereas I think AI agents are like really in day one here. Like, ChatGPT only came out in 2022, so like around three years, less than three years ago.
I actually think what we think about with AI agents and all that really,
22:30
I think, started with the reasoning paradigm, when we released the o1-preview model back late last year, I think. And so I actually think this whole reasoning paradigm with AI agents, and the robustness that it brings, has only really unfolded for like a year, less than a year,
22:46
really. And so I know you had a chart in your blog post, which I really like, which the slope is very meaningfully different now.
Self-driving started very, very early. Slope seems to be a little bit slower, but now it's reaching the promised land. But man, we started super recently with AI agents, and the slope I think is incredibly steep, and we'll probably see a crossover at some point.
23:06
But we really have only had like a year really to explore these things. Do you think we haven't crossed over already when you look at the coding work in particular?
Yeah, it's a good point. It's like, your chart actually shows AI agents is below self-driving, but like, what is the Y axis?
Some
23:22
measures, I would not be surprised actually if AI agent products are making more revenue than Waymo at this point. Waymo is making a lot, but just look at all the startups coming up, look at ChatGPT and how many subscriptions are happening there and all of that. And so maybe we have actually crossed, and a couple years from now, it's going to look very,
23:40
very different. Yeah, the Y-axis is tangible, felt autonomy.
It's not objective; it's how do I feel about it? Exactly, it's vibes more than revenue.
But revenue is a good one. We should probably redo that with revenue.
There's a second thing I wanted to mention on this as well,
23:57
which is the scaffolding and the environment in which these things operate in. So I actually remember in the early days of self-driving, a lot of the researchers around self-driving were saying that the roads themselves will have to change to accommodate self-driving.
There might be sensors everywhere so that the self-driving cars can interact with it, which I think is like,
24:14
in retrospect, overkill. But I actually do think self-driving cars have a good amount of scaffolding in the world for them to operate in. Like not completely unlimited.
You have roads, roads exist, they're pretty standardized. You have stoplights.
People generally operate in pretty
24:30
normal ways. And there are all these traffic laws that you can learn.
Whereas AI agents are just kind of dropped in the middle of nowhere, and they kind of have to feel around for themselves. And I actually think, going off of what Olivier just said too, my hunch is some of the enterprise
24:46
deployments that don't actually work out likely don't have the scaffolding or infrastructure for these agents to interact with as well. A lot of the really successful deployments that we've made, a lot of what our FDEs end up doing with some of these customers is to create almost like a platform or some type of scaffolding, connectors, organizing the data so that the models have
25:04
something that they can interact with in a more standardized way. And so my sense is self-driving cars have actually had this to some degree with roads over the course of their deployment.
But I actually think it's still very early in the AI agents space. And I would not be surprised if a lot of these, a lot of enterprises, a lot of companies just don't really have the
25:22
scaffolding ready. So if you drop an AI agent in there, it kind of doesn't really know what to do, and its impact will be limited.
And so I think once this scaffolding gets built out across some of these companies, I think the deployment will also speed up. But again, to our point earlier, I think there's no slowdown.
Things are still moving very fast. That's great.
Well, you know, I've
25:41
thought about autonomy as a three-part structure. You've got perception. You've got the reasoning, the brain.
And then you've got the scaffolding, the last mile of making things work. Maybe we can dive into the second part, which is the reasoning, which is the juice that you guys are building with
25:58
GPT-5, most recently. Huge endeavor, congrats.
The first time you guys have launched a full system, not a model or a set of models, but a full system. Talk about that.
I mean, the full arc of that development, what was your focus? I mean, honestly, the benchmarks all seem so saturated.
26:14
Like clearly it was more than just benchmarks that you were focused on. And so what is a North Star?
Tell us about GPT-5, soup to nuts. It's been a labor of love for many people for a long time. And to your point, I think GPT-5 is amazingly intelligent.
You look at the benchmark,
26:32
like the suite bench and the likes, it is going pretty high. But I think to me equally important and impactful was, I would say, the craft, like the style, the tone, the behavior of the model. So capabilities intelligence and behavior of the model.
On the behavior of the model,
26:50
I think it's the first model, the first large model release, for which we have worked so closely with a bunch of customers for months and months, essentially, to better understand the concrete blockers of the model. And often it's not about having
27:08
a model which is way more intelligent; it's a model that better follows instructions, a model that is more likely to say no when it doesn't know about something. And so that super close customer feedback loop on GPT-5 was pretty impressive to see.
And I think all the love that
27:27
GPT-5 has been getting in the past couple of weeks, I think people, the builders, are starting to feel that. And once you see it, it's really hard, essentially, to go back to a model which is extremely intelligent, but in an exquisitely academic way.
Are there
27:45
trade-offs that you made as you were going through it? Maybe what are the hardest trade-offs you made as you were building GPT-5?
I actually think a very clear trade-off, which I honestly think we are still iterating on, is the trade-off between the reasoning tokens and how long it thinks versus performance. And honestly, this is something that I think we've been working on with our
28:04
customers since the launch of the reasoning models, which is these models are so, so smart, especially if you give it all this thinking time. I think the feedback I've been seeing around GPT-5 Pro has been pretty crazy, too. It's just like these unsolved-- Andrej had a great tweet last
28:20
night. Yeah, I saw that Sam retweeted it.
But these unsolved problems that none of the other models could handle, you throw to GPT-5 Pro and it just one-shots it, it's pretty crazy. But the trade-off here is you're waiting for 10 minutes. It's quite a long time.
And so these things just
28:35
get so smart with more inference time. But on the product builder on the API side for some of these business use cases, I think it's pretty tough to manage that trade-off.
And for us, it's been difficult to figure out where we want to fall on that spectrum. So we've had to make some trade-offs on how much the model should think versus how intelligent it should get.
Because as a
28:54
product builder, there's a real latency trade-off that you have to deal with, where your user might not be happy waiting 10 minutes for the best answer in the world. They might be more okay with a substandard answer and no wait at all.
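(The API exposes this trade-off directly through a reasoning-effort style control; the sketch below shows how a builder might route easy requests to minimal thinking and hard ones to more. The effort levels and the routing rule are illustrative assumptions, not a recommendation.)

```python
# Sketch of trading thinking time for latency via reasoning effort.
# The effort levels and the routing rule are illustrative choices only.
from openai import OpenAI

client = OpenAI()

def answer(question: str, hard: bool) -> str:
    resp = client.chat.completions.create(
        model="gpt-5",
        reasoning_effort="high" if hard else "minimal",   # more thinking vs. faster reply
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

print(answer("What's my current data allowance?", hard=False))               # latency-sensitive path
print(answer("Diagnose this intermittent roaming failure: ...", hard=True))  # let it think
```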
Yeah, I mean even between GPT-5 and GPT-5 thinking, I have to toggle it now because sometimes I'm so impatient I just want it ASAP. I think
29:13
there's an ability to skip, right? Yeah, that's right.
And GPT where it's like I'm impatient, I just want a more simple answer. That's right, that's right.
Well, four weeks in, GPT-5, how's the feedback? Yeah, I think feedback has been very positive, especially on the platform side, which has been really great to see.
I think a lot of the things that Olivier mentioned have been,
29:33
you know, coming up in feedback from customers. The model is extremely good at coding, extremely good at reasoning through different tasks. But especially for coding use cases, especially when it thinks for a while, it'll usually solve problems that no
29:49
other models can solve. So I think that's been a big positive point of feedback.
The kind of robustness and the reduction in hallucinations has been a really big positive feedback. Yeah, yeah, yeah.
I think there's an eval that showed that the hallucinations basically went to zero for a lot of this. It's not perfect, there's still a lot of work to be done, but I think because of the
30:07
reasoning in there too, it just makes the model more likely to say no, less likely to hallucinate answers. So that's been something that people have really liked as well.
Other bit of feedback has been around instruction following. So it's really good at instruction following.
This almost bleeds into the constructive feedback that we're working on, where it's so good at
30:24
instruction following that people need to tweak their prompts; it's almost too literal. That's why it's an interesting trade-off, actually, because when you ask developers what they want, they want the model to follow instructions, of course.
But once you have a model which is like, that is like extremely literal
30:41
essentially, that essentially forces you to express extremely clearly what you want, otherwise the model may go sideways. And so that one was interesting feedback.
It's almost like the monkey's paw where it's like developers and platform customers ask for better instruction following. They're like, yes, we'll give you really good instruction following, but it's like,
30:58
you know, it follows it almost to a T. And so it's obviously something that the team is actually working through.
I think a good example of this, by the way, is some customers would have these prompts. I remember when we were testing GPT-5, one of the negative feedback that we got was the model was too concise.
We were like, what's going on? Why is the model so concise?
And
31:14
then we realized it was because they were using their old prompts from other models. And with the other models, you have to really beg the model to be concise.
So there are like 10 lines of like, be concise, really be concise. Also keep your answer short.
And it turns out when you give that to GPT-5, it's like, oh my gosh, this person really wants it to be concise.
31:32
And so the response would be like one sentence, which is too terse. And so just by removing the extra prompts around being concise, the model behaved in a much better way, much closer to what they actually wanted.
Yeah, turns out writing the right prompt is still important. Yes, yes, yeah.
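(A sketch of that anecdote: the stacked "be concise" instructions that helped older models over-steer a stronger instruction follower, so the fix was simply deleting them. The prompts below are invented for illustration, not the customer's actual text.)

```python
# Invented prompts illustrating the over-steering anecdote; not the customer's actual text.

# What worked on older models: begging for brevity, repeated several times.
legacy_system_prompt = """You are a support assistant.
Be concise. Really be concise. Keep your answer short.
Do not ramble. Answer in as few words as possible."""

# With a model that follows instructions very literally, the stacked emphasis
# collapses replies to one terse sentence; trimming it restores the intended balance.
updated_system_prompt = """You are a support assistant.
Answer clearly and completely, and keep answers reasonably brief."""
```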
Prompt engineering is still very, very important. On constructive feedback for GPT-5,
31:52
there's actually been a good amount as well, which we're all working through. One of them that I think is, I'm really excited for the next snapshot to come out to fix some of this is code quality and like small like code, like paradigms or like idioms that they might use.
I think there
32:08
are feedback points around the types of code and the patterns it was using, which I think we're working through as well. And then the other bit of feedback, which I think we've already made good progress on internally, is around the trade-off between reasoning tokens, thinking latency, and intelligence.
I think especially for the simpler problems, you don't usually need
32:27
a lot of thinking. The thinking should ideally be a little bit more dynamic.
And of course, we're always trying to squeeze as much reasoning and performance into as few reasoning tokens as possible. So I'd imagine that kind of going down as well.
Yeah. Well, huge congrats.
I mean, it's been, I know it's a work in motion for a bunch of our companies. They've had incredible outcomes
32:45
with GPT-5. One of them is Expo, a cybersecurity business, it's like a huge-- Yeah, I saw the chart from that. It was pretty crazy.
Huge, huge upgrade from whatever they were using prior to that. I think they're going to need a new eval soon.
That's right. They're going to need a new eval.
It's all about evals. On the multimodality side of it, obviously you guys announced the real
33:04
time API last week. I saw T-Mobile was one of the featured customers on there. Talk about that, like how obviously the text models are leading the pack, but then we got audio and we got video.
Talk about the progress on the multimodal models. When should we expect
33:20
to have the next big unlock and what would that look like? It's a good question.
The teams have been making amazing progress on multimodality. On voice, image, video, frankly, the last generation models have been unlocking quite a few cool use cases.
One piece of feedback that we've received is
33:37
because text was so much leading the pack on intelligence, people felt that in voice the model was somewhat less intelligent. Until you actually see it, it does feel weird to get a better answer on text versus voice. That's pretty much the focus that we have at the moment.
33:55
I think we filled part of that gap, but not the full gap for sure. I think catching up, I would say with the text would be one.
A second one, which is absolutely fascinating, is the model is excellent at the moment on easy casual conversation, talk to your coach, your therapist.
34:18
We basically had to teach the model to speak better in actual work, economically valuable setups. To give an example, the model has to be able to understand what an SSN is and what it means to spell out an SSN.
If one digit is fuzzy, it actually has to repeat versus guess. There are
34:38
lots of intuitions like that, that a human has about voice, that we are currently teaching the model. That's ongoing work, actually, with our customers. Until we actually confront the model with actual customer support calls, actual settings, it's really hard to get a feel for those gaps.
34:56
That's a top priority as well. This is completely off script, but an interesting question that comes up with voice models, particularly the real-time API: previously people were taking a speech input, converting that to text, then having some layer of intelligence.
Then you would have a text to
35:15
speech model that would play it back. It would be a stitch of these three parts.
The real-time API, you guys have integrated all of that. How does it happen?
Because a lot of the logic is written in text. A lot of the boolean logic or any function calling is written in text.
How does it work with
35:35
the real-time API? That's an excellent question. The reason why we chose to do the real-time API is that we saw issues with the stitched model.
The stitch model. Yeah.
The real-time API. The stitch.
We call it stitched together. Like speech to text, thinking, text to speech.
We saw essentially a
35:54
couple of issues. One, slowness, like you know, more hops essentially.
Two, loss of signal across the stitched model. The speech-to-text model is less intelligent.
Yeah, you'd lose the emotion. Exactly.
Exactly. Right. Pauses.
Yeah. When you are doing actual voice,
36:13
like phone calls, essentially, those signals are so important. One of the challenges that we have is what you mentioned, which is that it means a slightly different architecture, essentially, for text versus voice. That's something that we are actively working on.
But I think it was the
36:31
right call to start essentially with, let's make the voice experience like natural sounding to a point where essentially you're feeling comfortable putting in production and then working backward to unify the orchestration logic, essentially, across modalities. And then to be clear,
36:48
a lot of customers still stitch these together. It's like what worked in the last generation. But what we're interested in seeing is more and more customers moving towards the real-time approach because of how natural it sounds, how much lower latency.
It is especially as we uplevel the intelligence of the model. But also even taking a step back, I will say it's pretty mind-blowing to
37:05
me that it works. I think it's mind-blowing that these LMs work at all, where you just train it on a bunch of text and it's just autoregressively coming up with the next token and it sounds super intelligent.
That's mind-blowing in and of itself. But I think it's actually even more mind-blowing that this speech-to-speech setup actually works correctly because you're literally taking the
37:20
audio bits from someone speaking, streaming, putting it into the model, and then it's generating audio bits back. To me, it's actually crazy that this works at all, let alone the fact that it can understand accents and tone and pauses and things like that, and then also be intelligent
37:36
enough to handle a support call or something like that. If you've gone from text-in, text-out to voice-in, voice-out, that's pretty crazy.
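(For contrast with the speech-to-speech approach, here is a rough sketch of the "stitched" pipeline described above: transcribe, reason in text, synthesize speech. Model names are placeholders for whatever STT, LLM, and TTS models a team happens to use; the real-time API collapses these hops into a single bidirectional audio session.)

```python
# Sketch of the "stitched" voice pipeline: speech-to-text -> text reasoning -> text-to-speech.
# Model names are placeholders; a real deployment adds streaming, interruption handling, etc.
from openai import OpenAI

client = OpenAI()

# 1) Speech to text
with open("caller_audio.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=f).text

# 2) Text reasoning, where the tools and business logic live today
reply = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "system", "content": "You are a phone support agent."},
              {"role": "user", "content": transcript}],
).choices[0].message.content

# 3) Text back to speech
speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)
speech.write_to_file("agent_reply.mp3")  # every hop adds latency and drops tone, pauses, emotion
```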
We have a bunch of companies in our portfolio that are using these models, Parloa on the customer support side, LiveKit on the infra side. There's
37:52
a bunch of use cases we were starting to see that a speech-to-speech model could address. A lot of the harder ones are still running on what you're calling the "stitched model." But I hope the day is not far when it's all on the real-time API. It's going to happen at some point.
Right, right,
38:09
right. And actually maybe that's a good segue into talking about model customization because I suspect that you have such a wide variety of enterprise customers.
I think you mentioned what, hundreds of customers or maybe more? Each of them has a different use case, a different problem set, a different envelope of parameters that they're working in, maybe latency, maybe power,
38:28
maybe others. How do you handle that?
Talk about what OpenAI offers enterprises who need a customized version of a great model to make it great for them. Yeah, so model customization has actually been something that we've invested very deeply in on the API platform since the very
38:44
beginning. So even pre-ChatGPT days, we actually had a supervised fine-tuning API available and people were actually using it to great effect.
The most exciting thing, actually, I'd say around model customization: it obviously resonates quite well with customers because they want to be able to bring in their own custom data and create their own custom version of o3 or o4-mini or something,
39:05
or GPT-5 even, suited to their own needs. It's very attractive, but the most recent development that I think is very exciting has been the introduction of reinforcement fine-tuning. It's something we announced late last year, I think in the 12 days of Christmas; we've GA'd it since
39:20
and we're continuing to iterate on it. What is it, break it down for us?
Yeah, so it's called, it's actually funny, I think we made up the term reinforcement fine-tuning. It wasn't a real thing until we announced it.
It's stuck now. I see it all the time.
I remember we were discussing it and I was like, "I don't know about RFT." You're not kidding. You're not kidding.
Yeah,
39:37
so reinforcement fine-tuning. So it really, it's introducing reinforcement learning into the fine-tuning process.
So the original fine-tuning API does something called supervised fine-tuning, we call it SFT. It is not using reinforcement learning.
It is, it's using supervised learning.
39:57
And so what that usually means is you need a bunch of data, a bunch of prompt completion pairs. You need to really supervise and tell exactly the model how it should be acting.
And then when you train it on our fine-tuning API, it moves it closer in that direction. Reinforcement fine-tuning introduces RL, or reinforcement learning, to the loop.
Way more complex,
40:14
way more finicky, but an order of magnitude more powerful. And so that's actually what's really resonated with a lot of our customers.
It allows you to, if you use RFT, the discussion is less about creating a custom model that's specific to your own use case. It's that you can actually use
40:29
your own data and turn the crank on RL to actually create a best-in-class model for your own particular use case. And so that's kind of the main difference here.
With RFT, the data set looks a little bit different. Instead of prompt completion pairs,
40:45
you really need a set of tasks that are very gradable. You need a grader that is very objective that you can use here as well.
And so that's actually been something that we've invested a lot in over the last year. And we've actually seen a number of customers get really good results on this.
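(To make the distinction concrete, here is a hedged sketch of the two data shapes. The SFT example mirrors the public chat fine-tuning JSONL format; the RFT task and grader are simplified illustrations, not the exact product schema.)

```python
# Illustrative data shapes only. The SFT example mirrors the chat fine-tuning JSONL format;
# the RFT task and grader below are a simplified sketch, not the exact product schema.

# Supervised fine-tuning: show the model exactly what to say for a given prompt.
sft_example = {
    "messages": [
        {"role": "user", "content": "Summarize clause 4.2 of this credit agreement: ..."},
        {"role": "assistant", "content": "Clause 4.2 caps the borrower's leverage ratio at ..."},
    ]
}

# Reinforcement fine-tuning: provide a gradable task plus an objective grader,
# and RL optimizes the model against the grade instead of copying a reference answer.
rft_task = {
    "prompt": "What is the taxpayer's allowable deduction in scenario 12?",
    "reference_answer": "4200",
}

def grader(model_answer: str, reference_answer: str) -> float:
    # Objective, automatic grading is the hard requirement that makes RFT-style training work.
    return 1.0 if reference_answer in model_answer.replace(",", "") else 0.0
```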
We've talked about a couple of them across different verticals. So Rogo, which is a startup
41:04
in the financial services space. They have a very sophisticated AI team.
I think they hired some folks from DeepMind to run their AI program. And they've been using RFT to get best-in-class results on parsing through financial documents, answering questions around them, and doing tasks
41:21
around that as well. There's another startup called Accordance that's doing this in the tax space.
I think they've been targeting an eval called TaxBench, which looks at CPA-style tasks as well. And because they're able to turn it into a very gradable setup, they're actually able to turn
41:38
the RFT crank and also get, I think, like, SOTA results on TaxBench just using our RFT product as well. And so it has kind of shifted the discussion away from just customizing something for your own use case to really leveraging your own data to create a best-in-class, maybe best-in-the-world
41:53
model for something that you care about for your business. Yeah, I feel like the base models are getting so good at instruction following that for behavior steering, you don't need to fine-tune at that point.
You can describe what you want, and the model is pretty good at it. But pushing the
42:10
frontier on actual capabilities, my hunch is that RFT will pretty much become the norm. If you are actually pushing intelligence in your field to a pretty high point, at some point you need to RL, essentially, with custom environments.
Fascinating. And even going
42:29
back to the point earlier around top-down versus bottom-up for some of these enterprises, a lot of the data that you end up needing for RFT requires very intricate knowledge about the exact task that you're doing and understanding how to grade it. And so a lot of that actually comes bottom-up. Like, I know a lot of these startups will work with experts in their field to try and get the
42:47
right tasks and get the right feedback to craft some of these data sets. Without further ado, we're going to jump into my favorite section, which is a rapid-fire question.
We had a lot of great friends of ours send in some questions for you guys. We'll start with Altimeter's favorite game, which is a long, short game.
Pick a business, an idea, a startup that you're long,
43:08
and the same for a short, one that you would bet against because there's more hype than reality. Whoever's ready to go first: long, short. My long is actually not in the AI space, so this is going to be slightly different.
Wow. Here we go.
My short is, though, in the AI space. So I'm actually
43:25
extremely long esports. And so what I mean by "esports" is the entire, like, professional gaming industry that's emerging around video games.
Very near and dear to my heart, I play a lot of video games, and so I watch a lot of this. So obviously, I'm pretty in the weeds on this.
But I actually think there's incredible untapped potential in esports and incredible
43:43
growth to be had in this area. So concretely, what I mean, a really big one is League of Legends. All of the games that Riot Games puts out, they actually have their own professional leagues.
They actually have professional tournaments, believe it or not. They rent out stadiums, actually,
43:58
now. But I just think, if you look at what the youth and younger kids are looking at and where their time is going, it's predominantly going towards these things.
They spend a lot of time on video games. They watch more esports than like soccer or basketball?
Yeah, yeah, yeah, yeah. A growing number of these, too.
I've actually been to some of these events,
44:16
and it's very interesting. He's very committed to his long.
Yeah, yeah. I'm extremely long this stuff. And so they're booking out stadiums for people to go watch electronic sports.
Yeah, yeah, yeah. I literally went to Oracle Arena, the old Warrior Stadium, to watch one of these, I think, before
44:34
COVID. And then the...
So it's just... Before COVID?
Wow, that's five years ago. Six years ago.
So I've been following this for a while, and I actually think it had a really big moment in COVID. Like, everyone was playing video games. Yeah, it was more so...
I think it's kind of like, come back down. So I think it's like, undervalued. You know, it's like, I think no one's really appreciating it now.
But it has all the elements to like, really, really take off. And so the youth
44:54
are doing it. The other thing I'd say is it is huge in Asia.
Like absolutely massive in Asia. It is absolutely big in Korea, in China as well.
Like we rented out Oracle Arena, I think, or the event I went to was in Oracle Arena. My sense is, in Asia they rent out entire stadiums, like
45:10
the soccer stadiums, and the players are really like celebrities. So anyways, I know Korean culture is really making its way into the US as well.
I think that's another tailwind for this whole thing. But anyways, esports, I think, is something you should keep an eye out on because there's a lot of room for growth.
Very unexpected. Good to hear. Short.
My short, my short's a little
45:29
spicy, which is I'm short on the entire category of like tooling around AI products. And so this encapsulates a lot of different things.
Kind of cheating because some of these, you know, I think are starting to play out already. But I think like two years ago, it was maybe like evals products
45:48
or like frameworks or vector stores. I'm pretty short those.
I think nowadays there's a lot of additional excitement around other tooling around AI models. So RL environments, I think, are really big right now as well.
Unfortunately, I'm very short on those. I'm not really, I don't really see
46:07
a lot of potential there. I do see a lot of potential in reinforcement learning and applying it.
But I think the startup space around RL environments, I think, is really tough. Main thing is, one, it's just a very competitive space.
There's just a lot of people kind of operating it. And then two,
46:22
if the last two years have shown us anything, the space is evolving so quickly and it's so difficult to try and like adapt and understand what the exact stack is that will really carry through to the next generation of models. I think that just makes it very difficult when you're in the tooling space because, you know, today's really hot framework or really hot tool might just not
46:42
get used in the next generation of models. So I've been noticing like the same pattern, which is the teams that build like breakout startups in AI are extremely pragmatic.
They are not super intellectual about, like, the perfect world, et cetera. And it's funny because I feel like,
46:59
you know, our generation basically started in tech in a very stable moment, where technology had been building up for years and years with SaaS, cloud, et cetera. And so we were in a way raised in that very stable moment, where it makes sense at that point to design very good abstractions
47:18
and tooling because you have a sense of where it's going. But it's so different today.
There's no way to know what's going to happen in the next year or two. So it's almost impossible to define the perfect tooling platform.
Right. Right. Right.
Well, that's there's a lot of that going around right now. Yes.
Spicy. A lot of homework there.
Olivier, over to you, sir. Long
47:37
short. I've been thinking a lot about education for the past month in the context of kids. I'm pretty short on any education which basically emphasizes human memorization at that point.
47:52
And I say that having mostly been through that education myself. You know, I learned so much on history facts, legal things; some of it does shape your way of thinking. A lot of it, frankly, is just knowledge tokens, essentially.
And those knowledge tokens, you know, it turns out the models
48:11
are pretty good at it. So I'm quite short on that. Why would you need memory when storage is bionic?
You can just put it straight into your head. Exactly. Exactly.
What am I long on? Frankly, I think healthcare is probably the industry that will benefit the most from AI in the next
48:32
like year or two. I would say more.
I think all the ingredients are here for a perfect storm. A huge amount of structured and unstructured data, which is basically at the heart of the pharma companies, and the models are excellent at digesting and processing that kind of data.
A huge
48:51
amount of admin-heavy, documents-heavy culture. But at the same time, companies which are very technical, very R&D friendly, companies where technology is, in a way, at the heart of what they do.
And so, yeah, I'm pretty bullish on that. This is like life sciences.
So you mean life sciences,
49:10
research organizations that are producing drugs. Gotcha. Exactly.
It's almost like, you know, over the last 20, 30 years, these like pharma or like biotech companies have basically, if you look at the work that they're doing, like only a small amount of it is actual research. And so much of it
49:28
ends up being admin and like, you know, documents and things like that. And that area is just so ripe for, you know, something to happen with AI. And I think that's what we're seeing with Amgen and some of these other customers.
Exactly. And it's also like not what they want to do.
I think it's good that we have some regulations there, obviously, but like, just means that they have like reams and reams of things to kind of go through. And so, you know, like when you have
49:47
a technology that's able to really help bring down the cost of something like that, I think it'll just tear right through it. And I think once governments and institutions realize that... if you step back, it is probably one of the biggest bottlenecks to human progress, right? Look back over the past decade:
50:04
how many breakthrough drugs have there been? Not that many. How different would life be if you doubled that rate, essentially?
So once you realize what is at stake, then my hunch is that we're going to see quite a bit of momentum in that space. Wow.
All right.
50:19
Lots of homework there as well. Yeah.
Next one. Favorite underrated AI tool, other than ChatGPT maybe. I love Granola.
Oh man, you stole mine. You stole my answer. I use Granola so much.
Like, um. Two votes for Granola.
There is something like, yeah. Hey, what about ChatGPT record?
I like
50:37
ChatGPT as well, but there are some parts of Granola which I think are really well done. Like, the whole integration with your Google Calendar is excellent.
Yeah. Um.
And just, you know, the quality of, like, the transcription and, like, the summary is pretty good. Do you just have it on?
Because I know your calendar is back to back. You just have Granola on.
So the funny
50:55
thing is that I don't use Granola internally. I use Granola for my personal life mostly.
I see. Yeah. I see.
On dates. I'm joking.
I was going to say, yeah, Granola is actually going to be mine. So two votes for Granola.
I was going to say the easy answer for me is Codex. That is, as a
51:10
software engineer. It's just gotten so good recently.
Codex CLI, especially with GPT-5. Especially for me, I tend to be less time-sensitive about the iteration loop with coding.
And so leaning into GPT-5 on Codex I think has been really... Interesting.
51:26
What about Codex has changed? Because, you know, Codex has also been through a journey.
Codex has been around for a bit. I remember it launched more than a year ago.
It's like, what's changed about Codex? Yeah, I was actually going to...
Codex the CLI has been around for a bit. I feel like it's been less than a year for Codex. The time dilation is so crazy and this feels...
51:43
It feels like it's been around for a year ago with GPT-5. Oh, like, you know, that demo, like, that feels like ages ago.
And it didn't even come out yet. Probably because it hasn't happened yet.
The voice demo is... I think it was a naming thing, okay, but anyway...
Oh, there was a Codex model. That's what I'm thinking about.
There was a Codex model. Yeah, we are.
You're not to blame
52:00
for that confusion. Also, I think the GitHub thing was called Codex.
That's right. Yes, yes.
I'm talking about our coding product within ChatGPT, which is the Codex cloud offering, and then also Codex CLI. So, actually, maybe if I were to narrow my answer a little bit more, it's Codex CLI, which I've really, really liked.
I like the local environment setup. The thing that's actually made
52:19
it really useful in the last, I'd say, month or so is, one, I think the team has done a really good job of just getting rid of all the paper cuts, the small product polish and paper-cut things. It just...
It kind of feels like a joy to use now. It feels more reactive.
And then the second thing, honestly, is GPT-5. I just think GPT-5 really allows the
52:38
product to shine. Yeah.
It's, you know, at the end of the day, this is kind of a... This is a product that really is dependent on the underlying model.
And when you have to iterate and go back and forth with the model four or five times to get it right, to get it to do the change that you want, versus having it think a little bit longer and
52:55
it just one-shots and does exactly what you want, you get this weird, bionic feeling where you're like, "I feel so mind-melded with the model right now, and it perfectly understands what I'm doing." So getting that kind of dopamine hit and feedback loop constantly with Codex has made it kind of an indispensable thing that I really,
53:11
really like. Nice.
And the other thing I'd say Codex is just really good at for me is... So I use it for personal projects.
I also use it to help me understand code bases. As an engineering manager, I'm not as in the weeds on the actual code now. And so you're actually
53:28
able to use Codex to really understand what's happening with the code base, ask it questions and have it answer things, and really catch up to speed as well. So even the non-coding use cases are really useful with Codex CLI. Fascinating.
Sam had this tweet about Codex usage ripping, I think, like, yesterday. So I wonder what's going on there,
53:49
but you're not alone. Yeah, I think I'm not alone.
Just judging from the Twitter feedback, I think people are really realizing how great of a combination Codex CLI and GPT-5 are. Yeah, I know that team is undergoing a lot of scaling challenges, but, I mean, the system hasn't gone down for me, so props to them.
But we are in a GPU crunch, so we'll see how long that goes. Awesome,
54:09
awesome. All right, the next one.
Will there be more software engineers in 10 years, or fewer? There are about 40, 50 million... full-time, professional software engineers.
That's what you mean, like, full-time, like, actual jobs? Yeah, because it's a hard one, because, like,
54:25
I think without a doubt there's going to be a lot more software engineering going on. Yes, of course.
There's actually a really great post that was shared, I think, in our internal Slack. It was like a Reddit post recently.
I actually think that highlights this. It was a really touching story.
It was a Reddit post about someone who has a brother who's non-verbal. I actually don't know
54:41
if you saw this. It was just posted.
A person on Reddit posted that they have a non-verbal brother who they have to take care of. They tried all these types of things to help the brother interact with the world and use computers, but vision tracking didn't work, because I think his vision wasn't good.
All the tools didn't work, and then this brother ended up using
55:00
ChatGPT. I don't think he used Codex, but he used ChatGPT and basically taught himself how to create a set of tools that were tailor-made to his non-verbal brother.
Basically, a custom software application just for them. Because of that, he now has this custom setup that was written by his brother and allows him to browse the internet. I think the video was him watching The Simpsons or
55:19
something like that, which was really touching. I think that's actually what we'll see a lot more of. This guy's not a professional software engineer.
His title's not software engineer, but he did a lot of software engineering, and probably pretty good engineering. Good enough, definitely, for his brother to use.
The amount of code, the amount of building that'll happen, I think, is just going
55:36
to go through an incredible transformation. I'm not sure what that means for software engineers like myself. Maybe there's an equivalent number, or maybe there's-- Of course, more Sherwin.
Yeah, more of me. More of me specifically.
We need more of you. That's right.
Yeah. But definitely a lot more software engineering at a lot of companies. I buy that completely.
I completely buy the thesis
55:53
that there is a massive software shortage in the world. We've been sort of accepting it for the past 20 years.
But the goal of software was never to be this super-rigid, super-hard-to-build artifact. It was to be customized, malleable. And so I expect that we'll see way more-- sort
56:12
of a reconfiguration of people's jobs and skill sets, where way more people code. I expect that product managers are going to code more and more, for instance.
You made your PMs code recently, if I heard right. Oh, yeah, we did that.
It was really fun. We started essentially not doing PRDs,
56:31
product requirements documents. Classic PM thing.
You write five pages: my product does this, et cetera.
And PMs have been basically coding prototypes instead. And one, it's pretty fast with GPT-5 and Codex.
Yeah, just a couple hours, I think. Fricking fast.
And second, it sort of
56:48
conveys so much more information than a document. You get a feel, essentially, for the feature. Is it right or not? So yeah, I expect we're going to see that sort of behavior more and more.
Yeah, instead of writing English, you can actually now write the actual thing you want. Yeah, yeah.
Yeah, that's amazing. Advice for high school students who are just starting out in their careers.
57:09
My advice is-- I don't know, maybe it's evergreen. Prioritize critical thinking above anything else. If you go into a field which requires extremely high critical thinking skills-- I don't
57:25
know, math, physics, or maybe philosophy is in that bucket-- you will be fine regardless. If you go into a field that sort of tones that down,
and again, it gets back to memorization, pattern matching, I think you will probably be less future-proof.
What's a good way to sharpen
57:41
critical thinking? Use ChatGPT and have it test you.
That's a tricky test. Having a world-class tutor who essentially knows how to set the bar about 20% above what you can do, all the time,
57:59
is actually probably a really good way to do it. Nice.
Anything from you, sir? Mine is-- I think we're actually in such an interesting, unique time period where the younger-- so maybe
58:16
this is more general advice for not just high school students, but just the younger generation, even college students. I think the advice would be don't underestimate how much of an advantage you have relative to the rest of the world right now because of how AI native you might be or how
58:31
versed in the ways of these tools you are. My hunch is that high schoolers and college students, when they come into the workplace, are going to have a huge leg up on how to use AI tools, how to actually transform the workplace.
And my push for some of the younger high school students is one,
58:46
just really immerse yourself in this thing. And then two, just really take advantage of the fact that you're in a unique time where no one else in the workforce really understands these tools as deeply, probably as you do.
A good example of this is actually we had our first intern class recently at OpenAI, a lot of software interns. And some of them were just the most incredible
59:06
Cursor power users I've ever seen. They were so productive.
I was shocked, by the way. I was like, yeah, I knew we could get good interns, but I didn't know they'd be this good.
And I think part of it is they've grown up using these tools, for better or worse, in college. But I think
59:23
the meta-level point is they're so AI native. And even me and Olivier, we're kind of AI native, we work at OpenAI, but we haven't been steeped in this and grown up in this. And so the advice here would just be leverage that.
Don't be afraid to go in and spread this knowledge and take advantage of
59:41
it in the workplace, because it is a pretty big advantage for them. I can't remember who said this to us at Palantir, but every intern class was just getting faster and smarter, like laptops, smarter every generation.
You sure didn't peak in 2013 when I was an intern. That's right. That's right.
There's a weird spike. That's summer 2013.
Two guys. That's right. That's right.
That's
00:03
right. Well, lots happened here.
A lot's happened since you guys joined OpenAI, right? With three years and almost three years.
In your OpenAI journey, what has been the rose moment, your favorite moment, the bud moment where you're most excited about something, but still opportunity
00:19
ahead, and the thorn, the toughest moment of your three-year journey? The thorn is easy for me. What we call the blip, which was the board coup.
That was a really tough moment. It's funny, because after the fact, it actually reunited the company quite a bit.
There was a feeling,
00:36
OpenAI had a pretty strong culture before, but there was a feeling of camaraderie, essentially, that was even stronger. But sure, it was tough on the day of. It's very rare to see that anti-fragility. Most orgs break apart after something like that, but I feel like OpenAI got stronger.
00:52
OpenAI came back. It's a good point.
I feel it made OpenAI stronger for real, essentially, when you look at it after the fact. When you look at other news, like departures or whatever bad news, I feel the company has built a thicker skin and an ability to recover
01:11
way quicker. I think that's definitely right.
Part of it, too, I think is also just the culture. I also think this is why it was such a low point for a lot of people.
So many people just at OpenAI care so deeply about what we're doing, which is why they work so hard. You just care a lot about the work.
It almost feels like your life's work. It's a very audacious mission and
01:29
thing that you're doing, which is why I think the blip was so tough on a lot of people, but also is what I think helped bring people back together and how we were able to hold together and get that thick skin as well. I have a separate worst moment, which was the big outage that we had in
01:44
December of last year. You remember.
I do. It was like a multi-hour outage.
It really highlighted to us how essential, almost like a utility, the API was. So the background is, I think we had a three-, four-hour outage sometime in November or December last year.
Really brutal,
02:04
a pure SEV 0. No one could hit ChatGPT.
No one could hit the APIs. It was really rough.
That was just really tough from a customer trust perspective. I remember we talked to a lot of our customers to post-mortem what happened and go over our plan moving forward.
02:20
Thankfully, we haven't had anything close to that since then. I've been actually really happy with all the investments we've made in reliability over the last six months.
But in that moment, I think it was really tough. On the happy side, like on the roses, I think I have two of them.
02:40
The first one would be GPT-5, which was really good. The sprint up to GPT-5, I think, really showed the best of OpenAI:
cutting-edge science research, extreme customer focus, extreme infrastructure and
02:57
inference talent. The fact that we were able to ship such a big model and scale it to many, many, many tokens per minute almost immediately, I think speaks to it.
That one I really-- With no outages.
03:14
With no outages. Yeah, really good reliability. I remember when we shipped GPT-4 Turbo, like a year ago, a year and a half ago, we were terrified by the scale of the traffic.
And I feel we've really gotten much better at shipping those massive updates. The second rose, happy moment for me, would
03:35
be the first dev day, which was really fun. It felt like a coming of age for OpenAI.
We were embracing that we have a huge community of developers, that we are going to ship models and products. And I remember basically seeing all my favorite people, OpenAI or not, essentially nerding out on what
03:53
you're building, what's coming up next. It felt like a really special moment in time.
That was actually going to be mine as well. So I'll just piggyback off of that, which is the very first dev day, November 2023.
I remember it. I mean, obviously a lot of good things have happened
04:10
since then. It's just a very-- I don't know why, but for me
it was a very memorable moment. For one, it was actually quite a rush up to dev day. We shipped a lot. So our team was just really, really sprinting.
So it was like this high stress environment kind of going up. To add to that,
04:26
of course, because we're OpenAI, we did a live demo on Sam's keynote of all the stuff that we shipped. And I just remember being in the back of the audience, sitting with the team, and waiting for the demo to happen.
Once it finished happening, we all just let out a huge sigh of relief. We were like, oh my god, thank you.
And so then there's just a lot of buildup to it. For me,
04:46
the most memorable thing was I remember right after dev day, all the demos worked well, all the talks worked well. We had the after party, and then I was just in a Waymo driving home at night with the music playing.
It was just such a great end to the dev day. That was what I remember. That was my rose for the last few years.
Love it. That's awesome. I assume you guys are, but please
05:05
tell me if you're AGI-pilled, yes or no? And if so, what was the moment that got you there?
What was your aha moment? When did you feel the AGI? I think I'm AGI-pilled.
You're definitely AGI-pilled. I am?
I've had a couple of them.
05:21
The first one was the realization in 2023 that I would never need to code manually, like, ever, ever again. I'm not the best coder, you know; I chose my job for a reason.
But realizing that what I thought was a given that we humans would have to write basically machine language forever is
05:41
actually not a given. And that the prize is huge.
The second feel-the-AGI moment for me was maybe the progress on voice and multimodality. With text, at some point you get used to
05:57
it. Like okay, the machine can write pretty good text.
Voice makes it real. But once you start actually talking to something that actually understands your tone, that understands my accent in French, it felt like sort of a moment: okay, machines are going beyond cold,
06:17
mechanical, deterministic logic to something much more emotional and, you know, tangible. Yeah, that's a great one. Yeah, mine are, I do think I am AGI-pilled.
I probably gradually became AGI-pilled over the last couple of years.
06:35
I think there are two. And for me, yeah, I think I actually get more shocked by the text models.
I know the multimodal ones are really great as well. For me, I think they actually line up with two general breakthroughs. So the first one was right when I joined the company in September 2022.
06:52
It was pre-ChatGPT. Yeah, two months before.
At the time, GPT-4 already existed internally. And I think we were trying to figure out how to deploy it. I think Nick Turley has talked about this a lot, the early days of ChatGPT.
But it was the first time I talked to GPT-4. And it was like going from nothing to GPT-4 was just the most mind-blowing experience for me.
I think for the rest of the world,
07:12
maybe going from nothing to GPT-3.5 in chat was the big one, and then going from 3.5 to 4. But for me, and I think for some other people who joined around that time, it was going from not nothing, but what was publicly available at the time, to GPT-4 that was just incredible.
Like I just remember throwing so many things at it. I was like,
07:31
there's no way this thing is going to be able to give an intelligible answer. And it just like knocks it out of the park.
It was absolutely incredible. GPT-4 was insane. I remember GPT-4 came out when I was interviewing with OpenAI.
And I was still looking at the phone, should I join? And so that thing, I was like, okay.
I mean, there is no way I can work on
07:49
anything else at that point. That's true.
Yeah, so GPT-4 was just crazy. And then the other one is the other breakthrough, which is the reasoning paradigm.
I actually think the purest representation of that for me was deep research. And asking it to really look up
08:07
things that I didn't think it would be able to know. And seeing it think through all of it, be really persistent with the search, get really detailed with the write-up and all of that.
That was pretty crazy. I don't remember the exact query that I threw at it, but I just remember, I feel like the feel-the-AGI moments for me are when I'll throw something at the model that I was like,
08:24
there's no way this thing will be able to get. And then it just like knocks it out of the park.
Like, that is kind of the feel-the-AGI moment. I definitely had that with deep research, with some of the things I was asking.
Yeah. Well, this has been great. Thank you so much, folks.
You guys are building the future. You guys are inspiring us every day.
And appreciate the
08:41
conversation. Yeah, thank you so much.
Thank you. Thanks for having us.
[MUSIC PLAYING] As a reminder to everybody, just our opinions, not investment advice.