Inside OpenAI Enterprise: Forward Deployed Engineering, GPT-5, and More | BG2 Guest Interview


Category: AI Technology

Tags: Autonomy, Customization, Enterprise, GPT-5, OpenAI

Entities: Amgen, ChatGPT, GPT-5, Los Alamos National Labs, Olivier Godement, OpenAI, Sherwin Wu, T-Mobile, Venado


Summary

    Business Fundamentals
    • OpenAI's mission is to build AGI and distribute its benefits to all of humanity.
    • OpenAI started with an API as a B2B product before ChatGPT became popular.
    • The platform includes a developer API, government products, and enterprise solutions.
    Enterprise Deployments
    • OpenAI works with enterprises like T-Mobile to automate customer support using AI models.
    • Amgen uses OpenAI's models to accelerate drug development and regulatory processes.
    • Los Alamos National Labs deploys OpenAI models for national security and research purposes.
    AI and Autonomy
    • AI agents are still in early stages compared to self-driving cars, but progress is rapid.
    • Physical autonomy has advanced more than digital autonomy due to existing infrastructure like roads.
    Model Development and Customization
    • GPT-5 focuses on intelligence, behavior, and customer feedback for improvements.
    • Model customization includes supervised and reinforcement fine-tuning for specific enterprise needs.
    • OpenAI's real-time API integrates speech-to-speech models for seamless voice interactions.
    Actionable Takeaways
    • AI can significantly enhance enterprise operations, such as customer support and drug development.
    • Successful AI deployments require top-down support and a dedicated team within the enterprise.
    • Reinforcement fine-tuning enables enterprises to create tailored AI models for specific tasks.
    • AI agents need appropriate scaffolding and infrastructure to be effective in enterprises.
    • GPT-5's improvements in reasoning and instruction-following enhance its application in business.

    Transcript

    00:00

    In San Francisco, you could take a car from one  part of SF to the other fully autonomously. As   opposed to the digital world, I can't book a  ticket online right now.

    Physical autonomy is   ahead of digital autonomy in 2025. I think AI  agents are like really in day one here.

    Like  

    00:18

    ChatGPT only came out in 2022. The slope  I think is incredibly steep.

    I actually   do think self-driving cars have a good amount  of scaffolding in the world. You have roads,   roads exist.

    They're pretty standardized.  Stoplights. AI agents are just kind of   dropped in the middle of nowhere.

    We'll start  with long, short game. I'm short on the entire  

    00:37

    category of like tooling, evals products.  Healthcare is probably the industry that   will benefit the most from AI. I think I'm  AGI-pilled.

    You're definitely AGI-pilled.  

    01:01

    Hey folks, I'm Apoorv Agarwal and today at the  OpenAI office, we had a wide ranging conversation   about OpenAI's work in enterprise. I have with me  the head of engineering and head of product of the   OpenAI Platform, Sherwin Wu and Olivier Godement.  OpenAI is well known as the creator of ChatGPT,  

    01:17

    which is a product that billions across the  world have come to love and enjoy. But today we   dive into the other side of the business, which  is OpenAI's work in enterprise.

    We go deep into   their work with specific customers and how OpenAI  is transforming large and important industries   like healthcare, telecommunications and national  security research. We also talk about Sherwin and  

    01:34

    Olivier's outlook on what's next in AI, what's  next in technology and their picks both on the   long and short side. This was a lot of fun to do.  I hope you really enjoy it.

    Well, two world-class builders, two people who make building look easy. Sherwin, my Palantir 2013 classmate, tennis buddy,

    01:53

    with two stops at Quora and Opendoor through  the IPO before joining OpenAI, before ChatGPT,   you've now been here for three years and lead  engineering for all OpenAI Platform. Olivier,   former entrepreneur, winner of the Golden Llama at  Stripe, where you were for just under a decade and  

    02:12

    now lead all of the product at OpenAI Platform.  That's right. Thanks for doing it.

    Thank you.   Thanks for having us. As a shareholder, as a  thought partner, kicking ideas back and forth,   I always learn a lot from you guys.

    And so it's a treat. It's a real treat to do this for everybody.

    You know, I'll open with: people know OpenAI as the firm that built ChatGPT, the product

    02:33

    that they have in their pocket that comes with  them every day to work, to personal lives. But   the focus for today is OpenAI for enterprise.  You guys lead OpenAI Platform.

    Tell us about   it. What's underneath the OpenAI Platform for B2B  for enterprise?

    Yeah. So this is actually a really  

    02:50

    interesting question too, because when I joined  OpenAI around three years ago to work on the API,   it was actually the only product that we had.  So I think a lot of people actually forget this,   where the original product for OpenAI actually was  not ChatGPT. It was a B2B product.

    It was the API   we were catering towards developers. And so I've  actually seen, you know, the launch of ChatGPT  

    03:10

    and all of everything downstream from that. But at  its core, I actually think the reason why we have   a platform and why we started with an API is it  kind of comes back to the OpenAI mission.

    So our   mission obviously is to build AGI, which is pretty  hard in and of itself, but also to distribute the  

    03:26

    benefits of it to everyone in the world, to all  of humanity. And, you know, it's pretty clear   right now to see ChatGPT doing that, because,  you know, my mom, you know, maybe even your   parents are using ChatGPT.

    But we actually view  our platform and especially our API and how we  

    03:42

    work with our customers, our enterprise customers,  as our way of getting the benefits of AGI, of AI,   to as many people as possible to everyone in every  corner of the world. ChatGPT obviously is really,   really, really big now.

    It's, I think, like  the fifth largest website in the world. But   we actually, by working through developers using  our API, we're actually able to reach even more  

    04:01

    people in, you know, every corner of the world and  every different use case that you might have. And   especially with some of our enterprise customers,  we're able to reach even use cases within   businesses and end users of those businesses as  well.

    And so we actually view the platform as   kind of our way of fully expressing our mission of  getting the benefits of AGI to everyone. And so,  

    04:22

    concretely though, what the platform actually  includes today, the biggest product that we   have is obviously our developer platform, which  is our API. You know, many developers, you know,   the majority of the startup ecosystem builds on  top of this, as well as a lot of digital natives,  

    04:38

    Fortune 500 enterprises at this point. We also  have a product that we sell to governments as   well in the public sector.

    That's all part of  this as well. And also an emerging product line   for us in the platform is our enterprise product.  So we actually might sell directly to enterprises   beyond just a core API offering.

    Fascinating.  And maybe to double down, like, I think B2B is  

    05:00

    actually quite core to the OpenAI mission. What we  mean by distributing AGI benefits is, you know, I   want to live in a world where, you know, there are  10x more medicines going out every year.

    I want to   live in a world where, you know, education, public  service, civil service, you know, are increasingly  

    05:19

    optimized for everyone. And, you know, there is a large category of use cases that only go through B2B, frankly, unless you enable the enterprises. And we talked about Palantir, I think that's probably the same thesis at Palantir.

    It's like,  hey, those are the businesses who are actually  

    05:36

    making stuff happen in the real world. So if you do enable them, if you do accelerate them, like, that's how, essentially, you know, you distribute the benefits of AGI.

    Yeah. Well, maybe we can double   click into that, Olivier.

    You know, the reach for  chat is obviously wide, billions of users. But for  

    05:52

    enterprise, maybe tell us about it. Maybe we go deep into a customer example or two.

    And what   is an organization that we have helped transform  maybe? And at what layers?

    So if I were to step   back, like, we started our B2B efforts with the  API like a few years ago. Initially, the customers  

    06:10

    were startups, developers, indie hackers, extremely technically sophisticated people, like, you know, who are building, like, you know, cool new stuff, essentially, and taking massive, like, you know, market risk. So we still have a bunch of customers in that category,

    06:25

    and we love them, and we keep building with them. On top of that, you know, over the past couple of years, we've been working more and more with traditional enterprises, and also, like, digital natives.

    Essentially, I think, basically, everyone woke up, like, with ChatGPT, and, like, those models are working. There is a ton of value, and they could see, essentially, many use cases

    06:43

    in the enterprise. A couple of examples which I  like the most.

    One which is very both fresh and,   you know, is quite cool. We've been working a lot  with T-Mobile.

    T-Mobile. So T-Mobile, leading,   like, US telco operator.

    T-Mobile has, like,  you know, a massive customer support load. Like,  

    07:01

    you know, people asking, like, you know, "Hey, I was charged, like, that amount of money, what's going on," or, you know, "My cell phone, like, isn't working anymore." A massive, like, you know, share of that load is, like, you know, voice calls. Like, people want to talk to someone. And so for them, like, you know, to be able to essentially automate, like, more and more, and,

    07:18

    you know, to help, like, people, like,  self-serve in a way, like, you know,   debug their subscription was pretty big. And so  we've been working with T-Mobile pretty much for   the past year.

    At that point, to basically  automate, like, not only the text support,   but also voice support. And so today, like, you  know, there are features, like, in the T-Mobile  

    07:35

    app, that if you call, are actually handled by OpenAI models behind the scenes. And, you know, it does sound, like, super natural, like, you know, human-sounding, latency- and quality-wise.

    So that   one was really fun. A second one, which is very--  Just on that, can I ask you a follow-up question?  

    07:51

    So we've got text models. We've got voice models,  maybe even video models someday that are deployed   at T-Mobile.

    But what above the models or adjacent  to the models might we have helped T-Mobile with,   for example? Yeah, there is a ton we're doing.

    The  first one is, you know, you have to put yourself  

    08:07

    in the shoes of an enterprise buyer. Like,  their goal is to automate, you know, reduce,   like, you know, optimize customer support.

    And you're going from, like, a model, like, tokens in, tokens out, to that use case. It's hard.

    And so, you  know, first, like, there's a lot of design, like,   you know, system design. We do have actually now  forward deployed engineers, who are helping us  

    08:26

    quite a bit. Forward deployed engineers.

    Yeah, I mean-- Yeah, that's familiar to the-- We borrow the term from Palantir. Yeah, it's a great term. Were you an FDE at Palantir?

    I was not an FDE. I was on, I think they called it, the dev side, right? It's like software engineering. I was also only an intern at Palantir.

    But, yeah, it's a great  term. I think it accurately describes what we're  

    08:43

    asking folks to do, which is, like, embed very  deeply with customers and, honestly, like, build   things specific to their systems. They're deployed  onto these customers.

    But, yeah, we are obviously   growing and hiring that team quite a bit because  they've been very effective, like, at T-Mobile.   Four years of my life. Yeah, yeah, yeah.

    Forward  deployed. But go ahead.

    So, forward deployed  

    09:01

    engineering. Forward deployed engineers and the  sort of, like, systems and, like, integrations   they're doing is, you know, first, like, you  know, you have to orchestrate those models.

    Like, those models are not just, you know... those models, like, know nothing about, like, you know, the CRM, like, you know, and, like, what's going on. And so, you have to plug the model into, like,

    09:16

    many, many different tools. Many of those, like,  tools, like, in the enterprise, do not even have,   like, APIs or, like, clean interfaces, right?

    It's  the first time they're being exposed, like, you   know, to a third party system. And so, there is a  lot of, you know, standing up, like, you know, API   gateways, like, tools, connecting.
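
    To make the orchestration point concrete, here is a minimal sketch of plugging a model into an internal system through tool calling, using the standard tool-calling interface of the OpenAI Python SDK. The lookup_subscription function, its schema, and the CRM behind it are hypothetical stand-ins, not any customer's actual integration.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical CRM lookup exposed to the model as a tool. The name, schema,
# and backing system are illustrative placeholders.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_subscription",
        "description": "Fetch a customer's current plan and recent charges from the CRM.",
        "parameters": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string", "description": "Internal customer identifier."},
            },
            "required": ["customer_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-5",  # assumption: whichever model the deployment actually uses
    messages=[{"role": "user", "content": "Why was I charged 40 dollars extra this month?"}],
    tools=tools,
)

# When the model decides it needs the CRM, it returns a tool call instead of text;
# the orchestration layer executes it and feeds the result back in a follow-up turn.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```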

    Then you have  to essentially, like, define what good looks like,  

    09:35

    you know. Again, like, to put in your exercise  for everyone, like, you know, defining, like,   a golden set of evals is, you know, easier than  it sounds.

    Harder than it sounds. Yeah.

    And so,   we've been spending, like, a bunch of time  with them. Evals are important.

    Evals are super   important. Especially, like, audio evals.

    Evals  are, like, extra hard to grade and get right. But,  

    09:54

    like, the bulk of the use case here is actually  audio. And, like, we have, like, I don't know,   five minute, like, call transfer, how do you  actually know that the right thing happened?   It's a pretty tough problem.

    Yeah, it's pretty  tough. And then, you know, actually nailing down,   like, the quality of the customer experience,  like, you know, until it feels natural.

    And here,  

    10:12

    latency and interruptions. They're really, like,  you know, important part.

    We shipped in GA an API,   real-time API. I think it was last week.

    A couple  of weeks ago, yeah. Yeah, it was just last week,   I think.

    Which is, like, a beautiful work  of engineering. You know, there was a really  

    10:27

    cracked team behind the scenes. Which basically allows us, like, to get, like, the most, like, natural-sounding, like, you know, voice experience without having, like, these weird interruptions or lag where you can feel that, essentially, the thing is off.

    So, yeah. Cobbling all that  

    10:43

    together, you know, and you get, like, you know,  a really good experience. Yeah, that's a lot more   than just models.

    Yeah. One actually really  great thing that I think we've gotten from the   T-Mobile experience is actually working with them  to improve our models themselves.

    So for example, with the real-time GA last week, we obviously released a new snapshot, the GA snapshot. And

    11:03

    a lot of the improvements that we actually  got into the model came out of, you know, the   learnings that we have from T-Mobile. It brings  in a lot of other changes from other customers,   but because we were so deeply embedded into  T-Mobile and we were able to understand what   good looks like for them, we were able to bring  that to some of our models.

    That makes sense.   So, we are working with a large customer with tens  of millions of users, if not hundreds of millions,  

    11:21

    and the before and after is on the support  side, both tech support internally and then   their customer support. Yeah.

    Makes sense. Yeah.  Is there another one that you guys can share?

    I   like a lot Amgen. Amgen, the healthcare business.  Amgen, yeah.

    So, we are working quite a bit with  

    11:38

    healthcare companies. Amgen is one of the leading,  like, healthcare companies.

    They specialize in drugs for cancer or, like, you know, inflammatory diseases, and they're based out of LA. And we've been working, essentially, with Amgen to essentially speed up, like, the drug, like, development

    11:55

    and the conversation process. So, you know, the  sort of the north star is, like, pretty bold.

    And   it's really interesting, like, when you similarly,  like, you know, we embedded, like, pretty deeply   with Amgen to understand what are their  needs. And it's really interesting, like,   when I look at those healthcare companies, I feel  like they are two big buckets of needs.

    One is,  

    12:14

    like, pure R&D. It's like, you know, you're  seeing, like, a massive amount of data and, like,   you have super smart scientists who are trying to,  you know, come by, test out things, you know.

    So,   that's one bucket. A second bucket is, like, you  know, much more, like, you know, common across   other industries.

    It's, like, pure, like, you  know, admin, document authoring, document-scribing  

    12:32

    work, which is, you know, by the time, like, your R&D team has essentially locked the recipe of a medication, getting that medication to market is a ton of work. Like, you have to submit to, like, various regulatory bodies, get a ton of reviews. And you know, when we looked at essentially those

    12:49

    problems, what we knew, what models were capable  of, we saw, like, you know, a ton of benefits,   a ton of opportunities to automate and, you  know, augment essentially the work of those   teams. And so, yeah, Amgen has been, like, a top  customer of GPT-5, for instance.

    Wow. I mean,   this could be hundreds of millions of lives if a  new drug is developed faster.

    Yeah, exactly. Huge  

    13:09

    impact. So that's, you know, that's, I think, one good example of, like, a kind of impact for which you need to enable enterprises, like, to do it.

    Right. You know?

    And so I think we're   going to do more and more of those. And yeah,  frankly, like, you know, on a personal level,   like, it's a delight, you know.

    If I can play,  like, you know, a tiny role, essentially, like,  

    13:26

    doubling, like, you know, the kind of medication  that people, you know, get in the real world,   that feels like, you know, a pretty good, like,  you know, achievement. Huge.

    Huge, huge. I know   you had one as well.

    So one of my favorite  deployments that we've done more recently,   actually, is with the Los Alamos National Labs. So  this is the, like, government, national research  

    13:44

    lab that the U.S. government is running in Los  Alamos, New Mexico.

    It's also where, you know,   the Manhattan Project happened back in the 40s  and 50s, back when it was the secret project. So,   you know, after that, they ended up formalizing  it as a city and a program, and then now it's a   pretty sizable national laboratory.

    This one is  very interesting because one, just the depth of  

    14:03

    impact here is, like, unimaginable for me, it's  like on the scale of Amgen and some of these other   larger companies. But, you know, obviously  they're doing a lot of actual new research there,   so a lot of new science.

    They're doing a lot of  stuff with our Defense Department and Defense  

    14:19

    use cases as well. So very intense, you know, very  intense stuff.

    But the other thing that's actually   very interesting about this one was that it's  also a story of a very, like, bespoke and, like,   new type of deployment that we've done. So because  they are so, they're a government lab, they're so,  

    14:34

    you know, restrictive and high security and high  clearance with a lot of their things, we couldn't   just do a normal deployment with them. They  couldn't, you know, you can't have people doing   national security research just hitting our APIs.  And so we actually did a custom on-prem deployment   with them onto one of their supercomputers called  Venado.

    And so this actually involves a bunch of,  

    14:54

    you know, very bespoke work with some FDEs,  also with a lot of our developer team,   to actually bring one of our reasoning models,  o3, into their laboratory, into an air-gapped,   you know, supercomputer Venado and actually deploy  it and get it installed to work on their hardware,  

    15:09

    on their networking stack, and actually run it  in this particular environment. And so it was   actually very interesting because we literally  had to bring the weights of the model physically   into their supercomputer in an environment,  by the way, where you're not allowed to have,  

    15:25

    you know, it's very locked down for a good reason.  They're not allowed to have cell phones or like   any electronics with you as well. So I think that  was a very unique challenge.

    And then the other   interesting thing about this deployment is just  how it's being used, right? So the interesting   thing is because it's so locked down and on-prem,  we actually do not have much visibility into  

    15:44

    exactly what they're doing with it, but we do  have, you know, they give us feedback. Yeah,   yeah.

    They actually do have some telemetry, but it's, you know, within their own systems. But we do know that it's, you know, being used for a bunch of different things. It's being used for aiding them in terms of speeding up their experiments. They have a lot of data analysis use cases,

    16:03

    a lot of notebooks that they're running with  reams of data that they're trying to process.   They're actually using it as a thought partner,  which is something that's pretty interesting to   me. o3 is like pretty smart as a model.

    And a lot  of these people are tackling really tough, you   know, novel research problems. And a lot of times  they're kind of using o3 and going back and forth  

    16:20

    with it on their experiment design on like what  they actually should be using it for, which is,   you know, something that we couldn't really say  about our older models. And so, yeah, it's just   being used for a lot of different use cases for  the National Lab.

    And the other cool thing is it's  

    16:36

    actually being shared between Los Alamos and some  of the other labs, Lawrence Livermore, Sandia as   well, because it's the supercomputer setup where  they can all kind of connect with it remotely.   Fascinating. I mean, we've just gone through three  pretty large scale enterprise deployments, right,  

    16:53

    which might touch tens if not hundreds of millions of people. But then on the other side of this is the MIT report that came out a couple of weeks ago.

    95% of AI deployments don't work. A   bunch of, you know, scary headlines that even  shook the markets for a couple of days.

    Like,  

    17:09

    you know, put this in perspective, like  for every deployment that works, there's   presumably a bunch that don't work. So maybe  we can, you know, maybe talk about that.

    Like,   what does it take to build a successful enterprise  deployment, a successful customer deployment and  

    17:25

    the counterfactual based on all your experience serving all these large enterprises? I think at this point, I may have worked with like a couple of hundred.

    I think. Couple of hundreds.

    So,   okay, I'm going to pattern match. What I've seen  being like clear leading indicator of success.  

    17:43

    Number one is like the interesting combination of, like, top-down, like, buy-in and, like, enabling, like, you know, a very clear group, like a tiger team, essentially, like, you know, in the enterprise, which is sometimes a mix of, like, OpenAI and, you know, enterprise employees. So, you know, typically, like, you know, you take like T-Mobile, like the top leadership was like extremely bought in, like

    18:01

    it's a priority. But then letting the team  like, you know, organize and be like, okay,   if you want to start small, start small, you know,  and then you can scale it up, essentially.

    So that   would be part number one. So top down buying and a  bottom called a tiger team.

    Tiger team, you know,   people like, you know, a mix of like technical  skills and like people who just have like the  

    18:20

    organizational knowledge, like institutional  knowledge, you know, it's really funny,   like in the enterprise, like customer support, a  good example, like what we found is that the vast   majority of the knowledge is in people's heads.  Right. Right.

    Which is probably like a thing that,  

    18:36

    you know, we have these like in general, but like, you know, you take customer support, you would think that, you know, everything is like perfectly documented, etc. The reality is like the standard, like, operating procedures, like the SOPs, are largely in people's heads. And so unless you

    And so unless you   have that tiger team, like mix of like technical  and like, you know, subject matter expert,  

    18:52

    really hard like to get something off the ground. That would be one.

    Two would be evals first. Like, whatever we define as good evals, like that gives like a really clear common goal for people to hit.

    Whenever, like, you know, the  customer like fails to come up with good evals,  

    19:08

    it's a moving target. Essentially, you don't know if you've made it or not.

    And you know, evals are much harder than they look to get done. And evals also oftentimes need to come up bottom up, right?

    Because all of these things are kind of in  people's heads, in the actual operator's heads.  

    19:23

    Like it's actually very hard to have a top-down  mandate of like, you got like, this is how the   evals should look. A lot of it needs the bottoms  up adoption.

    Right. Yeah.

    Yeah. And so we've been doing quite a bit of tooling on evals.

    We have  like an evals product and you know, we're working   on more to essentially solve like, you know, that  problem or, you know, make it as easy as we can.  
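
    As a rough illustration of the evals-first idea, here is a minimal sketch of a golden-set eval harness. The support-style cases, action labels, and model name are made-up placeholders; the point is that a fixed set of graded cases gives the team a single number to hill-climb.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical golden set: each case pairs a real support query with the
# outcome a subject-matter expert says should happen.
GOLDEN_SET = [
    {"prompt": "I was charged twice for my plan this month.", "expected": "open_billing_dispute"},
    {"prompt": "My phone has had no signal since yesterday.", "expected": "run_network_diagnostic"},
]

LABELS = "open_billing_dispute, run_network_diagnostic, escalate_to_human"

def predict_action(prompt: str) -> str:
    """Ask the model to pick exactly one action label for a support query."""
    response = client.chat.completions.create(
        model="gpt-5",  # assumption: whichever model the deployment uses
        messages=[
            {"role": "system", "content": f"Reply with a single action label from: {LABELS}."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content.strip()

def run_eval() -> float:
    """Score the current system against the golden set; this is the number to hill-climb."""
    passed = sum(predict_action(case["prompt"]) == case["expected"] for case in GOLDEN_SET)
    return passed / len(GOLDEN_SET)

if __name__ == "__main__":
    print(f"golden-set accuracy: {run_eval():.0%}")
```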

    19:41

    The last thing is, you know, you want to hill-climb, essentially. You have your evals, the goal is to get to 99%.

    You start at like, you know, 46.  You know, how do you get there? And here, frankly,   I think oftentimes, like, you know, a mix of like,  like, I will say like almost wisdom from people  

    20:00

    who've done it before. Like you know, a lot of  that is like, you know, like art, sometimes more   than science.

    Like, you know, knowing, like, the quirks of the model, the behavior; sometimes we even need to fine-tune the models ourselves, you know, when there are some clear limitations, and, you know, being patient, working your way, you know, up there and then, you know, ship. Can we go

    20:18

    under the hood a little bit? You know, one of the  things that we think about a lot is autonomy more   broadly, right?

    What is the makeup of autonomy on  one side, you know, in San Francisco, you could   take a car from one part of SF to the other fully  autonomously. No humans involved.

    No, you press a button. Yeah, we love the way it works. They've done billions of rides.

    I think it was like what,  

    20:36

    three and a half billion miles on the Teslas, this is on the Tesla FSD. I think Waymo's almost done, like, tens of millions of rides.

    That's a lot  of autonomy. In the physical world, as opposed to   the digital world, I can't book a ticket online  right now.

    There's all sorts of problems that  

    20:53

    happen if I have my operator try to book a ticket.  And it's very counterintuitive because the bar   for physical safety is so much higher. The bar  for physical safety is higher than the human's   capability because lives are at stake.

    The bar for  digital safety, not that high because all you're  

    21:10

    going to lose is money. Nobody's life is at stake.  But yet, physical autonomy is ahead of digital   autonomy in 2025, which seems counterintuitive.  Like, why is that the case at a technical level?   Why is it that what should sound easier is  actually a lot harder?

    Yeah, so I think there  

    21:30

    are kind of two things at play here. And I really  like the analogy with self-driving cars because   they've actually been one of the best applications  of AI, I think, that I've used recently.

    But I   think there are two things in play. One of them is  honestly just the timelines.

    We've been working on   self-driving cars for so long. I remember back in  2014, it was kind of like the advent of this and  

    21:50

    everyone was like, "Oh, it's happening in five  years." It turns out it took like 10, 15 years   or so for this time. So there's been a long time  for the technology to really mature.

    And I think   there's probably like dark ages back in like 2015  or 2018 or something where it felt like it wasn't   going to happen. A trough of disillusionment.

    Yes,  yes, yeah. And then now we're finally seeing it  

    22:10

    get deployed, which is really exciting. But  it has been like, I don't know, 10 years,   maybe even 20 years from the very beginning of  the research.

    Whereas I think AI agents are like really in day one here. Like, ChatGPT only came out in 2022, so like around three years, like less than three years ago.

    I actually think what we  think about with AI agents and all that really,  

    22:30

    I think, started with the reasoning paradigm, when we released the o1-preview model back in late last year, I think. And so I actually think this whole reasoning paradigm with AI agents and

    22:46

    really. And so I know you had a chart in your blog post, which I really like, where the slope is very meaningfully different now.

    Self-driving started  very, very early. Slope seems to be a little bit   slower, but now it's reaching the promised land.  But man, we started super recently with AI agents,   and the slope I think is incredibly steep, and  we'll probably see a crossover at some point.  

    23:06

    But we really have only had like a year really  to explore these things. Do you think we haven't   crossed over already when you look at the coding  work in particular?

    Yeah, it's a good point. It's   like, your chart actually shows AI agents is below  self-driving, but like, what is the Y axis?

    Some  

    23:22

    measures, like, I would not be surprised actually if AI agent products are making more revenue than Waymo at this point. Like, Waymo was making a lot, but like, just look at all the startups coming up, look at ChatGPT and how many subscriptions are happening there and all of that. And so maybe we have actually crossed, and a couple years from now, it's going to look very,

    23:40

    very different. Yeah, the Y axis is tangible  felt autonomy.

    Not quite objective. How do I feel about it? Exactly, it's vibes more than revenue.

    But revenue is a good one. We   should probably redo that with revenue.

    There's a  second thing I wanted to mention on this as well,  

    23:57

    which is the scaffolding and the environment  in which these things operate in. So I actually   remember in the early days of self-driving, a  lot of the researchers around self-driving were   saying that the roads themselves will have to  change to accommodate self-driving.

    There might   be sensors everywhere so that the self-driving  cars can interact with it, which I think is like,  

    24:14

    retrospect overkill. But I actually do think  self-driving cars have a good amount of   scaffolding in the world for them to operate in.  Like not completely unlimited.

    You have roads,   roads exist, they're pretty standardized. You have  stoplights.

    People generally operate in pretty  

    24:30

    normal ways. And there are all these traffic  laws that you can learn.

    Whereas AI agents are   just kind of dropped in the middle of nowhere,  and they kind of have to feel around for them.   And I actually think going off of what Olivier  just said too, my hunch is some of the enterprise  

    24:46

    deployments that don't actually work out likely  don't have the scaffolding or infrastructure for   these agents to interact with as well. A lot of  the really successful deployments that we've made,   a lot of what our FDEs end up doing with some  of these customers is to create almost like a   platform or some type of scaffolding, connectors,  organizing the data so that the models have  

    25:04

    something that they can interact with in a more standardized way. And so my sense is self-driving cars actually have had this to some degree with roads over the course of their deployment. But

    But   I actually think it's still very early in the  AI agents space. And I would not be surprised   if a lot of these, a lot of enterprises, a  lot of companies just don't really have the  

    25:22

    scaffolding ready. So if you drop an AI agent in  there, it kind of doesn't really know what to do,   and its impact will be limited.

    And so I think  once this scaffolding gets built out across some   of these companies, I think the deployment will  also speed up. But again, to our point earlier, I   think there's no slowdown.

    Things are still moving  very fast. That's great.

    Well, you know, I've  

    25:41

    thought about autonomy as a three-part structure.  You've got perception. You've got the reasoning,   the brain.

    And then you've got the scaffolding,  the last mile of making things work. Maybe we can   dive into the second part, which is the reasoning,  which is the juice that you guys are building with  

    25:58

    GPT-5, most recently. Huge endeavor, congrats.

    The  first time you guys have launched a full system,   not a model or a set of models, but a full  system. Talk about that.

    I mean, the full arc   of that development, what was your focus? I mean,  honestly, the benchmarks all seem so saturated.  

    26:14

    Like clearly it was more than just benchmarks  that you were focused on. And so what is a   North Star?

    Tell us about GPT-5, soup to nuts. It's been a labor of love for many people for a long time. And to your point, I think GPT-5 is amazingly intelligent.

    You look at the benchmark,  

    26:32

    like SWE-bench and the likes, it is going pretty high. But I think to me equally important and impactful was, I would say, the craft, like the style, the tone, the behavior of the model. So capabilities, intelligence, and behavior of the model.

    On the behavior of the model,  

    26:50

    I think it's the first model, like large model  release for which we have worked so closely with   a bunch of customers for like month and month,  essentially, to better understand what are the   concrete blocks, what are the concrete blockers  of the model. And often it's not about having  

    27:08

    a model which is way more intelligent; it's about a model that better follows instructions, a model that is more likely to say no when it doesn't know about something. And so that super close customer feedback loop on GPT-5 was pretty impressive to see.

    And I think all the love that  

    27:27

    GPT-5 has been getting in the past couple of  weeks, I think people are starting to feel that,   essentially, the builders. And once you see it,  it's really hard, essentially, to come back to   a model which is extremely intelligent, but an  exquisite academic, essentially, way.

    Are there  

    27:45

    trade-offs that you made as you were going through  it? Maybe what are the hardest trade-offs you made   as you were building GPT-5?

    I actually think a  very clear trade-off, which I honestly think we   are still iterating on, is the trade-off between  the reasoning tokens and how long it thinks versus   performance. And honestly, this is something  that I think we've been working on with our  

    28:04

    customers since the launch of the reasoning  models, which is these models are so, so smart,   especially if you give it all this thinking time.  I think the feedback I've been seeing around GPT-5   Pro has been pretty crazy, too. It's just like  these unsolved-- Andrej had a great tweet last  

    28:20

    night. Yeah, I saw that Sam retweeted it.

    But  these unsolved problems that none of the other   models could handle, you throw to GPT-5 Pro and  it just one-shots it, it's pretty crazy. But the   trade-off here is you're waiting for 10 minutes.  It's quite a long time.

    And so these things just  

    28:35

    get so smart with more inference time. But on  the product builder on the API side for some of   these business use cases, I think it's pretty  tough to manage that trade-off.

    And for us, it's been difficult to figure out where we want to fall on that spectrum. So we've had to make some trade-offs on how much the model should think versus how intelligent it should get.

    Because as a  

    28:54

    product builder, there's a real latency trade-off that you have to deal with where your user might not be happy waiting 10 minutes for the best answer in the world. They might be more okay with a substandard answer with no wait at all.
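
    On the API side, that thinking-versus-latency trade-off shows up as a per-request knob. A hedged sketch, assuming the reasoning-effort control exposed through the Responses API in the OpenAI Python SDK; exact parameter names and allowed values may differ by model.

```python
from openai import OpenAI

client = OpenAI()

# Low effort: answer fast and spend few reasoning tokens on an easy task.
quick = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "low"},
    input="Summarize this support ticket in two sentences: <ticket text>",
)

# High effort: accept the extra latency so the model can think through a hard problem.
thorough = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},
    input="Find the likely root cause across these three correlated logs: <log text>",
)

print(quick.output_text)
print(thorough.output_text)
```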

    Yeah,  I mean even between GPT-5 and GPT-5 thinking,   I have to toggle it now because sometimes  I'm so impatient I just want it ASAP. I think  

    29:13

    there's an ability to skip, right? Yeah, that's  right.

    And GPT where it's like I'm impatient,   I just want a more simple answer. That's right,  that's right.

    Well, four weeks in, GPT-5, how's   the feedback? Yeah, I think feedback has been  very positive, especially on the platform side,   which has been really great to see.

    I think a lot  of the things that Olivier mentioned have been,  

    29:33

    you know, coming up in feedback from customers.  The model is extremely good at coding, extremely   good at kind of like reasoning through different  tasks. But especially for like coding use cases,   especially at the, you know, when it thinks for  a while, it'll usually solve problems that no  

    29:49

    other models can solve. So I think that's been  a big positive point of feedback.

    The kind of   robustness and the reduction in hallucinations has  been a really big positive feedback. Yeah, yeah,   yeah.

    I think there's an eval that showed that  the hallucinations basically went to zero for   a lot of this. It's not perfect, there's still a  lot of work to be done, but I think because of the  

    30:07

    reasoning in there too, it just makes the model  more likely to say no, less likely to hallucinate   answers. So that's been something that people  have really liked as well.

    Other bit of feedback   has been around instruction following. So it's  really good at instruction following.

    This almost bleeds into like the constructive feedback that we're working on, where it's so good at

    30:24

    instruction following that people need to tweak their prompts, or it's almost, like, too literal. That's why it's an interesting trade-off actually, because, you know, when you ask people, developers, like, what do you want, like, you want the model to follow instructions, of course, you know.

    But once you have a model  which is like, that is like extremely literal  

    30:41

    essentially, that essentially forces you  to express extremely clearly what you want,   otherwise the model may go sideways. And so that  one was interesting feedback.

    It's almost like the monkey's paw, where it's like developers and platform customers ask for better instruction following. They're like, yes, we'll give you really good instruction following, but it's like,

    30:58

    you know, it follows it almost to a T. And  so it's obviously something that the team is   actually working through.

    I think a good example  of this, by the way, is some customers would have   these prompts. I remember when we were testing  GPT-5, one of the negative feedback that we   got was the model was too concise.

    We were like,  what's going on? Why is the model so concise?

    And  

    31:14

    then we realized it was because they were using  their old prompts from other models. And with   the other models, they have to like, you have to  like really beg the model to be concise.

    So there   are like 10 lines of like, be concise, really be  concise. Also keep your answer short.

    And it turns   out when you give that to GPT-5, it's like, oh my  gosh, this person really wants it to be concise.  

    31:32

    And so the response would be like one sentence, which is too terse. And so just by removing the extra prompts around being concise, the model behaved in a much better way and much closer to what they actually wanted. Yeah, turns out writing

    Yeah, turns out writing  the right prompt is still important. Yes, yes,   yeah.

    Prompt engineering is still very, very  important. On constructive feedback for GPT-5,  

    31:52

    there's actually been a good amount as well,  which we're all working through. One of them   that I think is, I'm really excited for the next  snapshot to come out to fix some of this is code   quality and like small like code, like paradigms  or like idioms that they might use.

    I think there  

    32:08

    is, like, feedback around the types of code and the patterns it was using, which I think we're working through as well. And then the other bit of feedback, which I think we've already made good progress on internally, is around the trade-off of the reasoning tokens and thinking and latency versus intelligence.

    I think especially  for the simpler problems, you don't usually need  

    32:27

    a lot of thinking. The thinking should ideally  be a little bit more dynamic.

    And of course,   we're always trying to squeeze as much reasoning  and performance into as little reasoning tokens as   possible. So I'd imagine that kind of going down  as well.

    Yeah. Well, huge congrats.

    I mean, it's   been, I know it's a work in motion for a bunch  of our companies. They've had incredible outcomes  

    32:45

    with GPT-5, one of them's Expo, a cybersecurity business, it's like a huge-- Yeah, I saw the chart from that. It was pretty crazy.

    Huge, huge  upgrade from whatever they were using prior to   that. I think they're going to need a new eval  soon.

    That's right. They're going to need a new   eval.

    It's all about evals. On the multimodality  side of it, obviously you guys announced the real  

    33:04

    time API last week. I saw T-Mobile was  one of the featured customers on there.   Talk about that, like how obviously the text  models are leading the pack, but then we got   audio and we got video.

    Talk about the progress  on the multimodal models. When should we expect  

    33:20

    to have the next big unlock and what would that  look like? It's a good question.

    The teams have   been making amazing progress on multimodality. On  voice, image, video, frankly, the last generation   models have been unlocking quite a few cool use  cases.

    One of the feedback that we've received is  

    33:37

    because text was so much leading the pack on the intelligence, people felt like in voice that the model was somewhat a little less intelligent. Until you actually see it, it does feel weird to have a better answer on text versus voice. That's pretty much the focus that we have at the moment.

    33:55

    I think we filled part of that gap, but not  the full gap for sure. I think catching up,   I would say with the text would be one.

    A second  one, which is absolutely fascinating, is the   model is excellent at the moment on easy casual  conversation, talk to your coach, your therapist.  

    34:18

    We basically had to teach the model to speak, essentially, better in actual work, in economically valuable setups. To give an example, the model has to be able to understand what an SSN is and what it means to spell an SSN.

    If one digit is fuzzy,  it actually has to repeat versus guess. There are  

    34:38

    lots of intuitions like that that humans have around voice that we are currently teaching the model. That's ongoing work, actually, with our customers; until we actually confront the model with actual customer support calls, actual setups, it's really hard to get a feel for those gaps.

    34:56

    That's a top priority as well. This is completely  off script, but an interesting question that comes   up in voice models, particularly the real-time API  is previously people were taking a speech input,   convert that to text, then have some layer of  intelligence.

    Then you would have a text to  

    35:15

    speech model that would play it back. It would be  a stitch of these three parts.

    The real-time API,   you guys have integrated all of that. How does it  happen?

    Because a lot of the logic is written in   text. A lot of the boolean logic or any function  calling is written in text.

    How does it work with  

    35:35

    the real-time API? That's an excellent question.  The reason why we should do real-time API is that   we saw that for the stitch model.

    The stitch  model. Yeah.

    The real-time API. The stitch.

    We   call it a stitch together. Like a speech to text,  thinking, text to speech.

    We saw essentially a  

    35:54

    couple of issues. One, slowness, like you know,  more hops essentially.

    Two, loss of signal,   like a close stitch model. The speech to  text model is less intelligent.

    Yeah, you'd   lead through the emotion. Exactly.

    Exactly. Right.  Pauses.

    Yeah. When you are doing actual voice,  

    36:13

    like phone calls, essentially, those signals  are so important, One of the challenges that   we have is what you mentioned, which is, it means  a slightly different architecture, essentially,   for text versus voice. That's something that we  are actively working on.

    But I think it was the  

    36:31

    right call to start essentially with, let's make  the voice experience like natural sounding to a   point where essentially you're feeling comfortable  putting in production and then working backward to   unify the orchestration logic, essentially,  across modalities. And then to be clear,  

    36:48

    a lot of customers still stitch these together.  It's like what worked in the last generation. But   what we're interested in seeing is more and more  customers moving towards the real-time approach   because of how natural it sounds, how much lower  latency.

    It is especially as we uplevel the   intelligence of the model. But also even taking a  step back, I will say it's pretty mind-blowing to  

    37:05

    me that it works. I think it's mind-blowing that these LLMs work at all, where you just train it on a bunch of text and it's just autoregressively coming up with the next token and it sounds super intelligent.

    That's mind-blowing in and of itself.  But I think it's actually even more mind-blowing   that this speech-to-speech setup actually works  correctly because you're literally taking the  

    37:20

    audio bits from someone speaking, streaming,  putting it into the model, and then it's   generating audio bits back. To me, it's actually  crazy that this works at all, let alone the fact   that it can understand accents and tone and pauses  and things like that, and then also be intelligent  

    37:36

    enough to handle a support call or something  like that. If you've gone from text-in, text-out   to voice-in, voice-out, that's pretty crazy.

    We  have a bunch of companies in our portfolio that   are using these models, Parloa on the customer  support side, LiveKit on the infra side. There's  

    37:52

    a bunch of use cases we were starting to see that  a speech-to-speech model could address. There's   a lot of the harder ones still running on what  you're calling the "stitch model." But I hope   the day is not far when it's all on real-time API.  It's going to happen at some point.

    Right, right,  

    38:09

    right. And actually maybe that's a good segue  into talking about model customization because   I suspect that you have such a wide variety of  enterprise customers.

    I think you mentioned what,   hundreds of customers or maybe more? Each of them  has a different use case, a different problem set,   a different call and envelope of parameters that  they're working in, maybe latency, maybe power,  

    38:28

    maybe others. How do you handle that?

    Talk about what OpenAI offers enterprises who need a customized version of a great model to make it great for them. Yeah, so model customization has actually been something that we've invested very deeply in on the API platform since the very

    38:44

    beginning. So even pre-ChatGPT days, we actually  had a supervised fine-tuning API available and   people were actually using it to great effect.

    The most exciting thing actually I'd say around model customization, it obviously resonates quite well with customers because they want to be able to bring in their own custom data and create their own custom version of o3 or o4-mini or something

    39:05

    or GPT-5 even, suited to their own needs. It's very attractive, but the most recent development, which I think is very exciting, has been the introduction of reinforcement fine-tuning.

    39:20

    and we're continuing to iterate on it. What is  it, break it down for us?

    Yeah, so it's called,   it's actually funny, I think we made up the term  reinforcement fine-tuning. It's like not a real   thing until we announced that.

    It's stuck now. I  see it all the time.

    I remember we were discussing it and I was like, "I don't know about RFT." You're not kidding. You're not kidding.

    Yeah,  

    39:37

    so reinforcement fine-tuning. So it really,  it's introducing reinforcement learning into the   fine-tuning process.

    So the original fine-tuning  API does something called supervised fine-tuning,   we call it SFT. It is not using reinforcement  learning.

    It is, it's using supervised learning.  

    39:57

    And so what that usually means is you need a  bunch of data, a bunch of prompt completion   pairs. You need to really supervise and tell  exactly the model how it should be acting.

    And then when you train it on our fine-tuning API, it moves it closer in that direction. Reinforcement fine-tuning introduces like RL or reinforcement learning to the loop.

    Way more complex,  

    40:14

    way more finicky, but an order of magnitude more powerful. And so that's actually what's really resonated with a lot of our customers.

    It allows  you to, if you use RFT, the discussion is less of   like creating a custom model that's specific to  your own use case. It is, you can actually use  

    40:29

    your own data and actually crank the RL, yeah,  turn the crank on RL to actually create a like   best-in-class model for your own particular use  case. And so that's kind of the main difference   here.

    With RFT, the data set looks a little bit  different. Instead of prompt completion pairs,  

    40:45

    you really need a set of tasks that are very  gradable. You need a grader that is very objective   that you can use here as well.

    And so that's  actually been something that we've invested a   lot in over the last year. And we've actually seen  a number of customers get really good results on   this.

    We've talked about a couple of them across  different verticals. So Rogo, which is a startup  

    41:04

    in the financial services space. They have a  very sophisticated AI team.

    I think they hired some folks from DeepMind to run their AI program. And they've been using RFT to get best-in-class results on parsing through financial documents, answering questions around it, and doing tasks

    41:21

    around that as well. There's another startup  called Accordance that's doing this in the   tax space.

    I think they've been targeting an eval  called TaxBench, which looks at CPA-style tasks as   well. And because they're able to turn it into a  very gradable setup, they're actually able to turn  

    41:38

    the RFT crank and also get, I think, like, SOTA  results on TaxBench just using our RFT product as   well. And so it has kind of shifted the discussion  away from just customizing something for your own   use case to really leveraging your own data to  create a best-in-class, maybe best-in-the-world  

    41:53

    model for something that you care about for your business. Yeah, I feel like the base models are getting so good at instruction following that for behavior steering, you don't need to fine-tune at that point.

    You can describe what you want, and  the model is pretty good at it. But pushing the  

    42:10

    frontier on actual capabilities, my hunch is that RFT will pretty much become the norm. If you are actually pushing, in your field, like, you know, intelligence to a pretty high point, like, at some point, you need to RL, essentially, with custom environments.

    Fascinating. And even going  

    42:29

    back to the point earlier around top-down versus  bottom-up for some of these enterprises, a lot of   the data that you end up needing for RFT require  very intricate knowledge about the exact task that   you're doing and understanding how to grade it.  And so a lot of that actually comes from bottoms   up. Like, I know a lot of these startups will work  with experts in their field to try and get the  

    42:47

    right tasks and get the right feedback to craft  some of these data sets. Without further ado,   we're going to jump into my favorite section,  which is a rapid-fire question.

    We had a lot of   great friends of ours send in some questions  for you guys. We'll start with Altimeter's   favorite game, which is a long, short game.

    Pick  a business, an idea, a startup that you're long,  

    43:08

    and the same short that you would bet against  that there's more hype than there's reality.   Whoever's ready to go first, long, short. My long  is actually not in the AI space, so this is going   to be slightly different.

    Wow. Here we go.

    My  short is, though, in the AI space. So I'm actually  

    43:25

    extremely long esports. And so what I mean by  "esports" is the entire, like, professional   gaming industry that's emerging around  video games.

    Very near and dear to my heart,   I play a lot of video games, and so I watch a lot  of this. So obviously, I'm pretty in the weeds on   this.

    But I actually think there's incredible  untapped potential in esports and incredible  

    43:43

    growth to be had in this area. So concretely, what  I mean, a really big one is League of Legends.   All of the games that Riot Games puts out, they  actually have their own professional leagues.

    They   actually have professional tournaments, believe  it or not. They rent out stadiums, actually,  

    43:58

    now. But I just think it's like, if you look  at kind of what the youth and what younger kids   are looking and where their time is going, it's  predominantly going towards these things.

    They   spend a lot of time on video games. They watch  more esports than like soccer or basketball?

    Yeah,   yeah, yeah, yeah. A growing number of these,  too.

    I've actually been to some of these events,  

    44:16

    and it's very interesting. He's very committed to his long.

    Yeah, yeah. I'm extremely long this stuff. And so they're booking out stadiums for people to go watch electronic sports.

Yeah, yeah, yeah. I literally went to Oracle Arena, the old Warriors stadium, to watch one of these, I think, before

    44:34

    COVID. And then the...

    So it's just... Before  COVID?

    Wow, that's five years ago. Six years   ago.

    So I've been following this for a while,  and I actually think it had a really big moment   in COVID. Like, everyone was playing video games.  Yeah, it was more so...

    I think it's kind of like,   come back down. So I think it's like, undervalued.  You know, it's like, I think no one's really   appreciating it now.

    But it has all the elements  to like, really, really take off. And so the youth  

    44:54

    are doing it. The other thing I'd say is it is  huge in Asia.

    Like absolutely massive in Asia. It   is absolutely big in Korea, in China as well.

Like we rented out Oracle Arena, I think, or like the event I went to was in Oracle Arena. My sense is in Asia, they rent out like the entire stadiums, like

    45:10

the soccer stadiums, and the players are treated like celebrities. So anyways, I know Korean culture is really making its way into the US as well.

I think that's another tailwind for this whole thing. But anyways, esports, I think, is something you should keep an eye on because there's a lot of room for growth.

    Very unexpected.  Good to hear. Short.

    My short, my short's a little  

    45:29

    spicy, which is I'm short on the entire category  of like tooling around AI products. And so this   encapsulates a lot of different things.

    Kind of  cheating because some of these, you know, I think   are starting to play out already. But I think like  two years ago, it was maybe like evals products  

    45:48

    or like frameworks or vector stores. I'm pretty  short those.

    I think nowadays there's a lot of   additional excitement around other tooling around  AI models. So RL environments, I think, are really   big right now as well.

    Unfortunately, I'm very  short on those. I'm not really, I don't really see  

    46:07

a lot of potential there. I do see a lot of potential in reinforcement learning and applying it.

But the startup space around RL environments, I think, is really tough. Main thing is, one, it's just a very competitive space.

There's just a lot of people kind of operating in it. And then two,

    46:22

    if the last two years have shown us anything, the  space is evolving so quickly and it's so difficult   to try and like adapt and understand what the  exact stack is that will really carry through   to the next generation of models. I think that  just makes it very difficult when you're in the   tooling space because, you know, today's really  hot framework or really hot tool might just not  

    46:42

    get used in the next generation of models. So I've  been noticing like the same pattern, which is the   teams that build like breakout startups in AI are  extremely pragmatic.

They are not super, like, you know, intellectual about the perfect world, et cetera. And it's funny because I feel like,

    46:59

    you know, our generation has basically started  in tech in a very like stable moment where,   you know, technology had been building up for  years and years with like SaaS, like cloud,   et cetera. And so we were in a way like raised,  like, you know, in that very stable moment where   it makes sense at that point to, you know, design  like very like, you know, good, like abstractions  

    47:18

and tooling because, you know, you have a sense of where it's going. But it's so different today.

Like, there's no way to know what's going to happen in the next year or two. So it's almost impossible to define, like, the perfect tooling platform.

    Right.  Right. Right.

Well, there's a lot of that going around right now. Yes.

    Spicy. A lot of  homework there.

    Olivier, over to you, sir. Long  

    47:37

    short. I've been thinking a lot about education  for the past month in the context of kids.   I'm pretty short on any education which basically  emphasizes human memorization at that point.  

    47:52

And I say that having mostly been through that education myself. But, you know, I learned so much on, like, history facts, legal things, and some of it does shape your way of thinking. A lot of it, frankly, is just, like, you know, knowledge tokens, essentially.

And those knowledge tokens, you know, it turns out like, you know, the AI models

    48:11

are pretty good at it. So I'm quite short on that. You won't need memory when storage is bionic.

You can just think about it straight into your head. Exactly. Exactly.

What am I long on? Frankly, I think healthcare is probably the industry that will benefit the most from AI in the next

    48:32

    like year or two. I would say more.

I think all the ingredients are here for a perfect storm. A huge amount of, like, structured and unstructured data, you know, it's basically at the heart of, you know, the pharma companies, and the models are excellent at digesting and processing that kind of data.

    A huge  

    48:51

amount of, like, admin-heavy, documents-heavy, you know, culture. But at the same time, like, companies which are very technical, very R&D friendly, you know, companies who have, like, technology in a way at the heart of what they do.

    And so, yeah, I'm pretty bullish on that. This  is like life sciences.

    So you mean life sciences,  

    49:10

    research organizations that are producing drugs.  Gotcha. Exactly.

    It's almost like, you know,   over the last 20, 30 years, these like pharma or  like biotech companies have basically, if you look   at the work that they're doing, like only a small  amount of it is actual research. And so much of it  

    49:28

    ends up being admin and like, you know, documents  and things like that. And that area is just so   ripe for, you know, something to happen with AI.  And I think that's what we're seeing with Amgen   and some of these other customers.

    Exactly. And  it's also like not what they want to do.

    I think   it's good that we have some regulations there,  obviously, but like, just means that they have   like reams and reams of things to kind of go  through. And so, you know, like when you have  

    49:47

a technology that's able to really help, like, bring down the cost of something like that, I think it'll just, you know, tear right through it. And I think governments and, you know, institutions are going to realize that. Like, if you step back, it is probably one of the biggest bottlenecks to, like, human progress, right? You step back over the past decade, like, you know,

    50:04

how many, like, breakthrough drugs have there been? Like, you know, not that many. Like, you know, how different would life be if you doubled that rate, essentially?

    So once you realize what is at stake,   then my hunch is that we're going to see quite  a bit of momentum in that space. Wow.

    All right.  

    50:19

    Lots of homework there as well. Yeah.

    Next one.  Favorite underrated AI tool other than ChatGPT   maybe. I love Granola.

Oh man, you stole mine. You stole my answer. I use Granola so much. Like,

    Like,   um. Two votes for Granola.

    There is something  like, yeah. Hey, what about ChatGPT record?

    I like  

    50:37

ChatGPT as well, but there are some features of Granola which I think are really done well. Like, the whole, like, you know, integration with your Google Calendar is excellent.

    Yeah. Um.

    And just,   you know, the quality of, like, the transcription  and, like, the summary is pretty good. Do you just   have it on?

    Because I know your calendar is back  to back. You just have Granola on.

    So the funny  

    50:55

    thing is that I don't use Granola internally. I  use Granola for my personal life mostly.

    I see.   Yeah. I see.

    On dates. I'm joking.

    I was going  to say, yeah, Granola is actually going to be   mine. So two votes for Granola.

    I was going to  say the easy answer for me is Codex. That as a  

    51:10

software engineer. It's just like, it's gotten so good recently.

    Codex CLI, especially with   GPT-5. Especially for me, I tend to be less time  sensitive about, like, you know, the iteration   loop with coding.

And so leaning into GPT-5 on Codex I think has been really... Interesting.

    51:26

What about Codex has changed? Because, you know, Codex has also been through a journey. Codex has been around for a bit. I remember it launched, like, over a year ago.

    Codex has   been around for a bit. I remember, like, it's  been launched for, like, more than over a year   ago.

    It's like, what's changed about Codex? Yeah,  I was actually going to...

    Codex the CLI has been   around for a bit. I feel like it's  been less than a year for Codex.

    I   feel like it's been less than a year for Codex.  The time dilation is so crazy and this feels...  

    51:43

It feels like it's been around for a year. Oh, like, you know, that demo with GPT-5, like, that feels like ages ago.

    And it didn't even  come out yet. Probably because it hasn't happened   yet.

    The voice demo is... I think it was a naming  thing, okay, but anyway...

    Oh, there was a Codex   model. That's what I'm thinking about.

There was a Codex model. Yeah, there was. You're not to blame

    You're not too blamed  

    52:00

    for that confusion. Also, I think the GitHub thing  was called Codex.

    That's right. Yes, yes.

    I'm   talking about our coding product within ChatGPT,  which are the Codex Cloud offering, and then also   Codex CLI. So, actually, maybe if I were to narrow  my answer a little bit more, it's Codex CLI,   which I've really, really liked.

I like the local environment setup. The thing that's actually made

    52:19

    it really useful in the last, I'd say, like, month  or so is, one, I think the team has done a really   good job of just, like, getting rid of all the  paper cuts, like, the small product polish and,   like, paper cut things. It just...

    It kind  of feels like a joy to use now. It feels more   reactive.

    And then the second thing, honestly,  is GPT-5. I just think GPT-5 really allows the  

    52:38

    product to shine. Yeah.

    It's, you know, at the end  of the day, this is kind of a... This is a product   that really is dependent on the underlying  model.

And when you have to, you know, like, iterate and go back and forth with the model, like, four or five times to get it right, to get it to, like, you know, do the change that you want, versus having it think a little bit longer and

    52:55

it just, like, one-shots and does exactly what you want to do, you get this, like, weird, like, bionic feeling where you're like, "I feel so mind-melded with the model right now and it, like, perfectly understands what I'm doing." So getting, like, that kind of dopamine hit and, like, feedback loop constantly with Codex has made it kind of like an indispensable thing that I really,

    53:11

    really like. Nice.

And the other thing I'd say Codex is just really good at for me is... So I use it for, like, personal projects.

I also use it to, like, help me understand code bases, like, as an engineering manager, now that I'm not as in the weeds on the actual code. And so you're actually

    53:28

able to use Codex to really understand what's happening with the code base, ask it questions and have it answer things, and really catch up to speed on things as well. So, like, even the non-coding use cases are really useful with Codex CLI. Fascinating.

    Sam had this   tweet about Codex usage ripping, I think, like,  yesterday. So I wonder what's going on there,  

    53:49

    but you're not alone. Yeah, I think I'm not  alone.

    Just judging from the Twitter feedback,   I think people are really realizing how great  of a combination Codex CLI and GPT-5 are. Yeah,   I know that team is undergoing a lot of scaling  challenges, but, I mean, the system hasn't gone   down for me, so props to them.

    But we are in a GPU  crunch, so we'll see how long that goes. Awesome,  

    54:09

    awesome. All right, the next one.

    Will there  be more software engineers in 10 years or less?   There's about 40, 50 million... full-time,  professional software engineers.

    That's what   you mean, like, full-time, like, actual jobs?  Yeah, because it's a hard one, because, like,  

    54:25

    I think without a doubt there's going to be a  lot more software engineering going on. Yes, of   course.

    There's actually a really great post that  was shared, I think, in our internal Slack. It was   like a Reddit post recently.

    I actually think  that highlights this. It was a really touching   story.

    It was a Reddit post about someone who has  a brother who's non-verbal. I actually don't know  

    54:41

    if you saw this. It was just posted.

A person on Reddit posted that they have a non-verbal brother who they have to take care of. They tried all these types of things to help the brother interact with the world, use computers, but, like, vision tracking didn't work, because I think his vision wasn't good.

None of the tools worked, and then this brother ended up using

    55:00

ChatGPT. I don't think he used Codex, but he used ChatGPT and basically taught himself how to create a set of tools that were tailor-made to his non-verbal brother.

    Basically, a custom software   application just for them. Because of that, he  now has this custom setup that was written by   his brother and allows him to browse the internet.  I think the video was him watching The Simpsons or  

    55:19

    something like that, which was really touching.  I think that's actually what we'll see a lot more   of. This guy's not a professional software  engineer.

    His title's not software engineer,   but he did a lot of software engineering, probably  pretty good. Good enough, definitely, for his   brother to use.

    The amount of code, the amount of  building that'll happen, I think, is just going  

    55:36

    to go through an incredible transformation.  I'm not sure what that means for software   engineers like myself. Maybe there's equivalent  or maybe there's-- Of course, more Sherwin.

    Yeah,   more of me. More of me specifically.

    We need more  of you. That's right.

    Yeah. But definitely a lot   more software engineering at a lot of companies.  I buy that completely.

I completely buy the thesis

    55:53

    that there is a massive software shortage in  the world. We've been sort of accepting it for   the past 20 years.

    But the goal of software was  never to be that super rigid, super hard to build   artifact. It was to be customized, malleable.  And so I expect that we'll see way more-- sort  

    56:12

of a reconfiguration of people's jobs and skill sets where way more people code. I expect that product managers are going to code more and more, for instance.

You made your PMs code recently, if I heard right. Oh, yeah, we did that.

    It was  really fun. We started essentially not doing PRDs,  

    56:31

    product requirements documents. Classic  PM thing.

You write five pages: my product does this, does that, et cetera.

And PMs have been basically coding prototypes. And one, it's pretty fast with GPT-5 and Codex.

    Yeah, just a couple hours, I  think. Fricking fast.

    And second, it sort of  

    56:48

conveys so much more information than a document. You get a feel, essentially, for the feature. Is it right or not? So yeah, I expect we're going to see that sort of behavior more and more.

    Yeah,   instead of writing English, you can actually  now write the actual thing you want. Yeah,   yeah.

    Yeah, that's amazing. Advice for high school  students who are just starting out their career.  

    57:09

My advice is-- I don't know. Maybe it's evergreen. Prioritize critical thinking above anything else. If you go into a field which requires extremely high critical-thinking skills-- I don't

    57:25

know, math, physics, or maybe philosophy's in that bucket-- you will be fine regardless. If you go into a field that sort of turns down that thing.

    And again, it gets back to memorization,   like pattern matching. I think you will probably  be less future-proof.

    What's a good way to sharpen  

    57:41

critical thinking? Use ChatGPT and have it test you. That's a tricky test. Having a world-class tutor who essentially knows how to put the bar about 20% above what you can do all the time

    That's a tricky test. Having a work-class   tutor who essentially knows how to put the  bar about 20% of what you can do all the time  

    57:59

    is actually probably a really good way to do  it. Nice.

    Anything from you, sir? Mine is--   I think we're actually in such an interesting,  unique time period where the younger-- so maybe  

    58:16

    this is more general advice for not just high  school students, but just the younger generation,   even college students. I think the advice would  be don't underestimate how much of an advantage   you have relative to the rest of the world right  now because of how AI native you might be or how  

    58:31

    in the ways of the tools you are. My hunch is high  schoolers, college students, when they come into   the workplace, they're going to have actually  a huge leg up on how to use AI tools, how to   actually transform the workplace.

    And my push for  some of the younger high school students is one,  

    58:46

    just really immerse yourself in this thing. And  then two, just really take advantage of the fact   that you're in a unique time where no one else  in the workforce really understands these tools   as deeply, probably as you do.

    A good example of  this is actually we had our first intern class   recently at OpenAI, a lot of software interns.  And some of them were just the most incredible  

    59:06

    Cusor power users I've ever seen. They were so  productive.

    I was shocked by the way. I was like,   yeah, I know we can get good interns, but I don't  know, they'd be like this good.

    And I think part   of it is they've grown up using these tools,  for better or worse, in college. But I think  

    59:23

    the meta-level point is they're so AI native.  And even me and Olivier, we're kind of AI native,   we work at OpenAI, but we haven't been steeped in  this and grown up in this. And so the advice here   would just be leverage that.

    Don't be afraid to go  in and spread this knowledge and take advantage of  

    59:41

    it in the workplace, because it is a pretty big  advantage for them. I can't remember who said   this to us at Palantir, but every intern class  was just getting faster, smarter, like laptops,   like smarter every generation.

    You sure didn't  peak in 2013 when I was an intern. That's right.   That's right.

    There's a weird spike. That's summer  2013.

    Two guys That's right. That's right.

    That's  

    00:03

    right. Well, lots happened here.

    A lot's happened  since you guys joined OpenAI, right? With three   years and almost three years.

    In your OpenAI  journey, what has been the rose moment, your   favorite moment, the bud moment where you're most  excited about something, but still opportunity  

    00:19

    ahead, and the thorn, toughest moment of your  three-year journey? The thorn is easy for me.   What we call the blip, which is the coup of the  board.

That was a really tough moment. It's funny, because after the fact, it actually reunited the company quite a bit. There was a feeling,

    There was a feeling,  

    00:36

    OpenAI had a pretty strong culture before, but  there was a feeling of camaraderie, essentially,   that was even stronger, but sure, tough on the  day of. It's very rare to see that anti-fragility.   Most orgs after something like that break  apart, but I feel like OpenAI got stronger.  

    00:52

OpenAI came back. It's a good point. I feel it made OpenAI stronger for real, essentially, when you look at it after the fact. When you look at other news, like departures or whatever bad news, essentially, I feel the company has built a thicker skin and an ability to recover

    I feel it  made OpenAI stronger for real now, essentially,   when they look after the fact. When they look  at other news, like departures, or whatever,   bad news, essentially, I feel the company has  built a thicker skin and an ability to recover  

    01:11

way quicker. I think that's definitely right.

    Part  of it, too, I think is also just the culture. I   also think this is why it was such a low point  for a lot of people.

    So many people just at   OpenAI care so deeply about what we're doing,  which is why they work so hard. You just care   a lot about the work.

It almost feels like your life's work. It's a very audacious mission and

    01:29

    thing that you're doing, which is why I think the  blip was so tough on a lot of people, but also is   what I think helped bring people back together  and how we were able to hold together and get   that thick skin as well. I have a separate worst  moment, which was the big outage that we had in  

    01:44

    December of last year. You remember.

    I do. It was  like a multi-hour outage.

Really highlighted to us how essential, almost like a utility, the API was. So the background is I think we had a three, four-hour outage sometime in November or December last year. Really brutal,

    Really brutal,  

    02:04

pure sev zero. No one could hit ChatGPT. No one could hit the APIs. It was really rough.

    No  one could hit the APIs. It was really rough.

    That   was just really tough just from a customer trust  perspective. I remember we talked to a lot of our   customers to kind of post-mortem them on what  happened and kind of our plan moving forward.  

    02:20

    Thankfully, we haven't had anything close to that  since then. I've been actually really happy with   all the investments we've made in reliability  over the last six months.

    But in that moment,   I think it was really tough. On the happy side,  like on the roses, I think I have two of them.  

    02:40

    The first one would be GPT-5 was really good. The  sprint up to GPT-5, I think really showed the best   of OpenAI.

Having cutting-edge science research, extreme customer focus, extreme infrastructure and

    02:57

    inference talent. The fact that we were able to  ship such a big model and scale it to many, many,   many tokens per minute almost immediately, I think  speaks to it.

    That one I really-- With no outages.  

    03:14

With no outages. Yeah, really good reliability. I can remember when we shipped GPT-4 Turbo, like a year ago, a year and a half ago, we were terrified by the scale of traffic.

    And I felt we've really   gotten much better at shipping those massive  updates. The second rose happy moment for me would  

    03:35

be the first dev day was really fun. It felt like a coming of age for OpenAI. We are embracing

    We are embracing   that we have a huge community of developers.  We are going to ship models and products. And I   remember basically seeing all my favorite people,  OpenAI or not, essentially nerding out on what  

    03:53

    are you building, what's coming up next. It felt  really like a special moment in time.

    That was   actually going to be mine as well. So I'll just  piggy back off of that, which is the very first   dev day, 2023 November.

    I remember it. I mean,  obviously a lot of good things have happened  

    04:10

    since then. There's just a very-- I don't know why  for me.

    It was a very memorable moment, which was   one, it was actually quite a rush up to dev day.  We shipped a lot. So our team was just really,   really sprinting.

    So it was like this high stress  environment kind of going up. To add to that,  

    04:26

    of course, because we're OpenAI, we did a live  demo on Sam's keynote of all the stuff that we   shipped. And I just remember being in the back  of the audience, sitting with the team, and   waiting for the demo to happen.

    Once it finished  happening, we all just let out a huge sigh of   relief. We were like, oh my god, thank you.

    And so  then there's just a lot of buildup to it. For me,  

    04:46

    the most memorable thing was I remember right  after dev day, all the demos worked well, all   the talks worked well. We had the after party, and  then I was just in a Waymo driving home at night   with the music playing.

    It was just such a great  end to the dev day. That was what I remember.   That was my rose for the last few years.

    Love it.  That's awesome. I assume you guys are, but please  

    05:05

    tell me if you're AGI-pilled, yes or no? And if  so, what was the moment that got you there?

    What   was your aha moment? When did you feel the AGI?  I think I'm AGI-pilled?

    I think I'm AGI-pilled.   You're definitely AGIPilled. I am?

    I've had  a couple of them. I've had a couple of them.  

    05:21

    The first one was the realization in 2023 that I  would never need to code manually like ever ever   again. I'm not the best coder, you know, I chose  my job like for a reason.

    But realizing that what   I thought was a given that we humans would have  to write basically machine language forever is  

    05:41

actually not a given. And that the prize is huge.

The second feel-the-AGI moment for me was maybe the progress on voice and multimodality. Like text, like at some point you get used to

    05:57

it. Like okay, the machine can write pretty good text. Voice makes it real. But once you start actually talking to something that actually understands your tone, like understands my accent in French, it felt like sort of a moment, like okay, machines are going beyond like cold,

    Voice makes it real. But once you   start actually talking to something that actually  understands your tone, like understand my accent   like in French, it felt like sort of a moment,  like okay, machines are going beyond like cold,  

    06:17

    mechanical, deterministic, like you know,  like logic to something like much more like   emotional and like, you know, tangible.  Yeah, that's a great one. Yeah, mine are,   I do think I am AGI-pilled.

    I probably gradually  became AGI-pilled over the last couple of years.  

    06:35

I think there are two. And for me, yeah, I think I actually get more shocked by the text models.

    I   know the multimodal ones are really great as well.  For me, I think they actually line up with two   like general breakthroughs. So the first one was  right when I joined the company in September 2022.  

    06:52

It was pre-ChatGPT. Yeah, two months before. Around that time, GPT-4 already existed internally. And I

    About  the time GPT-4 already existed internally. And I   think we were trying to figure out how to deploy.  I think Nick Turley talked about this a lot early   as a chat GPT.

    But it was the first time I talked  to GPT-4. And it was like going from nothing to   GPT-4 was just the most mind-blowing experience  for me.

    I think for the rest of the world,  

    07:12

    maybe going from nothing to GPT-3.5 in chat was  maybe the big one and then going from 3.5 to 4.   But for me, and I think for a lot of maybe some  other people who joined around that time, going   from nothing to, or not nothing, but like what was  publicly available at the time. Going from that to   GPT-4 was just incredible.

    Like I just remember  asking, throwing so many things out. I was like,  

    07:31

there's no way this thing is going to be able to give an intelligible answer. And it just, like, knocks it out of the park. It was absolutely incredible. GPT-4 was insane. I remember GPT-4 came out when I was interviewing with OpenAI. And I was still on the fence: should I join?

    It  was absolutely incredible. GPT-4 was insane.   I remember GPT-4 came out when I was interviewing  with OpenAI.

    And I was still looking at the phone,   should I join? And so that thing, I was like,  okay.

    I mean, there is no way I can walk on  

    07:49

anything else at that point. That's true. Yeah, so GPT-4 was just crazy. And then the other one was, like, the other breakthrough, which is the reasoning paradigm.

    Yeah, so  GPT-4 was just crazy. And then the other one was,   like, is the other breakthrough, which is like the  reasoning paradigm.

    I actually think the purest   representation of that for me was deep research.  And throwing, like, asking it to really look up  

    08:07

things that I didn't think it would be able to know. And seeing it think through all of it, be really persistent with the search, get really detailed with the write-up and all of that. That was pretty crazy. I don't remember the exact query that I threw at it, but I just remember, I feel like the feel-the-AGI moments for me are, like, I'll throw something at the model that I was like,

    That   was pretty crazy. I don't remember the exact  query that I threw it, but I just remember, I   feel like the field AGI moments for me are, like,  I'll throw something at the model that I was like,  

    08:24

there's no way this thing will be able to get. And then it just, like, knocks it out of the park. Like, that is kind of the feel-the-AGI moment. I definitely had that with deep research with some of the things I was asking.

    Like,   that is kind of the field AGI moment. I  definitely had that with deep research   with some of the things I was asking.

    Yeah.  Well, this has been great. Thank you so much,   folks.

    You guys are building the future. You guys  are inspiring us every day.

    And appreciate the  

    08:41

    conversation. Yeah, thank you so much.

    Thank  you. Thanks for having us.

    [MUSIC PLAYING]   As a reminder to everybody, just  our opinions, not investment advice.