Category: AI Technology
Tags: AI, Infrastructure, Innovation, Prompting, Sports
Entities: Aaron Baughman, Anthropic, AWS, Chris Hay, IBM, KPMG, Lauren McHugh, Nano Banana, Nvidia, OpenAI, US Open
00:00
I think this is way more than a toy. This is by far the best image generation model that I've seen to date.
And even if we look at the benchmarks, you know, when we look at, and I'm not a big fan of benchmarks, as you know, but even
00:15
when you look at those benchmarks, it is 200 sort of Elo points ahead of everything else. >> All that and more on today's Mixture of Experts.
[Music] I'm Tim Hwang and welcome to Mixture of
00:31
Experts. Each week, MoE brings together a panel of people pushing the frontiers of technology to discuss, debate, and analyze our way through the wildly fast-paced world of artificial intelligence.
Today, I'm joined by a great crew of veterans and also uh someone joining for the very first time.
00:46
I've got Aaron Baughman, IBM Fellow, Master Inventor. Aaron, welcome back to the show.
Uh, Chris Hay, Distinguished Engineer and longtime veteran. And joining us for the very first time is Lauren McHugh, Program Director, AI Open Innovation.
Lauren, welcome to the show. >> Thank you.
01:02
>> So, we've got a packed episode today. We're going to talk about OpenAI hinting that they might sell infrastructure, Nano Banana, the US Open.
We're even going to talk about a 100-page prompt coming out of KPMG. Uh, but first, as always, we've got our new segment from Aili.
So, Aili, over to you.
01:22
>> Hey everyone, I'm Aili McConnon. I'm a tech news writer with IBM Think.
I'm here now with a few AI headlines you may have missed this busy week. First up, Nvidia, the world's most valuable company by market cap, reported a whopping 56% increase in sales over the
01:40
same period from last year. And this was largely driven by its data center business.
This would seem like good news for the chipmaker, right? In fact, market reaction was mixed because the revenues did not meet analyst expectations.
01:55
Next up, OpenAI and Anthropic, two of the biggest rivals in artificial intelligence, have actually teamed up to better understand the security issues facing models. They recently evaluated each other's models in order to better understand hallucinations and other
02:10
issues, basically hoping to catch what their own tests had missed. Meanwhile, in the category of helpful AI, many 911 centers are so understaffed that they're turning to AI to help them out.
This may
02:26
seem problematic at first blush, but actually these AI agents are helping with parking violations, noise complaints, basically non-urgent issues, so that the human staffers can deal with the real emergencies.
02:41
Last but not least, IBM and NASA are helping give scientists more time to prepare before big storms hit. They recently released a new open-source foundation model called Surya that can predict solar flares that might interrupt satellites, power grids, and
02:58
GPS. Want to dive deeper into any of these topics?
Subscribe to our Think newsletter. The link is in the show notes.
So normally here at MoE we cover some of the biggest stories happening in AI
03:14
technology. You know, the drops of all the largest models coming out of the frontier model companies, the biggest features and products that people are launching.
But I actually want to start today with kind of a funny, smaller story. Um, there was an article written about KPMG, um, the kind of global
03:29
accounting firm, uh, which launched, as many companies and enterprises are doing now, their own AI agent, which they call Taxbot. And what Taxbot is attempting to do is gather together all of the tax advice expertise across a big firm like KPMG and, uh,
03:45
essentially strip through documents and generate sort of 25-page kind of advisory opinions for their customers that basically are, like, the first draft of what they would typically provide for a client. And this would normally be a very normal story.
Lots and lots of companies are are doing this
04:02
now. Um but this is the really funny thing.
They took a lot of, um, flak. I don't know if flak is the right word, but they got a lot of attention online because they sort of revealed that in order to power Taxbot, they had a 100-page prompt running behind this, which I think, just as someone who, you
04:18
know, kind of comes from a world where prompting is like a few sentences. This is like really remarkable.
Um, and so maybe Aaron, I'll start with you: like, what's the longest prompt you've ever written? And is it kind of surprising to see 100-page prompts, like novel-length, novella-length prompts coming out?
04:34
>> Yeah. Well, first I got to say, you know, growing up, and this this might, you know, give you my age a little bit, but I used to use these yellow books called Cliffnotes, you know, where where I could go pick it up from like Barnes & Noble, right, or or even buy it from Amazon and and get cliff notes about a book.
Well, we certainly don't need
04:50
those anymore, right? Because we can use these large prompts, right, to summarize.
But the largest prompt that I've ever written, um, I would say semi-written, right? Because I just copy-pasted a manual, right, into the context.
It was probably about 40 pages, right, that I input into a model and and it
05:08
came out, you know, with key points that were summarized. So, it it was very effective and really interesting how it worked and and and it surprised me >> and I think it's kind of the interesting thing.
So, and I guess maybe Lauren, do you want to jump in on this is, you know, I know a narrative that was very prominent maybe a year and a half ago,
05:24
two years ago, a long time ago in AI time was like prompt engineering is going to be dead over the long run. We're not going to really need prompt engineering.
You're going to just tell the computer what you want. and it will it will do it.
Um, this kind of story almost points us in a different direction, right? It's like almost a world in which in order to get agent
05:40
behavior to be really really good, there's going to have to be like a lot of specification and in some sense like prompt engineering is like becoming a a bigger part of what gets these things to work. Is that the right way of thinking about it?
Like did it turn out that prompt engineering was not actually dead? I think a good way to appreciate
05:57
how complex a prompt would be is to look at some of the open-source projects that are essentially agents. So, GPT Researcher, MetaGPT: you can see how long and complex those prompts are, and that's been, you know, a whole community's worth of contribution of ideas of how to make the agent work better.
Um, I do
06:15
think that, you know, if a product requires a 100-page user manual to work, then, you know, at best it's poorly designed, at worst it's broken. Um, and in this case, the product is the
06:31
model, and the user manual is the prompt. So, one thing you could do is actually fine-tune it.
I think fine-tuning probably is making a comeback, especially with some of the models like, um, Gemma 3 270M, where the actual
06:46
architecture is made to be tuned, you know, more parameters allocated to the embedding versus parameters allocated to the transformer blocks that do the processing.
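For readers who want to see what Lauren's alternative looks like in code, here is a minimal LoRA fine-tuning sketch. It assumes the Hugging Face transformers, peft, and datasets packages; the "google/gemma-3-270m" checkpoint id and the target module names are assumptions for illustration, not anything confirmed in the episode.

```python
# Minimal LoRA fine-tuning sketch, not KPMG's or IBM's setup. Assumes the
# Hugging Face `transformers`, `peft`, and `datasets` packages; the checkpoint
# id and target module names below are assumptions.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "google/gemma-3-270m"  # assumed id for the small Gemma 3 checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Wrap the base model with low-rank adapters so only a small set of weights trains.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # typical attention projection names; verify per model
))

# Tiny illustrative dataset: the domain knowledge you would otherwise stuff into the prompt.
rows = [{"text": "Q: When is the corporate filing deadline?\nA: <advisory-style answer>"}]
ds = Dataset.from_list(rows).map(
    lambda r: tok(r["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-lora", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```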
07:03
>> Part of it, I think, actually goes to Aaron's original example, which is, like, what's prompting, exactly? You know, I think prompting sometimes can just mean the input to the model, in which case it'd be no surprise that you put in a whole manual to try to get it to summarize here. I'm curious if there's a reason why these prompts need to get super long in the tax domain.
Is there something about like
07:18
agents that requires us to have longer prompts, or do you think this is just kind of a weird artifact of how they designed this Taxbot, basically? I think my main question would be: out of those 100 pages, how many of those pages need to be rewritten for a new use case that you
07:35
know that only KPMG knows and that would get to the heart of how much of this is you know because it's an agent and any agent would need to do that and then how much of this is truly a custom solution which is then a lot harder to scale. >> Yeah, there's an interesting dynamic
07:51
there and I guess maybe Chris I'll bring you in on this. I think what Lauren what I hear you sort of saying is that like in some ways you have these really long prompts to make up for all the knowledge that the model doesn't know.
And so I guess Chris maybe there's one point of view is that as these models get deployed in more and more specialized
08:07
domains. It's not going to be atypical to to see really really long prompts emerge, right?
Because in effect there's all this domain knowledge that like a general model might not have. I suppose there's the original idea that the base model would just get smart enough that you wouldn't have to do that.
But I
08:22
mean, if if this is a good example, we may not be headed in that direction. >> I I'm not surprised, though.
I think a 100 pages, you know, I mean, as long as 99 of those pages aren't, you know, do not hallucinate, do not hallucinate, and repeat, then um then
08:37
>> it's actually like The Shining. It's just the same sentence over and over and over again.
>> Exactly. But I But but to Lauren's point, right, if the model doesn't have the knowledge in the first place and you've got a lot of specialist domain, then you're going to have to put that into context.
And and I'm not against it because we've all been using retrieval
08:54
augmented generation for a while, and if you really think about what's going on with RAG, you're doing a search and then essentially you're going to bring the relevant passages into your context anyway. So in some regards, how is that any different?
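A toy sketch of the contrast Chris is drawing between retrieval and simply stuffing the whole document in. The keyword-overlap "retriever" and prompt layout are illustrative stand-ins for a real vector search, not anyone's production pipeline.

```python
# Toy contrast between RAG (retrieve a few relevant chunks) and context stuffing
# (put the whole manual in). The keyword-overlap retriever is a stand-in for a
# real embedding/vector search; everything here is illustrative.

def chunk(document: str, size: int = 400) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by naive word overlap with the question; keep the top k."""
    q_words = set(question.lower().split())
    return sorted(chunks,
                  key=lambda c: len(q_words & set(c.lower().split())),
                  reverse=True)[:k]

def build_prompt(question: str, document: str, use_rag: bool) -> str:
    # RAG: only the (hopefully) relevant chunks go in.
    # Stuffing: the entire document goes in, if the context window allows it.
    context = "\n---\n".join(retrieve(question, chunk(document))) if use_rag else document
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

manual = "Section 12: returns are due on the 15th day of the fourth month. " * 200  # stand-in manual
print(len(build_prompt("When are returns due?", manual, use_rag=True)))   # a few chunks
print(len(build_prompt("When are returns due?", manual, use_rag=False)))  # the whole thing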
Really, what you're saying is actually I can fit everything
09:12
that I need into the context window, and therefore the model's going to stand a better chance. And, you know, I have to admit I would probably rather have it in the context window than, um, you know, sort of rolling the dice with RAG
09:28
and hoping that it gets the right chunk coming back. So it goes either way, but yeah, you are making up for lack of knowledge, or there's certain patterns that you want it to do.
I mean, if you're generating a 25-page document and that 25-page document's got to look in an
09:43
exact way, you're, to Aaron's earlier point, you're building a specification, and the model's not a mind reader. It's got to produce it in the way that you want.
And a good prompt is going to have examples. This, you know, this is section one, this is section two, I want you to do this, do not talk about this, this is your tone.
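To make that concrete, here is a minimal sketch of what one slice of such a sectioned "specification" prompt might look like. Every heading, rule, and example below is invented for illustration; it is not KPMG's prompt.

```python
# Illustrative skeleton of a sectioned specification prompt of the kind Chris
# describes. All headings, rules, and examples are invented for illustration.
ADVISORY_PROMPT = """
ROLE
You are a tax advisory assistant producing a first-draft opinion for a human reviewer.

OUTPUT STRUCTURE
Section 1 - Executive summary (max 200 words).
Section 2 - Relevant rules and regulations, with citations to the provided sources.
Section 3 - Analysis applied to the client's facts.
Section 4 - Open questions for the engagement team.

TONE AND STYLE
Formal, hedged, no definitive legal conclusions. Do not talk about topics
outside the provided sources. Flag any missing information explicitly.

EXAMPLE (abridged)
Section 1 - Executive summary: Based on the facts provided, the transaction
appears to fall under ...

SOURCES
{retrieved_rules}

CLIENT FACTS
{client_facts}
"""

prompt = ADVISORY_PROMPT.format(
    retrieved_rules="<rules pulled from the firm's knowledge base>",
    client_facts="<structured facts for this engagement>",
)
```

Multiply that pattern by every section, tone rule, and worked example a 25-page deliverable needs, and the page count climbs quickly.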
09:58
Very quickly, that's going to get very, very large. So, um, and yeah, you can fine-tune it, but fine-tuning is really hard.
So, you know, again, if you can stuff it in the context, that's fine. What what I would probably say is I
10:14
think this is and I probably challenge their use of agent in this case. Um, I suspect it's just a prompt.
Um, but I think if it was truly truly agentic, I would argue that the agent would be able to go around and round the loop a few
10:29
times, and I don't think you would need a 100-page, uh, you know, prompt in that sense. And you could have the agent pull the elements that it needs and then bring together that structure in a way.
Now, in reality, it probably ends up still about 100 pages, but I think it's
10:45
rather than stuffing into the context window, you're having the agent go search and then bring everything together. So, I I I challenge the word agent in this case.
Um, you know, but yeah. >> Yeah.
Yeah. Yeah.
I wanted to just jump in and make two points if I could uh very quickly um you know, as to why you
11:01
would want to use a 100-page prompt. So to me it seems as though these real-time systems, right, because the data is updated in real time, it's never going to be within, you know, the knowledge base of the foundation model that you have, you know. So if you think of the stock market, you know, and you want to ask
11:16
questions you know about what's happening today or this minute you know then you know you need to get that information within um a prompt and you can get a lot of information very quickly within a prompt um and and then perhaps you even add in like a persona. All right.
So, and then the the second
11:33
use case that I was thinking about is, you know, why you would want to use this kind of a prompt, um, even if the data were in, you know, the foundation model, is if you think of like a flashlight: you know, when you put content within a prompt, you're telling the system to focus in, right, on this type of data rather than hoping and
11:50
rolling the dice, like Chris mentioned, that you're going to get that information back, right, as a result. But I would pair that with, you know, like an aLoRA technique, you know, where you can determine what the attention mechanism needs to focus in on, right?
So, if you pair the aLoRA technique, you know, with a large
12:06
prompt, then I think in turn, you know, you'll be able to really take that flashlight when you're in the dark and light up exactly what you're looking for. >> And Aaron, how do you plan to explain that to an accountant, as opposed to copying and pasting into a prompt?
>> Yeah. So I mean so so in accounting it's
12:21
important, I think, because, you know, lots of these different types of rules and regulations change really quickly, you know, and so, uh, if they're doing, like, tax, right, in this case, um, I think it's important to... Now, I don't exactly know what was in the prompt, right, in this 100-page prompt, but
12:37
I would hope, you know, that it was mostly about rules and regulations, you know, so that they could better understand and advise somebody, right, around, um, what's happening, um, in that tax area. That's what I would think, and sort of help out, you know, a tax
12:53
auditor. >> That's one of the reasons I think prompting is kind of unbeaten, even though it's considered a little bit of a cheap way of doing things by people who are much more in the machine learning world. Like, for an accountant, they're not going to go through some fine-tuning, you know, process.
They would much rather just
13:09
type stuff in and see things happen. And so it's really hard to beat the fact that the feedback loop on prompting is just so satisfying, in a way that is really hard for other methods of, you know, AI alignment to match.
Chris, do you want to jump in there? >> Uh, no.
I agree with you 100%. But I I
13:26
would love to see some accountants sitting there going, why should I be using QLoRA here? Um, you know, how am I going to debias my data set here? You know, or: you're an expert tax advisor, please use Australian language in the response.
Do not elucidate. Here are the tax
13:43
codes. That's >> Yeah.
Yeah. I think what's kind of scary too is the whole idea of sentience, you know, is that whenever you start adding in different personalities, you know, if you're a media company and you want to, you know, uh, create, you know, maybe very entertaining broadcasting, it seems
13:59
like it's a real human, right? Um but that but that's another thread, right?
If we wanted to pull on it: these large contexts and prompts, uh, combined with a lot of these other... like, you know, if you abstract those internals, you know, that Chris was mentioning, away from the user so it's very simple, you know, it
14:16
really is very powerful, you know, like the Comet browser, for example, you know, you can do a lot of that. I'm very excited, right, about what's in store. >> I also think it's important to keep in mind that the feedback loop is very quick on prompt engineering for end users, but in this case I actually
14:32
consider the prompt engineer to be an AI/ML team within KPMG that wrote that as the prompt for others to then just use in a more abstracted way. And so while a user might want prompt engineering for that super satisfying
14:47
quick feedback on nudging a model to do the right thing, you know, the actual team building this agent might have a, you know, longer-term view of: if I could actually make it simpler to do the prompt in the first place, then I could use this not just for tax, but they have
15:04
other lines of businesses as well. So I think there's a difference in the level of patience and how much pain those two different groups would take on.
And I actually think the group that probably actually wrote this 100page prompt that then gets abstracted to the user might
15:20
be pretty interested in the ways that they can make that prompt simpler and reusable over the fact that more prompt engineering is just going to get them quick results. And the other thing that's in my mind is like I remember in the article that it was like from 2024
15:35
and I just worry that, like, needle-in-a-haystack stuff hadn't really been figured out properly in 2024. So, I'm just wondering if, like, it's 100 pages, but actually the model's probably just looking at the beginning and at the end and then
15:50
ignoring everything in the middle anyway, and they're just like tacking stuff on at the end and going, "Please work. Please work." And I don't know.
But I mean, I suspect these days it's going to work a lot better, right? Because the models have been tuned to handle needle-in-the-haystack stuff a lot better.
But but 2024, I think it's
16:06
probably quite impressive. >> Yeah.
I mean, these models can handle up to, like, what is it, like 128,000 tokens, right? I mean, that's big, right, and they're getting bigger and bigger, you know. You can take an entire book and get a summary, you know. That's why I jokingly said at the beginning, you know, I don't
16:22
need these CliffsNotes anymore, because I can get a model to summarize an entire book. >> Yeah, and I think we will see that, basically, as the window gets bigger and bigger, you can just, you know, completely zero-brain-cell put the entire thing in and just see what happens.
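For reference, the "needle in a haystack" checks Chris mentions are simple to sketch: hide one fact at a chosen depth in a long filler context and see whether the model can recall it. The filler text, needle, and `ask_model` callable below are placeholders for whatever long-context model you happen to be testing.

```python
# Toy needle-in-a-haystack probe: bury one fact at a chosen depth in a long
# filler context and check whether the model can recall it. `ask_model` is a
# placeholder for whatever chat/completions API you are testing.
FILLER = "The quarterly report reiterated previously published figures. "
NEEDLE = "The magic number for this test is 4817."

def build_haystack(total_sentences: int = 2000, depth: float = 0.5) -> str:
    sentences = [FILLER] * total_sentences
    sentences.insert(int(total_sentences * depth), NEEDLE)
    return "".join(sentences)

def run_probe(ask_model, depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict[float, bool]:
    results = {}
    for d in depths:
        prompt = build_haystack(depth=d) + "\n\nWhat is the magic number for this test?"
        results[d] = "4817" in ask_model(prompt)
    return results

# Example with a fake "model" that only reads the start and end of the context,
# the failure mode Chris describes for 2024-era models:
fake_model = lambda p: p[:2000] + p[-2000:]
print(run_probe(fake_model))   # needles buried mid-context go missing
```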
16:39
all right I'm going to move us on to our next topic um So, the next story we want to cover um uh today was sort of an interesting uh comment really an off-hand comment from OpenAI's CFO um that got like a lot of play uh online and I think it's a pretty interesting
16:54
one. I kind of want to talk through, especially for our listeners, why it's happening.
So, basically um OpenAI CFO confirms this thing that they were thinking about not immediate but maybe something that that OpenAI might do down the line. And what
17:10
they might do is basically get into the infrastructure game. So, you know, rather than going to a Google Cloud Platform or an AWS, um you would simply like get compute from OpenAI.
Um and uh it's sort of an interesting thing because that's very different from, you
17:26
know, uh what OpenAI's business model has been to date, right? Which is basically selling access to its models.
This would be it selling access to its underlying infrastructure that it's building up. Um, and this is partially inspired actually by Amazon, right?
Where the model uh that that gave rise
17:41
to AWS was, hey, we run all this massive infrastructure for our e-commerce business. Maybe we just rent that underlying infrastructure itself.
So, I guess Lauren, maybe I'll turn it to you. Why would OpenAI want to do something like this?
It kind of feels like in some ways like these computing centers have
17:57
been like the the crown jewel. They don't want anyone to get access to it.
it's how they pull off, you know, the massive pre-training runs that get you a GPT. Um, but it kind of sounds like here they're now saying, well, you know, maybe not immediately, but we wouldn't mind renting that to some people.
It seems like kind of a change of change of
18:14
direction, don't you think? >> So, I could see this actually being like a foreshadowing to there being a market around secondhand GPUs or last season's GPUs.
So we can take for granted that OpenAI has to use the latest GPUs to be
18:30
competitive, like performance and efficiency, for research and for commercial offerings. And the release pace of that has been about every two years.
So four years ago we had A100s. Two years ago we had H100s.
This year we had Blackwell. So
18:47
every two years they have to refresh their whole fleet. Yet the actual lifespan of these GPUs is like five years, could be seven. >> Sitting around for years.
Exactly. Yeah.
Yeah. And they're they won't be good enough, you know, at year three for OpenAI to use in their
19:04
research, but could be perfectly good enough for a customer who, you know, is running large scale inference workloads. So, I could see it as a way to recoup that investment, especially with the CEO saying that they could be making trillions of dollars in investment in
19:21
more infrastructure that, you know, after two years they have to find a way to not use themselves, but you know, figure out if there's customers who want them. >> Yeah, that's right.
And Chris, I guess this is like it's kind of remarkable because yeah, Lauren, I read the article in very much the same way where I was
19:38
like, "Oh yeah, like every time they build one of these big data centers, it's like the biggest data center that has ever data centered." And then kind of what they're saying here is like, but yeah, in 24 months it's going to be kind of obsolete for us and like we need to sell it to other people. Like Chris, I guess the pace of computing progress
19:54
here is like kind of insane, right? Like that basically the cutting edge becomes not fit for purpose for these frontier model companies within the span of a year.
like there's there's like you know this like the time period here is like very very small. >> Yeah, I think that's true and I think it really just comes down to economics as
20:11
as you sort of say there, right? Which is if it's cheaper to run the latest um uh GPU as opposed to an older version and you're going to be able to get your training runs done, then there's going to be a value.
So, you know, you need to stay ahead in that sense. So, I I guess it makes sense to to be able to rent
20:28
that stuff out. But, I mean, I don't know.
Oh, I mean, do I want to rent out Sam Altman's grotty old unused GPUs? No.
I I I want I want I want the shiny things. You know what I mean?
And and I but I think it makes a lot of sense like, you know, as you say, AWS
20:45
does that, right? And they need the GPUs probably even today, they need it the most when they're doing the big training runs, but then inference is just taking over.
And then as we start to look at what's happening in inference, right, the the chips are kind of getting much much smaller. They're specialized
21:01
inference chips now. So you're not even using the Blackwells or H100s, you know, arguably, for inference.
There's a lot of providers, if you think of things like Groq, for example, you know, they're using specialized chips in that sense. So you're not even passing it off over there. So yeah, I mean, what
So yeah, I mean what
21:18
are you gonna do with that? And so even if you've got the latest Blackwell or the whatever the next version after that is going to be then when you're not doing the big training runs then you know you you've got spare capacity that you want to you want to be able to sell off.
And now that becomes great for us
21:34
because if you want to know when they're training the next model, just have a look at the spot market, and if there's no capacity available, you know what's happening. >> Yeah, that's a good tip for the future.
Um, Aaron, I guess the question for you is, like, can OpenAI win in this space? Like, those that are offering these kinds of
21:50
services they're going to be going up against some pretty big players. Um and I guess the kind of question is like do you have confidence that OpenAI can just kind of flip a business like this on?
Well, I guess my first thought, you know, when I, uh, was looking into this is, today it seems as though OpenAI is very
22:06
deeply dependent on Azure, right? And so, you know, for compute and even distribution of their models, but it it seems as though in the long run, OpenAI, they seem to be exploring building their their own infrastructure, which would then almost like rebalance, you know, this this from a um dependency that they
22:24
have to a collaboration, maybe a collaboration, right? and and so that that I think strategically is what potentially could be happening here, right?
Um yeah, and and is it a good thing and and can they do this? Well, um I think so.
Um I would just be careful
22:41
because I know that they've also, you know, released, or said, that they want to have, you know, this consulting, you know, area, you know, where they're going to charge, what is it, like $10 million per client, right, to help them, you know, use their model. So I mean, that's a big focus area that they
22:56
have to work on. They're still building models, right, and now they want to do hardware. That's three separate sort of tenets, right, that they have, you know, and they don't want to fragment themselves too much and get away from the bread and butter.
Um, you know, um, but if they can be successful and put it all together, then I think OpenAI can
23:13
pull it off, and it'd be a nice reorganization of the current business landscape. >> Yeah, I think it's a good point.
It's just, like, you know, every few months it feels like OpenAI is launching a new product line. And maybe that's actually creating a bunch of spread.
Lauren, do you want to jump in? >> I was going to say, I think in terms of
23:28
whether they can win, one question is could they win against other um companies doing this. The other question is could they win against open source.
So vLLM is very popular. TensorRT-LLM is also quite popular, and those are really the core technologies you would need to
23:45
set up your own deployment rather than just you know use a hosted API. And um I think there'd have to be a really strong case around either better performance, better usability, better something than these like very very popular open source
24:01
projects, where a lot of the inference optimizations and other innovations are very quickly showing up in those engines, because it has a whole community's worth of contribution happening in almost real time. >> Yeah, that's really interesting.
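For context on what "just use the open-source stack" looks like in practice, here is a minimal vLLM sketch; the model id and sampling settings are assumptions for illustration, not anything from the episode. vLLM can also expose an OpenAI-compatible HTTP server, which is part of why it competes directly with hosted inference APIs.

```python
# Minimal sketch of self-hosting an open-weight model with vLLM, the kind of
# open-source serving stack Lauren mentions. Model id and sampling settings are
# illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="openai/gpt-oss-20b")  # assumed id for OpenAI's open-weight model
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what an inference engine does."], params)
print(outputs[0].outputs[0].text)
```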
We've
24:16
talked about obviously the pressure that OpenAI has had on like the the model side from open source. You're almost saying that this actually goes a level deeper, right?
Is like can it produce an inference stack and infrastructure business that's competitive with what's happening in open source. Um I hadn't really even thought about that.
It's
24:32
really interesting. >> Yeah.
Yeah. Yeah.
I kind of wonder if this is a hedge, you know, because they just released their open-weight models, right? And so, because they're sort of doing some of that work, if they can build out this specialized infrastructure that's better than anyone else's, um, then perhaps, you know, this is where they think the market is going,
24:48
right? So that they can still remain financially solvent, >> right?
Yeah. Even as the model price comes down, you're, like, trying to capture it on the infrastructure side.
It's really interesting. >> Yeah.
What what I didn't see uh was their financials about how they were going to fund, you know, this like trillion dollar investment to build
25:04
their own uh data center. So, so I'd be interested to see some of that uh when whenever it comes out.
>> I'm going to move us on to our next topic today. Uh it was very funny.
We um had prepared uh this segment all to
25:20
focus on the ins and outs of a very detailed economic study that came out about jobs and AI, and we will cover it on a future episode. But, as happens so frequently in the world of AI, uh, Nano Banana launched, um, and that obviously has taken up a lot more
25:36
airtime in the AI world, and I think it's worth going into. So instead, rather than talking about AI economics and the labor market, we're going to talk about Nano Banana.
Um, Chris, I think you were one of the strongest advocates for switching out topics so we could talk about Nano Banana. Uh, I think the question for
25:51
you is how big of a deal is this? Like it seems in some ways that it's kind of just like a toy, right?
Like you put an image in and swap a person out and all that kind of stuff.
Uh talk to us a little bit about like what's going on beneath the hood and whether or not this is significant from a kind of research and technological capability
26:08
standpoint. >> Okay.
So I think the first thing to say is I think this is way more than a toy. This is by far the best image generation model that I've seen to date.
So and even if we look at the benchmarks we you
26:23
know, when we look at, and I'm not a big fan of benchmarks, as you know, but even when you look at those benchmarks, it is 200 sort of Elo points ahead of everything else, right? So it is absolutely just killing it. And what is super cool about it is, to your point,
26:40
Tim, is the quality from the model is great, the text capabilities of the model are great. So if you typically look at an image model, you know, it'll mess up the text and all that side of things, and it doesn't look great. The quality is
26:56
just amazing. And then to your point, it's like the ability to hold an image and then put that image in different spaces, maintain the physics, etc.
is is absolutely brilliant. So, to your point, you can face swap, you can add a smile, you can you can make a change, you can
27:11
put somebody in a different location. It all works great.
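For listeners who want to try the kind of edit Chris demos next, here is a minimal sketch of the same idea through the Gemini API (the programmatic side of Google AI Studio). It assumes the google-genai and Pillow packages; the model id and file names are assumptions, not details from the episode.

```python
# Sketch of a Nano Banana-style edit through the Gemini API. Assumes the
# `google-genai` and `pillow` packages; the model id and file names below are
# assumptions for illustration.
from io import BytesIO
from PIL import Image
from google import genai

client = genai.Client()  # reads the API key from the environment

source = Image.open("podcast_screenshot.png")           # hypothetical input image
response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",             # assumed id for "Nano Banana"
    contents=[
        "Put the person in this photo in a banana suit on a sunny beach. "
        "Keep the face, pose, and lighting consistent with the original.",
        source,
    ],
)

# The response can interleave text and image parts; save any returned image bytes.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("banana_suit.png")
    elif part.text:
        print(part.text)
```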
In fact, if I want if we can Can I share my screen, Tim? Can I share my screen?
Uh, I believe yeah, the permissions are open if you want to share your screen. >> So, for all of the wonderful users here, whilst I was supposed to be paying attention to the podcast,
27:28
>> this is what Chris is usually doing when he's listening to everybody else talk. >> Yeah, exactly.
This is what I built instead. Wait a minute.
Let's Let's see. So, we'll we'll we'll we'll see here.
Let's get rid of this. So, I said, "Put Tim in a banana suit." There's Tim
27:43
screenshotted from today's podcast. And then there's Tim, right?
And and he didn't look very happy. Oh, sad Tim in a banana suit.
Make Tim happy in a banana suit. And there he's got a nice happy face.
And I said, he's going to be happy
27:58
because he's in banana. He's in my banana beach.
Miami Beach. >> Banana beach.
Very nice. >> Tim.
And then I said, "Oh no, he needs a friend with him and an apple suit. Sorry, Lauren.
I didn't get your permission." There we go. And Tim and Lauren are in Miami Beach happy.
So I
28:15
mean we're joking around. I'll stop sharing my screen now.
We're joking around there, >> but the reality is is that is fantastic getting any of the other models to be able to do it to that quality and it does all the style transferences as you
28:31
imagine. Now, if you start to think of the impact of that, everything from creation of YouTube thumbnails to, you know, image editing, filtering, all the sort of things that you would have typically done with kind of Photoshop-type, um, stuff, then, you know,
28:47
think of things like Canva, for example. You would typically... I mean, I use Canva a lot. What's going to happen there, right? Because you're going to start to be able to just use this, um, you know, out of the box from Google AI Studio. I honestly think it is phenomenal, and I
29:03
I really think there's going to be a lot of people who've invested in image models really sort of starting to panic very quickly. >> And I do want to pick up on that point. Um, you know, Aaron, I think one of the narratives that has kind of played out in a very interesting way over the last year or so has been... I think had you asked me in January 2025, you would have been
29:21
like, who's leading in this AI space? It would have been like, ah, you know, OpenAI, Anthropic, and then Google's kind of at the very end of the list, like, oh man, they just do not have their act together. But announcement by announcement, they really seem to be catching up in a pretty significant way.
And so I guess Aaron, I
29:37
just wanted kind of for you to reflect a little bit on, you know, do you think that this is like in some ways like this is kind of like Google really kind of like fighting for first now in some respects, particularly on the image side. >> Well, I never thought I'd see Lauren in an Apple suit, you know, that that's for sure, right?
And I think that's pretty
29:53
impressive. Um, so I mean, as far as, you know, sort of this jump forward, I really like this multi-turn editing capability, where it can remember and build upon prior instructions that you already gave it, you know, and that's an indicator of, like,
30:10
some kind of extended attention and memory capabilities within the model, you know, which sort of propels it up, right, the, uh, projection of some of the best image generation models. Um, and I think some of the other pieces are, you know, um, it has up to 1 million
30:27
tokens, right, that you can put in. So, because you have to be able to put in, like, a text prompt and then also add in an image, right?
And and so all all those things, you know, and then also seeing Tim in a banana suit, I think that definitely propels it up to the the number one image generation
30:45
system out there. Lauren, I guess we can use the opportunity, I think, to have you on the show because, you know, I think you've already brought it up a couple times as like the kind of like everpresent influence of open source on the space.
Um, and it's certainly for language models and text, it feels like open source is kind of, you know, kind
31:02
of like in the running for state-of-the-art. Where are we, you feel, on kind of like open source image and kind of other forms of media generation?
Is that similar in this space, from your perspective, where open source is really catching up very quickly, or is this a place where, you know, the space is still lagging? >> Yeah, I think on the, um, models front it's
31:19
maybe not as important as on the inference engine and then even like the user interface front because you need all of those pieces to come together and the inference engines are typically more skewed towards text use cases. So I think even if the models are up to par,
31:35
it's not the same as being able to go to that user interface that Chris just showed, which was probably free um or at least a free tier. there's not really the equivalent in open source.
There's always going to be that element of, you know, DIYness that you have to do to
31:50
first find the model and then the model might not be as generalizable. There might be, you know, there are certainly if you look at Hugging Face, I think it's millions of models at this point.
So, you could find models that are good at specific tasks. Um, not so sure about
32:06
as generalizable as what we just saw. Chris, a final question on this is um you know it's the inevitable question, but we've been freaking out all the time for years at this point on how AI generated images are going to destroy our ability to know what's real and
32:21
what's not. Have we finally crossed the threshold with Nano Banana?
This is like pretty good. >> What's real anyway, Tim?
We're all living in a simulation, so it's fine. Now, I I think I actually think the progression here is really good, right?
32:37
So I actually think the fact that we've been seeing terrible image models for a while has been a very good thing, and we've all got pretty good at spotting, like, the hands slightly off, you know. Yeah.
So I think over the last few years we've kind of got used to it and we now
32:54
know not to trust images. Do you know what I mean?
So if so I think if you think of all of the kind of the the flux stuff with Black Forest last year that was perfect example. we saw are politicians the holding hands doing whatever right we've got used to it we
33:11
know not to trust these models uh and the outputs I think the I think the bigger thing in this case is making sure that we hold people accountable for the the models that they can create and
33:26
making sure that the safety elements of those models are high. Because, you know, there's a good side: people like me who couldn't create a thumbnail, great, I'm now going to be able to create decent thumbnails for my YouTube channel. Plug.
Um, but for,
33:43
you know, for others, it means they're going to lose business in in that sense. And then there's a whole lot of scary scenarios there.
So, I think I think there is still a lot of the ethical side that needs to to be worked out. But, um, but the quality is great and and this and it's just going to get better,
33:58
right? And and actually, one of the things I would say is we're seeing this right now for image.
We can guarantee if we project forward 12 to 18 months, you're going to see the same level of quality on video, you're going to see the same level of quality on audio as well. So, this is just going to extend
34:14
out across modalities. >> And I'll add too that I think this being such an editing focused model, editing has kind of become a bad word because it, you know, editing means manipulation means like malicious intent.
But there are really important use cases for
34:30
editing. So like with the geospatial models that we built with NASA, one of the biggest struggles is cloud cover.
You know, most satellite imagery has cloud cover, so you can't do anything with that.
And if you could actually do synthetically
34:46
generated data using an editing model to then improve your data set to train a foundation model, that's an editing use case. And it's, you know, it's not about manipulation or changing the meaning of something from a human perspective.
It's more for a machine learning perspective.
35:04
>> That's right. >> Sure.
And just being able to see Tim smile as well is really important. So, >> as you know, I never smile on these shows.
So, >> all right, last topic of the day. Uh, Aaron, it's always a joke that when we bring you on to the show, we're going to
35:20
talk about sports. And I'm not going to let us break the tradition here on episode 70.
So you've been covering the US Open uh and I think the team's been doing some interesting experiments and so we've been doing a lot of uh screen sharing on this episode. I believe you want to kind of share some of the stuff that you've been doing as well.
>> Yeah. So I mean I mean first you know
35:36
you know, if I could just give a prelude as far as, you know, what we're doing. And so, I mean, we've been with the US Open for over 30 years, right?
And there's about a million fans that, uh, show up, right, and attend, um, you know, the Flushing Meadows site. And then every single day there's
35:52
what, another 14 million fans that tune in through our digital properties. And what we've done, right, is... the hallmark, right, of the US Open is where we want to combine the fan experience with technology, right, so that we can bring people in, expand the swath of what we're
36:09
doing. And we've introduced, uh, three new, uh, features this year. So one of them is Match Chat. We spent, um, several months, you know, building this very impressive, uh, system, and, uh, we'll have, you know, a few papers, you know, out that describe, you know, the science behind it. Um, but what this is, it's a
36:26
real-time, sort of agent-driven, um, assistant, so that you can go in and ask a question about a match, about players, um, in real time, right, at large scale, um, and get a response back, right. Um, and then the second piece is called Key Points, right? So we always say too long
36:42
didn't read, TL;DR, right? You see that a lot, but there's these very long articles that people just don't have the time to read, and so we summarize it, right, and then we show those bullet points, right, on top of these articles, and we have a workflow in which, you know, we work with USTA editors. And then the third one is
36:58
called Live Likelihood to Win, right? This has a very long historical background.
Uh, but, you know, we combine predictive modeling. Uh, so we have an ensemble of different predictive models, uh, that then go into, um, who's going to win, right, the match.
So we have a pre-match prediction and then as
37:15
the match goes on, we have some proprietary equations that we developed that then fine-tune and change, right, the odds that somebody's going to win given the momentum. But ultimately what we want to do is increase the breadth and depth, right, of fans and give them the information that they need so that they can understand the story, um, of
37:32
a match. And uh what I was hoping to do was continue this trend of experimentation of screen share right and just show you uh some of the work here that we've uh done.
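As a rough illustration of the shape of that kind of system (and explicitly not IBM's proprietary equations), here is a toy sketch that starts from a pre-match probability and nudges it as sets and breaks play out; the event weights are invented for illustration.

```python
# Toy illustration of a "live likelihood to win": start from a pre-match
# probability (e.g. from an ensemble model) and nudge it as the match unfolds.
# The update weights below are invented; the real US Open system uses IBM's
# own proprietary equations and momentum features.
import math

def to_logit(p: float) -> float:
    return math.log(p / (1 - p))

def to_prob(logit: float) -> float:
    return 1 / (1 + math.exp(-logit))

def live_likelihood(pre_match_p: float, events: list[tuple[str, int]]) -> list[float]:
    """events: (event_type, +1 if it favors player A, -1 if it favors player B)."""
    weights = {"set_won": 0.9, "break": 0.35, "break_point_saved": 0.1}  # invented
    logit = to_logit(pre_match_p)
    trajectory = [pre_match_p]
    for event, sign in events:
        logit += sign * weights.get(event, 0.0)
        trajectory.append(to_prob(logit))
    return trajectory

# A heavy pre-match favorite (say 0.80) drops the first set, then breaks and
# takes the later sets, roughly like the match Aaron walks through below.
print([round(p, 2) for p in live_likelihood(
    0.80, [("set_won", -1), ("break", +1), ("set_won", +1), ("set_won", +1)]
)])
```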
And it's it's live right now and play starts uh pretty soon right um so it's 10:47 right now.
37:48
So it starts at around 11:00, right? And uh we can go ahead and uh see some of the action.
So just just to orient you, you know, this is our work that we put together, right? That shows the this is a website experience, right?
And and then obviously we have a very nice uh mobile app, right, that's a twin, right,
38:04
of this. And I want to just just quickly show you um you know, when a user comes in, what's one of the first things they want to know?
Well, they want to know the scores, right, um of a match. And so I I would like to highlight two matches.
Um, one of them, right, was a big upset.
38:19
So, uh, her name is Eala, right? And she's a 20-year-old, you know, from the Philippines, and she beat, right, um, uh, Tauson, right?
Um, so that's one, right? And then another is the Alcaraz match that I also want to
38:35
show you. And let's check out the Alcaraz match here.
Um, so what you do is is you because this is already finished, right? But imagine, you know, play is going on and there's a match that's going on which which you can check later in the day.
But let's check out the match recap. So we have, you know, the IBM
38:50
SlamTracker, and it pops up, and you can quickly see on the sidecar, um, that we have... um, the first tile would be the score here, right? And then when you go down, we have this 360-degree storytelling, right, of the match.
And if you want to
39:06
know beforehand, right, if the match hadn't started, what's the likelihood that Alcaraz is going to win? Well, it's pretty high, right, in this case, right?
This is round u a very early round. So, this is what round two, right?
And Alcaraz is off to a strong start. But this is what we've assigned,
39:22
you know, that Alcaraz has an 82% chance of winning. And again, this uses pure predictive modeling that we've experimented with over years and years.
Now, because the match is over, we can go to the summary tab, right? And you can see the live likelihood to win.
39:39
you know, how it's changed, um, over time, right? And there weren't very many fluctuations in this one, because Alcaraz, you know, had a very big, um, you know, advantage whenever he, uh, came in, right?
Uh, but now, if I want to know some details, so this is Match Chat. This is
39:57
one of the industry's first again I'm going to say, you know, large scale, right? Real time uh system, right?
And so let's just click it and it opens up and we have a frictionless uh user experience that we've designed you know you know so that we can um sort of guide
40:13
the user and help them get the information that matters the most. And we did a lot of user studies, we did a lot of data analytics to figure out, you know, what do people care about. Um, so let's check it out. I find match stats very interesting, right? So let's just ask a question. Let's say, how many aces
40:31
uh, did, and let's put a player that isn't even in this match, did Sinner have. And so let's do this first. So it's thinking, right, it's going through and it's hitting, um, the pieces, right. And so what it first says is, wait a minute, do you want to know set by set or do you want to
40:47
know about the match and and let's hit no right so because I want to know about the match here and so now it's thinking again analyzing and this is going out right real time right now it's it's hitting our middleware going out into um AWS and what it does is it then in turn
41:04
comes back and tells us how many you know serves did have right? Um and it and and it and it worked well because it was able to switch right center right into the right players um that it that that it has.
So so it automatically does a lot of the detection. So we have a lot
41:20
of hat pipelines and then it does pronoun corrections, it does player corrections and so on, right? But um you you can play with this more uh as you go through, right?
Um and and see what all we've uh built. But it's it's very interesting and there's a lot of deep
41:36
statistics, right? that uh that have come in.
And so, you know, if we were to were to keep going, then you know, you can see lots of stats that that people really want to know about. But in the interest of time, let's just go back uh and and close this, right?
And you know,
41:52
why don't you you pick a match here on the screen rather than me me picking a match, Tim? >> Let's do the Harris versus Fritz there on the bottom right.
>> On the bottom right. So, okay, this one here.
All right. Mhm.
>> So, you know, here the pre-match likelihood to win, let's check
42:08
that out. I mean, Fritz was overwhelmingly the favorite, right?
And so, because of that, whenever you go and look at the live likelihood to win, um if if we trace it with the actual match, right, you can see that uh Fritz lost the first set, right? So, his odds of
42:24
winning it, it goes down, but not that much because he's still favored so heavily. And then the story uh telling keeps going on where it's it's a very close one.
It gives him the break points, right? And the in the second set, right?
And because Fritz wins, well, you know, I think he's
42:40
regaining the momentum, right? And then the match continues, right, where finally, in set four, you know, he eventually, uh, takes that over.
So this live likelihood to win is really powerful during the match itself because you can track and trace how that works,
42:55
you know. So that, in essence, is what I really wanted to show you, some of the exciting work, right, that's live right now.
Um, and then a plug, you know, for our ESPN Fantasy Football work: uh, we went live with a few other, uh, pieces yesterday. Um, and then next week on Wednesday, we're going to have
43:10
another piece that's live. But if you're part of a fantasy football team, uh, go and check out our player insights, um, and and factors that we have and grades and so on.
>> That's great, Aaron. Awesome.
Well, we'll keep you posted. Uh, and, uh, for all you listeners, we'll keep you posted.
And I guess Aaron, as this uh
43:26
continues to develop out, we'll have you back. I think uh it's fun having you on the show regularly because it feels like we get to see the iteration every time you come back on.
Um and so it's it's cool seeing that happen. >> Cool.
Yeah. >> Awesome.
Well, that's all the time that we have for today. Uh so thanks for uh joining us.
Uh Aaron, Lauren, Chris, was
43:42
a pleasure as always to have you on the show. And uh thanks to all you listeners.
If you enjoyed what you heard, you can get us on Apple Podcasts, Spotify, and podcast platforms everywhere. And we will see you next week on Mixture of Experts.
All right.