Category: AI Developments
Tags: AI, Infrastructure, Memory, Models, Robotics
Entities: agents.md, Amazon Bedrock, Atlas, Boston Dynamics, DeepSeek, Figure Robotics, Gemini, Google, GPT-5, GPT-6, Meta, Nvidia, Opal, OpenAI, Perplexity, Qwen Image Edit, Sam Altman, Sébastien Bubeck
00:00
The hype train never stops. Sam Altman is already talking about GPT-6 not more than two weeks after GPT-5 came out.
CNBC reports that for GPT-6, people want memory. So Sam
00:15
Altman met with a bunch of reporters last week in San Francisco at a private dinner and told them some of what's coming with GPT-6. Listen to this.
Altman didn't give a release date, but people want memory. People want product features that require us to be able to understand
00:31
them. And likely what he's talking about is an even deeper and more meaningful memory for these models.
And I keep saying this: model memory is an incredible moat. The more the model gets to know you, the better it will be.
It's going to learn a
00:47
shorthand with you. It's going to learn your preferences.
And it's also much more efficient because it's almost like shortcutting the solution in a lot of different use cases. If it knows the types of things you like, if it knows how you like to work, it can start down
01:02
that path rather than you having to prompt it multiple times trying to get it to go there. I think our product should have a fairly center-of-the-road, middle stance, and then you should be able to push it pretty far.
If you're like, I want you to be super woke, it should be super woke and obviously the other way as well. And so there are two
01:17
sides to that argument. If the model is just going to reflect exactly what you ask from it, we're going to have the same issues we had as part of the social media revolution where everything just becomes an echo chamber because of the algorithm.
The algorithm was tuned to
01:33
maximize engagement, and unfortunately fear and anger tend to maximize engagement. So of course that's a lot of what you saw, and when you started interacting with certain posts, you just saw more of those posts, hence the echo chamber sentiment.
Now, on the flip side, the AI should be what I want it to
01:50
be. It is a reflection of something I want to work with day in and day out.
This is a really difficult problem that these frontier model labs are going to have to figure out along the way. This is akin to the sycophancy issue, where the models were agreeing with everything users said, no matter how ridiculous it
02:06
was. And so, it'll be interesting to see how this plays out.
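As a toy illustration of the memory-as-moat idea above, here is a minimal sketch of storing user preferences and prepending them to every prompt. This is purely hypothetical and not how OpenAI implements memory; every name here is illustrative.

```python
# Hypothetical sketch: persistent user preferences prepended to each prompt.
# Nothing here reflects OpenAI's actual implementation.

memory: dict[str, str] = {}

def remember(key: str, value: str) -> None:
    """Store one user preference, e.g. tone or formatting habits."""
    memory[key] = value

def build_prompt(user_message: str) -> str:
    """Prefix stored preferences so the model can shortcut to what the user wants."""
    prefs = "; ".join(f"{k}: {v}" for k, v in memory.items())
    header = f"Known user preferences: {prefs}" if prefs else "No stored preferences yet."
    return f"{header}\n\nUser: {user_message}"
```

With memory populated, for example `remember("tone", "concise")`, every call to `build_prompt` starts the model down the user's preferred path instead of requiring repeated prompting.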
All right, next. DeepSeek V3.1 is here, an open-weights model.
You can download it right now from DeepSeek. This is not their R series of models.
R2 is rumored
02:22
to have been delayed because China is telling DeepSeek to use Chinese chips rather than Nvidia chips. But now we do have DeepSeek V3.1.
And if you want me to do a thorough test of it, let me know in the comments below. You can download the model on Hugging Face.
There is not
02:37
much information other than the files right here. It is a very big model.
So if you don't have a lot of VRAM on your computer, you probably won't be able to run it. Wait for the quantized versions.
Hopefully you can run those. And another open weights model out of China.
We have
02:53
Qwen Image Edit, which is exactly what it sounds like. It is an image editing model, and it is really good.
So, key features: accurate text editing with bilingual support; high-level semantic editing, so object rotation, IP creation; low-level appearance editing,
03:08
addition, deletion, insertion. And you can try it now at Qwen AI. You can download it on Hugging Face.
You can find the process to build it on GitHub. So here are some examples.
Here's the Qwen mascot. And then a bunch of different, very consistent versions of
03:24
the mascot doing different things. Here's another one.
Here's image rotation. Here's the input image of this guy.
Rotate it to the front view. There it is.
I think that looks fantastic. Another one.
Image input from the side. Obtain the front view.
Here's somebody from the back. Obtain the front view.
03:39
Obviously, the model's going to have to guess what this person looks like. And for the BMW, we have an input image.
Rotate it to the front. That looks flawless.
Here's some more. A baby, a dog, a crow, and a lion.
And we can also do avatar creation. So, here is the
03:54
input image. And then the prompt: replace the t-shirt with a black t-shirt with the text Qwen on it.
Transform to Ghibli style. Here's another one.
3D cartoon style and chibi style. Finally, here's the input image.
A bunch of penguins on the beach
04:10
and the sign, welcome to Penguin Beach added. And if you look, all the penguins are nearly identical.
In fact, I can't tell a difference. And this model is just so good at isolating parts of the image that need to be changed and leaving everything else the way it is.
04:27
All right, here's another one. A minor image edit, but so meaningful.
Here's the input image with this gross strand of hair here. And then remove the strand of hair, and it's completely gone.
Really good. Everything else, look at all the words, the prices, everything is
04:43
exactly the same, even the 25, which is slightly right-aligned relative to the rest of the prices.
Same on the edited image. So here's another one.
Not only can it edit the image, but it actually has an understanding of the image. So here's the alphabet A through Z, and
04:59
it's changed the color of the letter N to blue. So just one of the letters, change it to blue, and there it is.
Here we can do background swap. So, put this woman on the beach, put her in a classroom. Virtual try-on: same woman across all the images, different
05:15
outfits, text editing. Very, very impressive.
I cannot wait to try this out. And by the way, if you want to try out open source models and even if you want to try out some Frontier models and not only that, but actually get the most out of them, check out the sponsor of
05:30
today's video, AWS. Amazon Bedrock has everything you need if you are building generative AI applications.
And let me tell you about the four critical features you need from Amazon Bedrock. First, prompt optimization.
Prompt management makes it easy to create,
05:47
evaluate, version, and run all of your prompts to the right models. And with prompt optimization, you can have your prompts automatically rewritten to help improve performance and make them concise.
Then from there, Amazon Bedrock also offers intelligent prompt routing.
06:03
So you can take your prompt and route it to the best model for the job depending on cost, efficiency, latency, whatever it is; you can tell intelligent prompt routing what to do, and it will route it automatically. The next big feature is prompt caching.
If you're using long
06:20
repeated prompts, you can use prompt caching which will save you processing time and lower latency. And then last, model distillation.
It's a technique for taking a more expensive teacher model and having that teacher model create a smaller performant version that it
06:36
teaches by transferring its knowledge to that smaller model. So check out all of these features.
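Since the video doesn't show Bedrock's distillation internals, here is a generic, minimal sketch of the underlying idea: the student model is trained against the teacher's temperature-softened output distribution. All names and numbers are illustrative, not Amazon's API.

```python
import math

def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits: list[float],
                      teacher_logits: list[float],
                      temperature: float = 2.0) -> float:
    """Cross-entropy between the teacher's soft targets and the student's distribution."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```

A student whose logits match the teacher's gets a lower loss than one that disagrees, and minimizing that loss is what "transferring its knowledge" means in practice.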
I'm going to link to them down below. Amazon has been a fantastic partner.
So please check them out. Click the link so they know I sent you.
And now back to the video. All right.
And next, the major agentic coding
06:54
platforms got together and did something great for the community. They put together a standard: agents.md.
Now, if you do any type of vibe coding or agent-driven development and you use things like Claude Code, Cursor, Windsurf, Factory, any of them, you will
07:10
likely have multiple different files like this for each of them. And basically what these files do is tell the model how you like to write code, what rules to follow, some guidelines, best practices, everything like that, all in one place.
But there was no
07:27
standard. So if you use multiple tools, you would have multiple files even in the same codebase.
But now agents.md standardizes all of it. So, as it's described, think of agents.md as a README for agents: a dedicated, predictable place to provide the context
07:43
and instructions to help AI coding agents work on your project. It is open source.
You can download it and use it right now. And it is supported already by Codex from OpenAI, Amp, Jules from Google, Factory, Cursor, Roo Code, and
07:59
others. So, this is a huge upgrade for anybody doing vibe coding across multiple different tools.
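For a sense of what such a file looks like, here is a short illustrative AGENTS.md. The section names and rules are my own example, not an official template from the standard:

```markdown
# AGENTS.md

## Code style
- TypeScript strict mode; avoid `any`.
- Prefer small, pure functions.

## Testing
- Run the test suite before committing.
- Every bug fix needs a regression test.

## Project layout
- `src/` holds application code, `tests/` holds unit tests.
```

Because it is plain markdown in a predictable location, any agent that understands the convention can read the same rules, which is the whole point of the standard.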
All right, this next one's interesting because I actually missed it. Apparently, Google announced this product called Opal about a month ago, and it allows you to create these almost disposable mini apps with
08:17
AI. And so you're probably wondering, "All right, Matt, it came out a month ago.
Why are you talking about it?" Well, the news is that they've now released it in beta, so you can try it out right now. So this is Opal.
You can create apps with a simple prompt. It puts together a node-based workflow and then you can
08:34
share it with other people. So here's a learning with YouTube example.
I'll click it. You can see it's node-based.
All the nodes tie together. And all of this was created with just a simple prompt.
So, collect URL. Enter the YouTube video
08:49
URL. Then here's the prompt.
You are a skilled transcriber capable of understanding and extracting information from multiple sources. You can give it tools.
You can give it variables. Here's the collect URL right here.
Then analyze educational content, generate quiz, and
09:04
then display it. And so, if we run it, here it is.
So, we click start. It's going to say enter a YouTube video URL.
So, I'll grab my previous video. I'll input it.
Hit enter. You can see it's processing.
So extract video transcript. Then it's going to analyze the educational content, generate the quiz,
09:21
and then display the report. And so once you're done, you can share the app.
You can also remix others. So if you find one you like, you can remix it any way you like.
And then you can also just manually create it. So, just like any other node-based framework, you get the inputs, you get the outputs, and then
09:36
everything in between. So here we can say do another user input, then do something else, connect it to the end, and so on.
It's free to try right now. So give it a try.
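The node graph described above is essentially function composition. Here is a minimal, hypothetical sketch with stub nodes; Opal's internals aren't shown in the video, so every function name and return value here is illustrative.

```python
# Hypothetical sketch of Opal's node-based workflow: each node is a function,
# and the graph's edges are plain function composition. All names are illustrative.

def collect_url(user_input: str) -> str:
    """Node 1: accept and clean the YouTube URL the user typed."""
    return user_input.strip()

def extract_transcript(url: str) -> str:
    """Node 2: stub standing in for a real transcription step."""
    return f"(transcript of {url})"

def generate_quiz(transcript: str) -> list[str]:
    """Node 3: stub standing in for the LLM prompt that writes quiz questions."""
    return [f"Question 1 about {transcript}", f"Question 2 about {transcript}"]

def run_pipeline(user_input: str) -> list[str]:
    """Wire the nodes together the way the visual graph does."""
    return generate_quiz(extract_transcript(collect_url(user_input)))
```

Sharing or remixing an app then amounts to copying this graph and swapping out individual nodes.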
opal.withgoogle.com. And more by Google.
Just today they had their made by Google event and it was
09:53
announced that Gemini is now going to be powering all of their voice assistants at home. We're introducing an all-new voice assistant that will provide the next generation of help at home:
Gemini for Home, enabling hands-free help for the entire household. So that's really cool. I
I
10:09
started using a Pixel phone after I saw all of the cool announcements at Google I/O, and I can't wait to start getting some of these more advanced AI features on my phone. Apple, where are you?
All right, next for Perplexity. Aravind, the CEO, says, "We're working on something
10:26
called super memory for all Perplexity users." As I've said before, memory is the moat. When you develop a really great model and then have it understand your users, learn about your users, it becomes so much more personal and so
10:42
much more powerful. So all of the AI app companies, all of the frontier model labs, they should all be, if they're not already, investing heavily into developing the best possible AI memory.
He continues, "It's in the final stages of testing. Early test suggests it's working much better than anything else
10:58
out there." So here's an example. Somebody just said, "Hey," and got back: Hey there.
Good evening. I see you're at NITH.
How can I help you today? Whether you have some questions about electrical engineering, need assistance with coursework, or want to discuss anything else, I'm here to help.
So, it has
11:13
learned a lot about this person who just said, "Hey." Next, Sébastien Bubeck from OpenAI put out a claim, with proof, that GPT-5 is actually solving new math. Look at this.
So, claim: GPT-5 Pro can prove
11:28
new interesting mathematics. Proof.
I took a convex optimization paper with a clean open problem in it and asked GPT-5 Pro to work on it. It proved a better bound than what is in the paper, and I checked the proof; it's correct.
Details
11:44
below. So here we go.
We have the paper. Can you improve the condition on the step size in theorem one?
I don't want to add any more hypotheses. I just want you to work to improve the step size condition under the same assumptions as theorem one.
17 minutes later. Yes,
12:01
under the same assumptions and then it goes on to propose a new solution. Just crazy where we're at with artificial intelligence.
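For background on what a "step size condition" means here, this is standard gradient descent theory, not the paper's actual theorem: for an L-smooth function, gradient descent with step size at most 1/L is guaranteed to decrease the objective, while too large a step diverges. A quick numeric illustration on f(x) = x², which has L = 2:

```python
def gradient_descent(grad, x0: float, eta: float, steps: int = 50) -> float:
    """Run plain gradient descent and return the final iterate."""
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)
    return x

# f(x) = x^2 has gradient 2x and smoothness constant L = 2.
grad = lambda x: 2 * x

safe = gradient_descent(grad, x0=1.0, eta=0.4)    # eta <= 1/L = 0.5: converges to 0
unsafe = gradient_descent(grad, x0=1.0, eta=1.1)  # eta > 2/L = 1.0: diverges
```

Improving a step size condition means proving convergence under a weaker constraint on eta, which is the kind of bound Bubeck says GPT-5 Pro tightened.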
The core intelligence of these models is plenty. I've said this before, I'm going to say it again.
Even if we had no more progress on the core
12:16
intelligence of the models, the scaffolding around them being built out will provide the world with such incredible value. And there are so many use cases that are untapped right now strictly because of the missing scaffolding memory being a part of that
12:31
scaffolding. So Sebastian goes on to explain how it worked, what it proposed, and why it's correct.
This is math that is way beyond me, so I'm not even going to try to explain it to you. If you want to check it out, of course, I'll drop the link down in the description below.
12:47
And next, Boston Dynamics put out a new demo video of their robot Atlas. This is their next generation of robot.
Look how smooth it is. And this is all fully autonomous at 1x speed as it says right there.
Look at it opening the box. Of
13:03
course, somebody has to mess with it. So, it tries to open it again.
Oh, has to open it again. Starts to take things out.
Places it in the box next to it. In the bin, I should say.
13:18
And the person with the hockey stick moves the box. Let's see.
It grabs the box, moves it closer to itself, and then continues its task. Now, what you can see here is the virtual environment that the robot is also able to visualize and run in and
13:35
probably train on. So, you can see the robot arm is kind of visualizing where it's going to be going.
That is so cool. And then next, we have parts of the Spot robot, the dog-like robot.
It's able to pick it up gently. Grab it multiple ways.
Grab it with two arms. Fold it.
13:53
And let's see if it's able to put it away. And there it is.
So, still has a little work to go. Needs to be a little bit faster.
Needs to be a little bit more fluid. But this is incredible.
The progress in humanoid robots has been absolutely insane to watch over the last
14:08
couple years. And so, why is this so impressive?
Well, The Humanoid Hub on Twitter breaks it down. Their approach focuses on long-horizon, language-conditioned manipulation.
So you are saying the task that you want the robot to do and then it goes and does it. And
14:25
it's not a simple task. These are, as it says, long-horizon tasks:
language-conditioned manipulation and locomotion by mapping sensor inputs and language prompts into whole-body control at high frequency. A mouthful, but basically you tell it something and it does it.
So the
14:41
process to get it to do that, the actual training, looks like this: teleoperated data collection, meaning somebody is sitting behind controls and getting it to do the thing manually, and then it learns from that.
Then curation into pipelines, large-scale model training, and
14:56
rigorous evaluation to guide improvements. So if you want to learn more about it, more of the details, I'll drop the link down below.
They put out an entire blog post about how they were able to get Atlas to do this. And that's not it on the humanoid robots front.
We also have another demo video from Figure
15:13
Robotics. Check this out.
We have the Figure robot walking outside through some bushes. This is probably near Figure's office.
Comes across a bunch of obstacles, gets its foot stuck, corrects itself, walks. Not very fluid motion,
15:28
but still very impressive that it's able to navigate this kind of rough, difficult terrain. And yeah, just fine.
Broke something right there. So this is the Figure 02 robot, and again, reinforcement learning.
15:44
This is all where we're headed: end-to-end neural nets controlling these robots.
And cursor has a new stealth model that you can use and try out right now. It's called Sonic.
Some people are saying that it's Grok Code, because Grok Code is supposed to be dropping any day now, but I don't know. But this
16:01
was posted from Cursor's team. So the model is definitely available and available right now.
Go try it out. And next, Bloomberg is reporting that OpenAI might actually get into the infrastructure game.
So, very similar to Google Cloud, Azure, AWS, it might start
16:17
selling its own infrastructure back to developers and other companies that need it. Now, it doesn't seem like anything immediate.
It actually seems like kind of a throwaway comment that their CFO made. And look, anything is possible in the future.
It's possible that they have enough compute, but I can see right now
16:33
they definitely don't have enough compute to go sell their spare capacity elsewhere. But I thought it was interesting, so shared it with you.
And next, Meta is going for a fourth restructuring of their AI teams and Business Insider has more information on
16:49
it. So here is the gist from Alexandr Wang, who was the CEO of Scale AI and now basically runs all of Meta's AI division.
So, first, FAIR will play a more active role, FAIR being Yann LeCun's team.
FIAR will be an innovation engine for
17:05
MSL's Meta Super Intelligence Lab. Training runs by feeding its research directly to TBD lab.
TBD means "to be determined," and that's their lab whose name has yet to be determined, another branch of the company.
Meta Superintelligence Labs' research will be led by its new
17:21
chief scientist, ChatGPT co-creator Shengjia Zhao. Hopefully, I'm pronouncing that correctly.
Nat Friedman, who they recruited recently, will report directly to Wang; he is GitHub's former CEO, and he will be responsible for integrating AI into Meta's products.
17:36
They're also unveiling a new infrastructure team led by Aparna Ramani, a longtime Meta engineering VP whose LinkedIn profile says she leads all of Meta's AI infrastructure. And Meta is also dissolving the AGI Foundations team, which was created just
17:52
a few months ago. So, if you want to read the full memo, Business Insider has it.
I will drop a link down below. And last, Nvidia is building a new chip specifically for China.
So, Reuters exclusive, Nvidia working on a new AI chip for China that outperforms the H20,
18:08
which is a custom chip that was specifically sold to China because of all the restrictions. And so, this is going to be an upgrade on that.
So, a little bit more about the chip. It is tentatively known as the B30A.
It will use a single-die design, likely to deliver half the raw computing power of the more
18:24
sophisticated dual-die configuration of the B300. So yeah, it's going to be a watered-down version of the chips that a lot of other countries can get right now, including the US, obviously.
So that's it for today. If you enjoyed this video, please consider giving it a like and subscribing.