Category: AI Technology
Tags: AI, Efficiency, Hardware, Innovation, Technology
Entities: Agentic AI, AI accelerator, AI cards, ASIC, FPGA, GPU, NPU, PCIe, TPU
00:00
AI is powerful and complicated. Agentic AI's entry into the AI landscape is pushing the boundaries of what's possible.
But this broadening scope of opportunity
00:18
comes at the cost of complexity and coordination overhead. If all of this possibility is not harnessed and properly aligned, the result ends up looking more like, well, chaos.
AI cards
00:37
can help us harness the power of agentic AI for a multitude of real-world applications. They have quickly become a fundamental component of modern AI integration.
00:55
Entities that want to fully leverage AI's capabilities need to have a strategy for the end-to-end incorporation of AI across their IT systems, data centers, and platforms. Let's explore the role AI cards play in simplifying the modern AI ecosystem.
01:13
We'll start with the what and where. What is an AI card?
And where is it in the system?
01:34
And then, why do we need AI cards?
And then finally, how do they simplify this complex world of AI?
01:52
All right. So an AI card is sort of physical.
Depending on the type of card it is, you may be able to hold it in your hand. But really it is a piece of hardware
02:07
that is designed to accelerate AI or that accelerates AI. These cards can be as small as a special piece of silicon built into your processor chip itself.
Or
02:23
they could be mounted on a system board and be something like an FPGA or a GPU, etc., or, and we're seeing more and more of these, they could be attached to your system
02:41
through the industry-standard PCIe port. One or more of them could be attached as a physical card that you can actually hold in your hand. A question I often get asked when we reach this point of the AI card conversation is:
02:57
What's the difference between a card and an accelerator? Are they the same thing?
And the answer is they're actually not. When we talk about AI cards in general, what we're speaking of is anything that's being used to accelerate AI.
03:14
But an AI hardware accelerator card is something whose microarchitecture was designed, and whose chip was fabricated, to accelerate one or more specific AI tasks.
03:30
So you can actually think of hardware accelerators as a very powerful subset
03:46
of the AI card space in general. The distinction is a little bit important.
So let's spend a little time understanding why they're different and why both are still important parts of the ecosystem, by comparing cards with accelerators.
04:13
Purpose. Why did we build this thing?
A card that's just being used for AI was built for a general purpose. But a card that's designed to accelerate AI
04:31
was obviously designed with a specific purpose. Because of these purposes, the efficiency of these cards is different
04:46
if you use an accelerator versus a regular card. In the space of AI, when we talk about efficiency, we're actually talking about a combination of different metrics.
There's the accuracy of your result, how fast you get that result back, and how much power and sustainability
05:03
impact there is as a result of getting it. All of those components become what we measure in the AI world as efficiency.
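As a rough illustration of combining these metrics into one figure, here is a sketch. Nothing in it comes from the talk: the weighting scheme, normalization baselines, and numbers are all invented for the example.

```python
# Hypothetical composite "efficiency" score for an AI card running a workload.
# Weights and normalization baselines are illustrative only; real evaluations
# define their own metrics.

def efficiency_score(accuracy, latency_ms, power_watts,
                     max_latency_ms=100.0, max_power_watts=300.0):
    """Combine accuracy (0-1), latency, and power draw into a 0-1 score.
    Lower latency and lower power draw both raise the score."""
    speed = max(0.0, 1.0 - latency_ms / max_latency_ms)
    sustainability = max(0.0, 1.0 - power_watts / max_power_watts)
    # Equal weighting is an assumption; a real deployment would tune this.
    return round((accuracy + speed + sustainability) / 3, 3)

# A purpose-built accelerator vs. a general-purpose card on the same task
# (made-up numbers):
accelerator = efficiency_score(accuracy=0.97, latency_ms=5, power_watts=75)
general_gpu = efficiency_score(accuracy=0.96, latency_ms=40, power_watts=250)
```

The point of the sketch is only that "efficiency" here is a blend, so two cards with near-identical accuracy can score very differently once latency and power enter the picture.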
So with a card that you're simply using for AI,
05:18
your efficiency is going to be variable. You might make good engineering guesses, use these cards, and get fairly good efficiency
05:33
out of them for some use cases. For other use cases, no matter how much engineering effort you put in, it's not going to be as optimal as if you had a card that was specifically designed for them, because these accelerator cards are optimized
05:51
for AI, and in some cases for very specific AI tasks: just training, just inferencing, just fine-tuning, or combinations thereof.
Let's look at a couple of examples
06:11
to really illustrate this. A GPU is a very common example of a generic card used for AI.
Originally, they were used for graphics processing. The G stands for graphics.
06:27
It turns out that a lot of the linear mathematics you have to use for graphics processing overlaps somewhat with the mathematics that you need to do in computer hardware for AI-type workloads. FPGAs are field-programmable;
06:44
you get a little bit of flexibility there, but it's still a very general-purpose piece of hardware that you can attach to your system and have it accelerate things for you, offloading work from your processor. The specific accelerators, however, are things like tensor processing units,
07:02
specifically designed for AI. NPUs, neural processing units, are specifically designed to act the way a brain does, using neural networks.
07:19
And then there are ASICs; we're seeing more and more of this custom-designed silicon being created for different types of AI. With those, you really start to see the special purposes.
07:37
So why so many? This is a lot.
It's already fairly confusing and hard to grasp. And now we have all of these different combinations.
And the reason is that the use cases vary so much.
07:52
Sometimes, you really can get everything you need from a general-purpose AI card, but other times, you need that optimized hardware to be effective in what you're doing: to be able to do real-time
08:09
transaction processing, for example, or to get the fine-tuning that's needed for medical research applications. For those, you're going to need something specific.
If you're doing one specific AI task at a time, things
08:26
are simple. But to maximize the power of AI and to achieve enterprise-level goals, many AI tasks need to occur in parallel, and the right combination of models and cards need to be used for each task.
Managing that is what gets complex.
08:42
Let me give you an example. Many modern AI use cases rely on more than one model to produce an accurate inference result.
So the mapping is not one-to-one
08:59
between use case and model, and neither is it one-to-one between model and the card that that model will run optimally on. Some use cases require more than one model to achieve optimal results.
09:16
If a use case is using two models, it may be using two different cards, or in some cases it may be using the same card on the back end. Some of that has to do with the card's optimizations,
09:32
and some of it has to do with the card's availability. In my last talk, I shared a fraud detection example that leveraged a traditional AI model and a gen AI model, combined to optimize
09:49
for inference accuracy and speed. So in that example, we had a traditional ML/DL-type model, and we fed an online transaction into it.
10:07
And we were asking the model if it was a fraudulent transaction or not. The traditional model had a fairly good accuracy result, but in some cases we needed better accuracy.
10:22
So, when necessary, we fed the same transaction into a more sophisticated gen AI type of model, and then used the most accurate result to determine the fraud
10:38
or the lack of it. So in a case like this, we had, again, one use case and two models.
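The two-model cascade described here can be sketched roughly as follows. The model stubs, scores, and the confidence threshold are all stand-ins invented for the example, not anything from the talk's real system:

```python
# Sketch of a fraud-detection cascade: a fast traditional ML model answers
# first, and only low-confidence transactions escalate to a slower but more
# accurate gen AI model. All names and thresholds are hypothetical.

def traditional_model(txn):
    """Stand-in fast ML/DL model: returns (is_fraud, confidence)."""
    score = txn.get("risk_score", 0.0)
    return score > 0.5, abs(score - 0.5) * 2  # confidence grows away from 0.5

def gen_ai_model(txn):
    """Stand-in slower gen AI model: assumed more accurate, high confidence."""
    return txn.get("risk_score", 0.0) > 0.5, 0.99

def detect_fraud(txn, confidence_threshold=0.8):
    verdict, confidence = traditional_model(txn)  # fast path
    if confidence >= confidence_threshold:
        return verdict, "traditional"
    # Escalate the same transaction to the bigger model.
    verdict, _ = gen_ai_model(txn)
    return verdict, "gen_ai"

clear_case = detect_fraud({"risk_score": 0.95})  # high confidence, fast path
edge_case = detect_fraud({"risk_score": 0.55})   # low confidence, escalates
```

The design point is that most transactions never touch the expensive model, which is what makes the speed/accuracy trade-off work.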
But which cards should we map these models to?
10:54
So this is online fraud detection. There's a transaction happening in flight.
So this needs to happen fast. We already optimized it for speed.
So this is a really good use case for using an AI card that's physically located on the same processor chip,
11:10
the same die where the workload process is running. The gen AI model is significantly bigger than the smaller machine learning model, so big that it probably can't effectively leverage the caches of that processor chip.
11:26
So it would probably benefit from being somewhere a little closer to memory but still close enough to the working processor to get the answer back in time before the user gets frustrated and goes and buys their thing from somebody else. So the PCIe-attached
11:42
cards are a good fit for this model. By doing that, you can get the answer back to the user in an amount of time where the user can't tell
11:57
whether you used the quick ML/DL model or the slower gen AI model, or both, because the difference in time between them is smaller than a human can detect. We can't tell the difference between, say, a microsecond and a millisecond. What we can tell the difference between is a second and five seconds,
12:16
especially if we're on our lunch break trying to do all of our back-to-school shopping online. So it's important that you optimize for the speed that a human can detect, or that your processor needs, but you don't want to make it faster than it needs to be.
12:33
So say you come back from your lunch break and realize now you're the infrastructure manager of a transaction processing system. If you're that person, you know that fraud detect is only one of many other use cases
12:51
that AI could significantly help you with. For example, operational adaptability.
So if all of a sudden there's a spike in transactions
13:07
or a significant temperature change in one of your data centers, you might need your system to very quickly do something different. And that's a good use case for AI: adapting to the change and reacting.
That's something that's probably going to happen quickly.
13:24
So that's another pretty good use case for our on-system, card-attached AI accelerator to do some of that work. Another thing we might be concerned with is something like analytics.
13:42
Okay. So analytics is important.
You really want to collect trends and understand what your customers are doing and why. You want as big of a data set as possible, and you're going to use it not in real time, but for future decision-making.
13:59
So you have a huge data set, a huge model, and you probably don't care so much about how long it takes to get your answers. So there's a use case where maybe you could use something a little more general purpose on a system board
14:14
to offload those analytics from the working processor. And it doesn't even necessarily need to be super close to the machine that's running the transactions.
You could even, as long as your data isn't confidential, use something on a remote server
14:31
or a secondary box within your own data center. Training is similar:
That can be done elsewhere and then moved in once you have your model created.
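The placement reasoning in these examples — latency-critical work close to the processor, big offline work further away — can be sketched as a simple heuristic. The tier names, size threshold, and rules below are a simplification invented for illustration, not a real scheduler:

```python
# Rough placement heuristic for the workloads discussed above. The categories
# and thresholds are illustrative only.

def place_workload(latency_critical, model_size_gb, data_confidential=True):
    """Pick a card location for a workload; 1 GB cutoff is a made-up number."""
    if latency_critical:
        # Small models can sit on the processor die; bigger ones nearby on PCIe.
        return "on-die accelerator" if model_size_gb < 1 else "PCIe-attached card"
    if not data_confidential:
        return "remote server"      # e.g. offline analytics or training
    return "system-board card"      # offload work without leaving the box

fraud_ml   = place_workload(latency_critical=True,  model_size_gb=0.1)
fraud_gen  = place_workload(latency_critical=True,  model_size_gb=20)
analytics  = place_workload(latency_critical=False, model_size_gb=50,
                            data_confidential=False)
adaptivity = place_workload(latency_critical=True,  model_size_gb=0.5)
```

A real infrastructure would weigh many more factors (cache behavior, card availability, cost), but the shape of the decision is the same.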
14:48
But what about something more nuanced and more complex? Well, nothing gets more complex than regulatory compliance.
If you're doing something like this:
15:05
first, it's regulated, so it's very important that you get it right. It's also nuanced, and it differs depending on geography, location, time, and so on. So what kind of a model might you need
15:21
to react to that? It starts to get tricky to guess here.
Well, regulations change all the time. So maybe I need something like RAG that can decipher those updates, capture those updates,
15:37
and apply them to my data structure as soon as possible. Regulations are also kind of hard to read.
So maybe I need, and I'll use the term loosely here, a natural language processing model, or maybe an LLM,
15:52
to both decipher what the rule is and then to generate and communicate your actual compliance back to the people you're accountable to.
16:09
Right. And then also, if you do find that you're out of compliance, it needs to be fixed very fast, so it really would be good to avoid that situation. Maybe some sort of predictive model, using something like a traditional ML/DL, would be useful.
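To make that model count concrete, a hypothetical compliance flow could wire the pieces together like this. Every function here is a placeholder standing in for a real model; none of this is a real API:

```python
# Hypothetical compliance pipeline combining the model types just mentioned.
# Each step is a stub; a real system would call actual models and services.

def rag_fetch_updates(region):
    """RAG-style step: retrieve the latest regulation text for a region."""
    return f"latest rules for {region}"

def llm_interpret(rule_text):
    """NLP/LLM step: turn dense regulatory text into checkable requirements."""
    return ["requirement A", "requirement B"]

def predict_risk(requirements, system_state):
    """Traditional ML/DL step: flag requirements that appear to be at risk."""
    return [r for r in requirements if r not in system_state["satisfied"]]

def llm_report(at_risk):
    """Generation step: communicate compliance status back to regulators."""
    return "compliant" if not at_risk else f"action needed: {', '.join(at_risk)}"

def compliance_check(region, system_state):
    rules = rag_fetch_updates(region)
    requirements = llm_interpret(rules)
    at_risk = predict_risk(requirements, system_state)
    return llm_report(at_risk)

status = compliance_check("EU", {"satisfied": ["requirement A"]})
```

One use case, four model roles — which is exactly the mapping problem the talk raises next.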
16:28
All right. So I have a use case that needs what, four models?
And maybe a combination of all of them, without knowing whether any one of them is exactly right?
16:44
And that's before I even start trying to figure out how many cards I need to map these models to. So how can having other cards actually help me here?
And there's kind of two ways that they can.
16:59
The first one is: well, if compliance is so important, this might be a good use case for actually designing and building a custom ASIC, or having one built for you. In that case, we can decide which pieces of these models are most important,
17:16
and we can design a custom piece of hardware that is optimized to do language processing and predictive models, and we don't waste any silicon or design effort on the other stuff.
And that can be designed and physically built
17:34
as an AI card. Now everything is much simpler, because instead of having to go from use case, to a whole bunch of models, to how many cards, you can simply say: okay, send this one to my compliance card.
17:54
But then, you think about the next thing that you need. Oh, maybe it's security.
Wow. Okay, I need more models for this, too.
All right, so we don't have a magic decoder
18:11
ring to know exactly what models we need. And we also can't build a custom ASIC for every single AI use case we have, because there are many.
So this is where an agentic
18:27
AI can really start to become powerful and helpful. To me, this is a beautiful use case for agentic AI.
All right.
18:42
So what is agentic AI? The concept is that AI agents exist, and they're built in the virtual world.
And these systems are capable of autonomous decision-making and goal-directed
18:59
behavior. So you can imagine how that capability can help this enterprise-scale IT organization. AI agents can reduce some headaches for your infrastructure management.
19:16
So yes, we could customize the AI card. Or we could build a new card whose job is to act as a virtual assistant for deciding
19:34
what other AI cards and what other AI models get used. So, this whole piece could be an AI agent.
19:50
And then you could have a separate AI agent for security, and so on. So when you have a use case that interacts with your compliance task, or your security task, or adaptability,
20:08
you can bring the problem directly to the AI agent. The AI agent can capture that input, understand the problem, and then, based on the resources that are available in the ecosystem, determine
20:27
which models to use and where to deploy those models. So, for compliance, it might look at it and say: okay. First, we need to check and see if it's compliant. I need to go out to my board card
20:44
and find out if it is. And then, it can say, oh, wow, something's out of compliance.
We need to react quickly; let me send something on-system or on-die to go take care of that. And then, I need to generate a report and send it back.
21:00
That's not timing-critical.
So maybe I'll send that out to the remote server and that can get done. It's optimizing to get the task done as soon as possible while also making sure that it doesn't overload resources
21:17
that are being used by the actual mainline transaction workloads. So it can decide what to do best in the moment, based on the resources available. It could be that another AI agent is already using a card, so it has to realize: okay, I'll send it over there instead.
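The agent's routing behavior just described — prefer the fastest suitable card, fall back when it's busy — might be sketched like this. The tier names, capacities, and tasks are all made up for illustration:

```python
# Sketch of an agentic dispatcher: route each sub-task to the fastest
# acceptable card tier, falling back to slower tiers when a card is busy.

CARD_TIERS = ["on-die", "pcie", "board", "remote"]  # fastest to slowest

class AgentDispatcher:
    def __init__(self, capacity_per_tier=1):
        # Track how many tasks each tier is currently running.
        self.load = {tier: 0 for tier in CARD_TIERS}
        self.capacity = capacity_per_tier

    def dispatch(self, task, preferred):
        """Try the preferred tier first; if busy, fall back to slower tiers."""
        start = CARD_TIERS.index(preferred)
        for tier in CARD_TIERS[start:]:
            if self.load[tier] < self.capacity:
                self.load[tier] += 1
                return tier
        return None  # everything saturated; the caller would queue or wait

agent = AgentDispatcher()
check  = agent.dispatch("check compliance", preferred="board")   # -> "board"
fix    = agent.dispatch("fix violation",    preferred="on-die")  # -> "on-die"
report = agent.dispatch("send report",      preferred="remote")  # -> "remote"
# A second urgent fix finds the on-die card busy and falls back to PCIe:
fix2   = agent.dispatch("fix violation",    preferred="on-die")  # -> "pcie"
```

A real agent would reason about model requirements and live telemetry rather than a fixed table, but the fall-back pattern is the core of the "decide in the moment" behavior.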
21:35
This type of thinking is really just one example of the AI card's fundamental role in metaverse enablement. There are many, many more.
21:51
The crux of it is that these AI cards really hold the power to redefine human-computer interaction for the better. For something like this, an AI agent can really take the headache out of it.
These cards are becoming a key catalyst for the agentic AI paradigm,
22:08
and that shift is happening right now. Leveraging AI cards combined with agentic AI to handle logistics in the virtual world.
That's the future of AI solutions, and that's going to enable a near infinite number of possibilities.