π Add to Chrome β Itβs Free - YouTube Summarizer
Category: N/A
No summary available.
00:00
So, everyone is talking about MCPs, AI agents, and agentto agent protocol. If you feel left out, this is the only video you need to watch to catch up.
In this video, we'll talk about AI agents, MCPS, and agent to agent model in a super simplified manner with
00:15
visualizations that will make it easy for anyone to understand. No background knowledge in AI or AI models or coding or programming required.
In the first part, we'll explain the why and the what behind these concepts. And then in part two, we'll dive into some code and
00:31
understand the how behind its implementation. And you'll also gain access to a hands-on lab that you can use to practice this as you watch this video.
So we'll start with something we already know, chat GPT. So what is chat GPT?
It is really two things, a chat
00:46
application and a GPT attached to it. The chat part itself is just an application like any other chat app that we know.
The GPT part is the large language model. That's the AI.
There are so many other LLMs that you have probably already heard of such as Claude
01:03
from Enthropic, Deepseek, Gemini, Llama, etc. But we're not going to get into more details about specifics of LLM in this video.
We'll just refer to the AI as LLMs and that LLM could be any of these models for the remainder of this video. That's all you need to know about
01:20
LLMs. So the way it works is when a user asks a question in charge, the application sends a request to the AI which is the LLM and the AI generates a response and sends it back to the application and displays the results on screen.
Now let's say we are building an
01:36
application called fly GBT similar to chat GBT and we wanted to book a flight for the user. If I ask it a question say I would like to fly to North London it should book a flight for me.
So we need a magical something that my application
01:52
can interact with that would understand my request and do as I say based on what we just discussed that magical thing would be AI in the form of LLM. But if you look at the response it's just returned instructions in the form of text to me.
It did not actually boo the
02:08
flight for me. You see that's all an LLM can do natively.
An LLM can generate responses in the form of text, pictures or videos, but cannot by itself do anything or take any action. But what does taking an action mean here?
When I
02:24
say I would like to fly to North London, my application should be able to interact with these thirdparty flight services such as Joy Air or Dra Air or Aeroggo and retrieve flight details from these sites and then also compare that
02:40
against say my preferences such as whether I prefer cheap or luxury flights, my seat preferences or meal preferences and based on all of that information make a decision for me and not stop until it's retrieved enough information to be able to make a decision. and then book the flights for
02:55
me and tell me the flight details and booking reference numbers. So, I'd like my AI to take action for me.
So, we need something magical, something that can do that for us. And what is that?
Those are called AI agents. AI agents are able to interact with third party platforms or
03:12
websites, gather information and combine that with a memory that it has based on our previous conversations and then interact with an LLM, which is a real AI here, to make a decision for me. And that magical thing is known as an AI agent.
An AI agent can interact with thirdparty tools, have its own memory,
03:28
and interact with an LLM and go back and forth between these operations multiple times to eventually be able to have enough knowledge to make a decision for me and then also take an action to book a flight for me and not stop until it's done that job. That's what an AI agent does.
Now, one of the most common
03:44
examples of agents that we work with are idees that we work with every day. If you have worked with cursor, winer or VS code and used GitHub copilot, they have this agent mode that makes them work as an agent.
And what does that really mean? Work as an agent.
In the past, in the non-aggent era, if you ask a
04:00
question, it would give you an answer. That's all.
In an agent mode, you can give it one task, a big task even, such as build an entire app or troubleshoot an issue. It goes through this sequence of multiple AI calls and interacting with the codebase as well as the terminal if needed and does not stop
04:16
until it's done. what you asked it to do.
That's difference between a chatbot calling one LM called to an AI agent that performs a series of different tasks until it gets things done. Now, one of the real world use cases of AI agents is in software development.
04:33
You can ask it a question like we recently noticed that a button was missing on the UI will help me identify when and how this changed and share a plan to revert it. And the AI agent now scans through the code bases, looks at the front end and the backend code and also the git history via the terminal
04:48
and finally tells you exactly which commit caused this change and even how to revert that change or fix it. So how do I get started with agents?
There are platforms that have built pre-built agents that you can call like agent.ai for example where people have built hundreds of agents that perform different kinds of tasks like video
05:04
script generators or web design graders etc. You can integrate these directly in your application by invoking them remotely or you could build your own agents using tools like NA10 without having to actually code.
NA10 gives you the ability to drag and drop and build your own agents. Some example workflows
05:20
available on NA10 include automating generating AI videos uh on YouTube, intelligent email organization with content classification, etc. Another option would be for you to build an agent from scratch using platforms like Langchain or Langraph.
But we won't get
05:36
into this in any more detail for now. We have an entire course that covers these topics on our platform.
So coming back to this, we said that an agent can interact with third party platforms this way. But how does an agent really interact with a third party platform?
It does that through what are known as
05:53
tools. A tool allows the agent to interact with another platform.
Let's take a closer look at that. So, here the agent has the ability to interact with these airlines using a tool for each one.
But how does that tool interact with an airline? So, here's a quick heads up.
If you know about APIs
06:09
already, uh you may want to skip ahead a few minutes. If you don't or need a refresher, stay on and allow me to explain.
Well, let's forget about AI and tools for a second and see how we interact with these airlines as a human user. So, as a human user, say I would go to the airlines website at say
06:25
www.irates.com. com and see that it's returns me a web page and click around to find a flight as per my preference and book the flight and this is known as the UI or the user interface of uh the airline.
So you have the websites or mobile apps all fall
06:40
into this category. But if I were not a human user instead if I was a third party website like make my trip or booking.com or cheap slides I'm an application trying to communicate with another application.
In the past, what these thirdparty applications did is scrape the websites which is basically
06:55
saving the website as a text file and when they scrape the website you get a junk text like this which is the HTML but within that junk lies the information you need which are the flight details and so they would run complex algorithms against these to pull the required flight details from here. So eventually the airlines realized it's
07:12
beneficial for them too to be on these third party platforms. So the airlines told them instead of going to emirates.com you can go to emirates.com/apiflights and when you do that we will just send you the flight details in a structured format.
So you don't have to do any crazy parsing algorithms and that
07:28
interface that applications provide to other applications is known as an application programming interface or APIs. Now, not only did they say you could retrieve flight details if you call the /appi/book flights, then we would book
07:43
the flight tickets for you and return the booking reference number. So, you can let your customers book the flights from your own website without even coming to our site.
Now, I'm super simplifying this. So, if you go to these URLs, it won't work like this because it requires authentication and authorization and other mechanisms.
But,
07:58
we have a hands-on lab that will help you learn all about this MCP. So, check it out using the link in the description below.
I'll also walk you through the lab at the end of this video. So just to summarize that the interface that users use to interact with the site is called the user interface and the interface that the applications use to interact is called as an API.
So back to this now
08:16
that we know how applications interact with applications. How do you think tools interact with airlines?
Well through APIs. So each tool is a piece of code that interacts with the API of the respective airlines to retrieve flight information and those details are then
08:32
shared with LLMs to make a decision and then based on the decision, the agent uses the tool again to make another API call to book the flight on the respective airlines. In this case, the agent made another API call to the Joy Air to book the flight on that airline.
08:48
Now, if you take a closer look at that call, you'll see that each call is different. The first one is / API/flights.
The second one is /flights-list. The third one is list flights.
And also their responses are different too. The first flight returns uh information in a
09:05
format that has flight number, origin, destination. The second one returns uh information that says flight number from and to.
And the third one uh says detailed flights and flight and start and finish etc. So each airline has its own standard when it comes to their APIs.
There are hundreds of airline
09:22
sites and there are millions of other third party sites. And if I want my application to interact with all of them, do I now need to write all of these adapter codes?
Now we are in the AI world and I shouldn't have to do this. Well, gone are those days where I would sit and write programs to connect to these different flight service
09:37
providers one by one. Why can't AI just do it for me?
Only if there was some magical solution that existed that could do that for me. And so comes MCPs or model context protocols.
Well, think of MCPS as a guide for the AIS to choose the right APIs and interact with the
09:54
third party platforms. Well, MCPs provide agents the context they need to make the right API calls.
What does that mean? For example, it might look like this.
In this case, the MCP tells the agent that Joy Air has search flights and bookflight capabilities. And the
10:10
input structure looks like this. And the output structure looks like this.
And we'll dig deeper into the implementation of this uh in the part two when it comes to building an MCP server. So MCP was introduced by Anthropic, the company behind Claude, and has since been open source and is now the default standard
10:27
used by everyone to build AI agents. Now, so if you go to model context protocol/servers, you can find MCP servers for a long list of applications.
Now, every agent has an MCP configuration file located at mcp.com at some location depending on what agent
10:42
you're using. So you must specify the name of the MCP.
In this case, it's MongoDB. The command and arguments associated with it uh associated with running the MongoDB MCP server.
In this case, the arguments are MongoDB's connection string to reach the database, which is my local database. This allows
10:59
the agent to use the MCP server's abilities to connect to the database and retrieve information as well as make modifications to the data. Now, in my case, it's a local database.
The location of this file depends upon the tool being used. Cursor for example has this path specified at the cursor
11:15
directory in the user's home directory for windsurf it's under thecodium/winsurfs directory and this is a path for the configuration file for claude so going back to this MCP server works in a client server model so instead of interacting with the API directly we now
11:33
have the MCP server for each of these airlines and then you have an MCP client at the agent that interacts with these MCP servers And so a combination of AI agents that has memory, has cold-driven behavior, and has access to AI as an LM with MCP
11:49
servers that helps AI agents discover the capabilities of third party applications help us build magical solutions to problems. Now it's time for us to expand and scale up.
This agent can only book flights, but what if we want to expand our use case to book hotels, too? So, one thing I could do is
12:06
expand this agent to add more MCP servers to also connect to hotels, but that's going to add bloat to my agent. Uh, now my agent needs to be good at two things and remember uh my preferences for two things.
I might have amenities, beds, and other preferences for my hotel
12:21
which are different from those for flights. So, we ideally want one agent to do one thing and do that thing really, really well.
And so our next option is to build a new agent that can do the hotel booking really well that has its own integration with MCP servers and has its own memory with those
12:36
specific preferences. And my original agent is going to call this agent.
So that's an agent to agent call. Now I have one flight agent that's really good at finding and booking flights.
And I have another hotel agent that's really good at booking hotels. But how does one agent talk to another
12:52
agent? How does one agent know what are the capabilities of another agent?
What format can one agent pass information to another agent? Well, this is where the agentto aagent model comes in.
The agentto aagent model was developed by Google with the goal of making it possible for agents to be able to
13:07
collaborate in a dynamic multi- aent ecosystem with support and contributions from a lot of other partners in the ecosystem. So, how does it work?
The agentto agent model allows one agent to discover capabilities in the other agent. For example, the flight agent can ask the hotel agent, "What can you do?"
13:25
The hotel agent responds with its capabilities that it can search and book hotels. Then the flight agent gives the hotel agent a task to search for the best hotels and then the hotel agent responds back with the results of that task.
So agentto agent uh model defines
13:40
a set of standards that allows agents to discover each other's capabilities. It defines a standard to assign task to another agent and check its status.
It defines a standard on how agents communicate with each other and also defines how context and results are shared back and forth between agents.
13:56
We'll see this in more detail in part two of this video. Well, let's take a look at some real world use cases of agents and MCPs.
Here's one use case that we spoke about earlier with reference to development. Say if I have an issue with my application, I could say we recently noticed that a button
14:11
was missing. help me identify when and how this changed and should I plan to revert.
The AI agent interacts with the get history uh reads the back end and front end code and then identifies the exact change or commit that caused this change. The next use case is using AI agents and MCPS to build backend
14:27
applications. In this case, we are developing APIs and I'd like my agent to have access to the MongoDB database so that during the development of the APIs, the agent can test these APIs and make sure the data is available in MongoDB.
So this is a very helpful use case. And
14:43
then here's another one that we had internally. So we have uh three uh data sources.
Stripe, Google, bequery which is our data infrastructure and then we have metabase which is a visualization platform. We had an issue where uh we were missing an invoice detail from a particular stripe record and we were not able to identify which user that was for
15:01
and so we t the AI agent provided it access to these three data sources through NTPB servers and it was able to go on a 5 to 10 minute uh troubleshooting journey and eventually come back and um tell us uh why it was missing and the particular transaction ID associated with that and the amount
15:18
associated with that. So uh those are some examples of real use cases that the uh MCP servers can be used for.
Next we'll get uh some hands-on experience. So we'll head over to the lab using the link given in the description below and uh let's uh quickly take a look at the hands-on labs.
15:35
All right. In this lab, we're going to walk through simulation of the flight MCP with client in our code you know in the VS code editor.
So this is a free lab that's hosted on codecloud. So use the link in the description below to gain access to this lab so that you can
15:51
walk through it yourself. So once you open the lab environment, you're given a a set of instructions on the left side here and then you have a set of the VS code editor here on the right.
So the there's some code here that you can ignore for now. I'll explain all about
16:07
it in a little bit. So we'll start with a quick walk through.
So in this lab we will explore how to configure the flight NCB server in client. So client is similar to cursor or or winds surf but there's another agent here.
So if you click on this agent button here, this is a VS code extension or plug-in that
16:26
behaves just like the cursor or Windsor or other or GitHub copilot that you might have worked with. Right?
So let's click okay and go ahead. So in the next step here it says let's set up client.
So we want to open the client interface by clicking on the robot icon on the
16:41
left side of the MWS code server. So that's we've just already done that.
So the first step is to set up a client so that you can chat with it, chat with the AI. So let's go to the next step.
Okay. So the instruction here is to configure API key.
So you don't have to bring your own
16:56
keys. We we provide you the keys that are needed.
So but here you need to select use your own API key and then in the API provider you'll need to select open API compatible this one. Right.
Right. So now we need to provide a set of information including the keys and those details are actually available
17:12
here. So it's already available in your environment.
So if you go to your home directory and bash profile here you have the keys that are needed for you to work with any of these endpoints during this lab. And this is
17:28
free for you to use and play around with as much as you want. So in this case we need the base URL.
So the base URL we're going to use OpenAI. So I'm going to copy this base URL from here.
Paste it here. And then I need the API key.
So I'm going to copy the API key for the OpenAI uh from here to here. And then I need
17:45
the model ID. So the model ID is going to be OpenAI/DP-4.1.
And that's it. And then you click let's go.
Okay. So client is set up.
So we're just going to close these messages. And we'll send a quick test message to check if it
18:03
can hear us. So, it's going to send an API request and we'll see.
Yep, it says I can receive a message and respond to your request. Let me know what do you need help with.
Okay. All right.
So, this step is complete. So, we're going to go ahead click okay and go to the next
18:19
step. Okay.
So, now we're going to ask for flight details and observe that it's not working. So, we have not set up the MCP server yet, but I'm going to ask it to share flight details.
So I'm going to say can you check flight details for me from
18:35
Safo to JFK for today and let's see if it's able to do that. Okay.
So what it's not going to do is it's understood my request and it's going to open Google and try and access
18:52
you know publicly available flight information. But that's not really what we need because we don't want it to go out and access the browser.
Instead, we wanted to use our MCP tool that already has that information, right? So, I'm just going to reject that request and prevent it from going out.
And let's see how we can
19:09
configure the MCP server. It says I'm unable to access the browser to look up flight details.
Would like to provide access to a specific API or connect MCP server. So, that's what I'm going to do.
So, to connect the MCP server, the steps here are to click on the server button at the bottom. So here you have manage MCP servers.
I'm going to
19:24
click on that and then there's a settings icon and then here I have configure MCP server. So I'm going to click on that and what this does is it opens up this client MCP settings.json file.
So for client this is the the file that needs to be updated. Then I'm going to go here and copy this configuration
19:40
and I'm going to paste it here. So what this means is this is a list of MCP servers.
I'm going to call mine flight sim or flight simulator. And then there's this is basically a simple script that's located at root flights sim mcp and flights sim-mcp.sh.
So as
19:56
soon as I put it here, it's already become available here. As you can see, it's green.
This means this MCP server is ready to use. So I'm going to click on done now.
But I just want to take a minute and show you what the location of this file. So you have this flight sim MCP here, which is this path.
And if you look into this, there is all the code
20:12
that's written to run the this server. So this is basically a Python file and if you expand this here you're able to see this list of uh the code here on the src and then you can see the prompts and resources and tools and everything defined.
Don't look at this for now
20:29
because we're going to have another video where we explain these in much more detail. So let's go back to this and let me ask the same question again.
Can you help me find flights from SFO to JFK? and we'll give it a minute for it to
20:46
interact. Okay, so it says there's now an MTP server available flight sim that provides details of first search flights and all that.
It's given me this details and it's done the search and it's actually got back with some of these details and it's going to make a call with some of these details. It's asking
21:01
me for my permission. So I'm just going to say approve.
All right, so it's identified the flight and uh you can see the response here. But yeah, here's a more human readable format.
So it says there are these airlines that are available.
21:17
And so I'm going to say book the cheapest flight for me. And let's see what it what it does.
Now the capabilities of the flight MCP tool tells it that it can book the flight which is this particular one. But
21:34
it needs these inputs. So the inputs are first name and last name.
So I'm just going to give my name and email. Let's say cloud.com and they're going to give me a phone number, right?
And let's see if it can pick
21:49
that. Okay, it's got the passenger details and it's going it's going to do the booking.
It's asking me for an approval. I'm going to say approve.
I think there's some error in terms of
22:04
validation. I probably did not give the right information.
Yeah, the phone number is not valid. So let's say let's say I'm going to give it another number.
22:20
Okay, I'm going to prove again. Yeah, I think there's uh probably still got the phone number wrong.
Let's just copy and paste that.
22:41
Now, while it does that, here's something else we could do. Okay, so that task is complete.
So, it's able to book my flight and it's given me all the flight details and booking numbers and all of that, which is pretty cool. Okay, so that's a quick demo of using flight simulator MCP.
So, go ahead and try this out yourself. In the upcoming video and
22:58
lab, we will have we'll see how to build your own MCP servers. But for now, if you'd like to play around with it, you can take a look at this codebase and you can basically ask client to explain read and understand code at this location and explain it to
23:17
me. Now, as an agent, what it's able to do is it has access to the file and folder structure.
So, it's going to spend some time reviewing the directory, reading the files, and understanding its structure. and it's going to be able to tell me and explain to me how it's all
23:34
set up. Okay, so there it is.
So it's able to tell me that it uses a fast NCP server. That's what I've used.
And there are these core features which is search flight and get flights and all of that. Then there are resources and API prompts
23:49
and it has a modular design. All right.
So yeah, that's quick intro to the MCP server lab. Take a look at it and let us know how it goes.
Right. In the next part of this video, we will dive deeper into how to build your own MCB servers and clients and the agent to agent
24:05
models. So, do subscribe to our channel to be notified when it's out.