00:00
All righty. Hi everyone.
My name is Isaac and I'm a core contributor to DSPy. I want to start off with a quick thank-you to Jeff and the Chroma team for setting up this awesome event, and thank you to
00:16
Ramp for hosting us here. Today I'll be talking about using DSPy for context engineering.
Let's get right to it. So why are we talking about context engineering?
So, this term is fairly new, but really context engineering's
00:31
popularity is a symptom of the problem that it's trying to solve, which is that as a field, we're still learning how to build reliable AI software. Here's a tweet from Toby, the CEO of Shopify, where he defines context
00:46
engineering as the art of providing all the context for the task to be plausibly solved by the LLM. I'm sure you're all familiar with this tweet.
A follow-up question to that: what are the actual challenges you need to solve when you're doing context engineering?
01:03
Let's look at this through an example: extracting information from an email. For this example, you're writing an internal productivity app to digest email threads per employee.
So this app needs to do two things. The first thing it needs to
01:19
do is collect all the important events from whatever email you're passing in. Then you want to extract and prioritize all the relevant action items from this email.
And so given this information, our app could do a number of things. It could propose focus blocks.
It could propose meetings in
01:38
order to follow up on these action items. There are a number of possibilities.
So if we were to build this task, what does it actually entail? Well, we have three input fields, right?
The subject, email thread, and attachments. These are fairly arbitrary fields that I selected.
And if
01:55
you look at it, subject, well, that's probably a string. Email thread, well, that's probably also a string.
But email data is messy, as anyone who has worked with it knows. And it can get really long.
How do we actually know what information we want to include from this email
02:11
thread in our program? And then lastly, we have attachments, which for this example we'll say are just images, but they could be PowerPoints, they could be Word documents.
They could be a number of things. So, we need to take this
02:26
information. We need to pass it to our LLM.
And then on the other end, we need to get events and action items out. And events and action items, at this point, we haven't actually specified what they are or what they need to contain.
Our task is wide open. And so at
02:43
a high level, the system doesn't really sound complicated yet, but it brings up a lot of questions. You need to pass the LLM some of these inputs, like subject, email thread, and attachments.
But these could be in a lot of messy formats, and you
02:59
need to pass them in consistently so that the LLM will respond fairly consistently across different data inputs. So when you're writing a prompt, you have to ask yourself: do I want to use XML?
Do I want to use JSON? Do I want to use some
03:16
other format? How does the model actually know what is what?
And then you also ask: how do I include images? How do I make sure the model knows which image or attachment is being referenced at each point, right?
This becomes a prompting nightmare in
03:33
terms of just tagging everything. And so you need some way to structure your inputs.
Then you might ask, okay, I know that I need to give this to the model. How do I actually get it to do the task correctly?
Do you need to add some
03:50
reasoning field? Do you need to add tools?
How does that pollute your context window? What are the downstream implications of doing any of these?
When you're editing a string prompt and you're begging the model, please add a reasoning field in between thinking
04:05
tokens, you actually need to be able to measure what will happen downstream. And you want to be able to flexibly change between different strategies because a strategy that's good this week might not be good next week, right?
These change week to week.
04:22
And then one important part is that you might ask, okay, I've given the LLM everything it needs to know. What do I actually want it to output?
Which is a hard question until you dive deep into what your actual problem is and what the requirements are for your specific user.
04:38
So you then have to beg the model to actually output in the format that you want. It could be JSON, it could be XML.
It changes from model to model, from provider to provider, and with model size, right? These things are different, and these models are only
04:55
successful in outputting these schemas at different scales and with proper examples. So now we've added structure to our outputs.
We know what we want the model to say, but it's still not working.
05:11
So what is the LLM actually doing when I pass in an example? And what is it that I'm telling it to do?
Okay, let's say it's hallucinating events. Well, now I need to add another line to my already growing prompt that contains my inputs.
05:26
It contains whatever my inference strategy is. It contains my output format.
It contains some ideas on what the model should do for this output format. And now you're begging it not to hallucinate too.
But maybe, in the inverse, it then adds too few events. So you need to
05:43
change to go the other direction. So that brings me to the point of my talk, which is that prompts are bad ways to build good software.
All of your design choices end up getting mushed together into one single giant
05:58
brittle string, and when you want to change anything, it breaks. This is just not robust to different LLMs.
Your exact word choice and what examples you use matter for a given LLM. What inference
06:13
strategy you want to use. Do you want to use chain of thought?
Do you want to use ReAct? Do you want to use reflection?
I don't know. It depends on whatever your actual problem is.
And you need to try different options easily in order to see what actually works to solve your problem. Then there's also formatting
06:31
your data, your input instructions, and your output formatting. All of these different engineering design decisions get tangled into one big long string.
So, DSPy asks the question: what if
06:46
building AI systems was more of an engineering discipline, dare I say context engineering? And now we get to DSPy, which is a framework for programming, not prompting, language models.
And here's a tweet again from Toby, the CEO of Shopify, saying that DSPy is his context
07:01
engineering tool of choice, which is an awesome endorsement for the framework. So, let's go over some of the core components and how they actually help you do context engineering.
We'll start off with signatures. Signatures turn LLM calls into structured function definitions.
07:17
Let's go back to our email example from before. We have our inputs: subject, email thread, and attachments.
We have some task specification, which at this point still isn't clear. We're going to pass those into an LLM.
And then we have our outputs event and
07:34
action items. So how would you represent this inside DSPy while avoiding this big long prompt string?
Well, signatures are just class definitions. It starts with a docstring, which is just a specification of the actual task that you're trying to complete.
Then you
07:51
have input fields, which can be explicitly typed or can just be natural-language strings. In this example, subject and thread are implicitly just strings.
But then attachments is actually just a dictionary from the file name to an image. Right?
It's that easy to specify
08:07
what your input format is and it's consistent across every input that that uses this program. Then we have our outputs which are a list of events and events can just be Python objects.
They can be Pydantic objects. They can be anything.
And
08:23
action items, which are guaranteed to be a string and map to one of three literal options. And here there's some DSPy magic, because all you have to do is declare the types in your signature, and DSPy will make sure that whatever the output is at the end
08:38
matches this type, or it will throw an error. Signatures are great because they delegate your inference strategies, your learning, and your language-model plumbing to other parts of the framework.
This just represents your event extraction, right? This is the
08:55
task the language model is doing, which is different from an inference strategy. So the second concept is a module.
A module lets you swap and apply inference strategies to your actual signatures. So in order to try a different inference
09:11
strategy, let's say you want to try ReAct instead of chain of thought, you don't actually have to change a string prompt.
So we'll start off with just chain of thought. Chain of thought is not new, but notably, all you need to do to add chain of thought is wrap your
09:27
signature, ExtractEvents, in dspy.ChainOfThought. Now, what's nice about this example is that your concerns are separated.
The reasoning field is not part of your task and it shouldn't be in the signature. Right?
In order to extract events, you don't need a reasoning field. You're saying that in
09:43
order to have an LLM be good at this task, I need a reasoning field. Then, what if we think tools will help?
What we can do is convert our signature into an interactive ReAct program, where we're giving the model access to an OCR tool
09:58
and a search tool. Here, ReAct will let the model reason and then take some action, repeatedly, until it ends with the outputs for our signature.
But you'll notice all we need to do is provide our signature and provide our tools which are Python functions to the model.
10:15
Then lastly, we have Refine, which will iteratively try different attempts in order to return the one that scores best according to some reward function. What is this reward function?
It's any Python function that you want. It can be an LLM judge.
It can be a concrete rubric.
10:32
The third concept is an optimizer, which tunes your AI system's prompts or even weights. Here we have an example using one of DSPy's most popular optimizers, MIPROv2.
Here we have our judge as a metric which just takes in a subject, a thread,
10:48
predicted events, and returns whether it's accurate or not. And then we have our optimizer, which will take in this program and try different combinations of few-shot examples and prompts in order to maximize this metric on our
11:04
dataset, which is just some list of email threads that we've provided. And one of the cool things about DSPy is that the community is building awesome integrations.
So here, it's only a few lines to set up GRPO fine-tuning in order to maximize our
11:19
program's performance according to this metric. This is awesome work from Noah Ziems, who's a community member.
So what does DSPy actually do? DSPy handles all the plumbing, the scaling, and the learning, so that you get to
11:37
focus on the system design, natural language specification and evaluations. And together, these make up all of what you need to do context engineering.
And because of this design, you're separated from any single brittle string
11:52
prompt. And that means that you can actually invest in defining the things that are specific to your problem and the system that you are trying to build.
In conclusion, DSPy has everything that you need in order to build reliable AI
12:08
software and to free you from battling string prompts, so you can actually do the important context engineering work. Lastly, DSPy is made possible by an awesome open-source community, which you should all join and be a part of if you aren't already.
Check us
12:25
out at dspy.ai. Follow us on Twitter, and I'll be around afterwards, so feel free to reach out.
Thank you.