How to Build a Multi-Agent AI App with AutoGen: Insights from SingleStore Webinars
In today’s fast-evolving AI landscape, the development of multi-agent AI applications is gaining momentum, offering profound implications for various industries. SingleStore, well-known for orchestrating top-notch webinars on AI and data, recently showcased a valuable session on building a multi-agent AI app using AutoGen. This article delves into the insightful concepts presented in the webinar, providing a roadmap for businesses and developers eager to leverage multi-agent architectures in their projects.
Introduction to Multi-Agent AI Systems and AutoGen
The recent webinar hosted by SingleStore featured Matt Brown as the moderator, who set the stage for discussing multi-agent AI systems. Multi-agent systems involve the integration of multiple AI agents that can operate independently or in collaboration, tackling complex tasks more efficiently than a single AI agent could.
The session emphasized the power of AutoGen, a framework designed to facilitate the development of multi-agent AI systems. Workflows in AutoGen orchestrate several AI agents, each performing a specialized task and contributing to a shared outcome, which lets the system handle more complex queries and operational demands.
Understanding Multi-Agent Architectures
The core functionality of multi-agent systems can be likened to organizational structures in companies, where tasks are distributed across various departments. Each agent handles specific sub-tasks and relays its results back to a supervisory agent, which synthesizes the input into a coherent output. This not only improves task efficiency but also leverages the specialized expertise of each agent in problem-solving.
AutoGen for Building Multi-Agent Systems
During the webinar, the process of using AutoGen for building robust multi-agent AI applications was explained. AutoGen facilitates the creation and management of multiple AI agents, allowing them to interact seamlessly and exchange information to resolve complex inquiries effectively. This system can be particularly advantageous in environments requiring dynamic interaction and data retrieval from diverse sources to deliver accurate responses.
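To make this concrete, here is a minimal sketch of the pattern described above using the pyautogen package. The model name, API key handling, and the task prompt are illustrative assumptions rather than details taken from the session.

```python
# Minimal two-agent AutoGen sketch (assumes `pip install pyautogen` and an
# OPENAI_API_KEY in the environment); the task prompt and model are illustrative.
import os
import autogen

llm_config = {
    "config_list": [{"model": "gpt-3.5-turbo", "api_key": os.environ["OPENAI_API_KEY"]}],
}

# The assistant agent generates answers and code; the user proxy relays the task
# and can execute code locally on the assistant's behalf.
assistant = autogen.AssistantAgent(name="assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",  # fully automated exchange, no human in the loop
    code_execution_config={"work_dir": "scratch", "use_docker": False},
)

# Kick off the conversation; the agents exchange messages until the task is done.
user_proxy.initiate_chat(
    assistant,
    message="Write a Python function that returns the n-th Fibonacci number and test it.",
)
```

The same two building blocks scale up to larger systems: each additional agent is just another AssistantAgent or UserProxyAgent with its own role and configuration.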
The Role of SingleStore in Enhanced Data Handling
In the discussion, SingleStore was highlighted as a pivotal component for managing data in AI applications. SingleStore handles massive volumes of structured and unstructured data on a unified platform, streamlining the retrieval that keeps the agents in a multi-agent system accurately informed. This ensures that agents have access to the data they need in real time, which is essential for making informed decisions and generating reliable outputs.
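As a rough illustration of how such a retrieval layer can be wired up, the sketch below uses SingleStore's LangChain integration as a vector store. The connection URL, table name, source file, and embedding model are placeholder assumptions, not values from the webinar.

```python
# Sketch of a retrieval layer backed by SingleStore via LangChain
# (assumes `pip install langchain-community langchain-openai langchain-text-splitters singlestoredb`).
# The connection URL, table name, and source file are placeholders.
import os
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import SingleStoreDB
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

os.environ["SINGLESTOREDB_URL"] = "user:password@host:3306/dbname"  # placeholder

# Load and chunk the reference documents the agents will retrieve from.
docs = TextLoader("docs/reference.md").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed the chunks and store them in a SingleStore table. The same embedding
# model must be used later when querying, or similarity scores are meaningless.
embeddings = OpenAIEmbeddings()
vector_store = SingleStoreDB.from_documents(chunks, embeddings, table_name="webinar_docs")

# At question time, pull the most relevant chunks to ground the agents' answers.
for doc in vector_store.similarity_search("How do I run parallel training?", k=4):
    print(doc.page_content[:200])
```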
Implementing Multi-Agent Systems: Practical Insights
Further into the webinar, the practical aspects of implementing multi-agent AI systems with AutoGen were discussed: how to set up the different agents, define their roles, and configure their interactions to maximize the efficacy of the system. Real-world applications were also covered, illustrating how multi-agent systems can be deployed in sectors such as financial services and telecommunications, where they handle tasks ranging from customer service to data analysis more efficiently than traditional single-agent systems.
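A rough sketch of that setup, loosely following the demo's software-team example, might look like the following. The agent names, system messages, and round-robin speaker selection are assumptions for illustration, not the exact configuration shown in the session.

```python
# Sketch of a multi-agent "software team" group chat in AutoGen; roles,
# system messages, and speaker selection are illustrative assumptions.
import os
import autogen

llm_config = {
    "config_list": [{"model": "gpt-3.5-turbo", "api_key": os.environ["OPENAI_API_KEY"]}],
}

boss = autogen.UserProxyAgent(
    name="boss",
    human_input_mode="NEVER",
    code_execution_config=False,
    system_message="You relay the user's request and accept the final answer.",
)
product_manager = autogen.AssistantAgent(
    name="product_manager",
    llm_config=llm_config,
    system_message="Turn the request into a short, concrete specification.",
)
engineer = autogen.AssistantAgent(
    name="engineer",
    llm_config=llm_config,
    system_message="Write Python code that satisfies the specification.",
)
reviewer = autogen.AssistantAgent(
    name="reviewer",
    llm_config=llm_config,
    system_message="Review the code for correctness and reply TERMINATE when it is acceptable.",
)

# Agents take turns in a fixed order; a manager agent coordinates the chat.
group_chat = autogen.GroupChat(
    agents=[boss, product_manager, engineer, reviewer],
    messages=[],
    max_round=12,
    speaker_selection_method="round_robin",
)
manager = autogen.GroupChatManager(groupchat=group_chat, llm_config=llm_config)

boss.initiate_chat(manager, message="Build a CLI tool that summarizes a CSV file.")
```

Note that each agent takes its own llm_config, so a capable model can be reserved for the coordinating role while cheaper or fine-tuned models handle narrow sub-tasks, a trade-off the webinar discusses at length.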
Conclusion: The Future of AI with Multi-Agent Systems
The webinar concluded with a forward-looking perspective on the development of AI technologies, emphasizing that multi-agent systems represent a significant leap forward in making AI applications more versatile and effective. As AI continues to permeate various sectors, the ability to develop and deploy effective multi-agent systems will become a crucial skillset.
In summary, the informative session by SingleStore provided a deep dive into the construction and benefits of multi-agent AI systems using AutoGen. For developers and businesses aiming to leverage the latest advancements in AI, adopting a multi-agent approach could substantially enhance operational efficiency and decision-making processes. As AI evolves, the integration of platforms like AutoGen and data management solutions like SingleStore will be instrumental in harnessing the full potential of AI technologies.
[h3]Watch this video for the full details:[/h3]
Join us for an enlightening webinar on “How to Build a Multi-Agent AI App with AutoGen,” where we’ll dive into the innovative realm of Microsoft’s open-source AutoGen Studio and the development of multi-agent AI applications.
This session is crucial for those eager to navigate the AI-driven technological landscape, emphasizing the integration of AutoGen Studio with SingleStore to create dynamic, interactive AI skills. Our demonstration will showcase how to use a skill that communicates directly with SingleStore and how to have multiple agents collaborate on tasks.
Sign up for SingleStore’s Free Shared Tier today! Click the link below and get full access to our cloud data platform without committing to a paid plan
https://bit.ly/singlestore-free-tier
=====================================
Discover the power of AutoGen Studio in revolutionizing the way we approach AI application development, offering unparalleled insights into efficiency, decision-making, and system scalability.
Here’s what you’ll learn:
1. Introduction to Microsoft’s open-source AutoGen Studio and its role in AI application development.
2. How to leverage an AI skill that interfaces directly with SingleStore data (a hypothetical sketch of such a skill follows this list).
3. Techniques for coordinating multiple agents to perform collaborative tasks.
4. A comprehensive live demo showcasing the creation and deployment of a multi-agent AI system using AutoGen Studio.
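For item 2, an AutoGen Studio "skill" is essentially a plain Python function that agents can call. The sketch below shows one hypothetical shape such a skill could take, using the singlestoredb client; the connection string, table, and query are assumptions, not the function demonstrated in the webinar.

```python
# Hypothetical AutoGen Studio "skill": a plain Python function an agent can call
# to read from SingleStore. Connection string, table, and query are placeholders.
import os
import singlestoredb as s2

def query_singlestore(question_keyword: str, limit: int = 5) -> str:
    """Return document rows whose content matches a keyword, joined as one string."""
    conn = s2.connect(os.environ["SINGLESTOREDB_URL"])  # e.g. "user:password@host:3306/dbname"
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT content FROM webinar_docs WHERE content LIKE %s LIMIT %s",
            (f"%{question_keyword}%", limit),
        )
        rows = cur.fetchall()
    finally:
        conn.close()
    return "\n---\n".join(row[0] for row in rows)
```

In AutoGen Studio, a function like this would typically be registered as a skill and attached to an agent, which then decides when to call it during a conversation.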
Sign up for SingleStore’s Free Shared Tier today! Click the link below and get full access to our cloud data platform without committing to a paid plan
https://bit.ly/singlestore-cloud-free-trial
==============================================
If you loved this webinar and want more like it, you can sign up for our upcoming webinars here: https://bit.ly/singlestorewebinars
#AutoGen #SingleStore #AIdriven #AIApplications #applicationdevelopment
[h3]Transcript[/h3]
hello everyone and Welcome to our webinar how to build a multi-agent AI app with autogen my name is Matt Brown and I’ll be your moderator today one of my main jobs at single store is to organize these weekly AI webinars two per week demoing data and AI use cases and new technologies and tools I post about these upcoming webinars every week so if these topics are interesting to you feel free to connect with me on LinkedIn to stay in the Loop I’d also love to hear your feedback or any ideas on topics that you’d like to see in future sessions speaking of future sessions if you click to the next slide you’ll see that on Monday we are going to be presenting building with Google’s new open Gemma models you can come and join to see a live demo and a code share on how to use these new the new Gemma models uh they’ve got very lightweight designs open access uh for AI applications so if this top topic sounds interesting to you go ahead and RSVP it’s the uh short link you see on your screen the Google Gemma webinar uh I’ll put that link in the chat in a minute as well or you can just take a photo or a screenshot of that QR code that’s on your screen and that’ll take you to the same place to RSVP and get your Zoom link hope to see you there on Monday back to today’s topic uh we’d love for you to participate in the Q&A throughout the session there is a button at the bottom of your Zoom that says Q&A uh feel free to click that type in your question we will try to get to your questions as they come in we’ll also have a live uh Q&A session with our panelists um talking at the end of this session um but we’ll also try to answer them as they come in um and uh this has gotten tough with so many registrants these days but we have a mission statement to answer 100% of the questions that come in so feel free to ask away make sure that you include your name and email address in your question because we will not be answering any anonymous questions uh we’d also love for you to try out the technology that we’re we’re talking through today in fact anyone who tries it today will be entered for a chance to win new airpods Pros all you need to do to participate in that raffle is visit that link that’s on your screen it’s the multi agent autogen raffle the QR code will take you to the same place I’ll put that link in the chat and just just a minute um and all you need to do is log in you may know already that single store has had a free trial for many years but just in the past few weeks we’ve also announced a new completely free shared tier uh we can talk about that more a little bit later today um so let me introduce our speaker for today Alex Pang is a growth engineer at single store um his formal training was from the Stanford computer science program he has expertise in Big Data machine learning and AI he’s becoming a regular speaker at these weekly webinars on data and AI welcome Alex the floor is now yours thanks Matt um well like like Matt mentioned um you know this set of weekly webinars is something that we’ve been doing because we’re really excited about some of these new capabilities that are becoming possible because of this wave of AI and to really understand and to set a foundation for you know the topic that we’re going to be today which is you know autogen for multi-agent AI workflows we really have to set um and understand the emerging architecture architecture that we’re seeing for llm applications so really the core of this is going to be this whole um concept of retrieval augmented generation which 
if you’ve attended um our webinar our webinars in the past then you’re going to be well familiar with this but if you haven’t then um this would be some good uh first material to get familiar with and that’s really the idea that when we use these llms um you know on first pass we use something like chachu BT or we use something like you know um Claude from anthropic and we’re kind of wowed by like how much they seem to know they’re almost like this you know Genie in a Bottle who can give us all of our answers right but very quickly you find out that these llms are actually not all knowing oracles and instead um they tend to do this thing that we call hallucinating um which you know lots of humans do too but we especially don’t want to see it in an AI application especially especially if we’re deploying it in um an Enterprise setting and hallucinations are essentially when an llm doesn’t have the information necessary uh to produce an answer so it just makes things up right and to fix this um we’ve come up with this whole concept of retrieval augment generation or um the Comm the AI community at large has come up with this concept of retrieval augment generation and that’s essentially taking um the semantic meaning of a user’s query so you know if you’re asking hey what are the you know 10 most profitable companies in my stock portfolio right it’s taking the semantic meaning of that and then it’s looking across your uh your database uh for the most relevant information um and then supplementing that information to the llm so that it’s able to give you a more accurate picture and give you hopefully a non- hallucinated answer and so where does uh retrieval retrieval augmented generation fit inside our kind of wider AI architecture well obviously we have you know our UI which is going to be you know the chat gbt web UI it’s going to look like you know some of these AI startup tools that are coming out like flow wise which lets you um create you know a no code UI on top of any um llm model um we have a great webinar on flow wise as well highly recommend checking that out also by me um and then one layer below that we also have you know the API endpoints and that’s how we’re going to hook up the UI and the browser components uh to our actual llm um but we don’t want an llm that’s going to be hallucinating all over the place so we also need to provide it with um contextual data we need to um augment its generations with retrieval and that’s where something like single story is going to come in where we’re going to perform semantic search so that’s looking at the meaning of your query and then looking for pieces of information that are closest to your query in meaning and also lexical traditional traditional lexical or keyword search which is going to be search that looks something like what you do do when you go to Google and you search for something that’s going to be much more uh keyword based and matching along keywords right um you’re going to want to do usually a combination of both to get the best kind of mix of information for um your use cases and just kind of zooming in a bit more into what this contextual data layer looks like um this is obviously going to look different um depending on your contextual data store at single store we we we think you know we think that uh our solution is going to be something that’s going to be um you know perfect for Enterprise use cases because we end up um we end up unifying these two often siloed sources of data structured and unstructured data into 
one single store um and so you can see here obviously we have you know table tabular data you know from a relational database um you know in a SQL database and then also unstructured data you know slack messages or PDFs or csvs or you know Jura tickets right all of that information can flow into the same um data platform which we call single store and then um out of it you can get um you can pull uh semantic and lexical information to augment your llms and what are some benefits of you know having everything in one you know Under One Roof right we think that there’s huge benefits to be had you know um enforcing a single data access policy over your the entirety of your Corpus of data right um You don’t want to muck about configuring multiple databases potentially seeing multiple databases fail and having multiple points of failure um when you could just simplify and have you know one single data platform for all of your um AI data needs so let’s think a little bit more about this whole kind of multi-agent um architecture that we’ve been seeing recently and hopefully the reason why everyone is here today um I want us to think about the typical um you know LLC the typical company right at the top you might have the CEO and then you might have directors and you might have vice presidents and then you know under vice presidents you might have some managers and then under the managers you might have you know Engineers or customer support agents or you know um uh sales salespeople you know the list can go on um but really we see this hierarchical structure right and why why have this hierarchical structure because the organization at as a whole is T usually tackling a problem which is much much larger than any single person can take on right and so you know some very smart people got together and they thought about this when also the whole wave of llms is coming out and they thought uh some they thought well hey we’re throwing some pretty difficult problems at these llms um and you know often times they’re not not doing as well as we hope them to we hope for them to right um you know we’re asking the llm to hey go into like my library of like 3,000 PDFs and answer this really complex question or go over my entire email inbox and draft answers to every single unread email right we wouldn’t really expect you know a person with no context no insight into our lives uh to just be able to handle these tasks by themsel right so why would we expect that you know um why would we expect that a single llm is able to do this and that’s where these multi-agent workflows come in um you know we’ve seen today we’re going to be focusing on Microsoft’s autogen studio um but you know uh We’ve also seen actually just very recently Lang chain came out with um a really great multi-agent framework called Lang graph um and that’s actually a really that’s that’s also really cool because we’re directly integrated into Lang chain and so if you’d like to work with Lang graph um and you’re not especially tied to autogen Studio that’s another great alternative that I suggest checking out um but really the meat and bones of what we’re going to be um talking about today is taking this concept of you know a supervisor LL you know deploying sub agent L llms and then really putting it to practice and what does this look like well going back to the previous example of you know drafting responses to an entire inbox of messages right like you might think well I mean why not just feed it to an llm in sequence you know all of these 
unread emails and then it’ll just give you you know it’ll give you a perfectly good enough response right but I think it’s taking the naive approach as we’ve seen in AI often gives you know um very quick results but not necessarily production grade and Enterprise grade results right um you’re going to run into problems with hallucination you’re going to run into problems with you know benchmarking you’re going to want to make sure responses are up to a certain amount of quality right and that requires that for example you know again going back to this email example um that you know potentially we send out a supervisor LM for each email right for each unread email and we have the supervisor LM you know send out an agent to get context from other email threads and then we send out an agent to get context from you know the most recent items that you’ve been working on like the most recent like goog M like Google sheet or the most recent Google slide you’ve been working on maybe another agent goes into um you know your slack and pulls up your slack history with all of the um with everyone in the thread right and then these agents will summarize what they found and then bubble that back up to the supervisor L who then takes all of this information right this additional cont context right and produces a much much better response than you know the most generic possible first pass in LM would get you so just to recap a little bit um some considerations for multi-agent workflows um obviously you know uh we want to think about the complexity of the task um you know if we’re just generating you know a poem for our very good friend um and we want to give them you know we want a little poem for their birthday card then maybe a multi-agent workflow is overkilled right um but again if we’re going into these more complex tasks like um you know working over a giant data set or working over or performing a task that requires multiple steps that requires maybe even multiple long ranging steps that are separated in time right and so maybe we need a schedule agents these all start to give us clues that maybe we need a multi-agent workflow um second is the complexity of the task recursive and this is an interesting um kind of point to think about right um if your task is you know to uh respond or provide a first draft to all of your uh unread emails in your inbox right you can think of the performance of the task almost as the quality of the first draft for each unread uh email in your inbox right right um and if you can break down your task into the kind of those sub pieces right and measure performance in your task by the performance in each kind of sub piece then you’re starting to get to a problem that’s that might be well suited for a multi-agent workflow right um and then third let’s think about how easily chunked is the task right oftentimes you’re not going to find that a you know a really difficult problem is you know so easily divided into you know nice little discretized units you know take for example you know you’re doing a complex question answering task over you know 3,000 PDFs that’s not a super easily disc like that’s not a super easily chunked task right you might think oh well I’ll just chunk by each PDF right but maybe you need to like pay special attention to certain PDFs and you could ignore entirely other PDFs well you know introduces some additional um considerations into the system and then finally uh and this is a concept that we’ve talked about if you’ve joined us for past uh webinars 
is um how many parameters do we need for sufficient performance and this is always going to go back to benchmarking benchmarking benchmarking right um if you are um deploying uh these multi-agent workflow any any LM application really but especially these multi-agent workflows in Enterprise you’d better make sure that um these that that you’re actually introducing a significant improvement over the Baseline and that you know how to measure that significant improvement over the Baseline um and you know maybe you get to that really significant Improvement by using a model like gp4 which is I think has been last reported like a a trillion parameter model but that’s using you know like a V12 you maybe you’re using like a V12 engine for your like your commute to the grocery store right um You might not need that many parameters you could be instead getting like maybe 75 80% 90% even of that performance on a fine-tuned you know 7 billion parameter model which is you know orders of magnitude less and could even run locally um you could get that performance on a much much lighter model and actually that that’s something that we’re going to be talking about next week at the the the Google Gemma uh webinar that’s a new family of low parameter count models which has demonstrated really astounding performance um and then you know final final point of consideration is well something interesting that we’ve seen possible with these multi-agent workflows that you can’t really do with just a single llm is that you can actually vary the amount of the parameter count of the uh supervisor LMS versus the sub agents right um you know traditionally you’re not able to do this you kind of have to make the decision to for how many parameters you want or basically the model you’d like to run before you actually run it but you know some some people have found really good success in you know a gp4 you know supervisor LM which is really really skilled at coordinating some sub agents right and then the sub agents are like seven billion seven billion parameter llms that are super specialized in a specific task you know maybe don’t have General performance across a number of different tasks but it doesn’t matter because the supervisor LM is going to know exactly the tasks that they’re good at send them out to do it and then the sub agent is able to run much more cheaply than you know a gp4 agent overseeing other gp4 agents okay great so uh we’re going to get into the live demo really soon here but I just wanted to talk a bit more about single store um and so single store is a database we call it a database to transact analyze and contextualize data in real time so um we run on Prem we run in the cloud and in hybrid environments um we have you know row based Storage inmemory storage um we are horizontally scalable uh infinitely scalable um we’ve also included a bunch more kind of data modalities over the last few years you know we’ve actually had vectors since I believe 2017 um and so we’ve we’ve been in this space for a while it’s not it’s not a fad for us um we have column columnar data um obviously Vector data which is what we’re interested in today um and most recently we’ve introduced um notebooks uh which are uh you know which we’ll hopefully get to show the demo in today uh but it’s just a really cool way to run um python notebooks or jupyter notebooks uh inside of single store and right next to your data um and you know able to talk directly to your database just some really quick St statistics on single store as 
an as an organization you know we’ve uh supported 3500 over 3500 users at Uber um you know we’ve had an average of 10 millisecond response time you know 10 million upserts per second you know analyzed over 1.2 million Smart Meters I won’t I won’t bore anyone with too much of the details here but just some quick stats all right let’s take a look at our demo for today um let me just double check we have been having some technical difficulties so I’m just going to double check that it’s all set to go I think you’re good to go Alex good to go okay great let me see if I can go and start sharing it all right fantastic um so let’s see um Matt are you able to see my screen all right yeah I can you could maybe click the plus uh let’s do that search twice yeah that’s looking better there you go great okay all right okay so uh so now we’re g to jump into a notebook um kind of looking at how to actually put some of these Concepts that we’ve talked about in practice um so really quickly um obviously you know we’re going to need to install some of these uh packages uh that we’ve you know we’ve seen come out in the community a lot um you know Lang chain we’re going to be using the Lang chain integration with single store today we’re also going to be using um you know autogen Studio Microsoft’s autogen studio um and then uh we’re going to be um supplementing um our autogen agents with some additional information about um actually a Microsoft Library called Flamel right and the the details of it aren’t super important what’s important to remember here is that um we’re going to be deploying an agent to essentially go look over our um or we’re going to be emulating a software team right which is going to deploy a software engineer um a product manager and uh you know a retrieval agent to basically come up with um a response to a user question right right so you might be thinking oh that’s kind of interesting right because usually when you’re asking an llm to you know write some code for you or something like that you you’re presumably just asking one LM maybe you’re just asking chat jpt or you’re asking jp4 and it usually doesn’t actually have access to your code base right um but here we’re actually going to define the structure for you know a software team and uh we we’ll get right into the details of how that’s actually all defined so so uh let’s see here um uh oh we will be sharing out this notebook with everyone after the uh webinar so just stay tuned for that I believe it’s going to be going out to everyone in emails um but yes you will have access to all this code whoops I’m G to have to deactivate that uh open AI key um so uh we’re just installing some packages here and then um to start off we’re just going to be um defining the retrieval architecture for our retrieval agent right and so Lang chain or single stour uh first party integration into Lang chain makes this super super easy um what we do here is we essentially read in um the uh markdown file which is going to provide some documentation about um you know what the context that we’re interested in and we’re going to load that into a single store database and you’ll be able to see that actually here I’ve already I’ve already preloaded all of this um but if you run this code for yourself and you um run this on your own single store database you’re able to also run this on a single Store free share tier um if you run this yourself you’ll be able to see it populate your own database but um I’ll just pull pull up the database here this is what the 
data looks like um let see oh it’s it is automatically suspended as of right now so I will actually I what I’ll do is I’ll just go ahead and spin it up and then I’ll uh we’ll come back to it at the end we’ll give it a minute to warm up let’s go back to the notebook here awesome okay so um essentially what we do when we load in this data into single store is we’re going to split up our markdown file or whatever um data that that we’re interested in really um Lang chain is really cool because it supports all these different document loaders um it doesn’t just have to be marked down you know there’s PDF loaders there’s uh Json loaders you can load in um you know web URLs you can load in you know a notion page um if you go look at the Lang chain Community uh website and you look at their documentation you’ll see just a really extensive list of all of the document loaders that they have and you can use any of those with single store which is uh which makes you know dealing with all of your data um really really easy um but essentially what we do is we take the data that’s been chunked up right by our data loader and the text splitter and we vectorize it with um not single store but we vectorize it with um in this case the open AI embedding model which is just going to be you know text Ada number two or something like that um and what’s important to understand here is that when we’re we’re vectorizing data we’re essentially converting um data into a um llm understandable format um and what’s more important to actually remember is that you need to use the same embedding function that you um vectorize data on ingest with as when you query your data later right um because you can think of it as trying to fit you know a square peg and a round hole if you use a different um embedding function uh when you’re ingesting the data compared to when you’re querying the data you’re going to get really wacky results and so really important to keep those to the same it looks like our looks like it just spun up so we can connect it here too see we’ll give it another second um and so this this this cell is really what’s going to um in one line of code combine all those Concepts that we just talked about it’s going to create an instance of a single store database as a vector store um by taking in all of these documents that we’ve now split up you can define a custom table name or you can leave it default and we’re also using the embedding function that we’ve defined earlier I’m actually going to really quickly take a look at the Q&A just to make sure I’m not missing anything so okay looks good for now awesome okay um we will have time for more Q&A at the end um so here we’re just going to be installing the uh retrieve chat module of uh autogen Studio the the python SDK of autogen studio um and then here we actually Define a custom single store retrieval agent right um this is a you know a work in progress I think if we see enough interest in autogen we’ll you will definitely offer a first class first-party integration with autogen studio um for now this is a really great place to get started um but essentially this is some just some custom logic that I wrote that um creates a an agent in autogen studio which is able to query and uh which is able to essentially retrieve documents from your single store uh Vector store right um the details of it I’m not going to get into too into the weeds of because hopefully everyone is you know more interested in the applications um than the actual implementation of like uh you 
know like a data integration but you know if there’s questions about that I’m happy to dive into this um but at a high level what it does is it just wraps around the uh Lang chin integration with single store um and perform similarity search um whenever uh the agent is requested to uh to retrieve the most relevant context for the conversation that the agent is in so that’s that’s what it does at a high level um but once we’ve defined that agent um let’s see let’s delete this [Music] again um we can finally get into uh actually Divi defining the multi-agent architecture so who Okay that was a lot to get to the meat and bones all right let’s see here all right and stick with me here it’s it’s going to get a little um it’s going to get a little hectic um so do put any questions you have in the chat um but hopefully this is clear enough at the end um first before we do anything we have to Define what llm model or what llm provider um the um agents are actually going to use and here we’re just going to be using uh you know GPT 3.5 turbo obviously you can um and this is a across theboard configuration that we’re going to be using for every uh agent but you’ll see here um we can actually Define an llm config argument for each agent right um I’m getting a little Hut of myself here but the product manager agent the code reviewer agent um the python engineer agent um you can see some of these Concepts come in of where we could potentially you know think about how to tune um the number of parameters or the really the model that we’d like to use for each agent maybe it’s more really important that we give the python engineer a gp4 you know model for for example but actually maybe we only need to give the you know the supervisor agent uh you know GPT 3.5 because it’s it’s not a super taxing task to be able to delegate between just like you know four members of the team or something like that um you can kind of see some glimmers hopefully of where um you know these Concepts can hopefully apply to your own use cases right but anyway uh going back to what we were talking about all the way at the start part of the the demo um essentially what we have here is um a boss agent who is the proxy for the user it essentially is the agent that relays the users’s intent to the rest of the agents um we also have this boss Aid agent which um uh is which augments the boss’s context with um information from our single store Vector store right so um in the case that we need extra content to be retrieved like for example we’d like to you know pull in some code documentation from when uh from that was created after the llm was pre-trained then we want to use this uh single store retrieve user proxy agent um and here we also important to note that we’re uh also going to give it the single store database as an argument which is uh different from the normal uh retrieval user proxy agent if that’s what you’re familiar with from um autogen studio um finally we’re going to Define just a few other um agents who are going to kind of work in tandem work with all this information to uh give us hopefully a really great final output we have a you know senior python engineer agent who has you know a system a system message and then a description so the system message is the message which is appended to the top of of this agents generations to every single one of this agents generations and the description is the a description of the agent which is given to the boss agent right which then is able to understand hopefully okay I need to 
give this task to the senior python engineer I’m going to give this task to the product manager much like how um you know in a in a corporation we see delegation among um you know different tiers of the organizational structure great so um so we’re going to be showing uh just really quickly um two functions rag chat and no rag chat hopefully to show the power of being able to uh perform retrieval augmented generation versus no retrieval augmented generation I realize I’m running a bit of tight on time here and I really want to make sure that we get to as many questions as possible so I’m going to speed through the rest of this just a tiny bit um let’s see rag chat no rag chat um and then uh when we actually call um our rag chat uh you’ll see that we end up retrieving content um from our um from our single store uh proxy agent user proxy agent um to give additional contacts to the uh boss agent which is the agent ends up you know bubbling up the final response to the user right um all right and then finally we have this concept in autogen uh and this is this is really the the Crux of it um of a group chat right um just like how you might have a group chat with your friends um here we Define a group chat between you know the boss the product manager the coder and the code reviewer and um we also Define you know the max number of messages that are allowed to be sent between each other uh you know we can there’s a number of different you know speaker selection methods that we can Define that autogen has but in this case we use round robin so we just go round the circle everybody goes one at a time and um and then now hopefully we can see uh what this um uh what this group chat actually looks like so first we’re going to look at the no rag chat um and so to start off the boss you know asks how do you use spark for parallel training and Flamel give me sample code and this is uh you know the details of this we’re not going to dive too deep into but um again what’s important to know is that Flamel is a library which was built by Microsoft but actually built and defined uh or this relevant documentation was defined after uh GPT 3.5 was pre-trained and I I believe GPT 3.5 was pre-trained last in like it was it was last updated in probably March March of 2023 it was released in November 2022 and then they they’ve updated its knowledge up until like around March 2023 um though don’t site me on that um but well before Flamel was actually um well before Flamel was actually um was actually defined here and so when we see the python engineers’s response it’s actually totally hallucinated um we see you know some you know OB like this is the problem that obviously with hallucinations um and that that that hopefully everyone has run into already and you so you’ve seen the kind of motivation behind retriev augment generation but these responses look actually very plausible if you’re not familiar with the relevant documentation at hand um and so you know they can cause a lot of headache right because you you might even run this code which I do not recommend uh I definitely don’t recommend running any code from llms without reviewing it and really understanding what it does um but you might even run this code and you know get all these Arcane error messages um and You’ be like oh what the heck it’s like it’s supposed to be um you know this LM is supposed to be all knowing um I’d really like to um you know just get good responses out of it without having to you know mess around with it right and so obviously you 
know we um you know we we’ve got a completely hallucinated response here um and the code reviewer in this case after the python engineer responds just responds to terminate the group chat uh because it it it says uh or it presumes that this is actually sufficient uh as an answer now let’s actually look at what happens when we use rag chat right um now rag chat is uh going to look a little different um the boss assistant says that it tells the retrieval augmented user proxy agent pulling from single store that um you can uh actually retrieve knowledge and um use that knowledge in your response so um the so here the boss assistant is calling the user the retrieval user proxy agent it gives it the user’s question and then we pull up actually the context the relevant context which is going to be important for um actually generating you know valid Flamel code right we even get a little code snippet um you know we get this is all still context um that’s been retrieved from retrieved from single store we get you know some additional information about you know how you know Flamel is integrated spark you know um it you know has some additional information about you know how how to actually use it in an example with um uh you know python data frames and then all so we’re going to scroll down um and then we have the product manager responding uh after getting this um this additional context um you know interesting so this is why benchmarking is important here but uh interesting how the product manager responds here and not the uh python engineer right but the product manager you know obviously is still running with the same base model and so it’s also able to generate code and and it ends up giving um you know a code snippet which is um actually valid for running parallel training in Flamel using spark right and that’s because it has um all of this context right all of this context which the boss assistant um you know prompted the retrieval user agent uh the single store retrieval user agent to actually um pull from okay wow great great that was a lot and then you see at the end here the python engineer terminates conversation because it you know deems that you know this is sufficient for an answer for the user now we can get into some Q&A um let’s see AAL do you want to maybe share with me like a a theme of some of the questions that have come up so far yeah Alex lots of questions about the agents themselves you know are they tools for example you know sub agents you know what can you talk to the internet using them a whole bunch of stuff there um do you have any thoughts on that Alex I mean you know could you sum summarize that let me let’s pretend you’re chat gbt could you summarize that in like one paragraph okay great so tools tools is um that’s a really great um concept that’s come out in the last you know few months uh you know open AI has come out with function calling um but the kind of limitation of tools is that um you can really only expose so many tools to a single llm before its performance really degrades right and so for those that aren’t familiar tools are essentially like actions that we make available to llm like you know for example browse the browse the web or run code or generate an image like if you’re if you’re using gp4 today you know it already has access to tools like like all those tools that I’ve just mentioned actually but when you extend past maybe like three four five tools um the llm just really can’t reliably pick um which tool to use and so that’s where the kind of 
beauty and uh extensibility of Agents really comes in um you can give each agent access to like maybe just two or three tools that really good at picking between or maybe you even just think of each agent at least in this example of just having one tool right and then um you have a hierarchy of Agents right that ends up picking um picking out sub problems to solve that the agents can go out you know and solve a well-defined Sub sub problem and then once you have that piece solved you have you know the pieces necessary to solve a higher level sub problem and so on right and so in that sense you can maybe even have you know if you think of the boss agent as having access to the tools of everyone you know below the boss agent or every not everyone but every agent below the boss agent then you could effectively have access to you know an infinite number of tools you know obviously in practice that’s not feasible but it gives you access to obviously you know definitely many more than like four or five tools before a single lm’s performance degrades yeah one quick observation Alex the uh spark code you were showing a little bit earlier absolutely uh I’ve just been working with a bit of spark recently that code looks as though it would work you know it’s uh very very accurate I mean I didn’t have a chance to see it all in detail but it looks pretty good so but has to be tested obviously you know make sure it does actually do do what it says on the tin yeah yeah yeah that’s can’t I really can’t emphasize that enough obviously all these demos are really cool but do not you know just like you shouldn’t run with scissors don’t don’t go out running um you know LM generated code on your you know production environments without first testing yeah I think there’s a couple of great questions there in the Q&A as well Alex I’m not sure if you you’re able to see that all right let yeah let me let me pull up some of these questions okay so from our our pitha we have are the tasks being routed to PMS Engineers Etc um uh are they agents are as the tasks are being routed are as are the agents able to respond back with the right answers um well your mileage will vary obviously in this demo we we’re showing hopefully the golden path scenario um but um you know benchmarking is going to be really important here right so you’re going to want to make sure that um you’re setting up you know uh these uh reliable indicators of performance uh especially if you’re using these um multi-agent workflows in production right uh before actually deploying them um I think it’s actually especially important in multi-agent workflows because you do have you know potentially more points of failure right you’re going to want to make sure each agent is doing what they should then that the whole workflow as a whole is doing what it should right um maybe if it’s a really fragile system just one agent failing and hallucinating a response could potentially you know um that could Cascade up to the top and give you know give you a bad response up at the top so important to Benchmark and and and test here okay so let’s see um let’s see why we need agents for this so from shavan we have why do we need agents for this uh when we could just do rag plus one single llm call actually really good question um this hopefully you know you can do you can do essentially this entire workflow um in uh just what Shin mentioned one single rag call so why do we do why do we bother with multi workflows in the first place and that’s because this is extensible 
across an entire hierarchy of agents and sub agents right so it’s very possible that we want to work with for example um multiple databases uh and we want to work with um you know maybe even you know we want to work with not just uh context from a single store data store right but we also want to work with you know web search information and we want to be able to execute code and we’d like to ingest we’d like to be able to also like listen to audio and watch a like have the LM like a multimodal LM watch a video right if you if you have a complex task like this um you’re going to very quickly find out that delegating all to a single LM is is going to turn sour very very quickly and so uh the purpose behind this webinar is hopefully to give you the tools to extend um to you know your production kind of or Enterprise grade kind of problems and environments right okay so let’s see so Kristoff asks do all agents prompt messages get updated after each turn or does the boss decide which agent which agent needs to know something if all gets updated doesn’t that make it financially unsustainable with n agents for one problem you may be spending n amount instead of one um that’s a really really good question um I actually don’t know the exact answer to that um I believe what happens today is you you preset The Prompt messages um uh before running the whole group chat and then you run the group chat and the prompt messages stay fixed um during the uh group chat itself uh because I think that I mean that does sound like a pretty pretty comput computationally expensive problem of of updating the prompts after each turn so that’s that’s how I understand um the autogen group chat to work today okay great uh so we have another question from um kulan who asks how are you thinking about front end for autogen to make it usable by business users who don’t have technical knowledge so really really great question here um I’m actually going to share out uh let’s see I’m going to share out a GitHub library for everybody [Music] rag now I I’ve I’ve only played around with this a little bit um but um let’s see oh strange okay I cannot find this example um but I will send it at the chat at the very end if we’ve got time um but you can use this Library called gradio which is a really great open source Library um which basically makes it super simple to hook up these um llm chat flows into a front end experience so you can see like in a much more traditional chat interface you know the agents responding to each other in turn um in a you know in a you know hopefully more non-technical friendly um interface and that’s something that you can you know get up and running pretty quickly after you have everything running in code okay let’s see okay so um so from Rebecca we got uh Alex I love your presentation I wish there was more time for your descriptions it’s valuable walk through the code however I was expecting an actual demo would have been great to see how the agents work um no really great really great um Point Rebecca um just for the sake of time and for this presentation you can see we’re already running up almost on the end of the webinar um we pre-baked some of these responses or I ran the code before um we actually did the webinar um so you’re free to go and actually uh run the python notebook yourself we’ll be sharing out all the code and all the relevant details uh after the presentation um but yes just just so that we can get to as many questions as possible and hopefully clarify as many questions as possible 
great okay we’re running up on time here I’m going to go a little further back um so from Sarah Wilson we got how do you go about defining your agent hierarchy do you explicitly state it in your agent instructions or is it defined in some other way so really great question so you actually do end up defining this uh in code I’ll see if I can actually point out where this is defined see so I’m going to try my best to explain this code uh though I know I have some just personal opinions on how autogen um you know their uh their SDK could be improved a little bit um essentially uh what we do is um for the boss when we retrieve content um we’re actually going to uh like the boss is actually going to give what autogen defines as a problem or problem statement to the boss uh Aid and then when the boss Aid gets that problem statement then it goes out into the context store and actually retrieves the information um and then after so after that so that’s that’s the kind of boss boss Aid um kind of line of communication we also have um some uh we also have a lineup communication between the PM the coder and the reviewer um and so they basically kind of see each other um as functions so going back to like tools um like the PMS coders and reviewers can call each other as tools to um you know the PM can ask for can you know make a problem specification and then ask the coder to code it up the coder can ask the reviewer to um review its code right they all kind of see each other uh in their kind of uh prompt as tools that they can actually call each other up for and then finally the boss is the one that’s in communication with the PM so the boss gives you know the problem and the context you know to the PM and the PM is the one that actually draws up the problem statement in this case in the example we actually saw that the PM just went straight ahead and coded up the solution um you know uh but in the kind of Ideal scenario we would have seen that the PM actually ends up calling the coder as we’ve as we’ve intended it to um and then the coder calls the reviewer but you know as as probably many of you have seen already um llms are inherently non-deterministic systems and so what’s really important as you know you know I I probably sound like you know a broken record at this point but what’s really really important is benchmarking your outputs and making sure that your outputs are up to a certain um improve a certain amount of improvement over the Baseline and ensuring that that’s repeatable great I think we’re running up on time here yeah I think we should probably go ahead with the Apple airpods announc that’s more exciting before before I I announce that name just quick reminder Monday we’re going to be doing that Google Gemma uh webinar come come check it check it out we got a really fun demo and code share planed that’s that Google Gemma the new uh lightweight model Open Access model uh it’s really cool come check it out on Monday I just put the uh registration Link in the chat and I hope to see you there now the announcement you’ve been waiting for today’s airpods goes to Sam swamy Nathan the senior manager of engineering at US Bank congrats Sam thank you for joining if that’s not you don’t give up we’re going to give out one more airpods by the end of the day to someone who has tried out Alex’s demo here we’re going to send out the uh the link for this notebook and the video recording of this session so give it a chance and hopefully you’ll be the next airpods winner thank you so much Alex for an 
awesome presentation today uh thank you to our audience for joining us enjoy the rest of your day and your weekend thanks everyone bye-bye thanks everyone