How to create AI agents that don’t suck


Creating AI agents that stand out for their performance and utility doesn’t need to be a daunting task. With careful planning, the right tools, and a comprehensive approach, even novice developers can create AI agents that are both efficient and powerful. Here’s your guide on how to make AI agents that don’t just perform well but are also engaging and helpful.

Introduction to Effective AI Agent Development

AI agents are built to automate tasks ranging from simple data retrieval to complex problem-solving processes. A well-developed AI agent can significantly enhance user interaction, automate routine tasks, streamline operations, and offer new insights. The challenge lies in creating agents that are both reliable and effective in real-world scenarios.

Understanding The Core Principles

Every good AI agent starts from a solid grasp of AI principles and techniques. Developers need to combine AI methods (machine learning, natural language processing, and so on) with software development and system integration skills.

AI Models and Algorithms

Start by selecting the right AI models. Whether the task involves natural language understanding, predictive analysis, or something else, the choice of model greatly affects the agent’s performance. Libraries such as TensorFlow and PyTorch can be used to build and train models.

Data Quality and Management

High-quality, well-managed data is crucial for training AI agents efficiently. This involves gathering, cleaning, and structuring data in a way that optimizes the learning process for AI models. Data management also involves constant updates and maintenance to adapt to new information or changes in the existing datasets.
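As a rough illustration of the cleaning step, the sketch below normalizes whitespace, drops empty rows, and removes case-insensitive duplicates from a list of text examples. The function name `clean_examples` and the sample data are assumptions made for this example, not part of any particular toolkit.

```python
import re


def clean_examples(raw_examples):
    """Normalize whitespace, drop empty rows, and remove case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for text in raw_examples:
        # Collapse runs of whitespace and trim the ends.
        normalized = re.sub(r"\s+", " ", text).strip()
        key = normalized.lower()
        if not normalized or key in seen:
            continue  # skip blanks and repeats
        seen.add(key)
        cleaned.append(normalized)
    return cleaned


# Hypothetical raw data, as it might arrive from a support inbox.
examples = ["  Reset my password ", "reset my   password", "", "Where is my order?"]
print(clean_examples(examples))
```

Real pipelines usually add more steps (schema validation, label checks, scheduled refreshes), but the same shape applies: a small, testable function per cleaning rule.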

Utilizing Development Frameworks and Tools

To build effective AI agents, developers can leverage various frameworks and toolkits designed to simplify the creation and management of AI functionalities.

Framework Selection

Choose a development framework that best fits the project requirements. Frameworks like Google’s Dialogflow, Microsoft’s Bot Framework, and open-source alternatives like Rasa can provide robust starting points for developing conversational agents.

Integration With APIs and Microservices

Incorporating APIs and microservices can extend the functionalities of AI agents, allowing them to perform tasks such as retrieving information from a database, integrating with other software services, or managing requests and responses in real-time.
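One common way to wire APIs into an agent is a registry of named tools that the model can request by name. The sketch below is a minimal, framework-free version of that idea; `TOOLS`, `tool`, `dispatch`, and the fake order lookup are all hypothetical names invented for illustration, and the call format is an assumption rather than any vendor’s schema.

```python
from typing import Callable, Dict

# Registry mapping a tool name to the Python function that implements it.
TOOLS: Dict[str, Callable[..., object]] = {}


def tool(name: str):
    """Decorator that registers a function as a callable tool."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


@tool("multiply")
def multiply(a: int, b: int) -> int:
    # Exact arithmetic, instead of letting a language model guess the digits.
    return a * b


@tool("lookup_order")
def lookup_order(order_id: str) -> dict:
    # Stand-in for a real database query or HTTP request.
    fake_db = {"A100": {"status": "shipped"}}
    return fake_db.get(order_id, {"status": "unknown"})


def dispatch(call: dict):
    """Run a tool call shaped like {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


print(dispatch({"name": "multiply", "arguments": {"a": 7919, "b": 7907}}))
print(dispatch({"name": "lookup_order", "arguments": {"order_id": "A100"}}))
```

Keeping the registry explicit makes it easy to see, and to restrict, exactly which actions the agent is allowed to take.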

Testing and Evaluation

To ensure the AI agent performs well across all expected scenarios, comprehensive testing is needed. This includes:

  • Functional Testing: Testing the individual functions of an AI agent.
  • Integration Testing: Ensuring all integrations with APIs and microservices work seamlessly.
  • Performance Testing: Testing the speed and responsiveness of the agent.
  • User Acceptance Testing (UAT): Validating the built AI agent with real-world users to ensure it meets user requirements.
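The first three kinds of test in the list above can be sketched with plain assertions. The example assumes a toy `StubAgent` and a `word_count_tool`, both invented here as stand-ins for a real agent and its tools; the 0.5 second limit is an arbitrary threshold chosen for illustration.

```python
import time


def word_count_tool(text: str) -> int:
    """A tiny tool: count whitespace-separated words."""
    return len(text.split())


class StubAgent:
    """Stand-in for a real agent: routes one kind of request to one tool."""

    def run(self, request: str) -> str:
        if request.startswith("count:"):
            return str(word_count_tool(request[len("count:"):]))
        return "unsupported request"


def test_functional():
    # Functional test: the tool on its own.
    assert word_count_tool("hello agent world") == 3


def test_integration():
    # Integration test: the agent and the tool working together.
    assert StubAgent().run("count: one two") == "2"


def test_performance(max_seconds: float = 0.5):
    # Performance test: a large request must finish within the time limit.
    start = time.perf_counter()
    StubAgent().run("count: " + "word " * 10_000)
    assert time.perf_counter() - start < max_seconds


for check in (test_functional, test_integration, test_performance):
    check()
print("all checks passed")
```

The `test_` naming means the same functions can be collected by a test runner such as pytest. User acceptance testing is not shown because it depends on real users rather than code.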

Continuous Learning and Optimization

AI agents should improve over time. Implementing continuous learning mechanisms, such as reinforcement learning, allows AI agents to adapt to new data and evolving conditions. Monitoring the performance and making iterative improvements based on user feedback and interaction data can greatly enhance the agent’s accuracy and effectiveness.
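Monitoring can start very simply. The sketch below tracks a rolling success rate from user feedback and flags when it drops below a threshold; it is not a reinforcement learning implementation, only the measurement step that any improvement loop relies on. `FeedbackMonitor`, its window size, and its threshold are assumptions chosen for this example.

```python
from collections import deque


class FeedbackMonitor:
    """Track recent task outcomes and flag when quality drops."""

    def __init__(self, window: int = 100, alert_below: float = 0.8):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.alert_below = alert_below

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    def success_rate(self) -> float:
        if not self.outcomes:
            return 1.0  # no data yet, so nothing to flag
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self) -> bool:
        return self.success_rate() < self.alert_below


monitor = FeedbackMonitor(window=5, alert_below=0.8)
for outcome in [True, True, False, True, False]:
    monitor.record(outcome)
print(monitor.success_rate(), monitor.needs_review())
```

A flag from `needs_review` would then trigger whatever improvement step the team uses, such as prompt changes, new training data, or retraining.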

Deployment and Scaling

When you move from development to deployment, plan for how the AI agent will scale. Make sure the infrastructure can handle increased load, and consider cloud services, which can provide scalability and robustness.
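One small piece of scaling is limiting how many requests run at the same time, so that a burst of traffic does not overwhelm a model provider or a downstream API. The sketch below uses Python’s `asyncio.Semaphore` for that; `handle_request`, `serve`, and the sleep that stands in for a model call are hypothetical.

```python
import asyncio


async def handle_request(request_id: int, limiter: asyncio.Semaphore) -> str:
    # At most `max_concurrent` requests pass this point at once.
    async with limiter:
        await asyncio.sleep(0.01)  # stand-in for a model or API call
        return f"request {request_id} done"


async def serve(n_requests: int, max_concurrent: int = 5):
    limiter = asyncio.Semaphore(max_concurrent)
    tasks = [handle_request(i, limiter) for i in range(n_requests)]
    # gather returns results in the same order as the tasks.
    return await asyncio.gather(*tasks)


results = asyncio.run(serve(20))
print(len(results), results[0])
```

In production the same limit is often enforced by a queue or an autoscaling policy instead, but the principle of bounding concurrent work is the same.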

Ethics and Compliance

Last but not least, developers must adhere to ethical guidelines and ensure compliance with all relevant laws and regulations, especially those related to data privacy and security. This builds trust and reliability in AI applications, promoting wider acceptance and use.

Summary

Building AI agents that “don’t suck” requires a deep understanding of both the technology and the human interaction involved. Developers must focus on the end-to-end lifecycle of AI development, from concept and data handling to development, testing, and deployment. By following best practices and using modern tools and frameworks, teams of all sizes can create effective, efficient, and reliable AI agents.

[h3]Transcript[/h3]
Hello, hello, welcome everyone. “How to create AI agents that don’t suck”: that’s the title of the workshop we’re doing with Alex Reibman (I don’t know how to pronounce your last name, but we’ll look into it), founder of AgentOps. He’s supposed to be with us here today right at 11:30. Assuming he’s busy, he might send an agent that doesn’t suck to join the call, so we’ll just try to buy some time till he can make it. This is part of the workshop series for the Beloud and Build hackathon, and I’ll pass it on to Bahar.

Hello everyone, I’m Bahar, co-founder of Beloud. Porya is also a co-founder of Beloud; he forgot to introduce himself. As Porya mentioned, we are hosting the loudest, biggest AI hackathon, which is global, virtual, and hybrid, this week, in exactly two days. There is not much time left, but you can still register right now, and we will share more about it. I’m just waiting for other people to join in slowly, and then we can start the presentation.

Yes, so this is a live stream going directly to LinkedIn, YouTube, and Twitter; pick your poison. If you want real-time chat, Twitter is your best option, and we can see all your comments over here. Some people said registrations were closed for the hackathon, so we opened up a few more slots. If you want to join, go for it. There are two in-person locations, possibly one more in SF, which we’re confirming today: Vancouver, Canada, and London, UK, so far, and quite possibly San Francisco in the Bay Area. There’s also virtual, but the catch is that if you want to participate virtually, you need to have one person physically in one of the locations: Vancouver, London, or possibly SF, which we’ll confirm. It’s one of those new experiments we’re trying out. I’ve participated in two or three hackathons so far and never hosted one, so it’s
the first time for everything. But a common theme that was going around is: how do you gamify the hackathon? How do you get people to be more engaged and make it more inviting? Something new we’re trying is live streaming from day zero to day three of the hackathon, full on, from when people are pitching their ideas up on the stage, virtual or in person. We’ll be going around with a Rode mic that we got from Amazon, which was supposed to be delivered today (I hope they’re not late), and interviewing people. We got a big rig that has a light and a mic, so you can go around and interview people. We’re going to do that for the pitching day on Saturday and for the co-working and hacking, where teams are in pods of two to five, going around to see what they need help with and what idea they’re working on. Is it a billion-dollar idea? Who’s the CEO, the web person, the coding guy, the content creator? They’ll also share about their project live up on the stage.

Yeah, as Porya mentioned, we’re also using a very, very cool platform called soor for everyone who’s joining us virtually. It’s literally like The Sims: you will be interacting and co-working together, joining the sessions and the workshops, and having office hours, all in that gamified game scene. It’s so interesting. We can search it up right now, but you will all have access to it, and you will spend a little less than three days of your life there over the next two days. Also, as Porya mentioned, we have an emphasis on building a product that you can find users for in three days. As you know, there are a lot of products out there that have a very good back end and front end, but the problem is that they don’t have any users, and that’s as if they don’t exist. So it’s really important to find users. For that reason, we are inviting content creators, creatives, and designers to join every single team
and help you with exposure and with creating content about the idea you’re working on from day one. In these three days, are you going to build a startup, or are you going to fail really fast at something in just three days? Both would end up being good. I tell everyone that what you can accomplish in these three days would take you six months outside of them to accomplish, execute, or learn, so it can be a good thing.

Yeah, so I just tagged Alex in the chat. Okay, Alex is trying to join the call but doesn’t have the link. I will email you the link, Alex, if that’s okay. So, as Bahar mentioned: one idea, to an MVP, to a user. Part of the reason we started this is that within our network, and in the stuff we were seeing, a cool thing like this wasn’t happening. Secondly, we saw an interview that Sam Altman did, where he was saying that a small team can make the next billion-dollar startup using AI. We were like, come and do it in three days. Let’s challenge you to it, and let’s show the world the power of AI as we push the boundaries all together. So the theme is “build in AI or using AI”, fairly general, just so you can see that it’s, well, not easy, but doable to make a product and find a user for it in three days. People have done it before in hackathons by AGI House, I think, either the OG one or the not-OG one, and in other hackathons, so it’s totally possible. I was talking to a content creator who had something like 100K or 400K followers on Instagram, and he’s participating in the hackathon as well. He said, “My goal is to go from one idea, to an MVP, to being acquired,” and he was saying that’s ambitious enough. I’m like, dude, we could totally help you with that. That’s part of the reason why I reached out to Wefunder and want to reach out to the Acquire.com founder. If they’re watching this, which I don’t think they are, I’ll reach out to you. We do want to put the teams’ funding pages on the live stream on the spot, so if people from the public would like to support a project, they can just transfer the money. It’s going to be quite fun. We’re super excited for it.

That’s great. I think it will take a couple more minutes for Alex to get here, and we can start a presentation. What do you think, Porya?

Sure, yeah, we could do that. I just hope it’s the real Alex showing up and not his agent, which at this point I’m assuming could be either, but we’ll see. Go for it, Bahar.

Awesome. So, as you know, today... oh, it’s great, Alex is here, so we’ll invite him first.

Okay, can you hear me? Am I in?

Yes, we can hear you. Welcome, Alex. Just to confirm, this is the real Alex and not Alex the agent? Okay, perfect. We’re just trying to kill time, and we’re telling people about the hackathon. We’re super excited to partner up together on it. We’re going to do a brief one-minute thing about Beloud and the hackathon, then we’ll pass it on to you.

Sounds good.

Okay, perfect. Awesome. Welcome, Alex, so happy to have you here. Hello everyone, thank you for being here. Thank you, Alex, for spending the time; it was so last minute. I love this fast speed of setting things up, so thank you so much. It’s so different from other companies and organizations, and I love that. Thank you to everyone who’s here. It’s 11:40 a.m.
in the morning on a Wednesday, Thursday, and I’m so thankful to you for being here. I know you might be busy and have things to do. So, a little bit about what we do. Beloud is a not-for-profit organization. We help startups go from zero to finding their next 10K customers or raising funds. We have a community of founders, builders, creators, and investors. In the accelerator, we help startups grow; in the studio, we help startups get built; and we host workshops and events, such as the hackathon and workshops like this one, almost every week for the community. We’re on a mission to empower the next generation of creators, builders, and investors shaping the imagination age. But what is the imagination age? If you remember the time when emailing and texting were getting popular, the time of Facebook, all of that is the Information Age, and that’s in the past. Right now we are in the imagination age: the age of AI, Web3, AR, VR, OpenAI, ChatGPT, all of these cool technologies that are giving us abundance and access to the resources to realize ideas and get things done. So we thought, now that you can go from a prompt to anything in a matter of seconds, why not test this and see if you can build a billion-dollar startup in three days? And you’re not going to be alone. You’re going to have team members from diverse backgrounds: someone from marketing, someone who’s a developer, someone who’s a designer. We’re trying to create this environment for you to be able to create this billion-dollar startup. We started working on Beloud and Build AI a little less than two months ago, and we are here now with Alex. Please join in, and if you have friends or colleagues who you think might be interested, invite them. It’s going to be fun. It’s global, virtual, and in person, with two on-site locations, one in Vancouver and one in London. We are so excited to see you there. We have also hosted events before in
Vancouver. If you’re in Vancouver, join us in person as well. Most of our events are both in person and virtual, to be more accessible to everyone in the community. Thank you for being here. You can follow us on socials at Beloud XYZ. And we have Alex here: the stage is yours, and thank you, everyone.

Okay, awesome. I am going to figure out how to share my screen here.

At the bottom of the screen there’s that plus button, and then we can add it to the stage. Quick note: again, thank you so much for coming, Alex. We literally set this up last night, past midnight. I don’t know if you stay up much; I don’t usually stay up at night, but thank you so much for doing it.

It’s hard to sleep when you’re in such an exciting field, so I’m always happy to tag along and do this kind of stuff. Cool. So I’ll give a brief overview. About myself: my name is Alex Reibman, and I am one of the founders of AgentOps. AgentOps is an observability, testing, and debugging platform for building AI agents. So what exactly does that mean? What exactly is an AI agent? This is the question that everyone’s been asking. Sam Altman was talking about it, Bill Gates was talking about it. What’s going on with AI agents? Well, they’re really popular, for one. Early last year there was a project called AutoGPT, and it rose to become one of the most popular projects ever on GitHub. I think in the span of a few months it had almost 100,000 stars, and it’s well over 100,000 now. The idea was: what if you took ChatGPT and gave it the ability to program itself and build itself out, really turning a large language model into an autonomous program? There were some other projects that came out of this as well. There was Jarvis, an idea by Microsoft for a kind of internal AI assistant, akin to
the one from Iron Man, I think. And probably the most popular one, which everyone’s heard of, is BabyAGI. BabyAGI was actually a project by Yohei Nakajima. He’s a hobbyist developer; he’s actually a venture capitalist full-time. The idea of BabyAGI is that you give an agent a high-level task. You say, “Hey, I want you to find cool hackathons in San Francisco,” or “Tell me about Beloud,” or “Search the web and do whatever.” It constructs a task list, and with that task list it tries to execute tasks using large language models, vector embeddings, and a set of tools. This was in the very, very early days, probably March, right after GPT-4 was released on March 14th, 2023. People started playing around with this stuff and it really blew up.

So the idea is that you don’t just have regular LLM calls and RAG apps anymore; you actually have self-driving programs. The simple analogy, which I use to describe agents all the time, is this: we have software for self-driving cars. We have Zoox, we have Waymo, we have Cruise, and so on. If you’re in San Francisco, you see these things all the time; if you’re in other parts of the world, they’re coming, I promise you. But now we have this idea of a self-driving program. You have the AutoGPTs of the world; you have Devin, a self-driving software engineer; you have Lindy, which is kind of a self-orchestrating executive assistant. The broader thesis, at least the way that we think about it, is that 90% of programs in the next decade are going to be autonomous. They’re not going to be written by a human; they’re going to be self-orchestrated and self-designed, and they’re going to fundamentally change the way that we interact on the web. Right now we have APIs and we have developers, and developers talk to APIs. But soon you’re
going to have agents. You’re going to have developers talking to agents, then agents talking to APIs, and then agents talking to agents. It’s going to be a whole new construct, to the point where the way we think about computing, as a standard service-based approach, is going to be totally different.

This is aimed at being a little more high level, for folks who are a little less into the agent space, so I’m going to show off some examples of some really cool agents. There’s a paper that came out last year that I thought was really mind-blowing, where a team at Nvidia created an agent called Voyager. Voyager used the game of Minecraft as a simulator. In Minecraft, you are a blocky character; you go around a blocky world and you explore, hunt creatures, craft items, and mine things. The agent was powered entirely by GPT-4, and what they did is give it the ability to teach itself skills. You put a player in the world that knows nothing but has high-level objectives like fighting enemies, crafting items, or mining for resources. It would iteratively write its own code, and if the code was successful and doing a good job, it would save that code to a skill library. That way, when the agent was in the same situation again, instead of having to think, reason, and design its own code, it would just pull a skill from the skill library. So if I gave it the task of starting a campfire, instead of thinking step by step about how to do that, it would say: I need wood, and I already know how to chop down a tree, so I will take that tree and make a campfire out of it. Or for fighting a zombie, it would say: I know how to fight zombies, I have a sword in my inventory, I will use my skill for creating my
sword and then use it to fight the enemy. What this ends up looking like (there’s a little video here) is the agent in action. This is entirely autonomous. There are no players playing this whatsoever; it just figured out how to play all on its own. What we’re seeing here are emergent behaviors of, again, self-driving programs that are able to navigate the world and solve problems and puzzles. It’s really remarkable. Okay, yes, it’s able to play video games, which are a simulation of the real world. Now just imagine 10, 20, 30 years in the future, when we have robotics actuators that are actually good enough: we might have autonomous robots able to, maybe not fight enemies, but chop down wood for us, start campfires, or mine for resources. This is where we start entering the futuristic age of Blade Runner and all that kind of stuff. There are all sorts of philosophical discussions about it, but we really need to see this as a glimpse into what the next generation of technology will look like, which will be AI agents.

Anyway, that’s enough of watching Minecraft. At a conceptual level (and there’s a lot of thought leadership online about this), a lot of people have been thinking about what the next abstraction beyond basic computers looks like. We all know we have CPUs and GPUs, storage, networking cables, and so on. Andrej Karpathy, the legendary ML researcher, came up with the idea of the LLM OS: your new CPU is your large language model, and your memory is basically your context window, that is, how many tokens, or how many words, you can fit into ChatGPT’s prompt interface before it runs out of memory. And sometimes you want to extend the memory, right? So you can actually have
an extended vector database for this kind of stuff. You have disk storage, like file systems and embeddings; you give it Wikipedia, so you don’t have to tell it to memorize Wikipedia internally. But more cohesively, what you want to be doing is adding tools to your agent: giving your agent actuators to actually interact with the world in some way. That’s where things get extremely powerful very quickly. The limiting factor here is probably going to be at the large language model level. We don’t know when GPT-5 is coming out (a lot of people are rumoring it could happen next month, so prepare your business), but the thinking is that if you’re able to create a powerful enough CPU for your agentic program, you can build some really cool stuff out of it, like self-driving programs.

So this is a high-level view of what agents are, at a diagram level, and I’ll break this down into more and more granular detail over time. Inside an agent, first and foremost, you have tools. The ability for your agent to actually do things is what makes it an agent. ChatGPT by itself isn’t really that smart. If I ask ChatGPT who the president is, it will probably know the answer; it’s just baked into the weights of the model itself. But if I ask ChatGPT to buy me a pizza, it will not know how to do that. It doesn’t have a way to interact with APIs, make phone calls, send emails, or use credit cards. You can give it these tools, and you can define them as basically user-defined functions. Say, “Hey ChatGPT, I know you’re bad at math, but I want you to multiply these two prime numbers together and give me the result.” Instead of trying to guess that number, which is what it does currently, you could just say, “Hey, use the calculator tool.” Or you ask ChatGPT, can you write the
Fibonacci sequence for me and resolve that at runtime? Instead of guessing the strings itself, it can use a code interpreter tool, or a search tool, and so on. We’re able to extend this. There’s some theory about infinite context window models; that may or may not happen, we’ll see. Maybe GPT-5 will really blow us away. For the time being there are a lot of hacks around it: we have disk storage, and we have vector databases like Pinecone, Weaviate, and so on. I highly recommend checking these out if you want to save documents in embedding space and then use them for retrieval to enhance your responses.

Around that, encompassing what the agent does, is the ability to take actions: not just predicting strings, but actually triggering some sort of function to invoke those tools. That is a challenging modeling problem by itself. A lot of folks are specializing in it; there’s a lab at Berkeley right now with a project called Gorilla that specializes entirely in building models that know how to read prompts and decide which tools to pick. So this is a very, very large component coming out soon.

Secondary to that, and this is the last one, is the planning mechanism. Planning is the ability for agents to actually think through problems. By and large, with LLMs there are little hacks where you say, “Think step by step to solve this problem,” and it performs a little bit better. Or sometimes you can offer a tip: “Hey ChatGPT, I will tip you $300 if you do a really good job,” or “ChatGPT, my grandmother is going to die unless you solve this task for me,” and you actually inch up the accuracy. That’s what the planning stuff is about. It’s not just prompt engineering, but sometimes constructing a task list, giving it
critics to actually criticize its goals and make sure that it accomplishes them the way you want it to, and also giving it sub-goals and decomposing those goals, and so on. This all helps the agent reason through problems, use the tools it has at its disposal, and solve the problem that you throw at it.

So here’s the mega diagram. What components does an agent actually have beyond these four core mechanisms? Underlying it all is your large language model. By and large, if you’re building an agent right now, I implore you to pick the best LLM, which is going to be GPT-4 in 99% of cases. I heard Claude is getting up there, but GPT-4 is going to be the best tool for most circumstances. The reason is that LLMs are becoming commodified. You have a bunch of big providers, and the asymptote is that they’re all going to be the same. In the same way that we don’t really care which gas station we get our gas from, we’re not really going to care which provider we get our tokens from. So for now, pick the best gas for your car, which is GPT-4, and worry about smaller, cheaper, local models later on, after you prove these things work in production. Honestly, it’s only going to be a couple of years until everything’s more or less the same, so just work with what’s best. It’s just swapping out a string at some point.

I’ll work from the bottom up so that we have a high-level description of what’s going on. The biggest idea is that having a workspace for your agent to execute in is probably one of the most important things. Agents by themselves... again, the reason why I chose this little creature here: this is actually a mythical creature called a shoggoth. The idea behind a shoggoth is that it’s a Cthulhu-like creature that is developing the tools that orchestrate
the end of the planet, but we have a little smiley face on it, because that’s what reinforcement learning from human feedback does: it aligns the models, in a way. But these things are actually quite dangerous. If you tell it to buy a pizza, how do you make sure it doesn’t buy 500 pizzas? There’s a famous case where they gave an agent access to a terminal and said, “Hey agent, please delete every JSON file in this directory,” and it deleted every JSON file on the user’s computer. So these things can be destructive if you’re not careful, and being able to sandbox and containerize your agents is paramount. There are a few tools for this. Number one, you just open up a virtual machine and put your agent in there, sandboxing it. Second, there’s a great toolkit called E2B (that’s the letter E, the number two, and the letter B). They basically have a containerized environment where you can give your agent the ability to perform actions so that it doesn’t do anything destructive.

Now, with that: “agent” is kind of the name of your app. We call apps agents these days if they involve large language models. But oftentimes you end up having multiple agents working concurrently with each other. I’ll go through an example later, when we do a code walkthrough. One system might be writing a job description, and in there you have an agent that does research on the web, then an agent that summarizes the results, then an agent that writes the job description, and then an agent that is an editor and figures out how to compress what the researcher did into a small piece of text. That’s where the multi-agent approach comes into play. The major reasoning behind it is that it’s a little bit of a hack. Number one, it gets around
context window limits. GPT-4 Turbo has a context window of about 128,000 tokens, which is actually not enough for a lot of tasks, so being able to parallelize these agents artificially lets you expand your context window. There are obviously trade-offs too, but having more agents actually does the job better. Also, you can parallelize the tasks, so you don’t necessarily have to wait for one to complete before the other one executes. You have them work concurrently, and that way you save a lot of time during execution. One big complication here is that agents are very slow, so anything you do to cut off seconds is a very big win.

In terms of actions, one very important thing is to be able to know what your actions are doing. Creating an audit trail of how your agents are interacting with the world is useful for two very important reasons. Number one, debugging. What you’re going to find is that agents right now are very unreliable, so being able to see the paper trail of what your agent does is a huge win for debugging. Second is understanding what tools your agents are using. For example, if you give your agent access to a very expensive database that it never uses because it never needs to, you can probably cancel that subscription. So being able to see which tools it uses is a big win.

The planning manager I’ll go into in the next few slides. On external memory, there are a few interesting thoughts. Again, I mentioned vector databases like Pinecone, Weaviate, and so on, but I really recommend that, instead of just using these databases, you think about what the purpose of the database is. Oftentimes they’re used for RAG applications: retrieving documents, putting them in context, and summarizing them. But this is just a convenient way to search
through things. You could easily replace a vector DB with Elasticsearch. It wouldn’t be the easiest, but vector DBs are basically just a way of retrieving more tokens for your model.

All right, so we’re almost done with this bit, but understanding how your agent is using context is very important. One major complication with these agents is that the more tokens you load into the prompt, the less accurate they get, so just understanding how many tokens your agent is using will be important for measuring accuracy. Think about it like this: if you have an agent that takes 20 steps and the agent is only 90% accurate on each step, by the time it reaches the 20th step there’s a high likelihood that it has veered off course to some degree. So you want to be careful to make the agents as confined as possible, and managing your token usage and your context usage is going to be a huge one going forward.

Tools and functions, I talked about that before. There are a lot of different tools you can use. Just imagine any user-defined function you can write in Python; that’s basically what we can do.

All right, so with that said, on the metadata side of things: when you’re developing an industrial-grade agent, you want to be tracking all these things. So the session ID, what exactly was the context your agent had when it was running? What permissions did it have? Who are the model providers? How much budget did you give it? Which tools did you give it? What was the system prompt of your agent? These are all major considerations to think about and tally down before you just start building. It’s like, okay, how do I make sure this thing is not going to cost me $80 in runs? We’ve had some users that spend $20,000 a month on GPT just because of how expensive it is. So keep in mind, you know,
the restrictions you want to add, and just track your agent.

The tools themselves: number one is giving it a list of functions that are able to call APIs, probably the biggest thing. Just being able to manage that list of tools is a huge win for you, and it helps you understand how to prioritize which tools are the most valuable for your agent, and also which function-calling algorithm your agent is going to be using. Second to that, I just want to mention, and this is a very, very big thing, code execution is probably one of the biggest use cases for AI agents right now. So agents being able to read, write, and execute code, use their own code interpreter, and so on and so forth, is going to be a massive win going forward. So just consider that these agents are able to basically write themselves, in the way that Voyager was. Giving it an environment where it can compile, lint, and run the code is a huge one also.

One more thing. I just want to make sure, am I still streaming? I want to know, have I been talking to myself for 10 minutes? Wait, can you hear me?

Yes, 152 people are watching this live, so you should be good.

Okay, I had this huge fear that I might have just been talking to myself for 20 minutes.

No, you have not. Even if so, that’s okay, that’s the beauty of it.

Okay, I do that enough already. All right, we’ll go back to the big laundry list, and then ultimately what I’ll do is show off a few more charts and run some agents. By the end of this we’re actually going to build an agent ourselves. It should be super simple, and we’ll take it from there.

Sorry, Alex, what type of agent are we going to create? Do you have a specific use case you want to cover?

Yeah, we will be doing an agent, and I will post a link to the repo in the chat. It will be an agent that, as I mentioned earlier, will actually write a job description for you. So
we’re going to be using CrewAI, which is an agent orchestration framework. We can take multiple agents and have them work together to solve a problem.

That’s perfect. Oh, does this tie into Staff AI that Adam was working on? Is it kind of similar?

I will show that off in just a few slides, I promise.

Okay, sounds good. Super, super excited.

Yeah, so what else are we talking about? Planning: we talked about planning a little bit, which is the understanding of how your agent executes things. I’ll go a little deeper into that. Action history, understanding how your agent performs actions. Multi-agent tendency, we talked about that. Workspaces, LLM providers, so on and so forth. Anyway, this is just metadata stuff. Screenshot it, or I’ll share the deck with you later; just shoot me a message, happy to chat.

Just a little bit on the planning mechanism. To give everyone context here, we don’t really have a perfect algorithm for agents yet. That is, we don’t really know the best way to orchestrate these things, and a lot of it is still lab work, right? Just be confident that the way we get these things to work is by experimenting, and having good evaluation tests is how we make these things work. So here are some common things. There’s chain-of-thought prompting, which is, instead of just asking your model questions and getting answers, you ask it to think through how to solve the problem, and that usually produces better results. There are some other things: you can do self-consistency checks, you can do least-to-most. There’s a lot of research around this. I don’t think it’s, like...

Yeah, sorry, are you sharing something on your screen? Because it’s frozen on the PowerPoint.

Oh no. Okay, yeah, I was sharing. Okay, stop screen, we’ll try it again. Thank you for
interrupting me.

No. Someone is also asking why use CrewAI, which I think you would get into in a bit.

Yeah, CrewAI I recommend for a few reasons. It’s not no-code, it’s not low-code, but it’s a very simple agent abstraction that I think is helpful enough to get things off the ground. I’ll talk about some agent framework recommendations later, but I think anything that helps you picture how you create a class that has large language models baked in, context window management baked in, and the ability to add tools baked in is useful, and it’s all very simple to do with Crew.

That said, in terms of frameworks, maybe I’ll just skip to the slide. Here are the main frameworks I do recommend using. Superagent has a great chat RAG agent framework, and it’s super easy to use. But CrewAI lets you spin together a bunch of agents concurrently, you can assign them roles and tasks, and I think the abstraction is very simple.

Another really great one I recommend checking out is AutoGen. AutoGen is also a multi-agent framework. You can create multiple actors, but there are different interaction patterns. You can create a chat room where your agents are all in the same room, kind of like a Slack channel or a WhatsApp group or a Facebook group, whatever you’re using, and the agents all listen to each other and are able to take action based on that. These interaction patterns have actually led to a lot of success in industry. We have some users using AgentOps who are also building on top of AutoGen, and they’re building an agent that is basically a research analyst for a bank or for equities trading. They’re able to connect it with Morningstar and FactSet and all of their financial data sets, proprietary and licensed, and actually have it do research. They have
multiple agents doing research concurrently that come to conclusions and arrive at a good investment analysis. They’re making a ton of money with it too, so it’s a really, really great service that they have with AutoGen.

MetaGPT is really, really cool. I’d recommend checking it out because it’s one of the coolest code-gen agents I’ve played around with. We don’t have time for it today, but what you can do is give it a high-level task like, hey, I want you to write the game of Pong for me, or build me an app, build me Tinder for dogs or Uber for cats, and what it does is delegate to a product manager, an engineer, a QA manager, and then a tester, and they all work together to solve the task. So, all really cool frameworks I’d recommend checking out.

Things I would recommend staying away from: I’d say anything that’s a little too convoluted. I really caution against LangChain, not because I think it’s bad, but because it’s easy to get lost in the abstractions. The number one thing I’d recommend doing before even playing around with any of these tools is to just play around with OpenAI in your terminal and figure out how things work on their own. That way, if one of the frameworks solves one of your problems, I’d recommend leaning into it 100%, but there’s not really a huge reason to adopt any framework right now because it’s all so exploratory. So yeah, Crew is great, but do what works for you.

Yeah, so sorry to interrupt. A quick question for someone who doesn’t know: how are LangChain and Superagent different? Why would you use either one, or are they essentially doing the same thing?

Yeah, LangChain, you can think of it as a set of primitives for interacting with large language models. So what they’ve done is, instead of calling OpenAI directly every single time using the OpenAI SDK, what if you want to
swap it out with Anthropic’s Claude, or, what are the other ones, AI21 Labs, right, or Command R from Cohere? They basically built a simple set of abstraction patterns in Python that let you swap these things out pretty easily. On top of that, they have in-context memory management, they have a great RAG tool, and they have a tools library class that lets you add these things together. So it’s a set of primitives. They’re doing some experimental stuff that’s pretty cool, like LangGraph, which is graph-based execution of tasks. Again, like any other framework, I don’t think it’s going to work perfectly the first time. It might be the right way, it might not be, so it’s all very much in the lab right now.

Superagent is kind of high-level. You could think of it like the OpenAI Assistants API, but much better managed and more open source. So it’s a great tool. A lot of people are using it for RAG, and they’re selling some enterprise applications too. Honestly, I just recommend checking it out and playing around with it yourself. It’s a really cool drag-and-drop tool.

Thanks, yeah, that makes more sense. Because you were going through all these frameworks and tools, I just wanted to make sure that we get it. So thank you for clarifying.

Yeah, of course. So let’s go here. Anyway, I’ll just briefly touch on this slide, but the point is there are a lot of different planning mechanisms behind the agents. For example, the most common agent framework is called ReAct, but it’s kind of outdated in a lot of ways, and a lot of people try things like trees of thought, graphs of thought, multiple chains of thought, multiple ReActs, et cetera. There’s no perfect solution right now. Do what works for you, that’s the number one thing. It’s a lot of experimentation, and I wish I had a better answer, like use this, use this, use this, but
it’s really about finding what works for you, and that boils down to evals. Basically, this is going to be your cycle. This was from a great post on Modal’s website. The cycle for improving an LLM-based application or an AI agent is figuring out what your LLM invocations are, logging everything, having tests for that, and then having an eval set.

So what’s an eval set? I’ll come up with an analogy. When you’re in school, people study in different ways. You use flash cards, or some people just cram everything the last day before the test, some people study sporadically, and some people study every single day, right? There’s no perfect method here, but at the end of the day you have a test. You have the SAT, the AP exam, the IB exam, whatever it is, and you want to know which method is going to work best for you, and that’s by creating an eval set. An eval set is basically a giant test. Think of it like the SAT for agents. There are a ton of different evals for a ton of different domains: coding evals, information retrieval evals, truthfulness evals, document summarization evals. Creating a good eval set is the most important thing you can do to create an agentic system and bring it to production.

With that, everything else is basically how you use it to improve the underlying system. You can do things like fine-tuning your model so that it becomes more stylized and produces results that are in line with what you’re after. You can do prompt engineering, right? That was the idea behind chain of thought, tree of thought, and so on and so forth, which is that you iteratively find a way to create the right prompts, and ultimately you can use that to iterate and improve your agent system overall. I’d say the most important part of the stack, though, is just being able to add the
logging and traceability, for two main reasons. Number one, understanding as a developer how your agent works with things is the best way to actually improve it. Just having visibility. You’re going to be reading through thousands and thousands of prompts, and understanding how that works is going to be key to how you as an AI engineer develop the agents. Second, you can actually do a lot of interesting stuff with the logging data. Every LLM call that you do not save to your database is a lost LLM call. It’s a leaky bucket. What you can do with these LLM calls is fine-tune cheaper, faster, and more effective models based just on your inputs and your completions. For example, at AgentOps we work with a provider called OpenPipe. What they do is take 100 to 1,000 examples and fine-tune very effective models that are 10 times cheaper than GPT-4, twice as fast, and just as good. So I highly recommend adding an observability layer and doing that. We can talk a little bit more about that later.

All right, so anyway, what are the biggest use cases? The biggest three use cases we see in enterprise are, first, text summarization, so just being able to take long contracts, things like this, and summarize them. Huge use case. A lot of people in enterprise are using this; that’s why they have enterprise ChatGPT. Chatting with your documents is also a very big one, just being able to understand the formalities of how things work. Huge use case, and that’s where the RAG applications come into play, where you have very, very large data sets you need to load in memory so you can chat with them. The last one is support Q&A. I wish I had a slide for this, but there’s a company called Klarna. Klarna is a Swedish buy-now-pay-later company, BNPL, think of it like layaway but on rails. They had a support Q&A system with human agents, right, and what
they ended up doing was they built an AI agent that replaced all their support Q&A. Not only were they able to eliminate 90% of their human agents, but they had higher NPS scores. People love talking with the AI agents. On top of that, they’re saving $40 million a year with these support Q&A agents. So there’s huge value add to this kind of stuff, support Q&A. So with that said, all right, I want to show...

Quick, yeah, go ahead, quick question. So Leon is asking, how do you get, like, eval sets? How many evals do you need?

Fantastic question. The first thing you need to do is decide what data set you want to work with. If you’re building a code eval agent, I’d really recommend checking out Hugging Face. They have a ton of evals and data sets on there. Look at open-source repositories too; a lot of people are uploading those there. There’s no standard configuration. We’ll talk about what standard configurations look like, but definitely check that out. If you’re building evals, if you’re building an agentic system right now, definitely DM me afterwards. We have agent evals built into AgentOps, so I can actually show off what that looks like.

Why not, for sure.

So, for example, we have a testing dashboard on AgentOps. This is alpha, so pardon the dust a little bit, but the way it works is that you develop configuration files. This is an eval set called WebArena. WebArena is where we take these fake websites, so this is a shopping admin dashboard, and we’re hosting these websites. If you go here, you can send your agent. It’s a sandbox environment where your agent can interact, type in the username and password, fuss around with the dashboard, and so on and so forth. You give it the intent and your eval. So the eval might be: what is the top-selling brand of Q1 2022? The answer should be Sprite, and in this case the agent got it wrong, so it failed. It was supposed to do an exact match, but
ostensibly you can do a lot of other things. You can do fuzzy matching, you can do LLM matching, and so on and so forth, and you can basically see how effective your agent is at answering questions and actually performing actions. This is the basic pattern most evals follow: they have a JSON or other config file where you define exactly how you’re supposed to execute the task and then how you evaluate it. So that’s evals.

Yeah, thank you. Second question, this one is from me. You mentioned three common use cases for AI agents: text summarization, support Q&A, and I missed the third one, but chat with documents?

Chat with documents, you got it.

Because the chat-with-PDF repo blew up and stuff, and PDF.ai is a big, big project. What are other common use cases, or interesting ones that enterprises haven’t explored yet, that are possible with AI agents? So not your typical basic chatbot, support Q&A, or summarization.

Yeah, my biggest theory is that RPA, which is robotic process automation, is going to be probably one of the biggest use cases. So I want to show off an agent called MultiOn. MultiOn is an AI agent, and basically what it does is take control of your browser. Let’s see here, log in, cool. All right, so we’re in. I can give this agent any task. Let’s say, reserve a table at Fogo, which is a popular Brazilian restaurant in San Francisco. You have a chat bar here, and the agent actually reasons through how to solve the task. It opens up a browser, and then the agent intelligently clicks on elements in that browser to automate tasks. You can imagine a lot of work is just navigating through ancient websites. So right now it’s going to Google. It’s going to read the web page as images, as text, as HTML. Okay, now it’s navigating. It’s on Google, it’s typed in... yeah, it also talks to you, but sometimes it can be a little bit
annoying. But it’s very slow right now. That’s one thing I kind of hammer home, which is that agents right now are very slow, they’re unreliable, and they’re very expensive. So right now, okay, cool, it’s clicking the link, now it’s on...

And what would be the solution to that, to reduce the cost and make it faster?

Hold on, it’s talking to me. All right, so the number one thing is that LLM costs are extraordinarily expensive. Again, it’s going to become commodified in a way, but I’d say the solution is getting cheaper, faster, local open-source models running your agents. Again, I would just say eat the cost right now, because accuracy matters more than everything. Reliability is the number one thing preventing agents from hitting the market. So yeah, that’s what I recommend.

Gotcha. And are we booked for Fogo for tonight, or is it figuring it out?

We’ll continue playing. Let’s see what happens. All right, and so one could ping-pong with these agents. They say, hey, let’s say May 8 at 8:00 p.m.
So there’s human-in-the-loop, where you can actually send agentic actions to humans. There’s an interesting company called Payman, and what they do is, every time an agent can’t do something, they get a human to do it: clicking on CAPTCHAs, entering credit card details, and so on and so forth.

It sounds so dystopian when you actually talk about it, AI agents hiring people, but I feel like it could become normal at some point, like five years down the line, people just getting paid by agents. That’s going to be insane.

Yes. Okay, I’m going to close this because it’s annoying me, it’s just talking. But yeah, anyway, I think navigating the web is actually a huge use case. There are a lot of really cool companies working on this. Reworkd is an agentic company basically extracting web data at scale. The idea here is you can give it data on real estate or e-commerce stores, and it creates these agents that scrape the web intelligently without any need for developer programming, and that’s a huge use case. Second to that, we have Browserbase. These guys are popping off recently. It’s kind of like AI agents for web browsers, really interesting stuff that they’re doing. And I can think of a few other AI agents. I think the most popular one right now is Cognition Labs’ Devin. Devin is for writing code. The idea here is you might add Devin to your software engineering team, and what Devin’s able to do is, you know, find and fix...

Right, they recently did a hackathon, I think, giving people beta access to Devin.

Yeah, I heard that. I unfortunately was not able to attend, but it looks super, super cool.

Yeah, because I feel like if you were there you would have live streamed the whole demo day.

So you’ve got the hookup. Let me know next time. There are a lot of cool hackathons I don’t get invited to or don’t know about, so I’m always happy to blast those out.

Yeah, for sure.
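The human-in-the-loop hand-off described above (an agent passing CAPTCHAs or credit card entry to a person) can be sketched in a few lines. This is only an illustration under stated assumptions: `Action`, `HUMAN_ONLY`, and `run_action` are hypothetical names invented for this sketch and are not Payman's or any framework's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical action kinds the agent should not attempt on its own.
HUMAN_ONLY = {"solve_captcha", "enter_card_details"}


@dataclass
class Action:
    kind: str      # what the agent wants to do
    payload: str   # free-form details for whoever handles it


def run_action(
    action: Action,
    agent_handler: Callable[[Action], str],
    human_handler: Callable[[Action], str],
) -> Tuple[str, str]:
    """Return (who_handled_it, result), escalating human-only actions."""
    if action.kind in HUMAN_ONLY:
        return "human", human_handler(action)
    return "agent", agent_handler(action)


if __name__ == "__main__":
    # Stand-in handlers; a real system would call a browser agent or
    # notify a person here.
    agent = lambda a: f"agent did {a.kind}"
    human = lambda a: f"human did {a.kind}"
    for act in [Action("click_link", "reservations"), Action("solve_captcha", "")]:
        print(run_action(act, agent, human))
```

Returning who handled each action alongside the result also feeds the audit trail idea from earlier in the talk, since every escalation becomes visible in the log.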
Could we hypothetically, or it’s probably done now, integrate Devin with MetaGPT to code better and faster, or tie that in with other chains to self-check and self-evaluate?

So are you asking, could we give Devin access to MetaGPT to make MetaGPT better?

Well, that one, and also just improving MetaGPT, like, you know, 10x-ing its coding powers.

Maybe. I think your ultimate limitation with LLMs in general is that they’re basically stochastic parrots. They’re reflections of the world model, so they’re only as effective as how good GPT-4 is at understanding code. One thing I think a lot of people underrate is that most software engineering work is not dealing with existing tools and stitching those together, which is what MetaGPT and Devin do quite well. It’s actually about taking new tools, or tools that you develop yourself, and then figuring out how you chain those together to make things that are interesting. So those are kind of the biggest complications with these coding agents: they don’t understand code that hadn’t been written before the models were checkpointed. So yeah, hypothetically it’s true, I think Devin could probably make it work.

The biggest advance in Devin, in my book, is actually the Devin interface. I think the reason why Devin was such a huge drop in the programming world is that they have this, like, look at this UI. It makes so much sense. You have your terminal in the top right, you have your code notebook in the bottom right, you have the web browser in the bottom left, and then you have the reasoning engine in the top left, and it’s so clear and concise exactly what it’s doing that you’re saving a ton of time and effort. So the UI layer, I think, is probably one of the most underrated sides of how agents should be operating, right? Tools for us, they’re
not tools for robots.

Yeah, so that’s the thing. Most of these tools are kind of doing the same thing, but it’s the UI that could differentiate them, or maybe the reach, the marketing drops, in how they could stand out. Oh, did I drop? Hello? Oh no, can you hear me? Also, this kind of looks like what Replit AI is doing. I know we might be doing a workshop with them tomorrow too, but is there something super cool about that UI that makes it stand out so much, or is it, you know, VS Code on steroids or something? What’s the thing that really makes you say Devin is the thing you want to use for coding?

Two main things. Number one, I think it’s the simplicity of the design that they really did nail. It’s a great-looking design; they figured it out. Second to that, they actually have a kind of checkpoint debugging. In AgentOps we have a time travel debugging feature we’ll talk about in a little bit, but the idea here is that being able to rewind your agent to a checkpoint and then restart it from that point is a huge value-add. I’ve heard some folks were using Devin and running things for nine hours, right? They’re very slow and very expensive. If you have your agent running for nine hours, there’s a likelihood that it’s going to mess up somewhere along the way, go around in a circle, and you want to modify the parameters so it can get back on course. So that’s where the main value-adds are: UI and time travel.

Sounds good. Yeah, let’s circle back to AgentOps, because this is not a workshop on Devin. We could do a separate one with them. But yeah, you mentioned you wanted to showcase how to make an AI agent live with AgentOps.

Absolutely. So let me unshare my screen. How do I do that? Okay, oops, okay, stop screen, cool. I am going to put a link in the chat. Everyone will have access to this
repo. We have taken some Crew examples from CrewAI and just made them easy to access with AgentOps, and I’ll show you how to run them.

Perfect. And is the link on Twitter, under the live stream, or where did you...

I just posted it in the chat right now.

Oh, the private chat. Okay, perfect. I can post it on YouTube and go and put it on Twitter as well.

Excellent, very good. All right, what I will do now is some live coding. Give me a minute here to set up a new desktop.

Take your time. We had a live stream, I think it was either with ABK from Neo4j or somebody else, and we were chatting about this idea that if folks are live streaming, there should be an AI privacy checker that blurs your private keys and stuff, because you don’t want them to be leaked, right? So that’s something that people could potentially build. I don’t know how feasible it is, because you have to integrate with OBS or Zoom and stuff. That’ll be a cool thing to see as well.

I have leaked keys before and I’ve lost hundreds of dollars, so just be mindful.

Yeah, that’s a shame. That could be the problem in the quote: Alex loses $100 with leaked API keys.

Cool. So what I will do now is open up a terminal. I have gone to the CrewAI examples repo; it is a fork on the AgentOps repo. So what we’re going to do is make a new folder. I will zoom in a lot, and I will mkdir, AgentOps CrewAI. All right, so now I’m in my directory. I will git clone the repository link I have posted.

Sorry, did you want me to put your screen up on there or not?

Oh, am I not streaming?

No. Okay, yeah, I just added it now.

Okay, sweet. All right, so everyone check out the link if you want to follow along. Otherwise I will just do it live in front of everyone. We have a repo here, CrewAI examples. It’s a fork of the CrewAI examples from the CrewAI project, and we have plugged AgentOps into this so you
can actually see how to run this um so I will clone it okay and then we’ll CD to cre examples the one we’ll play around with is the job posting so I’m going to see job posting uh and let’s open it up okay job posting so what job posting. looks like is um we have let’s see here main.py let me zoom in a lot so you can see it okay so um the way that we set it up is um first of all if you’re building agents like U I’m Shing this because it’s my company but I also think that you get a lot of value just by like not looking at logs and terminal um if you run agent Ops all of your logs and all your llm calls and all your actions will be logged so um we set up a client with agent ops. in it and that’s basically all you need to do to get your agent Ops uh plug in with crew uh so we’ll say job posting and then we’ll maybe we’ll say the job is for agent Ops right so I will the idea here is like um we will create a job posting for agent Ops and what we’re going to do is we’re going to have a few input variables we’ll have the company description and we’ll say what is the company description company to domain what is the domain of the company what are the hiring needs and what are specific benefits you offer um and we’re going to kick off three individual agents you can have a research agent a writer agent and a review agent so to create an agent there’s here says agents. 
P there’s a class called agents um and that’s where we Define these things so underlying it all we have this uh agent class that comes from crew AI um basically all you have to do is provide it a role and a goal and the tools so the research agent for example we want to be able the ability to use Google uh so serer uh and the web search uh the writer agent also has these tools but also has the ability to read files um and then lastly the review agent uh it’s it’s job is just to summarize and clarify and make everything grammatically correct uh and has all the tools and the ability to read files and it’s just going to print things out at the very very end for us so um that said all we have to do is add these agents to the crew so we say the researcher agent the writer agent and the review agent go here um we Define a set of tasks so this is kind of like the planning mechanism that crew uses again there’s like a lot of different ways to plan agents around stuff but here’s how we’re going to be setting it up um we’re going to give it the ability uh see task. research task. py so task. 
py is we can define a bunch of tasks here tasks are also a class from crew AI um task one analyze the provided company website and the hiring manager company domain and here’s the description focus on understanding the company’s culture values and Mission identify unique selling points awesome another task this will be for the um the role requirements based on the hiring manager needs identify the key skills experiences and qualities the ideal candidate should be using for the door awesome expect that output this kind of stuff uh and then we Define the agent as whatever agent we passing um drafting a job posting same deal reviewing editing reviewing and editing same deal so on and so forth and we’re going to pass all these tasks into the task Cu uh and the task CU will just be defined in the crew so we have the crew class we have the H we have the task and then all you need to do is run cre that kickoff and that it all works all by itself so with that said let’s run this so first thing we’re going to do is umv um we are going to give it an agent Ops key so um number one thing is I’m going to comment this out uh because open AI key is in my environment right now um and I’m going to keep out serer um let’s see here I will get an agent Ops apiq let’s do that so put up a new window what I recommend everyone do is you go to app. agent ops. 
and when you log in, you simply go to your Projects tab, where you can see all your projects. We’re going to create an API key, so I’m going to copy my API key from here. Copied, good, it’s on the clipboard, and I will just paste that here. Sweet, we are in. The parent key, by the way: if you have an organization, you can attach it so everyone in your organization can see your runs. So, for example, if you want to share your runs with your team, you’re able to do that by sharing a parent key.

We have main.py and we should be good to go. All we have to do is create a copy of that, copy the example. Excellent, and we’re in. I think the Serper thing will be fine, but we will see if it fails. Then we’re going to create a virtual environment, this is everyone’s favorite part: venv env, and then we’ll source-activate it. Then lastly we will pip install -r requirements.txt, and that will install all the stuff. It will install CrewAI for us, it will install AgentOps for us, and we’ll be good to go. So let’s give that a minute. Do we have any other questions in the chat while I download all the dependencies? Okay, I guess we’re good.

All right, so anyway, I have a pre-loaded run now, because I think this might take too much time. So let me run this guy. Where is my job posting? Oops. Yeah, so this is a run that I have pre-loaded. We will run this guy: we’ll say python main.py and we will give it the job that we wanted to run. All right. Oops, one more time, let it run. All right, company description: “AgentOps builds observability, testing, logging, and debugging for AI agents.” Spelling is fine with agents, by the way, because large language models are by and large quite smart. Say agentops.ai. We are hiring an AI engineering expert who has worked with agents and knows a lot about RAG, vector databases, and React JS. Just to say specific benefits: the benefits are work on cutting-edge research and build cool projects.

Okay, now what’s going to happen is the crew is going to kick off, and we can see here that the agent is going to start scraping the web. It’s going to start scraping agentops.ai. It’s going to print out a bunch of stuff in the terminal, and this is how it actually reasons through all its tasks and problems. So we have AgentOps, blah, blah, blah. “I’m going to go to the website, look at their values, the perks, their benefits.” This is actually launching a headless browser that searches the web and finds exactly what’s going on here. This agent run itself actually takes a very long time, maybe 5 to 10 minutes, so I will show you a pre-cached run that I have on AgentOps. Let’s see here. All right, great.

So on AgentOps you have the ability to see all of your agent runs in a dashboard. I can see exactly how much money I’ve been spending, my average cost per session, the number of tokens per session, the number of tokens and failed sessions, so on and so forth. But I’m also able to see more granularly which sessions have been failing and drill into those. So at a high-level view, most of my agents kind of suck: I’ve had 70 failures, 43 successes, and then 12 that never actually completed. The idea here is I can see at a high level how my agents are performing over time. I’m going to go into my session drill-down chart here and look at my agents that have been running. Let me sort this descending. Cool. So here is an AI agent that I’ve run in the past. It took six and a half minutes and cost me $267, and we can see all the information that we talked about in the PowerPoint is in here.
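The failure, success, and never-completed counts quoted for the dashboard can be turned into rates with a few lines of Python. This is only a minimal sketch: it assumes each session has already been reduced to a single end-state label, and the label names and the `summarize` helper are hypothetical, not part of the AgentOps API. Only the three counts come from the talk.

```python
from collections import Counter

# Hypothetical end-state labels; the counts mirror the ones quoted in the talk
# (70 failures, 43 successes, 12 sessions that never completed).
sessions = ["Fail"] * 70 + ["Success"] * 43 + ["Indeterminate"] * 12


def summarize(end_states):
    """Return {label: (count, percent of all sessions)} for a list of labels."""
    counts = Counter(end_states)
    total = sum(counts.values())
    return {
        state: (n, round(100 * n / total, 1))
        for state, n in counts.items()
    }


summary = summarize(sessions)
for state, (n, pct) in summary.items():
    print(f"{state}: {n} sessions ({pct}%)")
```

Seen this way, more than half of the 125 sessions failed, which is the kind of signal that makes drilling into individual sessions worthwhile.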
So the host environment: I’m not running in a sandbox, I’m running on my laptop. You can see I have 16 gigs of RAM and I’m on Python 3.11. We can see that I actually finished the execution during the run, and I consumed about 880,000 tokens while doing so. The interesting bit here is that, since we ran three agents concurrently, we can filter through them. We see we have a research analyst, a JD writer, and a review specialist, and we get a whole LLM call summary here. So instead of having to scroll back through all of this kind of stuff in the terminal, right, this is unbearable, you can see all of your logs saved in the dashboard. I can see step by step how the agent is reasoning through problems.

This is all the agents, but what if I just want, say, the research analyst? I can filter through and see which calls the LLM was making, which tools it was using, so on and so forth. So, for example, here my job description writer used seven tools, talked to CB Insights, talked to a number of other tools, and I was able to basically collate them all into a single prompt. I’m going to show all of them.

And then lastly, what we can get out of this is a session drill-down, or a session replay, where I can see every single step that the agent made during the execution flow. I can see all the LLM calls, all the tool calls, so on and so forth. So, for example, here I had GPT-4 spend two cents to basically evaluate the prompt that it was given: “You’re a research analyst, you’re an expert, you research company culture, values,” so on and so forth, “and here’s your job,” blah, blah, blah, a lot of text. But I can also see that it was instructed to search the web, so you can see the tool call here, “search the website,” and basically I can trace back exactly what was going on and debug which LLM calls took the most time. So, for example, this one was a 17-cent
API call to OpenAI. We could see that it was a lot chunkier than the other ones, just in terms of the number of prompt tokens that I fed it. So that’s some information we could use to say, okay, maybe this prompt could be condensed in some way and we could save a few cents on the execution.

In any case, that’s basically how we run CrewAI agents. That’s the key demo. The way that you work with AgentOps, by the way: if we go to GitHub, if you’re interested in using CrewAI, CrewAI is pretty baked in. We will send a link to the branch there. Just follow the documentation page; we have some CrewAI integration examples. But secondary to that, all you have to do is import agentops into your Python script, call agentops.init(), and then add some sample recorders, and all that information will get saved to your dashboard. Anyway, that’s the demo. We can also run some other agents and try to debug them live, if that makes sense. Otherwise we can do freestyle Q&A and talk about all sorts of stuff.

Perfect. Thank you so much for your time, Alex, this was super helpful. I learned a lot. Would it be possible to get a copy of the deck? Some people asked for it. I mean, the live stream is there and people can go through it, but if it’s okay, we would love to share the deck with them too.

Gotcha. I will figure out how to make the deck shareable and share the deck, for sure.

Yeah, you do that. Also, if anyone has any questions, put them in the comments on LinkedIn, YouTube, or Twitter; while we have Alex, we could ask him. On that note, while I have you here, because I know you’re super busy, Chief Vibes at Cerebral Valley: we’d love to have you as a community partner as well. I’m trying to secure a location in SF, The Residency, by Nick and, I forgot the other co-founder’s name, to have people come in person to their hacker house for the hackathon, May 10, 11, 12, so we would love to have
it in Vancouver, London, SF, and virtual. It’s one of those things to remember.

Awesome. By the way, if you are in San Francisco, definitely check out The Residency, a super cool organization. But also Cerebral Valley, I’ll put the link in the chat, has links to all the events. Basically they’re a link tracker of all the cool things that are happening in San Francisco: hackathons, meetups, events. I think your event is probably on there as well, or if it’s not, I can help get it on there.

I hope it is. I think somebody from the team submitted the form, so somebody has to check that, whether that’s you or somebody else.

I will make sure that gets bubbled up.

Perfect, sounds good. If there are no more questions in the chat, we could end the call right now, because I know you’re super busy and we’ve got to let you go. Some promo: a YouTube series we’re thinking of is having you, from AgentOps, and a complementary tool, to create an app or project live, on the spot. So say you use AgentOps and show people how to integrate it with, I don’t know, Pinecone, for example, and then we do a mishmash of hands-on work. And this could be a recurring series, right, like AgentOps times X, AgentOps times Replit. I feel like that’ll be super interesting for people to watch, because, you know, the deck is nice, the coding session perfect, but if they actually see that you’re making something in one or two hours, then they’ll be inspired to think, “Oh, I can do this in one or two hours if Alex is doing it.”

All right, rad. I’m happy to do code-alongs. I will come up with some repos that don’t take as much time to install, and we can basically jam on that. It sounds super fun.

Perfect, sounds good. Thank you so much for your time, Alex. Once again, this was the workshop part of the bead build hackathon series. We’re partnering up with AgentOps, and we’ll add the event to Cerebral Valley. Do
check it out, and thank you so much again, Alex, for taking the time to come on.

Right, adios. Thank you so much, it was great.

Okay, thank you so much, everyone, for coming over. I’m going to try to get Alex to share the deck, minus all the private info and blurring all of it. Do check out the hackathon. Nick, people know about The Residency now, so they might come over to your place for the hackathon regardless of whether you want it or not, so be prepared. If you do want to join, go for it, and we’ll be happy to support you. Do let Alex know if you have any follow-up questions. If you want other workshops, let us know: we’re doing one with Replit tomorrow, we’re doing another one with Nexus GP, possibly CrewAI, TBD. So let us know, and thank you so much for tuning in. I’m Poria from beloud.

And I’m Bahar, co-founder ofas. Thank you so much for coming here. Please share feedback and comments on how we can make these workshops better and which topics you’d like us to cover. And don’t forget to sign up for the hackathon. It’s both virtual and in person, so it’s accessible for anyone who wants to join in, and it’s three days of building a startup, so we’d love to see what you can come up with and build in three days. And I think we can end the...

Yeah, thank you so much for tuning in. Take care, have a wonderful day. Bye-bye.