AI Frontiers: Chi Wang, Principal Researcher at Microsoft Research and Creator of AutoGen
AI Frontiers: Exploring the Innovations of Chi Wang and the Creation of AutoGen
In the rapidly evolving field of artificial intelligence (AI), groundbreaking innovations and theoretical advancements are continuously pushing the boundaries of what technology can achieve. A notable contributor to this dynamic field is Chi Wang, Principal Researcher at Microsoft Research and the creator of AutoGen. This revolutionary project not only enhances the capabilities of AI but also simplifies the development of complex AI systems, making it accessible to a broader range of developers and researchers.
Introduction to Chi Wang and AutoGen
Chi Wang, a seasoned expert in the realm of artificial intelligence, has made significant strides with his creation of AutoGen at Microsoft Research. AutoGen is a framework designed to streamline the research and development process in AI, particularly focusing on AI agents. Wang’s work addresses the challenges and complexities involved in designing AI systems that are both efficient and effective. His approach leverages the concept of AI agents that can operate with a degree of autonomy, enhancing interactions between human users and AI systems.
Defining AI Agents
Understanding the concept of AI agents is crucial to appreciating the innovations brought forth by AutoGen. Wang defines AI agents as entities capable of acting on behalf of human intent. These agents can send and receive messages, perform actions, and interact with other agents to fulfill their designated tasks. This broad definition allows for the inclusion of various complexities within the agents, from simple units performing basic functions to complex systems composed of nested agents.
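The idea that simple and nested agents share one interface can be illustrated with a small, self-contained Python sketch. This is not AutoGen's API: the names `SimpleAgent`, `CompositeAgent`, `upper_tool`, and `exclaim_tool` are hypothetical and exist only for illustration, and plain functions stand in for the model, tool, or human backends an agent might use.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical illustration only; none of these names come from AutoGen.
Backend = Callable[[str], str]  # maps an incoming message to a reply


@dataclass
class SimpleAgent:
    """An agent whose reply comes from a single backend (model, tool, or human)."""
    name: str
    backend: Backend

    def reply(self, message: str) -> str:
        return self.backend(message)


@dataclass
class CompositeAgent:
    """An agent that answers by running an inner conversation among other agents."""
    name: str
    inner: List  # any objects exposing reply(message) -> str

    def reply(self, message: str) -> str:
        # Pass the message through each inner agent in turn.
        for agent in self.inner:
            message = agent.reply(message)
        return message


def upper_tool(text: str) -> str:
    """A deterministic, tool-like backend."""
    return text.upper()


def exclaim_tool(text: str) -> str:
    """A second deterministic backend."""
    return text + "!"


shouter = SimpleAgent("shouter", upper_tool)
exclaimer = SimpleAgent("exclaimer", exclaim_tool)

# A composite agent built from simple agents, then nested inside another one.
pipeline = CompositeAgent("pipeline", [shouter, exclaimer])
outer = CompositeAgent("outer", [pipeline, exclaimer])

result = outer.reply("hello")
print(result)
```

Because `CompositeAgent` exposes the same `reply` method as `SimpleAgent`, a caller cannot tell whether it is talking to a single unit or to a nested group, which is the property the definition above relies on.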
Simplification through AutoGen
The AutoGen framework introduces a simplified method of constructing AI applications by defining agents and enabling their interaction through a unified interface. This approach reduces the necessity for developers to manage intricate details, focusing instead on the broader architecture and functionality of the AI system. Wang’s vision is to create a modular structure where AI agents can be easily adjusted and configured to meet specific requirements, thereby accelerating the development and implementation of AI solutions.
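The two-step pattern (define agents, then let them talk) can be sketched in plain Python. This is an assumption-laden toy, not AutoGen code: `ChatAgent`, `run_chat`, `make_solver`, `checker_respond`, and the `DONE` stop word are all hypothetical names introduced here, and simple functions replace the language-model and tool backends a real agent would use.

```python
from typing import Callable, List, Tuple


class ChatAgent:
    """Hypothetical agent: a name plus a function that turns a message into a reply."""

    def __init__(self, name: str, respond: Callable[[str], str]) -> None:
        self.name = name
        self.respond = respond


def run_chat(initiator: ChatAgent, responder: ChatAgent, message: str,
             max_turns: int = 10, stop_word: str = "DONE") -> List[Tuple[str, str]]:
    """Alternate replies between two agents until stop_word appears or turns run out."""
    transcript = [(initiator.name, message)]
    speaker, listener = responder, initiator
    for _ in range(max_turns):
        message = speaker.respond(message)
        transcript.append((speaker.name, message))
        if stop_word in message:
            break
        speaker, listener = listener, speaker  # hand the turn to the other agent
    return transcript


def make_solver() -> Callable[[str], str]:
    """Stand-in for a model-backed agent: proposes 1, 2, 3, ... on successive turns."""
    guess = 0

    def respond(_message: str) -> str:
        nonlocal guess
        guess += 1
        return f"guess {guess}"

    return respond


def checker_respond(message: str) -> str:
    """Stand-in for a tool-backed agent: validates the proposal deterministically."""
    guess = int(message.split()[-1])
    return "DONE" if guess * guess == 9 else "try again"


# Step 1: define the agents.
solver = ChatAgent("solver", make_solver())
checker = ChatAgent("checker", checker_respond)

# Step 2: get them to talk.
transcript = run_chat(checker, solver, "find n with n * n == 9")
for name, text in transcript:
    print(f"{name}: {text}")
```

The conversation loop is the only coordination logic; swapping an agent for a different one changes nothing else, which is the kind of modularity the paragraph above describes.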
Application and Impact of AI Agents in AutoGen
The practical applications of AI agents as conceptualized in AutoGen are vast. From enhancing digital assistants to managing complex data analysis tasks, these agents can be tailored to handle various operations across different industries. The flexibility and adaptability of AutoGen make it a powerful tool for developers looking to harness the potential of AI without getting bogged down by the underlying technical complexities.
Future Prospects and Developments
As AI continues to advance, the role of frameworks like AutoGen becomes increasingly significant. The ability to customize and control AI agents to perform specific tasks will likely lead to more personalized and efficient AI systems. Future developments may include more sophisticated agent interactions, improved learning capabilities, and broader integration with other advanced technologies.
Conclusion: The Road Ahead for AutoGen and AI Innovations
Chi Wang’s work with AutoGen represents a shift in how AI research and development are approached. By prioritizing simplicity, modularity, and flexibility, AutoGen not only empowers researchers and developers but also paves the way for more innovative uses of AI technology. As we look to the future, the continued refinement and expansion of AutoGen is likely to play an important role in shaping the landscape of artificial intelligence.
Chi Wang’s contributions underscore the importance of forward-thinking approaches in technology. Through his work, we gain insights into the potential for AI to not only mimic human behavior but also to enhance our capabilities and interactions in a digital world. As AI agents become more integrated into our daily lives, the principles laid out in AutoGen will guide the development of more intuitive and responsive AI systems, marking a new frontier in the evolution of technology.
[h3]Watch this video for the full details:[/h3]
For more on Valory:
Follow us on X – https://bit.ly/47sxdtS
Check out our site – https://bit.ly/3NMfNBm
Valory on GitHub – https://bit.ly/3V4R1Rw
Read “The argument for co-owned AI” – https://bit.ly/4aHPINK
Follow Thomas on X: https://bit.ly/3PjRxXQ
For more from Chi Wang:
Chi’s MSFT profile: https://bit.ly/3WaaoZE
Follow him on X – https://bit.ly/3WaPwkX
LinkedIn – https://bit.ly/3vTbu1u
Chapter Markers:
Introducing AI Frontiers 0:00
Introducing Chi 0:55
How do you define an AI agent? 01:21
Can you give an example of something you would consider an AI agent, and contrast it with something that’s not an AI agent? 06:05
How did you get into AI? 11:29
Are there any early breakthrough moments that stand out? 13:33
Why AI agents for your work? 18:50
What design features impact AI agent effectiveness? 26:00
Have we, or have you created a truly autonomous AI agent yet? 35:41
Any predictions for the near term future of AI? 38:22
Any recommendations for people who want to learn more? 41:15
Where can people keep up with you? 42:13
Resources Mentioned:
AutoGen paper – https://bit.ly/4b8ix56
AutoGen GitHub – https://bit.ly/49M7H3H
AutoGen site, with tutorials and more – https://bit.ly/4b9QJ0r
AutoGen Discord – https://bit.ly/3JtGjwy
Music by Chris Zabriskie (@chriszabriskie), licensed under CC BY 4.0 (Attribution 4.0 International)
[h3]Transcript[/h3]
[Music]
Thomas: Hello and welcome back to another episode of AI Frontiers, where we bring you conversations with the people currently working at the far edges of artificial intelligence. Before we get started, I want to tell you a little bit about Valory. Valory is the premier creator of open-source frameworks for co-owned AI, and it's our mission to empower communities, organizations, and even countries to co-own the AI systems of the future. It's our belief that artificial general intelligence will likely be agentic, and that's why we found it important to take this moment to bring you a broad range of perspectives from people currently working in the space. One of those people is today's guest, Chi Wang. He's a principal researcher at Microsoft Research, and he's also the creator of AutoGen. I found this conversation particularly engaging, and I think you will too. So here's my conversation with Chi Wang.
It's great to have you here, and thank you for making the time. A great place to start would be if you could tell me your name and what you're currently working on.
Chi Wang: Hi Thomas. My name is Chi Wang. I'm from Microsoft Research, and I'm working on the AutoGen project. I created it last year, and now I'm leading the research and development of this project.
Thomas: Awesome. I look forward to hearing a little bit about that and diving into your experience with AI, and AI agents in particular. On that note, a perfect place to start is to define our terms. How do you currently define an AI agent?
Chi Wang: That's a very interesting question, and I do have a very long answer to it. The reason is that there are so many different definitions of agents used in many different places, so I want to give you a more comprehensive view from my perspective. When I designed AutoGen, I really thought about what the most comprehensive definition of an agent is, so that we can incorporate all these different kinds of notions. So I'm trying to find the minimal set of concepts necessary for
agents, and to remove all the secondary considerations while keeping them compatible with the definition. Right now my definition is: agents are entities that can act on behalf of human intent. They can send messages, receive messages, perform actions, generate replies, and interact with other agents. That's still a little abstract, so to make it more concrete, I would say my definition of AI agents covers a spectrum of different complexities. For example, you can compose complex agents from simpler agents, you can have nested chat inside a complex agent to recursively build up more and more complex agents, or you can have very simple agents that perform relatively simple functions.
In AutoGen, the simple agents can use different types of backends to support their behavior. First, they can use large language models or, in general, any AI models. Second, they can use tools that are not models but are also useful for generating certain kinds of responses. Third, they can use human input as the source of the response. But all the agents in AutoGen are conversable and customizable; that's the commonality, and they all share the same unified interface. Underneath, as I mentioned, they can either have simple responses using a single backend source, or they can contain arbitrarily complex inner conversations, like nested chat between other agents. In this way we can define both very powerful agents that are composed of many different agents inside, and relatively simple agents that serve as basic units for building more complex systems. So this is my definition.
I also want to talk, in a different dimension, about the fact that when people think about agents they sometimes think about them in different contexts, so let me make some clarification about that. For example, sometimes people talk about agents as an interface, which means users have the experience of talking to some agent. In those cases,
they can think about this as an interface: when users are talking to this interface, they have the perception that they are talking with a single agent. So that's the interface point of view. But there's another point of view, about software components: you can think about agents as basic units for composing a big AI system. They perform many actions that are not necessarily observable by the user. When a user talks to, for example, an assistant agent, underneath, that assistant agent may talk to a few other agents that perform different jobs, but the user doesn't necessarily need to know that. In that case we consider those other agents as software components. This is the architecture view of agents. I think both are important. The interface view mainly emphasizes the experience from the user's point of view: they have the experience of working with a very strong agent that can do many different tasks. The software view provides us a path to build such an experience. So even if you eventually want to build a single-agent experience, you could still use multi-agent as the architecture to build the system. I hope that provides some clarification about how I'm thinking about agents.
Thomas: Yeah, that's great, and I'm glad you touched on the point that agentic action is a way of acting in the world. It's not quite as narrow as just the assistant model where you send an instruction; as you said, below the surface there can be things that are not necessarily observable. I think that's really interesting, and it will be quite relevant for a lot of my audience who arrived through a multi-agent system approach. It would be cool to hear you provide an example of something that you do consider an AI agent, and then contrast it with something that you don't consider an AI agent. I think maybe we can see something interesting there.
Chi Wang: Yeah, that's an
interesting question. We can continue using our terminology from before. If you think about the interface, then what is considered an agentic interface, and what is considered a non-agentic interface? I think that's an interesting question. For many users, an agentic interface is one they perceive as containing a certain level of intelligence and a degree of automation. A chatbot is a very typical kind of such interface, and an assistant, like a personal assistant on mobile or on the web, can be considered an agentic interface. What is not an agentic interface? I think that's when people use traditional software with a GUI, where people need to click buttons and essentially ask for a relatively straightforward action to be performed and get the expected behavior. Much traditional software, or many services, have that kind of interface: it's basically a function call, where you ask it to perform a certain functionality with certain parameter bindings and you get back a simple result.
But I'm not happy with my answer, because it's a bit fuzzy and quite subjective. In the scenario I described, it sounds like if the user has a very straightforward expectation of the result, then that's non-agentic. If the user didn't expect the particular solution the software will use, but the software can come up with a concrete solution without every detailed instruction being provided by the user, it's considered more agentic. But with this definition you will notice that it is related to the user's expectation. In future, if even for complex tasks all users expect the system to come up with a straightforward answer without any alternative paths, then even that complex task can be considered non-agentic. So here I think the dimensions we want to consider are both complexity and autonomy, and
whether there are multiple alternative solutions to the problem and the system is able to figure it out on its own. Maybe another dimension is whether there's a conversational interface with which the user can iterate back and forth: if something doesn't work, the user is able to provide further clarification and get a better response. That's also an element that is considered important for agentic versus not. I'm sorry I don't have an answer that I like myself; none of these is scientific or very rigorous.
But I want to come back to my earlier statement about AutoGen agents. AutoGen doesn't try too hard to draw a boundary between agents and non-agents. It tries to incorporate the broadest sense of agents. For example, even a tool-based agent that only performs certain deterministic behavior, like executing a function or executing some code suggested by another agent and sending the response back, is a valid agent concept in AutoGen. The reason is that AutoGen wants to offer these agents not only as an interface to the end user but also as software components, so that developers can build complex AI applications using them. So we want to keep the relatively simple agent, even one that performs a certain tool-based action, as one type of agent, so that developers have all the types of agents they need, even though these are not typically regarded as agents in terms of the interface that the human deals with.
So I guess my general conclusion is: when we think about software components, we want to be broader and incorporate some relatively simple agents as a valid agent concept, since the aim is to have a unified way to build arbitrarily complex applications. But when we come to the interface, I guess there's a higher requirement on intelligence, conversational ability, and the ability to handle complex tasks for people to qualify them as agents. That's probably my take.
Thomas: Yeah, for sure, and I completely understand. It's a
very fuzzy thing and difficult to define. I'm glad that, again, you're mentioning that tool-based agents can be composed into larger systems. This brings us back to the way the term agent was used maybe four or five years ago, when we were talking about multi-agent systems or agentic behavior, swarming, these kinds of things. What we can add in with LLMs is another kind of agentic behavior, but what's most interesting to me is that you can stack these things together, and when you hit an unexpected complexity, what happens after that is often where I see what I consider quite agentic behavior. So I'm glad you're mentioning all of these, even though the space is quite fuzzy. I'd like to hear about your background. Maybe you could tell me how you got into the field, or how you started working with AI?
Chi Wang: Thanks. Let's see, where should we start? OK, let's go backwards. Right now I'm a principal researcher at Microsoft Research AI Frontiers. I've worked on large language models and AI frameworks, automated machine learning and hyperparameter tuning, model selection, machine learning for systems, and scalable solutions for data science and data analytics. I'm the creator of AutoGen, and I'm also the creator of FLAML, another open-source library, for fast AutoML and tuning. It's also a widely used project, both inside and outside Microsoft. So you can see that I've worked on a variety of different projects in AI, machine learning, systems, and data science. My PhD was in data mining; I did text mining and graph mining at that time. By the way, I want to mention an interesting thing: during my PhD I was also working on generative models for text, for natural language, but they were very different from today's generative AI techniques and large language models. Back then they were very small models with high interpretability. They were probabilistic models, something called Bayesian networks,
so probably the younger generation today is not familiar with them, but they are indeed generative models, and also language models, that can be used to model text data. It's amazing to see how fast the technology evolves. Today a lot of the earlier dreams have come true: we can now make very powerful agents based on these strong models. So that's a kind of short background on myself. Did I miss anything?
Thomas: I like that you mention seeing the ability to make things we maybe dreamed about come to reality. Looking back on your experience, are there any standout moments where, in your early experiments, you got really excited or had a breakthrough moment that really stands out to you?
Chi Wang: I think, just like for everyone, the GPT moment is indeed a big standout moment for me. Before that, I think deep learning was also a big breakthrough, but I would say the GPT moment, especially the ChatGPT moment, was the most exciting. Before that I was mainly working on automated machine learning, the FLAML open-source project, for tuning model parameters and choosing the right model for each application automatically. That covers small models, large models, basically any machine learning models. But these GPT models, and the ChatGPT experience, are so powerful and different that I also started trying to apply the AutoML techniques to the GPT models, to see if we could tune the models' inference parameters to get the best quality out of them at the lowest cost, because these models are so expensive. During that work I found many interesting behaviors and many interesting implications about how to build a very effective application using these powerful language models. I realized it's often not enough to just run inference with the model once, or to use a single configuration. Sometimes we need to use different configurations and combine them with different tools, or involve a human in
the loop to build the best experience and the best application. So then I started to think about a more generic way to design these powerful applications, one that covers the very large design space but gives developers a relatively standardized way to simplify the reasoning and consider all the different complexities in a relatively simple fashion. It's similar to the deep learning and machine learning era, when people were able to find standardized APIs and interfaces for developers to build models. Even when they build different types of models, they follow a similar kind of standardized interface; that simplifies a lot of things and made automated machine learning possible. Coming from that background, I was really excited to try to find this common interface for developers to be able to build all different kinds of applications, of different complexities and in different domains, with a unified and simple language and way to reason about them and build them. That's where I found the agent notion to be a very essential and useful concept that can simplify a lot of different things. When designing AutoGen, I tried to say: no matter how complex the application is, let's make the developer's job really easy, with just two steps to build an application. Step one is to define agents, and step two is to get them to talk. As simple as that.
By the way, why get them to talk? It's because the chat interface in ChatGPT reminded me of something I learned a long time ago, back in college, about the power of conversation. When I saw the ChatGPT interface, I immediately related it to that lesson from a professor in my college days. I learned that conversation is a proven way to learn, to make progress, and to eventually solve a problem. That was some kind of belief, maybe deeply rooted, but with the progress of GPT models and ChatGPT,
those lessons got woken up, basically, and I quickly decided that I was going to use conversation as a key mechanism to combine and connect all the different agents and make them work more effectively as an integrated system, solving bigger problems than they could alone. So, back to the design principle of AutoGen: we want developers to be able to build very complex applications in two steps, define agents and get them to talk. The agents, as I mentioned earlier, can be defined in many different ways. They can be simpler agents, or they can be more complex agents that use simpler agents inside them and have nested chat inside them. But whatever they are, eventually they can use a convenient conversation interface to weave them together, have them perform different types of conversations, and eventually work together to solve a problem. That's a very brief introduction to the big moments that influenced the work and some of the design inspired by those moments.
Thomas: Yeah, that's great. I'm glad that you can put it in a few simple phrases: build complex applications in two steps, define the agents and get a way for them to talk with each other. I'd like to hear a little more about the main focus areas of what you're working on now and how AI agents fit into that, and maybe why AI agents versus another approach. What about them is particularly well suited to what you're working on?
Chi Wang: That's a great question. Why did I choose AI agents as the essential concept when I was trying to find that standard interface? There can indeed be many different ways to offer that interface. I think there are several reasons. The most important one, a design principle that comes before I even get to agents, is that we need to remember what the source of this new wave of technology evolution is. Fundamentally it's the large models: they are the newest technology and they
are the most important technical breakthrough, so the initial design principle needs to be centered around them. I was thinking about lessons we learned from earlier operating systems and how we built computer systems before. For example, when the CPU and GPU were the most valuable resources, the operating system would be designed to maximize the utilization of those valuable resources and to give them the right peripherals, like memory and other hardware and software, to make the best use of them. A similar lesson applies here. If we think the large models are the most important and most valuable new kind of resource we have, something you can use as a brain to essentially control the new type of system, then we also need to try to maximize their utilization. So that's number one; that's where it got started.
Then it's maybe not hard to understand why we chose the agentic abstraction. As we noticed, the most advanced large models are indeed able to perform different types of conversations in which they exhibit intelligent and sometimes self-corrective behavior. If we can make that work, we don't have to build applications in the traditional way, where we need to give detailed instructions about what to do at every single step. These models are sometimes able to suggest a plan and follow the plan, and when they encounter unexpected results they're able to discover the problems, debug, and correct their mistakes. If those abilities can be well leveraged to make the agent concept work, then the development of applications will be greatly simplified. Conversely, if we simply use them as a simpler kind of text completion tool, then developers often need to think about many different layers and steps, and it is more complex to build applications. So basically I'm trying to say that after some exploration and thinking, I became convinced that, especially with the progress of
model capability, AI agents will in future be a very promising new way to greatly simplify the development of any complex AI application, and that this is probably the most promising way to maximize the utilization of these powerful models. That's number two.
Number three is about the limitations of the language models. At the time when I started to build an agentic programming framework like AutoGen, there were still a lot of limitations on what these language models could do as agents. If you simply take a language model and have it generate responses, it doesn't always work. In many cases it will hallucinate; it will generate wrong code or wrong answers. So at that time the approach did not seem mature, and many people had doubts about whether it was indeed the right direction to go. When I thought about it, I agreed that the models were not ideal yet, but there were several things that made me believe there was indeed good potential and that I did want to make a bet and follow this path. One is that we know the models are still evolving; they get better every time a new version is released. Number two is that I don't believe we should rely solely on the language model to build agents in any case, even as the models evolve, because they often need external information to validate what they suggest. Even when they become very good at reasoning, they don't necessarily have all the information they need. When performing interactions, they don't necessarily know exactly what the user wants: users may give a relatively vague request, and the models need to figure out the true intention by getting more information from the user. And when they suggest, for example, performing actions in a certain environment, like the user's environment, they also don't know everything about that environment, so they need to get execution results in that environment to validate things, and also,
when external knowledge changes, when the world changes and there is new information that a model didn't have when it was trained, they need to get up-to-date information. So in any case, I don't think the model alone is enough to build useful agents. We will need these types of non-model components in the system. We then have the choice of having separate notions or concepts for each of them, but that goes against the principle of simplifying the reasoning and the concepts. So why not try to find an approach that uses the same notion without introducing new concepts? Can we still use agents to represent them? It turns out we were indeed able to use a more general notion of agents to incorporate all the different kinds of entities, and to make them work together very effectively through this conversational mechanism, conversation programming. Once I saw that, it made me a stronger believer that this agent concept, and especially multi-agent conversation, is a very generic and useful architecture for building any application powered by large models. As time goes by, I have received more and more validation from different users and different types of applications in different domains, and that has made me more and more of a strong believer in it.
Thomas: Yeah, that all makes a ton of sense, and it's really interesting to hear that, as you considered working with agents, you returned over and over again to looking at the advantages of one approach and the disadvantages of another. I'd like to know: are there elements you've found that significantly impact the effectiveness of an AI agent at the design level?
Chi Wang: There are indeed a lot of challenges still. I think it's a big open question what the most effective multi-agent design or architecture is for a particular application, and how we create highly capable agents while ensuring scale, safety, and human agency. These are all very difficult open
questions. I think there are a number of dimensions that we need to pay attention to. One of them is evaluation. I think evaluation is very important for measuring progress and understanding whether we are doing better or not making any advancement. For agentic applications this is especially challenging, because an effective metric is sometimes not easy to find. What is the effective metric for task completion? People can use success rate, but it may not be enough. Even at the same success rate, two different systems can exhibit other different behaviors, like the number of steps they need, the cost involved, and how easy it is to understand what they're doing: transparency, interpretability, all these kinds of different metrics. On the other hand, it's possible to build an agentic evaluation framework, building agents that are able to look into the behavior of other agents and reflect on what the important dimensions are for evaluating them, measure those, and then score the behavior of different systems. We have some initial ongoing work called AgentEval, which is essentially an agent-based evaluation tool. We also have a benchmark tool called AutoGenBench that helps users download benchmarks, run them, and get a sense of how good the performance is. These tools actually helped us reach very good performance on the GAIA benchmark, which is a benchmark by Meta for measuring the capability of models or agents in solving very complex tasks that involve many different steps. Our initial experiment using AutoGen actually turned out to have the number one performance on that benchmark at all three levels, just a few weeks ago. We're excited by that. We know there is still a lot of room for improvement, but this basically demonstrated the big potential of using AutoGen to tackle complex tasks.
OK, so that's evaluation. The second category I want to mention is the
programming interface or, in general, the interface to build agents and make them work together. We're making a lot of progress on that too. Why is that important? As I said earlier, the best architecture has not been found yet, and it may also be different for different applications. So we need a framework that allows people to quickly experiment with different ideas and different architectures, in a modular way. If they want to make a small modification by adding, removing, or modifying an agent, we want to minimize the change needed, and we also want them to be able to think about these problems clearly, instead of having so many coupled components that it's hard to make any modification. These are all reasons why a multi-agent framework like AutoGen is a good choice for modularity, flexibility, and simplicity. And then, if we work together and the community experiments with different ideas together, we could potentially make very fast progress.
Recently we have also been making these different types of conversation programming more explicit. AutoGen already had the fundamental architecture that can support different conversation patterns, and we provided some examples of a high-level interface, like group chat, with multiple agents talking in a group in a free way. But that was meant as an example; we don't mean that this is the only way to build multi-agent applications using AutoGen. To make that more explicit, we recently added a few more high-level constructs for a few other patterns. Previously users could build these patterns using the same basic interface, conversable agents, and we also had examples. But with the newly added high-level constructs, the high-level syntax convenience, it's even easier for users to choose those patterns, like sequential chats, nested chat, and finite-state-machine-based group chat, which allows more control of the agent speaking order in the
group chat. We found that this simplification also greatly helps practitioners build complex agent workflows more quickly. Okay, that's about the interface.
We're building interfaces not only for programmers but also for non-programmers. For example, we have an example application called AutoGen Studio, which allows non-developers to quickly create agents and multi-agent workflows without writing code. So it's a no-code UI, and it has gotten a lot of traction too. I think that will accelerate experimentation with all the different implementations of multi-agent systems.
A third category I want to mention is learning, teaching, and optimization. This is an essential capability for agents to improve over time, for example by learning from past interactions or taking teachings from users or from other agents. So the more people use these agents, the better they get. One benefit is that we can begin with agents that don't always perform as desired, but if we give them the learning capability, they can simply become better as users use them over time, without changing the architecture or rebuilding the application. So you can start with an application whose performance is not so good. Initially you can have some developers use it and teach it things, and once it gets to the point where it's okay to be tested among a large set of users, it can get a larger set of feedback. Gradually you can grow the capability and effectiveness of the application simply by using it. That was a very early design principle when I started building AutoGen; I really wanted to enable this kind of experience. Recently we have also added many such techniques to the AutoGen library. Some of them are about enabling any conversable agent to take teachings, remember them long term, and apply them in future tasks. That
sometimes can have a big effect on performance. Even when you use the powerful GPT models, the performance with and without teaching can be hugely different, and the effort required is relatively lightweight: users often only need to teach once, and the agents can become very useful. We also have techniques like having agents of different capabilities teach each other, such as having a GPT-4 agent teach a GPT-3.5 Turbo agent. Eventually what we get is higher quality, a higher success rate, and lower cost compared to using a single GPT-4 agent, because many cheaper agents can now be used to solve complex tasks they couldn't before. So we're also very excited about the progress in this area.
The fourth category is integration with new technologies, for example with different types of models. We can make them perform different tasks. Some models can be fine-tuned to be very good at certain specialized tasks, with high performance and low cost. That can potentially increase the effectiveness of the application while keeping the cost low, which can eventually lead to faster progress. Another example is integration with multimodal models, giving agents multimodality. We have seen examples of using them for robotics and for embodied agents, and there are many creative agents that can be enabled by this new capability. So in general, integration with new technologies, both in terms of models and other techniques, would be a great way to leap forward and make new breakthroughs. And yeah, we do have lots of examples of these in AutoGen.
I did want to ask about autonomy, because that's something we're very interested in at Valory. I'd like to know: have you managed to create a truly autonomous agent, and if not, what steps do you think are missing?
Yeah, I think that's an equally tricky question.
And of course, one thing that's come up when I've asked this
question in the past is that it's really important to define what we mean by autonomy, because by some metrics we already have autonomous software and have for a long time. But the level of autonomy that people expect from AI agents when they imagine what they can do is sometimes higher than what's possible, though we are sometimes seeing people make at least small steps towards it. So in your own work, are you seeing something that you consider autonomous, that meets your definition?
Yeah. I'm constantly amazed by how creative the community is in building all different types of agents. Every time I'm surprised. I feel like people are now able to build agents for tasks we didn't know they could perform in a more or less autonomous way before. So I'm definitely seeing a lot of progress. One recent example I found is using agents to look at Google Earth maps and try to calculate the area of a wildfire. That can be considered to have a certain amount of autonomy, which I didn't expect them to manage before. So I think, to some extent, in many application domains we are seeing increasingly capable agents that can solve more and more tasks. Of course, once you solve some task, people always wonder: can they solve bigger questions, bigger tasks? It seems a natural progression. I haven't seen a fundamental block that stops agents from solving more and more complex tasks. It's just like any software: we usually do this step by step, and before we tackle the current complexity it's hard to get to the next level. Maybe at some point we'll hit a blocker, but it seems to me that we haven't hit it yet.
Yeah, that's great. So I think we can just expect to see iterative improvement over time as we get new capabilities. And most importantly, I think, more people experimenting is what we really need to move the field and the space forward. As more people try to solve for use cases
we're going to see more and more agents that we do consider truly autonomous.
Yeah, I'm definitely looking forward to that.
Okay, I have one last question for you, and I won't hold you to any of these predictions: do you have any predictions for the near-term future of the field? What do you think might be coming next?
I would expect a number of more complex tasks to see big progress using multi-agent frameworks. The recent breakthrough on the GAIA benchmark kind of hinted at that. That is just one particular benchmark, but we have also recently seen interesting progress on more complex tasks in software engineering and video creation. Even though the absolute performance numbers are still low, like 10% or 20%, I think the basic components are there. The potential of using multiple agents is very high. That's a big design space, and I think people have currently explored only a small fraction of it. So by exploring more of that space, I see a very high chance that people could make a lot of progress on these traditionally complex tasks. Having agents play different roles, talk to each other, perform multiple steps, and restart when necessary can potentially enable a large improvement over the short term.
Really, really encouraging, and I look forward to seeing how that shakes out.
Another potential breakthrough is that we might see smaller models getting more and more performant in specialized tasks, potentially also using AutoGen to prepare the training data. One example I've seen is the Orca-Math model, which uses AutoGen to prepare high-quality and diverse training data for math problems and fine-tunes a small model just for that domain. It outperforms models like GPT-3.5 Turbo, which is much larger. And due to the quality of the synthetic data generated by the multiple agents, the cost is also not very
high. The training data size is not very large; it's quite manageable. If this kind of success is repeated across many different types of tasks, then we should expect lots of progress in training small models and using them in applications, which will greatly bring down the cost and potentially even improve the quality, because of the affordability of the smaller models. And people can potentially build more complex multi-agent systems with them.
Yeah, that's fascinating. As we get to the end here, one last thing I like to ask all of my guests: if listeners or viewers are interested in this space and want to learn more, are there any learning pathways or reading recommendations you'd like to give, or places you would point people to?
Yeah. We are actively building new tutorials on our website and expanding them to cover many advanced topics. Anyone interested in learning about more specific topics, please reach out on Discord and suggest what to add. In general, I've learned a lot of new knowledge, new technologies, and new ideas from our Discord channel, so discussion and collaboration through our Discord channel is a very effective way of learning. I've found many successful examples there. So tutorials, Discord, or getting involved on GitHub: I would encourage people to join these channels.
Perfect. I'll put links in the description box or the show notes for people who are listening. If people want to follow you or know what you're working on, I can include your Twitter account. Is there anywhere else?
Yeah, please: Twitter, Discord, GitHub, and LinkedIn. I'm most active on GitHub and Discord, but I also check Twitter and LinkedIn frequently.
Okay, perfect. Then I'll put links to your profiles in those places for people who want to keep up with what you're doing. I want to thank you for your time. It's been a really
interesting conversation, and I think people are really going to get a lot out of it.
Yeah, thank you so much. Really nice to meet you.
Yeah, likewise.
All right, that is the end of this episode of AI Frontiers. I want to thank you for watching and/or listening, and invite you to check out what Valory is doing at valory.xyz. There you'll find out more about our open-source framework for co-ownership of AI, ways that you can get started running your own decentralized autonomous AI agents, and much more. You can also find us on X, where we are @valoryag. Okay, until next time.