Microsoft’s AutogenStudio running on Llama3 locally by LM Studio! Using for Arxiv research!
Microsoft’s AutogenStudio running on Llama3 locally by LM Studio! Using for Arxiv research!
Leveraging Local AI Powers: A Deep Dive into Microsoft’s AutogenStudio with Llama3 on LM Studio
In the rapidly evolving landscape of artificial intelligence (AI), local computing capabilities and research tools are empowering developers and researchers like never before. Microsoft’s AutogenStudio, coupled with Llama3 by LM Studio, is at the forefront of this innovation, offering robust AI-driven solutions right from the convenience of local setups. Today, we’ll explore how this powerful combination is enhancing research capabilities, particularly within the realms of Arxiv research.
Introduction to Autogen and Local AI Deployment
The shift towards localized AI deployment marks a significant step in the AI community, providing users with greater control over their computational tools and data privacy. AutogenStudio by Microsoft, when run on a local server environment using LM Studio’s Llama3, exemplifies this shift. It offers an all-local, cohesive environment for AI agents to perform complex tasks without the need for cloud computing resources.
Setting Up AutogenStudio Locally
The user experience begins with setting up the AutogenStudio environment on a local machine. This involves installing LM Studio’s Llama3, a powerful tool in itself with an impressive capability for handling extensive model calculations at a quantized efficiency. Imagine running an 8-billion parameter model, quantized at 8 bits, directly on your local server – this has become a reality with Llama3. The setup process, although technical, unlocks the potential for running intricate AI operations with surprising speed, reaching up to six or seven tokens per second in response time.
Exploring AI-Driven Research with AutogenStudio
One of the quintessential features of AutogenStudio locally is its ability to delve deep into specific research queries using bespoke AI agents. For instance, when tasked to explore promising scientific advances related to space travel and energy production, the system can efficiently parse through extensive databases, like Arxiv, to pull relevant academic papers. This is done through a structured dialogue between AI agents, where each agent plays a role in refining the search and presenting the results.
AI Speed and Efficiency: The Need for Improvement
Despite the remarkable capabilities of running these AI models locally, there remains a pressing need for increased speed. Current speeds allow for the processing of six to seven tokens per second – adequate, yet far from ideal. The dream is to reach two to three hundred tokens per second, thereby significantly reducing response times and enhancing the usability of AI agents in practical research scenarios.
Local Configuration and Customization
A significant part of running AutogenStudio locally involves configuring and customizing settings to fit specific needs. Researchers can alter base URLs in the system configuration to direct the AI’s focus, adapting it to various research requirements. This flexibility in configuration underscores AutogenStudio’s utility in academic and research settings, where customization is often key to obtaining relevant and actionable insights.
Practical Applications and Case Studies
During practical use, AutogenStudio has shown its worth by successfully managing complex queries about recent scientific advances, despite sometimes pulling older but still relevant papers. This indicates a robust archival search capability but also highlights areas where prompt adjustment can refine results further.
The Future of Local AI Tools
Looking forward, the integration of AutogenStudio and Llama3 by LM Studio is set to become even more influential. With advancements in AI models and increasing context windows (up to 128K), local AI operations will only become faster and more efficient, paving the way for more dynamic and extensive research applications.
Conclusion: The Localized Revolution in AI Research
The combination of Microsoft’s AutogenStudio with Llama3 represents a significant leap toward democratizing AI research and application. Running these tools locally not only ensures data privacy and security but also offers researchers control over their computational environment. As AI continues to evolve, the local empowerment provided by tools like AutogenStudio will undoubtedly play a crucial role in shaping the future of AI-driven research and development.
With technologies like these becoming more accessible, the potential for innovation in fields such as quantum computing, space exploration, and energy solutions is immense. The localized AI revolution is just beginning, and its impact on scientific research and practical applications will be profound and far-reaching.
[h3]Watch this video for the full details:[/h3]
Here’s a short video I made of me running AutogenStudio using LM Studio to run Llama3 as the ai brains. It works surprisingly well! Here i’m able to make a skills call out to Arxiv to retrieve papers on promising space technology!
[h3]Transcript[/h3]
hello and welcome back uh I just wanted to do a little video today on autogen Studio running completely locally using LM studio uh and running LL 3 right now uh so this is all just going locally and these agents are working really wonderfully um I’ll show you here’s the the LM Studio server running I’m using the do dolphin 29 the uncensored llama 3 uh 8 billion actually uh Model quantized 8 uh quantized at eight bits uh you can see it’s getting here’s my request thank you for finding uh oh no that’s not my request that is the agents talking to each other content uh thank you for finding those archive papers so it’s relatively Speedy I’m getting maybe I don’t know six seven tokens a second you look it’s it’s it’s responding right now however fast that is um you know six seven tokens a second it needs to be a lot faster to really be convenient um I think that once we get this llama 3 level performance in terms of quality of the responses and AI model itself uh once we get that level of quality but a two or 300 token a second uh inference speed I think that these AIS are just going to fly and I think we’ll see AI agents just take off uh so that’s that’s just a little bit of my how I think this is all going to go but let’s look at this what it’s doing uh these are my two agents they’re talking back and forth to each other the prompt that I sent in this time was one of the most promising scientific advances in archive papers in regards to future of space travel and energy production I am interested in propulsion breakthroughs and miniaturization of energy production particularly so you can go here to the Windows server and see this was my first uh that was the previous request right here find archive papers on semiconductors and Computing uh oh not there here uh so that was the conversation I had there uh let’s see where’s the it’s going to be a little uh whatever uh in any case you can track it’s going back I may cut that out I spent a little bit too much time right there looking for this but it’s going back and forth uh explore new areas of physics Quantum Corrections Anda cral area of research for developing blah blah blah blah blah um oh here’s actually the I just found it uh here’s the the one I just sent in uh Energy Products uh production interested in whatever so you can see from there it goes uh from that was my user proxy that just basically fed that same request I had I believe straight to the primary assistant here uh and then the primary assistant used this skill called find archive papers which I believe came with uh autogen studio uh so this is remarkable uh for home scientists right or even universities get up your get your local archive uh Studio running and just give incredible access to your students right uh for their research but uh any anyway it gave it gave you a few here few few examples then it delves deeper and it tries to refine the answer so that’s what it’s doing right now is it’s refining uh it’s still going here I don’t know exactly how long it will go but we can see it will keep going here uh I Here’s the the oh no I’m sorry I did have to change one thing uh I had to go into the uh was it this into some base file and I I saw where autogen Studio I saw in the logs where it was it had a base URL oh is this it right here syn Ki client yes this is what I did I changed this this line right here um in the open AI site package because autogen Studios kind of built around open AI uh they’re both Microsoft well not products but uh you know open AI is is they Microsoft invests heavily in open Ai and uh and this uh sorry I lost my train of thought there uh this autogen studio it’s a Microsoft product too uh sorry uh autogen studio is is is open AI so it kind of makes sense that they kind of use each other uh or whatever so I I saw where it was setting the base URL and you could probably find a config file to do this with I was experimenting with that a little bit or trying to with this um I I couldn’t quite get it with config files but I I’m sure that if I continued to look around at it we could you know I could figure it out um and then in the in here it just you have your local models pointing to your local um your local model server I’m running llama 3 and it’s just working wonderfully uh you you can see I did come across one error uh I think but I don’t even remember what it was uh but it seemed like this request finished okay to me me uh and it has all of the history right here that’s one nice thing about this UI is this uh all of the history of the agents you can see right here user proxy primary assistant uh it’ll give you the history once it finishes responding so that’s really convenient uh you can see exactly how the AI agents made their decisions to give you the response that they gave you uh if this finishes at all soon I’ll show you the sign flow the sine wave one that I did earlier uh on here that that worked uh but let’s see if this is running still um well we’ll see I I’m probably going to cut the video here uh and just skip to once this is back uh once this gives me a response and we’ll see what the response is uh so yeah we’ll give that just a minute and I I’ll see you in a moment okay and we’re back uh they finished the agents you can see agents responded certainly here’s a brief summary of the paper recent advances in Quantum simulation with cold Adams from 2011 so 13 years ago kind of out of date uh did I SP specify recent advances uh I just said one the most promising scientific advances I should have said recent uh but I did say most promising so maybe that why this is older because it’s very promising and uh is holding true I don’t know in any case uh here’s the 12 messages it took 10 minutes to run this so that’s that’s you know we’ve got to get that that speed down this other one let’s see if we can just scroll to the top yeah you can see here 22 minutes it took this other one uh 12 messages this one took 10 minutes and 4 45 seconds so yeah that’s that takes a while um but yeah that’s uh the the reason this I think all works is because these contexts are passed with a string of uh of previous messages into the into the history right each you know each this I think was maybe the uh the response but uh you’re going to get a series of messages like this roll and content assistant or user proxy or whatever and um it helps it to track state and and also context right that’s that’s the context Windows uh this 8,000 context window for llama 3 uh that’s pretty low uh but there are there are some I I think that there are other versions coming out soon with like 128k I want to say somebody was releasing a version of llama 3 with 28k context window um those that will come in time and that will only make this better the only thing is that the larger your context window gets the longer it takes to the AI to process all of that input and generate an output that is Meaningful um so uh yeah I think I mean that’s that’s pretty dog gone impressive here let’s go to the sine wave one this was something I said earlier uh right I just clicked the sine wave button here which is a built-in thing and just popped up this message write a python script to plot a sign wave and save it to dis as a ping file sine wave.png uh and you get agents you’re welcome blah blah blah and here’s the results one file and you can see well hot dog that’s looks right this is all running locally it’s you can see the same workflow local workflow I don’t remember what this one was uh oh this was earlier um I was running this inside my uh V my code to and so it didn’t have quite the permissions I guess to execute things on my system so now I’m running it through Powershell and uh I just run autogen Studio UI Port 8081 I think is the command it’s very simple command uh and uh it’s the simplest command uh sorry I had to do that uh so yeah just you run a little command in here I did have to do something uh like export some key or key set or I don’t know something uh before I was able to execute scripts in the pshell but once I could do that I was able to activate my correct environment uh and run the the uh run this script with all of the Python libraries installed so that’s it uh I just wanted to show a little bit about this autogen studio uh running locally with LM Studio on llama 3 it’s amazing uh and by goly it’s going to knock our socks off so uh thanks for watching if you watch this far uh and see you again maybe next time