I wish every AI Engineer could watch this.
The Revolutionary Potential of Large Language Models (LLMs) in AI Engineering
Large Language Models (LLMs) are transforming the artificial intelligence landscape, delivering capabilities that are increasingly integral to applications across industries, from tech to healthcare. For AI engineers, understanding what LLMs can do and where to apply them is essential. This guide presents a structured, five-level framework for grasping the extent and utility of LLMs.
Introduction to LLMs and Their Applications
LLMs are advanced AI systems capable of understanding and generating human-like text based on the vast amount of data they have been trained on. They function by predicting the next word in a sequence, making them powerful tools for a variety of language-based tasks. However, there’s a lot of mystique and misunderstanding about what LLMs can and cannot do and where they can be effectively applied. This guide aims to demystify LLMs and provide a structured approach to harnessing their capabilities through a multi-level framework.
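To make the "predicting the next word" point concrete, here is a minimal sketch of greedy next-token decoding, using the Hugging Face transformers library with GPT-2 (chosen purely because it is small; any causal language model behaves the same way):
[code]
# Greedy decoding: the model scores every vocabulary token and we
# repeatedly append the single most likely one to the sequence.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The capital of India is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):                      # generate five more tokens
        logits = model(ids).logits          # scores for the next token
        next_id = logits[0, -1].argmax()    # pick the most likely one
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
[/code]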
Level 1: Question-Answering Engines
At the most basic level, LLMs can be used as question-answering engines. In this application, an LLM processes a user’s question and delivers an answer based on its training data. The setup is straightforward, but it forms the foundation for more complex interactions, such as chatbots or virtual assistants.
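At this level the whole application is one round trip: prompt in, answer out, with no state kept between calls. A minimal sketch, assuming the OpenAI Python SDK and a placeholder model name (any chat-completion API follows the same pattern):
[code]
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def answer(question: str) -> str:
    """One-shot Q&A: send a prompt, get an answer. Nothing is remembered."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any instruction-tuned model works
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(answer("What is the capital of India?"))  # e.g. "... New Delhi."
[/code]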
Level 2: Building Conversational Chatbots
Evolving from simple Q&A applications, LLMs can be adapted to support conversational chatbots. This involves not just responding to isolated queries but maintaining context across a session of interaction. By keeping the conversation history in the model’s context window, the mechanism behind in-context learning, LLMs can provide more nuanced and contextually relevant responses.
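Structurally, the only change from Level 1 is that the growing message list is resent on every turn; that list is the chatbot's short-term memory. A sketch under the same OpenAI-SDK assumption as above:
[code]
from openai import OpenAI

client = OpenAI()
history = []  # short-term memory: every turn of the conversation so far

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=history,  # all prior turns ride along in the context window
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

chat("What is the capital of India?")                # "New Delhi ..."
print(chat("What are some famous cuisines there?"))  # "there" resolves via history
[/code]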
Level 3: Integrating External Knowledge with Retrieval-Augmented Generation (RAG)
To make LLM applications more robust, external knowledge sources can be integrated using the retrieval-augmented generation approach. This involves enhancing the LLM’s responses with information retrieved from external databases or documents, allowing for responses that require up-to-date or domain-specific knowledge that the LLM’s original training data might not cover.
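A minimal sketch of the retrieve-then-augment shape. Real systems use a learned embedding model and a vector database; here a fake embed() (hashed random vectors, so the demo runs but does not retrieve semantically) and brute-force cosine similarity stand in for both, and answer() is the one-shot helper from the Level 1 sketch:
[code]
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real system would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

documents = [  # hypothetical internal knowledge the base model never saw
    "The iPhone 16 team is managed by Jane Doe.",
    "Q3 procurement spend was reviewed in March.",
]
index = [(doc, embed(doc)) for doc in documents]  # built once, up front

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    cosine = lambda v: np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))
    return [doc for doc, v in sorted(index, key=lambda p: -cosine(p[1]))][:k]

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question))
    return answer(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
[/code]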
Level 4: Advanced Applications with Function Calling and Multi-Agent Systems
At a more advanced level, LLMs can be employed to perform function calls, allowing them to interact with other software systems, APIs, or tools to perform specific tasks. Furthermore, the concept of multi-agent systems involves deploying multiple LLM agents that can handle specialized tasks, working together to achieve complex goals, much like a team of experts each handling different aspects of a project.
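A sketch of function calling with the OpenAI SDK (the tool-schema format shown is OpenAI's; other providers differ in syntax but not in shape). Note that the model never executes anything itself: it returns structured arguments, and our code makes the real call. The sketch assumes the model decides to use the tool:
[code]
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    """The real tool lives in our code; stubbed here for illustration."""
    return f"22 degrees and sunny in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What's the weather in Chennai?"}],
    tools=tools,
)
call = response.choices[0].message.tool_calls[0]  # the model chose a tool
args = json.loads(call.function.arguments)        # structured JSON arguments
print(get_weather(**args))                        # we make the actual call
[/code]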
Level 5: Towards an LLM Operating System
The pinnacle of LLM applications envisages an LLM-based operating system, where large language models act as the central processing unit that orchestrates a variety of tasks, manages data, interacts with users, and utilizes other AI models and tools. This level represents the integration of all previous capabilities into a single, unified system, offering unprecedented interaction and automation capabilities.
Implications and Future Directions
Understanding the different levels at which LLMs can operate allows AI engineers and practitioners to better utilize these models for specific applications. The potential of LLMs to transform operations across sectors is immense, emphasizing the importance of staying current with developments in this field. As LLM technology continues to evolve, so too will the opportunities for innovative applications and solutions.
In conclusion, every AI engineer and enthusiast should endeavor to understand and keep abreast of the advancements in the field of Large Language Models. Their transformative potential is vast, with the ability to redefine interactions between humans and machines. The journey through the five levels of LLM applications not only provides a roadmap for leveraging this technology but also stimulates the exploration of new frontiers in AI research and real-world applications.
[h3]Watch this video for the full details:[/h3]
Timestamps
00:00 Intro
00:02 Understanding the framework for using LLMs in various applications
02:15 Question answering with LLM
06:54 Chatbots need more than short-term memory for effective use.
09:13 LLM is central to leveraging prompt, short-term, and long-term memory
13:46 Importance of Context Window in Language Models
15:55 Implement retrieval augmented generation for chatbots
20:06 Leveraging LLM for NLP tasks
22:20 Function calling in AI models enables structured responses.
26:18 Understanding the concept of Agents in AI
28:30 Agents are the next Frontier in AI development.
32:24 Developing AI with extended tools and memory capabilities
Links
This video gives a blueprint for LLM apps, walking through a five-level framework for building AI applications.
If you want to support the channel
Support here:
Patreon – https://www.patreon.com/1littlecoder/
Ko-Fi – https://ko-fi.com/1littlecoder
Follow me on
Twitter – https://twitter.com/1littlecoder
Linkedin – https://www.linkedin.com/in/amrrs/
[h3]Transcript[/h3]
Five levels of LLM apps. Consider this a framework that helps you decide where you can use an LLM. There are a lot of myths around what LLMs can do, what they cannot do, and where you should use them today, so I decided to put together this material, in which I'll take you through a mental framework: based on the extent or depth to which you go with an LLM, you can decide where it fits. We'll first look at the different levels of LLM apps that I've put together, and then a slight extension of that; I've got two documents to take you through. This will give you an idea of how LLMs are being used today and how you can use them in your own applications.

To start with, imagine a pyramid. It's a very simple structure, and as with any pyramid, the peak is our aspirational goal, while the bottom is the easiest thing we can do. As with everything else, you have to climb slowly toward the top before you can hit the aspirational goal.

So where do we use LLMs first? Q&A: a question-answering engine. What do I mean by that? A question-answering engine is a system where you have an LLM and all you do is ask it a question: you send a prompt, the LLM processes it, and it gives you an answer. That is the entire transaction. Large language models are nothing but sophisticated next-word prediction engines that have been fine-tuned on instructions; these instruction-fine-tuned models can take a human instruction and give you an answer back. For example, if I ask "What is the capital of India?", the LLM processes it, it has the information needed to answer, and it gives the answer back: the capital of India is New Delhi. That's all you do at this level: level one, question answering.

Now you might wonder where a question-answering engine gets used. This is the first thing people built; even back in the GPT-2 days, people were building simple Q&A bots. Ask a question, get an answer: it could be homework, a general-knowledge question, something about the world, about science, about anything. It's a simple three-step process: send a prompt, let the LLM process it, get the answer back. A very simple application.

Now you're going to add something to that application, and that is how you actually build a conversational chatbot. To understand this better, let me take you to my second document. Whenever we talk about LLMs, there's one important thing to understand: we have crossed the stage where an LLM is simply a large language model; we have more than that. To see it, I use five dimensions: a prompt, a short-term memory, a long-term memory (external knowledge), tools, and extended tools. If you think of the LLM as your horizontal, these are your verticals, the different dimensions you can add to an LLM.
Let me give you an example of each. A prompt is just "What is the capital of India?": you give the question, the LLM understands it and gives the answer back. Short-term memory is when you have conversational history or other information inside the model's context; this is what we call ICL, in-context learning. Whatever you stuff inside the context window, the LLM can use; that is your short-term memory. For instance, you give a few-shot example like "What is the capital of the US? Washington, DC", and a bunch of examples like that, so the LLM knows how it's supposed to answer. Next is external data: you take data from, say, Wikipedia and give it to the LLM; that is your long-term memory, because short-term memory, just like a computer's RAM, gets reset every time you reset the conversation or session. Then tools: you let the LLM use a calculator, the internet, a Python terminal, and so on. And extended tools take this much further. So those are the five dimensions: a prompt; short-term (in-context) memory; long-term memory, meaning external or custom knowledge; tools like calculators and a Python REPL; and extended tools that go well beyond what we currently have.

Now, back to what we wanted to build: the chatbot. How do you turn a Q&A bot into a chatbot? At this point you may already have the idea. You take a prompt and give it to the LLM, but now with short-term memory, in-context learning. In a simple Q&A bot you ask "What is the capital of India?" and the LLM answers "New Delhi", and that's it. To make it conversational, you add the new dimension, short-term memory: you keep everything you've been conversing about in the conversation history. What that lets the LLM do is this: you ask "What is the capital of India?", it says "New Delhi", and then you can simply ask "What are some famous cuisines there?" The LLM understands you're talking about New Delhi, because the conversation is stored in its short-term, in-context memory, so it can do in-context learning and give you the right response. That is how you move up the pyramid: you take the Q&A bot, give it the new dimension called history, and it becomes a chatbot that can converse.

Chatbots have applications everywhere you turn: customer support, websites, education (you've seen plenty of demos from Khan Academy). A chatbot is versatile; it has a purpose in almost every business or domain you can think of. But a chatbot by itself is not enough. Why? Can you pause and answer if you know? The reason a chatbot is not enough for a lot of use cases is that it stops at short-term memory. You need long-term memory, external memory.
For example: I ask what the capital of India is, it says New Delhi; I ask about famous cuisines there, and it gives me a perfectly valid answer. The LLM is doing its job. But now say I'm an organization; take Apple as an example. I ask "Who is the CEO of Apple?" The internet has that information, so it says Tim Cook. Easy. Now, if I ask "Who is the manager of the team handling the iPhone 16?", will it answer? No. Well, it might answer, because it hallucinates a lot, but the answer would not be correct. That has become a big bottleneck in a lot of enterprise use cases, because you don't just need internet knowledge, the knowledge the LLM already has; you need more than that. That is the custom-knowledge or external-knowledge component, the dimension you need to make your LLM more than just a chatbot. And that is where a technique called RAG comes into the picture: retrieval-augmented generation, where you take the knowledge you provide (call it long-term memory: your documents, the internet, every source you have around), route that knowledge to the LLM, and have the LLM leverage it. At this point you may have guessed the progression: first we had only the prompt, one dimension; then short-term memory, two dimensions; now external knowledge, a third dimension, with the LLM at the center of all three: prompt, short-term memory, and long-term memory.

To understand this better, let's see what a RAG system looks like. You have the LLM at the center, and your data is available somewhere, in different shapes. It could be in a database; most organizations keep data in a structured RDBMS. Then you have documents, which are unstructured: PDFs, HTML files, internal portals, and so on. And then you have APIs: say you're a sales team, your data probably sits in a CRM like Salesforce, so you need a programmatic call to get it; you're a marketing team, you need data from Google Ads; your company is heavily on AWS, you need billing and cost data. So your data lives in these different places: structured databases, unstructured documents, and programmatic (API) access. You use the appropriate method, structured parsing, unstructured parsing, or a programmatic call, take all that input data, and create an index. An index is what Google builds at every moment: all those websites exist, and Google creates an index so that it's easier to traverse them when somebody asks a question. That's how Google became popular: before Google people used something totally different, and Google came up with the PageRank algorithm, at the foundation of which is an index (with different parameters, of course). We're definitely not building Google, but an index is what we're building: it makes it easy to know what is inside the data.
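To make the ingestion step concrete, here is a sketch of turning mixed sources into an index. The loaders are hypothetical stubs (a real pipeline would parse PDFs, query the RDBMS, and call the CRM API), and embed() is the placeholder from the RAG sketch earlier:
[code]
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks for indexing.
    Character-based for simplicity; real systems usually split on tokens."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

# Hypothetical loaders standing in for the three source types discussed.
def load_pdf(path: str) -> str:    return "employee handbook text ..."
def dump_table(name: str) -> str:  return "id,name,team\n1,Jane Doe,iPhone 16"
def fetch_crm_notes() -> str:      return "account notes pulled over an API ..."

sources = [load_pdf("handbook.pdf"), dump_table("employees"), fetch_crm_notes()]
index = [(c, embed(c)) for doc in sources for c in chunk(doc)]  # (chunk, vector)
[/code]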
Now a user comes in and asks a question: who is the manager of the iPhone 16 team? That question goes to the index, and the system picks out only the relevant information. The index might hold information about all the teams (iPhone 16, Apple Vision Pro, billing, accounting, procurement, marketing, and so on), but what you're interested in is only the piece you asked about: the iPhone 16 manager. So the retrieval step takes only the relevant information from the index and matches it against your query, and then both things go to the LLM: the prompt you asked and the data that was extracted. The LLM gives the answer back to the user.

This is quite different from the chatbot application, and I'll show you why. In the chatbot, all you're doing is this: you have a question, you have memory (sometimes you might add a long-term element by doing user profiling, but ignore that for now), the question and the memory go into the prompt, the LLM answers, and you get the answer back. Now you might ask me: why do I need to put my data in an external store and create an index? Why can't I just keep it in memory? If you have that question at this point, it's a very important one and you're thinking in the right direction. The reason we cannot do that, or could not in the early days of LLMs, is an important factor called the ctx window, the context window. The short-term memory plus the question is bounded by the context window of the particular LLM. A model might have a 4K context window, which was quite common, or 8K, while Gemini-class models now go up to a million tokens. Here's what actually happens: you ask question one and get answer one back; then question two, answer two; and by the time you send question three, you're not sending just question three, you're sending all the preceding turns as well. Say each question is 2K tokens and each answer is 1K: 2K question, 1K answer, 2K question, 1K answer, then another 2K question. By the third turn of the conversation (I'm exaggerating the numbers, but still): 2 + 1 + 2 + 1 + 2 = 8K. If you have an 8K-token model, at this point it runs out of context; it simply cannot hold all of that in short-term memory.
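The arithmetic from that (deliberately exaggerated) example, written out; the per-turn token counts are the ones used above, not measurements:
[code]
CONTEXT_WINDOW = 8_000                    # e.g. an 8K-token model

# The whole history is resent on every turn, so the costs accumulate:
turns = [("Q1", 2_000), ("A1", 1_000),
         ("Q2", 2_000), ("A2", 1_000),
         ("Q3", 2_000)]

total = sum(tokens for _, tokens in turns)
print(total)                              # 8000: the window is already full
# One more token of history and the request no longer fits; older turns
# must be dropped, summarized, or moved into an external index (RAG).
[/code]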
That is exactly why you need RAG, retrieval-augmented generation: the knowledge is not bound to the conversation. Of course you still keep a conversation, but you don't have to stuff everything into your question; you keep it in your index, which you've already built, and only the relevant bit comes back to you. Now you might ask how that's possible, and that takes you down a separate, tangential path about semantics, embeddings, and semantic search, which is out of scope here. If you want to go deep, you should read about RAG; LlamaIndex is an excellent library to read about it. They have a really good developer-relations operation and a lot of articles, and you should definitely read about LlamaIndex and RAG if you want advanced RAG. But I hope you get the point.

Going back to the system we put together: what do we have? A Q&A system at the front, which takes an input and gives an output, nothing else. Then the chatbot, where the input plus history go in together (that's the short-term memory), you get the output, and the output also feeds back into the input; that's how the conversation history is kept. Then you have RAG, retrieval-augmented generation, so called because you have a retrieval component, you augment the LLM with it, and then you generate the response. The applications are enormous. There are a lot of startups in 2024, when we're recording this, doing nothing but RAG. If you can build a solid RAG solution today, you can probably even raise funding, or run a successful SaaS; there are companies making really good money, solid money, out of it. I'll give you an example: SiteGPT. If you go to SiteGPT, it says "make AI your expert customer support agent", and I know this product makes a lot of money, hundreds of thousands of dollars. At its foundation it is RAG: it takes all the information available on your website and indexes it (we call that data ingestion and indexing), and when you ask a question it gives you an answer back. It's not a plain chatbot; it's a chatbot that answers based on your existing data.

So if you're breaking into LLMs today, I strongly encourage you to build a RAG system; that should be your default. If you're a university student watching this, or early in your career, build a couple of RAG examples, because there are a lot of nuances in RAG: how you improve indexing, how changing the chunking improves indexing, what algorithms you use for embedding, which models work well with RAG, whether putting the retrieved text at the top of the prompt is better, or at the bottom, or in the middle. There are a lot of components to RAG; it's not just the simple picture we usually discuss on this channel. You can go into advanced RAG, and I'd strongly encourage you to spend time there.

Before we get to something even more exciting, I want to quickly show you one more thing that not many people discuss when we talk about LLMs. It isn't RAG; it only uses short-term memory, not long-term memory, but it has its own potential: using large language models for classical NLP tasks, classical downstream NLP tasks. For example, say you want to build a text classification system. You give it a sentence, for example "The movie was complete crap", and it decides: is this positive or negative? Traditionally you would train a text classification model just to figure that out.
Or another example: you have a review, say "The movie was amazing and the actress was exceptional", and you try to build a model that says what kind of review this is: is it about the movie, the theater, the director, or the actor? Here it's actor-related. That is text classification in classical NLP, and there are plenty of other classical NLP tasks. What you can do, without building a custom model (say a BERT-based model or an XGBoost model), is use large language models for classical NLP problems, because LLMs have really good in-context learning. With the context windows you now have, plus a few-shot examples, tree-of-thoughts, or chain-of-thought prompting, you can make a large language model a good zero-shot NLP classifier, and the same applies to a lot of other tasks. This is something not many people are exploring, and I encourage you to explore it if you work on classical NLP problems: labeling, text classification, entity recognition, whatever it is, you can leverage an LLM. Now, whether you actually want an LLM-based solution is a different topic; I'm not telling you to go looking for a nail because you have a hammer. I'm just saying this is a good option when you don't want to build models, although if you can build a model, it will probably be cheaper than making LLM calls. But for summarization, text classification, and entity recognition, I think LLMs are exceptional zero-shot performers on downstream tasks, and you should definitely leverage them.
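A sketch of an LLM as a zero-shot classifier, reusing the one-shot answer() helper from the Level 1 sketch: no labeled data, no training, just a constrained prompt (production use would add few-shot examples and validate the output):
[code]
def classify_sentiment(review: str) -> str:
    """Zero-shot text classification via prompting; no trained model needed."""
    prompt = ("Classify the sentiment of this movie review. "
              "Reply with exactly one word: positive or negative.\n\n"
              f"Review: {review}")
    return answer(prompt).strip().lower()

print(classify_sentiment("The movie was complete crap"))   # "negative"
print(classify_sentiment("The actress was exceptional"))   # "positive"
[/code]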
Now, with all of that, we have covered RAG, and we're entering a very interesting phase: the thing everybody is obsessed with, everybody's love, agents. In the recent announcements from Google and Microsoft, and previously OpenAI, you'd have seen two common trends. One is multimodality: instead of just chatting with text, you can chat with images, ask questions by voice and have it respond in speech, send videos. The second trend you see everywhere is agents: multi-agent setups where you have multiple agents you can summon to do certain tasks, and they do them for you, just like the Men in Black; they have a purpose and they carry out certain tasks.

But before I jump into agents, I want to introduce another important concept called function calling, because function calling is the precursor to LLM agents. In function calling you have a prompt, you have short-term memory, sometimes external memory, sometimes not, but you give the model the ability to call external tools, and you do that through something called function calling. Function calling, to be honest, is a terrible name, because you're not calling any function, and you're not making the LLM call anything, not at all. All you're doing is forcing the LLM to give you a structured response back, so that you can make the call. Let me give you an example. Everybody uses a weather API for this, so I'll skip that; say you have a currency converter. What does a currency converter need? An input currency, an output currency, a date, and an amount: what amount do you want to convert, from what currency, to what currency, for what date. Let's keep it a simple API. Now, typically if you go to an LLM and ask "What is USD to INR today?", first of all the LLM may not understand what "today" is. It might know USD and it might know INR, but the LLM's memory is frozen; a large language model is a snapshot, frozen at, say, September 2023 or whenever. So it cannot give you the latest information. You could sort of do this with RAG, ingesting fresh knowledge every day, but that's not very efficient; expand it to the stock market and daily data doesn't even help, because everything changes every minute and every second. You need something instant. What do you do? You call an API; that's what any programmer would naturally do. And to call this currency-converter API I need four arguments (input, output, date, amount) in a solid format. It can't be "United States dollar" one time, "USD" another time, "US dollar" some other time; that will not work. You need a specific format for everything: the amount should be a number, the date should be a date object. So you have to force the LLM to give you a particular kind of response, because otherwise it will throw anything back at you, like "oh, USD and INR on September 2023". You have to guide the LLM to produce a particular output format, and somehow everybody has universally accepted that this format is going to be JSON, except Anthropic, which absolutely loves XML. If you use Anthropic you use XML; with any other model you use JSON. So you force the LLM to give you a structured response back, a JSON, that helps you make this function call; you can call the function with that JSON. A guided response in JSON is what everybody calls function calling: you don't necessarily call the function during function calling, but you get the output that lets you make the call. Clear?
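A sketch of that "guided response" idea on the currency-converter example, again reusing answer() from the Level 1 sketch. The key names and convert_currency() are invented for illustration, and a production version would validate the JSON and retry on bad output:
[code]
import json

PROMPT = ("Extract the currency conversion request as JSON with exactly these "
          "keys: input_currency (ISO code), output_currency (ISO code), "
          "date (YYYY-MM-DD), amount (number). Reply with JSON only.\n\n"
          "Request: {req}")

def convert_currency(input_currency, output_currency, date, amount):
    """Hypothetical wrapper around a real exchange-rate API."""
    return f"{amount} {input_currency} -> {output_currency} on {date}: ..."

args = json.loads(answer(PROMPT.format(req="Convert 100 USD to INR today")))
# e.g. {"input_currency": "USD", "output_currency": "INR",
#       "date": "2024-05-20", "amount": 100}
print(convert_currency(**args))  # our code, not the LLM, makes the call
[/code]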
That is exactly why it is the precursor to agents: in a function call you have the ability to call a function, and agents are nothing but a bunch of function calls stitched together with tools. So what do we have in agents? A bunch of function calls, plus tools. I'd like to show you a very interesting example that can help you understand agents; if you've been in the AI world a while, you'll probably recognize it immediately: this was the workflow of something called BabyAGI. BabyAGI was quite popular back in the day, which is to say less than a year ago, maybe a bit more. A function call, as I said, is the foundation of agents. But what is an agent? If you look at our pyramid, the agent sits right near the top, close to our aspirational goal.

So how do you define an agent? It's simple. A chatbot, a RAG system, all of the levels so far end in text, or some other modality like images or video; they produce a response and they're done. What you achieve with an agent is something genuinely striking: you don't stop at a text response, you stop at an action. You trigger an action. Simply put: you take an LLM, you connect it with tools, and you give it a purpose or goal; that is your agent. And that is exactly what BabyAGI did back in the day. There are multiple agent frameworks now, but if you look at BabyAGI, which is a wonderful framework, you can see there is a task, something that has to happen; there are tools, for example a vector DB; and every agent has a purpose (you execute, you return, you do this or that) and a goal. Tools, purposes or goals, and LLMs, all working together toward a common goal: that is your agent. Among the agent frameworks popular these days are CrewAI, LangGraph, and AutoGen, and in most of them you first define a role, then define a goal, then say which LLM you want as the backend engine, and put the system together. That's a single agent; assemble a team of them and that is your multi-agent setup. People are doing amazing things with agents: you can make an agent book your ticket, or have an agent read something, distill it, create a note, and publish the blog post. You can summon agents to do a lot of things. Personally, agents are where I've spent the most reading time, because it's becoming obvious that agents are the next frontier in how we take LLMs forward. There are a lot of directions, but I'm particularly interested in automation, and I think agents are going to be the next big thing; in fact they already are. Google has its own agent projects (I don't remember all the names), OpenAI has its own agents, and every time you talk to a company, the conversation turns to agents, because you want to summon these agents; you want to connect these LLMs to another dimension, the tools dimension. You take LLMs, you have the function-calling ability, and once you connect them to tools you unlock something immense; that is what you call agents. I'm not going deeper into agents here, because I'm hoping this becomes a series, depending on how you all like it, and my next focus in the series is going to be agents.
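A sketch of "function calls stitched together with tools": a minimal agent loop in which the LLM repeatedly picks a tool until it declares the goal done. The JSON action protocol and the toy tools are invented for illustration; frameworks like CrewAI, LangGraph, and AutoGen implement this loop far more robustly. answer() is again the one-shot helper from the Level 1 sketch:
[code]
import json

def search_web(query: str) -> str:      # toy tool
    return "flight options: ..."

def book_ticket(route: str) -> str:     # toy tool: note it ends in an ACTION
    return f"booked {route}"

TOOLS = {"search_web": search_web, "book_ticket": book_ticket}

def run_agent(goal: str, max_steps: int = 5) -> str:
    scratchpad = f"Goal: {goal}\n"
    for _ in range(max_steps):
        decision = json.loads(answer(
            scratchpad + 'Reply with JSON only: {"tool": ..., "arg": ...} '
                         'to act, or {"done": <final answer>} to finish.'))
        if "done" in decision:
            return decision["done"]
        observation = TOOLS[decision["tool"]](decision["arg"])
        scratchpad += f"{decision['tool']}({decision['arg']}) -> {observation}\n"
    return "step limit reached"

print(run_agent("Book the cheapest flight from Chennai to Delhi"))
[/code]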
Agents sit close to the top, and that brings us almost to the end of the video: our aspirational goal, the thing we're all working toward, the LLM OS. This is inspired by Andrej Karpathy, who created this excellent diagram. The idea is to use the LLM at the center of an operating system. If you go back in time, the computer was created for simple calculation: you want to add a and b, you store one in a and two in b, and you add them; that's roughly how computing started, very far back. Then computation kept increasing, compute became cheaper, and we got the computers we have today. Karpathy is arguing: can we have a similar vision for LLMs? The vision is that you keep the LLM at the center, and around it you have RAM, which is the short-term memory or context window; long-term memory, the disk, which can be used with RAG; the agent structure with tools; a connection to the internet; connections to other LLMs for a multi-agent or peripheral setup; and peripheral devices for audio and video. Can we put together a system with all of these working toward a common goal? That, ideally, becomes your large language model operating system. It's quite a vision at this point. There are certain implementations available, but they're based on our current understanding; they're mostly LLMs plus function calling plus agents, multi-agent setups with more tools, nothing radically different yet. That's why, even in the framework I've made, the LLM OS is marked as currently developing. It is everything we've covered together: the tools, the extended tools, the peripheral devices, long-term memory, short-term memory, with just one input from the user, after which it can run itself and execute things. I think that's the future we're heading toward; I'm not sure when we'll get there. If somebody says AGI today, for me AGI could look like this, like a grown-up BabyAGI. I don't trust AGI as a near-term concept, and leaving consciousness and all of that aside, I'd say the LLM OS sits at the top, where we can expect something closer to AGI to happen, and all the previous levels lead us up to it.

I wanted to keep this video brief, but it's already going to run more than half an hour. I wanted it to be a crash course, so that if you don't know anything about LLM apps, maybe you haven't taken any course, this helps you see how the future, the LLM OS, is coming together and what leads up to it. Let me know in the comments if you like this kind of content, and I'll put together more. This took me a lot of time: creating the framework, designing it, and arranging it in a thought process that makes it understandable; it's essentially what a lot of LLM courses offer. I'm definitely looking forward to your feedback, and if you like this format, subscribe to the channel. See you in another video. Happy prompting!