The term AI-agent has been used since the early days of AI in the 1950s, defined as an autonomous entity capable of perceiving its environment, making decisions, and taking actions to achieve specific goals. AI-agents of various flavors have existed for years; however, the recent emergence of LLMs as powerful reasoning engines has enabled a new class of AI-agents that are far more capable than their predecessors. They can act as autonomous workers in enterprises, operating with independence and accountability much like a human, which frees human workers to focus on more strategic and challenging tasks.
AI-agents require a layer of software above the LLM (I’ll call it an Agent Operating System or Agent OS) to orchestrate their activities. An LLM by itself is simply a word prediction engine that takes text in and spits text out. It doesn’t have memory or the ability to take actions. The Agent OS, while making use of the LLM for decision-making, enables something much more powerful: teams of autonomous entities that can operate over long periods of time, make observations about their environments, create and execute action plans, remember pertinent information, self-reflect, and course-correct when necessary based on what is happening.
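To make the memory point concrete, here is a minimal sketch of how a layer above a stateless model might supply memory by replaying earlier turns in each prompt. Everything in it is an assumption for illustration: `stateless_llm` is a hypothetical stand-in rather than a real model API, and `AgentMemory` and `ask` are invented names, not part of any actual Agent OS.

```python
from dataclasses import dataclass, field


def stateless_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call: text in, text out, no memory."""
    return f"reply based on {len(prompt.splitlines())} line(s) of context"


@dataclass
class AgentMemory:
    """Keeps past turns so each new prompt can include them."""
    turns: list[str] = field(default_factory=list)

    def build_prompt(self, user_input: str) -> str:
        # Replay stored turns ahead of the new input; the model keeps nothing.
        return "\n".join(self.turns + [f"user: {user_input}"])

    def record(self, user_input: str, reply: str) -> None:
        self.turns.append(f"user: {user_input}")
        self.turns.append(f"agent: {reply}")


def ask(memory: AgentMemory, user_input: str) -> str:
    """One exchange: build the prompt from memory, call the model, store the turn."""
    prompt = memory.build_prompt(user_input)
    reply = stateless_llm(prompt)
    memory.record(user_input, reply)
    return reply
```

The model function sees more context on the second call only because the surrounding layer put it there, which is the division of labor described above.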
The core construct of the Agent OS is a mechanism whereby agents loop through these actions:
– Perceive
– Decide
– Act
– Self-reflect
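The four steps above can be sketched as a simple loop. This is a toy illustration under stated assumptions, not how any particular Agent OS is built: the `Environment` is just a counter, the `Agent` class and `run` function are hypothetical names, and `decide` uses a fixed rule where a real agent would call the LLM.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Toy environment: a counter the agent must bring to a target value."""
    value: int = 0


@dataclass
class Agent:
    goal: int
    step: int = 4
    log: list[str] = field(default_factory=list)

    def perceive(self, env: Environment) -> int:
        # Observe the current state of the environment.
        return env.value

    def decide(self, observed: int) -> int:
        # Stand-in for an LLM call: move toward the goal by at most `step`.
        gap = self.goal - observed
        return max(-self.step, min(self.step, gap))

    def act(self, env: Environment, delta: int) -> None:
        # Carry out the decision by changing the environment.
        env.value += delta

    def reflect(self, env: Environment) -> bool:
        # Check the outcome against the goal and keep a record of it.
        done = env.value == self.goal
        self.log.append(f"value={env.value} done={done}")
        return done


def run(agent: Agent, env: Environment, max_iterations: int = 10) -> int:
    """Loop perceive -> decide -> act -> self-reflect until the goal is met."""
    for iteration in range(1, max_iterations + 1):
        observed = agent.perceive(env)
        delta = agent.decide(observed)
        agent.act(env, delta)
        if agent.reflect(env):
            return iteration
    return max_iterations
```

Calling `run(Agent(goal=10), Environment())` walks the counter from 0 to 10 in three passes through the loop. The `max_iterations` cap is a design choice worth keeping in any real loop so that an agent that never reaches its goal still stops.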
Perceiving involves the AI-agent sensing its environment through various means, including processing text, audio, images, and other forms of data. The Agent OS ensures the AI-agent maintains an understanding of its context, adapts to new information, and keeps track of past interactions, allowing it to build a coherent picture of the situation it’s dealing with.
Deciding is the process where the AI-agent leverages the LLM to evaluate possible actions based on its goals and the information it has gathered. This decision-making process involves complex reasoning and prediction, akin to how humans use their knowledge and experience to choose the best course of action. The LLM provides the AI-agent with the ability to understand language, generate responses, and decide on actions based on nuanced information.
Acting is the execution phase where the AI-agent carries out the decisions it has made. This can include communicating with other AI-agents, communicating with humans through text, email, or voice, making changes to databases, interacting with SaaS platforms, and even controlling physical devices. The Agent OS facilitates this by providing interfaces for the AI-agent to interact with other systems.
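One way to picture the interfaces mentioned above is a registry that maps action names to functions, so that a decision expressed as data can be dispatched to the right system. The sketch below is an assumption for illustration only: the `ACTIONS` registry, the `action` decorator, the two example actions, and the decision format are all hypothetical, and the functions return strings where real ones would call a mail service or a database.

```python
from typing import Callable

# Hypothetical action registry: maps an action name to a Python callable.
ACTIONS: dict[str, Callable[..., str]] = {}


def action(name: str):
    """Decorator that registers a function under an action name."""
    def register(func):
        ACTIONS[name] = func
        return func
    return register


@action("send_email")
def send_email(to: str, body: str) -> str:
    # A real implementation would call a mail service here.
    return f"email to {to}: {body}"


@action("update_record")
def update_record(table: str, record_id: int, status: str) -> str:
    # A real implementation would issue a database write here.
    return f"{table}[{record_id}].status = {status}"


def execute(decision: dict) -> str:
    """Dispatch a decision of the form {"action": name, "args": {...}}."""
    name = decision["action"]
    if name not in ACTIONS:
        # Refuse anything that was not explicitly registered.
        return f"unknown action: {name}"
    return ACTIONS[name](**decision.get("args", {}))
```

For example, `execute({"action": "send_email", "args": {"to": "ops@example.com", "body": "done"}})` runs the registered email function. Because only registered names are dispatched, the registry also acts as an allow-list of what the agent is permitted to do.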
Self-reflecting and course-correcting distinguish advanced AI-agents from simple automated systems. The Agent OS equips the AI-agent with mechanisms to evaluate the outcomes of its actions, learn from successes and failures, and adjust its strategies accordingly. This continuous feedback loop ensures that the AI-agent improves over time, becoming more effective and reliable in achieving its goals.
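As a rough illustration of that feedback loop, the sketch below records whether each attempt succeeded and shifts toward whichever strategy has worked best so far. It is a deliberately simple assumption, not a description of how any real Agent OS implements reflection: the `Reflector` class and the strategy names are hypothetical, and a success-rate tally stands in for the richer evaluation an LLM-driven agent could perform.

```python
from dataclasses import dataclass, field


@dataclass
class Reflector:
    """Tracks outcomes per strategy and prefers the best success rate."""
    strategies: list[str]
    outcomes: dict[str, list[bool]] = field(default_factory=dict)

    def record(self, strategy: str, success: bool) -> None:
        # Store the outcome of one attempt with the given strategy.
        self.outcomes.setdefault(strategy, []).append(success)

    def success_rate(self, strategy: str) -> float:
        results = self.outcomes.get(strategy, [])
        # Untried strategies get an optimistic score so they are tried early.
        return sum(results) / len(results) if results else 1.0

    def choose(self) -> str:
        # max() returns the first strategy among ties, in listed order.
        return max(self.strategies, key=self.success_rate)
```

With `Reflector(["email", "phone"])`, a failed email attempt recorded via `record("email", False)` makes `choose()` switch to `"phone"`, which is the course-correction step in miniature.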
AI-agents represent an essential element of AI transformation, enabling a new type of entity within the enterprise. By combining the power of LLMs with sophisticated Agent OS frameworks, these AI-agents can perform a wide range of tasks with a high degree of independence and accountability.