Agent
Last reviewed
May 30, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v8 · 5,834 words
This article covers the foundational, conceptual, and historical idea of an agent in AI and computer science (the rational agent, PEAS, classical architectures, reinforcement learning, and pre-LLM milestones). For contemporary LLM agents in practice, see AI agents; for the 2023 to 2026 "agentic AI" paradigm and discourse, see Agentic AI.
See also: AI agents, Agentic AI, Multi-agent system, Machine learning terms
In artificial intelligence (AI), an agent is an entity that perceives its environment through sensors and acts upon that environment through actuators in pursuit of objectives. The concept is the central organising abstraction of the field. Stuart Russell and Peter Norvig open Chapter 2 of Artificial Intelligence: A Modern Approach with the line that an agent is "anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators," and frame the entire study of AI as "the study and design of rational agents."[1]
This article focuses on the foundational concept of an agent as it has developed across more than five decades of AI research: from the actor model and early planning systems of the 1970s, through the reactive and BDI architectures of the 1980s and 1990s, into reinforcement learning, classical game-playing agents, and finally the modern LLM-driven systems of the 2020s. For the contemporary product and engineering landscape (Operator, Claude Code, Devin, frameworks, benchmarks, market data), see the companion articles AI agents and Agentic AI.
Imagine you have a robot friend who can look around a room, think about what to do, and then do it. If the robot sees that the floor is dirty, it decides to vacuum. If it bumps into a chair, it turns and goes another way. That robot is an "agent" because it can sense things (see the dirty floor), think about them (decide to vacuum), and take action (start cleaning). AI agents work the same way, but most live inside computers. Some play chess, some answer questions, some help drive cars, and the newest ones can read websites and write computer code for you. They all share the same simple recipe: sense, think, act, repeat.
Russell and Norvig define an agent abstractly through its agent function, a mathematical mapping from every possible percept sequence to an action:
f : P* -> A
where P* is the set of all possible percept sequences and A is the set of actions available to the agent. The agent function is the abstract mathematical description; the agent program is the concrete implementation that runs on a physical or virtual platform (the agent architecture). A rational agent is one that, for each possible percept sequence, selects the action expected to maximise its performance measure, given the evidence the percept sequence provides and whatever built-in knowledge it possesses.[1]
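The difference between the agent function and the agent program can be illustrated with a minimal sketch. It assumes a two-square vacuum world with percepts of the form (location, status); the names `TABLE` and `make_table_driven_agent` are hypothetical, and the table covers only a handful of percept sequences.

```python
from typing import Callable, Dict, List, Tuple

Percept = Tuple[str, str]   # (location, status), e.g. ("A", "Dirty")
Action = str

# A partial tabulation of the agent function f : P* -> A.
# Keys are whole percept sequences, not single percepts.
TABLE: Dict[Tuple[Percept, ...], Action] = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Suck",
    (("A", "Dirty"), ("A", "Clean")): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}

def make_table_driven_agent(table) -> Callable[[Percept], Action]:
    """Return an agent program that looks up the full percept history."""
    percepts: List[Percept] = []

    def program(percept: Percept) -> Action:
        percepts.append(percept)
        # Sequences missing from the table fall back to doing nothing.
        return table.get(tuple(percepts), "NoOp")

    return program

agent = make_table_driven_agent(TABLE)
first = agent(("A", "Dirty"))    # history so far: one percept
second = agent(("A", "Clean"))   # history so far: two percepts
print(first, second)
```

The table is the agent function written out explicitly; the closure returned by `make_table_driven_agent` is one possible agent program for it. Because the number of percept sequences grows exponentially with their length, practical agent programs compute the mapping rather than store it.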
Russell and Norvig are careful to distinguish rationality from omniscience and from perfection. Rationality maximises expected performance given what the agent can know; perfection would require knowing the actual outcome of every action in advance, which is generally impossible.[1]
Four ingredients fully specify an agent: the percepts it can receive, the actions it can take, the goals or performance measure it tries to optimise, and the environment in which it operates. Russell and Norvig package these into the PEAS framework (Performance measure, Environment, Actuators, Sensors), which they use to specify the task environment of any agent before discussing its internal design.[1]
| PEAS element | Question it answers | Self-driving taxi example |
|---|---|---|
| Performance measure | What counts as success? | Safe, fast, legal, comfortable trips; profits |
| Environment | What does the agent live in? | Roads, traffic, pedestrians, customers, weather |
| Actuators | What can the agent change? | Steering, accelerator, brake, signal, horn, display |
| Sensors | What can the agent perceive? | Cameras, lidar, GPS, speedometer, accelerometer, microphone |
A second taxonomy classifies the task environment itself along several axes that determine how hard the design problem is.[1] An environment is fully observable if sensors give access to the complete state and partially observable otherwise; deterministic if the next state is fully determined by the current state and action, stochastic otherwise; episodic if each action stands alone, sequential if actions have lasting consequences; static or dynamic; discrete or continuous; single-agent or multi-agent; and known or unknown (whether the agent has access to the rules). A chess agent operates in a fully observable, deterministic, sequential, static, discrete, multi-agent, known environment. A self-driving taxi sits at the hard end of nearly every axis: its environment is partially observable, stochastic, sequential, dynamic, continuous, and multi-agent.
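The environment axes can be recorded as a small data structure. The sketch below is illustrative only: the class `TaskEnvironment` and the method `hard_axes` are hypothetical names, each boolean is `True` for the easier value of its axis, and treating the taxi's environment as known is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskEnvironment:
    name: str
    fully_observable: bool
    deterministic: bool
    episodic: bool
    static: bool
    discrete: bool
    single_agent: bool
    known: bool

    def hard_axes(self) -> list:
        """Names of the axes on which this environment takes the harder value."""
        return [axis for axis, easy in vars(self).items()
                if axis != "name" and not easy]

chess = TaskEnvironment("chess", True, True, False, True, True, False, True)
# known=True for the taxi is an assumption made for this sketch.
taxi = TaskEnvironment("taxi", False, False, False, False, False, False, True)

print(chess.hard_axes())   # the axes on which chess is hard
print(len(taxi.hard_axes()), "of 7 axes are hard for the taxi")
```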
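The interaction between an agent and its environment follows a cyclical pattern that is especially well formalised in reinforcement learning. At each discrete time step t, the agent observes the current state s_t, selects an action a_t according to its policy, and receives a scalar reward r_{t+1} as the environment transitions to a new state s_{t+1}.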
This cycle repeats until a terminal condition is met or the process continues indefinitely. When the environment satisfies the Markov property (future states depend only on the current state and action, not on history), the framework is called a Markov Decision Process (MDP).[2]
| Component | Symbol | Description |
|---|---|---|
| State | s_t | Representation of the environment at time t |
| Action | a_t | Choice made by the agent |
| Reward | r_{t+1} | Scalar feedback signal from the environment |
| Policy | pi(s) | Mapping from states to actions |
| Value function | V(s) | Expected cumulative reward from state s |
| Action-value function | Q(s, a) | Expected cumulative reward from taking action a in state s |
| Transition function | T(s, a, s') | Probability of moving to state s' after taking action a in state s |
| Discount factor | gamma | Weight on future rewards, typically 0.9 to 0.99 |
The expected return from a state under policy pi is V_pi(s) = E[ sum_{t>=0} gamma^t r_{t+1} | s_0 = s, pi ], where r_{t+1} is the reward received after the action taken at step t. Optimal control reduces to finding the policy that maximises this quantity. The same loop describes a thermostat regulating a room, a chess engine choosing moves, a robot arm placing a chip on a circuit board, and an LLM calling tools in a browser. The differences lie in the action space, the observation space, and how the policy is computed, not in the abstraction.
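The value function of a fixed policy can be computed by repeatedly applying the Bellman expectation backup. The sketch below assumes a hypothetical two-state MDP (a room that is either cold or warm) whose transition probabilities and rewards are invented for illustration.

```python
from typing import Dict, List, Tuple

GAMMA = 0.9

# Hypothetical two-state MDP under a fixed policy pi.
# P[s] lists (probability, next_state, reward) for the action pi picks in s.
P: Dict[str, List[Tuple[float, str, float]]] = {
    "cold": [(1.0, "warm", 0.0)],   # heating on: the room becomes warm
    "warm": [(0.8, "warm", 1.0),    # stays comfortable
             (0.2, "cold", 1.0)],   # drifts back to cold
}

def evaluate_policy(P, gamma: float, tol: float = 1e-10) -> Dict[str, float]:
    """Iterative policy evaluation: repeat the Bellman expectation backup."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {
            s: sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for s, outcomes in P.items()
        }
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V

V = evaluate_policy(P, GAMMA)
print(V)
```

For this toy problem the fixed point can be checked by hand: V(cold) = 0.9 V(warm) and V(warm) = 1 + 0.9 (0.8 V(warm) + 0.2 V(cold)), which gives V(warm) = 1 / 0.118, roughly 8.47, and V(cold) roughly 7.63.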
Russell and Norvig's textbook, now in its fourth edition (Pearson, 2021), classifies agents into five types based on internal structure and level of sophistication. Each successive type builds on the capabilities of the previous one. This framing has structured AI courses for nearly thirty years and remains the canonical introduction to the agent concept.[1]
Simple reflex agents select actions based solely on the current percept, ignoring the entire percept history. They operate through condition-action rules ("if the car ahead is braking, then apply brakes"). They work well in fully observable environments but fail when the environment is partially observable because they have no memory of past events. A household thermostat is a classic example: it turns on heating when the temperature drops below a threshold and turns it off when the threshold is exceeded. A spam filter that only inspects the current message and applies fixed rules is another. Simple reflex agents fail catastrophically in any setting where the right action depends on context the current percept does not reveal.
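A simple reflex agent reduces to a function of the current percept alone. The thermostat below is a minimal sketch; the function name, the action strings, and the 20 degree default setpoint are assumptions.

```python
def thermostat_agent(temperature_c: float, setpoint_c: float = 20.0) -> str:
    """Condition-action rule: uses only the current percept, no history."""
    if temperature_c < setpoint_c:
        return "heat_on"
    return "heat_off"

print(thermostat_agent(18.0), thermostat_agent(22.0))
```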
Model-based reflex agents maintain an internal model of the world that tracks aspects of the environment not directly visible at any moment. This internal state is updated after each action and percept using two kinds of knowledge: how the world evolves independently of the agent, and how the agent's own actions affect the world. By maintaining this model, the agent can handle partially observable environments far more effectively than a pure reflex agent. A self-driving car that remembers a pedestrian who briefly stepped behind a parked truck is a model-based reflex agent. The internal model can be as simple as a flag ("the lights are on") or as elaborate as a 3D occupancy grid of the surrounding street.
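The pedestrian example can be sketched with a single piece of internal state. The class `ModelBasedDriver`, its `memory_steps` parameter, and the action names are hypothetical; a real driving stack tracks far richer state.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelBasedDriver:
    """Keeps internal state about a pedestrian that may be occluded."""
    memory_steps: int = 3                    # hypothetical: how long to stay cautious
    steps_since_seen: Optional[int] = None   # None means no pedestrian seen yet

    def act(self, pedestrian_visible: bool) -> str:
        # Update the internal state from the new percept.
        if pedestrian_visible:
            self.steps_since_seen = 0
        elif self.steps_since_seen is not None:
            self.steps_since_seen += 1
        # The rule fires on the internal state, not on the raw percept.
        if (self.steps_since_seen is not None
                and self.steps_since_seen <= self.memory_steps):
            return "slow"
        return "cruise"

driver = ModelBasedDriver()
percepts = [False, True, False, False, False, False, False]
actions = [driver.act(visible) for visible in percepts]
print(actions)
```

The agent keeps slowing for three steps after the pedestrian disappears from view, which a simple reflex agent given the same percepts could not do.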
Goal-based agents extend model-based agents by incorporating explicit goal information that describes desirable states. Rather than just reacting, these agents use search and planning algorithms to identify sequences of actions that will achieve their goals. This makes them flexible: when the environment or goals change, the agent can recompute its plan rather than requiring a complete rewrite of its condition-action rules. A robot vacuum that plans an efficient path through a room is a goal-based agent. So is a route planner that searches a graph of intersections to compute the shortest path to a destination, and a STRIPS planner that chains preconditions and effects to assemble a plan.
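A route planner of the kind described here can be sketched as breadth-first search over a graph of intersections. The road map `ROADS` and the function `plan_route` are hypothetical, and the search minimises the number of hops rather than distance.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical road map: intersection -> directly reachable intersections.
ROADS: Dict[str, List[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

def plan_route(graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: returns a path with the fewest hops, or None."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph[node]:
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

print(plan_route(ROADS, "A", "F"))
print(plan_route(ROADS, "A", "E"))   # a new goal needs no new rules, only a new search
```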
Utility-based agents go further by employing a utility function that maps each state (or sequence of states) to a real number representing how desirable that state is. While goal-based agents have a binary notion of success and failure, utility-based agents can compare multiple outcomes on a continuous scale. This is especially important when there are conflicting goals ("arrive on time" vs. "avoid bumpy roads"), when goals can be achieved to different degrees, or when there is uncertainty about outcomes. A rational utility-based agent selects the action that maximises expected utility, weighing probabilities and desirability of potential outcomes. A financial trading agent that balances expected return against variance is a utility-based agent.
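Expected-utility maximisation can be shown with the "arrive on time" versus "avoid bumpy roads" trade-off. Every number below, the outcome table `OUTCOMES`, and the weights inside `utility` are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical routes: each maps to (probability, minutes_late, bumpiness) outcomes.
OUTCOMES: Dict[str, List[Tuple[float, float, float]]] = {
    "highway":   [(0.7, 0.0, 0.1), (0.3, 20.0, 0.1)],
    "back_road": [(1.0, 5.0, 0.6)],
}

def utility(minutes_late: float, bumpiness: float) -> float:
    """Illustrative utility: penalise lateness and discomfort (weights are assumptions)."""
    return -1.0 * minutes_late - 10.0 * bumpiness

def expected_utility(action: str) -> float:
    """Probability-weighted average utility over the action's outcomes."""
    return sum(p * utility(late, bump) for p, late, bump in OUTCOMES[action])

best = max(OUTCOMES, key=expected_utility)
print(best, expected_utility("highway"), expected_utility("back_road"))
```

With these assumed numbers the highway has expected utility of about -7 and the back road about -11, so the agent accepts a 30% risk of a long delay in exchange for a smoother ride. A goal-based agent with the single goal "arrive on time" could not express that trade-off.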
Learning agents improve their performance over time through experience. They consist of four conceptual components in Russell and Norvig's diagram: a learning element that makes improvements based on feedback, a performance element that selects actions, a critic that evaluates how well the agent is doing relative to a fixed performance standard, and a problem generator that suggests exploratory actions to discover new experiences. Nearly all sophisticated AI systems today are learning agents in some form. AlphaGo learned its policy from millions of self-play games; an LLM-based coding agent learns implicitly from the gradient updates that produced its base model and then explicitly from examples in its prompt.
| Agent type | Internal state | Planning | Learning | Example |
|---|---|---|---|---|
| Simple reflex | None | No | No | Thermostat |
| Model-based reflex | World model | No | No | Spam filter with context state |
| Goal-based | World model + goals | Yes | No | Route planner, STRIPS |
| Utility-based | World model + utility function | Yes | No | Financial trading agent |
| Learning | All of the above + learning element | Yes | Yes | Self-driving car, AlphaGo |
This taxonomy is conceptual rather than architectural. A modern coding agent like Devin blends elements of all five: it reacts to immediate test failures, maintains a model of the codebase, decomposes goals into subtasks, weighs alternative implementations against a quality measure, and updates its plan as new information arrives.
Russell and Norvig's five types describe what agents do conceptually. A parallel research literature, mostly developed between the mid-1980s and late 1990s, addresses how to actually build them. Three architectural families dominated that period and still shape modern systems.
The earliest AI agents were deliberative, in the tradition Brooks would later label "sense-model-plan-act." STRIPS (1971), the planner inside the SRI robot Shakey, took a symbolic model of the world plus a goal and produced a sequence of actions whose preconditions and effects led from the initial state to the goal. Successors such as NOAH, later partial-order planners, and Graphplan refined the search but kept the same shape. Deliberative agents are powerful when the world model is accurate and tractable but brittle when the environment changes faster than the planner can replan.
In 1986, Rodney Brooks at the MIT AI Lab introduced the subsumption architecture, presented in the paper "A Robust Layered Control System for a Mobile Robot."[3] Brooks rejected the central role of symbolic world models for mobile robotics, arguing that "the world is its own best model." Instead of a single planner, a subsumption robot is built from layered behaviours, each of which couples sensors directly to actuators. Lower layers (avoid obstacles, wander) run continuously; higher layers (explore, identify objects) can subsume lower ones by suppressing their outputs when needed. The approach produced robust real-time robots in unstructured environments (Allen, Herbert, Genghis) and launched behaviour-based robotics. Brooks's slogan "intelligence without representation" (1991) became a manifesto for purely reactive agents.
Purely reactive agents have no memory and no model; they cannot reason about distant goals. The trade-off motivated hybrid architectures, such as TouringMachines (Ferguson, 1992) and InteRRaP (Muller, 1996), which layer a deliberative planner on top of a reactive component.
The most influential symbolic agent architecture is the belief-desire-intention (BDI) model, originally a philosophical theory of human practical reasoning proposed by Michael Bratman in Intention, Plans, and Practical Reason (Harvard University Press, 1987).[4] Bratman argued that intentions cannot be reduced to combinations of belief and desire; they are a third kind of mental state that commits an agent to future action and stabilises planning over time.
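Anand Rao and Michael Georgeff translated Bratman's framework into a computational architecture in a series of papers culminating in "BDI Agents: From Theory to Practice" at ICMAS 1995.[5] In a BDI agent, beliefs are the information the agent holds about its environment, desires are the states of affairs it would like to bring about, and intentions are the desires it has committed to achieving.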
The deliberation cycle of a BDI agent updates beliefs from percepts, chooses which desires to adopt as intentions, selects a plan to achieve those intentions, and executes the plan while monitoring for events that would require reconsideration.
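The deliberation cycle can be sketched in a few lines. This is a heavily simplified illustration, not PRS or AgentSpeak: the classes `Desire` and `BDIAgent`, the `PLAN_LIBRARY`, and the rule "commit to the highest-priority unachieved desire" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Beliefs = Dict[str, bool]

@dataclass
class Desire:
    name: str
    achieved: Callable[[Beliefs], bool]   # goal condition evaluated over beliefs
    priority: int

# Hypothetical plan library: intention name -> sequence of primitive actions.
PLAN_LIBRARY: Dict[str, List[str]] = {
    "recharge": ["go_to_dock", "dock"],
    "clean_floor": ["go_to_dirt", "vacuum"],
}

@dataclass
class BDIAgent:
    desires: List[Desire]
    beliefs: Beliefs = field(default_factory=dict)
    intention: Optional[Desire] = None
    plan: List[str] = field(default_factory=list)

    def step(self, percept: Beliefs) -> Optional[str]:
        self.beliefs.update(percept)                        # 1. revise beliefs
        # 2. Reconsider: drop the intention once it is believed achieved.
        if self.intention and self.intention.achieved(self.beliefs):
            self.intention, self.plan = None, []
        # 3. Deliberate: commit to the highest-priority unachieved desire.
        if self.intention is None:
            options = [d for d in self.desires if not d.achieved(self.beliefs)]
            if not options:
                return None
            self.intention = max(options, key=lambda d: d.priority)
        # 4. Means-ends reasoning: fetch a plan if none is in progress.
        if not self.plan:
            self.plan = list(PLAN_LIBRARY[self.intention.name])
        return self.plan.pop(0)                             # 5. execute one step

agent = BDIAgent(desires=[
    Desire("recharge", lambda b: not b.get("battery_low", False), priority=2),
    Desire("clean_floor", lambda b: not b.get("floor_dirty", False), priority=1),
])
trace = [
    agent.step({"battery_low": True, "floor_dirty": True}),
    agent.step({}),
    agent.step({"battery_low": False}),
    agent.step({}),
    agent.step({"floor_dirty": False}),
]
print(trace)
```

While the agent is committed to recharging it does not reopen the choice between desires on every step. That persistence is the role Bratman assigns to intentions.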
The Procedural Reasoning System (PRS), developed by Georgeff and Amy Lansky at SRI International in the mid-1980s, is generally recognised as the first implementation of the BDI model.[6] PRS was used in a fault detection system for the reaction control system of the Space Shuttle Discovery, in factory process control, and in air traffic management. It inspired a family of BDI agent programming languages, including AgentSpeak(L) (Rao, 1996) and its open-source implementation Jason (Bordini and Hubner), as well as Jadex and JACK Intelligent Agents.[7]
| Architecture family | Year | Key contributors | Idea |
|---|---|---|---|
| Symbolic planning (STRIPS) | 1971 | Fikes, Nilsson (SRI) | Search over action sequences with preconditions and effects |
| Actor model | 1973 | Hewitt, Bishop, Steiger (MIT) | Concurrent objects communicating by message passing |
| Blackboard systems (HEARSAY-II) | 1980 | Erman et al. (CMU) | Cooperating knowledge sources sharing a workspace |
| Subsumption | 1986 | Brooks (MIT) | Layered reactive behaviours, no central world model |
| BDI | 1987 to 1995 | Bratman, Rao, Georgeff (SRI) | Beliefs, desires, intentions; PRS, AgentSpeak |
| Soar | 1987 | Laird, Newell, Rosenbloom | Unified cognitive architecture with chunking |
| Hybrid layered (TouringMachines, InteRRaP) | 1992 to 1996 | Ferguson, Muller | Reactive plus deliberative layers |
| Agent-oriented programming (AOP) | 1993 | Shoham (Stanford) | Programming paradigm in mentalistic terms[8] |
Carl Hewitt, Peter Bishop, and Richard Steiger introduced the actor model in their 1973 IJCAI paper "A Universal Modular ACTOR Formalism for Artificial Intelligence."[9] Actors are concurrent computational entities that respond to messages by sending new messages, creating new actors, and changing their own future behaviour. The actor model is a direct ancestor of every modern message-passing concurrency framework and of agent communication generally.
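The three things an actor may do on receiving a message can be shown with a toy runtime. The sketch below is an assumption-laden stand-in: `ActorSystem` delivers messages from a single queue on one thread, whereas real actor runtimes are concurrent, and all class and function names are hypothetical.

```python
from collections import deque
from typing import Any, Callable, Deque, List, Tuple

class Actor:
    def __init__(self, system: "ActorSystem",
                 behaviour: Callable[["Actor", Any], None]) -> None:
        self.system = system
        self.behaviour = behaviour      # an actor may replace this itself

    def send(self, target: "Actor", message: Any) -> None:
        self.system.queue.append((target, message))

class ActorSystem:
    """Single-threaded stand-in for a concurrent runtime (an assumption)."""
    def __init__(self) -> None:
        self.queue: Deque[Tuple[Actor, Any]] = deque()

    def spawn(self, behaviour) -> Actor:
        return Actor(self, behaviour)

    def run(self) -> None:
        while self.queue:
            target, message = self.queue.popleft()
            target.behaviour(target, message)

log: List[str] = []

def logger(me: Actor, msg: Any) -> None:
    log.append(str(msg))

def counter(count: int, out: Actor):
    def behaviour(me: Actor, msg: Any) -> None:
        if msg == "inc":
            me.behaviour = counter(count + 1, out)   # change own future behaviour
        elif msg == "report":
            me.send(out, f"count={count}")           # send a new message
    return behaviour

system = ActorSystem()
sink = system.spawn(logger)             # creating actors
c = system.spawn(counter(0, sink))
for m in ["inc", "inc", "report"]:
    system.queue.append((c, m))
system.run()
print(log)
```

The counter holds no mutable field for its count. Its state lives entirely in which behaviour it will use for the next message, which is the sense in which an actor "changes its own future behaviour".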
Yoav Shoham's "Agent-Oriented Programming" (Stanford, 1990; revised 1993) proposed that the natural way to specify an agent is in mentalistic terms (belief, commitment, choice, ability) and that a programming language should expose these primitives directly. His AGENT0 language, although a research prototype, framed the conceptual goal of agent-oriented programming and is widely cited.[8]
Michael Wooldridge and Nicholas Jennings's survey "Intelligent Agents: Theory and Practice" (The Knowledge Engineering Review, 1995) gave the most widely cited operational definition of an intelligent agent. They argued that an agent is autonomous if it can act without direct human intervention and has control over its own state, and that an intelligent agent is autonomous plus three further properties:[10]
| Property | Definition |
|---|---|
| Autonomy | Operates without direct human intervention; controls its own state and behaviour |
| Reactivity | Perceives its environment and responds in a timely manner to changes |
| Proactiveness | Exhibits goal-directed behaviour by taking the initiative, not just responding |
| Social ability | Interacts with other agents (or humans) through some agent communication language |
The Wooldridge and Jennings list became the de facto definition of an intelligent agent in the multi-agent systems community and is repeated in essentially every textbook on the subject. It also clarified the field's vocabulary: a thermostat is autonomous and reactive but not proactive or social, while a deliberative robot may be proactive but not yet social, and so on.
A parallel definitional thread comes from MIT. Pattie Maes coined the term "intelligent agent" in her doctoral and post-doctoral work in the late 1980s and built one of the most influential research groups on interface agents at the MIT Media Lab from 1991 onward.[11] Her group's work on learning interface agents, recommendation systems (HOMR and Ringo, 1994), and the metaphor of a "personal assistant who collaborates with the user" anticipates many features of modern LLM agents by three decades.
Michael Genesereth and Steven Ketchpel's 1994 Communications of the ACM article "Software Agents" introduced Agent Communication Languages (ACLs) to the wider software community and influenced the standardisation efforts that followed.
When several agents share an environment, the design problem becomes the design of a multi-agent system (MAS). Multi-agent research is closely tied to distributed AI, game theory, and economics. Yoav Shoham and Kevin Leyton-Brown's textbook Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (Cambridge, 2009) gives the canonical modern treatment.
Three threads dominate the MAS literature:
The first agent communication language was KQML (Knowledge Query and Manipulation Language), developed within the DARPA Knowledge Sharing Effort starting in 1990. KQML introduced the idea of performatives: speech-act-style message types such as ask, tell, subscribe, and achieve that name the intent of a message rather than just its content.[12]
In 1996 the Foundation for Intelligent Physical Agents (FIPA) was established as a non-profit standards body to produce interoperability specifications for agent-based systems.[13] Its FIPA-ACL specification of 1998 became the dominant successor to KQML. FIPA-ACL formalised over 20 performatives (inform, request, propose, confirm, agree, etc.) and gave them a formal semantics grounded in a logic of belief, desire, and intention. It also specified interaction protocols (contract-net, request, query), agent management, and message transport. FIPA was incorporated into the IEEE Computer Society in 2005. Reference platforms such as JADE (Java Agent Development Framework) implement the full FIPA stack and remain in industrial use.
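The idea of a performative naming the intent of a message can be sketched as follows. This is loosely modelled on an ACL request interaction and is not the FIPA wire format: the class `ACLMessage`, the reduced performative set, the `handle` function, and the content strings are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

# A small subset of performatives, chosen for this sketch.
PERFORMATIVES = {"inform", "request", "agree", "refuse"}

@dataclass(frozen=True)
class ACLMessage:
    performative: str   # the intent of the message
    sender: str
    receiver: str
    content: str        # what the intent is about

    def __post_init__(self) -> None:
        if self.performative not in PERFORMATIVES:
            raise ValueError(f"unknown performative: {self.performative}")

def handle(msg: ACLMessage, can_do: set) -> List[ACLMessage]:
    """Participant side of a simplified request interaction."""
    if msg.performative != "request":
        return []

    def reply(performative: str, content: str) -> ACLMessage:
        return ACLMessage(performative, msg.receiver, msg.sender, content)

    if msg.content not in can_do:
        return [reply("refuse", msg.content)]
    return [reply("agree", msg.content), reply("inform", f"done {msg.content}")]

request = ACLMessage("request", "buyer", "seller", "ship-order")
accepted = handle(request, can_do={"ship-order"})
rejected = handle(request, can_do=set())
print([m.performative for m in accepted], [m.performative for m in rejected])
```

The receiver dispatches on the performative before it interprets the content, so two agents can follow an interaction protocol without sharing a full content language.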
The International Conference on Autonomous Agents and Multiagent Systems (AAMAS), the field's flagship venue, was created in 2002 in Bologna, Italy by merging three predecessor conferences: the International Conference on Autonomous Agents (AA), the International Conference on Multiagent Systems (ICMAS), and the International Workshop on Agent Theories, Architectures, and Languages (ATAL). AAMAS is run by the non-profit International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS) and has met annually since.[14]
In reinforcement learning (RL), the agent is the central learning entity. Unlike supervised learning where correct answers are provided, an RL agent must discover which actions yield the highest reward through trial and error. The agent interacts with its environment over many episodes, gradually improving its policy.
RL agents are broadly categorised as model-based or model-free. Model-based agents build an internal model of the environment's transition dynamics and use it for planning. Model-free agents learn directly from experience without constructing an explicit environment model. Model-free approaches are often simpler to implement but may require more training data; model-based methods can be more sample efficient but rely on the accuracy of the learned model.[2]
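A model-free agent can be illustrated with tabular Q-learning, one standard algorithm of this kind. The sketch assumes a hypothetical five-state corridor with a reward of 1 at the right-hand end; the hyperparameters and the seed are arbitrary choices.

```python
import random
from typing import List

N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]                      # index 0 = left, index 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # arbitrary illustrative values

def step(state: int, action: int):
    """Environment dynamics: reward 1.0 only on reaching the goal state."""
    nxt = min(max(state + ACTIONS[action], 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(q_row: List[float], rng: random.Random) -> int:
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])

def train(episodes: int = 200, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = choose(Q[s], rng)
            s2, r, done = step(s, a)
            target = r if done else r + GAMMA * max(Q[s2])
            Q[s][a] += ALPHA * (target - Q[s][a])   # temporal-difference update
            s = s2
    return Q

Q = train()
greedy = ["left" if q[0] > q[1] else "right" for q in Q[:GOAL]]
print(greedy)
```

The agent never consults the transition rule inside `step`; it only observes sampled transitions. A model-based agent would instead estimate those dynamics and plan against the estimate.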
Key RL algorithms for training agents include:
DQN's success in 2013 marked the start of the deep RL era and made the agent abstraction concrete in a way classical control theory had not. The same conceptual loop, now powered by neural function approximators, would later produce AlphaGo, AlphaStar, OpenAI Five, and many of the simulation results that influenced modern LLM training.
Game playing has been one of the most visible domains for AI agents, producing landmark achievements that demonstrated the power of different agent architectures.
Deep Blue (IBM, 1997) defeated chess world champion Garry Kasparov using brute-force search with hand-crafted evaluation functions. While not a learning agent, Deep Blue showcased the power of combining search algorithms with domain expertise and dedicated hardware.[16]
AlphaGo (DeepMind, 2015 to 2017) became the first computer program to defeat a professional human Go player without handicap. AlphaGo combined deep neural networks with Monte Carlo tree search and was trained through a combination of supervised learning on human games and reinforcement learning through self-play. Its successor, AlphaZero, learned to play Go, chess, and shogi entirely through self-play with no human game data, achieving superhuman performance in all three games within 24 hours of training.[17]
OpenAI Five (2019) defeated the world champions of Dota 2, a complex five-on-five multiplayer video game. The system used a team of five neural network agents trained with PPO over the equivalent of 45,000 years of gameplay, winning 99.4% of its public games and demonstrating that RL agents could master highly complex, partially observable, multi-agent environments.[18]
AlphaStar (DeepMind, 2019) reached Grandmaster level in StarCraft II, a real-time strategy game requiring long-horizon planning, imperfect information handling, and real-time decision making.
Voyager (Wang et al., 2023) was the first LLM-powered embodied lifelong learning agent. Built on top of GPT-4, Voyager played Minecraft autonomously and combined three components: an automatic curriculum that proposed exploration tasks, an ever-growing skill library of executable code, and an iterative prompting loop that incorporated environment feedback and self-verification. It obtained 3.3 times more unique items, travelled 2.3 times longer distances, and unlocked tech tree milestones up to 15.3 times faster than prior state of the art, all without any model fine-tuning.[19] Voyager helped popularise the idea that an LLM could serve as the policy of an open-ended agent in a simulated world.
| Agent | Game | Year | Key technique | Achievement |
|---|---|---|---|---|
| Deep Blue | Chess | 1997 | Search + evaluation | Beat world champion Kasparov |
| AlphaGo | Go | 2016 | Neural nets + MCTS + RL | Beat 9 dan professional Lee Sedol |
| AlphaZero | Go, chess, shogi | 2017 | Pure self-play RL | Superhuman in all three games |
| OpenAI Five | Dota 2 | 2019 | Multi-agent PPO | Beat world champion team OG |
| AlphaStar | StarCraft II | 2019 | Multi-agent RL + imitation | Grandmaster level |
| MuZero | Atari, Go, chess, shogi | 2020 | Learned model + MCTS | Matched AlphaZero without rules |
| Voyager | Minecraft | 2023 | GPT-4 + skill library | Lifelong learning embodied agent |
The post-2022 wave of agents built around large language models inherits the same abstraction but rearranges its parts. The LLM serves as the policy: given a context that encodes the agent's beliefs (transcript, retrieved documents, recent observations) and goals (system prompt, user request), it emits actions in the form of tokens, which downstream machinery decodes into tool calls or messages. The "environment" is the digital world of websites, APIs, file systems, and operating systems rather than a physical workshop.
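The policy-in-a-loop arrangement can be sketched without any model at all. In the example below `scripted_policy` is a hard-coded stand-in for an LLM call, and the tool registry `TOOLS`, the action dictionary format, and `run_agent` are hypothetical; no real framework or API is implied.

```python
from typing import Callable, Dict, List

Tool = Callable[[str], str]

# Hypothetical tool registry: the agent's actuators in a digital environment.
TOOLS: Dict[str, Tool] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split())),
    "upper": lambda arg: arg.upper(),
}

def scripted_policy(context: List[str]) -> Dict[str, str]:
    """Stand-in for an LLM call: maps the context so far to the next action."""
    observations = [c for c in context if c.startswith("observation:")]
    if not observations:
        return {"tool": "add", "arg": "2 3"}
    value = observations[-1].split(": ", 1)[1]
    return {"final": f"The sum is {value}."}

def run_agent(goal: str, policy, max_steps: int = 5) -> str:
    context = [f"goal: {goal}"]                 # goals enter through the context
    for _ in range(max_steps):
        action = policy(context)                # policy: context -> action
        if "final" in action:
            return action["final"]
        result = TOOLS[action["tool"]](action["arg"])   # act on the environment
        context.append(f"action: {action['tool']}({action['arg']})")
        context.append(f"observation: {result}")        # percept fed back in
    return "step limit reached"

answer = run_agent("add 2 and 3", scripted_policy)
print(answer)
```

Replacing `scripted_policy` with a function that calls a language model and parses its output would leave the loop unchanged, which is the sense in which the LLM occupies the policy slot of the classical abstraction.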
Three patterns in particular bridge classical agent theory and the LLM era:
AutoGPT (Toran Bruce Richards, March 2023) was the first viral demonstration that an LLM driven by a continuous loop of plan, act, and reflect could pursue open-ended goals, even when the resulting reliability was limited. Subsequent systems such as Devin (Cognition, 2024), Claude Code, and Operator extended the idea into production. The full landscape of these products, frameworks, benchmarks, and standards is covered in AI agents and Agentic AI. What matters for the conceptual definition is that all of them remain agents in the Russell and Norvig sense: percepts in, actions out, policy in between, performance measured by a goal.
A practical agent, classical or modern, is an assembly of capabilities. The classical Russell and Norvig categorisation above describes them at a high level; the contemporary engineering view is more granular and maps cleanly onto modern frameworks.
| Capability | What it does | Classical realisation | Modern realisation in 2026 |
|---|---|---|---|
| Perception | Convert raw inputs into a usable representation | Hand-engineered sensor fusion | Text tokenisation, vision encoders, ASR, screenshot parsing |
| World modelling | Track non-observable state | Symbolic knowledge base, belief base | Context window, retrieval, scratchpad |
| Reasoning | Decide what to do given current state | First-order logic, search, planning | LLM forward pass, possibly with extended thinking |
| Planning | Decompose a goal into ordered subgoals | STRIPS, HTN, BDI plan library | Chain of thought, ReAct, tree of thoughts |
| Action | Change the environment | Effectors, API calls, robot motors | Tool calls, code execution, browser actions |
| Memory | Retain useful information across time | BDI belief base, episodic memory | Context window, vector DB, key-value store |
| Reflection | Critique own behaviour | Russell and Norvig "critic" component | Reflexion, verifier model, judge LLM |
| Communication | Talk to humans or other agents | KQML, FIPA-ACL | Chat UI, MCP, A2A protocol, structured outputs |
The distinction between an agent and a chatbot turns on autonomous action. A chatbot answers questions and produces text in response to prompts. An agent decides on its own to call tools, take actions in external systems, and pursue multi-step goals over time. The same underlying LLM can serve both roles depending on the scaffolding around it. ChatGPT in plain conversation is a chatbot; ChatGPT with browsing, code interpreter, and the ability to navigate websites becomes an agent.
Many products blur the line. A coding assistant that suggests a single completion is a chatbot; the same product in agent mode that reads files, runs tests, and applies a multi-file refactor is an agent. The relevant question is whether the system maintains state, takes consequential actions, and makes its own decisions about what to do next. In Wooldridge and Jennings's terminology, the agent is autonomous, reactive, and proactive; the chatbot has only the first two properties to a limited degree.
The concept of an agent in AI has evolved across more than seven decades. The table below picks out the milestones most cited in the academic and industrial literature.
| Year | Milestone |
|---|---|
| 1950 | Alan Turing's "Computing Machinery and Intelligence" proposes the imitation game; agent-style framing is implicit |
| 1956 | Dartmouth conference establishes AI as a field; Logic Theorist embodies agent-like behaviour |
| 1966 | ELIZA at MIT becomes the first chatbot |
| 1971 | STRIPS introduces formal planning with preconditions and effects; the Shakey robot runs it |
| 1973 | Hewitt, Bishop, and Steiger publish the actor model at IJCAI |
| 1976 | MYCIN expert system at Stanford diagnoses bacterial infections from rules |
| 1986 | Brooks introduces the subsumption architecture and behaviour-based robotics |
| 1987 | Bratman publishes Intention, Plans, and Practical Reason; Georgeff and Lansky build the Procedural Reasoning System |
| 1990 | DARPA Knowledge Sharing Effort launches KQML |
| 1991 | Pattie Maes founds the Software Agents Group at the MIT Media Lab |
| 1993 | Shoham proposes Agent-Oriented Programming with AGENT0 |
| 1995 | Wooldridge and Jennings publish "Intelligent Agents: Theory and Practice"; Rao and Georgeff publish "BDI Agents: From Theory to Practice" |
| 1995 | Russell and Norvig's Artificial Intelligence: A Modern Approach (1st ed.) makes the agent the central organising concept of AI |
| 1996 | FIPA founded to standardise agent interoperability |
| 1997 | IBM Deep Blue defeats Garry Kasparov at chess |
| 1998 | FIPA-ACL specification published |
| 2002 | First AAMAS conference held in Bologna |
| 2013 | DeepMind DQN learns to play Atari games from raw pixels |
| 2016 | AlphaGo defeats Lee Sedol at Go |
| 2017 | AlphaZero masters Go, chess, and shogi via self-play |
| 2019 | OpenAI Five defeats Dota 2 world champions; AlphaStar reaches StarCraft II Grandmaster |
| Oct 2022 | ReAct paper formalises the reasoning + acting loop for LLM agents |
| Mar 2023 | AutoGPT and BabyAGI bring autonomous LLM agents to mainstream attention |
| 2024 to 2026 | LLM agent products go mainstream: see AI agents and Agentic AI |