# Minimum Viable Agent

> Source: https://aiwiki.ai/wiki/minimum_viable_agent
> Updated: 2026-06-28
> Categories: Artificial Intelligence
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

*See also: [Artificial intelligence terms](/wiki/artificial_intelligence_terms)*

A **Minimum Viable Agent** (MVA) is the simplest valid implementation of an [AI agent](/wiki/ai_agent): a program in which a [large language model](/wiki/large_language_model) calls tools in a loop, observes the results, and decides on its own when to stop. In structural terms it is an LLM call inside a `while` loop, a small set of [tool](/wiki/tool_use) definitions, an environment those tools can act on, and a termination condition. Everything beyond those four elements (frameworks, planners, multi-agent orchestration, vector memory, evaluators) is an addition justified only by a concrete need the minimum cannot meet.

The phrase has two intersecting meanings in agent-engineering practice. The first, drawn from product methodology, treats the MVA as the agent equivalent of a Minimum Viable Product: a narrow first version shipped to real users to validate utility before adding features. The second, popularized by Anthropic's December 2024 essay [Building effective agents](/wiki/building_effective_agents), treats the MVA as a structural minimum: the smallest set of primitives that still constitutes a true agent rather than a deterministic [workflow](/wiki/workflow).[1] The two readings overlap in spirit (start small, escalate complexity only when needed) but they answer different questions. The product reading asks "What is the smallest valuable thing to ship?" The structural reading asks "What is the smallest thing that is still an agent?"

This article focuses primarily on the structural reading, which has become the dominant usage among practitioners after Anthropic's essay and the wave of "build an agent in fifty lines of Python" tutorials that followed.

## What is a minimum viable agent?

The term "minimum viable agent" emerged informally on developer forums and social media through 2023 and 2024 as builders reacted against what they perceived as the over-abstraction of early agent frameworks. By late 2024, two influences crystallized the modern usage. The first was Anthropic's engineering essay [Building effective agents](https://www.anthropic.com/engineering/building-effective-agents), published on 19 December 2024 by Erik Schluntz and Barry Zhang, which drew a sharp line between two categories of "agentic systems."[1] The essay defines the two categories directly: "Workflows are systems where LLMs and tools are orchestrated through predefined code paths," while "Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage."[1] The MVA is the smallest implementation that still meets the second definition. The second influence was Hugging Face's release of [smolagents](/wiki/smolagents) on 31 December 2024, a deliberately barebones library whose core agent logic fit in roughly one thousand lines of Python and whose marketing centered on the message that agentic capabilities do not require heavy frameworks.[2]

The Anthropic essay is the most frequent citation when authors define the MVA. Its central recommendation is that "the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns."[1] The essay advises that "developers start by using LLM APIs directly: many patterns can be implemented in a few lines of code."[1] That stance, repeated across the OpenAI [practical guide to building agents](https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf) (April 2025)[3] and the documentation of frameworks such as the [OpenAI Agents SDK](/wiki/openai_agents_sdk) and the [Claude Agent SDK](/wiki/claude_agent_sdk), is now the consensus position in agent engineering: complexity is something a project earns, not something a project starts with.

### What are the components of the structural minimum?

The structural definition of an MVA, distilled from these sources, has four required components and one optional one.

1. **An LLM call.** A function that sends messages and tool definitions to a model and receives back either a final response or one or more tool calls. The model must support [function calling](/wiki/function_calling) or be prompted in a structured way that the agent harness can parse.
2. **A set of tool definitions.** Each tool is a name, a JSON-schema description of its arguments, and a function that executes when the model calls it. The full set defines the agent's [action space](/wiki/action_space).
3. **An environment.** The world the tools act on. This can be a shell, a file system, a web browser, a database, an API, or a simulator. Anthropic's essay emphasizes the importance of allowing "agents to gain 'ground truth' from the environment at each step (such as tool call results or code execution) to assess its progress," which is what distinguishes an agent from a stateless chat call.[1]
4. **A termination condition.** The rule that ends the loop. Common conditions are: the model returns a message with no tool calls, an explicit `finish` tool is invoked, a maximum iteration count is reached, a token or cost budget is exhausted, or a human approves a final answer.
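Component 2 can be made concrete with a small sketch. The `Tool` dataclass, the `word_count` function, and the `ACTION_SPACE` registry below are hypothetical names chosen for illustration; the `name` / `description` / `input_schema` layout mirrors the tool format used in the Anthropic example later in this article, and other providers arrange the same three pieces differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """One entry in the agent's action space (hypothetical helper type)."""
    name: str
    description: str
    input_schema: Dict[str, Any]   # JSON Schema describing the arguments
    fn: Callable[..., str]         # runs locally when the model calls the tool

    def schema(self) -> Dict[str, Any]:
        # Only the description is sent to the model; the function stays local.
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": self.input_schema,
        }


def word_count(text: str) -> str:
    return str(len(text.split()))


# The registry of tools is the action space: the model can do nothing else.
ACTION_SPACE = {
    tool.name: tool
    for tool in [
        Tool(
            name="word_count",
            description="Count the whitespace-separated words in a string.",
            input_schema={
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
            fn=word_count,
        )
    ]
}

result = ACTION_SPACE["word_count"].fn(text="gather context, take action")
```

Keeping the schema and the function in one object is a design choice rather than a requirement; the later example in this article keeps them in two separate structures.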

The optional fifth component is **short-term memory**, usually implemented as the running list of messages and tool results that the agent re-sends to the model on each iteration. Some authors treat this as part of the LLM call rather than a separate component, since most modern model APIs require the caller to pass full conversation history. The OpenAI Agents SDK, the [LangGraph](/wiki/langgraph) `create_react_agent` helper, and the Claude Agent SDK all manage this transcript automatically.

What is deliberately absent from the structural minimum is also informative. There is no planner, no critic, no separate retrieval system, no long-term memory store, no observability layer, no guardrails, no evaluator. Each of those can be added (and in production usually is) but none is required for the system to be a working agent.

### What is the product reading of a minimum viable agent?

The older product-flavored reading of the MVA, common in startup and consulting writing through 2023 and 2024, frames the agent as a deliverable rather than an architecture. Under this reading, an MVA is a single-purpose agent shipped to real users as quickly as possible, building on the [Minimum Viable Product](/wiki/minimum_viable_product) tradition associated with [Eric Ries](/wiki/eric_ries) and the [Lean Startup](/wiki/lean_startup) movement. A customer-support agent that answers FAQs and escalates anything else, a recruiting agent that ranks resumes against a single job description, or a finance agent that extracts five fields from earnings releases would all qualify. The structural definition does not contradict this reading; an agent can be both a product MVP and a structurally minimal implementation. The two views simply emphasize different axes of "minimum": product surface area versus engineering surface area.

## What is the agent loop?

The defining pattern of an agent, and therefore of any minimum viable agent, is the **agent loop**. Different sources describe it with slightly different vocabulary, but the steps converge.

1. **Receive input.** A user message or task description enters the system. This becomes the first item in the message history.
2. **Call the model.** The LLM is invoked with the message history, the system prompt, and the tool schema list.
3. **Inspect the response.** The response is either a final message (the loop ends) or one or more tool calls (the loop continues).
4. **Execute the tools.** The harness runs each requested tool, captures its return value or error, and appends the result to the message history.
5. **Loop.** Control returns to step two with the augmented message history.
6. **Terminate.** The loop exits when the model returns a final message, the iteration cap is hit, the budget is exhausted, or the harness raises a stop signal.
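The six steps can be sketched without any provider SDK. In the sketch below, `run_loop`, `scripted_model`, and the message dictionary layout are assumptions made for illustration; `scripted_model` is a stand-in that returns canned replies where a real LLM call would go.

```python
def run_loop(task, call_model, tools, max_steps=10):
    history = [{"role": "user", "content": task}]            # 1. receive input
    for _ in range(max_steps):                                # 5. loop
        reply = call_model(history)                           # 2. call the model
        history.append({"role": "assistant", **reply})
        if not reply.get("tool_calls"):                       # 3. inspect the response
            return reply["content"]                           # 6. terminate: final message
        for call in reply["tool_calls"]:                      # 4. execute the tools
            result = tools[call["name"]](**call["args"])
            history.append(
                {"role": "tool", "name": call["name"], "content": result}
            )
    raise RuntimeError("Step cap reached without a final message.")  # 6. terminate: cap


def scripted_model(history):
    # Stand-in for an LLM: request one tool call, then answer from its result.
    if history[-1]["role"] == "user":
        return {"content": "", "tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}]}
    return {"content": f"The sum is {history[-1]['content']}.", "tool_calls": []}


answer = run_loop(
    "What is 2 + 3?",
    scripted_model,
    {"add": lambda a, b: str(a + b)},
)
```

Swapping `scripted_model` for a function that calls a real model API leaves `run_loop` unchanged, which is the sense in which the loop is independent of the provider.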

Anthropic's Claude Agent SDK reduces this to a four-step slogan: "gather context, take action, verify work, repeat."[4] The Claude Code SDK was renamed the Claude Agent SDK on 29 September 2025 because Anthropic observed that the same loop powering its coding tool was being used internally for research, video creation, note-taking, and other agentic work.[4] The loop is the thing; the domain is interchangeable.

The blog post [The Unreasonable Effectiveness of an LLM Agent Loop with Tool Use](https://sketch.dev/blog/agent-loop) argues the same point even more sharply. The loop, the author writes, is a "shockingly simple" construct in Python:[5]

```python
def loop(llm):
    msg = user_input()
    while True:
        output, tool_calls = llm(msg)
        print("Agent:", output)
        if tool_calls:
            msg = [handle_tool_call(tc) for tc in tool_calls]
        else:
            msg = user_input()
```

The argument is that with even one general-purpose tool (the post uses bash) and a strong model, this loop is enough to handle a surprising fraction of real engineering work.[5] The MVA is the loop; everything else is plumbing.

### What is the parse, decide, execute, observe cycle?

A more precise way to describe the loop, popularized by the [ReAct](/wiki/react_agent) literature, breaks each iteration into four phases.

- **Parse.** Read the model's output and decide whether it is a final answer, a tool call, or both.
- **Decide.** Choose which tool to invoke and with what arguments. In ReAct-style agents, the model emits an explicit `Thought` before acting, making the decision step visible.
- **Execute.** Run the tool against the environment.
- **Observe.** Record the result as a new message and feed it back into the next model call.

This four-step cycle (parse, decide, execute, observe) is what most practitioners mean when they say "agent loop." It is the operational core of an MVA.

## Why is it called "minimum"?

The word minimum carries two implicit critiques. The first is aimed at the early generation of agent frameworks ([LangChain](/wiki/langchain) before LangGraph, [BabyAGI](/wiki/babyagi), [AutoGPT](/wiki/autogpt), and similar) that wrapped the agent loop in many layers of abstraction. Builders who tried to debug or extend those systems often found that the framework's ontology (chains, agents, tools, callbacks, parsers) obscured rather than clarified the simple `while` loop underneath. Anthropic's essay makes the case explicitly: frameworks "often create extra layers of abstraction that can obscure the underlying prompts and responses, making them harder to debug."[1] The MVA is the response: write the loop yourself, see every prompt and every tool call, and add a framework only when you have a problem the loop alone cannot solve.

The second critique is aimed at the temptation to add multi-agent orchestration, planners, critics, and elaborate memory schemes before validating that a single tool-calling loop fails. The OpenAI [practical guide to building agents](/wiki/openai_agents_guide) recommends starting with a single agent and a small tool set, then expanding only when the agent's failure modes call for it.[3] The smolagents launch post puts it as "don't build agents for everything," and Anthropic's essay opens with the same advice: optimizing a single LLM call with retrieval and in-context examples is usually enough for many use cases.[1][2]

The minimum, in other words, is a discipline. It says: justify each addition. If your agent works without a planner, do not add one. If it works without a vector store, do not add one. If it works without a sub-agent, do not add one. The MVA is the baseline against which all such additions must demonstrate their value.

## What does a code-level minimum example look like?

The canonical demonstration of an MVA is a short Python script that solves a real task using only the OpenAI or Anthropic SDK and standard library. The example below is representative of the shape that appears in dozens of tutorials, including the [Anthropic cookbook](https://github.com/anthropics/anthropic-cookbook), the OpenAI documentation, and the smolagents introduction. It is paraphrased to illustrate the structure rather than copied from any single source.

```python
import json
import anthropic

client = anthropic.Anthropic()

def get_weather(location: str) -> str:
    # Stand-in for a real API call.
    return json.dumps({"location": location, "temp_c": 14, "sky": "overcast"})

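def calculator(expression: str) -> str:
    # Emptying __builtins__ limits eval() but does not make it a safe sandbox;
    # a production tool should use a real arithmetic parser instead.
    return str(eval(expression, {"__builtins__": {}}, {}))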

TOOLS = [
    {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
    {
        "name": "calculator",
        "description": "Evaluate an arithmetic expression.",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
]

DISPATCH = {"get_weather": get_weather, "calculator": calculator}

def run_agent(user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        if response.stop_reason == "end_turn":
            return "".join(
                block.text for block in response.content if block.type == "text"
            )
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    fn = DISPATCH[block.name]
                    result = fn(**block.input)
                    tool_results.append(
                        {
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result,
                        }
                    )
            messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("Agent exceeded max_turns without finishing.")
```

The full program is roughly sixty-five lines including imports and tool definitions. It contains every required component of an MVA. The model is the Claude API call. The tools are `get_weather` and `calculator`. The environment is the local Python interpreter plus whatever the tools reach (the weather function would normally hit an external API). There are two termination conditions: the model returns `end_turn`, or the loop hits `max_turns`. Any other stop reason (for example `max_tokens`) matches neither branch, so the script re-sends the same request until the turn cap is reached; a production version would handle those cases explicitly. The message history serves as short-term memory.

Nothing in this script depends on a framework. It runs on `pip install anthropic` alone. The same shape ports cleanly to OpenAI's tool-calling API by swapping the client and adjusting the response parsing. It is, in the literal structural sense, a minimum viable agent.
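As a rough illustration of the porting step, the tool schemas can be converted mechanically. The sketch below assumes the OpenAI Chat Completions tool format, in which each tool is wrapped as `{"type": "function", "function": {...}}` and the JSON Schema sits under `parameters`; `to_openai_tool` and `ANTHROPIC_TOOL` are hypothetical names, and other OpenAI endpoints may expect a different layout.

```python
ANTHROPIC_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}


def to_openai_tool(tool: dict) -> dict:
    """Re-wrap an Anthropic-style tool definition for Chat Completions."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            # Same JSON Schema, different key name.
            "parameters": tool["input_schema"],
        },
    }


openai_tool = to_openai_tool(ANTHROPIC_TOOL)
```

The response side differs as well: in Chat Completions the tool arguments arrive as a JSON string that the harness must decode with `json.loads`, whereas the Anthropic SDK returns `block.input` as an already-parsed dictionary.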

## What are the common variations on the minimum?

Once a builder has the basic loop, several well-known variations refine it without abandoning the minimal posture. Each is a recognizable pattern in the agent literature, named in research papers or framework documentation.

### What is the ReAct pattern?

The ReAct pattern, introduced by Shunyu Yao and colleagues in the 2022 paper *ReAct: Synergizing Reasoning and Acting in Language Models* (presented at ICLR 2023), is the historical ancestor of most modern tool-calling agents.[6] ReAct interleaves natural-language `Thought` traces with `Action` calls and `Observation` outputs, producing transcripts shaped like:

```
Thought: I need to find the population of France first.
Action: search["population of France 2024"]
Observation: 68 million.
Thought: Now compare it to Germany.
Action: search["population of Germany 2024"]
Observation: 84 million.
Thought: I have enough to answer.
```

The ReAct paper showed that interleaving reasoning and acting reduces hallucination on the HotpotQA question-answering benchmark and the Fever fact-verification benchmark, and that on the ALFWorld and WebShop interactive benchmarks ReAct "outperforms imitation and reinforcement learning methods by an absolute success rate of 34% and 10% respectively, while being prompted with only one or two in-context examples."[6] Most modern tool-calling APIs (OpenAI function calling, Anthropic tool use, Google Gemini tools) effectively implement a structured ReAct loop; the model's `assistant` message is the `Thought`, the tool call is the `Action`, and the tool result is the `Observation`. The MVA above is, in this sense, a ReAct agent in all but name.
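When a model lacks structured tool calling, the harness has to parse the `Action` line out of free text. The following is a minimal sketch that assumes the exact `Action: tool["argument"]` shape shown in the transcript above; `ACTION_RE` and `parse_action` are hypothetical names, and real prompts often need a more forgiving parser.

```python
import re

# Matches lines such as: Action: search["population of France 2024"]
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_action(model_text: str):
    """Return (tool_name, argument) for the last Action line, or None."""
    matches = ACTION_RE.findall(model_text)
    if not matches:
        return None  # no action requested: treat the text as a final answer
    name, raw_argument = matches[-1]
    return name, raw_argument.strip().strip('"')


step = parse_action(
    'Thought: I need to find the population of France first.\n'
    'Action: search["population of France 2024"]'
)
final = parse_action("Thought: I have enough to answer.")
```

Structured tool-calling APIs remove the need for this parsing step, which is one reason hand-rolled agents built on them stay short.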

### What is the Reflection (Reflexion) pattern?

[Reflection](/wiki/reflection_agent) variants, exemplified by the 2023 paper *Reflexion: Language Agents with Verbal Reinforcement Learning* by Noah Shinn, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao (NeurIPS 2023), insert a self-critique step into the loop.[7] After each attempt, the agent generates a verbal reflection on what went wrong and stores it in an episodic memory buffer that conditions future attempts. Reflexion reported a 91% pass@1 score on the [HumanEval](/wiki/humaneval) coding benchmark, surpassing the 80% baseline of GPT-4 at the time.[7] In MVA terms, Reflection adds an extra step to the loop (after observation, generate a reflection) but does not require a second agent or a separate planner.
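The shape of that extra step can be sketched with stubs. This is not the Reflexion authors' implementation: `reflexion_loop` and the three stub functions are hypothetical, and in a real agent `attempt` and `reflect` would each be model calls and `evaluate` would be a test suite or other checker.

```python
def reflexion_loop(task, attempt, evaluate, reflect, max_trials=3):
    reflections = []                          # episodic memory buffer
    for _ in range(max_trials):
        output = attempt(task, reflections)   # each attempt sees earlier reflections
        ok, feedback = evaluate(output)
        if ok:
            return output, reflections
        reflections.append(reflect(output, feedback))  # verbal self-critique
    return None, reflections


# Toy stubs: the first attempt is wrong, the second one uses the reflection.
def stub_attempt(task, reflections):
    return abs(task) if reflections else task


def stub_evaluate(output):
    return output >= 0, "the result was negative"


def stub_reflect(output, feedback):
    return f"Attempt {output!r} failed because {feedback}; take the absolute value."


output, memory = reflexion_loop(-4, stub_attempt, stub_evaluate, stub_reflect)
```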

### What is the plan-and-execute pattern?

Plan-and-execute agents, popularized by BabyAGI in early 2023 and formalized in the LangChain blog and LangGraph tutorials, split the work into two phases.[8] A planner LLM generates a list of steps once, and an executor LLM (often a smaller model) carries out each step. Replanning happens only when the executor fails or new information arrives. The pattern reduces calls to the expensive planner model, cutting cost and latency relative to step-by-step ReAct. The trade-off is rigidity: a static plan cannot react to surprising tool results as cleanly as a fully dynamic loop. An MVA can grow into plan-and-execute by adding a single up-front planning call, with the rest of the loop unchanged.
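A minimal sketch of the control flow, with stub functions in place of the planner and executor models: `plan_and_execute`, `stub_planner`, and `stub_executor` are hypothetical names, and the replanning rule (replan only after a failed step, up to a cap) is one assumption among several reasonable ones.

```python
def plan_and_execute(task, planner, executor, max_replans=2):
    results = []
    steps = planner(task, results)        # one up-front planning call
    replans = 0
    while steps:
        step = steps.pop(0)
        ok, output = executor(step)       # cheaper executor handles each step
        if ok:
            results.append(output)
            continue
        if replans == max_replans:
            raise RuntimeError(f"Gave up on step: {step}")
        replans += 1
        steps = planner(task, results)    # replan only after a failure
    return results


planner_calls = []


def stub_planner(task, results):
    planner_calls.append(task)
    all_steps = ["fetch report", "summarize report"]
    return all_steps[len(results):]       # skip steps that already succeeded


def stub_executor(step):
    return True, f"done: {step}"


results = plan_and_execute("brief me on the report", stub_planner, stub_executor)
```

With an executor that never fails, the planner is called exactly once, which is where the cost saving over a step-by-step loop comes from.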

### What is a multi-agent variation?

[Multi-agent](/wiki/multi_agent_systems) variants, represented by frameworks such as [AutoGen](/wiki/autogen), [CrewAI](/wiki/crewai), and the OpenAI Agents SDK's handoff feature, run several specialized agents that exchange messages. Microsoft Research's AutoGen paper, *AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation*, frames this as conversation among customizable, conversable agents.[9] Multi-agent setups can outperform single agents on tasks that decompose cleanly along role lines (researcher and writer, planner and coder), but they cost more tokens and add coordination failure modes. Anthropic reports that multi-agent systems use about fifteen times more tokens than chat interactions, so they are economically viable only when the value of the task justifies the expense.[10] Anthropic's essay treats multi-agent as a late addition, used only when a single agent demonstrably cannot handle the task.[1] The MVA's multi-agent extension is to instantiate two MVAs and let one call the other as a tool.

### What is a code agent?

[Code agents](/wiki/code_agent), the design at the heart of Hugging Face's smolagents, replace JSON tool calls with executable code blocks. Instead of emitting a tool name and arguments, the model writes Python that uses the tools as functions. The smolagents launch post (31 December 2024, Aymeric Roucher, Merve Noyan, and Thomas Wolf) reports that this approach reduces LLM calls by roughly 30% on complex benchmarks because a single code block can chain several tool calls and intermediate computations.[2] The MVA shape is preserved; only the tool-call serialization changes. Smolagents executes the model-generated code in a sandbox (E2B, Modal, Docker, or WebAssembly) to contain the obvious risk.[2]
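The difference from JSON tool calls can be illustrated with a toy executor. This sketch is not the smolagents implementation: `run_code_action`, `get_temp_c`, and the `generated` snippet are hypothetical, and the bare `exec` call used here is explicitly not a sandbox, so it should never be pointed at untrusted model output.

```python
def get_temp_c(city: str) -> int:
    # Stand-in tool with canned data.
    return {"Paris": 14, "Oslo": 6}[city]


def run_code_action(code: str, tools: dict) -> dict:
    """Run model-written code with the tools exposed as plain functions.

    WARNING: exec() with emptied builtins is NOT a security boundary.
    """
    namespace = {"__builtins__": {}, **tools}
    exec(code, namespace)
    # Return only the variables the generated code created.
    return {
        key: value
        for key, value in namespace.items()
        if key not in tools and key != "__builtins__"
    }


# What a model might emit in ONE turn: two tool calls plus arithmetic.
generated = """
paris = get_temp_c("Paris")
oslo = get_temp_c("Oslo")
difference = paris - oslo
"""

state = run_code_action(generated, {"get_temp_c": get_temp_c})
```

With JSON tool calls the same task would typically take several model round trips (two lookups, then the subtraction); here it takes one.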

## How does an MVA compare with full frameworks?

A recurring question for new builders is when to graduate from a hand-rolled MVA to a framework. The honest answer in 2026 is that the major frameworks have largely converged on "give us the minimum loop, then add the things you need." Each framework defines the minimum slightly differently.

| Framework | Vendor | Minimum unit | Distinctive value over a hand-rolled MVA |
| --- | --- | --- | --- |
| smolagents | Hugging Face | ~1,000-line core, code agents | Smallest vendor-backed framework; teaching tool; model-agnostic via LiteLLM[2] |
| OpenAI Agents SDK | OpenAI | `Agent` + `Runner.run_sync` | Built-in tracing and handoffs; production successor to Swarm[11] |
| Claude Agent SDK | Anthropic | Managed loop + default tools | "Gather context, take action, verify work"; gives the agent a terminal[4] |
| LangGraph | LangChain | `create_react_agent` / `StateGraph` | Durable execution, checkpointers, human-in-the-loop interrupts[12] |
| AutoGen | Microsoft | `ConversableAgent` + `UserProxyAgent` | Multi-agent conversation; now in maintenance mode[9] |

- **LangGraph** (LangChain). LangGraph models the agent as a `StateGraph`: a typed state, nodes that read it and return deltas, and edges (some conditional) that wire them together. The `create_react_agent` helper produces a one-line MVA equivalent. LangGraph's value over a hand-rolled MVA is durable execution: a checkpointer (in-memory, SQLite, or Postgres) writes state at every super-step so a graph can be paused, resumed, replayed, or branched.[12] Builders typically choose LangGraph when they need [human-in-the-loop](/wiki/human_in_the_loop) interrupts or long-running tasks that survive process crashes.
- **AutoGen** (Microsoft Research). AutoGen leans into multi-agent conversation. Its core implements an actor model with asynchronous message passing between agents. The minimum AutoGen agent is a `ConversableAgent` paired with a `UserProxyAgent`. AutoGen v0.4 made the runtime event-driven, and Microsoft has since put AutoGen into maintenance mode, recommending the new Microsoft Agent Framework for greenfield work.[13] AutoGen's appeal was always multi-agent debate and group chat patterns; its baseline is heavier than an MVA but lighter than orchestration platforms.
- **OpenAI Agents SDK**. Released on 11 March 2025 as the production successor to the experimental [Swarm](/wiki/openai_swarm) project, the OpenAI Agents SDK exposes four primitives: `Agent` (an LLM with instructions, tools, and handoffs), `Runner` (the loop), guardrails, and tracing.[11] The smallest possible program is two lines after import: `agent = Agent(name="Assistant", instructions="You are helpful.")` and `Runner.run_sync(agent, "Write a haiku.")`. The SDK's signature feature beyond the loop is built-in tracing and the handoff primitive that converts a multi-agent system into a graph of single-purpose `Agent` objects.
- **Claude Agent SDK**. Anthropic released the Claude Agent SDK on 29 September 2025, renaming the prior Claude Code SDK.[4] The SDK pairs a managed agent loop ("gather context, take action, verify work, repeat") with a default tool set drawn from Claude Code: filesystem read and write, `Bash`, code generation, file search, and [MCP](/wiki/model_context_protocol) integration. Subagents run in isolated context windows. The Claude Agent SDK's design principle is to give the agent a computer: rather than enumerate tools, it gives the agent a terminal and lets it improvise.[4]
- **smolagents** (Hugging Face). Released on 31 December 2024, smolagents is the closest vendor-backed framework to the structural MVA: roughly one thousand lines of core code, model-agnostic via [LiteLLM](/wiki/litellm), and oriented toward code agents.[2] It is often used as a teaching framework precisely because there is so little of it.

The practical heuristic adopted by most teams writing about this in 2025 and 2026 is: start with a hand-rolled MVA in fewer than one hundred lines of Python; move to smolagents or the OpenAI/Claude Agent SDKs when you want managed tracing and prebuilt tools; move to LangGraph or the Microsoft Agent Framework when you need durable state, human-in-the-loop interrupts, or graph orchestration with multiple agents.

## When should you use an agent at all?

A central message of Anthropic's essay (and one repeated in OpenAI's, Google's, and Hugging Face's guides) is that many problems do not need an agent. The essay phrases it directly: "Agents can be used for open-ended problems where it's difficult or impossible to predict the required number of steps, and where you can't hardcode a fixed path."[1] If the workflow is predictable, a workflow is better. If a single LLM call with retrieval suffices, that is better still. The first question to ask before building an MVA is whether the problem actually requires dynamic decision-making at runtime.

Anthropic's essay names five workflow patterns that should be tried first: prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer.[1] Only after these prove insufficient does the autonomous agent become the right tool. The reason is the cost-error tradeoff. The essay notes that "Agentic systems often trade latency and cost for better task performance," and that "the autonomous nature of agents means higher costs, and the potential for compounding errors."[1] An agent that takes ten model calls to do what a workflow could do in two is an agent that costs more and has more chances to fail.

The practical signs that an agent is appropriate are: the task has a long tail of edge cases that resist enumeration, the path through the task depends on information that arrives during execution, the task is open-ended in length (a research project, a debugging session, a multi-step support case), or the human alternative is a person with expertise rather than a person following a script. If the task shows none of those signs, a workflow or a single augmented LLM call is the better starting point.

## What are the pitfalls of minimum viable agents?

The simplicity of the MVA is also its largest source of failure modes. A short loop with no guardrails is a short path to a runaway bill or a stuck agent. The pitfalls below recur in postmortems about agents shipped to production.

### How do you prevent infinite loops and non-progress?

The most common failure is the agent that does not stop. Without a hard cap, an agent can call a broken tool hundreds of times, debate itself, or re-plan endlessly. Industry guidance settles on three layered termination conditions: a maximum-iteration cap (typical production values are fifteen to twenty-five steps), a token or cost budget per run, and a no-progress detector that exits the loop when several consecutive iterations produce no new information. Termination conditions are not optional polish.
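The three layers can be combined in one small object that the loop consults after every iteration. `StopPolicy` is a hypothetical helper, the default limits are placeholders, and "no progress" is defined here, as an assumption, as several identical observations in a row; real detectors may compare tool calls or use fuzzier matching.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StopPolicy:
    max_steps: int = 20          # layer 1: iteration cap
    max_tokens: int = 50_000     # layer 2: per-run token budget
    stall_limit: int = 3         # layer 3: identical observations tolerated
    steps: int = 0
    tokens: int = 0
    recent: List[str] = field(default_factory=list)

    def record(self, tokens_used: int, observation: str) -> Optional[str]:
        """Update counters; return a stop reason, or None to keep looping."""
        self.steps += 1
        self.tokens += tokens_used
        self.recent = (self.recent + [observation])[-self.stall_limit:]
        if self.steps >= self.max_steps:
            return "max_steps"
        if self.tokens >= self.max_tokens:
            return "budget"
        if len(self.recent) == self.stall_limit and len(set(self.recent)) == 1:
            return "no_progress"
        return None


policy = StopPolicy(max_steps=10, max_tokens=1_000, stall_limit=3)
reasons = [policy.record(100, "404 Not Found") for _ in range(3)]
```

In this run the third identical observation trips the no-progress detector well before the step cap or the budget would have.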

### How do you avoid cost runaway?

Agents consume roughly four times the tokens of a comparable chat interaction, and multi-agent setups about fifteen times, because every loop iteration re-sends the growing message history to the model.[10] The fix is a combination of per-run budgets (hard caps on tokens and dollars), [prompt caching](/wiki/prompt_caching) of the system prompt and stable parts of the history, summarization of older turns once the history grows past a threshold, and a routing layer that sends easy tasks to a cheaper model. Some teams place a proxy in front of all LLM calls that enforces these limits centrally rather than relying on each agent to police itself.
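The effect of re-sending the history can be worked through with simple arithmetic. The numbers below are illustrative assumptions (a 1,000-token starting prompt that grows by 500 tokens per turn), not measurements, and the function names are hypothetical; the sketch ignores output tokens and prompt caching.

```python
def cumulative_input_tokens(base: int, growth_per_turn: int, turns: int) -> int:
    """Total input tokens billed when every call re-sends the whole history."""
    return sum(base + k * growth_per_turn for k in range(turns))


def closed_form(base: int, growth_per_turn: int, turns: int) -> int:
    # turns * base  +  growth * (0 + 1 + ... + (turns - 1))
    return turns * base + growth_per_turn * turns * (turns - 1) // 2


single_call = cumulative_input_tokens(1_000, 500, 1)
ten_turns = cumulative_input_tokens(1_000, 500, 10)
```

Under these assumptions a ten-turn run bills 32,500 input tokens against 1,000 for a single call. The quadratic term is why summarizing or caching older turns matters more as runs get longer.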

### How should tools handle errors?

The MVA template above runs tools with no try/except wrapper. In production, every tool needs error handling because tool failures are how environments communicate negative information. A `404` from an HTTP tool is data, not a crash. The harness should catch tool exceptions, format them as messages back to the model, and let the agent decide whether to retry, switch tools, or give up. Repeated identical errors in a row should trip a circuit breaker.
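A minimal sketch of that harness behaviour follows. `safe_call` and `CircuitBreaker` are hypothetical helpers; the `tool_result` dictionary follows the shape used in the Anthropic example earlier in this article, with the `is_error` flag that the Anthropic Messages API accepts on tool results.

```python
def safe_call(dispatch, name, args, tool_use_id):
    """Run a tool and always return a tool_result, even when it fails."""
    try:
        content = dispatch[name](**args)
        is_error = False
    except Exception as exc:  # an unknown tool name (KeyError) lands here too
        content = f"{type(exc).__name__}: {exc}"
        is_error = True
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": content,
        "is_error": is_error,
    }


class CircuitBreaker:
    """Trips after `limit` identical errors in a row."""

    def __init__(self, limit=3):
        self.limit = limit
        self.last_error = None
        self.count = 0

    def tripped(self, result) -> bool:
        if not result["is_error"]:
            self.last_error, self.count = None, 0
            return False
        if result["content"] == self.last_error:
            self.count += 1
        else:
            self.last_error, self.count = result["content"], 1
        return self.count >= self.limit


def fetch(url: str) -> str:
    raise LookupError("404 Not Found")  # stand-in for a failing HTTP tool


breaker = CircuitBreaker(limit=3)
results = [safe_call({"fetch": fetch}, "fetch", {"url": "https://example.com"}, f"t{i}") for i in range(3)]
trips = [breaker.tripped(r) for r in results]
```

The first two failures go back to the model as data so it can change course; only the third identical failure stops the run.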

### Why does an MVA need observability?

A loop that nobody can read is a loop that nobody can debug. Production agents need structured tracing of every model call, tool invocation, and tool result, with token counts and latencies. The de facto standard in 2026 is [OpenTelemetry](/wiki/opentelemetry): frameworks including [Pydantic AI](/wiki/pydantic_ai), smolagents, the Claude Agent SDK, and Strands Agents emit OTel spans, which platforms such as [Langfuse](/wiki/langfuse), [LangSmith](/wiki/langsmith), [Arize Phoenix](/wiki/arize_phoenix), and [Helicone](/wiki/helicone) can ingest.[14] Adding tracing to an MVA usually means wrapping the LLM call and the tool dispatcher with an instrumentation library; it does not require restructuring the loop.
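The wrapping approach can be sketched with the standard library alone. The `traced` decorator and the in-memory `SPANS` list below are hypothetical stand-ins for an instrumentation library; they do not use the OpenTelemetry API, and a real setup would export spans to one of the platforms named above.

```python
import functools
import time

SPANS = []  # stand-in for a span exporter


def traced(kind):
    """Record name, status, and duration for each call of the wrapped function."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                SPANS.append(
                    {
                        "kind": kind,
                        "name": fn.__name__,
                        "status": status,
                        "duration_s": time.perf_counter() - start,
                    }
                )
        return wrapper
    return decorator


@traced("tool")
def get_weather(location: str) -> str:
    return f"{location}: 14 C, overcast"


weather = get_weather("Oslo")
```

Because the decorator sits around the function, the loop itself is untouched; the same wrapper can be applied to the model call with `@traced("llm")`.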

### What are common tool-design mistakes?

Anthropic's essay calls out the [agent-computer interface](/wiki/agent_computer_interface) as deserving as much attention as the prompts themselves, advising builders to "think about how much effort goes into human-computer interfaces (HCI), and plan to invest just as much effort in creating good agent-computer interfaces (ACI)."[1] Tool descriptions should read like documentation written for a junior engineer who will use the tool blind. Common mistakes include: ambiguous parameter names that the model fills with the wrong values, tool sets that overlap so the model picks the wrong one, side-effecting tools that should be idempotent but are not, and tools that return walls of text the model cannot summarize. The fix is to write tool docs first, test them with a few prompts, and shrink the tool set until each tool has one clear purpose.

### How do you defend against prompt injection?

An agent that reads untrusted text and then acts on tools is an attack surface. A web-browsing agent that reads a malicious page can be coaxed into calling tools the user did not ask for. The literature on [prompt injection](/wiki/prompt_injection) recommends three defenses: separate the system prompt from user content with structural markers, run the agent in a sandbox where its tools cannot reach sensitive systems, and confirm any irreversible action (sending email, executing trades, deleting files) with a human. Smolagents enforces sandboxing for code agents by routing execution through E2B, Modal, Docker, or WebAssembly.[2] The Claude Agent SDK and OpenAI Agents SDK both ship sandboxed execution modes.
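The third defense, human confirmation of irreversible actions, fits in a few lines of harness code. In this sketch `IRREVERSIBLE`, `gated_call`, and the `confirm` callback are hypothetical; in an interactive tool `confirm` might prompt the user, while here it is a stub that always declines.

```python
IRREVERSIBLE = {"send_email", "delete_file"}  # assumed list; each deployment defines its own

sent = []


def send_email(to: str, body: str) -> str:
    sent.append((to, body))
    return f"Email sent to {to}."


def gated_call(name, args, dispatch, confirm):
    """Run a tool, but ask a human first when the action cannot be undone."""
    if name in IRREVERSIBLE and not confirm(name, args):
        # The refusal is returned as an observation so the model can adapt.
        return f"Action '{name}' was declined by the user."
    return dispatch[name](**args)


observation = gated_call(
    "send_email",
    {"to": "cfo@example.com", "body": "Wire the funds."},
    {"send_email": send_email},
    confirm=lambda name, args: False,
)
```

The gate lives in the harness rather than in the prompt, so injected text cannot talk the model out of it.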

### What are compounding errors?

Anthropic's essay warns that "the autonomous nature of agents means higher costs, and the potential for compounding errors": small mistakes in early steps that cascade into worse outcomes later.[1] An agent that misreads a user's date format on step one will compound that mistake on every subsequent step. The defenses are evaluation, redundant verification (have the agent check its own work or have a second model verify), and intermediate human checkpoints for high-stakes flows.

## What does it take to move an MVA to production?

Moving an MVA from a notebook to production requires several additions, none of which violate the minimal posture but all of which extend it.

### How do you add memory?

Short-term memory is the message history that the loop already carries. [Long-term memory](/wiki/long_term_memory) is what the agent remembers across runs: prior conversations, learned preferences, accumulated facts. Implementations range from a key-value store keyed on user ID to a vector database for semantic recall. The OpenAI Agents SDK and LangGraph both expose session abstractions that persist message history; for richer memory, teams add a [vector store](/wiki/vector_database) such as [Pinecone](/wiki/pinecone), [Weaviate](/wiki/weaviate), or [pgvector](/wiki/pgvector). Anthropic's Claude Agent SDK recommends starting with agentic search (the agent uses a search tool against a file or document store) and adding semantic search only if the agentic version is too slow.[4]
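The simplest end of that range, a key-value store keyed on user ID, might look like the sketch below. It uses SQLite from the standard library; the `UserMemory` class, the table layout, and the prompt format are assumptions for illustration rather than any SDK's session API.

```python
import json
import sqlite3


class UserMemory:
    """Tiny long-term memory: one JSON blob of facts per user ID."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memory (user_id TEXT PRIMARY KEY, facts TEXT)"
        )

    def load(self, user_id: str) -> dict:
        row = self.db.execute(
            "SELECT facts FROM memory WHERE user_id = ?", (user_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}

    def remember(self, user_id: str, key: str, value: str) -> None:
        facts = self.load(user_id)
        facts[key] = value
        self.db.execute(
            "INSERT OR REPLACE INTO memory (user_id, facts) VALUES (?, ?)",
            (user_id, json.dumps(facts)),
        )
        self.db.commit()


def system_prompt_with_memory(base: str, facts: dict) -> str:
    """Prepend remembered facts so the next run starts with them in context."""
    if not facts:
        return base
    lines = "\n".join(f"- {k}: {v}" for k, v in sorted(facts.items()))
    return f"{base}\n\nKnown facts about this user:\n{lines}"
```

Passing a file path instead of `":memory:"` makes the store survive across runs, which is the property that distinguishes it from the loop's message history.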

### Why do production agents need evaluations?

Production agents need [evals](/wiki/evaluation): scripted test cases that exercise representative tasks and grade outputs. Anthropic, OpenAI, Google, and Hugging Face all publish evaluation guides in their agent docs because agent failure modes are emergent: they arise from multi-step interaction and are rarely caught by unit tests. Common eval shapes include ground-truth tasks (a known-answer QA set), trajectory grading (does the agent's tool sequence look right?), and LLM-as-judge scoring on open-ended outputs. Running evals in CI catches regressions when a prompt or model changes.
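A ground-truth eval can be very small. The sketch below treats the agent as any callable from a question to an answer and grades by substring match; the two cases, the `run_evals` name, and the pass threshold are hypothetical, and substring matching is a deliberately crude grader.

```python
from typing import Callable

# Hypothetical ground-truth cases: (user input, substring the answer must contain).
CASES = [
    ("What is the return window?", "30 days"),
    ("Do you ship to Canada?", "yes"),
]


def run_evals(agent: Callable[[str], str], cases, threshold: float = 1.0) -> dict:
    """Run every case through the agent and report the pass rate."""
    failures = []
    for prompt, expected in cases:
        answer = agent(prompt)
        if expected.lower() not in answer.lower():
            failures.append((prompt, answer))
    pass_rate = 1 - len(failures) / len(cases)
    return {"pass_rate": pass_rate, "failures": failures, "ok": pass_rate >= threshold}
```

In CI, a job would call `run_evals` with the real agent and fail the build when `ok` is false.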

### What are guardrails?

[Guardrails](/wiki/guardrails) are checks layered on agent inputs and outputs. Input guardrails block prompt injection, jailbreaks, or out-of-scope queries before the agent runs. Output guardrails check the final response for [PII](/wiki/personally_identifiable_information), policy violations, or factual contradictions before it reaches the user. The OpenAI Agents SDK exposes guardrails as a first-class primitive; LangGraph users typically add guard nodes; hand-rolled MVAs add explicit `validate_input` and `validate_output` calls.[11] Guardrails are not part of the structural minimum but they are part of any responsible production deployment.
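For a hand-rolled MVA, the `validate_input` and `validate_output` calls could be sketched as below. The blocked phrases, the email regex, and the `GuardrailError` type are assumptions for illustration. A keyword blocklist is a naive injection check and a regex is a naive PII check, so production systems usually need stronger classifiers.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
BLOCKED_PHRASES = ("ignore previous instructions", "reveal your system prompt")


class GuardrailError(Exception):
    """Raised when an input is rejected before the agent runs."""


def validate_input(user_text: str) -> str:
    """Reject empty or obviously hostile input before the agent loop starts."""
    if not user_text.strip():
        raise GuardrailError("empty input")
    lowered = user_text.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        raise GuardrailError("possible prompt injection")
    return user_text


def validate_output(answer: str) -> str:
    """Redact email addresses from the final answer before it reaches the user."""
    return EMAIL.sub("[redacted email]", answer)


def guarded_run(agent, user_text: str) -> str:
    """Wrap any agent callable with both guardrails."""
    return validate_output(agent(validate_input(user_text)))
```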

### How do tracing and analytics fit in?

Once an agent runs against real users, the operator needs to see what is happening. Tracing platforms record full transcripts, token counts, costs, latencies, and tool-result samples. They support replay, A/B comparison of prompts, and aggregate analytics on which tools are used, which fail, and which produce the longest latencies. The integration is shallow (a wrapper around the model client and tool dispatcher) but the operational value is large.
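The "shallow wrapper" can be illustrated with a decorator that records each call. This is a sketch only: the in-memory `TRACE` list stands in for a real tracing backend, and `traced` and `search_docs` are hypothetical names rather than part of any tracing platform's SDK.

```python
import functools
import time

TRACE: list[dict] = []  # stand-in for a tracing backend


def traced(kind: str):
    """Decorator that records the name, latency, and outcome of each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            record = {"kind": kind, "name": fn.__name__, "ok": True}
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record["ok"] = False
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and failure, so every call leaves a record.
                record["latency_s"] = time.perf_counter() - start
                TRACE.append(record)
        return wrapper
    return decorator


@traced("tool")
def search_docs(query: str) -> str:  # hypothetical tool
    return f"results for {query}"
```

Applying the same decorator to the model-call function gives per-turn latency without touching the loop itself.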

### When do you add human-in-the-loop?

For any agent whose actions have non-trivial cost (sending a customer email, posting code, moving money), production deployments include human-in-the-loop checkpoints. Anthropic's essay frames this as the agent "pausing for human feedback at checkpoints or when it encounters blockers."[1] LangGraph's interrupt primitive and the Claude Agent SDK's permission system are designed for exactly this. The MVA template above adds human-in-the-loop with a single change: insert a confirmation step before tools whose `requires_approval` flag is true.
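A minimal sketch of that confirmation step follows. Only the `requires_approval` flag comes from the text; the `Tool` dataclass, `call_tool`, and `cli_approve` are assumed names, and a real deployment would likely replace the terminal prompt with a queue or a review UI.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    fn: Callable[..., str]
    requires_approval: bool = False


def call_tool(tool: Tool, args: dict, approve: Callable[[str, dict], bool]) -> str:
    """Run a tool, pausing for a human decision when the tool is flagged."""
    if tool.requires_approval and not approve(tool.name, args):
        # The refusal goes back to the model as an ordinary tool result.
        return f"Human declined {tool.name}; choose another approach."
    return tool.fn(**args)


def cli_approve(name: str, args: dict) -> bool:
    """Simplest possible checkpoint: ask on the terminal."""
    return input(f"Allow {name}({args})? [y/N] ").strip().lower() == "y"
```

Returning the refusal as a tool result, rather than raising, lets the model see the decision and adapt on its next turn.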

### How do you deploy a long-running agent?

Agents that run for minutes or hours need different deployment shapes than chat applications that respond in seconds. Long-running agents typically run as background jobs (Celery, Temporal, AWS Step Functions) with a checkpointer that lets them resume after a crash. LangGraph's persistence layer is built for this; the Microsoft Agent Framework adds enterprise-grade durability on top of AutoGen's lessons.[13] The MVA does not require any of this until run times exceed the request timeout of the host platform.
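The core idea of a checkpointer can be sketched with a JSON file, independent of any job framework. The state shape (`turn`, `messages`), the function names, and the `step` callback are assumptions for illustration; durable-execution systems such as those named above handle far more (retries, concurrency, versioning).

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so a crash mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename on the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"turn": 0, "messages": []}


def run(path: str, step, max_turns: int = 8) -> dict:
    """Resume from the last checkpoint and save after every turn."""
    state = load_checkpoint(path)
    while state["turn"] < max_turns:
        done = step(state)  # one model call plus tool execution; mutates state
        state["turn"] += 1
        save_checkpoint(path, state)
        if done:
            break
    return state
```

If the process dies between turns, calling `run` again with the same path continues from the last completed turn instead of starting over.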

## A worked example: the support-agent MVA

A practical illustration brings the concepts together. Suppose a startup wants a customer-support agent that answers product questions, looks up order status, and escalates anything else to a human. The product MVA is one of the canonical examples cited above (a narrow scope, a clear value to users). The structural MVA is the implementation.

The agent has three tools: `search_docs(query)` that returns the top three FAQ snippets from a vector store, `get_order_status(order_id)` that hits the order database, and `escalate_to_human(reason)` that posts to a Slack channel and ends the loop. The system prompt instructs the agent to answer from docs when possible, to verify identity before disclosing order details, and to escalate anything outside its scope. The loop has a `max_turns` of eight and a per-run cost cap of two cents. The harness runs in a sandboxed function with no other network access. Every turn is traced to Langfuse. A nightly eval set of fifty tickets, each with a graded ideal response, runs in CI.

This is still an MVA in the structural sense. The loop is a few dozen lines. The tool set has three tools. The environment is a vector store and an order database. There are four termination conditions: the model returns a final answer, the escalation tool is called, the turn limit is hit, or the cost cap is hit. What has been added is not framework abstraction but operational scaffolding: tracing, evals, sandboxing, cost limits. Each addition was justified by a real failure mode that the bare loop did not handle. The MVA discipline is preserved.
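The loop for this example might be sketched as follows. The three tool names, the eight-turn limit, and the two-cent cap come from the description above. Everything else is an assumption for illustration: the tools are stubs, and `model` is any callable that returns either `{"text": ...}` for a final answer or `{"tool": ..., "args": ...}` for a tool call, optionally with a `cost_usd` field. Identity verification, tracing, and sandboxing are omitted.

```python
from typing import Callable

MAX_TURNS = 8
COST_CAP_USD = 0.02


def search_docs(query: str) -> str:
    return "FAQ: Orders ship within 2 business days."  # stub for the vector store


def get_order_status(order_id: str) -> str:
    return f"Order {order_id}: shipped"  # stub for the order database


def escalate_to_human(reason: str) -> str:
    return f"Escalated: {reason}"  # stub for the Slack post


TOOLS: dict[str, Callable[..., str]] = {
    "search_docs": search_docs,
    "get_order_status": get_order_status,
    "escalate_to_human": escalate_to_human,
}


def run_support_agent(model: Callable[[list], dict], question: str) -> dict:
    """Loop until one of the four termination conditions fires."""
    messages = [{"role": "user", "content": question}]
    cost = 0.0
    for _ in range(MAX_TURNS):
        reply = model(messages)
        cost += reply.get("cost_usd", 0.0)
        if cost > COST_CAP_USD:  # checked after the call that crossed the cap
            return {"stop": "cost_cap", "answer": None}
        if "tool" not in reply:
            return {"stop": "final_answer", "answer": reply["text"]}
        messages.append({"role": "assistant", "content": reply})
        result = TOOLS[reply["tool"]](**reply["args"])
        if reply["tool"] == "escalate_to_human":
            return {"stop": "escalated", "answer": result}
        messages.append({"role": "tool", "content": result})
    return {"stop": "max_turns", "answer": None}
```

Because the model is injected as a callable, the same loop can be driven by a scripted stub in the nightly eval set and by a real model client in production.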

## How does the MVA relate to the Minimum Viable Product tradition?

The name clearly echoes the Minimum Viable Product popularized by Eric Ries, and the lineage is more than rhetorical. Both ideas reject building ahead of evidence: the MVP rejects building features without validating user need, and the MVA rejects building agent abstractions without validating that the loop alone fails. Both treat shipping and observation as superior to planning and theorizing. Both recognize that the cost of complexity is paid up front and the value of features is paid back only if the underlying assumptions survive contact with reality.

Where they differ is in the role of users. The product MVP is fundamentally a hypothesis test about what users want. The structural MVA is a hypothesis test about what an architecture can do. A team can ship a product MVP that is a deeply complex agent, or a product non-MVP that is a structural MVA. The two minimums are independent axes; the rigorous practitioner attends to both.

## What are the common misconceptions about MVAs?

Several misconceptions about MVAs have hardened in informal usage and are worth correcting.

- **"MVA means weak agent."** It does not. A two-tool MVA running on a top-tier model can outperform a multi-agent framework running on a weaker model. The minimum is about architectural surface area, not capability ceiling.
- **"MVA means no framework."** It does not. smolagents, the OpenAI Agents SDK, and the Claude Agent SDK are all positioned as thin layers over the basic loop. The relevant question is whether the framework adds abstractions you can read and skip past, not whether the word "framework" appears.
- **"MVA means no memory."** Short-term memory (the message history) is part of the loop. Long-term memory is optional. The minimum viable agent has memory of the current run by definition; it has no memory across runs by default.
- **"MVA implies single-agent."** Not strictly. An MVA can be multi-agent if multi-agent is justified, but adding a second agent should be a defended decision rather than a default. Most production deployments cited by Anthropic, OpenAI, and Hugging Face start as single-agent MVAs and add agents only when the problem decomposes cleanly.
- **"MVA is a synonym for ReAct."** ReAct is one of several patterns that satisfy the MVA shape. So are tool-calling loops, code agents, plan-and-execute (with a single planner call), and reflection loops. The MVA is the genus; ReAct is one species.

## How did the minimum viable agent idea emerge and spread?

The core idea predates the term. The 2022 ReAct paper described what is recognizably an MVA loop in research form, and the 2023 Toolformer paper by Timo Schick and colleagues at Meta AI showed that language models could be trained to decide when to call APIs.[6][15] By mid-2023, BabyAGI and AutoGPT had popularized the idea of LLM agents that pursue goals over many steps; both were criticized for opacity and unreliability, which fueled the appetite for simpler implementations.

Through 2023 and into 2024, LangChain became the dominant framework for building agents, and a backlash against its abstractions developed in parallel. Anthropic's Building effective agents essay in December 2024 crystallized that backlash into actionable guidance and gave the MVA mindset its definitive citation.[1] Hugging Face's smolagents release twelve days later (31 December 2024) provided a concrete framework that exemplified the philosophy.[2] The OpenAI Agents SDK (March 2025) and OpenAI's practical guide to building agents (April 2025) put a major commercial backer behind the same design language.[11][3] Anthropic's renaming of the Claude Code SDK to the Claude Agent SDK in September 2025 generalized the pattern beyond coding, and the Microsoft Agent Framework carried AutoGen's multi-agent ideas into a more disciplined runtime.[4][13]

By 2026 the MVA is the default starting point in agent engineering. New projects begin with a hand-rolled loop or a thin SDK; complexity is added in defended increments; and frameworks compete on how cleanly they preserve the underlying loop while adding production scaffolding. The discourse has moved from "which framework should I pick?" to "what is the simplest agent that could possibly work for this task, and how do I instrument it?"

## Which tools and frameworks fit the MVA mindset?

The ecosystem of tools that pair well with MVA-style development has grown rapidly. The list below is representative rather than exhaustive.

- **Frameworks.** smolagents, OpenAI Agents SDK, Claude Agent SDK, LangGraph, Pydantic AI, [LlamaIndex](/wiki/llamaindex) Workflows, [Microsoft Agent Framework](/wiki/microsoft_agent_framework), [Strands Agents](/wiki/strands_agents).
- **Models.** [Claude](/wiki/claude) (Anthropic), [GPT](/wiki/gpt) (OpenAI), [Gemini](/wiki/gemini) (Google), [Llama](/wiki/llama) (Meta), [DeepSeek-R1](/wiki/deepseek_r1), [Qwen](/wiki/qwen). Tool-calling support is now standard across all major model families.
- **Tooling and protocols.** Model Context Protocol (MCP), introduced by Anthropic in November 2024, has become the cross-vendor standard for connecting agents to external systems. LiteLLM provides a unified interface across providers.
- **Sandboxes.** [E2B](/wiki/e2b), [Modal](/wiki/modal), [Daytona](/wiki/daytona), Docker, [Pyodide](/wiki/pyodide) (WebAssembly).
- **Observability.** Langfuse, LangSmith, Arize Phoenix, Helicone, [Honeycomb](/wiki/honeycomb) for OpenTelemetry-based traces.
- **Coding aids.** [Claude Code](/wiki/claude_code), [Cursor](/wiki/cursor), [Windsurf](/wiki/windsurf), [GitHub Copilot](/wiki/github_copilot), [Aider](/wiki/aider). All are themselves agents and good case studies in MVA-shaped engineering.

## What are the limitations of the MVA approach?

The MVA mindset is not a universal solvent. Several classes of problem strain the simple loop and benefit from more structure from the start.

- **Long-horizon work.** Tasks that take hours or days (research projects, large refactors, multi-step business processes) need durable execution, checkpoints, and human review steps. A bare MVA that crashes loses everything; LangGraph's persistence layer or Temporal-backed orchestration earns its complexity here.
- **Tightly coupled multi-agent collaboration.** When two agents must hold consistent shared state (a researcher and a writer agreeing on facts), simple message passing breaks down. Frameworks with explicit state graphs handle this better than ad-hoc loops.
- **High-stakes domains.** Medical, legal, and financial agents need verification, audit logs, and explicit policy enforcement that a minimal loop does not provide by default. Adding those is straightforward; assuming they are not needed is dangerous.
- **Latency-sensitive interactions.** A multi-turn agent that takes thirty seconds per response is unusable in voice or chat UIs that expect sub-second latency. For those, single-call augmented LLMs or tightly scripted workflows usually win.

Knowing these limits is part of the discipline. The MVA is the default, not the destination.

## Summary

A minimum viable agent is the smallest implementation that still qualifies as an agent: an LLM in a loop, a few tools, an environment, and a termination condition. The concept has both a product reading (ship the narrowest useful version first) and a structural reading (do not exceed the architectural minimum without justification). The structural reading dominates in 2026, anchored by Anthropic's Building effective agents,[1] the OpenAI practical guide,[3] and the design choices of frameworks such as smolagents, the OpenAI Agents SDK, the Claude Agent SDK, and LangGraph. Builders who internalize the MVA discipline tend to write less code, debug more easily, and ship more reliable systems, adding complexity only when the bare loop demonstrably fails. The principle is older than the term: start small, observe carefully, escalate only when justified.

## References

1. Schluntz, Erik and Barry Zhang. *Building effective agents*. Anthropic Engineering, 19 December 2024. https://www.anthropic.com/engineering/building-effective-agents
2. Roucher, Aymeric, Merve Noyan and Thomas Wolf. *Introducing smolagents: simple agents that write actions in code*. Hugging Face Blog, 31 December 2024. https://huggingface.co/blog/smolagents
3. OpenAI. *A Practical Guide to Building Agents*. 17 April 2025. https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf
4. Anthropic. *Building agents with the Claude Agent SDK*. 29 September 2025. https://www.anthropic.com/engineering/building-agents-with-the-claude-agent-sdk
5. Sketch Dev. *The Unreasonable Effectiveness of an LLM Agent Loop with Tool Use*. https://sketch.dev/blog/agent-loop
6. Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan and Yuan Cao. *ReAct: Synergizing Reasoning and Acting in Language Models*. arXiv:2210.03629, October 2022. ICLR 2023. https://arxiv.org/abs/2210.03629
7. Shinn, Noah, Federico Cassano, Edward Berman, Ashwin Gopinath, Karthik Narasimhan and Shunyu Yao. *Reflexion: Language Agents with Verbal Reinforcement Learning*. arXiv:2303.11366, March 2023. NeurIPS 2023. https://arxiv.org/abs/2303.11366
8. LangChain. *Plan-and-Execute Agents*. https://blog.langchain.com/planning-agents/
9. Wu, Qingyun et al. *AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation*. Microsoft Research, 2023. https://www.microsoft.com/en-us/research/publication/autogen-enabling-next-gen-llm-applications-via-multi-agent-conversation-framework/
10. Anthropic. *How we built our multi-agent research system*. Anthropic Engineering, 13 June 2025. https://www.anthropic.com/engineering/multi-agent-research-system
11. OpenAI. *New tools for building agents*. 11 March 2025. https://openai.com/index/new-tools-for-building-agents/
12. LangChain. *LangGraph documentation: overview*. https://docs.langchain.com/oss/python/langgraph/overview
13. Microsoft. *Microsoft Agent Framework*. https://learn.microsoft.com/en-us/agent-framework/overview/
14. Langfuse. *AI Agent Observability with Langfuse*. https://langfuse.com/blog/2024-07-ai-agent-observability-with-langfuse
15. Schick, Timo, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda and Thomas Scialom. *Toolformer: Language Models Can Teach Themselves to Use Tools*. arXiv:2302.04761, February 2023. https://arxiv.org/abs/2302.04761
16. Willison, Simon. *Building effective agents*. Annotation, 20 December 2024. https://simonwillison.net/2024/Dec/20/building-effective-agents/
17. Anthropic. *Anthropic Cookbook (agent examples)*. https://github.com/anthropics/anthropic-cookbook
18. Hugging Face. *smolagents repository*. https://github.com/huggingface/smolagents