smolagents
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,819 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,819 words
Add missing citations, update stale details, or suggest a clearer explanation.
smolagents is an open source Python library for building agents powered by large language models (LLMs), released by hugging face on 30 December 2024.[1] Its central design choice is to favor code agents, agents that emit executable Python snippets as their action format, over the more conventional pattern of emitting structured JSON tool calls. The library is deliberately minimal: its core agent loop is contained in roughly one thousand lines of Python in a single file, src/smolagents/agents.py, and it ships with first-party support for sandboxed code execution through E2B, Docker, Modal, Blaxel, and a Pyodide/Deno WebAssembly backend.[2][3]
smolagents is the direct successor to Hugging Face's earlier transformers.agents module, which it replaced over the course of 2025 as the agent code was spun out of the main transformers repository.[4] The library is released under the Apache License 2.0 and is maintained by a core team of Hugging Face engineers led by Aymeric Roucher (also known on the Hub as m-ric), with significant contributions from Albert Villanova del Moral, Thomas Wolf, Leandro von Werra, and Erik Kaunismäki.[2][5]
The library is one of the more visible expressions of a broader argument that Hugging Face has been making since mid-2024, that code is a more natural and more expressive action format for LLMs than JSON. That position was first laid out in the July 2024 blog post "Our Transformers Code Agent beats the GAIA benchmark," in which a Hugging Face code-agent system reached the top of the validation split of the gaia benchmark leaderboard.[6] smolagents inherits and extends that lineage. In February 2025 a smolagents-based reimplementation of OpenAI's Deep Research, distributed in the repository's examples/open_deep_research directory, reached 55.15 percent pass-at-1 on the GAIA validation set, surpassing the previous open source state of the art held by Microsoft's Magentic-One (~46 percent) while remaining substantially behind OpenAI's proprietary Deep Research system (67.36 percent).[7]
smolagents was first published to PyPI on 27 December 2024 as version 0.1.0, and the public launch took place on 30 December 2024 with a blog post titled "Introducing smolagents: simple agents that write actions in code," authored by Aymeric Roucher, Merve Noyan, and Thomas Wolf.[1][8] The accompanying GitHub repository, huggingface/smolagents, was opened on the same day, along with an initial set of documentation pages on huggingface.co/docs/smolagents.[2][3]
The release was positioned not as a brand new research program but as a consolidation of patterns that Hugging Face had already shipped inside transformers.agents. The two libraries share an API surface that is close enough that the official migration guide describes the move as primarily a matter of changing imports.[4] The launch blog framed smolagents as a deliberately small, opinionated library for building agents, in contrast with what the authors saw as the increasingly heavy abstraction layers of competing frameworks such as langgraph and crewai.[1]
The project picked up rapid adoption in the weeks following launch. By mid-January 2025 the GitHub repository had crossed ten thousand stars, and by mid-2025 it had become the default agent library taught in Hugging Face's free "AI Agents" course on the platform's learning portal.[9]
To understand the design of smolagents it helps to recall the trajectory of Hugging Face's agent work. The company first shipped an agents module inside langchain-influenced glue code in mid-2023, then released "Transformers Agents 2.0" in May 2024, introducing two new agent types that could iterate based on past observations and step toward solutions over multiple turns.[10] The 2.0 release already included a CodeAgent that executed Python in a sandboxed AST-walking interpreter, and a ReactJsonAgent that used JSON for tool calls. The library was used internally by Hugging Face teams for tasks such as agentic retrieval-augmented generation and as the backbone of the July 2024 GAIA benchmark submission.[6]
By late 2024 Hugging Face had concluded that the agents subsystem was being slowed down by being inside transformers, a much larger library with broader compatibility constraints. They also wanted the agent library to be more model-agnostic, supporting providers from openai and anthropic to ollama and deepseek without dragging the whole transformers dependency chain along. smolagents was the result. The new repository carved out the agent loop and tool plumbing as a standalone package, removed the deeper coupling to transformers, and added explicit first-class support for sandboxed code execution through E2B and Docker.[1][2] Agents and tools were formally removed from the transformers library in version 4.52, released in mid-2025, with users redirected to smolagents.[4]
The release also fits into Hugging Face's broader 2024 to 2025 strategy of competing on agentic infrastructure for open weights. Other projects in the same arc include the Open Deep Research reproduction, the SmolLM family of small language models tuned for on-device and agentic use, and the integration of an agent runtime into the Hub itself via Hugging Face Spaces.[7][11]
The smolagents architecture revolves around a small number of orthogonal concepts: an agent that runs a ReAct-style loop, tools that the agent can invoke, a model abstraction that wraps an LLM backend, and an executor that actually runs the agent's Python actions.[3] The library exposes two concrete agent classes, CodeAgent and ToolCallingAgent, both of which subclass a shared MultiStepAgent base.
CodeAgent is the headline class. On each step the agent receives the task and the history of previous observations, calls the underlying LLM, and expects the model to emit a block of Python code wrapped in a Code: heading and <end_code> marker. The framework parses the response, extracts the code block, passes it to a PythonExecutor instance, and feeds the printed output (and any returned value) back to the LLM as the next observation. When the model is satisfied with the current state of the variables, it calls a built-in final_answer() function with the answer, which terminates the loop.[3]
The executor environment makes tools available to the model as ordinary Python callables. For example, a code agent equipped with a DuckDuckGoSearchTool and an image_generator tool can in principle write a single block such as images = [image_generator(headline) for headline in DuckDuckGoSearchTool()(query)[:3]], exploiting the full expressive power of Python loops, list comprehensions, and nested function calls. Hugging Face argues that this composability, together with native handling of complex objects such as PIL.Image or numpy arrays, is the central reason code agents outperform JSON tool-calling agents.[1]
By default CodeAgent runs Python through smolagents' bundled LocalPythonExecutor, an AST-walking interpreter that enforces an explicit allow list of imports and built-ins, caps the number of executed operations to prevent infinite loops, and truncates excessive output. Hugging Face is explicit in the documentation that this interpreter "must not be used as a security boundary," and recommends running untrusted code in an E2B, Modal, Blaxel, Docker, or Pyodide/Deno sandbox in any production deployment.[2][3]
ToolCallingAgent implements the more conventional pattern in which the LLM emits a JSON object describing a tool name and arguments. The framework parses the JSON, dispatches to the corresponding tool, and feeds the tool's return value back as an observation. This class exists for cases where the user prefers the JSON paradigm, for models that have been heavily fine-tuned for tool calling (such as OpenAI's function-calling models or Anthropic's tool-use Claudes), or for environments where Python execution is not available at all. Inside the library both agents share the same memory, planning, and prompting machinery; only the action format differs.[3]
A tool in smolagents is a Python class that subclasses Tool and declares a name, description, inputs schema, output_type, and a forward method. The library bundles a handful of default tools, including DuckDuckGoSearchTool, GoogleSearchTool, VisitWebpageTool, PythonInterpreterTool (for tool-calling agents that still want a Python evaluator), UserInputTool, and a FinalAnswerTool. The convenience extra pip install 'smolagents[toolkit]' pulls in this default set.[3]
The library also defines several interoperability surfaces:
Tool.from_langchain(...) wraps a langchain tool so it can be used inside a smolagents loop.Tool.from_space(...) wraps a Hugging Face Hub Gradio Space as a tool, so any public Space (for example a Stable Diffusion XL endpoint) can be called from an agent.ToolCollection.from_mcp(...) connects to a Model Context Protocol (MCP) server and surfaces all of the server's tools as smolagents tools.Tool.from_hub(...) and Tool.push_to_hub(...), which serializes the tool's Python source code, dependencies, and metadata into a Hub repository.[3]This Hub round trip is one of smolagents' more distinctive integration points. Because tools are first-class Hub artifacts, they can be versioned, shared, and discovered the same way that a model checkpoint or a dataset can.[2]
smolagents abstracts the LLM behind a Model class. Each backend implements a __call__ method that takes a list of chat messages and returns a model response. The library ships with the following first-party adapters:[2][3]
InferenceClientModel: Wraps the Hugging Face huggingface_hub Inference Client, supporting any model exposed by Hugging Face's Inference Providers, including serverless endpoints and dedicated Inference Endpoints.LiteLLMModel: Routes through the LiteLLM library, which provides a unified API over 100+ cloud LLMs including openai GPT-4 and GPT-5, anthropic Claude, Google Gemini, Mistral, Together AI, OpenRouter, and many others.OpenAIModel: A direct adapter for OpenAI and OpenAI-compatible servers such as Together AI or vLLM-served endpoints.TransformersModel: Loads a local model via the transformers library and runs inference on the host machine, suitable for running deepseek R1, Llama, Qwen, or other open weight models locally.AzureOpenAIModel: Targets Azure OpenAI deployments.AmazonBedrockModel: Targets AWS Bedrock.MLXModel and VLLMModel: Target Apple's MLX framework and the vLLM inference server respectively.This model-agnostic posture is central to smolagents' positioning. A code agent that runs against a hosted Claude or GPT-5 in development can in principle be moved over to a self-hosted DeepSeek R1 instance without any code changes beyond instantiating a different Model subclass.
Code agents only work if there is somewhere to run the code. smolagents abstracts that as a PythonExecutor and ships several implementations:[2][3]
LocalPythonExecutor runs code in-process using the bundled AST-walking interpreter. It is fast and dependency-free but, as already noted, is not a security boundary.E2BExecutor runs each code block in a fresh E2B cloud sandbox. E2B provides Firecracker-based microVMs that boot in milliseconds, making it practical to give each agent step its own isolated environment.DockerExecutor runs code inside a self-managed Docker container, suitable for on-premises or air-gapped deployments.ModalExecutor runs code on Modal's serverless infrastructure.BlaxelExecutor runs code on the Blaxel platform.WasmExecutor (Pyodide plus Deno) compiles the code to run inside a WebAssembly sandbox in the same process, providing OS-level isolation without requiring an external service.The library's documentation walks through the security tradeoffs in some depth, recommending E2B or Docker as the default choice for any agent that runs untrusted instructions, and reserving the local executor for trusted internal pipelines.[3]
Internally, each agent maintains a list of MemoryStep records, one per loop iteration, capturing the LLM input, the LLM output, the executed code or tool call, and the observation that came back. The agent exposes hooks for inspecting and editing this memory between steps, which is the main lever for advanced behaviors such as reflection or self-correction. The library also supports an optional planning_step, which periodically asks the LLM to summarize known facts and produce a plan, separately from the per-step action generation. Hugging Face's GAIA submission found that re-planning every two steps for a manager agent and every five steps for a web-browsing sub-agent improved performance.[6]
The system prompt is template-driven and lives in a YAML file shipped with the library. Users can override it by passing system_prompt= to the agent constructor, which is the preferred extension point for steering style or adding safety instructions.[3]
A CodeAgent or ToolCallingAgent can itself be wrapped as a tool and given to another agent. This is the smolagents idiom for multi-agent systems, often described in the documentation as a "manager agent" coordinating one or more "worker agents." The July 2024 Transformers Code Agent GAIA submission used this pattern, with a top-level ReactCodeAgent orchestrating a file inspector, a visualizer, and a web-browsing search sub-agent.[6] Hugging Face explicitly contrasts this lightweight composition pattern with the more elaborate graph-based or role-based orchestration in langgraph and crewai.
The central technical claim of smolagents is that code is a strictly better action format than JSON tool calls for LLM agents. The launch post lays out the argument in four points:[1]
Hugging Face buttresses these claims by citing three research papers that argue the same point empirically: Wang et al.'s "Executable Code Actions Elicit Better LLM Agents" (2024), "If LLM Is the Wizard, Then Code Is the Wand" (2024), and "DynaSaur: Large Language Agents Beyond Predefined Actions" (2024).[1] The Wang et al. paper, which compares CodeAct-style agents against JSON tool-call agents on a battery of benchmarks, is the most directly relevant; it reports that code agents need roughly 30 percent fewer turns to reach a correct answer, a number that Hugging Face repeats in its own documentation.[12]
The open-source-DeepResearch followup post in February 2025 provides smolagents' own internal comparison: holding all other components fixed (same model, same browser tool, same file inspector, same task set), switching from a JSON tool-call agent to a code agent moved GAIA validation pass-at-1 from 33 percent to 55.15 percent, a 22 point absolute improvement.[7]
The gaia benchmark is a 2023 benchmark of "General AI Assistant" tasks, introduced by Meta, Hugging Face, and AutoGPT collaborators, that pairs open-ended questions with grounded tools (web browser, file reader, image analyzer). It is widely regarded as the most difficult publicly available agent benchmark, because tasks routinely require chains of five or more correct decisions and Level 3 tasks are designed to be unsolvable without correct multi-step orchestration.[13]
smolagents and its transformers.agents predecessor have been used in two notable GAIA submissions:
The same blog post reports that the JSON tool-call variant of the same agent, run with the same model and tools, achieved only 33 percent pass-at-1, the basis for the 22-point code-versus-JSON gap Hugging Face cites in marketing material.[7]
The smolagents repository also maintains its own internal benchmark, smolagents/benchmark-v1, a smaller hand-curated set of agentic tasks intended for regression-testing changes to the library and for cross-model comparison. The associated leaderboard space, smolagents-benchmark/smolagents_llm_leaderboard, ranks LLMs by their performance when driving a smolagents code agent through the benchmark.[14]
smolagents occupies a particular niche in the increasingly crowded agent framework landscape. The 2025 comparative surveys from Langfuse, Latenode, and others all describe it as the minimal, opinionated, code-first option, in contrast to the more elaborate offerings from competitors.[15]
vs langgraph. LangGraph (from LangChain Inc.) models agent workflows as explicit directed graphs of nodes and edges, with first-class state objects, conditional routing, and built-in checkpointing. It is generally regarded as the production-oriented framework of choice for complex, branching workflows that need precise observability and rollback. smolagents takes the opposite approach: it leaves the orchestration implicit in the agent's own reasoning loop, exposes a much smaller API, and lets the agent compose actions on the fly as Python code rather than as graph nodes. Practical guidance from comparative posts is to pick LangGraph for production workflows where every transition needs to be auditable, and smolagents for research and rapid prototyping where one wants to give a model agency over how to chain steps together.[15]
vs autogen. AutoGen, originally from Microsoft Research, is built around the idea of multi-agent conversation: agents talk to each other in turn until they reach a solution. AutoGen has strong support for human-in-the-loop and group-chat patterns. smolagents, by contrast, defaults to a single-agent ReAct loop and treats multi-agent setups as a special case where one agent is used as a tool by another. Hugging Face argues that the conversational paradigm in AutoGen often hides the same Python code execution that smolagents makes explicit, and that being honest about code as the action format simplifies debugging.[1][15]
vs crewai. CrewAI is organized around named "crews" of role-specialized agents that collaborate on tasks, with a heavy emphasis on declarative agent definitions and role prompts. It is comparatively opinionated about how multi-agent collaboration should be structured. smolagents is much more permissive, providing only the primitives (agent, tool, executor) and leaving the team structure up to the user. CrewAI tends to be picked for marketing-content, sales, or research-assistant workflows where role-based prompting is a natural fit. smolagents tends to be picked when the team structure is simple but the per-step actions are complex.[15]
vs OpenAI Agents SDK. The OpenAI Agents SDK, released in March 2025, is the official successor to OpenAI's earlier Swarm experiments and a peer competitor to smolagents in terms of footprint: both are deliberately small libraries. The Agents SDK is JSON tool-call-first, deeply integrated with OpenAI's Responses API, and ships with built-in handoffs and guardrails. smolagents is code-action-first and model-agnostic. The two frameworks make the cleanest direct comparison in the field on the code-versus-JSON design question.[15]
vs Microsoft Semantic Kernel and Microsoft Magentic-One. Microsoft's Semantic Kernel is a polyglot (C#, Python, Java) SDK with a heavier enterprise emphasis on plugins and planners. Magentic-One is the agent system Hugging Face's open-source Deep Research effort explicitly benchmarked against on GAIA. smolagents reports a clear win against the public Magentic-One numbers, though Microsoft's internal followups have not been independently verified.[7]
vs other lightweight libraries. Other small, code-action-friendly libraries in the same niche include agno (formerly Phidata) and various small wrappers around the OpenAI Code Interpreter. None of these have matched smolagents in terms of Hub integration or in citation count by independent benchmarks.
The general consensus across the 2025 and 2026 framework comparison literature is that smolagents has found a comfortable position as the default research and prototyping framework for code-action agents, particularly within the Hugging Face ecosystem, while heavier frameworks dominate production deployments where workflow auditability matters more than expressiveness.[15]
smolagents is tightly integrated with the rest of the Hugging Face platform. Tools and agents can be pushed to and pulled from the Hub, complete with the Python source code and a requirements.txt, so that an agent definition becomes a Hub artifact in the same way a model checkpoint is. Agents can be launched as Gradio Spaces directly from a Python script, which is how the public Open Deep Research demo at m-ric-open-deep-research.hf.space is hosted.[7]
The library also ships two command-line utilities, smolagent and webagent, that allow a user to run a code agent or a web-browsing agent on a one-off task from the shell without writing any boilerplate Python. These are particularly useful for quick experiments and for embedding agentic steps inside larger shell pipelines.[3]
On the documentation side, the official huggingface.co/docs/smolagents site provides a guided tour, conceptual guides on tool design and code execution, and a series of how-to recipes covering text-to-SQL, web browsing, multi-agent setups, and Retrieval-Augmented Generation. A separate free course, "AI Agents" on huggingface.co/learn, uses smolagents as its reference framework.[9]
By late 2025 the library had attracted contributions from several hundred independent developers, and the smolagents organization on the Hub hosted dozens of public agents and tools, ranging from data-analysis assistants to vision-driven web browsers. Independent reproductions of the Open Deep Research recipe on private LLM endpoints, including DeepSeek R1 and Qwen, have become a popular weekend project in the open weights community.[7][9]
smolagents is released under the Apache License 2.0, the same permissive license Hugging Face uses for transformers, datasets, diffusers, and most of its other open source libraries. The license permits commercial use, modification, distribution, and private use, subject to the standard requirements of attribution and inclusion of the license text in redistributions.[16]