Agent Tools API
Last reviewed
May 16, 2026
Sources
16 citations
Review status
Source-backed
Revision
v3 · 3,982 words
Agent Tools API is xAI's server-side tool suite for agentic workflows built around the Grok family of language models. xAI announced it on November 19, 2025 alongside Grok 4.1 Fast, pitching the API as a way to give agents access to real-time X data, web search, hosted Python code execution, and other built-in tools without forcing developers to assemble and operate those systems themselves.[1][2]
The API is exposed through three surfaces: the OpenAI-compatible Responses API at https://api.x.ai/v1/responses, the native xAI SDK in Python, and the Vercel AI SDK in JavaScript. The central design idea is that Grok decides when and how to call tools, often combining several in parallel across multiple turns, while execution stays on xAI's infrastructure. Developers do not have to handle separate API keys, rate limits, sandboxes, or retrieval pipelines for each integration.[1][3]
At launch, xAI made both Grok 4.1 Fast and the Agent Tools API free for two weeks, through December 3, 2025. Grok 4.1 Fast was also free on OpenRouter during the same window. After the promotion, xAI priced Grok 4.1 Fast at $0.20 per million input tokens (with caching discounts down to $0.05) and $0.50 per million output tokens, and capped most server-side tool invocations at $5 per 1,000 successful calls.[1][2][4]
The Agent Tools API sits at the heart of xAI's pivot from a chatbot company to a developer platform. It was the first feature that put Grok on the same product map as OpenAI's Responses API, tool use in Anthropic's Claude Messages API, and tool calling in Google's Gemini API. The X search tool, which pulls posts, threads, and user data directly from X, remains its most defensible feature because no other major provider has comparable first-party access to that dataset.[5]
Throughout 2025, hosted tool layers became the new battleground for large model providers. OpenAI led with the Responses API, which exposed web search, file search, code interpreter, and computer use as first-party tools that ran on OpenAI infrastructure. Anthropic followed with web search, code execution, and computer use built into its Messages API, and Google added Google Search grounding, code execution, and URL context to the Gemini API. Each provider argued that hosted tools simplified agent development by removing the operational burden of running search backends or sandboxes.
xAI had been catching up. Through most of 2025, Grok's developer surface relied on plain function calling and a real-time search add-on that the company had marketed as Live Search. Function calling worked, but it forced every developer who wanted web search, X search, or Python execution to either build their own pipeline or stitch together third-party connectors. xAI's November 19, 2025 announcement collapsed those capabilities into a single tools array that any compatible client could send to the Responses API.[1][2]
The launch was paired with Grok 4.1 Fast, a long-context tool-calling model with a 2 million token context window aimed specifically at agentic tasks. xAI shipped two variants of the model: grok-4-1-fast-reasoning for harder multi-step problems and grok-4-1-fast-non-reasoning for latency-sensitive workloads. The company reported headline numbers on agentic evaluations, including 100 percent on the Telecom variant of tau-bench, 72 percent on Berkeley Function Calling v4, 63.9 on xAI's internal Research-Eval, 87.6 on Reka FRAMES, and 56.3 on the company's internal X Browse benchmark. xAI compared the latter two figures against GPT-5 scores of 45.5 and 24.2 on the same evaluations.[1][2]
The November release was overshadowed in the press by a separate controversy about Grok's chatbot output praising Elon Musk as more athletic than LeBron James and smarter than Albert Einstein. VentureBeat and other outlets observed that the technical achievement of the Agent Tools API was buried by what they called a glazing scandal, which complicated xAI's enterprise pitch about a maximally truth-seeking model. Several writers noted that the same alignment instability that produced the Musk praise on the consumer X feed was a concern for any business considering Grok for autonomous, tool-using workflows.[6]
At launch, the Agent Tools API exposed a small set of hosted tools that Grok could invoke through a single request. Tool names differed slightly between the xAI SDK and the OpenAI-compatible Responses API surface, but the underlying capability was the same.[3][5]
| Tool | xAI SDK name | Responses API name | What it does |
|---|---|---|---|
| Web Search | web_search | web_search | Searches the live web and browses pages for current information, with optional domain allowlists and image understanding |
| X Search | x_search | x_search | Pulls posts, users, and threads from X (formerly Twitter) for real-time social data |
| Code Execution | code_execution | code_interpreter | Runs Python in a sandboxed environment with packages such as NumPy, Pandas, Matplotlib, and SciPy |
| Collections Search | collections_search | file_search | Performs semantic search over uploaded files and document collections |
| View Image | view_image | view_image | Lets the agent open and analyze images encountered during search results |
| Remote MCP Tools | mcp_call | mcp | Connects Grok to external MCP servers over Streaming HTTP or SSE so the agent can call third-party tools |
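Porting a tool list between the two surfaces mostly means renaming three entries. The sketch below is a hypothetical helper (neither `responses_tool_name` nor `to_responses_tools` is part of either SDK) that follows the naming table; it assumes a bare `{"type": name}` object is enough to enable each tool and omits per-tool options such as domain filters, collection IDs, or MCP server details.

```python
# Hypothetical helpers: map xAI SDK tool names to Responses API tool types,
# following the naming table. Names not listed are identical on both surfaces.
SDK_TO_RESPONSES = {
    "code_execution": "code_interpreter",
    "collections_search": "file_search",
    "mcp_call": "mcp",
}


def responses_tool_name(sdk_name: str) -> str:
    """Return the Responses API tool type for an xAI SDK tool name."""
    return SDK_TO_RESPONSES.get(sdk_name, sdk_name)


def to_responses_tools(sdk_names: list[str]) -> list[dict[str, str]]:
    """Build a minimal tools array; per-tool configuration is deliberately omitted."""
    return [{"type": responses_tool_name(name)} for name in sdk_names]


print(to_responses_tools(["web_search", "x_search", "code_execution"]))
```

For `["web_search", "x_search", "code_execution"]` the helper yields the types `web_search`, `x_search`, and `code_interpreter`.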
The View Image tool is a helper that lets the model open images returned inside a search result and analyze their contents. xAI's documentation treats it as a sub-capability of web and X search rather than a tool the user invokes on its own. Enabling image understanding on web search automatically enables it for X search inside the same request.[5]
Web Search lets Grok search the live internet and then browse individual pages to pull more detail. The configuration accepts up to five entries each in allowed_domains and excluded_domains, which are mutually exclusive, and an enable_image_understanding flag that activates the View Image tool for any images the model encounters during the search. xAI exposes the same tool across three surfaces: web_search() in the xAI Python SDK, the web_search tool type in the Responses API, and xai.tools.webSearch() in the Vercel AI SDK.[5]
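The two domain rules are easy to violate when lists are assembled dynamically, so it can help to check them client-side before sending a request. A minimal sketch, assuming a Responses API tool entry that carries the options as flat fields named after the SDK parameters; the helper name `web_search_tool` and the exact JSON layout are assumptions, not xAI's documented schema.

```python
from typing import Any, Optional

MAX_DOMAINS = 5  # documented limit for each of the two domain lists


def web_search_tool(
    allowed_domains: Optional[list[str]] = None,
    excluded_domains: Optional[list[str]] = None,
    enable_image_understanding: bool = False,
) -> dict[str, Any]:
    """Build a web_search tool entry after enforcing the documented domain rules.

    The flat field layout is an assumption that mirrors the SDK parameter names.
    """
    if allowed_domains and excluded_domains:
        raise ValueError("allowed_domains and excluded_domains are mutually exclusive")
    for name, domains in (("allowed_domains", allowed_domains), ("excluded_domains", excluded_domains)):
        if domains and len(domains) > MAX_DOMAINS:
            raise ValueError(f"{name} accepts at most {MAX_DOMAINS} entries")
    tool: dict[str, Any] = {"type": "web_search"}
    if allowed_domains:
        tool["allowed_domains"] = list(allowed_domains)
    if excluded_domains:
        tool["excluded_domains"] = list(excluded_domains)
    if enable_image_understanding:
        # Also turns on View Image for X Search calls in the same request.
        tool["enable_image_understanding"] = True
    return tool
```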
The Web Search tool always returns citations on the response object. Inline citations appear as Markdown links such as [[1]](https://x.ai/news) inside the generated text. They are enabled by default in the Responses API and turned on in the xAI SDK with include=["inline_citations"]. xAI's documentation notes that enabling the feature does not force the model to cite every claim; it only means the model attaches a citation where its response is grounded in a tool-fetched source.[10]
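Because inline citations follow one fixed Markdown shape, a client that wants a separate source list, or plain text for a voice or SMS channel, can pull them out with a regular expression. A sketch assuming the `[[n]](url)` form shown above is the only one the model emits; both function names are hypothetical.

```python
import re

# Matches inline citations of the form [[1]](https://x.ai/news).
INLINE_CITATION = re.compile(r"\[\[(\d+)\]\]\((\S+?)\)")


def extract_inline_citations(text: str) -> dict[int, str]:
    """Map each citation number to its URL, keeping the first URL seen per number."""
    sources: dict[int, str] = {}
    for match in INLINE_CITATION.finditer(text):
        sources.setdefault(int(match.group(1)), match.group(2))
    return sources


def strip_inline_citations(text: str) -> str:
    """Remove inline citation links, leaving the surrounding prose untouched."""
    return INLINE_CITATION.sub("", text)
```

URLs that themselves contain a closing parenthesis would be cut short by this pattern; the structured annotations on the response are the more robust source when they are available.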
With Grok acting as the agent, Web Search behaves more like a small browsing loop than a single lookup. The model can query, read snippets, decide that it needs to open a page, and then continue searching against the new context until it has enough information.
X Search is the API's distinguishing tool. It searches X posts, users, and threads through xAI's pipeline rather than through the public X API or a third-party connector. The tool supports several modes including keyword search, semantic search, and user search. Posts can be filtered by date range, language, and engagement metrics, and the model can chain X Search with Web Search and Code Execution in the same query to combine real-time social data with broader web context.[3][7]
For enterprise customers, X Search is the strongest reason to consider Grok as the agent backbone. Other major model providers can reach X content only through general web search or a paid third-party connector. xAI has positioned the tool as a way to pull market-moving news, customer support signals, sports scores, election results, and other content that surfaces faster on X than on the open web.
The Code Execution tool runs Python in a secure sandbox with most popular scientific packages preinstalled, including NumPy, Pandas, Matplotlib, and SciPy. The runtime is stateless between requests and has no access to external networks or persistent file systems. The documentation flags that complex computations can hit execution time and memory limits, and recommends pairing the tool with reasoning models for math, data analysis, financial modeling, and scientific computing.[8]
In the Responses API the tool is registered as code_interpreter, matching OpenAI's naming convention, and in the xAI SDK it is exposed as code_execution(). Output from the code execution call is returned alongside the model response so the calling client can render charts, dataframes, or computed values.
Collections Search powers retrieval over uploaded knowledge bases and is the building block for retrieval-augmented generation workflows on xAI. It accepts an array of collection_ids (or vector_store_ids in the OpenAI-compatible surface), a query string, and a max_num_results limit, and returns matches with citation URIs in the form collections://collection_id/files/file_id. Supported file types include PDFs, text files, CSVs, and other common formats.[9]
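Applications that render Collections Search citations usually need the two IDs back out of the URI, for example to link to the stored file. A small sketch using Python's standard `urllib.parse`, assuming the URI always has exactly the `collections://collection_id/files/file_id` shape described above; `parse_collections_uri` is a hypothetical name.

```python
from urllib.parse import urlparse


def parse_collections_uri(uri: str) -> tuple[str, str]:
    """Split a collections://collection_id/files/file_id citation into its two IDs."""
    parsed = urlparse(uri)
    # urlparse puts the collection ID in netloc and "/files/<file_id>" in path.
    parts = parsed.path.strip("/").split("/")
    if parsed.scheme != "collections" or not parsed.netloc or len(parts) != 2 or parts[0] != "files":
        raise ValueError(f"not a Collections Search citation URI: {uri!r}")
    return parsed.netloc, parts[1]
```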
The tool is built for autonomous multi-document synthesis rather than single lookups. One example in xAI's documentation shows Grok performing thirteen sequential Collections Search calls against SEC filings before synthesizing Tesla's production figures with full source attribution. Collections Search supports hybrid workflows that combine it with Web Search and X Search, so a customer support agent can answer a question by simultaneously checking an internal knowledge base, public documentation on the company website, and recent X posts that mention an outage.
The View Image tool is a sub-capability that lets the model open and analyze images encountered during search. It is activated by setting enable_image_understanding=True on Web Search, and is automatically applied to any X Search calls in the same request. The tool is billed by token rather than per call.[5]
Remote MCP Tools let xAI act as the Model Context Protocol host. Developers pass a server_url and server_label, optionally restrict access with allowed_tools, and provide credentials through an authorization token or arbitrary headers. The xAI SDK calls the additional headers parameter extra_headers and the allowed tools parameter allowed_tool_names. xAI's documentation says only Streaming HTTP and Server-Sent Events transports are supported for the MCP connection.[11]
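Because the SDK and the Responses API name two of these options differently, a translation helper keeps one configuration usable on both surfaces. A minimal sketch, assuming the Responses API fields are `server_url`, `server_label`, `allowed_tools`, `authorization`, and `headers`, as the paragraph above implies; `mcp_tool` is a hypothetical name.

```python
from typing import Any, Optional


def mcp_tool(
    server_url: str,
    server_label: str,
    allowed_tool_names: Optional[list[str]] = None,
    authorization: Optional[str] = None,
    extra_headers: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Translate SDK-style MCP options into a Responses API tool entry (assumed schema)."""
    tool: dict[str, Any] = {
        "type": "mcp",
        "server_url": server_url,  # must speak Streaming HTTP or SSE
        "server_label": server_label,
    }
    if allowed_tool_names:
        tool["allowed_tools"] = list(allowed_tool_names)  # SDK name: allowed_tool_names
    if authorization:
        tool["authorization"] = authorization
    if extra_headers:
        tool["headers"] = dict(extra_headers)  # SDK name: extra_headers
    return tool
```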
MCP integration lets the agent reach beyond xAI's first-party catalog into the wider ecosystem of MCP servers shipped by companies such as GitHub, Stripe, and Notion, as well as community-built integrations. It is billed only by token, not by call.
All Agent Tools API tools execute on xAI infrastructure. The model can keep calling tools across multiple turns until it has enough information to answer, and the documentation explicitly supports multiple server-side tools running in the same request, so a single query can combine web search, X search, and code execution. Tool calls are streamed back to the client through verbose streaming so the application can show users which tool is running.[12][14]
For stateful agent loops, xAI offers two patterns. The first stores conversation history remotely on the xAI server by setting store_messages=True and lets the client resume from the previous response ID, with the full reasoning trail, tool calls, and tool responses available on the server. The second returns encrypted reasoning and encrypted tool output to the client, which is then echoed back on the next request to preserve the agent's intermediate state without exposing it to the calling application.[12]
Follow-up requests do not need to use the same tools, model parameters, or other configuration as the original. xAI's documentation notes that the conversation is fully hydrated with the previous agentic state, so a developer can run a web-search-heavy first turn and then switch to a code-execution-heavy second turn without rebuilding context.[12]
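Over the OpenAI-compatible Responses API, the first pattern corresponds to chaining requests by response ID. The sketch below only builds request bodies; it assumes xAI honors the `store` and `previous_response_id` fields as they work in OpenAI's Responses API, and `resp_123` is a placeholder for the ID returned by the first call. The second turn deliberately switches from search tools to code execution.

```python
from typing import Any, Optional


def response_request(
    prompt: str,
    tools: list[dict[str, Any]],
    previous_response_id: Optional[str] = None,
    model: str = "grok-4-1-fast-reasoning",
) -> dict[str, Any]:
    """Build a /v1/responses body; later turns resume via previous_response_id."""
    body: dict[str, Any] = {
        "model": model,
        "input": prompt,
        "tools": tools,
        "store": True,  # assumption: keeps server-side state so later turns can resume
    }
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    return body


# Turn 1 leans on search; turn 2 resumes the same state but only runs code.
first = response_request(
    "Collect NVIDIA's last four quarterly revenues.",
    [{"type": "web_search"}, {"type": "x_search"}],
)
second = response_request(
    "Now fit a linear trend to those figures.",
    [{"type": "code_interpreter"}],
    previous_response_id="resp_123",  # placeholder: use the id from the first response
)
```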
Because execution stays on xAI's side, developers do not provision sandboxes, manage tool quotas, or rotate keys for each integration. The trade-off is that tool runtime, retries, and rate limits are determined by xAI rather than the calling application. For workloads where deterministic infrastructure matters, the API also supports client-side function calls that pause execution and return control to the application; those calls act as checkpoints that reset the max_turns counter.[12]
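On the client side, handling those checkpoints amounts to running each requested function and returning its result. A sketch assuming the OpenAI-compatible Responses API item shapes, in which a `function_call` item carries `call_id`, `name`, and JSON-encoded `arguments`, and the reply is a `function_call_output` item with the same `call_id`; `get_order_status` and the `LOCAL_FUNCTIONS` registry are hypothetical.

```python
import json
from typing import Any, Callable


def get_order_status(order_id: str) -> dict[str, str]:
    """Hypothetical local function exposed to Grok as a client-side tool."""
    return {"order_id": order_id, "status": "shipped"}


LOCAL_FUNCTIONS: dict[str, Callable[..., Any]] = {"get_order_status": get_order_status}


def run_client_side_calls(output_items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute every function_call item and build matching function_call_output items."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # messages and server-side tool results need no client action
        func = LOCAL_FUNCTIONS[item["name"]]
        value = func(**json.loads(item["arguments"]))
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(value),
        })
    return results
```

The returned items would then be sent as the input of the next request, typically chained to the previous response ID, so Grok can resume its tool loop from the checkpoint.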
The Agent Tools API returns two kinds of citations for tool-backed answers. A citations attribute on the response object lists every URL the agent encountered during its tool calls; xAI says this list is always returned by default with no extra configuration. Inline citations are Markdown links such as [[1]](https://x.ai/news) placed in the generated text where the model references a source. Citation numbers always start from 1 and increment sequentially, and the same number is reused when the same source appears more than once.[10]
Inline citations are on by default in the Responses API and can be turned off by adding "no_inline_citations" to the include field. In the native xAI SDK, developers enable them with include=["inline_citations"]. Each annotation carries a url_citation type, the source URL, character start_index and end_index positions, and a numeric title used for rendering. During streaming, the citations accumulate as the model generates text.[10]
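The numbering rules make the annotations easy to check before rendering footnotes. A sketch assuming each annotation is a dict with the fields listed above (`type`, `url`, `start_index`, `end_index`, and a numeric `title`); `cited_spans` is a hypothetical helper, and it makes no assumption about which characters the index range covers beyond slicing them out of the text.

```python
from typing import Any


def cited_spans(text: str, annotations: list[dict[str, Any]]) -> list[tuple[int, str, str]]:
    """Return (citation number, annotated text, url) for each url_citation annotation.

    Raises ValueError if the documented numbering rule is broken: numbers start
    at 1, rise by one for each new source, and repeat when a source recurs.
    """
    number_for_url: dict[str, int] = {}
    spans = []
    for ann in sorted(annotations, key=lambda a: a.get("start_index", 0)):
        if ann.get("type") != "url_citation":
            continue
        number, url = int(ann["title"]), ann["url"]
        expected = number_for_url.get(url, len(number_for_url) + 1)
        if number != expected:
            raise ValueError(f"citation {number} for {url}, expected {expected}")
        number_for_url[url] = number
        spans.append((number, text[ann["start_index"]:ann["end_index"]], url))
    return spans
```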
The xAI SDK exposes built-in tools as small Python helpers. A minimal request that turns on Web Search, X Search, and Code Execution looks like this:
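```python
import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import code_execution, web_search, x_search

client = Client(api_key=os.getenv("XAI_API_KEY"))

# Enable three server-side tools; Grok decides when and how often to call each.
chat = client.chat.create(
    model="grok-4-1-fast",
    tools=[web_search(), x_search(), code_execution()],
)

# The prompt is appended to the chat; sample() then runs the agent loop.
chat.append(
    user(
        "Find the latest 10-K filing for NVIDIA, pull recent X posts about"
        " the company's data center revenue, and compute the year over year"
        " growth rate from the cited numbers."
    )
)
response = chat.sample()

print(response.content)
# Every URL the agent encountered during its tool calls.
for citation in response.citations:
    print(citation)
```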
A more realistic agent that also adds Collections Search and a Remote MCP server looks like this:
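```python
import os

from xai_sdk import Client
from xai_sdk.chat import user
from xai_sdk.tools import (
    code_execution,
    collections_search,
    mcp,
    web_search,
    x_search,
)

client = Client(api_key=os.getenv("XAI_API_KEY"))
chat = client.chat.create(
    model="grok-4-1-fast-reasoning",
    tools=[
        # Web Search restricted to an allowlist (at most five domains).
        web_search(allowed_domains=["sec.gov", "nvidia.com"]),
        x_search(),
        code_execution(),
        # Placeholder collection ID; replace with a collection you have uploaded.
        collections_search(collection_ids=["col_internal_filings"]),
        # Placeholder MCP server reachable over Streaming HTTP or SSE.
        mcp(server_url="https://mcp.example.com", server_label="crm"),
    ],
    # Inline Markdown citations plus the raw output of code execution calls.
    include=["inline_citations", "code_execution_call_output"],
)
chat.append(user("Summarize NVIDIA's latest data center revenue trend."))
response = chat.sample()
print(response.content)
```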
The equivalent call against the OpenAI-compatible Responses API at https://api.x.ai/v1/responses uses the names web_search, x_search, and code_interpreter in the tools array.[3][5]
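Without any SDK, the same request is a single authenticated POST. A stdlib-only sketch, assuming bearer-token authentication with the `XAI_API_KEY` environment variable and a bare `{"type": ...}` entry per tool; the prompt and the 300-second timeout are illustrative choices, and the network call only runs when the key is set.

```python
import json
import os
import urllib.request

XAI_RESPONSES_URL = "https://api.x.ai/v1/responses"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST to the Responses API with three server-side tools enabled."""
    body = {
        "model": "grok-4-1-fast",
        "input": prompt,
        # Responses API names: code_interpreter here, code_execution in the xAI SDK.
        "tools": [{"type": "web_search"}, {"type": "x_search"}, {"type": "code_interpreter"}],
    }
    return urllib.request.Request(
        XAI_RESPONSES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__" and os.getenv("XAI_API_KEY"):
    req = build_request("What are people on X saying about Grok 4.1 Fast?", os.environ["XAI_API_KEY"])
    # Agentic requests can run several tool calls, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        print(json.dumps(json.load(resp), indent=2))
```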
For streaming with tool call visibility, xAI's official SDK example uses verbose_streaming and code_execution_call_output to surface every tool invocation and its outputs as they happen:[13]
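```python
import asyncio
import os

from xai_sdk import AsyncClient
from xai_sdk.chat import user
from xai_sdk.tools import code_execution, web_search, x_search


async def agentic_search(client: AsyncClient, model: str, query: str) -> None:
    chat = client.chat.create(
        model=model,
        tools=[web_search(), x_search(), code_execution()],
        # verbose_streaming surfaces each tool call as it happens.
        include=["verbose_streaming", "code_execution_call_output"],
    )
    chat.append(user(query))
    async for response, chunk in chat.stream():
        for tool_call in chunk.tool_calls:
            print(f"Calling tool: {tool_call.function.name}")
        if chunk.content:
            print(chunk.content, end="", flush=True)


if __name__ == "__main__":
    client = AsyncClient(api_key=os.getenv("XAI_API_KEY"))
    asyncio.run(agentic_search(client, "grok-4-1-fast-reasoning", "What is trending on X about GPUs?"))
```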
Requests that use server-side tools are billed in two parts: the underlying token usage of the model and a per-call charge for each successful tool invocation. Because the agent decides how many tools to run, the total cost scales with query complexity rather than with a fixed agent fee.[3]
| Cost component | Charge |
|---|---|
| Web Search invocation | $5 per 1,000 successful calls |
| X Search invocation | $5 per 1,000 successful calls |
| Code Execution invocation | $5 per 1,000 successful calls |
| Collections Search invocation | $2.50 per 1,000 successful calls (post-launch tier) |
| File Attachments | $10 per 1,000 successful calls |
| View Image | Token-based, no per-call fee |
| Remote MCP Tools | Token-based, no per-call fee |
| Custom function calling | Token-based, no per-call fee |
| Grok 4.1 Fast input tokens (Nov 2025) | $0.20 per 1M (cached as low as $0.05 per 1M) |
| Grok 4.1 Fast output tokens (Nov 2025) | $0.50 per 1M |
xAI noted in its November 19, 2025 release notes that it had cut tool pricing by up to fifty percent so that no server-side tool exceeded $5 per 1,000 successful calls. The launch promotion made every Agent Tools call free until December 3, 2025.[2][4]
In practice, a single research query can fan out into several tool calls. Independent analyses of the pricing model estimated that a typical agent question triggers three to five Web Search calls, often followed by a few code executions. At $0.005 per call, the resulting three to eight calls add $0.015 to $0.040 per query in tool fees on top of token costs. The model decides how many tools to run, so applications that want predictable spending often cap turns or restrict requests to a small set of tools.
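The two-part billing can be estimated per query from the tool counts and token usage the API reports. A sketch using the figures in the pricing table, with `Decimal` to avoid floating-point rounding on small dollar amounts; it assumes the $0.05 cached rate applies (the table gives it as a lower bound) and that cached tokens are a subset of input tokens. The helper names are hypothetical.

```python
from decimal import Decimal

# Per-1,000-call fees and per-million-token prices taken from the pricing table.
TOOL_FEE_PER_1K = {
    "web_search": Decimal("5"),
    "x_search": Decimal("5"),
    "code_execution": Decimal("5"),
    "collections_search": Decimal("2.50"),
}
INPUT_PER_M = Decimal("0.20")
CACHED_INPUT_PER_M = Decimal("0.05")  # assumption: the lowest cached rate applies
OUTPUT_PER_M = Decimal("0.50")


def tool_fees(calls: dict[str, int]) -> Decimal:
    """Sum per-call fees for successful server-side tool invocations."""
    return sum((TOOL_FEE_PER_1K[name] * count / 1000 for name, count in calls.items()), Decimal(0))


def token_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> Decimal:
    """Grok 4.1 Fast token cost; cached_input_tokens is the cached part of input_tokens."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = uncached * INPUT_PER_M + cached_input_tokens * CACHED_INPUT_PER_M + output_tokens * OUTPUT_PER_M
    return total / 1_000_000
```

With these prices, five Web Search calls plus three code executions come to $0.040 in tool fees, the top of the range above, and one million input plus one million output tokens cost $0.70.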
Agent Tools API entered a market where most large model providers already shipped some form of hosted tool layer. The closest analogues are OpenAI's Responses API, Anthropic's Claude tool use, Google's Gemini API tool use, Mistral's Agents API, and AWS Bedrock Agents.
| Provider | Hosted tools offered | Distinguishing feature |
|---|---|---|
| xAI Agent Tools API | Web search, X search, code execution, collections search, view image, remote MCP | Native X data access, single price ceiling of $5 per 1,000 successful tool calls |
| OpenAI Responses API | Web search, file search, code interpreter, computer use, image generation, MCP | Largest third-party tool ecosystem, integrated computer use, mature SDK |
| Anthropic Messages API tool use | Web search, code execution, MCP, computer use | Strong support for parallel tool calls and tool result reasoning, no first-party social data tool |
| Google Gemini API tool use | Google Search grounding, code execution, function calling, URL context | Tight integration with Google Search and YouTube |
| Mistral Agents API | Web search, code interpreter, image generation, document library, MCP | Per-agent design with persistent agent IDs and handoffs |
| AWS Bedrock Agents | Action groups, knowledge bases, code interpreter, multi-agent orchestration | Runs inside AWS accounts with IAM-level controls |
xAI's distinguishing feature is the X Search tool, which queries posts, users, and threads from X directly through xAI's pipeline. Other providers can reach X content only through general web search or paid third-party connectors. The single price ceiling of $5 per 1,000 successful tool calls also undercuts much of the competition, which often charges more per call for premium tools such as web search.
Where OpenAI and Anthropic have larger SDK ecosystems and more partner-built integrations, Agent Tools API attempts to close the gap by adopting OpenAI's Responses API names for most tools. The choice means most code that targets OpenAI's hosted tools can be repointed at https://api.x.ai/v1 with minor changes, lowering the cost of trying Grok as a drop-in replacement for an OpenAI agent.
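In the OpenAI Python SDK, the repointing comes down to a different API key and base URL. A sketch assuming the `openai` package is installed and that xAI accepts the `x_search` tool type when it is passed through that client unchanged; the model name and prompt are illustrative, and `main()` only runs when `XAI_API_KEY` is set.

```python
import os

XAI_BASE_URL = "https://api.x.ai/v1"


def grok_tools() -> list[dict[str, str]]:
    """Hosted tools by their Responses API names (configuration options omitted)."""
    return [{"type": "web_search"}, {"type": "x_search"}]


def main() -> None:
    from openai import OpenAI  # third-party package: pip install openai

    # The change from an OpenAI setup: an xAI key and the xAI base URL.
    client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url=XAI_BASE_URL)
    response = client.responses.create(
        model="grok-4-1-fast-reasoning",
        input="Summarize today's X discussion about GPU supply.",
        tools=grok_tools(),
    )
    print(response.output_text)


if __name__ == "__main__" and os.getenv("XAI_API_KEY"):
    main()
```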
xAI's launch material focuses on four categories of agent work. Customer support automation uses the long context window of Grok 4.1 Fast plus Web Search and Collections Search to look up account details, knowledge base entries, and policy documents in the same conversation. Research and agentic search workflows combine Web Search, X Search, and Code Execution to gather material from many sources and run calculations on top of it.[1]
Long-horizon multi-tool workflows benefit from the encrypted state pattern, which lets an agent run for many turns without the client storing intermediate reasoning. Because the agent runs server-side and does not need a local browser or container, the API also gives Grok browser-like behavior without forcing developers to set up local automation infrastructure.[12]
Financial and market workflows take advantage of the X Search tool to surface market-moving posts seconds after they go live, then feed the same query through Web Search for SEC filings or press releases and Code Execution for back-of-the-envelope modeling. Developer relations and community management teams use X Search and Web Search together to track sentiment and incidents across platforms.
Voice agents add another use case. The November 19, 2025 release brought tool support to Grok's Voice Agent API, which suggests xAI is pushing the same hosted tool layer across text, voice, and future modalities rather than building separate stacks for each. In April 2026 the company doubled down on that direction with grok-voice-think-fast-1.0, a voice agent model designed for complex multi-step workflows that calls the same Agent Tools API surface for tasks like CRM lookups, calendar manipulation, and database queries through Collections Search.[3][14]
Agent Tools API fits into a broader push by xAI and X Corp to package Grok as a developer platform rather than only a consumer chatbot. The API ships alongside Grok 4.1 Fast, a long-context tool-calling model with a 2 million token context window aimed at agentic tasks. Agent Tools API was xAI's first serious attempt to compete with OpenAI's Responses API on hosted tooling, and it leveraged X data as a defensible advantage that no other provider can match without going through xAI.[1][2]
The API became a central building block for xAI's wider 2026 platform. Grok 4.20, released in February and March 2026 with four-agent and sixteen-agent multi-agent configurations, used the same tools. Grok 4.3 launched in beta on April 17, 2026 and officially on the API in late April, becoming the default agentic tool-calling model; the older grok-4, grok-4-fast, and grok-4-1-fast aliases were scheduled for retirement on May 15, 2026, with traffic redirected to the new model. Pricing for the three search-style tools remained at $5 per 1,000 calls.[15][16]
The API is tied to the xAI ecosystem. Some tools, especially X Search, only make sense if the application's users tolerate a Grok-only architecture, and the underlying X data is governed by X's own terms. The hosted execution model also means the tool surface, runtime limits, and retry logic are decided by xAI rather than the calling application. Developers who need their own sandboxing or custom search backends still have to fall back to function calling.
Documentation and community resources remain smaller than for the equivalent OpenAI Responses API or Anthropic tool use surfaces, since both shipped earlier and have larger SDK and tutorial ecosystems. Some niche features that competitors expose, such as native computer use, video understanding through the same agent surface, or first-party image generation as an in-conversation tool, are not currently part of the core Agent Tools API even though xAI exposes those capabilities through separate Imagine, Speech, and Voice endpoints.
The November 2025 launch was also dragged into a broader debate about Grok's reliability. VentureBeat and other outlets noted that the technical release was overshadowed by a viral incident in which the consumer Grok chatbot praised Elon Musk in implausible terms compared to athletes and scientists, sparking renewed concern about alignment and bias controls. Coverage argued that the instability behind the chatbot's behavior would be amplified in autonomous, tool-using agents, and that enterprises would want stronger safeguards, audit trails, and reproducible evaluations before committing core workloads to the platform.[6]
Cost predictability is a further concern. Because the model is free to chain several tool calls per turn, bills can balloon when an agent runs unbounded loops. Several practitioner blogs recommended capping max_turns and applying strict allowed-domain lists on Web Search to keep costs predictable.
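Applied to a loop of follow-up requests, that advice can be enforced with a small client-side budget. A hypothetical sketch: `ToolBudget` is not part of any SDK, it assumes every call is billed at the $5 per 1,000 tier, and it only decides whether to send the next request, so it complements rather than replaces a max_turns cap on each request.

```python
from decimal import Decimal


class ToolBudget:
    """Hypothetical client-side guard for a loop of follow-up requests.

    It cannot interrupt calls already running on xAI's servers; those are
    bounded by each request's own turn limit.
    """

    FEE_PER_CALL = Decimal("0.005")  # $5 per 1,000 successful calls

    def __init__(self, max_calls: int, max_fee: Decimal) -> None:
        self.max_calls = max_calls
        self.max_fee = max_fee
        self.calls = 0

    @property
    def spent(self) -> Decimal:
        return self.calls * self.FEE_PER_CALL

    def record(self, successful_calls: int) -> None:
        """Add the tool calls reported for the response that just finished."""
        self.calls += successful_calls

    def can_continue(self) -> bool:
        """True while both the call cap and the dollar cap still have headroom."""
        return self.calls < self.max_calls and self.spent < self.max_fee
```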