The OpenAI Responses API is the agent-oriented HTTP API that OpenAI recommends for building agentic applications. A single endpoint, POST /v1/responses, returns a stored response object that can carry text, images, function calls, and the results of OpenAI-hosted tools. OpenAI introduced the API on March 11, 2025 alongside the Agents SDK, then expanded it on May 21, 2025 with remote MCP support, image generation, code interpreter, and several enterprise features. After Responses reached feature parity with the older Assistants API, OpenAI announced an Assistants beta deprecation on August 26, 2025 with a sunset on August 26, 2026, naming Responses and the related Conversations API as the long-term replacements.[1][2][3][4]
The Responses API is OpenAI's attempt to fold the simplicity of the Chat Completions API together with the agentic capabilities that previously lived in the Assistants stack. A single request can call multiple built-in tools, preserve reasoning state across turns, and return polymorphic output items in the order they happened. Where Chat Completions returns one message, Responses returns a list of typed items, including message, reasoning, function_call, web_search_call, file_search_call, code_interpreter_call, and image_generation_call entries that document each step the model took.[5][3]
Responses are stored by default. Every call returns a response.id that the caller can pass into the next request as previous_response_id, which lets OpenAI manage conversation history server-side rather than forcing the client to resend the full transcript on every turn. A separate Conversations API (added in August 2025) extends this with a long-running Conversation object that holds messages, tool calls, and tool outputs across sessions. The combination gives developers a stateful interface that does not depend on client-side history management.[5][6]
OpenAI announced the Responses API on March 11, 2025, together with the open-source Agents SDK and a set of hosted tools aimed at agent builders. At launch the API supported three built-in tools: web search, file search, and computer use. OpenAI described the release as a way to give developers "the same building blocks we use internally to ship products like Operator and Deep Research" without forcing them to wire those tools together by hand.[1][7]
Olivier Godement, OpenAI's head of product for the API, framed the launch around a problem his team had observed in 2024: "It's pretty easy to demo your agent. To scale an agent is pretty hard, and to get people to use it often is very hard." The Responses API was meant to lower the engineering cost of getting from prototype to production by handling the orchestration loop on the server.[7]
The built-in web search tool reused the infrastructure behind ChatGPT search and was available through gpt-4o-search-preview and gpt-4o-mini-search-preview. OpenAI reported that these search-tuned variants reached 90% and 88% accuracy on the SimpleQA factuality benchmark, compared with 63% for GPT-4.5. The computer use tool was powered by a new computer-use-preview model, the same Computer-Using Agent that powered OpenAI's Operator product. OpenAI noted that computer use was still a research preview, gated to usage tiers 3 through 5, and acknowledged that it was "not yet highly reliable for automating tasks on operating systems." The model scored 38.1% on the OSWorld benchmark at release.[1][7][8]
On May 21, 2025 OpenAI shipped a major update that added remote MCP server support, native image generation, code interpreter, file search upgrades, background mode, reasoning summaries, and encrypted reasoning items.[2]
| Addition | What OpenAI added |
|---|---|
| Remote MCP support | Connect to any remote Model Context Protocol server with a few lines of code; OpenAI joined the MCP steering committee at the same time |
| Image generation | gpt-image-1 available as a tool inside Responses, with streaming previews and multi-turn refinement |
| Code interpreter | Hosted Python sandbox available as a tool, the same execution environment used by ChatGPT |
| File search upgrades | Multi-vector-store search and metadata attribute filtering |
| Background mode | Asynchronous long-running tasks for jobs that exceed the synchronous timeout |
| Reasoning summaries | Natural-language summaries of the model's hidden chain of thought |
| Encrypted reasoning items | Reusable reasoning state for Zero Data Retention customers, so reasoning can persist across calls without OpenAI storing it |
The May update also unlocked tool use inside the chain of thought for OpenAI o3 and OpenAI o4-mini. Earlier reasoning models could call tools only after finishing their reasoning step. With the Responses API, o3 and o4-mini can interleave tool calls with their internal thinking, which OpenAI claimed produced better results on tasks where the model needs intermediate evidence to reason well.[2]
A Responses request is sent as POST /v1/responses. The minimum payload is a model and an input. The input field accepts either a plain string or a list of typed content items including input_text, input_image, input_file, and input_audio. The response object contains id, output (the list of items the model produced), output_text (a convenience accessor), usage, and metadata about any tools that were invoked.[5]
| Parameter | Purpose |
|---|---|
| model | The model that will generate the response, for example gpt-5 or gpt-4.1. |
| input | The prompt. Accepts a string or a list of typed input items. |
| instructions | A system-level instruction inserted ahead of input. |
| tools | A list of tools the model can call: built-in tools (web_search, file_search, computer_use_preview, code_interpreter, image_generation, mcp) and user-defined function tools. |
| tool_choice | Controls whether the model may, must, or must not call tools. |
| previous_response_id | The id of an earlier response to chain on top of. |
| conversation | An identifier for a Conversations API thread. |
| store | Whether the response should be stored on OpenAI's servers (defaults to true). |
| stream | If true, the API streams typed events. |
| background | If true, the response runs asynchronously and can be polled or cancelled. |
| reasoning | Configuration for reasoning models, including effort and summary preferences. |
| max_output_tokens | Cap on output tokens. |
| temperature, top_p | Standard sampling controls. |
| response_format / text | Structured output configuration, including JSON Schema. |
A basic Python example using the official SDK looks like this:
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Summarize the latest news about the Mars Sample Return mission.",
    tools=[{"type": "web_search"}],
)
print(response.output_text)
```
The equivalent JavaScript call uses the same shape:
```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-5",
  input: "Summarize the latest news about the Mars Sample Return mission.",
  tools: [{ type: "web_search" }],
});
console.log(response.output_text);
```
A conversation can be continued by passing the prior response id:
```python
follow_up = client.responses.create(
    model="gpt-5",
    input="What are the funding risks for the mission?",
    previous_response_id=response.id,
)
```
The API also supports streaming over server-sent events. When stream=true is set, the server emits semantic events such as response.output_text.delta, response.function_call_arguments.delta, and response.completed, which are richer than the raw token deltas emitted by Chat Completions.[5]
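A client folds these semantic events back into final text with a small reducer. The sketch below is illustrative: the Event dataclass and sample list stand in for the event objects the SDK yields when stream=True, and only the event type names come from the list above.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Stand-in for an SDK streaming event: a type name plus an optional delta."""
    type: str
    delta: str = ""

def accumulate(events):
    """Collect output_text deltas until the response.completed event arrives."""
    chunks = []
    for event in events:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
        elif event.type == "response.completed":
            break
    return "".join(chunks)

sample = [
    Event("response.output_text.delta", "Hello, "),
    Event("response.output_text.delta", "world."),
    Event("response.completed"),
]
print(accumulate(sample))  # → Hello, world.
```

In a real client the same dispatch would sit inside `for event in stream:` over the SDK's stream object, with extra branches for function-call argument deltas and error events.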
One of the main reasons developers choose the Responses API over Chat Completions is the set of OpenAI-hosted tools that run server-side, so the model can invoke them inside a single request without round-tripping through the developer's backend.
| Tool | What it does | Status |
|---|---|---|
| Web search | Live web retrieval powered by ChatGPT Search infrastructure, returns cited sources | Generally available |
| File search | Vector-store-backed RAG with multi-store search, metadata filters, and custom ranking | Generally available |
| Computer use | Browser and desktop automation through the computer-use-preview model | Research preview, tiers 3 to 5 |
| Code interpreter | Hosted Python sandbox with file upload and persistent containers | Generally available |
| Image generation | Native gpt-image-1 generation and editing with streaming previews | Generally available |
| Remote MCP | Connects to any external MCP server to expose third-party tools | Generally available |
| Custom function calling | Developer-defined JSON-schema functions, the same model used in Chat Completions | Generally available |
The hosted tools can be combined freely. A model can run a web search, feed the result into code interpreter, then call a custom function the developer defined, and finally write a response. OpenAI describes this as the API being "agentic by default."[2][5]
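Because the output is a list of typed items rather than a single message, a caller handles such a multi-tool run by dispatching on item type. A minimal sketch, using plain dicts as stand-ins for the item objects the SDK returns:

```python
def summarize_output(items):
    """Walk a polymorphic Responses output list and describe each step taken."""
    steps = []
    for item in items:
        if item["type"] == "web_search_call":
            steps.append("searched the web")
        elif item["type"] == "code_interpreter_call":
            steps.append("ran code")
        elif item["type"] == "function_call":
            steps.append(f"called {item['name']}")
        elif item["type"] == "message":
            steps.append("wrote the reply")
    return steps

# An illustrative run matching the sequence described above.
items = [
    {"type": "web_search_call"},
    {"type": "code_interpreter_call"},
    {"type": "function_call", "name": "get_price"},
    {"type": "message"},
]
print(summarize_output(items))
```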
The Responses API supports OpenAI's flagship text and reasoning models. Reasoning models pick up extra benefits because the API preserves their chain of thought across turns.[2][9]
| Model family | Examples | Notes |
|---|---|---|
| GPT-4.1 | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano | Strong general-purpose models, full tool support |
| GPT-4o | gpt-4o, gpt-4o-mini, gpt-4o-audio | Multimodal text plus image; search variants for web search |
| o-series reasoning | o1, o1-mini, o3, o3-mini, o4-mini | Tool calls inside chain of thought, reasoning summaries |
| GPT-5 | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, gpt-5.2, gpt-5.4-mini, gpt-5.4-nano, gpt-5.5, gpt-5.5-pro | OpenAI recommends Responses for GPT-5 to preserve reasoning state |
| Codex variants | gpt-5.2-codex, gpt-5.3-codex | Tuned for agentic coding tasks |
| Computer use | computer-use-preview | Only available through Responses, not Chat Completions |
| Deep research | o3-deep-research, o4-mini-deep-research | Long-running research tasks, work well with background mode |
OpenAI states that GPT-5 specifically benefits from Responses because the API preserves hidden reasoning tokens across calls. The company has reported that GPT-5 scores roughly 5% higher on the TAUBench agent benchmark when used through Responses versus Chat Completions, an improvement OpenAI attributes to that reasoning preservation.[6][9]
With previous_response_id, OpenAI prepends the prior input and output items, including reasoning items from o-series models, before generating the next response. This is the recommended pattern for reasoning models because it preserves the chain of thought without the client having to handle it.[6][10]
The Conversations API is the alternative for longer-running threads. A conversation is a named container of items, and a Responses request can take a conversation id instead of (or in addition to) previous_response_id. New input and output items are appended to the conversation automatically.[5]
For sensitive deployments, OpenAI added encrypted reasoning items in May 2025. Reasoning content can be returned in encrypted form, stored on the client, and replayed on the next request without OpenAI ever retaining the plaintext. This is what lets organizations on Zero Data Retention contracts still benefit from reasoning continuity.[2]
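A hedged sketch of what such a Zero Data Retention request body might look like, assuming the include value (reasoning.encrypted_content) from OpenAI's documentation; with storage disabled, the encrypted reasoning item is returned in the output for the client to hold and replay on the next call:

```json
{
  "model": "o4-mini",
  "input": "Continue the analysis from the previous step.",
  "store": false,
  "include": ["reasoning.encrypted_content"]
}
```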
Background mode (also added in May 2025) is for long-running agent runs. The client sets background=true, gets a response id immediately, and can poll the response or cancel it. This is the recommended approach for deep research workflows or long computer-use sessions, because a single response can take minutes to produce and a regular HTTP request would time out.[2]
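The create-then-poll loop can be sketched as follows. Here retrieve is a stub standing in for client.responses.retrieve, and the status names are assumptions based on the lifecycle described above rather than an exhaustive list:

```python
import time

def poll_until_done(retrieve, response_id, interval=0.0):
    """Poll a background response until it leaves the queued/in_progress states."""
    terminal = {"completed", "failed", "cancelled", "incomplete"}
    while True:
        response = retrieve(response_id)
        if response["status"] in terminal:
            return response
        time.sleep(interval)

# Simulated server: queued, then in_progress, then completed.
states = iter(["queued", "in_progress", "completed"])
fake_retrieve = lambda rid: {"id": rid, "status": next(states)}
print(poll_until_done(fake_retrieve, "resp_123")["status"])  # → completed
```

A production loop would use a longer interval with backoff, and would call the cancel endpoint instead of abandoning the poll when the result is no longer needed.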
The Responses API supports structured outputs through the response_format (or text.format) parameter. By passing a JSON Schema with strict: true, the developer guarantees that the response conforms to the schema; the model is constrained at decoding time so the JSON cannot be malformed. Structured outputs work with both regular chat models and reasoning models, and they also constrain the arguments emitted for function tools.[5]
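The guarantee is easiest to see from the consumer side. A sketch with an illustrative strict schema and an invented sample reply; because decoding is constrained to the schema, the parse step cannot fail and the required keys are always present:

```python
import json

# Illustrative strict schema of the kind passed via the structured-output
# parameter; the field names are invented for this example.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title", "risk_level"],
    "additionalProperties": False,
}

# With strict mode, every reply parses cleanly and matches the schema exactly.
reply = '{"title": "Mars Sample Return", "risk_level": "high"}'
parsed = json.loads(reply)
assert set(parsed) == set(schema["required"])
print(parsed["risk_level"])  # → high
```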
Function calling itself is unchanged in spirit from Chat Completions: declare functions as tools, get back arguments, run the function, post the result. The wire format uses typed items rather than a tool_calls list, which makes mixed sequences of model text, function calls, web searches, and code execution easier to reason about as a single output stream.[5][10]
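The round trip can be sketched as a small dispatcher over typed items. The item shapes and the get_price function are illustrative stand-ins; a real loop would send the resulting function_call_output items back as the next request's input:

```python
import json

def handle_function_calls(output_items, registry):
    """Run each function_call item and build the matching function_call_output items."""
    outputs = []
    for item in output_items:
        if item["type"] == "function_call":
            fn = registry[item["name"]]
            args = json.loads(item["arguments"])
            outputs.append({
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(fn(**args)),
            })
    return outputs

# Illustrative registry and model output.
registry = {"get_price": lambda symbol: {"symbol": symbol, "price": 101.5}}
model_output = [{
    "type": "function_call",
    "call_id": "call_1",
    "name": "get_price",
    "arguments": '{"symbol": "ACME"}',
}]
print(handle_function_calls(model_output, registry))
```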
Responses API requests are billed at the standard per-token price for whichever model is invoked, plus separate per-call fees for hosted tools. The May 2025 release post itemized the tool charges.[2][11]
| Tool or feature | Price noted by OpenAI |
|---|---|
| Web search (preview, non-reasoning) | $25.00 per 1,000 calls |
| Web search (preview, reasoning) | $10.00 per 1,000 calls |
| File search | $0.10 per GB of vector storage per day, plus $2.50 per 1,000 tool calls |
| Code interpreter | $0.03 per container session |
| Image generation | $5.00 per 1M text input tokens, $10.00 per 1M image input tokens, $40.00 per 1M image output tokens |
| Computer use (computer-use-preview model) | $3.00 per 1M input tokens, $12.00 per 1M output tokens |
| Remote MCP tool | No extra tool fee beyond standard token billing for the hosting model |
Reasoning tokens generated by o-series and GPT-5 thinking models count as output tokens, so a single Responses call against a reasoning model can consume far more output tokens than the visible reply suggests. Web search billing has two parts: the per-call fee shown above plus token charges for the search content the model ingests, which catches some developers off guard.[11][12]
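The two-part web-search bill is simple arithmetic. A sketch using the $25-per-1,000-calls fee from the table above and an assumed token price of $2.50 per million input tokens (actual token prices vary by model):

```python
def web_search_cost(calls, ingested_tokens, per_1k_calls=25.00,
                    input_price_per_1m=2.50):
    """Estimate web-search cost: per-call tool fee plus ingested-content tokens."""
    tool_fee = calls / 1_000 * per_1k_calls
    token_fee = ingested_tokens / 1_000_000 * input_price_per_1m
    return round(tool_fee + token_fee, 2)

# 10,000 searches, each pulling roughly 2,000 tokens of page content.
print(web_search_cost(10_000, 20_000_000))  # → 300.0 (tool fee dominates)
```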
OpenAI now runs three primary text-generation APIs. Responses is the recommended path for new projects, Chat Completions remains supported indefinitely, and Assistants is still available until its sunset.[3][6]
| Feature | Chat Completions | Responses | Assistants (legacy) |
|---|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses | /v1/threads, /v1/assistants |
| State | Stateless, client manages history | Optional server state via previous_response_id and Conversations | Server-managed Threads |
| Built-in tools | None (function calling only) | Web search, file search, computer use, code interpreter, image gen, MCP | Code interpreter, file search, function calling |
| Streaming | Token deltas | Semantic event stream | Run-event stream |
| Reasoning preservation | No | Yes, including encrypted reasoning items | Limited |
| Polymorphic output | Single message | List of typed items | Run steps |
| Status | Supported indefinitely | Recommended for new builds | Beta, sunsets August 26, 2026 |
OpenAI's official guidance is that Chat Completions is still the right choice for simple text generation and for code that needs to remain portable across providers, since the Chat Completions schema is implemented by Azure, most open-source inference servers, and OpenAI-compatible endpoints from other model providers. Responses is the right choice when an application needs hosted tools, reasoning preservation, server-managed conversation state, or the ability to combine multiple tools in a single agentic loop. The Realtime API is a separate surface for low-latency voice and live multimodal sessions over WebSocket and is not replaced by Responses.[3][6]
Responses is OpenAI's answer to a wave of agent-oriented APIs that other model providers have shipped over the same period.[13][14]
| API | Provider | State | Built-in tools | Notes |
|---|---|---|---|---|
| Responses | OpenAI | Server-side via Conversations | Web search, file search, computer use, code interpreter, image gen, MCP | Polymorphic output, encrypted reasoning |
| Messages API | Anthropic | Client-managed | Computer use, code execution, web search, MCP connectors | Extended thinking, prompt caching, top_k |
| Gemini API | Google | Client-managed, optional File API | Google Search grounding, code execution, function calling | Multimodal native, large context |
| Mistral Agents API | Mistral | Server-side agents | Web search, code interpreter, image gen, document library, MCP | Agent objects similar to Assistants |
| Bedrock Agents | AWS | Server-managed sessions | Knowledge bases, action groups, code interpreter | Multi-model, runs across Anthropic, Mistral, Meta, Amazon Nova |
The Responses API and the Anthropic Messages API are the closest competitors. Both expose computer use, web search, and MCP as first-class tools, and both ship official SDKs for Python and Node. The biggest practical differences are that Anthropic exposes its full chain of thought verbatim while OpenAI summarizes it, and Anthropic exposes prompt caching as a developer-controlled feature while OpenAI's caching is implicit. Anthropic's API also keeps system prompts in a separate system field rather than embedding them in the message array.[13][14]
Migrating from Chat Completions is mostly a matter of renaming messages to input and accepting a different output shape. A Chat Completions message list translates almost directly: each message becomes an input item with the same role and content. The biggest change for callers is reading response.output_text instead of response.choices[0].message.content, and looking through response.output to handle tool calls, since tool calls are a separate item type rather than a field on a message.[3]
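The mechanical part of that migration can be sketched as a payload transform. The helper below is illustrative, not an official shim: it maps the message list onto input items and hoists the system message into instructions.

```python
def to_responses_payload(chat_payload):
    """Translate a Chat Completions request body into a Responses request body."""
    instructions = None
    input_items = []
    for msg in chat_payload["messages"]:
        if msg["role"] == "system":
            instructions = msg["content"]  # system prompt moves to instructions
        else:
            input_items.append({"role": msg["role"], "content": msg["content"]})
    payload = {"model": chat_payload["model"], "input": input_items}
    if instructions is not None:
        payload["instructions"] = instructions
    return payload

chat = {
    "model": "gpt-5",
    "messages": [
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "Hello"},
    ],
}
print(to_responses_payload(chat))
```

The reading side changes symmetrically: response.output_text replaces response.choices[0].message.content, and tool calls arrive as their own items in response.output.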
Migrating from Assistants is more involved because Assistants used persistent Assistant, Thread, Message, and Run objects, while Responses replaces them with Response and Conversation objects plus reusable Prompts. OpenAI published a side-by-side migration guide that maps each Assistants concept to its Responses equivalent: Threads become Conversations, Assistants become Prompts (versioned in the dashboard rather than created via the API), and Runs collapse into a single responses.create call that may invoke many tools. Vector stores carry over directly because both APIs use the same vector store backend for file search.[3][15]
OpenAI announced the Assistants API beta deprecation on August 26, 2025 with a sunset date of August 26, 2026. The company gave developers a full year to migrate. After the sunset, the /v1/assistants and /v1/threads endpoints will return errors, and existing Assistants will need to be reimplemented as Prompts and Conversations on the Responses API. OpenAI explicitly framed the move as the result of Responses reaching parity with everything Assistants offered, including code interpreter and persistent conversation state, plus new capabilities such as remote MCP, computer use, and reasoning preservation that Assistants never supported.[15][4]
The Agents SDK is OpenAI's open-source orchestration framework for multi-agent workflows. It launched on March 11, 2025 alongside the Responses API and is available for both Python and JavaScript. The SDK is built on top of Responses and uses the API's native tool-call loop as its execution primitive. An Agent in the SDK is a thin wrapper around a Responses call with a system instruction, a list of tools, and optional handoffs to other agents.[1][7]
A minimal Agents SDK script in Python looks like this:
```python
from agents import Agent, Runner, WebSearchTool

researcher = Agent(
    name="Researcher",
    instructions="Use web search to gather facts.",
    tools=[WebSearchTool()],
)

result = Runner.run_sync(researcher, "Latest funding rounds in robotics startups.")
print(result.final_output)
```
The SDK handles the tool-call loop, retries, tracing, and handoffs between agents. Because it sits on top of Responses, anything the SDK can do is also possible with raw Responses calls if a developer wants finer control.
The Responses API has matured quickly but still has gaps developers should plan around. Computer use is gated to higher usage tiers and remains a research preview, with OpenAI itself noting it is not yet reliable for general OS automation. Some hosted tools, including image generation and code interpreter, are not yet available in every Azure region; Microsoft's Azure OpenAI service rolled them out behind a separate availability matrix. The hosted tools are tightly coupled to OpenAI's ecosystem, so an application that relies heavily on the built-in web search or file search cannot easily swap to another model provider without rewriting that orchestration. The Responses schema is more expressive than Chat Completions, but that same expressiveness has slowed adoption by third-party libraries that originally targeted the Chat Completions shape.[7][8][13]
OpenAI has also been criticized for the way Responses handles reasoning. The API does not return raw chain-of-thought text by default. Instead it returns a summary or, for Zero Data Retention customers, an encrypted blob that can be passed back in subsequent calls. Some developers and researchers have argued that this design exists partly to keep OpenAI's reasoning traces out of competitors' training data. Anthropic's API, by comparison, returns the model's full thinking when the developer asks for it, which makes debugging and analysis easier at the cost of more exposed surface area.[16]
Reception of the Responses API has been broadly positive among developers building agents. Coverage in TechCrunch, VentureBeat, and InfoQ described the launch as a long-overdue consolidation of OpenAI's previously fragmented agent stack, with the Agents SDK and computer use rounding out the picture. The May 2025 update was widely covered as the moment when Responses reached real feature parity with Assistants, and the August 2025 deprecation announcement confirmed Responses as the long-term direction.[7][8][4][14]
Criticism has clustered around three themes. First, the cost of hosted tools, especially the $25 per 1,000 web search calls on preview models, prompted developers to compare Responses unfavorably with rolling their own retrieval pipelines. Second, the lack of cross-provider portability, since Responses is OpenAI-only while Chat Completions has near-universal third-party support. Third, the secrecy around reasoning tokens, which makes it harder for outside researchers to audit how the models behave during agentic loops.[11][13][16]