The OpenAI Responses API is the agent-oriented HTTP API that OpenAI recommends for building agentic applications. A single endpoint, POST /v1/responses, returns a stored response object that can carry text, images, function calls, and the results of OpenAI-hosted tools. OpenAI introduced the API on March 11, 2025 alongside the Agents SDK, then expanded it on May 21, 2025 with remote MCP support, image generation, code interpreter, and several enterprise features. After Responses reached feature parity with the older Assistants API, OpenAI announced an Assistants beta deprecation on August 26, 2025 with a sunset on August 26, 2026, naming Responses and the related Conversations API as the long-term replacements.[1][2][3][4]
The Responses API is OpenAI's attempt to fold the simplicity of the Chat Completions API together with the agentic capabilities that previously lived in the Assistants stack. A single request can call multiple built-in tools, preserve reasoning state across turns, and return polymorphic output items in the order they happened. Where Chat Completions returns one message, Responses returns a list of typed items, including message, reasoning, function_call, web_search_call, file_search_call, code_interpreter_call, and image_generation_call entries that document each step the model took.[5][3]
Responses are stored by default. Every call returns a response.id that the caller can pass into the next request as previous_response_id, which lets OpenAI manage conversation history server-side rather than forcing the client to resend the full transcript on every turn. A separate Conversations API (added in August 2025) extends this with a long-running Conversation object that holds messages, tool calls, and tool outputs across sessions. The combination gives developers a stateful interface that does not depend on client-side history management.[5][6]
OpenAI announced the Responses API on March 11, 2025, together with the open-source Agents SDK and a set of hosted tools aimed at agent builders. At launch the API supported three built-in tools: web search, file search, and computer use. OpenAI described the release as a way to give developers "the same building blocks we use internally to ship products like Operator and Deep Research" without forcing them to wire those tools together by hand.[1][7]
Olivier Godement, OpenAI's head of product for the API, framed the launch around a problem his team had observed in 2024: "It's pretty easy to demo your agent. To scale an agent is pretty hard, and to get people to use it often is very hard." The Responses API was meant to lower the engineering cost of getting from prototype to production by handling the orchestration loop on the server.[7]
The built-in web search tool reused the infrastructure behind ChatGPT search and was available through gpt-4o-search-preview and gpt-4o-mini-search-preview. OpenAI reported that these search-tuned variants reached 90% and 88% accuracy on the SimpleQA factuality benchmark, compared with 63% for GPT-4.5. The computer use tool was powered by a new computer-use-preview model, the same Computer-Using Agent that powered OpenAI's Operator product. OpenAI noted that computer use was still a research preview, gated to usage tiers 3 through 5, and acknowledged that it was "not yet highly reliable for automating tasks on operating systems." The model scored 38.1% on the OSWorld benchmark at release.[1][7][8]
On May 21, 2025 OpenAI shipped a major update that added remote MCP server support, native image generation, code interpreter, file search upgrades, background mode, reasoning summaries, and encrypted reasoning items.[2]
| Addition | What OpenAI added |
|---|---|
| Remote MCP support | Connect to any remote Model Context Protocol server with a few lines of code; OpenAI joined the MCP steering committee at the same time |
| Image generation | gpt-image-1 available as a tool inside Responses, with streaming previews and multi-turn refinement |
| Code interpreter | Hosted Python sandbox available as a tool, the same execution environment used by ChatGPT |
| File search upgrades | Multi-vector-store search and metadata attribute filtering |
| Background mode | Asynchronous long-running tasks for jobs that exceed the synchronous timeout |
| Reasoning summaries | Natural-language summaries of the model's hidden chain of thought |
| Encrypted reasoning items | Reusable reasoning state for Zero Data Retention customers, so reasoning can persist across calls without OpenAI storing it |
The May update also unlocked tool use inside the chain of thought for OpenAI o3 and OpenAI o4-mini. Earlier reasoning models could call tools only after finishing their reasoning step. With the Responses API, o3 and o4-mini can interleave tool calls with their internal thinking, which OpenAI claimed produced better results on tasks where the model needs intermediate evidence to reason well.[2]
A Responses request is sent as POST /v1/responses. The minimum payload is a model and an input. The input field accepts either a plain string or a list of typed content items including input_text, input_image, input_file, and input_audio. The response object contains id, output (the list of items the model produced), output_text (a convenience accessor), usage, and metadata about any tools that were invoked.[5]
| Parameter | Purpose |
|---|---|
| model | The model that will generate the response, for example gpt-5 or gpt-4.1. |
| input | The prompt. Accepts a string or a list of typed input items. |
| instructions | A system-level instruction inserted ahead of input. |
| tools | A list of tools the model can call: built-in tools (web_search, file_search, computer_use_preview, code_interpreter, image_generation, mcp) and user-defined function tools. |
| tool_choice | Controls whether the model may, must, or must not call tools. |
| previous_response_id | The id of an earlier response to chain on top of. |
| conversation | An identifier for a Conversations API thread. |
| store | Whether the response should be stored on OpenAI's servers (defaults to true). |
| stream | If true, the API streams typed events. |
| background | If true, the response runs asynchronously and can be polled or cancelled. |
| reasoning | Configuration for reasoning models, including effort and summary preferences. |
| max_output_tokens | Cap on output tokens. |
| temperature, top_p | Standard sampling controls. |
| response_format / text | Structured output configuration, including JSON Schema. |
A basic Python example using the official SDK looks like this:
```python
from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Summarize the latest news about the Mars Sample Return mission.",
    tools=[{"type": "web_search"}],
)
print(response.output_text)
```
The equivalent JavaScript call uses the same shape:
```javascript
import OpenAI from "openai";

const client = new OpenAI();

const response = await client.responses.create({
  model: "gpt-5",
  input: "Summarize the latest news about the Mars Sample Return mission.",
  tools: [{ type: "web_search" }],
});
console.log(response.output_text);
```
A conversation can be continued by passing the prior response id:
```python
follow_up = client.responses.create(
    model="gpt-5",
    input="What are the funding risks for the mission?",
    previous_response_id=response.id,
)
```
The API also supports streaming over server-sent events. When stream=true is set, the server emits semantic events such as response.output_text.delta, response.function_call_arguments.delta, and response.completed, which are richer than the raw token deltas emitted by Chat Completions.[5]
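A client folds these semantic events back into final text with a small reducer. The sketch below is illustrative: the Event dataclass and sample list stand in for the event objects the SDK yields when stream=True, and only the event type names come from the list above.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Stand-in for an SDK streaming event: a type name plus an optional delta."""
    type: str
    delta: str = ""

def accumulate(events):
    """Collect output_text deltas until the response.completed event arrives."""
    chunks = []
    for event in events:
        if event.type == "response.output_text.delta":
            chunks.append(event.delta)
        elif event.type == "response.completed":
            break
    return "".join(chunks)

sample = [
    Event("response.output_text.delta", "Hello, "),
    Event("response.output_text.delta", "world."),
    Event("response.completed"),
]
print(accumulate(sample))  # → Hello, world.
```

In a real client the same dispatch would sit inside `for event in stream:` over the SDK's stream object, with extra branches for function-call argument deltas and error events.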
One of the main reasons developers choose the Responses API over Chat Completions is the set of OpenAI-hosted tools that run server-side, so the model can invoke them inside a single request without round-tripping through the developer's backend.
| Tool | What it does | Status |
|---|---|---|
| Web search | Live web retrieval powered by ChatGPT Search infrastructure, returns cited sources | Generally available |
| File search | Vector-store-backed RAG with multi-store search, metadata filters, and custom ranking | Generally available |
| Computer use | Browser and desktop automation through the computer-use-preview model | Research preview, tiers 3 to 5 |
| Code interpreter | Hosted Python sandbox with file upload and persistent containers | Generally available |
| Image generation | Native gpt-image-1 generation and editing with streaming previews | Generally available |
| Remote MCP | Connects to any external MCP server to expose third-party tools | Generally available |
| Custom function calling | Developer-defined JSON-schema functions, the same model used in Chat Completions | Generally available |
The hosted tools can be combined freely. A model can run a web search, feed the result into code interpreter, then call a custom function the developer defined, and finally write a response. OpenAI describes this as the API being "agentic by default."[2][5]
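Because the output is a list of typed items rather than a single message, a caller handles such a multi-tool run by dispatching on item type. A minimal sketch, using plain dicts as stand-ins for the item objects the SDK returns:

```python
def summarize_output(items):
    """Walk a polymorphic Responses output list and describe each step taken."""
    steps = []
    for item in items:
        if item["type"] == "web_search_call":
            steps.append("searched the web")
        elif item["type"] == "code_interpreter_call":
            steps.append("ran code")
        elif item["type"] == "function_call":
            steps.append(f"called {item['name']}")
        elif item["type"] == "message":
            steps.append("wrote the reply")
    return steps

# An illustrative run matching the sequence described above.
items = [
    {"type": "web_search_call"},
    {"type": "code_interpreter_call"},
    {"type": "function_call", "name": "get_price"},
    {"type": "message"},
]
print(summarize_output(items))
```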
The Responses API supports OpenAI's flagship text and reasoning models. Reasoning models pick up extra benefits because the API preserves their chain of thought across turns.[2][9]
| Model family | Examples | Notes |
|---|---|---|
| GPT-4.1 | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano | Strong general-purpose models, full tool support |
| GPT-4o | gpt-4o, gpt-4o-mini, gpt-4o-audio | Multimodal text plus image; search variants for web search |
| o-series reasoning | o1, o1-mini, o3, o3-mini, o4-mini | Tool calls inside chain of thought, reasoning summaries |
| GPT-5 | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, gpt-5.2, gpt-5.4-mini, gpt-5.4-nano, gpt-5.5, gpt-5.5-pro | OpenAI recommends Responses for GPT-5 to preserve reasoning state |
| Codex variants | gpt-5.2-codex, gpt-5.3-codex | Tuned for agentic coding tasks |
| Computer use | computer-use-preview | Only available through Responses, not Chat Completions |
| Deep research | o3-deep-research, o4-mini-deep-research | Long-running research tasks, work well with background mode |
OpenAI states that GPT-5 specifically benefits from Responses because the API preserves hidden reasoning tokens across calls. The company has reported that GPT-5 scores roughly 5% higher on the TAUBench agent benchmark when used through Responses versus Chat Completions, an improvement OpenAI attributes to that reasoning preservation.[6][9]
With previous_response_id, OpenAI prepends the prior input and output items, including reasoning items from o-series models, before generating the next response. This is the recommended pattern for reasoning models because it preserves the chain of thought without the client having to handle it.[6][10]
The Conversations API is the alternative for longer-running threads. A conversation is a named container of items, and a Responses request can take a conversation id instead of (or in addition to) previous_response_id. New input and output items are appended to the conversation automatically.[5]
For sensitive deployments, OpenAI added encrypted reasoning items in May 2025. Reasoning content can be returned in encrypted form, stored on the client, and replayed on the next request without OpenAI ever retaining the plaintext. This is what lets organizations on Zero Data Retention contracts still benefit from reasoning continuity.[2]
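A hedged sketch of what such a Zero Data Retention request body might look like, assuming the include value (reasoning.encrypted_content) from OpenAI's documentation; with storage disabled, the encrypted reasoning item is returned in the output for the client to hold and replay on the next call:

```json
{
  "model": "o4-mini",
  "input": "Continue the analysis from the previous step.",
  "store": false,
  "include": ["reasoning.encrypted_content"]
}
```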
Background mode (also added in May 2025) is for long-running agent runs. The client sets background=true, gets a response id immediately, and can poll the response or cancel it. This is the recommended approach for deep research workflows or long computer-use sessions, because a single response can take minutes to produce and a regular HTTP request would time out.[2]
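The create-then-poll loop can be sketched as follows. Here retrieve is a stub standing in for client.responses.retrieve, and the status names are assumptions based on the lifecycle described above rather than an exhaustive list:

```python
import time

def poll_until_done(retrieve, response_id, interval=0.0):
    """Poll a background response until it leaves the queued/in_progress states."""
    terminal = {"completed", "failed", "cancelled", "incomplete"}
    while True:
        response = retrieve(response_id)
        if response["status"] in terminal:
            return response
        time.sleep(interval)

# Simulated server: queued, then in_progress, then completed.
states = iter(["queued", "in_progress", "completed"])
fake_retrieve = lambda rid: {"id": rid, "status": next(states)}
print(poll_until_done(fake_retrieve, "resp_123")["status"])  # → completed
```

A production loop would use a longer interval with backoff, and would call the cancel endpoint instead of abandoning the poll when the result is no longer needed.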
The Responses API supports structured outputs through the response_format (or text.format) parameter. By passing a JSON Schema with strict: true, the developer guarantees that the response conforms to the schema; the model is constrained at decoding time so the JSON cannot be malformed. Structured outputs work with both regular chat models and reasoning models, and they also constrain the arguments emitted for function tools.[5]
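The guarantee is easiest to see from the consumer side. A sketch with an illustrative strict schema and an invented sample reply; because decoding is constrained to the schema, the parse step cannot fail and the required keys are always present:

```python
import json

# Illustrative strict schema of the kind passed via the structured-output
# parameter; the field names are invented for this example.
schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "risk_level": {"type": "string", "enum": ["low", "medium", "high"]},
    },
    "required": ["title", "risk_level"],
    "additionalProperties": False,
}

# With strict mode, every reply parses cleanly and matches the schema exactly.
reply = '{"title": "Mars Sample Return", "risk_level": "high"}'
parsed = json.loads(reply)
assert set(parsed) == set(schema["required"])
print(parsed["risk_level"])  # → high
```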
Function calling itself is unchanged in spirit from Chat Completions: declare functions as tools, get back arguments, run the function, post the result. The wire format uses typed items rather than a tool_calls list, which makes mixed sequences of model text, function calls, web searches, and code execution easier to reason about as a single output stream.[5][10]
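The round trip can be sketched as a small dispatcher over typed items. The item shapes and the get_price function are illustrative stand-ins; a real loop would send the resulting function_call_output items back as the next request's input:

```python
import json

def handle_function_calls(output_items, registry):
    """Run each function_call item and build the matching function_call_output items."""
    outputs = []
    for item in output_items:
        if item["type"] == "function_call":
            fn = registry[item["name"]]
            args = json.loads(item["arguments"])
            outputs.append({
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps(fn(**args)),
            })
    return outputs

# Illustrative registry and model output.
registry = {"get_price": lambda symbol: {"symbol": symbol, "price": 101.5}}
model_output = [{
    "type": "function_call",
    "call_id": "call_1",
    "name": "get_price",
    "arguments": '{"symbol": "ACME"}',
}]
print(handle_function_calls(model_output, registry))
```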
Responses API requests are billed at the standard per-token price for whichever model is invoked, plus separate per-call fees for hosted tools. The May 2025 release post itemized the tool charges.[2][11]
| Tool or feature | Price noted by OpenAI |
|---|---|
| Web search (preview, non-reasoning) | $25.00 per 1,000 calls |
| Web search (preview, reasoning) | $10.00 per 1,000 calls |
| File search | $0.10 per GB of vector storage per day, plus $2.50 per 1,000 tool calls |
| Code interpreter | $0.03 per container session |
| Image generation | $5.00 per 1M text input tokens, $10.00 per 1M image input tokens, $40.00 per 1M image output tokens |
| Computer use (computer-use-preview model) | $3.00 per 1M input tokens, $12.00 per 1M output tokens |
| Remote MCP tool | No extra tool fee beyond standard token billing for the hosting model |
Reasoning tokens generated by o-series and GPT-5 thinking models count as output tokens, so a single Responses call against a reasoning model can consume far more output tokens than the visible reply suggests. Web search billing has two parts: the per-call fee shown above plus token charges for the search content the model ingests, which catches some developers off guard.[11][12]
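The two-part web-search bill is simple arithmetic. A sketch using the $25-per-1,000-calls fee from the table above and an assumed token price of $2.50 per million input tokens (actual token prices vary by model):

```python
def web_search_cost(calls, ingested_tokens, per_1k_calls=25.00,
                    input_price_per_1m=2.50):
    """Estimate web-search cost: per-call tool fee plus ingested-content tokens."""
    tool_fee = calls / 1_000 * per_1k_calls
    token_fee = ingested_tokens / 1_000_000 * input_price_per_1m
    return round(tool_fee + token_fee, 2)

# 10,000 searches, each pulling roughly 2,000 tokens of page content.
print(web_search_cost(10_000, 20_000_000))  # → 300.0 (tool fee dominates)
```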
OpenAI now runs three primary text-generation APIs. Responses is the recommended path for new projects, Chat Completions remains supported indefinitely, and Assistants is still available until its sunset.[3][6]
| Feature | Chat Completions | Responses | Assistants (legacy) |
|---|---|---|---|
| Endpoint | /v1/chat/completions | /v1/responses | /v1/threads, /v1/assistants |
| State | Stateless, client manages history | Optional server state via previous_response_id and Conversations | Server-managed Threads |
| Built-in tools | None (function calling only) | Web search, file search, computer use, code interpreter, image gen, MCP | Code interpreter, file search, function calling |
| Streaming | Token deltas | Semantic event stream | Run-event stream |
| Reasoning preservation | No | Yes, including encrypted reasoning items | Limited |
| Polymorphic output | Single message | List of typed items | Run steps |
| Status | Supported indefinitely | Recommended for new builds | Beta, sunsets August 26, 2026 |
OpenAI's official guidance is that Chat Completions is still the right choice for simple text generation and for code that needs to remain portable across providers, since the Chat Completions schema is implemented by Azure, most open-source inference servers, and OpenAI-compatible endpoints from other model providers. Responses is the right choice when an application needs hosted tools, reasoning preservation, server-managed conversation state, or the ability to combine multiple tools in a single agentic loop. The Realtime API is a separate surface for low-latency voice and live multimodal sessions over WebSocket and is not replaced by Responses.[3][6]
Responses is OpenAI's answer to a wave of agent-oriented APIs that other model providers have shipped over the same period.[13][14]
| API | Provider | State | Built-in tools | Notes |
|---|---|---|---|---|
| Responses | OpenAI | Server-side via Conversations | Web search, file search, computer use, code interpreter, image gen, MCP | Polymorphic output, encrypted reasoning |
| Messages API | Anthropic | Client-managed | Computer use, code execution, web search, MCP connectors | Extended thinking, prompt caching, top_k |
| Gemini API | Google | Client-managed, optional File API | Google Search grounding, code execution, function calling | Multimodal native, large context |
| Mistral Agents API | Mistral | Server-side agents | Web search, code interpreter, image gen, document library, MCP | Agent objects similar to Assistants |
| Bedrock Agents | AWS | Server-managed sessions | Knowledge bases, action groups, code interpreter | Multi-model, runs across Anthropic, Mistral, Meta, Amazon Nova |
The Responses API and the Anthropic Messages API are the closest competitors. Both expose computer use, web search, and MCP as first-class tools, and both ship official SDKs for Python and Node. The biggest practical differences are that Anthropic exposes its full chain of thought verbatim while OpenAI summarizes it, and Anthropic exposes prompt caching as a developer-controlled feature while OpenAI's caching is implicit. Anthropic's API also keeps system prompts in a separate system field rather than embedding them in the message array.[13][14]
Migrating from Chat Completions is mostly a matter of renaming messages to input and accepting a different output shape. A Chat Completions message list translates almost directly: each message becomes an input item with the same role and content. The biggest change for callers is reading response.output_text instead of response.choices[0].message.content, and looking through response.output to handle tool calls, since tool calls are a separate item type rather than a field on a message.[3]
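The mechanical part of that migration can be sketched as a payload transform. The helper below is illustrative, not an official shim: it maps the message list onto input items and hoists the system message into instructions.

```python
def to_responses_payload(chat_payload):
    """Translate a Chat Completions request body into a Responses request body."""
    instructions = None
    input_items = []
    for msg in chat_payload["messages"]:
        if msg["role"] == "system":
            instructions = msg["content"]  # system prompt moves to instructions
        else:
            input_items.append({"role": msg["role"], "content": msg["content"]})
    payload = {"model": chat_payload["model"], "input": input_items}
    if instructions is not None:
        payload["instructions"] = instructions
    return payload

chat = {
    "model": "gpt-5",
    "messages": [
        {"role": "system", "content": "Be concise."},
        {"role": "user", "content": "Hello"},
    ],
}
print(to_responses_payload(chat))
```

The reading side changes symmetrically: response.output_text replaces response.choices[0].message.content, and tool calls arrive as their own items in response.output.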
Migrating from Assistants is more involved because Assistants used persistent Assistant, Thread, Message, and Run objects, while Responses replaces them with Response and Conversation objects plus reusable Prompts. OpenAI published a side-by-side migration guide that maps each Assistants concept to its Responses equivalent: Threads become Conversations, Assistants become Prompts (versioned in the dashboard rather than created via the API), and Runs collapse into a single responses.create call that may invoke many tools. Vector stores carry over directly because both APIs use the same vector store backend for file search.[3][15]
OpenAI announced the Assistants API beta deprecation on August 26, 2025 with a sunset date of August 26, 2026. The company gave developers a full year to migrate. After the sunset, the /v1/assistants and /v1/threads endpoints will return errors, and existing Assistants will need to be reimplemented as Prompts and Conversations on the Responses API. OpenAI explicitly framed the move as the result of Responses reaching parity with everything Assistants offered, including code interpreter and persistent conversation state, plus new capabilities such as remote MCP, computer use, and reasoning preservation that Assistants never supported.[15][4]
The Agents SDK is OpenAI's open-source orchestration framework for multi-agent workflows. It launched on March 11, 2025 alongside the Responses API and is available for both Python and JavaScript. The SDK is built on top of Responses and uses the API's native tool-call loop as its execution primitive. An Agent in the SDK is a thin wrapper around a Responses call with a system instruction, a list of tools, and optional handoffs to other agents.[1][7]
A minimal Agents SDK script in Python looks like this:
```python
from agents import Agent, Runner, WebSearchTool

researcher = Agent(
    name="Researcher",
    instructions="Use web search to gather facts.",
    tools=[WebSearchTool()],
)

result = Runner.run_sync(researcher, "Latest funding rounds in robotics startups.")
print(result.final_output)
```
The SDK handles the tool-call loop, retries, tracing, and handoffs between agents. Because it sits on top of Responses, anything the SDK can do is also possible with raw Responses calls if a developer wants finer control.
The Responses API has matured quickly but still has gaps developers should plan around. Computer use is gated to higher usage tiers and remains a research preview, with OpenAI itself noting it is not yet reliable for general OS automation. Some hosted tools, including image generation and code interpreter, are not yet available in every Azure region; Microsoft's Azure OpenAI service rolled them out behind a separate availability matrix. The hosted tools are tightly coupled to OpenAI's ecosystem, so an application that relies heavily on the built-in web search or file search cannot easily swap to another model provider without rewriting that orchestration. The Responses schema is more expressive than Chat Completions, but that same expressiveness has slowed adoption by third-party libraries that originally targeted the Chat Completions shape.[7][8][13]
OpenAI has also been criticized for the way Responses handles reasoning. The API does not return raw chain-of-thought text by default. Instead it returns a summary or, for Zero Data Retention customers, an encrypted blob that can be passed back in subsequent calls. Some developers and researchers have argued that this design exists partly to keep OpenAI's reasoning traces out of competitors' training data. Anthropic's API, by comparison, returns the model's full thinking when the developer asks for it, which makes debugging and analysis easier at the cost of more exposed surface area.[16]
Reception of the Responses API has been broadly positive among developers building agents. Coverage in TechCrunch, VentureBeat, and InfoQ described the launch as a long-overdue consolidation of OpenAI's previously fragmented agent stack, with the Agents SDK and computer use rounding out the picture. The May 2025 update was widely covered as the moment when Responses reached real feature parity with Assistants, and the August 2025 deprecation announcement confirmed Responses as the long-term direction.[7][8][4][14]
Criticism has clustered around three themes. First, the cost of hosted tools, especially the $25 per 1,000 web search calls on preview models, prompted developers to compare Responses unfavorably with rolling their own retrieval pipelines. Second, the lack of cross-provider portability, since Responses is OpenAI-only while Chat Completions has near-universal third-party support. Third, the secrecy around reasoning tokens, which makes it harder for outside researchers to audit how the models behave during agentic loops.[11][13][16]