OpenAI Assistants API

28 min read

Updated Jul 23, 2026

The OpenAI Assistants API is a stateful, server-managed application programming interface that OpenAI introduced on November 6, 2023, at its first DevDay developer conference, letting developers build agent-like assistants that retain conversation history and call hosted tools without managing that infrastructure themselves.^[1]^[2] It was OpenAI's earliest "agentic" interface, exposing four primary object types (Assistants, Threads, Messages, and Runs) plus three first-party tools (Code Interpreter, Retrieval, and Function calling).^[1]^[2] The Assistants API is being deprecated in favor of the newer Responses API: OpenAI announced the Responses replacement on March 11, 2025, published a formal deprecation notice on August 26, 2025, and scheduled the Assistants API beta to sunset exactly one year later, on August 26, 2026.^[3]^[4]^[5] It never left beta during its roughly three-year lifetime.

Infobox

Field	Value
Type	Stateful HTTP API for building AI assistants and AI agents
Creator	OpenAI
Initial release	November 6, 2023 (v1 beta)
Major revision	April 17, 2024 (v2 beta)
Deprecation announced	August 26, 2025
Scheduled sunset	August 26, 2026
Successor	OpenAI Responses API plus OpenAI Agents SDK
Core abstractions	Assistant, Thread, Message, Run, Run Step, Vector Store (v2)
Hosted tools	Code Interpreter, Retrieval / File Search, Function calling
Status as of mid-2026	Beta, deprecated, approaching the August 26, 2026 sunset

What is the OpenAI Assistants API?

The Assistants API is a hosted runtime for building AI assistants on top of OpenAI's models. Rather than re-sending the full conversation on every request (the pattern of the stateless Chat Completions endpoint), a developer creates a persistent Assistant (a saved configuration of model, instructions, and tools), opens a Thread (a server-stored conversation), appends Messages, and triggers a Run that executes the Assistant against the Thread.^[2]^[6] OpenAI stores the Thread and its history on its own servers, so the client does not have to manage conversation memory, a vector database, or a sandbox for code execution.

The API was the first time a frontier model lab shipped a packaged server-side runtime for the agent loop. It combined four resource types and three hosted tools into a single offering, which several commentators read as a first-party response to client-side agent frameworks such as LangChain and LlamaIndex.^[1]^[19]^[23] It also served as the developer-facing counterpart to the consumer GPTs product announced the same day: both shared the abstractions of instructions, model, and tools, but GPTs lived inside the ChatGPT surface while the Assistants API exposed the same primitives to developers' own applications.^[19]

How does the Assistants API work?

The Assistants API exposed a small set of REST endpoints under /v1/assistants, /v1/threads, and (in v2) /v1/vector_stores. The objects formed a directed hierarchy:

Object	Lifetime	Purpose
Assistant	Persistent, account-scoped	Stores a model selection, system instructions, attached tools, and (in v2) attached Vector Stores. Reusable across many Threads and users.^[2]^[6]
Thread	Persistent, account-scoped	Holds an ordered list of Messages for one conversation. Independent of any Assistant; the same Thread can be run against multiple Assistants.^[2]^[6]
Message	Persistent, child of Thread	A `user` or `assistant` content payload (text, file attachments, images). Messages are append-only within a Thread.^[6]^[11]
Run	Persistent, child of Thread	One execution of an Assistant on a Thread. Carries a status, token usage, and a list of Run Steps.^[8]^[11]
Run Step	Persistent, child of Run	A single step of the agent loop: a message creation, a tool call, or a tool output.^[11]
Vector Store (v2)	Persistent, account-scoped	An auto-chunked, auto-embedded collection of files that the `file_search` tool can query.^[9]^[10]

The Run object's state machine drove the agent loop. After a client called POST /v1/threads/{thread_id}/runs, the Run started in queued, advanced to in_progress, and ended in one of completed, failed, cancelled, expired, or requires_action.^[8] The requires_action state handed control back to the client for function calling: when the model emitted one or more tool_calls for developer-defined functions, the Run paused and waited for the client to submit tool outputs via POST /v1/threads/{thread_id}/runs/{run_id}/submit_tool_outputs.^[8]^[16] In OpenAI's own summary of the flow, developers created an Assistant, dropped messages into a Thread, fired a Run, polled until it completed, and retrieved the output.^[2]^[8]

Because Threads were server-managed, the Assistants API was not idempotent in the way a stateless Chat Completions call was: while a Run was in a non-terminal state the Thread was locked, and no new Messages could be appended and no new Runs could be created against it.^[8] Clients either polled the Run endpoint until a terminal state was reached or, after the v2 update added streaming, subscribed to a server-sent event stream that emitted thread.run.created, thread.message.delta, thread.run.requires_action, and similar events.^[10]^[11]

What tools does the Assistants API provide?

Three tool types were available across the API's lifetime, and they could be mixed within a single Assistant:

Code Interpreter ran Python in an OpenAI-managed sandbox with internet access disabled and a per-session file system. Sessions were billed at a flat fee per hour of activity on a given Thread, regardless of how many invocations occurred inside that hour.^[1]^[17] Code Interpreter could read attached files, write generated files (charts, CSVs, images) into the session, and emit them as Message attachments. OpenAI described the tool as one that "writes and runs Python code in a sandboxed execution environment" and "can generate graphs and charts, and process files with diverse data and formatting."^[2]
Retrieval / File Search indexed uploaded documents and let the model query them by semantic similarity. Under v1 the indexing pipeline was opaque and capped at twenty files per Assistant.^[7] Under v2, files were grouped into Vector Store objects that OpenAI parsed, chunked, and embedded automatically. v2 results carried inline citations identifying the source file and the byte range, and supported both vector and keyword retrieval.^[9]^[10] OpenAI positioned Retrieval as a way to augment "the assistant with knowledge from outside our models, such as proprietary domain data, product information or documents provided by your users."^[2]
Function calling let developers register JSON schemas for arbitrary functions. The model returned the function name and a JSON argument object; the client executed the function locally and posted the result back to the Run. v2 added parallel function calls, allowing the model to emit multiple tool_calls in a single requires_action event.^[10]^[16]

The v2 tool_choice parameter let callers force the use of a specific tool on a per-Run basis.^[10]

When was the Assistants API released, and what changed in v2?

Origins at DevDay 2023

OpenAI announced the Assistants API on November 6, 2023, during its first DevDay conference in San Francisco, alongside GPT-4 Turbo and the consumer-facing custom GPTs product.^[1]^[2] The release was framed by chief executive Sam Altman and developer experience lead Romain Huet as a step toward "agent-like experiences" inside third-party applications: the new API would let developers build assistants with specific instructions, persistent memory, and access to first-party tools without having to wire up conversation storage, sandboxed Python execution, or document indexing themselves.^[1]^[6]

The November 6 release shipped four resource types and three tools. The four resource types were the Assistant (a server-stored configuration of model, instructions, and tools), the Thread (a server-stored conversation), the Message (a unit of user or assistant content inside a Thread), and the Run (an execution of an Assistant against a Thread).^[2]^[6] The three first-party tools were Code Interpreter, which executed Python in an OpenAI-hosted sandbox; Retrieval, which performed embedding-based document search; and Function calling, which let the model emit structured JSON calls to developer-defined functions.^[1]^[2] The API entered beta open to all developers on the day of the announcement.^[1]

The launch positioned the Assistants API as a structural counterpart to the customer-facing GPTs product. Both shared the same underlying abstractions of instructions, model, and tools, but where GPTs lived inside the ChatGPT consumer surface and were monetised through the OpenAI-operated GPT Store, the Assistants API exposed the same primitives to developers building their own applications.^[19] OpenAI's developer-relations posts argued that this split followed a deliberate product strategy: GPTs would democratise assistant authoring for non-technical users, while the Assistants API would let businesses embed similar functionality into their own clients with their own branding, authentication, and data handling.^[19]^[2]

Coverage of the launch emphasised that the new API was not merely a wrapper over Chat Completions. Earlier in 2023, OpenAI had released function calling on the Chat Completions endpoint, and the developer community had built a substantial ecosystem of agent libraries on top of that primitive. The Assistants API was the first time the model provider itself shipped a server-side runtime for the agent loop, persistent conversation storage, and hosted retrieval as a packaged offering, which several commentators identified as a competitive response to client-side frameworks such as LangChain and LlamaIndex that had grown popular over the preceding year.^[1]^[19]^[23]

Initial limitations and developer reception

The first version of Retrieval was capped at twenty files per Assistant, with a per-file size limit of 512 megabytes and a per-Assistant storage charge.^[7] Several developer forum threads in November and December 2023 criticised the twenty-file ceiling as too small for production retrieval-augmented generation workloads, with users asking whether the limit would be lifted.^[7] Early reviews also noted that the API's asynchronous Run model required client-side polling: after creating a Run, callers had to repeatedly fetch the Run object until its status changed from queued or in_progress to a terminal state such as completed, failed, cancelled, expired, or requires_action.^[8]

Cost surprises were a recurring theme in early reviews. The v1 Retrieval tool was billed at 0.20 US dollars per gigabyte per Assistant per day, meaning that two Assistants sharing the same source corpus would each incur full storage charges; v2 later corrected this by introducing the shared Vector Store object.^[7]^[10] Code Interpreter's hourly session billing was another point of confusion: developers who assumed a single Code Interpreter invocation cost 0.03 US dollars sometimes discovered that opening parallel Threads multiplied the charge, because each Thread maintained its own sandbox session.^[17]^[18] Several early adopters concluded that the Assistants API was best suited to prototypes and internal tools where these unpredictable per-feature charges were tolerable, while large consumer-scale deployments could be cheaper to operate on the stateless Chat Completions endpoint with developer-managed retrieval and execution.^[22]

v2 release in April 2024

OpenAI released Assistants API v2 on April 17, 2024.^[9]^[10] The v2 revision was a backwards-incompatible upgrade that introduced a new top-level object, the Vector Store, and renamed the Retrieval tool to file_search.^[9]^[10] A Vector Store handled automatic parsing, chunking, and embedding for uploaded files and could be attached to either an Assistant or a Thread; the same Vector Store could be shared across multiple Assistants.^[10] The per-Assistant file ceiling rose from twenty to ten thousand files, a five-hundred-fold increase, while the per-file size limit remained at 512 megabytes and a new per-file token cap of five million tokens was imposed.^[9]^[10] OpenAI's own materials described the change as letting assistants "draw information from up to 10,000 files" via the new vector store, "500x more than before."^[9]^[11]

The v2 update also added stream events for Runs, parallel function calls (multiple tool_calls returned in a single required_action step), a tool_choice parameter, token-usage fields on completed Runs, and standard sampling parameters (temperature, top_p, plus per-Run token limits).^[10]^[11] File search results returned annotated citations identifying the source file and chunk for each retrieved span.^[10] File Search was priced at 0.10 US dollars per gigabyte of vector store storage per day, with the first gigabyte free.^[9]^[17] OpenAI announced that access to the v1 endpoints would end on December 18, 2024, after which only v2 would be available.^[11]

Several smaller v2 enhancements addressed long-running developer complaints. The introduction of tool_choice let callers force a specific tool invocation on a per-Run basis, a feature already present on Chat Completions. Vector Stores accepted optional expiration policies that automatically deleted files after a configurable interval, reducing the risk of perpetual storage charges accruing on forgotten Assistants. The v2 endpoints also added support for fine-tuned gpt-3.5-turbo-0125 variants, and later expanded to fine-tuned GPT-4o derivatives, allowing developers to combine a customised base model with the Assistants runtime.^[10]^[11] In aggregate, v2 closed many of the smaller feature gaps between the Assistants API and the simpler Chat Completions endpoint, but it preserved the asynchronous Run model and the perpetual beta label that had been criticised since launch.^[22]^[23]

Is the Assistants API deprecated?

Yes. The Assistants API is deprecated, and its endpoints are scheduled to be removed permanently on August 26, 2026.^[4]^[5] After that date, requests to /v1/assistants, /v1/threads, and related endpoints are expected to fail. The deprecation followed a roughly eighteen-month period of production experience and the introduction of a cleaner successor, the Responses API.

The Responses API successor

On March 11, 2025, OpenAI introduced the Responses API together with the open-source Agents SDK as part of a launch titled "New tools for building agents".^[3]^[12] The Responses API was positioned as a unification of the strengths of the older Chat Completions endpoint and the Assistants API: it supports server-side conversation state, hosted tools, and an item-based event stream, but with a flatter object model.^[3]^[4]^[13] OpenAI described it as "our recommended path to integrate with the OpenAI API today, and for the future," and reported that the "Responses API has already overtaken Chat Completions in token activity."^[4] The March 11 announcement stated that OpenAI intended to achieve feature parity in the Responses API and then formally deprecate the Assistants API, with a sunset target in mid-2026 and a twelve-month migration window after the formal deprecation notice.^[3]^[14]

The August 2025 deprecation notice

That formal notice arrived on August 26, 2025. OpenAI told developers: "We're winding down the Assistants API beta. It will sunset one year from now, August 26, 2026."^[4] The same day the company published an "Assistants migration guide" describing how to translate Assistants (which become Prompts, a dashboard-only configuration object), Threads (which become Conversations), Runs (which become Responses), and Run Steps (which become Items) into the Responses API and the companion Conversations API.^[4]^[15] As of mid-2026 the Assistants API remains operational but is in legacy support, approaching the scheduled shutdown.^[5] The Azure OpenAI Service mirror of the Assistants API is on the same retirement schedule, also slated for full removal on August 26, 2026.^[4]^[21]

The object-to-object mapping that the migration guide defines is summarised below:

Assistants API concept	Responses API equivalent
Assistant	Prompt (dashboard-only configuration object)^[4]^[15]
Thread	Conversation (via the Conversations API)^[4]^[15]
Run	Response^[4]^[15]
Run Step	Item^[4]^[15]

How does the Assistants API handle pricing and streaming?

Pricing

The Assistants API used a layered pricing model. Token usage on the underlying model (for example GPT-4 Turbo or, later, GPT-4o) was billed at the standard per-token rate.^[1]^[17] On top of that, the hosted tools carried infrastructure surcharges. Code Interpreter cost 0.03 US dollars per session, with a session defined as up to one hour of activity on a single Thread; concurrent sessions on different Threads were billed independently.^[17]^[18] File Search in v2 was billed at 0.10 US dollars per gigabyte of Vector Store storage per day, with the first gigabyte free and a default project storage cap of 100 gigabytes; v1 Retrieval had been charged at 0.20 US dollars per gigabyte per Assistant per day.^[7]^[17] Function calling itself carried no separate fee, only the model token cost.

Streaming

Streaming was not available at the November 2023 launch. OpenAI added stream events in the v2 update, emitting incremental deltas for Message content (thread.message.delta), Run status changes (thread.run.in_progress, thread.run.completed), and tool invocations (thread.run.step.created, thread.run.requires_action).^[11]^[10] Token-usage statistics on a Run were populated only after the Run reached a terminal state, which complicated billing instrumentation for long-running streamed Runs.^[11]

Streaming did not eliminate the underlying state-machine complexity of the Assistants API. A streamed Run still progressed through the same set of Run Step events as a polled Run; the stream simply delivered them as server-sent events rather than requiring repeated GET calls. Tool execution still required the client to detect a thread.run.requires_action event, locally execute the listed tool_calls, and reopen a connection with submit_tool_outputs to resume the Run.^[11]^[16] This made the streaming developer experience richer than polling but did not simplify the orchestration logic that any agent-style application had to implement.

SDK helpers

OpenAI shipped first-party Python and Node.js SDK helpers that wrapped the polling and streaming patterns. The Python helper client.beta.threads.runs.create_and_poll ran a Run to completion and returned the final Run object, while client.beta.threads.runs.stream returned an event-handler interface that fired callbacks for each Run Step.^[8] Similar helpers existed for Vector Store creation, including a method that uploaded a directory of files and polled until indexing was complete.^[10]^[11] These helpers reduced boilerplate but did not change the underlying HTTP surface; clients in other languages still had to implement the polling or streaming loops themselves.

Who used the Assistants API?

The Assistants API saw broad if shallow adoption. Coverage of DevDay 2023 reported that OpenAI made the beta available to "all developers" on the day of announcement, and use cases highlighted in the launch materials ranged from a natural-language data-analysis app to a coding assistant to an AI vacation planner.^[1]^[2] Subsequent third-party guides documented production deployments at customer-support shops, financial-services firms, and internal enterprise assistants, often in hybrid configurations alongside the consumer-facing Custom GPTs product.^[19]^[20] Through the Azure OpenAI Service, Microsoft offered a managed mirror of the Assistants API on Azure, extending its reach into regulated enterprise environments.^[21]

Independent reviewers consistently noted that the API was attractive for prototypes because it absorbed several normally tedious pieces of infrastructure (conversation storage, embedding indexes, sandboxed code execution) but became awkward at scale. A widely cited 2024 review summarised the trade-off bluntly: "the good, bad, and expensive."^[22] Reported pain points included perceived latency of four to eight seconds per turn versus one to two seconds for Chat Completions, unpredictable file-search billing on large Vector Stores, the inability to control chunking or embedding choices, and the persistent beta label.^[22]^[23]

By the time of the August 2025 deprecation announcement, OpenAI's developer-relations posts described the Responses API as having "already overtaken Chat Completions in token activity" and characterised it as the recommended path for new agent applications.^[4] The same posts encouraged Assistants API users to begin migrating, citing internally measured improvements of forty to eighty per cent in cache-hit rate under the Responses API compared with Chat Completions.^[4]^[13]

A notable adoption pattern, documented across multiple third-party integration guides, was the hybrid stack in which a single organisation ran the Assistants API for its internal-facing assistants and a fleet of consumer-facing Custom GPTs for end-user productivity. In this configuration the Assistants API typically powered chatbots inside proprietary applications that needed strict data residency, audit logging, or integration with existing identity systems, while Custom GPTs were used by employees for ad-hoc tasks in the ChatGPT product surface.^[19]^[20] As of the August 2025 deprecation, OpenAI's migration guidance treated this hybrid use as the modal case and provided distinct migration paths for each, with Assistants API users moving to the Responses API and Custom GPT authors continuing to use the existing GPT Builder interface.^[4]^[15]

How does the Assistants API compare to other agent APIs?

Versus the Chat Completions endpoint

The classic OpenAI Chat Completions endpoint was stateless: every call sent the full message list and received a single completion. The Assistants API was stateful: a Thread persisted on OpenAI servers, the model could autonomously decide to call multiple tools across multiple Run Steps, and the conversation history did not need to be retransmitted on each turn.^[2]^[6] Chat Completions returned a single response synchronously; an Assistants Run was asynchronous and required either polling or stream subscription.^[8] Chat Completions never hosted Code Interpreter or vector retrieval; those tools were introduced first on the Assistants API and later ported to the Responses API.^[3]^[13]

Versus the Anthropic Messages API

The Anthropic Messages API takes the opposite philosophical stance from the Assistants API on conversation state. Anthropic's Messages endpoint is fully stateless: each call must include the entire conversation history, and the client is responsible for storing, truncating, and resubmitting it.^[24] Tool use under the Messages API follows a stop-resume pattern in which the model returns stop_reason: "tool_use" with one or more tool_use content blocks, the client executes the call locally, and the client sends back a tool_result block on the next turn; there is no server-managed Run object.^[24] Server-side tools introduced later by Anthropic (web search, code execution, computer use, and others) run on Anthropic infrastructure but still flow through the same stateless Messages envelope.^[24] In 2025, Anthropic introduced a Managed Agents offering for stateful agent execution, more directly analogous to the Assistants and Responses APIs, while continuing to ship the Messages API itself as a stateless primitive.^[25]

Versus custom agent frameworks

Before and during the lifetime of the Assistants API, several open-source frameworks offered comparable abstractions for conversation memory, tool orchestration, and RAG pipelines, but as client-side libraries rather than server-side endpoints. LangChain provided chains, agents, memory classes, and document loaders that could be assembled into agent loops over any model provider, while LlamaIndex specialised in retrieval and indexing primitives for RAG applications. Both libraries could call the underlying OpenAI Chat Completions endpoint or the Assistants API, but they kept state in the developer's own process rather than on OpenAI's servers.^[23] Compared with the Assistants API, these frameworks offered finer control over chunking, embedding choice, and prompt construction at the cost of more developer code; the Assistants API offered hosted infrastructure at the cost of opacity and lock-in.^[22]^[23]

Versus the Model Context Protocol

The Model Context Protocol (MCP), introduced by Anthropic in late 2024, is not an agent API but a transport protocol that lets language models discover and call tools exposed by external servers. MCP is complementary to, rather than competitive with, the Assistants API: where the Assistants API combined a particular runtime (the Run loop) with a particular tool set (Code Interpreter, file search, functions), MCP standardises only the tool-discovery and tool-calling surface, leaving the agent runtime to the model provider.^[26] The Responses API and the OpenAI Agents SDK subsequently added MCP support as a built-in tool type, which the Assistants API never did.^[4]

Summary table

API	Provider	State	Hosted tools	Status (mid-2026)
Assistants API	OpenAI	Server-managed Threads	Code Interpreter, File Search, Functions	Beta, sunset 2026-08-26^[4]
Responses API	OpenAI	Optional server state via Conversations	Web search, File Search, Code Interpreter, Computer Use, MCP, Image	GA^[3]^[13]
Chat Completions	OpenAI	Stateless	None	GA^[3]
Messages API	Anthropic	Stateless	Web Search, Code Execution, Computer Use (server-side blocks)	GA^[24]
Managed Agents	Anthropic	Server-managed sessions	Mirrors Messages tool set	GA^[25]

What were the main limitations of the Assistants API?

The Assistants API drew several recurrent criticisms over its lifetime, many of which OpenAI itself cited as motivation for the Responses API.

Polling and locking. Without streaming, the only way to know that a Run had finished was to call GET /v1/threads/{thread_id}/runs/{run_id} in a loop. Even with the v2 streaming events, the Thread was locked while a Run was active, which prevented appending new user messages or starting a parallel Run on the same Thread.^[8]^[23] Multi-user front-ends therefore had to map each end-user to a distinct Thread and queue requests carefully.

Opaque retrieval. File Search did not expose its chunking strategy, embedding model, or top-k retrieval parameters. Developers who needed control over how documents were split or how results were ranked routinely fell back to LangChain, LlamaIndex, or a self-hosted vector database with the standard Chat Completions endpoint.^[22]^[23]

Unpredictable costs. The combination of token billing, per-session Code Interpreter charges, per-gigabyte daily Vector Store charges, and the model's own discretion over whether to call tools made budgeting difficult. Reviewers reported cases where a single user question caused the same source PDF to be reprocessed in multiple Runs, accumulating tokens disproportionate to the conversational length.^[22]^[23]

Perpetual beta. The Assistants API never reached general availability. Developer forum posts from late 2024 and early 2025 asked repeatedly whether it would ever leave beta; OpenAI's eventual answer was the Responses API, not a graduation of the existing surface.^[14]^[23]

Limited tool surface. Web search, computer use, image generation, and MCP support were not added to the Assistants API. Each of these capabilities was introduced first on the Responses API in 2025, leaving Assistants users on an effectively frozen tool set.^[3]^[12]^[4]

What lessons did OpenAI draw from the Assistants API?

In its August 2025 deprecation announcement and in subsequent developer blog posts, OpenAI framed the move from the Assistants API to the Responses API as a deliberate reset informed by a year and a half of production experience. The company described server-managed Threads as a useful prototype affordance that proved limiting in production: developers needed finer control over which context entered each model call, lower-latency turn execution, and stateless operation modes for organisations with zero-data-retention compliance requirements.^[4]^[13]

The Responses API addresses these constraints by making server state optional (turns can be chained with a previous_response_id parameter or stored explicitly via the Conversations API, but they can also be sent fully stateless), by collapsing the Assistant configuration into a Prompt object that lives only in the dashboard, by replacing the asynchronous Run with a synchronous Response that streams items as they are produced, and by extending the hosted tool set to include web search, computer use, image generation, and Model Context Protocol clients.^[3]^[4]^[13] OpenAI also reported that the Responses API allows for substantially better prompt caching, citing internal benchmarks showing forty to eighty per cent cache-hit improvements over Chat Completions on equivalent workloads.^[4]^[13]

The deprecation post stopped short of calling the Assistants API a failure. It was the first agentic API any major model provider had shipped, and many of its abstractions (server-managed conversation state, hosted code execution, hosted retrieval, structured tool calls) are now baseline expectations across the industry. The article's principal lesson, as OpenAI articulated it, was that "API design has always been guided by how the models themselves work," and that the rapid evolution of the underlying models (from GPT-4 Turbo through GPT-4o to GPT-5 and reasoning-focused successors) made the original Assistants object model too rigid to absorb new capabilities cleanly.^[13]

A second articulated lesson concerned the separation between configuration and execution. The Assistants object conflated three concerns into a single persistent resource: the model and decoding parameters, the system instructions and prompt, and the attached tools and vector stores. When any one of these needed to change (for example, swapping the underlying model to a newer reasoning variant), developers had to either mutate the Assistant in place or create a new Assistant and migrate clients to its identifier. The Responses API resolves this by treating Prompts as dashboard-managed templates that are referenced at call time, and by allowing tools, models, and instructions to be specified per-request when desired, which decouples runtime evolution from persistent identity.^[4]^[13]^[15]

A third lesson concerned the agent loop itself. Several Responses API design choices, including the unified item stream, the use of previous_response_id for turn chaining, and the move from polled Runs to streamed Responses, were framed in OpenAI's blog posts as direct responses to friction points reported by Assistants API users. The migration guide also clarifies that some Assistants API affordances, including dashboard-only prompt creation and certain conversation-truncation features, did not translate one-to-one into the Responses API; OpenAI's documented position is that these gaps reflect intentional simplifications rather than regressions.^[4]^[14]^[15]

Why does the Assistants API matter?

The Assistants API marked the moment when "agent" stopped being purely a research and open-source-framework concept and became a first-party product surface offered by a frontier model lab. By bundling a conversation store, a code sandbox, a vector retrieval system, and a function-calling interface into a single hosted offering, it lowered the activation energy for building chatbots that could browse documents, run calculations, and call external services. Many of the abstractions it introduced (persistent Threads, Runs as agent-loop executions, tool calls as discrete protocol events) recur in nearly every successor system, from the Responses API and Agents SDK to Anthropic's Managed Agents, to assorted client-side frameworks.^[3]^[13]^[25]

Even after its sunset, the Assistants API is likely to be remembered as the experiment that established server-managed agent state as a viable product category, identified its main failure modes (latency, opacity, billing complexity, beta drift), and informed the design of the cleaner APIs that replaced it.

ELI5: What is the Assistants API in simple terms?

Imagine you want to build a helper robot in an app. Normally you would have to remember the whole conversation yourself, give the robot a calculator, give it a filing cabinet of documents, and hook up any extra tools by hand. The Assistants API let OpenAI do all of that for you: you set up an "Assistant" with a personality and some tools, you start a "Thread" (the chat), you press "Run," and OpenAI's servers remember the chat, run any Python code, search your documents, and call your tools. It launched in November 2023. The catch was that it could be slow, the bills were hard to predict, and it always stayed a "beta." OpenAI is now turning it off (on August 26, 2026) and asking everyone to use a newer, simpler version called the Responses API instead.

OpenAI Responses API: the direct successor, generally available since 2025.
OpenAI Agents SDK: the open-source orchestration layer announced alongside the Responses API.
OpenAI API: the broader API platform of which Assistants was a member.
GPTs: the consumer-facing analogue announced the same day as the Assistants API.
Anthropic API: the stateless Messages API offered by Anthropic.
Anthropic Computer Use: a closely related agent capability launched by Anthropic in late 2024.
Model Context Protocol: a standard for exposing tools to language models.
Function calling: the structured-tool primitive shared across modern model APIs.
Tool use: the general capability that Assistants API tools implement.
Retrieval-Augmented Generation: the technique automated by the File Search tool.
LangChain and LlamaIndex: client-side frameworks that overlapped the Assistants API's scope.

References

^Wiggers, Kyle, "OpenAI launches API that lets developers build 'assistants' into their apps", TechCrunch, 2023-11-06. techcrunch.com/...build-assistants-into-their-apps Accessed 2026-06-26.
^OpenAI, "New models and developer products announced at DevDay", OpenAI Blog, 2023-11-06. openai.com/...veloper-products-announced-at-devday Accessed 2026-06-26.
^OpenAI, "New tools for building agents", OpenAI Blog, 2025-03-11. openai.com/...new-tools-for-building-agents Accessed 2026-06-26.
^OpenAI Developer Community, "Assistants API beta deprecation: August 26, 2026 sunset", OpenAI, 2025-08-26. community.openai.com/...1354666. Accessed 2026-06-26.
^OpenAI, "Deprecations", OpenAI Platform Documentation, 2025-08-26. developers.openai.com/...deprecations. Accessed 2026-06-26.
^HackerNoon (transcript), "OpenAI DevDay Keynote by Sam Altman and Romain Huet About Assistants API and Custom Models", HackerNoon, 2023-11-08. hackernoon.com/...ssistants-api-and-custom-models. Accessed 2026-06-26.
^OpenAI Developer Community, "The 20 file limit on assistants is not useful for large retrieval-augmented generation", OpenAI, 2023-11-20. community.openai.com/...492800. Accessed 2026-06-26.
^OpenAI, "Assistants API deep dive", OpenAI Platform Documentation, 2024. developers.openai.com/...deep-dive. Accessed 2026-06-26.
^Mamezou Developer Portal, "Using the Newly Updated File Search (Vector Stores) in OpenAI Assistants API (v2)", Mamezou, 2024-04-21. developer.mamezou-tech.com/...ai-file-search-intro Accessed 2026-06-26.
^Woyera, "OpenAI Assistants API v2: What's New and Improved?", Medium, 2024-05-03. medium.com/...whats-new-and-improved-a67c4f3936fc. Accessed 2026-06-26.
^OpenAI, "Assistants API (v2) FAQ", OpenAI Help Center, 2024. help.openai.com/...8550641-assistants-api-v2-faq. Accessed 2026-06-26.
^Franzen, Carl, "OpenAI unveils Responses API, open source Agents SDK, letting developers build their own Deep Research and Operator", VentureBeat, 2025-03-11. venturebeat.com/...own-deep-research-and-operator. Accessed 2026-06-26.
^OpenAI Developers, "Why we built the Responses API", OpenAI Developer Blog, 2025. developers.openai.com/...responses-api. Accessed 2026-06-26.
^OpenAI Developer Community, "Introducing the Responses API", OpenAI, 2025-03-11. community.openai.com/...1140929. Accessed 2026-06-26.
^OpenAI, "Assistants migration guide", OpenAI Platform Documentation, 2025-08-26. developers.openai.com/...migration. Accessed 2026-06-26.
^OpenAI, "Assistants Function Calling", OpenAI Platform Documentation, 2024. developers.openai.com/...function-calling. Accessed 2026-06-26.
^OpenAI Developer Community, "Assistants Code Interpreter API: how is session pricing defined?", OpenAI, 2023-11. community.openai.com/...477990. Accessed 2026-06-26.
^Microsoft Learn, "How does pricing for Azure Assistants API code-interpreter work?", Microsoft Q&A, 2024. learn.microsoft.com/...re-assistants-api-code-int. Accessed 2026-06-26.
^Layton, Dennis, "OpenAI GPTs and Assistants API: What you need to know", Medium, 2023-11. medium.com/...-what-you-need-to-know-5017e4886e62. Accessed 2026-06-26.
^Eesel AI, "OpenAI Assistants API: A 2025 guide to the deprecation and better alternatives", Eesel, 2025. eesel.ai/...openai-assistants-api. Accessed 2026-06-26.
^Microsoft Learn, "Azure OpenAI in Microsoft Foundry Models: Assistants API concepts", Microsoft, 2024-2026. learn.microsoft.com/...assistants. Accessed 2026-06-26.
^Kothari, Amit, "OpenAI Assistants API: the good, bad, and expensive", amitkoth.com, 2024. amitkoth.com/openai-assistants-api-review Accessed 2026-06-26.
^Syntackle, "OpenAI's Assistants API is Deprecated: Migrate to the New Responses API", Syntackle Blog, 2025. syntackle.com/...openai-assistants-to-responses-api Accessed 2026-06-26.
^Anthropic, "Using the Messages API", Anthropic Claude API Documentation, 2026. platform.claude.com/...working-with-messages. Accessed 2026-06-26.
^Pillitteri, Pasquale, "Claude Managed Agents: Python Tutorial with Code, Practical Examples, and OpenAI Responses API Comparison", pasqualepillitteri.it, 2025. pasqualepillitteri.it/...hon-tutorial-code-openai. Accessed 2026-06-26.
^Portkey, "OpenAI Responses API vs Chat Completions vs Anthropic Messages API", Portkey Blog, 2025. portkey.ai/...-vs-anthropic-anthropic-messages-api Accessed 2026-06-26.

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

3 revisions by 1 contributor · v4 · 5,606 words · full history

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Suggest edit

What links here

Agent Builder (OpenAI AgentKit)ChatGPT plugins Code Interpreter (Advanced Data Analysis)Custom GPT Letta (MemGPT)OpenAI AgentKit OpenAI Responses API Search Engine ChatGPT Plugins

Infobox

What is the OpenAI Assistants API?

How does the Assistants API work?

What tools does the Assistants API provide?

When was the Assistants API released, and what changed in v2?

Origins at DevDay 2023

Initial limitations and developer reception

v2 release in April 2024

Is the Assistants API deprecated?

The Responses API successor

The August 2025 deprecation notice

How does the Assistants API handle pricing and streaming?

Pricing

Streaming

SDK helpers

Who used the Assistants API?

How does the Assistants API compare to other agent APIs?

Versus the Chat Completions endpoint

Versus the Anthropic Messages API

Versus custom agent frameworks

Versus the Model Context Protocol

Summary table

What were the main limitations of the Assistants API?

What lessons did OpenAI draw from the Assistants API?

Why does the Assistants API matter?

ELI5: What is the Assistants API in simple terms?

Related Work

See also

References

Improve this article

Related Articles

OpenAI Agents SDK

OpenAI Responses API

OpenAI Codex CLI

OpenAI AgentKit

OpenAI Apps SDK

OpenAI Codex Cloud

What links here

Related Articles

OpenAI Agents SDK

OpenAI Responses API

OpenAI Codex CLI

OpenAI AgentKit

OpenAI Apps SDK

OpenAI Codex Cloud

What links here