OpenAI Batch API

9 min read

Updated Jul 23, 2026

The OpenAI Batch API is an asynchronous service from OpenAI that lets developers submit large groups of API requests in a single JSONL file and receive the results within 24 hours, in exchange for a 50% discount on token costs relative to the equivalent synchronous (real-time) endpoints. It is designed for high-volume, non-time-sensitive workloads such as evaluations, bulk classification, embeddings generation, and summarization, and it became available on April 15, 2024.^[1]^[2] OpenAI's documentation summarizes the trade as a "50% cost discount compared to synchronous APIs," where "each batch completes within 24 hours (and often more quickly)" using "a separate pool of significantly higher rate limits."^[3]

What is the OpenAI Batch API?

The Batch API addresses a common pattern in production use of large language models: jobs that involve many thousands of requests but do not require an immediate, low-latency response. Rather than sending each request individually to a synchronous endpoint, a developer collects the requests into one file, uploads it, and creates a single batch job that OpenAI processes in the background.^[3]

Compared with calling the standard endpoints directly, the Batch API offers three documented advantages. First, it applies a 50% cost discount on both input and output tokens relative to the synchronous price of the same model.^[2]^[3]^[4] Second, it provides a separate pool of substantially higher rate limits, so batch work does not draw down the standard per-minute token and request limits used by real-time traffic; at launch OpenAI cited a ceiling of 250 million input tokens enqueued for GPT-4 Turbo.^[2] Third, each batch is processed within a 24-hour completion window, and OpenAI states that results often return more quickly than the full window.^[3]

The trade-off is latency. Because work is queued and processed asynchronously, the Batch API is not suitable for interactive applications that need a response in seconds. If a batch cannot finish inside the 24-hour window, the unfinished requests are marked as expired and any completed results are still returned.^[3]

How does the OpenAI Batch API work?

The Batch API uses a small set of endpoints to build a request file, start a job, monitor it, and collect results. Each input line is an independent request that mirrors the body a developer would otherwise send to a synchronous endpoint, wrapped with a unique identifier so results can be matched back to inputs. The typical workflow is as follows.^[3]

Step	Action	Endpoint / mechanism	Notes
1	Prepare a `.jsonl` file	Local file creation	One JSON object per line; each line needs a unique `custom_id`, plus `method`, `url`, and the request `body`. A single file may target only one model.
2	Upload the input file	Files API, `POST /v1/files` with `purpose="batch"`	Returns an `input_file_id`.
3	Create the batch	`POST /v1/batches`	References the `input_file_id`, the target `endpoint`, and a `completion_window` (set to `"24h"`).
4	Poll the batch status	`GET /v1/batches/{batch_id}`	Status moves through `validating`, `in_progress`, `finalizing`, then `completed` (or `failed`, `expired`, `cancelling`, `cancelled`).
5	Retrieve the results	`GET /v1/files/{output_file_id}/content`	When complete, the batch object exposes an `output_file_id` (successful results) and an `error_file_id` (failed requests), each in JSONL form keyed by `custom_id`.

The custom_id field is required and must be unique within the file. As OpenAI's guide warns, "the output line order may not match the input line order," so "instead of relying on order to process your results, use the custom_id field," which is present on every line of the output.^[3] Batches can also be cancelled before completion, and a list endpoint allows developers to enumerate their batch jobs.^[5]

How much cheaper is the OpenAI Batch API?

Batch jobs are billed at 50% of the standard synchronous token price for the same model, applied to both input and output tokens.^[2]^[3]^[4] Billing follows the model used, so the absolute per-token rate varies by model (for example, a GPT-4o batch is billed at half of the GPT-4o synchronous rate). The discount is the defining commercial feature of the service, and it stacks with prompt caching: caching and batch pricing are independent discounts that can apply to the same request.^[3]^[4]

The API enforces explicit size limits on each batch and its input file, summarized below.^[3]

Limit	Value
Maximum requests per batch	50,000
Maximum input file size	200 MB
Model per input file	Exactly one
Completion window	24 hours
Input/output file retention	Files expire after 30 days
Cost vs. synchronous endpoints	50% discount on input and output tokens

For embeddings, batches are additionally restricted to a maximum of 50,000 embedding inputs across all requests in the batch.^[3] Rate limits for batch work are tracked in a dedicated pool, separate from the synchronous rate limits, and are expressed in part as a cap on the number of input tokens that can be enqueued at once per model.^[2]^[3]

Which endpoints does the OpenAI Batch API support?

At launch on April 15, 2024, the Batch API supported only the Chat Completions endpoint (/v1/chat/completions).^[2]^[6] On April 29, 2024, OpenAI published a dedicated Batch API guide and added support for embeddings models via /v1/embeddings, allowing bulk generation with models such as text-embedding-3.^[7] When GPT-4o launched in the API on May 13, 2024, it was available through the Batch API as a text and vision model, extending batch processing to image inputs handled by chat completions.^[8]

OpenAI has continued to broaden coverage over time. As documented, the Batch API supports the following endpoints:^[3]

/v1/responses (the Responses API)
/v1/chat/completions
/v1/embeddings
/v1/completions
/v1/moderations
/v1/images/generations
/v1/images/edits
/v1/videos

Each input file targets a single endpoint, specified when the batch is created.^[3]

What is the OpenAI Batch API used for?

The Batch API is intended for workloads where throughput and cost matter more than immediate latency. Common applications include:^[1]^[3]

Running evaluations: scoring a model against large benchmark or test sets, where all prompts are known in advance.
Bulk classification and tagging: labeling large datasets, such as moderating or categorizing user-generated content, support tickets, or product listings.
Embeddings generation: producing vector embeddings for large document collections to populate search indexes or retrieval systems.
Summarization and content generation: condensing or rewriting large corpora, for example summarizing documents or generating product descriptions in bulk.
Data extraction and transformation: parsing structured information out of unstructured text across many records.
Synthetic data creation: generating large volumes of examples for fine-tuning or testing.

Because the service decouples submission from completion, it is well suited to scheduled or overnight pipelines, and to organizations that want to process large jobs without exhausting the rate limits reserved for their real-time, customer-facing traffic.^[3]

How does the OpenAI Batch API compare to Anthropic and Google?

The Batch API is one of three closely matched asynchronous batch services from the major model providers. All three follow the same commercial model: a 50% discount on standard token prices in exchange for asynchronous processing within a 24-hour target window. They differ mainly in per-batch size limits, input format, and how long results are retained.

Anthropic offers the Message Batches API, introduced in public beta on October 8, 2024 and reaching general availability on December 17, 2024.^[9]^[10] Anthropic's documentation describes it as "a powerful, cost-effective way to asynchronously process large volumes of Messages requests" suited to "tasks that do not require immediate responses, with most batches finishing in less than 1 hour while reducing costs by 50% and increasing throughput."^[10] A single Anthropic batch is "limited to either 100,000 Message requests or 256 MB in size, whichever is reached first," and "batch results are available for 29 days after creation."^[10]

Google provides Batch Mode in the Gemini API, which Google describes as a way to "submit large jobs, offload scheduling and processing, and retrieve your results within 24 hours, all at a 50% discount compared to our standard interactive APIs."^[11] Gemini Batch Mode accepts either inline requests (recommended for batches under 20 MB) or a JSONL file of GenerateContentRequest objects for larger jobs.^[12]

Provider	Service	Discount	Target turnaround	Per-batch limit	Result retention
OpenAI	Batch API	50%	Within 24 hours (often faster)	50,000 requests / 200 MB file	Files expire after 30 days^[3]
Anthropic	Message Batches API	50%	Within 24 hours (most under 1 hour)	100,000 requests or 256 MB	29 days^[10]
Google	Gemini API Batch Mode	50%	Within 24 hours (often faster)	JSONL or inline (under 20 MB inline)	Per Gemini API policy^[11]^[12]

All three discounts can be combined with prompt caching where the provider supports it, and in each case unfinished requests expire at the 24-hour boundary while completed results remain retrievable.^[3]^[10]^[11]

ELI5: what is a Batch API?

Imagine you have ten thousand questions to ask an AI, but you do not need the answers right this second. Instead of asking one question at a time and waiting for each reply, you write all of them in one big list, hand the list over, and come back later (within a day) to pick up all the answers at once. Because the AI company can fit your big list in whenever it has spare capacity, it charges you about half price. That is a Batch API: cheaper and bigger, but you wait longer for the answers.

References

^OpenAI. "Batch API FAQ." OpenAI Help Center. help.openai.com/...9197833-batch-api-faq
^OpenAI Developer Community. "Batch API is now available." April 15, 2024. community.openai.com/...718416
^OpenAI. "Batch API guide." OpenAI API documentation. developers.openai.com/...batch
^The Decoder. "OpenAI introduces Batch API with up to 50% discount for asynchronous tasks." April 2024. the-decoder.com/...discount-for-asynchronous-tasks
^OpenAI. "Create batch." OpenAI API Reference. platform.openai.com/...create
^OpenAI. "API Pricing." openai.com/...pricing
^OpenAI. "Changelog." OpenAI API documentation (entries dated April 29, 2024). developers.openai.com/...changelog
^OpenAI Developer Community. "Announcing GPT-4o in the API!" May 13, 2024. community.openai.com/...744700
^Anthropic. "Introducing the Message Batches API." October 8, 2024. anthropic.com/...message-batches-api
^Anthropic. "Batch processing." Claude Platform Docs. platform.claude.com/...batch-processing
^Google. "Batch Mode in the Gemini API: Process more for less." Google Developers Blog. developers.googleblog.com/...batch-mode-gemini-api
^Google. "Batch API." Gemini API documentation. ai.google.dev/...batch-api

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

3 revisions by 1 contributors · v4 · 1,786 words · full history

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Suggest edit

What links here

Anthropic API Azure OpenAI Service GPT API OpenAI API

What is the OpenAI Batch API?

How does the OpenAI Batch API work?

How much cheaper is the OpenAI Batch API?

Which endpoints does the OpenAI Batch API support?

What is the OpenAI Batch API used for?

How does the OpenAI Batch API compare to Anthropic and Google?

ELI5: what is a Batch API?

See also

References

Improve this article

Related Articles

GPT API

Gym (OpenAI Gym / Gymnasium)

OpenAI Agents SDK

OpenAI Responses API

OpenAI Codex

OpenAI Realtime API

What links here

Related Articles

GPT API

Gym (OpenAI Gym / Gymnasium)

OpenAI Agents SDK

OpenAI Responses API

OpenAI Codex

OpenAI Realtime API

What links here