# OpenAI Batch API

> Source: https://aiwiki.ai/wiki/batch_api
> Updated: 2026-06-03
> Categories: Developer Tools, OpenAI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**The OpenAI Batch API** is an asynchronous service from [OpenAI](/wiki/openai) that lets developers submit large groups of [API](/wiki/openai_api) requests in a single file for processing within a target window of 24 hours, in exchange for a 50% discount on token costs relative to the equivalent synchronous (real-time) endpoints. It became available on April 15, 2024, and is designed for high-volume, non-time-sensitive workloads such as evaluations, bulk classification, embeddings generation, and summarization.[1][2]

## Overview

The Batch API addresses a common pattern in production use of large language models: jobs that involve many thousands of requests but do not require an immediate, low-latency response. Rather than sending each request individually to a synchronous endpoint, a developer collects the requests into one file, uploads it, and creates a single batch job that OpenAI processes in the background.[3]

Compared with calling the standard endpoints directly, the Batch API offers three documented advantages. First, it applies a 50% cost discount on both input and output tokens relative to the synchronous price of the same model.[2][4] Second, it provides a separate pool of substantially higher rate limits, so batch work does not draw down the standard per-minute token and request limits used by real-time traffic; at launch OpenAI cited a ceiling of 250 million input tokens enqueued for GPT-4 Turbo.[2] Third, each batch is processed within a 24-hour completion window, and OpenAI states that results often return more quickly than the full window.[3]

The trade-off is latency. Because work is queued and processed asynchronously, the Batch API is not suitable for interactive applications that need a response in seconds. If a batch cannot finish inside the 24-hour window, the unfinished requests are marked as expired and any completed results are still returned.[3]

## How it works (workflow)

The Batch API uses a small set of endpoints to build a request file, start a job, monitor it, and collect results. Each input line is an independent request that mirrors the body a developer would otherwise send to a synchronous endpoint, wrapped with a unique identifier so results can be matched back to inputs. The typical workflow is as follows.[3]

| Step | Action | Endpoint / mechanism | Notes |
| --- | --- | --- | --- |
| 1 | Prepare a `.jsonl` file | Local file creation | One JSON object per line; each line needs a unique `custom_id`, plus `method`, `url`, and the request `body`. A single file may target only one model. |
| 2 | Upload the input file | Files API, `POST /v1/files` with `purpose="batch"` | Returns an `input_file_id`. |
| 3 | Create the batch | `POST /v1/batches` | References the `input_file_id`, the target `endpoint`, and a `completion_window`; `"24h"` is currently the only supported value. |
| 4 | Poll the batch status | `GET /v1/batches/{batch_id}` | Status moves through `validating`, `in_progress`, `finalizing`, then `completed` (or `failed`, `expired`, `cancelling`, `cancelled`). |
| 5 | Retrieve the results | `GET /v1/files/{output_file_id}/content` | When complete, the batch object exposes an `output_file_id` (successful results) and an `error_file_id` (failed requests), each in JSONL form keyed by `custom_id`. |
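Steps 4 and 5 amount to a polling loop. The sketch below assumes `retrieve` is a hypothetical callable standing in for `GET /v1/batches/{batch_id}` that returns the batch object as a dict. It treats the terminal statuses from step 4 as stopping points. The polling interval and the cap on the number of polls are illustrative choices, not documented values.

```python
import time
from typing import Callable

# Terminal states from the status list in step 4. "cancelling" is transitional,
# so polling continues until the batch reaches "cancelled".
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def wait_for_batch(
    retrieve: Callable[[str], dict],
    batch_id: str,
    poll_seconds: float = 60.0,
    max_polls: int = 1500,  # about 25 hours at the default interval
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a batch until it reaches a terminal status and return the batch object.

    `retrieve` is a hypothetical callable standing in for
    GET /v1/batches/{batch_id}; it must return the batch as a dict with a
    "status" key (and, once finished, "output_file_id" / "error_file_id").
    """
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch["status"] in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)  # still validating, in_progress, finalizing, or cancelling
    raise TimeoutError(f"batch {batch_id} not finished after {max_polls} polls")
```

With the official `openai` Python package (v1.x), `retrieve` could plausibly be supplied as `lambda bid: client.batches.retrieve(bid).model_dump()`, and a result file's text read with `client.files.content(file_id).text`. Treat that wiring as an untested assumption. Injecting `retrieve` and `sleep` keeps the loop testable without network access.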

The `custom_id` field is required and must be unique within the file, because results in the output are not guaranteed to be in the same order as the inputs.[3] A batch can also be cancelled before it finishes (`POST /v1/batches/{batch_id}/cancel`), and a list endpoint (`GET /v1/batches`) lets developers enumerate their batch jobs.[5]
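Because output order is not guaranteed, results are usually joined back to inputs by `custom_id`. The following minimal sketch assumes each output or error line has roughly the documented shape (`custom_id`, a `response` object with `status_code` and `body`, and an `error` object, either of which may be null). The helper names `index_results` and `in_input_order` are hypothetical.

```python
import json
from typing import Dict, List, Optional, Tuple


def index_results(output_jsonl: str, error_jsonl: str = "") -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Key batch results by custom_id, separating successes from failures.

    Assumes each line follows the documented output shape, roughly:
    {"custom_id": ..., "response": {"status_code": ..., "body": {...}} or null,
     "error": {...} or null}
    """
    successes: Dict[str, dict] = {}
    failures: Dict[str, dict] = {}
    for text in (output_jsonl, error_jsonl):
        for line in text.splitlines():
            if not line.strip():
                continue  # tolerate a trailing blank line
            record = json.loads(line)
            cid = record["custom_id"]
            response = record.get("response")
            if record.get("error") is None and response and response.get("status_code") == 200:
                successes[cid] = response["body"]
            else:
                # Keep whichever part describes the failure (error object or non-200 response).
                failures[cid] = record.get("error") or response or {}
    return successes, failures


def in_input_order(custom_ids: List[str], successes: Dict[str, dict]) -> List[Optional[dict]]:
    """Return successful bodies in the original input order; missing ids become None."""
    return [successes.get(cid) for cid in custom_ids]
```

Any `custom_id` that ends up as `None` in the reordered list is a request that failed or expired, and it can be collected into a follow-up batch.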

## Pricing and limits

Batch jobs are billed at 50% of the standard synchronous token price for the same model, applied to both input and output tokens.[2][4] Billing follows the model used, so the absolute per-token rate varies by model (for example, a [GPT-4o](/wiki/gpt_4o) batch is billed at half of the GPT-4o synchronous rate). The discount is the defining commercial feature of the service.
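The discount is simple arithmetic, but a worked example makes the effect concrete. This sketch applies a 50% reduction to both input and output token charges. The per-million-token prices in the example are hypothetical placeholders, not OpenAI's published rates.

```python
from typing import Tuple

BATCH_DISCOUNT = 0.5  # 50% off both input and output tokens


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> Tuple[float, float]:
    """Return (synchronous_cost, batch_cost) in USD for the given token counts.

    Prices are per million tokens and must be supplied by the caller; the
    values used in the example below are hypothetical, not published rates.
    """
    sync_cost = (
        input_tokens / 1_000_000 * input_price_per_million
        + output_tokens / 1_000_000 * output_price_per_million
    )
    return sync_cost, sync_cost * (1 - BATCH_DISCOUNT)


# Hypothetical prices: $2.00 per 1M input tokens, $8.00 per 1M output tokens.
sync, batch = estimate_cost_usd(2_000_000, 500_000, 2.00, 8.00)
print(f"synchronous ${sync:.2f}, batch ${batch:.2f}")  # synchronous $8.00, batch $4.00
```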

The API enforces explicit size limits on each batch and its input file, summarized below.[3]

| Limit | Value |
| --- | --- |
| Maximum requests per batch | 50,000 |
| Maximum input file size | 200 MB |
| Model per input file | Exactly one |
| Completion window | 24 hours |
| Input/output file retention | Files expire after 30 days |
| Cost vs. synchronous endpoints | 50% discount on input and output tokens |

For embeddings, batches are additionally restricted to a maximum of 50,000 embedding inputs across all requests in the batch.[3] Rate limits for batch work are tracked in a dedicated pool, separate from the synchronous rate limits, and are expressed in part as a cap on the number of input tokens that can be enqueued at once per model.[2][3]
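When a job exceeds these limits, the requests have to be split across several batches. The sketch below serializes request lines in the documented `custom_id` / `method` / `url` / `body` shape. It rejects duplicate `custom_id`s and mixed models, then groups lines so that each group stays within 50,000 requests and 200 MB. Two choices here are assumptions rather than documented behavior: reading "200 MB" as 200 × 1024 × 1024 bytes, and checking `custom_id` uniqueness across all chunks rather than only within each file. The embeddings-specific input cap is not checked. All helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List

MAX_REQUESTS_PER_BATCH = 50_000
# "200 MB" read as 200 * 1024 * 1024 bytes; the exact byte definition is an assumption.
MAX_FILE_BYTES = 200 * 1024 * 1024


def request_line(custom_id: str, url: str, body: dict) -> str:
    """Serialize one batch request as a single JSONL line."""
    return json.dumps({"custom_id": custom_id, "method": "POST", "url": url, "body": body})


def validate_lines(lines: Iterable[str]) -> None:
    """Reject duplicate custom_ids and files that mix models."""
    seen, models = set(), set()
    for line in lines:
        req = json.loads(line)
        if req["custom_id"] in seen:
            raise ValueError(f"duplicate custom_id: {req['custom_id']}")
        seen.add(req["custom_id"])
        models.add(req["body"].get("model"))
    if len(models) > 1:
        raise ValueError(f"one model per input file, found: {sorted(map(str, models))}")


def chunk_lines(
    lines: Iterable[str],
    max_requests: int = MAX_REQUESTS_PER_BATCH,
    max_bytes: int = MAX_FILE_BYTES,
) -> Iterator[List[str]]:
    """Split request lines into groups that each fit in one batch input file."""
    chunk: List[str] = []
    size = 0
    for line in lines:
        line_bytes = len(line.encode("utf-8")) + 1  # +1 for the trailing newline
        if line_bytes > max_bytes:
            raise ValueError("a single request exceeds the file size limit")
        if chunk and (len(chunk) >= max_requests or size + line_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += line_bytes
    if chunk:
        yield chunk
```

Each yielded group could then be written out as `"\n".join(group) + "\n"`, uploaded with `purpose="batch"`, and submitted as its own batch.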

## Supported endpoints

At launch on April 15, 2024, the Batch API supported only the [Chat Completions](/wiki/openai_api) endpoint (`/v1/chat/completions`).[2][6] On April 29, 2024, OpenAI published a dedicated Batch API guide and added support for embeddings models via `/v1/embeddings`, allowing bulk generation with models such as [text-embedding-3](/wiki/text_embedding_3).[7] When GPT-4o launched in the API on May 13, 2024, it was available through the Batch API as a text and vision model, extending batch processing to image inputs handled by chat completions.[8]

OpenAI has continued to broaden coverage over time. The current documentation lists the following supported endpoints:[3]

- `/v1/responses` (the Responses API)
- `/v1/chat/completions`
- `/v1/embeddings`
- `/v1/completions`
- `/v1/moderations`
- `/v1/images/generations`
- `/v1/images/edits`
- `/v1/videos`

Each input file targets a single endpoint, specified when the batch is created.[3]
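A client can catch endpoint mismatches before uploading a file. In this sketch, every line's `url` must equal the endpoint that will be passed when the batch is created, and that endpoint must appear in the list above. It is a local sanity check only. The server performs its own validation, and the hard-coded endpoint set will go stale as OpenAI adds support. The name `check_endpoint` is hypothetical.

```python
import json
from typing import Iterable

# Endpoints listed in the Batch API documentation at the time of writing.
SUPPORTED_BATCH_ENDPOINTS = {
    "/v1/responses", "/v1/chat/completions", "/v1/embeddings", "/v1/completions",
    "/v1/moderations", "/v1/images/generations", "/v1/images/edits", "/v1/videos",
}


def check_endpoint(lines: Iterable[str], endpoint: str) -> int:
    """Verify every request line targets `endpoint`; return the number of lines checked."""
    if endpoint not in SUPPORTED_BATCH_ENDPOINTS:
        raise ValueError(f"{endpoint} is not a documented batch endpoint")
    count = 0
    for count, line in enumerate(lines, start=1):
        url = json.loads(line)["url"]
        if url != endpoint:
            raise ValueError(f"line {count}: url {url!r} does not match {endpoint!r}")
    return count
```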

## Use cases

The Batch API is intended for workloads where throughput and cost matter more than immediate latency. Common applications include:[1][3]

- **Running evaluations**: scoring a model against large benchmark or test sets, where all prompts are known in advance.
- **Bulk classification and tagging**: labeling large datasets, such as moderating or categorizing user-generated content, support tickets, or product listings.
- **Embeddings generation**: producing vector embeddings for large document collections to populate search indexes or retrieval systems.
- **Summarization and content generation**: condensing or rewriting large corpora, for example summarizing documents or generating product descriptions in bulk.
- **Data extraction and transformation**: parsing structured information out of unstructured text across many records.
- **Synthetic data creation**: generating large volumes of examples for fine-tuning or testing.

Because the service decouples submission from completion, it is well suited to scheduled or overnight pipelines, and to organizations that want to process large jobs without exhausting the rate limits reserved for their real-time, customer-facing traffic.[3]

## References

1. OpenAI. "Batch API FAQ." OpenAI Help Center. https://help.openai.com/en/articles/9197833-batch-api-faq
2. OpenAI Developer Community. "Batch API is now available." April 15, 2024. https://community.openai.com/t/batch-api-is-now-available/718416
3. OpenAI. "Batch API guide." OpenAI API documentation. https://developers.openai.com/api/docs/guides/batch
4. The Decoder. "OpenAI introduces Batch API with up to 50% discount for asynchronous tasks." April 2024. https://the-decoder.com/openai-introduces-batch-api-with-up-to-50-discount-for-asynchronous-tasks/
5. OpenAI. "Create batch." OpenAI API Reference. https://platform.openai.com/docs/api-reference/batch/create
6. OpenAI. "API Pricing." https://openai.com/api/pricing/
7. OpenAI. "Changelog." OpenAI API documentation (entries dated April 29, 2024). https://developers.openai.com/api/docs/changelog
8. OpenAI Developer Community. "Announcing GPT-4o in the API!" May 13, 2024. https://community.openai.com/t/announcing-gpt-4o-in-the-api/744700

