Anthropic Message Batches API
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,267 words
The Message Batches API (commonly referred to as the Anthropic Batches API) is an asynchronous bulk processing endpoint offered by Anthropic for its Claude family of large language models. Introduced in public beta on October 8, 2024 and reaching general availability on December 17, 2024, the API allows developers to submit up to 100,000 message requests per batch and receive results within 24 hours at a 50% discount on both input and output tokens compared with the synchronous Messages API.[^1][^2] The service is positioned for workloads that do not require real-time responses, including evaluations, classification, content moderation, summarization, and large-scale data labeling.[^1][^3] As of 2026 it is available directly on the Anthropic API, through Anthropic's first-party SDKs, on Amazon Bedrock (as Bedrock Batch Inference), and on Google Cloud's Vertex AI (as Vertex AI Batch Prediction for partner models).[^1][^4][^5]
| Field | Value |
|---|---|
| Official name | Message Batches API |
| Provider | Anthropic |
| Announced | October 8, 2024 (beta) |
| Generally available | December 17, 2024 |
| Discount | 50% off input and output tokens |
| Max requests per batch | 100,000 |
| Max batch size | 256 MB |
| Service-level target | Up to 24 hours (most under 1 hour) |
| Result retention | 29 days |
| Streaming | Not supported |
| Endpoint prefix | /v1/messages/batches |
| Supported models | All active Claude models |
| First-party SDKs | Python, TypeScript, Go, Java, C#, PHP, Ruby |
Anthropic publicly announced the Message Batches API on October 8, 2024 via its corporate blog under the title "Introducing the Message Batches API."[^1] The announcement framed the product as a way for organizations to process "non-time-sensitive workloads" at lower cost. At launch, the API was in public beta with support for Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku. Pricing was set at exactly half of standard Messages API rates for both input and output tokens, and developers could submit up to 10,000 requests per batch with processing completing within a 24-hour window.[^1]
The original launch positioned the Batches API directly against OpenAI's Batch API, which had launched in April 2024 with a similar 50% discount structure and a 24-hour service-level target.[^6][^7] Independent technology journalists noted that the Anthropic product matched OpenAI on price and processing window, while differentiating on per-batch request volume and on its compatibility with the Anthropic Messages request schema.[^8][^7]
A featured launch customer was the question-and-answer platform Quora. Andy Edmonds, a product manager at Quora, was quoted in Anthropic's announcement: "Anthropic's Batches API provides cost savings while also reducing the complexity of running a large number of queries that don't need to be processed in real time. It's very convenient to submit a batch and download the results within 24 hours, instead of having to deal with the complexity of running many parallel live queries to get the same result."[^1]
On December 17, 2024, the API moved out of beta to general availability on the Anthropic API.[^1][^2] Coinciding with GA, Anthropic raised the per-batch limit from 10,000 to 100,000 requests and set a hard ceiling of 256 MB on total batch size, up from the beta cap of roughly 32 MB; a batch is capped by whichever limit is reached first.[^9][^10] The retention policy was set at 29 days, after which the batch object itself remains visible but its result file can no longer be downloaded.[^9]
After GA, Anthropic progressively extended the Batches API to cover most of the features added to the synchronous Messages API:
- The tools and tool_choice parameters and structured output schemas are accepted as part of the params object in each batched request, allowing batches to mix function calling and free-form generations.[^9][^10]
- The thinking configuration block is supported inside batched params, letting batched requests run with the same chain-of-thought reasoning behavior available in the sync API. Anthropic published a beta header (output-300k-2026-03-24) that raises the per-request maximum output to 300,000 tokens for Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6 inside batches, an option targeted at long-form structured generation and code workloads.[^11]

Batch inference for Claude on third-party clouds tracked the first-party API:
The Batches API exposes a small set of REST endpoints under /v1/messages/batches and follows the standard authentication pattern of the Anthropic API, including the x-api-key header and the anthropic-version versioning header.[^9][^14]
| Operation | HTTP method | Path |
|---|---|---|
| Create a batch | POST | /v1/messages/batches |
| Retrieve a batch | GET | /v1/messages/batches/{message_batch_id} |
| Stream results | GET | /v1/messages/batches/{message_batch_id}/results |
| List batches | GET | /v1/messages/batches |
| Cancel a batch | POST | /v1/messages/batches/{message_batch_id}/cancel |
| Delete a batch | DELETE | /v1/messages/batches/{message_batch_id} |
The create endpoint accepts a JSON body with a requests array. Each element contains a custom_id string (matching the regular expression ^[a-zA-Z0-9_-]{1,64}$) and a params object whose schema is identical to the body of a regular Messages API call.[^14][^9] The custom_id is mandatory because results are not guaranteed to come back in the same order as the input array; callers use it to join responses back to the originating request.[^9]
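The request shape described above can be sketched in a few lines of Python. This is a minimal illustration, not Anthropic's SDK: the helper names make_request and make_requests are hypothetical, and the uniqueness of custom_id values is an assumption made here because results are joined back by that key.

```python
import re

# Pattern quoted in the text above for valid custom_id values.
CUSTOM_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def make_request(custom_id: str, params: dict) -> dict:
    """Build one element of the requests array, rejecting bad IDs early."""
    if not CUSTOM_ID_RE.fullmatch(custom_id):
        raise ValueError(f"invalid custom_id: {custom_id!r}")
    return {"custom_id": custom_id, "params": params}


def make_requests(items: dict) -> list:
    """Build the whole array from a {custom_id: params} mapping.

    Taking a dict as input makes duplicate IDs impossible, which the
    join-by-custom_id step on the results side relies on (an assumption
    of this sketch, not a rule quoted from the text).
    """
    return [make_request(cid, params) for cid, params in items.items()]
```

Validating locally is a design choice: a malformed ID is caught before any network call instead of surfacing later as a rejected or errored request.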
A representative create request looks like:
```bash
curl https://api.anthropic.com/v1/messages/batches \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "requests": [
      {
        "custom_id": "doc-001",
        "params": {
          "model": "claude-opus-4-7",
          "max_tokens": 1024,
          "messages": [
            {"role": "user", "content": "Summarize the attached transcript."}
          ]
        }
      }
    ]
  }'
```
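When a workload is larger than one batch allows, the request list has to be split before submission. The sketch below packs requests greedily against the two documented caps (100,000 requests and 256 MB). The function name split_into_batches is hypothetical, treating 256 MB as binary megabytes is an assumption, and the size estimate ignores the small overhead of the enclosing JSON body.

```python
import json

MAX_REQUESTS = 100_000          # per-batch request cap from the text
MAX_BYTES = 256 * 1024 * 1024   # 256 MB cap; binary megabytes are an assumption


def split_into_batches(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Greedily pack requests into batches that respect both caps.

    Size is estimated from each request's compact JSON encoding, which
    slightly undercounts the final body (brackets, commas), so callers
    may prefer to pass a max_bytes somewhat below the real ceiling.
    """
    batches, current, current_bytes = [], [], 0
    for req in requests:
        size = len(json.dumps(req, separators=(",", ":")).encode("utf-8"))
        if size > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} exceeds max_bytes alone")
        # Start a new batch if adding this request would break either cap.
        if current and (len(current) >= max_requests or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(req)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each returned sub-list can then be sent as the requests array of its own create call.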
The retrieve endpoint is idempotent and returns a MessageBatch object containing id, processing_status (in_progress, canceling, or ended), request_counts (with the per-status tallies processing, succeeded, errored, canceled, and expired), created_at, expires_at, ended_at, and once processing has finished, a results_url pointing to a .jsonl file of the per-request outcomes.[^15]
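Because retrieval is idempotent, a client can simply poll it until processing_status reaches ended. The following is a minimal sketch of such a loop, not an official client: fetch_batch is a hypothetical caller-supplied function that performs the GET and returns the parsed JSON body, and the polling interval and timeout are arbitrary illustrative defaults.

```python
import time
from typing import Callable


def wait_until_ended(
    fetch_batch: Callable[[], dict],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 25 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until processing_status == "ended", then return the batch dict.

    fetch_batch is a hypothetical function that performs
    GET /v1/messages/batches/{message_batch_id} and returns the body.
    """
    waited = 0.0
    while True:
        batch = fetch_batch()
        if batch["processing_status"] == "ended":
            return batch
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"batch {batch.get('id')} still {batch['processing_status']}"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting fetch_batch and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise with canned responses.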
When processing ends, every request in the batch falls into one of four terminal result types: succeeded (a regular Message response, billed at batch rates), errored (invalid request or server error, not billed), canceled (terminated by user cancellation before reaching the model, not billed), or expired (24-hour expiration reached before the model ran, not billed).[^9] The request_counts field on the parent batch object tracks how many requests fell into each bucket, while the per-request lines in the .jsonl output carry the full request output or error detail keyed by custom_id.
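The four result types can be tallied and indexed once the .jsonl file has been downloaded. The sketch below assumes each line is a JSON object with a custom_id and a result object carrying a type field; that line layout is an assumption of this example, and index_results is a hypothetical helper name.

```python
import json
from collections import Counter


def index_results(jsonl_text: str):
    """Return ({custom_id: result}, Counter of result types) from a results file.

    Assumes one JSON object per line shaped like
    {"custom_id": "...", "result": {"type": "succeeded", ...}}.
    """
    by_id, counts = {}, Counter()
    for line in jsonl_text.split("\n"):
        if not line.strip():
            continue  # tolerate blank lines and a trailing newline
        row = json.loads(line)
        by_id[row["custom_id"]] = row["result"]
        counts[row["result"]["type"]] += 1
    return by_id, counts
```

The counter should agree with the request_counts tallies on the parent batch object, which gives a cheap consistency check after download.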
The batch lifecycle is documented as:
1. The client sends POST /v1/messages/batches containing the request list. The batch is created with processing_status: "in_progress" and request_counts.processing equal to the number of submitted requests.[^9]
2. Requests that have not been processed when the 24-hour window closes are marked expired and the batch ends.[^9]
3. Once processing_status is ended, the results_url becomes available for streaming download.[^9][^15]
4. After the 29-day retention period, the archived_at timestamp is set and the results file is removed; the batch metadata itself can still be listed.[^9]

Internally, the discount reflects Anthropic's ability to schedule batched traffic during periods of lower system demand and to amortize batched requests across spare capacity rather than serving them on the latency-sensitive synchronous tier. Anthropic does not publish a formal scheduling SLA other than the 24-hour expiration ceiling, and notes explicitly that "processing may be slowed down based on current demand and your request volume," with the consequence that "you may see more requests expiring after 24 hours" during periods of heavy contention.[^9]
In practice, this means the Batches API behaves as a best-effort opportunistic queue rather than a hard-bound scheduler: under low system load batches usually drain within minutes to a few hours, while during sustained peak load the median time-to-completion can extend toward the 24-hour cap. The asymmetry is deliberate, since the discount is justified precisely by the fact that batched work can absorb capacity that would otherwise be unused while not displacing latency-sensitive sync calls. Anthropic's public guidance is therefore to plan around the 24-hour ceiling rather than around the median, and to design downstream consumers to handle out-of-order arrival via custom_id joining rather than relying on per-call dispatch ordering.[^9]
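One way to plan around the 24-hour ceiling is to join results back to the original requests by custom_id and collect whatever did not complete. The sketch below is illustrative only: requests_to_resubmit is a hypothetical helper, results_by_id is assumed to map each custom_id to a result object with a type field, and retrying only expired requests by default is a choice made for this example rather than documented guidance.

```python
def requests_to_resubmit(original_requests, results_by_id, retry_types=("expired",)):
    """Pick the original requests whose outcome warrants another attempt.

    The lookup is keyed by custom_id, so the order in which results
    arrived does not matter. Requests with no result at all are also
    returned, on the assumption that they never ran.
    """
    retry = []
    for req in original_requests:
        result = results_by_id.get(req["custom_id"])
        if result is None or result["type"] in retry_types:
            retry.append(req)
    return retry
```

The returned list keeps the original request bodies intact, so it can be submitted as a new batch without rebuilding the params objects.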
Batches are scoped to a Workspace, meaning that any API key in a given Workspace can list and retrieve batches created within that Workspace.[^9] Anthropic enforces two separate rate-limit dimensions: a per-HTTP-call rate on the Batches API itself, and a separate limit on the number of in-flight requests inside batches waiting to be processed.[^9] The documentation also warns that because batches process requests concurrently and asynchronously, the actual billed spend on a batch may slightly exceed a Workspace's configured spend limit (i.e., the spend-limit guardrail is not strictly enforced at the per-request level once a batch has been admitted).[^9]
There are a small number of feature gaps between batch processing and a real-time call:
- Results are not returned in request order; each result carries the request's custom_id for joining.[^9]
- max_tokens must be at least 1. The max_tokens: 0 pattern used for prompt cache pre-warming in the sync API is not allowed inside a batch because a cache entry written mid-batch would likely expire before any follow-up request runs.[^9]

The Batches API charges 50% of the standard Anthropic Messages API per-token rate, applied symmetrically to input and output tokens for all supported models.[^1][^9] Anthropic's official pricing table for the batch tier (as published in 2026) shows:
| Model | Batch input (per million tokens) | Batch output (per million tokens) |
|---|---|---|
| Claude Opus 4.7 | $2.50 | $12.50 |
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Opus 4.5 | $2.50 | $12.50 |
| Claude Opus 4.1 | $7.50 | $37.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Sonnet 4.5 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Haiku 3.5 (Bedrock and Vertex only) | $0.40 | $2.00 |
Source: Anthropic, "Batch processing" documentation.[^9]
Because the discount applies to the same model parameters as the sync API, all per-model differences such as long-context price tiers, vision token costs, and the Claude Opus 4.1 token rate are preserved at 50% of the published synchronous rate. The discount stacks with prompt caching: cache writes are billed at the cache-write multiplier (typically a small premium over base input tokens) while cache hits are billed at 10% of base input tokens, and the resulting figure is again halved when accessed via a batch request.[^8][^16]
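The arithmetic can be made concrete with a small estimator. This is a rough sketch under stated assumptions: the rates are copied from the batch table above, the dictionary keys are illustrative labels rather than confirmed model identifiers, cache hits are priced at 10% of the batch input rate as described in the text, and cache-write premiums are left out because the text gives no exact multiplier.

```python
# Batch rates in USD per million tokens, copied from the table above.
# The keys are illustrative labels, not confirmed API model identifiers.
BATCH_RATES = {
    "claude-sonnet-4-6": (1.50, 7.50),
    "claude-haiku-4-5": (0.50, 2.50),
}
CACHE_READ_MULTIPLIER = 0.10  # cache hits at 10% of base input, per the text


def batch_cost_usd(model, input_tokens, output_tokens, cache_read_tokens=0):
    """Estimate the batch-tier cost of a workload in USD.

    input_tokens counts uncached input only; cache_read_tokens counts
    input served from the prompt cache. Cache writes are not modeled.
    """
    in_rate, out_rate = BATCH_RATES[model]
    total = (
        input_tokens * in_rate
        + cache_read_tokens * in_rate * CACHE_READ_MULTIPLIER
        + output_tokens * out_rate
    )
    return total / 1_000_000
```

For example, one million uncached input tokens and 200,000 output tokens on the Sonnet 4.6 row come to $1.50 + $1.50 = $3.00 at batch rates.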
The Batches API is wrapped in idiomatic methods across Anthropic's official client libraries. In Python and TypeScript, the calls live under client.messages.batches.create, .retrieve, .list, .cancel, and .delete, returning typed batch and result objects.[^9] Anthropic also publishes batch wrappers in Go (Messages.Batches.New/Get), Java (MessageBatchService), C# (Anthropic.Models.Messages.Batches), PHP, and Ruby, with auto-pagination on the list endpoint.[^9] An anthropic command-line interface (ant messages:batches create / retrieve / list) provides a YAML-based equivalent for shell workflows.[^9]
Amazon Bedrock exposes batch inference as a distinct Bedrock primitive rather than reusing Anthropic's /v1/messages/batches shape. Customers upload JSONL records to S3, invoke a Bedrock CreateModelInvocationJob, and retrieve completed results from a destination S3 bucket. Bedrock charges 50% of its on-demand rate for batch jobs and, as of August 2025, supports Claude Sonnet 4 alongside other Anthropic models. The Bedrock implementation also publishes CloudWatch metrics for batch progress, including tokens pending processing for Claude.[^4]
On Vertex AI, Anthropic Claude is exposed as a partner model and batched via Google Cloud's Batch Prediction service. Inputs come from BigQuery tables or Cloud Storage JSONL files, with the per-row body following the Anthropic Messages schema, and results land back in BigQuery or Cloud Storage. The Vertex variant defaults to a maximum of four concurrent batch jobs per project, with a 24-hour completion window matching the first-party API.[^5]
The Batches API is also exposed through API gateways and orchestration platforms, including LiteLLM, Portkey, and the Vercel AI Gateway, which provide compatibility layers between the Anthropic batch endpoints and equivalent batch endpoints from other providers such as OpenAI and Mistral AI. These integrations typically translate between provider-specific input shapes and a common batch abstraction, but they do not change the underlying pricing, limits, or SLA of the source provider.
The Batches API is deliberately scoped to workloads where the 24-hour deadline is acceptable in exchange for the 50% discount. Anthropic's documentation lists four canonical use cases: large-scale evaluations, content moderation pipelines, large-volume data analysis, and bulk content generation.[^9] In practice, the dominant published patterns are:
- Evaluation sweeps: a harness submits the sweep as one batch, joins succeeded rows by custom_id, and computes pass-at-K or rubric-graded scores offline. Anthropic explicitly recommends this pattern in its documentation, and notes that running a sweep as a single batch produces results that are easier to compare than the same sweep run across many sync sessions, where latency-driven retries can perturb the workload.[^9]

The 50% discount and 24-hour SLA pattern has become a de facto industry standard for batch text generation. The table below compares the three best-documented public batch APIs as of 2026.
| Property | Anthropic Batches API | OpenAI Batch API | Mistral La Plateforme Batch API |
|---|---|---|---|
| Announced | October 8, 2024[^1] | April 15, 2024[^7] | November 7, 2024[^17] |
| Discount versus sync | 50% on input and output[^1] | 50% on input and output[^7] | 50% versus sync[^17] |
| SLA | Up to 24 hours[^9] | Up to 24 hours[^7] | Asynchronous, with explicit batch window |
| Max requests per batch | 100,000[^9] | 50,000 per file[^7] | Documented per file, supports millions across a job |
| Max batch size | 256 MB[^9] | Per-file file-size limit on the Files API[^7] | Documented as multi-gigabyte capable |
| Result retention | 29 days[^9] | Set by Files API retention[^7] | Time-bounded per documentation |
| Endpoint style | /v1/messages/batches[^9] | /v1/batches with input_file_id[^7] | Per-job submit with input file ID[^17] |
| Streaming inside a batched request | Not supported[^9] | Not supported[^7] | Not supported |
Functionally, the three APIs are very close substitutes: each provides a 50% asynchronous discount, a 24-hour ceiling, and per-request parameter schemas identical to those of the provider's own synchronous API. The principal differences are in the shape of input upload (Anthropic accepts an inline JSON array of up to 100,000 requests, OpenAI requires a JSONL file uploaded via its Files API, and Mistral takes an input file via its job-creation flow) and in raw batch ceilings (Anthropic's documented maximum of 100,000 requests per batch is higher than OpenAI's 50,000). The cloud-hosted Claude variants differ further in the tooling around results: Vertex AI's Claude integration adds BigQuery-native input and output handling, while Bedrock's Batch Inference adds CloudWatch metrics.[^7][^9][^5][^4][^17]
In commentary published the day of the Anthropic launch, Simon Willison observed that the Anthropic Batches API "matches offerings from OpenAI and Google Gemini, which both provide identical 50% pricing discounts on their respective batch services," noting that the convergence of all three major providers on the same discount and SLA structure had effectively standardized a tier of asynchronous text generation pricing across the industry.[^8] VentureBeat described the launch as Anthropic "challenging OpenAI with affordable batch processing," arguing that the introduction of batching specifically targeted enterprise workloads where the 50% discount was material at scale.[^6]
The key remaining differentiator across providers in 2026 is not price or SLA but the underlying model itself: each provider's Batch API uses the same model line as its sync API, so the choice between Anthropic, OpenAI, Mistral, and (via Vertex AI) Gemini Batch Prediction is dominated by model selection rather than batch mechanics.
The synchronous Messages API, Anthropic's counterpart to OpenAI's Responses API, is not a direct batch substitute because each call is billed at standard rates and is rate-limited per request. The Batches API is also distinct from interactive Claude products like Claude Artifacts, Claude Skills, and Anthropic Computer Use, which depend on real-time response streaming and tool execution. By contrast, the Batches API is purely a backend bulk-processing primitive; it shares the parameter schema with the rest of the Anthropic API but offers neither streaming nor low latency.
Despite broad feature coverage, the Batches API has several documented limitations:
- Results are not guaranteed to be returned in request order and must be matched to requests by custom_id.[^9]
- Requests not processed within 24 hours are marked expired. Anthropic warns that under high demand more requests may expire, with no formal pre-emption SLA other than the 24-hour cap.[^9]
- Result .jsonl files become inaccessible after 29 days. The batch metadata remains visible but the data itself must be downloaded and persisted by the customer if needed long-term.[^9]
- No max_tokens: 0 cache pre-warming. Because the cache slot would likely expire before any follow-up sync request, Anthropic explicitly disallows cache-pre-warm requests inside batches.[^9]

A more subtle constraint is that the asynchronous model defeats use cases that interleave model output with tool execution at low latency, since each batched call is a single completed turn rather than an interactive loop. Agentic workflows that depend on tool round-tripping within a session, including the kind of step-by-step tool execution at the heart of Anthropic Computer Use or Claude Code interactive sessions, are not a natural fit for the Batches API.
- Extended thinking is accepted in batched params, with a beta header raising the per-request output cap to 300,000 tokens for the largest current models.[^11]
- Tool use is supported through the tools and tool_choice parameters; results return tool-use blocks like the sync API.[^9]