Anthropic Message Batches API


The Message Batches API (commonly referred to as the Anthropic Batches API) is an asynchronous bulk processing endpoint offered by Anthropic for its Claude family of large language models. Introduced in public beta on October 8, 2024 and reaching general availability on December 17, 2024, the API allows developers to submit up to 100,000 message requests per batch and receive results within 24 hours at a 50% discount on both input and output tokens compared with the synchronous Messages API.[1][2] The service is positioned for workloads that do not require real-time responses, including evaluations, classification, content moderation, summarization, and large-scale data labeling.[1][3] As of 2026 it is available directly on the Anthropic API, through Anthropic's first-party SDKs, on Amazon Bedrock (as Bedrock Batch Inference), and on Google Cloud's Vertex AI (as Vertex AI Batch Prediction for partner models).[1][4][5]

Infobox

Field | Value
Official name | Message Batches API
Provider | Anthropic
Announced | October 8, 2024 (beta)
Generally available | December 17, 2024
Discount | 50% off input and output tokens
Max requests per batch | 100,000
Max batch size | 256 MB
Service-level target | Up to 24 hours (most under 1 hour)
Result retention | 29 days
Streaming | Not supported
Endpoint prefix | /v1/messages/batches
Supported models | All active Claude models
First-party SDKs | Python, TypeScript, Go, Java, C#, PHP, Ruby

History

Origins and beta launch

Anthropic publicly announced the Message Batches API on October 8, 2024 via its corporate blog under the title "Introducing the Message Batches API."[1] The announcement framed the product as a way for organizations to process "non-time-sensitive workloads" at lower cost. At launch, the API was in public beta with support for Claude 3.5 Sonnet, Claude 3 Opus, and Claude 3 Haiku. Pricing was set at exactly half of standard Messages API rates for both input and output tokens, and developers could submit up to 10,000 requests per batch with processing completing within a 24-hour window.[1]

The original launch positioned the Batches API directly against OpenAI's Batch API, which had launched in April 2024 with a similar 50% discount structure and a 24-hour service-level target.[6][7] Independent technology journalists noted that the Anthropic product matched OpenAI on price and processing window, while differentiating on per-batch request volume and on its compatibility with the Anthropic Messages request schema.[8][7]

A featured launch customer was the question-and-answer platform Quora. Andy Edmonds, a product manager at Quora, was quoted in Anthropic's announcement: "Anthropic's Batches API provides cost savings while also reducing the complexity of running a large number of queries that don't need to be processed in real time. It's very convenient to submit a batch and download the results within 24 hours, instead of having to deal with the complexity of running many parallel live queries to get the same result."[1]

General availability and limit increase

On December 17, 2024, the API moved out of beta to general availability on the Anthropic API.[1][2] Coinciding with GA, Anthropic increased the per-batch request limit from 10,000 to 100,000 and added a hard byte ceiling of 256 megabytes (whichever is reached first), substantially expanding the effective batch size from the beta cap of roughly 32 MB.[9][10] The retention policy was set at 29 days, after which the batch object itself remains visible but its result file becomes inaccessible for download.[9]

Feature integrations through 2025 and 2026

After GA, Anthropic progressively extended the Batches API to cover most of the features added to the synchronous Messages API:

  • Prompt caching interoperability. Anthropic confirmed in October 2024 that batch requests are eligible for prompt caching, and that the cache discount stacks with the 50% batch discount. Combined with cache hit pricing of 10% of base input tokens, the effective input price for cached prefixes can fall to roughly 5% of the standard rate, a figure that Alex Albert of Anthropic publicly characterized as "close to a 95% discount" on cacheable workloads.[8] Anthropic later recommended the 1-hour cache duration variant for batch workloads, since batches can take longer than the default 5-minute cache window.[9]
  • Tool use and structured output. Tool use definitions, tool_choice, and structured output schemas are accepted as part of the params object in each batched request, allowing batches to mix function calling and free-form generations.[9][10]
  • Extended thinking. The thinking configuration block is supported inside batched params, letting batched requests run with the same chain-of-thought reasoning behavior available in the sync API. Anthropic published a beta header (output-300k-2026-03-24) that raises the per-request maximum output to 300,000 tokens for Claude Opus 4.7, Claude Opus 4.6, and Claude Sonnet 4.6 inside batches, an option targeted at long-form structured generation and code workloads.[11]
  • Citations. Anthropic's Citations API, announced on January 23, 2025, is explicitly compatible with batch processing, prompt caching, and token counting, allowing grounded retrieval pipelines to be operated in bulk.[12][13]
  • Vision, multi-turn conversations, PDFs, and beta features. Anthropic's documentation states that "any request that you can make to the Messages API can be included in a batch," with each request processed independently. This means a single batch can mix vision, tool use, system messages, and multi-turn requests without restriction.[9]

Cloud provider parity

Batch inference for Claude on third-party clouds tracked the first-party API:

  • Amazon Bedrock. Bedrock has offered batch inference for Claude since 2024, and in August 2025 Amazon Web Services formally announced Bedrock Batch Inference support for Claude Sonnet 4 with pricing at 50% of on-demand Bedrock rates, along with CloudWatch metrics for tokens pending processing.[4]
  • Google Cloud Vertex AI. Anthropic's October 2024 launch post stated that batch predictions on Vertex AI were "coming soon"; by 2026 the Vertex AI documentation listed batch prediction support for Claude Opus 4.7, 4.6, 4.5, 4.1, and 4, Claude Sonnet 4.6, 4.5, and 4, Claude Haiku 4.5, and Claude 3.5 Haiku, with input from BigQuery tables or Cloud Storage JSONL files and a default cap of four concurrent batch jobs per project.[1][5]

How it works

The Batches API exposes a small set of REST endpoints under /v1/messages/batches and follows the standard authentication pattern of the Anthropic API, including the x-api-key header and the anthropic-version versioning header.[9][14]

Endpoint surface

Operation | HTTP method | Path
Create a batch | POST | /v1/messages/batches
Retrieve a batch | GET | /v1/messages/batches/{message_batch_id}
Stream results | GET | /v1/messages/batches/{message_batch_id}/results
List batches | GET | /v1/messages/batches
Cancel a batch | POST | /v1/messages/batches/{message_batch_id}/cancel
Delete a batch | DELETE | /v1/messages/batches/{message_batch_id}

The create endpoint accepts a JSON body with a requests array. Each element contains a custom_id string (matching the regular expression ^[a-zA-Z0-9_-]{1,64}$) and a params object whose schema is identical to the body of a regular Messages API call.[14][9] The custom_id is mandatory because results are not guaranteed to come back in the same order as the input array; callers use it to join responses back to the originating request.[9]

A representative create request looks like:

curl https://api.anthropic.com/v1/messages/batches \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "content-type: application/json" \
  --data '{
    "requests": [
      {
        "custom_id": "doc-001",
        "params": {
          "model": "claude-opus-4-7",
          "max_tokens": 1024,
          "messages": [
            {"role": "user", "content": "Summarize the attached transcript."}
          ]
        }
      }
    ]
  }'
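
The same request body can be assembled programmatically. The following is a minimal sketch rather than SDK code: the helper name build_batch_body and its defaults are hypothetical, and only the custom_id pattern, the requests array, and the params fields are taken from the description above. It checks each custom_id against the documented pattern before building the payload.

```python
import json
import re

# Pattern quoted in the text above for custom_id values.
CUSTOM_ID_RE = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")


def build_batch_body(prompts, model="claude-opus-4-7", max_tokens=1024):
    """Build the JSON body for POST /v1/messages/batches.

    `prompts` maps a custom_id to a user prompt string. The helper name and
    its defaults are illustrative, not part of any SDK.
    """
    requests = []
    for custom_id, prompt in prompts.items():
        # fullmatch rejects ids with spaces, punctuation, or more than 64 chars.
        if not CUSTOM_ID_RE.fullmatch(custom_id):
            raise ValueError(f"invalid custom_id: {custom_id!r}")
        requests.append({
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": prompt}],
            },
        })
    return {"requests": requests}


body = build_batch_body({"doc-001": "Summarize the attached transcript."})
print(json.dumps(body, indent=2))
```

Keying the input by custom_id makes the ids unique by construction, which matters because they are the only way to join results back to requests.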

The retrieve endpoint is idempotent and returns a MessageBatch object containing id, processing_status (in_progress, canceling, or ended), request_counts (with per-status tallies for processing, succeeded, errored, canceled, and expired), created_at, expires_at, ended_at, and, once processing has finished, a results_url pointing to a .jsonl file of the per-request outcomes.[15]

Result types

When processing ends, every request in the batch falls into one of four terminal result types: succeeded (a regular Message response, billed at batch rates), errored (invalid request or server error, not billed), canceled (terminated by user cancellation before reaching the model, not billed), or expired (24-hour expiration reached before the model ran, not billed).[9] The request_counts field on the parent batch object tracks how many requests fell into each bucket, while the per-request lines in the .jsonl output carry the full request output or error detail keyed by custom_id.
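
As a rough illustration of how a consumer might handle the result file, the sketch below groups result lines by type and keys them by custom_id. The line shape used here (a custom_id field plus a result object with a type field) is an assumption for illustration, and the sample records are invented; the exact schema should be checked against the API reference.

```python
import json
from collections import defaultdict


def bucket_results(jsonl_text):
    """Group result lines by result type, keyed by custom_id.

    Assumes each line looks like {"custom_id": ..., "result": {"type": ...}}.
    """
    buckets = defaultdict(dict)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        result = record["result"]
        buckets[result["type"]][record["custom_id"]] = result
    return dict(buckets)


# Invented sample lines, deliberately out of submission order.
sample = "\n".join([
    json.dumps({"custom_id": "doc-002",
                "result": {"type": "errored", "error": {"type": "invalid_request_error"}}}),
    json.dumps({"custom_id": "doc-001",
                "result": {"type": "succeeded",
                           "message": {"content": [{"type": "text", "text": "Summary."}]}}}),
    json.dumps({"custom_id": "doc-003", "result": {"type": "expired"}}),
])

buckets = bucket_results(sample)
counts = {kind: len(items) for kind, items in buckets.items()}
print(counts)
```

Because lookups go through custom_id, the order of lines in the file does not matter to the caller.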

Lifecycle and processing model

The batch lifecycle is documented as:

  1. Client submits a POST /v1/messages/batches containing the request list. The batch is created with processing_status: "in_progress" and request_counts.processing equal to the number of submitted requests.[9]
  2. Anthropic's infrastructure dispatches the requests asynchronously and independently. Anthropic states that most batches complete in under one hour, though the documented service-level target is up to 24 hours; requests that have not been dispatched at the 24-hour mark are marked expired and the batch ends.[9]
  3. Clients poll the retrieve endpoint, watch the Console, or (for SDKs that wrap it) await the batch object. Once processing_status is ended, the results_url becomes available for streaming download.[9][15]
  4. Results remain downloadable for 29 days. After 29 days, the batch object's archived_at timestamp is set and the results file is removed; the batch metadata itself can still be listed.[9]
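
The polling step in the lifecycle above can be sketched as follows. This is an illustration under stated assumptions, not the SDK's behavior: wait_for_batch, fake_retrieve, and the id msgbatch_example are hypothetical, and retrieve stands in for whatever call fetches the batch object (in real use it would wrap a GET to the retrieve endpoint). Only the processing_status values come from the text.

```python
import time


def wait_for_batch(retrieve, batch_id, poll_seconds=60.0,
                   timeout_seconds=24 * 3600, sleep=time.sleep):
    """Poll until processing_status is "ended", then return the batch dict.

    `retrieve` is any callable that takes a batch id and returns a dict
    shaped like the MessageBatch object described above.
    """
    waited = 0.0
    while True:
        batch = retrieve(batch_id)
        if batch["processing_status"] == "ended":
            return batch
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"batch {batch_id} still {batch['processing_status']}")
        sleep(poll_seconds)
        waited += poll_seconds


# Stand-in for the retrieve endpoint: in_progress twice, then ended.
_statuses = iter(["in_progress", "in_progress", "ended"])


def fake_retrieve(batch_id):
    return {"id": batch_id, "processing_status": next(_statuses)}


final = wait_for_batch(fake_retrieve, "msgbatch_example",
                       poll_seconds=0.0, sleep=lambda seconds: None)
print(final["processing_status"])  # ended
```

The default timeout mirrors the 24-hour expiration ceiling; injecting retrieve and sleep keeps the loop testable without network access.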

Internally, the discount reflects Anthropic's ability to schedule batched traffic during periods of lower system demand and to amortize batched requests across spare capacity rather than serving them on the latency-sensitive synchronous tier. Anthropic does not publish a formal scheduling SLA other than the 24-hour expiration ceiling, and notes explicitly that "processing may be slowed down based on current demand and your request volume," with the consequence that "you may see more requests expiring after 24 hours" during periods of heavy contention.[9]

In practice, this means the Batches API behaves as a best-effort opportunistic queue rather than a hard-bound scheduler: under low system load batches usually drain within minutes to a few hours, while during sustained peak load the median time-to-completion can extend toward the 24-hour cap. The asymmetry is deliberate, since the discount is justified precisely by the fact that batched work can absorb capacity that would otherwise be unused while not displacing latency-sensitive sync calls. Anthropic's public guidance is therefore to plan around the 24-hour ceiling rather than around the median, and to design downstream consumers to handle out-of-order arrival via custom_id joining rather than relying on per-call dispatch ordering.[9]

Rate limits, spend limits, and Workspace scoping

Batches are scoped to a Workspace, meaning that any API key in a given Workspace can list and retrieve batches created within that Workspace.[9] Anthropic enforces two separate rate-limit dimensions: a per-HTTP-call rate on the Batches API itself, and a separate limit on the number of in-flight requests inside batches waiting to be processed.[9] The documentation also warns that because batches process requests concurrently and asynchronously, the actual billed spend on a batch may slightly exceed a Workspace's configured spend limit (i.e., the spend-limit guardrail is not strictly enforced at the per-request level once a batch has been admitted).[9]

Constraints relative to the synchronous API

There are a small number of feature gaps between batch processing and a real-time call:

  • No streaming responses. Each batch request's full response is materialized in the result file rather than streamed as server-sent events.[9]
  • No fine-grained ordering. Results may arrive out of input order, and the documentation requires custom_id for joining.[9]
  • max_tokens must be at least 1. The max_tokens: 0 pattern used for prompt cache pre-warming in the sync API is not allowed inside a batch because a cache entry written mid-batch would likely expire before any follow-up request runs.[9]
  • No Zero Data Retention (ZDR). The Batches API is explicitly excluded from Anthropic's ZDR offering; instead, batch data is retained under the standard retention policy.[9]
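
A client can catch some of these constraints before submission. The check below is a hypothetical pre-flight helper, not part of any SDK. The max_tokens rule comes from the list above; treating a stream flag as a problem is a conservative assumption, since the text does not say whether the API rejects or ignores it.

```python
def preflight_issues(request):
    """Return a list of problems found in one batch request dict."""
    issues = []
    params = request.get("params", {})
    max_tokens = params.get("max_tokens")
    # max_tokens: 0 (cache pre-warming) is not allowed inside a batch.
    if not isinstance(max_tokens, int) or max_tokens < 1:
        issues.append("max_tokens must be an integer >= 1")
    # Assumption: flag streaming early instead of relying on server behavior.
    if params.get("stream"):
        issues.append("streaming is not supported in batches")
    # Results can arrive out of order, so every request needs a custom_id.
    if not request.get("custom_id"):
        issues.append("custom_id is required to join results")
    return issues


good = {"custom_id": "doc-001", "params": {"max_tokens": 1024}}
bad = {"custom_id": "doc-002", "params": {"max_tokens": 0, "stream": True}}
print(preflight_issues(good))      # []
print(len(preflight_issues(bad)))  # 2
```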

Pricing

The Batches API charges 50% of the standard Anthropic Messages API per-token rate, applied symmetrically to input and output tokens for all supported models.[1][9] Anthropic's official pricing table for the batch tier (as published in 2026) shows:

Model | Batch input (per million tokens) | Batch output (per million tokens)
Claude Opus 4.7 | $2.50 | $12.50
Claude Opus 4.6 | $2.50 | $12.50
Claude Opus 4.5 | $2.50 | $12.50
Claude Opus 4.1 | $7.50 | $37.50
Claude Sonnet 4.6 | $1.50 | $7.50
Claude Sonnet 4.5 | $1.50 | $7.50
Claude Haiku 4.5 | $0.50 | $2.50
Claude Haiku 3.5 (Bedrock and Vertex only) | $0.40 | $2.00

Source: Anthropic, "Batch processing" documentation.[9]

Because the discount applies to the same model parameters as the sync API, all per-model differences such as long-context price tiers, vision token costs, and the Claude Opus 4.1 token rate are preserved at 50% of the published synchronous rate. The discount stacks with prompt caching: cache writes are billed at the cache-write multiplier (typically a small premium over base input tokens) while cache hits are billed at 10% of base input tokens, and the resulting figure is again halved when accessed via a batch request.[8][16]
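
The stacking arithmetic can be made concrete with a small calculation. This sketch uses only the two multipliers stated above (50% for batch, 10% of base input for cache hits) and deliberately ignores cache-write charges and output tokens. The function name and the token counts are hypothetical; the $3.00 synchronous rate is inferred by doubling the $1.50 batch input rate listed for Claude Sonnet 4.6.

```python
BATCH_DISCOUNT = 0.5     # batch price as a fraction of the synchronous price
CACHE_READ_FACTOR = 0.1  # cache-hit price as a fraction of base input price


def batch_input_cost(sync_price_per_mtok, fresh_tokens, cached_tokens):
    """Input-token cost in dollars for a batch, ignoring cache writes."""
    per_token = sync_price_per_mtok / 1_000_000
    fresh = fresh_tokens * per_token * BATCH_DISCOUNT
    cached = cached_tokens * per_token * BATCH_DISCOUNT * CACHE_READ_FACTOR
    return fresh + cached


# Hypothetical workload: 2M uncached and 8M cache-hit input tokens.
sync_rate = 3.00  # dollars per million input tokens, inferred as 2 x $1.50
cost = batch_input_cost(sync_rate, fresh_tokens=2_000_000, cached_tokens=8_000_000)
sync_cost = 10_000_000 * sync_rate / 1_000_000

print(round(cost, 2))       # roughly 4.2
print(round(sync_cost, 2))  # 30.0
```

In this example the cached portion is billed at 0.5 × 0.1 = 5% of the synchronous input rate, which is the "roughly 5%" figure quoted earlier.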

Variants and integrations

First-party SDKs

The Batches API is wrapped in idiomatic methods across Anthropic's official client libraries. In Python and TypeScript, the calls live under client.messages.batches.create, .retrieve, .list, .cancel, and .delete, returning typed batch and result objects.[9] Anthropic also publishes batch wrappers in Go (Messages.Batches.New/Get), Java (MessageBatchService), C# (Anthropic.Models.Messages.Batches), PHP, and Ruby, with auto-pagination on the list endpoint.[9] An anthropic command-line interface (ant messages:batches create / retrieve / list) provides a YAML-based equivalent for shell workflows.[9]

Amazon Bedrock Batch Inference

Amazon Bedrock exposes batch inference as a distinct Bedrock primitive rather than reusing Anthropic's /v1/messages/batches shape. Customers upload JSONL records to S3, invoke a Bedrock CreateModelInvocationJob, and retrieve completed results from a destination S3 bucket. Bedrock charges 50% of its on-demand rate for batch jobs and, as of August 2025, supports Claude Sonnet 4 alongside other Anthropic models. The Bedrock implementation also publishes CloudWatch metrics for batch progress, including tokens pending processing for Claude.[4]

Vertex AI Batch Prediction

On Vertex AI, Anthropic Claude is exposed as a partner model and batched via Google Cloud's Batch Prediction service. Inputs come from BigQuery tables or Cloud Storage JSONL files, with the per-row body following the Anthropic Messages schema, and results land back in BigQuery or Cloud Storage. The Vertex variant defaults to a maximum of four concurrent batch jobs per project, with a 24-hour completion window matching the first-party API.[5]

Third-party gateways

The Batches API is also exposed through API gateways and orchestration platforms, including LiteLLM, Portkey, and the Vercel AI Gateway, which provide compatibility layers between the Anthropic batch endpoints and equivalent batch endpoints from other providers such as OpenAI and Mistral AI. These integrations typically translate between provider-specific input shapes and a common batch abstraction, but they do not change the underlying pricing, limits, or SLA of the source provider.

Applications and adoption

The Batches API is deliberately scoped to workloads where the 24-hour deadline is acceptable in exchange for the 50% discount. Anthropic's documentation lists four canonical use cases: large-scale evaluations, content moderation pipelines, large-volume data analysis, and bulk content generation.[9] In practice, the dominant published patterns are:

  • Model evaluations and red-teaming. Sweeping thousands of test prompts across multiple Claude models is a common batch workload, since per-test latency is irrelevant and the cost savings are material on test suites of meaningful size. A typical eval harness submits a few thousand prompts per Claude variant, joins the resulting succeeded rows by custom_id, and computes pass-at-K or rubric-graded scores offline. Anthropic explicitly recommends this pattern in its documentation, and notes that running a sweep as a single batch produces results that are easier to compare than the same sweep run across many sync sessions, where latency-driven retries can perturb the workload.[9]
  • Content moderation and labeling. Batch jobs power classification and moderation pipelines for data labeling at scale, often paired with a structured output schema so that downstream consumers receive deterministically shaped JSON results. Because batches scale to 100,000 requests per job and 256 MB per file, a single nightly batch can frequently cover an entire day's accumulated moderation queue for a mid-sized platform without partitioning logic on the client.[9][3]
  • Document processing and summarization. Quora, the original launch customer, uses the Batches API for "summarization and highlight extraction to create new end-user features."[1] Similar bulk-summarization, translation, and extraction workloads are documented in AWS's batch inference blog and in third-party tutorials. The pattern typically pairs prompt caching of a stable system prompt or document chunk with batch submission of variable per-document instructions, achieving the stacked cache-plus-batch discount described in the pricing section above.[4][8]
  • Synthetic data and offline distillation. Generating large corpora of completions to fine-tune smaller models or to support retrieval-augmented generation indexes is a natural batch workload because the throughput need dominates and individual completions can be processed independently. Pipelines that generate millions of completions for downstream training corpora can be cheaper to run via the Batches API than via repeated sync calls, even before accounting for the lower complexity of avoiding rate limits.
  • Embedding-like throughput tasks. While the Batches API does not produce embedding vectors directly, it is commonly used as the bulk-classification, bulk-rerank, or bulk-summarization stage of a pipeline that also touches an embeddings index, providing high-throughput LLM-shaped enrichments to documents that have already been indexed elsewhere.[9]
  • Citations and grounded QA at scale. With the addition of Citations support in early 2025, batches became viable for offline grounded-QA pipelines, where each batched request returns both an answer and character-indexed pointers into the source documents. This pattern is particularly well-suited to legal, scientific, and regulatory document review, where reviewers need explicit traceability from each generated assertion back to the input corpus.[12][13]
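
When a workload exceeds the per-batch ceilings (100,000 requests or 256 MB), the client has to partition it into several batches. The generator below is one possible sketch. The function name is hypothetical, the byte count is an estimate based on compact JSON encoding rather than the exact HTTP body size, and reading "256 MB" as 256 MiB is an assumption.

```python
import json

MAX_REQUESTS = 100_000
MAX_BYTES = 256 * 1024 * 1024  # assumes "256 MB" means 256 MiB


def partition_requests(requests, max_requests=MAX_REQUESTS, max_bytes=MAX_BYTES):
    """Yield lists of requests that each stay under both ceilings.

    Size is estimated from the compact JSON encoding of each request, which
    only approximates the bytes of the final request body.
    """
    chunk, chunk_bytes = [], 0
    for request in requests:
        size = len(json.dumps(request, separators=(",", ":")).encode("utf-8"))
        if size > max_bytes:
            raise ValueError(
                f"request {request.get('custom_id')!r} exceeds max_bytes on its own")
        # Close the current chunk if adding this request would break a ceiling.
        if chunk and (len(chunk) >= max_requests or chunk_bytes + size > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(request)
        chunk_bytes += size
    if chunk:
        yield chunk


# Tiny limits so the behavior is visible on five placeholder requests.
demo = [{"custom_id": f"r{i}", "params": {}} for i in range(5)]
print([len(c) for c in partition_requests(demo, max_requests=2)])  # [2, 2, 1]
```

In practice it is prudent to leave some headroom below the byte ceiling, since the estimate excludes the enclosing JSON wrapper.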

Comparison with other batch APIs

The 50% discount and 24-hour SLA pattern has become a de facto industry standard for batch text generation. The table below compares the three best-documented public batch APIs as of 2026.

Property | Anthropic Batches API | OpenAI Batch API | Mistral La Plateforme Batch API
Announced | October 8, 2024[1] | April 15, 2024[7] | November 7, 2024[17]
Discount versus sync | 50% on input and output[1] | 50% on input and output[7] | 50% versus sync[17]
SLA | Up to 24 hours[9] | Up to 24 hours[7] | Asynchronous, with explicit batch window
Max requests per batch | 100,000[9] | 50,000 per file[7] | Documented per file, supports millions across a job
Max batch size | 256 MB[9] | Per-file file-size limit on the Files API[7] | Documented as multi-gigabyte capable
Result retention | 29 days[9] | Set by Files API retention[7] | Time-bounded per documentation
Endpoint style | /v1/messages/batches[9] | /v1/batches with input_file_id[7] | Per-job submit with input file ID[17]
Streaming inside a batched request | Not supported[9] | Not supported[7] | Not supported

Functionally, the three APIs are very close substitutes: each provides a 50% asynchronous discount, a 24-hour ceiling, and per-request parameter schemas identical to those of the provider's synchronous API. The principal differences are in the shape of input upload (Anthropic accepts an inline JSON array of up to 100,000 requests, OpenAI requires a JSONL file uploaded via its Files API, and Mistral uploads an input file via its job-creation flow) and in raw batch ceilings (Anthropic's documented 100,000-request maximum is higher per batch than OpenAI's 50,000). For Claude specifically, the cloud-hosted routes also differ in tooling around results: Vertex AI's Claude integration adds BigQuery-native input and output handling, while Bedrock's Batch Inference adds CloudWatch metrics.[7][9][5][4][17]

In commentary published the day of the Anthropic launch, Simon Willison observed that the Anthropic Batches API "matches offerings from OpenAI and Google Gemini, which both provide identical 50% pricing discounts on their respective batch services," noting that the convergence of all three major providers on the same discount and SLA structure had effectively standardized a tier of asynchronous text generation pricing across the industry.[8] VentureBeat described the launch as Anthropic "challenging OpenAI with affordable batch processing," arguing that the introduction of batching specifically targeted enterprise workloads where the 50% discount was material at scale.[6]

The key remaining differentiator across providers in 2026 is not price or SLA but the underlying model itself: each provider's Batch API uses the same model line as its sync API, so the choice between Anthropic, OpenAI, Mistral, and (via Vertex AI) Gemini Batch Prediction is dominated by model selection rather than batch mechanics.

The synchronous Messages API, Anthropic's counterpart to OpenAI's Responses API, is not a substitute for batch processing: each call is billed at standard rates and is rate-limited per request. The Batches API is also distinct from interactive Claude products like Claude Artifacts, Claude Skills, and Anthropic Computer Use, which depend on real-time response streaming and tool execution. By contrast, the Batches API is purely a backend bulk-processing primitive; it shares the parameter schema with the rest of the Anthropic API but offers neither streaming nor low latency.

Limitations

Despite broad feature coverage, the Batches API has several documented limitations:

  • No streaming output. Server-sent events cannot be used inside batched requests; full responses are only available after the request completes and the result file is written.[9]
  • Order is not preserved. Results may arrive in any order in the result file, requiring callers to join on custom_id.[9]
  • 24-hour expiration is hard. Requests not dispatched within 24 hours are marked expired. Anthropic warns that under high demand more requests may expire, with no formal pre-emption SLA other than the 24-hour cap.[9]
  • No Zero Data Retention. Data submitted via the Batches API is held under standard retention rather than the ZDR variant available for some sync workloads.[9]
  • 29-day result expiration. Result .jsonl files become inaccessible after 29 days. The batch metadata remains visible but the data itself must be downloaded and persisted by the customer if needed long-term.[9]
  • No max_tokens: 0 cache pre-warming. Because the cache slot would likely expire before any follow-up sync request, Anthropic explicitly disallows cache-pre-warm requests inside batches.[9]
  • Spend-limit overshoot. Anthropic notes that batches may go slightly over a Workspace's configured spend limit because of concurrent processing within the batch.[9]
  • Workspace scoping. Batches and their results are visible to any API key in the same Workspace; cross-workspace isolation requires creating separate workspaces.[9]

A more subtle constraint is that the asynchronous model defeats use cases that interleave model output with tool execution at low latency, since each batched call is a single completed turn rather than an interactive loop. Agentic workflows that depend on tool round-tripping within a session, including the kind of step-by-step tool execution at the heart of Anthropic Computer Use or Claude Code interactive sessions, are not a natural fit for the Batches API.

See also

  • Synchronous Messages API. The Batches API shares its request schema with the standard Messages endpoint described under the Anthropic API. Choosing between them is essentially a latency-versus-cost trade.
  • Prompt caching. Stacks with the 50% batch discount and is recommended for batches with repeated prefixes, with Anthropic suggesting the 1-hour cache duration variant for batched workloads.[9][16]
  • Extended thinking. Available inside batched params, with a beta header raising the per-request output cap to 300,000 tokens for the largest current models.[11]
  • Tool use and function calling. Fully supported inside batches via tools and tool_choice parameters; results return tool-use blocks like the sync API.[9]
  • OpenAI Batch API and Mistral La Plateforme Batch API. Direct competitors with near-identical 50%/24-hour structures.[7][17]
  • Amazon Bedrock Batch Inference and Vertex AI Batch Prediction. Cloud-managed routes to the same underlying Anthropic models with platform-specific input and output handling.[4][5]

References

  1. Anthropic, "Introducing the Message Batches API", Anthropic / Claude blog, 2024-10-08. https://www.anthropic.com/news/message-batches-api. Accessed 2026-05-20.
  2. AI In Transit, "Anthropic Launches Message Batches API for Cost-Effective Querying with Claude AI", Medium, 2024-12-17. https://aiintransit.medium.com/anthropic-launches-message-batches-api-for-cost-effective-querying-with-claude-ai-3bd4ba3a003e. Accessed 2026-05-20.
  3. Nishant N, "Anthropic AI Introduces the Message Batches API: A Powerful and Cost-Effective Way to Process Large Volumes of Queries Asynchronously", MarkTechPost, 2024-10-09. https://www.marktechpost.com/2024/10/09/anthropic-ai-introduces-the-message-batches-api-a-powerful-and-cost-effective-way-to-process-large-volumes-of-queries-asynchronously/. Accessed 2026-05-20.
  4. Amazon Web Services, "Amazon Bedrock now supports Batch inference for Anthropic Claude Sonnet 4 and OpenAI GPT-OSS models", AWS What's New, 2025-08-18. https://aws.amazon.com/about-aws/whats-new/2025/08/amazon-bedrock-batch-inference-anthropic-claude-sonnet-4-openai-gpt-oss-models/. Accessed 2026-05-20.
  5. Google Cloud, "Batch predictions with Anthropic Claude models", Generative AI on Vertex AI documentation, 2026. https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude/batch. Accessed 2026-05-20.
  6. Carl Franzen, "Anthropic challenges OpenAI with affordable batch processing", VentureBeat, 2024-10-08. https://venturebeat.com/ai/anthropic-challenges-openai-with-affordable-batch-processing. Accessed 2026-05-20.
  7. OpenAI, "Batch API", OpenAI Help Center FAQ, 2024-2026. https://help.openai.com/en/articles/9197833-batch-api-faq. Accessed 2026-05-20.
  8. Simon Willison, "Anthropic: Message Batches (beta)", simonwillison.net, 2024-10-08. https://simonwillison.net/2024/Oct/8/anthropic-batch-mode/. Accessed 2026-05-20.
  9. Anthropic, "Batch processing", Claude API documentation, 2026. https://platform.claude.com/docs/en/build-with-claude/batch-processing. Accessed 2026-05-20.
  10. Anthropic, "Create a Message Batch", Claude API reference, 2026. https://platform.claude.com/docs/en/api/creating-message-batches. Accessed 2026-05-20.
  11. Anthropic, "Claude Platform release notes", Claude API documentation, 2026. https://docs.anthropic.com/en/release-notes/api. Accessed 2026-05-20.
  12. Anthropic, "Introducing Citations on the Anthropic API", Anthropic / Claude blog, 2025-01-23. https://www.anthropic.com/news/introducing-citations-api. Accessed 2026-05-20.
  13. Simon Willison, "Anthropic's new Citations API", simonwillison.net, 2025-01-24. https://simonwillison.net/2025/Jan/24/anthropics-new-citations-api/. Accessed 2026-05-20.
  14. Anthropic, "List Message Batches", Claude API reference, 2026. https://docs.anthropic.com/en/api/listing-message-batches. Accessed 2026-05-20.
  15. Anthropic, "Retrieve a Message Batch", Claude API reference, 2026. https://platform.claude.com/docs/en/api/retrieving-message-batches. Accessed 2026-05-20.
  16. Anthropic, "Prompt caching", Claude API documentation, 2026. https://platform.claude.com/docs/en/build-with-claude/prompt-caching. Accessed 2026-05-20.
  17. Mistral AI, "Mistral batch API", Mistral AI news, 2024-11-07. https://mistral.ai/news/batch-api. Accessed 2026-05-20.
