Anthropic API
Last reviewed
May 8, 2026
Sources
27 citations
Review status
Source-backed
Revision
v3 · 7,494 words
The Anthropic API is the primary interface for developers to access Anthropic's Claude family of large language models. Launched on March 14, 2023, alongside the original Claude and Claude Instant models, the API provides a Messages-based HTTP interface for building conversational AI applications, AI agents, and automated workflows [1]. The API is also available through third-party cloud platforms including Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry (formerly Azure AI Foundry).
As of May 2026, the Anthropic API serves three model tiers: Claude Opus (highest capability), Claude Sonnet (balanced performance and cost), and Claude Haiku (fastest and most affordable). The current flagship is Claude Opus 4.7, released April 16, 2026, which sits alongside Claude Sonnet 4.6, Claude Opus 4.6, and Claude Haiku 4.5. The latest generation offers up to one million tokens of context at standard pricing, adaptive thinking for complex reasoning, and a deep stack of developer features including tool use, prompt caching, batch processing, computer use, citations, Skills, MCP connectors, and vision [2].
The API is hosted at https://api.anthropic.com and follows REST conventions. The current API version header is anthropic-version: 2023-06-01, with feature additions delivered through opt-in beta headers rather than versioned endpoint changes. Code written against the Messages API in 2024 still works in 2026, even as the surface area underneath has grown substantially.
Anthropic released its API on March 14, 2023, making two models available: Claude (the more capable variant) and Claude Instant (a faster, lower-cost derivative) [1][3]. Access was initially limited to selected partners, including Notion, Quora (through its Poe chatbot), DuckDuckGo, and Robin AI. The launch came approximately four months after ChatGPT's release and on the same day that OpenAI released GPT-4, placing Anthropic squarely in competition with OpenAI from the start.
The original API used a text completions format with a \n\nHuman: and \n\nAssistant: turn structure, distinct from OpenAI's message-based approach. The legacy text completions endpoint (POST /v1/complete) remains accessible for backward compatibility but has been formally deprecated in favor of the Messages API.
Claude 2, released in July 2023, became the first Anthropic model available to the general public without an application process [4]. It introduced a 100K token context window, a major differentiator at a time when most competing models topped out at 4K to 32K tokens. Claude 2 also showed significant improvements on academic benchmarks, scoring above the 90th percentile on the bar exam.
In September 2023, Amazon announced a partnership with Anthropic, initially investing $1.25 billion (later expanded to $4 billion total), making Claude available on Amazon Bedrock [5]. The following month, Google invested $500 million in Anthropic, with commitments for $1.5 billion more over time, and Claude became available on Google Cloud Vertex AI. These partnerships gave enterprise customers the option of accessing Claude through their existing cloud infrastructure, with data residency guarantees and unified billing.
The Claude 3 release on March 4, 2024, introduced the three-tier model structure that Anthropic continues to use: Opus (most capable), Sonnet (balanced), and Haiku (fastest) [6]. Claude 3 Opus set new benchmarks in reasoning and analysis, while Haiku offered exceptionally fast responses at a fraction of the cost. This release also brought native vision capabilities to all tiers, allowing the API to process images alongside text. The Messages API, introduced earlier alongside Claude 2.1, fully supplanted the text completions format with this generation.
Throughout 2024 and 2025, Anthropic expanded the API's feature set substantially. Each addition tended to land first as a beta header, mature for several months under customer use, then graduate to GA. Key milestones:
In 2026 Anthropic added a dedicated managed agent runtime to the API, exposing three new resource types alongside Messages: Agents (/v1/agents), Sessions (/v1/sessions), and Environments (/v1/environments). Sessions run in stateful cloud containers managed by Anthropic, billed on session-hours, and are streamed back to the client. This is the API-side counterpart to features that previously required developers to bring their own infrastructure with the Claude Agent SDK.
The Claude API spans several endpoint groups, some generally available and others in beta. The Messages API is the center; everything else either feeds into it (Files, Skills) or wraps around it (Batches, Sessions).
| Endpoint | Path | Purpose |
|---|---|---|
| Messages | POST /v1/messages | Primary endpoint for all Claude interactions |
| Message Batches | POST /v1/messages/batches | Asynchronous bulk processing at 50% discount |
| Token Counting | POST /v1/messages/count_tokens | Pre-flight token counting before sending |
| Models | GET /v1/models | List available models with metadata and capabilities |
| Rate Limits | GET /v1/organizations/usage_report/rate_limits | Programmatic access to current rate limits |
| Endpoint | Path | Beta header |
|---|---|---|
| Files | POST /v1/files, GET /v1/files/{id} | files-api-2025-04-14 |
| Skills | POST /v1/skills, GET /v1/skills | skills-2025-10-02 |
| Agents | POST /v1/agents, GET /v1/agents | Managed Agents preview |
| Sessions | POST /v1/sessions, GET /v1/sessions/{id}/stream | Managed Agents preview |
| Environments | POST /v1/environments, GET /v1/environments | Managed Agents preview |
The Admin API allows organizations to manage workspaces, members, and API keys programmatically. It uses a separate credential type, the Admin API key (prefixed sk-ant-admin...), which can only be provisioned by users holding the admin role through the Console. Admin API endpoints sit under /v1/organizations/:
| Operation | Path | Method |
|---|---|---|
| List workspaces | /v1/organizations/workspaces | GET |
| Create workspace | /v1/organizations/workspaces | POST |
| Get workspace | /v1/organizations/workspaces/{id} | GET |
| Archive workspace | /v1/organizations/workspaces/{id}/archive | POST |
| List members | /v1/organizations/workspaces/{id}/members | GET |
| Add member | /v1/organizations/workspaces/{id}/members | POST |
| Remove member | /v1/organizations/workspaces/{id}/members/{user_id} | DELETE |
Note that Anthropic does not offer first-party text embeddings at the API level. The recommended embeddings partner is Voyage AI, now owned by MongoDB; example notebooks for Voyage models live in the official Anthropic Cookbook repository. Developers who need vector retrieval typically pair Claude with Voyage embeddings or another provider.
| Endpoint | Maximum request size |
|---|---|
| Messages, Token Counting | 32 MB |
| Batch API | 256 MB |
| Files API | 500 MB per file, 500 GB per organization |
| Sessions, Agents, Environments | 32 MB |
Exceeding these limits returns a 413 request_too_large error. Vertex AI imposes a tighter 30 MB limit and Bedrock 20 MB, so applications that hit cloud platform limits sometimes need to switch to the direct API for the largest payloads.
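A pre-flight size check can route oversized payloads before a 413 comes back. The following is a minimal sketch: the limits are the ones quoted above, `MAX_REQUEST_BYTES` and `platforms_that_fit` are illustrative names rather than SDK features, and treating 1 MB as 1024 x 1024 bytes is an assumption that the published limits do not confirm.

```python
import json

MB = 1024 * 1024  # assumption: binary megabytes; the published limits may be decimal

# Request-size ceilings quoted in the table and paragraph above.
MAX_REQUEST_BYTES = {
    "direct": 32 * MB,   # Messages and Token Counting on the direct API
    "vertex": 30 * MB,
    "bedrock": 20 * MB,
}


def platforms_that_fit(payload: dict) -> list[str]:
    """Return the platforms whose limit the JSON-encoded payload stays within."""
    size = len(json.dumps(payload).encode("utf-8"))
    return [name for name, limit in MAX_REQUEST_BYTES.items() if size <= limit]
```

An empty result means the payload is too large everywhere and should be split or moved to the Files API.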
The pricing tables below reflect the published rates as of May 2026 [2]. All prices are in USD per million tokens (MTok). Cache hit pricing applies to both 5-minute and 1-hour cached reads at 10% of the base input rate.
| Model | Input | 5m cache write | 1h cache write | Cache hit | Output | Context | Max output |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.7 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | 1,000,000 | 128k |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | 1,000,000 | 64k |
| Claude Haiku 4.5 | $1.00 | $1.25 | $2.00 | $0.10 | $5.00 | 200,000 | 64k |
| Model | Input | 5m cache write | 1h cache write | Cache hit | Output | Status |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | Active |
| Claude Opus 4.5 | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 | Active |
| Claude Opus 4.1 | $15.00 | $18.75 | $30.00 | $1.50 | $75.00 | Active |
| Claude Opus 4 | $15.00 | $18.75 | $30.00 | $1.50 | $75.00 | Deprecated, retires June 15, 2026 |
| Claude Sonnet 4.5 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | Active |
| Claude Sonnet 4 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | Deprecated, retires June 15, 2026 |
| Claude Sonnet 3.7 | $3.00 | $3.75 | $6.00 | $0.30 | $15.00 | Deprecated |
| Claude Haiku 3.5 | $0.80 | $1.00 | $1.60 | $0.08 | $4.00 | Deprecated |
| Claude Haiku 3 | $0.25 | $0.30 | $0.50 | $0.03 | $1.25 | Active |
| Claude Opus 3 | $15.00 | $18.75 | $30.00 | $1.50 | $75.00 | Deprecated |
Opus 4.7 uses a new tokenizer compared to previous models, contributing to its improved performance on a wide range of tasks. The new tokenizer may use up to 35% more tokens for the same fixed text, so cost comparisons across model generations should be done by reading actual usage from API responses rather than naive per-token math.
The Claude 4.5 and 4.6 generation represents a dramatic price reduction compared to earlier Opus models. Claude Opus 4.5 onward costs $5 per MTok input and $25 per MTok output, compared to $15 and $75 for Opus 4.1, a 67% reduction in both directions. Sonnet pricing has held steady at $3 / $15 since the Claude 3 generation.
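Because token counts differ between tokenizers, a cost estimate is most reliable when it starts from the usage counts an actual response reports. The sketch below assumes a `usage` dictionary with `input_tokens` and `output_tokens` keys and uses the base rates from the tables above; `PRICES` and `request_cost` are illustrative names, the `claude-haiku-4-5` identifier is an assumed alias, and cache and batch multipliers are deliberately ignored.

```python
# USD per million tokens, copied from the pricing tables above.
PRICES = {
    "claude-opus-4-7": {"input": 5.00, "output": 25.00},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00},  # assumed alias
}


def request_cost(model: str, usage: dict) -> float:
    """Estimate the USD cost of one request from its reported token usage.

    Only uncached input and output tokens are priced here.
    """
    price = PRICES[model]
    total = (
        usage["input_tokens"] * price["input"]
        + usage["output_tokens"] * price["output"]
    )
    return total / 1_000_000
```

Running the same prompt against two models and comparing `request_cost` on each response captures tokenizer differences that a per-token price comparison would miss.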
Anthropic publishes a deprecation calendar at docs.anthropic.com/en/docs/about-claude/model-deprecations and gives roughly six months of notice before retiring a model. Claude Sonnet 4 and Opus 4 (the original 2025 versions) retire on June 15, 2026. Claude 3.7 Sonnet, Haiku 3.5, and Opus 3 are also marked deprecated, though they remain callable until their retirement dates. After retirement, requests to a sunsetted model ID return a 404 not_found_error, and applications must migrate to a current alias such as claude-opus-4-7 or claude-sonnet-4-6.
The Messages API (POST /v1/messages) is the primary endpoint for all Claude interactions [10]. It uses a structured format with:
A basic request sends a system prompt and user message, receiving back an assistant response with metadata including token usage counts. The API returns a structured JSON response containing the model's output, stop reason, and detailed usage information.
Claude's tool use capabilities allow developers to extend the model's functionality by defining external tools that Claude can invoke [11]. The developer describes available tools with names, descriptions, and JSON Schema input definitions; Claude decides when to use them and generates structured arguments. The API then returns a tool_use content block, and the developer executes the tool and sends results back as a tool_result message.
Tool use supports several modes:
| Tool choice | Behavior |
|---|---|
| auto | Claude decides whether to use tools |
| any | Claude must use at least one tool |
| tool (specific name) | Claude must use the specified tool |
| none | Tools are disabled for this request |
Parallel tool use, added with the Claude 4 family in May 2025, lets the model emit multiple tool_use blocks in a single turn so the client can run them concurrently. Token-efficient tool use, available since Sonnet 3.7, reformats the underlying tool prompts to consume fewer system-prompt tokens, which becomes meaningful when an application loads dozens of tool definitions.
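The round trip described above (tool definition, `tool_use` blocks coming back, `tool_result` blocks going out) can be sketched without any network calls. Everything here is illustrative: `get_weather` is a hypothetical tool with a stand-in implementation, and the sample assistant content is hand-written to resemble a parallel tool use turn.

```python
import json

# Tool definition: name, description, and a JSON Schema for the input.
WEATHER_TOOL = {
    "name": "get_weather",  # hypothetical tool
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    # Stand-in implementation; a real tool would query a weather service.
    return {"city": city, "temp_c": 21}


HANDLERS = {"get_weather": get_weather}


def run_tools(assistant_content: list[dict]) -> dict:
    """Execute each tool_use block and build the follow-up user message."""
    results = []
    for block in assistant_content:
        if block.get("type") != "tool_use":
            continue  # skip text and other block types
        output = HANDLERS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # ties the result to its request
            "content": json.dumps(output),
        })
    return {"role": "user", "content": results}


# Hand-written example of a turn containing two parallel tool calls.
SAMPLE_ASSISTANT_CONTENT = [
    {"type": "text", "text": "Checking both cities."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_weather", "input": {"city": "Oslo"}},
]
```

Since the handlers here are independent, a production client could dispatch them concurrently; the sequential loop is kept for readability.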
Anthropic also provides server-side tools that run on Anthropic's infrastructure rather than the client:
- web_search_20260209: Searches the web and returns results, $10 per 1,000 searches
- web_fetch_20260209: Fetches and processes web page content, no additional charge beyond standard token costs
- code_execution_20250825: Runs Python in a sandboxed container; free when used with web search or web fetch, otherwise $0.05 per container-hour beyond 1,550 free hours per month
- text_editor_20250728 (for Opus 4.7+): Edits files with view, create, and replace operations
- bash_20250124: Runs shell commands; client-side execution
- computer_20251124: Controls desktop environments via screenshots and mouse/keyboard input
- memory_20250818: Persistent file-based memory across sessions; client-side storage
All current Claude models support vision, accepting images in requests alongside text [12]. Images can be provided as base64-encoded data, URLs, or via Files API references. Claude can analyze photographs, charts, diagrams, screenshots, documents, and other visual content. Vision is priced based on image dimensions, with each image consuming tokens proportional to its resolution. The API constrains images to roughly 1.15 megapixels and 1568 pixels on the longest edge for older models, though Opus 4.7 raises that ceiling to 2576 pixels with 1:1 coordinate mapping for computer use workflows.
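Resizing images on the client before upload keeps token usage predictable. The sketch below computes target dimensions only, under the assumption that both constraints quoted above (longest edge and total pixel count) apply together; `fit_image` and the constants are illustrative names, and the actual resampling would be done with an imaging library of the developer's choice.

```python
import math

MAX_EDGE = 1568          # longest-edge limit quoted above for older models
MAX_PIXELS = 1_150_000   # "roughly 1.15 megapixels"


def fit_image(width: int, height: int,
              max_edge: int = MAX_EDGE,
              max_pixels: int = MAX_PIXELS) -> tuple[int, int]:
    """Scale (width, height) down, preserving aspect ratio, to satisfy both limits."""
    scale = min(
        1.0,                                        # never upscale
        max_edge / max(width, height),              # longest-edge constraint
        math.sqrt(max_pixels / (width * height)),   # total-pixel constraint
    )
    # Floor so rounding can never push the result back over a limit.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))
```

Passing `max_edge=2576` would model the higher Opus 4.7 ceiling; whether the megapixel limit also changes for that model is not stated above, so the default is left in place.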
Extended thinking is a feature that allows Claude to engage in deeper, step-by-step reasoning before producing a response [9]. When enabled, Claude generates internal thinking tokens that work through the problem systematically, similar to the approach used by OpenAI's o-series reasoning models.
The original implementation used a thinking: {type: "enabled", budget_tokens: N} parameter, where the developer set a maximum budget for reasoning tokens. Starting with Claude Opus 4.6 and Sonnet 4.6, Anthropic introduced adaptive thinking, which uses an effort parameter (low, medium, or high) and lets the model decide how deeply to think. By Claude Opus 4.7, manual extended thinking is no longer supported and adaptive thinking is the only mode.
# Adaptive thinking (Claude Opus 4.7)
thinking={"type": "adaptive", "effort": "medium"}
# Manual extended thinking (Sonnet 4.6 and earlier)
thinking={"type": "enabled", "budget_tokens": 10000, "display": "summarized"}
Extended thinking tokens are billed at the standard output token rate. For complex tasks, thinking can take several minutes, and the thinking tokens can significantly exceed the visible output. Sonnet 4.6 is notable for being a hybrid model: developers can toggle adaptive thinking on or off per request, choosing between faster standard responses and deeper reasoning as needed. Sonnet 4.6 supports both adaptive and manual modes; Opus 4.6 supports both but recommends adaptive; Opus 4.7 only supports adaptive.
Prompt caching reduces costs and latency by reusing previously processed portions of prompts across API calls [7]. Instead of reprocessing the same system prompt, document, or conversation prefix on every request, the API reads from cache at a fraction of the standard input price.
There are two cache durations with different write costs:
| Cache operation | Cost multiplier | Duration |
|---|---|---|
| 5-minute cache write | 1.25x base input price | 5 minutes |
| 1-hour cache write | 2.0x base input price | 1 hour |
| Cache read (hit) | 0.1x base input price | Same duration as write |
A cache hit costs 10% of the standard input price, a 90% reduction on cached content. For the 5-minute cache, a single read after the initial write already saves money (1.25x write plus 0.1x read equals 1.35x total, versus 2.0x for two uncached requests). The 1-hour cache costs slightly more than no caching after one read (2.1x versus 2.0x) and comes out ahead from the second read onward (2.2x versus 3.0x). Caching is especially valuable for applications with long, stable system prompts or repeated document analysis.
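The break-even arithmetic can be checked mechanically. This sketch works in multiples of the base input price for the cached prefix and uses only the multipliers from the table above; the function names are illustrative.

```python
CACHE_READ_MULTIPLIER = 0.1


def cached_cost(reads: int, write_multiplier: float) -> float:
    """Relative cost of one cache write followed by `reads` cache hits."""
    return write_multiplier + CACHE_READ_MULTIPLIER * reads


def uncached_cost(reads: int) -> float:
    """Relative cost of sending the same prefix uncached (1 + reads) times."""
    return 1.0 * (1 + reads)


def break_even_reads(write_multiplier: float) -> int:
    """Smallest number of cache hits at which caching is strictly cheaper."""
    reads = 1
    while cached_cost(reads, write_multiplier) >= uncached_cost(reads):
        reads += 1
    return reads
```

`break_even_reads(1.25)` gives 1 for the 5-minute cache and `break_even_reads(2.0)` gives 2 for the 1-hour cache.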
Minimum cacheable prompt lengths vary by model:
| Model | Minimum tokens to cache |
|---|---|
| Opus 4.7, 4.6, 4.5 | 4,096 |
| Opus 4.1, 4 | 1,024 |
| Sonnet 4.6 | 2,048 |
| Sonnet 4.5, 4, 3.7 | 1,024 |
| Haiku 4.5 | 4,096 |
| Haiku 3.5 | 2,048 |
Prompts below the minimum are silently processed without caching rather than returning an error. Caches use a 20-block lookback window, so developers should add multiple cache breakpoints in long conversations to ensure the system finds a hit.
Prompt caching can be combined with other discounts. The Batch API's 50% discount stacks with caching, and long context pricing multipliers also stack, enabling substantial compound savings. For most current models, cache reads do not count against the input tokens-per-minute (ITPM) rate limit, which means a high cache hit ratio effectively raises throughput.
The Batch API (/v1/messages/batches) processes large volumes of requests asynchronously at a 50% discount on all token costs [13]. Results are typically delivered within hours, with a 24-hour SLA target.
| Model | Batch input | Batch output |
|---|---|---|
| Claude Opus 4.7 | $2.50 | $12.50 |
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Opus 4.5 | $2.50 | $12.50 |
| Claude Opus 4.1 | $7.50 | $37.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Sonnet 4.5 | $1.50 | $7.50 |
| Claude Sonnet 4 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
| Claude Haiku 3.5 | $0.40 | $2.00 |
| Claude Haiku 3 | $0.125 | $0.625 |
Batch requests are submitted as collections of individual message requests, each with a custom ID for tracking. Results are stored for 29 days, after which they are purged. The 50% discount combined with prompt caching can yield total savings exceeding 90% compared to standard synchronous pricing. Batches are not eligible for ZDR or fast mode.
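A batch body is a list of ordinary Messages requests, each wrapped with a tracking ID. The sketch below builds such a body from a mapping of IDs to prompts and also works out the combined discount on cached input; `build_batch` and `combined_input_fraction` are illustrative helpers, and the `requests`, `custom_id`, and `params` field names should be checked against the current Batch API reference before use.

```python
BATCH_DISCOUNT = 0.5         # 50% off all token costs
CACHE_READ_MULTIPLIER = 0.1  # cache hits cost 10% of base input


def build_batch(prompts: dict[str, str],
                model: str = "claude-sonnet-4-6",
                max_tokens: int = 1024) -> dict:
    """Turn {custom_id: prompt} into a body for POST /v1/messages/batches."""
    return {
        "requests": [
            {
                "custom_id": custom_id,  # used to match results to inputs
                "params": {
                    "model": model,
                    "max_tokens": max_tokens,
                    "messages": [{"role": "user", "content": prompt}],
                },
            }
            for custom_id, prompt in prompts.items()
        ]
    }


def combined_input_fraction() -> float:
    """Fraction of the base input price paid for a cache hit inside a batch."""
    return BATCH_DISCOUNT * CACHE_READ_MULTIPLIER
```

A cached read inside a batch costs 5% of the base input rate under these multipliers, which is where the "exceeding 90%" figure comes from.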
The computer use tool, introduced in beta in October 2024, enables Claude to interact with desktop environments by viewing screenshots and performing mouse and keyboard actions [8]. When given access to a computer environment (typically via a Docker container or virtual machine), Claude can navigate file systems and applications, fill out forms and spreadsheets, interact with web browsers, and execute multi-step workflows across applications.
Current versions are differentiated by tool type string:
- computer_20251124: Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5; adds the zoom action for inspecting regions at full resolution
- computer_20250124: Sonnet 4.5, Haiku 4.5, Opus 4.1, Sonnet 4, Opus 4, Sonnet 3.7
Computer use follows standard tool use pricing, plus 466 to 499 tokens of system prompt overhead, 735 tokens per tool definition for Claude 4.x, and additional tokens from the screenshot images Claude analyzes. The feature requires a beta header (computer-use-2025-11-24 or computer-use-2025-01-24) and is ZDR-eligible because all screenshots and files stay on the client. Anthropic ships a reference implementation in a Docker container in the anthropic-quickstarts repository.
The citations feature, introduced in early 2025, allows Claude to produce grounded responses with character-level pointers back to source documents. When a developer enables citations on a document content block, Claude's response is broken into multiple text blocks, each annotated with the specific cited text, document index, and either character indices, page numbers, or content block indices.
Three document types are supported:
| Type | Best for | Chunking | Citation format |
|---|---|---|---|
| Plain text | Prose, simple documents | Sentence | Character indices (0-indexed) |
| PDF | PDF files with extractable text | Sentence | Page numbers (1-indexed) |
| Custom content | Lists, transcripts, fine-grained citations | None (use blocks as-is) | Content block indices |
Citations work with all current models except Haiku 3, and are compatible with prompt caching, token counting, and batch processing. They are not compatible with structured outputs because the citation format requires interleaving citation blocks with text. The cited_text field in the response does not count toward output tokens, which makes citations cheaper than asking the model to repeat quotes in its prose.
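To make the block structure concrete, the sketch below builds a plain-text document block with citations enabled and renders a cited response as text with footnotes. The field names follow the description above (`cited_text`, document index, character indices), but the exact response schema is an assumption here, and the sample response is hand-written rather than captured from the API.

```python
def text_document(text: str, title: str) -> dict:
    """Build a plain-text document content block with citations switched on."""
    return {
        "type": "document",
        "source": {"type": "text", "media_type": "text/plain", "data": text},
        "title": title,
        "citations": {"enabled": True},
    }


def render_with_footnotes(content: list[dict]) -> str:
    """Join text blocks, appending a numbered marker after each cited span."""
    body, quotes = [], []
    for block in content:
        if block.get("type") != "text":
            continue
        body.append(block["text"])
        for cite in block.get("citations") or []:
            quotes.append(cite["cited_text"])
            body.append(f"[{len(quotes)}]")
    footnotes = [f'[{i}] "{quote}"' for i, quote in enumerate(quotes, 1)]
    return "".join(body) + "\n" + "\n".join(footnotes)


# Hand-written sample of a response split into cited and uncited text blocks.
SAMPLE_RESPONSE = [
    {"type": "text", "text": "According to the report, "},
    {
        "type": "text",
        "text": "revenue grew 12%",
        "citations": [{
            "type": "char_location",
            "cited_text": "Revenue grew 12% year over year.",
            "document_index": 0,
            "start_char_index": 0,
            "end_char_index": 32,
        }],
    },
    {"type": "text", "text": "."},
]
```

The character indices in each citation could equally be used to highlight the source span in a document viewer instead of printing footnotes.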
The Files API, in beta since April 2025, lets developers upload files once and reference them across multiple Messages requests. Files are addressed by file_id and persist until explicitly deleted, with a 500 MB per-file ceiling and 500 GB per-organization storage cap. Supported types include PDF (application/pdf), plain text, images (jpeg, png, gif, webp), and various dataset formats for the code execution tool.
File API operations are free; developers pay only for the tokens consumed when a file's contents are loaded into a Messages request. The Files API requires the files-api-2025-04-14 beta header and is not currently supported on Bedrock or Vertex AI. It is also not ZDR-eligible because files necessarily persist server-side.
The memory tool (memory_20250818), introduced in August 2025, gives Claude a persistent file-based memory directory that survives across conversations. Claude can view, create, str_replace, insert, delete, and rename files inside /memories, and the system prompt automatically includes a memory protocol instructing Claude to check the directory before doing any work.
Memory is a client-side tool: storage and retrieval happen in the developer's infrastructure, which means the developer chooses the backend (filesystem, database, encrypted storage) and is responsible for path traversal protection. Because nothing is stored at Anthropic, the memory tool is ZDR-eligible. The most common pattern pairs memory with context editing or compaction so long-running agents can summarize old context server-side while persisting the truly important state in memory files.
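Since path traversal protection falls to the developer, each path arriving in a memory tool call needs validating before it touches storage. The following is a minimal lexical check, assuming POSIX-style paths rooted at /memories; `resolve_memory_path` is an illustrative helper, and it does not cover symlinks or URL-encoded segments, which a filesystem backend would need to handle separately.

```python
import posixpath

MEMORY_ROOT = "/memories"


def resolve_memory_path(requested: str) -> str:
    """Normalize a requested path and reject anything outside /memories.

    Lexical check only: symlinks and percent-encoded segments are not handled.
    """
    # join() lets relative paths resolve under the root; normpath collapses ".." and ".".
    normalized = posixpath.normpath(posixpath.join(MEMORY_ROOT, requested))
    inside = normalized == MEMORY_ROOT or normalized.startswith(MEMORY_ROOT + "/")
    if not inside:
        raise ValueError(f"path escapes {MEMORY_ROOT}: {requested!r}")
    return normalized
```

Comparing against `MEMORY_ROOT + "/"` rather than the bare prefix is what keeps a sibling directory such as /memories-old from passing the check.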
Claude Skills, launched in October 2025 as a beta API surface, are filesystem-based packages of instructions, scripts, and reference material that Claude loads on demand. Each skill consists of a SKILL.md file with YAML frontmatter (name, description) plus optional supporting files. The design uses progressive disclosure: only the metadata of every installed skill loads upfront (around 100 tokens each), the SKILL.md body loads when triggered, and bundled scripts run via bash without ever entering the context window.
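Progressive disclosure depends on being able to read a skill's metadata without loading its body. The sketch below splits a SKILL.md file into frontmatter and body using only the standard library; it handles flat `key: value` lines and is not a full YAML parser, and the sample skill and the `split_skill` name are invented for illustration.

```python
# Invented example skill; only the name/description frontmatter keys come from the text above.
SAMPLE_SKILL = """---
name: quarterly-report
description: Build the quarterly revenue report from CSV exports.
---
# Quarterly report

1. Load the CSV files from the exports directory.
"""


def split_skill(markdown: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body) for a SKILL.md file with simple frontmatter."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a --- frontmatter fence")
    end = lines.index("---", 1)  # raises ValueError if the fence is never closed
    metadata = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        metadata[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return metadata, body
```

Only `metadata` would need to be present upfront; the body can stay on disk until the skill is triggered.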
The Skills API exposes POST /v1/skills and GET /v1/skills endpoints for uploading and managing custom skills, scoped to a workspace. Anthropic also publishes pre-built skills for PowerPoint (pptx), Excel (xlsx), Word (docx), and PDF generation, plus an open-source claude-api skill that ships up-to-date API reference material across eight languages. Using skills via the API requires three beta headers: code-execution-2025-08-25, skills-2025-10-02, and files-api-2025-04-14. Skills are not ZDR-eligible.
In November 2024, Anthropic open-sourced the Model Context Protocol, an open standard for connecting AI agents to external tools and data sources. The Anthropic API integrates with MCP through several paths.
Client-side MCP: The Claude Agent SDK, Claude Code, and other clients can connect to local or remote MCP servers using stdio, HTTP, or SSE transports. The agent process loads MCP servers and exposes their tools to Claude as part of the tools array.
MCP connector (server-side): In April 2025 Anthropic added an mcp_servers parameter directly to the Messages API, letting developers point Claude at remote MCP server URLs without running an MCP client themselves. The current beta header is mcp-client-2025-11-20, with the older mcp-client-2025-04-04 deprecated. The connector supports OAuth authentication for protected servers and lets a single request connect to multiple MCP endpoints. As of 2026 only the tools subset of the MCP specification is supported through the connector; resources and prompts are not yet exposed. The MCP connector is not ZDR-eligible.
System prompts in the Anthropic API provide instructions and context that shape Claude's behavior throughout a conversation [10]. Unlike user messages, system prompts are processed once and set the foundation for all subsequent interactions. Common uses include:
System prompts are particularly effective when combined with prompt caching. A long system prompt containing reference documentation can be cached, reducing costs by 90% on subsequent requests while maintaining consistent behavior. The Messages API also automatically prepends specific instructions for tool use (around 313-346 tokens depending on model and tool_choice setting) and computer use (466-499 tokens), so heavy tool use applications often see the system prompt grow without explicit developer input.
The API supports server-sent events (SSE) streaming, delivering response tokens incrementally as they are generated [14]. Streaming is enabled by setting "stream": true in the request body. The event stream uses standard SSE framing, with event: and data: lines, and includes the following event types:
| Event | Description |
|---|---|
| message_start | Initial message metadata, including model and starting usage counts |
| content_block_start | Beginning of a text, thinking, tool use, or citation block |
| content_block_delta | Incremental text, JSON fragments, thinking deltas, or citation deltas |
| content_block_stop | End of a content block |
| message_delta | Updates to message-level fields such as stop reason and final usage |
| message_stop | Complete message has been delivered |
| ping | Heartbeat to keep the connection alive |
| error | Stream-level error |
The Python, TypeScript, Java, Go, Ruby, C#, and PHP SDKs provide high-level streaming helpers that handle event parsing and offer both synchronous and asynchronous interfaces. Thinking deltas (sub-events of content_block_delta for adaptive and extended thinking) and citation deltas (when citations are enabled) follow the same framing. Fine-grained tool streaming, available since Sonnet 3.7, lets clients render tool use parameters as they are produced rather than waiting for the full block.
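For readers not using an SDK helper, the framing can be parsed by hand. The sketch below reads `event:` and `data:` lines from a stream supplied as text and concatenates the text deltas; the sample stream is hand-written, the `text_delta` payload shape is an assumption based on the event descriptions above, and a real client would read lines incrementally from the HTTP response instead.

```python
import json


def parse_sse(raw: str):
    """Yield (event_name, decoded_data) pairs from SSE text."""
    event, data_lines = None, []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:  # a blank line ends one event
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = None, []
    if data_lines:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data_lines))


def collect_text(raw: str) -> str:
    """Concatenate every text delta in the stream, ignoring other events."""
    parts = []
    for event, data in parse_sse(raw):
        if event == "content_block_delta" and data["delta"].get("type") == "text_delta":
            parts.append(data["delta"]["text"])
    return "".join(parts)


# Hand-written sample stream covering several of the event types listed above.
SAMPLE_STREAM = (
    'event: message_start\n'
    'data: {"type": "message_start", "message": {"role": "assistant"}}\n\n'
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}\n\n'
    'event: ping\n'
    'data: {"type": "ping"}\n\n'
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}\n\n'
    'event: message_stop\n'
    'data: {"type": "message_stop"}\n\n'
)
```

Thinking and citation deltas would arrive as other `delta` types on the same event and could be routed with additional branches in `collect_text`.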
Anthropic provides official client SDKs across eight environments [15]. As of May 2026, all SDKs share a near-identical surface area: an Anthropic (or equivalent) client class with a messages.create() method, beta namespaces for opt-in features, automatic retry with exponential backoff, and request-level overrides for timeouts and headers.
| SDK | Package | Min version | Notable features |
|---|---|---|---|
| Python | anthropic (PyPI) | Python 3.9+ | Sync and async clients, Pydantic models, streaming helpers |
| TypeScript | @anthropic-ai/sdk (npm) | TypeScript 4.9+, Node.js 20+ | ESM/CJS, browser support, streaming via async iterators |
| Java | com.anthropic:anthropic-java (Maven, current 2.27.0) | Java 8+ | Builder pattern, CompletableFuture async |
| Go | github.com/anthropics/anthropic-sdk-go | Go 1.23+ | Context cancellation, functional options |
| Ruby | anthropic (Bundler) | Ruby 3.2.0+ | Sorbet types, streaming helpers |
| C# / .NET | Anthropic (NuGet) | .NET Standard 2.0 | IChatClient integration |
| PHP | anthropic-ai/sdk (Composer) | PHP 8.1.0+ | Value objects, builder pattern |
| CLI | ant (Homebrew) | n/a | Shell scripting with typed flags and response transforms |
All SDKs automatically detect the ANTHROPIC_API_KEY environment variable and support the major cloud platforms (Bedrock, Vertex AI, Foundry) via environment variable switches. Beta features are accessed through a beta namespace, which on the Python SDK looks like client.beta.messages.create(..., betas=["feature-name"]). The CLI is the newest addition: it ships typed YAML inputs and JSONPath-style response transforms, making it useful for shell scripting.
A basic Python example:
from anthropic import Anthropic
# The client reads the ANTHROPIC_API_KEY environment variable by default.
client = Anthropic()
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
# The response content is a list of blocks; the first one holds the reply text.
print(message.content[0].text)
All requests to the Claude API must include either an API key or a workload identity token. The required headers are:
| Header | Value |
|---|---|
| x-api-key | Standard API key from the Console (sk-ant-...) |
| Authorization | Bearer <token> from POST /v1/oauth/token (Workload Identity Federation) |
| anthropic-version | API version, currently 2023-06-01 |
| content-type | application/json |
| anthropic-beta | Optional, comma-separated list of beta features |
API keys are scoped to workspaces, so an organization with multiple projects can issue separate keys with their own spend limits and rate limits. Admin API keys (sk-ant-admin-...) are a separate credential type used for organization management; only users with the admin role can mint them.
Workload Identity Federation, added in 2025, allows applications to exchange short-lived OAuth tokens for API access without storing long-lived API keys. This is the recommended path for production workloads in regulated environments. The API accepts either authentication mode but not both in the same request.
Claude models are available on the three major cloud platforms, providing enterprise customers with additional deployment options [5][16]. The direct Claude API tends to receive new features first, with cloud platforms following on a delay measured in weeks for most features and months for some.
All current Claude models are available on Amazon Bedrock, AWS's managed AI service. Bedrock integration provides:
Starting with Claude Sonnet 4.5 and Haiku 4.5, Bedrock offers two endpoint types: global endpoints (dynamic routing for maximum availability) and regional endpoints (data guaranteed within specific regions, at a 10% premium). For Opus 4.7, Haiku 4.5, and newer, the Messages API style Bedrock endpoint is the preferred integration path; older models still use the legacy Bedrock invocation interface. Bedrock is also the only platform that supports fine-tuning Claude models, with Claude 3 Haiku fine-tuning generally available since late 2024.
Claude is available through Vertex AI, offering:
Multi-region and regional endpoints carry the same 10% premium as Bedrock regional endpoints. Vertex AI imposes a tighter 30 MB request size limit than the direct API.
Microsoft Foundry (formerly Azure AI Foundry, Azure OpenAI's umbrella service) added Claude models in 2026, completing the three-cloud availability story. Foundry integrates Claude with Azure billing, Entra ID authentication, and Azure data residency regions. As with Bedrock and Vertex AI, Foundry pricing closely tracks the direct API but is published separately on Microsoft's pricing page.
Claude Managed Agents (the Sessions/Agents/Environments API) is available only through the direct Claude API and not through any cloud platform. This is the same pattern that has played out before with new features: cloud parity follows after the direct API has stabilized the design.
Anthropic enforces rate limits at the organization level using a token bucket algorithm, measured across three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM) [17]. Limits apply per model class. For most models, only uncached input tokens count against ITPM; cache reads are free against the rate limit, which can effectively multiply throughput for cache-heavy workloads.
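A client can mirror the token bucket locally to avoid sending requests that would be rejected. The sketch below is a generic token bucket, not a description of Anthropic's server-side implementation: capacity refills continuously at the per-minute rate, and the caller passes the current time explicitly so the behaviour is easy to test. `TokenBucket` and its parameters are illustrative names.

```python
class TokenBucket:
    """Continuously refilling bucket, e.g. one per dimension (RPM, ITPM, OTPM)."""

    def __init__(self, capacity: float, refill_per_minute: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = refill_per_minute / 60.0  # units added per second
        self.tokens = capacity                # start full
        self.updated = now

    def try_consume(self, amount: float, now: float) -> bool:
        """Refill for the elapsed time, then take `amount` if available."""
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False
```

In production the `now` argument would typically come from `time.monotonic()`, and a request would need to pass all three buckets (requests, input tokens, output tokens) before being sent.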
| Tier | Credit purchase | Max credit purchase | Monthly spend limit |
|---|---|---|---|
| Tier 1 | $5 | $100 | $100 |
| Tier 2 | $40 | $500 | $500 |
| Tier 3 | $200 | $1,000 | $1,000 |
| Tier 4 | $400 | $200,000 | $200,000 |
| Monthly Invoicing | Negotiated | n/a | No cap |
Organizations advance tiers automatically as their cumulative credit purchases hit each threshold. Tier 4 to Monthly Invoicing requires a sales conversation. Spend limits are user-configurable below the tier ceiling for cost control.
| Tier | Sonnet 4.x RPM | Sonnet 4.x ITPM | Sonnet 4.x OTPM | Opus 4.x RPM | Opus 4.x ITPM | Opus 4.x OTPM | Haiku 4.5 RPM | Haiku 4.5 ITPM | Haiku 4.5 OTPM |
|---|---|---|---|---|---|---|---|---|---|
| Tier 1 | 50 | 30,000 | 8,000 | 50 | 30,000 | 8,000 | 50 | 50,000 | 10,000 |
| Tier 2 | 1,000 | 450,000 | 90,000 | 1,000 | 450,000 | 90,000 | 1,000 | 450,000 | 90,000 |
| Tier 3 | 2,000 | 800,000 | 160,000 | 2,000 | 800,000 | 160,000 | 2,000 | 1,000,000 | 200,000 |
| Tier 4 | 4,000 | 2,000,000 | 400,000 | 4,000 | 2,000,000 | 400,000 | 4,000 | 4,000,000 | 800,000 |
Opus rate limits apply to combined traffic across Opus 4.7, 4.6, 4.5, 4.1, and 4. Sonnet 4.x limits apply across Sonnet 4.6, 4.5, and 4. The Batch API has its own per-tier RPM limits and processing-queue caps (100,000 requests per batch at all tiers; 100,000 to 500,000 in-queue requests by tier).
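Because three limits apply at once, the sustainable request rate for a workload is set by whichever limit binds first. The sketch below estimates that rate from average per-request token counts, using the Tier 2 Sonnet 4.x row from the table above. The `Limits` type and function name are hypothetical, and the calculation assumes a steady workload with uniform request sizes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Limits:
    rpm: int    # requests per minute
    itpm: int   # input tokens per minute
    otpm: int   # output tokens per minute


# Tier 2 limits for Sonnet 4.x, copied from the table above.
TIER2_SONNET = Limits(rpm=1_000, itpm=450_000, otpm=90_000)


def max_requests_per_minute(limits: Limits,
                            uncached_input_tokens: int,
                            output_tokens: int) -> Tuple[float, str]:
    """Return (sustainable requests per minute, name of the binding limit)."""
    candidates = {
        "RPM": float(limits.rpm),
        # Only uncached input counts against ITPM for most models.
        "ITPM": (limits.itpm / uncached_input_tokens
                 if uncached_input_tokens else float("inf")),
        "OTPM": (limits.otpm / output_tokens
                 if output_tokens else float("inf")),
    }
    binding = min(candidates, key=candidates.get)
    return candidates[binding], binding
```

For a request averaging 2,000 uncached input tokens and 500 output tokens, `max_requests_per_minute(TIER2_SONNET, 2000, 500)` gives 180 requests per minute with OTPM as the binding limit. If caching cuts the uncached input to 200 tokens and responses are short, the request limit itself becomes the ceiling.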
For applications requiring guaranteed throughput, Anthropic offers a Priority Tier with committed spend and enhanced service levels. Priority Tier traffic gets dedicated capacity allocation; the API exposes parallel anthropic-priority-input-tokens-* and anthropic-priority-output-tokens-* rate limit headers so customers can monitor priority capacity separately from on-demand.
Fast mode, in research preview as of May 2026 on Opus 4.6, provides significantly faster output at 6x standard pricing ($30 input / $150 output per MTok). It has its own rate limit pool and anthropic-fast-* response headers and is not available with the Batch API. Fast mode pricing applies across the full context window including beyond 200k input tokens. Prompt caching multipliers and data residency multipliers stack on top.
Every API response includes detailed rate limit headers so clients can track remaining capacity:
- anthropic-ratelimit-requests-limit/remaining/reset: Request rate limit
- anthropic-ratelimit-input-tokens-limit/remaining/reset: Input token limit
- anthropic-ratelimit-output-tokens-limit/remaining/reset: Output token limit
- retry-after: Seconds to wait before retry on a 429
- request-id: Globally unique request identifier for support tickets
- anthropic-organization-id: The org ID that issued the API key

The SDKs surface these headers as client-side properties so well-behaved applications can implement adaptive concurrency without parsing them by hand.
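For clients that do read the headers directly, a pause calculation might look like the following sketch. It works on a plain mapping of header names to values rather than any particular SDK object. The function name is hypothetical, and the code assumes the `-reset` headers carry an RFC 3339 timestamp and the `-remaining` headers carry integers; verify both against the current API reference before relying on them.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def seconds_to_wait(headers: Mapping[str, str],
                    now: Optional[datetime] = None) -> float:
    """Return how long to pause before the next request, based on response headers."""
    h = {k.lower(): v for k, v in headers.items()}

    # On a 429 the server states the wait explicitly; honor it first.
    if "retry-after" in h:
        return max(0.0, float(h["retry-after"]))

    # Otherwise pause only if some dimension reports zero remaining capacity.
    now = now or datetime.now(timezone.utc)
    wait = 0.0
    for dim in ("requests", "input-tokens", "output-tokens"):
        remaining = h.get(f"anthropic-ratelimit-{dim}-remaining")
        reset = h.get(f"anthropic-ratelimit-{dim}-reset")
        if remaining is None or reset is None or int(remaining) > 0:
            continue
        # Assumption: reset is an RFC 3339 timestamp such as 2026-05-01T12:00:30Z.
        reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
        wait = max(wait, (reset_at - now).total_seconds())
    return max(0.0, wait)
```

Taking the maximum across the three dimensions means the client waits for the slowest bucket to recover, since a request needs capacity in all of them.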
For Claude Opus 4.6, Opus 4.7, and newer models, the API supports an inference_geo parameter that restricts inference to the United States. US-only inference incurs a 1.1x multiplier on all token pricing, including input, output, cache writes, and cache reads. By default, the API uses global routing at standard pricing. Earlier models retain their existing pricing regardless of inference_geo settings. Rate limits are currently shared across inference_geo values, so US-only and global requests draw from the same pool.
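The pricing multipliers can be worked through with a small calculation. The sketch below uses the $5 input and $25 output per MTok base rates for Opus, the 1.1x US-only multiplier, and the 6x fast mode multiplier described earlier. It assumes the multipliers combine multiplicatively and leaves prompt caching out for brevity; the function name and parameters are hypothetical, not part of any SDK.

```python
def estimate_cost_usd(input_tokens: int,
                      output_tokens: int,
                      input_per_mtok: float = 5.0,
                      output_per_mtok: float = 25.0,
                      us_only: bool = False,
                      fast_mode: bool = False) -> float:
    """Estimate the cost of one request in USD, ignoring prompt caching."""
    multiplier = 1.0
    if fast_mode:
        multiplier *= 6.0   # fast mode: 6x standard pricing
    if us_only:
        multiplier *= 1.1   # US-only inference: 1.1x on all token pricing
    base = (input_tokens / 1_000_000 * input_per_mtok
            + output_tokens / 1_000_000 * output_per_mtok)
    return base * multiplier
```

Under these assumptions, one million input tokens plus one million output tokens cost $30 with global routing, about $33 with US-only inference, and about $198 when fast mode and US-only inference are combined.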
Anthropic's data handling policies state that API inputs and outputs are not used to train models, an important privacy guarantee for enterprise customers handling sensitive data. The default API retention is 30 days for most requests, with longer retention only when required by law or for safety violations (up to 2 years for flagged content). Third-party platforms (Bedrock, Vertex AI, Foundry) provide additional data residency controls through their regional infrastructure and operate under those platforms' compliance frameworks.
Anthropic offers two distinct data handling arrangements for organizations with specific compliance needs.
Under a Zero Data Retention (ZDR) arrangement, customer data is not stored at rest after the API response is returned, except where needed to comply with law or combat misuse. ZDR applies to the Messages API and Token Counting API, plus Claude Code when used with commercial API keys or through Claude Enterprise. ZDR is requested through the Anthropic sales team and configured per organization.
ZDR-eligible features include the Messages API, token counting, web search, web fetch, advisor tool, memory tool, context management and editing, fast mode, the 1M context window, adaptive thinking, citations, data residency, effort, extended thinking, PDF support inline, search results, bash and text editor tools, computer use, fine-grained tool streaming, prompt caching, structured outputs (qualified, schemas cached), and tool search.
Features that are not ZDR-eligible include batch processing (29-day retention required for async storage), code execution (container retention up to 30 days), programmatic tool calling, the Files API (files persist until deleted), agent skills, and the MCP connector. Console and Workbench usage, Managed Agents, Claude consumer products, and Claude Teams or Enterprise interfaces are also outside ZDR scope.
For organizations handling protected health information (PHI), Anthropic offers HIPAA-ready API access with a signed Business Associate Agreement (BAA). This is enforced at the organization level: a HIPAA-enabled organization automatically blocks API requests that include non-eligible features and returns a 400 error specifying which features are restricted. PHI must not appear in JSON schema property names, enum values, const values, or regular expression patterns, since cached schemas do not receive the same PHI protections as message content.
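One way to act on the schema restriction is to list every string that sits in one of the named positions so it can be reviewed before the schema is sent. The sketch below walks a JSON schema and yields property names, enum values, const values, and regular expression patterns together with their location. It is an illustrative audit helper, not an Anthropic tool. It also treats patternProperties keys as patterns, which is an assumption beyond the four positions named above, and deciding whether a given string is actually PHI is left to the reviewer.

```python
from typing import Any, Iterator, Tuple


def schema_strings_to_review(schema: Any, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (location, text) for each schema string that must not contain PHI."""
    if isinstance(schema, dict):
        for key, value in schema.items():
            child = f"{path}.{key}"
            if key in ("properties", "patternProperties") and isinstance(value, dict):
                for name, sub in value.items():
                    yield (f"{child}.{name}", name)   # property name or key pattern
                    yield from schema_strings_to_review(sub, f"{child}.{name}")
            elif key == "enum" and isinstance(value, list):
                for item in value:
                    yield (child, str(item))          # enum value
            elif key == "const":
                yield (child, str(value))             # const value
            elif key == "pattern" and isinstance(value, str):
                yield (child, value)                  # regular expression pattern
            else:
                # Descend into nested schemas (items, anyOf, and similar keywords).
                yield from schema_strings_to_review(value, child)
    elif isinstance(schema, list):
        for i, item in enumerate(schema):
            yield from schema_strings_to_review(item, f"{path}[{i}]")
```

Iterating over `schema_strings_to_review(my_schema)` produces a flat list that can be logged or checked against an internal deny-list before the request is made.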
HIPAA readiness applies only to the direct Claude API, not Bedrock or Vertex AI deployments, which have their own HIPAA paths through AWS and Google Cloud.
Anthropic publishes its compliance posture through the Trust Center at trust.anthropic.com. The current set includes SOC 2 Type II, ISO 27001, ISO 27701, ISO 27017, and ISO 27018 certifications, along with HIPAA readiness for the API and for Claude Enterprise. CSA STAR certifications are also available. Customer audit reports and the full HIPAA Implementation Guide are downloadable from the Trust Center.
The Anthropic status page at status.anthropic.com reports availability across the API, Console, claude.ai, and the Bedrock and Vertex AI integrations. Outages have been infrequent: notable incidents in 2024 and 2025 included intermittent latency during the Claude 3.5 launch surge and a brief regional outage around the Claude 4 release in May 2025. Anthropic publishes post-incident reports for major incidents and exposes a public RSS feed for status updates.
The Anthropic API and the OpenAI API are the two most widely used commercial LLM APIs, with the Gemini API increasingly competitive on specific axes such as multimodality and context length. The following table compares their key characteristics as of May 2026.
| Feature | Anthropic API | OpenAI API | Gemini API |
|---|---|---|---|
| Primary endpoint | Messages API (/v1/messages) | Chat Completions and Responses | generateContent and streamGenerateContent |
| Flagship model | Claude Opus 4.7 ($5/$25 per MTok) | GPT-5.4 (around $2.50/$15) | Gemini 2.5 Pro |
| Mid-tier model | Claude Sonnet 4.6 ($3/$15) | GPT-4.1 (around $2/$8) | Gemini 2.5 Flash |
| Budget model | Claude Haiku 4.5 ($1/$5) | GPT-4.1 Nano ($0.10/$0.40) | Gemini 2.5 Flash-Lite |
| Max context window | 1,000,000 tokens (Opus 4.7, Opus 4.6, Sonnet 4.6) | About 1,050,000 (GPT-5.4) | 2,000,000 (Gemini 2.5 Pro) |
| Reasoning mode | Adaptive thinking, per-request | Dedicated o-series models | Thinking mode toggle |
| Prompt caching | 90% discount on hits, 5m and 1h tiers | 50-90% discount | 75% discount |
| Batch API | 50% discount | 50% discount | 50% discount |
| Tool use | Server-side and client-side tools | Built-in tools and function calling | Function calling and built-in tools |
| Computer use | Beta on all 4.x models | Native in GPT-5.4 | Limited beta |
| Vision | All current models | GPT-4o and later | All Gemini models |
| First-party embeddings | None (Voyage AI is the recommended partner) | text-embedding-3 family | text-embedding-004 |
| Native MCP | Yes (connector and clients) | Limited | Limited |
| Skills runtime | Yes (Skills API) | None | None |
| SDKs | Python, TypeScript, Java, Go, Ruby, C#, PHP, CLI | Python, Node.js, .NET, Java, Go | Python, Node.js, Go, Java, Dart |
| Cloud availability | AWS Bedrock, Google Vertex AI, Microsoft Foundry | Microsoft Azure (exclusive through 2030) | Google Cloud Vertex AI |
| Fine-tuning | Claude 3 Haiku on Bedrock only | GPT-4o, GPT-4o mini, GPT-4.1 mini direct | Gemini 2.5 Flash and earlier |
| Streaming | SSE | SSE | SSE |
The APIs have converged on most surface features. Anthropic leads on multi-cloud distribution, MCP support, and the Skills runtime. OpenAI leads on breadth of product surface (image generation, speech, embeddings, real-time voice) and per-token pricing on flagship models. Gemini leads on raw context window size (2M tokens on 2.5 Pro) and tightly integrated multimodal output. Per-million pricing has converged within a factor of two across the three providers for comparable capabilities, so applications now tend to pick by feature priority rather than headline price.
The Anthropic API has matured into a full-featured development platform competing directly with the OpenAI API across enterprise and developer markets. The release of Claude Opus 4.6 and Sonnet 4.6 in February 2026 brought the full 1M token context window to standard pricing (previously, long context required beta access and premium rates), making Anthropic the first major provider to offer 1M context without surcharges on its latest models [2]. The April 2026 release of Claude Opus 4.7 added a new tokenizer and large gains on agentic coding benchmarks.
Several developments define the API's current position:
The API continues to evolve, with Anthropic maintaining a roughly quarterly release cadence for major model updates and a steady stream of incremental beta features in between. The pattern has held: ship a feature behind a beta header, iterate based on customer feedback, then graduate it to general availability with a stable name. The Messages API itself has proven remarkably durable.