Anthropic API

Updated Jun 21, 2026

The Anthropic API is the developer interface for Anthropic's Claude family of large language models, a Messages-based HTTP service hosted at https://api.anthropic.com for building conversational AI applications, AI agents, and automated workflows. Anthropic launched the API on March 14, 2023, alongside the original Claude and Claude Instant models, the same day OpenAI released GPT-4 ^[1]. The interface has proven durable: code written against the Messages API in 2024 still runs unchanged in 2026, with new capabilities delivered through opt-in beta headers rather than breaking endpoint changes. The API is also available through third-party cloud platforms including Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry (formerly Azure AI Foundry).

As of May 2026, the Anthropic API serves three model tiers: Claude Opus (highest capability), Claude Sonnet (balanced performance and cost), and Claude Haiku (fastest and most affordable). The current flagship is Claude Opus 4.7, released April 16, 2026, which sits alongside Claude Sonnet 4.6, Claude Opus 4.6, and Claude Haiku 4.5. The latest generation offers up to one million tokens of context at standard pricing, adaptive thinking for complex reasoning, and a deep stack of developer features including tool use, prompt caching, batch processing, computer use, citations, Skills, MCP connectors, and vision ^[2].

The API follows REST conventions. The current API version header is anthropic-version: 2023-06-01, and opt-in features are enabled per request through the anthropic-beta header.

History

When was the Anthropic API launched?

Anthropic released its API on March 14, 2023, making two models available: Claude (the more capable variant) and Claude Instant (a faster, lower-cost derivative) ^[1]^[3]. Access was initially limited to selected partners, including Notion, Quora (through its Poe chatbot), DuckDuckGo, and Robin AI. The API launch came approximately four months after ChatGPT's release and the same day OpenAI released GPT-4, placing Anthropic squarely in competition with OpenAI from the start.

The original API used a text completions format with a \n\nHuman: and \n\nAssistant: turn structure, distinct from OpenAI's message-based approach. The legacy text completions endpoint (POST /v1/complete) remains accessible for backward compatibility but has been formally deprecated in favor of the Messages API.
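To make the difference concrete, the sketch below renders a Messages-style conversation into the legacy Human/Assistant prompt string. The helper name to_legacy_prompt is hypothetical, and the sketch assumes plain-string content and a conversation that ends on a user turn.

```python
def to_legacy_prompt(messages, system=None):
    """Render alternating user/assistant turns in the legacy Human/Assistant format.

    Hypothetical helper; assumes each message's content is a plain string and
    that the last message comes from the user.
    """
    parts = []
    if system:
        # Legacy prompts carried system-style instructions as plain text
        # placed before the first Human turn.
        parts.append(system)
    for msg in messages:
        label = "Human" if msg["role"] == "user" else "Assistant"
        parts.append(f"\n\n{label}: {msg['content']}")
    # The prompt ends with an open Assistant turn for the model to complete.
    parts.append("\n\nAssistant:")
    return "".join(parts)


prompt = to_legacy_prompt([{"role": "user", "content": "Hello, Claude"}])
print(repr(prompt))
```

The Messages API replaced this string convention with a structured messages array, so the model no longer depends on the client getting the turn markers exactly right.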

Claude 2 and public availability (July 2023)

Claude 2, released in July 2023, became the first Anthropic model available to the general public without an application process, reachable through both an API and a public beta website ^[4]^[28]. It introduced a 100K token context window, a major differentiator at a time when most competing models topped out at 4K to 32K tokens. Claude 2 also showed significant improvements on academic benchmarks, scoring 76.5% on the multiple-choice section of the Uniform Bar Exam (up from 73.0% for Claude 1.3) and above the 90th percentile on the GRE reading and writing sections ^[28].

Cloud platform partnerships (2023)

In September 2023, Amazon announced a partnership with Anthropic, initially investing $1.25 billion (later expanded to $4 billion, and then to $8 billion in November 2024), and Claude became a core model on Amazon Bedrock ^[5]. The following month, Google invested $500 million in Anthropic, with commitments for $1.5 billion more over time; Claude models arrived on Google Cloud Vertex AI with the Claude 3 family in 2024. These partnerships gave enterprise customers the option of accessing Claude through their existing cloud infrastructure, with data residency guarantees and unified billing.

Claude 3 family (March 2024)

The Claude 3 release on March 4, 2024, introduced the three-tier model structure that Anthropic continues to use: Opus (most capable), Sonnet (balanced), and Haiku (fastest) ^[6]. Claude 3 Opus set new benchmarks in reasoning and analysis, while Haiku offered exceptionally fast responses at a fraction of the cost. This release also brought native vision capabilities to all tiers, allowing the API to process images alongside text. The Messages API, introduced earlier alongside Claude 2.1, fully supplanted the text completions format with this generation.

Messages API and modern features (2024-2025)

Throughout 2024 and 2025, Anthropic expanded the API's feature set substantially. Each addition tended to land first as a beta header, mature for several months under customer use, then graduate to GA. Key milestones:

April 2024: Tool use (function calling) entered public beta, reaching general availability at the end of May 2024
August 2024: Prompt caching launched, offering up to 90% cost reduction on repeated input content ^[7]
October 2024: Message Batches API launched in public beta on October 8, 2024 with 50% cost savings for asynchronous workloads, reaching general availability December 17, 2024 ^[13]
October 2024: Computer use tool in beta, enabling Claude to interact with desktop environments ^[8]
November 2024: Open-sourcing of the Model Context Protocol (MCP) on November 25, 2024 as a standard for connecting agents to external data sources and tools ^[25]
February 2025: Extended thinking mode for enhanced reasoning on complex tasks ^[9]
April 2025: Files API in beta, supporting persistent uploads referenced by file ID
May 2025: Claude 4 models launched, along with the text editor tool and code execution tool
August 2025: Memory tool introduced for cross-session knowledge retention
September 2025: Claude Sonnet 4.5 and the renamed Claude Agent SDK
October 2025: Claude Haiku 4.5 joins the 4.5 series; Skills API in beta
November 2025: Updated MCP connector beta (mcp-client-2025-11-20) for connecting to remote MCP servers from the Messages API
February 2026: Claude Opus 4.6 and Sonnet 4.6 with 1M context at standard pricing and adaptive thinking; fast mode beta
April 2026: Claude Opus 4.7 released on April 16, 2026 with a new tokenizer and step-change improvements in agentic coding ^[29]

Claude Managed Agents (2026)

In 2026 Anthropic added a dedicated managed agent runtime to the API, exposing three new resource types alongside Messages: Agents (/v1/agents), Sessions (/v1/sessions), and Environments (/v1/environments). Sessions run in stateful cloud containers managed by Anthropic, billed on session-hours, and are streamed back to the client. This is the API-side counterpart to features that previously required developers to bring their own infrastructure with the Claude Agent SDK.

Available APIs and endpoints

The Claude API spans several endpoint groups, some generally available and others in beta. The Messages API is the center; everything else either feeds into it (Files, Skills) or wraps around it (Batches, Sessions).

General availability

Endpoint	Path	Purpose
Messages	`POST /v1/messages`	Primary endpoint for all Claude interactions
Message Batches	`POST /v1/messages/batches`	Asynchronous bulk processing at 50% discount
Token Counting	`POST /v1/messages/count_tokens`	Pre-flight token counting before sending
Models	`GET /v1/models`	List available models with metadata and capabilities
Rate Limits	`GET /v1/organizations/usage_report/rate_limits`	Programmatic access to current rate limits

Beta endpoints

Endpoint	Path	Beta header
Files	`POST /v1/files`, `GET /v1/files/{id}`	`files-api-2025-04-14`
Skills	`POST /v1/skills`, `GET /v1/skills`	`skills-2025-10-02`
Agents	`POST /v1/agents`, `GET /v1/agents`	Managed Agents preview
Sessions	`POST /v1/sessions`, `GET /v1/sessions/{id}/stream`	Managed Agents preview
Environments	`POST /v1/environments`, `GET /v1/environments`	Managed Agents preview

Admin API

The Admin API allows organizations to manage workspaces, members, and API keys programmatically. It uses a separate credential type, the Admin API key (prefixed sk-ant-admin...), which can only be provisioned by users holding the admin role through the Console. Admin API endpoints sit under /v1/organizations/:

Operation	Path	Method
List workspaces	`/v1/organizations/workspaces`	GET
Create workspace	`/v1/organizations/workspaces`	POST
Get workspace	`/v1/organizations/workspaces/{id}`	GET
Archive workspace	`/v1/organizations/workspaces/{id}/archive`	POST
List members	`/v1/organizations/workspaces/{id}/members`	GET
Add member	`/v1/organizations/workspaces/{id}/members`	POST
Remove member	`/v1/organizations/workspaces/{id}/members/{user_id}`	DELETE

Note that Anthropic does not offer first-party text embeddings at the API level. The recommended embeddings partner is Voyage AI, now owned by MongoDB; example notebooks for Voyage models live in the official Anthropic Cookbook repository. Developers who need vector retrieval typically pair Claude with Voyage embeddings or another provider.

Request size limits

Endpoint	Maximum request size
Messages, Token Counting	32 MB
Batch API	256 MB
Files API	500 MB per file, 500 GB per organization
Sessions, Agents, Environments	32 MB

Exceeding these limits returns a 413 request_too_large error. Vertex AI imposes a tighter 30 MB limit and Bedrock 20 MB, so applications that hit cloud platform limits sometimes need to switch to the direct API for the largest payloads.
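A cheap pre-flight check can catch oversized payloads before they come back as a 413. This is a minimal sketch using the ceilings from the table above; the target labels are made up for illustration, and treating 1 MB as 1024 × 1024 bytes is an assumption, so leave headroom near the limit.

```python
import json

# Ceilings from the request size table. 1 MB = 1024 * 1024 bytes is an assumption.
MB = 1024 * 1024
MAX_REQUEST_BYTES = {
    "direct_messages": 32 * MB,
    "direct_batches": 256 * MB,
    "vertex_ai": 30 * MB,
    "bedrock": 20 * MB,
}


def request_size(body: dict) -> int:
    """Size of the JSON-encoded request body in bytes."""
    return len(json.dumps(body).encode("utf-8"))


def fits(body: dict, target: str) -> bool:
    """True if the body is within the ceiling for the given target."""
    return request_size(body) <= MAX_REQUEST_BYTES[target]


# A request carrying about 21 MB of placeholder text standing in for base64 image data.
big = {"messages": [{"role": "user", "content": "A" * (21 * MB)}]}
print(fits(big, "bedrock"), fits(big, "direct_messages"))
```

A payload like this one would be rejected on Bedrock but accepted by the direct API, which is the situation the paragraph above describes.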

Available models and pricing

The pricing tables below reflect the published rates as of May 2026 ^[2]. All prices are in USD per million tokens (MTok). Cache hit pricing applies to both 5-minute and 1-hour cached reads at 10% of the base input rate.

Current models

Model	Input	5m cache write	1h cache write	Cache hit	Output	Context	Max output
Claude Opus 4.7	$5.00	$6.25	$10.00	$0.50	$25.00	1,000,000	128k
Claude Sonnet 4.6	$3.00	$3.75	$6.00	$0.30	$15.00	1,000,000	64k
Claude Haiku 4.5	$1.00	$1.25	$2.00	$0.10	$5.00	200,000	64k

Available legacy models

Model	Input	5m cache write	1h cache write	Cache hit	Output	Status
Claude Opus 4.6	$5.00	$6.25	$10.00	$0.50	$25.00	Active
Claude Opus 4.5	$5.00	$6.25	$10.00	$0.50	$25.00	Active
Claude Opus 4.1	$15.00	$18.75	$30.00	$1.50	$75.00	Active
Claude Opus 4	$15.00	$18.75	$30.00	$1.50	$75.00	Deprecated, retires June 15, 2026
Claude Sonnet 4.5	$3.00	$3.75	$6.00	$0.30	$15.00	Active
Claude Sonnet 4	$3.00	$3.75	$6.00	$0.30	$15.00	Deprecated, retires June 15, 2026
Claude Sonnet 3.7	$3.00	$3.75	$6.00	$0.30	$15.00	Deprecated
Claude Haiku 3.5	$0.80	$1.00	$1.60	$0.08	$4.00	Deprecated
Claude Haiku 3	$0.25	$0.30	$0.50	$0.03	$1.25	Active
Claude Opus 3	$15.00	$18.75	$30.00	$1.50	$75.00	Deprecated

Opus 4.7 uses a new tokenizer compared to previous models, contributing to its improved performance on a wide range of tasks. The new tokenizer may map the same input to roughly 1.0 to 1.35 times as many tokens depending on content type, so cost comparisons across model generations should be done by reading actual usage from API responses rather than naive per-token math ^[29]. Anthropic also cautions that the model follows instructions more literally than its predecessors: "where previous models interpreted instructions loosely or skipped parts entirely, Opus 4.7 takes the instructions literally," so prompts tuned for earlier models can behave differently ^[29].
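For budgeting before real usage data exists, the quoted 1.0x to 1.35x range can be turned into a cost band. This is a rough sketch only: it assumes the prompt was measured with an older tokenizer and uses the $5 per MTok input rate from the pricing table; actual spend should come from the usage field of real responses.

```python
# Budgeting sketch: bound the Opus 4.7 input cost of a prompt whose size was
# measured with an older tokenizer, using the 1.0x to 1.35x range quoted above.
OPUS_4_7_INPUT_USD_PER_MTOK = 5.00
TOKENIZER_RATIO_RANGE = (1.0, 1.35)


def opus_4_7_input_cost_range(old_tokenizer_tokens: int) -> tuple:
    """Return (low, high) USD estimates for one request's input tokens."""
    low_ratio, high_ratio = TOKENIZER_RATIO_RANGE
    per_token = OPUS_4_7_INPUT_USD_PER_MTOK / 1_000_000
    return (old_tokenizer_tokens * low_ratio * per_token,
            old_tokenizer_tokens * high_ratio * per_token)


low, high = opus_4_7_input_cost_range(200_000)
print(f"${low:.2f} to ${high:.2f}")
```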

Starting with Claude Opus 4.5, the Opus tier became dramatically cheaper: $5 per MTok input and $25 per MTok output, compared to $15 and $75 for Opus 4.1, a 67% reduction in both directions. Sonnet pricing has held steady at $3 / $15 since the Claude 3 generation.

Model deprecation policy

Anthropic publishes a deprecation calendar at docs.anthropic.com/en/docs/about-claude/model-deprecations and commits to at least 60 days of notice before retiring a publicly released model. Claude Sonnet 4 and Opus 4 (the original 2025 versions) retire on June 15, 2026. Claude 3.7 Sonnet, Haiku 3.5, and Opus 3 are also marked deprecated, though they remain callable until their retirement dates. After retirement, requests to a sunsetted model ID return a 404 not_found_error, and applications must migrate to a current alias such as claude-opus-4-7 or claude-sonnet-4-6.

Core API features

Messages API

The Messages API (POST /v1/messages) is the primary endpoint for all Claude interactions ^[10]. It uses a structured format with:

System prompt: Optional instructions that set Claude's behavior, persona, or constraints
Messages array: Alternating user and assistant messages forming the conversation
Model selection: Specify which Claude model to use
Parameters: max_tokens (required), temperature, top_p, stop sequences, and tool definitions

A basic request sends a system prompt and user message, receiving back an assistant response with metadata including token usage counts. The API returns a structured JSON response containing the model's output, stop reason, and detailed usage information.
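Reading those fields out of a non-streaming response takes only a few lines. In this sketch the JSON body is a made-up sample shaped like a Messages API response (the ID and token counts are invented), and only text content blocks are collected.

```python
import json

# Sample body shaped like a Messages API response; all values are illustrative.
raw = """{
  "id": "msg_example",
  "type": "message",
  "role": "assistant",
  "model": "claude-opus-4-7",
  "content": [{"type": "text", "text": "Hello! How can I help?"}],
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 12, "output_tokens": 8}
}"""


def summarize(body: str) -> dict:
    """Pull the answer text, stop reason, and token total from a response body."""
    msg = json.loads(body)
    # Concatenate text blocks only; tool_use or thinking blocks are skipped here.
    text = "".join(b["text"] for b in msg["content"] if b["type"] == "text")
    usage = msg["usage"]
    return {
        "text": text,
        "stop_reason": msg["stop_reason"],
        "total_tokens": usage["input_tokens"] + usage["output_tokens"],
    }


print(summarize(raw))
```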

Tool use (function calling)

Claude's tool use capabilities allow developers to extend the model's functionality by defining external tools that Claude can invoke ^[11]. The developer describes available tools with names, descriptions, and JSON Schema input definitions; Claude decides when to use them and generates structured arguments. The API then returns a tool_use content block, and the developer executes the tool and sends the output back in a new user message containing a tool_result content block that references the tool_use block's id.
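The round trip can be sketched with plain dictionaries. The get_weather tool and the run_tool function are hypothetical stand-ins, and the assistant content is simulated rather than taken from a real response; only the block shapes (tool definition, tool_use, tool_result) follow the pattern described above.

```python
# Hypothetical tool definition: name, description, and a JSON Schema for its input.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}


def run_tool(name: str, args: dict) -> str:
    # Stand-in implementation; a real application would call a weather service.
    if name == "get_weather":
        return f"Sunny, 22 C in {args['city']}"
    raise ValueError(f"unknown tool: {name}")


def tool_results_message(assistant_content: list) -> dict:
    """Build the user message that answers every tool_use block in an assistant turn."""
    results = []
    for block in assistant_content:
        if block["type"] == "tool_use":
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # ties the result to the request
                "content": run_tool(block["name"], block["input"]),
            })
    return {"role": "user", "content": results}


# Simulated assistant content shaped like a tool_use response.
assistant_content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": {"city": "Paris"}},
]
reply = tool_results_message(assistant_content)
```

The next request would append the assistant turn and then this user message to the messages array, with weather_tool included in tools.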

Tool use supports several modes:

Tool choice	Behavior
`auto`	Claude decides whether to use tools
`any`	Claude must use at least one tool
`tool` (specific name)	Claude must use the specified tool
`none`	Tools are disabled for this request

Parallel tool use, added with the Claude 4 family in May 2025, lets the model emit multiple tool_use blocks in a single turn so the client can run them concurrently. Token-efficient tool use, introduced as a beta for Sonnet 3.7, reduces the output tokens Claude spends generating tool calls; Claude 4 and later models include this behavior by default.
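On the client side, parallel tool use mostly means running the requested calls concurrently and returning every result together in one user message. A minimal sketch with a thread pool; the search tool and its latency are hypothetical, and the tool_use blocks are simulated.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def search(query: str) -> str:
    # Hypothetical slow tool; the sleep stands in for network latency.
    time.sleep(0.05)
    return f"results for {query!r}"


def run_tool_uses_in_parallel(tool_uses: list) -> dict:
    """Run every tool_use block concurrently and answer them in one user message."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so outputs line up with tool_uses.
        outputs = list(pool.map(lambda block: search(block["input"]["query"]), tool_uses))
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": block["id"], "content": out}
            for block, out in zip(tool_uses, outputs)
        ],
    }


# Simulated assistant turn containing three tool_use blocks.
tool_uses = [
    {"type": "tool_use", "id": f"toolu_{i}", "name": "search", "input": {"query": q}}
    for i, q in enumerate(["pricing", "rate limits", "batches"])
]
reply = run_tool_uses_in_parallel(tool_uses)
```

Threads suit I/O-bound tools like HTTP lookups; CPU-heavy tools would be better served by a process pool.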

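Anthropic also provides built-in tools whose schemas are defined by Anthropic. Some are server tools that run on Anthropic's infrastructure; those marked client-side below are executed by the developer's application instead: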

Web search (web_search_20260209): Searches the web and returns results, $10 per 1,000 searches
Web fetch (web_fetch_20260209): Fetches and processes web page content, no additional charge beyond standard token costs
Code execution (code_execution_20250825): Runs Python in a sandboxed container; free when used with web search or web fetch, otherwise $0.05 per container-hour beyond 1,550 free hours per month
Text editor (text_editor_20250728 for Opus 4.7+): Views, creates, and edits files through view, create, str_replace, and insert commands; client-side execution
Bash (bash_20250124): Runs shell commands; client-side execution
Computer use (computer_20251124): Controls desktop environments via screenshots and mouse/keyboard input; client-side execution
Memory (memory_20250818): Persistent file-based memory across sessions; client-side storage
Tool search: Dynamically loads tool definitions on demand for very large tool catalogs
Advisor: Returns advisor model output for evaluation flows

Vision

All current Claude models support vision, accepting images in requests alongside text ^[12]. Images can be provided as base64-encoded data, URLs, or via Files API references. Claude can analyze photographs, charts, diagrams, screenshots, documents, and other visual content. Vision is priced by image size: each image consumes tokens roughly in proportion to its pixel count. On older models, images larger than roughly 1.15 megapixels or 1568 pixels on the longest edge are downscaled before processing; Opus 4.7 raises the edge limit to 2576 pixels with 1:1 coordinate mapping for computer use workflows.
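A rough per-image token estimate follows from those limits. This sketch assumes the approximation of about width × height / 750 tokens from Anthropic's vision guidance and a simple aspect-preserving downscale; the API's exact resize rules may differ, and for Opus 4.7 the max_edge argument would change while its pixel cap is not given here.

```python
import math


def estimate_image_tokens(width: int, height: int,
                          max_edge: int = 1568, max_pixels: int = 1_150_000) -> int:
    """Rough token estimate for one image.

    Assumes the image is scaled down (aspect ratio preserved) until it fits both
    the longest-edge and total-pixel limits, then costs about width * height / 750
    tokens.
    """
    scale = min(1.0,
                max_edge / max(width, height),
                math.sqrt(max_pixels / (width * height)))
    w, h = int(width * scale), int(height * scale)
    return math.ceil(w * h / 750)


print(estimate_image_tokens(750, 1000))   # small enough to need no resizing
print(estimate_image_tokens(4000, 3000))  # downscaled first, so well under 2,000 tokens
```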

Extended thinking and adaptive thinking

Extended thinking is a feature that allows Claude to engage in deeper, step-by-step reasoning before producing a response ^[9]. When enabled, Claude generates internal thinking tokens that work through the problem systematically, similar to the approach used by OpenAI's o-series reasoning models.

The original implementation used a thinking: {type: "enabled", budget_tokens: N} parameter, where the developer set a maximum budget for reasoning tokens. Starting with Claude Opus 4.6 and Sonnet 4.6, Anthropic introduced adaptive thinking, which uses an effort parameter (low, medium, or high) and lets the model decide how deeply to think. By Claude Opus 4.7, manual extended thinking is no longer supported and adaptive thinking is the only mode.

```python
# Adaptive thinking (Claude Opus 4.7)
thinking={"type": "adaptive", "effort": "medium"}

# Manual extended thinking (Sonnet 4.6 and earlier)
thinking={"type": "enabled", "budget_tokens": 10000, "display": "summarized"}
```

Extended thinking tokens are billed at the standard output token rate. For complex tasks, thinking can take several minutes, and the thinking tokens can significantly exceed the visible output. Sonnet 4.6 is notable for being a hybrid model: developers can toggle adaptive thinking on or off per request, choosing between faster standard responses and deeper reasoning as needed. Sonnet 4.6 supports both adaptive and manual modes; Opus 4.6 supports both but recommends adaptive; Opus 4.7 only supports adaptive.
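For models that still accept manual mode, a small builder can catch invalid budgets before the request is sent. This sketch enforces two constraints documented for manual extended thinking: a budget of at least 1,024 tokens, and a budget below max_tokens (interleaved thinking with tools relaxes the second rule and is not handled here). The model alias and prompt are illustrative.

```python
def manual_thinking(budget_tokens: int, max_tokens: int) -> dict:
    """Build a manual extended-thinking config, validating the budget first."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1,024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be less than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}


# Illustrative request body for a model that supports manual mode.
request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 16000,
    "thinking": manual_thinking(10000, 16000),
    "messages": [{"role": "user", "content": "Prove that there are infinitely many primes."}],
}
```

Because thinking tokens are billed as output, the budget also acts as a rough cap on reasoning spend per request.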

Prompt caching

Prompt caching reduces costs and latency by reusing previously processed portions of prompts across API calls ^[7]. Instead of reprocessing the same system prompt, document, or conversation prefix on every request, the API reads from cache at a fraction of the standard input price.

There are two cache durations with different write costs:

Cache operation	Cost multiplier	Duration
5-minute cache write	1.25x base input price	5 minutes
1-hour cache write	2.0x base input price	1 hour
Cache read (hit)	0.1x base input price	Same duration as write

A cache hit costs 10% of the standard input price, a 90% reduction on cached content. For the 5-minute cache, a single read after the initial write already saves money (1.25x write plus 0.1x read equals 1.35x total, versus 2.0x for two uncached requests). The 1-hour cache costs slightly more than going uncached when the prefix is read only once (2.0x plus 0.1x equals 2.1x, versus 2.0x) and comes out ahead from the second read onward (2.2x versus 3.0x). Caching is especially valuable for applications with long, stable system prompts or repeated document analysis.
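The write-versus-read trade-off can be tabulated directly from the multipliers in the table above. A small sketch; costs are in units of the base input price for the cached prefix, and the function name is illustrative.

```python
def relative_cost(reads: int, write_multiplier: float, read_multiplier: float = 0.1) -> tuple:
    """Cost of one cache write plus `reads` cache hits, versus sending the same
    prefix uncached (1 + reads) times, both in units of the base input price."""
    cached = write_multiplier + reads * read_multiplier
    uncached = 1.0 + reads
    return cached, uncached


for label, mult in [("5-minute", 1.25), ("1-hour", 2.0)]:
    for reads in range(1, 4):
        cached, uncached = relative_cost(reads, mult)
        print(f"{label} cache, {reads} read(s): {cached:.2f}x vs {uncached:.2f}x uncached")
```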

Minimum cacheable prompt lengths vary by model:

Model	Minimum tokens to cache
Opus 4.7, 4.6, 4.5	4,096
Opus 4.1, 4	1,024
Sonnet 4.6	2,048
Sonnet 4.5, 4, 3.7	1,024
Haiku 4.5	4,096
Haiku 3.5	2,048

Prompts below the minimum are silently processed without caching rather than returning an error. Cache lookups use a 20-block lookback window, so in long conversations developers should add multiple cache breakpoints to make sure the system still finds a hit.
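In request terms, a cache breakpoint is a cache_control marker on a content block. The sketch below marks a long system prompt for the 1-hour cache. The model alias, the placeholder text, and the ttl choice are illustrative assumptions, and depending on the model and date the 1-hour option may need an opt-in beta header.

```python
# Placeholder reference text; long enough to exceed the minimum cacheable length.
LONG_REFERENCE_TEXT = "Product manual section. " * 2000

request_body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You answer questions about the product manual."},
        {
            "type": "text",
            "text": LONG_REFERENCE_TEXT,
            # Breakpoint: the prefix up to and including this block is cached.
            # "ttl": "1h" selects the 1-hour cache; omit it for the 5-minute default.
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        },
    ],
    "messages": [{"role": "user", "content": "How do I reset the device?"}],
}
```

Whether a request wrote to or read from the cache is reported in the response's usage object as cache_creation_input_tokens and cache_read_input_tokens.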

Prompt caching can be combined with other discounts. The Batch API's 50% discount stacks with caching, and long context pricing multipliers also stack, enabling substantial compound savings. For most current models, cache reads do not count against the input tokens-per-minute (ITPM) rate limit, which means a high cache hit ratio effectively raises throughput.

Batch API

The Batch API (/v1/messages/batches) processes large volumes of requests asynchronously at a 50% discount on all token costs ^[13]. Most batches finish within an hour, and any requests still unprocessed after 24 hours expire.

Model	Batch input	Batch output
Claude Opus 4.7	$2.50	$12.50
Claude Opus 4.6	$2.50	$12.50
Claude Opus 4.5	$2.50	$12.50
Claude Opus 4.1	$7.50	$37.50
Claude Sonnet 4.6	$1.50	$7.50
Claude Sonnet 4.5	$1.50	$7.50
Claude Sonnet 4	$1.50	$7.50
Claude Haiku 4.5	$0.50	$2.50
Claude Haiku 3.5	$0.40	$2.00
Claude Haiku 3	$0.125	$0.625

Batch requests are submitted as collections of individual message requests, each with a custom ID for tracking. Results are stored for 29 days, after which they are purged. The 50% discount combined with prompt caching can yield total savings exceeding 90% compared to standard synchronous pricing. Batches are not eligible for ZDR or fast mode.
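Each batch entry pairs a caller-chosen custom_id with an ordinary Messages API parameter object. A minimal builder sketch; the ticket IDs, prompts, model alias, and max_tokens value are illustrative, and custom IDs should be unique within a batch.

```python
def build_batch(prompts: dict, model: str = "claude-haiku-4-5") -> list:
    """Turn {custom_id: prompt} pairs into Message Batches request entries."""
    return [
        {
            "custom_id": custom_id,
            "params": {
                "model": model,
                "max_tokens": 512,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for custom_id, prompt in prompts.items()
    ]


batch_requests = build_batch({
    "ticket-001": "Summarize this support ticket: ...",
    "ticket-002": "Summarize this support ticket: ...",
})
# With the Python SDK, this list would be submitted via
# client.messages.batches.create(requests=batch_requests).
```

Results are not guaranteed to come back in submission order, so applications should match them to inputs by custom_id.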

Computer use

The computer use tool, introduced in beta in October 2024, enables Claude to interact with desktop environments by viewing screenshots and performing mouse and keyboard actions ^[8]. When given access to a computer environment (typically via a Docker container or virtual machine), Claude can navigate file systems and applications, fill out forms and spreadsheets, interact with web browsers, and execute multi-step workflows across applications.

Current versions are differentiated by tool type string:

computer_20251124: Opus 4.7, Opus 4.6, Sonnet 4.6, Opus 4.5; adds the zoom action for inspecting regions at full resolution
computer_20250124: Sonnet 4.5, Haiku 4.5, Opus 4.1, Sonnet 4, Opus 4, Sonnet 3.7

Computer use follows standard tool use pricing, plus 466 to 499 tokens of system prompt overhead, 735 tokens per tool definition for Claude 4.x, and additional tokens from the screenshot images Claude analyzes. The feature requires a beta header (computer-use-2025-11-24 or computer-use-2025-01-24) and is ZDR-eligible because the desktop environment and its files live on the client side, with screenshots reaching the API only as image content inside individual requests. Anthropic ships a reference implementation in a Docker container in the anthropic-quickstarts repository.
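On the client, a computer use harness declares the tool and then maps each requested action onto real input events. This sketch uses the computer_20250124 tool type (paired with the computer-use-2025-01-24 beta header); the 1024x768 display size is an example value, and the handlers are hypothetical placeholders for code that would drive an actual VM or container.

```python
# Tool definition sent in the request's tools array.
computer_tool = {
    "type": "computer_20250124",
    "name": "computer",
    "display_width_px": 1024,
    "display_height_px": 768,
}


# Hypothetical handlers: a real harness would drive the VM here (for example
# with xdotool) and return an actual screenshot image.
def take_screenshot() -> dict:
    return {"action": "screenshot"}


def left_click(x: int, y: int) -> dict:
    return {"action": "left_click", "x": x, "y": y}


def type_text(text: str) -> dict:
    return {"action": "type", "text": text}


def dispatch(tool_input: dict) -> dict:
    """Route the input of one computer tool_use block to a local handler."""
    action = tool_input["action"]
    if action == "screenshot":
        return take_screenshot()
    if action == "left_click":
        x, y = tool_input["coordinate"]
        return left_click(x, y)
    if action == "type":
        return type_text(tool_input["text"])
    # Other actions (key, scroll, mouse_move, ...) are left out of this sketch.
    raise NotImplementedError(f"unhandled action: {action}")


print(dispatch({"action": "left_click", "coordinate": [512, 384]}))
```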

Citations

The citations feature, introduced in early 2025, allows Claude to produce grounded responses with character-level pointers back to source documents. When a developer enables citations on a document content block, Claude's response is broken into multiple text blocks, each annotated with the specific cited text, document index, and either character indices, page numbers, or content block indices.

Three document types are supported:

Type	Best for	Chunking	Citation format
Plain text	Prose, simple documents	Sentence	Character indices (0-indexed)
PDF	PDF files with extractable text	Sentence	Page numbers (1-indexed)
Custom content	Lists, transcripts, fine-grained citations	None (use blocks as-is)	Content block indices

Citations work with all current models except Haiku 3, and are compatible with prompt caching, token counting, and batch processing. They are not compatible with structured outputs because the citation format requires interleaving citation blocks with text. The cited_text field in the response does not count toward output tokens, which makes citations cheaper than asking the model to repeat quotes in its prose.
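Enabling citations is a flag on the document block, and the answer comes back as text blocks carrying citation objects. In this sketch the document and question are illustrative and the response content is simulated, but it follows the plain-text case: char_location citations whose 0-indexed start and end indices slice back into the source text.

```python
# A plain-text document block with citations turned on.
document_block = {
    "type": "document",
    "source": {"type": "text", "media_type": "text/plain",
               "data": "The grass is green. The sky is blue."},
    "title": "Colors",
    "citations": {"enabled": True},
}
request_messages = [{
    "role": "user",
    "content": [document_block, {"type": "text", "text": "What color is the grass?"}],
}]

# Simulated response content: some text blocks carry char_location citations.
response_content = [
    {"type": "text", "text": "According to the document, "},
    {"type": "text", "text": "the grass is green",
     "citations": [{"type": "char_location", "cited_text": "The grass is green. ",
                    "document_index": 0, "document_title": "Colors",
                    "start_char_index": 0, "end_char_index": 20}]},
]


def footnotes(content: list) -> list:
    """Pair each cited span of the answer with the source text it points to."""
    out = []
    for block in content:
        for c in block.get("citations") or []:
            out.append((block["text"], c["cited_text"],
                        c["start_char_index"], c["end_char_index"]))
    return out


print(footnotes(response_content))
```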

Files API

The Files API, in beta since April 2025, lets developers upload files once and reference them across multiple Messages requests. Files are addressed by file_id and persist until explicitly deleted, with a 500 MB per-file ceiling and 500 GB per-organization storage cap. Supported types include PDF (application/pdf), plain text, images (jpeg, png, gif, webp), and various dataset formats for the code execution tool.

File API operations are free; developers pay only for the tokens consumed when a file's contents are loaded into a Messages request. The Files API requires the files-api-2025-04-14 beta header and is not currently supported on Bedrock or Vertex AI. It is also not ZDR-eligible because files necessarily persist server-side.
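Once a file is uploaded, later requests point at it by ID instead of re-sending the bytes. In this sketch FILE_ID is a placeholder; a real ID would come back from an upload call such as the Python SDK's client.beta.files.upload(...), and every request that uses it still needs the files-api-2025-04-14 beta header.

```python
# Placeholder ID; a real one is returned by the upload call.
FILE_ID = "file_placeholder123"
FILES_BETA_HEADER = {"anthropic-beta": "files-api-2025-04-14"}


def ask_about_file(file_id: str, question: str) -> dict:
    """User message that references an uploaded PDF by ID instead of inlining it."""
    return {
        "role": "user",
        "content": [
            {"type": "document", "source": {"type": "file", "file_id": file_id}},
            {"type": "text", "text": question},
        ],
    }


message = ask_about_file(FILE_ID, "Summarize the key findings.")
```

Uploaded images would be referenced the same way, with an image block in place of the document block.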

Memory tool

The memory tool (memory_20250818), introduced in August 2025, gives Claude a persistent file-based memory directory that survives across conversations. Claude can view, create, str_replace, insert, delete, and rename files inside /memories, and the system prompt automatically includes a memory protocol instructing Claude to check the directory before doing any work.

Memory is a client-side tool: storage and retrieval happen in the developer's infrastructure, which means the developer chooses the backend (filesystem, database, encrypted storage) and is responsible for path traversal protection. Because nothing is stored at Anthropic, the memory tool is ZDR-eligible. The most common pattern pairs memory with context editing or compaction so long-running agents can summarize old context server-side while persisting the truly important state in memory files.
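Since the developer owns storage, the first thing a memory backend needs is a safe mapping from the model-visible /memories paths onto real paths. A minimal sketch for a filesystem backend; the /tmp/agent-memory root and the function name are hypothetical.

```python
from pathlib import Path

# Hypothetical local directory backing the model-visible /memories directory.
MEMORY_ROOT = Path("/tmp/agent-memory")


def resolve_memory_path(requested: str, root: Path = MEMORY_ROOT) -> Path:
    """Map a path from a memory tool call onto local storage.

    Rejects paths outside /memories and any path that resolves, after ".."
    segments and symlinks, to a location outside the root directory.
    """
    if requested != "/memories" and not requested.startswith("/memories/"):
        raise ValueError(f"path outside /memories: {requested}")
    relative = requested[len("/memories"):].lstrip("/")
    base = root.resolve()
    candidate = (base / relative).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path traversal rejected: {requested}")
    return candidate
```

Every command handler (view, create, str_replace, insert, delete, rename) would call this before touching the filesystem.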

Skills

Claude Skills, launched in October 2025 as a beta API surface, are filesystem-based packages of instructions, scripts, and reference material that Claude loads on demand. Each skill consists of a SKILL.md file with YAML frontmatter (name, description) plus optional supporting files. The design uses progressive disclosure: only the metadata of every installed skill loads upfront (around 100 tokens each), the SKILL.md body loads when triggered, and bundled scripts run via bash without ever entering the context window.
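The upfront metadata in that progressive-disclosure scheme is just the SKILL.md frontmatter. This sketch extracts it; the quarterly-report skill is invented for illustration, and the parser handles only simple key: value lines, where a production loader would use a proper YAML parser.

```python
SKILL_MD = """---
name: quarterly-report
description: Builds a quarterly sales report from CSV exports.
---
# Quarterly report

Step-by-step instructions go here.
"""


def skill_metadata(text: str) -> dict:
    """Extract simple key: value frontmatter fields from a SKILL.md file."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with YAML frontmatter")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # end of frontmatter; the body is loaded only when triggered
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter")


print(skill_metadata(SKILL_MD))
```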

The Skills API exposes POST /v1/skills and GET /v1/skills endpoints for uploading and managing custom skills, scoped to a workspace. Anthropic also publishes pre-built skills for PowerPoint (pptx), Excel (xlsx), Word (docx), and PDF generation, plus an open-source claude-api skill that ships up-to-date API reference material across eight languages. Using skills via the API requires three beta headers: code-execution-2025-08-25, skills-2025-10-02, and files-api-2025-04-14. Skills are not ZDR-eligible.

MCP support and the MCP connector

On November 25, 2024, Anthropic open-sourced the Model Context Protocol, describing it as "an open standard that enables developers to build secure, two-way connections between their data sources and AI-powered tools" ^[25]. The protocol was created at Anthropic by engineers David Soria Parra and Justin Spahr-Summers, and the initial release shipped Python and TypeScript SDKs plus pre-built reference servers for Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer ^[25]. The Anthropic API integrates with MCP through several paths.

Client-side MCP: The Claude Agent SDK, Claude Code, and other clients can connect to local or remote MCP servers using stdio, HTTP, or SSE transports. The agent process loads MCP servers and exposes their tools to Claude as part of the tools array.

MCP connector (server-side): In May 2025 Anthropic added an mcp_servers parameter directly to the Messages API, letting developers point Claude at remote MCP server URLs without running an MCP client themselves. The current beta header is mcp-client-2025-11-20, with the older mcp-client-2025-04-04 deprecated. The connector supports OAuth authentication for protected servers and lets a single request connect to multiple MCP endpoints. As of 2026 only the tools subset of the MCP specification is supported through the connector; resources and prompts are not yet exposed. The MCP connector is not ZDR-eligible.

System prompts

System prompts in the Anthropic API provide instructions and context that shape Claude's behavior throughout a conversation ^[10]. Unlike user messages, they are passed in the top-level system parameter rather than in the messages array, and they accompany every request, so they set the foundation for each response. Common uses include:

Defining Claude's persona or role
Setting output format requirements
Providing reference documents or knowledge bases
Establishing behavioral constraints
Specifying language or tone preferences

System prompts are particularly effective when combined with prompt caching. A long system prompt containing reference documentation can be cached, cutting the cost of that cached portion by 90% on subsequent requests while keeping behavior consistent. The Messages API also automatically adds its own instructions for tool use (around 313 to 346 tokens depending on model and tool_choice setting) and computer use (466 to 499 tokens), so tool-heavy applications see their effective system prompt grow without any explicit developer input.

Streaming and SSE

The API supports server-sent events (SSE) streaming, delivering response tokens incrementally as they are generated ^[14]. Streaming is enabled by setting "stream": true in the request body. The event stream uses standard SSE framing, with event: and data: lines, and includes the following event types:

Event	Description
`message_start`	Initial message metadata, including model and starting usage counts
`content_block_start`	Beginning of a text, thinking, tool use, or citation block
`content_block_delta`	Incremental text, JSON fragments, thinking deltas, or citation deltas
`content_block_stop`	End of a content block
`message_delta`	Updates to message-level fields such as stop reason and final usage
`message_stop`	Complete message has been delivered
`ping`	Heartbeat to keep the connection alive
`error`	Stream-level error

The Python, TypeScript, Java, Go, Ruby, C#, and PHP SDKs provide high-level streaming helpers that handle event parsing and offer both synchronous and asynchronous interfaces. Thinking deltas (sub-events of content_block_delta for adaptive and extended thinking) and citation deltas (when citations are enabled) follow the same framing. Fine-grained tool streaming, available since Sonnet 3.7, lets clients render tool use parameters as they are produced rather than waiting for the full block.
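The SDK helpers hide the framing, but it is simple enough to parse by hand. This sketch reads SSE text (events separated by blank lines) and rebuilds the text and tool-input JSON per content block index, using text_delta and input_json_delta deltas. SAMPLE_STREAM is a shortened, hand-written example rather than captured output, and a real client would read the stream incrementally instead of from one string.

```python
import json

SAMPLE_STREAM = """event: message_start
data: {"type": "message_start", "message": {"id": "msg_1", "usage": {"input_tokens": 10, "output_tokens": 1}}}

event: content_block_start
data: {"type": "content_block_start", "index": 0, "content_block": {"type": "text", "text": ""}}

event: ping
data: {"type": "ping"}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": ", world"}}

event: content_block_stop
data: {"type": "content_block_stop", "index": 0}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "usage": {"output_tokens": 5}}

event: message_stop
data: {"type": "message_stop"}
"""


def iter_events(stream_text: str):
    """Yield (event_name, payload) pairs; SSE events are separated by blank lines."""
    for chunk in stream_text.strip().split("\n\n"):
        name, data = None, []
        for line in chunk.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data.append(line[len("data:"):].strip())
        if data:
            yield name, json.loads("\n".join(data))


def collect(stream_text: str):
    """Rebuild text and tool-input JSON per content block index."""
    text, partial_json, stop_reason = {}, {}, None
    for name, ev in iter_events(stream_text):
        if name == "content_block_delta":
            delta, i = ev["delta"], ev["index"]
            if delta["type"] == "text_delta":
                text[i] = text.get(i, "") + delta["text"]
            elif delta["type"] == "input_json_delta":
                partial_json[i] = partial_json.get(i, "") + delta["partial_json"]
        elif name == "message_delta":
            stop_reason = ev["delta"].get("stop_reason")
        elif name == "error":
            raise RuntimeError(ev)
    # Tool input JSON is only valid once all fragments have arrived.
    tool_inputs = {i: json.loads(s) if s else {} for i, s in partial_json.items()}
    return text, tool_inputs, stop_reason


print(collect(SAMPLE_STREAM))
```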

SDKs

Anthropic provides official client SDKs across eight environments ^[15]. As of May 2026, all SDKs share a near-identical surface area: an Anthropic (or equivalent) client class with a messages.create() method, beta namespaces for opt-in features, automatic retry with exponential backoff, and request-level overrides for timeouts and headers.

SDK	Package	Min version	Notable features
Python	`anthropic` (PyPI)	Python 3.9+	Sync and async clients, Pydantic models, streaming helpers
TypeScript	`@anthropic-ai/sdk` (npm)	TypeScript 4.9+, Node.js 20+	ESM/CJS, browser support, streaming via async iterators
Java	`com.anthropic:anthropic-java` (Maven, current 2.27.0)	Java 8+	Builder pattern, CompletableFuture async
Go	`github.com/anthropics/anthropic-sdk-go`	Go 1.23+	Context cancellation, functional options
Ruby	`anthropic` (Bundler)	Ruby 3.2.0+	Sorbet types, streaming helpers
C# / .NET	`Anthropic` (NuGet)	.NET Standard 2.0	IChatClient integration
PHP	`anthropic-ai/sdk` (Composer)	PHP 8.1.0+	Value objects, builder pattern
CLI	`ant` (Homebrew)	n/a	Shell scripting with typed flags and response transforms

All SDKs automatically detect the ANTHROPIC_API_KEY environment variable and support the major cloud platforms (Bedrock, Vertex AI, Foundry) via environment variable switches. Beta features are accessed through a beta namespace, which on the Python SDK looks like client.beta.messages.create(..., betas=["feature-name"]). The CLI is the newer addition: it ships typed YAML inputs and JSONPath-style response transforms, making it useful for shell scripting.

A basic Python example:

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(message.content[0].text)
```

Authentication

All requests to the Claude API must include either an API key or a workload identity token. The required headers are:

Header	Value
`x-api-key`	Standard API key from the Console (`sk-ant-...`)
`Authorization`	`Bearer <token>` from `POST /v1/oauth/token` (Workload Identity Federation)
`anthropic-version`	API version, currently `2023-06-01`
`content-type`	`application/json`
`anthropic-beta`	Optional, comma-separated list of beta features

API keys are scoped to workspaces, so an organization with multiple projects can issue separate keys with their own spend limits and rate limits. Admin API keys (sk-ant-admin-...) are a separate credential type used for organization management; only users with the admin role can mint them.

Workload Identity Federation, added in 2025, allows applications to exchange short-lived OAuth tokens for API access without storing long-lived API keys. This is the recommended path for production workloads in regulated environments. The API accepts either authentication mode but not both in the same request.
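Without an SDK, the headers from the authentication table can be assembled with the Python standard library. A minimal sketch: the build_messages_request name, the placeholder key, and the choice of beta feature are illustrative, and the function enforces the rule that a request carries either an API key or a bearer token, never both.

```python
import json
import urllib.request
from typing import List, Optional

API_URL = "https://api.anthropic.com/v1/messages"


def build_messages_request(body: dict,
                           api_key: Optional[str] = None,
                           bearer_token: Optional[str] = None,
                           betas: Optional[List[str]] = None) -> urllib.request.Request:
    """Assemble a Messages API request carrying exactly one credential."""
    if (api_key is None) == (bearer_token is None):
        raise ValueError("pass exactly one of api_key or bearer_token")
    headers = {
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    if api_key is not None:
        headers["x-api-key"] = api_key
    else:
        headers["Authorization"] = f"Bearer {bearer_token}"
    if betas:
        headers["anthropic-beta"] = ",".join(betas)  # comma-separated list
    return urllib.request.Request(API_URL, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")


req = build_messages_request(
    {"model": "claude-opus-4-7", "max_tokens": 256,
     "messages": [{"role": "user", "content": "Hello, Claude"}]},
    api_key="sk-ant-placeholder",
    betas=["files-api-2025-04-14"],
)
# Sending would be urllib.request.urlopen(req), with a real key taken from the
# ANTHROPIC_API_KEY environment variable instead of the placeholder above.
```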

Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry

Claude models are available on the three major cloud platforms, providing enterprise customers with additional deployment options ^[5]^[16]. The direct Claude API tends to receive new features first, with cloud platforms following on a delay measured in weeks for most features and months for some.

Amazon Bedrock

All current Claude models are available on Amazon Bedrock, AWS's managed AI service. Bedrock integration provides:

Unified billing through AWS accounts
VPC endpoints for private network access
Integration with AWS IAM for authentication
AWS CloudTrail for audit logging
Data residency within AWS regions

Starting with Claude Sonnet 4.5 and Haiku 4.5, Bedrock offers two endpoint types: global endpoints (dynamic routing for maximum availability) and regional endpoints (data guaranteed to stay within specific regions, at a 10% premium). For Opus 4.7, Haiku 4.5, and newer, the Messages API-style Bedrock endpoint is the preferred integration path; older models still use the legacy Bedrock invocation interface. Bedrock is also the only platform that supports fine-tuning Claude models, with Claude 3 Haiku fine-tuning generally available since late 2024.

Google Cloud Vertex AI

Claude is available through Vertex AI, offering:

Integration with Google Cloud IAM and billing
Vertex AI feature store and pipeline integration
Three endpoint tiers: global, multi-region (dynamic routing within a geographic area), and regional
Google Cloud monitoring and logging

Multi-region and regional endpoints carry the same 10% premium as Bedrock regional endpoints. Vertex AI imposes a tighter 30 MB request size limit than the direct API.

Microsoft Foundry

Microsoft Foundry (formerly Azure AI Foundry, Microsoft's umbrella AI platform that also hosts Azure OpenAI) added Claude models in 2026, completing the three-cloud availability story. Foundry integrates Claude with Azure billing, Entra ID authentication, and Azure data residency regions. As with Bedrock and Vertex AI, Foundry pricing closely tracks the direct API but is published separately on Microsoft's pricing page.

Claude Managed Agents (the Sessions/Agents/Environments API) is available only through the direct Claude API and not through any cloud platform. This is the same pattern that has played out before with new features: cloud parity follows after the direct API has stabilized the design.

Rate limits and usage tiers

Anthropic enforces rate limits at the organization level using a token bucket algorithm, measured across three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM) ^[17]. Limits apply per model class. For most models, only uncached input tokens count against ITPM; cache reads do not count toward the limit, which can effectively multiply throughput for cache-heavy workloads.
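The server-side buckets are not exposed, but a client can mirror one to pace its own traffic. The sketch below is an illustration under stated assumptions: it models a single dimension (RPM, ITPM, or OTPM) as a bucket that refills continuously at limit/60 per second up to the per-minute limit, and the TokenBucket class and itpm_cost helper are hypothetical names. itpm_cost encodes the rule above that cache reads do not count toward ITPM for most models, using the input_tokens, cache_creation_input_tokens, and cache_read_input_tokens usage fields the Messages API reports.

```python
import time


class TokenBucket:
    """Client-side mirror of one rate-limit dimension (RPM, ITPM, or OTPM).

    Assumes continuous refill at limit/60 per second, capped at the limit.
    """

    def __init__(self, limit_per_minute, clock=time.monotonic):
        self.capacity = float(limit_per_minute)
        self.rate = self.capacity / 60.0  # tokens replenished per second
        self.tokens = self.capacity  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self, amount):
        """Spend `amount` if it is available right now; report success."""
        self._refill()
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False

    def wait_time(self, amount):
        """Seconds until `amount` could be spent; infinite if it exceeds capacity."""
        self._refill()
        if amount > self.capacity:
            return float("inf")
        return max(0.0, (amount - self.tokens) / self.rate)


def itpm_cost(input_tokens, cache_creation_input_tokens, cache_read_input_tokens):
    """Tokens a request charges against ITPM for most models.

    cache_read_input_tokens is accepted only to make explicit that it is excluded.
    """
    return input_tokens + cache_creation_input_tokens
```

A client would keep one bucket per dimension and per model class, and only dispatch a request once all three buckets can cover it.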

Tier qualification

Tier	Cumulative credit purchase to qualify	Max credit purchase	Monthly spend limit
Tier 1	$5	$100	$100
Tier 2	$40	$500	$500
Tier 3	$200	$1,000	$1,000
Tier 4	$400	$200,000	$200,000
Monthly Invoicing	Negotiated	n/a	No cap

Organizations advance tiers automatically as their cumulative credit purchases reach each threshold. Moving from Tier 4 to Monthly Invoicing requires a conversation with Anthropic's sales team. Spend limits can be configured below the tier ceiling for cost control.
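A small lookup makes the automatic advancement rule explicit. This sketch uses the qualifying purchase amounts from the tier table; tier_for and usd_to_next_tier are hypothetical names, and Monthly Invoicing is deliberately left out because it is negotiated rather than reached automatically.

```python
# Cumulative credit purchases (USD) that unlock each self-serve tier, per the table.
TIER_THRESHOLDS = [(5, "Tier 1"), (40, "Tier 2"), (200, "Tier 3"), (400, "Tier 4")]


def tier_for(cumulative_purchases_usd):
    """Highest self-serve tier reached, or None below Tier 1."""
    reached = None
    for threshold, tier in TIER_THRESHOLDS:
        if cumulative_purchases_usd >= threshold:
            reached = tier
    return reached


def usd_to_next_tier(cumulative_purchases_usd):
    """Additional purchases needed for the next tier, or None once at Tier 4."""
    for threshold, _ in TIER_THRESHOLDS:
        if cumulative_purchases_usd < threshold:
            return threshold - cumulative_purchases_usd
    return None
```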

Standard rate limits

Tier	Sonnet 4.x RPM	Sonnet 4.x ITPM	Sonnet 4.x OTPM	Opus 4.x RPM	Opus 4.x ITPM	Opus 4.x OTPM	Haiku 4.5 RPM	Haiku 4.5 ITPM	Haiku 4.5 OTPM
Tier 1	50	30,000	8,000	50	30,000	8,000	50	50,000	10,000
Tier 2	1,000	450,000	90,000	1,000	450,000	90,000	1,000	450,000	90,000
Tier 3	2,000	800,000	160,000	2,000	800,000	160,000	2,000	1,000,000	200,000
Tier 4	4,000	2,000,000	400,000	4,000	2,000,000	400,000	4,000	4,000,000	800,000

Opus rate limits apply to combined traffic across Opus 4.7, 4.6, 4.5, 4.1, and 4. Sonnet 4.x limits apply across Sonnet 4.6, 4.5, and 4. The Batch API has its own per-tier RPM limits and processing-queue caps (100,000 requests per batch at all tiers; 100,000 to 500,000 in-queue requests by tier).
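Because each batch is capped at 100,000 requests, a larger job has to be split before submission. A minimal sketch follows; chunk_requests is a hypothetical helper, and it checks only the request-count cap stated above. It does not check any payload-size limit, and it does not pace submissions against the per-tier in-queue caps.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")

# Per-batch request cap stated in the text.
MAX_REQUESTS_PER_BATCH = 100_000


def chunk_requests(requests: List[T], size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` requests, preserving order."""
    if not 0 < size <= MAX_REQUESTS_PER_BATCH:
        raise ValueError("size must be between 1 and MAX_REQUESTS_PER_BATCH")
    for start in range(0, len(requests), size):
        yield requests[start:start + size]
```

Each element would typically be a dict of the form {"custom_id": ..., "params": {...}}, the shape the Message Batches API accepts; keeping custom_id values unique across chunks makes results easy to rejoin afterwards.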

Priority Tier and fast mode

For applications requiring guaranteed throughput, Anthropic offers a Priority Tier with committed spend and enhanced service levels. Priority Tier traffic gets dedicated capacity allocation; the API exposes parallel anthropic-priority-input-tokens-* and anthropic-priority-output-tokens-* rate limit headers so customers can monitor priority capacity separately from on-demand.

Fast mode, in research preview as of May 2026 on Opus 4.6, provides significantly faster output at 6x standard pricing ($30 input / $150 output per MTok). It has its own rate limit pool and anthropic-fast-* response headers and is not available with the Batch API. Fast mode pricing applies across the full context window including beyond 200k input tokens. Prompt caching multipliers and data residency multipliers stack on top.
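Because fast mode is a flat 6x multiplier with no separate long-context tier, the cost of a request is simple arithmetic. The sketch assumes Opus 4.6 standard rates of $5 input and $25 output per MTok (consistent with the $30/$150 fast-mode figures) and applies the 1.1x US-only data residency multiplier from the data residency section as an optional stacking factor. Cache pricing is omitted for brevity, and opus_46_cost_usd is a hypothetical name.

```python
# Assumed standard Claude Opus 4.6 list prices, USD per million tokens (6x gives $30/$150).
OPUS_46_INPUT_PER_MTOK = 5.00
OPUS_46_OUTPUT_PER_MTOK = 25.00
FAST_MODE_MULTIPLIER = 6.0
US_ONLY_MULTIPLIER = 1.1  # inference_geo data residency multiplier


def opus_46_cost_usd(input_tokens, output_tokens, fast=False, us_only=False):
    """Cost of uncached Opus 4.6 traffic; fast mode has no separate >200k tier."""
    multiplier = (FAST_MODE_MULTIPLIER if fast else 1.0) * (US_ONLY_MULTIPLIER if us_only else 1.0)
    base = (input_tokens * OPUS_46_INPUT_PER_MTOK
            + output_tokens * OPUS_46_OUTPUT_PER_MTOK) / 1_000_000
    return base * multiplier
```

For one million input and one million output tokens, this gives $30 at standard pricing, $180 in fast mode, and $198 in fast mode with US-only inference.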

Rate limit response headers

Every API response includes detailed rate limit headers so clients can track remaining capacity:

anthropic-ratelimit-requests-limit/remaining/reset: Request rate limit
anthropic-ratelimit-input-tokens-limit/remaining/reset: Input token limit
anthropic-ratelimit-output-tokens-limit/remaining/reset: Output token limit
retry-after: Seconds to wait before retry on a 429
request-id: Globally unique request identifier for support tickets
anthropic-organization-id: ID of the organization that owns the API key

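The SDKs retry 429 responses automatically with backoff and expose the raw response headers (in the Python SDK through client.messages.with_raw_response.create(...)), so applications can build adaptive concurrency on top of them without writing their own HTTP layer.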
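A client-side sketch of reading these headers: parse_rate_limits extracts remaining and limit values per dimension, tightest_headroom reports the scarcest dimension as a fraction, and retry_delay honors retry-after before falling back to capped exponential backoff. The function names and the backoff constants are assumptions, not SDK behavior, and the reset timestamps are ignored here.

```python
def _lowercase_keys(headers):
    # HTTP header names are case-insensitive, so normalize before lookups.
    return {key.lower(): value for key, value in headers.items()}


def parse_rate_limits(headers):
    """Map each dimension to (remaining, limit) for the headers present."""
    lower = _lowercase_keys(headers)
    parsed = {}
    for dim in ("requests", "input-tokens", "output-tokens"):
        remaining = lower.get(f"anthropic-ratelimit-{dim}-remaining")
        limit = lower.get(f"anthropic-ratelimit-{dim}-limit")
        if remaining is not None and limit is not None:
            parsed[dim] = (int(remaining), int(limit))
    return parsed


def tightest_headroom(headers):
    """Smallest remaining/limit fraction across dimensions (1.0 if none reported)."""
    fractions = [rem / lim for rem, lim in parse_rate_limits(headers).values() if lim > 0]
    return min(fractions, default=1.0)


def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Prefer the server's retry-after seconds; otherwise use capped exponential backoff."""
    value = _lowercase_keys(headers).get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # Non-numeric values fall through to the backoff schedule.
    return min(cap, base * 2 ** attempt)
```

An application might, for example, reduce its concurrency when tightest_headroom drops below a chosen threshold; that threshold is a tuning decision, not a value the API provides.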

Data residency and privacy

Starting with Claude Opus 4.6 (and including Opus 4.7 and newer models), the API supports an inference_geo parameter for specifying US-only inference, which incurs a 1.1x multiplier on all token pricing: input, output, cache writes, and cache reads. By default, the API uses global routing at standard pricing. Earlier models retain their existing pricing regardless of inference_geo settings. Rate limits are currently shared across inference_geo values, so US-only and global requests draw from the same pool.

Anthropic's data handling policies state that API inputs and outputs are not used to train models, an important privacy guarantee for enterprise customers handling sensitive data. The default API retention is 30 days for most requests, with longer retention only when required by law or for safety violations (up to 2 years for flagged content). Third-party platforms (Bedrock, Vertex AI, Foundry) provide additional data residency controls through their regional infrastructure and operate under those platforms' compliance frameworks.

Compliance: ZDR, HIPAA, and certifications

Anthropic offers two distinct data handling arrangements for organizations with specific compliance needs.

Zero Data Retention (ZDR)

Under a ZDR arrangement, customer data is not stored at rest after the API response is returned, except where needed to comply with law or combat misuse. ZDR applies to the Messages API and Token Counting API, plus Claude Code when used with commercial API keys or through Claude Enterprise. ZDR is requested through the Anthropic sales team and configured per organization.

ZDR-eligible features include the Messages API, token counting, web search, web fetch, the advisor tool, the memory tool, context management and editing, fast mode, the 1M context window, adaptive thinking, citations, data residency, effort, extended thinking, inline PDF support, search results, the bash and text editor tools, computer use, fine-grained tool streaming, prompt caching, structured outputs (with a qualification: the JSON schemas themselves are cached), and tool search.

Features that are not ZDR-eligible include batch processing (29-day retention required for async storage), code execution (container retention up to 30 days), programmatic tool calling, the Files API (files persist until deleted), agent skills, and the MCP connector. Console and Workbench usage, Managed Agents, Claude consumer products, and Claude Teams or Enterprise interfaces are also outside ZDR scope.
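Teams that must stay inside ZDR scope sometimes add a preflight check to their own request-building code. The sketch below classifies features against the two lists above. The string labels are hypothetical identifiers invented for this example (they are not API parameter names), and anything unrecognized is reported as unknown so it gets a manual review rather than a silent pass.

```python
# Hypothetical labels for features the text lists as ZDR-eligible.
# Note: "structured_outputs" carries the schema-caching qualification noted above.
ZDR_ELIGIBLE = frozenset({
    "messages_api", "token_counting", "web_search", "web_fetch", "advisor_tool",
    "memory_tool", "context_management", "fast_mode", "context_1m",
    "adaptive_thinking", "citations", "data_residency", "effort",
    "extended_thinking", "inline_pdf", "search_results", "bash_tool",
    "text_editor_tool", "computer_use", "fine_grained_tool_streaming",
    "prompt_caching", "structured_outputs", "tool_search",
})

# Hypothetical labels for API features the text lists as outside ZDR scope.
NOT_ZDR_ELIGIBLE = frozenset({
    "batch_processing", "code_execution", "programmatic_tool_calling",
    "files_api", "agent_skills", "mcp_connector",
})


def zdr_report(features_in_use):
    """Sort the features a request uses into eligible, not eligible, and unknown."""
    used = set(features_in_use)
    return {
        "eligible": sorted(used & ZDR_ELIGIBLE),
        "not_eligible": sorted(used & NOT_ZDR_ELIGIBLE),
        "unknown": sorted(used - ZDR_ELIGIBLE - NOT_ZDR_ELIGIBLE),
    }
```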

Is the Anthropic API HIPAA compliant?

For organizations handling protected health information (PHI), Anthropic offers HIPAA-ready API access with a signed Business Associate Agreement (BAA). This is enforced at the organization level: a HIPAA-enabled organization automatically blocks API requests that include non-eligible features and returns a 400 error specifying which features are restricted. PHI must not appear in JSON schema property names, enum values, const values, or regular expression patterns, since cached schemas do not receive the same PHI protections as message content.
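Because those four schema locations are cached, one defensive practice is to extract every string that lands in them and run it through whatever PHI detector an organization already uses. This sketch only collects the strings, tagged with a JSON-Pointer-like path; schema_strings_for_review is a hypothetical name, and it does not inspect non-string enum or const values or patternProperties keys.

```python
def schema_strings_for_review(schema):
    """Collect property names, enum/const strings, and regex patterns from a JSON Schema."""
    found = []

    def walk(node, path):
        if isinstance(node, dict):
            props = node.get("properties")
            if isinstance(props, dict):
                for name in props:
                    found.append(("property_name", path, name))
            enum = node.get("enum")
            if isinstance(enum, list):
                found.extend(("enum", path, v) for v in enum if isinstance(v, str))
            if isinstance(node.get("const"), str):
                found.append(("const", path, node["const"]))
            if isinstance(node.get("pattern"), str):
                found.append(("pattern", path, node["pattern"]))
            # Recurse into nested schemas (properties, items, anyOf, and so on).
            for key, child in node.items():
                walk(child, f"{path}/{key}")
        elif isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, f"{path}/{i}")

    walk(schema, "#")
    return found
```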

HIPAA readiness applies only to the direct Claude API, not Bedrock or Vertex AI deployments, which have their own HIPAA paths through AWS and Google Cloud.

Other compliance certifications

Anthropic publishes its compliance posture through the Trust Center at trust.anthropic.com. The current set includes SOC 2 Type II, ISO 27001, ISO 27701, ISO 27017, and ISO 27018 certifications, along with HIPAA readiness for the API and for Claude Enterprise. CSA STAR certifications are also available. Customer audit reports and the full HIPAA Implementation Guide are downloadable from the Trust Center.

Reliability and status

The Anthropic status page at status.anthropic.com reports availability across the API, Console, claude.ai, and the Bedrock and Vertex AI integrations. Outages have been infrequent: notable incidents in 2024 and 2025 included intermittent latency during the Claude 3.5 launch surge and a brief regional outage around the Claude 4 release in May 2025. Anthropic publishes post-incident reports for major incidents and exposes a public RSS feed for status updates.

How does the Anthropic API differ from the OpenAI API?

The Anthropic API and the OpenAI API are the two most widely used commercial LLM APIs, with the Gemini API increasingly competitive on specific axes such as multimodality and context length. The following table compares their key characteristics as of May 2026.

Feature	Anthropic API	OpenAI API	Gemini API
Primary endpoint	Messages API (`/v1/messages`)	Chat Completions and Responses	`generateContent` and `streamGenerateContent`
Flagship model	Claude Opus 4.7 ($5/$25 per MTok)	GPT-5.4 (around $2.50/$15)	Gemini 2.5 Pro
Mid-tier model	Claude Sonnet 4.6 ($3/$15)	GPT-4.1 (around $2/$8)	Gemini 2.5 Flash
Budget model	Claude Haiku 4.5 ($1/$5)	GPT-4.1 Nano ($0.10/$0.40)	Gemini 2.5 Flash-Lite
Max context window	1,000,000 tokens (Opus 4.7, Opus 4.6, Sonnet 4.6)	About 1,050,000 (GPT-5.4)	2,000,000 (Gemini 2.5 Pro)
Reasoning mode	Adaptive thinking, per-request	Dedicated o-series models	Thinking mode toggle
Prompt caching	90% discount on hits, 5m and 1h tiers	50-90% discount	75% discount
Batch API	50% discount	50% discount	50% discount
Tool use	Server-side and client-side tools	Built-in tools and function calling	Function calling and built-in tools
Computer use	Beta on all 4.x models	Native in GPT-5.4	Limited beta
Vision	All current models	GPT-4o and later	All Gemini models
First-party embeddings	None (Voyage AI is the recommended partner)	text-embedding-3 family	text-embedding-004
Native MCP	Yes (connector and clients)	Limited	Limited
Skills runtime	Yes (Skills API)	None	None
SDKs	Python, TypeScript, Java, Go, Ruby, C#, PHP, CLI	Python, Node.js, .NET, Java, Go	Python, Node.js, Go, Java, Dart
Cloud availability	AWS Bedrock, Google Vertex AI, Microsoft Foundry	Microsoft Azure (exclusive through 2030)	Google Cloud Vertex AI
Fine-tuning	Claude 3 Haiku on Bedrock only	GPT-4o, GPT-4o mini, GPT-4.1 mini direct	Gemini 2.5 Flash and earlier
Streaming	SSE	SSE	SSE

The APIs have converged on most surface features. Anthropic leads on multi-cloud distribution, MCP support, and the Skills runtime. OpenAI leads on breadth of product surface (image generation, speech, embeddings, real-time voice) and per-token pricing on flagship models. Gemini leads on raw context window size (2M tokens on 2.5 Pro) and tightly integrated multimodal output. Per-million-token pricing has converged within a factor of two across the three providers for comparable capabilities, so applications now tend to pick by feature priority rather than headline price.

Current state (May 2026)

The Anthropic API has matured into a full-featured development platform competing directly with the OpenAI API across enterprise and developer markets. The release of Claude Opus 4.6 and Sonnet 4.6 in February 2026 brought the full 1M-token context window to standard pricing (previously, long context required beta access and premium rates), making Anthropic the first major provider to offer 1M context without surcharges on its latest models ^[2]. The April 2026 release of Claude Opus 4.7 added a new tokenizer and large gains on agentic coding benchmarks.

Several developments define the API's current position:

Fast mode (research preview) for Claude Opus 4.6 provides significantly faster output at 6x standard pricing, targeting latency-sensitive applications. It has not yet rolled out to Opus 4.7.
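Stacking discounts: Prompt caching and batch processing discounts stack with each other and with other pricing multipliers such as long context and data residency, so cached input tokens processed through the Batch API cost 5% of the standard input rate (0.5 × 0.1), a 95% reduction; blended savings for a whole workload depend on its mix of cached input, uncached input, and output tokens.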
Three-cloud availability: With Claude now on AWS Bedrock, Google Vertex AI, and Microsoft Foundry, enterprise customers can access Claude through whichever cloud provider they already use, though Managed Agents remains direct-API only.
Native MCP: The MCP connector lets the Messages API talk directly to remote MCP servers, and the Claude Agent SDK plus Claude Code handle local MCP servers with no extra plumbing.
Skills runtime: The Skills API gives organizations a way to package domain expertise that loads on demand, with progressive disclosure built in.
Managed Agents: Anthropic now offers stateful agent containers as a managed service, billed on session-runtime alongside tokens.
Compliance posture: SOC 2 Type II, ISO 27001/27017/27018/27701, HIPAA readiness, and ZDR cover the main enterprise procurement requirements.
Competitive dynamics: While OpenAI offers a broader product surface (image generation, speech, embeddings, fine-tuning), Anthropic has differentiated through safety focus, consistent API design, multi-cloud distribution, MCP leadership, and competitive pricing on its core text generation capabilities.
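The compounding in the stacking-discounts item can be checked with simple arithmetic. This sketch assumes the Claude Sonnet 4.6 list prices of $3/$15 per MTok from the comparison table, a 50% batch discount, and a 90% discount on cache reads; cache-write surcharges are left out, and sonnet_46_cost_usd is a hypothetical name.

```python
# Claude Sonnet 4.6 list prices, USD per million tokens (from the comparison table).
SONNET_46_INPUT_PER_MTOK = 3.00
SONNET_46_OUTPUT_PER_MTOK = 15.00
BATCH_MULTIPLIER = 0.5       # 50% Batch API discount
CACHE_READ_MULTIPLIER = 0.1  # 90% discount on prompt-cache hits


def sonnet_46_cost_usd(uncached_input, cache_read_input, output, batch=False):
    """Total cost with the cache-read and batch discounts multiplied together."""
    input_cost = (uncached_input + cache_read_input * CACHE_READ_MULTIPLIER) * SONNET_46_INPUT_PER_MTOK
    output_cost = output * SONNET_46_OUTPUT_PER_MTOK
    total = (input_cost + output_cost) / 1_000_000
    return total * (BATCH_MULTIPLIER if batch else 1.0)
```

With 10 million input tokens and 200,000 output tokens, the standard price is $33.00. If 9 million of those input tokens are cache reads and the job runs through the Batch API, the total falls to $4.35, roughly an 87% saving overall, while the cached input tokens themselves cost 5% of list price.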

The API continues to evolve, with Anthropic maintaining a roughly quarterly release cadence for major model updates and a steady stream of incremental beta features in between. The pattern has held: ship a feature behind a beta header, iterate based on customer feedback, then graduate it to general availability with a stable name. The Messages API itself has proven remarkably durable.

References

1. TechCrunch. "Anthropic launches Claude, a chatbot to rival OpenAI's ChatGPT." March 14, 2023. https://techcrunch.com/2023/03/14/anthropic-launches-claude-a-chatbot-to-rival-openais-chatgpt/
2. Anthropic. "Pricing." Claude API Documentation. https://platform.claude.com/docs/en/about-claude/pricing
3. VentureBeat. "Google-funded Anthropic introduces Claude, ChatGPT rival through chat and API." March 14, 2023. https://venturebeat.com/ai/google-funded-anthropic-introduces-claude-chatgpt-rival-through-chat-and-api/
4. Wikipedia. "Claude (language model)." https://en.wikipedia.org/wiki/Claude_(language_model)
5. Amazon Web Services. "Claude by Anthropic, Models in Amazon Bedrock." https://aws.amazon.com/bedrock/anthropic/
6. Anthropic. "Introducing the next generation of Claude." March 4, 2024. https://www.anthropic.com/news/claude-3-family
7. Anthropic. "Prompt caching with Claude." August 2024. https://www.anthropic.com/news/prompt-caching
8. Anthropic. "Computer use tool." Claude API Documentation. https://platform.claude.com/docs/en/agents-and-tools/tool-use/computer-use-tool
9. Anthropic. "Building with extended thinking." Claude API Documentation. https://platform.claude.com/docs/en/build-with-claude/extended-thinking
10. Anthropic. "Messages API reference." Claude API Documentation. https://platform.claude.com/docs/en/api/messages/create
11. Anthropic. "Tool use with Claude." Claude API Documentation. https://platform.claude.com/docs/en/agents-and-tools/tool-use/overview
12. Anthropic. "Vision." Claude API Documentation. https://platform.claude.com/docs/en/build-with-claude/vision
13. Anthropic. "Batch processing." Claude API Documentation. https://platform.claude.com/docs/en/build-with-claude/batch-processing
14. Anthropic. "Streaming Messages." Claude API Documentation. https://platform.claude.com/docs/en/api/messages-streaming
15. Anthropic. "Client SDKs." Claude API Documentation. https://platform.claude.com/docs/en/api/client-sdks
16. Google Cloud. "Anthropic's Claude models on Vertex AI." https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude
17. Anthropic. "Rate limits." Claude API Documentation. https://platform.claude.com/docs/en/api/rate-limits
18. Anthropic. "API and data retention." Claude API Documentation. https://platform.claude.com/docs/en/manage-claude/api-and-data-retention
19. Anthropic. "Models overview." Claude API Documentation. https://platform.claude.com/docs/en/about-claude/models/overview
20. Anthropic. "Agent Skills overview." Claude API Documentation. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview
21. Anthropic. "Citations." Claude API Documentation. https://platform.claude.com/docs/en/build-with-claude/citations
22. Anthropic. "Memory tool." Claude API Documentation. https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool
23. Anthropic. "Files API." Claude API Documentation. https://platform.claude.com/docs/en/build-with-claude/files
24. Anthropic. "MCP connector." Claude API Documentation. https://platform.claude.com/docs/en/agents-and-tools/mcp-connector
25. Anthropic. "Introducing the Model Context Protocol." November 2024. https://www.anthropic.com/news/model-context-protocol
26. AWS Machine Learning Blog. "Fine-tuning for Anthropic's Claude 3 Haiku in Amazon Bedrock is now generally available." November 2024. https://aws.amazon.com/blogs/aws/fine-tuning-for-anthropics-claude-3-haiku-model-in-amazon-bedrock-is-now-generally-available/
27. Anthropic Trust Center. https://trust.anthropic.com/resources
28. Anthropic. "Claude 2." July 11, 2023. https://www.anthropic.com/news/claude-2
29. Anthropic. "Introducing Claude Opus 4.7." April 16, 2026. https://www.anthropic.com/news/claude-opus-4-7
