The Anthropic API is the primary interface for developers to access Anthropic's Claude family of large language models. Launched on March 14, 2023, alongside the original Claude and Claude Instant models, the API provides a Messages-based HTTP interface for building conversational AI applications, AI agents, and automated workflows [1]. The API is also available through third-party cloud platforms including Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry.
As of March 2026, the Anthropic API serves three model tiers: Claude Opus (highest capability), Claude Sonnet (balanced performance and cost), and Claude Haiku (fastest and most affordable). The latest generation, the Claude 4.6 series, offers up to one million tokens of context at standard pricing, extended thinking for complex reasoning, and a suite of developer features including tool use, prompt caching, batch processing, computer use, and vision [2].
Anthropic released its API on March 14, 2023, making two models available: Claude (the more capable variant) and Claude Instant (a faster, lower-cost derivative) [1][3]. Access was initially limited to selected partners, including Notion, Quora (through its Poe chatbot), DuckDuckGo, and Robin AI. The API launch came approximately four months after ChatGPT's release and the same day OpenAI released GPT-4, placing Anthropic squarely in competition with OpenAI from the start.
The original API used a text completions format with `\n\nHuman:` and `\n\nAssistant:` turn markers, distinct from OpenAI's message-based approach.
Claude 2, released in July 2023, became the first Anthropic model available to the general public without an application process [4]. It introduced a 100K token context window, a major differentiator at a time when most competing models topped out at 4K-32K tokens. Claude 2 also showed significant improvements on academic benchmarks, scoring above the 90th percentile on the bar exam.
In September 2023, Amazon announced a partnership with Anthropic, initially investing $1.25 billion (later expanded to $4 billion total), making Claude available on Amazon Bedrock [5]. The following month, Google invested $500 million in Anthropic (with commitments for $1.5 billion more over time), and Claude became available on Google Cloud Vertex AI. These partnerships gave enterprise customers the option of accessing Claude through their existing cloud infrastructure, with data residency guarantees and unified billing.
The Claude 3 release on March 4, 2024, introduced the three-tier model structure that Anthropic continues to use: Opus (most capable), Sonnet (balanced), and Haiku (fastest) [6]. Claude 3 Opus set new benchmarks in reasoning and analysis, while Haiku offered exceptionally fast responses at a fraction of the cost. This release also brought native vision capabilities to all tiers, allowing the API to process images alongside text.
Throughout 2024 and 2025, Anthropic expanded the API's feature set substantially, adding prompt caching [7], batch processing, computer use [8], extended thinking [9], and server-side tools such as web search.
The following table shows current pricing for all Claude models as of March 2026 [2].
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 1,000,000 | Complex analysis, research, multi-step reasoning |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1,000,000 | Balanced performance, coding, general tasks |
| Claude Opus 4.5 | $5.00 | $25.00 | 200,000 | High-capability reasoning |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200,000 (1M beta) | Production workloads, coding |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200,000 | Speed-critical tasks, classification, extraction |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200,000 | Budget-friendly fast responses |
Earlier-generation models remain available at the following rates:

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Status |
|---|---|---|---|
| Claude Opus 4.1 | $15.00 | $75.00 | Active |
| Claude Opus 4 | $15.00 | $75.00 | Active |
| Claude Sonnet 4 | $3.00 | $15.00 | Active |
| Claude Opus 3 | $15.00 | $75.00 | Deprecated |
| Claude Haiku 3 | $0.25 | $1.25 | Active |
The Claude 4.5/4.6 generation represents a dramatic price reduction compared to earlier Opus models. Claude Opus 4.6 costs $5/$25 per million tokens, compared to $15/$75 for Claude Opus 4.1 and Opus 4, a 67% reduction in input cost and the same reduction in output cost.
The Messages API (`POST /v1/messages`) is the primary endpoint for all Claude interactions [10]. Each request specifies a model identifier, a `messages` array of alternating user and assistant turns, a required `max_tokens` limit, and an optional system prompt.
A basic request sends a system prompt and user message, receiving back an assistant response with metadata including token usage counts. The API returns a structured JSON response containing the model's output, stop reason, and detailed usage information.
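A minimal sketch of the request and response shapes (field names follow the public Messages API; the model ID and response values are illustrative):

```python
# Request body for POST /v1/messages; authentication goes in the x-api-key header
payload = {
    "model": "claude-sonnet-4-6-20260301",  # hypothetical model ID
    "max_tokens": 1024,
    "system": "You are a concise assistant.",
    "messages": [{"role": "user", "content": "Hello, Claude"}],
}

# Illustrative shape of the JSON response body
response = {
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help?"}],
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 19, "output_tokens": 8},
}

# The model's text lives in the first content block; usage drives billing
reply_text = response["content"][0]["text"]
total_tokens = response["usage"]["input_tokens"] + response["usage"]["output_tokens"]
```

The `usage` object is what cost tracking and rate-limit accounting are based on.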
Claude's tool use capabilities allow developers to extend the model's functionality by defining external tools that Claude can invoke [11]. The developer describes available tools with names, descriptions, and input schemas; Claude decides when to use them and generates structured JSON arguments. The API then returns a tool_use content block, and the developer executes the tool and sends results back as a tool_result message.
Tool use supports several modes:
| Tool Choice | Behavior |
|---|---|
| `auto` | Claude decides whether to use tools |
| `any` | Claude must use at least one tool |
| `tool` (specific name) | Claude must use the specified tool |
| `none` | Tools are disabled for this request |
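The define/invoke/return loop described above can be sketched offline, using a hypothetical `get_weather` tool and a simulated `tool_use` block in place of a live API response:

```python
# Tool definition passed in the request's "tools" array
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real weather lookup

# Simulated tool_use content block, shaped as the API would return it
tool_use = {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
            "input": {"city": "Paris"}}

# The developer executes the tool, then sends the result back as a
# tool_result block in a user-role message
result = get_weather(**tool_use["input"])
tool_result_message = {
    "role": "user",
    "content": [{"type": "tool_result",
                 "tool_use_id": tool_use["id"],
                 "content": result}],
}
```

The `tool_use_id` ties the result back to the specific invocation, which matters when Claude requests several tools in one turn.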
Anthropic also provides server-side tools that run on Anthropic's infrastructure:
- Web search (`web_search_20260209`): Searches the web and returns results ($10 per 1,000 searches)
- Web fetch (`web_fetch_20260209`): Fetches and processes web page content (no additional charge)
- Text editor (`text_editor_20250429`): Edits files with view, create, and replace operations

All current Claude models support vision, accepting images in requests alongside text [12]. Images can be provided as base64-encoded data or URLs. Claude can analyze photographs, charts, diagrams, screenshots, documents, and other visual content. Vision is priced based on image dimensions, with each image consuming tokens proportional to its resolution.
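A sketch of the base64 image form of a message (the content-block structure follows the documented vision format; the image bytes here are placeholders, not a real PNG):

```python
import base64

# In practice these bytes come from an image file on disk
image_bytes = b"\x89PNG placeholder"
image_data = base64.standard_b64encode(image_bytes).decode("ascii")

# An image block and a text block can be combined in a single user message
message = {
    "role": "user",
    "content": [
        {"type": "image",
         "source": {"type": "base64",
                    "media_type": "image/png",
                    "data": image_data}},
        {"type": "text", "text": "What does this chart show?"},
    ],
}
```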
Extended thinking is a feature that allows Claude to engage in deeper, step-by-step reasoning before producing a response [9]. When enabled, Claude generates internal "thinking" tokens that work through the problem systematically, similar to the approach used by OpenAI's o-series reasoning models. Extended thinking is available on Claude Opus 4.5, Sonnet 4.5, Haiku 4.5, and all 4.6 models.
Extended thinking tokens are billed at the standard output token rate for each model. For complex tasks, thinking can take several minutes, and the thinking tokens can significantly exceed the visible output. Sonnet 4.6 is notable for being a hybrid model: developers can toggle extended thinking on or off per request, choosing between faster standard responses and deeper reasoning as needed.
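Toggling thinking per request can be sketched as a request body (the `thinking` field follows Anthropic's documented extended-thinking shape; the model ID is hypothetical, and `budget_tokens` must stay below `max_tokens`):

```python
payload = {
    "model": "claude-sonnet-4-6-20260301",  # hypothetical model ID
    "max_tokens": 16000,
    # Enable extended thinking for this request only; budget_tokens caps
    # how many thinking tokens the model may spend before answering
    "thinking": {"type": "enabled", "budget_tokens": 8000},
    "messages": [{"role": "user",
                  "content": "Prove that the square root of 2 is irrational."}],
}
```

Omitting the `thinking` field (or disabling it) yields a faster standard response, which is the per-request trade-off the hybrid Sonnet 4.6 design exposes.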
Prompt caching reduces costs and latency by reusing previously processed portions of prompts across API calls [7]. Instead of reprocessing the same system prompt, document, or conversation prefix on every request, the API reads from cache at a fraction of the standard input price.
There are two cache durations with different write costs:
| Cache Operation | Cost Multiplier | Duration |
|---|---|---|
| 5-minute cache write | 1.25x base input price | 5 minutes |
| 1-hour cache write | 2.0x base input price | 1 hour |
| Cache read (hit) | 0.1x base input price | Same as write duration |
A cache hit costs just 10% of the standard input price, representing a 90% cost reduction on cached content. For the 5-minute cache, a single cache read after the initial write already saves money (1.25x write + 0.1x read = 1.35x total, versus 2.0x for two uncached requests). Caching is especially valuable for applications with long, stable system prompts or repeated document analysis.
Prompt caching can be combined with other discounts. The Batch API's 50% discount stacks with caching, and long context pricing multipliers also stack, enabling substantial compound savings.
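The break-even arithmetic can be checked directly, alongside a sketch of how a cache breakpoint is marked in a request (the `cache_control` field follows the documented prompt-caching format; the rate is the Sonnet input price from the table above):

```python
BASE = 3.00  # $ per 1M input tokens (Sonnet-tier rate)

def cost_cached(n_requests: int, mtok: float,
                write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """One 5-minute cache write on the first request, cache reads on the rest."""
    return mtok * BASE * (write_mult + (n_requests - 1) * read_mult)

def cost_uncached(n_requests: int, mtok: float) -> float:
    return n_requests * mtok * BASE

# Two requests over a 1M-token prefix: 1.35x with caching vs 2.0x without
cached = cost_cached(2, 1.0)      # 1.0 * 3.00 * 1.35 = 4.05
uncached = cost_uncached(2, 1.0)  # 2 * 1.0 * 3.00   = 6.00

# Marking a cache breakpoint on a long system prompt
system = [{
    "type": "text",
    "text": "Long reference documentation goes here...",
    "cache_control": {"type": "ephemeral"},  # 5-minute cache
}]
```

With more requests the advantage grows, since every read after the first costs only 0.1x.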
The Batch API (/v1/messages/batches) processes large volumes of requests asynchronously at a 50% discount on all token costs [13].
| Model | Batch Input (per 1M tokens) | Batch Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4.6 | $2.50 | $12.50 |
| Claude Sonnet 4.6 | $1.50 | $7.50 |
| Claude Haiku 4.5 | $0.50 | $2.50 |
Batch requests are submitted as collections of individual message requests, each with a custom ID for tracking. Results are delivered asynchronously, typically within hours. The 50% discount combined with prompt caching can yield total savings exceeding 90% compared to standard synchronous pricing.
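Assembling such a batch can be sketched as follows (field names follow the documented batch request shape; the model ID and documents are illustrative):

```python
docs = {
    "doc-1": "First quarterly report text...",
    "doc-2": "Second quarterly report text...",
}

# One entry per request; custom_id is used later to match results to inputs
batch_requests = [
    {
        "custom_id": doc_id,
        "params": {
            "model": "claude-haiku-4-5",  # hypothetical model ID
            "max_tokens": 512,
            "messages": [{"role": "user",
                          "content": f"Summarize: {text}"}],
        },
    }
    for doc_id, text in docs.items()
]
```

The list would then be submitted to `/v1/messages/batches`, and results polled or retrieved asynchronously once processing completes.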
The computer use tool, introduced in beta in October 2024, enables Claude to interact with desktop environments by viewing screenshots and performing mouse and keyboard actions [8]. When given access to a computer environment (typically via a Docker container or virtual machine), Claude can take screenshots to observe the screen, move and click the mouse, type text and use keyboard shortcuts, and chain these actions into multi-step workflows.
Computer use follows standard tool use pricing, with additional token consumption from screenshot images. The feature requires the `anthropic-beta: computer-use-2025-11-24` header.
System prompts in the Anthropic API provide instructions and context that shape Claude's behavior throughout a conversation [10]. Unlike user messages, system prompts are processed once and set the foundation for all subsequent interactions. Common uses include defining a persona or role, specifying output format and tone, supplying reference material, and setting behavioral constraints.
System prompts are particularly effective when combined with prompt caching. A long system prompt containing reference documentation can be cached, reducing costs by 90% on subsequent requests while maintaining consistent behavior.
The API supports server-sent events (SSE) streaming, delivering response tokens incrementally as they are generated [14]. Streaming is enabled by setting "stream": true in the request body. The event stream includes several event types:
- `message_start`: Contains initial message metadata
- `content_block_start`: Signals the beginning of a text or tool use block
- `content_block_delta`: Delivers incremental text or JSON fragments
- `content_block_stop`: Signals the end of a content block
- `message_stop`: Indicates the complete message has been delivered

The Python, TypeScript, and PHP SDKs provide high-level streaming helpers that handle event parsing and offer both synchronous and asynchronous interfaces.
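The delta-accumulation logic a streaming consumer performs can be sketched with a simulated event stream (event names as above; the payload shapes are illustrative):

```python
def accumulate_text(events):
    """Collect text fragments from content_block_delta events into one string."""
    parts = []
    for event in events:
        if event["type"] == "content_block_delta":
            delta = event["delta"]
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
    return "".join(parts)

# Simulated SSE events in the order the API emits them
events = [
    {"type": "message_start"},
    {"type": "content_block_start"},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "lo"}},
    {"type": "content_block_stop"},
    {"type": "message_stop"},
]
```

The SDK streaming helpers perform essentially this accumulation internally while also exposing each fragment as it arrives.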
Anthropic provides official SDKs for two languages [15]:
| SDK | Language | Package | Features |
|---|---|---|---|
| anthropic-sdk-python | Python 3.7+ | anthropic (pip) | Sync/async, streaming, Pydantic models, auto-retries |
| anthropic-sdk-typescript | TypeScript 4.5+ | @anthropic-ai/sdk (npm) | Type-safe, streaming, auto-retries, ESM/CJS |
Both SDKs automatically detect the ANTHROPIC_API_KEY environment variable for authentication. They handle request formatting, error handling, retry logic with exponential backoff, and provide typed interfaces for all API features. A PHP SDK (community-supported) is also available.
Basic usage in Python is straightforward:

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-sonnet-4-6-20260301",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(message.content[0].text)
```
Claude models are available on major cloud platforms, providing enterprise customers with additional deployment options [5][16].
All current Claude models are available on Amazon Bedrock, AWS's managed AI service. Bedrock integration provides access through existing AWS accounts, IAM-based authentication, unified AWS billing, and data residency within AWS regions.
Starting with Claude 4.5 models, Bedrock offers two endpoint types: global endpoints (dynamic routing for maximum availability) and regional endpoints (data guaranteed within specific regions, at a 10% premium).
Claude is also available through Google Cloud's Vertex AI platform, offering comparable enterprise integration: unified Google Cloud billing, access through the Google Cloud console and APIs, and regional data residency controls.
As of 2026, Claude models are additionally available through Microsoft Foundry, expanding the platform availability to all three major cloud providers [2].
Anthropic enforces rate limits at the organization level, measured across three dimensions: requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM) [17].
| Tier | Monthly Spend Cap | Qualification |
|---|---|---|
| Tier 1 (Build) | $100 | Default for new accounts |
| Tier 2 (Build) | $500 | Deposit or accumulated spend |
| Tier 3 (Scale) | $1,000 | Higher deposit or spend history |
| Tier 4 (Scale) | $5,000 | Established usage patterns |
| Enterprise | Custom | Contact sales for negotiated terms |
Rate limits increase with each tier, and organizations advance automatically as they reach spending thresholds. For applications requiring guaranteed throughput, Anthropic offers a Priority Tier with committed spend and enhanced service levels.
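Clients that hit these limits typically retry with exponential backoff, as the official SDKs do internally. A minimal sketch of that pattern, with `send` as a stand-in for any function that raises on an HTTP 429 response:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

def send_with_retries(send, max_attempts: int = 5, base: float = 1.0):
    """Call send(), sleeping with increasing jittered delays between failures."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RuntimeError:  # stand-in for the SDK's rate-limit error type
            time.sleep(backoff_delay(attempt, base=base))
    raise RuntimeError("rate limited after retries")
```

Jitter spreads retries out in time, which prevents many clients that were throttled together from retrying in lockstep.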
The Anthropic API and OpenAI API are the two most widely used commercial LLM APIs. The following table compares their key characteristics as of March 2026 [2][18].
| Feature | Anthropic API | OpenAI API |
|---|---|---|
| Primary endpoint | Messages API (/v1/messages) | Chat Completions (/v1/chat/completions) and Responses (/v1/responses) |
| Flagship model | Claude Opus 4.6 ($5/$25 per MTok) | GPT-5.4 ($2.50/$15 per MTok) |
| Mid-tier model | Claude Sonnet 4.6 ($3/$15 per MTok) | GPT-4.1 ($2/$8 per MTok) |
| Budget model | Claude Haiku 4.5 ($1/$5 per MTok) | GPT-4.1 Nano ($0.10/$0.40 per MTok) |
| Max context window | 1,000,000 tokens (Opus/Sonnet 4.6) | 1,050,000 tokens (GPT-5.4) |
| Reasoning mode | Extended thinking (toggle per request) | o-series dedicated models |
| Prompt caching | 90% discount on cache hits | 50-90% discount (varies by model) |
| Batch API | 50% discount | 50% discount |
| Tool use | Server-side + client-side tools | Built-in tools + function calling |
| Computer use | Beta (all 4.x models) | Native in GPT-5.4 |
| Vision | All current models | GPT-4o and later |
| SDKs | Python, TypeScript | Python, Node.js, .NET |
| Cloud availability | AWS Bedrock, Google Vertex AI, Microsoft Foundry | Microsoft Azure (exclusive through 2030) |
| Fine-tuning | Not currently available | GPT-4o, GPT-4o mini, GPT-4.1 mini |
| Streaming | SSE | SSE |
The two APIs have converged significantly in terms of feature parity, with both supporting tool use, vision, streaming, batch processing, and prompt caching. Key differentiators include Anthropic's per-request extended thinking toggle (versus OpenAI's separate reasoning model family), Anthropic's multi-cloud availability (versus OpenAI's Azure exclusivity), and OpenAI's broader model range including image generation, speech, and embeddings.
Starting with Claude Opus 4.6, the API supports an `inference_geo` parameter for specifying US-only inference, incurring a 1.1x multiplier on all token pricing [2]. By default, the API uses global routing at standard pricing.
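In request terms, the option is a single additional field, and its cost impact is a flat multiplier on the table rates (model ID hypothetical):

```python
payload = {
    "model": "claude-opus-4-6",  # hypothetical model ID
    "max_tokens": 1024,
    "inference_geo": "us",  # request US-only inference
    "messages": [{"role": "user", "content": "Summarize this contract."}],
}

# US-only inference applies a 1.1x multiplier to token pricing
OPUS_INPUT = 5.00  # $ per 1M input tokens, from the pricing table
us_input_price = OPUS_INPUT * 1.1  # 5.50
```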
Anthropic's data handling policies state that API inputs and outputs are not used to train models, providing a privacy guarantee important for enterprise customers handling sensitive data. Third-party platforms (Bedrock, Vertex AI) provide additional data residency controls through their respective regional infrastructure.
As of March 2026, the Anthropic API has matured into a full-featured development platform competing directly with the OpenAI API across enterprise and developer markets. The release of Claude Opus 4.6 and Sonnet 4.6 brought the full one million token context window to standard pricing (previously, long context required beta access and premium rates), making Anthropic the first major provider to offer 1M context without surcharges on its latest models [2].
Several developments define the API's current position: the one million token context window at standard pricing, sharply reduced Opus pricing relative to the 4.0/4.1 generation, availability across all three major cloud providers, and the per-request extended thinking toggle.
The API continues to evolve, with Anthropic maintaining a roughly quarterly release cadence for major model updates.