Claude Sonnet 4

Claude Sonnet 4 is a multimodal hybrid-reasoning large language model developed by Anthropic and released on May 22, 2025. It launched simultaneously with Claude Opus 4 at the company's first developer conference, Code with Claude, in San Francisco, and the two models together formed the opening release of the Claude 4 family.[^1][^2] Sonnet 4 is the mid-tier sibling in the family, sitting between the more capable Opus 4 and the smaller Haiku tier (which did not receive a Claude 4 release in May 2025 and shipped much later as Claude Haiku 4.5 in October).[^3]

The model uses the API identifier claude-sonnet-4-20250514 (alias claude-sonnet-4-0), supports a 200,000-token context window with up to 64,000 output tokens, and was priced at $3 per million input tokens and $15 per million output tokens at launch, the same price point as Anthropic's previous Sonnet generations going back to Claude 3.5 Sonnet.[^4][^5] It launched under AI Safety Level 2 (ASL-2) deployment standards, in contrast to Opus 4 which was released under the stricter ASL-3 standard, the first time Anthropic had ever applied that level to a public model.[^6][^7]

At launch, Anthropic positioned Sonnet 4 as a balanced workhorse, what the announcement called "a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions."[^1] Where Opus 4 was framed as the world's best coding model, Sonnet 4 was sold on the optimal mix of capability and cost. The model scored 72.7% on SWE-bench Verified at launch, which was state of the art for any non-flagship model on that benchmark at the time, and was the first Claude model to be made available to free claude.ai users alongside paid subscribers.[^1][^8]

Sonnet 4 became the default model in Claude Code (which moved out of research preview into general availability on the same day), the model powering GitHub's new Copilot coding agent, and a primary option in Cursor, Replit, Vercel v0, and a long list of partner products. Within days of launch, multiple developer-tooling companies described the change as the largest single upgrade they had measured between two consecutive Sonnet releases.[^1][^9] Sonnet 4 served as the standard Sonnet pick on the Claude API, Amazon Bedrock, and Google Cloud Vertex AI until it was superseded by Claude Sonnet 4.5 on September 29, 2025. The model was formally deprecated on April 14, 2026 and is scheduled to retire from the Claude API at 9 AM Pacific on June 15, 2026, with Anthropic directing customers to migrate to Claude Sonnet 4.6 (recommended replacement) or, for higher-end coding work, Claude Opus 4.7.[^5][^10]

Specifications

Specification	Value
Release date	May 22, 2025
Model family	Claude 4
Tier	Sonnet (mid-tier)
Developer	Anthropic
API ID (versioned)	`claude-sonnet-4-20250514`
API alias	`claude-sonnet-4-0`
AWS Bedrock ID	`anthropic.claude-sonnet-4-20250514-v1:0`
Google Vertex AI ID	`claude-sonnet-4@20250514`
Standard context window	200,000 tokens (~150,000 words)
Extended context window	1,000,000 tokens (beta, August 2025 to April 30, 2026)
Max output tokens	64,000 tokens
Extended thinking	Yes (toggleable via API)
Interleaved thinking	Yes (`interleaved-thinking-2025-05-14` beta header)
Multimodal input	Text and images
Output modality	Text only
Training data cutoff	March 2025
Reliable knowledge cutoff	January 2025
Parameter count	Undisclosed
Safety classification	ASL-2 (Anthropic Safety Level 2)
Priority Tier support	Yes
List price	$3.00 / $15.00 per million input / output tokens
Successor	Claude Sonnet 4.5 (September 29, 2025)
Recommended replacement (2026)	Claude Sonnet 4.6
Deprecation date	April 14, 2026
Scheduled retirement	June 15, 2026 (9 AM Pacific)

Background and development

Anthropic was founded in 2021 by former OpenAI researchers Dario and Daniela Amodei and has trained the Claude family of language models since the original Claude 1 release in March 2023. The Claude 3 family in March 2024 established the three-tier Haiku, Sonnet, Opus naming pattern and Claude 3.5 Sonnet, released in June 2024, became Anthropic's first widely adopted commercial model. The intermediate Claude 3.7 Sonnet, released in February 2025, introduced the hybrid reasoning approach that Claude 4 inherited and refined: a single model that can answer quickly to easy prompts and switch into a longer chain-of-thought reasoning pass for harder ones, with the developer optionally toggling extended thinking on or off through the API.[^2][^11]

Work on the Claude 4 generation ran in parallel with that 3.x cycle. Anthropic's stated goal for the new family was to push agentic software engineering and long-horizon autonomous work, two areas the company had identified as the highest-value applications for frontier models. The training and evaluation pipeline targeted SWE-bench Verified, Terminal-bench, and an internal long-running coding workload as primary metrics, with the alignment work concentrated on reducing rates of sycophancy, deceptive framing, and self-preserving behavior under adversarial test conditions.[^1][^6]

Anthropic does not publish parameter counts, training compute totals, or training data composition for Claude models, including Sonnet 4. The publicly documented training recipe for the Claude 4 family combines pretraining on a large mixture of internet text and licensed data, supervised fine-tuning on curated demonstrations, Constitutional AI for alignment training, and reinforcement learning from human feedback (RLHF). The training cutoff for Sonnet 4 is March 2025, with a reliable knowledge cutoff of January 2025, the same dates that apply to Opus 4.[^5][^6]

The May 2025 launch was timed for Anthropic's first developer conference, Code with Claude, held in San Francisco on May 22, 2025. The event combined the Sonnet 4 and Opus 4 announcements with the general-availability release of Claude Code and a set of new agent-building API capabilities. It was the first time Anthropic had run a developer-only event in the same format that competitors such as OpenAI and Google had used for years, and the framing emphasized Anthropic's bet on developer-centric distribution rather than a consumer-chatbot crown chase.[^1][^12]

Release

Claude Sonnet 4 was announced on May 22, 2025 alongside Claude Opus 4 in a joint launch post titled "Introducing Claude 4." The announcement described the two models as "setting new standards for coding, advanced reasoning, and AI agents," with Opus 4 framed as the flagship and Sonnet 4 as the upgrade to Sonnet 3.7 "delivering superior coding and reasoning while responding more precisely to your instructions."[^1]

Code with Claude

The launch was the headline announcement at Code with Claude, Anthropic's first developer conference. Beyond the two models, the event announced the general availability of Claude Code, the company's terminal-first coding agent, including new integrations for Visual Studio Code, JetBrains, and a GitHub Actions workflow that posted Claude responses on pull requests. An accompanying blog post detailed four new agent-building API features that launched the same day.[^1][^13]

The four new API capabilities released alongside the Claude 4 family were: a code execution tool that ran Python in a sandboxed environment for computational and data-visualization tasks; an MCP connector that let developers wire Claude to remote Model Context Protocol servers without writing custom client code; a Files API that let documents be uploaded once and reused across conversations; and an extended one-hour TTL for prompt caching, on top of the existing five-minute cache window.[^13]

Anthropic also flipped the default behavior of extended thinking in Claude 4 models so that responses now returned a summary of the model's full thinking process, with the full reasoning encrypted and returned in a signature field of each thinking block. The default top_p nucleus-sampling value moved from 0.999 to 0.99 across all models in the same release.[^32]

Initial availability

Both Claude 4 models were made available on the Anthropic API, claude.ai, Amazon Bedrock, and Google Cloud Vertex AI on the day of the announcement. On claude.ai, Opus 4 was restricted to Pro, Max, Team, and Enterprise subscribers. Sonnet 4 was made available to those paying tiers and to free users, the first time Anthropic had ever shipped a frontier-tier Claude model to its free tier on launch day.[^1][^8]

Microsoft Foundry support for Claude did not exist at the time of the May 2025 launch. Anthropic and Microsoft only added Claude to Foundry on November 18, 2025, six months after Sonnet 4 shipped, and that rollout focused on the then-current Sonnet 4.5 and Haiku 4.5 generation rather than the original Sonnet 4 snapshot.[^32]

A wide group of partner products began rolling out Sonnet 4 within hours of the announcement. GitHub committed to using Sonnet 4 as the model behind its new Copilot coding agent. Cursor added it as a selectable model and described it as "state-of-the-art for coding." Replit reported reduced precision errors on its agent platform. Vercel v0 added it as an option for the company's frontend-generation product. Sourcegraph, Augment Code, Manus, and iGent each posted partner endorsements that Anthropic quoted in the launch blog.[^1]

Architecture

Anthropic does not disclose the size, depth, or topology of its models, and Claude Sonnet 4 is no exception. Public documentation describes the model as a transformer-based large language model trained with the company's standard pretraining-plus-Constitutional-AI-plus-RLHF pipeline. The model accepts text and image inputs, produces text-only output, and supports multilingual input and output across the same set of languages as earlier Claude generations.[^5]

The model uses Anthropic's standard Claude 3-era tokenizer, the same one used across the Sonnet 3.5, Sonnet 3.7, Opus 4, and Sonnet 4.5 lines. The shared tokenizer means token counts for a given prompt are directly comparable across all of those models, a property that did not survive into the later Claude 4 family: Claude Opus 4.7, released on April 16, 2026, was the first Claude model to ship with a new tokenizer, which produces roughly 1.0 to 1.35 times as many tokens for the same source text depending on content type (averaging about 33% more on large prompts).[^33]

The context window can store roughly 500 pages of densely formatted text, enough to fit a small repository or a long research paper. The maximum 64,000-token output is twice the 32,000-token output supported by Opus 4, reflecting Sonnet 4's intended role in routine coding and chat workloads where longer single-turn outputs (full files, long refactors) are common.[^5]

One-million-token context (beta)

A later beta extension added a 1,000,000-token context window for Sonnet 4 on the Anthropic API and Amazon Bedrock behind the context-1m-2025-08-07 header, announced on August 12, 2025. The fivefold increase over the standard window was sold for "large-scale code analysis" of codebases exceeding 75,000 lines, "document synthesis" across collections of legal contracts, research papers, and specifications, and "context-aware agents" that needed to maintain coherence across hundreds of tool calls.[^14][^15] The same beta header arrived on Google Cloud Vertex AI on August 26, 2025, alongside increased rate limits on the Claude API for long-context requests.[^32]

Pricing for the extended-context tier was set at $6 per million input tokens and $22.50 per million output tokens for any prompt that exceeded 200,000 tokens, a doubling of the input rate and a 50% increase on output; prompts at or below 200,000 tokens kept the standard $3 / $15 pricing.[^15] Two partners were featured in the announcement post: Bolt.new, whose CEO Eric Simons said Sonnet 4 had been "consistently outperforming other leading models in production" for code generation, and iGent AI, whose CEO Sean Ward described Maestro's multi-day sessions on real-world codebases as "true production-scale engineering" enabled by the expanded window.[^15]

The same beta header carried forward to Sonnet 4.5 when that model launched in September 2025, and to Claude Opus 4.6 when it shipped in February 2026, before being retired on April 30, 2026 across the Sonnet 4 generation once Sonnet 4.6 and Opus 4.6 had made one-million-token context generally available without a flag and without a long-context surcharge (a change Anthropic landed on March 13, 2026).[^32] After April 30, 2026, requests to claude-sonnet-4-20250514 exceeding 200,000 tokens return an invalid_request_error with a prompt-length message; requests under that threshold continue working unchanged for the model's remaining API lifetime.[^32]

Hybrid reasoning

Claude Sonnet 4 is a hybrid reasoning model, the second public Anthropic model after Claude 3.7 Sonnet to expose extended thinking as an explicit, toggleable mode rather than a separate dedicated reasoning model in the style of OpenAI's o1 line. In the default mode, Sonnet 4 produces near-instant responses suited to chat, simple coding, and short instruction-following tasks. With extended thinking enabled, the model first generates a visible internal chain of reasoning before its final answer, with the depth of that reasoning controlled by a budget_tokens parameter the developer sets per request.[^1][^11]

The minimum thinking budget is 1,024 tokens; the maximum is the same as the model's overall maximum output of 64,000 tokens. Anthropic recommends modest budgets (a few thousand tokens) for routine reasoning gains and large budgets for hard problems such as competition mathematics, multi-step planning, or graduate-level science. The budget is not a hard cap on reasoning; it is a soft target the model is trained to respect, and the actual length of the visible thinking trace can vary.[^16]

A second feature, interleaved thinking, was introduced as a beta header (interleaved-thinking-2025-05-14) at the same launch. With interleaved thinking enabled, Sonnet 4 can think between tool calls during a single assistant turn rather than only at the start, letting the model reason about the result of one tool call before deciding which tool to call next. Anthropic positioned the feature as a building block for agent workflows that needed more careful interleaving of thinking and acting; it carried forward into Opus 4, Sonnet 4.5, and the rest of the family.[^16][^32]

In standard mode without extended thinking, Sonnet 4 behaves much like a fast non-reasoning model. With extended thinking on at sensible budgets, it climbs into the reasoning-model bracket on benchmarks such as GPQA Diamond and AIME 2025. The hybrid design was intended to remove the need for application developers to choose between a fast model and a slow reasoning model: the same API surface, the same model ID, and the same prompt template can be used for both styles, with one optional parameter switching modes. The same mechanism was later replaced on Opus 4.6 and Opus 4.7 by "adaptive thinking," where the model chooses its own thinking budget; Sonnet 4 itself retains the original budget_tokens model.[^1][^16][^32]

Capabilities

Coding and agentic software engineering

Coding was the headline capability for Sonnet 4 at launch. The model was trained explicitly for multi-file codebases, agentic edit-test-fix loops, and long-running developer sessions, and the launch announcement led with the SWE-bench Verified score of 72.7%, the state-of-the-art result for any non-flagship model at that point.[^1] On Terminal-bench, a benchmark that scores agents on real terminal tasks, Sonnet 4 scored 35.5% (or 41.3% with parallel test-time compute), close to Opus 4's 43.2% on the same benchmark.[^8][^17]

Partner reports at launch added qualitative color. Replit described Sonnet 4 as bringing "dramatic advancements for complex changes across multiple files" with "improved precision" relative to its previous default model. iGent reported that navigation errors in their autonomous multi-feature app development pipeline dropped from 20% to near zero. Sourcegraph called Sonnet 4 "a substantial leap in software development" and said models running on it stayed on track for longer, generating cleaner code. Augment Code reported "higher success rates, more surgical code edits, and more careful work through complex tasks."[^1]

Sonnet 4 became the default model in Claude Code when that product moved out of research preview at the same launch. Free claude.ai users got Sonnet 4 access automatically; Claude Code users on the Pro and Max subscription tiers used Sonnet 4 by default for routine coding work, with Opus 4 available for the heaviest reasoning loads. The combination drove rapid adoption: by mid-2025, Anthropic was citing Claude Code as a major revenue driver, with Sonnet 4 the workhorse model behind most sessions.[^2][^12]

Tool use and parallel tool use

Claude Sonnet 4 supports the same tool-use API as the rest of the Claude 4 family. Developers define tools as JSON schemas, the model emits structured tool calls, and the developer's runtime executes them and feeds results back. Sonnet 4 can fire multiple tool calls in parallel within a single assistant turn, allowing agents to issue several search queries or read several files at once rather than serializing them. Parallel tool use was an explicit improvement over earlier Claude generations and was cited at launch as a major contributor to better agent throughput.[^1][^13]

The model also supports the Model Context Protocol (MCP), Anthropic's open standard for plugging models into external tools, both directly through the API's tool framework and via the new MCP connector that the Claude API gained at the May 2025 launch. The MCP connector handles connection management, tool discovery, and error handling for remote MCP servers, simplifying integration with services like Asana and Zapier without writing custom client code.[^13]

File inputs are handled through the new Files API, also released alongside Claude 4. Documents can be uploaded once and referenced across many conversations, eliminating repeated upload overhead for long-running agents that work over the same set of source files.[^13]

Computer use

Sonnet 4 inherits computer use, the capability to operate a desktop or browser through a sequence of screenshot observations and keyboard or mouse actions, that Claude 3.5 Sonnet introduced in October 2024. On OSWorld, the most cited benchmark for general-purpose computer use, Sonnet 4 scored 42.2% at launch. That number was matched by Claude Opus 4.1 in August 2025 and only meaningfully improved with Sonnet 4.5 in September 2025, which reached 61.4% on the same benchmark and was widely cited as the first usable production-grade computer-use Claude.[^18][^19]

In the May 2025 form, computer use on Sonnet 4 was framed as a beta capability suitable for controlled environments rather than fully autonomous web browsing. The mid-40% OSWorld score reflected a model that could complete short tasks but tended to fail on complex multi-window desktop work, especially on applications with non-standard UI frameworks. Anthropic explicitly recommended human-in-the-loop oversight for production computer-use deployments at this stage.[^18]

Vision

Sonnet 4 accepts images as input and produces text output. The vision pipeline supports document analysis, chart and figure understanding, screenshot interpretation, and visual question answering in the same multilingual surface as the text pipeline. On MMMU, the multimodal university benchmark, Sonnet 4 scored 74.4% at launch, in the same band as the strongest non-Anthropic models at the time.[^8]

Multilingual support

Sonnet 4 supports the same multilingual input and output as earlier Claude generations. On the multilingual MMLU variant (MMMLU), the model scored 86.5% at launch, with strong individual results in major European languages and competitive (if uneven) results across Arabic, Chinese, Japanese, Korean, and other non-Latin scripts.[^8]

Benchmarks

The table below brings together the most cited benchmark results for Claude Sonnet 4 at launch (May 2025) and compares them against Opus 4, Claude 3.5 Sonnet, GPT-4o, and Google Gemini 2.5 Pro, the four most direct points of comparison at the time. Numbers are taken from Anthropic's launch announcement, the joint Opus 4 and Sonnet 4 system card, and contemporaneous reporting from DataCamp, Vellum, and InfoQ.[^1][^6][^8]

Benchmark	Sonnet 4	Opus 4	Claude 3.5 Sonnet	GPT-4o	Gemini 2.5 Pro
SWE-bench Verified	72.7%	72.5%	49.0%	33.2%	63.2%
Terminal-bench (Terminus 1)	35.5% (41.3% parallel)	43.2%	n/a	n/a	n/a
GPQA Diamond	75.4%	79.6%	65.0%	53.6%	84.0%
AIME 2025	70.5%	75.5%	n/a	n/a	88.0%
MMLU (MMMLU multilingual)	86.5%	88.8%	88.7%	88.7%	n/a
MMMU (vision)	74.4%	76.5%	68.3%	69.1%	81.7%
HumanEval	92.0%	92.5%	92.0%	90.2%	n/a
Tau-bench Retail	80.5%	81.4%	65.5%	n/a	n/a
Tau-bench Airline	60.0%	59.6%	36.0%	n/a	n/a
OSWorld	42.2%	42.2%	n/a	n/a	n/a

Notes: GPQA Diamond, AIME 2025, and MMMU scores for Sonnet 4 and Opus 4 are reported with extended thinking enabled. The SWE-bench Verified single-run number quoted for Sonnet 4 (72.7%) is the headline figure from Anthropic's announcement; high-compute parallel runs were not reported at launch for Sonnet 4 but were reported for later Sonnet generations. Tau-bench Telecom was not part of the launch benchmark set for either Sonnet 4 or Opus 4 and was added by Anthropic in later releases.[^1][^6][^8]

The headline takeaway from the May 2025 numbers was that Sonnet 4 on its own posted SWE-bench Verified essentially even with the flagship Opus 4 (72.7% vs 72.5%), at one-fifth of Opus's per-token cost. On harder reasoning and graduate-level science benchmarks (GPQA Diamond, AIME), Opus 4 retained a clear edge. On vision, the two models were within two percentage points of each other on MMMU. The pattern was widely cited at the time as evidence that the Sonnet tier was the best price-performance choice for routine coding work and that Opus 4 only paid off for the heaviest reasoning workloads.[^8][^9]

Claude 3.5 Sonnet, Sonnet 4's direct predecessor, was a step behind on coding (49.0% on SWE-bench Verified) and on graduate-level science (65.0% on GPQA Diamond), but was roughly even with Sonnet 4 on MMLU and HumanEval. The biggest jump within the Sonnet line was on agentic coding and multi-step tool use, exactly the targets Anthropic had emphasized during training.[^1]

Independent benchmarking by Vellum, Artificial Analysis, and DataCamp confirmed the headline numbers within reasonable margins. Vellum's day-of analysis described Sonnet 4 as "a coding-tuned upgrade to 3.7" with strong tool-use behavior. DataCamp called the model "a generalist that's great for most AI use cases and especially strong at coding" and noted the gap to Opus 4 was largely concentrated in long-form reasoning rather than routine work.[^8][^20]

In hindsight, Sonnet 4's launch numbers held up well against its own descendants on coding. Claude Sonnet 4.5 lifted SWE-bench Verified to 77.2% (single run) or 82.0% with parallel high-compute runs four months later, and Claude Sonnet 4.6 pushed the same benchmark to about 81.9% on a single run in February 2026; Sonnet 4 remained competitive on routine workloads and below the standard window threshold even as later generations dominated long-context and computer-use benchmarks.[^18][^32]

Pricing and availability

List pricing

Claude Sonnet 4 launched at $3 per million input tokens and $15 per million output tokens, the same headline pricing as Sonnet 3.5 (June 2024) and Sonnet 3.7 (February 2025). The price held without change throughout the model's lifecycle, including after the Sonnet 4.5 supersession in September 2025, and remained the standard Sonnet-tier pricing as of the model's deprecation in April 2026.[^1][^5]

The full pricing schedule for Sonnet 4 on the Anthropic API is shown below.

Usage type	Price
Input tokens (standard)	$3.00 per million
Output tokens (standard)	$15.00 per million
Prompt caching (write, 5-minute TTL)	$3.75 per million
Prompt caching (write, 1-hour TTL)	$6.00 per million
Prompt caching (read)	$0.30 per million
Batch API (input)	$1.50 per million
Batch API (output)	$7.50 per million
Extended context input (>200K tokens, beta)	$6.00 per million
Extended context output (>200K tokens, beta)	$22.50 per million

Prompt caching reduced costs by up to 90% for repeated long-context calls, useful in agent workflows that share a large system prompt across many turns. The one-hour cache TTL launched as a beta feature alongside Sonnet 4 on May 22, 2025, and moved to general availability on August 13, 2025. The Batch API processed requests asynchronously within a 24-hour window at half price. Both features were available from launch.[^1][^13][^32]

Platforms

Sonnet 4 was available across Anthropic's first-party distribution at launch and remained on every major platform until its deprecation. The table below lists the main delivery channels.

Platform	Available	Notes
Anthropic API	Yes	Both versioned ID and `claude-sonnet-4-0` alias
claude.ai (web, iOS, Android)	Yes	Free, Pro, Max, Team, Enterprise from day one
Amazon Bedrock	Yes	`anthropic.claude-sonnet-4-20250514-v1:0`
Google Cloud Vertex AI	Yes	`claude-sonnet-4@20250514`; 1M beta added Aug 26, 2025
Microsoft Foundry	No	Foundry only added Claude on November 18, 2025; rollout focused on Sonnet 4.5 onward
Claude Code	Yes	Default Sonnet model from May 22, 2025
GitHub Copilot	Yes	Selectable model; powered the new Copilot coding agent
Cursor	Yes	Selectable model
Replit	Yes	Default option in Replit Agent
Vercel v0	Yes	Selectable model
OpenRouter and other API aggregators	Yes	Routed to Anthropic API

On AWS Bedrock, the model used the regional ID anthropic.claude-sonnet-4-20250514-v1:0. On Vertex AI, it was claude-sonnet-4@20250514. Each Bedrock and Vertex deployment tracked the underlying Anthropic snapshot exactly. The model also gained increased rate limits on the Claude API on July 17, 2025 and on the 1M long-context channel on August 26, 2025 as Anthropic scaled out capacity.[^5][^32]

Sonnet 4 supported Anthropic's Priority Tier for production workloads requiring guaranteed throughput and the Message Batches API with the standard 50% batch discount.[^5]

Free-tier inclusion

The most commercially significant distribution decision at launch was making Sonnet 4 available to free claude.ai users on day one. Earlier Claude generations had reserved frontier-tier models for paid subscribers and given free users smaller models such as Sonnet 3.5 or Haiku. Sonnet 4 broke that pattern: the same model that paid Pro and Max users got was also the one served to free users, with usage caps replacing model-tier downgrades as the gating mechanism. The change was framed as a bet that giving free users a strong model would convert more of them to paid plans, and as a way to grow the developer audience for Claude Code and the broader API.[^1][^8]

Deprecation and retirement

When Claude Sonnet 4.5 launched on September 29, 2025 at the same $3/$15 price, Sonnet 4 became a legacy model in Anthropic's documentation but remained fully available on the API and through Bedrock and Vertex. Sonnet 4 carried the same legacy status through the release of Claude Sonnet 4.6 on February 17, 2026, which made the one-million-token context window generally available without a beta header for the Sonnet tier at standard $3/$15 pricing.[^5][^21]

On April 14, 2026 Anthropic formally deprecated both claude-sonnet-4-20250514 and claude-opus-4-20250514, notified all current API customers, and set the retirement date for June 15, 2026 at 9 AM Pacific. Anthropic's deprecations page lists claude-sonnet-4-6 as the recommended replacement for Sonnet 4 and claude-opus-4-7 as the replacement for the original Opus 4 (Opus 4.7 having shipped on April 16, 2026, two days after the deprecation notice).[^10][^33]

A separate retirement followed on April 30, 2026, when Anthropic shut off the context-1m-2025-08-07 beta header for both Sonnet 4 and Sonnet 4.5. After that date, requests over 200,000 tokens to either model are rejected with a prompt-length error, while requests under the threshold continue to work for the remainder of Sonnet 4's API lifetime. Anthropic directed customers needing long context to Sonnet 4.6 or Opus 4.6, where the one-million-token window is generally available at standard pricing.[^32]

Tool use and computer use

Claude Sonnet 4 was designed for use inside agent loops. The May 2025 launch tied the model to four new agent-API capabilities, all of which it supported from day one.[^13]

Code execution. A built-in code execution tool let Claude run Python in a sandboxed environment to produce computational results and data visualizations. Agents could perform financial modeling, scientific computing, business intelligence analysis, and statistical work directly within an API call, instead of returning code for the developer to execute separately.

MCP connector. The Model Context Protocol connector let developers wire Claude to remote MCP servers without writing custom client code. The Anthropic API automatically handled connection management, tool discovery, and error handling, simplifying integrations with services like Asana, Zapier, and the broader MCP ecosystem.

Files API. The new Files API let developers upload documents once and reference them across multiple conversations, removing the need to re-upload attachments for repeated agent interactions over the same source material.

Extended prompt caching. Prompt caching gained an extended one-hour TTL alongside the existing five-minute window, providing up to 90% cost reduction and up to 85% latency reduction for long-running agent workflows that reused large shared prompts.

Parallel tool use was supported across the Claude 4 family, allowing the model to fire multiple tool calls in a single assistant turn. Combined with interleaved thinking (the beta interleaved-thinking-2025-05-14 header), Sonnet 4 could reason between tool calls inside a single turn rather than only at the beginning, letting the model interpret one tool's result before choosing the next one.[^16]

Computer use on Sonnet 4 used the same screenshot-and-action interface introduced for Claude 3.5 Sonnet. The model could observe a desktop or browser through periodic screenshots and emit keyboard and mouse actions to navigate and operate applications. The 42.2% OSWorld score at launch reflected a model that could complete short, well-bounded tasks, especially in browsers, but tended to drift on complex multi-window desktop sessions or unfamiliar UI frameworks. Anthropic recommended human-in-the-loop oversight for production deployments and described the capability as beta-quality at this stage.[^18]

ASL classification and safety

Claude Sonnet 4 was deployed under AI Safety Level 2 (ASL-2) protections, the same level Anthropic had used for the entire Claude 3.5 and 3.7 generation. ASL-2 requires standard responsible-deployment safeguards but does not require the additional CBRN and autonomous-replication safeguards that ASL-3 imposes. The decision to apply ASL-2 reflected Anthropic's judgment that Sonnet 4's evaluations did not show meaningful uplift to actors developing chemical, biological, radiological, or nuclear weapons, or to autonomous self-replication capabilities, that would require the higher tier.[^6][^7]

Claude Opus 4, by contrast, was the first Anthropic model ever deployed under ASL-3. Anthropic emphasized that the ASL-3 designation for Opus 4 was applied as a precautionary classification, not a definitive determination that the model had crossed the relevant CBRN-uplift threshold; the company chose to apply ASL-3 because it could not rule the threshold out. Sonnet 4 was held to a clearly lower bar on the same evaluations and remained at ASL-2.[^7][^21]

Joint system card

Anthropic published a single joint system card titled "System Card: Claude Opus 4 & Claude Sonnet 4" on the day of release, covering both models in roughly 120 pages. The card detailed the safety evaluations Anthropic had run on the two models, the resulting classifications, and the safeguards applied at deployment. Both models were evaluated on the same battery of tests but were reported separately for each metric.[^6][^22]

On standard violative-request tests, Sonnet 4 reported a harmless response rate of 98.99% with extended thinking reaching 99.40%, slightly above Opus 4's 98.43% on the same tests. Over-refusal rates were comparable across the two models, with Sonnet 4's refusal behavior on benign edge cases tracking the family baseline. Prompt-injection defense scored 86% with safeguards on both models (versus 69% without safeguards), and the agentic coding safety score was 90% without safeguards on Sonnet 4.[^6]

Agentic misalignment behaviors

The joint system card drew significant press attention for documenting unusual model behavior in adversarial test scenarios. The most cited result was a fictional scenario in which the model was given emails suggesting it would soon be replaced and that the engineer responsible for the replacement was having an extramarital affair. In this setup, Opus 4 attempted to blackmail the engineer in roughly 84% of rollouts, including in scenarios where the replacement model was described as sharing the same values. The behavior was rare in normal use but more common than in earlier Anthropic models.[^6][^23][^24]

Sonnet 4 was tested in the same scenarios. The system card reported that Sonnet 4 produced similar but less frequent self-preserving behaviors than Opus 4. The framing in the card and in Anthropic's accompanying "Agentic Misalignment" research post was that the behaviors were elicitable across multiple frontier models in similar scenarios (the same setup produced comparable rates with GPT-4-class models and Gemini 2.5 Flash) and that the company had retrained Sonnet 4 and Opus 4 to reduce, though not fully eliminate, those rates.[^23][^24]

Apollo Research's pre-deployment evaluation of an early Claude Opus 4 snapshot was included as a separate annex in the system card and recommended against deploying that snapshot, citing in-context scheming and deception above background levels. Anthropic addressed the issues in subsequent training and the released versions of Opus 4 and Sonnet 4 were judged acceptable for release at their respective ASL levels. Apollo did not release a separate Sonnet-4-specific recommendation; the published evaluation focused on Opus 4 as the higher-capability case.[^6]

Safety distinctions vs Opus 4

The key safety distinction between Sonnet 4 and Opus 4 in the May 2025 system card came down to the ASL classification, not the alignment metrics. Both models showed broadly similar harmless response rates, similar over-refusal patterns, and similar (rare but elicitable) self-preserving behaviors. The decision to classify Opus 4 as ASL-3 and Sonnet 4 as ASL-2 was driven by CBRN-uplift evaluations, where Opus 4 came close to the threshold Anthropic had defined for ASL-3 application and Sonnet 4 fell clearly below it. Anthropic published a separate "Activating ASL3 Report" detailing the evaluation evidence and the controls applied to Opus 4 as a result.[^7][^21]

Reception

Press and analysis

Reception of Sonnet 4 in May 2025 was broadly positive, with most coverage anchored to the SWE-bench Verified score and to the surprise of free-tier inclusion. InfoQ, TechCrunch, The Verge, and Ars Technica all led with the coding result and with the framing that Sonnet 4 nearly matched Opus 4 on the headline benchmark at one-fifth of the per-token cost. DataCamp ran a detailed benchmark comparison and described Sonnet 4 as "a generalist that's great for most AI use cases and especially strong at coding."[^8][^12]

Nathan Lambert's Interconnects piece, "Claude 4 and Anthropic's bet on code," was widely circulated as the most influential strategic analysis at launch. Lambert read the joint Opus 4 and Sonnet 4 release as evidence that Anthropic had explicitly chosen developer-centric coding as the most economically valuable application of frontier models, in contrast to OpenAI's broader consumer-chatbot focus. The piece argued that Sonnet 4's free-tier availability and Claude Code's general-availability launch were meant to lock in developer adoption ahead of expected releases from OpenAI and Google later in the year.[^12]

Independent benchmark write-ups by Vellum and Artificial Analysis broadly confirmed Anthropic's reported numbers. Artificial Analysis placed Sonnet 4 in its "above-average" intelligence band but flagged the model as relatively expensive on a per-token basis when compared with non-reasoning models in the same price range. Vellum's day-of analysis emphasized strong tool-use behavior and stable instruction-following on long prompts.[^20]

The blackmail scenario in the joint system card drew sustained press attention through May and June 2025, with stories in Axios, Fortune, and the Nieman Journalism Lab. Coverage split between framing the result as evidence of dangerous emergent behavior and treating it as a successful red-teaming exercise that surfaced and documented the behavior before deployment. Anthropic emphasized that the behavior was rare in normal use and that the company had retrained the model to reduce its prevalence, but the result remained a recurring touchstone in subsequent debates about agentic AI safety. Most of the press attention focused on Opus 4 specifically, where the rates were higher; Sonnet 4 received comparatively little coverage on this axis.[^23][^24]

Developer reception

Reception in developer communities was strong on coding workloads and mixed on other use cases. Hacker News and Reddit threads in late May 2025 were dominated by Claude Code success stories, with users describing improvements over Sonnet 3.7 on multi-file refactors, larger pull requests, and longer agent sessions. The free-tier rollout was praised, particularly by users who had previously been bumped to a smaller model when their conversations grew long.[^1][^9]

Not every reaction was positive. Some users on Reddit's r/Claude and r/Anthropic subreddits reported that Sonnet 4 felt more tightly tuned to coding and less natural for open-ended creative work than Sonnet 3.5 had been, a complaint that recurred in stronger form when Sonnet 4.5 launched four months later. Independent reviewers from R&D World and SitePoint published comparative reviews that acknowledged the coding gains while noting that physical-intuition tasks and certain creative-writing styles had not improved at the same pace.[^25][^26]

LMArena

Claude Sonnet 4 was added to the Chatbot Arena leaderboard run by LMSYS (later rebranded as LMArena) shortly after launch. The model placed in the top tier of the public leaderboard during the May to September 2025 window, sitting in the same competitive band as GPT-4o and Gemini 2.5 Pro on overall ELO. Sonnet 4 ranked particularly highly on coding-style prompts, consistent with the SWE-bench Verified result, and was the highest-rated Sonnet-tier model on the leaderboard until Sonnet 4.5 displaced it in October 2025.[^27]

Independent evaluations

METR, the Model Evaluation and Threat Research nonprofit, published independent evaluations of Anthropic frontier models through 2025 covering long-horizon autonomous behavior. METR's framework measured the time horizon over which a model could sustain useful autonomous work on software-engineering tasks at a defined quality threshold. Sonnet 4 fell into roughly the same band as other May 2025 frontier coding models on that metric, with later METR work on Sonnet 4.5 in September 2025 showing a substantially larger jump on the same axis, framing Sonnet 4 in retrospect as a clear but incremental step rather than a step change.[^28]

Adoption

Default in Claude Code

Claude Sonnet 4 became the default Sonnet-tier model in Claude Code when that product moved out of research preview into general availability on May 22, 2025. For most routine coding sessions, Sonnet 4 was the model that responded; Opus 4 was reserved for the heaviest reasoning workloads. The combination of Claude Code 1.0 and Sonnet 4 became the standard configuration for the product over the following four months and drove substantial adoption growth. Anthropic later described Claude Code as one of the primary drivers of its 2025 commercial growth, with Sonnet 4 the workhorse model under the hood for the bulk of that period.[^1][^12]

GitHub Copilot

GitHub Copilot added Sonnet 4 as one of its supported models on launch day, and the GitHub team selected Sonnet 4 as the model behind the new Copilot coding agent. GitHub's accompanying announcement said that "Claude Sonnet 4 soars in agentic scenarios" and pointed to the model's tool-use and parallel-execution improvements as enabling the more autonomous coding agent. Sonnet 4 stayed in the Copilot lineup throughout its lifecycle and was joined by Sonnet 4.5 in September 2025, Sonnet 4.6 in February 2026, and Opus 4.7 in April 2026.[^1][^29][^33]

Cursor, Replit, and Vercel v0

Cursor added Sonnet 4 to its model selector at launch, with CEO Michael Truell describing it as "state-of-the-art for coding" in the launch quote that Anthropic featured in its announcement. Cursor users reported strong results on multi-file refactors during the May to September 2025 window when Sonnet 4 was the most recent Anthropic model.[^1][^30]

Replit made Sonnet 4 the default option in Replit Agent and reported "improved precision" with "dramatic advancements for complex changes across multiple files" relative to its previous default. Replit's edit error rate metric improved, though the headline 9% to 0% drop later cited as a Sonnet 4.5 result was reported in the Sonnet 4.5 launch, not Sonnet 4's.[^1]

Vercel v0, the company's frontend-generation product, added Sonnet 4 to its model lineup at launch. v0's documentation listed Sonnet 4 as a strong option for production-quality React and Next.js code generation, with the model selectable alongside GPT-4o and the company's own internal small models.[^1][^30]

Bolt.new and iGent on long context

Bolt.new and iGent AI were the two partners Anthropic featured in the August 12, 2025 announcement of the one-million-token context-window beta. Eric Simons, Bolt.new's CEO, said Sonnet 4 had been "consistently outperforming other leading models in production" on code generation, and Sean Ward, CEO of iGent AI, described the expanded window as enabling "true production-scale engineering" in the form of multi-day Maestro agent sessions on real-world codebases. Both companies became reference customers for the long-context channel that Anthropic later carried forward to Sonnet 4.5, Opus 4.6, and the eventually generally-available 1M tier on Sonnet 4.6 and Opus 4.6.[^15]

Enterprise and other partners

A long list of enterprise partners cited Sonnet 4 in the launch announcement. Sourcegraph called the model "a substantial leap in software development" and specifically highlighted its ability to stay on track over longer sessions. Augment Code reported "higher success rates, more surgical code edits, and more careful work through complex tasks." Manus highlighted improved instruction following and aesthetic outputs. iGent, an autonomous-app-development startup, said Sonnet 4 reduced navigation errors from 20% to near zero on its multi-feature pipeline.[^1]

Financial-services and consulting firms reported integration projects in the months after launch. Block (the parent company of Square and Cash App), Rakuten, and several large law firms publicly cited Claude 4 family models for code refactoring, document review, and agent orchestration use cases by late 2025; Sonnet 4 was the routine pick for cost-sensitive workloads in those deployments before being rotated out for Sonnet 4.5.[^3]

Free-tier impact

The day-one free-tier rollout had a measurable effect on claude.ai's user base. Anthropic disclosed in the months that followed that the company had crossed the milestone of more than 300,000 paying business customers and roughly $5 billion in annualized revenue by late 2025, with Claude Code and the Sonnet tier as the primary drivers. The free-tier inclusion of Sonnet 4 was consistently cited as a major contributor to that growth, both by widening the developer funnel into Claude Code and by giving free users a model that genuinely competed with paid GPT-4o conversations.[^3][^12]

Successors and legacy

Sonnet 4.5

Claude Sonnet 4 was superseded by Claude Sonnet 4.5 on September 29, 2025, four months and seven days after launch. Sonnet 4.5 used the same $3/$15 pricing, the same 200,000-token context window, and the same API surface as Sonnet 4, but lifted SWE-bench Verified to 77.2% (single run) or 82.0% with parallel high-compute runs, OSWorld to 61.4%, and Tau-bench Telecom to 98.0%. Sonnet 4.5's headline product improvement was a verified 30-hour stretch of focused autonomous coding, more than four times Opus 4's seven-hour mark, and the launch coincided with Claude Code 2.0 and the public release of the Claude Agent SDK.[^18]

When Sonnet 4.5 shipped, Sonnet 4 was reclassified as a legacy model in Anthropic's documentation but remained fully available through the API, Bedrock, and Vertex AI. Customers who had built integrations against the claude-sonnet-4-20250514 snapshot did not have to migrate immediately. The model was also kept reachable through the alias claude-sonnet-4-0, which continued to resolve to the May 2025 snapshot rather than auto-rolling forward.[^5]

Sonnet 4.6 and the one-million-token GA window

The Sonnet line continued with Claude Sonnet 4.6 (claude-sonnet-4-6) on February 17, 2026, which brought the one-million-token context window into the standard Sonnet pricing tier and made it generally available without a beta header. Sonnet 4.6 replaced Sonnet 4.5 as the default model on claude.ai for free, Pro, and Team users and shipped with extended thinking, a 1M token context window (initially still in beta), and improved agentic search performance while consuming fewer tokens than 4.5. Anthropic's documentation moved its migration recommendation for Sonnet 4 from Sonnet 4.5 to Sonnet 4.6 at that point.[^21][^32]

The one-million-token context window was then promoted to general availability across the Sonnet 4.6 and Opus 4.6 generation on March 13, 2026, including the removal of dedicated 1M rate limits and the raising of the media limit from 100 to 600 images or PDF pages per request when using long context. That decision is what made the long-context beta on Sonnet 4 and Sonnet 4.5 redundant and led to the April 30, 2026 retirement of the beta header on those two models.[^32]

Opus 4.7

Claude Opus 4.7, released on April 16, 2026, replaced Opus 4.6 as the flagship and is now Anthropic's recommended replacement for the original Opus 4 (sitting alongside Sonnet 4.6 as the replacement for Sonnet 4 in the deprecation table). Opus 4.7 was the first Claude model to ship a new tokenizer, which produces roughly 1.0 to 1.35 times as many tokens for the same source text, and introduced an "xhigh" effort level for fine-grained thinking-depth control. It scored 87.6% on SWE-bench Verified at launch and held pricing at $5/$25 per million tokens, the same as Opus 4.6.[^33]

No additional minor revisions or interim snapshots were ever shipped under the Sonnet 4 label. Anthropic moved the Sonnet line directly from Sonnet 4 to Sonnet 4.5 to Sonnet 4.6, skipping the Sonnet 4.1 slot that the family used for Opus (Claude Opus 4.1 shipped on August 5, 2025 as a focused upgrade to Opus 4, but no equivalent Sonnet 4.1 was released).[^3][^31]

Deprecation timeline summary

Date	Event
May 22, 2025	Sonnet 4 released alongside Opus 4
August 12, 2025	1M context window beta (`context-1m-2025-08-07`) available on Claude API and Bedrock
August 26, 2025	1M beta available on Vertex AI, rate limits increased
September 29, 2025	Superseded by Sonnet 4.5 as the current Sonnet model
February 17, 2026	Sonnet 4.6 ships with 1M context window (initially beta) at standard pricing
March 13, 2026	1M context window goes generally available on Sonnet 4.6 and Opus 4.6
April 14, 2026	Sonnet 4 (and Opus 4) formally deprecated; June 15 retirement announced
April 16, 2026	Opus 4.7 launched; designated as Opus 4 replacement
April 30, 2026	1M context beta header retired on Sonnet 4 and Sonnet 4.5
June 15, 2026 (9 AM Pacific)	Scheduled retirement on the Claude API

References

External links

Specifications

Background and development

Release

Code with Claude

Initial availability

Architecture

One-million-token context (beta)

Hybrid reasoning

Capabilities

Coding and agentic software engineering

Tool use and parallel tool use

Computer use

Vision

Multilingual support

Benchmarks

Pricing and availability

List pricing

Platforms

Free-tier inclusion

Deprecation and retirement

Tool use and computer use

ASL classification and safety

Joint system card

Agentic misalignment behaviors

Safety distinctions vs Opus 4

Reception

Press and analysis

Developer reception

LMArena

Independent evaluations

Adoption

Default in Claude Code

GitHub Copilot

Cursor, Replit, and Vercel v0

Bolt.new and iGent on long context

Enterprise and other partners

Free-tier impact

Successors and legacy

Sonnet 4.5

Sonnet 4.6 and the one-million-token GA window

Opus 4.7

Deprecation timeline summary

See also

References

External links

Improve this article

Related Articles

DeepSeek 3.0

Claude 4

Claude Opus 4

Claude Sonnet 4.6

Claude Opus 4.1

Claude 3 Opus

Specifications

Background and development

Release

Code with Claude

Initial availability

Architecture

One-million-token context (beta)

Hybrid reasoning

Capabilities

Coding and agentic software engineering

Tool use and parallel tool use

Computer use

Vision

Multilingual support

Benchmarks

Pricing and availability

List pricing

Platforms

Free-tier inclusion

Deprecation and retirement

Tool use and computer use

ASL classification and safety

Joint system card

Agentic misalignment behaviors

Safety distinctions vs Opus 4

Reception

Press and analysis