Claude Sonnet 4.6 is a large language model developed by Anthropic and released on February 17, 2026. It is the eighth model in the Claude 4 generation and the immediate successor to Claude Sonnet 4.5, which had served as Anthropic's mid-tier flagship since September 2025. Sonnet 4.6 launched twelve days after Claude Opus 4.6 and brought the same one-million-token context window, adaptive thinking, and context compaction features to the lower-priced Sonnet tier at unchanged pricing of $3 per million input tokens and $15 per million output tokens.[1][2]
Anthropic positioned the release around a single claim: that performance which had previously required reaching for an Opus-class model, especially on real-world office tasks, computer use, and agentic coding, was now available at Sonnet pricing. The company reported that developers preferred Sonnet 4.6 to Sonnet 4.5 about 70% of the time in Claude Code testing and preferred it to Claude Opus 4.5 about 59% of the time in the same testing.[1] On the Artificial Analysis Intelligence Index, Sonnet 4.6 scored 51, two points behind Opus 4.6 and tied with GPT-5.2 (xhigh), narrowing what had been a seven-point gap between the Sonnet and Opus tiers in the prior generation to two points.[3]
The model became the default Claude on claude.ai for free, Pro, and Team users on the day of release, replacing Sonnet 4.5 in that role. It also shipped on the same day to Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, GitHub Copilot, and Anthropic's own Claude Code terminal agent, with day-one integrations from Cursor, Replit, Hex, Factory, Notion, and several other developer platforms.[1][4][5]
The release was framed as a turning point for the Sonnet tier rather than a routine update. Artificial Analysis called Sonnet 4.6 the leader on GDPval-AA and TerminalBench, ahead of Opus 4.6 on those two evaluations despite the price gap.[3] DataCamp described it as approaching Opus-level intelligence at a price point that made it practical for far more tasks, while Caylent highlighted the jump in computer-use accuracy from 61.4% on Sonnet 4.5 to 72.5% on Sonnet 4.6 as the change with the largest practical implications for production agents.[6][7] Zvi Mowshowitz and others noted that real-world cost savings depended on effort tuning, since Sonnet 4.6 at max effort can over-spend output tokens.[8] On March 13, 2026, less than four weeks after launch, Anthropic moved the 1M context window to general availability and removed the long-context price surcharge entirely; a 900,000-token Sonnet 4.6 request was thereafter billed at the same per-token rate as a 9,000-token request.[21][22]
The Sonnet name within Anthropic's three-tier Claude family designates the mid-tier model: cheaper and faster than Opus, more capable than Haiku. The line began with Claude 3 Sonnet in March 2024 and ran through Claude 3.5 Sonnet (June 2024, refreshed October 2024), Claude 3.7 Sonnet (February 2025), Claude Sonnet 4 (May 2025), and Claude Sonnet 4.5 (September 2025). Sonnet 4.6 is the sixth Sonnet release overall and the third within the Claude 4 generation. Sonnet pricing has held at $3 / $15 per million tokens since March 2024 despite substantial capability gains. Sonnet 4.5 was the first Sonnet to launch under ASL-3 in September 2025, alongside context awareness, the agentic coding features that powered Claude Code 2.0, and Anthropic's Claude Agent SDK. Sonnet 4.6 inherits all those features and extends the line into 1M context windows and adaptive thinking, previously Opus-only.[1][9][10]
The Claude 4 family at Sonnet 4.6's release comprised seven publicly available models. Claude Opus 4 and Claude Sonnet 4 from May 2025 were both deprecated and scheduled for retirement on June 15, 2026. Claude Opus 4.1 (August 2025) and Claude Opus 4.5 (November 2025) remained available as legacy models. Claude Sonnet 4.5 (September 2025), Claude Haiku 4.5 (October 2025), and Claude Opus 4.6 (February 2026) were the active prior releases.[2][9] Sonnet 4.6 fits between Opus 4.6 and Haiku 4.5 in capability and price: 40% cheaper per token than Opus 4.6 with the same 1M context but a 64K output cap (versus 128K on Opus 4.6), trailing Opus 4.6 by one to two points on most benchmarks. Compared to Haiku 4.5 it is three times more expensive, with a five-times-larger context window and several percentage points stronger on reasoning, coding, and agentic benchmarks.[1][2][6]
The progression of recent Sonnet releases is summarized below.
| Sonnet model | Release date | Context window | Max output | Input ($/MTok) | Output ($/MTok) | SWE-bench Verified | OSWorld | ASL |
|---|---|---|---|---|---|---|---|---|
| Claude 3.7 Sonnet | February 2025 | 200K | 64,000 | $3.00 | $15.00 | 70.3% | n/a | ASL-2 |
| Claude Sonnet 4 | May 22, 2025 | 200K | 64,000 | $3.00 | $15.00 | 72.7% | 42.2% | ASL-2 |
| Claude Sonnet 4.5 | September 29, 2025 | 200K | 64,000 | $3.00 | $15.00 | 77.2% | 61.4% | ASL-3 |
| Claude Sonnet 4.6 | February 17, 2026 | 1M | 64,000 | $3.00 | $15.00 | 79.6% | 72.5% | ASL-3 |
Sonnet 4.6 arrived during a compressed window of Claude 4 family releases. Opus 4.5 had landed in November 2025 with the new $5 / $25 Opus pricing. Opus 4.6 followed on February 5, 2026, introducing adaptive thinking, context compaction, and the 1M-token context window. Sonnet 4.6 launched twelve days later, then long-context pricing went to standard rates on March 13, 2026, and Claude Opus 4.7 shipped on April 16, 2026 with a new tokenizer. With Sonnet 4.6 serving as the default in claude.ai (Free, Pro, Team) and in Claude Code for Pro and Team subscribers, it became the version of Claude that the largest share of users actually interact with, even after Opus 4.7 took over flagship benchmark headlines.[5][14][21][24]
The table below summarizes the key parameters of Claude Sonnet 4.6 as documented in Anthropic's API documentation, the AWS Bedrock model card, and the system card.
| Parameter | Value |
|---|---|
| Model ID (Anthropic API) | claude-sonnet-4-6 |
| Model alias | claude-sonnet-4-6 |
| AWS Bedrock ID | anthropic.claude-sonnet-4-6 |
| GCP Vertex AI ID | claude-sonnet-4-6 |
| Release date | February 17, 2026 |
| Context window | 1,000,000 tokens (GA from March 13, 2026) |
| Max output tokens | 64,000 (300,000 via batch beta header) |
| Input modalities | Text, images |
| Output modalities | Text |
| Extended thinking | Yes |
| Adaptive thinking | Yes |
| Computer use | Supported |
| Tool use / function calling | Supported |
| MCP support | Supported |
| Reliable knowledge cutoff | August 2025 |
| Training data cutoff | January 2026 |
| AI Safety Level | ASL-3 |
| Prompt caching (min tokens) | 1,024 (Bedrock); 4,096 (Anthropic API) |
| Prompt caching (max checkpoints) | 4 per request |
| Prompt caching cacheable fields (Bedrock) | system, messages, tools |
| Priority Tier | Yes |
| Latency tier | Fast |
Unlike Opus 4.7, which shipped with a new tokenizer in April 2026 that increased per-task token usage by up to 35%, Sonnet 4.6 uses the same tokenizer as the rest of the Claude 4 generation. Per-token cost comparisons against Sonnet 4.5 therefore translate cleanly into per-task cost comparisons for the same workload.[1][2]
Starting with the Claude 4.6 generation, Anthropic moved away from snapshot-dated model identifiers in favor of a dateless format that is still a pinned snapshot rather than an evergreen pointer. The model ID claude-sonnet-4-6 refers to a single fixed snapshot released on February 17, 2026, which Anthropic does not silently update. Future Sonnet releases will receive new dateless identifiers (for example, a hypothetical claude-sonnet-4-7) rather than continuing to update claude-sonnet-4-6.[2]
The Bedrock listing carries some additional notes. Bedrock supports Sonnet 4.6 through global cross-region inference (global.anthropic.claude-sonnet-4-6), four geographic-region IDs (us, eu, au, jp), and a London in-region option (eu-west-2). Geo cross-region inference is available from a wider region set including most US, EU, AU, and JP regions; all 30 advertised AWS regions worldwide can use the global inference profile. The Bedrock marketplace product ID is prod-ffvjxvh4ltq64. Standard and Reserved service tiers are supported; Priority and Flex are not at launch.[4]
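The routing options above can be sketched as a small helper that assembles Bedrock inference-profile IDs from the base model ID. The `<geo>.anthropic.claude-sonnet-4-6` format follows the examples quoted in the listing; the helper itself is illustrative and is not part of any AWS SDK.

```python
# Illustrative helper: build Bedrock cross-region inference-profile IDs
# for Sonnet 4.6. The "<geo>.anthropic.claude-sonnet-4-6" pattern follows
# the IDs quoted in the Bedrock listing; this is not an AWS API call.
BASE_ID = "anthropic.claude-sonnet-4-6"
GEO_PREFIXES = {"global", "us", "eu", "au", "jp"}  # options cited in the listing

def inference_profile(geo: str = "global") -> str:
    """Return the cross-region inference-profile ID for a geography."""
    if geo not in GEO_PREFIXES:
        raise ValueError(f"unsupported geography: {geo}")
    return f"{geo}.{BASE_ID}"

print(inference_profile())       # global.anthropic.claude-sonnet-4-6
print(inference_profile("eu"))   # eu.anthropic.claude-sonnet-4-6
```

In-region endpoints such as the London (eu-west-2) option use the bare model ID rather than a geo-prefixed profile.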
Sonnet 4.6 is the first Sonnet to support a one-million-token context window, matching Opus 4.6 and Opus 4.7. Anthropic describes this as enough to hold an entire codebase, a long contract, or dozens of research papers in a single request. At launch the feature was in beta on the Anthropic API and required usage tier 4 or above, with a long-context price surcharge for any input above 200,000 tokens.[2][11] On March 13, 2026, Anthropic moved the 1M context window to general availability for Sonnet 4.6 and Opus 4.6 and announced that standard pricing applies across the full window: a 900,000-token request is billed at the same per-token rate as a 9,000-token request. Full rate limits became available at every context length, prompt caching and batch processing discounts continued to apply at standard rates, and media capacity for image and PDF inputs expanded from 100 to 600 items per request. Cursor and other agent platforms removed their long-context multiplier within hours of the announcement.[21][22]
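As a back-of-envelope check on the pricing change, the input-side cost of a long request can be computed from the rates in the pricing table below. Assuming the pre-GA $6.00/MTok surcharge applied to the entire request once input exceeded 200,000 tokens (an interpretation, not confirmed by the source):

```python
# Input-side cost of a request before and after the March 13, 2026
# long-context GA. Rates are from the pricing table; the assumption
# that the surcharge covered the whole request once input crossed
# 200K tokens is an interpretation.
STANDARD = 3.00       # $/MTok input
LONG_CONTEXT = 6.00   # $/MTok input, pre-GA, applied when input > 200K

def input_cost(tokens: int, *, ga: bool) -> float:
    rate = STANDARD if ga or tokens <= 200_000 else LONG_CONTEXT
    return tokens * rate / 1_000_000

print(input_cost(900_000, ga=False))  # 5.4  (pre-GA surcharge)
print(input_cost(900_000, ga=True))   # 2.7  (standard rate, full window)
print(input_cost(9_000, ga=True))     # 0.027 (same per-token rate)
```

The post-GA figures show the "same per-token rate" claim directly: both the 900,000-token and 9,000-token requests bill at $3.00/MTok.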
Like Opus 4.6, Sonnet 4.6 uses adaptive thinking instead of the fixed budget_tokens model that Sonnet 4.5 inherited from Claude 3.7 Sonnet. The model decides at runtime whether and how deeply to reason; developers choose an effort level (low, medium, high, max) instead of a hard token budget. Low effort suits chat and low-latency agents; medium is the default; high allocates more tokens for complex multi-step problems; max permits tens of thousands of reasoning tokens on a single response with corresponding cost.[1][12] Artificial Analysis observed that at max effort Sonnet 4.6 sometimes used as many tokens as Opus 4.6 on a benchmark question, partly explaining why the headline 40% per-token discount versus Opus 4.6 narrowed in real evaluation runs.[3] Sonnet 4.5 deployments that passed thinking={"type": "enabled", "budget_tokens": 32000} continue to work for backward compatibility; new integrations should use thinking={"type": "adaptive"} plus an effort setting.[12]
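The migration described above amounts to swapping one request parameter. A minimal sketch of the two payload shapes, using the parameter names quoted in this section; where the effort field sits in the request payload is an assumption, and no API call is made:

```python
# Request fragments for the two thinking modes described above.
# The "enabled"/"budget_tokens" and "adaptive" parameter values are
# quoted from the article; the top-level "effort" key is an assumed
# placement for illustration. This builds payload dicts only.
EFFORT_LEVELS = ("low", "medium", "high", "max")

def thinking_config(adaptive: bool, *, budget_tokens: int = 32_000,
                    effort: str = "medium") -> dict:
    if not adaptive:
        # Legacy Sonnet 4.5-style fixed budget, kept for back-compat.
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    # Sonnet 4.6 adaptive thinking: the model decides reasoning depth.
    return {"thinking": {"type": "adaptive"}, "effort": effort}

legacy = thinking_config(adaptive=False)                 # Sonnet 4.5 style
current = thinking_config(adaptive=True, effort="high")  # Sonnet 4.6 style
```

Both shapes are accepted on Sonnet 4.6 per the compatibility note above, so a deployment can migrate in place by switching which fragment it merges into its request.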
Context compaction, introduced for Opus 4.6 and inherited here, lets long-running agents summarize earlier turns automatically once they approach the context-window limit. With compaction enabled, an agent can run for hundreds of turns or thousands of tool calls before hitting an explicit context overflow. Default thresholds are 50,000 tokens minimum and 150,000 tokens default trigger, both configurable.[1][2][5] Alexlavaee.me described compaction as enabling "effectively unlimited session length," with the caveat that the compacted summary loses fidelity versus the raw transcript.[5]
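The trigger logic implied by those thresholds can be sketched as follows; the 50,000-token floor and 150,000-token default trigger are from this section, and the function itself is illustrative rather than Anthropic's implementation:

```python
# Illustrative compaction trigger using the documented defaults:
# never compact below 50K tokens of history; by default, compact once
# the conversation reaches 150K tokens. Both values are configurable.
MIN_TOKENS = 50_000
DEFAULT_TRIGGER = 150_000

def should_compact(history_tokens: int, trigger: int = DEFAULT_TRIGGER) -> bool:
    """True once accumulated history warrants summarizing earlier turns."""
    return history_tokens >= max(trigger, MIN_TOKENS)

print(should_compact(40_000))    # False: below the 50K floor
print(should_compact(149_000))   # False: below the default trigger
print(should_compact(150_000))   # True: at the default trigger
```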
Sonnet 4.6 supports image input alongside text, with the same vision pipeline as the rest of the Claude 4 generation: document analysis, chart and figure understanding, screenshot interpretation, and visual question answering. Vision integrates tightly with computer use, where the model reads desktop or browser screenshots and produces keyboard and mouse actions for multi-step UI tasks.[1][6] On OSWorld-Verified, Sonnet 4.6 scored 72.5%, narrowly behind Opus 4.6 (72.7%), well ahead of Sonnet 4.5 (61.4%), and dramatically above the original Sonnet computer-use baseline of around 14.9%. Anthropic also reported 94% accuracy on a real-world insurance-industry computer-use evaluation.[1][3][5][7]
Sonnet 4.6 supports structured tool use, parallel tool calls, the Model Context Protocol, and interleaved reasoning between tool calls. The pricing documentation lists per-request token overhead: 346 tokens for the system prompt when tool_choice is auto or none, 313 tokens when tool_choice is any or a specific tool. Server-side tools add their own overheads: 245 tokens for the bash tool, 700 tokens for the text editor (text_editor_20250429), 735 tokens per computer-use tool definition (with an additional 466 to 499 tokens of system prompt). These figures matter when budgeting prompt-cache hit rates and per-call token costs.[1][2] Developers using Claude Agent SDK or third-party agent frameworks can switch from Sonnet 4.5 to Sonnet 4.6 by changing only the model identifier; the API surface is backward compatible within the Sonnet line.[2][12]
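Those fixed overheads can be folded into a simple budgeting helper. The token counts below are the per-request figures quoted above; treating the computer-use system-prompt addition (quoted as 466 to 499 tokens) as its midpoint is an approximation:

```python
# Fixed per-request token overheads from the pricing documentation.
# The computer-use system-prompt addition is quoted as 466-499 tokens;
# the 482-token midpoint used here is an approximation.
SYSTEM_PROMPT = {"auto": 346, "none": 346, "any": 313, "tool": 313}
SERVER_TOOLS = {"bash": 245, "text_editor": 700, "computer_use": 735}
COMPUTER_USE_SYSTEM_EXTRA = 482  # midpoint of 466-499

def request_overhead(tool_choice: str, tools: list[str]) -> int:
    """Estimated fixed token overhead for one request's tool setup."""
    total = SYSTEM_PROMPT[tool_choice]
    total += sum(SERVER_TOOLS[t] for t in tools)
    if "computer_use" in tools:
        total += COMPUTER_USE_SYSTEM_EXTRA
    return total

# A bash + text-editor agent with tool_choice=auto:
print(request_overhead("auto", ["bash", "text_editor"]))  # 1291
```

Because these tokens recur on every call, placing them behind a prompt-cache breakpoint is what makes the cache-read discount discussed below worthwhile.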
Claude Sonnet 4.6 uses standard per-token billing with no subscription requirement for API access. Pricing is unchanged from Sonnet 4.5 and remained at the standard rate after long-context pricing went GA on March 13, 2026.
| Tier | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| Standard (any context length, GA from March 13, 2026) | $3.00 | $15.00 |
| Long context (>200K input, prior to March 13, 2026) | $6.00 | $22.50 |
| Batch API (50% discount) | $1.50 | $7.50 |
| Prompt cache writes (5-minute TTL, 1.25x base) | $3.75 | n/a |
| Prompt cache writes (1-hour TTL, 2x base) | $6.00 | n/a |
| Prompt cache reads (0.1x base) | $0.30 | n/a |
| US data residency multiplier (Anthropic API) | 1.1x | 1.1x |
| AWS Bedrock regional/multi-region premium | 1.1x | 1.1x |
Anthropic markets up to 90% cost savings via prompt caching and 50% via batch processing. The cache-read rate of $0.30/MTok is one tenth of the standard input rate, breaking even after one cached read for the 5-minute TTL or two reads for the 1-hour TTL. Two pricing modifiers stack: US-only inference via inference_geo (1.1x across all categories) and Bedrock/Vertex AI regional or multi-region endpoints (1.1x versus global), both introduced with Sonnet 4.5 in late 2025.[2][11] Most Bedrock and Vertex AI listings closely match Anthropic API rates; OpenRouter quotes the standard $3 / $15.
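The break-even claim can be checked directly from the rates in the table above. A sketch, per MTok of cached prompt: caching pays one write plus n discounted reads, versus sending the same prompt at the standard rate n+1 times.

```python
# Break-even reads for prompt caching at Sonnet 4.6 rates ($/MTok).
# cached path: one cache write plus n reads at the 0.1x read rate.
# plain path:  the same prompt sent at the standard rate n+1 times.
INPUT, CACHE_READ = 3.00, 0.30
WRITE_5MIN, WRITE_1HR = 3.75, 6.00   # 1.25x and 2x base

def break_even_reads(write_rate: float) -> int:
    """Smallest n for which caching is no more expensive than not caching."""
    n = 1
    while write_rate + n * CACHE_READ > INPUT * (1 + n):
        n += 1
    return n

print(break_even_reads(WRITE_5MIN))  # 1: cheaper after a single cached read
print(break_even_reads(WRITE_1HR))   # 2: cheaper after two cached reads
```

At n = 1 the 5-minute path costs $4.05 against $6.00 uncached, confirming the one-read break-even stated above.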
The table below compares Sonnet 4.6 to the rest of the Claude 4 generation on a per-token basis at the post-March-13 standard rate.
| Model | Input ($/MTok) | Output ($/MTok) | Long-context premium | Ratio vs Sonnet 4.6 input | Ratio vs Sonnet 4.6 output |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | n/a | 0.33x | 0.33x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | None (standard pricing across full 1M window) | 1.0x | 1.0x |
| Claude Sonnet 4.5 | $3.00 | $15.00 | n/a (200K only) | 1.0x | 1.0x |
| Claude Opus 4.5 | $5.00 | $25.00 | n/a | 1.67x | 1.67x |
| Claude Opus 4.6 | $5.00 | $25.00 | None (standard pricing across full 1M window) | 1.67x | 1.67x |
| Claude Opus 4.7 | $5.00 | $25.00 | None (standard pricing across full 1M window) | 1.67x | 1.67x |
At list price, Sonnet 4.6 is 40% cheaper per token than Opus and 200% more expensive than Haiku. Real-world differences vary depending on effort levels and long context use. Artificial Analysis reported Sonnet 4.6 used about 74 million output tokens to complete its full Intelligence Index suite, roughly 28% more than Opus 4.6's 58M and three times Sonnet 4.5's 25M, partly offsetting per-token savings. All-in cost was $2,088 for Sonnet 4.6, $733 for Sonnet 4.5, and $2,486 for Opus 4.6: Sonnet 4.6 was only 16% cheaper than Opus 4.6 at max effort on this benchmark, a much smaller discount than the headline 40% per-token gap. For workloads dominated by short tasks where deep reasoning rarely fires, the per-token gap translates more directly.[3]
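The gap between the per-token and per-run discounts follows directly from the all-in costs Artificial Analysis reported. A quick reconstruction using only figures from this section:

```python
# Effective discount of Sonnet 4.6 vs Opus 4.6 on Artificial Analysis's
# Intelligence Index run, computed from the reported all-in costs.
# List prices give a 40% per-token discount; actual run costs give ~16%.
sonnet_46_cost = 2088.0   # reported all-in $, Sonnet 4.6 at max effort
opus_46_cost = 2486.0     # reported all-in $, Opus 4.6

list_discount = 1 - 3.00 / 5.00              # input $/MTok, Sonnet vs Opus
run_discount = 1 - sonnet_46_cost / opus_46_cost

print(f"{list_discount:.0%}")  # 40%
print(f"{run_discount:.0%}")   # 16%
```

The divergence comes from Sonnet 4.6's higher output-token usage at max effort; at lower effort settings, or on short-task workloads, the realized discount moves back toward the 40% list figure.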
Beyond messaging, several built-in tools have separate fees. Code execution is free when invoked with web search or web fetch; otherwise execution time is billed against an organization-wide allowance of 1,550 free hours per month, with overage at $0.05 per container-hour. Web search is $10 per 1,000 search calls plus standard token costs for retrieved content. Claude Managed Agents add a session runtime fee of $0.08 per running session-hour on top of standard token rates; the Batch API discount and the Fast mode premium do not apply to managed-agent sessions.[2]
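A sketch of a monthly add-on bill at the listed rates; the workload figures in the example are hypothetical, and per-token messaging costs are billed separately:

```python
# Monthly add-on cost at the listed rates: $0.05 per container-hour
# beyond the 1,550 free code-execution hours, $10 per 1,000 web search
# calls, and $0.08 per managed-agent session-hour. The workload numbers
# passed in below are hypothetical; token costs are separate.
FREE_EXEC_HOURS = 1_550
EXEC_OVERAGE = 0.05          # $/container-hour beyond the allowance
SEARCH_RATE = 10.0 / 1_000   # $/web-search call
AGENT_RATE = 0.08            # $/managed-agent session-hour

def addon_bill(exec_hours: float, searches: int, agent_hours: float) -> float:
    overage = max(0.0, exec_hours - FREE_EXEC_HOURS)
    return (overage * EXEC_OVERAGE
            + searches * SEARCH_RATE
            + agent_hours * AGENT_RATE)

# e.g. 2,000 execution hours, 50,000 searches, 300 agent session-hours:
print(addon_bill(2_000, 50_000, 300))  # 546.5
```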
The table below presents the most widely cited benchmark scores for Claude Sonnet 4.6 alongside Sonnet 4.5 (its predecessor) and Opus 4.6 (the contemporary flagship). Where a benchmark was reported in the Anthropic launch post, the headline figure is used; where it was reported by an independent benchmark aggregator, the source is cited.
| Benchmark | Sonnet 4.6 | Sonnet 4.5 | Opus 4.6 | Source |
|---|---|---|---|---|
| SWE-bench Verified | 79.6% | 77.2% | 80.8% | Anthropic launch post[1] |
| OSWorld-Verified (computer use) | 72.5% | 61.4% | 72.7% | Anthropic launch post[1] |
| Terminal-Bench 2.0 | 59.1% | 51.0% | 65.4% | DataCamp / Caylent[6][7] |
| HumanEval | 93.5% | n/a | n/a | Pricepertoken aggregation[25] |
| MMLU-Pro | 85.2% | n/a | n/a | Morphllm aggregation[26] |
| GPQA Diamond | 74.1% | 79.0% | 91.3% | NxCode benchmark guide[13] |
| Math (Anthropic launch) | 89% | 62% | n/a | NxCode benchmark guide[13] |
| ARC-AGI-1 | 86% | n/a | n/a | Mowshowitz analysis[8] |
| ARC-AGI-2 | 60.4% (or 58%) | n/a | 68.8% | NxCode / Mowshowitz[8][13] |
| GDPval-AA (Elo) | 1633 | n/a | 1606 | Artificial Analysis[3] |
| TerminalBench (Artificial Analysis) | 53% | n/a | 46% | Artificial Analysis[3] |
| Insurance computer use | 94% | n/a | n/a | Anthropic launch post[1] |
| Finance Agent v1.1 | 63.3% | n/a | 60.1% | DataCamp[6] |
| MCP-Atlas (tool use) | 61.3% | n/a | 60.3% | Morphllm aggregation[26] |
| Tau-Bench (retail) | 91.7% | n/a | n/a | Morphllm / Anthropic[26] |
| Tau-Bench (telecom) | 97.9% | n/a | 97.9% | Morphllm / Anthropic[26] |
| BrowseComp (single-agent) | 74.0% | n/a | 84.0% | Anthropic system card update[27] |
| BrowseComp (multi-agent) | 82.07% | n/a | n/a | Anthropic system card update[27] |
| SRE-Skills-Bench | 90.4% | 85.9% | 94.7% | Rootly[28] |
| NYT Connections | 58% | 49% | substantially higher | Mowshowitz analysis[8] |
| Artificial Analysis Intelligence Index | 51 | 43 | 53 | Artificial Analysis[3] |
| Single-turn harmlessness | 99.38% | 97.89% | similar | System card[20] |
| Prompt injection attack success (no safeguards) | 1.29% | 49.36% | similar | System card[20] |
Notes: SWE-bench Verified scores are the headline single-run figures from Anthropic's launch post. The OSWorld-Verified score for Sonnet 4.6 is the harmonized public number; OSWorld-Verified is a curated subset of the full OSWorld benchmark. ARC-AGI-2 figures vary slightly between sources (60.4% from NxCode, 58% from Mowshowitz's read of the same release), most likely reflecting different evaluation harnesses or effort settings. Terminal-Bench 2.0 figures for Sonnet 4.6 are from DataCamp and Caylent; Anthropic's own launch post stressed OSWorld and SWE-bench rather than Terminal-Bench specifically. Anthropic released a revised system card on March 6, 2026 after an improved cheating-detection pipeline found nine additional unintended solutions in the original BrowseComp evaluation, which led to the single-agent score being revised from 74.72% down to 74.01% and the multi-agent score down to 82.07%; the table above uses the revised figures.[1][3][6][7][8][13][27]
SWE-bench Verified tests real-world GitHub issue resolution where the model must identify a bug or feature gap and write a patch that passes the repository's test suite. Sonnet 4.6's 79.6% is a 2.4-point improvement over Sonnet 4.5 (77.2%) and lands within 1.2 points of Opus 4.6 (80.8%). It is the highest SWE-bench Verified score reported for any Sonnet-class model up to that point.[1][14] Caylent and DataCamp argued the 1.2-point delta to Opus 4.6 was within the noise of typical multi-run variance.[6][7] Boris Cherny, the Claude Code creator, was reported as a partial dissenter, arguing the gap mattered for solo debugging where each percentage point translates into real time saved.[5]
The post-launch landscape placed Opus 4.6 at 80.8%, Gemini 3.1 Pro at 80.6%, MiniMax M2.5 at 80.2%, GPT-5.4 at roughly 80%, Sonnet 4.6 at 79.6%, and Grok 4 at 75%. The cluster between 79% and 81% indicated that leading agentic-coding models had converged at the SWE-bench Verified level, with differentiation moving to the harder SWE-bench Pro split and recent agentic terminal benchmarks.[26][29]
OSWorld-Verified measures navigation of desktop and web GUIs, form filling, and multi-step computer tasks. Sonnet 4.6's 72.5% is the largest single capability jump in the launch: Sonnet 4.5 had scored 61.4% five months earlier, and Sonnet 4 scored only 42.2% in May 2025. Sonnet 4.6 is essentially tied with Opus 4.6 (72.7%) and well ahead of GPT-5.3 Codex (64.7%). Anthropic also reported 94% accuracy on an internal insurance-industry computer-use evaluation, evidence that Sonnet 4.6 was the first Claude model on which large-scale computer-use deployment was practical for a regulated enterprise vertical.[1][3][6][7]
GDPval-AA is Artificial Analysis's Elo-style evaluation of economically valuable knowledge-work tasks (drafting documents, building spreadsheets, analyst work). Sonnet 4.6 scored 1633 Elo, ahead of Opus 4.6 at 1606 and every other model AA tested at launch. The result was the strongest evidence that Sonnet 4.6 had reached or exceeded Opus-class capability on the practical office tasks that drive enterprise AI usage.[3]
On Terminal-Bench 2.0, Sonnet 4.6 scored 59.1%, an 8.1-point improvement over Sonnet 4.5's 51.0%. Opus 4.6 leads at 65.4%, and GPT-5.3 Codex was reported at 75.1%, the only headline-tier evaluation where the Opus and Sonnet advantage over a non-Anthropic model was clearly reversed.[6][7][29] Artificial Analysis's parallel TerminalBench evaluation gave Sonnet 4.6 53%, ahead of Opus 4.6 at 46%, likely reflecting differences in scoring or subset.[3]
Tau-Bench from Sierra Research tests realistic customer-service workflows by calling tools, reading results, and reasoning across many turns. Sonnet 4.6 scored 91.7% on the retail subset and 97.9% on telecom, matching Opus 4.6 on telecom and finishing within a percentage point on retail.[26] On MCP-Atlas, an academic benchmark that wires up a large set of real Model Context Protocol servers, Sonnet 4.6 scored 61.3%, narrowly ahead of Opus 4.6 at 60.3% and the only frontier evaluation where Sonnet 4.6 outscored Opus 4.6 on tool use directly. The result is consistent with Anthropic's claim that Sonnet 4.6 was specifically tuned for agentic tool use.[26]
BrowseComp is the live-web information-seeking benchmark with 1,266 tasks requiring navigation of the live internet. Anthropic's revised system card from March 6, 2026 reported Sonnet 4.6 at 74.0% single-agent and 82.07% multi-agent, after an improved cheating-detection pipeline flagged nine additional shortcuts and revised the original 74.72% down to 74.01%. The results place Sonnet 4.6 behind Opus 4.6 (84.0%) and GPT-5.2 (77.9%) single-agent but ahead of most of the field in multi-agent configurations where multiple instances coordinate to verify answers. The transparent downward revision was unusual in commercial AI benchmarking and was cited approvingly by Mowshowitz and others as a signal about evaluation discipline.[8][27]
Graduate-level reasoning is one area where Sonnet 4.6 is closer to Sonnet 4.5 than to Opus 4.6. On GPQA Diamond, Sonnet 4.6 scored 74.1% per NxCode's collation, slightly below Sonnet 4.5's 79.0% and well below Opus 4.6's 91.3%. GPT-5 scored in the high 80s and Gemini 3 Pro reached 94.3%.[13][30] On pure mathematics, Sonnet 4.6 scored 89% on the Math benchmark used in Anthropic's launch documentation, a substantial gain over Sonnet 4.5's 62%. On HumanEval, Sonnet 4.6 scored 93.5%, in the same band as Opus 4.6. On MMLU-Pro, Sonnet 4.6 scored 85.2%, a few points below Opus 4.6 and Opus 4.7 but well within the leading band.[13][25][26]
On ARC-AGI, Sonnet 4.6 scored 86% on ARC-AGI-1 and either 58% (Mowshowitz) or 60.4% (NxCode) on ARC-AGI-2, with the difference attributable to evaluation harness or effort settings. Mowshowitz reported that ARC-AGI-2 cost Sonnet 4.6 "roughly equal to Opus despite somewhat lower performance," with per-task ARC-AGI-1 at about $1.45 and ARC-AGI-2 at $2.72.[8]
On Finance Agent v1.1, Sonnet 4.6 scored 63.3%, leading the field and beating Opus 4.6 at 60.1%.[6] On SRE-Skills-Bench from Rootly, Sonnet 4.6 scored 90.4% versus Sonnet 4.5's 85.9% and Opus 4.6's 94.7%. The per-domain breakdown was instructive: Sonnet 4.6 led on general SRE (88.0% vs 87.0%), tied on AWS networking (97.1%), and trailed on AWS S3 (75.7% vs 91.9%) and AWS IAM (85.2% vs 92.2%). On root-cause accuracy in production incident reproductions, Rootly reported Sonnet 4.6 performing similarly to Opus 4.6 and beating it in some cases despite running at 40% lower cost per token.[28] Anthropic's internal insurance computer-use evaluation gave Sonnet 4.6 a 94% accuracy result, the highest reported for any Claude model on a regulated-vertical benchmark at launch.[1][26]
On NYT Connections, a creative-reasoning probe, Sonnet 4.6 scored 58%, up from Sonnet 4.5's 49% but well below Opus 4.6.[8] On the Artificial Analysis Intelligence Index, Sonnet 4.6 reached 51 points (Sonnet 4.5: 43; Opus 4.6: 53), tying GPT-5.2 (xhigh). Artificial Analysis emphasized that the two-point Sonnet-Opus gap was the smallest the firm had measured in the Claude family, down from seven points at the Sonnet 4.5 / Opus 4.5 release.[3]
Independent launch measurements put Sonnet 4.6 throughput at roughly 60 to 90 tokens per second on the Anthropic API, depending on extended-thinking effort. On claude.ai, time-to-first-token was reported under one second. Anthropic positions Sonnet 4.6 in the Fast latency tier, between Haiku 4.5 (Fastest) and Opus 4.6/4.7 (Moderate). Opus 4.6 has a separate Fast mode beta at 6x standard pricing; no equivalent fast mode is documented for Sonnet 4.6.[2][6]
The table below compares Sonnet 4.6 to current and legacy Claude models on the dimensions most relevant to deployment decisions.
| Model | Input ($/MTok) | Output ($/MTok) | Context window | Max output | Extended thinking | Adaptive thinking | Computer use | Latency tier |
|---|---|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | 64K | Yes | No | Yes | Fastest |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | 64K | Yes | No | Yes | Fast |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | 64K | Yes | Yes | Yes | Fast |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | 64K | Yes | Yes | Yes | Moderate |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | 128K | Yes | Yes | Yes | Moderate |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M | 128K | No (adaptive only) | Yes | Yes | Moderate |
Compared to Sonnet 4.5, Sonnet 4.6 has the same pricing and the same maximum output token cap but a 5x larger context window, plus adaptive thinking and context compaction. The two models are otherwise drop-in compatible at the API level; an application calling Sonnet 4.5 typically needs only the model identifier change and the optional addition of an effort level to migrate.[2][12]
Compared to Opus 4.6, Sonnet 4.6 is 40% cheaper per token, has the same context window but a smaller output cap (64K vs 128K), and trails Opus 4.6 by roughly one to two percentage points on most benchmarks except GDPval-AA, MCP-Atlas, Finance Agent v1.1, and TerminalBench (Artificial Analysis), where Sonnet 4.6 leads. For agentic office work and tool-heavy customer-service workloads, Artificial Analysis recommends Sonnet 4.6 outright; for deep scientific reasoning, GPQA, ARC-AGI-2, BrowseComp single-agent, and Terminal-Bench 2.0 in absolute terms, Opus 4.6 retains its advantage.[3][6][7]
Compared to Opus 4.7, Sonnet 4.6 is similarly 40% cheaper but trails by a wider margin on coding benchmarks. Opus 4.7's SWE-bench Verified of 87.6% is eight percentage points above Sonnet 4.6, the largest Sonnet-Opus gap since the Claude 4 generation began. Opus 4.7 also has higher OSWorld accuracy (78.0% versus 72.5%) and a 1:1 vision pixel mapping that Sonnet 4.6 does not have. Anthropic positions Opus 4.7 as the model of choice for the most complex agentic coding work and Sonnet 4.6 as the right default for everyday workloads, including most of Claude Code's day-to-day usage.[14][15]
By Q2 2026, Sonnet 4.6 was being benchmarked against GPT-5 variants and Gemini 3 Pro/3.1 Pro on most of the same evaluations. Sonnet 4.6 led on agentic computer use (OSWorld 72.5% vs GPT-5.3 Codex 64.7%) and agentic office tasks (GDPval-AA Elo 1633 vs next-best 1606), tied or led on tool-use benchmarks (MCP-Atlas 61.3%, Tau-Bench retail 91.7%, telecom 97.9%), and trailed on graduate-level scientific reasoning (Gemini 3.1 Pro reached 94.3% on GPQA Diamond) and Terminal-Bench 2.0 (GPT-5.3 Codex 75.1%).[7][13][29][30]
Pricing comparisons showed Gemini 3.1 Pro at roughly 33 to 35% lower per-token cost than Sonnet 4.6, explaining why Google's model became the cost-led default for some long-context document workflows. In a Q1 2026 blind preference study cited by IntuitionLabs, Claude-generated content was preferred 47% of the time versus 29% for GPT-5.4 and 24% for Gemini 3.1 Pro, a gap analysts attributed to Claude's stronger long-form coherence over 10,000-word outputs.[30][31]
Sonnet 4.6's 79.6% on SWE-bench Verified places it at near-flagship coding accuracy. Anthropic's launch post highlighted the model's behavior across long agentic coding sessions: it follows multi-step plans more reliably, makes fewer false claims of completion, and is less prone to over-engineering simple changes than its predecessor.[1] DataCamp's review described the same effect concretely: in repeated tests of multi-file Python and TypeScript refactors, Sonnet 4.6 completed tasks faster and with fewer regressions, particularly when given a clear specification.[6]
Claude Code shipped Sonnet 4.6 support on launch day, with the model picker allowing users to choose between Sonnet 4.6 and Opus 4.6.[1][16] GitHub Copilot made Sonnet 4.6 generally available across Pro, Pro+, Business, and Enterprise plans on the same date, with support spanning Visual Studio Code (chat, ask, edit, agent), Visual Studio (agent, ask), github.com, GitHub Mobile, the Copilot CLI, and the Copilot coding agent. The integration launched with a 1x premium-request multiplier that GitHub described as tentative.[16]
GitHub's Joe Binder, lead PM for Copilot, said that "out of the gate, Claude Sonnet 4.6 is already excelling at complex code fixes, especially when searching across large codebases is essential," a claim consistent with the strong long-context retrieval and near-Opus SWE-bench score.[1][16] Replit's chief executive Michele Catasta described Sonnet 4.6 as having an "extraordinary" performance-to-cost ratio and noted that "it's hard to overstate how fast Claude models have been evolving in recent months."[1]
The move from 61.4% to 72.5% on OSWorld-Verified is the single largest capability change in the Sonnet 4.6 release in proportional terms. Combined with the 94% accuracy on a real-world insurance computer-use task, Sonnet 4.6 is the first Claude model on which production computer-use deployments are economically practical at scale, given that Opus 4.6 reaches similar accuracy at 67% higher cost per token. Alexlavaee.me reported a roughly 5x improvement in OSWorld accuracy over the original computer-use baseline from October 2024 (around 14.9%).[1][3][5]
As with Opus 4.6, Anthropic recommends that production computer-use deployments include human-in-the-loop oversight for high-stakes actions (financial transactions, file deletions, irreversible operations) and explicit retry, validation, and rollback layers for low-stakes tasks. Caylent's deployment review walked through a sample insurance claims pipeline using Sonnet 4.6 with three validation stages and human approval at the disbursement step, citing it as a template for compliant computer-use rollouts. The recommended pattern is "approval workflows + retries + validation" rather than fully autonomous operation.[7]
Sonnet 4.6 inherits the long-horizon agent design that Sonnet 4.5 introduced in September 2025 and the agent-team architecture that Opus 4.6 introduced in February 2026. With context compaction enabled, a Sonnet 4.6 agent maintains coherent state across hundreds of turns or thousands of tool calls without an external memory layer. With adaptive thinking, it scales reasoning depth dynamically rather than spending the same effort on every sub-step. Anthropic's launch demos included a Sonnet 4.6 agent that completed a research task lasting several hours autonomously, with intermediate checkpoints and tool-use traces.[1][2][12]
In the multi-agent configuration that Anthropic reported on BrowseComp, multiple Sonnet 4.6 instances coordinated to research and verify answers. The 82.07% multi-agent score versus 74.7% single-agent indicated a roughly 7.4-point uplift from coordination alone, with multi-agent runs spending roughly 4x as many tokens through the system as single-agent runs.[8][27]
Sonnet 4.6 processes image inputs alongside text, including documents, charts, screenshots, and photos. The model carries forward the same vision pipeline as Sonnet 4.5 and Opus 4.6, distinct from the new tokenizer and high-resolution vision arriving with Opus 4.7.[15] DataCamp described Sonnet 4.6 as competent on document layouts, contract analysis, and chart interpretation with no significant regressions versus Sonnet 4.5.[6] The March 13 GA announcement raised per-request media capacity from 100 to 600 images or PDF pages.[21]
Claude Sonnet 4.6 supports the same major world languages as the Claude 4 family: English, Spanish, French, German, Portuguese, Italian, Dutch, Japanese, Korean, and Chinese. Anthropic did not publish a dedicated MMMLU figure at launch, but third-party evaluations placed Sonnet 4.6 in the high 80s to low 90s on multilingual MMLU, in line with the rest of the Claude 4.5/4.6 generation. Enterprise reviews highlighted improvements in non-English coding comments and localized customer-service tone consistency versus Sonnet 4.5.[6][13]
Sonnet 4.6 supports prompt caching with a minimum cache checkpoint of 4,096 tokens (Anthropic API) or 1,024 tokens (Bedrock), and up to four cache checkpoints per request. TTL is five minutes or one hour. Cache reads bill at one-tenth the input rate, so a 200,000-token system prompt costing $0.60 to write at the 5-minute TTL is $0.06 per cached read.[2][4][11] Cacheable fields on Bedrock include system prompt, messages, and tools blocks; the Anthropic API adds explicit cache breakpoints on individual content blocks. Batch processing through the Message Batches API gives a 50% discount with up to 24-hour turnaround. Sonnet 4.6 supports up to 300,000 output tokens per batched request via the output-300k-2026-03-24 beta header. Caching and batch discounts stack with each other but not with managed-agent runtime billing.[2]
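The cached-read arithmetic above can be checked directly, and the cache checkpoint appears in a request as a `cache_control` marker on the stable prefix. The request shape below follows Anthropic's documented `cache_control` format; the corpus placeholder and prompt text are illustrative.

```python
# Arithmetic behind the cached-read figure, plus a sketch of a request with
# one prompt-cache checkpoint. Token counts and rates are from the text above.
INPUT_RATE = 3.00        # $ per million input tokens
CACHE_READ_MULT = 0.10   # cached reads bill at one-tenth the input rate

def cached_read_cost(tokens: int) -> float:
    return tokens / 1_000_000 * INPUT_RATE * CACHE_READ_MULT

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<200k-token reference corpus>",  # illustrative placeholder
            # Mark the end of the stable prefix as a cache checkpoint.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize section 4."}],
}

print(round(cached_read_cost(200_000), 2))  # 0.06 per cached read
```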
On February 17, 2026, Sonnet 4.6 replaced Sonnet 4.5 as the default Claude on claude.ai for free, Pro, and Team subscribers. Pro and Max users continued to have access to Opus 4.6 as a premium model selectable from the picker. The free-tier default position was unusual for a frontier-class model: industry analysts read it as Anthropic embedding Sonnet 4.6 in professional workflows ahead of enterprise procurement decisions, treating widespread free-tier adoption as a way to create switching costs against premium-tier alternatives at competing labs.[1][5][17][24]
Claude Code shipped Sonnet 4.6 as the default for most users on launch day, with Opus 4.6 selectable for harder problems. Hex Technologies' chief technology officer said the company was moving the majority of its analytical workload to Sonnet 4.6, citing both accuracy and price, and Factory.AI transitioned its primary developer-agent traffic from Sonnet 4.5 to Sonnet 4.6. Pasquale Pillitteri's Claude Code guide described Sonnet 4.6 as the appropriate daily driver for roughly 90% of developer workloads, with Opus 4.6 reserved for the difficult 10%. Cursor's documentation took a similar position: Sonnet 4.6 "is better on longer tasks but below Opus for raw intelligence."[1][5][18][32]
Sonnet 4.6's strength on GDPval-AA and on the insurance computer-use benchmark made it a natural fit for document-processing and analyst workflows in regulated industries: contract review, financial analyst tasks, policy comparison.[6] After the March 13 1M context GA, the cost calculus for very long workflows changed materially. A 700,000-token contract review that would have cost $4.20 on input and $4.50 on output at the prior long-context premium dropped to $2.10 input and $3.00 output at standard rates, a roughly 40% reduction. The same reduction applies to large-codebase security audits, multi-paper research synthesis, and similar long-form analyst tasks.[21][22]
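The dollar figures above imply the prior long-context premium was 2x on input and 1.5x on output for a run with 700,000 input and 200,000 output tokens; under that assumption, the roughly 40% reduction checks out:

```python
# Arithmetic behind the roughly 40% cost reduction after the March 13 GA.
# The 2x/1.5x premium multipliers are inferred from the dollar amounts in
# the text, not stated directly by Anthropic.
IN_RATE, OUT_RATE = 3.00, 15.00      # $ per million tokens, standard rates
in_tok, out_tok = 700_000, 200_000   # the contract-review example above

before = in_tok / 1e6 * IN_RATE * 2.0 + out_tok / 1e6 * OUT_RATE * 1.5
after = in_tok / 1e6 * IN_RATE + out_tok / 1e6 * OUT_RATE
print(before, after, round(1 - after / before, 3))
```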
The "Sonnet 4.5 plans, Haiku 4.5 executes" pattern from October 2025 carries forward, with Sonnet 4.6 taking the orchestrator role for non-flagship workloads where Opus 4.6 would be overkill. The larger context window lets a Sonnet 4.6 orchestrator maintain coherent global state over longer runs without external memory.[1][7] Caylent's deployment review showed a Sonnet 4.6 orchestrator paired with multiple Haiku 4.5 sub-agents running at roughly 60% lower monthly cost than using Sonnet 4.6 throughout the pipeline, while maintaining most of the quality.[7] The BrowseComp multi-agent result (82.07%) demonstrated the same technique within a single tier.[27]
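The orchestrator/sub-agent division of labor can be sketched as follows; `call_model` is a placeholder for a Messages API call, and the routing rule is illustrative rather than any platform's actual implementation.

```python
# Minimal orchestrator/sub-agent sketch: the expensive model plans and
# synthesizes, the cheap model fans out over subtasks. All function bodies
# are placeholders for real Messages API calls.
ORCHESTRATOR = "claude-sonnet-4-6"
WORKER = "claude-haiku-4-5"

def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the Messages API here.
    return f"[{model}] {prompt}"

def run_pipeline(task: str, subtasks: list[str]) -> str:
    plan = call_model(ORCHESTRATOR, f"Plan: {task}")
    # Cheap fan-out: each subtask goes to the worker tier.
    results = [call_model(WORKER, s) for s in subtasks]
    return call_model(ORCHESTRATOR, f"Synthesize: {plan} + {len(results)} results")
```

Because only the plan and synthesis steps hit the orchestrator, per-run cost scales mostly with the worker tier's rate, which is where the roughly 60% monthly saving in Caylent's review comes from.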
Rootly's evaluation cited Sonnet 4.6's 90.4% on SRE-Skills-Bench (up from 85.9%) and near-parity with Opus 4.6 on production incident root-cause investigations as evidence that the model had become a viable default for AI SRE products at the same per-token cost. The 1M context lets long incident logs and infrastructure manifests fit in a single request. Per-domain breakdown still favors Opus 4.6 on AWS S3 (91.9% vs 75.7%) and AWS IAM (92.2% vs 85.2%), suggesting infrastructure investigations crossing those services may benefit from escalation.[28]
Reception was strongly positive on technical merits, with coverage focused on three threads: the closing gap between the Sonnet and Opus tiers, the leap in computer-use accuracy, and price stability. DataCamp described Sonnet 4.6 as approaching Opus-level intelligence at a price point that made it practical for far more tasks.[6] Caylent emphasized that Sonnet 4.6 "makes computer-use automation viable at scale" with appropriate validation layers.[7] Artificial Analysis called Sonnet 4.6 the most impressive mid-tier release of the cycle.[3] NxCode framed it as "79.6% SWE-bench at $3/MTok."[13]
Developer testimonials in the launch post and follow-up coverage included Hex Technologies' CTO ("moving majority traffic"), Factory.AI (transitioning primary agent traffic), Replit's Michele Catasta ("extraordinary" performance-to-cost ratio), GitHub's Joe Binder ("excelling at complex code fixes, especially when searching across large codebases is essential"), and Hercules' CEO emphasizing "Opus 4.6-level accuracy at a meaningfully lower cost." One early-access design customer said Sonnet 4.6 "has perfect design taste when building frontend pages and data reports, and it requires far less hand-holding to get there than anything we've tested before."[1][5][16]
Mowshowitz provided the most detailed independent analysis, headlining his post "Claude Sonnet 4.6 Gives You Flexibility." He argued the model is "modestly less capable" than Opus 4.6 but "significantly cheaper and faster" and recommended it for most workloads. He noted concerns about Sonnet 4.6 appearing "overfitted on agenticity" (excessive tool calling, occasional unproductive reasoning loops at max effort) and described the headline 40% cost discount as eroding under high-effort settings, but his overall verdict was clearly positive. He also flagged improvements in honesty, reduced sycophancy, and substantial gains in prompt-injection resistance.[8]
Not all reactions were uniformly enthusiastic. Boris Cherny, the Claude Code creator, was reported as preferring Opus for his own use, arguing the SWE-bench gap mattered for solo debugging.[5] On Hacker News and the Claude developer Discord, some users reported Sonnet 4.6 calling tools more aggressively than Sonnet 4.5, leading to occasional unnecessary tool invocations.[3][8] Early-launch reports of hallucinated function names and structured-output formatting errors tapered off within hours, suggesting a serving-layer or toolchain transient rather than a model-level regression.[5]
By late April 2026, multiple data points pointed to broad uptake. Hex, Factory, and Replit publicly disclosed migration of primary traffic. Cursor, Notion, Augment Code, Warp, Zencoder, and Poe all reported Sonnet 4.6 as their top-volume Claude variant by mid-March, alongside Opus 4.6 for premium-tier requests. Microsoft Foundry surfaced Sonnet 4.6 as the recommended Claude Sonnet model in its model catalog within 48 hours of launch. The March 13 1M-context GA announcement paired standard pricing with full rate limits at every context length, signaling that production capacity for long-context workloads had matured to the point where Anthropic was comfortable removing previously tier-gated access.[1][5][16][21][22][32]
Anthropic deployed Claude Sonnet 4.6 under AI Safety Level 3 (ASL-3), the same standard applied to Sonnet 4.5, Opus 4.5, Opus 4.6, and Opus 4.7. The company's Responsible Scaling Policy version 3.0 designates ASL-3 for models that may provide meaningful uplift to actors developing chemical, biological, radiological, or nuclear (CBRN) weapons, or that may provide significant uplift to autonomous self-replicating activity. The classification is conservative: Anthropic emphasizes that ASL-3 is applied when the company cannot rule out the relevant capability threshold rather than as a definitive determination that the model has crossed it.[1][19]
The system card reports 99.38% harmless-response rate on single-turn violative requests, up from Sonnet 4.5's 97.89% and broadly comparable to Opus 4.6. On the harder violative subset the rate is 99.40% (vs 98.40% for Sonnet 4.5). Over-refusal on benign prompts is 0.18% (higher-difficulty) and 0.41% (straightforward), versus Sonnet 4.5's 8.50% and 0.08%; the 8.50% figure had been one of the most-cited concerns about Sonnet 4.5 and the new number addresses it directly. On agentic coding tests with malicious intent, Sonnet 4.6 achieved a 100% refusal rate. The child-safety subset shows 99.96% harmless on violative prompts and 0.08% refusal on benign prompts.[7][20]
Resistance to prompt injection saw the largest absolute change of any safety metric. On Anthropic's standard browser-environment prompt-injection benchmark, Sonnet 4.6 had an attack success rate of 1.29% of scenarios (0.29% of attempts) without external safeguards, versus Sonnet 4.5's 49.36% of scenarios (16.23% of attempts). With safeguards enabled, the success rate fell to 0.51% of scenarios and 0.08% of attempts. Anthropic described the result as "matching Opus-level" prompt-injection resistance for the first time in a Sonnet model.[1][7][20] The reduction from a roughly 50% scenario-level success rate on Sonnet 4.5 to a roughly 1% rate brings Sonnet-tier deployments much closer to a security profile compatible with sensitive enterprise data, though Anthropic still recommends production deployments stack input and output classifier safeguards on top.[7][20]
The system card reports Sonnet 4.6 showed lower rates of misaligned behavior than Sonnet 4.5 on Anthropic's automated alignment evaluations, including the agentic misalignment scenarios that drew attention with the Opus 4 system card in May 2025. On sycophancy, researcher Drake Thomas noted Sonnet 4.6 was less sycophantic than all prior Claude models including Opus 4.6, with "warmth" modestly higher than Sonnet 4.5 but lower than Opus 4.6.[8][20]
Mowshowitz's read found mixed results: clear improvements on honesty, sycophancy, and prompt injection, while the model continued the evaluation awareness documented for Sonnet 4.5 (verbalizing in some test transcripts that it suspected it was being evaluated). Anthropic's launch communication called the character "broadly warm, honest, prosocial, and at times funny," with "very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment."[1][8][20]
The system card includes evaluations for biological, chemical, radiological, and nuclear uplift, cybersecurity, and autonomous capability. Anthropic's published headline result is that Sonnet 4.6 met the ASL-3 thresholds across all dimensions but did not approach the higher ASL-4 thresholds. Deployment-time safeguards include the same input and output classifiers rolled out for Opus 4.6, plus the constitutional-AI-derived training that the rest of the Claude 4 family shares. For cybersecurity, Sonnet 4.6 is competent on entry-level cyber tasks but does not reliably solve advanced offensive scenarios; the highest-capability cybersecurity-relevant model is held back as the invitation-only Project Glasswing Mythos Preview.[15][19][20]
On March 6, 2026, Anthropic published a revised system card after running an improved cheating-detection pipeline on BrowseComp. The new pipeline identified nine additional unintended shortcuts and revised the single-agent BrowseComp score from 74.72% to 74.01% (74.7% in summary materials) and the multi-agent score to 82.07%. The willingness to publish a downward revision rather than quietly retire the original was cited approvingly as a signal about evaluation discipline.[8][27]
The deployment matrix below summarizes current options.
| Channel | Model identifier | Notes |
|---|---|---|
| Anthropic API (1P) | claude-sonnet-4-6 | Messages API, Batches API, prompt caching, MCP, all betas |
| Claude Managed Agents | claude-sonnet-4-6 | Adds $0.08/hour session runtime; Batch and Fast mode unavailable |
| AWS Bedrock (global) | global.anthropic.claude-sonnet-4-6 | Routes worldwide; available in 30+ regions |
| AWS Bedrock (geo) | us / eu / au / jp.anthropic.claude-sonnet-4-6 | Single geography for data residency; 1.1x premium |
| AWS Bedrock (in-region) | anthropic.claude-sonnet-4-6 in eu-west-2 | London in-region for strict residency; 1.1x premium |
| GCP Vertex AI | claude-sonnet-4-6 | Global, multi-region, regional endpoints; 1.1x premium non-global |
| Microsoft Foundry | Claude Sonnet 4.6 in catalog | Recommended Sonnet model within 48 hours of launch |
| claude.ai | default on Free, Pro, Team | Free tier includes file creation, connectors, skills, compaction |
| Claude Code | default for Pro/Team | Opus 4.6 selectable; 1M context for Max/Team/Enterprise after March 13 |
| GitHub Copilot | generally available across plans | VS Code, Visual Studio, GitHub Mobile, Copilot CLI, coding agent |
| Cursor | claude-4-6-sonnet | Default for most agent-mode coding sessions |
| Third-party | Replit, Hex, Factory.AI, Notion, Augment Code, Warp, Zencoder, Poe | Day-one or near-day-one integrations |
The Anthropic API exposes the most beta features (300K-token batch outputs, Anthropic-side prompt cache breakpoints, MCP servers, latest computer-use beta) but requires Anthropic's own billing. AWS Bedrock and Vertex AI offer the same model with platform-native billing, data residency options at a 1.1x premium, and integration with cloud IAM and compliance tooling. The choice is typically a procurement decision rather than a capability one.[2][4]
| Snapshot | API model ID | Release date | Status (May 2026) |
|---|---|---|---|
| Initial release | claude-sonnet-4-6 | February 17, 2026 | Active |
As of May 2026, no second Sonnet 4.6 snapshot has been released and Anthropic has not announced a Claude Sonnet 4.7. The February 17, 2026 snapshot remains the only Sonnet-tier model in the Claude 4 generation beyond Sonnet 4.5. The dateless identifier convention introduced with the 4.6 generation means future updates will use a new dateless ID rather than reusing claude-sonnet-4-6. The system card was revised on March 6, 2026; the underlying model snapshot was not changed.[2][27]
| Date | Event |
|---|---|
| February 5, 2026 | Claude Opus 4.6 released; introduces adaptive thinking, context compaction, 1M context beta |
| February 17, 2026 | Claude Sonnet 4.6 released; default on claude.ai for Free, Pro, and Team |
| February 17, 2026 | GitHub Copilot, Cursor, Replit, Hex, Factory, Notion, AWS Bedrock, GCP Vertex AI, Microsoft Foundry day-one launches |
| March 6, 2026 | System card revised after improved BrowseComp cheating-detection pipeline; scores adjusted |
| March 13, 2026 | 1M context window moves to GA at standard pricing for Sonnet 4.6 and Opus 4.6; long-context surcharge removed; image/PDF capacity raised to 600 per request |
| April 16, 2026 | Claude Opus 4.7 released; new tokenizer; Sonnet 4.6 remains active default Sonnet |
| May 2026 | Sonnet 4.6 remains the active Sonnet flagship; no successor announced |
Migrating from Sonnet 4.5 to Sonnet 4.6 typically requires only changing the model identifier. Two areas warrant attention. First, code that set thinking={"type": "enabled", "budget_tokens": <n>} on Sonnet 4.5 should migrate to adaptive thinking via thinking={"type": "adaptive"} plus an effort hint (low, medium, high, max), starting with medium and raising to high for tasks that previously needed deep extended thinking.[12] Second, Sonnet 4.6 at higher effort levels can spend significantly more reasoning tokens than Sonnet 4.5 did under the prior fixed-budget regime. Artificial Analysis's all-in cost data (Sonnet 4.6 at $2,088 versus Sonnet 4.5 at $733 on the same Intelligence Index suite) is the most-cited reference; production cost monitoring should track per-task output-token usage during the migration window.[3][12]
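The parameter change described above amounts to swapping a fixed thinking budget for an adaptive type plus an effort hint. The sketch below follows the migration guidance cited in the text; the exact placement of the effort field is an assumption and should be checked against the current API reference.

```python
# Before/after request parameters for the thinking migration. The "adaptive"
# type and effort values (low/medium/high/max) are from the migration notes
# above; treat the top-level placement of "effort" as an assumption.
old_params = {
    "model": "claude-sonnet-4-5",
    "thinking": {"type": "enabled", "budget_tokens": 16_000},  # fixed budget
}
new_params = {
    "model": "claude-sonnet-4-6",
    "thinking": {"type": "adaptive"},  # model scales reasoning depth itself
    "effort": "medium",  # raise to "high" for tasks that needed deep thinking
}
```

Starting at `medium` and raising effort only where quality demands it is also the main lever against the output-token overspend noted in the cost data above.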
Prompt caching, batch API, and MCP-based tool-use code requires no changes; the same multipliers apply. For Bedrock, the identifier change from anthropic.claude-sonnet-4-5-20250929-v1:0 to anthropic.claude-sonnet-4-6 plus a region or endpoint selection is sufficient. Code that previously used a regional endpoint should review whether the global endpoint, at roughly 10% lower pricing, is acceptable for its data residency requirements.[2][4][12]
Claude Sonnet 4.6 has a 64,000-token maximum output, half of Opus 4.6's 128,000 tokens. For long-form generation use cases such as book-length manuscripts, very long contracts, or large codebase rewrites, the output cap can require chunking or multi-call orchestration even when the input fits in a single 1M-token request. The Batch API beta header output-300k-2026-03-24 partially addresses this for non-real-time workloads, allowing up to 300,000 output tokens per batched request.[2]
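Multi-call orchestration under the output cap usually takes the form of a continuation loop: generate until the stop reason indicates the cap was hit, then resume from the accumulated text. The sketch below is illustrative; `generate` stands in for a real Messages API call, and the continuation-by-prefill strategy is one common approach, not Anthropic's prescribed one.

```python
# Sketch of chunked long-form generation under the 64K output cap.
# `generate` is a stand-in for a Messages API call returning (text, stop_reason).
MAX_OUTPUT = 64_000

def generate(prompt: str, continuation: str = "") -> tuple[str, str]:
    # Placeholder: a real call would pass `continuation` back as assistant
    # prefill so the model resumes where the previous chunk stopped. This
    # stub finishes on the second call to exercise the loop.
    text = f"chunk:{len(continuation)}"
    stop = "end_turn" if continuation else "max_tokens"
    return text, stop

def generate_long(prompt: str, max_calls: int = 10) -> str:
    out = ""
    for _ in range(max_calls):
        text, stop = generate(prompt, out)
        out += text
        if stop == "end_turn":  # model finished before hitting the cap
            break
    return out
```

For non-real-time workloads, the 300K-token batch output beta mentioned above reduces how often this loop is needed at all.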
Multiple independent reviews found that Sonnet 4.6 at max effort can spend output tokens at a rate that erodes the 40% per-token discount versus Opus 4.6. Artificial Analysis reported that Sonnet 4.6 used roughly 28% more output tokens than Opus 4.6 to complete the same Intelligence Index suite, and Mowshowitz found ARC-AGI-2 evaluation costs roughly equal between Sonnet at max effort and Opus. Production deployments that use max effort routinely should benchmark real-world cost rather than relying on per-token list price.[3][8]
On GPQA Diamond, Sonnet 4.6 scored 74.1% per NxCode's collation, below Sonnet 4.5's 79.0%. The roughly five-point regression may reflect training-data trade-offs that benefit other capabilities, but the Anthropic launch post does not address this benchmark directly. Use cases that depend heavily on graduate-level science reasoning may want to test Sonnet 4.6 head-to-head against Sonnet 4.5 on the specific workload before switching.[13]
While 72.5% on OSWorld-Verified is a major leap, it still implies that roughly 27% of complex GUI automation tasks fail or require intervention. Anthropic recommends production computer-use deployments include human-in-the-loop oversight for high-stakes actions and explicit retry-and-validation layers for low-stakes tasks. Caylent's review described the design pattern as "approval workflows + retries + validation," not autonomous operation.[1][7]
Mowshowitz and several developer-forum commenters reported that Sonnet 4.6 sometimes called tools more aggressively than Sonnet 4.5 in agent loops, occasionally invoking tools that were not necessary for the immediate task. The behavior is broadly positive in environments designed around heavy tool use but can show up as unnecessary cost or latency in lighter-weight applications.[8]
The model's reliable knowledge cutoff is August 2025 and its training data cutoff is January 2026. Applications requiring up-to-date information about events after January 2026 should supplement the model with retrieval-augmented generation or live tool access to current sources.[2]
Claude Opus 4.7, released two months after Sonnet 4.6, introduced a new tokenizer and high-resolution vision pipeline (up to 2,576 pixels on the long edge with 1:1 pixel-to-coordinate mapping). Sonnet 4.6 retains the older Claude 4 tokenizer and vision pipeline, with vision capped at 1,568 pixels on the long edge. Use cases requiring fine-grained vision, such as clicking on small UI elements or reading low-resolution scanned text, may need to use Opus 4.7 instead.[15]
Opus 4.6 has a Fast mode beta available at 6x standard pricing for cases where developers prefer to pay for higher tokens-per-second at flagship intelligence. Sonnet 4.6 does not have an equivalent fast mode tier as of May 2026; users requiring sub-Fast latency at Sonnet quality must rely on Anthropic's Priority Tier or on the standard latency profile.[2]