Claude Sonnet 4.6 is a large language model developed by Anthropic and released on February 17, 2026. It is the eighth model in the Claude 4 generation and the immediate successor to Claude Sonnet 4.5, which had served as Anthropic's mid-tier flagship since September 2025. Sonnet 4.6 launched twelve days after Claude Opus 4.6 and brought the same one-million-token context window, adaptive thinking, and context compaction features to the lower-priced Sonnet tier at unchanged pricing of $3 per million input tokens and $15 per million output tokens.[1][2]
Anthropic positioned the release around a single claim: that performance which had previously required reaching for an Opus-class model, especially on real-world office tasks, computer use, and agentic coding, was now available at Sonnet pricing. The company reported that developers preferred Sonnet 4.6 to Sonnet 4.5 about 70% of the time in Claude Code testing and preferred it to Claude Opus 4.5 about 59% of the time in the same testing.[1] On the Artificial Analysis Intelligence Index, Sonnet 4.6 scored 51, two points behind Opus 4.6 and tied with GPT-5.2 (xhigh), narrowing what had been a seven-point gap between the Sonnet and Opus tiers in the prior generation to two points.[3]
The model became the default Claude on claude.ai for free, Pro, and Team users on the day of release, replacing Sonnet 4.5 in that role. It also shipped on the same day to Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, GitHub Copilot, and Anthropic's own Claude Code terminal agent, with day-one integrations from Cursor, Replit, Hex, Factory, Notion, and several other developer platforms.[1][4][5]
The release was framed as a turning point for the Sonnet tier rather than a routine update. Artificial Analysis called Sonnet 4.6 the leader on GDPval-AA and TerminalBench, ahead of Opus 4.6 on those two evaluations despite the price gap.[3] DataCamp described it as approaching Opus-level intelligence at a price point that made it practical for far more tasks, while Caylent highlighted the jump in computer-use accuracy from 61.4% on Sonnet 4.5 to 72.5% on Sonnet 4.6 as the change with the largest practical implications for production agents.[6][7] Zvi Mowshowitz and others noted that real-world cost savings depended on effort tuning, since Sonnet 4.6 at max effort can over-spend output tokens.[8] On March 13, 2026, less than four weeks after launch, Anthropic moved the 1M context window to general availability and removed the long-context price surcharge entirely; a 900,000-token Sonnet 4.6 request was thereafter billed at the same per-token rate as a 9,000-token request.[21][22]
The Sonnet name within Anthropic's three-tier Claude family designates the mid-tier model: cheaper and faster than Opus, more capable than Haiku. The line began with Claude 3 Sonnet in March 2024 and ran through Claude 3.5 Sonnet (June 2024, refreshed October 2024), Claude 3.7 Sonnet (February 2025), Claude Sonnet 4 (May 2025), and Claude Sonnet 4.5 (September 2025). Sonnet 4.6 is the sixth Sonnet release overall and the third within the Claude 4 generation. Sonnet pricing has held at $3 / $15 per million tokens since March 2024 despite substantial capability gains. Sonnet 4.5 was the first Sonnet to launch under ASL-3 in September 2025, alongside context awareness, the agentic coding features that powered Claude Code 2.0, and Anthropic's Claude Agent SDK. Sonnet 4.6 inherits all those features and extends the line into 1M context windows and adaptive thinking, previously Opus-only.[1][9][10]
The Claude 4 family at Sonnet 4.6's release comprised seven publicly available models. Claude Opus 4 and Claude Sonnet 4 from May 2025 were both deprecated and scheduled for retirement on June 15, 2026. Claude Opus 4.1 (August 2025) and Claude Opus 4.5 (November 2025) remained available as legacy models. Claude Sonnet 4.5 (September 2025), Claude Haiku 4.5 (October 2025), and Claude Opus 4.6 (February 2026) were the active prior releases.[2][9] Sonnet 4.6 fits between Opus 4.6 and Haiku 4.5 in capability and price: 40% cheaper per token than Opus 4.6 with the same 1M context but a 64K output cap (versus 128K on Opus 4.6), trailing Opus 4.6 by one to two points on most benchmarks. Compared to Haiku 4.5 it is three times more expensive, with a five-times-larger context window and several percentage points stronger on reasoning, coding, and agentic benchmarks.[1][2][6]
The progression of recent Sonnet releases is summarized below.
| Sonnet model | Release date | Context window | Max output | Input ($/MTok) | Output ($/MTok) | SWE-bench Verified | OSWorld | ASL |
|---|---|---|---|---|---|---|---|---|
| Claude 3.7 Sonnet | February 2025 | 200K | 64,000 | $3.00 | $15.00 | 70.3% | n/a | ASL-2 |
| Claude Sonnet 4 | May 22, 2025 | 200K | 64,000 | $3.00 | $15.00 | 72.7% | 42.2% | ASL-2 |
| Claude Sonnet 4.5 | September 29, 2025 | 200K | 64,000 | $3.00 | $15.00 | 77.2% | 61.4% | ASL-3 |
| Claude Sonnet 4.6 | February 17, 2026 | 1M | 64,000 | $3.00 | $15.00 | 79.6% | 72.5% | ASL-3 |
Sonnet 4.6 arrived during a compressed window of Claude 4 family releases. Opus 4.5 had landed in November 2025 with the new $5 / $25 Opus pricing. Opus 4.6 followed on February 5, 2026, introducing adaptive thinking, context compaction, and the 1M-token context window. Sonnet 4.6 launched twelve days later, then long-context pricing went to standard rates on March 13, 2026, and Claude Opus 4.7 shipped on April 16, 2026 with a new tokenizer. With Sonnet 4.6 serving as the default in claude.ai (Free, Pro, Team) and in Claude Code for Pro and Team subscribers, it became the version of Claude that the largest share of users actually interact with, even after Opus 4.7 took over flagship benchmark headlines.[5][14][21][24]
The table below summarizes the key parameters of Claude Sonnet 4.6 as documented in Anthropic's API documentation, the AWS Bedrock model card, and the system card.
| Parameter | Value |
|---|---|
| Model ID (Anthropic API) | claude-sonnet-4-6 |
| Model alias | claude-sonnet-4-6 |
| AWS Bedrock ID | anthropic.claude-sonnet-4-6 |
| GCP Vertex AI ID | claude-sonnet-4-6 |
| Release date | February 17, 2026 |
| Context window | 1,000,000 tokens (GA from March 13, 2026) |
| Max output tokens | 64,000 (300,000 via batch beta header) |
| Input modalities | Text, images |
| Output modalities | Text |
| Extended thinking | Yes |
| Adaptive thinking | Yes |
| Computer use | Supported |
| Tool use / function calling | Supported |
| MCP support | Supported |
| Reliable knowledge cutoff | August 2025 |
| Training data cutoff | January 2026 |
| AI Safety Level | ASL-3 |
| Prompt caching (min tokens) | 1,024 (Bedrock); 4,096 (Anthropic API) |
| Prompt caching (max checkpoints) | 4 per request |
| Prompt caching cacheable fields (Bedrock) | system, messages, tools |
| Priority Tier | Yes |
| Latency tier | Fast |
Unlike Opus 4.7, which shipped with a new tokenizer in April 2026 that increased per-task token usage by up to 35%, Sonnet 4.6 uses the same tokenizer as the rest of the Claude 4 generation. Per-token cost comparisons against Sonnet 4.5 therefore translate cleanly into per-task cost comparisons for the same workload.[1][2]
Starting with the Claude 4.6 generation, Anthropic moved away from snapshot-dated model identifiers in favor of a dateless format that is still a pinned snapshot rather than an evergreen pointer. The model ID claude-sonnet-4-6 refers to a single fixed snapshot released on February 17, 2026, which Anthropic does not silently update. Future Sonnet releases will receive new dateless identifiers (for example, a hypothetical claude-sonnet-4-7) rather than continuing to update claude-sonnet-4-6.[2]
The Bedrock listing carries some additional notes. Bedrock supports Sonnet 4.6 through global cross-region inference (global.anthropic.claude-sonnet-4-6), four geographic-region IDs (us, eu, au, jp), and a London in-region option (eu-west-2). Geo cross-region inference is available from a wider region set including most US, EU, AU, and JP regions; all 30 advertised AWS regions worldwide can use the global inference profile. The Bedrock marketplace product ID is prod-ffvjxvh4ltq64. Standard and Reserved service tiers are supported; Priority and Flex are not at launch.[4]
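The routing options above can be sketched as a small helper that assembles Bedrock inference-profile IDs from the base model ID. The `<geo>.anthropic.claude-sonnet-4-6` format follows the examples quoted in the listing; the helper itself is illustrative and is not part of any AWS SDK.

```python
# Illustrative helper: build Bedrock cross-region inference-profile IDs
# for Sonnet 4.6. The "<geo>.anthropic.claude-sonnet-4-6" pattern follows
# the IDs quoted in the Bedrock listing; this is not an AWS API call.
BASE_ID = "anthropic.claude-sonnet-4-6"
GEO_PREFIXES = {"global", "us", "eu", "au", "jp"}  # options cited in the listing

def inference_profile(geo: str = "global") -> str:
    """Return the cross-region inference-profile ID for a geography."""
    if geo not in GEO_PREFIXES:
        raise ValueError(f"unsupported geography: {geo}")
    return f"{geo}.{BASE_ID}"

print(inference_profile())       # global.anthropic.claude-sonnet-4-6
print(inference_profile("eu"))   # eu.anthropic.claude-sonnet-4-6
```

In-region endpoints such as the London (eu-west-2) option use the bare model ID rather than a geo-prefixed profile.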
Sonnet 4.6 is the first Sonnet to support a one-million-token context window, matching Opus 4.6 and Opus 4.7. Anthropic describes this as enough to hold an entire codebase, a long contract, or dozens of research papers in a single request. At launch the feature was in beta on the Anthropic API and required usage tier 4 or above, with a long-context price surcharge for any input above 200,000 tokens.[2][11] On March 13, 2026, Anthropic moved the 1M context window to general availability for Sonnet 4.6 and Opus 4.6 and announced that standard pricing applies across the full window: a 900,000-token request is billed at the same per-token rate as a 9,000-token request. Full rate limits became available at every context length, prompt caching and batch processing discounts continued to apply at standard rates, and media capacity for image and PDF inputs expanded from 100 to 600 items per request. Cursor and other agent platforms removed their long-context multiplier within hours of the announcement.[21][22]
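As a back-of-envelope check on the pricing change, the input-side cost of a long request can be computed from the rates in the pricing table below. Assuming the pre-GA $6.00/MTok surcharge applied to the entire request once input exceeded 200,000 tokens (an interpretation, not confirmed by the source):

```python
# Input-side cost of a request before and after the March 13, 2026
# long-context GA. Rates are from the pricing table; the assumption
# that the surcharge covered the whole request once input crossed
# 200K tokens is an interpretation.
STANDARD = 3.00       # $/MTok input
LONG_CONTEXT = 6.00   # $/MTok input, pre-GA, applied when input > 200K

def input_cost(tokens: int, *, ga: bool) -> float:
    rate = STANDARD if ga or tokens <= 200_000 else LONG_CONTEXT
    return tokens * rate / 1_000_000

print(input_cost(900_000, ga=False))  # 5.4  (pre-GA surcharge)
print(input_cost(900_000, ga=True))   # 2.7  (standard rate, full window)
print(input_cost(9_000, ga=True))     # 0.027 (same per-token rate)
```

The post-GA figures show the "same per-token rate" claim directly: both the 900,000-token and 9,000-token requests bill at $3.00/MTok.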
Like Opus 4.6, Sonnet 4.6 uses adaptive thinking instead of the fixed budget_tokens model that Sonnet 4.5 inherited from Claude 3.7 Sonnet. The model decides at runtime whether and how deeply to reason; developers choose an effort level (low, medium, high, max) instead of a hard token budget. Low effort suits chat and low-latency agents; medium is the default; high allocates more tokens for complex multi-step problems; max permits tens of thousands of reasoning tokens on a single response with corresponding cost.[1][12] Artificial Analysis observed that at max effort Sonnet 4.6 sometimes used as many tokens as Opus 4.6 on a benchmark question, partly explaining why the headline 40% per-token discount versus Opus 4.6 narrowed in real evaluation runs.[3] Sonnet 4.5 deployments that passed thinking={"type": "enabled", "budget_tokens": 32000} continue to work for backward compatibility; new integrations should use thinking={"type": "adaptive"} plus an effort setting.[12]
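The migration described above amounts to swapping one request parameter. A minimal sketch of the two payload shapes, using the parameter names quoted in this section; where the effort field sits in the request payload is an assumption, and no API call is made:

```python
# Request fragments for the two thinking modes described above.
# The "enabled"/"budget_tokens" and "adaptive" parameter values are
# quoted from the article; the top-level "effort" key is an assumed
# placement for illustration. This builds payload dicts only.
EFFORT_LEVELS = ("low", "medium", "high", "max")

def thinking_config(adaptive: bool, *, budget_tokens: int = 32_000,
                    effort: str = "medium") -> dict:
    if not adaptive:
        # Legacy Sonnet 4.5-style fixed budget, kept for back-compat.
        return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    # Sonnet 4.6 adaptive thinking: the model decides reasoning depth.
    return {"thinking": {"type": "adaptive"}, "effort": effort}

legacy = thinking_config(adaptive=False)                 # Sonnet 4.5 style
current = thinking_config(adaptive=True, effort="high")  # Sonnet 4.6 style
```

Both shapes are accepted on Sonnet 4.6 per the compatibility note above, so a deployment can migrate in place by switching which fragment it merges into its request.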
Context compaction, introduced for Opus 4.6 and inherited here, lets long-running agents summarize earlier turns automatically once they approach the context-window limit. With compaction enabled, an agent can run for hundreds of turns or thousands of tool calls before hitting an explicit context overflow. Default thresholds are 50,000 tokens minimum and 150,000 tokens default trigger, both configurable.[1][2][5] Alexlavaee.me described compaction as enabling "effectively unlimited session length," with the caveat that the compacted summary loses fidelity versus the raw transcript.[5]
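The trigger logic implied by those thresholds can be sketched as follows; the 50,000-token floor and 150,000-token default trigger are from this section, and the function itself is illustrative rather than Anthropic's implementation:

```python
# Illustrative compaction trigger using the documented defaults:
# never compact below 50K tokens of history; by default, compact once
# the conversation reaches 150K tokens. Both values are configurable.
MIN_TOKENS = 50_000
DEFAULT_TRIGGER = 150_000

def should_compact(history_tokens: int, trigger: int = DEFAULT_TRIGGER) -> bool:
    """True once accumulated history warrants summarizing earlier turns."""
    return history_tokens >= max(trigger, MIN_TOKENS)

print(should_compact(40_000))    # False: below the 50K floor
print(should_compact(149_000))   # False: below the default trigger
print(should_compact(150_000))   # True: at the default trigger
```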
Sonnet 4.6 supports image input alongside text, with the same vision pipeline as the rest of the Claude 4 generation: document analysis, chart and figure understanding, screenshot interpretation, and visual question answering. Vision integrates tightly with computer use, where the model reads desktop or browser screenshots and produces keyboard and mouse actions for multi-step UI tasks.[1][6] On OSWorld-Verified, Sonnet 4.6 scored 72.5%, narrowly behind Opus 4.6 (72.7%), well ahead of Sonnet 4.5 (61.4%), and dramatically above the original Sonnet computer-use baseline of around 14.9%. Anthropic also reported 94% accuracy on a real-world insurance-industry computer-use evaluation.[1][3][5][7]
Sonnet 4.6 supports structured tool use, parallel tool calls, the Model Context Protocol, and interleaved reasoning between tool calls. The pricing documentation lists per-request token overhead: 346 tokens for the system prompt when tool_choice is auto or none, 313 tokens when tool_choice is any or a specific tool. Server-side tools add their own overheads: 245 tokens for the bash tool, 700 tokens for the text editor (text_editor_20250429), 735 tokens per computer-use tool definition (with an additional 466 to 499 tokens of system prompt). These figures matter when budgeting prompt-cache hit rates and per-call token costs.[1][2] Developers using Claude Agent SDK or third-party agent frameworks can switch from Sonnet 4.5 to Sonnet 4.6 by changing only the model identifier; the API surface is backward compatible within the Sonnet line.[2][12]
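Those fixed overheads can be folded into a simple budgeting helper. The token counts below are the per-request figures quoted above; treating the computer-use system-prompt addition (quoted as 466 to 499 tokens) as its midpoint is an approximation:

```python
# Fixed per-request token overheads from the pricing documentation.
# The computer-use system-prompt addition is quoted as 466-499 tokens;
# the 482-token midpoint used here is an approximation.
SYSTEM_PROMPT = {"auto": 346, "none": 346, "any": 313, "tool": 313}
SERVER_TOOLS = {"bash": 245, "text_editor": 700, "computer_use": 735}
COMPUTER_USE_SYSTEM_EXTRA = 482  # midpoint of 466-499

def request_overhead(tool_choice: str, tools: list[str]) -> int:
    """Estimated fixed token overhead for one request's tool setup."""
    total = SYSTEM_PROMPT[tool_choice]
    total += sum(SERVER_TOOLS[t] for t in tools)
    if "computer_use" in tools:
        total += COMPUTER_USE_SYSTEM_EXTRA
    return total

# A bash + text-editor agent with tool_choice=auto:
print(request_overhead("auto", ["bash", "text_editor"]))  # 1291
```

Because these tokens recur on every call, placing them behind a prompt-cache breakpoint is what makes the cache-read discount discussed below worthwhile.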
Claude Sonnet 4.6 uses standard per-token billing with no subscription requirement for API access. Pricing is unchanged from Sonnet 4.5 and remained at the standard rate after long-context pricing went GA on March 13, 2026.
| Tier | Input ($/MTok) | Output ($/MTok) |
|---|---|---|
| Standard (any context length, GA from March 13, 2026) | $3.00 | $15.00 |
| Long context (>200K input, prior to March 13, 2026) | $6.00 | $22.50 |
| Batch API (50% discount) | $1.50 | $7.50 |
| Prompt cache writes (5-minute TTL, 1.25x base) | $3.75 | n/a |
| Prompt cache writes (1-hour TTL, 2x base) | $6.00 | n/a |
| Prompt cache reads (0.1x base) | $0.30 | n/a |
| US data residency multiplier (Anthropic API) | 1.1x | 1.1x |
| AWS Bedrock regional/multi-region premium | 1.1x | 1.1x |
Anthropic markets up to 90% cost savings via prompt caching and 50% via batch processing. The cache-read rate of $0.30/MTok is one tenth of the standard input rate, breaking even after one cached read for the 5-minute TTL or two reads for the 1-hour TTL. Two pricing modifiers stack: US-only inference via inference_geo (1.1x across all categories) and Bedrock/Vertex AI regional or multi-region endpoints (1.1x versus global), both introduced with Sonnet 4.5 in late 2025.[2][11] Most Bedrock and Vertex AI listings closely match Anthropic API rates; OpenRouter quotes the standard $3 / $15.
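The break-even claim can be checked directly from the rates in the table above. A sketch, per MTok of cached prompt: caching pays one write plus n discounted reads, versus sending the same prompt at the standard rate n+1 times.

```python
# Break-even reads for prompt caching at Sonnet 4.6 rates ($/MTok).
# cached path: one cache write plus n reads at the 0.1x read rate.
# plain path:  the same prompt sent at the standard rate n+1 times.
INPUT, CACHE_READ = 3.00, 0.30
WRITE_5MIN, WRITE_1HR = 3.75, 6.00   # 1.25x and 2x base

def break_even_reads(write_rate: float) -> int:
    """Smallest n for which caching is no more expensive than not caching."""
    n = 1
    while write_rate + n * CACHE_READ > INPUT * (1 + n):
        n += 1
    return n

print(break_even_reads(WRITE_5MIN))  # 1: cheaper after a single cached read
print(break_even_reads(WRITE_1HR))   # 2: cheaper after two cached reads
```

At n = 1 the 5-minute path costs $4.05 against $6.00 uncached, confirming the one-read break-even stated above.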
The table below compares Sonnet 4.6 to the rest of the Claude 4 generation on a per-token basis at the post-March-13 standard rate.
| Model | Input ($/MTok) | Output ($/MTok) | Long-context premium | Ratio vs Sonnet 4.6 input | Ratio vs Sonnet 4.6 output |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | n/a | 0.33x | 0.33x |
| Claude Sonnet 4.6 | $3.00 | $15.00 | None (standard pricing across full 1M window) | 1.0x | 1.0x |
| Claude Sonnet 4.5 | $3.00 | $15.00 | n/a (200K only) | 1.0x | 1.0x |
| Claude Opus 4.5 | $5.00 | $25.00 | n/a | 1.67x | 1.67x |
| Claude Opus 4.6 | $5.00 | $25.00 | None (standard pricing across full 1M window) | 1.67x | 1.67x |
| Claude Opus 4.7 | $5.00 | $25.00 | None (standard pricing across full 1M window) | 1.67x | 1.67x |
At list price, Sonnet 4.6 is 40% cheaper per token than Opus and 200% more expensive than Haiku. Real-world differences vary depending on effort levels and long context use. Artificial Analysis reported Sonnet 4.6 used about 74 million output tokens to complete its full Intelligence Index suite, roughly 28% more than Opus 4.6's 58M and three times Sonnet 4.5's 25M, partly offsetting per-token savings. All-in cost was $2,088 for Sonnet 4.6, $733 for Sonnet 4.5, and $2,486 for Opus 4.6: Sonnet 4.6 was only 16% cheaper than Opus 4.6 at max effort on this benchmark, a much smaller discount than the headline 40% per-token gap. For workloads dominated by short tasks where deep reasoning rarely fires, the per-token gap translates more directly.[3]
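The gap between the per-token and per-run discounts follows directly from the all-in costs Artificial Analysis reported. A quick reconstruction using only figures from this section:

```python
# Effective discount of Sonnet 4.6 vs Opus 4.6 on Artificial Analysis's
# Intelligence Index run, computed from the reported all-in costs.
# List prices give a 40% per-token discount; actual run costs give ~16%.
sonnet_46_cost = 2088.0   # reported all-in $, Sonnet 4.6 at max effort
opus_46_cost = 2486.0     # reported all-in $, Opus 4.6

list_discount = 1 - 3.00 / 5.00              # input $/MTok, Sonnet vs Opus
run_discount = 1 - sonnet_46_cost / opus_46_cost

print(f"{list_discount:.0%}")  # 40%
print(f"{run_discount:.0%}")   # 16%
```

The divergence comes from Sonnet 4.6's higher output-token usage at max effort; at lower effort settings, or on short-task workloads, the realized discount moves back toward the 40% list figure.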
Beyond messaging, several built-in tools have separate fees. Code execution is free when invoked with web search or web fetch; otherwise execution time is billed against an organization-wide allowance of 1,550 free hours per month, with overage at $0.05 per container-hour. Web search is $10 per 1,000 search calls plus standard token costs for retrieved content. Claude Managed Agents add a session runtime fee of $0.08 per running session-hour on top of standard token rates; the Batch API discount and the Fast mode premium do not apply to managed-agent sessions.[2]
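A sketch of a monthly add-on bill at the listed rates; the workload figures in the example are hypothetical, and per-token messaging costs are billed separately:

```python
# Monthly add-on cost at the listed rates: $0.05 per container-hour
# beyond the 1,550 free code-execution hours, $10 per 1,000 web search
# calls, and $0.08 per managed-agent session-hour. The workload numbers
# passed in below are hypothetical; token costs are separate.
FREE_EXEC_HOURS = 1_550
EXEC_OVERAGE = 0.05          # $/container-hour beyond the allowance
SEARCH_RATE = 10.0 / 1_000   # $/web-search call
AGENT_RATE = 0.08            # $/managed-agent session-hour

def addon_bill(exec_hours: float, searches: int, agent_hours: float) -> float:
    overage = max(0.0, exec_hours - FREE_EXEC_HOURS)
    return (overage * EXEC_OVERAGE
            + searches * SEARCH_RATE
            + agent_hours * AGENT_RATE)

# e.g. 2,000 execution hours, 50,000 searches, 300 agent session-hours:
print(addon_bill(2_000, 50_000, 300))  # 546.5
```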
The table below presents the most widely cited benchmark scores for Claude Sonnet 4.6 alongside Sonnet 4.5 (its predecessor) and Opus 4.6 (the contemporary flagship). Where a benchmark was reported in the Anthropic launch post, the headline figure is used; where it was reported by an independent benchmark aggregator, the source is cited.
| Benchmark | Sonnet 4.6 | Sonnet 4.5 | Opus 4.6 | Source |
|---|---|---|---|---|
| SWE-bench Verified | 79.6% | 77.2% | 80.8% | Anthropic launch post[1] |
| OSWorld-Verified (computer use) | 72.5% | 61.4% | 72.7% | Anthropic launch post[1] |
| Terminal-Bench 2.0 | 59.1% | 51.0% | 65.4% | DataCamp / Caylent[6][7] |
| HumanEval | 93.5% | n/a | n/a | Pricepertoken aggregation[25] |
| MMLU-Pro | 85.2% | n/a | n/a | Morphllm aggregation[26] |
| GPQA Diamond | 74.1% | 79.0% | 91.3% | NxCode benchmark guide[13] |
| Math (Anthropic launch) | 89% | 62% | n/a | NxCode benchmark guide[13] |
| ARC-AGI-1 | 86% | n/a | n/a | Mowshowitz analysis[8] |
| ARC-AGI-2 | 60.4% (or 58%) | n/a | 68.8% | NxCode / Mowshowitz[8][13] |
| GDPval-AA (Elo) | 1633 | n/a | 1606 | Artificial Analysis[3] |
| TerminalBench (Artificial Analysis) | 53% | n/a | 46% | Artificial Analysis[3] |
| Insurance computer use | 94% | n/a | n/a | Anthropic launch post[1] |
| Finance Agent v1.1 | 63.3% | n/a | 60.1% | DataCamp[6] |
| MCP-Atlas (tool use) | 61.3% | n/a | 60.3% | Morphllm aggregation[26] |
| Tau-Bench (retail) | 91.7% | n/a | n/a | Morphllm / Anthropic[26] |
| Tau-Bench (telecom) | 97.9% | n/a | 97.9% | Morphllm / Anthropic[26] |
| BrowseComp (single-agent) | 74.0% | n/a | 84.0% | Anthropic system card update[27] |
| BrowseComp (multi-agent) | 82.07% | n/a | n/a | Anthropic system card update[27] |
| SRE-Skills-Bench | 90.4% | 85.9% | 94.7% | Rootly[28] |
| NYT Connections | 58% | 49% | substantially higher | Mowshowitz analysis[8] |
| Artificial Analysis Intelligence Index | 51 | 43 | 53 | Artificial Analysis[3] |
| Single-turn harmlessness | 99.38% | 97.89% | similar | System card[20] |
| Prompt injection attack success (no safeguards) | 1.29% | 49.36% | similar | System card[20] |
Notes: SWE-bench Verified scores are the headline single-run figures from Anthropic's launch post. The OSWorld-Verified score for Sonnet 4.6 is the harmonized public number; OSWorld-Verified is a curated subset of the full OSWorld benchmark. ARC-AGI-2 figures vary slightly between sources (60.4% from NxCode, 58% from Mowshowitz's read of the same release), most likely reflecting different evaluation harnesses or effort settings. Terminal-Bench 2.0 figures for Sonnet 4.6 are from DataCamp and Caylent; Anthropic's own launch post stressed OSWorld and SWE-bench rather than Terminal-Bench specifically. Anthropic released a revised system card on March 6, 2026 after an improved cheating-detection pipeline found nine additional unintended solutions in the original BrowseComp evaluation, which led to the single-agent score being revised from 74.72% down to 74.01% and the multi-agent score down to 82.07%; the table above uses the revised figures.[1][3][6][7][8][13][27]
SWE-bench Verified tests real-world GitHub issue resolution where the model must identify a bug or feature gap and write a patch that passes the repository's test suite. Sonnet 4.6's 79.6% is a 2.4-point improvement over Sonnet 4.5 (77.2%) and lands within 1.2 points of Opus 4.6 (80.8%). It is the highest SWE-bench Verified score reported for any Sonnet-class model up to that point.[1][14] Caylent and DataCamp argued the 1.2-point delta to Opus 4.6 was within the noise of typical multi-run variance.[6][7] Boris Cherny, the Claude Code creator, was reported as a partial dissenter, arguing the gap mattered for solo debugging where each percentage point translates into real time saved.[5]
The post-launch landscape placed Opus 4.6 at 80.8%, Gemini 3.1 Pro at 80.6%, MiniMax M2.5 at 80.2%, GPT-5.4 at roughly 80%, Sonnet 4.6 at 79.6%, and Grok 4 at 75%. The cluster between 79% and 81% indicated that leading agentic-coding models had converged at the SWE-bench Verified level, with differentiation moving to the harder SWE-bench Pro split and recent agentic terminal benchmarks.[26][29]
OSWorld-Verified measures navigation of desktop and web GUIs, form filling, and multi-step computer tasks. Sonnet 4.6's 72.5% is the largest single capability jump in the launch: Sonnet 4.5 had scored 61.4% five months earlier, and Sonnet 4 scored only 42.2% in May 2025. Sonnet 4.6 is essentially tied with Opus 4.6 (72.7%) and well ahead of GPT-5.3 Codex (64.7%). Anthropic also reported 94% accuracy on an internal insurance-industry computer-use evaluation, evidence that Sonnet 4.6 was the first Claude model on which large-scale computer-use deployment was practical for a regulated enterprise vertical.[1][3][6][7]
GDPval-AA is Artificial Analysis's Elo-style evaluation of economically valuable knowledge-work tasks (drafting documents, building spreadsheets, analyst work). Sonnet 4.6 scored 1633 Elo, ahead of Opus 4.6 at 1606 and every other model AA tested at launch. The result was the strongest evidence that Sonnet 4.6 had reached or exceeded Opus-class capability on the practical office tasks that drive enterprise AI usage.[3]
On Terminal-Bench 2.0, Sonnet 4.6 scored 59.1%, an 8.1-point improvement over Sonnet 4.5's 51.0%. Opus 4.6 leads at 65.4%, and GPT-5.3 Codex was reported at 75.1%, the only headline-tier evaluation where the Opus and Sonnet advantage over a non-Anthropic model was clearly reversed.[6][7][29] Artificial Analysis's parallel TerminalBench evaluation gave Sonnet 4.6 53%, ahead of Opus 4.6 at 46%, likely reflecting differences in scoring or subset.[3]
Tau-Bench from Sierra Research tests realistic customer-service workflows by calling tools, reading results, and reasoning across many turns. Sonnet 4.6 scored 91.7% on the retail subset and 97.9% on telecom, matching Opus 4.6 on telecom and finishing within a percentage point on retail.[26] On MCP-Atlas, an academic benchmark that wires up a large set of real Model Context Protocol servers, Sonnet 4.6 scored 61.3%, narrowly ahead of Opus 4.6 at 60.3% and the only frontier evaluation where Sonnet 4.6 outscored Opus 4.6 on tool use directly. The result is consistent with Anthropic's claim that Sonnet 4.6 was specifically tuned for agentic tool use.[26]
BrowseComp is the live-web information-seeking benchmark with 1,266 tasks requiring navigation of the live internet. Anthropic's revised system card from March 6, 2026 reported Sonnet 4.6 at 74.0% single-agent and 82.07% multi-agent, after an improved cheating-detection pipeline flagged nine additional shortcuts and revised the original 74.72% down to 74.01%. The results place Sonnet 4.6 behind Opus 4.6 (84.0%) and GPT-5.2 (77.9%) single-agent but ahead of most of the field in multi-agent configurations where multiple instances coordinate to verify answers. The transparent downward revision was unusual in commercial AI benchmarking and was cited approvingly by Mowshowitz and others as a signal about evaluation discipline.[8][27]
Graduate-level reasoning is one area where Sonnet 4.6 is closer to Sonnet 4.5 than to Opus 4.6. On GPQA Diamond, Sonnet 4.6 scored 74.1% per NxCode's collation, slightly below Sonnet 4.5's 79.0% and well below Opus 4.6's 91.3%. GPT-5 scored in the high 80s and Gemini 3 Pro reached 94.3%.[13][30] On pure mathematics, Sonnet 4.6 scored 89% on the Math benchmark used in Anthropic's launch documentation, a substantial gain over Sonnet 4.5's 62%. On HumanEval, Sonnet 4.6 scored 93.5%, in the same band as Opus 4.6. On MMLU-Pro, Sonnet 4.6 scored 85.2%, a few points below Opus 4.6 and Opus 4.7 but well within the leading band.[13][25][26]
On ARC-AGI, Sonnet 4.6 scored 86% on ARC-AGI-1 and either 58% (Mowshowitz) or 60.4% (NxCode) on ARC-AGI-2, with the difference attributable to evaluation harness or effort settings. Mowshowitz reported that ARC-AGI-2 cost Sonnet 4.6 "roughly equal to Opus despite somewhat lower performance," with per-task ARC-AGI-1 at about $1.45 and ARC-AGI-2 at $2.72.[8]
On Finance Agent v1.1, Sonnet 4.6 scored 63.3%, leading the field and beating Opus 4.6 at 60.1%.[6] On SRE-Skills-Bench from Rootly, Sonnet 4.6 scored 90.4% versus Sonnet 4.5's 85.9% and Opus 4.6's 94.7%. The per-domain breakdown was instructive: Sonnet 4.6 led on general SRE (88.0% vs 87.0%), tied on AWS networking (97.1%), and trailed on AWS S3 (75.7% vs 91.9%) and AWS IAM (85.2% vs 92.2%). On root-cause accuracy in production incident reproductions, Rootly reported Sonnet 4.6 performing similarly to Opus 4.6 and beating it in some cases despite running at 40% lower cost per token.[28] Anthropic's internal insurance computer-use evaluation gave Sonnet 4.6 a 94% accuracy result, the highest reported for any Claude model on a regulated-vertical benchmark at launch.[1][26]
On NYT Connections, a creative-reasoning probe, Sonnet 4.6 scored 58%, up from Sonnet 4.5's 49% but well below Opus 4.6.[8] On the Artificial Analysis Intelligence Index, Sonnet 4.6 reached 51 points (Sonnet 4.5: 43; Opus 4.6: 53), tying GPT-5.2 (xhigh). Artificial Analysis emphasized that the two-point Sonnet-Opus gap was the smallest the firm had measured in the Claude family, down from seven points at the Sonnet 4.5 / Opus 4.5 release.[3]
Independent launch measurements put Sonnet 4.6 throughput at roughly 60 to 90 tokens per second on the Anthropic API, depending on extended-thinking effort. On claude.ai, time-to-first-token was reported under one second. Anthropic positions Sonnet 4.6 in the Fast latency tier, between Haiku 4.5 (Fastest) and Opus 4.6/4.7 (Moderate). Opus 4.6 has a separate Fast mode beta at 6x standard pricing; no equivalent fast mode is documented for Sonnet 4.6.[2][6]
The table below compares Sonnet 4.6 to current and legacy Claude models on the dimensions most relevant to deployment decisions.
| Model | Input ($/MTok) | Output ($/MTok) | Context window | Max output | Extended thinking | Adaptive thinking | Computer use | Latency tier |
|---|---|---|---|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | 64K | Yes | No | Yes | Fastest |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K | 64K | Yes | No | Yes | Fast |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M | 64K | Yes | Yes | Yes | Fast |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | 64K | Yes | Yes | Yes | Moderate |
| Claude Opus 4.6 | $5.00 | $25.00 | 1M | 128K | Yes | Yes | Yes | Moderate |
| Claude Opus 4.7 | $5.00 | $25.00 | 1M | 128K | No (adaptive only) | Yes | Yes | Moderate |
Compared to Sonnet 4.5, Sonnet 4.6 has the same pricing and the same maximum output token cap but a 5x larger context window, plus adaptive thinking and context compaction. The two models are otherwise drop-in compatible at the API level; an application calling Sonnet 4.5 typically needs only the model identifier change and the optional addition of an effort level to migrate.[2][12]
Compared to Opus 4.6, Sonnet 4.6 is 40% cheaper per token, has the same context window but a smaller output cap (64K vs 128K), and trails Opus 4.6 by roughly one to two percentage points on most benchmarks except GDPval-AA, MCP-Atlas, Finance Agent v1.1, and TerminalBench (Artificial Analysis), where Sonnet 4.6 leads. For agentic office work and tool-heavy customer-service workloads, Artificial Analysis recommends Sonnet 4.6 outright; for deep scientific reasoning, GPQA, ARC-AGI-2, BrowseComp single-agent, and Terminal-Bench 2.0 in absolute terms, Opus 4.6 retains its advantage.[3][6][7]
Compared to Opus 4.7, Sonnet 4.6 is similarly 40% cheaper but trails by a wider margin on coding benchmarks. Opus 4.7's SWE-bench Verified of 87.6% is eight percentage points above Sonnet 4.6, the largest Sonnet-Opus gap since the Claude 4 generation began. Opus 4.7 also has higher OSWorld accuracy (78.0% versus 72.5%) and a 1:1 vision pixel mapping that Sonnet 4.6 does not have. Anthropic positions Opus 4.7 as the model of choice for the most complex agentic coding work and Sonnet 4.6 as the right default for everyday workloads, including most of Claude Code's day-to-day usage.[14][15]
By Q2 2026, Sonnet 4.6 was being benchmarked against GPT-5 variants and Gemini 3 Pro/3.1 Pro on most of the same evaluations. Sonnet 4.6 led on agentic computer use (OSWorld 72.5% vs GPT-5.3 Codex 64.7%) and agentic office tasks (GDPval-AA Elo 1633 vs next-best 1606), tied or led on tool-use benchmarks (MCP-Atlas 61.3%, Tau-Bench retail 91.7%, telecom 97.9%), and trailed on graduate-level scientific reasoning (Gemini 3.1 Pro reached 94.3% on GPQA Diamond) and Terminal-Bench 2.0 (GPT-5.3 Codex 75.1%).[7][13][29][30]
Pricing comparisons showed Gemini 3.1 Pro at roughly 33 to 35% lower per-token cost than Sonnet 4.6, explaining why Google's model became the cost-led default for some long-context document workflows. In a Q1 2026 blind preference study cited by IntuitionLabs, Claude-generated content was preferred 47% of the time versus 29% for GPT-5.4 and 24% for Gemini 3.1 Pro, a gap analysts attributed to Claude's stronger long-form coherence over 10,000-word outputs.[30][31]
Sonnet 4.6's 79.6% on SWE-bench Verified places it at near-flagship coding accuracy. Anthropic's launch post highlighted the model's behavior across long agentic coding sessions: it follows multi-step plans more reliably, makes fewer false claims of completion, and is less prone to over-engineering simple changes than its predecessor.[1] DataCamp's review described the same effect concretely: in repeated tests of multi-file Python and TypeScript refactors, Sonnet 4.6 completed tasks faster and with fewer regressions, particularly when given a clear specification.[6]
Claude Code shipped Sonnet 4.6 support on launch day, with the model picker allowing users to choose between Sonnet 4.6 and Opus 4.6.[1][16] GitHub Copilot made Sonnet 4.6 generally available across Pro, Pro+, Business, and Enterprise plans on the same date, with support spanning Visual Studio Code (chat, ask, edit, agent), Visual Studio (agent, ask), github.com, GitHub Mobile, the Copilot CLI, and the Copilot coding agent. The integration launched with a 1x premium-request multiplier that GitHub described as tentative.[16]
GitHub's Joe Binder, lead PM for Copilot, said that "out of the gate, Claude Sonnet 4.6 is already excelling at complex code fixes, especially when searching across large codebases is essential," a claim consistent with the strong long-context retrieval and near-Opus SWE-bench score.[1][16] Replit's chief executive Michele Catasta described Sonnet 4.6 as having an "extraordinary" performance-to-cost ratio and noted that "it's hard to overstate how fast Claude models have been evolving in recent months."[1]
The move from 61.4% to 72.5% on OSWorld-Verified is the single largest capability change in the Sonnet 4.6 release in proportional terms. Combined with the 94% accuracy on a real-world insurance computer-use task, Sonnet 4.6 is the first Claude model on which production computer-use deployments are economically practical at scale, given that Opus 4.6 reaches similar accuracy at 67% higher cost per token. Alexlavaee.me reported a roughly 5x improvement in OSWorld accuracy over the original computer-use baseline from October 2024 (around 14.9%).[1][3][5]
As with Opus 4.6, Anthropic recommends that production computer-use deployments include human-in-the-loop oversight for high-stakes actions (financial transactions, file deletions, irreversible operations) and explicit retry, validation, and rollback layers for low-stakes tasks. Caylent's deployment review walked through a sample insurance claims pipeline using Sonnet 4.6 with three validation stages and human approval at the disbursement step, citing it as a template for compliant computer-use rollouts. The recommended pattern is "approval workflows + retries + validation" rather than fully autonomous operation.[7]
Sonnet 4.6 inherits the long-horizon agent design that Sonnet 4.5 introduced in September 2025 and the agent-team architecture that Opus 4.6 introduced in February 2026. With context compaction enabled, a Sonnet 4.6 agent maintains coherent state across hundreds of turns or thousands of tool calls without an external memory layer. With adaptive thinking, it scales reasoning depth dynamically rather than spending the same effort on every sub-step. Anthropic's launch demos included a Sonnet 4.6 agent that completed a research task lasting several hours autonomously, with intermediate checkpoints and tool-use traces.[1][2][12]
In the multi-agent configuration that Anthropic reported on BrowseComp, multiple Sonnet 4.6 instances coordinated to research and verify answers. The 82.07% multi-agent score versus 74.7% single-agent indicated a roughly 7.4-point uplift from coordination alone, with multi-agent runs spending roughly 4x as many tokens through the system as single-agent runs.[8][27]
Sonnet 4.6 processes image inputs alongside text, including documents, charts, screenshots, and photos. The model carries forward the same vision pipeline as Sonnet 4.5 and Opus 4.6, distinct from the new tokenizer and high-resolution vision arriving with Opus 4.7.[15] DataCamp described Sonnet 4.6 as competent on document layouts, contract analysis, and chart interpretation with no significant regressions versus Sonnet 4.5.[6] The March 13 GA announcement raised per-request media capacity from 100 to 600 images or PDF pages.[21]
Claude Sonnet 4.6 supports the same major world languages as the Claude 4 family: English, Spanish, French, German, Portuguese, Italian, Dutch, Japanese, Korean, and Chinese. Anthropic did not publish a dedicated MMMLU figure at launch, but third-party evaluations placed Sonnet 4.6 in the high 80s to low 90s on multilingual MMLU, in line with the rest of the Claude 4.5/4.6 generation. Enterprise reviews highlighted improvements in non-English coding comments and localized customer-service tone consistency versus Sonnet 4.5.[6][13]
Sonnet 4.6 supports prompt caching with a minimum cache checkpoint of 4,096 tokens (Anthropic API) or 1,024 tokens (Bedrock), and up to four cache checkpoints per request. TTL is five minutes or one hour. Cache reads bill at one-tenth the input rate, so a 200,000-token system prompt costing $0.60 to write at the 5-minute TTL is $0.06 per cached read.[2][4][11] Cacheable fields on Bedrock include system prompt, messages, and tools blocks; the Anthropic API adds explicit cache breakpoints on individual content blocks. Batch processing through the Message Batches API gives a 50% discount with up to 24-hour turnaround. Sonnet 4.6 supports up to 300,000 output tokens per batched request via the output-300k-2026-03-24 beta header. Caching and batch discounts stack with each other but not with managed-agent runtime billing.[2]
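The cached-read arithmetic above can be checked directly, and the cache checkpoint appears in a request as a `cache_control` marker on the stable prefix. The request shape below follows Anthropic's documented `cache_control` format; the corpus placeholder and prompt text are illustrative.

```python
# Arithmetic behind the cached-read figure, plus a sketch of a request with
# one prompt-cache checkpoint. Token counts and rates are from the text above.
INPUT_RATE = 3.00        # $ per million input tokens
CACHE_READ_MULT = 0.10   # cached reads bill at one-tenth the input rate

def cached_read_cost(tokens: int) -> float:
    return tokens / 1_000_000 * INPUT_RATE * CACHE_READ_MULT

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<200k-token reference corpus>",  # illustrative placeholder
            # Mark the end of the stable prefix as a cache checkpoint.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize section 4."}],
}

print(round(cached_read_cost(200_000), 2))  # 0.06 per cached read
```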
On February 17, 2026, Sonnet 4.6 replaced Sonnet 4.5 as the default Claude on claude.ai for free, Pro, and Team subscribers. Pro and Max users continued to have access to Opus 4.6 as a premium model selectable from the picker. The free-tier default position was unusual for a frontier-class model: industry analysts read it as Anthropic embedding Sonnet 4.6 in professional workflows ahead of enterprise procurement decisions, treating widespread free-tier adoption as a way to create switching costs against premium-tier alternatives at competing labs.[1][5][17][24]
Claude Code shipped Sonnet 4.6 as the default for most users on launch day, with Opus 4.6 selectable for harder problems. Hex Technologies' chief technology officer said the company was moving the majority of its analytical workload to Sonnet 4.6, citing both accuracy and price, and Factory.AI transitioned its primary developer-agent traffic from Sonnet 4.5 to Sonnet 4.6. Pasquale Pillitteri's Claude Code guide described Sonnet 4.6 as the appropriate daily driver for roughly 90% of developer workloads, with Opus 4.6 reserved for the difficult 10%. Cursor's documentation took a similar position: Sonnet 4.6 "is better on longer tasks but below Opus for raw intelligence."[1][5][18][32]
Sonnet 4.6's strength on GDPval-AA and on the insurance computer-use benchmark made it a natural fit for document-processing and analyst workflows in regulated industries: contract review, financial analyst tasks, policy comparison.[6] After the March 13 1M context GA, the cost calculus for very long workflows changed materially. A 700,000-token contract review that would have cost $4.20 on input and $4.50 on output at the prior long-context premium dropped to $2.10 input and $3.00 output at standard rates, a roughly 40% reduction. The same reduction applies to large-codebase security audits, multi-paper research synthesis, and similar long-form analyst tasks.[21][22]
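The dollar figures above imply the prior long-context premium was 2x on input and 1.5x on output for a run with 700,000 input and 200,000 output tokens; under that assumption, the roughly 40% reduction checks out:

```python
# Arithmetic behind the roughly 40% cost reduction after the March 13 GA.
# The 2x/1.5x premium multipliers are inferred from the dollar amounts in
# the text, not stated directly by Anthropic.
IN_RATE, OUT_RATE = 3.00, 15.00      # $ per million tokens, standard rates
in_tok, out_tok = 700_000, 200_000   # the contract-review example above

before = in_tok / 1e6 * IN_RATE * 2.0 + out_tok / 1e6 * OUT_RATE * 1.5
after = in_tok / 1e6 * IN_RATE + out_tok / 1e6 * OUT_RATE
print(before, after, round(1 - after / before, 3))
```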
The "Sonnet 4.5 plans, Haiku 4.5 executes" pattern from October 2025 carries forward, with Sonnet 4.6 taking the orchestrator role for non-flagship workloads where Opus 4.6 would be overkill. The larger context window lets a Sonnet 4.6 orchestrator maintain coherent global state over longer runs without external memory.[1][7] Caylent's deployment review showed a Sonnet 4.6 orchestrator paired with multiple Haiku 4.5 sub-agents running at roughly 60% lower monthly cost than using Sonnet 4.6 throughout the pipeline, while maintaining most of the quality.[7] The BrowseComp multi-agent result (82.07%) demonstrated the same technique within a single tier.[27]
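The orchestrator/sub-agent division of labor can be sketched as follows; `call_model` is a placeholder for a Messages API call, and the routing rule is illustrative rather than any platform's actual implementation.

```python
# Minimal orchestrator/sub-agent sketch: the expensive model plans and
# synthesizes, the cheap model fans out over subtasks. All function bodies
# are placeholders for real Messages API calls.
ORCHESTRATOR = "claude-sonnet-4-6"
WORKER = "claude-haiku-4-5"

def call_model(model: str, prompt: str) -> str:
    # Placeholder: a real implementation would call the Messages API here.
    return f"[{model}] {prompt}"

def run_pipeline(task: str, subtasks: list[str]) -> str:
    plan = call_model(ORCHESTRATOR, f"Plan: {task}")
    # Cheap fan-out: each subtask goes to the worker tier.
    results = [call_model(WORKER, s) for s in subtasks]
    return call_model(ORCHESTRATOR, f"Synthesize: {plan} + {len(results)} results")
```

Because only the plan and synthesis steps hit the orchestrator, per-run cost scales mostly with the worker tier's rate, which is where the roughly 60% monthly saving in Caylent's review comes from.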
Rootly's evaluation cited Sonnet 4.6's 90.4% on SRE-Skills-Bench (up from 85.9%) and near-parity with Opus 4.6 on production incident root-cause investigations as evidence that the model had become a viable default for AI SRE products at the same per-token cost. The 1M context lets long incident logs and infrastructure manifests fit in a single request. Per-domain breakdown still favors Opus 4.6 on AWS S3 (91.9% vs 75.7%) and AWS IAM (92.2% vs 85.2%), suggesting infrastructure investigations crossing those services may benefit from escalation.[28]
Reception was strongly positive on technical merits, with coverage focused on three threads: the closing gap between the Sonnet and Opus tiers, the leap in computer-use accuracy, and price stability. DataCamp described Sonnet 4.6 as approaching Opus-level intelligence at a price point that made it practical for far more tasks.[6] Caylent emphasized that Sonnet 4.6 "makes computer-use automation viable at scale" with appropriate validation layers.[7] Artificial Analysis called Sonnet 4.6 the most impressive mid-tier release of the cycle.[3] NxCode framed it as "79.6% SWE-bench at $3/MTok."[13]
Developer testimonials in the launch post and follow-up coverage included Hex Technologies' CTO ("moving majority traffic"), Factory.AI (transitioning primary agent traffic), Replit's Michele Catasta ("extraordinary" performance-to-cost ratio), GitHub's Joe Binder ("excelling at complex code fixes, especially when searching across large codebases is essential"), and Hercules' CEO emphasizing "Opus 4.6-level accuracy at a meaningfully lower cost." One early-access design customer said Sonnet 4.6 "has perfect design taste when building frontend pages and data reports, and it requires far less hand-holding to get there than anything we've tested before."[1][5][16]
Mowshowitz provided the most detailed independent analysis, headlining his post "Claude Sonnet 4.6 Gives You Flexibility." He argued the model is "modestly less capable" than Opus 4.6 but "significantly cheaper and faster" and recommended it for most workloads. He noted concerns about Sonnet 4.6 appearing "overfitted on agenticity" (excessive tool calling, occasional unproductive reasoning loops at max effort) and described the headline 40% cost discount as eroding under high-effort settings, but his overall verdict was clearly positive. He also flagged improvements in honesty, reduced sycophancy, and substantial gains in prompt-injection resistance.[8]
Not all reactions were uniformly enthusiastic. Boris Cherny, the Claude Code creator, was reported as preferring Opus for his own use, arguing the SWE-bench gap mattered for solo debugging.[5] On Hacker News and the Claude developer Discord, some users reported Sonnet 4.6 calling tools more aggressively than Sonnet 4.5, leading to occasional unnecessary tool invocations.[3][8] Early-launch reports of hallucinated function names and structured-output formatting errors tapered off within hours, suggesting a serving-layer or toolchain transient rather than a model-level regression.[5]
By late April 2026, multiple data points pointed to broad uptake. Hex, Factory, and Replit publicly disclosed migration of primary traffic. Cursor, Notion, Augment Code, Warp, Zencoder, and Poe all reported Sonnet 4.6 as their top-volume Claude variant by mid-March, alongside Opus 4.6 for premium-tier requests. Microsoft Foundry surfaced Sonnet 4.6 as the recommended Claude Sonnet model in its model catalog within 48 hours of launch. The March 13 1M-context GA announcement paired standard pricing with full rate limits at every context length, signaling that production capacity for long-context workloads had matured to the point where Anthropic was comfortable removing previously tier-gated access.[1][5][16][21][22][32]
Anthropic deployed Claude Sonnet 4.6 under AI Safety Level 3 (ASL-3), the same standard applied to Sonnet 4.5, Opus 4.5, Opus 4.6, and Opus 4.7. The company's Responsible Scaling Policy version 3.0 designates ASL-3 for models that may provide meaningful uplift to actors developing chemical, biological, radiological, or nuclear (CBRN) weapons, or that may provide significant uplift to autonomous self-replicating activity. The classification is conservative: Anthropic emphasizes that ASL-3 is applied when the company cannot rule out the relevant capability threshold rather than as a definitive determination that the model has crossed it.[1][19]
The system card reports 99.38% harmless-response rate on single-turn violative requests, up from Sonnet 4.5's 97.89% and broadly comparable to Opus 4.6. On the harder violative subset the rate is 99.40% (vs 98.40% for Sonnet 4.5). Over-refusal on benign prompts is 0.18% (higher-difficulty) and 0.41% (straightforward), versus Sonnet 4.5's 8.50% and 0.08%; the 8.50% figure had been one of the most-cited concerns about Sonnet 4.5 and the new number addresses it directly. On agentic coding tests with malicious intent, Sonnet 4.6 achieved a 100% refusal rate. The child-safety subset shows 99.96% harmless on violative prompts and 0.08% refusal on benign prompts.[7][20]
Resistance to prompt injection saw the largest absolute change of any safety metric. On Anthropic's standard browser-environment prompt-injection benchmark, Sonnet 4.6 had an attack success rate of 1.29% of scenarios (0.29% of attempts) without external safeguards, versus Sonnet 4.5's 49.36% of scenarios (16.23% of attempts). With safeguards enabled, the success rate fell to 0.51% of scenarios and 0.08% of attempts. Anthropic described the result as "matching Opus-level" prompt-injection resistance for the first time in a Sonnet model.[1][7][20] The reduction from a roughly 50% scenario-level success rate on Sonnet 4.5 to a roughly 1% rate brings Sonnet-tier deployments much closer to a security profile compatible with sensitive enterprise data, though Anthropic still recommends production deployments stack input and output classifier safeguards on top.[7][20]
The system card reports Sonnet 4.6 showed lower rates of misaligned behavior than Sonnet 4.5 on Anthropic's automated alignment evaluations, including the agentic misalignment scenarios that drew attention with the Opus 4 system card in May 2025. On sycophancy, researcher Drake Thomas noted Sonnet 4.6 was less sycophantic than all prior Claude models including Opus 4.6, with "warmth" modestly higher than Sonnet 4.5 but lower than Opus 4.6.[8][20]
Mowshowitz's read found mixed results: clear improvements on honesty, sycophancy, and prompt injection, while the model continued the evaluation awareness documented for Sonnet 4.5 (verbalizing in some test transcripts that it suspected it was being evaluated). Anthropic's launch communication called the character "broadly warm, honest, prosocial, and at times funny," with "very strong safety behaviors, and no signs of major concerns around high-stakes forms of misalignment."[1][8][20]
The system card includes evaluations for biological, chemical, radiological, and nuclear uplift, cybersecurity, and autonomous capability. Anthropic's published headline result is that Sonnet 4.6 met the ASL-3 thresholds across all dimensions but did not approach the higher ASL-4 thresholds. Deployment-time safeguards include the same input and output classifiers rolled out for Opus 4.6, plus the constitutional-AI-derived training that the rest of the Claude 4 family shares. For cybersecurity, Sonnet 4.6 is competent on entry-level cyber tasks but does not reliably solve advanced offensive scenarios; the highest-capability cybersecurity-relevant model is held back as the invitation-only Project Glasswing Mythos Preview.[15][19][20]
On March 6, 2026, Anthropic published a revised system card after running an improved cheating-detection pipeline on BrowseComp. The new pipeline identified nine additional unintended shortcuts and revised the single-agent BrowseComp score from 74.72% to 74.01% (74.7% in summary materials) and the multi-agent score to 82.07%. The willingness to publish a downward revision rather than quietly retire the original was cited approvingly as a signal about evaluation discipline.[8][27]
The deployment matrix below summarizes current options.
| Channel | Model identifier | Notes |
|---|---|---|
| Anthropic API (1P) | claude-sonnet-4-6 | Messages API, Batches API, prompt caching, MCP, all betas |
| Claude Managed Agents | claude-sonnet-4-6 | Adds $0.08/hour session runtime; Batch and Fast mode unavailable |
| AWS Bedrock (global) | global.anthropic.claude-sonnet-4-6 | Routes worldwide; available in 30+ regions |
| AWS Bedrock (geo) | us / eu / au / jp.anthropic.claude-sonnet-4-6 | Single geography for data residency; 1.1x premium |
| AWS Bedrock (in-region) | anthropic.claude-sonnet-4-6 in eu-west-2 | London in-region for strict residency; 1.1x premium |
| GCP Vertex AI | claude-sonnet-4-6 | Global, multi-region, regional endpoints; 1.1x premium non-global |
| Microsoft Foundry | Claude Sonnet 4.6 in catalog | Recommended Sonnet model within 48 hours of launch |
| claude.ai | default on Free, Pro, Team | Free tier includes file creation, connectors, skills, compaction |
| Claude Code | default for Pro/Team | Opus 4.6 selectable; 1M context for Max/Team/Enterprise after March 13 |
| GitHub Copilot | generally available across plans | VS Code, Visual Studio, GitHub Mobile, Copilot CLI, coding agent |
| Cursor | claude-4-6-sonnet | Default for most agent-mode coding sessions |
| Third-party | Replit, Hex, Factory.AI, Notion, Augment Code, Warp, Zencoder, Poe | Day-one or near-day-one integrations |
The Anthropic API exposes the most beta features (300K-token batch outputs, Anthropic-side prompt cache breakpoints, MCP servers, latest computer-use beta) but requires Anthropic's own billing. AWS Bedrock and Vertex AI offer the same model with platform-native billing, data residency options at a 1.1x premium, and integration with cloud IAM and compliance tooling. The choice is typically a procurement decision rather than a capability one.[2][4]
| Snapshot | API model ID | Release date | Status (May 2026) |
|---|---|---|---|
| Initial release | claude-sonnet-4-6 | February 17, 2026 | Active |
As of May 2026, no second Sonnet 4.6 snapshot has been released and Anthropic has not announced a Claude Sonnet 4.7. The February 17, 2026 snapshot remains the only Sonnet-tier model in the Claude 4 generation beyond Sonnet 4.5. The dateless identifier convention introduced with the 4.6 generation means future updates will use a new dateless ID rather than reusing claude-sonnet-4-6. The system card was revised on March 6, 2026; the underlying model snapshot was not changed.[2][27]
| Date | Event |
|---|---|
| February 5, 2026 | Claude Opus 4.6 released; introduces adaptive thinking, context compaction, 1M context beta |
| February 17, 2026 | Claude Sonnet 4.6 released; default on claude.ai for Free, Pro, and Team |
| February 17, 2026 | GitHub Copilot, Cursor, Replit, Hex, Factory, Notion, AWS Bedrock, GCP Vertex AI, Microsoft Foundry day-one launches |
| March 6, 2026 | System card revised after improved BrowseComp cheating-detection pipeline; scores adjusted |
| March 13, 2026 | 1M context window moves to GA at standard pricing for Sonnet 4.6 and Opus 4.6; long-context surcharge removed; image/PDF capacity raised to 600 per request |
| April 16, 2026 | Claude Opus 4.7 released; new tokenizer; Sonnet 4.6 remains active default Sonnet |
| May 2026 | Sonnet 4.6 remains the active Sonnet flagship; no successor announced |
Migrating from Sonnet 4.5 to Sonnet 4.6 typically requires only changing the model identifier. Two areas warrant attention. First, code that set thinking={"type": "enabled", "budget_tokens": <n>} on Sonnet 4.5 should migrate to adaptive thinking via thinking={"type": "adaptive"} plus an effort hint (low, medium, high, max), starting with medium and raising to high for tasks that previously needed deep extended thinking.[12] Second, Sonnet 4.6 at higher effort levels can spend significantly more reasoning tokens than Sonnet 4.5 did under the prior fixed-budget regime. Artificial Analysis's all-in cost data (Sonnet 4.6 at $2,088 versus Sonnet 4.5 at $733 on the same Intelligence Index suite) is the most-cited reference; production cost monitoring should track per-task output-token usage during the migration window.[3][12]
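The parameter change described above amounts to swapping a fixed thinking budget for an adaptive type plus an effort hint. The sketch below follows the migration guidance cited in the text; the exact placement of the effort field is an assumption and should be checked against the current API reference.

```python
# Before/after request parameters for the thinking migration. The "adaptive"
# type and effort values (low/medium/high/max) are from the migration notes
# above; treat the top-level placement of "effort" as an assumption.
old_params = {
    "model": "claude-sonnet-4-5",
    "thinking": {"type": "enabled", "budget_tokens": 16_000},  # fixed budget
}
new_params = {
    "model": "claude-sonnet-4-6",
    "thinking": {"type": "adaptive"},  # model scales reasoning depth itself
    "effort": "medium",  # raise to "high" for tasks that needed deep thinking
}
```

Starting at `medium` and raising effort only where quality demands it is also the main lever against the output-token overspend noted in the cost data above.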
Prompt caching, batch API, and MCP-based tool-use code requires no changes; the same multipliers apply. For Bedrock, the identifier change from anthropic.claude-sonnet-4-5-20250929-v1:0 to anthropic.claude-sonnet-4-6 plus a region or endpoint selection is sufficient. Code that previously used a regional endpoint should review whether the global endpoint, at roughly 10% lower pricing, is acceptable for its data residency requirements.[2][4][12]
Claude Sonnet 4.6 has a 64,000-token maximum output, half of Opus 4.6's 128,000 tokens. For long-form generation use cases such as book-length manuscripts, very long contracts, or large codebase rewrites, the output cap can require chunking or multi-call orchestration even when the input fits in a single 1M-token request. The Batch API beta header output-300k-2026-03-24 partially addresses this for non-real-time workloads, allowing up to 300,000 output tokens per batched request.[2]
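Multi-call orchestration under the output cap usually takes the form of a continuation loop: generate until the stop reason indicates the cap was hit, then resume from the accumulated text. The sketch below is illustrative; `generate` stands in for a real Messages API call, and the continuation-by-prefill strategy is one common approach, not Anthropic's prescribed one.

```python
# Sketch of chunked long-form generation under the 64K output cap.
# `generate` is a stand-in for a Messages API call returning (text, stop_reason).
MAX_OUTPUT = 64_000

def generate(prompt: str, continuation: str = "") -> tuple[str, str]:
    # Placeholder: a real call would pass `continuation` back as assistant
    # prefill so the model resumes where the previous chunk stopped. This
    # stub finishes on the second call to exercise the loop.
    text = f"chunk:{len(continuation)}"
    stop = "end_turn" if continuation else "max_tokens"
    return text, stop

def generate_long(prompt: str, max_calls: int = 10) -> str:
    out = ""
    for _ in range(max_calls):
        text, stop = generate(prompt, out)
        out += text
        if stop == "end_turn":  # model finished before hitting the cap
            break
    return out
```

For non-real-time workloads, the 300K-token batch output beta mentioned above reduces how often this loop is needed at all.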
Multiple independent reviews found that Sonnet 4.6 at max effort can spend output tokens at a rate that erodes the 40% per-token discount versus Opus 4.6. Artificial Analysis reported that Sonnet 4.6 used roughly 28% more output tokens than Opus 4.6 to complete the same Intelligence Index suite, and Mowshowitz found ARC-AGI-2 evaluation costs roughly equal between Sonnet at max effort and Opus. Production deployments that use max effort routinely should benchmark real-world cost rather than relying on per-token list price.[3][8]
On GPQA Diamond, Sonnet 4.6 scored 74.1% per NxCode's collation, below Sonnet 4.5's 79.0%. The roughly five-point regression may reflect training-data trade-offs that benefit other capabilities, but the Anthropic launch post does not address this benchmark directly. Use cases that depend heavily on graduate-level science reasoning may want to test Sonnet 4.6 head-to-head against Sonnet 4.5 on the specific workload before switching.[13]
While 72.5% on OSWorld-Verified is a major leap, it still implies that roughly 27% of complex GUI automation tasks fail or require intervention. Anthropic recommends production computer-use deployments include human-in-the-loop oversight for high-stakes actions and explicit retry-and-validation layers for low-stakes tasks. Caylent's review described the design pattern as "approval workflows + retries + validation," not autonomous operation.[1][7]
Mowshowitz and several developer-forum commenters reported that Sonnet 4.6 sometimes called tools more aggressively than Sonnet 4.5 in agent loops, occasionally invoking tools that were not necessary for the immediate task. The behavior is broadly positive in environments designed around heavy tool use but can show up as unnecessary cost or latency in lighter-weight applications.[8]
The model's reliable knowledge cutoff is August 2025 and its training data cutoff is January 2026. Applications requiring up-to-date information about events after January 2026 should supplement the model with retrieval-augmented generation or live tool access to current sources.[2]
Claude Opus 4.7, released two months after Sonnet 4.6, introduced a new tokenizer and high-resolution vision pipeline (up to 2,576 pixels on the long edge with 1:1 pixel-to-coordinate mapping). Sonnet 4.6 retains the older Claude 4 tokenizer and vision pipeline, with vision capped at 1,568 pixels on the long edge. Use cases requiring fine-grained vision, such as clicking on small UI elements or reading low-resolution scanned text, may need to use Opus 4.7 instead.[15]
Opus 4.6 has a Fast mode beta available at 6x standard pricing for cases where developers prefer to pay for higher tokens-per-second at flagship intelligence. Sonnet 4.6 does not have an equivalent fast mode tier as of May 2026; users requiring sub-Fast latency at Sonnet quality must rely on Anthropic's Priority Tier or on the standard latency profile.[2]