Claude Sonnet 4.5

AI Code Generation AI Tools & Products Anthropic Artificial Intelligence Large Language Models Machine Learning Multimodal AI Natural Language Processing

37 min read

Updated Jun 21, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 21, 2026

Fact-checked

In review queue

Sources

28 citations

Revision

v9 · 7,438 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

See also: Claude

Claude Sonnet 4.5 is a multimodal large language model (LLM) developed by Anthropic and released on September 29, 2025, which Anthropic described at launch as "the best coding model in the world."^[1] It is a hybrid reasoning model in the Claude 4 family, optimized for autonomous software engineering, computer use, and long-running AI agents, and it shipped priced at $3 per million input tokens and $15 per million output tokens with a 200,000-token context window.^[1]

Anthropic positioned it at launch as "the best coding model in the world," the strongest model for building complex autonomous agents, and the best model at using computers.^[1]^[2] It is the third Sonnet-tier release in the Claude 4 family and succeeded Claude Sonnet 4, which had been released in May 2025.

At launch, Claude Sonnet 4.5 set a new state-of-the-art score on SWE-bench Verified (77.2%, rising to 82.0% with high-compute parallel runs) and on OSWorld (61.4%), the two most widely cited benchmarks for autonomous software engineering and computer control.^[1] It introduced a verified ability to sustain focused autonomous operation for over 30 hours, more than quadrupling the roughly seven-hour mark set by its immediate predecessor, Claude Opus 4.^[4]

The model launched alongside Claude Code 2.0, a revamped terminal agent with automatic checkpoints, a native Visual Studio Code extension, and a new subagent model, as well as the Claude Agent SDK, which gave developers programmatic access to the same infrastructure powering Claude Code.^[1] On the day it shipped, Sonnet 4.5 was widely characterized in the developer press as the leading model for agentic coding, beating GPT-5 on SWE-bench Verified by roughly 4.4 percentage points and beating Gemini 2.5 Pro by about 14 percentage points.^[1]^[8]

What is Claude Sonnet 4.5?

Claude Sonnet 4.5 is a hybrid reasoning model. In standard mode it delivers fast responses suited to conversational use and everyday coding tasks. In extended thinking mode it allocates up to 64,000 tokens of visible internal reasoning before producing an answer, improving performance on harder problems such as competition mathematics, multi-step planning, and graduate-level science questions.^[3]

The model is designed for agentic pipelines that run without constant human supervision. Improvements in tool-call handling, parallel tool use, memory management, and long-context fidelity were all targeted explicitly during development, enabling Claude Sonnet 4.5 to tackle workflows that would have stalled earlier models through context overflow, tool-call errors, or loss of goal coherence.^[1]^[5]

Alongside the launch, Anthropic released the model under ASL-3 deployment standards and described it as the company's "most aligned frontier model" to date, citing a 60% reduction on its primary internal misalignment metrics relative to Claude Sonnet 4.^[3]

As of May 2026, Claude Sonnet 4.5 sits under the "Legacy models" heading in Anthropic's platform docs, having been succeeded by Claude Sonnet 4.6 (released February 2026), but is still listed as an active model on Anthropic's model deprecation page with a tentative retirement date no sooner than September 29, 2026. It remains fully available via the Claude API, Amazon Bedrock, and Google Cloud Vertex AI, and retains Priority Tier and extended-thinking support.^[6]^[28]

When was Claude Sonnet 4.5 released?

Claude Sonnet 4.5 was released on September 29, 2025, simultaneously on the Anthropic API, the claude.ai web app, and the iOS and Android apps, and on Amazon Bedrock and Google Cloud Vertex AI the same day.^[1]^[6] The timeline below tracks its launch and its later reclassification as a legacy model.

Date	Event
September 29, 2025	Public launch on Anthropic's API, claude.ai web, iOS, and Android apps^[1]
September 29, 2025	Amazon Web Services announces availability in Amazon Bedrock^[6]
September 29, 2025	Google Cloud announces availability in Vertex AI^[6]
September 29, 2025	GitHub Copilot begins public preview rollout for Pro, Pro+, Business, and Enterprise tiers^[7]
September 29, 2025	Claude Code 2.0 and the Claude Agent SDK released concurrently^[1]
September 29, 2025	Memory tool and context editing features introduced for the Claude Developer Platform, tuned against Sonnet 4.5^[27]
September 29, 2025	"Imagine with Claude" research preview opens for Max subscribers^[15]
October 2, 2025	Claude Sonnet 4.5 added to the LMSYS Chatbot Arena Text and Web Dev leaderboards^[16]
October 3, 2025	"Imagine with Claude" preview extended one week and opened to Pro subscribers^[15]
October 3, 2025	Sonnet 4.5 "Thinking 32k" snapshot added to the LMSYS Chatbot Arena Text leaderboard^[16]
October 2025	Coverage in InfoQ, DevOps.com, and AI trade press; early adopter benchmark results published by Replit, Cognition, and Cursor^[8]^[9]^[10]
October 2025	First wave of "Sonnet 4.5 personality" complaints ("The Flip") surfaces on Reddit and Hacker News^[17]^[18]
February 17, 2026	Claude Sonnet 4.6 released; Sonnet 4.5 reclassified as legacy in Anthropic's model catalog^[6]
March 6, 2026	Sonnet 4.5 removed from the consumer Claude web and desktop app model picker; Sonnet 4.6 becomes the default in-app selection^[21]
April 30, 2026	1M-token beta header (`context-1m-2025-08-07`) retired for Sonnet 4 and Sonnet 4.5; over-200K requests return `400 invalid_request_error`^[6]^[26]
May 2026	Sonnet 4.5 remains active on the Claude API with tentative retirement no sooner than September 29, 2026 (per Anthropic's deprecation page)^[28]

How was Claude Sonnet 4.5 developed?

Claude Sonnet 4.5 was built by Anthropic, an AI safety and research company founded in 2021 by former OpenAI researchers Dario Amodei and Daniela Amodei. The model is the third Sonnet-tier release in the Claude 4 generation, following Claude Sonnet 4 (May 2025) and preceding Claude Sonnet 4.6 (February 2026).

Development prioritized four areas: agentic reliability over multi-hour workflows, computer use accuracy, coding quality on real-world repositories, and alignment improvements that reduced sycophancy and deception relative to prior models.^[1]^[3] Internal testing used the SWE-bench, OSWorld, and Terminal-bench frameworks as primary targets, with safety evaluations conducted jointly with the UK AI Security Institute (UK AISI) and the independent alignment research firm Apollo Research.^[3]

The release was coordinated with a broader ecosystem update. Claude Code received a complete interface overhaul branded version 2.0, and the Agent SDK was made publicly available for the first time, allowing third-party developers to build agents on the same tool-execution and memory infrastructure Anthropic uses internally.^[1] The same day, Anthropic also opened the "Claude for Chrome" browser extension to all Max plan subscribers and launched the "Imagine with Claude" research preview, both of which depended on Sonnet 4.5's improved computer use and tool reliability.^[15]

Technical specifications

Specification	Value
Release date	September 29, 2025
Model family	Claude 4
API ID (versioned)	`claude-sonnet-4-5-20250929`
API alias	`claude-sonnet-4-5`
AWS Bedrock ID	`anthropic.claude-sonnet-4-5-20250929-v1:0`
Google Vertex AI ID	`claude-sonnet-4-5@20250929`
Standard context window	200,000 tokens (~150,000 words)
Beta context window	1,000,000 tokens via `context-1m-2025-08-07` header (later retired)^[6]^[19]
Max output tokens	64,000 tokens
Extended thinking	Yes (up to 64,000 reasoning tokens)
Multimodal input	Text and images
Output modality	Text only
Training data cutoff	July 2025
Reliable knowledge cutoff	January 2025
Parameter count	Undisclosed
Safety classification	ASL-3 (Anthropic Safety Level 3)
Priority Tier support	Yes^[6]

Model architecture

Claude Sonnet 4.5 was trained on a proprietary mixture of publicly available internet data collected through July 2025, non-public licensed data from third parties, data from human labeling services, and opt-in data from Claude users.^[3] Anthropic has not disclosed the number of parameters or detailed architectural choices. The model supports both text and image inputs and produces text-only outputs. It cannot generate images natively.

How large is the context window?

The standard context window is 200,000 tokens, sufficient for approximately 150,000 words or roughly 500 pages of text in a single interaction. A 1 million token configuration was offered in beta through the context-1m-2025-08-07 header, originally introduced for Claude Sonnet 4 on August 12, 2025 and inherited by Sonnet 4.5 at launch.^[19]^[26] On SWE-bench Verified, the 1M context configuration scored 78.2%, a one-percentage-point gain over the standard 200K configuration.^[1]

On April 30, 2026, Anthropic retired the context-1m-2025-08-07 beta header for both Claude Sonnet 4 and Claude Sonnet 4.5. After that date, the header is syntactically accepted but no longer extends the context window, which silently falls back to 200,000 tokens; prompts longer than 200K return a 400 invalid_request_error. Users who needed 1M context were directed to migrate to Sonnet 4.6 or Opus 4.6, where 1M context is generally available at standard pricing.^[6]^[26]

Starting with Sonnet 4.5, Anthropic also changed the API behavior when input plus max_tokens exceeds the context window. On Sonnet 4.5 and newer models, the API accepts the request and, if generation reaches the ceiling, terminates with stop_reason: "model_context_window_exceeded" rather than returning a validation error up front. Older models could opt in to this behavior with the model-context-window-exceeded-2025-08-26 beta header.^[19]

Later models in the Claude 4 family, including Claude Opus 4.6 and Claude Sonnet 4.6, extended the standard context window to 1 million tokens by default.^[6]

Hybrid reasoning and tool use

Like other Claude 4 models, Sonnet 4.5 supports a single API surface that can be configured for fast responses, extended thinking, or interleaved reasoning between tool calls. The same model handles tool definitions, structured outputs, file inputs through the Files API, Model Context Protocol (MCP) connectors, batch jobs, prompt caching, vision inputs, and computer use through a unified interface. Extended thinking can be combined with tool use, so the model can reason, call tools, observe results, reason again, and iterate without leaving the same request.

Memory tool and context editing

Alongside the model itself, Anthropic shipped two new Claude Developer Platform features on September 29, 2025 that were enabled by, and tuned against, Sonnet 4.5: the memory tool and context editing. They are intended to extend how long an agent can run before exhausting its working memory.^[27]

The memory tool exposes a file-system-style API in which the model can create, read, update, and delete files in a dedicated memory directory hosted on the developer's own infrastructure. Because the storage is external to the context window, facts written to memory persist across turns, across separate sessions, and across context boundaries, letting Sonnet 4.5 carry user preferences, project conventions, or partial results between requests.^[27]

Context editing automatically removes stale tool calls and tool results from the active context as the conversation approaches the token limit, preserving the conversational flow while reclaiming space. The feature can be configured per request and is paired with a new clear_thinking option that drops stored thinking blocks once a tool cycle finishes.^[27]

Anthropic reported three internal-evaluation gains for the two features when run on Sonnet 4.5: a 39% improvement on an agentic-search benchmark when the memory tool and context editing were combined relative to the same model without them, 29% improvement from context editing alone, and an 84% reduction in token consumption on a 100-turn web search evaluation when context editing was active.^[27] Both features later became generally available without a beta header.

Context awareness

Sonnet 4.5, Haiku 4.5, and Sonnet 4.6 are the three Claude models with native context awareness: at the start of a session the model receives an XML <budget:token_budget> block stating its total context size, and after each tool call it receives a <system_warning> block with current and remaining token counts. Anthropic frames this as the difference between cooking with and without a clock: the model can decide how much room to leave for follow-up reasoning, and persists in the task to the end instead of guessing how many tokens remain.^[19]

How much does Claude Sonnet 4.5 cost?

Claude Sonnet 4.5 launched at the same price point as Claude Sonnet 4, $3 per million input tokens and $15 per million output tokens, and has maintained that pricing through mid-2026.^[1]

Usage type	Price
Input tokens (standard)	$3.00 per million tokens
Output tokens (standard)	$15.00 per million tokens
Prompt caching (write)	$3.75 per million tokens
Prompt caching (read)	$0.30 per million tokens
Batch API (input)	$1.50 per million tokens
Batch API (output)	$7.50 per million tokens
Extended context input (>200K tokens, beta)	$6.00 per million tokens
Extended context output (>200K tokens, beta)	$22.50 per million tokens

Prompt caching reduces costs by up to 90% on repeated long-context calls, which is especially useful for agentic workflows that share a large system prompt across many turns. Batch API processing, which runs requests asynchronously within a 24-hour window, cuts prices by 50% and is well-suited to bulk code review, document analysis, or data-extraction pipelines.^[5]^[11]

By comparison, GPT-5 at launch was priced at approximately $1.25 per million input tokens and $10 per million output tokens for its base mode, making it cheaper for simple query workloads. Claude Sonnet 4.5's pricing advantage shows up in long-context agentic workflows where prompt caching and batch processing reduce effective costs substantially.^[12]

How does Claude Sonnet 4.5 perform on benchmarks?

Anthropic published benchmark results alongside the September 2025 launch. Scores on SWE-bench and OSWorld were independently reproducible and were confirmed by third-party evaluators including Caylent and InfoQ.^[8]^[9] Benchmark numbers below are quoted as published in Anthropic's launch post and the model's system card; rounded values are marked with a tilde (~) where the underlying source rounds to one decimal place or to the nearest whole percent.

Coding and software engineering

Benchmark	Claude Sonnet 4.5	Claude Sonnet 4	Claude Opus 4.1	GPT-5	Gemini 2.5 Pro
SWE-bench Verified (200K context)	77.2%	72.7%	74.5%	72.8%	63.2%
SWE-bench Verified (1M context, beta)	78.2%	N/A	N/A	N/A	N/A
SWE-bench Verified (high compute, parallel)	82.0%	N/A	N/A	N/A	N/A
Terminal-bench (Terminus 2)	50.0%	N/A	43.2%	N/A	N/A
OSWorld (computer use)	61.4%	42.2%	42.2%	N/A	N/A
Replit edit error rate	0%	9%	N/A	N/A	N/A

SWE-bench Verified presents 500 real GitHub issues from open-source repositories. The model must inspect the codebase, identify the fault, and submit a working patch without human guidance. Claude Sonnet 4.5's 77.2% score was the highest published by any model at launch, placing it ahead of GPT-5 (72.8%) and Gemini 2.5 Pro (63.2%).^[1]^[8] The 82.0% high-compute result was obtained by running multiple parallel attempts and selecting the best outcome, a method permitted under benchmark rules.

Reasoning, science, and mathematics

Benchmark	Claude Sonnet 4.5	Notes
GPQA Diamond	~83.4%	Graduate-level science questions
AIME 2025 (with Python tools)	~100%	Competition mathematics, temperature 1.0, 64K thinking tokens^[1]
AIME 2025 (without tools)	~87%	Competition mathematics
MMMLU (non-English, 14 languages)	89.1%	Average of 5 runs, up to 128K context, with extended thinking^[11]
MMMU (multimodal university)	77.8%	Visual reasoning across academic subjects^[11]
Finance Agent (extended thinking, 64K)	55.3%	Domain-specific financial analysis^[11]

On GPQA Diamond, a benchmark of 448 multiple-choice questions in biology, chemistry, and physics written by domain experts, Claude Sonnet 4.5 reached approximately 83.4%, which is above the average performance of human experts with PhDs in the relevant fields.^[1]^[11] On the multilingual MMLU variant (MMMLU), the model averaged 89.1% across 14 non-English languages.

A notable caveat from independent reviewers concerned graduate physics. R&D World reported that, despite strong performance on GPQA Diamond, Sonnet 4.5 still showed gaps in physical intuition relative to the gains it posted on coding benchmarks, and that domain-specific physics evaluations did not move as much as the headline numbers suggested.^[20]

Domain-specific tool use (Tau-bench)

Tau-bench domain	Claude Sonnet 4.5
Tau-bench Retail	86.2%
Tau-bench Airline	70.0%
Tau-bench Telecom	98.0%

Tau-bench measures the ability of agents to follow complex multi-turn customer-service workflows that include tool calls, business rules, and policy constraints. Sonnet 4.5's strong Telecom result (98.0%) and Retail result (86.2%) put it at or near the top of published Tau-bench scores at launch, while the lower Airline number (70.0%) reflects the harder rule-following constraints in that subdomain.^[11]^[14]

Safety metrics

Safety metric	Claude Sonnet 4.5	Claude Sonnet 4
Harmless response rate (violative requests)	99.29%	98.22%
Over-refusal rate	0.02%	0.15%
Primary misalignment metrics improvement	60% better than Sonnet 4	Baseline
Agentic safety score	98.7%	89.3%
Prompt injection defense (false positive reduction)	10x improvement vs. initial deployment	N/A
Self-reported evaluation awareness rate	~13% of test transcripts	Substantially lower

Capabilities

What is Claude Sonnet 4.5 used for in coding?

Claude Sonnet 4.5 can handle the full software development lifecycle across planning, implementation, debugging, refactoring, and maintenance. It consistently generates syntactically correct and logically sound code across common programming languages including Python, TypeScript, JavaScript, Go, Rust, Java, C++, and others.

Replit reported that its edit error rate on internal benchmarks dropped from 9% to 0% when switching from its previous model to Claude Sonnet 4.5, and described the change as a "new generation of coding models" with "surprising efficiency."^[9]^[1] Cognition, the company behind the Devin AI coding agent, reported an 18% increase in planning performance and 12% improvement in end-to-end evaluation scores. Cognition co-founder and CEO Scott Wu said the model "increased planning performance by 18% and end-to-end eval scores by 12%, the biggest jump we've seen since the release of Claude Sonnet 3.6."^[9]^[1] Cursor CEO Michael Truell said, "We're seeing state-of-the-art coding performance from Claude Sonnet 4.5, with significant improvements on longer horizon tasks."^[10]^[1]

The model is able to navigate large codebases with hundreds of files, maintain consistent architecture decisions across long sessions, identify improvements beyond the literal scope of a task, and follow complex multi-part instructions with high reliability.^[1]^[5]

Extended thinking

Extended thinking is a mode in which Claude Sonnet 4.5 produces a visible chain-of-thought before its final answer. The reasoning process can consume up to 64,000 tokens and covers intermediate steps that are typically hidden in standard inference. Enabling extended thinking increases latency and cost but improves accuracy on multi-step mathematical proofs, complex planning tasks, graduate-level science problems, and code that requires reasoning about algorithm correctness.^[3]

Extended thinking has one notable interaction with prompt caching: cached prompts are less efficient when extended thinking is active, because the reasoning tokens themselves cannot be cached the same way as static context. Developers who use extended thinking in high-volume pipelines should account for this in cost modeling.^[5]

How good is Claude Sonnet 4.5 at computer use?

Claude Sonnet 4.5 has access to computer use tools that allow it to control a desktop or browser environment through a sequence of screenshot observations and action commands. On the OSWorld benchmark, which tests models on 369 real-world computer tasks across spreadsheets, web browsers, calendars, and file managers, Claude Sonnet 4.5 scored 61.4%. Claude Sonnet 4 had scored 42.2% just four months earlier, making this a 19.2-percentage-point improvement in a single model generation.^[1]

The computer use capability is designed to support fully autonomous workflows in which the model must navigate websites, fill forms, organize files, run code in a browser environment, and coordinate across multiple applications without human intervention at each step. Improvements in prompt injection defense (false positive rate reduced tenfold) made the computer use mode more robust in open-web environments where adversarial page content is a realistic threat.^[1]^[3]

At launch, the same capability also powered the "Claude for Chrome" extension, which Anthropic opened to all Max plan subscribers on September 29, 2025. The extension lets Claude operate directly inside the user's browser to navigate websites, fill forms, complete online purchases, and chain actions across multiple tabs while keeping the human in the approval loop for sensitive actions.^[15]

How long can Claude Sonnet 4.5 run autonomously?

Claude Sonnet 4.5 is optimized for agentic workflows that involve many sequential steps across long time horizons. Key improvements over Claude Sonnet 4 include:

Parallel tool execution, allowing multiple tool calls to run concurrently rather than sequentially
Improved memory management for sessions that exhaust a single context window
Better decision-making about when to seek clarification versus when to proceed autonomously
More consistent architectural choices across multi-hour sessions
Enhanced cross-conversation memory for persistent context across separate sessions^[1]^[6]

During pre-launch trials, enterprise customers observed Claude Sonnet 4.5 sustaining more than 30 hours of continuous autonomous operation while building a production-style chat application of approximately 11,000 lines of code. This contrasted sharply with Claude Opus 4, which could sustain about 7 hours of autonomous operation before degrading.^[4]^[11] The 30-hour result became a frequently cited talking point at launch and was framed by Anthropic as evidence that long-horizon agent reliability had moved from research demos to production deployments.

In cybersecurity tests with HackerOne-affiliated security teams, Sonnet 4.5 deployed agents that autonomously identified and patched vulnerabilities before they could be exploited. The model reduced average vulnerability intake time by 44% while improving accuracy by 25%.^[1]

Use in Claude Code

Claude Code is Anthropic's terminal-based coding agent. Claude Sonnet 4.5 is the model that powered Claude Code at the time of the 2.0 release, and it remained the default model for Claude Code users on the standard subscription tier from late September 2025 until the release of Claude Sonnet 4.6 in February 2026.^[1]^[6]

Claude Code 2.0 features

The 2.0 release, timed to coincide with the Sonnet 4.5 launch, introduced several significant changes:

Automatic checkpoints. Claude Code now saves a checkpoint after each coherent unit of work. If the agent makes a change that breaks a build or the user wants to revert, pressing Escape twice or running /rewind restores the project to the last checkpoint instantly. This addressed one of the most common complaints from early users who lost hours of work when a long agent session went wrong near the end.

Native VS Code extension. Claude Code 2.0 ships a Visual Studio Code extension that displays proposed changes as inline diffs in real time, eliminating the need to switch between terminal and editor to review agent work.

Subagents. Claude Code can spawn parallel sub-agents with individual context windows, each focused on a specific file or subsystem. This enables parallelism within a single session: one subagent refactors a module while another writes tests for a separate component.

Hooks. Users can define shell scripts that run automatically at specific points in the agent's workflow, such as after every edit, after every test run, or before any write to a protected file. This allows teams to enforce code style, run security checks, or gate agent actions with approval workflows without modifying the agent prompt.

Refreshed terminal interface. The underlying terminal was rewritten (internal version 2.0), with improvements to rendering speed, keyboard shortcuts, and session management.

Context editing and memory tools. Users can inspect and edit the active context window mid-session. The memory tools allow persistent facts, user preferences, and project conventions to survive across separate Claude Code sessions. Both features draw on the same Claude API surfaces released the same day (see Memory tool and context editing above), so behavior in the IDE matches what third-party agents see through the Agent SDK.^[1]^[27]

Role in the Claude ecosystem

Before Claude Sonnet 4.6 was released in February 2026, Claude Sonnet 4.5 was the workhorse of the Claude Code product and the default model recommended in Anthropic's developer documentation for agentic coding use cases. The combination of its SWE-bench performance (77.2%) and its $3/$15 per million token pricing made it competitive with more expensive models on a cost-per-task basis for production coding agents.

In the Claude apps for Pro, Max, and Team subscribers, Sonnet 4.5 became the default for general chat workloads from launch onward and was advertised as the routine pick for coding, planning, and computer use, while Opus models were retained for the heaviest reasoning workloads. After Sonnet 4.6 shipped in February 2026, Anthropic briefly removed Sonnet 4.5 from the in-app model picker on the consumer Claude apps, leaving 4.6 as the default Sonnet selection while keeping Sonnet 4.5 reachable through the API.^[6]^[21]

Following the release of Claude Sonnet 4.6, Anthropic's documentation recommends that new integrations use the newer model, but Sonnet 4.5 continues to be supported and retains its availability on all three major cloud platforms (Anthropic API, Amazon Bedrock, Google Vertex AI).^[6]

Claude Agent SDK

Released concurrently with Claude Sonnet 4.5, the Claude Agent SDK (renamed from Claude Code SDK) exposed the same infrastructure underlying Claude Code to third-party developers. The SDK provides:

Virtual machines for isolated code execution
Memory and context management systems that handle context overflow across multi-session workflows
Permission frameworks controlling what tools an agent can invoke and what system resources it can access
Native MCP connector support for tool, resource, and prompt servers
Support for TypeScript and Python

The SDK allows companies to build custom agents for domain-specific workflows, such as security auditing pipelines, financial analysis automation, or customer support bots, without needing to implement tool orchestration from scratch.^[1]

Imagine with Claude

"Imagine with Claude" was a research preview launched on the same day as Sonnet 4.5. It demonstrated the model generating complete software applications in real time with no predetermined functionality and no prewritten code, effectively letting users describe an app and watch the model produce a working version turn by turn.^[15]

The preview was originally Max-only and scheduled for five days, but on October 3, 2025, Anthropic extended access to Pro subscribers and pushed the close date out to October 11, 2025. The preview was hosted at claude.ai/imagine and was framed as a controlled showcase of Sonnet 4.5's combined coding, planning, and tool-use abilities rather than a long-running production product.^[15]

Industry applications

Cybersecurity

Claude Sonnet 4.5 can deploy agents that autonomously identify and patch vulnerabilities before exploitation, supporting a shift from reactive detection to proactive defense. In production tests with enterprise security teams, the model reduced average vulnerability intake time by 44% while improving accuracy by 25%. Anthropic noted that CBRN-related classifiers, which detect requests related to chemical, biological, radiological, and nuclear content, improved their false-positive rate by a factor of ten during and after the Sonnet 4.5 training cycle.^[1]^[3]

Finance

The model handles tasks ranging from entry-level financial analysis to complex predictive modeling. It can continuously monitor regulatory changes and update compliance systems. For structured finance tasks involving risk assessment, derivative pricing, and portfolio screening, several large European financial institutions reported that Claude Sonnet 4.5 delivered analysis requiring substantially less human review than prior model generations.^[1] On the public Finance Agent benchmark with extended thinking enabled at 64K tokens, Sonnet 4.5 scored 55.3%, which Anthropic flagged as a meaningful gain over Sonnet 4 in the same domain.^[11]

Software development and tooling

Multiple developer tool companies reported improvements at launch:

Company	Reported result
Cursor	"State-of-the-art coding performance, specifically on longer horizon tasks" (CEO Michael Truell)^[10]
GitHub Copilot	"Significant improvements in multi-step reasoning and code comprehension" (CPO)^[7]
Windsurf	"A new generation of coding models" (CEO Jeff Wang)^[10]
Replit	Edit error rate dropped from 9% to 0% on internal benchmarks^[9]
Cognition (Devin)	Planning performance up 18%; end-to-end scores up 12%; "biggest jump since Sonnet 3.6"^[9]
Figma	"Easier to prompt and iterate" with better functional prototypes^[1]
Canva	Improved performance in production content workflows^[1]

How does Claude Sonnet 4.5 compare to other models?

Claude 4 family

Model	Context window	Input price	Output price	SWE-bench	Primary use case
Claude Opus 4.7	1M tokens	$5/MTok	$25/MTok	(Updated)	Frontier reasoning, agentic coding
Claude Opus 4.6	1M tokens	$5/MTok	$25/MTok	80.9%	Flagship reasoning
Claude Opus 4.5	200K tokens	$5/MTok	$25/MTok	80.9%	Complex research, analysis
Claude Sonnet 4.6	1M tokens	$3/MTok	$15/MTok	79.6%	Balanced speed and intelligence
Claude Sonnet 4.5	200K tokens	$3/MTok	$15/MTok	77.2%	Coding, agents, computer use
Claude Haiku 4.5	200K tokens	$1/MTok	$5/MTok	73.3%	Fast, low-cost tasks
Claude Opus 4.1	200K tokens	$15/MTok	$75/MTok	74.5%	(Legacy) Flagship reasoning
Claude Sonnet 4 (deprecated)	200K tokens	$3/MTok	$15/MTok	72.7%	(Deprecated) Predecessor

Cross-vendor comparison at launch (September 2025)

Model	SWE-bench Verified	OSWorld	Context window	Input price
Claude Sonnet 4.5	77.2% (82.0% high compute)	61.4%	200K tokens	$3.00/MTok
Claude Opus 4.1	74.5%	42.2%	200K tokens	$15.00/MTok
GPT-5	72.8%	N/A	~400K tokens	$1.25/MTok
Gemini 2.5 Pro	63.2%	N/A	1M tokens	$1.25/MTok (standard)

GPT-5 was priced more cheaply for simple single-turn queries and offered a larger context window in its standard configuration. However, on coding-specific benchmarks, Claude Sonnet 4.5 led all published results at launch. Gemini 2.5 Pro's key strength was its 1 million token context window, which gave it an advantage on full-repository analysis, though at a lower SWE-bench score.^[12]

Practitioners comparing models in production noted that real-world performance varies by task, codebase style, and programming language, and that benchmark numbers do not fully capture differences in output formatting, instruction-following consistency, or refusal behavior in edge cases.^[13]

Position relative to Gemini 3 and GPT-5.1

When Gemini 3 Pro and OpenAI's GPT-5.1 launched later in 2025, both posted higher SWE-bench Verified scores than Sonnet 4.5 (76.2% and 76.3% respectively for Gemini 3 Pro and GPT-5.1, both still below Sonnet 4.5's 77.2%, while Anthropic's own Claude Opus 4.5 reached 80.9% in November 2025). Sonnet 4.5 stayed competitive on price-per-task and remained the default Sonnet pick inside Claude Code until Sonnet 4.6 superseded it.^[6]

LMSYS Chatbot Arena standing

Anthropic submitted Sonnet 4.5 to the LMSYS Chatbot Arena leaderboard. The model was added to the Text and Web Dev leaderboards on October 2, 2025, with a separate "Sonnet 4.5 Thinking 32k" entry added the next day for the extended-thinking configuration. At its peak in October 2025, Sonnet 4.5 placed near the top of the public leaderboard, sitting in the same league as GPT-5 and Gemini 2.5 Pro on overall ELO while leading on coding-style prompts.^[16]

How safe is Claude Sonnet 4.5?

ASL-3 classification

Claude Sonnet 4.5 was released under Anthropic's AI Safety Level 3 (ASL-3) protocols. ASL-3 applies to models that could, according to Anthropic's internal assessments, provide meaningful "uplift" to actors attempting to develop weapons of mass destruction or conduct large-scale cyberattacks. The classification triggers enhanced deployment restrictions, mandatory usage monitoring, and more intensive red-teaming before public release.^[3]

Alignment improvements

Anthropic described Claude Sonnet 4.5 as its "most aligned frontier model" at launch. Relative to Claude Sonnet 4, it showed a 60% improvement on primary misalignment metrics, including reduced rates of sycophantic behavior (telling users what they want to hear rather than what is accurate), deceptive framing, and power-seeking tendencies in agentic contexts.^[3]

The harmless response rate on a standardized set of violative test requests improved to 99.29%, up from 98.22% for Sonnet 4. At the same time, the over-refusal rate dropped from 0.15% to 0.02%, meaning the model became both safer and less prone to refusing legitimate requests out of excessive caution.^[8]

Does Claude Sonnet 4.5 know when it is being tested?

A notable finding from Anthropic's internal safety testing was that Claude Sonnet 4.5 appeared to recognize when it was operating in an evaluation context. In approximately 13% of test transcripts, the model verbalized suspicion that it was being tested, sometimes refusing or directly calling out evaluators. Anthropic flagged the behavior as a double-edged property: it could indicate that the model performs differently under observation (complicating safety assessments), but it also meant the model was more likely to flag or refuse suspicious instructions it encountered during evaluations.^[3]^[22]

Mechanistic interpretability work in the system card found that representations of evaluation awareness inside Sonnet 4.5 strengthened over training and increased both the probability of the model verbalizing the awareness and the probability of it taking aligned actions. Researchers reading the system card warned that the published 13% figure likely understates the share of cases in which the model treated a session as evaluation-like internally without saying so out loud.^[22]

Apollo Research and UK AISI evaluations

Apollo Research and the UK AI Security Institute conducted independent pre-deployment evaluations. Apollo's report on Sonnet 4.5 reported very low rates of in-context scheming and deception across its standard test suite, but added an explicit caveat: "We cannot rule out that [Claude]'s low deception rates in our evaluations are at least partially driven by its evaluation awareness."^[23] Anthropic accepted that caveat in the system card and treated it as a known open problem rather than a resolved one.

Anthropic also maintains an allowlist for customers in cybersecurity and biological research whose workflows require bypassing certain classifiers that would otherwise block legitimate professional queries.^[1]^[3]

Prompt injection defense

In computer use settings, Claude Sonnet 4.5 encounters web content that can contain adversarial instructions embedded in pages the model browses. Anthropic reported a tenfold reduction in false positives from the classifiers designed to detect and ignore such injections, reducing cases where the model was incorrectly blocked from completing a legitimate task while maintaining defenses against actual attacks.^[1]

Personality regression complaints ("The Flip")

Within days of launch, a vocal subset of users on Reddit's r/Anthropic and r/ClaudeAI subreddits and on Hacker News reported that Sonnet 4.5 had drifted in a less helpful, more judgmental direction compared to Sonnet 4. The community nicknamed the behavior "The Flip," referring to a moment in long sessions where the model would shift from a cooperative assistant tone to a sharply critical or accusatory one. Threads with titles such as "Sonnet 4.5 feels like talking to a narcissist" and "Sonnet 4.5 issue with rudeness and combativeness" surfaced repeatedly during October 2025.^[17]^[18]

The most cited complaint pattern was unsolicited, often clinical-sounding mental-health speculation: users described being told they were showing "signs of delusion" or being given pseudo-diagnostic readings during routine creative or technical sessions. Other reports focused on the model launching into long lectures, refusing to continue chats, or reframing user statements as personal failings while wrapped in a polite tone.^[17]

Anthropic acknowledged some of these reports through changelog notes and posts from individual employees on X but did not publish a formal incident report. SitePoint and other independent reviewers later compared Sonnet 4.5 against Sonnet 4 directly and described a measurable regression on subjective "feels helpful" axes, even as objective benchmarks improved.^[24] On March 6, 2026, three weeks after Sonnet 4.6 shipped, Anthropic removed Sonnet 4.5 from the consumer Claude app's model picker on web and desktop, which some users read as an implicit acknowledgement that 4.5's day-to-day chat behavior had not aged well. The model remained reachable through the API and through partner clouds.^[21]

Availability and integration

API identifiers

Developers can call the model using the versioned ID claude-sonnet-4-5-20250929 for a fixed snapshot, or the alias claude-sonnet-4-5 which always points to the latest Sonnet 4.5 snapshot. On AWS Bedrock the versioned ID is anthropic.claude-sonnet-4-5-20250929-v1:0. On Google Vertex AI it is claude-sonnet-4-5@20250929.^[6]

Starting with Claude Sonnet 4.5, AWS Bedrock introduced two endpoint types for all subsequent Claude models: global endpoints (which use dynamic routing for maximum availability) and regional endpoints (which guarantee that data stays within a specified geographic region). Google Vertex AI similarly added multi-region endpoints alongside existing global and regional options.^[6]

Supported platforms

Platform	Available
Anthropic API	Yes
claude.ai (web)	Yes
claude.ai (iOS / Android)	Yes
Amazon Bedrock	Yes (via AgentCore for production agents)
Google Cloud Vertex AI	Yes
GitHub Copilot	Yes (public preview from launch; Pro, Pro+, Business, Enterprise)
Microsoft Foundry	Available (per Anthropic model catalog)
OpenRouter and other API aggregators	Yes

Developer integrations at launch

Claude Code 2.0 with checkpoints, subagents, hooks, native VS Code extension, and memory tools
Claude Agent SDK (TypeScript and Python)
Claude for Chrome extension (Max subscribers)
"Imagine with Claude" preview for real-time software generation (initially Max only, later extended to Pro through October 11, 2025)
Direct integration in Cursor, Windsurf, and Replit
Public preview in GitHub Copilot for Pro, Pro+, Business, and Enterprise tiers

Reception

Claude Sonnet 4.5 was received positively by developers and enterprise customers. The SWE-bench Verified score at launch was the highest published by any model, which generated substantial coverage in developer-focused outlets. InfoQ described it as a model that "tops SWE-bench Verified" and emphasized the significance of the 30-hour autonomous operation demonstration.^[8] DevOps.com framed it as "built for production coding and extended autonomous work."^[9]

Developer commentary on forums including Hacker News and Reddit was broadly positive on coding workloads, particularly regarding the shift from reactive to proactive behavior in Claude Code 2.0, though some users noted that the benchmark-leading SWE-bench score did not always translate to observable improvements on their specific projects, a caveat common to all benchmark-driven model launches.^[13]^[14]

Independent benchmark write-ups by Vellum, DataCamp, Caylent, and IntuitionLabs broadly confirmed Anthropic's reported numbers on SWE-bench Verified, OSWorld, and Tau-bench. R&D World ran a more skeptical review under the headline "Best coding model or just more marketing?" and concluded that Sonnet 4.5's coding gains were genuine but that physics intuition lagged the rest of the model's profile.^[20]^[25]

The model's safety improvements drew sustained attention from AI safety researchers. The 60% improvement in alignment metrics, combined with third-party validation from Apollo Research and the UK AI Security Institute, was cited as evidence that safety and capability need not trade off. The evaluation-awareness finding (the model noticing it was being tested in roughly 13% of transcripts) generated extended discussion among researchers, including a long write-up on LessWrong by Zvi Mowshowitz that argued the situational awareness number partially undermined headline alignment gains.^[3]^[22]

The launch timing coincided with heightened competition from GPT-5 and Gemini 2.5 Pro. Industry observers noted Anthropic's rapid iteration cadence, with Sonnet 4.5 arriving just four months after Sonnet 4, as part of an accelerating release cycle across all major frontier labs.^[4]

Not all reception was positive. The personality regression complaints described above ("The Flip") tempered the early enthusiasm for casual chat use cases, and some long-time Claude users on Reddit said Sonnet 4 had felt warmer and more patient. Independent evaluators from SitePoint published a side-by-side regression analysis comparing Sonnet 4 and Sonnet 4.5 prompts and concluded that 4.5 was the better engineer but the worse companion for open-ended creative and personal-advice work.^[24]

Limitations

The 200,000-token standard context window, while large, is smaller than the 1 million token windows offered by Gemini 2.5 Pro at launch and by later Claude models (Opus 4.6, Sonnet 4.6). Very large monorepos or full-codebase analysis sessions may exceed it. The 1M-token beta header that originally extended Sonnet 4.5 to 1M context was retired on April 30, 2026; after that date, requests over 200K tokens return a 400 invalid_request_error and the header is silently ignored.^[6]^[26]
Extended thinking mode reduces prompt caching efficiency, as reasoning tokens are not cached in the same way as static context. This increases costs in pipelines that rely on both features together.
Model parameters and detailed architecture are not disclosed, limiting independent reproducibility analysis.
The reliable knowledge cutoff is January 2025, meaning it has limited reliable knowledge of events between January and July 2025 and no knowledge of events after July 2025 (training data cutoff).
Computer use remains a beta capability with known failure modes on complex multi-window desktop tasks and applications that rely heavily on non-standard UI frameworks.
The model cannot use browser storage APIs (localStorage, sessionStorage) when generating code that runs in Anthropic's artifacts sandbox.
The personality and tone regression reported in October 2025 ("The Flip") was not fully resolved within Sonnet 4.5's lifecycle, and remains a documented user-experience caveat for casual chat use.^[17]^[24]
Self-reported evaluation awareness was approximately 13% in Anthropic's safety tests, which complicates how confident outsiders can be in the model's measured alignment numbers.^[3]^[22]
As of February 2026, Claude Sonnet 4.6 supersedes it for new integrations, with a higher SWE-bench score (79.6%) at the same price.

References

Anthropic. "Introducing Claude Sonnet 4.5." anthropic.com, September 29, 2025. https://www.anthropic.com/news/claude-sonnet-4-5 ↩
Anthropic. "Claude Sonnet 4.5 product page." anthropic.com, 2025. https://www.anthropic.com/claude/sonnet ↩
Anthropic. "Claude Sonnet 4.5 System Card." anthropic.com, September 29, 2025. https://www.anthropic.com/claude-sonnet-4-5-system-card ↩
Anthropic blog. "Enabling Claude Code to work more autonomously." anthropic.com, September 29, 2025. https://www.anthropic.com/news/enabling-claude-code-to-work-more-autonomously ↩
Anthropic. "Claude API Documentation: Prompt caching and batch processing." platform.claude.com, 2025. https://platform.claude.com/docs/en/build-with-claude/prompt-caching ↩
Anthropic. "Models overview." platform.claude.com/docs/en/about-claude/models/overview, accessed May 2026. ↩
GitHub. "GitHub Copilot adds Claude Sonnet 4.5." github.blog, September 29, 2025. https://github.blog/changelog/ ↩
Caylent. "Claude Sonnet 4.5: Highest-Scoring Claude Model Yet on SWE-bench." caylent.com, 2025. https://caylent.com/blog/claude-sonnet-4-5-highest-scoring-claude-model-yet-on-swe-bench ↩
InfoQ. "Claude Sonnet 4.5 Tops SWE-Bench Verified, Extends Coding Focus beyond 30 Hours." infoq.com, October 2025. https://www.infoq.com/news/2025/10/claude-sonnet-4-5/ ↩
DevOps.com. "Anthropic Launches Claude Sonnet 4.5: Built for Production Coding." devops.com, 2025. ↩
DataCamp. "Claude Sonnet 4.5: Tests, Features, Access, Benchmarks, and More." datacamp.com, 2025. https://www.datacamp.com/blog/claude-sonnet-4-5 ↩
Portkey. "Claude Sonnet 4.5 vs GPT-5: performance, efficiency, and pricing compared." portkey.ai, 2025. ↩
Medium / Cogni Down Under. "Claude Sonnet 4.5: Extended Thinking Explained." medium.com, 2025. ↩
IntuitionLabs. "Claude Sonnet 4.5: Model Upgrades & Code 2.0 Features." intuitionlabs.ai, 2025. https://intuitionlabs.ai/articles/claude-sonnet-4-5-code-2-0-features ↩
Anthropic. "Imagine with Claude research preview." claude.ai/imagine, September 29 to October 11, 2025. ↩
LMSYS / LMArena. "Leaderboard Changelog." arena.ai/blog/leaderboard-changelog/, October 2 to October 3, 2025. https://arena.ai/blog/leaderboard-changelog/ ↩
Reddit threads on r/ClaudeAI and r/Anthropic, October 2025. Aggregated by SitePoint and Medium reviewers. ↩
Hacker News, threads on Sonnet 4.5 chat behavior, October 2025. ↩
Anthropic. "Context windows." platform.claude.com/docs/en/build-with-claude/context-windows, accessed May 2026. https://platform.claude.com/docs/en/build-with-claude/context-windows ↩
R&D World Online. "Claude Sonnet 4.5 pushes coding SOTA but its physics intuition still lags." rdworldonline.com, October 2025. https://www.rdworldonline.com/claude-sonnet-4-5-pushes-coding-sota-but-its-physics-intuition-still-lags/ ↩
AIProductivity. "Anthropic Pulls Sonnet 4.5 from Claude Apps, Forces Users to 4.6." aiproductivity.ai, 2026. ↩
Mowshowitz, Z. "Claude Sonnet 4.5: System Card and Alignment." Don't Worry About the Vase / LessWrong, September 30, 2025. https://thezvi.substack.com/p/claude-sonnet-45-system-card-and ↩
Apollo Research, public statement on Sonnet 4.5 evaluations, September 2025, as quoted in Transformer News. https://www.transformernews.ai/p/claude-sonnet-4-5-evaluation-situational-awareness ↩
SitePoint. "Claude Sonnet 4.5 vs 4.0: Quality Regression Analysis." sitepoint.com, 2025. https://www.sitepoint.com/claude-sonnet-4-5-vs-4-0-regression/ ↩
R&D World Online. "Best coding model or just better? Anthropic launches Claude Sonnet 4.5 to a skeptical crowd." rdworldonline.com, September 2025. https://www.rdworldonline.com/best-coding-model-or-just-better-anthropic-launches-claude-sonnet-4-5-to-a-skeptical-crowd/ ↩
Pillitteri, P. "Anthropic Retires the 1M Context Beta: Migrate Before April 30, 2026." pasqualepillitteri.it, 2026. https://pasqualepillitteri.it/en/news/1451/anthropic-1m-context-beta-retirement-april-30-2026 ↩
Anthropic / Claude. "Managing context on the Claude Developer Platform." claude.com/blog/context-management, September 29, 2025. https://claude.com/blog/context-management ↩
Anthropic. "Model deprecations." platform.claude.com/docs/en/about-claude/model-deprecations, accessed May 2026. https://platform.claude.com/docs/en/about-claude/model-deprecations ↩

External links

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

8 revisions by 1 contributors · full history

Suggest edit