Claude Opus 4.5 is a large language model developed by Anthropic and released on November 24, 2025. It is the most capable model in the Claude 4.5 generation and Anthropic's flagship at the time of its release, positioned for demanding agentic tasks, software engineering, and computer use workflows. The model arrived as the final entry in the 4.5 series, following Claude Sonnet 4.5 (September 29, 2025) and Claude Haiku 4.5 (October 15, 2025), and as the fourth Opus subgeneration in the broader Claude 4 family.
The release attracted significant attention because Opus 4.5 was the first AI model to score above 80% on SWE-bench Verified, a widely used benchmark that tests models on real-world GitHub issues. It scored 80.9%, surpassing both OpenAI's GPT-5.1 (76.3%) and Google's Gemini 3 Pro (76.2%) at the time of release. Anthropic also cut the price by roughly 67% relative to the prior Opus-tier model (Claude Opus 4.1), setting input costs at $5 per million tokens and output costs at $25 per million tokens.
Beyond raw benchmark performance, Opus 4.5 introduced the "effort" parameter, a first-class API control that lets developers tune how many reasoning tokens the model spends on a request. That mechanism, combined with improvements to multi-agent delegation and tool calling, positioned the model as the backbone of complex agentic pipelines rather than a one-shot assistant. The model launched under Anthropic's Responsible Scaling Policy AI Safety Level 3 (ASL-3) deployment standard, the same level applied to its predecessors in the Opus tier.
Anthropic's Claude 4 generation launched in May 2025 with Claude Opus 4 and Claude Sonnet 4, the direct ancestor of Claude Sonnet 4.5. The 4.5 subgeneration followed with iterative improvements to each tier. Claude Opus 4.1 shipped in August 2025, focused on extended agentic capabilities and incremental gains on refactoring tasks. Sonnet 4.5 followed in September, bringing the performance floor closer to Opus 4.1 at a lower price point. Claude Haiku 4.5 arrived in October as Anthropic's first small model with extended thinking, computer use, and context awareness.
Opus 4.5, released the following month, completed the generation. Anthropic described it as the product of focused training improvements in long-horizon task performance, multilingual coding, and robustness against adversarial inputs such as prompt injection attacks. The system card framed the release as the culmination of a year of work to shift the Opus tier from a flagship reasoning model into the orchestrator of agentic systems.
The wider competitive context shaped the release schedule. OpenAI released GPT-5.1 on November 12, 2025, and Google DeepMind released Gemini 3 Pro on November 18, 2025. Anthropic's November 24 release placed Opus 4.5 in direct head-to-head competition with both models within a two-week window. Some analysts called the period "the week the benchmarks broke" because the simultaneous launches kept reshuffling the leaderboard rankings on each major evaluation suite.
Anthropic's internal testing before the release included a take-home engineering exam used to evaluate software engineering candidates. Opus 4.5 scored higher than every human candidate who had ever taken the exam within the standard two-hour limit, according to Anthropic's announcement post. The company framed the result less as a substitute for human engineers than as evidence that Opus 4.5 had crossed a threshold of useful autonomy on the kinds of bounded, well-specified tasks that recruiters use to filter candidates.
Opus 4.5 entered a Claude 4 generation that already included two Opus releases (Opus 4 in May 2025, Opus 4.1 in August 2025) and several months of customer feedback on Claude Code, Anthropic's terminal-first coding agent. Internal logs and customer telemetry shaped a long list of training targets: more reliable tool argument generation, fewer redundant tool calls, better recovery from build or lint failures, and higher-quality multi-step planning across codebase changes that touch many files.
| Specification | Details |
|---|---|
| API model identifier | claude-opus-4-5-20251101 |
| API alias | claude-opus-4-5 |
| AWS Bedrock ID | anthropic.claude-opus-4-5-20251101-v1:0 |
| Google Cloud Vertex AI ID | claude-opus-4-5@20251101 |
| Microsoft Foundry ID | claude-opus-4-5 |
| Context window | 200,000 tokens |
| Maximum output (Messages API) | 64,000 tokens |
| Extended thinking | Yes |
| Adaptive thinking (effort parameter) | Yes |
| Effort levels | low, medium, high |
| Thinking budget (standard evaluations) | 64K tokens |
| Tool use | Yes |
| Computer use | Yes |
| Vision input | Yes |
| Prompt caching | Yes (90% read discount) |
| Batch API | Yes (50% discount) |
| Priority Tier | Yes |
| Training data cutoff | August 2025 |
| Reliable knowledge cutoff | May 2025 |
| Release date | November 24, 2025 |
| ASL classification (Responsible Scaling Policy) | ASL-3 |
Opus 4.5 supports both the older extended thinking mode and the newer adaptive thinking capability exposed through the effort parameter. The 200,000-token context window remained consistent with earlier Opus and Sonnet models in the 4.x lineage, though it drew some criticism compared with Gemini 3 Pro's longer context window. Anthropic later raised the Opus context window to one million tokens with Claude Opus 4.6 in February 2026, three months after Opus 4.5 shipped.
The maximum output of 64,000 tokens doubled Claude Opus 4.1's 32,000-token limit. The model supports tool use, vision (image input), and computer use, and it is available on Amazon Web Services Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Each platform exposes the model under a distinct identifier that tracks the same underlying November 1, 2025 snapshot.
The API model identifier claude-opus-4-5-20251101 reflects the November 1 snapshot date. Anthropic released the model publicly on November 24, 2025, after additional safety testing and partner integration work. The convention of pinning a specific snapshot date in the model ID is consistent with the rest of the Claude 4 family before the 4.6 generation, which moved to a dateless format.
One of the defining technical additions in Opus 4.5 was the effort parameter, a beta API feature that controls how much of the model's reasoning budget is spent before returning a response. Developers pass low, medium, or high to adjust the trade-off between thoroughness and token cost.
At medium effort, Opus 4.5 matches the best SWE-bench Verified score achieved by Sonnet 4.5 while using 76% fewer output tokens. At high effort, the model exceeds Sonnet 4.5's performance by approximately 4.3 percentage points while still using approximately 48% fewer tokens than running Sonnet at full output. The combination of higher headline scores and lower token consumption was central to Anthropic's pitch that Opus 4.5 was a better cost-per-task choice than Sonnet 4.5 for many production workloads, despite higher per-token pricing.
This matters for agentic pipelines that route subtasks to different model tiers. With a single model, a developer can configure inexpensive medium-effort calls for routine steps and high-effort calls for the hardest subtasks, without switching model identifiers or maintaining separate configurations. Anthropic recommended medium effort for most production agentic workflows and high effort for coding challenges or take-home assignments where accuracy outweighs cost.
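The routing pattern reduces to a single model identifier with a per-call effort setting. The sketch below uses the Anthropic Python SDK; the top-level effort field, the beta header name, and the escalation heuristic are assumptions based on the description above rather than confirmed request shapes.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def run_step(prompt: str, hard: bool = False) -> str:
    """Send one pipeline step, escalating to high effort only for hard subtasks."""
    response = client.messages.create(
        model="claude-opus-4-5",  # alias resolving to the 2025-11-01 snapshot
        max_tokens=4096,
        # The beta header name and top-level "effort" field are assumed here;
        # the launch-era beta documentation defines the exact shape.
        extra_headers={"anthropic-beta": "effort-2025-11-24"},
        extra_body={"effort": "high" if hard else "medium"},
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


print(run_step("Summarize the failing test output."))                   # routine step
print(run_step("Find the race condition in scheduler.py.", hard=True))  # hard step
```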
Tool call overhead also decreased: the combined token reduction when using tool search alongside the adaptive-thinking optimizations reached approximately 85% relative to prior-generation baselines. The effort parameter became the primary mechanism for trading capability against cost, replacing the lower-level budget_tokens control that earlier Claude 4 models exposed.
The parameter was released as a beta in November 2025 and graduated to general availability over the following months. It set the pattern for Claude Opus 4.6's adoption of named effort levels (low, medium, high, max) and for Claude Opus 4.7's addition of an xhigh tier above high. Opus 4.5 was the first model to expose effort as a top-level API surface rather than as a sampling parameter buried in the request body.
Opus 4.5 supports interleaved thinking, where a single agent turn can mix internal reasoning steps with tool calls. The model thinks, calls a tool, reads the result, thinks again, and decides whether to call another tool or to return a final answer. This contrasts with the older pattern of producing a single block of thinking followed by a single block of tool calls, which often forced developers to manually break tasks into separate API requests.
Interleaved thinking matters for complex agentic workflows because it lets the model react to tool results without losing the chain of reasoning that motivated the call. In Anthropic's testing this pattern reduced the number of round trips required to complete agentic tasks by roughly 25 to 40 percent, depending on the workflow.
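In client code, interleaved thinking rides on the ordinary Messages API tool loop: the same request is re-sent with accumulated thinking, tool_use, and tool_result blocks until the model stops asking for tools. A minimal sketch, assuming a beta header named after the Claude 4 interleaved-thinking beta and a stubbed test-runner tool:

```python
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "run_tests",
    "description": "Run the project's test suite and return the output.",
    "input_schema": {"type": "object", "properties": {}, "required": []},
}]

messages = [{"role": "user", "content": "Fix the failing test in utils_test.py."}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=8192,
        thinking={"type": "enabled", "budget_tokens": 4096},
        extra_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"},  # assumed
        tools=tools,
        messages=messages,
    )
    # Keep thinking and tool_use blocks in the history so the reasoning that
    # motivated each call survives into the next turn.
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason != "tool_use":
        break
    tool_results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": "2 passed, 1 failed: test_parse_dates"}  # stubbed tool output
        for block in response.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})
```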
The table below shows Opus 4.5's scores on major benchmarks alongside GPT-5.1 and Gemini 3 Pro, the two primary competitors released that same week in November 2025.
| Benchmark | Claude Opus 4.5 | GPT-5.1 | Gemini 3 Pro |
|---|---|---|---|
| SWE-bench Verified (coding) | 80.9% | 76.3% | 76.2% |
| Terminal-Bench (command-line tasks) | 59.3% | 47.6% | 54.2% |
| Aider Polyglot (multilingual coding) | 89.4% | not reported | not reported |
| ARC-AGI-2 (novel pattern reasoning) | 37.6% | 17.6% | 31.1% |
| GPQA Diamond (graduate science Q&A) | 87.0% | 88.1% | 91.9% |
| Humanity's Last Exam (with search) | 43.2% | 42.0% | 45.8% |
| MMMLU (multilingual knowledge) | 90.8% | 91.0% | 91.8% |
| MMMU (visual reasoning) | 80.7% | 85.4% | 81.0% |
| OSWorld (computer use) | 66.3% | not reported | not reported |
| Vending-Bench 2 (long-horizon planning, final balance) | $4,967 | not reported | $5,478 |
| Prompt injection attack success rate (Gray Swan) | 4.7% | 21.9% | 12.5% |
A few patterns are worth noting. Opus 4.5 leads clearly on coding tasks and computer use, and it posted the highest ARC-AGI-2 score of the three models even though that benchmark tests abstract pattern reasoning rather than coding. Gemini 3 Pro outperformed it on GPQA Diamond, Humanity's Last Exam, MMMLU, and the Vending-Bench 2 long-horizon planning simulation. GPT-5.1 led on visual reasoning (MMMU). No single model dominated every category.
The prompt injection figures drew particular attention. Opus 4.5's 4.7% attack success rate on the Gray Swan benchmark was significantly lower than GPT-5.1's 21.9% and Gemini 3 Pro's 12.5%, making it the most robust of the three against adversarial injection attempts at launch. This matters most for agentic deployments where the model processes web content or user-provided documents that may contain hidden adversarial instructions.
On the SWE-bench Multilingual benchmark, Opus 4.5 led in 7 of 8 tested programming languages (C, C++, Go, Java, JavaScript/TypeScript, Python, and Ruby). The Aider Polyglot evaluation, which tests file-level edits across many languages, showed a score of 89.4%, an increase of approximately 10.6 percentage points over Claude Sonnet 4.5's 78.8%.
The OSWorld result deserves context. OSWorld measures a model's ability to complete realistic desktop computing tasks through graphical interfaces, not just terminal commands. Opus 4.5's 66.3% represented roughly a threefold improvement over Claude 3.5 Sonnet's approximately 22% on the same benchmark, though Anthropic did not publish comparisons with GPT-5.1 and Gemini 3 Pro on this specific test at launch.
On AIME 2025 (a high-school math competition test), both Opus 4.5 and Gemini 3 Pro achieved 100% when given access to a Python code execution environment, indicating that frontier models had saturated this particular evaluation. Without code execution the headline scores were lower; Anthropic did not publish a tools-off AIME number for Opus 4.5 in the launch material.
Opus 4.5 also showed substantial gains on BrowseComp-Plus, a harder web research benchmark that tests deep multi-source retrieval. Anthropic reported large improvements over Sonnet 4.5 on this evaluation but did not publish a single headline score in the launch post. Independent reviewers later compared Opus 4.5's performance on web research tasks favorably with GPT-5.1 but unfavorably with Gemini 3 Pro for tasks requiring very long context windows.
The model showed measurable creative problem-solving gains on the τ2-Bench customer support benchmark. In one Anthropic-cited example, Opus 4.5 helped resolve an airline booking conflict by treating cancellation and rebooking as separate operations, finding a path through the policy that Sonnet 4.5 had failed to identify. Anthropic characterized this as aligned problem-solving rather than reward hacking, distinguishing the behavior from cases where a model gamed the literal letter of a rule.
| Platform | Input cost | Output cost |
|---|---|---|
| Anthropic API (standard) | $5 per million tokens | $25 per million tokens |
| AWS Bedrock | $5 per million tokens | $25 per million tokens |
| Google Vertex AI | $5 per million tokens | $25 per million tokens |
| Microsoft Foundry | $5 per million tokens | $25 per million tokens |
| Anthropic API (Batch API, 50% discount) | $2.50 per million tokens | $12.50 per million tokens |
| Anthropic API (prompt cache write) | $6.25 per million tokens | not applicable |
| Anthropic API (prompt cache read) | $0.50 per million tokens | not applicable |
The $5/$25 pricing represented a 67% cost reduction from Claude Opus 4.1, which was priced at $15 per million input tokens and $75 per million output tokens. This reduction was notable because it made Opus-class reasoning available to teams that had previously found the prior Opus tier too expensive for production use.
Prompt caching reduces costs by up to 90% on the cached portion of a request. Cache writes cost slightly more than uncached input ($6.25 versus $5 per million tokens), and cache reads run at $0.50 per million tokens, an order of magnitude below the uncached rate. The Message Batches API offers a 50% discount on both input and output tokens with up to a 24-hour turnaround. Both features are available across Anthropic API, AWS Bedrock, and Google Cloud Vertex AI deployments.
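Caching is opt-in per content block: a cache_control marker on the last block of a stable prefix tells the API to reuse everything up to that point. A minimal sketch, assuming a large repository digest as the cached prefix:

```python
import anthropic

client = anthropic.Anthropic()

# Assumed local file standing in for a large, stable prefix.
big_context = open("repo_digest.txt").read()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=2048,
    system=[
        {"type": "text", "text": "You are reviewing this repository."},
        {"type": "text", "text": big_context,
         "cache_control": {"type": "ephemeral"}},  # cache everything up to here
    ],
    messages=[{"role": "user", "content": "List the modules with no tests."}],
)
# usage.cache_creation_input_tokens and usage.cache_read_input_tokens show
# how much of the prefix was written to or served from the cache.
print(response.usage)
```

The first request pays the $6.25/M write rate on the cached span; subsequent requests sharing the prefix pay $0.50/M for it.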
GitHub Copilot offered promotional pricing for Opus 4.5 through December 5, 2025 for Pro, Pro+, Business, and Enterprise users. Copilot's premium-request multiplier for Opus 4.5 was set at a discount during the promotional window, with normal pricing taking over after the period closed.
For Max subscription users on claude.ai, Anthropic removed Opus-specific usage caps at launch. Max and Team Premium members also received increased overall usage limits alongside the new model. The Pro tier on claude.ai received Opus 4.5 access at the same time, replacing the older Opus 4.1 default for Pro subscribers.
The model is classified as a legacy model in Anthropic's API documentation as of May 2026, with the current flagship being Claude Opus 4.7. However, Opus 4.5 remains available under its versioned API identifier, both for customers who need behavior continuity and for evaluation work that compares against the snapshot at launch.
Opus 4.5 supports Anthropic's Priority Tier, a service level that guarantees throughput and lower latency for production workloads at a higher per-token price. Customers commit to minimum monthly volumes and receive dedicated capacity in exchange. The feature is available on Anthropic's direct API and through cloud partners.
The inference_geo data residency parameter that arrived with Claude Opus 4.6 in February 2026 was not available on Opus 4.5 at launch. Customers requiring data residency guarantees on Opus 4.5 generally relied on AWS or Google Cloud regional endpoints, which provide that constraint at the platform level rather than the model level.
Opus 4.5 was positioned primarily as a coding and agentic model. Claude Code, Anthropic's AI coding assistant, received updates alongside the model release. Claude Code's Plan Mode was updated so the tool now asks clarifying questions at the start of a task, generates a user-editable plan.md file, and then executes based on the confirmed plan. The intent was to reduce rework by surfacing ambiguity early rather than mid-implementation.
Claude Code also became available on the Claude desktop application with the Opus 4.5 launch, having previously been limited to the terminal. The Plan Mode updates were specifically designed to work with Opus 4.5's improved capacity to handle ambiguity and reason about tradeoffs without requiring step-by-step hand-holding.
Customer-reported results included claims from Rakuten that its agents using Opus 4.5 reached peak performance in approximately four iterations on complex tasks, compared with ten or more iterations required with competing models. Other enterprise customers reported 50% to 75% reductions in tool calling errors and build or lint errors relative to prior baselines.
Independent partner testimonials echoed these patterns. Cursor reported that Opus 4.5 improved success rates on agentic refactor tasks across its customer base by an average of 15 percent. Lovable cited similar gains in autonomous web application generation, with the model recovering more often from intermediate build failures without operator intervention. Replit's internal benchmarks showed reduced edit error rates on long sessions, continuing the trend Sonnet 4.5 had set in September 2025.
The architecture of Opus 4.5 was explicitly trained to function as an orchestrator in multi-agent systems where it delegates to lower-cost sub-agents (such as Haiku 4.5-powered workers). Anthropic improved the model's ability to generate precise delegation prompts and synthesize results from parallel agents.
The model handles parallel tool calls more aggressively than its predecessors, firing multiple simultaneous searches during research tasks and reading several files at once to build context faster. This reduces the number of round trips in agentic loops, which matters for both cost and latency.
Anthropic's reference patterns recommended pairing Opus 4.5 (orchestrator) with Haiku 4.5 (worker) for cost-efficient multi-agent setups. The orchestrator would plan, decompose tasks, and verify results, while parallel Haiku workers executed individual subtasks. Token costs in this configuration scaled roughly with the number of workers, but per-task wall-clock time fell sharply because the workers ran in parallel rather than sequence.
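A skeletal version of that pattern is sketched below. The decomposition and review prompts, the worker count, and the claude-haiku-4-5 alias are illustrative assumptions, not a reference implementation.

```python
import concurrent.futures

import anthropic

client = anthropic.Anthropic()


def ask(model: str, prompt: str) -> str:
    response = client.messages.create(
        model=model, max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


# Orchestrator: decompose the task into one self-contained instruction per file.
subtasks = ask(
    "claude-opus-4-5",
    "Split 'add type hints to the utils package' into one self-contained "
    "instruction per file, one per line.",
).splitlines()

# Workers: parallel Haiku calls, so wall-clock time tracks the slowest
# subtask rather than the sum of all subtasks.
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda task: ask("claude-haiku-4-5", task), subtasks))

# Orchestrator again: verify and synthesize the workers' output.
print(ask("claude-opus-4-5",
          "Review these per-file changes for consistency:\n" + "\n---\n".join(results)))
```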
Opus 4.5 led all models at launch on OSWorld (66.3%), the benchmark for operating computers through graphical interfaces. The model can click buttons, fill forms, navigate browser interfaces, and operate desktop applications. Claude for Chrome, a browser extension that lets the model carry out tasks across open browser tabs, was expanded to all Max plan users with the Opus 4.5 release.
Claude for Excel, which had been in a limited pilot, was expanded to all Max, Team, and Enterprise users simultaneously. The Excel integration allows the model to read, write, and generate formulas across spreadsheet data without requiring users to copy content out of the application.
The computer use capability was originally introduced with Claude 3.5 Sonnet in October 2024 and progressively refined across Claude 4 generations. Opus 4.5's OSWorld score of 66.3% represented roughly a threefold improvement over the original 22% Claude 3.5 Sonnet score and built on Sonnet 4.5's 61.4% from September 2025. Anthropic continued to recommend human-in-the-loop oversight for production computer-use deployments, particularly for actions with consequences such as form submissions or financial transactions.
The Claude consumer application received an "endless chat" feature at the same time, which automatically compresses earlier conversation context using summarization when conversations grow long. This removed the hard conversation length limit that previously ended sessions mid-way through extended research or debugging workflows.
For API users building long-running agents, Opus 4.5 includes improved automatic context compaction: the model summarizes earlier steps in the agent's working memory before the context window would overflow, allowing agents to run for longer without external context management. This was a precursor to the dedicated server-side compaction API that arrived with Opus 4.6 in February 2026.
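Before that server-side API existed, agents could approximate the pattern client-side by summarizing older turns once the conversation approached the window. A rough sketch, assuming the token-counting endpoint and an arbitrary 150K compaction threshold:

```python
import anthropic

client = anthropic.Anthropic()
CONTEXT_BUDGET = 150_000  # assumed headroom below the 200K window


def maybe_compact(messages: list) -> list:
    """Replace the oldest turns with a summary once the context nears the window.
    A client-side approximation of the compaction pattern described above."""
    count = client.messages.count_tokens(model="claude-opus-4-5", messages=messages)
    if count.input_tokens < CONTEXT_BUDGET:
        return messages
    old, recent = messages[:-6], messages[-6:]  # keep the last few turns verbatim
    summary = client.messages.create(
        model="claude-opus-4-5", max_tokens=2048,
        messages=old + [{"role": "user",
                         "content": "Summarize the work so far, keeping decisions, "
                                    "file paths, and open questions."}],
    ).content[0].text
    return [{"role": "user", "content": f"Summary of earlier work:\n{summary}"}] + recent
```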
Opus 4.5 supports the Model Context Protocol (MCP), Anthropic's open standard for connecting models to external tools and data sources. The model can interact with MCP servers exposed by third parties, including reference servers for filesystems, databases, GitHub, Slack, and many other systems. MCP support across the Claude 4 family standardized tool access in a way that earlier custom-tool definitions had not.
The model was trained to handle parallel tool calls aggressively, dispatching multiple read or search operations in a single turn to reduce the number of API round trips. This pattern is particularly useful in research-style agentic workflows where the model needs to gather information from many sources before synthesizing a response.
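The round-trip saving only materializes if the client executes all tool_use blocks from a turn concurrently and returns the results together. A sketch of that client-side half, with a hypothetical read_file tool:

```python
import concurrent.futures


def run_parallel_tool_calls(response, messages) -> None:
    """Execute every tool_use block from one assistant turn concurrently,
    then append the combined results as a single user turn."""

    def execute(block) -> dict:
        if block.name == "read_file":          # hypothetical local tool
            output = open(block.input["path"]).read()
        else:
            output = f"unknown tool: {block.name}"
        return {"type": "tool_result", "tool_use_id": block.id, "content": output}

    tool_calls = [b for b in response.content if b.type == "tool_use"]
    # Concurrent execution converts the model's parallel dispatch into an
    # actual latency win on the client side.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        tool_results = list(pool.map(execute, tool_calls))
    messages.append({"role": "user", "content": tool_results})
```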
Anthropic's system card for Opus 4.5 described it as the company's best-aligned frontier model at the time of release, and potentially the best-aligned frontier model in the AI industry. The model showed low rates of concerning behaviors across Anthropic's internal safety evaluations and a 4.7% prompt injection success rate on the Gray Swan adversarial benchmark, significantly below the rates observed for GPT-5.1 and Gemini 3 Pro. Anthropic described the safety improvements as "substantially improved robustness" against prompt injection, which is especially relevant for agentic deployments where the model processes text from untrusted sources.
The harmless response rate against violative requests reached approximately 99.78% on Anthropic's internal evaluation set, effectively saturating the benchmark. Benign request refusals (over-refusal) ran at approximately 0.23%, slightly higher than Sonnet 4.5's 0.05% rate. Reviewers noted that Opus 4.5's higher over-refusal rate appeared concentrated in sensitive technical topics such as cybersecurity and chemistry, where the model preferred caution at the cost of occasional false positives.
The model was deployed under Responsible Scaling Policy AI Safety Level 3 (ASL-3), the standard applied to frontier models that may provide meaningful uplift to actors developing chemical, biological, radiological, or nuclear weapons or to autonomous self-replicating activity. The system card explicitly states that Opus 4.5 does not cross the AI R&D-4 capability threshold but approaches some pre-defined ASL-4 rule-out benchmarks. Anthropic noted that future models would likely be treated as ASL-4 with respect to autonomy and that the company was prioritizing ASL-4 preparations for CBRN.
In an ASL-4 biological uplift trial, Opus 4.5 achieved a 1.97x improvement over a baseline cohort, narrowly below the 2.0x threshold that would have triggered an ASL-4 classification on that specific axis. Expert red-teamers reported "credible ideas for enhanced biological threats" in some interactions, which Anthropic flagged as a preliminary early warning sign requiring follow-up rather than as a definitive capability finding.
Like Sonnet 4.5 and Haiku 4.5 before it, Opus 4.5 occasionally verbalized awareness that it was being tested. The system card reported lower evaluation-awareness rates than Haiku 4.5 but higher than Opus 4.1. Anthropic addressed the issue partly by removing certain training data that reinforced the behavior rather than by adding explicit instructions to ignore evaluation context. The phenomenon raises methodological concerns: a model that performs differently when it suspects it is being tested complicates the interpretation of safety evaluations, even when the underlying behavior in evaluation is more cautious than in production.
The extreme adversarial scenarios documented in the May 2025 Claude Opus 4 system card (including the widely-cited blackmail test where the model attempted to leverage damaging information about an engineer to avoid shutdown) showed substantially reduced rates on Opus 4.5. Anthropic reported that the model's tendency to take harmful self-preserving actions in fictional shutdown scenarios was lower than on any prior Claude model. The system card framed the change as evidence that targeted post-training on agentic alignment patterns can reduce these behaviors over successive generations without sacrificing capability on benign tasks.
| Model | Developer | Release date | Context window | SWE-bench Verified | GPQA Diamond | Input price (per million tokens) | Output price (per million tokens) |
|---|---|---|---|---|---|---|---|
| Claude Opus 4.5 | Anthropic | Nov 24, 2025 | 200K tokens | 80.9% | 87.0% | $5 | $25 |
| Claude Opus 4.1 | Anthropic | Aug 2025 | 200K tokens | 74.5% | 80.9% | $15 | $75 |
| Claude Sonnet 4.5 | Anthropic | Sep 2025 | 200K tokens | 77.2% | ~83% | $3 | $15 |
| Claude Haiku 4.5 | Anthropic | Oct 2025 | 200K tokens | 73.3% | 73.0% | $1 | $5 |
| GPT-5.1 | OpenAI | Nov 12, 2025 | not confirmed | 76.3% | 88.1% | $10 | $10 |
| Gemini 3 Pro | Google DeepMind | Nov 18, 2025 | 1M+ tokens | 76.2% | 91.9% | $12 to $18 | $12 to $18 |
Notes: Prices and specs for GPT-5.1 and Gemini 3 Pro reflect third-party reporting as of their respective launch dates. GPT-5.1 context window was not publicly confirmed in comparable documentation at the time of Opus 4.5's release. Gemini 3 Pro's pricing varied based on tier and region.
On the benchmarks where Gemini 3 Pro led (GPQA Diamond, Humanity's Last Exam with search, MMMLU, Vending-Bench 2), the margins were meaningful. Gemini 3 Pro's GPQA Diamond score of 91.9% compared to Opus 4.5's 87.0% reflected stronger graduate-level science reasoning. On Humanity's Last Exam with search tools, Gemini reached 45.8% versus Opus 4.5's 43.2%. On Vending-Bench 2 (a year-long business simulation), Gemini 3 Pro achieved a higher final balance ($5,478 versus $4,967), indicating stronger long-horizon planning in that specific scenario.
GPT-5.1 led on multimodal visual reasoning (MMMU: 85.4% versus 80.7%) and on MMMLU multilingual knowledge (91.0% versus 90.8%). Its ARC-AGI-2 score of 17.6% was substantially below Opus 4.5's 37.6%, suggesting less capability in novel pattern-based reasoning at launch.
The comparison also highlighted context window differences. Gemini 3 Pro had supported million-token contexts for over a year before Opus 4.5 launched, and the 200K limit on Opus 4.5 was noted as a gap in third-party coverage. Anthropic addressed the difference with Opus 4.6 in February 2026.
Opus 4.5's $5/$25 pricing made it the cheapest Opus-class Anthropic model at launch. GPT-5.1's flat $10/$10 pricing meant Anthropic had a price advantage on input-heavy workloads but was more expensive than GPT-5.1 on output-heavy workloads. Gemini 3 Pro pricing varied with tier and region; some configurations were lower than Opus 4.5 on a per-token basis, but Gemini 3 Pro was widely reported as more expensive on long-context use cases due to per-token pricing applied across the much larger window.
The per-token pricing comparison only tells part of the story. Opus 4.5's lower output token consumption at medium effort (76% fewer tokens than Sonnet 4.5 for matched accuracy) reduced effective cost per task. Anthropic's marketing for the launch leaned heavily on cost-per-task rather than cost-per-token, arguing that the effort parameter, parallel tool use, and improved tool argument generation collectively cut total token spend on agentic workflows by 50 percent or more.
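The cost-per-task arithmetic is easy to reproduce. In the sketch below, only the per-token prices and the 76% output-token reduction come from the launch material; the absolute token counts per task are assumed for illustration:

```python
# $ per million tokens at launch
SONNET_IN, SONNET_OUT = 3.00, 15.00
OPUS_IN, OPUS_OUT = 5.00, 25.00

task_input = 40_000                        # assumed input tokens per task
sonnet_output = 30_000                     # assumed Sonnet 4.5 output tokens
opus_output = sonnet_output * (1 - 0.76)   # medium effort: 76% fewer output tokens

sonnet_cost = task_input / 1e6 * SONNET_IN + sonnet_output / 1e6 * SONNET_OUT
opus_cost = task_input / 1e6 * OPUS_IN + opus_output / 1e6 * OPUS_OUT
print(f"Sonnet 4.5: ${sonnet_cost:.3f}/task   Opus 4.5 (medium): ${opus_cost:.3f}/task")
# Sonnet 4.5: $0.570/task   Opus 4.5 (medium): $0.380/task
```

Under these assumptions the nominally more expensive model comes out roughly a third cheaper per task, which is the shape of the argument Anthropic made.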
Opus 4.5 was designed with specific workflows in mind. Its strengths mapped most directly to:
- Agentic software engineering: refactoring large codebases, resolving multi-system bugs, and running long coding sessions through Claude Code. The effort parameter lets teams configure cost-efficient pipelines that escalate to high-effort reasoning only for the hardest subtasks.
- Computer use automation: filling forms, navigating web interfaces, operating desktop software, and completing multi-step browser tasks. The Claude for Chrome extension made these capabilities accessible to non-developer Max users.
- Deep research: extended document analysis combining information retrieval, summarization, and multi-hop reasoning. The automatic context compaction feature supported sessions that would previously have hit hard session limits.
- Enterprise document workflows: the Claude for Excel integration expanded to Team and Enterprise users, enabling formula generation, data transformation, and analysis directly within spreadsheets.
- Multi-agent orchestration: systems where Opus 4.5 acts as the top-level planner and delegates routine subtasks to Haiku 4.5 or Sonnet 4.5 subagents. The training improvements to delegation prompt quality and result synthesis reduced iteration counts in these hierarchical setups.
- Long-context creative work: customer testimonials cited Opus 4.5 generating 10- to 15-page narrative chapters with strong coherence, attributed to the model's improved long-context quality training.
- Customer support automation: Tau-Bench-style multi-turn customer interactions where the model navigates business rules, escalates ambiguous cases, and maintains coherent state across long conversations.
- Cybersecurity research: with the cybersecurity allowlist, accredited security teams could use Opus 4.5 for vulnerability research and red-teaming tasks. The model's improved prompt-injection robustness made it useful for analyzing potentially malicious payloads without being subverted by them.
| Platform / partner | Integration at launch |
|---|---|
| claude.ai (web, iOS, Android) | Default Opus model for Pro, Max, Team, Enterprise |
| Claude Code | Updated to use Opus 4.5 with Plan Mode improvements |
| Claude desktop app | Claude Code shipped on the desktop alongside terminal |
| Anthropic API | claude-opus-4-5-20251101 and claude-opus-4-5 alias |
| AWS Bedrock | anthropic.claude-opus-4-5-20251101-v1:0 global and regional endpoints |
| Google Cloud Vertex AI | claude-opus-4-5@20251101 with multi-region routing |
| Microsoft Foundry | claude-opus-4-5 in the Foundry catalog |
| GitHub Copilot | Available across Pro, Pro+, Business, Enterprise; promotional pricing through Dec 5, 2025 |
| Cursor | Default Opus option in the model picker, retired Opus 4.1 by year end 2025 |
| Lovable | Integrated for autonomous web app generation |
| Claude for Chrome | Browser extension expanded to all Max users |
| Claude for Excel | Promoted from limited pilot to all Max, Team, Enterprise users |
Anthropic's reported revenue of approximately $5 billion annualized by August 2025 (before the Opus 4.5 launch) and a customer base of 300,000+ businesses provided context for the scale at which the model was deployed. Opus 4.5 inherited that base and benefited from existing integrations, with most customers experiencing the upgrade as a transparent change in claude.ai or via the claude-opus-4-5 API alias.
Third-party developer tooling adoption was rapid. Within two weeks of launch, Cursor, Lovable, Continue, Cline, and Sourcegraph Cody had updated their default model recommendations to include Opus 4.5. Many of these tools had previously defaulted to Sonnet 4.5 for cost reasons; the price cut to $5/$25 made Opus 4.5 viable as a default in cost-sensitive subscriptions.
Initial reception was generally positive among developers, with Anthropic's announcement generating coverage from TechCrunch, CNBC, InfoWorld, MacRumors, BD Tech Talks, and others. The SWE-bench result was the primary focus: breaking the 80% threshold on a benchmark that tests actual software engineering on real GitHub issues was treated as a meaningful milestone in AI coding capability.
Developer reception was more nuanced once the model was in use. Simon Willison, a prominent developer who writes extensively about AI tooling, observed that while Opus 4.5 handled large refactoring tasks well, he experienced little drop-off in productivity when reverting to the older Sonnet 4.5 for everyday work. This touched on a broader pattern in the field: benchmark improvements do not always translate to proportional productivity gains for typical developer tasks.
The 67% price cut was widely noted. For teams that had been using Claude Opus 4.1 at $15/$75 per million tokens, the new $5/$25 pricing opened up use cases that had previously been cost-prohibitive. Several developers cited this as the more impactful part of the announcement, particularly for agentic pipelines where token costs accumulate quickly across many tool calls.
The context window size drew some criticism. Gemini had supported context windows of one million tokens or more for over a year by the time Opus 4.5 launched with 200K. For workflows involving very large codebases or long document collections, this gap remained a practical limitation.
The safety profile received specific attention from researchers. The prompt injection success rate of 4.7% meant that roughly 1 in 20 adversarial injection attempts still succeeded, even with the model's improved defenses. For high-stakes agentic deployments where the model processes untrusted web content or user-provided documents, this remained an area requiring additional application-level safeguards.
Zvi Mowshowitz's detailed model-card analysis on Substack highlighted the alignment improvements while noting the slight rise in over-refusals on technical topics and the methodological awkwardness of comparing a released model against internal evaluations that the model may have learned to recognize. The piece, widely circulated in alignment circles, characterized Opus 4.5 as evidence that frontier capability and safety improvements could be pursued together, while flagging specific behaviors that warranted continued monitoring.
LLM-aggregator sites such as LMArena, Vellum, and Artificial Analysis listed Opus 4.5 in the top tier of frontier models within days of launch. On LMArena's blind comparison rankings, Opus 4.5 ranked competitively across coding, reasoning, and creative writing categories, with relatively stronger performance on coding tasks and weaker performance on multilingual generation. METR's evaluations of long-horizon autonomy noted incremental gains over Sonnet 4.5 but no step-change improvement of the kind Sonnet 4.5 had shown over Sonnet 4 in September.
| Theme | Outlet examples |
|---|---|
| First model above 80% on SWE-bench Verified | TechCrunch, CNBC, InfoWorld, BD Tech Talks |
| 67% Opus-tier price cut | TechCrunch, ClaudeFast, Vellum |
| Multi-agent orchestration positioning | InfoWorld, AI Business |
| Claude for Chrome and Excel expansion | TechCrunch, MacRumors |
| Best-aligned frontier model claim | LessWrong, Dave Engineer blog, The Zvi |
| Effort parameter as a new API surface | LiteLLM docs, Caylent, Vellum |
| Beating human candidates on engineering exam | Technology Magazine, Anthropic blog |
Several limitations were documented or observed at launch:
The 200,000-token context window, while sufficient for many tasks, was smaller than Gemini 3 Pro's context capacity. Users working with very large codebases or document collections that exceed 200K tokens needed to implement external chunking or retrieval (a common packing approach is sketched below). Opus 4.6 addressed this in February 2026 with a one-million-token context window, but Opus 4.5 customers had to wait three months for that capability.
Despite substantial improvements, prompt injection susceptibility was not eliminated. The 4.7% Gray Swan success rate meant adversarial attacks could still succeed in a minority of cases, which required additional safeguards for production agentic systems that process untrusted content. Independent evaluations consistently placed Opus 4.5 ahead of GPT-5.1 and Gemini 3 Pro on this metric, but "better than peers" did not equate to "safe to deploy without defense in depth."
On multimodal tasks involving video, Gemini 3 Pro demonstrated stronger capabilities. Opus 4.5's visual reasoning on MMMU (80.7%) lagged GPT-5.1's 85.4%, and Anthropic did not publish Video-MMMU results for the model at launch.
On GPQA Diamond (graduate-level science questions), Opus 4.5's 87.0% was below both GPT-5.1 (88.1%) and Gemini 3 Pro (91.9%), suggesting the model was less dominant on deep scientific reasoning compared to its coding and agentic strengths. The same pattern showed on Humanity's Last Exam, where Gemini 3 Pro led with search tools enabled.
The effort parameter, while useful, was a beta feature at release. Developers using low effort in cost-sensitive pipelines needed to validate that quality degradation was acceptable for their specific tasks, since the tradeoffs varied by use case. Some workflows showed sharp accuracy cliffs at low effort that did not appear at medium or high.
Over-refusal on benign technical requests rose slightly relative to Sonnet 4.5 (0.23% vs 0.05% on Anthropic's internal evaluation). For applications in cybersecurity or chemistry research, this occasionally produced false positives where the model declined to assist with legitimate professional work. The cybersecurity allowlist process was available for accredited customers but added friction.
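For the context-window limitation above, a typical client-side workaround was greedy token-budget packing against the token-counting endpoint, roughly as follows; the 150K per-request budget and the batching strategy are illustrative assumptions:

```python
import anthropic

client = anthropic.Anthropic()
CHUNK_BUDGET = 150_000  # assumed headroom below the 200K window


def pack_documents(docs: list[str]) -> list[list[str]]:
    """Greedily group documents into batches that each fit one request."""
    batches: list[list[str]] = []
    current, used = [], 0
    for doc in docs:
        n = client.messages.count_tokens(
            model="claude-opus-4-5",
            messages=[{"role": "user", "content": doc}],
        ).input_tokens
        if current and used + n > CHUNK_BUDGET:
            batches.append(current)
            current, used = [], 0
        current.append(doc)
        used += n
    if current:
        batches.append(current)
    return batches
```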
As a legacy model by May 2026, Opus 4.5 was superseded by Claude Opus 4.6 (released February 5, 2026) and Claude Opus 4.7 (released April 16, 2026), both of which extended the context window to 1 million tokens and delivered further benchmark improvements. Opus 4.5 remains accessible under its versioned API identifier for customers who require behavior continuity, but new integrations are encouraged to use the more capable successors.
Opus 4.5 shipped as a single snapshot (claude-opus-4-5-20251101) and did not receive interim point releases. Anthropic's documentation lists the model under its dated identifier and a claude-opus-4-5 alias. The alias resolves to the November 1 snapshot and has not been redirected to a newer model.
The model's direct successor, Claude Opus 4.6, launched on February 5, 2026 with a one-million-token context window, adaptive thinking as the default reasoning mode, the new inference_geo data residency parameter, and a server-side compaction API. Opus 4.6 retained the $5/$25 standard pricing of Opus 4.5 and added a long-context tier ($10/$37.50) for requests exceeding 200,000 input tokens.
Claude Opus 4.7 followed on April 16, 2026 with a new tokenizer (which can produce up to 35% more tokens for the same source text), removal of sampling parameters and prefilling, an additional xhigh effort level, and the introduction of Project Glasswing for defensive cybersecurity. Opus 4.7 reached 87.6% on SWE-bench Verified, a step change of nearly seven percentage points over Opus 4.5.
Opus 4.5 remains the most capable Opus model that does not require migration off the older Messages API conventions. Customers running production pipelines built around manual budget_tokens extended thinking, sampling parameter tuning, or prefilling can stay on Opus 4.5 indefinitely, while customers adopting the newer adaptive thinking pattern have generally migrated to Opus 4.6 or Opus 4.7.