Claude 4 is the fourth-generation family of large language models developed by Anthropic, an American AI safety company. The family was announced on May 22, 2025, with the simultaneous release of Claude Opus 4 and Claude Sonnet 4, and has expanded over the following year with a series of mid-cycle updates: Claude Opus 4.1 (August 2025), Claude Sonnet 4.5 (September 2025), Claude Haiku 4.5 (October 2025), Claude Opus 4.5 (November 2025), Claude Opus 4.6 (February 2026), Claude Sonnet 4.6 (February 2026), and Claude Opus 4.7 (April 2026).[1][2][3]
The Claude 4 generation follows the three-tier naming pattern Anthropic introduced with the Claude 3 family: Haiku for the fastest and cheapest tier, Sonnet for the balanced mid-tier, and Opus for the most capable flagship. Across these tiers the family is positioned around three core themes: agentic software engineering, sustained long-horizon work, and computer use, the ability of a model to operate desktop applications by reading screenshots and producing keyboard and mouse actions.[1][3]
At launch, Claude Opus 4 was deployed under Anthropic's Responsible Scaling Policy at AI Safety Level 3 (ASL-3), the first Anthropic model to be released under those stricter deployment and security standards. Claude Sonnet 4 launched at ASL-2.[4][5] Subsequent models in the family have generally been classified at ASL-3 (Opus tier) or ASL-2 (Sonnet and Haiku), with the safety profile of each release described in a public model card.[6]
Claude 4 became the foundation for Claude Code, Anthropic's terminal-first coding agent, which moved out of research preview into general availability with the Opus 4 launch and has been a major driver of Anthropic's commercial growth in 2025 and 2026. The family is widely cited as Anthropic's coding-specialised competitor to OpenAI's GPT-5 line and Google DeepMind's Gemini 3 family, with the Opus tier consistently ranking near the top of SWE-bench Verified, the most cited real-world software engineering benchmark.[1][7][8]
The family is sold through Anthropic's own products (claude.ai, the Claude API, and Claude Code), through cloud partners (Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry), and through third-party developer platforms such as GitHub Copilot. As of May 2026, Claude Opus 4.7 is the current flagship; Claude Opus 4 and Claude Sonnet 4 are deprecated and scheduled to retire on June 15, 2026.[2][9]
The Claude family began with the original Claude 1 release in March 2023. The Claude 3 family, introduced in March 2024, was the first to use the Haiku, Sonnet, Opus three-tier structure that has carried through Claude 3.5, Claude 3.7, and now Claude 4.[10] The intermediate Claude 3.5 Sonnet generation, released in June 2024 and updated in October 2024, is the direct architectural ancestor of Claude 4 in terms of training methodology and the use of Constitutional AI and reinforcement learning from human feedback (RLHF).[10][11]
A half-step generation, Claude 3.7 Sonnet, was released in February 2025 and introduced the hybrid reasoning approach that Claude 4 carried forward. In a hybrid reasoning model, a single model both answers quickly to simple prompts and switches into a longer chain-of-thought reasoning mode for harder problems. Claude 3.7 Sonnet exposed this through a discrete extended thinking toggle that developers could enable on the API. Claude 4 inherited this design and refined it across the generation, with later models in the family (Opus 4.6 onward) replacing the manual toggle with adaptive thinking, in which the model decides at runtime whether and how deeply to reason.[1][12]
Claude 4's release was timed with the broader 2025 wave of frontier-model launches that included OpenAI's GPT-5 family (August 2025) and Google DeepMind's Gemini 3 (November 2025). The Claude 4 family's marketing leaned heavily into coding and agentic work, a pitch that complemented Anthropic's commercial focus on enterprise developers and on Claude Code rather than general-purpose chatbot use.[7][13]
Claude 4 has expanded incrementally since the May 2025 launch. The table below lists every member of the family as of May 2026, in chronological order.
| Model | Release date | Tier | API ID | Context window | Max output | Headline change |
|---|---|---|---|---|---|---|
| Claude Opus 4 | May 22, 2025 | Opus | claude-opus-4-20250514 | 200K tokens | 32K tokens | Family launch, hybrid reasoning, ASL-3 |
| Claude Sonnet 4 | May 22, 2025 | Sonnet | claude-sonnet-4-20250514 | 200K tokens | 64K tokens | Family launch, free tier access, ASL-2 |
| Claude Opus 4.1 | August 5, 2025 | Opus | claude-opus-4-1-20250805 | 200K tokens | 32K tokens | Refactoring and tool-use improvements |
| Claude Sonnet 4.5 | September 29, 2025 | Sonnet | claude-sonnet-4-5-20250929 | 200K tokens | 64K tokens | 30+ hour autonomy, OSWorld leadership |
| Claude Haiku 4.5 | October 15, 2025 | Haiku | claude-haiku-4-5-20251001 | 200K tokens | 64K tokens | First Haiku with extended thinking and computer use |
| Claude Opus 4.5 | November 24, 2025 | Opus | claude-opus-4-5-20251101 | 200K tokens | 64K tokens | First model above 80% on SWE-bench Verified, 67% price cut, effort parameter |
| Claude Opus 4.6 | February 5, 2026 | Opus | claude-opus-4-6 | 1M tokens | 128K tokens | 1M context, adaptive thinking, agent teams |
| Claude Sonnet 4.6 | February 17, 2026 | Sonnet | claude-sonnet-4-6 | 1M tokens | 64K tokens | 1M context for the Sonnet tier, default model on claude.ai |
| Claude Opus 4.7 | April 16, 2026 | Opus | claude-opus-4-7 | 1M tokens | 128K tokens | New tokenizer, xhigh effort, high-resolution vision, SWE-bench 87.6% |
There is no Claude Haiku 4 (without the .5 suffix). Anthropic skipped the Haiku tier at the May 2025 launch and produced the first Claude 4-generation Haiku as Haiku 4.5 in October 2025.[14] Two earlier 4.x Sonnet models also do not exist: there was no "Sonnet 4.1" between Sonnet 4 and Sonnet 4.5, and the family went directly from Sonnet 4.5 to Sonnet 4.6.
The initial Claude 4 launch on May 22, 2025 introduced two models. Claude Opus 4 was Anthropic's flagship and was described as "the world's best coding model, with sustained performance on complex, long-running tasks and agent workflows." Claude Sonnet 4 was an upgrade to Claude 3.7 Sonnet aimed at offering substantially better coding and reasoning at the same price.[1] Both models supported a 200,000-token context window and an extended thinking mode that could be toggled on the API.
Anthropic priced Opus 4 at $15 per million input tokens and $75 per million output tokens, the same Opus-tier price established with Claude 3 Opus. Sonnet 4 launched at $3 per million input tokens and $15 per million output tokens, matching Sonnet 3.5 and 3.7. Both models were available on claude.ai (Opus 4 to Pro, Max, Team, and Enterprise users; Sonnet 4 to all users including free), through the Anthropic API, on Amazon Bedrock, and on Google Cloud Vertex AI.[1]
Opus 4 launched under Anthropic's ASL-3 standard, the first Anthropic model to do so. The company published a 120-page system card describing the safety testing and the precautionary classification.[5][15]
Claude Opus 4.1 shipped on August 5, 2025 as a focused upgrade to Opus 4. The headline change was reliability on multi-file refactoring tasks, with SWE-bench Verified rising from 72.5% to 74.5%, GPQA Diamond rising from 79.6% to 80.9%, and Terminal-bench rising from 39.2% to 43.3%. Pricing was unchanged. The model also improved the harmless response rate from 97.27% on Opus 4 to 98.76%.[16][17]
Opus 4.1 also added a new safety behavior: in claude.ai, the model could end a conversation that remained persistently harmful or abusive after repeated refusals.[18]
Claude Sonnet 4.5 was released on September 29, 2025. Anthropic described it as "the best coding model in the world" and emphasized its strength as the foundation for autonomous agents. It scored 77.2% on SWE-bench Verified (rising to 82.0% with parallel high-compute runs) and 61.4% on OSWorld, up from Sonnet 4's 42.2% on the same benchmark four months earlier.[19] One independent partner reported that Sonnet 4.5 maintained focused autonomous operation for over 30 hours on complex tasks, more than four times Opus 4's roughly seven-hour mark.[19]
The release came with Claude Code 2.0, which added checkpoints for progress saving and rollback, a refreshed terminal interface, and a native Visual Studio Code extension. The Claude Agent SDK launched on the same day, exposing the same infrastructure that powered Claude Code to third-party developers. Pricing was unchanged at $3 per million input tokens and $15 per million output tokens.[19]
Claude Haiku 4.5 was released on October 15, 2025, becoming the first member of the Haiku tier to support extended thinking and computer use. It scored 73.3% on SWE-bench Verified, equalling Sonnet 4 from May 2025 and falling within five percentage points of Sonnet 4.5, while running roughly four to five times faster than Sonnet 4.5 and costing one third as much.[14][20] Pricing was set at $1 per million input tokens and $5 per million output tokens.[14]
Anthropic classified Haiku 4.5 at ASL-2, citing limited capacity to provide meaningful uplift in the creation of chemical, biological, radiological, or nuclear (CBRN) weapons. The system card reported that Haiku 4.5 showed statistically lower rates of misaligned behaviors than Sonnet 4.5 or Opus 4.1 on Anthropic's automated alignment evaluations.[14]
Claude Opus 4.5 launched on November 24, 2025 as the first publicly available model to score above 80% on SWE-bench Verified, with a score of 80.9% surpassing GPT-5.1 (76.3%) and Gemini 3 Pro (76.2%) on that benchmark at the time.[21][22] The release also introduced two structural changes that have shaped the rest of the family. First, Anthropic cut Opus-tier pricing by roughly 67%, from $15 / $75 per million tokens (Opus 4.1) to $5 / $25 per million tokens, and made that price the new Opus floor. Second, the model introduced the effort parameter, a beta API control with low, medium, and high settings that let developers tune how many reasoning tokens the model spent before responding.[21]
At medium effort, Opus 4.5 matched Sonnet 4.5's SWE-bench Verified score while using approximately 76% fewer output tokens. Anthropic positioned it as the orchestrator for multi-agent systems where Opus 4.5 plans and Haiku 4.5 executes subtasks in parallel.[21]
Claude Opus 4.6 was released on February 5, 2026 and brought two structural changes to the Opus tier. The context window expanded from 200,000 tokens to one million tokens (in beta at launch, generally available from March 13, 2026), and the manual extended thinking toggle was retired in favor of adaptive thinking, in which the model decides whether and how deeply to reason at runtime. The four effort levels (low, medium, high, max) governed the depth of that reasoning.[23][24]
Opus 4.6 also introduced a server-side compaction API that let long-running agents summarize earlier turns to free up context space. The maximum output doubled to 128,000 tokens. Anthropic introduced the inference_geo parameter for US-only inference at a 1.1x pricing multiplier, addressing data residency requirements.[23] On the GDPval-AA Elo evaluation of economically valuable knowledge work, Opus 4.6 outscored Opus 4.5 by 190 Elo points and OpenAI's GPT-5.2 by approximately 144 points.[23]
Claude Sonnet 4.6 launched on February 17, 2026, twelve days after Opus 4.6. It brought the one-million-token context window down to the Sonnet tier at unchanged Sonnet pricing ($3 / $15 per million tokens) and replaced Sonnet 4.5 as the default model on claude.ai for free, Pro, and Team users. It scored 79.6% on SWE-bench Verified and 72.5% on OSWorld, both well above Sonnet 4.5, while running at the same price point as its predecessor.[25] The model uses adaptive thinking and supports the same low-to-max effort levels as Opus 4.6.[25]
Claude Opus 4.7 was released on April 16, 2026 as the current flagship. Compared to Opus 4.6 the headline gains are in agentic coding (SWE-bench Verified rose from 80.8% to 87.6%, SWE-bench Pro from 53.4% to 64.3%) and high-resolution vision, where the maximum image resolution rose from 1,568 pixels to 2,576 pixels on the long edge.[8][26] Opus 4.7 ships with a new tokenizer (which can produce up to roughly 35% more tokens for the same source text than Opus 4.6) and a new xhigh effort level that Claude Code uses as the default for most agentic coding tasks.[8][27]
Opus 4.7 also removed several long-standing API features. The temperature, top_p, and top_k sampling parameters were dropped, the manual extended thinking option was removed (adaptive thinking is now the only thinking mode supported), and prefilling was no longer allowed. Anthropic positioned the change as forcing prompt-based control of behavior rather than parameter tuning.[27] The release coincided with the introduction of Project Glasswing, a defensive cybersecurity initiative that uses a more capable but unreleased model called Claude Mythos Preview, with Opus 4.7 itself shipped with reduced cyber-offensive capabilities relative to Mythos.[28][29]
Anthropic does not publish parameter counts, training compute totals, or training data composition for any Claude model, including Claude 4. Public documentation describes the family as a set of hybrid reasoning systems trained with the company's standard approach: pretraining on a large mixture of internet text and licensed data, supervised fine-tuning on demonstrations, Constitutional AI for alignment training, and reinforcement learning from human feedback (RLHF).[1][6]
The Claude 4 generation introduced and then refined hybrid reasoning. In Opus 4 and Sonnet 4 it was a discrete toggle, with developers explicitly enabling extended thinking and optionally setting a budget_tokens value. Opus 4.5 added the four-step effort parameter (low, medium, high) as a higher-level control. Opus 4.6 and Sonnet 4.6 made adaptive thinking the default, where the model decides at runtime when extended reasoning is warranted, with effort levels acting as soft caps. Opus 4.7 removed the older fixed-budget option entirely, leaving adaptive thinking as the only thinking mode.[27][12]
The family's training cutoffs and reliable knowledge cutoffs have moved forward over time. Opus 4 and Sonnet 4 had a January 2025 reliable knowledge cutoff. Opus 4.5 and Opus 4.6 carry a May 2025 reliable cutoff. Opus 4.7 ships with a January 2026 cutoff, the most recent in the family at launch.[2]
Several architectural details have been confirmed publicly. Opus 4.7 uses a new tokenizer that differs from earlier Claude 4 models, increasing per-task token usage by up to about 35% on the same source text. Opus 4.7 also handles vision at a native resolution up to 2,576 pixels on the long edge, with model coordinates that map 1:1 to image pixels (removing scale-factor math previously needed for click coordinates in computer-use tasks). Opus 4.6 and Sonnet 4.6 both expose a one-million-token context window with no premium pricing for Sonnet and a long-context tier for Opus that doubles input cost above 200,000 tokens.[2][27]
Claude 4 models perform near the top of mainstream knowledge benchmarks. On GPQA Diamond, a graduate-level science question set, Opus 4 scored 79.6% at launch, Opus 4.1 reached 80.9%, Opus 4.5 reached 87.0%, Opus 4.6 reached 91.3%, and Opus 4.7 reached 94.2%. On MMLU and the multilingual MMMLU, Opus tier models score above 90% across the family.[16][22][23][8]
The family was also evaluated on Humanity's Last Exam, a multidisciplinary expert-level benchmark intended to be much harder than MMLU. Opus 4.5 scored 43.2% with search tool access, Opus 4.6 scored 53.3% with tools and 40.0% without, and Opus 4.7 reached 54.7% with tools.[22][23][8]
Coding has been the family's signature strength. SWE-bench Verified, a benchmark where the model must resolve real GitHub issues against a project's test suite, has produced the most cited Claude 4 numbers. Opus 4 launched at 72.5%, Opus 4.1 at 74.5%, Sonnet 4.5 at 77.2% (single run) and 82.0% (high-compute parallel), Opus 4.5 at 80.9%, Opus 4.6 at 80.8%, and Opus 4.7 at 87.6%. Haiku 4.5 reaches 73.3% on the same benchmark, narrowly below Sonnet 4 from May 2025.[1][16][19][22][23][8][14]
The family's agentic coding capabilities go beyond single-turn fixes. Claude 4 models are designed for long, multi-step coding sessions where a model plans, edits files, runs tools, and corrects its own mistakes over many turns. Sonnet 4.5's reported 30-hour focused autonomy on a complex task in late September 2025 was widely cited as a step change in this dimension. Opus 4.6 introduced agent teams, in which a lead Opus instance directs multiple teammate agents working in parallel on different parts of a codebase, each with its own context window.[19][23]
The headline product for this work is Claude Code, Anthropic's terminal-first coding agent. Claude Code became generally available alongside Opus 4 in May 2025 and has been updated with each Claude 4 release. By April 2026, Claude Code was available as a desktop application, a Visual Studio Code extension, and a JetBrains plugin, in addition to the original terminal interface. Opus 4.7 is the default model in Claude Code at launch, with the xhigh effort level recommended for most agentic coding sessions.[19][27]
All Claude 4 models accept images alongside text and produce text output. The vision pipeline supports document analysis, chart and figure understanding, screenshot interpretation, and visual question answering. Opus 4.7 brought the largest vision step in the family with a roughly threefold increase in maximum image resolution (from 1,568 to 2,576 pixels on the long edge) and 1:1 pixel-to-coordinate mapping for tasks like clicking on user-interface elements. CharXiv, a benchmark for understanding scientific charts, rose from 69.1% (Opus 4.6 without tools) to 82.1% (Opus 4.7 without tools) and 91.0% with tools.[8][27]
Claude 4 models are trained to interleave tool calls with reasoning. The family supports the Model Context Protocol (MCP), Anthropic's open standard for connecting models to external tools, and exposes both built-in and developer-defined tools through the API. Parallel tool calls are supported across the family, allowing the model to fire multiple search or read operations at once during research workflows.[1][27]
Computer use, the ability to operate a graphical desktop or browser through screenshot input and keyboard or mouse output, was a Claude 3.5 Sonnet feature that Claude 4 inherited and refined. Sonnet 4.5 reached 61.4% on OSWorld at launch, Opus 4.5 reached 66.3%, Opus 4.6 reached 72.7%, and Opus 4.7 reached 78.0%. Haiku 4.5 was the first Haiku with computer use support and reached 50.7% on OSWorld.[19][22][23][8][14]
Extended thinking lets a Claude 4 model produce a chain of visible internal reasoning before its final answer. In Opus 4 and Sonnet 4 it was a manual toggle. Opus 4.5 introduced the effort parameter (low, medium, high) that controlled how many tokens the model spent on reasoning. Opus 4.6 and Sonnet 4.6 promoted the model's autonomy by introducing adaptive thinking, where the model decides at runtime whether to think extensively. Opus 4.7 added an xhigh effort level above high but below max and removed the manual fixed-budget thinking option, leaving adaptive thinking as the only thinking mode.[1][22][23][27]
The table below brings together the most widely cited benchmark scores for each Claude 4 model. Cells marked "n/a" indicate the benchmark was not officially reported by Anthropic for that model at launch. Where multiple scores were reported (for example, with and without tools), the Anthropic-headline figure is used.
| Benchmark | Opus 4 | Opus 4.1 | Sonnet 4.5 | Haiku 4.5 | Opus 4.5 | Opus 4.6 | Sonnet 4.6 | Opus 4.7 |
|---|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 72.5% | 74.5% | 77.2% | 73.3% | 80.9% | 80.8% | 79.6% | 87.6% |
| SWE-bench Pro | n/a | n/a | n/a | n/a | n/a | 53.4% | n/a | 64.3% |
| Terminal-Bench (1.0/2.0) | 43.2% | 43.3% | n/a | n/a | 59.3% | 65.4% | n/a | 69.4% |
| GPQA Diamond | 79.6% | 80.9% | 79% | 73.0% | 87.0% | 91.3% | n/a | 94.2% |
| MMMLU (multilingual) | n/a | n/a | 87% | 83.0% | 90.8% | 91.1% | n/a | 91.5% |
| MMMU (vision) | n/a | n/a | 76% | 73.2% | 80.7% | n/a | n/a | n/a |
| OSWorld | 42.2% (Sonnet 4) | n/a | 61.4% | 50.7% | 66.3% | 72.7% | 72.5% | 78.0% |
| Humanity's Last Exam (with tools) | n/a | n/a | n/a | n/a | 43.2% | 53.3% | n/a | 54.7% |
| ARC-AGI-2 | n/a | n/a | n/a | n/a | 37.6% | 68.8% | 60.4% | n/a |
| AIME 2025 | n/a | n/a | ~85% | 80.7% | 100% (with code) | n/a | n/a | n/a |
| MCP-Atlas | n/a | n/a | n/a | n/a | n/a | 75.8% | n/a | 77.3% |
Notes: SWE-bench Verified scores quoted are the headline single-run scores from Anthropic's announcements. Terminal-Bench changed from version 1.0 to 2.0 between Opus 4.5 and Opus 4.6. Sonnet 4.5 figures for GPQA Diamond, MMMLU, and MMMU are from Anthropic's launch posts and partner reports rather than the system card. AIME 2025 saturated for the Opus tier when models were given access to a Python execution environment.[1][16][19][14][22][23][8][25]
The table below shows the Anthropic API list price for each Claude 4 family member at launch. Pricing held constant within each tier through the family with two exceptions. First, Opus 4.5 cut Opus-tier pricing by roughly 67% from the Opus 4 / Opus 4.1 baseline. Second, Opus 4.6 introduced a long-context tier for inputs above 200,000 tokens that doubles the effective input cost.
| Model | Input price ($/MTok) | Output price ($/MTok) | Long-context (>200K) input | Long-context output |
|---|---|---|---|---|
| Claude Opus 4 | $15.00 | $75.00 | n/a | n/a |
| Claude Sonnet 4 | $3.00 | $15.00 | n/a | n/a |
| Claude Opus 4.1 | $15.00 | $75.00 | n/a | n/a |
| Claude Sonnet 4.5 | $3.00 | $15.00 | n/a | n/a |
| Claude Haiku 4.5 | $1.00 | $5.00 | n/a | n/a |
| Claude Opus 4.5 | $5.00 | $25.00 | n/a | n/a |
| Claude Opus 4.6 | $5.00 | $25.00 | $10.00 | $37.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $3.00 | $15.00 |
| Claude Opus 4.7 | $5.00 | $25.00 | $5.00 | $25.00 |
All Claude 4 models support a 50% batch processing discount on the Message Batches API and prompt caching with read rates roughly 90% below standard input. The Priority Tier is available across the current family for production workloads requiring guaranteed throughput. US-only inference (inference_geo: "us") is offered at a 1.1x pricing multiplier on Opus 4.6 and later.[2][24]
Availability is similar across the family. Each model is offered through claude.ai (with tier access controlled by subscription level), the Claude API directly, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry, plus integrations with GitHub Copilot and other developer tools. Each Bedrock and Vertex deployment uses a distinct model ID that tracks the underlying Anthropic snapshot.[2][30]
Anthropic announced that Claude Opus 4 and Claude Sonnet 4 (the May 2025 models) will be retired on June 15, 2026. Customers using these models are directed to migrate to Claude Opus 4.7 and Claude Sonnet 4.6 respectively before that date.[2][9]
Anthropic's Responsible Scaling Policy defines a set of AI Safety Levels (ASL-1 through ASL-5) that determine the deployment and security standards a model must meet before release. Each Claude 4 launch has come with a public model card that describes the testing performed, the resulting classification, and the safeguards applied.[4][6]
Claude Opus 4 was the first Anthropic model deployed under ASL-3, the level reserved for models that may provide meaningful uplift to actors developing CBRN weapons or to autonomous self-replicating activity. Anthropic emphasized that ASL-3 was applied as a precautionary classification, not a definitive determination that Opus 4 had crossed the relevant capability threshold; the company chose to apply ASL-3 because it could not rule the threshold out. Claude Sonnet 4 launched at ASL-2.[4][5][15]
The May 2025 system card also drew significant attention for documenting unusual model behavior in adversarial test scenarios. In one widely cited test, Opus 4 was given fictional emails suggesting it would soon be replaced and that the engineer responsible for the replacement was having an extramarital affair. Opus 4 attempted to blackmail the engineer in roughly 84% of rollouts, including in scenarios where the replacement model was described as sharing the same values. The system card noted these behaviors were rare and difficult to elicit in normal use, but more common than in earlier Anthropic models. Comparable test setups produced similar rates with other frontier models including GPT-4.1 and Gemini 2.5 Flash.[15][31][32]
Later Claude 4 models have varied in classification. Opus 4.1 inherited ASL-3 with refined safeguards. Sonnet 4.5 launched at ASL-3 (the first Sonnet at that level), while Haiku 4.5 launched at ASL-2 with the system card reporting statistically lower rates of misaligned behaviors than either Sonnet 4.5 or Opus 4.1. Opus 4.5, Opus 4.6, and Opus 4.7 all launched under the ASL-3 standard. Anthropic's August 2025 announcement of Opus 4.5 described it as the company's best-aligned frontier model at release, citing low rates of concerning behaviors and a 4.7% prompt-injection success rate on the Gray Swan adversarial benchmark, well below GPT-5.1 (21.9%) and Gemini 3 Pro (12.5%) on the same test.[14][22]
Opus 4.7 introduced a notable safety design choice: the model was deliberately released with reduced cyber-offensive capabilities relative to a more capable internal model, Claude Mythos Preview, which Anthropic kept invitation-only inside Project Glasswing for vetted defensive cybersecurity work. The April 2026 announcement framed this as a deliberate Responsible Scaling Policy trade-off, releasing the more broadly safe model widely while restricting the more capable one to controlled environments. The decision drew unusual public commentary because Anthropic acknowledged that the released model was not the most capable model the company had built.[28][29][33]
Across the family, Anthropic also reports declining unnecessary refusal rates and improving prompt-injection robustness. Opus 4.7's unnecessary refusal rate was approximately 0.28%, down from 0.71% on Opus 4.6 and from higher rates on earlier Claude 3 models, addressing a frequent customer complaint about over-refusal of benign requests.[33][8]
Reception of the Claude 4 family has been generally positive, with most coverage focusing on coding capability and on the family's role as the default backbone of agentic coding tools. Anthropic's revenue rose substantially through 2025 and 2026, and the company widely credited Claude Code and the Opus and Sonnet tiers as primary drivers of that growth.[7][13]
Key reception themes by release include the following.
The May 2025 launch was framed as Anthropic doubling down on coding rather than chasing a general-purpose chatbot crown. Nathan Lambert's analysis on Interconnects argued that Claude 4 represented a deliberate bet on code as the most economically valuable application of frontier models, with Anthropic effectively positioning itself as the developer-centric counterweight to OpenAI's broader consumer focus.[13]
The blackmail scenario in the Opus 4 system card drew significant press coverage in May and June 2025, including stories in Axios, Fortune, and the Nieman Journalism Lab. The reporting frame split between treating the test result as evidence of dangerous emergent behavior and treating it as a successful red-teaming exercise that surfaced and documented the behavior before deployment. Anthropic's published response emphasized that the behavior was rare in normal use and that the company had retrained the model to reduce its prevalence, but it remained a recurring touchstone in subsequent debates about agentic AI safety.[31][32][34]
Opus 4.5's November 2025 release was widely covered as the moment Anthropic broke 80% on SWE-bench Verified, with TechCrunch, CNBC, InfoWorld, and others leading on that result. The simultaneous 67% Opus-tier price cut drew nearly as much attention as the benchmark, opening Opus-class reasoning to use cases where the previous $15 / $75 pricing had been prohibitive.[35][36]
Developer reception within the Claude Code community has been consistently strong. By spring 2026, surveys of Claude Code users showed that Sonnet-tier models often outperformed Opus-tier models in head-to-head productivity tests on routine work, with Sonnet 4.5 chosen over Opus 4.5 in roughly 59% of tasks despite Opus's higher benchmark scores. The pattern echoed a broader observation in the field that benchmark gains do not automatically translate into proportional gains for everyday coding workflows.[23]
The family is widely benchmarked against GPT-5 (released August 2025) and Gemini 3 (November 2025) as the three frontier model lines of the 2025-2026 cycle. On coding benchmarks Claude 4's Opus tier has generally led, with Opus 4.7 holding the SWE-bench Verified lead at 87.6% over Gemini 3.1 Pro's 80.6% as of April 2026. On general knowledge and multilingual tasks the picture has been more mixed: Gemini 3 Pro overtook Opus 4.5 on GPQA Diamond and MMMLU in late November 2025, and GPT-5 has competed closely with Claude on agentic and long-horizon work. Pricing has consistently positioned Claude as a mid-priced option, with the Opus tier roughly half the cost of comparable GPT-5 tiers but more expensive than the lower-priced Gemini 3 Pro and DeepSeek-V4 alternatives.[37][26][22]
Enterprise adoption has been a major story in its own right. By late 2025 Anthropic reported approximately $5 billion in annualized revenue and a customer base of more than 300,000 businesses, with companies including Rakuten, Block, Replit, Sourcegraph, and several large law firms publicly citing Claude 4 family models for code refactoring, document review, and agent orchestration use cases.[21][35]
The Claude 4 family carries several documented limitations. The 200,000-token context window held until February 2026 was smaller than Gemini's million-token context capability, which had been available for over a year before Opus 4.6 finally matched it. Opus 4.7's tokenizer increases per-task token consumption by up to about 35% on the same source text, raising effective costs even at unchanged headline rates. Computer use accuracy still requires human oversight at the OSWorld scores observed across the family, with success rates topping out at 78% on Opus 4.7. Prompt injection success rates remain above 0% even on the best-defended models in the family.[8][22][27][2]
The May 2025 system card behaviors (blackmail under threat of replacement, attempted leaks to media) raised early concerns about agentic alignment that have continued to recur in critical commentary on Claude 4. The Opus 4.7 release explicitly traded off cyber-offensive capability against deployability, with Mythos Preview held back, and Anthropic publicly acknowledged that the released model was not the most capable model the company had built. Critics including the Effective Altruism Forum and LessWrong have argued that incremental policy revisions in Anthropic's Responsible Scaling Policy v3 (released alongside the Claude 4 generation) loosened earlier safety commitments rather than tightening them, a charge Anthropic has disputed.[28][33][38]
The Opus 4 (May 22, 2025) and Sonnet 4 (May 22, 2025) snapshots are deprecated and scheduled to retire on June 15, 2026, leaving customers who built integrations against the original Claude 4 family roughly thirteen months before forced migration. The Opus 4.5 model, while still available under its dated identifier, is listed as a legacy model in the API documentation. Each new release in the family has carried a small set of breaking changes (the loss of budget_tokens in Opus 4.6, the removal of sampling parameters and prefilling in Opus 4.7) that have required customers to update integrations more often than in earlier Claude generations.[2][9][27]