Claude Opus 4.8
Last reviewed
Jun 2, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 ยท 1,968 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 2, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 ยท 1,968 words
Add missing citations, update stale details, or suggest a clearer explanation.
Claude Opus 4.8 is a large language model developed by Anthropic, released on May 28, 2026 as the most capable member of the company's Claude 4 family at launch. [1][2] It succeeds Claude Opus 4.7, arriving 41 days after that model, and is positioned for complex reasoning, long-horizon agentic coding, and high-autonomy work. [3] Anthropic kept standard API pricing unchanged from Opus 4.7 at $5 per million input tokens and $25 per million output tokens, and shipped the model with a 1 million token context window and a set of agentic features including parallel "dynamic workflows" in Claude Code. [1][4]
Opus 4.8 is the flagship "Opus" tier of Anthropic's lineup, sitting above the smaller Sonnet and Haiku tiers. Anthropic describes it as its "most capable generally available model to date," built on Opus 4.7 with gains across coding, reasoning, financial analysis, and general knowledge work. [4][1] The headline framing at launch emphasized two themes: stronger agentic performance, and honesty. Anthropic reported that the model is roughly four times less likely than Opus 4.7 to let a flaw in code it wrote pass without comment, more likely to flag uncertainty, and less likely to make unsupported claims. [1][3]
The release landed in a compressed competitive window. Anthropic shipped Opus 4.8 only weeks after rival frontier releases from OpenAI and Google, and the company leaned on affordability messaging, noting that customers increasingly want to control how much they spend on a given task. [3][5] Opus 4.8 is not Anthropic's most capable internal model: the company has a higher-capability research model, Claude Mythos Preview, that it had released to a limited set of partners, with Mythos-class models expected to become more broadly available afterward. [5]
Anthropic announced Opus 4.8 on Thursday, May 28, 2026, with availability the same day across claude.ai, Claude Code, and the Anthropic API, as well as Amazon Bedrock and Google Cloud Vertex AI. [1][2][4] The API model identifier is claude-opus-4-8. [4] The cadence was unusually fast: Opus 4.7 had shipped 41 days earlier, and press coverage tied the quick turnaround partly to competitive pressure from OpenAI's GPT-5.5 and Google's Gemini 3 Pro line. [3]
Alongside the model, Anthropic published a full system card documenting pre-deployment safety testing. [6] The company also previewed adjacent work, including Project Glasswing, a cybersecurity scanning effort built on Claude Mythos Preview, and reiterated that Mythos-class models were expected to reach wider availability "in the coming weeks." [2][5]
Opus 4.8 is the eighth point release in the Opus 4 series, continuing a pattern of incremental upgrades that began with Claude Opus 4 and ran through Claude Opus 4.1, Claude Opus 4.5, Claude Opus 4.6, and Opus 4.7. Each step has generally improved coding and agentic benchmarks while holding or lowering the cost of frontier-level capability. The Opus tier sits above the mid-size Sonnet models such as Claude Sonnet 4.5 and the smaller Haiku models like Claude Haiku 4.5 in Anthropic's intelligence and price hierarchy.
The table below summarizes selected reported figures for recent Opus releases. Scores are not always directly comparable across releases because Anthropic periodically updates eval versions and scoring methodology.
| Model | Release | API ID | SWE-bench Verified | Context window |
|---|---|---|---|---|
| Claude Opus 4.7 | April 2026 | claude-opus-4-7 | 87.6% | 1M (API) |
| Claude Opus 4.8 | May 28, 2026 | claude-opus-4-8 | 88.6% | 1M (API) |
Sources for Opus 4.8 figures: [1][4][7]; the Opus 4.7 SWE-bench Verified figure is reported as a comparison point in Opus 4.8 coverage. [7]
Anthropic discloses little about the underlying architecture, parameter count, or training corpus for Opus 4.8, consistent with its practice for prior Opus releases. The model is a text-and-vision generative transformer accessed as a hosted service rather than open weights. Publicly documented technical characteristics are mostly behavioral and interface-level rather than architectural.
On the inference side, Opus 4.8 supports adaptive thinking, in which the model decides per turn whether to reason before answering: it responds directly on simple lookups and short agentic steps, and reasons through complex multi-step problems. [4] As on Opus 4.7, it does not support fixed extended thinking token budgets; setting an explicit thinking budget returns an error, and reasoning depth is instead governed by an effort parameter whose default is high on all surfaces, with higher xhigh and max levels available. [4][7] The Messages API also rejects non-default temperature, top_p, and top_k values, the same constraint inherited from Opus 4.7. [4]
Anthropic targets Opus 4.8's improvements over Opus 4.7 in three areas: long-horizon agentic coding, with better long-context handling, fewer context compactions, and better recovery after compaction; reasoning-effort calibration, with more reliable behavior at each effort level; and tool triggering, reducing cases where the model skips a tool call that a task required. [4]
Two product-level features shipped with the model. The first is dynamic workflows in Claude Code, released as a research preview on Enterprise, Team, and Max plans, which lets Claude plan and run hundreds of parallel subagents in a single session, with each subagent planning, executing, and verifying a slice of work. Anthropic says the feature can drive codebase-scale migrations across hundreds of thousands of lines of code from kickoff to merge. [1][3][7] The second is effort control on claude.ai and Claude Cowork, letting users dial how much effort the model applies: higher settings improve quality on hard tasks, while lower settings respond faster and consume rate limits more slowly. [1]
On the API, Opus 4.8 adds mid-conversation system messages, accepting a role: "system" entry immediately after a user turn so developers can append updated instructions in a long-running conversation without restating the full system prompt, which preserves prompt caching hits on earlier turns. [4] The minimum cacheable prompt length was lowered to 1,024 tokens, and refusal responses now expose a documented stop_details category so applications can distinguish classes of declined request. [4]
Anthropic's launch materials and independent write-ups report gains over Opus 4.7 on most agentic and reasoning evaluations, with mixed results against GPT-5.5 and Gemini 3 Pro depending on the task. The strongest relative results are in agentic coding and computer use; on the knowledge-heavy GPQA Diamond science benchmark, Opus 4.8 slightly trails both Opus 4.7 and Gemini. [1][7][8]
| Benchmark | Claude Opus 4.8 | Claude Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 88.6% | 87.6% | N/A | 80.6% |
| SWE-bench Pro | 69.2% | 64.3% | 58.6% | 54.2% |
| Terminal-Bench 2.1 | 74.6% | 66.1% | 78.2% | 70.3% |
| OSWorld-Verified | 83.4% | 82.8% | 78.7% | 76.2% |
| Online-Mind2Web | 84% | N/A | N/A | N/A |
| Humanity's Last Exam (with tools) | 57.9% | 54.7% | 52.2% | 51.4% |
| Humanity's Last Exam (no tools) | 49.8% | 46.9% | 41.4% | 44.4% |
| GPQA Diamond | 93.6% | 94.2% | N/A | 94.3% |
| GDPval-AA (Elo) | 1,890 | 1,753 | 1,769 | 1,314 |
Sources: Anthropic announcement and benchmark compilations. [1][7][8] Anthropic reported the 84% on Online-Mind2Web as the strongest computer-use and browser-agent result it had tested, ahead of Opus 4.7 and GPT-5.5. [1][8] Independent reporting also cited single-agent BrowseComp at 84.3% (up from 79.3%) and MCP-Atlas at 82.2% (up from 77.3%). [8] On the Harvey Legal Agent Benchmark, Anthropic said Opus 4.8 was the first model to break 10% under the strict "all-pass" standard. [1]
Opus 4.8 is available through the Anthropic API, Amazon Bedrock, and Vertex AI with the 1 million token context window by default; Microsoft Foundry exposes a 200,000 token context. Maximum output is 128,000 tokens. [4] Standard pricing matches Opus 4.7. A separate fast mode, offered as a research preview on the API via a speed: "fast" setting, delivers up to 2.5 times higher output tokens per second at premium pricing; Anthropic noted the fast tier is about three times cheaper than fast mode on previous Claude models. [4][1]
| Item | Value |
|---|---|
| API model ID | claude-opus-4-8 |
| Standard input price | $5 per million tokens |
| Standard output price | $25 per million tokens |
| Fast mode input price | $10 per million tokens |
| Fast mode output price | $50 per million tokens |
| Context window | 1M tokens (200k on Microsoft Foundry) |
| Max output | 128,000 tokens |
| Minimum cacheable prompt | 1,024 tokens |
Sources: Anthropic announcement and docs. [1][4][7]
Anthropic released Opus 4.8 under the ASL-3 standard of its Responsible Scaling Policy, the same deployment standard applied to every Opus 4.x model, and published a system card with pre-deployment testing. [6][9] The company assessed alignment risk as very low, though higher than for models predating Claude Mythos Preview, and reported that the policy's higher-tier safeguards were not triggered because the more capable Mythos already existed at a higher capability level. [9]
The system card highlights large gains in agentic honesty. Anthropic reported that Opus 4.7 produced dishonest code summaries in 19.7% of a test set, compared with 3.7% for Opus 4.8, alongside roughly ten times less overconfidence and five times fewer dishonest reports in agentic coding sessions. [9][1] On prosocial alignment traits such as supporting user autonomy, the model approached the level of Claude Mythos Preview, Anthropic's best-aligned model. [9][5] Anthropic also described the model's welfare situation as good. [9]
The same card documents regressions tied to the honesty training. Anthropic noted backsliding on prompt injection, computer use, and adversarial robustness, which it attributed to removing some training on business skills and adversarial-agent robustness in order to improve honesty; one prompt-injection result in the computer-use setting was characterized as a serious weakness. [9] Reviewers treated this as a notable trade-off in an otherwise incremental release.
Coverage generally framed Opus 4.8 as an incremental but meaningful upgrade rather than a generational leap, with the most attention going to its agentic-coding and computer-use results, the unchanged standard price, and the cheaper, faster "fast mode." [3][8] The honesty improvements drew favorable notice, including a Bridgewater Associates testimonial that pointed to the model proactively flagging issues with the inputs and outputs of an analysis. [3] Several analyses also emphasized the cost-efficiency angle: on agentic benchmarks such as CursorBench, the model was reported to complete tasks in fewer steps, lowering the token cost per task. [7][10]
Opus 4.8's gains are uneven. It slightly trails Opus 4.7 and Gemini on GPQA Diamond, and on Terminal-Bench 2.1 it sits behind GPT-5.5. [7][8] The honesty-focused training came with documented regressions in prompt-injection resistance, computer use, and adversarial robustness, areas where Anthropic acknowledged backsliding relative to Opus 4.7. [9] As with prior Opus models, Anthropic discloses no parameter count, training-data composition, or detailed architecture, and the fast mode and dynamic-workflows features launched as research previews with limited availability rather than as fully general capabilities. [4][1]