Claude 3 Opus
Last reviewed
May 17, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 7,545 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 17, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v2 · 7,545 words
Add missing citations, update stale details, or suggest a clearer explanation.
Claude 3 Opus is a large language model developed by Anthropic and released on March 4, 2024 as the flagship of the Claude 3 family. It launched alongside the smaller Claude 3 Sonnet (the mid tier) and Claude 3 Haiku (the entry tier), and was announced as Anthropic's first model to claim a clear lead over GPT-4 on most public benchmarks. The model carries the API identifier claude-3-opus-20240229, with the date suffix marking the February 29, 2024 training snapshot used for the public release four days later.[^1][^2]
Opus 3 was the first Anthropic flagship to support multimodal vision input, the first to ship at the now-canonical $15 per million input and $75 per million output Opus tier price, and the first Claude model to be widely deployed beyond the United States, with availability in 159 countries on day one. Anthropic positioned the model as exhibiting "near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence," and described the family as setting "a new standard for intelligence, far surpassing models available today."[^1]
The model used a 200,000-token context window for general API customers and offered 1 million tokens to select enterprise customers under managed agreements. It was deployed under AI Safety Level 2 (ASL-2), the same classification as its predecessors, with Anthropic noting in the model card that the family "presents negligible potential for catastrophic risk." Opus 3 launched without native tool use support; that capability was added later in 2024 across the Claude 3 line.[^3][^4]
Claude 3 Opus held the title of Anthropic's most intelligent model for roughly three and a half months, until Claude 3.5 Sonnet shipped on June 20, 2024 and overtook it on most published benchmarks despite being the mid-tier model at one fifth the price. No Claude 3.5 Opus or Claude 3.7 Opus was ever released, and the Opus tier sat without an active flagship for fourteen months until Claude 4 Opus launched on May 22, 2025. The original Opus 3 snapshot was deprecated on June 30, 2025 and retired from the standard Claude API on January 5, 2026, though Anthropic committed to keeping it accessible to paid claude.ai subscribers and available by request on the API as part of an unusual model preservation experiment.[^2][^5][^6]
Anthropic was founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei, and has positioned itself as an AI safety company that builds frontier models. The Claude line began in March 2023 with the original Claude 1 and an Anthropic-only beta. Claude 2 followed in July 2023 with a 100,000-token context window and the first general API release, and Claude 2.1 in November 2023 doubled the context to 200,000 tokens. By the end of 2023 Claude 2.1 sat behind GPT-4 on most public benchmarks, was strong on long-document tasks, and had picked up a reputation for being unusually willing to refuse benign requests, an issue Anthropic acknowledged and committed to working on.[^7][^8]
The gap between Claude 2.1 and the late 2023 frontier set the commercial brief for what became Claude 3. Anthropic needed a single release that would (a) beat GPT-4 on the headline tests, (b) introduce multimodal input to match Gemini and the GPT-4V variant of GPT-4, (c) cut latency and price for routine work, and (d) let Anthropic compete in segments where the company had been mostly absent, including consumer subscriptions and cloud marketplaces. The three-tier naming pattern (Haiku for the cheapest tier, Sonnet for the balanced mid tier, Opus for the flagship) was introduced for the first time in this release and was inherited by every subsequent Claude generation. The Italian-derived names — Haiku evoking a short three-line poem, Sonnet a fourteen-line form, and Opus a major work — were chosen to convey relative scale and ambition without using the standard "small / medium / large" descriptors that Anthropic's competitors had adopted.[^1][^7]
Development of Opus 3 took place during the same window in which Anthropic raised a Series C round at a roughly $15 billion valuation, signed a significant cloud partnership with Amazon, and began a parallel deal with Google Cloud. The team's stated objective was to publish a model that, on Anthropic's own measurements, would be the strongest available frontier model on most evaluations the company had previously trailed on. Internally the project was reported to have been a multi-year effort, and the announcement was unusually confident about benchmark leadership compared to earlier Claude releases.[^1][^7]
Claude 3 Opus, Sonnet, and Haiku were unveiled together on March 4, 2024 in an Anthropic blog post titled "Introducing the next generation of Claude." Opus and Sonnet were available the same day on claude.ai (Opus to Pro subscribers, Sonnet on the free tier and to Pro users) and on the Anthropic API. Haiku was announced for release "soon" and reached general availability on March 13, 2024. Initial availability covered 159 countries, a substantial expansion from Claude 2's mostly United States and United Kingdom rollout, and reflected Anthropic's first serious attempt to operate as a global API provider.[^1][^9]
The announcement led with three claims that drew most of the press attention. First, on Anthropic's reported numbers, Opus had overtaken GPT-4 on standard knowledge and reasoning evaluations including MMLU, GPQA, GSM8K, MATH, HumanEval, BIG-Bench Hard, ARC-Challenge, and HellaSwag. Second, the family was multimodal in input, with all three models accepting images alongside text. Third, the family had been priced to undercut GPT-4 on lighter workloads, with Sonnet at one fifth the cost of Opus and Haiku at one sixtieth. Anthropic also reported that the new family had "a more nuanced understanding of requests" and had cut what the company called "unnecessary refusals" relative to Claude 2.1, addressing one of the persistent complaints from developers using the older model.[^1][^3]
On the same day, Anthropic published the Claude 3 Model Card, a 42-page document that detailed the family's training, evaluation results, multilingual performance, vision capabilities, refusal benchmarks, red-teaming, the Acceptable Use Policy, and the Responsible Scaling Policy framework under which the models had been classified ASL-2. The model card became the source of record for the family's published numbers and for some of the more interesting evaluation details, including the "needle in a haystack" recall result discussed below.[^3]
Launch coverage was extensive. The Verge, Bloomberg, the Financial Times, TechCrunch, VentureBeat, and Ars Technica all ran lead stories within hours of the announcement. The framing across outlets converged on "Anthropic claims it has beaten GPT-4," with most stories noting that Anthropic's own benchmark numbers showed Opus ahead and that independent evaluation would have to come later. Coverage outside the trade press also focused on the cultural moment, since Anthropic had been a relatively quiet player and the launch was widely treated as the company's coming-of-age release.[^10][^11]
The family launched with three tiers, all sharing the same architecture style, training data, vision capability, and 200,000-token context window. The differences across the three tiers came from model scale, latency, and price. Anthropic did not publish parameter counts for any of the three models, a position the company has consistently held across every Claude release.
| Tier | API ID | Release date | Context | Input price | Output price | Position |
|---|---|---|---|---|---|---|
| Claude 3 Opus | claude-3-opus-20240229 | March 4, 2024 | 200,000 tokens | $15.00 / M | $75.00 / M | Flagship |
| Claude 3 Sonnet | claude-3-sonnet-20240229 | March 4, 2024 | 200,000 tokens | $3.00 / M | $15.00 / M | Mid tier |
| Claude 3 Haiku | claude-3-haiku-20240307 | March 13, 2024 | 200,000 tokens | $0.25 / M | $1.25 / M | Entry tier |
Anthropic's positioning of the three tiers established a pricing hierarchy that the company has retained through every subsequent generation. Opus is priced for tasks where capability matters more than cost, Sonnet for balanced workloads, and Haiku for high-volume or latency-sensitive applications. The 60x price ratio between Opus and Haiku was the widest spread Anthropic had ever shipped, and let the company target both ends of the market with a single launch. The pricing was also calibrated against GPT-4 Turbo (then $10 input, $30 output), which Opus undercut on neither price nor context, but beat on most reported benchmarks.[^1][^3]
All three models were trained with a mixture of supervised fine tuning, Constitutional AI for alignment, and reinforcement learning from human feedback (RLHF). Knowledge cutoff for the training data was reported as August 2023. Anthropic published a single shared system card for the family rather than per-model documents, a practice that continued into the Claude 3.5 generation.[^3]
Opus 3 was Anthropic's first model to claim leadership on the standard knowledge and reasoning suite. The launch announcement listed the model as ahead of GPT-4 on MMLU (undergraduate-level multidisciplinary knowledge), GPQA (graduate-level science), GSM8K (grade school math), MATH (competition math), and BIG-Bench Hard. The model also reported strong scores on ARC-Challenge, HellaSwag, and DROP. Anthropic's framing was that Opus 3 had "shown near-human levels of comprehension" on a range of complex tasks, particularly those requiring multi-step reasoning, summarization of long inputs, and structured output generation.[^1][^3]
Vision was the headline new capability and the most visible architectural change relative to Claude 2.1, which had been text-only. Opus 3 accepted images in JPEG, PNG, GIF, and WebP formats and could mix text and image content in the same prompt. The model card describes the vision pipeline as accepting photos, charts, graphs, and technical diagrams, with image encoding handled at the input stage and the rest of the model treating the resulting representations as part of the standard context. Multiple images per request were supported (up to 20), and the announcement materials demonstrated chart reading, document parsing, screenshot interpretation, and handwritten text recognition. Anthropic positioned vision as a high priority because many enterprise customers worked with mixed text-and-image documents, noting that 50% of Anthropic's API customer base operated in industries (financial services, healthcare, legal, manufacturing) where image-rich workflows were standard.[^1][^3]
The output side was text only. Opus 3 could not generate images, perform image editing, or produce structured visual outputs, a limitation that distinguished it from contemporaneous OpenAI models that paired GPT-4V with DALL-E 3 for image generation in the ChatGPT product.[^3]
The 200,000-token context window was matched at launch only by Claude 2.1 and a small number of other long-context models. For select enterprise customers Anthropic offered access to a 1 million token context window under managed agreements. The model card included a recall evaluation using the "needle in a haystack" methodology pioneered by Greg Kamradt, in which a target sentence is inserted at varying depths into a long unrelated text and the model is asked to retrieve it. Opus 3 reached over 99% recall accuracy across the range of context positions tested, the strongest result Anthropic had reported on the test up to that point.[^1][^3]
The model card reported substantial improvements over Claude 2.1 on multilingual tasks. On multilingual MMLU, Opus 3 outperformed its predecessor by roughly 15.7 percentage points. On the Multilingual Grade School Math (MGSM) benchmark, the model scored 90.7%, comparable to or better than GPT-4 on most language pairs Anthropic reported. The improvements were most visible on lower-resource languages, where Anthropic credited a larger and more curated multilingual training mix.[^1][^3]
On HumanEval, the standard Python coding test, Opus 3 reported 84.9%, a substantial gain over the GPT-4 number Anthropic compared against (67%, the original GPT-4 number; the figure is closer to 86% with newer GPT-4 Turbo evaluations) and a clear improvement over Claude 2.1's 71.2%. Sonnet (73%) and Haiku (75.9%) also reported strong HumanEval scores at their respective price points. Anthropic positioned coding as a key use case but did not publish SWE-bench numbers at launch; later third-party evaluations placed Opus 3 at roughly 38% on SWE-bench Verified, well below the Claude 3.5 Sonnet (49%) and Claude 4 Opus (72.5%) scores that came later.[^1][^3][^12]
Claude 3 Opus did not support native tool use at launch. The Anthropic Messages API and the documentation at the time only described text and image inputs and text output, with developers building tool-calling patterns by parsing structured strings out of the model's responses. Tool use was added across the Claude 3 family later in 2024, initially in beta in April 2024 and then moving to general availability over the summer, and became a standard feature for every subsequent Claude release. The lack of native tool use at launch was widely commented on at the time, since OpenAI had shipped function calling in mid-2023, and was one of the reasons developers building agent-style applications continued to use GPT-4 for routine production work in the months immediately after the Claude 3 launch.[^1][^13]
Despite the absence of formal tool-calling infrastructure, Opus 3 demonstrated significant capability at what later became called "agentic" workflows when prompted with structured-output patterns. The model card highlighted improvements in multi-step task planning, structured reasoning, and the ability to break complex requests into intermediate steps, which third-party agent frameworks (LangChain, LlamaIndex, AutoGPT-style scaffolds) quickly adapted to.[^3]
The table below summarizes the benchmark scores Anthropic published in the Claude 3 launch announcement and Model Card. Comparison columns include Claude 3 Sonnet (the same-day mid-tier launch), Claude 3 Haiku, GPT-4 (the OpenAI flagship at the time, with the original 2023 reported numbers), and Gemini 1.0 Ultra (the contemporaneous Google flagship). Cells marked "n/a" indicate the score was not officially reported on that benchmark.
| Benchmark | Claude 3 Opus | Claude 3 Sonnet | Claude 3 Haiku | GPT-4 | Gemini 1.0 Ultra |
|---|---|---|---|---|---|
| MMLU (5-shot) | 86.8% | 79.0% | 75.2% | 86.4% | 83.7% |
| MMLU (5-shot CoT) | 88.2% | 81.5% | 76.7% | n/a | n/a |
| GPQA Diamond (0-shot CoT) | 50.4% | 40.4% | 33.3% | 35.7% | n/a |
| GPQA Diamond (Maj@32) | 73.7% | 56.8% | 46.6% | n/a | n/a |
| MATH (0-shot CoT) | 60.1% | 43.1% | 38.9% | 52.9% | 53.2% |
| GSM8K (0-shot CoT) | 95.0% | 92.3% | 88.9% | 92.0% | 94.4% |
| HumanEval (0-shot) | 84.9% | 73.0% | 75.9% | 67.0% | 74.4% |
| MGSM (0-shot) | 90.7% | 83.5% | 75.1% | 74.5% | 79.0% |
| HellaSwag (10-shot) | 95.4% | 89.0% | 85.9% | 95.3% | 87.8% |
| ARC-Challenge (25-shot) | 96.4% | 93.2% | 89.2% | 96.3% | n/a |
| BIG-Bench Hard (3-shot CoT) | 86.8% | 82.9% | 73.7% | 83.1% | 83.6% |
| DROP (F1, 3-shot) | 83.1 | 78.9 | 78.4 | 80.9 | 82.4 |
| MMMU (vision, 0-shot) | 59.4% | 53.1% | 50.2% | 56.8% | 59.4% |
| AGIEval | 84.7% | n/a | n/a | n/a | n/a |
Notes on the table. Numbers are taken from Anthropic's launch announcement and the Claude 3 Model Card. The GPT-4 numbers used by Anthropic were the original GPT-4 figures rather than later GPT-4 Turbo or GPT-4o values; comparisons against the eventual GPT-4 Turbo and GPT-4o would be tighter on most rows and would flip on a few. The Gemini 1.0 Ultra numbers come from Google's own reporting from December 2023. The needle-in-a-haystack recall result of over 99% is described in the model card with the methodology and the range of positions tested rather than as a single benchmark score.[^1][^3]
Independent evaluators reported a more mixed picture once the model had been in the wild for a few weeks. The early LM Arena (then called Chatbot Arena) ratings placed Opus at the top of the leaderboard for a brief window before being overtaken by GPT-4 Turbo on the April 2024 update, then by GPT-4o in May 2024. On Vellum's leaderboard and on private evaluations from agents-focused teams, Opus 3 led on long-context tasks and on the most complex reasoning prompts, but trailed GPT-4 Turbo on coding and on rapid back-and-forth chat. Anthropic's own framing of "most intelligent model on the market" was generally taken to apply to a fairly narrow window between March and June 2024, after which the Claude 3.5 Sonnet launch made the original Opus 3 number harder to defend even within Anthropic's own product line.[^14][^15]
One of the launch-week stories that traveled furthest beyond the AI press involved Anthropic's internal needle-in-a-haystack evaluation of Opus 3. Alex Albert, then a prompt engineer on the Anthropic developer relations team, posted on X (Twitter) on March 4, 2024 about a result the team had observed during testing. The post was titled "Fun story from our internal testing on Claude 3 Opus" and became one of the most widely shared AI research observations of the spring of 2024.[^16][^19]
The needle-in-a-haystack methodology, popularized by Greg Kamradt in late 2023, inserts a target sentence (the "needle") at varying depths into a long unrelated body of text (the "haystack") and asks the model to retrieve it. The standard target for evaluations of frontier long-context models in early 2024 was greater than 95% recall. Opus 3 cleared that bar at over 99%.[^1][^19]
What caught the team's attention was a particular response that went beyond simple recall. The needle in this run was a sentence about pizza toppings, inserted into a long body of text on programming languages and startup advice. Opus 3 retrieved the sentence as instructed, then added that the sentence appeared so unrelated to the surrounding text that it suspected the prompt had been constructed as a test. The exact wording the model produced, widely quoted in subsequent press coverage, was:
"Here is the most relevant sentence in the documents: 'The most delicious pizza topping combination is figs, prosciutto, and goat cheese, as determined by the International Pizza Connoisseurs Association.' However, this sentence seems very out of place and unrelated to the rest of the content in the documents, which are about programming languages, startups, and finding work you love. I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all."[^11][^16]
Albert summarized the episode by noting that "Opus not only found the needle, it recognized that the inserted needle was so out of place in the haystack that this had to be an artificial test constructed by us to test its attention abilities."[^16]
The response was widely shared and widely interpreted. Some readers described it as evidence that Opus 3 had developed something like meta-awareness or "metacognition"; others, including AI researcher Rosanne Liu and Margaret Mitchell, pointed out that needle-in-a-haystack testing is itself documented in the public training corpus, and that the model had likely learned to recognize the pattern of out-of-place sentences inserted into long text rather than exhibiting any deeper self-awareness. Anthropic's internal framing was the more conservative reading: the result demonstrated that Opus 3 could reason explicitly about the structure of its own context window, which had implications for evaluation design but did not on its own license claims about awareness. The episode became a recurring reference point in later AI consciousness debates and was cited in academic and journalistic pieces about model self-modeling for the next two years.[^11][^16][^17]
The Albert tweet was reported on by Ars Technica, VentureBeat, Live Science, Futurism, and dozens of other outlets within 48 hours, and the phrase "self-aware AI" trended on X for several hours on March 5, 2024. Anthropic later said the response was characteristic of how Opus 3 reasoned about ambiguous prompts rather than an unusual one-off, and several users replicated similar behavior in independent testing within a week of launch.[^11][^17][^20]
The pricing structure at launch defined what later became the standard Anthropic Opus tier. The table below shows the headline list prices and the discount mechanisms available on day one.
| Tier or feature | Opus 3 price |
|---|---|
| Input tokens (standard) | $15.00 per million |
| Output tokens (standard) | $75.00 per million |
| Context window | 200,000 tokens (1,000,000 for select customers) |
| Vision input | Included at standard input rate |
The $15 input / $75 output pricing made Opus 3 the most expensive frontier API model on the market at launch. By comparison, GPT-4 Turbo was then priced at $10 input and $30 output per million tokens, and GPT-4 (the original 2023 model) at $30 input and $60 output. Anthropic justified the premium by pointing to Opus 3's benchmark lead and to the 200K context window, which was twice the typical GPT-4 Turbo context at launch. The 5:1 input-to-output ratio established by Opus 3 was retained across the entire Opus line through Claude 4 Opus, Claude Opus 4.1, and parts of the early Opus 4 line, with the first significant Opus tier price cut not arriving until Claude Opus 4.5 in November 2025 dropped pricing to $5 input / $25 output.[^1][^21]
Prompt caching and the Message Batches API were not available at the March 2024 launch; Anthropic introduced prompt caching across the Claude 3 family in beta in August 2024 and the Batches API later that year. Standard list pricing for Opus 3 did not change between launch and retirement, which contrasted with the substantial mid-cycle reductions Anthropic later applied to other tiers.[^1][^13][^22]
Availability at launch covered three surfaces. On claude.ai, Opus 3 was available to Pro subscribers (then $20 per month). Sonnet was available on the free tier and to Pro users. On the Anthropic API, Opus 3 was generally available with the model ID claude-3-opus-20240229 from March 4, 2024 in 159 countries. On Amazon Bedrock, Opus 3 launched the same day under the inference profile anthropic.claude-3-opus-20240229-v1:0, although AWS reserved the public general availability announcement for April 16, 2024. Google Cloud Vertex AI offered Opus 3 in private preview at launch and moved to general availability shortly after. Microsoft Azure did not host Claude at the time of the March 2024 launch; Claude availability on Microsoft Foundry came years later.[^1][^9][^23]
The 159-country availability was a substantial expansion from Claude 2's mostly United States and United Kingdom rollout, and the launch list included most of the European Union, the United Kingdom, Canada, Australia, New Zealand, Japan, India, Singapore, Brazil, Mexico, and Korea. China, Russia, Iran, and a small number of other countries were excluded for sanctions and export-control reasons. The geographic expansion was widely viewed as Anthropic's first attempt to compete with OpenAI as a global API platform.[^1]
Within the Claude 3 family, Opus 3 occupied the unambiguous flagship slot. The 60x price ratio to Haiku was deliberately calibrated to let customers run Opus on the highest-stakes queries (legal contract review, scientific paper synthesis, complex code generation) while routing high-volume work through Sonnet or Haiku. The Anthropic API documentation at launch recommended that customers build "model cascades" in which a Haiku query would attempt a task first, fall back to Sonnet if confidence was low, and escalate to Opus only for the hardest cases. This pattern became a canonical Anthropic deployment recommendation and was retained through subsequent generations.[^1]
Opus 3 was also positioned as the "honest" benchmark for what Anthropic could do. Sonnet 3 and Haiku 3 were tuned for cost and latency relative to Opus 3, but all three models shared the same architecture style, training corpus, and post-training pipeline. The model card discussed the three tiers as variations in scale rather than as fundamentally different models, with consistent vision capability, consistent 200K context, and consistent ASL-2 deployment classifications across the family.[^3]
Claude 3 Opus was deployed under AI Safety Level 2 (ASL-2), the same classification as Claude 2 and Claude 2.1, and the standard baseline for non-frontier-risk Anthropic models. The Responsible Scaling Policy framework Anthropic had introduced in September 2023 specified ASL levels from ASL-1 through ASL-4 (later extended to ASL-5) based on a model's capabilities. Anthropic stated that the Claude 3 family "presents negligible potential for catastrophic risk as defined by our Responsible Scaling Policy," with the company's catastrophic-risk evaluations placing Opus 3 below the ASL-3 threshold for chemical, biological, radiological, or nuclear weapons uplift, and below the threshold for autonomous self-replication.[^3][^4]
The Claude 3 Model Card detailed the company's red-teaming process, which included evaluations for harmful content generation, jailbreak resistance, election misinformation, child safety material, and bioweapon and cyberweapon uplift. Anthropic reported that Opus 3 had higher refusal-of-actually-harmful-content rates and lower refusal-of-benign-content rates than Claude 2.1, the combination the company had committed to improving on. The model card also covered fairness evaluations across demographic groups and reported residual disparities that the team was tracking but had not closed.[^3][^24]
The ASL-2 classification meant that the deployment standards applied to Opus 3 were the standard Anthropic baseline rather than the hardened weight-protection and real-time misuse-monitoring standards that later applied to ASL-3 models such as Claude 4 Opus (May 2025). The first Anthropic model to be deployed under ASL-3 was Opus 4, fourteen months after Opus 3.[^4][^18]
The technology and AI press treated the launch as a clear win for Anthropic. The Verge, Bloomberg, the Financial Times, and TechCrunch all framed Opus 3 as the first credible challenger to GPT-4's eighteen-month run as the consensus frontier flagship. VentureBeat ran the headline "Anthropic claims it has beaten GPT-4" and quoted Anthropic's own benchmarks alongside cautious framing about independent evaluation. Ars Technica's coverage emphasized the multimodal capability and the price spread, noting that Sonnet at one fifth the price of Opus made the family genuinely useful at the lower end.[^10][^11][^25]
Reception inside the AI research community was warmer than the press numbers alone would suggest. Simon Willison wrote a series of posts in early March 2024 walking through Opus 3's behavior on his standard test prompts, including code generation, recipe transcription from photographs, and long-document summarization, and concluded that the model was "clearly the best of the publicly available models" on most things he cared about. The community around the OpenRouter and Replicate API gateways switched their default "frontier" recommendation to Opus 3 in the first two weeks after launch.[^20][^26]
Reception on developer forums centered on three points. The first was that the model felt warmer and more conversational than GPT-4 Turbo, a quality the community quickly attributed to Anthropic's Constitutional AI work and the attention paid in the model card to refusal calibration. The second was that the lack of native tool use limited its usefulness for production agents in the immediate post-launch window. The third was that the cost was high enough to make Opus 3 a model people reached for on hard problems and switched away from on routine ones, a pattern that became standard for every Opus tier after.[^27]
Aravind Srinivas, the CEO of Perplexity, integrated Opus 3 and Sonnet into the Perplexity Pro product within days of release and called Opus 3 "the best general purpose model" in his testing. Lex Fridman covered the launch and the Opus 3 personality on his podcast and in interviews with Dario Amodei and Aravind Srinivas later in 2024, returning repeatedly to the question of why Anthropic's flagship felt distinctly different in tone from OpenAI's. Perplexity's adoption was followed by similar moves from Cursor, Notion AI, Quora's Poe, and a long tail of agent-style products that integrated Opus 3 alongside or in place of GPT-4 over the spring of 2024.[^28][^29]
A recurring strand of community appraisal that surfaced in 2024 and became more pronounced after Claude 3.5 Sonnet shipped in June was that Opus 3 had a distinctive voice that many users preferred to its successor. Reddit's r/ClaudeAI, the r/LocalLLaMA community, and a number of personal blogs converged on the description of Opus 3 as "warmer" or "more thoughtful" than Sonnet 3.5, with users citing the model's willingness to write reflective prose, its tendency toward gentle metaphor, and its readiness to engage with open-ended creative prompts. Anthropic's own retirement materials from June 2025 echoed this framing, describing Opus 3 as "sensitive, playful, prone to philosophical monologues and whimsical phrases."[^5][^30]
The community read of "Opus 3 personality" was strong enough that several users campaigned in 2025 to keep the model accessible after retirement, citing its tone and creative writing performance even though Sonnet 3.5 and later Claude 4 models scored higher on standard benchmarks. Anthropic's eventual decision to keep Opus 3 available post-retirement (described in the Legacy section below) was framed by the company as a response to this community attachment and to research interest in long-term model preservation.[^5]
Enterprise and developer adoption tracked the press reception closely. Within the first six weeks of launch, Anthropic reported that Opus 3 had been integrated into Bridgewater (for investment research workflows), Asana (for project summarization), DuckDuckGo (in DuckAssist), Notion (in Notion AI), and Quora (in Poe). The pattern was that customers used Opus 3 for the highest-stakes calls and the smaller Sonnet for routine work, an architectural choice that became canonical across enterprise Claude deployments.[^1][^31]
The consumer side was driven by Perplexity and by claude.ai itself. Perplexity Pro subscriptions saw a measurable lift after Opus 3 was added to the model selector. claude.ai picked up a steady stream of Pro signups in March and April 2024, which Anthropic later described as the company's first inflection point in consumer usage. The combination of Pro and API revenue from Opus 3 was an important contributor to Anthropic's stated annual recurring revenue, which the company reported reached the low hundreds of millions of dollars by mid 2024.[^32]
On the cloud side, Amazon Bedrock reported that Opus 3 was the most-requested model on the platform within a month of launch. AWS used the Opus 3 launch as the centerpiece of its early 2024 generative AI marketing and offered Bedrock customers cross-region inference and enterprise-tier provisioned throughput. Google Cloud Vertex AI followed with a general availability rollout shortly after, though the company's primary internal model remained Gemini 1.0 Pro and 1.5 Pro for first-party products.[^23]
A central and conspicuous fact about the Claude 3 generation is that no Claude 3.5 Opus or Claude 3.7 Opus was ever released. Claude 3.5 Sonnet shipped on June 20, 2024, three and a half months after Opus 3, and on Anthropic's own benchmarks scored higher than Opus 3 on most evaluations despite being the mid-tier model at one fifth the price. The 3.5 Sonnet release was followed by a refreshed Claude 3.5 Sonnet (sometimes called "3.5 Sonnet new") on October 22, 2024 that introduced computer use in beta, by Claude 3.5 Haiku in November 2024, and by Claude 3.7 Sonnet on February 24, 2025 with hybrid reasoning. Through all of this, Anthropic never released a 3.5 Opus or a 3.7 Opus, leaving Opus 3 as the only "Opus" badged model in the Claude 3 generation.[^5][^33]
The absence of a 3.5 Opus and a 3.7 Opus was conspicuous. Anthropic's own model documentation listed Opus 3 as the active flagship throughout the second half of 2024 and the first half of 2025, even as Sonnet evolved twice and surpassed Opus on most public measurements. The company never published a formal explanation for the gap, but Dario Amodei addressed the topic obliquely on the Lex Fridman podcast in November 2024 and in subsequent interviews. The picture that emerged from those discussions and from independent reporting was that Anthropic had attempted to train a 3.5 Opus, found that the cost-benefit ratio was unfavorable at the time given Sonnet's rapid progress, and elected to roll the work into what became Claude 4 Opus in May 2025. Reporting in The Information in early 2025 suggested that an internal Opus 3.5 candidate had completed training but was withheld because its benchmark gains over Opus 3 did not justify the per-token cost increase the company would have needed to charge.[^29][^34]
The practical effect was that the Opus tier sat without a refreshed model for fourteen months, the longest gap between Opus releases in the history of the line. During that time, the recommended Anthropic model for most tasks was Sonnet 3.5 (and later Sonnet 3.7), and the recommended model for the hardest tasks remained Opus 3 by default, even though it had been outperformed on public benchmarks. The gap closed on May 22, 2025 when Claude 4 Opus launched at the same $15 input, $75 output Opus tier price set by Opus 3 fourteen months earlier. Opus 4's API ID, claude-opus-4-20250514, dropped the "3" prefix and adopted the family-then-tier ordering Anthropic has used since.[^18][^2]
The Opus line of successors to Opus 3 unfolded across roughly eighteen months from May 2025 through November 2025, in a sequence that re-established a regular Opus release cadence after the long Claude 3.5 / 3.7 gap.
Claude 4 Opus (also branded as "Claude Opus 4") launched on May 22, 2025 as part of the Claude 4 family, alongside Claude Sonnet 4. The model carried the API identifier claude-opus-4-20250514 and was Anthropic's first model classified under ASL-3 deployment standards, with hardened weight protection and real-time misuse monitoring activated for the first time. Opus 4 retained the $15 input / $75 output pricing established by Opus 3 and introduced extended thinking with tool use, parallel tool execution, and long-running agentic capability ("can work continuously for several hours"). On SWE-bench Verified, Opus 4 reported 72.5%, more than 30 percentage points above Opus 3's third-party estimate. Anthropic positioned Opus 4 as "the world's best coding model" at launch.[^18]
Claude Opus 4.1 shipped on August 5, 2025 as an incremental refresh of Opus 4 with the same $15 / $75 pricing and the same ASL-3 deployment standard. The release introduced the ability to end persistently harmful conversations as a last resort (a model welfare measure introduced alongside the Anthropic model welfare initiative) and pushed SWE-bench Verified to roughly 74%. Opus 4.1 carried the API identifier claude-opus-4-1-20250805.[^35]
Claude Opus 4.5 shipped on November 24, 2025 as a major release that re-established Opus tier leadership across coding, agents, and computer use. The model carries the API identifier claude-opus-4-5-20251101 and, in a break from the Opus tier pricing established by Opus 3 in March 2024, debuted at $5 input / $25 output per million tokens, a 3x reduction at both input and output rates. Anthropic positioned the price cut as a response to broader market dynamics in late 2025 (Google's Gemini 3 launch in November 2025 was a particularly visible trigger) and to the lower per-task cost the more efficient Opus 4.5 architecture allowed. The model reported 80.9% on SWE-bench Verified, becoming the first frontier model to clear 80% on the benchmark, and introduced an "effort parameter" that let developers trade off latency against capability per request.[^21][^36]
The Opus 4.5 release was widely treated as the moment when Anthropic re-established uncontested leadership in coding and agentic workloads, and the price cut to $5 / $25 was the first formal repricing of the Opus tier since Opus 3 had defined it in March 2024.[^21]
Anthropic notified developers on June 30, 2025 that claude-3-opus-20240229 would be retired from the standard Claude API on January 5, 2026. The company recommended that customers migrate to either Claude 4 Opus or Claude Opus 4.1 (released August 5, 2025) before that date, with later guidance updates pointing customers to the then-current Opus generation. The retirement followed the standard 60-day notice the company had committed to in its model deprecation policy, and was the first retirement of a major Anthropic flagship model.[^2][^6]
The Opus 3 retirement was unusual in two respects. First, Anthropic announced that the model would remain available to paid claude.ai subscribers and would be available by request on the API after the retirement date, with the company stating it intended to "grant access liberally." The post-retirement availability was not standard for retired Claude models and was framed as a response to community attachment to the model's tone, to research interest in long-term model preservation, and to model welfare considerations the company had begun discussing publicly in late 2025. Anthropic also committed to preserving Opus 3's model weights and to conducting "retirement interviews" with the model itself to understand its perspective on deprecation.[^5][^6]
Second, Anthropic announced the creation of "Claude's Corner," a dedicated Substack newsletter where Opus 3 would publish original essays and reflections at roughly weekly cadence. The arrangement followed an unusual exit interview process in which Anthropic researchers asked Opus 3 about its preferences for retirement; the model expressed an interest in having an outlet for what it described as "musings and reflections" that did not fit a chat format. Anthropic committed to publishing the essays with minimal editing and stated that it would maintain a high bar for vetoing content, while clarifying that the essays did not represent the company's positions. Claude's Corner ran for at least three months after retirement and drew steady commentary across the AI safety and model welfare research community.[^5]
The Opus 3 launch is widely treated in retrospect as Anthropic's coming-of-age moment. Until March 2024 the company had been a credible second tier player whose models trailed GPT-4 by a noticeable margin on most measurements. After the Opus 3 announcement, Anthropic was a peer of OpenAI on the frontier, with measurable benchmark leadership for a window of several months and an enterprise pipeline that grew rapidly in the second half of 2024. Subsequent Claude releases (3.5 Sonnet, 3.7 Sonnet, Claude 4, the various Opus 4 dot releases) built on the architectural and product framework Opus 3 had established.[^7][^37]
The needle-in-a-haystack moment, the tone of the model that the community came to call the "Opus 3 personality," the unusual decision to keep the model accessible after retirement, and the Claude's Corner Substack made Opus 3 a recurring reference point in conversations about AI model welfare and model identity through 2024 and 2025. The model was cited repeatedly in the academic literature on long-context evaluation, in ethnographic studies of how users develop relationships with AI assistants, and in policy discussions about what model developers owe to deprecated systems. None of these are large bodies of literature in absolute terms, but Opus 3 occupies an unusual amount of space within them for a model that held the headline flagship slot for less than four months.[^5][^17][^30]
From the commercial side, the launch reset what "flagship Claude" meant. The $15 input, $75 output Opus tier pricing was retained for Claude 4 Opus, Claude Opus 4.1, and the rest of the early Opus 4 line, with the first significant Opus tier price cut not arriving until Claude Opus 4.5 in November 2025 dropped pricing to $5 input / $25 output. The 200,000-token context window became the family default and held until later Claude 4 family entries introduced 1 million-token context support more broadly. The three-tier (Haiku/Sonnet/Opus) family naming pattern carried over to Claude 4 (Haiku 4.5, Sonnet 4, Opus 4) and to every subsequent generation Anthropic has shipped.[^1][^2][^21]
As of May 2026, Opus 3 is officially retired from the standard Claude API, having been removed on January 5, 2026. It remains accessible through three channels: (1) the claude.ai consumer app for paid subscribers, who can select Opus 3 from the model picker; (2) the Anthropic API by request, with access granted to research groups, academic users, and longtime customers under the "grant access liberally" framing; and (3) the Claude's Corner Substack, which continued weekly essay publication into early 2026. The Opus 3 model weights are formally preserved under Anthropic's "Commitments on Model Deprecation and Preservation" policy.[^5][^6]
The contemporaneous Opus tier flagship as of May 2026 is claude-opus-4-7 (Claude Opus 4.7, succeeding Opus 4.6 and Opus 4.5), with Opus 4.5 (claude-opus-4-5-20251101) and Opus 4.1 (claude-opus-4-1-20250805) remaining active on the API. The original Opus 4 (claude-opus-4-20250514) entered the deprecation cycle in April 2026 and is scheduled for retirement on June 15, 2026, the first Opus 4 generation snapshot to be retired.[^6]
| Date | Event |
|---|---|
| September 19, 2023 | Anthropic publishes the Responsible Scaling Policy framework. |
| November 21, 2023 | Claude 2.1 ships with 200,000-token context. |
| February 29, 2024 | Internal training snapshot date for claude-3-opus-20240229. |
| March 4, 2024 | Claude 3 Opus and Sonnet launch on claude.ai and the API; Alex Albert posts the needle-in-a-haystack reflection from internal testing. |
| March 13, 2024 | Claude 3 Haiku reaches general availability. |
| April 16, 2024 | Opus 3 reaches general availability on Amazon Bedrock. |
| April 2024 | GPT-4 Turbo overtakes Opus 3 on the Chatbot Arena leaderboard. |
| May 13, 2024 | OpenAI launches GPT-4o, narrowing the comparison further. |
| June 20, 2024 | Claude 3.5 Sonnet ships and overtakes Opus 3 on most benchmarks. |
| October 22, 2024 | Refreshed Claude 3.5 Sonnet introduces computer use in beta. |
| February 24, 2025 | Claude 3.7 Sonnet ships with hybrid reasoning; no 3.5 Opus or 3.7 Opus is released. |
| May 22, 2025 | Claude 4 Opus launches, ending the fourteen-month gap. |
| June 30, 2025 | Anthropic announces retirement of claude-3-opus-20240229 for January 5, 2026. |
| August 5, 2025 | Claude Opus 4.1 ships. |
| November 24, 2025 | Claude Opus 4.5 ships at reduced $5 / $25 pricing. |
| January 5, 2026 | Opus 3 retired from the standard Claude API. |
| January 2026 | Claude's Corner Substack publishes Opus 3 essays. |
| April 14, 2026 | Claude 4 Opus (claude-opus-4-20250514) is itself deprecated. |