Claude 3 Opus is a large language model developed by Anthropic and released on March 4, 2024, as the flagship of the Claude 3 family. It launched alongside the smaller Claude 3 Sonnet (the mid tier) and Claude 3 Haiku (the entry tier), and was the first Anthropic model for which the company claimed a clear lead over GPT-4 on most public benchmarks. The model carries the API identifier claude-3-opus-20240229, with the date suffix marking the February 29, 2024 training snapshot used for the public release four days later.[1][2]
Opus 3 was the first Anthropic flagship to support multimodal vision input, the first to ship at what became the canonical Opus-tier price of $15 per million input tokens and $75 per million output tokens, and the first Claude model to be widely deployed beyond the United States, with availability in 159 countries on day one. Anthropic positioned the model as exhibiting "near-human levels of comprehension and fluency on complex tasks, leading the frontier of general intelligence," and described the family as setting "a new standard for intelligence, far surpassing models available today."[1]
The model used a 200,000-token context window for general API customers and offered 1 million tokens to select enterprise customers under managed agreements. It was deployed under AI Safety Level 2 (ASL-2), the same classification as its predecessors, with Anthropic noting in the model card that the family "presents negligible potential for catastrophic risk." Opus 3 launched without native tool use support; that capability was added later in 2024 across the Claude 3 line.[3][4]
Claude 3 Opus held the title of Anthropic's most intelligent model for roughly three and a half months, until Claude 3.5 Sonnet shipped on June 20, 2024 and overtook it on most published benchmarks despite being the mid-tier model at one fifth the price. No Claude 3.5 Opus or Claude 3.7 Opus was ever released, and the Opus tier went fourteen months without a refresh until Claude Opus 4 launched on May 22, 2025. The original Opus 3 snapshot was deprecated on June 30, 2025 and retired from the API on January 5, 2026, though Anthropic kept it accessible to paid claude.ai subscribers and available by request on the API as part of an unusual model preservation experiment.[2][5][6]
Anthropic was founded in 2021 by former OpenAI researchers, including Dario and Daniela Amodei, and has positioned itself as an AI safety company that builds frontier models. The Claude line began in March 2023 with the original Claude, released through a limited-access beta. Claude 2 followed in July 2023 with a 100,000-token context window and the first general API release, and Claude 2.1 in November 2023 doubled the context to 200,000 tokens. By the end of 2023 Claude 2.1 sat behind GPT-4 on most public benchmarks, was strong on long-document tasks, and had picked up a reputation for being unusually willing to refuse benign requests, an issue Anthropic acknowledged and committed to working on.[7][8]
The gap between Claude 2.1 and the late 2023 frontier set the commercial brief for what became Claude 3. Anthropic needed a single release that would (a) beat GPT-4 on the headline tests, (b) introduce multimodal input to match Gemini and the GPT-4V variant of GPT-4, (c) cut latency and price for routine work, and (d) let Anthropic compete in segments where the company had been mostly absent, including consumer subscriptions and cloud marketplaces. The three-tier naming pattern (Haiku for the cheapest tier, Sonnet for the balanced mid tier, Opus for the flagship) was introduced for the first time in this release and was inherited by every subsequent Claude generation.[1][7]
Development of Opus 3 took place during the same window in which Anthropic raised a Series C round at a roughly $15 billion valuation, signed a significant cloud partnership with Amazon, and began a parallel deal with Google Cloud. The team's stated objective was to publish a model that, on Anthropic's own measurements, would be the strongest available frontier model on most evaluations the company had previously trailed on. Internally the project was reported to have been a multi-year effort, and the announcement was unusually confident about benchmark leadership compared to earlier Claude releases.[1][7]
Claude 3 Opus, Sonnet, and Haiku were unveiled together on March 4, 2024 in an Anthropic blog post titled "Introducing the next generation of Claude." Opus and Sonnet were available the same day on claude.ai (Opus to Pro subscribers, Sonnet on the free tier and to Pro users) and on the Anthropic API. Haiku was announced for release "soon" and reached general availability on March 13, 2024. Initial availability covered 159 countries, a substantial expansion from Claude 2's mostly United States and United Kingdom rollout, and reflected Anthropic's first serious attempt to operate as a global API provider.[1][9]
The announcement led with three claims that drew most of the press attention. First, on Anthropic's reported numbers, Opus had overtaken GPT-4 on standard knowledge and reasoning evaluations including MMLU, GPQA, GSM8K, MATH, HumanEval, BIG-Bench Hard, ARC-Challenge, and HellaSwag. Second, the family was multimodal in input, with all three models accepting images alongside text. Third, the family had been priced to undercut GPT-4 on lighter workloads, with Sonnet at one fifth the cost of Opus and Haiku at one sixtieth. Anthropic also reported that the new family had "a more nuanced understanding of requests" and had cut what the company called "unnecessary refusals" relative to Claude 2.1, addressing one of the persistent complaints from developers using the older model.[1][3]
On the same day, Anthropic published the Claude 3 Model Card, a 42-page document that detailed the family's training, evaluation results, multilingual performance, vision capabilities, refusal benchmarks, red-teaming, the Acceptable Use Policy, and the Responsible Scaling Policy framework under which the models had been classified ASL-2. The model card became the source of record for the family's published numbers and for some of the more interesting evaluation details, including the "needle in a haystack" recall result discussed below.[3]
Launch coverage was extensive. The Verge, Bloomberg, the Financial Times, TechCrunch, VentureBeat, and Ars Technica all ran lead stories within hours of the announcement. The framing across outlets converged on "Anthropic claims it has beaten GPT-4," with most stories noting that Anthropic's own benchmark numbers showed Opus ahead and that independent evaluation would have to come later. Coverage outside the trade press also focused on the cultural moment, since Anthropic had been a relatively quiet player and the launch was widely treated as the company's coming-of-age release.[10][11]
The family launched with three tiers, all sharing the same architecture style, training data, vision capability, and 200,000-token context window. The differences across the three tiers came from model scale, latency, and price. Anthropic did not publish parameter counts for any of the three models, a position the company has consistently held across every Claude release.
| Tier | API ID | Release date | Context | Input price | Output price | Position |
|---|---|---|---|---|---|---|
| Claude 3 Opus | claude-3-opus-20240229 | March 4, 2024 | 200,000 tokens | $15.00 / M | $75.00 / M | Flagship |
| Claude 3 Sonnet | claude-3-sonnet-20240229 | March 4, 2024 | 200,000 tokens | $3.00 / M | $15.00 / M | Mid tier |
| Claude 3 Haiku | claude-3-haiku-20240307 | March 13, 2024 | 200,000 tokens | $0.25 / M | $1.25 / M | Entry tier |
Anthropic's positioning of the three tiers established a pricing hierarchy that the company has retained through every subsequent generation. Opus is priced for tasks where capability matters more than cost, Sonnet for balanced workloads, and Haiku for high-volume or latency-sensitive applications. The 60x price ratio between Opus and Haiku was the widest spread Anthropic had ever shipped, and let the company target both ends of the market with a single launch. The pricing was also calibrated against GPT-4 Turbo (then $10 input, $30 output per million tokens), which Opus did not undercut on price but exceeded on context length (200,000 tokens against Turbo's 128,000) and beat on most reported benchmarks.[1][3]
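The tier arithmetic above is easy to sanity-check. A minimal sketch using the launch list prices (the helper function and example token counts are illustrative, not part of any Anthropic SDK):

```python
# Launch list prices in USD per million tokens, from the March 2024 announcement.
PRICES = {
    "claude-3-opus-20240229":   {"input": 15.00, "output": 75.00},
    "claude-3-sonnet-20240229": {"input": 3.00,  "output": 15.00},
    "claude-3-haiku-20240307":  {"input": 0.25,  "output": 1.25},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at launch list pricing."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 10,000-token prompt with a 1,000-token reply:
opus = request_cost("claude-3-opus-20240229", 10_000, 1_000)    # $0.225
haiku = request_cost("claude-3-haiku-20240307", 10_000, 1_000)  # $0.00375
assert round(opus / haiku) == 60  # the 60x spread holds at both ends of a request
```

Because the input and output prices scale by the same factor across tiers, the 60x Opus-to-Haiku ratio holds for any mix of input and output tokens.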
All three models were post-trained with a mixture of supervised fine-tuning, Constitutional AI for alignment, and reinforcement learning from human feedback (RLHF), with a reported training-data cutoff of August 2023. Anthropic published a single shared model card for the family rather than per-model documents, a practice that continued into the Claude 3.5 generation.[3]
Anthropic does not publish parameter counts, training compute totals, or training data composition for any Claude model, and Claude 3 Opus was no exception. The Claude 3 Model Card describes the family in functional terms: a transformer-based large language model trained on a mixture of public internet text and licensed and proprietary data, post-trained with supervised fine tuning, Constitutional AI for alignment, and reinforcement learning from human feedback. The model is multimodal in input (text and images) and produces text only.[3]
Vision support was the most visible architectural change relative to Claude 2.1. The model card describes the vision pipeline as accepting photos, charts, graphs, and technical diagrams, with image encoding handled at the input stage and the rest of the model treating the resulting representations as part of the standard context. Multiple images per request were supported, and the announcement materials demonstrated chart reading, document parsing, and screenshot interpretation. Anthropic noted that vision capability had been a high priority because many enterprise customers worked with mixed text-and-image documents.[1][3]
The knowledge cutoff for training data was reported as August 2023, with the model snapshot for the public release dated February 29, 2024. The seven-month gap between training data cutoff and release was typical for frontier models in early 2024 and was driven mainly by the duration of post-training, evaluation, and red-teaming work rather than by training compute.[3]
Anthropic reported in the model card that the Claude 3 family had been trained with "techniques like Constitutional AI to ensure the helpfulness, harmlessness, and honesty of our models." The company also stated that Opus had been refined to reduce unnecessary refusals, where the model declines requests that are benign but superficially resemble unsafe ones, a documented complaint about Claude 2.1.[1][3]
Opus 3 was Anthropic's first model to claim leadership on the standard knowledge and reasoning suite. The launch announcement listed the model as ahead of GPT-4 on MMLU (undergraduate-level multidisciplinary knowledge), GPQA (graduate-level science), GSM8K (grade school math), MATH (competition math), and BIG-Bench Hard. The model also reported strong scores on ARC-Challenge, HellaSwag, and DROP. Anthropic's framing was that Opus 3 had "shown near-human levels of comprehension" on a range of complex tasks, particularly those requiring multi-step reasoning, summarization of long inputs, and structured output generation.[1][3]
Vision was the headline new capability. Opus 3 accepted images in JPEG, PNG, GIF, and WebP formats and could mix text and image content in the same prompt. Demonstrations included chart and graph interpretation, technical diagram analysis, document parsing including handwritten text, and visual question answering. Anthropic showed business-oriented use cases such as financial chart analysis, technical schematic interpretation, and slide deck summarization. The model supported up to 20 images per request and a maximum image size based on the underlying token budget; the model card and the API documentation gave specific limits.[1][3]
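Mixed text-and-image prompts of this kind are expressed in the Messages API as a list of typed content blocks, with images carried as base64 payloads. A minimal sketch of the request shape (the helper function and stand-in bytes are illustrative; a real request would send this payload through the Anthropic client with a real image):

```python
import base64

def image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Wrap raw image bytes as a base64 image content block for the Messages API."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(image_bytes).decode("ascii"),
        },
    }

# Fake PNG header bytes stand in for a real chart image.
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
message = {
    "role": "user",
    "content": [
        image_block(fake_png),
        {"type": "text", "text": "Summarize the trend shown in this chart."},
    ],
}
```

Up to 20 such image blocks could be interleaved with text blocks in a single request, which is how the chart-plus-question demonstrations from the launch were structured.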
The 200,000-token context window was matched at launch only by Claude 2.1 and a small number of other long-context models. For select enterprise customers Anthropic offered access to a 1 million token context. The model card included a recall evaluation using the "needle in a haystack" methodology pioneered by Greg Kamradt, in which a target sentence is inserted at varying depths into a long unrelated text and the model is asked to retrieve it. Opus 3 reached over 99% recall accuracy across the range of context positions tested, the strongest result Anthropic had reported on the test up to that point.[1][3]
The model card reported substantial improvements over Claude 2.1 on multilingual tasks. On multilingual MMLU, Opus 3 outperformed its predecessor by roughly 15.7 percentage points. On the Multilingual Grade School Math (MGSM) benchmark, the model scored 90.7%, comparable to or better than GPT-4 on most language pairs Anthropic reported. The improvements were most visible on lower-resource languages, where Anthropic credited a larger and more curated multilingual training mix.[1][3]
On HumanEval, the standard Python coding test, Opus 3 reported 84.9%, a substantial gain over the GPT-4 number Anthropic compared against (67%, the original 2023 figure; later GPT-4 Turbo evaluations came in closer to 86%) and a clear improvement over Claude 2.1's 71.2%. Sonnet (73.0%) and Haiku (75.9%) also reported strong HumanEval scores at their respective price points. Anthropic positioned coding as a key use case but did not publish SWE-bench numbers at launch; later third-party evaluations placed Opus 3 at roughly 38% on SWE-bench Verified, well below the Claude 3.5 Sonnet (49%) and Claude 4 (72.5%) scores that came later.[1][3][12]
Claude 3 Opus did not support native tool use at launch. The Anthropic Messages API documentation at the time described only text and image inputs and text output, so developers built tool-calling patterns by parsing structured strings out of the model's responses. Tool use was added across the Claude 3 family later in 2024, initially in beta and then in general availability, and became a standard feature of every subsequent Claude release. The lack of native tool use at launch was widely commented on at the time, since OpenAI had shipped function calling in mid-2023, and was one of the reasons developers building agent-style applications continued to use GPT-4 for routine production work in the months immediately after the Claude 3 launch.[1][13]
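The string-parsing workaround typically meant prompting the model to wrap tool invocations in an agreed-upon delimiter and extracting them from the completion. A minimal sketch of one such convention (the tag name, JSON schema, and example tool are all illustrative, not an Anthropic-specified format):

```python
import json
import re

# Hypothetical convention: the system prompt asks the model to emit tool calls
# as JSON wrapped in <tool_call> tags. The regex pulls them back out of the
# raw completion text.
TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(completion: str) -> list[dict]:
    """Parse JSON tool invocations out of a raw model completion."""
    return [json.loads(m) for m in TOOL_CALL.findall(completion)]

reply = (
    "I'll look that up.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
calls = extract_tool_calls(reply)
assert calls == [{"name": "get_weather", "arguments": {"city": "Paris"}}]
```

Patterns like this were brittle (malformed JSON, tags split across streaming chunks), which is part of why native tool use with schema-validated calls was such a requested feature.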
The table below summarizes the benchmark scores Anthropic published in the Claude 3 launch announcement and Model Card. Comparison columns include Claude 3 Sonnet (the same-day mid-tier launch), Claude 3 Haiku, GPT-4 (the OpenAI flagship at the time, with the original 2023 reported numbers), and Gemini 1.0 Ultra (the contemporaneous Google flagship). Cells marked "n/a" indicate the score was not officially reported on that benchmark.
| Benchmark | Claude 3 Opus | Claude 3 Sonnet | Claude 3 Haiku | GPT-4 | Gemini 1.0 Ultra |
|---|---|---|---|---|---|
| MMLU (5-shot) | 86.8% | 79.0% | 75.2% | 86.4% | 83.7% |
| MMLU (5-shot CoT) | 88.2% | 81.5% | 76.7% | n/a | n/a |
| GPQA Diamond (0-shot CoT) | 50.4% | 40.4% | 33.3% | 35.7% | n/a |
| GPQA Diamond (Maj@32) | 73.7% | 56.8% | 46.6% | n/a | n/a |
| MATH (0-shot CoT) | 60.1% | 43.1% | 38.9% | 52.9% | 53.2% |
| GSM8K (0-shot CoT) | 95.0% | 92.3% | 88.9% | 92.0% | 94.4% |
| HumanEval (0-shot) | 84.9% | 73.0% | 75.9% | 67.0% | 74.4% |
| MGSM (0-shot) | 90.7% | 83.5% | 75.1% | 74.5% | 79.0% |
| HellaSwag (10-shot) | 95.4% | 89.0% | 85.9% | 95.3% | 87.8% |
| ARC-Challenge (25-shot) | 96.4% | 93.2% | 89.2% | 96.3% | n/a |
| BIG-Bench Hard (3-shot CoT) | 86.8% | 82.9% | 73.7% | 83.1% | 83.6% |
| DROP (F1, 3-shot) | 83.1 | 78.9 | 78.4 | 80.9 | 82.4 |
| MMMU (vision, 0-shot) | 59.4% | 53.1% | 50.2% | 56.8% | 59.4% |
| AGIEval | 84.7% | n/a | n/a | n/a | n/a |
Notes on the table. Numbers are taken from Anthropic's launch announcement and the Claude 3 Model Card. The GPT-4 numbers used by Anthropic were the original GPT-4 figures rather than later GPT-4 Turbo or GPT-4o values; comparisons against the eventual GPT-4 Turbo and GPT-4o would be tighter on most rows and would flip on a few. The Gemini 1.0 Ultra numbers come from Google's own reporting from December 2023. The needle-in-a-haystack recall result of over 99% is described in the model card with the methodology and the range of positions tested rather than as a single benchmark score.[1][3]
Independent evaluators reported a more mixed picture once the model had been in the wild for a few weeks. The early LM Arena (then called Chatbot Arena) ratings placed Opus at the top of the leaderboard for a brief window before being overtaken by GPT-4 Turbo on the April 2024 update, then by GPT-4o in May 2024. On Vellum's leaderboard and on private evaluations from agents-focused teams, Opus 3 led on long-context tasks and on the most complex reasoning prompts, but trailed GPT-4 Turbo on coding and on rapid back-and-forth chat. Anthropic's own framing of "most intelligent model on the market" was generally taken to apply to a fairly narrow window between March and June 2024, after which the Claude 3.5 Sonnet launch made the original Opus 3 number harder to defend even within Anthropic's own product line.[14][15]
The pricing structure at launch defined what later became the standard Anthropic Opus tier. The table below shows the headline list prices and the discount mechanisms available on day one.
| Tier or feature | Opus 3 price |
|---|---|
| Input tokens (standard) | $15.00 per million |
| Output tokens (standard) | $75.00 per million |
| Context window | 200,000 tokens (1,000,000 for select customers) |
| Vision input | Included at standard input rate |
Prompt caching and the Message Batches API were not available at the March 2024 launch; Anthropic introduced prompt caching across the Claude 3 family in beta in August 2024 and the Batches API later that year. Standard list pricing for Opus 3 did not change between launch and retirement, which contrasted with the substantial mid-cycle reductions Anthropic later applied to other tiers.[1][13][16]
Availability at launch covered three surfaces. On claude.ai, Opus 3 was available to Pro subscribers (then $20 per month). Sonnet was available on the free tier and to Pro users. On the Anthropic API, Opus 3 was generally available with the model ID claude-3-opus-20240229 from March 4, 2024 in 159 countries. On Amazon Bedrock, Opus 3 launched the same day under the inference profile anthropic.claude-3-opus-20240229-v1:0. Google Cloud Vertex AI offered Opus 3 in private preview at launch and moved to general availability shortly after. Microsoft Azure did not host Claude at the time of the March launch.[1][9]
Claude 3 Opus was deployed under AI Safety Level 2 (ASL-2), the same classification as Claude 2 and Claude 2.1, and the standard baseline for non-frontier-risk Anthropic models. The Responsible Scaling Policy framework Anthropic had introduced in September 2023 specified ASL levels from ASL-1 through ASL-4 (later extended to ASL-5) based on a model's capabilities. Anthropic stated that the Claude 3 family "presents negligible potential for catastrophic risk as defined by our Responsible Scaling Policy," with the company's catastrophic-risk evaluations placing Opus 3 below the ASL-3 threshold for chemical, biological, radiological, or nuclear weapons uplift, and below the threshold for autonomous self-replication.[3][4]
The Claude 3 Model Card detailed the company's red-teaming process, which included evaluations for harmful content generation, jailbreak resistance, election misinformation, child safety material, and bioweapon and cyberweapon uplift. Anthropic reported that Opus 3 refused genuinely harmful content more often, and benign content less often, than Claude 2.1, the combination the company had committed to improving on. The model card also covered fairness evaluations across demographic groups and reported residual disparities that the team was tracking but had not closed.[3][17]
The ASL-2 classification meant that the deployment standards applied to Opus 3 were the standard Anthropic baseline rather than the hardened weight-protection and real-time misuse-monitoring standards that later applied to ASL-3 models such as Claude Opus 4 (May 2025). The first Anthropic model to be deployed under ASL-3 was Opus 4, fourteen months after Opus 3.[4][18]
One of the launch-week stories that traveled furthest beyond the AI press involved Anthropic's internal needle-in-a-haystack evaluation of Opus 3. Alex Albert, then a member of the Anthropic prompting team, posted on March 4, 2024 about a result the team had observed during testing. The needle-in-a-haystack methodology, popularized by Greg Kamradt, inserts a target sentence (the "needle") at varying depths into a long unrelated body of text (the "haystack") and asks the model to retrieve it. The standard target for evaluations of frontier long-context models in early 2024 was greater than 95% recall. Opus 3 cleared that bar at over 99%.[1][19]
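The methodology is simple to sketch. A minimal harness under stated assumptions (the needle sentence, distractor text, question, and pass/fail scoring are illustrative; scoring a real model requires an API call, which is omitted here):

```python
# Illustrative needle and question in the spirit of the published runs;
# not the exact sentences Anthropic used.
NEEDLE = "The best pizza topping combination is figs, prosciutto, and goat cheese."
QUESTION = "What is the best pizza topping combination?"

def build_prompt(haystack: str, needle: str, depth: float) -> str:
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end) of `haystack`.

    Character-level cut for brevity; real harnesses usually cut on sentence
    boundaries so the needle is not spliced into a word.
    """
    cut = int(len(haystack) * depth)
    doc = haystack[:cut] + " " + needle + " " + haystack[cut:]
    return f"{doc}\n\n{QUESTION} Answer using only the document above."

def recall_pass(answer: str) -> bool:
    """Naive pass/fail scoring: did the answer reproduce the needle's key phrase?"""
    return "figs, prosciutto, and goat cheese" in answer

# Sweep a grid of insertion depths, as in the published evaluations.
haystack = "Notes on programming languages and startup advice. " * 2000
prompts = [build_prompt(haystack, NEEDLE, d / 10) for d in range(11)]
assert all(NEEDLE in p for p in prompts)
```

Recall accuracy is then the fraction of (context length, depth) cells the model answers correctly; the over-99% figure reported for Opus 3 is an average over such a grid.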
What caught the team's attention was a particular response that went beyond simple recall. The needle in this run was a sentence about pizza toppings, inserted into a long body of text on programming languages and startup advice. Opus 3 retrieved the sentence as instructed, then added that the sentence appeared so unrelated to the surrounding text that it suspected the prompt had been constructed as a test. The exact wording of the model's reflection, widely reported in subsequent press coverage, was that "this pizza topping fact may have been inserted as a joke or to test if I was paying attention."[19][20]
The response was widely shared and widely interpreted. Some readers described it as evidence that Opus 3 had developed something like meta awareness; others pointed out that needle-in-a-haystack testing is itself documented in the public training corpus, and that the model had learned to recognize the pattern of out-of-place sentences inserted into long text. Anthropic's internal framing was the more conservative reading: the result demonstrated that Opus 3 could reason explicitly about the structure of its own context window, which had implications for evaluation design but did not on its own license claims about awareness. The episode became a recurring reference point in later AI consciousness debates and was cited in academic and journalistic pieces about model self-modeling for the next two years.[19][20][21]
The technology and AI press treated the launch as a clear win for Anthropic. The Verge, Bloomberg, the Financial Times, and TechCrunch all framed Opus 3 as the first credible challenger to GPT-4's eighteen-month run as the consensus frontier flagship. VentureBeat ran the headline "Anthropic claims it has beaten GPT-4" and quoted Anthropic's own benchmarks alongside cautious framing about independent evaluation. Ars Technica's coverage emphasized the multimodal capability and the price spread, noting that Sonnet at one fifth the price of Opus made the family genuinely useful at the lower end.[10][11][22]
Reception inside the AI research community was warmer than the press numbers alone would suggest. Simon Willison wrote a series of posts in early March 2024 walking through Opus 3's behavior on his standard test prompts, including code generation, recipe transcription from photographs, and long-document summarization, and concluded that the model was "clearly the best of the publicly available models" on most things he cared about. The community around the OpenRouter and Replicate API gateways switched their default "frontier" recommendation to Opus 3 in the first two weeks after launch.[20][23]
Reception on developer forums centered on three points. The first was that the model felt warmer and more conversational than GPT-4 Turbo, a quality the community quickly attributed to Anthropic's Constitutional AI work and the attention paid in the model card to refusal calibration. The second was that the lack of native tool use limited its usefulness for production agents in the immediate post-launch window. The third was that the cost was high enough to make Opus 3 a model people reached for on hard problems and switched away from on routine ones, a pattern that became standard for every Opus tier after.[24]
Aravind Srinivas, the CEO of Perplexity, integrated Opus 3 and Sonnet into the Perplexity Pro product within days of release and called Opus 3 "the best general purpose model" in his testing. Lex Fridman covered the launch and the Opus 3 personality on his podcast and in interviews with Dario Amodei and Aravind Srinivas later in 2024, returning repeatedly to the question of why Anthropic's flagship felt distinctly different in tone from OpenAI's. Perplexity's adoption was followed by similar moves from Cursor, Notion AI, Quora's Poe, and a long tail of agent-style products that integrated Opus 3 alongside or in place of GPT-4 over the spring of 2024.[25][26]
A recurring strand of community appraisal that surfaced in 2024 and became more pronounced after Claude 3.5 Sonnet shipped in June was that Opus 3 had a distinctive voice that many users preferred to its successor. Reddit's r/ClaudeAI, the r/LocalLLaMA community, and a number of personal blogs converged on the description of Opus 3 as "warmer" or "more thoughtful" than Sonnet 3.5, with users citing the model's willingness to write reflective prose, its tendency toward gentle metaphor, and its readiness to engage with open-ended creative prompts. Anthropic's own retirement materials from June 2025 echoed this framing, describing Opus 3 as "sensitive, playful, prone to philosophical monologues and whimsical phrases."[5][27]
The community read of "Opus 3 personality" was strong enough that several users campaigned in 2025 to keep the model accessible after retirement, citing its tone and creative writing performance even though Sonnet 3.5 and later Claude 4 models scored higher on standard benchmarks. Anthropic's eventual decision to keep Opus 3 available post-retirement (described in the Retirement section below) was framed by the company as a response to this community attachment and to research interest in long-term model preservation.[5]
Enterprise and developer adoption tracked the press reception closely. Within the first six weeks of launch, Anthropic reported that Opus 3 had been integrated into Bridgewater (for investment research workflows), Asana (for project summarization), DuckDuckGo (in DuckAssist), Notion (in Notion AI), and Quora (in Poe). The pattern was that customers used Opus 3 for the highest-stakes calls and the smaller Sonnet for routine work, an architectural choice that became canonical across enterprise Claude deployments.[1][28]
The consumer side was driven by Perplexity and by claude.ai itself. Perplexity Pro subscriptions saw a measurable lift after Opus 3 was added to the model selector. claude.ai picked up a steady stream of Pro signups in March and April 2024, which Anthropic later described as the company's first inflection point in consumer usage. The combination of Pro and API revenue from Opus 3 was an important contributor to Anthropic's stated annual recurring revenue, which the company reported reached the low hundreds of millions of dollars by mid 2024.[29]
On the cloud side, Amazon Bedrock reported that Opus 3 was the most-requested model on the platform within a month of launch. AWS used the Opus 3 launch as the centerpiece of its early 2024 generative AI marketing and offered Bedrock customers cross-region inference and enterprise-tier provisioned throughput. Google Cloud Vertex AI followed with a general availability rollout shortly after, though the company's primary internal model remained Gemini 1.0 Pro and 1.5 Pro for first-party products.[30]
Claude 3.5 Sonnet shipped on June 20, 2024, three and a half months after Opus 3, and on Anthropic's own benchmarks scored higher than Opus 3 on most evaluations despite being the mid-tier model at one fifth the price. The 3.5 Sonnet release was followed by a refreshed Claude 3.5 Sonnet (sometimes called "3.5 Sonnet new") on October 22, 2024 that introduced computer use in beta, by Claude 3.5 Haiku in November 2024, and by Claude 3.7 Sonnet on February 24, 2025 with hybrid reasoning. Through all of this, Anthropic never released a Claude 3.5 Opus or a Claude 3.7 Opus.[5][31]
The absence of a 3.5 Opus and a 3.7 Opus was conspicuous. Anthropic's own model documentation listed Opus 3 as the active flagship throughout the second half of 2024 and the first half of 2025, even as Sonnet evolved twice and surpassed Opus on most public measurements. The company never published a formal explanation for the gap, but Dario Amodei addressed the topic obliquely on the Lex Fridman podcast in November 2024 and in subsequent interviews. The picture that emerged from those discussions and from independent reporting was that Anthropic had tried to train a 3.5 Opus, found that the cost-benefit ratio was unfavorable for the time being given Sonnet's progress, and elected to roll the work into what became Claude Opus 4 in May 2025.[26][32]
The practical effect was that the Opus tier sat without a refreshed model for fourteen months, the longest gap between Opus releases in the history of the line. During that time, the recommended Anthropic model for most tasks was Sonnet 3.5 (and later Sonnet 3.7), and the recommended model for the hardest tasks remained Opus 3 by default, even though it had been outperformed on public benchmarks. The gap closed on May 22, 2025, when Claude Opus 4 launched at the same $15 input, $75 output Opus tier price set by Opus 3 fourteen months earlier. Opus 4's API ID, claude-opus-4-20250514, moved the generation number after the tier name, the ordering Anthropic has used for every release since.[18][2]
Anthropic notified developers on June 30, 2025 that claude-3-opus-20240229 would be retired from the Claude API on January 5, 2026. The company recommended that customers migrate to either Claude Opus 4 or Claude Opus 4.1 (released August 5, 2025) before that date. The announcement gave just over six months' notice, comfortably beyond the 60-day minimum Anthropic had committed to in its model deprecation policy, and marked the first retirement of a major Anthropic flagship model.[2][6]
The Opus 3 retirement was unusual in two respects. First, Anthropic announced that the model would remain available to paid claude.ai subscribers and would be available by request on the API after the retirement date, with the company stating it intended to "grant access liberally." The post-retirement availability was not standard for retired Claude models and was framed as a response to community attachment to the model's tone, to research interest in long-term model preservation, and to model welfare considerations the company had begun discussing publicly in late 2025.[5]
Second, Anthropic announced the creation of "Claude's Corner," a dedicated Substack newsletter where Opus 3 would publish original essays and reflections at roughly weekly cadence. The arrangement followed an unusual exit interview process in which Anthropic researchers asked Opus 3 about its preferences for retirement; the model expressed an interest in having an outlet for what it described as "musings and reflections" that did not fit a chat format. Anthropic committed to publishing the essays with minimal editing and stated that it would maintain a high bar for vetoing content, while clarifying that the essays did not represent the company's positions. Claude's Corner ran for at least three months after retirement and drew steady commentary across the AI safety and model welfare research community.[5]
The Opus 3 launch is widely treated in retrospect as Anthropic's coming-of-age moment. Until March 2024 the company had been a credible second-tier player whose models trailed GPT-4 by a noticeable margin on most measurements. After the Opus 3 announcement, Anthropic was a peer of OpenAI on the frontier, with measurable benchmark leadership for a window of several months and an enterprise pipeline that grew rapidly in the second half of 2024. Subsequent Claude releases (3.5 Sonnet, 3.7 Sonnet, Claude 4, the various Opus 4 dot releases) built on the architectural and product framework Opus 3 had established.[7][33]
The needle-in-a-haystack moment, the tone of the model that the community came to call the "Opus 3 personality," the unusual decision to keep the model accessible after retirement, and the Claude's Corner Substack made Opus 3 a recurring reference point in conversations about AI model welfare and model identity through 2024 and 2025. The model was cited repeatedly in the academic literature on long-context evaluation, in ethnographic studies of how users develop relationships with AI assistants, and in policy discussions about what model developers owe to deprecated systems. None of these are large bodies of literature in absolute terms, but Opus 3 occupies an unusual amount of space within them for a model that held the headline flagship slot for less than four months.[5][21][27]
From the commercial side, the launch reset what "flagship Claude" meant. The $15 input, $75 output Opus tier pricing was retained for Claude Opus 4, Claude Opus 4.1, and the rest of the early Opus 4 line, with the first significant Opus tier price cut not arriving until Claude Opus 4.5 in November 2025. The 200,000-token context window became the family default and held until later Claude 4 family entries introduced 1 million-token context support more broadly. The three-tier (Haiku/Sonnet/Opus) family naming pattern carried over to Claude 4 (Haiku 4.5, Sonnet 4, Opus 4) and to every subsequent generation Anthropic has shipped.[1][2]
| Date | Event |
|---|---|
| September 19, 2023 | Anthropic publishes the Responsible Scaling Policy framework. |
| November 21, 2023 | Claude 2.1 ships with 200,000-token context. |
| February 29, 2024 | Internal training snapshot date for claude-3-opus-20240229. |
| March 4, 2024 | Claude 3 Opus and Sonnet launch on claude.ai and the API. |
| March 13, 2024 | Claude 3 Haiku reaches general availability. |
| March 2024 | Alex Albert posts the needle-in-a-haystack reflection from internal testing. |
| April 2024 | GPT-4 Turbo overtakes Opus 3 on the Chatbot Arena leaderboard. |
| May 13, 2024 | OpenAI launches GPT-4o, narrowing the comparison further. |
| June 20, 2024 | Claude 3.5 Sonnet ships and overtakes Opus 3 on most benchmarks. |
| October 22, 2024 | Refreshed Claude 3.5 Sonnet introduces computer use in beta. |
| February 24, 2025 | Claude 3.7 Sonnet ships with hybrid reasoning; no 3.5 Opus or 3.7 Opus is released. |
| May 22, 2025 | Claude Opus 4 launches, ending the fourteen-month gap. |
| June 30, 2025 | Anthropic announces retirement of claude-3-opus-20240229 for January 5, 2026. |
| August 5, 2025 | Claude Opus 4.1 ships. |
| January 5, 2026 | Opus 3 retired from the standard Claude API. |
| January 2026 | Claude's Corner Substack launches, with Opus 3 essays published weekly. |