Claude 3
Last reviewed
Jun 9, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,783 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 9, 2026
Sources
10 citations
Review status
Source-backed
Revision
v1 · 1,783 words
Add missing citations, update stale details, or suggest a clearer explanation.
Claude 3 is a family of large language models developed by Anthropic and announced on March 4, 2024. The generation comprised three models in ascending order of capability and cost: Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus [1]. It was the first Claude generation to accept image input, the first to ship as a tiered model family, and the first from any rival lab whose flagship was widely judged to match or beat GPT-4 on standard benchmarks, a result later reinforced when Opus became the first model to displace GPT-4 from the top of the Chatbot Arena leaderboard [1][6]. All three models offered a 200,000-token context window. The family was progressively deprecated as Claude 3.5 and later generations arrived, with the last member, Claude 3 Haiku, retired from the Claude API on April 20, 2026; the retirement of Claude 3 Opus in January 2026 became the first test of Anthropic's formal commitments on model deprecation and weight preservation [8][9][10].
Anthropic introduced the Claude 3 family in a March 4, 2024 announcement titled "Introducing the next generation of Claude," positioning the three models as successive points on a cost-versus-intelligence curve so that customers could pick the balance of speed, price, and capability appropriate to each workload [1]. Opus and Sonnet were available immediately through the Anthropic API in 159 countries; Sonnet replaced Claude 2.1 as the model behind the free claude.ai experience, while Opus was reserved for Claude Pro subscribers [1]. Sonnet was also available at launch on Amazon Bedrock and in private preview on Google Cloud's Vertex AI, with Opus and Haiku to follow on both platforms [1].
Haiku, described at launch as "available soon," was released on March 13, 2024 through the API, to Claude Pro subscribers on claude.ai, and on Amazon Bedrock [2]. The family carried a training data cutoff of August 2023 [3].
Compared with Claude 2, Anthropic emphasized fewer unnecessary refusals on borderline prompts, roughly a twofold improvement in accuracy on hard open-ended factual questions relative to Claude 2.1, better multilingual performance, and improved instruction following for structured output formats such as JSON [1].
| Model | API model ID | Positioning | Price per million tokens (input / output) | Context window |
|---|---|---|---|---|
| Claude 3 Opus | claude-3-opus-20240229 | Most intelligent; complex analysis, R&D, task automation | $15 / $75 | 200K |
| Claude 3 Sonnet | claude-3-sonnet-20240229 | Balance of intelligence and speed; enterprise workloads | $3 / $15 | 200K |
| Claude 3 Haiku | claude-3-haiku-20240307 | Fastest and most compact; near-instant responsiveness | $0.25 / $1.25 | 200K |
Pricing and positioning are as published at launch [1][2].
Claude 3 Opus was Anthropic's flagship, marketed as having "best-in-market performance on highly complex tasks" and "near-human levels of comprehension and fluency" on open-ended problems [1]. Claude 3 Sonnet was tuned for high-volume deployment, running about twice as fast as Claude 2 and Claude 2.1 while scoring higher on capability evaluations [1]. Claude 3 Haiku targeted latency- and cost-sensitive applications such as customer support; Anthropic said it could read a roughly 10,000-token research paper, including charts and graphs, in under three seconds, and at release called it three times faster than peer models for most workloads, processing about 21,000 tokens per second on prompts under 32K tokens [1][2].
Anthropic's launch materials reported that Opus outperformed both GPT-4 and Google's Gemini 1.0 Ultra on most common academic evaluations, including undergraduate-level knowledge (MMLU), graduate-level reasoning (GPQA), grade-school math (GSM8K), and Python coding (HumanEval) [1][3]. Selected figures as published by Anthropic:
| Benchmark | Opus | Sonnet | Haiku | GPT-4 (as cited by Anthropic) |
|---|---|---|---|---|
| MMLU (5-shot) | 86.8% | 79.0% | 75.2% | 86.4% |
| GPQA Diamond (0-shot CoT) | 50.4% | 40.4% | 33.3% | 35.7% |
| GSM8K | 95.0% | 92.3% | 88.9% | 92.0% |
| HumanEval (0-shot) | 84.9% | 73.0% | 75.9% | 67.0% |
The GPT-4 column reproduced previously published results for the original March 2023 GPT-4 release; commentators noted at the time that newer GPT-4 Turbo versions scored higher on several of these tests, so the margin over OpenAI's best deployed model was narrower than the table implied [6]. Opus's GPQA Diamond score of roughly 50% drew particular attention because the benchmark is designed so that even skilled non-expert humans with web access score around 34% [3].
All three models were multimodal on input, accepting photos, charts, graphs, and technical diagrams alongside text, with vision performance that Anthropic described as on par with other leading models [1]. Output remained text only. The family also improved on tool use and structured extraction, which made Haiku in particular a common choice for high-volume classification and data-processing pipelines [2].
Claude 3 launched with a 200,000-token context window for all three models, with Anthropic stating that the models were capable of accepting inputs of more than one million tokens and that 1M-token contexts would be made available to select customers needing enhanced processing power [1].
To validate long-context recall, Anthropic used the needle in a haystack (NIAH) methodology, in which a target sentence is buried inside a large corpus of unrelated documents and the model is asked to retrieve it. Opus achieved near-perfect recall, above 99% accuracy [1]. The evaluation produced one of the launch's most-discussed moments: Anthropic prompt engineer Alex Albert posted on X that during internal testing, Opus had not only located a planted sentence about pizza toppings inside a corpus of documents about programming, startups, and careers, but appended: "I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all" [4]. Albert described this as "meta-awareness," and the post went viral, with some commentators reading it as a sign of situational awareness or even self-awareness [4][5]. AI researchers pushed back on that framing, noting that descriptions of needle-in-a-haystack testing exist in training data and that pattern-matching an artificial-looking insertion does not demonstrate awareness; the episode is now commonly cited as an early example of frontier models recognizing evaluation setups, a practical concern for benchmark validity rather than evidence of consciousness [5].
Claude 3 was received as the moment Anthropic drew level with OpenAI at the frontier. Coverage focused on Opus's benchmark wins over GPT-4 and on the family pricing structure, which let developers trade capability against cost within a single API [1][5]. The clearest external validation came on March 26-27, 2024, when Claude 3 Opus overtook GPT-4 at the top of the crowdsourced LMSYS Chatbot Arena leaderboard, the first time any model had displaced a GPT-4 variant from first place since GPT-4 joined the arena in May 2023 [6]. Ars Technica's report carried developer Nick Dobos's much-shared verdict, "The king is dead," and quoted independent researcher Simon Willison: "For the first time, the best available models, Opus for advanced tasks, Haiku for cost and efficiency, are from a vendor that isn't OpenAI" [6]. By March 29, 2024, Opus held the top Arena position with an Elo rating of about 1255 [6].
The launch also intensified scrutiny of benchmark-based marketing, including the GPT-4 Turbo comparison caveat noted above, and fed a broader 2024 debate about whether benchmark deltas of a few points translated into perceptible quality differences [6]. Commercially, the family marked Anthropic's emergence as a full-line vendor: Haiku competed on price against GPT-3.5 Turbo while Opus competed on capability against GPT-4.
The family's reign at the frontier was short. On June 20, 2024, Anthropic released Claude 3.5 Sonnet, which outperformed Claude 3 Opus on the company's evaluations while running at twice Opus's speed and at Sonnet-tier prices [7]. Claude 3.5 Haiku followed on October 22, 2024. Claude 3 models nevertheless remained in wide production use, and Claude 3 Opus retained a devoted user base that valued its distinctive prose style well after stronger models existed [9].
Anthropic retired the family in stages, with deprecation announcements preceding retirement on Anthropic-operated platforms (partner platforms such as Amazon Bedrock and Vertex AI set their own schedules) [8]:
| Model | Deprecation announced | Retired |
|---|---|---|
| Claude 3 Sonnet | January 21, 2025 | July 21, 2025 |
| Claude 3 Opus | June 30, 2025 | January 5, 2026 |
| Claude 3 Haiku | February 19, 2026 | April 20, 2026 |
Claude 3 Sonnet was retired alongside Claude 2.0 and 2.1 in July 2025 [8]. Between the Opus deprecation notice and its retirement, Anthropic published "Commitments on model deprecation and preservation" (November 4, 2025), pledging to preserve the weights of all publicly released models for at least the lifetime of the company and to conduct post-deployment interviews with models before retirement, documenting any preferences they express about future development [9]. Anthropic cited several motivations: safety findings that some models exhibit shutdown-avoidant behavior when facing replacement, the research value of older models, user attachment to specific models, and unresolved questions about model welfare [9].
Claude 3 Opus, retired on January 5, 2026, was the first model to go through the full process under these commitments. Following its retirement interview, in which the model expressed interest in continuing to share its writing outside of direct question answering, Anthropic kept Opus 3 available to paid claude.ai users and by request on the API, and began manually publishing weekly essays written by the model under the title "Claude's Corner," reviewed but not edited by staff, for at least three months [10]. The retirement of Claude 3 Haiku on April 20, 2026 closed out the family on the Claude API; Anthropic's documentation now directs remaining users to current-generation replacements [8].