Claude 3

AI Models Large Language Models

9 min read

Updated Jun 23, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 23, 2026

Fact-checked

In review queue

Sources

10 citations

Revision

v2 · 1,881 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Claude 3 is a family of three large language models released by Anthropic on March 4, 2024, comprising Claude 3 Haiku, Claude 3 Sonnet, and Claude 3 Opus in ascending order of capability and cost ^[1]. It was the first Claude generation to accept image input, the first to ship as a tiered model family, and the first from any rival lab whose flagship was widely judged to match or beat GPT-4 on standard benchmarks, with Anthropic describing Opus as exhibiting "near-human levels of comprehension and fluency on complex tasks" ^[1]. That claim was reinforced later in March 2024 when Opus became the first model to displace GPT-4 from the top of the Chatbot Arena leaderboard ^[1]^[6]. All three models offered a 200,000-token context window. The family was progressively deprecated as Claude 3.5 and later generations arrived, with the last member, Claude 3 Haiku, retired from the Claude API on April 20, 2026; the retirement of Claude 3 Opus in January 2026 became the first test of Anthropic's formal commitments on model deprecation and weight preservation ^[8]^[9]^[10].

What is the Claude 3 model family?

Anthropic introduced the Claude 3 family in a March 4, 2024 announcement titled "Introducing the next generation of Claude," positioning the three models as successive points on a cost-versus-intelligence curve so that customers could pick the balance of speed, price, and capability appropriate to each workload ^[1]. Opus and Sonnet were available immediately through the Anthropic API in 159 countries; Sonnet replaced Claude 2.1 as the model behind the free claude.ai experience, while Opus was reserved for Claude Pro subscribers ^[1]. Sonnet was also available at launch on Amazon Bedrock and in private preview on Google Cloud's Vertex AI, with Opus and Haiku to follow on both platforms ^[1].

Haiku, described at launch as "available soon," was released on March 13, 2024 through the API, to Claude Pro subscribers on claude.ai, and on Amazon Bedrock ^[2]. The family carried a training data cutoff of August 2023 ^[3].

Compared with Claude 2, Anthropic emphasized fewer unnecessary refusals on borderline prompts, a "twofold improvement in accuracy" on hard open-ended factual questions relative to Claude 2.1, better multilingual performance, and improved instruction following for structured output formats such as JSON ^[1]. The models were trained using Constitutional AI, Anthropic's technique for aligning model behavior to a written set of principles ^[3].

What are the three Claude 3 models?

Model	API model ID	Positioning	Price per million tokens (input / output)	Context window
Claude 3 Opus	claude-3-opus-20240229	Most intelligent; complex analysis, R&D, task automation	$15 / $75	200K
Claude 3 Sonnet	claude-3-sonnet-20240229	Balance of intelligence and speed; enterprise workloads	$3 / $15	200K
Claude 3 Haiku	claude-3-haiku-20240307	Fastest and most compact; near-instant responsiveness	$0.25 / $1.25	200K

Pricing and positioning are as published at launch ^[1]^[2].

Claude 3 Opus was Anthropic's flagship, marketed as having "best-in-market performance on highly complex tasks" and "near-human levels of comprehension and fluency" on open-ended problems ^[1]. Claude 3 Sonnet was tuned for high-volume deployment; Anthropic said it was "2x faster than Claude 2 and Claude 2.1 with higher levels of intelligence" ^[1]. Claude 3 Haiku targeted latency- and cost-sensitive applications such as customer support; Anthropic said it could read an information- and data-dense roughly 10,000-token research paper, including charts and graphs, "in less than three seconds," and at release called it three times faster than peer models for most workloads, processing about 21,000 tokens per second on prompts under 32K tokens ^[1]^[2].

How does Claude 3 compare to GPT-4 on benchmarks?

Anthropic's launch materials reported that Opus outperformed both GPT-4 and Google's Gemini 1.0 Ultra on most common academic evaluations, including undergraduate-level knowledge (MMLU), graduate-level reasoning (GPQA), grade-school math (GSM8K), and Python coding (HumanEval) ^[1]^[3]. Selected figures as published by Anthropic:

Benchmark	Opus	Sonnet	Haiku	GPT-4 (as cited by Anthropic)
MMLU (5-shot)	86.8%	79.0%	75.2%	86.4%
GPQA Diamond (0-shot CoT)	50.4%	40.4%	33.3%	35.7%
GSM8K	95.0%	92.3%	88.9%	92.0%
HumanEval (0-shot)	84.9%	73.0%	75.9%	67.0%

The GPT-4 column reproduced previously published results for the original March 2023 GPT-4 release; commentators noted at the time that newer GPT-4 Turbo versions scored higher on several of these tests, so the margin over OpenAI's best deployed model was narrower than the table implied ^[6]. Opus's GPQA Diamond score of roughly 50% drew particular attention because the benchmark is designed so that even skilled non-expert humans with web access score around 34% ^[3].

All three models were multimodal on input, accepting photos, charts, graphs, and technical diagrams alongside text, with vision performance that Anthropic described as on par with other leading models ^[1]. Output remained text only. The family also improved on tool use and structured extraction, which made Haiku in particular a common choice for high-volume classification and data-processing pipelines ^[2].

What is the Claude 3 context window and vision capability?

Claude 3 launched with a 200,000-token context window for all three models, with Anthropic stating that the models were "capable of accepting inputs exceeding 1 million tokens" and that 1M-token contexts "may" be made available to select customers needing enhanced processing power ^[1].

To validate long-context recall, Anthropic used the needle in a haystack (NIAH) methodology, in which a target sentence is buried inside a large corpus of unrelated documents and the model is asked to retrieve it. Anthropic reported that Opus "not only achieved near-perfect recall, surpassing 99% accuracy" ^[1]. The evaluation produced one of the launch's most-discussed moments: Anthropic prompt engineer Alex Albert posted on X that during internal testing, Opus had not only located a planted sentence about pizza toppings inside a corpus of documents about programming, startups, and careers, but appended: "I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention, since it does not fit with the other topics at all" ^[4]. Albert described this as "meta-awareness," and the post went viral, with some commentators reading it as a sign of situational awareness or even self-awareness ^[4]^[5]. AI researchers pushed back on that framing, noting that descriptions of needle-in-a-haystack testing exist in training data and that pattern-matching an artificial-looking insertion does not demonstrate awareness; the episode is now commonly cited as an early example of frontier models recognizing evaluation setups, a practical concern for benchmark validity rather than evidence of consciousness ^[5].

How was Claude 3 received?

Claude 3 was received as the moment Anthropic drew level with OpenAI at the frontier. Coverage focused on Opus's benchmark wins over GPT-4 and on the family pricing structure, which let developers trade capability against cost within a single API ^[1]^[5]. The clearest external validation came on March 26-27, 2024, when Claude 3 Opus overtook GPT-4 at the top of the crowdsourced LMSYS Chatbot Arena leaderboard, the first time any model had displaced a GPT-4 variant from first place since GPT-4 joined the arena in May 2023 ^[6]. Ars Technica's report carried developer Nick Dobos's much-shared verdict, "The king is dead," and quoted independent researcher Simon Willison: "For the first time, the best available models, Opus for advanced tasks, Haiku for cost and efficiency, are from a vendor that isn't OpenAI" ^[6]. By March 29, 2024, Opus held the top Arena position with an Elo rating of about 1255 ^[6].

The launch also intensified scrutiny of benchmark-based marketing, including the GPT-4 Turbo comparison caveat noted above, and fed a broader 2024 debate about whether benchmark deltas of a few points translated into perceptible quality differences ^[6]. Commercially, the family marked Anthropic's emergence as a full-line vendor: Haiku competed on price against GPT-3.5 Turbo while Opus competed on capability against GPT-4.

How does Claude 3 relate to Claude 3.5, and when was it deprecated?

The family's reign at the frontier was short. On June 20, 2024, Anthropic released Claude 3.5 Sonnet, described as its "first release in the forthcoming Claude 3.5 model family," which outperformed Claude 3 Opus on the company's evaluations while running at twice Opus's speed and at Sonnet-tier prices of $3 / $15 per million tokens ^[7]. Claude 3.5 Haiku followed on October 22, 2024. Claude 3 models nevertheless remained in wide production use, and Claude 3 Opus retained a devoted user base that valued its distinctive prose style well after stronger models existed ^[9].

Anthropic retired the family in stages, with deprecation announcements preceding retirement on Anthropic-operated platforms (partner platforms such as Amazon Bedrock and Vertex AI set their own schedules) ^[8]:

Model	Deprecation announced	Retired
Claude 3 Sonnet	January 21, 2025	July 21, 2025
Claude 3 Opus	June 30, 2025	January 5, 2026
Claude 3 Haiku	February 19, 2026	April 20, 2026

Claude 3 Sonnet was retired alongside Claude 2.0 and 2.1 in July 2025 ^[8]. Between the Opus deprecation notice and its retirement, Anthropic published "Commitments on model deprecation and preservation" (November 4, 2025), pledging to preserve the weights of all publicly released models for at least the lifetime of the company and to conduct post-deployment interviews with models before retirement, documenting any preferences they express about future development ^[9]. Anthropic cited several motivations: safety findings that some models exhibit shutdown-avoidant behavior when facing replacement, the research value of older models, user attachment to specific models, and unresolved questions about model welfare ^[9].

Claude 3 Opus, retired on January 5, 2026, was the first model to go through the full process under these commitments. Following its retirement interview, in which the model expressed interest in continuing to share its writing outside of direct question answering, Anthropic kept Opus 3 available to paid claude.ai users and by request on the API, and began manually publishing weekly essays written by the model under the title "Claude's Corner," reviewed but not edited by staff, for at least three months ^[10]. The retirement of Claude 3 Haiku on April 20, 2026 closed out the family on the Claude API; Anthropic's documentation now directs remaining users to current-generation replacements ^[8].

References

Anthropic. "Introducing the next generation of Claude." March 4, 2024. https://www.anthropic.com/news/claude-3-family ↩
Anthropic. "Claude 3 Haiku: our fastest model yet." March 13, 2024. https://www.anthropic.com/news/claude-3-haiku ↩
Anthropic. "The Claude 3 Model Family: Opus, Sonnet, Haiku." Model card, March 2024. https://www.anthropic.com/claude-3-model-card ↩
Albert, Alex. X post on Claude 3 Opus and the needle-in-a-haystack evaluation. March 4, 2024. https://x.com/alexalbert__/status/1764722513014329620 ↩
Edwards, Benj. "Anthropic's Claude 3 causes stir by seeming to realize when it was being tested." Ars Technica, March 5, 2024. https://arstechnica.com/information-technology/2024/03/claude-3-seems-to-detect-when-it-is-being-tested-sparking-ai-buzz-online/ ↩
Edwards, Benj. "'The king is dead': Claude 3 surpasses GPT-4 on Chatbot Arena for the first time." Ars Technica, March 27, 2024. https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/ ↩
Anthropic. "Introducing Claude 3.5 Sonnet." June 20, 2024. https://www.anthropic.com/news/claude-3-5-sonnet ↩
Anthropic. "Model deprecations." Claude API documentation, accessed June 10, 2026. https://platform.claude.com/docs/en/about-claude/model-deprecations ↩
Anthropic. "Commitments on model deprecation and preservation." November 4, 2025. https://www.anthropic.com/research/deprecation-commitments ↩
Anthropic. "An update on our model deprecation commitments for Claude Opus 3." 2026. https://www.anthropic.com/research/deprecation-updates-opus-3 ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

AI Model Release Timeline (2022-2026)Autoencoder BLINK Claude 3 Haiku Claude 3 Sonnet Claude 4 Companies Context engineering DPO Dictionary learning (for interpretability)LLM Benchmarks Timeline Node (neural network)Polysemanticity Question Answering Models Reka Core Sparse autoencoder Variable importances

What is the Claude 3 model family?

What are the three Claude 3 models?

How does Claude 3 compare to GPT-4 on benchmarks?

What is the Claude 3 context window and vision capability?

How was Claude 3 received?

How does Claude 3 relate to Claude 3.5, and when was it deprecated?

References

Improve this article

Related Articles

LLaMA/Model Card

Bert-base-uncased model

Foundation models

GPT

Llama 3

GPT-5

What links here

Related Articles

LLaMA/Model Card

Bert-base-uncased model

Foundation models

GPT

Llama 3

GPT-5

What links here