Claude 2.1

Updated Jun 3, 2026

Claude 2.1 is a large language model released by Anthropic on November 21, 2023, as an incremental update to Claude 2. The release doubled the model's context window to 200,000 tokens, was accompanied by Anthropic's report of lower rates of false statements, and added two developer-facing capabilities in beta: external tool use and system prompts. It was made available through Anthropic's API and the Claude chat assistant at claude.ai.^[1]^[2]^[3]

Background

Claude 2 had launched in July 2023 with a 100,000-token context window, which at the time was already large relative to competing commercial models. Claude 2.1 arrived about four months later and was positioned by Anthropic as a step aimed mainly at enterprise users, emphasizing longer inputs, greater factual reliability, and tighter integration with external systems.^[1]^[4] The update appeared shortly after OpenAI announced GPT-4 Turbo with a 128,000-token context window, and several outlets framed Claude 2.1 as a direct competitive response on context length.^[3]^[5]

200K context window

The headline change was an expanded context window of 200,000 tokens. Anthropic described this as roughly 150,000 words, or more than 500 pages of text, and gave examples such as loading entire codebases, long financial filings like S-1 documents, or full-length books in a single prompt.^[1]^[2] The figure was double the 100,000-token limit of Claude 2.0 and exceeded the 128,000 tokens offered by GPT-4 Turbo, making it one of the largest context windows available in a commercial model at the time.^[3]^[5]

Feature	Claude 2.0	Claude 2.1
Release date	July 2023	November 21, 2023
Context window	100,000 tokens	200,000 tokens
Approx. text capacity	About 75,000 words	About 150,000 words / 500+ pages
Tool use	Not offered	Beta
System prompts	Not offered	Yes
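
The word and page figures in the table follow from simple ratios. A minimal sketch of that arithmetic, assuming about 0.75 words per token and about 300 words per page. Both ratios are back-calculated from the figures quoted in this article and are not published constants; real values vary with the text and the tokenizer.

```python
# Assumed ratios, back-calculated from the figures quoted in this article.
WORDS_PER_TOKEN = 0.75   # 150,000 words / 200,000 tokens
WORDS_PER_PAGE = 300     # 150,000 words / 500 pages


def approx_capacity(context_tokens: int) -> tuple[int, int]:
    """Return (approximate words, approximate pages) for a context window."""
    words = round(context_tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


for name, tokens in [("Claude 2.0", 100_000), ("Claude 2.1", 200_000)]:
    words, pages = approx_capacity(tokens)
    print(f"{name}: ~{words:,} words, ~{pages:,} pages")
```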

Anthropic cautioned that prompts near the upper end of the window carried a latency cost. Processing inputs close to 200,000 tokens could take the model several minutes, and the company said it expected that speed to improve as the technology matured.^[1]^[2]

Reduced hallucination

Anthropic reported that Claude 2.1 made fewer unsupported or fabricated statements than its predecessor. The company stated a "2x decrease in false statements" compared with Claude 2.0.^[1]^[4] For tasks involving comprehension of long documents, Anthropic gave two more specific figures: a 30% reduction in incorrect answers, and a 3 to 4 times lower rate of mistakenly concluding that a document supported a particular claim.^[1]^[6] These were Anthropic's own internal measurements rather than results from an independent benchmark, and reporting at the time generally presented them as vendor claims. The company framed the improvement as part of an effort to make the model more willing to acknowledge uncertainty and decline to answer when the available text did not support a conclusion.^[1]^[6]

Tool use and system prompts

Claude 2.1 introduced tool use, a beta feature that let developers connect the model to external functions and data sources. Anthropic described capabilities such as calling a calculator for arithmetic, converting natural-language requests into structured API calls, querying databases or web sources, and retrieving information from a product catalog to make recommendations. The aim was to let Claude take actions and ground answers in live data rather than relying solely on its training.^[1]^[2] This early function-calling interface was a precursor to the more formalized tool-use API that Anthropic shipped with later Claude generations.
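
The general pattern behind tool use can be illustrated with a small dispatcher. This is a sketch of generic function calling, not Anthropic's beta interface: the `TOOLS` registry, the `calculator` and `catalog_lookup` functions, the catalog contents, and the JSON call format are all hypothetical, and the model's output is stubbed with a hard-coded string.

```python
import json
import operator

# Hypothetical tools; a real integration would wrap actual services.
_OPS = {"+": operator.add, "-": operator.sub,
        "*": operator.mul, "/": operator.truediv}


def calculator(a: float, op: str, b: float) -> float:
    """Evaluate a single binary arithmetic operation."""
    return _OPS[op](a, b)


_CATALOG = {"sku-1": {"name": "Desk lamp", "price": 29.0}}  # made-up data


def catalog_lookup(sku: str) -> dict:
    """Return the catalog entry for a SKU, or an empty dict if unknown."""
    return _CATALOG.get(sku, {})


TOOLS = {"calculator": calculator, "catalog_lookup": catalog_lookup}


def dispatch(call_json: str):
    """Parse a structured call (assumed JSON format) and run the named tool."""
    call = json.loads(call_json)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Stand-in for a structured call the model might emit for "what is 12 times 7?"
model_output = '{"tool": "calculator", "arguments": {"a": 12, "op": "*", "b": 7}}'
result = dispatch(model_output)
print(result)
```

In a full loop, the tool's return value would be passed back to the model so that it can phrase the final answer; that step is omitted here.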

The release also added system prompts, a mechanism for giving the model standing instructions that apply across a conversation. A system prompt can set a role or persona, specify a tone, or define how responses should be structured, separate from the user's individual messages. The feature gave developers and chat users a more consistent way to steer behavior, and it aligned the API more closely with patterns already common in competing products.^[1]^[2] Anthropic's broader approach to model behavior and safety in this period was shaped by its constitutional AI method, which trains models against a written set of principles.
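
As a sketch of how a system prompt sits apart from the user's messages, the snippet below assembles a prompt in the `Human:`/`Assistant:` text format, with the system prompt placed before the first human turn. Both the format and the placement are assumptions based on Anthropic's API documentation of the period rather than on the sources cited in this article, and `build_prompt` is a hypothetical helper, not a library function.

```python
HUMAN = "\n\nHuman:"
ASSISTANT = "\n\nAssistant:"


def build_prompt(system: str, turns: list[tuple[str, str]], user_message: str) -> str:
    """Assemble a text prompt: system text first, then alternating turns.

    `turns` holds earlier (user, assistant) exchanges; the prompt ends with an
    open Assistant turn for the model to complete.
    """
    parts = [system.strip()]
    for user, assistant in turns:
        parts.append(f"{HUMAN} {user}{ASSISTANT} {assistant}")
    parts.append(f"{HUMAN} {user_message}{ASSISTANT}")
    return "".join(parts)


prompt = build_prompt(
    system="You are a concise financial analyst. Answer in bullet points.",
    turns=[("What is an S-1?", "- A registration filing for a US public offering.")],
    user_message="Summarize the risk factors in the attached filing.",
)
print(prompt)
```

Because the system text appears once at the top, it applies to every later turn without being repeated in each user message.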

Long-context recall testing

Soon after the launch, the practical reliability of the 200,000-token window became a subject of independent scrutiny. Greg Kamradt ran a "needle in a haystack" evaluation, in which a single out-of-place fact (the needle) is planted at varying depths inside a large body of text (the haystack) and the model is asked to retrieve it. Anthropic gave him early access to run the test on Claude 2.1.^[7]
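
The structure of such an evaluation can be sketched in a few lines. This is an illustrative harness, not Kamradt's code: the filler text and needle are made up, length is counted in sentences rather than tokens for simplicity, and `stub_ask` is a hypothetical stand-in for a model call that only "reads" the first 200 characters, so it does not model Claude's behavior.

```python
from typing import Callable


def plant_needle(sentences: list[str], needle: str, depth: float) -> str:
    """Insert `needle` at a fractional depth: 0.0 is the top, 1.0 the bottom."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    index = round(depth * len(sentences))
    return " ".join(sentences[:index] + [needle] + sentences[index:])


def run_grid(
    sentences: list[str],
    needle: str,
    question: str,
    expected: str,
    ask: Callable[[str, str], str],
    lengths: list[int],
    depths: list[float],
) -> dict[tuple[int, float], bool]:
    """Score recall for every (haystack length, needle depth) combination."""
    results = {}
    for length in lengths:
        for depth in depths:
            context = plant_needle(sentences[:length], needle, depth)
            answer = ask(context, question)
            results[(length, depth)] = expected.lower() in answer.lower()
    return results


def stub_ask(context: str, question: str) -> str:
    """Hypothetical model stand-in: returns only the first 200 characters."""
    return context[:200]


filler = [f"Filler sentence number {i}." for i in range(50)]
grid = run_grid(
    sentences=filler,
    needle="The magic number is 7481.",
    question="What is the magic number?",
    expected="7481",
    ask=stub_ask,
    lengths=[10, 50],
    depths=[0.0, 0.5, 1.0],
)
for (length, depth), hit in sorted(grid.items()):
    print(f"length={length:>3} depth={depth:.1f} recalled={hit}")
```

Plotting the resulting grid with length on one axis and depth on the other gives the kind of heat map commonly used to report these results.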

Kamradt reported that recall was strong for facts placed at the very top or very bottom of the document but degraded for facts buried in the middle, and that performance fell off as the input grew longer. By his account, once the input passed roughly 90,000 tokens, recall began to deteriorate for facts near the bottom of the document as well.^[7]^[8] The pattern echoed the "lost in the middle" effect observed in other long-context models. Some commentators also noted that Anthropic's tokenizer was less space-efficient than OpenAI's, so a 200,000-token window held somewhat less text than the raw number implied when compared directly with GPT-4.^[8]

Anthropic responded in a December 2023 post on long-context prompting. On its own version of the evaluation, the model scored 27% when simply asked the question. Adding a single line to the start of Claude's reply, "Here is the most relevant sentence in the context:", raised the score to 98%.^[9] Anthropic explained that Claude 2.1 had been trained to avoid answering based on a sentence that seemed unsupported by the surrounding document, so an artificially injected fact could trigger that reluctance; cueing the model to surface the relevant sentence first overrode the hesitation. The company presented this as a prompting workaround rather than a change to the model, and the episode became a frequently cited example of how sensitive long-context behavior can be to prompt wording.^[9]^[10]
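
The workaround amounts to pre-filling the start of the model's reply. A minimal sketch, assuming the `Human:`/`Assistant:` text prompt format of the API at the time. `build_recall_prompt` is a hypothetical helper, and the `<context>` wrapper and overall layout are illustrative rather than Anthropic's exact evaluation prompt; only the pre-filled sentence is taken from the text above.

```python
PREFILL = "Here is the most relevant sentence in the context:"


def build_recall_prompt(document: str, question: str, prefill: str = PREFILL) -> str:
    """Build a long-context prompt whose Assistant turn is optionally pre-filled.

    With a non-empty `prefill`, the model continues from that text instead of
    starting its reply from scratch.
    """
    prompt = (
        f"\n\nHuman: <context>\n{document}\n</context>\n\n{question}"
        "\n\nAssistant:"
    )
    if prefill:
        prompt += f" {prefill}"
    return prompt


baseline = build_recall_prompt("...long document...", "What is the magic number?", prefill="")
cued = build_recall_prompt("...long document...", "What is the magic number?")
print(cued)
```

The two prompts differ only in the final line, which is what makes the reported gap between the two scores notable.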

Availability

Claude 2.1 powered the claude.ai assistant and was available through Anthropic's developer API. The model was offered to both free and paid users of the chat product, but access to the full 200,000-token context window was reserved for subscribers to Claude Pro, the $20-per-month plan, and for API customers.^[1]^[2] Through the API, Anthropic priced the model at $8 per million input tokens and $24 per million output tokens.^[6]^[11] The model also reached users indirectly through third-party services that integrated Anthropic's API, such as the Claude option in some Perplexity tiers.^[2]
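
At those rates, the cost of a single request is a straightforward calculation. A minimal sketch using the per-million-token prices quoted above; the `request_cost` helper and the example token counts are illustrative.

```python
INPUT_USD_PER_MTOK = 8.00    # price per million input tokens, as quoted above
OUTPUT_USD_PER_MTOK = 24.00  # price per million output tokens, as quoted above


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in US dollars of one API request."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# A prompt that fills the whole 200,000-token window, with a 1,000-token reply.
cost = request_cost(input_tokens=200_000, output_tokens=1_000)
print(f"${cost:.3f}")  # prints $1.624
```

Filling the full window therefore cost about $1.60 in input tokens alone, before any output was generated.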

Successors

In March 2024, Anthropic released the Claude 3 family (Haiku, Sonnet, and Opus), which superseded the Claude 2 line. Claude 2.1 (API name claude-2.1) was later scheduled for retirement. On January 21, 2025, Anthropic notified developers that Claude 2, Claude 2.1, and Claude 3 Sonnet would be retired. Claude 2.1 was retired on July 21, 2025, and API requests to it have failed since then.^[12]

References

1. "Introducing Claude 2.1". Anthropic, November 21, 2023.
2. "Anthropic Releases Claude 2.1 Generative AI Chatbot With 200K Context Window and Reduced Errors". Voicebot.ai, November 21, 2023.
3. "Anthropic releases Claude 2.1 with 200k context window". DailyAI, November 2023.
4. "Claude 2.1 released with a 200K token context window and lower model hallucination rates". AlternativeTo, November 2023.
5. "Anthropic Upgrades Claude With Nearly Twice The Capabilities of GPT-4 Turbo". Decrypt, November 2023.
6. "Anthropic Announces Claude 2.1 LLM with Wider Context Window and Support for AI Tools". InfoQ, November 2023.
7. "The Needle In a Haystack Test: Evaluating the Performance of LLM RAG Systems". Arize AI.
8. "Claude 2.1: Tokenizer inefficiency, Needle in a Haystack". Nils Durner's Blog.
9. "Long context prompting for Claude 2.1". Anthropic, December 2023.
10. "Anthropic's long context 'prompt hack' shows the weirdness of LLMs". The Decoder, December 2023.
11. "Claude 2.1 Model Card". PromptHub.
12. "Model deprecations". Anthropic Claude API documentation.
