Claude 2.1
Last reviewed: Jun 3, 2026
Sources: 12 citations
Review status: Source-backed
Revision: v1 · 1,279 words
Claude 2.1 is a large language model released by Anthropic on November 21, 2023, as an incremental update to Claude 2. The release doubled the model's context window to 200,000 tokens, was accompanied by Anthropic-reported reductions in false statements, and added two developer-facing capabilities: external tool use, offered in beta, and system prompts. It was made available through Anthropic's API and the Claude chat assistant at claude.ai.[1][2][3]
Claude 2 had launched in July 2023 with a 100,000-token context window, which was already large relative to competing commercial models at the time. Claude 2.1 arrived about four months later, and Anthropic positioned it mainly for enterprise users, emphasizing longer inputs, greater factual reliability, and tighter integration with external systems.[1][4] The update appeared about two weeks after OpenAI announced GPT-4 Turbo with a 128,000-token context window, and several outlets framed Claude 2.1 as a direct competitive response on context length.[3][5]
The headline change was an expanded context window of 200,000 tokens. Anthropic described this as roughly 150,000 words, or more than 500 pages of text, and gave examples such as loading entire codebases, long financial filings like S-1 documents, or full-length books in a single prompt.[1][2] The figure was double the 100,000-token limit of Claude 2.0 and exceeded the 128,000 tokens offered by GPT-4 Turbo, making it one of the largest context windows available in a commercial model at the time.[3][5]
| Feature | Claude 2.0 | Claude 2.1 |
|---|---|---|
| Release date | July 2023 | November 21, 2023 |
| Context window | 100,000 tokens | 200,000 tokens |
| Approx. text capacity | About 75,000 words | About 150,000 words / 500+ pages |
| Tool use | Not offered | Beta |
| System prompts | Not offered | Yes |
Anthropic cautioned that prompts near the upper end of the window carried a latency cost. Processing inputs close to 200,000 tokens could take the model several minutes, and the company said it expected that speed to improve as the technology matured.[1][2]
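The word and page figures quoted for the context window follow from simple ratios. The sketch below derives those ratios from the numbers in this article (200,000 tokens, about 150,000 words, about 500 pages) and applies them to other window sizes. The constant and function names are illustrative, and the ratios are rough averages for English prose, not properties of the tokenizer.

```python
# Ratios implied by the figures quoted in this article. They are rough
# averages; real text varies by language, formatting, and tokenizer.
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75 words per token
WORDS_PER_PAGE = 150_000 / 500       # 300 words per page, taking "500+" as 500


def estimate_capacity(context_tokens: int) -> tuple[int, int]:
    """Return (approximate words, approximate pages) for a token budget."""
    words = round(context_tokens * WORDS_PER_TOKEN)
    pages = round(words / WORDS_PER_PAGE)
    return words, pages


if __name__ == "__main__":
    for label, tokens in [("Claude 2.0", 100_000), ("Claude 2.1", 200_000)]:
        words, pages = estimate_capacity(tokens)
        print(f"{label}: ~{words:,} words, ~{pages} pages")
```

Under these assumptions the 100,000-token window of Claude 2.0 works out to about 75,000 words, matching the table.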
Anthropic reported that Claude 2.1 made fewer unsupported or fabricated statements than its predecessor. The company stated a "2x decrease in false statements" compared with Claude 2.0.[1][4] For tasks involving comprehension of long documents, Anthropic gave two more specific figures: a 30% reduction in incorrect answers, and a 3 to 4 times lower rate of mistakenly concluding that a document supported a particular claim.[1][6] These were Anthropic's own internal measurements rather than results from an independent benchmark, and reporting at the time generally presented them as vendor claims. The company framed the improvement as part of an effort to make the model more willing to acknowledge uncertainty and decline to answer when the available text did not support a conclusion.[1][6]
Claude 2.1 introduced tool use, a beta feature that let developers connect the model to external functions and data sources. Anthropic described capabilities such as calling a calculator for arithmetic, converting natural-language requests into structured API calls, querying databases or web sources, and retrieving information from a product catalog to make recommendations. The aim was to let Claude take actions and ground answers in live data rather than relying solely on its training.[1][2] This early function-calling interface was a precursor to the more formalized tool-use API that Anthropic shipped with later Claude generations.
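The general pattern behind tool use can be illustrated with a small dispatcher: the model emits a structured request, and the application maps it to a local function and returns the result. The sketch below is a generic illustration, not the format of Anthropic's beta interface. The JSON shape, the tool names, and the catalog data are all hypothetical.

```python
import json
import operator

# Hypothetical tools an application might expose to a model.
_OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}

# Hypothetical product catalog used only for this illustration.
_CATALOG = {"sku-1": {"name": "Trail shoe", "price": 89.0}}


def calculator(op: str, a: float, b: float) -> float:
    """Apply one arithmetic operation, so the model need not do math itself."""
    return _OPS[op](a, b)


def lookup_product(sku: str) -> dict:
    """Fetch one catalog record by its identifier."""
    return _CATALOG[sku]


TOOLS = {"calculator": calculator, "lookup_product": lookup_product}


def dispatch(tool_call_json: str):
    """Parse a structured call (assumed JSON shape) and run the named tool."""
    call = json.loads(tool_call_json)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    request = '{"tool": "calculator", "arguments": {"op": "multiply", "a": 12, "b": 7}}'
    print(dispatch(request))
```

In a real integration the tool result would be passed back to the model so it can compose the final answer.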
The release also added system prompts, a mechanism for giving the model standing instructions that apply across a conversation. A system prompt can set a role or persona, specify a tone, or define how responses should be structured, separate from the user's individual messages. The feature gave developers and chat users a more consistent way to steer behavior, and it aligned the API more closely with patterns already common in competing products.[1][2] Anthropic's broader approach to model behavior and safety in this period was shaped by its constitutional AI method, which trains models against a written set of principles.
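As a rough illustration of how standing instructions sit apart from individual messages, the sketch below assembles a plain-text prompt with the system text placed ahead of the first user turn. It assumes the "Human:" / "Assistant:" turn markers of the text-completion prompt format from that period. The function name and example strings are hypothetical, and the sketch makes no API call.

```python
# Turn markers assumed from the text-completion prompt format of the period.
HUMAN_TURN = "\n\nHuman:"
ASSISTANT_TURN = "\n\nAssistant:"


def build_prompt(
    system: str,
    turns: list[tuple[str, str]],
    next_user_message: str,
) -> str:
    """Assemble a prompt: system text first, then prior turns, then the new message."""
    parts = [system.strip()]
    for user_text, assistant_text in turns:
        parts.append(f"{HUMAN_TURN} {user_text}{ASSISTANT_TURN} {assistant_text}")
    # Leave the final assistant turn open so the model writes the reply.
    parts.append(f"{HUMAN_TURN} {next_user_message}{ASSISTANT_TURN}")
    return "".join(parts)


if __name__ == "__main__":
    prompt = build_prompt(
        system="You are a concise legal assistant. Answer in plain English.",
        turns=[("Hi", "Hello. How can I help?")],
        next_user_message="Summarize clause 4.",
    )
    print(prompt)
```

Because the system text appears once at the top, it applies to every turn that follows without being repeated in each user message.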
Soon after the launch, the practical reliability of the 200,000-token window became a subject of independent scrutiny. Greg Kamradt ran a "needle in a haystack" evaluation, in which a single out-of-place fact (the needle) is planted at varying depths inside a large body of text (the haystack) and the model is asked to retrieve it. Anthropic gave him early access to run the test on Claude 2.1.[7]
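The structure of such an evaluation can be sketched in a few lines. The code below is an illustration of the general procedure, not Kamradt's actual harness. It measures length in words instead of tokens for simplicity, and the function names and the substring-based scoring rule are assumptions.

```python
def insert_needle(haystack_words: list[str], needle: str, depth: float) -> str:
    """Place the needle at a fractional depth (0.0 = top, 1.0 = bottom)."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    cut = round(len(haystack_words) * depth)
    return " ".join(haystack_words[:cut] + [needle] + haystack_words[cut:])


def build_cases(
    filler_words: list[str],
    needle: str,
    lengths: list[int],
    depths: list[float],
) -> list[dict]:
    """Build one test document per (context length, needle depth) pair."""
    cases = []
    for length in lengths:
        words = filler_words[:length]
        for depth in depths:
            cases.append(
                {
                    "length": length,
                    "depth": depth,
                    "text": insert_needle(words, needle, depth),
                }
            )
    return cases


def recalled(model_answer: str, expected_fact: str) -> bool:
    """Crude scoring rule (an assumption): the answer must contain the fact."""
    return expected_fact.lower() in model_answer.lower()
```

Each generated document would be sent to the model along with a question about the planted fact, and the results plotted as a grid of context length against depth.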
Kamradt reported that recall was strong for facts placed at the very top or very bottom of the document but degraded for facts buried in the middle, and that performance fell off as the input grew longer. By his account, recall at the bottom of the document began deteriorating at roughly 90,000 tokens.[7][8] The pattern echoed the "lost in the middle" effect observed in other long-context models. Some commentators also noted that Anthropic's tokenizer was less space-efficient than OpenAI's, so a 200,000-token window held somewhat less text than the raw number implied when compared directly with GPT-4.[8]
Anthropic responded in a December 2023 post on long-context prompting. On its own version of the evaluation, the model scored 27% when simply asked the question. Adding a single line to the start of Claude's reply, "Here is the most relevant sentence in the context:", raised the score to 98%.[9] Anthropic explained that Claude 2.1 had been trained to avoid answering based on a sentence that seemed unsupported by the surrounding document, so an artificially injected fact could trigger that reluctance; cueing the model to surface the relevant sentence first overrode the hesitation. The company presented this as a prompting workaround rather than a change to the model, and the episode became a frequently cited example of how sensitive long-context behavior can be to prompt wording.[9][10]
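The workaround amounts to writing the first words of the model's reply for it. The sketch below builds such a prompt, assuming the "Human:" / "Assistant:" turn markers of the text-completion format from that period. The function name and the document wrapper tags are hypothetical; only the cue sentence is taken from the account above.

```python
HUMAN_TURN = "\n\nHuman:"
ASSISTANT_TURN = "\n\nAssistant:"

# The cue sentence quoted above, placed at the start of the model's reply.
PREFILL = "Here is the most relevant sentence in the context:"


def build_retrieval_prompt(document: str, question: str, prefill: str = PREFILL) -> str:
    """Build a long-context question prompt whose reply is already begun."""
    user_turn = f"<document>\n{document}\n</document>\n\n{question}"
    # Ending the prompt with the prefill makes the model continue from it,
    # so its first output is the sentence it judges most relevant.
    return f"{HUMAN_TURN} {user_turn}{ASSISTANT_TURN} {prefill}"


if __name__ == "__main__":
    prompt = build_retrieval_prompt(
        document="(long text with one planted fact)",
        question="What does the document say about the planted fact?",
    )
    print(prompt)
```

Passing an empty string as the prefill gives the unprompted baseline for comparison.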
Claude 2.1 powered the claude.ai assistant and was available through Anthropic's developer API. The model was offered to both free and paid users of the chat product, but access to the full 200,000-token context window was reserved for subscribers to Claude Pro, the $20-per-month plan, and for API customers.[1][2] Through the API, Anthropic priced the model at $8 per million input tokens and $24 per million output tokens.[6][11] The model also reached users indirectly through third-party services that integrated Anthropic's API, such as the Claude option in some Perplexity tiers.[2]
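The per-token prices translate into request costs by straightforward arithmetic. The sketch below applies the rates quoted above. The function name and the example token counts are illustrative.

```python
# API prices quoted in this article, in US dollars per million tokens.
INPUT_USD_PER_MTOK = 8.00
OUTPUT_USD_PER_MTOK = 24.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted Claude 2.1 rates."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A prompt that fills the whole window, with a 1,000-token reply.
    print(f"${request_cost_usd(200_000, 1_000):.3f}")  # prints $1.624
```

At these rates a single full-window prompt costs $1.60 in input tokens alone, before any output is generated.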
Anthropic released the Claude 3 family (Haiku, Sonnet, and Opus) in March 2024, which superseded the Claude 2 line. On January 21, 2025, Anthropic notified developers that Claude 2, Claude 2.1 (API name claude-2.1), and Claude 3 Sonnet would be retired. Claude 2.1 was retired on July 21, 2025, and API requests to it have failed since then.[12]