Claude 3.5 Sonnet
Last reviewed: May 17, 2026
Sources: No citations yet
Review status: Needs citations
Revision: v2 · 4,962 words
Claude 3.5 Sonnet is a multimodal large language model developed by Anthropic and first released on June 20, 2024 as the opening model of the company's 3.5 generation. It was the model that ended Anthropic's role as a perennial second-place lab and made the Sonnet tier the company's signature product. A substantially upgraded version followed on October 22, 2024, announced alongside Claude 3.5 Haiku and shipped together with the public beta of Computer Use. Within the developer community the upgraded model was widely nicknamed "Claude 3.6" because of how large the jump was, and Anthropic eventually skipped that version number entirely when the next official release became Claude 3.7 Sonnet in February 2025.[^1][^2][^3]
The two snapshots use the API identifiers claude-3-5-sonnet-20240620 (the original) and claude-3-5-sonnet-20241022 (the upgrade). Both share a 200,000-token context window, an 8,192-token maximum output length, and a price of $3 per million input tokens and $15 per million output tokens. Both were released under AI Safety Level 2 (ASL-2) deployment standards. The upgraded snapshot improved SWE-bench Verified from 33.4% to 49.0%, which Anthropic at the time described as state of the art for any publicly available model, ahead of OpenAI's recently launched o1-preview reasoning model.[^4][^5]
At launch Anthropic positioned 3.5 Sonnet as outperforming the larger and slower Claude 3 Opus at one fifth of the price, an unusual move for the industry at the time. The June launch was paired with the debut of Artifacts, a side-panel interface in claude.ai that rendered code, websites, SVG graphics, and documents live as the model produced them, and which became the visual face of the model for most consumer users. The model's combination of strong coding, low cost, low latency, and a long context window made it the default or a leading option in Cursor, GitHub Copilot (where Anthropic models were added in October 2024), Replit, Vercel v0, Sourcegraph Cody, and a long list of agent products built between mid-2024 and early 2025.[^1][^2][^6]
Claude 3.5 Sonnet held leading or tied-leading positions on coding, reasoning, and agentic-task benchmarks for most of the seven-month window between its launch and the release of Claude 3.7 Sonnet. Both snapshots were retired on October 28, 2025 under Anthropic's standard retirement process, which guarantees at least 60 days of advance notice, with users directed to migrate to Claude Sonnet 4 and its successors.[^5][^7]
By the spring of 2024 Anthropic had shipped its Claude 3 family in three sizes (Haiku, Sonnet, and Opus), announced together on March 4, 2024. The three sizes were meant to mirror conventional product tiers: Haiku was the small, fast, cheap model; Sonnet the mid-tier; and Claude 3 Opus the large flagship that won most benchmark contests but cost five times as much per token as Sonnet. Across the industry the assumption in early 2024 was that frontier-level intelligence required flagship-class models, and that intermediate tiers existed for cost-sensitive workloads rather than for state-of-the-art performance.[^13]
The "3.5" label was chosen to signal a partial generational improvement that kept costs flat for a given tier while substantially lifting capability. Anthropic did not commit publicly to delivering all three sizes of the 3.5 family on day one; the June 20, 2024 announcement described 3.5 Sonnet as "the first in our forthcoming Claude 3.5 model family," with Haiku and Opus refreshes intended to follow later in the year. Claude 3.5 Haiku was announced in October 2024 alongside the upgraded Sonnet snapshot and released in early November. Claude 3.5 Opus was repeatedly promised but never released; Anthropic quietly removed it from its roadmap and shipped Claude Opus 4 in May 2025 instead. The absence of a flagship 3.5 model became one of the more discussed signals about Anthropic's internal compute and product trade-offs during the period.[^1][^2][^7][^10]
| Snapshot | API ID | Release date | Knowledge cutoff | Retirement |
|---|---|---|---|---|
| Original | claude-3-5-sonnet-20240620 | June 20, 2024 | April 2024 | October 28, 2025 |
| Upgraded ("new", informally "3.6") | claude-3-5-sonnet-20241022 | October 22, 2024 | April 2024 | October 28, 2025 |
Anthropic introduced Claude 3.5 Sonnet on June 20, 2024 in a blog post titled "Introducing Claude 3.5 Sonnet." The unusual framing was that the new mid-tier model outperformed the existing flagship: the post stated that 3.5 Sonnet "raises the industry bar for intelligence," beating both competitor models and Claude 3 Opus while operating "at twice the speed of Claude 3 Opus" and at the cost of the previous mid-tier model. The price was set at $3 per million input tokens and $15 per million output tokens, the same rate as Claude 3 Sonnet and far below Opus.[^1]
The model was made available the same day on the Anthropic API, claude.ai (free and Pro), the Claude iOS app, Amazon Bedrock, and Google Cloud's Vertex AI. It supported a 200,000-token context window and image inputs (vision) in the same way Claude 3 Opus had. Maximum output was initially 4,096 tokens; an 8,192-token limit was made available to API users as a beta in July 2024. The knowledge cutoff was April 2024.[^1][^8]
At launch the company published a model card addendum to the existing Claude 3 model card rather than a fresh card, the rationale being that 3.5 Sonnet was an evolution of the Claude 3 family. The addendum reported benchmark results, vision evaluations, agentic coding numbers, refusal rates on WildChat and XSTest, and a Responsible Scaling Policy safety evaluation in which Anthropic concluded that the model did not exceed thresholds for ASL-3 and shipped under ASL-2. The UK AI Safety Institute ran independent pre-deployment testing.[^8]
On October 22, 2024 Anthropic published a single post titled "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku." The release bundled three things: an upgraded Claude 3.5 Sonnet snapshot referenced as claude-3-5-sonnet-20241022; the announcement of Claude 3.5 Haiku (which became available in early November 2024); and a new public beta in the Anthropic API that let developers "direct Claude to use computers the way people do, by looking at a screen, moving a cursor, clicking buttons, and typing text."[^2]
The upgraded Sonnet was made available immediately on the Anthropic API, Amazon Bedrock, Google Cloud's Vertex AI, and claude.ai. The price stayed at $3 input and $15 output per million tokens. The context window stayed at 200K. The knowledge cutoff stayed at April 2024. The model card addendum for the upgrade noted that the model had retained the original Sonnet 3.5's pricing while improving "across the board."[^2][^3]
Critically, the new snapshot shared the user-visible name "Claude 3.5 Sonnet" with the June model but used a different dated API ID. This meant the only reliable way to distinguish the two from inside an application was the suffix on the model ID. The community quickly settled on the unofficial nickname "Claude 3.6" or "Sonnet 3.6" for the upgrade, and the label stuck.
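Because the display name was shared, application code that needed to know which snapshot it was talking to had to inspect the date suffix. The sketch below does that with a regular expression. The helper name `classify_sonnet_35` and the human-readable labels are hypothetical, chosen only for this illustration, and an undated alias string is rejected rather than guessed.

```python
import re

# Hypothetical labels: these are this article's informal names for the two
# snapshots, not values returned by any Anthropic API.
SNAPSHOT_LABELS = {
    "20240620": "original (June 2024)",
    "20241022": "upgraded (October 2024, informally '3.6')",
}

# Dated IDs look like claude-3-5-sonnet-YYYYMMDD.
_ID_PATTERN = re.compile(r"^claude-3-5-sonnet-(\d{8})$")


def classify_sonnet_35(model_id: str) -> str:
    """Return which Claude 3.5 Sonnet snapshot a dated model ID refers to."""
    match = _ID_PATTERN.match(model_id)
    if match is None:
        # Undated aliases or other models carry no snapshot information.
        raise ValueError(f"not a dated Claude 3.5 Sonnet ID: {model_id!r}")
    date_suffix = match.group(1)
    return SNAPSHOT_LABELS.get(date_suffix, f"unknown snapshot ({date_suffix})")


print(classify_sonnet_35("claude-3-5-sonnet-20241022"))
```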
3.5 Sonnet handles general text generation, summarization, classification, multilingual translation, structured extraction, and standard reasoning. The October 2024 upgrade tightened instruction following: Anthropic reported a 51% human preference win rate over the original Sonnet 3.5 on "following precise instructions" tasks. In practice that change was visible to users in the form of cleaner adherence to format requests, JSON schemas, and complex prompt scaffolds, which is part of why the model became the default in agent frameworks.[^3]
Coding was the headline strength of both snapshots. The June model led HumanEval at 92.0% and Anthropic's internal agentic coding evaluation at 64%. The October upgrade pushed those to 93.7% and 78% respectively, and added the SWE-bench Verified jump to 49.0% that became one of the most-cited numbers in the second half of 2024. SWE-bench Verified at the time had a published state of the art of 45.2% (achieved by a system built on a competitor model with elaborate scaffolding); the upgraded Sonnet beat that as a single-model score.[^3][^15]
Real-world adoption tracked the benchmarks. Cursor reported the model as their default for most users; Replit Agent used it; Sourcegraph Cody added it; v0 by Vercel made it a primary option for frontend code generation; and a large fraction of independent agentic-coding products built between mid-2024 and early 2025 had a Sonnet 3.5 path. By the time the upgrade landed in October, the model was widely described in developer communities as "the model you reach for when you actually need the work done."[^6][^16]
Both snapshots accept images as inputs and handle chart and graph interpretation, document understanding, science diagrams, OCR-like transcription from imperfect images, and visual math. MMMU validation reached 68.3% on the original snapshot and 70.4% on the upgrade, ChartQA reached 90.8% (unchanged), and DocVQA reached 95.2% on the original (slipping marginally to 94.2% on the upgrade). The model card addendum framed vision as a steady, not headline, capability.[^3][^8]
Both snapshots shipped with a 200,000-token context window, the same as Claude 3 Opus, and a maximum output length of 8,192 tokens, double the 4,096-token cap of the Claude 3 models (the original snapshot launched with the 4,096-token limit and gained 8,192 as an API beta in July 2024). The 200K context made Sonnet 3.5 well-suited to long-document workflows: summarizing large codebases, reasoning over multi-thousand-line contracts, and serving as the back end for RAG pipelines that retrieved large evidence sets.[^1]
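The following sketch shows how a client might size the `max_tokens` parameter for a long-document request. It assumes, for illustration, that the prompt and the requested output must fit inside the 200,000-token window together; the function name `output_budget` is hypothetical, and in practice the prompt's token count would have to be measured rather than supplied by hand.

```python
CONTEXT_WINDOW = 200_000  # tokens, shared by both Claude 3.5 Sonnet snapshots
MAX_OUTPUT = 8_192        # per-request output ceiling


def output_budget(prompt_tokens: int, requested_output: int = MAX_OUTPUT) -> int:
    """Largest max_tokens value that still fits alongside the prompt.

    Assumes (for illustration) that prompt tokens plus output tokens must fit
    inside the context window together.
    """
    if prompt_tokens < 0 or requested_output < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested_output >= 1")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining < 1:
        raise ValueError(f"a {prompt_tokens}-token prompt leaves no room for output")
    # Never exceed the caller's request, the model's output cap, or the room left.
    return min(requested_output, MAX_OUTPUT, remaining)


print(output_budget(150_000))  # plenty of room: the full 8,192 output tokens
print(output_budget(195_000))  # only 5,000 tokens of room remain
```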
3.5 Sonnet launched with the structured tool-use API that became the company's standard (it had reached general availability for the Claude 3 family in May 2024): the developer registers a tool schema, and the model emits typed tool_use calls that the host code executes. The approach generalized beyond function calling: it underpinned Computer Use at launch, became the basis for Model Context Protocol clients introduced in November 2024, and is the same primitive Claude Code used when it arrived in February 2025. The October upgrade most clearly outscored the field on TAU-bench, an external benchmark for tool-using customer-service agents, with 69.2% on the retail split and 46.0% on the airline split.[^2][^3][^14][^17]
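The host side of that loop can be sketched without a network call. The dictionaries below follow the general shape of the Messages API (a tool definition with `name`, `description`, and `input_schema`; response blocks of type `tool_use`; replies of type `tool_result` keyed by `tool_use_id`), but the `get_weather` tool, its canned output, and the `toolu_example` ID are hypothetical, and the actual API request is omitted.

```python
import json
from typing import Any, Callable

# Tool definition in the Messages API shape; the weather tool is hypothetical.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict[str, Any]:
    # Hypothetical stand-in for a real lookup; always returns a fixed value.
    return {"city": city, "temp_c": 21}


HANDLERS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(content_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute every tool_use block and build matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # plain text blocks need no action from the host
        handler = HANDLERS.get(block["name"])
        if handler is None:
            # Report unknown tools back to the model instead of crashing.
            results.append({"type": "tool_result", "tool_use_id": block["id"],
                            "content": f"unknown tool {block['name']}",
                            "is_error": True})
            continue
        output = handler(**block["input"])
        results.append({"type": "tool_result", "tool_use_id": block["id"],
                        "content": json.dumps(output)})
    return results


# Content blocks as they might appear in an assistant response (ID is made up).
response_content = [
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
     "input": {"city": "Paris"}},
]
tool_results = run_tool_calls(response_content)
print(tool_results)
```

In a real integration the list returned by `run_tool_calls` would become the content of the next user message, and the loop would repeat until the model answers without requesting a tool.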
Alongside the June 20, 2024 release, Anthropic launched a feature called Artifacts on claude.ai. Artifacts opened a dedicated panel next to the chat where Claude could render generated content in place: code, SVG illustrations, HTML pages, Mermaid diagrams, React components, single-file games, and documents. The panel updated in real time as the model produced output and could be edited or rolled back.[^1][^9]
The combination of 3.5 Sonnet plus Artifacts gave non-developer users an immediate, visual way to use the model for what would later be called "vibe coding." Artifacts also helped seed a class of single-prompt mini-apps that became a recurring genre on Twitter and Hacker News during the second half of 2024 — share-a-tweet demos of working calculators, retro games, generative art, and dashboards produced entirely from a single prompt. Anthropic later expanded Artifacts to support persistent saved projects, embedded sharing, and a separate "publish" mode in 2025.[^9]
Artifacts is one of the few consumer-facing AI UX patterns from 2024 that was widely copied. ChatGPT's Canvas (released October 2024) and Gemini's Canvas (released March 2025) adopted the same side-panel structured-editor concept, and press coverage widely compared both to Artifacts.
The October 22, 2024 release introduced Computer Use, a public beta capability that let Claude operate a real desktop environment by reading screenshots and emitting mouse and keyboard actions. This made the upgraded Claude 3.5 Sonnet the first frontier model from a major AI lab to ship a general-purpose desktop automation capability. The Anthropic-hosted demo ran the model inside a sandboxed virtual machine and showed it filling vendor forms, browsing Google Maps, and navigating GUI applications. On the OSWorld benchmark, the upgraded Sonnet 3.5 scored 14.9% in the screenshot-only category with a 15-step limit and 22.0% when allowed 50 steps, against 7.8% for the next-best system in the screenshot-only category. Human performance on OSWorld is about 72%. Anthropic was candid in the launch post that Computer Use was "at times cumbersome and error-prone" and recommended developers run it in containers with limited privileges.[^2][^3]
Crucially, Computer Use was launched with the October 22, 2024 snapshot only — the original June 20, 2024 snapshot did not include this capability. Developers wishing to use the feature had to switch their model ID to claude-3-5-sonnet-20241022.
Computer Use was one of the first general-purpose agentic-control products from a major lab and predated both Google's Project Mariner (December 2024) and OpenAI Operator (January 2025). It became the template that successive Claude generations refined: by Claude Sonnet 4.5 the same OSWorld benchmark had reached 61.4%.[^18]
Under the hood, Computer Use was implemented as a structured tool-use pattern. Anthropic defined three reference tools: a computer tool that emitted screen actions such as screenshot, left_click, type, key, and mouse_move; a text_editor tool for file editing; and a bash tool for shell command execution. These three tools became the foundation of agentic Claude integrations for the next year, and the text_editor and bash tool patterns were carried over directly into Claude Code when it launched in February 2025.[^2][^17]
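On the host side, the computer tool reduces to a dispatcher that maps each requested action onto a local input driver. The sketch below is hypothetical throughout: `RecordingDesktop` only logs what it is asked to do instead of touching a real screen, and the input keys (`action`, `coordinate`, `text`) are an assumption modeled loosely on the tool's published actions rather than a verified copy of its schema.

```python
from typing import Any


class RecordingDesktop:
    """Hypothetical desktop backend that only records what it was asked to do.

    A real harness would call a screenshot library and an input driver
    inside a sandboxed VM; those calls are deliberately left out here.
    """

    def __init__(self) -> None:
        self.log: list[str] = []

    def screenshot(self) -> str:
        self.log.append("screenshot")
        return "<base64 PNG placeholder>"

    def mouse_move(self, x: int, y: int) -> str:
        self.log.append(f"move {x},{y}")
        return "ok"

    def left_click(self) -> str:
        self.log.append("left_click")
        return "ok"

    def type_text(self, text: str) -> str:
        self.log.append(f"type {text!r}")
        return "ok"

    def key(self, combo: str) -> str:
        self.log.append(f"key {combo}")
        return "ok"


def dispatch_computer_action(desktop: RecordingDesktop, tool_input: dict[str, Any]) -> str:
    """Route one computer-tool call to the backend (input keys are assumed)."""
    action = tool_input["action"]
    if action == "screenshot":
        return desktop.screenshot()
    if action == "mouse_move":
        x, y = tool_input["coordinate"]
        return desktop.mouse_move(x, y)
    if action == "left_click":
        return desktop.left_click()
    if action == "type":
        return desktop.type_text(tool_input["text"])
    if action == "key":
        return desktop.key(tool_input["text"])
    raise ValueError(f"unsupported action: {action}")


desktop = RecordingDesktop()
for call in [{"action": "screenshot"},
             {"action": "mouse_move", "coordinate": [640, 400]},
             {"action": "left_click"},
             {"action": "type", "text": "hello"}]:
    dispatch_computer_action(desktop, call)
print(desktop.log)
```

Keeping the backend behind a narrow interface like this also makes it easier to run the real driver in the sandboxed, low-privilege environment that Anthropic recommended.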
The table below summarizes the headline benchmark scores for both snapshots, drawn from Anthropic's own model card addenda (June 2024 for the original, October 2024 for the upgrade). "Not reported" means Anthropic did not publish that score for that snapshot; "not applicable" marks a capability the original snapshot lacked.
| Benchmark | Original (June 2024) | Upgraded (October 2024) | Notes |
|---|---|---|---|
| GPQA Diamond (0-shot CoT) | 59.4% | 65.0% | Graduate-level science Q&A |
| MMLU (5-shot CoT) | 90.4% | 90.5% | General reasoning |
| MMLU Pro (0-shot CoT) | not reported | 78.0% | Harder MMLU variant |
| MATH (0-shot CoT) | 71.1% | 78.3% | Mathematical problem solving |
| HumanEval | 92.0% | 93.7% | Python coding |
| MGSM | 91.6% | 92.5% | Multilingual math |
| DROP (3-shot, F1) | 87.1 | 88.3 | Reading comprehension |
| BIG-Bench Hard (3-shot CoT) | 93.1% | 93.2% | Mixed evaluations |
| GSM8K | 96.4% | not reported | Grade-school math |
| AIME 2024 (0-shot CoT) | not reported | 16.0% | High school math contest |
| AIME 2024 (Maj@64) | not reported | 27.6% | Majority vote over 64 samples |
| IFEval | not reported | 90.2% | Instruction following |
| MMMU (validation, 0-shot) | 68.3% | 70.4% | Visual question answering |
| MathVista (testmini) | 67.7% | 70.7% | Visual math reasoning |
| AI2D | 94.7% | 95.3% | Science diagrams |
| ChartQA | 90.8% | 90.8% | Chart understanding |
| DocVQA (ANLS) | 95.2% | 94.2% | Document understanding |
| SWE-bench Verified | 33.4% | 49.0% | Real GitHub issues |
| TAU-bench retail (pass^1) | 62.6% | 69.2% | Tool-use customer service |
| TAU-bench airline (pass^1) | 36.0% | 46.0% | Harder TAU-bench split |
| OSWorld (15-step, screenshot) | not applicable | 14.9% | Desktop agent |
| Internal agentic coding | 64% | 78% | Anthropic internal eval |
The single benchmark that did the most to define the model's reputation was SWE-bench Verified. The upgraded snapshot's 49.0% put it ahead of OpenAI's o1-preview reasoning model (41.0% on the same benchmark in OpenAI's reported numbers) and ahead of all open-source alternatives, despite running without explicit chain-of-thought reasoning at inference time.[^3][^15]
Independent leaderboards generally agreed with the model card numbers. The Vellum LLM Leaderboard tracked the upgraded snapshot near the top of coding and reasoning categories from late October 2024 through February 2025. On Artificial Analysis the model held a leading-tier ranking on its composite intelligence score for that window, behind the o1 family on math but ahead on coding tasks. On the LMSYS Chatbot Arena (later LMArena), the model held a top-five Elo score for the same period, frequently in the top three for coding-tagged prompts.[^19][^20]
Both snapshots were priced identically:
| Tier | Cost |
|---|---|
| Input tokens | $3 per million |
| Output tokens | $15 per million |
| Prompt caching, cache write | $3.75 per million |
| Prompt caching, cache read | $0.30 per million |
| Batch API input | $1.50 per million (50% discount) |
| Batch API output | $7.50 per million (50% discount) |
Prompt caching was added to the Claude API as a public beta on August 14, 2024, with 3.5 Sonnet as one of the launch models. Cache reads were priced at 10% of the regular input rate, which made multi-turn agent workloads dramatically cheaper than they had been on Claude 3 Opus. The Batch API followed in October 2024 with a flat 50% discount on both input and output for asynchronous workloads.[^21][^22]
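The effect of those rates on an agent loop can be worked out directly. The sketch below compares re-sending a shared prompt prefix on every turn with caching it, and prices a batch job. It is a simplification: it assumes the cache stays warm for the whole loop, ignores output tokens and new per-turn input in the prefix comparison, and uses hypothetical function names.

```python
# Per-million-token rates from the pricing table above, in US dollars.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
CACHE_WRITE_RATE = 3.75   # 1.25x the base input rate
CACHE_READ_RATE = 0.30    # 0.1x the base input rate
BATCH_DISCOUNT = 0.50     # Batch API halves input and output prices

PER_MILLION = 1_000_000


def prefix_cost(prefix_tokens: int, turns: int, cached: bool) -> float:
    """Dollar cost of sending the same prompt prefix on every turn of a loop.

    Assumes the cache stays warm for the whole loop, so only the first turn
    pays the cache-write rate and every later turn pays the cache-read rate.
    """
    if turns < 1:
        raise ValueError("turns must be at least 1")
    if not cached:
        return turns * prefix_tokens * INPUT_RATE / PER_MILLION
    first_turn = prefix_tokens * CACHE_WRITE_RATE / PER_MILLION
    later_turns = (turns - 1) * prefix_tokens * CACHE_READ_RATE / PER_MILLION
    return first_turn + later_turns


def batch_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of an uncached request submitted through the Batch API."""
    full_price = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / PER_MILLION
    return full_price * BATCH_DISCOUNT


print(f"uncached: ${prefix_cost(50_000, 10, cached=False):.4f}")
print(f"cached:   ${prefix_cost(50_000, 10, cached=True):.4f}")
print(f"batch:    ${batch_cost(1_000_000, 100_000):.2f}")
```

For a 50,000-token prefix reused over ten turns, this gives $1.50 without caching and $0.3225 with it, a saving of about 78.5%; the batch example (one million input tokens and 100,000 output tokens) comes to $2.25 instead of $4.50.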
The model was available on the Anthropic API, claude.ai (free, Pro, and Team), the Claude iOS app, Amazon Bedrock (including its cross-region inference endpoints), Google Cloud's Vertex AI, and model-routing services such as OpenRouter. The Claude desktop app, launched in late 2024, used 3.5 Sonnet as its default. The model was added to GitHub Copilot on October 29, 2024, the first time Anthropic models were available inside Copilot, and Microsoft's M365 Copilot followed shortly after. By early 2025 it was available natively in Cursor, Windsurf, Replit, Vercel v0, Sourcegraph Cody, Continue.dev, Aider, and most other developer tools that exposed model selection.[^2][^6][^16]
The Claude 3.5 family was announced at the June 2024 launch as a planned three-tier lineup mirroring Claude 3: a small Haiku, the mid-tier Sonnet, and a large Opus. Only Sonnet and Haiku ever shipped.
The result was a 3.5 family that consisted in practice of two models: an outsized Sonnet that beat its missing flagship, and a Haiku that matched the previous flagship. The asymmetry — flagship missing, mid-tier dominant — was unusual enough that it shaped how analysts thought about Anthropic's product strategy for the next year.[^2][^7][^10]
The initial press response in June 2024 was strong but measured. TechCrunch, The Verge, Ars Technica, and VentureBeat each ran reviews focused on the unusual price-versus-Opus framing and on Artifacts as a UX innovation. Most reviewers said the model was at least as good as GPT-4o on coding and general writing, and several called it the best frontier model available at the time. The Verge wrote that 3.5 Sonnet "feels like a meaningfully smarter Claude," and Ars Technica described Artifacts as among the most interesting consumer features shipped by a chatbot company that summer.[^24][^25]
The October 2024 upgrade was received with louder enthusiasm. Simon Willison's blog post on the launch became one of the most-cited responses, calling the SWE-bench jump "genuinely surprising" and noting that the same price tag for a substantially better model was unusual in the industry. Latent Space ran a long interview with Anthropic engineer Erik Schluntz that became a primary source for understanding how the team had trained Computer Use.[^18][^26]
The model's commercial reception was unusual for an Anthropic release. Where previous Claude generations had been respected but second-tier in market share, 3.5 Sonnet became the default model for a large slice of the developer-tooling market within a few weeks of launch. Adoption was driven by a combination of price (its $15-per-million output rate was half of GPT-4 Turbo's $30 and one fifth of Claude 3 Opus's $75), a long context window, strong coding scores, low latency, and the perception (initially anecdotal, later confirmed in benchmarks) that the model produced cleaner code than competitors.
| Partner | Integration | Timing |
|---|---|---|
| Cursor | Default chat model | June 2024, deepened October 2024 |
| GitHub Copilot | Selectable model in Copilot Chat | October 29, 2024 |
| Replit Agent | Primary code-generation model | September 2024 |
| Vercel v0 | Selectable model for frontend generation | Mid-2024 |
| Sourcegraph Cody | Selectable model in Cody | June 2024 |
| Windsurf (Codeium) | Selectable model | Late 2024 |
| Continue.dev | Selectable model | June 2024 |
| Microsoft M365 Copilot | Optional model for Copilot users | Late 2024 |
| Amazon Bedrock | Available since launch | June 2024 |
| Google Cloud Vertex AI | Available since launch | June 2024 |
| Notion AI | Backbone model | Late 2024 |
| Perplexity Pro | Selectable model | June 2024 |
GitHub's Universe announcement on October 29, 2024 made Claude 3.5 Sonnet the first non-OpenAI model available in GitHub Copilot, alongside Gemini 1.5 Pro. GitHub explicitly framed the change as a move to a multi-model architecture, ending the period in which Copilot ran exclusively on OpenAI models. Cursor's adoption was even more visible: by late 2024 the company was telling investors that the bulk of its inference spend went to Anthropic, and CEO Michael Truell described 3.5 Sonnet in interviews as the model that made the company's product viable in its modern form.[^6][^16][^23]
The model acquired a specific cultural reputation: by late 2024 it was common in technical Twitter and Hacker News threads to see lines like "Sonnet 3.5 is leading the field" or "if you're not using Sonnet for this you're losing." Newsletters such as Latent Space ran retrospectives describing the model as having defined the frontier for a six-to-eight-month window. The "Claude 3.6" community label was partly a recognition of capability gains and partly a recognition that the upgraded snapshot felt different — more direct, more willing to disagree, more comfortable with humor and irony than competing models.[^11][^26]
Claude 3.5 Sonnet is the model that converted Anthropic from a boutique AI safety lab into a commercial competitor. Before June 2024 the company was best known for its research output; afterward it was widely regarded as one of the two or three labs that could plausibly produce frontier models, and was repeatedly named alongside OpenAI and Google DeepMind in venture decks, press features, and government policy documents. The company's revenue more than tripled between June 2024 and the end of 2024 and continued to grow throughout the model's commercial life.[^7][^29]
Anthropic's announcement of Model Context Protocol on November 25, 2024 (one month after the Sonnet upgrade) was timed to take advantage of the moment. MCP shipped with Sonnet 3.5 as the implicit reference model and Claude Desktop as the reference client. The combined effect of an excellent coding model, Computer Use, and an open agent protocol was that Anthropic became the de facto agent infrastructure provider for late 2024 and most of 2025, even as competitors caught up on raw model quality.[^17]
The Sonnet line continued on a roughly three-to-four-month cadence after the October 2024 upgrade. Each successor kept the same $3 / $15 per million token price point and the same standard 200K context window, while improving coding, agentic, and reasoning benchmarks.
| Model | Announced | API ID | Key improvements |
|---|---|---|---|
| Claude 3.7 Sonnet | February 24, 2025 | claude-3-7-sonnet-20250219 | First hybrid reasoning model with extended thinking; SWE-bench Verified 70.3% |
| Claude Sonnet 4 | May 22, 2025 | claude-sonnet-4-20250514 | Stronger coding, instruction following; SWE-bench Verified ~72.7% |
| Claude Sonnet 4.5 | September 29, 2025 | claude-sonnet-4-5-20250929 | State of the art on SWE-bench Verified (~77.2%); 30-hour autonomous focus; OSWorld 61.4% |
Claude 3.7 Sonnet was announced on February 24, 2025, four months after the October upgrade. It was Anthropic's first hybrid reasoning model, capable of either responding immediately or switching into an extended-thinking mode that produced visible chain-of-thought reasoning. The version number jumped from 3.5 to 3.7, skipping 3.6 in deference to the community label that had attached to the October Sonnet 3.5 upgrade. Claude 3.7 Sonnet shipped alongside an early preview of Claude Code, Anthropic's agentic coding CLI, and became the new default Sonnet on the API, on claude.ai, on Bedrock, on Vertex AI, and inside Cursor, GitHub Copilot, and most of the partner products that had been running 3.5 Sonnet. The 3.5 Sonnet snapshots remained on the API as legacy options.[^12]
Claude Sonnet 4 (May 22, 2025) dropped the "3.x" naming and adopted a cleaner "Sonnet N" convention, released alongside Claude Opus 4 as the headline launch of the Claude 4 family. It improved SWE-bench Verified to around 72.7% at the same price point and added native long-running task support. Claude Sonnet 4.5 (September 29, 2025) was positioned by Anthropic as "the best coding model in the world" and the strongest model for building complex agents; on launch day it reached 77.2% on SWE-bench Verified and 61.4% on OSWorld, demonstrated multi-hour autonomous coding work, and became the recommended migration target for legacy 3.5 Sonnet workloads.[^5][^7]
On August 13, 2025 Anthropic notified developers that both Claude 3.5 Sonnet snapshots would be retired on October 28, 2025, directing them to newer Sonnet models; once Claude Sonnet 4.5 was released on September 29, it became the recommended replacement. The notice of at least ten weeks exceeded the minimum set by Anthropic's standard retirement process, which guarantees at least 60 days of advance warning for retirements of publicly released models. Initial communications had pointed at an October 22, 2025 retirement date, exactly one year after the upgraded snapshot's launch, before it was pushed back by six days to align with internal end-of-life scheduling. Both snapshots reached end-of-life on October 28, 2025; requests to either model on the Claude API now return an error.[^5]
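Because requests to the retired IDs fail, applications that pinned either snapshot needed a configuration change. One minimal approach, sketched below under the assumption that the chosen replacement is Claude Sonnet 4.5, is to remap retired IDs in one place and warn whenever the remap fires. The mapping and the `resolve_model` helper are hypothetical; a real migration would also re-run evaluations, since successor models behave differently.

```python
import warnings

# Hypothetical remap table: retired snapshot ID -> replacement model ID.
RETIRED_MODELS = {
    "claude-3-5-sonnet-20240620": "claude-sonnet-4-5-20250929",
    "claude-3-5-sonnet-20241022": "claude-sonnet-4-5-20250929",
}


def resolve_model(model_id: str) -> str:
    """Return a usable model ID, swapping out retired snapshots with a warning."""
    replacement = RETIRED_MODELS.get(model_id)
    if replacement is None:
        return model_id  # not retired: pass through unchanged
    warnings.warn(f"{model_id} is retired; using {replacement} instead", stacklevel=2)
    return replacement


print(resolve_model("claude-3-5-sonnet-20241022"))
```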
By the time of retirement, 3.5 Sonnet had been on the market for sixteen months, an unusually long commercial life for a frontier model, and had served as the default model in most major developer-tooling products for at least eight of those months. Its retirement closed a chapter more closely identified with a single model than any earlier period in Anthropic's history: the model that beat the flagship, made Anthropic commercially competitive, defined "agentic coding" as a category, and established $3 / $15 per million tokens as the going rate for a frontier mid-tier model, a price Anthropic kept through Claude Sonnet 4.5.
Anthropic has separately committed to long-term preservation of model weights and to making past models available again at some point in the future under restricted-access terms, in part because of "safety- and model welfare-related risks" the company associates with permanently retiring frontier models. Claude 3.5 Sonnet weights are preserved internally under that commitment but are not publicly redistributed.[^5]