Claude 3.5 Sonnet
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v3 · 6,546 words
Claude 3.5 Sonnet is a multimodal large language model developed by Anthropic and first released on June 20, 2024 as the opening model of the company's 3.5 generation.[^1] It was the model that ended Anthropic's role as a perennial second-place lab and made the Sonnet tier the company's signature product. A substantially upgraded version followed on October 22, 2024, shipping alongside Claude 3.5 Haiku and the public beta of Computer Use.[^2] Within the developer community the upgraded model was widely nicknamed "Claude 3.6" because of how large the jump was, and Anthropic eventually skipped that version number entirely when the next official release became Claude 3.7 Sonnet in February 2025.[^3]
The two snapshots use the API identifiers claude-3-5-sonnet-20240620 (the original) and claude-3-5-sonnet-20241022 (the upgrade). Both share a 200,000-token context window, an 8,192-token maximum output length, and a price of $3 per million input tokens and $15 per million output tokens.[^1][^2][^4] Both were released under AI Safety Level 2 (ASL-2) deployment standards. The upgraded snapshot improved SWE-bench Verified from 33.4% to 49.0%, which Anthropic at the time described as state of the art for any publicly available model, ahead of OpenAI's recently launched o1-preview reasoning model.[^2][^5][^6]
At launch Anthropic positioned 3.5 Sonnet as outperforming the larger and slower Claude 3 Opus at roughly one fifth of the price, an unusual move for the industry at the time.[^1] The June launch was paired with the debut of Artifacts, a side-panel interface in claude.ai that rendered code, websites, SVG graphics, and documents live as the model produced them, and which became the visual face of the model for most consumer users.[^1][^7] The model's combination of strong coding, low cost, fast latency, and a long context window made it the default option in Cursor, GitHub Copilot (after Anthropic was added in October 2024), Replit, Vercel v0, Sourcegraph Cody, and a long list of agent products built between mid-2024 and early 2025.[^1][^2][^8]
Claude 3.5 Sonnet held leading or tied-leading positions on coding, reasoning, and agentic-task benchmarks for most of the roughly eight-month window between its launch and the release of Claude 3.7 Sonnet. Both snapshots were retired on October 28, 2025 after Anthropic's standard 60-day notice period, with users directed to migrate to Claude Sonnet 4 and its successors.[^6]
By the spring of 2024 Anthropic had shipped its Claude 3 family in three sizes, Haiku, Sonnet, and Opus, released as a coordinated launch on March 4, 2024.[^9] The pattern of three sizes was meant to mirror conventional product tiers: Haiku was the small, fast, cheap model; Sonnet the mid-tier; and Claude 3 Opus the large flagship that won most benchmark contests but cost roughly five times as much per token to run.[^9] Across the industry the assumption in early 2024 was that frontier-level intelligence required flagship-class models, and that intermediate tiers existed for cost-sensitive workloads rather than for state of the art performance.[^9]
The "3.5" label was chosen to signal a partial generational improvement that kept costs flat for a given tier while substantially lifting capability. Anthropic did not commit publicly to delivering all three sizes of the 3.5 family on day one; the June 20, 2024 announcement described 3.5 Sonnet as "the first in our forthcoming Claude 3.5 model family," with Haiku and Opus refreshes intended to follow later in the year.[^1] Claude 3.5 Haiku eventually shipped in October 2024 alongside the upgraded Sonnet snapshot.[^2] Claude 3.5 Opus was repeatedly promised but never released; Anthropic quietly removed it from its roadmap and shipped Claude Opus 4 in May 2025 instead.[^10] That absence, a flagship 3.5 model that never arrived, became one of the more discussed signals about Anthropic's internal compute and product trade-offs during the period.[^1][^2][^10]
Inside Anthropic the model line traces back to the Claude 1 family of late 2022 and the Claude 2.0 and 2.1 releases of 2023, but the architectural lineage that produced 3.5 Sonnet is more directly the Claude 3 program. According to the model card published with the original snapshot, the family is trained on "a proprietary mix of publicly available information on the Internet as of April 2024, as well as non-public data from third parties, data provided by data labeling services and paid contractors, and data we generate internally."[^11] Anthropic has never disclosed parameter counts, exact pretraining FLOPs, or the specific architecture of any 3.5 model. The model is widely understood to be a transformer-based decoder language model with a vision encoder, trained with a combination of supervised fine-tuning and reinforcement learning, including the Constitutional AI feedback procedure that Anthropic introduced in 2022.[^11]
| Snapshot | API ID | Release date | Knowledge cutoff | Retirement |
|---|---|---|---|---|
| Original | claude-3-5-sonnet-20240620 | June 20, 2024 | April 2024 | October 28, 2025 |
| Upgraded ("new", informally "3.6") | claude-3-5-sonnet-20241022 | October 22, 2024 | April 2024 | October 28, 2025 |
Anthropic introduced Claude 3.5 Sonnet on June 20, 2024 in a blog post titled "Introducing Claude 3.5 Sonnet."[^1] The unusual framing was that the new mid-tier model outperformed the existing flagship: the post stated that 3.5 Sonnet "raises the industry bar for intelligence," beating both competitor models and Claude 3 Opus while operating "at twice the speed of Claude 3 Opus" and at the cost of the previous mid-tier model.[^1] The price was set at $3 per million input tokens and $15 per million output tokens, the same rate as Claude 3 Sonnet and far below Opus.[^1]
The model was made available the same day on the Anthropic API, claude.ai (free and Pro), the Claude iOS app, Amazon Bedrock, and Google Cloud's Vertex AI.[^1] It supported a 200,000-token context window, an 8,192-token maximum output, and image inputs (vision) in the same way Claude 3 Opus had. The knowledge cutoff was April 2024.[^1][^11]
At launch the company published a model card addendum to the existing Claude 3 model card rather than a fresh card, the rationale being that 3.5 Sonnet was an evolution of the Claude 3 family.[^11] The addendum reported benchmark results, vision evaluations, agentic coding numbers, refusal rates on Wildchat and XSTest, and a Responsible Scaling Policy safety evaluation in which Anthropic concluded that the model did not exceed thresholds for ASL-3 and shipped under ASL-2.[^11] The UK AI Security Institute ran independent pre-deployment testing on the model.[^11]
On October 22, 2024 Anthropic published a single post titled "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku."[^2] The release bundled three things: an upgraded Claude 3.5 Sonnet snapshot referenced as claude-3-5-sonnet-20241022; the introduction of Claude 3.5 Haiku (which followed later in the month); and a new public beta in the Anthropic API that let developers "direct Claude to use computers the way people do, by looking at a screen, moving a cursor, clicking buttons, and typing text."[^2]
The upgraded Sonnet was made available immediately on the Anthropic API, Amazon Bedrock, Google Cloud's Vertex AI, and claude.ai. The price stayed at $3 input and $15 output per million tokens. The context window stayed at 200K. The knowledge cutoff stayed at April 2024. The model card addendum for the upgrade noted that the model had retained the original Sonnet 3.5's pricing while improving "across the board."[^2][^12]
Critically, the new snapshot shared the user-visible name "Claude 3.5 Sonnet" with the June model but used a different dated API ID. This meant the only reliable way to distinguish the two from inside an application was the suffix on the model ID. Simon Willison, joining widespread criticism of Anthropic's naming strategy, argued that reusing the name understated performance improvements substantial enough to deserve a version number bump.[^13] The community quickly settled on the unofficial nickname "Claude 3.6" or "Sonnet 3.6" for the upgrade, and the label stuck so firmly that Anthropic later jumped directly to 3.7 to avoid colliding with it.[^14]
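Because the two snapshots share a display name, application code has to key off the dated suffix. The following is a minimal sketch, assuming the ID always ends in an eight-digit YYYYMMDD date (true of the two IDs listed above); the helper names `snapshot_date` and `is_upgraded_snapshot` are hypothetical and not part of any Anthropic SDK.

```python
from datetime import date

ORIGINAL = "claude-3-5-sonnet-20240620"
UPGRADED = "claude-3-5-sonnet-20241022"


def snapshot_date(model_id: str) -> date:
    """Return the date encoded in the trailing YYYYMMDD suffix of a model ID."""
    suffix = model_id.rsplit("-", 1)[-1]
    if len(suffix) != 8 or not suffix.isdigit():
        raise ValueError(f"no dated suffix in {model_id!r}")
    return date(int(suffix[:4]), int(suffix[4:6]), int(suffix[6:]))


def is_upgraded_snapshot(model_id: str) -> bool:
    """True when the ID is dated on or after the October 2024 snapshot."""
    return snapshot_date(model_id) >= snapshot_date(UPGRADED)
```

A caller that only cares about the two 3.5 Sonnet snapshots could equally compare the ID string against `UPGRADED` directly; parsing the date is just one way to make the ordering explicit.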
3.5 Sonnet handles general text generation, summarization, classification, multilingual translation, structured extraction, and standard reasoning. The October 2024 upgrade tightened instruction following: Anthropic reported a 51% human preference win rate over the original Sonnet 3.5 on "following precise instructions" tasks, and an IFEval score of 90.2%.[^12] In practice that change was visible to users in the form of cleaner adherence to format requests, JSON schemas, and complex prompt scaffolds, which is part of why the model became the default in agent frameworks.[^12]
On the original snapshot, the model reached 59.4% on GPQA Diamond (graduate-level science questions), 90.4% on MMLU (5-shot chain-of-thought), and 71.1% on the MATH benchmark.[^11] The upgraded snapshot moved GPQA Diamond to 65.0% and MATH to 78.3%, while adding 78.0% on MMLU-Pro (a harder MMLU variant not measured on the original card).[^12] Both snapshots score about 93% on BIG-Bench Hard and in the high 80s (F1) on DROP, and the upgrade also added an AIME 2024 measurement at 16.0% on zero-shot and 27.6% with Maj@64 sampling, signaling that the model was effective but not state of the art at olympiad-style mathematics, which was OpenAI o1's specialty at the time.[^12]
Coding was the headline strength of both snapshots. The June model led HumanEval at 92.0% and Anthropic's internal agentic coding evaluation at 64%, against 38% for Claude 3 Opus on the same task.[^1][^11] The October upgrade pushed those to 93.7% and 78% respectively, and added the SWE-bench Verified jump to 49.0% that became one of the most-cited numbers in the second half of 2024.[^2][^12] SWE-bench Verified at the time had a published state of the art of 45.2% (achieved by a system built on a competitor model with elaborate scaffolding); the upgraded Sonnet beat that as a single-model score and was explicitly ahead of OpenAI o1-preview, which Anthropic reported at 41.0% on the same benchmark.[^2][^15]
Anthropic's engineering team published a companion write-up that decomposed the harness used to obtain the 49% number. The harness had three components, two tools and a prompt: a bash tool with detailed instructions covering escaping, internet limitations, and background process management; an edit tool that performed file viewing, creation, and modification through string replacement (which Anthropic concluded was more reliable than diff-based or full-file-regeneration approaches); and a minimal natural-language prompt that suggested steps but did not enforce any particular workflow.[^15] The team noted that absolute file paths and exact-string validation were used to "error-proof" tool calls before submission, and that Sonnet 3.5 could perform agentic search across an unfamiliar codebase without a vector index or pre-built code map.[^15]
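The string-replacement idea behind the edit tool can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not Anthropic's harness code: the function name `str_replace_edit` and its error messages are hypothetical, and the only behaviors taken from the write-up are the requirement for absolute paths and the exact-string check before writing.

```python
from pathlib import Path


def str_replace_edit(path: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new` in the file at `path`.

    The call is rejected before any write if the path is relative, the
    target string is empty, or the target does not match exactly once.
    """
    file = Path(path)
    if not file.is_absolute():
        raise ValueError(f"path must be absolute: {path}")
    if not old:
        raise ValueError("old string must not be empty")
    text = file.read_text()
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    file.write_text(text.replace(old, new, 1))
    return f"edited {path}"
```

Requiring a unique exact match means an ambiguous or stale edit fails loudly instead of silently changing the wrong location, which is one plausible reading of why this approach proved more reliable than diff-based edits.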
Real-world adoption tracked the benchmarks. Cursor reported the model as their default for most users; Replit Agent used it; Sourcegraph Cody added it; v0 by Vercel made it a primary option for frontend code generation; and a large fraction of independent agentic-coding products built between mid-2024 and early 2025 had a Sonnet 3.5 path.[^8] By the time the upgrade landed in October, the model was widely described in developer communities as the default model for serious coding work, with Latent Space later writing that the "vibe shift in favor of Claude 3.5 Sonnet has been remarkably long-lived and persistent, surviving multiple subsequent updates of 4o, o1 and Gemini versions, for Anthropic's Claude to end 2024 as the preferred model for AI Engineers."[^14]
Both snapshots accept images as inputs and handle chart and graph interpretation, document understanding, science diagrams, OCR-like transcription from imperfect images, and visual math. MMMU validation reached 68.3% on the original snapshot and 70.4% on the upgrade, ChartQA reached 90.8% (unchanged), and DocVQA reached 95.2% on the original (slipping marginally to 94.2% on the upgrade).[^11][^12] AI2D reached 94.7% on the original and 95.3% on the upgrade. MathVista (testmini) advanced from 67.7% to 70.7%. The June 20, 2024 announcement described 3.5 Sonnet as "our strongest vision model yet, surpassing Claude 3 Opus on standard vision benchmarks," with particular strength in visual reasoning tasks like interpreting charts and graphs and accurate text transcription from imperfect images.[^1] The model card addendum framed vision as a steady, not headline, capability.[^11]
Both snapshots ship with a 200,000-token context window, the same as Claude 3 Opus, and a maximum output length of 8,192 tokens (an early stretch of Claude 3 Opus had been capped at 4,096 output tokens, which 3.5 Sonnet doubled out of the gate).[^1] The 200K context made Sonnet 3.5 well-suited to long-document workflows: summarizing large codebases, reasoning over multi-thousand-line contracts, and serving as the back end for RAG pipelines that retrieved large evidence sets. Anthropic cited independent third-party measurements in which the model handled needle-in-a-haystack retrieval and long-document question answering at levels comparable to Claude 3 Opus while costing 80% less to operate.[^16]
3.5 Sonnet was the first Anthropic model to launch with the company's standard structured tool use API generally available: the developer registers a tool schema, and the model emits typed tool_use calls that the host code executes.[^17] The approach generalized beyond function calling: it underpinned Computer Use at launch, became the basis for Model Context Protocol clients introduced in November 2024, and is the same primitive Claude Code used when it arrived in February 2025.[^17][^18] τ-bench (TAU-bench), an external benchmark for tool-using customer-service agents, was where the October upgrade most clearly outscored the field: 69.2% retail and 46.0% airline, both up substantially from the original Sonnet 3.5's 62.6% and 36.0%.[^2][^12]
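The register-a-schema, execute-the-call loop can be illustrated without any network access. The sketch below assumes the content-block shapes used by the Messages API tool use format (`tool_use` blocks carrying `id`, `name`, and `input`; `tool_result` blocks carrying `tool_use_id`); those field names should be checked against current documentation. The tool `get_order_status` and its stand-in implementation are hypothetical.

```python
import json

# Tool definition: a name, a description, and a JSON Schema for the input.
# The tool itself is a made-up example.
TOOLS = [
    {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order by its ID.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }
]


def get_order_status(order_id: str) -> dict:
    """Stand-in implementation; a real host would query its own backend."""
    return {"order_id": order_id, "status": "shipped"}


HANDLERS = {"get_order_status": get_order_status}


def run_tool_use(block: dict) -> dict:
    """Execute one tool_use content block and build the matching tool_result."""
    handler = HANDLERS.get(block["name"])
    if handler is None:
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"unknown tool: {block['name']}",
            "is_error": True,
        }
    result = handler(**block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": json.dumps(result),
    }
```

In a full loop the host would pass `TOOLS` with the request, call `run_tool_use` on each `tool_use` block in the reply, and send the resulting `tool_result` blocks back in the next user turn.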
Alongside the June 20, 2024 release, Anthropic launched a feature called Artifacts on claude.ai.[^7] Artifacts opened a dedicated panel next to the chat where Claude could render generated content in place: code, SVG illustrations, HTML pages, Mermaid diagrams, React components, single-file games, and documents.[^7] The panel updated in real time as the model produced output and could be edited or rolled back.[^7]
Technically the implementation relied on running rendered output inside a sandboxed iframe in the user's browser. SVG artifacts produce scalable vector graphics that render at any resolution; React components render live with state management, event handling, and Tailwind CSS styling; HTML pages accept CSS and JavaScript and can include interactive elements; Mermaid diagrams render flowcharts, sequence diagrams, class diagrams, Gantt charts, and entity-relationship diagrams from a text-based syntax. Artifacts are saved to the conversation and persist for later viewing or editing.[^7]
The combination of 3.5 Sonnet plus Artifacts gave non-developer users an immediate, visual way to use the model for what would later be called vibe coding. Artifacts also helped seed a class of single-prompt mini-apps that became a recurring genre on Twitter and Hacker News during the second half of 2024, including share-a-tweet demos of working calculators, retro games, generative art, and dashboards produced entirely from a single prompt. Anthropic later expanded Artifacts to support persistent saved projects, embedded sharing, and a separate "publish" mode in 2025.
Artifacts is one of the few consumer-facing AI UX patterns from 2024 that was widely copied. ChatGPT Canvas, released by OpenAI in October 2024, and Google's Gemini Code Canvas borrowed the side-panel structured-editor concept directly, and subsequent press coverage widely described both as having drawn on Artifacts for inspiration.
The October 22, 2024 release introduced Computer Use, a public beta capability that let Claude operate a real desktop environment by reading screenshots and emitting mouse and keyboard actions.[^2] This made the upgraded Claude 3.5 Sonnet the first frontier model from a major AI lab to ship a general-purpose desktop automation capability.[^2] The Anthropic-hosted demo ran the model inside a sandboxed virtual machine and showed it filling vendor forms, browsing Google Maps, and navigating GUI applications.[^2] On the OSWorld benchmark, the upgraded Sonnet 3.5 scored 14.9% on the 15-step screenshot-only setting and 22% with 50 steps, against a previous best of around 7.8%.[^2] Human performance on OSWorld is roughly 72.36%.[^2] Anthropic was candid in the launch post that Computer Use was "at times cumbersome and error-prone" and recommended developers run it in containers with limited privileges.[^2]
Crucially, Computer Use was launched with the October 22, 2024 snapshot only; the original June 20, 2024 snapshot did not include this capability. Developers wishing to use the feature had to switch their model ID to claude-3-5-sonnet-20241022.
Computer Use was one of the first general-purpose agentic-control products from a major lab and predated OpenAI Operator (January 2025) and Google's Project Mariner (December 2024).[^19] It became the template that successive Claude generations refined: by Claude Sonnet 4.5 the same OSWorld benchmark had reached 61.4%.[^20]
Under the hood, Computer Use was implemented as a structured tool-use pattern. Anthropic defined three reference tools: a computer tool that emitted screen actions like screenshot, left_click, type, key, mouse_move, and scroll; a text_editor tool for file editing; and a bash tool for shell command execution.[^2][^21] These three tools became the bedrock of agentic Claude integrations for the next year. The text_editor and bash tool patterns were carried over directly into Claude Code when it launched in February 2025.[^18][^21]
The October release noted screen-resolution caveats. Anthropic recommended that developers send screenshots no larger than XGA (1024 by 768) for 4:3 aspect ratios and WXGA (1280 by 800) for widescreen aspect ratios, "to avoid issues related to image resizing."[^13] Coordinate-detection accuracy, which prior models had struggled with, was a focus of the upgrade and accounted for much of the OSWorld improvement.[^13]
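One way to follow the resolution guidance is to downscale screenshots on the host side and map the model's click coordinates back to the real display. The sketch below is a hypothetical helper, not Anthropic's reference implementation: the function names and the aspect-ratio threshold of 1.5 for choosing between the two targets are assumptions.

```python
XGA = (1024, 768)   # 4:3 target from the guidance above
WXGA = (1280, 800)  # widescreen target from the guidance above


def pick_target(width: int, height: int) -> tuple[int, int]:
    """Choose XGA for roughly 4:3 screens and WXGA for wider ones.

    Screens that already fit inside the chosen target are left unscaled.
    """
    target = XGA if width / height < 1.5 else WXGA  # threshold is an assumption
    if width <= target[0] and height <= target[1]:
        return (width, height)
    return target


def to_screen_coords(
    x: int, y: int, screen: tuple[int, int], shot: tuple[int, int]
) -> tuple[int, int]:
    """Map a point on the downscaled screenshot back to real screen pixels."""
    return (round(x * screen[0] / shot[0]), round(y * screen[1] / shot[1]))
```

Scaling each axis independently keeps the mapping correct even when the target aspect ratio differs slightly from the display's, for example a 16:9 screen resized to the 16:10 WXGA target.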
Prompt injection was identified at launch as the primary residual risk of the capability. Because Claude operates by reading screenshots that may contain web content, embedded instructions on a page or in a downloaded file can attempt to override the user's instructions.[^22] Anthropic published a Trust and Safety analysis showing that internal classifiers blocked 88% of injection attempts in red-teaming, against 74% with no safety systems applied, and recommended a defense-in-depth approach including sandboxing, restricted privileges, and human approval steps for irreversible actions.[^22]
In an October 22, 2024 interview with the Latent Space podcast, Anthropic engineer Erik Schluntz described the design philosophy as "pokayoke" (mistake-proofing): the team detailed each tool with extensive documentation and examples rather than minimal API specifications, and let Claude decide which tool to invoke for each step instead of using a fixed agent loop.[^14] Schluntz argued that the existence of a general-purpose computer-use tool meant developers could "give the thing a browser that's logged into what you want to integrate with, and it's going to work immediately" without writing dedicated API integrations.[^14]
The table below summarizes the headline benchmark scores for both snapshots, drawn from Anthropic's own model card addendums (June 2024 for the original, October 2024 for the upgrade).[^11][^12] Where the original is missing a number, Anthropic did not report it on the original card.
| Benchmark | Original (June 2024) | Upgraded (October 2024) | Notes |
|---|---|---|---|
| GPQA Diamond (0-shot CoT) | 59.4% | 65.0% | Graduate-level science Q&A |
| MMLU (5-shot CoT) | 90.4% | 90.5% | General reasoning |
| MMLU-Pro (0-shot CoT) | not reported | 78.0% | Harder MMLU variant |
| MATH (0-shot CoT) | 71.1% | 78.3% | Mathematical problem solving |
| HumanEval | 92.0% | 93.7% | Python coding |
| MGSM | 91.6% | 92.5% | Multilingual math |
| DROP (3-shot, F1) | 87.1 | 88.3 | Reading comprehension |
| BIG-Bench Hard (3-shot CoT) | 93.1% | 93.2% | Mixed evaluations |
| GSM8K | 96.4% | not reported | Grade-school math |
| AIME 2024 (0-shot CoT) | not reported | 16.0% | High school math contest |
| AIME 2024 (Maj@64) | not reported | 27.6% | |
| IFEval | not reported | 90.2% | Instruction following |
| MMMU (validation, 0-shot) | 68.3% | 70.4% | Visual question answering |
| MathVista (testmini) | 67.7% | 70.7% | Visual math reasoning |
| AI2D | 94.7% | 95.3% | Science diagrams |
| ChartQA | 90.8% | 90.8% | Chart understanding |
| DocVQA (ANLS) | 95.2% | 94.2% | Document understanding |
| SWE-bench Verified | 33.4% | 49.0% | Real GitHub issues |
| τ-bench retail (pass^1) | 62.6% | 69.2% | Tool-use customer service |
| τ-bench airline (pass^1) | 36.0% | 46.0% | Harder τ-bench split |
| OSWorld (15-step, screenshot) | not applicable | 14.9% | Desktop agent |
| Internal agentic coding | 64% | 78% | Anthropic internal eval |
The single benchmark that did the most to define the model's reputation was SWE-bench Verified. The upgraded snapshot's 49.0% put it ahead of OpenAI's o1-preview reasoning model (41.0% on the same benchmark in OpenAI's reported numbers) and ahead of all open-source alternatives, despite running without explicit chain-of-thought reasoning at inference time.[^2][^5]
Independent leaderboards generally agreed with the model card numbers. The Vellum LLM Leaderboard tracked the upgraded snapshot near the top of coding and reasoning categories from late October 2024 through February 2025.[^23] On Artificial Analysis the model held a leading-tier ranking on its composite intelligence score for that window, behind the o1 family on math but ahead on coding tasks.[^24] On the LMSYS Chatbot Arena (later LMArena), the model held a top-five Elo score for the same period, frequently in the top three for coding-tagged prompts.[^25]
Both snapshots were released under Anthropic's Responsible Scaling Policy at AI Safety Level 2 (ASL-2). The original June 2024 model card addendum stated that the policy review concluded the model "did not meet the thresholds for ASL-3" and shipped under ASL-2 deployment standards.[^11] The October 2024 addendum repeated this conclusion for the upgraded snapshot, including in the context of the new Computer Use capability.[^12]
The October 22, 2024 release was the subject of a joint pre-deployment evaluation by the UK AI Security Institute (UK AISI) and the US AI Safety Institute (US AISI), the first public AI evaluation jointly conducted by the two institutes.[^26] The institutes had limited pre-deployment access to the model and shared findings with Anthropic before the public release. Testing covered four domains: biological capabilities, cyber capabilities, software and AI development, and safeguard efficacy.[^26]
On biological capabilities, the upgraded model performed comparably to other reference models on multiple-choice tasks but below human expert baselines; when augmented with bioinformatic tools, performance occasionally exceeded expert baselines on specific subtasks.[^26] On cyber capabilities, the model solved 32.5% of 40 public cybersecurity challenges, against 35% for the best reference model, and 36% of 47 UK-developed apprentice-level cybersecurity challenges, against 29% for reference models.[^26] On software and AI development, the model achieved 57% on ML model optimization tasks (against 48% for the best reference model) and 66% on software engineering tasks (against 64% for reference).[^26] On safeguard efficacy, both institutes concluded existing safeguards could be "routinely" circumvented through jailbreaks, a result consistent with other contemporary AI systems.[^26]
Anthropic separately reported that internal red-teaming systems trained to detect prompt injection attempts during Computer Use blocked 88% of attacks in adversarial testing, compared to 74% without those systems, and described prompt injection as the primary residual risk of the capability.[^22]
Both snapshots were priced identically:[^1][^2][^4]
| Tier | Cost |
|---|---|
| Input tokens | $3 per million |
| Output tokens | $15 per million |
| Prompt caching, cache write | $3.75 per million |
| Prompt caching, cache read | $0.30 per million |
| Batch API input | $1.50 per million (50% discount) |
| Batch API output | $7.50 per million (50% discount) |
Prompt caching was added to the Claude API as a public beta on August 14, 2024, with 3.5 Sonnet as one of the launch models.[^27] Cache reads were priced at 10% of the regular input rate ($0.30 per million tokens for 3.5 Sonnet, compared to $3 for non-cached input), which made multi-turn agent workloads dramatically cheaper than they had been on Claude 3 Opus.[^27] Cache writes carried a 25% premium ($3.75 per million tokens). Anthropic reported customer-side measurements showing up to 90% cost savings and 79% latency reductions on heavy-context chat workloads after enabling caching, and named Notion AI as an early adopter.[^27] The Batch API (Anthropic Message Batches API) followed in October 2024 with a flat 50% discount on both input and output for asynchronous workloads.[^28]
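The arithmetic behind these discounts is easy to check. The sketch below hard-codes the rates from the pricing table; the function name `request_cost` is hypothetical, and applying the batch discount only to regular input and output tokens is an assumption based on how the discount is described above.

```python
# USD per million tokens, taken from the pricing table above.
RATES = {
    "input": 3.00,
    "output": 15.00,
    "cache_write": 3.75,
    "cache_read": 0.30,
}


def request_cost(
    input_tokens: int = 0,
    output_tokens: int = 0,
    cache_write_tokens: int = 0,
    cache_read_tokens: int = 0,
    batch: bool = False,
) -> float:
    """Estimate the USD cost of one request at 3.5 Sonnet list prices."""
    # Assumption: the 50% batch discount applies to input and output only.
    discount = 0.5 if batch else 1.0
    total = (
        input_tokens * RATES["input"] * discount
        + output_tokens * RATES["output"] * discount
        + cache_write_tokens * RATES["cache_write"]
        + cache_read_tokens * RATES["cache_read"]
    )
    return total / 1_000_000


# A 100,000-token prefix sent on ten consecutive turns.
uncached = 10 * request_cost(input_tokens=100_000)
cached = request_cost(cache_write_tokens=100_000) + 9 * request_cost(
    cache_read_tokens=100_000
)
```

Under these rates the ten uncached turns cost $3.00 of input, while one cache write plus nine cache reads cost about $0.65, which shows why long-context agent loops benefited so much from caching.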
The model was available on the Anthropic API, claude.ai (free, Pro, and Team), the Claude iOS app, Amazon Bedrock (including its cross-region endpoints), Google Cloud's Vertex AI, and through major model-routing providers such as OpenRouter.[^1][^2] The Claude desktop app, launched in late 2024, used 3.5 Sonnet as its default. The model was added to GitHub Copilot on October 29, 2024, the first time Anthropic models were available inside Copilot, and Microsoft's M365 Copilot followed shortly after.[^8] By early 2025 it was available natively in Cursor, Windsurf, Replit, Vercel v0, Sourcegraph Cody, Continue, Aider, and most other developer tools that exposed model selection.[^8]
The Claude 3.5 family was announced at the June 2024 launch as a planned three-tier lineup mirroring Claude 3: a small Haiku, the mid-tier Sonnet, and a large Opus.[^1] Only Sonnet and Haiku ever shipped.
The result was a 3.5 family that consisted in practice of two models: an outsized Sonnet that beat its missing flagship, and a Haiku that matched the previous flagship. The asymmetry, flagship missing, mid-tier dominant, was unusual enough that it shaped how analysts thought about Anthropic's product strategy for the next year.[^3][^10]
The initial press response in June 2024 was strong but measured. TechCrunch, The Verge, Ars Technica, and VentureBeat each ran reviews focused on the unusual price-versus-Opus framing and on Artifacts as a UX innovation. Most reviewers said the model was at least as good as GPT-4o on coding and general writing, and several called it the best frontier model available at the time. The Verge described 3.5 Sonnet as "supposed to rival GPT-4o" and ran a side-by-side examination of the new chat and Artifacts interface.[^29] Ars Technica titled its coverage "Anthropic's Claude 3.5 Sonnet beats GPT-4o on most benchmarks" and noted that the model matched GPT-4o on the rest of Anthropic's published comparisons.[^30] PYMNTS focused on a less benchmark-heavy thread, namely that the model "understands humor" better than its predecessors, picking up on Anthropic's own emphasis on writing tone in the launch post.[^31]
The October 2024 upgrade was received with louder enthusiasm. Simon Willison's blog post on the launch became one of the most-cited responses, calling the SWE-bench jump "genuinely surprising" and noting that the same price tag for a substantially better model was unusual in the industry.[^13] Latent Space ran a long interview with Anthropic engineer Erik Schluntz that became a primary source for understanding how the team had trained Computer Use and the SWE-bench agent harness.[^14]
The model's commercial reception was unusual for an Anthropic release. Where previous Claude generations had been respected but second-tier in market share, 3.5 Sonnet became the default model for a large slice of the developer-tooling market within a few weeks of launch.[^8][^14] Adoption was driven by a combination of price (half of GPT-4 Turbo's output rate at the time, and one fifth of Claude 3 Opus's), a long context window, strong coding scores, low latency, and the perception (initially anecdotal, later confirmed in benchmarks) that the model produced cleaner code than competitors.
| Partner | Integration | Timing |
|---|---|---|
| Cursor | Default tab-complete and chat model | June 2024, deepened October 2024 |
| GitHub Copilot | Selectable model in Copilot Chat | October 29, 2024 |
| Replit Agent | Primary code-generation model | September 2024 |
| Vercel v0 | Selectable model for frontend generation | Mid-2024 |
| Sourcegraph Cody | Selectable model in Cody | June 2024 |
| Windsurf (Codeium) | Selectable model | Late 2024 |
| Continue.dev | Selectable model | June 2024 |
| Microsoft M365 Copilot | Optional model for Copilot users | Late 2024 |
| Amazon Bedrock | Available since launch | June 2024 |
| Google Cloud Vertex AI | Available since launch | June 2024 |
| Notion AI | Backbone model | Late 2024 |
| Perplexity Pro | Selectable model | June 2024 |
GitHub's Universe announcement on October 29, 2024 made Claude 3.5 Sonnet the first non-OpenAI model available in GitHub Copilot, alongside Gemini 1.5 Pro and o1-preview.[^8] Microsoft explicitly framed it as moving Copilot to a multi-model architecture, and it was the moment that ended the Copilot-only-uses-OpenAI era. The announcement said Claude 3.5 Sonnet "excels across the entire software development lifecycle, from initial design to bug fixes, maintenance to optimizations" and described how it would roll out progressively over a one-week window.[^8]
The October 2024 Computer Use launch was accompanied by named industry adopters. Asana, Canva, Cognition AI, DoorDash, Replit, and The Browser Company were cited as launch partners.[^2] Cognition reported "substantial improvements in coding, planning, and problem-solving" using the upgraded model for its Devin software agent product, The Browser Company said the new model "outperformed every model they've tested before" for browser automation, and GitLab found "stronger reasoning (up to 10% across use cases) with no added latency" for its DevSecOps workflows.[^2]
Cursor's adoption was particularly visible. By late 2024 the company was telling investors that the bulk of its inference spend went to Anthropic, and Cursor's product communications repeatedly framed 3.5 Sonnet as the model that made the modern Cursor experience viable.[^32] Replit Agent, the company's autonomous app-building product, used Sonnet 3.5 as its primary model from launch.[^2]
The model acquired a specific cultural reputation: by late 2024 it was common in technical Twitter and Hacker News threads to see lines like "Sonnet 3.5 is leading the field" or "if you're not using Sonnet for this you're losing." Newsletters such as Latent Space ran retrospectives describing the model as having defined the frontier for a six-to-eight-month window. The "Claude 3.6" community label was partly a recognition of capability gains and partly a recognition that the upgraded snapshot felt different: more direct, more willing to disagree, more comfortable with humor and irony than competing models.[^14][^31]
Claude 3.5 Sonnet is the model that converted Anthropic from a boutique AI safety lab into a commercial competitor. Before June 2024 the company was best known for its research output; afterward it was widely regarded as one of the two or three labs that could plausibly produce frontier models, and was repeatedly named alongside OpenAI and Google DeepMind in venture decks, press features, and government policy documents. Industry analyst reports estimated Anthropic's annualized revenue grew from approximately $200 million entering 2024 to roughly $1 billion by the end of the year, with continued growth in 2025.[^33]
Anthropic's announcement of the Model Context Protocol (MCP) on November 25, 2024, about one month after the Sonnet upgrade, was timed to take advantage of the moment. MCP shipped with 3.5 Sonnet as the implicit reference model and the Claude Desktop app as the reference client.[^18] The combined effect of an excellent coding model, Computer Use, and an open agent protocol was that Anthropic became the de facto agent infrastructure provider for late 2024 and most of 2025, even as competitors caught up on raw model quality.[^18]
The Sonnet line continued on a roughly three-to-four-month cadence after the October 2024 upgrade. Each successor kept the same $3 / $15 per million token price point and the same 200K context window, while improving coding, agentic, and reasoning benchmarks.
| Model | Announced | API ID | Key improvements |
|---|---|---|---|
| Claude 3.7 Sonnet | February 24, 2025 | claude-3-7-sonnet-20250219 | First hybrid reasoning model with extended thinking; SWE-bench Verified 70.3% |
| Claude Sonnet 4 | May 22, 2025 | claude-sonnet-4-20250514 | Stronger coding and instruction following; SWE-bench Verified 72.7% |
| Claude Sonnet 4.5 | September 29, 2025 | claude-sonnet-4-5-20250929 | State of the art on SWE-bench Verified (77.2%); 30-hour autonomous focus; OSWorld 61.4% |
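Because the price point stayed fixed across these releases, the list-price cost of a request can be estimated the same way for each model in the table. The following is a minimal sketch of that arithmetic, assuming only the $3 / $15 per million token figures quoted above; the helper name `estimate_cost_usd` is hypothetical and is not part of any Anthropic SDK, and the sketch ignores any discounts or special pricing tiers.

```python
from decimal import Decimal

# List prices quoted in this article, in USD per one million tokens.
INPUT_USD_PER_MTOK = Decimal("3")
OUTPUT_USD_PER_MTOK = Decimal("15")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: list-price cost of a single request in USD."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_USD_PER_MTOK
        + Decimal(output_tokens) * OUTPUT_USD_PER_MTOK
    )
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # A request that fills a 200,000-token context and returns 8,192 tokens.
    print(estimate_cost_usd(200_000, 8_192))
```

Under these assumptions, a request that fills the full 200K context and returns 8,192 output tokens costs roughly 72 cents, with the input side accounting for most of it.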
Claude 3.7 Sonnet was announced on February 24, 2025, about four months after the October upgrade. It was Anthropic's first hybrid reasoning model, capable of either responding immediately or switching into an extended-thinking mode that produced visible chain-of-thought reasoning.[^34] The version number jumped from 3.5 to 3.7, skipping 3.6 in deference to the community label that had attached to the October 3.5 Sonnet upgrade.[^34] Claude 3.7 Sonnet shipped alongside an early preview of Claude Code, Anthropic's agentic coding CLI, and it became the new default Sonnet on the API, on claude.ai, on Bedrock, on Vertex AI, and inside Cursor, GitHub Copilot, and most of the partner products that had been running 3.5 Sonnet.[^34] The 3.5 Sonnet snapshots remained on the API as legacy options.[^34]
Claude Sonnet 4 (May 22, 2025) dropped the "3.x" naming and adopted a cleaner "Sonnet N" convention. It was released alongside Claude Opus 4 as the headline launch of the Claude 4 family, improved SWE-bench Verified to 72.7% at the same price point, and added native long-running task support. Claude Sonnet 4.5 (September 29, 2025) was positioned by Anthropic as "the best coding model in the world" and the strongest model for building complex agents. On launch day it reached 77.2% on SWE-bench Verified and 61.4% on OSWorld and demonstrated multi-hour autonomous coding work, and Anthropic recommended it as the migration target for legacy 3.5 Sonnet workloads.[^6]
On August 13, 2025 Anthropic notified developers that both Claude 3.5 Sonnet snapshots would be retired on October 28, 2025, with Claude Sonnet 4.6 listed as the recommended replacement.[^6] The notice, given 76 days ahead of retirement, followed Anthropic's standard retirement process, which guarantees at least 60 days of advance warning for retirements of publicly released models.[^6] Both snapshots reached end-of-life on October 28, 2025; requests to either model on the Claude API now return an error.[^6]
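The notice period can be checked with simple date arithmetic. This is a small sketch using only the two dates and the 60-day minimum stated above; the constant and function names are illustrative and do not come from any Anthropic tooling.

```python
from datetime import date

# Dates and minimum notice as stated in this article.
NOTICE_SENT = date(2025, 8, 13)
RETIREMENT = date(2025, 10, 28)
MIN_NOTICE_DAYS = 60


def notice_days(notified: date, retired: date) -> int:
    """Number of calendar days between the notice and the retirement date."""
    return (retired - notified).days


if __name__ == "__main__":
    days = notice_days(NOTICE_SENT, RETIREMENT)
    print(days, days >= MIN_NOTICE_DAYS)  # prints "76 True"
```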
By the time of retirement, 3.5 Sonnet had been on the market for sixteen months, an unusually long commercial life for a frontier model, and had served as the default model in most major developer-tooling products for at least eight of those months. Its retirement closed a chapter that was tied to a single model more closely than any earlier period in the company's history: the model that beat the flagship, made Anthropic commercially competitive, defined "agentic coding" as a category, and established $3 / $15 per million tokens as the going rate for a frontier mid-tier model, a price Anthropic continued to honor through Claude Sonnet 4.5.
Anthropic has separately committed to long-term preservation of model weights and to making past models available again at some point in the future under restricted-access terms, in part because of "safety- and model welfare-related risks" the company associates with permanently retiring frontier models.[^6] Claude 3.5 Sonnet weights are preserved internally under that commitment but are not publicly redistributed.[^6]