Cursor Composer 2.5
Last reviewed
Jun 3, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,462 words
Cursor Composer 2.5 is a proprietary agentic coding model built by Anysphere, the company behind the Cursor code editor. Released on May 18, 2026, it is the fourth model in Cursor's in-house line and the immediate successor to Composer 2. Like its predecessor, it is post-trained from Moonshot AI's open-weight Kimi K2.5 checkpoint rather than trained from scratch.[1][2] Cursor pitched it as a fast model for agentic work inside the editor, claiming it lands near Claude Opus 4.7 on coding benchmarks at roughly one-tenth the per-token cost.[1][3] An independent evaluation by Artificial Analysis placed it third among coding agents, behind Opus 4.7 and GPT-5.5, which tempers the "matches the frontier" framing somewhat.[4]
Composer 2.5 is an AI coding agent: a model designed to read and edit code across many files, run terminal commands, call tools, and keep working through a multi-step task with limited supervision. It is not a general chatbot, and it is not sold as a standalone API. The model runs only inside Cursor's own products (the desktop IDE, the Cursor CLI, and the company's web interface), so unlike Opus 4.7 or GPT-5.5 it cannot be dropped into a third-party stack behind a unified API.[3][4]
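The loop that turns a model into an agent, rather than a chatbot, can be sketched in a few lines. This is a generic illustration under stated assumptions, not Cursor's implementation: `scripted_policy` is a hypothetical stand-in for the model's decision at each step, and the tool names (`read_file`, `edit_file`, `run_tests`) and the in-memory file store are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    tool: str           # tool to call, or "done" to hand control back to the user
    argument: str = ""


@dataclass
class AgentState:
    task: str
    history: List[str] = field(default_factory=list)  # one entry per tool call and its result


def run_agent(task: str, policy: Callable[[AgentState], Action],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 20) -> AgentState:
    """Generic agent loop: ask the policy for an action, run the tool, record the observation."""
    state = AgentState(task)
    for _ in range(max_steps):
        action = policy(state)
        if action.tool == "done":
            break
        observation = tools[action.tool](action.argument)
        state.history.append(f"{action.tool}({action.argument}) -> {observation}")
    return state


# Hypothetical in-memory "repository" and tools, for illustration only.
files = {"app.py": "print('helo')"}

def read_file(path: str) -> str:
    return files[path]

def edit_file(arg: str) -> str:
    path, old, new = arg.split("|")  # assumed format: "path|old text|new text"
    files[path] = files[path].replace(old, new)
    return "ok"

def run_tests(_: str) -> str:
    return "pass" if "hello" in files["app.py"] else "fail"

TOOLS = {"read_file": read_file, "edit_file": edit_file, "run_tests": run_tests}


def scripted_policy(state: AgentState) -> Action:
    """Stand-in for the model: follow a fixed plan, then stop once the tests have been run."""
    plan = [Action("read_file", "app.py"),
            Action("edit_file", "app.py|helo|hello"),
            Action("run_tests")]
    step = len(state.history)
    return plan[step] if step < len(plan) else Action("done")


state = run_agent("fix the greeting typo", scripted_policy, TOOLS)
for line in state.history:
    print(line)
```

The structural point is the loop itself: each tool result is appended to the history the policy sees on its next step, and the run ends either when the policy signals it is done (handing back control) or when the step budget runs out.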
The design priority is speed in the agent loop. Cursor's whole argument for building its own model, going back to the original Composer, has been that a slightly less capable model that responds quickly feels better to code with than a stronger model that makes you wait. Composer 2.5 keeps that bet. Cursor describes it as better at sustained work on long-running tasks, more reliable at following complex instructions, and better at calibrating effort, spending more compute on hard problems and less on easy ones.[1] The company concedes that some of these behavior changes are hard to capture with existing benchmarks.[1]
Cursor spent its first few years routing requests to frontier models from Anthropic, OpenAI, and others. That changed in October 2025 with Composer, the company's first proprietary coding model, shipped alongside the Cursor 2.0 multi-agent interface. Composer was a mixture-of-experts model trained with reinforcement learning in real development environments, and Cursor claimed it generated about four times faster than comparable models while reaching frontier-level coding results on its internal tests.[5][6]
The line moved quickly after that. The table below shows the progression.
| Model | Released | Notes |
|---|---|---|
| Composer (1) | October 2025 | First in-house model, shipped with Cursor 2.0; RL-trained MoE focused on speed[5] |
| Composer 1.5 | February 2026 | Scaled-up RL iteration |
| Composer 2 | March 27, 2026 | First built on the Kimi K2.5 open checkpoint; CursorBench 61.3, SWE-Bench Multilingual 73.7[2] |
| Composer 2.5 | May 18, 2026 | Same base, heavier post-training; SWE-Bench Multilingual 79.8[1] |
A point of confusion is worth clearing up: Composer 2.5 is a successor to Composer 1 only in the loose sense of being a later model in the same family. Its direct parent is Composer 2, and it shares Composer 2's Kimi K2.5 foundation. Cursor later acknowledged that it had not been upfront about that foundation, with the company conceding that it "was a miss to not mention the Kimi base in our blog from the start."[3]
Composer 2.5 starts from Kimi K2.5, the open-weight checkpoint released by the Chinese lab Moonshot AI. Kimi K2.5 is a large mixture-of-experts model with on the order of a trillion total parameters but a much smaller number active per token, which is what makes it cheap to serve at speed.[3][7] Cursor does not build this base itself. Instead it layers its own work on top, and according to coverage of the launch, roughly 85 percent of the total compute behind the finished model went into that post-training rather than into the base model.[3][7]
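Why a small active-parameter count makes a large MoE model cheap to serve can be seen in a toy router. The sketch below shows generic top-k expert routing in plain Python; it is not Kimi K2.5's actual architecture, and the expert count, `k`, and router weights are made-up values chosen for readability. Every expert receives a router score, but only the `k` highest-scoring experts actually run, so compute per token scales with `k` rather than with the total number of experts.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: List[Vector],
                experts: List[Callable[[Vector], Vector]], k: int = 2) -> Tuple[Vector, List[int]]:
    """Route one token vector to its top-k experts and mix their outputs."""
    # One router score per expert: dot product of the token with that expert's router row.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # renormalize over the selected experts only
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only the k selected experts do any work for this token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


calls = []  # records which experts actually ran

def make_expert(idx: int, scale: float) -> Callable[[Vector], Vector]:
    """Hypothetical toy expert: scales its input by a constant."""
    def expert(v: Vector) -> Vector:
        calls.append(idx)
        return [scale * vi for vi in v]
    return expert


experts = [make_expert(i, s) for i, s in enumerate([1.0, 2.0, 3.0, 4.0])]
router = [[0.1, 0.0], [0.9, 0.0], [0.5, 0.0], [0.3, 0.0]]
out, chosen = moe_forward([1.0, 0.0], router, experts, k=2)
print(chosen, calls)  # experts 1 and 2 are selected, and only they are called
```

With four experts and `k=2`, half of the expert weights touch each token; production MoE models use far more experts and a much smaller active fraction, which is the property that keeps serving cost low.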
The post-training is mostly reinforcement learning carried out in realistic Cursor sessions, with the model given the same search and edit tools a user would have, so the rewards track real developer workflows rather than abstract puzzles.[2] For Composer 2.5 specifically, Cursor highlighted three changes over Composer 2. The first is scale: about 25 times more synthetic training tasks, some generated by deleting features from real codebases and asking the model to rebuild them. The second is a technique Cursor calls targeted RL with textual feedback, in which a written hint is injected at the exact point in a long trajectory where the model went wrong and the correction is delivered through on-policy distillation. The third is infrastructure work, including custom low-precision kernels for MoE training on Blackwell GPUs.[1][2]
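Cursor has not published code for targeted RL with textual feedback, so the following is only a rough illustration of the general idea under explicit assumptions. It assumes a trajectory is a list of steps, that the index of the faulty step is already known, and that on-policy distillation takes its common form: the student samples its own trajectory, a teacher (assumed here to be the same model with the hint added to its context) scores the student's choices, and the loss is the per-step reverse KL divergence from student to teacher. The functions `insert_hint` and `distillation_loss` and the toy probability vectors are hypothetical, and the sketch computes only the loss value, not a gradient update.

```python
import math
from typing import List

Distribution = List[float]  # probabilities over a (toy) vocabulary


def insert_hint(trajectory: List[str], error_step: int, hint: str) -> List[str]:
    """Teacher context: the student's own steps, with a written hint placed just before the faulty step."""
    return trajectory[:error_step] + [f"[hint] {hint}"] + trajectory[error_step:]


def reverse_kl(student: Distribution, teacher: Distribution) -> float:
    """KL(student || teacher); terms where the student assigns zero probability contribute nothing."""
    return sum(p * math.log(p / q) for p, q in zip(student, teacher) if p > 0)


def distillation_loss(student: List[Distribution], teacher: List[Distribution], error_step: int) -> float:
    """Average reverse KL over the steps from the error onward, where the hint is visible to the teacher."""
    steps = range(error_step, len(student))
    return sum(reverse_kl(student[t], teacher[t]) for t in steps) / len(steps)


trajectory = ["open utils.py", "edit parse()", "run tests"]
teacher_context = insert_hint(trajectory, 1, "parse() must handle empty input")

# Toy next-step distributions: identical before the error, diverging once the hint is visible.
student = [[0.5, 0.5], [0.9, 0.1], [0.8, 0.2]]
teacher = [[0.5, 0.5], [0.2, 0.8], [0.3, 0.7]]
print(teacher_context)
print(distillation_loss(student, teacher, error_step=1))
```

Placing the hint at the faulty step, rather than at the start of the episode, means the teacher's distribution differs from the student's only where the mistake happened, so the training signal is concentrated on that part of a long trajectory.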
Cursor's headline numbers put Composer 2.5 within a point of Opus 4.7 on two of three coding benchmarks and ahead of it on the third, Cursor's own CursorBench. The figures below come from Cursor's own announcement and the coverage that reproduced its charts. They should be read with care: CursorBench is Cursor's internal harness, the competitor scores for SWE-Bench Multilingual and Terminal-Bench are self-reported by Anthropic and OpenAI, and as of the launch no third party had rerun all the models on one unified scaffold.[3][8]
| Benchmark | Composer 2.5 | Claude Opus 4.7 | GPT-5.5 | Composer 2 |
|---|---|---|---|---|
| SWE-Bench Multilingual | 79.8% | 80.5% | 77.8% | 73.7% |
| Terminal-Bench 2.0 | 69.3% | 69.4% | 82.7% | 61.7% |
| CursorBench v3.1 | 63.2% | 61.6% (default) | 59.2% (default) | 52.2% |
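The "within a point" reading can be checked directly against the table. The snippet below copies the reported scores and computes Composer 2.5's margin over each comparison model; it is plain arithmetic on Cursor's published figures and does nothing to address the caveats about internal harnesses and self-reported numbers. The CursorBench competitor scores are at their default setting, as the table notes.

```python
# Scores (%) as reported in the table above; Cursor's figures, not an independent rerun.
SCORES = {
    "SWE-Bench Multilingual": {"Composer 2.5": 79.8, "Claude Opus 4.7": 80.5, "GPT-5.5": 77.8, "Composer 2": 73.7},
    "Terminal-Bench 2.0":     {"Composer 2.5": 69.3, "Claude Opus 4.7": 69.4, "GPT-5.5": 82.7, "Composer 2": 61.7},
    "CursorBench v3.1":       {"Composer 2.5": 63.2, "Claude Opus 4.7": 61.6, "GPT-5.5": 59.2, "Composer 2": 52.2},
}


def margins(model: str, rival: str) -> dict:
    """Percentage-point margin of `model` over `rival` on each benchmark (negative = behind)."""
    return {bench: round(row[model] - row[rival], 1) for bench, row in SCORES.items()}


for rival in ("Claude Opus 4.7", "GPT-5.5", "Composer 2"):
    print(rival, margins("Composer 2.5", rival))
```

Against Opus 4.7 the margins come out at -0.7, -0.1, and +1.6 points: within a point on SWE-Bench Multilingual and Terminal-Bench 2.0, ahead on CursorBench. Against GPT-5.5 the largest gap is on Terminal-Bench 2.0, where Composer 2.5 trails by 13.4 points.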
The independent picture is more measured. Artificial Analysis, which runs its own coding-agent suite, scored Composer 2.5 at 62 on its Coding Agent Index, a 14-point jump over Composer 2 but third overall, behind Opus 4.7 at 66 and GPT-5.5 at 65. On Artificial Analysis's SWE-Bench-Pro-Hard subset, the model climbed from Composer 2's 12 percent to 47 percent, a large gain that still trails the leaders.[4]
Where Composer 2.5 is unambiguously ahead is price. Standard-tier usage costs $0.50 per million input tokens and $2.50 per million output tokens, against $5.00 and $25.00 for both Opus 4.7 and GPT-5.5; that gap is the basis for the roughly tenfold-cheaper claim.[1][8] A fast variant, which Cursor makes the default and bills as having the same intelligence, costs $3.00 per million input tokens and $15.00 per million output tokens.[1] On a per-task basis, Artificial Analysis measured Composer 2.5 at about $0.07 in standard mode and $0.44 in fast mode, versus $4.10 for Opus 4.7 and $4.82 for GPT-5.5. In fast mode it finished a task in about 6.7 minutes of wall time on average, making it the third-fastest agent Artificial Analysis tested.[4]
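The tenfold claim is a statement about list prices per token, and it is easy to check. The sketch below prices a hypothetical job from the per-million-token rates quoted above; the token counts (2 million input, 200,000 output) are invented for illustration, and the sketch assumes every model consumes the same number of tokens for the job, which real agents do not.

```python
from typing import Dict, Tuple

# (input, output) list prices in USD per million tokens, as quoted in this article.
PRICES: Dict[str, Tuple[float, float]] = {
    "Composer 2.5 (standard)": (0.50, 2.50),
    "Composer 2.5 (fast)": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 25.00),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a job with the given token counts at the model's list price."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical job size, assumed identical for every model.
INPUT_TOKENS, OUTPUT_TOKENS = 2_000_000, 200_000
for model in PRICES:
    print(f"{model}: ${job_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):.2f}")
```

At these assumed token counts the standard tier comes to $1.50, the fast tier to $9.00, and Opus 4.7 or GPT-5.5 to $15.00, exactly the tenfold gap for the standard tier. The per-task figures Artificial Analysis measured ($0.07 versus $4.10 for Opus 4.7) differ by a much larger factor, which suggests the models also use different numbers of tokens per task, so list price alone does not predict what a task will cost.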
Inside the editor, Composer 2.5 is selectable as the agent model and works the way Cursor's agent mode already did: you describe a change in natural language, and the model edits across files, runs commands, and iterates until the task is done or it hands back control. Because Cursor 2.0 introduced an interface for running several agents in parallel, a fast and cheap model is especially useful for fanning out work across multiple agents at once.[5] New users got double the usual usage allowance during their first week to encourage trying it.[1]
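Fanning work out to several agents at once is, at its simplest, a bounded-concurrency pattern. The sketch below uses Python's `asyncio` to run several agent sessions in parallel with a cap on how many are active at the same time. It is a generic illustration, not how Cursor's parallel-agent interface is implemented: `run_agent_session` is a hypothetical stub standing in for a full agent run, and the task names and `max_parallel` value are made up.

```python
import asyncio
from typing import List


async def run_agent_session(task: str) -> str:
    """Hypothetical stand-in for one agent run; a real one would call the model and tools."""
    await asyncio.sleep(0.01)  # simulate model latency
    return f"done: {task}"


async def fan_out(tasks: List[str], max_parallel: int = 4) -> List[str]:
    """Run one agent per task, with at most `max_parallel` sessions active at once."""
    limit = asyncio.Semaphore(max_parallel)

    async def bounded(task: str) -> str:
        async with limit:
            return await run_agent_session(task)

    # gather returns results in the same order as the input tasks.
    return await asyncio.gather(*(bounded(t) for t in tasks))


results = asyncio.run(fan_out(["fix flaky test", "add type hints", "update README"]))
print(results)
```

Because each session consumes its own tokens, total spend grows with the number of agents fanned out, which is where a low per-token price matters most.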
Reaction split along familiar lines. Supporters pointed to the price-to-performance ratio: if the benchmarks roughly hold, getting near-Opus coding for a tenth of the cost is a real shift for teams running many automated coding jobs.[8] Skeptics were shaped by what some called the "Composer 2 hangover," the sense that Cursor's previous in-house model was sold with a similar benchmark story but lagged Opus and Sonnet in day-to-day engineering for months. One widely shared comment put the doubt bluntly: whatever the benchmarks said, "it wasn't even close in practice."[3] Two caveats recurred throughout: the comparison numbers lean on self-reported and internal benchmarks, and the model's IDE-only distribution makes independent, side-by-side testing harder than it would be for a model with an open API.[3][4]
Cursor framed Composer 2.5 as a step toward something larger. The company said that, together with SpaceXAI, the entity formed by the SpaceX-xAI merger, it is training a much bigger model from scratch using about ten times more total compute, drawing on the Colossus supercomputer's expansion to a million H100-equivalents.[1]