Cursor Composer 2.5

Updated Jun 3, 2026

RawGraph

Cursor Composer 2.5 is a proprietary agentic coding model built by Anysphere, the company behind the Cursor code editor. Released on May 18, 2026, it is the fourth model in Cursor's in-house line and the immediate successor to Composer 2. Like its predecessor, it is post-trained from Moonshot AI's open-weight Kimi K2.5 checkpoint rather than trained from scratch.^[1]^[2] Cursor pitched it as a fast model for agentic work inside the editor, claiming it lands near Claude Opus 4.7 on coding benchmarks at roughly one-tenth the per-token cost.^[1]^[3] An independent evaluation by Artificial Analysis placed it third among coding agents, behind Opus 4.7 and GPT-5.5, which tempers the "matches the frontier" framing somewhat.^[4]

What it is

Composer 2.5 is an AI coding agent: a model designed to read and edit code across many files, run terminal commands, call tools, and keep working through a multi-step task with limited supervision. It is not a general chatbot, and it is not sold as a standalone API. The model runs only inside Cursor's products (the desktop IDE, the Cursor CLI, and the company's web interface), so unlike Opus 4.7 or GPT-5.5 it cannot be dropped into a third-party stack behind a unified API.^[3]^[4]
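Cursor has not published Composer 2.5's agent harness, but the read, edit, and run cycle described above can be illustrated with a minimal sketch. Everything in it is hypothetical: `scripted_policy` stands in for the model and returns a fixed plan, and the three tools are toys that operate on an in-memory dictionary rather than real files or a shell.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    tool: str      # "read", "edit", "run", or "done"
    arg: str = ""


@dataclass
class Workspace:
    files: dict = field(default_factory=dict)    # path -> contents
    history: list = field(default_factory=list)  # (tool, arg, observation) tuples


def scripted_policy(step: int, history: list) -> Action:
    """Hypothetical stand-in for the model: a fixed plan that ignores `history`."""
    plan = [Action("read", "app.py"), Action("edit", "app.py"),
            Action("run", "tests"), Action("done")]
    return plan[min(step, len(plan) - 1)]


def agent_loop(ws: Workspace, policy, max_steps: int = 10) -> int:
    """Ask the policy for an action, execute the matching toy tool, record the result.

    Returns the number of tool calls made before the policy said "done"
    (or max_steps if it never did).
    """
    for step in range(max_steps):
        action = policy(step, ws.history)
        if action.tool == "done":
            return step
        if action.tool == "read":
            obs = ws.files.get(action.arg, "<missing>")
        elif action.tool == "edit":
            # Toy edit: append a marker line to the file.
            ws.files[action.arg] = ws.files.get(action.arg, "") + "\n# patched"
            obs = "ok"
        elif action.tool == "run":
            # Toy test runner: "passes" once app.py carries the patch marker.
            obs = "passed" if "# patched" in ws.files.get("app.py", "") else "failed"
        else:
            obs = f"unknown tool {action.tool!r}"
        ws.history.append((action.tool, action.arg, obs))
    return max_steps


if __name__ == "__main__":
    ws = Workspace({"app.py": "print('hi')"})
    print(agent_loop(ws, scripted_policy), ws.history)
```

In a real agent the policy would be the model itself, choosing each next action from the observations accumulated in `history`; the loop structure, not the toy tools, is the point of the sketch.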

The design priority is speed in the agent loop. Cursor's whole argument for building its own model, going back to the original Composer, has been that a slightly less capable model that responds quickly feels better to code with than a stronger model that makes you wait. Composer 2.5 keeps that bet. Cursor describes it as better at sustained work on long-running tasks, more reliable at following complex instructions, and better at calibrating effort, spending more compute on hard problems and less on easy ones.^[1] The company concedes that some of these behavior changes are hard to capture with existing benchmarks.^[1]

Lineage of Cursor's in-house models

Cursor spent its first few years routing requests to frontier models from Anthropic, OpenAI, and others. That changed in October 2025 with Composer, the company's first proprietary coding model, shipped alongside the Cursor 2.0 multi-agent interface. Composer was a mixture-of-experts model trained with reinforcement learning in real development environments, and Cursor claimed it generated about four times faster than comparable models while reaching frontier-level coding results on its internal tests.^[5]^[6]

The line moved quickly after that. The table below shows the progression.

Model	Released	Notes
Composer (1)	October 2025	First in-house model, shipped with Cursor 2.0; RL-trained MoE focused on speed^[5]
Composer 1.5	February 2026	Scaled-up RL iteration
Composer 2	March 27, 2026	First built on the Kimi K2.5 open checkpoint; CursorBench 61.3 at launch (52.2 on CursorBench v3.1), SWE-Bench Multilingual 73.7^[2]
Composer 2.5	May 18, 2026	Same base, heavier post-training; SWE-Bench Multilingual 79.8^[1]

A point of confusion is worth clearing up: Composer 2.5 is a successor to Composer 1 only in the loose sense of being later in the same family. Its direct parent is Composer 2, with which it shares the Kimi K2.5 foundation. Cursor co-founder Aman Sanger later acknowledged that the company had not been upfront about that foundation, saying it "was a miss to not mention the Kimi base in our blog from the start."^[3]

Base model and post-training

Composer 2.5 starts from Kimi K2.5, the open-weight checkpoint released by the Chinese lab Moonshot AI. Kimi K2.5 is a large mixture-of-experts model on the order of a trillion total parameters with a much smaller number active per token, which is what makes it cheap to serve at speed.^[3]^[7] Cursor does not retrain this base. Instead it layers its own work on top, and according to coverage of the launch roughly 85 percent of the total compute budget went into that post-training rather than the base model.^[3]^[7]
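The cost argument rests on sparse activation: a router sends each token to only a few experts, so most of the weights sit idle on any given forward pass. The pure-Python sketch below shows generic top-k routing with four toy experts and k=2; the sizes, the dot-product gate, and the scaling experts are illustrative assumptions, not Kimi K2.5's actual configuration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, k=2):
    """Top-k mixture-of-experts layer for a single token vector `x`.

    Returns the mixed output and the indices of the experts that actually ran.
    """
    # One gate score per expert: dot product of x with that expert's gate vector.
    scores = [sum(w * xi for w, xi in zip(g, x)) for g in gate_weights]
    # Keep only the k highest-scoring experts.
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    # Renormalise the gate over the selected experts; the others are never evaluated.
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(x)
    for weight, i in zip(weights, top):
        y = experts[i](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, top


# Toy setup: 4 experts that each scale the input by a different factor.
EXPERTS = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
GATES = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

if __name__ == "__main__":
    out, active = moe_forward([1.0, 2.0], GATES, EXPERTS, k=2)
    print(active, out)
```

Only the selected experts' functions are called, which is why per-token compute tracks the active parameter count rather than the total.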

The post-training is mostly reinforcement learning carried out in realistic Cursor sessions, with the model given the same search and edit tools a user would have, so the rewards track real developer workflows rather than abstract puzzles.^[2] For Composer 2.5 specifically, Cursor highlighted three changes over Composer 2. The first is about 25 times more synthetic training tasks, some generated by deleting features from real codebases and asking the model to rebuild them. The second is a technique it calls targeted RL with textual feedback: a written hint is inserted at the exact point in a long trajectory where the model went wrong, and that feedback is delivered through on-policy distillation. The third is infrastructure work, including custom low-precision kernels for MoE training on Blackwell GPUs.^[1]^[2]
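Cursor has not released its task generator, so the following is only a hypothetical sketch of the feature-deletion idea: strip one function out of a Python module, turn its docstring into a task prompt, and score a candidate rebuild with a binary test-based reward. The names `delete_feature`, `reward`, and `hidden_test` are illustrative, and `ast.unparse` requires Python 3.9 or later.

```python
import ast


def delete_feature(source: str, func_name: str):
    """Remove one top-level function from `source`.

    Returns (stripped_source, task_prompt); the removed function's docstring,
    if any, becomes the specification in the prompt.
    """
    tree = ast.parse(source)

    def is_target(node):
        return isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name

    targets = [node for node in tree.body if is_target(node)]
    if not targets:
        raise ValueError(f"no top-level function named {func_name!r}")
    spec = ast.get_docstring(targets[0]) or "(no docstring)"
    tree.body = [node for node in tree.body if not is_target(node)]
    prompt = f"Re-implement `{func_name}` so the existing tests pass. Spec: {spec}"
    return ast.unparse(tree), prompt


def reward(candidate_source: str, test) -> float:
    """Binary reward: 1.0 if `test` passes against the candidate module, else 0.0.

    A real pipeline would run candidates in a sandbox; exec() here is illustration only.
    """
    namespace = {}
    try:
        exec(candidate_source, namespace)
        test(namespace)
        return 1.0
    except Exception:
        return 0.0


SOURCE = '''
def add(a, b):
    """Return the sum of a and b."""
    return a + b

def double(x):
    return add(x, x)
'''


def hidden_test(ns):
    assert ns["double"](3) == 6


if __name__ == "__main__":
    stripped, prompt = delete_feature(SOURCE, "add")
    print(prompt)
    print(reward(stripped, hidden_test))  # 0.0: `add` is missing
    rebuilt = stripped + "\n\ndef add(a, b):\n    return a + b\n"
    print(reward(rebuilt, hidden_test))  # 1.0
```

The appeal of this kind of generator is that the original code guarantees the task is solvable and the existing tests supply the reward signal without hand labeling.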

Benchmark and cost claims

Cursor's headline numbers put Composer 2.5 within a point of Opus 4.7 on two of three coding benchmarks. The figures below come from Cursor's own announcement and the coverage that reproduced its charts. They should be read with care: CursorBench is Cursor's internal harness, the competitor scores for SWE-Bench Multilingual and Terminal-Bench are self-reported by Anthropic and OpenAI, and as of the launch no third party had rerun all the models on one unified scaffold.^[3]^[8]

Benchmark	Composer 2.5	Claude Opus 4.7	GPT-5.5	Composer 2
SWE-Bench Multilingual	79.8%	80.5%	77.8%	73.7%
Terminal-Bench 2.0	69.3%	69.4%	82.7%	61.7%
CursorBench v3.1	63.2%	61.6% (default)	59.2% (default)	52.2%
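A quick way to read the table is to compute Composer 2.5's margin against each other model. The figures in the sketch are copied from the table (using the "default" CursorBench settings for the two competitors), so it inherits all the caveats above about self-reported and internal numbers.

```python
# Scores (%) copied from the table above; competitor CursorBench figures are the "default" settings.
SCORES = {
    "SWE-Bench Multilingual": {"Composer 2.5": 79.8, "Claude Opus 4.7": 80.5, "GPT-5.5": 77.8, "Composer 2": 73.7},
    "Terminal-Bench 2.0": {"Composer 2.5": 69.3, "Claude Opus 4.7": 69.4, "GPT-5.5": 82.7, "Composer 2": 61.7},
    "CursorBench v3.1": {"Composer 2.5": 63.2, "Claude Opus 4.7": 61.6, "GPT-5.5": 59.2, "Composer 2": 52.2},
}


def deltas(table, model="Composer 2.5"):
    """Return {benchmark: {other_model: model_score - other_score}} in percentage points."""
    return {
        bench: {other: round(row[model] - score, 1) for other, score in row.items() if other != model}
        for bench, row in table.items()
    }


def within(table, rival, points=1.0, model="Composer 2.5"):
    """List benchmarks where `model` is within `points` of `rival`, in either direction."""
    d = deltas(table, model)
    return [bench for bench, row in d.items() if abs(row[rival]) <= points]


if __name__ == "__main__":
    for bench, row in deltas(SCORES).items():
        print(bench, row)
    print(within(SCORES, "Claude Opus 4.7"))
```

Running `deltas(SCORES)` gives gaps of -0.7 and -0.1 points against Opus 4.7 on SWE-Bench Multilingual and Terminal-Bench 2.0 and +1.6 on CursorBench v3.1, which is the "within a point on two of three" pattern; it also makes the 13.4-point Terminal-Bench deficit to GPT-5.5 easy to see.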

The independent picture is more measured. Artificial Analysis, which runs its own coding-agent suite, scored Composer 2.5 at 62 on its Coding Agent Index, a 14-point jump over Composer 2 but third overall, behind Opus 4.7 at 66 and GPT-5.5 at 65. On Artificial Analysis's SWE-Bench-Pro-Hard subset, Composer 2.5 scored 47 percent, up from Composer 2's 12 percent, a large gain that still trails the leaders.^[4]

Where Composer 2.5 is unambiguously ahead is price. Standard-tier usage costs $0.50 per million input tokens and $2.50 per million output tokens, against $5.00 and $25.00 for both Opus 4.7 and GPT-5.5, the basis for the roughly tenfold-cheaper claim.^[1]^[8] A faster default variant, billed as having the same intelligence, costs $3.00 and $15.00.^[1] On a per-task basis Artificial Analysis measured Composer 2.5 at about $0.07 standard and $0.44 fast, versus $4.10 for Opus 4.7 and $4.82 for GPT-5.5, while its Fast mode finished a task in about 6.7 minutes of wall time on average, the third-fastest agent it tested.^[4]
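The tenfold figure follows directly from the list prices: both Composer 2.5 standard rates are exactly one-tenth of the Opus 4.7 and GPT-5.5 rates, so the ratio holds for any mix of input and output tokens. The sketch below makes that concrete for a job of 2 million input and 200,000 output tokens (a hypothetical job size chosen for illustration) and also computes per-task ratios from Artificial Analysis's measurements.

```python
# List prices quoted above, in USD per million tokens: (input, output).
PRICES = {
    "Composer 2.5 standard": (0.50, 2.50),
    "Composer 2.5 fast": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
    "GPT-5.5": (5.00, 25.00),
}

# Artificial Analysis's measured average cost per task, in USD.
PER_TASK = {
    "Composer 2.5 standard": 0.07,
    "Composer 2.5 fast": 0.44,
    "Claude Opus 4.7": 4.10,
    "GPT-5.5": 4.82,
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a job with the given token counts at list prices."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def per_task_ratio(rival: str, model: str = "Composer 2.5 standard") -> float:
    """How many times more `rival` costs per task than `model`."""
    return PER_TASK[rival] / PER_TASK[model]


if __name__ == "__main__":
    # Hypothetical job size: 2M input tokens, 200k output tokens.
    for name in PRICES:
        print(f"{name:24s} ${job_cost(name, 2_000_000, 200_000):6.2f}")
    for rival in ("Claude Opus 4.7", "GPT-5.5"):
        print(f"{rival}: {per_task_ratio(rival):.0f}x per task")
```

The fast tier's rates are 0.6 times the rivals' on both input and output, so it is cheaper per token but nowhere near tenfold. The per-task ratios (roughly 59x against Opus 4.7 for the standard tier, about 9x for the fast tier) differ from the per-token ratio because the models consume different numbers of tokens to finish the same task.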

How it is used in Cursor

Inside the editor, Composer 2.5 is selectable as the agent model and works the way Cursor's agent mode already did: you describe a change in natural language, and the model edits across files, runs commands, and iterates until the task is done or it hands back control. Because Cursor 2.0 introduced an interface for running several agents in parallel, a fast and cheap model is especially useful for fanning out work across multiple agents at once.^[5] New users got double the usual usage allowance during their first week to encourage trying it.^[1]
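Cursor's parallel-agent interface is not exposed as a public API, so this is a generic, hypothetical illustration of why a fast model suits fan-out: when agents run concurrently, wall-clock time is bounded by the slowest agent rather than the sum of all of them, while cost still scales with the number of agents. `simulated_agent` is a stand-in that sleeps instead of editing code.

```python
import asyncio
import time


async def simulated_agent(task: str, seconds: float) -> str:
    """Hypothetical agent: sleeps for `seconds` to mimic working on `task`."""
    await asyncio.sleep(seconds)
    return f"{task}: done"


async def fan_out(tasks: dict) -> list:
    """Launch one agent per task concurrently and collect results in input order."""
    return await asyncio.gather(*(simulated_agent(name, secs) for name, secs in tasks.items()))


def timed_fan_out(tasks: dict):
    """Run the fan-out and return (results, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    results = asyncio.run(fan_out(tasks))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = {"refactor-auth": 0.3, "add-tests": 0.2, "update-docs": 0.1}
    results, elapsed = timed_fan_out(jobs)
    print(results)
    print(f"wall time {elapsed:.2f}s vs {sum(jobs.values()):.2f}s if run one after another")
```

Because every agent in the fan-out is billed separately, a per-task cost in cents rather than dollars is what makes launching many agents at once practical.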

Reception

Reaction split along familiar lines. Supporters pointed to the price-to-performance ratio: if the benchmarks roughly hold, getting near-Opus coding for a tenth of the cost is a real shift for teams running many automated coding jobs.^[8] Skeptics were influenced by what some called the "Composer 2 hangover," the sense that Cursor's previous in-house model was sold with a similar benchmark story but lagged Opus and Sonnet in day-to-day engineering for months. One widely shared comment summed up the doubt bluntly: whatever the benchmarks showed, "it wasn't even close in practice."^[3] Two caveats recurred: the comparison numbers lean on self-reported and internal benchmarks, and the model's IDE-only distribution makes independent, side-by-side testing harder than it would be for a model offered through an open API.^[3]^[4]

Cursor framed Composer 2.5 as a step toward something larger. The company said that, together with SpaceXAI, the entity formed by the SpaceX-xAI merger, it is training a much bigger model from scratch using about ten times more total compute, drawing on the Colossus supercomputer's expansion to a million H100-equivalents.^[1]

References

1. Cursor. "Introducing Composer 2.5." cursor.com, May 18, 2026. https://cursor.com/blog/composer-2-5
2. Cursor. "A technical report on Composer 2." cursor.com, March 27, 2026. https://cursor.com/blog/composer-2-technical-report
3. Handy, Jake. "Model Drop: Composer 2.5." Handy AI (Substack), May 2026. https://handyai.substack.com/p/model-drop-composer-25
4. Artificial Analysis. "Cursor's Composer 2.5: third on the Coding Agent Index and ~10-60x lower cost than rivals." artificialanalysis.ai, May 2026. https://artificialanalysis.ai/articles/cursor-composer-2-5-coding-agent-index
5. Cursor. "Composer: Building a fast frontier model with RL." cursor.com, October 29, 2025. https://cursor.com/blog/composer
6. Willison, Simon. "Composer: Building a fast frontier model with RL." simonwillison.net, October 29, 2025. https://simonwillison.net/2025/Oct/29/cursor-composer/
7. The Decoder. "Cursor's Composer 2.5 matches Opus 4.7 and GPT-5.5 benchmarks at a fraction of the cost." the-decoder.com, May 2026. https://the-decoder.com/cursors-composer-2-5-matches-opus-4-7-and-gpt-5-5-benchmarks-at-a-fraction-of-the-cost/
8. Ng, Adel et al. "Composer 2.5: Benchmarks, Pricing, and How It Compares." DataCamp, May 2026. https://www.datacamp.com/blog/composer-2-5

Related Articles

Claude Code

GitHub Copilot

Cursor (code editor)

Windsurf (software)

Replit

Bolt.new
