Devin (AI software engineer)

Updated Jun 21, 2026

Devin is an autonomous AI software engineering agent developed by Cognition AI that plans, writes, tests, debugs, and deploys code inside a sandboxed cloud environment, taking a natural-language task and returning a finished pull request with little human intervention. Founded in 2023 by Scott Wu, Steven Hao, and Walden Yan, Cognition announced Devin on March 12, 2024, marketing it as the "first AI software engineer," equipped with a shell, code editor, and web browser ^[1]. The announcement generated enormous attention in the AI and software development communities, along with significant controversy about the accuracy of its demonstrated capabilities.

Devin reached general availability in December 2024 at a price of $500 per month for engineering teams; Cognition's announcement opened with the line "Today we're making Devin generally available starting at $500 a month for engineering teams" ^[2]. Cognition subsequently launched Devin 2.0 in April 2025 at a dramatically reduced starting price of $20 per month, alongside a major feature overhaul ^[3]. The company has since grown rapidly, acquiring the Windsurf IDE in July 2025 and raising more than $1 billion at a $25 billion pre-money valuation ($26 billion post-money) in a round that closed on May 27, 2026, led by Lux Capital, General Catalyst, and 8VC ^[4]^[34]^[44].

By May 2026, Devin had merged hundreds of thousands of pull requests across thousands of companies, with its PR merge rate climbing from 34% to 67% year over year ^[19]. The system has expanded from a single chat-based interface into a multi-product AI development platform encompassing the autonomous Devin agent, the Windsurf IDE, Devin Wiki, and Devin Search. Cognition's proprietary SWE-1.5 and SWE-1.6 coding models now power production deployments at Goldman Sachs, Citigroup, and dozens of other large enterprises ^[35]^[36].

What is Devin?

Devin is a cloud-hosted, fully autonomous AI software engineering agent: a user describes a task in plain language, and Devin produces a step-by-step plan, then executes it across files using a terminal, code editor, and browser, reporting progress and ultimately opening a pull request for human review ^[1]. Unlike autocomplete-style assistants that suggest the next line of code, Devin is designed to own an entire well-scoped engineering ticket end to end, which is why Cognition describes it as a teammate rather than a tool. Devin works best on delegated, junior-level tasks; Cognition's guidance is that "you give Devin tasks that you know how to do yourself" ^[2].

Background and founding

Cognition AI

Cognition AI (originally Cognition Labs) was founded in August 2023 by three former competitive programmers: Scott Wu (CEO), Steven Hao, and Walden Yan. All three founders won gold medals at the International Olympiad in Informatics (IOI), and the team's background in competitive programming shaped the company's approach to building AI systems capable of complex reasoning and problem-solving ^[5].

Scott Wu, born in 1997 in Louisiana to a Chinese immigrant family, attended Harvard University where he studied economics before leaving the program early. He won three gold medals at the IOI, including a first-place finish in 2014, establishing himself as one of the top competitive programmers in the world ^[5]. Wu also competed in Mathcounts, winning the individual championship in 2011, and represented Harvard in the 2016 International Collegiate Programming Contest (ICPC), where the team won a gold medal and placed third overall. Before founding Cognition, Wu co-founded and served as CTO of Lunchclub, an AI networking platform backed by Lightspeed and Coatue, from 2017 to 2022 ^[5].

Steven Hao and Walden Yan brought complementary experience from companies including Jane Street and Google. The team had originally explored cryptocurrency-related projects before pivoting to AI following the release of ChatGPT and the resulting surge in interest in large language models ^[25].

The company emerged from stealth in March 2024 alongside the Devin announcement, immediately positioning itself at the center of the AI-assisted software development movement.

Early funding

Cognition raised a $21 million Series A round from Peter Thiel's Founders Fund shortly before the Devin announcement. Just one month later, in April 2024, the company secured an additional $175 million at a $2 billion valuation, again led by Founders Fund ^[6]. This rapid follow-on funding reflected the intense investor enthusiasm surrounding AI coding tools and the viral reception of Devin's initial demo.

How Devin works

Architecture and environment

Devin operates within a sandboxed computing environment that replicates the tools a human software engineer would use daily. This environment includes:

Component	Function
Shell/Terminal	Executes commands, installs packages, runs scripts
Code editor	Writes and modifies source code across files
Web browser	Reads documentation, searches for solutions, accesses APIs
Planner	Breaks tasks into steps and tracks progress
Chat interface	Communicates with human users for clarification and status updates
Virtual desktop	(Devin 2.2+) Full Linux desktop for testing desktop applications

Unlike simpler AI code generation tools that provide autocomplete suggestions or respond to single prompts, Devin is designed to handle multi-step engineering tasks autonomously. A user provides a natural language description of a task through a chatbot-style interface, and Devin develops a detailed, step-by-step plan before executing it using its developer tools ^[1].

Planning and execution

Devin's core differentiator is its ability to engage in long-term reasoning and planning. For a given task, the system can:

Analyze the requirements and break the problem into subtasks
Write code across multiple files and languages
Execute the code and observe the output
Identify errors from stack traces and debugging output
Iteratively fix issues until the task is complete
Report progress to the user in real time

The agent can recall relevant context at each step of a multi-step process, learning from mistakes made earlier in the session. When working on an existing codebase, Devin indexes the repository to understand the project structure, dependencies, and coding patterns before making changes ^[1].
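The execute, observe, and fix cycle described above can be sketched as a small retry loop. This is a minimal illustration under stated assumptions, not Cognition's implementation: `StepResult`, `run_plan`, and the `fix` callback are hypothetical names, and the "bug" is simulated with a flag.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    ok: bool
    output: str


def run_plan(
    steps: List[Tuple[str, Callable[[], StepResult]]],
    fix: Callable[[str, str], None],
    max_attempts: int = 3,
) -> Tuple[bool, List[str]]:
    """Run each step in order; on failure, call fix() and retry the step."""
    log: List[str] = []
    for name, action in steps:
        for attempt in range(1, max_attempts + 1):
            result = action()
            if result.ok:
                log.append(f"{name}: ok (attempt {attempt})")
                break
            log.append(f"{name}: failed (attempt {attempt}): {result.output}")
            if attempt < max_attempts:
                # Hypothetical repair hook: receives the step name and error text.
                fix(name, result.output)
        else:
            # Every attempt failed: stop and report instead of looping forever.
            return False, log
    return True, log


# Simulated task: the test step fails until the "fix" flips a flag.
state = {"bug_fixed": False}


def write_code() -> StepResult:
    return StepResult(True, "wrote parser.py")


def run_tests() -> StepResult:
    if state["bug_fixed"]:
        return StepResult(True, "2 passed")
    return StepResult(False, "AssertionError in test_parse")


def apply_fix(step_name: str, error: str) -> None:
    state["bug_fixed"] = True


done, log = run_plan([("write code", write_code), ("run tests", run_tests)], apply_fix)
```

The `max_attempts` cap is a design choice worth noting: without a bound, an agent can keep retrying a step that has a fundamental blocker.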

Proprietary models: SWE-1 and SWE-1.5

Cognition develops its own proprietary AI models optimized specifically for software engineering tasks. The SWE model family powers both Devin and the Windsurf IDE.

SWE-1 was Cognition's first internally developed coding model, designed to handle the multi-step reasoning and tool use required by autonomous software engineering agents.

SWE-1.5, announced in October 2025, represents a major leap in both performance and speed ^[26]. It is a frontier-scale model with hundreds of billions of parameters. Cognition states that "SWE-1.5 is trained on our state-of-the-art cluster of thousands of GB200 NVL72 chips," which may make it the first public production model trained on the GB200 generation of NVIDIA hardware ^[26].

Metric	SWE-1.5 Performance
Inference speed	Up to 950 tokens/second
Speed vs. Haiku 4.5	6x faster
Speed vs. Sonnet 4.5	13x faster
SWE-Bench Pro score	40.08%
Infrastructure partner	Cerebras

SWE-1.5 achieves near state-of-the-art coding performance while prioritizing speed, a critical factor for autonomous agents that iterate through many steps per task. On Scale AI's SWE-Bench Pro benchmark, SWE-1.5 scored 40.08%, ranking second after Claude Sonnet 4.5's 43.60% ^[26]. The speed advantage comes from a partnership with Cerebras, whose wafer-scale inference hardware enables the high token throughput. In the same launch post, Cognition argued that benchmarks are an imperfect guide to real-world utility, writing that "performance on coding benchmarks is often not representative of the real-world experience of using an agent, which is why we stopped reporting SWE-Bench numbers in 2024" ^[26].
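To see why throughput matters for a multi-step agent, the figures in the table can be turned into a rough time estimate. This is a back-of-envelope sketch: it assumes the "up to 950 tokens/second" peak is sustained and that the 6x and 13x multipliers apply at that peak, and the 40,000-token workload is a hypothetical example, not a figure from Cognition.

```python
SWE_15_TOKENS_PER_SEC = 950  # stated peak throughput for SWE-1.5
SPEEDUP_VS = {"Haiku 4.5": 6, "Sonnet 4.5": 13}  # stated relative speedups


def implied_tokens_per_sec(speedup: float) -> float:
    """Throughput implied for the comparison model, assuming peak-to-peak comparison."""
    return SWE_15_TOKENS_PER_SEC / speedup


def seconds_to_generate(tokens: int, tokens_per_sec: float) -> float:
    return tokens / tokens_per_sec


# Hypothetical workload: 20 agent steps at 2,000 output tokens each.
total_tokens = 20 * 2_000

estimates = {"SWE-1.5": seconds_to_generate(total_tokens, SWE_15_TOKENS_PER_SEC)}
for model, speedup in SPEEDUP_VS.items():
    estimates[model] = seconds_to_generate(total_tokens, implied_tokens_per_sec(speedup))

for model, seconds in estimates.items():
    print(f"{model}: about {seconds:.0f} s of generation time")
```

Under these assumptions the same workload takes well under a minute of generation time on SWE-1.5 and several minutes on the slower models. Real sessions also spend time on tool execution, which this estimate ignores.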

In addition to SWE-1.5, Cognition has developed SWE-grep, a specialized model for codebase search and understanding that powers the Devin Search feature.

SWE-1.6, which Cognition began previewing in late 2025 and rolled out more broadly through early 2026, was post-trained on the same pre-trained base as SWE-1.5 and runs at the same 950 tokens per second. It scores roughly 11% higher than SWE-1.5, in relative terms, on Scale AI's SWE-Bench Pro and surpasses the top open-source coding models on the same benchmark ^[37]^[35]. Cognition described SWE-1.6 as the product of a refined reinforcement-learning recipe, stating that "since training SWE 1.5, we have refined our RL recipe and scaled our infrastructure to unlock two orders of magnitude more compute," including a substantially larger set of training environments and stricter data-quality filtering, and noted that SWE-1.6 "was trained on thousands of GB200 NVL72 chips" ^[35]. The early-access checkpoint was distributed to a small subset of Devin users so the company could tune behavioral issues such as overthinking and excessive self-verification before a broader release ^[35]. By mid-2026, SWE-1.6 had become the default model behind both Devin and the Windsurf IDE's Cascade agent, available in a free tier (200 tokens/second) and a fast tier (950 tokens/second), with SWE-1.5 retained as a faster fallback for short, latency-sensitive tasks ^[37].

SWE model lineage

Model	Released	Pre-training	Post-training focus	Key result
SWE-1	Mid-2025	Cognition base	First in-house coding model	Initial production model for Windsurf and Devin
SWE-1.5	October 2025	Frontier base (hundreds of billions of params)	Speed and tool-use efficiency	40.08% on SWE-Bench Pro at 950 tok/s ^[26]
SWE-1.6 (preview)	Late 2025	Same base as SWE-1.5	Refined RL recipe, expanded environments	~11% relative gain over SWE-1.5 on SWE-Bench Pro ^[35]
SWE-grep	2025	Specialized	Codebase search	Powers Devin Search and Devin Wiki ^[26]

Devin Agent Preview with Claude Sonnet 4.5

In late 2025, Cognition released a Devin Agent Preview built on Anthropic's Claude Sonnet 4.5, offering an alternative to the SWE model family ^[27]. The integration required significant architectural changes because Sonnet 4.5 operates differently from previous models that Devin was built around.

Key findings from the integration:

Planning performance increased by 18% compared to the previous Claude-based version
End-to-end evaluation scores improved by 12%, the biggest jump since Claude Sonnet 3.6
Sonnet 4.5 executes parallel tool calls (running multiple bash commands and reading several files simultaneously) rather than working sequentially
The model is notably more proactive about writing and executing short test scripts to create feedback loops

The Devin Agent Preview with Sonnet 4.5 is 2x faster and 12% better on Cognition's Junior Developer Evaluations compared to the previous Devin agent ^[27].
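The difference between sequential and parallel tool calls can be illustrated with a small file-reading example. This is only a sketch of the general idea using Python's standard thread pool; `read_file`, `run_sequential`, and `run_parallel` are hypothetical names, and nothing here reflects how Sonnet 4.5 or Devin actually dispatch tools.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List


def read_file(path: Path) -> str:
    """Stand-in for a single 'read file' tool call."""
    return path.read_text(encoding="utf-8")


def run_sequential(paths: List[Path]) -> List[str]:
    # One tool call at a time, each waiting for the previous one.
    return [read_file(p) for p in paths]


def run_parallel(paths: List[Path], max_workers: int = 4) -> List[str]:
    # Issue all reads at once; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_file, paths))


with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        p = Path(tmp) / f"module_{i}.py"
        p.write_text(f"VALUE = {i}\n", encoding="utf-8")
        paths.append(p)
    sequential = run_sequential(paths)
    parallel = run_parallel(paths)
```

Both paths return the same contents in the same order; the parallel version simply overlaps the waiting time, which matters more when each call is slow (network reads, long shell commands).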

Capabilities

At launch, Cognition demonstrated Devin performing a range of software engineering tasks:

Deploying and improving applications and websites end-to-end
Finding and fixing bugs in large codebases
Setting up and fine-tuning large language models using research repositories from GitHub
Learning unfamiliar technologies by reading documentation
Completing freelance tasks on platforms like Upwork
Passing technical interviews at leading AI companies

A Bloomberg test showed that Devin could create a website within approximately ten minutes and recreate a Pong game in a similar timeframe ^[7].

With production experience from thousands of enterprise deployments, Cognition's 2025 performance review identified specific use cases where Devin delivers the strongest results ^[19]:

Use Case	Performance Metric
Security vulnerability fixes	20x faster than human engineers (1.5 min vs. 30 min average)
ETL file migration	10x improvement (3-4 hours vs. 30-40 hours)
Java version migration	14x less time than a human engineer
Test coverage improvement	Typical increase from 50-60% to 80-90%
Data features (EightSleep)	3x more features and investigations shipped
Regression testing (Litera)	40% test coverage increase, 93% faster regression cycles

SWE-bench performance

Devin's initial announcement highlighted its performance on SWE-bench, a benchmark that evaluates AI systems on their ability to resolve real GitHub issues from popular open-source Python repositories.

How did Devin score on SWE-bench at launch?

System	SWE-bench resolve rate	Date
Previous state-of-the-art (unassisted)	1.96%	Pre-March 2024
Previous SOTA (assisted)	4.80%	Pre-March 2024
Devin	13.86%	March 2024

According to Cognition's SWE-bench technical report, Devin was evaluated on 570 issues, a randomly selected 25% of the full 2,294-issue SWE-bench dataset, and resolved 79 of them, yielding a 13.86% success rate without any human assistance ^[32]. The report states: "Devin successfully resolves 13.86% of issues, far exceeding the previous highest unassisted baseline of 1.96%" (the prior best being Claude 2 with BM25 retrieval) ^[32]. Because Devin navigates files on its own rather than being told which files to edit, Cognition argued the result is most fairly compared to the "unassisted" LLM setting, representing roughly a 7x improvement over the best prior unassisted system ^[1]^[32].
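The headline numbers in this paragraph can be checked directly from the counts given in the report. The short calculation below uses only figures quoted above; the variable names are illustrative.

```python
FULL_DATASET_ISSUES = 2294   # full SWE-bench dataset
EVALUATED_ISSUES = 570       # random subset Devin was run on
RESOLVED_ISSUES = 79         # issues Devin resolved
PRIOR_UNASSISTED_PCT = 1.96  # previous best unassisted baseline, in percent

resolve_rate_pct = 100 * RESOLVED_ISSUES / EVALUATED_ISSUES
subset_share_pct = 100 * EVALUATED_ISSUES / FULL_DATASET_ISSUES
improvement_factor = resolve_rate_pct / PRIOR_UNASSISTED_PCT

print(f"resolve rate: {resolve_rate_pct:.2f}%")
print(f"subset share of full benchmark: {subset_share_pct:.1f}%")
print(f"improvement over prior unassisted baseline: {improvement_factor:.1f}x")
```

The subset works out to slightly under a quarter of the benchmark, and the improvement factor to just over 7, consistent with the "roughly 7x" framing.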

However, the SWE-bench results came with important caveats that critics were quick to point out. The 13.86% figure, while a genuine improvement, still meant that Devin failed on more than 86% of tasks. Additionally, the evaluation ran against a random 25% subset rather than the full benchmark, and questions arose about task selection and evaluation methodology that Cognition's marketing materials did not fully address ^[8].

How have SWE-bench scores changed since 2024?

SWE-bench scores across the industry rose rapidly throughout 2024 and 2025, with multiple competing systems eventually far surpassing Devin's initial benchmark. The major frontier model families (Claude 4, Claude Opus 4.7, and GPT-5) all reported SWE-bench Verified scores above 70% by late 2025, and GPT-5.1 and follow-on Anthropic releases pushed those numbers higher again. By spring 2026, leading systems achieved scores around 80% on SWE-bench Verified:

System	Score (SWE-bench Verified unless noted)	Date
Claude Opus 4.5 (Anthropic)	~80.9%	Q1 2026
Claude Opus 4.6 (Anthropic)	~80.8%	Q1 2026
Gemini 3.1 Pro (Google)	~80.6%	Q1 2026
Claude Code (Anthropic, with scaffold)	~78.4%	Q1 2026
OpenAI Codex (GPT-5.1 scaffold)	~71.0%	Q1 2026
Cursor agent	~67.2%	Q1 2026
Devin 2.0 (Cognition)	~60.8% / ~45.8% standard run	2025-2026
SWE-1.5 (Cognition)	40.08% (SWE-Bench Pro)	October 2025
Amazon Q Developer	Top scores on SWE-bench Leaderboard	2025

Cognition's reported Devin 2.0 number is approximately 45.8% on SWE-bench Verified under what the company calls a "standard" evaluation: single-agent, no human in the loop, and no best-of-N voting. The higher 60.8% figure circulating in some 2026 leaderboards uses a more aggressive scaffold and multi-attempt setup ^[38]^[39]. Cognition itself has publicly distanced its product roadmap from SWE-bench, arguing in its SWE-1.5 launch post that "performance on coding benchmarks is often not representative of the real-world experience of using an agent" and that internal junior-developer evaluations, merge rates, and Agent Compute Unit efficiency are better proxies for production utility ^[26]^[19].

The rapid improvement in benchmark scores across the industry underscores how quickly the AI coding agent landscape has evolved since Devin's initial demonstration. The gap on SWE-bench Verified between Claude-based scaffolds and Devin's standard run also illustrates a broader pattern: scaffolds, prompt strategies, and best-of-N voting can move scores by 15 to 35 percentage points on top of the underlying model, which is one of the reasons Cognition has shifted its public marketing toward enterprise outcome metrics.

Pricing and plans

Devin's pricing has evolved significantly since its initial launch.

Version	Date	Starting Price	Details
Devin 1.0 (GA)	December 2024	$500/month	Team plan with 250 ACUs, aimed at engineering teams
Devin 2.0	April 2025	$20/month	Core plan for individuals; Team plan at $500/month/seat + $2.00/ACU

The original $500/month price point positioned Devin as a premium tool for professional engineering teams. With Devin 2.0, Cognition slashed the entry price by 96%, making the tool accessible to individual developers for the first time ^[3].

The pricing model revolves around Agent Compute Units (ACUs), where approximately 15 minutes of active Devin work equals one ACU. The Core plan at $20/month includes a limited number of ACUs, while the Team plan provides additional ACUs at $2.00 each plus the per-seat fee ^[9].
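A rough cost estimate follows from the two figures above: about 15 minutes of active work per ACU and $2.00 per additional ACU on the Team plan. The sketch below is an assumption-laden illustration: it treats ACUs as billed fractionally, `included_acus` is a hypothetical parameter, and the 250-ACU allowance in the example is borrowed from the December 2024 plan row and may not match current plans.

```python
MINUTES_PER_ACU = 15         # approximate, per the pricing description
TEAM_EXTRA_ACU_PRICE = 2.00  # USD per additional ACU on the Team plan


def acus_for_minutes(active_minutes: float) -> float:
    """Approximate ACUs consumed by a given amount of active agent time."""
    return active_minutes / MINUTES_PER_ACU


def extra_acu_cost(active_minutes: float, included_acus: float) -> float:
    """Cost of ACUs beyond a plan's included allowance (fractional billing assumed)."""
    used = acus_for_minutes(active_minutes)
    extra = max(0.0, used - included_acus)
    return extra * TEAM_EXTRA_ACU_PRICE


# Hypothetical month: 90 hours of active agent time against a 250-ACU allowance.
month_minutes = 90 * 60
used_acus = acus_for_minutes(month_minutes)
overage_usd = extra_acu_cost(month_minutes, included_acus=250)
print(f"{used_acus:.0f} ACUs used, ${overage_usd:.2f} in additional ACU charges")
```

Because an ACU measures active work rather than wall-clock time, idle or waiting sessions would not scale this estimate the same way.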

Controversies and criticism

Demo accuracy

The initial excitement surrounding Devin's March 2024 announcement was followed by swift backlash as independent developers attempted to reproduce the demonstrated tasks. The most prominent critique came in April 2024 from veteran software developer Carl Brown, who runs the YouTube channel "Internet of Bugs." Brown published a roughly 27-minute video titled "Debunking Devin: 'First AI Software Engineer' Upwork Lie Exposed," in which he walked through Cognition's "Devin's Upwork Side Hustle" demo frame by frame and compared the on-screen actions to the actual Upwork posting and the underlying GitHub repository ^[10].

According to Brown's analysis, the demo and the underlying task differed in several specific ways:

The Upwork posting Devin was shown working on was a request for help running an existing computer-vision model, not a request to write or modify code. The customer had asked for setup instructions and configuration help, while Devin's session focused on editing files in the repository.
Some of the files Devin appeared to "fix" in the demo did not contain the bugs Devin claimed to find. In at least one case, Brown said the file Devin pointed to as broken did not exist in the referenced repository at all, and Devin was working in a file it had created itself rather than one from the original codebase.
A subset of the errors Devin "discovered and resolved" appeared to have been introduced earlier in the same session by Devin's own actions, so the agent was effectively self-correcting fictional problems rather than fixing pre-existing issues.
The full session, edited down to roughly two minutes for Cognition's marketing video, ran for several hours of agent time. Brown set up the same task on his own machine and finished it manually in about 36 minutes, which he argued undercut the framing of Devin as a faster-than-human freelancer.

Brown emphasized in the video and in subsequent interviews that he was not opposed to AI coding tools as a category, repeating a line that became widely quoted in coverage of the controversy: "I am not anti-AI, but I really am anti-hype" ^[10]. The video gathered hundreds of thousands of views within days and was cited extensively in trade press coverage at TechCrunch, Futurism, The Register, and Tweaktown, with Tweaktown picking up the related framing that Devin failed roughly 85% of its assigned tasks, based on the SWE-bench figure ^[33].

Critics argued that the demonstration examples were cherry-picked to present Devin in the most favorable light possible, and that the showcased tasks were significantly simpler than portrayed ^[10]. As one widely shared analysis put it, "the product that we are sold isn't fully congruent with the product we see" ^[28].

Cognition's response

Cognition did not retract the original demo, but the company's public posture shifted noticeably in the months that followed. Scott Wu, in interviews around the time of the Series B announcement, framed the launch video as illustrative rather than as a fully reproducible benchmark, emphasizing that Devin was a research preview and that the SWE-bench technical report was the more rigorous artifact ^[32]. Cognition's subsequent blog posts focused on benchmark numbers, customer case studies, and product updates rather than on viral demos, and later launches (including Devin 2.0 and the Devin Agent Preview built on Claude Sonnet 4.5) leaned more on quantitative claims about merge rates, ACU efficiency, and junior developer evaluations than on cinematic walkthroughs.

By the time of the December 2024 general availability launch, the company had pulled back from the strongest readings of the "first AI software engineer" line. It was instead positioning Devin as a teammate for well-scoped, junior-level engineering work, operating alongside human reviewers rather than in place of them, and advising customers to "give Devin tasks that you know how to do yourself" ^[2]. The Internet of Bugs critique is now widely cited as a turning point in the public discussion of agentic coding tools: it sharpened the standard of evidence demanded of demos and sparked durable skepticism that subsequent agentic releases (including OpenAI Operator and Anthropic's computer use feature, both covered separately below) had to navigate when they shipped their own capabilities.

Benchmark concerns

Beyond the demo issues, the SWE-bench claims drew scrutiny. While Cognition published a technical report detailing its methodology, independent observers noted that the company's marketing materials emphasized the 13.86% figure without adequately contextualizing the 86% failure rate. The choice to evaluate on a random 25% subset of SWE-bench (570 of 2,294 issues) rather than the full benchmark also raised questions, though partial-subset evaluation was a common practice among AI coding tool developers at the time ^[8]^[32].

Answer.AI evaluation

In January 2025, researchers at Answer.AI published a detailed evaluation of Devin 1.0 across 20 real-world software engineering tasks. The results were sobering ^[11]:

Outcome	Count	Percentage
Failures	14	70%
Successes	3	15%
Inconclusive	3	15%

Among the three successful tasks were relatively straightforward operations like pulling a Notion database into Google Sheets and researching how to build a Discord bot in Python. The failures were more concerning: Devin struggled with tasks involving existing codebases, where understanding context and maintaining consistency with established patterns was required. When asked to migrate a Python project to nbdev, for example, Devin could not grasp even basic setup procedures despite having access to comprehensive documentation ^[11].

The researchers highlighted several recurring problems:

Tasks that seemed straightforward often took days rather than hours
Devin would pursue impossible solutions for extended periods rather than recognizing fundamental blockers
The autonomous nature that was supposed to be an advantage became a liability, as the agent wasted time on dead-end approaches
Results were unpredictable; even tasks similar to successful ones could fail in complex ways

"First AI software engineer" claim

The marketing claim that Devin was the "first AI software engineer" drew criticism from the developer community. While Devin represented a step forward in autonomous coding agents, earlier tools such as GitHub Copilot, ChatGPT, and various coding assistants had already been writing code for years. The distinction Cognition drew was that Devin could handle entire engineering workflows autonomously rather than just assisting with individual coding tasks, but many observers felt the "first AI software engineer" label was hyperbolic ^[8].

Devin 2.0

Cognition released Devin 2.0 on April 3, 2025, representing a major overhaul of the product ^[3].

Key features

Agent-native IDE: Devin 2.0 introduced a full IDE experience that resembles Visual Studio Code, moving beyond the chat-only interface of version 1.0. Developers interact with Devin directly within a coding environment rather than solely through a conversational interface.

Interactive planning: A core addition that allows developers to begin with broad or incomplete ideas and collaboratively scope out a detailed task plan with Devin. Within seconds of starting a session, Devin analyzes the codebase, identifies relevant files, and proposes an initial plan that developers can refine before execution begins.

Parallel multi-agent support: Devin 2.0 supports running multiple Devin agents simultaneously, each with its own isolated workspace. This allows teams to tackle several tasks in parallel.

Codebase understanding: Devin now automatically indexes repositories at regular intervals, generating detailed documentation including architecture diagrams and source links. The Devin Search feature allows developers to ask natural-language questions about their codebase and receive cited responses.

Devin Wiki: An automatically generated, continuously updated documentation system for codebases. Devin Wiki produces textual explanations, architecture diagrams, and links to relevant source code, giving teams a living reference that stays current as the code evolves ^[3]^[16].

Devin Search: A natural-language search feature that lets developers query their codebase and receive answers with citations pointing to specific files, functions, and code blocks ^[3].

Performance improvements: According to Cognition's internal benchmarks, Devin 2.0 completes 83% more junior-level development tasks per ACU compared to the 1.x series. Devin's Fast Agent Model (SWE-1.5, announced later, in October 2025) allows it to iterate through bugs up to 10x faster than a human engineer ^[3]^[17].

Devin 2.2

Cognition released Devin 2.2 on February 24, 2026, adding several significant capabilities ^[12]:

Feature	Description
Desktop computer use	Full access to a Linux desktop for launching and testing desktop applications, beyond the browser-only testing in earlier versions
Self-reviewing pull requests	Devin Review runs an automated quality pass on every PR, identifying logic errors, missing edge cases, and style violations before human review
Video-recorded QA sessions	Devin can request to QA its own PR, run the application, click through the UI on its desktop, and send an edited recording of the testing session for developer review
3x faster startup	Session boot time dropped from approximately 45 seconds to roughly 15 seconds, making Devin viable for quick tasks
Slack and Linear integrations	Smoother Slack and Linear integrations to start sessions without switching context
Redesigned interface	Fully overhauled UI connecting every step of the development lifecycle

The desktop computer-use capability is particularly notable. While Devin has always been able to use a browser to test web applications, version 2.2 gives it access to a full Linux desktop, enabling it to launch, interact with, and test desktop applications as well. This expands the range of tasks Devin can autonomously verify, including interactions with tools like Figma and Photoshop ^[12].

The Devin Review feature represents a shift toward quality assurance automation. Rather than simply generating code and submitting it for human review, Devin now performs its own first pass at code review, catching logic errors, missing edge cases, and style violations before the pull request reaches a human reviewer ^[12].

Windsurf acquisition

In July 2025, Cognition signed a definitive agreement to acquire Windsurf (formerly Codeium), an AI-powered integrated development environment ^[4]. The acquisition, estimated at approximately $250 million, followed a dramatic sequence of events in the AI coding tool space.

OpenAI had been in talks to acquire Windsurf for roughly $3 billion in April 2025, but the deal collapsed. Google then hired Windsurf's co-founder and CEO Varun Mohan in a $2.4 billion licensing arrangement ^[13]. Cognition moved quickly, reaching a deal over a single weekend with the remaining Windsurf team and assets.

Acquisition detail	Value
Estimated price	~$250 million
Windsurf ARR at time of deal	$82 million
Enterprise customers	350+
Employees joining Cognition	~210
Date signed	July 14, 2025

The acquisition gave Cognition access to Windsurf's IDE product, brand, intellectual property, and fast-growing enterprise customer base. All Windsurf employees participated financially in the deal with fully accelerated vesting ^[4].

Strategic rationale

The Windsurf acquisition transformed Cognition from a single-product company into a multi-product AI development platform. The combined entity now addresses the full spectrum of AI-assisted development:

Product	Type	Use Case
Devin	Fully autonomous agent	Complex, multi-step tasks that can run independently
Windsurf IDE	Interactive AI code editor	Real-time coding assistance with developer in the loop
Devin Wiki	Documentation tool	Automated codebase documentation
Devin Search	Codebase search	Natural-language queries about code

Following the acquisition, Cognition's combined annual recurring revenue more than doubled, with enterprise ARR growing by more than 30% ^[14].

Enterprise adoption

Devin's enterprise traction has grown significantly through 2025 and into 2026, with deployments at major financial institutions, technology companies, government agencies, and other large organizations.

Goldman Sachs deployment

In July 2025, Goldman Sachs became the first major bank to pilot Devin, marking a significant milestone for autonomous AI agents in enterprise environments. The deployment is notable for its scale and ambitions ^[22]^[23]:

Goldman Sachs is testing Devin across its 12,000-person programming team
Hundreds of Devin instances were initially deployed, with potential expansion to thousands
The initiative focuses on repetitive programming tasks including legacy code management, refactoring, and debugging
Goldman Sachs reported productivity improvements of 3x to 4x compared to previous AI tools
The bank views Devin as the beginning of a "hybrid workforce" model where AI agents work alongside human engineers

Goldman's CIO described a vision for a near-future "hybrid workforce" where humans and AI coexist, and identified a new type of employee called "AI natives" who would be fluent in managing autonomous agents, delegating tasks, supervising results, and remaining accountable for what the AI delivers ^[23].

Citigroup deployment

Less than two weeks after the Goldman Sachs announcement, Citigroup confirmed that it had selected Devin as the centerpiece of an agentic AI rollout to its roughly 40,000-person developer organization, making it the second major Wall Street bank to commit to Cognition's autonomous agent at scale ^[36]. Tim Ryan, Citi's head of technology, framed the deployment in terms similar to Goldman's hybrid-workforce language, but emphasized a tighter human-in-the-loop pattern in the early phases.

Citi's initial Devin workloads were deliberately scoped to deterministic, easily testable tasks:

Citi use case	Typical workflow
Library and dependency upgrades	Devin scans repositories, identifies impacted files, runs automated tests, and presents a diff for human approval
Middleware patching	Devin applies vendor-issued patches across hundreds of services and surfaces regressions before deployment
Cross-language code rewrites	Devin translates legacy modules between languages (for example, Java to Kotlin, or VB to C#) with reviewer sign-off
Code reviews	Devin performs a first automated review pass before pull requests reach a human reviewer
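The "present a diff for human approval" step in the dependency-upgrade workflow can be sketched with Python's standard `difflib`. This is a toy illustration, not Citi's or Cognition's tooling: `bump_dependency` and `review_diff` are hypothetical helpers, the pinned versions are sample data, and a real workflow would also run the test suite before surfacing the change.

```python
import difflib


def bump_dependency(requirements: str, package: str, new_version: str) -> str:
    """Return requirements text with one pinned package moved to new_version."""
    updated = []
    for line in requirements.splitlines():
        name, sep, _old_version = line.partition("==")
        if sep and name.strip() == package:
            updated.append(f"{package}=={new_version}")
        else:
            updated.append(line)
    return "\n".join(updated) + "\n"


def review_diff(before: str, after: str, filename: str = "requirements.txt") -> str:
    """Render the change as a unified diff for a human reviewer."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{filename}",
            tofile=f"b/{filename}",
        )
    )


# Sample input: versions here are illustrative only.
before = "requests==2.31.0\nurllib3==1.26.18\n"
after = bump_dependency(before, "urllib3", "2.2.1")
diff = review_diff(before, after)
print(diff)
```

Keeping the agent's output as a reviewable diff, rather than a direct commit, matches the human-in-the-loop pattern described for these early workloads.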

In April 2026, Citi unveiled a centralized agentic-AI platform called Arc, described internally as an "operating system" for AI agents that orchestrates Devin alongside other autonomous tools across the bank's engineering organization ^[40]. Arc rolls out to developers first and is expected to extend to broader business units later in 2026. Citi's chief information officer told American Banker that the bank expects Devin's task scope to expand from "simple but tedious" upgrades into more complex multi-service refactors as confidence in agent outputs grows ^[36]^[40].

The back-to-back commitments from Goldman Sachs and Citi turned Wall Street into Cognition's most visible reference market and were widely cited as a key signal driving the 2026 funding round that valued the company at $25 billion ^[34]^[41]^[44].

Enterprise customer list

As of early 2026, Devin and Windsurf customers include the following organizations ^[14]^[17]:

Sector	Customers
Financial services	Goldman Sachs, Citi, Santander, Nubank
Technology	Dell, Cisco, Palantir, Ramp
E-commerce/Retail	Mercado Libre
Defense	Anduril, Sierra Nevada Corporation (SNC)
Government	U.S. Army, U.S. Navy, Treasury Department, NASA-JPL
Consulting	Cognizant

Cognizant partnership

On January 28, 2026, Cognizant announced a strategic partnership with Cognition to deploy Devin and Windsurf across its engineering organization and global client base ^[29]. The partnership combines Devin and Windsurf with Cognizant's Flowsource platform, a unified full-stack engineering tool designed to integrate generative and agentic AI across the software development lifecycle.

Cognizant reported that 30% of its code was already generated with AI at the time of the partnership announcement, with an aim to reach 50% in the near future. The initial focus areas are enterprise modernization and engineering transformation programs ^[29].

Government deployment

On February 25, 2026, Cognition launched "Cognition for Government," a program aimed at modernizing critical defense and civilian software systems ^[24]^[30]. The program targets a specific pain point: of the $100 billion the U.S. government spends annually on IT, nearly 80% goes to maintaining existing systems rather than building new ones. Only 3 of 10 critical legacy systems flagged by the Government Accountability Office (GAO) in 2019 have been modernized.

Government-specific details:

Feature	Status
Windsurf IDE	FedRAMP High authorized; DoD IL4/5/6 accredited; SOC 2 Type II certified
Devin	Available in AWS GovCloud; FedRAMP High authorization forthcoming
Data retention	Zero data retention policy; CUI and ITAR compliant
Deployment options	Cloud and on-premises

Cognition reports that Devin can complete migrations 5 to 40x faster than human engineers. Common government use cases include SAS to PySpark migrations, COBOL modernization, Angular to React conversions, .NET Framework to .NET Core upgrades, and migrating off proprietary frameworks ^[30].

Task suitability

Based on 18 months of production data, Cognition has identified the types of tasks where Devin performs best versus where it struggles:

Task Type	Devin Performance	Notes
Well-defined, scoped tasks	Strong	Clear requirements with verifiable outcomes
Junior-level tasks (4-8 hours of human work)	Strong	Sweet spot for autonomous completion
Bug fixing in familiar codebases	Moderate to strong	Effective when patterns are clear
Security vulnerability remediation	Strong	20x faster than human average
Code migration	Strong	5-40x faster depending on complexity
Legacy code refactoring	Moderate	Improves with Devin Wiki context
Test coverage expansion	Strong	Routinely increases coverage from 50-60% to 80-90%
Complex architectural decisions	Weak	Requires human judgment
Open-ended exploration	Weak	Tends to pursue dead ends
Large, unfamiliar codebases	Moderate	Improved significantly with 2.0 codebase indexing

Funding history

Cognition's funding trajectory reflects the intense competition and high valuations in the AI coding space.

Round	Date	Amount	Valuation	Lead Investor
Series A	Early 2024	$21M	Undisclosed	Founders Fund
Series B	April 2024	$175M	$2B	Founders Fund
Growth round	Early 2025	Undisclosed	~$4B	8VC
Series C	August 2025	~$500M	$9.8B	Founders Fund
Post-Windsurf round	September 2025	$400M	$10.2B	Founders Fund
Series E	May 2026	$1B+	$25B pre-money / $26B post-money	Lux Capital, General Catalyst, 8VC ^[44]

Peter Thiel's Founders Fund has led multiple rounds, with additional participation from Lux Capital, 8VC, Bain Capital Ventures, Neo, Elad Gil, Definition Capital, D1 Capital, and Hanabi Capital ^[14]^[17]. Including the May 2026 round, Cognition has raised well over $1.5 billion across its closed financings.

Path to a $25 billion valuation

In late April 2026, Bloomberg and SiliconANGLE reported that Cognition was in early talks to raise a new round at a $25 billion valuation, roughly 2.5 times its $10.2 billion mark from September 2025 and more than 12 times the $2 billion Series B price set just two years earlier ^[34]^[41]. The round closed on May 27, 2026: TechCrunch reported that "Cognition, the makers of the autonomous AI software engineer named Devin, has raised more than $1 billion at a $25 billion pre-money valuation ($26 billion post money)," with Lux Capital, General Catalyst, and 8VC leading the financing ^[44].
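
The multiples quoted above can be checked with a few lines of Python. This is only a worked calculation using the valuations in the funding table; the function and constant names are illustrative.

```python
def step_up(new_valuation_bn: float, old_valuation_bn: float) -> float:
    """Ratio of a newer valuation to an older one, both in billions of USD."""
    return new_valuation_bn / old_valuation_bn


# Figures from the funding table (pre-money for the May 2026 round).
MAY_2026_PRE_MONEY = 25.0
SEPT_2025 = 10.2
SERIES_B_APRIL_2024 = 2.0

print(f"vs. September 2025: {step_up(MAY_2026_PRE_MONEY, SEPT_2025):.2f}x")          # 2.45x
print(f"vs. Series B:       {step_up(MAY_2026_PRE_MONEY, SERIES_B_APRIL_2024):.1f}x")  # 12.5x
```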

Investors and analysts attributed the step-up in valuation to four converging factors:

The Goldman Sachs and Citi flagship deployments, which validated Devin in highly regulated, security-sensitive enterprise environments
A reported eightyfold year-over-year increase in enterprise Devin usage, alongside weekly session counts that doubled over a six-week span ^[19]^[34]
Combined Cognition/Windsurf enterprise ARR growth of more than 30% in the seven weeks immediately following the Windsurf acquisition close, with continued momentum through Q1 2026 ^[14]^[17]
Industry-wide repricing of AI coding companies, after Anysphere's Cursor crossed $2 billion in annualized revenue and Anthropic's Claude Code became the top-rated agent in 2026 developer surveys ^[15]^[18]

The $25 billion mark placed Cognition in the same valuation bracket as Anysphere and well above other autonomous-agent peers, even as critics noted that the underlying Devin product still resolves only a minority of SWE-bench Verified tasks under its own standard evaluation ^[38].

Revenue growth

Metric	Value	Date
Devin ARR	~$1M	September 2024
Devin ARR	~$73M	June 2025
Windsurf ARR (at acquisition)	$82M	July 2025
Combined estimated ARR	~$150M+	Late 2025
Total net burn (company history)	Under $20M	Through mid-2025

Cognition's capital efficiency stands out: the company reached $73M in Devin ARR with total net burn under $20M across its entire history, which has been described as one of the most efficient growth trajectories in enterprise software ^[17].

Internal adoption metrics

Cognition uses Devin extensively for its own internal development. In its best week of 2025, Cognition's team merged 154 Devin-generated PRs internally; by early 2026, that figure had grown to 659 PRs in a single week. Separately, total Devin sessions per week across all enterprise customers doubled in a span of just six weeks ^[19].
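
The growth figures in this article are stated in different forms (a ratio between two best weeks, a six-week doubling, and an eightyfold yearly increase), so a short calculation helps put them on a common scale. The sketch below assumes steady exponential growth and a 52-week year, which the sources do not claim; in particular, nothing in the cited material says the six-week doubling was sustained for a full year.

```python
import math

WEEKS_PER_YEAR = 52


def annual_multiple(doubling_weeks: float) -> float:
    """Year-over-year multiple if a quantity kept doubling every `doubling_weeks`."""
    return 2 ** (WEEKS_PER_YEAR / doubling_weeks)


def implied_doubling_weeks(yearly_multiple: float) -> float:
    """Doubling time in weeks implied by a steady year-over-year multiple."""
    return WEEKS_PER_YEAR / math.log2(yearly_multiple)


print(f"Internal PRs, best week vs. best week: {659 / 154:.1f}x")                     # 4.3x
print(f"Six-week doubling, if sustained: {annual_multiple(6):.0f}x per year")         # 406x
print(f"80x per year implies doubling every {implied_doubling_weeks(80):.1f} weeks")  # 8.2
```

Under these assumptions, an eightyfold yearly increase corresponds to doubling roughly every eight weeks, so the six-week doubling is best read as a faster stretch within the year rather than as the year-long rate.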

Competition

Devin operates in a crowded and rapidly evolving market for AI-powered development tools. Its competitive position is shaped by the distinction between fully autonomous agents and interactive coding assistants.

Autonomous agents vs. interactive assistants

Devin's primary differentiator is its autonomous operation: given a task, it plans, executes, and delivers results with minimal human intervention. Most competing tools take a more interactive approach, assisting developers in real time as they write code rather than working independently.

Tool	Developer	Type	Pricing (starting)
Devin	Cognition AI	Autonomous agent	$20/month
Claude Code	Anthropic	Terminal-based agent	Included with API usage
Cursor	Anysphere	AI-powered IDE	Free (Hobby)
GitHub Copilot	GitHub/Microsoft	IDE extension + agent mode	$10/month
Windsurf	Cognition AI	AI-powered IDE	$15/month
Amazon Q Developer	Amazon Web Services	IDE extension + agent	Free tier available
SWE-Agent	Princeton NLP	Open-source agent	Free (open source)
Augment Code	Augment	AI coding platform	Enterprise pricing

Claude Code

Claude Code, released by Anthropic in 2025, operates as a terminal-based coding agent that can handle complex multi-file refactors, debug subtle architectural issues, and navigate unfamiliar codebases. By early 2026, Claude Code had achieved a 46% "most loved" rating among developers, making it the top-rated AI coding tool in developer surveys ^[15]. On SWE-bench Verified, Claude Code's published scaffold scores approximately 78.4%, well above Devin's standard-evaluation 45.8% and its scaffolded 60.8% result ^[38]^[42]. Unlike Devin, Claude Code runs locally in the developer's terminal rather than in a cloud-based sandbox, giving developers more direct control over the agent's actions. The two products are increasingly framed as complementary rather than competitive: Claude Code is positioned as a pair-programming partner for human-supervised work, while Devin specializes in delegated, asynchronous tasks where the developer reviews only the final pull request. Cognition itself ships a Devin Agent Preview that uses Claude Sonnet 4.5 as the underlying model, blurring the boundary between the two ecosystems ^[27].

Cursor

Cursor, developed by Anysphere, is an AI-powered code editor built as a fork of Visual Studio Code. It offers Tab autocomplete, chat, multi-file editing, and agent mode with background agents. Cursor surpassed $2 billion in annualized revenue by early 2026 and is used by over half of the Fortune 500 ^[18]. Cursor's agent mode posts a SWE-bench Verified score around 67.2% with its default scaffold, sitting above Devin's standard-run number but below Claude Code's scaffolded result ^[38]^[42]. Cursor takes a more interactive approach than Devin, keeping the developer closely involved in the coding process. Cognition and Anysphere thus occupy opposite ends of the AI coding spectrum: Cursor optimizes for the developer-in-the-loop case where the human authors each commit, while Devin optimizes for the asynchronous case where the human reviews only the final result.

GitHub Copilot

GitHub Copilot, backed by Microsoft's distribution and GitHub's developer ecosystem, remains the most widely deployed AI coding assistant. In 2025 and 2026, GitHub expanded Copilot with agent mode capabilities and a coding agent that can work on assigned issues autonomously, narrowing some of the feature gaps that initially distinguished tools like Devin ^[15]. The Copilot coding agent became generally available in 2026, enabling asynchronous autonomous development within the GitHub ecosystem.

Amazon Q Developer

Amazon Q Developer, Amazon Web Services' AI coding assistant, has emerged as a strong competitor particularly in enterprise environments already using AWS. Amazon Q Developer achieved top scores on the SWE-bench Leaderboard for its agentic capabilities, with features for implementing code across multiple files, generating tests, documentation, and automated code reviews ^[31].

SWE-Agent

SWE-Agent, developed by Princeton University's NLP group, is an open-source autonomous coding agent that achieved 12.29% on SWE-bench shortly after Devin's 13.86% result in 2024. Notably, SWE-Agent ran significantly faster, completing tasks in an average of 93 seconds compared to Devin's 5 minutes ^[32]. The open-source nature of SWE-Agent contributed to rapid community-driven improvements in the autonomous coding agent space.

General-purpose computer-use agents

Devin sits at the agentic end of the AI coding spectrum, but it is not the only agent that takes actions in a sandboxed environment. OpenAI Operator, released as a research preview in January 2025, runs in a hosted browser and is positioned more as a generalist web automation agent than as a coding tool, although users have demonstrated it filing GitHub issues, browsing documentation, and triggering CI runs. Anthropic's computer use capability, introduced in October 2024 with Claude 3.5 Sonnet, lets the model click, type, and read pixels on a virtual desktop and is more often used as a building block by other developers than as an end-user product. Cognition's own demos with Sonnet 4.5 inside Devin's Linux desktop in version 2.2 are conceptually closer to Anthropic's computer-use approach than to a pure coding assistant ^[12]^[27].

Browser-first builders: Replit Agent, Bolt, v0

A separate category of tools focuses on rapid prototyping rather than on long-running engineering tasks: Replit Agent (launched in September 2024 inside Replit's online IDE), Bolt by StackBlitz, and Vercel's v0. These products typically generate full applications from a prompt, run them in a containerized preview, and let users iterate by chatting with the agent. They occupy a niche adjacent to Devin but emphasize design-driven, web-app prototyping rather than the legacy code migrations and bug-fixing workflows that dominate Devin's enterprise deployments.

Tool	Vendor	Focus	Typical task length
Devin	Cognition AI	Autonomous engineering on production codebases	Hours to days
Claude Code	Anthropic	Terminal agent for human-supervised coding	Minutes to hours
Cursor (background agents)	Anysphere	IDE with optional agent mode	Seconds to minutes
GitHub Copilot (agent mode)	GitHub/Microsoft	IDE assistant plus issue-attached coding agent	Minutes to hours
OpenAI Operator	OpenAI	Generalist browser-based web agent	Minutes
Anthropic computer use	Anthropic	Pixel-level desktop automation primitive	Variable
Replit Agent	Replit	Browser-based app builder	Minutes
Bolt	StackBlitz	Web app prototyping in WebContainers	Minutes
v0	Vercel	UI generation from natural language	Seconds to minutes

Developer reception and sentiment

Developer sentiment toward Devin has evolved significantly since the controversial launch:

Period	Sentiment	Key Driver
March 2024 (launch)	Highly skeptical	Demo accuracy concerns, "first AI software engineer" backlash
Late 2024 (GA)	Cautious	$500/month price barrier, Answer.AI evaluation showing 70% failure rate
April 2025 (2.0 launch)	Improving	Price reduction to $20/month, IDE interface, improved performance
July 2025 (Windsurf acquisition)	Mixed positive	Combined product strategy, enterprise traction
Early 2026 (2.2 release)	Moderately positive	Desktop computer use, self-reviewing PRs, Goldman Sachs adoption
Spring 2026 ($25B raise)	Polarized	Wall Street rollouts and SWE-1.6 gains balanced against continued benchmark gap with Claude Code and ongoing Windsurf pricing complaints ^[34]^[43]

As of February 2026, Windsurf ranked #1 in the LogRocket AI Dev Tool Power Rankings, ahead of both Cursor and GitHub Copilot, suggesting that Cognition's combined product portfolio is resonating with developers ^[21].

Devin's performance review (2025)

In late 2025, Cognition published "Devin's 2025 Performance Review," reflecting on 18 months of operating coding agents in production environments ^[19]. The review provided a candid assessment of both achievements and ongoing challenges.

Key achievements

Devin merged hundreds of thousands of PRs across thousands of companies
PR merge rate improved from 34% to 67% year over year
Problem-solving speed increased 4x
Resource efficiency improved 2x
Codebase understanding improved substantially, which Cognition identified as a primary driver of the doubled merge rate

Enterprise results

The performance review included specific customer metrics:

Customer/Use Case	Result
Large organization (security fixes)	5-10% savings of total developer time; Devin resolves each vulnerability in 1.5 min vs. a 30 min human average
Bank (ETL migration)	Tasks completed in 3-4 hours vs. 30-40 hours for humans
Java version migration	14x faster than human engineers
Nubank (data migrations)	12x efficiency improvement in engineering hours; 20x cost savings
EightSleep	3x more data features and investigations shipped
Litera	40% test coverage increase; 93% faster regression cycles
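
The speedup multiples cited in this article follow directly from the reported durations. The short Python calculation below reproduces them from the table's figures; the `speedup` helper is illustrative, and for the ETL migration row it compares the endpoints of the two ranges, since the source gives ranges rather than single values.

```python
def speedup(human_time: float, agent_time: float) -> float:
    """How many times faster the agent is, given two durations in the same unit."""
    return human_time / agent_time


# Security fixes: 30 min human average vs. 1.5 min for Devin.
print(speedup(30, 1.5))              # 20.0

# ETL migration: 30-40 hours for humans vs. 3-4 hours for Devin.
low, high = speedup(30, 4), speedup(40, 3)
print(f"{low:.1f}x to {high:.1f}x")  # 7.5x to 13.3x
```

The 20x result matches the "20x faster than human average" figure given for security vulnerability remediation in the task suitability table.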

Acknowledged limitations

The review also acknowledged that autonomous agents face inherent challenges in real-world software development, including:

Difficulty with large, complex codebases where understanding context is critical
Variability in output quality across different types of tasks
The need for human oversight to catch errors that autonomous systems miss
Long execution times for tasks that a skilled developer could complete quickly

Cognition stated that in 2026, it would continue to focus on making Devin better at understanding real-world codebases, investing in UX to make Devin easier to direct, and expanding the range of tasks the agent can handle autonomously ^[19].

Industry impact

Workforce implications

Devin's launch and subsequent enterprise adoption have intensified the ongoing debate about AI's impact on software engineering jobs. Goldman Sachs' adoption of Devin specifically raised questions about whether autonomous coding agents would reduce demand for human developers ^[23].

The prevailing view among industry leaders by early 2026 is that AI coding agents will shift the nature of software engineering work rather than eliminate it entirely. Goldman Sachs emphasized that Devin would handle "drudgery" tasks like updating internal code to newer programming languages, freeing human engineers for higher-level work. However, business leaders have also warned that early-career roles may be most vulnerable, with some predicting AI could significantly reduce the number of entry-level programming positions within five years ^[23].

At Nubank, engineers delegated migrations to Devin and reported a 12x improvement in engineering-hour efficiency and more than 20x cost savings, suggesting that the impact is concentrated in repetitive, well-defined tasks rather than creative or architectural work ^[19].

Broader market trends

Devin's launch in March 2024 is widely credited with accelerating investment and development in the AI coding agent space. The viral demo, despite its controversies, demonstrated the concept of a fully autonomous software engineering agent to a broad audience and prompted competitors to accelerate their own agent capabilities.

By early 2026, real-world data suggests that AI coding assistants deliver 20-30% productivity gains concentrated in specific workflows, with benefits varying by developer experience and team size ^[31]. This is notably more modest than the 10x productivity improvements initially promised during the hype cycle of 2023-2024, but still represents meaningful value for engineering organizations.

Current state (2026)

As of mid-2026, Cognition AI operates both the autonomous Devin agent and the Windsurf IDE, positioning itself as a company that addresses the full spectrum of AI-assisted development: from interactive coding assistance (Windsurf) to fully autonomous task completion (Devin).

The company raised more than $1 billion at a $25 billion pre-money valuation ($26 billion post-money) in a round that closed on May 27, 2026, led by Lux Capital, General Catalyst, and 8VC, bringing its total closed financing to well over $1.5 billion ^[34]^[41]^[44]. With the Windsurf acquisition providing a large enterprise customer base and established ARR, Cognition has evolved from a single-product startup into a multi-product AI development platform. The company's combined portfolio serves enterprise customers across financial services, technology, e-commerce, defense, and government, with Goldman Sachs and Citi serving as flagship Wall Street deployments ^[22]^[36].

Recent product developments include Devin 2.2's desktop computer-use capabilities, self-reviewing pull requests, 3x faster startup times, and Linear integration. The Windsurf IDE continues to be developed with proprietary models including SWE-1.5 and SWE-1.6, which together deliver up to 950 tokens per second through a partnership with Cerebras while pushing SWE-Bench Pro scores above the leading open-source coding models ^[26]^[35]^[37]. Cognition for Government, launched in February 2026, brings Devin and Windsurf to federal agencies and defense contractors, with FedRAMP High authorization for Windsurf and a forthcoming FedRAMP High version of Devin.

The Cognizant partnership, announced in January 2026, represents a new go-to-market channel through one of the world's largest IT services companies, with Cognizant integrating Devin and Windsurf into its engineering transformation offerings for global enterprise clients. By spring 2026, Cognition reported that enterprise Devin usage had grown roughly eightyfold year over year, with combined Cognition and Windsurf ARR more than doubling in the months following the acquisition close ^[34]^[14].

The broader market for AI coding tools continues to expand rapidly. Claude Code, Cursor, GitHub Copilot, and a growing number of competitors are all investing heavily in agent capabilities. The central question for Devin and autonomous coding agents in general remains whether full autonomy or human-in-the-loop collaboration will prove to be the dominant paradigm for AI-assisted software development.

Devin's journey from a viral demo to a production tool used by thousands of companies illustrates both the promise and the challenges of autonomous AI agents. While the initial marketing claims were widely seen as overstated, the underlying technology has improved substantially. The concept of AI agents that can independently complete software engineering tasks has become a mainstream area of investment and development across the industry, with Devin playing a central role in shaping both the technology and the discourse around it.

References
