Cognition AI
Last reviewed
May 17, 2026
Sources
33 citations
Review status
Source-backed
Revision
v2 · 5,566 words
Cognition AI is an American artificial intelligence company headquartered in San Francisco, California. Founded in late 2023 by competitive programmers Scott Wu, Steven Hao, and Walden Yan, the company focuses on building autonomous AI software engineering agents. Its flagship product, Devin, launched in March 2024 as the first publicly announced AI system designed to perform software engineering tasks end-to-end with minimal human supervision. Cognition later acquired Windsurf, an agentic integrated development environment, in July 2025 and released its own coding model family, SWE-1.5, in October 2025. By early 2026 the company reported a revenue run rate of approximately $445 million, and as of April 2026 it was in talks to raise a new funding round at a $25 billion valuation.
Cognition was founded in late 2023 by Scott Wu, Steven Hao, and Walden Yan, all of whom share an unusual background: each won gold medals at the International Olympiad in Informatics (IOI), the premier global competition in algorithmic problem-solving for secondary students.
Scott Wu, who serves as CEO, was born in Louisiana in 1997 to a Chinese immigrant family. He attended Baton Rouge Magnet High School before enrolling at Harvard University. Wu won three IOI gold medals in 2012, 2013, and 2014, placing first overall in the 2014 competition. He also won the Harvard-MIT Mathematics Tournament (HMMT) in 2014, was the Mathcounts national champion in 2011 representing Louisiana, earned a gold medal with Harvard's team at the International Collegiate Programming Contest (ICPC) in 2016, and placed third at the Google Code Jam in 2021. On Codeforces, the online competitive programming platform, Wu achieved Legendary Grandmaster status with a peak rating above 3,000. He left Harvard without completing his degree. Before Cognition, Wu co-founded Lunchclub, an AI-powered professional networking platform, serving as its CTO from 2017 to 2022. Forbes named him to its 30 Under 30 list in 2020.
Steven Hao, who serves as CTO, graduated from MIT with a degree in computer science and mathematics. He was an IOI gold medalist in 2014, finishing sixth globally. After MIT, he joined Scale AI as one of its earliest senior engineers, working there from 2018 onward before co-founding Cognition.
Walden Yan, who serves as CPO, was an IOI gold medalist in 2020, finishing nineteenth globally. He enrolled at Harvard but left in 2023 without graduating. Before Cognition, he worked briefly on the Cursor code editor at Anysphere and co-founded DeepReason, a web3 security startup.
The three founders knew each other through competitive programming circuits, having crossed paths at IOI events and at Harvard and MIT. They initially explored cryptocurrency-related projects before pivoting to generative AI in late 2022 as models like ChatGPT accelerated interest across Silicon Valley. The company was formally established in late 2023.
The founding team extended well beyond the three co-founders. As of March 2024, Cognition's ten-person team collectively held ten IOI gold medals. Later hires included Gennady Korotkevich, widely regarded as the greatest competitive programmer of all time. Korotkevich, known by the handle "tourist" on competitive programming platforms, joined Cognition as a software engineer in December 2024. In July 2025 he announced that Cognition had become "the first AI lab to win a verified gold medal at the IOI" through its AI system ryanbAI, which placed seventh overall in the competition under the same conditions as human contestants.
The concentration of elite competitive programmers was intentional. Wu has argued that the reasoning skills developed through algorithmic competition, specifically the ability to break complex problems into sequences of logical steps and adapt strategies under time pressure, translate directly into the kind of planning and debugging that autonomous software engineering requires.
Cognition operated in stealth through early 2024. A formative internal moment came in late 2023 when an early version of Devin autonomously solved a server configuration problem before the Christmas holiday, which the team cited as evidence the approach was viable.
Cognition emerged from stealth on March 12, 2024, with a blog post and demo video announcing Devin. The video, which showed Devin completing software tasks from start to finish, had accumulated over 30 million views on X by early 2026.
Alongside the three founders, Cognition built out its senior leadership through 2024 and 2025. Russell Kaplan joined as president in August 2024. Kaplan previously led machine learning and ML infrastructure at Scale AI, where his own computer vision startup Helia had been acquired in 2020, and before that worked on the Autopilot team at Tesla. Kaplan led negotiations on the Windsurf acquisition in July 2025 and frequently represents Cognition publicly through podcasts, conference talks, and interviews on enterprise rollouts.
The broader executive team also grew to include a chief financial officer, a head of enterprise sales recruited from larger SaaS companies, and several senior research and infrastructure hires drawn from OpenAI, Anthropic, Scale AI, and quantitative trading firms. Cognition publicly named only a small set of executives in 2025 and 2026, in keeping with the company's preference for spotlighting research and product output rather than personnel.
Cognition began as a small team operating out of San Francisco. By early 2026 the company occupied a roughly 25,000-square-foot headquarters in the South Park neighborhood of San Francisco, which it presents as a working environment closer to a research lab than to a traditional software-as-a-service startup. The space includes shared work areas, on-site dining, and overnight accommodations used during product launches.
With the Windsurf acquisition completed in July 2025, Cognition inherited a workforce previously distributed across Mountain View and other locations. The company subsequently consolidated technical staffing into its San Francisco hub while opening or expanding satellite offices in New York City (for enterprise sales and customer engineering), Austin, and London (for European go-to-market activity). Team size grew from approximately ten in early 2024 to several hundred employees by 2026, with most engineering and research concentrated in San Francisco.
Cognition has raised capital primarily from Founders Fund, Peter Thiel's venture firm, across multiple rounds.
| Round | Date | Amount | Valuation | Lead investor |
|---|---|---|---|---|
| Series A | March 2024 | $21 million | $350 million | Founders Fund |
| Series A2 | April 2024 | $175 million | ~$2 billion | Founders Fund |
| Undisclosed round | March 2025 | Undisclosed | $4 billion | 8VC (Joe Lonsdale) |
| Series C | September 2025 | $400 million | $10.2 billion | Founders Fund |
| Funding talks | April 2026 | Hundreds of millions (reported) | $25 billion | Undisclosed |
The Series A in March 2024 raised $21 million at a $350 million valuation. The round was supported by Patrick Collison and John Collison of Stripe, Elad Gil, and others alongside Founders Fund. One month later, Cognition closed a second tranche of $175 million at approximately $2 billion, making it a unicorn within its first year of public visibility. That April 2024 round was also led by Founders Fund.
In March 2025, a new round led by 8VC, the firm co-founded by Joe Lonsdale, valued Cognition at $4 billion. After the Windsurf acquisition closed in July 2025, the combined company's valuation reached approximately $9.7 billion on secondary markets before the formal Series C closed.
In September 2025, Cognition closed $400 million at a $10.2 billion post-money valuation. Founders Fund led the round again. Returning investors included Lux Capital, 8VC, Neo, Elad Gil, Definition Capital, and Swish VC. New investors included Bain Capital Ventures, Hanabi Capital, and D1 Capital. Total funding raised through that point was approximately $896 million. The company disclosed it had kept total net burn under $20 million since founding, an unusually low figure given the pace of revenue growth and capital raised.
In April 2026, Bloomberg reported that Cognition was in early talks to raise a new round that would value the company at $25 billion, more than doubling the September 2025 valuation. The talks followed continued growth in the vibe coding and autonomous engineering markets and a reported revenue run rate of approximately $445 million by early 2026, roughly eighteen months after Devin's commercial launch. As of the April 2026 report, the company had not finalized terms.
Founders Fund has served as the anchor investor in nearly every priced round, with managing partner Trae Stephens leading the firm's involvement. This repeat-investor concentration is unusual relative to AI peers like OpenAI and Anthropic, which tend to diversify lead investors across rounds.
Cognition's core product is Devin, an autonomous AI agent designed to complete software engineering tasks from a natural language description. Unlike code completion tools that assist developers in writing individual lines or functions, Devin is designed to take on multi-step engineering projects: writing code, running tests, debugging failures, browsing documentation, and opening pull requests without continuous human input.
When Devin launched in March 2024, it achieved a score of 13.86% on the SWE-bench benchmark, a dataset of real GitHub issues drawn from popular open-source repositories. That figure far exceeded the previous state of the art of 1.96% and represented the first time any autonomous system had resolved more than a small fraction of real-world software issues without being told which files to look at. Cognition positioned Devin not as a replacement for software engineers but as an autonomous teammate that could handle delegated tasks while engineers focus on higher-level architecture and product decisions.
The original product required a subscription starting at $500 per month and operated through a web interface where developers submitted tasks and reviewed completed pull requests.
For extended coverage of Devin's capabilities, benchmarks, reception, and technical architecture, see Devin (AI software engineer).
In April 2025, Cognition released Devin 2.0 with several major changes. The most visible was a price reduction from $500 to $20 per month for a starter plan, a 96% cut that significantly broadened the accessible user base. According to Cognition's internal benchmarks, Devin 2.0 completed 83% more junior-level development tasks per Agent Compute Unit (ACU) compared to prior versions.
Devin 2.0 introduced an agent-native integrated development environment where users and Devin could work together within the same interface. Developers could edit and refine code directly inside the Devin IDE using familiar shortcuts. The update also added the ability to run multiple Devin instances in parallel, each with its own cloud-based IDE, allowing teams to process concurrent tasks.
Three new features shipped with Devin 2.0. Interactive Planning allowed Devin to analyze a codebase and generate a preliminary plan within seconds, which the user could review and adjust before autonomous execution began. Devin Search gave users a way to ask questions about their codebase and receive detailed, cited answers through agentic exploration, with a Deep Mode option for more complex queries. Devin Wiki automatically indexed repositories into a structured wiki containing architecture diagrams, source links, and generated documentation, refreshed every few hours as the codebase changed.
Devin 2.1 added a confidence scoring system. At the start of each session, after planning, and during code-related queries, Devin shows a green, yellow, or red confidence indicator expressing how certain it is that it can complete the task successfully. Sessions with green confidence scores produced pull requests that merged at roughly twice the rate of sessions with red scores. When confidence is low, Devin asks clarifying questions before proceeding.
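The gating behavior described above can be illustrated with a minimal sketch. Cognition has not published how confidence is computed, so the numeric score, the thresholds, and the function names below are all hypothetical; only the three color levels and the "ask before proceeding when confidence is low" rule come from the description above, and treating red as "low" is an assumption.

```python
from enum import Enum


class Confidence(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Hypothetical thresholds: the article names only the three colors,
# not how a score maps onto them.
GREEN_MIN = 0.75
YELLOW_MIN = 0.40


def classify(score: float) -> Confidence:
    """Map a hypothetical score in [0, 1] to a color indicator."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= GREEN_MIN:
        return Confidence.GREEN
    if score >= YELLOW_MIN:
        return Confidence.YELLOW
    return Confidence.RED


def next_action(score: float) -> str:
    """Ask clarifying questions when confidence is low, otherwise proceed.

    Assumption: "low" means RED; the real cutoff is not documented.
    """
    if classify(score) is Confidence.RED:
        return "ask_clarifying_questions"
    return "proceed"
```

In this sketch a task scored 0.2 would trigger clarifying questions, while one scored 0.9 would proceed directly to execution.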
The 2.1 update also integrated Devin's codebase understanding technology directly into the task workflow. Users can query the codebase mid-session using the "!ask" command or allow Devin to automatically scan for context when it detects a gap in its understanding. The same underlying technology also appeared in DeepWiki, a separate product Cognition released for generating documentation wikis from public repositories.
Through the Linear and Jira integrations, teams could request confidence assessments on multiple backlog issues simultaneously without launching full Devin sessions, letting engineering managers prioritize which tasks to delegate based on predicted success likelihood.
Devin 2.2, released in February 2026, introduced desktop testing through computer use. Devin gained full access to its own Linux desktop environment, allowing it to launch and interact with desktop applications as part of testing a pull request. After completing code changes, Devin could run the application on its own desktop, record the screen, and return that recording for human review before the pull request was merged. Desktop testing was enabled by default for new sessions.
The 2.2 update also shipped Devin Review Autofix, a self-review loop in which Devin plans, writes code, reviews its own output, identifies issues, and fixes them before the pull request is opened. The interface was fully rebuilt to unify the development lifecycle from planning through code review in a single view. Startup time improved by a factor of three.
Cognition introduced MultiDevin for enterprise customers, enabling parallelized task execution across multiple agents. In the MultiDevin configuration, one manager agent coordinates up to ten worker agents, each running independently on separate tasks. This allows engineering teams to delegate entire sprints rather than individual tickets.
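The manager-and-workers arrangement described above resembles a bounded fan-out pattern. The following is a rough sketch of that pattern using Python's standard thread pool, not Cognition's implementation: `run_worker` and `manager` are hypothetical names, and the worker body is a stand-in for an agent session. Only the ten-worker ceiling comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10  # the article states one manager coordinates up to ten workers


def run_worker(task: str) -> str:
    """Hypothetical stand-in for a worker agent completing one task."""
    return f"done: {task}"


def manager(tasks: list[str], workers: int = MAX_WORKERS) -> list[str]:
    """Fan tasks out to at most MAX_WORKERS concurrent workers.

    Results are returned in the same order as the input tasks.
    """
    if not 1 <= workers <= MAX_WORKERS:
        raise ValueError(f"workers must be between 1 and {MAX_WORKERS}")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_worker, tasks))


results = manager([f"ticket-{i}" for i in range(1, 4)])
print(results)  # ['done: ticket-1', 'done: ticket-2', 'done: ticket-3']
```

If more tasks are submitted than there are workers, the pool queues the remainder, which is one plausible way a sprint-sized backlog could be drained by a fixed number of agents.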
In April 2025, Cognition launched DeepWiki, a free service that generates a structured, navigable wiki for any public GitHub repository. The tool repurposes the same codebase-indexing technology that powers Devin Search and Devin Wiki and exposes it through a public web interface. Users can browse generated documentation, ask questions in natural language about how a repository works, and trace answers back to specific source files.
DeepWiki is accessible by replacing "github.com" with "deepwiki.com" in any repository URL, a convention that requires no login or sign-up for open-source projects. By late 2025 the service indexed more than 50,000 of the largest public GitHub repositories, including the Model Context Protocol specification, LangChain, and many widely used machine learning libraries. A subsequently released DeepWiki Model Context Protocol (MCP) server allowed external agents, including agents from other vendors, to query DeepWiki's indexed knowledge as a tool.
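The URL convention described above amounts to a host swap. As a small sketch, the helper below rewrites a GitHub repository URL into its DeepWiki counterpart; the function name `to_deepwiki` and the example repository path are placeholders chosen for illustration, and the helper does not check whether DeepWiki has actually indexed the repository.

```python
from urllib.parse import urlsplit, urlunsplit


def to_deepwiki(repo_url: str) -> str:
    """Swap the github.com host for deepwiki.com, leaving the path untouched."""
    parts = urlsplit(repo_url)
    if parts.netloc.lower() not in ("github.com", "www.github.com"):
        raise ValueError("expected a github.com repository URL")
    return urlunsplit(parts._replace(netloc="deepwiki.com"))


# Placeholder repository path, used only to show the rewrite.
print(to_deepwiki("https://github.com/example-org/example-repo"))
# https://deepwiki.com/example-org/example-repo
```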
Cognition has framed DeepWiki as both a useful free product and a marketing channel: developers exploring an unfamiliar open-source repository through DeepWiki are exposed to Cognition's indexing technology and may become candidates for the paid Devin product when they want similar context applied to private codebases.
See Windsurf (software).
In July 2025, Cognition acquired Windsurf, an agentic IDE developed by Codeium. The acquisition came days after Google completed a $2.4 billion licensing deal in which Windsurf's CEO Varun Mohan, co-founder Douglas Chen, and key research staff left to join Google. That departure left Windsurf's product, brand, remaining employees, and intellectual property available.
Cognition president Russell Kaplan described the deal as coming together over a single weekend, with a first call placed after 5 p.m. on a Friday and a definitive agreement signed Monday morning. While terms were not disclosed publicly, later reporting estimated the price at approximately $250 million.
At the time of acquisition, Windsurf had $82 million in annual recurring revenue, more than 350 enterprise customers, and hundreds of thousands of daily active users, with enterprise ARR roughly doubling quarter over quarter. All Windsurf employees participated financially in the deal, with vesting cliffs waived for prior work and full acceleration of existing vesting schedules.
Following the acquisition, Scott Wu framed the combination as pairing Devin's autonomous agent capabilities with Windsurf's developer-facing IDE and established enterprise sales infrastructure. The two products addressed different parts of the software development workflow: developers actively coding used Windsurf, while task delegation and autonomous execution ran through Devin. Wu described the combination as "a massive unlock" toward the company's goal of building the future of software engineering.
Within seven weeks of the acquisition closing, combined enterprise ARR grew over 30%. The Windsurf IDE was updated to include full access to the latest Claude models from Anthropic, and Cognition began integrating its proprietary SWE-1.5 model family into the Windsurf environment.
The Windsurf integration also became one of the most publicly discussed organizational events in the company's short history. Three weeks after the acquisition closed, Cognition laid off about 30 Windsurf employees and offered the remaining roughly 200 a nine-month severance buyout. Employees who elected to stay were required to commit to a six-day-per-week, 80-hour schedule and relocate to or work consistently out of Cognition's San Francisco office.
The restructuring drew significant external criticism. Critics argued the moves contradicted the inclusive language used at announcement time, when Cognition emphasized that "100% of Windsurf employees" would receive financial compensation from the deal. Defenders, including Wu in internal emails and public statements, framed the schedule as opt-in and the buyout as a way to give people a graceful exit if the culture was not a fit. A separate controversy emerged when Prem Qu Nair, an early Windsurf engineer, publicly stated he had received only about 1% of the share value he had expected from the Google licensing transaction that preceded the Cognition deal, sparking broader debate about how acquihires and partial talent acquisitions distribute proceeds to rank-and-file employees.
Alongside the Devin product, Cognition has developed an in-house family of coding models. The line traces back to SWE-1, an internal experimental model released in 2025, and matured with SWE-1.5 in October 2025. The SWE-1.5 release was Cognition's first publicly positioned frontier-size coding model, with several hundred billion parameters and training tailored to long-horizon, multi-step engineering tasks rather than single-turn code completion.
SWE-1.5 was designed jointly with Cerebras Systems to run on Cerebras's wafer-scale inference hardware. On that hardware, Cognition reported inference speeds of up to 950 tokens per second, approximately 13 times faster than Claude Sonnet 4.5 on comparable workloads, while reaching coding benchmark scores in the same range as Sonnet 4.5 and surpassing reported GPT-5 results in some agentic settings. Cognition described the model, its inference system, and the surrounding agent harness as a single co-designed system, a design philosophy that distinguishes it from products built on top of third-party general-purpose models.
Supporting SWE-1.5 are SWE-grep and SWE-grep-mini, two specialized models tuned for fast parallel repository search. They run at roughly 20 times and 4.5 times the speed of Claude Haiku 4.5 respectively, allowing Devin to retrieve relevant code passages across large repositories without dominating the latency budget of an agent run.
Cognition deploys its in-house models alongside frontier models from external labs. By default Devin routes between Cognition's own SWE family, Claude models from Anthropic, and other providers depending on the task. Windsurf likewise offers users a choice between SWE-1.5 for fast iteration and external models such as Claude Sonnet and Opus for more demanding reasoning. This multi-model strategy lets Cognition tune cost and latency on common tasks using its own models while still benefiting from frontier capability when needed.
In May 2024, Cognition partnered with Microsoft to integrate Devin with Microsoft Azure. The partnership included both infrastructure (Azure as Cognition's cloud provider) and commercial collaboration through Microsoft's enterprise sales and engineering teams. At Microsoft Build 2024, Microsoft CTO Kevin Scott described Devin as "an absolutely amazing tool."
Through the Microsoft relationship, Devin was deployed at enterprise customers including Visma, a Norwegian fintech company with approximately $2 billion in annual revenue undergoing large-scale cloud migration. Visma reported a 50% reduction in migration project costs and developer productivity gains of up to 2x in some scenarios.
Cognition's October 2025 SWE-1.5 release also formalized a deep relationship with Cerebras. Cerebras's wafer-scale processors became the primary inference platform for SWE-1.5 in both Devin and Windsurf. Cognition and Cerebras engineers co-designed training and serving stacks over the months preceding launch, an arrangement Cerebras highlighted in its public statements about the deal.
Though not a formal commercial partnership in the same sense as the Microsoft and Cerebras relationships, Anthropic is a major model provider for Devin and Windsurf. Cognition uses Claude models for tasks that exceed the strengths of its own SWE family, and Windsurf users explicitly choose between Claude Sonnet, Claude Opus, and SWE-1.5 within the IDE. Cognition has described this multi-model posture as practical rather than philosophical, on the view that no single model is optimal for every coding task.
Cognition launched Devin commercially in mid-2024. Annual recurring revenue grew from approximately $1 million in September 2024 to $73 million by June 2025, a 73-fold increase in nine months. The Windsurf acquisition in July 2025 added approximately $82 million in ARR, bringing the combined company's estimated ARR to roughly $150 to $155 million by mid-2025. By early 2026, about eighteen months after the commercial launch, public commentary and an interview with Scott Wu placed the combined annualized revenue run rate at approximately $445 million.
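The revenue figures above can be checked with a few lines of arithmetic. The sketch below uses only the approximate numbers quoted in the text; the implied monthly growth rate is a derived illustration that assumes smooth compounding, not a figure reported by the company.

```python
# Approximate ARR figures in millions of US dollars, taken from the text above.
devin_arr_sep_2024 = 1.0
devin_arr_jun_2025 = 73.0
windsurf_arr = 82.0
months_elapsed = 9  # September 2024 to June 2025

growth_multiple = devin_arr_jun_2025 / devin_arr_sep_2024
combined_arr = devin_arr_jun_2025 + windsurf_arr

# Implied compound monthly growth, assuming a smooth exponential curve
# (an illustrative simplification, not a reported figure).
monthly_growth = growth_multiple ** (1 / months_elapsed) - 1

print(f"growth multiple: {growth_multiple:.0f}x")
print(f"combined ARR: ${combined_arr:.0f}M")
print(f"implied monthly growth: {monthly_growth:.0%}")
```

The sum of the two ARR figures lands at the top of the $150 to $155 million range cited for the combined company.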
Enterprise customer adoption has been one of the most consistently cited drivers of the company's growth. Enterprise customers as of late 2025 included Goldman Sachs, Citigroup, Dell Technologies, Cisco Systems, Ramp, Palantir, Nubank, Mercado Libre, OpenSea, and Curai Health. Cognition has stated that Devin had merged hundreds of thousands of pull requests across customer deployments by 2025, and that overall enterprise usage grew roughly 80-fold over the year ending in early 2026.
In July 2025, Goldman Sachs announced it was piloting Devin alongside its approximately 12,000 internal developers, with chief information officer Marco Argenti describing the deployment as part of a broader "hybrid workforce" strategy targeting 20% efficiency gains in software development. The bank framed Devin as its first "AI employee" rather than as a tool used by existing engineers, a positioning Cognition cited as evidence that large regulated organizations were prepared to assign discrete, accountable work to autonomous agents.
Citigroup announced it was rolling out Cognition's Devin agent to its roughly 40,000 developers. The bank's stated initial use cases were straightforward, well-scoped engineering tasks such as routine refactors and bug fixes, with plans to expand to more complex assignments over time. The Citi rollout, combined with the Goldman pilot, made financial services one of the most visible verticals for the product.
One documented case study involved Linktree, the social media link-in-bio platform. In February 2025, Devin authored approximately 300 pull requests in one month, of which roughly 100 were merged. A Nubank case study, developed in partnership with Microsoft, reported an 8-fold improvement in engineering efficiency and 20-fold cost savings on a monolithic codebase refactoring project. Visma's large-scale Azure migration, mentioned above, was another frequently cited engagement.
Pricing as of 2026 starts at $20 per month for individual access. A Team Plan at $500 per month includes unlimited seats with usage-based compute billing measured in Agent Compute Units (ACUs). Enterprise pricing is custom and includes VPC deployment, custom fine-tuned Devin instances, dedicated account support, and custom legal and security terms.
Cognition has built a public identity around an unusually intense work culture even by Silicon Valley standards. Wu has repeatedly described the company as a place where the founders and most employees expect to work long hours, six days a week, with significant time spent together in person at the San Francisco office. Internal communications publicized after the Windsurf acquisition included a statement from Wu that the company does not "believe in work-life balance" and a follow-up clarification framing intensity as opt-in rather than mandatory.
The practical mechanisms of the culture include extended on-site hours, an in-office gym and dining program, and overnight accommodations for engineers during product launches and pre-release crunch windows. Several profiles of the company by The San Francisco Standard and others have described employees living near or staying at company-managed houses, working out of a single open floor, and adopting a research-lab style of collaboration in which research, infrastructure, and product engineering coexist in the same physical space.
Defenders of the culture argue that it reflects the personal preferences of a relatively small, mission-driven team and that financial outcomes for participating employees have outpaced more conventional companies. Critics, including some former employees, have pushed back on the broader normalization of 80-hour weeks in frontier AI startups, arguing that the practice excludes caregivers and contributes to burnout. The Windsurf restructuring discussed above became the most visible flashpoint for these debates.
The March 2024 announcement drew significant attention across the software industry. The demo video showing Devin completing a full-stack programming task went viral, and industry figures including the Collison brothers and Elad Gil publicly backed the company. Within weeks, the company had secured $175 million in additional funding and a $2 billion valuation.
Criticism of the launch appeared quickly. The YouTube channel Internet of Bugs published a detailed breakdown of the original demo video, arguing that the Upwork task Devin appeared to complete had been misrepresented and that the autonomous completion shown relied on conditions not disclosed in the promotional material. AI researcher Devansh published similar concerns on Medium. These critiques raised questions about whether Cognition had overstated Devin's real-world capabilities during the launch.
An evaluation by Answer.AI, an applied AI research lab, attempted 20 tasks using Devin in late 2024. The evaluation found 14 failures, 3 inconclusive results, and 3 successes, a success rate of 15%. The evaluators noted that Devin's autonomous nature became a liability in some cases, where it continued pursuing unworkable approaches for extended periods rather than recognizing when a task was structurally blocked.
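The 15% figure follows directly from the reported counts, as the short calculation below shows. The second rate, which sets aside inconclusive runs, is an alternative reading added here for comparison and does not come from the evaluation itself.

```python
# Outcome counts as reported in the Answer.AI evaluation described above.
outcomes = {"success": 3, "inconclusive": 3, "failure": 14}

total = sum(outcomes.values())
success_rate = outcomes["success"] / total

# Alternative view (an assumption, not from the evaluation): drop inconclusive runs.
decided = outcomes["success"] + outcomes["failure"]
success_rate_decided = outcomes["success"] / decided

print(f"{total} tasks, {success_rate:.0%} success overall")   # 20 tasks, 15% success overall
print(f"{success_rate_decided:.1%} success among decided tasks")  # 17.6% success among decided tasks
```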
The SWE-bench score of 13.86% was itself questioned in some quarters. While the number represented a genuine improvement over prior systems, critics noted that failing 86% of tasks was a meaningful limitation for a product positioned as an autonomous software engineer, and that the benchmark's curated subset of issues may not reflect the distribution of tasks encountered in production environments.
By 2025, Cognition's public messaging shifted to emphasize Devin as a tool for augmenting engineering teams rather than replacing individual engineers. Internal performance metrics from November 2025 showed a pull request merge rate of approximately 67%, compared to 34% in earlier versions, alongside reported improvements in speed and compute efficiency.
In 2025, Cognition's ryanbAI system competed at the International Olympiad in Informatics under the same conditions as human contestants. According to Gennady Korotkevich, who announced the result, ryanbAI placed seventh overall and earned a gold medal, making Cognition the first AI lab to win a verified IOI gold medal.
Cognition operates in a crowded market for AI coding tools, competing with products from Anthropic, OpenAI, Microsoft, and several well-funded startups including Cursor.
| | Devin (Cognition) | Claude Code (Anthropic) | Codex (OpenAI) | GitHub Copilot (Microsoft) |
|---|---|---|---|---|
| Architecture | Cloud-based autonomous agent | Terminal-based agentic assistant | Cloud-based asynchronous agent | IDE plugin |
| Where code runs | Cognition cloud infrastructure | Local machine (developer's environment) | OpenAI-managed cloud containers | Local environment |
| Primary interface | Web app + IDE (Windsurf) | Terminal / shell | ChatGPT web interface | IDE extension |
| Underlying model | SWE-1.5 (proprietary) | Claude Sonnet 4 / Opus 4 | codex-1 (fine-tuned o3) | GPT-4 family |
| Parallelism | Yes (MultiDevin, up to 10 agents) | Agent Teams (lead + sub-agents) | Independent parallel agents | No |
| Starting price | $20/month | Included with Claude Pro | Included with ChatGPT Plus | $10/month (individual) |
| SWE-bench (representative) | 13.86% (March 2024, original Devin) | 70.3% (Claude Code, verified) | High (o3-based, specific figure varies by report) | N/A |
The most significant structural difference between Devin and terminal-based tools like Claude Code is where execution happens. Claude Code runs in the developer's own environment with direct access to local files, while Devin and OpenAI's Codex run in vendor-managed cloud sandboxes and submit their results as pull requests. Developers who need tight control over their local environment tend to prefer terminal-based agents. Teams that want fully asynchronous delegation without monitoring an active shell session tend toward cloud-based agents.
By acquiring Windsurf, Cognition bridged this gap. The Windsurf IDE provides a developer-facing local environment with a native coding experience, while Devin handles task delegation in parallel. This positions Cognition as the only major player offering both an agentic IDE and an autonomous cloud-based software engineering agent within the same company.
Cursor, the AI code editor from Anysphere that Walden Yan briefly worked on before founding Cognition, had reached $500 million in ARR and a $29.3 billion valuation by November 2025. Cursor's growth illustrated the scale of demand for AI-native development tools, and Cognition's acquisition of Windsurf was partly a response to that competitive dynamic.
Autonomous agents that run unsupervised on cloud infrastructure create distinct failure modes compared to interactive tools. When Devin misunderstands a task or encounters an unexpected environment state, it can continue working in the wrong direction for extended periods before surfacing an error. Answer.AI's 2024 evaluation documented cases where Devin spent significant time pursuing approaches that were structurally blocked rather than stopping to ask for clarification.
The confidence scoring system introduced in Devin 2.1 addresses part of this problem by having Devin express uncertainty upfront and ask questions before proceeding. But the accuracy of the confidence estimates depends on how well Devin has indexed the relevant codebase, and large or poorly documented repositories remain harder for the system to reason about.
Security and compliance are ongoing concerns for enterprise adoption. Cognition offers VPC deployment to keep customer code inside their own network perimeter, but any cloud-based execution model requires customers to grant an external system significant access to their codebase and infrastructure. Organizations in regulated industries have been slower to adopt autonomous agents for this reason.
The compute costs of running multi-step autonomous agents at scale are substantially higher than those of interactive code completion tools. Usage-based billing through ACUs means that a task that takes longer than expected can consume more compute than anticipated, which has been noted as a friction point in enterprise procurement.
SWE-bench scores across the industry rose rapidly throughout 2024 and 2025, with multiple competing systems eventually surpassing Devin's initial 13.86% benchmark by large margins. Claude Code reached 70.3% on SWE-bench Verified, and newer models from Anthropic, OpenAI, and Google continued to push the benchmark higher. Cognition's SWE-1.5 family closed some of that gap, but SWE-bench performance does not directly predict success on the diverse range of tasks that appear in production engineering work.