# Software Development

> Source: https://aiwiki.ai/wiki/software_development
> Updated: 2026-05-13
> Categories: AI Tools & Products, Software Development
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

*See also: [Software Development ChatGPT Plugins](/wiki/software_development_chatgpt_plugins)*

**AI in software development** is the use of [large language models](/wiki/large_language_model) and other machine learning systems to assist or automate tasks that programmers have traditionally done by hand. The field spans single-line autocomplete inside an editor, conversational chat assistants that explain or refactor code, autonomous agents that plan multi-step changes, and specialised tools for code review, testing, documentation, and security analysis. Between mid-2021, when [GitHub Copilot](/wiki/github_copilot) entered private preview, and the end of 2025, AI-assisted coding moved from a curiosity to a default part of many professional engineering workflows, with adoption surveys reporting that a majority of working developers had used such a tool in the prior year [1][2].

The shift has been driven by [transformer](/wiki/transformer) models trained on very large corpora of public source code and natural language, the wide deployment of inline completion in editors such as Visual Studio Code and the JetBrains family, and a wave of agentic systems including [Devin](/wiki/devin), [Claude Code](/wiki/claude_code), and [OpenAI Codex](/wiki/openai_codex). Productivity studies have produced mixed and sometimes contradictory findings, ranging from a 55.8% speed-up on a controlled HTTP server task to a randomised trial of experienced open-source developers who were 19% slower when allowed to use AI tools [3][4]. Concerns about hallucinated APIs, supply-chain attacks via fabricated package names, licensing of training data, and the long-term effect on code quality have prompted academic study, lawsuits, and changes in enterprise policy.

## Overview

AI coding tools fall into a small number of categories that are useful to distinguish because they have different cost structures, failure modes, and review requirements.

| Category | Typical interaction | Examples |
| --- | --- | --- |
| Inline completion | Ghost text inside the editor, accepted by Tab | Early Copilot, [Tabnine](/wiki/tabnine), Codeium |
| Chat assistants | Side-panel or pop-up chat with file context | Copilot Chat, [Sourcegraph Cody](/wiki/sourcegraph_cody), JetBrains AI Assistant |
| Edit-mode tools | Model rewrites a selection in place | [Cursor](/wiki/cursor) Composer, [Aider](/wiki/aider), [Continue](/wiki/continue_dev) |
| Coding agents | Multi-step planning, shell access, file I/O | [Devin](/wiki/devin), [Claude Code](/wiki/claude_code), [OpenHands](/wiki/openhands), Replit Agent |
| Domain specialists | Reviews, tests, docs, security, observability | CodeRabbit, [Qodo](/wiki/qodo), Diffblue, Snyk DeepCode |

The boundary between these categories is fuzzy because most flagship products now bundle several modes. GitHub Copilot, for example, started as inline completion, added chat in 2023, added the Copilot Workspace agent in 2024, and shipped a fully agentic mode in 2025 [5][6][7]. Cursor began as a fork of Visual Studio Code with chat and edit features and now ships a background agent that runs on remote machines [8].

Underneath the products, the technology stack is fairly consistent. The base model is almost always a transformer-based [LLM](/wiki/llm), either a general-purpose frontier model such as [GPT-4o](/wiki/gpt_4o), [Claude 3.5 Sonnet](/wiki/claude_3_5_sonnet) and its successors, or [Gemini](/wiki/gemini), or a code-specialised model such as DeepSeek-Coder, Qwen 2.5 Coder, [StarCoder](/wiki/starcoder), or [Code Llama](/wiki/code_llama) [9][10][11]. Retrieval over the local repository is added on top, using embeddings, language servers, or filesystem traversal. For agents, a planning loop, a tool harness, and a sandboxed shell are layered over the model. The [Model Context Protocol](/wiki/mcp) released by [Anthropic](/wiki/anthropic) in November 2024 has become a common way to plug external tools into these agents [12].
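
MCP messages are framed as JSON-RPC 2.0. The sketch below shows, under simplifying assumptions, what a client might send to discover and invoke a tool. `tools/list` and `tools/call` are method names from the MCP specification, while the tool name `search_issues` and its arguments are hypothetical; the transport (stdio or HTTP) and the `initialize` handshake that precedes these calls are omitted.

```python
import json
from typing import Any


def make_tool_list(request_id: int) -> str:
    """Ask an MCP server which tools it exposes."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def make_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",        # MCP uses JSON-RPC 2.0 framing
        "id": request_id,        # echoed in the response so calls can be matched
        "method": "tools/call",  # invoke a tool the server advertised via tools/list
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool and arguments, for illustration only.
print(make_tool_list(1))
print(make_tool_call(2, "search_issues", {"query": "flaky test", "limit": 5}))
```

The server replies with a JSON-RPC response carrying the same `id`, which is how an agent harness matches each result to the call the model asked for.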

## History

### Statistical autocomplete to neural code models

Early editor assistance was rule-based or grammar-driven. Microsoft's IntelliSense, which shipped in Visual Basic 5.0 in 1996 and then in Visual Studio, used static type information and parse trees to suggest method names. Eclipse's Java Development Tools and the C/C++ Development Tooling brought similar features to open-source IDEs. These tools knew nothing about semantics beyond what compilers exposed, and they did not generate new code.

Statistical approaches arrived in the 2010s. Kite, founded by Adam Smith in 2014, used cloud-trained models to suggest Python completions inside editors. It briefly ran a free copilot product before shutting down in 2022, with Smith publishing a post-mortem that described the difficulty of building a viable business around inline suggestions before transformer models existed [13]. Tabnine, founded as Codota in 2013 by Dror Weiss and Eran Yahav, used statistical and neural models to provide multi-language autocomplete and is one of the few pre-Copilot vendors that survived the transition.

Neural language models for code became prominent with OpenAI's Codex paper in July 2021, which described a fine-tuning of [GPT](/wiki/gpt_generative_pre-trained_transformer)-3 on public GitHub code and introduced the [HumanEval](/wiki/humaneval) benchmark of 164 hand-written Python problems [14]. The model behind Codex powered the first version of Copilot, which Microsoft and GitHub launched as a private preview on 29 June 2021 [15]. General availability followed on 21 June 2022 at $10 per month for individuals, with free access for verified students and maintainers of popular open-source projects [16].

### The Copilot era

GitHub Copilot is the product that turned AI autocomplete into a mainstream developer tool. Microsoft and GitHub leadership cited it heavily on earnings calls, and CEO Satya Nadella reported that paid subscribers across Copilot's individual and business plans had exceeded 1.3 million on the company's fiscal year 2024 fourth quarter earnings call in July 2024, growing more than 180% year over year [17]. Copilot Chat reached general availability for business users in December 2023 after a public beta announced at GitHub Universe in November of that year [5]. Copilot Workspace, a task-oriented interface that lets the model plan, propose code edits, and run tests against a repository, was announced at GitHub Universe in November 2023 and opened as a technical preview in April 2024 [6]. Copilot agent mode, which lets the model run shell commands and iterate on a task without per-step approval, shipped in 2025 [7].

The success of Copilot triggered a wave of competitors. Amazon previewed CodeWhisperer in June 2022 and reached general availability in April 2023, then rebranded the product as Amazon Q Developer at AWS re:Invent in November 2023 with a broader rollout in April 2024 [18]. Google launched Duet AI for Developers at Google Cloud Next in August 2023 and renamed it Gemini Code Assist in February 2024 [19]. JetBrains, which makes the IntelliJ family of IDEs, shipped JetBrains AI Assistant in December 2023 and added the agentic Junie in 2025 [20]. Sourcegraph released Cody, which uses repository graph indexes to ground completions in large codebases, in 2023.

### The agentic shift

The step from "autocomplete" to "agent" came in 2024. Cognition AI announced Devin, billed as the "first AI software engineer," on 12 March 2024, with a demo video showing the system completing freelance jobs end-to-end inside a sandboxed Linux environment [21]. The Devin demo, and the company's claim of a 13.86% unassisted score on the [SWE-bench](/wiki/swe_bench) benchmark, drew both attention and scepticism. Several engineers, including Carl Brown of Internet of Bugs, published video critiques arguing that the demo had been edited and that the actual capability was lower than advertised [22]. The episode marked the start of a more public debate about how to measure agentic coders.

Academic work moved quickly. The SWE-Agent system, developed at Princeton NLP by John Yang, Carlos Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press, was released in April 2024 [23]. It introduced the agent-computer interface, a deliberate set of file-editing and shell commands designed for an LLM, and resolved 12.5% of issues on the full SWE-bench test set using [GPT-4](/wiki/gpt_generative_pre-trained_transformer) Turbo. The team at the University of Illinois Urbana-Champaign and the All Hands AI startup released OpenDevin, later renamed [OpenHands](/wiki/openhands), as an open-source alternative in March 2024 [24].
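
The SWE-agent paper describes guardrails in its interface, including an edit command that checks the edited file for syntax errors and rejects the edit if the check fails, so the model gets immediate feedback instead of a broken file. The following is a minimal sketch of that idea for Python files, assuming a line-range edit command; the function name and messages are hypothetical, and SWE-agent's actual commands and linter differ.

```python
def apply_line_edit(source: str, start: int, end: int, replacement: str) -> tuple[str, str]:
    """Replace lines start..end (1-indexed, inclusive), rejecting edits that break syntax.

    Returns (new_source, message). On a rejected edit the original source is
    returned unchanged, so the agent sees the failure and can retry.
    """
    lines = source.splitlines(keepends=True)
    if not (1 <= start <= end <= len(lines)):
        return source, f"error: line range {start}-{end} is out of bounds"
    new_lines = replacement.splitlines(keepends=True)
    if new_lines and not new_lines[-1].endswith("\n"):
        new_lines[-1] += "\n"
    candidate = "".join(lines[: start - 1] + new_lines + lines[end:])
    try:
        compile(candidate, "<edited file>", "exec")  # syntax check only; nothing is executed
    except SyntaxError as exc:
        return source, f"error: edit rejected, SyntaxError at line {exc.lineno}: {exc.msg}"
    return candidate, f"ok: replaced lines {start}-{end}"


src = "def add(a, b):\n    return a - b\n"
fixed, msg = apply_line_edit(src, 2, 2, "    return a + b")
print(msg)  # ok: replaced lines 2-2
unchanged, msg = apply_line_edit(src, 2, 2, "    return a +")
print(msg)  # an "edit rejected" message; the file content is left as it was
```

Returning the unchanged file with an explanatory error, rather than applying a broken edit, keeps the agent's view of the file consistent and tells it exactly why the action failed.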

Anthropic announced [Claude Code](/wiki/claude_code) on 24 February 2025 as a terminal-based agent with file editing, shell execution, and git tools [25]. [OpenAI](/wiki/openai) re-released the Codex brand on 16 May 2025, this time as a cloud-hosted coding agent that runs in parallel containers, accepts tasks from chat or pull requests, and ships a CLI for local use [26]. Replit's [Agent](/wiki/replit) shipped in September 2024 with a focus on building and deploying full applications from natural language, and Replit Agent 2 followed in February 2025 [27].

## Code completion assistants

The table below lists widely used tools, with first-release dates referring to the public availability of the AI-coding product, not to the parent company.

| Tool | First release | Organisation | Key features |
| --- | --- | --- | --- |
| [GitHub Copilot](/wiki/github_copilot) | June 2021 preview, June 2022 GA | GitHub (Microsoft) | Inline completion, Copilot Chat, Copilot Workspace, agent mode |
| [Cursor](/wiki/cursor) | March 2023 | [Anysphere](/wiki/anysphere) | Forked VS Code editor, Composer multi-file edits, background agent |
| [Windsurf](/wiki/windsurf) | November 2024 | Codeium (later renamed Windsurf), then Cognition | Cascade agent, Codeium roots, acquired by Cognition AI in July 2025 |
| [Codeium](/wiki/codeium) | 2022 | Codeium (later Windsurf) | Free tier completion across many editors |
| [Tabnine](/wiki/tabnine) | 2018 (neural), 2023 (chat) | Tabnine | Self-hosted option, enterprise compliance focus |
| [Aider](/wiki/aider) | 2023 | Paul Gauthier (open source) | Terminal-based pair programmer that writes commits |
| [Cline](/wiki/cline) | 2024 (as Claude Dev) | Open source | VS Code extension, BYO key, plan and act modes |
| [Continue](/wiki/continue_dev) | 2023 | Continue.dev | Open-source autopilot for VS Code and JetBrains |
| [Sourcegraph Cody](/wiki/sourcegraph_cody) | 2023 | Sourcegraph | Repo-graph search context, enterprise focus |
| Replit Agent | September 2024 | [Replit](/wiki/replit) | Build and deploy apps from prompts, Replit Agent 2 in 2025 |
| JetBrains AI Assistant | December 2023 | JetBrains | Multi-language chat and completion across IntelliJ family |
| JetBrains Junie | 2025 | JetBrains | Agentic coder for the JetBrains IDEs |
| Amazon Q Developer | April 2024 (rebrand) | [Amazon](/wiki/amazon_q) | Successor to CodeWhisperer, AWS service integration |
| Gemini Code Assist | February 2024 (rebrand) | [Google](/wiki/google) | Successor to Duet AI for Developers, GCP integration |
| [Gemini CLI](/wiki/gemini_cli) | 2025 | Google | Open-source terminal agent powered by Gemini |
| Phind | 2023 | Phind | Search-based answer engine for developers |
| Mintlify | 2022 | Mintlify | Documentation generator and authoring platform |
| Stack Overflow OverflowAI | Announced July 2023 | Stack Overflow | AI search, enterprise knowledge ingestion |

Several of these products have stories that are worth flagging.

**Cursor** is built and operated by [Anysphere](/wiki/anysphere), a company founded in 2022 in San Francisco by Michael Truell, Sualeh Asif, Arvid Lunnemark, and Aman Sanger, all then in their early twenties. Cursor's editor is a fork of VS Code with deeper LLM integration, including the Composer feature for multi-file edits and an inline edit mode triggered by Cmd-K. The product pulled ahead of standalone Copilot use among early adopters in 2024, and its annualised revenue passed $100 million by early 2025. By June 2025, Anysphere had reached a $9.9 billion valuation on more than $500 million in ARR [28].

**Codeium and Windsurf** had a turbulent 2025. Codeium's product line evolved into the Windsurf editor with the launch of the Cascade agent in November 2024, and the company later adopted the Windsurf name. A reported agreement for OpenAI to acquire Windsurf for about $3 billion lapsed in July 2025. Google then reportedly paid about $2.4 billion in a licensing and hiring deal that brought CEO Varun Mohan and other engineers to Google DeepMind, leaving the Windsurf product behind, and Cognition AI acquired what remained of Windsurf later that month [29][30].

**Aider**, written and maintained by Paul Gauthier, is an open-source command-line tool that pairs an LLM with a Git workflow. It applies edits as commits, with diffs the user can review, and has been used as a reference implementation for many later agentic coders [31].
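
In its `diff` edit format, Aider asks the model to express changes as SEARCH/REPLACE blocks and applies them locally. The sketch below assumes each block has already been separated from its file name and surrounding fence; it applies the blocks to a file's text and renders a unified diff for review, stopping short of the `git commit` step. The helper names are hypothetical, not Aider's internals.

```python
import difflib
import re

# One SEARCH/REPLACE block: the exact text to find, then the text to put in its place.
BLOCK_RE = re.compile(
    r"<<<<<<< SEARCH\n(?P<search>.*?)=======\n(?P<replace>.*?)>>>>>>> REPLACE",
    re.DOTALL,
)


def apply_search_replace(original: str, edit_text: str) -> str:
    """Apply each SEARCH/REPLACE block in edit_text to original, in order."""
    updated = original
    for match in BLOCK_RE.finditer(edit_text):
        search, replace = match.group("search"), match.group("replace")
        if search not in updated:
            # Refuse to guess: a non-matching SEARCH means the model misremembered the file.
            raise ValueError(f"SEARCH text not found:\n{search}")
        updated = updated.replace(search, replace, 1)  # first occurrence only
    return updated


def review_diff(before: str, after: str, path: str) -> str:
    """Render a unified diff the user can read before the change is committed."""
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


before = "def add(a, b):\n    return a - b\n"
edit = "<<<<<<< SEARCH\n    return a - b\n=======\n    return a + b\n>>>>>>> REPLACE\n"
after = apply_search_replace(before, edit)
print(review_diff(before, after, "calc.py"))
```

Matching on exact text rather than line numbers means an edit fails loudly when the model's picture of the file is stale, instead of landing on the wrong lines.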

**Cline**, originally released as Claude Dev, is an open-source VS Code extension that works with the user's own API key, runs in a plan-then-act loop, and supports multiple model providers [32].

## Agentic coders

A "coding agent" in this article means a system that, given a high-level task, can plan steps, edit files, run commands such as tests, observe results, and iterate without per-step human approval. The line between an agent and a chat tool is blurry, since most chat tools now ship some form of tool use.
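
The plan, act, observe and iterate cycle in this definition can be sketched as a small loop. The sketch assumes scripted stand-ins rather than a real model: `propose` plays the role of the LLM and `execute` the sandboxed harness that runs the action and the tests. Both are hypothetical callables, and production agents add context management, planning prompts, and safety checks on commands.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    action: str       # what the model asked to do (an edit, a shell command, ...)
    observation: str  # what the harness reported back


@dataclass
class AgentRun:
    task: str
    history: list[Step] = field(default_factory=list)
    done: bool = False


def run_agent(
    task: str,
    propose: Callable[[str, list[Step]], str],
    execute: Callable[[str], tuple[str, bool]],
    max_steps: int = 10,
) -> AgentRun:
    """Ask for an action, run it, record the observation, repeat until tests pass.

    No human approves individual steps; the loop stops only when the tests
    pass or the step budget is exhausted.
    """
    run = AgentRun(task)
    for _ in range(max_steps):
        action = propose(task, run.history)        # stand-in for the LLM call
        observation, tests_pass = execute(action)  # stand-in for sandbox + test run
        run.history.append(Step(action, observation))
        if tests_pass:
            run.done = True
            break
    return run


# Scripted stand-ins: the first proposed fix is wrong, the second is right.
def fake_propose(task: str, history: list[Step]) -> str:
    return "edit: return a - b" if not history else "edit: return a + b"


def fake_execute(action: str) -> tuple[str, bool]:
    ok = action.endswith("a + b")
    return ("1 passed" if ok else "1 failed"), ok


result = run_agent("fix add()", fake_propose, fake_execute)
print(result.done, len(result.history))  # True 2
```

Feeding the full history back into `propose` is what lets the model react to a failed test run; the step budget is the only brake once per-step approval is removed.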

### Devin

[Devin](/wiki/devin) is the agent that arguably defined the category. Cognition AI, founded in 2023 by Scott Wu, Steven Hao, and Walden Yan, announced Devin on 12 March 2024 with a launch video, a benchmark claim, and a private waitlist [21]. The system runs in a sandbox with a Linux shell, code editor, and browser, and presents a chat interface where users can hand off tasks. Devin reached general availability in December 2024 at $500 per month, then dropped to a $20 starting price with Devin 2.0 in April 2025 [33]. Cognition was valued at over $10 billion by September 2025 [34].

### Claude Code

[Claude Code](/wiki/claude_code) is Anthropic's terminal-based coding agent. It launched in research preview alongside [Claude 3.7 Sonnet](/wiki/claude_3_5_sonnet) on 24 February 2025 [25]. Claude Code runs as a CLI, with permission to read and edit files in a working directory, execute shell commands, and call MCP servers for external tools. Anthropic has since added features including subagents, hooks, a VS Code extension, GitHub integration, and a one-million-token context window for Sonnet variants. Anthropic has said that Claude Code was widely adopted by engineering teams during 2025 and contributed to a sharp rise in its API revenue [35].

### OpenAI Codex (2025)

[OpenAI Codex](/wiki/openai_codex) is a reuse of the Codex brand. The first Codex, retired in March 2023, was the model behind early Copilot. The new Codex, released on 16 May 2025, is a cloud-hosted coding agent that runs each task in its own container, with support for parallel runs and pull-request style review of changes. It is powered by codex-1, a fine-tuned variant of the o3 reasoning model, and ships a CLI for local use [26].

### OpenHands and SWE-Agent

[OpenHands](/wiki/openhands) is one of the most widely used open-source coding agents. It supports many model providers, runs in a Docker sandbox, and includes a browser and shell tool. The project was created at All Hands AI by the team that built OpenDevin [24]. SWE-Agent, also open source and developed by the Princeton NLP group, is more focused on the SWE-bench task format and contributed the influential agent-computer interface design [23].

### Replit Agent

Replit's Agent, launched in September 2024 and followed by Agent 2 in February 2025, is positioned for non-professional developers who want to build and deploy an entire web application from a natural-language description, with hosting and a database provisioned automatically inside Replit's platform [27].

### AutoGen and Magentic-One

Microsoft Research published [AutoGen](/wiki/autogen) in late 2023 as a Python framework for multi-agent LLM systems, with built-in patterns for code-generation agents that call a Python interpreter. Magentic-One, released by the same team in November 2024, builds a generalist multi-agent orchestrator on top of AutoGen [36]. Both are research artefacts rather than products, but they have been used as starting points for downstream agents.

## Benchmarks

Progress in AI for code is tracked through a small set of benchmarks. The two that drive most headline claims are HumanEval, for function-level synthesis, and SWE-bench, for repository-level bug fixing.

### HumanEval and MBPP

HumanEval was released by Chen, Tworek, Jun et al. of OpenAI in July 2021 alongside the Codex paper. It contains 164 hand-written Python programming problems, each with a function signature, docstring, body, and unit tests. Models are scored with pass@k, the probability that at least one of k sampled completions passes the tests [14]. By 2024, top models were essentially saturating HumanEval, with reported pass@1 above 90% for many frontier and code-specialised models, which made the benchmark a much weaker signal for new systems.
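
Because sampling is random, the Codex paper does not simply draw k completions once. It draws n samples per problem (n at least k), counts the c that pass, and uses the unbiased estimator 1 - C(n - c, k) / C(n, k), averaged over problems. A minimal sketch using the standard library's `math.comb` for exact binomial coefficients (the paper gives an equivalent, numerically stable product form):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: how many of them pass the unit tests,
    k: the k in pass@k (1 <= k <= n).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # any k samples must include at least one passing one
    # 1 - P(all k samples, drawn without replacement, fail)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem estimates; results holds (n, c) for each problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


print(pass_at_k(10, 3, 1))  # about 0.3
print(pass_at_k(10, 3, 5))  # about 0.917
```

For example, with n = 10 samples of which c = 3 pass, pass@1 is 0.3 and pass@5 is 1 - 21/252, about 0.917: larger k rewards a model that gets a problem right only occasionally.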

MBPP, the Mostly Basic Python Programming benchmark, was released by Austin et al. of Google in August 2021 with 974 short Python tasks crowdsourced from entry-level programmers [37]. MBPP plays a similar role to HumanEval and has been similarly saturated.

### APPS and CRUXEval

APPS was released by Hendrycks, Basart, Kadavath et al. in May 2021 with 10,000 Python coding problems collected from competitive-programming sites and split by difficulty into introductory, interview, and competition tiers [38].

[CRUXEval](/wiki/cruxeval), released by Gu, Rozière, Leather, Solar-Lezama, Synnaeve, and Wang of Meta AI and MIT in January 2024, is a benchmark for code reasoning, understanding, and execution. Models are asked to predict program inputs given outputs or vice versa, which separates execution understanding from synthesis [39].
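
Answers in this style can be scored by execution, which matters for input prediction because many different inputs can produce the same output. The sketch assumes a CRUXEval-like setup in which an answer is judged by checking `f(input) == output`; the function `f` is a made-up example, not one of the benchmark's problems.

```python
def f(xs):
    # Hypothetical CRUXEval-style function: double the odd numbers, drop the rest.
    return [x * 2 for x in xs if x % 2 == 1]


def check_output_prediction(func, given_input, predicted_output) -> bool:
    """Output prediction: the model sees the input and must say what func returns."""
    return func(given_input) == predicted_output


def check_input_prediction(func, predicted_input, given_output) -> bool:
    """Input prediction: the model sees the output and must supply any input that produces it."""
    try:
        return func(predicted_input) == given_output
    except Exception:
        return False  # an input that crashes the function is simply wrong


print(check_output_prediction(f, [1, 2, 3], [2, 6]))  # True
print(check_input_prediction(f, [1, 3], [2, 6]))      # True: a different input, same output
print(check_input_prediction(f, [2, 6], [2, 6]))      # False: f([2, 6]) == []
```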

### SWE-bench

[SWE-bench](/wiki/swe_bench), released by Carlos Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan of Princeton in October 2023, was the first benchmark to evaluate models on real-world software engineering. It consists of 2,294 task instances drawn from GitHub issues, and the pull requests that resolved them, across 12 popular Python repositories. A model is given the repository and the issue text and must produce a patch; it is scored on whether the patch passes the tests added or changed by the original pull request, which are hidden from the model, while keeping the project's other tests passing [40]. SWE-bench Lite is a 300-task subset designed to be cheaper to run.
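
The pass/fail rule can be made concrete. In the SWE-bench dataset each task instance lists FAIL_TO_PASS tests, which the reference fix makes pass, and PASS_TO_PASS tests, which must not break; a patch counts as resolving the issue only if both sets pass. The sketch assumes test outcomes have already been collected by running the repository's tests after applying the patch; the Docker-based harness and log parsing are omitted, and the class, function and instance names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    instance_id: str
    fail_to_pass: list[str]  # tests the reference fix turns from failing to passing
    pass_to_pass: list[str]  # tests that passed before and must still pass


def is_resolved(task: TaskInstance, results: dict[str, bool]) -> bool:
    """True only if the patch fixes the target tests and breaks nothing else.

    results maps test names to outcomes after applying the model's patch; a test
    missing from results (for example, because the patch did not apply) counts as failed.
    """
    fixed = all(results.get(t, False) for t in task.fail_to_pass)
    no_regressions = all(results.get(t, False) for t in task.pass_to_pass)
    return fixed and no_regressions


def resolve_rate(outcomes: list[bool]) -> float:
    """Headline score: the fraction of task instances resolved."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


task = TaskInstance("example__repo-1", ["test_issue_fixed"], ["test_existing_behaviour"])
print(is_resolved(task, {"test_issue_fixed": True, "test_existing_behaviour": True}))   # True
print(is_resolved(task, {"test_issue_fixed": True, "test_existing_behaviour": False}))  # False
```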

[SWE-bench Verified](/wiki/swe_bench_verified), released by OpenAI in August 2024 in collaboration with the SWE-bench authors, is a 500-task human-validated subset of SWE-bench. Professional software developers working with OpenAI manually screened each task to remove ambiguous problem statements and broken tests, since the original benchmark was known to reject some valid solutions [41]. SWE-bench Verified has become the default leaderboard for coding agents, with scores from the major model and tool vendors widely cited in launch materials. By late 2025, top agents combined with strong models were reaching the 70% to 80% range on SWE-bench Verified, up from under 5% in early 2024.

SWE-bench Multimodal was released in October 2024 and includes JavaScript repositories where the GitHub issue references images, such as broken UI screenshots. SWE-bench Live extends the benchmark with newer issues to mitigate training-data contamination.

### LiveCodeBench and RepoBench

[LiveCodeBench](/wiki/livecodebench), introduced by Jain, Han, Gu, Li, Yan, Zhang, Wang, Solar-Lezama, Sen, and Stoica in March 2024, scrapes problems from competitive-programming sites such as LeetCode, AtCoder, and Codeforces, with explicit dating so evaluators can filter to problems released after a model's training cutoff and reduce contamination [42]. The benchmark also adds tasks beyond synthesis such as self-repair, test output prediction, and execution prediction.
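
The date-based filtering can be sketched in a few lines. The example assumes each problem carries its original release date and that the model's training cutoff is known; the `Problem` type and the sample problems are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Problem:
    title: str
    platform: str       # e.g. "LeetCode", "AtCoder", "Codeforces"
    release_date: date  # when the contest problem was first published


def contamination_free(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems published after the model's training cutoff.

    Such problems should not appear in training data collected before the cutoff.
    """
    return [p for p in problems if p.release_date > training_cutoff]


problems = [
    Problem("two-sum-variant", "LeetCode", date(2023, 5, 1)),
    Problem("grid-paths", "AtCoder", date(2024, 9, 14)),
]
print([p.title for p in contamination_free(problems, date(2024, 1, 1))])  # ['grid-paths']
```

The filter is only as reliable as the stated cutoff, so evaluators comparing models with different cutoffs typically score each model on its own post-cutoff window.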

RepoBench, released by Liu, Xu, and McAuley in June 2023 and updated in 2024, evaluates repository-level autocomplete by feeding models a long context drawn from across the repo and scoring the next-line prediction [43].
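
Next-line prediction is usually scored with both a strict and a lenient metric. RepoBench's results report exact match and edit similarity; the sketch below computes versions of both, with Python's `difflib.SequenceMatcher` ratio standing in for edit similarity, so its numbers will not necessarily match the benchmark's own scripts.

```python
from difflib import SequenceMatcher


def exact_match(prediction: str, reference: str) -> bool:
    """Strict: the predicted line equals the reference once surrounding whitespace is stripped."""
    return prediction.strip() == reference.strip()


def edit_similarity(prediction: str, reference: str) -> float:
    """Lenient score in [0, 1]; difflib's ratio is a stand-in for the benchmark's metric."""
    return SequenceMatcher(None, prediction.strip(), reference.strip()).ratio()


def score(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Average both metrics over (prediction, reference) pairs."""
    n = len(pairs)
    return {
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "edit_similarity": sum(edit_similarity(p, r) for p, r in pairs) / n,
    }


pairs = [
    ("    return self.cache[key]", "return self.cache[key]"),   # exact after stripping
    ("return self.cache.get(key)", "return self.cache[key]"),  # close but not exact
]
print(score(pairs))
```

Reporting both metrics separates "nearly right" completions, which a developer can fix with a keystroke or two, from outright misses.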

## Productivity research

The most cited productivity studies are the GitHub field experiment and the 2025 METR randomised trial. They reach almost opposite headline conclusions, which is part of why the topic remains contested.

### GitHub Copilot field experiment (2023)

Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer published a controlled experiment in 2023 in which 95 freelance programmers were randomly assigned to write an HTTP server in JavaScript with or without GitHub Copilot. Developers using Copilot completed the task 55.8% faster on average, with stronger speed-ups for less-experienced developers [3]. The study is widely cited but is limited by the simplicity of the task and the freelancer population. Critics have noted that a tutorial-style coding task does not generalise to maintenance work on a complex codebase, and that the population was not representative of long-tenured engineers.

### Microsoft 365 Copilot studies

Microsoft Research has published several internal studies of Microsoft 365 Copilot and GitHub Copilot in production use at Microsoft and Accenture. The 2024 paper by Cui, Demirer et al., "The Effects of Generative AI on High-Skilled Work," reported productivity gains in pull-request count and time-to-merge but cautioned that the effect varied across teams and tasks [44].

### Dell'Acqua "Centaur and Cyborg" study (2023)

Fabrizio Dell'Acqua of Harvard Business School, together with co-authors from BCG and other institutions, ran an experiment with 758 BCG consultants on knowledge-work tasks. Consultants using GPT-4 outperformed controls on tasks inside the "jagged frontier" of GPT-4's capabilities but did worse on tasks outside it, a result the authors framed in terms of Centaur and Cyborg work styles [45]. The study was not strictly software engineering, but it shaped the way researchers discuss developer productivity with AI.

### METR randomised trial (2025)

The [METR](/wiki/metr) (Model Evaluation and Threat Research) study, published in July 2025 by Becker, Rush, Barnes, and Rein, ran a randomised controlled trial with 16 experienced open-source developers working on large, mature repositories to which they had contributed for years. Individual tasks were randomly assigned to be done with or without permission to use AI tools such as Cursor and Claude. Developers expected the AI tools to make them 24% faster. The measured result was a 19% slowdown when AI tools were allowed [4]. The authors offer several hypotheses for the gap between expectation and result, including the time spent prompting and reviewing AI output and the developers' deep existing familiarity with their own codebases. The study became one of the most discussed pieces of AI productivity research in 2025 and has been used to argue that productivity effects depend heavily on the developer's prior expertise and the repository's complexity.
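
This study and the GitHub experiment both express their headline effect as a percentage change in task completion time, which is why "55.8% faster" and "19% slower" sit on the same scale. The arithmetic below uses a hypothetical 100-minute baseline; the published estimates are computed over many tasks and participants, not from a single pair of times.

```python
def relative_time_change(baseline_minutes: float, treated_minutes: float) -> float:
    """Percentage change in completion time; negative means the AI condition was faster."""
    return 100.0 * (treated_minutes - baseline_minutes) / baseline_minutes


# Hypothetical times for a task that takes 100 minutes without AI.
expected = relative_time_change(100, 76)   # the developers' forecast: 24% less time
measured = relative_time_change(100, 119)  # a 19% slowdown
print(f"expected {expected:+.0f}%, measured {measured:+.0f}%, gap {measured - expected:.0f} points")
# expected -24%, measured +19%, gap 43 points
```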

### Industry claims

Outside the academic literature, large enterprises have made strong claims about AI's effect on engineering output. Salesforce CEO Marc Benioff said in 2025 that the company had stopped hiring new engineers because Agentforce and other internal AI systems had increased per-engineer productivity by around 30% [46]. Klarna told investors in 2024 that its OpenAI-powered customer service assistant was doing the work of 700 contractors, and the company paused engineering hiring in 2023 with AI cited as a factor [47]. Klarna later reversed course in 2025, hiring engineers and human customer-service staff again after reporting service-quality issues. These announcements have been controversial in the trade press.

## Code review AI

Automated code review combines LLM analysis with static checks and Git platform integration. The market grew quickly in 2024 and 2025 alongside agentic coders, because reviewing AI-written code at scale is itself a workload AI vendors want to handle.

**CodeRabbit**, launched in 2023 by Harjot Gill and Gur Singh, posts inline comments on pull requests across GitHub, GitLab, and Azure DevOps using an LLM grounded in repo-wide indices [48].

**Greptile**, founded in 2024, focuses on grounding review on codebase-wide context with a code graph and embeddings store, with a free tier for open-source repositories [49].

**Qodo**, formerly Codium AI, ships PR-Agent, an open-source pull-request reviewer, alongside a paid platform that generates tests and identifies regressions. Qodo rebranded from Codium AI in May 2024, partly to avoid confusion with the unrelated Codeium product [50].

**Diamond by Sourcery** is an LLM-powered review tool layered on top of Sourcery's earlier rule-based Python refactoring engine.

**GitHub Copilot Code Review**, which began rolling out in late 2024 and reached general availability in 2025, lets reviewers request a Copilot review on a pull request, with comments posted as if from a bot reviewer [51].

## Testing and documentation AI

Automated testing has had AI-augmented vendors for longer than chat assistants. Diffblue, founded in 2017 as a spinout from the University of Oxford by Daniel Kroening and others, ships Diffblue Cover, which generates JUnit tests for Java codebases using reinforcement learning and symbolic search [52]. Newer entrants such as Functionize, Mabl, and KaneAI focus on end-to-end and UI test generation. Microsoft Playwright added LLM features for selector healing and test generation in 2024 and 2025 [53].

In documentation, **Mintlify Writer** generates inline docstrings from existing code; the broader Mintlify platform produces hosted developer documentation. **ReadMe** added AI-powered authoring features for API documentation in 2024. **Swimm** combines documentation authoring with code-aware diff tracking so that snippets stay in sync with source code as it changes.

## Open-source models for code

Closed-source frontier models such as those from OpenAI, Anthropic, and Google dominate leaderboards, but open-weights code models are widely deployed for self-hosted use, on-device assistants, and fine-tuned domain-specific tools.

| Model family | Releaser | First release | Notes |
| --- | --- | --- | --- |
| StarCoder, StarCoder 2 | BigCode (Hugging Face, ServiceNow) | May 2023, February 2024 | Open weights, trained on permissively licensed code from The Stack |
| [Code Llama](/wiki/code_llama) | [Meta](/wiki/meta) | August 2023 | Fine-tunes of [LLaMA](/wiki/llama) 2, deprecated in favour of [LLaMA 3](/wiki/llama_3) instruct variants |
| DeepSeek-Coder | [DeepSeek](/wiki/deepseek) | November 2023 | Released as 1.3B, 6.7B, and 33B; trained on a code-heavy corpus |
| DeepSeek-Coder-V2 | DeepSeek | June 2024 | Mixture-of-experts model with 236B total parameters, strong on HumanEval and MBPP |
| DeepSeek-V3 | DeepSeek | December 2024 | General-purpose MoE that posted leading open-weights coding scores |
| Qwen 2.5 Coder | [Alibaba](/wiki/alibaba) Cloud (Qwen team) | November 2024 | Family from 0.5B to 32B with strong SWE-bench-style results |
| [Qwen3](/wiki/qwen_3) | Alibaba (Qwen team) | 2025 | Reasoning and coding focused models with public weights |

The BigCode Project, a collaboration between Hugging Face and ServiceNow Research, also published The Stack, a permissively licensed code dataset, and the SantaCoder series. The Stack v2, released alongside StarCoder 2, covers more than 600 programming languages, and the largest StarCoder 2 model was trained on more than four trillion tokens drawn from it [54].

DeepSeek's models have been particularly important because they have repeatedly closed the gap with closed-weight frontier coders at much lower training costs, and because they ship with permissive licences that allow commercial use [9].

## Code-LLM concerns

### Hallucinated libraries and slop squatting

LLMs frequently invent package names that look plausible but do not exist. Joseph Spracklen, Raveen Wijewickrama, Nazmus Sakib, Anindya Maiti, Bimal Viswanath, and Murtuza Jadliwala studied this in 2024 and found that across 16 popular LLMs, 19.7% of the package names referenced in generated Python and JavaScript code did not exist [55]. The risk, sometimes called "slop squatting" after typo-squatting, is that an attacker registers the fabricated name on PyPI or npm and ships malicious code that AI-coded projects then download. Security researcher Bar Lanyado of Lasso Security demonstrated the attack practically in March 2024 by registering a fake package that ChatGPT had repeatedly hallucinated and reporting that thousands of downloads followed within weeks [56].
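
A common defensive step is to check every dependency an AI-written change introduces against a vetted list before anything is installed. The sketch assumes Python code and a team-maintained allowlist of import names; it uses the standard `ast` module to list top-level imports and flag unvetted ones. The package name `fastjsonparse` is made up to stand in for a hallucinated dependency.

```python
import ast


def imported_top_level_modules(code: str) -> set[str]:
    """Top-level module names imported by a piece of (possibly AI-written) Python code."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import utils"
            modules.add(node.module.split(".")[0])
    return modules


def unvetted_imports(code: str, approved: set[str]) -> set[str]:
    """Imports not on the approved list; these should be checked before any install."""
    return imported_top_level_modules(code) - approved


# Hypothetical allowlist; "fastjsonparse" stands in for a hallucinated package name.
APPROVED = {"json", "os", "requests"}
generated = "import json\nimport requests\nfrom fastjsonparse import loads\n"
print(unvetted_imports(generated, APPROVED))  # {'fastjsonparse'}
```

Here the allowlist also names the standard-library modules the code uses; on Python 3.10 and later, `sys.stdlib_module_names` could exempt them automatically. Import names can differ from the names published on PyPI (for example, `import yaml` is provided by PyYAML), so a real check would also verify the distribution on the registry.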

### Security of AI-written code

NYU researchers Pearce, Ahmad, Tan, Dolan-Gavitt, and Karri published a 2021 study of Copilot output across 89 security-relevant scenarios in C and Python, finding that 40% of completions contained known vulnerabilities mapped to MITRE CWEs [57]. Follow-up work has shown mixed results, with some studies arguing that AI-written code has roughly the same vulnerability rate as human-written code when controlled for the population of developers using each.
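
To make the CWE mapping concrete, the sketch below shows one such weakness class, SQL injection (CWE-89), using Python's built-in `sqlite3` module. It is an illustrative example written for this article, not one of the study's scenarios: the unsafe version builds the query by string formatting, and the safe version uses a parameterised query.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # CWE-89 pattern: user input is spliced into the SQL text itself.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}' ORDER BY id").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Parameterised query: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ? ORDER BY id", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # every row comes back: [(1,), (2,)]
print(find_user_safe(conn, payload))    # no user has that literal name: []
```

The two functions differ by a few characters, which is why this class of bug is easy for both humans and models to introduce and why reviewers look specifically for string-built queries.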

Vendors have built specialised security review tools. Snyk DeepCode AI uses LLM analysis combined with symbolic reasoning to flag vulnerabilities and propose fixes. GitHub Advanced Security added Copilot-powered autofix for code-scanning alerts in 2024. Veracode AI Fix, released in 2024, generates patches for vulnerabilities found by Veracode's static analysis. Microsoft Security Copilot ships extensions for code reviewers in Azure DevOps. Each of these tools requires careful evaluation, since LLM-generated patches can introduce regressions.

### Code quality and reuse

GitClear, a Git analytics company, published a 2024 report that compared four years of commit data and reported that the share of code that was copy-pasted within a project rose alongside Copilot adoption, while the share of refactored or moved code fell. The report concluded that AI assistance correlated with what the authors called "code churn" and reduced reuse [58]. The methodology has been disputed by GitHub and by other researchers, and the result has not been independently replicated, but the report has been widely cited in the debate about whether AI tools improve maintainability.

### Long-context limits

Long-context performance is a recurring weakness. The Needle in a Haystack and RULER evaluations have shown that retrieval accuracy in very long contexts degrades sharply for many models, which limits how well a coding agent can reason over a large monorepo without retrieval [59]. The release of one-million-token windows for [Gemini 1.5 Pro](/wiki/gemini), Claude Sonnet, and GPT-4.1 has changed what is feasible, but agents still typically rely on chunking, embeddings, and language-server queries rather than naive context stuffing.
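
The chunk-and-retrieve pattern can be sketched without any model. The example assumes a simple identifier-overlap score is an acceptable stand-in for embedding similarity; it splits files into overlapping line windows and returns the windows that best match a query. Chunk size, overlap, and all names are hypothetical choices rather than any product's defaults.

```python
import re
from dataclasses import dataclass

TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")  # identifier-like tokens


@dataclass
class Chunk:
    path: str
    start_line: int  # 1-indexed line where this window begins
    text: str


def chunk_file(path: str, source: str, size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Split a file into windows of `size` lines that overlap by `overlap` lines."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    lines = source.splitlines()
    step = size - overlap
    return [
        Chunk(path, i + 1, "\n".join(lines[i : i + size]))
        for i in range(0, max(len(lines) - overlap, 1), step)
    ]


def tokens(text: str) -> set[str]:
    return {t.lower() for t in TOKEN_RE.findall(text)}


def retrieve(chunks: list[Chunk], query: str, k: int = 3) -> list[Chunk]:
    """Rank chunks by how many query identifiers they contain (stand-in for embeddings)."""
    q = tokens(query)
    return sorted(chunks, key=lambda c: len(q & tokens(c.text)), reverse=True)[:k]


repo = {
    "config.py": "def parse_config(path):\n    return load_yaml(path)\n",
    "render.py": "def render(page):\n    return page.html\n",
}
chunks = [c for path, src in repo.items() for c in chunk_file(path, src)]
best = retrieve(chunks, "where is parse_config defined?", k=1)[0]
print(best.path, best.start_line)  # config.py 1
```

Only the top-ranked windows go into the prompt, which keeps the context small even for a large monorepo; the overlap ensures a definition that straddles a window boundary still appears whole in at least one chunk when it is shorter than the overlap.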

### Licensing of training data

Most large code LLMs are trained on public GitHub repositories, including code with copyleft licences such as GPL. Critics including Matthew Butterick, Joseph Saveri, and Tim Davidson filed a class-action suit in November 2022 in the US District Court for the Northern District of California alleging that GitHub Copilot violated the licences of the open-source code it was trained on by emitting verbatim or near-verbatim snippets without the required attribution [60]. The case (Doe v. GitHub, Inc., No. 4:22-cv-06823) has been narrowed through several rulings since filing. A separate suit, Doe v. OpenAI, makes similar arguments against the underlying Codex model. As of late 2025, the litigation is ongoing and has not produced a definitive ruling that resolves the question.

### Job and skill effects

The broader debate about whether AI coding tools will displace or augment software engineers has surfaced in public statements from CEOs (see Industry claims above) and in surveys. The Stack Overflow Developer Survey for 2024 reported that 62% of respondents were using AI tools in their development process, up from 44% in 2023, but only 43% said they trusted the accuracy of AI tools [2]. Junior-developer hiring trends and changes in computer-science enrolment have been linked to AI by industry commentators including Gergely Orosz of The Pragmatic Engineer, though the causality is contested.

## Notable lawsuits and policy

- **Doe v. GitHub, Inc. (2022 to present):** Class-action complaint filed 3 November 2022 in the US District Court for the Northern District of California against GitHub, Microsoft, and OpenAI, alleging that Copilot reproduces licensed open-source code without the required attribution. Rulings in 2023 and 2024 dismissed most claims, including those under the DMCA, while breach-of-contract claims tied to open-source licence terms survived [60].
- **The Stack opt-out (BigCode):** Hugging Face and the BigCode project introduced a tool, "Am I in The Stack?", that lets developers check whether their code was included in The Stack training set and request its removal, in response to criticism that opt-out was not adequately offered.
- **EU AI Act provisions on transparency:** The European Union's AI Act, which entered into force in August 2024, requires providers of general-purpose AI models to publish a sufficiently detailed summary of the content used for training. This obligation applies from August 2025 and has implications for how code models disclose their training corpora [61].
- **US Executive Order 14110:** Issued in October 2023, this executive order required developers of the most capable dual-use foundation models to report safety-testing information to the federal government. It also directed the National Institute of Standards and Technology to develop guidance, including secure software development practices for generative AI. The order was revoked by President Donald Trump in January 2025, though some agency-level guidance remains [62].

## See also

- [GitHub Copilot](/wiki/github_copilot)
- [Cursor (code editor)](/wiki/cursor)
- [Windsurf (software)](/wiki/windsurf)
- [Codeium](/wiki/codeium)
- [Devin (AI software engineer)](/wiki/devin)
- [Claude Code](/wiki/claude_code)
- [Codex (OpenAI)](/wiki/codex)
- [OpenAI Codex](/wiki/openai_codex)
- [OpenHands](/wiki/openhands)
- [Aider](/wiki/aider)
- [Cline (AI coding agent)](/wiki/cline)
- [Continue (software)](/wiki/continue_dev)
- [Sourcegraph Cody](/wiki/sourcegraph_cody)
- [Tabnine](/wiki/tabnine)
- [Replit](/wiki/replit)
- [Anysphere](/wiki/anysphere)
- [Cognition AI](/wiki/cognition_ai)
- [Anthropic](/wiki/anthropic)
- [OpenAI](/wiki/openai)
- [DeepSeek](/wiki/deepseek)
- [StarCoder](/wiki/starcoder)
- [Code Llama](/wiki/code_llama)
- [Qwen](/wiki/qwen)
- [Gemini CLI](/wiki/gemini_cli)
- [Amazon Q](/wiki/amazon_q)
- [HumanEval](/wiki/humaneval)
- [MBPP](/wiki/mbpp)
- [SWE-bench](/wiki/swe_bench)
- [SWE-bench Verified](/wiki/swe_bench_verified)
- [LiveCodeBench](/wiki/livecodebench)
- [CRUXEval](/wiki/cruxeval)
- [METR (Model Evaluation and Threat Research)](/wiki/metr)
- [Model Context Protocol](/wiki/mcp)
- [AutoGen](/wiki/autogen)
- [Vibe coding](/wiki/vibe_coding)
- [Hallucination](/wiki/hallucination)
- [Programming](/wiki/programming)
- [Large Language Model](/wiki/large_language_model)

## References

1. [Stack Overflow 2024 Developer Survey - AI section](https://survey.stackoverflow.co/2024/ai)
2. [Stack Overflow 2024 Developer Survey - Methodology and key findings](https://survey.stackoverflow.co/2024/)
3. [The Impact of AI on Developer Productivity: Evidence from GitHub Copilot - Peng, Kalliamvakou, Cihon, Demirer (arXiv:2302.06590)](https://arxiv.org/abs/2302.06590)
4. [Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity - METR (July 2025)](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/)
5. [GitHub Copilot Chat now generally available - GitHub Blog (December 2023)](https://github.blog/news-insights/product-news/github-copilot-chat-now-generally-available-for-organizations-and-individuals/)
6. [GitHub Copilot Workspace: Welcome to the Copilot-native developer environment - GitHub Blog (April 2024)](https://github.blog/news-insights/product-news/github-copilot-workspace/)
7. [GitHub Copilot agent mode now generally available - GitHub Blog (2025)](https://github.blog/changelog/2025-04-04-copilot-agent-mode/)
8. [Cursor changelog and feature history - Cursor.com](https://cursor.com/changelog)
9. [DeepSeek-Coder: When the Large Language Model Meets Programming - Guo et al. (arXiv:2401.14196)](https://arxiv.org/abs/2401.14196)
10. [Qwen2.5-Coder Technical Report - Hui et al. (arXiv:2409.12186)](https://arxiv.org/abs/2409.12186)
11. [Code Llama: Open Foundation Models for Code - Rozière et al. (arXiv:2308.12950)](https://arxiv.org/abs/2308.12950)
12. [Introducing the Model Context Protocol - Anthropic (November 2024)](https://www.anthropic.com/news/model-context-protocol)
13. [Kite is saying farewell and open-sourcing its code - Adam Smith, Kite blog (2022)](https://www.kite.com/blog/product/kite-is-saying-farewell-and-open-sourcing-its-code/)
14. [Evaluating Large Language Models Trained on Code - Chen et al. (arXiv:2107.03374)](https://arxiv.org/abs/2107.03374)
15. [Introducing GitHub Copilot: your AI pair programmer - GitHub Blog (June 29, 2021)](https://github.blog/news-insights/product-news/introducing-github-copilot-ai-pair-programmer/)
16. [GitHub Copilot is generally available to all developers - GitHub Blog (June 21, 2022)](https://github.blog/news-insights/product-news/github-copilot-is-generally-available-to-all-developers/)
17. [Microsoft FY24 Q4 earnings call transcript - July 30, 2024](https://www.microsoft.com/en-us/Investor/earnings/FY-2024-Q4/transcript)
18. [Amazon Q Developer is generally available - AWS News Blog (April 2024)](https://aws.amazon.com/blogs/aws/amazon-q-developer-is-now-generally-available/)
19. [Gemini Code Assist (formerly Duet AI) launches with 1M token context - Google Cloud Blog (February 2024)](https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-code-assist)
20. [JetBrains AI Assistant launch and Junie - JetBrains Blog](https://blog.jetbrains.com/ai/)
21. [Introducing Devin, the first AI software engineer - Cognition AI (March 12, 2024)](https://cognition.ai/blog/introducing-devin)
22. [Debunking Devin - Internet of Bugs / Hacker News discussion](https://news.ycombinator.com/item?id=40008109)
23. [SWE-Agent: Agent-Computer Interfaces Enable Automated Software Engineering - Yang et al. (arXiv:2405.15793)](https://arxiv.org/abs/2405.15793)
24. [OpenHands: An Open Platform for AI Software Developers as Generalist Agents - Wang et al. (arXiv:2407.16741)](https://arxiv.org/abs/2407.16741)
25. [Claude 3.7 Sonnet and Claude Code - Anthropic (February 24, 2025)](https://www.anthropic.com/news/claude-3-7-sonnet)
26. [Introducing Codex - OpenAI (May 16, 2025)](https://openai.com/index/introducing-codex/)
27. [Replit Agent and Agent 2 - Replit Blog](https://replit.com/blog/agent)
28. [Cursor's Anysphere nabs $9.9B valuation, soars past $500M ARR - TechCrunch (June 2025)](https://techcrunch.com/2025/06/05/cursors-anysphere-nabs-9-9b-valuation-soars-past-500m-arr/)
29. [Google pays $2.4 billion to license Windsurf AI tech, hire staff - Bloomberg (July 11, 2025)](https://www.bloomberg.com/news/articles/2025-07-11/google-pays-2-4-billion-to-license-windsurf-ai-tech-hire-staff)
30. [Cognition to buy AI startup Windsurf days after Google poached CEO - CNBC (July 14, 2025)](https://www.cnbc.com/2025/07/14/cognition-to-buy-ai-startup-windsurf-days-after-google-poached-ceo.html)
31. [Aider GitHub repository - Aider-AI/aider](https://github.com/Aider-AI/aider)
32. [Cline GitHub repository - cline/cline](https://github.com/cline/cline)
33. [Devin 2.0: Cognition slashes price of AI software engineer - VentureBeat (April 2025)](https://venturebeat.com/programming-development/devin-2-0-is-here-cognition-slashes-price-of-ai-software-engineer-to-20-per-month-from-500)
34. [Cognition valued at $10.2 billion two months after Windsurf purchase - CNBC (September 2025)](https://www.cnbc.com/2025/09/08/cognition-valued-at-10point2-billion-two-months-after-windsurf-.html)
35. [Anthropic revenue and Claude Code adoption - The Information (2025)](https://www.theinformation.com/articles/anthropics-revenue-soars-as-businesses-bet-on-claude)
36. [Magentic-One: A Generalist Multi-Agent System - Microsoft Research (November 2024)](https://www.microsoft.com/en-us/research/publication/magentic-one-a-generalist-multi-agent-system-for-solving-complex-tasks/)
37. [Program Synthesis with Large Language Models - Austin et al. (MBPP) (arXiv:2108.07732)](https://arxiv.org/abs/2108.07732)
38. [Measuring Coding Challenge Competence With APPS - Hendrycks et al. (arXiv:2105.09938)](https://arxiv.org/abs/2105.09938)
39. [CRUXEval: A Benchmark for Code Reasoning, Understanding, and Execution - Gu et al. (arXiv:2401.03065)](https://arxiv.org/abs/2401.03065)
40. [SWE-bench: Can Language Models Resolve Real-World GitHub Issues? - Jimenez et al. (arXiv:2310.06770)](https://arxiv.org/abs/2310.06770)
41. [Introducing SWE-bench Verified - OpenAI (August 13, 2024)](https://openai.com/index/introducing-swe-bench-verified/)
42. [LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code - Jain et al. (arXiv:2403.07974)](https://arxiv.org/abs/2403.07974)
43. [RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - Liu et al. (arXiv:2306.03091)](https://arxiv.org/abs/2306.03091)
44. [The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers - Cui et al. (SSRN)](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566)
45. [Navigating the Jagged Technological Frontier - Dell'Acqua et al. (Harvard Business School Working Paper 24-013, 2023)](https://www.hbs.edu/faculty/Pages/item.aspx?num=64700)
46. [Salesforce CEO says company won't add engineers this year due to AI - Bloomberg (February 26, 2025)](https://www.bloomberg.com/news/articles/2025-02-26/salesforce-ceo-says-company-won-t-add-engineers-this-year-due-to-ai)
47. [Klarna says OpenAI assistant does work of 700 contractors - Reuters (February 2024)](https://www.reuters.com/technology/klarna-ai-assistant-does-work-700-people-after-layoffs-2024-02-27/)
48. [CodeRabbit launch and documentation - CodeRabbit.ai](https://www.coderabbit.ai/)
49. [Greptile launch and documentation - Greptile.com](https://www.greptile.com/)
50. [Codium AI rebrands to Qodo - Qodo blog (May 2024)](https://www.qodo.ai/blog/codium-ai-is-now-qodo/)
51. [GitHub Copilot Code Review GA - GitHub Blog (2025)](https://github.blog/changelog/2025-04-29-copilot-code-review-now-generally-available/)
52. [Diffblue Cover - Diffblue.com](https://www.diffblue.com/)
53. [Playwright MCP server and AI features - Microsoft Playwright Blog (2024 and 2025)](https://playwright.dev/blog/)
54. [StarCoder 2 and The Stack v2 - BigCode Project (arXiv:2402.19173)](https://arxiv.org/abs/2402.19173)
55. [We Have a Package for You: A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs - Spracklen et al. (arXiv:2406.10279)](https://arxiv.org/abs/2406.10279)
56. [AI hallucinations cause Python packages to appear on PyPI - The Register (March 2024)](https://www.theregister.com/2024/03/28/ai_bots_hallucinate_software_packages/)
57. [Asleep at the Keyboard? Assessing the Security of GitHub Copilot's Code Contributions - Pearce et al. (arXiv:2108.09293)](https://arxiv.org/abs/2108.09293)
58. [Coding on Copilot: 2024 Data Suggests Downward Pressure on Code Quality - GitClear](https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality)
59. [RULER: What's the Real Context Size of Your Long-Context Language Models? - Hsieh et al. (arXiv:2404.06654)](https://arxiv.org/abs/2404.06654)
60. [Doe v. GitHub, Inc., No. 4:22-cv-06823 (N.D. Cal., filed November 3, 2022) - Joseph Saveri Law Firm](https://githubcopilotlitigation.com/)
61. [EU Artificial Intelligence Act - Official Journal of the European Union (August 2024)](https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32024R1689)
62. [Executive Order 14110 on Safe, Secure, and Trustworthy AI - White House (October 2023), revoked January 2025](https://web.archive.org/web/20240101000000*/whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/)
63. [GitHub Universe 2023 keynote announcements - GitHub Blog (November 2023)](https://github.blog/news-insights/company-news/universe-2023-copilot-transforms-github-into-the-ai-powered-developer-platform/)
64. [Microsoft AutoGen documentation - GitHub microsoft/autogen](https://github.com/microsoft/autogen)
65. [Sourcegraph Cody documentation and launch - Sourcegraph blog](https://sourcegraph.com/blog/cody-is-generally-available)

