Software Development
Last reviewed
May 13, 2026
Sources
65 citations
Review status
Source-backed
Revision
v2 · 5,958 words
See also: Software Development ChatGPT Plugins
AI in software development is the use of large language models and other machine learning systems to assist or automate tasks that programmers have traditionally done by hand. The field spans single-line autocomplete inside an editor, conversational chat assistants that explain or refactor code, autonomous agents that plan multi-step changes, and specialised tools for code review, testing, documentation, and security analysis. Between mid-2021, when GitHub Copilot entered private preview, and the end of 2025, AI-assisted coding moved from a curiosity to a default part of many professional engineering workflows, with adoption surveys reporting that a majority of working developers had used such a tool in the prior year [1][2].
The shift has been driven by transformer models trained on very large corpora of public source code and natural language, the wide deployment of inline completion in editors such as Visual Studio Code and the JetBrains family, and a wave of agentic systems including Devin, Claude Code, and OpenAI Codex. Productivity studies have produced mixed and sometimes contradictory findings, ranging from a 55.8% speed-up on a controlled HTTP server task to a 19% slowdown in a randomised trial of experienced open-source maintainers who were allowed to use AI tools [3][4]. Concerns about hallucinated APIs, supply-chain attacks via fabricated package names, licensing of training data, and the long-term effect on code quality have prompted academic study, lawsuits, and changes in enterprise policy.
AI coding tools fall into a small number of categories that are useful to distinguish because they have different cost structures, failure modes, and review requirements.
| Category | Typical interaction | Examples |
|---|---|---|
| Inline completion | Ghost text inside the editor, accepted by Tab | Early Copilot, Tabnine, Codeium |
| Chat assistants | Side-panel or pop-up chat with file context | Copilot Chat, Sourcegraph Cody, JetBrains AI Assistant |
| Edit-mode tools | Model rewrites a selection in place | Cursor Composer, Aider, Continue |
| Coding agents | Multi-step planning, shell access, file I/O | Devin, Claude Code, OpenHands, Replit Agent |
| Domain specialists | Reviews, tests, docs, security, observability | CodeRabbit, Qodo, Diffblue, Snyk DeepCode |
The boundary between these categories is fuzzy because most flagship products now bundle several modes. GitHub Copilot, for example, started as inline completion, added chat in 2023, added the Copilot Workspace agent in 2024, and shipped a fully agentic mode in 2025 [5][6][7]. Cursor began as a fork of Visual Studio Code with chat and edit features and now ships a background agent that runs on remote machines [8].
Underneath the products, the technology stack is fairly consistent. The base model is almost always a transformer-based LLM, either a general-purpose frontier model such as GPT-4o, Claude 3.5 Sonnet and its successors, or Gemini, or a code-specialised model such as DeepSeek-Coder, Qwen 2.5 Coder, StarCoder, or Code Llama [9][10][11]. Retrieval over the local repository is added on top, using embeddings, language servers, or filesystem traversal. For agents, a planning loop, a tool harness, and a sandboxed shell are layered over the model. The Model Context Protocol released by Anthropic in November 2024 has become a common way to plug external tools into these agents [12].
Early editor assistance was rule-based or grammar-driven. Microsoft's IntelliSense, which shipped in Visual Basic 5.0 in 1996 and then in Visual Studio, used static type information and parse trees to suggest method names. Eclipse's Java Development Tools and the C/C++ Development Tooling shipped similar features for open source. These tools knew nothing about semantics beyond what compilers exposed, and they did not generate new code.
Statistical approaches arrived in the 2010s. Kite, founded by Adam Smith in 2014, used cloud-trained models to suggest Python completions inside editors. It briefly ran a free copilot product before shutting down in 2022, with Smith publishing a post-mortem that described the difficulty of building a viable business around inline suggestions before transformer models existed [13]. Tabnine, founded as Codota in 2013 by Dror Weiss and Eran Yahav, used statistical and neural models to provide multi-language autocomplete and is one of the few pre-Copilot vendors that survived the transition.
Neural language models for code became prominent with OpenAI's Codex paper in July 2021, which described a fine-tuning of GPT-3 on public GitHub code and introduced the HumanEval benchmark of 164 hand-written Python problems [14]. The model behind Codex powered the first version of Copilot, which Microsoft and GitHub launched as a private preview on 29 June 2021 [15]. General availability followed on 21 June 2022 at $10 per month for individuals, with free access for verified students and maintainers of popular open-source projects [16].
GitHub Copilot is the product that turned AI autocomplete into a mainstream developer tool. Microsoft and GitHub leadership cited it heavily on earnings calls, and CEO Satya Nadella said on the company's fiscal year 2024 fourth-quarter earnings call in July 2024 that paid Copilot subscribers across the individual and business plans had exceeded 1.3 million, up more than 180% year over year [17]. Copilot Chat reached general availability for business users in December 2023, following a public beta earlier that year and an announcement at GitHub Universe in November [5]. Copilot Workspace, a task-oriented interface that lets the model plan, propose code edits, and run tests against a repository, entered technical preview in April 2024 [6]. Copilot agent mode, which lets the model run shell commands and iterate on a task without per-step approval, shipped in 2025 [7].
The success of Copilot triggered a wave of competitors. Amazon previewed CodeWhisperer in June 2022 and reached general availability in April 2023, then rebranded the product as Amazon Q Developer at AWS re:Invent in November 2023 with a broader rollout in April 2024 [18]. Google launched Duet AI for Developers at Google Cloud Next in August 2023 and renamed it Gemini Code Assist in February 2024 [19]. JetBrains, which makes the IntelliJ family of IDEs, shipped JetBrains AI Assistant in December 2023 and added the agentic Junie in 2025 [20]. Sourcegraph released Cody, which uses repository graph indexes to ground completions in large codebases, in 2023.
The step from "autocomplete" to "agent" came in 2024. Cognition AI announced Devin, billed as the "first AI software engineer," on 12 March 2024, with a demo video showing the system completing freelance jobs end-to-end inside a sandboxed Linux environment [21]. The Devin demo, and the company's claim of a 13.86% unassisted score on the SWE-bench benchmark, drew both attention and scepticism. Several engineers, including Carl Brown of Internet of Bugs, published video critiques arguing that the demo had been edited and that the actual capability was lower than advertised [22]. The episode marked the start of a more public debate about how to measure agentic coders.
Academic work moved quickly. The SWE-Agent system, developed at Princeton NLP by John Yang, Carlos Jimenez, Alexander Wettig, Kilian Lieret, Shunyu Yao, Karthik Narasimhan, and Ofir Press, was released in April 2024 [23]. It introduced the agent-computer interface, a deliberate set of file-editing and shell commands designed for an LLM, and reached 12.5% on the full SWE-bench test set using GPT-4. The team at the University of Illinois Urbana-Champaign and the All Hands AI startup released OpenDevin, later renamed OpenHands, as an open-source alternative in March 2024 [24].
Anthropic announced Claude Code on 24 February 2025 as a terminal-based agent with file editing, shell execution, and git tools [25]. OpenAI re-released the Codex brand on 16 May 2025, this time as a cloud-hosted coding agent that runs in parallel containers, accepts tasks from chat or pull requests, and ships a CLI for local use [26]. Replit's Agent shipped in September 2024 with a focus on building and deploying full applications from natural language, and Replit Agent 2 followed in February 2025 [27].
The table below lists widely used tools, with first-release dates referring to the public availability of the AI-coding product, not to the parent company.
| Tool | First release | Organisation | Key features |
|---|---|---|---|
| GitHub Copilot | June 2021 preview, June 2022 GA | GitHub (Microsoft) | Inline completion, Copilot Chat, Copilot Workspace, agent mode |
| Cursor | March 2023 | Anysphere | Forked VS Code editor, Composer multi-file edits, background agent |
| Windsurf | November 2024 | Codeium (renamed Windsurf), then Cognition AI | Cascade agent, Codeium roots, acquired by Cognition AI in July 2025 |
| Codeium | 2022 | Codeium (later Windsurf) | Free tier completion across many editors |
| Tabnine | 2018 (neural), 2023 (chat) | Tabnine | Self-hosted option, enterprise compliance focus |
| Aider | 2023 | Paul Gauthier (open source) | Terminal-based pair programmer that writes commits |
| Cline | 2024 (as Claude Dev) | Open source | VS Code extension, BYO key, plan and act modes |
| Continue | 2023 | Continue.dev | Open-source autopilot for VS Code and JetBrains |
| Sourcegraph Cody | 2023 | Sourcegraph | Repo-graph search context, enterprise focus |
| Replit Agent | September 2024 | Replit | Build and deploy apps from prompts, Replit Agent 2 in 2025 |
| JetBrains AI Assistant | December 2023 | JetBrains | Multi-language chat and completion across IntelliJ family |
| JetBrains Junie | 2025 | JetBrains | Agentic coder for the JetBrains IDEs |
| Amazon Q Developer | April 2024 (rebrand) | Amazon | Successor to CodeWhisperer, AWS service integration |
| Gemini Code Assist | February 2024 (rebrand) | Google | Successor to Duet AI for Developers, GCP integration |
| Gemini CLI | 2025 | Google | Open-source terminal agent powered by Gemini |
| Phind | 2023 | Phind | Search-based answer engine for developers |
| Mintlify | 2022 | Mintlify | Documentation generator and authoring platform |
| Stack Overflow OverflowAI | Announced July 2023 | Stack Overflow | AI search, enterprise knowledge ingestion |
Several of these products have histories that are worth noting in more detail.
Cursor is built and operated by Anysphere, a company founded in 2022 in San Francisco by Michael Truell, Sualeh Asif, Arvid Lunnemark, and Aman Sanger, all then in their early twenties. Cursor's editor is a fork of VS Code with deeper LLM integration, including the Composer feature for multi-file edits and an inline edit mode triggered by Cmd-K. The product pulled ahead of standalone Copilot use among early adopters in 2024 and crossed $100 million in annualised revenue within twelve months. By June 2025, Anysphere had reached a $9.9 billion valuation on more than $500 million in ARR [28].
Codeium and Windsurf had a turbulent 2025. Codeium, the consumer brand, evolved into Windsurf with the launch of the Cascade agent in November 2024. In July 2025, Google reportedly paid about $2.4 billion in a licensing and hiring deal that pulled CEO Varun Mohan and other engineers to DeepMind, leaving the Windsurf product behind. Cognition AI then acquired what remained of Windsurf later in July 2025 [29][30].
Aider, written and maintained by Paul Gauthier, is an open-source command-line tool that pairs an LLM with a Git workflow. It applies edits as commits, with diffs the user can review, and has been used as a reference implementation for many later agentic coders [31].
Cline, originally released as Claude Dev, is an open-source VS Code extension that uses the user's own API key, runs in a plan-then-act loop, and supports multiple model providers [32].
A "coding agent" in this article means a system that, given a high-level task, can plan steps, edit files, run commands such as tests, observe results, and iterate without per-step human approval. The line between an agent and a chat tool is blurry, since most chat tools now ship some form of tool use.
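The plan, act, observe, and iterate cycle in this definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: `Step`, `AgentRun`, `run_agent`, `scripted_policy`, and `make_toy_tools` are hypothetical names, and the scripted policy stands in for the model call a real agent would make at each step.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Step:
    action: str    # name of a tool, or "done" to stop
    argument: str  # free-form argument passed to the tool


@dataclass
class AgentRun:
    transcript: List[Tuple[Step, str]] = field(default_factory=list)
    finished: bool = False


def run_agent(
    task: str,
    propose_step: Callable[[str, List[Tuple[Step, str]]], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> AgentRun:
    """Ask the policy for a step, run the tool, feed the observation back."""
    run = AgentRun()
    observation = f"TASK: {task}"
    for _ in range(max_steps):
        step = propose_step(observation, run.transcript)
        if step.action == "done":
            run.finished = True
            break
        tool = tools.get(step.action)
        observation = tool(step.argument) if tool else f"unknown tool: {step.action}"
        run.transcript.append((step, observation))
    return run


def scripted_policy(observation: str, transcript: List[Tuple[Step, str]]) -> Step:
    """Hypothetical stand-in for an LLM: run tests, edit on failure, re-run."""
    if not transcript:
        return Step("run", "tests")
    if "FAILED" in observation:
        return Step("edit", "fix add()")
    if observation.startswith("edited"):
        return Step("run", "tests")
    return Step("done", "")


def make_toy_tools() -> Dict[str, Callable[[str], str]]:
    """Toy tools: the 'tests' pass only after an edit has been applied."""
    state = {"fixed": False}

    def edit(argument: str) -> str:
        state["fixed"] = True
        return f"edited: {argument}"

    def run(argument: str) -> str:
        return "PASSED" if state["fixed"] else "FAILED test_add"

    return {"edit": edit, "run": run}


if __name__ == "__main__":
    result = run_agent("make the tests pass", scripted_policy, make_toy_tools())
    for step, observation in result.transcript:
        print(step.action, "->", observation)
```

The `max_steps` cap is one simple way to bound a loop that runs without per-step approval. Production agents typically add sandboxing and permission checks around each tool call as well.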
Devin is the agent that arguably defined the category. Cognition AI, founded in 2023 by Scott Wu, Steven Hao, and Walden Yan, announced Devin on 12 March 2024 with a launch video, a benchmark claim, and a private waitlist [21]. The system runs in a sandbox with a Linux shell, code editor, and browser, and presents a chat interface where users can hand off tasks. Devin reached general availability in December 2024 at $500 per month, then dropped to a $20 starting price with Devin 2.0 in April 2025 [33]. Cognition was valued at over $10 billion by September 2025 [34].
Claude Code is Anthropic's terminal-based coding agent. It launched in research preview with Claude 3.7 Sonnet on 24 February 2025 [25]. Claude Code runs as a CLI, with permission to read and edit files in a working directory, execute shell commands, and call MCP servers for external tools. Anthropic has rolled out follow-on features including subagents, hooks, a VS Code extension, GitHub integration, and a one-million-token context window for Sonnet variants. Anthropic has reported that Claude Code reached significant adoption among engineering teams during 2025, contributing to a sharp rise in API revenue [35].
OpenAI Codex is a reuse of the Codex brand. The first Codex, retired in March 2023, was the model behind early Copilot. The new Codex, released on 16 May 2025, is a cloud-hosted coding agent that runs each task in its own container, with support for parallel runs and pull-request style review of changes. It is powered by codex-1, a fine-tuned variant of the o3 reasoning model, and ships a CLI for local use [26].
OpenHands is the leading open-source agent. It supports many model providers, runs in a Docker sandbox, and includes a browser and shell tool. The project was created at All Hands AI by the team that built OpenDevin [24]. SWE-Agent, also open source and developed by the Princeton NLP group, is more focused on the SWE-bench task format and contributed the influential agent-computer interface design [23].
Replit's Agent, launched in September 2024 and followed by Agent 2 in February 2025, is positioned for non-professional developers who want to build and deploy an entire web application from a natural-language description, with hosting and a database provisioned automatically inside Replit's platform [27].
Microsoft Research published AutoGen in late 2023 as a Python framework for multi-agent LLM systems, with built-in patterns for code-generation agents that call a Python interpreter. Magentic-One, released by the same team in November 2024, builds a generalist multi-agent orchestrator on top of AutoGen [36]. Both are research artefacts rather than products, but they have been used as starting points for downstream agents.
Progress in AI for code is tracked through a small set of benchmarks. The two that drive most headline claims are HumanEval, for function-level synthesis, and SWE-bench, for repository-level bug fixing.
HumanEval was released by Chen, Tworek, Jun et al. of OpenAI in July 2021 alongside the Codex paper. It contains 164 hand-written Python programming problems, each with a function signature, docstring, body, and unit tests. Models are scored with pass@k, the probability that at least one of k sampled completions passes the tests [14]. By 2024, top models were essentially saturating HumanEval, with reported pass@1 above 90% for many frontier and code-specialised models, which made the benchmark a much weaker signal for new systems.
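The pass@k metric can be computed per problem from n sampled completions of which c pass the tests, using the unbiased estimator 1 - C(n-c, k) / C(n, k) described in the Codex paper [14]. The sketch below shows that calculation. The function names and the `(n, c)` tuple layout are illustrative choices, not part of any official harness.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    # Probability that a random k-subset of the n samples contains no pass.
    all_fail = comb(n - c, k) / comb(n, k)
    return 1.0 - all_fail


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Average the per-problem estimate over (n, c) pairs for a benchmark."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    if not scores:
        raise ValueError("no results given")
    return sum(scores) / len(scores)
```

For example, a problem with 10 samples of which 3 pass has an estimated pass@1 of 0.3, and its pass@10 is 1.0 because every set of 10 samples includes a passing one.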
MBPP, the Mostly Basic Python Programming benchmark, was released by Austin et al. of Google in August 2021 with 974 short, crowdsourced Python tasks designed to be solvable by entry-level programmers [37]. MBPP plays a similar role to HumanEval and has been similarly saturated.
APPS was released by Hendrycks, Basart, Kadavath et al. in May 2021 with 10,000 Python coding problems collected from competitive-programming sites and split by difficulty into introductory, interview, and competition tiers [38].
CRUXEval, released by Gu, Rozière, Leather, Solar-Lezama, Synnaeve, and Wang in January 2024, is a benchmark for code reasoning, understanding, and execution. Models are asked to predict program inputs given outputs or vice versa, which separates execution understanding from synthesis [39].
SWE-bench, released by Carlos Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan of Princeton in October 2023, was among the first benchmarks to evaluate models on real-world software engineering. It consists of 2,294 GitHub issues from 12 popular Python repositories, each paired with the merged fix and its accompanying tests, which are hidden from the model. A model is given the repo and the issue text, must produce a patch, and is scored on whether the patch passes the project's own tests [40]. SWE-bench Lite is a 300-task subset designed to be cheaper to run.
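The scoring rule can be illustrated with a short sketch. It assumes the commonly described criterion in which a task counts as resolved only if the tests that the reference fix turns from failing to passing now pass, and the tests that already passed still do. The class and field names below are illustrative and do not reproduce the official harness's data format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TaskResult:
    task_id: str
    # Test name -> did it pass after the model's patch was applied?
    fail_to_pass: Dict[str, bool]  # tests the reference fix is expected to repair
    pass_to_pass: Dict[str, bool]  # tests that passed before and must keep passing


def is_resolved(result: TaskResult) -> bool:
    """A task is resolved only if every targeted and every regression test passes."""
    if not result.fail_to_pass:
        return False  # nothing demonstrates that the issue was actually fixed
    return all(result.fail_to_pass.values()) and all(result.pass_to_pass.values())


def resolved_rate(results: Iterable[TaskResult]) -> float:
    """Fraction of tasks resolved, the headline number on leaderboards."""
    results = list(results)
    if not results:
        raise ValueError("no results given")
    return sum(is_resolved(r) for r in results) / len(results)
```

Under this rule a patch that fixes the reported bug but breaks an unrelated existing test is scored as a failure, which is one reason repository-level scores are much lower than function-level ones.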
SWE-bench Verified, released by OpenAI in August 2024 in collaboration with the SWE-bench authors, is a 500-task human-validated subset of SWE-bench. Human annotators manually screened each task to remove ambiguous problem statements and broken tests, since the original benchmark was known to mark some correct patches as failures [41]. SWE-bench Verified has become the default leaderboard for coding agents, with scores from the major model and tool vendors widely cited in launch materials. By late 2025, top agents combined with strong models were reaching the 70% to 80% range on SWE-bench Verified, up from under 5% on the original SWE-bench in early 2024.
SWE-bench Multimodal was released in October 2024 and includes JavaScript repositories where the GitHub issue references images, such as broken UI screenshots. SWE-bench Live extends the benchmark with newer issues to mitigate training-data contamination.
LiveCodeBench, introduced by Jain, Han, Gu, Li, Yan, Zhang, Wang, Solar-Lezama, Sen, and Stoica in March 2024, scrapes problems from competitive-programming sites such as LeetCode, AtCoder, and Codeforces, with explicit dating so evaluators can filter to problems released after a model's training cutoff and reduce contamination [42]. The benchmark also adds tasks beyond synthesis such as self-repair, test output prediction, and execution prediction.
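The date-based filtering idea is simple enough to show directly. The sketch below is an illustration only: the `Problem` record, its fields, and the example slugs and dates are hypothetical and do not reflect LiveCodeBench's actual schema or contents.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Problem:
    slug: str
    source: str     # e.g. "leetcode", "atcoder", "codeforces"
    released: date  # date the problem was first published


def after_cutoff(problems: Iterable[Problem], training_cutoff: date) -> List[Problem]:
    """Keep only problems published strictly after the model's training cutoff."""
    return [p for p in problems if p.released > training_cutoff]


if __name__ == "__main__":
    # Hypothetical problems and cutoff, for illustration only.
    pool = [
        Problem("two-paths", "leetcode", date(2023, 6, 1)),
        Problem("grid-walk", "atcoder", date(2024, 2, 10)),
        Problem("tree-sum", "codeforces", date(2024, 5, 3)),
    ]
    for problem in after_cutoff(pool, training_cutoff=date(2023, 12, 31)):
        print(problem.slug, problem.released.isoformat())
```

Evaluating each model only on problems newer than its own cutoff means different models may be scored on different subsets, a trade-off accepted in exchange for lower contamination risk.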
RepoBench, released by Liu, Xu, and McAuley in June 2023 and updated in 2024, evaluates repository-level autocomplete by feeding models a long context drawn from across the repo and scoring the next-line prediction [43].
The most cited productivity studies are the 2023 GitHub Copilot controlled experiment and the 2025 METR randomised trial. They reach almost opposite headline conclusions, which is part of why the topic remains contested.
Sida Peng, Eirini Kalliamvakou, Peter Cihon, and Mert Demirer published a controlled experiment in 2023 in which 95 freelance programmers were randomly assigned to write an HTTP server in JavaScript with or without GitHub Copilot. Developers using Copilot completed the task 55.8% faster on average, with stronger speed-ups for less-experienced developers [3]. The study is widely cited but is limited by the simplicity of the task and the freelancer population. Critics have noted that a tutorial-style coding task does not generalise to maintenance work on a complex codebase, and that the population was not representative of long-tenured engineers.
Microsoft Research has published several internal studies of Microsoft 365 Copilot and GitHub Copilot in production use at Microsoft and Accenture. The 2024 paper by Cui, Demirer et al., "The Effects of Generative AI on High-Skilled Work," reported productivity gains in pull-request count and time-to-merge but cautioned that the effect varied across teams and tasks [44].
Fabrizio Dell'Acqua of Harvard Business School, together with co-authors from BCG and other institutions, ran an experiment with 758 BCG consultants on knowledge-work tasks. Consultants using GPT-4 outperformed controls on tasks inside the "jagged frontier" of GPT-4's capabilities but did worse on tasks outside it, a result the authors framed in terms of Centaur and Cyborg work styles [45]. The study was not strictly software engineering, but it shaped the way researchers discuss developer productivity with AI.
The METR (Model Evaluation and Threat Research) study, published in July 2025 by Becker, Rush, Barnes, and Rein, ran a randomised controlled trial with 16 experienced open-source developers working on large, mature repositories to which they already contributed. Each task was randomly assigned to be done with or without permission to use AI tools such as Cursor and Claude. Developers expected the AI tools to make them 24% faster. The measured result was a 19% slowdown when AI tools were allowed [4]. The authors offer several hypotheses for the gap between expectation and result, including the cost of prompting and review and the depth of the developers' existing knowledge of their codebases. The study became one of the most discussed pieces of AI productivity research in 2025 and has been used to argue that productivity effects depend heavily on the developer's prior expertise and the repository's complexity.
Outside the academic literature, large enterprises have made strong claims about AI's effect on engineering output. Salesforce CEO Marc Benioff said in 2025 that the company had stopped hiring new engineers because Agentforce and other internal AI systems had increased per-engineer productivity by around 30% [46]. Klarna told investors in 2024 that its OpenAI-powered customer service assistant was doing the work of 700 contractors, and the company paused engineering hiring in 2023 with AI cited as a factor [47]. Klarna later reversed course, resuming hiring of engineers and human customer-service staff in 2025 after reporting service-quality issues. These announcements have been controversial in the trade press.
Automated code review combines LLM analysis with static checks and Git platform integration. The market grew quickly in 2024 and 2025 alongside agentic coders, because reviewing AI-written code at scale is itself a workload AI vendors want to handle.
CodeRabbit, launched in 2023 by Harjot Gill and Gur Singh, posts inline comments on pull requests across GitHub, GitLab, and Azure DevOps using an LLM grounded in repo-wide indices [48].
Greptile, founded in 2024, focuses on grounding review on codebase-wide context with a code graph and embeddings store, with a free tier for open-source repositories [49].
Qodo, formerly Codium AI, ships PR-Agent, an open-source pull-request reviewer, alongside a paid platform that generates tests and identifies regressions. Qodo rebranded from Codium AI in May 2024, partly to avoid confusion with the unrelated Codeium product [50].
Sourcery offers an LLM-powered review tool layered on top of its earlier rule-based Python refactoring engine.
GitHub Copilot Code Review, which began rolling out in late 2024 and reached general availability in 2025, lets reviewers request a Copilot review on a pull request, with comments posted as if from a bot reviewer [51].
AI-augmented testing vendors predate chat assistants. Diffblue, founded in 2017 as a spinout from the University of Oxford by Daniel Kroening and others, ships Diffblue Cover, which generates JUnit tests for Java codebases using reinforcement learning and symbolic search [52]. Newer entrants such as Functionize, Mabl, and KaneAI focus on end-to-end and UI test generation. Microsoft Playwright added LLM features for selector healing and test generation in 2024 and 2025 [53].
In documentation, Mintlify Writer generates inline docstrings from existing code; the broader Mintlify platform produces hosted developer documentation. ReadMe added AI-powered authoring features for API documentation in 2024. Swimm combines documentation authoring with code-aware diff tracking so that snippets stay in sync with source code as it changes.
Closed-source frontier models such as those from OpenAI, Anthropic, and Google dominate leaderboards, but open-weights code models are widely deployed for self-hosted use, on-device assistants, and fine-tuned domain-specific tools.
| Model family | Releaser | First release | Notes |
|---|---|---|---|
| StarCoder, StarCoder 2 | BigCode (Hugging Face, ServiceNow) | May 2023, February 2024 | Open weights, trained on permissively licensed code from The Stack |
| Code Llama | Meta | August 2023 | Fine-tunes of LLaMA 2, deprecated in favour of LLaMA 3 instruct variants |
| DeepSeek-Coder | DeepSeek | November 2023 | Released as 1.3B, 6.7B, and 33B; trained on a code-heavy corpus |
| DeepSeek-Coder-V2 | DeepSeek | June 2024 | Mixture-of-experts model with 236B total parameters, strong on HumanEval and MBPP |
| DeepSeek-V3 | DeepSeek | December 2024 | General-purpose MoE that posted leading open-weights coding scores |
| Qwen 2.5 Coder | Alibaba Cloud (Qwen team) | November 2024 | Family from 0.5B to 32B with strong SWE-bench-style results |
| Qwen3 | Alibaba (Qwen team) | 2025 | Reasoning and coding focused models with public weights |
The BigCode Project, a collaboration between Hugging Face and ServiceNow Research, also published The Stack, a permissively licensed code dataset, and the SantaCoder series. The Stack v2, released alongside StarCoder 2, includes over four trillion tokens of code from 600 programming languages [54].
DeepSeek's models have been particularly important because they have repeatedly closed the gap with closed-weight frontier coders at much lower training costs, and because they ship with permissive licences that allow commercial use [9].
LLMs frequently invent package names that look plausible but do not exist. Joseph Spracklen, Raveen Wijewickrama, Nazmus Sakib, Anindya Maiti, Bimal Viswanath, and Murtuza Jadliwala studied this in 2024 and found that across 16 popular LLMs, 19.7% of code samples that imported a Python or JavaScript package referenced a non-existent name [55]. The risk, sometimes called "slopsquatting" by analogy with typosquatting, is that an attacker registers the fabricated name on PyPI or npm and ships malicious code that AI-coded projects then download. Security researcher Bar Lanyado of Lasso Security demonstrated the attack practically in March 2024 by registering a fake package that ChatGPT had repeatedly hallucinated and reporting that thousands of downloads followed within weeks [56].
NYU researchers Pearce, Ahmad, Tan, Dolan-Gavitt, and Karri published a 2021 study of Copilot output across 89 security-relevant scenarios in C and Python, finding that 40% of completions contained known vulnerabilities mapped to MITRE CWEs [57]. Follow-up work has shown mixed results, with some studies arguing that AI-written code has roughly the same vulnerability rate as human-written code when controlled for the population of developers using each.
Vendors have built specialised security review tools. Snyk DeepCode AI uses LLM analysis combined with symbolic reasoning to flag vulnerabilities and propose fixes. GitHub Advanced Security added AI-based autofix for Copilot users in 2024. Veracode AI Fix, released in 2024, generates patches for vulnerabilities found by Veracode's static analysis. Microsoft Security Copilot ships extensions for code reviewers in Azure DevOps. Each of these tools requires careful evaluation, since LLM-generated patches can introduce regressions.
GitClear, a Git analytics company, published a 2024 report that compared four years of commit data and reported that the share of code that was copy-pasted within a project rose alongside Copilot adoption, while the share of refactored or moved code fell. The report concluded that AI assistance correlated with what the authors called "code churn" and reduced reuse [58]. The methodology has been disputed by GitHub and by other researchers, and the result has not been independently replicated, but the report has been widely cited in the debate about whether AI tools improve maintainability.
Long-context performance is a recurring weakness. The Needle in a Haystack and RULER evaluations have shown that retrieval accuracy in very long contexts degrades sharply for many models, which limits how well a coding agent can reason over a large monorepo without retrieval [59]. The release of one-million-token windows for Gemini 1.5 Pro, Claude Sonnet, and GPT-4.1 has changed what is feasible, but agents still typically rely on chunking, embeddings, and language-server queries rather than naive context stuffing.
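The chunk-and-retrieve approach can be sketched as follows. This is a toy illustration: it ranks chunks by cosine similarity over identifier counts as a stand-in for learned embeddings, and the function names, chunk sizes, and sample text are assumptions rather than any product's defaults.

```python
import math
import re
from collections import Counter
from typing import List


def chunk_lines(text: str, size: int = 20, overlap: int = 5) -> List[str]:
    """Split text into windows of `size` lines, each sharing `overlap` lines with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    lines = text.splitlines()
    step = size - overlap
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), step)]


def tokenize(text: str) -> Counter:
    """Count identifier-like tokens, lower-cased."""
    return Counter(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query, best first."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]


if __name__ == "__main__":
    repo_text = (
        "def parse_config(path):\n"
        "    return load(path)\n"
        "def render_page(html):\n"
        "    return html.strip()\n"
    )
    chunks = chunk_lines(repo_text, size=2, overlap=0)
    print(top_chunks("where is parse_config defined", chunks, k=1)[0])
```

Only the top-ranked chunks are placed in the prompt, which keeps the context short enough that the degradation seen in very long inputs is less of a factor.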
Most large code LLMs are trained on public GitHub repositories, including code with copyleft licences such as GPL. Critics including Matthew Butterick, Joseph Saveri, and Tim Davidson filed a class-action suit in November 2022 in the US District Court for the Northern District of California alleging that GitHub Copilot violated the licences of the open-source code it was trained on by emitting verbatim or near-verbatim snippets without the required attribution [60]. The case (Doe v. GitHub, Inc., No. 4:22-cv-06823) has been narrowed through several rulings since filing. A separate suit, Doe v. OpenAI, makes similar arguments against the underlying Codex model. As of late 2025, the litigation is ongoing and has not produced a definitive ruling that resolves the question.
The broader debate about whether AI coding tools will displace or augment software engineers has surfaced in public statements from CEOs (see Industry claims above) and in surveys. The Stack Overflow Developer Survey for 2024 reported that 72% of professional respondents used AI tools in their workflow, up from 44% in 2023, but only 43% trusted the accuracy of AI tools, down from 49% [2]. Junior-developer hiring trends and changes in computer-science enrolment have been linked to AI by industry commentators including Gergely Orosz of The Pragmatic Engineer, though the causality is contested.