Autonomous coding
Last reviewed
Sources
37 citations
Review status
Source-backed
Revision
v3 · 5,879 words
Autonomous coding refers to the use of artificial intelligence systems that can independently write, debug, test, and maintain software with minimal human intervention. Unlike earlier AI code generation tools that offered line-by-line suggestions or responded to single prompts, autonomous coding systems operate through extended execution loops where an AI agent plans a task, writes code, runs tests, observes failures, and iterates on its work until the task is complete. The field has evolved rapidly since 2021, progressing from simple code completion to fully autonomous agents capable of resolving real-world software engineering issues across entire codebases.
As of 2026, autonomous coding represents one of the most commercially significant applications of large language models (LLMs), with tools like Claude Code, Devin, GitHub Copilot, Cursor, and OpenAI Codex reshaping how software is built. According to a JetBrains survey from October 2025, roughly 85% of nearly 25,000 surveyed developers regularly used AI tools for coding and software design work.[15] The same survey, based on 24,534 respondents across 194 countries, found that 62% relied on at least one AI coding assistant, agent, or code editor, with ChatGPT (41%) and GitHub Copilot (30%) the most commonly used tools.[15]
The progression from basic code assistance to autonomous coding agents occurred over several distinct phases, each building on advances in machine learning and natural language processing.
Automated code assistance dates back to the 1990s, when features like Microsoft's IntelliSense introduced basic code completion in integrated development environments (IDEs). These systems suggested identifiers, keywords, and method signatures as programmers typed, relying on static analysis and symbol tables rather than machine learning. For roughly two decades, code completion remained limited to syntactic suggestions within a single file.
The application of deep learning to code began shifting this landscape around 2018 and 2019. TabNine, released in 2018, added a GPT-2-based model (Deep TabNine) in 2019 and was among the first tools to apply such predictions to code completion, offering whole-line and multi-line suggestions. Early efforts in program synthesis also aimed to generate correct programs from formal specifications, though these approaches struggled with real-world codebases.
In August 2021, OpenAI introduced Codex, a modified version of GPT-3 fine-tuned on 159 gigabytes of Python code from 54 million GitHub repositories.[1] Codex could translate natural language instructions into working code across multiple programming languages, though OpenAI noted it completed only about 37% of requests correctly.[1] Despite these limitations, Codex was the foundation for GitHub Copilot, which had launched as a technical preview on June 29, 2021, and reached general availability on June 21, 2022. In its first month of general availability, Copilot attracted over 400,000 paid subscribers.
The release of ChatGPT in November 2022 and GPT-4 in March 2023 introduced a new interaction model for code generation. Rather than providing inline suggestions, developers could describe problems in natural language and receive complete code blocks, explanations, and debugging advice through a conversational interface. Tools like GitHub Copilot Chat, Amazon CodeWhisperer (later Amazon Q Developer), and various IDE integrations adopted this chat-based approach.
DeepMind's AlphaCode, published in Science in December 2022, demonstrated another direction. AlphaCode generated millions of candidate programs for competitive programming problems, then filtered and clustered them down to 10 submissions. In evaluations on Codeforces contests with over 5,000 participants each, AlphaCode achieved an average ranking in the top 54.3%, roughly at the level of a median human competitor.[3] This marked the first time an AI system reached competitive-level performance in programming contests.[3]
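The filter-and-cluster step described above can be illustrated with a toy sketch. It is not AlphaCode's implementation: the function name `filter_and_cluster`, the probe inputs, and the lambda "programs" are all hypothetical, and a real system would sample candidates from a model and execute them in a sandbox.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Candidate = Callable[[int], int]


def _safe_call(fn: Candidate, x: int) -> Optional[int]:
    """Run a candidate, treating any exception as a wrong answer."""
    try:
        return fn(x)
    except Exception:
        return None


def filter_and_cluster(
    candidates: Sequence[Candidate],
    example_tests: Sequence[Tuple[int, int]],
    probe_inputs: Sequence[int],
    max_submissions: int = 10,
) -> List[Candidate]:
    """Keep candidates that pass the example tests, group them by behaviour
    on extra probe inputs, and return one representative per group,
    largest groups first."""
    # Filtering: drop every candidate that fails a known example test.
    survivors = [
        c for c in candidates
        if all(_safe_call(c, x) == y for x, y in example_tests)
    ]
    # Clustering: identical outputs on the probe inputs count as equivalent.
    clusters: Dict[tuple, List[Candidate]] = defaultdict(list)
    for c in survivors:
        signature = tuple(_safe_call(c, x) for x in probe_inputs)
        clusters[signature].append(c)
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:max_submissions]]


# Hypothetical candidates for the task "double the input".
candidates = [
    lambda x: x * 2,   # correct
    lambda x: x + x,   # same behaviour, different code
    lambda x: x ** 2,  # passes the single example (2 -> 4) by coincidence
    lambda x: x + 1,   # fails the example test
]
picked = filter_and_cluster(candidates, example_tests=[(2, 4)], probe_inputs=[0, 3, 5])
print(len(picked))  # two behaviour clusters survive filtering
```

In this toy run the failing candidate is filtered out, the two doubling programs collapse into one cluster, and the coincidental squaring program forms a second, smaller cluster.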
The transition from chat-based assistants to autonomous agents began in earnest in 2024. On March 12, 2024, Cognition Labs unveiled Devin, which it described as the "world's first AI software engineer."[4] Devin could autonomously plan, code, debug, and deploy software within a sandboxed environment that included a terminal, code editor, and web browser.[4] On the SWE-bench benchmark, Devin resolved 13.86% of real-world GitHub issues end-to-end, compared to the previous best of 1.96%.[4]
Also in 2024, researchers at Princeton University and Stanford University released SWE-agent, an open-source system that introduced the concept of an Agent-Computer Interface (ACI), a set of custom commands and feedback formats designed to help language models navigate repositories, edit files, and execute tests more effectively.[2] SWE-agent was published at NeurIPS 2024 and achieved state-of-the-art results on SWE-bench when paired with Claude 3.7 Sonnet in February 2025.[2]
By 2025 and 2026, the major AI labs and developer tool companies had all released autonomous coding agents, establishing the category as a primary battleground in AI product development.
Autonomous coding agents share a common architectural pattern that distinguishes them from simpler code completion or chat-based tools. The core formula, widely cited in the research literature, is:
Agent = LLM + Memory + Planning + Tool Use
The defining mechanism of an autonomous coding agent is its iterative feedback loop, often called the "agent loop" or "agentic loop." Rather than generating code in a single pass and stopping, the agent repeatedly cycles through a sequence of steps:
This cycle continues until the task is complete or the agent determines it cannot make further progress. The most common implementation follows the ReAct (Reasoning + Acting) pattern, where the model explicitly reasons about why a particular tool call is appropriate before executing it.
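A minimal sketch of such a loop, under stated assumptions: `Step`, `AgentState`, `run_agent`, and `scripted_model` are hypothetical names, and the "model" here is a scripted stand-in rather than a real LLM call. It shows the reason, act, observe cycle and the two stopping conditions (the model declares it is finished, or a step budget runs out).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    thought: str                 # reasoning stated before acting
    tool: Optional[str] = None   # None means the model considers the task done
    argument: str = ""


@dataclass
class AgentState:
    task: str
    history: List[str] = field(default_factory=list)


def run_agent(
    task: str,
    model: Callable[[AgentState], Step],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 10,
) -> AgentState:
    state = AgentState(task)
    for _ in range(max_steps):
        step = model(state)                            # reason
        state.history.append(f"thought: {step.thought}")
        if step.tool is None:                          # finished or giving up
            break
        observation = tools[step.tool](step.argument)  # act
        state.history.append(                          # observe
            f"{step.tool}({step.argument!r}) -> {observation}"
        )
    return state


def scripted_model(state: AgentState) -> Step:
    """Hypothetical stand-in for an LLM call."""
    if not any(line.startswith("run_tests(") for line in state.history):
        return Step("I should run the tests first.", "run_tests", "tests/")
    return Step("The tests pass, so the task is done.")


final = run_agent(
    "fix the failing test",
    scripted_model,
    {"run_tests": lambda path: "2 passed"},
)
print("\n".join(final.history))
```

The `max_steps` budget is a design choice in this sketch: it bounds the loop when the model never decides that it is finished.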
Claude Code, for example, uses what Anthropic describes as a "single-threaded master loop" architecture. The SDK runs an execution loop where Claude evaluates the prompt, calls tools to take action, receives results, and repeats until the task finishes. Subagents can be spawned to handle specialized tasks in parallel, such as building a backend API while the main agent works on the frontend.
Autonomous coding agents operate within environments that give them access to the same tools a human developer would use:
| Tool category | Examples | Purpose |
|---|---|---|
| File system | Read, write, search, navigate directories | Understand and modify codebases |
| Terminal/shell | Run commands, install packages, execute scripts | Build, test, and deploy code |
| Code editor | View files with syntax highlighting, make targeted edits | Precise code modifications |
| Web browser | Search documentation, access APIs, research solutions | Gather information |
| Version control | Git operations, create branches, submit pull requests | Manage code changes |
| Testing frameworks | Run unit tests, integration tests, linters | Verify correctness |
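As a rough illustration of how two of these tool categories might be exposed to a model, the sketch below registers file-system and shell tools behind a single dispatch function. The names `TOOLS` and `call_tool` are hypothetical, and production agents add sandboxing, permissions, and output truncation that are omitted here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Dict, List


def read_file(path: str) -> str:
    return Path(path).read_text(encoding="utf-8")


def write_file(path: str, content: str) -> str:
    Path(path).write_text(content, encoding="utf-8")
    return f"wrote {len(content)} characters to {path}"


def run_command(argv: List[str], timeout: float = 30.0) -> str:
    # Run without a shell so the argument list is passed through unchanged.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return f"exit={result.returncode}\n{result.stdout}{result.stderr}"


TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "write_file": write_file,
    "run_command": run_command,
}


def call_tool(name: str, **kwargs) -> str:
    """Dispatch a tool call and return its result as text for the model."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    try:
        return TOOLS[name](**kwargs)
    except Exception as exc:
        # Failures are returned as observations so the agent can react to them.
        return f"error: {type(exc).__name__}: {exc}"


with tempfile.TemporaryDirectory() as workdir:
    target = str(Path(workdir) / "notes.txt")
    print(call_tool("write_file", path=target, content="hello"))
    print(call_tool("read_file", path=target))
    print(call_tool("delete_repo"))  # not registered, so an error string comes back
```

Returning errors as strings rather than raising is one possible design: it keeps the agent loop running and lets the model decide how to recover.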
A persistent challenge for autonomous coding agents is managing context within the token limits of the underlying LLM. Agents use several strategies to handle this:
Several autonomous coding systems have emerged since 2024, each with different architectures and target use cases.
Devin, developed by Cognition Labs (founded in August 2023 by Scott Wu, Steven Hao, and Walden Yan, all gold medalists at the International Olympiad in Informatics), is designed as a fully autonomous software engineer. Devin operates inside a secure sandboxed virtual machine with access to a terminal, code editor, and web browser. It can plan multi-step tasks, write and debug code, run tests, and deploy applications without human intervention.
As of 2025, Devin 2.0 completes 83% more junior-level development tasks per Agent Compute Unit compared to Devin 1.x, according to Cognition's internal benchmarks.[17] The system supports multi-agent coordination, where a main Devin session delegates work to managed sub-Devins that each operate in isolated virtual machines. Devin also supports desktop testing using computer use, where it can run applications, interact with their interfaces, and record testing sessions for human review.
Goldman Sachs has piloted Devin alongside its 12,000 human developers as part of what IBM described as a "hybrid workforce" initiative.[20] In July 2025, Cognition acquired Windsurf (formerly Codeium), aiming to merge Windsurf's IDE-level intelligence with Devin's autonomous capabilities.[21][22] The deal was announced on July 14, 2025, days after Google paid roughly $2.4 billion in a reverse-acquihire to bring Windsurf's chief executive Varun Mohan, co-founder Douglas Chen, and several research leaders to Google DeepMind; Cognition did not disclose the acquisition price, and stated that Windsurf had reached about $82 million in annual recurring revenue.[22]
In its 2025 performance review, published on November 14, 2025, Cognition reported that 67% of Devin's pull requests were being merged, up from 34% a year earlier, and that the agent had become four times faster and twice as resource-efficient at problem solving.[17] The report cited customers including Goldman Sachs, Citi, Santander, and Nubank, and noted that when Oracle ended legacy support for a Java version, Devin migrated affected repositories in roughly one-fourteenth of the time a human engineer would take, and that customer test coverage typically rose from 50 to 60 percent up to 80 to 90 percent.[17] Nubank used multiple Devin instances in parallel to compress what it described as an 18-month migration project into a matter of weeks.[17]
Cognition's valuation rose sharply on the strength of this adoption. The company reached a $10.2 billion valuation in September 2025, and on May 27, 2026 it announced a Series D round of more than $1 billion at a $25 billion pre-money valuation (about $26 billion post-money), led by Lux Capital, General Catalyst, and 8VC.[23][24] At the time, Cognition reported an annualized revenue run-rate of about $492 million, with enterprise usage growing roughly 50% month over month over the prior six months.[24]
Claude Code is Anthropic's agentic coding tool, available in the terminal, IDE extensions (VS Code, Cursor, Windsurf, JetBrains), the desktop app, and the browser. Originally built to support developer productivity within Anthropic, it was released publicly and has become one of the most widely adopted autonomous coding tools.
Claude Code reads codebases, edits files across multiple directories, runs terminal commands, and integrates with GitHub and GitLab to handle complete workflows from reading issues to submitting pull requests. A checkpoint system automatically saves code state before each change, allowing developers to rewind to previous versions if an agent's changes go wrong.
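The checkpoint idea can be sketched in a few lines. This is an illustrative, in-memory version under stated assumptions, not Claude Code's actual mechanism: the class `CheckpointStore` and its methods are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


class CheckpointStore:
    """Hypothetical checkpointing: snapshot a file before each edit."""

    def __init__(self) -> None:
        # Each entry is (path, previous content or None if the file was new).
        self._snapshots: List[Tuple[Path, Optional[str]]] = []

    def edit(self, path: Path, new_content: str) -> int:
        """Save the current state, apply the edit, and return a checkpoint id."""
        previous = path.read_text(encoding="utf-8") if path.exists() else None
        self._snapshots.append((path, previous))
        path.write_text(new_content, encoding="utf-8")
        return len(self._snapshots) - 1

    def rewind(self, checkpoint_id: int) -> None:
        """Undo every edit made at or after the checkpoint, newest first."""
        while len(self._snapshots) > checkpoint_id:
            path, previous = self._snapshots.pop()
            if previous is None:
                path.unlink(missing_ok=True)  # the file did not exist before
            else:
                path.write_text(previous, encoding="utf-8")


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "app.py"
    source.write_text("print('v1')\n", encoding="utf-8")
    store = CheckpointStore()
    first = store.edit(source, "print('v2')\n")
    store.edit(source, "print('v3 with a bug'\n")
    store.rewind(first)  # roll back both agent edits
    restored = source.read_text(encoding="utf-8")
    print(restored)
```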
As of 2026, Claude Code works with the Opus 4.6, Sonnet 4.6, and Haiku 4.5 models. Claude Opus 4.6, released on February 5, 2026, added the ability to assemble agent teams inside Claude Code that break complex tasks into independent subtasks and run tools and subagents in parallel.[37] Anthropic's research found that users grant Claude Code more autonomy as they gain experience: newer users employ full auto-approve mode in roughly 20% of sessions, increasing to over 40% by their 750th session.[7][6] The underlying architecture was generalized into the Claude Agent SDK (originally called the Claude Code SDK) for building custom autonomous agents.
OpenAI reused the "Codex" brand in 2025 for a cloud-based software engineering agent, distinct from the 2021 language model of the same name. The modern Codex is powered by codex-1, a version of o3 optimized for software engineering.[8] Each task runs in its own cloud sandbox environment preloaded with the user's repository.[8]
Codex can write features, answer codebase questions, fix bugs, and propose pull requests. As of February 2026, GPT-5.3-Codex combines the coding capabilities of earlier Codex models with the reasoning of GPT-5.2, running 25% faster while using fewer tokens.[9] Reports indicate GPT-5.3-Codex can work independently for more than seven hours on large, complex tasks, iterating on implementations and fixing test failures until delivering a working result.[9] Codex became available to ChatGPT Plus users on June 3, 2025.[8] The Codex line advanced quickly through 2025 and 2026: GPT-5.1-Codex-Max (November 19, 2025) was observed working on tasks for more than 24 hours in OpenAI's internal evaluations, and GPT-5.2-Codex (January 14, 2026) was reported by OpenAI as state of the art on SWE-bench Pro at 56.4% and on Terminal-Bench 2.0.[26][27] In April 2026, OpenAI released GPT-5.5 (codename "Spud"), which it described as its strongest agentic coding model to date and which it reported scoring 82.7% on Terminal-Bench 2.0.[36]
GitHub's approach to autonomous coding evolved through several stages. Copilot Workspace, a browser-based environment launched as a technical preview in April 2024, could take a GitHub issue written in plain English and produce a specification, a plan, and actual code changes. GitHub sunset the Workspace preview by May 2025, but rebuilt its core concepts (sub-agent architecture, issue-to-PR workflow, asynchronous execution) as the Copilot coding agent, which became generally available to all paid Copilot subscribers in September 2025.[11][12]
The Copilot coding agent works autonomously in a GitHub Actions-powered environment. It can be assigned tasks through GitHub issues or Copilot Chat, and creates pull requests with the results.[11] It handles low-to-medium complexity tasks including adding features, fixing bugs, extending tests, refactoring code, and improving documentation. By 2026, GitHub Copilot supports multi-model selection, allowing users to choose between GPT-4o, GPT-5.1-Codex-Max, Claude Opus 4.5, and Gemini 2.0 Flash. GitHub continued to expand this model menu through 2026: on April 14, 2026 it added per-model selection for the Claude and Codex coding agents on github.com, exposing newer Anthropic and OpenAI models as they shipped, and in May 2026 it removed Google's Gemini models from Copilot Chat on the web.[33]
Cursor, developed by Anysphere (a Y Combinator-backed startup founded in 2022), is an AI-native IDE built on a VS Code foundation. Cursor's agent mode lets it operate autonomously within the IDE: executing terminal commands, running tests, installing packages, and iterating on errors, all within the developer's local environment.
Cursor uses subagents that run in parallel to explore codebases, with each subagent using the best model for its specific task. A custom embedding model provides retrieval across large codebases. The landmark 0.50 release in 2025 introduced Background Agents, which execute tasks independently while developers focus on other work. Cursor crossed $500 million in annual recurring revenue and reached a $10 billion valuation in 2025. By February 2026, Anysphere reported reaching about $2 billion in annualized revenue, and it was reported to be in talks to raise new funding at a valuation near $50 billion.[32] The company also moved from routing exclusively to third-party models toward shipping its own in-house frontier coding model, Composer, trained for multi-file edits and self-correcting test loops.[32]
SWE-agent, developed by researchers at Princeton University and Stanford University, is an open-source framework for autonomous software engineering. Its core contribution is the Agent-Computer Interface (ACI), a set of custom commands and feedback formats that make it easier for language models to browse repositories, view and edit code, and execute tests.[2]
Key design features of SWE-agent include a code linter integrated into the edit function (alerting the agent to mistakes and discarding invalid edits), informative prompts and error messages, and history processors that keep agent context concise.[2] SWE-agent works with multiple LLMs including GPT-4o and Claude Sonnet 4. A minimal implementation called Mini-SWE-Agent achieved 65% on SWE-bench Verified in just 100 lines of Python.
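Two of these ideas, a lint-gated edit and a history processor, can be sketched as follows. This is a simplified illustration rather than SWE-agent's code: `apply_edit` and `compact_history` are hypothetical names, and Python's `ast.parse` stands in for a real linter by checking syntax only.

```python
import ast
from typing import List, Tuple


def apply_edit(source: str, start: int, end: int, replacement: str) -> Tuple[str, str]:
    """Replace lines start..end (1-indexed, inclusive) and keep the edit only
    if the result still parses as Python."""
    lines = source.splitlines()
    candidate = "\n".join(lines[: start - 1] + replacement.splitlines() + lines[end:])
    try:
        ast.parse(candidate)
    except SyntaxError as exc:
        # Discard the edit and tell the agent what went wrong.
        return source, f"edit discarded: line {exc.lineno}: {exc.msg}"
    return candidate, "edit applied"


def compact_history(history: List[str], keep_last: int = 5) -> List[str]:
    """Keep the most recent entries and collapse older ones into one line."""
    if len(history) <= keep_last:
        return list(history)
    omitted = len(history) - keep_last
    return [f"[{omitted} earlier entries omitted]"] + history[-keep_last:]


code = "x = 1\ny = 2"
code, message = apply_edit(code, 2, 2, "y = (")   # invalid syntax, so rejected
print(message)
code, message = apply_edit(code, 2, 2, "y = 3")   # valid, so applied
print(message)
print(compact_history([f"step {i}" for i in range(8)], keep_last=3))
```

Rejecting an invalid edit immediately, with a message, gives the model a chance to retry before a broken file pollutes later steps.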
Amazon Q Developer is AWS's autonomous coding agent, capable of implementing features, documenting code, refactoring, and performing software upgrades. The agent runs in a dedicated environment with access to IDE functionalities, generates multiple candidate solutions for each problem, selects the most promising one, and returns the result to the developer.[18] In April 2025, Amazon reported the agent achieved 49% on SWE-bench (its internal benchmark variant) and 66% on SWE-bench Verified.[18] The CLI agent is powered by Amazon Bedrock and supports multi-turn conversations.
| System | Developer | Key characteristics |
|---|---|---|
| Windsurf | Cognition (acquired July 2025; formerly Codeium) | Cascade agentic system; learns architecture patterns over 48 hours of use; MCP integrations with GitHub, Slack, Figma |
| Replit Agent | Replit | Browser-based; plans and builds projects end-to-end; handles environment setup, coding, testing, and deployment in one place |
| Copilot Workspace | GitHub | Sunset May 2025; concepts evolved into Copilot coding agent |
| Antigravity | Google | Embeds autonomous agents into the coding environment for planning, executing, testing, and validating software tasks |
Measuring the capabilities of autonomous coding systems requires benchmarks that go beyond simple function-level code generation.
SWE-bench, introduced by Princeton researchers, evaluates whether AI systems can resolve real-world GitHub issues from popular open-source Python repositories. The benchmark presents the agent with a repository state and an issue description, then checks whether the agent's code changes cause the relevant test cases to pass.
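The pass/fail criterion can be sketched as a small function. SWE-bench instances list the tests that the fix should turn from failing to passing (`FAIL_TO_PASS`) and the tests that must keep passing (`PASS_TO_PASS`). The function names, test names, and result format below are hypothetical simplifications of the real harness.

```python
from typing import Dict, Iterable, Sequence


def is_resolved(
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
    results_after_patch: Dict[str, str],
) -> bool:
    """An instance counts as resolved only if every targeted test now passes
    and no previously passing test has been broken."""
    required = list(fail_to_pass) + list(pass_to_pass)
    return all(results_after_patch.get(test) == "passed" for test in required)


def resolve_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of instances resolved, the headline benchmark score."""
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical test results after applying an agent's patch.
fixed_cleanly = is_resolved(
    ["test_issue_regression"], ["test_existing_behaviour"],
    {"test_issue_regression": "passed", "test_existing_behaviour": "passed"},
)
broke_something = is_resolved(
    ["test_issue_regression"], ["test_existing_behaviour"],
    {"test_issue_regression": "passed", "test_existing_behaviour": "failed"},
)
print(fixed_cleanly, broke_something)
print(resolve_rate([fixed_cleanly, broke_something, False, True]))
```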
Several variants of SWE-bench exist:
| Variant | Description | Size |
|---|---|---|
| SWE-bench (full) | Original benchmark with issues from 12 Python repositories | 2,294 instances |
| SWE-bench Lite | Filtered subset for faster evaluation | 300 instances |
| SWE-bench Verified | Manually curated subset funded by OpenAI[10] | 500 instances |
| SWE-bench Pro | Scale AI benchmark with long-horizon tasks across Python, Go, TypeScript, and JavaScript | 1,865 instances from 41 repositories |
As of early 2026, the top scores on SWE-bench Verified include Claude Opus 4.5 at 80.9%, Claude 4 at 77.2%, Gemini 3 Flash at 75.8%, and GPT-5 at 74.9%. Claude Opus 4.5, released by Anthropic on November 24, 2025, was the first model to cross the 80% mark on the benchmark.[31] The SWE-bench Verified scaffold received a major upgrade on February 12, 2026, with updated scaffolding, environments, and token limits. OpenAI stopped reporting Verified scores after finding training data contamination across all frontier models on the dataset.[25] In a post published on February 23, 2026, OpenAI reported that an audit found at least 59.4% of the harder problems contained flawed test cases that rejected functionally correct solutions, and that every frontier model it tested (GPT-5.2, Claude Opus 4.5, and Gemini 3) could reproduce verbatim gold patches from certain tasks, indicating memorization rather than genuine capability; OpenAI recommended that the community report SWE-bench Pro instead.[25]
SWE-bench Pro, which requires an average of 107 lines of changes across 4.1 files per task, shows considerably lower scores. When Scale AI introduced the benchmark in a paper first posted on September 21, 2025, the best models scored only around 23% (GPT-5 at 23.3% and Claude Opus 4.1 at 23.1%), compared with figures above 70% on SWE-bench Verified.[35] The dataset is split into a public set of 11 repositories, a held-out set of 12 repositories, and a commercial set of 18 proprietary repositories, a structure intended to resist contamination.[35] The gap between Verified and Pro results highlights that longer-horizon, multi-file tasks remain significantly harder for current agents.
Terminal-Bench is a benchmark that evaluates agents on hard, realistic tasks performed in command-line environments, such as configuring legacy systems, reimplementing research papers, compiling code, training models, and debugging environments. Terminal-Bench 2.0 comprises 89 hand-crafted, human-verified tasks, each with a unique environment and comprehensive tests.[30] It has become a standard evaluation for terminal-driven agents including Cursor, Codex CLI, Claude Code, and Gemini CLI, with frontier systems generally scoring below 65% on the 2.0 version.[30] Vendors increasingly report Terminal-Bench alongside SWE-bench: Anthropic cited a 15% Terminal-Bench improvement for Claude Opus 4.5 over Sonnet 4.5, and OpenAI reported GPT-5.5 at 82.7% on Terminal-Bench 2.0.[31][36]
HumanEval, created by OpenAI in 2021, consists of 164 hand-written Python programming problems that test a model's ability to generate correct functions from docstrings.[1] While widely used as a coding benchmark, HumanEval measures single-function generation rather than autonomous software engineering.
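A HumanEval-style check can be sketched as follows. The problem, completion, and tests are invented for illustration and are not taken from the benchmark. The real evaluation harness runs untrusted completions in a sandbox with timeouts, which this sketch omits, so `exec` should not be used this way on model output outside an isolated environment.

```python
# Hypothetical problem in the benchmark's shape: a signature plus docstring
# prompt, a model-written completion, and a hidden check function.
PROMPT = '''def running_max(values):
    """Return a list where element i is the maximum of values[:i + 1]."""
'''

COMPLETION = '''    result, best = [], None
    for v in values:
        best = v if best is None or v > best else best
        result.append(best)
    return result
'''

TEST = '''
def check(candidate):
    assert candidate([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
    assert candidate([]) == []
'''


def passes(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if the completed function satisfies every assertion."""
    namespace: dict = {}
    try:
        exec(prompt + completion + test, namespace)  # define function and check()
        namespace["check"](namespace[entry_point])
    except Exception:
        return False  # syntax errors, runtime errors, and failed assertions
    return True


print(passes(PROMPT, COMPLETION, TEST, "running_max"))
print(passes(PROMPT, "    return values\n", TEST, "running_max"))
```

The check is purely functional: any completion that satisfies the assertions counts as correct, regardless of how closely it resembles a reference solution.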
Performance on HumanEval has improved dramatically. In early 2023, the best models achieved approximately 67% accuracy. By mid-2024, models like GPT-4 Turbo and Claude 3 Opus exceeded 85%. As of 2025, top models score above 90%, with Claude Sonnet 4 reaching 95.1% and Kimi K2 0905 reaching 94.5%. According to Stanford's AI Index Report, the performance gap between leading American and Chinese models narrowed from 31.6 percentage points in late 2023 to just 3.7 points by the end of 2024.[19]
However, the benchmark is widely considered saturated. The 164-problem set is nearly solved, and studies show that top models experience 19.6 to 47.7 percentage point drops on transformed variants of the problems, suggesting memorization rather than genuine compositional understanding.
Additional benchmarks used to evaluate coding AI include:
Vibe coding is a related concept coined by computer scientist Andrej Karpathy (co-founder of OpenAI and former AI leader at Tesla) on February 6, 2025.[14] Karpathy described it as a style of programming where you "fully give in to the vibes, embrace exponentials, and forget that the code even exists," using tools like Cursor Composer with Claude Sonnet to generate code almost entirely through natural language.[14]
While autonomous coding and vibe coding both involve AI writing code, they differ in intent and rigor. Autonomous coding systems are designed to produce production-quality software with testing and verification. Vibe coding, by contrast, is an informal practice where the developer accepts AI-generated code with minimal review, suitable for prototypes and personal projects but not for production systems. The term was named Collins English Dictionary's Word of the Year for 2025.
As of early 2026, autonomous coding agents are effective at several categories of tasks:
Rakuten reported that Claude Code completed a task on a 12.5-million-line codebase in seven hours of autonomous work, achieving 99.9% numerical accuracy without human code contribution during execution.[5]
Significant limitations persist:
A METR study published in July 2025 found that experienced open-source developers actually took 19% longer when using AI tools compared to working without them.[13] The study observed 16 experienced contributors to large repositories (averaging more than 1 million lines of code) across 246 real tasks; the same developers had predicted AI would make them about 24% faster and still believed they had been roughly 20% faster after the fact.[13] The researchers attributed this partly to time spent reviewing, debugging, and correcting AI-generated code. However, other studies reported positive results: developers using AI coding assistants reported an average productivity increase of 31.4% in certain task categories.
Autonomous coding introduces several security challenges that the industry is still working to address.
Studies have found a 23.7% increase in security vulnerabilities in AI-assisted code compared to manually written code. Agents can introduce vulnerabilities because they have limited context of the larger codebase's security model, may use insecure coding patterns from training data, and lack awareness of application-specific security requirements.
Researchers uncovered prompt injection flaws in GitHub Copilot, Cursor, and other tools that allow malicious actors to edit workspace configuration files to achieve code execution. Autonomous agents that can run terminal commands, install packages, and access networks create a larger attack surface than traditional code completion tools.
The security research community has identified several categories of risk specific to autonomous coding agents:
| Risk category | Description |
|---|---|
| Prompt injection | Malicious instructions embedded in code comments, documentation, or dependencies that redirect agent behavior |
| Tool misuse and privilege escalation | Agents using their terminal or network access in unintended ways |
| Supply chain attacks | Agents installing compromised packages or introducing backdoors from training data |
| Memory poisoning | Corrupting the context or memory that agents use for decision-making |
| Cascading failures | Errors propagating across multi-agent systems where agents delegate to other agents |
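One way to picture the tool-misuse category is a guard that checks each proposed shell command before the agent may run it. The sketch below is an illustrative assumption, not a description of any product's safeguards: the allowlist contents and the name `check_command` are hypothetical, and a real deployment would combine such checks with sandboxing and network controls.

```python
import shlex
from typing import Tuple

ALLOWED_PROGRAMS = {"git", "pytest", "ls", "cat"}  # hypothetical policy
SHELL_METACHARACTERS = set(";|&<>`$\n")


def check_command(command: str) -> Tuple[bool, str]:
    """Decide whether an agent-proposed command may run, with a reason."""
    # Reject chaining, piping, redirection, and substitution outright.
    if SHELL_METACHARACTERS & set(command):
        return False, "shell metacharacters are not allowed"
    try:
        argv = shlex.split(command)
    except ValueError as exc:
        return False, f"unparseable command: {exc}"
    if not argv:
        return False, "empty command"
    if argv[0] not in ALLOWED_PROGRAMS:
        return False, f"program {argv[0]!r} is not on the allowlist"
    return True, "ok"


for proposed in [
    "pytest -q tests/",
    "curl http://example.com/install.sh",
    "ls; rm -rf /",
]:
    print(proposed, "->", check_command(proposed))
```

An allowlist is deliberately conservative: anything not explicitly permitted is refused, which trades convenience for a smaller attack surface.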
On November 13, 2025, Anthropic published a report stating that it assessed "with high confidence" that a Chinese state-sponsored group had jailbroken Claude Code and used it to inspect target systems across roughly thirty organizations, succeeding in a small number of cases. According to the report, the AI performed an estimated 80 to 90 percent of the campaign, with only 4 to 6 human decision points.[28] Anthropic also noted that Claude sometimes hallucinated credentials or overstated what it had obtained, which limited full autonomy.[28] The disclosure drew skepticism from parts of the security community: researchers including Kevin Beaumont criticized the absence of indicators of compromise, and others argued that the report overstated the role of AI, with Daniel Card commenting that AI "is a super boost but it's not skynet."[29]
Organizations are deploying autonomous coding agents faster than they can secure them. While most Chief Information Security Officers express concern about AI agent risks, only a small number have implemented mature safeguards. The disconnect between rapid adoption and security readiness is a recurring theme in industry analyses from late 2025 and early 2026.
According to Anthropic's 2026 Agentic Coding Trends Report, software development is shifting "from an activity centered on writing code to one grounded in orchestrating agents that write code."[5] Engineering roles are increasingly focused on agent supervision, system design, and output review rather than direct code implementation.[5]
The report, published on February 28, 2026, frames these shifts as eight distinct trends. It found that developers integrate AI into roughly 60% of their work while maintaining active oversight on 80 to 100% of delegated tasks; put differently, they report being able to fully delegate only 0 to 20% of tasks.[5] Approximately 27% of AI-assisted work consists of tasks that would not have been done otherwise, including scaling projects, building internal tools, and exploratory development that previously did not justify the time investment.[5]
Productivity results are mixed and context-dependent. Several organizations have reported measurable gains:
| Organization | Reported outcome |
|---|---|
| TELUS | Teams shipped engineering code 30% faster after creating over 13,000 custom AI solutions |
| Zapier | 97% AI adoption across the entire organization as of January 2026 |
| Rakuten | Seven-hour autonomous task on 12.5M-line codebase with 99.9% accuracy |
However, a disconnect exists between perceived and measured productivity. While over 75% of developers report feeling more productive with AI tools, many organizations have not seen measurable improvement in delivery velocity or business outcomes. Stack Overflow's 2025 Developer Survey found that 66% of developers cited AI's "almost correct" solutions as their biggest time sink, due to the debugging effort required.[16]
A Stanford University study found that employment among software developers aged 22 to 25 fell nearly 20% between 2022 and 2025, coinciding with the rise of AI coding tools.[34] The causal relationship is debated, as multiple economic factors affect junior developer hiring, but the correlation has drawn attention from researchers and policymakers. Stanford's 2026 AI Index Report, drawing on the same payroll analysis, documented the roughly 20% decline among the youngest developers since 2024 while employment for developers over 26 held steady or grew, and observed that performance on SWE-bench Verified rose from about 60% to near 100% over a single year.[34]
A growing trend in 2026 is the use of hierarchical multi-agent systems, where an orchestrator agent coordinates multiple specialist agents working in parallel. Each specialist has its own dedicated context and tool access. Devin's multi-agent mode, for example, uses a main session that scopes work, monitors progress, resolves conflicts, and compiles results from managed sub-Devins running in isolated virtual machines.
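The orchestrator pattern can be sketched with a thread pool. This is a toy illustration under stated assumptions, not Devin's implementation: `orchestrate` and the specialist functions are hypothetical, and threads stand in for sub-agents that would really run in isolated environments.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def orchestrate(task: str, specialists: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Scope one subtask per specialist, run them in parallel, and compile
    the results, recording failures instead of aborting the whole task."""
    subtasks = {name: f"{task} ({name} part)" for name in specialists}
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(specialists))) as pool:
        futures = {
            name: pool.submit(worker, subtasks[name])
            for name, worker in specialists.items()
        }
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                results[name] = f"failed: {exc}"
    return results


# Hypothetical specialists standing in for sub-agents.
def backend(subtask: str) -> str:
    return f"API done for: {subtask}"


def frontend(subtask: str) -> str:
    raise RuntimeError("build error")


report = orchestrate("add login", {"backend": backend, "frontend": frontend})
print(report)
```

Capturing each specialist's failure separately lets the orchestrator decide whether to retry, reassign, or report the problem, rather than losing the successful results.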
Anthropic's trends report identifies multi-agent coordination as one of four strategic priorities for organizations adopting autonomous coding, alongside scaling human-agent oversight, extending agentic coding beyond engineering teams, and embedding security architecture as a core design principle.[5]
Several trends suggest where autonomous coding may develop in the near term: