OpenHands
Last reviewed
May 7, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 4,404 words
OpenHands is an open-source, autonomous software development agent platform created by All Hands AI. Originally launched in March 2024 under the name OpenDevin, the project enables AI agents to write code, execute shell commands, browse the web, and manage files inside an isolated Docker sandbox in order to complete software engineering tasks end-to-end. It is released under the MIT License and accepts contributions from a global community of researchers and engineers. As of late 2025, the repository had accumulated more than 65,000 GitHub stars, more than 7,000 forks, and contributions from over 440 individuals.
The platform's default agent, CodeActAgent, implements the CodeAct framework, which unifies the agent's action space around executable code rather than structured tool calls. This design allows a single agent to fix bugs, implement features, resolve merge conflicts, write documentation, deploy infrastructure, and perform data analysis, all from a natural-language task description. OpenHands is model-agnostic by design: it routes LLM calls through LiteLLM and therefore supports models from Anthropic, OpenAI, Google, Mistral, Cohere, and local deployments via Ollama, vLLM, or SGLang.
On the SWE-bench Verified benchmark, OpenHands achieved a resolution rate of 60.6 percent with a single attempt and 66.4 percent with an inference-time scaling strategy that runs five parallel attempts and selects the best solution using a trained critic model. These results placed it among the top-performing open-source systems on the benchmark as of late 2025.
The project that became OpenHands began as a direct response to the March 2024 debut of Devin, an AI software engineer built by Cognition AI. Devin was the first commercially announced autonomous coding agent and generated widespread attention across the software engineering community. Within days of the announcement, Binyuan Hui and Junyang Lin, researchers working on the Qwen large language model series at Alibaba DAMO Academy, created a GitHub repository named OpenDevin with the stated goal of building an open-source equivalent. The repository's tagline, "Code Less, Make More," established the project's ambition to automate routine development work.
The project was created on March 12, 2024. Within weeks it became the number-one trending repository on GitHub, amassing 12,000 stars in its first days and demonstrating that the open-source community shared strong interest in accessible autonomous coding tools.
Graham Neubig, an associate professor at Carnegie Mellon University's Language Technologies Institute with extensive research in NLP and open-source tooling, joined the project early as a core maintainer. Xingyao Wang, a PhD candidate at the University of Illinois Urbana-Champaign studying interactive language agents, and Robert Brennan, who had held product and engineering leadership roles at the open-source Kubernetes governance company Fairwinds and had previously been a senior software engineer at Google, also became principal contributors. These three individuals would go on to co-found All Hands AI.
As OpenDevin matured, Brennan, Neubig, and Wang founded All Hands AI to provide institutional backing for the open-source project, pursue commercial applications, and build a hosted cloud service. Brennan became CEO, Wang became Chief AI Officer, and Neubig became Chief Scientist. The company is headquartered in Boston.
On September 5, 2024, All Hands AI announced a $5 million seed round led by Menlo Ventures, with additional participation from Pillar VC, Betaworks, and Rebellion. Notable angels included Thom Wolf, co-founder of Hugging Face; Jeff Hammerbacher, co-founder of Cloudera; and Soumith Chintala, creator of PyTorch and a vice president at Meta. The seed round validated the thesis that open-source AI development tooling could sustain a commercial business.
At the time of the seed announcement, the project already had over 30,000 GitHub stars and more than 180 unique contributors.
On August 26, 2024, Graham Neubig announced via social media that the project would be renamed from OpenDevin to OpenHands, coinciding with the release of version 0.9.0. The rebrand was made to establish an independent identity separate from the Devin name and to signal that the platform had grown beyond an open-source clone into its own ecosystem. The new name reflected the project's collaborative, human-agent partnership ethos. The GitHub repository moved to the All-Hands-AI organization, and the main repository URL changed accordingly.
Version 0.9.0 shipped a number of new features including improved agent delegation, expanded browser support, and a redesigned event stream architecture.
On November 18, 2025, All Hands AI, operating publicly as OpenHands, announced an $18.8 million Series A round led by Madrona Venture Group, with participation from Menlo Ventures, Pillar VC, Obvious Ventures, Fujitsu Ventures, and Alumni Ventures. The total capital raised across both rounds reached approximately $23.8 million.
At the time of the Series A, the project had more than 65,000 GitHub stars, more than 7,000 forks, and over three million total downloads. Engineers at AMD, Apple, Google, Amazon, Netflix, TikTok, NVIDIA, Mastercard, and VMware had forked or cloned the repository. Early enterprise adopters reported reductions of up to 50 percent in code-maintenance backlogs by using OpenHands to automate dependency upgrades, lint fixes, and vulnerability remediation.
Soma Somasegar, a partner at Madrona, stated: "Autonomous agents are transforming from side-projects into core members of the engineering team, and OpenHands' open, model-agnostic approach ensures this transformation happens safely, transparently, and at enterprise scale."
In a retrospective blog post published in November 2025, the team noted that the project had completed approximately 3,500 commits and had grown to more than 250 unique contributors since its founding. The OpenHands paper was also accepted at ICLR 2025, lending the project academic credibility alongside its practical deployment base.
OpenHands is designed around the principle that AI agents should interact with software environments in the same manner as human developers: by writing code, running commands, and navigating the web. The architecture separates three primary concerns: the agent, which decides what actions to take; the runtime, which executes those actions inside an isolated sandbox; and the event stream, which maintains a chronological log of all actions and observations.
All interactions between the agent and its environment flow through an append-only event stream. Each step of agent execution reads the full history of typed event objects, including prior actions, observations, and user messages, and appends a new action. This design provides several benefits: complete reproducibility of agent trajectories, straightforward multi-agent coordination through shared logs, and deterministic replay of past sessions for debugging.
The event stream records metadata alongside each event, including token counts for cost tracking and identifiers that enable a parent agent to spawn child agents and route their observations correctly.
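The append-only log described above can be illustrated with a minimal sketch. The `Event` and `EventStream` classes, their field names, and the sample events are hypothetical and chosen for illustration; they are not the classes used in the OpenHands codebase.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One immutable log entry (hypothetical schema, not the OpenHands one)."""
    id: int
    kind: str                  # "action", "observation", or "message"
    source: str                # "agent", "environment", or "user"
    content: str
    tokens: int = 0            # token count, kept for cost tracking
    parent_agent: Optional[str] = None  # set when a child agent emits the event


class EventStream:
    """Append-only log: events can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._ids = count()

    def append(self, kind: str, source: str, content: str,
               tokens: int = 0, parent_agent: Optional[str] = None) -> Event:
        event = Event(next(self._ids), kind, source, content, tokens, parent_agent)
        self._events.append(event)
        return event

    def history(self) -> tuple[Event, ...]:
        # Return an immutable snapshot so callers cannot mutate the log.
        return tuple(self._events)

    def total_tokens(self) -> int:
        return sum(event.tokens for event in self._events)


stream = EventStream()
stream.append("message", "user", "Fix the failing test in utils.py")
stream.append("action", "agent", "pytest -q", tokens=120)
stream.append("observation", "environment", "1 failed, 4 passed", tokens=35)
```

Because each step only reads `history()` and appends, replaying a session amounts to iterating over the same tuple of events in order.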
All agent code execution takes place inside Docker containers. The default runtime image includes a bash shell for system commands, a Jupyter IPython server for interactive Python execution, and a Chromium browser managed by Playwright for web navigation. An internal REST API server running inside the container manages communication between the host process and the sandbox.
This containerization approach means that agents can install arbitrary packages, modify files, and run networked processes without risk of affecting the host machine's state. It also makes agent environments reproducible across different operating systems and hardware configurations.
The BaseWorkspace abstraction supports two deployment modes. In local mode, the sandbox runs as a container on the developer's own machine, which is appropriate for interactive development and prototyping. In remote mode, the sandbox runs on a managed server, which enables headless execution, parallel agent runs, and cloud deployment. The Software Agent SDK formalizes this abstraction and allows the same agent code to operate in both environments with minimal configuration changes.
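A rough sketch of how one interface can cover both deployment modes follows. The class names `SketchWorkspace`, `LocalSketchWorkspace`, and `RemoteSketchWorkspace` are hypothetical and are not the SDK's actual classes. The local variant here runs a plain subprocess instead of a container, and the remote variant takes an injected transport function instead of making a network call, so that the example stays self-contained.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import Callable


class SketchWorkspace(ABC):
    """Hypothetical stand-in for a workspace abstraction."""

    @abstractmethod
    def execute(self, argv: list[str]) -> tuple[int, str]:
        """Run a command and return (exit_code, stdout)."""


class LocalSketchWorkspace(SketchWorkspace):
    # Stand-in for local mode: runs a subprocess directly, not a container.
    def execute(self, argv: list[str]) -> tuple[int, str]:
        done = subprocess.run(argv, capture_output=True, text=True)
        return done.returncode, done.stdout


class RemoteSketchWorkspace(SketchWorkspace):
    # Stand-in for remote mode: forwards the command to a transport callable,
    # which in a real deployment would talk to a managed server.
    def __init__(self, transport: Callable[[list[str]], tuple[int, str]]) -> None:
        self._transport = transport

    def execute(self, argv: list[str]) -> tuple[int, str]:
        return self._transport(argv)


def run_task(workspace: SketchWorkspace) -> str:
    """Agent-side code that does not care where the command runs."""
    code, out = workspace.execute([sys.executable, "-c", "print('hello')"])
    if code != 0:
        raise RuntimeError(f"command failed with exit code {code}")
    return out.strip()
```

The point of the design is that `run_task` is written once against the abstract interface; switching between local and remote execution only changes which workspace object is passed in.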
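OpenHands's default agent, CodeActAgent, operates within the CodeAct framework developed by Xingyao Wang and collaborators. Rather than exposing a fixed set of named tools, CodeAct expresses all agent actions as executable code. There are three primary action types, matching the three tools in the sandbox: running a bash command, executing Python code in the IPython server, and interacting with the browser.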
A fourth action type, MessageAction, enables the agent to communicate with the human user in natural language to request clarification or confirmation before proceeding.
Because the action space is based on programming language primitives rather than a closed schema, agents can compose sequences of complex multi-step operations, install new tools at runtime, and adapt to tasks that were not anticipated at design time.
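The idea of code as the action space can be sketched with a tiny executor that runs Python snippets in a persistent namespace and returns the printed output as the observation. The `PythonActionExecutor` class is hypothetical, and it uses `exec` with no isolation at all, so it only illustrates the control flow and is not a substitute for the Docker sandbox.

```python
import contextlib
import io


class PythonActionExecutor:
    """Runs code actions in one shared namespace, like cells in a notebook."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, code: str) -> str:
        """Execute a code action and return its stdout (or error) as text."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)  # unsandboxed: illustration only
        except Exception as exc:
            # Errors become observations too, so the agent can react to them.
            buffer.write(f"{type(exc).__name__}: {exc}\n")
        return buffer.getvalue()


executor = PythonActionExecutor()
first = executor.run("import math\nradius = 2")
second = executor.run("print(round(math.pi * radius ** 2, 2))")
third = executor.run("1 / 0")
```

State set by one action (`math`, `radius`) is available to the next, which is what lets an agent compose multi-step operations out of ordinary code instead of a fixed tool schema.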
Web interaction is implemented through integration with BrowserGym, a standardized interface for browser-based agent tasks. The BrowsingAgent sub-agent uses BrowserGym's action primitives, including navigation, clicking, form filling, typing, and scrolling, and receives rich observations that include not only the rendered HTML but also the browser's accessibility tree and screenshots. This gives the agent multiple representations of the page state, which improves robustness on modern web applications that rely heavily on JavaScript rendering.
The BrowsingAgent can be invoked autonomously by CodeActAgent through the AgentDelegateAction mechanism, which instructs a parent agent to hand off a subtask to a specialized child agent and resume after receiving the child's result. This multi-agent delegation pattern allows OpenHands to decompose complex tasks: a code-writing agent can pause, delegate a web research subtask to the BrowsingAgent, incorporate the results, and continue coding.
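The hand-off and resume pattern can be sketched in a few lines. Everything here is hypothetical: `DelegateRequest`, `run_parent`, the registry of child agents, and the fake browsing function are illustrative stand-ins, not the `AgentDelegateAction` implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DelegateRequest:
    """Hypothetical request a parent agent emits to hand off a subtask."""
    agent_name: str
    subtask: str


def run_parent(task: str,
               registry: dict[str, Callable[[str], str]],
               log: list[str]) -> str:
    """Pause the parent, run the named child on a subtask, then resume."""
    log.append(f"parent: start '{task}'")
    request = DelegateRequest("browsing", f"find documentation for: {task}")
    log.append(f"parent: delegate to {request.agent_name}")
    child = registry[request.agent_name]
    result = child(request.subtask)          # parent waits for the child
    log.append(f"child: {result}")
    log.append("parent: resume")
    return f"{task} (informed by: {result})"


def fake_browsing_agent(subtask: str) -> str:
    # Stand-in for a web research agent; returns a canned answer.
    return "found 1 relevant page"


log: list[str] = []
outcome = run_parent("add retry logic", {"browsing": fake_browsing_agent}, log)
```

The log shows the ordering that matters: the parent's delegation entry, the child's result, and only then the parent's resumption.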
OpenHands routes all LLM calls through LiteLLM, a Python library that provides a unified interface to more than 100 model providers. This means that users can configure the platform to use any supported model by setting environment variables or configuration file entries, without changing any agent code. Supported providers include Anthropic (Claude family), OpenAI (GPT-4o and later), Google (Gemini), Mistral, Cohere, and local models served by Ollama, vLLM, or SGLang.
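As a sketch of configuration-driven model selection, the function below turns environment variables into keyword arguments of the kind that could be passed to `litellm.completion`. The variable names `LLM_MODEL`, `LLM_API_KEY`, and `LLM_BASE_URL`, the example model strings, and the local server URL are assumptions made for illustration; the sketch deliberately does not import LiteLLM or make a network call.

```python
import os
from typing import Mapping, Optional


def llm_kwargs(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Build provider-agnostic call arguments from environment-style settings."""
    env = os.environ if env is None else env
    model = env.get("LLM_MODEL")          # assumed variable name
    if not model:
        raise ValueError("LLM_MODEL is not set")
    kwargs = {"model": model}
    if env.get("LLM_API_KEY"):            # assumed variable name
        kwargs["api_key"] = env["LLM_API_KEY"]
    if env.get("LLM_BASE_URL"):           # assumed variable name
        # A custom endpoint, for example a locally served model.
        kwargs["api_base"] = env["LLM_BASE_URL"]
    return kwargs


# Hosted provider: only the model string and key change.
hosted = llm_kwargs({"LLM_MODEL": "openai/gpt-4o", "LLM_API_KEY": "sk-example"})

# Local server: same agent code, different settings.
local = llm_kwargs({"LLM_MODEL": "ollama/llama3",
                    "LLM_BASE_URL": "http://localhost:11434"})
```

The agent code that consumes these arguments stays identical across providers, which is the practical meaning of model agnosticism here.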
When running with local models, OpenHands requires that the model's context window be set to at least 22,000 tokens, as the default window of many local model servers is too small to hold the event stream for multi-step tasks. Some local models may struggle with reliable tool use if they were not fine-tuned on agent-style trajectories.
In March 2025, All Hands AI released OpenHands LM 32B v0.1, a fine-tuned model optimized specifically for use within the OpenHands platform. The model is based on Qwen2.5-Coder-Instruct 32B and was trained using a reinforcement learning framework called SWE-Gym, which generates training data by having the agent attempt tasks on real open-source repositories and fine-tuning on trajectories that were successfully resolved.
OpenHands LM 32B features a 128,000-token context window, making it well-suited for large codebases and long-horizon tasks. Although it has only 32 billion parameters, the team reported that it achieves performance comparable to much larger models, including DeepSeek V3 at 671 billion parameters, on coding agent benchmarks. The model is published on Hugging Face under an open license. All Hands AI also announced a planned 7-billion-parameter variant for users with limited computational resources.
The team also trained and released a critic model built on Qwen2.5-Coder-Instruct 32B that evaluates agent trajectories using temporal difference learning to propagate unit-test success signals backward through action sequences. This critic model is used in the inference-time scaling approach that raised OpenHands's SWE-bench Verified score from 60.6 to 66.4 percent.
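The two ideas in this paragraph can be sketched under simplifying assumptions. With a single reward at the end of a trajectory (tests pass or fail), temporal difference targets converge to the discounted final reward at each earlier step, and best-of-n selection simply keeps the attempt the critic scores highest. The discount factor, the function names, and the example scores below are all illustrative assumptions; the text does not specify the actual training setup.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def td_targets(num_steps: int, final_reward: float, gamma: float = 0.99) -> list[float]:
    """Value targets when the only reward arrives at the last step.

    Step t (0-indexed) gets gamma ** (num_steps - 1 - t) * final_reward,
    so the test outcome is propagated backward with decay.
    """
    return [final_reward * gamma ** (num_steps - 1 - t) for t in range(num_steps)]


def select_best(attempts: Sequence[T], critic: Callable[[T], float]) -> T:
    """Best-of-n: return the attempt with the highest critic score."""
    return max(attempts, key=critic)


# A three-step trajectory whose final patch passed the tests (reward 1.0).
targets = td_targets(3, final_reward=1.0, gamma=0.5)

# Five parallel attempts with made-up critic scores.
scores = {"patch-a": 0.31, "patch-b": 0.87, "patch-c": 0.52,
          "patch-d": 0.12, "patch-e": 0.66}
best = select_best(list(scores), scores.__getitem__)
```

Here `targets` is `[0.25, 0.5, 1.0]` and `best` is `"patch-b"`. The extra cost of this strategy is running every attempt to completion before the critic can choose among them.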
SWE-bench is a benchmark that evaluates software engineering agents on real GitHub issues drawn from popular Python repositories. Each issue requires the agent to identify the relevant code, implement a fix, and produce a patch that passes the repository's test suite. SWE-bench Verified is a curated subset of 500 problems that have been manually validated for correctness of the problem statements.
OpenHands's performance on SWE-bench has improved substantially with each major release:
| Version | Model | SWE-bench Lite | SWE-bench Verified |
|---|---|---|---|
| CodeActAgent v1.5 | GPT-4o | ~7% | N/A |
| CodeActAgent v1.8 | Claude 3.5 Sonnet | 26% | N/A |
| OpenHands (single attempt) | Claude Sonnet | N/A | 60.6% |
| OpenHands (5-attempt scaling) | Claude Sonnet | N/A | 66.4% |
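To make the Verified percentages concrete, the short calculation below converts them into problem counts, assuming both rates are measured over the full 500-problem set. The helper name is illustrative.

```python
def resolved_count(rate_percent: float, total: int = 500) -> int:
    """Convert a resolution rate in percent into a number of solved problems."""
    return round(rate_percent * total / 100)


single_attempt = resolved_count(60.6)
five_attempt = resolved_count(66.4)
gain = five_attempt - single_attempt
```

Under that assumption, the single-attempt run resolves 303 problems and the five-attempt strategy resolves 332, a gain of 29 problems.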
Beyond SWE-bench, the ICLR 2025 paper reported results across 15 benchmarks covering software engineering, web navigation, and general reasoning tasks. On HumanEvalFix, which evaluates bug repair, CodeActAgent v1.5 with GPT-4o achieved 79.3 percent. On GPQA, a graduate-level science reasoning benchmark, CodeActAgent v1.8 with Claude 3.5 Sonnet achieved 52 percent. On the OS portion of AgentBench, CodeActAgent v1.5 achieved 57.6 percent.
OpenHands also ranked first on Multi-SWE-Bench, a variant of SWE-bench covering eight programming languages including Python, JavaScript, TypeScript, Java, and Go. On LiveSWEBench, which uses issues created after the training cutoffs of common models to prevent contamination, OpenHands was among the top-performing systems as of late 2025.
The OpenHands CLI provides terminal-native access to the OpenHands agent. It supports two primary modes of operation:
Interactive mode presents a conversational interface in the terminal in which the user types a task description and observes the agent's actions in real time. The CLI offers save-and-resume functionality so that interrupted sessions can be restarted from the last saved state. It also integrates with the Agent Client Protocol (ACP), enabling IDE plugins and other developer tools to dispatch tasks to a running agent process.
Headless mode suppresses the interactive terminal display and accepts task instructions via command-line arguments or standard input. This mode is designed for use in CI/CD pipelines, cron jobs, and internal automation systems where human interaction is not required. In headless mode, the CLI can be invoked as a step in a GitHub Actions workflow, a Jenkins pipeline, or any shell script, and will return a structured exit code indicating success or failure.
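A pipeline step that relies on the exit code might look like the following sketch. The actual OpenHands CLI invocation and its flags are not shown here; the two placeholder commands use the Python interpreter to simulate a succeeding and a failing step, and `run_headless_step` is a hypothetical helper.

```python
import subprocess
import sys


def run_headless_step(argv: list[str]) -> bool:
    """Run one pipeline step and report success based on its exit code."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the step's stderr so the CI log explains the failure.
        print(completed.stderr, file=sys.stderr)
    return completed.returncode == 0


# Placeholder commands standing in for the real agent invocation.
succeeding_step = [sys.executable, "-c", "raise SystemExit(0)"]
failing_step = [sys.executable, "-c", "raise SystemExit(3)"]
```

In a real workflow the wrapper script would exit non-zero when `run_headless_step` returns `False`, which is what causes the CI job to be marked as failed.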
The CLI also offers a confirmation mode in which the agent pauses before executing any destructive or irreversible action and prompts the user for approval, providing a safety net for local development without the overhead of reviewing every shell command.
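The gating logic behind a confirmation mode can be sketched as a small policy function plus a wrapper. The list of risky programs and the force-push rule are illustrative assumptions, not the rules OpenHands applies, and a real policy would need to be far more thorough.

```python
import shlex
from typing import Callable

# Illustrative policy only: programs treated as destructive in this sketch.
RISKY_PROGRAMS = {"rm", "dd", "mkfs", "shutdown"}


def needs_confirmation(command: str) -> bool:
    """Return True when the command matches the sketch's risk policy."""
    tokens = shlex.split(command)
    if not tokens:
        return False
    if tokens[0] in RISKY_PROGRAMS:
        return True
    # Force pushes rewrite remote history, so treat them as irreversible.
    return tokens[:2] == ["git", "push"] and "--force" in tokens


def execute_with_confirmation(command: str,
                              run: Callable[[str], str],
                              confirm: Callable[[str], bool]) -> str:
    """Ask before risky commands; run everything else immediately."""
    if needs_confirmation(command) and not confirm(command):
        return "skipped: user declined"
    return run(command)
```

Passing `run` and `confirm` as callables keeps the policy testable: in an interactive session `confirm` would prompt the user, while in tests it can simply return a fixed answer.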
In November 2025, All Hands AI published a companion paper titled "The OpenHands Software Agent SDK: A Composable and Extensible Foundation for Production Agents," along with the open-source SDK repository at github.com/OpenHands/software-agent-sdk.
The SDK formalizes the internal architecture of OpenHands into a set of composable primitives that developers can use to build custom agents without adopting the full OpenHands application stack. Key components include:
On SWE-bench Verified, the SDK-based agent achieved a resolution rate of 72 percent when using Claude Sonnet 4.5 with extended thinking enabled. On the GAIA benchmark for general assistant capabilities, it achieved 67.9 percent accuracy.
In November 2024, All Hands AI launched All Hands Online (beta), a hosted version of OpenHands accessible at app.all-hands.dev. The cloud service removes the need for users to run Docker locally and provides several features that are difficult to replicate in a self-hosted deployment:
The hosted service is built on the same open-source codebase, and the team has committed that OpenHands will remain free and open-source for self-hosted use. The cloud service monetizes through a subscription model that adds user management, team coordination, and enterprise security features on top of the open core.
All Hands AI has also announced partnerships with AMD, integrating AMD's Lemonade Server inference runtime with OpenHands to enable locally deployed coding agents on Ryzen AI PCs, emphasizing privacy and cost efficiency for enterprise users who cannot or prefer not to send code to external cloud services.
The platform supports integrations with GitHub, GitLab, Bitbucket, Slack, Jira, and Linear, enabling OpenHands to be embedded into existing engineering workflows.
OpenHands occupies a distinct position in the autonomous coding agent landscape by combining open-source availability, model agnosticism, and full-environment sandboxing.
| Feature | OpenHands | Devin | Claude Code | Cline |
|---|---|---|---|---|
| License | MIT (open source) | Proprietary | Proprietary | MIT (open source) |
| Model support | Any (100+ via LiteLLM) | Proprietary only | Anthropic only | Any |
| Execution environment | Docker sandbox | Proprietary cloud VM | Host machine | VS Code terminal |
| Web browsing | Yes (BrowserGym + Playwright) | Yes | No (by default) | Limited |
| Interface | Web GUI, CLI, SDK | Web app | Terminal (CLI) | VS Code extension |
| Multi-agent delegation | Yes | Limited | No | No |
| Self-hosted | Yes | No | Yes (local) | Yes |
| Cloud hosted | Yes (All Hands Online) | Yes (Devin cloud) | No | No |
| Pricing | Free (self-hosted); subscription (cloud) | $20/month minimum | $20/month (Claude Pro) | Free (API keys required) |
| SWE-bench Verified | 66.4% (5-attempt) | 13.86% reported on a subset of the original SWE-bench (not Verified) | Varies by model | Varies by model |
Devin by Cognition AI was the commercial product that inspired OpenHands. Devin runs in a fully managed cloud environment with its own browser, terminal, and IDE, and is designed for maximum autonomy with minimal human interaction. It is a closed-source, proprietary service. OpenHands offers comparable autonomous operation but in a fully open, auditable codebase that can be self-hosted and configured with any LLM.
Claude Code by Anthropic is a terminal-based coding assistant that runs directly on the user's machine without Docker. It benefits from tight integration with Anthropic's Claude models and handles reasoning across multiple files well. However, it is limited to Anthropic models, does not include built-in web browsing by default, and cannot run agents in the background on remote infrastructure. OpenHands offers greater environment flexibility, web browsing capability, and model choice at the cost of heavier setup requirements.
Cline is an open-source VS Code extension that can autonomously create files, run terminal commands, and apply multi-file changes within the VS Code environment. Like OpenHands, it supports any model provider. Its human-in-the-loop GUI requires approval for each file change or terminal command, which increases safety but reduces the level of autonomous task completion. OpenHands operates at a higher level of autonomy and supports longer-horizon tasks through its sandboxed environment and event-sourced state model, but lacks Cline's tight IDE integration.
OpenHands supports a range of software development use cases that can be delegated to an autonomous agent:
Bug fixing: Given a GitHub issue or a failure description, the agent examines the repository, writes a failing test to reproduce the problem, implements the fix, and opens a pull request. The agent can be triggered automatically when a new issue is filed through GitHub Actions.
Feature implementation: Users describe a desired feature in natural language, and the agent explores the codebase, adds the required code and tests, and submits a pull request for human review.
CI failure diagnosis and repair: When a CI run fails due to a linting error, broken test, or environment issue, the agent can autonomously reproduce the failure, implement the fix, and push a corrected commit, eliminating the repeated push-check-fix cycle.
Merge conflict resolution: When a long-lived feature branch diverges from the main branch, the agent can resolve conflicts, run the relevant unit tests from the workflow configuration, and push a corrected branch.
Documentation generation: The agent reads source code and generates or updates documentation files, API reference pages, or README sections.
Dependency and vulnerability management: The agent scans repositories for outdated dependencies or known vulnerabilities, applies updates, verifies that tests still pass, and opens reviewable pull requests. Early enterprise adopters reported that this use case reduced vulnerability resolution times from days to minutes.
Legacy code migration: The agent can translate legacy codebases, such as COBOL systems, into modern languages with tests and validation, enabling organizations to modernize critical systems.
Data analysis and visualization: The agent can analyze datasets, generate charts, and produce summary reports using Python data science libraries within the Jupyter sandbox.
Infrastructure provisioning: The agent can interact with cloud provider APIs or infrastructure-as-code tools such as Terraform to provision resources, though users are advised to review such actions carefully before applying them to production environments.
Automated code review: Using the Software Agent SDK and GitHub Actions, OpenHands can post inline code review comments directly on pull requests, providing fast automated feedback typically within two to three minutes of a PR being opened.
The project attracted substantial community attention from its first days. Reaching the top of GitHub Trending within weeks of its March 2024 launch demonstrated the extent of developer interest in open alternatives to commercial coding agents. The ICLR 2025 acceptance of the OpenHands paper represented significant academic recognition of the platform's technical contributions, particularly the CodeAct framework and the multi-agent delegation architecture.
The TechCrunch coverage of the seed round in September 2024 helped bring the project to a wider audience outside the open-source AI community. Joff Redfern, a partner at Menlo Ventures, noted: "AI is going to completely change how developers work."
By the time of the Series A in November 2025, the project had demonstrated measurable enterprise value, with adopters at major technology companies reporting meaningful reductions in development toil. The Madrona-led round signaled that institutional investors considered OpenHands's open-source-led, enterprise-monetization model viable.
The research community has engaged with the platform as both a subject of study and a tool for conducting evaluations. The 15-benchmark evaluation framework included in the OpenHands paper has been adopted or referenced by subsequent work on agent evaluation, and the public release of the critic model and the OpenHands LM 32B weights contributed to the reproducibility of benchmark results.
Despite its capabilities, OpenHands has a number of documented limitations:
Context window degradation: Long multi-step tasks can fill the LLM's context window, forcing the platform to trim earlier portions of the event stream. When this trimming occurs, the agent may lose track of earlier instructions, previously established constraints, or the structure of files it has already seen, leading to inconsistent behavior or repeated work.
Sensitivity to model quality: The platform's performance is directly tied to the capabilities of the underlying LLM. Smaller local models frequently fail to use tools reliably, ignore provided instructions after a few steps, or produce repetitive action sequences. The minimum recommended model for productive use is a large frontier model or the purpose-built OpenHands LM 32B.
Single-user design of self-hosted instance: The self-hosted version is intended to be run by a single user on a local workstation. It includes no built-in authentication, user isolation, or access control. Deploying it as a multi-tenant shared service requires significant additional engineering.
Task scope and complexity ceiling: While OpenHands performs well on well-defined, bounded tasks such as fixing a specific GitHub issue, it can struggle with very large-scale refactors that require coordinating changes across dozens of interacting subsystems, or with tasks that require sustained judgment calls about product direction rather than mechanical implementation.
Nondeterminism and hallucination: As with all LLM-based agents, OpenHands can produce incorrect code, misread error messages, or make incorrect assumptions about library APIs. The event stream provides a complete audit trail for detecting such errors, but the platform does not have a built-in mechanism to guarantee that generated code is semantically correct beyond running the available tests.
Docker dependency: The Docker requirement adds setup overhead compared to tools that run directly in the terminal or as IDE extensions. Users on shared servers, certain CI environments, or systems without Docker-in-Docker support may face configuration challenges.
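The context window degradation described above follows from how history trimming works. The sketch below shows one simple strategy, which keeps the first event (the original task) and as many of the most recent events as fit in a token budget. It is an illustrative assumption and not the trimming strategy OpenHands actually uses; events are modeled as `(text, token_count)` pairs.

```python
def trim_history(events: list[tuple[str, int]], budget: int) -> list[tuple[str, int]]:
    """Keep the first event plus the newest events that fit in the budget."""
    if not events:
        return []
    first, rest = events[0], events[1:]
    remaining = budget - first[1]
    kept: list[tuple[str, int]] = []
    # Walk backward from the newest event and stop at the first one
    # that no longer fits, so the kept suffix stays contiguous.
    for event in reversed(rest):
        if event[1] > remaining:
            break
        kept.append(event)
        remaining -= event[1]
    return [first] + kept[::-1]


history = [("task: fix issue", 10), ("read config.py", 40),
           ("edit utils.py", 30), ("run tests", 20)]
trimmed = trim_history(history, budget=70)
```

With a budget of 70 tokens, the `read config.py` event is dropped, so the agent no longer sees what that file contained. This is the mechanism by which earlier constraints and file structure can be forgotten on long tasks.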