# Browser-use agent

> Source: https://aiwiki.ai/wiki/browser-use_agent
> Updated: 2026-05-24
> Categories: AI Agents, AI Tools & Products, Large Language Models
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

A **browser-use agent** (also called an **autonomous web agent** or **LLM-based browser agent**) is a type of [artificial intelligence](/wiki/artificial_intelligence) software agent that operates a standard web browser through its graphical user interface to accomplish goals specified by users in natural language.[^1][^2] The term is also strongly associated with **Browser Use**, an open-source Python library and cloud platform of the same name founded in late 2024 by Magnus Müller and Gregor Žunič. It became one of the fastest-growing developer projects of 2025, reaching more than 95,000 GitHub stars by mid-2026.[^3][^4]

Unlike traditional web scraping, API-based approaches, or simple automation scripts that follow predefined rules, browser-use agents use the reasoning of [large language models](/wiki/large_language_model) (LLMs) combined with browser automation infrastructure to perceive web page content, plan sequences of actions, and execute them across diverse websites without bespoke integration.[^5][^6] They represent a step toward general-purpose digital assistants that can handle real-world web-based tasks such as booking travel, completing forms, researching information, or running enterprise workflows.[^7]

## Terminology and definition

The term "browser-use agent" describes agents that complete tasks by controlling a web browser rather than calling site-specific APIs.[^8] These systems share three traits:

- Render or inspect webpages through visual or DOM-based perception
- Plan multi-step procedures using LLM reasoning
- Execute low-level browser actions (click, type, select, navigate) to achieve goals such as booking, data extraction, or account management[^5]

Browser-use agents are distinguished from [computer-use agents](/wiki/computer-use_agent), which operate in broader desktop environments, by their focus on web interactions inside a browser instance.[^9] In contemporary usage the phrase has also become a brand name: the open-source library `browser-use` published on PyPI and developed by Browser Use Inc. is sometimes referred to simply as "the browser-use agent."[^4]

## History

The concept of browser-use agents emerged from the convergence of advances in [large language models](/wiki/large_language_model) and web automation technologies.

### Predecessor research (2021 to 2023)

OpenAI's [WebGPT](/wiki/webgpt), introduced in December 2021, demonstrated early browser-assisted question-answering with human feedback, fine-tuning GPT-3 to issue search and click commands in a text-based browser environment.[^10] In September 2022, [Adept AI](/wiki/adept_ai) introduced ACT-1, a transformer trained to use common software tools including web browsers, framed as an "action transformer" mapping natural language to UI operations.[^11] During 2023, academic benchmarks WebArena from Carnegie Mellon and Mind2Web from Ohio State University established standardized evaluation frameworks that would shape later progress.[^1][^12] OpenAI's GPT-4V vision API, released in late 2023, made it feasible for general-purpose LLMs to interpret screenshots, opening the door to vision-based web agents.[^13]

### Origin of the Browser Use library (October to November 2024)

Browser Use was created by Magnus Müller and Gregor Žunič, two ETH Zurich graduate students.[^3][^14] According to Müller, the project began as "a weekend experiment to see if LLMs could navigate the web like humans," with an initial prototype built in four days and launched on Hacker News in October 2024.[^3] The pair worked out of ETH Zurich's Student Project House (SPH), a campus incubator. The project went viral after gaining traction on Hacker News and X (formerly Twitter) in late November 2024 and acquired tens of thousands of GitHub stars within weeks, becoming one of the fastest-growing open-source repositories in the agentic-AI ecosystem.[^4][^15] The team was admitted to the Y Combinator Winter 2025 batch.[^15]

### Modern era of computer- and browser-use agents (2024 to 2025)

In October 2024, [Anthropic](/wiki/anthropic) released the Computer Use feature for [Claude](/wiki/claude) 3.5 Sonnet, allowing the model to read pixels of a screen and emit keystrokes and mouse coordinates.[^16] In December 2024, Google DeepMind unveiled [Project Mariner](/wiki/project_mariner), an experimental browser agent built on Gemini 2.0.[^17] On 23 January 2025, OpenAI launched Operator, powered by the Computer-Using Agent (CUA) model, initially as a research preview to ChatGPT Pro subscribers.[^2][^18] OpenAI later integrated Operator capabilities into a unified [ChatGPT Agent](/wiki/chatgpt_agent) mode and announced deprecation of the standalone `operator.chatgpt.com` site.[^19]

### Funding and commercial growth (2025 to 2026)

On 22 March 2025, Browser Use announced a $17 million seed round led by Felicis Ventures, with participation from A Capital, Nexus Venture Partners, SV Angel, Liquid2, Pioneer Fund, and angel investors including Paul Graham.[^20][^21] The round was led at Felicis by Senior Venture Partner Astasia Myers.[^21] At the time of the announcement, the open-source repository had passed 50,000 GitHub stars and counted 15,000 active developers.[^21][^22]

In May 2025, Salesforce acquired Convergence AI, a UK-based competitor whose Proxy product launched browser-based agents; the acquisition was framed as accelerating Salesforce's Agentforce roadmap.[^23] In June 2025, Browserbase, the cloud infrastructure provider behind the Stagehand agent framework, announced a $40 million Series B led by Notable Capital and launched a no-code product called Director.[^24] Later in 2025 Browser Use shipped a managed cloud platform (`browser-use.com`) with Pay-As-You-Go and Enterprise tiers offering SLAs, on-premise deployment, HIPAA compliance, and zero-retention contracts.[^25]

On 27 January 2026, Browser Use released BU 2.0, a proprietary in-house model tuned for web automation, claiming a 12% accuracy improvement over the prior default while preserving speed.[^26] By May 2026 the open-source `browser-use` package had passed 95,000 GitHub stars and 10,000 forks.[^4]

## Rationale

Many real-world workflows remain locked behind human-oriented web interfaces that lack public APIs. Browser-use agents aim to generalize across diverse sites without bespoke integration by:[^27]

1. Interpreting on-page content and structure through perception systems
2. Mapping high-level natural-language instructions to concrete browser interactions
3. Adapting to website changes without reprogramming
4. Reducing fragmented evaluation through unified testbeds

Browser Use co-founder Müller has framed the core technical wager as treating the browser "not just as an interface for humans, but as the execution environment for intelligent agents."[^21]

## Architecture and core components

A browser-use agent follows a perception-reasoning-action loop: it perceives the state of a web page, reasons about the next best action toward its goal, and executes that action. The cycle repeats until the task completes, fails, or reaches a step limit.[^28]

| Component | Description | Technologies | Implementation details |
| --- | --- | --- | --- |
| Perception layer | Understands content and layout of current web page | DOM parsing, CSS selectors, XPath, Accessibility Tree APIs, vision models | DOM extraction for interactive elements; screenshot processing (base64); visual analysis for layout; text extraction and semantic parsing |
| Reasoning and planning layer | Core decision-making powered by LLMs | [GPT-4o](/wiki/gpt_4o), Claude, Gemini, Llama; chain-of-thought; ReAct framework | Task decomposition into sub-goals; multi-step action planning; context management across pages; error detection and recovery |
| Action execution layer | Translates abstract actions into browser commands | Selenium, Playwright, Puppeteer, browser extensions, Chrome DevTools Protocol | Low-level control (click, type, scroll); multi-browser support; headless and visible modes; session management |
| Memory management | Maintains state and context | Vector databases, session storage, [reinforcement learning](/wiki/reinforcement_learning) memories | Working memory for active tasks; persistent memory across sessions; semantic memory; episodic action history |
| Safety and monitoring | Ensures safe operation and compliance | Refusal mechanisms, audit logging, permission systems | Prompt injection prevention; sensitive action gates; user approval workflows; activity logging and rollback |
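
The loop described above can be sketched in a few lines. This is a minimal illustration, not any particular product's implementation: `run_agent_loop`, `LoopResult`, and the three callables are hypothetical names, and the "page" is a toy counter standing in for a real browser and LLM.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    status: str                      # "done", "failed", or "step_limit"
    history: list = field(default_factory=list)


def run_agent_loop(
    perceive: Callable[[], str],
    decide: Callable[[str, list], dict],
    act: Callable[[dict], bool],
    max_steps: int = 10,
) -> LoopResult:
    """Repeat perceive -> decide -> act until done, failure, or step limit."""
    history: list = []
    for _ in range(max_steps):
        state = perceive()                  # perception layer
        action = decide(state, history)     # reasoning layer (an LLM in practice)
        if action["type"] == "done":
            return LoopResult("done", history)
        ok = act(action)                    # action execution layer
        history.append((state, action, ok))
        if not ok:
            return LoopResult("failed", history)
    return LoopResult("step_limit", history)


# Toy stand-ins (assumptions): a "page" that is finished after two clicks.
page = {"clicks": 0}


def perceive() -> str:
    return f"clicks={page['clicks']}"


def decide(state: str, history: list) -> dict:
    return {"type": "done"} if page["clicks"] >= 2 else {"type": "click"}


def act(action: dict) -> bool:
    page["clicks"] += 1
    return True


result = run_agent_loop(perceive, decide, act)
print(result.status, len(result.history))
```

Passing the three layers in as callables keeps the control flow separate from the browser backend and the model, which mirrors the layered split in the table.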

### The Browser Use agent loop

In the open-source `browser-use` library, the agent loop is exposed through a high-level `Agent` class instantiated with a natural-language task and an LLM (the `ChatBrowserUse`, `ChatOpenAI`, `ChatAnthropic`, `ChatGoogle`, or `ChatOllama` wrappers).[^29] When `agent.run()` is called, the library executes the following cycle for each step until completion or `max_steps` is reached:[^30]

1. Capture the current page state (DOM tree, optional screenshot, URL, tab list, scroll position)
2. Convert the DOM into a structured, indexed text representation in which interactive elements are numbered (`[3]<button>Submit</button>`)
3. Build a prompt containing the task description, recent history, and the indexed element list
4. Call the LLM and parse a JSON response specifying one or more actions and an updated internal "next goal" or memory note
5. Dispatch actions through the browser backend (click on indexed element, type into indexed input, scroll, navigate, extract content, switch tab, etc.)
6. Update history and repeat
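
Steps 2 and 4 of the cycle above can be illustrated with a small, self-contained sketch. It is not the library's code: `Element`, `index_elements`, `parse_actions`, and the JSON reply shape (`next_goal`, `actions`, `type`, `index`) are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class Element:
    tag: str
    text: str
    interactive: bool


def index_elements(elements: list[Element]) -> tuple[str, dict[int, Element]]:
    """Number the interactive elements and render them as prompt text."""
    lines, lookup = [], {}
    for el in elements:
        if not el.interactive:
            continue                       # static content gets no index
        idx = len(lookup) + 1
        lookup[idx] = el
        lines.append(f"[{idx}]<{el.tag}>{el.text}</{el.tag}>")
    return "\n".join(lines), lookup


def parse_actions(llm_json: str, lookup: dict[int, Element]) -> list[tuple[str, Element]]:
    """Parse the model's JSON reply and resolve each index to an element."""
    reply = json.loads(llm_json)
    resolved = []
    for action in reply["actions"]:
        idx = action["index"]
        if idx not in lookup:
            raise ValueError(f"unknown element index {idx}")
        resolved.append((action["type"], lookup[idx]))
    return resolved


dom = [
    Element("h1", "Checkout", False),
    Element("input", "Email", True),
    Element("button", "Submit", True),
]
prompt_text, lookup = index_elements(dom)
print(prompt_text)

# Hypothetical model reply; the field names are assumptions.
reply = '{"next_goal": "submit the form", "actions": [{"type": "click", "index": 2}]}'
print(parse_actions(reply, lookup))
```

Rejecting indices that are not in the lookup table is one simple way to catch a model that refers to an element that no longer exists on the page.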

The library's design philosophy departs from screenshot-only agents (such as OpenAI's Operator) by relying primarily on the structured DOM with vision as an optional channel, which the team argues is more deterministic, cheaper in tokens, and easier to debug.[^21][^31] Configuration parameters include `max_actions_per_step` (default 4), `max_failures` (default 3), `use_vision` ("auto", True, or False), `vision_detail_level`, `fallback_llm`, `flash_mode` (skips evaluation and next-goal generation for speed), and a separate `page_extraction_llm` for text extraction.[^29]
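
To make the interplay of `max_actions_per_step`, `max_failures`, and a fallback model concrete, here is a hedged sketch. The parameter names and defaults come from the text above; `AgentSettings`, `run_step`, and the policy of switching to the fallback after the first error are assumptions, not the library's documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional

LLM = Callable[[str], list[str]]


@dataclass
class AgentSettings:
    # Names and defaults mirror the text; the class itself is hypothetical.
    max_actions_per_step: int = 4
    max_failures: int = 3


def run_step(
    settings: AgentSettings,
    primary_llm: LLM,
    fallback_llm: Optional[LLM],
    prompt: str,
) -> list[str]:
    """Ask a model for actions, tolerating up to max_failures failed calls."""
    llm = primary_llm
    failures = 0
    while failures < settings.max_failures:
        try:
            actions = llm(prompt)
        except RuntimeError:
            failures += 1
            if fallback_llm is not None:
                llm = fallback_llm       # assumed policy: switch after an error
            continue
        # Cap how many actions a single step may dispatch.
        return actions[: settings.max_actions_per_step]
    raise RuntimeError(f"gave up after {failures} failed model calls")


def flaky(prompt: str) -> list[str]:
    raise RuntimeError("rate limited")


def steady(prompt: str) -> list[str]:
    return [f"action_{i}" for i in range(6)]


print(run_step(AgentSettings(), flaky, steady, "fill in the form"))
```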

### Playwright to CDP migration

The original 2024 implementation built actions on top of Microsoft's Playwright cross-browser framework.[^32] In August 2025 the team announced a rewrite that drops Playwright and speaks Chrome DevTools Protocol (CDP) directly.[^33] In a public blog post the team described the move as "leaving the curse of abstraction," citing several reasons: the extra hop through Playwright's Node.js websocket server added latency; Playwright's synchronous "update view between actions" model fought against CDP's event-driven nature; and cross-origin iframe support and async reactions were difficult to retrofit.[^33]

The CDP rewrite introduced an event-driven architecture in which "watchdog" services subscribe to CDP events. Examples documented by the project include a `downloads_watchdog` that monitors spontaneous file downloads, a `crash_watchdog` that detects renderer crashes via a single CDP event subscription, and dialog watchdogs that auto-handle `beforeunload` and JavaScript modals.[^33] The DOM extraction pipeline produces "super-selectors" that combine target ID, frame ID, backend node ID, position data, and fallback CSS selectors, enabling reliable element tracking across cross-origin iframes.[^33] The CLI 2.0 release (March 2025) reported approximately 50 ms command latency via a persistent background daemon and a roughly 50% reduction in token use compared to prior versions.[^34]
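
The watchdog pattern can be sketched with a tiny in-process event bus. This is an illustration under stated assumptions: `EventBus` stands in for a real CDP session, the class names are hypothetical, and only the event names (`Target.targetCrashed`, `Browser.downloadWillBegin`) and their `targetId` and `suggestedFilename` fields are taken from the Chrome DevTools Protocol.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Tiny stand-in for a CDP session: handlers subscribe by event name."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, params: dict) -> None:
        for handler in self._handlers.get(event, []):
            handler(params)


class CrashWatchdog:
    """Records crashed targets from a single event subscription."""

    def __init__(self, bus: EventBus) -> None:
        self.crashed_targets: list[str] = []
        bus.on("Target.targetCrashed", self._on_crash)

    def _on_crash(self, params: dict) -> None:
        self.crashed_targets.append(params["targetId"])


class DownloadsWatchdog:
    """Tracks downloads the page started without being asked to."""

    def __init__(self, bus: EventBus) -> None:
        self.pending: list[str] = []
        bus.on("Browser.downloadWillBegin", self._on_download)

    def _on_download(self, params: dict) -> None:
        self.pending.append(params["suggestedFilename"])


bus = EventBus()
crash = CrashWatchdog(bus)
downloads = DownloadsWatchdog(bus)

# Simulated browser events; a real session would deliver these over a websocket.
bus.emit("Browser.downloadWillBegin", {"guid": "g1", "suggestedFilename": "report.pdf"})
bus.emit("Target.targetCrashed", {"targetId": "T1", "status": "crashed", "errorCode": 11})
print(downloads.pending, crash.crashed_targets)
```

Because each watchdog reacts to events as they arrive, nothing has to poll the page between agent actions, which is the contrast with a synchronous model that the paragraph above draws.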

## Technical implementation

### Browser automation backends

| Framework | Primary use case | Advantages | Limitations | Browser-use agent adoption |
| --- | --- | --- | --- | --- |
| Playwright | Cross-browser automation | Fast, reliable, modern API, built-in waiting | Newer ecosystem; extra abstraction layer | Default for many agents; was used by browser-use until mid-2025[^32] |
| Selenium | Traditional web testing | Mature, wide language support | Slower, more complex setup | Legacy support |
| Puppeteer | Chrome/Chromium control | Direct CDP access, lightweight | Chrome-only | Specialized use cases |
| Chrome DevTools Protocol (CDP) | Low-level browser control | Maximum control, event-driven | Complex, browser-specific | Used directly by browser-use from August 2025[^33] |

### Processing modes

| Mode | Description | Token usage | Speed | Accuracy | Best for |
| --- | --- | --- | --- | --- | --- |
| Snapshot mode | Uses accessibility tree or indexed DOM for element identification | Low (500 to 2K) | Fast (less than 1s) | High for simple pages | Form filling, standard layouts |
| Vision mode | Processes screenshots for visual understanding | High (5K to 15K) | Slow (2 to 5s) | High for complex layouts | Dynamic content, visual elements |
| Hybrid mode | Combines DOM parsing with visual processing | Medium (2K to 8K) | Medium (1 to 3s) | Highest overall | General-purpose automation |
| Streaming mode | Continuous observation and action | Very high | Real-time | Variable | Interactive applications |

### Language model integration

Browser-use agents support various LLM providers with different capabilities:[^17]

| Provider | Models | Vision support | Cost (per 1M tokens) | Latency | Best use case |
| --- | --- | --- | --- | --- | --- |
| [OpenAI](/wiki/openai) | [GPT-4o](/wiki/gpt_4o), GPT-4-turbo, o-series | Yes | $5 to 15 | Low | Production systems |
| [Anthropic](/wiki/anthropic) | [Claude 3.5 Sonnet](/wiki/claude_3_5_sonnet), Claude 3 Opus, Claude Sonnet 4 | Yes | $3 to 15 | Low | Complex reasoning |
| Google | Gemini 1.5 Pro, Gemini 2.0, Gemini 2.5 | Yes | $3.5 to 7 | Low | Multimodal tasks |
| Open source | Llama 3, Mistral, Qwen | Limited | $0.5 to 2 | Variable | Cost-sensitive applications |

The Browser Use library exposes wrapper classes (`ChatOpenAI`, `ChatAnthropic`, `ChatGoogle`, `ChatGroq`, `ChatOllama`) plus its own `ChatBrowserUse` client that defaults to the in-house BU 2.0 model.[^29][^26] In April and May 2026 the project removed `litellm` from core dependencies in response to a supply chain incident, while keeping a `ChatLiteLLM` wrapper for users who install it separately.[^35]

## Performance benchmarks

### Standardized evaluation frameworks

| Benchmark | Focus area | Task count | Characteristics | Key metrics |
| --- | --- | --- | --- | --- |
| [WebArena](/wiki/webarena) | Realistic multi-site environment | 812 tasks | Self-hostable sites across e-commerce, CMS, social platforms; execution-based evaluation | Task success rate, efficiency score[^1] |
| [Mind2Web](/wiki/mind2web) | Cross-website generalization | 2,350 tasks | 137 websites, real-world task diversity, action sequence annotation | Element accuracy, action F1 score[^12] |
| WebVoyager | Live website interaction | 643 tasks | Amazon, GitHub, Google Maps, real-time execution | End-to-end success rate[^18] |
| [VisualWebArena](/wiki/visualwebarena) | Multimodal and visual tasks | 910 tasks | Image-heavy tasks, visual grounding | Visual element accuracy[^36] |
| BrowserGym | Unified ecosystem | 5,000+ tasks | Standardized obs/action spaces, cross-benchmark evaluation | Aggregate performance score[^27] |
| WebShop | E-commerce navigation | 12,087 instructions | Product search and selection, attribute matching | Purchase success rate[^37] |
| OSWorld | Full OS control | 369 tasks | Ubuntu, Windows, macOS environments | Cross-platform success rate[^18] |
| Online-Mind2Web | Real websites, live tasks | 300 tasks | 136 real websites; binary scoring; live successor to the offline Mind2Web dataset | Human-evaluated success rate[^38] |
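
The metric families named in the table can be illustrated with simplified definitions. The functions below are a sketch, not any benchmark's official scorer: `token_f1` in particular is a plain token-overlap F1 used here as a stand-in for an action F1 score.

```python
from collections import Counter


def success_rate(outcomes: list[bool]) -> float:
    """Share of tasks judged successful (binary scoring)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def element_accuracy(predicted: list[str], gold: list[str]) -> float:
    """Fraction of steps where the chosen element matches the annotation."""
    if not gold:
        return 0.0
    hits = sum(p == g for p, g in zip(predicted, gold))
    return hits / len(gold)


def token_f1(predicted: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and an annotated action string."""
    pred_tokens, gold_tokens = predicted.split(), gold.split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(success_rate([True, True, False, True]))
print(element_accuracy(["e1", "e7", "e3"], ["e1", "e2", "e3"]))
print(round(token_f1("TYPE new york", "TYPE new york city"), 3))
```

End-to-end success is a much stricter measure than step-level element accuracy: a run can pick the right element on most steps and still fail the task.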

### Reported comparative performance

| Agent / model | WebArena | WebVoyager | Mind2Web | OSWorld | Online-Mind2Web |
| --- | --- | --- | --- | --- | --- |
| Human baseline | 78.2% | 90.0% | 85.3% | 72.4% | n/a |
| Browser Use (open source, GPT-4o) | 51.2% | 89.1%[^31] | 73.4% | n/a | n/a |
| Browser Use Auto-Research (2025) | n/a | n/a | n/a | n/a | 97.7%[^38] |
| OpenAI CUA (Operator) | 58.1% | 87.0% | 76.2% | 38.1% | 61.3%[^38] |
| Anthropic Computer Use | 45.3% | 56.0% | 62.1% | 22.0% | n/a |
| Project Mariner (Google) | 52.4% | 83.5% | 71.3% | n/a | n/a |
| Agent S2 (Simular, OSWorld 50-step) | n/a | n/a | n/a | 34.5%[^39] | n/a |

The 89.1% WebVoyager number Browser Use reported in early 2025 has been disputed: independent evaluations by Browserable and others produced lower numbers (77.3% self-reported, 60.2% LLM-verified) on the same tasks, and competing projects such as Magnitude have reported 93.9% on WebVoyager.[^31][^40] On Online-Mind2Web (a successor benchmark with 300 tasks on 136 live websites), Browser Use reported 97.7% success rate using its Auto-Research approach built on the Claude Agent SDK, compared with 61.3% for OpenAI's Operator and 28 to 40% for most other agents.[^38]

The Online-Mind2Web paper, titled "An Illusion of Progress? Assessing the Current State of Web Agents" (Xue et al., 2025), argued that earlier benchmarks systematically overestimated real-world agent ability because of impossible tasks, drifted website content, and inadequate evaluators.[^41]

## Major implementations

### Browser Use library

`browser-use` is an open-source MIT-licensed Python library that enables LLM-powered browser interaction via natural language.[^4] Notable characteristics include:

- **Statistics**: more than 95,000 GitHub stars and 10,700+ forks as of May 2026, with 125+ tagged releases since first publication.[^4]
- **Codebase**: approximately 98% Python; minimum Python 3.11; installable via `pip install browser-use` or via the `uv` package manager.[^4]
- **Features**: multi-LLM support, indexed-DOM perception with optional vision, multi-tab support, persistent profile authentication, custom-tool registration, output schema validation, file upload containment, sensitive-data masking, fallback LLM routing.[^29]
- **Cloud product**: `cloud.browser-use.com` with typed Python and TypeScript SDKs, pay-as-you-go pricing (~$0.01 per task initialization plus per-step LLM cost), live session preview, and Enterprise plans with on-premise deployment and HIPAA support.[^25][^42]

### OpenAI Operator and ChatGPT Agent

Released 23 January 2025, [OpenAI Operator](/wiki/openai_operator) is powered by the Computer-Using Agent (CUA) model and combines GPT-4o's vision with reinforcement learning:[^2][^18]

- Architecture: iterative perception, reasoning, action loop with self-correction
- Safety: refusal mechanisms, user approval gates for sensitive actions, takeover mode for credentials
- Performance: 58.1% on WebArena, 87% on WebVoyager
- Availability: initially $200/month ChatGPT Pro tier in the United States; later folded into [ChatGPT Agent](/wiki/chatgpt_agent) mode with deprecation of the standalone `operator.chatgpt.com` site[^19]

### Anthropic Computer Use and Claude for Chrome

Released October 2024, [Anthropic Computer Use](/wiki/anthropic_computer_use) lets Claude models interact with computer interfaces through visual perception and simulated input.[^16] On 26 August 2025, Anthropic launched a research preview of Claude for Chrome, a browser extension that lets Claude operate the user's actual Chrome session.[^43] The extension entered general availability for Claude Pro, Team, Enterprise, and Max subscribers in late December 2025 with site-level permissions, action confirmations for high-risk operations, and built-in prompt injection defenses.[^44][^45]

### Google Project Mariner and Gemini agents

[Project Mariner](/wiki/project_mariner) is an experimental browser agent from Google DeepMind built on Gemini 2.0 and later Gemini 2.5, focused on multimodal understanding and announced in December 2024.[^17] Google brought aspects of Mariner into its Gemini app and Project Astra during 2025 and 2026, and added an Agent Mode to Chrome in 2026 alongside agentic web protocols (the Agent Payments Protocol, Agent2Agent, and Trust Tokens for Agents).[^46]

### Stagehand and Browserbase

Stagehand is a TypeScript framework from Browserbase that adds AI methods (`act`, `extract`, `observe`) to existing Playwright code, with a hybrid model where deterministic Playwright handles predictable flows and AI commands handle ambiguous steps.[^47] Browserbase, the cloud infrastructure platform behind Stagehand, raised a $40 million Series B in June 2025 and launched a no-code product called Director that emits Stagehand scripts from natural-language goals.[^24]

### Skyvern

Skyvern is an open-source agent that treats the DOM as unreliable and instead feeds screenshots to vision models, with native two-factor authentication, CAPTCHA support, structured schema-based extraction, and a no-code workflow builder.[^48] Skyvern is often cited as a fork-friendly alternative to Browser Use for form-heavy automation.[^49]

### Other notable systems

- **Convergence Proxy**: a browser-agent system from a UK startup acquired by Salesforce in May 2025 for an undisclosed amount, integrated into the Agentforce platform.[^23]
- **Agent S2 (Simular)**: an open compositional generalist-specialist framework that achieves 34.5% on OSWorld's 50-step evaluation, surpassing OpenAI CUA and UI-TARS in that setting.[^39]
- **Magnitude**: an open-source web agent that reported 93.9% on WebVoyager and contested Browser Use's leaderboard claims.[^40]
- **Browserable**: a managed AI browser agent that reported 90.4% on WebVoyager and 65% on Online-Mind2Web.[^50]
- **Cloudflare Browser Run**: a managed remote-browser service launched in 2025 to give agents an isolated Chromium with per-request billing.[^51]

## Applications and use cases

### Enterprise automation

Browser-use agents are increasingly positioned as a successor to traditional Robotic Process Automation (RPA), with applications including:

- Business process automation across portals without dedicated APIs
- Cross-platform data extraction and synchronization
- Automated regulatory checks and reporting
- Supply chain workflows in vendor portals[^25]

### E-commerce and services

- Real-time competitor analysis and price monitoring
- Multi-marketplace stock synchronization
- Automated order tracking and status updates
- Review aggregation with downstream [sentiment analysis](/wiki/sentiment_analysis)

### Research and analysis

- Literature review and citation gathering
- Market intelligence and competitor monitoring
- Earnings report extraction
- Patent and prior-art searches

### Quality assurance

- End-to-end user journey validation[^52]
- WCAG accessibility checks
- Cross-browser compatibility
- Performance and load testing

### Personal productivity

- Travel planning across multiple booking sites
- Resume parsing and job-application submission
- News and content aggregation
- Cross-platform social media management

## Challenges and limitations

### Dynamic content handling

Modern single-page applications with asynchronous loading, virtual scrolling, and lazy loading complicate element discovery; AJAX-heavy interfaces require sophisticated waiting strategies.[^53] Browser Use's CDP migration was motivated in part by the inadequacy of synchronous Playwright waits for the agent's event-driven needs.[^33]
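
One common waiting strategy is to poll until the page stops changing. The sketch below assumes a hypothetical `snapshot` callable (for example, a hash of the serialized DOM) and is not taken from any specific framework; the `sleep` parameter is injectable so the logic can be exercised without real delays.

```python
import time
from typing import Callable


def wait_until_stable(
    snapshot: Callable[[], str],
    quiet_checks: int = 3,
    max_checks: int = 50,
    interval_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the snapshot is unchanged for `quiet_checks` polls in a row.

    Returns False if the page is still changing after `max_checks` polls.
    """
    previous = snapshot()
    unchanged = 0
    for _ in range(max_checks):
        sleep(interval_s)
        current = snapshot()
        unchanged = unchanged + 1 if current == previous else 0
        if unchanged >= quiet_checks:
            return True
        previous = current
    return False


# Simulated page that finishes loading after two updates.
frames = iter(["a", "ab", "abc", "abc", "abc", "abc"])
print(wait_until_stable(lambda: next(frames), sleep=lambda _: None))
```

Polling is simple but adds latency on every action; an event-driven design can instead react to network-idle or DOM-mutation notifications.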

### Element identification

Shadow DOM and iframes create isolation barriers; dynamically generated IDs defeat naive selectors; and visually similar elements require disambiguation. Browser Use addresses these through cross-origin iframe handling and "super-selector" tuples (target ID, frame ID, backend node ID, position, fallback selector).[^33][^54]
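
A "super-selector" of this shape can be modelled as a small record plus a resolution routine. The field list follows the tuple described above; the `SuperSelector` class, the `resolve` function, and especially the fallback order (node ID, then an unambiguous CSS match, then position) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperSelector:
    target_id: str
    frame_id: str
    backend_node_id: int
    position: tuple[float, float]    # assumed to be the element's centre (x, y)
    fallback_css: str


def resolve(
    selector: SuperSelector,
    live_nodes: dict[tuple[str, str], set[int]],
    css_matches: dict[str, int],
) -> tuple[str, object]:
    """Try the most specific handle first, then degrade step by step."""
    nodes = live_nodes.get((selector.target_id, selector.frame_id), set())
    if selector.backend_node_id in nodes:
        return ("backend_node_id", selector.backend_node_id)
    # Only trust the CSS selector when it matches exactly one element.
    if css_matches.get(selector.fallback_css, 0) == 1:
        return ("css", selector.fallback_css)
    return ("position", selector.position)


sel = SuperSelector("T1", "F1", 42, (10.0, 20.0), "button#buy")

# Node still present in its frame: use the node ID directly.
print(resolve(sel, {("T1", "F1"): {42, 43}}, {}))
# Node was re-rendered: fall back to the CSS selector.
print(resolve(sel, {("T1", "F1"): {7}}, {"button#buy": 1}))
```

Keying live nodes by target and frame together is what lets the same routine address elements inside cross-origin iframes.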

### State management

Persistent sessions across page transitions, authentication and two-factor flows, and unexpected logouts or timeouts present recurring difficulties.[^55] Browser Use exposes real browser profiles for authentication, and Skyvern provides native 2FA and TOTP integration.[^48]

### Error recovery and reliability

Documented success rates on real-world tasks ranged from 60 to 90% in 2025; reliability gaps remain the dominant blocker for production deployment.[^56] Specific failure modes include CAPTCHA handling, modal and popup dialogs, network failures, and rate limiting.[^57]

### Performance and cost

| Issue | Impact | Current solutions | Future approaches |
| --- | --- | --- | --- |
| LLM inference latency | 2 to 5 second delays per action | Caching, batching, persistent daemons[^34] | Edge deployment, model distillation |
| Token consumption | $0.10 to $1.00 per complex task | Efficient prompting, DOM-only mode | Specialized models (BU 2.0), compression[^26] |
| Memory limits | Context window constraints | Summarization, pruning | Extended context, hierarchical memory |
| Reliability | 60 to 90% success rates | Retry logic, fallback LLMs | Reinforcement learning, self-improvement |
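
A back-of-the-envelope estimate shows how per-step figures add up to per-task cost and latency. The function names and the example inputs (20 steps, 5,000 tokens per step, $5 per million tokens, 3 seconds per step) are illustrative assumptions, not measurements.

```python
def estimate_task_cost(steps: int, tokens_per_step: int, usd_per_million_tokens: float) -> float:
    """Cost in USD, assuming every step sends roughly the same number of tokens."""
    return steps * tokens_per_step * usd_per_million_tokens / 1_000_000


def estimate_task_latency(steps: int, seconds_per_step: float) -> float:
    """Total model latency in seconds, ignoring page-load time."""
    return steps * seconds_per_step


# 20 steps x 5,000 tokens = 100,000 tokens; at $5 per 1M tokens that is $0.50.
cost = estimate_task_cost(steps=20, tokens_per_step=5_000, usd_per_million_tokens=5.0)
latency = estimate_task_latency(steps=20, seconds_per_step=3.0)
print(f"${cost:.2f}, {latency:.0f} s")
```

Under these assumptions a 20-step task costs about $0.50 and spends about a minute waiting on the model, which sits inside the ranges quoted in the table.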

### Safety, security, and ethical concerns

[Prompt injection](/wiki/prompt_injection) attacks from malicious web content are widely considered the dominant security risk for browser-use agents.[^58][^59] Notable 2025 vulnerabilities include EchoLeak (CVE-2025-32711), a zero-click vulnerability in Microsoft 365 Copilot that allowed remote attackers to exfiltrate data via email content;[^59] CurXecute (CVE-2025-54135), a remote code execution flaw in Cursor IDE triggered by malicious README content (CVSS 9.8);[^60] and a prompt-injection-via-navigation flaw in Perplexity Comet disclosed in October 2025.[^61] OpenAI stated in December 2025 that prompt injection in AI browsers is "unlikely to ever be fully solved," and the UK National Cyber Security Centre echoed that this class of attack "may be a problem that is never fully fixed."[^62][^63] A 2026 Google report noted a 32% relative increase in malicious prompt-injection activity between November 2025 and February 2026.[^64] [Indirect prompt injection](/wiki/indirect_prompt_injection), which delivers attacker payloads through third-party content (web pages, documents, emails) rather than direct user prompts, was identified by OWASP as the #1 AI security threat for 2026.[^65]

Other concerns include credential theft, cross-site scripting via injected payloads, data exfiltration, processing of sensitive personal information, screenshot capture of private data, audit-trail retention, automated spam, large-scale unauthorized scraping, and terms-of-service violations.[^66] Browser Use's documentation includes a `sensitive_data` parameter that masks specified strings in LLM prompts and screenshots.[^29]
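
The idea behind masking can be sketched as a pair of helper functions. This is not the library's implementation: `mask_secrets`, `unmask_action`, and the `<secret>name</secret>` placeholder format are assumptions for illustration.

```python
def mask_secrets(text: str, secrets: dict[str, str]) -> str:
    """Replace each secret value with a named placeholder before prompting."""
    # Replace longer values first so a secret containing another is not split.
    for name, value in sorted(secrets.items(), key=lambda kv: -len(kv[1])):
        if value:
            text = text.replace(value, f"<secret>{name}</secret>")
    return text


def unmask_action(action_text: str, secrets: dict[str, str]) -> str:
    """Substitute real values back in just before the browser types them."""
    for name, value in secrets.items():
        action_text = action_text.replace(f"<secret>{name}</secret>", value)
    return action_text


secrets = {"user": "alice@example.com", "pw": "hunter2"}

prompt = mask_secrets("Log in as alice@example.com with password hunter2", secrets)
print(prompt)

# The model only ever sees and emits placeholders.
print(unmask_action("type <secret>pw</secret> into [4]", secrets))
```

With this split the model plans with placeholders only, and the real values are reintroduced at the last moment on the local side.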

## 2025 to 2026 developments

### Maturation of the Browser Use ecosystem

The 2025 to 2026 period transformed Browser Use from a viral library into a venture-backed company with a commercial cloud and an in-house model:

- **March 2025**: $17M seed led by Felicis Ventures closes; Y Combinator W25 graduation.[^20][^15]
- **March 2025**: CLI 2.0 ships, replacing Playwright internals with a CDP daemon and reporting ~50ms command latency and ~50% token reduction.[^34]
- **August 2025**: Public "Closer to the metal: leaving Playwright for CDP" rewrite announced; introduces watchdog services and event-driven DOM extraction.[^33]
- **Late 2025**: Cloud product reaches general availability with typed Python and TypeScript SDKs, free tier, and Enterprise contracts.[^25][^42]
- **27 January 2026**: BU 2.0 model launches, claimed +12% accuracy at unchanged latency; supports bring-your-own-key and reusable automation script generation.[^26]
- **April to May 2026**: Removal of `litellm` from core dependencies after a supply-chain incident; release of `install-lite.sh`; security hardening for daemon socket access and file handling.[^35]

### Competitive landscape consolidation

The browser-use agent market consolidated rapidly during this period:

- Salesforce acquired Convergence AI in May 2025.[^23]
- Browserbase closed a $40M Series B in June 2025 and launched Director.[^24]
- Anthropic shipped Claude for Chrome as a research preview in August 2025 and general availability in December 2025.[^43][^44]
- Microsoft expanded Copilot Studio with browser-control actions across Microsoft 365.[^67]
- OpenAI merged Operator into the broader ChatGPT Agent product.[^19]
- The Online-Mind2Web paper argued that earlier benchmark progress had been illusory, prompting a new wave of evaluation work.[^41]

### Standards and protocols

Three related agent protocols emerged in this period. The **Model Context Protocol (MCP)**, introduced by Anthropic in November 2024 and adopted broadly by 2025, gave LLM hosts a standard interface for tools, including browser tools.[^68] The **Agent2Agent (A2A)** protocol, released by Google in 2025, specified interoperability between agents.[^69] The **Agent Payments Protocol (AP2)**, also from Google, defined trusted-credentials handshakes for agent-initiated payments.[^46] Browser Use has integrated MCP via a server that exposes its actions as MCP tools.[^29]
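
MCP messages are JSON-RPC 2.0, and a host invokes a tool with the `tools/call` method. The sketch below only builds the request payloads; the tool name `browser_navigate` and its `url` argument are hypothetical and are not taken from the Browser Use MCP server.

```python
import json
from itertools import count

_request_ids = count(1)


def mcp_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request as used by MCP."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params}
    )


def list_tools() -> str:
    """Ask the server which tools it exposes."""
    return mcp_request("tools/list", {})


def call_tool(name: str, arguments: dict) -> str:
    """Invoke one tool by name with a JSON object of arguments."""
    return mcp_request("tools/call", {"name": name, "arguments": arguments})


print(list_tools())
# Hypothetical tool name and argument, for illustration only.
message = call_tool("browser_navigate", {"url": "https://example.com"})
print(message)
```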

### Performance frontier

By mid-2026 the state of the art on Online-Mind2Web was held by Browser Use's Auto-Research configuration at 97.7%, but with the caveat that two tasks were judged "impossible" and that the agentic judge was a Claude-based system aligned with human reviewers.[^38] On OSWorld the open Agent S2 from Simular led at 34.5% (50 steps).[^39] On WebVoyager, project-internal numbers from Magnitude (93.9%), Browserable (90.4%), and Browser Use (89.1%) approached or exceeded the reported human baseline but were difficult to compare across independent harnesses.[^40][^50][^31]

### Adoption signs

By May 2026, the open-source `browser-use` repository had:

- More than 95,000 GitHub stars and 10,700+ forks[^4]
- Active integration into [agentic workflow](/wiki/agentic_workflow) tooling, [LangChain](/wiki/langchain) examples, and the Claude Agent SDK[^38]
- A Y Combinator company profile describing it as the "leading open-source web agent"[^15]
- Production deployments at "tens of thousands" of developer organizations, per Felicis's investment thesis update[^21]

## Notable research projects

### Academic initiatives

- **[WebGPT](/wiki/webgpt) (OpenAI, 2021)**: pioneering browser-assisted question answering with human feedback; established foundations for modern browser-use agents.[^10]
- **[Mind2Web](/wiki/mind2web) (OSU, 2023)**: large-scale dataset and framework for generalist web agents, with cross-website generalization tests.[^12]
- **Online-Mind2Web (OSU, 2025)**: live successor benchmark; introduced the "Illusion of Progress" critique of earlier evaluations.[^41]
- **[WebArena](/wiki/webarena) (CMU, 2023)**: realistic self-hosted benchmark environment for reproducible agent evaluation.[^1]
- **AgentTuning**: research on enabling generalized agent abilities through fine-tuning of LLMs for web tasks.[^28]
- **Agent S2 (Simular, 2025)**: compositional generalist-specialist framework with Mixture-of-Grounding and Proactive Hierarchical Planning.[^39]
- **Mira benchmark / Mira AI**: a 2025 multi-site benchmark covering 600+ real workflows with emphasis on multi-tab and authenticated tasks (cited by multiple agent vendors during 2025 to 2026).[^49]

### Industry research

- **Adept ACT-1**: universal action transformer for software interface control.[^11]
- **Google multimodal web navigation**: research on instruction-finetuned foundation models for web navigation.[^70]
- **OpenAI Computer-Using Agent (CUA)**: the reinforcement-learning-trained model behind Operator.[^2]
- **Claude for Chrome research preview**: Anthropic's report on per-action confirmations and built-in injection defenses.[^43]

### Open-source frameworks

- **[LangChain](/wiki/langchain) and LlamaIndex**: provide building blocks and tool wrappers commonly used with browser-use agents.[^71]
- **BrowserGym**: unified ecosystem for web-agent research and evaluation.[^27]
- **Skyvern**: open-source vision-first browser automation.[^48]
- **Stagehand**: TypeScript hybrid Playwright + LLM framework from Browserbase.[^47]
- **Magnitude**: open-source agent whose developers claim a state-of-the-art 94% on WebVoyager, contesting Browser Use's leaderboard numbers.[^40]

## Comparison with related technologies

| Aspect | Browser-use agent | [Computer-use agent](/wiki/computer-use_agent) | Traditional RPA | Web scraping |
| --- | --- | --- | --- | --- |
| Scope | Web browsers | Full desktop OS | Predefined workflows | Data extraction only |
| Adaptability | High (LLM-based) | High (LLM-based) | Low (scripted) | Low (rule-based) |
| Setup complexity | Medium | High | High | Low |
| Maintenance | Self-adapting | Self-adapting | Frequent updates | Regular updates |
| Cost | $0.10 to $1.00 per task | $0.50 to $2.00 per task | High initial, low per-task | Low |
| Use cases | General web automation | Any desktop application | Repetitive business processes | Data collection |
| Error handling | Intelligent recovery | Intelligent recovery | Basic retry logic | Minimal |
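
As a rough illustration of the per-task cost row above, the following sketch multiplies the table's ranges by a monthly task volume. The function name `monthly_cost_range` and the workload of 5,000 tasks per month are hypothetical, chosen only for the example; actual costs depend on the model, task length, and provider pricing.

```python
# Per-task cost ranges in USD, copied from the comparison table.
COST_PER_TASK = {
    "browser-use agent": (0.10, 1.00),
    "computer-use agent": (0.50, 2.00),
}


def monthly_cost_range(kind: str, tasks_per_month: int) -> tuple[float, float]:
    """Return the (low, high) monthly spend in USD for one agent type."""
    low, high = COST_PER_TASK[kind]
    # Round to cents so floating-point noise does not leak into the result.
    return (round(low * tasks_per_month, 2), round(high * tasks_per_month, 2))


if __name__ == "__main__":
    # 5,000 tasks per month is an assumed workload, not a figure from the article.
    for kind in COST_PER_TASK:
        low, high = monthly_cost_range(kind, 5_000)
        print(f"{kind}: ${low:,.2f} to ${high:,.2f} per month")
```

Under these assumptions the two ranges overlap, so the per-task figures alone do not settle which approach is cheaper for a given workload.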

## See also

- [Artificial intelligence](/wiki/artificial_intelligence)
- [Large language model](/wiki/large_language_model)
- [AI browser agent](/wiki/ai_browser_agent)
- [Computer-use agent](/wiki/computer-use_agent)
- [Anthropic Computer Use](/wiki/anthropic_computer_use)
- [OpenAI Operator](/wiki/openai_operator)
- [ChatGPT Agent](/wiki/chatgpt_agent)
- [Project Mariner](/wiki/project_mariner)
- [Claude Code Playwright](/wiki/claude_code_playwright)
- [WebArena](/wiki/webarena)
- [VisualWebArena](/wiki/visualwebarena)
- [Mind2Web](/wiki/mind2web)
- [WebGPT](/wiki/webgpt)
- [Agentic workflow](/wiki/agentic_workflow)
- [Prompt injection](/wiki/prompt_injection)
- [Indirect prompt injection](/wiki/indirect_prompt_injection)
- [LangChain](/wiki/langchain)
- [Software agent](/wiki/software_agent)
- [Computer vision](/wiki/computer_vision)
- [Natural language processing](/wiki/natural_language_processing)
- [Reinforcement learning](/wiki/reinforcement_learning)
- [Prompt engineering](/wiki/prompt_engineering)
- [Chain-of-thought prompting](/wiki/chain_of_thought)

## References

[^1]: Zhou, Shuyan et al., "WebArena: A Realistic Web Environment for Building Autonomous Agents", arXiv, 2023-07-25. https://arxiv.org/abs/2307.13854. Accessed 2026-05-24.

[^2]: OpenAI, "Introducing Operator", OpenAI Blog, 2025-01-23. https://openai.com/index/introducing-operator/. Accessed 2026-05-24.

[^3]: Žunič, Gregor, "We Raised $17M to Build the Future of Web for Agents", Browser Use Blog, 2025-03-22. https://browser-use.com/posts/seed-round. Accessed 2026-05-24.

[^4]: Browser Use, "browser-use/browser-use: Make websites accessible for AI agents", GitHub repository. https://github.com/browser-use/browser-use. Accessed 2026-05-24.

[^5]: Deng, Xiang et al., "Mind2Web: Towards a Generalist Agent for the Web", arXiv, 2023-06-09. https://arxiv.org/abs/2306.06070. Accessed 2026-05-24.

[^6]: He, Hongliang et al., "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models", arXiv, 2024-01-25. https://arxiv.org/abs/2401.13919. Accessed 2026-05-24.

[^7]: Nakano, Reiichiro et al., "WebGPT: Browser-assisted question-answering with human feedback", arXiv, 2021-12-17. https://arxiv.org/abs/2112.09332. Accessed 2026-05-24.

[^8]: Yao, Shunyu et al., "ReAct: Synergizing Reasoning and Acting in Language Models", arXiv, 2022-10-06. https://arxiv.org/abs/2210.03629. Accessed 2026-05-24.

[^9]: Anthropic, "Developing a computer use model", Anthropic News, 2024-10-22. https://www.anthropic.com/news/developing-computer-use. Accessed 2026-05-24.

[^10]: OpenAI, "WebGPT: Improving the factual accuracy of language models through web browsing", OpenAI Blog, 2021-12-16. https://openai.com/index/webgpt/. Accessed 2026-05-24.

[^11]: Adept AI, "ACT-1: Transformer for Actions", Adept Blog, 2022-09-14. https://www.adept.ai/blog/act-1. Accessed 2026-05-24.

[^12]: Deng, Xiang et al., "Mind2Web: Towards a Generalist Agent for the Web", Project Page, OSU NLP Group, 2023. https://osu-nlp-group.github.io/Mind2Web/. Accessed 2026-05-24.

[^13]: OpenAI, "GPT-4V(ision) system card", OpenAI, 2023-09-25. https://openai.com/index/gpt-4v-system-card/. Accessed 2026-05-24.

[^14]: Müller, Magnus, "Magnus Müller, Founder of browser-use (YC W25)", LinkedIn profile. https://ch.linkedin.com/in/magnus-mueller. Accessed 2026-05-24.

[^15]: Y Combinator, "Browser Use: Leading open-source web agent project with 50k stars in 3 months", Y Combinator company page. https://www.ycombinator.com/companies/browser-use. Accessed 2026-05-24.

[^16]: Anthropic, "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku", Anthropic News, 2024-10-22. https://www.anthropic.com/news/3-5-models-and-computer-use. Accessed 2026-05-24.

[^17]: Google DeepMind, "Project Mariner: Building the future of human-agent interaction", Google DeepMind Blog, 2024-12-11. https://deepmind.google/discover/blog/. Accessed 2026-05-24.

[^18]: OpenAI, "Computer-Using Agent: Introducing a universal interface for AI to interact with the digital world", OpenAI Blog, 2025-01-23. https://openai.com/index/computer-using-agent/. Accessed 2026-05-24.

[^19]: OpenAI, "Introducing ChatGPT agent: bridging research and action", OpenAI Blog, 2025-07-17. https://openai.com/index/introducing-chatgpt-agent/. Accessed 2026-05-24.

[^20]: Wiggers, Kyle, "Browser Use, the tool making it easier for AI 'agents' to navigate websites, raises $17M", TechCrunch, 2025-03-23. https://techcrunch.com/2025/03/23/browser-use-the-tool-making-it-easier-for-ai-agents-to-navigate-websites-raises-17m/. Accessed 2026-05-24.

[^21]: Myers, Astasia, "Felicis's Seed in Browser Use: Enabling AI Agents to Navigate the Web with Reliable Web Interaction", Felicis Ventures Blog, 2025-03-23. https://www.felicis.com/blog/investing-in-browser-use. Accessed 2026-05-24.

[^22]: Wheatley, Mike, "Browser Use raises $17M to help steer AI agents through the internet", SiliconANGLE, 2025-03-23. https://siliconangle.com/2025/03/23/browser-use-raises-17m-help-steer-ai-agents-internet/. Accessed 2026-05-24.

[^23]: Salesforce, "Salesforce Signs Definitive Agreement to Acquire Convergence.ai", Salesforce News, 2025-05-15. https://www.salesforce.com/news/stories/salesforce-signs-definitive-agreement-to-acquire-convergence-ai/. Accessed 2026-05-24.

[^24]: Browserbase, "Browserbase Launches 'Director' to Automate the Web for Everyone; Announces $40M Series B", PR Newswire, 2025-06-18. https://www.prnewswire.com/news-releases/browserbase-launches-director-to-automate-the-web-for-everyone-announces-40m-series-b-302483761.html. Accessed 2026-05-24.

[^25]: Browser Use, "Pricing", Browser Use website. https://browser-use.com/pricing. Accessed 2026-05-24.

[^26]: Browser Use, "Browser Use Model: BU 2.0", Browser Use Changelog, 2026-01-27. https://browser-use.com/changelog/27-1-2026. Accessed 2026-05-24.

[^27]: Chezelles, Thibault Le Sellier de et al., "BrowserGym: A Unified Ecosystem for Web Agent Research", arXiv, 2024-12-06. https://arxiv.org/abs/2412.05467. Accessed 2026-05-24.

[^28]: Zeng, Aohan et al., "AgentTuning: Enabling Generalized Agent Abilities for LLMs", arXiv, 2023-10-19. https://arxiv.org/abs/2310.12823. Accessed 2026-05-24.

[^29]: Browser Use, "Agent Settings", Browser Use Documentation. https://docs.browser-use.com/customize/agent-settings. Accessed 2026-05-24.

[^30]: Browser Use, "Quickstart", Browser Use Documentation. https://docs.browser-use.com/quickstart. Accessed 2026-05-24.

[^31]: Browser Use, "Browser Use: state of the art Web Agent", Browser Use Blog, 2025-02. https://browser-use.com/posts/sota-technical-report. Accessed 2026-05-24.

[^32]: Microsoft, "Playwright: Fast and reliable end-to-end testing for modern web apps", Playwright Documentation. https://playwright.dev/. Accessed 2026-05-24.

[^33]: Browser Use, "Closer to the Metal: Leaving Playwright for CDP", Browser Use Blog, 2025-08-20. https://browser-use.com/posts/playwright-to-cdp. Accessed 2026-05-24.

[^34]: Browser Use, "browser-use Releases (0.12.x)", GitHub Releases. https://github.com/browser-use/browser-use/releases. Accessed 2026-05-24.

[^35]: Browser Use, "browser-use 0.12.8 release notes", GitHub, 2026-05-23. https://github.com/browser-use/browser-use/releases/tag/0.12.8. Accessed 2026-05-24.

[^36]: Koh, Jing Yu et al., "VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks", arXiv, 2024-01-24. https://arxiv.org/abs/2401.13649. Accessed 2026-05-24.

[^37]: Yao, Shunyu et al., "WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents", arXiv, 2022-07-04. https://arxiv.org/abs/2207.01206. Accessed 2026-05-24.

[^38]: Browser Use, "How we built the best browser agent with Auto-Research", Browser Use Blog. https://browser-use.com/posts/online-mind2web-benchmark. Accessed 2026-05-24.

[^39]: Agashe, Saaket et al., "Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents", arXiv, 2025-04-02. https://arxiv.org/abs/2504.00906. Accessed 2026-05-24.

[^40]: Magnitude Dev, "Magnitude achieves SOTA 94% on WebVoyager benchmark", GitHub. https://github.com/magnitudedev/webvoyager. Accessed 2026-05-24.

[^41]: Xue, Tianci et al., "An Illusion of Progress? Assessing the Current State of Web Agents", arXiv, 2025-04-01. https://arxiv.org/abs/2504.01382. Accessed 2026-05-24.

[^42]: Browser Use, "Browser Use Cloud Quickstart", Browser Use Cloud Documentation. https://docs.cloud.browser-use.com/. Accessed 2026-05-24.

[^43]: Anthropic, "Piloting Claude in Chrome", Anthropic News, 2025-08-26. https://www.anthropic.com/news/claude-for-chrome. Accessed 2026-05-24.

[^44]: Anthropic, "Claude for Chrome Extension", Chrome Web Store. https://chromewebstore.google.com/publisher/anthropic/u308d63ea0533efcf7ba778ad42da7390. Accessed 2026-05-24.

[^45]: Anthropic, "Mitigating the risk of prompt injections in browser use", Anthropic Research. https://www.anthropic.com/research/prompt-injection-defenses. Accessed 2026-05-24.

[^46]: Google for Developers, "15 updates from Google I/O 2026: Powering the agentic web with new capabilities", Chrome for Developers Blog, 2026-05. https://developer.chrome.com/blog/chrome-at-io26. Accessed 2026-05-24.

[^47]: Browserbase, "Stagehand: AI-powered browser automation", GitHub. https://github.com/browserbase/stagehand. Accessed 2026-05-24.

[^48]: Skyvern AI, "Skyvern: open-source browser automation with LLMs and computer vision", GitHub. https://github.com/Skyvern-AI/skyvern. Accessed 2026-05-24.

[^49]: Gonsalvez, Steven, "Browser Tools for AI Agents Part 2: The Framework Wars (browser-use, Stagehand, Skyvern)", DEV Community, 2025. https://dev.to/stevengonsalvez/browser-tools-for-ai-agents-part-2-the-framework-wars-browser-use-stagehand-skyvern-4gn. Accessed 2026-05-24.

[^50]: Browserable, "WebVoyager Benchmark Results", Browserable Blog. https://www.browserable.ai/blog/web-voyager-benchmark. Accessed 2026-05-24.

[^51]: Cloudflare, "Browser Run: give your agents a browser", Cloudflare Blog, 2025. https://blog.cloudflare.com/browser-run-for-ai-agents/. Accessed 2026-05-24.

[^52]: Microsoft Playwright Team, "Playwright Test for end-to-end testing", Microsoft. https://playwright.dev/docs/test-intro. Accessed 2026-05-24.

[^53]: Scrapfly, "Stagehand vs Browser Use: AI Browser Agent Guide", Scrapfly Blog. https://scrapfly.io/blog/posts/stagehand-vs-browser-use. Accessed 2026-05-24.

[^54]: Lightpanda, "CDP Under the Hood: A Deep Dive", Lightpanda Blog. https://lightpanda.io/blog/posts/cdp-under-the-hood. Accessed 2026-05-24.

[^55]: NxCode, "Stagehand vs Browser Use vs Playwright: AI Browser Automation Compared (2026)", NxCode Resources, 2026. https://www.nxcode.io/resources/news/stagehand-vs-browser-use-vs-playwright-ai-browser-automation-2026. Accessed 2026-05-24.

[^56]: Steel.dev, "AI Browser Agent Leaderboards", Steel.dev. https://leaderboard.steel.dev/. Accessed 2026-05-24.

[^57]: Skyvern, "Browser Use vs Stagehand: Which is Better? (February 2026)", Skyvern Blog, 2026-02. https://www.skyvern.com/blog/browser-use-vs-stagehand-which-is-better/. Accessed 2026-05-24.

[^58]: Greshake, Kai et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection", arXiv, 2023-02-23. https://arxiv.org/abs/2302.12173. Accessed 2026-05-24.

[^59]: Aim Security, "EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System", arXiv, 2025-09. https://arxiv.org/abs/2509.10540. Accessed 2026-05-24.

[^60]: National Vulnerability Database, "CVE-2025-54135 Cursor IDE remote code execution via prompt injection in README", NIST NVD, 2025. https://nvd.nist.gov/vuln/detail/CVE-2025-54135. Accessed 2026-05-24.

[^61]: Brave Software, "Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers", Brave Blog, 2025-10. https://brave.com/blog/unseeable-prompt-injections/. Accessed 2026-05-24.

[^62]: Wiggers, Kyle, "OpenAI says AI browsers may always be vulnerable to prompt injection attacks", TechCrunch, 2025-12-22. https://techcrunch.com/2025/12/22/openai-says-ai-browsers-may-always-be-vulnerable-to-prompt-injection-attacks/. Accessed 2026-05-24.

[^63]: Hutton, Catherine, "OpenAI says prompt injections that can trick AI browsers may never be fully solved", Fortune, 2025-12-23. https://fortune.com/2025/12/23/openai-ai-browser-prompt-injections-cybersecurity-hackers/. Accessed 2026-05-24.

[^64]: Google Security, "AI threats in the wild: The current state of prompt injections on the web", Google Security Blog, 2026. https://blog.google/security/prompt-injections-web/. Accessed 2026-05-24.

[^65]: Securance, "Prompt injection: the OWASP #1 AI threat in 2026", Securance Blog, 2026. https://www.securance.com/blog/prompt-injection-the-owasp-1-ai-threat-in-2026/. Accessed 2026-05-24.

[^66]: Malwarebytes Labs, "AI browsers could leave users penniless: A prompt injection warning", Malwarebytes, 2025-08. https://www.malwarebytes.com/blog/news/2025/08/ai-browsers-could-leave-users-penniless-a-prompt-injection-warning. Accessed 2026-05-24.

[^67]: Microsoft, "Copilot Studio: build agents for the web and Microsoft 365", Microsoft Copilot Studio Documentation. https://learn.microsoft.com/en-us/microsoft-copilot-studio/. Accessed 2026-05-24.

[^68]: Anthropic, "Introducing the Model Context Protocol", Anthropic News, 2024-11-25. https://www.anthropic.com/news/model-context-protocol. Accessed 2026-05-24.

[^69]: Google, "Announcing the Agent2Agent Protocol", Google Developers Blog, 2025. https://developers.googleblog.com/. Accessed 2026-05-24.

[^70]: Furuta, Hiroki et al., "Multimodal Web Navigation with Instruction-Finetuned Foundation Models", arXiv, 2023-05-19. https://arxiv.org/abs/2305.11854. Accessed 2026-05-24.

[^71]: LangChain, "LangChain documentation: Browser tools and agents", LangChain Docs. https://python.langchain.com/. Accessed 2026-05-24.

## External links

- [Browser Use on GitHub](https://github.com/browser-use/browser-use)
- [Browser Use official website](https://browser-use.com/)
- [Browser Use documentation](https://docs.browser-use.com/)
- [Browser Use Cloud](https://cloud.browser-use.com/)
- [OpenAI Operator](https://openai.com/index/introducing-operator/)
- [Claude for Chrome](https://www.anthropic.com/news/claude-for-chrome)
- [Stagehand on GitHub](https://github.com/browserbase/stagehand)
- [Skyvern on GitHub](https://github.com/Skyvern-AI/skyvern)
- [Browserbase](https://www.browserbase.com)
- [WebArena GitHub repository](https://github.com/web-arena-x/webarena)
- [Mind2Web dataset](https://github.com/OSU-NLP-Group/Mind2Web)
- [Online-Mind2Web leaderboard](https://hal.cs.princeton.edu/online_mind2web)
- [WebShop benchmark](https://webshop-pnlp.github.io/)

