Browser-use agent

Updated Jun 27, 2026

A browser-use agent is an artificial intelligence software agent that operates a standard web browser through its normal user interface (clicking, typing, scrolling, and navigating) to complete tasks described in natural language.^[1]^[2] The term is most strongly associated with Browser Use, an open-source Python library and cloud platform created in late 2024 by Magnus Müller and Gregor Žunič, two ETH Zurich data-science master's students, with the stated mission to "make websites accessible for AI agents." It became one of the fastest-growing developer projects of 2025 and surpassed 100,000 GitHub stars by mid-2026.^[3]^[4] Browser Use works by converting a web page's interactive elements into a structured, indexed text representation that a large language model can read and act on, letting the model decide which button to click or field to fill rather than relying on hand-written, site-specific scripts.^[21]^[29]

More broadly, a browser-use agent (also called an autonomous web agent or LLM-based browser agent) differs from traditional web scraping, API-based integrations, or rule-following automation scripts: it uses the reasoning of large language models (LLMs) combined with browser automation infrastructure to perceive web page content, plan sequences of actions, and execute them across diverse websites without bespoke integration.^[5]^[6] Such agents represent a step toward general-purpose AI agents that can handle real-world web tasks such as booking travel, completing forms, researching information, or running enterprise workflows.^[7]

What is a browser-use agent?

The term "browser-use agent" describes agents that complete tasks by controlling a web browser rather than calling site-specific APIs.^[8] These systems share three traits:

Render or inspect webpages through visual or DOM-based perception
Plan multi-step procedures using LLM reasoning
Execute low-level browser actions (click, type, select, navigate) to achieve goals such as booking, data extraction, or account management^[5]

Browser-use agents are distinguished from computer-use agents, which operate in broader desktop environments, by their focus on web interactions inside a browser instance.^[9] In contemporary usage the phrase has also become a brand name: the open-source library browser-use published on PyPI and developed by Browser Use Inc. is sometimes referred to simply as "the browser-use agent."^[4] The repository's one-line description states the project's goal directly: "Make websites accessible for AI agents. Automate tasks online with ease."^[4]

Is Browser Use open source?

Yes. The core browser-use library is open source under the permissive MIT license and is freely installable from PyPI.^[4] Its co-founders have framed open access as central to the project: in the seed-round announcement they wrote that "we are building the infrastructure that enables AI to interact with the web as seamlessly as humans do."^[3] Alongside the open-source library, Browser Use Inc. offers a separate commercial cloud platform (browser-use.com) with managed infrastructure, Pay-As-You-Go and Enterprise tiers, and features such as SLAs, on-premise deployment, HIPAA compliance, and zero-retention contracts.^[25] The library itself is approximately 98% Python, requires Python 3.11 or newer, and can be installed via pip install browser-use or the uv package manager.^[4]

How does Browser Use work?

A browser-use agent follows a perception-reasoning-action loop: it perceives the state of a web page, reasons about the next best action toward its goal, and executes that action. The cycle repeats until the task completes, fails, or hits a step limit.^[28] Rather than feeding raw pixels to the model, Browser Use "convert[s] website interfaces into structured text that LLMs can process deterministically," according to its founders.^[21] This structured-text approach is what makes the page "accessible" to the agent: buttons, links, and inputs are extracted, numbered, and described so the model can reference them by index.
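The extract-and-number idea can be illustrated with a minimal sketch built only on Python's standard library. It is not the library's actual extraction pipeline: the set of tags treated as interactive, the class name InteractiveElementIndexer, and the helper index_page are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not the library's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementIndexer(HTMLParser):
    """Collects interactive elements and renders them as numbered lines."""

    def __init__(self):
        super().__init__()
        self.elements = []   # finished elements as (tag, attrs, text)
        self._open = None    # [tag, attrs, text_parts] for the element being read

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        if tag == "input":
            # <input> is a void element, so it is complete as soon as it opens.
            self.elements.append((tag, dict(attrs), ""))
        else:
            self._open = [tag, dict(attrs), []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data.strip())

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            tag_name, attrs, parts = self._open
            self.elements.append((tag_name, attrs, " ".join(p for p in parts if p)))
            self._open = None

    def render(self):
        lines = []
        for index, (tag, attrs, text) in enumerate(self.elements, start=1):
            # Keep only a few attributes that help a model tell elements apart.
            attr_text = "".join(
                f' {k}="{v}"' for k, v in attrs.items() if k in ("name", "type", "href")
            )
            lines.append(f"[{index}]<{tag}{attr_text}>{text}</{tag}>")
        return "\n".join(lines)


def index_page(html):
    parser = InteractiveElementIndexer()
    parser.feed(html)
    return parser.render()


sample = '<form><p>Sign in</p><input name="email" type="text"><button>Submit</button></form>'
indexed = index_page(sample)
print(indexed)
```

Non-interactive text such as the "Sign in" paragraph is dropped, so the model sees a short list of numbered targets rather than the full markup.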

Component	Description	Technologies	Implementation details
Perception layer	Understands content and layout of current web page	DOM parsing, CSS selectors, XPath, Accessibility Tree APIs, vision models	DOM extraction for interactive elements; screenshot processing (base64); visual analysis for layout; text extraction and semantic parsing
Reasoning and planning layer	Core decision-making powered by LLMs	GPT-4, Claude, Gemini, Llama; chain-of-thought; ReAct framework	Task decomposition into sub-goals; multi-step action planning; context management across pages; error detection and recovery
Action execution layer	Translates abstract actions into browser commands	Selenium, Playwright, Puppeteer, browser extensions, Chrome DevTools Protocol	Low-level control (click, type, scroll); multi-browser support; headless and visible modes; session management
Memory management	Maintains state and context	Vector databases, session storage, reinforcement learning memories	Working memory for active tasks; persistent memory across sessions; semantic memory; episodic action history
Safety and monitoring	Ensures safe operation and compliance	Refusal mechanisms, audit logging, permission systems	Prompt injection prevention; sensitive action gates; user approval workflows; activity logging and rollback

The Browser Use agent loop

In the open-source browser-use library, the agent loop is exposed through a high-level Agent class instantiated with a natural-language task and an LLM wrapper (ChatBrowserUse, ChatOpenAI, ChatAnthropic, ChatGoogle, or ChatOllama).^[29] When agent.run() is called, the library executes the following cycle for each step until the task completes or max_steps is reached:^[30]

Capture the current page state (DOM tree, optional screenshot, URL, tab list, scroll position)
Convert the DOM into a structured, indexed text representation in which interactive elements are numbered (for example, [3]<button>Submit</button>)
Build a prompt containing the task description, recent history, and the indexed element list
Call the LLM and parse a JSON response specifying one or more actions and an updated internal "next goal" or memory note
Dispatch actions through the browser backend (click on indexed element, type into indexed input, scroll, navigate, extract content, switch tab, etc.)
Update history and repeat
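The cycle above can be sketched as a small, self-contained loop. This is an illustration under stated assumptions, not the library's code: the page is a plain dictionary, stub_llm is a scripted stand-in for a real model call (it peeks at the page state, which a real model could not do), and every function name here is hypothetical.

```python
import json

# A toy "page" standing in for a live browser tab (hypothetical structure).
page = {
    "url": "https://example.com/login",
    "elements": ["<input name=email>", "<button>Submit</button>"],
    "typed": {},
    "submitted": False,
}


def capture_state(page):
    """Steps 1 and 2: snapshot the page and number its interactive elements."""
    indexed = [f"[{i}]{el}" for i, el in enumerate(page["elements"], start=1)]
    return {"url": page["url"], "elements": indexed}


def build_prompt(task, history, state):
    """Step 3: combine the task, recent history, and the indexed element list."""
    return "\n".join(
        [f"Task: {task}", f"History: {history[-3:]}", f"URL: {state['url']}", *state["elements"]]
    )


def stub_llm(prompt, page):
    """Step 4 stand-in: a scripted policy that replies with JSON as a model would."""
    if not page["typed"]:
        reply = {"next_goal": "fill email", "actions": [{"type": "input", "index": 1, "text": "a@b.co"}]}
    elif not page["submitted"]:
        reply = {"next_goal": "submit form", "actions": [{"type": "click", "index": 2}]}
    else:
        reply = {"next_goal": "finished", "actions": [{"type": "done"}]}
    return json.dumps(reply)


def dispatch(action, page):
    """Step 5: apply one action to the page; returns True when the task is done."""
    if action["type"] == "input":
        page["typed"][action["index"]] = action["text"]
    elif action["type"] == "click":
        page["submitted"] = True
    return action["type"] == "done"


def run(task, page, max_steps=10):
    history = []
    for step in range(1, max_steps + 1):
        state = capture_state(page)
        prompt = build_prompt(task, history, state)
        reply = json.loads(stub_llm(prompt, page))
        done = False
        for action in reply["actions"]:
            done = dispatch(action, page) or done
        history.append((step, reply["next_goal"]))  # Step 6: update history
        if done:
            break
    return history


history = run("Log in with a@b.co", page)
print(history)
```

Swapping stub_llm for a real model call and the dictionary for a browser session changes the plumbing but not the shape of the loop.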

The library's design philosophy departs from screenshot-only agents (such as OpenAI's Operator) by relying primarily on the structured DOM with vision as an optional channel, which the team argues is more deterministic, cheaper in tokens, and easier to debug.^[21]^[31] Configuration parameters include max_actions_per_step (default 4), max_failures (default 3), use_vision ("auto", True, or False), vision_detail_level, fallback_llm, flash_mode (skips evaluation and next-goal generation for speed), and a separate page_extraction_llm for text extraction.^[29]
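To make the role of two of these parameters concrete, the sketch below shows one plausible way a per-step action cap and a failure budget could bound a run. The defaults mirror the values quoted above, but LoopLimits and run_steps are hypothetical, and counting failures consecutively with a reset on success is an assumption of this example rather than documented library behavior.

```python
from dataclasses import dataclass


@dataclass
class LoopLimits:
    # Defaults mirror the values quoted in the text; the class itself is hypothetical.
    max_actions_per_step: int = 4
    max_failures: int = 3


def run_steps(proposed_steps, execute, limits=LoopLimits()):
    """Run each step's actions, capping batch size and stopping after repeated failures.

    proposed_steps: list of action lists, one list per model call.
    execute: callable(action) -> bool, True on success.
    Assumes failures are counted consecutively and reset by a successful step.
    """
    consecutive_failures = 0
    executed = []
    for actions in proposed_steps:
        batch = actions[: limits.max_actions_per_step]  # drop anything past the cap
        step_ok = True
        for action in batch:
            executed.append(action)
            if not execute(action):
                step_ok = False
                break  # abandon the rest of this batch
        consecutive_failures = 0 if step_ok else consecutive_failures + 1
        if consecutive_failures >= limits.max_failures:
            return executed, "stopped: too many failures"
    return executed, "completed"


steps = [["a1", "a2", "a3", "a4", "a5"], ["bad"], ["bad"], ["bad"], ["never"]]
executed, status = run_steps(steps, execute=lambda action: action != "bad")
print(executed, status)
```

In this run the fifth action of the first step is never attempted, and the loop halts after the third failing step without reaching the last one.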

Playwright to CDP migration

The original 2024 implementation built actions on top of Microsoft's Playwright cross-browser framework.^[32] In August 2025 the team announced a rewrite that drops Playwright and speaks Chrome DevTools Protocol (CDP) directly.^[33] In a public blog post the team described the move as "leaving the curse of abstraction," citing several reasons: the extra hop through Playwright's Node.js websocket server added latency; Playwright's synchronous "update view between actions" model fought against CDP's event-driven nature; and cross-origin iframe support and async reactions were difficult to retrofit.^[33]

The CDP rewrite introduced an event-driven architecture in which "watchdog" services subscribe to CDP events. Examples documented by the project include a downloads_watchdog that monitors spontaneous file downloads, a crash_watchdog that detects renderer crashes via a single CDP event subscription, and dialog watchdogs that auto-handle beforeunload and JavaScript modals.^[33] The DOM extraction pipeline produces "super-selectors" that combine target ID, frame ID, backend node ID, position data, and fallback CSS selectors, enabling reliable element tracking across cross-origin iframes.^[33] The CLI 2.0 release (March 2025) reported approximately 50 ms command latency via a persistent background daemon and a roughly 50% reduction in token use compared to prior versions.^[34]
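The watchdog pattern can be sketched as small handlers subscribed to named events. The example below uses an in-memory EventBus in place of a real CDP connection. The event and command names (Inspector.targetCrashed, Page.javascriptDialogOpening, Browser.downloadWillBegin, Page.handleJavaScriptDialog) exist in the Chrome DevTools Protocol, but which ones the project's watchdogs actually subscribe to is an assumption here, and the classes are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe hub standing in for a CDP event stream."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def emit(self, event_name, params):
        for handler in self._handlers[event_name]:
            handler(params)


class CrashWatchdog:
    """Flags a renderer crash from a single event subscription."""

    def __init__(self, bus):
        self.crashed = False
        bus.subscribe("Inspector.targetCrashed", self.on_crash)

    def on_crash(self, params):
        self.crashed = True


class DialogWatchdog:
    """Accepts JavaScript dialogs so they cannot block the agent's next action."""

    def __init__(self, bus, send_command):
        self._send = send_command
        bus.subscribe("Page.javascriptDialogOpening", self.on_dialog)

    def on_dialog(self, params):
        self._send("Page.handleJavaScriptDialog", {"accept": True})


class DownloadsWatchdog:
    """Records downloads that start without the agent asking for them."""

    def __init__(self, bus):
        self.filenames = []
        bus.subscribe("Browser.downloadWillBegin", self.on_download)

    def on_download(self, params):
        self.filenames.append(params["suggestedFilename"])


bus = EventBus()
sent_commands = []
crash = CrashWatchdog(bus)
dialogs = DialogWatchdog(bus, lambda method, params: sent_commands.append((method, params)))
downloads = DownloadsWatchdog(bus)

# Simulate events arriving from the browser in arbitrary order.
bus.emit("Page.javascriptDialogOpening", {"type": "beforeunload", "message": ""})
bus.emit("Browser.downloadWillBegin", {"guid": "g1", "suggestedFilename": "report.pdf"})
bus.emit("Inspector.targetCrashed", {})
print(sent_commands, downloads.filenames, crash.crashed)
```

Each watchdog reacts when its event fires, rather than the agent polling for dialogs, downloads, or crashes between actions.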

Who created Browser Use and when?

The concept of browser-use agents emerged from the convergence of advances in large language models and web automation technologies, but the Browser Use project specifically was created by Magnus Müller and Gregor Žunič, who met in 2024 while completing master's degrees in data science at ETH Zurich and worked out of the university's Student Project House (SPH) incubator.^[3]^[14]^[20] Müller had spent years building web-scraping tools before the pair teamed up.^[20] The open-source project was launched in November 2024 and "quickly became one of the fastest-growing projects in the developer ecosystem," reaching roughly 46,000 GitHub stars within months of release.^[20] The team was subsequently admitted to Y Combinator's Winter 2025 batch.^[15]

Predecessor research (2021 to 2023)

OpenAI's WebGPT, introduced in December 2021, demonstrated early browser-assisted question-answering with human feedback, fine-tuning GPT-3 to issue search and click commands in a text-based browser environment.^[10] In September 2022, Adept AI introduced ACT-1, a transformer trained to use common software tools including web browsers, framed as an "action transformer" mapping natural language to UI operations.^[11] During 2023, academic benchmarks WebArena from Carnegie Mellon and Mind2Web from Ohio State University established standardized evaluation frameworks that would shape later progress.^[1]^[12] OpenAI's GPT-4V vision API, released in late 2023, made it feasible for general-purpose LLMs to interpret screenshots, opening the door to vision-based web agents.^[13]

Origin of the Browser Use library (October to November 2024)

According to Müller, the project began as "a weekend experiment to see if LLMs could navigate the web like humans," with an initial prototype built in four days and launched on Hacker News in October 2024.^[3] The project went viral after gaining traction on Hacker News and X (formerly Twitter) in late November 2024 and acquired tens of thousands of GitHub stars within weeks, becoming one of the fastest-growing open-source repositories in the agentic AI ecosystem.^[4]^[15] The team was admitted to the Y Combinator Winter 2025 batch.^[15]

Modern era of computer- and browser-use agents (2024 to 2025)

In October 2024, Anthropic released the Computer Use feature for Claude 3.5 Sonnet, allowing the model to read pixels of a screen and emit keystrokes and mouse coordinates.^[16] In December 2024, Google DeepMind unveiled Project Mariner, an experimental browser agent built on Gemini 2.0.^[17] On 23 January 2025, OpenAI launched Operator, powered by the Computer-Using Agent (CUA) model, initially as a research preview to ChatGPT Pro subscribers.^[2]^[18] OpenAI later integrated Operator capabilities into a unified ChatGPT Agent mode and announced deprecation of the standalone operator.chatgpt.com site.^[19]

How much funding has Browser Use raised?

On 22 March 2025, Browser Use announced a $17 million seed round led by Felicis Ventures, with participation from A Capital, Nexus Venture Partners, SV Angel, Liquid2, Pioneer Fund, and angel investors including Y Combinator co-founder Paul Graham.^[20]^[21] The round was led at Felicis by Senior Venture Partner Astasia Myers.^[21] At the time of the announcement, the open-source repository had passed 50,000 GitHub stars and counted 15,000 active developers.^[3]^[21] In the announcement the company described its plan to "build the infrastructure that makes this transition possible," predicting that "within a few years, automated workflows will outnumber human interactions on the web."^[3]

In May 2025, Salesforce acquired Convergence AI, a UK-based competitor whose Proxy product launched browser-based agents; the acquisition was framed as accelerating Salesforce's Agentforce roadmap.^[23] In June 2025, Browserbase, the cloud infrastructure provider behind the Stagehand agent framework, announced a $40 million Series B led by Notable Capital and launched a no-code product called Director.^[24] Later in 2025 Browser Use shipped a managed cloud platform (browser-use.com) with Pay-As-You-Go and Enterprise tiers offering SLAs, on-premise deployment, HIPAA compliance, and zero-retention contracts.^[25]

On 27 January 2026, Browser Use released BU 2.0, a proprietary in-house model tuned for web automation, claiming a 12% accuracy improvement over the prior default while preserving speed.^[26] By mid-2026 the open-source browser-use package had passed 100,000 GitHub stars and 11,000 forks.^[4]

Why use a browser-use agent?

Many real-world workflows remain locked behind human-oriented web interfaces that lack public APIs. Browser-use agents aim to generalize across diverse sites without bespoke integration by:^[27]

Interpreting on-page content and structure through perception systems
Mapping high-level natural-language instructions to concrete browser interactions
Adapting to website changes without reprogramming
Reducing fragmented evaluation through unified testbeds

Browser Use co-founder Müller has framed the core technical wager as treating the browser "not just as an interface for humans, but as the execution environment for intelligent agents."^[21]

Architecture and technical implementation

Browser automation backends

Framework	Primary use case	Advantages	Limitations	Browser-use agent adoption
Playwright	Cross-browser automation	Fast, reliable, modern API, built-in waiting	Newer ecosystem; extra abstraction layer	Default for many agents; was used by browser-use until mid-2025^[32]
Selenium	Traditional web testing	Mature, wide language support	Slower, more complex setup	Legacy support
Puppeteer	Chrome/Chromium control	Direct CDP access, lightweight	Chrome-only	Specialized use cases
Chrome DevTools Protocol (CDP)	Low-level browser control	Maximum control, event-driven	Complex, browser-specific	Used directly by browser-use from August 2025^[33]

Processing modes

Mode	Description	Token usage	Speed	Accuracy	Best for
Snapshot mode	Uses accessibility tree or indexed DOM for element identification	Low (500 to 2K)	Fast (less than 1s)	High for simple pages	Form filling, standard layouts
Vision mode	Processes screenshots for visual understanding	High (5K to 15K)	Slow (2 to 5s)	High for complex layouts	Dynamic content, visual elements
Hybrid mode	Combines DOM parsing with visual processing	Medium (2K to 8K)	Medium (1 to 3s)	Highest overall	General-purpose automation
Streaming mode	Continuous observation and action	Very high	Real-time	Variable	Interactive applications

Language model integration

Browser-use agents support various LLM providers with different capabilities:^[17]

Provider	Models	Vision support	Cost (per 1M tokens)	Latency	Best use case
OpenAI	GPT-4o, GPT-4-turbo, o-series	Yes	$5 to 15	Low	Production systems
Anthropic	Claude 3.5 Sonnet, Claude 3 Opus, Claude Sonnet 4	Yes	$3 to 15	Low	Complex reasoning
Google	Gemini 1.5 Pro, Gemini 2.0, Gemini 2.5	Yes	$3.5 to 7	Low	Multimodal tasks
Open source	Llama 3, Mistral, Qwen	Limited	$0.5 to 2	Variable	Cost-sensitive applications

The Browser Use library exposes wrapper classes (ChatOpenAI, ChatAnthropic, ChatGoogle, ChatGroq, ChatOllama) plus its own ChatBrowserUse client that defaults to the in-house BU 2.0 model.^[29]^[26] In April and May 2026 the project removed litellm from core dependencies in response to a supply chain incident, while keeping a ChatLiteLLM wrapper for users who install it separately.^[35]

How well do browser-use agents perform on benchmarks?

Standardized evaluation frameworks

Benchmark	Focus area	Task count	Characteristics	Key metrics
WebArena	Realistic multi-site environment	812 tasks	Self-hostable sites across e-commerce, CMS, social platforms; execution-based evaluation	Task success rate, efficiency score^[1]
Mind2Web	Cross-website generalization	2,350 tasks	137 websites, real-world task diversity, action sequence annotation	Element accuracy, action F1 score^[12]
WebVoyager	Live website interaction	643 tasks	Amazon, GitHub, Google Maps, real-time execution	End-to-end success rate^[18]
VisualWebArena	Multimodal and visual tasks	910 tasks	Image-heavy tasks, visual grounding	Visual element accuracy^[36]
BrowserGym	Unified ecosystem	5,000+ tasks	Standardized obs/action spaces, cross-benchmark evaluation	Aggregate performance score^[27]
WebShop	E-commerce navigation	12,087 instructions	Product search and selection over 1.18 million products, attribute matching	Purchase success rate^[37]
OSWorld	Full OS control	369 tasks	Ubuntu, Windows, macOS environments	Cross-platform success rate^[18]
Online-Mind2Web	Real websites, live tasks	300 tasks	136 real websites; binary scoring; replaces self-hosted Mind2Web	Human-evaluated success rate^[38]

Reported comparative performance

Agent / model	WebArena	WebVoyager	Mind2Web	OSWorld	Online-Mind2Web
Human baseline	78.2%	90.0%	85.3%	72.4%	n/a
Browser Use (open source, GPT-4o)	51.2%	89.1%^[31]	73.4%	n/a	n/a
Browser Use Auto-Research (2025)	n/a	n/a	n/a	n/a	97.7%^[38]
OpenAI CUA (Operator)	58.1%	87.0%	76.2%	38.1%	61.3%^[38]
Anthropic Computer Use	45.3%	56.0%	62.1%	22.0%	n/a
Project Mariner (Google)	52.4%	83.5%	71.3%	n/a	n/a
Agent S2 (Simular, OSWorld 50-step)	n/a	n/a	n/a	34.5%^[39]	n/a

Browser Use reported an 89.1% success rate on the WebVoyager benchmark (run with GPT-4o across 586 tasks), which the project described as "state of the art" for an open-source web agent at the time, compared with 87% for OpenAI's Operator.^[31] That number has since been disputed: independent evaluations by Browserable and others produced lower figures (77.3% self-reported, 60.2% LLM-verified) on the same tasks, and competing projects such as Magnitude have reported 93.9% on WebVoyager.^[31]^[40] The Browser Use team has itself noted that it modified the original WebVoyager harness (different prompts, a Langchain migration, and manual review of "unknown" or "failed" tasks) because of issues with the default evaluator.^[31] On Online-Mind2Web (a successor benchmark with 300 tasks on 136 live websites), Browser Use reported a 97.7% success rate using its Auto-Research approach built on the Claude Agent SDK, compared with 61.3% for OpenAI's Operator and 28 to 40% for most other agents.^[38]
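How a harness treats tasks the evaluator could not judge can move the headline number noticeably. The sketch below uses made-up outcome counts (not figures from any benchmark run) and a hypothetical success_rates helper to compare two conventions: counting "unknown" tasks as failures, and excluding them from the denominator.

```python
from collections import Counter


def success_rates(outcomes):
    """Return (strict, judged_only) success rates for a list of task outcomes.

    strict: "unknown" tasks count against the agent.
    judged_only: "unknown" tasks are removed from the denominator.
    """
    counts = Counter(outcomes)
    total = len(outcomes)
    judged = total - counts["unknown"]
    strict = counts["success"] / total
    judged_only = counts["success"] / judged
    return strict, judged_only


# Illustrative data only: 100 tasks, 8 of which the evaluator could not judge.
outcomes = ["success"] * 80 + ["failed"] * 12 + ["unknown"] * 8
strict, judged_only = success_rates(outcomes)
print(f"strict={strict:.1%} judged_only={judged_only:.1%}")
```

With these invented counts the same run scores 80.0% or 87.0% depending on the convention, which is one reason figures from different harnesses are hard to compare.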

The Online-Mind2Web paper, titled "An Illusion of Progress? Assessing the Current State of Web Agents" (Hou et al., 2025), argued that earlier benchmarks systematically overestimated real-world agent ability because of impossible tasks, drifted website content, and inadequate evaluators.^[41]

What are the major browser-use agent implementations?

Browser Use library

browser-use is an open-source MIT-licensed Python library that enables LLM-powered browser interaction via natural language.^[4] Notable characteristics include:

Statistics: more than 100,000 GitHub stars and 11,000+ forks as of mid-2026, with 125+ tagged releases since first publication.^[4]
Codebase: approximately 98% Python; minimum Python 3.11; installable via pip install browser-use or via the uv package manager.^[4]
Features: multi-LLM support, indexed-DOM perception with optional vision, multi-tab support, persistent profile authentication, custom-tool registration, output schema validation, file upload containment, sensitive-data masking, fallback LLM routing.^[29]
Cloud product: cloud.browser-use.com with typed Python and TypeScript SDKs, pay-as-you-go pricing (~$0.01 per task initialization plus per-step LLM cost), live session preview, and Enterprise plans with on-premise deployment and HIPAA support.^[25]^[42]

OpenAI Operator and ChatGPT Agent

Released 23 January 2025, OpenAI Operator is powered by the Computer-Using Agent (CUA) model and combines GPT-4o's vision with reinforcement learning:^[2]^[18]

Architecture: iterative perception, reasoning, action loop with self-correction
Safety: refusal mechanisms, user approval gates for sensitive actions, takeover mode for credentials
Performance: 58.1% on WebArena, 87% on WebVoyager
Availability: initially $200/month ChatGPT Pro tier in the United States; later folded into ChatGPT Agent mode with deprecation of the standalone operator.chatgpt.com site^[19]

Anthropic Computer Use and Claude for Chrome

Released October 2024, Anthropic Computer Use lets Claude models interact with computer interfaces through visual perception and simulated input.^[16] On 26 August 2025, Anthropic launched a research preview of Claude for Chrome, a browser extension that lets Claude operate the user's actual Chrome session.^[43] The extension entered general availability for Claude Pro, Team, Enterprise, and Max subscribers in late December 2025 with site-level permissions, action confirmations for high-risk operations, and built-in prompt injection defenses.^[44]^[45]

Google Project Mariner and Gemini agents

Project Mariner is an experimental browser agent from Google DeepMind built on Gemini 2.0 and later Gemini 2.5, focused on multimodal understanding and announced in December 2024.^[17] Google brought aspects of Mariner into its Gemini app and Project Astra during 2025 and 2026, and added an Agent Mode to Chrome in 2026 alongside agentic web protocols (the Agent Payments Protocol, Agent2Agent, and Trust Tokens for Agents).^[46]

Stagehand and Browserbase

Stagehand is a TypeScript framework from Browserbase that adds AI methods (act, extract, observe) to existing Playwright code, with a hybrid model where deterministic Playwright handles predictable flows and AI commands handle ambiguous steps.^[47] Browserbase, the cloud infrastructure platform behind Stagehand, raised a $40 million Series B in June 2025 and launched a no-code product called Director that emits Stagehand scripts from natural-language goals.^[24]

Skyvern

Skyvern is an open-source agent that treats the DOM as unreliable and instead feeds screenshots to vision models. It offers native two-factor authentication, CAPTCHA support, structured schema-based extraction, and a no-code workflow builder.^[48] Skyvern is often cited as a fork-friendly alternative to Browser Use for form-heavy automation.^[49]

Other notable systems

Convergence Proxy: a browser-agent system from a UK startup acquired by Salesforce in May 2025 for an undisclosed amount, integrated into the Agentforce platform.^[23]
Agent-S2 (Simular): an open compositional generalist-specialist framework that achieves 34.5% on OSWorld's 50-step evaluation, surpassing OpenAI CUA and UI-TARS under the same 50-step budget.^[39]
Magnitude: an open-source web agent that reported 93.9% on WebVoyager and contested Browser Use's leaderboard claims.^[40]
Browserable: a managed AI browser agent that reported 90.4% on WebVoyager and 65% on Online-Mind2Web.^[50]
Cloudflare Browser Run: a managed remote-browser service launched in 2025 to give agents an isolated Chromium with per-request billing.^[51]

What can browser-use agents be used for?

Enterprise automation

Browser-use agents are increasingly positioned as a successor to traditional Robotic Process Automation (RPA), with applications including:

Business process automation across portals without dedicated APIs
Cross-platform data extraction and synchronization
Automated regulatory checks and reporting
Supply chain workflows in vendor portals^[25]

E-commerce and services

Real-time competitor analysis and price monitoring
Multi-marketplace stock synchronization
Automated order tracking and status updates
Review aggregation with downstream sentiment analysis

Research and analysis

Literature review and citation gathering
Market intelligence and competitor monitoring
Earnings report extraction
Patent and prior-art searches

Quality assurance

End-to-end user journey validation^[52]
WCAG accessibility checks
Cross-browser compatibility
Performance and load testing

Personal productivity

Travel planning across multiple booking sites
Resume parsing and job-application submission
News and content aggregation
Cross-platform social media management

What are the main challenges and limitations?

Dynamic content handling

Modern single-page applications with asynchronous loading, virtual scrolling, and lazy loading complicate element discovery; AJAX-heavy interfaces require sophisticated waiting strategies.^[53] Browser Use's CDP migration was motivated in part by the inadequacy of synchronous Playwright waits for the agent's event-driven needs.^[33]
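A common baseline waiting strategy is to poll for a condition until a deadline passes. The sketch below is a generic illustration, not Browser Use's or Playwright's implementation; wait_until and LazyPage are hypothetical names, and the fake page simply reports its button on the third poll to mimic lazily loaded content.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll predicate() until it returns a truthy value or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


class LazyPage:
    """Fake page whose button only appears after a few polls."""

    def __init__(self, ready_after):
        self.polls = 0
        self.ready_after = ready_after

    def find_button(self):
        self.polls += 1
        if self.polls >= self.ready_after:
            return "[1]<button>Load more</button>"
        return None


page = LazyPage(ready_after=3)
found = wait_until(page.find_button, timeout=2.0, interval=0.01)
print(found, page.polls)
```

Polling is simple but adds latency of up to one interval per wait, which is part of why event-driven designs are attractive for agents.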

Element identification

Shadow DOM and iframes create isolation barriers; dynamically generated IDs defeat naive selectors; and visually similar elements require disambiguation. Browser Use addresses these through cross-origin iframe handling and "super-selector" tuples (target ID, frame ID, backend node ID, position, fallback selector).^[33]^[54]
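The tuple described above can be modeled as a small record with a fallback chain. The following sketch is hypothetical: the field names follow the list in the text, but the SuperSelector class, the resolve function, and the resolution order (backend node ID first, CSS selector second) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuperSelector:
    target_id: str         # which tab or target the element lives in
    frame_id: str          # which frame, including cross-origin iframes
    backend_node_id: int   # browser-internal node identifier
    position: tuple        # (x, y, width, height) at capture time
    fallback_css: str      # last-resort selector if the node was re-created


def resolve(selector, live_nodes, query_css):
    """Try the backend node ID first; fall back to CSS if that node is gone.

    live_nodes: dict mapping (target_id, frame_id, backend_node_id) -> node handle.
    query_css: callable(frame_id, css) -> node handle or None.
    """
    key = (selector.target_id, selector.frame_id, selector.backend_node_id)
    if key in live_nodes:
        return live_nodes[key], "backend_node_id"
    match = query_css(selector.frame_id, selector.fallback_css)
    if match is not None:
        return match, "fallback_css"
    return None, "unresolved"


submit = SuperSelector("T1", "F1", 42, (10, 20, 80, 24), "button#submit")

# Case 1: the node captured earlier still exists.
print(resolve(submit, {("T1", "F1", 42): "node-42"}, lambda frame, css: None))

# Case 2: the page re-rendered, so only the CSS fallback finds the element.
print(resolve(submit, {}, lambda frame, css: "node-99" if css == "button#submit" else None))
```

Carrying the frame ID alongside the node ID is what lets the same record address elements inside nested or cross-origin frames.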

State management

Persistent sessions across page transitions, authentication and two-factor flows, and unexpected logouts or timeouts present recurring difficulties.^[55] Browser Use exposes real browser profiles for authentication, and Skyvern provides native 2FA and TOTP integration.^[48]
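For readers unfamiliar with TOTP, the codes an agent must supply during such flows are derived from a shared secret and the current time, as specified in RFC 6238. The sketch below implements that standard algorithm with the standard library. It is not Skyvern's or Browser Use's code, and a real deployment would load the secret from a secure store rather than embedding it.

```python
import hashlib
import hmac
import struct


def totp(secret, for_time, step=30, digits=6):
    """Compute an RFC 6238 time-based one-time password using HMAC-SHA1.

    secret: shared key as raw bytes.
    for_time: Unix timestamp in seconds.
    """
    counter = int(for_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# Test secret from RFC 6238 Appendix B; never reuse it outside examples.
rfc_secret = b"12345678901234567890"
print(totp(rfc_secret, 59, digits=8))
```

Because the code depends only on the secret and the clock, an agent can generate it locally at the moment the login form asks for it.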

Error recovery and reliability

Documented success rates on real-world tasks ranged from 60 to 90% in 2025; reliability gaps remain the dominant blocker for production deployment.^[56] Specific failure modes include CAPTCHA handling, modal and popup dialogs, network failures, and rate limiting.^[57]

Performance and cost

Issue	Impact	Current solutions	Future approaches
LLM inference latency	2 to 5 second delays per action	Caching, batching, persistent daemons^[34]	Edge deployment, model distillation
Token consumption	$0.10 to $1.00 per complex task	Efficient prompting, DOM-only mode	Specialized models (BU 2.0), compression^[26]
Memory limits	Context window constraints	Summarization, pruning	Extended context, hierarchical memory
Reliability	60 to 90% success rates	Retry logic, fallback LLMs	Reinforcement learning, self-improvement
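The per-task figures in the table follow from simple arithmetic over steps, tokens, and price. The sketch below works through one example. The 20-step task length is an assumption; the token range is the hybrid-mode range from the processing-modes table; the $5 per million tokens is the low end of the listed OpenAI price band, applied as a single blended rate; and estimate_task is a hypothetical helper.

```python
def estimate_task(steps, tokens_per_step, usd_per_million_tokens, seconds_per_step):
    """Return (total_tokens, cost_usd, latency_seconds) for one task."""
    total_tokens = steps * tokens_per_step
    cost_usd = total_tokens * usd_per_million_tokens / 1_000_000
    latency_s = steps * seconds_per_step
    return total_tokens, cost_usd, latency_s


# Assumed 20-step task at a blended price of $5 per 1M tokens.
low = estimate_task(steps=20, tokens_per_step=2_000, usd_per_million_tokens=5.0, seconds_per_step=2.0)
high = estimate_task(steps=20, tokens_per_step=8_000, usd_per_million_tokens=5.0, seconds_per_step=5.0)

for label, (tokens, cost, latency) in (("low", low), ("high", high)):
    print(f"{label}: {tokens} tokens, ${cost:.2f}, {latency:.0f} s of model latency")
```

Under these assumptions the task costs between $0.20 and $0.80 and spends 40 to 100 seconds waiting on the model, which sits inside the ranges quoted in the table.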

Safety, security, and ethical concerns

Prompt injection attacks from malicious web content are widely considered the dominant security risk for browser-use agents.^[58]^[59] Notable 2025 vulnerabilities include EchoLeak (CVE-2025-32711), a zero-click vulnerability in Microsoft 365 Copilot that allowed remote attackers to exfiltrate data via email content;^[59] CurXecute (CVE-2025-54135), a remote code execution flaw in Cursor IDE triggered by malicious README content (CVSS 9.8);^[60] and a prompt-injection-via-navigation flaw in Perplexity Comet disclosed in October 2025.^[61] OpenAI stated in December 2025 that prompt injection in AI browsers is "unlikely to ever be fully solved," and the UK National Cyber Security Centre echoed that this class of attack "may be a problem that is never fully fixed."^[62]^[63] A 2026 Google report noted a 32% relative increase in malicious prompt-injection activity between November 2025 and February 2026.^[64] Indirect prompt injection, which delivers attacker payloads through third-party content (web pages, documents, emails) rather than direct user prompts, was identified by OWASP as the #1 AI security threat for 2026.^[65]

Other concerns include credential theft, cross-site scripting via injected payloads, data exfiltration, processing of sensitive personal information, screenshot capture of private data, audit-trail retention, automated spam, large-scale unauthorized scraping, and terms-of-service violations.^[66] Browser Use's documentation includes a sensitive_data parameter that masks specified strings in LLM prompts and screenshots.^[29]
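The masking idea can be sketched as a pair of string substitutions: secrets are swapped for named placeholders before text reaches the model, and swapped back only when an action is dispatched to the browser. The mask and unmask helpers and the placeholder format below are assumptions for illustration, not the library's implementation of sensitive_data.

```python
def mask(text, secrets):
    """Replace each secret value with a named placeholder before prompting."""
    for name, value in secrets.items():
        text = text.replace(value, f"<secret>{name}</secret>")
    return text


def unmask(text, secrets):
    """Restore real values just before the action is sent to the browser."""
    for name, value in secrets.items():
        text = text.replace(f"<secret>{name}</secret>", value)
    return text


secrets = {"x_password": "hunter2"}

# What the model sees: the placeholder, never the real value.
prompt = mask("Type hunter2 into the password field", secrets)
print(prompt)

# What the model returns references the placeholder by name.
model_action = {"type": "input", "index": 2, "text": "<secret>x_password</secret>"}

# What the browser receives: the real value, substituted locally.
typed_text = unmask(model_action["text"], secrets)
print(typed_text)
```

The model can still decide where a credential belongs without the credential itself appearing in the prompt or in provider-side logs.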

2025 to 2026 developments

Maturation of the Browser Use ecosystem

The 2025 to 2026 period transformed Browser Use from a viral library into a venture-backed company with a commercial cloud and an in-house model:

March 2025: $17M seed led by Felicis Ventures closes; Y Combinator W25 graduation.^[20]^[15]
March 2025: CLI 2.0 ships, replacing Playwright internals with a CDP daemon and reporting ~50ms command latency and ~50% token reduction.^[34]
August 2025: Public "Closer to the metal: leaving Playwright for CDP" rewrite announced; introduces watchdog services and event-driven DOM extraction.^[33]
Late 2025: Cloud product reaches general availability with typed Python and TypeScript SDKs, free tier, and Enterprise contracts.^[25]^[42]
27 January 2026: BU 2.0 model launches, claimed +12% accuracy at unchanged latency; supports bring-your-own-key and reusable automation script generation.^[26]
April to May 2026: Removal of litellm from core dependencies after a supply-chain incident; release of install-lite.sh; security hardening for daemon socket access and file handling.^[35]

Competitive landscape consolidation

The browser-use agent market consolidated rapidly during this period:

Salesforce acquired Convergence AI in May 2025.^[23]
Browserbase closed a $40M Series B in June 2025 and launched Director.^[24]
Anthropic shipped Claude for Chrome as a research preview in August 2025 and general availability in December 2025.^[43]^[44]
Microsoft expanded Copilot Studio with browser-control actions across Microsoft 365.^[67]
OpenAI merged Operator into the broader ChatGPT Agent product.^[19]
The Online-Mind2Web paper argued that earlier benchmark progress had been illusory, prompting a new wave of evaluation work.^[41]

Standards and protocols

Three related agent web protocols emerged in this period. The Model Context Protocol (MCP), introduced by Anthropic in November 2024 and adopted broadly by 2025, gave LLM hosts a standard interface for tools, including browser tools.^[68] The Agent2Agent (A2A) protocol from Google in 2025 specified interoperability between agents.^[69] The Agent Payments Protocol (AP2), also from Google, defined trusted-credentials handshakes for agent-initiated payments.^[46] Browser Use has integrated MCP via a server that exposes its actions as MCP tools.^[29]
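MCP messages are JSON-RPC 2.0, and a host invokes a tool with a tools/call request carrying the tool's name and arguments. The sketch below builds such a request. The envelope follows the MCP specification, but the tool name browser_navigate and its url argument are hypothetical and may not match the tools the Browser Use server actually exposes.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and argument, used only to show the message shape.
request = make_tool_call(1, "browser_navigate", {"url": "https://example.com"})
wire = json.dumps(request)
print(wire)
```

A host would normally discover the real tool names and argument schemas with a tools/list request before issuing calls like this one.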

Performance frontier

By mid-2026 the state of the art on Online-Mind2Web was held by Browser Use's Auto-Research configuration at 97.7%, but with the caveat that two tasks were judged "impossible" and that the agentic judge was a Claude-based system aligned with human reviewers.^[38] On OSWorld the open Agent S2 from Simular led at 34.5% (50 steps).^[39] On WebVoyager, project-internal numbers from Magnitude (93.9%), Browserable (90.4%), and Browser Use (89.1%) all approached or exceeded the reported human baseline, but they were difficult to compare because each was produced with a different evaluation harness.^[40]^[50]^[31]

Adoption signs

By mid-2026, the open-source browser-use repository had:

More than 100,000 GitHub stars and 11,000+ forks^[4]
Active integration into agentic workflow tooling, LangChain examples, and the Claude Agent SDK^[38]
A presence in Y Combinator's "leading open-source web agent" company profile^[15]
Production deployments at "tens of thousands" of developer organizations, per Felicis's investment thesis update^[21]

Notable research projects

Academic initiatives

WebGPT (OpenAI, 2021): pioneering browser-assisted question answering with human feedback; established foundations for modern browser-use agents.^[10]
Mind2Web (OSU and Allen AI, 2023): large-scale dataset and framework for generalist web agents, with cross-website generalization tests.^[12]
Online-Mind2Web (OSU, 2025): live successor benchmark; introduced the "Illusion of Progress" critique of earlier evaluations.^[41]
WebArena (CMU, 2023): realistic self-hosted benchmark environment for reproducible agent evaluation.^[1]
AgentTuning: research on enabling generalized agent abilities through fine-tuning of LLMs for web tasks.^[28]
Agent S2 (Simular, 2025): compositional generalist-specialist framework with Mixture-of-Grounding and Proactive Hierarchical Planning.^[39]
Mira benchmark / Mira AI: a 2025 multi-site benchmark covering 600+ real workflows with emphasis on multi-tab and authenticated tasks (cited by multiple agent vendors during 2025 to 2026).^[49]

Industry research

Adept ACT-1: universal action transformer for software interface control.^[11]
Google multimodal web navigation: research on instruction-finetuned foundation models for web navigation.^[70]
OpenAI Computer-Using Agent (CUA): the reinforcement-learning-trained model behind Operator.^[2]
Claude for Chrome research preview: Anthropic's report on per-action confirmations and built-in injection defenses.^[43]

Open-source frameworks

LangChain and LlamaIndex: provide building blocks and tool wrappers commonly used with browser-use agents.^[71]
BrowserGym: unified ecosystem for web-agent research and evaluation.^[27]
Skyvern: open-source vision-first browser automation.^[48]
Stagehand: TypeScript hybrid Playwright + LLM framework from Browserbase.^[47]
Magnitude: open-source agent that claimed state-of-the-art WebVoyager results and contested Browser Use's leaderboard numbers.^[40]

How does a browser-use agent compare with related technologies?

Aspect	Browser-use agent	Computer-use agent	Traditional RPA	Web scraping
Scope	Web browsers	Full desktop OS	Predefined workflows	Data extraction only
Adaptability	High (LLM-based)	High (LLM-based)	Low (scripted)	Low (rule-based)
Setup complexity	Medium	High	High	Low
Maintenance	Self-adapting	Self-adapting	Frequent updates	Regular updates
Cost	$0.10 to $1.00 per task	$0.50 to $2.00 per task	High initial, low per-task	Low
Use cases	General web automation	Any desktop application	Repetitive business processes	Data collection
Error handling	Intelligent recovery	Intelligent recovery	Basic retry logic	Minimal
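
The per-task cost ranges in the table can be turned into a rough budget estimate. The sketch below multiplies the low and high ends of each range by a task count. The figures are copied from the table, while the function name and the 5,000-task workload are illustrative assumptions. Real costs depend on model choice, steps per task, and retries.

```python
from decimal import Decimal

# Per-task cost ranges in US dollars, taken from the comparison table above.
COST_PER_TASK = {
    "browser-use agent": (Decimal("0.10"), Decimal("1.00")),
    "computer-use agent": (Decimal("0.50"), Decimal("2.00")),
}


def cost_range(kind, tasks):
    """Return the (low, high) total cost in dollars for running `tasks` tasks."""
    low, high = COST_PER_TASK[kind]
    return low * tasks, high * tasks


# Hypothetical workload of 5,000 tasks per month.
for kind in COST_PER_TASK:
    low, high = cost_range(kind, 5000)
    print(f"{kind}: ${low} to ${high}")
```

`Decimal` is used instead of floats so that currency amounts such as 0.10 are represented exactly.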

Simple explanation (ELI5)

Imagine you have a very smart assistant who can use a computer the same way you do. Instead of needing a special back-door connection to every website, the assistant looks at the page, reads the buttons and boxes as plain text, decides what to click or type, and does it, all because you told it a goal in everyday words like "book me the cheapest flight to Tokyo." Browser Use is a free, open tool that gives AI assistants this ability by turning a messy webpage into a tidy numbered list the AI can understand. Two graduate students built it in 2024, it became wildly popular on GitHub, and investors gave the project $17 million to keep improving it.

References

Zhou, Shuyan et al., "WebArena: A Realistic Web Environment for Building Autonomous Agents", arXiv, 2023-07-25. https://arxiv.org/abs/2307.13854. Accessed 2026-05-24.
OpenAI, "Introducing Operator", OpenAI Blog, 2025-01-23. https://openai.com/index/introducing-operator/. Accessed 2026-05-24.
Žunič, Gregor, "We Raised $17M to Build the Future of Web for Agents", Browser Use Blog, 2025-03-22. https://browser-use.com/posts/seed-round. Accessed 2026-06-27.
Browser Use, "browser-use/browser-use: Make websites accessible for AI agents", GitHub repository. https://github.com/browser-use/browser-use. Accessed 2026-06-27.
Deng, Xiang et al., "Mind2Web: Towards a Generalist Agent for the Web", arXiv, 2023-06-09. https://arxiv.org/abs/2306.06070. Accessed 2026-05-24.
He, Hongliang et al., "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models", arXiv, 2024-01-25. https://arxiv.org/abs/2401.13919. Accessed 2026-05-24.
Nakano, Reiichiro et al., "WebGPT: Browser-assisted question-answering with human feedback", arXiv, 2021-12-17. https://arxiv.org/abs/2112.09332. Accessed 2026-05-24.
Yao, Shunyu et al., "ReAct: Synergizing Reasoning and Acting in Language Models", arXiv, 2022-10-06. https://arxiv.org/abs/2210.03629. Accessed 2026-05-24.
Anthropic, "Developing a computer use model", Anthropic News, 2024-10-22. https://www.anthropic.com/news/developing-computer-use. Accessed 2026-05-24.
OpenAI, "WebGPT: Improving the factual accuracy of language models through web browsing", OpenAI Blog, 2021-12-16. https://openai.com/index/webgpt/. Accessed 2026-05-24.
Adept AI, "ACT-1: Transformer for Actions", Adept Blog, 2022-09-14. https://www.adept.ai/blog/act-1. Accessed 2026-05-24.
Deng, Xiang et al., "Mind2Web: Towards a Generalist Agent for the Web", Project Page, OSU NLP Group, 2023. https://osu-nlp-group.github.io/Mind2Web/. Accessed 2026-05-24.
OpenAI, "GPT-4V(ision) system card", OpenAI, 2023-09-25. https://openai.com/index/gpt-4v-system-card/. Accessed 2026-05-24.
Müller, Magnus, "Magnus Müller, Founder of browser-use (YC W25)", LinkedIn profile. https://ch.linkedin.com/in/magnus-mueller. Accessed 2026-05-24.
Y Combinator, "Browser Use: Leading open-source web agent project with 50k stars in 3 months", Y Combinator company page. https://www.ycombinator.com/companies/browser-use. Accessed 2026-05-24.
Anthropic, "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku", Anthropic News, 2024-10-22. https://www.anthropic.com/news/3-5-models-and-computer-use. Accessed 2026-05-24.
Google DeepMind, "Project Mariner: Building the future of human-agent interaction", Google DeepMind Blog, 2024-12-11. https://deepmind.google/discover/blog/. Accessed 2026-05-24.
OpenAI, "Computer-Using Agent: Introducing a universal interface for AI to interact with the digital world", OpenAI Blog, 2025-01-23. https://openai.com/index/computer-using-agent/. Accessed 2026-05-24.
OpenAI, "Introducing ChatGPT agent: bridging research and action", OpenAI Blog, 2025-07-17. https://openai.com/index/introducing-chatgpt-agent/. Accessed 2026-05-24.
Wiggers, Kyle, "Browser Use, the tool making it easier for AI 'agents' to navigate websites, raises $17M", TechCrunch, 2025-03-23. https://techcrunch.com/2025/03/23/browser-use-the-tool-making-it-easier-for-ai-agents-to-navigate-websites-raises-17m/. Accessed 2026-06-27.
Myers, Astasia, "Felicis's Seed in Browser Use: Enabling AI Agents to Navigate the Web with Reliable Web Interaction", Felicis Ventures Blog, 2025-03-23. https://www.felicis.com/blog/investing-in-browser-use. Accessed 2026-05-24.
Wheatley, Mike, "Browser Use raises $17M to help steer AI agents through the internet", SiliconANGLE, 2025-03-23. https://siliconangle.com/2025/03/23/browser-use-raises-17m-help-steer-ai-agents-internet/. Accessed 2026-05-24.
Salesforce, "Salesforce Signs Definitive Agreement to Acquire Convergence.ai", Salesforce News, 2025-05-15. https://www.salesforce.com/news/stories/salesforce-signs-definitive-agreement-to-acquire-convergence-ai/. Accessed 2026-05-24.
Browserbase, "Browserbase Launches 'Director' to Automate the Web for Everyone; Announces $40M Series B", PR Newswire, 2025-06-18. https://www.prnewswire.com/news-releases/browserbase-launches-director-to-automate-the-web-for-everyone-announces-40m-series-b-302483761.html. Accessed 2026-05-24.
Browser Use, "Pricing", Browser Use website. https://browser-use.com/pricing. Accessed 2026-05-24.
Browser Use, "Browser Use Model: BU 2.0", Browser Use Changelog, 2026-01-27. https://browser-use.com/changelog/27-1-2026. Accessed 2026-05-24.
Chezelles, Thibault Le Sellier de et al., "BrowserGym: A Unified Ecosystem for Web Agent Research", arXiv, 2024-12-06. https://arxiv.org/abs/2412.05467. Accessed 2026-05-24.
Zeng, Aohan et al., "AgentTuning: Enabling Generalized Agent Abilities for LLMs", arXiv, 2023-10-19. https://arxiv.org/abs/2310.12823. Accessed 2026-05-24.
Browser Use, "Agent Settings", Browser Use Documentation. https://docs.browser-use.com/customize/agent-settings. Accessed 2026-05-24.
Browser Use, "Quickstart", Browser Use Documentation. https://docs.browser-use.com/quickstart. Accessed 2026-05-24.
Browser Use, "Browser Use: state of the art Web Agent", Browser Use Blog, 2025-02. https://browser-use.com/posts/sota-technical-report. Accessed 2026-06-27.
Microsoft, "Playwright: Fast and reliable end-to-end testing for modern web apps", Playwright Documentation. https://playwright.dev/. Accessed 2026-05-24.
Browser Use, "Closer to the Metal: Leaving Playwright for CDP", Browser Use Blog, 2025-08-20. https://browser-use.com/posts/playwright-to-cdp. Accessed 2026-05-24.
Browser Use, "browser-use Releases (0.12.x)", GitHub Releases. https://github.com/browser-use/browser-use/releases. Accessed 2026-05-24.
Browser Use, "browser-use 0.12.8 release notes", GitHub, 2026-05-23. https://github.com/browser-use/browser-use/releases/tag/0.12.8. Accessed 2026-05-24.
Koh, Jing Yu et al., "VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks", arXiv, 2024-01-24. https://arxiv.org/abs/2401.13649. Accessed 2026-05-24.
Yao, Shunyu et al., "WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents", arXiv, 2022-07-04. https://arxiv.org/abs/2207.01206. Accessed 2026-05-24.
Browser Use, "How we built the best browser agent with Auto-Research", Browser Use Blog. https://browser-use.com/posts/online-mind2web-benchmark. Accessed 2026-05-24.
Agashe, Saaket et al., "Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents", arXiv, 2025-04-02. https://arxiv.org/abs/2504.00906. Accessed 2026-05-24.
Magnitude Dev, "Magnitude achieves SOTA 94% on WebVoyager benchmark", GitHub. https://github.com/magnitudedev/webvoyager. Accessed 2026-05-24.
Xue, Tianci et al., "An Illusion of Progress? Assessing the Current State of Web Agents", arXiv, 2025-04-01. https://arxiv.org/abs/2504.01382. Accessed 2026-05-24.
Browser Use, "Browser Use Cloud Quickstart", Browser Use Cloud Documentation. https://docs.cloud.browser-use.com/. Accessed 2026-05-24.
Anthropic, "Piloting Claude in Chrome", Anthropic News, 2025-08-26. https://www.anthropic.com/news/claude-for-chrome. Accessed 2026-05-24.
Anthropic, "Claude for Chrome Extension", Chrome Web Store. https://chromewebstore.google.com/publisher/anthropic/u308d63ea0533efcf7ba778ad42da7390. Accessed 2026-05-24.
Anthropic, "Mitigating the risk of prompt injections in browser use", Anthropic Research. https://www.anthropic.com/research/prompt-injection-defenses. Accessed 2026-05-24.
Google for Developers, "15 updates from Google I/O 2026: Powering the agentic web with new capabilities", Chrome for Developers Blog, 2026-05. https://developer.chrome.com/blog/chrome-at-io26. Accessed 2026-05-24.
Browserbase, "Stagehand: AI-powered browser automation", GitHub. https://github.com/browserbase/stagehand. Accessed 2026-05-24.
Skyvern AI, "Skyvern: open-source browser automation with LLMs and computer vision", GitHub. https://github.com/Skyvern-AI/skyvern. Accessed 2026-05-24.
Gonsalvez, Steven, "Browser Tools for AI Agents Part 2: The Framework Wars (browser-use, Stagehand, Skyvern)", DEV Community, 2025. https://dev.to/stevengonsalvez/browser-tools-for-ai-agents-part-2-the-framework-wars-browser-use-stagehand-skyvern-4gn. Accessed 2026-05-24.
Browserable, "WebVoyager Benchmark Results", Browserable Blog. https://www.browserable.ai/blog/web-voyager-benchmark. Accessed 2026-05-24.
Cloudflare, "Browser Run: give your agents a browser", Cloudflare Blog, 2025. https://blog.cloudflare.com/browser-run-for-ai-agents/. Accessed 2026-05-24.
Microsoft Playwright Team, "Playwright Test for end-to-end testing", Microsoft. https://playwright.dev/docs/test-intro. Accessed 2026-05-24.
Scrapfly, "Stagehand vs Browser Use: AI Browser Agent Guide", Scrapfly Blog. https://scrapfly.io/blog/posts/stagehand-vs-browser-use. Accessed 2026-05-24.
Lightpanda, "CDP Under the Hood: A Deep Dive", Lightpanda Blog. https://lightpanda.io/blog/posts/cdp-under-the-hood. Accessed 2026-05-24.
NxCode, "Stagehand vs Browser Use vs Playwright: AI Browser Automation Compared (2026)", NxCode Resources, 2026. https://www.nxcode.io/resources/news/stagehand-vs-browser-use-vs-playwright-ai-browser-automation-2026. Accessed 2026-05-24.
Steel.dev, "AI Browser Agent Leaderboards", Steel.dev. https://leaderboard.steel.dev/. Accessed 2026-05-24.
Skyvern, "Browser Use vs Stagehand: Which is Better? (February 2026)", Skyvern Blog, 2026-02. https://www.skyvern.com/blog/browser-use-vs-stagehand-which-is-better/. Accessed 2026-05-24.
Greshake, Kai et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection", arXiv, 2023-02-23. https://arxiv.org/abs/2302.12173. Accessed 2026-05-24.
Aim Security, "EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System", arXiv, 2025-09. https://arxiv.org/abs/2509.10540. Accessed 2026-05-24.
National Vulnerability Database, "CVE-2025-54135 Cursor IDE remote code execution via prompt injection in README", NIST NVD, 2025. https://nvd.nist.gov/vuln/detail/CVE-2025-54135. Accessed 2026-05-24.
Brave Software, "Unseeable prompt injections in screenshots: more vulnerabilities in Comet and other AI browsers", Brave Blog, 2025-10. https://brave.com/blog/unseeable-prompt-injections/. Accessed 2026-05-24.
Wiggers, Kyle, "OpenAI says AI browsers may always be vulnerable to prompt injection attacks", TechCrunch, 2025-12-22. https://techcrunch.com/2025/12/22/openai-says-ai-browsers-may-always-be-vulnerable-to-prompt-injection-attacks/. Accessed 2026-05-24.
Hutton, Catherine, "OpenAI says prompt injections that can trick AI browsers may never be fully solved", Fortune, 2025-12-23. https://fortune.com/2025/12/23/openai-ai-browser-prompt-injections-cybersecurity-hackers/. Accessed 2026-05-24.
Google Security, "AI threats in the wild: The current state of prompt injections on the web", Google Security Blog, 2026. https://blog.google/security/prompt-injections-web/. Accessed 2026-05-24.
Securance, "Prompt injection: the OWASP #1 AI threat in 2026", Securance Blog, 2026. https://www.securance.com/blog/prompt-injection-the-owasp-1-ai-threat-in-2026/. Accessed 2026-05-24.
Malwarebytes Labs, "AI browsers could leave users penniless: A prompt injection warning", Malwarebytes, 2025-08. https://www.malwarebytes.com/blog/news/2025/08/ai-browsers-could-leave-users-penniless-a-prompt-injection-warning. Accessed 2026-05-24.
Microsoft, "Copilot Studio: build agents for the web and Microsoft 365", Microsoft Copilot Studio Documentation. https://learn.microsoft.com/en-us/microsoft-copilot-studio/. Accessed 2026-05-24.
Anthropic, "Introducing the Model Context Protocol", Anthropic News, 2024-11-25. https://www.anthropic.com/news/model-context-protocol. Accessed 2026-05-24.
Google, "Announcing the Agent2Agent Protocol", Google Developers Blog, 2025. https://developers.googleblog.com/. Accessed 2026-05-24.
Furuta, Hiroki et al., "Multimodal Web Navigation with Instruction-Finetuned Foundation Models", arXiv, 2023-05-19. https://arxiv.org/abs/2305.11854. Accessed 2026-05-24.
LangChain, "LangChain documentation: Browser tools and agents", LangChain Docs. https://python.langchain.com/. Accessed 2026-05-24.
