Browser Use

Browser Use is an open-source Python library that enables large language models to control web browsers, turning any LLM into an agent capable of browsing the web, clicking buttons, filling forms, and completing multi-step tasks online. Released in late 2024 by Magnus Müller and Gregor Žunič, the project accumulated over 50,000 GitHub stars within three months of its public release, surpassed 78,000 stars by mid-2025, and was approaching 95,000 stars by early 2026, making it one of the fastest-growing AI agent infrastructure projects on GitHub.[^1][^2] The company behind the library, Browser Use Inc., was part of Y Combinator's Winter 2025 batch and raised a $17 million seed round in March 2025.[^3]

Where proprietary alternatives such as OpenAI Operator and Anthropic's Computer Use rely on closed APIs, Browser Use provides an open-source foundation that developers can inspect, modify, and self-host. It originally wrapped the Playwright browser automation library in an LLM-driven control loop, exposing a simple Python interface for building web agents; with the CLI 2.0 release in March 2026, the project began replacing Playwright with direct Chrome DevTools Protocol (CDP) implementations to reduce overhead.[^4]

History

Magnus Müller had been working on web scraping tools for several years before meeting Gregor Žunič in 2024 while both were completing master's degrees in data science at ETH Zurich. The two began collaborating at ETH Zurich's Student Project House accelerator, a workspace designed for side projects. They built the first version of Browser Use in approximately five weeks.[^7]

The initial demo resonated with developers building AI agents who had struggled with brittle web automation. Müller and Žunič open-sourced the project under the MIT license, and the GitHub repository grew rapidly, attracting contributors and users building everything from personal automation scripts to commercial products.[^1] The speed of adoption caught the attention of Y Combinator, and the two founders joined the Winter 2025 batch in San Francisco.

By early 2025, Browser Use had become a common dependency in AI agent projects. The company announced its $17 million seed round in March 2025 and around the same time began rolling out Browser Use Cloud, a managed hosting product that lets teams use Browser Use without running their own browser infrastructure.[^3][^4]

In March 2025, Chinese AI startup Butterfly Effect launched Manus AI, an autonomous agent that could complete complex multi-step tasks such as resume screening and stock analysis. Manus became a viral phenomenon, and Browser Use was one of the tools powering its web interaction capabilities. A post explaining this connection garnered over 2.4 million views on X, introducing Browser Use to a large audience outside the developer community.[^3]

Through the second half of 2025 the project shifted from a single-purpose library into a layered product suite. The team shipped Profile Use for syncing real Chrome profiles in October 2025. On November 4, 2025 it achieved SOC 2 Type II compliance, following an audit by Accorp Partners covering July 17 to October 17, 2025. A community Templates Library followed on November 13, MCP server support and Gemini 3 integration on November 21, and the Skills API for plain-text automation workflows on December 4. On December 16, 2025 the team open-sourced the first version of its in-house browser model.[^5][^6]

The pace accelerated in 2026. By February the engineering team had grown to roughly eight people, and Müller publicly noted he was spending less time writing code and more time in a CEO role.[^7] Browser Use Cloud restructured pricing on April 3, 2026 to add a permanent free tier, and on April 25, 2026 the company launched Browser Use Box (BUX), a 24/7 remote virtual machine bundling Claude Code with the Browser Harness and accessible through a web terminal, SSH, or a user-supplied Telegram bot.[^8][^9]

Funding

Browser Use announced a $17 million seed round on March 23, 2025. The round was led by Felicis Ventures, with participation from A Capital, Nexus Venture Partners, Y Combinator, SV Angel, Pioneer Fund, Liquid2, and Paul Graham as an individual investor.[^3]

At the time of the funding announcement, the company employed approximately seven people and had been adopted by engineering teams at companies including Airbnb, Amazon, and Anthropic. More than 20 companies in the YC W25 batch were using Browser Use in their core workflows.[^3] The seed funding was directed toward building out Browser Use Cloud, expanding the team, and improving the reliability of the web agent framework. As of mid-2026 the company has not announced a Series A round.[^7]

Architecture

Browser Use was originally built on top of Playwright, Microsoft's open-source browser automation library, with Playwright handling low-level browser control and Browser Use adding an LLM reasoning layer that also queried the Chrome DevTools Protocol (CDP) directly for page state. The CLI 2.0 release on March 22, 2026 replaced the Playwright dependency in the command-line tool with a direct CDP implementation, citing a roughly twofold speed improvement and lower cost per task as the motivation.[^4][^10]

The central architectural insight is how Browser Use represents web pages to the language model. Rather than sending raw HTML or screenshots to the LLM, it extracts the page's Document Object Model (DOM) tree and processes it into a cleaned, structured representation that strips out irrelevant styling and metadata. This text-based format is considerably cheaper for a language model to process than pixel-based approaches, which must encode a full-resolution screenshot into the model's input on every step.

DOM extraction and the element tree

The core component responsible for making web pages readable to LLMs is the DomService. When an agent loads a page, the process works as follows:

Multiple CDP calls gather the full browser state, including accessibility data, layout metrics, and interactive element locations.
A semantic tree is constructed from the page DOM, filtering out non-interactive and invisible elements to reduce noise.
Each interactive element (button, input field, link, dropdown, checkbox, or similar component) receives a numeric index.
This tree is serialized as a compact XML-style string included in the LLM's context window.

The result looks something like:

[1]<button>Sign in</button>
[2]<input type="text" placeholder="Email address" />
[3]<a href="/forgot-password">Forgot password?</a>

The LLM reads this representation, decides which element to interact with, and returns a structured action command referencing elements by their index. This avoids the need to parse raw HTML or interpret screenshots for precise click targeting.
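The filtering, indexing, and serialization steps can be illustrated with a short Python sketch. It is not the DomService code: the `Element` dataclass, its fields, and the `serialize_tree` helper are hypothetical, and the input is a hand-built list rather than data gathered over CDP. The point is that the model only ever sees numbered lines, while a `selector_map` kept on the library side resolves an index such as 1 back to a concrete node when an action is executed.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical, simplified view of one DOM node."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    interactive: bool = False
    visible: bool = True


def serialize_tree(elements: list[Element]) -> tuple[str, dict[int, Element]]:
    """Keep visible interactive nodes, number them, and render one line each."""
    lines: list[str] = []
    selector_map: dict[int, Element] = {}
    index = 1
    for el in elements:
        # Drop invisible and non-interactive nodes to reduce noise.
        if not (el.interactive and el.visible):
            continue
        selector_map[index] = el
        attr_str = "".join(f' {k}="{v}"' for k, v in el.attrs.items())
        if el.tag == "input":
            lines.append(f"[{index}]<{el.tag}{attr_str} />")
        else:
            lines.append(f"[{index}]<{el.tag}{attr_str}>{el.text}</{el.tag}>")
        index += 1
    return "\n".join(lines), selector_map


page = [
    Element("div", "Welcome back"),  # not interactive: dropped
    Element("button", "Sign in", interactive=True),
    Element("input", attrs={"type": "text", "placeholder": "Email address"},
            interactive=True),
    Element("span", "tooltip", interactive=True, visible=False),  # hidden: dropped
    Element("a", "Forgot password?", attrs={"href": "/forgot-password"},
            interactive=True),
]

text, selector_map = serialize_tree(page)
print(text)
```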

Vision-based fallback

For pages where DOM extraction is insufficient, Browser Use includes a screenshot-based fallback. When Claude Sonnet 4 or Claude Opus 4 models are used, coordinate clicking is automatically enabled, letting the agent click on arbitrary coordinates in a screenshot if the element tree does not contain a suitable target. This hybrid approach lets the library handle dynamic content, canvas elements, and pages that use non-standard rendering pipelines.

Agent loop

The Agent class orchestrates execution through an iterative loop:

The MessageManager constructs a prompt that includes the current browser state (element tree or screenshot), the task description, and the conversation history.
The prompt is sent to the configured LLM, which returns a structured action.
The Tools registry dispatches the action to the appropriate browser operation via CDP.
The browser executes the operation, and the new page state is observed.
The loop repeats until the agent calls the done action or a maximum step count is reached.

An ActionLoopDetector monitors for repetitive patterns in the action history and can interrupt the loop to prevent the agent from cycling in place. An AgentHistoryList maintains a record of all previous steps, giving the LLM context about what has already been attempted.
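The control flow of the loop, including a simple repetition check, can be sketched with a scripted stand-in for the LLM. The names (`run_agent`, `detect_loop`), the dictionary-shaped actions, and the rule of three identical actions in a row are assumptions made for illustration; they do not reproduce the Agent, MessageManager, or ActionLoopDetector implementations.

```python
from typing import Callable

Action = dict  # e.g. {"name": "click", "index": 3}


def detect_loop(history: list[Action], window: int = 3) -> bool:
    """Report a loop when the last `window` actions are all identical."""
    if len(history) < window:
        return False
    recent = history[-window:]
    return all(action == recent[0] for action in recent)


def run_agent(task: str,
              llm: Callable[[str, list[Action]], Action],
              execute: Callable[[Action], str],
              max_steps: int = 10) -> dict:
    history: list[Action] = []
    state = "about:blank"
    for step in range(1, max_steps + 1):
        # Build the prompt from the task, the current page state, and history.
        prompt = f"Task: {task}\nState: {state}\nSteps so far: {len(history)}"
        # Ask the model for the next structured action.
        action = llm(prompt, history)
        history.append(action)
        if action["name"] == "done":
            return {"status": "done", "steps": step, "result": action.get("text")}
        # Stop early instead of cycling in place.
        if detect_loop(history):
            return {"status": "loop_detected", "steps": step, "result": None}
        # Dispatch the action and observe the resulting page state.
        state = execute(action)
    return {"status": "max_steps", "steps": max_steps, "result": None}


scripted = iter([
    {"name": "navigate", "url": "https://example.com/login"},
    {"name": "click", "index": 1},
    {"name": "done", "text": "Signed in"},
])
result = run_agent("Sign in", lambda prompt, hist: next(scripted),
                   lambda action: f"page after {action['name']}")
stuck = run_agent("Sign in", lambda prompt, hist: {"name": "click", "index": 1},
                  lambda action: "same page")
print(result)  # {'status': 'done', 'steps': 3, 'result': 'Signed in'}
print(stuck)   # {'status': 'loop_detected', 'steps': 3, 'result': None}
```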

Action space

Browser Use exposes a registry of high-level actions that the LLM can invoke. The built-in action set includes:

Action	Description
`navigate`	Load a URL in the browser
`click`	Click an element by its index
`type`	Type text into an input field
`scroll`	Scroll the page in a given direction
`extract`	Run a separate LLM call against the page content to retrieve specific information
`go_back`	Navigate to the previous page
`wait`	Pause execution for a specified duration
`screenshot`	Capture the current page state as an image
`select_dropdown`	Select an option from a dropdown menu
`upload_file`	Upload a file to a file input
`evaluate`	Execute arbitrary JavaScript on the page
`done`	Signal task completion and return a result

Developers can register custom actions by decorating Python functions with the @action decorator and passing them to the agent at initialization. Custom actions give agents access to domain-specific operations such as writing to a database, calling an external API, or reading from the local file system.

The extract action is a notable design choice. Rather than asking the main LLM to reason about large amounts of page text, it spawns a separate, lightweight LLM call against the page's markdown content to pull out only the relevant data, keeping the main context window focused on decision-making.
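The separation between the main agent and the extraction call can be sketched as a helper that sends page text and a request to a second, cheaper model and returns only the answer. The `extract` function, the prompt wording, the `max_chars` cap, and the regex-based `fake_small_llm` stand-in are all assumptions for illustration; a real setup would call an actual model.

```python
import re
from typing import Callable


def extract(page_markdown: str, query: str,
            small_llm: Callable[[str], str], max_chars: int = 20_000) -> str:
    """Answer `query` from page text in a separate, isolated model call."""
    # Only this helper sees the (possibly long) page text; the main agent's
    # context receives just the short answer it returns.
    prompt = ("Extract only the information requested.\n"
              f"Request: {query}\n\nPage:\n{page_markdown[:max_chars]}")
    return small_llm(prompt).strip()


def fake_small_llm(prompt: str) -> str:
    # Hypothetical stand-in for a cheap model: return the first dollar amount.
    match = re.search(r"\$[\d,]+(?:\.\d{2})?", prompt)
    return match.group(0) if match else "not found"


page = "# Example Laptop\n\nFrom $1,234.00.\n\n" + "Footer link. " * 2000
answer = extract(page, "starting price", fake_small_llm)
print(len(page), "characters of page text ->", repr(answer))
```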

Skills

Released on December 4, 2025, the Skills API lets developers describe reusable automation workflows in plain text and have the agent execute them deterministically on later runs. A skill captures a sequence such as logging into a SaaS portal and downloading an invoice, and exposes it as a callable building block from the standard agent loop. The intent is to reduce per-task LLM cost by replaying known-good interaction sequences instead of re-deriving them from scratch.[^6]
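The replay idea behind skills can be sketched as follows: run a recorded sequence of steps without consulting the model, and hand the task back to the LLM-driven agent only if a step fails. The `Skill` dataclass, `run_skill`, the step format, and the example portal URL are hypothetical; the actual Skills API is a hosted product whose interface is not shown here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Skill:
    """Hypothetical record of a known-good interaction sequence."""
    name: str
    steps: list[dict]


def run_skill(skill: Skill,
              execute: Callable[[dict], bool],
              fallback: Callable[[str], str]) -> str:
    """Replay recorded steps with no LLM calls; defer to the agent on failure."""
    for i, step in enumerate(skill.steps):
        if not execute(step):
            # The page no longer matches the recording, so hand the task to
            # the (more expensive) LLM-driven agent loop instead.
            return fallback(f"{skill.name} failed at step {i}")
    return f"{skill.name}: replayed {len(skill.steps)} steps without the LLM"


download_invoice = Skill("download_invoice", [
    {"name": "navigate", "url": "https://portal.example.com/login"},
    {"name": "type", "target": "email", "text": "me@example.com"},
    {"name": "click", "target": "sign-in"},
    {"name": "click", "target": "latest-invoice"},
])


def agent(reason: str) -> str:
    return f"agent run ({reason})"


print(run_skill(download_invoice, lambda step: True, agent))
# A layout change that breaks clicks triggers the fallback at step 2.
print(run_skill(download_invoice, lambda step: step["name"] != "click", agent))
```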

How it works

Installing Browser Use requires Python 3.11 or higher. The package is available on PyPI:

uv add browser-use

A minimal agent looks like this:

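import asyncio

from browser_use import Agent, ChatOpenAI


async def main():
    # Expects OPENAI_API_KEY to be set in the environment.
    agent = Agent(
        task="Find the price of a MacBook Pro on apple.com",
        llm=ChatOpenAI(model="gpt-4o"),
    )
    history = await agent.run()  # AgentHistoryList with every step taken
    print(history.final_result())


asyncio.run(main())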

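The Agent class takes a task description in plain English, a chat model from one of the library's provider wrappers (early releases accepted any LangChain-compatible LLM), and optional configuration for browser settings, the maximum number of steps, and callbacks. The agent runs asynchronously, and `run()` returns an AgentHistoryList recording each step and the final result, whether the task completed or failed.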

More complex workflows can chain multiple agents, define custom actions that extend the agent's capabilities beyond the built-in browser actions, or integrate Browser Use into larger agent systems. The library exposes hooks for observing agent state at each step, which makes it possible to build monitoring dashboards, inject human-in-the-loop approvals, or log detailed traces for debugging.
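A minimal sketch of per-step hooks, assuming one hook that may veto a step (a stand-in for human approval) and one that records a trace. The function names, the hook signatures, and the `RISKY` policy set are hypothetical and do not reproduce the library's own callback interface.

```python
from typing import Callable

RISKY = {"upload_file", "evaluate"}  # hypothetical policy: steps needing sign-off


def run_with_hooks(actions: list[dict],
                   on_step_start: Callable[[int, dict], bool],
                   on_step_end: Callable[[int, dict], None]) -> list[dict]:
    """Run planned actions; the start hook may veto a step, the end hook observes it."""
    executed: list[dict] = []
    for step, action in enumerate(actions, start=1):
        if not on_step_start(step, action):
            break  # approval refused: stop before touching the page
        executed.append(action)  # stand-in for real browser execution
        on_step_end(step, action)
    return executed


trace: list[str] = []


def approve(step: int, action: dict) -> bool:
    # Stand-in for a human prompt: auto-reject risky actions in this demo.
    return action["name"] not in RISKY


def log_step(step: int, action: dict) -> None:
    trace.append(f"step {step}: {action['name']}")


executed_actions = run_with_hooks(
    [{"name": "navigate"}, {"name": "click"}, {"name": "upload_file"}, {"name": "done"}],
    on_step_start=approve, on_step_end=log_step,
)
print(trace)  # ['step 1: navigate', 'step 2: click']
```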

Supported models

Browser Use is model-agnostic. Early releases relied on LangChain's model abstraction layer; current releases ship their own wrapper classes for a wide range of providers:

Provider	Notes
Browser Use Cloud (`ChatBrowserUse`)	Proprietary model optimized for browser tasks; variants `bu-latest` and `bu-2-0` released January 27, 2026
OpenAI (`ChatOpenAI`)	GPT-4o, GPT-4.1, and compatible models
Anthropic (`ChatAnthropic`)	Claude Sonnet and Opus; coordinate clicking auto-enabled for Sonnet 4 and Opus 4
Google (`ChatGoogle`)	Gemini 2.5 Flash and Pro; Vertex AI and Gemma supported; Gemini 3 added November 21, 2025
Azure OpenAI	Azure-hosted GPT models
AWS Bedrock	Claude and other Bedrock-hosted models
DeepSeek	DeepSeek-V3, DeepSeek-R1
Mistral	Mistral and Mixtral models
Groq	Llama models served via Groq's inference API
Ollama	Local models
OpenRouter	Access to 300+ models through a single endpoint
LiteLLM	Any LiteLLM-compatible provider string

The ChatBrowserUse model is a proprietary model built by Browser Use for browser automation tasks. The first generation, retroactively named BU 1.0, was open-sourced on December 16, 2025 with an efficiency claim of approximately 200 tasks per dollar.[^6] BU 2.0 followed on January 27, 2026, raising accuracy on Browser Use's internal task suite from 74.7% to 83.3%, a gain of 8.6 percentage points (about 12% relative), at an unchanged latency of roughly 62 seconds per task. The team described BU 2.0 as matching Claude Opus 4.5 accuracy while running about 40% faster.[^11]
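The BU 2.0 accuracy figures can be checked with a few lines of arithmetic, which also show why the absolute gain (in percentage points) and the relative gain (as a share of the BU 1.0 score) should be kept apart. The per-task cost is simply the reciprocal of the stated 200 tasks per dollar.

```python
bu1, bu2 = 74.7, 83.3  # reported accuracy (%) on the internal task suite

absolute_gain = bu2 - bu1                  # percentage points
relative_gain = absolute_gain / bu1 * 100  # percent of the BU 1.0 score
cost_per_task = 1 / 200                    # dollars, from "~200 tasks per dollar"

print(f"{absolute_gain:.1f} points absolute, {relative_gain:.1f}% relative")
print(f"${cost_per_task:.3f} per task")
```

The roughly 12% figure therefore describes the relative improvement; in absolute terms the gain is 8.6 points, and 200 tasks per dollar works out to half a cent per task.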

For cost-sensitive applications, smaller models like GPT-4o mini and Gemini 2.0 Flash work reasonably well on straightforward tasks, while complex multi-step workflows typically benefit from more capable models. Vision-language models are preferable for pages with complex visual layouts that the DOM representation alone does not capture accurately.

Browser Use Cloud

In addition to the open-source library, Browser Use operates a managed cloud platform at browser-use.com. The cloud platform provides managed browser infrastructure so developers do not need to run Playwright or Chrome locally.

Browser Use Cloud offers several deployment modes. In Browser Harness mode, teams use their own agent code (the open-source library) but connect to cloud-hosted browsers rather than spinning up local Chromium instances. In Hosted V3 Agent mode (introduced with the experimental Agent API and SDK 3.0 on February 25, 2026), a fully managed agent accepts a task description and handles everything from browser control to LLM reasoning without requiring any agent code from the user.[^6]

Key features of Browser Use Cloud include stealth browsers designed to bypass fingerprinting, managed proxy pools covering 195+ countries, built-in CAPTCHA solving (including an hCaptcha solver added with CLI 2.0 and improved reCAPTCHA accuracy), parallel session execution, and persistent browser profiles for sites requiring authentication. The company completed its SOC 2 Type II audit on November 4, 2025, a step aimed at unblocking sales conversations with regulated enterprise buyers.[^5][^10]

Browser Use Box (BUX)

Launched on April 25, 2026, Browser Use Box is a remote virtual machine that comes pre-loaded with Claude Code and the Browser Harness. Spin-up takes 30 to 60 seconds from a baked AMI, browser sessions automatically rotate every 240 minutes, and the VM runs under a locked-down IAM role with zero AWS permissions. Users can attach a Telegram bot token created via @BotFather to control the agent from a phone, SSH directly into the box, or use the cloud.browser-use.com/bux web terminal. The launch announcement positioned BUX as a workaround for complex dashboards such as Microsoft Azure and Google Workspace that pure browser agents had historically struggled to operate reliably.[^9]

Pricing

Browser Use Cloud reorganized its pricing on April 3, 2026 around a free starter tier plus per-credit usage on top of subscriptions.[^8] Browser sessions cost $0.06 per hour, and proxy bandwidth is charged at $5 per GB. The V3 Agent uses token-based pricing at 1.2 times the underlying provider's rates (for example, Claude Sonnet 4.6 at $3.60 input and $18.00 output per million tokens), and Browser Use Box is billed hourly, which works out to roughly $1 to $4 per day depending on the compute tier.[^12]

Plan	Monthly price	Credits included	Concurrent sessions
Free	$0	Community browsers only	3
Dev	$29	$29	25
Business	$299	$400	200
Scaleup	$999	$1,400	500
Enterprise	Custom	Negotiated	Negotiated

Business and Scaleup subscribers receive a 25% discount on LLM step costs and a 50% discount on skill execution, browser sessions, and proxy data. All paid tiers support bring-your-own-key, so customers can plug in their own Anthropic, OpenAI, or Google credentials and avoid the token markup. New accounts receive $10 in free credits.[^12]
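A rough monthly estimator, built from the rates above, shows how the subscription fee, usage charges, discounts, and bring-your-own-key option combine. Two parts are assumptions rather than documented billing rules: that the Business and Scaleup discount on LLM step costs applies to the marked-up token price, and that usage is first deducted from the plan's included credits with any excess billed on top. The `monthly_estimate` function is hypothetical.

```python
# Hypothetical estimator. Rates come from the pricing section; how discounts
# and included credits interact is an assumption, not documented billing logic.
PLANS = {  # plan: (monthly fee in USD, included credits in USD)
    "dev": (29.0, 29.0),
    "business": (299.0, 400.0),
    "scaleup": (999.0, 1400.0),
}
DISCOUNTED_PLANS = {"business", "scaleup"}


def monthly_estimate(plan: str, session_hours: float, proxy_gb: float,
                     provider_llm_cost: float, byok: bool = False) -> dict:
    fee, credits = PLANS[plan]
    sessions = session_hours * 0.06   # $0.06 per browser-session hour
    proxy = proxy_gb * 5.0            # $5 per GB of proxy data
    # Without BYOK, tokens are billed at 1.2x provider rates; with BYOK the
    # provider bills the customer directly and nothing appears here.
    llm = 0.0 if byok else provider_llm_cost * 1.2
    if plan in DISCOUNTED_PLANS:
        sessions *= 0.5               # 50% off browser sessions
        proxy *= 0.5                  # 50% off proxy data
        llm *= 0.75                   # assumed: 25% off the marked-up LLM cost
    usage = sessions + proxy + llm
    overage = max(0.0, usage - credits)  # assumed: credits absorb usage first
    return {
        "usage": round(usage, 2),
        "browser_use_bill": round(fee + overage, 2),
        "paid_to_llm_provider": provider_llm_cost if byok else 0.0,
    }


for plan in ("dev", "business"):
    print(plan, monthly_estimate(plan, session_hours=300, proxy_gb=2,
                                 provider_llm_cost=50.0))
print("dev+byok", monthly_estimate("dev", 300, 2, 50.0, byok=True))
```

Under these assumptions, 300 session hours, 2 GB of proxy data, and $50 of provider-rate LLM usage come to $88.00 of usage on the Dev plan, most of it billed as overage beyond the $29 credit; on Business the same workload is $59.00 of usage and fits inside the included credits.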

Comparison with alternatives

Browser Use occupies a different position in the market from the two most prominent proprietary alternatives, OpenAI Operator and Anthropic's Computer use. The three approaches differ significantly in architecture, cost structure, and intended audience.

Feature	Browser Use	OpenAI Operator	Anthropic Computer Use
Open source	Yes (MIT)	No	No
Self-hostable	Yes	No	No
Interaction approach	DOM element tree (+ vision fallback)	Screenshots (vision-first)	Screenshots (vision-first)
Model support	Any LLM (15+ providers)	Proprietary CUA model	Claude only
Desktop app control	No	No	Yes
Developer customization	High (custom actions, model choice)	Low (no direct API as of 2025)	Limited
WebVoyager accuracy	89.1% (original 2025 figure)	87%	~56%
Pricing	Free self-hosted; cloud free tier, paid plans from $29/mo	Requires OpenAI Pro ($200/mo)	Token-based via Claude API

OpenAI Operator is a standalone consumer-facing agent product that runs in its own sandboxed browser. It is designed for non-technical users and offers a polished interface with a takeover mode for entering sensitive credentials. Because it does not expose a developer API, it is not suitable for building custom automation pipelines.

Anthropic Computer Use works by giving Claude the ability to view screenshots of a desktop environment and issue mouse and keyboard commands. Because it operates at the pixel level, it can control any desktop application, not just web pages. However, this pixel-level approach is slower, more expensive in tokens, and less reliable on web-specific tasks than DOM-based methods.

Browser Use targets developers. The open-source library is free, works with any LLM, and can be customized extensively through the action registry. The trade-off is that it requires Python experience and cannot control non-browser desktop applications.

Manus AI used Browser Use as its web interaction layer, demonstrating that the library can serve as infrastructure inside a larger agentic system.

Performance

Browser Use published a technical report presenting results on the WebVoyager benchmark, a widely used evaluation suite for web agents covering tasks across diverse websites including Amazon, Google Flights, GitHub, Hugging Face, Apple, and Booking.com.[^13]

On a 586-task version of WebVoyager (57 of the original 643 tasks were removed because of outdated dataset information, Cloudflare blocks, or ambiguous specifications), Browser Use achieved an 89.1% success rate using GPT-4o as the primary model. This compared favorably with OpenAI Operator's reported WebVoyager score of 87.0%, although the two results were not necessarily produced on identical task sets.[^13]
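The task counts behind the headline figure can be worked through directly. The sketch recovers the approximate number of successful tasks from the reported rate and then computes a hypothetical worst case in which all 57 removed tasks are counted as failures; that worst-case rate illustrates the denominator effect and is not a result anyone reported.

```python
original_tasks = 643
removed_tasks = 57
evaluated = original_tasks - removed_tasks    # 586 tasks actually run
reported_rate = 0.891

successes = round(reported_rate * evaluated)  # about 522 successful tasks
# Hypothetical worst case: score every removed task as a failure.
worst_case_full_set = successes / original_tasks

print(evaluated, successes, f"{worst_case_full_set:.1%}")  # 586 522 81.2%
```

Counting the removed tasks as successes instead gives (522 + 57) / 643, about 90.0%, so a full-set score could land anywhere in that band depending on how those tasks would have gone.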

Results by domain:

Domain	Success rate
Hugging Face	100%
Google Flights	95%
Amazon	92%
GitHub	92%
Apple	91%
Booking.com	80%

The Magnitude web agent subsequently reported a 93.9% success rate on WebVoyager using a different technical approach, and a separate submission from Om Labs claimed 98.9% (603 of 610 tasks) in April 2026, though benchmark conditions, model versions, and evaluation methodologies vary substantially across implementations.[^14] On the public 2026 leaderboard snapshot of three-model averages, GLM-5V-Turbo led at 88.5% and Claude Opus 4.6 followed at 88.0%; because these figures come from a different evaluation setup, they are not directly comparable with Browser Use's original 89.1%.[^15] The WebVoyager benchmark has known limitations and is best treated as a directional indicator rather than a definitive ranking.

Use cases

Developers use Browser Use across a wide range of automation scenarios.

Web research and data collection agents visit multiple websites to compare prices, gather product specifications, collect publicly available data, or aggregate information across sources. Because the library works with authenticated sessions, it can access data on sites that would block a traditional scraper.

Form automation covers filling out and submitting web forms, including multi-step workflows like job applications, registration flows, and checkout processes. The agent reads the form structure dynamically, reducing maintenance overhead compared to brittle CSS-selector-based scripts.

CRM and SaaS data entry is another common use case. Companies transfer data between SaaS platforms that lack official API integrations; the agent logs into each platform using a stored browser profile and performs the data transfer through the user interface. The Browser Use Box launch specifically highlighted complex enterprise dashboards such as Microsoft Azure and Google Workspace as a category that benefits from a long-running remote VM rather than a stateless cloud session.[^9]

QA teams write natural-language test scripts that the agent executes against staging environments. Unlike Selenium or Playwright scripts, the agent can handle dynamic layouts that change between releases without requiring selector updates.

Shopping agents compare product availability, track price changes, add items to carts, and complete purchases when prices hit a target threshold.

One widely shared demo showed an agent filling out job applications by reading job requirements, matching them to a resume, and completing application forms across dozens of employer sites.

Limitations

Browser Use has several constraints worth considering.

The library controls a web browser only. It cannot interact with native desktop applications, system dialogs, or operating system interfaces. For those use cases, Computer Use or other screenshot-based desktop agents are more appropriate.

Agent performance depends on the quality of the underlying LLM. On complex tasks with many steps, smaller or less capable models may make incorrect decisions that cause cascading failures. The 89.1% WebVoyager result was achieved with GPT-4o; smaller models score lower.

Some sites aggressively detect automated browsers through fingerprinting, CAPTCHA challenges, or behavior analysis. The open-source library does not include stealth features by default, so reliably accessing these sites typically requires the cloud platform's stealth browsers.

Sites that load content asynchronously may present an empty or partial element tree if the agent acts before JavaScript finishes rendering. The library includes a wait action and some automatic stabilization, but highly dynamic pages can still cause premature action on incomplete states.
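One common way to reduce premature actions is to poll the page until its interactive-element count stops changing. The sketch below assumes a `snapshot` callable that returns that count; the function name, the polling interval, and the three-unchanged-polls rule are hypothetical and are not Browser Use's stabilization logic.

```python
import time
from typing import Callable


def wait_for_stable_tree(snapshot: Callable[[], int], *, interval: float = 0.25,
                         stable_polls: int = 3, timeout: float = 10.0) -> int:
    """Poll until the interactive-element count stops changing or time runs out."""
    deadline = time.monotonic() + timeout
    last = snapshot()
    unchanged = 0
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = snapshot()
        # Reset the streak whenever elements appear or disappear.
        unchanged = unchanged + 1 if current == last else 0
        last = current
        if unchanged >= stable_polls:
            return current
    raise TimeoutError(f"page still changing after {timeout}s (last count: {last})")


# Simulated page: elements keep appearing for a few polls, then settle at 12.
counts = iter([0, 4, 9, 12, 12, 12, 12])
print(wait_for_stable_tree(lambda: next(counts), interval=0.01))  # 12
```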

On pages with very large element trees, the serialized representation can consume a significant portion of the LLM's context window, leaving less room for task history and reasoning. The DomService applies filtering heuristics to reduce tree size, but complex pages with many interactive elements can still produce large inputs.
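A simple way to bound the element tree's share of the context window is to keep whole element lines until a budget runs out and note how many were dropped. The roughly four-characters-per-token estimate, the `fit_to_budget` name, and the truncation message are assumptions for this sketch; a real implementation would use the model's tokenizer and a smarter prioritization than document order.

```python
def fit_to_budget(tree_lines: list[str], max_tokens: int,
                  chars_per_token: int = 4) -> str:
    """Keep whole element lines, in order, until an approximate budget is spent."""
    budget = max_tokens * chars_per_token  # crude estimate, not a real tokenizer
    kept: list[str] = []
    used = 0
    for line in tree_lines:
        cost = len(line) + 1  # +1 for the newline separator
        if used + cost > budget:
            break
        kept.append(line)
        used += cost
    omitted = len(tree_lines) - len(kept)
    if omitted:
        # Tell the model the view is partial so it can scroll or extract instead.
        kept.append(f"... {omitted} more elements not shown")
    return "\n".join(kept)


lines = [f"[{i}]<a>Result {i}</a>" for i in range(1, 501)]
view = fit_to_budget(lines, max_tokens=100)
print(view.splitlines()[-2:])  # ['[19]<a>Result 19</a>', '... 481 more elements not shown']
```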

The benchmark methodology also raises comparability questions. The team removed 57 tasks from the WebVoyager evaluation set, which means the reported 89.1% figure is not directly comparable to implementations that run the original unmodified benchmark.

LiteLLM supply chain incident (March 2026)

On March 24, 2026, the upstream litellm Python package was hit by a supply chain attack in which versions 1.82.7 and 1.82.8 were published with malicious code that exfiltrated cloud and CI/CD credentials. The compromised packages were live on PyPI for approximately 40 minutes before being quarantined. Because litellm was a direct dependency of Browser Use (as it was of CrewAI, DSPy, Mem0, and several other widely used agent frameworks), the project shipped two emergency releases: 0.12.4 on March 24, pinning a known-good litellm version, and 0.12.5 on March 25, removing litellm from the core dependencies entirely and exposing ChatLiteLLM only as an optional install.[^16][^17] Users whose environments pulled in a compromised litellm version during the exposure window were advised to rotate API keys and audit their environments.
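A quick local check for the two compromised releases can be written with the standard library's `importlib.metadata`. It is a minimal sketch: `litellm_status` is a hypothetical helper, and it inspects only the version installed now, so an environment that had a bad release during the exposure window and has since been upgraded still needs its credentials rotated.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable

COMPROMISED = {"1.82.7", "1.82.8"}  # litellm releases named in the incident


def litellm_status(get_version: Callable[[str], str] = version) -> str:
    """Report whether the currently installed litellm is a compromised release."""
    try:
        installed = get_version("litellm")
    except PackageNotFoundError:
        return "litellm is not installed"
    if installed in COMPROMISED:
        return f"litellm {installed} is a compromised release: rotate credentials"
    return f"litellm {installed} is not one of the compromised releases"


print(litellm_status())
```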

Reception

Browser Use drew immediate attention from the developer community when it launched in late 2024. The GitHub repository grew to 50,000 stars in roughly three months, with 15,000 or more developers actively building on the library. By mid-2025 the project had exceeded 78,000 stars, and by early 2026 the count was approaching 95,000.[^1][^2]

TechCrunch covered the $17 million seed round in March 2025, describing Browser Use as "the tool making it easier for AI agents to navigate websites."[^3] The Runa Capital ROSS Index for Q1 2025 listed Browser Use as one of the top trending open-source AI projects. InfoWorld and SiliconANGLE also covered the funding as evidence of growing demand for agentic web infrastructure.[^18][^19]

Within the YC W25 batch, more than 20 companies adopted Browser Use for core product functionality, an unusually rapid rate of adoption for a library that was only a few months old at Demo Day.[^3]

Some developers raised security concerns about running Chrome in remote debugging mode without sandboxing, which opens a CDP port that could be exploited by malicious pages via XSS attacks. The Browser Use team addressed this in documentation by recommending that production deployments use cloud-hosted browsers or configure proper network isolation. The litellm incident in March 2026 sharpened these concerns and prompted the team to publicly tighten the project's dependency posture by removing transitive third-party LLM clients from the core install.[^16]
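Part of the network-isolation advice can be enforced in code by refusing to attach to a DevTools endpoint that is not on the loopback interface. `is_loopback_cdp_url` is a hypothetical helper, not part of Browser Use; loopback binding only keeps the port off the network and does not stop other processes on the same machine from connecting.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_cdp_url(cdp_url: str) -> bool:
    """Accept a DevTools endpoint only if it points at the loopback interface."""
    host = urlparse(cdp_url).hostname  # lower-cased, IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as remote


for url in ("http://127.0.0.1:9222", "ws://[::1]:9222/devtools/browser/abc",
            "http://0.0.0.0:9222", "http://10.0.0.5:9222"):
    print(url, is_loopback_cdp_url(url))
```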

References
