Browser Use is an open-source Python library that enables large language models to control web browsers, turning any LLM into an agent capable of browsing the web, clicking buttons, filling forms, and completing multi-step tasks online. Released in late 2024 by Magnus Müller and Gregor Žunič, the project accumulated over 50,000 GitHub stars within three months of its public release and surpassed 78,000 stars by mid-2025, making it one of the fastest-growing AI agent infrastructure projects on GitHub. The company behind the library, Browser Use Inc., was part of Y Combinator's Winter 2025 batch and raised a $17 million seed round in March 2025.
Where proprietary alternatives like OpenAI Operator and Anthropic's Computer Use rely on closed APIs, Browser Use provides an open-source foundation that developers can inspect, modify, and self-host. It wraps the Playwright browser automation library in an LLM-driven control loop, exposing a simple Python interface for building web agents.
Magnus Müller had been working on web scraping tools for several years before meeting Gregor Žunič in 2024 while both were completing master's degrees in data science at ETH Zurich. The two began collaborating at ETH Zurich's Student Project House accelerator, a workspace designed for side projects. They built the first version of Browser Use in approximately five weeks.
The initial demo resonated with developers building AI agents who had struggled with brittle web automation. Müller and Žunič open-sourced the project under the MIT license, and the GitHub repository grew rapidly, attracting contributors and users building everything from personal automation scripts to commercial products. The speed of adoption caught the attention of Y Combinator, and the two founders joined the Winter 2025 batch in San Francisco.
By early 2025, Browser Use had become a common dependency in AI agent projects. The company announced its $17 million seed round in March 2025 and around the same time began rolling out Browser Use Cloud, a managed hosting product that lets teams use Browser Use without running their own browser infrastructure.
In March 2025, Chinese AI startup Butterfly Effect launched Manus AI, an autonomous agent that could complete complex multi-step tasks such as resume screening and stock analysis. Manus became a viral phenomenon, and Browser Use was one of the tools powering its web interaction capabilities. A post explaining this connection garnered over 2.4 million views on X, introducing Browser Use to a large audience outside the developer community.
Browser Use announced a $17 million seed round on March 23, 2025. The round was led by Felicis Ventures, with participation from A Capital, Nexus Venture Partners, Y Combinator, and Paul Graham as an individual investor.
At the time of the funding announcement, the company employed approximately seven people and had been adopted by engineering teams at companies including Airbnb, Amazon, and Anthropic. More than 20 companies in the YC W25 batch were using Browser Use in their core workflows. The seed funding was directed toward building out Browser Use Cloud, expanding the team, and improving the reliability of the web agent framework.
Browser Use is built on top of Playwright, Microsoft's open-source browser automation library. Playwright handles the low-level browser control, while Browser Use adds an LLM reasoning layer on top and communicates with browser instances over the Chrome DevTools Protocol (CDP).
The central architectural insight is how Browser Use represents web pages to the language model. Rather than sending raw HTML or screenshots to the LLM, it extracts and processes the page's Document Object Model (DOM) tree into a cleaned, structured representation that strips out irrelevant styling and metadata. This text-based format is considerably cheaper to process with a language model than pixel-based approaches, which must encode and decode full-resolution screenshots on every step.
The core component responsible for making web pages readable to LLMs is the DomService. When an agent loads a page, the process works as follows:
The result looks something like:
[1] <button>Sign in</button>
[2] <input type="text" placeholder="Email address" />
[3] <a href="/forgot-password">Forgot password?</a>
The LLM reads this representation, decides which element to interact with, and returns a structured action command referencing elements by their index. This avoids the need to parse raw HTML or interpret screenshots for precise click targeting.
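The indexing idea can be illustrated with a minimal sketch. It assumes a simplified `Element` record and a `serialize` helper, both hypothetical names invented for this example; it is not the DomService implementation, only a toy version of numbering interactive elements and resolving an index back to a node.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Simplified stand-in for one DOM node (hypothetical, not the library's type)."""
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    interactive: bool = False


def serialize(elements):
    """Return (prompt_text, index_map) covering the interactive elements only."""
    lines, index_map = [], {}
    for el in elements:
        if not el.interactive:
            continue  # static text and layout nodes get no index
        idx = len(index_map) + 1
        index_map[idx] = el
        attrs = "".join(f' {k}="{v}"' for k, v in el.attrs.items())
        lines.append(f"[{idx}] <{el.tag}{attrs}>{el.text}</{el.tag}>")
    return "\n".join(lines), index_map


page = [
    Element("div", text="Welcome back"),
    Element("button", text="Sign in", interactive=True),
    Element("input", {"type": "text", "placeholder": "Email address"}, interactive=True),
    Element("a", {"href": "/forgot-password"}, "Forgot password?", interactive=True),
]

prompt_text, index_map = serialize(page)
print(prompt_text)

# A model reply such as {"action": "click", "index": 3} resolves back to a node.
target = index_map[3]
```

Because the model only ever names an integer, the caller can validate the index against `index_map` before touching the browser, which is one plausible reason an index-based scheme is easier to make robust than free-form selectors.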
For pages where DOM extraction is insufficient, Browser Use includes a screenshot-based fallback. When Claude Sonnet 4 or Claude Opus 4 models are used, coordinate clicking is automatically enabled, letting the agent click on arbitrary coordinates in a screenshot if the element tree does not contain a suitable target. This hybrid approach lets the library handle dynamic content, canvas elements, and pages that use non-standard rendering pipelines.
The Agent class orchestrates execution through an iterative loop:
1. The MessageManager constructs a prompt that includes the current browser state (element tree or screenshot), the task description, and the conversation history.
2. The LLM returns a structured action, and the Tools registry dispatches it to the appropriate browser operation via CDP.
3. The loop repeats until the LLM issues a done action or a maximum step count is reached.
An ActionLoopDetector monitors for repetitive patterns in the action history and can interrupt the loop to prevent the agent from cycling in place. An AgentHistoryList maintains a record of all previous steps, giving the LLM context about what has already been attempted.
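The control flow described above can be sketched in a few lines. This is a simplified illustration, not the Agent class itself: `run_agent`, `decide`, and `execute` are hypothetical names, the "LLM" is a scripted callable, and the loop detection is a naive repeat counter rather than whatever the ActionLoopDetector actually does.

```python
from collections import Counter


def run_agent(task, decide, execute, max_steps=10, repeat_limit=3):
    """Drive a decide/execute loop until 'done', a step cap, or a detected cycle.

    decide(task, state, history) -> dict with at least an "action" key.
    execute(action) -> new state string.
    Both callables are hypothetical stand-ins for the LLM and the browser.
    """
    history, state, seen = [], "start", Counter()
    for _ in range(max_steps):
        action = decide(task, state, history)
        history.append(action)
        if action["action"] == "done":
            return {"status": "done", "result": action.get("result"), "history": history}
        # Naive loop detection: the same action repeated on the same state.
        key = (state, tuple(sorted(action.items())))
        seen[key] += 1
        if seen[key] >= repeat_limit:
            return {"status": "loop_detected", "result": None, "history": history}
        state = execute(action)
    return {"status": "max_steps", "result": None, "history": history}


# Scripted "model": navigate, click, then finish.
script = iter([
    {"action": "navigate", "url": "https://example.com"},
    {"action": "click", "index": 1},
    {"action": "done", "result": "clicked the first element"},
])
outcome = run_agent(
    "demo task",
    decide=lambda task, state, history: next(script),
    execute=lambda action: f"after {action['action']}",
)

# A "model" that never makes progress trips the loop detector instead.
stuck = run_agent(
    "demo task",
    decide=lambda task, state, history: {"action": "click", "index": 1},
    execute=lambda action: "start",
)
```

Keeping the termination conditions (done, step cap, cycle) in one place makes it straightforward to report why a run ended, which matters when failures need to be triaged.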
Browser Use exposes a registry of high-level actions that the LLM can invoke. The built-in action set includes:
| Action | Description |
|---|---|
| navigate | Load a URL in the browser |
| click | Click an element by its index |
| type | Type text into an input field |
| scroll | Scroll the page in a given direction |
| extract | Run a separate LLM call against the page content to retrieve specific information |
| go_back | Navigate to the previous page |
| wait | Pause execution for a specified duration |
| screenshot | Capture the current page state as an image |
| select_dropdown | Select an option from a dropdown menu |
| upload_file | Upload a file to a file input |
| evaluate | Execute arbitrary JavaScript on the page |
| done | Signal task completion and return a result |
Developers can register custom actions by decorating Python functions with the @action decorator and passing them to the agent at initialization. Custom actions give agents access to domain-specific operations such as writing to a database, calling an external API, or reading from the local file system.
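The decorator-based registration pattern can be shown with a self-contained toy. The `ActionRegistry` class, its `describe` and `dispatch` methods, and the `save_price` function are all hypothetical names made up for this sketch; the library's real registration API may differ in names and signatures, so treat this as an illustration of the pattern only.

```python
import inspect


class ActionRegistry:
    """Toy registry; a hypothetical stand-in, not the library's own class."""

    def __init__(self):
        self._actions = {}

    def action(self, description):
        """Decorator that registers a function under its own name."""
        def decorator(func):
            self._actions[func.__name__] = (description, func)
            return func
        return decorator

    def describe(self):
        """Text an agent could place in the prompt so the LLM knows its options."""
        return "\n".join(
            f"{name}{inspect.signature(func)}: {desc}"
            for name, (desc, func) in self._actions.items()
        )

    def dispatch(self, name, **params):
        """Run the action the LLM selected, with the parameters it supplied."""
        if name not in self._actions:
            raise KeyError(f"unknown action: {name}")
        return self._actions[name][1](**params)


registry = ActionRegistry()
saved_rows = []  # stands in for a real database table


@registry.action("Save a product name and price to the database")
def save_price(product: str, price: float) -> str:
    saved_rows.append((product, price))
    return f"saved {product}"


# The LLM's structured output would name the action and supply its parameters.
result = registry.dispatch("save_price", product="Example Laptop", price=100.0)
```

The description string and the function signature are what the model sees, so in a scheme like this they act as the action's documentation and should be written with that reader in mind.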
The extract action is a notable design choice. Rather than asking the main LLM to reason about large amounts of page text, it spawns a separate, lightweight LLM call against the page's markdown content to pull out only the relevant data, keeping the main context window focused on decision-making.
Installing Browser Use requires Python 3.11 or higher. The package is available on PyPI:
uv add browser-use
A minimal agent looks like this:
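import asyncio

from browser_use import Agent
from langchain_openai import ChatOpenAI

# Describe the task in plain English and choose the LLM that drives the agent.
agent = Agent(
    task="Find the price of a MacBook Pro on apple.com",
    llm=ChatOpenAI(model="gpt-4o"),
)

# agent.run() is a coroutine, so drive it with asyncio.run().
asyncio.run(agent.run())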
The Agent class accepts any LangChain-compatible LLM, a task description in plain English, and optional configuration for browser settings, maximum number of steps, and callbacks. The agent runs asynchronously and returns a structured result when it completes or fails.
More complex workflows can chain multiple agents, define custom actions that extend the agent's capabilities beyond the built-in browser actions, or integrate Browser Use into larger agent systems. The library exposes hooks for observing agent state at each step, which makes it possible to build monitoring dashboards, inject human-in-the-loop approvals, or log detailed traces for debugging.
Browser Use is model-agnostic. Because it uses LangChain's model abstraction layer, it works with any LLM provider that LangChain supports. The library maintains wrapper classes for a wide range of providers:
| Provider | Notes |
|---|---|
| Browser Use Cloud (ChatBrowserUse) | Proprietary model optimized for browser tasks; reported 3-5x faster than general models; variants: bu-latest, bu-2-0 |
| OpenAI (ChatOpenAI) | GPT-4o, GPT-4.1, and compatible models |
| Anthropic (ChatAnthropic) | Claude Sonnet and Opus; coordinate clicking auto-enabled for Sonnet 4 and Opus 4 |
| Google (ChatGoogle) | Gemini 2.5 Flash and Pro; Vertex AI and Gemma also supported |
| Azure OpenAI | Azure-hosted GPT models |
| AWS Bedrock | Claude and other Bedrock-hosted models |
| DeepSeek | DeepSeek-V3, DeepSeek-R1 |
| Mistral | Mistral and Mixtral models |
| Groq | Llama models served via Groq's inference API |
| Ollama | Local models |
| OpenRouter | Access to 300+ models through a single endpoint |
| LiteLLM | Any LiteLLM-compatible provider string |
The ChatBrowserUse model is a proprietary model built by Browser Use for browser automation tasks. According to the company, it completes tasks three to five times faster than general-purpose models on web benchmarks while maintaining comparable accuracy. It is available through the Browser Use Cloud API.
For cost-sensitive applications, smaller models like GPT-4o mini and Gemini 2.0 Flash work reasonably well on straightforward tasks, while complex multi-step workflows typically benefit from more capable models. Vision-language models are preferable for pages with complex visual layouts that the DOM representation alone does not capture accurately.
In addition to the open-source library, Browser Use operates a managed cloud platform at browser-use.com. The cloud platform provides managed browser infrastructure so developers do not need to run Playwright or Chrome locally.
Browser Use Cloud offers two primary deployment modes. In Browser Harness mode, teams use their own agent code (the open-source library) but connect to cloud-hosted browsers rather than spinning up local Chromium instances. In Hosted V3 Agent mode, a fully managed agent accepts a task description and handles everything, from browser control to LLM reasoning, without requiring any agent code from the user.
Key features of Browser Use Cloud include stealth browsers with fingerprinting bypass for sites that detect automated traffic, built-in CAPTCHA solving, managed proxy pools with geographic distribution, parallel session execution, and persistent browser profiles for sites requiring authentication.
Browser sessions on the cloud cost $0.06 per hour, with proxy bandwidth charged at $5 per GB. The V3 Agent uses token-based pricing at 1.2 times the underlying provider's rates. The platform offers subscription tiers:
| Plan | Monthly price | Credits included | Concurrent sessions |
|---|---|---|---|
| Free | $0 | Community browsers only | 3 |
| Dev | $29 | $29 | 25 |
| Business | $299 | $400 | 200 |
| Scaleup | $999 | $1,400 | 500 |
| Enterprise | Custom | Negotiated | Negotiated |
Business and Scaleup subscribers receive a 25% discount on LLM step costs and a 50% discount on skill execution, browser sessions, and proxy data. Users can bring their own API keys from Anthropic, OpenAI, or Google to avoid the token markup. New accounts receive $10 in free credits.
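As a rough worked example of the infrastructure pricing, the sketch below combines the per-hour and per-GB rates with the 50% discount quoted for Business and Scaleup. The `infra_cost` helper and the workload numbers are hypothetical, and the sketch assumes the discount applies as a flat multiplier and that Dev receives none; it ignores LLM token charges and included credits entirely.

```python
BROWSER_RATE_PER_HOUR = 0.06   # USD, from the pricing above
PROXY_RATE_PER_GB = 5.00       # USD, from the pricing above
# Assumption: Dev gets no infrastructure discount; Business and Scaleup get 50%.
INFRA_DISCOUNT = {"Dev": 0.0, "Business": 0.5, "Scaleup": 0.5}


def infra_cost(plan, browser_hours, proxy_gb):
    """Estimated browser-session plus proxy cost in USD, before LLM tokens."""
    base = browser_hours * BROWSER_RATE_PER_HOUR + proxy_gb * PROXY_RATE_PER_GB
    return round(base * (1 - INFRA_DISCOUNT[plan]), 2)


# Hypothetical workload: 1,000 browser-hours and 20 GB of proxy traffic.
dev_cost = infra_cost("Dev", 1000, 20)
business_cost = infra_cost("Business", 1000, 20)
print(dev_cost, business_cost)
```

Under these assumptions proxy bandwidth, not browser time, dominates the bill for this workload, which suggests that proxy usage is the first thing to measure when estimating costs.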
Browser Use occupies a different position in the market from the two most prominent proprietary alternatives, OpenAI Operator and Anthropic's Computer Use. The three approaches differ significantly in architecture, cost structure, and intended audience.
| Feature | Browser Use | OpenAI Operator | Anthropic Computer Use |
|---|---|---|---|
| Open source | Yes (MIT) | No | No |
| Self-hostable | Yes | No | No |
| Interaction approach | DOM element tree (+ vision fallback) | Screenshots (vision-first) | Screenshots (vision-first) |
| Model support | Any LLM (15+ providers) | Proprietary CUA model | Claude only |
| Desktop app control | No | No | Yes |
| Developer customization | High (custom actions, model choice) | Low (no direct API as of 2025) | Limited |
| WebVoyager accuracy | 89.1% | 87% | ~56% |
| Pricing | Free self-hosted; $29+/mo cloud | Requires OpenAI Pro ($200/mo) | Token-based via Claude API |
OpenAI Operator is a standalone consumer-facing agent product that runs in its own sandboxed browser. It is designed for non-technical users and offers a polished interface with a takeover mode for entering sensitive credentials. Because it does not expose a developer API, it is not suitable for building custom automation pipelines.
Anthropic Computer Use works by giving Claude the ability to view screenshots of a desktop environment and issue mouse and keyboard commands. Because it operates at the pixel level, it can control any desktop application, not just web pages. However, this pixel-level approach is slower, more expensive in tokens, and less reliable on web-specific tasks than DOM-based methods.
Browser Use targets developers. The open-source library is free, works with any LLM, and can be customized extensively through the action registry. The trade-off is that it requires Python experience and cannot control non-browser desktop applications.
Manus AI used Browser Use as its web interaction layer, demonstrating that the library can serve as infrastructure inside a larger agentic system.
Browser Use published a technical report presenting results on the WebVoyager benchmark, a widely used evaluation suite for web agents covering tasks across diverse websites including Amazon, Google Flights, GitHub, Hugging Face, Apple, and Booking.com.
On the WebVoyager benchmark spanning 586 web tasks (57 tasks were removed from the original 643 due to outdated dataset information, Cloudflare blocks, or ambiguous specifications), Browser Use achieved an 89.1% success rate using GPT-4o as the primary model. This compared favorably to OpenAI Operator's reported score of 87.0% on the same benchmark.
Results by domain:
| Domain | Success rate |
|---|---|
| Hugging Face | 100% |
| Google Flights | 95% |
| Amazon | 92% |
| GitHub | 92% |
| Apple | 91% |
| Booking.com | 80% |
The Magnitude web agent subsequently reported a 93.9% success rate on WebVoyager using a different technical approach, though benchmark conditions and evaluation methodologies vary across implementations, making direct comparisons imprecise. The WebVoyager benchmark has known limitations and is best treated as a directional indicator rather than a definitive ranking.
Developers use Browser Use across a wide range of automation scenarios.
Web research and data collection agents visit multiple websites to compare prices, gather product specifications, collect publicly available data, or aggregate information across sources. Because the library works with authenticated sessions, it can access data on sites that would block a traditional scraper.
Form automation covers filling out and submitting web forms, including multi-step workflows like job applications, registration flows, and checkout processes. The agent reads the form structure dynamically, reducing maintenance overhead compared to brittle CSS-selector-based scripts.
CRM and SaaS data entry is another common use case. Companies transfer data between SaaS platforms that lack official API integrations; the agent logs into each platform using a stored browser profile and performs the data transfer through the user interface.
QA teams write natural-language test scripts that the agent executes against staging environments. Unlike Selenium or Playwright scripts, the agent can handle dynamic layouts that change between releases without requiring selector updates.
Shopping agents compare product availability, track price changes, add items to carts, and complete purchases when prices hit a target threshold.
One widely shared demo showed an agent filling out job applications by reading job requirements, matching them to a resume, and completing application forms across dozens of employer sites.
Browser Use has several constraints worth considering.
The library controls a web browser only. It cannot interact with native desktop applications, system dialogs, or operating system interfaces. For those use cases, Computer Use or screen-recording-based agents are more appropriate.
Agent performance depends on the quality of the underlying LLM. On complex tasks with many steps, smaller or less capable models may make incorrect decisions that cause cascading failures. The 89.1% WebVoyager result was achieved with GPT-4o; smaller models score lower.
Some sites aggressively detect automated browsers through fingerprinting, CAPTCHA challenges, or behavior analysis. The open-source library does not include stealth features by default; the cloud platform's stealth browsers are required for consistently accessing these sites.
Sites that load content asynchronously may present an empty or partial element tree if the agent acts before JavaScript finishes rendering. The library includes a wait action and some automatic stabilization, but highly dynamic pages can still cause premature action on incomplete states.
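One generic way to reduce premature actions is to poll the page until a cheap summary of it stops changing. The sketch below shows that technique in isolation; `wait_until_stable` and the `snapshot` callable are hypothetical, and this is not a description of the library's own stabilization logic.

```python
import time


def wait_until_stable(snapshot, interval=0.05, timeout=2.0, required_repeats=2):
    """Poll snapshot() until it returns the same value on consecutive reads.

    snapshot is a hypothetical callable returning a comparable summary of the
    page, for example the number of interactive elements.
    Returns (last_value, stable_flag).
    """
    deadline = time.monotonic() + timeout
    last, repeats = snapshot(), 1
    while repeats < required_repeats and time.monotonic() < deadline:
        time.sleep(interval)
        current = snapshot()
        repeats = repeats + 1 if current == last else 1
        last = current
    return last, repeats >= required_repeats


# Simulated page whose element count grows while scripts are still rendering.
readings = iter([0, 4, 9, 12, 12, 12])
count, stable = wait_until_stable(lambda: next(readings), interval=0.01)
```

The returned flag lets the caller distinguish a page that settled from one that merely ran out of time, so the agent can decide whether to act or wait again.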
On pages with very large element trees, the serialized representation can consume a significant portion of the LLM's context window, leaving less room for task history and reasoning. The DomService applies filtering heuristics to reduce tree size, but complex pages with many interactive elements can still produce large inputs.
The benchmark methodology also raises comparability questions. The team removed 57 tasks from the WebVoyager evaluation set, which means the reported 89.1% figure is not directly comparable to implementations that run the original unmodified benchmark.
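The effect of the removed tasks can be made concrete with some arithmetic on the figures quoted above. The pass count below is implied by rounding, not a published number, and the two bounds are purely hypothetical scenarios in which every removed task would have failed or passed.

```python
ORIGINAL_TASKS = 643
REMOVED_TASKS = 57
REPORTED_RATE = 0.891

evaluated = ORIGINAL_TASKS - REMOVED_TASKS
passed = round(REPORTED_RATE * evaluated)  # implied count, not a published figure

# Hypothetical bounds if the removed tasks had all failed or all passed.
lower = passed / ORIGINAL_TASKS
upper = (passed + REMOVED_TASKS) / ORIGINAL_TASKS
print(f"{evaluated} tasks, ~{passed} passed, full-set range {lower:.1%} to {upper:.1%}")
```

The width of that range is what makes a direct comparison with scores on the unmodified benchmark unreliable.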
Browser Use drew immediate attention from the developer community when it launched in late 2024. The GitHub repository grew to 50,000 stars in roughly three months, with 15,000 or more developers actively building on the library. By mid-2025, the project had exceeded 78,000 stars.
TechCrunch covered the $17 million seed round in March 2025, describing Browser Use as "the tool making it easier for AI agents to navigate websites." The Runa Capital ROSS Index for Q1 2025 listed Browser Use as one of the top trending open-source AI projects. InfoWorld and SiliconANGLE also covered the funding as evidence of growing demand for agentic web infrastructure.
Within the YC W25 batch, more than 20 companies adopted Browser Use for core product functionality, an unusually rapid rate of downstream adoption for a library that was only a few months old at Demo Day.
Some developers raised security concerns about running Chrome in remote debugging mode without sandboxing, which opens a CDP port that could be exploited by malicious pages via XSS attacks. The Browser Use team addressed this in documentation by recommending that production deployments use cloud-hosted browsers or configure proper network isolation.