Browser Use (Python library)
Last reviewed
May 24, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,702 words
Browser Use is an open-source Python library for AI-controlled web browser automation. The project, hosted at github.com/browser-use/browser-use, exposes a small Agent API that drives a real browser via Playwright and lets a large language model click, type, scroll, and navigate web pages to complete tasks expressed in natural language.[^1][^2] Browser Use was started in 2024 by Magnus Müller and Gregor Žunič, two ETH Zurich master's students, and was open-sourced under the MIT license on 6 November 2024 as version 0.1.0.[^2][^3] The library accumulated roughly 50,000 GitHub stars in its first three months and grew to more than 95,000 stars by mid-2026, making it one of the fastest-growing AI repositories ever published.[^3][^4][^1] The associated company, Browser Use Inc., went through Y Combinator's Winter 2025 batch and raised a $17 million seed round announced on 23 March 2025, led by Felicis Ventures.[^5][^6]
The library belongs to the wider category of AI browser agent systems, alongside hosted offerings such as OpenAI Operator, Anthropic Computer Use, and ChatGPT agent. Whereas those products are proprietary services tied to a single LLM provider, Browser Use is a model-agnostic Python framework that can be driven by any LLM, runs locally on a developer's machine, and exposes the underlying browser session to the user.[^1][^7] The implementation pattern (observe page state, prompt the LLM with a structured description, parse a tool call, execute in the browser, repeat) has become a reference design for the broader browser-use agent concept and for agentic workflow research more generally.[^7][^8]
Magnus Müller and Gregor Žunič met in 2024 while both were enrolled in master's programmes at ETH Zurich. Müller, originally from a small German town, had completed a bachelor's in cognitive science at the University of Osnabrück (2020 to 2022), spent an exchange semester at the National University of Singapore in 2022, and had previously co-founded a traffic-optimisation startup called GreenWAI before pausing an MSc in computer science at ETH Zurich.[^9] He had also worked on research with Cambridge CARES, the Cambridge Centre for Advanced Research and Education in Singapore, on web-scraping and process automation projects.[^9][^10] Žunič, who studied physics as an undergraduate and was completing a master's in data science at ETH, became the second co-founder.[^10]
The pair began the project at ETH's Student Project House, an on-campus incubator that Müller described as a space where "it's forbidden to study and you have to work on your side projects."[^9] An early prototype combining web scraping with an LLM was built in roughly four to five days. According to founder interviews, the concept emerged from Müller's frustration with software complexity: "why can't I just tell my computer what to do and it figures out itself how to do the tasks?"[^9]
The Browser Use repository was created on GitHub during October 2024, and version 0.1.0 of the browser-use PyPI package was published on 6 November 2024.[^2] An initial demonstration video posted to social media and a Show HN style submission on Hacker News drove early traffic. The repository reached roughly 25,000 GitHub stars within three months of the first commit and crossed 50,000 stars by early 2025, a growth rate that the founders described as "wild" in an interview with TechCrunch.[^4][^7][^3]
Browser Use was admitted to Y Combinator's Winter 2025 batch.[^3][^11] During the programme, the founders reported that more than 20 other companies in the same YC cohort were already using the library for their own products; outside the batch, the viral generalist agent Manus by Butterfly Effect also relied on it.[^7][^3] In a March 2025 interview with TechCrunch, Žunič observed that downloads jumped from roughly 5,000 per day on 3 March 2025 to about 28,000 per day on 10 March 2025 after a viral X post about Manus drew attention to its dependency on Browser Use.[^7]
On 23 March 2025, the company announced a $17 million seed round led by Astasia Myers of Felicis Ventures, with participation from A Capital, Nexus Venture Partners, Y Combinator, Paul Graham, Liquid2 Ventures, SV Angel, and the Pioneer Fund.[^5][^6] The funding had not previously been reported. According to the founder profile published by Ambitious & Driven, the company had collected about $4 million in uncapped SAFEs before YC's Demo Day and went on to take roughly 140 investor meetings before closing the round.[^9]
In parallel with the open-source library, Browser Use launched a hosted cloud service at browser-use.com. The cloud offering is priced from $30 per month for individuals, runs the agent loop on managed Chrome instances with rotating proxies, and offers stealth-mode browsers and a CAPTCHA-solving pipeline.[^11][^12] The library and the cloud share an evaluation harness; in March 2026 the company reported a 97% success rate on the Online-Mind2Web benchmark using the cloud agent, which the report described as the highest publicly reported score on that benchmark.[^13]
A Browser Use agent runs an observe-think-act loop. At each step it observes the current page state, prompts the LLM with a structured description of that state, parses the tool call the model returns, and executes the chosen action (click, input_text, go_to_url, scroll, extract, done, etc.) through Playwright.[^1][^14]
The loop continues until the model emits a done action, the configured step limit is reached, or an error is unrecoverable. Every step is logged with the action, the page URL, and the resulting state, which the developers say allows deterministic rerun of a captured trajectory using stored XPaths.[^14]
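The loop described above can be illustrated with a minimal, dependency-free sketch. It is not the library's implementation: `FakeBrowser`, `run_agent`, and `scripted_policy` are hypothetical names introduced here, and a scripted function stands in for the LLM so the example runs without a model or a real browser.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    name: str  # e.g. "click", "go_to_url", "done"
    params: dict = field(default_factory=dict)


@dataclass
class StepLog:
    step: int
    url: str  # page URL observed before the action ran
    action: Action


class FakeBrowser:
    """Hypothetical stand-in for a Playwright-backed browser session."""

    def __init__(self) -> None:
        self.url = "about:blank"

    def observe(self) -> dict:
        return {"url": self.url}

    def execute(self, action: Action) -> None:
        if action.name == "go_to_url":
            self.url = action.params["url"]


def run_agent(
    task: str,
    browser: FakeBrowser,
    choose_action: Callable[[str, dict, list], Action],
    max_steps: int = 10,
) -> list[StepLog]:
    """Observe, ask the policy for an action, log it, execute, repeat."""
    history: list[StepLog] = []
    for step in range(1, max_steps + 1):
        state = browser.observe()
        action = choose_action(task, state, history)
        history.append(StepLog(step, state["url"], action))
        if action.name == "done":  # the model signals completion
            break
        browser.execute(action)
    return history


def scripted_policy(task: str, state: dict, history: list) -> Action:
    """Assumption: a fixed script replaces the LLM call."""
    if state["url"] == "about:blank":
        return Action("go_to_url", {"url": "https://example.com"})
    return Action("done", {"text": "finished"})


history = run_agent("open example.com", FakeBrowser(), scripted_policy)
```

The `max_steps` argument plays the role of the configured step limit: a policy that never returns `done` is cut off after that many iterations.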
A central design choice in Browser Use is to feed the model a compressed, text-first representation of the page rather than relying on screenshots alone. The library walks the DOM, filters for visible and interactive nodes, and assigns each one a small integer index that the LLM can refer to in its tool calls (for example, click_element(index=42)). This approach is similar in spirit to set-of-mark prompting used in academic research on web agents and is intended to reduce hallucinations relative to coordinate-based clicking.[^15][^14]
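The indexing idea can be sketched on a toy element tree. This is an assumption-laden illustration rather than the library's DOM walker: the `Node` class, the `INTERACTIVE_TAGS` set, and the rule that a hidden node hides its whole subtree are all simplifications introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Assumption: a small fixed set of tags counts as "interactive".
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Node:
    tag: str
    text: str = ""
    visible: bool = True
    children: list[Node] = field(default_factory=list)


def index_interactive(root: Node) -> dict[int, Node]:
    """Walk the tree in document order and number visible interactive nodes."""
    indexed: dict[int, Node] = {}

    def walk(node: Node) -> None:
        if not node.visible:
            return  # simplification: a hidden node hides its subtree
        if node.tag in INTERACTIVE_TAGS:
            indexed[len(indexed)] = node
        for child in node.children:
            walk(child)

    walk(root)
    return indexed


def describe(indexed: dict[int, Node]) -> str:
    """Render the compact text listing that would be placed in the prompt."""
    return "\n".join(f"[{i}] <{n.tag}> {n.text}".rstrip() for i, n in indexed.items())


page = Node("body", children=[
    Node("a", "Home"),
    Node("div", visible=False, children=[Node("button", "Hidden")]),
    Node("div", children=[Node("input", "Search"), Node("button", "Go")]),
])
elements = index_interactive(page)
prompt_text = describe(elements)
```

A model that replies with an index, for example 2, can then be mapped back to a concrete node through `elements[2]` instead of guessing screen coordinates.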
When a vision-capable LLM is in use, Browser Use additionally captures a screenshot of the viewport and overlays the same numeric labels on the rendered elements, so the model can correlate visual and textual descriptions. The screenshot path uses Playwright's built-in capture API and is optional; agents can run in a pure DOM mode that is cheaper and faster but lacks visual reasoning.[^14][^7]
The browser layer is built on Playwright, the cross-browser automation library originally developed at Microsoft. Browser Use ships with a default Chromium launcher and supports persistent contexts, custom profiles, and remote browser endpoints via the Chrome DevTools Protocol. Users can attach to an existing Chrome instance on a developer's machine (for authenticated sessions) or to a remote browser hosted by a service such as Browserbase or the company's own cloud.[^14][^12]
Browser Use is model-agnostic. The repository's documentation lists native chat-model wrappers for OpenAI, Anthropic, Google, Groq, Ollama (for local models), and AWS Bedrock, along with a generic litellm adapter that routes to any provider that LiteLLM supports. In May 2026 the project added ChatBrowserUse, a proprietary model and routing layer optimised for browser tasks, with usage-based pricing.[^14][^16] Earlier releases used LangChain-style chat interfaces; later versions of the library exposed a thinner, vendor-specific API while retaining LangChain compatibility.[^17][^14]
Beyond freeform tasks, agents can be given a Pydantic schema as the desired output type. The agent then performs its loop and returns a typed object (for example, a list of product objects with title, price, and url fields) rather than free text. This pattern, sometimes called structured extraction, is used in many of the data-collection workflows that the community has shipped on top of the library.[^14][^7]
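The typed-output pattern can be sketched without any third-party dependency. The example below deliberately swaps in a standard-library dataclass for the Pydantic model the library expects, and `Product`, `parse_products`, and the sample JSON are hypothetical; it only shows the idea of coercing a model's raw answer into typed objects and failing loudly when a field is missing.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    title: str
    price: float
    url: str


def parse_products(raw: str) -> list[Product]:
    """Turn the agent's final JSON answer into typed Product objects."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of products")
    products = []
    for item in items:
        # A missing key raises KeyError; a non-numeric price raises ValueError.
        products.append(
            Product(
                title=str(item["title"]),
                price=float(item["price"]),
                url=str(item["url"]),
            )
        )
    return products


# Assumption: an example of what a model might return as its final answer.
raw_output = '[{"title": "Mug", "price": "12.50", "url": "https://example.com/mug"}]'
products = parse_products(raw_output)
```

Downstream code can then rely on `products[0].price` being a float rather than re-parsing free text.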
Recent versions of the package ship a browser-use command-line interface that wraps the Agent class and persists browser state across runs. A uvx browser-use init --template default quickstart generates a starter project with example tasks. The CLI is also exposed as a Claude Code skill so that Claude Code users can invoke the agent without writing any Python code.[^14][^16]
Public examples and case studies group the library's use cases into a small number of patterns:
Browser Use uses a hybrid representation: the DOM-derived list of interactive elements is the primary handle, and the screenshot serves as a supplementary signal for vision-capable models. Independent analyses have noted that this is closer to how an Anthropic Computer Use agent would behave if it could read the accessibility tree directly, and that it tends to be more reliable than the purely vision-driven approach used by some competitors.[^15] The trade-off is that the agent depends on the DOM being well-formed; canvas-heavy applications can be harder to drive than standard HTML pages.[^15]
Because every action is logged with the XPath of the targeted element, a successful trajectory can be replayed without consulting the LLM, much like recording a macro. The launch announcement on Hacker News described this as a way to use the agent once to discover a flow and then run that flow as a cheap deterministic script afterwards.[^17]
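The record-then-replay idea can be sketched as follows. The log format, `RecordedStep`, `FakePage`, and `replay` are hypothetical and do not reflect the library's actual trajectory format; the fake page simply records calls so the example runs without a browser.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedStep:
    action: str  # "click" or "input_text" in this sketch
    xpath: str   # XPath captured when the agent first ran
    value: str = ""


class FakePage:
    """Hypothetical stand-in for a browser page; records what it was asked to do."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def click(self, xpath: str) -> None:
        self.calls.append(("click", xpath))

    def fill(self, xpath: str, value: str) -> None:
        self.calls.append(("fill", xpath, value))


def replay(steps: list[RecordedStep], page: FakePage) -> int:
    """Re-run a captured trajectory with no LLM call; returns steps executed."""
    for step in steps:
        if step.action == "click":
            page.click(step.xpath)
        elif step.action == "input_text":
            page.fill(step.xpath, step.value)
        else:
            raise ValueError(f"unsupported action: {step.action}")
    return len(steps)


trajectory = [
    RecordedStep("input_text", "//input[@name='q']", "browser agents"),
    RecordedStep("click", "//button[@type='submit']"),
]
page = FakePage()
executed = replay(trajectory, page)
```

Such a replay is only as stable as the stored XPaths: if the page layout changes, the script fails and the flow has to be rediscovered by the agent.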
The first widely cited benchmark result was on the WebVoyager benchmark, a set of 643 live web tasks introduced by researchers at Tencent in 2024. On 15 December 2024, Browser Use published a technical report claiming an 89.1% success rate on a slightly modified version of WebVoyager (586 of the original tasks, after the team removed 55 deemed impossible or outdated). The evaluation used GPT-4o as the underlying LLM, and the report noted task-level success rates of 100% on Hugging Face, 95% on Google Flights, and 92% on Amazon.[^18] At the time of publication this was the highest reported result on that benchmark. Subsequent third-party evaluations have reported lower numbers for Browser Use under stricter protocols, and other agents have since claimed higher headline scores; the company has acknowledged that scores are not directly comparable across labs because of differences in task selection and evaluation criteria.[^19][^18]
A more recent benchmark, Online-Mind2Web, contains 300 tasks across 136 live websites grouped into easy (83), medium (143), and hard (74) categories. In a March 2026 technical report, Browser Use Cloud reported 97% success on Online-Mind2Web. The team described the result as the highest publicly reported score and attributed it to an "Auto-Research" methodology in which Claude Code was given a CLI to the evaluation harness and ran roughly 20 cycles of tree-search experiments across parallel browser sessions to discover prompt and tool-design improvements. A key engineering change was the addition of an in-agent Python execution environment for HTML parsing and data extraction, which the report says aligned better with the underlying LLM's training distribution.[^13]
The published comparison table on the same date placed Browser Use Cloud at 97%, GPT-5.4 Native Computer Use at 93%, UI-TARS-2 at 88%, and an "ABP" agent built on Claude Opus 4.6 at 86%.[^13]
The Browser Use team has also evaluated the library on tasks drawn from WebArena and OSWorld, though these are not the primary headline results. WebArena focuses on self-hosted web applications and OSWorld extends evaluation to full desktop environments; both have been used as supplementary stress tests in technical reports rather than as the principal benchmark for the project.[^18]
The repository crossed 50,000 GitHub stars by March 2025 and roughly 95,000 stars by May 2026, with more than 10,000 forks.[^1][^3] The project's Discord server reached more than 5,000 members in the first year, and PyPI download statistics tracked daily download counts in the tens of thousands by early 2025.[^11][^7]
Among the most visible production users is Manus, a generalist agent product by Chinese startup Butterfly Effect that went viral in early 2025; Manus uses Browser Use to drive its web-browsing actions.[^7][^20] The TechCrunch reporting indicated that more than 20 companies in YC's Winter 2025 batch alone integrated the library into their own products.[^7][^3] Other coverage has identified use of the library inside indie developer projects, no-code agent builders, and internal automations at established SaaS vendors, though many of these deployments are not publicly attributed.[^11][^20]
The Browser Use GitHub organisation maintains several adjacent repositories, including a web UI (browser-use/web-ui), a Python SDK for the cloud service (browser-use/browser-use-python), and a desktop application (browser-use/desktop). The library has also been wrapped as a skill for Claude Code and as a server under the Model Context Protocol.[^16][^14]
The 2025 to 2026 wave of AI browser agents produced several adjacent tools with different design choices. The table summarises the most commonly compared options.
| Tool | Language | Primary representation | LLM coupling | Hosted offering |
|---|---|---|---|---|
| Browser Use | Python | DOM + optional screenshot | Multi-provider | browser-use.com cloud |
| Stagehand (Browserbase) | TypeScript | DOM (Playwright wrapper) | Multi-provider | Browserbase cloud |
| Skyvern | Python | Screenshot first | Vision-capable models | Skyvern cloud |
| AgentQL | Python / JS | Structured query language | Multi-provider | AgentQL cloud |
| Anthropic Computer Use | Any | Screenshot first | Claude only | API only |
| OpenAI Operator | Web app | Screenshot first | GPT-4o family | ChatGPT only |
Source: vendor documentation and independent comparisons.[^15][^11][^21]
Stagehand, developed by Browserbase, extends Playwright with high-level natural-language verbs (act, extract, observe) that the developer invokes only when needed; the rest of the script is conventional Playwright. Stagehand caches action mappings after first use so that repeated runs incur no LLM call. Browser Use takes the inverse approach: the entire script is replaced by a single high-level task string, and the agent re-reasons at every step. The trade-off is flexibility versus inference cost. Browser Use is generally seen as easier to start with but more expensive at scale, while Stagehand requires more upfront engineering but is cheaper for repeated workflows.[^15][^21]
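The cost difference between the two approaches comes down to how often the model is consulted. The sketch below illustrates generic action caching only; it is not Stagehand's or Browser Use's code, and `CachedResolver` and `fake_llm` are hypothetical names.

```python
from __future__ import annotations

from typing import Callable


class CachedResolver:
    """Maps a natural-language instruction to a selector, asking the model once."""

    def __init__(self, resolve_with_llm: Callable[[str], str]) -> None:
        self._resolve = resolve_with_llm
        self._cache: dict[str, str] = {}
        self.llm_calls = 0

    def selector_for(self, instruction: str) -> str:
        if instruction not in self._cache:
            self.llm_calls += 1  # only cache misses cost an inference call
            self._cache[instruction] = self._resolve(instruction)
        return self._cache[instruction]


def fake_llm(instruction: str) -> str:
    """Assumption: a canned answer replaces a real model call."""
    return "//button[text()='Sign in']"


resolver = CachedResolver(fake_llm)
first = resolver.selector_for("click the sign-in button")
second = resolver.selector_for("click the sign-in button")
```

An agent that re-reasons at every step behaves like this resolver with the cache disabled: each run pays for inference again, but it also adapts when the page changes.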
Skyvern, a YC S23 company, treats the DOM as unreliable and feeds screenshots to a vision-capable model that locates buttons, forms, and text the way a human would. The approach is robust to canvas-heavy applications and to layout changes that break selectors, but is more expensive per step and depends heavily on the model's grounding ability. Browser Use sits in between: it uses the DOM as the primary handle and the screenshot as a secondary signal.[^15][^21]
AgentQL exposes a declarative query language for the web; developers write a small expression that names the elements they want, and AgentQL maps those names to live DOM nodes. The resulting workflow is closer to a typed scraping API than to an agent. Browser Use, by contrast, takes a natural-language goal and discovers the necessary actions at runtime.[^21]
Compared to OpenAI Operator and Anthropic Computer Use, Browser Use is the open-source, model-agnostic alternative. Operator runs only inside ChatGPT against OpenAI's hosted browser; Computer Use is an Anthropic-specific tool that issues coordinates to a virtual machine controlled by Claude. Both can drive only their own models and run on infrastructure controlled by the vendor. Browser Use runs locally, accepts any LLM, and exposes the browser to the developer. The flip side is that the developer is responsible for proxies, captchas, and rate limits; the cloud product exists in part to absorb this operational burden.[^11][^7]
Multi-On was an earlier (2023 to 2024) consumer-facing browser agent that ran as a Chrome extension and was driven by GPT-4. The product wound down its public consumer offering in 2024, leaving the open-source category dominated by Browser Use and Stagehand. The two efforts shared the basic observe-think-act loop but differed in distribution: Multi-On targeted end users with a packaged extension, while Browser Use targets developers with a library.[^21][^11]
The Python library is free and MIT-licensed.[^2][^16] The Browser Use Cloud service publishes a tiered pricing page that, as documented in the company's Hacker News launch post and the official site, starts at $30 per month for individuals with hosted browsers, proxy rotation, and CAPTCHA solving included.[^17][^11] Enterprise pricing is custom. The proprietary ChatBrowserUse model is billed at $0.20 per million input tokens and $2.00 per million output tokens at the rates documented in the May 2026 release notes.[^16]
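The per-token rates quoted above translate into a simple cost formula. The sketch below uses those two rates; the function name and the token counts in the example are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Rates quoted in the article: USD per one million tokens.
INPUT_USD_PER_MILLION = Decimal("0.20")
OUTPUT_USD_PER_MILLION = Decimal("2.00")


def token_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one run given its total input and output token counts."""
    million = Decimal(1_000_000)
    return (
        Decimal(input_tokens) * INPUT_USD_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_USD_PER_MILLION
    ) / million


# Hypothetical run: 500,000 input tokens and 20,000 output tokens.
cost = token_cost_usd(500_000, 20_000)  # 0.10 + 0.04 = 0.14 USD
```

Because page descriptions are resent at every step, input tokens usually dominate the count, which is why the lower input rate matters more than the output rate for long trajectories.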
Reaction to the project has been broadly positive. TechCrunch described Browser Use as "the tool making it easier for AI 'agents' to navigate websites,"[^5] and the Hacker News launch thread on 25 February 2025 produced largely favourable discussion, with commenters comparing the library favourably to closed-source alternatives.[^17] Critical responses on the same thread and on developer-focused blogs centred on three concerns: the cost of running the agent on the LLM side, the security implications of giving an LLM control of a logged-in browser session (including exposure of cookies and credentials via the Chrome DevTools Protocol), and the load that widespread agent traffic could place on hosting infrastructure.[^17] Some third-party benchmark write-ups have also questioned whether the headline WebVoyager score of 89.1% can be reproduced under stricter evaluation.[^19]
The team's response has emphasised continued investment in evaluation transparency (publishing the modified WebVoyager harness as open source), improvements to the underlying agent loop, and the introduction of a hosted product to absorb operational risks that individual developers cannot easily manage.[^18][^13][^11]
Several limitations are acknowledged by the maintainers or have been documented by users: