Playwright MCP

Updated Jul 23, 2026

Playwright MCP is an open-source Model Context Protocol server developed and maintained by Microsoft that lets large language model agents control real web browsers through the Playwright cross-browser automation library. Released by the Playwright team on 22 March 2025 at the GitHub repository microsoft/playwright-mcp, it exposes a set of MCP tools for navigation, clicking, typing, form filling, screenshot capture, network interception, and reading page state.^[1]^[2] The server distinguishes itself from screenshot-only desktop control agents by operating primarily on the browser accessibility tree, which produces compact structured snapshots that LLMs parse with a fraction of the tokens required for full DOM dumps or pixel-based vision.^[3] As of May 2026 the package is published on npm as @playwright/mcp and is wired into clients including Claude Code, Claude Desktop, Cursor, Windsurf, and Microsoft Visual Studio Code in GitHub Copilot agent mode.^[1]^[4]

Background

The Model Context Protocol is an open standard announced by Anthropic on 25 November 2024 to give LLM applications a uniform way to discover and call external tools, fetch data, and read prompts from outside systems.^[5] The initial release shipped with reference servers for Google Drive, Slack, GitHub, Git, Postgres, and Puppeteer along with SDKs in Python, TypeScript, C#, and Java.^[5]^[6] OpenAI announced official adoption of MCP across its Agents SDK, Responses API, and ChatGPT desktop app in March 2025, and Google followed with native support in Gemini and a set of managed MCP servers for Maps, BigQuery, Compute Engine, and Kubernetes Engine later in 2025.^[6]^[7] By the end of 2025 Anthropic donated the protocol to the Linux Foundation through the new Agentic AI Foundation, with Block and OpenAI as co-founders.^[6]

Playwright itself predates the MCP project by several years. Microsoft released the Playwright 1.0 cross-browser test automation framework on 6 May 2020, after a public preview that began on 31 January 2020.^[8] The library was built by a team that included engineers who had previously worked on Google's Puppeteer headless Chrome project; the design goal was a single API that drives Chromium, Firefox, and WebKit with the same code, with built-in handling of timing flakiness, network interception, and parallel test execution.^[8] By 2026 the microsoft/playwright repository had passed 89,000 GitHub stars, with bindings for JavaScript, TypeScript, Python, .NET, and Java.^[9]

The combination of these two threads, an open agent protocol and a mature cross-browser driver, made an official MCP wrapper around Playwright a natural step. Microsoft's Playwright team shipped the first public version of playwright-mcp on 22 March 2025, three days before independent developer Simon Willison covered the launch on his weblog and described the implementation as "pretty fascinating" because Claude Desktop could now drive a browser by reading the Chrome accessibility tree rather than by analyzing screenshots.^[2]^[10]

Release history

Date	Version	Notes
22 March 2025	First public release	Initial publish of `@playwright/mcp` on npm; `microsoft/playwright-mcp` repository made public.^[1]^[11]
March-April 2025	v0.0.x point releases	Rapid iteration on tool naming, snapshot format, and configuration flags.
6 February 2026	v0.0.64	Expanded tab management and vision tools.^[11]
14 February 2026	v0.0.67 / v0.0.68	Added storage-state and tracing utilities.^[11]
30 March 2026	v0.0.69	Documentation reorganization.^[11]
7 May 2026	v0.0.75	Latest version at time of writing.^[11]

The version numbers stayed in the 0.0.x range through 65 releases, reflecting a deliberate decision by the maintainers to keep the package marked as pre-1.0 while the tool surface stabilized.^[11] Despite the pre-release labeling, the GitHub repository reached more than 32,000 stars within roughly a year of launch.^[12]

Architecture

Playwright MCP is a Node.js process. When an MCP client launches it, the server creates a Playwright Browser instance, exposes a list of MCP tools over a transport, and translates each tool call into a Playwright API invocation against an open page or tab.^[1]^[13]

Transports

The server supports two transports drawn from the MCP specification.

STDIO. The default. The MCP client spawns the server as a child process and exchanges JSON-RPC messages over standard input and standard output. This mode is used by Claude Desktop, Claude Code, Cursor, Windsurf, and VS Code when configured with command: "npx" and args: ["@playwright/mcp@latest"].^[4]^[13]
HTTP with Server-Sent Events. Activated by passing a --port flag. The server then listens on /mcp and /sse endpoints, accepts POST requests from clients, and streams responses back as SSE events. Each client gets its own session identifier, allowing one server process to handle multiple concurrent agents.^[13]

The transport abstraction means the same tool implementations run whether the agent is local or remote, which is how the official Playwright MCP Chrome Extension attaches to an existing browser session over a WebSocket bridge.^[1]
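As a rough illustration of the STDIO transport, the sketch below builds two JSON-RPC 2.0 requests and frames each one as a single newline-terminated line, which is how the MCP specification describes stdio messages. The `tools/list` and `tools/call` method names come from the MCP specification; the `url` argument name for `browser_navigate` and the helper names are assumptions made for illustration, and the `initialize` handshake that opens a real session is omitted.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def make_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object of the kind MCP clients send."""
    msg: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def frame_for_stdio(msg: Dict[str, Any]) -> bytes:
    """Serialize one message as a single newline-terminated UTF-8 line."""
    # json.dumps escapes newlines inside strings, so the line has no raw "\n".
    line = json.dumps(msg, separators=(",", ":"))
    return (line + "\n").encode("utf-8")


list_tools = make_request("tools/list")
navigate = make_request(
    "tools/call",
    # The "url" argument name is an assumption; check the server's tool schema.
    {"name": "browser_navigate", "arguments": {"url": "https://example.com"}},
)

# Bytes a client would write to the server's standard input, in order.
wire = frame_for_stdio(list_tools) + frame_for_stdio(navigate)
```

A client would write `wire` to the child process's standard input and read one JSON response per line from its standard output.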

Browser backends

Because Playwright MCP wraps Playwright, it inherits support for Chromium, Firefox, and WebKit on Linux, macOS, and Windows.^[9]^[13] By default it launches Chromium in headed mode, which is the inverse of the Playwright CLI default of headless, and it persists cookies and local storage in a user data directory between sessions so that login state survives across agent invocations.^[4]^[14] Command line flags select alternative browsers, isolated incognito profiles, executable paths for branded builds of Chrome or Edge, and viewport dimensions.^[1]^[14]

Snapshot versus vision modes

A defining design choice is the way the agent perceives the page.

Snapshot mode (default). On each tool call, the server returns a structured YAML-like accessibility snapshot of the current page. Each interactive element receives a stable reference identifier of the form e1, e2, e3 that the agent uses to address that element in subsequent tool calls. The accessibility tree is the same hierarchy that screen readers and assistive technology consume, with roles, names, and parent-child relationships rather than pixel coordinates.^[3]^[15]
Vision mode (optional). When the agent needs to read a canvas, an image, or a layout that has no usable accessibility tree, the server can be configured to expose coordinate-based mouse and keyboard tools and return screenshots. This mode is heavier on tokens and depends on a vision-capable model.^[3]^[15]
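To make the ref mechanism in snapshot mode concrete, the following sketch parses a small hand-written snapshot and looks up an element by role and name. The sample text and its exact line format (`- role "name" [ref=eN]`) are assumptions modeled on the description above, not output captured from the server, and the helper names are hypothetical.

```python
import re
from typing import Dict, NamedTuple, Optional


class Element(NamedTuple):
    ref: str
    role: str
    name: str


# Assumed line shape: optional indent, "- role", optional quoted name, "[ref=eN]".
LINE = re.compile(
    r'^\s*-\s+(?P<role>\w+)(?:\s+"(?P<name>[^"]*)")?.*?\[ref=(?P<ref>e\d+)\]'
)

SAMPLE_SNAPSHOT = """\
- heading "Flights" [level=1] [ref=e1]
- textbox "Where from?" [ref=e23]
- textbox "Where to?" [ref=e24]
- button "Search" [ref=e30]
"""


def parse_refs(snapshot: str) -> Dict[str, Element]:
    """Map each ref found in the snapshot to its role and accessible name."""
    elements: Dict[str, Element] = {}
    for line in snapshot.splitlines():
        match = LINE.match(line)
        if match:  # lines without a ref are skipped
            elements[match["ref"]] = Element(
                match["ref"], match["role"], match["name"] or ""
            )
    return elements


def find_ref(elements: Dict[str, Element], role: str, fragment: str) -> Optional[str]:
    """Return the first ref whose role matches and whose name contains fragment."""
    for element in elements.values():
        if element.role == role and fragment.lower() in element.name.lower():
            return element.ref
    return None


elements = parse_refs(SAMPLE_SNAPSHOT)
search_ref = find_ref(elements, "button", "search")
```

In practice the model reads the snapshot text directly; a parser like this is mainly useful for logging or for client-side checks.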

Public benchmarks from the Playwright documentation report that a snapshot consumes roughly 200 to 400 tokens for a typical page, against several thousand for a raw DOM dump or a base64-encoded screenshot.^[3] Third-party analysis by Morph notes that an entire snapshot-driven workflow can still climb into six figures of tokens over a long session because the server returns a fresh snapshot after every action, which is why MCP usage is often paired with CLI-style on-disk snapshots for production test runs.^[16]
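A back-of-the-envelope calculation shows how small snapshots can still add up over a long session. The sketch assumes a 300-token snapshot (the midpoint of the range quoted above), a hypothetical 50-action session, and a client that re-sends the entire history on every turn without pruning old snapshots; prompts, tool arguments, and model output are ignored.

```python
def tokens_added(steps: int, tokens_per_snapshot: int) -> int:
    """Snapshot tokens appended to the conversation over the whole session."""
    return steps * tokens_per_snapshot


def cumulative_input_tokens(steps: int, tokens_per_snapshot: int) -> int:
    """Snapshot tokens sent to the model in total when turn k re-sends k snapshots.

    The sum 1 + 2 + ... + steps equals steps * (steps + 1) / 2.
    """
    return tokens_per_snapshot * steps * (steps + 1) // 2


STEPS = 50          # hypothetical session length
PER_SNAPSHOT = 300  # midpoint of the 200 to 400 token range

added = tokens_added(STEPS, PER_SNAPSHOT)                  # 15000
billed = cumulative_input_tokens(STEPS, PER_SNAPSHOT)      # 382500
```

Under these assumptions the snapshots themselves total 15,000 tokens, but the cumulative input across all turns reaches 382,500 tokens, which is consistent with the six-figure totals described above.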

Tools

The tool catalog in v0.0.75 contains about 40 named tools grouped into several capability families. The Playwright documentation organizes them into four buckets: Core, Network and Storage, Testing and Debugging, and Vision.^[14]^[15]

Core interaction

Tool	Purpose
`browser_navigate`	Open a URL in the active tab.^[14]
`browser_navigate_back`, `browser_navigate_forward`	Move through browser history.^[14]
`browser_snapshot`	Capture the accessibility tree of the current page and return it with element refs.^[15]
`browser_click`	Click an element identified by its snapshot ref.^[15]
`browser_type`	Type text into a focused or referenced field, with submit-on-enter as an option.^[15]
`browser_fill_form`	Fill a structured form mapping fields to values in a single call.^[14]
`browser_press_key`	Send a keyboard key, including modifier combinations.^[14]
`browser_hover`	Hover over a referenced element.^[14]
`browser_drag`	Drag from one ref to another.^[14]
`browser_select_option`	Pick a value in a select element by label or value.^[14]
`browser_handle_dialog`	Accept or dismiss native browser dialogs.^[14]
`browser_wait_for`	Block until a referenced element appears, disappears, or a text condition is met.^[14]

Tab management

browser_tab_new, browser_tab_close, browser_tab_list, and browser_tab_select give the agent control over multiple tabs in the same browser instance, which is useful for cross-site workflows like comparing prices on two storefronts.^[14]

Screenshots and media

browser_take_screenshot returns a PNG or JPEG of the current viewport or a specific element. browser_pdf_save writes the current page to PDF, primarily for archival or test reporting. The vision profile adds coordinate-based tools browser_mouse_move_xy, browser_mouse_click_xy, and browser_mouse_drag_xy for cases where the accessibility tree is not enough.^[14]^[15]

Network and storage

Tool	Purpose
`browser_network_requests`	List requests issued since the last snapshot.^[14]
`browser_set_cookies` / `browser_get_cookies`	Read and write cookies.^[14]
`browser_set_local_storage`	Inject local storage values, often used to bypass login flows in test harnesses.^[14]
`browser_route`	Mock or intercept specific network responses, useful for offline testing.^[14]

Debugging and code

browser_console_messages returns recent console output. browser_evaluate runs arbitrary JavaScript in the page context and returns the serializable result. browser_generate_playwright_test emits a runnable Playwright test script that reproduces the actions the agent has taken so far, which converts an exploratory session into a regression test.^[14]

The exact tool names and shapes have evolved through the v0.0.x series. Older clients and tutorials sometimes refer to tools by earlier names; the canonical list lives in the tools directory of the GitHub repository and in the Playwright MCP documentation site.^[1]^[14]

How agents use it

An agent that wants to drive Playwright MCP follows the standard MCP client pattern: register the server in a configuration file, list the available tools at session start, and call them through the model's tool-use loop.^[4]^[17]

For Claude Code, the registration is a single shell command that writes to ~/.claude.json:

claude mcp add playwright npx @playwright/mcp@latest

For most other clients, the configuration is a JSON snippet that names the server and gives the command to spawn it.^[4]^[14]

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}

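The same command and arguments apply to Cursor, Windsurf, Claude Desktop, and VS Code's GitHub Copilot agent mode, although the configuration file location and the name of the top-level key can differ between clients. Microsoft added agent mode in VS Code 1.99 alongside MCP client support in March 2025.^[14]^[18]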
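For setups that generate client configuration programmatically, a minimal sketch of a builder for the JSON shown above might look like this. The function names are hypothetical; `--headless` is the flag mentioned under Limitations, and anything passed through `extra_args` is the caller's responsibility to verify against the server's documentation.

```python
import json
from typing import Any, Dict, List, Optional


def build_server_entry(
    headless: bool = False, extra_args: Optional[List[str]] = None
) -> Dict[str, Any]:
    """Return the command and args a client uses to spawn Playwright MCP."""
    args: List[str] = ["@playwright/mcp@latest"]
    if headless:
        args.append("--headless")  # needed on machines without a display
    if extra_args:
        args.extend(extra_args)    # unchecked pass-through, verify before use
    return {"command": "npx", "args": args}


def build_config(headless: bool = False) -> str:
    """Render the mcpServers document as pretty-printed JSON."""
    config = {"mcpServers": {"playwright": build_server_entry(headless=headless)}}
    return json.dumps(config, indent=2)


default_config = build_config()
headless_config = build_config(headless=True)
```

Calling `build_config()` with no arguments reproduces the snippet above; `build_config(headless=True)` adds the flag for remote or CI machines.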

A typical agent turn looks like this. The user asks the model to find the cheapest flight from San Francisco to Tokyo next month. The model calls browser_navigate with the Google Flights URL, then browser_snapshot to read the page. The snapshot returns a YAML accessibility tree with refs for the origin input (ref=e23), the destination input (ref=e24), the date pickers, and the search button. The model calls browser_type with ref=e23 and text SFO, then browser_type on ref=e24 with text Tokyo, then browser_click on the search button. After the page loads, another browser_snapshot returns the result list, and the model reads off the prices. Because refs are scoped to a single snapshot, the model is expected to take a fresh snapshot after any action that changes the DOM.^[15]^[19]
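The turn described above can be written out as an ordered list of tool calls. This is only a sketch of the request payloads: the argument names (`url`, `ref`, `text`), the Google Flights URL, and the ref `e30` for the search button are assumptions for illustration, while `e23` and `e24` follow the example in the text.

```python
from typing import Any, Dict, List


def tool_call(name: str, **arguments: Any) -> Dict[str, Any]:
    """Return the params object of an MCP tools/call request for one tool."""
    return {"name": name, "arguments": arguments}


flight_search: List[Dict[str, Any]] = [
    tool_call("browser_navigate", url="https://www.google.com/travel/flights"),
    tool_call("browser_snapshot"),                    # read refs for the form
    tool_call("browser_type", ref="e23", text="SFO"),
    tool_call("browser_type", ref="e24", text="Tokyo"),
    tool_call("browser_click", ref="e30"),            # hypothetical search button ref
    tool_call("browser_snapshot"),                    # fresh refs for the results
]


def refs_used(calls: List[Dict[str, Any]]) -> List[str]:
    """List the element refs a plan depends on, in call order."""
    return [c["arguments"]["ref"] for c in calls if "ref" in c["arguments"]]


call_names = [c["name"] for c in flight_search]
```

In a real session the model chooses each call only after seeing the previous result, so the list is produced one step at a time rather than planned up front.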

Comparison with related tools

MCP servers from other vendors

The original Anthropic launch shipped a reference Puppeteer MCP server, which served as a proof of concept rather than a production tool.^[5] After Microsoft's official release, the community converged on Playwright MCP as the default browser server, although alternatives exist with different trade-offs.

Server	Maintainer	Engine	Notes
Playwright MCP	Microsoft	Playwright	Official, accessibility-tree first, headed by default, 40+ tools.^[1]^[14]
Puppeteer MCP	Anthropic reference	Puppeteer	Initial launch example, Chromium only, smaller surface.^[5]
ExecuteAutomation `mcp-playwright`	Community	Playwright	Adds API testing helpers and Cline support; predates the official server.^[20]
Browserbase MCP	Browserbase	Headless cloud Chrome	Runs the browser in Browserbase's hosted infrastructure for proxy and CAPTCHA support.^[21]
Hyperbrowser	Hyperbrowser.ai	Headless cloud	Similar cloud-runner model with stealth and session features.^[21]

The choice between local Playwright MCP and a hosted browser MCP usually comes down to whether the agent needs the user's cookies and login state (local wins) or whether it needs an IP that is not the user's home address and resilience to bot detection (cloud wins).

Versus full desktop control

Computer Use is a tool family launched by Anthropic in public beta on 22 October 2024 that lets Claude move a mouse, type, and read the screen of an arbitrary desktop environment by analyzing screenshots.^[22] Operator is OpenAI's competing system, released on 23 January 2025 in a research preview, built on the Computer-Using Agent model that combines GPT-4o vision with reinforcement learning.^[23] Both work by taking screenshots and emitting coordinate-level mouse and keyboard actions, and both can drive any GUI rather than just a browser.

Playwright MCP occupies a narrower but more reliable slice of that space. By targeting only the browser and only the accessibility tree, it avoids the cost and brittleness of vision while keeping a fully open-source local-execution path. For tasks that live entirely on the web, such as form filling, data extraction from rendered pages, or end-to-end test generation, Playwright MCP is usually faster and cheaper per action than Computer Use or Operator.^[3]^[16] For tasks that span native applications, file dialogs outside the browser, or operating system settings, desktop control agents remain the only option.

ChatGPT Agent is OpenAI's successor product that bundles the Operator browser, a code interpreter, and tool use into a single ChatGPT mode. It is closer in scope to a full virtual assistant than to a developer tool like Playwright MCP, but the underlying browser-driving primitive is similar.

Versus library-level browser agents

browser-use is a Python library by Magnus Müller and Gregor Žunić that wraps Playwright in an LLM-driven control loop, originally released in late 2024. It is not an MCP server: the developer's Python program embeds an LLM client, calls browser-use directly, and runs entirely in the developer's process.^[24] Playwright MCP and browser-use are sibling projects rather than competitors. Developers who want a programmable Python loop pick browser-use, and developers who want any MCP-aware client to drive the browser pick Playwright MCP. Both ultimately call Playwright under the hood.

AI browser agents form the broader category. The space includes hosted products such as Adept's Workflow Language for action agents, Reka's browser tools, Arc Browser's smart features, and a long tail of open-source experiments. Playwright MCP is the lowest-common-denominator infrastructure piece that most of these can use as a backend if they choose to.

Use cases

The Playwright MCP repository and developer write-ups describe a recurring set of applications.

Self-quality assurance during development. An agent built into an IDE can launch a local development server, open the URL, click through the new feature, and check that the visible behavior matches the requested change. Claude Code, Cursor, and Windsurf all support this pattern out of the box once the server is registered.^[4]^[25]
Exploratory and acceptance testing. Quality engineers describe a test flow in natural language, the agent walks through it, and a final tool call asks browser_generate_playwright_test to emit a Playwright test() block that captures the steps as a runnable assertion. This converts manual exploration into a regression test without the engineer writing selectors by hand.^[14]^[25]
Web scraping with judgment. Traditional scrapers are brittle on dynamic pages because they rely on CSS selectors. An LLM agent driving Playwright MCP reads the accessibility tree, decides which row of a table holds the desired data, and adapts when the layout changes. The cost per page is higher than with a hand-written scraper, but the rate of breakage is lower for one-off or low-volume extraction tasks.^[3]^[16]
Form filling and multi-step web tasks. Filling a long government form, completing a multi-stage checkout, or onboarding a SaaS account often spans several pages with conditional flows. Snapshot-based control handles the conditionals because the agent reads what is actually on the page after each step rather than executing a fixed script.^[4]
Browser-based research agents. A research agent that needs to search, follow links, read summaries, and gather citations can use Playwright MCP rather than a search API in cases where the relevant content is behind JavaScript-rendered pages or requires user login. This is the pattern that products like ChatGPT Agent and similar offerings deploy in their browser tools.^[23]
End-to-end test generation in CI. When paired with the Playwright CLI for replay, the MCP server can generate tests in development that the CLI then runs unattended in continuous integration, separating exploration from automation.^[14]^[25]

Limitations

Several caveats appear repeatedly in third-party reviews and the documentation itself.

Token accumulation in long sessions. Because every interaction tool returns a fresh accessibility snapshot, multi-step workflows can run up large context-window bills. The Playwright team recommends turning off automatic snapshots after every action for production agents and asking for snapshots only when needed. Even so, Morph's analysis found that MCP-driven flows can use roughly four times the tokens of CLI-driven flows on the same task.^[16]
No native CAPTCHA, no robust anti-bot evasion. Playwright MCP is a thin layer over Playwright. It does not solve CAPTCHAs, manage proxy rotation, or fingerprint a browser to avoid detection. Sites that block automated traffic block Playwright MCP. Cloud alternatives like Browserbase and Hyperbrowser exist partly to fill this gap.^[21]^[26]
Node.js requirement. The server runs on Node.js 18 or later. Earlier Node versions fail at startup with a confusing "performance is not defined" error.^[25]
Headed default. The MCP server runs the browser visibly by default. For agents running on remote machines without a display server, the user has to add --headless or run inside a virtual framebuffer like Xvfb.^[14]
Snapshot drift across actions. Element refs are stable within a single snapshot but not across snapshots. A model that caches refs from an earlier snapshot and tries to reuse them after a navigation will get errors. The protocol expects every action to be preceded by a fresh snapshot.^[15]
Vision mode token cost. When vision is enabled for canvas-heavy or layout-sensitive pages, the cost advantage over OpenAI Operator narrows significantly because both approaches are then sending screenshots into the model context.^[3]
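The snapshot-drift caveat above can be guarded against on the client side. The sketch below is a hypothetical helper, not part of Playwright MCP: it remembers which refs came from the most recent snapshot and refuses to use any ref once a navigation or other page-changing action has been recorded.

```python
from typing import Iterable, Set


class StaleRefError(Exception):
    """Raised when a ref does not belong to the current snapshot."""


class RefTracker:
    """Track which element refs are valid for the most recent snapshot."""

    def __init__(self) -> None:
        self._valid: Set[str] = set()

    def record_snapshot(self, refs: Iterable[str]) -> None:
        """Replace the valid set with the refs from a fresh snapshot."""
        self._valid = set(refs)

    def invalidate(self) -> None:
        """Call after navigation or any action that may change the page."""
        self._valid = set()

    def check(self, ref: str) -> str:
        """Return ref unchanged if still valid, otherwise raise StaleRefError."""
        if ref not in self._valid:
            raise StaleRefError(f"{ref} is not in the current snapshot")
        return ref


tracker = RefTracker()
tracker.record_snapshot(["e23", "e24"])
usable = tracker.check("e23")   # fine: e23 came from the current snapshot
tracker.invalidate()            # e.g. the agent just navigated
```

After `invalidate()`, any call to `tracker.check(...)` raises `StaleRefError` until another snapshot is recorded, which turns a confusing server-side failure into an explicit prompt to take a fresh snapshot.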

Reception

Playwright MCP became one of the most downloaded MCP servers within weeks of release. By mid-2026 it appeared at or near the top of MCP server directory rankings on aggregators such as PulseMCP, which estimated tens of millions of monthly visitors to its listing.^[12] Developer adoption was driven in part by direct integration paths in widely used clients. Anthropic published instructions for installing Playwright MCP in Claude Code, and Microsoft made it the default browser server for the GitHub Copilot agent mode shipped in VS Code 1.99 in March 2025.^[4]^[18]

Reviews focused on three points. First, the accessibility-tree approach was widely judged to be the right trade-off for browser automation, with several authors noting that earlier attempts at vision-based web agents had failed in production because of token cost and screenshot noise.^[2]^[16] Second, the snapshot reference system was praised for giving agents deterministic targets, an improvement over earlier patterns that asked the model to write CSS selectors directly.^[15] Third, critics pointed to enterprise gaps including CAPTCHA, multi-factor authentication, and the lack of a built-in workflow scheduler, and argued that Playwright MCP works best as plumbing inside a larger agent rather than as a complete automation product.^[26]

References

1. Microsoft, "microsoft/playwright-mcp: Playwright MCP server", GitHub, 2026-05-07. https://github.com/microsoft/playwright-mcp. Accessed 2026-05-25.
2. Simon Willison, "microsoft/playwright-mcp", simonwillison.net, 2025-03-25. https://simonwillison.net/2025/Mar/25/playwright-mcp/. Accessed 2026-05-25.
3. Playwright project, "Snapshots", Playwright MCP documentation, 2026. https://playwright.dev/mcp/snapshots. Accessed 2026-05-25.
4. Builder.io engineering, "How to Use Playwright MCP Server with Claude Code", builder.io blog, 2025. https://www.builder.io/blog/playwright-mcp-server-claude-code. Accessed 2026-05-25.
5. Anthropic, "Introducing the Model Context Protocol", Anthropic news, 2024-11-25. https://www.anthropic.com/news/model-context-protocol. Accessed 2026-05-25.
6. Wikipedia contributors, "Model Context Protocol", Wikipedia, 2026. https://en.wikipedia.org/wiki/Model_Context_Protocol. Accessed 2026-05-25.
7. Google Cloud, "Announcing official MCP support for Google services", Google Cloud Blog, 2025-12-10. https://cloud.google.com/blog/products/ai-machine-learning/announcing-official-mcp-support-for-google-services. Accessed 2026-05-25.
8. InfoQ, "Playwright 1.0 Release Automates Chromium, Firefox, and WebKit-Based Browsers", InfoQ news, 2020-05-12. https://www.infoq.com/news/2020/05/playwright-10-browser-automation/. Accessed 2026-05-25.
9. Microsoft, "microsoft/playwright: Playwright is a framework for Web Testing and Automation", GitHub, 2026-05-11. https://github.com/microsoft/playwright. Accessed 2026-05-25.
10. Skyvern, "Playwright MCP Reviews and Alternatives 2025", Skyvern blog, 2025. https://www.skyvern.com/blog/playwright-mcp-reviews-and-alternatives-2025/. Accessed 2026-05-25.
11. Microsoft, "Releases for microsoft/playwright-mcp", GitHub, 2026. https://github.com/microsoft/playwright-mcp/releases. Accessed 2026-05-25.
12. PulseMCP, "Official Playwright Browser Automation MCP Server", PulseMCP server directory, 2026. https://www.pulsemcp.com/servers/microsoft-playwright. Accessed 2026-05-25.
13. DeepWiki, "Transport Layer: microsoft/playwright-mcp", DeepWiki documentation index, 2026. https://deepwiki.com/microsoft/playwright-mcp/4.3-transport-layer. Accessed 2026-05-25.
14. Playwright project, "Playwright MCP", Playwright documentation, 2026. https://playwright.dev/mcp/introduction. Accessed 2026-05-25.
15. Playwright project, "Snapshots in Playwright MCP", Playwright MCP documentation, 2026. https://playwright.dev/mcp/snapshots. Accessed 2026-05-25.
16. Morph, "Playwright MCP Setup and Cost: Why the CLI Is 4x Cheaper", morphllm.com, 2026. https://www.morphllm.com/playwright-mcp. Accessed 2026-05-25.
17. Model Context Protocol authors, "Specification", modelcontextprotocol.io, 2025-11-25. https://modelcontextprotocol.io/specification/2025-11-25. Accessed 2026-05-25.
18. 4sysops, "Install Microsoft Playwright MCP server in VS Code for AI-powered browser automation in GitHub Copilot Agent Mode", 4sysops.com, 2025. https://4sysops.com/archives/install-microsoft-playwright-mcp-server-in-vs-code-for-ai-powered-browser-automation-in-github-copilot-agent-mode/. Accessed 2026-05-25.
19. Qaskills, "Playwright MCP browser automation guide", qaskills.sh blog, 2026. https://qaskills.sh/blog/playwright-mcp-browser-automation-guide. Accessed 2026-05-25.
20. ExecuteAutomation, "executeautomation/mcp-playwright: Playwright Model Context Protocol Server", GitHub, 2026. https://github.com/executeautomation/mcp-playwright. Accessed 2026-05-25.
21. TestCollab, "Playwright MCP Server: How to Set Up, Configure and Use It (2026)", TestCollab blog, 2026. https://testcollab.com/blog/playwright-mcp. Accessed 2026-05-25.
22. TechCrunch, "Anthropic's new AI model can control your PC", TechCrunch, 2024-10-22. https://techcrunch.com/2024/10/22/anthropics-new-ai-can-control-your-pc/. Accessed 2026-05-25.
23. OpenAI, "Introducing Operator", OpenAI index, 2025-01-23. https://openai.com/index/introducing-operator/. Accessed 2026-05-25.
24. Browser-use authors, "browser-use/browser-use: Make websites accessible for AI agents", GitHub, 2026. https://github.com/browser-use/browser-use. Accessed 2026-05-25.
25. MindStudio, "Automate Browser Tasks with Claude Code and the Playwright MCP Server", MindStudio blog, 2026. https://www.mindstudio.ai/blog/automate-browser-tasks-claude-code-playwright. Accessed 2026-05-25.
26. Skyvern, "Playwright MCP Reviews and Alternatives 2025", Skyvern blog, 2025. https://www.skyvern.com/blog/playwright-mcp-reviews-and-alternatives-2025/. Accessed 2026-05-25.
