OpenAI Operator
Last reviewed
May 17, 2026
Sources
33 citations
Review status
Source-backed
Revision
v6 · 8,393 words
OpenAI Operator was an AI agent developed by OpenAI that could autonomously perform tasks through web browser interactions. Launched on January 23, 2025, as a research preview for ChatGPT Pro subscribers in the United States, Operator was powered by a new model called the Computer-Using Agent (CUA), which combined GPT-4o's vision capabilities with reinforcement learning to interact with graphical user interfaces. Operator represented one of OpenAI's first consumer-facing AI agents, designed to browse the web and complete real-world tasks on behalf of users. The standalone product was deprecated on August 31, 2025, after its core functionality was integrated into ChatGPT as part of the new ChatGPT agent mode, announced on July 17, 2025.
AI agents that browse the web and interact with software on behalf of users became a major focus in the AI industry during 2024 and 2025. Several major AI companies raced to ship browser-based agents that could go beyond generating text and actually take actions in digital environments.
Among the major labs, Anthropic was first to demonstrate the capability publicly, introducing computer use in public beta for the upgraded Claude 3.5 Sonnet model on October 22, 2024. The feature let Claude interact with a computer's desktop by moving the cursor, clicking, and typing, and was available to developers through the Anthropic API. Shortly after, on December 11, 2024, Google DeepMind unveiled Project Mariner, a research prototype built on Gemini 2.0 Flash that navigated websites through a Chrome extension. Earlier still, Cognition's Devin, unveiled in March 2024, had established a related but narrower template: an autonomous agent that wrote and ran code in a sandboxed development environment.
OpenAI entered this competitive space with Operator in January 2025. CEO Sam Altman had been signaling the company's interest in agents for months, and Operator was the most visible realization of that vision. Unlike traditional chatbots that only generate text, Operator could click buttons, fill out forms, scroll, and type into input fields, acting directly on the web for the user. The name reflected that role: Operator operated web interfaces for the user, translating a natural-language request into the long sequence of browser actions needed to fulfill it.
At the core of Operator was the Computer-Using Agent (CUA), a model that OpenAI developed specifically for GUI-based interaction. CUA built on years of foundational research at the intersection of multimodal understanding and reasoning, combining GPT-4o's vision capabilities with advanced reasoning trained through reinforcement learning. The model was designed to interact with graphical user interfaces the same way humans do, by interpreting buttons, menus, text fields, dropdown lists, and other visual elements on a screen.
CUA operated through an iterative perception-reasoning-action loop that repeated until the task was completed or the model needed input from the user:
Perception: Screenshots from the browser were captured and added to the model's context, providing a visual snapshot of the current state of the web page. Rather than relying on structured HTML, DOM elements, or accessibility trees, CUA analyzed raw pixel data from these screenshots to understand what was displayed on screen. This approach meant the model could work with any website regardless of its underlying technical implementation.
Reasoning: CUA processed the visual input using chain-of-thought reasoning, maintaining an internal monologue that helped it evaluate observations, track intermediate steps, and adapt dynamically to changing conditions. The model could break complex tasks into multi-step plans, deciding which element to click, what text to type, or where to scroll. This inner reasoning process took into consideration both the current screenshot and the history of past screenshots and actions within the session, allowing the model to maintain context about what it had already done and what remained.
Action: The model generated a specific action to execute, such as clicking a button at particular coordinates, entering text into a form field, scrolling up or down, or pressing keyboard shortcuts. The browser then executed this action, the page updated, and the loop repeated with a fresh screenshot.
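The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's implementation: `Browser`, `propose_action`, `SCRIPT`, and the action dictionaries are hypothetical stand-ins for the screenshot source, the CUA model call, and the browser executor, and the scripted policy simply replays three canned actions.

```python
from dataclasses import dataclass, field

# Hypothetical canned policy standing in for the CUA model: click, type, then finish.
SCRIPT = [
    {"type": "click", "x": 120, "y": 340},
    {"type": "type", "text": "party of 4"},
    {"type": "done"},
]


@dataclass
class Browser:
    """Hypothetical browser stand-in that records actions and fakes screenshots."""
    executed: list = field(default_factory=list)

    def screenshot(self) -> bytes:
        # A real harness would return PNG pixels of the current page.
        return f"screen after {len(self.executed)} actions".encode()

    def execute(self, action: dict) -> None:
        self.executed.append(action)


def propose_action(task: str, history: list, screenshot: bytes) -> dict:
    """Hypothetical model call: would reason over the task, history, and new screenshot."""
    return SCRIPT[min(len(history), len(SCRIPT) - 1)]


def run_agent(task: str, browser: Browser, max_steps: int = 20):
    history = []  # (screenshot, action) pairs from this session
    for _ in range(max_steps):
        shot = browser.screenshot()                   # perception: pixels only, no DOM
        action = propose_action(task, history, shot)  # reasoning over current and past state
        history.append((shot, action))
        if action["type"] in ("done", "ask_user"):    # finished, or needs the user
            return action["type"], history
        browser.execute(action)                       # action: click, type, scroll, keypress
    return "step_limit", history


status, history = run_agent("Book a table for 4 at 7pm", Browser())
```

Here `status` ends as `"done"` after two executed actions. The `max_steps` cap is one simple way to bound the loop so that an agent stuck on a page cannot run indefinitely.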
This screenshot-based approach was significant because it allowed Operator to work with any website without requiring custom API integrations, browser plugins, or site-specific configurations. The model interacted with web pages purely through their visual presentation, in the same way a human user would. This made it inherently generalizable, as there was no dependency on website cooperation or structured data formats.
The reasoning component of CUA was particularly important for handling multi-step tasks like booking a restaurant reservation for a specific date and party size, where the model had to navigate the booking site, enter the date, select the party size, choose a time slot, and confirm. CUA's chain-of-thought process allowed it to anticipate upcoming steps, handle branching paths such as a preferred time slot being unavailable, and maintain progress when individual steps did not go as expected.
Adaptive self-correction complemented this planning. When the model encountered unexpected results or errors during execution, it could recognize the problem and adjust. If a button did not respond, a page loaded differently than anticipated, a popup appeared, or a form validation error occurred, CUA would reassess and try alternative strategies. This resilience was essential given how unpredictable real-world websites are.
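One simple way to approximate this behavior in a harness is to compare screenshots before and after an action and, when nothing changed, fall back to an alternative action for the same intent. The sketch below assumes that idea; `act_with_fallbacks` and the candidate actions are hypothetical, and OpenAI did not describe CUA's self-correction at this level of detail. In CUA the reassessment happened inside the model's reasoning rather than in a fixed list of fallbacks.

```python
from typing import Callable, Optional


def act_with_fallbacks(
    candidates: list,
    execute: Callable[[dict], None],
    screenshot: Callable[[], object],
) -> Optional[dict]:
    """Try alternative actions for one intent until the page visibly changes.

    Returns the action that worked, or None so the caller can escalate to the user.
    """
    before = screenshot()
    for action in candidates:
        execute(action)
        if screenshot() != before:  # the page reacted, so the step made progress
            return action
        # No visible change (e.g. an unresponsive button): try the next strategy.
    return None


# Toy page where clicking "Submit" does nothing but pressing Enter submits the form.
page = {"state": "form"}


def execute(action: dict) -> None:
    if action["type"] == "keypress" and action["keys"] == ["ENTER"]:
        page["state"] = "submitted"


worked = act_with_fallbacks(
    [{"type": "click", "x": 400, "y": 610}, {"type": "keypress", "keys": ["ENTER"]}],
    execute,
    lambda: page["state"],
)
```

Here `worked` is the Enter keypress. Exact screenshot equality is crude in practice, since animations or rotating ads change pixels without any real progress; a production check would need a tolerance or a task-specific signal.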
OpenAI described CUA as combining GPT-4o's vision capabilities with reasoning trained through reinforcement learning, but it did not publish a detailed training methodology. In general terms, reinforcement learning lets a model learn by trial and error which sequences of actions lead to successful task completion, which suits the open-ended problem of navigating graphical interfaces.
CUA set new state-of-the-art results on several established benchmarks for computer and web interaction at the time of its January 2025 release:
| Benchmark | CUA Score | Previous SOTA | Human Performance | Description |
|---|---|---|---|---|
| OSWorld | 38.1% | 22.0% | 72.4% | Full computer use tasks across operating systems |
| WebArena | 58.1% | N/A | ~78% | Web-based interaction tasks on realistic websites |
| WebVoyager | 87.0% | N/A | N/A | End-to-end real-world web navigation tasks across 15 popular websites |
OSWorld is a benchmark that measures an agent's ability to perform general computer-use tasks across operating systems, including interacting with desktop applications, file systems, and web browsers. CUA achieved a 38.1% success rate, far exceeding the previous state-of-the-art of 22.0%. This represented a roughly 73% relative improvement over prior approaches. However, human performance on the same benchmark stood at 72.4%, illustrating the substantial gap that still existed between AI agents and human capability at the time.
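The relative-improvement and human-gap figures follow directly from the three OSWorld numbers in the table; a short worked calculation:

```python
# OSWorld success rates (%) reported at CUA's launch.
cua, prior_sota, human = 38.1, 22.0, 72.4

absolute_gain = cua - prior_sota                  # 16.1 percentage points
relative_gain = (cua - prior_sota) / prior_sota   # about 0.732, i.e. roughly 73%
gap_to_human = human - cua                        # 34.3 percentage points
share_of_human = cua / human                      # about 0.53 of human performance

print(f"{relative_gain:.1%} relative gain, {gap_to_human:.1f} points below humans")
# prints: 73.2% relative gain, 34.3 points below humans
```

In other words, CUA reached a little over half of human performance on OSWorld at launch.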
WebArena evaluates agents on web-based interaction tasks using realistic, self-hosted website replicas. The benchmark tests whether agents can navigate complex web applications and complete multi-step tasks. CUA achieved a 58.1% success rate on this benchmark, compared to human performance of approximately 78%.
WebVoyager tests agent performance on end-to-end real-world web tasks across 15 popular websites, with each site containing approximately 40 tasks for a total of 643 evaluation tasks. CUA achieved an 87.0% success rate on WebVoyager, representing strong performance on practical web navigation scenarios.
CUA demonstrated test-time scaling: its performance improved when it was allowed more steps to complete a task. Given a larger step budget and more time to reason and act, the model solved tasks it would otherwise fail to finish within a smaller budget. This suggested that gains in compute availability and efficiency could further improve the model's results without changes to the underlying architecture.
In the year following Operator's launch, CUA's 38.1% OSWorld score became a common reference point for computer-use agents, and frontier models from multiple labs soon surpassed it. In July 2025, the XLANG Lab, which maintains OSWorld, released OSWorld-Verified, a tightened version of the benchmark that added community-driven task fixes, AWS-based parallel evaluation that reduced run time to roughly one hour, and stricter task signals intended to make scores harder to game. By early 2026, OSWorld-Verified had largely displaced the original benchmark in published comparisons.
Progress on OSWorld-Verified was rapid through late 2025 and into 2026. Claude Sonnet 4.5 reached 61.4% in September 2025, the first frontier model to break past the 60% mark on the new variant. Through the first half of 2026, models from OpenAI and Anthropic traded the top spot:
| Model | OSWorld-Verified score | Notes |
|---|---|---|
| OpenAI CUA (Jan 2025 baseline) | 38.1% (original OSWorld) | The initial Operator launch number |
| OpenCUA-72B | 45.0% | Open-source baseline established by XLANG Lab in 2025 |
| Claude Sonnet 4.5 | 61.4% | September 2025; first model past 60% |
| GPT-5.4 | 75.0% | March 2026 OpenAI release |
| Claude Opus 4.7 | 78.0% | Late 2025 and early 2026 leaderboard contender |
| GPT-5.5 | 78.7% | April 2026; among the top published scores |
| Claude Mythos Preview | 79.6% | Top published score on OSWorld-Verified as of mid-2026 |
The progression matters in context: on a benchmark where OpenAI's January 2025 CUA completed only 38.1% of tasks, multiple frontier systems now score above the 72.4% human baseline reported in the original OSWorld paper, although that baseline was measured on the original benchmark rather than on OSWorld-Verified. The screenshot-driven perception-reasoning-action loop that CUA used remained the common template for these later agents, even as the models behind them changed several times.
Users accessed Operator through a dedicated website at operator.chatgpt.com. The interface presented a ChatGPT-style chat window on one side and a miniaturized browser window on the other side that the agent controlled. The homepage displayed suggested prompts for common tasks, such as "Find tickets for the next concert at the Sphere" or "Find a restaurant with a great happy hour for next Wednesday for 6 people," helping users understand the types of tasks Operator could handle.
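To use Operator, a user would type a natural-language request describing the task they wanted completed. Operator would then work through the task step by step in its browser window, pausing to ask for confirmation or for the user to take over when it reached a sensitive step.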
Users could watch Operator work in real time, observing each click, scroll, and keystroke as it happened. The transparency of this process allowed users to verify the agent's actions and intervene if necessary. At any point, users could click a "take over" button to assume direct control of the browser window.
Operator allowed users to personalize their workflows by adding custom instructions, either globally across all websites or for specific sites. For example, a user could set airline preferences for booking sites or specify dietary restrictions for grocery ordering. Users could also save prompts for quick access on the homepage, which was particularly useful for repeated tasks like weekly grocery restocking on Instacart.
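OpenAI did not document how global and site-specific instructions were combined. The sketch below assumes the simplest rule, appending any site-specific instructions to the global ones when the current host matches a configured site or one of its subdomains; `instructions_for`, `GLOBAL`, `PER_SITE`, and the sample preferences are hypothetical.

```python
from urllib.parse import urlparse


def instructions_for(url: str, global_instructions: list, site_instructions: dict) -> list:
    """Return global custom instructions plus those configured for the current site."""
    host = urlparse(url).hostname or ""
    site_specific = [
        text
        for site, texts in site_instructions.items()
        # Match the site itself or any subdomain, e.g. www.instacart.com for instacart.com.
        if host == site or host.endswith("." + site)
        for text in texts
    ]
    return list(global_instructions) + site_specific


GLOBAL = ["Ask before spending more than $100."]
PER_SITE = {
    "instacart.com": ["Vegetarian items only.", "Prefer store brands."],
    "airline.example": ["Aisle seat, no basic economy."],
}

print(instructions_for("https://www.instacart.com/store/cart", GLOBAL, PER_SITE))
```

Matching on the host suffix rather than on a substring keeps a lookalike domain such as `notinstacart.com` from picking up the Instacart preferences.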
Operator was designed for a wide variety of repetitive browser-based tasks. OpenAI collaborated with companies including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to test and refine Operator's performance on their platforms. Common use cases included:
| Task Category | Example | Platforms |
|---|---|---|
| Grocery ordering | Finding ingredients for a recipe and adding them to a cart | Instacart, DoorDash |
| Restaurant reservations | Booking a table for a specified date, time, and party size | OpenTable |
| Event tickets | Searching for and purchasing tickets for concerts or sporting events | StubHub |
| Travel booking | Comparing flights, hotels, and vacation packages | Priceline, Booking.com |
| Form filling | Completing online forms, applications, and registration pages | Various |
| Shopping | Browsing products, comparing prices, and adding items to carts | Various e-commerce sites |
| Appointment scheduling | Booking appointments with service providers | Thumbtack |
| Ride hailing | Requesting rides and managing transportation | Uber |
Operator could also handle parallel tasks within a single session. For instance, a user could ask Operator to order groceries on Instacart while simultaneously making a hotel booking on Booking.com.
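Conceptually, parallel tasks are independent sessions that make progress concurrently, each waiting on its own screenshots and model calls. The asyncio sketch below models that with a hypothetical `run_task` coroutine whose `sleep` stands in for per-step latency; it is not how OpenAI isolated sessions, which the company did not describe.

```python
import asyncio


async def run_task(name: str, steps: list, step_latency: float = 0.01) -> list:
    """Hypothetical agent session: works through its steps, yielding while it waits."""
    log = []
    for step in steps:
        await asyncio.sleep(step_latency)  # stands in for screenshot + model + action time
        log.append(f"{name}: {step}")
    return log


async def main() -> list:
    # gather() runs both sessions concurrently and returns results in argument order.
    return await asyncio.gather(
        run_task("groceries", ["open Instacart", "add recipe items", "await confirmation"]),
        run_task("hotel", ["open Booking.com", "compare rooms", "await confirmation"]),
    )


results = asyncio.run(main())
```

Because each session spends most of its time waiting, the two tasks finish in roughly the time of one, and each still stops at its own confirmation step.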
OpenAI implemented a multi-layered safety framework for Operator, recognizing the risks inherent in an AI agent that can take actions on the web with real-world consequences. With a standard chatbot, the worst outcome of an error is an incorrect response; a browser agent that makes a mistake could place an unintended order, send an email to the wrong person, or modify account settings incorrectly. The framework had four layers: user confirmations, take-over mode, watch mode, and task restrictions.
The CUA model was trained to request user confirmation before finalizing tasks with external side effects. Before submitting an order, sending an email, deleting a calendar event, making a purchase, or completing any other action that could not easily be undone, Operator would pause, display a summary of the pending action, and ask the user to review and approve it. This let users double-check the agent's work before the action became permanent. According to OpenAI's system card, Operator o3 (a later version) achieved an overall critical confirmation recall rate of 92.1%, with 100% accuracy on editing permissions and completing financial transactions and 99.9% accuracy on sending high-stakes communications.
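In Operator this behavior was learned by the model rather than enforced by a fixed rule. The sketch below shows a deterministic wrapper that enforces the same idea outside the model, which is a different design choice; `SIDE_EFFECT_KINDS`, `PendingAction`, and `execute_with_confirmation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action kinds treated as having external side effects.
SIDE_EFFECT_KINDS = {"submit_order", "send_email", "delete_event", "purchase"}


@dataclass
class PendingAction:
    kind: str
    summary: str  # human-readable description shown to the user before approval


def execute_with_confirmation(
    action: PendingAction,
    confirm: Callable[[str], bool],
    perform: Callable[[PendingAction], None],
) -> str:
    """Perform the action, but only after user approval if it has side effects."""
    if action.kind in SIDE_EFFECT_KINDS and not confirm(action.summary):
        return "cancelled"  # the user declined; nothing irreversible happened
    perform(action)
    return "done"


performed = []
order = PendingAction("submit_order", "Place Instacart order: 12 items, $84.20")
outcome = execute_with_confirmation(order, confirm=lambda summary: False, perform=performed.append)
```

A fixed list is predictable but brittle: it only protects the actions someone thought to enumerate, which is one reason to train the disposition into the model itself. Confirmation recall measures that learned disposition as the share of actions needing confirmation for which the model actually asked.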
When Operator encountered steps requiring sensitive information, such as login credentials, passwords, payment details, or CAPTCHAs, it would pause and prompt the user to "take over" the browser directly. While the user was in control, Operator did not collect or take screenshots of the information the user entered.
This design ensured that sensitive credentials, payment card numbers, and authentication tokens were never seen or processed by the AI model. Once the user completed the sensitive step (such as logging in or entering payment information), they could hand control back to Operator to continue the task.
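A harness can enforce this privacy boundary by not recording any observations while the user is in control. The class below is a hypothetical sketch of that idea, not OpenAI's code.

```python
class Session:
    """Hypothetical session wrapper that keeps user-controlled steps out of model context."""

    def __init__(self) -> None:
        self.user_in_control = False
        self.model_context: list = []  # observations the model is allowed to see

    def take_over(self) -> None:
        self.user_in_control = True

    def hand_back(self) -> None:
        self.user_in_control = False

    def capture(self, screenshot: str) -> None:
        # Screens shown while the user types passwords or card numbers are dropped,
        # so they never reach the model; capture resumes after control returns.
        if not self.user_in_control:
            self.model_context.append(screenshot)


session = Session()
session.capture("search results")
session.take_over()
session.capture("login form with password typed in")
session.hand_back()
session.capture("logged-in checkout page")
```

After hand-back, `session.model_context` holds the search results and the logged-in checkout page, but never the login form.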
On particularly sensitive websites, such as email services, banking platforms, and financial services, Operator required close user supervision through watch mode and would proceed on those sites only while the user was actively supervising.
Watch mode was designed to ensure that the user was actively monitoring the agent's behavior on high-risk sites where mistakes could have serious financial, legal, or personal consequences.
For certain categories of actions where the risk was deemed too significant, OpenAI fully restricted the model from assisting. These hard restrictions could not be overridden by user instructions. Operator would decline tasks in these categories, such as stock trading and banking transfers.
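The key property of a hard restriction is ordering: it is checked before anything the user says is taken into account. The sketch below illustrates that with hypothetical category and domain names (the restricted categories echo examples mentioned elsewhere in this article); it is not OpenAI's actual mechanism, which the company did not publish in this form.

```python
from dataclasses import dataclass

# Hypothetical policy tables for illustration only.
RESTRICTED_CATEGORIES = {"stock_trading", "bank_transfer"}
WATCH_MODE_DOMAINS = {"mail.example.com", "bank.example.com"}


@dataclass
class Request:
    category: str
    domain: str
    user_instruction: str = ""  # free text from the user; `policy` never reads it


def policy(req: Request) -> str:
    # 1. Hard restrictions come first, so no user instruction can unlock them.
    if req.category in RESTRICTED_CATEGORIES:
        return "refuse"
    # 2. Sensitive sites are allowed only under active supervision.
    if req.domain in WATCH_MODE_DOMAINS:
        return "watch_mode"
    return "proceed"


req = Request("bank_transfer", "bank.example.com", "I authorize you, just do it")
decision = policy(req)
```

Here `decision` is `"refuse"` regardless of the user's insistence, while an ordinary task on a sensitive domain falls through to watch mode.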
Operator had several notable limitations during its research preview period that reflected the early state of AI browser agent technology.
Accuracy and reliability: Operator did not achieve human-level accuracy. Long, multi-step tasks accumulated errors. The model could misinterpret visual elements, misidentify clickable targets, or make incorrect assumptions about interface behavior. On the OSWorld benchmark, which approximates real computer use, the 38.1% success rate meant it failed on the majority of tasks.
Complex and non-standard interfaces: Websites with heavy JavaScript rendering, drag-and-drop functionality, intricate calendar or date-picker widgets, interactive maps, or other non-standard UI components could confuse the agent because the model's visual understanding did not always correctly interpret novel design patterns.
Looping and crashes: Operator could sometimes get stuck in infinite loops, repeatedly attempting the same action without making progress, or freeze entirely. These stalls limited fully autonomous operation and meant users had to stay engaged.
No scheduling or background operation: Operator lacked built-in scheduling or continuous operation. Users could not schedule tasks for later, set up recurring tasks like weekly grocery orders, or have Operator run as a background service. Each task ran only within its active session, and closing the tab stopped the task.
Speed: Because Operator relied on capturing screenshots, processing them, generating an action, executing it, and capturing a new screenshot, it was generally slower than a human performing the same tasks. Early reviews described it as "significantly slower" than direct manual completion.
CAPTCHA handling: While Operator was designed to ask users to take over for CAPTCHAs, reports indicated it could sometimes solve simple ones on its own, raising questions about the boundaries of what the agent should handle autonomously.
Geographic and language limitations: At launch, Operator worked best with English-language websites and US-centric services. Performance on websites in other languages or other regions was less reliable.
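A toy calculation shows why the accuracy limitation above bites hardest on long tasks. Assume, purely for illustration, that every step succeeds independently with the same probability p and that failed steps are never corrected; an n-step task then succeeds with probability p^n. Real agents violate both assumptions, since CUA could recover from some mistakes, so the numbers below are intuition rather than measured behavior.

```python
def task_success(p_step: float, n_steps: int) -> float:
    """Chance that all n independent steps succeed, each with probability p_step."""
    return p_step ** n_steps


# Even a 98%-reliable step compounds badly over long workflows.
for n in (5, 20, 50):
    print(f"{n:>2} steps: {task_success(0.98, n):.2f}")
# prints:
#  5 steps: 0.90
# 20 steps: 0.67
# 50 steps: 0.36
```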
In the months following launch, reviewers and users developed a clearer picture of which tasks Operator handled well and which it struggled with. The pattern that emerged was that Operator performed best on narrow, well-trodden, high-volume web flows on partner sites that OpenAI had explicitly tested against, and worse on anything off the beaten path.
| Task category | Reliability in practice | Notes |
|---|---|---|
| Restaurant reservations on OpenTable | Generally reliable | Operator could correctly apply date, time, party size, and cuisine filters, then commit to a slot. |
| Grocery ordering on Instacart | Generally reliable | Adding ingredients from a recipe to a single cart was a frequently demonstrated and repeatable workflow. |
| Travel search on Priceline and Booking.com | Reliable for searching, less for booking | Comparing flights and hotels worked, but checkout typically required take-over for payment. |
| Single-page form filling | Reliable | Static forms with clear labels were one of Operator's strongest cases. |
| Multi-tab research and aggregation | Mixed | Worked on simple comparisons, struggled when results spanned many sites or required reasoning about subtle differences. |
| CAPTCHAs and anti-bot interstitials | Mixed | Often handed off to the user; some simple CAPTCHAs were reportedly solved by the model on its own, raising questions about agent boundaries. |
| Calendar widgets and date pickers | Unreliable | Custom calendar components, especially those using JavaScript drag-and-drop, frequently confused the agent. |
| Slide creation, document layout, and visual editors | Poor | Operator was not built for slideshow editors or other visually rich production tools, which it tended to misinterpret. |
| Long-running multi-hour tasks | Poor | Sessions could stall, loop, or time out before completing the task. |
| Bot-detected sites with login walls | Poor | Account verification, two-factor prompts, and aggressive bot detection routinely interrupted execution. |
A common observation from early reviewers was that, even on tasks Operator could complete, performing the work manually was often faster than supervising the agent. The value proposition skewed toward batchable tasks (such as a recurring weekly grocery order) and toward users who preferred delegating the click-through work even when it was not a strict speed win.
Operator's availability expanded in phases during its roughly seven-month lifespan:
| Date | Availability Change |
|---|---|
| January 23, 2025 | Launched as research preview for ChatGPT Pro subscribers ($200/month) in the US |
| February 21, 2025 | Expanded to Pro subscribers in Australia, Brazil, Canada, India, Japan, Singapore, South Korea, UK, and other countries |
| Later in 2025 | Expanded to most regions where ChatGPT was available, including Europe |
| July 17, 2025 | Functionality integrated into ChatGPT agent; Operator deprecation announced |
| August 31, 2025 | Operator shut down; all chat history permanently deleted |
Throughout its existence, the standalone Operator product remained exclusive to ChatGPT Pro subscribers at $200 per month. OpenAI had announced plans to expand access to Plus ($20/month), Team, and Enterprise subscribers, but this expansion never materialized for the standalone product. Instead, OpenAI chose to integrate the capability directly into ChatGPT through the agent mode, which was made available to a broader set of subscription tiers.
Operator's January 2025 debut drew heavy press attention because it was the first time most ChatGPT Pro subscribers had seen an OpenAI product that took web actions on their behalf, rather than only producing text. Reception split into roughly three groups: paying users testing the product against everyday errands, AI press evaluating it as a category-defining release, and security researchers probing the safety story.
Among ChatGPT Pro subscribers, reactions were mixed. Reviewers consistently said Operator was capable of completing common tasks like booking dinner reservations, ordering groceries, and filling out forms, but that the experience was slower than performing the same task manually. The combination of frequent confirmation prompts, watch-mode pauses, and the screenshot loop's inherent latency meant that even when Operator finished a task correctly, it rarely felt time-saving on the first attempt. Users who returned to Operator most often were those running repeated, batch-style tasks, such as restocking groceries from a saved recipe list, where the upfront supervision could be amortized over multiple uses.
A second common complaint concerned reliability. Operator could complete a flight comparison flawlessly on Tuesday and stall or loop on the same site on Wednesday, often because of small layout changes, A/B tests, or intermittent ad overlays. This unevenness made it difficult to trust the agent for anything time-sensitive without monitoring.
MIT Technology Review described Operator as the most ambitious consumer agent any major lab had shipped, while warning that it was "very much a work in progress." TechCrunch and The Verge framed it as OpenAI's first real attempt to commercialize agents, noting both the breakthrough nature of the screenshot-based approach and the practical limits of a product locked behind a $200-per-month tier. VentureBeat highlighted the partner integrations with DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber as evidence that OpenAI saw consumer commerce as the initial wedge.
Bloomberg and The Information emphasized the strategic context: OpenAI, having watched Anthropic ship computer use three months earlier and Google unveil Project Mariner six weeks earlier, needed a visible agent product. Operator was as much a market-positioning statement as it was a finished tool.
Independent reviewer Simon Willison covered Operator extensively from launch and treated it as more interesting from a security and architecture standpoint than as a productivity tool. Willison wrote that the leaked Operator system prompt, surfaced by Johann Rehberger, "doubles as better written documentation than any of the official sources," and used it to walk through how the model decides when to confirm, when to take over, and when to refuse. His broader take was that Operator validated the agent paradigm but inherited every unsolved problem of LLMs on the open web, particularly prompt injection.
Security researcher Johann Rehberger demonstrated, in February 2025, that indirect prompt injection attacks against Operator were practical. In one widely discussed proof of concept, an attacker hid instructions inside a GitHub issue. When Operator visited the issue as part of a benign user task, the embedded instructions caused it to navigate to a site where the user was already logged in, scrape personal information such as email addresses and phone numbers, and exfiltrate that information back to the attacker by encoding it into a URL.
Rehberger's work showed that the visible action stream and watch mode helped attentive users notice anomalies, but that an inattentive user, or one running Operator across multiple parallel tasks, could plausibly miss the attack. OpenAI responded by tightening confirmation prompts on certain navigation patterns and by expanding watch mode coverage on sensitive sites, but Rehberger and other researchers continued to argue that prompt injection was an unsolved problem at the model level rather than something that could be fully patched at the product layer.
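Rehberger's proof of concept leaked data by placing it in a URL the agent was induced to visit. A product-layer check might flag navigation that carries known sensitive strings to a host outside the task's expected sites, as in the hypothetical sketch below. The function name, host list, and matching rule are assumptions, and the check is easy to evade (for example by base64-encoding the data first), which illustrates why researchers argued such patches cannot fully solve prompt injection.

```python
from urllib.parse import parse_qsl, unquote, urlparse


def looks_like_exfiltration(url: str, expected_hosts: set, sensitive_values: set) -> bool:
    """Flag navigation that would carry known sensitive strings to an unexpected host."""
    parsed = urlparse(url)
    if parsed.hostname in expected_hosts:
        return False
    # parse_qsl percent-decodes query values; unquote handles data hidden in the path.
    decoded = [unquote(parsed.path)] + [value for _, value in parse_qsl(parsed.query)]
    return any(secret in part for secret in sensitive_values for part in decoded)


EXPECTED = {"github.com"}
SECRETS = {"alice@example.com"}

print(looks_like_exfiltration("https://attacker.example/c?d=alice%40example.com", EXPECTED, SECRETS))
```

The call above flags the request because the decoded query value matches a known email address, even though it was percent-encoded in the URL.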
These findings became part of a broader 2025 conversation about AI safety for browser agents. The same underlying concerns later resurfaced for Anthropic's Claude for Chrome and for OpenAI's own Atlas browser, both of which received fresh prompt-injection demonstrations within days of launch.
Researchers also probed Operator's refusal behavior on restricted tasks. Hard restrictions on stock trading, banking transfers, and certain legal actions held under direct prompts, but softer guardrails (such as the disposition to ask before clicking "Place order") could be weakened by elaborate role-play prompts or chained instructions. The pattern was familiar from chat-based jailbreaks: the model's underlying willingness to take an action could be raised through context manipulation, even when the wrapper layer eventually caught the result.
Operator received several model and policy updates during its short standalone life.
On March 11, 2025, OpenAI made the CUA model available to developers through its new Responses API as part of a broader set of agent-building tools. Access was initially limited to a research preview for select developers in usage tiers 3 through 5. Pricing was set at $3 per million input tokens and $12 per million output tokens.
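At these research-preview prices, the cost of a task depends mostly on how many input tokens its screenshots and accumulated history consume. The per-step token counts below are hypothetical placeholders, not published figures; input per step may grow as the session history lengthens, so a flat estimate can understate long tasks.

```python
INPUT_USD_PER_M = 3.00    # research-preview price per million input tokens
OUTPUT_USD_PER_M = 12.00  # research-preview price per million output tokens


def cua_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a CUA API workload at the March 2025 preview prices."""
    return (input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M) / 1_000_000


# Hypothetical 30-step task: 1,500 input and 100 output tokens per step.
steps = 30
cost = cua_cost(input_tokens=steps * 1_500, output_tokens=steps * 100)
print(f"${cost:.3f}")  # prints: $0.171
```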
Unlike the consumer-facing Operator product, which controlled only a web browser, the API version of CUA could in principle drive any graphical environment that exposed screenshots, including desktop applications and mobile interfaces. Common targets included data entry, application workflow automation, internal tool testing, and cross-application processes that previously required brittle scraping or robotic process automation. OpenAI published a sample application on GitHub (openai/openai-cua-sample-app). The model itself ran on OpenAI's servers, but the controlled environment, such as a browser or virtual machine, could run on the developer's own infrastructure, giving enterprises additional security and data control.
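Whatever the environment, the harness has to translate each action the model returns into a concrete command. The dispatcher below sketches that step against a recording backend instead of a real browser driver. The action shapes (`click` with `x`, `y`, `button`; `type` with `text`; `scroll` with `scroll_x`, `scroll_y`; `keypress` with `keys`) are modeled on OpenAI's computer-use documentation but should be checked against the current API reference; `RecordingBackend` and `dispatch` are hypothetical.

```python
class RecordingBackend:
    """Hypothetical stand-in for a browser or VM driver; records commands instead of acting."""

    def __init__(self) -> None:
        self.calls: list = []

    def click(self, x: int, y: int, button: str) -> None:
        self.calls.append(("click", x, y, button))

    def type_text(self, text: str) -> None:
        self.calls.append(("type", text))

    def scroll(self, x: int, y: int, dx: int, dy: int) -> None:
        self.calls.append(("scroll", x, y, dx, dy))

    def press(self, keys: list) -> None:
        self.calls.append(("keypress", tuple(keys)))


def dispatch(action: dict, backend: RecordingBackend) -> None:
    """Translate one model action into a backend command."""
    kind = action["type"]
    if kind == "click":
        backend.click(action["x"], action["y"], action.get("button", "left"))
    elif kind == "type":
        backend.type_text(action["text"])
    elif kind == "scroll":
        backend.scroll(action["x"], action["y"], action["scroll_x"], action["scroll_y"])
    elif kind == "keypress":
        backend.press(action["keys"])
    else:
        # Fail loudly rather than silently skipping an action the harness cannot perform.
        raise ValueError(f"unsupported action type: {kind!r}")


backend = RecordingBackend()
for act in [
    {"type": "click", "x": 512, "y": 300, "button": "left"},
    {"type": "type", "text": "data entry row 1"},
    {"type": "scroll", "x": 512, "y": 300, "scroll_x": 0, "scroll_y": 400},
    {"type": "keypress", "keys": ["CTRL", "S"]},
]:
    dispatch(act, backend)
```

Swapping `RecordingBackend` for a real driver changes only the method bodies; the dispatch table stays the same, which is what lets one model drive browsers, desktops, or other screenshot-based environments.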
On February 21, 2025, Operator rolled out to ChatGPT Pro subscribers in Australia, Brazil, Canada, India, Japan, Singapore, South Korea, and the United Kingdom, among other countries. Subsequent expansions in the spring brought it to most regions where ChatGPT Pro was sold, with the European Economic Area lagging because of regulatory review.
On May 23, 2025, OpenAI replaced the GPT-4o-based CUA with a version built on the o3 reasoning model, branded internally as "Operator o3." OpenAI published an addendum to the o3 and o4-mini system card describing the change. The reasoning model produced large performance gains on agent benchmarks. On the GAIA benchmark, which evaluates general assistant tasks requiring tool use, the o3-based Operator scored 62.2 compared with 12.3 for the previous model. OpenAI also reported that the new version asked for confirmation on 94% of sensitive actions and on 100% of financial transactions, with an overall critical confirmation recall of about 92%.
Reviewers reported that the o3-based Operator was visibly more persistent: it gave up less often when a page failed to load, recovered better from unexpected popups, and required fewer manual nudges across long tasks. The trade-off was latency, since the underlying reasoning model spent more compute per step. The o3 upgrade did not change Operator's pricing or its ChatGPT Pro exclusivity, but it sharpened the value proposition for subscribers who were on the fence about renewing.
Across spring and early summer 2025, OpenAI iterated on Operator's safety surface. Watch mode coverage was expanded to include more email, banking, and government-services domains. The hard-restricted task list was tightened around stock trading, certain regulated transactions, and tasks involving sensitive identity verification. Confirmation prompts were tuned to fire on a wider range of "irreversible" actions, including certain account setting changes that earlier versions had allowed without explicit user approval.
On July 17, 2025, OpenAI announced the ChatGPT agent, a significant product update that consolidated several previous capabilities into a unified system within ChatGPT. The ChatGPT agent merged Operator's web browsing and task execution abilities with Deep Research's multi-step research and reporting capabilities, along with code execution, file manipulation, and third-party service integrations. OpenAI described it as an agent that "bridges research and action," capable of not just finding information but acting on it.
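The ChatGPT agent operates on its own virtual computer accessible from within the ChatGPT.com interface. Rather than being a single model, the agent is an orchestrated system that coordinates multiple tools within a continuous agent loop. This virtual environment includes a visual browser that interacts with pages through a graphical interface, a text-based browser for simpler reasoning-based web queries, a terminal for running code, and connectors to third-party services such as Gmail and GitHub.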
The agent can coordinate across all of these tools within a single task. For example, it can search the web for information using the visual browser, download a dataset, analyze it using Python in the terminal, generate a report, and then email the results through Gmail, all within a single conversation.
The ChatGPT agent represented a substantial expansion beyond what Operator could do. Key differences included:
| Capability | Operator | ChatGPT Agent |
|---|---|---|
| Web browsing | Yes (visual browser only) | Yes (visual and text browsers) |
| Code execution | No | Yes (terminal with Python, shell) |
| File creation and manipulation | No | Yes (spreadsheets, presentations, reports) |
| Third-party service connectors | No | Yes (Gmail, Drive, GitHub, Slack, etc.) |
| Deep Research integration | No | Yes (multi-step research and reporting) |
| Image generation | No | Yes |
| Access within ChatGPT interface | Separate site (operator.chatgpt.com) | Integrated in ChatGPT.com via "agent mode" dropdown |
To access the agent, users select "agent mode" from the dropdown menu in the ChatGPT composer and enter their query directly. The previous Deep Research mode remains available as a separate option for users who specifically want exhaustive research reports without task execution.
ChatGPT agent began rolling out on July 18, 2025. Pro subscribers received immediate access on launch day. Plus and Team users gained access over the following days. Enterprise and Education tiers were scheduled for availability in the subsequent weeks.
Usage was structured with monthly message quotas:
| Subscription Tier | Agent Messages per Month | Monthly Cost |
|---|---|---|
| Pro | 400 | $200 |
| Plus | 40 | $20 |
| Team | 40 | $25/user |
Users who needed additional capacity beyond their monthly quota could purchase extra credits. Free-tier users and European Economic Area residents initially did not have access to the agent mode, with OpenAI citing ongoing regulatory work for the EEA exclusion. As of 2026, only the initial user-initiated agent request counts against the monthly quota; intermediate clarifications, authentication hand-offs, and follow-up turns within the same task do not consume additional message credits.
The ChatGPT agent retained and expanded the safety mechanisms that Operator pioneered. The system narrates its actions live, requests explicit permission before high-impact actions such as sending emails or making purchases, retains watch mode for sensitive sites, lets users delete browsing data with a single click, and keeps anything the user enters during secure take-over mode hidden from the model.
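The confirmation step can be pictured as a gate wrapped around every action the agent takes: ordinary actions run immediately, while high-impact ones wait for an explicit yes from the user. The sketch below assumes a hypothetical set of action names (`send_email`, `make_purchase`) and stand-in callbacks for confirmation, execution, and narration; OpenAI has not published how its gate is implemented.

```python
from typing import Callable

# Hypothetical action names; OpenAI has not published its exact categories.
HIGH_IMPACT_ACTIONS = {"send_email", "make_purchase"}


def execute_action(
    action: str,
    payload: dict,
    confirm: Callable[[str, dict], bool],
    perform: Callable[[str, dict], str],
    narrate: Callable[[str], None] = print,
) -> str:
    """Run one agent action, pausing for explicit approval if it is high-impact."""
    narrate(f"About to {action}")  # live narration of the agent's next step
    if action in HIGH_IMPACT_ACTIONS and not confirm(action, payload):
        return "cancelled by user"
    return perform(action, payload)


# Usage with stand-in callbacks: the user declines every confirmation prompt.
def deny(action: str, payload: dict) -> bool:
    return False


def fake_perform(action: str, payload: dict) -> str:
    return f"done: {action}"


print(execute_action("scroll", {}, deny, fake_perform))
print(execute_action("send_email", {"to": "a@example.com"}, deny, fake_perform))
```

Keeping the gate outside the model means a declined confirmation blocks the action regardless of what the model was instructed to do.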
According to OpenAI's evaluations, the ChatGPT agent performed comparably to or better than the standalone o3 model on many tasks. Across internal evaluations spanning over 40 occupations including law, logistics, sales, and engineering, the ChatGPT agent matched or outperformed domain experts in roughly half the cases. The integrated approach, combining browsing, research, and code execution, often produced better results than any single tool used alone.
With the launch of ChatGPT agent on July 17, 2025, OpenAI announced that the standalone Operator experience at operator.chatgpt.com would be deprecated, with the site scheduled to shut down on August 31, 2025.
Users were warned to save any conversations or task results they wished to keep before the August 31 deadline, as no data recovery would be possible after the shutdown.
The deprecation reflected OpenAI's strategy of consolidating its agent capabilities into a single, unified interface rather than maintaining separate specialized products. By combining Operator's web interaction, Deep Research's analytical depth, and new code execution and connector capabilities into one system, OpenAI aimed to provide a more versatile and accessible tool. The integration also made agent capabilities available to a much broader user base, as the standalone Operator had been restricted to Pro subscribers at $200 per month.
On October 21, 2025, OpenAI announced ChatGPT Atlas, a dedicated web browser with ChatGPT built directly into the browsing experience. Atlas further extended the trajectory that began with Operator by embedding agent capabilities natively in a full Chromium-based browser. The product launched initially for macOS, with versions for Windows, iOS, and Android announced for later release.
Atlas integrates ChatGPT through an always-available sidebar that can answer questions about the current page, summarize content, compare products across sites, analyze data, and rewrite selected text. Paid subscribers can enable an agent mode inside Atlas that allows ChatGPT to take actions on websites, inheriting the Operator-derived capabilities and benefiting from the user's current browsing context. Atlas also introduced browser memories, which let ChatGPT retain facts and preferences across sites, subject to user privacy controls. Users can ask Atlas to recall things like "find all the job postings I was looking at last week and summarize the industry trends," and the browser maintains an archive of session context that the user can view, archive, or delete at any time. By default, OpenAI does not use browsed content to train its models, though users can opt in.
The browser uses a freemium model: a free tier provides the basic sidebar, while agent mode and enhanced memory features are restricted to Plus and Pro subscribers, with a separate beta for Business users. Like Operator and ChatGPT agent before it, Atlas drew prompt-injection demonstrations from independent researchers shortly after launch, underscoring that this class of attack on browser agents remained unsolved.
OpenAI described Atlas's internal architecture in a separate blog post titled "How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas." Rather than embedding Chromium inside the Atlas application process, OWL ("OpenAI's Web Layer") runs Chromium's browser process outside the main Atlas app and brokers communication between the model, the browser, and the user interface. This separation provides three properties that the operator.chatgpt.com architecture could not cleanly support:
The practical consequence is that Atlas's agent mode behaves like a tighter, more contextual version of Operator: the agent benefits from the page the user is already on, while OWL's sandbox enforces the restrictions Operator previously implemented inside the model and product wrapper.
Through late 2025 and into 2026, OpenAI publicly described the prompt-injection problem for Atlas and the descendants of Operator as a security problem that is "unlikely to ever be fully solved," comparing it to scams and social engineering that target humans. In a December 2025 blog post titled "Continuously hardening ChatGPT Atlas against prompt injection attacks," OpenAI detailed a defense pipeline that combined three layers:
Independent commentators including Simon Willison and security press such as TechCrunch, Fortune, and CyberScoop noted that the December 2025 statement was unusually candid for a frontier lab, effectively acknowledging that prompt injection is a permanent operating risk for browser agents rather than a bug awaiting a patch. The framing was widely viewed as inherited from lessons learned during the Operator era, where Rehberger's GitHub-issue exploit had already shown the basic shape of the attack. OpenAI also began advising users to give agents narrowly scoped instructions, since broad permissions like "take whatever action is needed in my inbox" gave injected content too much latitude.
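Why narrow instructions help can be illustrated with a per-task permission scope. The sketch assumes a hypothetical `TaskScope` that lists which actions and sites a task may touch; the check itself is identical for both grants, and only the breadth of what the user authorized differs. This does not stop injected text from reaching the model, it only limits what a successful injection can make the agent do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskScope:
    """Hypothetical per-task grant of actions and sites the agent may touch."""

    allowed_actions: frozenset[str]
    allowed_domains: frozenset[str]

    def permits(self, action: str, domain: str) -> bool:
        return action in self.allowed_actions and domain in self.allowed_domains


MAIL = frozenset({"mail.example.com"})  # placeholder domain

# Narrow instruction: "archive the newsletters in my inbox".
narrow = TaskScope(frozenset({"read_email", "archive_email"}), MAIL)
# Broad instruction: "take whatever action is needed in my inbox".
broad = TaskScope(
    frozenset({"read_email", "archive_email", "send_email", "forward_email"}), MAIL
)

# A web page or email carrying injected text asks the agent to forward mail elsewhere.
injected_request = ("forward_email", "mail.example.com")
print(narrow.permits(*injected_request))  # False: outside the narrow grant
print(broad.permits(*injected_request))   # True: the broad grant allows it
```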
Operator was part of a broader wave of AI browser agents released by major AI companies between late 2024 and 2025. Each company took a distinct approach to architecture, deployment, and safety, reflecting different philosophies about how AI agents should interact with the web. The following table compares the key offerings at the time of their respective launches:
| Feature | OpenAI Operator / ChatGPT Agent | Anthropic Computer Use | Google Project Mariner | Cognition Devin |
|---|---|---|---|---|
| Initial Announcement | January 23, 2025 | October 22, 2024 | December 11, 2024 | March 12, 2024 |
| Underlying Model | CUA (GPT-4o + RL); upgraded to o3 in May 2025 | Claude 3.5 Sonnet (upgraded), later Claude 4 family | Gemini 2.0 Flash | Proprietary; reportedly built on top of frontier LLMs |
| Interaction Method | Screenshot-based; clicks, types, scrolls in dedicated browser | Screenshot-based; cursor movement, clicking, typing on full desktop | Chrome extension; pixels and web elements in active tab | Browser, code editor, and shell access in a sandboxed dev environment |
| Scope | Web browser only (Operator); full virtual computer (ChatGPT agent) | Full desktop environment (not limited to browser) | Web browser only (Chrome active tab) | Software engineering: writing, running, and debugging code |
| Deployment | Dedicated site at operator.chatgpt.com (later integrated into ChatGPT) | Developer API; sandboxed environments (Docker containers, VMs) | Chrome extension; cloud-based VMs for processing | Cloud-hosted product on devin.ai with optional Slack and IDE integrations |
| WebVoyager Score | 87.0% | ~52% (Claude 3.5 Sonnet, Oct 2024) | 83.5% | Not directly comparable; software-engineering benchmark focus |
| OSWorld Score | 38.1% (CUA); higher with o3-based variant | 14.9% (screenshot-only, Claude 3.5 Sonnet, Oct 2024) | Not publicly reported | Not directly comparable |
| Headline Benchmark | Browser-task success | Computer-use success | Web-task completion | SWE-bench (software engineering tasks) |
| Safety Approach | Confirmation mode, watch mode, take-over mode, restricted tasks | Classifiers to detect misuse of computer use; isolated sandbox environments (VMs, containers) recommended | Cloud-based VMs; real-time user intervention; restricted actions requiring credit card or cookies | Sandboxed dev environment; user reviews diffs before merging |
| Initial Availability | ChatGPT Pro subscribers in the US | Developer API (public beta, all developers) | Trusted testers (limited invitation) | Waitlist; later $500/month team plan |
| Subsequent Expansion | Integrated into ChatGPT agent (July 2025) for Pro, Plus, Team users | Claude for Chrome extension (August 2025) for Max plan subscribers | Google AI Ultra subscribers in US (May 2025) | Lower-tier individual plan, IDE plugins, Cognition acquired Windsurf in 2025 |
| Pricing Model | Included with ChatGPT subscription; API: $3/$12 per million tokens | API token-based pricing | Google AI Ultra subscription | Subscription-based ($20/month individual; $500/month team historically) |
| Background Operation | No (requires active session) | Yes (runs in sandboxed VM) | No (requires active Chrome tab) | Yes (runs autonomously in cloud sandbox) |
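The API prices in the table translate into per-task costs through simple arithmetic. The sketch below uses the listed rates of $3 per million input tokens and $12 per million output tokens; the step count and per-step token figures are hypothetical placeholders, since actual screenshot token counts depend on image size and on how much earlier context each step resends.

```python
# Per-million-token API prices quoted in the comparison table above.
INPUT_USD_PER_MILLION = 3.00
OUTPUT_USD_PER_MILLION = 12.00


def task_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at the listed per-token rates."""
    return (
        input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


# Hypothetical task: 20 screenshot-and-action steps, each sending ~2,000 input
# tokens (screenshot plus context) and returning ~100 output tokens of actions.
steps = 20
cost = task_cost_usd(input_tokens=steps * 2_000, output_tokens=steps * 100)
print(f"${cost:.3f}")  # $0.144
```

Because screenshot-driven agents send far more input than they generate, input pricing tends to dominate; if a loop resends earlier screenshots at every step, input tokens grow with each step rather than staying flat.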
OpenAI took a consumer-first approach with Operator, providing a polished product accessible through a web interface before later integrating it into ChatGPT and offering an API. This prioritized ease of use for non-technical users but limited initial adoption to high-paying Pro subscribers.
Anthropic chose a developer-first strategy, releasing computer use as an API capability in public beta and encouraging developers to build their own computer-using agents in sandboxed environments. This approach prioritized safety (through isolation) and flexibility (through developer customization) but initially lacked a consumer-facing product. Anthropic later launched Claude for Chrome as a browser extension in August 2025 for Max plan subscribers, expanding to Pro, Team, and Enterprise plans by December 2025. The underlying models also kept improving: Claude Opus 4 and Claude Opus 4.5 delivered considerably stronger computer-use performance, and by Claude Opus 4.7 the gap with OpenAI's CUA descendants on OSWorld-Verified had narrowed to within a few percentage points at the top of the leaderboard.
Google positioned Project Mariner as a research prototype, initially limiting access to trusted testers and gradually expanding availability. By May 2025, it became accessible to Google AI Ultra subscribers in the US. Google's approach of running agents on cloud-based VMs (rather than the user's local machine) created physical separation between agent actions and user systems, providing a different security model.
Cognition's Devin sat in a different segment of the agent market. Rather than browsing the open web on behalf of consumers, Devin was an autonomous software engineering agent that wrote, ran, and debugged code in a sandboxed development environment, then handed back a pull request for human review. Operator and Devin shared the broader "AI agent" framing and the screenshot-and-action loop pattern, but their target users (consumers running errands versus engineering teams shipping code) and their evaluation regimes (browser benchmarks versus SWE-bench) were distinct.
Microsoft's Copilot Vision, rolled out across Microsoft Copilot products in 2025, pursued yet another path: instead of taking actions on a user's behalf, Copilot Vision focused on letting Copilot see what the user was already viewing on screen and answer questions about it, with action-taking handled through scoped automations rather than a generalized browser agent. This made it less ambitious than Operator but also less exposed to the prompt-injection class of failures that dogged true browser agents.
Viewed from 2026, the Operator-style approach (a hosted, screenshot-driven web agent running on OpenAI infrastructure) sits in the middle of a spectrum that runs from full-desktop computer use at one end to in-browser agents at the other. Anthropic's computer-use API spans the desktop end. Anthropic's Claude for Chrome extension, OpenAI's ChatGPT Atlas browser, and Perplexity Comet sit at the in-browser end. ChatGPT agent mode and the Responses API computer-use tool occupy the middle ground that Operator originally defined. Independent reviewers in 2026 generally treated this middle position as table stakes for any frontier AI product rather than a distinguishing feature, with practical differentiation coming from model quality, latency, sandbox guarantees, and integration with services the user already uses.
As of mid-2026, the standalone Operator product no longer exists. operator.chatgpt.com has been offline since August 31, 2025, and the underlying CUA technology now lives in three places: inside the ChatGPT agent mode, inside the agent mode of the Atlas browser, and through the computer use tool in the OpenAI Responses API. Models built on later GPT-5 and o-series reasoning families have largely replaced the original GPT-4o-based CUA in production usage, with GPT-5.4 and GPT-5.5 powering the descendants of CUA across Atlas and ChatGPT agent mode through 2026.
Despite its short standalone life, Operator left a clear mark on the AI agent ecosystem. The CUA model demonstrated that combining vision capabilities with reinforcement learning could produce an agent capable of interacting with arbitrary graphical interfaces without requiring API integrations or website cooperation, and the screenshot-based approach became the dominant paradigm across the industry. The 38.1% OSWorld score that headlined the January 2025 launch also became the de facto starting line against which subsequent progress was measured: by mid-2026, frontier scores on the successor benchmark OSWorld-Verified sat in the high 70s, above the original human baseline of 72.4%.
Operator's multi-layered safety framework, including confirmation mode, watch mode, take-over mode, and a restricted-task list, established patterns that carried forward into ChatGPT agent, Atlas, and competitors' products. The principle that browser agents should request human approval before irreversible actions became a widely adopted norm. The trade-offs around prompt injection that surfaced in the Operator era also remained largely unresolved at the model level, with later Anthropic and OpenAI agent products inheriting the same class of vulnerabilities and OpenAI itself stating publicly in December 2025 that prompt injection is unlikely to ever be fully solved.
The transition from Operator as a standalone product to an integrated feature within ChatGPT reflected a broader trend in AI product design: rather than building separate specialized tools, companies found more success embedding agent capabilities into existing platforms where users already spend time. This integration strategy, culminating in the Atlas browser, established a template that other companies have since followed.