OpenAI Operator was an AI agent developed by OpenAI that could autonomously perform tasks through web browser interactions. Launched on January 23, 2025, as a research preview for ChatGPT Pro subscribers in the United States, Operator was powered by a new model called the Computer-Using Agent (CUA), which combined GPT-4o's vision capabilities with reinforcement learning to interact with graphical user interfaces. Operator represented one of OpenAI's first consumer-facing AI agents, designed to browse the web and complete real-world tasks on behalf of users. The standalone product was deprecated on August 31, 2025, after its core functionality was integrated into ChatGPT as part of the new ChatGPT agent mode, announced on July 17, 2025.
The concept of AI agents that can browse the web and interact with software on behalf of users became a major focus in the AI industry during 2024 and 2025. Several major AI companies raced to develop browser-based agents that could go beyond generating text and actually take actions in digital environments.
Anthropic was among the first to demonstrate this capability, introducing computer use in public beta for the upgraded Claude 3.5 Sonnet model on October 22, 2024. This feature allowed Claude to interact with a computer's desktop environment by moving the cursor, clicking buttons, and typing text, accessible to developers through the Anthropic API. Shortly after, on December 11, 2024, Google DeepMind unveiled Project Mariner, a research prototype built on Gemini 2.0 Flash that could autonomously navigate websites and complete tasks through a Chrome extension.
OpenAI entered this competitive space with Operator in January 2025, positioning it as one of the company's first true AI agents. OpenAI CEO Sam Altman had previously signaled the company's interest in agents, and Operator was the most visible realization of that vision. OpenAI described agents as AIs capable of doing work independently: a user provides a task, and the agent executes it. Unlike traditional chatbots that generate text responses, Operator could take direct actions on the web, including clicking buttons, filling out forms, scrolling through pages, and typing text into input fields.
The name "Operator" reflected the product's role as a digital assistant that operates web interfaces on behalf of the user, functioning as an intermediary between the user's intent expressed in natural language and the complex series of browser interactions needed to accomplish a given task.
At the core of Operator was the Computer-Using Agent (CUA), a model that OpenAI developed specifically for GUI-based interaction. CUA built on years of foundational research at the intersection of multimodal understanding and reasoning, combining GPT-4o's vision capabilities with advanced reasoning trained through reinforcement learning. The model was designed to interact with graphical user interfaces the same way humans do, by interpreting buttons, menus, text fields, dropdown lists, and other visual elements on a screen.
CUA operated through an iterative perception-reasoning-action loop that cycled continuously until a task was completed:
Perception: Screenshots from the browser were captured and added to the model's context, providing a visual snapshot of the current state of the web page. Rather than relying on structured HTML, DOM elements, or accessibility trees, CUA analyzed raw pixel data from these screenshots to understand what was displayed on screen. This approach meant the model could work with any website regardless of its underlying technical implementation.
Reasoning: CUA processed the visual input using chain-of-thought reasoning, maintaining an internal monologue that helped it evaluate observations, track intermediate steps, and adapt dynamically to changing conditions. The model could break complex tasks into multi-step plans, deciding which element to click, what text to type, or where to scroll. This inner reasoning process took into consideration both the current screenshot and the history of past screenshots and actions within the session, allowing the model to maintain context about what it had already done and what remained.
Action: The model generated a specific action to execute, such as clicking a button at particular coordinates, entering text into a form field, scrolling up or down, or pressing keyboard shortcuts. The browser then executed this action, the page updated, and the loop repeated with a fresh screenshot.
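As a rough illustration (not OpenAI's implementation), the three-phase loop can be sketched in a few lines of Python, with `browser` and `model` as hypothetical stand-ins for the real components:

```python
# Hypothetical sketch of a perception-reasoning-action loop.
# `browser` and `model` are illustrative stand-ins, not real OpenAI interfaces.

def run_task(task, browser, model, max_steps=50):
    history = []  # past screenshots and actions stay in the model's context
    for _ in range(max_steps):
        screenshot = browser.screenshot()                      # Perception
        action = model.next_action(task, screenshot, history)  # Reasoning
        if action["type"] == "done":
            return history
        browser.execute(action)                                # Action
        history.append((screenshot, action))
    raise RuntimeError("step budget exhausted before the task finished")
```

The `max_steps` budget reflects that such loops need an explicit stopping point; as noted below, more steps generally meant better task success.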
This screenshot-based approach was significant because it allowed Operator to work with any website without requiring custom API integrations, browser plugins, or site-specific configurations. The model interacted with web pages purely through their visual presentation, in the same way a human user would. This made it inherently generalizable, as there was no dependency on website cooperation or structured data formats.
The reasoning component of CUA was particularly important for handling multi-step tasks. When a user requested a complex operation like booking a restaurant reservation for a specific date and party size, the model needed to plan a sequence of interactions: navigating to the restaurant booking site, entering the date, selecting the party size, choosing a time slot, and confirming the reservation. CUA's chain-of-thought process allowed it to anticipate upcoming steps, handle branching paths (such as when a preferred time slot was unavailable), and maintain progress toward the overall goal even when individual steps did not go as expected.
One of CUA's notable capabilities was adaptive self-correction. When the model encountered unexpected results or errors during task execution, it could recognize the problem and adjust its approach. If a button did not respond as expected, a page loaded differently than anticipated, a popup appeared unexpectedly, or a form validation error occurred, CUA would reassess the situation and try alternative strategies to complete the task. This resilience was essential for navigating the unpredictable nature of real-world websites, where interfaces can behave inconsistently, load slowly, or present unexpected states.
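A toy version of this recover-and-retry behavior might look like the following; the function names and the crude "did anything change?" progress check are assumptions for illustration only:

```python
# Illustrative fallback strategy: if an action produces no visible change,
# reassess and try an alternative instead of blindly repeating it.

def act_with_recovery(browser, planned_action, alternatives):
    for action in [planned_action, *alternatives]:
        before = browser.screenshot()
        browser.execute(action)
        if browser.screenshot() != before:  # crude progress check
            return action  # this action changed the page state
    raise RuntimeError("no action made progress; hand control to the user")
```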
OpenAI described CUA as combining GPT-4o's vision capabilities with advanced reasoning through reinforcement learning, though the company did not publish extensive details about the specific training methodology. The reinforcement learning component was critical for teaching the model how to navigate graphical interfaces effectively, as it allowed the system to learn from trial and error which sequences of actions led to successful task completion.
CUA set new state-of-the-art results on several established benchmarks for computer and web interaction at the time of its January 2025 release:
| Benchmark | CUA Score | Previous SOTA | Human Performance | Description |
|---|---|---|---|---|
| OSWorld | 38.1% | 22.0% | 72.4% | Full computer use tasks across operating systems |
| WebArena | 58.1% | N/A | ~78% | Web-based interaction tasks on realistic websites |
| WebVoyager | 87.0% | N/A | N/A | End-to-end real-world web navigation tasks across 15 popular websites |
OSWorld is a benchmark that measures an agent's ability to perform general computer-use tasks across operating systems, including interacting with desktop applications, file systems, and web browsers. CUA achieved a 38.1% success rate, far exceeding the previous state-of-the-art of 22.0%. This represented a roughly 73% relative improvement over prior approaches. However, human performance on the same benchmark stood at 72.4%, illustrating the substantial gap that still existed between AI agents and human capability at the time.
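The relative-improvement figure follows directly from the two reported scores:

```python
# Relative improvement of CUA over the prior state of the art on OSWorld.
previous_sota, cua = 22.0, 38.1
relative_improvement = (cua - previous_sota) / previous_sota
print(f"{relative_improvement:.0%}")  # prints "73%"
```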
WebArena evaluates agents on web-based interaction tasks using realistic, self-hosted website replicas. The benchmark tests whether agents can navigate complex web applications and complete multi-step tasks. CUA achieved a 58.1% success rate on this benchmark, compared to human performance of approximately 78%.
WebVoyager tests agent performance on end-to-end real-world web tasks across 15 popular websites, with each site containing approximately 40 tasks for a total of 643 evaluation tasks. CUA achieved an 87.0% success rate on WebVoyager, representing strong performance on practical web navigation scenarios.
CUA demonstrated test-time scaling properties, meaning its performance improved when allowed more computational steps to complete a task. When granted additional interaction steps and more time to reason and act, the model could solve problems that it would otherwise fail at with fewer attempts. This property suggested that future improvements in compute availability and efficiency could further enhance the model's capabilities without changes to the underlying architecture.
Users accessed Operator through a dedicated website at operator.chatgpt.com. The interface presented a ChatGPT-style chat window on one side and a miniaturized browser window on the other side that the agent controlled. The homepage displayed suggested prompts for common tasks, such as "Find tickets for the next concert at the Sphere" or "Find a restaurant with a great happy hour for next Wednesday for 6 people," helping users understand the types of tasks Operator could handle.
To use Operator, a user would type a natural language request describing the task they wanted completed. Operator would then open its browser window and begin working through the request using the perception-reasoning-action loop described above.
Users could watch Operator work in real time, observing each click, scroll, and keystroke as it happened. The transparency of this process allowed users to verify the agent's actions and intervene if necessary. At any point, users could click a "take over" button to assume direct control of the browser window.
Operator allowed users to personalize their workflows by adding custom instructions, either globally across all websites or for specific sites. For example, a user could set airline preferences for booking sites or specify dietary restrictions for grocery ordering. Users could also save prompts for quick access on the homepage, which was particularly useful for repeated tasks like weekly grocery restocking on Instacart.
Operator was designed for a wide variety of repetitive browser-based tasks. OpenAI collaborated with companies including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to test and refine Operator's performance on their platforms. Common use cases included:
| Task Category | Example | Platforms |
|---|---|---|
| Grocery ordering | Finding ingredients for a recipe and adding them to a cart | Instacart, DoorDash |
| Restaurant reservations | Booking a table for a specified date, time, and party size | OpenTable |
| Event tickets | Searching for and purchasing tickets for concerts or sporting events | StubHub |
| Travel booking | Comparing flights, hotels, and vacation packages | Priceline, Booking.com |
| Form filling | Completing online forms, applications, and registration pages | Various |
| Shopping | Browsing products, comparing prices, and adding items to carts | Various e-commerce sites |
| Appointment scheduling | Booking appointments with service providers | Thumbtack |
| Ride hailing | Requesting rides and managing transportation | Uber |
Operator could also handle parallel tasks within a single session. For instance, a user could ask Operator to order groceries on Instacart while simultaneously making a hotel booking on Booking.com.
OpenAI implemented a multi-layered safety framework for Operator, recognizing the unique risks inherent in an AI agent that can take actions on the web with real-world consequences. Unlike a standard chatbot where the worst outcome of an error is an incorrect response, a browser agent that makes a mistake could place an unintended order, send an email to the wrong person, or modify account settings incorrectly. The safety framework included four distinct levels of protection.
The CUA model was trained to request user confirmation before finalizing tasks with external side effects. Before submitting an order, sending an email, deleting a calendar event, making a purchase, or completing any action that could not be easily undone, Operator would pause and display a summary of the pending action, asking the user to review and approve. This allowed users to double-check the agent's work before the action became permanent. According to OpenAI's system card, Operator o3 (a later version) achieved a 92.1% critical confirmation recall rate for financial transactions, with 100% accuracy for editing permissions and completing financial transactions, and 99.9% accuracy for sending high-stakes communications.
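The confirmation behavior amounts to gating irreversible actions behind an explicit user check. A minimal sketch, where the action categories and function names are hypothetical:

```python
# Hypothetical confirmation gate for actions with external side effects.
IRREVERSIBLE = {"submit_order", "send_email", "delete_event", "make_payment"}

def execute_action(action, perform, ask_user):
    """Run `perform` only after user approval for irreversible actions."""
    if action["type"] in IRREVERSIBLE:
        summary = action.get("summary", action["type"])
        if not ask_user(f"About to {summary}. Approve?"):
            return "cancelled"
    return perform(action)
```

Low-stakes actions (scrolling, clicking through pages) pass straight through, while anything in the irreversible set pauses for review.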
When Operator encountered steps requiring sensitive information, such as login credentials, passwords, payment details, or CAPTCHAs, it would pause and prompt the user to "take over" the browser directly. During take-over mode, Operator stopped capturing screenshots, so what the user typed and viewed was not sent to the model.
This design ensured that sensitive credentials, payment card numbers, and authentication tokens were never seen or processed by the AI model. Once the user completed the sensitive step (such as logging in or entering payment information), they could hand control back to Operator to continue the task.
On particularly sensitive websites, such as email services, banking platforms, and financial services, Operator required close user supervision through watch mode: the agent would proceed only while the user was actively observing its actions, and would alert the user and pause if they navigated away from the tab.
Watch mode was designed to ensure that the user was actively monitoring the agent's behavior on high-risk sites where mistakes could have serious financial, legal, or personal consequences.
For certain categories of actions where the risk was deemed too significant, OpenAI fully restricted the model from assisting. These hard restrictions could not be overridden by user instructions, and Operator would decline such tasks outright.
Despite its capabilities, Operator had several notable limitations during its research preview period that reflected the early state of AI browser agent technology.
Accuracy and reliability: Operator did not achieve human-level accuracy. Complex tasks with many steps had a higher likelihood of errors or unexpected outcomes. The model could misinterpret visual elements, misidentify clickable targets, or make incorrect assumptions about interface behavior. On the OSWorld benchmark, which approximates real computer use, the model's 38.1% success rate meant it failed on the majority of tasks.
Complex and non-standard interfaces: Websites with heavy JavaScript rendering, drag-and-drop functionality, intricate calendar or date-picker widgets, interactive maps, or non-standard UI components could confuse the agent. Highly customized or unconventional web interfaces were particularly challenging because the model's visual understanding could not always correctly interpret novel design patterns.
Looping and crashes: Operator could sometimes get stuck in infinite loops on a task step, repeatedly attempting the same action without making progress. It could also freeze entirely, requiring the user to manually intervene or restart the session. These stalls limited the possibility of fully autonomous operation and meant users needed to remain engaged during task execution.
No scheduling or background operation: Operator lacked built-in scheduling or continuous operation capabilities. Users could not schedule tasks for later execution, set up recurring tasks (such as weekly grocery orders), or have Operator run as a background service. Each task was initiated interactively and ran only within its active session. Closing the Operator browser tab or navigating away would stop the task.
Speed: Because Operator relied on capturing screenshots, processing them through the model, generating an action, executing it, and then capturing a new screenshot, it was generally slower than a human performing the same tasks. Each step in the perception-reasoning-action loop added latency, and even simple tasks could take significantly longer than manual completion. Early reviews noted that using Operator was "significantly slower" than performing the same tasks directly.
CAPTCHA handling: While Operator was designed to pause and ask users to take over when encountering CAPTCHAs, reports indicated that the model could sometimes solve simple CAPTCHAs on its own. This raised questions about the boundaries of what the agent should and should not handle autonomously.
Geographic and language limitations: At launch, Operator worked best with English-language websites and services popular in the United States. Performance on websites in other languages or designed for users in other regions was less reliable.
Operator's availability expanded in phases during its roughly seven-month lifespan:
| Date | Availability Change |
|---|---|
| January 23, 2025 | Launched as research preview for ChatGPT Pro subscribers ($200/month) in the US |
| February 21, 2025 | Expanded to Pro subscribers in Australia, Brazil, Canada, India, Japan, Singapore, South Korea, UK, and other countries |
| Later in 2025 | Expanded to most regions where ChatGPT was available, including Europe |
| July 17, 2025 | Functionality integrated into ChatGPT agent; Operator deprecation announced |
| August 31, 2025 | Operator shut down; all chat history permanently deleted |
Throughout its existence, the standalone Operator product remained exclusive to ChatGPT Pro subscribers at $200 per month. OpenAI had announced plans to expand access to Plus ($20/month), Team, and Enterprise subscribers, but this expansion never materialized for the standalone product. Instead, OpenAI chose to integrate the capability directly into ChatGPT through the agent mode, which was made available to a broader set of subscription tiers.
On March 11, 2025, OpenAI released the CUA model to developers through its new Responses API framework as part of a broader set of agent-building tools. The computer use tool was initially offered as a research preview for select developers in usage tiers 3 through 5.
The API integration allowed developers to build their own computer-using agents, with the CUA model generating mouse and keyboard actions that could be applied to various computer environments. Unlike the consumer-facing Operator product, which only controlled a web browser, the API version of CUA could potentially be applied to any graphical environment, including desktop applications and mobile interfaces. Developers could automate tasks like data entry, application workflows, testing, and cross-application processes using the same underlying model that powered Operator.
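A request to the computer use tool through the Responses API looked roughly like the following; treat the exact model name, tool type, and field names as illustrative of the shape OpenAI documented at the time, and verify against current documentation before use:

```python
# Illustrative request payload for the computer use tool in the Responses
# API (field names per OpenAI's documentation at the time; not guaranteed
# to match the current interface).

def build_computer_use_request(task, width=1024, height=768):
    return {
        "model": "computer-use-preview",
        "tools": [{
            "type": "computer_use_preview",
            "display_width": width,
            "display_height": height,
            "environment": "browser",
        }],
        "input": [{"role": "user", "content": task}],
        "truncation": "auto",
    }
```

The response would contain `computer_call` items (clicks, typing, scrolls) that the developer executes in their own environment, sending back a fresh screenshot on each turn, the same perception-reasoning-action cycle that powered the consumer product, but with the developer supplying the environment.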
OpenAI also released a sample application on GitHub (openai/openai-cua-sample-app) to help developers learn how to use CUA via the API on multiple computer environments.
The CUA model in the API was priced at $3 per million input tokens and $12 per million output tokens, following OpenAI's standard token-based billing model. Enterprises could optionally run the CUA model locally on their own systems for additional security and control over data handling.
On July 17, 2025, OpenAI announced the ChatGPT agent, a significant product update that consolidated several previous capabilities into a unified system within ChatGPT. The ChatGPT agent merged Operator's web browsing and task execution abilities with Deep Research's multi-step research and reporting capabilities, along with code execution, file manipulation, and third-party service integrations. OpenAI described it as an agent that "bridges research and action," capable of not just finding information but acting on it.
The ChatGPT agent operates on its own virtual computer accessible from within the ChatGPT.com interface. Rather than being a single model, the agent is an orchestrated system that coordinates multiple tools within a continuous agent loop. This virtual environment includes a visual browser for interacting with graphical web pages, a text-based browser for simpler reading-oriented queries, a terminal for running code, and connectors to third-party services such as Gmail and GitHub.
The agent can coordinate across all of these tools within a single task. For example, it can search the web for information using the visual browser, download a dataset, analyze it using Python in the terminal, generate a report, and then email the results through Gmail, all within a single conversation.
The ChatGPT agent represented a substantial expansion beyond what Operator could do. Key differences included:
| Capability | Operator | ChatGPT Agent |
|---|---|---|
| Web browsing | Yes (visual browser only) | Yes (visual and text browsers) |
| Code execution | No | Yes (terminal with Python, shell) |
| File creation and manipulation | No | Yes (spreadsheets, presentations, reports) |
| Third-party service connectors | No | Yes (Gmail, Drive, GitHub, Slack, etc.) |
| Deep Research integration | No | Yes (multi-step research and reporting) |
| Image generation | No | Yes |
| Access within ChatGPT interface | Separate site (operator.chatgpt.com) | Integrated in ChatGPT.com via "agent mode" dropdown |
To access the agent, users select "agent mode" from the dropdown menu in the ChatGPT composer and enter their query directly. The previous Deep Research mode remains available as a separate option for users who specifically want exhaustive research reports without task execution.
ChatGPT agent began rolling out on July 18, 2025. Pro subscribers received immediate access on launch day. Plus and Team users gained access over the following days. Enterprise and Education tiers were scheduled for availability in the subsequent weeks.
Usage was structured with monthly message quotas:
| Subscription Tier | Agent Messages per Month | Monthly Cost |
|---|---|---|
| Pro | 400 | $200 |
| Plus | 40 | $20 |
| Team | 40 | $25/user |
Users who needed additional capacity beyond their monthly quota could purchase extra credits. Free-tier users and European Economic Area residents initially did not have access to the agent mode, with OpenAI citing ongoing regulatory work for the EEA exclusion.
The ChatGPT agent retained and expanded the safety mechanisms that Operator had pioneered. The system provides live narration of its actions, explaining to the user what it is doing and why at each step, making the agent's behavior transparent and auditable. It requests explicit permission before executing high-impact actions such as sending emails, making purchases, or modifying files. Watch mode continues to require user approval for sensitive operations on high-risk websites. Users can wipe all browsing data generated during a session with a single click. Secure take-over mode keeps password inputs and sensitive credentials private from the model, just as it did in Operator.
According to OpenAI's evaluations, the ChatGPT agent performed comparably to or better than the standalone o3 model on many tasks. In internal evaluations spanning over 40 occupations including law, logistics, sales, and engineering, the ChatGPT agent was comparable to or outperformed domain experts in roughly half the cases. The agent's integrated approach, combining browsing, research, and code execution, often produced better results than any single tool used alone.
With the launch of ChatGPT agent, OpenAI announced that the standalone Operator experience at operator.chatgpt.com would be deprecated. The deprecation followed a clear timeline: the shutdown was announced alongside ChatGPT agent on July 17, 2025, and the standalone product was retired on August 31, 2025, at which point all Operator chat history was permanently deleted.
Users were warned to save any conversations or task results they wished to keep before the August 31 deadline, as no data recovery would be possible after the shutdown.
The deprecation reflected OpenAI's strategy of consolidating its agent capabilities into a single, unified interface rather than maintaining separate specialized products. By combining Operator's web interaction, Deep Research's analytical depth, and new code execution and connector capabilities into one system, OpenAI aimed to provide a more versatile and accessible tool. The integration also made agent capabilities available to a much broader user base, as the standalone Operator had been restricted to Pro subscribers at $200 per month.
On October 21, 2025, OpenAI announced ChatGPT Atlas, a dedicated web browser with ChatGPT built directly into the browsing experience. Atlas further extended the trajectory that began with Operator by embedding agent capabilities natively within a full web browser, bringing AI assistance to the point where users naturally spend their time online.
Atlas is built on the Chromium open-source project, the same rendering engine used by Google Chrome, Microsoft Edge, and other major browsers. It launched initially for macOS, with versions for Windows, iOS, and Android announced for future release.
Atlas integrates ChatGPT through a sidebar assistant that is always available while browsing. The sidebar can summarize the page the user is viewing, answer questions about its content, and help with tasks in context, without requiring the user to copy material into a separate chat window.
Paid subscribers can enable an agent mode within Atlas that allows ChatGPT to interact with websites and complete tasks on the user's behalf, inheriting the Operator-derived capabilities. The agent mode in Atlas benefits from the user's browsing context, meaning it can reference what the user is currently viewing, their browsing history (with permission), and accumulated knowledge from previous sessions.
Atlas also introduced browser memories, a feature that allows ChatGPT to remember facts, preferences, and insights from previously visited sites to provide ongoing contextual assistance. This feature is subject to user privacy controls and can be disabled or selectively managed.
Atlas operates on a freemium model, providing a free version with basic ChatGPT sidebar functionality alongside paid subscriptions. Advanced features, including the agent mode and enhanced memory features, are available to Plus and Pro subscribers. The browser is available globally for Free, Plus, Pro, and Go users, with a beta version available for Business users.
Operator was part of a broader wave of AI browser agents released by major AI companies between late 2024 and 2025. Each company took a distinct approach to architecture, deployment, and safety, reflecting different philosophies about how AI agents should interact with the web. The following table compares the key offerings at the time of their respective launches:
| Feature | OpenAI Operator / ChatGPT Agent | Anthropic Computer Use | Google Project Mariner |
|---|---|---|---|
| Initial Announcement | January 23, 2025 | October 22, 2024 | December 11, 2024 |
| Underlying Model | CUA (GPT-4o + RL) | Claude 3.5 Sonnet (upgraded) | Gemini 2.0 Flash |
| Interaction Method | Screenshot-based; clicks, types, scrolls in dedicated browser | Screenshot-based; cursor movement, clicking, typing on full desktop | Chrome extension; interacts with pixels and web elements in active tab |
| Scope | Web browser only (Operator); full virtual computer (ChatGPT agent) | Full desktop environment (not limited to browser) | Web browser only (Chrome active tab) |
| Deployment | Dedicated site at operator.chatgpt.com (later integrated into ChatGPT) | Developer API; runs in sandboxed environments (Docker containers, VMs) | Chrome extension; cloud-based VMs for processing |
| WebVoyager Score | 87.0% | ~52% (Claude 3.5 Sonnet, Oct 2024) | 83.5% |
| OSWorld Score | 38.1% | 14.9% (screenshot-only, Claude 3.5 Sonnet, Oct 2024) | Not publicly reported |
| Safety Approach | Confirmation mode, watch mode, take-over mode, restricted tasks | ASL-3 safety classifiers; isolated sandbox environments | Cloud-based VMs; real-time user intervention; restricted actions requiring credit card or cookies |
| Initial Availability | ChatGPT Pro subscribers in the US | Developer API (public beta, all developers) | Trusted testers (limited invitation) |
| Subsequent Expansion | Integrated into ChatGPT agent (July 2025) for Pro, Plus, Team users | Claude for Chrome extension (August 2025) for Max plan subscribers | Google AI Ultra subscribers in US (May 2025) |
| Pricing Model | Included with ChatGPT subscription; API: $3/$12 per million tokens | API token-based pricing | Google AI Ultra subscription |
| Background Operation | No (requires active session) | Yes (runs in sandboxed VM) | No (requires active Chrome tab) |
OpenAI took a consumer-first approach with Operator, providing a polished product accessible through a web interface before later integrating it into ChatGPT and offering an API. This prioritized ease of use for non-technical users but limited initial adoption to high-paying Pro subscribers.
Anthropic chose a developer-first strategy, releasing computer use as an API capability in public beta and encouraging developers to build their own computer-using agents in sandboxed environments. This approach prioritized safety (through isolation) and flexibility (through developer customization) but initially lacked a consumer-facing product. Anthropic later launched Claude for Chrome as a browser extension in August 2025 for Max plan subscribers, expanding to Pro, Team, and Enterprise plans by December 2025.
Google positioned Project Mariner as a research prototype, initially limiting access to trusted testers and gradually expanding availability. By May 2025, it became accessible to Google AI Ultra subscribers in the US. Google's approach of running agents on cloud-based VMs (rather than the user's local machine) created physical separation between agent actions and user systems, providing a different security model.
Operator represented an important step in OpenAI's evolution from a provider of conversational AI to a builder of autonomous agents. While the standalone product existed for only about seven months (January to August 2025), its underlying technology and design patterns continue to influence the broader AI agent ecosystem.
The CUA model demonstrated that combining vision capabilities with reinforcement learning could produce an agent capable of interacting with arbitrary graphical interfaces without requiring API integrations or website cooperation. This screenshot-based approach became the dominant paradigm for AI browser agents across the industry, with Anthropic, Google, and smaller companies adopting similar visual interaction strategies.
The multi-layered safety framework developed for Operator, including confirmation mode, watch mode, take-over mode, and restricted task categories, established patterns that carried forward into ChatGPT agent and set industry expectations for how AI browser agents should handle sensitive operations. The principle that AI agents should request human approval before irreversible actions became a widely adopted standard.
Operator also validated the market demand for AI browser agents, demonstrating that users were willing to pay premium prices for the convenience of automated web task execution, even in its imperfect research preview state. This market validation contributed to increased investment across the industry in agent technology during 2025.
The transition from Operator as a standalone product to an integrated feature within ChatGPT reflected a broader trend in AI product design: rather than building separate specialized tools, companies found more success embedding agent capabilities into existing platforms where users already spend time. This integration strategy, culminating in the ChatGPT Atlas browser, established a model that other companies have since followed.