# OpenAI Operator

> Source: https://aiwiki.ai/wiki/openai_operator
> Updated: 2026-06-21
> Categories: AI Agents, Artificial Intelligence, OpenAI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**OpenAI Operator** was an [AI agent](/wiki/ai_agents) developed by [OpenAI](/wiki/openai) that could autonomously perform tasks through web browser interactions. Launched on January 23, 2025, as a research preview for [ChatGPT](/wiki/chatgpt) Pro subscribers in the United States, Operator was powered by a new model called the **Computer-Using [Agent](/wiki/agent) (CUA)**, which combined [GPT-4o](/wiki/gpt_4o)'s vision capabilities with [reinforcement learning](/wiki/reinforcement_learning) to interact with graphical user interfaces.[1][2] At launch, CUA set a new state of the art on agent benchmarks, scoring 38.1% on OSWorld, 58.1% on WebArena, and 87.0% on WebVoyager.[2] Operator represented one of OpenAI's first consumer-facing AI agents, designed to browse the web and complete real-world tasks on behalf of users.[1] The standalone product was deprecated on August 31, 2025, after its core functionality was integrated into ChatGPT as part of the new **ChatGPT agent** mode, announced on July 17, 2025.[4]

## What was OpenAI Operator?

AI agents that browse the web and interact with software on behalf of users became a major focus in the AI industry during 2024 and 2025. Several major AI companies raced to ship browser-based agents that could go beyond generating text and actually take actions in digital environments.

[Anthropic](/wiki/anthropic) was first to demonstrate the capability publicly, introducing [computer use](/wiki/computer_use) in public beta for the upgraded [Claude](/wiki/claude) 3.5 Sonnet model on October 22, 2024.[13] The feature let Claude interact with a computer's desktop by moving the cursor, clicking, and typing, accessible to developers through the [Anthropic API](/wiki/anthropic_api).[13] Shortly after, on December 11, 2024, [Google DeepMind](/wiki/google_deepmind) unveiled **Project Mariner**, a research prototype built on [Gemini](/wiki/gemini) 2.0 Flash that navigated websites through a Chrome extension.[14][20] Cognition's [Devin](/wiki/devin), released in March 2024, had already established a related but narrower template: an autonomous agent that wrote and ran code in a sandboxed dev environment.

OpenAI entered this competitive space with Operator in January 2025, positioning it as one of its first consumer-facing AI agents.[1][16] CEO [Sam Altman](/wiki/sam_altman) had been signaling the company's interest in agents for months, and Operator was the most visible realization of that vision. Unlike traditional chatbots that only generate text, Operator could click buttons, fill out forms, scroll, and type into input fields, taking direct actions on the web on a user's behalf.[1] The name reflected the product's role as a digital assistant that operates web interfaces on behalf of the user, mediating between natural-language intent and the long sequence of browser actions needed to satisfy that intent.

## How did the Computer-Using Agent (CUA) model work?

At the core of Operator was the **Computer-Using Agent (CUA)**, a model that OpenAI developed specifically for GUI-based interaction.[2] CUA built on years of foundational research at the intersection of [multimodal](/wiki/multimodal_ai) understanding and reasoning, combining GPT-4o's vision capabilities with advanced reasoning trained through reinforcement learning.[2] The model was designed to interact with graphical user interfaces the same way humans do, by interpreting buttons, menus, text fields, dropdown lists, and other visual elements on a screen.[2] OpenAI described the result as a single general-purpose model: "CUA is trained to interact with graphical user interfaces (GUIs), the buttons, menus, and text fields people see on a screen, just as humans do."[2]

### Technical architecture

CUA operated through an iterative **perception-reasoning-action loop** that cycled continuously until a task was completed:[2]

1. **Perception:** Screenshots from the browser were captured and added to the model's context, providing a visual snapshot of the current state of the web page. Rather than relying on structured HTML, DOM elements, or accessibility trees, CUA analyzed raw pixel data from these screenshots to understand what was displayed on screen.[2] This approach meant the model could work with any website regardless of its underlying technical implementation.

2. **[Reasoning](/wiki/reasoning):** CUA processed the visual input using chain-of-thought reasoning, maintaining an internal monologue that helped it evaluate observations, track intermediate steps, and adapt dynamically to changing conditions.[2] The model could break complex tasks into multi-step plans, deciding which element to click, what text to type, or where to scroll. This inner reasoning process took into consideration both the current screenshot and the history of past screenshots and actions within the session, allowing the model to maintain context about what it had already done and what remained.

3. **Action:** The model generated a specific action to execute, such as clicking a button at particular coordinates, entering text into a form field, scrolling up or down, or pressing keyboard shortcuts.[2] The browser then executed this action, the page updated, and the loop repeated with a fresh screenshot.

This screenshot-based approach was significant because it allowed Operator to work with any website without requiring custom API integrations, browser plugins, or site-specific configurations.[2] The model interacted with web pages purely through their visual presentation, in the same way a human user would. This made it inherently generalizable, as there was no dependency on website cooperation or structured data formats.

### Chain-of-thought reasoning and self-correction

The reasoning component of CUA was particularly important for handling multi-step tasks like booking a restaurant reservation for a specific date and party size, where the model had to navigate the booking site, enter the date, select the party size, choose a time slot, and confirm. CUA's chain-of-thought process allowed it to anticipate upcoming steps, handle branching paths such as a preferred time slot being unavailable, and maintain progress when individual steps did not go as expected.[2]

Adaptive self-correction was the second piece. When the model encountered unexpected results or errors during execution, it could recognize the problem and adjust.[2] If a button did not respond, a page loaded differently than anticipated, a popup appeared, or a form validation error occurred, CUA would reassess and try alternative strategies. This resilience was essential for the unpredictable nature of real-world websites.

### Training methodology

OpenAI described CUA as combining GPT-4o's vision capabilities with reasoning trained through reinforcement learning, though the company did not publish detailed training methodology.[2] The reinforcement learning component was critical for teaching the model how to navigate graphical interfaces effectively, allowing the system to learn from trial and error which sequences of actions led to successful task completion.[2] CUA built on years of internal OpenAI research at the intersection of multimodal understanding and reasoning, and inherited the visual perception stack from GPT-4o.

## How did Operator perform on benchmarks?

CUA set new state-of-the-art results on several established benchmarks for computer and web interaction at the time of its January 2025 release:[2]

| Benchmark | CUA Score | Previous SOTA | Human Performance | Description |
|---|---|---|---|---|
| OSWorld | 38.1% | 22.0% | 72.4% | Full computer use tasks across operating systems |
| WebArena | 58.1% | N/A | ~78% | Web-based interaction tasks on realistic websites |
| WebVoyager | 87.0% | N/A | N/A | End-to-end real-world web navigation tasks across 15 popular websites |

### OSWorld

OSWorld is a benchmark that measures an agent's ability to perform general computer-use tasks across operating systems, including interacting with desktop applications, file systems, and web browsers. CUA achieved a 38.1% success rate, far exceeding the previous state-of-the-art of 22.0%.[2] This represented a roughly 73% relative improvement over prior approaches. However, human performance on the same benchmark stood at 72.4%, illustrating the substantial gap that still existed between AI agents and human capability at the time.[2]

### WebArena

WebArena evaluates agents on web-based interaction tasks using realistic, self-hosted website replicas. The benchmark tests whether agents can navigate complex web applications and complete multi-step tasks. CUA achieved a 58.1% success rate on this benchmark, compared to human performance of approximately 78%.[2]

### WebVoyager

WebVoyager tests agent performance on end-to-end real-world web tasks across 15 popular websites, with each site containing approximately 40 tasks for a total of 643 evaluation tasks. CUA achieved an 87.0% success rate on WebVoyager, representing strong performance on practical web navigation scenarios.[2]

### Test-time scaling

CUA demonstrated **test-time scaling** properties, meaning its performance improved when allowed more computational steps to complete a task.[2] When granted additional interaction steps and more time to reason and act, the model could solve problems that it would otherwise fail at with fewer attempts. This property suggested that future improvements in compute availability and efficiency could further enhance the model's capabilities without changes to the underlying architecture.

### Benchmark progress after Operator

In the year following Operator's launch, the broader industry treated CUA's 38.1% OSWorld score as the floor for serious agent work, and frontier models from multiple labs raced past it. In July 2025, the XLANG Lab that maintains [OSWorld](/wiki/osworld) released **OSWorld-Verified**, a tightened version of the benchmark that added community-driven task fixes, AWS-based parallel evaluation that reduced run time to roughly one hour, and stricter task signals intended to make scores harder to game.[29] By early 2026, OSWorld-Verified had largely displaced the original benchmark in published comparisons.

Progress on OSWorld-Verified was rapid through late 2025 and into 2026. [Claude](/wiki/claude) Sonnet 4.5 reached 61.4% in September 2025, the first frontier model to break past the 60% mark on the new variant. Through the first half of 2026, models from OpenAI and [Anthropic](/wiki/anthropic) traded the top spot:

| Model | OSWorld-Verified score | Notes |
|---|---|---|
| OpenAI CUA (Jan 2025 baseline) | 38.1% (original OSWorld) | The initial Operator launch number |
| Claude Sonnet 4.5 | 61.4% | September 2025; first model past 60% |
| OpenCUA-72B | 45.0% | Open-source baseline established by XLANG Lab in 2025 |
| GPT-5.4 | 75.0% | March 2026 OpenAI release |
| Claude Opus 4.7 | 78.0% | Late 2025 and early 2026 leaderboard contender |
| GPT-5.5 | 78.7% | April 2026; among the top published scores |
| Claude Mythos Preview | 79.6% | Top published score on OSWorld-Verified as of mid-2026 |

The progression matters in context: a benchmark on which OpenAI's January 2025 CUA could complete only 38.1% of tasks now has multiple frontier systems above the 72.4% human baseline from the original OSWorld paper. The underlying CUA architecture, with its perception-reasoning-action loop and screenshot-only interface, became the template against which these later models were measured, even as the specific weights and reasoning backends underneath changed several times.

## How did Operator work for users?

Users accessed Operator through a dedicated website at **operator.chatgpt.com**.[1] The interface presented a ChatGPT-style chat window on one side and a miniaturized browser window on the other side that the agent controlled. The homepage displayed suggested prompts for common tasks, such as "Find tickets for the next concert at the Sphere" or "Find a restaurant with a great happy hour for next Wednesday for 6 people," helping users understand the types of tasks Operator could handle.

### Task execution

To use Operator, a user would type a natural language request describing the task they wanted completed. Operator would then:

1. Parse the user's request to understand the goal and identify the relevant website or service
2. Open its built-in browser and navigate to the appropriate website
3. Begin executing the task step by step, with each action visible in the browser window
4. Pause at certain checkpoints to request user confirmation, additional information, or input
5. Complete the task or report back with results and ask follow-up questions if needed

Users could watch Operator work in real time, observing each click, scroll, and keystroke as it happened. The transparency of this process allowed users to verify the agent's actions and intervene if necessary. At any point, users could click a "take over" button to assume direct control of the browser window.[1]

### Personalization

Operator allowed users to personalize their workflows by adding custom instructions, either globally across all websites or for specific sites.[1] For example, a user could set airline preferences for booking sites or specify dietary restrictions for grocery ordering. Users could also save prompts for quick access on the homepage, which was particularly useful for repeated tasks like weekly grocery restocking on Instacart.

### What tasks could Operator do?

Operator was designed for a wide variety of repetitive browser-based tasks. OpenAI collaborated with companies including DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber to test and refine Operator's performance on their platforms.[1][19] Common use cases included:

| Task Category | Example | Platforms |
|---|---|---|
| Grocery ordering | Finding ingredients for a recipe and adding them to a cart | Instacart, DoorDash |
| Restaurant reservations | Booking a table for a specified date, time, and party size | OpenTable |
| Event tickets | Searching for and purchasing tickets for concerts or sporting events | StubHub |
| Travel booking | Comparing flights, hotels, and vacation packages | Priceline, Booking.com |
| Form filling | Completing online forms, applications, and registration pages | Various |
| Shopping | Browsing products, comparing prices, and adding items to carts | Various e-commerce sites |
| Appointment scheduling | Booking appointments with service providers | Thumbtack |
| Ride hailing | Requesting rides and managing transportation | Uber |

Operator could also handle parallel tasks within a single session. For instance, a user could ask Operator to order groceries on Instacart while simultaneously making a hotel booking on Booking.com.

## How did Operator keep tasks safe?

OpenAI implemented a multi-layered safety framework for Operator, recognizing the unique risks inherent in an AI agent that can take actions on the web with real-world consequences.[3] Unlike a standard chatbot where the worst outcome of an error is an incorrect response, a browser agent that makes a mistake could place an unintended order, send an email to the wrong person, or modify account settings incorrectly. The safety framework included four distinct levels of protection.[3]

### Confirmation mode

The CUA model was trained to request user confirmation before finalizing tasks with external side effects.[3] Before submitting an order, sending an email, deleting a calendar event, making a purchase, or completing any action that could not be easily undone, Operator would pause and display a summary of the pending action, asking the user to review and approve.[3] This allowed users to double-check the agent's work before the action became permanent. According to OpenAI's system card, Operator o3 (a later version) achieved a 92.1% critical confirmation recall rate for financial transactions, with 100% accuracy for editing permissions and completing financial transactions, and 99.9% accuracy for sending high-stakes communications.[10]

### Take-over mode

When Operator encountered steps requiring sensitive information, such as login credentials, passwords, payment details, or CAPTCHAs, it would pause and prompt the user to "take over" the browser directly.[3] During take-over mode, several important privacy protections activated:

- Operator stopped capturing screenshots of the browser
- No information entered by the user was collected or stored by the model
- The model could not observe what the user typed or clicked
- Credentials were not retained across sessions; each new task requiring a login needed fresh authentication

This design ensured that sensitive credentials, payment card numbers, and authentication tokens were never seen or processed by the AI model.[3] Once the user completed the sensitive step (such as logging in or entering payment information), they could hand control back to Operator to continue the task.

### Watch mode

On particularly sensitive websites, such as email services, banking platforms, and financial services, Operator required close user supervision through watch mode.[3] This mode had several distinct behaviors:

- Operator automatically paused execution when the user became inactive or navigated away from the Operator conversation window
- The user had to remain present and attentive for the agent to continue operating on these sensitive sites
- Actions on these sites were subject to more frequent confirmation requests

Watch mode was designed to ensure that the user was actively monitoring the agent's behavior on high-risk sites where mistakes could have serious financial, legal, or personal consequences.

### Restricted tasks

For certain categories of actions where the risk was deemed too significant, OpenAI fully restricted the model from assisting.[3] These hard restrictions could not be overridden by user instructions. Operator would decline tasks involving:

- Buying or selling stocks and securities
- Direct banking transactions (transfers, payments)
- Tasks requiring sensitive financial decision-making
- Actions that could permanently affect legal rights or contractual obligations
- Other high-stakes operations determined to be beyond the acceptable risk threshold during the research preview phase

## What were Operator's limitations?

Operator had several notable limitations during its research preview period that reflected the early state of [AI browser agent](/wiki/ai_browser_agent) technology.

**Accuracy and reliability:** Operator did not achieve human-level accuracy. Long, multi-step tasks accumulated errors. The model could misinterpret visual elements, misidentify clickable targets, or make incorrect assumptions about interface behavior. On the OSWorld benchmark, which approximates real computer use, the 38.1% success rate meant it failed on the majority of tasks.[2]

**Complex and non-standard interfaces:** Websites with heavy JavaScript rendering, drag-and-drop functionality, intricate calendar or date-picker widgets, interactive maps, or other non-standard UI components could confuse the agent because the model's visual understanding did not always correctly interpret novel design patterns.

**Looping and crashes:** Operator could sometimes get stuck in infinite loops, repeatedly attempting the same action without making progress, or freeze entirely. These stalls limited fully autonomous operation and meant users had to stay engaged.

**No scheduling or background operation:** Operator lacked built-in scheduling or continuous operation. Users could not schedule tasks for later, set up recurring tasks like weekly grocery orders, or have Operator run as a background service. Each task ran only within its active session, and closing the tab stopped the task.

**Speed:** Because Operator relied on capturing screenshots, processing them, generating an action, executing it, and capturing a new screenshot, it was generally slower than a human performing the same tasks. Early reviews described it as "significantly slower" than direct manual completion.[18]

**CAPTCHA handling:** While Operator was designed to ask users to take over for CAPTCHAs, reports indicated it could sometimes solve simple ones on its own, raising questions about the boundaries of what the agent should handle autonomously.

**Geographic and language limitations:** At launch, Operator worked best with English-language websites and US-centric services. Performance on websites in other languages or other regions was less reliable.

## What was Operator good at in practice?

In the months following launch, reviewers and users developed a clearer picture of which tasks Operator handled well and which it struggled with. The pattern that emerged was that Operator performed best on narrow, well-trodden, high-volume web flows on partner sites that OpenAI had explicitly tested against, and worse on anything off the beaten path.

| Task category | Reliability in practice | Notes |
|---|---|---|
| Restaurant reservations on OpenTable | Generally reliable | Operator could correctly apply date, time, party size, and cuisine filters, then commit to a slot. |
| Grocery ordering on Instacart | Generally reliable | Adding ingredients from a recipe to a single cart was a frequently demonstrated and repeatable workflow. |
| Travel search on Priceline and Booking.com | Reliable for searching, less for booking | Comparing flights and hotels worked, but checkout typically required take-over for payment. |
| Single-page form filling | Reliable | Static forms with clear labels were one of Operator's strongest cases. |
| Multi-tab research and aggregation | Mixed | Worked on simple comparisons, struggled when results spanned many sites or required reasoning about subtle differences. |
| CAPTCHAs and anti-bot interstitials | Mixed | Often handed off to the user; some simple CAPTCHAs were reportedly solved by the model on its own, raising questions about agent boundaries. |
| Calendar widgets and date pickers | Unreliable | Custom calendar components, especially those using JavaScript drag-and-drop, frequently confused the agent. |
| Slide creation, document layout, and visual editors | Poor | Operator was not built for slideshow editors or other visually rich production tools, which it tended to misinterpret. |
| Long-running multi-hour tasks | Poor | Sessions could stall, loop, or time out before completing the task. |
| Bot-detected sites with login walls | Poor | Account verification, two-factor prompts, and aggressive bot detection routinely interrupted execution. |

A common observation from early reviewers was that, even on tasks Operator could complete, performing the work manually was often faster than supervising the agent. The value proposition skewed toward batchable tasks (such as a recurring weekly grocery order) and toward users who preferred delegating the click-through work even when it was not a strict speed win.

## How much did Operator cost and where was it available?

Operator's availability expanded in phases during its roughly seven-month lifespan:

| Date | Availability Change |
|---|---|
| January 23, 2025 | Launched as research preview for ChatGPT Pro subscribers ($200/month) in the US |
| February 21, 2025 | Expanded to Pro subscribers in Australia, Brazil, Canada, India, Japan, Singapore, South Korea, UK, and other countries |
| Later in 2025 | Expanded to most regions where ChatGPT was available, including Europe |
| July 17, 2025 | Functionality integrated into ChatGPT agent; Operator deprecation announced |
| August 31, 2025 | Operator shut down; all chat history permanently deleted |

Throughout its existence, the standalone Operator product remained exclusive to ChatGPT Pro subscribers at $200 per month.[1] OpenAI had announced plans to expand access to Plus ($20/month), Team, and Enterprise subscribers, but this expansion never materialized for the standalone product.[1] Instead, OpenAI chose to integrate the capability directly into ChatGPT through the agent mode, which was made available to a broader set of subscription tiers.[4]

## How was Operator received?

Operator's January 2025 debut drew heavy press attention because it was the first time most ChatGPT Pro subscribers had seen an OpenAI product that took web actions on their behalf, rather than only producing text. Reception split into roughly three groups: paying users testing the product against everyday errands, AI press evaluating it as a category-defining release, and security researchers probing the safety story.

### Paying user reactions

Among ChatGPT Pro subscribers, reactions were mixed. Reviewers consistently said Operator was capable of completing common tasks like booking dinner reservations, ordering groceries, and filling out forms, but that the experience was slower than performing the same task manually.[18] The combination of frequent confirmation prompts, watch-mode pauses, and the screenshot loop's inherent latency meant that even when Operator finished a task correctly, it rarely felt time-saving on the first attempt. Users who returned to Operator most often were those running repeated, batch-style tasks, such as restocking groceries from a saved recipe list, where the upfront supervision could be amortized over multiple uses.

A second common complaint concerned reliability. Operator could complete a flight comparison flawlessly on Tuesday and stall or loop on the same site on Wednesday, often because of small layout changes, A/B tests, or intermittent ad overlays. This unevenness made it difficult to trust the agent for anything time-sensitive without monitoring.

### Press coverage

The MIT Technology Review described Operator as the most ambitious consumer agent any major lab had shipped, while warning that it was "very much a work in progress."[18] TechCrunch and the Verge framed it as OpenAI's first real attempt to commercialize agents, noting both the breakthrough nature of the screenshot-based approach and the practical limits of a product locked behind a $200-per-month tier.[16] VentureBeat highlighted the partner integrations with DoorDash, Instacart, OpenTable, Priceline, StubHub, Thumbtack, and Uber as evidence that OpenAI saw consumer commerce as the initial wedge.[19]

Bloomberg and the Information emphasized the strategic context: OpenAI, having watched Anthropic ship computer use three months earlier and Google ship Project Mariner six weeks earlier, needed a visible agent product. Operator was as much a market-positioning statement as it was a finished tool.

### Independent reviews

Independent reviewer Simon Willison covered Operator extensively from launch and treated it as more interesting from a security and architecture standpoint than as a productivity tool.[25] Willison wrote that the leaked Operator system prompt, surfaced by Johann Rehberger, "doubles as better written documentation than any of the official sources," and used it to walk through how the model decides when to confirm, when to take over, and when to refuse.[25] His broader take was that Operator validated the agent paradigm but inherited every unsolved problem of [LLMs](/wiki/large_language_model) on the open web, particularly prompt injection.

### Safety researcher concerns

Security researcher Johann Rehberger demonstrated, in February 2025, that indirect prompt injection attacks against Operator were practical.[26] In one widely discussed proof of concept, an attacker hid instructions inside a GitHub issue. When Operator visited the issue as part of a benign user task, the embedded instructions caused it to navigate to a site where the user was already logged in, scrape personal information such as email addresses and phone numbers, and exfiltrate that information back to the attacker by encoding it into a URL.[26][27]

Rehberger's work showed that the visible action stream and watch mode helped attentive users notice anomalies, but that an inattentive user, or one running Operator across multiple parallel tasks, could plausibly miss the attack.[26] OpenAI responded by tightening confirmation prompts on certain navigation patterns and by expanding watch mode coverage on sensitive sites, but Rehberger and other researchers continued to argue that prompt injection was an unsolved problem at the model level rather than something that could be fully patched at the product layer.

These findings became part of a broader 2025 conversation about [AI safety](/wiki/ai_safety) for browser agents. The same underlying concerns later resurfaced for Anthropic's Claude for Chrome and for OpenAI's own Atlas browser, both of which received fresh prompt-injection demonstrations within days of launch.[21]

### Jailbreak and refusal behavior

Researchers also probed Operator's refusal behavior on restricted tasks. Hard restrictions on stock trading, banking transfers, and certain legal actions held under direct prompts, but softer guardrails (such as the disposition to ask before clicking "Place order") could be weakened by elaborate role-play prompts or chained instructions. The pattern was familiar from chat-based jailbreaks: the model's underlying willingness to take an action could be raised through context manipulation, even when the wrapper layer eventually caught the result.

## Subsequent updates

Operator received several model and policy updates during its short standalone life.

### CUA in the API (March 2025)

On March 11, 2025, OpenAI made the CUA model available to developers through its new Responses API as part of a broader set of agent-building tools.[9] Access was initially limited to a research preview for select developers in usage tiers 3 through 5.[9] Pricing was set at $3 per million input tokens and $12 per million output tokens.[9]

Unlike the consumer-facing Operator product, which controlled only a web browser, the API version of CUA could in principle drive any graphical environment that exposed screenshots, including desktop applications and mobile interfaces. Common targets included data entry, application workflow automation, internal tool testing, and cross-application processes that previously required brittle scraping or robotic process automation. OpenAI published a sample application on GitHub (openai/openai-cua-sample-app), and enterprises could optionally run CUA locally for additional security and data control.[9]

### Geographic expansion (February to mid-2025)

On February 21, 2025, Operator rolled out to ChatGPT Pro subscribers in Australia, Brazil, Canada, India, Japan, Singapore, South Korea, and the United Kingdom, among other countries.[17] Subsequent expansions in the spring brought it to most regions where ChatGPT Pro was sold, with the European Economic Area lagging because of regulatory review.

### Operator on o3 (May 2025)

On May 23, 2025, OpenAI replaced the GPT-4o-based CUA with a version built on the [o3](/wiki/openai_o-series) reasoning model, branded internally as "Operator o3."[10][22][23] OpenAI published an addendum to the o3 and o4-mini system card describing the change.[10] The reasoning model produced large performance gains on agent benchmarks. On the GAIA benchmark, which evaluates general assistant tasks requiring tool use, the o3-based Operator scored 62.2 compared with 12.3 for the previous model.[10] OpenAI also reported that the new version asked for confirmation on 94% of sensitive actions and on 100% of financial transactions, with critical confirmation recall of about 92% for editing permissions and high-stakes communications.[10]

Reviewers reported that the o3-based Operator was visibly more persistent: it gave up less often when a page failed to load, recovered better from unexpected popups, and required fewer manual nudges across long tasks.[24] The trade-off was latency, since the underlying reasoning model spent more compute per step. The o3 upgrade did not change Operator's pricing or its ChatGPT Pro exclusivity, but it sharpened the value proposition for subscribers who were on the fence about renewing.[22]

### Other policy and safety updates

Across spring and early summer 2025, OpenAI iterated on Operator's safety surface.[8] Watch mode coverage was expanded to include more email, banking, and government-services domains. The hard-restricted task list was tightened around stock trading, certain regulated transactions, and tasks involving sensitive identity verification. Confirmation prompts were tuned to fire on a wider range of "irreversible" actions, including certain account setting changes that earlier versions had allowed without explicit user approval.

## How did Operator become ChatGPT agent?

On July 17, 2025, OpenAI announced the **ChatGPT agent**, a significant product update that consolidated several previous capabilities into a unified system within ChatGPT.[4] The ChatGPT agent merged Operator's web browsing and task execution abilities with [Deep Research](/wiki/deep_research)'s multi-step research and reporting capabilities, along with code execution, file manipulation, and third-party service integrations.[4] OpenAI described it as an agent that "bridges research and action," capable of not just finding information but acting on it.[4]

### Architecture and capabilities

The ChatGPT agent operates on its own **virtual computer** accessible from within the ChatGPT.com interface.[4] Rather than being a single model, the agent is an orchestrated system that coordinates multiple tools within a continuous agent loop.[4] This virtual environment includes:

- **Visual web browser:** A full graphical browser for navigating and interacting with websites, inheriting the core screenshot-based interaction functionality that Operator introduced. The agent can click, scroll, type, and interact with web pages just as Operator did.[4]
- **Text-based browser:** A lightweight, fast browser for retrieving web content without full visual rendering, useful for quickly pulling information from pages.[4]
- **Terminal:** A command-line environment with restricted network access for code execution, file manipulation, data analysis, and running scripts. The terminal can generate outputs such as spreadsheets, presentations, and reports.[4]
- **Connectors (Apps):** Integration with external services including Gmail, Google Drive, GitHub, Google Calendar, SharePoint, Outlook, Slack, HubSpot, Dropbox, Box, Notion, and others. These connectors allow the agent to read from and write to external services as part of its workflows.[4]

The agent can coordinate across all of these tools within a single task. For example, it can search the web for information using the visual browser, download a dataset, analyze it using Python in the terminal, generate a report, and then email the results through Gmail, all within a single conversation.[4]

### How is ChatGPT agent different from Operator?

The ChatGPT agent represented a substantial expansion beyond what Operator could do. Key differences included:

| Capability | Operator | ChatGPT Agent |
|---|---|---|
| Web browsing | Yes (visual browser only) | Yes (visual and text browsers) |
| Code execution | No | Yes (terminal with Python, shell) |
| File creation and manipulation | No | Yes (spreadsheets, presentations, reports) |
| Third-party service connectors | No | Yes (Gmail, Drive, GitHub, Slack, etc.) |
| Deep Research integration | No | Yes (multi-step research and reporting) |
| Image generation | No | Yes |
| Access within ChatGPT interface | Separate site (operator.chatgpt.com) | Integrated in ChatGPT.com via "agent mode" dropdown |

To access the agent, users select "agent mode" from the dropdown menu in the ChatGPT composer and enter their query directly.[4] The previous Deep Research mode remains available as a separate option for users who specifically want exhaustive research reports without task execution.

### Availability and usage limits

ChatGPT agent began rolling out on July 18, 2025. Pro subscribers received immediate access on launch day. Plus and Team users gained access over the following days. Enterprise and Education tiers were scheduled for availability in the subsequent weeks.[4]

Usage was structured with monthly message quotas:[7]

| Subscription Tier | Agent Messages per Month | Monthly Cost |
|---|---|---|
| Pro | 400 | $200 |
| Plus | 40 | $20 |
| Team | 40 | $25/user |

Users who needed additional capacity beyond their monthly quota could purchase extra credits. Free-tier users and European Economic Area residents initially did not have access to the agent mode, with OpenAI citing ongoing regulatory work for the EEA exclusion. As of 2026, only the initial user-initiated agent request counts against the monthly quota; intermediate clarifications, authentication hand-offs, and follow-up turns within the same task do not consume additional message credits.[7]

### Safety features in ChatGPT agent

The ChatGPT agent retained and expanded the safety mechanisms that Operator pioneered. The system provides live narration of its actions, requests explicit permission before high-impact actions such as sending emails or making purchases, keeps watch mode for sensitive sites, lets users wipe browsing data with a single click, and keeps sensitive inputs in secure take-over mode private from the model.[5]

### Performance

According to OpenAI's evaluations, the ChatGPT agent performed comparably to or better than the standalone [o3](/wiki/openai_o-series) model on many tasks.[5] Across internal evaluations spanning over 40 occupations including law, logistics, sales, and engineering, the ChatGPT agent matched or outperformed domain experts in roughly half the cases.[5] The integrated approach, combining browsing, research, and code execution, often produced better results than any single tool used alone.

## When was Operator deprecated?

With the launch of ChatGPT agent, OpenAI announced that the standalone Operator experience at operator.chatgpt.com would be deprecated.[4] The deprecation followed a clear timeline:

- **July 17, 2025:** OpenAI announced ChatGPT agent and stated that Operator would sunset "in the coming weeks"[4]
- **July 18, 2025 onward:** Users were advised to transition to the ChatGPT agent mode for browser-based task execution
- **August 31, 2025:** Operator was shut down completely. The operator.chatgpt.com website became inaccessible, and all stored chat history was permanently deleted[8]

Users were warned to save any conversations or task results they wished to keep before the August 31 deadline, as no data recovery would be possible after the shutdown.

The deprecation reflected OpenAI's strategy of consolidating its agent capabilities into a single, unified interface rather than maintaining separate specialized products. By combining Operator's web interaction, Deep Research's analytical depth, and new code execution and connector capabilities into one system, OpenAI aimed to provide a more versatile and accessible tool. The integration also made agent capabilities available to a much broader user base, as the standalone Operator had been restricted to Pro subscribers at $200 per month.

## ChatGPT Atlas browser (October 2025)

On October 21, 2025, OpenAI announced **ChatGPT Atlas**, a dedicated web browser with ChatGPT built directly into the browsing experience.[6] Atlas further extended the trajectory that began with Operator by embedding agent capabilities natively in a full Chromium-based browser. The product launched initially for macOS, with versions for Windows, iOS, and Android announced for later release.[6]

Atlas integrates ChatGPT through an always-available sidebar that can answer questions about the current page, summarize content, compare products across sites, analyze data, and rewrite selected text.[6] Paid subscribers can enable an **agent mode** inside Atlas that allows ChatGPT to take actions on websites, inheriting the Operator-derived capabilities and benefiting from the user's current browsing context.[6] Atlas also introduced **browser memories**, which let ChatGPT retain facts and preferences across sites, subject to user privacy controls.[6] Users can ask Atlas to recall things like "find all the job postings I was looking at last week and summarize the industry trends," and the browser maintains an archive of session context that the user can view, archive, or delete at any time. By default, OpenAI does not use browsed content to train its models, though users can opt in.[6]

The browser uses a freemium model: a free tier provides the basic sidebar, while agent mode and enhanced memory features are restricted to Plus and Pro subscribers, with a separate beta for Business users.[6] Like Operator and ChatGPT agent before it, Atlas drew immediate prompt-injection demonstrations from independent researchers shortly after launch, reinforcing the unsolved nature of that class of attack on browser agents.

### OWL: Atlas's integration architecture

OpenAI described Atlas's internal architecture in a separate blog post titled "How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas."[11] Rather than embedding Chromium inside the Atlas application process, OWL ("OpenAI's Web Layer") runs Chromium's browser process outside the main Atlas app and brokers communication between the model, the browser, and the user interface.[11] This separation provides three properties that the operator.chatgpt.com architecture could not cleanly support:

- **Process isolation between agent sessions:** Each agent task runs in its own browser process with its own cookies, storage, and network state, so a long-running agent task in one tab cannot taint a regular browsing session in another.[11]
- **A clean sandbox boundary for agent actions:** Atlas's agent mode cannot execute arbitrary code in the browser process, download files, install extensions, access other applications on the host operating system, or touch the local file system. These restrictions are enforced outside the model, so a successful prompt injection cannot lift them.[11]
- **A common substrate for browsing, summarization, and agent mode:** The sidebar's read-only features and the full agent mode share the same browser process, which lets the model see the user's current page context when running tasks, instead of opening a separate ephemeral browser as Operator did.[11]

The practical consequence is that Atlas's agent mode behaves like a tighter, more contextual version of Operator: the agent benefits from the page the user is already on, while OWL's sandbox enforces the restrictions Operator previously implemented inside the model and product wrapper.

### Adversarial training against prompt injection (late 2025 to 2026)

Through late 2025 and into 2026, OpenAI publicly described the prompt-injection problem for Atlas and the descendants of Operator as a security problem that is "unlikely to ever be fully solved," comparing it to scams and social engineering that target humans.[12][31][32] In a December 2025 blog post titled "Continuously hardening ChatGPT Atlas against prompt injection attacks," OpenAI detailed a defense pipeline that combined three layers:[12]

- **An adversarially trained agent model:** OpenAI fine-tuned the agent backing Atlas, and transitively the CUA descendants in ChatGPT agent mode, on adversarial trajectories generated by an automated red-teaming system. The defending model was trained to recognize and ignore embedded instructions inside web content.[12]
- **An LLM-based automated attacker:** OpenAI built an internal attacker model and trained it end-to-end with reinforcement learning to discover successful prompt injections against the agent. The attacker improved over time as it learned from its own successes and failures, creating a continuous adversarial loop with the defender.[12]
- **Surrounding safeguards:** Confirmation prompts, watch mode coverage, and the OWL sandbox continued to act as fallbacks. OpenAI's framing was that prompt injection should be treated as defense in depth rather than a single model-level fix, since no defender is expected to hold under indefinite attacker iteration.[12]

Independent commentators including Simon Willison and security press such as TechCrunch, Fortune, and CyberScoop noted that the December 2025 statement was unusually candid for a frontier lab, effectively acknowledging that prompt injection is a permanent operating risk for browser agents rather than a bug awaiting a patch.[31][32][33] The framing was widely viewed as inherited from lessons learned during the Operator era, where Rehberger's GitHub-issue exploit had already shown the basic shape of the attack.[26] OpenAI also began advising users to give agents narrowly scoped instructions, since broad permissions like "take whatever action is needed in my inbox" gave injected content too much latitude.[31]

## How did Operator compare to other AI browser agents?

Operator was part of a broader wave of AI browser agents released by major AI companies between late 2024 and 2025. Each company took a distinct approach to architecture, deployment, and safety, reflecting different philosophies about how AI agents should interact with the web. The following table compares the key offerings at the time of their respective launches:

| Feature | OpenAI Operator / ChatGPT Agent | [Anthropic Computer Use](/wiki/anthropic_computer_use) | Google [Project Mariner](/wiki/project_mariner) | Cognition [Devin](/wiki/devin) |
|---|---|---|---|---|
| **Initial Announcement** | January 23, 2025 | October 22, 2024 | December 11, 2024 | March 12, 2024 |
| **Underlying Model** | CUA (GPT-4o + RL); upgraded to o3 in May 2025 | Claude 3.5 Sonnet (upgraded), later Claude 4 family | Gemini 2.0 Flash | Proprietary; reportedly built on top of frontier LLMs |
| **Interaction Method** | Screenshot-based; clicks, types, scrolls in dedicated browser | Screenshot-based; cursor movement, clicking, typing on full desktop | Chrome extension; pixels and web elements in active tab | Browser, code editor, and shell access in a sandboxed dev environment |
| **Scope** | Web browser only (Operator); full virtual computer (ChatGPT agent) | Full desktop environment (not limited to browser) | Web browser only (Chrome active tab) | Software engineering: writing, running, and debugging code |
| **Deployment** | Dedicated site at operator.chatgpt.com (later integrated into ChatGPT) | Developer API; sandboxed environments (Docker containers, VMs) | Chrome extension; cloud-based VMs for processing | Cloud-hosted product on devin.ai with optional Slack and IDE integrations |
| **WebVoyager Score** | 87.0% | ~52% (Claude 3.5 Sonnet, Oct 2024) | 83.5% | Not directly comparable; software-engineering benchmark focus |
| **OSWorld Score** | 38.1% (CUA); higher with o3-based variant | 14.9% (screenshot-only, Claude 3.5 Sonnet, Oct 2024) | Not publicly reported | Not directly comparable |
| **Headline Benchmark** | Browser-task success | Computer-use success | Web-task completion | SWE-bench (software engineering tasks) |
| **Safety Approach** | Confirmation mode, watch mode, take-over mode, restricted tasks | ASL-3 safety classifiers; isolated sandbox environments | Cloud-based VMs; real-time user intervention; restricted actions requiring credit card or cookies | Sandboxed dev environment; user reviews diffs before merging |
| **Initial Availability** | ChatGPT Pro subscribers in the US | Developer API (public beta, all developers) | Trusted testers (limited invitation) | Waitlist; later $500/month team plan |
| **Subsequent Expansion** | Integrated into ChatGPT agent (July 2025) for Pro, Plus, Team users | Claude for Chrome extension (August 2025) for Max plan subscribers | Google AI Ultra subscribers in US (May 2025) | Lower-tier individual plan, IDE plugins, Cognition acquired Windsurf in 2025 |
| **Pricing Model** | Included with ChatGPT subscription; API: $3/$12 per million tokens | API token-based pricing | Google AI Ultra subscription | Subscription-based ($20/month individual; $500/month team historically) |
| **Background Operation** | No (requires active session) | Yes (runs in sandboxed VM) | No (requires active Chrome tab) | Yes (runs autonomously in cloud sandbox) |

### Deployment philosophy comparison

OpenAI took a consumer-first approach with Operator, providing a polished product accessible through a web interface before later integrating it into ChatGPT and offering an API. This prioritized ease of use for non-technical users but limited initial adoption to high-paying Pro subscribers.

Anthropic chose a developer-first strategy, releasing computer use as an API capability in public beta and encouraging developers to build their own computer-using agents in sandboxed environments.[13] This approach prioritized safety (through isolation) and flexibility (through developer customization) but initially lacked a consumer-facing product. Anthropic later launched Claude for Chrome as a browser extension in August 2025 for Max plan subscribers, expanding to Pro, Team, and Enterprise plans by December 2025.[21] The underlying models also continued to advance: by the time of [Claude Opus 4](/wiki/claude_opus_4) and [Claude Opus 4.5](/wiki/claude_opus_4_5), Anthropic was shipping considerably stronger computer use performance, and by [Claude Opus 4.7](/wiki/claude_opus_4_7) on OSWorld-Verified the gap with OpenAI's CUA descendants had narrowed to within a few percentage points at the top of the leaderboard.

Google positioned Project Mariner as a research prototype, initially limiting access to trusted testers and gradually expanding availability.[14] By May 2025, it became accessible to Google AI Ultra subscribers in the US. Google's approach of running agents on cloud-based VMs (rather than the user's local machine) created physical separation between agent actions and user systems, providing a different security model.

Cognition's Devin sat in a different segment of the agent market. Rather than browsing the open web on behalf of consumers, Devin was an autonomous software engineering agent that wrote, ran, and debugged code in a sandboxed development environment, then handed back a pull request for human review. Operator and Devin shared the broader "AI agent" framing and the screenshot-and-action loop pattern, but their target users (consumers running errands versus engineering teams shipping code) and their evaluation regimes (browser benchmarks versus SWE-bench) were distinct.

Microsoft's Copilot Vision, rolled out across [Microsoft Copilot](/wiki/microsoft_copilot) products in 2025, pursued yet another path: instead of taking actions on a user's behalf, Copilot Vision focused on letting Copilot see what the user was already viewing on screen and answer questions about it, with action-taking handled through scoped automations rather than a generalized browser agent. This made it less ambitious than Operator but also less exposed to the prompt-injection class of failures that dogged true browser agents.

### Where Operator fits in the 2026 agent landscape

Viewed from 2026, the Operator-style approach, a hosted screenshot-driven web agent running in OpenAI infrastructure, sits in the middle of a spectrum that runs from full-desktop computer use on one end to in-browser extension agents on the other. Anthropic's Claude for Chrome and Anthropic's computer-use API span the desktop side. OpenAI's [ChatGPT Atlas](/wiki/chatgpt_atlas) browser and Perplexity Comet sit on the in-browser end. ChatGPT agent mode and the Responses API computer-use tool occupy the middle ground that Operator originally defined. Independent reviewers in 2026 generally treated this middle position as table stakes for any frontier AI product rather than a distinguishing feature, with practical differentiation coming from model quality, latency, sandbox guarantees, and integration with services the user already uses.

## Does Operator still exist?

As of mid-2026, the standalone Operator product no longer exists. operator.chatgpt.com has been offline since August 31, 2025, and the underlying CUA technology now lives in three places: inside the ChatGPT agent mode, inside the agent mode of the Atlas browser, and through the computer use tool in the OpenAI Responses API. Models built on later [GPT-5](/wiki/gpt-5) and o-series reasoning families have largely replaced the original GPT-4o-based CUA in production usage, with GPT-5.4 and GPT-5.5 powering the descendants of CUA across Atlas and ChatGPT agent mode through 2026.

Despite its short standalone life, Operator left a clear mark on the AI agent ecosystem. The CUA model demonstrated that combining vision capabilities with reinforcement learning could produce an agent capable of interacting with arbitrary graphical interfaces without requiring API integrations or website cooperation, and the screenshot-based approach became the dominant paradigm across the industry. The 38.1% OSWorld score that headlined the January 2025 launch also became the de facto starting line against which subsequent progress was measured: by mid-2026, frontier scores on the successor benchmark OSWorld-Verified sat in the high 70s, above the original human baseline of 72.4%.

Operator's multi-layered safety framework, including confirmation mode, watch mode, take-over mode, and a restricted-task list, established patterns that carried forward into ChatGPT agent, Atlas, and competitors' products. The principle that browser agents should request human approval before irreversible actions became a widely adopted norm. The trade-offs around prompt injection that surfaced in the Operator era also remained largely unresolved at the model level, with later [Anthropic](/wiki/anthropic) and OpenAI agent products inheriting the same class of vulnerabilities and OpenAI itself stating publicly in December 2025 that prompt injection is unlikely to ever be fully solved.[12]

The transition from Operator as a standalone product to an integrated feature within ChatGPT reflected a broader trend in AI product design: rather than building separate specialized tools, companies found more success embedding agent capabilities into existing platforms where users already spend time. This integration strategy, culminating in the Atlas browser, established a template that other companies have since followed.

## See also

- [AI Agents](/wiki/ai_agents)
- [Agent Mode](/wiki/agent_mode)
- [Anthropic Computer Use](/wiki/anthropic_computer_use)
- [ChatGPT](/wiki/chatgpt)
- [ChatGPT Atlas](/wiki/chatgpt_atlas)
- [Computer Use](/wiki/computer_use)
- [Deep Research](/wiki/deep_research)
- [Devin](/wiki/devin)
- [GPT-4o](/wiki/gpt_4o)
- [GPT-5](/wiki/gpt-5)
- [Large Language Model](/wiki/large_language_model)
- [Microsoft Copilot](/wiki/microsoft_copilot)
- [Multimodal AI](/wiki/multimodal_ai)
- [OpenAI](/wiki/openai)
- [OpenAI o-series](/wiki/openai_o-series)
- [OSWorld](/wiki/osworld)
- [Project Mariner](/wiki/project_mariner)
- [Prompt Injection](/wiki/prompt_injection)
- [Reinforcement Learning](/wiki/reinforcement_learning)
- [WebArena](/wiki/webarena)

## References

1. OpenAI. "Introducing Operator." OpenAI Blog, January 23, 2025. https://openai.com/index/introducing-operator/
2. OpenAI. "Computer-Using Agent." OpenAI Blog, January 23, 2025. https://openai.com/index/computer-using-agent/
3. OpenAI. "Operator System Card." January 23, 2025. https://cdn.openai.com/operator_system_card.pdf
4. OpenAI. "Introducing ChatGPT agent: bridging research and action." OpenAI Blog, July 17, 2025. https://openai.com/index/introducing-chatgpt-agent/
5. OpenAI. "ChatGPT Agent System Card." July 17, 2025. https://cdn.openai.com/pdf/839e66fc-602c-48bf-81d3-b21eacc3459d/chatgpt_agent_system_card.pdf
6. OpenAI. "Introducing ChatGPT Atlas." OpenAI Blog, October 21, 2025. https://openai.com/index/introducing-chatgpt-atlas/
7. OpenAI. "ChatGPT agent - release notes." OpenAI Help Center. https://help.openai.com/en/articles/11794368-chatgpt-agent-release-notes
8. OpenAI. "Operator - Release Notes." OpenAI Help Center. https://help.openai.com/en/articles/10561834-operator-release-notes
9. OpenAI. "New tools for building agents." OpenAI Blog, March 11, 2025. https://openai.com/index/new-tools-for-building-agents/
10. OpenAI. "Addendum to OpenAI o3 and o4-mini system card: OpenAI o3 Operator." May 23, 2025. https://openai.com/index/o3-o4-mini-system-card-addendum-operator-o3/
11. OpenAI. "How we built OWL, the new architecture behind our ChatGPT-based browser, Atlas." OpenAI Blog, October 2025. https://openai.com/index/building-chatgpt-atlas/
12. OpenAI. "Continuously hardening ChatGPT Atlas against prompt injection attacks." OpenAI Blog, December 2025. https://openai.com/index/hardening-atlas-against-prompt-injection/
13. Anthropic. "Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku." October 22, 2024. https://www.anthropic.com/news/3-5-models-and-computer-use
14. Google DeepMind. "Project Mariner." December 11, 2024. https://deepmind.google/models/project-mariner/
15. Google. "Google introduces Gemini 2.0: A new AI model for the agentic era." December 2024. https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/
16. TechCrunch. "OpenAI launches Operator, an AI agent that performs tasks autonomously." January 23, 2025. https://techcrunch.com/2025/01/23/openai-launches-operator-an-ai-agent-that-performs-tasks-autonomously/
17. TechCrunch. "OpenAI rolls out its AI agent, Operator, in several countries." February 21, 2025. https://techcrunch.com/2025/02/21/openai-rolls-out-its-ai-agent-operator-in-several-countries/
18. MIT Technology Review. "OpenAI launches Operator: an agent that can use a computer for you." January 23, 2025. https://www.technologyreview.com/2025/01/23/1110484/openai-launches-operator-an-agent-that-can-use-a-computer-for-you/
19. VentureBeat. "Meet OpenAI's Operator, an AI agent that uses the web to book you dinner reservations, order tickets, compile grocery lists and more." January 23, 2025. https://venturebeat.com/ai/meet-openais-operator-an-ai-agent-that-uses-the-web-to-book-you-dinner-reservations-order-tickets-compile-grocery-lists-and-more/
20. TechCrunch. "Google unveils Project Mariner: AI agents to use the web for you." December 11, 2024. https://techcrunch.com/2024/12/11/google-unveils-project-mariner-ai-agents-to-use-the-web-for-you/
21. TechCrunch. "Anthropic launches a Claude AI agent that lives in Chrome." August 26, 2025. https://techcrunch.com/2025/08/26/anthropic-launches-a-claude-ai-agent-that-lives-in-chrome/
22. VentureBeat. "OpenAI updates Operator to o3, making its $200 monthly ChatGPT Pro subscription more enticing." May 23, 2025. https://venturebeat.com/ai/openai-updates-operator-to-o3-making-its-200-monthly-chatgpt-subscription-more-enticing/
23. TechCrunch. "OpenAI upgrades the AI model powering its Operator agent." May 23, 2025. https://techcrunch.com/2025/05/23/openai-upgrades-the-ai-model-powering-its-operator-agent/
24. BleepingComputer. "OpenAI confirms Operator Agent is now more accurate with o3." May 23, 2025. https://www.bleepingcomputer.com/news/artificial-intelligence/openai-confirms-operator-agent-is-now-more-accurate-with-o3/
25. Simon Willison. "OpenAI Operator coverage." simonwillison.net, January and February 2025. https://simonwillison.net/tags/openai-operator/
26. Johann Rehberger / Embrace The Red. "Prompt injection exploits in ChatGPT Operator." February 2025. https://embracethered.com/blog/posts/2025/chatgpt-operator-prompt-injection-exploits/
27. Learn Prompting. "Prompt Injection Exploits in ChatGPT Operator." 2025. https://learnprompting.org/blog/prompt-injection-exploits-in-chatgpt-operator
28. DataCamp. "OpenAI's Operator: Examples, Use Cases, Competition & More." 2025. https://www.datacamp.com/blog/operator
29. XLANG Lab. "Introducing OSWorld-Verified." July 2025. https://xlang.ai/blog/osworld-verified
30. XLANG Lab. "OpenCUA: Open Foundations for Computer-Use Agents." arXiv:2508.09123, August 2025. https://arxiv.org/html/2508.09123v3
31. TechCrunch. "OpenAI says AI browsers may always be vulnerable to prompt injection attacks." December 22, 2025. https://techcrunch.com/2025/12/22/openai-says-ai-browsers-may-always-be-vulnerable-to-prompt-injection-attacks/
32. Fortune. "OpenAI says prompt injections that can trick AI browsers may never be fully 'solved'." December 23, 2025. https://fortune.com/2025/12/23/openai-ai-browser-prompt-injections-cybersecurity-hackers/
33. CyberScoop. "OpenAI says prompt injection may never be 'solved' for browser agents like Atlas." December 2025. https://cyberscoop.com/openai-chatgpt-atlas-prompt-injection-browser-agent-security-update-head-of-preparedness/

