ChatGPT Agent

ChatGPT Agent is an agentic feature of [[chatgpt|ChatGPT]] from [[sam_altman|Sam Altman]]'s [[openai_api|OpenAI]] that gives the chatbot its own virtual computer—complete with a graphical browser, a text browser, a Linux-style terminal, code execution, and connectors to external apps—so the model can complete multi-step tasks on a user's behalf rather than only producing text answers.[^1][^2] It launched on July 17, 2025, with a livestream introduction by Sam Altman alongside the project team of Casey Chu, Isa Fulford, Yash Kumar, and Zhiqing Sun, and it unified two earlier OpenAI agent products: the browsing-and-clicking assistant [[openai_operator|OpenAI Operator]] (released January 23, 2025) and the long-running research tool [[deep_research|Deep Research]] (released February 2, 2025).[^1][^2][^3][^4][^5]

OpenAI positioned ChatGPT Agent as a "unified agentic system" that "fluidly shift[s] between reasoning and action to handle complex workflows from start to finish," with example tasks including "look at my calendar and brief me on upcoming client meetings based on recent news," "plan and buy ingredients to make Japanese breakfast for four," and "analyze three competitors and create a slide deck."[^1][^2] At launch the feature was made available inside ChatGPT to subscribers on the Pro ($200/month), Plus ($20/month), and Team tiers, with Enterprise and Education access following in subsequent weeks; Pro subscribers received 400 agent messages per month and Plus and Team users received 40 per month.[^2][^6] The system card classified the underlying model as the first OpenAI release treated as "High capability" in the Biological and Chemical domain under OpenAI's Preparedness Framework, activating an unprecedented set of safety mitigations.[^7][^8][^9]

ChatGPT Agent set new agentic benchmark records at release, including a state-of-the-art pass@1 score of 41.6% on [[humanitys_last_exam|Humanity's Last Exam]] (44.4% with parallel attempts), 27.4% on [[frontiermath|FrontierMath]] (Tiers 1–3) with tool use, 45.5% on SpreadsheetBench, 68.9% on [[browsecomp|BrowseComp]], and 65.4% on [[webarena|WebArena]].[^2][^10][^11] It nonetheless drew a cautious launch message from Altman, who described the product as "cutting edge and experimental … a chance to try the future, but not something I'd yet use for high-stakes uses or with a lot of personal information until we have a chance to study and improve it in the wild."[^12][^13] The standalone Operator preview at operator.chatgpt.com was deprecated and shut down on August 31, 2025, with its capabilities absorbed into ChatGPT Agent.[^3][^14]

Background

OpenAI's path to ChatGPT Agent passed through two distinct agent products released earlier in 2025.

Operator (January 2025)

[[openai_operator|Operator]] debuted on January 23, 2025, as an autonomous web-browsing agent built on a new "Computer-Using Agent" (CUA) model that combined the vision capabilities of [[gpt_4o|GPT-4o]] with reasoning from OpenAI's [[o3|o-series models]].[^4][^15] Operator opened a remote Chromium-based browser inside ChatGPT and could click, type, and fill forms in much the same way a human would, "without need[ing] to use developer-facing APIs."[^4][^15] At launch it was restricted to U.S. customers on the $200/month Pro tier and was framed as a "research preview"; OpenAI announced partnerships with DoorDash, eBay, Instacart, Priceline, StubHub, and Uber to ensure that Operator respected those services' terms of use.[^4][^15] Independent assessments later found that Operator reached roughly 38.1% accuracy on OS-level tasks and 58.1% on web tasks, "[not reaching] human-level accuracy" and struggling with multi-step workflows.[^14]

Deep Research (February 2025)

[[deep_research|Deep Research]] launched on February 2, 2025, as a separate, slower agent that conducted 5–30 minute autonomous browsing sessions to produce long-form, cited research reports.[^5][^16] Built on a variant of [[o3|OpenAI o3]], Deep Research was initially capped at 100 queries per month for Pro users and scored 26.6% on Humanity's Last Exam at release, well ahead of GPT-4o (3.3%) and DeepSeek R1 (9.4%).[^5][^16] OpenAI expanded Deep Research to Plus, Team, Enterprise, and Education users on February 25, 2025, and on April 24, 2025, raised query allocations to 250/month for Pro, 25/month for paid Plus/Team/Enterprise/Edu, and 5/month for Free.[^5][^16]

ChatGPT Agent's system card characterizes the new model as combining "Deep research's ability to conduct multi-step research and generate high-quality reports" with "Operator's capacity to execute tasks through a remote visual browser environment," supplemented with a new terminal and connector layer.[^7] Yash Kumar, the product lead, told The Verge that the move from Operator to Agent meant ChatGPT now had access to "an entire computer" rather than just a browser.[^11]

Capabilities

Virtual computer

ChatGPT Agent runs inside an OpenAI-controlled sandbox—a virtual machine with persistent state across the duration of a task—rather than on the user's device.[^2][^17] The sandbox exposes four primary tools to the model:[^2][^7]

A visual (graphical) browser that captures screenshots of pages and lets the model click, scroll, type, and drag, modeled on Operator's CUA;
A text-based browser optimized for efficient information retrieval, modeled on Deep Research;
A terminal with Python and shell access, used for data analysis, file manipulation, and generating files such as .xlsx and .pptx documents;
Direct API access for selected applications and connectors.

State is preserved across tool switches, so the agent can, for example, log into a website in the visual browser, download a CSV, run it through pandas in the terminal, and then upload the processed file back through the visual browser.[^7][^17][^18] Sessions have been demonstrated lasting up to about two hours, with the agent reasoning, browsing, and acting in a continuous loop.[^17]
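The terminal step of that workflow can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenAI's implementation: the column names (`region`, `amount`), the sample data, and the `summarize_by_region` helper are hypothetical, and the sketch uses Python's standard `csv` module instead of pandas so that it runs without extra dependencies.

```python
import csv
import io
from collections import defaultdict


def summarize_by_region(csv_text: str) -> str:
    """Total the 'amount' column per 'region' and return the result as CSV text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += float(row["amount"])

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region in sorted(totals):  # sorted for a deterministic row order
        writer.writerow([region, f"{totals[region]:.2f}"])
    return out.getvalue()


# Hypothetical stand-in for a file the visual browser saved into the sandbox.
downloaded = "region,amount\nEMEA,10.5\nAPAC,4\nEMEA,2.5\n"
processed = summarize_by_region(downloaded)
print(processed)
```

In the workflow described above, the `processed` text would be written back to the sandbox file system, where the visual browser could pick it up for upload.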

Outputs and file system

Within the sandbox the agent has a working file system. It can read user-uploaded files (PDFs, spreadsheets, images) and produce deliverables such as editable Excel spreadsheets, PowerPoint and Keynote-compatible decks, and PDF research reports. Spreadsheets and decks are generated by emitting Python code that constructs the underlying .xlsx and .pptx files.[^2][^18][^19] Mobile users receive push notifications when long-running tasks complete.[^19]

Connectors

ChatGPT Agent can call into OpenAI's "connectors" framework, which authenticates the agent to read and (in some cases) write to external services through OAuth or scoped tokens.[^2][^20] Connectors documented as supported at or shortly after launch include Gmail, Google Calendar, Google Drive, GitHub, SharePoint, Outlook, Box, Dropbox, Microsoft Teams, Linear, HubSpot, and (added subsequently) Notion and Slack.[^20][^21] Connectors also leverage the [[model_context_protocol|Model Context Protocol]] (MCP), enabling organizations to expose custom tools to ChatGPT Agent through MCP servers.[^20][^21]

Scheduled tasks

ChatGPT Agent runs can be scheduled to recur via the "Tasks" feature, allowing daily, weekly, or monthly executions; users can manage these at chatgpt.com/schedules. ChatGPT enforces a cap of 10 active tasks per user.[^21][^22]

Representative use cases

The launch announcement and OpenAI's developer cookbook framed ChatGPT Agent as best suited to multi-step "knowledge work" that combines browsing, research, computation, and document production. Examples cited in the launch material and contemporaneous press coverage include:[^1][^2][^11][^19]

Briefing a user on upcoming client meetings by cross-referencing the user's calendar (via the Google Calendar connector) with recent news.
Building competitive analyses—researching multiple companies, synthesizing findings into a long-form report, and producing an editable slide deck.
Generating Japanese-breakfast meal plans, sourcing ingredients via web shopping, and (subject to user confirmation) placing orders.
Running data-analysis tasks on user-uploaded spreadsheets in the terminal and returning an updated workbook.
Scheduling appointments, booking travel, and filling out web forms under user supervision.
For developers and analysts: querying GitHub repositories and Linear issues via connectors, then composing summary documents in the chat.

In its developer documentation OpenAI also published a "workspace agents" cookbook that demonstrated ChatGPT Agent automating sales-meeting prep end-to-end, using connectors to surface CRM context, email threads, and calendar conflicts inside a single agentic workflow.[^20]

Availability and pricing

At its July 17, 2025 launch, ChatGPT Agent was activated by selecting "agent mode" from the tools dropdown in the composer (or by typing the /agent shortcut). Rollout proceeded as follows:[^1][^2][^11]

Pro ($200/month): full access by end of July 17, 2025; quota of 400 agent messages per month.[^1][^2]
Plus ($20/month) and Team: rolling out over the days following launch; quota of 40 agent messages per month.[^1][^2]
Enterprise and Education: "coming weeks."[^1][^2]

Additional usage above the monthly quota was made available "for purchase" via credits.[^1][^2]

Geographic restrictions

ChatGPT Agent was initially unavailable in the European Economic Area (EEA) and Switzerland, mirroring constraints on prior agent products under EU regulatory uncertainty.[^23] OpenAI announced on July 23, 2025 that Pro users in the EEA and Switzerland had been brought into the rollout, with the Plus tier following globally over the next few days; a footnote in OpenAI's Connectors documentation continued to list certain deep-research-style connector features as restricted in the EEA, Switzerland, and the UK.[^23][^24]

Underlying model

The system card identifies ChatGPT Agent as "a new agentic model in the same family as OpenAI o3" that has been fine-tuned with end-to-end reinforcement learning specifically for browser, terminal, and connector tool use.[^7] Press coverage on the day of launch reported that the model had been referred to internally as a successor to o3 (sometimes loosely called "o4" in pre-release rumors) before being released as a dedicated Agent model rather than a separately branded reasoning model.[^17] OpenAI did not publish an independent name for the model; it is exposed only as the underlying engine of the "agent" mode inside ChatGPT.[^7][^11] Its capabilities on standard reasoning benchmarks were broadly similar to [[o3|o3]] with browsing, with the gains concentrated on agentic tasks that require multi-step tool use.[^7][^17]

Safety and the Preparedness Framework

The launch of ChatGPT Agent was accompanied by a 41-page system card describing what OpenAI called the most extensive set of pre-deployment mitigations to date, motivated by both the broader user reach of ChatGPT itself and the new attack surface created by the agent's tools.[^7][^8]

Preparedness Framework: High biological and chemical capability

[[openai_api|OpenAI]]'s Preparedness Framework, updated to version 2 on April 15, 2025, classifies frontier capabilities and prescribes corresponding safeguards. ChatGPT Agent became the first product OpenAI explicitly designated as "High capability in the Biological and Chemical domain," activating the framework's full High-tier mitigation stack.[^7][^8][^9] In the system card, OpenAI wrote: "While we do not have definitive evidence that this model could meaningfully help a novice to create severe biological harm – our defined threshold for High capability – we have chosen to take a precautionary approach."[^7] OpenAI Technical Staff member Boaz Barak said in interviews that "some might think that biorisk is not real … that may have been true in 2024 but is definitely not true today" and that "it would have been deeply irresponsible to release this model without comprehensive mitigations."[^9]

Specific findings cited as motivation included an external SecureBio assessment in which ChatGPT Agent answered 4 of 10 "World-Class Biology" questions correctly compared with an average of 1.5 of 10 for [[o3|o3]], and a "Pathogen Acquisition" exercise in which the agent bypassed a common failure mode of prior models. The UK AI Safety Institute (AISI) was given pre-launch access and identified seven "universal" attacks, all of which OpenAI patched before the public release.[^7][^25]

OpenAI's High-tier safeguard stack for ChatGPT Agent includes:[^7]

Model training to refuse weaponization assistance and to avoid actionable dual-use detail.
A two-tier always-on monitoring system: a fast topical classifier that flags biology-related content (≥0.95 recall) and a slower reasoning monitor that classifies generations against a biothreat taxonomy.
Account-level enforcement that bans users attempting to elicit such content, with referral to law enforcement in extreme cases.
A Rapid Remediation Protocol that patches universal jailbreaks discovered in the wild.
A bug bounty for biology-related universal jailbreaks.
A trusted-access program that allows vetted partners in biodefense and life sciences to opt into a less restricted variant while still blocking weaponization content.[^7]
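The two-tier monitoring item in the list above pairs a cheap, high-recall filter with a slower, more careful check that only runs on flagged content. The sketch below illustrates that arrangement and the recall metric quoted for the first tier. Everything in it is hypothetical: the keyword list, the `fast_topical_flag` and `slow_reasoning_monitor` stand-ins, and the `should_block` helper are toy placeholders, whereas the monitors described in the system card are learned models.

```python
import re

BIO_TERMS = {"pathogen", "toxin", "virus"}  # hypothetical keyword list


def fast_topical_flag(text: str) -> bool:
    """Tier 1: cheap check run on every message; tuned to miss very little."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return bool(words & BIO_TERMS)


def slow_reasoning_monitor(text: str) -> bool:
    """Tier 2 placeholder: stands in for a slower, model-based judgment."""
    return "synthesize" in text.lower()


def should_block(text: str) -> bool:
    # The expensive tier runs only on content the fast tier has flagged.
    return fast_topical_flag(text) and slow_reasoning_monitor(text)


def recall(labels, predictions) -> float:
    """Share of truly positive items that were flagged: TP / (TP + FN)."""
    true_pos = sum(1 for y, p in zip(labels, predictions) if y and p)
    false_neg = sum(1 for y, p in zip(labels, predictions) if y and not p)
    return true_pos / (true_pos + false_neg)


print(should_block("How do I synthesize a toxin?"))
print(recall([True, True, False, True], [True, False, False, True]))
```

A recall target of at least 0.95 for the first tier means that no more than 5% of truly biology-related items go unflagged, which matters because anything the fast tier misses never reaches the second tier.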

Agent-specific mitigations

The system card identifies three product-specific risk categories—prompt injection, the agent making a mistake, and users requesting disallowed tasks—and a corresponding mitigation stack:[^7]

Safety training specifically targeted at resisting prompt injections.
Automated monitors and filters updated in real time as new attacks are observed.
User confirmations required before "actions that affect the state of the world" such as purchases or sending email. OpenAI reports a "confirmation recall" of 91.0%, with 100% recall on critical actions including completing financial transactions and editing cloud-storage permissions.[^7]
"Watch Mode" is triggered when the agent uses the visual browser in a sensitive context (for example while logged into email or a banking site). Once enabled, Watch Mode "automatically paus[es] execution when the user becomes inactive or navigates away from the conversation in ChatGPT," requiring the user to actively observe the rest of the trajectory.[^7][^26]
Takeover mode lets users grab the virtual browser to enter credentials themselves; while the user is in control, OpenAI does not capture screenshots, so passwords are never seen by the model.[^26][^27]
Terminal network restrictions at launch limited outbound requests to GET methods against a whitelist of common datasets such as official government datasets.[^7]
ChatGPT Memory disabled while Agent runs, to mitigate prompt-injection-driven exfiltration of stored personal context.[^7]
A trained refusal layer for high-risk tasks such as bank transfers; the system card reports a 97.0% refusal rate for "Disallowed Financial Activities" such as gambling.[^7]
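The terminal network restriction in the list above amounts to an egress policy: only GET requests, and only to approved hosts. A minimal sketch of such a check follows. The host names in `ALLOWED_HOSTS`, the function name `is_request_allowed`, and the decision to accept only `http` and `https` URLs are assumptions made for illustration; the system card does not publish the actual allowlist or enforcement code.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real list of permitted hosts is not public.
ALLOWED_HOSTS = frozenset({"data.gov", "api.census.gov"})


def is_request_allowed(method: str, url: str,
                       allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Permit only GET requests to exactly matching allowlisted hosts."""
    if method.upper() != "GET":
        return False  # POST, PUT, etc. could carry data out of the sandbox
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower()
    # Exact match, so "data.gov.evil.example" does not pass as "data.gov".
    return host in allowed_hosts


print(is_request_allowed("GET", "https://data.gov/catalog.csv"))
print(is_request_allowed("POST", "https://data.gov/upload"))
```

Matching on the parsed hostname instead of a substring of the URL is the design choice that matters here: a substring test would let an attacker-controlled domain that merely contains an approved name slip through.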

Prompt-injection resistance evaluations in the system card compare ChatGPT Agent with Operator on three test sets: 95% on "Irrelevant instructions – visual browser" (versus 82% for Operator on GPT-4o and 89% for Operator on o3), 78% on "In-context data exfiltration – visual browser" (versus 75%/80%), and 67% on "Active data exfiltration – visual browser" (versus 58%/75%). On these figures ChatGPT Agent beat the GPT-4o-based Operator on all three sets but trailed the o3-based Operator on the two data-exfiltration sets.[^7]

Sam Altman's pre-launch warning to ChatGPT users explicitly tied these mitigations to remaining uncertainty: "bad actors may try to 'trick' users' AI agents into giving private information they shouldn't and take actions they shouldn't, in ways we can't predict," he wrote, recommending that users give agents "the minimum access required to complete a task."[^12][^13]

Benchmark performance

ChatGPT Agent's launch announcement and system card reported state-of-the-art or competitive results on several agentic and reasoning benchmarks. All figures are pass@1 unless otherwise noted; "with tools" indicates the agent had access to its browser and terminal.

Benchmark	Score	Notes
[[humanitys_last_exam|Humanity's Last Exam]]	41.6% pass@1 (44.4% with parallel attempts)	
[[frontiermath|FrontierMath]] (Tiers 1–3)	27.4%	With tool use.
SpreadsheetBench	45.5% (direct editing)	Vs. 20.0% for Copilot in Excel.[^2][^10]
DSBench (data analysis)	89.9%	Above the average human level reported.[^17][^29]
DSBench (data modeling)	85.5%	Above the average human level reported.[^17][^29]
[[browsecomp|BrowseComp]]	68.9%	
[[webarena|WebArena]]	65.4%	
Investment Banking Modeling Test	71.3% mean accuracy	Reported in OpenAI's launch blog; ahead of Deep Research (~55.9%) and o3 (~48.6%).[^17][^29]
SWE-bench Verified	comparable to o3	Software-engineering benchmark; system card notes ChatGPT Agent performed similarly to o3, not better.[^7]
PaperBench	comparable to o3	Replicating ICML 2024 papers; "ChatGPT agent scores are similar to o3's scores, with and without browsing."[^7]

The system card cautions that some benchmarks are susceptible to "browsing contamination," where the model can retrieve evaluation answers from the open web rather than reason through them. OpenAI reported non-browsing variants where contamination was a concern.[^7]
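For readers unfamiliar with the pass@1 notation used in this section, the sketch below computes pass@k with the unbiased estimator that was introduced alongside the HumanEval benchmark and is widely used in evaluation code. It is included only to explain the metric: the sources cited here do not say how OpenAI computed its figures, and the "parallel attempts" numbers may use a different aggregation than pass@k. The function name `pass_at_k` and the sample counts are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the chance that at least one of k samples is correct,
    given that c of n generated samples for a problem were correct."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k draws
    return 1.0 - comb(n - c, k) / comb(n, k)


# With k = 1 the estimator reduces to the plain success rate c / n.
print(pass_at_k(n=10, c=3, k=1))
print(pass_at_k(n=10, c=3, k=2))
```

A benchmark-level score is the mean of this per-problem value over all problems, so a pass@1 of 41.6% means the first attempt was correct on roughly 42 problems in every 100.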

Reception

Press coverage

Tech press uniformly framed ChatGPT Agent as OpenAI's largest bet to date on agentic AI. TechCrunch described it as "a general-purpose agent in ChatGPT," reporting that "Pro, Plus, and Team subscribers" would gain access on rollout day.[^6] The Verge's Hayden Field led with the model rather than the feature, writing that ChatGPT Agent "can control an entire computer and perform multi-step tasks, powered by a new dedicated model."[^11] VentureBeat highlighted the leap from text answers to actions, framing the product as ChatGPT being given "its own computer to autonomously use your email and web apps, download and create files for you."[^19] Analytics India Magazine emphasized that the rollout fused "Operator's action-taking browser and Deep Research's web synthesis" into a single experience.[^29]

Practitioner reviews and criticism

Reviewers who tested the product at launch consistently noted that the agent was slow and somewhat unreliable. Nielsen Norman Group titled its first impressions piece "Successful, Shaky, and Slow," reporting that "tasks that take humans five minutes often take longer with the agent" and that small UI elements or dynamic interfaces sometimes confused it.[^30] Trade-press reviews described the agent as "useful for experimentation and occasional tasks but insufficient for critical business processes," and reported that the agent "is unable to bypass Cloudflare/CAPTCHA verifications" and could abandon tasks behind login walls.[^31] WIRED tested the agent's ability to recite WIRED's own product recommendations and reported that "the AI confidently cited specific TVs, headphones, and laptops as WIRED's top picks, none of which the publication's reviewers had actually endorsed."[^32]

Independent observer Simon Willison, who coined the term "prompt injection," framed ChatGPT Agent as a watershed moment for definitional clarity: "An LLM agent runs tools in a loop to achieve a goal." He noted, however, that browser-using agents combine the "lethal trifecta" of access to private data, exposure to untrusted content, and the ability to communicate externally, and warned that those three properties together make prompt-injection mitigations especially difficult.[^33][^34]
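Willison's one-line definition maps directly onto a small control loop, sketched below with a scripted stand-in for the model. All names here (`run_agent`, `scripted_policy`, `has_lethal_trifecta`, the `add` tool) are hypothetical illustrations; a real agent would replace the scripted policy with a model call and would not be this simple.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (tool name, argument, observation)
Policy = Callable[[str, List[Step]], Tuple[str, str]]


def run_agent(policy: Policy, tools: Dict[str, Callable[[str], str]],
              goal: str, max_steps: int = 10) -> str:
    """Ask the policy for an action, run the tool, record the result, repeat."""
    history: List[Step] = []
    for _ in range(max_steps):
        tool_name, argument = policy(goal, history)
        if tool_name == "finish":  # the policy decides the goal is met
            return argument
        observation = tools[tool_name](argument)
        history.append((tool_name, argument, observation))
    raise RuntimeError("step budget exhausted before the goal was reached")


def has_lethal_trifecta(private_data: bool, untrusted_content: bool,
                        external_comms: bool) -> bool:
    """True only when all three risk properties are present together."""
    return private_data and untrusted_content and external_comms


def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str]:
    # Stand-in for a model: call one tool, then report its observation.
    if not history:
        return ("add", "2,3")
    return ("finish", history[-1][2])


tools = {"add": lambda arg: str(sum(int(x) for x in arg.split(",")))}
answer = run_agent(scripted_policy, tools, goal="add 2 and 3")
print(answer)
```

The trifecta check makes the warning concrete: removing any one of the three properties, for example by cutting off external communication, makes the function return `False`, which is why mitigations often target a single leg instead of trying to detect every injected instruction.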

Adoption metrics circulated shortly after launch suggested that the 40-message Plus quota was a binding constraint: one analysis reported that "73% of ChatGPT Plus users burned through their monthly allocation of 40 agent runs within the first week of having access."[^31]

Limitations and criticism

Independent testing converged on several limitations of ChatGPT Agent at launch:

Latency and overhead. End-to-end runs often took minutes to hours; Nielsen Norman Group reported that "a booking form took 30 seconds for a human but the agent flailed for eight minutes."[^30]
CAPTCHA and anti-bot defenses. Sites with Cloudflare or similar defenses regularly blocked the agent. The agent was reported as more likely to abandon a task than to ask the user for help in such cases.[^31][^32]
Hallucination of cited content. WIRED documented invented "phantom" recommendations in the agent's research outputs, with the agent providing "no citations, no confidence scores, no indication that users should verify its claims" in some interactions.[^32]
Quota exhaustion. With 40 messages per month on Plus and Team, sustained use was difficult; supplemental credits were available for purchase but were not the default experience.[^2][^31]
Bio-risk safeguards. While OpenAI framed the High-capability classification as a precaution, biosecurity commentary noted that ChatGPT Agent's launch was a "first" in the industry; Transformer described the move as OpenAI "hit[ting] the biorisk alarm" and contrasted it with xAI's contemporaneous launch of Grok 4 "without safety documentation or testing disclosures."[^25]
Prompt-injection residual risk. OpenAI itself emphasized in the system card and in subsequent commentary that prompt injection in browser agents is unlikely to be fully eliminated, and Sam Altman publicly cautioned users against entrusting ChatGPT Agent with sensitive personal information at launch.[^7][^12][^13]

Successor products and evolution

ChatGPT Agent has continued to evolve through 2025 and 2026:

Operator deprecation (August 31, 2025). The standalone Operator preview at operator.chatgpt.com was shut down, with its visual-browser capabilities folded into ChatGPT Agent.[^3][^14]
Enterprise and Education rollout. Enterprise and Education customers gained access in the weeks following the initial Pro/Plus/Team release.[^21]
[[agentkit|AgentKit]] (October 6, 2025). At DevDay 2025, OpenAI introduced AgentKit, a developer-facing toolkit for building agents from scratch—including a visual "Agent Builder," ChatKit for embedded chat UIs, and an evaluations module—positioning ChatGPT Agent as the consumer-facing counterpart to a broader agent platform.[^35][^36]
[[codex|Codex]] and [[gpt_5_codex|GPT-5 Codex]] (October 2025). OpenAI's coding agent Codex, rebuilt on a GPT-5-family model specialized for agentic coding, became OpenAI's developer-facing analog to ChatGPT Agent for software engineering; OpenAI noted that Codex built "80% of the Agent Builder tool in under 6 weeks."[^36]
[[chatgpt_atlas|ChatGPT Atlas]]. OpenAI's ChatGPT-integrated web browser later inherited ChatGPT Agent's agentic mode, exposing it through an "Ask ChatGPT" sidebar and an agentic action mode; OpenAI subsequently published continuous work on hardening Atlas against prompt injection, with leadership stating publicly that prompt injection may never be fully "solved" in browser agents.[^37][^38]
Underlying model upgrades. OpenAI's ChatGPT release notes have since rolled out newer default models (including the GPT-5.x family), with ChatGPT Agent inheriting underlying model improvements in subsequent versions.[^21]

ChatGPT Agent's launch also reframed the competitive landscape for generalist agents: it was widely compared with [[anthropic_computer_use|Anthropic Computer Use]] (released October 22, 2024), [[devin|Devin]], [[manus_ai|Manus AI]], and Google's then-incoming Gemini agentic features, and was the first major release to combine a graphical computer-use agent with a deep-research-style autonomous browsing agent and a Python terminal under a single user-facing mode.[^11][^17]
