ChatGPT Agent
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,106 words
ChatGPT Agent is an agentic feature of [[chatgpt|ChatGPT]] from [[sam_altman|Sam Altman]]'s [[openai_api|OpenAI]] that gives the chatbot its own virtual computer—complete with a graphical browser, a text browser, a Linux-style terminal, code execution, and connectors to external apps—so the model can complete multi-step tasks on a user's behalf rather than only producing text answers.[^1][^2] It launched on July 17, 2025, with a livestream introduction by Sam Altman alongside the project team of Casey Chu, Isa Fulford, Yash Kumar, and Zhiqing Sun, and it unified two earlier OpenAI agent products: the browsing-and-clicking assistant [[openai_operator|OpenAI Operator]] (released January 23, 2025) and the long-running research tool [[deep_research|Deep Research]] (released February 2, 2025).[^1][^2][^3][^4][^5]
OpenAI positioned ChatGPT Agent as a "unified agentic system" that "fluidly shift[s] between reasoning and action to handle complex workflows from start to finish," with example tasks including "look at my calendar and brief me on upcoming client meetings based on recent news," "plan and buy ingredients to make Japanese breakfast for four," and "analyze three competitors and create a slide deck."[^1][^2] At launch the feature was made available inside ChatGPT to subscribers on the Pro ($200/month), Plus ($20/month), and Team tiers, with Enterprise and Education access following in subsequent weeks; Pro subscribers received 400 agent messages per month and Plus and Team users received 40 per month.[^2][^6] The system card classified the underlying model as the first OpenAI release treated as "High capability" in the Biological and Chemical domain under OpenAI's Preparedness Framework, activating an unprecedented set of safety mitigations.[^7][^8][^9]
ChatGPT Agent introduced new agentic benchmark records at the time of release, including a state-of-the-art pass@1 score of 41.6% on [[humanitys_last_exam|Humanity's Last Exam]] (44.4% with parallel attempts), 27.4% on [[frontiermath|FrontierMath]] (Tiers 1–3) with tool use, 45.5% on SpreadsheetBench, 68.9% on [[browsecomp|BrowseComp]], and 65.4% on [[webarena|WebArena]].[^2][^10][^11] It nonetheless drew a cautious launch message from Altman, who described the product as "cutting edge and experimental … a chance to try the future, but not something I'd yet use for high-stakes uses or with a lot of personal information until we have a chance to study and improve it in the wild."[^12][^13] The standalone Operator preview at operator.chatgpt.com was deprecated and shut down on August 31, 2025, with its capabilities absorbed into ChatGPT Agent.[^3][^14]
OpenAI's path to ChatGPT Agent passed through two distinct agent products released earlier in 2025.
[[openai_operator|Operator]] debuted on January 23, 2025, as an autonomous web-browsing agent built on a new "Computer-Using Agent" (CUA) model that combined the vision capabilities of [[gpt_4o|GPT-4o]] with reasoning from OpenAI's [[o3|o-series models]].[^4][^15] Operator opened a remote Chromium-based browser inside ChatGPT and could click, type, and fill forms in much the same way a human would, "without need[ing] to use developer-facing APIs."[^4][^15] At launch it was restricted to U.S. customers on the $200/month Pro tier and was framed as a "research preview"; OpenAI announced partnerships with DoorDash, eBay, Instacart, Priceline, StubHub, and Uber to ensure that Operator respected those services' terms of use.[^4][^15] Independent assessments later found that Operator reached roughly 38.1% accuracy on OS-level tasks and 58.1% on web tasks, "[not reaching] human-level accuracy" and struggling with multi-step workflows.[^14]
[[deep_research|Deep Research]] launched on February 2, 2025, as a separate, slower agent that conducted 5–30 minute autonomous browsing sessions to produce long-form, cited research reports.[^5][^16] Built on a variant of [[o3|OpenAI o3]], Deep Research was initially capped at 100 queries per month for Pro users and scored 26.6% on Humanity's Last Exam at release, well ahead of GPT-4o (3.3%) and DeepSeek R1 (9.4%).[^5][^16] OpenAI expanded Deep Research to Plus, Team, Enterprise, and Education users on February 25, 2025, and on April 24, 2025, raised query allocations to 250/month for Pro, 25/month for Plus, Team, Enterprise, and Edu, and 5/month for Free.[^5][^16]
ChatGPT Agent's system card characterizes the new model as combining "Deep research's ability to conduct multi-step research and generate high-quality reports" with "Operator's capacity to execute tasks through a remote visual browser environment," supplemented with a new terminal and connector layer.[^7] Yash Kumar, the product lead, told The Verge that the move from Operator to Agent meant ChatGPT now had access to "an entire computer" rather than just a browser.[^11]
ChatGPT Agent runs inside an OpenAI-controlled sandbox—a virtual machine with persistent state across the duration of a task—rather than on the user's device.[^2][^17] The sandbox exposes four primary tools to the model:[^2][^7]
State is preserved across tool switches, so the agent can, for example, log into a website in the visual browser, download a CSV, run it through pandas in the terminal, and then upload the processed file back through the visual browser.[^7][^17][^18] Sessions have been demonstrated lasting up to about two hours, with the agent reasoning, browsing, and acting in a continuous loop.[^17]
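The terminal step of that download, process, and upload workflow can be pictured with a short sketch. The article mentions pandas; this version swaps in Python's standard `csv` module so that it runs without extra packages. The column names (`region`, `amount`) and the sample data are invented for illustration and do not come from any OpenAI material.

```python
import csv
import io
from collections import defaultdict
from typing import Dict

# Hypothetical CSV content, standing in for a file the agent
# downloaded through the visual browser.
RAW_CSV = """region,amount
north,120.50
south,80.00
north,30.25
"""


def summarize(csv_text: str) -> Dict[str, float]:
    """Sum the `amount` column for each `region`."""
    totals: Dict[str, float] = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["region"]] += float(row["amount"])
    return dict(totals)


def to_csv(totals: Dict[str, float]) -> str:
    """Serialize the totals back to CSV text, ready to be uploaded."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region in sorted(totals):
        writer.writerow([region, f"{totals[region]:.2f}"])
    return out.getvalue()


totals = summarize(RAW_CSV)
processed = to_csv(totals)
```

Because the sandbox keeps its file system between tool switches, a file written at this step would still be present when the agent returns to the browser to upload it.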
Within the sandbox the agent has a working file system. It can read user-uploaded files (PDFs, spreadsheets, images) and produce deliverables such as editable Excel spreadsheets, PowerPoint and Keynote-compatible decks, and PDF research reports. The spreadsheets and decks are built by emitting Python code that constructs the underlying .xlsx and .pptx files.[^2][^18][^19] Mobile users receive push notifications when long-running tasks complete.[^19]
ChatGPT Agent can call into OpenAI's "connectors" framework, which authenticates the agent to read and (in some cases) write to external services through OAuth or scoped tokens.[^2][^20] Connectors documented as supported at or shortly after launch include Gmail, Google Calendar, Google Drive, GitHub, SharePoint, Outlook, Box, Dropbox, Microsoft Teams, Linear, HubSpot, and (added subsequently) Notion and Slack.[^20][^21] Connectors also leverage the [[model_context_protocol|Model Context Protocol]] (MCP), enabling organizations to expose custom tools to ChatGPT Agent through MCP servers.[^20][^21]
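As a rough illustration of what an MCP exchange looks like on the wire: MCP messages are JSON-RPC 2.0, and the protocol defines `tools/list` and `tools/call` methods for discovering and invoking tools. The sketch below only builds the request payloads. The tool name `lookup_ticket` and its `ticket_id` argument are hypothetical, and how ChatGPT Agent issues such calls internally is not described in the sources.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # monotonically increasing JSON-RPC request ids


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request string for an MCP server."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it exposes.
list_tools = jsonrpc_request("tools/list")

# Invoke one tool by name. `lookup_ticket` is a made-up example tool.
call_tool = jsonrpc_request(
    "tools/call",
    {"name": "lookup_ticket", "arguments": {"ticket_id": "ABC-123"}},
)
```

An organization's MCP server would answer the second request with the tool's result, which the client can then pass back to the model.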
ChatGPT Agent runs can be scheduled to recur via the "Tasks" feature, allowing daily, weekly, or monthly executions; users can manage these at chatgpt.com/schedules. ChatGPT enforces a cap of 10 active tasks per user.[^21][^22]
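The scheduling rules described here (three cadences and a ceiling of 10 active tasks) can be modeled in a few lines. This is a toy sketch for illustration only: `TaskRegistry` and its methods are hypothetical names, not part of any OpenAI API, and the real service's behavior when the cap is reached is not documented in the sources.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_ACTIVE_TASKS = 10  # per-user cap stated in the article
ALLOWED_CADENCES = {"daily", "weekly", "monthly"}


@dataclass
class TaskRegistry:
    """Hypothetical in-memory registry of one user's scheduled tasks."""

    active: Dict[str, str] = field(default_factory=dict)  # name -> cadence

    def schedule(self, name: str, cadence: str) -> None:
        if cadence not in ALLOWED_CADENCES:
            raise ValueError(f"unsupported cadence: {cadence}")
        # Re-scheduling an existing task does not consume a new slot.
        if name not in self.active and len(self.active) >= MAX_ACTIVE_TASKS:
            raise RuntimeError("active task cap reached")
        self.active[name] = cadence

    def cancel(self, name: str) -> None:
        self.active.pop(name, None)


registry = TaskRegistry()
for i in range(MAX_ACTIVE_TASKS):
    registry.schedule(f"report-{i}", "weekly")
```

With the registry full, scheduling an eleventh task raises an error until an existing one is cancelled.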
The launch announcement and OpenAI's developer cookbook framed ChatGPT Agent as best suited to multi-step "knowledge work" that combines browsing, research, computation, and document production. Examples cited in the launch material and contemporaneous press coverage include:[^1][^2][^11][^19]
In its developer documentation OpenAI also published a "workspace agents" cookbook that demonstrated ChatGPT Agent automating sales-meeting prep end-to-end, using connectors to surface CRM context, email threads, and calendar conflicts inside a single agentic workflow.[^20]
At its July 17, 2025 launch, ChatGPT Agent was activated by selecting "agent mode" from the tools dropdown in the composer (or by typing the /agent shortcut). Rollout proceeded as follows:[^1][^2][^11]
Additional usage above the monthly quota was made available "for purchase" via credits.[^1][^2]
ChatGPT Agent was initially unavailable in the European Economic Area (EEA) and Switzerland, mirroring constraints on prior agent products under EU regulatory uncertainty.[^23] OpenAI announced on July 23, 2025 that Pro users in the EEA and Switzerland had been brought into the rollout, with the Plus tier following globally over the next few days; a footnote in OpenAI's Connectors documentation continued to list certain deep-research-style connector features as restricted in the EEA, Switzerland, and the UK.[^23][^24]
The system card identifies ChatGPT Agent as "a new agentic model in the same family as OpenAI o3" that has been fine-tuned with end-to-end reinforcement learning specifically for browser, terminal, and connector tool use.[^7] Press coverage on the day of launch reported that the model had been referred to internally as a successor to o3 (sometimes loosely called "o4" in pre-release rumors) before being released as a dedicated Agent model rather than a separately branded reasoning model.[^17] OpenAI did not publish an independent name for the model; it is exposed only as the underlying engine of the "agent" mode inside ChatGPT.[^7][^11] Its capabilities on standard reasoning benchmarks were broadly similar to [[o3|o3]] with browsing, with the gains concentrated on agentic tasks that require multi-step tool use.[^7][^17]
The launch of ChatGPT Agent was accompanied by a 41-page system card describing what OpenAI called the most extensive set of pre-deployment mitigations to date, motivated by both the broader user reach of ChatGPT itself and the new attack surface created by the agent's tools.[^7][^8]
[[openai_api|OpenAI]]'s Preparedness Framework, updated to version 2 on April 15, 2025, classifies frontier capabilities and prescribes corresponding safeguards. ChatGPT Agent became the first product OpenAI explicitly designated as "High capability in the Biological and Chemical domain," activating the framework's full High-tier mitigation stack.[^7][^8][^9] In the system card, OpenAI wrote: "While we do not have definitive evidence that this model could meaningfully help a novice to create severe biological harm – our defined threshold for High capability – we have chosen to take a precautionary approach."[^7] OpenAI technical staff member Boaz Barak said in interviews that "some might think that biorisk is not real … that may have been true in 2024 but is definitely not true today" and that "it would have been deeply irresponsible to release this model without comprehensive mitigations."[^9]
Specific findings cited as motivation included an external SecureBio assessment in which ChatGPT Agent answered 4 of 10 "World-Class Biology" questions correctly compared with an average of 1.5 of 10 for [[o3|o3]], and a "Pathogen Acquisition" exercise in which the agent bypassed a common failure mode of prior models. The UK AI Safety Institute (AISI) was given pre-launch access and identified seven "universal" attacks, all of which OpenAI patched before the public release.[^7][^25]
OpenAI's High-tier safeguard stack for ChatGPT Agent includes:[^7]
The system card identifies three product-specific risk categories—prompt injection, the agent making a mistake, and users requesting disallowed tasks—and a corresponding mitigation stack:[^7]
Prompt-injection resistance evaluations in the system card compare ChatGPT Agent with Operator on three visual-browser test sets: 95% on "Irrelevant instructions – visual browser" (versus 82% for Operator on GPT-4o and 89% for Operator on o3), 78% on "In-context data exfiltration – visual browser" (versus 75%/80%), and 67% on "Active data exfiltration – visual browser" (versus 58%/75%). On these figures the agent improves on the GPT-4o-based Operator on all three sets, while the o3-based Operator scores higher on the two data-exfiltration sets.[^7]
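The comparison is easier to read as percentage-point differences. The sketch below takes the quoted figures at face value and assumes that each "x%/y%" pair is ordered as Operator on GPT-4o, then Operator on o3, matching the first parenthetical. The dictionary keys are shorthand labels chosen for this example.

```python
# Scores (percent) as quoted in the paragraph above.
RESULTS = {
    "Irrelevant instructions": {"agent": 95, "operator_gpt4o": 82, "operator_o3": 89},
    "In-context data exfiltration": {"agent": 78, "operator_gpt4o": 75, "operator_o3": 80},
    "Active data exfiltration": {"agent": 67, "operator_gpt4o": 58, "operator_o3": 75},
}

BASELINES = ("operator_gpt4o", "operator_o3")


def point_deltas(results):
    """Return agent-minus-baseline differences in percentage points."""
    return {
        test: {base: row["agent"] - row[base] for base in BASELINES}
        for test, row in results.items()
    }


deltas = point_deltas(RESULTS)
for test, row in deltas.items():
    print(f"{test}: {row['operator_gpt4o']:+d} vs GPT-4o, {row['operator_o3']:+d} vs o3")
```

A negative value means the agent scored below that baseline on that test set.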
Sam Altman's pre-launch warning to ChatGPT users explicitly tied these mitigations to remaining uncertainty: "bad actors may try to 'trick' users' AI agents into giving private information they shouldn't and take actions they shouldn't, in ways we can't predict," he wrote, recommending that users give agents "the minimum access required to complete a task."[^12][^13]
ChatGPT Agent's launch announcement and system card reported state-of-the-art or competitive results on several agentic and reasoning benchmarks. All figures are pass@1 unless otherwise noted; "with tools" indicates the agent had access to its browser and terminal.
| Benchmark | Score | Notes |
|---|---|---|
| [[humanitys_last_exam|Humanity's Last Exam]] | 41.6% pass@1 (44.4% with parallel attempts) | |
| [[frontiermath|FrontierMath]] (Tiers 1–3) | 27.4% | With tool use. |
| SpreadsheetBench | 45.5% (direct editing) | Vs. 20.0% for Copilot in Excel.[^2][^10] |
| DSBench (data analysis) | 89.9% | Above the average human level reported.[^17][^29] |
| DSBench (data modeling) | 85.5% | Above the average human level reported.[^17][^29] |
| [[browsecomp|BrowseComp]] | 68.9% | |
| [[webarena|WebArena]] | 65.4% | |
| Investment Banking Modeling Test | 71.3% mean accuracy | Reported in OpenAI's launch blog; ahead of Deep Research (~55.9%) and o3 (~48.6%).[^17][^29] |
| SWE-bench Verified | comparable to o3 | Software-engineering benchmark; system card notes ChatGPT Agent performed similarly to o3, not better.[^7] |
| PaperBench | comparable to o3 | Replicating ICML 2024 papers; "ChatGPT agent scores are similar to o3's scores, with and without browsing."[^7] |
The system card cautions that some benchmarks are susceptible to "browsing contamination," where the model can retrieve evaluation answers from the open web rather than reason through them. OpenAI reported non-browsing variants where contamination was a concern.[^7]
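For readers unfamiliar with the pass@1 notation used in the table, the sketch below shows how such a score is typically computed: the fraction of problems solved on a single attempt. It also includes the widely used unbiased pass@k estimator from the code-generation evaluation literature. Whether OpenAI computed its figures exactly this way is not stated in the sources, the "parallel attempts" figure may rely on a different selection strategy that is not modeled here, and the sample data is invented.

```python
from math import comb
from typing import List


def pass_at_1(first_attempt_correct: List[bool]) -> float:
    """Fraction of problems solved on the first (only) attempt."""
    return sum(first_attempt_correct) / len(first_attempt_correct)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k for one problem.

    n: samples drawn, c: samples that were correct, k: attempt budget.
    Equals 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    size-k subset of the n samples contains no correct sample.
    """
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


# Invented example: 5 problems, 2 solved on the first try.
score = pass_at_1([True, False, True, False, False])
```

With `k = 1` the estimator reduces to `c / n`, so averaging it over problems gives the same quantity as `pass_at_1`.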
Tech press uniformly framed ChatGPT Agent as OpenAI's largest bet to date on agentic AI. TechCrunch described it as "a general-purpose agent in ChatGPT," reporting that "Pro, Plus, and Team subscribers" would gain access on rollout day.[^6] The Verge's Hayden Field led with the model rather than the feature, writing that ChatGPT Agent "can control an entire computer and perform multi-step tasks, powered by a new dedicated model."[^11] VentureBeat highlighted the leap from text answers to actions, framing the product as ChatGPT being given "its own computer to autonomously use your email and web apps, download and create files for you."[^19] Analytics India Magazine emphasized that the rollout fused "Operator's action-taking browser and Deep Research's web synthesis" into a single experience.[^29]
Reviewers who tested the product at launch consistently noted that the agent was slow and somewhat unreliable. Nielsen Norman Group titled its first impressions piece "Successful, Shaky, and Slow," reporting that "tasks that take humans five minutes often take longer with the agent" and that small UI elements or dynamic interfaces sometimes confused it.[^30] Trade-press reviews described the agent as "useful for experimentation and occasional tasks but insufficient for critical business processes," and reported that the agent "is unable to bypass Cloudflare/CAPTCHA verifications" and could abandon tasks behind login walls.[^31] WIRED tested the agent's ability to recite WIRED's own product recommendations and reported that "the AI confidently cited specific TVs, headphones, and laptops as WIRED's top picks, none of which the publication's reviewers had actually endorsed."[^32]
Independent observer Simon Willison, who coined the term "prompt injection," framed ChatGPT Agent as a watershed moment for definitional clarity: "An LLM agent runs tools in a loop to achieve a goal." He noted, however, that browser-using agents combine the "lethal trifecta" of access to private data, exposure to untrusted content, and the ability to communicate externally, and warned that those three properties together make prompt-injection mitigations especially difficult.[^33][^34]
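Willison's one-line definition maps directly onto a small control loop. The sketch below is a generic illustration of "tools in a loop," not ChatGPT Agent's implementation. The model is replaced by a scripted stand-in function, and `run_agent`, `scripted_model`, `add_one`, and `has_lethal_trifecta` are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List, Tuple

# A step is ("call", tool_name, argument) or ("done", final_answer, "").
Step = Tuple[str, str, str]


def run_agent(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 10,
) -> str:
    """Ask the model for a step, run the tool, feed back the result, repeat."""
    history = [f"goal: {goal}"]
    for _ in range(max_steps):
        kind, name, arg = model(history)
        if kind == "done":
            return name  # for "done" steps the second field is the answer
        result = tools[name](arg)
        history.append(f"{name}({arg}) -> {result}")
    raise RuntimeError("step budget exhausted before the goal was reached")


def scripted_model(history: List[str]) -> Step:
    """Stand-in for an LLM: call one tool, then report its result."""
    if len(history) == 1:
        return ("call", "add_one", "41")
    return ("done", history[-1].split(" -> ")[1], "")


TOOLS = {"add_one": lambda s: str(int(s) + 1)}


def has_lethal_trifecta(private_data: bool, untrusted_content: bool,
                        external_comms: bool) -> bool:
    """True only when all three risk properties are present together."""
    return private_data and untrusted_content and external_comms


answer = run_agent(scripted_model, TOOLS, "compute 41 + 1")
```

The trifecta check makes the warning concrete: removing any one of the three properties, for instance by withholding access to private data, makes the function return `False`.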
Adoption metrics circulated shortly after launch suggested that the 40-message Plus quota was a binding constraint: one analysis reported that "73% of ChatGPT Plus users burned through their monthly allocation of 40 agent runs within the first week of having access."[^31]
Independent testing converged on several limitations of ChatGPT Agent at launch:
ChatGPT Agent has continued to evolve through 2025 and 2026:
ChatGPT Agent's launch also reframed the competitive landscape for generalist agents: it was widely compared with [[anthropic_computer_use|Anthropic Computer Use]] (released October 22, 2024), [[devin|Devin]], [[manus_ai|Manus AI]], and Google's then-incoming Gemini agentic features, and was the first major release to combine a graphical computer-use agent with a deep-research-style autonomous browsing agent and a Python terminal under a single user-facing mode.[^11][^17]