GPT-5

GPT-5 is OpenAI's flagship large language model, first released on August 7, 2025. It represents a fundamental shift in OpenAI's model strategy: rather than maintaining separate model families for different capabilities (such as GPT-4o for speed and the o-series for reasoning), GPT-5 unifies these into a single system with built-in "thinking" capabilities and a real-time router that selects the appropriate level of reasoning for each query ^[1]. The model family has been updated several times since launch, with GPT-5.1 arriving on November 12, 2025, GPT-5.2 on December 11, 2025, GPT-5.3 Instant on March 3, 2026, GPT-5.4 on March 5, 2026, and GPT-5.5 on April 23, 2026 ^[2]^[3]^[4]^[24]^[27].

At launch, GPT-5 set new state-of-the-art results on multiple benchmarks, including 94.6% on AIME 2025 (mathematics), 74.9% on SWE-bench Verified (software engineering), and 84.2% on MMMU (multimodal understanding). It also showed a significant reduction in hallucinations, producing roughly six times fewer factual errors than its predecessor o3 when using its thinking mode ^[1]. Sam Altman introduced the model at launch as having "a legitimate PhD-level expert in anything" and described it as "like having a team of Ph.D.-level experts in your pocket" ^[25]^[26].

Background and Development

By mid-2025, OpenAI was maintaining two separate product lines: the GPT-4o family, optimized for low-latency conversational use, and the o-series (o1, o3), designed for complex reasoning tasks requiring chain-of-thought processing. This split created confusion for both developers and end users, who had to choose between models with different strengths and could not get both fast responses and deep reasoning from the same system ^[1].

GPT-5 was built to solve this problem. The model incorporates a unified architecture with three components: an efficient model that handles straightforward queries quickly, a deeper reasoning model (GPT-5 "thinking") for harder problems, and a real-time router that automatically decides which component to engage based on conversation type, problem complexity, tool needs, and explicit user intent. From the user's perspective, it is a single model that adapts its behavior to the difficulty of the question ^[1].

The development of GPT-5 took place against a backdrop of intensifying competition among frontier AI labs, with Anthropic's Claude models, Google's Gemini series, and open-source efforts like DeepSeek all making rapid progress. By the time of GPT-5's launch, the AI industry had entered a phase of rapid iteration where major model releases from competing labs were separated by weeks rather than months.

OpenAI's approach with GPT-5 also reflected a strategic bet on model unification. Rather than continuing to fragment its offerings across multiple model families with different API endpoints, pricing structures, and capability profiles, the company consolidated everything into a single product line. This simplified the developer experience and reduced the need for complex model-selection logic in production applications.

In the run-up to launch, OpenAI had been hinting at GPT-5 for more than a year. Altman had repeatedly suggested that the next flagship would close the gap between assistants and "experts," framing the company's mission around delivering AGI-grade capability inside ChatGPT ^[25]. The marketing language at launch leaned into this expectation, with OpenAI describing GPT-5 as "our smartest, fastest, most useful model yet, with built-in thinking" ^[1].

Launch event

OpenAI unveiled GPT-5 in a one-hour livestream on August 7, 2025, beginning at 10:00 a.m. Pacific Time on the company's YouTube channel and X profile. Sam Altman led the presentation alongside more than a dozen OpenAI staff who demonstrated the model on tasks ranging from competitive math problems to building a working web application from a single prompt. Altman compared the experience of returning to GPT-4 after using GPT-5 to switching from a pixelated phone display back to a non-retina screen, saying earlier models felt "quite miserable" by comparison ^[25]^[26]. He also emphasized that GPT-5's ability to "instantaneously create an entire piece of computer software" would define the model's appeal, coining the phrase "software on demand" to describe the workflow ^[26]^[34].

The livestream also introduced four selectable ChatGPT personalities, named Cynic, Robot, Listener, and Nerd, which let users dial in conversational tone without crafting a system prompt. ChatGPT Pro subscribers gained the ability to connect Gmail, Google Calendar, and Google Contacts so that GPT-5 could draft replies, schedule meetings, and answer questions about personal correspondence inside a single chat ^[26]^[34].

GPT-5 (August 2025)

Unified Architecture and Router System

GPT-5's most important technical innovation is its unified architecture with an intelligent routing system. The system comprises three integrated components ^[1]^[21]:

Fast model: A smart, efficient model that handles most queries with quick responses, similar in speed to GPT-4o.
Thinking model: A deeper reasoning model (GPT-5 thinking) for complex problems, producing extended chains of thought similar to the o-series.
Real-time router: An intelligent dispatcher that analyzes each query and determines which component to engage.

The router's decision-making process considers multiple factors: conversation type, complexity, tool requirements, and explicit user intent. OpenAI reports the router correctly identifies complexity in 94% of cases, with continuous improvement through reinforcement learning ^[21].

When thinking mode is engaged, GPT-5 produces 22% fewer major errors compared to standard (non-thinking) mode and dramatically improves performance on expert-level questions, from 6.3% to 24.8% accuracy ^[21]. The thinking mode achieves superior results using 50-80% fewer tokens than o3 across visual reasoning, agentic coding, and scientific problem-solving ^[1].

Developers can also override the router's decisions. The API supports explicit control over whether thinking mode is engaged, giving developers the ability to force deep reasoning for specific queries or disable it for latency-sensitive applications. The system exposes a reasoning_effort parameter (with values such as minimal, low, medium, and high) and a verbosity parameter, allowing developers to dial in cost, latency, and answer length on a per-request basis ^[5].

OpenAI has not publicly disclosed the underlying training compute, parameter count, or model architecture for the components of GPT-5. The company described the system in launch materials as "a system" rather than a single monolithic network, and analysts have generally treated the fast model and thinking model as distinct underlying weights coordinated by the router ^[1]^[9]^[21].

Specifications

GPT-5 launched with the following API-level specifications:

Specification	Value
Context window (input)	272,000 tokens
Maximum output	128,000 tokens
Model variants	gpt-5, gpt-5-mini, gpt-5-nano, GPT-5 Pro
Initial snapshot	gpt-5-2025-08-07
Thinking mode	Built-in, automatic or user-controlled
Modalities	Text, image, audio (input and output)
API input pricing	$1.25 per 1M tokens
API output pricing	$10.00 per 1M tokens
Cached input pricing	$0.125 per 1M tokens (90% discount)

The 272K-token context window represented a significant increase over GPT-4o's 128K tokens. The model also supported parallel tool use, built-in web search, and native audio processing ^[1]^[5].

OpenAI released several model sizes at launch. GPT-5 (standard) served as the main offering for complex tasks. GPT-5 mini provided a balance of capability and speed at lower cost. GPT-5 nano, priced at $0.05 per million input tokens, targeted high-volume, cost-sensitive applications like classification and extraction. GPT-5 Pro, a higher-effort reasoning configuration that uses substantially more inference compute, was made available to ChatGPT Pro subscribers and was not initially exposed in the public API at launch ^[5]^[26].

The API also exposed fixed-date snapshots so applications could pin to consistent behavior. The launch snapshot was gpt-5-2025-08-07, with the alias gpt-5 updating as OpenAI released improvements ^[5].

The caching system was a notable addition to the API offering. Tokens that appeared in a prompt recently submitted to the API were automatically cached, and subsequent requests reusing those cached tokens were charged at a 90% discount ($0.125 per million tokens instead of $1.25). This dramatically reduced costs for applications that made repeated calls with overlapping context, such as multi-turn conversations or iterative code generation workflows ^[5].

Variant lineup at launch

Variant	API input price	API output price	Positioning
GPT-5	$1.25 / 1M tokens	$10.00 / 1M tokens	Default flagship for complex tasks
GPT-5 mini	$0.25 / 1M tokens	$2.00 / 1M tokens	Balanced cost/quality
GPT-5 nano	$0.05 / 1M tokens	$0.40 / 1M tokens	High-volume, low-latency
GPT-5 Pro	Not in public API at launch	Not in public API at launch	ChatGPT Pro subscribers; deeper extended reasoning

At launch the standard gpt-5 API endpoint defaulted to medium reasoning effort, with developers able to escalate to higher effort levels for harder problems or downshift to minimal to keep responses snappy and cheap ^[5].

Benchmark Performance

GPT-5 set new state-of-the-art results across several categories at launch:

Benchmark	Category	GPT-5 (Thinking)	o3	GPT-4o
AIME 2025	Mathematics	94.6%	79.2%	26.7%
SWE-bench Verified	Software engineering	74.9%	69.1%	38.0%
MMMU	Multimodal understanding	84.2%	74.9%	69.1%
GPQA Diamond	Graduate-level science	81.6%	79.7%	53.6%
Aider Polyglot	Coding (multi-language)	88.0%	-	45.3%

With the Pro variant and Python tools enabled, GPT-5 scored a perfect 100% on AIME 2025. Even without tools, the thinking variant reached 99.6%. With extended reasoning, GPT-5 Pro also set a state-of-the-art on GPQA Diamond at 88.4% without tools, well above any score posted by an OpenAI predecessor at the time of release ^[1]^[26].

The AIME (American Invitational Mathematics Examination) results were particularly noteworthy because these are competition-level math problems designed for top high school students. A score of 94.6% without tools placed GPT-5 well above the performance of the vast majority of human test-takers.

Independent benchmarking sites such as Vellum, llm-stats.com, and the LM Council leaderboard published their own evaluations in the days after launch. They generally corroborated OpenAI's claim that GPT-5 thinking matched or beat o3 on most reasoning workloads while using a fraction of the output tokens, but several reviewers cautioned that the gap to Claude and Gemini on agentic coding was smaller than OpenAI's own charts suggested ^[9]^[13].

Tool use and agentic coding

On agentic coding evaluations, GPT-5 reported strong but not category-leading results at launch. OpenAI's own benchmark page showed GPT-5 thinking at 88.0% on Aider Polyglot, a clear improvement over o3, alongside 74.9% on SWE-bench Verified using OpenAI's published harness. On the Tau-bench retail and airline customer-service environments (often referred to as Tau-bench) GPT-5 also led the o-series for tool use, though the gap to Anthropic's Claude family on the same suite was narrower than the SWE-bench gap ^[1]^[13].

For developers, the model exposed structured outputs (JSON Schema), function calling, parallel tool calls, file search, web search, image input, and audio input/output through the standard chat completions and Responses APIs. OpenAI also rolled out a custom tools interface that lets developers describe tools in plain text rather than rigid JSON, which was pitched as a better fit for the way GPT-5 reasons about tool selection inside long sessions ^[5].

Multimodal Capabilities

GPT-5 launched as a natively multimodal model, capable of processing text, images, and audio as both inputs and outputs. This represented a continuation of the multimodal approach introduced with GPT-4o but with significantly enhanced capabilities.^[1]

On the MMMU (Massive Multi-discipline Multimodal Understanding) benchmark, which tests a model's ability to reason about images, diagrams, and charts across academic disciplines, GPT-5 Thinking scored 84.2%, compared to GPT-4o's 69.1%. The model demonstrated particular strength on tasks requiring joint reasoning across text and visual inputs, such as interpreting scientific diagrams, analyzing financial charts, and solving geometry problems presented as images.^[1]

GPT-5.2 further expanded multimodal performance, with high scores on MMMU-Pro (86.5%) and Video-MMMU (90.5%). The Video-MMMU results suggested a powerful, natively multimodal architecture capable of reasoning across temporal and spatial dimensions simultaneously, enabling the model to understand and reason about video content in addition to static images.^[2]

The native audio capabilities allowed GPT-5 to process spoken input directly and generate spoken responses, enabling real-time voice conversations without the intermediate step of speech-to-text transcription. This was particularly relevant for ChatGPT's voice mode and for applications in customer service, accessibility, and language learning.

Hallucination Reduction

One of the most significant improvements in GPT-5 was a substantial reduction in hallucinations. According to OpenAI's internal evaluations, GPT-5 (thinking) produced roughly five to six times fewer factual errors than o3 across three factual accuracy benchmarks when browsing was enabled. With web search active, GPT-5 responses were approximately 45% less likely to contain a factual error compared to GPT-4o ^[1].

This improvement addressed one of the most persistent criticisms of large language models: their tendency to generate plausible-sounding but factually incorrect information. For enterprise and professional use cases where factual reliability is critical, the hallucination reduction was arguably more important than any single benchmark improvement.

Deception rates, measured in scenarios with impossible coding tasks or missing multimodal inputs, dropped from 4.8% for o3 to 2.1% for GPT-5 with reasoning enabled. Sycophantic responses declined from approximately 14.5% to under 6% in OpenAI's internal evaluations, an effort the company explicitly credited to feedback after the unpopular sycophantic GPT-4o update of April 2025 ^[29]^[30].

Efficiency Gains

GPT-5 (thinking) matched or exceeded o3's performance across most benchmarks while using 50-80% fewer output tokens. This efficiency translated directly into lower costs and faster response times for developers, making the thinking capabilities practical for production workloads rather than being limited to specialized research scenarios ^[1].

The token efficiency improvement also had implications for user experience. Shorter reasoning chains meant faster responses, which made the model feel more responsive in interactive settings like ChatGPT conversations, even when engaging in complex reasoning.

System card and safety

OpenAI published the GPT-5 system card on August 7, 2025, alongside the launch. It describes the model's evaluations, training data choices, deployment safeguards, and remaining limitations. The system card was unusually detailed for a frontier release, running to dozens of pages and covering jailbreaks, prompt injection, deception, biological and chemical risk, cybersecurity, and persuasion ^[28].

Under OpenAI's Preparedness Framework, the company classified GPT-5 thinking as High capability in the Biological and Chemical risk domain. OpenAI stated that it did not have definitive evidence the model could meaningfully help a novice cause severe biological harm (its threshold for High), but said it adopted a precautionary stance because evaluations could not rule out marginal uplift. The classification triggered the activation of associated safeguards under the framework, including additional refusal training, monitoring of API traffic for misuse, and external red-team testing ^[28].

A central new safety design choice was safe-completions. Rather than a binary classification of user intent ("safe" vs. "unsafe"), safe-completions train the model to maximize helpfulness subject to safety constraints, often producing partial answers, high-level guidance, or explicit refusals with safer alternatives instead of stonewalling. OpenAI reported that this method recovered substantial helpfulness in dual-use scenarios while reducing genuinely harmful outputs ^[28]^[30].

The system card also documented the red-teaming campaign behind GPT-5: more than 5,000 hours of work from over 400 external testers and experts focused on violent attack planning, jailbreaks, prompt injection, bioweaponization, child-safety risks, and adversarial multimodal inputs. The classification under the Preparedness Framework also has implications for OpenAI's internal Responsible Scaling commitments, sometimes discussed in the broader AI safety community as analogous to Anthropic's ASL levels, since both frameworks gate deployment on capability evaluations rather than only on alignment evaluations ^[28].

Launch reception and the router controversy

The initial reception of GPT-5 was sharply mixed. OpenAI reported that API traffic doubled within 24 hours and Microsoft began rolling the model into its Azure and Copilot stacks the same day ^[1]^[23]. Early users praised the coding performance, the cost reduction relative to GPT-4o, and the improvements in factuality. Sam Altman's claim that GPT-5 was "like talking to a legitimate PhD-level expert" became one of the most quoted lines from the launch event ^[25]^[26].

Within hours, however, the rollout began drawing serious complaints. The most contentious change was that GPT-5 replaced GPT-4o, GPT-4, GPT-4.1, o3, o4-mini, GPT-4.5, and several other models in ChatGPT, removing them from the model picker for many users. Subscribers who had built workflows or even emotional habits around GPT-4o reacted strongly. Some longtime users described it as "the biggest bait-and-switch in AI history" on Reddit and X, and a sizeable subset of Plus subscribers said the new default felt colder and less personable than 4o ^[11]^[12].

The second source of complaints was the router. Because the system is presented as a single model, users could not always tell whether their question had been routed to the fast model or the thinking model. On August 8, 2025, the day after launch, Altman acknowledged on X that "the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber." The bug routed many queries that should have gone to the thinking model to the fast model instead, dragging benchmark-style behavior on hard problems down ^[12]^[14].

Within a week, OpenAI shipped a series of fixes. It restored GPT-4o for paid users in the model picker, doubled rate limits for ChatGPT Plus on GPT-5 thinking from 200 to 3,000 weekly messages, added a clearer indicator showing which underlying model was answering, and introduced "Auto," "Fast," and "Thinking" sub-options so users could override the router. OpenAI also retained legacy access to o3 for paying subscribers under a "show legacy models" toggle ^[11]^[12]^[14].

Independent reviewers were similarly split. Vellum's launch-week analysis described GPT-5 as "clearly state-of-the-art on math and STEM reasoning" but "not the leap people were primed for on coding," placing Claude Opus 4.x ahead on SWE-bench Pro while GPT-5 led AIME and FrontierMath. METR, which evaluates how long autonomous tasks frontier models can complete, reported a noticeable bump in long-horizon task completion compared to o3, with GPT-5 thinking reliably succeeding on tasks taking expert humans up to roughly two hours, but warned that error rates rose sharply beyond that horizon ^[9]^[13].

Several journalists also noted basic factual errors in early ChatGPT outputs. Quartz reported that some users got responses claiming Joe Biden was still U.S. president or misspelling "Oregon" as "Onegon," which contradicted Altman's PhD-level marketing. OpenAI argued these were largely cases where the router had wrongly selected the fast model, and shipped further router updates over the next month ^[11]^[14].

On the LMArena human-preference leaderboard, GPT-5 was added to the Text, WebDev, and Vision boards within hours of release and entered the top three on each within its first week, though Anthropic and Google updates kept it from holding the outright top spot through the rest of August 2025 ^[13]^[14].

OpenAI also drew criticism over a benchmark chart shown during the launch presentation. Several bar graphs comparing GPT-5 to o3 and GPT-4o used inconsistent scaling: in one slide, a bar representing 52.8% accuracy was drawn nearly twice as tall as a bar representing 69.1%, while the 30.8% and 69.1% bars appeared roughly the same height. Altman acknowledged the error on X the following day, calling it a "mega chart screwup," and OpenAI quietly corrected the figures in the published blog post. The Washington Post described the episode as a "chart crime" and tied it to a broader pattern of selective benchmark presentation among frontier labs that month ^[9]^[35]^[36].

On August 18, 2025, eleven days after the launch, Altman publicly conceded that OpenAI had "totally screwed up" the rollout in remarks during a dinner with reporters in San Francisco. He attributed the missteps to the speed of the launch, the underestimated emotional attachment users had to GPT-4o, and the autoswitcher bug, and said the company would invest "trillions of dollars" in data center capacity to support the new model and its successors ^[37].

ChatGPT integration at launch

OpenAI made GPT-5 the default model in ChatGPT for all users at launch, including the free tier (a first for an OpenAI flagship), with progressive rollout to Enterprise and Edu the following week ^[1].

Usage limits reflected the tier:

Tier	GPT-5 access at launch
Free	GPT-5 with router; mini fallback when limit reached; ~10 messages per 5 hours
Plus ($20/month)	Higher message caps; 3,000 GPT-5 thinking messages per week after Aug 11 update
Pro ($200/month)	Unlimited GPT-5 and access to GPT-5 Pro
Team / Business	Same as Plus, with admin controls
Enterprise / Edu	Phased rollout starting mid-August 2025

On the free tier, ChatGPT exposed a streamlined version of the router. Free users got access to the standard GPT-5 endpoint by default and were silently downgraded to GPT-5 mini after hitting their five-hour message cap, rather than being blocked outright. Plus subscribers could explicitly select "GPT-5 Thinking" from the model picker, and Pro subscribers could select "GPT-5 Pro" for the highest-effort reasoning configuration ^[1]^[14].

Enterprise launch partners

OpenAI publicized a roster of early enterprise customers alongside the GPT-5 launch, framing the model as production-ready for regulated industries. The named partners included:

Organization	Sector	Use case at launch
BNY Mellon	Financial services	Internal AI assistant for employees building on prior OpenAI partnership for early model access
Lowe's	Retail	Associate-facing assistants for store operations and inventory planning
Morgan Stanley	Financial services	Research and client-advisor tooling building on the firm's earlier GPT-4 deployments
Figma	Design software	Codex-style code generation and design-to-code workflows
Intercom	Customer support software	Customer-facing AI agent product Fin
SoftBank	Conglomerate	Internal productivity rollout across portfolio companies
T-Mobile	Telecom	Customer-care agents and call-center summarization
California State University	Higher education	Campus-wide ChatGPT Edu rollout for students and faculty

Microsoft made GPT-5 available through Microsoft 365 Copilot, GitHub Copilot, and Azure AI Foundry the same day as the OpenAI launch, with admin controls that let enterprises pilot GPT-5 in specific tenants before broader rollout ^[1]^[23]^[34].

Developer migration and pricing context

GPT-5's API pricing was structured to nudge developers off GPT-4o and the o-series. At $1.25/$10.00 per million input/output tokens for the standard model, $0.25/$2.00 for mini, and $0.05/$0.40 for nano, the family undercut GPT-4o's $2.50/$10.00 by half on input cost while delivering substantially better evaluations. Cached input was charged at a 90% discount ($0.125 per million tokens for the flagship), and Batch API calls were billed at 50% off, which made high-volume retrieval and bulk classification dramatically cheaper than under the GPT-4o regime ^[5].

OpenAI also published a comparison showing that for tasks where developers had previously chained a planner on o3 with an executor on GPT-4o, the unified GPT-5 thinking endpoint typically reduced total token spend by half or more thanks to the 50-80% reduction in reasoning tokens versus o3 ^[1]^[5].

GPT-5.1 (November 2025)

OpenAI released GPT-5.1 on November 12, 2025, three months after the initial GPT-5 launch. The release was marketed as "a smarter, more conversational ChatGPT" rather than a pure capability bump and rolled out first to paid Pro, Plus, Go, and Business users before reaching free and logged-out users a few days later ^[27].

GPT-5.1 split the family into two everyday variants:

GPT-5.1 Instant, the most-used tier, which was given a warmer default tone, better instruction following, and (for the first time in an Instant model) the ability to engage adaptive reasoning before answering harder questions.
GPT-5.1 Thinking, a reworked reasoning model that was easier to read on simple questions, faster on routine prompts, and more persistent on hard ones.

OpenAI published a system card addendum describing the safety evaluations specific to GPT-5.1 and confirmed that GPT-5 Instant and GPT-5 Thinking would remain available in ChatGPT under a "legacy models" dropdown for paid subscribers for three months after launch. The November release was widely read as OpenAI's response to user complaints that GPT-5 felt overly clinical compared to GPT-4o and as a pre-emptive move ahead of Google's Gemini 3 line ^[27]^[32].

GPT-5.2 (December 2025)

OpenAI released GPT-5.2 on December 11, 2025, roughly four months after the initial GPT-5 launch and one month after GPT-5.1. The update introduced a three-tier product structure: Instant (for fast, everyday queries), Thinking (for complex reasoning), and Pro (for maximum performance on the hardest problems) ^[2].

Key Improvements

Feature	GPT-5	GPT-5.2
Context window	272K	400K
AIME 2025 (no tools)	94.6%	100%
SWE-bench Verified	74.9%	80.0%
ARC-AGI-2 (abstract reasoning)	-	52.9%
GDPval (professional work)	38.8%	70.9%
FrontierMath	-	40.3%
API input pricing	$1.25/1M	$1.75/1M
API output pricing	$10.00/1M	$14.00/1M

GPT-5.2 expanded the context window to 400,000 tokens across all paid tiers. On GDPval, a benchmark measuring performance on knowledge work tasks across 44 occupations, GPT-5.2 Thinking became the first model to perform at or above human expert level, beating or tying top industry professionals on 70.9% of comparisons ^[2].

GPT-5.2 Thinking also produced 38% fewer errors than the previous GPT-5.1 update, with the response error rate dropping from 8.8% to 6.2% ^[2].

The ARC-AGI-2 result was notable because this benchmark tests abstract reasoning ability, a capability widely considered to be a fundamental limitation of current AI systems. GPT-5.2's score of 52.9%, compared to GPT-5.1's 17.6%, represented a 35-point improvement and suggested significant progress on a capability that has historically been resistant to scaling ^[2].

Benchmark Deep Dive: FrontierMath

GPT-5.2's performance on FrontierMath (40.3% on Tiers 1-3) was a significant milestone. FrontierMath, developed by EpochAI, consists of research-level mathematics problems that require graduate-level or beyond mathematical reasoning. Prior to o3, no model had exceeded 2% on this benchmark. o3 reached 25.2%, and GPT-5.2's 40.3% represented a further 60% relative improvement. The result demonstrated that mathematical reasoning capabilities were continuing to scale rapidly with each new model generation.^[2]

GPT-5.2-Codex

Alongside the main release, OpenAI introduced GPT-5.2-Codex, a variant specifically optimized for agentic coding tasks in the Codex environment. This version featured improvements in context compaction (allowing it to work with large codebases more efficiently) and stronger performance on large-scale code changes such as refactors and migrations ^[6].

GPT-5.2-Codex was designed for long-horizon coding workflows where an agent needs to understand a full codebase, plan a multi-file change, and execute it with minimal human intervention. The context compaction feature allowed the model to work within its context window more efficiently by summarizing less-relevant portions of the codebase while maintaining full detail on actively edited files.

Competitive Context

GPT-5.2 arrived during an intense period of competition. Google had released Gemini 3 Pro on November 18, 2025, and Anthropic launched Claude Opus 4.5 on November 24, 2025. GPT-5.2 was widely seen as OpenAI's response to these competitive releases. While Claude Opus 4.5 held the edge on SWE-bench Verified at 80.9%, GPT-5.2 achieved state-of-the-art on SWE-bench Pro at 55.6% and led in abstract reasoning with 52.9% on ARC-AGI-2 ^[2]^[7].

GPT-5.3 Instant (March 2026)

On March 3, 2026, OpenAI released GPT-5.3 Instant, which replaced GPT-5.2 Instant as the default model for all ChatGPT users, including those on the free tier ^[3].

GPT-5.3 Instant focused on conversational quality rather than raw benchmark performance. Key improvements included:

Reduced hallucinations: 26.8% fewer hallucinations with web search (19.7% fewer without) compared to GPT-5.2 Instant ^[3].
Better conversational tone: The model significantly reduced unnecessary refusals, overly defensive preambles, and moralizing language that had been a common complaint about earlier models ^[3].
Improved web search integration: Rather than simply summarizing search results, GPT-5.3 Instant balanced information from the web with its own knowledge and reasoning to provide better-contextualized answers ^[3].
Expanded context window: The Instant tier's context window increased from 128K to 400K tokens ^[3].

GPT-5.3 Instant was also made available to developers in the API as gpt-5.3-chat-latest. The conversational improvements were particularly relevant for ChatGPT's consumer user base, where natural-sounding dialogue matters more than performance on academic benchmarks ^[3].

A separate release, GPT-5.3-Codex, provided long-term support for GitHub Copilot integrations, optimizing the model specifically for inline code suggestions and repository-level understanding ^[10].

GPT-5.4 (March 2026)

GPT-5.4 was announced on March 5, 2026, and represents the most capable model in the GPT-5 family until the release of GPT-5.5. It combines frontier reasoning, coding capabilities inherited from GPT-5.3-Codex, and a new native computer-use capability in a single model ^[4].

Key Specifications

Specification	GPT-5.4
Context window	1.05M tokens
Maximum output	128,000 tokens
API input pricing (standard)	$2.50 per 1M tokens
API output pricing	$15.00 per 1M tokens
Extended context input (>272K)	$5.00 per 1M tokens
Cached input	$1.25 per 1M tokens
Computer use	Native, state-of-the-art
Variants	GPT-5.4, GPT-5.4 Pro, GPT-5.4 Mini, GPT-5.4 Nano

Native Computer Use

GPT-5.4 is the first general-purpose model from OpenAI with native, state-of-the-art computer-use capabilities. This means the model can directly operate desktop applications, navigate web interfaces, and carry out complex multi-step workflows across different programs. The model works through a screenshot-action loop: it receives a screenshot of the current screen, analyzes the visual content, and returns structured actions (clicks, typing, scrolling) that an agent framework can execute. The cycle then repeats with the next screenshot.^[4]^[22]

On OSWorld-Verified, a benchmark for desktop automation tasks, GPT-5.4 scored 75.0%, surpassing the human baseline of 72.4% and dramatically improving over GPT-5.2's 47.3%. This made GPT-5.4 the first AI model to operate a computer better than human experts on this benchmark ^[4]^[22].

On BrowseComp, a benchmark for agentic web browsing, GPT-5.4 reached 82.7% (up from 65.8% for GPT-5.2), while GPT-5.4 Pro scored 89.3% ^[4].

The computer-use capability enables a new class of AI agent applications. Rather than operating through APIs or structured tool calls, GPT-5.4 can interact with software the same way a human would: by reading screen content, moving a cursor, clicking buttons, and typing text. This makes it possible to automate tasks in applications that lack APIs or programmatic interfaces.

1.05 Million Token Context

The context window expanded to 1.05 million tokens, making GPT-5.4 the first OpenAI model with over one million tokens of context in a standard API offering. This allows agents to plan, execute, and verify tasks across long horizons, processing entire codebases or extensive document collections in a single session. Requests exceeding 272K tokens are priced at 2x for input and 1.5x for output ^[4].

The million-token context is particularly valuable for coding agents that need to reason about entire repositories, legal professionals reviewing large document sets, and research applications that involve synthesizing information from many sources simultaneously.

Benchmark Improvements

Benchmark	GPT-5.2	GPT-5.4	Change
GDPval	70.9%	83.0%	+12.1
OSWorld-Verified	47.3%	75.0%	+27.7
BrowseComp	65.8%	82.7%	+16.9
Investment banking modeling	68.4%	87.3%	+18.9
Factual accuracy	Baseline	33% fewer false claims	-
Token efficiency	Baseline	~50% improvement	-

GPT-5.4 also improved token efficiency by roughly 50% on complex tasks and reduced false claims by 33% compared to GPT-5.2 ^[4].

Model Variants

Alongside GPT-5.4, OpenAI released smaller variants:

Variant	Speed	Context	API Input Price	API Output Price	Use Case
GPT-5.4	Standard	1.05M	$2.50/1M	$15.00/1M	Complex reasoning, professional tasks
GPT-5.4 Pro	Slower	1.05M	$30.00/1M	$180.00/1M	Maximum performance on hardest problems
GPT-5.4 Mini	~180 tok/s	400K	$0.75/1M	$4.50/1M	High-volume workloads
GPT-5.4 Nano	~200 tok/s	400K	$0.20/1M	$1.25/1M	Cost-sensitive applications

GPT-5.4 Mini and Nano were released on March 17, 2026, bringing many of GPT-5.4's strengths to faster, more efficient models designed for high-volume production use. GPT-5.4 Mini operates at roughly 180-190 tokens per second, while GPT-5.4 Nano reaches approximately 200 tokens per second, more than 2x faster than the original GPT-5 Mini ^[8].

GPT-5.5 (April 2026)

OpenAI released GPT-5.5 on April 23, 2026, just six weeks after GPT-5.4. The release was framed as "a new class of intelligence for real work," with sharper performance on writing and debugging code, web research, data analysis, document and spreadsheet creation, and operating software autonomously ^[24].

GPT-5.5 was the first OpenAI model to ship with a 1 million-token context window in both ChatGPT (for selected paid tiers) and the public API, and it became the new default frontier model in ChatGPT and Codex. Pricing in the API was set at $5 per million input tokens and $30 per million output tokens for the standard model. Alongside it, OpenAI released GPT-5.5 Pro, a higher-effort reasoning configuration aimed at the hardest professional workloads, priced at $30 per million input tokens and $180 per million output tokens ^[24]^[31].

In ChatGPT, GPT-5.5 rolled out as the default for Plus, Pro, Business, and Enterprise tiers, while the free tier continued on GPT-5.3 Instant with automatic upgrades to GPT-5.5 Thinking on harder questions. In Codex, GPT-5.5 was made available with a 400K-token context window for Plus, Pro, Business, Enterprise, Edu, and Go users. The release came just six weeks after GPT-5.4, illustrating how frontier model launches have begun to resemble incremental software updates rather than blockbuster events ^[24].

On benchmarks, OpenAI reported GPT-5.5 reached 88.7% on SWE-bench Verified and 58.6% on SWE-bench Pro, solving more tasks end-to-end in a single pass than any earlier model. GPT-5.5 also scored 98.0% on Tau2-bench Telecom, a complex customer-service workflow benchmark, without prompt tuning. The model was paired with an updated GPT-5.5 system card and a separate GPT-5.5 Instant system card describing its safety evaluations ^[31]^[33].

On May 5, 2026, OpenAI quietly promoted GPT-5.5 Instant to the new default ChatGPT model for free, Plus, and Pro users, replacing GPT-5.3 Instant in everyday conversations ^[24].

Developer Migration from the o-Series

GPT-5's unified architecture was designed partly to simplify the developer experience by eliminating the need to choose between GPT-4o and the o-series for different tasks. When GPT-5 launched, OpenAI positioned it as the default model for both conversational and reasoning workloads, encouraging developers to migrate from both GPT-4o and the o-series.^[1]^[5]

The migration pattern varied by use case. Developers whose applications primarily needed conversational AI or content generation found GPT-5 to be a straightforward replacement for GPT-4o, with better performance and comparable pricing. Developers who had been using o1 or o3 for specialized reasoning tasks had a more nuanced decision, as GPT-5's thinking mode covered most reasoning use cases but did not always match o3's depth on the hardest problems.^[1]^[12]

By late 2025, OpenAI's update cadence had accelerated significantly. The rapid succession of GPT-5, GPT-5.1, and GPT-5.2 within five months required developers to continuously adapt their applications. OpenAI addressed this partly through its tiered model structure, with Instant models providing stability for everyday use while Thinking and Pro variants pushed the capability frontier. The introduction of GPT-5.4 in March 2026 further consolidated the lineup, with OpenAI framing it as the default model for both "broad general-purpose work and most coding tasks," replacing both gpt-5.2 in the API and gpt-5.3-codex in Codex.^[4]^[23]

Enterprise Adoption

GPT-5 saw rapid enterprise adoption through Microsoft's integration into its product ecosystem. Microsoft, which integrates OpenAI models across Copilot Studio, Microsoft 365 Copilot, and Azure services, began making GPT-5 available to enterprise customers starting in August 2025. Enterprises could phase in GPT-5 alongside existing models, starting with high-value workflows like code reviews, RFP automation, and analytics.^[23]

Azure's native hooks allowed minimal disruption to existing governance, security, and compliance protocols during migration. By December 2025, GPT-5.2 was introduced into Microsoft Foundry as a new standard for enterprise AI, with optimized configurations for large-scale deployment.^[23]

The hallucination reduction in GPT-5 was cited as a key factor in enterprise adoption. Several companies reported integrating GPT-5 into customer-facing applications where factual reliability had previously been a barrier. The built-in thinking mode meant that enterprises no longer needed to maintain separate integrations with the o-series for tasks requiring reasoning, simplifying their AI infrastructure.^[1]

Benchmark Progression Across GPT-5 Versions

The rapid iteration of the GPT-5 family produced measurable improvements across each version on key benchmarks.

Benchmark	GPT-5 (Aug 2025)	GPT-5.2 (Dec 2025)	GPT-5.4 (Mar 2026)	GPT-5.5 (Apr 2026)
AIME 2025 (no tools)	94.6%	100%	100%	100%
SWE-bench Verified	74.9%	80.0%	-	88.7%
SWE-bench Pro	-	55.6%	-	58.6%
GPQA Diamond	81.6%	93.2% (Pro)	-	-
GDPval	38.8%	70.9%	83.0%	-
OSWorld-Verified	-	47.3%	75.0%	-
BrowseComp	-	65.8%	82.7%	-
ARC-AGI-2	-	52.9%	-	-
FrontierMath (Tiers 1-3)	-	40.3%	-	-
Tau2-bench Telecom	-	-	-	98.0%
Context window	272K	400K	1.05M	1.05M

The GDPval benchmark, which measures performance on professional knowledge work across 44 occupations, showed particularly dramatic improvement, nearly doubling from 38.8% with GPT-5 to 70.9% with GPT-5.2, and reaching 83.0% with GPT-5.4. This trend suggested that the GPT-5 family was becoming increasingly useful for real-world professional tasks beyond the academic benchmarks that had traditionally been used to evaluate language models.^[2]^[4]

The perfect 100% score on AIME 2025 achieved by GPT-5.2 Thinking (without tools) was a watershed moment for mathematical reasoning. The AIME is a competition designed for the top 5% of US high school mathematics students, and a perfect score without tool assistance demonstrated that GPT-5.2 had reached a level of mathematical competence that matched or exceeded top human performers on this particular exam.^[2]

GPT-5 Series Timeline

Date	Release	Key Feature
August 7, 2025	GPT-5	Unified model, 272K context, built-in thinking, gpt-5-2025-08-07 snapshot
August 8-15, 2025	Router fixes	GPT-4o restored for paid users, rate limits raised, Auto/Fast/Thinking sub-options added
November 12, 2025	GPT-5.1 (Instant + Thinking)	Warmer tone, adaptive reasoning in Instant tier
December 11, 2025	GPT-5.2	400K context, Instant/Thinking/Pro tiers
December 18, 2025	GPT-5.2-Codex	Optimized for agentic coding
February 5, 2026	GPT-5.3-Codex	Combined Codex + GPT-5 stacks
March 3, 2026	GPT-5.3 Instant	Improved conversational tone, 400K context for all
March 5, 2026	GPT-5.4	1.05M context, native computer use
March 17, 2026	GPT-5.4 Mini and Nano	Smaller, faster variants of GPT-5.4
April 23, 2026	GPT-5.5 + GPT-5.5 Pro	1M context in API and ChatGPT, new default
May 5, 2026	GPT-5.5 Instant default	Replaces GPT-5.3 Instant for free, Plus, and Pro tiers

Pricing Evolution

The GPT-5 series has maintained competitive pricing relative to its predecessors, particularly given the performance improvements:

Model	Input (per 1M tokens)	Output (per 1M tokens)	Release
GPT-4o	$2.50	$10.00	May 2024
GPT-5 nano	$0.05	$0.40	August 2025
GPT-5 mini	$0.25	$2.00	August 2025
GPT-5	$1.25	$10.00	August 2025
GPT-5.2	$1.75	$14.00	December 2025
GPT-5.4	$2.50	$15.00	March 2026
GPT-5.4 Mini	$0.75	$4.50	March 2026
GPT-5.4 Nano	$0.20	$1.25	March 2026
GPT-5.5	$5.00	$30.00	April 2026
GPT-5.5 Pro	$30.00	$180.00	April 2026

Notably, GPT-5 launched at a lower price per input token than GPT-4o despite substantially better performance, reflecting the efficiency improvements in the underlying architecture. The pricing trend across the GPT-5 series shows a gradual increase for the flagship model (from $1.25 to $5.00 per million input tokens for the standard tier and up to $30 for Pro) alongside the introduction of increasingly affordable smaller variants ^[5]^[24].

Competitive Landscape

The GPT-5 series exists in a highly competitive environment. As of mid-2026, the frontier model landscape includes several strong alternatives:

Provider	Model	Notable Strength
OpenAI	GPT-5.5	1M-token context in API and ChatGPT, agentic coding, computer use
Anthropic	Claude Opus 4.5	Coding (80.9% SWE-bench Verified)
Google	Gemini 3 Pro	Reasoning (1501 LMArena Elo), 1M context
DeepSeek	DeepSeek-V3.2	Cost efficiency (10-30x cheaper)
xAI	Grok 4	Real-time information integration

The late-2025 and mid-2026 period saw a significant compression of performance gaps between leading models, with each provider developing distinct specializations. Organizations increasingly deploy multiple models, routing queries to the most suitable model for each task type, rather than standardizing on a single provider ^[7].

By the May 2026 update of the Artificial Analysis intelligence index, GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro sat within roughly three points of each other in headline reasoning scores, with each carving out a distinct niche. GPT-5.5 led agentic benchmarks such as GDPval (84.9%) and OSWorld (78.7%) and was generally chosen for autonomous tool-using workflows. Claude Opus 4.7 led SWE-bench Pro at 64.3% versus GPT-5.5's 58.6% and was favored for production coding because of its tighter handling of destructive actions. Gemini 3.1 Pro retained the largest practical context window, the lowest price among the frontier set, and a slight edge on pure reasoning tasks. Reviewers commonly recommended Claude Opus 4.7 as the default daily driver, GPT-5.5 for autonomous agents, and Gemini 3.1 Pro for high-volume long-context jobs ^[24]^[38].

Google's Gemini 3 Pro achieved an unprecedented 91.9% on GPQA Diamond, surpassing human expert performance (approximately 89.8%). Gemini 3 Pro's Deep Think mode also pushed Humanity's Last Exam to 41%, the highest published score on that benchmark. Anthropic's Claude Opus 4.5 held the SWE-bench Verified lead at 80.9%. DeepSeek-V3.2 offered frontier-class performance at a fraction of the cost, providing a strong option for cost-sensitive applications. GPT-5.4 and GPT-5.5's distinctive advantages as of mid-2026 are their native computer-use capabilities and their 1.05M-token context window ^[7]^[24].

The competitive dynamics of this period also drove pricing pressure across the industry. With DeepSeek demonstrating that high-quality models could be offered at dramatically lower prices, all major providers were forced to offer more competitive pricing or justify their premium through unique capabilities.

Claude vs GPT-5 vs Gemini at launch

In the weeks after the August 2025 GPT-5 launch, the comparison most frequently cited by reviewers stacked GPT-5 against Anthropic's Claude Opus 4.1 and Google's Gemini 2.5 Pro Deep Think. Vellum and the LM Council leaderboard reached broadly similar conclusions: GPT-5 led on AIME, FrontierMath, and most STEM reasoning benchmarks; Claude Opus 4.1 led on SWE-bench Verified and on long-context coding tasks; and Gemini 2.5 Pro Deep Think led on Humanity's Last Exam at the time. On user-preference benchmarks like LMArena, GPT-5 thinking entered the top three within a week of launch, but rarely held the outright #1 position once Claude and Gemini updates landed later in 2025 ^[9]^[13].

Model Deprecation and Transition

The rapid pace of GPT-5 updates created a complex model lifecycle that developers needed to navigate. OpenAI's approach was to deprecate older GPT-5 versions relatively quickly as newer ones launched. When GPT-5.4 was released in March 2026, OpenAI framed it as the replacement for both gpt-5.2 in the standard API and gpt-5.3-codex in the Codex environment, consolidating what had briefly been separate model tracks.^[4]

OpenAI also announced in 2025-2026 the deprecation of several pre-GPT-5 models, including o1, GPT-4.5, o3-mini, and GPT-4o. This wave of retirements effectively pushed the entire developer ecosystem toward the GPT-5 family and the o3/o4-mini reasoning models. Developers who had built on GPT-4o were encouraged to migrate to GPT-5, while those on o1 were directed to o3 or o4-mini.^[7]

The transition was not without friction. Some developers reported that prompt behaviors changed subtly between GPT-5 versions, requiring regression testing and prompt tuning with each update. OpenAI addressed this partly by maintaining stable model snapshot endpoints (e.g., gpt-5-2025-08-07 for the original launch and gpt-5.4-2026-03-05 for GPT-5.4) that developers could pin to for production stability, while the generic gpt-5 and gpt-5.4 aliases would receive rolling updates.

For ChatGPT users, the deprecation pattern was somewhat softer. After the initial GPT-5 launch backlash, OpenAI committed to keeping prior major versions available under a "legacy models" toggle for paid subscribers for at least three months. As of March 11, 2026, GPT-5.1 models were retired from ChatGPT, with existing conversations automatically migrated to GPT-5.3 Instant, GPT-5.4 Thinking, or GPT-5.4 Pro depending on context ^[11]^[27]^[32].

Reception and Impact

GPT-5's launch in August 2025 was broadly well-received among developers, though not without serious criticism on the consumer side. The unification of reasoning and conversational capabilities into a single model was praised as a significant usability improvement. Developers no longer had to choose between model families or implement their own routing logic ^[1].

The hallucination reduction was highlighted as a particularly important advance for enterprise adoption. Several companies reported integrating GPT-5 into customer-facing applications where factual reliability was previously a barrier ^[1].

However, some researchers noted that OpenAI's benchmark presentations were occasionally misleading. In one instance, a benchmark graph in the launch materials was found to contain errors, drawing public criticism. Others pointed out that while GPT-5 was a clear improvement over OpenAI's previous models, the gap between it and competitors like Claude and Gemini was smaller than in earlier generations ^[9].

The rapid cadence of updates, from GPT-5 in August to GPT-5.5 in April, roughly nine months later, also raised questions about versioning clarity and the challenge facing developers who need to maintain stable production systems while keeping up with frequent model changes. OpenAI addressed this partly through its tiered model structure, with Instant models providing stability for everyday use while Thinking and Pro variants pushed the capability frontier.

The GPT-5 series also marked a turning point in how AI models are consumed. The built-in thinking mode and automatic routing represented a move away from exposing raw model capabilities to users and toward providing an integrated, managed AI experience. This trend toward "model as service" rather than "model as tool" has implications for how developers build applications and how end users interact with AI systems.

In the long view, the August 2025 launch is now widely cited as the moment when frontier AI moved from a benchmark race to a product race. After GPT-5, every major provider, including Anthropic, Google, and xAI, accelerated their own product roadmaps and matched OpenAI's emphasis on routing, agentic capabilities, and enterprise integration over raw benchmark dominance ^[9]^[13].

References

"Introducing GPT-5." OpenAI, August 7, 2025. https://openai.com/index/introducing-gpt-5/
"Introducing GPT-5.2." OpenAI, December 11, 2025. https://openai.com/index/introducing-gpt-5-2/
"GPT-5.3 Instant: Smoother, more useful everyday conversations." OpenAI, March 3, 2026. https://openai.com/index/gpt-5-3-instant/
"Introducing GPT-5.4." OpenAI, March 5, 2026. https://openai.com/index/introducing-gpt-5-4/
"Introducing GPT-5 for developers." OpenAI, August 7, 2025. https://openai.com/index/introducing-gpt-5-for-developers/
"Introducing GPT-5.2-Codex." OpenAI, December 2025. https://openai.com/index/introducing-gpt-5-2-codex/
"GPT 5.1 vs Claude 4.5 vs Gemini 3: 2025 AI Comparison." Passionfruit. https://www.getpassionfruit.com/blog/gpt-5-1-vs-claude-4-5-sonnet-vs-gemini-3-pro-vs-deepseek-v3-2-the-definitive-2025-ai-model-comparison
"Introducing GPT-5.4 mini and nano." OpenAI, March 17, 2026. https://openai.com/index/introducing-gpt-5-4-mini-and-nano/
"OpenAI's GPT-5 is here." TechCrunch, August 7, 2025. https://techcrunch.com/2025/08/07/openais-gpt-5-is-here/
"GPT-5.3-Codex long-term support in GitHub Copilot." GitHub Changelog, March 18, 2026. https://github.blog/changelog/2026-03-18-gpt-5-3-codex-long-term-support-in-github-copilot/
"GPT-5's model router ignited a user backlash against OpenAI." Fortune, August 12, 2025. https://fortune.com/2025/08/12/openai-gpt-5-model-router-backlash-ai-future/
"GPT-5 Launch Timeline: A Story of Backlash and OpenAI's Fix." Arsturn, August 2025. https://www.arsturn.com/blog/the-gpt-5-launch-a-timeline-of-grand-promises-user-backlash-openais-scramble-to-fix-it
"GPT-5 Benchmarks." Vellum, August 2025. https://www.vellum.ai/blog/gpt-5-benchmarks
"Sam Altman confirms ChatGPT Plus subscribers will have increased rate limit amid continued GPT-5 backlash." TechRadar, August 2025. https://www.techradar.com/ai-platforms-assistants/chatgpt/sam-altman-confirms-chatgpt-plus-subscribers-will-have-increased-rate-limit-amid-continued-gpt-5-backlash
"GPT-5 Router Explained: Selecting Thinking vs. Fast Models." RankStudio. https://rankstudio.net/articles/en/gpt-5-router-explained
"GPT-5.4 Computer Use Explained." NxCode, 2026. https://www.nxcode.io/en/resources/news/gpt-5-4-computer-use-ai-automate-desktop-tasks-2026
"GPT-5.2 in Microsoft Foundry: Enterprise AI Reinvented." Microsoft Azure Blog. https://azure.microsoft.com/en-us/blog/introducing-gpt-5-2-in-microsoft-foundry-the-new-standard-for-enterprise-ai/
"Introducing GPT-5.5." OpenAI, April 23, 2026. https://openai.com/index/introducing-gpt-5-5/
"OpenAI releases GPT-5, calling it a 'team of Ph.D. level experts in your pocket'." NBC News, August 7, 2025. https://www.nbcnews.com/tech/tech-news/openai-releases-chatgpt-5-rcna223265
"OpenAI launches GPT-5, nano, mini and Pro: not AGI, but capable of generating 'software-on-demand'." VentureBeat, August 7, 2025. https://venturebeat.com/ai/openai-launches-gpt-5-not-agi-but-capable-of-generating-software-on-demand
"GPT-5.1: A smarter, more conversational ChatGPT." OpenAI, November 12, 2025. https://openai.com/index/gpt-5-1/
"GPT-5 System Card." OpenAI, August 7, 2025. https://openai.com/index/gpt-5-system-card/ ; PDF: https://cdn.openai.com/gpt-5-system-card.pdf
"GPT-5: New Features, Tests, Benchmarks, and More." DataCamp, August 2025. https://www.datacamp.com/blog/gpt-5
"GPT-5 Explained: Features, Performance, Pricing & Use Cases." Leanware, 2026. https://www.leanware.co/insights/gpt-5-features-guide
"OpenAI's GPT-5.5 masters agentic coding with 82.7% benchmark score." Interesting Engineering, April 2026. https://interestingengineering.com/ai-robotics/opanai-gpt-5-5-agentic-coding-gains
"GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum." OpenAI, November 12, 2025. https://openai.com/index/gpt-5-system-card-addendum-gpt-5-1/
"GPT-5.5 System Card." OpenAI, April 23, 2026. https://openai.com/index/gpt-5-5-system-card/
"GPT-5 and the new era of work." OpenAI, August 7, 2025. https://openai.com/index/gpt-5-new-era-of-work/
"Sam Altman addresses 'bumpy' GPT-5 rollout, bringing 4o back, and the 'chart crime'." TechCrunch, August 8, 2025. https://techcrunch.com/2025/08/08/sam-altman-addresses-bumpy-gpt-5-rollout-bringing-4o-back-and-the-chart-crime/
"OpenAI, Anthropic committed 'chart crime.' Is it just sloppiness?" The Washington Post, August 12, 2025. https://www.washingtonpost.com/technology/2025/08/12/gpt5-chart-crimes-claude-graphs/
"Sam Altman admits OpenAI 'totally screwed up' its GPT-5 launch." Fortune, August 18, 2025. https://fortune.com/2025/08/18/sam-altman-openai-chatgpt5-launch-data-centers-investments/
"GPT-5.5 vs Claude Opus 4.7 vs Gemini 3.1 Pro for Builders." MindStudio, April 2026. https://www.mindstudio.ai/blog/gpt-5-5-review-developers-builders

Background and Development

Launch event

GPT-5 (August 2025)

Unified Architecture and Router System

Specifications

Variant lineup at launch

Benchmark Performance

Tool use and agentic coding

Multimodal Capabilities

Hallucination Reduction

Efficiency Gains

System card and safety

Launch reception and the router controversy

ChatGPT integration at launch

Enterprise launch partners

Developer migration and pricing context

GPT-5.1 (November 2025)

GPT-5.2 (December 2025)

Key Improvements

Benchmark Deep Dive: FrontierMath

GPT-5.2-Codex

Competitive Context

GPT-5.3 Instant (March 2026)

GPT-5.4 (March 2026)

Key Specifications

Native Computer Use

1.05 Million Token Context

Benchmark Improvements

Model Variants

GPT-5.5 (April 2026)

Developer Migration from the o-Series

Enterprise Adoption

Benchmark Progression Across GPT-5 Versions

GPT-5 Series Timeline

Pricing Evolution

Competitive Landscape

Claude vs GPT-5 vs Gemini at launch

Model Deprecation and Transition

Reception and Impact

See Also

References

Improve this article

Related Articles

DeepSeek 3.0

GPT-5 Codex

GPT

OpenAI o1

OpenAI o3

GPT-5.4

Background and Development

Launch event

GPT-5 (August 2025)

Unified Architecture and Router System

Specifications

Variant lineup at launch

Benchmark Performance

Tool use and agentic coding

Multimodal Capabilities

Hallucination Reduction

Efficiency Gains

System card and safety

Launch reception and the router controversy

ChatGPT integration at launch

Enterprise launch partners

Developer migration and pricing context

GPT-5.1 (November 2025)

GPT-5.2 (December 2025)

Key Improvements

Benchmark Deep Dive: FrontierMath

GPT-5.2-Codex

Competitive Context

GPT-5.3 Instant (March 2026)

GPT-5.4 (March 2026)

Key Specifications

Native Computer Use

1.05 Million Token Context

Benchmark Improvements

Model Variants

GPT-5.5 (April 2026)

Developer Migration from the o-Series