GPT-5.4

Updated Jun 22, 2026

GPT-5.4 is a large language model developed by OpenAI and released on March 5, 2026.^[1] It is the fourth point release in the GPT-5 series, following GPT-5.3 (the Codex line and GPT-5.3 Instant) and preceding GPT-5.5. It was the first mainline OpenAI model to ship with native computer use, a 1 million token context window, and the coding capability that had previously been confined to the specialist Codex variants.^[1]^[10] OpenAI described it as a model that "brings together our advances in reasoning, coding, and agentic workflows into a single frontier model" and called it "our most capable and efficient frontier model for professional work."^[1]

GPT-5.4 debuted in two reasoning configurations, GPT-5.4 Thinking and GPT-5.4 Pro, with two smaller siblings (GPT-5.4 mini and GPT-5.4 nano) following on March 17, 2026.^[1]^[2] A defensive cybersecurity variant, GPT-5.4-Cyber, was added in late April through OpenAI's Trusted Access for Cyber program.^[21]^[22] The model headlined a unified release across ChatGPT, the OpenAI API, and Codex, and it became the default "Thinking" tier in ChatGPT for paid plans on launch day.^[1]^[6]

OpenAI positioned GPT-5.4 as the first frontier model that combines coding, knowledge work, computer use, and long context in one general-purpose system.^[1] On OSWorld-Verified, a computer-use evaluation that measures how well models can drive desktop applications, GPT-5.4 scored 75.0%, exceeding the 72.4% average human baseline reported by OpenAI.^[1]^[12] On the GDPval professional knowledge work benchmark spanning 44 occupations, the model reached 83%.^[1] It also took the top Elo position on LM Arena for several weeks following launch.^[27] Reception in the developer community was generally positive on capability, with criticism focused on price increases relative to predecessors and a 207-second average time to first token in reasoning mode reported by Artificial Analysis.^[8]

Why did OpenAI build GPT-5.4?

GPT-5.4 arrived about seven months after the original GPT-5 launch in August 2025. By early 2026 OpenAI had iterated rapidly: GPT-5.1 refined instruction following, GPT-5.2 introduced the three-tier Instant/Thinking/Pro structure on December 11, 2025, and the GPT-5.3 cycle in February and March 2026 produced GPT-5.3-Codex, GPT-5.3-Codex-Spark (with Cerebras), and GPT-5.3 Instant.^[15] By the time GPT-5.4 was being prepared, OpenAI's lineup had bifurcated into a conversational track (the Instant series) and a specialist coding track (the Codex series), each with separate price points and rollout schedules.^[15]

The positioning of GPT-5.4 reversed that trend. OpenAI described it as the model that "brings together our advances in reasoning, coding, and agentic workflows into a single frontier model," folding the gains of GPT-5.3-Codex back into the mainline rather than maintaining a parallel Codex specialist.^[1] Brendan Foody, CEO of evaluation platform Mercor, said the model delivered "top performance while running faster and at a lower cost than competitive frontier models" on Mercor's APEX-Agents benchmark for law and finance.

The competitive context was a crowded frontier. Anthropic had released Claude Opus 4.7 in early 2026 with its own 1 million token context window. Google's Gemini 3 family held a substantial share of long-context workloads through Vertex AI and offered cheaper input pricing. xAI, DeepSeek, and Meta had all shipped frontier or near-frontier models in the same window. OpenAI's pitch with GPT-5.4 was that the same model could now handle the long-context, computer-use, and agentic workflows that had previously required specialist tools or competing providers.

When was GPT-5.4 released?

GPT-5.4 launched on Thursday, March 5, 2026 across ChatGPT, the API, and Codex.^[1]^[6] OpenAI published the announcement "Introducing GPT-5.4" on its blog the same day, and the model became available immediately to paid ChatGPT users (Plus, Pro, Team, Business, Enterprise, and Edu) as a Thinking option.^[1] Free-tier users did not receive direct access to the base GPT-5.4 model at launch, although the smaller GPT-5.4 mini reached free users through the Thinking feature when it shipped on March 17, 2026.^[2]^[11]

In the API, GPT-5.4 was exposed under the model ID gpt-5.4 with a dated snapshot gpt-5.4-2026-03-05 for reproducible research.^[3] The Pro variant, which can spend longer in extended reasoning and is intended for high-stakes work, was exposed as gpt-5.4-pro.^[3] OpenAI also published an updated system card addendum tied to the existing GPT-5 system card line, classifying GPT-5.4 Thinking as High capability in the Cybersecurity domain under the Preparedness Framework while remaining below the Critical threshold.^[1]^[22]
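As a rough illustration of how these identifiers might be used, the sketch below assembles keyword arguments for a Responses API call. The model IDs are the ones named above; the helper `build_request`, the prompt text, and the default `high` effort are illustrative assumptions, and the sketch only builds the payload rather than sending it.

```python
# Model identifiers as listed in this article.
MODEL_ALIAS = "gpt-5.4"
MODEL_SNAPSHOT = "gpt-5.4-2026-03-05"
MODEL_PRO = "gpt-5.4-pro"


def build_request(prompt: str, *, pinned: bool = True, effort: str = "high") -> dict:
    """Return keyword arguments for a Responses API call (hypothetical helper).

    pinned=True selects the dated snapshot so repeated runs target the same
    model version; pinned=False selects the floating alias.
    """
    return {
        "model": MODEL_SNAPSHOT if pinned else MODEL_ALIAS,
        "input": prompt,
        "reasoning": {"effort": effort},
    }


request = build_request("Summarize the attached contract in five bullet points.")
print(request["model"])
```

With the official `openai` package installed and an API key configured, the payload could be passed along as `OpenAI().responses.create(**request)`; that call is omitted so the sketch runs offline.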

The release was synchronized with several first-party surface updates. ChatGPT's web client added an upfront thinking plan preview that showed users what the model intended to do before it executed.^[1] Codex CLI and the IDE extensions for Visual Studio Code and JetBrains added GPT-5.4 as a selectable model.^[1] GitHub Copilot integrated the model for enterprise customers under bring-your-own-key arrangements. Microsoft made GPT-5.4 available on Azure AI Foundry on the same day.

GPT-5.2 was scheduled for retirement on June 5, 2026, three months after the launch, in line with OpenAI's standard transition policy for prior-generation models. GPT-5.3-Codex remained available but was effectively superseded as a recommended option, with Codex documentation pointing developers toward GPT-5.4 for new projects.^[14]

What variants of GPT-5.4 are there?

GPT-5.4 launched with two reasoning configurations; two smaller models and a cyber-permissive variant followed in subsequent weeks.

Variant	API model ID	Snapshot	Released	Primary use case
GPT-5.4 Thinking	`gpt-5.4`	`gpt-5.4-2026-03-05`	March 5, 2026	Mainline reasoning across professional work, coding, and computer use
GPT-5.4 Pro	`gpt-5.4-pro`	not separately dated	March 5, 2026	Maximum accuracy on long-running professional tasks
GPT-5.4 mini	`gpt-5.4-mini`	`gpt-5.4-mini-2026-03-17`	March 17, 2026	Fast, low-cost coding, computer use, and subagents
GPT-5.4 nano	`gpt-5.4-nano`	`gpt-5.4-nano-2026-03-17`	March 17, 2026	Classification, extraction, ranking, and high-volume sub-agents
GPT-5.4-Cyber	restricted	not public	Late April 2026	Defensive cybersecurity work for verified professionals

GPT-5.4 Thinking

GPT-5.4 Thinking is the mainline reasoning model. It supports adjustable reasoning effort (none, low, medium, high, and xhigh), accepts text and image input, and produces text output.^[3] It is the version surfaced in ChatGPT's Thinking tier for paid users, in Codex CLI, and on most third-party platforms.^[1] OpenAI describes it as its "most capable and efficient frontier model for professional work," with particular emphasis on long-horizon deliverables such as slide decks, financial models, and legal analysis.^[1]

Thinking inherits the developer tools from earlier GPT-5 releases (the apply_patch tool for structured file diffs, local_shell for sandboxed shell execution, preambles for persistent agent instructions) and adds two new ones at launch: native computer use, which lets the model take screenshots, move a mouse, and type into desktop applications, and tool search, which dynamically loads tool definitions only when needed instead of front-loading them in every prompt.^[1]^[10]^[24]

GPT-5.4 Pro

GPT-5.4 Pro is the higher-end reasoning configuration. It uses extended reasoning by default and is exposed only through the Responses API.^[3] OpenAI positions Pro for tasks where additional inference compute reliably improves outcomes, such as legal due diligence, complex financial modeling, frontier scientific reasoning, and difficult coding tasks that require deeper deliberation.^[1] On Humanity's Last Exam, Pro reaches 41.6%, second only to Gemini 3.1 Pro Preview at 44.7%, according to Artificial Analysis tracking through May 2026.^[9]

Pro carries the same 1.05M-token context window as Thinking but a much steeper price tag: $30 per million input tokens and $180 per million output tokens, twelvefold the standard tier.^[9] The variant is available to ChatGPT Pro and Enterprise subscribers and through the API for all customers.

GPT-5.4 mini and nano

OpenAI added GPT-5.4 mini and GPT-5.4 nano on March 17, 2026, in a follow-up post titled "Introducing GPT-5.4 mini and nano."^[2] Mini retains computer use and tool search but trims the context window to 400,000 tokens.^[4] It is OpenAI's recommended option for high-volume agent and subagent workloads where the standard Thinking model is too expensive.^[2] In ChatGPT, mini is available to Free and Go users as the Thinking option, giving free-tier users their first access to a GPT-5.4 derivative.^[2]^[11]

Nano is the smaller of the two and is API-only. Computer use and tool search are not supported.^[5] It is targeted at high-volume classification, data extraction, ranking, and short-scope sub-agent work where speed and cost dominate the budget.^[5] Pricing for the small variants is substantially below the mainline tier.

GPT-5.4-Cyber

GPT-5.4-Cyber is a fine-tune of GPT-5.4 Thinking trained to support legitimate defensive cybersecurity work. OpenAI announced it in late April 2026, alongside an expanded Trusted Access for Cyber (TAC) program.^[21]^[22] The variant lowers the refusal boundary for tasks that the standard model would treat as borderline (vulnerability research, malware analysis, binary reverse engineering) and adds capabilities such as binary reverse engineering of compiled software without source code access.^[21] Access is restricted to identity-verified individuals and enterprise security teams, with optional Zero-Data-Retention waivers that allow OpenAI monitoring in exchange for higher permissiveness.^[21]^[22]

What are GPT-5.4's technical specifications?

Specification	GPT-5.4 Thinking	GPT-5.4 Pro	GPT-5.4 mini	GPT-5.4 nano
Context window	1,050,000 tokens	1,050,000 tokens	400,000 tokens	400,000 tokens
Maximum output tokens	128,000	128,000	128,000	128,000
Knowledge cutoff	August 31, 2025	August 31, 2025	August 31, 2025	August 31, 2025
Input modalities	Text, images	Text, images	Text, images	Text, images
Output modalities	Text	Text	Text	Text
Audio support	None	None	None	None
Reasoning effort	none, low, medium, high, xhigh	extended	low, medium, high	low, medium
Native computer use	Yes	Yes	Yes	No
Tool search	Yes	Yes	Yes	No
Function calling	Yes	Yes	Yes	Yes
Structured outputs	Yes	Yes	Yes	Yes
Streaming	Yes	Yes	Yes	Yes
Fine-tuning	No	No	No	No

The context window is OpenAI's first to cross the 1 million-token threshold for a mainline general-purpose model, surpassing the 400K tokens used by GPT-5.2 Thinking and matching what Anthropic and Google offered in their highest-tier 2026 models.^[10]^[12] The 128,000-token output cap is unchanged from earlier GPT-5 releases, though it can be combined with the larger context window to support longer overall sessions through compaction.^[3] The August 31, 2025 knowledge cutoff is identical to GPT-5.2 and GPT-5.3, meaning GPT-5.4 has no awareness of post-cutoff events without retrieval or web search tools enabled.^[3]
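To make the limits concrete, the following sketch checks whether a prompt of a given size leaves room for a reserved output budget. The window sizes and output cap come from the specification table; the function name `fits` and the assumption that input and output tokens share a single window are illustrative and not stated in the sources.

```python
# Context window and output cap per variant, from the specification table.
CONTEXT_WINDOW = {
    "gpt-5.4": 1_050_000,
    "gpt-5.4-pro": 1_050_000,
    "gpt-5.4-mini": 400_000,
    "gpt-5.4-nano": 400_000,
}
MAX_OUTPUT_TOKENS = 128_000


def fits(model: str, input_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """Check whether a prompt plus reserved output fits the context window.

    Assumption: input and output tokens count against one shared window.
    """
    if reserved_output > MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output exceeds the 128,000-token output cap")
    return input_tokens + reserved_output <= CONTEXT_WINDOW[model]


print(fits("gpt-5.4", 900_000))       # 900k prompt on the mainline model
print(fits("gpt-5.4-mini", 300_000))  # 300k prompt on mini, full output reserve
```

Under these assumptions a 900,000-token prompt fits the mainline model with the full output reserve, while a 300,000-token prompt on mini fits only if the reserve is lowered (for example to 50,000 tokens).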

What is native computer use?

Native computer use is the most novel addition. GPT-5.4 can take screenshots of a desktop, identify UI elements, move a virtual mouse, type into windows, and chain operations across applications.^[10] The model was trained jointly on the desktop interaction data that previously powered OpenAI's Computer Using Agent (CUA) line, and the OSWorld-Verified score of 75% indicates that its first-attempt completion rate on a curated set of 369 desktop tasks crossed the human reference baseline of 72.4%.^[1]^[12] On a private OpenAI evaluation involving roughly 30,000 HOA and property tax portals, GPT-5.4 reportedly achieved 95% first-attempt success and 100% within three attempts.^[1]
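The observe-then-act cycle described above can be sketched as a simple loop. Everything in this example is a stand-in: `FakeDesktop`, `Action`, and `scripted_policy` are hypothetical names, the "screenshot" is a text string rather than an image, and the policy replays a fixed plan where a real system would query the model. It illustrates the control flow only, not OpenAI's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    payload: str = ""  # text to type, or a label for the element to click


@dataclass
class FakeDesktop:
    """Stand-in for a real desktop: records actions instead of performing them."""
    log: list = field(default_factory=list)

    def screenshot(self) -> str:
        # A real harness would return image bytes; here the "screen" is a
        # text summary of how many actions have been applied so far.
        return f"screen after {len(self.log)} actions"

    def apply(self, action: Action) -> None:
        self.log.append((action.kind, action.payload))


def scripted_policy(step: int, observation: str) -> Action:
    """Placeholder for the model: replays a fixed plan, ignoring the screen."""
    plan = [Action("click", "Search box"), Action("type", "parcel 1234"), Action("done")]
    return plan[step]


def run_episode(desktop: FakeDesktop, policy, max_steps: int = 10) -> bool:
    """Observe, decide, act, and repeat until the policy signals completion."""
    for step in range(max_steps):
        action = policy(step, desktop.screenshot())
        if action.kind == "done":
            return True
        desktop.apply(action)
    return False  # step budget exhausted without finishing


desktop = FakeDesktop()
finished = run_episode(desktop, scripted_policy)
print(finished, desktop.log)
```

The step budget is the design choice worth noting: a bounded loop lets a harness count an attempt as failed and retry, which is how first-attempt and within-three-attempts success rates can be measured separately.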

How does tool search cut token usage?

Tool search is the second major addition. Rather than including the full schema for every available tool in every API call, the model receives a lightweight catalog and queries it on demand.^[19]^[24] OpenAI reported a 47% reduction in token usage on benchmarks involving 36 MCP servers without loss of accuracy.^[19] The mechanism preserves the prompt cache better and scales to larger tool ecosystems than the prior "all tools in the prompt" approach.^[19]
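The catalog-and-lookup idea can be illustrated with a small, self-contained sketch. The tool names, schemas, and the keyword-matching `search_tools` function below are invented for illustration and do not reflect OpenAI's actual tool search interface; the point is only that a name-plus-description catalog is smaller than the full set of definitions, and that full schemas are fetched for matching tools alone.

```python
import json

# Full tool definitions, kept out of the prompt until requested.
# Tool names and schemas are hypothetical.
FULL_SCHEMAS = {
    "get_invoice": {
        "description": "Fetch an invoice by ID from the billing system.",
        "parameters": {"type": "object", "properties": {"invoice_id": {"type": "string"}}},
    },
    "send_email": {
        "description": "Send an email to a recipient.",
        "parameters": {"type": "object", "properties": {"to": {"type": "string"}, "body": {"type": "string"}}},
    },
    "create_ticket": {
        "description": "Open a support ticket with a title and priority.",
        "parameters": {"type": "object", "properties": {"title": {"type": "string"}, "priority": {"type": "string"}}},
    },
}


def catalog() -> list[dict]:
    """Lightweight listing: name plus one-line description, no parameter schema."""
    return [{"name": n, "description": s["description"]} for n, s in FULL_SCHEMAS.items()]


def search_tools(query: str) -> list[dict]:
    """Return full definitions only for tools whose name or description matches."""
    terms = query.lower().split()
    hits = []
    for name, schema in FULL_SCHEMAS.items():
        haystack = f"{name} {schema['description']}".lower()
        if any(term in haystack for term in terms):
            hits.append({"name": name, **schema})
    return hits


catalog_size = len(json.dumps(catalog()))
full_size = len(json.dumps(FULL_SCHEMAS))
print(catalog_size < full_size)
print([tool["name"] for tool in search_tools("invoice")])
```

With only three toy tools the saving is small; the reported 47% figure comes from OpenAI's own benchmark setup with 36 MCP servers, which this sketch does not attempt to reproduce.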

How well does GPT-5.4 perform on benchmarks?

OpenAI published a wide set of benchmark scores at launch.^[1] Third-party trackers such as Artificial Analysis, Vellum, llm-stats.com, and DataCamp confirmed many of them, with the usual caveats that some scores reflect specific reasoning effort settings or tool configurations.^[8]^[10]

Headline benchmarks

Benchmark	GPT-5.4 Thinking	GPT-5.4 Pro	GPT-5.3-Codex	GPT-5.2 Thinking
GDPval (professional knowledge work)	83.0%	not separately reported	not reported	70.9%
OSWorld-Verified (computer use)	75.0%	not separately reported	64.7%	47.3%
WebArena Verified	67.3%	not separately reported	not reported	not reported
Online-Mind2Web	92.8%	not separately reported	not reported	not reported
BrowseComp	82.7%	89.3%	not reported	not reported
Toolathlon	54.6%	not separately reported	not reported	not reported
SWE-bench Verified	~80%	~80%	~80%	80.0%
SWE-bench Pro Public	57.7%	not separately reported	56.8%	55.6%
Terminal-Bench 2.0	75.0%	not separately reported	77.3%	62.2%
GPQA Diamond	reported in 91 to 92% range	94.4%	not reported	92.4%
AIME 2025	100%	100%	100%	100%
FrontierMath (Tiers 1 to 3)	47.6%	not separately reported	not reported	40.3%
FrontierMath Tier 4	not reported	38.0%	not reported	not reported
Humanity's Last Exam (with tools)	52.1%	not reported	not reported	34.5%
Humanity's Last Exam (Artificial Analysis)	not reported	41.6%	not reported	not reported
ARC-AGI-1	93.7%	not separately reported	not reported	~88%
ARC-AGI-2	73.3%	not separately reported	not reported	52.9%

The most prominent gains versus GPT-5.2 are on computer use and abstract reasoning. OSWorld-Verified rose from 47.3% to 75.0%, a 27.7 percentage-point jump and the largest single-generation gain on that benchmark.^[1]^[12] ARC-AGI-2 climbed from 52.9% to 73.3%, continuing the rapid progression that had begun with GPT-5.2. Knowledge work as measured by GDPval rose from 70.9% to 83.0%, with OpenAI noting that the gain was strongest on document-heavy occupations such as legal, financial, and project-management roles.^[1]
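The percentage-point gains quoted here follow directly from the table; a short check, using only the scores listed above (the variable and function names are illustrative):

```python
# (GPT-5.2 Thinking, GPT-5.4 Thinking) scores in percent, from the table above.
SCORES = {
    "OSWorld-Verified": (47.3, 75.0),
    "ARC-AGI-2": (52.9, 73.3),
    "GDPval": (70.9, 83.0),
}


def gains(scores: dict) -> dict:
    """Absolute gain in percentage points, rounded to one decimal place."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


for name, delta in gains(SCORES).items():
    print(f"{name}: +{delta} points")
```

This gives 27.7 points for OSWorld-Verified, 20.4 for ARC-AGI-2, and 12.1 for GDPval. These are absolute differences in percentage points, not relative improvements.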

Coding benchmarks

GPT-5.4 absorbed the coding capability of GPT-5.3-Codex without surpassing it on every coding metric. SWE-bench Pro Public improved by 0.9 points (56.8% to 57.7%), and SWE-bench Verified stayed near the 80% range.^[14] Terminal-Bench 2.0 actually fell from 77.3% in GPT-5.3-Codex to 75.0% in GPT-5.4, reflecting the breadth-versus-specialization tradeoff of merging the Codex line back into the mainline.^[14] OpenAI's argument was that GPT-5.4's broader scope (long context, computer use, knowledge work, and tool search) made up for the marginal regression on terminal-specific tasks.^[1] Independent reviewers including Nathan Lambert (Interconnects) described the upgrade as "a meaningful step" in practice across correctness, ease of use, speed, and cost, even where on-paper benchmarks looked incremental.^[16]

Document and spreadsheet evaluations

On an internal OpenAI benchmark of spreadsheet modeling tasks of the kind a junior investment-banking analyst might perform, GPT-5.4 achieved a mean score of 87.3%, compared with 68.4% for GPT-5.2.^[1] Human raters preferred GPT-5.4 presentations to GPT-5.2 presentations 68.0% of the time.^[1] On legal document analysis (BigLaw Bench), GPT-5.4 reached 91% per third-party reporting.^[23]

How accurate is GPT-5.4?

OpenAI reported that individual factual claims in GPT-5.4 outputs are 33% less likely to be wrong than in GPT-5.2, and overall responses are 18% less likely to contain any factual errors.^[1] The largest hallucination reductions were on web-enabled queries and on professional domains such as legal, medical, and financial, in line with the trend across earlier GPT-5 point releases.^[1]

Public leaderboards

On LM Arena, GPT-5.4 took the #1 Elo position on March 6, 2026, the day after launch.^[27] Through April 2026, GPT-5.4-high held a top-five position with an Elo around 1,480.^[27] The standard GPT-5.4 entry tracked at around Elo 1,466, ranking in the top 20.^[27] On Artificial Analysis's Intelligence Index it scored 57, tied with Gemini 3.1 Pro and ahead of Claude Opus 4.6 at 53.^[8] Time to first token in reasoning mode averaged 207 seconds, and throughput averaged 80.3 tokens per second across 120 million evaluated output tokens.^[8]
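The two latency figures can be combined into a rough end-to-end estimate. The sketch below assumes a simple model in which total time is the wait for the first token plus the streaming time at the average throughput; that additive model, and the function name, are simplifying assumptions rather than anything Artificial Analysis publishes.

```python
TTFT_SECONDS = 207.0       # average time to first token, reasoning mode
TOKENS_PER_SECOND = 80.3   # average output throughput


def estimated_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time: wait for the first token, then stream the rest."""
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


total = estimated_seconds(2_000)
print(f"{total:.0f} s total, {TTFT_SECONDS / total:.0%} of it before the first token")
```

For a 2,000-token answer the estimate is roughly 232 seconds, with most of the wait occurring before any output appears, which is consistent with reviewers describing the model as slow in interactive use.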

How much does GPT-5.4 cost?

API pricing

Tier	Input (per 1M tokens)	Cached input (per 1M)	Output (per 1M tokens)
GPT-5.4 Thinking	$2.50	$0.25	$15.00
GPT-5.4 Pro	$30.00	not published	$180.00
GPT-5.4 mini	$0.75	$0.075	$4.50
GPT-5.4 nano	$0.20	$0.02	$1.25
GPT-5.2 Thinking (prior tier reference)	$1.75	$0.175	$14.00
GPT-5.3-Codex (prior reference)	$1.75	$0.175	$14.00

The Thinking tier reflects a 43% increase in input price and a 7% increase in output price relative to GPT-5.2 Thinking.^[12] Cached input retains the 90% discount.^[12] Batch API pricing applies the standard 50% discount, and OpenAI's regional data residency endpoints add a 10% surcharge. Some reviewers, including The Decoder, characterized GPT-5.4 mini and nano as roughly four times more expensive than equivalent GPT-5.0 small models, citing this as a friction point for high-volume workloads.^[11]
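A small cost estimator makes the pricing arithmetic easier to follow. The rates are taken from the pricing table; the function names and the assumption that the 50% batch discount applies uniformly to the whole bill are illustrative, and the data-residency surcharge is left out.

```python
# USD per 1M tokens: (input, cached input, output), from the pricing table.
PRICES = {
    "gpt-5.4": (2.50, 0.25, 15.00),
    "gpt-5.4-mini": (0.75, 0.075, 4.50),
    "gpt-5.4-nano": (0.20, 0.02, 1.25),
    "gpt-5.2": (1.75, 0.175, 14.00),
}


def request_cost(model, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens billed at the cached rate.
    Assumption: the 50% batch discount applies uniformly to the whole bill.
    """
    inp, cached, out = PRICES[model]
    fresh = input_tokens - cached_tokens
    cost = (fresh * inp + cached_tokens * cached + output_tokens * out) / 1_000_000
    return cost * 0.5 if batch else cost


def pct_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


print(f"${request_cost('gpt-5.4', 100_000, 10_000):.2f}")
print(f"{pct_increase(1.75, 2.50):.0f}% input, {pct_increase(14.00, 15.00):.0f}% output")
```

A request with 100,000 input tokens and 10,000 output tokens comes to about $0.40 on GPT-5.4 Thinking, or about $0.22 if 80,000 of the input tokens are served from cache. The percentage helper reproduces the 43% and 7% increases over GPT-5.2 Thinking.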

GPT-5.4 Pro pricing puts it among the most expensive models on the market. Artificial Analysis ranked it 143 out of 145 models on input price and 144 out of 145 on output price, comparable in tier to OpenAI's prior Pro releases.^[9]

ChatGPT plans

Plan	GPT-5.4 Thinking	GPT-5.4 Pro	GPT-5.4 mini
Free	No	No	Yes (Thinking option, rate limited)
Go	No	No	Yes
Plus ($20/month)	Yes (80 messages per 3 hours)	No	Yes
Pro ($200/month)	Yes (unlimited)	Yes (unlimited)	Yes
Team / Business	Yes	No	Yes
Enterprise / Edu	Yes (admin opt-in)	Yes (admin opt-in)	Yes

Free users initially had no access to GPT-5.4 itself, only the GPT-5.3 Instant model.^[1] The mini-tier addition on March 17 closed that gap by exposing GPT-5.4 mini through the Thinking feature on the free plan, although with rate limits.^[2]^[11]

What is GPT-5.4 used for?

Computer use

GPT-5.4's most-discussed feature is native computer use. OpenAI's announcement framed it as a step from chat-based assistance toward true desktop agency: the model can open applications, navigate menus, fill in spreadsheets, click through web forms, and chain operations without bespoke automation scripts.^[1]^[10] Common patterns include scraping HOA portals for property data, running multi-step forms across legacy government websites, populating Excel workbooks from PDF inputs, and driving SaaS dashboards through their UI rather than their API.^[1]

Knowledge work

The GDPval score of 83% reflects performance across 44 distinct professional categories, including financial analysis, legal drafting, medical chart abstraction, project planning, and engineering documentation.^[1] OpenAI's example workflows include producing a sales presentation, generating an accounting spreadsheet, scheduling staff for an urgent care clinic, drawing manufacturing diagrams, and producing short videos.^[1] The model is positioned for tasks that combine reading large documents, reasoning over them, drafting structured artifacts, and using tools to verify or refine the output.

Coding

With the merge of GPT-5.3-Codex's coding capability, GPT-5.4 became the recommended coding option in Codex CLI and the major IDE integrations.^[1]^[14] Reviewers noted improvements in long-running coding sessions: better context management, fewer "context wall" failures on million-token repositories, more reliable git operations, and reduced regression to previously solved problems.^[16] Cursor's evaluation data, cited by Nathan Lambert, showed efficiency gains in tokens per task.^[16] GPT-5.3-Codex retained narrow advantages in highly terminal-focused workflows where its specialist tuning still mattered.^[14]

Agentic workflows

The combination of long context, computer use, tool search, and the existing developer tools (apply_patch, local_shell, preambles) makes GPT-5.4 OpenAI's first model that can sustain agentic workflows over millions of tokens and many tools without specialist routing.^[1] Tool search reportedly cuts token usage on tool-heavy workflows by roughly half.^[19] Mid-response interactive thinking lets users redirect the model partway through a long reasoning trajectory rather than waiting for the entire run to finish.^[1]

Document, spreadsheet, and presentation generation

OpenAI emphasized professional artifacts in its launch demos: financial models, legal memos, board presentation decks, and multi-tab spreadsheets.^[1] The 87.3% spreadsheet score and the 68.0% presentation-preference rate are the headline numbers.^[1] Several enterprise customers including Notion, Box, and Mercor cited workflow improvements on internal benchmarks.

How does GPT-5.4 compare with competitors?

At the time of GPT-5.4's release, the primary competing frontier models were Claude Opus 4.6 and Claude Opus 4.7 (1M context) from Anthropic, Gemini 3.1 Pro from Google, and a small number of open-weights releases. The table below uses third-party benchmark numbers as reported in the months after launch.

Benchmark	GPT-5.4 Thinking	Claude Opus 4.7 (1M context)	Gemini 3.1 Pro
Context window	1.05M tokens	1M tokens	1M to 2M tokens
OSWorld-Verified	75.0%	72.5%	not directly reported
GDPval	83.0%	reported lower	not directly reported
SWE-bench Verified	~80%	80.8%	76% range
Humanity's Last Exam (Artificial Analysis)	not directly reported	reported lower	44.7% (preview)
Artificial Analysis Intelligence Index	57	53	57
Input price (per 1M tokens)	$2.50	$3 to $5 range	$2 range
Output price (per 1M tokens)	$15.00	$15 to $25 range	$12 range

GPT-5.4 led on computer use and knowledge work, sat near the top of the SWE-bench Verified pack, and was competitive on price for a frontier reasoning model.^[8]^[12] Claude Opus 4.6 retained narrow advantages on selected SWE-bench measurements and on tasks with ambiguous specifications, where reviewers found it inferred developer intent more reliably. Gemini 3.1 Pro held the top score on Humanity's Last Exam in the preview phase and offered cheaper input pricing along with better integration with Google Cloud.^[9]

The split between GPT-5.4 and competitors is qualitatively similar to the GPT-5.2 era: OpenAI's model is faster and more precise on well-specified tasks; Claude is more reliable on ambiguous ones; Gemini is strongest on long-context and Google-ecosystem workflows. GPT-5.4 closed the context-window gap that had persisted since the GPT-5 launch.

Is GPT-5.4 safe to use?

GPT-5.4 Thinking was the second model OpenAI classified as High capability for cybersecurity under the Preparedness Framework, after GPT-5.3-Codex earlier in 2026.^[22] The classification triggered the same set of layered safeguards that had accompanied the GPT-5.3-Codex launch: refusal training on clearly malicious requests, automated classifier-based monitoring of high-risk traffic, a fallback model for traffic flagged as suspicious, and the gated Trusted Access for Cyber program for advanced capabilities.^[22]

The High threshold under the framework is defined as a model that "removes existing bottlenecks to scaling cyber operations" through automation of end-to-end attacks against hardened targets or automation of operationally relevant vulnerability discovery and exploitation.^[22] OpenAI did not claim definitive evidence that GPT-5.4 reaches this threshold but adopted a precautionary approach as it had with prior Codex releases.^[22] The Critical threshold (zero-day discovery in many hardened systems without human intervention, or end-to-end novel cyberattack strategy execution) was not met.^[22]

OpenAI also reported safety improvements for ordinary user interactions. Hallucination rates on representative ChatGPT traffic stayed under 1% per claim with browsing enabled, continuing the trend established by GPT-5.2 and GPT-5.3 Instant.^[1] Mental health handling and self-harm response benchmarks improved relative to GPT-5.2.^[1]

GPT-5.4-Cyber, released in late April, formalized the cyber-permissive variant for verified defensive professionals.^[21] It includes binary reverse engineering capability, lower refusal rates on legitimate vulnerability research, and a tiered access model with identity verification, enterprise authentication, and Zero-Data-Retention waivers for the highest permissiveness tier.^[21]^[22]

How was GPT-5.4 received?

Developer and enterprise reception

Developer reception was largely positive on capability. Reviewers from Turing College, NxCode, BuildFastWithAI, and Interconnects characterized GPT-5.4 as a meaningful step forward, particularly on long sessions where context-window pressure had been a persistent bottleneck for the GPT-5.2 generation.^[13]^[16]^[17]^[18] Cursor's data showed efficiency gains in tokens per coding task.^[16] Mercor named GPT-5.4 the leader on its APEX-Agents law and finance benchmark.

Nathan Lambert summarized the trajectory: "Where GPT 5.4 feels like another incremental model on some on-paper benchmarks, in practice it feels like a meaningful step" across correctness, ease of use, speed, and cost.^[16] He praised the elimination of the "death by a thousand cuts" experience around git operations and background package management that had plagued earlier Codex variants.^[16]

Enterprise customers cited workflow improvements. The 95% first-attempt success on HOA portals was repeatedly highlighted by adopters in property tech and finance.^[1] Box, Notion, and Mercor reported productivity gains on internal benchmarks. Microsoft's day-zero Azure AI Foundry availability gave enterprise users a fast path to deployment.

Criticism

Criticism focused on three areas. First, pricing: the 43% input-price increase over GPT-5.2 Thinking, and the four-fold cost increase for mini and nano relative to the GPT-5.0 small models, drew complaints from cost-sensitive teams.^[11]^[12] The Decoder ran a critical piece on the small-model price increases.^[11]

Second, latency: Artificial Analysis measured a 207-second average time to first token in reasoning mode.^[8] While this is partly inherent to extended reasoning, several reviewers noted that GPT-5.4 felt slower in interactive use than GPT-5.2 Thinking on equivalent prompts.

Third, mixed retention of Codex specialization: the Terminal-Bench 2.0 regression from 77.3% to 75.0% relative to GPT-5.3-Codex prompted some terminal-heavy teams to keep GPT-5.3-Codex as their primary coding model.^[14] GPT-5.4's broader scope was seen as a tradeoff against the highly tuned specialist model.^[14]

Reaction in coverage and the open community

Mainstream coverage in TechCrunch, ZDNET, Vice, Wired, and CNBC framed GPT-5.4 as office automation infrastructure rather than a chatbot upgrade, in particular through the Excel and presentation integrations and the promise of fewer iteration loops on professional artifacts.^[6] The Decoder and Vice highlighted token efficiency and deep research. Some threads on Reddit's r/MachineLearning and on Hacker News raised data-handling concerns; a small "soft boycott" of frontier closed models in favor of local options like Llama 3.x was reported by AI Critique and other outlets, though it did not register as a measurable shift in market share.^[20]

What came after GPT-5.4?

GPT-5.4 was succeeded by GPT-5.5, released April 23, 2026, seven weeks after the GPT-5.4 launch. GPT-5.5 carried forward the 1M-token context window and computer-use capability, increased token efficiency, raised intelligence-index scores further, and added new agentic features.^[21] GPT-5.5 was priced higher than GPT-5.4 in the API but with lower effective per-task cost on many workloads due to fewer tokens consumed. GPT-5.5 Instant followed on May 5, 2026 as the default model in ChatGPT.

GPT-5.4 was scheduled to remain available in the API through at least the end of 2026 in line with OpenAI's standard transition policy.

References

1. OpenAI. "Introducing GPT-5.4." openai.com, March 5, 2026. https://openai.com/index/introducing-gpt-5-4/
2. OpenAI. "Introducing GPT-5.4 mini and nano." openai.com, March 17, 2026. https://openai.com/index/introducing-gpt-5-4-mini-and-nano/
3. OpenAI. "GPT-5.4 Model." developers.openai.com. https://developers.openai.com/api/docs/models/gpt-5.4
4. OpenAI. "GPT-5.4 mini Model." developers.openai.com. https://developers.openai.com/api/docs/models/gpt-5.4-mini
5. OpenAI. "GPT-5.4 nano Model." developers.openai.com. https://developers.openai.com/api/docs/models/gpt-5.4-nano
6. TechCrunch. "OpenAI launches GPT-5.4 with Pro and Thinking versions." techcrunch.com, March 5, 2026. https://techcrunch.com/2026/03/05/openai-launches-gpt-5-4-with-pro-and-thinking-versions/
7. Wikipedia contributors. "GPT-5.4." Wikipedia. https://en.wikipedia.org/wiki/GPT-5.4
8. Artificial Analysis. "GPT-5.4 (xhigh) Intelligence, Performance & Price Analysis." artificialanalysis.ai. https://artificialanalysis.ai/models/gpt-5-4
9. Artificial Analysis. "GPT-5.4 Pro (xhigh) Intelligence, Performance & Price Analysis." artificialanalysis.ai. https://artificialanalysis.ai/models/gpt-5-4-pro
10. DataCamp. "GPT-5.4: Native Computer Use, 1M Context Window, Tool Search." datacamp.com. https://www.datacamp.com/blog/gpt-5-4
11. DataCamp. "GPT-5.4 mini and nano: Benchmarks, Access, and Reactions." datacamp.com. https://www.datacamp.com/blog/gpt-5-4-mini-nano
12. NxCode. "GPT-5.4 (March 2026): 75% Computer Use, 1M Context, $2.50/MTok." nxcode.io. https://www.nxcode.io/resources/news/gpt-5-4-release-date-features-pricing-2026
13. NxCode. "GPT 5.4 Complete Guide 2026: Features, Pricing, Benchmarks & How to Use." nxcode.io. https://www.nxcode.io/resources/news/gpt-5-4-complete-guide-features-pricing-models-2026
14. NxCode. "GPT-5.4 vs GPT-5.3 Codex: Should Developers Upgrade?" nxcode.io. https://www.nxcode.io/resources/news/gpt-5-4-vs-gpt-5-3-codex-upgrade-comparison-2026
15. NxCode. "OpenAI GPT-5 Model Guide: GPT-5.2 vs 5.3 vs 5.4." nxcode.io. https://www.nxcode.io/resources/news/openai-gpt-5-model-guide-which-to-use-2026
16. Lambert, Nathan. "GPT 5.4 is a big step for Codex." interconnects.ai. https://www.interconnects.ai/p/gpt-54-is-a-big-step-for-codex
17. BuildFastWithAI. "GPT-5.4 Review: Features, Benchmarks & Access (2026)." buildfastwithai.com. https://www.buildfastwithai.com/blogs/gpt-5-4-review-benchmarks-2026
18. Turing College. "GPT-5.4 Review: Is It Worth Leaving GPT-5.3 Codex Behind?" turingcollege.com. https://www.turingcollege.com/blog/gpt-5-4-review-vs-gpt-5-3-codex
19. MindStudio. "What Is Tool Search? How GPT-5.4 Cuts Token Usage by 47%." mindstudio.ai. https://www.mindstudio.ai/blog/what-is-tool-search-gpt-5-4-token-efficiency
20. AI Critique. "GPT-5.4 and the March 2026 ChatGPT Upgrade Cycle." aicritique.org, March 16, 2026. https://www.aicritique.org/us/2026/03/16/gpt-5-4-and-the-march-2026-chatgpt-upgrade-cycle-official-release-media-narratives-and-real-world-reactions/
21. Fluid Attacks. "GPT-5.4-Cyber and GPT-5.5 for security." fluidattacks.com. https://fluidattacks.com/blog/gpt-5-4-cyber-gpt-5-5-ai-cybersecurity-future
22. OpenAI. "Trusted access for the next era of cyber defense." openai.com. https://openai.com/index/scaling-trusted-access-for-cyber-defense/
23. ALMCorp. "OpenAI GPT-5.4: Features, Benchmarks, Pricing & Computer Use (2026)." almcorp.com. https://almcorp.com/blog/gpt-5-4/
24. CometAPI. "How to Use GPT-5.4 API: Parameters and Tools Usage Guide." cometapi.com. https://www.cometapi.com/how-to-use-gpt-5-4-api/
25. TTMS. "GPT-5.4 by OpenAI: What's new? 9 Key Improvements." ttms.com. https://ttms.com/gpt-5-4-by-openai-whats-new-9-key-improvements/
26. AnalyticsVidhya. "We Tried The New GPT-5.4 And it is The Most Powerful ChatGPT Has Ever Been." analyticsvidhya.com, March 2026. https://www.analyticsvidhya.com/blog/2026/03/we-tried-gpt-5-4-ai/
27. VPSRanking. "LMArena Ranking Update GPT-5.4 Takes #1 Elo Rating." vpsranking.com. https://vpsranking.com/news/ai/ai-2026-03-06-lmarena
