GPT-4.5 is a large language model developed by OpenAI and released as a research preview on February 27, 2025. Internally, it was codenamed Orion, a name that had been the subject of speculation in AI press for more than a year. OpenAI described GPT-4.5 as its largest and most knowledgeable model to date, framing it as a step forward in scaling unsupervised pre-training rather than a new entry in the chain-of-thought reasoning track that the company had opened with o1 and o3. Sam Altman called it "our last non-chain-of-thought model," signaling that subsequent flagships, beginning with GPT-5, would combine pre-trained intuition with explicit reasoning in a single system.[1][2][3]
The release shipped first to ChatGPT Pro ($200 per month) subscribers and to API customers on usage tier 5, with ChatGPT Plus access following the next week. API pricing was set at $75 per million input tokens and $150 per million output tokens, 30 times GPT-4o's input rate and 15 times its output rate, and the highest production rate OpenAI has ever charged for a flagship model. The price reflected the unusual amount of compute the model used during training and the fact that, on a per-request basis, it was substantially heavier to serve than any previous OpenAI release.[1][4][5]
Reception was mixed and, in important quarters, harshly negative. On reasoning-style benchmarks such as AIME and GPQA, GPT-4.5 trailed both OpenAI's own o1 and o3-mini models and several competitors, including DeepSeek-R1 and Claude 3.7 Sonnet. On knowledge benchmarks like SimpleQA, it set a new high among non-reasoning models, with a hallucination rate of 37.1% (compared to 61.8% for GPT-4o). Press coverage from outlets including TechCrunch, Fortune, The Verge, and the Financial Times treated the launch as evidence that gains from scaling pre-training alone had begun to plateau, a debate that had been brewing since reports of an underwhelming Orion training run leaked the previous autumn.[2][6][7][8]
GPT-4.5 had a short commercial life. On April 14, 2025, OpenAI announced GPT-4.5's deprecation alongside the launch of GPT-4.1, citing GPT-4.1's ability to match or exceed GPT-4.5 "in many key capabilities at much lower cost and latency." The model was removed from the API on July 14, 2025, less than five months after its public debut. It remained accessible to ChatGPT Pro users for a few additional weeks before being absorbed into the GPT-5 launch in August. The episode is now widely cited as the cleanest public example of pre-training scaling running into diminishing returns at the frontier, and as the moment OpenAI's product strategy pivoted decisively toward reasoning-augmented models.[5][9][10]
| Aspect | Detail |
|---|---|
| Developer | OpenAI |
| Codename | Orion |
| Announcement date | February 27, 2025 |
| Research preview launch | February 27, 2025 (ChatGPT Pro and API tier 5+) |
| ChatGPT Plus rollout | Week of March 3, 2025 (after GPU expansion) |
| Type | Non-reasoning large language model |
| Predecessors | GPT-4, GPT-4o |
| Successor | GPT-4.1 (API), GPT-5 (consumer) |
| Context window | 128,000 tokens |
| Maximum output | 16,384 tokens |
| Knowledge cutoff | October 2023 |
| Modalities | Text input and output, image input |
| Native audio | No |
| Reasoning mode | None (no chain-of-thought) |
| API model ID | gpt-4.5-preview, gpt-4.5-preview-2025-02-27 |
| API deprecation announced | April 14, 2025 |
| API removal date | July 14, 2025 |
| ChatGPT removal | August 7, 2025 (folded into GPT-5 launch) |
The codename Orion had circulated in AI press throughout 2024 as the working name for OpenAI's next pre-training run. Reuters, The Information, and Bloomberg ran a steady cadence of stories describing a single very large model that the company hoped would represent the kind of generational leap GPT-4 had delivered over GPT-3. Internal expectations, according to those reports, were that Orion would be the foundation of GPT-5.[6][11]
By late 2024, the picture had shifted. The Information reported in November 2024 that Orion had fallen short of internal targets, with the gain from GPT-4 to Orion looking smaller than the gain from GPT-3 to GPT-4 had been. The Wall Street Journal followed with a long piece describing two failed pre-training attempts and a debate inside the company about whether to ship the model at all. Bloomberg framed it as the AI industry's first serious encounter with the limits of pure scaling, noting that Anthropic and Google DeepMind were experiencing similar slowdowns on their own large pre-training runs.[6][11][12]
At the same time, OpenAI launched o1 in December 2024 and announced o3 at the end of the "12 Days of OpenAI" event the same month. The reasoning track demonstrated that test-time compute could deliver large benchmark gains without a new pre-training run. Several commentators, including the SemiAnalysis newsletter and AI researcher Nathan Lambert, argued that this was the emerging answer to Orion's underperformance: capability gains would now come more from inference-time reasoning than from pre-training scale.[13][14]
In this context, Altman's January 2025 announcement that the company would "next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model" carried two messages. First, Orion would in fact ship, repackaged as a half-step (4.5) rather than a flagship (5). Second, OpenAI was telegraphing that the era of releasing a new flagship by training a larger pure pre-trained model was ending. Subsequent flagships would combine the "intuition" of pre-training with the deliberation of reasoning, an architecture that arrived with GPT-5 in August 2025.[3][6]
OpenAI announced GPT-4.5 in a blog post titled "Introducing GPT-4.5" published on February 27, 2025, accompanied by a roughly 12-minute live stream presented by research lead Mia Glaese and other members of the research team. The post framed the model as "a step forward in scaling up pre-training and post-training" and emphasized that GPT-4.5 had been trained with substantially more compute than any prior OpenAI model. The phrase that drew the most attention from the press was OpenAI's description of GPT-4.5 as the result of "scaling unsupervised learning" one more time, which several outlets read as a quiet acknowledgement that this generation of pre-training had reached its commercial ceiling.[1][7]
Access at launch was tiered. ChatGPT Pro subscribers got the model on the day of launch, both inside ChatGPT and through API access on usage tier 5 and above. ChatGPT Plus, Team, Enterprise, and Education users were told they would get access "the following week," subject to capacity. Free-tier ChatGPT users were not given access at any point during the research preview.[1][2]
Within hours of the launch, capacity became a problem. Altman posted on X on February 28, 2025 that OpenAI "will need tens of thousands more GPUs soon" and that the rollout would have to be staggered because the company had "run out of GPUs." Tom's Hardware, TechCrunch, and TechSpot all ran stories on the GPU shortage that week, citing it as the proximate reason ChatGPT Plus users had to wait. Several Plus subscribers reported being unable to access GPT-4.5 even after the official rollout began, and rate limits on the model in ChatGPT were tighter than for GPT-4o for the duration of the research preview.[15][16][17]
The API capacity story was similar. OpenAI initially gated GPT-4.5 to tier 5 customers (those with at least $1,000 of historical spend on the platform) and lowered tier requirements over the following weeks. Several developer forum threads documented latency in the 100 to 150 second range for moderately long prompts in the first month, well above what production teams were used to from GPT-4o.[4][16]
| Date | Milestone |
|---|---|
| February 27, 2025 | Research preview launch; ChatGPT Pro and API tier 5+ |
| February 28, 2025 | Altman tweets that OpenAI is "out of GPUs"; staggered rollout confirmed |
| Week of March 3, 2025 | ChatGPT Plus, Team rollout begins, capacity-permitting |
| March 5, 2025 | ChatGPT Enterprise and Education access announced |
| March 27, 2025 | API tier requirements lowered; broader developer access |
| April 14, 2025 | API deprecation notice issued; July 14, 2025 set as removal date |
| April 14, 2025 | GPT-4.1 launched as API replacement |
| July 14, 2025 | gpt-4.5-preview removed from API |
| August 7, 2025 | GPT-4.5 retired from ChatGPT alongside GPT-5 launch |
OpenAI did not disclose GPT-4.5's parameter count, training data composition, hardware footprint, or expert configuration. The company has been consistent in keeping these details private since GPT-4, citing competitive and safety considerations. What is on the public record is qualitative: GPT-4.5 was trained with "substantially more" compute than GPT-4o, it does not use chain-of-thought reasoning during inference, and it was the largest non-reasoning model OpenAI had built as of February 2025.[1][2][7]
Independent analysts speculated about the underlying design. SemiAnalysis, in its December 2024 piece on Orion and reasoning training infrastructure, estimated that the model used roughly 5 to 7 trillion total parameters with a sparsity factor similar to GPT-4, putting active parameters per forward pass somewhere around 600 billion. The same piece estimated training compute at roughly 10 times that of GPT-4. None of these figures were confirmed by OpenAI.[13]
Most outside observers assumed GPT-4.5 inherited the mixture of experts architecture that has been widely reported for GPT-4, since dense scaling at the implied parameter count would be uneconomical to serve at scale. The very high API price ($75 input, $150 output per million tokens) was generally read as a tell. At those rates, the model was almost certainly running on substantially more active compute per token than GPT-4o, whether through more experts active per pass, longer per-token compute paths, or both.[5][13]
GPT-4.5 is a non-reasoning model in OpenAI's terminology. It does not produce a hidden chain of thought before answering. It does not expose a reasoning_effort parameter. It does not have an extended thinking mode. The post-training pipeline did include reinforcement learning from human feedback and supervised fine-tuning, but it did not include the deliberative alignment training that powers the o-series.[1][2][18]
GPT-4.5 accepts text and image inputs and produces text outputs. It does not natively process audio, which had been one of the headline capabilities of GPT-4o. It also does not generate images natively (the gpt-image-1 launch was still two months in the future). For developers, the model supported function calling, structured outputs (JSON Schema), streaming, and the standard chat completions and Responses APIs.[1][4][18]
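The developer-facing surface described above can be illustrated with a minimal sketch of a chat completions request body targeting the model. The tool name (`lookup_fact`) and response schema are hypothetical examples; the model ID, token limits, and feature list come from this section.

```python
# Illustrative request body for the chat completions endpoint with
# gpt-4.5-preview, exercising the features listed above: function
# calling ("tools"), structured outputs (JSON Schema), and streaming.
# The tool and schema names here are hypothetical.

def build_request(prompt: str) -> dict:
    return {
        "model": "gpt-4.5-preview",      # dated alias: gpt-4.5-preview-2025-02-27
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,                  # streaming was supported
        "max_tokens": 16_384,            # model's maximum output length
        "tools": [{                      # function calling
            "type": "function",
            "function": {
                "name": "lookup_fact",   # hypothetical tool
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            },
        }],
        "response_format": {             # structured outputs
            "type": "json_schema",
            "json_schema": {
                "name": "answer",        # hypothetical schema
                "schema": {
                    "type": "object",
                    "properties": {"text": {"type": "string"}},
                    "required": ["text"],
                },
            },
        },
    }

request = build_request("Summarize the launch post in one sentence.")
```

The same payload shape worked against both the chat completions and Responses APIs, with the Responses API using slightly different field names.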
OpenAI's framing for GPT-4.5 emphasized two qualitative capability claims: better world knowledge and higher emotional intelligence. The launch post described the model as having "a deeper world knowledge" and "higher emotional intelligence" than any prior OpenAI model, with internal evaluations showing improved instruction-following quality and lower hallucination rates. Internal testers were quoted describing the model as "warm," "intuitive," and "natural" in tone.[1][19]
For concrete capabilities, the model supports:

- Text and image inputs with text outputs
- Function calling and structured outputs (JSON Schema)
- Streaming responses
- The standard chat completions and Responses APIs
- A 128,000-token context window with up to 16,384 output tokens

The model does not support:

- Chain-of-thought reasoning or a reasoning_effort parameter
- Native audio input or output
- Native image generation
The "emotional intelligence" framing was the most distinctive piece of OpenAI's pitch. In the launch post, the company claimed that GPT-4.5 was better at picking up subtext, choosing when to advise versus listen, and adapting tone to the user's emotional state. Internal A/B tests showed humans preferring GPT-4.5 responses to GPT-4o responses 57.0% of the time on everyday queries, 63.2% on professional queries, and 56.8% on creative queries. OpenAI did not publish detailed methodology for these evaluations.[1][2]
Altman amplified this framing on X. His launch-day post described GPT-4.5 as "the first model that feels like talking to a thoughtful person to me," adding "i have had several moments where i've sat back in my chair and been astonished at getting actually good advice from an AI." The same post called the model "a giant, expensive model." The deliberate juxtaposition of "thoughtful person" against "giant, expensive" was widely read as Altman managing expectations: the model was meant to feel different in tone, not necessarily to top reasoning leaderboards.[20][21]
Independent reactions to the personality claim were mixed. Some prominent users, including Andrej Karpathy, reported that they preferred GPT-4o's writing style in blind comparisons. Others, including writer Henry Shevlin and several Substack-based AI commentators, argued that GPT-4.5 felt qualitatively warmer and more conversationally natural than its predecessors. Nathan Lambert, in a widely circulated Interconnects post, observed that "many prominent AI figures online were touting how GPT-4.5 is much nicer to use and better at writing" but cautioned that vibes-based capability claims were hard to test rigorously.[14][22]
The most defensible quantitative claim OpenAI made was on hallucinations. On SimpleQA, an internal benchmark of factual questions, GPT-4.5 reached 62.5% accuracy with a hallucination rate of 37.1%, compared to GPT-4o's hallucination rate of 61.8% and o3-mini's 80.3%. The reduction was substantial and drew positive coverage from VentureBeat, DataCamp, and Constellation Research. The improvement did not, however, close the gap with reasoning models on factually demanding tasks once those models had access to web search.[1][23][24]
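The relationship between the two SimpleQA numbers can be made concrete with a toy sketch. The assumption here, not confirmed by OpenAI, is that answers are graded correct, incorrect, or not attempted, and that the hallucination rate counts incorrect answers over all questions; the grade distribution below is fabricated to match GPT-4.5's reported figures.

```python
from collections import Counter

def simpleqa_metrics(grades):
    """grades: iterable of 'correct' | 'incorrect' | 'not_attempted'.

    Assumes hallucination rate = incorrect answers / all questions,
    which is consistent with the published GPT-4.5 numbers summing
    to just under 100% (62.5% + 37.1% = 99.6%).
    """
    counts = Counter(grades)
    total = sum(counts.values())
    accuracy = counts["correct"] / total
    hallucination_rate = counts["incorrect"] / total
    return accuracy, hallucination_rate

# Toy distribution of 1,000 graded answers matching the reported
# 62.5% accuracy and 37.1% hallucination rate.
grades = ["correct"] * 625 + ["incorrect"] * 371 + ["not_attempted"] * 4
acc, hall = simpleqa_metrics(grades)
# acc == 0.625, hall == 0.371
```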
The benchmark story for GPT-4.5 was uneven, and the unevenness explains most of the launch's reception problems. On knowledge-heavy and language-style evaluations, the model led OpenAI's prior non-reasoning lineup. On reasoning-style evaluations such as AIME and GPQA, it trailed not only OpenAI's own o-series but also competing reasoning models from Anthropic and DeepSeek. Competitors using extended thinking or chain-of-thought routinely outperformed GPT-4.5 by 30 to 50 percentage points on the hardest math and science problems.[1][2][7]
| Benchmark | What it measures | GPT-4.5 | GPT-4o | o1 | o3-mini |
|---|---|---|---|---|---|
| SimpleQA accuracy | Factual recall on short questions | 62.5% | 38.2% | 47.0% | 15.0% |
| SimpleQA hallucination rate | Lower is better | 37.1% | 61.8% | 44.0% | 80.3% |
| MMMLU | Multilingual MMLU (15 languages) | 85.1% | 81.5% | 87.7% | 81.1% |
| MMMU | Multimodal college reasoning | 74.4% | 69.1% | 78.2% | n/a |
| GPQA (Diamond) | Graduate-level science | 71.4% | 53.6% | 75.7% | 79.7% |
| AIME 2024 | Competition mathematics | 36.7% | 9.3% | 83.3% | 87.3% (high) |
| SWE-bench Verified | Real GitHub issues solved | 38.0% | 30.7% | 48.9% | 61.0% |
| SWE-Lancer Diamond | Multi-task software development | 32.6% | 23.3% | n/a | 10.8% |
| Aider Polyglot diff | Multi-language code edits | 44.9% | ~17% | n/a | n/a |
Numbers are taken from OpenAI's launch post, the system card, and the Helicone benchmarks summary.[1][2][24]
The pattern is consistent. On knowledge tasks (SimpleQA, MMMLU, MMMU) GPT-4.5 outperforms GPT-4o by 4 to 24 points. On reasoning tasks (AIME, GPQA Diamond, SWE-bench), it trails dedicated reasoning models by 10 to 50 points. On code, the model beats GPT-4o on real-world bugs (SWE-bench Verified) but trails o3-mini, and the gap to reasoning models widens further on competition-style problems.
The more painful comparison was with non-OpenAI competitors. Anthropic's Claude 3.7 Sonnet, launched on February 24, 2025, three days before GPT-4.5, posted SWE-bench Verified scores in the 62 to 70% range with extended thinking enabled and led on the Aider Polyglot benchmark. Google's Gemini 2.5 Pro, launched on March 25, 2025, scored 84.0% on AIME and 86.4% on Humanity's Last Exam variants while being roughly an order of magnitude cheaper than GPT-4.5 in the API. DeepSeek-R1, the open-weight reasoning model that had set off a January 2025 stock-market wobble, scored 79.8% on AIME 2024 and 71.5% on GPQA Diamond at a fraction of GPT-4.5's API cost.[25][26][27]
| Model | Lab | Architecture | AIME 2024 | GPQA Diamond | SWE-bench Verified | Input price/1M |
|---|---|---|---|---|---|---|
| GPT-4.5 | OpenAI | Non-reasoning | 36.7% | 71.4% | 38.0% | $75.00 |
| GPT-4o | OpenAI | Non-reasoning | 9.3% | 53.6% | 30.7% | $2.50 |
| o1 | OpenAI | Reasoning | 83.3% | 75.7% | 48.9% | $15.00 |
| o3-mini (high) | OpenAI | Reasoning | 87.3% | 79.7% | 61.0% | $1.10 |
| Claude 3.7 Sonnet | Anthropic | Hybrid (extended thinking) | ~80% | ~78% | 70.3% | $3.00 |
| Gemini 2.5 Pro | Google DeepMind | Reasoning | 92.0% | 84.0% | 63.8% | ~$1.25 |
| DeepSeek-R1 | DeepSeek | Reasoning (MoE) | 79.8% | 71.5% | 49.2% | $0.55 |
GPT-4.5 was substantially more expensive than any of these alternatives while underperforming all of them on the benchmarks most associated with frontier capability. Its only clear win was on SimpleQA-style factuality, and even that win narrowed once reasoning models were given web search.[24][26][28]
The one place GPT-4.5 led most peers was in human preference comparisons of conversational tone. OpenAI's internal A/B testing showed humans preferring GPT-4.5 outputs to GPT-4o outputs 56.8 to 63.2% of the time across query categories. Andrej Karpathy ran his own informal Twitter polls comparing GPT-4.5 against the latest GPT-4o-2024-11-20 snapshot on writing tasks. The results were less flattering than OpenAI's internal numbers: the GPT-4o snapshot won four of the five poll prompts, with GPT-4.5 winning the remaining one. Karpathy described the differences as smaller than expected for a model with so much more compute behind it.[1][22]
GPT-4.5 launched at API rates that were unprecedented for a production OpenAI model.
| Model | Input ($/1M tokens) | Cached input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|---|
| GPT-4 (8K, 2023) | $30.00 | n/a | $60.00 |
| GPT-4 Turbo | $10.00 | n/a | $30.00 |
| GPT-4o (Aug 2024) | $2.50 | $1.25 | $10.00 |
| GPT-4.5 | $75.00 | $37.50 | $150.00 |
| GPT-4.1 (Apr 2025) | $2.00 | $0.50 | $8.00 |
| GPT-5 (Aug 2025) | $1.25 | $0.13 | $10.00 |
At $75 input and $150 output per million tokens, GPT-4.5 was 30 times more expensive on input and 15 times more expensive on output than GPT-4o. It was 5 times the input price and 2.5 times the output price of o1, the previous most expensive frontier OpenAI model. Compared to its eventual successors, GPT-4.5 was 37 times the input price of GPT-4.1 and 60 times the input price of GPT-5.[5][29]
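The multiples quoted here follow directly from the rate table; a small sketch makes the arithmetic checkable (rates are per million tokens, USD, taken from the table above):

```python
# Per-million-token API rates (input, output) in USD, from the table above.
RATES = {
    "gpt-4o":  (2.50, 10.00),
    "o1":      (15.00, 60.00),
    "gpt-4.5": (75.00, 150.00),
    "gpt-4.1": (2.00, 8.00),
    "gpt-5":   (1.25, 10.00),
}

def multiple(model_a: str, model_b: str) -> tuple:
    """How many times more expensive model_a is than model_b."""
    (in_a, out_a), (in_b, out_b) = RATES[model_a], RATES[model_b]
    return (in_a / in_b, out_a / out_b)

# GPT-4.5 vs GPT-4o:  (30.0, 15.0)
# GPT-4.5 vs o1:      (5.0, 2.5)
# GPT-4.5 vs gpt-5:   60x on input
```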
The price was justified internally as a reflection of training cost and serving cost, not as a deliberate gating mechanism. OpenAI did not publish the inference compute per request, but several developers reported that long-prompt requests took 100 seconds or more to respond, which suggested per-request compute well above GPT-4o's. Price aside, rate-limited capacity meant that most production users could not have shifted to GPT-4.5 even if the tokens had been a bargain.[4][14][16]
In the consumer product, GPT-4.5 was gated more aggressively than any prior flagship.
| ChatGPT tier | GPT-4.5 access |
|---|---|
| Free | None during research preview |
| Plus ($20/month) | Limited; capped messages per week, after week-of-March-3 rollout |
| Team | Limited; same caps as Plus |
| Pro ($200/month) | Available from launch day |
| Enterprise | Phased rollout from March 5, 2025 |
| Education | Phased rollout from March 5, 2025 |
The practical effect was that for the first several weeks, GPT-4.5 was almost exclusively a Pro-tier feature. ChatGPT Pro had launched in December 2024 alongside o1 Pro Mode at the $200 monthly price point, and GPT-4.5 was the second flagship to debut as a Pro-only feature. Several Plus users reported being unable to access GPT-4.5 even after the official rollout for capacity reasons, and rate limits were tighter than for any concurrently available model in ChatGPT.[1][16]
The reception of GPT-4.5 was historically significant for two reasons. First, it was one of the few times a major OpenAI flagship received broadly muted to negative coverage at launch. Second, it was the first OpenAI release where a substantial fraction of mainstream press explicitly framed the result as evidence that pre-training scaling alone was no longer producing the gains it once had.
Not all coverage was negative. OpenAI's own blog post and follow-up media briefings emphasized the SimpleQA improvements and the personality claim. Constellation Research described the EQ improvements as a "differentiator" for use cases involving customer support and creative collaboration. The New York Times treated the model as marking "the end of an era" in a piece that described the "more thoughtful" tone in mostly approving terms. VentureBeat ran a launch piece titled "OpenAI releases 'largest, most knowledgable' model GPT-4.5 with reduced hallucinations and high API price" that was net positive on capability while flagging the cost.[1][7][24][30]
A contingent of working creative writers and editors gave the model warmer reviews. Several Substack-based critics described GPT-4.5 as the first model where the writing style felt distinctly different from generic chatbot prose, and a few argued that the difference was particularly pronounced on prompts about emotionally complex topics. These accounts varied considerably in how rigorous they were, and many were quickly contested by users running blind comparisons.[14][22]
Most coverage from established AI press was skeptical. TechCrunch's headline read "OpenAI unveils GPT-4.5 'Orion,' its largest AI model yet," but the body of the piece focused on GPT-4.5 underperforming "newer AI 'reasoning' models from Chinese AI startup DeepSeek, Anthropic, and OpenAI itself." Fortune ran two pieces, one titled "OpenAI launches long-awaited GPT-4.5 but 'Orion's' shortcomings raise questions" and another reporting that GPT-4.5 was "a nothing burger" in the words of AI critic Gary Marcus. The Verge framed the launch around the price tag, noting that "GPT-4.5's $75-per-million-token price reflects an unspoken bet that scale still matters." The Financial Times and Bloomberg both ran pieces situating GPT-4.5 within their longer running coverage of frontier model scaling and Orion's underperformance.[2][6][7][31][32]
Developer communities responded similarly. Hacker News threads from the launch week ran heavily towards "underwhelming for the price." Several Substack-based AI commentators including Erik Hoel, Sinan Onur Altinuc, and the algorithmic-bridge writer Alberto Romero argued that the launch demonstrated that pre-training had hit its limits at the frontier, with Altinuc's piece titled "GPT-4.5 Just Hit the AI Scaling Wall" being one of the most circulated takes that week.[8][33][34]
Nathan Lambert's Interconnects analysis was the most carefully argued of the technical responses. Lambert pointed out that OpenAI's own draft system card included the line "GPT-4.5 is not a frontier model," later removed from the published version. He argued that GPT-4.5 was actually a frontier-capability model that had arrived "oddly, ahead of its time," because the field had already pivoted toward reasoning and test-time compute. His core conclusion was that "scaling pretraining alone" no longer delivered breakthroughs at the frontier, and that the future belonged to combinations of pre-training, reasoning, and reinforcement learning.[14]
The most quoted text from launch week was Altman's X post on February 27, 2025. The full quote, frequently truncated in subsequent reporting, ran:
"GPT-4.5 is ready! good news: it is the first model that feels like talking to a thoughtful person to me. i have had several moments where i've sat back in my chair and been astonished at getting actually good advice from an AI. bad news: it is a giant, expensive model. we really wanted to launch it to plus and pro at the same time, but we've been growing a lot and are out of GPUs. we will add tens of thousands of GPUs next week and roll it out to the plus tier then. (hundreds of thousands coming soon, and i'm pretty sure y'all will use every one we can rack up.) this isn't how we want to operate, but it's hard to perfectly predict growth surges that lead to GPU shortages."[20]
The quote was used in two contradictory ways. Bullish takes treated the "thoughtful person" line as evidence of a qualitative leap. Bearish takes paired "giant, expensive" with the model's middling benchmark performance to argue that OpenAI was launching what it knew was a flawed product because it had run out of options on Orion. A separate Altman tweet two weeks later, on March 12, 2025, said GPT-4.5 was "a different kind of intelligence and there's a magic to it i haven't felt before," and added that "this isn't a reasoning model and won't crush benchmarks." The second tweet was widely read as an effort to set expectations downward.[20][21]
The most consequential strand of GPT-4.5's reception was the debate over what the launch said about scaling laws and pre-training. The proper way to read this debate is as a continuation of an argument that was already underway in late 2024, not as something the GPT-4.5 launch settled on its own.
OpenAI's launch post described GPT-4.5 as "a step forward in scaling up pre-training and post-training" and characterized the model as the result of "scaling unsupervised learning," without using the words "plateau" or "wall." The system card contained a draft passage saying "GPT-4.5 is not a frontier model," which OpenAI removed from the published version, with Lambert's analysis arguing that the draft phrasing reflected internal uncertainty about how to position the release.[1][14]
Altman's framing in his February 27 tweet acknowledged the model was "giant" and "expensive" but did not concede a plateau. His later note that GPT-4.5 was "our last non-chain-of-thought model" was probably the strongest internal signal: future flagships would not rely on pre-training scale alone. In a Bloomberg interview the week of launch, OpenAI president Greg Brockman said the company had "learned a lot from training Orion that will improve future models," which several reporters took as a careful way of saying the run had not gone as well as hoped.[3][20][21][35]
Ilya Sutskever, OpenAI's former chief scientist now running Safe Superintelligence Inc., said at NeurIPS 2024 that "pre-training as we know it will unquestionably end" because the world is running out of human-created text data. The quote was widely cited in coverage of GPT-4.5 as evidence from the inside that pre-training had a ceiling, although Sutskever made the comments before GPT-4.5 was launched and was speaking generally about the training paradigm rather than about GPT-4.5 specifically.[36]
Mainstream coverage settled on "plateau" framing more confidently than the primary sources warranted. Fortune's February 25 piece, published two days before the launch, was titled "The $19.6 billion pivot: How OpenAI's 2-year struggle to launch GPT-5 revealed that its core AI strategy has stopped working." The Wall Street Journal, in a feature published the week of launch, described two failed Orion training runs and an internal debate about whether to ship the model. The Financial Times ran a follow-up piece on diminishing returns at the frontier, citing Anthropic and Google DeepMind sources that mirrored OpenAI's experience.[6][12][32]
It is worth being careful with the framing. "Plateau" implies that no further capability gains are possible from scaling, which is a stronger claim than the evidence supports. What the primary sources support is the weaker claim that the marginal benchmark gain per dollar of training compute has shrunk substantially at the frontier, and that this has shifted resources toward reasoning, test-time compute, and post-training techniques. Several commentators, including Lambert and Hoel, argued for the weaker version of the claim. Others, including Marcus, argued for the stronger "wall" version.[8][14][33][34]
Whatever the right framing, OpenAI's subsequent product moves were consistent with deemphasizing pure pre-training scale. GPT-4.1, launched on April 14, 2025, was explicitly positioned as a smaller, cheaper, faster model that matched or exceeded GPT-4.5 on most production-relevant tasks. GPT-5, launched on August 7, 2025, combined a fast non-reasoning model with a deep reasoning model behind a router. The GPT-5 system card and launch post described the system as the unification of "intuition" (pre-training) with "reasoning" (chain-of-thought), framing the previous separation between GPT-4.5-style and o-series-style models as a transitional step.[5][37]
This is not the same as saying pre-training is dead. OpenAI continued to scale pre-training in GPT-5 and subsequent flagships. What changed was the assumption that a flagship release could be carried by pre-training alone. GPT-4.5 is widely understood as the last release where OpenAI tried, and the experience clearly informed everything that followed.
OpenAI announced GPT-4.5's API deprecation on April 14, 2025, the same day it launched GPT-4.1. The deprecation notice gave developers exactly three months to migrate, setting July 14, 2025 as the date when gpt-4.5-preview would be removed from the API. The official rationale, as quoted in the GPT-4.1 launch post and confirmed by OpenAI staff in developer forum threads, was that GPT-4.1 "matches or exceeds GPT-4.5 in many key capabilities at much lower cost and latency."[5][9]
The pricing comparison made the migration economics obvious. GPT-4.1 was priced at $2.00 input and $8.00 output per million tokens, less than 3% of GPT-4.5's per-token cost on input and roughly 5% on output. On benchmarks OpenAI cared about (SWE-bench Verified, instruction following, multi-document recall), GPT-4.1 outperformed GPT-4.5 by 8 to 25 points. The 1 million token context window in GPT-4.1, eight times GPT-4.5's 128,000 token cap, was a further argument for migration.[5]
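The migration economics are easy to verify at the level of a single request. The 10,000-input / 1,000-output token request below is a hypothetical example; the rates are the published per-million-token prices for the two models.

```python
def request_cost(in_tokens: int, out_tokens: int,
                 in_rate: float, out_rate: float) -> float:
    """Cost in USD for one request, given per-million-token rates."""
    return in_tokens / 1e6 * in_rate + out_tokens / 1e6 * out_rate

# Hypothetical 10k-in / 1k-out request at each model's published rates:
cost_gpt45 = request_cost(10_000, 1_000, 75.00, 150.00)  # $0.90
cost_gpt41 = request_cost(10_000, 1_000, 2.00, 8.00)     # $0.028
# GPT-4.1 serves the same request at roughly 3% of GPT-4.5's cost.
```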
Developer response to the deprecation was sharply negative in tone, even among developers who agreed that GPT-4.1 was the better choice on cost. The complaint was about the timeline: a four-and-a-half-month commercial life for a flagship model was, by a wide margin, the shortest in OpenAI's history. The OpenAI Developer Community thread on the deprecation accumulated hundreds of comments, with several developers arguing that production deployments could not reasonably be expected to migrate that quickly. One Discord post quoted in Synapse Times' coverage read "Deprecating after 4 months is insane. Barely had time to ship."[9][38]
VentureBeat ran a follow-up piece headlined "OpenAI moves forward with GPT-4.5 deprecation in API, triggering developer anguish and confusion" reporting that some developers had built workflows around the model's distinctive tone and were not satisfied that GPT-4.1 reproduced it. Others noted that the deprecation effectively confirmed what skeptics had said at launch: GPT-4.5 was a research preview that OpenAI did not consider commercially sustainable.[10]
gpt-4.5-preview was removed from the API on July 14, 2025 as scheduled. Existing API requests to the deprecated endpoint began returning errors directing developers to GPT-4.1, GPT-4.1 mini, or the o-series. ChatGPT Pro retained access to GPT-4.5 through July and into August, although the model picker increasingly emphasized GPT-4.1 and the o-series for paid users.[9][39]
GPT-4.5 was finally removed from ChatGPT on August 7, 2025, when GPT-5 launched and replaced the entire prior model picker. ChatGPT Pro users could re-enable GPT-4.5 alongside GPT-4o and other legacy models through the "Show legacy models" toggle that OpenAI added during the GPT-5 backlash, but as of February 2026 GPT-4.5 had been quietly dropped from even that menu.[37][40]
The clearest line from GPT-4.5 to subsequent OpenAI models runs through GPT-5. GPT-5's launch post, published on August 7, 2025, framed the model as a unified system combining the pre-trained intuition that GPT-4.5 represented with the reasoning capabilities of the o-series. The framing was explicit: GPT-4.5 had been the company's last attempt to push pre-training alone to the frontier, and GPT-5 was the synthesis that followed.[37]
Several concrete features of GPT-5 trace directly to lessons OpenAI seems to have learned from GPT-4.5. The router in GPT-5, which sends each prompt to either a fast model or a thinking model, is essentially an automated version of the choice that ChatGPT users had been making manually between GPT-4.5 (for tone and knowledge) and o3 (for reasoning). The 50 to 80% reduction in reasoning tokens that GPT-5 thinking mode achieved relative to o3 made it commercially viable to ship a reasoning-default product, something GPT-4.5's pricing had made impossible. The hallucination reductions in GPT-5, roughly six times fewer factual errors than o3 with web search enabled, built directly on the SimpleQA gains that had been GPT-4.5's most defensible benchmark win.[37]
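The routing idea can be illustrated with a deliberately toy heuristic. GPT-5's actual router is a learned model and nothing below reflects its implementation; the keyword list, length threshold, and model IDs are all invented for illustration.

```python
# Toy illustration of routing each prompt to a fast non-reasoning
# model or a slower thinking model -- the choice ChatGPT users had
# been making manually between GPT-4.5 and o3. GPT-5's real router
# is learned; this heuristic only shows the shape of the idea.

REASONING_HINTS = ("prove", "step by step", "debug", "derive", "solve")

def route(prompt: str) -> str:
    p = prompt.lower()
    # Hypothetical rule: route to the thinking model when the prompt
    # uses reasoning-flavored language or is very long.
    if any(hint in p for hint in REASONING_HINTS) or len(p) > 2_000:
        return "thinking-model"   # hypothetical reasoning-model ID
    return "fast-model"           # hypothetical non-reasoning-model ID

# route("Write a warm thank-you note")  -> "fast-model"
# route("Prove the series converges")   -> "thinking-model"
```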
The reception story also influenced GPT-5. The August 2025 "we want our 4o back" backlash, in which paying users objected to the removal of GPT-4o from ChatGPT, played out partly because the GPT-5 default tone felt colder than GPT-4o's. Several writers noted that GPT-4.5's emphasis on personality presaged the personality preset system that shipped with GPT-5.1 in November 2025, which gave users explicit control over default warmth, friendliness, and verbosity. The lesson OpenAI took from both GPT-4.5 and the GPT-4o controversy was that personality and tone are first-class product features, not afterthoughts.[37][41]
GPT-4.5 has the unusual distinction of being one of the most expensive AI models ever trained and one of the shortest-lived commercial AI products at the frontier. By the time it was retired in August 2025, it had been on sale for 161 days. Its API life was 137 days. Its commercial output, in dollar terms, was almost certainly a small fraction of its training cost.[4][9]
The model is now widely cited in three contexts. First, as the test case that closed out the era of treating pre-training scale as the sole path to frontier capability. The shift to reasoning-augmented systems was already underway when GPT-4.5 launched, but the launch's reception accelerated it. Second, as the moment OpenAI internalized that personality and tone were product features that could carry a model even when its benchmarks were unimpressive. Third, as a cautionary tale about pricing. The $75 input and $150 output rate was an experiment in monetizing a very heavy model that did not survive the comparison to GPT-4.1 and the o-series.
The broader AI safety community took a different lesson. Several alignment researchers argued that GPT-4.5's underwhelming reasoning performance was actually good news on the margin, because it suggested that capability gains from pre-training scale alone were getting harder. The catch was that those gains were now being recaptured by the reasoning track, which raised its own set of safety concerns about deceptive chains of thought, reward hacking, and the difficulty of monitoring models that thought before they spoke.[42]
For OpenAI's product story, GPT-4.5 was the bridge between the GPT-4 era and the GPT-5 era. It carried over the conversational personality that defined the GPT-4o experience and pre-figured the personality presets that arrived with GPT-5.1. It established that very large pre-trained models could substantially reduce hallucinations on factual questions. And it confirmed, by negative example, that the future of frontier flagships would belong to systems that could reason, not just remember.