GPT-5.1 is a family of large language models developed by OpenAI and released on November 12, 2025. It is an incremental update to GPT-5, released approximately three months after GPT-5's debut, and was succeeded by GPT-5.2 on December 11, 2025. The GPT-5.1 family introduced a bifurcated model strategy with two primary consumer variants, GPT-5.1 Instant for everyday conversational use and GPT-5.1 Thinking for reasoning, alongside agentic coding models released under the Codex sub-brand. The update also expanded the personality customization system to eight tone presets, lowered API input pricing relative to GPT-5, and refined the automatic routing layer, carried over from GPT-5, that selects the most appropriate variant for each query.
Five distinct GPT-5.1 models shipped within a single week. November 12, 2025 saw the launch of GPT-5.1 Instant, GPT-5.1 Thinking, and GPT-5.1-Codex-Mini. November 19, 2025 added GPT-5.1-Codex-Max and GPT-5.1 Pro. Within the GPT-5 series cadence (GPT-5 in August 2025, GPT-5.1 in November 2025, GPT-5.2 in December 2025, and later point releases through 2026), GPT-5.1 occupied a brief window as OpenAI's flagship before the December update arrived.
OpenAI released GPT-5 on August 7, 2025, positioning it as a flagship model for coding, reasoning, and agentic tasks. GPT-5 introduced a unified architecture that combined a fast "Instant" model, a deeper "Thinking" model, and a real-time router that dispatched each query to the appropriate variant. It supported a 400,000-token context window with 128,000-token maximum output, multimodal text and image input, and pricing of $1.25 per million input tokens and $10 per million output tokens at launch.
GPT-5's reception was mixed. Several reviewers reported that on routine conversational and coding tasks the gains over GPT-4o and o3 felt modest. Reports from late 2025 indicated that Microsoft was exploring deeper integration with Anthropic's Claude models across Copilot and GitHub products, framed in part as a response to GPT-5's perceived weak points on everyday workflows. OpenAI's iterative release cadence picked up sharply in the fall of 2025, with GPT-5.1 arriving in November and GPT-5.2 in December.
Rather than a ground-up retrain, GPT-5.1 addressed user feedback around conversational tone, instruction adherence, and reasoning efficiency. It split the model surface more explicitly: a fast, warm model for interactive use and a deeper reasoning model for complex tasks. The two variants share an underlying weight family but are trained and prompted differently to produce distinct response profiles. GPT-5.1 also carried forward the Auto routing concept introduced with GPT-5.
OpenAI announced and released GPT-5.1 on November 12, 2025. Three models shipped that day: GPT-5.1 Instant, GPT-5.1 Thinking, and GPT-5.1-Codex-Mini. A second wave followed on November 19, 2025, adding GPT-5.1-Codex-Max and GPT-5.1 Pro. The rollout to consumer ChatGPT users was tiered: ChatGPT Pro, Plus, Go, and Business subscribers received access first, while Enterprise and Education accounts had a seven-day early access toggle before GPT-5.1 became the default. Free users received access on a gradual basis after the paid rollout. GPT-5 remained available in the ChatGPT legacy dropdown for approximately three months, allowing users to compare or continue workflows that depended on GPT-5's behavior.
OpenAI published a system card addendum covering GPT-5.1 Instant and GPT-5.1 Thinking alongside the release, documenting updated safety evaluations and post-deployment monitoring methodology for both variants. GitHub Copilot announced public preview support for GPT-5.1, GPT-5.1-Codex, and GPT-5.1-Codex-Mini on November 13, 2025, the day after the initial release.
GPT-5.1 Instant is the default model for most ChatGPT sessions. OpenAI described it as "warmer by default and more conversational," with instruction-following improvements that make the model more consistent in adhering to user-specified tone, format, and length preferences. The key technical addition in Instant is light adaptive reasoning: the model can autonomously decide whether a prompt warrants additional computation before responding, handling simple queries at full speed while applying a brief internal deliberation step to more demanding ones. This was OpenAI's first non-thinking model to exhibit adaptive reasoning behavior.
In ChatGPT, the context window for GPT-5.1 Instant varies by subscription tier, with smaller windows for free users and larger windows for paid tiers. Via the OpenAI API, the GPT-5.1 model supports a combined context of up to 400,000 tokens with a maximum output of 128,000 tokens. Knowledge cutoff is September 30, 2024.
GPT-5.1 Thinking is designed for complex tasks requiring deeper reasoning. It adjusts the time and compute devoted to each problem based on its difficulty, spending less on simple queries than its GPT-5 Thinking predecessor and more on genuinely hard problems. OpenAI stated that GPT-5.1 Thinking is approximately twice as fast on simple tasks and is more persistent on complex ones compared to GPT-5 Thinking. The model also produces clearer explanations with reduced technical jargon, which OpenAI framed as part of an effort to make reasoning output more legible to end users.
GPT-5.1 Auto is an automatic routing layer, not a distinct model. When selected, it dispatches each incoming query to whichever GPT-5.1 variant is judged most appropriate: Instant for conversational and routine tasks, Thinking for problems that benefit from extended reasoning, and Codex variants for programming-heavy work. Auto is the default selection in ChatGPT for users who prefer not to choose manually.
Released on November 12, 2025 alongside the main GPT-5.1 launch, GPT-5.1-Codex-Mini is a smaller, faster, and more cost-efficient version of the Codex line, intended for coding automation, agentic pipelines, and tasks where latency and cost matter more than maximum capability. It is available via the OpenAI API and through the Codex CLI and IDE extensions. The model has a 400,000-token context window and the same September 30, 2024 knowledge cutoff as the rest of the family.
Released on November 19, 2025, GPT-5.1-Codex-Max is the most capable agentic coding model in the GPT-5.1 family. It is built on an updated foundational reasoning model and is trained specifically on agentic tasks across software engineering, mathematics, and research. Its most notable technical feature is native context compaction: GPT-5.1-Codex-Max is the first OpenAI model trained to work coherently across multiple context windows by compaction, allowing it to operate over millions of tokens in a single long-running task without losing track of prior context. When the agent approaches its current context limit, the model automatically compacts its session and continues, repeating the cycle until task completion.
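OpenAI has not published how compaction works internally, but the externally visible behavior can be sketched as a loop: when the transcript nears the context limit, the agent replaces it with a model-written summary and keeps going. A minimal illustrative sketch in Python, where `StubAgent`, its methods, and the token thresholds are all assumptions for demonstration rather than the real Codex-Max interface:

```python
from dataclasses import dataclass

def tokens(history):
    # Crude proxy: whitespace-split word count stands in for a real tokenizer.
    return sum(len(entry.split()) for entry in history)

@dataclass
class StubAgent:
    """Hypothetical stand-in for a compaction-capable coding agent;
    not the real Codex-Max interface."""
    steps_remaining: int = 5
    compactions: int = 0

    def summarize(self, history):
        # A real agent would have the model write a dense summary preserving
        # the goal, progress so far, and open dependencies.
        self.compactions += 1
        return f"SUMMARY(goal + progress after {len(history)} entries)"

    def step(self, history):
        self.steps_remaining -= 1
        return "DONE" if self.steps_remaining == 0 else "edit file " + "x " * 30

def run_with_compaction(agent, task, context_limit=100, headroom=20):
    """Loop until done, compacting whenever the transcript nears the limit."""
    history = [f"GOAL: {task}"]
    while True:
        if tokens(history) > context_limit - headroom:
            history = [agent.summarize(history)]  # compact, then continue
        action = agent.step(history)
        history.append(action)
        if action == "DONE":
            return history, agent.compactions
```

With the toy limits above, a run of a few verbose steps triggers at least one compaction before the agent reports completion, mirroring the compact-and-continue cycle described for long-running tasks.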
In internal evaluations, OpenAI reported observing GPT-5.1-Codex-Max work autonomously for more than 24 hours, iterating on implementations, fixing test failures, and delivering complete results. The model exposes a configurable reasoning effort level labeled xhigh that unlocks its peak performance. On SWE-bench Verified, GPT-5.1-Codex-Max with xhigh reasoning effort scored 77.9%, with the same model at medium outperforming GPT-5.1-Codex at medium while using approximately 30% fewer thinking tokens. On Terminal-Bench 2.0, GPT-5.1-Codex-Max scored 58.1% to 60.4% depending on the harness used.
Released on November 19, 2025, GPT-5.1 Pro replaces GPT-5 Pro as the model available to ChatGPT Pro subscribers for unlimited use. It combines GPT-5.1 Thinking's reasoning behavior with the higher context allowance and usage entitlements of the Pro tier.
| Specification | GPT-5.1 (API) | GPT-5.1 Codex | GPT-5.1-Codex-Mini |
|---|---|---|---|
| Context window (input) | 400,000 tokens | 400,000 tokens | 400,000 tokens |
| Max output tokens | 128,000 | 128,000 | 128,000 |
| Knowledge cutoff | September 30, 2024 | September 30, 2024 | September 30, 2024 |
| Input modalities | Text, image | Text, image | Text, image |
| Output modalities | Text | Text | Text |
| Reasoning effort levels | none, low, medium, high | low, medium, high, xhigh (Codex-Max) | low, medium, high |
| Streaming | Yes | Yes | Yes |
| Function calling | Yes | Yes | Yes |
| Structured outputs | Yes | Yes | Yes |
| Fine-tuning | No | No | No |
| Prompt caching | Yes (90% discount on cached tokens) | Yes | Yes |
| Batch API | Yes | Yes | Yes |
For consumer ChatGPT usage, the context window varies by subscription tier rather than matching the API limit, and the maximum effective input is generally smaller than the API ceiling.
One of the most discussed aspects of the GPT-5.1 update was its expanded personality system. GPT-5 shipped with a small set of tone presets. GPT-5.1 extended this to eight options accessible through ChatGPT's Personality settings panel. OpenAI also renamed two of the presets carried over from GPT-5: "Robot" became "Efficient" and "Listener" became "Friendly."
| Preset | Description |
|---|---|
| Default | Warmer and more conversational than GPT-5's default, with occasional playfulness |
| Friendly | A warmer, more empathetic tone (formerly "Listener" in GPT-5) |
| Efficient | Direct and unemotional, focused on concise answers (formerly "Robot") |
| Professional | Formal register suitable for business or academic contexts |
| Candid | Frank and unvarnished, less hedging |
| Quirky | Playful and unconventional phrasing |
| Nerdy | Technical depth with domain-specific vocabulary |
| Cynical | Skeptical framing, minimal reassurance |
Professional, Candid, and Quirky were the three presets newly introduced with GPT-5.1. Beyond preset selection, users can tune additional parameters: response conciseness, warmth level, scannability (how much the model uses headers and bullet points), and emoji frequency. These settings persist across sessions. The system also has a proactive update feature: during conversations, if a user asks the model to adjust tone or format, GPT-5.1 can offer to update the saved preferences permanently rather than applying the change only to the current session.
The shift in GPT-5.1 Instant's default toward warmer and more conversational language was not universally welcomed. Writer John Gruber, covering the release on Daring Fireball, called the change a "glaring regression" and criticized phrases like "I've got you" as phony emotional reassurance of a kind an LLM cannot authentically deliver. Gruber preferred the Efficient preset, which he argued removed unnecessary filler from responses, and dismissed the Cynical setting as neither truly cynical nor sarcastic. The renaming of "Robot" to "Efficient" and "Listener" to "Friendly" also drew comment, with some users finding the older names more transparent about what the presets actually changed.
OpenAI published a dedicated GPT-5.1 Instant and GPT-5.1 Thinking System Card Addendum at the time of release. The addendum supplemented the original GPT-5 System Card with two new evaluation suites: a mental health suite that probes model behavior in conversations where users show signs of isolated delusions, psychosis, or mania, and an emotional reliance suite that examines outputs for signs of unhealthy attachment to ChatGPT. The October 2025 sensitive conversations addendum, released roughly two weeks before GPT-5.1, had already documented work with more than 170 mental health experts to revise model behavior in distress scenarios. OpenAI reported that updates introduced through that work reduced responses falling short of desired behavior by 65% to 80% in the targeted categories.
The addendum also reaffirmed the standard battery of evaluations carried over from GPT-5: prompt injection robustness, jailbreak resistance, instruction hierarchy adherence, hallucination metrics on factual benchmarks, and red-teaming results on biological, chemical, and cybersecurity uplift. OpenAI did not classify GPT-5.1 as a meaningful capability leap requiring a new Preparedness Framework score, in contrast with GPT-5.2, which received expanded coverage on agentic and self-improvement risks in its own December 2025 system card.
METR, the independent AI evaluations organization, published a separate report on GPT-5.1-Codex-Max in late November 2025. METR ran the model against the Human-Calibrated Autonomy Software Tasks (HCAST) suite, evaluating it on a subset of 90 tasks spanning cybersecurity, AI R&D, general reasoning, environment exploration, and general software engineering. METR reported a 50% time-horizon for GPT-5.1-Codex-Max of between 75 and 350 minutes, with a point estimate of approximately 2 hours and 42 minutes. The organization characterized the model as a low-risk, incremental improvement on an existing trend line, finding no near-term evidence for catastrophic capability jumps via self-improvement, rogue replication, or sabotage of AI labs, provided current trends continue to hold.
GPT-5.1 refined the Auto routing system introduced with GPT-5. The updated routing layer uses signal from the incoming query, account tier, and conversation history to select the appropriate variant. For most everyday queries, Instant handles the response. When a prompt contains mathematical notation, multi-step logical dependencies, or explicit instructions to think carefully, the router escalates to Thinking. Coding-specific queries with file contexts or repository instructions route to the Codex variants when the user has Codex access enabled.
The routing changes were part of a broader design goal: reducing the cognitive overhead on users who previously had to manually select between o-series reasoning models and base GPT models. The GPT-5.1 system card addendum noted that online measurement of routing decisions allows OpenAI to detect and correct routing errors in near real time, adjusting thresholds without requiring a full model update.
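The real Auto router is a learned component drawing on query, account, and history signals, but the dispatch logic described above can be caricatured with a few heuristics. A toy sketch, where the regex triggers and returned model names are illustrative assumptions rather than OpenAI's actual routing rules:

```python
import re

def route(prompt: str, codex_enabled: bool = False) -> str:
    """Toy heuristic dispatcher mimicking the Auto routing behavior
    described above; illustrative only."""
    p = prompt.lower()
    # Coding-heavy prompts go to a Codex variant when the user has access.
    if codex_enabled and re.search(r"```|\.py\b|repository|refactor", p):
        return "gpt-5.1-codex"
    # Math notation, proofs, or explicit think-carefully cues escalate.
    if re.search(r"\\frac|\\sum|\bprove\b|think carefully|step[- ]by[- ]step", p):
        return "gpt-5.1-thinking"
    # Everything else stays on the fast conversational model.
    return "gpt-5.1-instant"
```

The design point the heuristic captures is that escalation is conditional on both the query and the account: the same coding prompt falls back to the conversational model when Codex access is disabled.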
OpenAI published benchmark comparisons between GPT-5.1 and its predecessor at launch. Independent evaluators published additional results in the weeks that followed. Specific score attribution can be tricky for the GPT-5.1 family because OpenAI sometimes reported headline numbers for the Codex-Max variant under the GPT-5.1 banner. The table below distinguishes scores attributed to GPT-5.1 Thinking from those attributed specifically to GPT-5.1-Codex-Max where reliable sources support that distinction.
| Benchmark | GPT-5.1 (Thinking) | Notes |
|---|---|---|
| GPQA Diamond | 88.1% | PhD-level science questions; OpenAI evaluations and Vellum confirm |
| ARC-AGI-2 | 17.6% | Abstract reasoning; superseded by GPT-5.2 at 52.9% |
| MMMU Pro | 85.4% | Multimodal understanding |
| FrontierMath (Tiers 1-3) | ~30% | Approximate; precise OpenAI-published value not surfaced in independent third-party reporting |

| Benchmark | GPT-5.1 | GPT-5.1-Codex-Max (xhigh) | Notes |
|---|---|---|---|
| SWE-bench Verified | 76.3% | 77.9% | Codex-Max at xhigh is the highest reported figure for the family |
| Terminal-Bench 2.0 | 47.6% | 58.1% to 60.4% | Score depends on harness; Codex-Max also reported at 58.1% by OpenAI |

| Benchmark | GPT-5.1 | GPT-5 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| GPQA Diamond | 88.1% | ~84% | 83.4% | 91.9% |
| ARC-AGI-2 | 17.6% | not reported by OpenAI | ~15% | 31.1% |
| SWE-bench Verified | 76.3% | 74.9% | ~77% | 76.2% |
| MMMU / MMMU Pro | 85.4% | 84.2% | 77.8% | 81.0% |
GPT-5.1's headline GPQA Diamond score of 88.1% improved on GPT-5 but trailed Gemini 3 Pro's 91.9%; Gemini 3 Pro was the first model to surpass expert-level human performance on this benchmark at scale. On SWE-bench Verified, GPT-5.1-Codex-Max with xhigh effort closed the gap with Claude Opus 4.5, scoring 77.9% versus 77.2%. On ARC-AGI-2, GPT-5.1's 17.6% looked weak in retrospect after GPT-5.2 jumped to 52.9% in December 2025, a 35-point leap that OpenAI cited prominently in its GPT-5.2 announcement. Across multiple agentic and reasoning benchmarks, third-party reviewers tended to rank Gemini 3 Pro above GPT-5.1 at launch, with Claude Opus 4.5 close behind on coding-focused tasks.
For pure throughput, Artificial Analysis measured GPT-5.1 (high) at roughly 119 to 142 tokens per second across OpenAI, Azure, and Databricks endpoints, with Azure offering the lowest time to first token. Output speed for GPT-5.1 generally placed it in a competitive range with other frontier models on the same harness, but specific provider rankings shifted over time as the workload mix and quantization varied.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cached input | Batch discount |
|---|---|---|---|---|
| GPT-5.1 | $1.25 | $10.00 | $0.125 (90% off) | 50% off standard |
| GPT-5.1-Codex-Mini | Lower than GPT-5.1 | Lower than GPT-5.1 | Available | Available |
| GPT-5 (for comparison) | $1.25 | $10.00 | $0.125 | 50% |
| Claude Opus 4.5 (for comparison) | $5.00 | $25.00 | Available | Available |
| Gemini 3 Pro (for comparison) | $2.00 | $12.00 | Available | Available |
GPT-5.1's input token pricing of $1.25 per million matched GPT-5's launch rate, with the same $10.00 per million output. Prompt caching offers a further 90% discount on cached tokens, dropping the cached input rate to $0.125 per million, and the cache retains entries for up to 24 hours, which makes long-context repeated workloads substantially cheaper than the headline rate suggests. Reasoning tokens generated by GPT-5.1 Thinking are billed as output tokens. Batch API jobs receive a 50% discount on both input and output tokens in exchange for up to 24-hour processing latency. Azure AI Foundry hosts GPT-5.1 at slightly different pricing in some configurations, while OpenRouter aggregates multiple providers behind a single endpoint.
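The interaction of the caching and batch discounts is easy to check with a few lines of arithmetic at the list prices above; the document sizes in the usage example are illustrative:

```python
def request_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """USD cost at GPT-5.1 list prices: $1.25/M input, $0.125/M cached
    input, $10/M output; the Batch API halves the total."""
    IN, CACHED, OUT = 1.25, 0.125, 10.00  # USD per 1M tokens
    fresh = input_tokens - cached_tokens
    cost = (fresh * IN + cached_tokens * CACHED + output_tokens * OUT) / 1e6
    return cost * 0.5 if batch else cost

# Re-querying a 200k-token document with a 2k-token answer:
first_pass  = request_cost(200_000, 2_000)                         # $0.27
cached_pass = request_cost(200_000, 2_000, cached_tokens=200_000)  # $0.045
```

With the input fully cached, the second pass costs roughly a sixth of the first, which is the mechanism behind the claim that repeated long-context workloads run well below the headline rate.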
Reporting from BenchLM in early 2026 framed GPT-5.1's pricing as "absurdly cheap" for tasks the model handled well, and recommended GPT-5.1 as the default tier for straightforward text processing where the larger context and stronger reasoning of GPT-5.4 were not strictly required.
| Tier | Monthly cost | Access |
|---|---|---|
| Free | $0 | GPT-5.1 Instant (rate limited) |
| Plus | $20 | GPT-5.1 Instant and GPT-5.1 Thinking |
| Pro | $200 | GPT-5.1 Pro, expanded usage limits |
| Business / Team | Per-seat pricing | GPT-5.1 Instant and Thinking, organizational controls |
| Enterprise | Custom | Full suite including Codex variants, admin controls |

| Attribute | GPT-5.1 | GPT-5.2 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|
| Release date | November 12, 2025 | December 11, 2025 | November 24, 2025 | November 18, 2025 |
| Context window (API) | 400,000 tokens | 400,000 tokens | 200,000 tokens | 1,000,000 tokens |
| Output tokens (max) | 128,000 | 128,000 | 64,000 | 64,000 |
| Input pricing (per 1M) | $1.25 | $1.75 | $5.00 | $2.00 |
| Output pricing (per 1M) | $10.00 | $14.00 | $25.00 | $12.00 |
| GPQA Diamond | 88.1% | 92.4% | 83.4% | 91.9% |
| SWE-bench Verified | 76.3% | 80.0% | ~77% | 76.2% |
| ARC-AGI-2 | 17.6% | 52.9% | ~15% | 31.1% |
| Personality presets | 8 | Inherited | Not exposed | Not exposed |
| Reasoning configurability | Configurable | Configurable | Not configurable | Deep Think toggle |
At launch, GPT-5.1 was widely seen as a competent but incremental update. It led on a small number of metrics, sat in the middle of the pack on others, and trailed Gemini 3 Pro on graduate-level scientific reasoning. On coding benchmarks, particularly when running GPT-5.1-Codex-Max at the xhigh reasoning effort level, the family was competitive with or briefly ahead of Claude Opus 4.5. GPT-5.2, released in December 2025, superseded GPT-5.1 in nearly every capability dimension at a higher price point. By the time GPT-5.4 arrived in March 2026, GPT-5.1 had become a budget-friendly option for tasks that did not need the newer models' larger context window or stronger abstract reasoning.
GPT-5.1's strongest showing relative to its pricing is in software engineering tasks. The headline 76.3% on SWE-bench Verified, with GPT-5.1-Codex-Max at 77.9% under xhigh reasoning, made the family a viable choice for repository-level coding agents. GPT-5.1-Codex-Max extends this to long-horizon agentic workflows including autonomous debugging, multi-file refactoring, and continuous integration pipelines that can run for many hours without human supervision. Developers using the Codex CLI or IDE extensions can invoke GPT-5.1-Codex-Max for tasks where GPT-5.1 Instant lacks the persistence required.
GitHub Copilot's public preview integration on November 13, 2025 covered GPT-5.1, GPT-5.1-Codex, and GPT-5.1-Codex-Mini, exposing the family directly inside Visual Studio Code, the GitHub web interface, and Copilot Chat sessions.
GPT-5.1 Thinking's dynamic allocation of reasoning time made it more practical than a fixed-effort reasoning model for workflows where some prompts are trivial and others require deep work. Independent reviews flagged competition mathematics, structured proof checking, and engineering-style problem decomposition as areas where the Thinking variant performed close to leading reasoning models, although it trailed GPT-5.2 and Gemini 3 Pro on the toughest benchmarks. Tutoring, problem decomposition, and checking mathematical derivations were common consumer use cases.
GPT-5.1 Instant's personality presets and fine-grained tone controls made it suited to conversational product integrations. Applications that previously required custom system prompt engineering to establish a model's tone could instead use the personality API parameters to achieve consistent output across user sessions. The eight presets and per-parameter controls were widely seen as a step toward letting users configure behavior without raw system prompt engineering.
The 400,000-token context window via the API supports ingestion of large codebases, legal documents, lengthy research papers, and multi-volume datasets in a single session. Combined with prompt caching at a 90% discount on cached tokens, document-heavy workflows became substantially more cost-efficient compared to GPT-5.
The structured outputs, native JSON schema support, parallel function calling, and configurable reasoning effort levels (none, low, medium, high) make GPT-5.1 a flexible backbone for agent frameworks, retrieval-augmented generation pipelines, and data extraction workflows. GPT-5.1 was made available across multiple providers at launch, including OpenAI, Azure AI Foundry, Databricks, and OpenRouter, allowing developers to choose between hosted endpoints based on latency, region, or cost.
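As a concrete sketch, a structured-output extraction call can be expressed as a strict JSON-schema response format in the request body. The payload shape follows OpenAI's published structured-outputs format for the Chat Completions endpoint, but the model name, schema fields, and prompt below are illustrative assumptions:

```python
import json

# Illustrative request body for a strict JSON-schema extraction call.
payload = {
    "model": "gpt-5.1",
    "reasoning_effort": "low",  # none | low | medium | high
    "messages": [
        {"role": "system", "content": "Extract the invoice fields."},
        {"role": "user", "content": "Invoice #1138, total $42.50, due 2026-01-15."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,  # enforce the schema exactly
            "schema": {
                "type": "object",
                "properties": {
                    "number": {"type": "string"},
                    "total_usd": {"type": "number"},
                    "due_date": {"type": "string"},
                },
                "required": ["number", "total_usd", "due_date"],
                "additionalProperties": False,
            },
        },
    },
}
body = json.dumps(payload)  # POST this to the chat completions endpoint
```

Because the schema is marked strict, a conforming response is guaranteed to parse into exactly these three fields, which is what makes the pattern attractive for data extraction pipelines.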
Multiple enterprise integrations adopted GPT-5.1 within weeks of launch. Snowflake Cortex AI added GPT-5.1 as a hosted option for in-database language model calls. Databricks integrated the model into its Foundation Model API, allowing customers to invoke GPT-5.1 from within Databricks notebooks and jobs without separately provisioning OpenAI credentials. Azure AI Foundry continued to mirror OpenAI's catalog with regional deployment options for customers requiring data residency in specific Microsoft regions. The breadth of providers reduced single-vendor risk for enterprises and contributed to GPT-5.1's relatively quick uptake despite its short tenure as the OpenAI flagship.
The GPT-5.1 release came with a set of ChatGPT product changes that ran alongside the new models. ChatGPT Pro subscribers received GPT-5.1 Pro on November 19, 2025, replacing GPT-5 Pro as the default for unlimited usage. OpenAI reported that early testers consistently preferred GPT-5.1 Pro over GPT-5 Pro on writing help, data science, and business questions, with testers highlighting improved clarity, relevance, and structure of responses.
File attachment limits were increased on the ChatGPT web client, with up to 20 files attachable at once on supported tiers, up from 10 previously. ChatGPT's voice mode received tooling improvements to make it more adept at using web search and other tools while engaged in spoken conversation. Web search continued to be available within GPT-5.1 Instant and Thinking, with the Auto router escalating to a search-grounded response when the query indicated a recent event or factual lookup. ChatGPT free users received GPT-5.1 Instant on a gradual rollout following the paid tier release, with the same eight personality presets exposed in settings, although context windows for the free tier were notably smaller than the API ceiling of 400,000 tokens.
GPT-5 was retained as a legacy option in the model picker for approximately three months after the GPT-5.1 launch, mirroring OpenAI's approach with prior major upgrades. The legacy access window let developers and power users continue to use GPT-5 directly for workflows that depended on its specific tone, hallucination profile, or routing characteristics. Custom GPTs built on GPT-5 continued to function unchanged after the upgrade, as GPT-5.1 was made available as a backend swap rather than a breaking schema change.
Initial coverage of GPT-5.1 framed it as a competent but incremental update. Tom's Guide reviewer Amanda Caswell praised its mathematical reasoning and coding abilities but concluded that Gemini 3 Pro produced better results overall across general tasks. Daring Fireball was more pointed in its criticism of the warmer default personality, with John Gruber arguing that phrases like "I've got you" introduce an anthropomorphic register that misrepresents how the model actually works. The broader view among reviewers was that GPT-5.1 closed some of the gap that had opened between OpenAI and competitors like Google and Anthropic during the weeks following GPT-5's launch, without decisively retaking the top position across all domains.
Gizmodo characterized the release as a "mini-update" but noted that early testers were surprised by the model's playfulness. CEO Sam Altman called GPT-5.1 "a nice upgrade" in his public comments. Computerworld and EdTech Innovation Hub covered the adaptive reasoning and tone control features approvingly, while MacRumors documented the rollout schedule and the seven-day Enterprise toggle.
Developer response to the pricing structure was broadly positive. The 90% prompt caching discount and the flat $1.25 per million input tokens were cited as making GPT-5.1 a cost-efficient frontier-class model via direct API at launch, before steeper discounts from competing labs reset expectations later. The personality customization system attracted significant coverage in both consumer and technical press. Some developers noted a potentially confusing interaction between custom instructions set before the GPT-5.1 rollout and the new personality presets, recommending that users audit their saved instructions when switching to the Efficient or Professional presets.
By the time GPT-5.2 launched on December 11, 2025, GPT-5.1 had largely receded from active discussion. It remained available as a lower-cost option for developers who did not need GPT-5.2's improvements on abstract reasoning, FrontierMath, and SWE-bench Pro. The 35-point jump from GPT-5.1's 17.6% to GPT-5.2's 52.9% on ARC-AGI-2 became one of the most cited generation-over-generation gains during the late-2025 model cycle.
GPT-5.1 had an unusually short reign as OpenAI's flagship model. GPT-5.2 launched on December 11, 2025, less than a month after GPT-5.1's debut, and was framed as a competitive response to Google's Gemini 3 Pro (November 18, 2025) and Anthropic's Claude Opus 4.5 (November 24, 2025). The December release moved the input price from $1.25 to $1.75 per million tokens and the output price from $10 to $14 per million, which OpenAI tied to the substantial benchmark gains across abstract reasoning and software engineering. The flagship moved on, but GPT-5.1 stayed in service as a budget-friendly tier within OpenAI's product mix.
GPT-5.2-Codex followed on December 18, 2025, replacing GPT-5.1-Codex-Max as the default Codex CLI backend, although developers retained the ability to pin the prior version when stability mattered. Beyond December, GPT-5.3 Instant arrived in March 2026 as a default Instant replacement, with an expanded context window and reduced hallucinations. GPT-5.4 followed in March 2026 with native computer-use capabilities and a 1.05 million-token context window. GPT-5.5 launched in April 2026 as a fully retrained agentic model with state-of-the-art Terminal-Bench 2.0 scores.
Within this rapid succession, GPT-5.1 became one of the shorter-lived flagship snapshots in OpenAI's release history. Reporting from BenchLM in early 2026 nevertheless argued that GPT-5.1 remained a sensible default for workloads dominated by routine text processing, where its $1.25 input rate and 90% caching discount made it cheaper to run at scale than GPT-5.2 or GPT-5.4 without a meaningful capability deficit on those particular tasks.
Knowledge cutoff: Both Instant and Thinking variants have a knowledge cutoff of September 30, 2024. Events, research, and product developments after that date are unavailable without external tools or retrieval augmentation, even though the model was released more than a year after its training cutoff.
Context window asymmetry: The 400,000-token context window is available only via the direct API. ChatGPT free users see a much smaller context, substantially reducing the model's utility for long-document work without an API subscription.
Abstract reasoning gap: ARC-AGI-2 performance at 17.6% trailed Gemini 3 Pro by a wide margin and was decisively beaten by GPT-5.2's 52.9% just one month later. For tasks that depend on novel abstract pattern recognition rather than learned domain knowledge, GPT-5.1 was not the strongest available choice during its short tenure as OpenAI's flagship.
GPQA gap with Gemini 3 Pro: On GPQA Diamond, GPT-5.1 trailed Gemini 3 Pro by a meaningful margin (88.1% versus 91.9%). For tasks requiring sustained scientific or graduate-level reasoning, GPT-5.1 Thinking was competitive but not the top-ranked option.
Image output: GPT-5.1 accepts image and PDF inputs but does not produce image outputs. Image generation requires separate tooling or calls to image-specific models.
Context compaction limited to Codex-Max: The long-horizon context compaction capability of GPT-5.1-Codex-Max was not extended to the Instant or Thinking variants at launch. Standard sessions remain bounded by the fixed context window sizes.
Personality default controversy: The warmer default personality received significant critical coverage, with some users finding the informal register inappropriate for professional or technical use. Switching to the Efficient preset or setting explicit instructions in the system prompt restores a more neutral tone, but requires deliberate user action.
Fine-tuning unavailable: Unlike some earlier GPT models, GPT-5.1 does not currently support customer fine-tuning through OpenAI's standard fine-tuning API.
OpenAI published a GPT-5.1-Codex-Max prompting guide alongside the November 19 launch, with practical advice for developers working on long-horizon agentic tasks. The guide recommended structuring prompts to make compaction-friendly behavior explicit, including listing the active goal, the current sub-task, and any dependencies that should survive across compaction boundaries. Developers building Codex CLI integrations were advised to surface compaction events to the user when possible, both for transparency and because compaction sometimes shifts the agent's approach to a problem after the prior context is summarized.
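Under the guide's advice, a compaction-friendly preamble spells out exactly the state that should survive summarization. A hypothetical helper along those lines, where the labels and wording are this sketch's invention rather than OpenAI's recommended text:

```python
def compaction_preamble(goal: str, subtask: str, dependencies: list[str]) -> str:
    """Assemble a task preamble stating the goal, current sub-task, and
    dependencies that must survive across compaction boundaries."""
    deps = "\n".join(f"- {d}" for d in dependencies)
    return (
        f"ACTIVE GOAL: {goal}\n"
        f"CURRENT SUB-TASK: {subtask}\n"
        f"MUST SURVIVE COMPACTION:\n{deps}"
    )
```

Keeping this block at the top of each agent turn gives the summarization step an explicit list of facts to carry forward, rather than leaving survival of key constraints to chance.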
For regular GPT-5.1 use through the API, the Responses API was OpenAI's recommended endpoint, with the older Chat Completions API also supported. Function calling, structured outputs, and JSON schema validation behaved consistently across the two endpoints. Streaming output worked with both reasoning and non-reasoning effort levels, although developers using the Thinking variant had to account for a longer time-to-first-token while the model performed its internal deliberation. Reasoning tokens were billed as output tokens, which meant that opting into the Thinking variant for short-output tasks could materially raise the per-request cost compared to the Instant model at the same headline rates.
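The billing point is worth making concrete: because deliberation tokens bill at the output rate, a short visible answer from the Thinking variant can cost an order of magnitude more than the same answer from Instant. A sketch at the $10-per-million output rate, with illustrative token counts:

```python
OUT_RATE = 10.00  # USD per 1M output tokens; reasoning tokens bill here too

def output_cost(visible_tokens: int, reasoning_tokens: int = 0) -> float:
    """Output-side cost of one request, counting hidden reasoning tokens."""
    return (visible_tokens + reasoning_tokens) * OUT_RATE / 1e6

instant_cost  = output_cost(300)                          # $0.003
thinking_cost = output_cost(300, reasoning_tokens=4_000)  # $0.043
```

At these assumed counts the same 300-token visible answer costs over ten times more from the Thinking variant, which is why the text above recommends weighing the variant choice per task rather than defaulting to deeper reasoning.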
Several developer publications recommended GPT-5.1 specifically for retrieval-augmented generation pipelines, where the 400,000-token context window allowed long retrieved spans to be passed alongside the user's question, and where prompt caching at the 90% discount made repeated retrievals over similar source corpora substantially cheaper. The configurable reasoning effort gave developers a knob to tune cost versus quality without switching models, with low and medium effort handling routine extraction tasks at near-Instant speed and high effort reserving the deeper reasoning for harder analyses.
The "GPT-5.1" name continued the decimal-point versioning that OpenAI had begun applying to its flagship line with GPT-4.1 in April 2025. Earlier major releases such as GPT-3, GPT-4, and GPT-4o used whole-number labels or compound names, with GPT-3.5 as the main earlier decimal exception. GPT-5.1's naming foreshadowed the rapid sequence of GPT-5.2, GPT-5.3, GPT-5.4, and GPT-5.5 releases that followed in the next six months. By mid-2026, OpenAI's release cadence had effectively shifted to monthly or bimonthly point releases, with the GPT-5.x family acting as a versioning umbrella for a continuously evolving line of models.
The "Instant" and "Thinking" labels for GPT-5.1's two consumer variants matched the architecture introduced with GPT-5 in August 2025. Subsequent updates retained the Instant or Thinking nomenclature, with GPT-5.2 expanding to Instant, Thinking, and Pro tiers and later releases preserving similar branding. The "Codex" sub-brand for coding-specialized models continued with GPT-5.1-Codex-Mini, GPT-5.1-Codex-Max, and successor releases through GPT-5.2-Codex and GPT-5.3-Codex.