GPT-4.1 mini
Last reviewed
Jun 3, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,365 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
8 citations
Review status
Source-backed
Revision
v1 · 1,365 words
Add missing citations, update stale details, or suggest a clearer explanation.
GPT-4.1 mini is a large language model developed by OpenAI and released on April 14, 2025 as the mid-size member of the GPT-4.1 family. Positioned between the full GPT-4.1 model and the smaller GPT-4.1 nano, it was designed to deliver intelligence competitive with GPT-4o while substantially reducing latency and cost. OpenAI reported that GPT-4.1 mini matches or exceeds GPT-4o on many benchmark evaluations while cutting latency by roughly half and lowering cost by about 83 percent. [1][2]
GPT-4.1 mini was introduced alongside GPT-4.1 and GPT-4.1 nano on April 14, 2025, in a launch that was initially available only through the OpenAI API rather than in ChatGPT. [1][3] All three models share a context window of up to roughly 1,000,000 tokens and a knowledge cutoff of June 2024, and the family emphasizes improvements in coding, instruction following, and long-context comprehension over GPT-4o. [1][2]
The "mini" tier is intended for developers who need most of GPT-4.1's capability at a lower price point and faster response time. OpenAI described GPT-4.1 mini as a model that "delivers performance competitive with GPT-4o at substantially lower latency and cost," making it suitable for high-volume production workloads where cost and speed matter as much as raw capability. [1][4] It supports both text and image inputs and produces text output; audio and video inputs are not supported. [5]
The release reflected a broader pattern in OpenAI's product strategy of offering tiered model families, in which a flagship model is accompanied by progressively smaller and cheaper variants tuned for different workloads. Within that structure GPT-4.1 mini occupies the middle, trading a modest amount of capability relative to the full GPT-4.1 model for roughly a fifth of its input price, while remaining markedly more capable than the entry-level nano variant. [1]
The GPT-4.1 family consists of three models released together on April 14, 2025:
All three were API-only at launch. OpenAI positioned the family as a successor to GPT-4o for developers, citing major gains on the SWE-bench Verified coding benchmark and on Scale AI's MultiChallenge instruction-following benchmark, along with improved ability to use very long contexts. [1] At the same livestream, OpenAI announced that the GPT-4.5 Preview model (sometimes referred to by the codename Orion) would be deprecated and turned off in the API on July 14, 2025, giving developers three months to migrate. [1][6] OpenAI noted that GPT-4.1 offered "improved or similar performance" to GPT-4.5 on many key capabilities at much lower cost and latency. [1]
Although the family launched in the API only, OpenAI later brought GPT-4.1 to ChatGPT. On May 14, 2025, GPT-4.1 became available to ChatGPT Plus, Pro, and Team subscribers, while GPT-4.1 mini replaced GPT-4o mini as the lightweight fallback model for all ChatGPT users, including those on the free tier. [3]
GPT-4.1 mini was built to retain the core strengths of the GPT-4.1 family, namely coding, instruction following, and long-context handling, while operating faster and more cheaply than GPT-4o. OpenAI stated that the model "matches or exceeds GPT-4o" on many intelligence evaluations despite its smaller size and lower price. [1][2]
Key capability claims from OpenAI include:
OpenAI noted that GPT-4.1 mini is also more capable than GPT-4o mini, although that gain comes with a small increase in latency relative to the older, smaller model. [4] On some image and reasoning benchmarks, GPT-4.1 mini performs close to the full GPT-4.1 model. [4]
The family as a whole was tuned to follow instructions more literally and reliably than GPT-4o, which OpenAI cited as one of its most-requested developer improvements. GPT-4.1 and its smaller siblings were trained to adhere more closely to formatting requirements, ordering of steps, and explicit constraints in prompts, behavior that benefits agentic and tool-using applications. These instruction-following gains carry over to GPT-4.1 mini, which scored 84.1 percent on the public IFEval benchmark and 35.8 percent on Scale AI's MultiChallenge measure of multi-turn instruction adherence. [1][7]
OpenAI also documented a known limitation that applies across the family: reasoning accuracy degrades as the input grows toward the maximum context length. In OpenAI's own long-context evaluation, accuracy fell from roughly 84 percent at 8,000 tokens to about 50 percent at 1,000,000 tokens, indicating that the headline million-token window does not guarantee uniform performance across an entire maximally sized input. [3]
OpenAI published benchmark results for the GPT-4.1 family at launch. For GPT-4.1 mini specifically, reported scores include the following. [1][7]
| Benchmark | What it measures | GPT-4.1 mini |
|---|---|---|
| MathVista | Visual mathematical reasoning | 73.1% |
| MultiChallenge (Scale AI) | Multi-turn instruction following | 35.8% |
| IFEval | Instruction following | 84.1% |
| Hard instruction-following eval (OpenAI internal) | Difficult instruction adherence | 45.1% |
| Aider polyglot (diff format) | Code editing across languages | 31.6% |
On MathVista, GPT-4.1 mini slightly outscored the full GPT-4.1 model, illustrating how close the mid-size model can come to the flagship on certain tasks. [4] OpenAI emphasized that, taken together, these results show GPT-4.1 mini meeting or beating GPT-4o across a broad set of intelligence evaluations. [1][2]
It is important not to confuse GPT-4.1 mini's scores with those of GPT-4.1 nano. The widely cited figures of 80.1 percent on MMLU and 50.3 percent on GPQA belong to GPT-4.1 nano, the smallest model in the family, not to GPT-4.1 mini. [1]
GPT-4.1 mini uses standard per-token pricing, with a discounted rate for cached input tokens. OpenAI listed the following prices at launch. [5][7]
| Model | Input (per 1M tokens) | Cached input (per 1M tokens) | Output (per 1M tokens) | Context window |
|---|---|---|---|---|
| GPT-4.1 | $2.00 | $0.50 | $8.00 | ~1,000,000 |
| GPT-4.1 mini | $0.40 | $0.10 | $1.60 | ~1,000,000 |
| GPT-4.1 nano | $0.10 | $0.025 | $0.40 | ~1,000,000 |
| GPT-4o | $2.50 | $1.25 | $10.00 | 128,000 |
| GPT-4o mini | $0.15 | $0.075 | $0.60 | 128,000 |
The roughly 83 percent cost reduction OpenAI cited for GPT-4.1 mini reflects the gap between its $0.40 input and $1.60 output prices and GPT-4o's $2.50 input and $10.00 output prices. [1][2][8] The exact context window for GPT-4.1 mini is 1,047,576 tokens, with a maximum output of 32,768 tokens, and its knowledge cutoff is June 2024. [5]
At launch on April 14, 2025, GPT-4.1 mini was available exclusively through the OpenAI API, exposed under the model identifiers gpt-4.1-mini and the dated snapshot gpt-4.1-mini-2025-04-14. [3][5] It was not initially offered in ChatGPT, where OpenAI said improvements were being folded into GPT-4o on a separate track. [1]
On May 14, 2025, OpenAI added GPT-4.1 to ChatGPT for paying subscribers, and GPT-4.1 mini became the lightweight model available to all ChatGPT users, taking the place previously held by GPT-4o mini. [3] The GPT-4.1 mini model is also accessible through Microsoft's Azure OpenAI Service and third-party API aggregators that mirror OpenAI's published pricing and benchmarks. [4]