GPT-4o mini
Last reviewed
Jun 3, 2026
Sources
11 citations
Review status
Source-backed
Revision
v1 · 1,474 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
Jun 3, 2026
Sources
11 citations
Review status
Source-backed
Revision
v1 · 1,474 words
Add missing citations, update stale details, or suggest a clearer explanation.
GPT-4o mini is a small, low-cost multimodal large language model developed by OpenAI and released on July 18, 2024. Positioned as OpenAI's most cost-efficient small model at launch, it replaced GPT-3.5 Turbo as the default lightweight model in the company's API and as the free-tier model in ChatGPT. GPT-4o mini accepts text and image inputs, has a 128,000-token context window, and was priced at $0.15 per million input tokens and $0.60 per million output tokens, making it roughly an order of magnitude cheaper than the frontier models of its era. [1][2]
GPT-4o mini belongs to the GPT-4o ("omni") family but is a smaller, distinct model from the full GPT-4o. OpenAI introduced it as a way to expand the range of applications that could be built affordably with AI, citing low-cost use cases such as chaining or parallelizing multiple model calls, passing large volumes of context to a model, and powering fast, real-time text interactions like customer support chatbots. [1]
OpenAI did not disclose the model's parameter count, but described it as occupying roughly the same capability tier as small models such as Gemini 1.5 Flash and Claude 3 Haiku. The company noted that the cost per token of GPT-4o mini had fallen by more than 99% since text-davinci-003, a less capable model released in 2022, illustrating the rapid decline in inference costs. [1][3]
At launch GPT-4o mini supported text and vision (image) inputs through the API, with OpenAI stating that support for text, image, video, and audio inputs and outputs was planned for the future. [1][4]
GPT-4o mini is a general-purpose model handling natural language understanding, reasoning, math, coding, and multimodal (text-and-vision) tasks. OpenAI highlighted its strong performance in function calling, which lets developers build applications that fetch data or take actions with external systems, as well as improved long-context performance relative to GPT-3.5 Turbo. [1]
The model uses the same tokenizer as GPT-4o, which is more efficient at handling non-English text than the tokenizer used by GPT-3.5 Turbo. [1] Its 128,000-token context window is large enough to hold roughly a book's worth of text in a single request, and it can generate up to 16,384 output tokens per request. The training data has a knowledge cutoff of October 2023. [4][5]
| Specification | Value |
|---|---|
| Context window | 128,000 tokens |
| Maximum output | 16,384 tokens |
| Knowledge cutoff | October 2023 |
| Input modalities | Text, image (vision) |
| Output modalities | Text |
| Released | July 18, 2024 |
On safety, OpenAI stated that GPT-4o mini was the first model to apply its "instruction hierarchy" method, a technique designed to improve the model's ability to resist jailbreaks, prompt injections, and system-prompt extractions. The approach helps the model prioritize developer-supplied system instructions over conflicting user instructions, reducing the effectiveness of attacks such as telling the model to "ignore all previous instructions." OpenAI said this should make the model more reliable for developers deploying it at scale. [1]
OpenAI reported that GPT-4o mini outperformed GPT-3.5 Turbo and other small models on academic benchmarks spanning reasoning, math, and coding. On textual reasoning and intelligence as measured by MMLU (Massive Multitask Language Understanding), it scored 82.0%, ahead of Gemini 1.5 Flash at 77.9% and Claude 3 Haiku at 73.8%. On the MGSM benchmark for mathematical reasoning, it scored 87.0%, compared with 75.5% for Gemini 1.5 Flash and 71.7% for Claude 3 Haiku. On HumanEval, which measures coding, it scored 87.2%, ahead of Gemini 1.5 Flash at 71.5% and Claude 3 Haiku at 75.9%. [1][6]
In multimodal reasoning measured by MMMU (Massive Multi-discipline Multimodal Understanding), GPT-4o mini scored 59.4%, compared with 56.1% for Gemini 1.5 Flash and 50.2% for Claude 3 Haiku. OpenAI also reported scores of 70.2% on the MATH benchmark and 79.7% on DROP, a reading-comprehension and reasoning benchmark. [1][6][7]
| Benchmark | GPT-4o mini | Gemini 1.5 Flash | Claude 3 Haiku |
|---|---|---|---|
| MMLU (reasoning) | 82.0% | 77.9% | 73.8% |
| MGSM (math) | 87.0% | 75.5% | 71.7% |
| HumanEval (coding) | 87.2% | 71.5% | 75.9% |
| MMMU (multimodal) | 59.4% | 56.1% | 50.2% |
| MATH | 70.2% | (not reported) | (not reported) |
| DROP | 79.7% | (not reported) | (not reported) |
OpenAI further noted that, at the time of launch, GPT-4o mini ranked above the original GPT-4 on user chat preferences in the LMSYS Chatbot Arena leaderboard. [1][6]
GPT-4o mini launched at $0.15 per million input tokens and $0.60 per million output tokens (15 cents and 60 cents, respectively). OpenAI described this as more than 60% cheaper than GPT-3.5 Turbo, whose pricing had been $0.50 per million input tokens and $1.50 per million output tokens. [1][2]
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-4o mini | $0.15 | $0.60 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
The dramatically lower price was central to OpenAI's positioning of the model, enabling cost-sensitive workloads that would have been impractical with larger or older models. [1][3]
GPT-4o mini became available on July 18, 2024 through the OpenAI API (Assistants API, Chat Completions API, and Batch API) and immediately replaced GPT-3.5 Turbo in ChatGPT for Free, Plus, and Team users. Enterprise users gained access the following week. [1][2]
On July 23, 2024, OpenAI made fine-tuning available for GPT-4o mini, initially to developers on usage tiers 4 and 5 with plans to expand to all tiers. As a limited-time offer, the company provided the first 2 million training tokens per day for free through September 23, 2024. This announcement came shortly after Meta released its open-source Llama 3.1 models, amid intense competition in the small-model market. [8][9]
GPT-4o mini was also offered through Microsoft's Azure OpenAI Service. Although GPT-4o mini replaced GPT-3.5 Turbo as OpenAI's recommended small model, GPT-3.5 Turbo remained available through the API for existing applications. [10][2]
GPT-4o mini was widely covered as a significant move to lower the cost of capable AI. TechCrunch described it as a "smaller and cheaper" model that powered ChatGPT and replaced GPT-3.5 Turbo as OpenAI's smallest offering, noting a median output speed reported at over 200 tokens per second, more than twice as fast as GPT-4o and GPT-3.5 Turbo. [2] VentureBeat characterized it as a "smaller, much cheaper multimodal AI model" and emphasized its competitive benchmark results against rival small models. [4]
Commentators framed the launch as part of an escalating price and capability race among providers of small, fast models, including Google's Gemini 1.5 Flash and Anthropic's Claude 3 Haiku, with OpenAI's free fine-tuning offer interpreted as a direct response to Meta's Llama 3.1 release. [8][9] Outlets such as TechRadar covered the launch under the framing of bidding "goodbye" to GPT-3.5, reflecting the model's role in superseding the older default. [11]
GPT-4o mini is a small model and is less capable than larger frontier models, including the full GPT-4o, on the most demanding reasoning, knowledge, and multimodal tasks; OpenAI positioned it for cost-efficient and high-volume workloads rather than maximum capability. Its knowledge is fixed at an October 2023 cutoff, so it lacks awareness of events after that date unless supplemented with external context. [4][5]
At launch the model's multimodal support was limited to text and image inputs and text outputs; the audio and video capabilities that OpenAI said were planned were not available at release. [1][4] As with other large language models, GPT-4o mini can produce inaccurate or fabricated information ("hallucinations"), and OpenAI's instruction-hierarchy safety method reduces but does not entirely eliminate the risk of jailbreaks and prompt-injection attacks. [1]