GPT-4o mini

Multimodal AI OpenAI Small Language Models

9 min read

Updated Jun 24, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 24, 2026

Fact-checked

In review queue

Sources

13 citations

Revision

v2 · 1,796 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

GPT-4o mini is a small, low-cost multimodal large language model developed by OpenAI and released on July 18, 2024, as the company's most cost-efficient model at the time. Priced at $0.15 per million input tokens and $0.60 per million output tokens, more than 60% cheaper than GPT-3.5 Turbo, it replaced GPT-3.5 Turbo as the default lightweight model in OpenAI's API and as the free-tier model in ChatGPT. GPT-4o mini accepts text and image inputs, has a 128,000-token context window, scores 82.0% on MMLU, and was roughly an order of magnitude cheaper than the frontier models of its era, making it the cost-efficient workhorse model of the pre-GPT-5 period. OpenAI described it as "our most cost-efficient small model," positioned to "expand the range of applications built with AI by making intelligence much more affordable." ^[1]^[2]

Overview

GPT-4o mini belongs to the GPT-4o ("omni") family but is a smaller, distinct model from the full GPT-4o. OpenAI introduced it as a way to expand the range of applications that could be built affordably with AI, citing low-cost use cases such as chaining or parallelizing multiple model calls, passing large volumes of context to a model, and powering fast, real-time text interactions like customer support chatbots. ^[1]

OpenAI did not disclose the model's parameter count, but described it as occupying roughly the same capability tier as small models such as Gemini 1.5 Flash and Claude 3 Haiku. The company noted that the cost per token of GPT-4o mini had fallen by more than 99% since text-davinci-003, a less capable model released in 2022, illustrating the rapid decline in inference costs. ^[1]^[3]

At launch GPT-4o mini supported text and vision (image) inputs through the API, with OpenAI stating that support for text, image, video, and audio inputs and outputs was planned for the future. ^[1]^[4]

What can GPT-4o mini do?

GPT-4o mini is a general-purpose model handling natural language understanding, reasoning, math, coding, and multimodal (text-and-vision) tasks. OpenAI highlighted its strong performance in function calling, which lets developers build applications that fetch data or take actions with external systems, as well as improved long-context performance relative to GPT-3.5 Turbo. ^[1]

The model uses the same tokenizer as GPT-4o, which is more efficient at handling non-English text than the tokenizer used by GPT-3.5 Turbo. ^[1] Its 128,000-token context window is large enough to hold roughly a book's worth of text in a single request, and it can generate up to 16,384 output tokens per request. The training data has a knowledge cutoff of October 2023. ^[4]^[5]

Specification	Value
Context window	128,000 tokens
Maximum output	16,384 tokens
Knowledge cutoff	October 2023
Input modalities	Text, image (vision)
Output modalities	Text
Released	July 18, 2024

On safety, OpenAI stated that GPT-4o mini was the first model to apply its "instruction hierarchy" method, a technique designed to improve the model's ability to resist jailbreaks, prompt injections, and system-prompt extractions. The approach helps the model prioritize developer-supplied system instructions over conflicting user instructions, reducing the effectiveness of attacks such as telling the model to "ignore all previous instructions." OpenAI said this should make the model more reliable for developers deploying it at scale. ^[1]

How does GPT-4o mini perform on benchmarks?

OpenAI reported that GPT-4o mini outperformed GPT-3.5 Turbo and other small models on academic benchmarks spanning reasoning, math, and coding. On textual reasoning and intelligence as measured by MMLU (Massive Multitask Language Understanding), it scored 82.0%, ahead of Gemini 1.5 Flash at 77.9% and Claude 3 Haiku at 73.8%. On the MGSM benchmark for mathematical reasoning, it scored 87.0%, compared with 75.5% for Gemini 1.5 Flash and 71.7% for Claude 3 Haiku. On HumanEval, which measures coding, it scored 87.2%, ahead of Gemini 1.5 Flash at 71.5% and Claude 3 Haiku at 75.9%. ^[1]^[6]

In multimodal reasoning measured by MMMU (Massive Multi-discipline Multimodal Understanding), GPT-4o mini scored 59.4%, compared with 56.1% for Gemini 1.5 Flash and 50.2% for Claude 3 Haiku. OpenAI also reported scores of 70.2% on the MATH benchmark and 79.7% on DROP, a reading-comprehension and reasoning benchmark. ^[1]^[6]^[7]

Benchmark	GPT-4o mini	Gemini 1.5 Flash	Claude 3 Haiku
MMLU (reasoning)	82.0%	77.9%	73.8%
MGSM (math)	87.0%	75.5%	71.7%
HumanEval (coding)	87.2%	71.5%	75.9%
MMMU (multimodal)	59.4%	56.1%	50.2%
MATH	70.2%	(not reported)	(not reported)
DROP	79.7%	(not reported)	(not reported)

OpenAI further noted that, at the time of launch, GPT-4o mini ranked above the original GPT-4 on user chat preferences in the LMSYS Chatbot Arena leaderboard. ^[1]^[6]

How much does GPT-4o mini cost?

GPT-4o mini launched at $0.15 per million input tokens and $0.60 per million output tokens (15 cents and 60 cents, respectively). OpenAI described this as more than 60% cheaper than GPT-3.5 Turbo, whose pricing had been $0.50 per million input tokens and $1.50 per million output tokens. ^[1]^[2]

Model	Input (per 1M tokens)	Output (per 1M tokens)
GPT-4o mini	$0.15	$0.60
GPT-3.5 Turbo	$0.50	$1.50

The dramatically lower price was central to OpenAI's positioning of the model, enabling cost-sensitive workloads that would have been impractical with larger or older models. ^[1]^[3]

When was GPT-4o mini released and where was it available?

GPT-4o mini became available on July 18, 2024 through the OpenAI API (Assistants API, Chat Completions API, and Batch API) and immediately replaced GPT-3.5 Turbo in ChatGPT for Free, Plus, and Team users. As OpenAI put it, "Free, Plus and Team users will be able to access GPT-4o mini starting today, in place of GPT-3.5." Enterprise users gained access the following week. ^[1]^[2]

On July 23, 2024, OpenAI made fine-tuning available for GPT-4o mini, initially to developers on usage tiers 4 and 5 with plans to expand to all tiers. As a limited-time offer, the company provided the first 2 million training tokens per day for free through September 23, 2024. This announcement came shortly after Meta released its open-source Llama 3.1 models, amid intense competition in the small-model market. ^[8]^[9]

GPT-4o mini was also offered through Microsoft's Azure OpenAI Service. Although GPT-4o mini replaced GPT-3.5 Turbo as OpenAI's recommended small model, GPT-3.5 Turbo remained available through the API for existing applications. ^[10]^[2]

What replaced GPT-4o mini?

GPT-4o mini served as OpenAI's default budget model for roughly ten months before being superseded. On April 14, 2025, OpenAI released the GPT-4.1 family, including GPT-4.1 mini and GPT-4.1 nano; GPT-4.1 mini replaced GPT-4o mini as the fallback model for all ChatGPT users beginning May 14, 2025. For reasoning-heavy workloads, OpenAI's o-series small model o4-mini (released April 2025) provided a chain-of-thought alternative in the same low-cost tier. GPT-4o mini's lineage continued with GPT-5 mini, released August 7, 2025 alongside GPT-5, which became the resource-efficient fallback for free ChatGPT users. ^[11]^[12]

Model	Released	Role
GPT-3.5 Turbo	2023	Prior default budget model
GPT-4o mini	July 18, 2024	Cost-efficient multimodal workhorse
GPT-4.1 mini	April 14, 2025	Replaced GPT-4o mini in ChatGPT (May 14, 2025)
o4-mini	April 2025	Small reasoning (o-series) model
GPT-5 mini	August 7, 2025	GPT-5-era budget fallback

On February 13, 2026, OpenAI retired GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from the ChatGPT consumer interface, consolidating around the GPT-5 family, though API deprecation schedules for individual models run separately. ^[13]

Reception

GPT-4o mini was widely covered as a significant move to lower the cost of capable AI. TechCrunch described it as a "smaller and cheaper" model that powered ChatGPT and replaced GPT-3.5 Turbo as OpenAI's smallest offering, noting a median output speed reported at over 200 tokens per second, more than twice as fast as GPT-4o and GPT-3.5 Turbo. ^[2] VentureBeat characterized it as a "smaller, much cheaper multimodal AI model" and emphasized its competitive benchmark results against rival small models. ^[4]

Commentators framed the launch as part of an escalating price and capability race among providers of small, fast models, including Google's Gemini 1.5 Flash and Anthropic's Claude 3 Haiku, with OpenAI's free fine-tuning offer interpreted as a direct response to Meta's Llama 3.1 release. ^[8]^[9] Outlets such as TechRadar covered the launch under the framing of bidding "goodbye" to GPT-3.5, reflecting the model's role in superseding the older default. ^[11]

What are GPT-4o mini's limitations?

GPT-4o mini is a small model and is less capable than larger frontier models, including the full GPT-4o, on the most demanding reasoning, knowledge, and multimodal tasks; OpenAI positioned it for cost-efficient and high-volume workloads rather than maximum capability. Its knowledge is fixed at an October 2023 cutoff, so it lacks awareness of events after that date unless supplemented with external context. ^[4]^[5]

At launch the model's multimodal support was limited to text and image inputs and text outputs; the audio and video capabilities that OpenAI said were planned were not available at release. ^[1]^[4] As with other large language models, GPT-4o mini can produce inaccurate or fabricated information ("hallucinations"), and OpenAI's instruction-hierarchy safety method reduces but does not entirely eliminate the risk of jailbreaks and prompt-injection attacks. ^[1]

References

OpenAI. "GPT-4o mini: advancing cost-efficient intelligence." July 18, 2024. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/ ↩
Wiggers, Kyle. "OpenAI unveils GPT-4o mini, a small AI model powering ChatGPT." TechCrunch, July 18, 2024. https://techcrunch.com/2024/07/18/openai-unveils-gpt-4o-mini-a-small-ai-model-powering-chatgpt/ ↩
Campus Technology. "OpenAI Launches Slimmer, Cheaper GPT-4o Mini." July 18, 2024. https://campustechnology.com/articles/2024/07/18/openai-launches-slimmer-cheaper-gpt-4o-mini.aspx ↩
Franzen, Carl. "OpenAI unveils GPT-4o mini, a smaller and much cheaper multimodal AI model." VentureBeat, July 18, 2024. https://venturebeat.com/ai/openai-unveils-gpt-4o-mini-a-smaller-much-cheaper-multimodal-ai-model ↩
DocsBot AI. "OpenAI's GPT-4o Mini - AI Model Details." https://docsbot.ai/models/gpt-4o-mini ↩
DataCamp. "What Is GPT-4o Mini? How It Works, Use Cases, API & More." https://www.datacamp.com/blog/gpt-4o-mini ↩
LLM Stats. "GPT-4o mini Benchmarks, Pricing & Context Window." https://llm-stats.com/models/gpt-4o-mini-2024-07-18 ↩
OpenAI Developers (@OpenAIDevs). "Customize GPT-4o mini for your application with fine-tuning." X, July 23, 2024. https://x.com/OpenAIDevs/status/1815836887631946015 ↩
Franzen, Carl. "AI arms race escalates: OpenAI offers free GPT-4o Mini fine-tuning to counter Meta's Llama 3.1 release." VentureBeat, July 23, 2024. https://venturebeat.com/ai/ai-arms-race-escalates-openai-offers-free-gpt-4o-mini-fine-tuning-to-counter-metas-llama-3-1-release ↩
Microsoft Azure. "OpenAI's fastest model, GPT-4o mini is now available on Azure AI." https://azure.microsoft.com/en-us/blog/openais-fastest-model-gpt-4o-mini-is-now-available-on-azure-ai/ ↩
TechRadar. "Goodbye GPT-3.5, OpenAI's new GPT-4o mini AI model is all about compact power." July 18, 2024. https://www.techradar.com/computing/artificial-intelligence/goodbye-gpt-35-openais-new-gpt-4o-mini-ai-model-is-all-about-compact-power ↩
OpenAI. "Introducing GPT-4.1 in the API." April 14, 2025. https://openai.com/index/gpt-4-1/ ↩
OpenAI. "Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT." 2026. https://openai.com/index/retiring-gpt-4o-and-older-models/ ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

1 revision by 1 contributors · full history

Suggest edit

What links here

Advanced Voice Mode GPT-3.5 GPT-4o Llama 3.2 OpenAI Paul Allen Reka Flash Yi-Lightning

Overview

What can GPT-4o mini do?

How does GPT-4o mini perform on benchmarks?

How much does GPT-4o mini cost?

When was GPT-4o mini released and where was it available?

What replaced GPT-4o mini?

Reception

What are GPT-4o mini's limitations?

References

Improve this article

Related Articles

Gemma 3

Reka Edge

GPT-4.1 nano

Sora 2

GPT Image 1

GPT-4V (Vision)

What links here

Related Articles

Gemma 3

Reka Edge

GPT-4.1 nano

Sora 2

GPT Image 1

GPT-4V (Vision)

What links here