GPT-4o (Generative Pre-trained Transformer 4 Omni) is a multimodal large language model developed by OpenAI and announced on May 13, 2024 at the company's "Spring Update" event. The letter "o" stands for omni, signaling that the model accepts any combination of text, audio, image, and video as input and generates any combination of text, audio, and image as output within a single end-to-end neural network, rather than the chained pipeline of separate models that powered earlier versions of ChatGPT Voice Mode. GPT-4o was designed as the successor to GPT-4 Turbo, matching that model's text and code performance while running roughly twice as fast and at half the API cost.
The most widely publicized aspect of GPT-4o is its real-time conversational ability. The model can respond to audio inputs in as little as 232 milliseconds, with an average latency of about 320 milliseconds, which is comparable to the cadence of an everyday human conversation. Earlier ChatGPT voice experiences chained a speech-to-text model, a text-only GPT-4 inference, and a text-to-speech model in series, producing typical end-to-end latencies of five seconds or more and stripping away tone, multiple speakers, background noise, laughter, and song. Because GPT-4o handles waveforms and text within the same network, it preserves prosody and can output expressive speech, including singing and laughter.
OpenAI used the launch to make GPT-4 class capability free for the first time. GPT-4o became the default model for paying ChatGPT Plus, Team, and Enterprise users, and it was rolled out to ChatGPT Free with usage caps. A smaller and cheaper variant, GPT-4o mini, was released on July 18, 2024 and replaced GPT-3.5 Turbo as the recommended low-cost API model. A native image generation update for GPT-4o was launched inside ChatGPT on March 25, 2025 and exposed in the API as the gpt-image-1 endpoint on April 23, 2025. GPT-4o was eventually retired from the ChatGPT product on February 13, 2026 in favor of the GPT-5 family, while remaining available in the OpenAI API.
In the months leading up to GPT-4o, OpenAI tested release candidates anonymously on the public LMSYS Chatbot Arena. In late April 2024, an unbranded model called gpt2-chatbot appeared on the Arena and immediately matched or exceeded the strongest available systems, including GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro. The model was withdrawn after a few days, then reappeared on May 6, 2024 under two new names: im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot. OpenAI CEO Sam Altman cryptically tweeted "i do have a soft spot for gpt2" and later "im-a-good-gpt2-chatbot," and the day before the Spring Update he posted only the word "her," widely read as a nod to the 2013 film of the same name. After the May 13 announcement, OpenAI confirmed that the masked Arena models had been GPT-4o checkpoints; im-also-a-good-gpt2-chatbot held an Arena Elo near 1309, well ahead of GPT-4 Turbo at 1253 and Claude 3 Opus at 1246 at the time of the reveal.
The Spring Update was a 26-minute livestream held at OpenAI's San Francisco headquarters on May 13, 2024, the day before Google's annual I/O developer conference. Then-chief technology officer Mira Murati opened the presentation, framed three priorities for the launch (a new desktop app, a refreshed user interface, and the new flagship model), and then handed off to research leads Mark Chen and Barret Zoph for live demonstrations. The team showed real-time spoken conversation with interruption handling, on-the-fly emotional expression, real-time translation between Italian and English, vision-based math tutoring on a handwritten linear equation, code review of a Python script via screen sharing, and the model singing a bedtime story for a stuffed animal. The presentation deliberately steered away from the staged feel of typical product demos: the presenters interrupted the model mid-sentence, asked it to be more dramatic, and let it ad-lib.
Google's I/O keynote the following day featured Project Astra, a multimodal assistant with similar ambitions, and the timing was widely interpreted as a strategic positioning move by OpenAI.
One of the five default voices showcased during the keynote, named Sky, was immediately compared to actress Scarlett Johansson, who had voiced the AI assistant Samantha in the 2013 film Her. Sam Altman's one-word "her" tweet on May 12 amplified the comparison.
On May 19, 2024, Johansson released a public statement saying she had been "shocked" and "angered" when she heard Sky, and that Altman had personally approached her in September 2023 to voice ChatGPT, a request she had declined. He had reportedly contacted her agent again only two days before the May 13 launch. Her legal team sent letters to OpenAI demanding a full account of how Sky was created. OpenAI paused Sky the same day, with Altman writing that "out of respect for Ms. Johansson, we have paused using Sky's voice."
OpenAI maintained that Sky was not a clone of Johansson's voice and was instead provided by a different professional voice actress whose natural speaking voice resembled Johansson's. The Washington Post reviewed casting documents and found that the voice actress had been hired before any contact with Johansson. NPR commissioned an independent voice analysis from Arizona State University researchers, who found audible similarities between Sky and Johansson's natural speech. The episode prompted a U.S. Senate subcommittee to invite Johansson to discuss AI and the right of publicity, and the dispute became a frequently cited reference case in early debates about voice cloning, name and likeness rights, and AI regulation.
OpenAI continued to update the GPT-4o family throughout 2024 and 2025:
| Date | Release | Notes |
|---|---|---|
| May 13, 2024 | gpt-4o-2024-05-13 | Initial public release; text and vision in API; voice in ChatGPT |
| July 18, 2024 | gpt-4o-mini-2024-07-18 | Smaller distilled variant; replaces GPT-3.5 Turbo |
| July 30, 2024 | Advanced Voice Mode alpha | Limited Plus rollout of native audio voice mode |
| August 6, 2024 | gpt-4o-2024-08-06 | Adds Structured Outputs (JSON Schema), 16K output tokens, lower price |
| August 8, 2024 | GPT-4o System Card | Public safety evaluation report |
| September 2024 | Advanced Voice general availability | All Plus and Team users on iOS and Android |
| October 1, 2024 | Realtime API beta | Speech-to-speech WebSocket API for developers |
| November 20, 2024 | gpt-4o-2024-11-20 | Improved creative writing and longer answers |
| March 25, 2025 | Native image generation in ChatGPT | Replaces DALL-E 3 as default image generator in ChatGPT |
| April 23, 2025 | gpt-image-1 in API | Native image generation exposed to developers |
| February 13, 2026 | Retirement from ChatGPT | Replaced in product by GPT-5 family; still in API |
GPT-4o is described by OpenAI as "a single new model trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network." Audio waveforms and image pixels are encoded directly into tokens that share the same latent space as text tokens, so the model's attention can directly correlate a word in a prompt with a particular pixel region in an image, a particular speaker in a recording, or a particular musical note in audio. OpenAI has not published the architecture diagram, parameter count, training data composition, or hardware footprint, citing the same competitive and safety considerations it cited for GPT-4.
In the previous voice pipeline, three separate models were chained together: an automatic speech recognition model (typically Whisper) transcribed audio to text, a text-only GPT-4 model produced a text reply, and a separate text-to-speech model converted the reply back to audio. The text-only middle stage threw away tone, multiple speaker information, background sound, and emotional inflection, and the chain accumulated latency at each step. GPT-4o eliminates this loss of information by carrying audio and vision tokens directly into and out of the same model.
GPT-4o ships with a new tokenizer, o200k_base, which doubles the vocabulary from the cl100k_base tokenizer used by GPT-4 and GPT-3.5 to roughly 200,000 BPE tokens. The expanded vocabulary is heavily targeted at non-English text, where earlier OpenAI tokenizers used many more tokens per character. According to OpenAI's published comparisons, common Hindi sentences require about 2.9 times fewer tokens, common Arabic sentences about 2 times fewer, common Chinese sentences about 1.4 times fewer, and common Korean sentences about 1.7 times fewer. A widely cited Indic language analysis showed Malayalam tokens reduced by nearly 4 times and Telugu by about 3.5 times.
Fewer tokens per character of foreign-language text means lower latency, lower API costs, and longer effective context windows for those languages, since context length is measured in tokens rather than characters.
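The difference can be measured directly with OpenAI's open-source tiktoken library, which ships both encodings. The sketch below is a minimal illustration; the sample sentences are arbitrary and exact ratios vary with the text being encoded.

```python
# Compare token counts between GPT-4o's o200k_base tokenizer and the older
# cl100k_base used by GPT-4 and GPT-3.5 Turbo.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-3.5 Turbo
new_enc = tiktoken.get_encoding("o200k_base")   # GPT-4o / GPT-4o mini

samples = {
    "English": "How is the weather today?",
    "Hindi": "आज मौसम कैसा है?",  # illustrative sentence, not from OpenAI's comparison
}

for language, text in samples.items():
    old_n = len(old_enc.encode(text))
    new_n = len(new_enc.encode(text))
    print(f"{language}: {old_n} tokens (cl100k_base) -> {new_n} tokens (o200k_base)")
```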
GPT-4o offers a 128,000 token context window with a knowledge cutoff date of October 2023. The initial gpt-4o-2024-05-13 snapshot capped output at 4,096 tokens; the August 6, 2024 update raised the maximum output to 16,384 tokens.
On standard text and reasoning benchmarks, GPT-4o matches or modestly improves on GPT-4 Turbo, while running roughly twice as fast and at about half the price.
| Benchmark | GPT-4o | GPT-4 Turbo | GPT-4 (March 2023) | Claude 3.5 Sonnet |
|---|---|---|---|---|
| MMLU (general knowledge, 5-shot) | 88.7% | 86.5% | 86.4% | 88.7% |
| HumanEval (Python code, pass@1) | 90.2% | 87.1% | 67.0% | 92.0% |
| MATH (competition math) | 76.6% | 72.6% | 50.6% | 71.1% |
| GPQA (graduate science, 0-shot CoT) | 53.6% | 48.0% | 35.7% | 59.4% |
| DROP (reading comprehension, F1) | 83.4 | 86.0 | 80.9 | 87.1 |
| MGSM (multilingual grade school math) | 90.5% | 88.5% | 74.5% | 91.6% |
MMLU stands for Massive Multitask Language Understanding and covers 57 subjects ranging from US history to abstract algebra. HumanEval is a 164-problem Python coding benchmark introduced by OpenAI in 2021. GPQA is a 448-question graduate-level science benchmark designed to be "Google-proof," and MGSM tests grade-school math problems translated into ten languages. Claude 3.5 Sonnet, released by Anthropic in June 2024, was the strongest direct competitor to GPT-4o for most of its lifetime; the two trade leadership across benchmarks, with Claude 3.5 Sonnet generally stronger at agentic coding and graduate-level reasoning, and GPT-4o stronger at multilingual math and general knowledge.
GPT-4o accepts arbitrary images, including photographs, screenshots, charts, diagrams, slides, handwritten notes, and short video frame sequences. Resolution is preserved up to roughly 2,048 by 2,048 pixels, and a 1,024 by 1,024 image consumes approximately 765 tokens of context. Vision capability is bundled into the same per-token billing as text rather than priced separately.
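A minimal sketch of a single-image request through the Chat Completions API follows, assuming the OpenAI Python SDK (v1.x); the image URL and question are placeholders.

```python
# Send one image plus a text question to gpt-4o in a single message.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            # "detail": "high" sends the full-resolution tiling; a 1024x1024
            # image consumes roughly 765 tokens, billed at normal text rates.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png", "detail": "high"}},
        ],
    }],
)

print(response.choices[0].message.content)
```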
| Vision benchmark | GPT-4o | GPT-4 Turbo |
|---|---|---|
| MMMU (multi-discipline college reasoning) | 69.1% | 63.1% |
| MathVista (visual math) | 63.8% | 58.1% |
| AI2D (science diagrams) | 94.2% | 89.4% |
| ChartQA | 85.7% | 78.1% |
| DocVQA (document images) | 92.8% | 87.2% |
| ActivityNet (video question answering) | 61.9% | 59.5% |
| EgoSchema (egocentric video) | 72.2% | 63.1% |
MMMU evaluates college-level subject knowledge across art, business, science, health, and engineering using mixed image and text questions. DocVQA tests reading from scanned documents, ChartQA tests reading numbers and trends from charts, and AI2D tests interpretation of textbook science diagrams.
Native audio output is delivered to ChatGPT users through Advanced Voice Mode, which began an alpha rollout to a small group of ChatGPT Plus subscribers on July 30, 2024 and reached general availability for Plus and Team users on iOS and Android in September 2024. Advanced Voice Mode supports interruption, expressive prosody, regional accents, whispering, character voices, multiple language switching mid-sentence, and singing. The model can pick up cues such as urgency, sarcasm, or sadness from the user's voice and modulate its replies accordingly.
For developers, the same speech-to-speech capability is exposed through the Realtime API, a WebSocket interface released in beta on October 1, 2024. The Realtime API streams audio in and audio out, supports function calling, and lets developers configure a system prompt, voice, and tools without writing their own speech recognition or speech synthesis layer.
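The following is a minimal sketch of a Realtime API session in Python, assuming the open-source websockets package. The model string, headers, and event shapes follow the October 2024 beta documentation and may have changed in later revisions; for brevity the sketch requests a text-only response even though the API's main use is speech-to-speech.

```python
# Open a Realtime API session, request a response, and print server events.
import asyncio
import json
import os
import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main():
    # On websockets >= 13 the keyword is named additional_headers instead.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model to generate a short greeting.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["text"], "instructions": "Say hello."},
        }))
        # Stream server events until the response is complete.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```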
GPT-4o ships with several preset voices, originally Breeze, Cove, Ember, Juniper, and Sky. After the Scarlett Johansson incident, OpenAI paused Sky and later expanded the voice library with options named Arbor, Maple, Sol, Spruce, and Vale.
On March 25, 2025, OpenAI shipped a long-awaited native image generation update for GPT-4o inside ChatGPT, replacing the prior DALL-E 3 backend. Unlike DALL-E 3, the GPT-4o image generator is part of the same model that handles conversation, so the system can iteratively refine images across turns, place legible text inside images, render charts and diagrams from data, follow longer and more complex compositional prompts, and bind ten to twenty distinct objects in a single scene. The launch produced viral interest in stylized renderings, including Studio Ghibli style portraits, and OpenAI reported that approximately 700 million images were generated in the first week, equivalent to roughly 1,200 images per second. The same backend was exposed in the API on April 23, 2025 as gpt-image-1.
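A minimal sketch of calling the image endpoint with the OpenAI Python SDK (v1.x) is shown below; the prompt and output filename are illustrative.

```python
# Generate one image with gpt-image-1 and save the base64-encoded result.
import base64
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="gpt-image-1",
    prompt="A hand-drawn diagram of a transformer block with legible labels",
    size="1024x1024",
)

# gpt-image-1 returns base64-encoded image data rather than a hosted URL.
with open("diagram.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```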
Thanks to both the larger tokenizer and additional non-English training data, GPT-4o substantially improves on GPT-4 Turbo for the world's most widely spoken languages. OpenAI's evaluations on the M3Exam multilingual benchmark show that GPT-4o outperforms GPT-4 in all 14 languages tested except English, with the largest gains in Swahili, Yoruba, and Bengali.
The Spring Update bundled GPT-4o with a series of ChatGPT product changes, including the new desktop app and refreshed user interface shown at the keynote. Free users were granted limited GPT-4o access (typically 10 to 20 messages per five-hour window), with the system silently downgrading to GPT-4o mini once the cap was reached.
GPT-4o is exposed through the OpenAI API as well as Microsoft's Azure OpenAI Service. The August 6, 2024 snapshot reduced input pricing to $2.50 per million tokens and output pricing to $10.00 per million tokens, a 50% input discount and 33% output discount versus the original May release.
| Model snapshot | Input price (per 1M tokens) | Output price (per 1M tokens) | Cached input | Max output tokens |
|---|---|---|---|---|
| gpt-4o-2024-05-13 | $5.00 | $15.00 | not offered | 4,096 |
| gpt-4o-2024-08-06 | $2.50 | $10.00 | $1.25 | 16,384 |
| gpt-4o-2024-11-20 | $2.50 | $10.00 | $1.25 | 16,384 |
| gpt-4o-mini-2024-07-18 | $0.15 | $0.60 | $0.075 | 16,384 |
| gpt-4-turbo-2024-04-09 (for reference) | $10.00 | $30.00 | not offered | 4,096 |
| gpt-3.5-turbo-0125 (for reference) | $0.50 | $1.50 | not offered | 4,096 |
At $2.50 input and $10.00 output, GPT-4o is roughly four times cheaper than GPT-4 Turbo on input and three times cheaper on output while delivering comparable or better quality. GPT-4o mini at $0.15 input and $0.60 output is 70% cheaper than GPT-3.5 Turbo on input, 60% cheaper on output, and well over an order of magnitude cheaper than the GPT-4 Turbo prices that prevailed only a few months before its release.
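As a rough illustration of the arithmetic, the sketch below applies the August 2024 gpt-4o prices from the table above to a hypothetical request; the token counts are made up for the example.

```python
# Per-request cost at gpt-4o-2024-08-06 prices (USD per million tokens).
INPUT_PER_M = 2.50
OUTPUT_PER_M = 10.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one API call at the August 2024 prices."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# A 10,000-token prompt with a 1,000-token reply costs $0.025 + $0.010 = $0.035.
print(f"${request_cost(10_000, 1_000):.3f}")
```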
The gpt-4o-2024-08-06 snapshot introduced Structured Outputs, a feature that constrains model generation to a developer-supplied JSON Schema. With Structured Outputs enabled, the API guarantees that responses match the schema. OpenAI reported that the new model achieved 100% schema conformance on a complex internal evaluation set, compared to under 40% for the original GPT-4 (gpt-4-0613) under similar conditions. Structured Outputs work both for direct response formats and for function calling tools, and are also supported by GPT-4o mini.
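A minimal sketch of a Structured Outputs request against the August 2024 snapshot follows, assuming the OpenAI Python SDK (v1.x); the calendar_event schema and prompt are illustrative, not taken from OpenAI's documentation.

```python
# Constrain the model's reply to a developer-supplied JSON Schema.
from openai import OpenAI

client = OpenAI()

schema = {
    "name": "calendar_event",
    "strict": True,  # enables guaranteed schema conformance
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "date": {"type": "string"},
            "attendees": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "date", "attendees"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Alice and Bob meet for standup on Friday."}],
    response_format={"type": "json_schema", "json_schema": schema},
)

# The returned string is guaranteed to parse against the schema above.
print(response.choices[0].message.content)
```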
The Realtime API, released in beta on October 1, 2024, lets developers build their own speech-to-speech applications using the same neural pipeline as ChatGPT Advanced Voice Mode. It uses a persistent WebSocket connection, supports function calling, and was priced at launch at $100.00 per million audio input tokens (about $0.06 per minute of input) and $200.00 per million audio output tokens (about $0.24 per minute of output). Text tokens in Realtime conversations were billed separately, at $5.00 per million input and $20.00 per million output.
| Modality | Input | Output | Notes |
|---|---|---|---|
| Text | Yes | Yes | 128K context, English and 50+ other languages |
| Image | Yes | Yes (via gpt-image-1, 2025) | Vision from launch; native generation added March 2025 |
| Audio | Yes | Yes | Native speech in/out via Advanced Voice and Realtime API |
| Video | Frames as images | No | Live video understood through frame sequences |
| Function calls | Yes | Yes | Tool use with strict schema since August 2024 |
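As the table notes, video is consumed as sampled frames rather than as a native stream. The sketch below illustrates one common pattern, assuming the OpenAI Python SDK (v1.x) and opencv-python; the file name, sampling rate, and frame cap are illustrative.

```python
# Sample one frame per second from a clip and send the frames to gpt-4o.
import base64
import cv2
from openai import OpenAI

client = OpenAI()

frames = []
video = cv2.VideoCapture("clip.mp4")
fps = int(video.get(cv2.CAP_PROP_FPS)) or 30
index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if index % fps == 0:  # keep roughly one frame per second
        ok, buf = cv2.imencode(".jpg", frame)
        frames.append(base64.b64encode(buf).decode("utf-8"))
    index += 1
video.release()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "Describe what happens in this clip."}]
        + [{"type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{f}", "detail": "low"}}
           for f in frames[:10]],  # cap the frame count to control token usage
    }],
)
print(response.choices[0].message.content)
```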
GPT-4o mini is a smaller, distilled member of the GPT-4o family, released on July 18, 2024. It became OpenAI's recommended low-cost API model and replaced GPT-3.5 Turbo as the default fallback in ChatGPT for free users who exceeded their GPT-4o quota.
Key facts:

- Priced at $0.15 per million input tokens and $0.60 per million output tokens, with cached input at $0.075 per million.
- Maximum output of 16,384 tokens.
- Accepts text and vision input through the same API interface as GPT-4o.
- Replaced GPT-3.5 Turbo both as the recommended low-cost API model and as the ChatGPT free-tier fallback.
GPT-4o mini was widely adopted for high-volume back-office applications, batch document processing, retrieval-augmented generation pipelines, and customer support assistants, where its combination of multimodal capability and low price made the older GPT-3.5 Turbo class of models obsolete.
OpenAI published the GPT-4o System Card on August 8, 2024 as part of its Preparedness Framework, which scores frontier models on four risk categories: cybersecurity; chemical, biological, radiological, and nuclear (CBRN) threats; persuasion; and model autonomy. GPT-4o received an overall classification of medium risk, with three categories rated low (cybersecurity, CBRN, model autonomy) and persuasion rated medium. The medium persuasion rating was driven by text-based evaluations in which the model's written arguments marginally outperformed the human baseline at shifting opinions on political topics in isolated cases; aggregate performance remained below the human baseline, and voice-based persuasion was rated low.
If a model crosses the high threshold in any category, OpenAI's policy is to delay deployment until mitigations bring the score down. GPT-4o passed the bar for deployment but with additional audio-specific safeguards.
More than 100 external red teamers covering 45 languages and 29 countries were given access to GPT-4o snapshots between early March and late June 2024. The red team probed for harmful content generation, bias, jailbreaking, voice-based attacks, biometric inference from voice, and unauthorized voice imitation.
Because native audio output is uniquely capable of reproducing voices, OpenAI restricted GPT-4o to a small set of pre-approved voices recorded by professional voice actors and trained classifiers to refuse requests to imitate specific people. The model is also trained to refuse to produce copyrighted singing performances and to apply additional content filters to audio output. The Sky pause in May 2024 was a public application of these voice safeguards.
The System Card reports moderate improvements over GPT-4 Turbo on bias evaluations, including BBQ (Bias Benchmark for QA) and a refusal-to-stereotype test. OpenAI also describes a tendency for GPT-4o to over-refuse certain benign multimodal requests at launch, which was tuned down in later snapshots.
Reviewers and benchmark trackers were broadly positive about GPT-4o at launch, citing its speed, its price cuts, and its lead atop the Chatbot Arena leaderboard.
Criticism centered on three points: the Sky voice incident and the broader question of voice and likeness rights, the gap between the polished launch demos and the slower actual rollout of Advanced Voice Mode (which took roughly four months to reach all Plus users), and the model's continued tendency to hallucinate facts and citations despite the new training run.
GPT-5 was released by OpenAI in 2025 as a unified family combining the multimodal capabilities of the GPT-4o line with the chain-of-thought reasoning of the o1 and o3 series. By February 2026, GPT-5 was the default model for the overwhelming majority of ChatGPT conversations, with internal usage data cited by OpenAI showing that only about 0.1% of daily users still selected GPT-4o. On February 13, 2026, OpenAI retired GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT, defaulting existing conversations and projects to GPT-5 Instant or GPT-5 Thinking equivalents. ChatGPT Business, Enterprise, and Edu customers retained access to GPT-4o inside Custom GPTs through April 3, 2026.
The GPT-4o snapshots remain available in the OpenAI API, where many production deployments continue to call gpt-4o and gpt-4o-mini for cost or latency reasons.
GPT-4o is widely cited as the first commercial AI model to deliver real-time, expressive, native multimodal interaction at scale. Three lasting effects of the launch are commonly noted: