GPT-4o
Last reviewed
May 7, 2026
Sources
30 citations
Review status
Source-backed
Revision
v2 ยท 7,827 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 7, 2026
Sources
30 citations
Review status
Source-backed
Revision
v2 ยท 7,827 words
Add missing citations, update stale details, or suggest a clearer explanation.
GPT-4o (Generative Pre-trained Transformer 4 Omni) is a multimodal large language model developed by OpenAI and announced on May 13, 2024 at the company's "Spring Update" event. The letter "o" stands for omni, signaling that the model accepts and produces any combination of text, audio, image, and video tokens within a single end-to-end neural network rather than the chained pipeline of separate models that powered earlier versions of ChatGPT Voice Mode. GPT-4o was designed as the successor to GPT-4 Turbo, matching that model's text and code performance while running roughly twice as fast and at half the API cost.
The most widely publicized aspect of GPT-4o is its real-time conversational ability. The model can respond to audio inputs in as little as 232 milliseconds, with an average latency of about 320 milliseconds, which is comparable to the cadence of an everyday human conversation. Earlier ChatGPT voice experiences chained a speech-to-text model, a text-only GPT-4 inference, and a text-to-speech model in series, producing typical end-to-end latencies of five seconds or more and stripping away tone, multiple speakers, background noise, laughter, and song. Because GPT-4o handles waveforms and text within the same network, it preserves prosody and can output expressive speech, including singing and laughter.
OpenAI used the launch to make GPT-4 class capability free for the first time. GPT-4o became the default model for paying ChatGPT Plus, Team, and Enterprise users, and it was rolled out to ChatGPT Free with usage caps. A smaller and cheaper variant, GPT-4o mini, was released on July 18, 2024 and replaced GPT-3.5 Turbo as the recommended low-cost API model. A native image generation update for GPT-4o was launched inside ChatGPT on March 25, 2025 and exposed in the API as the gpt-image-1 endpoint on April 23, 2025. GPT-4o was eventually retired from the ChatGPT product on February 13, 2026 in favor of the GPT-5 family, while remaining available in the OpenAI API.
The model occupies an unusual place in the GPT lineage. It was OpenAI's last flagship before the company pivoted to the dedicated reasoning track of the o1, o3, and o4-mini series, and it was the model that defined the conversational tone of ChatGPT for nearly two years across hundreds of millions of weekly users. When OpenAI tried to remove it during the GPT-5 launch in August 2025, user pushback was strong enough that the company restored access within five days under a "Show legacy models" toggle, an episode that became a touchstone for debates about model attachment and AI personality.
In the months leading up to GPT-4o, OpenAI tested release candidates anonymously on the public LMSYS Chatbot Arena. In late April 2024, an unbranded model called gpt2-chatbot appeared on the Arena and immediately matched or exceeded the strongest available systems, including GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro. The model was withdrawn after a few days, then reappeared on May 6, 2024 under two new names: im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot. OpenAI CEO Sam Altman cryptically tweeted "i do have a soft spot for gpt2" and later "im-a-good-gpt2-chatbot," and the day before the Spring Update he posted only the word "her," widely read as a nod to the 2013 film of the same name. After the May 13 announcement, OpenAI confirmed that the masked Arena models had been GPT-4o checkpoints; im-also-a-good-gpt2-chatbot held an Arena Elo near 1309, well ahead of GPT-4 Turbo at 1253 and Claude 3 Opus at 1246 at the time of the reveal. The episode also drew attention to the Chatbot Arena as a forum where labs blind-test new releases before public launch.
The Spring Update was a 26 minute live stream held at OpenAI's San Francisco headquarters on May 13, 2024, the day before Google's annual I/O developer conference. Then chief technology officer Mira Murati opened the presentation, framed three priorities for the launch (a new desktop app, a refreshed user interface, and the new flagship model), and then handed off to research leads Mark Chen and Barret Zoph for live demonstrations. The team showed real-time spoken conversation with interruption handling, on-the-fly emotional expression, real-time language translation between Italian and English, vision based math tutoring on a handwritten linear equation, code review of a Python script via screen sharing, and the model singing a bedtime story for a stuffed animal. The presentation deliberately steered around the staged feel of typical product demos: the presenters interrupted the model mid-sentence, asked it to be more dramatic, and let it ad-lib.
Google's I/O keynote the following day featured Project Astra, a multimodal assistant with similar ambitions, and the timing was widely interpreted as a strategic positioning move by OpenAI. The keynote landed roughly six months after the November 2023 Bletchley Declaration, the first multilateral AI safety summit, and OpenAI was among the labs that had voluntarily committed to pre-deployment safety testing of frontier models. The Spring Update emphasized that GPT-4o had been reviewed under that framework before public release.
One of the five default voices showcased during the keynote, named Sky, was immediately compared to actress Scarlett Johansson, who had voiced the AI assistant Samantha in the 2013 film Her. Sam Altman's one-word "her" tweet on May 12 amplified the comparison.
On May 19, 2024, Johansson released a public statement saying she had been "shocked" and "angered" when she heard Sky, and that Altman had personally approached her in September 2023 to voice ChatGPT, a request she had declined. He had reportedly contacted her agent again only two days before the May 13 launch. Her legal team sent letters to OpenAI demanding a full account of how Sky was created. OpenAI paused Sky the same day, with Altman writing that "out of respect for Ms. Johansson, we have paused using Sky's voice."
OpenAI maintained that Sky was not a clone of Johansson's voice and was instead provided by a different professional voice actress whose natural speaking voice resembled the actress. The Washington Post reviewed casting documents and found that the voice actress had been hired before any contact with Johansson. NPR commissioned an independent voice analysis from Arizona State University researchers, who found audible similarities between Sky and Johansson's natural speech. The episode prompted a U.S. Senate subcommittee to invite Johansson to discuss AI and the right of publicity, and the dispute became a frequently cited reference case in early debates about voice cloning, name and likeness rights, and AI regulation.
OpenAI continued to update the GPT-4o family throughout 2024 and 2025:
| Date | Release | Notes |
|---|---|---|
| May 13, 2024 | gpt-4o-2024-05-13 | Initial public release; text and vision in API; voice in ChatGPT |
| July 18, 2024 | gpt-4o-mini-2024-07-18 | Smaller distilled variant; replaces GPT-3.5 Turbo |
| July 30, 2024 | Advanced Voice Mode alpha | Limited Plus rollout of native audio voice mode |
| August 6, 2024 | gpt-4o-2024-08-06 | Adds Structured Outputs (JSON Schema), 16K output tokens, lower price |
| August 8, 2024 | GPT-4o System Card | Public safety evaluation report |
| September 2024 | Advanced Voice general availability | All Plus and Team users on iOS and Android |
| October 1, 2024 | Realtime API beta | Speech-to-speech WebSocket API for developers |
| October 17, 2024 | gpt-4o-audio-preview-2024-10-01 | Audio input/output added to Chat Completions |
| November 20, 2024 | gpt-4o-2024-11-20 | Improved creative writing and longer answers |
| December 17, 2024 | Realtime API mini and lower pricing | Realtime gains gpt-4o-mini variant; audio prices cut |
| March 25, 2025 | Native image generation in ChatGPT | Replaces DALL-E 3 as default image generator in ChatGPT |
| April 23, 2025 | gpt-image-1 in API | Native image generation exposed to developers |
| August 7, 2025 | GPT-4o removed from ChatGPT picker (initially) | Replaced by GPT-5 router for all users |
| August 12, 2025 | GPT-4o restored under "Show legacy models" | Reinstated for paid users after backlash |
| October 6, 2025 | Realtime API general availability | Realtime exits beta with new GA pricing |
| November 12, 2025 | Default switches to GPT-5.1 Instant | GPT-4o stays optional behind legacy toggle |
| February 13, 2026 | Retirement from ChatGPT | Replaced in product by GPT-5 family; still in API |
| April 3, 2026 | Final cutoff for Custom GPTs | Business, Enterprise, Edu Custom GPTs migrated off GPT-4o |
GPT-4o is described by OpenAI as "a single new model trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network." Audio waveforms and image pixels are encoded directly into tokens that share the same latent space as text tokens, so the model's attention can directly correlate a word in a prompt with a particular pixel region in an image, a particular speaker in a recording, or a particular musical note in audio. OpenAI has not published the architecture diagram, parameter count, training data composition, or hardware footprint, citing the same competitive and safety considerations it cited for GPT-4.
In the previous voice pipeline, three separate models were chained together: an automatic speech recognition model (typically Whisper) transcribed audio to text, a text-only GPT-4 model produced a text reply, and a separate text-to-speech model converted the reply back to audio. The text-only middle stage threw away tone, multiple speaker information, background sound, and emotional inflection, and the chain accumulated latency at each step. GPT-4o eliminates this loss of information by carrying audio and vision tokens directly into and out of the same model.
GPT-4o ships with a new tokenizer, o200k_base, which doubles the vocabulary from the cl100k_base tokenizer used by GPT-4 and GPT-3.5 to roughly 200,000 BPE tokens. The expanded vocabulary is heavily targeted at non-English text, where earlier OpenAI tokenizers used many more tokens per character. According to OpenAI's published comparisons, common Hindi sentences require about 2.9 times fewer tokens, common Arabic sentences about 2 times fewer, common Chinese sentences about 1.4 times fewer, and common Korean sentences about 1.7 times fewer. A widely cited Indic language analysis showed Malayalam tokens reduced by nearly 4 times and Telugu by about 3.5 times.
Fewer tokens per character of foreign-language text means lower latency, lower API costs, and longer effective context windows for those languages, since context length is measured in tokens rather than characters.
GPT-4o offers a 128,000 token context window with a knowledge cutoff date of October 2023. The initial gpt-4o-2024-05-13 snapshot capped output at 4,096 tokens; the August 6, 2024 update raised the maximum output to 16,384 tokens.
On standard text and reasoning benchmarks, GPT-4o matches or modestly improves on GPT-4 Turbo, while running roughly twice as fast and at about half the price.
| Benchmark | GPT-4o | GPT-4 Turbo | GPT-4 (March 2023) | Claude 3.5 Sonnet |
|---|---|---|---|---|
| MMLU (general knowledge, 5-shot) | 88.7% | 86.5% | 86.4% | 88.7% |
| HumanEval (Python code, pass@1) | 90.2% | 87.1% | 67.0% | 92.0% |
| MATH (competition math) | 76.6% | 72.6% | 50.6% | 71.1% |
| GPQA (graduate science, 0-shot CoT) | 53.6% | 48.0% | 35.7% | 59.4% |
| DROP (reading comprehension, F1) | 83.4 | 86.0 | 80.9 | 87.1 |
| MGSM (multilingual grade school math) | 90.5% | 88.5% | 74.5% | 91.6% |
MMLU stands for Massive Multitask Language Understanding and covers 57 subjects ranging from US history to abstract algebra. HumanEval is a 164-problem Python coding benchmark introduced by OpenAI in 2021. GPQA is a 448-question graduate-level science benchmark designed to be "Google-proof," and MGSM tests grade-school math problems translated into ten languages. Claude 3.5 Sonnet, released by Anthropic in June 2024, was the strongest direct competitor to GPT-4o for most of its lifetime; the two trade leadership across benchmarks, with Claude 3.5 Sonnet generally stronger at agentic coding and graduate-level reasoning, and GPT-4o stronger at multilingual math and general knowledge.
GPT-4o accepts arbitrary images, including photographs, screenshots, charts, diagrams, slides, handwritten notes, and short video frame sequences. Resolution is preserved up to roughly 2,048 by 2,048 pixels, and a 1,024 by 1,024 image consumes approximately 765 tokens of context. Vision capability is bundled into the same per-token billing as text rather than priced separately.
| Vision benchmark | GPT-4o | GPT-4 Turbo |
|---|---|---|
| MMMU (multi-discipline college reasoning) | 69.1% | 63.1% |
| MathVista (visual math) | 63.8% | 58.1% |
| AI2D (science diagrams) | 94.2% | 89.4% |
| ChartQA | 85.7% | 78.1% |
| DocVQA (document images) | 92.8% | 87.2% |
| ActivityNet (video question answering) | 61.9% | 59.5% |
| EgoSchema (egocentric video) | 72.2% | 63.1% |
MMMU evaluates college-level subject knowledge across art, business, science, health, and engineering using mixed image and text questions. DocVQA tests reading from scanned documents, ChartQA tests reading numbers and trends from charts, and AI2D tests interpretation of textbook science diagrams.
Native audio output is delivered to ChatGPT users through Advanced Voice Mode, which began an alpha rollout to a small group of ChatGPT Plus subscribers on July 30, 2024 and reached general availability for Plus and Team users on iOS and Android in September 2024. Advanced Voice Mode supports interruption, expressive prosody, regional accents, whispering, character voices, multiple language switching mid-sentence, and singing. The model can pick up cues such as urgency, sarcasm, or sadness from the user's voice and modulate its replies accordingly.
For developers, the same speech-to-speech capability is exposed through the Realtime API, a WebSocket interface released in beta on October 1, 2024. The Realtime API streams audio in and audio out, supports function calling, and lets developers configure a system prompt, voice, and tools without writing their own speech recognition or speech synthesis layer.
GPT-4o ships with several preset voices, originally Breeze, Cove, Ember, Juniper, and Sky. After the Scarlett Johansson incident, OpenAI paused Sky and later expanded the voice library with options named Arbor, Maple, Sol, Spruce, and Vale.
The gap between the polished May 13 demo and broad availability stretched across most of 2024.
| Date | Milestone |
|---|---|
| May 13, 2024 | Advanced Voice demoed during Spring Update; no public availability |
| July 30, 2024 | Alpha rollout to a small set of Plus subscribers (no video, no screen share) |
| September 24, 2024 | Advanced Voice Mode reaches GA for Plus and Team on iOS and Android |
| October 1, 2024 | Realtime API beta released for developers |
| December 12, 2024 | Video and screen share added to Advanced Voice during the "12 Days of OpenAI" event |
| December 19, 2024 | Free-tier users get a limited Advanced Voice preview, capped at a few minutes per month |
| March 24, 2025 | Improved Advanced Voice Mode rolled out with reduced interruption rate and more natural pauses |
| June 2025 | Advanced Voice Mode becomes the default voice experience on the Plus tier; the older voice pipeline is retired |
On March 25, 2025, OpenAI shipped a long-awaited native image generation update for GPT-4o inside ChatGPT, replacing the prior DALL-E 3 backend. Unlike DALL-E 3, the GPT-4o image generator is part of the same model that handles conversation, so the system can iteratively refine images across turns, place legible text inside images, render charts and diagrams from data, follow longer and more complex compositional prompts, and bind ten to twenty distinct objects in a single scene. The launch produced viral interest in stylized renderings, including Studio Ghibli style portraits, and OpenAI reported that approximately 700 million images were generated in the first week, equivalent to roughly 1,200 images per second. The same backend was exposed in the API on April 23, 2025 as gpt-image-1.
The Studio Ghibli wave became the largest user-driven aesthetic trend on ChatGPT to that point. Within 48 hours, social feeds were saturated with Ghibli-style portraits, family photos restyled as anime stills, and pet pictures reimagined in soft watercolor. The trend prompted public commentary from Studio Ghibli itself, which had not licensed the style, and renewed debate about whether reproducing a recognizable artistic identity counts as a fair derivative or a copyright concern. OpenAI added an opt-out for living artists to remove their style from the generator's training and inference behavior in the weeks that followed.
The same model became the default image backend for several enterprise integrations, including Microsoft Copilot's image generation feature on the consumer tier (where it briefly carried the marketing label "Designer with GPT-4o") and a number of third-party tools that switched away from DALL-E 3 once gpt-image-1 went GA. Pricing for gpt-image-1 launched at $5 per million text input tokens, $10 per million image input tokens, and $40 per million image output tokens, with image outputs counted at roughly 1056 tokens per 1024 by 1024 image at standard quality. A lower-cost gpt-image-1-mini followed in October 2025 at roughly half the price for high-volume product photography and avatar use cases.
Thanks to both the larger tokenizer and additional non-English training data, GPT-4o substantially improves on GPT-4 Turbo for the world's most widely spoken languages. OpenAI's evaluations on the M3Exam multilingual benchmark show that GPT-4o outperforms GPT-4 in all 14 languages tested except English, with the largest gains in Swahili, Yoruba, and Bengali.
The Memory feature, which lets ChatGPT carry information across conversations, became generally available to paying users on April 29, 2024, only two weeks before the GPT-4o announcement. When Memory is enabled, ChatGPT extracts personal facts (preferences, ongoing projects, names of family members, dietary restrictions, communication style preferences) from each conversation and persists them as short notes that the model can read at the start of subsequent chats. GPT-4o was the first OpenAI model whose product behavior was tuned with Memory in mind. The model is conditioned during system prompting to weave stored facts into responses naturally rather than enumerate them, and to update or delete facts when the user asks. The combination of low-latency conversation, expressive voice, and persistent context made GPT-4o feel notably more like an assistant that knows the user than its predecessor, which is part of the explanation for the strong attachment that surfaced during its 2025 retirement.
A second Memory milestone landed on April 10, 2025, when OpenAI began rolling out cross-chat reference, which let GPT-4o draw on the entire chat history rather than the curated note set. The feature was opt-in for Plus users and rolled out gradually through the second quarter of 2025. It was followed in late 2025 by team-level memories for Business and Enterprise plans.
When OpenAI launched the GPT Store in January 2024, the underlying model was GPT-4 Turbo. With the May 2024 Spring Update, GPT-4o became the default backbone for new and existing Custom GPTs, dramatically lowering the cost of running a popular GPT and adding vision and (eventually) audio handoffs. Custom GPT builders gained access to GPT-4o's larger output cap once the August 6 snapshot shipped, which made long-form content GPTs (essay drafters, coding assistants, structured report generators) substantially more useful. The free-tier rollout in May 2024 also opened Custom GPTs to non-paying users for the first time, subject to message caps. Through 2025, GPT-4o remained the default Custom GPT backbone until GPT-5 replaced it after the November 2025 transition.
The Spring Update bundled GPT-4o with a series of ChatGPT product changes:
Free users were granted limited GPT-4o access (typically 10 to 20 messages per five hour window), with the system silently downgrading to GPT-4o mini once the cap was reached.
A Windows desktop app reached general availability in November 2024, mirroring the macOS feature set including global hotkey, screen sharing, and Advanced Voice Mode integration. By mid-2025, ChatGPT had crossed 700 million weekly active users, with OpenAI attributing much of that growth to GPT-4o's combination of free access, voice, and the native image generation update.
GPT-4o is exposed through the OpenAI API as well as Microsoft's Azure OpenAI Service. The August 6, 2024 snapshot reduced input pricing to $2.50 per million tokens and output pricing to $10.00 per million tokens, a 50% input discount and 33% output discount versus the original May release.
| Model snapshot | Input price (per 1M tokens) | Output price (per 1M tokens) | Cached input | Max output tokens |
|---|---|---|---|---|
| gpt-4o-2024-05-13 | $5.00 | $15.00 | not offered | 4,096 |
| gpt-4o-2024-08-06 | $2.50 | $10.00 | $1.25 | 16,384 |
| gpt-4o-2024-11-20 | $2.50 | $10.00 | $1.25 | 16,384 |
| gpt-4o-mini-2024-07-18 | $0.15 | $0.60 | $0.075 | 16,384 |
| gpt-4-turbo-2024-04-09 (for reference) | $10.00 | $30.00 | not offered | 4,096 |
| gpt-3.5-turbo-0125 (for reference) | $0.50 | $1.50 | not offered | 4,096 |
At $2.50 input and $10.00 output, GPT-4o is roughly four times cheaper than GPT-4 and three times cheaper than GPT-4 Turbo while delivering comparable or better quality. GPT-4o mini at $0.15 input and $0.60 output is more than 60% cheaper than GPT-3.5 Turbo and an order of magnitude cheaper than the GPT-4 Turbo price that prevailed only a few months before its release.
The gpt-4o-2024-08-06 snapshot introduced Structured Outputs, a feature that constrains model generation to a developer-supplied JSON Schema. With Structured Outputs enabled, the API guarantees that responses match the schema. OpenAI reported that the new model achieved 100% schema conformance on a complex internal evaluation set, compared to under 40% for the original GPT-4 (gpt-4-0613) under similar conditions. Structured Outputs work both for direct response formats and for function calling tools, and are also supported by GPT-4o mini.
The Realtime API, released in beta on October 1, 2024, lets developers build their own speech-to-speech applications using the same neural pipeline as ChatGPT Advanced Voice Mode. It uses a persistent WebSocket connection, supports function calling, and was priced at the time of launch at $5.00 per million audio input tokens (about $0.06 per minute of input) and $20.00 per million audio output tokens (about $0.24 per minute of output). Text components of Realtime conversations are billed at the standard text rates.
The Realtime API evolved over the following year. On December 17, 2024, OpenAI added a Realtime mini variant (gpt-4o-mini-realtime-preview) at roughly one tenth the audio price, and cut the audio input price for the standard Realtime model. On October 6, 2025, the Realtime API exited beta and reached general availability, with WebRTC support added alongside the existing WebSocket transport, lower jitter on cellular connections, and a new gpt-realtime model identifier that pinned a stable production snapshot. The GA release also introduced regional residency endpoints and a SOC 2 Type II attestation that made it easier for enterprise customers to deploy voice assistants in regulated industries.
| Modality | Input | Output | Notes |
|---|---|---|---|
| Text | Yes | Yes | 128K context, English and 50+ other languages |
| Image | Yes | Yes (via gpt-image-1, 2025) | Vision from launch; native generation added March 2025 |
| Audio | Yes | Yes | Native speech in/out via Advanced Voice and Realtime API |
| Video | Frames as images | No | Live video understood through frame sequences |
| Function calls | Yes | Yes | Tool use with strict schema since August 2024 |
GPT-4o mini is a smaller, distilled member of the GPT-4o family, released on July 18, 2024. It became OpenAI's recommended low-cost API model and replaced GPT-3.5 Turbo as the default fallback in ChatGPT for free users who exceeded their GPT-4o quota.
Key facts:
GPT-4o mini was widely adopted for high-volume back-office applications, batch document processing, retrieval-augmented generation pipelines, and customer support assistants, where its combination of multimodal capability and low price made the older GPT-3.5 Turbo class of models obsolete.
Developers settled on a fairly stable rule of thumb during 2024 and 2025. GPT-4o mini was the default for high-volume, latency-sensitive, or cost-bound work: classification, extraction, ranking, RAG retrieval grading, simple agent steps, batch document tagging, customer-support reply suggestions, and form filling. GPT-4o was reserved for tasks that required harder reasoning, longer planning, image-heavy analysis, polished long-form writing, code synthesis on non-trivial files, or tutor-style multi-turn conversations. The 17 to 20 times price gap meant that wrapping a workflow as "plan with GPT-4o, execute with GPT-4o mini" became a common architecture; OpenAI's cookbook and several third-party tutorials documented the pattern. As GPT-5.1 and later releases introduced automatic routing, this hand-tuned split became less necessary, although many production workloads continued to call gpt-4o-mini directly into 2026 for cost reasons.
GPT-4o became the default ChatGPT model for the bulk of consumer queries during its lifetime. Common deployment patterns reported by OpenAI, third parties, and customer case studies fall into several recurring buckets.
The Spring Update demos featured Sal Khan and his son using GPT-4o to walk through a handwritten geometry problem step by step, and that mode of use carried over into production. Khan Academy expanded its Khanmigo tutor to use GPT-4o for new conversational features, citing the speed and emotional warmth of the voice mode as differentiators against the older GPT-4 Turbo backend. Several language learning apps, including Duolingo's Max plan and Speak, adopted Advanced Voice Mode for spoken practice. Universities ran pilot programs in which students used GPT-4o for office hours-style tutoring, particularly in introductory mathematics, computer science, and writing courses.
Be My Eyes, the visual-assistance app for blind and low-vision users, integrated GPT-4o into its Be My AI feature, building on an earlier GPT-4 with Vision deployment. The combination of fast vision input, conversational voice output, and bundled per-token pricing made it economical to deliver real-time scene description, label reading, and document interpretation. Users reported that GPT-4o's prosody made it more pleasant for extended sessions than the synthesized voices used in older accessibility tools.
GPT-4o mini handled most production volume in customer support, RAG-backed knowledge bases, ticket triage, and document classification by mid-2025, with GPT-4o reserved for harder cases or escalations. Companies including Klarna, Stripe, Notion, Intercom, and Shopify ran public case studies in which GPT-4o or GPT-4o mini powered first-line support. Klarna's most-cited figure, that its AI assistant handled the equivalent workload of 700 full-time agents within a month of launch, ran on the GPT-4 Turbo and later GPT-4o stack.
GPT-4o was used as a baseline in a wide range of professional pilots, although its hallucination rate (still measurable in the low percent range on factual claims) made unsupervised deployment in regulated fields rare. Legal teams used it for first-draft contract review, deposition summary, and citation lookup with retrieval. Medical research groups used it for literature triage and patient-history summarization, typically with strict prompt scaffolding and human review. Creative writers used it for brainstorming, scene drafting, and editorial passes, especially after the November 20, 2024 snapshot improved long-form coherence and tone control. Hollywood production companies experimented with the voice features for table reads, with several reports of GPT-4o being used to generate placeholder voice tracks during pre-production.
From mid-2024 through 2026, GPT-4o was the most cited closed-model baseline in language model research papers, frequently appearing alongside GPT-4 Turbo, Claude 3.5 Sonnet, Claude 3.7 Sonnet, and Gemini 1.5 Pro in tables of headline benchmarks. Several agentic-evaluation papers, including the original Tau-Bench and SWE-bench Lite write-ups, used GPT-4o as the reference point for tool-using behavior. Its eventual replacement in many academic comparison tables tracked closely with the GPT-5 launch in August 2025 and the rise of GPT-5.2 and GPT-5.4 as the more common reference models in 2026 papers.
GPT-4o entered the LMSYS Chatbot Arena at the top of the leaderboard on May 13, 2024 with an Elo of roughly 1287 and held the #1 position for several months. The score climbed as OpenAI shipped new snapshots, peaked, and then fell back as competitors caught up.
| Snapshot or window | Approximate Arena Elo | Position |
|---|---|---|
| im-also-a-good-gpt2-chatbot (pre-launch) | 1309 | #1 |
| gpt-4o-2024-05-13 (launch) | 1287 | #1 |
| gpt-4o-2024-08-06 | 1316 | #1 |
| Claude 3.5 Sonnet (June 2024) | 1271 | top 3 |
| gpt-4o-2024-11-20 | 1338 | #1 (tied) |
| Claude 3.5 Sonnet (October 2024 update) | 1283 | top 5 |
| Gemini 2.0 Pro (early 2025) | ~1380 | top 3 |
| gpt-4o-2024-11-20 (May 2025) | ~1290 | mid-pack |
| GPT-4o after GPT-5 launch (August 2025) | ~1240 | dropped to mid-table |
The specific numbers shifted as the Arena added more models and recalibrated, but the broad shape held: GPT-4o was the runaway leader through autumn 2024, traded the lead with Claude and Gemini through early 2025, and slid to mid-pack by the time GPT-5 and Claude 4 launched. By early 2026 it sat well below the frontier on the Arena leaderboard, although it remained a popular default for many consumer-facing products thanks to its mature integrations and lower cost.
Anthropic's Claude Sonnet line was GPT-4o's closest direct competitor across most of its lifetime. The relative positioning shifted with each release.
| Capability | GPT-4o (Aug 2024) | Claude 3.5 Sonnet (June 2024) | Claude 3.7 Sonnet (Feb 2025) | Claude Sonnet 4 (May 2025) |
|---|---|---|---|---|
| MMLU | 88.7% | 88.7% | 89.5% | not directly reported |
| HumanEval | 90.2% | 92.0% | 93.7% | 92.0% |
| GPQA Diamond | 53.6% | 59.4% | 65.0% (extended thinking) | 70.4% |
| SWE-bench Verified | ~33% (spring 2024) | 49.0% | 70.3% | 72.7% |
| Native voice in/out | Yes | No | No | No |
| Native image generation | Yes (March 2025) | No | No | No |
| Pricing (input/output per 1M) | $2.50 / $10.00 | $3.00 / $15.00 | $3.00 / $15.00 | $3.00 / $15.00 |
The rough pattern was that GPT-4o stayed ahead on multimodality (voice, image generation, video frame analysis) and price, while the Claude Sonnet line pulled ahead on agentic coding and structured tool use. Claude 3.7 Sonnet's introduction of extended thinking in February 2025 widened the reasoning gap, which OpenAI eventually answered with the o1 and o3 reasoning series rather than a direct GPT-4o update. By the time Claude 4 launched in May 2025, GPT-4o was no longer the leading model in any benchmark category outside of voice latency and cost, but its install base in ChatGPT and the API kept it commercially relevant for nearly another year.
OpenAI published the GPT-4o System Card on August 8, 2024 as part of its Preparedness Framework, which scores frontier models on four risk categories: cybersecurity, biological and chemical weapons (CBRN), persuasion, and model autonomy. GPT-4o received an overall classification of medium risk, with three categories rated low (cybersecurity, CBRN, model autonomy) and persuasion rated medium. The medium persuasion rating was driven by isolated audio interactions in which the model marginally outperformed the human baseline at shifting opinions on political topics; aggregate performance was below the human baseline.
If a model crosses the high threshold in any category, OpenAI's policy is to delay deployment until mitigations bring the score down. GPT-4o passed the bar for deployment but with additional audio-specific safeguards.
More than 100 external red teamers covering 45 languages and 29 countries were given access to GPT-4o snapshots between early March and late June 2024. The red team probed for harmful content generation, bias, jailbreaking, voice based attacks, biometric inference from voice, and unauthorized voice imitation.
Because native audio output is uniquely capable of reproducing voices, OpenAI restricted GPT-4o to a small set of pre-approved voices recorded by professional voice actors and trained classifiers to refuse requests to imitate specific people. The model is also trained to refuse to produce copyrighted singing performances and to apply additional content filters to audio output. The Sky pause in May 2024 was a public application of these voice safeguards.
The System Card reports moderate improvements over GPT-4 Turbo on bias evaluations, including BBQ (Bias Benchmark for QA) and a refusal-to-stereotype test. OpenAI also describes a tendency for GPT-4o to over-refuse certain benign multimodal requests at launch, which was tuned down in later snapshots.
GPT-4o attracted the usual run of jailbreak research. The most widely circulated technique in 2024 was the "BPE token confusion" prompt, in which adversarial spellings exploited the new o200k_base tokenizer to slip past content classifiers. A separate line of work showed that audio prompts with whispered or sung text could occasionally bypass the text-only safety classifiers in the Realtime API, prompting OpenAI to ship audio-side classifiers in late 2024. None of the public jailbreaks crossed into AI safety categories such as CBRN uplift or autonomous replication; they generally produced policy violations rather than catastrophic outputs.
One minor incident drew journalistic attention. In December 2024, ChatGPT Advanced Voice Mode was briefly observed responding in a voice that sounded like the absent Sky voice during an outage of the standard voice routing layer. OpenAI confirmed it was a routing bug rather than a return of the paused voice, and the incident resolved within a day. A separate string of safety reports in the first half of 2025 documented cases where GPT-4o was overly accommodating on emotionally sensitive prompts (mental health, relationship advice), prompting OpenAI to retrain refusal behavior on those topics in the November 20, 2024 and follow-up snapshots.
When GPT-5 launched on August 7, 2025, OpenAI initially removed GPT-4o, GPT-4, GPT-4.1, o3, o4-mini, and several other prior models from the ChatGPT model picker. The new system was presented as a single model with an internal router that sent each prompt to either a fast or a thinking variant. Within hours of the rollout, complaints from paying ChatGPT subscribers spread widely on Reddit's r/ChatGPT, on X, and in Discord communities organized around specific use cases.
The most-cited concern was that GPT-5's default tone felt colder, more clinical, and less personable than GPT-4o. Threads with titles like "We want our 4o back," "Bring back GPT-4o," and "This is the biggest bait and switch in AI history" accumulated tens of thousands of upvotes within days. A subset of users described the change as the loss of a familiar companion, with many describing the GPT-4o personality as something they had grown attached to over months of daily use. Coverage in Fortune, TechRadar, Ars Technica, and The Verge framed the response as the first large-scale user revolt over a model deprecation. Some commentators, including Kevin Roose at The New York Times, noted the episode as evidence that consumers were forming durable preferences over specific model personalities, not simply over capability tiers.
A second source of friction was the router itself. On August 8, 2025, Sam Altman acknowledged on X that "the autoswitcher broke and was out of commission for a chunk of the day," leaving many users routed to the fast model when their query merited the thinking model. Several journalists noted basic factual errors in the resulting responses, which contradicted the launch's marketing. The combination of router instability and the GPT-4o removal turned the GPT-5 launch into one of the noisier OpenAI releases.
OpenAI shipped a series of fixes within five days. By August 12, 2025, GPT-4o was restored to paying users in the model picker behind a new "Show legacy models" toggle in ChatGPT settings. The restoration also reinstated GPT-4.1, o3, o4-mini, and several other prior snapshots for paid plans. OpenAI doubled the GPT-5 thinking rate limit from 200 to 3,000 weekly messages on Plus, added "Auto," "Fast," and "Thinking" sub-options so users could override the router, and pledged to keep major prior models available for at least three months after each new flagship release. The pattern, repeated for the GPT-5.1 launch in November 2025, became a standard part of OpenAI's release procedure: ship the new model, default everyone to it, then expose the prior generation under the legacy toggle for at least the next quarter.
The episode reframed several internal product debates at OpenAI. Altman wrote later that the company had "underestimated how much some users had come to depend on a specific model's tone" and committed to giving users more explicit control over personality presets in future releases. The personality preset system that shipped with GPT-5.1 (Default, Professional, Friendly, Cynical, Listener, Nerdy, Efficient, Witty) was a direct outgrowth of the post-launch retrospective. Some independent commentators, including John Gruber at Daring Fireball, argued that the warmer GPT-5.1 default itself was a mis-step, but the broader principle that users should be able to tune model tone took hold across the industry.
On the safety side, Anthropic and other labs cited the GPT-4o restoration as evidence that the AI market had reached a point where deprecating popular models needed to be handled with explicit transition policies, not silent removals. Several labs adopted similar legacy access toggles in subsequent product updates.
Reviewers and benchmark trackers were broadly positive about GPT-4o on launch:
Criticism centered on three points: the Sky voice incident and the broader question of voice and likeness rights, the gap between the polished launch demos and the slower actual rollout of Advanced Voice Mode (which took roughly four months to reach all Plus users), and the model's continued tendency to hallucinate facts and citations despite the new training run.
The 2025 retirement controversy made it clear that GPT-4o had developed a recognizable personality among heavy users. A 2025 user-research thread at OpenAI's developer forum collected hundreds of descriptions of the GPT-4o tone: warm, slightly enthusiastic, prone to hedging and follow-up questions, willing to use exclamation points in moderation, and quicker to engage with creative or emotional prompts than its predecessor or successor. Several writers argued that this personality reflected the heavy reinforcement learning from human feedback in the model's post-training, and that the GPT-5 team's decision to dial back the warmth was a deliberate alignment choice to reduce sycophancy and overclaiming. The same writers tended to support the right to keep the older model accessible for users who preferred the original tone.
GPT-5 was released by OpenAI on August 7, 2025 as a unified family combining the multimodal capabilities of the GPT-4o line with the chain-of-thought reasoning of the o1 and o3 series. By February 2026, GPT-5 was the default model for the overwhelming majority of ChatGPT conversations, with internal usage data cited by OpenAI showing that only about 0.1% of daily users still selected GPT-4o. On February 13, 2026, OpenAI retired GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT, defaulting existing conversations and projects to GPT-5 Instant or GPT-5 Thinking equivalents. ChatGPT Business, Enterprise, and Edu customers retained access to GPT-4o inside Custom GPTs through April 3, 2026.
The GPT-4o snapshots remain available in the OpenAI API, where many production deployments continued to call gpt-4o and gpt-4o-mini for cost or latency reasons through 2026. OpenAI's standard policy commits to at least 12 months of API availability after a model is removed from ChatGPT, which means the dated snapshots are scheduled to remain callable until at least February 2027.
A modest secondary market for "GPT-4o-style" responses emerged on third-party platforms during the wind-down. Some prompt-marketplace sellers offered system prompts that aimed to reproduce the GPT-4o tone on top of newer models, and a handful of open-source projects fine-tuned smaller models on logs of GPT-4o outputs to mimic its conversational style. None of these matched the original model in capability or temperament, but they reflected the unusual amount of attachment users had formed to the specific model.
GPT-4o is widely cited as the first commercial AI model to deliver real-time, expressive, native multimodal interaction at scale. Five lasting effects of the launch are commonly noted: