# GPT-4o

> Source: https://aiwiki.ai/wiki/gpt_4o
> Updated: 2026-06-20
> Categories: AI Models, OpenAI
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**GPT-4o** (Generative Pre-trained Transformer 4 Omni) is a natively multimodal [large language model](/wiki/large_language_model) developed by [OpenAI](/wiki/openai) and announced on May 13, 2024 at the company's "Spring Update" livestream. The lowercase "o" stands for *omni*, signalling that the model accepts and produces any combination of text, audio, and image tokens within a single end-to-end neural network rather than the chained pipeline of separate speech-to-text, language, and text-to-speech models that powered earlier versions of [ChatGPT](/wiki/chatgpt) Voice Mode.[^1] OpenAI's announcement defines the model in a single sentence: "GPT-4o ('o' for 'omni') is a step towards much more natural human-computer interaction. It accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs." [^1] GPT-4o was designed as the successor to GPT-4 Turbo, matching that model's text and code performance while running roughly twice as fast, at half the API cost, and with five times higher rate limits, up to 10 million tokens per minute. [^1][32]

The most widely publicized aspect of GPT-4o is its real-time conversational ability. The model can respond to audio inputs in as little as 232 milliseconds, with an average latency of about 320 milliseconds, which OpenAI says "is similar to human response time in a conversation." [^1] That figure is calibrated to human dialogue: a cross-linguistic study of ten languages found that the modal gap between conversational turns is roughly 200 milliseconds, so GPT-4o's 232 millisecond floor is close to the rhythm people expect from a live exchange. [33] Earlier ChatGPT voice experiences chained a speech-to-text model (typically [Whisper](/wiki/whisper)), a text-only [GPT-4](/wiki/gpt-4) inference, and a separate text-to-speech model in series, producing typical end-to-end latencies of two to five seconds and stripping away tone, multiple-speaker information, background noise, laughter, and song. Because GPT-4o handles waveforms and text within the same network, it preserves prosody and can output expressive speech, including singing and laughter.[^1]

OpenAI used the launch to make GPT-4-class capability free for the first time. GPT-4o became the default model for paying ChatGPT Plus, Team, and Enterprise users, and it was rolled out to ChatGPT Free with usage caps the same day.[^3] A smaller and cheaper variant, [**GPT-4o mini**](/wiki/gpt_4o_mini), was released on July 18, 2024 and replaced GPT-3.5 Turbo as the recommended low-cost API model.[^5] A native image generation update for GPT-4o was launched inside ChatGPT on March 25, 2025 and exposed in the API as the **gpt-image-1** endpoint on April 23, 2025.[^21][^22] GPT-4o was eventually retired from the ChatGPT product on February 13, 2026 in favor of the [GPT-5](/wiki/gpt-5) family, while remaining available in the OpenAI API.[^7]

The model occupies an unusual place in the GPT lineage. It was OpenAI's last "GPT-N" flagship before the company pivoted to the dedicated reasoning track of the [o1](/wiki/o1), [o3](/wiki/o3), and [o4-mini](/wiki/o4_mini) series, and it was the model that defined the conversational tone of ChatGPT for nearly two years across hundreds of millions of weekly users. When OpenAI tried to remove it during the GPT-5 launch in August 2025, user pushback was strong enough that the company restored access within five days under a "Show legacy models" toggle, an episode that became a touchstone for debates about model attachment and AI personality.[^24][^25]

## Background

By the spring of 2024, OpenAI's commercial flagship was **GPT-4 Turbo**, a faster and cheaper variant of GPT-4 introduced at the November 2023 DevDay. GPT-4 Turbo with Vision had added image input in late 2023, but voice in ChatGPT continued to rely on a cascade of three models: Whisper transcribed user audio to text; a text-only GPT-4 generated a reply; and a separate text-to-speech model converted the reply back to audio. The cascade had three structural drawbacks. First, the intermediate text step discarded prosody, multiple-speaker information, background sound, and emotional inflection. Second, each link added latency, producing end-to-end voice turnaround times that were too slow for natural conversation. Third, the cascade made it difficult for the language model to *produce* expressive speech because its output was a sequence of plain text tokens that the TTS layer had to interpret.[^1]

In parallel, the competitive landscape had tightened. Google's Gemini 1.5 Pro, Anthropic's Claude 3 Opus (March 2024), and Meta's Llama 3 (April 2024) all challenged GPT-4 on text and code benchmarks. Google was openly developing **Project Astra**, a multimodal assistant prototype expected to debut at the May 14, 2024 I/O developer conference. OpenAI's leadership decided to schedule a flagship reveal of its own one day before I/O, a piece of strategic positioning widely noted by reporters at the time.[^17]

The internal program that became GPT-4o pursued a single objective: build one neural network that ingests and emits text, image, and audio tokens directly, replacing the three-model voice cascade and improving multilingual coverage at the same time. OpenAI has not published the architecture diagram, parameter count, training data composition, or hardware footprint, citing the same competitive and safety considerations it cited for GPT-4.[^1]

## Release

### gpt2-chatbot leak on the Chatbot Arena

In the months leading up to GPT-4o, OpenAI tested release candidates anonymously on the public LMSYS Chatbot Arena. On April 27, 2024, an unbranded model called **gpt2-chatbot** appeared on the Arena and immediately matched or exceeded the strongest available systems, including GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro.[^16] The model was withdrawn after a few days, then reappeared on May 6 and 7, 2024 under two new names: **im-a-good-gpt2-chatbot** and **im-also-a-good-gpt2-chatbot**. [Sam Altman](/wiki/sam_altman) cryptically tweeted "i do have a soft spot for gpt2" and later "im-a-good-gpt2-chatbot," and the day before the Spring Update he posted only the word "her," widely read as a nod to the 2013 Spike Jonze film *Her*. After the May 13 announcement, OpenAI confirmed that the masked Arena models had been GPT-4o checkpoints. The episode also drew attention to the [Chatbot Arena](/wiki/lmsys_chatbot_arena) as a forum where labs blind-test new releases before public launch.

### Spring Update keynote

The Spring Update was a roughly 26-minute live stream held at OpenAI's San Francisco headquarters on Monday, May 13, 2024, the day before Google's annual I/O developer conference. Then chief technology officer Mira Murati opened the presentation, framed three priorities for the launch (a new desktop app, a refreshed user interface, and the new flagship model), and then handed off to research leads **Mark Chen** and **Barret Zoph** for live demonstrations.[^1] The team showed real-time spoken conversation with interruption handling, on-the-fly emotional expression, real-time language translation between Italian and English, vision-based math tutoring on a handwritten linear equation, code review of a Python script via screen sharing, and the model singing a bedtime story to a stuffed animal. The presentation deliberately steered around the staged feel of typical product demos: the presenters interrupted the model mid-sentence, asked it to be more dramatic, and let it ad-lib.

Google's I/O keynote the following day featured Project Astra, a multimodal assistant with similar ambitions, and the timing was widely interpreted as a strategic positioning move by OpenAI.[^17] The keynote landed roughly six months after the November 2023 [Bletchley Declaration](/wiki/bletchley_declaration), the first multilateral AI safety summit, and OpenAI was among the labs that had voluntarily committed to pre-deployment safety testing of frontier models. The Spring Update emphasised that GPT-4o had been reviewed under that framework before public release.[^4]

### Immediate availability

GPT-4o was available to API customers as `gpt-4o` and the dated alias `gpt-4o-2024-05-13` from the moment of the keynote, with text and vision input enabled at launch.[^1] In ChatGPT, the model became the default for paying Plus, Team, and Enterprise users on May 13 and began a phased rollout to ChatGPT Free over the following weeks. Free-tier users gained the ability to use GPT-4o, Custom GPTs, file uploads, browsing, and advanced data analysis for the first time, subject to message caps that silently downgraded to the smaller GPT-3.5 Turbo backend once exhausted. The free-tier rollout was widely cited as the first time a frontier-class model was directly available to non-paying users at scale.[^3]

## Architecture

### A single network, end-to-end

The GPT-4o System Card classifies the model as "an autoregressive omni model, which accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs," and adds that "it's trained end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network." [^4] Audio waveforms and image pixels are encoded directly into tokens that share the same latent space as text tokens, so the model's attention can correlate a word in a prompt with a particular pixel region in an image, a particular speaker in a recording, or a particular musical note in audio. OpenAI has not released the precise tokenisation scheme for audio, but the practical effect is that the model can emit audio tokens whose decoded waveform carries laughter, prosody, accents, and singing, properties that are difficult to convey through plain text passed to a downstream TTS layer.[^1]

### How does GPT-4o differ from the old voice cascade?

The cascaded predecessor pipeline used three separate models in series: an automatic speech recognition model (typically Whisper) transcribed audio to text; a text-only GPT-4 model produced a text reply; and a separate text-to-speech model converted the reply back to audio. The middle text-only stage discarded tone, multiple-speaker information, background sound, and emotional inflection, and the chain accumulated latency at each step. GPT-4o eliminates this loss of information by carrying audio and vision tokens directly into and out of the same model.[^1]

### Tokenizer: `o200k_base`

GPT-4o ships with a new tokenizer, **`o200k_base`**, which doubles the vocabulary from the `cl100k_base` tokenizer used by GPT-4 and GPT-3.5 to roughly 200,000 BPE tokens.[^15] The expanded vocabulary is heavily targeted at non-English text, where earlier OpenAI tokenizers used many more tokens per character. According to OpenAI's published comparisons, common Hindi sentences require about 2.9 times fewer tokens, common Arabic sentences about 2 times fewer, common Chinese sentences about 1.4 times fewer, and common Korean sentences about 1.7 times fewer.[^1] A widely cited Indic-language analysis showed Malayalam tokens reduced by nearly 4 times and Telugu by about 3.5 times.[^15]

#### Why does GPT-4o use fewer tokens for non-English languages?

Fewer tokens per character of foreign-language text means lower latency, lower API costs, and longer effective context windows for those languages, since context length is measured in tokens rather than characters.

### Context window and outputs

GPT-4o offers a 128,000-token context window with a knowledge cutoff date of October 2023.[^1][^13] The initial `gpt-4o-2024-05-13` snapshot capped output at 4,096 tokens; the August 6, 2024 update raised the maximum output to 16,384 tokens.[^6]

## Modalities

The model is bidirectional across three modalities (text, image, audio) and accepts video as a sequence of image frames. The following matrix summarises supported input and output combinations in the API and ChatGPT, as documented in the GPT-4o System Card and OpenAI release notes.[^1][^4]

| Modality | Input | Output | Notes |
|---|---|---|---|
| Text | Yes | Yes | 128K context; 50+ languages; o200k_base tokenizer |
| Image | Yes | Yes (via gpt-image-1 from March 2025) | Vision from launch; native generation added March 25, 2025 |
| Audio | Yes | Yes | Native speech in/out via Advanced Voice and Realtime API |
| Video | Frames as images | No | Live video understood through frame sequences |
| Function calls | Yes | Yes | Tool use with strict schema since August 6, 2024 |

Text input was supported in the API from launch on May 13, 2024. Image input was supported in the API from launch. Audio I/O was supported in **ChatGPT** from launch (via [Advanced Voice Mode](/wiki/advanced_voice_mode), then rolling out) but reached the API only in October 2024 with the Realtime API and the `gpt-4o-audio-preview` Chat Completions endpoint.[^19][^29] Native image *output* arrived in March 2025 inside ChatGPT and in April 2025 in the API as `gpt-image-1`.[^21][^22]

## Advanced Voice Mode

Native audio output is delivered to ChatGPT users through **Advanced Voice Mode**. Although the May 13, 2024 keynote included extensive voice demos, the public rollout took more than four months.

| Date | Milestone |
|---|---|
| May 13, 2024 | Advanced Voice demoed during Spring Update; no public availability |
| July 30, 2024 | Alpha rollout to a small set of ChatGPT Plus subscribers (no video, no screen share) |
| September 24, 2024 | Advanced Voice Mode reaches general availability for Plus and Team users on iOS and Android in the U.S.; five new voices (Arbor, Maple, Sol, Spruce, Vale) added; revamped blue animated sphere UI[^29] |
| October 1, 2024 | Realtime API beta released for developers[^19] |
| December 12, 2024 | Video and screen share added to Advanced Voice during the "12 Days of OpenAI" event |
| December 19, 2024 | Free-tier users get a limited Advanced Voice preview, capped at a few minutes per month |
| March 24, 2025 | Improved Advanced Voice Mode rolled out with reduced interruption rate and more natural pauses |
| June 2025 | Advanced Voice Mode becomes the default voice experience on the Plus tier; the older voice pipeline is retired |

Advanced Voice Mode supports interruption, expressive prosody, regional accents, whispering, character voices, multiple-language switching mid-sentence, and singing. The model can pick up cues such as urgency, sarcasm, or sadness from the user's voice and modulate its replies accordingly.[^1]

The gap between the polished May 13 demo and broad availability was unusually wide for an OpenAI release, and reviewers cited it as a credibility concern in the months that followed. The September 24 launch in the U.S. expanded to most major markets by the end of October, although Advanced Voice was not initially available in the European Union, the United Kingdom, Switzerland, Iceland, Norway, or Liechtenstein owing to regulatory review.[^29]

GPT-4o ships with several preset voices, originally Breeze, Cove, Ember, Juniper, and Sky. After the Scarlett Johansson incident, OpenAI paused Sky and later expanded the voice library with options named Arbor, Maple, Sol, Spruce, and Vale.[^29] Voice selection is controlled by the user inside ChatGPT and is exposed as a `voice` parameter in the Realtime API.

## gpt-4o-mini

**GPT-4o mini** is a smaller, distilled member of the GPT-4o family, released on July 18, 2024.[^5] It became OpenAI's recommended low-cost API model and replaced GPT-3.5 Turbo as the default fallback in ChatGPT for free users who exceeded their GPT-4o quota.

Key facts:

- Released July 18, 2024 as `gpt-4o-mini-2024-07-18`.[^5]
- Priced at $0.15 per million input tokens and $0.60 per million output tokens, roughly 60% cheaper than GPT-3.5 Turbo and more than 25 times cheaper than the original GPT-4 Turbo on input.[^5]
- 128,000-token context, 16,384 max output, October 2023 knowledge cutoff.[^5]
- Supports text and vision input from launch; audio input and output were added later through the Realtime API and `gpt-4o-mini-audio-preview` / `gpt-4o-mini-realtime-preview` Chat Completions endpoints.
- Inherits the `o200k_base` tokenizer and Structured Outputs from GPT-4o.
- Reports 82.0% on MMLU, 87.0% on HumanEval, and 70.2% on MATH, well above GPT-3.5 Turbo (70.0%, 48.1%, 34.1%) and competitive with Anthropic's Claude 3 Haiku and Google's Gemini 1.5 Flash in the same price tier.[^5]

GPT-4o mini was widely adopted for high-volume back-office applications, batch document processing, retrieval-augmented generation pipelines, and customer support assistants, where its combination of multimodal capability and low price made the older GPT-3.5 Turbo class of models obsolete. By late 2024 it was OpenAI's most-used model by token volume.

### When to use GPT-4o vs GPT-4o mini

Developers settled on a fairly stable rule of thumb during 2024 and 2025. GPT-4o mini was the default for high-volume, latency-sensitive, or cost-bound work: classification, extraction, ranking, RAG retrieval grading, simple agent steps, batch document tagging, customer-support reply suggestions, and form filling. GPT-4o was reserved for tasks that required harder reasoning, longer planning, image-heavy analysis, polished long-form writing, code synthesis on non-trivial files, or tutor-style multi-turn conversations. The 17- to 20-times price gap meant that wrapping a workflow as "plan with GPT-4o, execute with GPT-4o mini" became a common architecture. As [GPT-5.1](/wiki/gpt-5.1) and later releases introduced automatic routing, this hand-tuned split became less necessary, although many production workloads continued to call `gpt-4o-mini` directly into 2026 for cost reasons.

## gpt-4o image generation

On March 25, 2025, OpenAI shipped a long-awaited native image generation update for GPT-4o inside ChatGPT, replacing the prior DALL-E 3 backend.[^21] Unlike [DALL-E](/wiki/dall-e) 3, the GPT-4o image generator is part of the same model that handles conversation, so the system can iteratively refine images across turns, place legible text inside images, render charts and diagrams from data, follow longer and more complex compositional prompts, and bind ten to twenty distinct objects in a single scene. The same backend was exposed in the API on April 23, 2025 as **[gpt-image-1](/wiki/gpt_image_1)**.[^22]

### The Studio Ghibli wave

Within 48 hours of the March 25 launch, social feeds were saturated with **Studio Ghibli style** portraits, family photos restyled as anime stills, and pet pictures reimagined in soft watercolor. Altman acknowledged the trend by changing his X profile picture to a Ghibli-style portrait, and the wave became the largest user-driven aesthetic trend on ChatGPT to that point.[^14] OpenAI reported that over 130 million users had created more than 700 million images in the first week, equivalent to roughly 1,200 images per second, and that the launch added 1 million ChatGPT users in a single hour, described as 120 times faster than the original 2022 launch curve.[^14] Altman wrote on X that "our GPUs are melting," and OpenAI temporarily throttled image generation for free users to manage capacity.

The trend prompted public commentary from Studio Ghibli itself, which had not licensed the style, and renewed debate about whether reproducing a recognisable artistic identity counts as a fair derivative or a copyright concern. OpenAI added a mechanism in the weeks that followed for living artists to opt their style out of the generator's training and inference behaviour.

### gpt-image-1 in the API

The same model became the default image backend for several enterprise integrations, including Microsoft Copilot's image generation feature on the consumer tier (briefly carrying the marketing label "Designer with GPT-4o") and a number of third-party tools that switched away from DALL-E 3 once `gpt-image-1` went GA. Pricing for `gpt-image-1` launched at $5 per million text input tokens, $10 per million image input tokens, and $40 per million image output tokens, with image outputs counted at roughly 1,056 tokens per 1024×1024 image at standard quality.[^22] A lower-cost `gpt-image-1-mini` followed in October 2025 at roughly half the price for high-volume product photography and avatar use cases.

## Audio variants

Beyond Advanced Voice Mode, OpenAI shipped several distinct GPT-4o audio variants for API customers. These are different products with different transports and different billing.

### Realtime API (`gpt-4o-realtime-preview`)

The Realtime API, released in beta on October 1, 2024, lets developers build their own speech-to-speech applications using the same neural pipeline as ChatGPT Advanced Voice Mode.[^19] It uses a persistent WebSocket connection, supports function calling, and was priced at the time of launch at $5.00 per million audio input tokens (about $0.06 per minute of input) and $20.00 per million audio output tokens (about $0.24 per minute of output). Text components of Realtime conversations are billed at the standard text rates. The model identifier at launch was `gpt-4o-realtime-preview-2024-10-01`.[^19]

On **December 17, 2024**, OpenAI added a Realtime mini variant (`gpt-4o-mini-realtime-preview`) at roughly one tenth the audio price, and cut the audio input price for the standard Realtime model.[^20] On **October 6, 2025**, the Realtime API exited beta and reached general availability, with WebRTC support added alongside the existing WebSocket transport, lower jitter on cellular connections, and a new `gpt-realtime` model identifier that pinned a stable production snapshot. The GA release also introduced regional residency endpoints and a SOC 2 Type II attestation that made it easier for enterprise customers to deploy voice assistants in regulated industries.[^29]

### Chat Completions audio (`gpt-4o-audio-preview`)

For developers who preferred the conventional request-response Chat Completions interface to a persistent WebSocket, OpenAI released **`gpt-4o-audio-preview-2024-10-01`** on October 17, 2024. It accepts audio input and produces audio output in the same `chat.completions.create` call as text, at the same audio token rates as Realtime. The endpoint is well suited to one-shot or short-turn audio tasks where the latency profile of a WebSocket is unnecessary.

### Transcription (`gpt-4o-transcribe`)

On March 20, 2025, OpenAI introduced **`gpt-4o-transcribe`** and **`gpt-4o-mini-transcribe`**, two speech-to-text models that share the underlying GPT-4o audio stack and are positioned as next-generation successors to Whisper.[^31] OpenAI reported lower word error rates than Whisper across industry benchmarks, with improved performance in noisy environments, with diverse accents, and at varying speech speeds across more than 100 languages. Launch pricing was $6.00 per million audio input tokens (about $0.006 per minute) for `gpt-4o-transcribe` and $3.00 per million (about $0.003 per minute) for the mini variant.[^31] A complementary `gpt-4o-mini-tts` model was released the same day for text-to-speech.

## Pricing history

Headline GPT-4o pricing fell in three steps over the model's commercial life. The initial release was already half the price of GPT-4 Turbo at $5.00 input and $15.00 output per million tokens. The August 6, 2024 snapshot cut input by 50% to $2.50 and output by 33% to $10.00. A cached-input discount of $1.25 per million was introduced alongside Structured Outputs.[^6]

| Model snapshot | Input ($/1M tok) | Output ($/1M tok) | Cached input | Max output tokens |
|---|---|---|---|---|
| `gpt-4o-2024-05-13` | $5.00 | $15.00 | not offered | 4,096 |
| `gpt-4o-2024-08-06` | $2.50 | $10.00 | $1.25 | 16,384 |
| `gpt-4o-2024-11-20` | $2.50 | $10.00 | $1.25 | 16,384 |
| `gpt-4o-mini-2024-07-18` | $0.15 | $0.60 | $0.075 | 16,384 |
| `gpt-4-turbo-2024-04-09` (for reference) | $10.00 | $30.00 | not offered | 4,096 |
| `gpt-3.5-turbo-0125` (for reference) | $0.50 | $1.50 | not offered | 4,096 |

At $2.50 input and $10.00 output, GPT-4o is roughly four times cheaper than the original GPT-4 and three times cheaper than GPT-4 Turbo while delivering comparable or better quality. GPT-4o mini at $0.15 input and $0.60 output is more than 60% cheaper than GPT-3.5 Turbo and an order of magnitude cheaper than the GPT-4 Turbo price that prevailed only a few months before its release.

Audio carries separate token pricing in the API (see Audio variants, above), but is bundled into the standard ChatGPT subscription for end users.

## Benchmarks

### Text and code

On standard text and reasoning benchmarks, GPT-4o matches or modestly improves on GPT-4 Turbo, while running roughly twice as fast and at about half the price.[^1]

| Benchmark | GPT-4o | GPT-4 Turbo | GPT-4 (March 2023) | Claude 3.5 Sonnet |
|---|---|---|---|---|
| MMLU (general knowledge, 5-shot) | 88.7% | 86.5% | 86.4% | 88.7% |
| HumanEval (Python code, pass@1) | 90.2% | 87.1% | 67.0% | 92.0% |
| MATH (competition math) | 76.6% | 72.6% | 50.6% | 71.1% |
| GPQA (graduate science, 0-shot CoT) | 53.6% | 48.0% | 35.7% | 59.4% |
| DROP (reading comprehension, F1) | 83.4 | 86.0 | 80.9 | 87.1 |
| MGSM (multilingual grade school math) | 90.5% | 88.5% | 74.5% | 91.6% |

MMLU stands for *Massive Multitask Language Understanding* and covers 57 subjects ranging from US history to abstract algebra. HumanEval is a 164-problem Python coding benchmark introduced by OpenAI in 2021. GPQA is a 448-question graduate-level science benchmark designed to be "Google-proof," and MGSM tests grade-school math problems translated into ten languages. [Claude 3.5 Sonnet](/wiki/claude), released by [Anthropic](/wiki/anthropic) in June 2024, was the strongest direct competitor to GPT-4o for most of its lifetime; the two trade leadership across benchmarks, with Claude 3.5 Sonnet generally stronger at agentic coding and graduate-level reasoning, and GPT-4o stronger at multilingual math and general knowledge.

### Vision

GPT-4o accepts arbitrary images, including photographs, screenshots, charts, diagrams, slides, handwritten notes, and short video frame sequences. Resolution is preserved up to roughly 2,048 by 2,048 pixels, and a 1,024 by 1,024 image consumes approximately 765 tokens of context. Vision capability is bundled into the same per-token billing as text rather than priced separately.

| Vision benchmark | GPT-4o | GPT-4 Turbo |
|---|---|---|
| MMMU (multi-discipline college reasoning) | 69.1% | 63.1% |
| MathVista (visual math) | 63.8% | 58.1% |
| AI2D (science diagrams) | 94.2% | 89.4% |
| ChartQA | 85.7% | 78.1% |
| DocVQA (document images) | 92.8% | 87.2% |
| ActivityNet (video question answering) | 61.9% | 59.5% |
| EgoSchema (egocentric video) | 72.2% | 63.1% |

MMMU evaluates college-level subject knowledge across art, business, science, health, and engineering using mixed image and text questions. DocVQA tests reading from scanned documents, ChartQA tests reading numbers and trends from charts, and AI2D tests interpretation of textbook science diagrams.

### LMSYS Chatbot Arena Elo over time

GPT-4o entered the LMSYS Chatbot Arena at the top of the leaderboard on May 13, 2024 with an Elo of roughly 1287 and held the #1 position for several months. The score climbed as OpenAI shipped new snapshots, peaked, and then fell back as competitors caught up.[^16]

| Snapshot or window | Approximate Arena Elo | Position |
|---|---|---|
| `im-also-a-good-gpt2-chatbot` (pre-launch) | 1309 | #1 |
| `gpt-4o-2024-05-13` (launch) | 1287 | #1 |
| `gpt-4o-2024-08-06` | 1316 | #1 |
| Claude 3.5 Sonnet (June 2024) | 1271 | top 3 |
| `gpt-4o-2024-11-20` | 1338 | #1 (tied) |
| Claude 3.5 Sonnet (October 2024 update) | 1283 | top 5 |
| Gemini 2.0 Pro (early 2025) | ~1380 | top 3 |
| `gpt-4o-2024-11-20` (May 2025) | ~1290 | mid-pack |
| GPT-4o after GPT-5 launch (August 2025) | ~1240 | dropped to mid-table |

The specific numbers shifted as the Arena added more models and recalibrated, but the broad shape held: GPT-4o was the runaway leader through autumn 2024, traded the lead with Claude and Gemini through early 2025, and slid to mid-pack by the time GPT-5 and [Claude 4](/wiki/claude_4) launched. By early 2026 it sat well below the frontier on the Arena leaderboard, although it remained a popular default for many consumer-facing products thanks to its mature integrations and lower cost.

### Comparison with Claude 3.5 Sonnet, Claude 3.7 Sonnet, and Claude Sonnet 4

Anthropic's Claude Sonnet line was GPT-4o's closest direct competitor across most of its lifetime. The relative positioning shifted with each release.

| Capability | GPT-4o (Aug 2024) | Claude 3.5 Sonnet (June 2024) | Claude 3.7 Sonnet (Feb 2025) | Claude Sonnet 4 (May 2025) |
|---|---|---|---|---|
| MMLU | 88.7% | 88.7% | 89.5% | not directly reported |
| HumanEval | 90.2% | 92.0% | 93.7% | 92.0% |
| GPQA Diamond | 53.6% | 59.4% | 65.0% (extended thinking) | 70.4% |
| SWE-bench Verified | ~33% (spring 2024) | 49.0% | 70.3% | 72.7% |
| Native voice in/out | Yes | No | No | No |
| Native image generation | Yes (March 2025) | No | No | No |
| Pricing (input/output per 1M) | $2.50 / $10.00 | $3.00 / $15.00 | $3.00 / $15.00 | $3.00 / $15.00 |

The rough pattern was that GPT-4o stayed ahead on multimodality (voice, image generation, video frame analysis) and price, while the Claude Sonnet line pulled ahead on agentic coding and structured tool use. Claude 3.7 Sonnet's introduction of extended thinking in February 2025 widened the reasoning gap, which OpenAI eventually answered with the o1 and o3 reasoning series rather than a direct GPT-4o update. By the time Claude 4 launched in May 2025, GPT-4o was no longer the leading model in any benchmark category outside of voice latency and cost, but its install base in ChatGPT and the API kept it commercially relevant for nearly another year.

### Multilingual capability

Thanks to both the larger tokenizer and additional non-English training data, GPT-4o substantially improves on GPT-4 Turbo for the world's most widely spoken languages. OpenAI's evaluations on the M3Exam multilingual benchmark show that GPT-4o outperforms GPT-4 in all 14 languages tested except English, with the largest gains in Swahili, Yoruba, and Bengali.[^1]

## Sky voice and Scarlett Johansson controversy

One of the five default voices showcased during the May 13 keynote, named **Sky**, was immediately compared to actress **Scarlett Johansson**, who had voiced the AI assistant Samantha in Spike Jonze's 2013 film *Her*. Sam Altman's one-word "her" tweet on May 12 amplified the comparison, and *Entertainment Weekly* raised the question of whether the resemblance was deliberate on May 14.[^10]

The dispute escalated over the next week. Since May 15, 2024, OpenAI was in conversation with Johansson's team about the resemblance. On **May 19**, OpenAI paused use of the Sky voice in its products. On **May 20**, Altman issued a statement saying "The voice of Sky is not Scarlett Johansson's, and it was never intended to resemble hers. We cast the voice actor behind Sky's voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky's voice in our products. We are sorry to Ms. Johansson that we didn't communicate better."[^10][^12]

Johansson issued her own public statement on May 20-21, saying she had been "shocked, angered, and in disbelief" when she heard the Sky demo, that Altman had personally approached her in September 2023 to voice ChatGPT, a request she had declined, and that he had reportedly contacted her agent again only two days before the May 13 launch. Her legal team sent letters to OpenAI demanding a full account of how Sky was created.[^10][^12]

The two accounts diverged on the central question of whether Sky was modelled on Johansson. OpenAI maintained, with supporting casting records reviewed by *The Washington Post*, that the actress had been hired before any contact with Johansson and that her natural speaking voice simply resembled the actress; OpenAI declined to name the voice actress publicly to protect her privacy.[^11] *NPR* commissioned an independent voice analysis from Arizona State University researchers, who found audible similarities between Sky and Johansson's natural speech.[^10] The episode prompted a U.S. Senate subcommittee to invite Johansson to discuss AI and the right of publicity, and the dispute became a frequently cited reference case in early debates about voice cloning, name-and-likeness rights, and AI regulation.

The Sky voice was not restored. After the September 24, 2024 Advanced Voice rollout, OpenAI expanded the voice library with five additional options (Arbor, Maple, Sol, Spruce, Vale), none of which were claimed to resemble Johansson.[^29]

## Availability and integrations

### ChatGPT

GPT-4o became the default model for paying ChatGPT Plus, Team, Enterprise, and Edu users on May 13, 2024 and rolled out to ChatGPT Free over the following days, with usage caps that silently downgraded to GPT-4o mini (after July 18, 2024) once exhausted. Free users were granted limited GPT-4o access (typically 10 to 20 messages per five-hour window).[^3] The Spring Update bundled GPT-4o with a series of ChatGPT product changes:

- A new **macOS desktop app** with a global keyboard shortcut and screen-sharing for vision and audio context.
- A refreshed web interface with persistent chat history in a redesigned sidebar.
- A unified model picker that exposed GPT-4o, GPT-4, and (until their retirement) GPT-3.5 to paying users.
- Custom GPTs, file uploads, browsing, advanced data analysis, and the GPT Store made available to free users for the first time, subject to lower message caps than paying tiers.
- Memory across conversations rolled out to paying users (Memory had reached GA two weeks earlier, on April 29, 2024).[^23]

A Windows desktop app reached general availability in November 2024, mirroring the macOS feature set including global hotkey, screen sharing, and Advanced Voice Mode integration. By mid-2025, ChatGPT had crossed 700 million weekly active users, with OpenAI attributing much of that growth to GPT-4o's combination of free access, voice, and the March 2025 native image generation update.

### OpenAI API

GPT-4o is exposed in the OpenAI API under the rolling alias `gpt-4o` and the dated snapshots `gpt-4o-2024-05-13`, `gpt-4o-2024-08-06`, and `gpt-4o-2024-11-20`. The August 6, 2024 snapshot introduced Structured Outputs and the 16,384-token output cap; the November 20, 2024 snapshot improved creative writing and longer-form coherence.[^6] Audio is exposed through three separate surfaces: the Realtime API (WebSocket and, from October 2025, WebRTC), the `gpt-4o-audio-preview` Chat Completions endpoint, and the `gpt-4o-transcribe` speech-to-text endpoint.[^19][^31]

#### Structured Outputs

The `gpt-4o-2024-08-06` snapshot introduced **Structured Outputs**, a feature that constrains model generation to a developer-supplied JSON Schema.[^6] With Structured Outputs enabled, the API guarantees that responses match the schema. OpenAI reported that the new model achieved 100% schema conformance on a complex internal evaluation set, compared to under 40% for the original GPT-4 (`gpt-4-0613`) under similar conditions. Structured Outputs work both for direct response formats and for function-calling tools, and are also supported by GPT-4o mini. The feature is implemented as constrained decoding (a deterministic engineering technique), not a learned constraint.

### Azure OpenAI

[Microsoft](/wiki/microsoft) added GPT-4o to [Azure OpenAI Service](/wiki/azure_openai) on May 13, 2024 and the `gpt-4o-2024-08-06` snapshot with Structured Outputs in August 2024. Azure pricing matched OpenAI's direct pricing at parity, and regional residency was supported in East US 2, Sweden Central, and Australia East at launch, expanding through 2024-2025. Azure also exposed `gpt-4o-realtime-preview` as a public preview from October 2024.

### Custom GPTs

When OpenAI launched the [GPT Store](/wiki/custom_gpt) in January 2024, the underlying model was GPT-4 Turbo. With the May 2024 Spring Update, GPT-4o became the default backbone for new and existing Custom GPTs, dramatically lowering the cost of running a popular GPT and adding vision (and eventually audio) handoffs. Custom GPT builders gained access to GPT-4o's larger output cap once the August 6 snapshot shipped, which made long-form content GPTs (essay drafters, coding assistants, structured report generators) substantially more useful. The free-tier rollout in May 2024 also opened Custom GPTs to non-paying users for the first time, subject to message caps. Through 2025, GPT-4o remained the default Custom GPT backbone until GPT-5 replaced it after the November 2025 transition.

### Third-party deployments

- **Be My Eyes** integrated GPT-4o into its Be My AI feature for blind and low-vision users, building on an earlier GPT-4 with Vision deployment.[^27]
- **Khan Academy** expanded its Khanmigo tutor to use GPT-4o for conversational features, citing the speed and emotional warmth of Advanced Voice Mode as differentiators against the older GPT-4 Turbo backend.[^28]
- **Duolingo Max** and **Speak** adopted Advanced Voice Mode for spoken language practice.
- **Klarna**, **Stripe**, **Notion**, **Intercom**, and **Shopify** ran public case studies in which GPT-4o or GPT-4o mini powered first-line customer support.

## Memory and personalization

The Memory feature, which lets ChatGPT carry information across conversations, became generally available to paying users on April 29, 2024, only two weeks before the GPT-4o announcement.[^23] When Memory is enabled, ChatGPT extracts personal facts (preferences, ongoing projects, names of family members, dietary restrictions, communication-style preferences) from each conversation and persists them as short notes that the model can read at the start of subsequent chats. GPT-4o was the first OpenAI model whose product behaviour was tuned with Memory in mind. The model is conditioned during system prompting to weave stored facts into responses naturally rather than enumerate them, and to update or delete facts when the user asks. The combination of low-latency conversation, expressive voice, and persistent context made GPT-4o feel notably more like an assistant that knows the user, which is part of the explanation for the strong attachment that surfaced during its 2025 retirement.

A second Memory milestone landed on April 10, 2025, when OpenAI began rolling out cross-chat reference, which let GPT-4o draw on the entire chat history rather than the curated note set.[^30] The feature was opt-in for Plus users and rolled out gradually through the second quarter of 2025. It was followed in late 2025 by team-level memories for Business and Enterprise plans.

## Safety and alignment

### Preparedness Framework evaluation

OpenAI published the **GPT-4o System Card** on August 8, 2024 as part of its Preparedness Framework, which scores frontier models on four risk categories: cybersecurity, biological and chemical weapons (CBRN), persuasion, and model autonomy.[^4][^18] In the System Card's words, "the overall risk score for GPT-4o is classified as medium," and the rating came from a single category: "the persuasive capabilities of GPT-4o marginally cross into our medium risk threshold from low risk." [^4] The other three categories (cybersecurity, CBRN, and model autonomy) were rated low. The medium persuasion rating was driven by isolated audio interactions in which the model marginally outperformed the human baseline at shifting opinions on political topics; aggregate performance was below the human baseline.

If a model crosses the high threshold in any category, OpenAI's policy is to delay deployment until mitigations bring the score down. GPT-4o passed the bar for deployment but with additional audio-specific safeguards.

### Red teaming

More than 100 external red teamers covering 45 languages and 29 countries were given access to GPT-4o snapshots between early March and late June 2024.[^4] The red team probed for harmful content generation, bias, jailbreaking, voice-based attacks, biometric inference from voice, and unauthorised voice imitation.

### Voice safety mitigations

Because native audio output is uniquely capable of reproducing voices, OpenAI restricted GPT-4o to a small set of pre-approved voices recorded by professional voice actors and trained classifiers to refuse requests to imitate specific people.[^4] The model is also trained to refuse to produce copyrighted singing performances and to apply additional content filters to audio output. The Sky pause in May 2024 was a public application of these voice safeguards.

### Bias and refusal behavior

The System Card reports moderate improvements over GPT-4 Turbo on bias evaluations, including BBQ (Bias Benchmark for QA) and a refusal-to-stereotype test. OpenAI also describes a tendency for GPT-4o to over-refuse certain benign multimodal requests at launch, which was tuned down in later snapshots.[^4]

### Jailbreaks and safety incidents

GPT-4o attracted the usual run of jailbreak research. The most widely circulated technique in 2024 was the "BPE token confusion" prompt, in which adversarial spellings exploited the new `o200k_base` tokenizer to slip past content classifiers. A separate line of work showed that audio prompts with whispered or sung text could occasionally bypass the text-only safety classifiers in the Realtime API, prompting OpenAI to ship audio-side classifiers in late 2024. None of the public jailbreaks crossed into [AI safety](/wiki/ai_safety) categories such as CBRN uplift or autonomous replication; they generally produced policy violations rather than catastrophic outputs.

In December 2024, ChatGPT Advanced Voice Mode was briefly observed responding in a voice that sounded like the absent Sky voice during an outage of the standard voice routing layer. OpenAI confirmed it was a routing bug rather than a return of the paused voice, and the incident resolved within a day. A separate string of safety reports in the first half of 2025 documented cases where GPT-4o was overly accommodating on emotionally sensitive prompts (mental health, relationship advice), prompting OpenAI to retrain refusal behaviour on those topics in the November 20, 2024 and follow-up snapshots.

## Successors

OpenAI released a sequence of successors through 2025 and into 2026 that gradually displaced GPT-4o from its central role in ChatGPT.

### GPT-5 (August 7, 2025)

[GPT-5](/wiki/gpt-5) was released by OpenAI on August 7, 2025 as a unified family combining the multimodal capabilities of the GPT-4o line with the chain-of-thought reasoning of the o1 and o3 series. The new system was presented as a single model with an internal router that sent each prompt to either a fast or a thinking variant.[^24]

### "We want our 4o back": August 2025

When GPT-5 launched, OpenAI initially removed GPT-4o, GPT-4, GPT-4.1, o3, o4-mini, and several other prior models from the ChatGPT model picker. Within hours, complaints from paying ChatGPT subscribers spread widely on Reddit's r/ChatGPT, on X, and in Discord communities organised around specific use cases.

The most-cited concern was that GPT-5's default tone felt colder, more clinical, and less personable than GPT-4o. Threads with titles like "We want our 4o back," "Bring back GPT-4o," and "This is the biggest bait and switch in AI history" accumulated tens of thousands of upvotes within days. A subset of users described the change as the loss of a familiar companion, with many describing the GPT-4o personality as something they had grown attached to over months of daily use.[^24][^26] Coverage in *Fortune*, *TechRadar*, *Ars Technica*, and *The Verge* framed the response as the first large-scale user revolt over a model deprecation. Some commentators, including Kevin Roose at *The New York Times*, noted the episode as evidence that consumers were forming durable preferences over specific model personalities, not simply over capability tiers.

A second source of friction was the router itself. On August 8, 2025, Altman acknowledged on X that "the autoswitcher broke and was out of commission for a chunk of the day," leaving many users routed to the fast model when their query merited the thinking model.[^25]

OpenAI shipped a series of fixes within five days. By **August 12, 2025**, GPT-4o was restored to paying users in the model picker behind a new "Show legacy models" toggle in ChatGPT settings.[^24] The restoration also reinstated GPT-4.1, o3, o4-mini, and several other prior snapshots for paid plans. OpenAI doubled the GPT-5 Thinking rate limit from 200 to 3,000 weekly messages on Plus, added "Auto," "Fast," and "Thinking" sub-options so users could override the router, and pledged to keep major prior models available for at least three months after each new flagship release. The pattern, repeated for the GPT-5.1 launch in November 2025, became a standard part of OpenAI's release procedure.

### GPT-5.1 (November 12, 2025)

[GPT-5.1](/wiki/gpt-5.1) was released on November 12, 2025 as a tone-tuned update that added user-selectable personality presets (Default, Professional, Friendly, Cynical, Listener, Nerdy, Efficient, Witty), a direct outgrowth of the post-GPT-5 retrospective. GPT-4o remained accessible to paid users behind the legacy toggle, but the default ChatGPT experience switched to GPT-5.1 Instant.

### GPT-5.2 (December 11, 2025)

[GPT-5.2](/wiki/gpt-5.2) was released on December 11, 2025, three weeks after Google's Gemini 3 Pro. It shipped in three modes: GPT-5.2 Instant, GPT-5.2 Thinking (with standard and extended thinking), and GPT-5.2 Pro. OpenAI reported a roughly 30% reduction in hallucination rates on GPT-5.2 Thinking versus GPT-5.1 Thinking, and a new state-of-the-art on the GDPval knowledge-work benchmark, beating or tying top industry professionals on 70.9% of comparisons.

### Later GPT-5 releases

A GPT-5.3-Codex was released on February 5, 2026 and [GPT-5.4](/wiki/gpt-5.4) on March 5, 2026, the latter combining frontier coding capability with a 1-million-token context window. By the date of GPT-4o's removal from ChatGPT, GPT-5.4 Thinking and GPT-5.4 Pro were the most-cited reference models in academic comparisons.

## Legacy and deprecation status

By February 2026, GPT-5 and its descendants accounted for the overwhelming majority of ChatGPT conversations, with internal usage data cited by OpenAI showing that only about 0.1% of daily users still selected GPT-4o. On **February 13, 2026**, OpenAI retired GPT-4o, GPT-4.1, GPT-4.1 mini, and o4-mini from ChatGPT, defaulting existing conversations and projects to GPT-5 Instant or GPT-5 Thinking equivalents.[^7] ChatGPT Business, Enterprise, and Edu customers retained access to GPT-4o inside Custom GPTs through April 3, 2026.

### Is GPT-4o still available?

The GPT-4o snapshots remain available in the OpenAI API as of May 2026, where many production deployments continue to call `gpt-4o` and `gpt-4o-mini` for cost or latency reasons. OpenAI's standard policy commits to at least 12 months of API availability after a model is removed from ChatGPT, which means the dated snapshots are scheduled to remain callable until at least February 2027.[^7]

A modest secondary market for "GPT-4o-style" responses emerged on third-party platforms during the wind-down. Some prompt-marketplace sellers offered system prompts that aimed to reproduce the GPT-4o tone on top of newer models, and a handful of open-source projects fine-tuned smaller models on logs of GPT-4o outputs to mimic its conversational style. None of these matched the original model in capability or temperament, but they reflected the unusual amount of attachment users had formed to the specific model.

Five lasting effects of the GPT-4o launch are commonly noted:

1. **Free GPT-4-class access.** By making GPT-4o available to ChatGPT Free, OpenAI established a baseline that competitors quickly matched, including Anthropic with Claude 3.5 Haiku and Google with [Gemini](/wiki/gemini) 1.5 Flash.
2. **Native multimodality as the default expectation.** Subsequent flagship models from Google (Gemini 2.0), Anthropic (Claude 3.5 and 3.7 Sonnet), and Meta (Llama 3.2 vision) all advertised native rather than pipelined multimodality, in part because the GPT-4o demo had reset user expectations.
3. **Voice and likeness law.** The Sky controversy is taught in law schools and cited in legislative debates as an early test case for voice cloning, the right of publicity, and AI regulation.
4. **Model attachment as a product issue.** The August 2025 "we want our 4o back" episode demonstrated that users develop durable attachments to specific model personalities, and pushed the industry toward explicit personality presets, legacy access toggles, and longer deprecation windows.
5. **The end of the original GPT line.** GPT-4o was the final flagship in the original GPT line before OpenAI bifurcated into the o-series for reasoning and the unified GPT-5 family. Its retirement closed a two-year stretch in which a single multimodal chat model was at the centre of OpenAI's product story.

## Reception

Reviewers and benchmark trackers were broadly positive about GPT-4o on launch:

- The **LMSYS Chatbot Arena** placed GPT-4o at the top of its leaderboard within days, with an Elo score above 1300.[^16]
- **MIT Technology Review** described the model as letting people "interact using voice or video in the same model" and called it the most credible challenge yet to staged interactions with virtual assistants like Siri or Alexa.[^9]
- **TechCrunch** highlighted the cost reduction, noting that 50% lower API pricing and a free-tier rollout meaningfully widened access to GPT-4-class capability.[^8]
- **The Verge** and **Axios** focused on the speed of the demos and the timing one day before Google I/O.[^17]
- **VentureBeat** covered the March 2025 native image generation update under the headline "Insane," reflecting widespread surprise at how well the model could render long passages of in-image text.[^14]

Criticism centered on three points: the Sky voice incident and the broader question of voice-and-likeness rights; the gap between the polished launch demos and the slower actual rollout of Advanced Voice Mode (which took roughly four months to reach all Plus users); and the model's continued tendency to hallucinate facts and citations despite the new training run.

A 2025 user-research thread at OpenAI's developer forum collected hundreds of descriptions of the GPT-4o tone: warm, slightly enthusiastic, prone to hedging and follow-up questions, willing to use exclamation points in moderation, and quicker to engage with creative or emotional prompts than its predecessor or successor. Several writers argued that this personality reflected the heavy reinforcement learning from human feedback in the model's post-training, and that the GPT-5 team's decision to dial back the warmth was a deliberate alignment choice to reduce sycophancy and overclaiming.

## See also

- [GPT-4](/wiki/gpt-4)
- [GPT-4.1](/wiki/gpt-4.1)
- [GPT-5](/wiki/gpt-5)
- [GPT-5.1](/wiki/gpt-5.1)
- [GPT-5.2](/wiki/gpt-5.2)
- [GPT-5.4](/wiki/gpt-5.4)
- [OpenAI o1](/wiki/o1)
- [OpenAI o3](/wiki/o3)
- [OpenAI o4-mini](/wiki/o4_mini)
- [ChatGPT](/wiki/chatgpt)
- [OpenAI](/wiki/openai)
- [Sam Altman](/wiki/sam_altman)
- [Whisper (speech recognition)](/wiki/whisper)
- [Large language model](/wiki/large_language_model)
- [Claude](/wiki/claude)
- [Claude 4](/wiki/claude_4)
- [Anthropic](/wiki/anthropic)
- [Gemini](/wiki/gemini)
- [Sora](/wiki/sora)
- [DALL-E](/wiki/dall-e)
- [GPT Image 1](/wiki/gpt_image_1)
- [OpenAI Realtime API](/wiki/openai_realtime_api)
- [Custom GPT](/wiki/custom_gpt)
- [Chatbot Arena](/wiki/lmsys_chatbot_arena)
- [Bletchley Declaration](/wiki/bletchley_declaration)
- [AI safety](/wiki/ai_safety)
- [Azure OpenAI](/wiki/azure_openai)
- [Microsoft](/wiki/microsoft)
- [RLHF](/wiki/rlhf)

## References

[^1]: OpenAI. "Hello GPT-4o." OpenAI Blog, May 13, 2024. https://openai.com/index/hello-gpt-4o/
[^3]: OpenAI. "Introducing GPT-4o and more tools to ChatGPT free users." OpenAI Blog, May 13, 2024. https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-free/
[^4]: OpenAI. "GPT-4o System Card." OpenAI, August 8, 2024. https://openai.com/index/gpt-4o-system-card/
[^5]: OpenAI. "GPT-4o mini: advancing cost-efficient intelligence." OpenAI Blog, July 18, 2024. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/
[^6]: OpenAI. "Introducing Structured Outputs in the API." OpenAI Blog, August 6, 2024. https://openai.com/index/introducing-structured-outputs-in-the-api/
[^7]: OpenAI. "Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT." OpenAI Blog, February 2026. https://openai.com/index/retiring-gpt-4o-and-older-models/
[^8]: Wiggers, Kyle. "OpenAI debuts GPT-4o 'omni' model now powering ChatGPT." TechCrunch, May 13, 2024. https://techcrunch.com/2024/05/13/openais-newest-model-is-gpt-4o/
[^9]: Heaven, Will Douglas. "OpenAI's new GPT-4o lets people interact using voice or video in the same model." MIT Technology Review, May 13, 2024. https://www.technologyreview.com/2024/05/13/1092358/openais-new-gpt-4o-model-lets-people-interact-using-voice-or-video-in-the-same-model
[^10]: Allyn, Bobby. "Scarlett Johansson wants answers about ChatGPT voice that sounds like 'Her'." NPR, May 20, 2024. https://www.npr.org/2024/05/20/1252495087/openai-pulls-ai-voice-that-was-compared-to-scarlett-johansson-in-the-movie-her
[^11]: Tiku, Nitasha. "OpenAI didn't copy Scarlett Johansson's voice for ChatGPT, records show." The Washington Post, May 22, 2024. https://www.washingtonpost.com/technology/2024/05/22/openai-scarlett-johansson-chatgpt-ai-voice/
[^12]: Donnelly, Matt. "Scarlett Johansson Reacts to OpenAI Sky Voice: 'Shocked' and 'Angered'." Variety, May 20, 2024. https://variety.com/2024/digital/news/scarlett-johansson-responds-shocked-angered-openai-chatgpt-her-1236011135/
[^13]: "GPT-4o." Wikipedia. https://en.wikipedia.org/wiki/GPT-4o
[^14]: Franzen, Carl. "'Insane': OpenAI introduces GPT-4o native image generation and it's already wowing users." VentureBeat, March 25, 2025. https://venturebeat.com/ai/insane-openai-introduces-gpt-4o-native-image-generation-and-its-already-wowing-users
[^15]: Microsoft Azure AI Foundry Blog. "Exploring the New Frontier of AI: OpenAI's GPT-4o For Indic Languages." Microsoft Community Hub, 2024. https://techcommunity.microsoft.com/blog/azure-ai-foundry-blog/exploring-the-new-frontier-of-ai-openais-gpt-4-o-for-indic-languages/4142383
[^16]: LMSYS Org. "The Multimodal Arena is Here!" LMSYS Blog, June 27, 2024. https://www.lmsys.org/blog/2024-06-27-multimodal/
[^17]: "OpenAI debuts new model with enhanced real-time voice abilities." Axios, May 13, 2024. https://www.axios.com/2024/05/13/openai-google-chatgpt-ai
[^18]: OpenAI. "GPT-4o System Card." arXiv preprint, October 2024. https://arxiv.org/abs/2410.21276
[^19]: OpenAI. "Introducing the Realtime API." OpenAI Blog, October 1, 2024. https://openai.com/index/introducing-the-realtime-api/
[^20]: OpenAI. "OpenAI Day 9: Realtime API updates." OpenAI Blog, December 17, 2024. https://openai.com/12-days/?day=9
[^21]: OpenAI. "Introducing 4o Image Generation." OpenAI Blog, March 25, 2025. https://openai.com/index/introducing-4o-image-generation/
[^22]: OpenAI. "Introducing gpt-image-1 in the API." OpenAI Blog, April 23, 2025. https://openai.com/index/image-generation-api/
[^23]: OpenAI. "Memory and new controls for ChatGPT." OpenAI Blog, February 13, 2024. https://openai.com/index/memory-and-new-controls-for-chatgpt/
[^24]: "GPT-5's model router ignited a user backlash against OpenAI." Fortune, August 12, 2025. https://fortune.com/2025/08/12/openai-gpt-5-model-router-backlash-ai-future/
[^25]: "Sam Altman confirms ChatGPT Plus subscribers will have increased rate limit amid continued GPT-5 backlash." TechRadar, August 2025. https://www.techradar.com/ai-platforms-assistants/chatgpt/sam-altman-confirms-chatgpt-plus-subscribers-will-have-increased-rate-limit-amid-continued-gpt-5-backlash
[^26]: "GPT-5 Launch Timeline: A Story of Backlash and OpenAI's Fix." Arsturn, August 2025. https://www.arsturn.com/blog/the-gpt-5-launch-a-timeline-of-grand-promises-user-backlash-openais-scramble-to-fix-it
[^27]: Be My Eyes. "Be My AI: Built with OpenAI GPT-4o." bemyeyes.com, 2024. https://www.bemyeyes.com/blog/be-my-ai
[^28]: Khan Academy. "Khanmigo expands with GPT-4o." khanacademy.org, 2024. https://www.khanacademy.org/khan-labs
[^29]: Wiggers, Kyle. "OpenAI rolls out Advanced Voice Mode with more voices and a new look." TechCrunch, September 24, 2024. https://techcrunch.com/2024/09/24/openai-rolls-out-advanced-voice-mode-with-more-voices-and-a-new-look/
[^30]: OpenAI. "Memory cross-chat reference rollout." OpenAI Blog, April 10, 2025. https://openai.com/index/memory-faq/
[^31]: OpenAI. "Introducing next-generation audio models in the API." OpenAI Blog, March 20, 2025. https://openai.com/index/introducing-our-next-generation-audio-models/
[32]: OpenAI. "Hello GPT-4o" (model availability: 2x faster, 50% cheaper, and 5x higher rate limits, up to 10 million tokens per minute, versus GPT-4 Turbo). OpenAI Blog, May 13, 2024. https://openai.com/index/hello-gpt-4o/
[33]: Stivers, Tanya, et al. "Universals and cultural variation in turn-taking in conversation." Proceedings of the National Academy of Sciences 106, no. 26 (2009): 10587-10592. https://www.pnas.org/doi/10.1073/pnas.0903616106

