GPT-4o

22 min read

Updated Jul 28, 2026

GPT-4o (Generative Pre-trained Transformer 4 Omni) is a natively multimodal large language model developed by OpenAI and announced on May 13, 2024 at the company's "Spring Update" livestream. The lowercase "o" stands for omni. OpenAI describes the research model as accepting combinations of text, audio, image, and video and generating text, audio, and image, and says it was trained end-to-end across text, vision, and audio so that all inputs and outputs are processed by the same neural network.^[1]^[4] This differs from the chained speech-to-text, language, and text-to-speech models that powered earlier versions of ChatGPT Voice Mode. GPT-4o succeeded GPT-4 Turbo, matching its text and code performance while running roughly twice as fast, at half the API cost, and with five times higher rate limits, up to 10 million tokens per minute.^[1]^[32]

The most widely publicized aspect of GPT-4o is its real-time conversational ability. The model can respond to audio inputs in as little as 232 milliseconds, with an average latency of about 320 milliseconds, which OpenAI compares with human conversational response time.^[1] A cross-linguistic study of ten languages found that the modal gap between conversational turns is roughly 200 milliseconds, so GPT-4o's 232 millisecond floor is close to the rhythm people expect from a live exchange.^[33] Earlier ChatGPT voice experiences chained a speech-to-text model (typically Whisper), a text-only GPT-4 inference, and a separate text-to-speech model in series, producing typical end-to-end latencies of two to five seconds and stripping away tone, multiple-speaker information, background noise, laughter, and song. By training one model end-to-end across the supported modalities, OpenAI sought to retain those cues and enable expressive speech, including singing and laughter.^[1]^[4]

OpenAI used the launch to begin making GPT-4-class capability available free for the first time. On May 13 it began rolling GPT-4o out to ChatGPT Plus and Team users, started a usage-capped Free rollout, and said Enterprise access was coming soon; advanced tools for Free users were staged over the following weeks.^[3] A smaller and cheaper variant, GPT-4o mini, followed on July 18, 2024 as a cost-efficient API model and later Free-tier fallback.^[5] Native image generation reached ChatGPT on March 25, 2025 and the API as gpt-image-1 on April 23, 2025.^[21]^[22] GPT-4o was retired from ordinary ChatGPT conversations on February 13, 2026 in favor of the GPT-5 family, while remaining available in the OpenAI API.^[7]

GPT-4o remained central to ChatGPT until the launch of GPT-5 in August 2025. OpenAI initially replaced it in the model picker, restored it for paid users five days later, and retired it from ChatGPT in February 2026 after usage had shifted to newer models.^[7]^[24]^[39]

Background

By the spring of 2024, OpenAI's commercial flagship was GPT-4 Turbo, a faster and cheaper variant of GPT-4 introduced at the November 2023 DevDay. GPT-4 Turbo with Vision had added image input in late 2023, but voice in ChatGPT continued to rely on a cascade of three models: Whisper transcribed user audio to text; a text-only GPT-4 generated a reply; and a separate text-to-speech model converted the reply back to audio. The cascade had three structural drawbacks. First, the intermediate text step discarded prosody, multiple-speaker information, background sound, and emotional inflection. Second, each link added latency, producing end-to-end voice turnaround times that were too slow for natural conversation. Third, the cascade made it difficult for the language model to produce expressive speech because its output was a sequence of plain text tokens that the TTS layer had to interpret.^[1]

In parallel, the competitive landscape had tightened. Google's Gemini 1.5 Pro, Anthropic's Claude 3 Opus, and Meta's Llama 3 all challenged GPT-4 on text and code benchmarks. OpenAI announced GPT-4o on May 13, one day before Google's I/O keynote; contemporary coverage framed the back-to-back events as direct competition.^[17]

OpenAI presented GPT-4o as one model trained end-to-end across text, vision, and audio. The System Card identifies broad training-data categories, including publicly available data and proprietary sources such as partnerships, plus code, mathematics, and multimodal data. OpenAI did not publish a comprehensive corpus manifest, dataset proportions, parameter count, detailed architecture diagram, or training-hardware specification.^[1]^[4]

Release

gpt2-chatbot leak on the Chatbot Arena

Before release, OpenAI tested GPT-4o variants anonymously on LMSYS Chatbot Arena under names including gpt2-chatbot, im-a-good-gpt2-chatbot, and im-also-a-good-gpt2-chatbot. An OpenAI employee confirmed the connection after the May 13 announcement.^[34] The episode drew attention to blind, preference-based testing on the Chatbot Arena, but the snapshot's exact pre-launch performance claims are not retained because the live leaderboard changed as votes accumulated.

Spring Update keynote

The Spring Update was a roughly 26-minute live stream held at OpenAI's San Francisco headquarters on Monday, May 13, 2024, the day before Google's annual I/O developer conference. Then chief technology officer Mira Murati opened the presentation, framed three priorities for the launch (a new desktop app, a refreshed user interface, and the new flagship model), and then handed off to research leads Mark Chen and Barret Zoph for live demonstrations.^[1] The team showed real-time spoken conversation with interruption handling, on-the-fly emotional expression, real-time language translation between Italian and English, vision-based math tutoring on a handwritten linear equation, code review of a Python script via screen sharing, and the model singing a bedtime story to a stuffed animal. The presentation deliberately steered around the staged feel of typical product demos: the presenters interrupted the model mid-sentence, asked it to be more dramatic, and let it ad-lib.

Google's I/O keynote was scheduled for the following day, and contemporary reporting treated the timing as competition between the two announcements.^[17] OpenAI's launch materials also published a Preparedness Framework scorecard covering cybersecurity, CBRN, persuasion, and model autonomy.^[1]

Immediate availability

GPT-4o became available to API customers as gpt-4o and gpt-4o-2024-05-13 on the day of the keynote, with text and image input and text output.^[1] In ChatGPT, OpenAI began rolling it out to Plus and Team users, started capped Free access that day with more tools staged over the following weeks, and said Enterprise access was coming soon. At launch, Free users who reached the limit were automatically switched to GPT-3.5.^[3]

Architecture

A single network, end-to-end

The GPT-4o System Card describes the model as autoregressive and trained end-to-end across text, vision, and audio, with all inputs and outputs processed by the same neural network.^[4] OpenAI has not published enough architectural detail to establish that pixels, waveforms, and text use one shared token space, so that stronger claim is omitted. The end-to-end design nevertheless lets the model retain cues such as tone, multiple speakers, background sounds, and expressive delivery that the earlier text bottleneck discarded.^[1]

How does GPT-4o differ from the old voice cascade?

The cascaded predecessor pipeline used three separate models in series: an automatic speech recognition model (typically Whisper) transcribed audio to text; a text-only GPT-4 model produced a text reply; and a separate text-to-speech model converted the reply back to audio. The middle text-only stage discarded tone, multiple-speaker information, background sound, and emotional inflection, and the chain accumulated latency at each step. GPT-4o instead processes the supported inputs and outputs with the same end-to-end model, allowing it to use cues that the intermediate transcription discarded.^[1]^[4]

Tokenizer: `o200k_base`

GPT-4o uses the o200k_base tokenizer, a roughly 200,000-token BPE encoding that expands on the cl100k_base encoding used by GPT-4 and GPT-3.5.^[15] OpenAI's published sample-sentence comparisons show approximately 3.5 times fewer tokens for Telugu, 2.9 times fewer for Hindi, 2 times fewer for Arabic, 1.7 times fewer for Korean, and 1.4 times fewer for Chinese.^[1] These are observed examples rather than evidence of a disclosed tokenizer-design intent.

Why does GPT-4o use fewer tokens for non-English languages?

For token-billed requests and token-limited contexts, fewer tokens for the same text can reduce cost and leave more of the context window available. The size of the benefit varies by language and text.

Context window and outputs

OpenAI's current base gpt-4o documentation lists a 128,000-token context window, a 16,384-token maximum output, and an October 1, 2023 knowledge cutoff.^[13]

Modalities and product surfaces

The research model described in OpenAI's launch post and System Card accepts combinations of text, audio, image, and video and can generate text, audio, and image.^[1]^[4] Public product access has always depended on the specific endpoint and release date.

Surface	Input	Output	Status
Base `gpt-4o` API model	Text and image	Text	128K context; audio and video not supported on this endpoint^[13]
ChatGPT Advanced Voice	Text, audio, and later image/video sharing	Text and preset-voice audio	Wider paid rollout began September 2024^[29]
Realtime API preview	Text and audio	Text and audio	Public beta began October 2024^[19]
4o image generation in ChatGPT	Text and image	Image	Launched March 2025^[21]
`gpt-image-1` API	Text and image	Image	Launched April 2025^[22]

This distinction corrects the snapshot's implication that the base gpt-4o API endpoint itself returns audio and images. Specialized audio and image models expose those outputs.

Advanced Voice Mode

The May 13 presentation demonstrated real-time interruption, emotional delivery, translation, visual tutoring, and singing, but the new voice system was not broadly available at launch.^[1]^[3] OpenAI began a wider Advanced Voice Mode rollout to Plus and Team users on September 24, 2024 after an earlier limited alpha. The wider rollout added five voices and still omitted the demonstrated video and screen-sharing features.^[29]

Advanced Voice reached the ChatGPT web app for paid users in November 2024. OpenAI began rolling out real-time video, screen sharing, and image uploads in mobile Advanced Voice on December 12, 2024, with geographic and plan limits.^[39]

OpenAI's deployed voice system used preset voices and audio-safety classifiers described in the GPT-4o System Card.^[4] Original voice choices included Breeze, Cove, Ember, Juniper, and Sky. OpenAI paused Sky in May 2024 and later offered Arbor, Maple, Sol, Spruce, and Vale alongside the remaining voices.^[29]^[36]

gpt-4o-mini

GPT-4o mini is a smaller model in the GPT-4o family, released on July 18, 2024 as gpt-4o-mini-2024-07-18.^[5] OpenAI positioned it for applications that make many model calls, pass large contexts, or need low-latency text responses.

Launch price: $0.15 per million input tokens and $0.60 per million output tokens.^[5]
Context and output: 128,000-token context and up to 16,000 output tokens, with an October 2023 knowledge cutoff.^[5]
Launch modalities: text and image input with text output. OpenAI said additional modalities would come later.^[5]
Launch evaluations: 82.0% on MMLU, 87.0% on MGSM, 87.2% on HumanEval, and 59.4% on MMMU, using OpenAI's reported evaluation setup.^[5]

When to use GPT-4o vs GPT-4o mini

OpenAI's launch guidance emphasized GPT-4o mini for cost-sensitive, high-volume, and latency-sensitive tasks. The larger GPT-4o remained the higher-capability choice. The snapshot's detailed developer "rule of thumb" and adoption claims were not tied to a survey or deployment dataset and have been removed.

GPT-4o image generation

On March 25, 2025, OpenAI launched native 4o image generation in ChatGPT. It could accept image inputs, revise an image across conversational turns, render text, and follow prompts containing more objects and relationships than OpenAI's earlier DALL-E image system.^[21] This image generator was related to the GPT-4o family but was a separate product surface from the base gpt-4o API model.

API release and adoption

OpenAI exposed the image system through the API as gpt-image-1 on April 23, 2025. OpenAI reported that more than 130 million users had created more than 700 million images during the feature's first week in ChatGPT.^[22]

At API launch, gpt-image-1 cost $5 per million text-input tokens, $10 per million image-input tokens, and $40 per million image-output tokens. OpenAI estimated low-, medium-, and high-quality square images at about $0.02, $0.07, and $0.19 each.^[22]

gpt-image-1-mini

OpenAI's current model documentation describes gpt-image-1-mini as a cost-efficient version of GPT Image 1 that accepts text and image inputs and produces image outputs. It lists text-input pricing of $2 per million tokens and $0.20 for cached input, image-input pricing of $2.50 and $0.25 for cached input, and image-output pricing of $8 per million tokens. The same page's snapshot section currently labels the rolling alias deprecated.^[43]

Studio Ghibli-style image trend

Within days of the March 2025 release, users widely shared personal photos and internet memes transformed into the style associated with Studio Ghibli. OpenAI chief executive Sam Altman changed his X profile image to a Ghibli-style portrait, while the trend renewed questions about copyrighted training material and artists' livelihoods. The Associated Press reported that OpenAI permitted broad studio styles but refused requests in the style of living artists; Studio Ghibli declined to comment.^[41]

Audio variants

Realtime API

OpenAI released the Realtime API in public beta on October 1, 2024 with gpt-4o-realtime-preview. It used a persistent WebSocket connection and processed speech directly rather than requiring developers to assemble separate transcription and speech-generation models. Launch pricing was $100 per million audio input tokens and $200 per million audio output tokens, which OpenAI estimated at about $0.06 and $0.24 per minute.^[19]

On December 17, 2024, OpenAI released new GPT-4o and GPT-4o mini Realtime snapshots, added WebRTC support, and cut GPT-4o audio pricing to $40 per million input and $80 per million output tokens.^[20] The Realtime API became generally available on August 28, 2025 with a new gpt-realtime model, image input, SIP calling, and remote MCP support.^[35]

Chat Completions audio

OpenAI added audio input and output to Chat Completions on October 17, 2024 through gpt-4o-audio-preview. This request-response interface served audio tasks that did not need a persistent low-latency Realtime session.^[19]

Transcription and speech generation

On March 20, 2025, OpenAI introduced gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts. OpenAI reported lower word error rates than Whisper across its evaluations, especially with accents, noise, and varying speech speeds.^[31]

Pricing history

At launch, OpenAI priced GPT-4o at $5 per million input tokens and $15 per million output tokens, half the corresponding GPT-4 Turbo price.^[1] The August 6, 2024 snapshot reduced those prices to $2.50 and $10.^[6] Current documentation lists $2.50 per million input tokens, $1.25 per million cached input tokens, and $10 per million output tokens for the base text-and-vision API model.^[13]

Model or snapshot	Input per 1M tokens	Cached input	Output per 1M tokens
`gpt-4o-2024-05-13` at launch	$5.00	Not listed	$15.00
`gpt-4o-2024-08-06` at launch	$2.50	Not listed	$10.00
`gpt-4o` current documentation	$2.50	$1.25	$10.00
`gpt-4o-mini` at launch	$0.15	Not listed	$0.60

Audio and image models use separate token types and prices. Their costs are not interchangeable with the base GPT-4o text-token rates.^[19]^[22]

Evaluations

Text, multilingual, and vision results

OpenAI's launch evaluations reported GPT-4 Turbo-level performance on English text, reasoning, and code, with larger gains in multilingual, audio, and vision tasks.^[1] These were first-party evaluations and should not be read as independent certification. OpenAI's simple-evals repository reports the following point estimates for two dated GPT-4o snapshots under its assistant prompt setup; unlike the removed mixed-provider table, the rows use the same published harness.^[40]

GPT-4o snapshot	MMLU	GPQA	MATH	HumanEval	MGSM
`gpt-4o-2024-05-13`	87.2	49.9	76.6	91.0	89.9
`gpt-4o-2024-08-06`	88.7	53.1	75.9	90.2	90.0

The repository's scores are first-party point estimates, not independent certification or a guarantee of application-level performance.

The launch report includes evaluations for text, automatic speech recognition, audio translation, the M3Exam multilingual benchmark, and vision understanding. It reports particularly large token-count reductions for several non-English languages and says GPT-4o outperformed GPT-4 on M3Exam in most of the displayed languages.^[1]

Chatbot Arena

Anonymous human-preference testing placed GPT-4o among the leading systems around launch. LMSYS later reported that GPT-4o and Claude 3.5 Sonnet led the June 10-25, 2024 Multimodal Arena window.^[16] Exact Arena scores change as votes, models, categories, and ranking methods change, so the snapshot's cross-date Elo table is not retained.

Sky voice and Scarlett Johansson controversy

One of ChatGPT's five original voices, named Sky, was compared with Scarlett Johansson's performance as the assistant Samantha in the 2013 film Her. OpenAI paused Sky on May 19, 2024 after questions about the resemblance.^[10]^[37]

Johansson said Sam Altman had asked her in September 2023 to provide a voice for the product, that she declined, and that he contacted her agent again shortly before the GPT-4o launch. She said the released voice sounded similar enough that her legal team requested an explanation.^[10]^[12]

OpenAI said Sky belonged to another professional actor using her natural voice, had been cast before any outreach to Johansson, and was never intended to imitate her. The company apologized for not communicating better and kept the actor's identity private.^[11]^[36] The available sources therefore establish the dispute and the two parties' differing accounts, not that OpenAI cloned Johansson's voice.

At NPR's request, Arizona State University researchers compared Sky with the voices of around 600 professional actresses. Their analysis found Johansson's voice more similar to Sky than 98% of the sample, although Anne Hathaway and Keri Russell sometimes ranked closer; lead researcher Visar Berisha said the voices were similar but likely not identical. The result supports a measurable resemblance, not a finding that the voices were the same or that one had been cloned.^[42]

Availability and integrations

ChatGPT

On May 13, 2024, OpenAI began rolling GPT-4o out to Plus and Team users, started a usage-limited Free rollout, and said Enterprise access was coming soon. Advanced tools for Free users, including web results, data analysis, image discussion, file uploads, GPTs, and Memory, were staged over the following weeks. Once a Free user's model limit was reached, ChatGPT switched to GPT-3.5 at launch and later to GPT-4o mini.^[3]^[5]

The same announcement introduced a macOS desktop app and a redesigned ChatGPT interface. It did not make the demonstrated native audio and video system broadly available that day.^[3]

OpenAI API

The base gpt-4o API model currently accepts text and image input and returns text. It has a 128,000-token context window, a 16,384-token maximum output, and an October 1, 2023 knowledge cutoff. OpenAI lists the rolling alias plus dated May 13, August 6, and November 20, 2024 snapshots.^[13]

Structured Outputs

The August 6 snapshot introduced Structured Outputs, which constrain generation to a developer-supplied JSON Schema. OpenAI reported 100% conformance for gpt-4o-2024-08-06 on its internal complex-schema evaluation, compared with less than 40% for gpt-4-0613. OpenAI explains that the feature combines model training with constrained decoding and notes that schema compliance does not guarantee that values inside the schema are factually correct.^[6]

Azure OpenAI

Microsoft made GPT-4o available in preview through Azure OpenAI Service on May 13, 2024. The first preview supported text and image input in two US regions.^[38]

Custom GPTs and Memory

The May 2024 Free rollout allowed non-paying users to discover and use Custom GPTs, subject to usage limits.^[3] OpenAI's Memory feature had become available to most Plus users on April 29, 2024. On April 10, 2025, OpenAI expanded Memory for Plus and Pro users so ChatGPT could reference past conversations in addition to saved memories. Both controls could be disabled separately.^[23]

Selected demonstrations and deployments

OpenAI's launch materials included GPT-4o demonstrations with Be My Eyes and Khan Academy. Those examples showed visual assistance and tutoring, but they did not establish the broader deployment and market-share claims that appeared in the snapshot.^[1]

Safety and limitations

Preparedness Framework evaluation

OpenAI published the GPT-4o System Card on August 8, 2024 as its principal AI safety assessment of the model. Under the version of the Preparedness Framework used in that report, GPT-4o received a medium overall rating because text-based persuasion marginally crossed the medium threshold. Cybersecurity, biological threats, and model autonomy were rated low. The card reported that voice interactions did not exceed the low persuasion threshold.^[4]^[18]

Red teaming and voice safeguards

More than 100 external red teamers, covering 45 languages and 29 countries, tested checkpoints from early March through late June 2024.^[4] OpenAI restricted deployed audio output to preset voices created with voice actors and used a streaming classifier to block output that did not match an approved voice. The system card also describes safeguards for speaker identification, sensitive-trait inference, disallowed audio, and copyrighted content.^[4]

Documented limitations

The system card reports lower safety robustness with low-quality audio, background noise, echoes, and interruptions. Red teamers also observed non-native accents in some non-English output and could prompt the model to repeat misinformation or conspiracy theories. OpenAI warned that high-fidelity voice could increase anthropomorphism and miscalibrated trust.^[4] These documented limits replace the snapshot's uncited claims about tokenizer jailbreaks, a Sky routing outage, and specific retraining incidents.

Replacement and retirement

OpenAI released GPT-5 on August 7, 2025 and made it the default ChatGPT system, replacing GPT-4o and several other selectable models for signed-in users.^[24] The launch initially removed GPT-4o from the model picker. On August 12, OpenAI restored GPT-4o for paid users and added controls for selecting GPT-5 Auto, Fast, or Thinking.^[39]

OpenAI later said it restored GPT-4o because some Plus and Pro users wanted more time to move established workflows and preferred GPT-4o's conversational style and warmth. The company said that feedback influenced the personality controls and creative behavior of GPT-5.1 and GPT-5.2.^[7] These statements support the existence of a user preference for GPT-4o, but not the snapshot's broader claims about a quantified "revolt," a secondary market for imitation prompts, or industry-wide effects.

Deprecation status

OpenAI retired GPT-4o from ordinary ChatGPT conversations on February 13, 2026. At the time of the January 29 announcement, OpenAI said only about 0.1% of daily users still selected it.^[7] OpenAI's Help Center states that Business, Enterprise, and Edu customers retained GPT-4o within Custom GPTs until April 3, 2026.^[44]

The retirement was specific to ChatGPT. OpenAI stated that there were no API changes at that time, and its current API documentation still lists the rolling gpt-4o alias and dated snapshots. The same documentation marks gpt-4o-2024-08-06 as deprecated while continuing to list gpt-4o-2024-11-20.^[13]

Reception

Launch coverage focused on the short voice-response latency, the end-to-end multimodal design, free-tier access, and the lower API price relative to GPT-4 Turbo.^[8]^[9] LMSYS reported that GPT-4o and Claude 3.5 Sonnet led its June 2024 multimodal Arena results, while emphasizing that Arena scores are preference measurements rather than fixed capability tests.^[16]

Criticism centered on the gap between the May demonstrations and the later rollout of Advanced Voice Mode, and on the Sky voice dispute. The model card also documented technical limits, including audio robustness problems, non-native accents in some languages, misinformation, and the possibility that human-like audio could encourage anthropomorphism or misplaced trust.^[4]

References

^OpenAI. "Hello GPT-4o." May 13, 2024.
^OpenAI. "Introducing GPT-4o and more tools to ChatGPT free users." May 13, 2024.
^OpenAI. "GPT-4o System Card." August 8, 2024.
^OpenAI. "GPT-4o mini: advancing cost-efficient intelligence." July 18, 2024.
^OpenAI. "Introducing Structured Outputs in the API." August 6, 2024.
^OpenAI. "Retiring GPT-4o, GPT-4.1, GPT-4.1 mini, and OpenAI o4-mini in ChatGPT." January 29, 2026.
^Kyle Wiggers. "OpenAI debuts GPT-4o 'omni' model now powering ChatGPT." TechCrunch, May 13, 2024.
^Will Douglas Heaven. "OpenAI's new GPT-4o lets people interact using voice or video in the same model." MIT Technology Review, May 13, 2024.
^Bobby Allyn. "Scarlett Johansson wants answers about ChatGPT voice that sounds like 'Her'." NPR, May 20, 2024.
^Nitasha Tiku. "OpenAI didn't copy Scarlett Johansson's voice for ChatGPT, records show." The Washington Post, May 22, 2024.
^Matt Donnelly. "Scarlett Johansson Reacts to OpenAI Sky Voice: 'Shocked' and 'Angered'." Variety, May 20, 2024.
^OpenAI. "GPT-4o model documentation." OpenAI API documentation, accessed July 28, 2026.
^OpenAI. "tiktoken." GitHub repository, accessed July 28, 2026.
^LMSYS Org. "The Multimodal Arena is Here!" June 27, 2024.
^Ina Fried. "OpenAI debuts new model with enhanced real-time voice abilities." Axios, May 13, 2024.
^OpenAI et al. "GPT-4o System Card." arXiv:2410.21276, 2024.
^OpenAI. "Introducing the Realtime API." October 1, 2024, updated August 28, 2025.
^OpenAI. "OpenAI o1 and new tools for developers." December 17, 2024.
^OpenAI. "Introducing 4o Image Generation." March 25, 2025.
^OpenAI. "Introducing our latest image generation model in the API." April 23, 2025.
^OpenAI. "Memory and new controls for ChatGPT." February 13, 2024, with updates through June 3, 2025.
^OpenAI. "Introducing GPT-5." August 7, 2025.
^Kyle Wiggers. "OpenAI rolls out Advanced Voice Mode with more voices and a new look." TechCrunch, September 24, 2024.
^OpenAI. "Introducing next-generation audio models in the API." March 20, 2025.
^OpenAI. "Hello GPT-4o." Model availability and rate-limit comparison, May 13, 2024.
^Tanya Stivers et al. "Universals and cultural variation in turn-taking in conversation." Proceedings of the National Academy of Sciences 106, no. 26 (2009): 10587-10592.
^Benj Edwards. "Before launching, GPT-4o broke records on chatbot leaderboard under a secret name." Ars Technica, May 14, 2024.
^OpenAI. "Introducing gpt-realtime and Realtime API updates for production voice agents." August 28, 2025.
^OpenAI. "How the voices for ChatGPT were chosen." May 19, 2024, updated May 20, 2024.
^Associated Press. "Scarlett Johansson says a ChatGPT voice is 'eerily similar' to hers and OpenAI is halting its use." May 20, 2024.
^Microsoft. "Introducing GPT-4o: OpenAI's new flagship multimodal model now in preview on Azure." May 13, 2024.
^OpenAI. "ChatGPT release notes." Entries dated December 12, 2024; August 7 and 12, 2025; and February 13, 2026.
^OpenAI. "simple-evals." GitHub repository, main results table, accessed July 28, 2026.
^Associated Press. "ChatGPT's viral Studio Ghibli-style images highlight AI copyright concerns." March 27, 2025.
^Bobby Allyn. "Voice analysis shows striking similarity between Scarlett Johansson and ChatGPT." NPR, May 31, 2024; accessible NPR mirror.
^OpenAI. "gpt-image-1-mini model documentation." OpenAI API documentation, accessed July 28, 2026.
^OpenAI. "Retiring GPT-4o and other ChatGPT models." OpenAI Help Center, accessed July 28, 2026.

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

9 revisions by 1 contributors · v10 · 4,394 words · full history

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Suggest edit