Advanced Voice Mode

ChatGPT OpenAI Voice AI

10 min read

Updated Jun 27, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 27, 2026

Fact-checked

In review queue

Sources

12 citations

Revision

v3 · 1,984 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

Advanced Voice Mode is a real-time, spoken conversation feature in ChatGPT, developed by OpenAI, that lets users talk with the assistant using natural speech and receive spoken replies. It is built on the native, end-to-end audio capabilities of the GPT-4o model, which processes speech directly rather than first transcribing it to text. OpenAI first demonstrated the capability at the GPT-4o launch on May 13, 2024, began an alpha release to a small group of ChatGPT Plus users on July 30, 2024, and expanded it to all Plus and Team subscribers beginning September 24, 2024.^[1]^[2]^[3] OpenAI reported that GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, a latency comparable to human response time in conversation, versus averages of 2.8 seconds and 5.4 seconds for the earlier GPT-3.5 and GPT-4 voice pipelines.^[1] The feature is distinct from the GPT-4o model itself, from the developer-facing Realtime API, and from OpenAI's separate Voice Engine voice-cloning research.

What is Advanced Voice Mode?

Advanced Voice Mode is the consumer-facing ChatGPT experience that turns spoken audio into a fluid, two-way conversation with the assistant. It is the successor to ChatGPT's earlier voice feature and is defined by three traits: very low latency, the ability to perceive and express tone and emotion, and the ability to be interrupted mid-sentence so the conversation feels like talking to a person rather than dictating to a transcription tool.

ChatGPT first gained spoken conversation on September 25, 2023, when OpenAI introduced a voice feature for Plus and Enterprise subscribers in its mobile apps.^[4] That original system, now referred to as Standard Voice Mode, worked by chaining three separate models together: OpenAI's open-source Whisper model transcribed the user's speech to text, a text model such as GPT-4 generated a written reply, and a separate text-to-speech model read that reply aloud.^[1]^[2] Because each step ran sequentially, the pipeline introduced noticeable latency, averaging 2.8 seconds with GPT-3.5 and 5.4 seconds with GPT-4, and discarded information such as tone, emotion, and background sound along the way.^[1]

Advanced Voice Mode replaced that pipeline with a single model. With GPT-4o, OpenAI trained one neural network end-to-end across text, vision, and audio, so that audio input and audio output are handled by the same model without an intermediate text-conversion step.^[1] This allows the model to perceive characteristics of speech that a transcript cannot capture, and to respond with a spoken voice that can vary in tone and pacing.

How does Advanced Voice Mode work?

The defining technical characteristic of Advanced Voice Mode is that it relies on GPT-4o's native speech-to-speech processing. Rather than converting audio to text, reasoning over the text, and synthesizing new audio with three different systems, GPT-4o accepts audio directly and emits audio directly. OpenAI stated that all inputs and outputs are processed by the same neural network, and that because the model is multimodal it can "directly observe tone, multiple speakers, or background noises."^[1]

This single-model design produces two practical effects. First, it reduces latency dramatically compared with the chained Standard Voice Mode, enabling near-instant, back-and-forth exchanges in which the user can interrupt the assistant mid-sentence and have it stop and respond. Second, it preserves and can generate paralinguistic information: GPT-4o can detect emotional cues in a speaker's voice, such as whether they sound sad or excited, and can modulate its own delivery, including speaking in different emotive styles, varying its speed, and in some demonstrations singing.^[1]^[2] These behaviors are properties of the underlying GPT-4o model; Advanced Voice Mode is the consumer-facing ChatGPT feature that exposes them in a live conversational interface.

How is it different from the Realtime API and Voice Engine?

It is worth distinguishing Advanced Voice Mode from related OpenAI systems. The Realtime API, introduced for developers in October 2024, provides programmatic access to the same low-latency speech-to-speech capability for building third-party applications. Voice Engine is a separate research model that clones a specific person's voice from a short audio sample, and it is not the technology behind ChatGPT's preset assistant voices. Advanced Voice Mode specifically refers to the in-app ChatGPT experience.

What are the main features of Advanced Voice Mode?

Advanced Voice Mode is designed around natural, free-flowing conversation. Its principal features include:

Real-time, interruptible dialogue, allowing users to break in while the assistant is speaking.
Perception of tone and emotion in the user's speech, and the ability to convey emotion and varied intonation in its own responses.^[1]
Multiple preset voices. At the alpha launch, four voices were available: Juniper, Breeze, Cove, and Ember.^[2] The September 2024 general release added five more, Arbor, Maple, Sol, Spruce, and Vale, bringing the total to nine, alongside improved accents in supported languages.^[3]
Support for Custom Instructions and Memory, so the assistant can apply user preferences and recall details from earlier conversations.^[3]
Live video and screen sharing, added on December 12, 2024, which let users point a phone camera at objects or share their screen and ask ChatGPT about what it sees in real time.^[5]

When Advanced Voice limits are reached, or when it is otherwise unavailable, ChatGPT falls back to the older Standard Voice Mode. A version of Advanced Voice powered by GPT-4o mini later reached free users; OpenAI began rolling out a daily preview to all free ChatGPT users on February 25, 2025.^[6]^[7]

When was Advanced Voice Mode released?

Date	Milestone
September 25, 2023	Standard Voice Mode (Whisper plus text-to-speech) launches for Plus and Enterprise users^[4]
May 13, 2024	GPT-4o launched; native voice capability demonstrated at OpenAI's Spring Update^[1]
May 19, 2024	OpenAI pauses the "Sky" voice^[8]
July 30, 2024	Advanced Voice Mode alpha released to a small group of ChatGPT Plus users^[2]
September 24, 2024	Broad rollout begins to all Plus and Team users, with new voices and a new look^[3]
October 22, 2024	Advanced Voice becomes available to Plus users in the EU, Switzerland, Iceland, Norway, and Liechtenstein^[9]
December 12, 2024	Live video and screen sharing added during the "12 Days of OpenAI"^[5]
February 25, 2025	A GPT-4o mini-powered version begins rolling out to free users as a daily preview^[6]^[7]
November 25, 2025	Voice integrated directly into the chat thread, retiring the separate full-screen interface as the default^[10]^[11]

When the general rollout began on September 24, 2024, OpenAI also gave the feature a new visual identity, replacing the animated black dots shown at the May demo with a blue animated sphere.^[3] Enterprise and Edu customers received access in the week following the Plus and Team launch.^[3] The feature was initially unavailable in the European Union, the United Kingdom, Switzerland, Iceland, Norway, and Liechtenstein; OpenAI said that some regions require additional external reviews before launch.^[3]^[9]

What was the Sky voice controversy?

One of the five original ChatGPT voices, named Sky, became the subject of a public dispute in May 2024. After the GPT-4o demo on May 13, 2024, listeners compared Sky to the voice of actress Scarlett Johansson, who had voiced an artificial intelligence assistant in the 2013 film "Her." On the day of the demo, OpenAI chief executive Sam Altman posted the single word "her" on social media, which many interpreted as a reference to the film.^[10]

According to a statement Johansson released on May 20, 2024, Altman had approached her in September 2023 to voice the system, suggesting that her involvement could help bridge the gap between technology companies and creatives. She declined the offer. She said that two days before the May demo, Altman contacted her agent asking her to reconsider, before she was able to respond. Johansson said she was "shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine," and that she was "forced to hire legal counsel."^[10]

OpenAI paused the use of Sky in its products as of May 19, 2024.^[8] In a statement, Altman said, "The voice of Sky is not Scarlett Johansson's, and it was never intended to resemble hers. We cast the voice actor behind Sky's voice before any outreach to Ms. Johansson. Out of respect for Ms. Johansson, we have paused using Sky's voice in our products."^[8] In a blog post published the same day describing how the voices were chosen, OpenAI said the actors were selected through a multi-month casting process, that more than 400 submissions were reviewed before five voices were chosen, and that none of the voices were picked for similarity to any celebrity, with each being the talent's natural speaking voice.^[8]

What changed in the 2025 updates?

The largest change after the 2024 launch came on November 25, 2025, when OpenAI moved Advanced Voice into the main chat thread instead of a dedicated full-screen mode. Voice conversations now run inline alongside the text transcript, so users can speak, watch replies appear as text, scroll back through earlier messages, and see visuals such as images, maps, and other on-screen content while talking.^[10]^[11] OpenAI described the change directly: "You can now use ChatGPT Voice right inside chat, no separate mode needed. Talk, watch answers appear, review earlier messages, and see visuals like images or maps in real time."^[11] The earlier full-screen experience, an animated blue circle with no visible transcript, remained available for users who preferred it through a "Separate mode" toggle in ChatGPT's voice settings, but the in-chat experience became the new default across web and mobile apps.^[10]^[11]

By late 2025, OpenAI also detailed how voice usage is metered across plans. For paying subscribers, voice sessions automatically begin with GPT-4o and switch to GPT-4o mini once the daily GPT-4o allotment is used up, with daily voice usage described as nearly unlimited. Logged-in free users run on GPT-4o mini from the start, subject to a limit of about two hours per day.^[12]

How was Advanced Voice Mode received?

Advanced Voice Mode drew substantial attention for the naturalness of its conversations, with coverage describing the feature as "hyperrealistic" and noting how closely its pacing and emotional range approached human speech.^[2] The roughly seven-month gap between the May 2024 demo and the December 2024 arrival of live video, a capability shown at the original unveiling, was a recurring point in press coverage.^[5] The feature's delayed availability in Europe, attributed by some commentators to regulatory review under the EU AI Act and by OpenAI to standard external review processes, also generated discussion before the October 2024 EU rollout.^[9] The Sky episode became a widely cited example in debates over voice likeness, consent, and the rights of performers in generative AI.^[10]

References

OpenAI, "Hello GPT-4o," May 13, 2024. https://openai.com/index/hello-gpt-4o/ ↩
Maxwell Zeff, "OpenAI releases ChatGPT's hyperrealistic voice to some paying users," TechCrunch, July 30, 2024. https://techcrunch.com/2024/07/30/openai-releases-chatgpts-super-realistic-voice-feature/ ↩
Maxwell Zeff, "OpenAI rolls out Advanced Voice Mode with more voices and a new look," TechCrunch, September 24, 2024. https://techcrunch.com/2024/09/24/openai-rolls-out-advanced-voice-mode-with-more-voices-and-a-new-look/ ↩
Kyle Wiggers, "OpenAI gives ChatGPT a voice for verbal conversations," TechCrunch, September 25, 2023. https://techcrunch.com/2023/09/25/openai-chatgpt-voice/ ↩
Maxwell Zeff, "ChatGPT now understands real-time video, seven months after OpenAI first demoed it," TechCrunch, December 12, 2024. https://techcrunch.com/2024/12/12/chatgpt-now-understands-real-time-video-seven-months-after-openai-first-demoed-it/ ↩
OpenAI (@OpenAI), post announcing GPT-4o mini-powered Advanced Voice for free users, X, February 25, 2025. https://x.com/OpenAI/status/1894495906952876101 ↩
Sayan Sen, "ChatGPT's Advanced Voice Mode comes to free users with usage limits," Neowin, February 26, 2025. https://www.neowin.net/news/chatgpts-advanced-voice-mode-comes-to-free-users-with-usage-limits/ ↩
Bobby Allyn, "Scarlett Johansson wants answers about ChatGPT voice that sounds like 'Her'," NPR, May 20, 2024. https://www.npr.org/2024/05/20/1252495087/openai-pulls-ai-voice-that-was-compared-to-scarlett-johansson-in-the-movie-her ↩
"Here's why ChatGPT's Advanced Voice Mode isn't available in the EU region," Windows Central, October 2024. https://www.windowscentral.com/software-apps/chatgpt-advanced-voice-mode-feature-barred-by-eu ↩
Maxwell Zeff, "ChatGPT's voice mode is no longer a separate interface," TechCrunch, November 25, 2025. https://techcrunch.com/2025/11/25/chatgpts-voice-mode-is-no-longer-a-separate-interface/ ↩
Karissa Bell, "Now you can use ChatGPT Voice without leaving your chat," Engadget, November 25, 2025. https://www.engadget.com/ai/now-you-can-use-chatgpt-voice-without-leaving-your-chat-195000538.html ↩
OpenAI Help Center, "Voice Mode FAQ." https://help.openai.com/en/articles/8400625-voice-mode-faq ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

2 revisions by 1 contributors · full history

Suggest edit

What links here

ChatGPT ChatGPT Guides ChatGPT Plus ChatGPT Pro Claude vs ChatGPT GPT-4o Voice AI Voice Engine (OpenAI)

What is Advanced Voice Mode?

How does Advanced Voice Mode work?

How is it different from the Realtime API and Voice Engine?

What are the main features of Advanced Voice Mode?

When was Advanced Voice Mode released?

What was the Sky voice controversy?

What changed in the 2025 updates?

How was Advanced Voice Mode received?

References

Improve this article

Related Articles

OpenAI Realtime API

GPT-Realtime / OpenAI Realtime API

Voice Engine (OpenAI)

ChatGPT

ChatGPT Guides

ChatGPT plugins

What links here

Related Articles

OpenAI Realtime API

GPT-Realtime / OpenAI Realtime API

Voice Engine (OpenAI)

ChatGPT

ChatGPT Guides

ChatGPT plugins

What links here