Voice Engine (OpenAI)

Updated Jun 28, 2026

Voice Engine is a speech-generation and voice-cloning model developed by OpenAI. It can produce natural-sounding speech resembling a specific person from a single audio sample as short as 15 seconds. OpenAI publicly previewed the model on March 29, 2024, in a blog post titled "Navigating the challenges and opportunities of synthetic voices." It declined to release the model widely, citing the risks of synthetic-voice misuse during a year of major elections.^[1]^[2] As of March 2025, nearly a year after the preview, OpenAI still had not made Voice Engine generally available and kept it restricted to a small set of trusted partners.^[3]

Voice Engine is the underlying model that powers OpenAI's text-to-speech preset voices and the spoken-output features in ChatGPT, and it is distinct from Whisper (OpenAI's speech-to-text system) and from Advanced Voice Mode (the real-time conversational voice feature in ChatGPT).^[1]^[4]

What is OpenAI Voice Engine?

OpenAI says it first developed Voice Engine in late 2022. It has since used the model to provide the preset voices in its text-to-speech API and the ChatGPT Voice and Read Aloud features.^[1]^[4] The same model can also generate speech in multiple languages, including languages other than the original speaker's, which underpins some of the translation use cases explored by partners.^[2]^[5] The model was already in production behind these features. The March 2024 preview was therefore less a debut of new technology than the first public acknowledgment of the system behind OpenAI's existing voice products.^[6]

The model was originally referred to internally as "Custom Voices." OpenAI had reportedly planned to bring it to the API on March 7, 2024, for a group of up to 100 developers building applications with a clear social benefit or "innovative and responsible" uses. The proposed pricing was about $15 per one million characters for standard voices and $30 per million for "HD" quality, which TechCrunch estimated at roughly $1 per hour of audio.^[3]^[6] The company postponed that launch at the last minute. About three weeks later it unveiled Voice Engine as a limited preview with no public sign-up, open only to a cohort of around 10 developers it had begun working with in late 2023.^[3]^[6]
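
The reported per-character pricing turns into per-request and per-hour costs with simple arithmetic. The sketch below uses only the two reported prices. The function names are hypothetical. The speaking rate of 54,000 characters per hour (about 150 words per minute at six characters per word, including spaces) is an illustrative assumption, not a figure from the sources.

```python
# Back-of-the-envelope cost estimate for Voice Engine's reported (never launched) pricing.
# Function names and the speaking-rate assumption are hypothetical.

# Reported proposed prices, in US dollars per one million input characters.
PRICE_PER_MILLION_CHARS = {
    "standard": 15,
    "hd": 30,
}

# Illustrative assumption, not from the sources: ~150 words per minute,
# ~6 characters per word including the trailing space.
ASSUMED_CHARS_PER_HOUR = 150 * 6 * 60  # 54,000 characters


def cost_usd(num_chars: int, tier: str = "standard") -> float:
    """Cost in dollars of synthesizing `num_chars` characters at the given tier."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars * PRICE_PER_MILLION_CHARS[tier] / 1_000_000


def cost_per_hour_usd(tier: str = "standard",
                      chars_per_hour: int = ASSUMED_CHARS_PER_HOUR) -> float:
    """Estimated cost of one hour of generated audio under a speaking-rate assumption."""
    return cost_usd(chars_per_hour, tier)


if __name__ == "__main__":
    for tier in PRICE_PER_MILLION_CHARS:
        print(f"{tier}: ${cost_per_hour_usd(tier):.2f} per hour of audio")
```

Under that assumption, one hour of audio costs about $0.81 at the standard rate and $1.62 at the HD rate. The estimate scales linearly with the assumed speaking rate, so treat it only as a rough order of magnitude.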

When was Voice Engine released?

Voice Engine has never been released to the general public. OpenAI previewed it on March 29, 2024, and described it as a "small scale preview" shared with about 10 developers rather than a product launch.^[1]^[6] The table below summarizes the model's timeline.

Date	Event
Late 2022	OpenAI develops Voice Engine internally; the model later powers preset TTS voices, ChatGPT Voice, and Read Aloud^[1]^[6]
Late 2023	OpenAI begins privately testing the model with a small group of trusted partners^[3]
March 7, 2024	Planned API rollout to up to ~100 developers (postponed at the last minute)^[3]^[6]
March 29, 2024	Public preview published; not released widely on safety grounds^[1]^[2]
March 2025	One year on, still in preview with no announced launch date^[3]

How does Voice Engine work?

Voice Engine takes a short reference recording of a target speaker, around 15 seconds long, together with a passage of input text, and generates speech that reads the text in a voice resembling the reference speaker.^[1]^[2] According to reporting on the preview, the model uses a combination of diffusion and transformer techniques, and the reference audio submitted to the system is discarded after a request is completed.^[6] The preview did not expose fine-grained controls for adjusting tone, pitch, or cadence.^[6]
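
The input/output contract described above can be sketched as follows. Voice Engine has never had a public API, so every name here is hypothetical. That includes `ReferenceClip`, `speak_in_reference_voice`, the `synthesize` callback standing in for the model, and the ±5-second tolerance. Only three details come from the reporting: the roughly 15-second reference, the text input, and the discarding of the reference audio after the request.

```python
# Hypothetical sketch of the reported Voice Engine request contract.
# None of these names are OpenAI interfaces; `synthesize` stands in for the model.
from dataclasses import dataclass
from typing import Callable, List

TARGET_SECONDS = 15.0    # reported length of the reference sample
TOLERANCE_SECONDS = 5.0  # hypothetical slack around the ~15 s target


@dataclass
class ReferenceClip:
    samples: List[float]  # mono PCM samples in [-1.0, 1.0]
    sample_rate: int      # samples per second

    @property
    def duration(self) -> float:
        return len(self.samples) / self.sample_rate


# Stand-in for the unreleased model: (reference clip, text) -> output samples.
Synthesizer = Callable[[ReferenceClip, str], List[float]]


def speak_in_reference_voice(clip: ReferenceClip, text: str,
                             synthesize: Synthesizer) -> List[float]:
    """Validate inputs, run synthesis, then discard the reference audio."""
    if abs(clip.duration - TARGET_SECONDS) > TOLERANCE_SECONDS:
        raise ValueError(f"reference clip is {clip.duration:.1f} s; expected about 15 s")
    if not text.strip():
        raise ValueError("input text is empty")
    try:
        return synthesize(clip, text)
    finally:
        # Mirror the reported behavior: the reference is not retained after the request.
        clip.samples.clear()
```

Clearing the buffer in a `finally` clause means the reference is dropped whether or not synthesis succeeds. This is one simple way to express "discarded after the request completes."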

The table below summarizes the model's reported capabilities.

Capability	Detail
Reference sample	Single audio clip of roughly 15 seconds
Output	Natural-sounding speech reading arbitrary text in the reference voice
Languages	Multiple languages, including languages other than the speaker's own
Architecture (reported)	Combined diffusion and transformer approach
Reference handling	Submitted audio discarded after the request completes
Provenance	Generated audio carries a watermark to trace its origin
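
OpenAI has not published how its watermark works. As a generic illustration only, the sketch below implements a textbook spread-spectrum scheme. A secret key seeds a pseudorandom ±1 pattern, which is added to the audio at low amplitude. Detection correlates the audio with the same keyed pattern. The function names, the 0.05 strength and the detection threshold are hypothetical toy choices, not Voice Engine parameters.

```python
# Toy spread-spectrum audio watermark: embed a keyed pattern, detect it by correlation.
# Generic illustration only; OpenAI has not published Voice Engine's watermarking method.
import random
from typing import List

STRENGTH = 0.05  # hypothetical watermark amplitude (full scale is 1.0)


def key_sequence(key: int, n: int) -> List[float]:
    """Pseudorandom +1/-1 pattern of length n derived from a secret integer key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed(samples: List[float], key: int, strength: float = STRENGTH) -> List[float]:
    """Add the keyed pattern at low amplitude (no clipping, for simplicity)."""
    pattern = key_sequence(key, len(samples))
    return [s + strength * p for s, p in zip(samples, pattern)]


def detect(samples: List[float], key: int, strength: float = STRENGTH) -> bool:
    """Correlate with the keyed pattern; marked audio scores near `strength`, unmarked near 0."""
    if not samples:
        return False
    pattern = key_sequence(key, len(samples))
    score = sum(s * p for s, p in zip(samples, pattern)) / len(samples)
    return score > strength / 2


if __name__ == "__main__":
    rng = random.Random(0)
    audio = [rng.uniform(-1.0, 1.0) for _ in range(48_000)]  # 1 s of noise at 48 kHz
    marked = embed(audio, key=1234)
    print("marked, right key:", detect(marked, key=1234))
    print("unmarked:", detect(audio, key=1234))
    print("marked, wrong key:", detect(marked, key=999))
```

Here one second of random noise stands in for speech. With 48,000 samples, the correlation of unmarked audio stays many standard deviations below the threshold. The detector should therefore flag only the marked copy checked with the right key. Detection depends on the exact keyed pattern surviving, so filtering, resampling or re-recording can weaken the score. This illustrates why watermarks can be stripped or bypassed.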

How is Voice Engine different from Whisper and Advanced Voice Mode?

Voice Engine is separate from OpenAI's other audio systems. Whisper transcribes speech into text and does not generate audio. Advanced Voice Mode is a low-latency, interactive spoken-conversation capability built into ChatGPT. Voice Engine, by contrast, is a text-to-speech model that synthesizes audio output from text.^[1]^[4]

Who got access to Voice Engine and what was it used for?

Rather than a public launch, OpenAI shared Voice Engine with a small group of partners who agreed to its usage policies, including obtaining the explicit consent of any person whose voice was used, not impersonating individuals or organizations without permission, and disclosing to audiences that the voices were AI-generated.^[1]^[2] OpenAI highlighted several partners and the ways they were testing the model.

Partner	Field	Reported use case
Age of Learning	Education	Generating natural, emotive voice-over for pre-scripted educational content aimed at non-readers and children
HeyGen	Video / media	Translating video content so creators and businesses can reach audiences in multiple languages
Dimagi	Global health	Providing interactive feedback to community health workers in their native languages, including Swahili and a mix of Swahili and English (Sheng)
Livox	Accessibility	Powering augmentative and alternative communication (AAC) devices with more natural, distinct voices in multiple languages for people with disabilities
Lifespan, Norman Prince Neurosciences Institute	Healthcare	Exploring clinical use to restore the voices of patients with speech impairments from conditions such as brain tumors

The most widely cited example came from the Norman Prince Neurosciences Institute at Lifespan, a nonprofit health system affiliated with Brown University. Clinicians there, including Rohaid Ali and pediatric neurosurgeon Konstantina Svokos, used Voice Engine to help restore the voice of a young patient who had lost fluent speech because of a brain tumor, recreating her voice from a recording made for a school project before her condition worsened.^[2]^[5] OpenAI presented these pilots as illustrations of potential benefits in education, accessibility, translation, and care for people who have lost the ability to speak.^[1]

Why didn't OpenAI release Voice Engine widely?

OpenAI explicitly framed the preview as a discussion about safety rather than a product launch. In its announcement the company wrote, "We recognize that generating speech that resembles people's voices has serious risks, which are especially top of mind in an election year."^[1]^[2] It said it would preview the model with early testers but would not release the technology widely at that time because of the potential for misuse.^[2] The company noted that 2024 would see voting in more than 80 countries. The announcement also came amid real-world misuse of synthetic audio. In January 2024, a robocall in New Hampshire used an AI-generated imitation of U.S. President Joe Biden's voice, which prompted action from the U.S. Federal Communications Commission. Separately, AI-generated speeches had been released in the voice of Pakistan's jailed former prime minister Imran Khan.^[2]^[6]

OpenAI said it implemented technical safeguards for the preview, including watermarking generated audio to trace its origin and proactively monitoring how partners used the system.^[1]^[2] It also recommended steps that it believed society should take alongside any broad deployment of synthetic-voice technology.

Recommendation	Description
Phase out voice authentication	Stop using voice as a security factor for accessing bank accounts and other sensitive information
No-go voice list	Maintain lists that detect and prevent the creation of voices too similar to prominent figures
Provenance and watermarking	Accelerate techniques for tracking the origin of audiovisual content, such as watermarking
Public education	Educate the public about the capabilities and limits of AI, including the possibility of deceptive AI-generated content
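
One way a no-go voice list could work is to compare a speaker embedding of the requested voice against embeddings of protected voices. Voice creation would be refused when the similarity crosses a threshold. OpenAI has not said how it would build such a list. The sketch below assumes the embeddings already exist and were produced by some speaker-verification model, which is not shown. The names, the three-dimensional vectors and the 0.85 threshold are all hypothetical.

```python
# Toy "no-go voice list" check using cosine similarity of speaker embeddings.
# Embeddings, names, and threshold are hypothetical illustrations.
import math
from typing import Dict, List, Optional

SIMILARITY_THRESHOLD = 0.85  # hypothetical cut-off; real systems would tune this on data


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def blocked_match(candidate: List[float], no_go: Dict[str, List[float]],
                  threshold: float = SIMILARITY_THRESHOLD) -> Optional[str]:
    """Return the most similar no-go entry at or above the threshold, else None."""
    best_name, best_score = None, threshold
    for name, reference in no_go.items():
        score = cosine_similarity(candidate, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    no_go = {"public_figure_a": [1.0, 0.0, 0.0], "public_figure_b": [0.0, 1.0, 0.0]}
    print(blocked_match([0.9, 0.1, 0.0], no_go))  # close to figure A
    print(blocked_match([0.0, 0.0, 1.0], no_go))  # unlike either entry
```

The hard parts in practice sit outside this sketch: choosing which voices belong on the list, and picking a threshold that blocks close imitations without rejecting ordinary speakers who happen to sound similar.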

OpenAI added that it was engaging with partners across government, media, entertainment, education, and civil society, and that policies protecting the use of individuals' voices in AI should be explored.^[1]^[2] Safeguards such as comprehensive no-go voice lists and the phase-out of voice authentication are technically and institutionally demanding to put in place. The model remained in limited preview with no announced timeline for general availability.^[3]

How was Voice Engine received?

Coverage of the preview emphasized the tension between the technology's potential and its risks. Outlets including TechCrunch, NBC News, VentureBeat, and Al Jazeera described Voice Engine as impressive but characterized OpenAI's decision to withhold it as a recognition that the tool was, in effect, too risky for unrestricted public release in 2024.^[2]^[3]^[6] Reporters noted that synthetic-voice cloning had become a fast-growing vector for fraud and that proposed safeguards such as watermarking can be difficult to enforce because watermarks may be stripped or bypassed.^[3]

A year after the preview, in March 2025, TechCrunch reported that Voice Engine still had not been released publicly and remained limited to roughly the same small group of partners. An OpenAI spokesperson said the company was "learning from how [our partners are] using the technology so we can improve the model's usefulness and safety."^[3] The article also reported that at least one partner, the accessibility company Livox, ultimately could not build Voice Engine into a product because the model's online requirement was difficult to reconcile with customers who depend on offline devices; Livox CEO Carlos Pereira said many of the company's users do not have internet access.^[3] The prolonged, deliberately cautious rollout was widely cited as an example of an AI developer restraining the release of a capable model on safety grounds.^[3]^[6]

References

1. OpenAI, "Navigating the challenges and opportunities of synthetic voices," March 29, 2024. https://openai.com/index/navigating-the-challenges-and-opportunities-of-synthetic-voices/
2. Al Jazeera, "OpenAI debuts voice cloning tool, but deems it too risky for public release," April 1, 2024. https://www.aljazeera.com/economy/2024/4/1/openai-unveils-voice-cloning-tool-but-deems-it-too-risky-for-public-release
3. TechCrunch, "A year later, OpenAI still hasn't released its voice cloning tool," March 6, 2025. https://techcrunch.com/2025/03/06/a-year-later-openai-still-hasnt-released-its-voice-cloning-tool/
4. OpenAI, "Expanding on how Voice Engine works and our safety research," March 29, 2024. https://openai.com/index/expanding-on-how-voice-engine-works-and-our-safety-research/
5. FoneArena, "OpenAI unveils 'Voice Engine' text-to-voice AI model," March 30, 2024. https://www.fonearena.com/blog/420461/openai-voice-engine-text-to-voice-ai-model.html
6. TechCrunch, "OpenAI built a voice cloning tool, but you can't use it... yet," March 29, 2024. https://techcrunch.com/2024/03/29/openai-custom-voice-engine-preview/
7. PBS News, "OpenAI reveals Voice Engine, but won't yet release it publicly due to safety concerns," March 29, 2024. https://www.pbs.org/newshour/economy/openai-reveals-voice-engine-but-wont-yet-release-it-publicly-due-to-safety-concerns
