Resemble AI
Last reviewed
Jun 4, 2026
Sources
22 citations
Review status
Source-backed
Revision
v1 · 2,267 words
Resemble AI is a generative voice and AI security company based in San Francisco, California, and originally founded in Toronto, Canada. It builds tools for synthetic speech, including voice cloning, text-to-speech, speech-to-speech conversion, real-time voice, and multilingual localization, and it has increasingly positioned itself as a provider of AI media authentication through its Detect deepfake detector and its PerTh neural audio watermarker. The company was founded in 2019 by Zohaib Ahmed and Saqib Muhammad. By mid-2023 it reported more than one million users, and in 2025 it released Chatterbox, an open-source text-to-speech model that became one of the most-downloaded community voice models. Resemble AI is also notable for its public stance on responsible synthetic media: its co-founder and CEO testified before the United States Senate in 2024 on the risks of election deepfakes.
Resemble AI was started in 2019 by Zohaib Ahmed and Saqib Muhammad. Ahmed, the chief executive, had previously worked as a software engineer at Magic Leap, the travel-search startup Hipmunk, and BlackBerry, and studied at the University of Toronto. At Magic Leap he worked on novel interface interactions and on analyzing user data with deep learning, which drew him toward synthetic media and speech synthesis. According to the founders, the original idea grew out of an observation about video games: the recorded voice lines in games could not keep pace with the frequent updates and content changes shipped for the games themselves, so a way to generate new lines in a consistent voice would be valuable. The company was based in Toronto in its early years and later established a presence in San Francisco.
Before launching its commercial cloning product, Resemble open-sourced Resemblyzer, a speaker-verification model that takes a few seconds of speech and produces a representation of the voice. That representation can be used to compare voices and to help flag whether other audio is real or synthetic. This early work on speaker recognition and voice verification foreshadowed the company's later move into deepfake detection.
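As a rough illustration of how such a voice representation can be compared, the sketch below assumes each clip has already been reduced to a fixed-length vector and scores two vectors by cosine similarity. The toy four-dimensional vectors, the `cosine_similarity` and `same_speaker` helpers, and the 0.75 threshold are all hypothetical; this is not Resemblyzer's API or its actual values.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a, b, threshold=0.75):
    """Hypothetical decision rule: similar enough counts as the same voice."""
    return cosine_similarity(a, b) >= threshold


# Toy embeddings (assumed values, for illustration only).
enrolled = [0.9, 0.1, 0.3, 0.2]
same_voice = [0.8, 0.2, 0.35, 0.15]
other_voice = [0.1, 0.9, 0.1, 0.6]

print(same_speaker(enrolled, same_voice))   # similar direction
print(same_speaker(enrolled, other_voice))  # different direction
```

In practice the threshold would be tuned on labeled pairs of clips rather than fixed by hand.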
Resemble AI raised an initial seed round in 2019 and added further early-stage capital over the following years, reaching roughly $4 million in total funding by 2021 according to startup-tracking databases. Its first widely reported institutional round came in July 2023.
In July 2023 the company announced an $8 million round, reported by TechCrunch, Voicebot, Yahoo Finance, and Venture Capital Journal and generally described in the press as a Series A. Javelin Venture Partners and Comcast Ventures led the round, with participation from Craft Ventures and Ubiquity Ventures. At the time, Resemble said the financing brought its total raised to about $12 million. The company launched its Detect deepfake voice detector alongside this announcement, signaling a strategic pivot toward AI security in addition to voice generation.
In December 2025 Resemble AI announced a further $13 million strategic round that brought its total venture funding to roughly $25 million. The round drew a notably broad set of corporate and strategic investors. The company tied the new capital to a sharp rise in deepfake-enabled fraud, citing an estimate of about $1.56 billion in deepfake-related losses in 2025 and projections that generative AI could enable far larger US fraud losses in the following years. Alongside the round it announced its DETECT-3B Omni multimodal detection model and a Gemini-powered explainability layer it calls Resemble Intelligence.
Note that funding databases such as Crunchbase classify several of these rounds as seed or early-stage rather than by a formal Series letter, while the financial press labeled the 2023 raise a Series A. The amounts and dates below are consistent across sources; the round naming is the main point of variation.
| Round | Date | Amount | Lead / notable investors | Cumulative total |
|---|---|---|---|---|
| Seed and early-stage | 2019 to 2021 | ~$4M (combined) | Early-stage backers | ~$4M |
| Series A (per press) | July 2023 | $8M | Javelin Venture Partners, Comcast Ventures (leads); Craft Ventures, Ubiquity Ventures | ~$12M |
| Strategic round | December 2025 | $13M | Sony Innovation Fund, Okta Ventures, Google's AI Futures Fund, KDDI Open Innovation Fund, Taiwania Capital, Gentree Fund, IAG Capital Partners, Berkeley Frontier Fund, Wa'ed Ventures, plus returning investors Comcast Ventures, Craft Ventures, Javelin, Ubiquity | ~$25M |
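The cumulative column can be checked with a running sum over the approximate round sizes in the table. This is only a consistency check on the rounded figures above (in millions of US dollars), not additional funding data.

```python
from itertools import accumulate

# Approximate round sizes in millions of USD, taken from the table above.
rounds = [
    ("Seed and early-stage", 4),
    ("Series A (per press)", 8),
    ("Strategic round", 13),
]

# Running total after each round.
cumulative = list(accumulate(amount for _, amount in rounds))

for (name, amount), total in zip(rounds, cumulative):
    print(f"{name}: ${amount}M, cumulative ~${total}M")
```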
Resemble AI offers a generative voice toolkit alongside a separate suite of detection and watermarking products. The voice tools let users create and control synthetic voices for media, gaming, accessibility, and conversational applications, while the security tools are aimed at enterprises and governments that need to verify whether audio, images, or video are authentic.
The core platform is an API and web app for generative AI voices. Users can build a synthetic voice from recorded audio and then drive it with text or with other speech.
The company offers two cloning paths. Rapid Voice Cloning produces a usable voice from a very short reference clip (the company has cited figures of around 10 seconds for the first version and roughly 20 seconds for Rapid Voice Cloning 2.0), with the clone ready in under a minute. A higher-fidelity Professional Clone is trained on a larger set of varied speech, on the order of 10 to 25 minutes or more, and takes longer to prepare. The platform supports emotion and style control so that a given line can be delivered with different tone or intensity, and it offers real-time and speech-to-speech voice conversion for conversational agents and live applications.
Localize is Resemble's dubbing and translation feature. It lets teams take a custom or marketplace voice and reproduce content in a large number of additional languages using speech-to-speech dubbing, so that the same voice identity carries across localized versions of video or audio. The platform also includes a neural audio editor with a fill feature that blends recorded speech with generated speech, letting editors swap individual words, adjust delivery, or fix mistakes without re-recording the original talent.
In 2025 Resemble AI released Chatterbox, an open-source text-to-speech model family distributed on GitHub and Hugging Face under the permissive MIT license, which allows commercial use without royalties or usage caps. The original English model was released in mid-2025, followed by a multilingual version supporting more than 20 languages and a smaller, faster Turbo variant.
Chatterbox supports zero-shot voice cloning from roughly five seconds of reference audio and includes built-in PerTh watermarking on every output. Resemble describes it as the first open-source model with an emotion-exaggeration control that lets users dial delivery from flat to highly expressive with a single parameter, and the Turbo model adds native paralinguistic tags such as [laugh], [cough], [sigh], and [whisper]. The original and multilingual models use roughly a 0.5-billion-parameter (500M) architecture, while Chatterbox Turbo uses a leaner 350M-parameter design that Resemble says runs several times faster than real time on a single GPU. The company has published blind-comparison results, run through the evaluation service Podonos, in which a majority of listeners preferred Chatterbox over ElevenLabs in head-to-head tests. The repository became popular quickly, accumulating tens of thousands of GitHub stars.
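To show what bracketed paralinguistic tags look like inside a script, the sketch below splits a line into spoken text and tag segments and rejects tags other than the four named above. The `split_script` helper and its output format are hypothetical pre-processing written for illustration; this is not Chatterbox's own parser, and the model may accept tags this sketch does not know about.

```python
import re

# Only the tags named in the text above; the real model may support more.
KNOWN_TAGS = {"laugh", "cough", "sigh", "whisper"}
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def split_script(script):
    """Split a script into ordered ("text", str) and ("tag", str) segments."""
    segments = []
    pos = 0
    for match in TAG_PATTERN.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            segments.append(("text", text))
        tag = match.group(1)
        if tag not in KNOWN_TAGS:
            raise ValueError(f"unknown tag: [{tag}]")
        segments.append(("tag", tag))
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


segments = split_script("Well [sigh] that was unexpected. [laugh]")
for kind, value in segments:
    print(kind, value)
```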
Resemble Detect is the company's deepfake detection product, first launched in 2023 and substantially expanded since. It analyzes audio (and, in later versions, images and video) to determine whether content is AI-generated, returning a confidence score rather than relying on simply matching against a list of known fakes. The system is trained to recognize the statistical artifacts that generative models leave behind, which the company says makes it more robust to compression, filtering, and other post-processing.
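A confidence score lets a downstream system choose its own cutoffs instead of receiving a fixed yes-or-no answer. The sketch below assumes a score between 0 and 1 where higher means more likely synthetic. That scale, the `label_score` helper, and the 0.3 and 0.7 cutoffs are assumptions for illustration, not Detect's actual response format or recommended thresholds.

```python
def label_score(score, low=0.3, high=0.7):
    """Map an assumed 0-to-1 'synthetic' score to a three-way label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return "likely synthetic"
    if score <= low:
        return "likely authentic"
    # Scores between the two cutoffs are routed to a human or a second check.
    return "needs review"


for s in (0.05, 0.5, 0.95):
    print(s, label_score(s))
```

A fraud-prevention workflow might set the upper cutoff lower than a newsroom would, accepting more false alarms in exchange for fewer missed fakes.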
At its 2023 launch, Resemble cited real-time accuracy figures in the high-90-percent range for voices the model had encountered, with lower accuracy on previously unseen voices. With the December 2025 release of the DETECT-3B Omni model, the company described a single roughly 3-billion-parameter multimodal architecture covering audio, image, and video, claiming accuracy around 98 percent across dozens of languages and sub-second detection latency. Resemble markets Detect for use cases such as defending against voice phishing (vishing), social engineering, wire-transfer fraud, KYC bypass, and synthetic candidates in hiring, and it offers integrations for video-conferencing tools, telephony, browser extensions, and APIs. The company states that its detection models have ranked at or near the top of independent third-party benchmarks for audio and image deepfake detection, though such rankings should be read as the company's own characterization.
PerTh is Resemble's neural audio watermarker. The name refers to the perceptual threshold of human hearing: the watermark uses psychoacoustics and auditory masking to hide data in parts of the signal that sit below what listeners can perceive, in regions already masked by louder nearby sounds. Because the payload is spread across the waveform, it can be recovered from a short non-silent segment rather than requiring the whole file.
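The idea that a payload spread across the waveform can be read back from a short excerpt can be illustrated with a toy spread-spectrum scheme. This is emphatically not PerTh's algorithm: it uses no psychoacoustic masking, the watermark is far too loud for real use, and it assumes the excerpt is frame-aligned and that its starting frame index is known. All names and constants are hypothetical.

```python
import math
import random

FRAME = 512   # samples per frame (toy value)
ALPHA = 0.2   # watermark amplitude; audible, chosen only to keep the toy reliable


def make_chips(seed=7):
    """Fixed pseudo-random +/-1 sequence shared by embedder and extractor."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(FRAME)]


def embed(host, payload, chips):
    """Frame i carries payload bit (i mod len(payload)) as +/- ALPHA * chips."""
    out = list(host)
    for i in range(len(host) // FRAME):
        sign = 1.0 if payload[i % len(payload)] else -1.0
        for j in range(FRAME):
            out[i * FRAME + j] += ALPHA * sign * chips[j]
    return out


def extract(segment, first_frame, n_bits, chips):
    """Correlate each frame with the chips and sum the evidence per bit."""
    votes = [0.0] * n_bits
    for k in range(len(segment) // FRAME):
        frame = segment[k * FRAME:(k + 1) * FRAME]
        corr = sum(s * c for s, c in zip(frame, chips))
        votes[(first_frame + k) % n_bits] += corr
    return [1 if v > 0 else 0 for v in votes]


payload = [1, 0, 1, 1]
chips = make_chips()
# Host signal: a 220 Hz tone at an assumed 16 kHz sample rate, 16 frames long.
host = [0.5 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(FRAME * 16)]
marked = embed(host, payload, chips)

# Read the payload back from frames 5 to 8 only, a quarter of the clip.
segment = marked[5 * FRAME:9 * FRAME]
recovered = extract(segment, first_frame=5, n_bits=len(payload), chips=chips)
print(recovered == payload)
```

Because every frame repeats part of the payload, any excerpt that covers each bit position at least once is enough to recover it. A production watermark would additionally need synchronization and perceptual shaping, which this sketch omits.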
Resemble says PerTh is designed to survive common transformations, including MP3 compression and re-encoding, resampling, time-stretching and speed changes, pitch shifting, filtering, and added noise, while remaining recoverable with very high accuracy. PerTh is applied automatically to Resemble-generated audio (and to Chatterbox output) and works alongside Detect: the watermark answers whether a clip originated from Resemble, while Detect addresses the broader question of whether arbitrary audio is synthetic.
Resemble AI has consistently framed voice cloning as a technology that needs consent controls and provenance tooling. The platform requires consent from the person whose voice is cloned, including spoken consent clips, and the company says users retain ownership of their cloned voices and that customer voice data is not used to train other models.
The company has also engaged directly with policymakers. On April 16, 2024, CEO Zohaib Ahmed testified before the United States Senate Judiciary Subcommittee on Privacy, Technology, and the Law at a hearing on AI election deepfakes. In his testimony he argued for clear labeling of AI-generated content and urged Congress to require platforms to use watermarking and deepfake-detection technology so that audiences can tell real content from synthetic content. The hearing took place against the backdrop of legislative proposals such as the bipartisan NO FAKES Act, which sought federal rules around the use of a person's voice, name, and likeness.
Resemble's voice tools are used across media production, gaming, advertising, e-learning, accessibility, and conversational AI. Reported and company-cited uses include character voices for film and games, audiobook narration, multilingual dubbing of video, interactive voice response and call-center voices, personalized voice messages, and synthetic speech data for machine-learning training. The company has said its platform generated very large volumes of audio (it cited the equivalent of decades of audio over a twelve-month period around 2023) and that it served more than a million users, a meaningful share of them paying.
On the security side, Resemble markets Detect to enterprises and the public sector, and in connection with its 2025 round it described customers among global entertainment companies, Fortune 500 telecommunications providers, and government agencies. As with any vendor-supplied customer list, these claims come from the company.
Coverage of Resemble AI has tended to focus on two themes: the quality and speed of its voice cloning, and its unusual decision (for a synthetic-voice company) to also build deepfake detection and watermarking. Outlets including TechCrunch, Voicebot, and SecurityWeek covered its funding and its Detect launches, and the open-source Chatterbox release drew attention in the developer community for matching or beating commercial systems on listener-preference tests while being freely licensed. The company operates in a competitive field that includes ElevenLabs and other voice AI and natural language processing startups, and its detection work overlaps with a growing market for media-authentication and content-provenance tools.