HeyGen
Last reviewed
May 7, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v6 ยท 4,799 words
Improve this article
Add missing citations, update stale details, or suggest a clearer explanation.
Last reviewed
May 7, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v6 ยท 4,799 words
Add missing citations, update stale details, or suggest a clearer explanation.
HeyGen is an American artificial intelligence company headquartered in Los Angeles, California, that develops AI-powered video generation software specializing in digital avatars, voice cloning, and multilingual video translation. Founded in December 2020 by Joshua Xu and Wayne Liang, the company builds tools that allow businesses and individuals to produce professional-quality video content featuring lifelike speaking avatars without requiring cameras, studios, or on-camera talent. HeyGen's platform is used by more than 85,000 businesses globally for applications ranging from marketing and sales outreach to corporate training, employee onboarding, and multilingual content localization. The company reached $100 million in annual recurring revenue (ARR) by October 2025, making it one of the fastest-growing generative AI companies by revenue growth rate.
HeyGen's most recognized products include its avatar video generation studio, its viral Video Translation tool that lip-syncs speakers into more than 175 languages while preserving their original voice and accent, and its Interactive Avatar (LiveAvatar) system that enables real-time conversational AI experiences. The company raised a $60 million Series A round led by Benchmark in June 2024 at a $500 million valuation and has attracted enterprise customers including OpenAI, HubSpot, Ogilvy, and Deloitte.
HeyGen traces its origins to December 2020, when Joshua Xu and Wayne Liang founded the company in Shenzhen, China, originally under the name Surreal. The two founders had known each other since their undergraduate years at Tongji University in Shanghai and later both completed graduate degrees at Carnegie Mellon University in Pittsburgh. After finishing their studies, they moved to the West Coast of the United States in 2014, where they built careers in the technology industry before eventually returning to found a company together.
Xu spent six years as a software engineer at Snap, the parent company of Snapchat, where his final two years were dedicated to the company's advertising team and its efforts to integrate AI into its camera and content tools. Liang worked as a product designer at Smule, a social music application known for its karaoke features. Both founders developed firsthand exposure to the barriers that face content creators: expensive equipment, logistical complexity of video production, and the difficulty of scaling video output. Their shared frustration with these constraints motivated them to explore whether AI could remove the bottlenecks from video creation entirely.
In 2021, HeyGen (then Surreal) raised seed capital from Sequoia China (subsequently renamed HongShan following its legal separation from the global Sequoia network) and ZhenFund, a Chinese early-stage investment firm known for backing technology companies founded by Chinese entrepreneurs with overseas experience.
In 2022, Xu and Liang relocated the company to Los Angeles and rebranded it as Movio. The relocation was driven by a combination of strategic and practical factors. California provided proximity to major enterprise customers such as Salesforce, Amazon, and Nvidia, which the founders were actively pursuing. Los Angeles also offered access to the kind of semiconductor infrastructure and AI research ecosystem that the company needed to accelerate its model training efforts. The founders were explicit in interviews that the move was partly about accessing the most advanced hardware for building their AI systems at a time when computational resources were increasingly concentrated in the United States.
The Movio name positioned the company as an AI video production platform for enterprise spokesperson content. During this period the product offered tools to generate scripted talking-head videos featuring AI avatars, targeting businesses that needed promotional or instructional video content without the expense of hiring actors or booking studios. The platform gained early traction with small and medium-sized businesses that were looking to reduce video production costs.
In 2022, HeyGen launched its web application to the public. Within months of launch, demand outpaced infrastructure capacity and the platform experienced system crashes as users flooded in, a sign of product-market fit that would prove to be a recurring theme in the company's early growth story.
In April 2023, Movio rebranded as HeyGen. The name change signaled a broader strategic pivot: rather than focusing narrowly on generating spokesperson videos, the company was expanding toward a more comprehensive AI video creation platform. The new brand identity reflected a more versatile product ambition.
Also in 2023, HeyGen dissolved its Shenzhen legal entity. The decision came amid heightened regulatory scrutiny in the United States regarding Chinese investor connections in AI companies. By severing the Chinese corporate structure, HeyGen aligned itself more cleanly with the US regulatory environment at a time when concerns about data security and foreign investment in AI were becoming significant factors in enterprise procurement decisions. The company retained its Chinese-American founding team and its headquarters in Los Angeles.
In November 2023, HeyGen raised $5.6 million from Conviction, the venture capital firm founded by Sarah Guo, a former Greylock partner and prominent voice in the AI investment community. At the time of that raise, the company had approximately 25 employees. HeyGen reached $1 million ARR in just 178 days from launch, and by October 2023 it had crossed $10 million ARR, a pace of revenue growth that attracted significant attention from institutional investors.
In March 2024, HeyGen strengthened its executive leadership team with three significant hires. Dave King, previously at Asana, joined as Chief Business Officer to lead the company's enterprise sales and partnerships function. Rong Yan, a former engineering leader at HubSpot, joined as Chief Technology Officer to oversee product development and AI infrastructure. Lavanya Poreddy joined as Head of Trust and Safety to manage the growing challenge of content moderation and misuse prevention as the platform scaled. These additions signaled a shift from a startup in product-finding mode to a company building the organizational infrastructure for sustained enterprise growth.
HeyGen has raised approximately $69 million in disclosed funding across multiple rounds since its founding.
The company's seed round in 2021 came from Sequoia China (HongShan) and ZhenFund, providing the initial capital to build out the core avatar generation technology and relocate the team to Los Angeles. In November 2023, the company raised $5.6 million from Conviction in a round that gave investors an early look at revenue metrics that were growing at an unusually fast clip.
The most significant financing event in HeyGen's history was its Series A round, announced in June 2024. Benchmark, one of the most storied venture capital firms in Silicon Valley and an early backer of companies including eBay, Twitter, Instagram, and Uber, led the round with $60 million. Additional investors participating in the Series A included Thrive Capital, BOND, SV Angels, and Conviction. Notable angel investors included Neil Mehta, founder of Greenoaks Capital, and Dylan Field, CEO and co-founder of Figma. The round valued HeyGen at $500 million.
At the time of the Series A, HeyGen had surpassed $35 million in ARR and was serving more than 40,000 customers. The company had also been profitable since the second quarter of 2023, an unusual distinction among generative AI startups that had often prioritized growth over margins. By the end of 2024, HeyGen's ARR had grown to approximately $57.5 million. By September 2025, Sacra estimated the figure at $95 million, and HeyGen announced it had crossed $100 million ARR in October 2025.
No additional funding rounds had been publicly disclosed as of the time of writing, though the company's revenue growth and profitability suggested it was not under near-term pressure to raise external capital.
HeyGen's core product is a web-based studio for creating avatar-driven video content. The platform gives users access to a library of more than 1,100 pre-built digital avatars, covering a range of appearances, ages, ethnicities, and professional presentations. Users can also create personal avatars from their own footage. The studio offers more than 300 AI voice options across numerous languages and more than 400 video templates designed for common business use cases.
The video creation workflow in the studio is designed for non-technical users. A user selects an avatar, writes or pastes a script, chooses a voice, selects a template or background, and submits the job for rendering. The platform handles the AI processing of lip-sync, facial expression generation, and voice synthesis. Videos can be exported in standard resolutions, with higher-tier plans supporting 4K output.
The system supports a range of customization options including adjustments to voice pitch and speech speed, selection of background environments, overlays of branded elements, and modifications to avatar clothing and appearance within available parameters. Enterprise users have access to additional controls over output consistency and branding standards.
HeyGen has released successive generations of its core avatar model, with Avatar IV and Avatar V representing the most technically advanced iterations. Avatar IV, launched in April 2025, introduced the ability to generate a photorealistic talking avatar from a single still photograph combined with a text script or audio file. The model uses what HeyGen describes as diffusion-inspired audio-to-expression technology to synthesize facial movements, lip sync, and subtle expressions from still images without requiring video training footage.
Avatar IV's most notable advancement over prior models is the rendering of hand gestures and full-body motion that respond naturally to the emotional tone and cadence of the script. Prior avatar systems from HeyGen and competitors typically produced floating head or upper-body representations with limited or absent hand movement. Avatar IV's gesture generation brought the output closer to what a viewer would expect from a naturally shot interview or presentation video.
Avatar V, announced in late 2025, further raised the quality ceiling by enabling studio-grade avatar creation from a 15-second recording. Earlier versions of the technology required several minutes to hours of training footage to produce a custom avatar with reliable quality. Reducing the input requirement to 15 seconds significantly lowered the barrier for users who wanted a personal avatar rather than a pre-built one. Avatar V introduced multi-angle stability and long-form performance consistency, allowing users to generate videos of 10 minutes or longer in a single pass without quality degradation over time.
In November 2025, HeyGen released major improvements to Avatar IV's Fast Mode and Quality Mode. Fast Mode became 50% faster while maintaining quality thresholds. Quality Mode added improvements to expressiveness and timing precision, particularly for custom motion prompting where subtle control over avatar behavior was desired. The company also introduced Avatar Memory, a feature that saves successful motion clips for reuse in future videos without consuming additional generation credits.
HeyGen's Video Translation product is the feature most responsible for the company's early viral growth and broad public recognition. The tool accepts a video input, either uploaded directly or referenced via a YouTube URL, and produces an output version of that video with the speaker dubbed into a target language. The system transcribes the original audio, translates the text, synthesizes a new voiceover in the target language using voice cloning to approximate the original speaker's intonation and accent, and then applies lip-sync to align the speaker's visible mouth movements with the new audio track.
The feature supports more than 70 languages and 175 dialects. Unlike traditional dubbing, which tends to replace the original speaker's voice entirely with a different voice actor's recording, HeyGen's translation attempts to preserve acoustic characteristics of the original speaker, including accent qualities, speaking rhythm, and emotional tone. The result is a dubbed video that sounds recognizably like the original speaker even when the language is completely different.
The translation tool attracted widespread attention in January 2024 when a dubbed version of a speech by Argentine President Javier Milei at the World Economic Forum in Davos circulated widely on social media. In the original recording, Milei delivered his remarks in Spanish. A HeyGen-translated version in English retained his distinctive cadence and accent characteristics so convincingly that the clip was shared millions of times across platforms including X (formerly Twitter), accumulating more than 75 million views and nearly 240,000 likes according to reports at the time. Technology observers and journalists noted that the lip-sync quality and voice preservation were markedly better than previous AI dubbing tools, and the Davos clip became one of the defining demonstrations of the commercial maturity of AI video translation.
The platform subsequently became a standard tool for international content localization, enabling companies and creators to reach audiences in languages other than their own without the cost and coordination overhead of traditional dubbing studios. HeyGen noted that one of its customers, Deloitte, used the translation feature to roll out compliance training content across 40 countries, reducing the time required from three weeks to a single day.
HeyGen's Interactive Avatar product extends the company's avatar technology from pre-rendered video into real-time conversational experiences. Rather than generating a finished video that a user watches, Interactive Avatar creates a live video stream featuring an AI avatar that listens to user inputs and responds in real time, with synchronized lip movements, natural facial expressions, and appropriate gestures.
The system uses computer vision and speech synthesis to replicate the communicative behaviors of a human speaker, including turn-taking cues, gaze behavior, and responsive expression changes that correspond to the content of what is being said. Developers access the capability via API and can connect the avatar to any large language model backend, such as OpenAI's GPT models or a custom-trained system, to control the content of the responses. The avatar functions as the visual interface for the AI agent behind it.
LiveAvatar, launched as the production successor to the original Interactive Avatar beta, operates on a dedicated platform at liveavatar.com. It uses WebRTC for low-latency video and audio streaming and integrates with ElevenLabs Flash v2.5 for text-to-speech generation, with automatic speech recognition handled by Deepgram and AssemblyAI. The system is designed for enterprise use cases that require high-volume, reliable streaming with minimal latency, and is positioned as infrastructure for AI agents that need a human face.
Documented applications for Interactive Avatar include customer support agents that handle routine inquiries with a visual presence, sales enablement tools that deliver personalized pitches, educational tutors that adapt to student responses, and product demonstration systems that can answer questions about specifications in real time. The technology also enables avatars to join video conferencing calls on platforms such as Zoom, representing a participant or brand without requiring a human on camera.
HeyGen Studio refers to the company's broader video production environment that goes beyond single-avatar talking-head videos. The studio interface supports multi-scene video assembly, brand kit integration for consistent visual identity across outputs, and collaborative editing workflows for team environments. In a major redesign released in late 2025, the studio interface was overhauled for speed and usability, with improved response times when adjusting parameters, a cleaner layout for long editing sessions, and tighter integration between the avatar generation tools and the broader video editing workspace.
The studio also incorporated integrations with third-party AI video generation models. In 2025, HeyGen announced integration with Google's Veo 3.1 video generation model, allowing users to combine their personal avatars and voice profiles with generative video scenes produced by Veo, enabling productions that blend avatar-based talking segments with AI-generated background footage while maintaining voice and appearance consistency.
HeyGen offers a developer-facing API that provides programmatic access to its avatar video generation, text-to-speech, video translation, and interactive avatar capabilities. The API is designed for teams that want to embed AI video generation into their own products or workflows rather than using the web studio directly.
The API's capabilities include generating avatar videos from scripts and voice parameters, triggering video translation jobs for uploaded content, and initializing real-time interactive avatar sessions via WebRTC. Developers specify parameters including avatar selection, voice choice, background settings, layout configuration, and output resolution. Results are returned asynchronously, with a webhook or polling mechanism for retrieving completed video files.
Original API pricing was structured around tiered monthly subscription plans. As of February 2026, HeyGen migrated the API to a usage-based pay-as-you-go billing model, charging per second of video generated rather than per subscription tier. Credits are purchased and consumed based on actual generation time, with costs varying by output resolution (standard definition versus high definition), avatar type (library avatar versus a user's custom Instant Avatar), and whether real-time streaming features are included. Enterprise API contracts include volume discounts, dedicated developer support, access to the Digital Twin Creation API, and a Proofreading API that checks scripts before generation.
The API has been used to build a range of downstream products, including personalized sales video tools, automated training content pipelines, and customer-facing support interfaces powered by Interactive Avatar.
HeyGen serves more than 85,000 paying customers globally as of mid-2025, spanning enterprise clients, small and medium-sized businesses, and individual creators. The customer base includes organizations across sectors including technology, consulting, marketing, education, healthcare, and financial services.
Notable enterprise customers include OpenAI, HubSpot, and Ogilvy. Deloitte has been publicly cited as using HeyGen to localize corporate compliance training content across dozens of countries at a fraction of the time and cost required by traditional production methods.
The McDonald's "Grandma McFlurry" campaign, launched in June 2024, was cited as a notable use case demonstrating HeyGen's potential in consumer marketing. The campaign leveraged HeyGen's translation and localization capabilities to connect with customers across language barriers in an emotionally resonant way.
The United Nations Development Programme used HeyGen's avatar technology for a campaign called "Weather Kids," which featured AI avatar forecasters to communicate climate change information. The campaign illustrated use cases for HeyGen in the non-profit and advocacy sectors where production budgets are constrained but multilingual reach is essential.
Marketing teams use HeyGen to produce promotional videos, product explainers, social media content, and personalized outreach videos at scale. The ability to generate new video content from a script without scheduling camera time or hiring talent allows marketing departments to iterate rapidly on messaging and adapt content for different audiences, channels, and languages. Sales teams use the Interactive Avatar and LiveAvatar products to create personalized video pitches or deploy AI-powered virtual sales representatives.
Corporate training is among the highest-volume use cases for HeyGen's enterprise customers. Human resources and learning and development teams use the platform to create onboarding content, compliance training, skills development courses, and internal communications videos. The translation feature is particularly valuable in this context because multinational organizations can produce master training content in one language and localize it into dozens of others without re-recording.
Media companies, content creators, and organizations with international audiences use HeyGen's translation product to extend the reach of existing video content. A video originally recorded in English can be translated into Spanish, Mandarin, French, Arabic, Portuguese, and dozens of other languages within hours, with the speaker's voice and lip movements synchronized to each translated version. This capability has been used for educational content, documentary narration, brand videos, and news broadcasting.
HeyGen operates in a market that includes several other AI avatar and video generation companies, each with distinct technical approaches, product emphases, and target customer segments.
| Feature | HeyGen | Synthesia | Tavus | D-ID |
|---|---|---|---|---|
| Avatar quality | Very high; Avatar IV/V models with full-body gestures | High; studio-captured STUDIO avatars | High; strong real-time focus | Moderate; photo-animation approach |
| Video translation | 70+ languages, 175+ dialects; voice cloning | 140+ languages | Limited | Limited |
| Interactive / real-time | Yes; LiveAvatar via WebRTC | No | Yes; core product focus | Yes; Chat.D-ID |
| Custom avatar creation | Yes; from photo (Avatar IV) or short video (Avatar V) | Yes; custom avatar capture | Yes; video-based | Yes; from photos |
| API access | Yes; pay-as-you-go | Yes | Yes | Yes |
| Pricing (entry) | $29/month | ~$22.50/month | Higher; enterprise-focused | From ~$5.90/month |
| Primary audience | Businesses, creators, enterprise | Enterprise, L&D teams | Enterprise, real-time AI agents | SMBs, developers |
Synthesia, founded in 2017 and headquartered in London, is HeyGen's closest direct competitor in the pre-rendered avatar video market. Synthesia has raised over $336 million and serves primarily enterprise customers in learning and development, communications, and HR. Its STUDIO avatars are captured under controlled professional lighting conditions, producing a consistent, high-polish aesthetic that is well suited to formal corporate content. Synthesia's pricing starts lower than HeyGen's at the entry level, and the platform offers strong team collaboration features. However, Synthesia does not offer real-time interactive avatar capabilities comparable to HeyGen's LiveAvatar product, and its translation feature, while broad in language coverage, does not emphasize the voice accent preservation that distinguishes HeyGen's translation output.
Tavus is an American company that focuses specifically on personalized video at scale and real-time conversational AI avatars. Tavus's Phoenix model generates realistic replicas from video footage and its Conversational Video Interface (CVI) product is purpose-built for real-time AI agent applications, making it a direct competitor to HeyGen's LiveAvatar and Interactive Avatar products. Tavus is positioned at the premium end of the market in terms of real-time interaction quality and tends to target enterprises building AI agent products rather than content marketing teams. Its pricing is higher than HeyGen's standard tiers, and its scalability for high-volume pre-rendered video production is generally considered less developed than HeyGen's studio offering.
D-ID, founded in 2017 in Tel Aviv, Israel, takes a different technical approach by animating still photographs to create talking portraits. Its Creative Reality platform transforms a static image and a voice track into a video of the photographed person speaking, which can be produced at very low cost. D-ID is priced significantly below HeyGen, with entry plans starting around $5.90 per month, making it accessible to individual users and small businesses with minimal video budgets. The photograph animation approach produces lower overall visual quality than HeyGen's video-trained avatar models, with particular limitations in the rendering of complex mouth shapes and expressive eye contact. D-ID offers Chat.D-ID for conversational avatar interactions. Its customer base skews toward individual creators and small businesses rather than enterprise accounts.
Beyond these direct competitors, HeyGen faces broader competitive pressure from general-purpose AI video generation platforms including Runway and Adobe's Firefly Video tools, as well as from OpenAI's Sora, which focuses on text-to-video generation rather than avatar-based production. These tools target different production workflows and user needs than HeyGen's avatar-centric approach, but as AI video generation capabilities converge, the distinctions between these product categories are expected to narrow.
HeyGen has received substantial positive attention from the technology and business press, driven primarily by its rapid revenue growth and the viral demonstrations of its translation technology.
In November 2024, Fast Company included HeyGen on its "Next Big Things in Tech" list, recognizing the company's impact on video content creation. Inc. Magazine named HeyGen to its "Best in Business" list for 2024. The company was widely cited in generative AI industry analyses as one of the fastest-growing AI startups by revenue trajectory in the 2023-2025 period.
The Milei speech translation at the World Economic Forum in January 2024 served as perhaps the single most effective public demonstration of what AI video translation could achieve. The clip circulated across media platforms and prompted coverage in publications including The National, Slator, and various mainstream technology outlets, bringing HeyGen's name to audiences that had not previously encountered the company. The demonstration catalyzed a significant increase in sign-ups and accelerated enterprise sales conversations.
HeyGen's technology, like all AI systems capable of generating realistic video of real people speaking, has attracted criticism and concern related to deepfake misuse. The company has been identified in reporting by security analysts and journalists as a tool that has been used to create harmful synthetic media, including content deployed in consumer scams, deceptive health-related misinformation, and geopolitical propaganda. Russian cybersecurity firm Group-IB documented instances of HeyGen-generated content appearing in disinformation campaigns, and multiple media reports noted its use in financial scam videos circulated on social media platforms.
The misuse concern became particularly pointed in October 2024 when HeyGen's own CEO, Joshua Xu, became the subject of an AI-generated deepfake video used in a cryptocurrency scam and posted on YouTube. The incident received coverage in outlets including Boing Boing and was cited as an ironic illustration of the risks posed by the same technology his company sells. HeyGen stated in response that it strictly prohibits the creation of unauthorized content and that the video had been made without the company's involvement.
HeyGen has implemented a range of content moderation and trust and safety measures in response to these concerns. Users submitting their own likeness to create a custom avatar are required to provide verbal consent, including speaking a randomly generated password to verify identity and confirm consent. The platform employs automated content filters to flag potentially harmful content as well as human moderators who review flagged material and remove content used for bullying, harassment, and disinformation. The company published a moderation policy and trust and safety guidelines that outline prohibited uses, including the creation of non-consensual intimate imagery, impersonation of public figures for fraudulent purposes, and content designed to deceive viewers about its synthetic nature.
Critics have argued that consent-verification workflows place too much of the enforcement burden on self-reporting and that the scale at which AI video generation operates makes comprehensive human moderation practically difficult. The question of whether AI video companies can effectively police downstream misuse of their technology remains unresolved and is the subject of ongoing policy and regulatory debate in multiple jurisdictions.
The broader ethical debate around AI-generated video and its implications for trust in visual media has shaped how HeyGen communicates about its products. The company has published blog posts and guidance specifically addressing the difference between legitimate commercial use of AI avatars and the creation of deepfakes intended to deceive, and it has sought to distinguish its platform from tools designed for the latter purpose.
Despite its technical advances, HeyGen's avatar generation technology carries several limitations that affect its practical utility in certain contexts.
Avatar quality, while markedly improved in the Avatar IV and Avatar V generations, remains distinguishable from real human video when viewed closely or on large displays. Subtle inconsistencies in facial geometry, lighting interaction with skin textures, and the rendering of fine details such as hair and teeth can signal to attentive viewers that the content is AI-generated. The visual quality gap narrows with each model generation but has not been fully closed as of the time of writing.
The translation feature's voice cloning, while impressive in preserving general acoustic character, does not perfectly replicate every acoustic quality of the original speaker across all languages. Tonal languages such as Mandarin and Thai present particular challenges because the prosodic patterns of the original speaker may not map cleanly onto the pitch contours required by the target language's grammatical system.
Interactive Avatar and LiveAvatar latency, while low relative to earlier generations of the technology, remains noticeable in conversations and falls short of the near-zero perceptible delay of a real video call with a human participant. In use cases that require very rapid back-and-forth conversational exchange, the delay can interrupt the sense of natural turn-taking.
HeyGen's pricing structure has drawn criticism from some users and reviewers. Reviewers have noted that the credit-based system for both the web studio and the API can lead to unexpectedly high costs for users who underestimate their consumption, and that enterprise pricing lacks transparency before formal sales engagement. The migration of the API from tiered subscriptions to pay-as-you-go billing in early 2026 addressed some predictability concerns but introduced new complexity in estimating costs for variable workloads.
Finally, the platform's capabilities are primarily suited to talking-head and upper-body video formats. HeyGen's tools are less effective for productions that require full-scene video narrative, complex camera movement, or the kind of dynamic editing typical of commercial advertising or cinematic content. For these use cases, text-to-video models or traditional production methods remain more appropriate.