HeyGen Avatar IV
Last reviewed: May 16, 2026
Sources: 20 citations
Review status: Source-backed
Revision: v1 · 3,144 words
Avatar IV is the fourth generation of the AI avatar engine from HeyGen, the AI video company co-founded in 2020 by Joshua Xu and Wayne Liang. It was announced by Xu on May 6, 2025, with the model going live in HeyGen's web app the same week. Avatar IV generates a talking-avatar video from a single photo, a script or audio file, and a voice, and is built on what HeyGen describes as a diffusion-inspired audio-to-expression engine that drives facial motion, head movement and micro-expressions from the input audio rather than from a fixed gesture library. Through the second half of 2025 the model expanded from head-and-shoulders output into half-body and full-body framings with timing-aware hand gestures, and on November 5, 2025 HeyGen paired it with a new voice control layer called Voice Director and the Panda Voice Engine.
Avatar IV is the headline model in HeyGen's lineup, sitting above the older Avatar III lip-sync engine and beneath the video-reference Avatar V model that followed it in April 2026. It is gated as a premium feature on HeyGen's plans, with usage metered in Premium Credits at roughly 20 credits per minute of generated video, while Avatar III remains the unmetered workhorse for paid users. By the time of HeyGen's $100 million annual recurring revenue milestone in October 2025, Avatar IV was the model the company pointed to in nearly every product announcement, including the August 2025 upgrade of the Digital Twin feature and the May 2026 launch of the Avatar IV API.
HeyGen was founded in Shenzhen in December 2020 as Surreal by Joshua Xu and Wayne Liang, both Tongji University alumni with master's degrees from Carnegie Mellon University. The company moved its headquarters to Los Angeles, rebranded to HeyGen, and over the next four years built an avatar video platform around stock avatars, voice cloning and AI dubbing. By June 2024 HeyGen had raised a $60 million round led by Benchmark at a $500 million valuation, and by October 2025 the company stated it had crossed $100 million in annual recurring revenue, a roughly tenfold jump from late 2023.
The avatar pipeline went through several model generations before Avatar IV. The earliest engines were focused on accurate lip sync over a stock or custom presenter, with movement limited to small head motion against a static background. Avatar III, the immediate predecessor, kept lip sync as its main strength and is described in HeyGen's own product materials as the workhorse model that remains unmetered for paid users. Avatar III handled the bulk of routine talking-head generation but did not produce significant body motion or context-aware gestures, and its outputs were generally framed as head-and-shoulders. Avatar IV was designed to add expressive performance on top of accurate lip sync, with HeyGen positioning it as a model that interprets the script rather than only syncing to it.
The launch landed during a noisy stretch in the AI avatar market. Synthesia was preparing its own Express-2 engine, which shipped in October 2025 as part of Synthesia 3.0 and made full-body avatars and in-context voice cloning the new baseline. D-ID, Hedra and several open-source projects were also pushing on photoreal head-and-shoulders avatars. HeyGen's positioning with Avatar IV was that a single photo plus a voice should be enough to drive a performance rather than a lip sync, and that the model should scale from a portrait crop up to a full-body shot without needing a separate motion-capture pipeline.
Joshua Xu announced Avatar IV on X on May 6, 2025, framing it as HeyGen's most advanced avatar model and describing the input as "one photo, one script, just your voice." In the same post Xu wrote that "most avatars sync to your words, Avatar IV interprets them," and credited a diffusion-inspired audio-to-expression engine that analyzes vocal tone, rhythm and emotion as the core architecture. HeyGen ran a live product launch event for Avatar IV in the same window and pushed the model into general availability across its web app and creator plans in the days that followed.
The initial release covered portrait and partial-body framings, photorealistic and stylized characters, and a wide range of input photo angles including front-facing, three-quarter and profile shots. HeyGen also marketed the model as working on non-human characters such as anime portraits and pets, where the audio-to-expression engine drives mouth and head motion from the same audio input. Across the rest of 2025 HeyGen layered new capabilities on top of the launch release in monthly product update cycles, rather than calling them new model generations.
The most significant rollout milestones after the initial announcement were:

- June 2025: timing-aware hand gestures synchronized with script content, plus natural-language gesture prompts in the Avatar IV editor.
- August 2025: the upgraded Digital Twin feature, extending Avatar IV output to full-body weight shifts and posture.
- November 5, 2025: Voice Director and the Panda Voice Engine, alongside prompt-based gesture controls, finer-grained expression controls and intelligent render selection.
- May 4, 2026: the self-serve Avatar IV API on the Pro and Scale developer tiers.
Avatar IV's successor, Avatar V, was introduced at a HeyGen webinar on April 16, 2026. Avatar V is positioned as a video-reference model that fine-tunes on a 15-second clip of the user rather than predicting motion from a single image, but Avatar IV remains live in the product and remains the model HeyGen recommends when only a photo is available.
Avatar IV produces video from three inputs: a photo of the subject, a script or pre-recorded audio file, and a voice. The script is converted to speech using HeyGen's voice engine if no audio is provided, and the audio is then passed into the audio-to-expression engine that drives the visual generation. The output is a fully rendered video clip with synchronized lip movement, head motion, facial micro-expressions and, in half-body and full-body modes, hand and arm gestures.
| Capability | Details |
|---|---|
| Input photo | Single image; front-facing, three-quarter or profile angles supported |
| Supported subjects | Photoreal humans, stylized portraits, anime characters and pets |
| Framing | Portrait, half-body and full-body; framing is selected per generation |
| Lip sync | Audio-driven sync; HeyGen claims industry-leading accuracy across dozens of languages |
| Facial motion | Diffusion-inspired audio-to-expression engine for brow shifts, eye softening, asymmetric smiles |
| Head motion | Tilts on pauses, forward lean on emphasis, turn-away on reflective lines |
| Hand gestures | Timing-aware gestures synchronized with script content from June 2025 onward |
| Body motion | Full-body weight shifts and posture, added with the August 2025 Digital Twin upgrade |
| Prompted gestures | Natural-language gesture and movement prompts inside the Avatar IV editor |
| Voice languages | More than 175 languages and dialects across HeyGen's voice stack |
| Voice cloning | Few-second reference clip for voice mirroring; integrated with Voice Director |
| Output resolution | Up to 1080p on Creator plans; 4K on Pro and above |
| API access | Avatar IV API on Pro and Scale developer tiers from May 2026 |
| Credit cost | 20 Premium Credits per minute of generated Avatar IV video |
The diffusion-inspired engine is the part HeyGen emphasizes most in its own materials. Rather than mapping phonemes to mouth shapes, the engine analyzes the audio frame by frame for tone, rhythm and emotional cues, then synthesizes a coherent set of facial movements that match those cues. HeyGen describes the result as photoreal motion with temporal realism, where blinks, head tilts and small smiles arrive at moments that fit the spoken cadence rather than at fixed intervals. The same signal path drives the hand-gesture system that was added in June 2025; prompted phrases such as "emphasize this point" or specific descriptions of a gesture can be inserted into the script and the model attempts to produce the corresponding motion.
A second wave of capabilities arrived in HeyGen's November 2025 release. The Avatar IV editor gained more controllable movements with prompt-based gestures, finer-grained expression controls for subtler body language, and an intelligent render selection that automatically picks an optimal rendering approach for a given clip. These changes did not break existing Avatar IV jobs; HeyGen treated them as in-place upgrades to the same model line rather than a new generation.
Voice Director is the voice control layer that HeyGen shipped on November 5, 2025 alongside the November Avatar IV update. It is built on the Panda Voice Engine and is designed to let users shape an avatar's vocal delivery through natural-language prompts rather than through manual SSML or a separate audio editor. Prompts such as "add excitement," "make it sound confident," or "emphasize this line" can be applied at the word, sentence or paragraph level, and the engine adjusts tone, pacing and emotion for that span while keeping the underlying voice consistent.
HeyGen pairs Voice Director with a companion feature called Voice Mirroring. Voice Mirroring takes a short uploaded voice recording and replicates not only the speaker's voice identity but also their rhythm, emotion, pacing and personality, so that the avatar speaks in the same style the reference speaker actually used rather than in a flattened default. This is the same input pattern HeyGen had used for voice cloning in earlier engines, but extended so that the cloned voice carries the speaker's expressive habits into the Avatar IV performance.
The November 2025 release also moved voice tools into the workflows where they are used most, including AI Studio, Proofread Studio and the Avatars surface. The standalone Voices tab was deprecated as part of that change. Inside the Avatar IV editor, Voice Director output is wired directly into the audio-to-expression engine, so a Voice Director prompt that adds excitement to a line also tends to produce larger gestures, more head motion and more emphatic facial expressions on the avatar.
Avatar IV is included on HeyGen's paid plans and is metered through HeyGen's Premium Credit system, which is the meter the company uses for its newer generative features. Avatar III remains unmetered for paid users and is the default fallback for high-volume routine work. Premium Credit consumption for Avatar IV runs at approximately 20 credits per minute of generated video, although HeyGen has tuned the exact ratio over time and bundles credits differently across plans.
| Plan | Monthly price | Notable Avatar IV terms |
|---|---|---|
| Free | $0 | Short watermarked trial videos; limited Avatar IV access |
| Creator | About $29/month ($24 with annual billing) | Roughly 200 Premium Credits per month, about 10 minutes of Avatar IV video |
| Pro | About $99/month ($79 with annual billing) | Roughly 2,000 Premium Credits per month, about 100 minutes of Avatar IV; 4K export; Avatar IV API access on the developer tier |
| Business | About $149/month | Shared workspace credits; additional seats at about $20 each |
| Enterprise | Custom pricing | Custom credit pools, dedicated support, governance and security controls |
The Avatar IV API, opened on May 4, 2026, is self-serve on the Pro and Scale developer tiers, with custom rates for higher-volume Enterprise usage. The API exposes the same Avatar IV engine through POST requests to HeyGen's video generation endpoints and through photo-avatar endpoints that handle motion and sound effects programmatically. Pricing for API usage is published on HeyGen's API pricing page and is metered separately from the credit pool tied to the web app subscription.
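A generation request through the API can be sketched as follows. This is a minimal illustration only: the payload field names (`video_inputs`, `talking_photo_id`, `input_text`, and so on) and the endpoint path are assumptions modeled on HeyGen's v2 video-generation schema, and the authoritative shapes live in HeyGen's API reference.

```python
import json

# Assumed base URL and payload layout for an Avatar IV job; verify field
# names against HeyGen's published API reference before relying on them.
API_BASE = "https://api.heygen.com"


def build_avatar_iv_request(photo_avatar_id: str, script: str, voice_id: str) -> dict:
    """Assemble a JSON body for a single-photo Avatar IV video job."""
    return {
        "video_inputs": [
            {
                "character": {
                    "type": "talking_photo",       # single-photo avatar input
                    "talking_photo_id": photo_avatar_id,
                },
                "voice": {
                    "type": "text",                # script-to-speech path
                    "input_text": script,
                    "voice_id": voice_id,
                },
            }
        ],
        "dimension": {"width": 1920, "height": 1080},  # 1080p output
    }


payload = build_avatar_iv_request("photo_123", "Welcome to the demo.", "voice_456")
print(json.dumps(payload, indent=2))
```

In practice this body would be sent as a POST with the account's API key in the request headers, and the job ID in the response polled for the finished video; both of those details are plan-dependent and documented on HeyGen's API pages.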
The Premium Credit cap is the part most often flagged in third-party reviews. Reviewers note that 200 credits a month on the Creator plan covers roughly ten minutes of Avatar IV output before the user has to either fall back to Avatar III or buy additional credits, which can make Avatar IV feel premium-gated for casual creators. Pro and Business plans, with much larger credit allocations, are the tiers most often recommended for users whose primary workflow is Avatar IV rather than Avatar III.
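The credit math above reduces to a single ratio. A small sketch, using the approximate 20-credits-per-minute rate cited in this article (HeyGen has tuned the exact ratio over time, so this is an estimate rather than a billing calculator):

```python
# Approximate Avatar IV metering rate cited in HeyGen's plan materials;
# subject to change, so treat the result as an estimate.
CREDITS_PER_MINUTE = 20


def avatar_iv_minutes(monthly_credits: int) -> float:
    """Estimate how many minutes of Avatar IV video a credit pool covers."""
    return monthly_credits / CREDITS_PER_MINUTE


print(avatar_iv_minutes(200))   # Creator plan: 10.0 minutes
print(avatar_iv_minutes(2000))  # Pro plan: 100.0 minutes
```

This is the arithmetic behind the reviewer complaint: at roughly ten minutes a month, a Creator-plan user producing daily Avatar IV clips exhausts the pool quickly and must fall back to Avatar III or buy additional credits.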
Through late 2025 and into 2026 the AI avatar market has been dominated by HeyGen and Synthesia, with Hedra Character sitting alongside them as a more cinematic single-image option and D-ID still active in real-time talking-head workflows. Avatar IV is the model HeyGen sends into all of these comparisons.
| Platform | Engine | Avatar style | Primary positioning | Notable late-2025 feature |
|---|---|---|---|---|
| HeyGen Avatar IV | Diffusion-inspired audio-to-expression engine | Photoreal portraits to full-body avatars from a single photo, stylized and non-human subjects also supported | Marketing, social, creator and small-to-mid business video, plus Digital Twin for executive use | Voice Director and Panda Voice Engine integration, prompted gesture control |
| Synthesia Express-2 | DiT-based Express-Video plus two-stage Express-Voice transformer | Full-body stock and personal avatars from a curated actor library, rendered at 1080p, 30 fps | Enterprise learning and development, training and internal communications | Action-capable avatars that perform prompted B-roll gestures, embedded AI Playground with Sora 2 and Veo 3.1 |
| Hedra Character (Character-3) | Multimodal model reasoning over image, text and audio jointly | Cinematic talking characters from a single image, oriented toward storytelling | Creator-side video, short film, character-driven content | Live Avatars and tightly synchronized audio-driven character performance at sub-dollar per-minute pricing |
| D-ID | Talking-head pipeline with strong real-time path | Head-and-shoulders presenters with streaming output | Real-time customer interaction, agents, chat-style avatars | Streaming avatars and conversational deployment in customer support tools |
The practical split between HeyGen and Synthesia in 2026 reviews tends to track buyer profile rather than raw quality. Avatar IV is consistently rated higher by solo creators and marketing teams who want a faster path from a single photo to a finished video, broader language coverage and lower per-seat pricing. Synthesia's Express-2 is preferred by enterprise buyers who want a curated stock-avatar library, governance, SCORM-ready learning content and the new Video Agents layer. Reviewers covering both platforms in late 2025 generally describe Express-2 as closing most of the realism gap that Avatar IV had opened earlier in the year, while still trailing Avatar IV on flexibility with arbitrary input photos.
Against Hedra Character, Avatar IV is usually framed as the broader product with stronger full-body motion and a deeper voice stack, while Hedra Character is rated higher for character expressiveness and cinematic single-image talking heads at a much lower per-minute price. Against D-ID, Avatar IV is rated higher on visual realism and emotional nuance but does not, as a non-streaming video generation product, replace D-ID's real-time conversational deployments.
Avatar IV was treated by AI video coverage through 2025 as the model that pushed the photoreal avatar bar past head-and-shoulders lip sync into full-body performance. Reviews in publications including TechCrunch, AI Magazine and a wide set of independent creator-focused outlets pointed to the single-photo input, the prompted gesture control, and the November integration with Voice Director as the features that separated it from Avatar III and from competing models earlier in the year. HeyGen's own metrics through the same period, including the jump to $100 million in annual recurring revenue by October 2025, were repeatedly tied back to Avatar IV adoption.
Individual user reviews were more mixed. The most common complaint in third-party reviews of Avatar IV in 2026 is the Premium Credit cap, which makes the 200-credit Creator plan feel narrow for users whose primary workflow is Avatar IV rather than Avatar III. Reviewers also flagged artifacts in complex hand motion, swift head turns and rapid phrasing, with the model performing best on medium-length conversational takes and explainer content rather than fast-paced commentary or heavy choreography. Several reviews noted that the model's strongest output comes from well-lit, front-facing reference photos and that more extreme angles or low-quality source images produce visible artifacts in the generated motion.
Joshua Xu has continued to frame Avatar IV publicly as a step toward avatars that can carry an actual performance rather than only lip-sync a script. In his August 2025 Digital Twin announcement Xu wrote that the upgraded Digital Twin powered by Avatar IV is "indistinguishable from you," and in his June 2025 gesture control announcement he described Avatar IV as a model that can "speak, gesture, and move its body with meaning." HeyGen has since used Avatar IV as the foundation under Avatar V, the April 2026 video-reference model that takes the same general workflow and replaces single-photo prediction with a fine-tuned model based on a 15-second user clip.