Hedra Character
Last reviewed
May 16, 2026
Sources
30 citations
Review status
Source-backed
Revision
v1 · 3,689 words
Hedra Character is a family of generative video foundation models developed by Hedra Labs, a San Francisco-based artificial intelligence company. The models take an image of a person or character together with audio (and, in later versions, text) and produce video clips in which the character speaks, emotes, and moves in time with the supplied sound. The current flagship is Character-3, which Hedra introduced on March 6, 2025 and described as the first omnimodal model in production, capable of reasoning jointly over image, text, and audio inputs in a single pass. Earlier members of the family are Character-1, released on June 24, 2024, and Character-2, released on October 3, 2024.
In July 2025 Hedra extended the Character-3 model into a real-time streaming product called Live Avatars, built in partnership with LiveKit. Live Avatars renders a talking head video stream in response to spoken or typed input with sub-100 millisecond response times and is priced at $0.05 per minute, which Hedra described as roughly fifteen times cheaper than competing real-time avatar services. The combination of the Character-3 video model, the Hedra Studio web product, and the Live Avatars streaming API positions Hedra against HeyGen, Synthesia, D-ID, and Runway Act-Two in the broader generative avatar and AI video market.
Hedra Labs was founded in 2021 by Michael Lingelbach, a former Stanford neuroscience PhD student who had previously worked as a theatre actor before moving into machine learning research. Lingelbach has said in interviews and on X that the company's focus on character driven video grew out of his belief that believable digital characters are the missing piece for the next generation of storytelling, and that text to video models alone do not solve that problem because they cannot reliably produce a specific character delivering a specific performance.
Hedra came out of stealth in mid 2024 with the release of Character-1 and a $10 million seed round that closed in August 2024. The seed round was led by Index Ventures, a16z Speedrun, and Abstract, with angel participation from researchers and operators including Jon Barron, Kevin Hartz, and Bilawal Sidhu. Hedra raised a $32 million Series A in May 2025 led by Andreessen Horowitz's Infrastructure fund, which brought total funding to roughly $43 million and valued the company at about $200 million. The Series A was reported by TechCrunch, SiliconANGLE, and Deadline, all of which framed Hedra as one of the most visible startups working specifically on audio driven character video rather than general purpose text to video.
At the time of the Series A, Hedra had around twenty employees and said it had been used to generate more than ten million videos. The company indicated that funds would go toward expanding the research team, scaling the Character-3 model, and developing real-time and enterprise products. Hedra is headquartered in San Francisco.
Hedra shipped three numbered versions of the Character family between June 2024 and March 2025, roughly three to five months apart. Each release expanded modality coverage (from audio and image inputs to text), output flexibility (aspect ratios, character types, clip length), or generation quality.
| Version | Release date | Key changes |
|---|---|---|
| Character-1 | June 24, 2024 | First public model. Audio to video animation from a single portrait image. Generated short clips with lip sync and basic facial expression. Launched alongside a free web tool and waitlist. |
| Character-2 | October 3, 2024 | Second generation audio to video foundation model. Added widescreen and vertical aspect ratio support, improved face consistency at longer durations, and refined motion fidelity. Hedra described it as a step toward a full-stack character model. |
| Character-3 | March 6, 2025 | First omnimodal Hedra model. Jointly reasons over image, text, and audio inputs in a single pass. Generates upper-body video with phoneme aligned lip sync, micro-expressions, and natural head and body motion. Released together with the Hedra Studio web product. |
Character-1 was the model that brought Hedra public attention. VentureBeat and Tom's Guide both covered the June 2024 launch, with the latter calling it a notable step up from existing talking head tools because it produced more natural facial motion and supported a wider range of input images, including stylized illustrations and animals. The model was released as a free web tool with a waitlist, with free generation credits used to attract early users; it went briefly viral in mid-2024 through the so-called "talking baby podcast" trend, in which users generated short videos of infants and animals delivering scripted dialogue.
Character-2, announced on X by Hedra on October 3, 2024, expanded the model's output options. The most visible additions were widescreen 16:9 and vertical 9:16 aspect ratios, which allowed the same underlying model to feed both YouTube style horizontal content and short-form vertical content on TikTok, Reels, and Shorts. Character-2 also addressed identity drift, which had been a frequent complaint about Character-1: characters in longer clips occasionally morphed slightly between frames. Hedra described the release as a generational leap for audio to video foundation models and as the next step toward a full-stack character model.
Character-3 is the current production model. Hedra introduced it on March 6, 2025, simultaneously with Hedra Studio, the company's web based creative environment that combines image generation, voice generation, and Character-3 video into a single workflow. The model is the first in production that Hedra characterizes as omnimodal: rather than processing image, text, and audio in separate stages, Character-3 reasons over all three modalities jointly in a single forward pass. In practice this means that the model can take a single reference image, a text prompt describing motion or scene context, and an audio clip of speech or singing, and produce a video in which the reference character delivers the audio with motion appropriate to both the words and the scene description.
Character-3 generates upper-body video rather than just a face, supports photorealistic portraits, illustrated characters, anime, cartoons, and non-human references, and produces synchronized lip motion at the phoneme level rather than only at the syllable level. Output resolution is up to 720p in standard plans, with widescreen and vertical orientations available. The model handles speech, singing, and non-speech sounds such as laughter, sighs, and shouts, with corresponding facial behavior.
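The inputs and constraints described above can be pictured as a single request payload. The sketch below is purely illustrative: Hedra exposes an API, but the function, field names, and validation here are assumptions for illustration, not the documented Hedra API. Only the aspect ratios and the 720p cap come from the description above.

```python
# Hypothetical sketch: assemble a single-pass Character-3-style request.
# Field names and the builder function are illustrative assumptions,
# not Hedra's documented API.

SUPPORTED_ASPECT_RATIOS = {"16:9", "9:16", "1:1"}  # outputs described above
MAX_RESOLUTION_P = 720                             # standard-plan cap

def build_generation_request(image_ref: str, audio_ref: str,
                             text_prompt: str = "",
                             aspect_ratio: str = "16:9",
                             resolution_p: int = 720) -> dict:
    """Build one payload carrying all three modalities together.

    Because Character-3 conditions on image, text, and audio jointly,
    a request naturally bundles all three rather than chaining stages.
    """
    if aspect_ratio not in SUPPORTED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution_p > MAX_RESOLUTION_P:
        raise ValueError("standard plans cap output at 720p")
    return {
        "image": image_ref,        # portrait, illustration, anime, or non-human face
        "audio": audio_ref,        # speech, singing, or non-speech sounds
        "text": text_prompt,       # optional scene and motion guidance
        "aspect_ratio": aspect_ratio,
        "resolution": f"{resolution_p}p",
    }

payload = build_generation_request("portrait.png", "line_read.wav",
                                   text_prompt="slow head turn, warm lighting",
                                   aspect_ratio="9:16")
```

The point of the single payload is the contrast with two-stage systems, where the audio would be submitted to a separate conditioning step after video generation.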
Hedra launched Live Avatars on July 22, 2025, in partnership with LiveKit, the open source real-time communication infrastructure provider. Live Avatars is a streaming product rather than a batch render: instead of generating a video file from an image and an audio clip, the system renders a continuous video stream of a talking avatar that responds in real time to text or audio coming from a connected language model, text-to-speech system, or human speaker.
The announcement, posted by Hedra and by founder Michael Lingelbach on X, described Live Avatars as the most advanced streaming avatar model in the world and emphasized three figures: a per minute price of $0.05, end to end response latency under 100 milliseconds, and integration with the LiveKit Agents framework that brokers LLM and TTS connections. Hedra positioned the $0.05 per minute price as roughly fifteen times cheaper than existing real-time avatar services and as low enough to make video AI agents viable for use cases that until then could only afford voice only agents.
| Live Avatars attribute | Detail |
|---|---|
| Launch date | July 22, 2025 |
| Underlying model | Character-3 adapted for streaming inference |
| Infrastructure partner | LiveKit |
| End to end latency | Under 100 milliseconds |
| Price | $0.05 per minute |
| LLM integrations | OpenAI, Google Gemini, Anthropic Claude, via LiveKit Agents |
| TTS integrations | ElevenLabs, Cartesia, others via LiveKit Agents |
| Typical applications | Conversational AI agents, customer support, interactive tutors, virtual presenters |
The technical claim that distinguishes Live Avatars from earlier streaming avatar systems is the latency budget. Sub-100 millisecond response time is on the order of human conversational turn taking, which means the avatar can be plugged into a voice based agent without introducing the visible delay that earlier solutions had. The product can take a photograph plus a voice clone, then produce a live video stream of that person speaking arbitrary text generated by a connected language model, with the avatar's lip motion synchronized to the streamed audio.
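The latency argument is easiest to see as a budget across the serial stages of one conversational turn. In the sketch below, only the sub-100 ms avatar figure comes from Hedra's announcement; the speech-to-text, LLM, and TTS numbers are illustrative assumptions, not measured values.

```python
# Back-of-envelope latency budget for one turn of a video agent.
# Only avatar_render_ms reflects a figure from Hedra's announcement
# (sub-100 ms); the other stage times are illustrative assumptions.

def turn_latency_ms(stt_ms: int, llm_first_token_ms: int,
                    tts_first_audio_ms: int, avatar_render_ms: int) -> int:
    """Sum the serial stages between the end of user speech and the
    first video frame of the avatar's reply."""
    return stt_ms + llm_first_token_ms + tts_first_audio_ms + avatar_render_ms

# With assumed upstream stages, an avatar stage under 100 ms no longer
# dominates the budget the way earlier batch renderers did.
total = turn_latency_ms(stt_ms=300, llm_first_token_ms=400,
                        tts_first_audio_ms=200, avatar_render_ms=100)
```

Under these assumed numbers the avatar contributes a tenth of the turn latency, which is why it can sit inside an existing voice-agent pipeline without a visible pause.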
Lingelbach framed the launch on X around the broader shift from voice agents to video agents. He argued that voice had become the dominant interface for interacting with language models over the previous year, and that video was the next unlock because a visible speaking presence increases engagement, supports non-verbal communication, and makes AI agents usable in interfaces that have a display surface. Reception in technology press, including AIbase, Metaverse Post, and Medium creator coverage, focused on the price drop and on the integration with LiveKit, with several articles describing the per minute pricing as the most aggressive in the category at launch.
Character-3 is built around a single forward pass that ingests up to three input modalities and produces a synchronized video clip. The system's headline capabilities are summarized below.
| Capability | Description |
|---|---|
| Audio driven lip sync | Phoneme aligned mouth motion that matches actual speech sounds rather than only syllable timing. Works with English and several other languages. |
| Text conditioning | A text prompt can guide the scene, mood, body motion, and camera framing, in addition to the audio and image inputs. |
| Image input | Accepts photorealistic portraits, illustrated characters, anime, cartoons, animals, and non-human faces. The model preserves the visual identity of the reference. |
| Audio handling | Speech, singing, laughter, sighs, and shouts each produce appropriate facial and head behavior. Non-speech sounds drive expression in addition to lip motion. |
| Upper body motion | Generates head turns, shoulder posture, and visible upper body gestures alongside facial expression, beyond pure talking head animation. |
| Aspect ratios | Standard 16:9 widescreen, 9:16 vertical for short form video, and 1:1 square outputs. |
| Output resolution | Up to 720p in the standard product. |
| Voice cloning | Hedra Studio integrates third party voice cloning, so a single uploaded image plus a voice sample can become a complete avatar in three steps. |
| Live streaming | Character-3 powers the Live Avatars product, which streams video at sub-100 millisecond latency rather than generating a fixed file. |
Hedra has not published a detailed architecture paper for Character-3. Public descriptions on the company blog and in third party coverage indicate that the model uses a diffusion based image and video backbone with a joint conditioning stage that aligns image, text, and audio features before generation. The omnimodal framing distinguishes Character-3 from earlier audio to video systems, which typically used a pretrained image to video diffusion model and bolted on a separate audio conditioning network. Hedra's claim is that joint training across all three modalities produces better lip sync, more natural micro-expressions, and tighter alignment between spoken content and physical performance than two-stage architectures.
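The contrast between the two-stage designs and the claimed single-pass design can be sketched schematically. Hedra has not published Character-3's architecture, so everything below is a toy assumption: "features" are reduced to short lists of floats and the encoders are stand-ins, but the structural difference (separate audio stage versus one joint conditioning vector) mirrors the description above.

```python
# Toy schematic only: Hedra has not published the Character-3
# architecture. Encoders here are stand-ins for learned networks.

def encode(modality: str, data: str) -> list[float]:
    # Stand-in per-modality encoder producing a tiny "feature vector".
    return [float(len(data)), float(sum(ord(c) for c in modality) % 7)]

def two_stage(image: str, audio: str) -> list[float]:
    """Earlier audio-to-video systems: an image-to-video backbone runs
    first, then a separate bolted-on audio network modulates it."""
    video_feats = encode("image", image)   # stage 1: image -> video features
    audio_feats = encode("audio", audio)   # stage 2: separate audio network
    return [v + a for v, a in zip(video_feats, audio_feats)]

def omnimodal_single_pass(image: str, text: str, audio: str) -> list[float]:
    """Character-3's claimed approach: align image, text, and audio
    features jointly *before* generation, in one forward pass."""
    joint = encode("image", image) + encode("text", text) + encode("audio", audio)
    return joint  # one conditioning vector drives the whole generation

conditioning = omnimodal_single_pass("portrait.png", "calm delivery", "line.wav")
```

The design claim is that because the joint vector is formed before generation, lip sync and expression can be trained against all three inputs at once instead of being corrected after the fact by a second network.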
Hedra distributes Character-3 through a free tier, three paid subscription tiers, and an enterprise tier, with separate per minute pricing for Live Avatars. The plans are sold on the company's pricing page and have been adjusted upward in credit allocation since the Series A close in May 2025.
| Plan | Price (monthly) | Credits per month | Notes |
|---|---|---|---|
| Free | $0 | 400 | Slower generation queue, no commercial use rights, watermark on output. |
| Basic | $10 ($8 if billed annually) | 1,000 | Premium voices, voice cloning, commercial use rights, no watermark. |
| Creator | $30 ($24 if billed annually) | 4,000 | Higher concurrency. Bumped from 3,600 credits after the May 2025 update. |
| Pro | $75 ($60 if billed annually) | 12,000 | Priority generation, higher monthly volume. Bumped from 11,000 credits. |
| Enterprise | Custom | Custom | API access, SSO, dedicated capacity. |
| Live Avatars | $0.05 per minute | n/a | Real-time streaming via LiveKit. Billed on minutes streamed. |
The credit accounting is per generation rather than per minute. A short Character-3 clip consumes single digit to low double digit credits depending on duration and resolution, so a Creator plan with 4,000 monthly credits supports hundreds of typical clips. Hedra has stated on X that monthly credits do not roll over between cycles and reset on the billing date. Add-on credit packs are available in $20 and $100 sizes for users who exceed their monthly allotment.
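The credit arithmetic above is simple enough to sketch directly. The plan sizes come from the table; the 10-credits-per-clip figure is an assumption within the "single digit to low double digit" range the article describes.

```python
# Illustrative credit math. Plan allocations are from the pricing table;
# credits_per_clip=10 is an assumed typical value, not a published rate.

PLAN_CREDITS = {"Free": 400, "Basic": 1_000, "Creator": 4_000, "Pro": 12_000}

def clips_per_month(plan: str, credits_per_clip: int = 10) -> int:
    """Typical clips a plan's monthly credits cover. Credits reset on
    the billing date and do not roll over between cycles."""
    return PLAN_CREDITS[plan] // credits_per_clip

creator_clips = clips_per_month("Creator")  # 4,000 credits at ~10 per clip
```

At the assumed rate, a Creator plan covers about 400 typical clips a month, consistent with the "hundreds of typical clips" framing above.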
Commercial use rights are granted on every paid tier. Free tier output is restricted to personal use and carries a Hedra watermark. The Live Avatars price applies on top of, not in place of, the LLM and TTS costs that a real-time agent typically incurs.
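Because the Live Avatars rate stacks on top of LLM and TTS usage, the all-in cost of a video agent is the sum of three meters. In the sketch below, only the $0.05 per minute avatar rate is from Hedra's pricing; the LLM and TTS rates are placeholder assumptions.

```python
# Sketch of the all-in per-minute cost of a real-time video agent.
# AVATAR_PER_MIN is Hedra's published Live Avatars rate; the LLM and
# TTS rates passed in below are placeholder assumptions.

AVATAR_PER_MIN = 0.05  # Live Avatars streaming rate, billed per minute

def agent_cost_per_minute(llm_per_min: float, tts_per_min: float) -> float:
    """Live Avatars pricing applies on top of, not in place of,
    the LLM and TTS costs the agent incurs."""
    return round(AVATAR_PER_MIN + llm_per_min + tts_per_min, 4)

# e.g. assuming $0.02/min of LLM tokens and $0.10/min of TTS audio:
total = agent_cost_per_minute(llm_per_min=0.02, tts_per_min=0.10)
```

Under these assumed upstream rates, the avatar stream is less than a third of the total bill, which is the economic argument behind making video agents viable where only voice agents were affordable before.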
Hedra Character competes in a market that includes enterprise focused AI avatar platforms, audio driven talking head tools, and broader generative video models that have added performance transfer features. The competitive picture in 2025 is shaped by where each product sits on the spectrum from short audio driven clips to fully synthesized corporate videos to performance driven character animation.
| System | Primary input | Primary output | Best fit |
|---|---|---|---|
| Hedra Character-3 | Image, text, audio | Audio driven talking character video, real-time stream via Live Avatars | Talking character clips from a single image, real-time AI agents with a visible presence |
| HeyGen | Script plus avatar selection or uploaded photo | AI avatar video, multilingual dubbing | Marketing and sales video, multilingual translation of existing video, photo avatars |
| Synthesia | Script plus avatar selection | Enterprise AI avatar video | Corporate training, internal communication, compliance heavy enterprise content |
| D-ID | Photo plus script or audio | Talking head video, real-time interactive agents | Photo to video animation, agent style talking head products |
| Runway Act-Two | Driving performance video plus character reference | Full body and face performance transfer | Cinematic performance capture, previz, indie animation |
The practical positioning of Hedra Character against these systems can be summarized as follows. Hedra's strengths concentrate around audio driven character animation from a single image, with naturalistic lip motion and broad support for stylized references. The Live Avatars product extends those strengths into real-time, and the per minute price is the most aggressive in the streaming category as of late 2025. Hedra's weaknesses are resolution (capped at 720p where HeyGen and Synthesia routinely deliver 1080p or higher) and language coverage (around fifteen supported languages versus 175 plus on HeyGen for translation and 140 plus avatar languages on Synthesia).
HeyGen and Synthesia are not direct architectural competitors to Character-3 because they are platform products rather than single models. HeyGen and Synthesia both wrap a library of pre-built avatars and a translation pipeline around their underlying generation models, which suits the enterprise training and marketing use cases that anchor their revenue. Hedra Character-3, by contrast, is a foundation model exposed through Hedra Studio and the API, with the assumption that users will bring their own reference image and voice rather than pick from a stock avatar list.
D-ID is the closest historical competitor on the talking head axis. D-ID's photo to video and real-time interactive agent products predate Hedra and serve a similar pool of customer support, ed-tech, and content production use cases. Hedra Character-3 produces more naturalistic upper body motion than D-ID's earlier face-only models and Live Avatars undercuts D-ID's real-time pricing, but D-ID has a broader catalog of pre-built avatars and a more mature enterprise sales motion.
Runway Act-Two, released in July 2025, sits in an adjacent rather than overlapping category. Act-Two is a performance transfer model: it requires a driving performance video and produces full body and hand animation of the target character. Hedra Character-3 is an audio driven model: it produces talking character video from a single image and a sound file without a driving performance. Filmmakers and indie animators tend to combine the two when they need both a recorded performance and an audio dialogue track on the same character.
Coverage of the Character family has been broadly positive across the major launches. VentureBeat introduced Character-1 in June 2024 as a notable new entry in the talking head category and emphasized the model's seed funding and team background. Tom's Guide described Character-1 as a significant step up from contemporary talking photo tools because it produced fewer obvious artifacts on stylized inputs. The talking baby podcast trend in mid 2024, which was reported by TechCrunch in its Series A coverage, drove the company's first wave of mainstream attention and contributed to early subscription growth.
The October 2024 Character-2 release was treated more incrementally in the press, with most coverage focused on the new aspect ratio options and the longer-form consistency improvements. The Hedra company X account and the Virtual Beings community group both highlighted the release as a generational step forward in audio to video models.
Character-3, released in March 2025, was the version that drew sustained coverage in the AI trade press. LearnPrompting, Neuronad, AIbase, Jon Peddie Research, and several creator focused outlets covered the announcement, with most articles framing the omnimodal claim as the headline. The Series A coverage two months later from TechCrunch, SiliconANGLE, Yahoo Finance, Deadline, and StartupHub built on that framing, with several articles noting that Hedra had become the most visible specialist in character video at a time when general purpose text to video models from OpenAI, Google, and Runway were taking up most of the oxygen in the broader video generation space.
Live Avatars in July 2025 attracted particular attention because of the price point and the latency claim. Metaverse Post described the launch as a meaningful step in the shift from voice agents to video agents and highlighted the LiveKit partnership as evidence that Hedra was building for production deployments rather than only demos. AIbase, Medium creator coverage, and KinomotoMAG echoed the framing, with most articles concluding that the $0.05 per minute price would force competitors to reprice or to differentiate on features such as avatar customization or enterprise compliance.
Creator and developer response on X has been broadly favorable, with several recurring criticisms. The 720p resolution cap is the most common complaint, especially from users who plan to use Hedra output in widescreen video projects alongside footage shot at higher resolution. Some users have reported that the model struggles with extreme camera angles in the reference image, with profile views, and with characters whose framing differs significantly from the typical upper body composition the model was trained on. The voice cloning quality depends on the third party providers integrated into Hedra Studio rather than on a Hedra owned voice model, which has been a source of inconsistency in user reports.
The broader question of safety and consent has surfaced periodically. Audio driven avatar tools have well documented potential for misuse in non-consensual likeness applications, and Hedra's terms of service prohibit such uses. Hedra has stated that it uses content moderation and account review to enforce its policy, in line with industry peers such as HeyGen and D-ID, but the company has not published detailed transparency reports on takedown volumes or moderation outcomes.