D-ID is an Israeli artificial intelligence company specializing in generative AI technology for creating photorealistic digital humans and talking avatars from still images. Founded in 2017 by Gil Perry, Sella Blondheim, and Eliran Kuta, the company is headquartered in Tel Aviv, Israel, with additional offices in Wilmington, Delaware. D-ID's flagship product, the Creative Reality Studio, allows users to produce AI-generated videos featuring lifelike talking avatars from a single photograph, supporting over 119 languages. The company has raised approximately $48 million in total funding and serves customers including Warner Bros. Studios, Mondelez, Publicis, and MyHeritage [1][2].
D-ID originally focused on protecting images from unauthorized facial recognition and mass surveillance. Over time, the company pivoted toward generative AI video production, becoming one of the leading platforms in the AI avatar and synthetic media space. The company's technology gained global recognition through its partnership with MyHeritage on the viral Deep Nostalgia feature, which has been used to animate over 118 million faces in historical photographs.
D-ID was founded in 2017 by Gil Perry (CEO), Sella Blondheim (COO), and Eliran Kuta (CTO). The three founders are veterans of the Israel Defense Forces' elite intelligence unit, Unit 8200, and met during their military service. Perry holds a B.Sc. in Computer Science from Tel Aviv University with a focus on computer vision and image processing. Kuta is a computer vision and image processing expert with extensive management and development experience, while Blondheim brings expertise in project management, advertising, and marketing, holding a B.Sc. in Industrial Engineering from Shenkar College of Engineering and Design [3].
The company's original mission centered on de-identification technology, a system designed to protect photographs from facial recognition software. The name "D-ID" itself is shorthand for "de-identification." The founders were motivated by growing concerns about mass surveillance and the erosion of privacy through facial recognition systems deployed without consent. In the years following the founding, governments and corporations had deployed facial recognition at an unprecedented scale, from public spaces to social media platforms, and the founders saw an urgent need for counter-technology that could protect individuals' visual identities [4].
D-ID's initial product could alter images in ways imperceptible to the human eye but sufficient to prevent facial recognition algorithms from matching those images to identity databases. The technology applied subtle pixel-level perturbations to facial images that disrupted the mathematical representations (known as face embeddings) that recognition algorithms use to identify individuals. From a human perspective, the altered photo looked identical to the original, but to a facial recognition system, the two images appeared to belong to different people.
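The principle behind this kind of perturbation can be illustrated with a toy model. The sketch below uses a random linear projection as a stand-in "face embedding" (real recognition networks are deep and nonlinear, and D-ID's actual method is proprietary); it shows how a tiny, bounded pixel change chosen against the embedding's gradient measurably shifts the embedding while leaving the image visually unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "face embedding": a fixed random linear projection.
# This is purely illustrative; it is NOT D-ID's model.
W = rng.normal(size=(128, 64 * 64))

def embed(img):
    v = W @ img.ravel()
    return v / np.linalg.norm(v)

photo = rng.uniform(0, 1, size=(64, 64))
original = embed(photo)

# Adversarial direction: for this linear model, the gradient of the
# match score <original, W x> with respect to the pixels x is W^T original,
# so stepping against its sign pushes the embedding away from the original.
grad = W.T @ original
perturbed = np.clip(photo - 0.03 * np.sign(grad).reshape(64, 64), 0, 1)

print(np.abs(perturbed - photo).max())     # bounded by 0.03: visually imperceptible
print(float(original @ embed(perturbed)))  # cosine similarity drops below 1.0
```

A recognition system that matches faces by thresholding this similarity score can be pushed below its match threshold by such perturbations, even though a human sees no difference between the two images.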
D-ID participated in Y Combinator and gradually expanded its technology from privacy protection into creative applications of face animation and synthetic media.
While the privacy-focused de-identification technology attracted early interest from security-conscious organizations and privacy advocates, D-ID's leadership recognized a broader commercial opportunity in using their deep learning face animation capabilities for content creation. The company began developing tools that could take a single still photograph of a face and animate it to speak, blink, and express emotions with convincing realism.
This pivot positioned D-ID within the rapidly growing synthetic media industry, where businesses and individuals sought efficient ways to produce video content without traditional filming equipment, actors, or studios. The shift also aligned with the explosive growth of AI-generated content tools in the early 2020s, as advances in generative adversarial networks, diffusion models, and transformer architectures made increasingly realistic synthetic media possible.
The pivot did not mean D-ID abandoned its privacy roots entirely. The company continued to offer de-identification services alongside its creative tools, and the technical expertise in understanding and manipulating facial features at the pixel level directly informed the quality of its generative products.
D-ID's breakout moment came in February 2021 when MyHeritage launched the Deep Nostalgia feature powered by D-ID's Live Portrait technology. The feature, which animated faces in old family photographs, went viral on social media, particularly on TikTok, where millions of users shared videos of their deceased relatives appearing to move and look around. The emotional resonance of seeing long-dead family members "come to life" drove extraordinary engagement, and within months, nearly 100 million faces had been animated through the feature [5][8].
The viral success of Deep Nostalgia significantly raised D-ID's profile and demonstrated the consumer appeal of face animation technology. It also showed that the company's technology could handle a vast range of input image qualities, from crisp modern photographs to faded, damaged images from the 19th century.
Building on this momentum, D-ID launched the Creative Reality Studio in 2022, expanding from a technology provider (powering other companies' products) to a direct-to-consumer and direct-to-business platform. The studio made D-ID's face animation technology available to anyone through a web browser, dramatically lowering the barrier to entry for AI video creation.
D-ID's core technology uses deep learning models to analyze the structure of a human face in a photograph, then generate frame-by-frame animations that produce natural-looking head movements, lip synchronization, and facial expressions. The system processes a static input image along with either a text script or audio file and outputs a video in which the photographed individual appears to speak the provided content.
The animation pipeline handles several technical challenges simultaneously:
| Challenge | D-ID's Approach |
|---|---|
| Lip synchronization | Deep learning model maps phonemes to mouth shapes in real time |
| Head movement | Generates natural micro-movements to avoid an uncanny stillness |
| Eye blinking | Adds realistic blink patterns and gaze behavior |
| Facial expressions | Supports emotional range through configurable expression parameters |
| Multi-language support | Handles 119+ languages with appropriate mouth shapes for each phonetic system |
| Resolution preservation | Maintains the quality of the original input photograph in the output video |
| Occlusion handling | Manages partial face visibility (hair, glasses, headwear) |
| Aging and damage tolerance | Processes degraded historical photographs alongside modern high-resolution images |
The system's approach to lip synchronization is particularly notable. Rather than using a simple lookup table that maps speech sounds to mouth shapes, D-ID's model generates continuous, fluid mouth movements that account for coarticulation (the phenomenon where the pronunciation of one sound is influenced by surrounding sounds). This produces more natural-looking speech than simpler approaches, which can result in robotic or disjointed mouth movements.
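The difference between a lookup-table approach and a coarticulation-aware one can be sketched with a toy viseme model. The phoneme set and mouth-openness weights below are invented for illustration (D-ID's actual model is a learned deep network, not a smoothing filter); the point is that blending each mouth shape with its neighbors removes the abrupt frame-to-frame jumps that make lookup-based lip sync look robotic.

```python
import numpy as np

# Invented mouth-openness targets per phoneme (0 = closed, 1 = wide open).
VISEME = {"p": 0.0, "a": 1.0, "o": 0.7, "s": 0.2, "m": 0.0, "e": 0.8}

def naive_track(phonemes, frames_per_phoneme=4):
    # Hard lookup: the mouth jumps instantly between target shapes.
    return np.repeat([VISEME[p] for p in phonemes], frames_per_phoneme)

def coarticulated_track(phonemes, frames_per_phoneme=4, kernel=5):
    # Smooth the target curve so each mouth shape is influenced by its
    # neighbours -- a crude stand-in for learned coarticulation.
    hard = naive_track(phonemes, frames_per_phoneme)
    k = np.ones(kernel) / kernel
    return np.convolve(hard, k, mode="same")

phonemes = ["m", "a", "p", "a", "s"]
hard = naive_track(phonemes)
smooth = coarticulated_track(phonemes)
print(np.abs(np.diff(hard)).max())    # large jumps between frames
print(np.abs(np.diff(smooth)).max())  # much smaller per-frame change
```

In the naive track the mouth snaps from fully closed to fully open in a single frame; the smoothed track spreads that transition across several frames, which is closer to how real articulators move.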
D-ID's proprietary Live Portrait technology forms the foundation of several of its products and partnerships. This system was the technology behind the viral Deep Nostalgia feature on MyHeritage, which animated the faces in historical family photographs. Live Portrait works by mapping a pre-recorded "driver" video of facial movements onto a target still image, transferring motion from one face to another while preserving the identity of the person in the photograph [5].
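The driver-video idea can be sketched in terms of facial keypoints. In this toy example (the keypoint layout and numbers are invented, and Live Portrait's actual pipeline is a learned image-generation model, not simple offset arithmetic), per-frame motion is extracted from the driver as offsets relative to its first frame and applied to the target photo's keypoints, so only motion transfers while the target's identity geometry is preserved.

```python
import numpy as np

# Three illustrative keypoints on the target photo: left eye, right eye, mouth.
target_kp = np.array([[0.30, 0.40], [0.70, 0.40], [0.50, 0.75]])

# Driver "video": the same three keypoints tracked over two frames.
driver = np.array([
    [[0.32, 0.41], [0.72, 0.41], [0.52, 0.74]],   # frame 0 (reference)
    [[0.33, 0.43], [0.73, 0.43], [0.52, 0.70]],   # frame 1 (head tilts, mouth opens)
])

offsets = driver - driver[0]   # motion only; the driver's identity is removed
animated = target_kp + offsets # per-frame keypoints for the still photo
print(animated.shape)          # (frames, points, coords)
```

Frame 0 of the animated track equals the original target keypoints exactly, because the reference frame carries zero offset; later frames inherit the driver's motion on top of the target's own facial geometry.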
The technology handles a range of challenging input conditions, including faces partially occluded by hair, glasses, or headwear, as well as faded or damaged historical photographs alongside modern high-resolution images.
Beyond pre-recorded video generation, D-ID has developed technology for real-time interactive digital humans. These Natural User Interface agents combine D-ID's face animation with large language model text generation, enabling face-to-face conversations between users and AI-driven avatars. The agents can be integrated into websites, applications, and kiosks to serve as virtual customer service representatives, tutors, or brand ambassadors [6].
NUI agents represent an evolution from passive content creation (generating a video) to active interaction (holding a conversation). The technical requirements are significantly more demanding, as the system must generate facial animations in real time with low enough latency to feel conversational. D-ID achieves this through a streaming architecture where the avatar begins responding visually before the full response has been generated, similar to how ChatGPT streams text responses token by token.
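The streaming idea can be sketched as follows. Function names here are illustrative, not D-ID's actual API: a token-streaming LLM call is buffered and flushed to the animation engine at phrase boundaries, so the avatar starts speaking the first phrase while later tokens are still being generated.

```python
from typing import Iterator

def llm_stream(prompt: str) -> Iterator[str]:
    # Stand-in for a token-streaming LLM call (e.g. an SSE response).
    for token in ["Hello", " there", ",", " how", " can", " I", " help?"]:
        yield token

def speak_streaming(prompt: str) -> list[str]:
    """Flush buffered text to the renderer at phrase boundaries."""
    spoken, buffer = [], ""
    for token in llm_stream(prompt):
        buffer += token
        # Phrase boundary: hand this chunk to the animation engine now,
        # rather than waiting for the complete reply.
        if buffer.endswith((",", ".", "!", "?")):
            spoken.append(buffer)   # stand-in for "animate this chunk"
            buffer = ""
    if buffer:
        spoken.append(buffer)
    return spoken

print(speak_streaming("hi"))  # → ['Hello there,', ' how can I help?']
```

The first chunk reaches the renderer after only three tokens, which is what keeps perceived latency conversational even when the full reply takes seconds to generate.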
Typical NUI agent deployments include:
| Use Case | Description |
|---|---|
| Customer support kiosks | AI avatars that greet customers and answer questions in retail locations, airports, or hotels |
| Virtual tutors | Interactive educational characters that teach language, science, or other subjects |
| Corporate training | Onboarding presenters that guide new employees through company policies and procedures |
| Healthcare information | Patient-facing avatars that explain medical procedures or medication instructions |
| Museum and exhibit guides | Interactive characters that provide information about exhibits in cultural institutions |
The Creative Reality Studio is D-ID's flagship self-service platform, launched in 2022 to allow businesses and individual creators to produce AI-generated videos without technical expertise. The studio combines D-ID's face animation technology with text-to-speech capabilities and LLM integration into a single browser-based interface [7].
Key features of the Creative Reality Studio include animating a single uploaded photograph into a talking presenter, generating narration from a typed script via text-to-speech in 119+ languages, and LLM integration for script assistance, all within a browser-based interface.
D-ID offers a Chat API that enables developers to build face-to-face conversational experiences powered by AI. This API connects D-ID's real-time face animation with LLMs such as GPT-4 and Claude, allowing developers to create interactive digital humans that can hold natural conversations. The Chat API supports streaming, meaning the avatar responds in real time rather than requiring pre-rendered video [6].
The API provides configuration options covering the source image for the avatar, the voice used for speech, and the large language model that drives the conversation.
The D-ID Agents platform allows businesses to deploy persistent AI characters that maintain context across conversations, remember previous interactions, and can be trained on company-specific knowledge bases. These agents are designed for use cases such as customer support, employee training, and interactive educational content. Unlike the Chat API, which is designed for developers, the Agents platform provides a no-code interface for business users to create and manage their digital human deployments.
For developers who need to generate talking-head videos programmatically (rather than interactively), D-ID provides a Video API. This API accepts an image, a script or audio file, and configuration parameters, and returns a rendered video. Common use cases include automated video generation pipelines for e-commerce product descriptions, news summaries, and personalized marketing campaigns.
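As a sketch of what such a request looks like, the snippet below builds a payload following the publicly documented shape of D-ID's `/talks` endpoint (a source image URL plus a script object). Treat the endpoint, field names, and voice identifier as assumptions to verify against the current API reference; the code only constructs the request body and does not send it.

```python
import json

API_URL = "https://api.d-id.com/talks"  # assumed endpoint; check the API docs

def build_talk_request(image_url: str, script_text: str, voice_id: str) -> str:
    """Build a JSON body for a talking-head video generation request."""
    payload = {
        "source_url": image_url,      # the still photograph to animate
        "script": {
            "type": "text",           # alternatively "audio" with an audio file
            "input": script_text,
            "provider": {"type": "microsoft", "voice_id": voice_id},
        },
    }
    return json.dumps(payload)

body = build_talk_request(
    "https://example.com/face.jpg",
    "Welcome to our product tour.",
    "en-US-JennyNeural",            # assumed voice identifier
)
print(body)
```

In an automated pipeline, this body would be POSTed with the account's API key, and the response would contain an identifier to poll for the rendered video.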
The most publicly visible application of D-ID's technology is the Deep Nostalgia feature, developed in partnership with genealogy platform MyHeritage. Launched in February 2021, Deep Nostalgia allows users to animate the faces in old family photographs, creating short videos that show how deceased relatives might have moved and looked in life. The feature became a viral sensation, generating hundreds of millions of views on social media platforms including TikTok and Instagram. As of 2025, users have animated over 118 million faces through Deep Nostalgia [5][8].
Following the success of Deep Nostalgia, D-ID and MyHeritage expanded their partnership with the LiveStory feature, which adds narrated voice to the animated photographs, allowing users to hear what appears to be their ancestors speaking. This feature combines D-ID's face animation with text-to-speech technology to produce a fully narrated video from a single photograph and a written script.
D-ID partnered with ElevenLabs to integrate premium AI-generated voices into the Creative Reality Studio. This partnership gives D-ID users access to ElevenLabs' highly realistic text-to-speech voices, which are known for their natural intonation, emotional expressiveness, and support for voice cloning. The integration means users can pair D-ID's visual realism with ElevenLabs' audio realism, producing videos where both the face and voice are AI-generated but appear convincingly human [9].
D-ID serves a range of enterprise customers across industries:
| Client | Industry | Use Case |
|---|---|---|
| Warner Bros. Studios | Entertainment | Film and entertainment production applications |
| Mondelez | Consumer goods | Marketing and brand communication videos |
| Publicis | Advertising | Creative campaigns at scale |
| MyHeritage | Genealogy | Deep Nostalgia and LiveStory features |
| Various educational institutions | Education | Virtual tutors and training content |
| Healthcare providers | Healthcare | Patient education and information delivery |
D-ID operates on a credit-based pricing model with several tiers designed for different user segments. Credits are consumed based on the duration of video generated, with one credit typically corresponding to a set amount of video time.
| Plan | Price | Key Features |
|---|---|---|
| Free Trial | $0 (14 days) | Limited credits, watermarked output, access to core features |
| Lite | Starting at $6/month | Basic video generation, limited credits per month |
| Pro | Mid-tier pricing | Increased credits, no watermark, API access, premium presenters |
| Advanced | Higher tier | More credits, priority rendering, advanced avatar features |
| Enterprise | Custom pricing | Unlimited or custom usage, dedicated support, custom integrations, SLA guarantees |
API pricing operates on a separate structure with per-minute or per-credit billing. Both monthly and annual billing options are available, with annual plans offering discounted rates. Credits are issued monthly on the billing date and do not roll over to subsequent months [10].
D-ID has raised a total of approximately $48 million across multiple funding rounds.
| Round | Amount | Lead Investor(s) | Notable Participants | Date |
|---|---|---|---|---|
| Seed | Undisclosed | Various | Early-stage investors | 2018 |
| Series A | $13.5 million | AXA Venture Partners, Pitango | OurCrowd | 2021 |
| Series B | $25 million | Macquarie Capital | Pitango, AXA Venture Partners, OurCrowd, OIF, Maverick, Marubeni | 2022 |
| Total | ~$48 million | | | |
The Series B round in 2022, led by Macquarie Capital, brought D-ID's total funding to $48 million. The company stated that the funds would be used to expand its Creative Reality platform, invest in research and development, and grow its team. The funding reflected investor confidence in the growing market for AI-generated video content, which was seeing rapid adoption across marketing, education, and entertainment sectors [2].
D-ID operates in the competitive AI video generation and synthetic media market alongside several other platforms.
| Competitor | Headquarters | Focus | Key Differentiator |
|---|---|---|---|
| Synthesia | London, UK | Enterprise video | Highly realistic studio-quality avatars, strong corporate adoption |
| HeyGen | Los Angeles, USA | Script-to-video | 4K export, superior lip-sync, comprehensive template library |
| Colossyan | Budapest, Hungary | Training video | Emphasis on educational and corporate training content |
| Elai.io | Delaware, USA | Automated video | Video generation from text, URLs, and slides |
| DeepBrain AI | Seoul, South Korea | Broadcast video | Broadcast-quality avatars for news and media |
| Hour One | Tel Aviv, Israel | Enterprise | Presentations, training, and compliance content |
| Pictory | Dublin, Ireland | Content repurposing | Converting long-form content into short videos |
D-ID differentiates itself primarily through its ability to animate any photograph into a talking avatar (rather than requiring users to choose from a fixed library of pre-recorded presenters), its extensive 119+ language support, the strength of its API for developers building custom applications, and its proven track record with viral consumer products like Deep Nostalgia. However, competitors like Synthesia and HeyGen are often noted for producing higher visual fidelity in their avatar outputs, particularly for corporate and enterprise use cases. HeyGen, in particular, has gained ground with its 4K output quality and more expressive avatar animations [11].
D-ID's technology raises important questions about synthetic media ethics. The ability to make any photographed person appear to speak arbitrary content has obvious potential for misuse, including the creation of deepfakes and misinformation.
D-ID has adopted measures intended to mitigate these risks.
The broader industry of AI-generated synthetic media continues to face regulatory scrutiny, with proposed legislation in multiple jurisdictions requiring disclosure when AI-generated content depicts real people. The EU AI Act, for instance, includes transparency requirements for AI-generated content, and similar proposals are advancing in the United States and other countries.