Seedance 2.0


Updated Jun 2, 2026


Seedance 2.0 is a multimodal video generation model developed by ByteDance, released in February 2026 as the second major version of the company's Seedance line. Its defining feature is native audio-video joint generation: rather than producing silent footage and adding sound afterward, the model outputs synchronized visuals and audio together, including lip-synced speech, ambient sound, and music. It accepts four input modalities (text, image, audio, and video). Consumers reach it through ByteDance's Dreamina and Jimeng creative apps and the CapCut editor, and developers through an API offered and priced on the company's Volcengine cloud platform.^[1]^[2]^[3]

Overview

Seedance 2.0 generates short video clips of 4 to 15 seconds at native resolutions of 480p and 720p (measured on the shorter edge), across six aspect ratios. A single generation can span multiple camera shots with cuts inside one clip, and the accompanying audio track is produced in the same pass as the picture rather than dubbed on afterward.^[1]^[3] ByteDance positions the model for narrative and short-form use cases such as short dramas, advertising, and social video, and the company describes its target output as "cinematic," with controls over performance, lighting, shadow, and camera movement.^[2]

The release sits within an intense period of competition among Chinese technology firms building AI video tools. When the model surfaced in China, shares of several Chinese media and content companies rallied, with Huace Media rising roughly 7 percent and Perfect World about 10 percent, as investors weighed the implications of cheaper AI-assisted production for film and television.^[4]

ByteDance Seed and the Seedance line

The model was built by ByteDance Seed, the company's foundation-model research group, which also produces the Seedream family of image models. Seedance 2.0 follows earlier Seedance releases, including Seedance 1.0 and the intermediate Seedance 1.5 Pro, and a technical report frames 2.0 as a substantial step up across the main dimensions of video and audio quality rather than an incremental update.^[3]

The Seedance branding is shared across ByteDance's product surfaces. On the company's Chinese consumer app, Jimeng, and the international app, Dreamina (both tied to the CapCut and Jianying editors), the model powers AI video features; on the developer side it is referenced as Doubao-Seedance-2.0, aligning it with ByteDance's Doubao consumer assistant and model brand.^[5]^[6]

Release

ByteDance Seed announced Seedance 2.0 on February 12, 2026, describing it on the company's research site as adopting "a unified multimodal audio-video joint generation architecture that supports text, image, audio, and video inputs."^[1]^[2] The model had already begun circulating in China over the preceding weekend, drawing attention on social platforms before the formal announcement.^[4]

The launch was not entirely smooth. On February 10, 2026, ByteDance suspended a feature that could synthesize a person's voice from a facial photograph alone. The capability drew scrutiny after a technology commentator uploaded his own photo and reported that the model produced audio nearly identical to his real voice without any voice sample, raising concerns about deepfakes, fraud, and impersonation. In response, ByteDance barred the use of realistic human photos or videos as reference subjects and added a live-verification step, requiring users to record their own image and voice before creating a digital avatar.^[7]

Distribution expanded over the following weeks. Through CapCut's paid tier, Dreamina Seedance 2.0 reached users across Southeast Asia, Latin America, Africa, the Middle East, parts of Europe, Japan, and the United States, while a developer-facing API was published with pricing on Volcengine.^[5]^[6]

Capabilities

Seedance 2.0's central capability is generating video and its soundtrack together in one model pass. The audio includes dialogue with lip synchronization in multiple languages, along with ambient effects and music, and is produced as a dual-channel track.^[1]^[3] Because picture and sound are generated jointly, the model can keep speech, on-screen action, and scene cuts aligned within a clip.^[6]

The model takes mixed-modality input. A single request can combine natural-language instructions with reference media: ByteDance's open platform allows up to 9 images, 3 video clips, and 3 audio clips per generation.^[1]^[3] These references support what the company calls director-level control, letting users steer character performance, lighting, camera motion, and other elements using example media rather than text alone.^[2]

Beyond generation from scratch, Seedance 2.0 supports video extension and editing. ByteDance describes stable and controllable extension of existing footage and targeted modification of specified clips, characters, actions, and storylines, positioning editing as a first-class function alongside fresh generation.^[1]^[3]

The following table summarizes the model's disclosed specifications.

Attribute	Specification
Output type	Joint audio plus video, single pass
Duration	4 to 15 seconds
Native resolution	480p and 720p (shorter edge)
Aspect ratios	Six, including landscape and portrait
Audio	Dual-channel; dialogue, ambient sound, music
Input modalities	Text, image, audio, video
Reference limits	Up to 9 images, 3 video clips, 3 audio clips
Multi-shot	Multiple shots with cuts within one clip
Editing	Video extension and targeted clip and character edits
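
The disclosed limits in the table above can be expressed as a simple pre-flight check. The following is a minimal sketch only: the `GenerationRequest` shape, its field names, and the `validate` helper are hypothetical and do not reflect ByteDance's or Volcengine's actual API schema. Only the numeric limits come from the disclosed specifications.

```python
from dataclasses import dataclass, field

# Limits taken from the disclosed specifications.
MIN_SECONDS, MAX_SECONDS = 4, 15
RESOLUTIONS = {"480p", "720p"}
REFERENCE_LIMITS = {"image": 9, "video": 3, "audio": 3}


@dataclass
class GenerationRequest:
    """Hypothetical request shape; not the real API schema."""
    prompt: str
    duration_s: int
    resolution: str
    # Number of reference files per modality, e.g. {"image": 2, "audio": 1}.
    references: dict = field(default_factory=dict)


def validate(req: GenerationRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if not MIN_SECONDS <= req.duration_s <= MAX_SECONDS:
        problems.append(
            f"duration {req.duration_s}s outside {MIN_SECONDS}-{MAX_SECONDS}s"
        )
    if req.resolution not in RESOLUTIONS:
        problems.append(f"unsupported resolution {req.resolution!r}")
    for kind, count in req.references.items():
        limit = REFERENCE_LIMITS.get(kind)
        if limit is None:
            problems.append(f"unknown reference type {kind!r}")
        elif count > limit:
            problems.append(f"{count} {kind} references exceed limit of {limit}")
    return problems
```

A request with 9 images, 3 video clips, and 3 audio clips at 10 seconds and 720p passes this check, while a 20-second request or one with 10 reference images does not.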

Technical approach as disclosed

ByteDance has released limited architectural detail. The company and an associated technical report describe a "unified, highly efficient, and large-scale architecture" for multimodal audio-video joint generation supporting the four input modalities, and note a lower-latency "Fast" variant intended for scenarios that prioritize speed.^[1]^[3] The report, a research paper titled "Seedance 2.0: Advancing Video Generation for World Complexity" and credited to ByteDance Seed, was posted in April 2026, after the February release. Observers have characterized it as closer to a product and benchmark showcase than a detailed account of training data, infrastructure, or model internals.^[3]

ByteDance evaluates the model with its own benchmark suite, SeedVideoBench-2.0, which it reports covers multiple task types and quality dimensions and on which it places Seedance 2.0 in a leading position.^[2] Because that benchmark is the developer's own, the independent leaderboard results discussed below are a more useful external reference.

Availability and pricing

Seedance 2.0 is offered to consumers through ByteDance's creative apps and to developers through its cloud. Consumer access runs through Jimeng in China and Dreamina internationally, both connected to the CapCut and Jianying editors, generally on paid tiers. For developers, Volcengine published pricing for the Doubao-Seedance-2.0 model that is billed by token consumption rather than by clip: 46 yuan per million tokens for pure generation and 28 yuan per million tokens when the request includes video input (editing). ByteDance reported that generating a 15-second clip consumes roughly 308,880 tokens, which at the pure-generation rate comes to about 14.2 yuan per clip, or about 1 yuan (roughly 0.14 US dollars) per second. When the pricing was disclosed in early March 2026, the API was described as in limited or internal release rather than openly available to all third-party developers.^[6]^[8]

Channel	Platform	Audience	Access and pricing notes
Consumer app (China)	Jimeng (with Jianying / CapCut)	General users in China	Paid creative tiers
Consumer app (international)	Dreamina (with CapCut)	Users across SE Asia, Latin America, Africa, Middle East, parts of Europe, Japan, US	Rolled out on CapCut paid tier
Developer API	Volcengine (Doubao-Seedance-2.0)	Developers in China	28 yuan / million tokens with video input (editing); 46 yuan / million tokens for pure generation; about 1 yuan (~$0.14) per second; limited release at launch

Pricing reported by third-party resellers differs from the first-party figures above and is not used here. The yuan-to-dollar conversions are approximate and reflect the rates cited at the time of the announcement.^[8]
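
The per-second figure can be reproduced from the reported numbers. The sketch below assumes that token consumption scales linearly with clip length and uses a round exchange rate of 7 yuan per US dollar; both are illustrative assumptions, not figures from ByteDance, and the function names are hypothetical.

```python
# Figures reported for the Volcengine API; a snapshot, not a live price list.
TOKENS_PER_15S_CLIP = 308_880
GENERATION_YUAN_PER_MILLION = 46.0
YUAN_PER_USD = 7.0  # assumed round exchange rate, for illustration only


def tokens_for(seconds: float) -> float:
    """Estimate tokens for a clip, assuming linear scaling with duration."""
    return TOKENS_PER_15S_CLIP / 15 * seconds


def cost_yuan(seconds: float,
              yuan_per_million: float = GENERATION_YUAN_PER_MILLION) -> float:
    """Estimated cost in yuan at a given per-million-token rate."""
    return tokens_for(seconds) * yuan_per_million / 1_000_000


if __name__ == "__main__":
    total = cost_yuan(15)
    # Prints: 15 s clip: 14.21 yuan, 0.95 yuan per second
    print(f"15 s clip: {total:.2f} yuan, {total / 15:.2f} yuan per second")
```

At the assumed exchange rate, 0.95 yuan per second is about 0.14 US dollars, in line with the reported figure.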

Reception and benchmark standing

Seedance 2.0 was received as a strong entrant in AI video. On the Artificial Analysis Video Arena, which ranks models by Elo rating derived from blind human preference votes, the entry listed as "Dreamina Seedance 2.0 720p" placed at or very near the top of the categories restricted to models that produce audio output. Shortly after launch it was reported as narrowly leading both the text-to-video and image-to-video arenas; in later snapshots of the audio-output text-to-video category it sat in roughly second place with an Elo near 1,213, in a close race with other newly added systems. The arena is continuously re-rated as votes accumulate and new models are added, so its standings shift over time.^[9]^[10]
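
To give a sense of what small Elo gaps mean, the sketch below uses the textbook Elo formulas (logistic curve with a 400-point scale). The arena's exact rating procedure is not described here, so this is an assumption for illustration, and the rival rating of 1,223 is a made-up value rather than a reported score.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of A against B (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Apply one pairwise vote and return the new (rating_a, rating_b).

    k is the step size; 32 is a conventional default, not the arena's value.
    """
    delta = k * ((1.0 if a_won else 0.0) - expected_win_rate(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Hypothetical rival rated 10 points above the reported ~1,213.
    p = expected_win_rate(1213, 1223)
    print(f"expected preference share: {p:.3f}")
```

Under these formulas a 10-point gap corresponds to a split of roughly 49 to 51 in head-to-head votes, which is why closely rated models can trade places as new votes arrive.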

Press coverage emphasized the realism of the output. The South China Morning Post quoted an early tester who said the improvements in realism made it "very hard to tell whether a video is generated by AI" and who praised the storytelling and visual quality.^[4] An industry newsletter described the model as state of the art and noted that its arrival coincided with OpenAI's decision, announced in March 2026, to wind down its Sora app and API, framing ByteDance's expansion as a contrast to that retreat.^[5]

Limitations

Several constraints follow from the model's design and rollout. Output is capped at 15 seconds per clip, and native resolution tops out at 720p on the shorter edge, lower than some competing systems advertise.^[3] Open-platform inputs are bounded to 9 images, 3 video clips, and 3 audio clips per request.^[1]

The voice-cloning incident illustrates a broader risk surface for a model that synthesizes realistic faces and voices: ByteDance withdrew the photo-to-voice feature and added identity verification after it was shown to reproduce a real person's voice without consent.^[7] Access is also uneven. Consumer availability has expanded across many regions through CapCut's paid tiers, but the developer API was characterized as a limited or internal release at the time its pricing was published, rather than a generally open service.^[6]^[8] Finally, ByteDance has disclosed little about the model's training data, scale, or internal architecture, so independent assessment rests largely on output quality and third-party preference rankings rather than published technical detail.^[3]

References

1. Seedance 2.0 (project page), ByteDance Seed.
2. Official Launch of Seedance 2.0, ByteDance Seed.
3. Seedance 2.0: Advancing Video Generation for World Complexity, ByteDance Seed, arXiv:2604.14148, April 2026.
4. ByteDance's new model sparks stock rally as China's AI video battle escalates, South China Morning Post.
5. ByteDance Adds State-of-the-Art Seedance 2.0 Video to CapCut, While OpenAI Retreats, The Batch (DeepLearning.AI), issue 352.
6. ByteDance's Seedance 2.0 video model costs about $0.14 per second, TechNode.
7. ByteDance suspends Seedance 2.0 feature that turns facial photos into personal voices over potential risks, TechNode.
8. 1元1秒，字节 Seedance 2.0 视频生成 AI 模型公布 API 定价 [1 yuan per second: ByteDance announces API pricing for the Seedance 2.0 video generation AI model], IT之家 (ITHome).
9. Text to Video Leaderboard, Artificial Analysis.
10. Image to Video Leaderboard, Artificial Analysis.
