Seedance

Updated Jun 23, 2026

Seedance is the family of foundation video generation models built by the Seed team at ByteDance, the Chinese internet company that owns TikTok and Douyin. The first version, Seedance 1.0, launched on June 11, 2025; a day earlier, on June 10, 2025, it had taken first place on both the Artificial Analysis text-to-video and image-to-video leaderboards, beating the second and third best models, Google's Veo 3 and Kuaishou's Kling 2.0, by more than 100 Elo points on the image-to-video task.^[1]^[4]^[11] A second numbered release, Seedance 2.0, followed on February 12, 2026. It added a unified multimodal architecture that jointly generates video and audio from text, image, audio, and video prompts, and it reclaimed the number one slot on the Artificial Analysis text-to-video ranking with an Elo score of 1269.^[2]^[7]^[15] Together with Tencent's Hunyuan Video and Kuaishou's Kling, Seedance is one of the three Chinese model families that compete directly with Google's Veo 3 and OpenAI's Sora 2 in AI video generation.

The ByteDance Seed team describes Seedance 1.0 as "a model that supports multi-shot video generation from both text and image" that "can create 1080p videos with smooth motion, rich details, and cinematic aesthetics."^[1] It is delivered through ByteDance's own consumer products: Doubao, the company's main chat assistant; Jimeng (the Chinese version of Dreamina), its dedicated creative platform; and CapCut, the editing app used by hundreds of millions of mobile creators.^[14] Outside of China the model also reaches creators through the Dreamina web app and through paid API access on Volcano Engine (the Chinese cloud arm) and BytePlus (the international cloud arm), at a list price of about one yuan per second of pure text-to-video output.^[9] Independent third party endpoints such as fal.ai and Replicate offer the same model with different billing tiers.^[17]

Seedance models are positioned as commercial products rather than open weights. ByteDance has published a long technical report describing the underlying diffusion transformer, the RLHF stack, and the inference distillation pipeline for Seedance 1.0, but it has not released the model parameters.^[4]^[5] The reception inside the AI video community has been broadly positive: SeedVideoBench results, LMArena Video Arena voting, and Artificial Analysis Elo scores have all placed Seedance versions at or near the top of the public rankings, often ahead of older releases from Google, OpenAI, Runway, and Kuaishou.^[11]^[12] There is healthy skepticism about benchmarks designed by the same lab that ships the model, and the 1080p resolution ceiling still trails the 4K output of some Western competitors. But on speed, on price, and on the SeedVideoBench multi shot tasks, Seedance has been the model to beat through most of 2025 and into 2026.

Background

ByteDance set up its Seed research group in 2023 to consolidate its frontier model work. Before that, ByteDance's video generation efforts had been scattered across product teams at TikTok, Douyin, and the Jimeng creative app. Seed pulled the foundation model work into a single group that reports up through the Volcano Engine cloud organization. The Seed name is now attached to several model families: Doubao 1.5 and 1.6 for text and reasoning, the Seedream image model series, the BAGEL multimodal model, and Seedance for video. The team publishes papers under the ByteDance Seed Team byline and runs the seed.bytedance.com research site, where most model launches are accompanied by an English language tech report and a Chinese launch event on the Volcano Engine stage.^[5]

Video generation became a strategic priority for ByteDance for two reasons. The first is product reach. TikTok and Douyin between them handle a meaningful share of global short video traffic, and CapCut is the default mobile editor for a large fraction of those creators. Putting a competitive AI video model behind the camera button is a direct product win, especially for the kinds of vertical, low budget content the apps already host. The second reason is competitive defense. Kuaishou shipped Kling in mid 2024, MiniMax shipped Hailuo AI in late 2024, and both of those models started turning up inside Chinese content workflows before ByteDance had a flagship of its own. Seedance 1.0 was, in part, the catch up release, and Seedance 2.0 was the consolidation.

The team behind Seedance includes researchers who previously worked on Doubao text models and on the Pixeldance and MagicVideo research lines that predate the Seedance brand. Pixeldance, released in late 2024, was the first ByteDance video model to attract real attention. It was effectively a research preview. The Seedance line that followed treats video as a production system with its own data pipeline, its own reward models, and its own distilled inference path. The internal SeedVideoBench evaluation set, used to compare Seedance against Veo, Sora, Kling, and Runway, was built with help from working film directors and now serves as the de facto internal benchmark the team optimizes against.^[4]

What versions of Seedance has ByteDance released?

ByteDance has shipped three named Seedance releases between June 2025 and February 2026. Each has a base variant and, in some cases, a Pro variant that adds resolution, audio, or cinematic camera control. The Pro variants are sold at a premium tier on Volcano Engine and BytePlus.

Version	Release date	Headline feature
Seedance 1.0	June 11, 2025	First public release. Text to video and image to video at 1080p, multi shot narrative output, native bilingual prompting.
Seedance 1.0 Pro	June 11, 2025	Higher quality tier of the same architecture, shipped at launch alongside the base model.
Seedance 1.5 Pro	December 16, 2025	First audio video joint generation. Adds synchronized dialogue, sound effects, ambient audio, cinema grade camera moves, and six language voice support.
Seedance 2.0	February 12, 2026	Unified multimodal architecture that accepts up to 9 images, 3 videos, and 3 audio clips per generation. Up to 15 second multi shot output with dual channel stereo audio.

The 1.0 and 1.0 Pro releases share an architecture but differ in the inference budget, the post training data mix, and the supported resolution ceiling.^[4] The 1.5 Pro release was the audio milestone: it was the first Seedance model to generate video and audio in a single forward pass rather than dubbing audio onto silent video.^[6]^[13] The 2.0 release reorganized the input pipeline around the @ mention pattern, where creators can attach up to twelve reference files and tag each one as a source of motion, style, character, camera path, or audio rhythm.^[2]

There is no public Seedance 1.0 Lite or distilled student model in the ByteDance catalog. The team has chosen to scale the model family by adding capabilities (audio, multi reference, longer duration) rather than by shrinking it down for edge inference.

How does the Seedance architecture work?

Seedance 1.0 is a diffusion transformer with a few specific design choices that the tech report calls out. The encoder is a temporally causal variational autoencoder with downsampling ratios of 4 in time and 16 in each spatial dimension, and 48 latent channels.^[4] That compression rate is what lets the model run a five second 1080p generation in 41.4 seconds on a single NVIDIA L20, a figure the tech report states directly: "Seedance 1.0 can generate a 5-second video at 1080p resolution only with 41.4 seconds (NVIDIA-L20)."^[4] The diffusion backbone is an MMDiT design with decoupled spatial and temporal attention; visual and textual tokens carry separate weights inside the multi modality self attention blocks, and the temporal layers use window partitions to keep attention cost from exploding on long clips.^[4] A multishot variant of 3D rotary positional encoding, called Multishot MM-RoPE in the paper, lets the model interleave several shots' worth of visual tokens with their respective text prompts in a single sequence.^[4] That is what enables multi shot output without stitching clips together after the fact.
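
To make the compression ratios concrete, the sketch below computes the latent grid for a clip. Only the three ratios come from the tech report. The frame handling follows a common causal VAE convention (the first frame is encoded alone, then one latent frame per 4 input frames), which is an assumption here, as are the 848 by 480 frame size and the function names.

```python
import math

# Ratios quoted from the Seedance 1.0 tech report.
TIME_RATIO = 4
SPATIAL_RATIO = 16
LATENT_CHANNELS = 48


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumption: causal convention, first frame kept on its own, then one
    latent frame per TIME_RATIO input frames; spatial dims round up.
    """
    if frames < 1:
        raise ValueError("need at least one frame")
    latent_frames = 1 + math.ceil((frames - 1) / TIME_RATIO)
    latent_h = math.ceil(height / SPATIAL_RATIO)
    latent_w = math.ceil(width / SPATIAL_RATIO)
    return LATENT_CHANNELS, latent_frames, latent_h, latent_w


def compression_factor(frames: int, height: int, width: int) -> float:
    """Raw RGB values divided by latent values for the same clip."""
    c, t, h, w = latent_shape(frames, height, width)
    return (3 * frames * height * width) / (c * t * h * w)


if __name__ == "__main__":
    # Hypothetical 5 second, 24 fps clip: 121 frames of 848 x 480.
    print(latent_shape(121, 480, 848))
    print(compression_factor(121, 480, 848))
```

With these ratios the nominal compression is 3 x 4 x 16 x 16 / 48 = 64 raw values per latent value; the extra first frame in the causal convention brings the figure for a real clip slightly below that.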

The 1080p output does not come from the base model directly. Seedance 1.0 trains on 480p video and then attaches a diffusion refiner, a cascaded module that upscales 480p outputs to 720p or 1080p conditioned on the original generation.^[4] This is similar in spirit to the cascaded approach used in earlier Imagen Video and Make-A-Video systems, though the refiner here is a learned diffusion model rather than a deterministic super resolution network. Training proceeds in four stages: a progressive pre training stage that ramps from 256 pixel images up through 640 pixel video at 24 frames per second, a continue training stage that increases the image to video ratio from 20 to 40 percent, a supervised fine tuning stage that uses curated video text pairs and model merging across style categories, and finally an RLHF stage that uses three separate reward models, one for foundational quality, one for motion, and one for aesthetics.^[4]
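
A rough pixel count shows why generating at 480p and refining afterwards saves work. The sketch assumes 16:9 frames at the conventional sizes of 854 by 480, 1280 by 720, and 1920 by 1080. The report names only the resolution tiers, so the exact dimensions and the helper names are assumptions.

```python
# Conventional 16:9 frame sizes (assumed; the article only names the tiers).
RESOLUTIONS = {
    "480p": (854, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}


def pixels(tier: str) -> int:
    """Pixels per frame for a named resolution tier."""
    width, height = RESOLUTIONS[tier]
    return width * height


def pixel_ratio(target: str, base: str = "480p") -> float:
    """How many times more pixels per frame the target tier has than the base."""
    return pixels(target) / pixels(base)


if __name__ == "__main__":
    for tier in ("720p", "1080p"):
        print(f"{tier}: {pixel_ratio(tier):.2f}x the pixels of 480p")
```

Under these assumptions a 1080p frame has about five times as many pixels as a 480p frame, so the base diffusion model works on roughly a fifth of the final pixel count before the refiner takes over.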

The tech report summarizes the recipe as four core improvements: "multi-source data curation augmented with precision and meaningful video captioning," an efficient architecture that "natively" supports multi-shot generation, post-training that combines "fine-grained supervised fine-tuning, and video-specific RLHF," and "excellent model acceleration achieving ~10x inference speedup through multi-stage distillation strategies."^[4]

Seedance 1.5 Pro keeps the MMDiT base but rebuilds it as a unified multimodal framework that handles audio and visual tokens in the same attention stack.^[6] The paper credits a multi stage data pipeline that prioritizes audio video coherence and motion expressiveness, an SFT plus RLHF post training loop tailored to audio visual outputs, and a multi dimensional reward model that optimizes text to video, image to video, and the new joint audio video tasks separately.^[3] The 1.5 Pro release also ships an inference acceleration stack that the team claims gives a more than 10x end to end speedup over the un-distilled base model, achieved through a combination of trajectory segmented consistency distillation, score distillation, narrowed VAE decoder channels, kernel fusion, quantization, and tensor parallelism.^[3]^[6]

Seedance 2.0 keeps the unified multimodal audio video architecture and extends it. The headline architectural change is the input side. Where 1.5 Pro accepted text plus image plus optional audio, 2.0 accepts up to nine images, three short video clips (up to fifteen seconds combined), three audio clips (up to fifteen seconds combined), and a natural language prompt in a single generation request.^[2]^[7] Each attached reference can be tagged with a role: composition, motion, camera movement, character identity, visual effects, or audio rhythm.^[2] The model uses those role tags to decide which reference contributes which signal to the output. Output is multi shot, up to fifteen seconds total, with stereo audio that includes background music, ambient effects, and synchronized dialogue with millisecond lip sync.^[2]^[7]
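
The input limits above can be expressed as a small client side check. This is an illustrative sketch, not the Volcano Engine or BytePlus API: the class names, field names, and role spellings are hypothetical, and only the numeric limits and the six role categories come from the description above.

```python
from dataclasses import dataclass, field

# Limits as described in the article for Seedance 2.0.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_AUDIO = 3
MAX_CLIP_SECONDS = 15.0  # combined duration, applied per media type

# Role categories named in the article; the string spellings are hypothetical.
ROLES = {
    "composition", "motion", "camera_movement",
    "character_identity", "visual_effects", "audio_rhythm",
}


@dataclass
class Reference:
    kind: str             # "image", "video", or "audio"
    role: str             # one of ROLES
    seconds: float = 0.0  # clip duration; ignored for images


@dataclass
class GenerationRequest:
    prompt: str
    references: list[Reference] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of human readable problems (empty if none)."""
        problems = []
        limits = {"image": MAX_IMAGES, "video": MAX_VIDEOS, "audio": MAX_AUDIO}
        for kind, limit in limits.items():
            refs = [r for r in self.references if r.kind == kind]
            if len(refs) > limit:
                problems.append(f"too many {kind} references: {len(refs)} > {limit}")
            if kind != "image" and sum(r.seconds for r in refs) > MAX_CLIP_SECONDS:
                problems.append(f"{kind} references exceed {MAX_CLIP_SECONDS:g}s combined")
        for r in self.references:
            if r.kind not in limits:
                problems.append(f"unknown reference kind: {r.kind}")
            if r.role not in ROLES:
                problems.append(f"unknown role: {r.role}")
        return problems
```

A request with one character image, one 8 second motion clip, and one 10 second audio clip passes; two 9 second video clips fail the combined duration check.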

What can Seedance do?

The headline capabilities of the Seedance family have grown with each release. The table below captures what each public version can do.

Capability	Seedance 1.0 / 1.0 Pro	Seedance 1.5 Pro	Seedance 2.0
Text to video	Yes	Yes	Yes
Image to video	Yes	Yes	Yes
Audio in input	No	Optional	Up to 3 clips, 15s total
Video reference in input	No	No	Up to 3 clips, 15s total
Multiple image references	One image	One image	Up to 9 images
Native audio output	No	Yes	Yes, dual channel stereo
Lip sync to dialogue	No	Yes, multi language	Yes, millisecond precision
Multi shot narrative	Yes	Yes	Yes
Max resolution	1080p	1080p (480p, 720p, 1080p)	Up to 2K reported, 1080p official
Max duration	About 10 seconds	4 to 12 seconds	4 to 15 seconds
Frame rate	Up to 24 fps	24 fps	24 fps
Aspect ratios	16:9 and a few others	7 aspect ratios	16:9, 9:16, 4:3, 3:4, 21:9, 1:1
Camera control	Prompt based	Cinema grade camera moves, dolly zoom	Hitchcock zoom, orbit, tracking, handheld
Character consistency	Single shot stable	Improved multi shot	Strong with reference image tagging
Prompt language support	Chinese and English	6 languages plus dialects	6 languages plus dialects

Some notes on the practical edges of the model. The 1.0 release is bilingual in the sense that it understands both Chinese and English prompts natively and will produce text in either language inside the generated video if asked.^[1] The 1.5 Pro audio system covers Chinese, English, Japanese, Korean, Spanish, and Indonesian, plus Sichuanese and Cantonese as regional dialects.^[6] The 2.0 release adds the @ mention input pattern, where a creator tags each reference file in the prompt with a role; for example, "use @character.png for the protagonist's face, @style.mp4 for the camera rhythm, and @audio.wav for the soundtrack."^[2]^[19] That input pattern is what lets a single generation pass behave like a small editorial timeline rather than a one shot prompt.
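
As an illustration of the @ mention pattern, the sketch below pulls tagged file names out of a prompt and guesses each file's media type from its extension. The regular expression, the extension table, and the function name are assumptions made for this example; ByteDance has not published how its parser works.

```python
import re

# Matches "@" followed by a file name such as character.png or audio.wav.
# The pattern is an assumption about how mentions are written.
MENTION = re.compile(r"@([\w\-]+\.[A-Za-z0-9]+)")

# Hypothetical mapping from file extension to media kind.
KIND_BY_EXTENSION = {
    "png": "image", "jpg": "image", "jpeg": "image",
    "mp4": "video", "mov": "video",
    "wav": "audio", "mp3": "audio",
}


def extract_mentions(prompt: str) -> list[tuple[str, str]]:
    """Return (file_name, media_kind) pairs in the order they appear."""
    found = []
    for name in MENTION.findall(prompt):
        extension = name.rsplit(".", 1)[1].lower()
        found.append((name, KIND_BY_EXTENSION.get(extension, "unknown")))
    return found


if __name__ == "__main__":
    example = (
        "use @character.png for the protagonist's face, @style.mp4 for "
        "the camera rhythm, and @audio.wav for the soundtrack."
    )
    for name, kind in extract_mentions(example):
        print(name, kind)
```

The role of each file (face, camera rhythm, soundtrack) lives in the surrounding natural language, which in the real product is interpreted by the model itself rather than by a rule based parser like this one.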

Multi shot generation deserves a separate note. The Seedance line was the first widely available video model to ship native multi shot output, meaning the model itself produces a clip with several internal cuts and consistent characters across those cuts, rather than producing several independent clips that have to be edited together. The product page frames this directly: Seedance 1.0 "natively supports the generation of narrative videos with multiple cohesive shots."^[1] SeedVideoBench measures multi shot subject consistency directly, and the Seedance 1.0 paper claims a margin of more than 100 Elo points over Veo 3 and Kling 2.0 on the image to video multi shot tasks.^[4]
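
A margin of 100 Elo points can be translated into an expected head to head preference rate. The sketch assumes the standard Elo logistic curve with a scale of 400; arena operators may fit ratings with a different procedure, so the result is an approximation rather than a published figure.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected share of pairwise votes won by the higher rated model.

    Uses the standard Elo formula; the 400 point scale is an assumption
    about how the leaderboard ratings are calibrated.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


if __name__ == "__main__":
    print(f"{expected_win_rate(100):.2f}")
```

Under that assumption, a 100 point lead corresponds to winning roughly 64 percent of pairwise comparisons against the lower rated model.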

How much does Seedance cost and where can you use it?

Seedance is sold as a managed API, never as open weights. The standard pricing on Volcano Engine, disclosed in March 2026, is 46 yuan (about 6.40 US dollars) per million tokens for pure text to video generation, and 28 yuan (about 3.90 US dollars) per million tokens for video editing workflows where an input video is supplied.^[8] A fifteen second 1080p generation consumes roughly 308,880 tokens, which works out to about one yuan, or fourteen US cents, per second of output.^[9] Volcano Engine grants new accounts five million free tokens at registration to seed the free experience center before billed usage begins.^[8]
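
The per second figure follows from the token price and the token count. The sketch below reproduces that arithmetic using only the numbers quoted above; the function and constant names are illustrative.

```python
YUAN_PER_MILLION_TOKENS = 46.0   # pure text to video, per the article
USD_PER_MILLION_TOKENS = 6.40    # the article's dollar equivalent
TOKENS_PER_15S_1080P = 308_880   # the article's figure for a 15 second 1080p clip
FREE_TOKENS = 5_000_000          # free grant for new accounts, per the article


def cost(tokens: int, price_per_million: float) -> float:
    """Total cost of a generation that consumes the given number of tokens."""
    return tokens / 1_000_000 * price_per_million


def per_second(tokens: int, seconds: float, price_per_million: float) -> float:
    """Cost per second of generated output."""
    return cost(tokens, price_per_million) / seconds


if __name__ == "__main__":
    yuan = per_second(TOKENS_PER_15S_1080P, 15, YUAN_PER_MILLION_TOKENS)
    usd = per_second(TOKENS_PER_15S_1080P, 15, USD_PER_MILLION_TOKENS)
    print(f"{yuan:.2f} yuan per second")
    print(f"{usd:.3f} USD per second")
    print(FREE_TOKENS // TOKENS_PER_15S_1080P, "free 15 second clips")
```

The result is about 0.95 yuan, or roughly 13 US cents, per second, in line with the rounded figure of one yuan (about fourteen cents) quoted above. At the same token count, the free grant covers 16 full length 1080p clips.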

International access runs through BytePlus, the English language version of the Volcano Engine product, and through several third party API resellers. fal.ai opened a Seedance 2.0 endpoint in April 2026, with separate pricing per resolution tier.^[17] Replicate and PiAPI also host the model. Inside ByteDance's own products, the model is available without paying a per second rate: Doubao users in China get access through the chat assistant, Jimeng users get access through the dedicated creative product, and CapCut users can call into the model from the mobile editor.^[14] The CapCut rollout began in March 2026 in Brazil, Indonesia, Malaysia, Mexico, the Philippines, Thailand, and Vietnam, with further markets added through the spring.^[10] Dreamina, the international web facing version, is open to anyone who creates an account at dreamina.capcut.com.

The Volcano Engine pricing for pure generation, at about fourteen cents per second, sits on the low end of the global market.^[9] According to coverage in TechNode and CnTechPost, that is roughly two orders of magnitude cheaper than Sora 2 at equivalent resolution.^[8]^[9] Whether that price holds at scale is a different question. Seedance is partly subsidized by ByteDance's existing cloud business and by the company's interest in driving usage of the Doubao and Jimeng products, so the list price probably sits below what a smaller standalone vendor could sustain.

How does Seedance compare to Veo 3, Sora 2, and Kling?

The market for general purpose video generation models in mid 2026 has a handful of serious incumbents and a long tail of specialist tools. Seedance competes most directly with Veo 3 from Google DeepMind, Sora 2 from OpenAI, Kling 3.0 from Kuaishou, Hailuo from MiniMax, Runway Gen-4, and Luma Dream Machine. Each has its own strength.

Model	Vendor	Max resolution	Native audio	Max duration	Notable strength
Seedance 2.0	ByteDance	1080p (2K reported)	Stereo dual channel	15 seconds	Multi reference input, fast inference, low API price
Sora 2	OpenAI	1080p	Yes	25 seconds	Strongest physics, photorealism
Veo 3	Google DeepMind	4K	Yes	8 seconds (1080p tier)	Cinematic resolution, color science
Kling 3.0	Kuaishou	4K at 60 fps	Limited	10 seconds	Generous free tier, natural physics
Hailuo AI	MiniMax	1080p	Optional	6 to 10 seconds	Stylized expressive output
Runway Gen-4	Runway	1080p	No	10 seconds	Professional editing integration
Luma Dream Machine Ray3	Luma AI	1080p	No	10 seconds	Fast iteration, strong on motion

The practical comparison most often cited in independent reviews places Sora 2 ahead on physical realism, Veo 3 ahead on cinema grade resolution and audio fidelity, Kling ahead on resolution and free tier generosity, and Seedance 2.0 ahead on speed, character consistency through reference tagging, and price per second.^[18] That spread holds up across reviews from sources including WaveSpeed, MindStudio, and various third party tutorial sites, though those reviewers tend to test in workflows that favor whichever model ships first in their region.^[18]^[19]

Seedance makes one notable design choice that none of its Western competitors fully matches. The multi reference @ mention input pattern in version 2.0, where a single prompt can attach up to twelve files with different roles, is closer in spirit to a video editing timeline than to a chat prompt.^[2] Veo 3 and Sora 2 accept a text prompt plus an optional reference image; Seedance 2.0 accepts a set of role tagged inputs. That difference shows up in production workflows where a creator wants a specific character, a specific camera move, and a specific soundtrack all applied together in one generation pass.

Inside the Chinese market, Seedance also competes with Hunyuan Video from Tencent, Wan from Alibaba, and a long tail of smaller players. Among Chinese models, Seedance and Kling have been the two most widely deployed inside short video content creation workflows.

How has Seedance been received?

Reception of the Seedance line has been broadly positive in the AI video community, with one important caveat. Internal benchmarks designed by the model's own team should be read with care, and SeedVideoBench is no exception. With that caveat, the public rankings have been favorable.

On the Artificial Analysis Video Arena, Seedance 1.0 took first place on both the text to video and image to video leaderboards in June 2025, ahead of Veo 3, Sora, Kling 2.0, Runway Gen-4, and Wan.^[1]^[11] By the end of 2025, after the 1.5 Pro release, Seedance ranked roughly sixth on image to video and tenth on text to video in the same arena, as Veo 3.1 and other newer competitors caught up.^[11] On the LMArena Video Arena, which uses community voting rather than expert evaluation, Seedance 1.0 Pro tied with Hailuo 02 for the fifth spot in the August 2025 ranking.^[12] After the 2.0 release in February 2026, ByteDance reported that Dreamina Seedance 2.0 took the number one slot on the Artificial Analysis text to video ranking with an Elo score of 1269, and scored an Elo of 1350 on image to video, ahead of Google Veo 3, OpenAI Sora, and Kling.^[15]

Seedance 1.0 Pro and 1.5 Pro also showed up on the SuperMaker AI tracking pages and the Higgsfield model directory through late 2025, where reviewers praised the speed of the model on short clips.^[20] The 41.4 second generation time for a five second 1080p clip on a single NVIDIA L20 was notably faster than most public competitors at the same resolution.^[4] That speed advantage is partly a function of the cascaded refiner approach: base generations run at 480p and the refiner brings them up to full resolution, which lets the model spend less time on the most expensive stages of diffusion.^[4]

Criticism of the model has clustered around three areas. The first is the 1080p ceiling. While Seedance 2.0 has been described in some sources as supporting up to 2K output, the official ByteDance documentation tops out at 1080p, and Veo 3 retains a clean lead at the 4K tier.^[2] The second is the closed weights stance. Unlike Wan from Alibaba, which has open weights and a community ecosystem of LoRA fine tunes, Seedance is a managed product and cannot be self hosted. That is fine for content creators but limits the model's reach in research. The third is the policy stack. Seedance 2.0 ships with a structured intellectual property protection layer that blocks generation of real public figures, copyrighted characters, and recognizable brand identities, and it embeds C2PA style invisible watermarking on every output.^[19] Those safety choices are widely seen as positive but they do close off some uses that competing models still permit.

The practical reception inside content creation workflows has been strong. CapCut users in the rollout countries have used the model for cooking recipe videos, fitness tutorial sequences, product overviews, and short narrative pieces.^[10] The Dreamina free tier on the web has become a common entry point for creators who want to try the model without setting up a Volcano Engine account. Inside China, the integration into Doubao has put video generation in front of the existing Doubao chat user base, which according to ByteDance was already well into the hundreds of millions of monthly active users by the time Seedance 1.5 Pro shipped.^[14]

References

1. ByteDance Seed Team, "Seedance 1.0" product page, seed.bytedance.com/en/seedance.
2. ByteDance Seed Team, "Seedance 2.0" product page, seed.bytedance.com/en/seedance2_0.
3. ByteDance Seed Team, "Seedance 1.5 Pro" product page, seed.bytedance.com/en/seedance1_5_pro.
4. ByteDance Seed Team, "Seedance 1.0: Exploring the Boundaries of Video Generation Models," arXiv:2506.09113, June 2025.
5. ByteDance Seed Team, "Tech Report of Seedance 1.0 Is Now Publicly Available," Seed blog, June 2025.
6. ByteDance Seed Team, "The Official Release of Seedance 1.5 Pro: Sound and Vision All in One Take," Seed blog, December 16, 2025.
7. ByteDance Seed Team, "Official Launch of Seedance 2.0," Seed blog, February 12, 2026.
8. CnTechPost, "ByteDance announces API pricing for AI video model Seedance 2.0," March 4, 2026.
9. TechNode, "ByteDance's Seedance 2.0 video model costs about $0.14 per second," March 5, 2026.
10. TechCrunch, "ByteDance's new AI video generation model, Dreamina Seedance 2.0, comes to CapCut," March 26, 2026.
11. Artificial Analysis, Video Generation Model Arena, artificialanalysis.ai/text-to-video/arena.
12. LMArena, Video Arena Leaderboard, August 2025 results.
13. EqualOcean, "ByteDance Releases Seedance 1.5 Pro, Supporting Joint Audio-Video Generation and Cinema-Grade Camera Movements," December 2025.
14. Global Times, "ByteDance launches Seedance 2.0, integrating across Doubao, Dreamina platforms," February 2026.
15. AIBase, "Seedance 2.0 Launches Globally, Tops the Artificial Analysis Video Ranking List," February 2026.
16. Vercel AI Gateway, "Seedance v1.5 Pro by ByteDance," model card and pricing reference.
17. fal.ai, Seedance 2.0 API documentation, April 2026.
18. WaveSpeed, "Seedance 2.0 vs Kling 3.0 vs Sora 2 vs Veo 3.1: The Ultimate Video Generation Comparison," 2026.
19. MindStudio, "What Is Seedance 2.0? ByteDance's AI Video Model Release, Guardrails, and Workflow Guide," 2026.
20. MindStudio, "What Is Seedance 1.5 Pro? ByteDance's AI Video Generation Model," December 2025.
