Seedance
Last reviewed
May 16, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 3,739 words
Seedance is the family of foundation video generation models built by the Seed team at ByteDance, the Chinese internet company that owns TikTok and Douyin. The first version, Seedance 1.0, launched on June 11, 2025 at the Volcano Engine Force Original Power Conference and quickly took the top spot on the Artificial Analysis text-to-video and image-to-video leaderboards. After an intermediate Seedance 1.5 Pro release in December 2025, Seedance 2.0 followed on February 12, 2026. It extends the joint audio and video architecture introduced in 1.5 Pro so that a single generation can take text, image, audio, and video prompts together. Alongside Tencent's Hunyuan Video and Kuaishou's Kling, Seedance is one of the leading Chinese model families that compete directly with Google's Veo 3 and OpenAI's Sora 2.
The model is delivered through ByteDance's own consumer products: Doubao, the company's main chat assistant; Jimeng (the Chinese version of Dreamina), its dedicated creative platform; and CapCut, the editing app used by hundreds of millions of mobile creators. Developers can call it through paid APIs on Volcano Engine (the Chinese cloud arm) and BytePlus (the international cloud arm), at a list price of about one yuan per second of pure text-to-video output, and creators outside China can also reach it through the Dreamina web app. Independent third party endpoints such as fal.ai and Replicate offer the same model with different billing tiers.
Seedance models are positioned as commercial products rather than open weights. ByteDance has published a long technical report describing the underlying diffusion transformer, the RLHF stack, and the inference distillation pipeline for Seedance 1.0, but it has not released the model parameters. The reception inside the AI video community has been broadly positive: SeedVideoBench results, LMArena Video Arena voting, and Artificial Analysis Elo scores have all placed Seedance versions at or near the top of the public rankings, often ahead of older releases from Google, OpenAI, Runway, and Kuaishou. There is, of course, healthy skepticism about benchmarks designed by the same lab that ships the model, and Seedance's official 1080p resolution ceiling still trails Western competitors that offer 4K output. But on speed, on price, and on the SeedVideoBench multi shot tasks, Seedance has been the model to beat through most of 2025 and into 2026.
ByteDance set up its Seed research group in 2023 to consolidate its frontier model work. Before that, ByteDance's video generation efforts had been scattered across product teams at TikTok, Douyin, and the Jimeng creative app. Seed pulled the foundation model work into a single group that reports up through the Volcano Engine cloud organization. The Seed name is now attached to several model families: Doubao 1.5 and 1.6 for text and reasoning, the Seedream image model series, the BAGEL multimodal model, and Seedance for video. The team publishes papers under the ByteDance Seed Team byline and runs the seed.bytedance.com research site, where most model launches are accompanied by an English language tech report and a Chinese launch event on the Volcano Engine stage.
Video generation became a strategic priority for ByteDance for two reasons. The first is product reach. TikTok and Douyin between them handle a meaningful share of global short video traffic, and CapCut is the default mobile editor for a large fraction of those creators. Putting a competitive AI video model behind the camera button is a direct product win, especially for the kinds of vertical, low budget content the apps already host. The second reason is competitive defense. Kuaishou shipped Kling in mid 2024, MiniMax shipped Hailuo AI in late 2024, and both of those models started turning up inside Chinese content workflows before ByteDance had a flagship of its own. Seedance 1.0 was, in part, the catch up release, and Seedance 2.0 was the consolidation.
The team behind Seedance includes researchers who previously worked on Doubao text models and on the Pixeldance and MagicVideo research lines that predate the Seedance brand. Pixeldance, released in late 2024, was the first ByteDance video model to attract real attention. It was effectively a research preview. The Seedance line that followed treats video as a production system with its own data pipeline, its own reward models, and its own distilled inference path. The internal SeedVideoBench evaluation set, used to compare Seedance against Veo, Sora, Kling, and Runway, was built with help from working film directors and now serves as the de facto internal benchmark the team optimizes against.
ByteDance shipped three Seedance version lines between June 2025 and February 2026: 1.0, 1.5, and 2.0. Version 1.0 launched as both a base model and a Pro variant, 1.5 appeared only as a Pro variant, and 2.0 shipped as a single model. Pro variants add resolution, audio, or cinematic camera control, and are sold at a premium tier on Volcano Engine and BytePlus.
| Version | Release date | Headline feature |
|---|---|---|
| Seedance 1.0 | June 11, 2025 | First public release. Text to video and image to video at 1080p, multi shot narrative output, native bilingual prompting. |
| Seedance 1.0 Pro | June 11, 2025 | Higher quality tier of the same architecture, shipped at launch alongside the base model. |
| Seedance 1.5 Pro | December 16, 2025 | First audio video joint generation. Adds synchronized dialogue, sound effects, ambient audio, cinema grade camera moves, and six language voice support. |
| Seedance 2.0 | February 12, 2026 | Unified multimodal architecture that accepts up to 9 images, 3 videos, and 3 audio clips per generation. Up to 15 second multi shot output with dual channel stereo audio. |
The 1.0 and 1.0 Pro releases share an architecture but differ in the inference budget, the post training data mix, and the supported resolution ceiling. The 1.5 Pro release was the audio milestone: it was the first Seedance model to generate video and audio together in one joint diffusion process rather than dubbing audio onto silent video. The 2.0 release reorganized the input pipeline around the @ mention pattern, where creators can attach up to twelve reference files in total (within the per type limits of nine images, three videos, and three audio clips) and tag each one as a source of motion, style, character, camera path, or audio rhythm.
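Apart from a lighter Seedance 1.0 Lite tier that has appeared in ByteDance's API catalog alongside 1.0 Pro, the team has not published a small or edge oriented Seedance model; distillation is used to speed up the served models rather than to ship a separate student model. The team has chosen to scale the model family by adding capabilities (audio, multi reference, longer duration) rather than by shrinking it down for edge inference.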
Seedance 1.0 is a diffusion transformer with a few specific design choices that the tech report calls out. The encoder is a temporally causal variational autoencoder (VAE) that downsamples by 4 in time and by 16 in each spatial dimension and produces 48 latent channels. That aggressive compression, combined with the distillation and system level optimizations the report also describes, is what lets the model produce a five second 1080p clip in 41.4 seconds on a single NVIDIA L20. The diffusion backbone is an MMDiT design with decoupled spatial and temporal attention: visual and textual tokens carry separate weights inside the multimodal self attention blocks, and the temporal layers attend within windows so that attention cost does not explode on long clips. A multi shot variant of 3D rotary positional encoding, called Multishot MM-RoPE in the paper, lets the model interleave several shots' worth of visual tokens with their respective text prompts in a single sequence, which is what enables multi shot output without stitching clips together after the fact.
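To make the compression concrete, here is a minimal sketch that computes the latent tensor shape for a clip. It assumes the usual causal VAE convention in which the first frame is encoded on its own and every following group of four frames becomes one latent frame; the `latent_shape` helper, the rounding with `ceil`, and the 121 frame, 480 by 864 example clip are illustrative assumptions rather than details from the report.

```python
from math import ceil

# Downsampling ratios and channel count as reported for the Seedance 1.0 VAE.
R_T, R_H, R_W = 4, 16, 16
LATENT_CHANNELS = 48


def latent_shape(frames: int, height: int, width: int) -> tuple[int, int, int, int]:
    """Return (C, T', H', W'), assuming a causal VAE that encodes the first
    frame alone and then packs every R_T frames into one latent frame."""
    t = 1 + ceil((frames - 1) / R_T)
    h = ceil(height / R_H)
    w = ceil(width / R_W)
    return (LATENT_CHANNELS, t, h, w)


def nominal_compression() -> float:
    """RGB pixel values per latent value, ignoring the causal first frame."""
    return (R_T * R_H * R_W * 3) / LATENT_CHANNELS


if __name__ == "__main__":
    # Hypothetical 480p clip: 121 frames (about 5 s at 24 fps), 480 x 864 pixels.
    print(latent_shape(121, 480, 864))  # (48, 31, 30, 54)
    print(nominal_compression())        # 64.0
```

Under these assumptions each latent value stands in for 64 pixel values, which is why the diffusion transformer can operate on clips that would be far too large to model directly in pixel space.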
The 1080p output does not come from the base model directly. Seedance 1.0 trains its base model on 480p video and then attaches a diffusion refiner, a cascaded module that upscales 480p outputs to 720p or 1080p conditioned on the original generation. This is similar in spirit to the cascaded super resolution stages of earlier systems such as Imagen Video and Make-A-Video, which also used learned diffusion upsamplers on top of a low resolution base generation. Training proceeds in four stages: a progressive pre training stage that ramps from 256 pixel images up through 640 pixel video at 24 frames per second; a continue training stage that raises the share of image to video tasks in the training mix from 20 to 40 percent; a supervised fine tuning stage that uses curated video text pairs and model merging across style categories; and finally an RLHF stage that uses three separate reward models, one for foundational quality, one for motion, and one for aesthetics.
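The task mix change between the first two training stages can be sketched as a per stage sampling probability. This is a hypothetical encoding: the stage labels are shorthand for the four stages described above, the 20 percent figure for pre training is inferred from the "from 20 to 40 percent" wording, and the `Stage` class and `sample_task` function are illustrative, not ByteDance's data loader.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    i2v_share: Optional[float]  # share of image-to-video examples; None = not stated


# Hypothetical schedule. Only the 20% -> 40% image-to-video shift comes from the text.
SCHEDULE = [
    Stage("pretrain", 0.20),        # implied starting share
    Stage("continue_train", 0.40),
    Stage("sft", None),             # task mix not given
    Stage("rlhf", None),            # task mix not given
]


def sample_task(stage: Stage, rng: random.Random) -> str:
    """Pick the conditioning task for one training example in this stage."""
    if stage.i2v_share is None:
        raise ValueError(f"task mix not specified for stage {stage.name!r}")
    return "image_to_video" if rng.random() < stage.i2v_share else "text_to_video"


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_task(SCHEDULE[1], rng) for _ in range(10_000)]
    # Close to 0.40 for the continue training stage.
    print(draws.count("image_to_video") / len(draws))
```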
Seedance 1.5 Pro keeps the MMDiT base but rebuilds it as a unified multimodal framework that handles audio and visual tokens in the same attention stack. The paper credits a multi stage data pipeline that prioritizes audio video coherence and motion expressiveness, an SFT plus RLHF post training loop tailored to audio visual outputs, and a multi dimensional reward model that optimizes text to video, image to video, and the new joint audio video tasks separately. The 1.5 Pro release also ships an inference acceleration stack that the team claims gives a more than 10x end to end speedup over the un-distilled base model, achieved through a combination of trajectory segmented consistency distillation, score distillation, narrowed VAE decoder channels, kernel fusion, quantization, and tensor parallelism.
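Why the acceleration stack mixes step distillation with changes such as narrower VAE decoder channels can be illustrated with Amdahl's law: speeding up one stage only helps in proportion to that stage's share of total time. The stage names and timings below are invented for illustration and are not figures from the report.

```python
def end_to_end_speedup(times: dict[str, float], factors: dict[str, float]) -> float:
    """Amdahl-style estimate: each stage's time is divided by its own speedup
    factor; stages without a factor are left unchanged."""
    before = sum(times.values())
    after = sum(t / factors.get(stage, 1.0) for stage, t in times.items())
    return before / after


# Hypothetical per-clip timings in seconds; not taken from the Seedance reports.
TIMES = {"text_encode": 2.0, "denoise": 100.0, "vae_decode": 10.0}

if __name__ == "__main__":
    # Distillation and kernel work speed up only the denoising loop.
    print(round(end_to_end_speedup(TIMES, {"denoise": 20}), 2))                   # 6.59
    # A cheaper VAE decoder as well pushes the total past 10x.
    print(round(end_to_end_speedup(TIMES, {"denoise": 20, "vae_decode": 4}), 2))  # 11.79
```

With these made up numbers, even an infinitely fast denoiser leaves the end to end gain below 10x, because encoding and decoding still take 12 of the original 112 seconds.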
Seedance 2.0 keeps the unified multimodal audio video architecture and extends it. The headline architectural change is the input side. Where 1.5 Pro accepted text plus image plus optional audio, 2.0 accepts up to nine images, three short video clips (up to fifteen seconds combined), three audio clips (up to fifteen seconds combined), and a natural language prompt in a single generation request. Each attached reference can be tagged with a role: composition, motion, camera movement, character identity, visual effects, or audio rhythm. The model uses those role tags to decide which reference contributes which signal to the output. Output is multi shot, up to fifteen seconds total, with stereo audio that includes background music, ambient effects, and synchronized dialogue with millisecond lip sync.
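A client that builds Seedance 2.0 requests might check these limits before submitting. The sketch below is a hypothetical pre check, not ByteDance's API: the `Reference` fields, the role identifiers, and the `validate` function are assumptions, while the numeric limits come from the description above plus the twelve file overall cap cited for the @ mention pattern.

```python
from dataclasses import dataclass

# Per-type limits described for Seedance 2.0, plus the overall file cap.
MAX_COUNT = {"image": 9, "video": 3, "audio": 3}
MAX_SECONDS = {"video": 15.0, "audio": 15.0}
MAX_FILES = 12
# Hypothetical identifiers for the roles listed above.
ROLES = {"composition", "motion", "camera_movement",
         "character_identity", "visual_effects", "audio_rhythm"}


@dataclass
class Reference:
    name: str
    kind: str             # "image", "video" or "audio"
    role: str             # one of ROLES
    seconds: float = 0.0  # clip length for video and audio; 0 for images


def validate(refs: list[Reference]) -> list[str]:
    """Return human-readable problems; an empty list means the request fits."""
    errors: list[str] = []
    counts = {kind: 0 for kind in MAX_COUNT}
    seconds = {kind: 0.0 for kind in MAX_SECONDS}
    for ref in refs:
        if ref.kind not in counts:
            errors.append(f"{ref.name}: unknown kind {ref.kind!r}")
            continue
        if ref.role not in ROLES:
            errors.append(f"{ref.name}: unknown role {ref.role!r}")
        counts[ref.kind] += 1
        if ref.kind in seconds:
            seconds[ref.kind] += ref.seconds
    for kind, limit in MAX_COUNT.items():
        if counts[kind] > limit:
            errors.append(f"too many {kind} references: {counts[kind]} > {limit}")
    for kind, limit in MAX_SECONDS.items():
        if seconds[kind] > limit:
            errors.append(f"{kind} clips total {seconds[kind]:.1f}s > {limit:.0f}s")
    if len(refs) > MAX_FILES:
        errors.append(f"too many files: {len(refs)} > {MAX_FILES}")
    return errors
```

Because the per type limits add up to fifteen files, the separate twelve file check is what catches a request that fills every slot at once.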
The headline capabilities of the Seedance family have grown with each release. The table below captures what each public version can do.
| Capability | Seedance 1.0 / 1.0 Pro | Seedance 1.5 Pro | Seedance 2.0 |
|---|---|---|---|
| Text to video | Yes | Yes | Yes |
| Image to video | Yes | Yes | Yes |
| Audio in input | No | Optional | Up to 3 clips, 15s total |
| Video reference in input | No | No | Up to 3 clips, 15s total |
| Multiple image references | One image | One image | Up to 9 images |
| Native audio output | No | Yes | Yes, dual channel stereo |
| Lip sync to dialogue | No | Yes, multi language | Yes, millisecond precision |
| Multi shot narrative | Yes | Yes | Yes |
| Max resolution | 1080p | 1080p (480p, 720p, 1080p) | Up to 2K reported, 1080p official |
| Max duration | About 10 seconds | 4 to 12 seconds | 4 to 15 seconds |
| Frame rate | Up to 24 fps | 24 fps | 24 fps |
| Aspect ratios | 16:9 and a few others | 7 aspect ratios | 16:9, 9:16, 4:3, 3:4, 21:9, 1:1 |
| Camera control | Prompt based | Cinema grade camera moves, dolly zoom | Hitchcock zoom, orbit, tracking, handheld |
| Character consistency | Single shot stable | Improved multi shot | Strong with reference image tagging |
| Language support | Chinese and English | 6 languages plus dialects | 6 languages plus dialects |
Some notes on the practical edges of the model. The 1.0 release is bilingual in the sense that it understands both Chinese and English prompts natively and will produce text in either language inside the generated video if asked. The 1.5 Pro audio system covers Chinese, English, Japanese, Korean, Spanish, and Indonesian, plus Sichuanese and Cantonese as regional dialects. The 2.0 release adds the @ mention input pattern, where a creator tags each reference file in the prompt with a role; for example, "use @character.png for the protagonist's face, @style.mp4 for the camera rhythm, and @audio.wav for the soundtrack." That input pattern is what lets a single generation pass behave like a small editorial timeline rather than a one shot prompt.
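How the service binds each @ mention to an uploaded file is not documented here, so the following is only a sketch under an assumed syntax: an @ followed by a file name with an extension. The `MENTION` pattern and `resolve_mentions` helper are hypothetical; they report which mentions lack an attachment and which attachments the prompt never mentions.

```python
import re

# Assumed syntax: "@" followed by a file name that has an extension.
MENTION = re.compile(r"@([\w.-]+\.\w+)")


def resolve_mentions(prompt: str, attached: set[str]) -> tuple[list[str], list[str], list[str]]:
    """Return (mentions in prompt order, mentions with no attachment,
    attachments never mentioned, sorted by name)."""
    mentioned = MENTION.findall(prompt)
    missing = [name for name in mentioned if name not in attached]
    unused = sorted(attached - set(mentioned))
    return mentioned, missing, unused


if __name__ == "__main__":
    prompt = ("use @character.png for the protagonist's face, @style.mp4 for the "
              "camera rhythm, and @audio.wav for the soundtrack.")
    print(resolve_mentions(prompt, {"character.png", "audio.wav", "extra.jpg"}))
```

Checking both directions matters: a mention with no file leaves a role unfilled, while an attachment that is never mentioned gives the model a reference without a stated role.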
Multi shot generation deserves a separate note. The Seedance line was the first widely available video model to ship native multi shot output, meaning the model itself produces a clip with several internal cuts and consistent characters across those cuts, rather than producing several independent clips that have to be edited together. SeedVideoBench measures multi shot subject consistency directly, and the Seedance 1.0 paper claims a margin of more than 100 Elo points over Veo 3 and Kling 2.0 on the image to video multi shot tasks.
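For a sense of scale, a 100 point Elo gap can be translated into an expected head to head preference rate. This sketch assumes the standard logistic Elo formula with a 400 point scale; arenas that fit Bradley-Terry models may use a different scale, so treat the number as illustrative.

```python
def elo_expected(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # A 100-point lead, the margin the Seedance 1.0 paper reports on multi shot I2V.
    print(f"{elo_expected(1100, 1000):.3f}")  # 0.640
```

Under that assumption, a model 100 points ahead would be preferred in roughly 64 percent of pairwise comparisons, a clear but far from overwhelming edge.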
Seedance is sold as a managed API, never as open weights. The standard pricing on Volcano Engine, disclosed in March 2026, is 46 yuan (about 6.40 US dollars) per million tokens for pure text to video generation, and 28 yuan (about 3.90 US dollars) per million tokens for video editing workflows where an input video is supplied. A fifteen second 1080p generation consumes roughly 308,880 tokens, or about 14.2 yuan at the text to video rate, which works out to roughly 0.95 yuan (thirteen to fourteen US cents) per second of output. New Volcano Engine accounts receive five million free tokens at registration, usable in the free experience center before billed usage begins.
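The per second figure follows directly from the quoted rates. This sketch only reproduces that arithmetic; the constant names are illustrative, and the US dollar conversion uses the exchange rate implied by the article's own "46 yuan, about 6.40 US dollars" pairing.

```python
# Figures quoted above for Volcano Engine (March 2026).
TEXT_TO_VIDEO_YUAN_PER_M = 46.0
VIDEO_EDIT_YUAN_PER_M = 28.0
USD_PER_YUAN = 6.40 / 46.0       # implied by the article's own conversion
TOKENS_15S_1080P = 308_880
FREE_TOKENS = 5_000_000


def cost_yuan(tokens: int, yuan_per_million: float) -> float:
    """Billed cost in yuan for a given token count and per-million rate."""
    return tokens * yuan_per_million / 1_000_000


def yuan_per_second(tokens: int, seconds: float, yuan_per_million: float) -> float:
    """Cost of a clip divided by its length in seconds."""
    return cost_yuan(tokens, yuan_per_million) / seconds


if __name__ == "__main__":
    per_clip = cost_yuan(TOKENS_15S_1080P, TEXT_TO_VIDEO_YUAN_PER_M)
    per_sec = yuan_per_second(TOKENS_15S_1080P, 15, TEXT_TO_VIDEO_YUAN_PER_M)
    print(f"{per_clip:.2f} yuan per clip")                            # 14.21 yuan per clip
    print(f"{per_sec:.3f} yuan/s, ${per_sec * USD_PER_YUAN:.3f}/s")   # 0.947 yuan/s, $0.132/s
    print(FREE_TOKENS // TOKENS_15S_1080P, "free 15 s clips")         # 16 free 15 s clips
```

The same token count billed at the editing rate comes to about 8.65 yuan per fifteen second clip, and the free allowance covers sixteen full fifteen second generations.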
International access runs through BytePlus, the English language version of the Volcano Engine product, and through several third party API resellers. fal.ai opened a Seedance 2.0 endpoint in April 2026, with separate pricing per resolution tier. Replicate and PiAPI also host the model. Inside ByteDance's own products, the model is available without paying a per second rate: Doubao users in China get access through the chat assistant, Jimeng users get access through the dedicated creative product, and CapCut users can call into the model from the mobile editor. The CapCut rollout began in March 2026 in Brazil, Indonesia, Malaysia, Mexico, the Philippines, Thailand, and Vietnam, with further markets added through the spring. Dreamina, the international web facing version, is open to anyone who creates an account at dreamina.capcut.com.
The Volcano Engine pricing for pure generation, at about fourteen cents per second, sits on the low end of the global market. According to coverage in TechNode and CnTechPost, that is roughly two orders of magnitude cheaper than Sora 2 at equivalent resolution. Whether that price holds at scale is a different question. Seedance is partly subsidized by ByteDance's existing cloud business and by the company's interest in driving usage of the Doubao and Jimeng products, so the rack rate may not reflect the model's true unit economics, and a smaller standalone vendor would likely struggle to match it.
The market for general purpose video generation models in mid 2026 has a handful of serious incumbents and a long tail of specialist tools. Seedance competes most directly with Veo 3 from Google DeepMind, Sora 2 from OpenAI, Kling 3.0 from Kuaishou, Hailuo from MiniMax, Runway Gen-4, and Luma Dream Machine. Each has its own strength.
| Model | Vendor | Max resolution | Native audio | Max duration | Notable strength |
|---|---|---|---|---|---|
| Seedance 2.0 | ByteDance | 1080p (2K reported) | Stereo dual channel | 15 seconds | Multi reference input, fast inference, low API price |
| Sora 2 | OpenAI | 1080p | Yes | 25 seconds | Strongest physics, photorealism |
| Veo 3 | Google DeepMind | 4K | Yes | 8 seconds (1080p tier) | Cinematic resolution, color science |
| Kling 3.0 | Kuaishou | 4K at 60 fps | Limited | 10 seconds | Generous free tier, natural physics |
| Hailuo AI | MiniMax | 1080p | Optional | 6 to 10 seconds | Stylized expressive output |
| Runway Gen-4 | Runway | 1080p | No | 10 seconds | Professional editing integration |
| Luma Dream Machine Ray3 | Luma AI | 1080p | No | 10 seconds | Fast iteration, strong on motion |
The practical comparison most often cited in independent reviews places Sora 2 ahead on physical realism, Veo 3 ahead on cinema grade resolution and audio fidelity, Kling ahead on resolution and free tier generosity, and Seedance 2.0 ahead on speed, character consistency through reference tagging, and price per second. That spread holds up across reviews from sources including WaveSpeed, MindStudio, and various third party tutorial sites, though those reviewers tend to test in workflows that favor whichever model ships first in their region.
Seedance has a notable architectural choice that none of its Western competitors fully match. The multi reference @ mention input pattern in version 2.0, where a single prompt can attach up to twelve files with different roles, is closer in spirit to a video editing timeline than to a chat prompt. Veo 3 and Sora 2 accept a text prompt plus an optional reference image; Seedance 2.0 accepts a structured set of references, each tagged with a role. That difference shows up in production workflows where a creator wants a specific character, a specific camera move, and a specific soundtrack all applied together in one generation pass.
Inside the Chinese market, Seedance also competes with Hunyuan Video from Tencent, Wan from Alibaba, and a long tail of smaller players. Among Chinese models, Seedance and Kling have been the two most widely deployed inside content creation workflows: Seedance through ByteDance's TikTok, Douyin, and CapCut ecosystem, and Kling through Kuaishou's own short video platform.
Reception of the Seedance line has been broadly positive in the AI video community, with one important caveat. Internal benchmarks designed by the model's own team should be read with care, and SeedVideoBench is no exception. With that caveat, the public rankings have been favorable.
On the Artificial Analysis Video Arena, Seedance 1.0 took first place on both the text to video and image to video leaderboards in June 2025, ahead of Veo 3, Sora, Kling 2.0, Runway Gen-4, and Wan. By the end of 2025, after the 1.5 Pro release, Seedance had slipped to roughly sixth on image to video and tenth on text to video in the same arena, as Veo 3.1 and other newer competitors caught up. On the LMArena Video Arena, which uses community voting rather than expert evaluation, Seedance 1.0 Pro tied with Hailuo 02 for the fifth spot in the August 2025 ranking. After the 2.0 release in February 2026, ByteDance reported that Dreamina Seedance 2.0 took the number one slot on the Artificial Analysis text to video ranking with an Elo score of 1269, ahead of Google Veo 3, OpenAI Sora, and Kling.
Seedance 1.0 Pro and 1.5 Pro also showed up on the SuperMaker AI tracking pages and the Higgsfield model directory through late 2025, where reviewers praised the speed of the model on short clips. The 41.4 second generation time for a five second 1080p clip on a single NVIDIA L20 was notably faster than most public competitors at the same resolution. That speed advantage is partly a function of the cascaded refiner approach: base generations run at 480p and the refiner brings them up to full resolution, which lets the model spend less time on the most expensive stages of diffusion.
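A back of the envelope count shows why running the base model at 480p saves so much work. The sketch counts latent positions per frame after the VAE's 16x spatial downsampling; the 864 by 480 base size and treating each latent position as one token are assumptions, and any patchify step in the transformer would shrink both counts by the same factor.

```python
from math import ceil

SPATIAL_DOWNSAMPLE = 16  # VAE spatial ratio reported for Seedance 1.0


def tokens_per_frame(height: int, width: int) -> int:
    """Latent positions per frame, ignoring any extra patchify step in the DiT."""
    return ceil(height / SPATIAL_DOWNSAMPLE) * ceil(width / SPATIAL_DOWNSAMPLE)


if __name__ == "__main__":
    base = tokens_per_frame(480, 864)    # assumed 480p base resolution
    full = tokens_per_frame(1080, 1920)  # 1080p output
    ratio = full / base
    print(base, full)                                                   # 1620 8160
    print(f"{ratio:.2f}x tokens, {ratio ** 2:.2f}x spatial attention")  # 5.04x tokens, 25.37x spatial attention
```

Because spatial attention within each frame scales with the square of the tokens per frame, every denoising step at 1080p would cost roughly 25 times more in those layers than at 480p, while the refiner only has to pay the high resolution cost for its own, separate pass.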
Criticism of the model has clustered around three areas. The first is the 1080p ceiling. While Seedance 2.0 has been described in some sources as supporting up to 2K output, the official ByteDance documentation tops out at 1080p, and Veo 3 retains a clean lead at the 4K tier. The second is the closed weights stance. Unlike Wan from Alibaba, which has open weights and a community ecosystem of LoRA fine tunes, Seedance is a managed product and cannot be self hosted. That is fine for content creators but limits the model's reach in research. The third is the policy stack. Seedance 2.0 ships with a structured intellectual property protection layer that blocks generation of real public figures, copyrighted characters, and recognizable brand identities, and it embeds C2PA style invisible watermarking on every output. Those safety choices are widely seen as positive but they do close off some uses that competing models still permit.
The practical reception inside content creation workflows has been strong. CapCut users in the rollout countries have used the model for cooking recipe videos, fitness tutorial sequences, product overviews, and short narrative pieces. The Dreamina free tier on the web has become a common entry point for creators who want to try the model without setting up a Volcano Engine account. Inside China, the integration into Doubao has put video generation in front of the existing Doubao chat user base, which according to ByteDance was already well into the hundreds of millions of monthly active users by the time Seedance 1.5 Pro shipped.