Doubao Seedance


Updated Jul 23, 2026


Doubao-Seedance is a family of video generation foundation models developed by ByteDance Seed, the AI research division of Chinese technology conglomerate ByteDance. The flagship model, Seedance 1.0 Pro, was announced by ByteDance's cloud subsidiary Volcano Engine on June 11, 2025 at the company's FORCE Original Power Conference in Beijing, alongside the Doubao 1.6 large language model.^[1] Seedance supports both text-to-video and image-to-video generation at up to 1080p resolution, with native support for coherent multi-shot narratives, and the accompanying technical report claims a roughly 10x inference speedup from distillation and system-level optimizations.^[2] At launch, ByteDance positioned the model as a direct competitor to Google's Veo 3, OpenAI's Sora, Kuaishou's Kling and Runway's video models, citing first-place rankings on the Artificial Analysis video arena leaderboards.^[3] The model is exposed commercially through Volcano Engine's "Ark" API platform and is integrated into ByteDance's consumer-facing Doubao chatbot and the Jimeng (Dreamina) creative tool.^[1]

Infobox

Field	Value
Developer	ByteDance Seed (Seed Vision Team)
Type	Text-to-video and image-to-video foundation model
First public release	June 11, 2025 (Seedance 1.0 Pro)^[1]
Lite variant release	May 13, 2025 (Seedance 1.0 Lite)^[4]
Technical report	arXiv 2506.09113, June 10, 2025^[2]
Maximum resolution	1080p (Pro), 720p (Lite)^[5]
Maximum clip duration	10 seconds^[5]
Distribution	Volcano Engine API, Doubao app, Jimeng/Dreamina^[1]
Pricing (launch)	3.67 yuan per 5-second 1080p clip via Volcano Engine^[1]

History and Background

ByteDance's Seed research group, established in 2023 as the company's foundation-model arm, organizes its generative-media work under product-family names beginning with "Seed". Within that umbrella the Seedream series covers text-to-image and image editing, while Seedance is the parallel line for video.^[6] The first publicly named variant was Seedance 1.0 Lite, a "small-parameter" model that Volcano Engine unveiled on May 13, 2025 at the Shanghai stop of its FORCE LINK AI Innovation Tour. The Lite model supported 5-second and 10-second clips at 480p or 720p, was made available via the Volcano Ark enterprise platform, and was offered to consumers through the Doubao app and the Jimeng (Dreamina) creative platform.^[4]

The headline release came roughly four weeks later. On June 11, 2025 at the FORCE Original Power Conference in Beijing, Volcano Engine president Tan Dai introduced Seedance 1.0 Pro together with Doubao 1.6 and Doubao 1.6-vision. The Pro variant raised the maximum resolution to 1080p and emphasized multi-shot narrative generation, character consistency across shots, and cinematic camera control.^[1] A day earlier, on June 10, 2025, a 44-author technical report titled "Seedance 1.0: Exploring the Boundaries of Video Generation Models" had appeared on arXiv (preprint 2506.09113), authored by ByteDance's Seed Vision Team and led by Yu Gao and Haoyuan Guo.^[2]

The June 2025 launch came during a period of intensifying competition in the Chinese video-generation market. Kuaishou's Kling, MiniMax's Hailuo, Alibaba's Wan, and Tencent's Hunyuan had all shipped competitive systems earlier in 2025, while Google had released Veo 3 in May 2025 with synchronized audio generation. Press coverage emphasized that Seedance was explicitly framed as a benchmark-topping response: ByteDance announced that the model ranked first on both the text-to-video and image-to-video boards of Artificial Analysis's video arena on June 10, 2025, ahead of Veo 3, Kling 2.1, and OpenAI's Sora.^[2]^[3]

Seedance development continued after the 1.0 launch. Volcano Engine announced Seedance 1.0 Pro updates including first-and-last-frame conditioning, and at the December 18, 2025 FORCE conference released Doubao 1.8 alongside Seedance 1.5 Pro, an audio-video creation model.^[7] Seedance 2.0, a unified multimodal audio-video model, was released by the Doubao team on February 9, 2026.^[8] Coverage in TechCrunch in early 2026 described a phased global rollout of the Seedance 2.0 model through the CapCut video editing platform, with initial markets including Brazil, Indonesia, Malaysia, Mexico, the Philippines, Thailand and Vietnam.^[9]

Technical Details

The Seedance 1.0 report frames the system as a video foundation model that "simultaneously balances prompt following, motion plausibility, and visual quality," and bundles four contributions: a curated multi-source data pipeline, an efficient architecture with a tailored training paradigm, a post-training regime that combines supervised fine-tuning with video-specific reinforcement learning from human feedback, and inference acceleration through distillation and systems-level engineering.^[2]

Architecture

Seedance 1.0 adopts a two-stage pipeline: a base diffusion model produces 480p video latents, and a learned diffusion refiner upscales them to 720p or 1080p, an approach the authors describe as a "cascaded framework". The video tokenizer is a temporally causal variational autoencoder with compression ratios of (4, 16, 16) along the temporal, height, and width dimensions, projecting video into a 48-channel latent space.^[2] The generator itself is a diffusion transformer using the MMDiT (Multi-Modality Diffusion Transformer) design, with decoupled spatial and temporal layers: spatial layers attend within frames, while temporal layers attend across frames. Multi-modal rotary position embeddings (MM-RoPE, an extension of rotary positional encoding) provide positional structure across text, time and image axes, and the authors argue this interleaved encoding is what allows the model to natively generate coherent multi-shot videos rather than concatenating independent clips.^[2]
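
To make the tokenizer's compression ratios concrete, the following sketch computes the latent tensor shape implied by the (4, 16, 16) ratios and the 48-channel latent space. Several details are assumptions rather than statements from the report: the helper name `latent_shape` is hypothetical, the 832x480 frame size is one common reading of "480p", the 121-frame clip length is illustrative, and the temporal formula follows a common causal-VAE convention in which the first frame is encoded on its own.

```python
import math

def latent_shape(frames, height, width, ratios=(4, 16, 16), channels=48):
    """Return (channels, T', H', W') for a video of the given pixel size.

    `ratios` are the (temporal, height, width) compression factors
    quoted in the article; everything else here is illustrative.
    """
    rt, rh, rw = ratios
    # Assumed causal-VAE convention: the first frame maps to one latent
    # frame, and each later group of `rt` frames maps to one more.
    t_lat = 1 + math.ceil((frames - 1) / rt)
    h_lat = math.ceil(height / rh)
    w_lat = math.ceil(width / rw)
    return (channels, t_lat, h_lat, w_lat)

# Illustrative input: 121 frames at an assumed 832x480 base resolution.
print(latent_shape(frames=121, height=480, width=832))  # (48, 31, 30, 52)
```

Under these assumptions the base diffusion model works on 31 latent frames of 30x52 positions each, which is far fewer elements than the raw pixel grid and is the reason diffusion is run in latent space before the refiner upscales the result.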

Data and training pipeline

The data pipeline applies shot-aware temporal segmentation (with a 12-second cap per clip), removes blurry or low-aesthetic content via quality and safety filtering, rectifies visual overlays such as logos and watermarks, and performs semantic deduplication using video embeddings. Captions integrate dynamic features (actions, camera movements) with static descriptors (appearance, aesthetics, style).^[2]
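
Two of the steps above, capping clip length and deduplicating by embedding similarity, can be sketched in a few lines. This is a minimal illustration and not the report's pipeline: the equal-length splitting rule, the greedy deduplication strategy, the 0.95 cosine threshold, and all function names are assumptions; only the 12-second cap comes from the text.

```python
import math

MAX_CLIP_SECONDS = 12.0  # cap stated in the article

def split_shot(start, end, max_len=MAX_CLIP_SECONDS):
    """Split one detected shot [start, end) into equal clips <= max_len seconds."""
    duration = end - start
    if duration <= 0:
        return []
    n = math.ceil(duration / max_len)
    step = duration / n
    return [(start + i * step, start + (i + 1) * step) for i in range(n)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dedupe(embeddings, threshold=0.95):
    """Greedy dedup: keep a clip only if it is below `threshold`
    similarity to every clip already kept. Returns kept indices."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept

print(split_shot(0.0, 30.0))
print(dedupe([[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]))  # [0, 2]
```

A 30-second shot is cut into three 10-second clips, and the second embedding is dropped because it is nearly parallel to the first. A production system would use approximate nearest-neighbor search rather than this quadratic loop.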

Training proceeds in four phases:

Pre-training: progressive stages beginning with 256-pixel text-to-image, advancing to 640-pixel multi-modal training, then 24-fps video refinement.
Continued training: the image-to-video data ratio is raised from 20% to 40% with higher-quality samples.
Supervised fine-tuning on human-curated video-text pairs across hundreds of categories.
Video-specific reinforcement learning from human feedback using three specialized reward models that target visual-language alignment, motion quality, and aesthetic quality.^[2]
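
The article names three reward dimensions but does not say how their scores are combined. As a hedged illustration only, the sketch below shows one simple way several reward-model scores could be merged into a single preference signal, namely a weighted average. The class, the function names, the equal default weights, and the example scores are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RewardScores:
    alignment: float  # visual-language alignment
    motion: float     # motion quality
    aesthetic: float  # aesthetic quality

def composite_reward(scores, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three scores (an assumed combination rule)."""
    w_align, w_motion, w_aes = weights
    total = w_align + w_motion + w_aes
    return (w_align * scores.alignment
            + w_motion * scores.motion
            + w_aes * scores.aesthetic) / total

def pick_preferred(candidates, weights=(1.0, 1.0, 1.0)):
    """Return the index of the candidate with the highest composite reward."""
    return max(range(len(candidates)),
               key=lambda i: composite_reward(candidates[i], weights))

samples = [
    RewardScores(alignment=0.9, motion=0.6, aesthetic=0.3),
    RewardScores(alignment=0.7, motion=0.8, aesthetic=0.7),
]
print(pick_preferred(samples))  # 1
```

Changing the weights shifts which sample wins, which is the practical reason to keep the three dimensions as separate models rather than training a single scalar judge.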

Inference acceleration

The report attributes the headline "~10x speedup" to a stack of techniques rather than a single optimization: Trajectory Segmented Consistency Distillation (TSCD), score distillation derived from a "RayFlow" objective, a thinner VAE decoder that doubles decode speed, mixed-precision quantization, adaptive hybrid parallelism, and asynchronous offloading.^[2] On NVIDIA L20 hardware, Seedance 1.0 generates a 5-second 1080p clip in 41.4 seconds, a figure the authors highlight as roughly two to four times faster than comparable commercial systems at similar resolution.^[2]
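
The 41.4-second figure is easier to compare across systems when converted to per-second and per-frame rates. The short calculation below does that conversion. The 24 fps output rate is an assumption taken from the 24-fps training stage mentioned in this article; the report's figure itself is stated only as wall-clock time for a 5-second clip.

```python
CLIP_SECONDS = 5.0
GENERATION_SECONDS = 41.4  # reported for a 5-second 1080p clip on NVIDIA L20
ASSUMED_FPS = 24           # assumption, not stated for this measurement

def throughput(clip_seconds, generation_seconds, fps):
    """Convert a wall-clock generation time into comparable rates."""
    frames = clip_seconds * fps
    return {
        "compute_s_per_video_s": generation_seconds / clip_seconds,
        "seconds_per_frame": generation_seconds / frames,
        "frames_per_second": frames / generation_seconds,
    }

for name, value in throughput(CLIP_SECONDS, GENERATION_SECONDS, ASSUMED_FPS).items():
    print(f"{name}: {value:.3f}")
```

Under the 24 fps assumption this works out to about 8.3 seconds of compute per second of video, or roughly 0.35 seconds per output frame.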

Multi-shot generation

A distinguishing claim of the Seedance 1.0 paper is that the model handles multi-shot generation natively rather than through post hoc stitching. The MM-RoPE positional scheme is the principal mechanism: by encoding shot boundaries into the position embedding alongside spatial and temporal coordinates, the model can plan view transitions while preserving subject and style identity across cuts.^[2] The architecture also unifies text-to-image, text-to-video and image-to-video tasks in a single backbone, which the authors argue improves prompt following and stylistic transfer.^[2]
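
The exact MM-RoPE formulation is not reproduced in this article, so the sketch below shows only the generic building block it extends: rotary position embedding applied independently along three axes (time, height, width) on separate chunks of a feature vector. How text tokens and shot boundaries are assigned positions is not shown and should not be inferred from this code; `rotate_pairs`, `rope_3d`, and the equal three-way split of the vector are illustrative choices.

```python
import math

def rotate_pairs(vec, pos, base=10000.0):
    """Standard 1-D rotary embedding on an even-length vector."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # per-pair rotation frequency
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def rope_3d(vec, t, h, w):
    """Rotate three equal chunks of `vec` by the t, h and w positions."""
    n = len(vec) // 3
    if len(vec) != 3 * n or n % 2 != 0:
        raise ValueError("vector length must split into three even-sized chunks")
    return (rotate_pairs(vec[:n], t)
            + rotate_pairs(vec[n:2 * n], h)
            + rotate_pairs(vec[2 * n:], w))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.1 * (i + 1) for i in range(12)]
k = [0.05 * (12 - i) for i in range(12)]

# Same relative offset (1, 2, 3) at two different absolute positions.
score_a = dot(rope_3d(q, 1, 2, 3), rope_3d(k, 0, 0, 0))
score_b = dot(rope_3d(q, 5, 7, 9), rope_3d(k, 4, 5, 6))
print(abs(score_a - score_b) < 1e-9)  # True
```

The check at the end illustrates the property that makes rotary encodings attractive for attention: the query-key score depends only on the relative offset along each axis, not on absolute position.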

Variants

Seedance 1.0 is publicly documented in two named editions.

Variant	Release date	Max resolution	Durations	Notes
Seedance 1.0 Lite	May 13, 2025^[4]	480p, 720p^[4]	5 s, 10 s^[4]	Lower-parameter model optimized for speed and cost; available via Volcano Ark, Doubao app, and Jimeng/Dreamina^[4]
Seedance 1.0 Pro	June 11, 2025^[1]	1080p^[1]	up to 10 s^[1]	Flagship model; first-place rankings on Artificial Analysis arenas at launch^[2]^[3]

ByteDance has subsequently released Seedance 1.5 Pro (December 18, 2025), which extends Seedance 1.0 Pro with synchronized audio output, and Seedance 2.0 (February 9, 2026), a unified multimodal model.^[7]^[8] Those successors are documented in their own articles and are not the subject of this entry.

Distribution and Pricing

Seedance is distributed through three primary surfaces, all controlled by ByteDance:

The Volcano Engine "Ark" API platform (volcengine.com), targeted at enterprise developers, exposes endpoints under the model identifier doubao-seedance-1-0-pro and a corresponding Lite endpoint.^[1]^[2]
The Doubao consumer chatbot exposes Seedance through video-generation prompts in the Doubao app.^[4]
The Jimeng (international name Dreamina) creative platform, which is part of the CapCut/Dreamina family, provides a creator-focused interface to Seedance.^[4]
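
For readers integrating against the enterprise API, the sketch below shows how a client might assemble and validate a generation request before sending it. Only the model identifier `doubao-seedance-1-0-pro`, the 1080p ceiling, and the 10-second limit come from this article. The field names (`prompt`, `resolution`, `duration_seconds`) and the helper `build_request` are hypothetical placeholders, not the documented Ark schema, and no endpoint URL is given here; consult Volcano Engine's own documentation for the real request format.

```python
import json

MODEL_ID = "doubao-seedance-1-0-pro"  # identifier quoted in the article
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
MAX_DURATION_SECONDS = 10  # maximum clip duration listed in the infobox

def build_request(prompt, resolution="1080p", duration_s=5):
    """Build a request body. Field names are hypothetical, not the Ark schema."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if not 0 < duration_s <= MAX_DURATION_SECONDS:
        raise ValueError(f"duration must be in (0, {MAX_DURATION_SECONDS}] seconds")
    return {
        "model": MODEL_ID,
        "prompt": prompt,                # hypothetical field name
        "resolution": resolution,        # hypothetical field name
        "duration_seconds": duration_s,  # hypothetical field name
    }

body = build_request("A paper boat drifting down a rain-soaked street")
print(json.dumps(body, indent=2))
```

Validating limits on the client side avoids paying for a round trip that the service would reject, which matters when generation is billed per token.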

At the June 2025 launch, Volcano Engine quoted Seedance 1.0 Pro pricing at 0.015 yuan per 1,000 tokens, which the company illustrated as roughly 3.67 yuan (approximately fifty US cents) for a 5-second 1080p clip.^[1] ByteDance presented the pricing as substantially undercutting Western competitors, although direct per-second comparisons depend on resolution, length, and feature flags. The accompanying Doubao 1.6 large language model dropped input pricing to 0.8 yuan per million tokens in the 0-32K input range, framing the announcement as a coordinated price reduction across the Seed product family.^[1]
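
The two quoted prices imply a token count for the reference clip, which the short calculation below recovers. Both input figures come from the paragraph above; the derived token count and per-second cost are arithmetic consequences, not numbers published by Volcano Engine, and the function names are illustrative.

```python
PRICE_PER_1K_TOKENS_YUAN = 0.015  # quoted launch price
CLIP_PRICE_YUAN = 3.67            # quoted price for a 5-second 1080p clip
CLIP_SECONDS = 5

def implied_tokens(clip_price, price_per_1k):
    """Number of billed tokens implied by a clip price."""
    return clip_price / price_per_1k * 1000

def clip_cost(tokens, price_per_1k):
    """Cost in yuan of a clip billed at `tokens` tokens."""
    return tokens / 1000 * price_per_1k

tokens = implied_tokens(CLIP_PRICE_YUAN, PRICE_PER_1K_TOKENS_YUAN)
print(f"implied tokens per clip: {tokens:,.0f}")
print(f"yuan per second of video: {CLIP_PRICE_YUAN / CLIP_SECONDS:.3f}")
```

The result is roughly 245,000 tokens for the 5-second 1080p clip, or about 0.73 yuan per second of output. Because billing is per token, shorter or lower-resolution clips should cost proportionally less, which is why per-second comparisons with other vendors depend on the settings used.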

In the same announcement Volcano Engine also released the Doubao 1.6-vision model (a tool-calling multimodal LLM) and an upgrade to its AI cloud-native services, including a Model Context Protocol service and the PromptPilot prompt tool, presenting Seedance as one node in a broader generative-AI stack rather than a standalone product.^[1]

Performance and Benchmarks

The Seedance 1.0 paper reports evaluation on two benchmark suites: the publicly visible Artificial Analysis Video Arena, and ByteDance's internal SeedVideoBench-1.0.^[2]

On the Artificial Analysis arena snapshot of June 10, 2025, Seedance 1.0 ranked first on both the text-to-video and image-to-video leaderboards. The authors specifically note an Elo margin exceeding 100 points over Veo 3 and Kling 2.0 on the image-to-video board.^[2] Independent coverage by Time magazine confirmed the leaderboard placement, characterizing Seedance 1.0 as outperforming OpenAI Sora, Veo 3, Runway's Gen-4, Kling 2.1 and Alibaba's Wan family on the arena's public-voting metric.^[3]
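
An Elo margin is easier to interpret as a head-to-head win rate. The sketch below applies the standard logistic Elo expectation formula with the conventional 400-point scale. Whether the Artificial Analysis arena uses exactly this parameterization is an assumption, so the resulting percentage is an approximation of what a 100-point gap means.

```python
def elo_expected_score(rating_diff, scale=400.0):
    """Expected score of the higher-rated model under standard Elo.

    `rating_diff` is (rating_a - rating_b); the 400-point scale is the
    conventional chess value and is assumed here for the arena.
    """
    return 1.0 / (1.0 + 10 ** (-rating_diff / scale))

for diff in (0, 50, 100, 200):
    print(f"{diff:>3} points -> {elo_expected_score(diff):.3f}")
```

Under this formula a 100-point lead corresponds to an expected preference rate of about 64% in pairwise votes, which is a clear but not overwhelming margin.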

The internal SeedVideoBench-1.0 evaluation uses 300 prompts each for text-to-video and image-to-video, scored on four axes: motion quality (structural accuracy, plausibility, stability, vividness), prompt following (action responsiveness, fidelity, style conformity), aesthetic quality (texture, material detail, artistic expression), and preservation (subject and style consistency in the image-to-video setting).^[2] The authors situate Seedance 1.0 as offering balanced performance: they note that Kling 2.1 excels at motion quality and visual fidelity but is weaker on prompt following, that Veo 3 is stronger on prompt following and photorealism but weaker on motion, and that Sora underperforms on both motion quality and prompt adherence in their evaluation.^[2]

These claims are vendor-disclosed and use a vendor-curated prompt set. Independent technical reviewers have generally corroborated the high Artificial Analysis ranking through 2025 and into 2026, although successor releases (Veo 3.1, Sora 2, Kling 3.0, Wan 2.5) have since shifted the relative ordering.^[3]

Relationship to Doubao and CapCut

Seedance is one of several "Doubao" models hosted on Volcano Engine. The Doubao name belongs to ByteDance's consumer chatbot, but the company applies the brand to its full enterprise model line, including the Doubao Seed 1.6 large language model, the Seedance video models, the Seedream image models, and a number of specialized variants (Seed Code, Seed-VL, Seed-Embedding).^[6] Within the Doubao chatbot, users can invoke Seedance for video generation directly in the chat interface.^[4]

Seedance also integrates with two of ByteDance's largest consumer media properties:

Jimeng (Dreamina), a standalone creative app for AI image and video generation that targets professional creators, exposes Seedance generations with finer control surfaces than the consumer Doubao app.^[4]
The CapCut family, which includes the global CapCut video editor and the Pippit marketing tool, integrates Seedance through the Dreamina branding. Initial integration paths for later Seedance versions were rolled out across CapCut mobile, desktop and web in 2026, and the editing workflow allows direct publication to ByteDance's TikTok platform.^[9]

The Seedance technical report and Volcano Engine pages do not describe the relationship between consumer Doubao prompts and the underlying Ark API beyond noting that both surfaces invoke the same model family.^[2]

Comparison with Seedream

Seedance and Seedream are sister model families within ByteDance Seed's generative-media line, designed to share data curation infrastructure and a common architectural lineage rooted in diffusion transformers, but specialized for different modalities. Seedream 3.0, released April 17, 2025 on Volcano Engine, is a bilingual text-to-image model that competes with Midjourney, Imagen, Ideogram and the imaging modes of OpenAI's GPT-4o.^[6] Seedance 1.0 then extends the family to video. Both lines emphasize Chinese-English bilingual prompting, multi-resolution output, and a unified architecture for related sub-tasks (image generation plus editing for Seedream; text-to-video plus image-to-video for Seedance).^[2]^[6]

Family	First named release	Modality	Bilingual	Distribution
Seedream (3.0)	April 17, 2025^[6]	Text-to-image, image editing	Yes (zh, en)^[6]	Volcano Engine API, Doubao, Jimeng^[6]
Seedance (1.0 Lite)	May 13, 2025^[4]	Text-to-video, image-to-video	Yes (zh, en)^[2]	Volcano Engine API, Doubao, Jimeng^[4]
Seedance (1.0 Pro)	June 11, 2025^[1]	Text-to-video, image-to-video, multi-shot	Yes (zh, en)^[2]	Volcano Engine API, Doubao, Jimeng^[1]

Comparison with Other Video Generation Models

The Seedance 1.0 technical report explicitly benchmarks against contemporary closed and open systems. The comparisons below are drawn from the paper's evaluation and from contemporaneous third-party coverage.

System	Vendor	Assessment in the Seedance paper or third-party coverage
Veo 3	Google DeepMind	Strong prompt following and photorealism; weaker motion quality per SeedVideoBench^[2]
Sora	OpenAI	Underperforms on motion quality and prompt adherence in SeedVideoBench^[2]
Kling 2.1 (Master)	Kuaishou	Strong motion quality and visual fidelity; limited prompt following per SeedVideoBench^[2]
Hailuo AI	MiniMax	Competing Chinese text-to-video model; not directly tabulated in Seedance paper^[3]
Runway Gen-3 Alpha / Gen-4	Runway	Ranked below Seedance 1.0 in Artificial Analysis arena comparisons^[3]
Wan (2.1)	Alibaba	Substantially outperformed across SeedVideoBench dimensions per the paper^[2]
Pika	Pika Labs	Not benchmarked in the Seedance paper
Stable Video Diffusion	Stability AI	Earlier open-weights baseline; not benchmarked in the Seedance paper
Open-Sora	HPC-AI Tech and contributors	Open-weights research baseline; not benchmarked in the Seedance paper

The Time magazine coverage of ByteDance's visual-model strategy summarized Seedance's positioning as that of a Chinese system "challenging the entire Western tier" of video models, citing the Artificial Analysis arena results as evidence and noting that ByteDance also leads on text-to-image with the Seedream line.^[3]

Limitations

Public assessments of Seedance 1.0 identify several caveats:

Long-form coherence: although the multi-shot mechanism is a marquee feature, the model is limited to ten-second clips per generation, and consistency across longer narratives still requires user-side stitching or scripting.^[5]
Photorealistic faces and identities: like other 2025-era systems, Seedance generates plausible but not identity-stable human likenesses; later Seedance 2.0 releases imposed explicit safeguards against face-based image-to-video conversion to mitigate identity-misuse risk.^[9]
Evaluation provenance: the headline benchmark figures originate from vendor-disclosed evaluations (SeedVideoBench) and a community-voting platform (Artificial Analysis). Both have known limitations, and successor models from competitors have since reshuffled the leaderboards.^[3]
Access geography: Seedance 1.0 was launched primarily for the Chinese market through Volcano Engine and the Doubao/Jimeng surfaces, with global access expanding gradually through CapCut/Dreamina rather than direct API access.^[9]
Audio: Seedance 1.0 generates silent video. Audio-video joint generation arrived only with Seedance 1.5 Pro and Seedance 2.0 in subsequent releases.^[7]^[8]

Significance

Seedance 1.0 was widely reported as a watershed for Chinese video generation. An aggressively low Volcano Engine price (3.67 yuan per 5-second 1080p clip), a top-of-leaderboard claim on a third-party arena, and immediate distribution through both an enterprise API and the country's most-used consumer chatbot together positioned the model as a credible Sora and Veo competitor roughly three weeks after Google's Veo 3 launch.^[1]^[3] Independent commentary frequently framed the release as evidence that the Chinese AI ecosystem had closed the gap with Western leaders on video generation and, in some surveys, moved ahead of them.^[3]

The model also served as a reference point for the rest of ByteDance Seed's generative-media stack. Subsequent releases in the same product line, particularly the audio-augmented Seedance 1.5 Pro and the multimodal Seedance 2.0, retained the architectural skeleton (a diffusion transformer with MM-RoPE positional encoding, cascaded refinement, and RLHF-style post-training) and extended it to audio joint generation and richer reference inputs.^[7]^[8]

See also

Doubao: ByteDance's consumer chatbot brand and the product line that hosts Seedance for end users.
Doubao Seed 1.6: the large language model released alongside Seedance 1.0 Pro on June 11, 2025.
Seedream: ByteDance's parallel text-to-image and image-editing family.
ByteDance Seed: the research division responsible for both Seedance and Seedream.
Veo 3: Google DeepMind's contemporaneous text-to-video model.
Sora and Sora 2: OpenAI's video generation models.
Kling / Kling 2.1: Kuaishou's video generation family.
Hailuo AI: MiniMax's video generation model.
Runway Gen-3 Alpha and Runway Gen-4: Runway's video generation models.
Wan: Alibaba's video generation family.
CapCut and the Dreamina/Jimeng platform: ByteDance's creator-facing video distribution surfaces.
TikTok: ByteDance's short-video platform, the eventual destination for many Seedance-generated clips.
Artificial Analysis: the third-party benchmarking platform whose video arena anchored Seedance's launch claims.

References

1. AIbase News, "ByteDance Unveils DouBao 1.6 and Seedance 1.0 with Significantly Reduced Costs", AIbase, 2025-06-11. https://www.aibase.com/news/18831. Accessed 2026-05-21.
2. Yu Gao, Haoyuan Guo, Tuyen Hoang, Weilin Huang, Lu Jiang and the ByteDance Seed Vision Team, "Seedance 1.0: Exploring the Boundaries of Video Generation Models", arXiv:2506.09113, 2025-06-10 (v1), 2025-06-28 (v2). https://arxiv.org/abs/2506.09113. Accessed 2026-05-21.
3. Time magazine staff, "Does ByteDance Have the Best AI Visual Models in the World?", Time, 2025. https://time.com/7321911/bytedance-seedance-ai-sora/. Accessed 2026-05-21.
4. 1AI.net (AI Artificial Intelligence News), "Volcano Engine Releases Seedance 1.0 Lite, a Beanbag Video Generation Model: Movie and TV Grade Quality, Dramatic Speed Improvement", 1ai.net, 2025-05-13. http://www.1ai.net/en/35125.html. Accessed 2026-05-21.
5. Seedance10.com, "Seedance 1.0: Cinematic AI. Seedance 1.0 Pro | Lite", product page, 2025. https://seedance10.com/. Accessed 2026-05-21.
6. ByteDance Seed, "Seedream 3.0 Text-to-Image Model Technical Report Released", seed.bytedance.com, 2025-04-17. https://seed.bytedance.com/en/blog/seedream-3-0-text-to-image-model-technical-report-released. Accessed 2026-05-21.
7. AIbase News, "Volcano Engine FORCE Conference Unleashes: Doubao Large Model 1.8 + Seedance 1.5 Pro Released", news.aibase.com, 2025-12-18. https://news.aibase.com/news/23817. Accessed 2026-05-21.
8. ByteDance Seed, "Seedance 2.0 Official Launch", seed.bytedance.com, 2026-02-09. https://seed.bytedance.com/en/blog/official-launch-of-seedance-2-0. Accessed 2026-05-21.
9. TechCrunch, "ByteDance's new AI video generation model, Dreamina Seedance 2.0, comes to CapCut", techcrunch.com, 2026-03-26. https://techcrunch.com/2026/03/26/bytedances-new-ai-video-generation-model-dreamina-seedance-2-0-comes-to-capcut/. Accessed 2026-05-21.
