Doubao Seedream

Updated Jul 16, 2026

Doubao-Seedream is the family of text-to-image generation foundation models developed by the ByteDance Seed team and shipped through ByteDance's Doubao product line and the company's Volcano Engine cloud platform. The first widely covered international release was Seedream 2.0. Its technical report was posted to arXiv on March 10, 2025, after the model had already been deployed in ByteDance's Doubao chatbot and the Jimeng (Dreamina) creative app in early December 2024.^[1]^[2] Seedream is best known for native Chinese-English bilingual prompt understanding and for legible rendering of Chinese characters inside generated images, two areas where Western text-to-image systems such as Midjourney, Imagen 3, gpt-image-1, and FLUX have historically been weak.^[1]^[3] Successive releases (Seedream 3.0 in April 2025 and Seedream 4.0 in September 2025) added native 2K and then 4K output and unified image editing with generation. Each reached first place on the Artificial Analysis text-to-image leaderboard around its launch.^[4]^[5]^[6]

Overview

Attribute	Detail
Developer	ByteDance Seed (Doubao Team)
Type	Text-to-image diffusion transformer
Languages	Native Chinese and English prompts
First public deployment	December 2024 (Doubao, Jimeng)
Seedream 2.0 technical report	arXiv:2503.07703, March 10, 2025
Seedream 3.0 technical report	arXiv:2504.11346, April 15, 2025
Seedream 4.0 technical report	arXiv:2509.20427, September 24, 2025
Distribution channels	Doubao, Jimeng / Dreamina, CapCut, Volcano Engine API
Reported native resolutions	1K and 2K (3.0); up to 4K (4.0)

Doubao-Seedream sits alongside the Doubao Seed language and multimodal models and the Doubao-Seedance video generation family inside ByteDance's broader Seed foundation-model program, which was established in 2023 as the company's fundamental AI research division.^[7]

History and Releases

Seed team origins (2023 to 2024)

ByteDance reorganized its AI research in 2023 and established the Seed team as a dedicated unit for fundamental large-model work, with a research scope spanning language, speech, vision, world models, and AI infrastructure.^[7] In February 2025, former Google DeepMind vice president Wu Yonghui joined ByteDance as head of foundational research for Seed, taking a role described as similar to a chief scientist.^[8] The team's image and video stack is led by Jianchao Yang, head of the Multimodal Foundation Model group, who has been publicly identified as a driving force behind Seedream and Seedance.^[9]

The first Seedream variant that gained Western press coverage was Seedream 2.0, although ByteDance had been iterating internally and shipping earlier versions through its Chinese consumer products. The Seedream 2.0 technical report explicitly states that "as of early December 2024, Seedream 2.0 has been incorporated into various platforms exemplified by Doubao (豆包)" and the Jimeng / Dreamina creative tool, serving a large Chinese user base before the international announcement.^[1]

Seedream 2.0 (December 2024 deployment, March 2025 paper)

The Seedream 2.0 technical report was uploaded to arXiv on March 10, 2025 as paper 2503.07703, titled "Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model."^[1] Volcano Engine and the Doubao team published a corresponding technical disclosure on March 12, 2025, marking the model's formal international unveiling.^[2] The paper lists 28 named contributors, with Lixue Gong as lead author and Jianchao Yang and Weilin Huang as senior contributors.^[1]

Seedream 2.0's headline pitch was that existing systems, including FLUX, Stable Diffusion 3 / 3.5, and Midjourney, "still grapple with issues like model bias, limited text rendering capabilities, and insufficient understanding of Chinese cultural nuances."^[1] The model was designed from the start to handle both Chinese and English prompts natively, rather than relying on English-only encoders with a translation layer in front of them.

Seedream 3.0 (April 2025)

Seedream 3.0 was released on the Doubao chat platform and the Jimeng tool in early April 2025, with a technical report posted to arXiv on April 15, 2025 (paper 2504.11346) and a public blog post on the ByteDance Seed site.^[4]^[5] The release framed itself as addressing concrete weaknesses of Seedream 2.0: "alignment with complicated prompts, fine-grained typography generation, suboptimal visual aesthetics and fidelity, and limited image resolutions."^[4]

Headline upgrades in 3.0 included native 2K (2048 by 2048) output without a separate refiner pass and approximately three-second generation times for 1K images.^[5] ByteDance reported internal text-availability rates of roughly 94 percent for both Chinese and English characters, up sharply from the 78 percent Chinese rate cited in the 2.0 paper.^[1]^[5]

Shortly after the 3.0 launch, Artificial Analysis listed Seedream 3.0 at the top of its blind-vote text-to-image arena with an Elo of approximately 1158, narrowly ahead of GPT-4o's image mode at 1157 and well ahead of Midjourney v6.1 at around 1047.^[5] This was the first time a Chinese closed model held the number-one spot on the Artificial Analysis leaderboard.^[5]^[6]
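To give a sense of how small a one-point Elo gap is, the sketch below applies the standard Elo expected-score formula (base-10 logistic curve on a 400-point scale) to the ratings quoted above. Whether Artificial Analysis computes its arena ratings with exactly this formulation is an assumption, and the function name `expected_win_rate` is illustrative.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (assumed 400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as quoted in the paragraph above.
seedream_3 = 1158
gpt_4o_image = 1157
midjourney_v61 = 1047

print(f"vs GPT-4o image mode: {expected_win_rate(seedream_3, gpt_4o_image):.3f}")
print(f"vs Midjourney v6.1:   {expected_win_rate(seedream_3, midjourney_v61):.3f}")
```

Under this assumption, a one-point lead corresponds to winning barely more than half of head-to-head votes, while a gap of about 111 points corresponds to winning roughly 65 percent of them.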

Seedream 4.0 (September 2025)

In September 2025 ByteDance announced Seedream 4.0 with a technical report on arXiv (paper 2509.20427, submitted September 24, 2025) and integrations into the Doubao app and Jimeng platform.^[6] Seedream 4.0 unified text-to-image generation, image editing, and multi-image composition inside a single diffusion transformer architecture paired with a new variational autoencoder. The model supports native generation up to 4K resolution and was pretrained on "billions of text-image pairs."^[6] On the Artificial Analysis arena, Seedream 4.0 ranked first on both the text-to-image and image-editing leaderboards as of September 18, 2025.^[6]

Seedream 4.0 also introduced an acceleration framework combining adversarial distillation, distribution matching, hardware-aware quantization, and speculative decoding to bring generation latency low enough for production workflows.^[6]

Seedream 4.5 and 5.0 family (late 2025 to early 2026)

ByteDance later shipped Seedream 4.5 in December 2025 as a refinement focused on character consistency across multiple reference images and professional-grade typography.^[10] A 5.0 generation followed in early 2026, with Seedream 5.0 Lite released alongside Seedance 2.0 on the Jimeng / Dreamina platform and the broader Seed 2.0 launch on Volcano Engine on February 14, 2026.^[11]^[12] The 5.0 generation added real-time web search and multi-turn image-and-text editing to the model line.^[11]

Technical Details

Doubao-Seedream is implemented as a diffusion transformer (DiT) that operates in the latent space of a variational autoencoder, with conditioning provided by a self-developed bilingual large language model that acts as the text encoder.^[1] Rather than reusing a CLIP or T5 encoder, ByteDance fine-tunes its own LLM on image-text pairs so that representations of Chinese cultural concepts and idiomatic English are kept in a shared embedding space.^[1]

Bilingual text encoder and glyph rendering

The Seedream 2.0 paper describes a two-encoder design in which features from the bilingual LLM are concatenated with features from a Glyph-Aligned ByT5 model that operates at the byte / character level.^[1] ByT5 is used specifically to provide accurate, character-level supervision for in-image text, which is necessary for handling the large number of distinct Chinese glyphs and for keeping small English captions legible at 1K and 2K output sizes.^[1] This is one of the architectural choices ByteDance highlights as the source of Seedream's relative strength at text rendering compared with Midjourney, Stable Diffusion 3, and FLUX base models.^[1]
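The appeal of a byte-level encoder for glyph rendering follows from how ByT5-style tokenization works: text is split into UTF-8 bytes, so any Chinese character can be represented without a dedicated glyph vocabulary. The sketch below is illustrative only. The offset of 3 follows the public ByT5 tokenizer convention (IDs 0 to 2 are reserved for special tokens); the concatenation step and both function names are assumptions about how the two feature streams might be joined, not ByteDance's implementation.

```python
from typing import List

BYT5_OFFSET = 3  # public ByT5 convention: ids 0-2 are pad / eos / unk


def byte_level_ids(text: str) -> List[int]:
    """Map text to ByT5-style ids: one id per UTF-8 byte, shifted by the offset."""
    return [b + BYT5_OFFSET for b in text.encode("utf-8")]


def concat_conditioning(llm_tokens: List[List[float]],
                        glyph_tokens: List[List[float]]) -> List[List[float]]:
    """Hypothetical join: stack both token sequences along the sequence axis.

    Assumes both streams were already projected to the same feature width.
    """
    widths = {len(t) for t in llm_tokens + glyph_tokens}
    if len(widths) > 1:
        raise ValueError("feature widths differ; project before concatenating")
    return llm_tokens + glyph_tokens


ids = byte_level_ids("豆包")
print(len(ids))  # each of the two characters occupies three UTF-8 bytes
```

The trade-off of this design is sequence length: a Chinese character costs three byte-level tokens, which is one reason such an encoder is plausibly reserved for the short strings that must appear in the image rather than for the whole prompt.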

Position encoding and resolution generalization

Both Seedream 2.0 and Seedream 3.0 use a "Scaled" rotary position embedding (RoPE) scheme designed so that patches near the image center share similar position identifiers across different resolutions, allowing the network to generalize to aspect ratios and pixel counts it has not seen during training.^[1] Seedream 3.0 generalizes this further with cross-modality RoPE, representation alignment loss, and resolution-aware timestep sampling that conditions the noise schedule on the target output size.^[4]
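One way to picture the center-aligned idea is the minimal sketch below. It is not taken from the Seedream papers: the reference grid size, the linear scale factor, and the function names are all assumptions, chosen only to illustrate how position identifiers can be rescaled so that the image center maps to nearly the same value at every resolution.

```python
import math
from typing import List, Tuple


def scaled_position_ids(num_patches: int, ref_patches: int = 64) -> List[float]:
    """Center-aligned 1D position ids, rescaled to the span of a reference grid.

    Assumption: a linear scale factor of ref_patches / num_patches.
    """
    scale = ref_patches / num_patches
    center = (num_patches - 1) / 2.0
    return [(i - center) * scale for i in range(num_patches)]


def rope_rotate(pair: Tuple[float, float], position: float,
                freq: float = 1.0) -> Tuple[float, float]:
    """Standard RoPE rotation of one feature pair by the angle position * freq."""
    angle = position * freq
    x, y = pair
    return (x * math.cos(angle) - y * math.sin(angle),
            x * math.sin(angle) + y * math.cos(angle))


low = scaled_position_ids(64)    # one side of a 64-patch grid
high = scaled_position_ids(128)  # one side of a 128-patch grid
# Patches around the middle of both grids receive ids close to zero,
# and both grids cover roughly the same id range.
print(low[31], low[32], high[63], high[64])
print(min(low), max(low), min(high), max(high))
```

For a 2D image the same function would be applied independently to the row and column axes, with each axis driving its own share of the rotary feature pairs.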

Training pipeline

The Seedream 2.0 training recipe runs in five stages:^[1]

1. Continuing training (CT): Quality-filtered images for aesthetic enhancement.
2. Supervised fine-tuning (SFT): Manually curated artistic datasets.
3. Reinforcement learning from human feedback (RLHF): Multi-phase optimization with three reward models covering aesthetics, prompt alignment, and structure.
4. Prompt engineering (PE): An LLM-based prompt rephraser claimed to improve aesthetic quality by roughly 30 percent.
5. Refiner: Upscaling and texture enhancement.

Seedream 3.0 doubles the effective dataset by combining a defect-aware training paradigm with a dual-axis collaborative sampling framework, and replaces the human-only RLHF reward model with a vision-language-model-based reward that can scale to larger output sizes.^[4] Seedream 4.0 pushes this further by jointly training text-to-image generation and image editing on billions of pairs and by reducing the number of latent tokens per image through a more aggressive VAE compression scheme.^[6]
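The effect of stronger VAE compression on sequence length is simple arithmetic. The sketch below uses hypothetical spatial downsampling factors (8 and 16) and a hypothetical patch size of 2; the sources cited here do not state Seedream's actual values, and `latent_tokens` is an illustrative name.

```python
def latent_tokens(height: int, width: int, vae_factor: int, patch: int = 2) -> int:
    """Number of transformer tokens for an image after VAE downsampling and patching."""
    stride = vae_factor * patch
    if height % stride or width % stride:
        raise ValueError("image size must be divisible by vae_factor * patch")
    return (height // stride) * (width // stride)


for factor in (8, 16):  # hypothetical spatial compression factors
    print(factor, latent_tokens(2048, 2048, factor), latent_tokens(4096, 4096, factor))
```

Doubling the per-side compression factor cuts the token count by a factor of four, so under these assumed values a 4K image at factor 16 costs as many tokens as a 2K image at factor 8.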

Reported benchmark results

Model	Resolution	Reported text availability	Artificial Analysis rank at release
Seedream 2.0	up to 1K (refiner)	78% Chinese, higher English	Not yet on leaderboard at paper time^[1]
Seedream 3.0	Native 2K	94% Chinese and English	#1 with Elo ~1158 (April 2025)^[5]
Seedream 4.0	Up to 4K	Not separately reported	#1 in T2I and editing arenas (September 2025)^[6]

ByteDance's own evaluation of Seedream 2.0 drew on roughly 500,000 pairwise human comparisons. The paper reports that the model obtained the highest total Elo score in those comparisons for both Chinese and English prompts, though it does not publish the absolute Elo values for competitors.^[1]

Variants and Distribution

Seedream models are exposed to end users and developers through several distinct surfaces:

Doubao chatbot and Doubao app. The Doubao consumer assistant is ByteDance's flagship generative-AI product and is the primary route for Chinese users to call Seedream from a chat interface. By late 2025 Doubao had reached around 163 million monthly active users.^[10]
Jimeng (Chinese) and Dreamina (international). Jimeng is the Chinese-language creative tool that bundles image and video generation, and Dreamina is its international counterpart on the same backend. Seedream powers the image generation features of both apps.^[1]^[11]
CapCut. ByteDance's CapCut video editor integrates Seedream 4.0 so that text-to-image, image-to-image, and multi-image editing operations are accessible inside the editing timeline, with outputs immediately available to be animated or composited.^[13]
TikTok Image Studio. Image generation features in TikTok's creator tools draw from the same model family, integrating Seedream outputs into the social platform's posting flow.^[10]
Volcano Engine API. Volcano Engine is ByteDance's cloud platform and the official commercial entry point for Seedream and Seedance APIs; the Seed 2.0 series launch announcement in 2026 explicitly bundles Seedream and Seedance into the same Volcano Engine surface.^[12]

The international consumer chatbot for the Doubao family is branded Dola (formerly Cici); users searching for an "English Doubao" are typically routed to Dola, which exposes Seedream image generation through that interface.^[13]

Relationship to Doubao-Seedance

Doubao-Seedance is the video-generation sibling of Seedream and ships out of the same Seed Multimodal Foundation Model group at ByteDance.^[9] Seedance 1.0 was the first version to gain widespread coverage in mid-2025 with text-to-video and image-to-video support inside Doubao and Jimeng, and Seedance 2.0 was launched on February 10, 2026 as a limited beta on Jimeng before being included in the broader Seed 2.0 / Volcano Engine release on February 14, 2026.^[12]^[14] Seedance 2.0 uses a unified multimodal architecture that ingests text, image, audio, and video inputs and produces joint audio-video output.^[15]

In ByteDance's product hierarchy Seedream and Seedance are paired together: the same Doubao or Jimeng workflow can produce a still image with Seedream, refine it with SeedEdit (an image editor released alongside Seedream 3.0), and then animate it into video using Seedance, all under a single account on Volcano Engine.^[4]^[15] CapCut's Dreamina-branded video features are powered by Seedance, and its image generation features are powered by Seedream.^[13]

Aspect	Doubao-Seedream	Doubao-Seedance
Modality output	Still images	Video (with audio in 2.0)
Latest major version	Seedream 5.0 / 5.0 Lite (Feb 2026)	Seedance 2.0 (Feb 2026)
Native max resolution	Up to 4K (4.0)	Variable; clip length and frame rate vary by tier
First broad arXiv report	2503.07703 (Seedream 2.0)	Reported in Seedance technical posts on the ByteDance Seed site
Primary integration apps	Doubao, Jimeng / Dreamina, CapCut, TikTok	Doubao, Jimeng / Dreamina, CapCut

Applications

Seedream's design choices map directly to several concrete use cases:

Chinese-language poster and advertising design. Native 2K and 4K output combined with high-fidelity Chinese typography make Seedream particularly useful for posters, signage, and marketing creatives, which is the workflow ByteDance most often demonstrates in Volcano Engine showcases.^[5]^[10]
Cross-language e-commerce creatives. Bilingual prompt support and accurate small-text rendering allow a single workflow to produce English and Chinese variants of the same product image with consistent layout, addressing a long-standing pain point for sellers using Western models that mangle Chinese characters.^[1]^[5]
CapCut and TikTok short-form video. Inside CapCut, Seedream-generated images feed directly into Seedance-driven motion, allowing creators to go from prompt to short-form video without leaving the editor.^[13]
Editing and consistency-driven workflows. Seedream 4.0 unifies text-to-image with image editing in a single model, and Seedream 4.5 adds a cross-image consistency module that supports up to roughly 14 reference images for keeping a character's identity stable across a series of shots.^[6]^[10]

Limitations and Criticisms

Seedream models inherit several limitations common to diffusion-based text-to-image systems and have a few that are specific to ByteDance's design choices:

Long-form text in images. Despite high reported character availability rates, secondary reviewers note that Seedream 3.0 handles short bold phrases and stylized captions better than long paragraphs of fine print, with multi-line copy still showing artifacts compared with specialized typography models.^[16]
Closed weights. The Seedream family is closed-source. Weights are not released, model cards do not disclose parameter counts, and the only public access routes are ByteDance-controlled consumer apps and the Volcano Engine API.^[4]^[6]
Geographic availability. Doubao and Jimeng are primarily aimed at the Chinese market. International users mostly access Seedream indirectly through Dreamina, CapCut, TikTok's image features, or third-party API resellers, which means access patterns and content policies vary by region.^[13]
Benchmark volatility. Seedream 3.0 and 4.0 reached the top of the Artificial Analysis blind-vote arena at their respective launches, but later model releases from competitors (such as GPT Image 2 and Nano Banana 2 in late 2025 and early 2026) displaced Seedream 4.0 from the number-one position, illustrating how quickly the leaderboard rotates.^[17]
Cultural and policy constraints. As a ByteDance product, Seedream's content filters are tuned to Chinese regulatory requirements, which affects what kinds of images can be generated and how political or sensitive prompts are handled across the various end-user surfaces.^[3]

Comparison with Other Text-to-Image Models

Model	Developer	Native max resolution	Bilingual Chinese/English	Strengths cited by independent reviewers
Midjourney v7	Midjourney	Internal upscaling	English-primary	Aesthetic quality, stylization
Imagen 3	Google DeepMind	Up to 2K	English-primary	Photorealism, instruction following
gpt-image-1	OpenAI	Up to 1536 × 1024	Multilingual	Text rendering, integration with ChatGPT
FLUX.1 / FLUX.2	Black Forest Labs	High resolution	English-primary	Open-weight options, sharp detail
Ideogram 3.0	Ideogram	High resolution	English-primary	In-image text rendering
Stable Diffusion 3.5	Stability AI	Variable	English-primary	Open weights, customization
Hunyuan (image)	Tencent	High resolution	Chinese / English	Chinese cultural concepts
Seedream 3.0 / 4.0	ByteDance	2K (3.0) / 4K (4.0)	Native Chinese / English	Bilingual prompts, Chinese typography, Artificial Analysis top ranks at launch

Seedream's most distinctive position in this landscape is the combination of native bilingual handling, in-image Chinese typography, and deep integration into ByteDance's consumer surfaces (Doubao, Jimeng, TikTok, CapCut), rather than any single benchmark number.^[1]^[5]^[13]

References

1. Lixue Gong et al., "Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model", arXiv (ByteDance Doubao Team), 2025-03-10. https://arxiv.org/abs/2503.07703. Accessed 2026-05-21.
2. AIbase News, "Groundbreaking Release! Seedream 2.0's Text-to-Image Technology Unveiled, Reshaping Industry Landscape", AIbase, 2025-03-12. https://www.aibase.com/news/16216. Accessed 2026-05-21.
3. getimg.ai Blog, "Seedream 3.0 Is Here: ByteDance's Text-to-Image Model Explained", getimg.ai, 2025-06-17. https://getimg.ai/blog/seedream-3-0-is-here-bytedances-text-to-image-model-explained. Accessed 2026-05-21.
4. Yu Gao et al., "Seedream 3.0 Technical Report", arXiv (ByteDance Doubao Team), 2025-04-15. https://arxiv.org/abs/2504.11346. Accessed 2026-05-21.
5. DigiAlps, "Seedream 3.0 by ByteDance Doubao Team Delivers Stunning 2K Text-to-Image Results", DigiAlps LTD, 2025-04-16. https://digialps.com/seedream-3-0-by-bytedance-doubao-team-delivers-stunning-2k-text-to-image-results/. Accessed 2026-05-21.
6. ByteDance Seed et al., "Seedream 4.0: Toward Next-generation Multimodal Image Generation", arXiv:2509.20427, 2025-09-24. https://arxiv.org/abs/2509.20427. Accessed 2026-05-21.
7. ByteDance Seed Team, "Seed Models (overview page)", ByteDance Seed, 2025. https://seed.bytedance.com/en/models. Accessed 2026-05-21.
8. Wang Jiamin, "Former Google DeepMind VP joins ByteDance as Seed team research lead", TechNode, 2025-02-24. https://technode.com/2025/02/24/former-google-deepmind-vp-joins-bytedance-as-seed-team-research-lead/. Accessed 2026-05-21.
9. ZoomInfo, "Jianchao Yang, Head of Multimodal Foundation Model at ByteDance (profile)", ZoomInfo, 2025. https://www.zoominfo.com/p/Jianchao-Yang/2042573422. Accessed 2026-05-21.
10. MindStudio, "What Is ByteDance Seedream 4.5? AI Image Generation from the Makers of TikTok", MindStudio Blog, 2025-12-01. https://www.mindstudio.ai/blog/what-is-bytedance-seedream-4-5. Accessed 2026-05-21.
11. AIbase News, "Dreamina AI Launches: Seedance 2.0 and Seedream 5.0 Lite Now Officially Available", AIbase, 2026-02-14. https://www.aibase.com/news/26510. Accessed 2026-05-21.
12. ByteDance Seed, "Seed 2.0 Official Launch (blog announcement)", ByteDance Seed, 2026-02-14. https://seed.bytedance.com/en/blog/seed2-0-%E6%AD%A3%E5%BC%8F%E5%8F%91%E5%B8%83. Accessed 2026-05-21.
13. Ima Studio, "What is Doubao? How To Try Doubao, Seedream and Seedance", Ima Studio Blog, 2025. https://imastudio.com/blog/what-is-doubao. Accessed 2026-05-21.
14. SitePoint, "Seedance 2.0: ByteDance's New AI Video Model, Developer Guide and Comparison", SitePoint, 2026-02-12. https://www.sitepoint.com/introducing-seedance-2-0/. Accessed 2026-05-21.
15. BigGo Finance, "ByteDance's Video-Generating AI Pricing Announced, Volcano Engine's Seedance 2.0 Ushers in the One-Yuan-Per-Second Era", BigGo Finance, 2026-02-14. https://finance.biggo.com/news/X75VvJwBNZYCTTDvWHQb. Accessed 2026-05-21.
16. ByteDance Seed, "Seedream 3.0 Text-to-Image Model Technical Report Released (official blog)", ByteDance Seed, 2025-04-16. https://seed.bytedance.com/en/blog/seedream-3-0-text-to-image-model-technical-report-released. Accessed 2026-05-21.
17. Artificial Analysis, "Text to Image Leaderboard (Image Arena)", Artificial Analysis, 2026. https://artificialanalysis.ai/image/leaderboard/text-to-image. Accessed 2026-05-21.
