Seedream
Last reviewed
May 16, 2026
Sources
20 citations
Review status
Source-backed
Revision
v1 · 3,086 words
Seedream is a series of text-to-image and image-editing foundation models developed by the Seed research team at ByteDance, the Beijing-based company best known for TikTok and Douyin. The series powers image creation features inside ByteDance's consumer products, including the Doubao chatbot and the Jimeng (Dreamina) creative app, and is also distributed as a paid API through ByteDance's cloud arm Volcano Engine and its international platform BytePlus. Successive releases moved the family from a specialist bilingual text-rendering system in late 2024 to a unified generation-and-editing model in 2025, then to a flagship version with native 4K output in late 2025.
The most recent flagship in the line, Seedream 4.5, was released on 4 December 2025 and supports text-to-image generation and image editing at resolutions up to 4096 by 4096 pixels, with stable subject consistency across up to fourteen reference images. The series is positioned by ByteDance as a competitor to GPT Image 1, Google's Nano Banana family, Black Forest Labs' Flux 2, Ideogram 3.0, and Google's Imagen 4, and has placed in the top five on the Artificial Analysis text-to-image Arena since the autumn of 2025.
ByteDance founded its Seed research division in 2023 to consolidate large-model work that had previously been spread across multiple product teams. Seed is responsible for the family of Doubao language models as well as a set of media-generation systems that share the "Seed" prefix. The visual side of that portfolio centers on two parallel lines: Seedream for still images and Seedance for video. Both feed the same consumer products, with Doubao acting as the chatbot front end and Jimeng (marketed internationally as Dreamina under CapCut) serving as the dedicated creative app.
ByteDance entered the image-generation market relatively late compared with Stability AI, OpenAI, and Midjourney, but it had several advantages. The company already operated a large captioning and tagging pipeline for short-form video, which produced training data well suited to multilingual prompt understanding. It also controlled distribution channels with hundreds of millions of daily users, so a new model could be tested at scale within days of release. The first Seedream releases were aimed at Chinese-language users who wanted to place legible Chinese characters inside generated posters and ads, a task that earlier Western models handled poorly.
The ByteDance Seed team has published a series of technical reports on arXiv detailing the architecture and training of each Seedream release. These reports are unusual among Chinese frontier-lab releases for the level of disclosure on data filtering, reward modeling, and inference acceleration. Each report has been accompanied by a marketing release on the seed.bytedance.com domain and, where applicable, an API rollout on Volcano Engine and BytePlus.
The Seedream family has shipped five named generations since late 2024. Release dates below come from ByteDance Seed announcements, technical reports, and independent coverage.
| Version | Public release | Key change |
|---|---|---|
| Seedream 2.0 | December 2024 (Doubao and Jimeng); arXiv report March 2025 | Native Chinese-English bilingual generation, glyph-aligned text rendering |
| Seedream 3.0 | April 2025 | Direct 2K output, roughly 3-second 1K generation, improved photorealism |
| Seedream 4.0 | 9 September 2025 | Unified generation and editing in one model, 4K output, up to nine images per run |
| Seedream 4.5 | 4 December 2025 | Native 4K, stronger typography, up to 14 reference images, roughly 10x speedup |
| Seedream 5.0 Lite | 13 February 2026 | Chain-of-thought visual reasoning, live web search, lower per-image price |
Official sources refer to the December 2024 release as Seedream 2.0. There is no public Seedream 1.0 article on seed.bytedance.com, and independent coverage takes the version number to reflect internal ordering rather than an earlier public launch. The arXiv report for Seedream 2.0 was posted in March 2025, several months after the model first shipped inside Doubao.
Seedream 3.0 arrived on Doubao and Jimeng in early April 2025 and was the first release that ByteDance described as a general-purpose image foundation model rather than a poster-and-text specialist. It introduced direct 2K output without an upscaling pass and brought 1K generation latency down to roughly three seconds.
Seedream 4.0 was announced on 9 September 2025 and made two architectural changes that defined the rest of the line: it merged text-to-image and image editing into a single network, and it added native 4K generation. ByteDance described the model as more than ten times faster than Seedream 3.0 at the same resolution, and it shipped batch generation of up to nine coherent images in one run.
Seedream 4.5 followed in early December 2025. ByteDance positioned the release as a scaling upgrade rather than a redesign. The system card lists the model code as seedream-4-5-251128 on BytePlus, and the version brings stable handling of up to fourteen reference images, sharper small-text rendering, and roughly a tenfold speed improvement over 4.0 at equivalent quality settings.
Seedream 5.0 Lite, released on 13 February 2026, is a smaller variant that pairs the image stack with a chain-of-thought reasoning module and a live web-search tool. It is priced below 4.5 and is the first release in the family to advertise visual reasoning over partially specified scenes.
ByteDance has published architecture details for Seedream 2.0, 3.0, and 4.0 in technical reports on arXiv. The lineage is consistent: each documented version uses a diffusion transformer backbone trained with a multi-stage recipe that includes continued training, supervised fine-tuning, and reinforcement learning from human feedback.
Seedream 2.0 paired a diffusion transformer with a self-developed bilingual large language model used as a text encoder. The text branch was supplemented by a glyph-aligned ByT5 model that operated at the character level and let the system render Chinese ideographs as well as English glyphs. A scaled rotary position embedding was used to generalize to resolutions that were not seen during training. ByteDance also reported using a variational autoencoder for latent compression, consistent with the design of contemporaries such as Stable Diffusion 3 and Flux.
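The idea of rescaling rotary positions for unseen resolutions can be illustrated with a minimal sketch. It uses plain position interpolation on a one-dimensional index, which is a generic technique and an assumption here; the exact formulation in the Seedream 2.0 report may differ. The function names and the `train_len` / `target_len` parameters are hypothetical.

```python
import math


def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angles for one position, one angle per pair of channels."""
    return [position / (base ** (2 * i / dim)) for i in range(dim // 2)]


def apply_rope(vec: list[float], position: float) -> list[float]:
    """Rotate consecutive channel pairs of `vec` (even length) by position-dependent angles."""
    out = []
    for i, theta in enumerate(rope_angles(position, len(vec))):
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out


def scaled_position(index: int, train_len: int, target_len: int) -> float:
    """Compress indices of a longer grid into the position range seen in training.

    Assumption: simple linear interpolation, not necessarily Seedream's scheme.
    """
    if target_len > train_len:
        return index * train_len / target_len
    return float(index)


# A grid twice as long as the training grid reuses the trained position range.
last_index = 127
position = scaled_position(last_index, train_len=64, target_len=128)
rotated = apply_rope([1.0, 0.0, 0.0, 1.0], position)
print(position, rotated)
```

Because each channel pair is only rotated, the vector norm is unchanged; the scaling affects only which angles are used, keeping them inside the range the model was trained on.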
Seedream 3.0 retained the diffusion transformer and bilingual text encoder, scaled up data and parameters, and added acceleration techniques that brought 1K latency down to roughly three seconds. The technical report describes mixed-precision inference and distillation-based sampling shortcuts.
Seedream 4.0 made the largest break in the line. The arXiv report "Seedream 4.0: Toward Next-generation Multimodal Image Generation" describes a unified diffusion transformer that handles text-to-image, image-to-image, single-image editing, multi-image editing, and image composition through one backbone. The model uses a higher-compression VAE than its predecessor, a fine-tuned vision-language model called SeedVLM for multimodal understanding, and joint training across the editing and generation tasks. Inference is accelerated through a stack that includes adversarial distillation, distribution-matching distillation, 4-bit and 8-bit quantization, and speculative decoding. The combination is what enables 4K output at the speeds ByteDance has reported.
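To give a sense of what 4-bit and 8-bit quantization trade away, here is a minimal sketch of symmetric per-tensor weight quantization. This is a generic textbook scheme chosen for illustration; the report does not say that Seedream uses this exact variant, and the function names are hypothetical.

```python
def quantize_symmetric(values: list[float], bits: int = 8) -> tuple[list[int], float]:
    """Map floats to signed integers in [-qmax, qmax] with one shared scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0, 0.8]
for bits in (8, 4):
    codes, scale = quantize_symmetric(weights, bits)
    restored = dequantize(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{bits}-bit codes={codes} worst-case error={worst:.4f}")
```

The rounding error of each weight is bounded by half the scale, so halving the bit width from 8 to 4 makes the worst-case error roughly eighteen times larger (the ratio of 127 to 7) in exchange for half the storage per weight.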
Seedream 4.5 and 5.0 Lite are described in marketing material as scaled extensions of the 4.0 stack rather than as fresh architectures. ByteDance has framed 4.5 as an "all-round improvement through the overall scaling of the model," with the typography and small-text gains attributed to expanded training data and better reward models. The 5.0 Lite release adds a reasoning loop that runs chain-of-thought tokens before image synthesis, plus a search tool that retrieves real-time web content the model can use to ground generations in current events.
Training data details have not been fully disclosed. The Seedream 2.0 paper describes a filtering pipeline that uses internal taggers and an aesthetic scorer, and the 4.0 report mentions "hundreds of millions" of images at multiple resolutions. ByteDance has not published model parameter counts for any Seedream release.
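The Seedream line is built around five capability clusters. Text-to-image generation and bilingual text rendering run through every release, while editing, multi-image composition, and visual reasoning arrived in later versions.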
| Capability | Notes |
|---|---|
| Text-to-image generation | All versions; 4K native in 4.0 and later |
| Image editing | Added as a first-class mode in 4.0, refined in 4.5 |
| Multi-image composition | Up to nine outputs per run in 4.0, up to fourteen reference inputs in 4.5 |
| Bilingual text rendering | Chinese and English supported across the line; glyph quality is a stated focus |
| Visual reasoning and search | Introduced in 5.0 Lite |
Text rendering has been the most distinctive feature of the family. The Seedream 3.0 release notes claimed a 94 percent success rate on a Chinese complex-typography test, ahead of GPT-4o, which had handled English well but stumbled on Chinese characters. Seedream 4.0 added accurate rendering of dense English and mixed-script content, and 4.5 was marketed on small-text and designer-grade poster typography. These are the cases where the system competes most directly with Ideogram 3.0, the other model frequently cited for text fidelity.
Multi-image composition is the headline feature of the editing modes. Seedream 4.5 takes up to fourteen reference images and uses them to lock face identity, product finish, color palette, or other consistent attributes across a generated set. Inside Jimeng and Doubao, this is surfaced as a "character keeper" or "product keeper" tool that designers use for ad campaigns and storyboards.
Resolution targets have climbed at every release. Seedream 2.0 produced standard 1K images; 3.0 added native 2K; 4.0 introduced 4K with an adaptive aspect ratio selector; 4.5 made 4K the default output for the BytePlus API, with the model card listing 4096 by 4096 as the maximum supported size. Edit mode in 4.5 also supports 2048 by 2048 output for cases where source resolution is lower.
The 5.0 Lite release added two new capabilities. The first is chain-of-thought visual reasoning: given a partial scene, the model writes intermediate thoughts about what should appear before rendering. The second is real-time web search, in which the model retrieves current information at generation time so that an image referencing a recent event can include accurate visual details. Independent reviewers have compared the reasoning loop to a smaller, image-focused version of the inference-time scaling seen in language models such as OpenAI's o1.
Seedream is available through three main channels: consumer apps, the Volcano Engine API in mainland China, and BytePlus globally.
Inside ByteDance products, Seedream powers image generation in Doubao and in Jimeng (Dreamina in English markets). These consumer surfaces are free for casual use with credit limits. CapCut bundles a version of the Dreamina interface for video creators.
Volcano Engine, ByteDance's mainland Chinese cloud, exposes Seedream models through its ModelArk API platform. Pricing on Volcano Engine is denominated in yuan and varies by region. Outside China, BytePlus offers the same models through ModelArk-Byteplus and through partner platforms such as fal.ai, Replicate, Runware, OpenRouter, and ImagineArt.
| Model | Indicative BytePlus list price | Notes |
|---|---|---|
| Seedream 4.0 | About 0.025 to 0.035 US dollars per image | Tiered packs available |
| Seedream 4.5 | About 0.045 US dollars per image | Native 4K supported |
| Seedream 5.0 Lite | About 0.035 US dollars per image | Reasoning and search included |
BytePlus also runs a free trial of 200 image generations and sells subscription bundles such as 400 images for 6.99 US dollars, 1,028 images for 24.99 US dollars, and 2,000 images for 49.99 US dollars, each valid for thirty days from purchase. Third-party aggregators publish their own credit-based prices, which vary by platform and may apply markups.
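The effective per-image cost of each bundle follows from simple division. The short sketch below works it out from the figures quoted above; the variable and function names are illustrative only.

```python
# Bundle sizes and prices as quoted in the text (US dollars).
BUNDLES = [(400, 6.99), (1028, 24.99), (2000, 49.99)]


def price_per_image(images: int, price_usd: float) -> float:
    """Effective cost of one image when the whole bundle is used."""
    return price_usd / images


for images, price in BUNDLES:
    unit = price_per_image(images, price)
    print(f"{images:>5} images for ${price:.2f} -> ${unit:.4f} per image")
```

On these figures the bundles work out to roughly 1.7 to 2.5 US cents per image, provided every image in the bundle is used within the thirty-day validity window.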
The API exposes two main endpoints, one for text-to-image generation and one for editing, along with parameters for reference images, aspect ratio, output resolution, and a seed for reproducibility. Latency on the BytePlus endpoint is typically a few seconds for 2K generation and longer for full 4K output, depending on prompt complexity.
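A client might check a request against the limits cited in this article before sending it. The sketch below is a hypothetical illustration: the `ImageRequest` class and its field names are invented for the example and are not BytePlus parameter names, and the limits (4096 pixels per side, fourteen reference images) are the Seedream 4.5 figures given in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_SIDE = 4096        # largest output size the article cites for Seedream 4.5
MAX_REFERENCES = 14    # reference-image limit the article cites for Seedream 4.5


@dataclass
class ImageRequest:
    """Hypothetical request shape; field names are not real API parameters."""
    prompt: str
    width: int = 2048
    height: int = 2048
    seed: Optional[int] = None
    reference_images: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return the reasons this request would exceed the cited limits."""
        found = []
        if not self.prompt.strip():
            found.append("prompt is empty")
        if not (0 < self.width <= MAX_SIDE and 0 < self.height <= MAX_SIDE):
            found.append(f"size {self.width}x{self.height} is outside 1..{MAX_SIDE} per side")
        if len(self.reference_images) > MAX_REFERENCES:
            found.append(f"{len(self.reference_images)} reference images exceeds {MAX_REFERENCES}")
        return found


request = ImageRequest(prompt="retro travel poster, bilingual title", seed=7,
                       reference_images=["logo.png", "palette.png"])
print(request.problems() or "request is within the cited limits")
```

Fixing the seed, as in the example, is what makes a generation reproducible across calls with otherwise identical parameters.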
ByteDance benchmarks Seedream against the dominant Western image models on its internal MagicBench evaluation and on the public Artificial Analysis and LMArena arenas. The comparison below is drawn from ByteDance's own arXiv reports and from independent leaderboard data through early 2026.
| Model | Maximum native resolution | Strength | Public benchmark position (early 2026) |
|---|---|---|---|
| Seedream 4.0 | 4096 by 4096 | Bilingual text, editing, speed | Tied or top-five on Artificial Analysis text-to-image |
| Seedream 4.5 | 4096 by 4096 | Multi-reference consistency, typography | Improves over 4.0 on MagicBench prompt adherence and aesthetics |
| GPT Image 1 | Reported 4096 by 4096 | Photorealism, English text, reasoning | First place on Artificial Analysis through much of late 2025 |
| Nano Banana family | 2048 by 2048 native, higher with Pro | Speed, multimodal context | Top of LMArena image-edit board in late 2025 |
| Flux 2 | High native resolution | Open-weight options, aesthetics | Strong reception among open-source users |
| Ideogram 3.0 | 2048 by 2048 | Typography and English text rendering | Niche leader in poster and graphic design |
| Imagen 4 | High resolution | Photorealism and prompt adherence | Competitive on Google-hosted benchmarks |
ByteDance's own MagicBench evaluations showed Seedream 4.0 first on single-image editing and competitive on text-to-image, with a noted advantage over GPT Image 1 on texture, lighting, and color tone. The independent Artificial Analysis text-to-image Arena, which uses anonymized side-by-side votes, listed Seedream 4.0 at an Elo rating in the high 1,100s during early 2026, putting it inside the top five and within roughly 100 points of GPT Image 2.
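A rating gap can be read as an expected head-to-head win rate. The sketch below assumes the standard Elo logistic formula with a 400-point scale; the arena's exact rating method is not described here and may differ, and the two ratings used are placeholders that only encode a 100-point gap.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected share of votes for model A against model B under standard Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Placeholder ratings: only the 100-point difference matters.
leader, challenger = 1250.0, 1150.0
print(f"leader expected win rate: {expected_score(leader, challenger):.3f}")
print(f"challenger expected win rate: {expected_score(challenger, leader):.3f}")
```

Under that assumption, a 100-point lead corresponds to the higher-rated model winning about 64 percent of pairwise votes, so a model trailing by that margin would still be preferred in roughly one comparison out of three.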
On the LMArena Text-to-Image leaderboard, Seedream 4 entered at fifth place in September 2025 with more than 4,500 votes and rose to second place on the image-edit leaderboard with 43,000 votes by late September. A separate "High Res" variant briefly tied with Gemini 2.5 Flash Image, which the community had nicknamed Nano Banana, at the top of the text-to-image board.
Against Flux 2, the most prominent open-weight competitor in late 2025, Seedream offers higher native resolution and stronger Chinese text rendering, while Flux remains the default for users who want local inference. Against Ideogram 3.0, Seedream now matches or exceeds English typography quality on independent evaluations, although Ideogram retains a following among graphic designers for its prompt-following on poster layouts. Against Imagen 4, Seedream is judged comparable on photorealism in independent reviews, with Google's model often preferred for natural skin tone and Seedream preferred for high-resolution detail.
Reception of Seedream has shifted as the family matured. Early releases were treated by Western coverage as Chinese-market tools for handling Chinese text, with limited interest outside Doubao and Jimeng users. Seedream 3.0 changed that perception by briefly topping the Artificial Analysis user preference leaderboard in April 2025, and Seedream 4.0 entered the mainstream Western tech press in September 2025 when outlets including TechRadar and Yahoo Tech ran reviews framing the model as "terrifyingly real" and as a direct threat to Google's then-dominant Nano Banana family.
Independent reviewers have praised Seedream 4.0 and 4.5 for their handling of small text, photorealistic skin and fabric textures, and their ability to keep characters and products visually consistent across multi-image sets. The arXiv release of the Seedream 4.0 technical report drew technical attention to the unified generation-and-editing design, which several reviewers cited as a likely template for next-generation systems from other labs.
Criticism has focused on three areas. The first is provenance: ByteDance has disclosed less about training data than some Western competitors, and the model's strong typography ability on brand-name fonts has raised licensing questions. The second is censorship and political filtering. The Chinese deployment of Seedream is subject to Cyberspace Administration rules on synthetic media, and certain prompts that work outside China are blocked on Volcano Engine. The third is the gap between marketing demos and average prompts: independent reviewers have noted that ByteDance's hero images for 4.5 typically use refined prompt engineering, and casual users report more variable results.
Within China, Seedream has become the default for ad creative workflows. Agencies that use Jimeng report multi-week production timelines compressed into days for short-form video and poster work. Outside China, the BytePlus distribution channel has given Seedream a foothold among Western indie developers and graphic-design startups, often via aggregator platforms rather than direct integration.
In early 2026, the release of Seedream 5.0 Lite drew attention for being one of the first commercial image models to bundle real-time web search with image generation. Reviewers compared the visual-reasoning mode to inference-time scaling in language models, and several noted that the Lite tier was priced low enough to be attractive even where the model's reasoning loop was not needed. ByteDance has not publicly announced a full Seedream 5.0 release as of May 2026.
The model is also indirectly visible through ByteDance's other products. Doubao image creation, Jimeng campaigns, and CapCut's image-to-video and image-to-text features all rely on Seedream weights. Seedance, the sibling video model, accepts Seedream outputs as input frames for image-to-video pipelines, which has positioned the two systems as a paired creative stack inside ByteDance's ecosystem.