# Seedream

> Source: https://aiwiki.ai/wiki/seedream
> Updated: 2026-06-24
> Categories: AI Models, Chinese AI, Computer Vision, Generative AI, Image Generation
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

**Seedream** is a series of [text-to-image](/wiki/text_to_image) and image-editing foundation models built by the Seed research team at [ByteDance](/wiki/bytedance), the company behind TikTok and Douyin. In September 2025 the independent benchmarking firm Artificial Analysis declared Seedream 4.0 "the new leading image model across both the Artificial Analysis Text to Image and Image Editing Arena, surpassing Google's Gemini 2.5 Flash (Nano-Banana), across both," the first time a Chinese model had topped both [AI image generation](/wiki/ai_image_generation) arenas at once. [4][21] Seedream powers the image features inside ByteDance's consumer products, including the [Doubao](/wiki/doubao) chatbot and the Jimeng (Dreamina) creative app, and is sold as a paid API through ByteDance's cloud arm Volcano Engine and its international platform BytePlus. [4]

The series is best known for two things: native bilingual (Chinese and English) text rendering and high-resolution photorealism. Successive releases moved the family from a specialist bilingual text-rendering system in late 2024 to a unified generation-and-editing model in 2025, then to a flagship version with native 4K output. The most recent flagship, Seedream 4.5, was released on 4 December 2025 and supports text-to-image generation and editing at resolutions up to 4096 by 4096 pixels, with stable subject consistency across up to fourteen reference images. [1] The series is positioned by ByteDance as a competitor to [GPT Image 1](/wiki/gpt_image_1), Google's [Nano Banana](/wiki/nano_banana) family, Black Forest Labs' [Flux 2](/wiki/flux_2), [Ideogram 3.0](/wiki/ideogram_3), and Google's [Imagen 4](/wiki/imagen_4), and has placed in the top five on the Artificial Analysis text-to-image Arena since the autumn of 2025. [8]

## Who makes Seedream?

ByteDance founded its Seed research division in 2023 to consolidate large-model work that had previously been spread across multiple product teams. Seed is responsible for the family of Doubao language models as well as a set of media-generation systems that share the "Seed" prefix. The visual side of that portfolio centers on two parallel lines: Seedream for still images and [Seedance](/wiki/seedance) for video. Both feed the same consumer products, with Doubao acting as the chatbot front end and Jimeng (marketed internationally as Dreamina under CapCut) serving as the dedicated creative app. [4]

ByteDance entered the image-generation market relatively late compared with Stability AI, OpenAI, and Midjourney, but it had several advantages. The company already operated a large captioning and tagging pipeline for short-form video, which produced training data well suited to multilingual prompt understanding. It also controlled distribution channels with hundreds of millions of daily users, so a new model could be tested at scale within days of release. The first Seedream releases were aimed at Chinese-language users who wanted to place legible Chinese characters inside generated posters and ads, a task that earlier Western models handled poorly. The Seedream 2.0 technical report frames the gap directly: "prevalent models such as Flux, SD3.5 and Midjourney, still grapple with issues like model bias, limited text rendering capabilities, and insufficient understanding of Chinese cultural nuances." [5]

The ByteDance Seed team has published a series of technical reports on arXiv detailing the architecture and training of each Seedream release. These reports are unusual among Chinese frontier-lab releases for the level of disclosure on data filtering, reward modeling, and inference acceleration. Each report has been accompanied by a marketing release on the seed.bytedance.com domain and, where applicable, an API rollout on Volcano Engine and BytePlus.

## When was each version of Seedream released?

The Seedream family has shipped five named generations since late 2024. Release dates below come from ByteDance Seed announcements, technical reports, and independent coverage.

| Version | Public release | Key change |
|---|---|---|
| Seedream 2.0 | December 2024 (Doubao and Jimeng); arXiv report March 2025 | Native Chinese-English bilingual generation, glyph-aligned text rendering [5] |
| Seedream 3.0 | April 2025 | Direct 2K output, roughly 3-second 1K generation, improved photorealism [6][14] |
| Seedream 4.0 | 9 September 2025 | Unified generation and editing in one model, 4K output, up to nine images per run [2][3] |
| Seedream 4.5 | 4 December 2025 | Native 4K, stronger typography, up to 14 reference images, roughly 10x speedup over 4.0 [1][16] |
| Seedream 5.0 Lite | 13 February 2026 | Chain-of-thought visual reasoning, live web search, lower per-image price [7] |

Official sources refer to the December 2024 release as Seedream 2.0. There is no public Seedream 1.0 article on seed.bytedance.com, and independent coverage takes the version number to reflect internal ordering rather than a public launch. The arXiv report for Seedream 2.0 was posted in March 2025, several months after the model first shipped inside Doubao. [5]

Seedream 3.0 arrived on Doubao and Jimeng in early April 2025 and was the first release that ByteDance described as a general-purpose image foundation model rather than a poster-and-text specialist. It introduced direct 2K output (2048 by 2048 pixels) without an upscaling pass and brought end-to-end 1K generation latency down to roughly three seconds. [6][14]

Seedream 4.0 was announced on 9 September 2025 and made two architectural changes that defined the rest of the line: it merged text-to-image and image editing into a single network, and it added native 4K generation. ByteDance reported a diffusion transformer with "more than 10x inference acceleration compared to Seedream 3.0," generation of a 2K image in about 1.4 seconds, and batch generation of up to nine coherent images in one run. [2][4]

Seedream 4.5 followed in early December 2025. ByteDance positioned the release as a scaling upgrade rather than a redesign. The system card lists the model code as seedream-4-5-251128 on BytePlus, and the version brings stable handling of up to fourteen reference images, sharper small-text rendering, and roughly a tenfold speed improvement over 4.0 at equivalent quality settings. [1][16]

Seedream 5.0 Lite, released on 13 February 2026, is a smaller variant that pairs the image stack with a chain-of-thought reasoning module and a live web-search tool. It is priced below 4.5 and is the first release in the family to advertise visual reasoning over partially specified scenes. [7]

## How does Seedream work?

ByteDance has published architecture details for Seedream 2.0, 3.0, and 4.0 in technical reports on arXiv. The lineage is consistent: every version uses a diffusion transformer backbone trained with a multi-stage recipe that includes continued training, supervised fine-tuning, and reinforcement learning from human feedback.

Seedream 2.0 paired a diffusion transformer with a self-developed bilingual large language model used as a text encoder. The text branch was supplemented by a glyph-aligned ByT5 model that operated at the character level and let the system render Chinese ideographs as well as English glyphs. A scaled rotary position embedding (Scaled ROPE) was used to generalize to resolutions that were not seen during training. As the paper puts it, "Glyph-Aligned ByT5 is applied for flexible character-level text rendering, while a Scaled ROPE generalizes well to untrained resolutions." [5] ByteDance also reported using a variational autoencoder for latent compression, consistent with the design of contemporaries such as Stable Diffusion 3 and Flux.
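To make the character-level idea concrete, here is a minimal sketch of byte-level tokenization in the style of the publicly released ByT5 tokenizer, which maps each UTF-8 byte to an id offset by three reserved special tokens. Whether Seedream's glyph-aligned variant tokenizes text in exactly this way is an assumption; the function name `byte_tokens` is illustrative only.

```python
# Sketch of ByT5-style byte tokenization. This follows the public ByT5
# convention and is NOT confirmed as Seedream's exact scheme.
SPECIAL_TOKEN_OFFSET = 3  # ids 0, 1, 2 are pad, eos, unk in public ByT5


def byte_tokens(text: str) -> list[int]:
    """Map each UTF-8 byte of `text` to a token id (byte value + offset)."""
    return [b + SPECIAL_TOKEN_OFFSET for b in text.encode("utf-8")]


for sample in ["Sale", "海报"]:
    ids = byte_tokens(sample)
    print(f"{sample!r}: {len(sample)} characters -> {len(ids)} byte tokens")
```

Because every string decomposes into bytes, a tokenizer of this kind has no out-of-vocabulary characters: the Chinese sample simply uses three byte tokens per character instead of one.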

Seedream 3.0 retained the diffusion transformer and bilingual text encoder, scaled up data and parameters, and added acceleration techniques that brought 1K latency under three seconds. The technical report describes mixed-precision inference and distillation-based sampling shortcuts, plus multi-resolution mixed training that enables direct 2K output. [6][14]

Seedream 4.0 made the largest break in the line. The arXiv report "Seedream 4.0: Toward Next-generation Multimodal Image Generation" describes a unified diffusion transformer that handles text-to-image, image-to-image, single-image editing, multi-image editing, and image composition through one backbone. [4] The model uses a higher-compression VAE than its predecessor (described as a "powerful VAE featuring high compression ratio" that reduces the number of image tokens in latent space), a VLM-based prompt-engineering model trained on Seed1.5-VL for multimodal understanding, and joint training across the editing and generation tasks. [4] Inference is accelerated through a stack that includes adversarial distillation, distribution-matching distillation, 4-bit and 8-bit quantization, and speculative decoding. The combination is what enables 4K output at the speeds ByteDance has reported.
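The effect of a higher-compression VAE on sequence length can be illustrated with simple arithmetic. The sketch below counts transformer tokens for a 4096 by 4096 image under two spatial downsampling factors. The factors (8 and 16) and the patch size of 2 are assumptions chosen for illustration, not figures taken from the Seedream 4.0 report.

```python
def latent_tokens(width: int, height: int, vae_factor: int, patch: int = 2) -> int:
    """Count transformer tokens after VAE downsampling and patchifying.

    `vae_factor` and `patch` are hypothetical values for illustration.
    """
    stride = vae_factor * patch
    if width % stride or height % stride:
        raise ValueError("image size must be divisible by vae_factor * patch")
    return (width // stride) * (height // stride)


for factor in (8, 16):  # hypothetical spatial downsampling factors
    tokens = latent_tokens(4096, 4096, factor)
    print(f"downsampling x{factor}: {tokens} tokens")
```

Under these assumed settings, doubling the downsampling factor cuts the token count by a factor of four, which matters because attention cost grows faster than linearly with sequence length.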

Seedream 4.5 and 5.0 Lite are described in marketing material as scaled extensions of the 4.0 stack rather than as fresh architectures. ByteDance has framed 4.5 as an "all-round improvement through the overall scaling of the model," with the typography and small-text gains attributed to expanded training data and better reward models. [1] The 5.0 Lite release adds a reasoning loop that runs chain-of-thought tokens before image synthesis, plus a search tool that retrieves real-time web content the model can use to ground generations in current events. [7]

Training data details have not been fully disclosed. The Seedream 2.0 paper describes a filtering pipeline that uses internal taggers and an aesthetic scorer, and the 4.0 report mentions "hundreds of millions" of images at multiple resolutions. ByteDance has not published model parameter counts for any Seedream release.

## What is Seedream used for?

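The Seedream line is built around five capability clusters. Text-to-image generation and bilingual text rendering run through every release, while editing, multi-image composition, and visual reasoning arrived in later versions.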

| Capability | Notes |
|---|---|
| Text-to-image generation | All versions; 4K native in 4.0 and later |
| Image editing | Added as a first-class mode in 4.0, refined in 4.5 |
| Multi-image composition | Up to nine outputs per run in 4.0, up to fourteen reference inputs in 4.5 |
| Bilingual text rendering | Chinese and English supported across the line; glyph quality is a stated focus |
| Visual reasoning and search | Introduced in 5.0 Lite |

Text rendering has been the most distinctive feature of the family. The Seedream 3.0 release notes claimed a 94 percent success rate on a Chinese complex-typography test, ahead of GPT-4o, which had handled English well but stumbled on Chinese characters. Seedream 4.0 added accurate rendering of dense English and mixed-script content, and 4.5 was marketed on small-text and designer-grade poster typography. These are the cases where the system competes most directly with [Ideogram 3.0](/wiki/ideogram_3), the other model frequently cited for text fidelity.

Multi-image composition is the headline feature of the editing modes. Seedream 4.5 takes up to fourteen reference images and uses them to lock face identity, product finish, color palette, or other consistent attributes across a generated set. [1] Inside Jimeng and Doubao, this is surfaced as a "character keeper" or "product keeper" tool that designers use for ad campaigns and storyboards.

Resolution targets have climbed at every release. Seedream 2.0 produced standard 1K images; 3.0 added native 2K; 4.0 introduced 4K with an adaptive aspect ratio selector; 4.5 made 4K the default output for the BytePlus API, with the model card listing 4096 by 4096 as the maximum supported size. [1] Edit mode in 4.5 also supports 2048 by 2048 output for cases where source resolution is lower.

The 5.0 Lite release added two genuinely new capabilities. The first is chain-of-thought visual reasoning: given a partial scene, the model writes intermediate thoughts about what should appear before rendering. The second is real-time web search, in which the model retrieves current information at generation time so that an image referencing a recent event can include accurate visual details. [7] Independent reviewers have compared the reasoning loop to a smaller, image-focused version of the inference-time scaling seen in language models such as o1.

## How much does Seedream cost and where can you use it?

Seedream is available through three main channels: consumer apps, the Volcano Engine API in mainland China, and BytePlus globally.

Inside ByteDance products, Seedream powers image generation in [Doubao](/wiki/doubao) and in Jimeng (Dreamina in English markets). These consumer surfaces are free for casual use with credit limits. CapCut bundles a version of the Dreamina interface for video creators. [4]

Volcano Engine, ByteDance's mainland Chinese cloud, exposes Seedream models through its ModelArk API platform. Pricing on Volcano Engine is denominated in yuan and varies by region. Outside China, BytePlus offers the same models through ModelArk-Byteplus and through partner platforms such as fal.ai, Replicate, Runware, OpenRouter, and ImagineArt. [12][13]

| Model | Indicative BytePlus list price | Notes |
|---|---|---|
| Seedream 4.0 | About 0.025 to 0.035 US dollars per image | Tiered packs available |
| Seedream 4.5 | About 0.045 US dollars per image | Native 4K supported |
| Seedream 5.0 Lite | About 0.035 US dollars per image | Reasoning and search included |

BytePlus also runs a free trial of 200 image generations and sells subscription bundles such as 400 images for 6.99 US dollars, 1,028 images for 24.99 US dollars, and 2,000 images for 49.99 US dollars, each valid for thirty days from purchase. [13] Third-party aggregators publish their own credit-based prices, which vary by platform and may apply markups.
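The effective per-image cost of each bundle follows directly from the figures quoted above. The short sketch below does the division; the bundle sizes and prices are the ones cited in this section, and the helper name is illustrative.

```python
# Bundle sizes (images) and prices (US dollars) as quoted in this section.
BUNDLES = {400: 6.99, 1028: 24.99, 2000: 49.99}


def price_per_image(images: int, price_usd: float) -> float:
    """Effective cost of one image in a prepaid bundle."""
    return price_usd / images


for images, price in BUNDLES.items():
    print(f"{images} images for ${price}: ${price_per_image(images, price):.4f} per image")
```

On the quoted figures, the bundles work out to roughly 1.7 to 2.5 US cents per image, and the smallest bundle has the lowest per-image cost.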

The API exposes two main endpoints, one for text-to-image generation and one for editing, along with parameters for reference images, aspect ratios, output resolution, and a seed for reproducibility. Latency on the BytePlus endpoint is typically a few seconds for 2K generation and longer for full 4K output, depending on prompt complexity. [12]
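As a rough illustration of how a client might assemble and validate these parameters before sending a request, consider the sketch below. The field names (`size`, `seed`, `reference_images`) and the `ImageRequest` class are hypothetical and do not reproduce the BytePlus schema; only the model code, the 4096-pixel maximum, and the fourteen-reference limit come from this article. No network call is made.

```python
from dataclasses import dataclass, field
from typing import Optional

MAX_SIDE = 4096       # maximum size listed for Seedream 4.5 in this article
MAX_REFERENCES = 14   # reference-image limit cited for Seedream 4.5


@dataclass
class ImageRequest:
    """Hypothetical request builder; field names are illustrative only."""
    prompt: str
    width: int = 2048
    height: int = 2048
    seed: Optional[int] = None
    reference_images: list[str] = field(default_factory=list)
    model: str = "seedream-4-5-251128"  # model code quoted in this article

    def to_payload(self) -> dict:
        """Validate the request and return a JSON-serializable dict."""
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        if not (0 < self.width <= MAX_SIDE and 0 < self.height <= MAX_SIDE):
            raise ValueError(f"each side must be between 1 and {MAX_SIDE} pixels")
        if len(self.reference_images) > MAX_REFERENCES:
            raise ValueError(f"at most {MAX_REFERENCES} reference images")
        payload = {
            "model": self.model,
            "prompt": self.prompt,
            "size": f"{self.width}x{self.height}",
        }
        if self.seed is not None:
            payload["seed"] = self.seed  # fixed seed for reproducible output
        if self.reference_images:
            payload["reference_images"] = list(self.reference_images)
        return payload


print(ImageRequest(prompt="bilingual festival poster", seed=7).to_payload())
```

Validating limits on the client side avoids paying for a round trip that the service would reject; the authoritative parameter names and limits are those in the BytePlus ModelArk documentation. [12]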

## How does Seedream compare to GPT Image, Nano Banana, and Flux?

ByteDance benchmarks Seedream against the dominant Western image models on its internal MagicBench evaluation and against the public Artificial Analysis and LMArena Arenas. The picture below is drawn from ByteDance's own arXiv reports and from independent leaderboard data through early 2026.

| Model | Maximum native resolution | Strength | Public benchmark position (early 2026) |
|---|---|---|---|
| Seedream 4.0 | 4096 by 4096 | Bilingual text, editing, speed | Tied or top-five on Artificial Analysis text-to-image |
| Seedream 4.5 | 4096 by 4096 | Multi-reference consistency, typography | Improves over 4.0 on MagicBench prompt adherence and aesthetics |
| GPT Image 1 | Reported 4096 by 4096 | Photorealism, English text, reasoning | First place on Artificial Analysis through much of late 2025 |
| Nano Banana family | 2048 by 2048 native, higher with Pro | Speed, multimodal context | Top of LMArena image-edit board in late 2025 |
| Flux 2 | High native resolution | Open-weight options, aesthetics | Strong reception among open-source users |
| Ideogram 3.0 | 2048 by 2048 | Typography and English text rendering | Niche leader in poster and graphic design |
| Imagen 4 | High resolution | Photorealism and prompt adherence | Competitive on Google-hosted benchmarks |

ByteDance's own MagicBench evaluations showed Seedream 4.0 first on single-image editing and competitive on text-to-image, with a noted advantage over GPT-Image-1 on texture, lighting, and color tone. [4] The independent Artificial Analysis text-to-image Arena, which uses anonymized side-by-side votes, listed Seedream 4.0 at an Elo rating in the high 1,100s during early 2026, putting it inside the top five and within roughly 100 points of [GPT Image 2](/wiki/gpt_image_2). [8]
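To give a sense of what a 100-point gap means in practice, the sketch below applies the standard Elo expected-score formula. It is an assumption that the arena's ratings follow the conventional 400-point logistic scale; Artificial Analysis may fit its ratings with a different procedure.

```python
def expected_win_rate(rating_gap: float) -> float:
    """Standard Elo expected score for the higher-rated side.

    `rating_gap` is the higher rating minus the lower one, on the
    conventional 400-point logistic scale (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** (-rating_gap / 400))


for gap in (0, 50, 100):
    print(f"gap {gap:>3}: {expected_win_rate(gap):.3f}")
```

Under that convention, a 100-point lead corresponds to the higher-rated model winning about 64 percent of pairwise votes, so a model trailing by that margin is still preferred in roughly one comparison out of three.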

The single most-cited result for the family is its September 2025 Artificial Analysis debut. Artificial Analysis reported that Seedream 4.0 was "the new leading image model across both the Artificial Analysis Text to Image and Image Editing Arena, surpassing Google's Gemini 2.5 Flash (Nano-Banana), across both," giving the model the top spot on both boards simultaneously. [21] On the LMArena Text-to-Image leaderboard, Seedream 4 entered at fifth place in September 2025 with more than 4,500 votes and rose to second place on the image-edit leaderboard with 43,000 votes by late September. A separate "High Res" variant briefly tied with Gemini 2.5 Flash Image, which the community had nicknamed [Nano Banana](/wiki/nano_banana), at the top of the text-to-image board. [10][11]

Against [Flux 2](/wiki/flux_2), the most prominent open-weight competitor in late 2025, Seedream offers higher native resolution and stronger Chinese text rendering, while Flux remains the default for users who want local inference. Against [Ideogram 3.0](/wiki/ideogram_3), Seedream now matches or exceeds English typography quality on independent evaluations, although Ideogram retains a following among graphic designers for its prompt-following on poster layouts. Against [Imagen 4](/wiki/imagen_4), Seedream is judged comparable on photorealism in independent reviews, with Google's model often preferred for natural skin tone and Seedream preferred for high-resolution detail.

## How was Seedream received?

Reception of Seedream has shifted as the family matured. Western coverage initially treated the line as a Chinese-market tool for handling Chinese text, with limited interest outside Doubao and Jimeng users. Seedream 3.0 changed that perception by briefly topping the Artificial Analysis user preference leaderboard in April 2025, and Seedream 4.0 entered the mainstream Western tech press in September 2025 when outlets including TechRadar and Yahoo Tech ran reviews framing the model as "terrifyingly real" and as a direct threat to Google's then-dominant Nano Banana family. [15]

Independent reviewers have praised Seedream 4.0 and 4.5 for their handling of small text, photorealistic skin and fabric textures, and their ability to keep characters and products visually consistent across multi-image sets. The arXiv release of the Seedream 4.0 technical report drew technical attention to the unified generation-and-editing design, which several reviewers cited as a likely template for next-generation systems from other labs.

Criticism has focused on three areas. The first is provenance: ByteDance has disclosed less about training data than some Western competitors, and the model's skill at rendering brand-name fonts has raised licensing questions. The second is censorship and political filtering: the Chinese deployment of Seedream is subject to Cyberspace Administration of China rules on synthetic media, and certain prompts that work outside China are blocked on Volcano Engine. The third is the gap between marketing demos and average prompts: independent reviewers have noted that ByteDance's hero images for 4.5 typically use refined prompt engineering, and casual users report more variable results.

Within China, Seedream has become the default for ad creative workflows. Agencies that use Jimeng report multi-week production timelines compressed into days for short-form video and poster work. Outside China, the BytePlus distribution channel has given Seedream a foothold among Western indie developers and graphic-design startups, often via aggregator platforms rather than direct integration.

In early 2026, the release of Seedream 5.0 Lite drew attention for being one of the first commercial image models to bundle real-time web search with image generation. [7] Reviewers compared the visual-reasoning mode to inference-time scaling in language models, and several noted that the Lite tier was priced low enough to be attractive even where the model's reasoning loop was not needed. ByteDance has not publicly announced a full Seedream 5.0 release as of May 2026.

The model is also indirectly visible through ByteDance's other products. Doubao image creation, Jimeng campaigns, and CapCut's image-to-video and image-to-text features all rely on Seedream weights. [Seedance](/wiki/seedance), the sibling video model, accepts Seedream outputs as input frames for image-to-video pipelines, which has positioned the two systems as a paired creative stack inside ByteDance's ecosystem. [17]

## See also

- [Hunyuan Image 3.0](/wiki/hunyuan_image_3)
- [ByteDance](/wiki/bytedance)
- [Doubao](/wiki/doubao)
- [Seedance](/wiki/seedance)
- [GPT Image 1](/wiki/gpt_image_1)
- [Nano Banana](/wiki/nano_banana)
- [Flux 2](/wiki/flux_2)
- [Ideogram 3.0](/wiki/ideogram_3)
- [Imagen 4](/wiki/imagen_4)

## References

1. ByteDance Seed, "Seedream 4.5," seed.bytedance.com, December 2025. https://seed.bytedance.com/en/seedream4_5
2. ByteDance Seed, "Seedream 4.0," seed.bytedance.com, September 2025. https://seed.bytedance.com/en/seedream4_0
3. ByteDance Seed, "Seedream 4.0 Officially Released: Beyond Drawing, Into Imagination," Seed blog, 9 September 2025. https://seed.bytedance.com/en/blog/seedream-4-0-officially-released-beyond-drawing-into-imagination
4. ByteDance Seed, "Seedream 4.0: Toward Next-generation Multimodal Image Generation," arXiv 2509.20427, 2025. https://arxiv.org/abs/2509.20427
5. ByteDance Seed, "Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model," arXiv 2503.07703, 2025. https://arxiv.org/abs/2503.07703
6. ByteDance Seed, "Seedream 3.0 Technical Report," arXiv 2504.11346, 2025. https://arxiv.org/abs/2504.11346
7. ByteDance Seed, "Introducing Seedream 5.0 Lite: Deeper Thinking, More Accurate Generation," Seed blog, 13 February 2026. https://seed.bytedance.com/en/blog/deeper-thinking-more-accurate-generation-introducing-seedream-5-0-lite
8. Artificial Analysis, "Text-to-Image Leaderboard," artificialanalysis.ai, accessed May 2026. https://artificialanalysis.ai/image/leaderboard/text-to-image
9. Artificial Analysis, "Seedream 4.0 AI Image Generation Explorer," artificialanalysis.ai. https://artificialanalysis.ai/image/explore/model/bytedance-seed_seedream-4-0
10. LMArena, "Leaderboard update: Seedream 4 enters Text-to-Image and Image Edit boards," X (formerly Twitter), 12 September 2025. https://x.com/lmarena_ai/status/1966562486897029274
11. LMArena, "High Res Seedream 4 added; ties Nano Banana on Text-to-Image," X, 17 September 2025. https://x.com/arena/status/1968007564270211228
12. BytePlus, "Seedream on ModelArk," docs.byteplus.com. https://docs.byteplus.com/en/docs/ModelArk/1824121
13. BytePlus, "Seedream product page," byteplus.com. https://www.byteplus.com/en/product/Seedream
14. WinBuzzer, "ByteDance Unveils Seedream 3.0 AI Image Generator and SeedEdit AI Image Editor," 19 April 2025. https://winbuzzer.com/2025/04/19/bytedance-unveils-seedream-3-0-ai-image-generator-and-seededit-ai-image-editor-with-enhanced-realism-xcxwbn/
15. TechRadar, "Step aside Nano Banana, Seedream 4.0 is the best AI image generator I've ever seen," September 2025. https://www.techradar.com/ai-platforms-assistants/tiktok-creators-new-ai-image-generator-is-the-best-ive-ever-seen-and-its-terrifying
16. APIYI, "SeeDream 4.5 Launch: BytePlus Volcano Engine's Most Powerful 4K Image Generation Model," docs.apiyi.com, 4 December 2025. https://docs.apiyi.com/en/news/seedream-4-5-launch
17. APIYI, "Seedance 2.0 and Seedream 5.0: 7 upgrade highlights and API guide," help.apiyi.com, February 2026. https://help.apiyi.com/en/seedance-2-seedream-5-february-release-api-guide-en.html
18. Scenario, "Seedream Models: The Essentials," help.scenario.com. https://help.scenario.com/en/articles/seedream-4-the-essentials/
19. WaveSpeed, "Seedream 4.5 Complete Guide," wavespeed.ai, 2026. https://wavespeed.ai/blog/posts/seedream-4-5-complete-guide-2026/
20. MindStudio, "What Is ByteDance Seedream 4.5?" mindstudio.ai. https://www.mindstudio.ai/blog/what-is-bytedance-seedream-4-5
21. Artificial Analysis, "Seedream 4.0 is the new leading image model across both the Artificial Analysis Text to Image and Image Editing Arena," X (formerly Twitter), September 2025. https://x.com/ArtificialAnlys/status/1966167814512980210