HiDream
Last reviewed
Jun 3, 2026
Sources
12 citations
Review status
Source-backed
Revision
v1 · 1,286 words
HiDream most commonly refers to HiDream-I1, an open-source text-to-image generative foundation model released in April 2025 by the Chinese company HiDream.ai (Chinese: 智象未来). The model has about 17 billion parameters and uses a sparse diffusion transformer with a mixture-of-experts design. It drew attention shortly after release for topping the Artificial Analysis text-to-image arena, where its distilled "Dev" variant briefly outranked closed models including GPT-4o image generation. [1][2] The same name also covers a small family of related models from HiDream.ai (the E1 editing models and the later O1-Image model) and the company's consumer products, vivago.ai and PixMaker. [3]
HiDream.ai is a generative AI company founded in 2023 and headquartered in Beijing, China. [4] It builds foundation models for image, video, 3D, and text generation, and it operates consumer and marketing-oriented creative tools on top of those models. [3][4]
The founder and CEO is Tao Mei (梅涛), a computer-vision and multimedia researcher. Before starting HiDream.ai he was a Vice President at the e-commerce company JD.com and, earlier, a Senior Research Manager at Microsoft Research. [5] He holds B.E. and Ph.D. degrees from the University of Science and Technology of China, and he is a Fellow of the ACM, IEEE, IAPR, and CAAI, as well as an International Fellow of the Canadian Academy of Engineering. [5] (Some secondary descriptions order the name as "Mei Tao"; the surname is Mei.)
The company has raised venture funding across multiple rounds, including a seed round in late 2023 and, in April 2026, a round exceeding CNY 500 million led by investors including Oriental Fortune Capital and Anhui Provincial Investment Group, earmarked for a next-generation multimodal model and enterprise products. [6]
HiDream-I1 is the company's first open-source image foundation model. The transformer weights were published on GitHub and Hugging Face in early April 2025, and a technical report, "HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer," followed on 28 May 2025 (arXiv:2505.22705). [7][8]
The model generates 1024 x 1024 images from text prompts. It is described by its authors as achieving state-of-the-art prompt following among open models while keeping inference fast, which is the basis for the "high-efficient" label in the report title. [8]
HiDream-I1 has roughly 17 billion parameters and is built as a sparse diffusion transformer. The network combines dual-stream blocks, which process image and text tokens in separate paths, with single-stream blocks, where the two modalities interact. Both block types use a dynamic mixture-of-experts layer, so only part of the network is active for a given token, which keeps compute lower than a dense model of the same nominal size. [8]
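The sparse-activation idea can be illustrated with a generic top-k mixture-of-experts layer. This is a minimal sketch, not the routing used in HiDream-I1: the source describes only a "dynamic mixture-of-experts layer", so the top-k gating rule, the number of experts, the renormalised gate weights, and all names below (`route_token`, `moe_layer`, the toy experts) are assumptions made for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k=2):
    """Pick the top_k experts for one token and renormalise their gate weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

def moe_layer(token, router_logits, experts, top_k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(token)
    for idx, weight in route_token(router_logits, top_k):
        expert_out = experts[idx](token)  # unselected experts are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

# Four toy experts, each scaling the token by a different constant (hypothetical).
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
token = [1.0, -1.0]
logits = [0.1, 2.0, 0.3, 1.5]  # hypothetical router scores for this token

selected = route_token(logits, top_k=2)
output = moe_layer(token, logits, experts, top_k=2)
```

With `top_k=2` only two of the four experts run for this token, which is the sense in which a sparse model can cost less per token than a dense model with the same total parameter count.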
For text conditioning the model uses four sources of text features rather than a single encoder: a long-context CLIP variant (Long-CLIP), the T5-XXL encoder, the Llama 3.1 8B Instruct language model, and pooled text embeddings. [1][8] The image autoencoder (VAE) is reused from Black Forest Labs' FLUX.1 [schnell] release. [1]
HiDream-I1 ships in three variants that trade image fidelity against speed by varying the number of sampling steps. The "Full" model runs the standard sampler with classifier-free guidance, while "Dev" and "Fast" are distilled for fewer steps and run with classifier-free guidance effectively disabled (guidance scale 1.0). [1][9]
| Variant | Inference steps | Guidance scale (cfg) | Notes |
|---|---|---|---|
| HiDream-I1-Full | 50 | 5.0 | Highest-fidelity base model |
| HiDream-I1-Dev | 28 | 1.0 | Guidance-distilled; the variant that topped the arena |
| HiDream-I1-Fast | 16 | 1.0 | Fastest; fewest steps |
The technical report cites 14 steps for the Fast configuration, slightly fewer than the 16 steps recommended in the GitHub README and common ComfyUI workflows; both figures refer to the same distilled model. [8][9]
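The link between a guidance scale of 1.0 and "disabled" guidance can be made concrete with the usual classifier-free guidance formula. The sketch below is hedged on two assumptions: that the scale follows the common convention `uncond + scale * (cond - uncond)`, under which 1.0 returns the conditional prediction unchanged, and that a sampler skips the unconditional pass in that case. The step counts and scales come from the table above; the function names are hypothetical and are not part of any HiDream or ComfyUI API.

```python
# Settings copied from the variant table; the dictionary layout is illustrative.
VARIANTS = {
    "HiDream-I1-Full": {"steps": 50, "guidance_scale": 5.0},
    "HiDream-I1-Dev": {"steps": 28, "guidance_scale": 1.0},
    "HiDream-I1-Fast": {"steps": 16, "guidance_scale": 1.0},
}

def cfg_combine(uncond, cond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def needs_unconditional_pass(variant):
    """At scale 1.0 the combined prediction equals cond, so uncond is unused."""
    return VARIANTS[variant]["guidance_scale"] != 1.0

def model_evaluations(variant):
    """Forward passes per image, assuming one extra pass per step when CFG is on."""
    passes = 2 if needs_unconditional_pass(variant) else 1
    return VARIANTS[variant]["steps"] * passes

# Toy predictions to show that scale 1.0 leaves the conditional output untouched.
uncond, cond = [0.0, 1.0], [2.0, 3.0]
at_scale_one = cfg_combine(uncond, cond, 1.0)
at_scale_five = cfg_combine(uncond, cond, 5.0)
```

Under these assumptions the Full variant costs 100 model evaluations per image against 28 for Dev and 16 for Fast, so the distilled variants save compute both through fewer steps and through a single pass per step.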
The HiDream-I1 transformer weights are released under the MIT License, which permits commercial use, and this permissive licensing was a large part of why the release was widely picked up. [1][2] Because the model is assembled from third-party components, those pieces keep their own licenses: the reused FLUX.1 [schnell] VAE is under Apache 2.0, the T5-XXL encoder is under Apache 2.0, and the bundled Llama 3.1 8B Instruct text encoder is governed by Meta's Llama 3.1 Community License. [1]
On standard prompt-following benchmarks reported in the technical report, HiDream-I1 scores 0.83 overall on GenEval and 85.89 on DPG-Bench, and it averages 33.82 on the HPSv2.1 human-preference benchmark. The GenEval result is above the 0.80 reported there for Janus-Pro-7B, and the HPSv2.1 average is above the 32.47 reported for FLUX.1-dev. [1][8]
Shortly after release in April 2025, HiDream-I1 reached the top of the Artificial Analysis text-to-image arena, an Elo-style leaderboard built from blind human pairwise preferences. Chinese state outlet China Daily reported that the model "topped the Artificial Analysis global leaderboard within 24 hours of its release, beating mainstream models from companies such as Midjourney, OpenAI and Google." [2] The MIT-licensed Dev variant was the entry that reached the number-one position, briefly above GPT-4o image generation. [2][10] That standing was temporary, as arena rankings generally are once newer models arrive; by 2026 HiDream's own O1-Image model and various other systems sat above the original I1 entries. [3]
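For readers unfamiliar with Elo-style ranking, the following sketch shows how pairwise votes move ratings. It implements the classic Elo update only; the source does not describe Artificial Analysis's exact rating method, and the starting rating of 1000, the K-factor of 32, and the model names are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, a_won, k=32.0):
    """Return new (rating_a, rating_b) after one blind pairwise vote."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Points gained by one side are lost by the other, so the total is conserved.
    return rating_a + delta, rating_b - delta

# Hypothetical models and votes, each vote recorded as (winner, loser).
ratings = {"model_x": 1000.0, "model_y": 1000.0}
votes = [("model_x", "model_y"), ("model_x", "model_y"), ("model_y", "model_x")]

for winner, loser in votes:
    ratings[winner], ratings[loser] = update(
        ratings[winner], ratings[loser], a_won=True
    )
```

Because each rating depends on the other entrants' ratings and on the stream of incoming votes, a model's position can fall without any change to the model itself once stronger entries join the arena.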
HiDream-E1 is an instruction-based image-editing model built by fine-tuning HiDream-I1 on a dataset of about 5 million (source image, editing instruction, target image) triplets, letting users edit an image with natural-language commands and no masks. [8] An updated HiDream-E1.1, open-sourced on 16 July 2025, adds dynamic resolution up to roughly 1 megapixel, lifting the original E1's 768 x 768 limit. [11]
In 2026 HiDream.ai released HiDream-O1-Image, an 8-billion-parameter model open-sourced under the MIT License on 8 May 2026 (arXiv:2605.11061). [12] It uses a "Pixel-level Unified Transformer" that encodes raw pixels, text, and task conditions in one shared token space, dropping the external VAE and separate text encoders used by latent diffusion models, and it handles text-to-image generation, instruction editing, and subject-driven personalization at up to 2048 x 2048. [12] At launch it sat around eighth on the Artificial Analysis text-to-image arena, the highest-placed open-weight model there at the time, despite being far smaller than competitors such as FLUX.2 [dev]. [12]
Beyond the open models, HiDream.ai runs vivago.ai, a consumer all-in-one creative assistant available on web and mobile that generates and edits images and other media from text prompts; a relaunched "vivago 2.0" bundles several creation tools into one workflow. [3] The company also offers PixMaker (also referred to in some materials by the name Pixeling), aimed at marketing and commercial content creation. [3] These products are powered by HiDream.ai's own foundation models, including the HiDream-I1 line.