FLUX.1 is a family of text-to-image generation models developed by Black Forest Labs, released on August 1, 2024. The series comprises three initial variants with different performance, licensing, and accessibility profiles: FLUX.1 [pro], FLUX.1 [dev], and FLUX.1 [schnell]. All three share a hybrid rectified flow transformer architecture scaled to approximately 12 billion parameters.
FLUX.1 was created by a team of researchers who had previously built Stable Diffusion and related latent diffusion models at Stability AI. On release, benchmark evaluations placed FLUX.1 [pro] and [dev] at the top of the Artificial Analysis ELO leaderboard, above Midjourney v6.0, DALL-E 3, and Stable Diffusion 3 Ultra. The broader FLUX.1 ecosystem has since expanded to include FLUX.1.1 [pro], high-resolution Ultra and Raw modes, a suite of control tools (Fill, Depth, Canny, Redux), the image-editing-focused FLUX.1 Kontext family, and the second-generation FLUX.2 series released in November 2025.
The article for the broader Flux (text-to-image model) family covers Black Forest Labs' full model catalog, company history, and commercial partnerships. This article focuses on the technical details, variants, tools, and reception of the FLUX.1 series specifically.
Black Forest Labs (BFL) was founded in 2024 by Robin Rombach, Andreas Blattmann, Patrick Esser, and Dominik Lorenz. All four had worked together as researchers at Ludwig Maximilian University of Munich under Professor Björn Ommer, and later joined Stability AI, where they were central authors of the latent diffusion model line of work and Stable Diffusion. As Stability AI encountered financial and leadership difficulties through 2023 and early 2024, the research team departed to start their own company.
The company is headquartered in Freiburg, Germany, with a research presence in San Francisco. Its advisory board includes Michael Ovitz, a veteran entertainment industry executive, and Professor Matthias Bethge, a co-author of the pioneering work on neural style transfer.
Black Forest Labs emerged from stealth on August 1, 2024, announcing both its founding and the release of the FLUX.1 model suite simultaneously. The seed round of $31 million was led by Andreessen Horowitz, with additional participation from General Catalyst, MätchVC, and angel investors including Garry Tan, Brendan Iribe, Timo Aila of NVIDIA, and Vladlen Koltun. A subsequent Series A (terms undisclosed at the time) was led by a16z with participation from BroadLight Capital, Creandum, Earlybird VC, Northzone, and NVIDIA. In December 2025, the company raised $300 million in a Series B at a post-money valuation of $3.25 billion, co-led by Salesforce Ventures and Anjney Midha.
The FLUX.1 architecture draws directly on two prior research threads from the founding team. The first is latent diffusion modeling, introduced in the 2022 paper "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al.), which proposed compressing images into a lower-dimensional latent space before applying the diffusion process, dramatically reducing compute requirements relative to pixel-space diffusion. The second is rectified flow, a training framework for flow-based generative models introduced by Liu et al. in 2022 and later scaled to text-to-image synthesis by Patrick Esser and colleagues in the Stable Diffusion 3 work, which frames generation as learning a straight-line path between noise and data in the latent space. FLUX.1 combined both approaches at a scale and with architectural modifications not previously used in publicly released image generation models.
All FLUX.1 models share the same underlying architecture: a hybrid multimodal diffusion transformer trained with the rectified flow objective in the latent space of a variational autoencoder (VAE). The model has approximately 12 billion parameters.
Rectified flow is a variant of continuous normalizing flows in which the model learns to map noise to data along paths that are as straight as possible. Compared to the denoising score matching objective used in earlier diffusion models, rectified flow tends to produce cleaner probability flow ODEs with fewer discretization errors, which means good images can be produced in fewer sampling steps. FLUX.1 [schnell] can generate usable images in as few as four steps.
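In a standard rectified-flow formulation (FLUX.1's exact training recipe is unpublished; the notation here follows the Stable Diffusion 3 paper), the noisy latent is a straight interpolation between a data sample and Gaussian noise, and the network regresses the constant velocity of that path:

```latex
x_t = (1 - t)\,x_0 + t\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),\quad t \in [0, 1]

\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon}\,
  \bigl\| v_\theta(x_t, t) - (\epsilon - x_0) \bigr\|_2^2
```

Because the regression target is constant along each path, a sampler can follow the learned ODE with a handful of large Euler steps and accumulate little discretization error, which is why rectified-flow models tolerate aggressive step-count reduction.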
The transformer backbone consists of two types of blocks: double-stream blocks and single-stream blocks. In the 19 double-stream blocks, image tokens and text tokens are processed with separate weight matrices for each modality, but attention is computed jointly over their concatenation. This allows the model to mix image and text information from early in the forward pass, unlike cross-attention architectures where text conditioning is injected at fixed points. After the double-stream blocks, the representations from both streams are combined and passed through 38 single-stream blocks, where each block applies attention and a feed-forward MLP in parallel (rather than sequentially) from the same input. This parallel computation improves hardware utilization.
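A minimal PyTorch sketch of the parallel attention/MLP layout of a single-stream block (illustrative only: the production blocks also apply timestep-conditioned adaptive layer norm and rotary embeddings, and fuse the QKV/MLP projections):

```python
import torch
import torch.nn as nn

class SingleStreamBlock(nn.Module):
    """Attention and MLP branches computed in parallel from one normalized input."""

    def __init__(self, dim: int, num_heads: int, mlp_ratio: float = 4.0):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)                  # one shared normalization
        attn_out, _ = self.attn(h, h, h)  # both branches read the same input,
        mlp_out = self.mlp(h)             # so the GPU can execute them concurrently
        return x + attn_out + mlp_out     # residual sum of the parallel branches
```

In a sequential block the MLP would consume the attention output, serializing the two matrix-multiply-heavy stages; computing both from the same input removes that dependency.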
FLUX.1 uses factorized three-dimensional rotary positional embeddings (3D RoPE) rather than learned or sinusoidal positional encodings. RoPE injects positional information by rotating the query and key vectors in attention according to their spatial position, which preserves relative positional relationships without adding extra tokens or enlarging the embedding dimension. The factorized 3D variant covers height, width, and time dimensions, enabling the model to generalize to image resolutions not seen during training.
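A compact sketch of factorized rotary embeddings (the (16, 56, 56) split of the 128-dimensional attention head follows the values in Black Forest Labs' released reference code; the rest is illustrative):

```python
import torch

def rope_angles(pos: torch.Tensor, dim: int, theta: float = 10000.0):
    """Per-frequency rotation angles for one axis; dim must be even."""
    freqs = 1.0 / theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = pos.float()[:, None] * freqs[None, :]      # (n_tokens, dim/2)
    return angles.cos(), angles.sin()

def rotate(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive channel pairs of a query/key segment."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return out.flatten(-2)

# Factorized 3D RoPE over (time, height, width) for a single-frame 4x4 latent grid.
T, H, W, head_dim = 1, 4, 4, 128
axes_dim = (16, 56, 56)                  # per-axis channel split; sums to head_dim
t, h, w = torch.meshgrid(
    torch.arange(T), torch.arange(H), torch.arange(W), indexing="ij"
)
q = torch.randn(T * H * W, head_dim)     # one attention head's queries, one row per token

segments, offset = [], 0
for pos, dim in zip((t.flatten(), h.flatten(), w.flatten()), axes_dim):
    cos, sin = rope_angles(pos, dim)
    segments.append(rotate(q[:, offset:offset + dim], cos, sin))
    offset += dim
q_rotated = torch.cat(segments, dim=-1)  # same shape; phases now encode (t, h, w)
```

Because positions enter only as rotations of existing channels, the same code runs unchanged on a 6x6 or 64x64 grid, which is the mechanism behind resolution generalization.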
The model accepts two separate text encoder outputs as conditioning. The first comes from OpenAI's CLIP ViT-L model, which provides a 768-dimensional pooled embedding. The second comes from the encoder of T5-XXL (an 11-billion-parameter encoder-decoder model, of which only the roughly 4.7-billion-parameter encoder is used), which provides token-level embeddings carrying detailed semantic information. Using both encoders together improves the model's ability to follow complex multi-clause prompts, particularly for spatial relationships, attributes, and proper nouns. The T5 encoder is the primary driver of FLUX.1's strong prompt adherence relative to earlier models that relied solely on CLIP.
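A sketch of obtaining the two conditioning signals with Hugging Face Transformers (the checkpoint names are the public models commonly paired with FLUX.1, and the 77/512 token limits follow the [dev] pipeline defaults; treat both as assumptions rather than confirmed internals):

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer

clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
t5_tok = T5Tokenizer.from_pretrained("google/t5-v1_1-xxl")
t5_enc = T5EncoderModel.from_pretrained("google/t5-v1_1-xxl", torch_dtype=torch.bfloat16)

prompt = "a red cube balanced on a blue sphere, studio lighting"

# CLIP: a single pooled vector summarizing the whole prompt
clip_in = clip_tok(prompt, padding="max_length", max_length=77, return_tensors="pt")
pooled = clip_enc(**clip_in).pooler_output             # shape (1, 768)

# T5-XXL encoder: one embedding per token, preserving word-level detail
t5_in = t5_tok(prompt, padding="max_length", max_length=512, return_tensors="pt")
per_token = t5_enc(t5_in.input_ids).last_hidden_state  # shape (1, 512, 4096)
```

The pooled CLIP vector conditions global image properties, while the per-token T5 sequence participates in attention, which is what lets individual words bind to individual image regions.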
The VAE compresses input images by a factor of 8 in each spatial dimension. A 1024x1024 pixel image becomes a 128x128 latent grid. This compression is standard in latent diffusion models and keeps the diffusion transformer operating on a manageable sequence length. The VAE used by FLUX.1 was retrained with improvements relative to the VAE in Stable Diffusion XL, including a wider 16-channel latent space (versus 4 channels in SDXL), resulting in higher-fidelity reconstruction, especially for textures and fine details.
At 12 billion parameters, the FLUX.1 transformer is substantially larger than the UNet in Stable Diffusion XL (approximately 2.6 billion parameters) and the MMDiT in Stable Diffusion 3 (approximately 2 billion parameters in the medium variant). The size increase is the primary contributor to FLUX.1's higher memory and compute requirements but also underpins its quality improvements, particularly in anatomy, text rendering, and complex scene composition.
FLUX.1 [pro] is the highest-performing variant and is available exclusively through API access. The weights are proprietary and not publicly distributed. At launch it was accessible through the API endpoints provided by Replicate and fal.ai; Black Forest Labs subsequently launched its own BFL API in October 2024.
FLUX.1 [pro] uses a full guidance-based sampling process with more diffusion steps than the distilled variants, which contributes to its superior image quality, prompt adherence, and output diversity. It is licensed for commercial use through the API.
On the Artificial Analysis ELO benchmark at the time of release, FLUX.1 [pro] scored 1048 ELO for overall quality and 1060 ELO for visual quality specifically, placing it first in both categories above FLUX.1 [dev] (1035 overall), Midjourney v6.0 (1026), and DALL-E 3.
FLUX.1 [dev] is an open-weight model released on Hugging Face under the FLUX.1 [dev] non-commercial license. It is guidance-distilled from FLUX.1 [pro]: the model was trained to reproduce, in a single forward pass per step, the output the teacher produces with classifier-free guidance, which normally requires two forward passes per step. This roughly halves the compute per step while preserving most of the teacher's quality.
The weights can be downloaded freely for research, personal, and non-commercial applications. Commercial use requires a separate license from Black Forest Labs. The model produces results competitive with FLUX.1 [pro] and is widely used by researchers and community developers as the base for fine-tuning experiments, LoRA training, and integration into local tools such as ComfyUI and the Forge fork of the Automatic1111 WebUI.
FLUX.1 [schnell] is the speed-optimized variant, distilled from FLUX.1 [pro] using a timestep-distillation approach. It is released under the Apache 2.0 license, which permits unrestricted commercial use, redistribution, and modification.
The model is designed for generation in four steps or fewer, compared to the 20-50 steps typical for full FLUX.1 [dev] generation. On an RTX 4090, FLUX.1 [schnell] can produce a 1024x1024 image in roughly 2-4 seconds. The quality trade-off relative to [dev] and [pro] is modest for straightforward prompts and more noticeable for complex scenes requiring precise anatomy or compositional accuracy.
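A typical local generation call via Hugging Face Diffusers, which added a FluxPipeline for these checkpoints shortly after release (settings follow the model card's documented defaults for [schnell]):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # trades some speed for VRAM headroom

image = pipe(
    "a lighthouse on a cliff at dawn, photorealistic",
    num_inference_steps=4,   # [schnell] is distilled for ~4 steps
    guidance_scale=0.0,      # the distilled model runs without classifier-free guidance
    height=1024,
    width=1024,
).images[0]
image.save("lighthouse.png")
```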
The Apache 2.0 license made FLUX.1 [schnell] particularly attractive for deployment in commercial products and cloud services that needed to run image generation at scale without per-image royalties or use restrictions.
On October 4, 2024, Black Forest Labs released FLUX.1.1 [pro] alongside the launch of the BFL API. FLUX.1.1 [pro] improved on FLUX.1 [pro] across all dimensions: image quality, prompt adherence, and generation speed. The model generates photorealistic images in approximately 4.5 seconds and was reported to be six times faster than the original FLUX.1 [pro].
At release, FLUX.1.1 [pro] achieved the highest ELO rating on Artificial Analysis at 1153 points, above Ideogram v2 (1108) and Midjourney 6.1 (1100). In structured evaluation, FLUX.1.1 [pro] correctly rendered all requested elements in 94% of test prompts, compared to 87% for DALL-E 3 and 82% for Stable Diffusion XL.
FLUX.1.1 [pro] is accessible through the BFL API and through third-party providers including fal.ai, Replicate, Together.ai, and Freepik. The BFL API charges 4 credits ($0.04) per generated image.
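A minimal request sketch against the BFL API based on its public documentation (the endpoint path, x-key header, and polling flow follow docs.bfl.ml; verify field names against the current docs before relying on them):

```python
import os
import time

import requests

API = "https://api.bfl.ml"
headers = {"x-key": os.environ["BFL_API_KEY"]}

# Submit an asynchronous generation job
job = requests.post(
    f"{API}/v1/flux-pro-1.1",
    headers=headers,
    json={"prompt": "a glass chess set on a marble table", "width": 1024, "height": 768},
).json()

# Poll until the job finishes, then print the signed URL of the image
while True:
    result = requests.get(
        f"{API}/v1/get_result", headers=headers, params={"id": job["id"]}
    ).json()
    if result["status"] == "Ready":
        print(result["result"]["sample"])
        break
    time.sleep(0.5)
```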
On November 6, 2024, Black Forest Labs extended FLUX.1.1 [pro] with two new generation modes: Ultra and Raw.
Ultra enables image generation at four times the standard resolution, up to approximately 4 megapixels. Generation at Ultra resolution takes around 10 seconds. Black Forest Labs described the model as over 2.5 times faster than comparable high-resolution offerings from other providers. The BFL API prices Ultra generation at 6 credits ($0.06) per image.
Raw is a mode for creators who want less polished, more naturalistic output. It reduces the model's tendency to produce the smooth, idealized look common in AI-generated images. In practice, Raw mode increases variety in human subjects and significantly improves realism in nature photography, producing images that read as candid or documentary rather than composited.
The two modes can be used independently or together, and Raw mode works at both standard and Ultra resolutions.
On November 21, 2024, Black Forest Labs released FLUX.1 Tools, a set of specialized models designed to give users control over structure, content, and style in image generation.
FLUX.1 Fill supports inpainting (replacing selected regions within an image) and outpainting (extending an image beyond its original boundaries). The user provides a binary mask indicating the area to edit and a text description of the desired replacement content. Fill is available as both a [pro] variant through the BFL API and an open-weight [dev] variant on Hugging Face under the FLUX Dev License. Third-party providers including fal.ai, Replicate, Together.ai, and Freepik distribute Fill through their own APIs.
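A sketch of inpainting with the open-weight Fill checkpoint via Diffusers' FluxFillPipeline (the high guidance value and step count follow the model card's example settings; filenames are placeholders):

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = load_image("room.png")       # source image
mask = load_image("room_mask.png")   # white pixels mark the region to replace

result = pipe(
    prompt="a mid-century leather armchair",
    image=image,
    mask_image=mask,
    guidance_scale=30.0,   # Fill is typically run at much higher guidance than base FLUX
    num_inference_steps=50,
).images[0]
result.save("room_filled.png")
```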
FLUX.1 Depth and FLUX.1 Canny provide structure-conditioned generation, similar in concept to ControlNet for Stable Diffusion. Depth conditioning uses a depth map extracted from a reference image; Canny conditioning uses edge outlines from the Canny edge detection algorithm. In both cases, the user provides a reference image and a text prompt, and the model generates a new image that follows the spatial structure of the reference while adopting the content described in the prompt.
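A sketch of Canny-conditioned generation through Diffusers' FluxControlPipeline, with edges extracted by OpenCV (the checkpoint name matches the Hugging Face release; thresholds and step settings are illustrative):

```python
import cv2
import numpy as np
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image
from PIL import Image

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Canny-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Extract the edge structure of the reference image
ref = np.array(load_image("building.png").convert("L"))
edges = cv2.Canny(ref, 100, 200)
control = Image.fromarray(edges).convert("RGB")

result = pipe(
    prompt="the same building as a watercolor painting at sunset",
    control_image=control,
    num_inference_steps=30,
    guidance_scale=30.0,
).images[0]
result.save("building_watercolor.png")
```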
Both [pro] and [dev] variants were released in November 2024. Black Forest Labs subsequently deprecated both Depth and Canny from the BFL API (the models remain on Hugging Face as legacy open-weight checkpoints). Structural control capabilities were carried forward in FLUX.1 Kontext.
FLUX.1 Redux is an image variation adapter that takes a reference image plus a text prompt and generates new images that resemble the reference while incorporating the prompted changes. It can be used as a restyling tool or for creating variations of an image that preserve general composition and subject while modifying details. Redux integrates with FLUX.1.1 [pro] Ultra for 4MP output. It is available as an open-weight model on Hugging Face (under the FLUX Dev License) and through the BFL API.
FLUX.1 Kontext is a distinct model family released on May 29, 2025, focused on image editing through text instructions. Unlike the FLUX.1 Tools, which require explicit masks or structural inputs, Kontext models accept an input image and a text description of the desired edit, then produce an edited version of that image directly.
The approach is described by Black Forest Labs as "in-context image generation": the model jointly conditions on image tokens and text tokens, allowing it to reason about what is in an existing image and modify it according to the prompt without requiring the user to define edit regions explicitly.
FLUX.1 Kontext addresses a practical limitation of earlier image editing approaches: accumulated degradation across multiple edits. Previous methods often introduce artifacts or drift in character identity with each successive edit. Kontext was designed with iterative workflows in mind, preserving object and character consistency more robustly across multiple rounds of prompting.
| Variant | Access | Parameters | Notes |
|---|---|---|---|
| FLUX.1 Kontext [max] | BFL API | Undisclosed | Maximum performance, highest prompt adherence |
| FLUX.1 Kontext [pro] | BFL API | Undisclosed | Balanced speed and quality for iterative workflows |
| FLUX.1 Kontext [dev] | Open-weight | 12B | Runs on consumer hardware; available on Hugging Face |
FLUX.1 Kontext [dev] weights were made available on Hugging Face under the FLUX Dev License (non-commercial). At 12 billion parameters, it runs on consumer GPUs with 24GB VRAM, making proprietary-level image editing accessible for local deployment. The [pro] and [max] variants are accessible via the BFL API and third-party providers including Replicate.
Kontext can perform local edits (modifying a specific object or region without touching the rest of the image), style transfer, character consistency across scenes, and typography manipulation (adding, removing, or altering text within images). It also supports text-to-image generation, making it a unified generation and editing model rather than a pure editing tool.
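An editing call with the open-weight checkpoint via Diffusers' FluxKontextPipeline (the low guidance value follows the model card's example; prompt and filenames are placeholders):

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

source = load_image("portrait.png")
edited = pipe(
    image=source,
    prompt="change the background to a rainy city street, keep the person unchanged",
    guidance_scale=2.5,   # Kontext [dev] is typically run at low guidance
).images[0]
edited.save("portrait_rainy.png")
```

No mask is supplied: the instruction itself determines which regions change, which is the practical difference from Fill.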
A technical paper, "FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space," was published in June 2025 (arXiv:2506.15742), describing the training methodology and benchmark results.
Black Forest Labs released the FLUX.2 family on November 25, 2025, representing a generational upgrade over FLUX.1 in architecture, parameter count, and capabilities.
FLUX.2 couples a Mistral-3 24B vision-language model (VLM) with a rectified flow transformer, totaling approximately 32 billion parameters for the [pro] and [dev] variants. The VLM component brings world knowledge and contextual reasoning to the generation process, while the transformer models spatial relationships and scene composition. FLUX.2 also introduces a newly retrained VAE with improved learnability.
Key new capabilities in FLUX.2 include multi-reference image support (up to 10 simultaneous reference images), image editing at resolutions up to 4 megapixels, substantially improved complex typography rendering, and reliable brand and layout adherence for structured prompts. These capabilities target commercial applications in advertising, brand content, and graphic design.
| Variant | Access | Parameters | Notes |
|---|---|---|---|
| FLUX.2 [pro] | BFL API | ~32B | State-of-the-art hosted generation |
| FLUX.2 [flex] | BFL API | ~32B | Adjustable steps/guidance; optimized for text rendering |
| FLUX.2 [dev] | Open-weight | ~32B | Most capable open-source image model; Hugging Face |
| FLUX.2 [klein] 4B | Open-source | 4B | Size-distilled, Apache 2.0; scheduled release |
| FLUX.2 [klein] 9B | Open-source | 9B | Size-distilled, Apache 2.0; scheduled release |
FLUX.2 [dev] is available on Hugging Face and GitHub under a non-commercial license. At roughly 32B parameters, the open-weight [dev] model was the largest publicly released image generation model by parameter count as of its release. The [klein] variants, distilled to 4B and 9B parameters, are designed for deployment on lower-resource hardware and are planned for Apache 2.0 release.
At the time of FLUX.1's release in August 2024, the primary alternatives for high-quality text-to-image generation were Midjourney v6.0, DALL-E 3, and Stable Diffusion 3 Ultra. The following table summarizes key differences as of the initial FLUX.1 release:
| Model | Params | License | Open weights | API access | Typical steps | ELO (Aug 2024) |
|---|---|---|---|---|---|---|
| FLUX.1 [pro] | 12B | Commercial (API) | No | BFL, Replicate, fal.ai | ~25-50 | 1048 |
| FLUX.1 [dev] | 12B | Non-commercial | Yes | Replicate, fal.ai | ~20-50 | 1035 |
| FLUX.1 [schnell] | 12B | Apache 2.0 | Yes | Replicate, fal.ai | 4 | ~980 |
| Midjourney v6.0 | Undisclosed | Proprietary | No | Midjourney only | N/A | 1026 |
| DALL-E 3 | Undisclosed | Proprietary | No | OpenAI API | N/A | ~1000 |
| Stable Diffusion 3 Ultra | 8B | Non-commercial | Partial | Stability AI API | ~20-50 | ~970 |
ELO scores from the Artificial Analysis text-to-image leaderboard, August-September 2024.
FLUX.1 set a new bar for legible text within images. Earlier models, including Midjourney v6.0, frequently misspelled words or produced malformed letterforms, particularly in non-standard fonts or curved layouts. FLUX.1's T5-XXL conditioning and large transformer gave it substantially better letter-level fidelity. In independent evaluations, FLUX.1 renders complex text phrases correctly at rates well above Midjourney v6.1 and DALL-E 3.
FLUX.1 performs comparably to or slightly above Midjourney v6.0 on human anatomy accuracy, particularly for dynamic poses and spatial relationships between limbs. Midjourney v6.0 holds a modest advantage in dramatic lighting and muscle definition. Both models outperform earlier open-source models including Stable Diffusion XL on hand rendering.
FLUX.1's use of T5-XXL conditioning produces stronger adherence to multi-clause prompts specifying counts, spatial positions, and attribute combinations. Models using CLIP-only conditioning tend to average or blend attributes rather than binding them to specific subjects in the scene.
Midjourney has historically been tuned for visual impact and stylistic coherence, producing images with strong compositional aesthetics and polished lighting by default. FLUX.1 [dev] and [pro] produce more literal, technically accurate images but can require more explicit style direction in the prompt to match the artistic quality Midjourney achieves with minimal prompting. In practice, most users report that FLUX.1's output more reliably represents exactly what the prompt described, while Midjourney more reliably produces a visually striking result.
FLUX.1 models are accessible through multiple channels:
| Provider | Models available | Pricing model |
|---|---|---|
| BFL API (docs.bfl.ml) | FLUX.1 [pro], FLUX.1.1 [pro], FLUX.1.1 [pro] Ultra, FLUX.1 Fill, FLUX.1 Redux, FLUX.1 Kontext [pro/max] | Credits ($0.01/credit) |
| Replicate | FLUX.1 [dev], FLUX.1 [schnell], FLUX.1.1 [pro], FLUX.1.1 [pro] Ultra, Kontext [pro] | Per-second GPU or per-image |
| fal.ai | FLUX.1 [dev], FLUX.1 [schnell], FLUX.1.1 [pro], FLUX.1 Fill, FLUX.1 Redux, Kontext [pro] | Per-image |
| Together.ai | FLUX.1 [schnell], FLUX.1 [dev], FLUX.1.1 [pro], FLUX.1 Fill, FLUX.1 Depth, FLUX.1 Canny | Per-step or per-image |
| Hugging Face (self-hosted) | FLUX.1 [dev], FLUX.1 [schnell], FLUX.1 Fill [dev], FLUX.1 Redux [dev], Kontext [dev] | Free (own hardware) |
| Cloudflare Workers AI | FLUX.1 [schnell], FLUX.2 [dev] | Per-image serverless |
The BFL API uses a credit system where 1 credit equals $0.01 USD.
| Model | Credits | USD per image |
|---|---|---|
| FLUX.1.1 [pro] | 4 | $0.04 |
| FLUX.1.1 [pro] Ultra | 6 | $0.06 |
| FLUX.2 [pro] | 3 | $0.03 |
| FLUX.2 [flex] | 6 | $0.06 |
| FLUX.2 [max] | 7 | $0.07 |
| FLUX.2 [klein] 4B | ~1.4 | from $0.014 |
| FLUX.2 [klein] 9B | ~1.5 | from $0.015 |
FLUX.1 [dev] and [schnell] are not available through the BFL API; they are distributed as open-weight models for self-hosted deployment.
Running FLUX.1 [dev] or [schnell] locally requires a GPU with sufficient VRAM to hold the model weights. At full FP16 precision, the transformer alone requires approximately 24GB VRAM. The T5-XXL text encoder adds approximately 9GB, bringing the total to roughly 33GB for a full-quality setup.
Quantization reduces these requirements significantly; a loading sketch follows the table:
| Precision | Approximate VRAM | Quality impact |
|---|---|---|
| FP16 (full) | ~33 GB | None |
| FP8 | ~10 GB | Minor |
| NF4 | ~7 GB | Noticeable on complex prompts |
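One common route to the NF4 figures above is Diffusers' bitsandbytes integration, quantizing only the transformer (a sketch; GGUF and FP8 workflows are alternatives):

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

repo = "black-forest-labs/FLUX.1-dev"
nf4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize the 12B transformer; the VAE and text encoders stay in bf16
transformer = FluxTransformer2DModel.from_pretrained(
    repo, subfolder="transformer", quantization_config=nf4, torch_dtype=torch.bfloat16
)
pipe = FluxPipeline.from_pretrained(
    repo, transformer=transformer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe("macro photo of a dew drop on moss", num_inference_steps=20).images[0]
image.save("dewdrop.png")
```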
On an RTX 4090 (24GB VRAM), FLUX.1 [schnell] generates a 1024x1024 image in approximately 2-4 seconds. FLUX.1 [dev] at 20 steps takes approximately 15-30 seconds on the same hardware. The transformer module accounts for roughly 96% of total inference time.
FLUX.1 is more memory-hungry than SDXL or Stable Diffusion 1.5 because its DiT architecture applies full self-attention across all latent tokens, at a cost that grows quadratically with resolution, unlike the convolutional UNet architectures used in earlier Stable Diffusion models. A GPU that runs SDXL comfortably may be too constrained for FLUX.1 [dev] at full precision without quantization.
FLUX.1 has been applied across a range of professional and creative contexts:
Photography and portraiture: FLUX.1 [dev] fine-tuned with DreamBooth on 3-5 reference photographs can generate consistent likeness-preserving portraits in varied settings and lighting conditions. This is used for AI headshots, profile pictures, and marketing assets.
Graphic design and typography: FLUX.1's strong text rendering makes it suitable for generating visual mockups, poster concepts, and promotional materials containing legible text, a use case where earlier models consistently failed.
Fashion and product visualization: Brands use FLUX.1 to generate clothing on varied body types, visualize product colorways, and produce lifestyle photography for e-commerce without physical photoshoots.
Fine-tuning and style models: The open-weight FLUX.1 [dev] serves as the base for thousands of community fine-tuned models on Civitai and Hugging Face, targeting specific artistic styles, characters, and photographic aesthetics. LoRA (Low-Rank Adaptation) training is the dominant fine-tuning approach, with tools including SimpleTuner, the kohya-ss trainer, and Hugging Face Diffusers supporting FLUX-specific LoRA training; see the loading sketch after this list.
Scientific visualization: Researchers have used FLUX.1 for generating illustrative figures, anatomical visualizations, and synthetic training data for computer vision tasks.
Advertising and branded content: FLUX.1 Kontext and FLUX.2's multi-reference capabilities support generating on-brand imagery that maintains logos, color palettes, and product consistency across variations, a workflow that previously required manual compositing.
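As referenced above, applying a community LoRA to the base model takes a few lines in Diffusers (the repository and adapter names are hypothetical placeholders for any FLUX.1 [dev]-compatible LoRA):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# Hypothetical LoRA repo; substitute any FLUX.1 [dev] LoRA from Civitai or the Hub
pipe.load_lora_weights("some-user/flux-watercolor-lora", adapter_name="watercolor")
pipe.set_adapters(["watercolor"], adapter_weights=[0.8])  # blend strength

image = pipe(
    "a fox in a forest clearing, watercolor style",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("fox_watercolor.png")
```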
FLUX.1's August 2024 release was widely covered in the AI community as a significant quality jump in openly available image generation. VentureBeat, The Decoder, and DeepLearning.AI's The Batch all highlighted FLUX.1's benchmark performance and the significance of the founding team's Stable Diffusion lineage.
The creative community adopted FLUX.1 [dev] and [schnell] rapidly. ComfyUI, the node-based workflow tool that serves as a primary interface for local image generation, added FLUX support on launch day. Civitai, the primary community hub for sharing generative model fine-tunes and LoRAs, accumulated thousands of FLUX-specific models within weeks of release.
The Apache 2.0 license on FLUX.1 [schnell] was specifically noted as enabling commercial deployment without legal complexity, distinguishing it from the non-commercial restriction on FLUX.1 [dev] and many Stable Diffusion variants.
In October 2025, MLCommons selected FLUX.1 as the standard text-to-image task for MLPerf Training v5.1, an industry benchmark for measuring hardware training throughput. This selection reflected the model's status as a representative and reproducible workload for modern image generation infrastructure.
Black Forest Labs' enterprise partnerships grew substantially through 2025. Meta signed a multi-year contract with BFL in September 2025 valued at approximately $140 million, and combined enterprise contracts from Adobe, Canva, Snap, and Meta reached approximately $300 million in total contract value by end of 2025, according to reporting by Sifted.
FLUX.1 has several practical limitations that users and developers have documented:
Memory requirements: At full precision, FLUX.1 [dev] requires substantially more VRAM than comparable-quality SDXL models, making it inaccessible on entry-level consumer GPUs without quantization.
Inference speed at full quality: FLUX.1 [dev] at 20+ steps is 3-5x slower per image than SDXL at a comparable step count, limiting real-time interactive applications. FLUX.1 [schnell] addresses this but at some quality cost.
Artistic style defaults: FLUX.1 produces more photorealistic, literal outputs by default than Midjourney. Users who want stylized or illustrative results need to provide explicit style guidance in prompts or use fine-tuned LoRAs.
License fragmentation: The three FLUX.1 variants have different licenses (commercial API for [pro], non-commercial open for [dev], Apache 2.0 for [schnell]). Fine-tuned models based on [dev] inherit its non-commercial restriction, which has caused some confusion in the community about what downstream uses are permitted.
NSFW and content policy: The BFL API enforces content moderation that restricts explicit or harmful outputs. Open-weight [dev] and [schnell] models can be run without these restrictions locally, which has contributed to a segment of community fine-tunes that explicitly remove safety constraints.
Older hardware: FLUX.1's DiT architecture is less memory-efficient than the UNet architectures in older Stable Diffusion models, and some inference optimizations developed for UNet pipelines do not directly transfer.