Flux is a family of text-to-image generative models developed by Black Forest Labs (BFL), a company founded by the original creators of Stable Diffusion. First released in August 2024, the FLUX.1 models quickly established themselves as among the highest-quality open and commercial image generators available, with particular strengths in photorealism, text rendering within images, and anatomical accuracy. The models are built on a 12-billion-parameter hybrid transformer architecture that uses flow matching rather than the traditional denoising diffusion approach, representing a significant technical evolution from earlier latent diffusion models [1].
Flux models have been widely adopted across the AI image generation ecosystem. In August 2024, xAI integrated FLUX.1 into the Grok chatbot for image generation on the X platform. Freepik has scaled Flux to handle millions of image generation requests per day. Adobe integrated FLUX.1 Kontext [pro] into Photoshop as an option for its Generative Fill tool in September 2025. Meta signed a multi-year contract worth $140 million for use of BFL's generative image technology in September 2025. As of early 2026, Black Forest Labs has raised over $430 million across three funding rounds, is valued at approximately $3.25 billion, and has expanded its model lineup from the original FLUX.1 series to the second-generation FLUX.2 family [2][3].
Black Forest Labs was founded in 2024 by Robin Rombach, Andreas Blattmann, and Patrick Esser, all of whom were former researchers at LMU Munich under Professor Björn Ommer and subsequently employees at Stability AI [4].
Robin Rombach is the lead author of the 2022 paper "High-Resolution Image Synthesis with Latent Diffusion Models" (commonly known as the latent diffusion or Stable Diffusion paper), which introduced the technique of performing the diffusion process in a compressed latent space rather than directly in pixel space. This architectural insight dramatically reduced the computational cost of diffusion-based image generation and made high-quality image synthesis accessible to consumer hardware. The paper has become one of the most cited works in the history of computer vision and generative AI [4].
Andreas Blattmann and Patrick Esser were co-authors on the same paper and contributed to subsequent work on video generation and image synthesis at Stability AI. The three founders left Stability AI in 2024 to start Black Forest Labs, named after the Black Forest region of southwestern Germany; the company is headquartered in nearby Freiburg.
| Round | Date | Amount | Valuation | Lead Investors |
|---|---|---|---|---|
| Seed | August 2024 | $31M | ~$150M (post-money) | Andreessen Horowitz (a16z) |
| Series A | Late 2024 | ~$100M | ~$1B | Andreessen Horowitz |
| Series B | December 2025 | $300M | $3.25B | Salesforce Ventures, AMP (Anjney Midha) |
The seed round, announced simultaneously with the launch of FLUX.1 in August 2024, was led by Andreessen Horowitz with participation from General Catalyst, Brendan Iribe (co-founder of Oculus), Michael Ovitz, Garry Tan (CEO of Y Combinator), and NVIDIA's Timo Aila [4]. By September 2024, reports indicated that the company was raising an additional $100 million at a $1 billion valuation, a dramatic increase from the $150 million post-money valuation just weeks earlier, driven largely by the rapid adoption of FLUX.1 models [5].
The Series A was led by Andreessen Horowitz with participation from BroadLight Capital, Creandum, Earlybird VC, General Catalyst, Northzone, and NVIDIA. It was not publicly announced at the time but was disclosed alongside the Series B.
In December 2025, Black Forest Labs closed a $300 million Series B round at a $3.25 billion valuation, co-led by Salesforce Ventures and Anjney Midha's AMP, with participation from a16z, NVIDIA, General Catalyst, Temasek, Air Street Capital, Bain Capital Ventures, Canva, Figma Ventures, Adobe Ventures, Samsung Next, Lux Capital, and others. Notable angel investors in the round included Guillermo Rauch (Vercel CEO), Clem Delangue (Hugging Face CEO), and Mati Staniszewski (ElevenLabs CEO) [3].
Black Forest Labs has secured significant commercial partnerships that underscore enterprise demand for its technology. In September 2025, Meta signed a multi-year contract worth $140 million ($35 million in the first year, $105 million in the second year) for use of BFL's generative image technology [15]. Combined with contracts from Adobe, Canva, and Snap, BFL's total enterprise contract value reached approximately $300 million by the end of 2025 [15].
The initial FLUX.1 release on August 1, 2024 comprised three model variants, each targeting different use cases and operating under different licensing terms [1].
| Model | Parameters | Steps | License | Availability | Target Use Case |
|---|---|---|---|---|---|
| FLUX.1 [schnell] | 12B | 1-4 | Apache 2.0 | Open weights (Hugging Face) | Fast local generation, prototyping |
| FLUX.1 [dev] | 12B | 20-50 | Non-commercial (BFL license) | Open weights (Hugging Face) | Research, hobbyist, non-commercial |
| FLUX.1 [pro] | 12B | Varies | Proprietary | API only | Professional/commercial use |
Schnell (German for "fast") is the speed-optimized variant, capable of generating images in just 1 to 4 inference steps. It is released under the Apache 2.0 license, making it fully open for commercial and personal use. It can run on GPUs with as little as 12 GB of VRAM, making it accessible on consumer hardware. While it produces lower-fidelity images than the dev or pro variants, the quality is remarkably high for the extremely low step count [1].
The dev variant is a guidance-distilled version of FLUX.1 [pro], offering higher quality than schnell at the cost of requiring 20 to 50 inference steps (with 30 to 40 recommended for optimal results). It is released as source-available software under a non-commercial license, though users can obtain a self-serve commercial license from BFL. The dev model produces noticeably better skin textures, lighting effects, and fine details compared to schnell [1].
FLUX.1 [pro], the professional variant, is available exclusively through BFL's API and through partner platforms. It offers the highest image quality in the initial FLUX.1 lineup, with superior prompt adherence, photorealistic rendering, and fine detail work. It is the only variant whose weights are not publicly distributed [1].
Released on October 2, 2024 alongside the general availability of the BFL API, FLUX1.1 [pro] was a major upgrade that generated images six times faster than the original FLUX.1 [pro] while simultaneously improving image quality, prompt adherence, and output diversity. The model generates photorealistic images in approximately 4.5 seconds. It was submitted to the Artificial Analysis image arena under the codename "blueberry" and achieved the highest overall Elo score of any model on the leaderboard at the time of its debut [6].
FLUX1.1 [pro] introduced improved handling of text rendering, complex multi-object scenes, and human anatomy; high-resolution generation up to 2K (2048 x 2048 pixels) without sacrificing quality arrived with the Ultra mode described below.
On November 6, 2024, BFL released FLUX1.1 [pro] Ultra and Raw modes [16].
Ultra mode extends FLUX's capability to generate images at four times the resolution of the standard FLUX1.1 [pro], producing 4-megapixel (approximately 2048 x 2048) images in about 10 seconds. Benchmarks showed Ultra was over 2.5 times faster than comparable high-resolution offerings from competitors. Ultra is priced at $0.06 per image through the BFL API [16].
Raw mode captures the genuine feel of candid photography, producing images with a less synthetic, more natural aesthetic. It significantly increases diversity in human subjects and enhances the realism of nature photography, addressing the common criticism that AI-generated images can look overly polished or "plastic." Raw mode is available as a toggle on both the standard and Ultra variants [16].
On November 21, 2024, BFL released FLUX.1 Tools, a suite of editing capabilities designed to extend the core FLUX models [17].
| Tool | Function | Availability |
|---|---|---|
| FLUX.1 Fill | Inpainting and outpainting with text-guided editing | Pro (API) + Dev (open weights) |
| FLUX.1 Depth | Structural guidance based on depth maps from input images | Pro (API) + Dev (open weights) |
| FLUX.1 Canny | Structural guidance based on canny edge detection from input images | Pro (API) + Dev (open weights) |
| FLUX.1 Redux | Adapter for mixing and recreating input images with text prompts | Pro (API) + Dev (open weights) |
Each tool was released as a FLUX.1 [pro] variant through the API and as a guidance-distilled open-access FLUX.1 [dev] variant with inference code and weights on Hugging Face. FLUX.1 Fill [pro] achieved state-of-the-art results in inpainting benchmarks at the time of release. FLUX.1 Canny and Depth provide ControlNet-style structural conditioning, enabling precise control over the spatial layout and structure of generated images [17].
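As a rough illustration of the open-weight Fill workflow, the sketch below uses the `FluxFillPipeline` class from diffusers; the image and mask paths are placeholders, and the high guidance scale reflects the recommendation in the Fill [dev] model card.

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("scene.png")      # placeholder: image to edit
mask = load_image("scene_mask.png")  # placeholder: white pixels mark the region to repaint

result = pipe(
    prompt="a wooden park bench",    # describes what to paint into the masked region
    image=image,
    mask_image=mask,
    guidance_scale=30.0,             # Fill [dev] runs at much higher guidance than base dev
    num_inference_steps=50,
).images[0]
result.save("filled.png")
```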
On January 16, 2025, BFL launched the FLUX Pro Finetuning API, enabling users to customize FLUX.1 [pro] with their own images and concepts. The system requires as few as 1 to 5 example images to create a targeted customization. In user studies, FLUX Pro finetuning results were preferred 68.9% of the time over other available finetuning services using FLUX.1 [dev] [18].
Once a finetune is created, it can be applied across the entire FLUX.1 model suite without additional adaptation, including FLUX.1 [pro], FLUX1.1 [pro], and the complete FLUX.1 Tools suite. This enables customized content generation with resolutions up to 4 megapixels, customized inpainting with FLUX.1 Fill, and customized structural control with FLUX.1 Depth [18].
On May 29, 2025, Black Forest Labs released FLUX.1 Kontext, a suite of models that enable in-context image generation and editing. Unlike standard text-to-image models, Kontext accepts both text and image inputs, allowing users to provide reference images and modify them through natural language instructions [12].
Kontext can extract and modify visual concepts from reference images to produce new coherent renderings, enabling use cases such as character consistency across multiple generations, style transfer, object replacement, and iterative editing without requiring fine-tuning or complex multi-step workflows.
| Model | Description | Availability |
|---|---|---|
| FLUX.1 Kontext [max] | Maximum performance with exceptional prompt adherence, advanced typography, and premium rendering quality | API ($0.08/image) |
| FLUX.1 Kontext [pro] | Balanced quality and speed for iterative editing workflows | API ($0.04/image) |
| FLUX.1 Kontext [dev] | Lightweight 12B diffusion transformer for customization and local deployment | Open weights (Hugging Face) |
BFL reported that Kontext models deliver inference speeds up to 8 times faster than competing context-aware image editing models such as GPT-Image. In September 2025, Adobe announced that FLUX.1 Kontext [pro] was available as a model option for Photoshop's Generative Fill tool in beta, marking significant validation from the professional creative tools industry [12].
On June 26, 2025, BFL released FLUX.1 Kontext [dev] as open weights, allowing the community to run the model locally and integrate it into custom workflows [19].
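A minimal local editing sketch using the `FluxKontextPipeline` class that diffusers added for this release; the input path and prompt are placeholders, and the guidance value follows the model card's example.

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

source = load_image("car.png")  # placeholder reference image
edited = pipe(
    image=source,
    prompt="Change the car to red; keep the background and framing unchanged",
    guidance_scale=2.5,
).images[0]
edited.save("car_red.png")
```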
On July 31, 2025, BFL released FLUX.1 Krea [dev], a model developed in collaboration with Krea AI. FLUX.1 Krea [dev] is a 12-billion-parameter rectified flow transformer that was specifically trained to overcome the oversaturated "AI look" common in text-to-image models, achieving new levels of photorealism with a distinctive aesthetic approach [20].
The model is the open-weights version of Krea 1, offering strong performance with highly distinctive aesthetics and exceptional realism. It scored 1011 Elo in human evaluation tests, outperforming other open-source FLUX models and rivaling premium models like FLUX1.1 [pro]. It was released under a non-commercial license with weights available on Hugging Face [20].
All FLUX.1 models are built on a hybrid architecture that combines multimodal and parallel diffusion transformer (DiT) blocks, scaled to 12 billion parameters. The architecture represents an evolution of the DiT framework introduced in "Scalable Diffusion Models with Transformers" (Peebles and Xie, 2023), adapted for the text-to-image generation task [7].
The FLUX.1 architecture consists of 57 total transformer blocks, divided into two types [7]:
- 19 double-stream (multimodal) blocks, which process text and image tokens in separate weight streams that interact through joint attention; and
- 38 single-stream blocks, which process the concatenated text-image token sequence with shared weights, using the parallel attention layout described below.
This hybrid design allows the model to maintain modality-specific processing in the early layers (where text and image features are quite different) while enabling deep integration in the later layers (where the model needs to tightly coordinate text semantics with visual content).
FLUX.1 uses two text encoders working in tandem to process input prompts [21]:
- a CLIP text encoder (ViT-L/14), whose pooled output supplies a compact global summary of the prompt; and
- a T5-XXL encoder, whose per-token embeddings preserve the fine-grained structure of long, complex prompts.
This combination allows Flux to interpret complex scene descriptions with high fidelity, leveraging CLIP's visual-semantic alignment alongside T5's deep language understanding.
Flux uses flow matching as its training paradigm rather than the denoising diffusion probabilistic models (DDPM) framework used by Stable Diffusion and many earlier generative models [7].
In traditional diffusion models, the generation process involves gradually adding Gaussian noise to an image during training (the forward process) and then learning to reverse this noisy process step by step (the reverse process). Flow matching takes a conceptually simpler approach: it learns a deterministic vector field that transforms samples from a simple noise distribution directly to the target data distribution along an optimal transport path. This method, called Rectified Flow, straightens the transformation paths between noise and data, resulting in more efficient generation that requires fewer inference steps to produce high-quality outputs [7].
The practical benefit is that flow matching enables faster generation at equivalent quality levels, or higher quality at equivalent step counts, compared to DDPM-based methods.
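In code, the training objective is compact. The function below is a generic conditional flow-matching loss, a sketch rather than BFL's actual training code: `model` is assumed to be any network taking noisy latents, a timestep, and conditioning, and it is regressed onto the straight-line velocity between data and noise.

```python
import torch

def rectified_flow_loss(model, x0, cond):
    """One rectified-flow training step (illustrative, not BFL's code).

    x0: clean latents; cond: text conditioning. The network learns the
    constant velocity (x1 - x0) of the straight path from data to noise.
    """
    x1 = torch.randn_like(x0)                      # Gaussian noise endpoint
    t = torch.rand(x0.shape[0], device=x0.device)  # uniform timesteps in [0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))       # broadcast t over latent dims
    xt = (1 - t_) * x0 + t_ * x1                   # point on the straight path
    v_target = x1 - x0                             # path velocity, constant in t
    return torch.mean((model(xt, t, cond) - v_target) ** 2)
```

Because the learned paths are nearly straight, sampling can integrate the corresponding ODE with a handful of Euler steps, which is what makes few-step variants like schnell and klein feasible.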
FLUX incorporates rotary positional embeddings (RoPE) to encode spatial relationships within the image and sequential relationships within the text. RoPE enables the model to generalize across different image resolutions and aspect ratios more effectively than fixed positional encodings [7].
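For intuition, the snippet below implements the standard 1-D rotary embedding; it is a simplified stand-in, since FLUX applies RoPE across multiple axes (text position plus image row and column coordinates).

```python
import torch

def rope_angles(positions, dim, theta=10000.0):
    # one rotation frequency per channel pair, as in Su et al. (2021)
    freqs = 1.0 / theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = positions.float()[:, None] * freqs[None, :]  # (seq_len, dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # rotate each (even, odd) channel pair of queries/keys by its position-dependent angle
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

q = torch.randn(128, 64)                       # (tokens, head_dim)
cos, sin = rope_angles(torch.arange(128), 64)
q_rot = apply_rope(q, cos, sin)                # same shape, positions now encoded in phase
```

Because positions enter the attention dot product only through relative phase differences, grids larger than those seen in training remain well behaved.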
The architecture uses parallel attention layers to improve hardware efficiency. Rather than computing self-attention and feedforward layers sequentially, parallel attention computes both simultaneously and sums their outputs. This design choice improves GPU utilization and reduces wall-clock inference time [7].
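Schematically, a parallel block applies one shared pre-norm and sums the attention and MLP branches in a single residual update. The module below is a generic illustration of that layout, not FLUX's exact block, which also carries timestep and text-conditioning modulation.

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    # attention and MLP read the same normalized input and are summed, not chained
    def __init__(self, dim, heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        return x + a + self.mlp(h)  # one residual sum instead of two sequential ones

x = torch.randn(2, 256, 512)        # (batch, tokens, dim)
y = ParallelBlock(512, 8)(x)
```

The two branches contain independent matrix multiplications that the GPU can schedule together, which is where the hardware-efficiency benefit comes from.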
Like Stable Diffusion, Flux operates in a compressed latent space rather than directly in pixel space. Images are encoded into a lower-dimensional latent representation by a variational autoencoder (VAE) before the diffusion/flow matching process begins, and the generated latent representation is decoded back into pixel space by the VAE decoder after generation is complete. FLUX.1 processes images in a 16-channel latent space, scaled up from the 4 channels used in Stable Diffusion. This expanded representation allows the model to capture more nuanced information about textures, lighting, and spatial arrangements [7].
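The 16-channel claim is easy to check against the published weights: diffusers loads the FLUX VAE as a standard `AutoencoderKL` from the checkpoint's `vae` subfolder. The random tensor below stands in for a normalized RGB image.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="vae"
)

img = torch.randn(1, 3, 1024, 1024)  # stand-in for a normalized RGB image
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()
    print(latents.shape)  # torch.Size([1, 16, 128, 128]): 16 channels, 8x downsampled
    recon = vae.decode(latents).sample  # decoded back to (1, 3, 1024, 1024)
```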
FLUX.2 introduces significant architectural changes compared to FLUX.1. The model scales to 32 billion parameters and replaces the dual T5 + CLIP text encoder system with a Mistral-3 24B vision-language model (VLM) [8]. By coupling a VLM trained on a massive corpus of interleaved text and images with the rectified flow transformer, FLUX.2 possesses significantly more grounded "world knowledge" than its predecessors, enabling better understanding of real-world concepts, spatial relationships, and material properties.
FLUX.2 also introduced a retrained variational autoencoder that provides an optimized trade-off between learnability, quality, and compression rate. This new VAE was released as open-source software under the Apache 2.0 license [8].
For the FLUX.2 [klein] models, the architecture uses a Qwen3 8B text embedder instead of Mistral-3, paired with a 9B or 4B flow transformer. The klein variants are step-distilled to just 4 inference steps, enabling sub-second generation on consumer GPUs [22].
On November 25, 2025, Black Forest Labs announced the FLUX.2 series, a major second-generation update to the model family. The initial announcement included FLUX.2 [pro], [flex], and [dev], with additional variants released in the following months [8].
| Model | Parameters | Text Encoder | License | Release Date | Key Features |
|---|---|---|---|---|---|
| FLUX.2 [max] | 32B | Mistral-3 24B VLM | Proprietary (API) | January 2026 | Highest quality, grounded generation with web context |
| FLUX.2 [pro] | 32B | Mistral-3 24B VLM | Proprietary (API) | November 2025 | Production-grade, multi-reference support |
| FLUX.2 [flex] | 32B | Mistral-3 24B VLM | Proprietary (API) | November 2025 | Tunable parameters (steps, guidance), typography specialist |
| FLUX.2 [dev] | 32B | Mistral-3 24B VLM | BFL non-commercial | November 2025 | Open weights, LoRA training, local deployment |
| FLUX.2 [klein] 9B | 9B | Qwen3 8B | Apache 2.0 | January 15, 2026 | Sub-second generation, consumer hardware |
| FLUX.2 [klein] 4B | 4B | Qwen3 8B | Apache 2.0 | January 15, 2026 | Smallest model, ~13 GB VRAM, consumer GPUs |
| FLUX.2 [klein] 9B-KV | 9B | Qwen3 8B | Apache 2.0 | March 2026 | KV-cache for 2.5x faster multi-reference editing |
FLUX.2 [max] is the highest-performance model in the lineup, delivering the most consistent image editing and the strongest prompt following across the FLUX.2 family. It preserves colors, lighting, faces, text, and objects with exceptional fidelity during editing tasks. Despite major gains in quality, it generates content nearly as fast as FLUX.2 [pro], making it up to 3 times faster than competing models of similar quality. It supports grounded generation with real-time web context [23].
FLUX.2 [pro] is the production-grade variant that balances state-of-the-art quality with speed. It supports processing up to 10 reference images simultaneously while preserving character features, product details, and style elements across outputs. It can generate and edit images at resolutions up to 4 megapixels. Teams use this variant when they need reliable, consistent results without parameter tuning [8].
FLUX.2 [flex] provides developer control over inference parameters such as the number of sampling steps and the guidance scale, enabling developers to tune the trade-offs between speed, text accuracy, and detail fidelity for each project. It specializes in text rendering and fine details, making it well suited for typography, UI mockups, and infographics [8].
FLUX.2 [dev], the open-weight 32B model, combines text-to-image synthesis and image editing with multiple input images in a single checkpoint. It is available on Hugging Face with optimized fp8 implementations for consumer GPUs. The dev variant is suitable for developers, researchers, and power users who want local or cloud deployments, LoRA training, or rapid iteration [8].
Klein (German for "small") is the fastest model family, generating and editing images in under one second. Available in 4B and 9B parameter sizes, klein is designed for real-time applications, rapid creative iteration, and deployment on consumer hardware. The 4B variant requires approximately 13 GB of VRAM, making it accessible on GPUs like the NVIDIA RTX 3090 and RTX 4070. It is released under the Apache 2.0 license [22].
Unlike previous generation models that required separate pipelines for generation and editing, FLUX.2 [klein] unifies text-to-image, single-reference editing, and multi-reference generation in one architecture.
In March 2026, BFL released FLUX.2 [klein] 9B-KV and its FP8 variant, which incorporate KV-cache optimization. By caching key-value pairs from reference images during the first denoising step, the KV variant eliminates redundant computation in subsequent steps, achieving up to 2.5 times faster inference for multi-reference editing tasks [24].
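The toy computation below illustrates the caching idea with plain attention arithmetic (not BFL's implementation): key/value projections for the fixed reference tokens are computed once before the denoising loop, so each later step only projects the changing target tokens.

```python
import torch

def attend(q, k, v):
    w = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return w @ v

d = 32
ref = torch.randn(64, d)              # reference-image tokens: fixed across steps
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

ref_k, ref_v = ref @ Wk, ref @ Wv     # cached once, before the denoising loop

for step in range(4):                 # e.g. klein's 4 distilled steps
    tgt = torch.randn(16, d)          # stand-in for the current noisy latent tokens
    q = tgt @ Wq
    k = torch.cat([ref_k, tgt @ Wk])  # reuse cached reference K/V every step
    v = torch.cat([ref_v, tgt @ Wv])
    out = attend(q, k, v)             # targets attend to reference tokens and themselves
```

The saving grows with the number of reference images, which is why the speedup is largest for multi-reference editing.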
One of FLUX's most praised capabilities is its ability to render legible, accurately spelled text within generated images. Text rendering has historically been one of the weakest aspects of diffusion-based image generators, with models like Midjourney and earlier versions of Stable Diffusion frequently producing garbled or misspelled text. FLUX handles text rendering with significantly higher accuracy, producing sharp, readable typography even at small sizes and in complex layouts [9].
FLUX generates highly photorealistic images with notably fewer artifacts in human anatomy, particularly hands and fingers, which have been a persistent challenge for image generation models. Comparative evaluations have found that FLUX maintains better anatomical consistency than Midjourney V6.1 and DALL-E 3, with fewer instances of extra fingers, deformed limbs, or distorted facial features [9].
The model demonstrates strong prompt following, accurately representing complex multi-object scenes, spatial relationships, and specific attributes described in text prompts. This capability is partly attributable to the dual-branch transformer architecture, which allows deep cross-modal attention between text and image features [9].
| Feature | FLUX.1 [pro] | FLUX.2 [pro] | Midjourney V6 | DALL-E 3 | Stable Diffusion XL |
|---|---|---|---|---|---|
| Parameters | 12B | 32B | Unknown (proprietary) | Unknown (proprietary) | ~3.5B |
| Text Rendering | Excellent | Excellent | Poor | Good | Poor |
| Photorealism | Excellent | State of the art | Excellent | Good | Good |
| Anatomy/Hands | Excellent | Excellent | Good (improved in V6.1) | Moderate | Moderate |
| Open Weights | Partial (schnell, dev) | Partial (dev, klein) | No | No | Yes |
| Local Deployment | Yes (schnell, dev) | Yes (dev, klein) | No | No | Yes |
| Training Approach | Flow matching | Flow matching | Diffusion | Diffusion | Diffusion |
| Architecture | DiT (12B transformer) | DiT + Mistral-3 VLM (32B) | Unknown | Unknown | U-Net |
| Max Resolution | 2K (Ultra: 4MP) | 4MP native | Unknown | 1024x1024 | 1024x1024 |
| Image Editing | Via Tools suite | Native (unified model) | Limited | Via DALL-E editor | Via extensions |
Flux's primary advantage over Midjourney and DALL-E 3 is the availability of open weights for the schnell, dev, and klein variants, enabling local deployment, fine-tuning, and community-driven extensions. Compared to Stable Diffusion XL, Flux offers substantially higher quality across all dimensions due to its much larger model size and more advanced architecture [9].
Black Forest Labs offers a credit-based API pricing system where 1 credit equals $0.01 USD. Pricing scales with model capability and, for some models, with output resolution [25].
| Model | Price per Image | Notes |
|---|---|---|
| FLUX.2 [klein] 4B | From $0.014 | Megapixel-based pricing |
| FLUX.2 [klein] 9B | From $0.015 | Megapixel-based pricing |
| FLUX.2 [pro] | From $0.03 (generation), $0.045 (editing) | Production-grade |
| FLUX.2 [flex] | $0.05 (generation), $0.10 (editing) | Tunable parameters |
| FLUX.2 [dev] | Free | Non-commercial local use |
| FLUX.1 Kontext [pro] | $0.04 | Context-aware editing |
| FLUX.1 Kontext [max] | $0.08 | Highest Kontext quality |
| FLUX1.1 [pro] | $0.04 | Standard generation |
| FLUX1.1 [pro] Ultra | $0.06 | 4MP high-resolution |
| FLUX.1 Fill [pro] | $0.05 | Inpainting/outpainting |
The same pricing applies for both API and Playground access. Batch requests multiply the base cost by the number of images requested. FLUX models are also available through numerous third-party platforms including Together AI, Replicate, Fal.AI, Cloudflare Workers AI, DeepInfra, Runware, and the NVIDIA NIM API catalog [6][25].
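A minimal request/poll sketch against the BFL API, with the caveat that the base URL, endpoint path, header name, and response fields below follow BFL's public documentation as best recalled and should be verified against the current docs before use.

```python
import os
import time

import requests

API = "https://api.bfl.ai"  # assumed base URL; confirm in BFL's documentation
headers = {"x-key": os.environ["BFL_API_KEY"]}

# submit an asynchronous generation task (endpoint name assumed from BFL docs)
task = requests.post(
    f"{API}/v1/flux-pro-1.1",
    headers=headers,
    json={"prompt": "a red fox in morning fog", "width": 1024, "height": 1024},
).json()

# poll until the image is ready, then print its URL
while True:
    res = requests.get(
        f"{API}/v1/get_result", headers=headers, params={"id": task["id"]}
    ).json()
    if res["status"] == "Ready":
        print(res["result"]["sample"])  # URL of the generated image
        break
    time.sleep(0.5)
```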
FLUX models achieved rapid adoption after their August 2024 launch, reaching 1 million API inferences in the first week and over 500,000 downloads on Hugging Face [4].
The open-weight FLUX models have fostered a large and active community, particularly around ComfyUI, the node-based visual workflow tool for diffusion models. ComfyUI provided day-one support for FLUX.1 Tools at their November 2024 launch and has continued to add support for new FLUX variants as they are released. In November 2025, NVIDIA highlighted FLUX.2 models as optimized for RTX GPUs and showcased ComfyUI workflows in its RTX AI Garage program [26].
Community-built tools like FluxGym simplify LoRA training for FLUX models, and Kohya_ss remains a widely used option for comprehensive LoRA training with support for 12 GB VRAM setups. FLUX LoRAs enable users to teach the model new concepts, characters, and styles, with trained weights easily integrated into existing ComfyUI workflows [27].
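Once trained, a FLUX LoRA loads into diffusers through the standard LoRA API; the repository name, weight filename, and trigger word below are placeholders.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# placeholder repo and filename for a community- or self-trained LoRA
pipe.load_lora_weights("your-name/your-flux-lora", weight_name="lora.safetensors")

# "sks_style" is a placeholder trigger word chosen at training time
image = pipe("a mountain village, sks_style", num_inference_steps=28).images[0]
image.save("lora_sample.png")
```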
On December 29, 2025, Fal.AI released FLUX.2 [dev] Turbo, a distilled LoRA adapter for FLUX.2 [dev] that enables high-quality image generation in just 8 inference steps (compared to 50 for the base model). The adapter uses a customized DMD2 distillation technique and was released on Hugging Face. This third-party contribution demonstrated the value of BFL's open-weight strategy in enabling community-driven optimization [28].
In October 2025, MLCommons selected Flux.1 as the new text-to-image benchmark for MLPerf Training v5.1, replacing Stable Diffusion v2 to reflect modern model architectures and scale. The 11.9-billion-parameter transformer-based model serves as a representative benchmark for current generative AI workloads. In the MLPerf Training v5.1 results released on November 12, 2025, NVIDIA set a record time-to-train of 12.5 minutes using 1,152 Blackwell GPUs [29].
Black Forest Labs uses a tiered licensing strategy across its model family [1]:
- Apache 2.0 for the fastest open models (FLUX.1 [schnell] and the FLUX.2 [klein] family), permitting unrestricted commercial and personal use;
- the BFL non-commercial license for the open-weight dev models (FLUX.1 [dev], FLUX.2 [dev]), with self-serve commercial licenses available separately; and
- proprietary, API-only distribution for the pro, flex, and max tiers.
This approach balances open-source community building (through the Apache-licensed models) with revenue generation (through the API-only professional variants).
Flux represents the technical evolution of ideas that originated in several key research papers:
- "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022), which moved generation into a compressed latent space;
- "Scalable Diffusion Models with Transformers" (Peebles and Xie, 2023), which introduced the DiT backbone that FLUX extends;
- the flow matching and rectified flow papers (Lipman et al., 2022; Liu et al., 2022), which replaced DDPM-style denoising with straightened noise-to-data transport; and
- "RoFormer" (Su et al., 2021), which introduced the rotary positional embeddings FLUX uses for resolution-robust position encoding.
| Date | Release |
|---|---|
| August 1, 2024 | Black Forest Labs launch; FLUX.1 [schnell], [dev], [pro] released |
| October 2, 2024 | FLUX1.1 [pro] and BFL API general availability |
| November 6, 2024 | FLUX1.1 [pro] Ultra and Raw modes |
| November 21, 2024 | FLUX.1 Tools (Fill, Depth, Canny, Redux) |
| January 2, 2025 | NVIDIA collaboration for performance optimization |
| January 16, 2025 | FLUX Pro Finetuning API launch |
| May 29, 2025 | FLUX.1 Kontext (Max, Pro, Dev) and BFL Playground |
| June 26, 2025 | FLUX.1 Kontext [dev] open weights |
| July 31, 2025 | FLUX.1 Krea [dev] (collaboration with Krea AI) |
| September 25, 2025 | FLUX.1 Kontext integration in Adobe Photoshop (beta) |
| November 25, 2025 | FLUX.2 series announcement (Pro, Flex, Dev) |
| December 1, 2025 | $300M Series B at $3.25B valuation |
| January 15, 2026 | FLUX.2 [klein] (4B and 9B) |
| January 2026 | FLUX.2 [max] |
| March 2026 | FLUX.2 [klein] 9B-KV and FP8 variants |
As of March 2026, Black Forest Labs has established itself as one of the leading companies in AI image generation, competing directly with Midjourney, OpenAI's DALL-E and GPT-Image, and Google's Imagen. The company's valuation of $3.25 billion, its $140 million partnership with Meta, and its integration into Adobe Photoshop underscore the commercial significance of its technology.
The FLUX model family spans two generations and offers capabilities ranging from sub-second generation on consumer hardware (FLUX.2 [klein]) to professional-grade 4-megapixel output with multi-reference support (FLUX.2 [max] and [pro]). The open-weight releases under Apache 2.0 have fostered a large community of developers building custom workflows, fine-tuned models, and integrations through platforms like ComfyUI and the broader Stable Diffusion ecosystem.
Black Forest Labs has also indicated ongoing development of a text-to-video model, positioning the company to compete in video generation alongside image generation.
Black Forest Labs represents a notable case of academic researchers successfully commercializing foundational AI research. The company's founders created the technology underlying Stable Diffusion, left Stability AI, and built a new company around the next generation of that same technology, achieving a multi-billion-dollar valuation within 18 months of founding.