Black Forest Labs

Updated Jun 21, 2026

Black Forest Labs (BFL) is a German-American artificial intelligence company founded in 2024 by Robin Rombach, Andreas Blattmann, Patrick Esser, and Dominik Lorenz, all of whom were key researchers behind the latent diffusion technology that powered Stable Diffusion. The company is best known for creating the FLUX family of text-to-image models, which rapidly became some of the most widely used image generation models in the industry. Black Forest Labs has raised over $430 million in total funding, reaching a valuation of $3.25 billion as of its December 2025 Series B round ^[1]^[2]. The company describes its mission as building "foundational infrastructure for visual intelligence, technology that transforms how the world is seen and understood" ^[22].

The founding of Black Forest Labs represents a notable case of core technical innovators leaving a company (Stability AI) to build a new venture based on their own foundational research. The FLUX models have been adopted by major platforms, including xAI's Grok chatbot for image generation, and in September 2025 Adobe integrated FLUX.1 Kontext Pro into Photoshop's generative fill tool ^[3]^[4]. The same month, Meta signed a multi-year contract with Black Forest Labs worth approximately $140 million, signalling that BFL's models had become strategically important for the largest social platforms as well ^[15].

Overview

Black Forest Labs operates from offices in Freiburg, Germany, and the United States. The company's name is a reference to the Black Forest region in southwestern Germany, close to where most of its founding researchers completed their PhDs. The labs concentrate on visual generative models: text-to-image generation, image-to-image editing, in-context editing, and, by late 2025, early development of text-to-video systems. The product line is structured around a tiered licensing approach. The most capable models are closed and offered through Black Forest Labs' own API. Below those sit guidance-distilled open-weight models for developers and researchers. At the bottom of the stack are small, fully permissive Apache 2.0 models such as FLUX.1 [schnell] and FLUX.2 [klein] that can run on a consumer GPU ^[1]^[8].

The combination of strong technical performance and a permissive bottom tier turned FLUX into the de facto successor to Stable Diffusion in the open-source community. Tools such as ComfyUI, the Hugging Face Diffusers library, Replicate, fal.ai, and Together AI added FLUX support within days of the August 2024 launch, and most major image generation tutorials and LoRA training guides published after late 2024 use FLUX rather than Stable Diffusion as the default base model ^[7]^[26]^[27]. FLUX.1 [dev] became the most-downloaded open image model on Hugging Face, accumulating tens of millions of downloads, and Black Forest Labs states that the open-weight FLUX models are the most widely deployed visual generation systems in the world ^[22].

The company has been profitable on a unit-economics basis since shortly after the launch of its API, helped by enterprise contracts with Adobe, Canva, Meta, and Snap. Combined enterprise contract value reached around $300 million by the end of 2025, according to Bloomberg ^[15]. Black Forest Labs' investors include Andreessen Horowitz, General Catalyst, NVIDIA, Salesforce Ventures, and Temasek, alongside angel investors who include Brendan Iribe, Garry Tan, Guillermo Rauch, Clem Delangue, and Mati Staniszewski.

Founders and origins

The latent diffusion paper

The story of Black Forest Labs begins with a research paper published in 2022: "High-Resolution Image Synthesis with Latent Diffusion Models," presented at CVPR 2022. The paper was authored by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer, all affiliated with Ludwig-Maximilians-Universität (LMU) Munich and Heidelberg University ^[5].

This paper introduced the concept of latent diffusion, a technique that applies the diffusion process in a compressed latent space rather than directly on pixel-level images. By operating in this lower-dimensional space, latent diffusion models could generate high-quality images at a fraction of the computational cost of previous diffusion approaches. The paper's core innovation enabled the creation of Stable Diffusion, which became the most widely used open-source AI image generation system in the world ^[5].

The CompVis group at LMU Munich, led by Björn Ommer, had already produced an influential earlier paper, "Taming Transformers for High-Resolution Image Synthesis," which introduced VQGAN. Patrick Esser was the lead author of that paper, and it became one of the technical building blocks behind both DALL-E Mini and the later latent diffusion work. By the time the latent diffusion paper appeared, the four researchers who would later co-found Black Forest Labs had been collaborating for several years on the same core problem of efficient high-resolution image synthesis with neural networks ^[5].

The Stability AI years

Stable Diffusion was released in August 2022 by Stability AI, a company founded by Emad Mostaque. The model was built directly on the latent diffusion research of Rombach and his colleagues from the CompVis group at LMU Munich. Four of the five original latent diffusion paper authors (Rombach, Blattmann, Esser, and Lorenz) joined Stability AI to continue developing the technology commercially ^[5]^[6].

At Stability AI, the team developed subsequent versions of Stable Diffusion, including Stable Diffusion 2.0 and Stable Diffusion XL (SDXL). Patrick Esser later led the work on Stable Diffusion 3, which introduced the multi-modal diffusion transformer (MM-DiT) architecture and rectified flow training. That technical direction would carry directly over to FLUX once the team left.

Stability AI experienced significant organizational and financial turbulence in 2023 and 2024. Reports surfaced of slow vendor payments, layoffs, and friction between Mostaque and the senior research staff. CEO Emad Mostaque resigned in March 2024 amid growing pressure, and the company subsequently went through several rounds of leadership transitions ^[6]. Robin Rombach and the other senior researchers behind Stable Diffusion 3 resigned around the same time, citing concerns about the company's ability to fund continued large-scale research.

The departure of the core technical team from Stability AI to found Black Forest Labs reflected a broader pattern in the AI industry where the researchers who develop key technologies often move on to build new companies, taking their expertise (though not their former employer's proprietary work) with them. The split was relatively clean from a legal perspective: the latent diffusion paper had been published as academic work at LMU Munich and Heidelberg, and the founders left Stability AI before founding the new company rather than spinning out from it.

Founding team

Founder	Role	Background
Robin Rombach	CEO	Lead author of the latent diffusion paper; PhD at LMU Munich and Heidelberg; studied physics at University of Heidelberg (2013-2020)
Andreas Blattmann	Co-founder	Co-author of latent diffusion paper; researcher at LMU Munich; contributed to video diffusion research and the Stable Video Diffusion model
Patrick Esser	Co-founder	Co-author of latent diffusion paper; lead author of VQGAN; led Stable Diffusion 3 research at Stability AI
Dominik Lorenz	Co-founder	Co-author of latent diffusion paper; researcher at LMU Munich; worked on video and 3D diffusion at Stability AI

All four founders had previously worked in the Computer Vision and Learning (CompVis) group at LMU Munich under the supervision of Björn Ommer. Their shared research background and years of collaboration provided a strong technical foundation for the new company ^[5].

A fifth notable figure connected to the company is Tim Brooks, formerly one of the co-leads of OpenAI's Sora video model. Brooks joined Black Forest Labs as an advisor in 2024, lending credibility to the company's stated long-term ambitions to expand from images into video. Björn Ommer, the founders' former PhD supervisor at LMU Munich, remained in academia but continued to publish related research and is widely seen as a member of the extended BFL technical community.

Funding

Black Forest Labs has raised capital rapidly, reflecting strong investor confidence in the team's technical capabilities and the commercial potential of their image generation technology.

Round	Date	Amount	Valuation	Lead Investors
Seed	August 2024	$31 million	~$150 million (post-money)	Andreessen Horowitz (a16z)
Series A	Late 2024	~$100 million	~$1 billion	Andreessen Horowitz
Series B	December 2025	$300 million	$3.25 billion	Salesforce Ventures, AMP (Anjney Midha)
Total		$430+ million	$3.25 billion

Seed round

The seed round of $31 million was announced alongside the company's public launch on August 1, 2024. In addition to a16z, the seed round drew investments from individuals including Brendan Iribe (Oculus co-founder), Michael Ovitz, and Garry Tan (Y Combinator CEO) ^[1]. The lead partner at Andreessen Horowitz, Anjney Midha, would remain closely involved with the company and later co-lead its Series B through his own firm, AMP, alongside Salesforce Ventures.

The seed round was sized aggressively for a company that had not yet shipped a product. According to reporting by TechCrunch, the founders' technical reputations and the visible demand for higher-quality open-source image models made the round oversubscribed within days of the founding pitch ^[1].

Series A

By September 2024, reports emerged that Black Forest Labs was seeking to raise approximately $100 million at a $1 billion valuation, representing a dramatic jump from the $150 million post-money valuation of the seed round. The Series A was led by Andreessen Horowitz, with participation from BroadLight Capital, Creandum, Earlybird VC, General Catalyst, Northzone, and NVIDIA ^[3]. The round was not publicly confirmed at the time but was later disclosed alongside the Series B in December 2025.

The rapid step-up from a $150 million post-money valuation to a $1 billion valuation in roughly six weeks was driven by the August launch of FLUX.1 and the August 13 integration with xAI's Grok, which sharply increased traffic and demand for the models.

Series B

The December 2025 Series B of $300 million at a $3.25 billion valuation was co-led by Salesforce Ventures and AMP, with additional participation from a16z, NVIDIA, General Catalyst, Temasek, Air Street Capital, Bain Capital Ventures, Canva, Figma Ventures, Adobe Ventures, Samsung Next, and Lux Capital. Notable angel investors in the round included Guillermo Rauch (Vercel CEO), Clem Delangue (Hugging Face CEO), and Mati Staniszewski (ElevenLabs CEO) ^[2]. The round closed on December 1, 2025.

Reports during the fundraising process indicated that Black Forest Labs had also been in discussions with several sovereign wealth funds, including Saudi Arabia's Public Investment Fund (PIF). The Series B announcement did not name PIF as a participant, but the round's combination of strategic investors (Adobe Ventures, Canva, Figma Ventures, Samsung Next) reflected the breadth of platforms that had come to depend on FLUX models for their own products.

The Series B brought Black Forest Labs' total funding to more than $430 million, less than 18 months after the company emerged from stealth ^[2]^[32]. As of the round, the company reported a revenue run rate in the tens of millions of dollars per year, driven primarily by API usage and enterprise contracts.

FLUX.1 family launch

FLUX.1 was released alongside the company's public launch on August 1, 2024. The model family initially comprised three variants, each targeting different use cases and licensing requirements ^[7].

Variant	Parameters	License	Steps	Availability
FLUX.1 [schnell]	12 billion	Apache 2.0	1 to 4	Open weights (Hugging Face)
FLUX.1 [dev]	12 billion	BFL non-commercial	20 to 50	Open weights (Hugging Face)
FLUX.1 [pro]	12 billion	Proprietary	Varies	API only

FLUX.1 [schnell]

Schnell (German for "fast") is the speed-optimized variant, capable of generating images in just 1 to 4 inference steps. It is released under the Apache 2.0 license, making it fully open for commercial and personal use. It can run on GPUs with as little as 12 GB of VRAM, making it accessible on consumer hardware. While it produces slightly lower-fidelity images than the dev or pro variants, the quality is high for the extremely low step count ^[7]. Schnell is widely used in real-time and interactive applications because its low latency allows generation to feel like a direct manipulation rather than a batch process.

FLUX.1 [dev]

The dev variant is a guidance-distilled version of FLUX.1 [pro], offering higher quality than schnell at the cost of requiring 20 to 50 inference steps (with 30 to 40 recommended for optimal results). It is released as source-available software under a non-commercial license, although users can obtain a self-serve commercial license from BFL. The dev model produces noticeably better skin textures, lighting effects, and fine details compared to schnell ^[7]. Most LoRA training in the community is done against FLUX.1 [dev] rather than the pro variant, since the dev weights are downloadable.

FLUX.1 [pro]

The professional variant is available exclusively through BFL's API and through partner platforms. It offers the highest image quality in the initial FLUX.1 lineup, with superior prompt adherence, photorealistic rendering, and fine detail work. It is the only variant whose weights are not publicly distributed ^[7]. Pro is typically priced at around $0.05 per image through the API, though pricing varies by partner platform.

Reception of the initial launch

Upon release, FLUX.1 quickly demonstrated state-of-the-art performance. The models outperformed Midjourney 6.1, DALL-E 3, and Stable Diffusion XL on multiple evaluation metrics, including visual quality, prompt adherence, and text rendering within images ^[7]. Within the first week, the FLUX.1 GitHub repository accumulated thousands of stars, the dev weights on Hugging Face crossed half a million downloads, and BFL's API processed over a million inference requests, according to the company's own data and tracking on Hugging Face ^[4].

A particular point of attention was anatomy. Image models had historically struggled with hands, faces, and complex multi-person scenes. FLUX.1 [dev] showed visibly better hand and finger rendering than SDXL, Midjourney V6, and DALL-E 3, which became a frequent talking point in early review threads on Reddit's r/StableDiffusion community and on X (formerly Twitter). Text rendering inside images was also markedly better, with FLUX able to produce short legible signs and labels that earlier models would have mangled.

FLUX 1.1 line

FLUX1.1 [pro]

Released on October 2, 2024 alongside the general availability of the BFL API, FLUX1.1 [pro] was a major upgrade that generated images six times faster than the original FLUX.1 [pro] while simultaneously improving image quality, prompt adherence, and output diversity. The model generates photorealistic images in approximately 4.5 seconds. It was submitted to the Artificial Analysis image arena under the codename "blueberry" and achieved the highest overall Elo score of any model on the leaderboard at the time of its debut ^[8].

FLUX1.1 [pro] supports high-resolution generation up to roughly 2K (2048 by 2048 pixels) without sacrificing quality, and it introduced improved handling of text rendering, complex multi-object scenes, and human anatomy. The release coincided with the general availability of api.bfl.ai, which up to that point had been in private beta.

FLUX1.1 [pro] Ultra

On November 6, 2024, BFL released FLUX1.1 [pro] Ultra mode. Ultra extends FLUX's capability to generate images at four times the resolution of the standard FLUX1.1 [pro], producing 4-megapixel images (up to 2752 by 1184 pixels) in about 10 seconds. Internal benchmarks showed Ultra was over 2.5 times faster than comparable high-resolution offerings from competitors. Ultra is priced at $0.06 per image through the BFL API ^[16].

FLUX1.1 [pro] Raw

Released alongside Ultra, Raw mode captures the genuine feel of candid photography, producing images with a less synthetic, more natural aesthetic. It significantly increases diversity in human subjects and enhances the realism of nature photography, addressing the common criticism that AI-generated images can look overly polished or "plastic." Raw mode is available as a toggle on both the standard and Ultra variants ^[16]. Raw output is noticeably less "airbrushed" than the default and often resembles smartphone photography or stock photo libraries rather than the highly stylised look common to other models.

FLUX.1 Tools

On November 21, 2024, BFL released FLUX.1 Tools, a suite of editing capabilities designed to extend the core FLUX models ^[17].

Tool	Function	Availability
FLUX.1 Fill	Inpainting and outpainting with text-guided editing	Pro (API) and Dev (open weights)
FLUX.1 Depth	Structural guidance based on depth maps from input images	Pro (API) and Dev (open weights)
FLUX.1 Canny	Structural guidance based on canny edge detection from input images	Pro (API) and Dev (open weights)
FLUX.1 Redux	Adapter for mixing and recreating input images with text prompts	Pro (API) and Dev (open weights)

Each tool was released as a FLUX.1 [pro] variant through the API and as a guidance-distilled open-access FLUX.1 [dev] variant with inference code and weights on Hugging Face. FLUX.1 Fill [pro] achieved state-of-the-art results in inpainting benchmarks at the time of release. FLUX.1 Canny and Depth provide ControlNet-style structural conditioning, enabling precise control over the spatial layout and structure of generated images. Redux supports image variations and remixing, allowing a single reference image to be re-generated with different prompts or in different styles ^[17].

The combination of Fill, Depth, Canny, and Redux gave FLUX feature parity with the most popular ControlNet workflows that had developed around SDXL, and most ComfyUI users who had been running ControlNet pipelines migrated to FLUX Tools within a few weeks of the November 2024 release.

FLUX Pro Finetuning API

On January 16, 2025, BFL launched the FLUX Pro Finetuning API, enabling users to customize FLUX.1 [pro] with their own images and concepts. The system requires as few as 1 to 5 example images to create a targeted customization. In user studies conducted by BFL, finetuning results were preferred 68.9% of the time over other available finetuning services using FLUX.1 [dev] ^[18].

Once a finetune is created, it can be applied across the entire FLUX.1 model suite without additional adaptation, including FLUX.1 [pro], FLUX1.1 [pro], and the complete FLUX.1 Tools suite. This enables customized content generation with resolutions up to 4 megapixels, customized inpainting with FLUX.1 Fill, and customized structural control with FLUX.1 Depth ^[18]. The finetuning API is used heavily by enterprise customers who want consistent brand styles, characters, or product likenesses without managing their own training infrastructure.

FLUX Kontext

FLUX.1 Kontext, released on May 29, 2025, represented a new direction for the model family. Rather than purely text-to-image generation, Kontext enables in-context image generation and editing, allowing users to prompt with both text and images. The model can extract and modify visual concepts from input images to produce new, coherent renderings ^[4]. In effect, Kontext is BFL's answer to GPT-Image and Gemini Flash Image, the multimodal image models from OpenAI and Google that arrived around the same time.

Kontext variant	Focus	Availability
Kontext [max]	Highest quality; iterative image modification	API ($0.08 per image)
Kontext [pro]	Balanced quality and speed	API ($0.04 per image)
Kontext [dev]	Non-commercial research	Open weights (Hugging Face)
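
As a rough illustration of how the per-image prices quoted in this article translate into a budget, the sketch below multiplies an image count by the listed price. The price table simply copies the figures given in the text; the function name and dictionary keys are made up for this example, and real pricing may differ by partner platform.

```python
# Per-image API prices in USD as quoted in this article (not a live price list).
PRICE_PER_IMAGE = {
    "flux.1-pro": 0.05,         # described as "around $0.05 per image"
    "flux1.1-pro-ultra": 0.06,
    "kontext-pro": 0.04,
    "kontext-max": 0.08,
}


def estimate_cost(model: str, images: int) -> float:
    """Return the estimated cost in USD for `images` generations or edits."""
    if images < 0:
        raise ValueError("images must be non-negative")
    return round(PRICE_PER_IMAGE[model] * images, 2)


if __name__ == "__main__":
    for name in ("kontext-pro", "kontext-max"):
        print(f"{name}: 1,000 edits cost about ${estimate_cost(name, 1000):.2f}")
```

Keeping the prices in one dictionary makes it easy to swap in a partner platform's rates without touching the calculation.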

Common use cases include character consistency across multiple generations, style transfer, object replacement, garment changes, and iterative editing without requiring fine-tuning or complex multi-step workflows. Kontext was the first BFL model that natively supported "edit this image" instructions in plain English, in contrast to the earlier ControlNet-style workflows that required separate masks or conditioning images.

BFL reported that Kontext models deliver inference speeds up to 8 times faster than competing context-aware image editing models such as GPT-Image. The benchmark comparisons were independently reviewed by several third parties, including the Artificial Analysis benchmark team, who corroborated the speed advantage at comparable quality ^[4].

In September 2025, Adobe announced that FLUX.1 Kontext Pro was available as a model option for Photoshop's generative fill tool in beta, marking a significant validation from the professional creative tools industry ^[4]^[12]. Kontext also became the default image-editing backend for several smaller design tools, including Magnific by Freepik.

On June 26, 2025, BFL released FLUX.1 Kontext [dev] as open weights, allowing the community to run the model locally and integrate it into custom workflows. The release made Kontext the first open-weight model with general-purpose instruction-based image editing capabilities at this quality level ^[19].

FLUX Krea

On July 31, 2025, BFL released FLUX.1 Krea [dev], a model developed in collaboration with Krea AI. FLUX.1 Krea [dev] is a 12-billion-parameter rectified flow transformer that was specifically trained to overcome the oversaturated "AI look" common in text-to-image models, achieving higher photorealism with a distinctive aesthetic approach ^[20].

The model is the open-weights version of Krea 1, offering strong performance with highly distinctive aesthetics and exceptional realism. It scored 1011 Elo in human evaluation tests, outperforming other open-source FLUX models and approaching the quality of premium models like FLUX1.1 [pro]. It was released under a non-commercial license with weights available on Hugging Face ^[20]. The Krea collaboration was unusual in that BFL allowed an outside team to train an opinionated, stylistically distinctive checkpoint on top of FLUX and distribute it as part of the official FLUX family, rather than as a third-party finetune.

Krea AI itself is a creative tooling company that operates a real-time image generation product. The two companies' collaboration started in late 2024 and continued through 2025, with the Krea team focused on training data curation and aesthetic tuning while the BFL team handled the underlying model training infrastructure.

FLUX.2 family

On November 25, 2025, Black Forest Labs announced FLUX.2, the second major generation of the model family. The release included several variants, with additional models rolling out through early 2026 ^[9]. FLUX.2 introduced headline capabilities that set it apart from FLUX.1: support for up to 10 reference images in a single request, native output up to 4 megapixels, and prompts of up to 32,000 input tokens ^[22]. Black Forest Labs framed the release around production use, stating that "FLUX.2 is designed for real-world creative workflows, not just demos or party tricks" ^[22].

Model	Parameters	Text encoder	License	Release date	Key features
FLUX.2 [max]	32B	Mistral-3 24B VLM	Proprietary (API)	January 2026	Highest quality, grounded generation with web context
FLUX.2 [pro]	32B	Mistral-3 24B VLM	Proprietary (API)	November 2025	Production-grade, multi-reference support
FLUX.2 [flex]	32B	Mistral-3 24B VLM	Proprietary (API)	November 2025	Tunable parameters, typography specialist
FLUX.2 [dev]	32B	Mistral-3 24B VLM	BFL non-commercial	November 2025	Open weights, LoRA training
FLUX.2 [klein] 9B	9B	Qwen3 8B	Apache 2.0	January 15, 2026	Sub-second generation, consumer hardware
FLUX.2 [klein] 4B	4B	Qwen3 8B	Apache 2.0	January 15, 2026	Smallest model, ~13 GB VRAM
FLUX.2 VAE	n/a	n/a	Apache 2.0	November 2025	Variational autoencoder
FLUX.2 [klein] 9B-KV	9B	Qwen3 8B	Apache 2.0	March 2026	KV-cache for 2.5x faster multi-reference editing

The FLUX.2 variational autoencoder was released as open-source software under the Apache 2.0 license, allowing the community to build on the model's image encoding and decoding capabilities. The full set of FLUX.2 variants spans roughly an eightfold range in parameter count (4B to 32B) and a correspondingly wide range of compute requirements, giving developers a broad set of price/quality options ^[9].
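
To make the size range concrete, here is a back-of-the-envelope estimate of the memory needed just to hold each flow transformer's weights. It assumes 2 bytes per parameter for bf16 and 1 byte for FP8, and it ignores the text encoder, the VAE, and activations, so real VRAM use will be higher than these figures. The function and dictionary names are illustrative only.

```python
# Assumed storage cost per parameter for two common inference dtypes.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

# Transformer sizes in billions of parameters, taken from the table above.
FLUX2_TRANSFORMERS = {"klein 4B": 4, "klein 9B": 9, "dev / pro / flex / max": 32}


def weight_memory_gb(params_billions: float, dtype: str = "bf16") -> float:
    """Approximate GB (10**9 bytes) needed to store the weights alone."""
    return params_billions * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name, billions in FLUX2_TRANSFORMERS.items():
        bf16 = weight_memory_gb(billions)
        fp8 = weight_memory_gb(billions, "fp8")
        print(f"{name}: ~{bf16:.0f} GB in bf16, ~{fp8:.0f} GB in fp8")
```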

Architectural changes in FLUX.2

FLUX.2 made two significant architectural changes compared with FLUX.1. First, the model scales to 32 billion parameters in the largest variant. Second, it replaces the dual T5 plus CLIP text encoder system with a Mistral-3 24B vision-language model. By coupling a VLM trained on a large corpus of interleaved text and images with the rectified flow transformer, FLUX.2 has more grounded "world knowledge" than its predecessors, enabling better understanding of real-world concepts, spatial relationships, and material properties ^[9]. The smaller klein variants pair a 9B or 4B flow transformer with a Qwen3 8B text embedder instead of Mistral.

Is FLUX.2 open source?

FLUX.2 continues the company's tiered approach to openness. The 32-billion-parameter FLUX.2 [dev] variant is distributed with open weights under the BFL non-commercial license, and Black Forest Labs describes it as "the most powerful open-weight image generation and editing model available today," combining text-to-image synthesis and multi-image editing in a single checkpoint ^[22]^[31]. The fully permissive end of the FLUX.2 line, the Apache 2.0 FLUX.2 [klein] models and the FLUX.2 VAE, can be used and redistributed commercially without a separate license. The proprietary [pro], [flex], and [max] variants remain API-only.

FLUX.2 [klein]

Klein (German for "small") is the fastest model family in the FLUX lineup, generating and editing images in under one second on modern hardware. Available in 4B and 9B parameter sizes, klein is designed for real-time applications, rapid creative iteration, and deployment on consumer hardware. The 4B variant requires approximately 13 GB of VRAM, making it accessible on GPUs like the NVIDIA RTX 3090 and RTX 4070. It is released under the Apache 2.0 license ^[22].

Unlike previous-generation models that required separate pipelines for generation and editing, FLUX.2 [klein] unifies text-to-image, single-reference editing, and multi-reference generation in one architecture. In March 2026, BFL released FLUX.2 [klein] 9B-KV and its FP8 variant, which incorporate KV-cache optimization. By caching key-value pairs from reference images during the first denoising step, the KV variant eliminates redundant computation in subsequent steps, achieving up to 2.5 times faster inference for multi-reference editing tasks ^[24].
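
The caching idea can be sketched in a few lines. This is a toy model, not BFL's implementation: `project_kv` stands in for a block's key/value projection, tokens are plain numbers, and the sketch assumes the reference tokens' keys and values do not change between denoising steps. It only counts how many tokens get projected with and without the cache; all names are hypothetical.

```python
from typing import Dict, List, Tuple

KV = Tuple[List[float], List[float]]
calls = {"projected_tokens": 0}


def project_kv(tokens: List[float]) -> KV:
    """Stand-in for a key/value projection; counts the tokens it processes."""
    calls["projected_tokens"] += len(tokens)
    return [2.0 * t for t in tokens], [t + 1.0 for t in tokens]


def run_steps(reference: List[float], latent: List[float],
              steps: int, use_cache: bool) -> int:
    """Run toy denoising steps and return how many tokens were projected."""
    calls["projected_tokens"] = 0
    cache: Dict[str, KV] = {}
    for _ in range(steps):
        if use_cache:
            if "ref" not in cache:          # first step: compute and store
                cache["ref"] = project_kv(reference)
            ref_k, ref_v = cache["ref"]     # later steps: reuse
        else:
            ref_k, ref_v = project_kv(reference)
        lat_k, lat_v = project_kv(latent)   # latents change every step
        keys, values = ref_k + lat_k, ref_v + lat_v  # what attention would read
        assert len(keys) == len(values) == len(reference) + len(latent)
        latent = [0.9 * x for x in latent]  # placeholder "denoising" update
    return calls["projected_tokens"]


if __name__ == "__main__":
    ref, lat = [1.0] * 3000, [0.5] * 1000
    print("no cache:", run_steps(ref, lat, 4, use_cache=False))
    print("cached:  ", run_steps(ref, lat, 4, use_cache=True))
```

In this simplified cost model the saving grows with the number of reference tokens and the number of steps. It is not meant to reproduce the reported 2.5x figure, which depends on the full model and hardware.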

Architecture and methodology

The FLUX models represent a significant architectural departure from the U-Net-based architecture used in Stable Diffusion 1.x, 2.x, and SDXL. Instead, FLUX builds on two key technical innovations: the Diffusion Transformer (DiT) architecture and flow matching with rectified flow ^[7]^[10].

Traditional diffusion models for image generation, including all early versions of Stable Diffusion, used U-Net architectures for the denoising network. The DiT approach replaces the U-Net with a transformer architecture, bringing the scalability advantages of transformers (which had proven so effective in language modeling) to image generation. The DiT framework was introduced in the 2023 paper "Scalable Diffusion Models with Transformers" by William Peebles and Saining Xie ^[10].

FLUX uses a multimodal DiT variant called MM-DiT (Multimodal Diffusion Transformer), originally introduced in Patrick Esser et al.'s "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis," which was published by Stability AI researchers in early 2024. The architecture processes text and image information through parallel streams that interact at multiple points during the generation process. The FLUX.1 architecture consists of 57 total transformer blocks divided into two types:

19 dual-branch (double-stream) blocks keep separate weights for the text and image tokens, allowing each modality to maintain its own representation while the two streams interact through joint attention over the combined token sequence.
38 single-branch (single-stream) blocks concatenate the text and image embeddings and process them as a unified sequence with shared weights.

This hybrid design allows the model to maintain modality-specific processing in the early layers (where text and image features are quite different) while enabling deep integration in the later layers (where the model needs to tightly coordinate text semantics with visual content) ^[7]^[10].
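
The layout can be written down as a small schedule. The block counts come from the text above; the token arithmetic is an assumption for illustration (an 8x-downsampling autoencoder, 2x2 latent patches, and a 512-token T5 prompt), and the class and function names are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    kind: str   # "double": separate text/image weights; "single": shared weights
    index: int


def flux1_schedule(double: int = 19, single: int = 38) -> List[Block]:
    """Ordered block list: all double-stream blocks, then all single-stream."""
    doubles = [Block("double", i) for i in range(double)]
    singles = [Block("single", i) for i in range(single)]
    return doubles + singles


def image_tokens(height: int, width: int,
                 vae_factor: int = 8, patch: int = 2) -> int:
    """Image token count, assuming 8x VAE downsampling and 2x2 patches."""
    step = vae_factor * patch
    if height % step or width % step:
        raise ValueError(f"height and width must be multiples of {step}")
    return (height // step) * (width // step)


if __name__ == "__main__":
    blocks = flux1_schedule()
    txt, img = 512, image_tokens(1024, 1024)
    print(len(blocks), "blocks in total")
    print(f"double-stream blocks: {txt} text and {img} image tokens, separate weights")
    print(f"single-stream blocks: one sequence of {txt + img} tokens, shared weights")
```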

Flow matching with rectified flow

FLUX uses rectified flow matching rather than the traditional DDPM (Denoising Diffusion Probabilistic Models) approach used in Stable Diffusion 1.x and 2.x. Flow matching is a more general framework for training generative models that includes diffusion as a special case ^[10].

In traditional diffusion models, the generation process involves gradually adding Gaussian noise to an image during training (the forward process) and then learning to reverse this noising process step by step (the reverse process). Flow matching takes a conceptually simpler approach: it learns a deterministic vector field that transforms samples from a simple noise distribution directly to the target data distribution along an optimal transport path. Rectified flow, in particular, encourages linear denoising trajectories: the model learns to transform noise into images along straighter paths in the generative process. This property enables more efficient sampling, since each step makes more progress along the generation trajectory. Together with step distillation, it helps explain why FLUX.1 [schnell] can produce good results in as few as one to four steps, compared with the 20 to 50 steps typically needed by older Stable Diffusion models ^[7]^[10].
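
A one-dimensional toy makes the link between straight paths and few sampling steps visible. The sketch integrates a velocity field with fixed-step Euler, once for a constant velocity (a perfectly straight path) and once for a bent path with the same endpoints. The convention that t = 0 is noise and t = 1 is data, the scalar "image", and all names are assumptions of this example; a real model learns the velocity field rather than being handed it.

```python
from typing import Callable

Velocity = Callable[[float, float], float]


def euler_sample(x0: float, velocity: Velocity, steps: int) -> float:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += velocity(x, k * dt) * dt
    return x


noise, image = -1.5, 2.0      # 1-D stand-ins for a noise sample and a data sample
delta = image - noise


def straight(x: float, t: float) -> float:
    """Straight path x_t = (1 - t) * noise + t * image has constant velocity."""
    return delta


def curved(x: float, t: float) -> float:
    """Bent path x_t = noise + delta * t**2 has a time-varying velocity."""
    return 2.0 * delta * t


if __name__ == "__main__":
    for steps in (1, 4, 50):
        s = euler_sample(noise, straight, steps)
        c = euler_sample(noise, curved, steps)
        print(f"{steps:>2} steps: straight -> {s:.3f}, curved -> {c:.3f}")
```

With the straight field, a single Euler step already lands on the target, while the curved field needs many steps to get close. That is the intuition behind training for straighter trajectories.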

Text encoders

FLUX.1 uses two text encoders working in tandem to process input prompts ^[21]:

CLIP L/14 trained on hundreds of millions of image-text pairs. The CLIP text encoder captures rich semantic information and maps text into a shared latent space with image embeddings. It has a fixed token limit of 77 tokens.
T5-v1.1-XXL, a large language model developed by Google that provides rich, token-level semantic representations for long and complex textual prompts. T5 can handle up to 512 tokens (256 on the schnell variant), enabling much more detailed prompt descriptions than CLIP alone.

This combination allows FLUX to interpret complex scene descriptions with high fidelity, leveraging CLIP's visual-semantic alignment alongside T5's deep language understanding. FLUX.2 replaces this dual-encoder system with the Mistral-3 24B vision-language model, which acts as a unified encoder for both text and reference images.
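
The practical effect of the two limits can be illustrated with a toy truncation. Whitespace splitting stands in for the real CLIP and T5 tokenizers, which produce subword tokens and add special tokens, so the counts are only indicative. The function and dictionary names are hypothetical.

```python
from typing import Dict, List

# Token limits quoted above; "t5_schnell" is the shorter limit on FLUX.1 [schnell].
TOKEN_LIMITS: Dict[str, int] = {"clip": 77, "t5": 512, "t5_schnell": 256}


def toy_tokenize(prompt: str) -> List[str]:
    """Whitespace split as a stand-in for a real subword tokenizer."""
    return prompt.split()


def truncate_for(encoder: str, prompt: str) -> List[str]:
    """Keep only as many toy tokens as the named encoder accepts."""
    return toy_tokenize(prompt)[: TOKEN_LIMITS[encoder]]


if __name__ == "__main__":
    long_prompt = " ".join(f"word{i}" for i in range(300))
    for name in TOKEN_LIMITS:
        kept = len(truncate_for(name, long_prompt))
        print(f"{name} sees {kept} of 300 toy tokens")
```

A long, detailed prompt is cut off early on the CLIP side, which is why the T5 branch carries most of the fine-grained prompt detail.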

Additional technical features

Feature	Description	Benefit
Rotary Position Embeddings (RoPE)	Position encoding via rotation matrices	Better handling of varying image resolutions
Parallel attention layers	Concurrent computation of attention and feedforward	Improved hardware efficiency
16-channel latent space	Compressed representation with more channels than SDXL	Captures more nuance in textures and lighting
12 billion (FLUX.1) and 32 billion (FLUX.2) parameters	Large model capacity	Strong generation quality

The combination of these architectural choices results in a model that generates higher-quality images, handles text rendering better (a longstanding weakness of diffusion models), and runs more efficiently than previous approaches ^[7]^[10].
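As an illustration of the first row in the table, the following is a minimal one-dimensional sketch of rotary position embeddings. It assumes the common formulation with a frequency base of 10000 and toy four-dimensional vectors; FLUX applies the idea across image axes, which this sketch does not reproduce. It shows the property that makes RoPE useful: the attention score between a rotated query and key depends only on their relative offset.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)  # lower frequency for later pairs
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical query and key vectors.
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

near = dot(rope(q, 5), rope(k, 3))        # positions 5 and 3, offset 2
shifted = dot(rope(q, 12), rope(k, 10))   # positions 12 and 10, same offset
other = dot(rope(q, 5), rope(k, 4))       # offset 1, a different score
print(near, shifted, other)
```

Because the score depends on relative rather than absolute position, the same learned attention patterns can transfer to token grids of different sizes, which is consistent with the resolution benefit listed in the table.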

Relationship to Stable Diffusion

The relationship between Black Forest Labs and Stable Diffusion is central to understanding the company's position in the AI image generation landscape.

The foundational latent diffusion research was conducted at LMU Munich and Heidelberg University, funded by academic grants and the German Research Foundation (DFG). When Stability AI commercialized this research as Stable Diffusion in 2022, the core researchers (Rombach, Blattmann, Esser, Lorenz) joined the company ^[5]^[6].

Stability AI funded further development of the technology, producing Stable Diffusion 2.0, SDXL, and Stable Diffusion 3. However, as Stability AI encountered financial difficulties and leadership turmoil in 2023 and 2024, the technical team departed to start Black Forest Labs. The FLUX models represent a clean break architecturally (using DiT and flow matching instead of U-Net and DDPM), but the intellectual lineage from latent diffusion through Stable Diffusion to FLUX is direct and continuous ^[6].

Aspect	Stable Diffusion (1.x / 2.x / XL)	FLUX
Architecture	U-Net based	Diffusion Transformer (DiT)
Training method	DDPM	Rectified flow matching
Text encoder	CLIP (1.x), OpenCLIP (2.x), CLIP + OpenCLIP (XL)	T5 + CLIP (FLUX.1), Mistral-3 (FLUX.2)
Developer	Stability AI	Black Forest Labs
Core researchers	Rombach, Blattmann, Esser, Lorenz	Same (now at BFL)
Image quality	Good (progressive improvement)	State of the art at release
Text rendering	Weak in 1.x/2.x; better in SD3	Strong from FLUX.1 onwards
Generation speed	20 to 50 steps typical	1 to 4 steps (schnell), 20 to 50 (pro)

Stability AI continued to develop its own models after the departure of the founding researchers, releasing Stable Diffusion 3 and subsequent versions. However, FLUX models have generally been regarded as technically superior, leading to a situation where the spiritual successors to Stable Diffusion now compete against it ^[6]. Many community tools have effectively switched their default model to FLUX, even though they remain backwards-compatible with Stable Diffusion checkpoints.

Commercial partnerships

Black Forest Labs has secured several high-profile commercial partnerships that have driven adoption of FLUX models.

xAI and Grok

Elon Musk's xAI launched image generation in the Grok chatbot on August 13, 2024, less than two weeks after BFL emerged from stealth. The image generation backend was FLUX.1, accessed via Black Forest Labs' API. This partnership brought significant visibility to Black Forest Labs and demonstrated the models' production readiness for high-traffic consumer applications ^[3]^[10].

The rollout was, by any standard, chaotic. Grok's image generation feature was launched without strong content filters: users on X were quickly able to generate fairly realistic images of celebrities, politicians, and copyrighted characters, including Donald Trump, Kamala Harris, Taylor Swift, Mickey Mouse, and many others, in scenarios that ranged from satirical to clearly defamatory. News coverage during August 2024 in The Verge, Wired, Bloomberg, and Reuters focused on the looseness of the safety pipeline and the implications for political deepfakes ahead of the 2024 US presidential election ^[10].

Black Forest Labs and xAI later tightened the content filters significantly. xAI also switched its primary image generation backend to its own Aurora model in December 2024, although FLUX continued to be available as an option inside Grok for some time after that.

Adobe

In September 2025, Adobe integrated FLUX.1 Kontext Pro into the generative fill tool in Photoshop (beta). This partnership is particularly significant because it places FLUX technology in one of the world's most widely used creative software suites, alongside Adobe's own Firefly models. Photoshop users can choose between Firefly and FLUX.1 Kontext for generative fill, with FLUX.1 Kontext often producing more photorealistic results while Firefly retains a clearer commercial-licensing position ^[4]^[12].

Other platforms

Black Forest Labs operates its own API (api.bfl.ai) for direct access to FLUX models. The models are also available through third-party platforms including Replicate, fal.ai, Together AI, Cloudflare Workers AI, DeepInfra, Runware, and the NVIDIA NIM API catalog ^[6]^[7]. They are further offered through Hugging Face Inference Endpoints and Microsoft Azure AI Foundry, the latter through a partnership announced in August 2025 ^[19].

Freepik adopted FLUX as the backend for its AI Image Generator and Magnific image upscaler, scaling to millions of requests per day through specialized infrastructure partnerships with DataCrunch and WaveSpeed AI ^[11]. Krea AI used FLUX as the basis for its real-time generation product and later collaborated on the official FLUX.1 Krea [dev] release.

Reception and benchmarks

FLUX models have consistently ranked at or near the top of independent image generation benchmarks. The Artificial Analysis image arena, lmarena.ai, and various academic evaluations have all placed FLUX models among the leading text-to-image systems since the August 2024 launch.

Artificial Analysis

FLUX1.1 [pro] launched in October 2024, having entered the Artificial Analysis image arena under the codename "blueberry" and topped its Elo leaderboard. It outperformed Midjourney 6.1, Ideogram v2, and Stable Diffusion 3 in both visual fidelity and prompt accuracy. FLUX1.1 [pro] Ultra later led the leaderboard in a similar way at higher resolutions ^[8].

lmarena.ai image arena

On lmarena.ai, the same crowdsourced evaluation site behind the Chatbot Arena for text models, FLUX models have consistently appeared in the top three by Elo score since their launch. FLUX1.1 [pro] held the top position for several months in late 2024 before being overtaken in some categories by Google's Imagen 3 and OpenAI's GPT-Image, with FLUX.2 [pro] reclaiming much of the lead following its November 2025 release.
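For readers unfamiliar with Elo scores, the sketch below shows the classic pairwise Elo update that such leaderboards are based on. It is an illustration only: the K-factor of 32, the starting ratings, and the function names are assumptions, and arena operators may fit ratings with related statistical models rather than this exact online update.

```python
def expected_score(rating_a, rating_b):
    """Probability that model A is preferred over model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, a_won, k=32.0):
    """Return both ratings after one head-to-head vote."""
    e = expected_score(rating_a, rating_b)
    s = 1.0 if a_won else 0.0
    delta = k * (s - e)
    # The winner gains exactly what the loser gives up.
    return rating_a + delta, rating_b - delta

# Two hypothetical models start level; a voter prefers the first image.
print(update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

A win against a higher-rated model moves the ratings more than a win against a lower-rated one, which is how a newly entered model can climb quickly after a run of preferred images.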

MLPerf benchmark

In October 2025, MLCommons selected FLUX.1 as the new text-to-image benchmark for MLPerf Training v5.1, replacing Stable Diffusion v2 to reflect modern model architectures and scale. The 11.9-billion-parameter transformer-based model serves as a representative benchmark for current generative AI workloads. In the MLPerf Training v5.1 results released on November 12, 2025, NVIDIA set a record time-to-train of 12.5 minutes using 1,152 Blackwell GPUs ^[29].

Specific strengths in benchmarks

Capability	FLUX position	Notes
Text rendering	Best in class	Legible spelling at small sizes, accurate typography
Photorealism (humans)	Best in class	Skin texture, lighting, eyes notably better than SDXL and Midjourney V6
Anatomy (hands)	Best in class	Far fewer extra-finger and limb errors than SDXL
Prompt adherence (complex)	Top tier	Multi-object spatial relationships handled well
Aesthetic preference	Mixed	Midjourney often preferred for stylised illustration; FLUX preferred for photoreal
Speed	Top tier	Schnell can match Midjourney's turbo modes at lower latency

In aesthetic preference tests, Midjourney is often preferred for highly stylised illustration work, while FLUX is preferred for photorealism, product photography, and editorial-style content ^[9].

Comparison with competitors

Model	Developer	Architecture	Open weights	Text quality	Speed	Key strength
FLUX.2	Black Forest Labs	DiT + flow matching	Partial (klein, VAE)	Excellent	Fast	Technical quality, open variants
Midjourney v6	Midjourney	Unknown (proprietary)	No	Good	Medium	Artistic aesthetics
DALL-E 3 / GPT-Image	OpenAI	Unknown (proprietary)	No	Good	Medium	Integration with ChatGPT
Stable Diffusion 3	Stability AI	MM-DiT (similar lineage)	Partial	Good	Medium	Community ecosystem
Imagen 3	Google	Diffusion + transformer	No	Excellent	Medium	Photorealism
Firefly	Adobe	Proprietary	No	Good	Fast	Commercial licensing clarity

FLUX models have generally achieved top rankings on independent benchmarks for text-to-image quality, particularly excelling in prompt adherence, text rendering, and photorealism. The availability of open-weight variants (schnell and klein under Apache 2.0, dev for non-commercial use) gives FLUX a significant advantage among developers and researchers who want to run models locally or fine-tune them for specific applications ^[7]^[8].

Licensing

Black Forest Labs uses a tiered licensing strategy across its model family ^[1]:

Apache 2.0: applied to the schnell and klein model variants, plus the FLUX.2 VAE. These weights can be used for unrestricted commercial and personal use, modification, and redistribution.
BFL Non-Commercial License: applied to dev model variants (including Kontext dev and Krea dev). These weights can be used freely for research, education, and personal projects. Commercial use requires a separate license from BFL.
Proprietary: applied to pro, max, and flex variants, available only through BFL's API or licensed partner platforms.

This approach balances open-source community building (through the Apache-licensed models) with revenue generation (through the API-only professional variants).

The FLUX.1 [dev] Non-Commercial License has been the source of some debate inside the open-source community. The license allows derivative works and finetunes, but it forbids using the model or its outputs for commercial purposes without obtaining a separate license. Black Forest Labs has stated that obtaining commercial licenses is relatively straightforward for small businesses and indie developers, and that pricing is typically tied to API usage rather than per-deployment fees. Larger commercial users typically use the pro variants through the API instead, where the licensing is bundled with the per-image cost.

The community has generally accepted the licensing terms more readily than it accepted similar tiered models from other vendors, partly because the schnell and klein variants are fully Apache 2.0 and partly because the dev variant is still freely available for research and personal use. The Apache-licensed FLUX.2 VAE released in November 2025 was an additional signal of openness, since the VAE is a critical component of any FLUX-based pipeline and can also be reused with other diffusion models.

Ecosystem

The open-weight FLUX models have produced a large and active community, particularly around tools and platforms that previously concentrated on Stable Diffusion.

ComfyUI

ComfyUI, the node-based visual workflow tool for diffusion models, provided day-one support for FLUX.1 in August 2024. ComfyUI also supported FLUX.1 Tools at their November 2024 launch and has continued to add support for new FLUX variants as they are released. In November 2025, NVIDIA highlighted FLUX.2 models as optimized for RTX GPUs and showcased ComfyUI workflows in its RTX AI Garage program ^[26].

FLUX has effectively replaced Stable Diffusion as the default model in many ComfyUI tutorials and workflows. Most newer custom nodes, schedulers, and samplers are written with FLUX in mind, and the most-shared workflows on the ComfyUI subreddit and on Civitai are now FLUX-based rather than SDXL-based.

Hugging Face Diffusers

The Hugging Face Diffusers library, the standard Python library for running diffusion models, added FLUX.1 support shortly after launch. By late 2024, Diffusers had become the primary distribution channel for FLUX.1 [dev] and FLUX.1 [schnell] weights, complemented by the FLUX GitHub inference repository maintained by Black Forest Labs.

Diffusers also handles FLUX.1 Tools (Fill, Depth, Canny, Redux), FLUX.1 Kontext [dev], FLUX.1 Krea [dev], FLUX.2 [dev], and FLUX.2 [klein]. Custom pipelines for inpainting, ControlNet-style guidance, multi-reference editing, and LoRA training are maintained both by the BFL team and by the open-source community ^[22].
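A typical Diffusers invocation for FLUX.1 [schnell] looks roughly like the sketch below. It follows the pattern of the published model card example, but the helper name `generate`, the output path, and the choice to enable CPU offload are assumptions made for illustration. Running it requires the `torch` and `diffusers` packages, downloaded weights, and suitable hardware.

```python
# Settings mirror the four-step sampling and 256-token T5 limit described for [schnell].
SCHNELL_SETTINGS = {
    "num_inference_steps": 4,
    "guidance_scale": 0.0,        # value used in the model card example
    "max_sequence_length": 256,
}

def generate(prompt, out_path="flux-schnell.png"):
    """Generate one image with FLUX.1 [schnell] and save it to `out_path`."""
    # Heavy imports stay inside the function so this module loads without a GPU stack.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
    image = pipe(prompt, **SCHNELL_SETTINGS).images[0]
    image.save(out_path)
    return out_path
```

Calling `generate("A cat holding a sign that says hello world")` would download the weights on first use. The other variants use different settings, so these values should not be assumed to carry over to FLUX.1 [dev] or FLUX.2.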

Replicate, fal.ai, and Together AI

Replicate added FLUX.1 to its model marketplace on day one and remained one of the most popular ways to use FLUX from a hosted API. Replicate's per-image pricing closely mirrors BFL's own API pricing. fal.ai and Together AI similarly added FLUX support quickly and have continued to push optimisations: fal.ai released an optimised FLUX.2 [dev] Turbo distillation in December 2025 that enabled high-quality generation in just 8 inference steps ^[28].

Civitai and LoRAs

Civitai, the dominant community hub for diffusion model checkpoints and LoRAs, hosts thousands of FLUX LoRAs covering specific characters, art styles, photographic looks, and product types. Tools like FluxGym and Kohya_ss simplify FLUX LoRA training for users with 12 GB or more of VRAM. FLUX LoRAs are typically trained against FLUX.1 [dev] but are usually compatible with FLUX.1 [pro] and FLUX1.1 [pro] through the FLUX Pro Finetuning API ^[27].

Microsoft Azure AI Foundry and NVIDIA NIM

In August 2025, FLUX models became available on Microsoft Azure AI Foundry, extending BFL's reach into Microsoft's enterprise cloud ecosystem ^[19]. NVIDIA included FLUX in its NIM API catalog and produced TensorRT-optimised builds for RTX GPUs, which are bundled with ComfyUI through NVIDIA's RTX AI Garage program ^[26]. These enterprise integrations help FLUX reach customers who are unwilling or unable to call a German startup's API directly but who already have procurement relationships with Microsoft or NVIDIA.

Safety, deepfakes, and controversy

The Grok image generation incident

The most prominent controversy involving Black Forest Labs followed xAI's August 13, 2024 launch of image generation in Grok, which used FLUX.1 as its backend. Grok's safety filters at launch were noticeably more permissive than those of competing image generation services. Users were able to generate fairly realistic depictions of Donald Trump, Kamala Harris, and Taylor Swift, along with scenes such as Vice President Harris kissing Trump and Mickey Mouse holding firearms, and many other politically charged or copyrighted images that would have been blocked by the safety systems of OpenAI, Google, or Midjourney ^[10].

News coverage in The Verge, Wired, Bloomberg, Reuters, NBC News, and The New York Times during August and September 2024 focused on the implications for political deepfakes ahead of the November 2024 US presidential election. Several commentators noted that Grok's looseness was an xAI policy choice rather than an inherent property of the FLUX model: BFL had documented safety guidelines and a watermark API, but xAI had not enabled them in the integration. The two companies subsequently tightened the filters and added more visible safeguards over the following months.

The episode echoed earlier controversies that had surrounded Stable Diffusion, which was widely used to generate non-consensual sexual imagery and deepfakes in the months after its August 2022 release. The BFL founders, who had been at Stability AI during that period, had publicly acknowledged the trade-offs of open weights and the difficulty of controlling downstream use. With FLUX, the company chose to keep the highest-quality variants closed and only release distilled, less capable open versions, which gave it more control over the moderation pipeline at the API level.

Content credentials and watermarking

From launch, BFL has supported the Coalition for Content Provenance and Authenticity (C2PA) standard, attaching cryptographic provenance metadata to images generated through its API. The metadata signals that the image was generated by a particular FLUX model, although the metadata is straightforward to strip if a user wants to do so. The company has also explored invisible watermarking that survives common image edits, similar to Google's SynthID, but at the time of the FLUX.2 launch in November 2025, this work had not been released as a default feature.

Copyright and training data

Like other diffusion model developers, Black Forest Labs has faced questions about the training data used for its models. The company has not publicly disclosed its training corpus in detail. The FLUX models are generally assumed to draw on some of the same large-scale image-text datasets that powered Stable Diffusion, including LAION-derived data, and the team has stated that the FLUX training pipelines included additional curation and filtering steps relative to the original Stable Diffusion training runs. The company has not been a named defendant in the major copyright lawsuits that have targeted Stability AI and Midjourney, though the broader legal questions remain unsettled.

Safety advisory board

Black Forest Labs has stated that it works with external advisors on safety policy and that all major commercial partners (Adobe, Meta, Canva, Snap) operate the models under stricter content policies than BFL's default API. The company's public communications emphasise that closed pro variants are intended for moderated commercial use, while the open variants are intended primarily for research and creative tooling rather than for direct deployment in consumer-facing products without additional safety infrastructure.

Current state

As of early 2026, Black Forest Labs is one of the leading companies in AI image generation. The FLUX model family has evolved through multiple generations and variants, establishing itself as a top-tier option for both open-source and commercial image generation.

The company's $3.25 billion valuation from its December 2025 Series B reflects strong investor confidence in the team's ability to maintain technical leadership in a rapidly evolving field. The combination of open-weight models (which drive community adoption and developer ecosystems), proprietary API services (which generate revenue), and high-profile partnerships (Adobe, Meta, Canva, Snap, xAI) provides a multi-pronged business model ^[2]^[15].

The broader AI image generation landscape continues to advance rapidly, with competition from Midjourney, OpenAI, Google, Adobe, and others. Black Forest Labs' core advantage remains the deep technical expertise of its founding team, the same researchers who created the latent diffusion approach that transformed the entire field. The company has indicated ongoing development of a text-to-video model, building on Andreas Blattmann's earlier video diffusion work and Tim Brooks's experience at OpenAI's Sora team, positioning it to compete in video generation alongside image generation.

References

Stable Diffusion creators launch Black Forest Labs, secure $31M for FLUX.1 AI image generator - VentureBeat, August 2024 ↩
Black Forest Labs raises $300M at $3.25B valuation - TechCrunch, December 2025 ↩
Exclusive: Black Forest Labs, the company that powers Grok's image generation, is raising another $100M on a $1B valuation - TechCrunch, September 2024 ↩
Black Forest Labs Launches FLUX.1 Kontext - BusinessWire, May 2025 ↩
High-Resolution Image Synthesis with Latent Diffusion Models - CVPR 2022 ↩
The story of Black Forest Labs - MagicDoor ↩
Announcing Black Forest Labs - BFL Blog, August 2024 ↩
[Announcing FLUX1.1 [pro] and the BFL API](https://bfl.ai/announcing-flux-1-1-pro-and-the-bfl-api/) - BFL Blog, October 2024 ↩
Flux (text-to-image model) - Wikipedia ↩
Meet Black Forest Labs, the startup powering Elon Musk's unhinged AI image generator - TechCrunch, August 2024 ↩
How Freepik scaled FLUX media generation to millions of requests per day - DataCrunch ↩
Black Forest Labs' Kontext AI models can edit pics as well as generate them - TechCrunch, May 2025 ↩
Black Forest Labs Open-Source FLUX.1: A 12 Billion Parameter Rectified Flow Transformer - MarkTechPost, August 2024
Demystifying Flux Architecture - arXiv, July 2025
Meta to Pay $140 Million to Use Black Forest Labs AI for Images - Sifted / Bloomberg, September 2025 ↩
[Introducing FLUX1.1 [pro] Ultra and Raw Modes](https://bfl.ai/flux-1-1-ultra/) - Black Forest Labs, November 2024 ↩
Introducing FLUX.1 Tools - Black Forest Labs, November 2024 ↩
Announcing the FLUX Pro Finetuning API - Black Forest Labs, January 2025 ↩
Black Forest Labs announcements - Black Forest Labs ↩
[FLUX.1 Krea [dev]: An Opinionated Text-to-Image Model](https://bfl.ai/blog/flux-1-krea-dev) - Black Forest Labs, July 2025 ↩
FLUX.1-dev: Encoders and Token Limitations - Medium ↩
FLUX.2: Frontier Visual Intelligence - Black Forest Labs, November 2025 ↩
[FLUX.2 [max] - Top-Tier Quality Image Generation](https://bfl.ai/models/flux-2-max) - Black Forest Labs
FLUX.2-klein-9b-kv - Hugging Face, March 2026 ↩
Pricing overview - Black Forest Labs Documentation
FLUX.2 Image Generation Models Now Released, Optimized for NVIDIA RTX GPUs - NVIDIA Blog ↩
ComfyUI FLUX LoRA Training: Detailed Guides - RunComfy ↩
New Year's AI surprise: Fal releases its own version of Flux 2 image generator - VentureBeat, December 2025 ↩
MLPerf Training Introduces Flux.1 Text-to-Image Benchmark - MLCommons, October 2025 ↩
FLUX.1 AI Image Gen in Grok 2.0 - Ultralytics, August 2024
black-forest-labs/FLUX.2-dev - Hugging Face, November 2025 ↩
Black Forest Labs raises $300M at $3.25B valuation - TechCrunch, December 2025 ↩
