Black Forest Labs
Last reviewed
May 13, 2026
Sources
30 citations
Review status
Source-backed
Revision
v6 · 8,093 words
Black Forest Labs (BFL) is a German-American artificial intelligence company founded in 2024 by Robin Rombach, Andreas Blattmann, Patrick Esser, and Dominik Lorenz, all of whom were key researchers behind the latent diffusion technology that powered Stable Diffusion. The company is best known for creating the FLUX family of text-to-image models, which rapidly became some of the most widely used image generation models in the industry. Black Forest Labs has raised over $430 million in total funding, reaching a valuation of $3.25 billion as of its December 2025 Series B round [1][2].
The founding of Black Forest Labs represents a notable case of core technical innovators leaving a company (Stability AI) to build a new venture based on their own foundational research. The FLUX models have been adopted by major platforms including Elon Musk's Grok chatbot for image generation, and in September 2025, Adobe integrated FLUX.1 Kontext Pro into Photoshop's generative fill tool [3][4]. The same month, Meta signed a multi-year contract with Black Forest Labs worth approximately $140 million, signalling that BFL's models had become strategically important for the largest social platforms as well [15].
Black Forest Labs operates from offices in Freiburg, Germany, and the United States. The company's name is a reference to the Black Forest region in southwestern Germany, close to where most of its founding researchers completed their PhDs. The company concentrates on visual generative models: text-to-image generation, image-to-image editing, in-context editing, and, by late 2025, early development of text-to-video systems. The product line is structured around a tiered licensing approach. The most capable models are closed and offered through Black Forest Labs' own API. Below those sit guidance-distilled open-weight models for developers and researchers. At the bottom of the stack are small, fully permissive Apache 2.0 models such as FLUX.1 [schnell] and FLUX.2 [klein] that can run on a consumer GPU [1][8].
The combination of strong technical performance and a permissive bottom tier turned FLUX into the de facto successor to Stable Diffusion in the open-source community. Tools such as ComfyUI, the Hugging Face Diffusers library, Replicate, fal.ai, and Together AI added FLUX support within days of the August 2024 launch, and most major image generation tutorials and LoRA training guides published after late 2024 use FLUX rather than Stable Diffusion as the default base model [7][26][27].
The company has been profitable on a unit-economics basis since shortly after the launch of its API, helped by enterprise contracts with Adobe, Canva, Meta, and Snap. Combined enterprise contract value reached around $300 million by the end of 2025, according to Bloomberg [15]. Black Forest Labs' investors include Andreessen Horowitz, General Catalyst, NVIDIA, Salesforce Ventures, and Temasek, alongside angel investors who include Brendan Iribe, Garry Tan, Guillermo Rauch, Clem Delangue, and Mati Staniszewski.
The story of Black Forest Labs begins with a research paper published in 2022: "High-Resolution Image Synthesis with Latent Diffusion Models," presented at CVPR 2022. The paper was authored by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer, all affiliated with Ludwig-Maximilians-Universitat (LMU) Munich and Heidelberg University [5].
This paper introduced the concept of latent diffusion, a technique that applies the diffusion process in a compressed latent space rather than directly on pixel-level images. By operating in this lower-dimensional space, latent diffusion models could generate high-quality images at a fraction of the computational cost of previous diffusion approaches. The paper's core innovation enabled the creation of Stable Diffusion, which became the most widely used open-source AI image generation system in the world [5].
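To make the saving concrete, the following sketch counts how many values a denoising network has to process per image in pixel space versus latent space. The 8x spatial downsampling factor and the 4 latent channels are assumptions matching the configuration commonly associated with the first Stable Diffusion release, and the function names are illustrative rather than taken from any library.

```python
def tensor_elements(height, width, channels):
    """Number of scalar values in an image-like tensor."""
    return height * width * channels


def latent_shape(height, width, downsample=8, latent_channels=4):
    """Latent shape for an image, assuming an autoencoder that shrinks each
    spatial side by `downsample` (assumed defaults, see the text above)."""
    return height // downsample, width // downsample, latent_channels


pixel_count = tensor_elements(512, 512, 3)               # RGB image in pixel space
latent_count = tensor_elements(*latent_shape(512, 512))  # compressed latent
reduction = pixel_count / latent_count

print(f"pixel values:  {pixel_count}")
print(f"latent values: {latent_count}")
print(f"reduction:     {reduction:.0f}x")
```

The element count is only a proxy for compute, since the real cost depends on the denoising network, but it shows why running diffusion on the latent is much cheaper than running it on raw pixels.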
The CompVis group at LMU Munich, led by Bjorn Ommer, had already produced an influential earlier paper, "Taming Transformers for High-Resolution Image Synthesis," which introduced VQGAN. Patrick Esser was the lead author of that paper, and it became one of the technical building blocks behind both DALL-E Mini and the later latent diffusion work. By the time the latent diffusion paper appeared, the four researchers who would later co-found Black Forest Labs had been collaborating for several years on the same core problem of efficient high-resolution image synthesis with neural networks [5].
Stable Diffusion was released in August 2022 by Stability AI, a company founded by Emad Mostaque. The model was built directly on the latent diffusion research of Rombach and his colleagues from the CompVis group at LMU Munich. Four of the five original latent diffusion paper authors (Rombach, Blattmann, Esser, and Lorenz) joined Stability AI to continue developing the technology commercially [5][6].
At Stability AI, the team developed subsequent versions of Stable Diffusion, including Stable Diffusion 2.0 and Stable Diffusion XL (SDXL). Patrick Esser later led the work on Stable Diffusion 3, which introduced the multi-modal diffusion transformer (MM-DiT) architecture and rectified flow training. That technical direction would carry directly over to FLUX once the team left.
Stability AI experienced significant organizational and financial turbulence in 2023 and 2024. Reports surfaced of slow vendor payments, layoffs, and friction between Mostaque and the senior research staff. CEO Emad Mostaque resigned in March 2024 amid growing pressure, and the company subsequently went through several rounds of leadership transitions [6]. Robin Rombach and the other senior researchers behind Stable Diffusion 3 resigned around the same time, citing concerns about the company's ability to fund continued large-scale research.
The departure of the core technical team from Stability AI to found Black Forest Labs reflected a broader pattern in the AI industry where the researchers who develop key technologies often move on to build new companies, taking their expertise (though not their former employer's proprietary work) with them. The split was relatively clean from a legal perspective: the latent diffusion paper had been published as academic work at LMU Munich and Heidelberg, and the founders left Stability AI before founding the new company rather than spinning out from it.
| Founder | Role | Background |
|---|---|---|
| Robin Rombach | CEO | Lead author of the latent diffusion paper; PhD at LMU Munich and Heidelberg; studied physics at University of Heidelberg (2013-2020) |
| Andreas Blattmann | Co-founder | Co-author of latent diffusion paper; researcher at LMU Munich; contributed to video diffusion research and the Stable Video Diffusion model |
| Patrick Esser | Co-founder | Co-author of latent diffusion paper; lead author of VQGAN; led Stable Diffusion 3 research at Stability AI |
| Dominik Lorenz | Co-founder | Co-author of latent diffusion paper; researcher at LMU Munich; worked on video and 3D diffusion at Stability AI |
All four founders had previously worked in the Computer Vision and Learning (CompVis) group at LMU Munich under the supervision of Bjorn Ommer. Their shared research background and years of collaboration provided a strong technical foundation for the new company [5].
A fifth notable figure connected to the company is Tim Brooks, formerly one of the co-leads of OpenAI's Sora video model. Brooks joined Black Forest Labs as an advisor in 2024, lending credibility to the company's stated long-term ambitions to expand from images into video. Bjorn Ommer, the founders' former PhD supervisor at LMU Munich, remained in academia but continued to publish related research and is widely seen as a member of the extended BFL technical community.
Black Forest Labs has raised capital rapidly, reflecting strong investor confidence in the team's technical capabilities and the commercial potential of their image generation technology.
| Round | Date | Amount | Valuation | Lead Investors |
|---|---|---|---|---|
| Seed | August 2024 | $31 million | ~$150 million (post-money) | Andreessen Horowitz (a16z) |
| Series A | Late 2024 | ~$100 million | ~$1 billion | Andreessen Horowitz |
| Series B | December 2025 | $300 million | $3.25 billion | Salesforce Ventures, AMP (Anjney Midha) |
| Total | | $430+ million | $3.25 billion | |
The seed round of $31 million was announced alongside the company's public launch on August 1, 2024. In addition to a16z, the seed round drew investments from individuals such as Brendan Iribe (Oculus co-founder), Michael Ovitz, and Garry Tan (Y Combinator CEO) [1]. The lead partner at Andreessen Horowitz, Anjney Midha, would remain closely involved with the company and later co-lead its Series B through his own firm, AMP, alongside Salesforce Ventures.
The seed round was sized aggressively for a company that had not yet shipped a product. According to reporting by TechCrunch, the founders' technical reputations and the visible demand for higher-quality open-source image models made the round oversubscribed within days of the founding pitch [1].
By September 2024, reports emerged that Black Forest Labs was seeking to raise approximately $100 million at a $1 billion valuation, representing a dramatic jump from the $150 million post-money valuation of the seed round. The Series A was led by Andreessen Horowitz, with participation from BroadLight Capital, Creandum, Earlybird VC, General Catalyst, Northzone, and NVIDIA [3]. The round was not publicly confirmed at the time but was later disclosed alongside the Series B in December 2025.
The rapid step-up from a $150 million post-money valuation to a $1 billion valuation in roughly six weeks was driven by the August launch of FLUX.1 and the August 13 integration with xAI's Grok, which sharply increased traffic and demand.
The December 2025 Series B of $300 million at a $3.25 billion valuation was co-led by Salesforce Ventures and AMP, with additional participation from a16z, NVIDIA, General Catalyst, Temasek, Air Street Capital, Bain Capital Ventures, Canva, Figma Ventures, Adobe Ventures, Samsung Next, and Lux Capital. Notable angel investors in the round included Guillermo Rauch (Vercel CEO), Clem Delangue (Hugging Face CEO), and Mati Staniszewski (ElevenLabs CEO) [2]. The round closed on December 1, 2025.
Reports during the fundraising process indicated that Black Forest Labs had also been in discussions with several sovereign wealth funds, including Saudi Arabia's Public Investment Fund (PIF). The Series B announcement did not name PIF as a participant, but the round's combination of strategic investors (Adobe Ventures, Canva, Figma Ventures, Samsung Next) reflected the breadth of platforms that had come to depend on FLUX models for their own products.
The Series B brought Black Forest Labs' total funding to more than $430 million, less than 18 months after the company emerged from stealth. As of the round, the company reported a revenue run rate in the tens of millions of dollars per year, driven primarily by API usage and enterprise contracts.
FLUX.1 was released alongside the company's public launch on August 1, 2024. The model family initially comprised three variants, each targeting different use cases and licensing requirements [7].
| Variant | Parameters | License | Steps | Availability |
|---|---|---|---|---|
| FLUX.1 [schnell] | 12 billion | Apache 2.0 | 1 to 4 | Open weights (Hugging Face) |
| FLUX.1 [dev] | 12 billion | BFL non-commercial | 20 to 50 | Open weights (Hugging Face) |
| FLUX.1 [pro] | 12 billion | Proprietary | Varies | API only |
Schnell (German for "fast") is the speed-optimized variant, capable of generating images in just 1 to 4 inference steps. It is released under the Apache 2.0 license, making it fully open for commercial and personal use. It can run on GPUs with as little as 12 GB of VRAM, making it accessible on consumer hardware. While it produces slightly lower-fidelity images than the dev or pro variants, the quality is high for the extremely low step count [7]. Schnell is widely used in real-time and interactive applications because its low latency allows generation to feel like a direct manipulation rather than a batch process.
The dev variant is a guidance-distilled version of FLUX.1 [pro], offering higher quality than schnell at the cost of requiring 20 to 50 inference steps (with 30 to 40 recommended for optimal results). It is released as source-available software under a non-commercial license, although users can obtain a self-serve commercial license from BFL. The dev model produces noticeably better skin textures, lighting effects, and fine details compared to schnell [7]. Most LoRA training in the community is done against FLUX.1 [dev] rather than the pro variant, since the dev weights are downloadable.
The professional variant is available exclusively through BFL's API and through partner platforms. It offers the highest image quality in the initial FLUX.1 lineup, with superior prompt adherence, photorealistic rendering, and fine detail work. It is the only variant whose weights are not publicly distributed [7]. Pro is typically priced at around $0.05 per image through the API, though pricing varies by partner platform.
Upon release, FLUX.1 quickly demonstrated state-of-the-art performance. The models outperformed Midjourney 6.1, DALL-E 3, and Stable Diffusion XL on multiple evaluation metrics, including visual quality, prompt adherence, and text rendering within images [7]. Within the first week, the FLUX.1 GitHub repository accumulated thousands of stars, the dev weights on Hugging Face crossed half a million downloads, and BFL's API processed over a million inference requests, according to the company's own data and tracking on Hugging Face [4].
A particular point of attention was anatomy. Image models had historically struggled with hands, faces, and complex multi-person scenes. FLUX.1 [dev] showed visibly better hand and finger rendering than SDXL, Midjourney V6, and DALL-E 3, which became a frequent talking point in early review threads on Reddit's r/StableDiffusion community and on X (formerly Twitter). Text rendering inside images was also markedly better, with FLUX able to produce short legible signs and labels that earlier models would have mangled.
Released on October 2, 2024 alongside the general availability of the BFL API, FLUX1.1 [pro] was a major upgrade that generated images six times faster than the original FLUX.1 [pro] while simultaneously improving image quality, prompt adherence, and output diversity. The model generates photorealistic images in approximately 4.5 seconds. It was submitted to the Artificial Analysis image arena under the codename "blueberry" and achieved the highest overall Elo score of any model on the leaderboard at the time of its debut [8].
FLUX1.1 [pro] supports high-resolution generation up to roughly 2K (2048 by 2048 pixels) without sacrificing quality, and it introduced improved handling of text rendering, complex multi-object scenes, and human anatomy. The release coincided with the general availability of api.bfl.ai, which up to that point had been in private beta.
On November 6, 2024, BFL released FLUX1.1 [pro] Ultra mode. Ultra extends FLUX's capability to generate images at four times the resolution of the standard FLUX1.1 [pro], producing 4-megapixel images (up to 2752 by 1184 pixels) in about 10 seconds. Internal benchmarks showed Ultra was over 2.5 times faster than comparable high-resolution offerings from competitors. Ultra is priced at $0.06 per image through the BFL API [16].
Released alongside Ultra, Raw mode captures the genuine feel of candid photography, producing images with a less synthetic, more natural aesthetic. It significantly increases diversity in human subjects and enhances the realism of nature photography, addressing the common criticism that AI-generated images can look overly polished or "plastic." Raw mode is available as a toggle on both the standard and Ultra variants [16]. Raw output is noticeably less "airbrushed" than the default and often resembles smartphone photography or stock photo libraries rather than the highly stylised look common to other models.
On November 21, 2024, BFL released FLUX.1 Tools, a suite of editing capabilities designed to extend the core FLUX models [17].
| Tool | Function | Availability |
|---|---|---|
| FLUX.1 Fill | Inpainting and outpainting with text-guided editing | Pro (API) and Dev (open weights) |
| FLUX.1 Depth | Structural guidance based on depth maps from input images | Pro (API) and Dev (open weights) |
| FLUX.1 Canny | Structural guidance based on canny edge detection from input images | Pro (API) and Dev (open weights) |
| FLUX.1 Redux | Adapter for mixing and recreating input images with text prompts | Pro (API) and Dev (open weights) |
Each tool was released as a FLUX.1 [pro] variant through the API and as a guidance-distilled open-access FLUX.1 [dev] variant with inference code and weights on Hugging Face. FLUX.1 Fill [pro] achieved state-of-the-art results in inpainting benchmarks at the time of release. FLUX.1 Canny and Depth provide ControlNet-style structural conditioning, enabling precise control over the spatial layout and structure of generated images. Redux supports image variations and remixing, allowing a single reference image to be re-generated with different prompts or in different styles [17].
The combination of Fill, Depth, Canny, and Redux gave FLUX feature parity with the most popular ControlNet workflows that had developed around SDXL, and most ComfyUI users who had been running ControlNet pipelines migrated to FLUX Tools within a few weeks of the November 2024 release.
On January 16, 2025, BFL launched the FLUX Pro Finetuning API, enabling users to customize FLUX.1 [pro] with their own images and concepts. The system requires as few as 1 to 5 example images to create a targeted customization. In user studies conducted by BFL, finetuning results were preferred 68.9% of the time over other available finetuning services using FLUX.1 [dev] [18].
Once a finetune is created, it can be applied across the entire FLUX.1 model suite without additional adaptation, including FLUX.1 [pro], FLUX1.1 [pro], and the complete FLUX.1 Tools suite. This enables customized content generation with resolutions up to 4 megapixels, customized inpainting with FLUX.1 Fill, and customized structural control with FLUX.1 Depth [18]. The finetuning API is used heavily by enterprise customers who want consistent brand styles, characters, or product likenesses without managing their own training infrastructure.
FLUX.1 Kontext, released on May 29, 2025, represented a new direction for the model family. Rather than purely text-to-image generation, Kontext enables in-context image generation and editing, allowing users to prompt with both text and images. The model can extract and modify visual concepts from input images to produce new, coherent renderings [4]. In effect, Kontext is BFL's answer to GPT-Image and Gemini Flash Image, the multimodal image models from OpenAI and Google that arrived around the same time.
| Kontext variant | Focus | Availability |
|---|---|---|
| Kontext [max] | Highest quality; iterative image modification | API ($0.08 per image) |
| Kontext [pro] | Balanced quality and speed | API ($0.04 per image) |
| Kontext [dev] | Non-commercial research | Open weights (Hugging Face) |
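The per-image prices in the table above can be turned into a rough budget estimate. The sketch below is only arithmetic on the figures quoted in this article; the dictionary keys are hypothetical labels, not BFL API model identifiers, and actual pricing may differ by date and partner platform.

```python
# Per-image prices in US cents, copied from the table above.
# Keys are hypothetical labels, not real API identifiers.
PRICE_CENTS = {
    "kontext-max": 8,
    "kontext-pro": 4,
}


def estimate_cost_usd(jobs):
    """Estimate total cost in US dollars for a mapping of label -> image count."""
    # Work in integer cents to avoid floating-point rounding surprises.
    total_cents = sum(PRICE_CENTS[label] * count for label, count in jobs.items())
    return total_cents / 100


monthly_jobs = {"kontext-pro": 1000, "kontext-max": 200}
print(f"${estimate_cost_usd(monthly_jobs):.2f}")
```

Here 1,000 Kontext [pro] edits plus 200 Kontext [max] edits come to $56.00 at the quoted prices.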
Common use cases include character consistency across multiple generations, style transfer, object replacement, garment changes, and iterative editing without requiring fine-tuning or complex multi-step workflows. Kontext was the first BFL model that natively supported "edit this image" instructions in plain English, in contrast to the earlier ControlNet-style workflows that required separate masks or conditioning images.
BFL reported that Kontext models deliver inference speeds up to 8 times faster than competing context-aware image editing models such as GPT-Image. The benchmark comparisons were independently reviewed by several third parties, including the Artificial Analysis benchmark team, who corroborated the speed advantage at comparable quality [4].
In September 2025, Adobe announced that FLUX.1 Kontext Pro was available as a model option for Photoshop's generative fill tool in beta, marking a significant validation from the professional creative tools industry [4][12]. Kontext also became the default image-editing backend for several smaller design tools, including Magnific by Freepik.
On June 26, 2025, BFL released FLUX.1 Kontext [dev] as open weights, allowing the community to run the model locally and integrate it into custom workflows. The release made Kontext the first open-weight model with general-purpose instruction-based image editing capabilities at this quality level [19].
On July 31, 2025, BFL released FLUX.1 Krea [dev], a model developed in collaboration with Krea AI. FLUX.1 Krea [dev] is a 12-billion-parameter rectified flow transformer that was specifically trained to overcome the oversaturated "AI look" common in text-to-image models, achieving higher photorealism with a distinctive aesthetic approach [20].
The model is the open-weights version of Krea 1, offering strong performance with highly distinctive aesthetics and exceptional realism. It scored 1011 Elo in human evaluation tests, outperforming other open-source FLUX models and approaching the quality of premium models like FLUX1.1 [pro]. It was released under a non-commercial license with weights available on Hugging Face [20]. The Krea collaboration was unusual in that BFL allowed an outside team to train an opinionated, stylistically distinctive checkpoint on top of FLUX and distribute it as part of the official FLUX family, rather than as a third-party finetune.
Krea AI itself is a creative tooling company that operates a real-time image generation product. The two companies' collaboration started in late 2024 and continued through 2025, with the Krea team focused on training data curation and aesthetic tuning while the BFL team handled the underlying model training infrastructure.
On November 25, 2025, Black Forest Labs announced FLUX.2, the second major generation of the model family. The release included several variants, with additional models rolling out through early 2026 [9].
| Model | Parameters | Text encoder | License | Release date | Key features |
|---|---|---|---|---|---|
| FLUX.2 [max] | 32B | Mistral-3 24B VLM | Proprietary (API) | January 2026 | Highest quality, grounded generation with web context |
| FLUX.2 [pro] | 32B | Mistral-3 24B VLM | Proprietary (API) | November 2025 | Production-grade, multi-reference support |
| FLUX.2 [flex] | 32B | Mistral-3 24B VLM | Proprietary (API) | November 2025 | Tunable parameters, typography specialist |
| FLUX.2 [dev] | 32B | Mistral-3 24B VLM | BFL non-commercial | November 2025 | Open weights, LoRA training |
| FLUX.2 [klein] 9B | 9B | Qwen3 8B | Apache 2.0 | January 15, 2026 | Sub-second generation, consumer hardware |
| FLUX.2 [klein] 4B | 4B | Qwen3 8B | Apache 2.0 | January 15, 2026 | Smallest model, ~13 GB VRAM |
| FLUX.2 VAE | n/a | n/a | Apache 2.0 | November 2025 | Variational autoencoder |
| FLUX.2 [klein] 9B-KV | 9B | Qwen3 8B | Apache 2.0 | March 2026 | KV-cache for 2.5x faster multi-reference editing |
The FLUX.2 variational autoencoder was released as open-source software under the Apache 2.0 license, allowing the community to build on the model's image encoding and decoding capabilities. The full set of FLUX.2 variants spans an eightfold range in parameter count (4B to 32B), with correspondingly different compute requirements, giving developers a range of price/quality options [9].
FLUX.2 made two significant architectural changes compared with FLUX.1. First, the model scales to 32 billion parameters in the largest variant. Second, it replaces the dual T5 plus CLIP text encoder system with a Mistral-3 24B vision-language model. By coupling a VLM trained on a large corpus of interleaved text and images with the rectified flow transformer, FLUX.2 has more grounded "world knowledge" than its predecessors, enabling better understanding of real-world concepts, spatial relationships, and material properties [9]. The smaller klein variants pair a 9B or 4B flow transformer with a Qwen3 8B text embedder instead of Mistral.
Klein (German for "small") is the fastest model family in the FLUX lineup, generating and editing images in under one second on modern hardware. Available in 4B and 9B parameter sizes, klein is designed for real-time applications, rapid creative iteration, and deployment on consumer hardware. The 4B variant requires approximately 13 GB of VRAM, making it accessible on GPUs like the NVIDIA RTX 3090 and RTX 4070. It is released under the Apache 2.0 license [22].
Unlike previous-generation models that required separate pipelines for generation and editing, FLUX.2 [klein] unifies text-to-image, single-reference editing, and multi-reference generation in one architecture. In March 2026, BFL released FLUX.2 [klein] 9B-KV and its FP8 variant, which incorporate KV-cache optimization. By caching key-value pairs from reference images during the first denoising step, the KV variant eliminates redundant computation in subsequent steps, achieving up to 2.5 times faster inference for multi-reference editing tasks [24].
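The caching idea can be illustrated with a toy single-layer attention model. This is a minimal sketch, not the FLUX.2 [klein] implementation: the class, its random weights, and the way the "denoising step" feeds its output back in are all hypothetical, chosen only to show that keys and values for fixed reference tokens can be projected once and reused.

```python
import math
import random


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def attend(query, keys, values):
    """Single-head softmax attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(len(values[0]))]


class ToyEditor:
    """Hypothetical one-layer model used only to illustrate KV caching."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def make():
            return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(dim)]

        self.w_q, self.w_k, self.w_v = make(), make(), make()
        self.projections = 0  # number of tokens projected to keys/values

    def project_kv(self, tokens):
        self.projections += len(tokens)
        return ([matvec(self.w_k, t) for t in tokens],
                [matvec(self.w_v, t) for t in tokens])

    def step(self, noisy, reference, cache=None):
        """One toy step: noisy tokens attend to reference and noisy tokens."""
        ref_k, ref_v = cache if cache is not None else self.project_kv(reference)
        own_k, own_v = self.project_kv(noisy)
        keys, values = ref_k + own_k, ref_v + own_v
        return [attend(matvec(self.w_q, t), keys, values) for t in noisy]


def run(steps, use_cache):
    model = ToyEditor(dim=4)
    rng = random.Random(1)
    reference = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
    noisy = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    # Reference tokens never change, so their keys/values can be computed once.
    cache = model.project_kv(reference) if use_cache else None
    for _ in range(steps):
        noisy = model.step(noisy, reference, cache)
    return noisy, model.projections


cached_out, cached_work = run(steps=4, use_cache=True)
plain_out, plain_work = run(steps=4, use_cache=False)
print(cached_out == plain_out, cached_work, plain_work)
```

Both runs produce identical outputs, but the cached run projects the six reference tokens once instead of on every step, which is the kind of redundant computation the KV variant is described as removing.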
The FLUX models represent a significant architectural departure from the U-Net-based architecture used in Stable Diffusion 1.x, 2.x, and SDXL. Instead, FLUX builds on two key technical innovations: the Diffusion Transformer (DiT) architecture and flow matching with rectified flow [7][10].
Traditional diffusion models for image generation, including all early versions of Stable Diffusion, used U-Net architectures for the denoising network. The DiT approach replaces the U-Net with a transformer architecture, bringing the scalability advantages of transformers (which had proven so effective in language modeling) to image generation. The DiT framework was introduced in the 2023 paper "Scalable Diffusion Models with Transformers" by William Peebles and Saining Xie [10].
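FLUX uses a multimodal DiT variant called MM-DiT (Multimodal Diffusion Transformer), originally introduced in Patrick Esser et al.'s "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis," which was published at Stability AI in early 2024. The architecture processes text and image information through parallel streams that interact at multiple points during the generation process. The FLUX.1 architecture consists of 57 total transformer blocks divided into two types:
- 19 double-stream blocks, in which text and image tokens keep separate weights but attend to each other through joint attention
- 38 single-stream blocks, in which the text and image tokens are concatenated and processed by a shared set of weights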
This hybrid design allows the model to maintain modality-specific processing in the early layers (where text and image features are quite different) while enabling deep integration in the later layers (where the model needs to tightly coordinate text semantics with visual content) [7][10].
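The block layout can be sketched as a simple schedule. This is a structural illustration only: the function is hypothetical, the default split of 19 double-stream and 38 single-stream blocks reflects the publicly released FLUX.1 reference configuration and should be checked against it, and the token counts in the example are arbitrary.

```python
def block_layout(num_text_tokens, num_image_tokens, num_double=19, num_single=38):
    """Describe which token sequences each transformer block operates on.

    Defaults are assumptions about FLUX.1 (19 + 38 = 57 blocks in total).
    """
    layout = []
    for _ in range(num_double):
        # Double-stream: separate weights per modality, joint attention
        # across both sequences.
        layout.append(("double", {"text": num_text_tokens,
                                  "image": num_image_tokens}))
    joint_length = num_text_tokens + num_image_tokens
    for _ in range(num_single):
        # Single-stream: one shared set of weights over the concatenated
        # text + image sequence.
        layout.append(("single", {"joint": joint_length}))
    return layout


layout = block_layout(num_text_tokens=512, num_image_tokens=4096)
print(len(layout), layout[0], layout[-1])
```

The early blocks see two separately parameterized streams, while the later blocks see one merged sequence, matching the modality-specific-then-integrated design described above.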
FLUX uses rectified flow matching rather than the traditional DDPM (Denoising Diffusion Probabilistic Models) approach used in Stable Diffusion 1.x and 2.x. Flow matching is a more general framework for training generative models that includes diffusion as a special case [10].
In traditional diffusion models, the generation process involves gradually adding Gaussian noise to an image during training (the forward process) and then learning to reverse this noisy process step by step (the reverse process). Flow matching takes a conceptually simpler approach: it learns a deterministic vector field that transforms samples from a simple noise distribution directly to the target data distribution along an optimal transport path. Rectified flow, in particular, encourages linear denoising trajectories: the model learns to transform noise into images along straighter paths in the generative process. This property enables more efficient sampling, since each step makes more progress along the generation trajectory. It is why FLUX.1 [schnell] can produce good results in as few as one to four steps, compared with the 20 to 50 steps typically needed by older Stable Diffusion models [7][10].
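A toy example shows why straight trajectories need few steps. The sketch below assumes the convention that t = 0 is noise and t = 1 is data (some papers use the reverse), uses a single noise/data pair instead of a learned network, and adds a deliberately curved path purely for contrast; all function names are illustrative.

```python
def rectified_flow_pair(noise, data, t):
    """Training pair for one sample: the point on the straight line between
    noise (t=0) and data (t=1), and the constant velocity along that line."""
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, data)]
    velocity = [d - n for n, d in zip(noise, data)]
    return x_t, velocity


def euler_sample(velocity_fn, start, num_steps):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(start)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


noise = [0.5, -1.0]
data = [2.0, 3.0]
x_quarter, v = rectified_flow_pair(noise, data, 0.25)


def straight(x, t):
    # Ideal rectified-flow field for this pair: constant velocity.
    return v


def curved(x, t):
    # Contrived field with the same endpoints but a curved path,
    # x(t) = noise + t**2 * (data - noise).
    return [2.0 * t * vi for vi in v]


print("straight, 1 step: ", euler_sample(straight, noise, 1))
print("curved,   1 step: ", euler_sample(curved, noise, 1))
print("curved,  50 steps:", euler_sample(curved, noise, 50))
```

With the straight field a single Euler step lands on the data point, while the curved field makes no progress in one step and needs many steps to get close. A trained model only approximates the straight field, which is why a handful of steps, rather than exactly one, is typical in practice.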
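FLUX.1 uses two text encoders working in tandem to process input prompts [21]:
- CLIP (the ViT-L/14 text encoder), which supplies a pooled embedding summarizing the prompt as a whole
- T5 (the T5-XXL encoder), which supplies per-token embeddings that carry the detailed wording of the prompt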
This combination allows FLUX to interpret complex scene descriptions with high fidelity, leveraging CLIP's visual-semantic alignment alongside T5's deep language understanding. FLUX.2 replaces this dual-encoder system with the Mistral-3 24B vision-language model, which acts as a unified encoder for both text and reference images.
| Feature | Description | Benefit |
|---|---|---|
| Rotary Position Embeddings (RoPE) | Position encoding via rotation matrices | Better handling of varying image resolutions |
| Parallel attention layers | Concurrent computation of attention and feedforward | Improved hardware efficiency |
| 16-channel latent space | Compressed representation with more channels than SDXL | Captures more nuance in textures and lighting |
| 12 billion (FLUX.1) and 32 billion (FLUX.2) parameters | Large model capacity | Strong generation quality |
The combination of these architectural choices results in a model that generates higher-quality images, handles text rendering better (a longstanding weakness of diffusion models), and runs more efficiently than previous approaches [7][10].
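The rotary position embeddings listed in the table can be illustrated in one dimension. This is a minimal sketch of the general RoPE technique, assuming the common frequency base of 10000 and a single position axis; image models typically apply rotations along several axes, and nothing here is taken from the FLUX code.

```python
import math


def rope(vec, position, base=10000.0):
    """Rotate consecutive pairs of `vec` by position-dependent angles.

    `vec` must have even length; `base` is the usual RoPE default (assumed).
    """
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        angle = position * base ** (-i / dim)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.0, 0.5, -0.5]
k = [0.3, 0.7, -1.0, 0.2]

# Attention scores depend only on the offset between positions (here 4),
# not on the absolute positions themselves.
score_near = dot(rope(q, 3), rope(k, 7))
score_far = dot(rope(q, 10), rope(k, 14))
print(abs(score_near - score_far) < 1e-9)
```

Because the score depends only on relative position, the same learned attention patterns carry over to sequences of different lengths, which is the property behind the table's "better handling of varying image resolutions" entry.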
The relationship between Black Forest Labs and Stable Diffusion is central to understanding the company's position in the AI image generation landscape.
The foundational latent diffusion research was conducted at LMU Munich and Heidelberg University, funded by academic grants and the German Research Foundation (DFG). When Stability AI commercialized this research as Stable Diffusion in 2022, the core researchers (Rombach, Blattmann, Esser, Lorenz) joined the company [5][6].
Stability AI funded further development of the technology, producing Stable Diffusion 2.0, SDXL, and Stable Diffusion 3. However, as Stability AI encountered financial difficulties and leadership turmoil in 2023 and 2024, the technical team departed to start Black Forest Labs. The FLUX models represent a clean break architecturally (using DiT and flow matching instead of U-Net and DDPM), but the intellectual lineage from latent diffusion through Stable Diffusion to FLUX is direct and continuous [6].
| Aspect | Stable Diffusion (1.x / 2.x / XL) | FLUX |
|---|---|---|
| Architecture | U-Net based | Diffusion Transformer (DiT) |
| Training method | DDPM | Rectified flow matching |
| Text encoder | CLIP (1.x/2.x), CLIP+OpenCLIP (XL) | T5 + CLIP (FLUX.1), Mistral-3 (FLUX.2) |
| Developer | Stability AI | Black Forest Labs |
| Core researchers | Rombach, Blattmann, Esser, Lorenz | Same (now at BFL) |
| Image quality | Good (progressive improvement) | State of the art at release |
| Text rendering | Weak in 1.x/2.x; better in SD3 | Strong from FLUX.1 onwards |
| Generation speed | 20 to 50 steps typical | 1 to 4 steps (schnell), 20 to 50 (dev) |
Stability AI continued to develop its own models after the departure of the founding researchers, releasing Stable Diffusion 3 and subsequent versions. However, FLUX models have generally been regarded as technically superior, leading to a situation where the spiritual successors to Stable Diffusion now compete against it [6]. Many community tools have effectively switched their default model to FLUX, even though they remain backwards-compatible with Stable Diffusion checkpoints.
Black Forest Labs has secured several high-profile commercial partnerships that have driven adoption of FLUX models.
Elon Musk's xAI launched image generation in the Grok chatbot on August 13, 2024, less than two weeks after BFL emerged from stealth. The image generation backend was FLUX.1, accessed via Black Forest Labs' API. This partnership brought significant visibility to Black Forest Labs and demonstrated the models' production readiness for high-traffic consumer applications [3][10].
The rollout was, by any standard, chaotic. Grok's image generation feature was launched without strong content filters: users on X were quickly able to generate fairly realistic images of celebrities, politicians, and copyrighted characters, including Donald Trump, Kamala Harris, Taylor Swift, Mickey Mouse, and many others, in scenarios that ranged from satirical to clearly defamatory. News coverage during August 2024 in The Verge, Wired, Bloomberg, and Reuters focused on the looseness of the safety pipeline and the implications for political deepfakes ahead of the 2024 US presidential election [10].
Black Forest Labs and xAI later tightened the content filters significantly. xAI also switched its primary image generation backend to its own Aurora model in December 2024, although FLUX continued to be available as an option inside Grok for some time after that.
In September 2025, Meta signed a multi-year contract with Black Forest Labs worth approximately $140 million ($35 million in the first year, $105 million in the second year) for use of BFL's generative image technology. Reports indicated that the technology would power image features inside Meta AI, the AI assistant embedded in Facebook, Instagram, WhatsApp, and Messenger, as well as creative tools used by advertisers on Meta's platforms [15]. Combined with contracts from Adobe, Canva, and Snap, BFL's total enterprise contract value reached approximately $300 million by the end of 2025 [15].
In September 2025, Adobe integrated FLUX.1 Kontext Pro into the generative fill tool in Photoshop (beta). This partnership is particularly significant because it places FLUX technology in one of the world's most widely used creative software suites, alongside Adobe's own Firefly models. Photoshop users can choose between Firefly and FLUX.1 Kontext for generative fill, with FLUX.1 Kontext often producing more photorealistic results while Firefly retains a clearer commercial-licensing position [4][12].
Black Forest Labs operates its own API (api.bfl.ai) for direct access to FLUX models. The models are also served by third-party platforms including Replicate, fal.ai, Together AI, Cloudflare Workers AI, DeepInfra, Runware, and the NVIDIA NIM API catalog [6][7]. They are additionally available through Hugging Face Inference Endpoints and Microsoft Azure AI Foundry, the latter through a partnership announced in August 2025 [19].
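A request to the first-party API might be assembled roughly as follows. This is a minimal sketch, not the documented client: only the host `api.bfl.ai` comes from the text above, while the endpoint path, the `x-key` header name, and the payload fields are assumptions that should be checked against the current API reference. The helper names `build_generation_request` and `submit` are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.bfl.ai"      # host named in the article
ENDPOINT_PATH = "/v1/flux-pro-1.1"   # assumption: verify against the API reference
API_KEY_HEADER = "x-key"             # assumption: verify against the API reference


def build_generation_request(prompt, api_key, width=1024, height=1024):
    """Build (but do not send) a POST request for one text-to-image job."""
    payload = {"prompt": prompt, "width": width, "height": height}  # assumed fields
    return urllib.request.Request(
        url=API_BASE + ENDPOINT_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={API_KEY_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect or log the request without spending API credits. Third-party hosts such as Replicate or fal.ai expose their own, different request formats.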
Freepik adopted FLUX as the backend for its AI Image Generator and Magnific image upscaler, scaling to millions of requests per day through specialized infrastructure partnerships with DataCrunch and WaveSpeed AI [11]. Krea AI used FLUX as the basis for its real-time generation product and later collaborated on the official FLUX.1 Krea [dev] release.
FLUX models have consistently ranked at or near the top of independent image generation benchmarks. The Artificial Analysis image arena, lmarena.ai, and various academic evaluations have all placed FLUX models among the leading text-to-image systems since the August 2024 launch.
FLUX1.1 [pro], launched in October 2024, first appeared in the Artificial Analysis image arena under the codename "blueberry", where it topped the Elo leaderboard. It outperformed Midjourney v6.1, Ideogram v2, and Stable Diffusion 3 in both visual fidelity and prompt accuracy. FLUX1.1 [pro] Ultra later achieved similar leaderboard dominance at higher resolutions [8].
On lmarena.ai, the same crowdsourced evaluation site behind the Chatbot Arena for text models, FLUX models have consistently appeared in the top three by Elo score since their launch. FLUX1.1 [pro] held the top position for several months in late 2024 before being overtaken in some categories by Google's Imagen 3 and OpenAI's GPT-Image, with FLUX.2 [pro] reclaiming much of the lead following its November 2025 release.
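The Elo scores behind these leaderboards come from pairwise human votes between two anonymised outputs. The sketch below shows the classic Elo update as an illustration of how a single vote shifts two ratings. It assumes a K-factor of 32 and the standard 400-point scale; the arenas named above may fit ratings with a different statistical procedure, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, a_won, k=32.0):
    """Return the new (rating_a, rating_b) after one head-to-head vote."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models; model A wins the vote.
    print(update_elo(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```

Because the update depends on the gap between expected and actual outcome, a win against a much higher-rated model moves the ratings more than a win against a weaker one.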
In October 2025, MLCommons selected FLUX.1 as the new text-to-image benchmark for MLPerf Training v5.1, replacing Stable Diffusion v2 to reflect modern model architectures and scale. The 11.9-billion-parameter transformer-based model serves as a representative benchmark for current generative AI workloads. In the MLPerf Training v5.1 results released on November 12, 2025, NVIDIA set a record time-to-train of 12.5 minutes using 1,152 Blackwell GPUs [29].
| Capability | FLUX position | Notes |
|---|---|---|
| Text rendering | Best in class | Legible spelling at small sizes, accurate typography |
| Photorealism (humans) | Best in class | Skin texture, lighting, and eyes notably better than SDXL and Midjourney v6 |
| Anatomy (hands) | Best in class | Far fewer extra-finger and limb errors than SDXL |
| Prompt adherence (complex) | Top tier | Multi-object spatial relationships handled well |
| Aesthetic preference | Mixed | Midjourney often preferred for stylised illustration; FLUX preferred for photoreal |
| Speed | Top tier | Schnell can match Midjourney's turbo modes at lower latency |
In aesthetic preference tests, Midjourney is often preferred for highly stylised illustration work, while FLUX is preferred for photorealism, product photography, and editorial-style content [9].
| Model | Developer | Architecture | Open weights | Text quality | Speed | Key strength |
|---|---|---|---|---|---|---|
| FLUX.2 | Black Forest Labs | DiT + flow matching | Partial (klein, VAE) | Excellent | Fast | Technical quality, open variants |
| Midjourney v6 | Midjourney | Unknown (proprietary) | No | Good | Medium | Artistic aesthetics |
| DALL-E 3 / GPT-Image | OpenAI | Unknown (proprietary) | No | Good | Medium | Integration with ChatGPT |
| Stable Diffusion 3 | Stability AI | MM-DiT (similar lineage) | Partial | Good | Medium | Community ecosystem |
| Imagen 3 | Google | Diffusion + transformer | No | Excellent | Medium | Photorealism |
| Firefly | Adobe | Proprietary | No | Good | Fast | Commercial licensing clarity |
FLUX models have generally achieved top rankings on independent benchmarks for text-to-image quality, particularly excelling in prompt adherence, text rendering, and photorealism. The availability of open-weight variants (schnell and klein under Apache 2.0, dev for non-commercial use) gives FLUX a significant advantage among developers and researchers who want to run models locally or fine-tune them for specific applications [7][8].
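Black Forest Labs uses a tiered licensing strategy across its model family [1]: closed pro variants offered only through the API, open-weight dev variants released under a non-commercial license, and the small schnell and klein variants released under Apache 2.0.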
This approach balances open-source community building (through the Apache-licensed models) with revenue generation (through the API-only professional variants).
The FLUX.1 [dev] Non-Commercial License has been the source of some debate inside the open-source community. The license allows derivative works and fine-tunes, but it forbids using the model or its outputs for commercial purposes without obtaining a separate license. Black Forest Labs has stated that obtaining commercial licenses is relatively straightforward for small businesses and indie developers, and that pricing is usually tied to API usage rather than per-deployment fees. Larger commercial users generally use the pro variants through the API instead, where licensing is bundled into the per-image cost.
The community has generally accepted these terms more readily than it accepted similar tiered licensing schemes from other vendors, partly because the schnell and klein variants are fully Apache 2.0 and partly because the dev variant is still freely available for research and personal use. The Apache-licensed FLUX.2 VAE released in November 2025 was an additional signal of openness, since the VAE is a critical component of any FLUX-based pipeline and can also be reused with other diffusion models.
The open-weight FLUX models have produced a large and active community, particularly around tools and platforms that previously concentrated on Stable Diffusion.
ComfyUI, the node-based visual workflow tool for diffusion models, provided day-one support for FLUX.1 in August 2024. ComfyUI also supported FLUX.1 Tools at their November 2024 launch and has continued to add support for new FLUX variants as they are released. In November 2025, NVIDIA highlighted FLUX.2 models as optimized for RTX GPUs and showcased ComfyUI workflows in its RTX AI Garage program [26].
FLUX has effectively replaced Stable Diffusion as the default model in many ComfyUI tutorials and workflows. Most newer custom nodes, schedulers, and samplers are written with FLUX in mind, and the most-shared workflows on the ComfyUI subreddit and on Civitai are now FLUX-based rather than SDXL-based.
The Hugging Face Diffusers library, the standard Python library for running diffusion models, added FLUX.1 support shortly after launch. By late 2024, Diffusers had become the primary distribution channel for FLUX.1 [dev] and FLUX.1 [schnell] weights, complemented by the FLUX GitHub inference repository maintained by Black Forest Labs.
Diffusers also handles FLUX.1 Tools (Fill, Depth, Canny, Redux), FLUX.1 Kontext [dev], FLUX.1 Krea [dev], FLUX.2 [dev], and FLUX.2 [klein]. Custom pipelines for inpainting, ControlNet-style guidance, multi-reference editing, and LoRA training are maintained both by the BFL team and by the open-source community [22].
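Running an open-weight FLUX model through Diffusers looks roughly like the sketch below. It assumes the `FluxPipeline` class and the `black-forest-labs/FLUX.1-schnell` repository id, with a four-step, guidance-free configuration for schnell. The helper names are hypothetical, and the parameter values should be checked against the current Diffusers documentation. The heavy imports are deferred so the argument-building helper can be used without a GPU stack installed.

```python
MODEL_ID = "black-forest-labs/FLUX.1-schnell"  # Hugging Face repo id; verify before use


def schnell_call_kwargs(prompt, steps=4, width=1024, height=1024):
    """Build keyword arguments for a few-step schnell generation call."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,  # schnell is intended for 1 to 4 steps
        "guidance_scale": 0.0,         # assumption: schnell runs without guidance
        "max_sequence_length": 256,    # assumption: shorter T5 context for schnell
        "width": width,
        "height": height,
    }


def generate(prompt, seed=0, output_path="flux_schnell.png"):
    """Load the pipeline and save one image (downloads several GB of weights)."""
    # Imported lazily: both packages are large optional dependencies.
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.enable_model_cpu_offload()  # trades speed for lower VRAM use
    generator = torch.Generator("cpu").manual_seed(seed)
    image = pipe(**schnell_call_kwargs(prompt), generator=generator).images[0]
    image.save(output_path)
    return output_path
```

Calling `generate("a lighthouse at dusk")` would write a PNG to the working directory. The dev variants generally need more steps and a non-zero guidance scale, so the defaults above should not be reused for them unchanged.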
Replicate added FLUX.1 to its model marketplace on day one and remained one of the most popular ways to use FLUX from a hosted API. Replicate's per-image pricing closely mirrors BFL's own API pricing. fal.ai and Together AI similarly added FLUX support quickly and have continued to push optimisations: fal.ai released an optimised FLUX.2 [dev] Turbo distillation in December 2025 that enabled high-quality generation in just 8 inference steps [28].
Civitai, the dominant community hub for diffusion model checkpoints and LoRAs, hosts thousands of FLUX LoRAs covering specific characters, art styles, photographic looks, and product types. Tools like FluxGym and Kohya_ss simplify FLUX LoRA training for users with 12 GB or more of VRAM. FLUX LoRAs are typically trained against FLUX.1 [dev] but are usually compatible with FLUX.1 [pro] and FLUX1.1 [pro] through the FLUX Pro Finetuning API [27].
In August 2025, FLUX models became available on Microsoft Azure AI Foundry, extending BFL's reach into Microsoft's enterprise cloud ecosystem [19]. NVIDIA included FLUX in its NIM API catalog and produced TensorRT-optimised builds for RTX GPUs, which are bundled with ComfyUI through NVIDIA's RTX AI Garage program [26]. These enterprise integrations help FLUX reach customers who are unwilling or unable to call a German startup's API directly but who already have procurement relationships with Microsoft or NVIDIA.
The most prominent controversy involving Black Forest Labs followed xAI's August 13, 2024 launch of image generation in Grok, which used FLUX.1 as its backend. Grok's safety filters at launch were noticeably more permissive than those of competing image generation services. Users were able to generate fairly realistic depictions of Donald Trump, Kamala Harris, and Taylor Swift, along with scenes such as Harris kissing Trump or Mickey Mouse holding firearms, and many other politically charged or copyrighted images that would have been blocked by the safety systems of OpenAI, Google, or Midjourney [10].
News coverage in The Verge, Wired, Bloomberg, Reuters, NBC News, and The New York Times during August and September 2024 focused on the implications for political deepfakes ahead of the November 2024 US presidential election. Several commentators noted that Grok's looseness was an xAI policy choice rather than an inherent property of the FLUX model: BFL had documented safety guidelines and a watermark API, but xAI had not enabled them in the integration. The two companies subsequently tightened the filters and added more visible safeguards over the following months.
The episode echoed earlier controversies that had surrounded Stable Diffusion, which was widely used to generate non-consensual sexual imagery and deepfakes in the months after its August 2022 release. The BFL founders, who had been at Stability AI during that period, had publicly acknowledged the trade-offs of open weights and the difficulty of controlling downstream use. With FLUX, the company chose to keep the highest-quality variants closed and only release distilled, less capable open versions, which gave it more control over the moderation pipeline at the API level.
From launch, BFL has supported the Coalition for Content Provenance and Authenticity (C2PA) standard, attaching cryptographic provenance metadata to images generated through its API. The metadata signals that an image was generated by a particular FLUX model, although it is straightforward to strip. The company has also explored invisible watermarking that survives common image edits, similar to Google's SynthID, but at the time of the FLUX.2 launch in November 2025 this work had not been released as a default feature.
Like other diffusion model developers, Black Forest Labs faced questions about the training data used for its models. The company has not publicly disclosed its training corpus in detail. The FLUX models clearly inherit some of the same large-scale image-text datasets that powered Stable Diffusion, including LAION-derived data, but the team has stated that the FLUX training pipelines included additional curation and filtering steps relative to the original Stable Diffusion training runs. The team has not been a named defendant in the major copyright lawsuits that have targeted Stability AI and Midjourney, though the broader legal questions remain unsettled.
Black Forest Labs has stated that it works with external advisors on safety policy and that all major commercial partners (Adobe, Meta, Canva, Snap) operate the models under stricter content policies than BFL's default API. The company's public communications emphasise that closed pro variants are intended for moderated commercial use, while the open variants are intended primarily for research and creative tooling rather than for direct deployment in consumer-facing products without additional safety infrastructure.
As of early 2026, Black Forest Labs is one of the leading companies in AI image generation. The FLUX model family has evolved through multiple generations and variants, establishing itself as a top-tier option for both open-source and commercial image generation.
The company's $3.25 billion valuation from its December 2025 Series B reflects strong investor confidence in the team's ability to maintain technical leadership in a rapidly evolving field. The combination of open-weight models (which drive community adoption and developer ecosystems), proprietary API services (which generate revenue), and high-profile partnerships (Adobe, Meta, Canva, Snap, xAI) provides a multi-pronged business model [2][15].
The broader AI image generation landscape continues to advance rapidly, with competition from Midjourney, OpenAI, Google, Adobe, and others. Black Forest Labs' core advantage remains the deep technical expertise of its founding team, the same researchers who created the latent diffusion approach that transformed the entire field. The company has indicated ongoing development of a text-to-video model, building on Andreas Blattmann's earlier video diffusion work and Tim Brooks's experience on OpenAI's Sora team, positioning it to compete in video generation alongside image generation.