Flux is a family of text-to-image generative models developed by Black Forest Labs (BFL), a company founded by the original creators of Stable Diffusion. First released in August 2024, the FLUX.1 models quickly established themselves as among the highest-quality open and commercial image generators available, with particular strengths in photorealism, text rendering within images, and anatomical accuracy. The models are built on a 12-billion-parameter hybrid transformer architecture that uses flow matching rather than the traditional denoising diffusion approach, representing a significant technical evolution from earlier latent diffusion models [1].
Flux models have been widely adopted across the AI image generation ecosystem. In August 2024, xAI integrated FLUX.1 into the Grok chatbot for image generation on the X platform. Freepik has scaled Flux to handle millions of image generation requests per day. Adobe integrated FLUX.1 Kontext [pro] into Photoshop as an option for its Generative Fill tool in September 2025. Meta signed a multi-year contract worth $140 million for use of BFL's generative image technology in September 2025. As of early 2026, Black Forest Labs has raised over $430 million across three funding rounds, is valued at approximately $3.25 billion, and has expanded its model lineup from the original FLUX.1 series to the second-generation FLUX.2 family [2][3].
Black Forest Labs was founded in 2024 by Robin Rombach, Andreas Blattmann, and Patrick Esser, all of whom were former researchers at LMU Munich under Professor Björn Ommer and subsequently employees at Stability AI [4].
Robin Rombach is the lead author of the 2022 paper "High-Resolution Image Synthesis with Latent Diffusion Models" (commonly known as the latent diffusion or Stable Diffusion paper), which introduced the technique of performing the diffusion process in a compressed latent space rather than directly in pixel space. This architectural insight dramatically reduced the computational cost of diffusion-based image generation and made high-quality image synthesis accessible to consumer hardware. The paper has become one of the most cited works in the history of computer vision and generative AI [4].
Andreas Blattmann and Patrick Esser were co-authors on the same paper and contributed to subsequent work on video generation and image synthesis at Stability AI. The three founders left Stability AI in 2024 to start Black Forest Labs, named after the Black Forest region of southwestern Germany; the company is headquartered in nearby Freiburg.
| Round | Date | Amount | Valuation | Lead Investors |
|---|---|---|---|---|
| Seed | August 2024 | $31M | ~$150M (post-money) | Andreessen Horowitz (a16z) |
| Series A | Late 2024 | ~$100M | ~$1B | Andreessen Horowitz |
| Series B | December 2025 | $300M | $3.25B | Salesforce Ventures, AMP (Anjney Midha) |
The seed round, announced simultaneously with the launch of FLUX.1 in August 2024, was led by Andreessen Horowitz with participation from General Catalyst, Brendan Iribe (co-founder of Oculus), Michael Ovitz, Garry Tan (CEO of Y Combinator), and NVIDIA's Timo Aila [4]. By September 2024, reports indicated that the company was raising an additional $100 million at a $1 billion valuation, a dramatic increase from the $150 million post-money valuation just weeks earlier, driven largely by the rapid adoption of FLUX.1 models [5].
The Series A was led by Andreessen Horowitz with participation from BroadLight Capital, Creandum, Earlybird VC, General Catalyst, Northzone, and NVIDIA. It was not publicly announced at the time but was disclosed alongside the Series B.
In December 2025, Black Forest Labs closed a $300 million Series B round at a $3.25 billion valuation, co-led by Salesforce Ventures and Anjney Midha's AMP, with participation from a16z, NVIDIA, General Catalyst, Temasek, Air Street Capital, Bain Capital Ventures, Canva, Figma Ventures, Adobe Ventures, Samsung Next, Lux Capital, and others. Notable angel investors in the round included Guillermo Rauch (Vercel CEO), Clem Delangue (Hugging Face CEO), and Mati Staniszewski (ElevenLabs CEO) [3].
Black Forest Labs has secured significant commercial partnerships that underscore enterprise demand for its technology. In September 2025, Meta signed a multi-year contract worth $140 million ($35 million in the first year, $105 million in the second year) for use of BFL's generative image technology [15]. Combined with contracts from Adobe, Canva, and Snap, BFL's total enterprise contract value reached approximately $300 million by the end of 2025 [15].
The initial FLUX.1 release on August 1, 2024 comprised three model variants, each targeting different use cases and operating under different licensing terms [1].
| Model | Parameters | Steps | License | Availability | Target Use Case |
|---|---|---|---|---|---|
| FLUX.1 [schnell] | 12B | 1-4 | Apache 2.0 | Open weights (Hugging Face) | Fast local generation, prototyping |
| FLUX.1 [dev] | 12B | 20-50 | Non-commercial (BFL license) | Open weights (Hugging Face) | Research, hobbyist, non-commercial |
| FLUX.1 [pro] | 12B | Varies | Proprietary | API only | Professional/commercial use |
Schnell (German for "fast") is the speed-optimized variant, capable of generating images in just 1 to 4 inference steps. It is released under the Apache 2.0 license, making it fully open for commercial and personal use. It can run on GPUs with as little as 12 GB of VRAM, making it accessible on consumer hardware. While it produces lower-fidelity images than the dev or pro variants, the quality is remarkably high for the extremely low step count [1].
The dev variant is a guidance-distilled version of FLUX.1 [pro], offering higher quality than schnell at the cost of requiring 20 to 50 inference steps (with 30 to 40 recommended for optimal results). It is released as source-available software under a non-commercial license, though users can obtain a self-serve commercial license from BFL. The dev model produces noticeably better skin textures, lighting effects, and fine details compared to schnell [1].
FLUX.1 [pro], the professional variant, is available exclusively through BFL's API and through partner platforms. It offers the highest image quality in the initial FLUX.1 lineup, with superior prompt adherence, photorealistic rendering, and fine detail work. It is the only variant whose weights are not publicly distributed [1].
Released on October 2, 2024 alongside the general availability of the BFL API, FLUX1.1 [pro] was a major upgrade that generated images six times faster than the original FLUX.1 [pro] while simultaneously improving image quality, prompt adherence, and output diversity. The model generates photorealistic images in approximately 4.5 seconds. It was submitted to the Artificial Analysis image arena under the codename "blueberry" and achieved the highest overall Elo score of any model on the leaderboard at the time of its debut [6].
FLUX1.1 [pro] introduced improved handling of text rendering, complex multi-object scenes, and human anatomy; high-resolution generation up to 2K (2048 x 2048 pixels) without sacrificing quality arrived with the Ultra mode described below.
On November 6, 2024, BFL released FLUX1.1 [pro] Ultra and Raw modes [16].
Ultra mode extends FLUX's capability to generate images at four times the resolution of the standard FLUX1.1 [pro], producing 4-megapixel (approximately 2048 x 2048) images in about 10 seconds. Benchmarks showed Ultra was over 2.5 times faster than comparable high-resolution offerings from competitors. Ultra is priced at $0.06 per image through the BFL API [16].
Raw mode captures the genuine feel of candid photography, producing images with a less synthetic, more natural aesthetic. It significantly increases diversity in human subjects and enhances the realism of nature photography, addressing the common criticism that AI-generated images can look overly polished or "plastic." Raw mode is available as a toggle on both the standard and Ultra variants [16].
On November 21, 2024, BFL released FLUX.1 Tools, a suite of editing capabilities designed to extend the core FLUX models [17].
| Tool | Function | Availability |
|---|---|---|
| FLUX.1 Fill | Inpainting and outpainting with text-guided editing | Pro (API) + Dev (open weights) |
| FLUX.1 Depth | Structural guidance based on depth maps from input images | Pro (API) + Dev (open weights) |
| FLUX.1 Canny | Structural guidance based on canny edge detection from input images | Pro (API) + Dev (open weights) |
| FLUX.1 Redux | Adapter for mixing and recreating input images with text prompts | Pro (API) + Dev (open weights) |
Each tool was released as a FLUX.1 [pro] variant through the API and as a guidance-distilled open-access FLUX.1 [dev] variant with inference code and weights on Hugging Face. FLUX.1 Fill [pro] achieved state-of-the-art results in inpainting benchmarks at the time of release. FLUX.1 Canny and Depth provide ControlNet-style structural conditioning, enabling precise control over the spatial layout and structure of generated images [17].
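As a rough illustration of the open-weight Fill workflow, the sketch below uses the `FluxFillPipeline` class from diffusers; the image and mask paths are placeholders, and the high guidance scale reflects the recommendation in the Fill [dev] model card.

```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("scene.png")      # placeholder: image to edit
mask = load_image("scene_mask.png")  # placeholder: white pixels mark the region to repaint

result = pipe(
    prompt="a wooden park bench",    # describes what to paint into the masked region
    image=image,
    mask_image=mask,
    guidance_scale=30.0,             # Fill [dev] runs at much higher guidance than base dev
    num_inference_steps=50,
).images[0]
result.save("filled.png")
```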
On January 16, 2025, BFL launched the FLUX Pro Finetuning API, enabling users to customize FLUX.1 [pro] with their own images and concepts. The system requires as few as 1 to 5 example images to create a targeted customization. In user studies, FLUX Pro finetuning results were preferred 68.9% of the time over other available finetuning services using FLUX.1 [dev] [18].
Once a finetune is created, it can be applied across the entire FLUX.1 model suite without additional adaptation, including FLUX.1 [pro], FLUX1.1 [pro], and the complete FLUX.1 Tools suite. This enables customized content generation with resolutions up to 4 megapixels, customized inpainting with FLUX.1 Fill, and customized structural control with FLUX.1 Depth [18].
On May 29, 2025, Black Forest Labs released FLUX.1 Kontext, a suite of models that enable in-context image generation and editing. Unlike standard text-to-image models, Kontext accepts both text and image inputs, allowing users to provide reference images and modify them through natural language instructions [12].
Kontext can extract and modify visual concepts from reference images to produce new coherent renderings, enabling use cases such as character consistency across multiple generations, style transfer, object replacement, and iterative editing without requiring fine-tuning or complex multi-step workflows.
| Model | Description | Availability |
|---|---|---|
| FLUX.1 Kontext [max] | Maximum performance with exceptional prompt adherence, advanced typography, and premium rendering quality | API ($0.08/image) |
| FLUX.1 Kontext [pro] | Balanced quality and speed for iterative editing workflows | API ($0.04/image) |
| FLUX.1 Kontext [dev] | Lightweight 12B diffusion transformer for customization and local deployment | Open weights (Hugging Face) |
BFL reported that Kontext models deliver inference speeds up to 8 times faster than competing context-aware image editing models such as GPT-Image. In September 2025, Adobe announced that FLUX.1 Kontext [pro] was available as a model option for Photoshop's Generative Fill tool in beta, marking significant validation from the professional creative tools industry [12].
On June 26, 2025, BFL released FLUX.1 Kontext [dev] as open weights, allowing the community to run the model locally and integrate it into custom workflows [19].
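A minimal local editing sketch using the `FluxKontextPipeline` class that diffusers added for this release; the input path and prompt are placeholders, and the guidance value follows the model card's example.

```python
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

source = load_image("car.png")  # placeholder reference image
edited = pipe(
    image=source,
    prompt="Change the car to red; keep the background and framing unchanged",
    guidance_scale=2.5,
).images[0]
edited.save("car_red.png")
```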
On July 31, 2025, BFL released FLUX.1 Krea [dev], a model developed in collaboration with Krea AI. FLUX.1 Krea [dev] is a 12-billion-parameter rectified flow transformer that was specifically trained to overcome the oversaturated "AI look" common in text-to-image models, achieving new levels of photorealism with a distinctive aesthetic approach [20].
The model is the open-weights version of Krea 1, offering strong performance with highly distinctive aesthetics and exceptional realism. It scored 1011 Elo in human evaluation tests, outperforming other open-source FLUX models and rivaling premium models like FLUX1.1 [pro]. It was released under a non-commercial license with weights available on Hugging Face [20].
All FLUX.1 models are built on a hybrid architecture that combines multimodal and parallel diffusion transformer (DiT) blocks, scaled to 12 billion parameters. The architecture represents an evolution of the DiT framework introduced in "Scalable Diffusion Models with Transformers" (Peebles and Xie, 2023), adapted for the text-to-image generation task [7].
The FLUX.1 architecture consists of 57 total transformer blocks, divided into two types [7]:
- 19 double-stream (multimodal) blocks, which process text and image tokens in separate weight streams that interact through joint attention; and
- 38 single-stream blocks, which process the concatenated text-image token sequence with shared weights, using the parallel attention layout described below.
This hybrid design allows the model to maintain modality-specific processing in the early layers (where text and image features are quite different) while enabling deep integration in the later layers (where the model needs to tightly coordinate text semantics with visual content).
FLUX.1 uses two text encoders working in tandem to process input prompts [21]:
- a CLIP text encoder (ViT-L/14), whose pooled output supplies a compact global summary of the prompt; and
- a T5-XXL encoder, whose per-token embeddings preserve the fine-grained structure of long, complex prompts.
This combination allows Flux to interpret complex scene descriptions with high fidelity, leveraging CLIP's visual-semantic alignment alongside T5's deep language understanding.
Flux uses flow matching as its training paradigm rather than the denoising diffusion probabilistic models (DDPM) framework used by Stable Diffusion and many earlier generative models [7].
In traditional diffusion models, the generation process involves gradually adding Gaussian noise to an image during training (the forward process) and then learning to reverse this noisy process step by step (the reverse process). Flow matching takes a conceptually simpler approach: it learns a deterministic vector field that transforms samples from a simple noise distribution directly to the target data distribution along an optimal transport path. This method, called Rectified Flow, straightens the transformation paths between noise and data, resulting in more efficient generation that requires fewer inference steps to produce high-quality outputs [7].
The practical benefit is that flow matching enables faster generation at equivalent quality levels, or higher quality at equivalent step counts, compared to DDPM-based methods.
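In code, the training objective is compact. The function below is a generic conditional flow-matching loss, a sketch rather than BFL's actual training code: `model` is assumed to be any network taking noisy latents, a timestep, and conditioning, and it is regressed onto the straight-line velocity between data and noise.

```python
import torch

def rectified_flow_loss(model, x0, cond):
    """One rectified-flow training step (illustrative, not BFL's code).

    x0: clean latents; cond: text conditioning. The network learns the
    constant velocity (x1 - x0) of the straight path from data to noise.
    """
    x1 = torch.randn_like(x0)                      # Gaussian noise endpoint
    t = torch.rand(x0.shape[0], device=x0.device)  # uniform timesteps in [0, 1]
    t_ = t.view(-1, *([1] * (x0.dim() - 1)))       # broadcast t over latent dims
    xt = (1 - t_) * x0 + t_ * x1                   # point on the straight path
    v_target = x1 - x0                             # path velocity, constant in t
    return torch.mean((model(xt, t, cond) - v_target) ** 2)
```

Because the learned paths are nearly straight, sampling can integrate the corresponding ODE with a handful of Euler steps, which is what makes few-step variants like schnell and klein feasible.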
FLUX incorporates rotary positional embeddings (RoPE) to encode spatial relationships within the image and sequential relationships within the text. RoPE enables the model to generalize across different image resolutions and aspect ratios more effectively than fixed positional encodings [7].
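For intuition, the snippet below implements the standard 1-D rotary embedding; it is a simplified stand-in, since FLUX applies RoPE across multiple axes (text position plus image row and column coordinates).

```python
import torch

def rope_angles(positions, dim, theta=10000.0):
    # one rotation frequency per channel pair, as in Su et al. (2021)
    freqs = 1.0 / theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim)
    angles = positions.float()[:, None] * freqs[None, :]  # (seq_len, dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # rotate each (even, odd) channel pair of queries/keys by its position-dependent angle
    x1, x2 = x[..., 0::2], x[..., 1::2]
    return torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1).flatten(-2)

q = torch.randn(128, 64)                       # (tokens, head_dim)
cos, sin = rope_angles(torch.arange(128), 64)
q_rot = apply_rope(q, cos, sin)                # same shape, positions now encoded in phase
```

Because positions enter the attention dot product only through relative phase differences, grids larger than those seen in training remain well behaved.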
The architecture uses parallel attention layers to improve hardware efficiency. Rather than computing self-attention and feedforward layers sequentially, parallel attention computes both simultaneously and sums their outputs. This design choice improves GPU utilization and reduces wall-clock inference time [7].
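Schematically, a parallel block applies one shared pre-norm and sums the attention and MLP branches in a single residual update. The module below is a generic illustration of that layout, not FLUX's exact block, which also carries timestep and text-conditioning modulation.

```python
import torch
import torch.nn as nn

class ParallelBlock(nn.Module):
    # attention and MLP read the same normalized input and are summed, not chained
    def __init__(self, dim, heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        h = self.norm(x)
        a, _ = self.attn(h, h, h, need_weights=False)
        return x + a + self.mlp(h)  # one residual sum instead of two sequential ones

x = torch.randn(2, 256, 512)        # (batch, tokens, dim)
y = ParallelBlock(512, 8)(x)
```

The two branches contain independent matrix multiplications that the GPU can schedule together, which is where the hardware-efficiency benefit comes from.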
Like Stable Diffusion, Flux operates in a compressed latent space rather than directly in pixel space. Images are encoded into a lower-dimensional latent representation by a variational autoencoder (VAE) before the diffusion/flow matching process begins, and the generated latent representation is decoded back into pixel space by the VAE decoder after generation is complete. FLUX.1 processes images in a 16-channel latent space, scaled up from the 4 channels used in Stable Diffusion. This expanded representation allows the model to capture more nuanced information about textures, lighting, and spatial arrangements [7].
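The 16-channel claim is easy to check against the published weights: diffusers loads the FLUX VAE as a standard `AutoencoderKL` from the checkpoint's `vae` subfolder. The random tensor below stands in for a normalized RGB image.

```python
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", subfolder="vae"
)

img = torch.randn(1, 3, 1024, 1024)  # stand-in for a normalized RGB image
with torch.no_grad():
    latents = vae.encode(img).latent_dist.sample()
    print(latents.shape)  # torch.Size([1, 16, 128, 128]): 16 channels, 8x downsampled
    recon = vae.decode(latents).sample  # decoded back to (1, 3, 1024, 1024)
```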
FLUX.2 introduces significant architectural changes compared to FLUX.1. The model scales to 32 billion parameters and replaces the dual T5 + CLIP text encoder system with a Mistral-3 24B vision-language model (VLM) [8]. By coupling a VLM trained on a massive corpus of interleaved text and images with the rectified flow transformer, FLUX.2 possesses significantly more grounded "world knowledge" than its predecessors, enabling better understanding of real-world concepts, spatial relationships, and material properties.
FLUX.2 also introduced a retrained variational autoencoder that provides an optimized trade-off between learnability, quality, and compression rate. This new VAE was released as open-source software under the Apache 2.0 license [8].
For the FLUX.2 [klein] models, the architecture uses a Qwen3 8B text embedder instead of Mistral-3, paired with a 9B or 4B flow transformer. The klein variants are step-distilled to just 4 inference steps, enabling sub-second generation on consumer GPUs [22].
On November 25, 2025, Black Forest Labs announced the FLUX.2 series, a major second-generation update to the model family. The initial announcement included FLUX.2 [pro], [flex], and [dev], with additional variants released in the following months [8].
| Model | Parameters | Text Encoder | License | Release Date | Key Features |
|---|---|---|---|---|---|
| FLUX.2 [max] | 32B | Mistral-3 24B VLM | Proprietary (API) | January 2026 | Highest quality, grounded generation with web context |
| FLUX.2 [pro] | 32B | Mistral-3 24B VLM | Proprietary (API) | November 2025 | Production-grade, multi-reference support |
| FLUX.2 [flex] | 32B | Mistral-3 24B VLM | Proprietary (API) | November 2025 | Tunable parameters (steps, guidance), typography specialist |
| FLUX.2 [dev] | 32B | Mistral-3 24B VLM | BFL non-commercial | November 2025 | Open weights, LoRA training, local deployment |
| FLUX.2 [klein] 9B | 9B | Qwen3 8B | Apache 2.0 | January 15, 2026 | Sub-second generation, consumer hardware |
| FLUX.2 [klein] 4B | 4B | Qwen3 8B | Apache 2.0 | January 15, 2026 | Smallest model, ~13 GB VRAM, consumer GPUs |
| FLUX.2 [klein] 9B-KV | 9B | Qwen3 8B | Apache 2.0 | March 2026 | KV-cache for 2.5x faster multi-reference editing |
FLUX.2 [max] is the highest-performance model in the lineup, delivering the most consistent image editing and the strongest prompt following across the FLUX.2 family. It preserves colors, lighting, faces, text, and objects with exceptional fidelity during editing tasks. Despite major gains in quality, it generates content nearly as fast as FLUX.2 [pro], making it up to 3 times faster than competing models of similar quality. It supports grounded generation with real-time web context [23].
FLUX.2 [pro] is the production-grade variant that balances state-of-the-art quality with speed. It supports processing up to 10 reference images simultaneously while preserving character features, product details, and style elements across outputs. It can generate and edit images at resolutions up to 4 megapixels. Teams use this variant when they need reliable, consistent results without parameter tuning [8].
FLUX.2 [flex] provides developer control over inference parameters such as the number of sampling steps and the guidance scale, enabling developers to tune the trade-offs between speed, text accuracy, and detail fidelity for each project. It specializes in text rendering and fine details, making it well suited for typography, UI mockups, and infographics [8].
FLUX.2 [dev], the open-weight 32B model, combines text-to-image synthesis and image editing with multiple input images in a single checkpoint. It is available on Hugging Face with optimized fp8 implementations for consumer GPUs. The dev variant is suitable for developers, researchers, and power users who want local or cloud deployments, LoRA training, or rapid iteration [8].
Klein (German for "small") is the fastest model family, generating and editing images in under one second. Available in 4B and 9B parameter sizes, klein is designed for real-time applications, rapid creative iteration, and deployment on consumer hardware. The 4B variant requires approximately 13 GB of VRAM, making it accessible on GPUs like the NVIDIA RTX 3090 and RTX 4070. It is released under the Apache 2.0 license [22].
Unlike previous generation models that required separate pipelines for generation and editing, FLUX.2 [klein] unifies text-to-image, single-reference editing, and multi-reference generation in one architecture.
In March 2026, BFL released FLUX.2 [klein] 9B-KV and its FP8 variant, which incorporate KV-cache optimization. By caching key-value pairs from reference images during the first denoising step, the KV variant eliminates redundant computation in subsequent steps, achieving up to 2.5 times faster inference for multi-reference editing tasks [24].
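The toy computation below illustrates the caching idea with plain attention arithmetic (not BFL's implementation): key/value projections for the fixed reference tokens are computed once before the denoising loop, so each later step only projects the changing target tokens.

```python
import torch

def attend(q, k, v):
    w = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return w @ v

d = 32
ref = torch.randn(64, d)              # reference-image tokens: fixed across steps
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

ref_k, ref_v = ref @ Wk, ref @ Wv     # cached once, before the denoising loop

for step in range(4):                 # e.g. klein's 4 distilled steps
    tgt = torch.randn(16, d)          # stand-in for the current noisy latent tokens
    q = tgt @ Wq
    k = torch.cat([ref_k, tgt @ Wk])  # reuse cached reference K/V every step
    v = torch.cat([ref_v, tgt @ Wv])
    out = attend(q, k, v)             # targets attend to reference tokens and themselves
```

The saving grows with the number of reference images, which is why the speedup is largest for multi-reference editing.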
One of FLUX's most praised capabilities is its ability to render legible, accurately spelled text within generated images. Text rendering has historically been one of the weakest aspects of diffusion-based image generators, with models like Midjourney and earlier versions of Stable Diffusion frequently producing garbled or misspelled text. FLUX handles text rendering with significantly higher accuracy, producing sharp, readable typography even at small sizes and in complex layouts [9].
FLUX generates highly photorealistic images with notably fewer artifacts in human anatomy, particularly hands and fingers, which have been a persistent challenge for image generation models. Comparative evaluations have found that FLUX maintains better anatomical consistency than Midjourney V6.1 and DALL-E 3, with fewer instances of extra fingers, deformed limbs, or distorted facial features [9].
The model demonstrates strong prompt following, accurately representing complex multi-object scenes, spatial relationships, and specific attributes described in text prompts. This capability is partly attributable to the dual-branch transformer architecture, which allows deep cross-modal attention between text and image features [9].
| Feature | FLUX.1 [pro] | FLUX.2 [pro] | Midjourney V6 | DALL-E 3 | Stable Diffusion XL |
|---|---|---|---|---|---|
| Parameters | 12B | 32B | Unknown (proprietary) | Unknown (proprietary) | ~3.5B |
| Text Rendering | Excellent | Excellent | Poor | Good | Poor |
| Photorealism | Excellent | State of the art | Excellent | Good | Good |
| Anatomy/Hands | Excellent | Excellent | Good (improved in V6.1) | Moderate | Moderate |
| Open Weights | Partial (schnell, dev) | Partial (dev, klein) | No | No | Yes |
| Local Deployment | Yes (schnell, dev) | Yes (dev, klein) | No | No | Yes |
| Training Approach | Flow matching | Flow matching | Diffusion | Diffusion | Diffusion |
| Architecture | DiT (12B transformer) | DiT + Mistral-3 VLM (32B) | Unknown | Unknown | U-Net |
| Max Resolution | 2K (Ultra: 4MP) | 4MP native | Unknown | 1024x1024 | 1024x1024 |
| Image Editing | Via Tools suite | Native (unified model) | Limited | Via DALL-E editor | Via extensions |
Flux's primary advantage over Midjourney and DALL-E 3 is the availability of open weights for the schnell, dev, and klein variants, enabling local deployment, fine-tuning, and community-driven extensions. Compared to Stable Diffusion XL, Flux offers substantially higher quality across all dimensions due to its much larger model size and more advanced architecture [9].
Black Forest Labs offers a credit-based API pricing system where 1 credit equals $0.01 USD. Pricing scales with model capability and, for some models, with output resolution [25].
| Model | Price per Image | Notes |
|---|---|---|
| FLUX.2 [klein] 4B | From $0.014 | Megapixel-based pricing |
| FLUX.2 [klein] 9B | From $0.015 | Megapixel-based pricing |
| FLUX.2 [pro] | From $0.03 (generation), $0.045 (editing) | Production-grade |
| FLUX.2 [flex] | $0.05 (generation), $0.10 (editing) | Tunable parameters |
| FLUX.2 [dev] | Free | Non-commercial local use |
| FLUX.1 Kontext [pro] | $0.04 | Context-aware editing |
| FLUX.1 Kontext [max] | $0.08 | Highest Kontext quality |
| FLUX1.1 [pro] | $0.04 | Standard generation |
| FLUX1.1 [pro] Ultra | $0.06 | 4MP high-resolution |
| FLUX.1 Fill [pro] | $0.05 | Inpainting/outpainting |
The same pricing applies for both API and Playground access. Batch requests multiply the base cost by the number of images requested. FLUX models are also available through numerous third-party platforms including Together AI, Replicate, Fal.AI, Cloudflare Workers AI, DeepInfra, Runware, and the NVIDIA NIM API catalog [6][25].
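A minimal request/poll sketch against the BFL API, with the caveat that the base URL, endpoint path, header name, and response fields below follow BFL's public documentation as best recalled and should be verified against the current docs before use.

```python
import os
import time

import requests

API = "https://api.bfl.ai"  # assumed base URL; confirm in BFL's documentation
headers = {"x-key": os.environ["BFL_API_KEY"]}

# submit an asynchronous generation task (endpoint name assumed from BFL docs)
task = requests.post(
    f"{API}/v1/flux-pro-1.1",
    headers=headers,
    json={"prompt": "a red fox in morning fog", "width": 1024, "height": 1024},
).json()

# poll until the image is ready, then print its URL
while True:
    res = requests.get(
        f"{API}/v1/get_result", headers=headers, params={"id": task["id"]}
    ).json()
    if res["status"] == "Ready":
        print(res["result"]["sample"])  # URL of the generated image
        break
    time.sleep(0.5)
```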
FLUX models achieved rapid adoption after their August 2024 launch, reaching 1 million API inferences in the first week and over 500,000 downloads on Hugging Face [4].
The open-weight FLUX models have fostered a large and active community, particularly around ComfyUI, the node-based visual workflow tool for diffusion models. ComfyUI provided day-one support for FLUX.1 Tools at their November 2024 launch and has continued to add support for new FLUX variants as they are released. In November 2025, NVIDIA highlighted FLUX.2 models as optimized for RTX GPUs and showcased ComfyUI workflows in its RTX AI Garage program [26].
Community-built tools like FluxGym simplify LoRA training for FLUX models, and Kohya_ss remains a widely used option for comprehensive LoRA training with support for 12 GB VRAM setups. FLUX LoRAs enable users to teach the model new concepts, characters, and styles, with trained weights easily integrated into existing ComfyUI workflows [27].
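Once trained, a FLUX LoRA loads into diffusers through the standard LoRA API; the repository name, weight filename, and trigger word below are placeholders.

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

# placeholder repo and filename for a community- or self-trained LoRA
pipe.load_lora_weights("your-name/your-flux-lora", weight_name="lora.safetensors")

# "sks_style" is a placeholder trigger word chosen at training time
image = pipe("a mountain village, sks_style", num_inference_steps=28).images[0]
image.save("lora_sample.png")
```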
On December 29, 2025, Fal.AI released FLUX.2 [dev] Turbo, a distilled LoRA adapter for FLUX.2 [dev] that enables high-quality image generation in just 8 inference steps (compared to 50 for the base model). The adapter uses a customized DMD2 distillation technique and was released on Hugging Face. This third-party contribution demonstrated the value of BFL's open-weight strategy in enabling community-driven optimization [28].
In October 2025, MLCommons selected Flux.1 as the new text-to-image benchmark for MLPerf Training v5.1, replacing Stable Diffusion v2 to reflect modern model architectures and scale. The 11.9-billion-parameter transformer-based model serves as a representative benchmark for current generative AI workloads. In the MLPerf Training v5.1 results released on November 12, 2025, NVIDIA set a record time-to-train of 12.5 minutes using 1,152 Blackwell GPUs [29].
Black Forest Labs uses a tiered licensing strategy across its model family [1]:
- Apache 2.0 for the fastest open models (FLUX.1 [schnell] and the FLUX.2 [klein] family), permitting unrestricted commercial and personal use;
- the BFL non-commercial license for the open-weight dev models (FLUX.1 [dev], FLUX.2 [dev]), with self-serve commercial licenses available separately; and
- proprietary, API-only distribution for the pro, flex, and max tiers.
This approach balances open-source community building (through the Apache-licensed models) with revenue generation (through the API-only professional variants).
Flux represents the technical evolution of ideas that originated in several key research papers:
- "High-Resolution Image Synthesis with Latent Diffusion Models" (Rombach et al., 2022), which moved generation into a compressed latent space;
- "Scalable Diffusion Models with Transformers" (Peebles and Xie, 2023), which introduced the DiT backbone that FLUX extends;
- the flow matching and rectified flow papers (Lipman et al., 2022; Liu et al., 2022), which replaced DDPM-style denoising with straightened noise-to-data transport; and
- "RoFormer" (Su et al., 2021), which introduced the rotary positional embeddings FLUX uses for resolution-robust position encoding.
| Date | Release |
|---|---|
| August 1, 2024 | Black Forest Labs launch; FLUX.1 [schnell], [dev], [pro] released |
| October 2, 2024 | FLUX1.1 [pro] and BFL API general availability |
| November 6, 2024 | FLUX1.1 [pro] Ultra and Raw modes |
| November 21, 2024 | FLUX.1 Tools (Fill, Depth, Canny, Redux) |
| January 2, 2025 | NVIDIA collaboration for performance optimization |
| January 16, 2025 | FLUX Pro Finetuning API launch |
| May 29, 2025 | FLUX.1 Kontext (Max, Pro, Dev) and BFL Playground |
| June 26, 2025 | FLUX.1 Kontext [dev] open weights |
| July 31, 2025 | FLUX.1 Krea [dev] (collaboration with Krea AI) |
| September 25, 2025 | FLUX.1 Kontext integration in Adobe Photoshop (beta) |
| November 25, 2025 | FLUX.2 series announcement (Pro, Flex, Dev) |
| December 1, 2025 | $300M Series B at $3.25B valuation |
| January 15, 2026 | FLUX.2 [klein] (4B and 9B) |
| January 2026 | FLUX.2 [max] |
| March 2026 | FLUX.2 [klein] 9B-KV and FP8 variants |
As of March 2026, Black Forest Labs has established itself as one of the leading companies in AI image generation, competing directly with Midjourney, OpenAI's DALL-E and GPT-Image, and Google's Imagen. The company's valuation of $3.25 billion, its $140 million partnership with Meta, and its integration into Adobe Photoshop underscore the commercial significance of its technology.
The FLUX model family spans two generations and offers capabilities ranging from sub-second generation on consumer hardware (FLUX.2 [klein]) to professional-grade 4-megapixel output with multi-reference support (FLUX.2 [max] and [pro]). The open-weight releases under Apache 2.0 have fostered a large community of developers building custom workflows, fine-tuned models, and integrations through platforms like ComfyUI and the broader Stable Diffusion ecosystem.
Black Forest Labs has also indicated ongoing development of a text-to-video model, positioning the company to compete in video generation alongside image generation.
Black Forest Labs represents a notable case of academic researchers successfully commercializing foundational AI research. The company's founders created the technology underlying Stable Diffusion, left Stability AI, and built a new company around the next generation of that same technology, achieving a multi-billion-dollar valuation within 18 months of founding.