Robin Rombach
Last reviewed
Jun 5, 2026
Sources
18 citations
Review status
Source-backed
Revision
v2 · 2,289 words
Robin Rombach is a German computer scientist and entrepreneur best known as the lead author of the latent diffusion models paper that underpins Stable Diffusion, and as the co-founder and chief executive of Black Forest Labs, the company behind the FLUX family of image-generation models. [1][2] He developed the core latent diffusion architecture in the CompVis research group while completing his doctorate, helped commercialize it during a period as a research director at Stability AI, and in 2024 left to start his own lab. [2][3]
Rombach's work sits at the center of the open generative-image boom of the 2020s. The latent diffusion approach he led made high-resolution text-to-image synthesis cheap enough to run on consumer hardware, and its public release as Stable Diffusion in August 2022 put that capability in the hands of millions of users. [4] His Google Scholar profile, which lists Black Forest Labs as his affiliation, records roughly 59,000 citations and an h-index of 31; his most-cited paper, the latent diffusion work, alone accounts for more than 34,000 of those citations. [5]
After Stable Diffusion, Rombach co-authored several of the most influential follow-on papers in the field, including SDXL and Stable Video Diffusion, before founding Black Forest Labs, which by late 2025 had raised more than $450 million and reached a valuation of $3.25 billion. [5][6]
Rombach studied physics at Heidelberg University from roughly 2013 to 2020, earning a master's degree, before moving into computer science for his doctorate. [2][12] He completed his PhD at Ludwig Maximilian University of Munich (LMU Munich). [2] His doctoral work was carried out in the Computer Vision and Learning group, commonly known as CompVis, led by Björn Ommer. The group was originally based at Heidelberg University's Interdisciplinary Center for Scientific Computing (IWR) and relocated to LMU Munich, with Rombach's research team making the move in 2021. [2][7] Under Ommer's supervision he concentrated on making deep generative models more computationally efficient, a theme that runs through his later work. [3][7][12]
Early papers from this period established Rombach's reputation in generative modeling. Two of them are among his most cited: the 2021 paper "Taming Transformers for High-Resolution Image Synthesis," which introduced the VQGAN model and has accumulated more than 5,000 citations on Google Scholar, and the latent diffusion work that followed. [5] Taming Transformers paired a vector-quantized adversarial autoencoder with a transformer that modeled the resulting discrete latent codes, which made it practical for the first time to apply transformers to high-resolution images, and the compression idea it explored fed directly into the latent diffusion design. [5]
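The vector-quantization step described above can be illustrated with a minimal sketch. It assumes a toy two-dimensional codebook and squared Euclidean distance; the function name `quantize` and all numeric values are hypothetical and are not taken from the VQGAN code. Each encoder output vector is replaced by the index of its nearest codebook entry, and the resulting sequence of indices is what a transformer would then model.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry."""
    indices = []
    for vec in vectors:
        distances = [squared_distance(vec, entry) for entry in codebook]
        indices.append(distances.index(min(distances)))
    return indices


# Hypothetical toy codebook with three entries (real codebooks are learned
# and far larger).
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]

# Hypothetical encoder outputs, one vector per image patch.
encoder_outputs = [[0.1, -0.2], [3.5, 4.2], [0.9, 1.2]]

codes = quantize(encoder_outputs, codebook)
print(codes)  # [0, 2, 1]

# A decoder would look the indices back up to recover approximate features.
reconstructed = [codebook[i] for i in codes]
```

The point of the sketch is only that an image becomes a short sequence of discrete symbols, which is far cheaper for a transformer to model than raw pixels.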
In December 2021 Rombach and colleagues posted "High-Resolution Image Synthesis with Latent Diffusion Models," which was presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) in 2022. [3][8] Rombach is the paper's first author. The central idea was to run the diffusion process not on raw pixels but in the compressed latent space of a pretrained autoencoder, which cut the computational cost of training and sampling by a large factor while preserving image quality. [8] The same architecture introduced cross-attention conditioning, the mechanism that lets a text prompt or other input steer the denoising process, which is what turned latent diffusion into a general text-to-image system. [8] That efficiency is what made open, consumer-grade image generation practical, and the paper has gone on to become one of the most cited works in modern computer vision, with more than 34,000 citations recorded on Google Scholar. [5] Its influence extended well beyond the open-source ecosystem: the latent diffusion design informed architectural elements of later proprietary systems such as DALL-E and Sora. [13]
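A rough sketch of the two ideas in this paragraph follows. The first part is arithmetic only: it assumes a 512×512 RGB image, an autoencoder that downsamples by a factor of 8, and 4 latent channels, which are illustrative assumptions rather than figures given above, and it uses element count as a crude proxy for cost. The second part is a bare-bones cross-attention with no learned projections, in which latent tokens act as queries and text tokens supply keys and values. All function names and values are hypothetical.

```python
import math


def latent_shape(height, width, downsample_factor, latent_channels):
    """Shape of the autoencoder latent for an image of the given size."""
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image size must be divisible by the downsample factor")
    return (height // downsample_factor, width // downsample_factor, latent_channels)


# Part 1: how much smaller is the latent than the pixel grid?
pixel_shape = (512, 512, 3)  # assumed RGB image
latent = latent_shape(512, 512, downsample_factor=8, latent_channels=4)
reduction = math.prod(pixel_shape) / math.prod(latent)
print(latent, reduction)  # (64, 64, 4) 48.0


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the value vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs


# Part 2: two latent tokens attend over two text tokens (toy values).
latent_queries = [[1.0, 0.0], [0.0, 1.0]]
text_keys = [[10.0, 0.0], [0.0, 10.0]]
text_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
attended = cross_attention(latent_queries, text_keys, text_values)
```

In this toy setup each latent token ends up dominated by the text token whose key it matches, which is the sense in which the prompt steers the denoising process.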
The CompVis group at LMU Munich released the resulting model under the name latent diffusion. [4] Building on it, a collaboration involving CompVis, the startup Runway, and Stability AI produced Stable Diffusion, which was made publicly available on 22 August 2022 with open weights and source code. [4] Rombach led the development of Stable Diffusion alongside Patrick Esser, then of Runway. [4] Releasing the weights and code openly, in contrast to the closed releases of DALL-E and Midjourney, was a deliberate choice that made the model freely modifiable and is widely credited with seeding a large open ecosystem of fine-tunes, interfaces, and downstream tools. [4]
The CVPR 2022 paper had five authors, listed below, four of whom went on to help build the commercial Stable Diffusion models at Stability AI. [4]
| Author | Note |
|---|---|
| Robin Rombach | First author; later co-founder and CEO of Black Forest Labs |
| Andreas Blattmann | Later co-founder of Black Forest Labs |
| Dominik Lorenz | Later co-founder of Black Forest Labs |
| Patrick Esser | Then at Runway; later co-founder of Black Forest Labs |
| Björn Ommer | CompVis group lead and Rombach's doctoral advisor |
Rombach's published research is concentrated on generative models for images and video and, in particular, on making them more efficient to train and sample. Across this body of work he has been first or senior author on several of the most cited papers in the area. The table below lists the landmark contributions and what each introduced, with approximate citation counts from his Google Scholar profile as of 2026. [5]
| Year | Paper | Venue | Contribution | Citations (approx.) |
|---|---|---|---|---|
| 2021 | Taming Transformers for High-Resolution Image Synthesis | CVPR 2021 | Introduced VQGAN; combined a vector-quantized adversarial autoencoder with a transformer to scale autoregressive image synthesis | 5,100+ [5] |
| 2022 | High-Resolution Image Synthesis with Latent Diffusion Models | CVPR 2022 | Introduced latent diffusion; moved the diffusion process into a compressed latent space and added cross-attention conditioning, the basis of Stable Diffusion | 34,000+ [5][8] |
| 2023 | Align Your Latents: High-Resolution Video Synthesis with Latent Diffusion Models | CVPR 2023 | Extended latent diffusion to video, a step toward later video models | 2,000+ [5] |
| 2023 | Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets | arXiv | Introduced Stable Video Diffusion; described data and training recipes for image-to-video generation | 2,600+ [5] |
| 2024 | SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | ICLR 2024 | A larger, redesigned latent diffusion model released as SDXL | 5,300+ [5] |
| 2024 | Adversarial Diffusion Distillation | ECCV 2024 | A distillation method (the basis of SDXL Turbo) that cuts sampling to a few steps | 900+ [5] |
| 2024 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | ICML 2024 | Described the rectified-flow transformer architecture behind Stable Diffusion 3 | 4,300+ [5] |
This sequence traces a consistent research arc: from compressing images so generative models can run efficiently (VQGAN), to performing diffusion in that compressed space (latent diffusion), to scaling the approach to larger models and to video, and then to reducing the number of sampling steps through distillation and to replacing the diffusion formulation with rectified flow and flow matching. [5][14] The rectified-flow transformer work in particular reformulated image generation as a flow-matching problem and became the foundation both for Stable Diffusion 3 and for the FLUX models Rombach later built at Black Forest Labs. [14]
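The rectified-flow idea mentioned above can be sketched in a few lines, under stated assumptions. Here a data point and a noise sample are joined by a straight line, the training target is the constant velocity along that line, and sampling integrates the velocity from noise back to data with Euler steps. The `oracle_velocity` function stands in for a trained network and is purely hypothetical, as are all names and values; this is not the Stable Diffusion 3 or FLUX implementation.

```python
def interpolate(data, noise, t):
    """Point on the straight path: data at t=0, noise at t=1."""
    return [(1.0 - t) * d + t * n for d, n in zip(data, noise)]


def velocity_target(data, noise):
    """Derivative of the straight path with respect to t (constant in t)."""
    return [n - d for d, n in zip(data, noise)]


def euler_sample(noise, velocity_fn, steps):
    """Integrate from t=1 (noise) down to t=0 (data) with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


# Toy three-dimensional "image" and noise sample (hypothetical values).
data = [0.5, -1.0, 2.0]
noise = [1.5, 0.25, -0.75]

midpoint = interpolate(data, noise, 0.5)


def oracle_velocity(x, t):
    """Stand-in for a trained model: returns the exact target velocity."""
    return velocity_target(data, noise)


sample = euler_sample(noise, oracle_velocity, steps=4)
```

Because the path is a straight line, a perfect velocity estimate recovers the data point in any number of Euler steps. A learned model is only approximate, but straighter paths are one reason this formulation pairs naturally with few-step sampling.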
Rombach joined the London-based Stability AI around 2022 and became a research director there, leading work on the company's image-generation models. [2][3] During this tenure he and his collaborators shipped a series of widely used systems and accompanying papers, among them SDXL ("Improving Latent Diffusion Models for High-Resolution Image Synthesis"), Stable Video Diffusion, and the rectified-flow research that informed Stable Diffusion 3. [5] He was one of two academics from LMU Munich, alongside Patrick Esser, who had originated the model and then scaled it commercially. [9]
Stability AI went through a turbulent stretch in early 2024: its chief executive and founder, Emad Mostaque, resigned in March 2024 amid mounting financial and organizational pressure. [3] In the preceding twelve months the company had also replaced its chief technology officer and lost a string of senior staff, including its VP of product, VP of engineering, VP of research and development, head of research, and two large language model leads. [9] Rombach left the company that same month, departing with several of the researchers who had been central to its image models. [2][9] In a brief statement Stability AI thanked him for his contribution and wished him well, while Rombach did not initially comment publicly on the move. [9] His exit removed one of the chief architects of the technology that had made Stability AI's name.
In 2024 Rombach co-founded Black Forest Labs together with Andreas Blattmann, Patrick Esser, and Dominik Lorenz, all veterans of the latent diffusion and Stable Diffusion effort. [3][9] The company is based in Freiburg, in southwestern Germany near the Black Forest from which it takes its name, with an additional lab in San Francisco, and operates as a German-American generative-AI lab. [2][9] It emerged from stealth on 1 August 2024 with an initial suite of text-to-image models and a $31 million seed round led by Andreessen Horowitz, with participation from General Catalyst, the early-stage fund Maetch VC, and angel investors including Y Combinator chief executive Garry Tan, former Oculus chief executive Brendan Iribe, the talent agent Michael Ovitz, and NVIDIA researcher Timo Aila. [9][15] Andreessen Horowitz framed its investment around the firm's plan to build "the world's best open visual models for developers," continuing the open-weights approach that had made Stable Diffusion influential. [13]
The lab's flagship product is the FLUX family of models. The launch lineup, FLUX.1, was built on rectified-flow transformer blocks scaled to 12 billion parameters and came in three tiers: an open-weight "schnell" model under an Apache 2.0 license, a "dev" model released under a non-commercial license, and a proprietary "pro" model offered only through an API. [16] The line was positioned to compete with the strongest image models from large incumbents, and Rombach framed the company's ambition as building models that could go head to head with "state-of-the-art models made at large institutions like Google and Nvidia." [2] Days after launch, Black Forest Labs gained wide visibility when xAI adopted FLUX.1 to power image generation inside its Grok chatbot. [9] The company iterated quickly, releasing an improved FLUX 1.1 Pro on 2 October 2024 and adding faster "Ultra" and "Raw" modes the following month. [16]
Black Forest Labs subsequently expanded the FLUX line with image-editing models marketed under the "Kontext" name, released on 29 May 2025, which perform in-context generation and editing driven by both text and image prompts across Max, Pro, and Dev variants. [16][17] In late 2025 it shipped a next-generation release, FLUX.2, announced on 25 November 2025, which the company said improved text and image rendering, supported using up to ten reference images to hold style consistent, and could generate images at resolutions up to 4 megapixels. [6][18] FLUX.2 pairs a 32-billion-parameter flow-matching transformer with a Mistral-3 vision-language model (about 24 billion parameters) that interprets text and image inputs, and was released across open-weight "klein," non-commercial "dev," and proprietary tiers. [18]
As of 2026 Rombach is the co-founder and chief executive of Black Forest Labs, which he describes as a frontier research lab advancing generative models for images and video, and he is a vocal advocate of open-source model development. [1][3] He has presented the company's research arc publicly, including an October 2024 talk at Stanford University titled "From Latent Diffusion to FLUX and Beyond: Scaling Efficient Content Creation." [12] Under his leadership the company's FLUX models have been integrated by a range of partners, with Adobe, Canva, and Meta among the companies that have folded FLUX into creative or enterprise products. [6][10]
The lab's financing reflects its rapid rise. On 1 December 2025 Black Forest Labs announced a $300 million Series B at a post-money valuation of $3.25 billion, led by Salesforce Ventures and Anjney Midha of Andreessen Horowitz, with additional backing from investors including NVIDIA, General Catalyst, Temasek, Bain Capital Ventures, Canva, and Figma Ventures. [6][11] Counting an earlier, previously unannounced Series A, the round brought the company's total funding to more than $450 million. [11]