Stable Diffusion 3 (SD3) is a family of text-to-image diffusion models developed by Stability AI, first announced as an early preview on February 22, 2024. The series introduced a new architecture called the Multimodal Diffusion Transformer (MMDiT), trained with a rectified flow objective, and abandoned the U-Net backbone that had defined every previous Stable Diffusion release. SD3 was meant to be the company's answer to closed competitors such as DALL-E 3 and Midjourney v6, with substantially improved prompt following, multi-subject handling, and in-image text rendering compared to Stable Diffusion XL (SDXL). [1] [2]
The public release did not go to plan. SD3 Medium (2 billion parameters) shipped on June 12, 2024, under a Stability AI Community License whose commercial terms many in the open-source community considered hostile. Within days, the model became the subject of memes about distorted human anatomy, and CivitAI, the largest hub for Stable Diffusion fine-tunes, briefly banned uploads of SD3-derived models entirely. Four months later, after a near-complete management turnover at Stability AI and the departure of most of the original Stable Diffusion research team to found Black Forest Labs, the company released the SD 3.5 family starting on October 22, 2024, with a revised license, retrained weights, and three models. [3] [4] [5]
This article covers the SD3 and SD 3.5 series together, since the two are technically continuous and the SD 3.5 release effectively superseded the original SD3 weights. It also covers the corporate context that shaped the launch, because SD3 is one of the rare cases where a model's reception was driven more by licensing and company drama than by image quality.
| Attribute | Detail |
|---|---|
| Developer | Stability AI |
| Initial preview | February 22, 2024 |
| Open weights release | June 12, 2024 (SD3 Medium) |
| Latest release | SD 3.5 Medium (October 29, 2024) |
| Architecture | Multimodal Diffusion Transformer (MMDiT) |
| Sampler objective | Rectified flow |
| Parameters | 2B (Medium) to 8B (Large) |
| Text encoders | CLIP-L, OpenCLIP-bigG, T5-XXL |
| VAE channels | 16 |
| License | Stability AI Community License |
| Predecessor | SDXL |
| Successor in lineage | Flux (by ex-Stability researchers) |
| Paper | arxiv.org/abs/2403.03206 |
| Hugging Face | huggingface.co/stabilityai/stable-diffusion-3-medium |
The period leading up to SD3 was, by the company's own later admission, chaotic. By October 2023 Stability AI was reportedly burning roughly $8 million per month on cloud compute against monthly revenue near $5.4 million. An attempted fundraise at a $4 billion valuation fell apart. In a letter to the board, Lightspeed Venture Partners said founder Emad Mostaque's management had "severely undermined" their confidence in him; Coatue Management pushed for his resignation and opened an internal review. [6]
Emad Mostaque resigned as CEO and from the board on March 23, 2024, exactly one month after SD3 was first shown to the public. He framed the move publicly as a desire to pursue "decentralized AI," though contemporary reporting described it as effectively forced. Within days, three of the four authors of the original latent diffusion paper, Robin Rombach, Andreas Blattmann, and Dominik Lorenz, also left Stability AI. They went on to co-found Black Forest Labs and ship the Flux image models, which in many respects became the spiritual successor to SD3 within the same year. [7] [8]
The SD3 paper credits a long author list, with Patrick Esser as first author and Rombach as senior author. By the time the open weights came out in June 2024, several names on that paper were no longer Stability employees.
The immediate technical motivation for SD3 was that SDXL, released in July 2023, had hit a ceiling on prompt following. SDXL's 3.5 billion-parameter U-Net produced sharp, well-composed images at 1024x1024 native resolution, but it routinely failed at compositional prompts ("a red cube on top of a blue sphere"), legible in-image text, and complex spatial relationships. Closed competitors had moved further ahead: DALL-E 3, released by OpenAI in October 2023 and integrated into ChatGPT, used a captioning rewrite step that gave it noticeably better prompt adherence; Midjourney v6, released around the same time, also rendered text reasonably well. [9]
Stability AI needed both a quality jump and an architectural story for the next round of fundraising. SD3 was supposed to provide both.
SD3 is described in the paper "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis" (Esser et al., arxiv:2403.03206). The architecture differs from SDXL in three substantive ways: it uses a transformer rather than a U-Net, it trains with a rectified flow objective rather than the standard diffusion noise-prediction objective, and it conditions on three text encoders rather than two. [1]
The MMDiT block is the centerpiece. Earlier diffusion transformers (DiT, PixArt) treated image and text tokens as a single concatenated sequence fed through ordinary transformer blocks. MMDiT instead keeps two separate parameter streams, one for image tokens and one for text tokens, each with its own weight matrices for query, key, value, and feed-forward projections. The two streams meet only inside the attention operation, where their projected queries, keys, and values are concatenated, attention is computed jointly over the combined sequence, and the resulting context is split back to the two streams.
In practice this means an SD3 transformer block is roughly twice the size of a corresponding DiT block at the same depth, but the two modalities can develop their own representations rather than fighting over a shared embedding space. Esser et al. report that this separation is the single largest source of the gain in prompt adherence over earlier DiT variants. [1]
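The split-parameter, joint-attention design can be sketched in a few lines of NumPy. This is a toy, single-head sketch with made-up dimensions, not the actual SD3 implementation (which adds multi-head attention, timestep modulation, and per-stream feed-forward layers):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d, n_img, n_txt = 8, 4, 3          # toy width and sequence lengths

# Each stream owns its projection weights (the "two parameter sets")
Wq_i, Wk_i, Wv_i = (rng.standard_normal((d, d)) for _ in range(3))
Wq_t, Wk_t, Wv_t = (rng.standard_normal((d, d)) for _ in range(3))

img = rng.standard_normal((n_img, d))    # image latent tokens
txt = rng.standard_normal((n_txt, d))    # text tokens

# Project each stream with its own weights, then concatenate into
# one joint sequence
q = np.concatenate([img @ Wq_i, txt @ Wq_t])
k = np.concatenate([img @ Wk_i, txt @ Wk_t])
v = np.concatenate([img @ Wv_i, txt @ Wv_t])

# A single attention pass over the combined sequence: every image token
# can attend to every text token and vice versa
ctx = softmax(q @ k.T / np.sqrt(d)) @ v

# Split the context back so each stream's own layers process its half
img_ctx, txt_ctx = ctx[:n_img], ctx[n_img:]
```

The point of the exercise is the shape of the data flow: the only place the two streams exchange information is the joint attention, and everything before and after it is duplicated per modality.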
Traditional diffusion models train on a noise schedule, learning to denoise samples drawn at randomly chosen timesteps along a curved trajectory between data and Gaussian noise. Rectified flow, introduced in 2022 by Liu et al., reformulates this as learning a constant velocity field along a straight line in the data-noise space. SD3 uses a rectified flow loss with a logit-normal weighting on the timestep, which the paper finds outperforms standard diffusion across model sizes. [1]
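The objective is compact enough to write down directly. A toy NumPy sketch of one loss evaluation, illustrative only; in SD3 the samples are VAE latents and `model` is the MMDiT, both assumptions here:

```python
import numpy as np

def rectified_flow_loss(model, x0, rng):
    """One rectified-flow training loss evaluation (toy sketch).

    x0: a batch of data samples, shape (batch, dim).
    model(x_t, t): predicts the velocity at interpolant x_t, time t.
    Convention: t = 0 is data, t = 1 is pure Gaussian noise.
    """
    x1 = rng.standard_normal(x0.shape)         # Gaussian noise endpoint
    u = rng.standard_normal((x0.shape[0], 1))  # logit-normal timesteps:
    t = 1.0 / (1.0 + np.exp(-u))               #   t = sigmoid(u), u ~ N(0, 1)
    x_t = (1.0 - t) * x0 + t * x1              # straight-line interpolant
    v_target = x1 - x0                         # constant velocity on the line
    return np.mean((model(x_t, t) - v_target) ** 2)
```

The logit-normal sampling concentrates training effort on mid-trajectory timesteps, which is the weighting the paper reports as best across model sizes.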
In inference terms this matters because rectified flow models can be sampled with relatively few steps (often 25 to 30 for high quality) and respond well to distillation, which is why Stability could ship SD3 Large Turbo and SD 3.5 Large Turbo as 4-step distilled variants without major quality collapse.
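Sampling is then just numerical integration of the learned velocity field from the noise end (t = 1) down to the data end (t = 0). A toy Euler integrator, ignoring classifier-free guidance and the other details real SD3 pipelines add:

```python
import numpy as np

def euler_sample(model, shape, steps, rng):
    """Integrate dx/dt = v(x, t) from t = 1 (noise) to t = 0 (data)."""
    x = rng.standard_normal(shape)           # start at the noise endpoint
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * model(x, t_cur)  # t_next < t_cur
    return x
```

With the exact velocity field of a straight-line flow, even a single Euler step lands on the data point, which is the intuition behind why rectified flow models tolerate low step counts better than curved-trajectory diffusion models.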
SD 1.x and 2.x used a 4-channel VAE latent, and SDXL kept the 4-channel design. SD3 moves to a 16-channel VAE with higher reconstruction fidelity, which the paper credits for sharper detail and substantially better text rendering inside images. The 16-channel VAE is also part of why SD3 weights are not interchangeable with prior Stable Diffusion checkpoints: the latent space has different dimensionality.
SDXL used CLIP ViT-L/14 plus OpenCLIP ViT-bigG/14. SD3 keeps both of those and adds Google's T5 XXL encoder (about 4.7 billion parameters), giving the model a text encoder stack that is substantially larger than its image transformer at the Medium scale.
The T5 encoder is optional at inference. Users on tight VRAM budgets can drop T5 and run with just the two CLIP encoders. The trade-off is significant: T5 is the source of most of the long-prompt and complex-composition gains over SDXL. Without it, SD3 still beats SDXL on simple prompts but loses much of its advantage on the hard ones.
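In the Hugging Face diffusers library, dropping T5 is a matter of passing `text_encoder_3=None` at load time. A minimal sketch, assuming the `stabilityai/stable-diffusion-3-medium-diffusers` checkpoint and a CUDA GPU:

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Load SD3 Medium without the T5-XXL encoder, saving roughly 9 GB of
# fp16 weights at the cost of long-prompt adherence
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    text_encoder_3=None,   # drop T5-XXL; CLIP-L and OpenCLIP-bigG remain
    tokenizer_3=None,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a red cube on top of a blue sphere",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_no_t5.png")
```

Restoring T5 is just removing the two `None` arguments; on smaller cards, `pipe.enable_model_cpu_offload()` in place of `.to("cuda")` is the usual companion trick.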
| Feature | SD 2.1 | SDXL 1.0 | SD3 Medium | SD 3.5 Large |
|---|---|---|---|---|
| Backbone | U-Net | U-Net (larger) | MMDiT | MMDiT (QK-norm) |
| Diffusion parameters | ~865M | ~3.5B (Base) + ~2.3B (Refiner) | 2B | 8B |
| Training objective | DDPM (v-prediction) | DDPM (ε-prediction) | Rectified flow | Rectified flow |
| VAE channels | 4 | 4 | 16 | 16 |
| Native resolution | 768x768 | 1024x1024 | 1024x1024 | 1024x1024 |
| Text encoder(s) | OpenCLIP ViT-H/14 | CLIP ViT-L/14 + OpenCLIP ViT-bigG/14 | CLIP ViT-L/14 + OpenCLIP ViT-bigG/14 + T5-XXL | CLIP ViT-L/14 + OpenCLIP ViT-bigG/14 + T5-XXL |
| Inference steps (typical) | 30-50 | 25-40 | 28-50 | 28-50 |
| Two-stage pipeline | No | Yes (optional Refiner) | No | No |
The SD 3.5 family does not use exactly the same blocks as SD3, and two changes are worth noting. First, the SD 3.5 models apply query-key normalization before the attention scores are computed, which reduces training instability and makes the SD 3.5 weights easier to fine-tune. Second, SD 3.5 Medium uses a variant Stability AI calls MMDiT-X, which adds self-attention modules in the first 13 transformer layers and which the company says improves multi-resolution generation. The 8B SD 3.5 Large is also wider and deeper than the 8B SD3 Large, which was never publicly released as open weights. [4]
| Variant | Parameters (diffusion model) | Release date | Open weights | Notes |
|---|---|---|---|---|
| SD3 (early preview) | Multiple sizes 0.8B-8B (paper) | February 22, 2024 | No (API only) | Closed alpha; quality previews shown in paper |
| SD3 Medium | 2B | June 12, 2024 | Yes | First open weights release; restrictive license; widely criticized for anatomy issues |
| SD3 Large | 8B | April 17, 2024 (API) | No | Available via Stability API and Fireworks AI; never open-sourced under SD3 branding |
| SD3 Large Turbo | 8B | April 17, 2024 (API) | No | 4-step distilled SD3 Large; API only |
| SD 3.5 Large | 8B | October 22, 2024 | Yes | Flagship open release; MMDiT with QK normalization |
| SD 3.5 Large Turbo | 8B | October 22, 2024 | Yes | 4-step distilled SD 3.5 Large |
| SD 3.5 Medium | 2.5B | October 29, 2024 | Yes | Slightly larger than SD3 Medium; targets consumer GPUs |
A few details are worth pinning down because they often get confused. The SD3 Large and SD3 Large Turbo weights were, as far as the public is aware, never released as open weights. Stability shipped them through the Stability API and through partner inference providers like Fireworks AI, but the open-weights story for the 8B class only began with SD 3.5 Large in October 2024. The SD3 Medium open release in June 2024 was the only SD3-branded checkpoint that ever ended up on Hugging Face for direct download. [3] [10]
SD 3.5 Medium (2.5B) is slightly larger than SD3 Medium (2B). The choice of 2.5B was driven by the desire to fit comfortably in 12 GB of VRAM with the T5 encoder offloaded to system memory.
The paper claims state-of-the-art results across three axes: prompt following, typography, and visual aesthetics, evaluated against SDXL, DALL-E 3, Midjourney v6, Ideogram v1.0, and Pixart-alpha. In Stability's own GenEval and human preference scores at preview, SD3 came out ahead on prompt following and tied or led on typography. [1]
The specific gains over SDXL fall into a few buckets: adherence to multi-subject and spatially complex prompts, legible in-image text and typography, and higher human-preference scores on overall aesthetics.
Where SD3 did not deliver was human anatomy. The June 2024 SD3 Medium release became infamous for malformed hands, elongated limbs, and the now-meme-worthy "woman lying on grass" prompts, where the model produced bodies in physically impossible configurations. The cause has been variously attributed to aggressive NSFW filtering of the training data (which removed enough human bodies that the model under-trained on them), distillation effects, and bugs in the released checkpoint. SD 3.5 substantially closes the gap, though Flux and Midjourney v7 remain stronger on photorealistic human figures as of early 2026. [3] [11]
| Date | Event |
|---|---|
| February 22, 2024 | SD3 announced via Stability AI blog post; closed early preview; waitlist opened |
| March 5, 2024 | Paper "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis" posted on arXiv (2403.03206) |
| March 23, 2024 | Emad Mostaque resigns as CEO of Stability AI |
| March 2024 | Robin Rombach, Andreas Blattmann, Dominik Lorenz leave Stability AI |
| April 17, 2024 | SD3 API access opens through Stability Developer Platform and Fireworks AI |
| June 12, 2024 | SD3 Medium open weights released on Hugging Face under Stability AI Community License |
| June 14-16, 2024 | CivitAI bans SD3 Medium fine-tunes amid license uncertainty |
| June 17, 2024 | Stability AI publishes clarification on commercial license terms |
| June 24, 2024 | Prem Akkaraju named CEO; Sean Parker named Executive Chairman |
| August 1, 2024 | Black Forest Labs (Rombach, Blattmann, Esser) launches FLUX.1 |
| October 22, 2024 | SD 3.5 Large and SD 3.5 Large Turbo released as open weights with revised Community License |
| October 29, 2024 | SD 3.5 Medium released as open weights |
| April 2025 | SD3 (3.0) APIs deprecated; users automatically migrated to SD 3.5 |
A few of these dates are worth elaborating. The original February 22 announcement was a preview only, with API access gated behind a waitlist. The actual open-weights release came almost four months later, by which point the company had a different CEO, several departed authors, and a louder open-source community than it had bargained for. [2] [3]
The license drama around SD3 is the part of the story most people remember, and it is the most tangled to summarize accurately because the terms shifted in real time during June 2024.
When SD3 Medium open weights went live on Hugging Face on June 12, 2024, they were released under the Stability AI Community License. The license, in its initial form, contained provisions that the open-source image generation community read as substantially more restrictive than the OpenRAIL-M license that had governed Stable Diffusion 1.x and 2.x and the SDXL license. The most contested provisions, as written in the original license text and reported across community sources at the time, were:
- a $1 million annual revenue cap, above which a separate paid enterprise license was required;
- derivative-works language that could be read as restricting fine-tuned checkpoints and LoRAs trained from SD3, or as giving Stability rights over them;
- provisions that appeared to require tracking of generated outputs; and
- a termination clause that some readers interpreted as requiring deletion of SD3-derived models if the license ended.
The cumulative effect, as the community read it, was that anyone building a paid product on SD3, even a small Patreon-funded image generation site, faced uncertainty about whether they would owe Stability fees, whether their fine-tuned LoRAs were legal, and whether the license could be revoked. [12] [13]
The largest community hub for Stable Diffusion custom models, CivitAI, responded by banning SD3-based content from its platform within roughly two days of the open weights release. The ban covered uploads of SD3 Medium itself, fine-tuned SD3 checkpoints, and LoRAs targeted at SD3. CivitAI's stated reason was that the license language was unclear enough that hosting derivatives risked exposing both the platform and uploaders to retroactive license claims. The decision was widely covered in tech media and on developer forums; for several days, the consensus among major Stable Diffusion communities was effectively to pretend SD3 did not exist. [13]
Major tool maintainers were more cautious in their public statements. ComfyUI added SD3 support quickly, but several extension authors paused work pending license clarification, and the AUTOMATIC1111 web UI did not gain full SD3 support until weeks later.
Stability AI responded over the following days with a series of blog posts and license clarifications, addressing what it called "misunderstandings" of the terms. The company stated that:
- fine-tuned checkpoints and LoRAs built on SD3 were permitted;
- individual creators and businesses under the $1 million annual revenue threshold owed Stability nothing; and
- the license text would be revised to remove the ambiguous language.
Whether these clarifications matched the original license text or constituted retroactive interpretation is, frankly, a matter of opinion in the community. The text itself was edited multiple times in June 2024, which made any single "what does the license say" answer a moving target. CivitAI partially reinstated SD3 content over the following weeks once specific provisions were softened. [12] [13]
With the SD 3.5 release on October 22, 2024, Stability AI shipped a revised version of the Stability AI Community License. The October 2024 revision, which also retroactively governed SD3 Medium, made several changes that brought it closer to community expectations without making it fully open source by OSI definitions:
- free use for research, non-commercial purposes, and commercial use by individuals and organizations under $1 million in annual revenue, with an enterprise license required above that threshold;
- explicit permission to fine-tune the models and to train and distribute LoRAs; and
- removal of the output-tracking provisions.
This is the form the license has held through 2025 and into 2026. The model weights remain free for the vast majority of users; the constraint applies only to a relatively small number of medium- and large-revenue commercial deployments. Most observers, even those who criticized the original launch, regarded the revised terms as reasonable, though the Open Source Initiative does not consider them open source because of the revenue gating. [4] [12]
| Date | License version | Key terms |
|---|---|---|
| June 12, 2024 | Stability AI Community License (initial) | $1M revenue cap, ambiguous derivative training restrictions, output tracking provisions |
| June 17-25, 2024 | Stability AI Community License (clarifications) | Public statements softening derivative-training language; LoRA explicitly permitted |
| October 22, 2024 | Stability AI Community License (revised) | Free under $1M annual revenue; enterprise license required above; explicit LoRA and fine-tune permission; no output tracking |
SD 3.5, released October 22, 2024, was the company's chance to reset the SD3 story. By that point Stability had a new CEO (Prem Akkaraju), a recapitalized balance sheet, an executive chairman from Napster (Sean Parker), and the looming presence of Flux, released by ex-Stability researchers two months earlier. The SD 3.5 release had to demonstrate that the company could ship competitive open-weights image models without the original Stable Diffusion team. [4] [5]
Several things changed simultaneously, which makes attribution of the quality improvements somewhat murky:
- the architecture gained query-key normalization and related training-stability changes;
- the weights were retrained rather than patched from the SD3 Medium checkpoint;
- the research team was largely new after the spring 2024 departures; and
- the license was revised at the same time, changing how the community engaged with the model.
The SD 3.5 release page emphasized customizability and ease of fine-tuning as primary goals, an explicit acknowledgment that the original SD3 license had alienated the fine-tuning community. [4]
SD 3.5 Large Turbo is a 4-step distilled version of SD 3.5 Large, trained using Adversarial Diffusion Distillation (ADD), the same family of techniques Stability used for SDXL Turbo. At 4 inference steps, SD 3.5 Large Turbo gives roughly 7-10x faster generation than SD 3.5 Large at the cost of some prompt fidelity, especially on long T5-conditioned prompts. The Turbo variant has the same architecture and parameter count as SD 3.5 Large; it differs only in training, with a few-step student distilled adversarially from a multi-step teacher.
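In usage terms the Turbo variant differs mainly in sampler settings. A minimal diffusers sketch, assuming the `stabilityai/stable-diffusion-3.5-large-turbo` checkpoint and a CUDA GPU; because the distilled student was trained against a guided teacher, classifier-free guidance is disabled at inference:

```python
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large-turbo",
    torch_dtype=torch.bfloat16,
).to("cuda")

# 4 steps instead of ~28, guidance off: guidance is effectively baked
# into the distilled weights
image = pipe(
    "a lighthouse on a cliff at dusk, oil painting",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("sd35_turbo.png")
```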
SD 3.5 Medium (2.5B), released a week after SD 3.5 Large on October 29, 2024, was positioned as the consumer-hardware option. It does not match SD 3.5 Large in absolute quality, but it fits comfortably in 12 GB of VRAM with the T5 encoder offloaded to system memory, and at 16 GB VRAM it can run with full T5 conditioning. QK normalization is present in both Medium and Large; the MMDiT-X changes to the early transformer layers are specific to Medium.
The SD3 Medium reception was, in a word, ugly. "Sloppy anatomy" memes proliferated on Reddit's Stable Diffusion subreddit during the week of June 12-19, 2024, with the most-shared examples being prompts like "a woman lying on grass" or "a person holding hands with another person" producing bodies that looked, charitably, like they had been assembled by someone who had never seen a human. The license backlash and the anatomy memes reinforced each other: the community read the situation as a company that had alienated its developers and shipped a buggy model on the same day. [3] [11]
Senior figures inside Stability AI publicly acknowledged the problems within days. The Decoder, citing Stability sources, reported that the company "apologized" for the disappointing release. Hanno Basse, who held an interim CEO role during the transition before Prem Akkaraju was appointed, was quoted in trade press around mid-2024 acknowledging that the launch had not gone as planned. The official Stability community statement at the time described the SD3 Medium release as a "first step" with improvements promised in subsequent versions. [3]
SD 3.5's reception was warmer but quieter. By October 2024, much of the energy that would have gone into evaluating an open-weights image model from Stability had moved to Flux, released August 2024 by ex-Stability researchers under Black Forest Labs. FLUX.1 schnell shipped under Apache 2.0 (genuinely open source by OSI standards), FLUX.1 dev had a clearer non-commercial license, and FLUX.1 pro was available through API. SD 3.5 Large was widely considered to roughly match FLUX.1 dev in image quality on many tasks, with some advantages on artistic styles and disadvantages on photorealistic humans. The community largely accepted SD 3.5 as a viable alternative without crowning it. [4] [14]
Professional reviews of SD 3.5 Large in the months following release tended to highlight its prompt-following improvements and its better-behaved fine-tuning. The flip side, repeatedly noted, was that running SD 3.5 Large with the T5 encoder requires roughly 24 GB of VRAM, which puts it in workstation-card territory; SDXL fine-tunes were comfortable on 12 GB cards.
| Model | Developer | Released | Architecture | Open weights | Known strengths |
|---|---|---|---|---|---|
| SD 3.5 Large | Stability AI | October 2024 | MMDiT-X (8B) | Yes (Community License) | Style range, prompt following, fine-tuning ecosystem |
| FLUX.1 dev | Black Forest Labs | August 2024 | Hybrid flow transformer (12B) | Yes (non-commercial) | Photorealism, in-image text, anatomy |
| FLUX.1 schnell | Black Forest Labs | August 2024 | Distilled flow transformer (12B) | Yes (Apache 2.0) | Genuinely open source; fast inference |
| FLUX.1 pro | Black Forest Labs | August 2024 | Flow transformer (12B+) | No (API) | Highest BFL quality tier |
| DALL-E 3 | OpenAI | October 2023 | Diffusion + caption rewrite | No (API) | Prompt rewriting, ChatGPT integration |
| Midjourney v6 | Midjourney Inc. | December 2023 | Closed | No (Discord/web) | Aesthetics, stylistic coherence |
| Midjourney v7 | Midjourney Inc. | April 2025 | Closed | No (Discord/web) | Refinement of v6 strengths |
| Imagen 3 | Google | August 2024 | Closed | No (Vertex AI) | Photorealism, text rendering |
| Imagen 4 | Google | 2025 | Closed | No (Vertex AI / consumer) | Sharper detail, better text |
| Ideogram v2 | Ideogram | 2024 | Closed | No (web) | In-image text rendering |
A blunt summary in early 2026: SD 3.5 is a credible open-weights model in the second tier of image quality, behind closed leaders (Midjourney v7, Imagen 4, DALL-E 3 in some categories) and behind FLUX in several open categories. Its main argument over FLUX is its larger fine-tuning ecosystem inherited from earlier Stable Diffusion versions, plus the genuine open-weights status of all SD 3.5 variants under the revised Community License. [14]
| Variant | Minimum VRAM (with T5 offloaded) | Recommended VRAM (with T5 in GPU) | System RAM | Disk |
|---|---|---|---|---|
| SD3 Medium (2B) | ~10 GB | ~12-16 GB | 16 GB+ | ~10 GB |
| SD 3.5 Medium (2.5B) | ~10 GB | ~12-16 GB | 16 GB+ | ~10 GB |
| SD 3.5 Large (8B) | ~16 GB | ~24 GB | 32 GB+ | ~16 GB |
| SD 3.5 Large Turbo (8B, 4-step) | ~16 GB | ~24 GB | 32 GB+ | ~16 GB |
The T5 XXL encoder is the largest single consumer of memory at inference. T5 XXL is roughly 4.7 billion parameters in fp16, or about 9.4 GB if loaded into VRAM directly. ComfyUI and similar tools support offloading the encoder to CPU, which lets users run T5-conditioned generation at the cost of slower text encoding. Quantized variants of the T5 encoder (8-bit and 4-bit) are widely used in community workflows to bring the memory footprint down further.
SD 3.5 Large can be run without T5 by passing only CLIP-L and OpenCLIP-bigG embeddings, but the typical recommendation is that anyone interested in SD 3.5 Large's prompt-following advantages over SDXL should keep T5 in the pipeline. The 8B image transformer in fp16 is itself about 16 GB.
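The headline numbers in this section follow from simple weight-size arithmetic. A quick sanity check, counting raw weights only; activations, text encoder outputs, and framework overhead push real usage higher:

```python
def weight_footprint_gb(n_params: float, bits_per_param: int) -> float:
    """Memory for raw model weights in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

print(weight_footprint_gb(4.7e9, 16))  # T5-XXL encoder in fp16 -> 9.4
print(weight_footprint_gb(8e9, 16))    # SD 3.5 Large transformer in fp16 -> 16.0
print(weight_footprint_gb(4.7e9, 8))   # 8-bit quantized T5 -> 4.7
print(weight_footprint_gb(4.7e9, 4))   # 4-bit quantized T5 -> 2.35
```

The same arithmetic explains why 4-bit T5 quantization is popular in community workflows: it turns the largest single memory consumer into roughly a 2.4 GB payload.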
The SD3 launch is one of the cleanest examples in recent AI history of a release whose reception was shaped almost as much by company drama as by model quality. The personnel and ownership changes at Stability AI during 2024 deserve a paragraph in any treatment of SD3.
| Date | Event |
|---|---|
| October 2023 | Lightspeed Venture Partners and Coatue Management push for Mostaque resignation; Stability burns $8M/month against $5.4M monthly revenue |
| February 22, 2024 | SD3 announced; closed alpha; paper not yet posted |
| March 5, 2024 | SD3 paper posted on arXiv |
| March 23, 2024 | Emad Mostaque resigns as CEO and from board |
| March 2024 | Robin Rombach, Andreas Blattmann, Dominik Lorenz leave Stability |
| March-June 2024 | Shan Shan Wong (former COO) and Christian Laforte (CTO) serve as interim co-CEOs |
| April 17, 2024 | SD3 API access opens via Stability and Fireworks AI |
| June 12, 2024 | SD3 Medium open weights released; license backlash |
| June 24, 2024 | Prem Akkaraju named CEO; Sean Parker named Executive Chairman; ~$80M raised; ~$100M debt and ~$300M supplier obligations forgiven |
| August 1, 2024 | FLUX.1 launches from Black Forest Labs (Rombach, Blattmann, Esser) |
| September 2024 | James Cameron joins Stability AI board |
| October 22, 2024 | SD 3.5 Large and Large Turbo released |
| October 29, 2024 | SD 3.5 Medium released |
| December 2024 | Akkaraju reports triple-digit revenue growth; debt eliminated |
Several of these dates are worth re-emphasizing because they line up uncomfortably. SD3 was previewed February 22; the paper went up March 5; Mostaque resigned March 23. The original SD3 research team left in waves through March. The open weights release on June 12 happened in the middle of an interim co-CEO regime, twelve days before Prem Akkaraju and Sean Parker took over and the recapitalization closed. By the time SD 3.5 launched in October, the company was operating with new leadership, James Cameron on the board, and a clear positioning around enterprise customers and entertainment-industry partnerships. [7] [15] [16]
CoreWeave, the GPU cloud company, was Stability's primary compute partner and was named in the June 2024 recapitalization announcements as having forgiven future spending commitments as part of the rescue package. The exact figure for forgiven CoreWeave-specific obligations is not public, but Fortune and Deadline both reported the combined figure of roughly $100 million in debt and $300 million in future supplier spending, with CoreWeave understood to be the largest single component on the supplier side. [15]
The departure of Robin Rombach, Andreas Blattmann, and Patrick Esser to found Black Forest Labs is critical context for SD3's trajectory. Rombach was first author on the original 2021 latent diffusion paper. Esser was first author on the SD3 paper in 2024. By the time SD3 Medium open weights shipped in June, Esser was still credited but had already physically left the company; by August 2024, FLUX.1 had launched. The same lineage that produced Stable Diffusion and SD3 immediately produced FLUX, and FLUX was widely seen as a continuation of where SD3 had been heading without the licensing friction. Black Forest Labs publicly raised $31 million at launch and went on to raise substantially more across 2024 and 2025; by early 2026 it was valued around $3.25 billion. [8] [14]