Stable Diffusion 3.5
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,537 words
Stable Diffusion 3.5 (SD 3.5) is a family of open-weights text-to-image diffusion models released by Stability AI in October 2024. The family comprises three models — Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large Turbo, and Stable Diffusion 3.5 Medium — all built on the Multimodal Diffusion Transformer (MMDiT) architecture introduced earlier the same year with Stable Diffusion 3 (SD3). The release was announced on October 22, 2024, with the Large and Large Turbo variants made available immediately and the Medium variant following on October 29, 2024.[^1][^2]
Stable Diffusion 3.5 was widely interpreted as a course correction following the contentious launch of SD3 Medium in June 2024, which had been criticized both for image-quality regressions — particularly in human anatomy — and for an unusually restrictive commercial license. With SD 3.5, Stability AI re-released open weights under the more permissive Stability AI Community License, which is free for non-commercial use and for commercial users earning less than US$1 million in annual revenue.[^1][^3][^4] The company explicitly acknowledged that its prior SD3 Medium release "didn't fully meet our standards or our communities' expectations" and framed SD 3.5 as a response to community feedback.[^1][^5]
The release introduced architectural refinements — most notably the integration of Query-Key Normalization (QK-norm) and, in the Medium model, an "MMDiT-X" architecture with additional self-attention modules and dual attention blocks — and broadened the model's distribution channels via Hugging Face, the Stability AI API, ComfyUI, Replicate, Fireworks AI, DeepInfra, NVIDIA NIM microservices, and (from December 2024) Amazon Bedrock.[^1][^6][^7][^8] In subjective and benchmark testing, SD 3.5 Large was generally described as competitive with FLUX.1 [dev] and other contemporary frontier image models on prompt adherence, while sometimes trailing FLUX.1 [pro] on photorealism.[^9][^10][^11]
| Field | Detail |
|---|---|
| Developer | Stability AI |
| Release | October 22, 2024 (Large, Large Turbo); October 29, 2024 (Medium)[^1][^2] |
| Models | Stable Diffusion 3.5 Large (8.1B parameters), Large Turbo (8B distilled), Medium (≈2.5B parameters)[^1][^7][^12] |
| Architecture | Multimodal Diffusion Transformer (MMDiT/MMDiT-X) with QK-normalization and (in Medium) dual attention blocks[^1][^13][^14] |
| Text encoders | OpenCLIP-ViT/G, CLIP-ViT/L, T5-XXL[^13][^14] |
| Training objective | Rectified-flow formulation from Stable Diffusion 3[^15] |
| License | Stability AI Community License (free <$1M revenue), Enterprise License above[^1][^4] |
| Predecessor | Stable Diffusion 3 (SD3 Medium, June 12, 2024)[^3][^5] |
| Successor | No "Stable Diffusion 4" had been released as of this article's writing; SD 3.5 remained Stability AI's flagship open image model[^16] |
Stable Diffusion 3 (SD3) was first announced as a research preview in February 2024 and described in detail in the paper Scaling Rectified Flow Transformers for High-Resolution Image Synthesis by Patrick Esser and colleagues, posted to arXiv on March 5, 2024.[^15] The paper introduced the Multimodal Diffusion Transformer (MMDiT) — a transformer-based replacement for the U-Net backbone used by earlier Stable Diffusion releases — along with a diffusion model training objective based on rectified flow, in which data and noise are connected by a linear trajectory and the noise schedule is reweighted toward perceptually relevant scales.[^15] The paper studied models ranging from 450 million to 8 billion parameters and reported smooth scaling improvements in validation loss and human preference.[^15]
On June 12, 2024, Stability AI publicly released SD3 Medium — a roughly 2-billion-parameter model — as open weights on Hugging Face. The release was unusually controversial for two independent reasons. First, users widely reported severe image-quality regressions, especially on human anatomy, with the model frequently producing distorted hands, feet, and limbs. Second, the accompanying "Stability AI Community License" introduced unfamiliar restrictions for commercial use, with reviewers and community sites flagging concerns over its definition of derivative works and its termination clauses.[^4][^5][^17] CivitAI, a major community model-sharing platform, temporarily banned all SD3-related uploads pending clarification.[^17]
The original SD3 license also drew attention for an expansive definition of "derivative works" that some interpreted as covering any model trained on outputs from SD3, raising fears that LoRAs and fine-tunes could fall under Stability AI's continuing control. The same agreement was widely flagged for clauses that appeared to make end users liable for downstream misuse by their own customers, and for permitting Stability AI to terminate the agreement at its discretion.[^3][^4][^17] Stability AI revised the license terms in July 2024 to clarify that the model could be used without charge by individuals and organizations earning under US$1 million in annual revenue, but the goodwill damage was substantial.[^4] The CEO at the time apologized publicly and promised an improved model.[^5] In that context, the October 2024 SD 3.5 release was both a technical revision aimed at addressing the anatomical and prompt-adherence shortcomings of SD3 Medium and a re-affirmation of Stability AI's stated commitment to open, broadly licensed weights.[^1]
The SD 3.5 release comprises three open-weights models, all distributed under the Stability AI Community License through gated Hugging Face repositories.
Stable Diffusion 3.5 Large is the flagship of the family and was released on October 22, 2024. It has approximately 8.1 billion parameters in the transformer backbone and is designed for professional use cases, generating images at resolutions up to roughly 1 megapixel (e.g., 1024×1024).[^1][^7][^12] Stability AI describes it as "the most powerful in the Stable Diffusion family," emphasizing image quality, prompt adherence, and typography.[^1][^13] The model is positioned to be customizable for downstream professional use, with the QK-normalization changes intended to make fine-tuning more tractable than the prior SD3 release.[^1][^13] The reference Diffusers pipeline recommends 28–40 denoising steps and a classifier-free guidance scale of approximately 4.5 for image generation, with bf16 precision as the standard inference dtype.[^18] The model was later made available for enterprise users through Amazon Bedrock (US West / Oregon region) on December 19, 2024, where Stability AI noted the model had been trained on Amazon SageMaker HyperPod.[^7]
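The guidance scale of about 4.5 refers to classifier-free guidance (CFG), which at each denoising step blends a prediction made with the prompt and one made without it. A minimal sketch of the standard CFG combination on plain Python lists follows; the function name is hypothetical and not a Diffusers API, and it assumes both predictions have already been computed by the model.

```python
from typing import List


def cfg_combine(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    scale = 1 returns the conditional prediction unchanged; larger values push
    the update further in the direction the prompt suggests.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, -0.5]  # toy prediction without the prompt
cond = [0.2, 0.8, -0.1]    # toy prediction with the prompt
# 4.5 is the approximate scale recommended above for SD 3.5 Large.
print(cfg_combine(uncond, cond, 4.5))
```

Because the guided output extrapolates beyond the conditional prediction, very large scales tend to oversaturate images, which is why a moderate value is recommended.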
Stable Diffusion 3.5 Large Turbo is a timestep-distilled variant of SD 3.5 Large, also released on October 22, 2024.[^1][^6] It is produced via Stability AI's Adversarial Diffusion Distillation (ADD) technique, originally developed for the SDXL Turbo and Stable Video Diffusion lines, and is optimized for few-step inference: Stability AI's reference pipeline generates images in just 4 sampling steps with classifier-free guidance effectively disabled (guidance scale 0).[^6][^18] In Adversarial Diffusion Distillation, the student model is trained against a discriminator that pushes its few-step outputs to match those of a multi-step teacher, allowing the student to reproduce high-fidelity samples with a small number of denoising calls. The model trades some peak image quality and prompt fidelity for an order-of-magnitude reduction in inference cost relative to the full Large model, and was positioned as a competitor to FLUX.1 [schnell] in the few-step open-weights category.[^9][^10] Because guidance is disabled in the distilled pipeline, classifier-free guidance scale tuning — a common knob for steering image quality and prompt strength in standard diffusion sampling — does not apply at inference time, simplifying deployment for high-throughput hosted services.[^6][^18]
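The few-step trade-off can be made concrete by counting transformer forward passes per image. The sketch below is a rough cost model under stated assumptions: CFG is treated as active only when the scale exceeds 1 (mirroring the behavior of Diffusers' SD3 pipeline), an active CFG step costs two passes, and text encoding and VAE decoding are ignored.

```python
def transformer_calls(num_steps: int, guidance_scale: float) -> int:
    """Approximate transformer forward passes needed for one image.

    Assumption: classifier-free guidance is active only when guidance_scale > 1,
    and then each step evaluates both a conditional and an unconditional branch.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be positive")
    passes_per_step = 2 if guidance_scale > 1 else 1
    return num_steps * passes_per_step


large = transformer_calls(28, 4.5)  # low end of the 28-40 step range for Large
turbo = transformer_calls(4, 0.0)   # distilled 4-step setting, guidance disabled
print(f"Large: {large} calls, Turbo: {turbo} calls, ratio {large / turbo:.0f}x")
```

Under these assumptions, Large at 28 steps needs 56 passes against 4 for Large Turbo, a 14x difference that grows to 20x at 40 steps. In practice the two CFG branches are usually batched together, so wall-clock savings depend on hardware.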
Stable Diffusion 3.5 Medium was released on October 29, 2024, one week after the Large variants.[^1][^2][^14] It has approximately 2.5 billion parameters (some early coverage cited 2.6 billion) and is engineered to run "out of the box" on consumer hardware, generating images at resolutions from roughly 0.25 to 2 megapixels.[^1][^2][^14] It uses a refined MMDiT-X architecture — a Stability AI–specific variant of MMDiT — featuring self-attention modules in the first 13 transformer layers and dual attention blocks in the first 12 transformer layers, both intended to improve multi-resolution generation, structural coherence, and anatomy.[^14] Stability AI's reference inference requires roughly 9.9 GB of VRAM excluding text encoders, making it usable on mid-range consumer GPUs.[^1] The Medium model is trained with a progressive mixed-resolution schedule that moves through 256, 512, 768, 1024, and 1440 pixel resolutions, with extended positional embedding spaces to better handle non-square and multi-resolution outputs.[^14]
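The megapixel range translates into very different sequence lengths for the transformer. The sketch below computes latent shapes and image-token counts, assuming an 8x spatial downsampling VAE with 16 latent channels and a transformer patch size of 2; the downsampling factor and patch size are assumptions based on the SD3-family configuration rather than figures stated in this article.

```python
from dataclasses import dataclass
from typing import Tuple

VAE_DOWNSAMPLE = 8    # assumed spatial downsampling factor of the VAE
LATENT_CHANNELS = 16  # latent channels of the SD3-family VAE
PATCH_SIZE = 2        # assumed transformer patch size (2x2 latent patches)


@dataclass
class LatentGeometry:
    megapixels: float
    latent_shape: Tuple[int, int, int]  # (channels, height, width)
    image_tokens: int


def geometry(width: int, height: int) -> LatentGeometry:
    """Latent shape and image-token count for a given output resolution."""
    step = VAE_DOWNSAMPLE * PATCH_SIZE
    if width % step or height % step:
        raise ValueError(f"width and height should be multiples of {step}")
    lat_h, lat_w = height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE
    tokens = (lat_h // PATCH_SIZE) * (lat_w // PATCH_SIZE)
    return LatentGeometry(width * height / 1e6, (LATENT_CHANNELS, lat_h, lat_w), tokens)


for w, h in [(512, 512), (1024, 1024), (1344, 768), (1440, 1440)]:
    g = geometry(w, h)
    print(f"{w}x{h}: {g.megapixels:.2f} MP, latent {g.latent_shape}, {g.image_tokens} tokens")
```

Under these assumptions the image-token count rises from 1,024 at 512x512 to 8,100 at 1440x1440, roughly eightfold, and attention cost grows faster still, which is one reason multi-resolution training matters at the upper end of the range.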
All three SD 3.5 models share a diffusion model backbone based on the Multimodal Diffusion Transformer (MMDiT) architecture introduced in the SD3 paper.[^15] MMDiT differs from prior diffusion transformer (DiT) designs by maintaining separate weights for image and text token streams within each transformer block, while permitting bidirectional information flow between them via a joint attention operation. The SD3 paper showed that this dual-stream design outperformed both U-ViT and standard DiT in visual fidelity and text alignment over the course of training.[^15]
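The dual-stream idea can be illustrated with a toy single-head attention layer. The sketch assumes plain Python lists, a single head, and only the Q/K/V projections; a real MMDiT block also has per-stream output projections, MLPs, normalization and timestep modulation, all omitted here. The weight dictionaries and inputs are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(tokens: List[Vec], w: Dict[str, Mat]) -> Tuple[List[Vec], List[Vec], List[Vec]]:
    """Apply one stream's own Q, K and V projections to its tokens."""
    return ([matvec(w["q"], t) for t in tokens],
            [matvec(w["k"], t) for t in tokens],
            [matvec(w["v"], t) for t in tokens])


def joint_attention(img: List[Vec], txt: List[Vec],
                    w_img: Dict[str, Mat], w_txt: Dict[str, Mat]) -> Tuple[List[Vec], List[Vec]]:
    """Toy single-head joint attention in the spirit of MMDiT.

    Each stream keeps separate projection weights, but attention runs over the
    concatenated image+text sequence, so information flows in both directions.
    """
    qi, ki, vi = project(img, w_img)
    qt, kt, vt = project(txt, w_txt)
    q, k, v = qi + qt, ki + kt, vi + vt  # concatenate along the sequence axis
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for q_row in q:
        weights = softmax([scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k])
        out.append([sum(wt * v_row[j] for wt, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out[:len(img)], out[len(img):]  # split back into image and text streams


identity = [[1.0, 0.0], [0.0, 1.0]]
w_img = {"q": identity, "k": identity, "v": identity}
w_txt = {"q": [[0.5, 0.0], [0.0, 0.5]], "k": identity, "v": [[0.0, 1.0], [1.0, 0.0]]}
img_out, txt_out = joint_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [[0.5, -0.5]], w_img, w_txt)
print(len(img_out), len(txt_out))
```

Because attention runs over the concatenated sequence, changing an image token also changes the text stream's output, which is the bidirectional flow described above; the two streams share the attention step but not their weights.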
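SD 3.5 retains this core MMDiT design but introduces two refinements that the prior SD3 Medium release did not have:[^13][^14][^18]
- Query-Key Normalization (QK-norm), which normalizes the attention queries and keys in each transformer block before their dot product, stabilizing training and making fine-tuning more tractable.[^1][^13]
- MMDiT-X (Medium only), which adds self-attention modules and dual attention blocks to the early transformer layers to improve multi-resolution generation, structural coherence, and anatomy.[^14]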
The text encoder stack, latent diffusion VAE decoder (16 latent channels), and noise scheduler are unchanged from SD3 Medium.[^18]
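QK-norm normalizes query and key vectors before the attention dot product, so attention logits stay bounded even if activations drift to large magnitudes during training or fine-tuning. The sketch below uses RMS normalization without the learnable per-channel scale a real model would typically carry; that simplification and the toy vectors are assumptions for illustration.

```python
import math
from typing import List


def rms_norm(v: List[float], eps: float = 1e-6) -> List[float]:
    """RMS normalization without a learned scale (simplifying assumption)."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def attention_logit(q: List[float], k: List[float], qk_norm: bool) -> float:
    """Scaled dot-product logit, optionally with QK-norm applied first."""
    if qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


q = [30.0, -40.0, 25.0, 10.0]  # activations that have drifted to a large scale
k = [35.0, -38.0, 20.0, 12.0]
print("without QK-norm:", attention_logit(q, k, qk_norm=False))
print("with QK-norm:   ", attention_logit(q, k, qk_norm=True))
```

With unit-RMS vectors of dimension d, the Cauchy-Schwarz inequality bounds each scaled logit by the square root of d (here 2), whereas the raw logit in this example is 1595 and would drive a softmax to an essentially one-hot distribution.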
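SD 3.5 uses three fixed, pretrained text encoders:[^13][^14][^19]
- OpenCLIP-ViT/G (OpenCLIP bigG)
- CLIP-ViT/L (OpenAI CLIP-L/14)
- T5-XXL (Google T5)

As in SD3, the per-token outputs of the two CLIP encoders are concatenated along the channel dimension and zero-padded to the T5 width, and the result is concatenated with the T5 token embeddings along the sequence dimension; the pooled CLIP outputs are combined into a separate global conditioning vector.[^15]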
The model can be run with any one or two encoders disabled to reduce memory, at some cost to prompt fidelity.[^19] The reference inference repository specifies OpenAI CLIP-L/14, OpenCLIP bigG, and Google T5-XXL.[^19]
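The following sketch does the shape arithmetic for the combined text conditioning under the SD3 scheme: CLIP token features are concatenated channel-wise, zero-padded to the T5 width, and stacked with T5 tokens along the sequence, while the pooled CLIP vectors form a separate global vector. The widths (768 for CLIP-L/14, 1280 for OpenCLIP bigG, 4096 for T5-XXL), the 77-token CLIP context, and a 256-token T5 sequence are assumptions drawn from the public encoder configurations and Diffusers' defaults.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed widths from the public encoder configurations.
CLIP_L_DIM = 768   # OpenAI CLIP-L/14 text hidden size
CLIP_G_DIM = 1280  # OpenCLIP bigG text hidden size
T5_DIM = 4096      # T5-XXL hidden size
CLIP_TOKENS = 77   # CLIP tokenizer context length
T5_TOKENS = 256    # assumed default T5 sequence length


@dataclass
class TextConditioning:
    context_shape: Tuple[int, int]  # (tokens, channels) fed to joint attention
    pooled_dim: int                 # pooled CLIP vector for global conditioning


def conditioning_shapes(t5_tokens: int = T5_TOKENS) -> TextConditioning:
    """Shape of the combined text context and the pooled vector."""
    clip_channels = CLIP_L_DIM + CLIP_G_DIM  # CLIP features concatenated channel-wise
    if clip_channels > T5_DIM:
        raise ValueError("CLIP features cannot be zero-padded to the T5 width")
    # Padded CLIP tokens and T5 tokens are then stacked along the sequence axis.
    context = (CLIP_TOKENS + t5_tokens, T5_DIM)
    # Pooled widths are assumed to equal the hidden sizes for these two encoders.
    pooled = CLIP_L_DIM + CLIP_G_DIM
    return TextConditioning(context, pooled)


print(conditioning_shapes())
```

Under these assumptions the transformer sees 333 text tokens of width 4096 alongside the image tokens, plus a 2048-dimensional pooled vector.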
Stability AI describes the SD 3.5 training corpus as a combination of "synthetic data and filtered publicly available data."[^13][^14] As with previous Stable Diffusion releases, much of the publicly available data is scraped from the web, and Stability AI relies on a fair-use argument in defending against ongoing copyright challenges.[^11] By March 2023, artists had already removed approximately 80 million images from public training datasets used by Stability AI through opt-out tools, a process that continued into the SD 3.5 training corpus preparation.[^11] The company says it used multi-prompt captioning during training, with shorter captions prioritized, to improve the diversity of concepts and demographic representation across generated outputs.[^11]
The SD 3.5 training objective inherits the rectified-flow framework introduced in the SD3 paper, in which the model is trained to predict the velocity field of a straight-line interpolation between data and noise rather than the added noise (ε-prediction) targeted by classical denoising diffusion probabilistic models. The SD3 paper additionally introduced a logit-normal weighting of the timestep distribution that biases training toward perceptually relevant noise scales, a choice that the SD 3.5 family retains.[^15] The SD 3.5 Medium model further refines this objective with the mixed-resolution training schedule described above, using progressive crop augmentation on positional embeddings to teach the model to handle non-square aspect ratios and a wider range of output resolutions.[^14]
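A minimal sketch of the rectified-flow pieces, in plain Python: the straight-line interpolation z_t = (1 - t) x_0 + t eps from the SD3 paper, the constant velocity target eps - x_0, logit-normal timestep sampling, and an Euler sampler. The logit-normal location 0 and scale 1 are illustrative defaults, and the oracle velocity below stands in for a trained network, which would instead predict the velocity from z_t, t and the text conditioning.

```python
import math
import random
from typing import Callable, List

Vec = List[float]


def logit_normal_t(rng: random.Random, loc: float = 0.0, scale: float = 1.0) -> float:
    """Sample a timestep t in (0, 1) as sigmoid(u) with u ~ N(loc, scale)."""
    u = rng.gauss(loc, scale)
    return 1.0 / (1.0 + math.exp(-u))


def interpolate(x0: Vec, eps: Vec, t: float) -> Vec:
    """Rectified-flow forward process: z_t = (1 - t) * x0 + t * eps."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, eps)]


def velocity_target(x0: Vec, eps: Vec) -> Vec:
    """Regression target dz_t/dt = eps - x0, constant along the straight path."""
    return [b - a for a, b in zip(x0, eps)]


def euler_sample(eps: Vec, velocity_fn: Callable[[Vec, float], Vec], steps: int) -> Vec:
    """Integrate dz/dt = v from t = 1 (pure noise) down to t = 0 (data)."""
    z, dt = list(eps), 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z


rng = random.Random(0)
x0 = [0.3, -1.2, 0.8]                    # toy "clean latent"
eps = [rng.gauss(0.0, 1.0) for _ in x0]  # Gaussian noise sample
t = logit_normal_t(rng)                  # training timestep
z_t = interpolate(x0, eps, t)            # noisy input the network would see
target = velocity_target(x0, eps)        # what the network is trained to predict

# Oracle velocity standing in for a trained model (hypothetical, illustration only).
recovered = euler_sample(eps, lambda z, s: target, steps=1)
print(f"t = {t:.3f}, recovered = {[round(r, 6) for r in recovered]}")
```

Because the path is straight, integrating the exact velocity recovers x_0 in a single step; a learned velocity is only approximate, which is why standard sampling still uses tens of steps unless the model is distilled as in Large Turbo.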
In Stability AI's own benchmark charts published alongside the announcement, SD 3.5 Large led peer open-weights and proprietary models on prompt adherence — including FLUX.1 [dev], Midjourney v6.1, Ideogram 2.0, and others — while remaining competitive on image quality, where it was placed close to FLUX.1 [pro] and ahead of other open models.[^1][^20] Stability AI's framing emphasized that SD 3.5 Large was both broadly capable and small enough to run on a single consumer-grade GPU after quantization.[^1]
Independent reviews broadly corroborated this picture with significant nuance. Side-by-side comparisons in technical and consumer outlets observed that:
Stability AI also noted in its announcement that SD 3.5 deliberately exhibits greater variation across seeds for the same prompt than some competitors, a design choice intended to preserve stylistic diversity and broader knowledge at the cost of less deterministic outputs.[^1] Practitioners testing the model on photography, 3D-rendered scenes, painterly styles, and line-art benchmarks reported strong cross-style generalization but recommended pairing the Medium model with Skip Layer Guidance during sampling for better structural and anatomical coherence on portraits and figure-heavy compositions.[^14]
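Skip Layer Guidance can be sketched, under an assumed formulation, as an extra term added on top of CFG: the model is evaluated once more with the prompt but with some transformer layers skipped, and the sampler is pushed away from that degraded prediction in the same way CFG pushes away from the unconditional one. The function name, the scales, and the additive form below are illustrative assumptions, not Stability AI's exact implementation; which layers to skip is left to the caller.

```python
from typing import List


def guided_prediction(uncond: List[float], cond: List[float], cond_skip: List[float],
                      cfg_scale: float, slg_scale: float) -> List[float]:
    """CFG plus an assumed skip-layer guidance term.

    uncond:    prediction without the prompt
    cond:      prediction with the prompt, all layers active
    cond_skip: prediction with the prompt but selected layers skipped
    """
    if not (len(uncond) == len(cond) == len(cond_skip)):
        raise ValueError("predictions must have the same shape")
    out = []
    for u, c, s in zip(uncond, cond, cond_skip):
        guided = u + cfg_scale * (c - u)          # ordinary classifier-free guidance
        out.append(guided + slg_scale * (c - s))  # steer away from the layer-skipped output
    return out


print(guided_prediction([0.0, 0.2], [1.0, 0.4], [0.5, 0.4], cfg_scale=4.5, slg_scale=2.0))
```

Setting the skip-layer scale to 0, or passing a skipped prediction identical to the full one, reduces the result to plain CFG.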
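All three SD 3.5 models are released under the Stability AI Community License, in the same form that had been retroactively applied to SD3 Medium in July 2024.[^1][^4] The license has three principal tiers:
- Non-commercial use: free for research and other non-commercial purposes.
- Commercial use below US$1 million: free for individuals and organizations with less than US$1 million in annual revenue.
- Enterprise License: required for commercial users above the US$1 million revenue threshold.[^1][^4]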
The license also affirms that users retain ownership of the media they generate and may distribute and commercialize that media independently of any restrictions on the model weights themselves.[^1][^11] Compared to SD3 Medium's original June 2024 terms — which had restricted derivatives, set monthly active user caps, and triggered the CivitAI ban — the SD 3.5 license is functionally equivalent to the revised July 2024 SD3 terms and represents the same revenue-threshold model that the SD3 controversy had ultimately produced.[^3][^4][^17] Independent analysts continued to note that Stability AI retains the right to terminate the agreement, which some users view as a limitation relative to traditional permissive licenses.[^3]
The SD 3.5 reference inference repository on GitHub is published under the MIT License, with portions of helper code subject to the Hugging Face Transformers Apache 2.0 License.[^19]
The reception of SD 3.5 was broadly positive in both press coverage and community forums, particularly in contrast to SD3 Medium. Tom's Guide described it as "a step up in realism," Decrypt headlined its coverage as Stability AI "redeems itself," and How-To Geek noted the release came "with the right number of limbs" — a pointed reference to SD3 Medium's anatomical failures.[^9][^21][^22] Hacker News and the r/StableDiffusion community on Reddit discussed the release at length on October 22–29, 2024, with many practitioners flagging SD 3.5 Large as competitive with FLUX.1 [dev] in their own testing while welcoming the return to broadly usable open weights.[^23][^24]
CivitAI, which had banned SD3 content under the prior license, accepted SD 3.5 uploads under the revised Community License, allowing community LoRAs, fine-tunes, and other derivative artifacts to be redistributed alongside the official weights.[^17] By the time of the Medium release, Stability AI emphasized that the model had been trained "to generate more diverse images of people," with more varied skin tones and features achievable without specialized prompting — a feature the company highlighted as an explicit response to user feedback about the homogeneity of earlier model outputs.[^11][^9]
Criticism focused on three points. First, SD 3.5 Large at full precision required substantial VRAM (over 24 GB) for native inference, although Hugging Face Diffusers' integration with bitsandbytes 4-bit (NF4) quantization brought inference within reach of single 24 GB consumer GPUs.[^18] Second, although improved over SD3 Medium, some anatomical artifacts persisted, which Stability AI characterized as engineering trade-offs.[^11] Third, the community fine-tune and LoRA ecosystem for SD 3.5 was slower to mature than for SDXL, which remained the most widely used Stability AI base model in many production pipelines through 2025–2026.[^25]
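The VRAM figures follow largely from weight-storage arithmetic. The sketch below estimates memory for the Large transformer's weights alone at several precisions; it deliberately ignores activations, the text encoders, the VAE and quantization metadata, so real usage is higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores activations, text encoders, the VAE, framework overhead and any
    quantization metadata (such as per-block scales), so real usage is higher.
    """
    return num_params * bits_per_param / 8 / 1e9


LARGE_PARAMS = 8.1e9  # SD 3.5 Large transformer parameter count

for label, bits in [("bf16", 16), ("8-bit", 8), ("NF4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_memory_gb(LARGE_PARAMS, bits):.2f} GB")
```

At bf16 the 8.1B-parameter transformer alone needs about 16.2 GB, so adding the text encoders (T5-XXL in particular) and activations is what pushes unquantized inference past 24 GB; 4-bit NF4 storage brings the transformer weights down to roughly 4 GB.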
Within days of the October 22, 2024 announcement, SD 3.5 Large was integrated into Hugging Face Diffusers via the StableDiffusion3Pipeline class — the same pipeline class previously introduced for SD3 Medium, since the text encoders, VAE, and scheduler were unchanged.[^18] Hugging Face's accompanying blog post documented inference at bf16 precision, recommended values of 28–40 sampling steps and a guidance scale of approximately 4.5 for the Large model, and 4 steps with guidance scale 0 for Large Turbo. It also documented training and fine-tuning recipes (e.g., DreamBooth LoRA) compatible with the existing SD3 training scripts.[^18]
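The recommended settings can be captured in a small preset table that produces keyword arguments for the pipeline call. The preset structure and helper are hypothetical; `num_inference_steps` and `guidance_scale` are the parameter names accepted by Diffusers pipelines, and the Hub repository IDs are assumed to be the official Stability AI ones. Medium is left out because no recommended values are given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPreset:
    repo_id: str              # Hugging Face Hub repository (assumed official ID)
    num_inference_steps: int
    guidance_scale: float


# Values from the recommendations above (28 is the low end of the 28-40 range).
PRESETS = {
    "large": SamplingPreset("stabilityai/stable-diffusion-3.5-large", 28, 4.5),
    "large-turbo": SamplingPreset("stabilityai/stable-diffusion-3.5-large-turbo", 4, 0.0),
}


def call_kwargs(variant: str, prompt: str) -> dict:
    """Keyword arguments in the shape StableDiffusion3Pipeline's __call__ accepts."""
    preset = PRESETS[variant]
    return {
        "prompt": prompt,
        "num_inference_steps": preset.num_inference_steps,
        "guidance_scale": preset.guidance_scale,
    }


print(call_kwargs("large-turbo", "a lighthouse at dusk"))

# With torch and diffusers installed and the gated weights accepted, usage would be
# roughly as follows (not executed here):
#   pipe = StableDiffusion3Pipeline.from_pretrained(
#       PRESETS["large"].repo_id, torch_dtype=torch.bfloat16).to("cuda")
#   image = pipe(**call_kwargs("large", "a lighthouse at dusk")).images[0]
```

Keeping the settings in data rather than scattered literals makes it easy to swap between the full and distilled models without forgetting that Turbo expects guidance to be disabled.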
A reference inference implementation is published by Stability AI at the GitHub repository Stability-AI/sd3.5 under the MIT License, supporting SD3.5 Large, Large Turbo, Medium, and SD3 Medium, plus the ControlNets released in late November 2024.[^19] ComfyUI gained native SD 3.5 support shortly after launch, including dedicated nodes for the Large Turbo's 4-step pipeline.[^1][^19]
In addition to self-hosted use, the SD 3.5 family is available through Stability AI's hosted API and was distributed via partner platforms including Replicate, Fireworks AI, and DeepInfra at launch.[^1] In December 2024, Stability AI announced SD 3.5 Large availability in Amazon Bedrock for enterprise customers, with US West (Oregon) as the launch region.[^7] In a separate collaboration, Stability AI and NVIDIA shipped a Stable Diffusion 3.5 NIM microservice with TensorRT-optimized weights and bundled ControlNet variants for streamlined enterprise deployment.[^8]
On November 26, 2024, Stability AI released three ControlNets for SD 3.5 Large — Blur, Canny, and Depth — extending the family with conditioning models for upscaling to 8K/16K (Blur), structural control via edge maps (Canny), and depth-guided generation (Depth, using DepthFM-derived depth maps).[^26] Additional ControlNets, including ones targeting SD 3.5 Medium, were announced as in development.[^26]
As of this article's writing in May 2026, Stability AI had not released a "Stable Diffusion 4" model.[^16] SD 3.5 Large remained the company's flagship open-weights image-generation model, and the SD 3.5 family — Large, Large Turbo, and Medium — collectively defined the company's open offering throughout 2025. Stability AI's later 2024–2026 work expanded primarily into adjacent modalities — including upgrades to Stable Video Diffusion (notably Stable Video 4D and SV4D 2.0 for 4D / multi-view video generation) — and into enterprise distribution channels (Amazon Bedrock, NVIDIA NIM, and direct API) rather than into a numbered SD4 successor.[^16][^27][^7][^8]
The ControlNet ecosystem for SD 3.5 continued to expand into 2025, with Stability AI signaling that additional control models — for SD 3.5 Medium and for new modalities such as additional structural and stylistic conditioning — were in development.[^26] Community-contributed LoRAs, IP-Adapter-style conditioning, and fine-tuned variants gradually accumulated on Hugging Face and CivitAI under the Community License, although coverage lagged the much larger SDXL ecosystem.[^25] During the same period, the broader image-generation field, open and proprietary alike, shifted toward newer architectures and competitors such as the FLUX.1 family from Black Forest Labs, Ideogram 3.0, and Google Imagen 4, which together defined much of the 2025–2026 frontier alongside SD 3.5.[^10]