See also: Stable Diffusion, Stable Diffusion XL, Latent Diffusion
| Field | Value |
|---|---|
| Hugging Face repo (original) | runwayml/stable-diffusion-v1-5 (deprecated) |
| Active community mirrors | stable-diffusion-v1-5/stable-diffusion-v1-5, Comfy-Org/stable-diffusion-v1-5-archive, sd-legacy/stable-diffusion-v1-5, benjamin-paine/stable-diffusion-v1-5 |
| Original publisher | Runway (RunwayML) |
| Model family | Stable Diffusion 1.x |
| Type | Multimodal latent diffusion model |
| Task | Text-to-image generation |
| Library | Diffusers |
| License | CreativeML Open RAIL-M |
| Released | October 20, 2022 |
| Removed by Runway | August 29, 2024 |
| Native resolution | 512 x 512 pixels |
| Parameters (UNet) | ~860 million |
| Text encoder | CLIP ViT-L/14 (~123M parameters, frozen) |
| Training data | LAION-2B (en) and LAION-aesthetics v2 5+ subset of LAION-5B |
| Key papers | arXiv:2112.10752, arXiv:2207.12598, arXiv:2103.00020, arXiv:2205.11487 |
runwayml/stable-diffusion-v1-5 is the Hugging Face repository name of the Stable Diffusion v1.5 checkpoint, a text-to-image latent diffusion model published on October 20, 2022 by Runway (then branded RunwayML). It is the fifth and most widely used checkpoint in the Stable Diffusion 1.x series, fine-tuned from the v1.2 checkpoint for an additional 595,000 training steps at 512 x 512 resolution on the LAION-aesthetics v2 5+ subset of LAION-5B. For roughly two years it was the de facto open-source baseline for diffusion-based image generation and the base model for DreamBooth, LoRA, ControlNet, and almost every popular community fine-tune on CivitAI and Hugging Face.
On August 29, 2024, Runway deleted its entire Hugging Face organization, taking down both runwayml/stable-diffusion-v1-5 and runwayml/stable-diffusion-inpainting. The empty organization page now reads only "We are no longer maintaining a HuggingFace organization." Because thousands of downstream pipelines (including AUTOMATIC1111, ComfyUI, and the official Diffusers library) used the repo path as a default fallback, the deletion broke installs across the open-source ecosystem until community mirrors took its place. The most prominent of those mirrors are stable-diffusion-v1-5/stable-diffusion-v1-5, Comfy-Org/stable-diffusion-v1-5-archive, and benjamin-paine/stable-diffusion-v1-5, all of which redistribute the original hash-identical weights under the same CreativeML Open RAIL-M license.
Despite being technically obsolete next to SDXL, Stable Diffusion 3, and 2024-era successors like Flux, the v1.5 checkpoint remains in heavy production use in 2026 because of its small footprint, fast sampling, and unmatched ecosystem of fine-tunes. The Comfy-Org archive alone reports over 170,000 downloads per month, and the community mirror at stable-diffusion-v1-5/stable-diffusion-v1-5 has more than 600 adapters and 370 fine-tuned children listed on Hugging Face.
Stable Diffusion 1.5 sits at the end of a chain of related checkpoints, all derived from the same architecture but produced by different research groups during 2022.
The underlying architecture comes from "High-Resolution Image Synthesis with Latent Diffusion Models" by Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer (arXiv:2112.10752). The paper was first posted in December 2021 and presented at CVPR 2022. The authors were affiliated with the Machine Vision and Learning research group (CompVis) at Ludwig Maximilian University of Munich, with Patrick Esser also working at Runway. The paper's central contribution was performing the diffusion process inside a learned latent space rather than at full pixel resolution, cutting memory and compute requirements by roughly an order of magnitude while preserving image quality.
Stability AI provided the compute, datasets, and engineering coordination needed to scale the latent diffusion architecture into a public release. The original Stable Diffusion checkpoints (v1.1 through v1.4) were trained at CompVis on Stability AI's A100 cluster and uploaded to Hugging Face in August 2022 under the CompVis/ namespace. Version 1.4 went public on August 22, 2022 and is the checkpoint most people initially knew as "Stable Diffusion."
Runway's involvement traces back to Patrick Esser, a co-author of the latent diffusion paper who also led generative model research at the company. After v1.4 launched, the team continued fine-tuning the model on the same LAION-aesthetics dataset. Runway packaged the resulting checkpoint as Stable Diffusion v1.5 and uploaded it to its own Hugging Face organization on October 20, 2022 instead of routing the release through Stability AI or CompVis.
The release was briefly contested. Stability AI's then-CEO Emad Mostaque sent a takedown request to Hugging Face shortly after the upload, arguing that Runway should not publish a new official checkpoint without Stability's involvement, then withdrew it the next day after community pushback. After that, no further checkpoint in the Stable Diffusion series was released by Runway. Versions 2.0, 2.1, XL, 3.0, and 3.5 were all published by Stability AI alone.
| Checkpoint | Publisher | Date | Initialized from | Additional fine-tuning |
|---|---|---|---|---|
| sd-v1-1 | CompVis | Aug 2022 | random init / LAION-2B-en pretraining | 237k steps at 256x256, then 194k at 512x512 |
| sd-v1-2 | CompVis | Aug 2022 | sd-v1-1 | 515k steps at 512x512 on LAION-improved-aesthetics |
| sd-v1-3 | CompVis | Aug 2022 | sd-v1-2 | 195k steps with 10% text dropout |
| sd-v1-4 | CompVis | Aug 22, 2022 | sd-v1-2 | 225k steps on LAION-aesthetics v2 5+, 10% dropout |
| sd-v1-5 | Runway | Oct 20, 2022 | sd-v1-2 | 595k steps on LAION-aesthetics v2 5+, 10% dropout |
A practical consequence of this lineage: v1.5 is not a continuation of v1.4 but a parallel branch from the same v1.2 starting point, trained for roughly 2.6 times as many steps on the same aesthetics-filtered data. Both checkpoints share the v1.x architecture and the same CLIP ViT-L/14 text encoder, so prompts and weights generally transfer directly between them, but the longer fine-tune in v1.5 produces visibly better composition and prompt adherence on most subjects.
Stable Diffusion 1.5 follows the standard latent diffusion model design from Rombach et al. (2022). It has three components that load as separate sub-models inside the Diffusers StableDiffusionPipeline.
A Variational Autoencoder compresses 512 x 512 x 3 RGB images into a 64 x 64 x 4 latent tensor (an 8x downsampling factor on each spatial dimension and a 48x reduction in total floating-point values). The encoder is used at training time and for image-to-image workflows; the decoder is used at inference to map sampled latents back to pixel space. The VAE itself was trained earlier in the latent diffusion pipeline and is reused unchanged across all 1.x checkpoints.
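A minimal sketch of this compression, assuming the Diffusers library and the community mirror path; a random tensor stands in for a normalized RGB image:

```python
# Verify the VAE's 8x spatial compression on a 512x512 input.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="vae"
)

image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized RGB image
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()

print(latents.shape)  # torch.Size([1, 4, 64, 64])
# 512*512*3 = 786,432 values in, 64*64*4 = 16,384 out: a 48x reduction.
# Diffusers pipelines additionally scale latents by vae.config.scaling_factor (0.18215).
```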
The heart of v1.5 is an approximately 860 million parameter U-Net that learns to predict the noise added to a latent at each step of a forward diffusion process. The U-Net has cross-attention layers that integrate text embeddings produced by the CLIP text encoder, which is what makes the network text-conditional. The architecture combines ResNet blocks for the convolutional backbone with Vision Transformer style self-attention and cross-attention blocks. Roughly speaking, the U-Net is what changes between v1.4 and v1.5, since the additional 595k training steps update its weights while the VAE and text encoder stay frozen.
v1.5 uses OpenAI's CLIP ViT-L/14 model as its frozen text encoder, contributing roughly 123 million parameters. CLIP turns prompt strings into 77-token sequences of 768-dimensional embeddings that the U-Net cross-attends over. This choice was inherited from the v1.x line and is one of the main reasons the community stayed on 1.5 rather than upgrading to 2.x: SD 2.0 swapped CLIP ViT-L/14 for OpenCLIP ViT-H/14, which broke prompt phrasings that everyone had memorized and degraded recognition of named celebrities and artists.
| Component | Spec | Parameters | Notes |
|---|---|---|---|
| VAE | 8x spatial downsampling, 4 latent channels | ~84M | Frozen during v1.5 fine-tuning |
| U-Net denoiser | Cross-attention to text embeddings | ~860M | The component actually being trained |
| Text encoder | CLIP ViT-L/14 (frozen) | ~123M | 77-token context, 768-dim embeddings |
| Total trainable | U-Net only | ~860M | VAE and CLIP stay frozen |
| Native output resolution | 512 x 512 RGB | n/a | Higher resolutions produce artifacts (duplicated subjects) without tiling, ControlNet, or upscalers |
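These counts can be checked directly against the pipeline's sub-models; a short sketch, assuming Diffusers and the community mirror path:

```python
# Print per-component parameter counts for the v1.5 checkpoint.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5"
)

for name in ("unet", "text_encoder", "vae"):
    module = getattr(pipe, name)
    params = sum(p.numel() for p in module.parameters())
    print(f"{name}: {params / 1e6:.0f}M parameters")
# Expected: unet ~860M, text_encoder ~123M, vae ~84M
```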
v1.5 ships with PNDM (a pseudo-numerical extension of DDIM) as its default scheduler at 50 steps and a guidance scale around 7.5, but in practice it is one of the most-tested diffusion models for alternative samplers. DDIM, DPM-Solver, DPM-Solver++, Euler, Euler ancestral, Heun, LMS, and UniPC are all routinely used; on modern GPUs, samplers like DPM-Solver++ 2M Karras reach in 20-30 steps the image quality PNDM needed 50 steps for.
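A sketch of that scheduler swap in Diffusers, assuming a CUDA GPU; the prompt and step count are illustrative:

```python
# Replace the default PNDM scheduler with DPM-Solver++ 2M Karras.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    "a watercolor fox in a snowy forest",
    num_inference_steps=25,   # comparable quality to 50-step PNDM
    guidance_scale=7.5,
).images[0]
image.save("fox.png")
```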
The training corpus is a subset of LAION-5B, specifically the LAION-aesthetics v2 5+ filter (images with predicted aesthetic score above 5 from the LAION aesthetic predictor, drawn from LAION-2B-en). LAION-5B itself was assembled by the LAION non-profit by scraping image-text pairs from the Common Crawl, with optional CLIP-based filtering. The aesthetics subset is meant to bias the training distribution toward visually appealing images rather than the whole web.
Starting from the sd-v1-2 checkpoint, Runway ran 595,000 additional optimizer steps at 512 x 512 resolution. Text conditioning was dropped for 10% of training examples (the standard classifier-free guidance trick), giving the model an unconditional generation mode that can be combined with the conditional mode at inference. The model card published by Robin Rombach and Patrick Esser lists the training hardware as 32 nodes of 8 A100 GPUs each, with AdamW, gradient accumulation of 2, and an effective batch size of 2,048. Total training compute is reported as approximately 150,000 A100 hours and roughly 11,250 kg CO2-equivalent emissions on AWS US-East.
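The inference-time payoff of that dropout is classifier-free guidance: the sampler evaluates the U-Net under both conditionings and extrapolates away from the unconditional prediction. An illustrative sketch of the combination step; the function and variable names are illustrative, not Diffusers internals:

```python
# Classifier-free guidance: blend conditional and unconditional predictions.
def guided_noise_prediction(unet, latents, t, text_emb, uncond_emb,
                            guidance_scale=7.5):
    # One U-Net call per conditioning (pipelines batch these into one pass).
    eps_uncond = unet(latents, t, encoder_hidden_states=uncond_emb).sample
    eps_text = unet(latents, t, encoder_hidden_states=text_emb).sample
    # Push the prediction away from unconditional, toward the text condition.
    return eps_uncond + guidance_scale * (eps_text - eps_uncond)
```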
v1.5 was trained at a fixed 512 x 512 resolution. Inference at higher resolutions like 768 x 768 or 1024 x 1024 generally produces duplicated subjects (for instance, two heads instead of one) because the model has no training signal at those resolutions. Workflows that need higher resolution typically generate at 512 x 512 and then upscale with a separate model such as ESRGAN, Real-ESRGAN, or the SD-based latent upscalers.
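A sketch of the common generate-then-refine pattern in Diffusers, using img2img as the refinement pass; the strength value is a typical starting point, not a canonical setting:

```python
# Generate at the native 512x512, then refine a naive upscale with img2img.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

base = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lighthouse at dusk, oil painting"
low = base(prompt).images[0]                                 # 512x512

refiner = StableDiffusionImg2ImgPipeline(**base.components)  # reuse loaded weights
upscaled = low.resize((1024, 1024))                          # naive PIL resize
high = refiner(
    prompt,
    image=upscaled,
    strength=0.4,  # low strength keeps the composition, adds detail
).images[0]
```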
v1.5 is released under the CreativeML Open RAIL-M license, the same license CompVis used for v1.4. RAIL stands for "Responsible AI License" and was developed by BigScience together with the RAIL Initiative. It allows commercial use, redistribution, and derivative works, but adds a list of use-based restrictions in Attachment A: the model and its derivatives may not be used for things like generating sexual content involving minors, harassment, defamation, dangerous medical advice, mass surveillance, or content that violates applicable law.
Because RAIL adds use restrictions, it is not a free or open-source license under the OSI definition or the FSF's Free Software Definition. In practice, however, the open-source community has treated it as functionally open: the weights are downloadable, fine-tunable, and redistributable, and most of the use restrictions match what reputable platforms would already enforce.
Stable Diffusion 1.5 is, by a comfortable margin, the most fine-tuned generative image model ever released. Several factors drove its dominance from late 2022 through 2024.
A full inference run of v1.5 fits in roughly 4-6 GB of VRAM at fp16, and the model can run on a CPU with optimized runtimes such as OpenVINO. That made Stable Diffusion one of the first high-quality text-to-image models that hobbyists could realistically run on a consumer gaming GPU, a major shift from earlier models like the original DALL-E, whose weights were never released, or Imagen, which Google kept closed.
Within a few months of v1.5's release, three main fine-tuning techniques became standard, all of which were popularized using v1.5 as the base; a minimal LoRA example follows the table.
| Technique | First public release | What it does | Why v1.5 was the base |
|---|---|---|---|
| DreamBooth | August 2022 paper, 2022-2023 SD ports | Fine-tunes the entire U-Net to teach a new concept (subject, person, object) from 3-5 reference images | First widely used SD fine-tuning method; community DreamBooth tooling targeted v1.5 |
| Textual Inversion | August 2022 | Trains a new token embedding without changing model weights | Light enough to share as a small .pt file; thousands of these were trained for v1.5 |
| LoRA (Low-Rank Adaptation) | 2021 paper (Hu et al.), late 2022 / early 2023 SD ports | Inserts low-rank weight deltas into the U-Net's cross-attention layers | Small file sizes (often under 200 MB) made LoRAs viral on CivitAI |
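As referenced above, a minimal sketch of applying a v1.5 LoRA in Diffusers; the LoRA repository path is a hypothetical placeholder for whichever adapter you download:

```python
# Load a v1.5 LoRA adapter onto the base pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# load_lora_weights accepts a Hub repo or a local .safetensors file.
pipe.load_lora_weights("some-user/some-sd15-lora")  # hypothetical path

image = pipe(
    "portrait photo, golden hour",
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength
).images[0]
```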
Lvmin Zhang, Anyi Rao, and Maneesh Agrawala released ControlNet in February 2023, presenting it at ICCV 2023. ControlNet adds spatial conditioning (Canny edges, depth maps, OpenPose skeletons, scribbles, segmentation masks) to a pretrained diffusion model. The first wave of ControlNet checkpoints (released by Lvmin Zhang's group as lllyasviel/sd-controlnet-*) was trained on top of Stable Diffusion 1.5. Even after SDXL launched, the SD 1.5 ControlNet collection remained more comprehensive, with checkpoints for inpainting, tile, soft edge, M-LSD lines, normal maps, segmentation, and OpenPose all targeting v1.5.
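A sketch of pairing v1.5 with one of the original ControlNet checkpoints, assuming Diffusers and a precomputed Canny edge map saved locally:

```python
# Condition v1.5 generation on Canny edges via ControlNet.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

edges = load_image("canny_edges.png")  # precomputed Canny edge map
image = pipe("a stone castle at sunset", image=edges).images[0]
```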
The community website CivitAI, launched in late 2022, became the primary distribution channel for v1.5-based fine-tunes. By mid-2024 it hosted tens of thousands of v1.5 derivatives. Notable families include:
| Fine-tune family | Style | Base model |
|---|---|---|
| Waifu Diffusion (versions 1.2, 1.3) | Anime / Danbooru tag style | Stable Diffusion 1.4 (WD 1.4 later moved to SD 2.x) |
| Anything V3, V4, V5 | Anime, NovelAI-leak inspired | Stable Diffusion 1.5 |
| Dreamlike Photoreal, Dreamlike Diffusion | Photoreal portraits | Stable Diffusion 1.5 |
| Realistic Vision (versions 1.0 through 6.0) | Photorealistic humans | Stable Diffusion 1.5 |
| Deliberate (versions 1.0, 2.0, 3.0) | General-purpose, painterly | Stable Diffusion 1.5 |
| ChilloutMix | Asian portrait photorealism | Stable Diffusion 1.5 |
| Counterfeit | Anime illustration | Stable Diffusion 1.5 |
| OpenJourney | Midjourney-style aesthetic | Stable Diffusion 1.5 |
| epiCRealism, MajicMix, Beautiful Realistic Asians | Photorealism merges | Stable Diffusion 1.5 |
Many of these models in turn became base models for further fine-tuning, producing a dense tree of merges. The phrase "trained on SD 1.5" became shorthand for the open-source image-generation stack.
| Version | Released | Publisher | Architecture | Native resolution | Text encoder | Why people did or did not switch |
|---|---|---|---|---|---|---|
| SD 1.4 | Aug 22, 2022 | CompVis | LDM with U-Net (~860M) | 512 | CLIP ViT-L/14 | First widely available SD; replaced by 1.5 within months |
| SD 1.5 | Oct 20, 2022 | Runway | LDM with U-Net (~860M) | 512 | CLIP ViT-L/14 | Better than 1.4 across the board, became the base for the entire ecosystem |
| SD 2.0 | Nov 24, 2022 | Stability AI | LDM with U-Net | 512 / 768 | OpenCLIP ViT-H/14 | New text encoder broke prompts, NSFW filter cut training data, community largely stayed on 1.5 |
| SD 2.1 | Dec 7, 2022 | Stability AI | LDM with U-Net | 512 / 768 | OpenCLIP ViT-H/14 | Less aggressive filter; better than 2.0 but still couldn't dethrone 1.5 |
| SDXL 1.0 | Jul 26, 2023 | Stability AI | LDM with U-Net (~3.5B base + 2.3B refiner) | 1024 | CLIP ViT-L/14 + OpenCLIP ViT-bigG/14 | Higher quality and resolution; many users moved over but kept 1.5 for fast iteration and existing fine-tunes |
| SD 3 Medium | Jun 12, 2024 | Stability AI | MMDiT (no U-Net) (~2B) | 1024 | CLIP-L + CLIP-G + T5-XXL | Architectural reset; mixed reception due to a restrictive initial license and weak human anatomy at launch |
| SD 3.5 | Oct 22, 2024 | Stability AI | MMDiT (~2.5B and ~8B variants) | 1024 | CLIP-L + CLIP-G + T5-XXL | Better than SD 3 but launched into a market that already had Flux as a strong open competitor |
For a comparable open competitor outside the Stable Diffusion family, Black Forest Labs released Flux.1 in August 2024. Flux is built by some of the same researchers who built Stable Diffusion (including Robin Rombach), uses a 12-billion-parameter hybrid of MMDiT-style and parallel attention transformer blocks, and has largely supplanted SDXL and SD 3 as the open-weights leader for raw quality. v1.5 still wins on speed, hardware footprint, and ecosystem depth.
On or around August 28-29, 2024, Runway took down its entire Hugging Face organization page, removing both runwayml/stable-diffusion-v1-5 and runwayml/stable-diffusion-inpainting. The first public reports surfaced in a Hugging Face issue thread on the diffusers library on August 29, 2024 (Issue #9322), followed by a Stable Diffusion WebUI bug report on September 3, 2024 (Issue #16459).
Runway did not issue a public explanation. The organization page was reduced to a single line: "We are no longer maintaining a HuggingFace organization." There was no blog post, no tweet, and no press release.
In the absence of an official statement, the community has speculated about three overlapping reasons: legal exposure from the copyright litigation surrounding the model's LAION training data, the Stanford Internet Observatory's December 2023 findings of suspected CSAM in LAION-5B, and Runway's strategic pivot toward its commercial Gen series of video models.
None of these have been confirmed by Runway. The factual situation is that the repository disappeared without explanation in late August 2024.
Within hours of the deletion, mirror copies appeared. The most stable replacements have been:
| Mirror | Maintainer | Notes |
|---|---|---|
| stable-diffusion-v1-5/stable-diffusion-v1-5 | Community organization | Now the canonical Hugging Face mirror; updated model card; default replacement in new Diffusers documentation |
| sd-legacy/stable-diffusion-v1-5 | Community organization | Identical weights; Diffusers code samples use this path |
| Comfy-Org/stable-diffusion-v1-5-archive | Comfy-Org (ComfyUI maintainers) | Hash-identical originals plus an FP16 conversion; explicitly archival |
| benjamin-paine/stable-diffusion-v1-5 | Benjamin Paine | Mirror used by various third-party tools |
| botp/stable-diffusion-v1-5 | botp | Early mirror referenced by the AUTOMATIC1111 community |
| Civitai (model id 62437) | Civitai | The Civitai checkpoint page predates the deletion |
The Hugging Face Diffusers team also patched the library so that the legacy default path no longer breaks installs, falling back to the community mirror in newer releases.
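The pattern downstream tools converged on looks roughly like the following sketch; the candidate list and its ordering are an assumption for illustration, not any specific tool's exact logic:

```python
# Try the legacy repo path first, then fall back to known mirrors.
from diffusers import StableDiffusionPipeline

CANDIDATES = [
    "runwayml/stable-diffusion-v1-5",               # deleted August 2024
    "stable-diffusion-v1-5/stable-diffusion-v1-5",  # community mirror
    "sd-legacy/stable-diffusion-v1-5",              # community mirror
]

pipe = None
for repo in CANDIDATES:
    try:
        pipe = StableDiffusionPipeline.from_pretrained(repo)
        break
    except Exception:
        continue  # repo missing or unreachable; try the next mirror
if pipe is None:
    raise RuntimeError("no Stable Diffusion 1.5 repository reachable")
```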
Stable Diffusion 1.5 has been the highest-profile model in several ongoing legal and policy debates about generative image models.
The two main cases are Andersen v. Stability AI and Getty Images v. Stability AI. The Andersen case, brought by artists Sarah Andersen, Kelly McKernan, and Karla Ortiz in January 2023, alleges that Stability AI, Midjourney, and DeviantArt infringed copyright by training image-generation models on billions of scraped images. The court dismissed several claims in October 2023 but allowed core direct-infringement and induced-infringement theories to proceed against Stability AI in an amended complaint. The Getty Images case, filed in early 2023 in both the United States (District of Delaware) and the United Kingdom (England and Wales High Court), alleges large-scale copying of Getty's image library, citing visible Getty watermarks in some Stable Diffusion outputs as evidence. The UK trial began in mid-2025 and produced a mixed result, with the High Court rejecting most claims but leaving some narrow issues open.
In both cases, Stable Diffusion 1.4 and 1.5 are the specific checkpoints whose training data is at issue, because they are the versions trained on LAION-2B-en before LAION's 2024 cleanup.
In December 2023, the Stanford Internet Observatory's report "Identifying and Eliminating CSAM in Generative ML Training Data and Models" identified more than 1,000 instances of suspected CSAM in LAION-5B by hash-matching against the National Center for Missing and Exploited Children's PhotoDNA database. LAION removed LAION-5B from public download in response, and later released a cleaned version (Re-LAION-5B) with the flagged URLs removed. SD 1.5's training data predates this cleanup.
The Diffusers StableDiffusionPipeline ships with a safety checker that compares CLIP embeddings of generated images against a hard-coded set of NSFW concept embeddings and replaces any match with a black image. The checker is enabled by default (opt-out) in the official pipeline but is routinely disabled in community tooling like AUTOMATIC1111, largely because false positives are common. Stability AI separately stopped promoting the early CompVis checkpoints as the legal environment tightened in 2023, though v1.1 through v1.4 remained available on Hugging Face.
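The checker's verdict is exposed per image on the pipeline output; a minimal sketch:

```python
# The safety checker flags each image via nsfw_content_detected.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5"
)

result = pipe("a bowl of fruit on a wooden table")
print(result.nsfw_content_detected)  # e.g. [False]; flagged images come back black
```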
Stable Diffusion 1.5's strengths and weaknesses both come from its 2022 design.
The resolution ceiling is 512 x 512 in practice. Trying to generate at 1024 x 1024 reliably produces duplicated subjects unless you use ControlNet tile, latent upscalers, or a high-resolution fix workflow that generates at 512 then refines.
Hands and fingers are notoriously bad. The model frequently generates six-fingered hands, fused fingers, or anatomically impossible wrists. This is partly a training-data artifact (cropped Web images often hide hands) and partly a limitation of the U-Net's spatial reasoning. SDXL improved on this, and SD 3 and Flux improved further.
Text rendering inside images is essentially unusable. Words come out as garbled letter-like shapes. SDXL's dual text encoders improved this, and SD 3's T5-XXL encoder largely solved it; v1.5 cannot do it.
Compositional prompts with multiple distinct objects in specific spatial relationships ("a red cube on top of a blue sphere next to a green cylinder") usually fail. The CLIP text encoder collapses these descriptions into a bag-of-attributes representation that the U-Net cannot reliably parse.
Bias in the training data shows up in outputs. The LAION corpus is heavily English-language and over-represents Western, white subjects. Prompts for generic professions tend to produce white men by default; prompts for criminals or homeless people tend to produce stereotyped non-white faces. The model card explicitly documents this.
These limitations are real, but in practice they are overcome by the ecosystem. ControlNet handles composition, fine-tunes handle subject specialization, hands can be inpainted with specialized inpainting models, and upscalers handle resolution. The base 1.5 checkpoint is treated less as a finished product and more as a substrate for further work.
In 2026, four years after its release, Stable Diffusion 1.5 is still in heavy production use.
ComfyUI workflows routinely combine v1.5 with ControlNet for jobs where SDXL or Flux would be overkill: rapid iteration on character art, batch generation of stock-style imagery, asset generation for indie game development, and inpainting tasks where a specialized v1.5 fine-tune (Realistic Vision, epiCRealism) gives better identity preservation than larger general-purpose models.
Cost-sensitive deployments keep it alive at scale. A 512 x 512 SD 1.5 generation on a consumer GPU finishes in about a second with DPM-Solver++ at 25 steps, while SDXL or Flux need several times the VRAM and compute to match that latency. For any application whose image quality requirements are met by 512 x 512 with a good fine-tune, v1.5 is an order of magnitude cheaper to run.
Distillation and acceleration research targets v1.5 disproportionately because of its ecosystem. Latent Consistency Models (Luo et al., 2023) were first demonstrated on an SD 1.5 fine-tune, and the LCM-LoRA extension shipped a v1.5 adapter alongside its SDXL variants.
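A sketch of the LCM-LoRA acceleration path on v1.5, following the usage pattern published with the latent-consistency release:

```python
# Four-step inference with the LCM-LoRA adapter for SD 1.5.
import torch
from diffusers import LCMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# 4 steps and low guidance instead of 25-50 steps at scale 7.5.
image = pipe(
    "an isometric voxel castle",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
```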
The canonical install path in 2026 is no longer runwayml/stable-diffusion-v1-5 but the community mirror at stable-diffusion-v1-5/stable-diffusion-v1-5 or the Comfy-Org/stable-diffusion-v1-5-archive. Newer Diffusers documentation and tutorials use these paths. The original Runway repo is preserved only in browser caches, mirrors, and the bibliography of papers from the model's heyday.