Latent Consistency Models (LCM)
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 4,268 words
Latent Consistency Models (LCMs) are a family of accelerated text-to-image generative models that apply the consistency-models framework of Song et al. (2023) to pre-trained latent diffusion models such as Stable Diffusion, enabling high-quality image synthesis in only one to four denoising steps rather than the twenty-five to fifty steps required by conventional samplers.[^1][^2] LCMs were proposed in October 2023 by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao at the Institute for Interdisciplinary Information Sciences, Tsinghua University.[^1] The follow-up technical report "LCM-LoRA: A Universal Stable-Diffusion Acceleration Module," released one month later, packages the same distillation as a Low-Rank Adaptation (LoRA) module that can be plugged into any SDXL or Stable-Diffusion-v1.5 checkpoint without further training.[^3] The combination of low compute cost (a 768x768 LCM trains in roughly 32 A100 GPU hours), tight integration with the HuggingFace Diffusers library, and broad compatibility with fine-tuned community checkpoints made LCMs one of the central building blocks of the late-2023 wave of real-time image-generation products.[^1][^4]
Latent diffusion models (LDMs), introduced by Rombach et al. at CVPR 2022 and most widely deployed as Stable Diffusion, generate images by running an iterative denoising process inside the compressed latent space of a variational autoencoder rather than directly in pixel space.[^5] This reformulation reduces memory and compute, but inference still requires running the denoising U-Net tens of times per image, typically twenty-five to fifty calls for a sampler such as DDIM or the higher-order DPM-Solver family.[^5][^6] In practice this meant that a single 1024x1024 SDXL generation on consumer hardware took several seconds, foreclosing applications such as drawing in real time or generating images while a user types.
Two families of techniques addressed the latency problem before LCM. The first is fast ODE solvers: tools such as DDIM (Song et al., 2020) recast diffusion sampling as solving a deterministic ordinary differential equation that can be integrated in roughly twenty-to-fifty steps,[^7] and DPM-Solver (Lu et al., 2022) introduced a tailored high-order multistep solver that produces good samples in around ten function evaluations.[^6] These solvers do not change the underlying model; they only choose smarter integration step sizes. The second family is distillation, in which a fast student model is trained to imitate a slow teacher. Salimans and Ho's "Progressive Distillation" (2022) repeatedly halves the number of sampling steps, eventually reaching as few as four steps on CIFAR-10 with little quality loss.[^8] Guidance Distillation (Meng et al., 2022) further folds the cost of classifier-free guidance into a single network forward pass by conditioning the student on the guidance scale.[^9]
A third approach, consistency models, was proposed in March 2023 by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever at OpenAI.[^2] Where progressive distillation pairs adjacent timesteps, consistency models impose a single self-consistency property: for any noise level along the probability flow ODE trajectory, the network should output the same clean sample. A network satisfying this constraint can therefore map noise directly to data in one step, while still supporting multistep refinement when a quality-compute trade-off is desired. The original paper achieves a one-step FID of 3.55 on CIFAR-10 and 6.20 on 64x64 ImageNet, outperforming earlier distillation methods at one-step generation.[^2] However, the OpenAI work targeted pixel-space diffusion at small resolutions; applying the same idea to the latent space of a large text-to-image model required additional machinery, which became the core contribution of Latent Consistency Models.
The paper "Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference," posted to arXiv as 2310.04378 on 6 October 2023, was authored by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University.[^1] The headline claim is that "a high-quality 768x768 2-to-4-step LCM takes only 32 A100 GPU hours for training" and produces samples competitive with the underlying Stable Diffusion teacher.[^1]
The contribution can be decomposed into four pieces:
The first publicly released LCM checkpoint, SimianLuo/LCM_Dreamshaper_v7, was distilled from the popular Dreamshaper v7 fine-tune of Stable Diffusion 1.5 and reached usable quality in four denoising steps; the authors also released two-step and one-step variants showing graceful degradation.[^4][^10] Evaluation on the LAION-5B-Aesthetics subset reports that LCMs match or exceed prior few-step methods on standard FID and CLIP-score metrics, though as with all diffusion benchmarks the absolute numbers depend heavily on the chosen subset and sampling configuration.[^1]
The paper also introduces Latent Consistency Fine-tuning (LCF), a procedure for adapting an existing LCM to a custom image dataset without requiring access to a teacher diffusion model. LCF treats the LCM itself as both teacher and student over augmented trajectories, allowing community users to specialize LCMs on niche styles (the project repository ships examples on Pokemon and Simpsons datasets).[^10]
Latent diffusion models specify a forward process that progressively adds Gaussian noise to the latent representation z of an image. The corresponding reverse-time probability flow ODE is a deterministic differential equation whose solution at time zero recovers the clean latent.[^5] For an unconditional model the ODE depends only on the score function learned by the U-Net. For text-to-image generation, samples are guided by a prompt y via classifier-free guidance with a scale w, yielding a w-augmented ODE whose vector field interpolates between conditional and unconditional score estimates.
A consistency function is a parametric function f_theta(z_t, t) that maps any point on a particular ODE trajectory back to the same endpoint at t=0. The defining identity is
f_theta(z_t, t) = f_theta(z_{t'}, t') for all t, t' on the same trajectory.
Once f_theta has been learned, one-step generation reduces to drawing pure noise z_T and evaluating f_theta(z_T, T) once. Multi-step generation alternates between (i) evaluating f_theta to produce a candidate clean latent and (ii) re-injecting controlled noise to land on an earlier timestep, repeating for two to four iterations.[^2]
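The alternation between denoising and re-noising can be sketched in a few lines. This is a minimal illustration, not the reference sampler: the cosine noise schedule in `alpha_sigma`, the function names, and the toy consistency function used in the usage example are all assumptions made for the sketch.

```python
import math
import random

def alpha_sigma(t):
    """Toy variance-preserving schedule on t in [0, 1] (assumed for this sketch)."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def multistep_consistency_sample(f, timesteps, dim, rng):
    """Alternate consistency-function calls with noise re-injection.

    f(z, t) returns an estimate of the clean latent as a list of floats.
    timesteps is a decreasing list such as [1.0, 0.75, 0.5, 0.25].
    """
    # Start from pure Gaussian noise at the largest timestep.
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    x0 = f(z, timesteps[0])  # one-step estimate of the clean latent
    for t in timesteps[1:]:
        a, s = alpha_sigma(t)
        # Re-noise the current estimate so that it lands on the earlier timestep t.
        z = [a * x + s * rng.gauss(0.0, 1.0) for x in x0]
        x0 = f(z, t)  # refine the estimate
    return x0

# Usage with a toy consistency function: if the data distribution is a single
# point mu, every trajectory ends at mu, so the exact f is constant.
mu = [0.5, -1.0, 2.0]
sample = multistep_consistency_sample(
    lambda z, t: list(mu), [1.0, 0.75, 0.5, 0.25], dim=3, rng=random.Random(0)
)
```

Passing a single timestep reduces the loop to one-step generation, while longer lists trade extra network evaluations for refinement.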
The LCM distillation loop follows the original consistency-distillation recipe of Song et al., with three changes that adapt it to a Stable Diffusion teacher.[^1] First, the loss is computed entirely in the VAE latent space, so the frozen VAE encoder only needs to be run once per training image. Second, the teacher signal at timestep t is computed by integrating the augmented PF-ODE for k skipping steps using a numerical solver such as DDIM, producing the "target" latent z_{t-k}. The student is then asked to ensure that f_theta(z_t, t) and f_theta_minus(z_{t-k}, t-k) agree, where f_theta_minus is an exponential-moving-average copy of the student weights.[^1] Third, the guidance scale w is sampled uniformly from a range (the paper uses [w_min, w_max] = [2, 14]) and the corresponding embedding is added to the time-embedding of the U-Net, mirroring Meng et al.'s guidance distillation.[^1][^9]
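The ingredients of one distillation step can be sketched as plain functions. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the EMA decay of 0.95 is an assumed value, and the noise estimates are passed in as lists rather than produced by a real teacher U-Net.

```python
import random

def sample_guidance_scale(rng, w_min=2.0, w_max=14.0):
    """Draw w uniformly from [w_min, w_max], the range quoted from the paper."""
    return rng.uniform(w_min, w_max)

def cfg_noise(eps_cond, eps_uncond, w):
    """Guided noise estimate in the paper's convention, where w = 0 means no guidance."""
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]

def ddim_skip_step(z_t, eps, alpha_t, sigma_t, alpha_s, sigma_s):
    """One deterministic DDIM step from noise level t to the earlier level s = t - k."""
    # Estimate the clean latent implied by the noise prediction.
    x0_pred = [(z - sigma_t * e) / alpha_t for z, e in zip(z_t, eps)]
    # Re-project that estimate onto the earlier noise level.
    return [alpha_s * x + sigma_s * e for x, e in zip(x0_pred, eps)]

def ema_update(target_params, student_params, decay=0.95):
    """Exponential moving average that yields the target weights theta_minus (decay is assumed)."""
    return [decay * tp + (1.0 - decay) * sp for tp, sp in zip(target_params, student_params)]
```

In a training loop, the output of `ddim_skip_step` plays the role of the target latent z_{t-k}; the loss then compares the student's prediction at (z_t, t) with the EMA network's prediction at (z_{t-k}, t-k).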
A separate consistency boundary condition c_skip(t), c_out(t) is used to parameterize the U-Net output so that at t = 0 the network returns the input unchanged; the HuggingFace Diffusers LCMScheduler exposes a timestep_scaling parameter (default 10.0) that controls how these boundary functions are scaled.[^11] This scaling factor is one of the few hyperparameters that practitioners may need to tune when distilling LCMs onto new base models.
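The role of the boundary functions can be illustrated with a small sketch. It assumes an EDM-style parameterization with sigma_data = 0.5; the function names are hypothetical and the exact form used by a particular implementation may differ.

```python
def boundary_scalings(t, timestep_scaling=10.0, sigma_data=0.5):
    """Return (c_skip, c_out) for a timestep t, assuming an EDM-style form."""
    s = t * timestep_scaling
    c_skip = sigma_data**2 / (s**2 + sigma_data**2)
    c_out = s / (s**2 + sigma_data**2) ** 0.5
    return c_skip, c_out

def consistency_output(z_t, network_out, t, timestep_scaling=10.0):
    """Combine the input latent and the raw network output into f_theta(z_t, t)."""
    c_skip, c_out = boundary_scalings(t, timestep_scaling)
    return [c_skip * z + c_out * n for z, n in zip(z_t, network_out)]

# At t = 0 the skip coefficient is 1 and the output coefficient is 0,
# so the function returns its input unchanged.
identity_check = consistency_output([1.0, 2.0], [9.0, 9.0], t=0)
```

With a larger `timestep_scaling`, c_skip decays faster as t moves away from zero, so the output is dominated by the network prediction at almost every non-zero timestep.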
At inference time, an LCM checkpoint is paired with the LCMScheduler provided by the Diffusers library.[^11] The scheduler implements Algorithm 3 of the LCM paper and supports both one-step and multistep sampling between one and eight steps; the Diffusers documentation recommends two-to-eight steps for general use, with four steps as the default in the LatentConsistencyModelPipeline.__call__ signature.[^12]
A subtle but important detail concerns guidance scale. The original LCM paper defines guidance scale so that w = 0 corresponds to "no guidance," whereas Stable Diffusion's traditional convention sets guidance_scale = 1 to mean no guidance.[^12] The Diffusers implementation follows the Stable Diffusion convention: the user passes guidance_scale = 8.5 in the pipeline call, and Diffusers internally subtracts one before computing the embedding fed to the U-Net.[^12] In practice LCM-LoRA model cards advise setting guidance_scale between 1.0 and 2.0 (Stable-Diffusion convention) because larger values, which work well for the full teacher, tend to oversaturate the few-step student.[^13]
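The two conventions differ by a constant offset, which the following sketch makes explicit. The helper names, the embedding dimension of 8, and the scale factor of 1000 are illustrative assumptions rather than the library's exact code.

```python
import math

def to_paper_w(guidance_scale):
    """Convert a Stable-Diffusion-style guidance_scale (1 = no guidance)
    to the paper's w (0 = no guidance)."""
    return guidance_scale - 1.0

def guidance_embedding(w, dim=8, scale=1000.0):
    """Sinusoidal embedding of w, analogous to a timestep embedding (illustrative)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / (half - 1)) for i in range(half)]
    args = [w * scale * f for f in freqs]
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]

# The pipeline default of 8.5 corresponds to w = 7.5 in the paper's convention.
w = to_paper_w(8.5)
embedding = guidance_embedding(w)
```

Only full LCM checkpoints distilled with a guidance embedding consume a vector like `embedding`; the conversion itself explains why guidance_scale = 1.0 corresponds to no guidance.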
The original LCM was distilled in approximately 32 A100 GPU-hours on a 768x768 dataset.[^1] By contrast, full-model fine-tunes of SDXL typically cost hundreds to thousands of GPU-hours, and progressive-distillation runs in the original Salimans-Ho paper consume tens of thousands of TPU-hours.[^8] At inference, the speed-up is similarly dramatic: HuggingFace's official LCM-LoRA blog post benchmarks an SDXL LCM-LoRA at four steps against the standard SDXL at twenty-five steps and reports speed-ups of roughly five-fold on an RTX 3090 (1.4s versus 7s), ten-fold on an M1 Mac (about 6s versus 60s), and three-fold on an A100 80GB (1.2s versus 3.8s) for a single 1024x1024 image.[^4]
One month after the original LCM paper, the same group released a follow-up technical report titled "LCM-LoRA: A Universal Stable-Diffusion Acceleration Module" (arXiv:2311.05556, 9 November 2023).[^3] The author list expanded to include several core HuggingFace Diffusers maintainers: Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinario Passos, Longbo Huang, Jian Li, and Hang Zhao.[^3] The collaboration was made public alongside an official HuggingFace blog post that shipped LCM-LoRA support in the Diffusers library on the same day.[^4]
LCM-LoRA observes that the distilled LCM U-Net differs from its teacher Stable Diffusion U-Net only in a small subspace of weight directions. Rather than fine-tuning all parameters, the LCM training loss is applied to a low-rank LoRA adapter that is injected into the cross-attention and self-attention layers of the U-Net.[^3] The resulting adapter has on the order of 67M-197M parameters, depending on the base model, compared to the roughly 2.6B parameters of the SDXL U-Net, drastically reducing the memory footprint of distillation. After training, the LoRA weights can be (i) merged into a base checkpoint for deployment, or (ii) loaded at inference and switched on or off like any other LoRA, which means the same adapter accelerates many derived checkpoints.[^3][^4]
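The parameter saving and the merge option follow directly from the low-rank factorization. The sketch below uses nested lists in place of tensors; the function names, the rank of 64, and the 1280-wide layer are illustrative assumptions.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA pair: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

def merge_lora(W, A, B, scale=1.0):
    """Return W + scale * (B @ A), i.e. the base weight with the adapter folded in."""
    d_out, d_in, rank = len(W), len(W[0]), len(A)
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank)) for j in range(d_in)]
        for i in range(d_out)
    ]

# A single 1280 x 1280 projection: full fine-tuning versus a rank-64 adapter.
full_params = 1280 * 1280
adapter_params = lora_param_count(1280, 1280, rank=64)

# Merging a rank-1 adapter into a 2 x 2 weight matrix.
merged = merge_lora([[1.0, 0.0], [0.0, 1.0]], A=[[3.0, 4.0]], B=[[1.0], [2.0]], scale=0.5)
```

Setting `scale` to zero recovers the base weights, which corresponds to switching the adapter off at inference time.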
The empirical centerpiece of the report is that a single LCM-LoRA distilled against the base SDXL checkpoint generalizes to community fine-tunes of SDXL without retraining.[^3][^4] In other words, a user who downloads the official latent-consistency/lcm-lora-sdxl adapter (197M parameters) can apply it on top of any SDXL Dreambooth or community fine-tune and obtain four-step inference without performing any distillation themselves.[^13] The paper interprets this as evidence that LCM-LoRA functions as a "plug-in neural PF-ODE solver" that generalizes across the family of nearby diffusion models, and it contrasts this learned solver with traditional numerical solvers such as DDIM and DPM-Solver.[^3]
The release shipped three official LoRA adapters and two full-parameter LCM checkpoints, all hosted on the HuggingFace Hub under the latent-consistency organization.[^4]
| Checkpoint | Base model | Parameters | Recommended steps | Release |
|---|---|---|---|---|
| latent-consistency/lcm-lora-sdv1-5 | Stable Diffusion v1.5 | ~67M | 4-8 | November 2023[^4] |
| latent-consistency/lcm-lora-sdxl | Stable Diffusion XL base 1.0 | ~197M | 4-8 | November 2023[^13] |
| latent-consistency/lcm-lora-ssd-1b | SSD-1B (segmind/SSD-1B) | ~129M | 4-8 | November 2023[^4] |
| latent-consistency/lcm-sdxl | SDXL base 1.0 (full UNet) | ~2.6B | 1-8 | November 2023[^4] |
| latent-consistency/lcm-ssd-1b | SSD-1B (full UNet) | ~1.3B | 1-8 | November 2023[^4] |
Reported guidance for lcm-lora-sdxl: use between two and eight inference steps; set guidance_scale to 1.0 (effectively disabling guidance) or keep it within the range 1.0-2.0; the adapter can be used in image-to-image, inpainting, ControlNet, and T2I-Adapter pipelines through the same LCMScheduler.[^13]
The reference implementation of LCM is the LCMScheduler class in the diffusers library, contributed by Simian Luo (luosiallen), Daniel Gu (dg845), and others, and merged into the Diffusers main branch shortly after the paper release in October 2023.[^11][^12] The scheduler exposes the multistep algorithm of Section 4.3 of the paper and supports the "Skipping-Step" timestep schedule that LCM uses internally.[^11]
Diffusers exposes two LCM-specific pipelines, LatentConsistencyModelPipeline for text-to-image and LatentConsistencyModelImg2ImgPipeline for image-to-image generation, both of which inherit the standard mixins for LoRA loading, CLIP text encoding, textual inversion, and IP-Adapter integration.[^12] Sample usage from the official documentation:
```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load the fp16 weight variant of the SDXL base checkpoint.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
)
# Attach the LCM-LoRA adapter and swap in the LCM scheduler.
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.to(device="cuda", dtype=torch.float16)

# Four denoising steps with guidance effectively disabled (guidance_scale=1).
images = pipe(
    prompt="close-up photograph of an old man in the rain at night",
    num_inference_steps=4,
    guidance_scale=1,
).images
```
Beyond the core pipelines, LCM-LoRA was designed to be stackable with other LoRAs. The HuggingFace integration allows a user to load an LCM-LoRA together with a style or character LoRA, weighting each adapter independently, so the same model can produce four-step inference in a fine-tuned artistic style without retraining.[^4] Tools and pipelines such as ControlNet, image-to-image, and inpainting can all be driven by an LCM-LoRA-equipped UNet, which is what makes LCMs broadly useful in downstream workflows.[^4]
LCMScheduler also exposes optional zero-SNR rescaling (rescale_betas_zero_snr), v-prediction, and configurable original_inference_steps (default 50) to support arbitrary spacing of distillation and inference timesteps. The default timestep_scaling=10.0 controls the boundary-condition coefficients; per the Diffusers docstring, "the approximation error at the default of 10.0 is already pretty small."[^11]
The release of LCM-LoRA coincided with a wave of "real-time" generative-image products that exploit sub-second inference to allow users to draw, type, or move a camera and see the AI output update fluidly. The two most widely cited examples are Krea AI and fal.ai.
Krea AI launched a real-time canvas in November 2023 in which users sketch shapes and the system regenerates a photorealistic image roughly every frame. Krea's real-time stack uses LCM-LoRA on top of SDXL and Stable Diffusion variants and advertises sub-50ms updates on its real-time canvas.[^14] Founders Diego Rodriguez and Victor Perez positioned the product as a way to bridge "rough sketch and polished visual in milliseconds," a use case made viable only by LCM-class step counts.[^14] VentureBeat covered the release on 9 November 2023 under the headline "Realtime generative AI art is here thanks to LCM-LoRA," explicitly tying the product wave to the Luo et al. paper.[^15]
The serverless inference provider fal.ai released a real-time LCM endpoint shortly after the LCM-LoRA paper and documented end-to-end latencies of about 150ms for Stable Diffusion v1.5 LCM and 650ms for SDXL LCM at four inference steps, of which roughly 120ms is GPU compute.[^16] fal's blog notes that "real-time" applications such as drawing-canvas integrations and live-camera image generation are now feasible because LCM inference is dominated by network round-trip rather than GPU work.[^16] The provider collaborated with the collaborative-whiteboard tool tldraw on a real-time generation feature and helped popularize the "LCM-Painter" interactive space on HuggingFace.[^16]
LCM and LCM-LoRA checkpoints are also supported by community front-ends including ComfyUI and Automatic1111's Stable-Diffusion-WebUI, and by C# / ONNX-runtime ports for CPU and edge inference.[^10] The combination of low VRAM requirements (the LCM-LoRA adapter for Stable Diffusion v1.5 has only about 67M parameters) and few-step inference made LCM popular for self-hosted and on-device workloads where Stability AI's full SDXL Turbo was unavailable due to its noncommercial-research license.
LCM occupies a distinct point in the design space of fast diffusion samplers. The principal contemporaries are numerical ODE solvers, progressive distillation, guidance distillation, and Adversarial Diffusion Distillation. The table below summarizes their characteristics; entries are drawn from each method's reference paper or official documentation.
| Method | Type | Reported steps | Requires training | Reference |
|---|---|---|---|---|
| DDIM | Numerical PF-ODE solver | 20-50 | No | Song et al. 2020[^7] |
| DPM-Solver / DPM-Solver++ | Higher-order multistep solver | 10-20 | No | Lu et al. 2022[^6] |
| Progressive Distillation | Student-teacher halving | 4-8 | Yes (multi-stage) | Salimans & Ho 2022[^8] |
| Guidance Distillation | Folds CFG into one pass | unchanged | Yes | Meng et al. 2022[^9] |
| Consistency Models | Pixel-space self-consistency | 1-4 | Yes | Song et al. 2023[^2] |
| Latent Consistency Models | Latent-space self-consistency | 1-4 | Yes (32 A100h) | Luo et al. 2023[^1] |
| LCM-LoRA | LCM as low-rank adapter | 2-8 | Yes (LoRA only) | Luo et al. 2023[^3] |
| Adversarial Diffusion Distillation (SDXL Turbo) | GAN + score distillation | 1-4 | Yes | Sauer et al. 2023[^17] |
DPM-Solver is a training-free higher-order solver: it can be dropped onto an existing Stable Diffusion checkpoint and reduces inference from ~50 to ~10-20 steps without modifying the model.[^6] LCM cannot match this convenience because it requires a distillation pass, but in exchange it goes substantially further down the step count (one to four versus ten to twenty) and exposes a small LoRA that can be cached and shared. The LCM-LoRA paper explicitly frames LCM-LoRA as "a plug-in neural PF-ODE solver" that complements numerical solvers when the lowest possible step count is required.[^3]
Progressive Distillation (Salimans and Ho, 2022) reduces step counts via a chain of halving stages, each of which trains a new student that takes half as many steps as its teacher. Reaching four steps from a 1000-step teacher therefore requires multiple distillation rounds, each comparable in cost to a fine-tuning run.[^8] LCM, by contrast, distills a few-step student in a single training stage by enforcing a global self-consistency property, which is what enables the 32 GPU-hour training budget reported by Luo et al.[^1]
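The number of halving stages implied by that description can be counted with a short sketch. The helper name is hypothetical, and rounding odd step counts up is an assumption made only to keep the arithmetic in whole steps.

```python
import math

def halving_schedule(teacher_steps, student_steps):
    """List the step counts visited when each stage halves its teacher's steps."""
    steps = [teacher_steps]
    while steps[-1] > student_steps:
        # Each stage trains a student with half as many steps (rounded up).
        steps.append(max(student_steps, math.ceil(steps[-1] / 2)))
    return steps

schedule = halving_schedule(1000, 4)
rounds = len(schedule) - 1  # number of separate distillation stages
```

Under these assumptions a 1000-step teacher needs eight successive stages to reach four steps, whereas the single-stage LCM recipe trains the few-step student directly.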
Stability AI released SDXL Turbo on 28 November 2023, three weeks after LCM-LoRA. SDXL Turbo is trained with Adversarial Diffusion Distillation (ADD), proposed by Sauer et al., which combines score distillation against a teacher diffusion model with a GAN-style discriminator that scores the realism of the student's one-step outputs.[^17] Stability AI's announcement reports that SDXL Turbo at one step is preferred by human evaluators over LCM-XL at four steps on prompt-following and image-quality measures.[^17] The comparison is not entirely apples-to-apples (LCM-XL is a free-research artifact while SDXL Turbo was released under a non-commercial license at first), but it illustrates the broader pattern that GAN-style discriminator losses can buy additional one-step quality at the price of more complex training infrastructure.[^17]
Subsequent work has continued to push this frontier. ByteDance's SDXL-Lightning (Lin et al., 2024) combines progressive distillation with adversarial training to reach competitive one-step quality, while academic follow-ups such as Multistep Consistency Models unify consistency distillation and TRACT to interpolate between LCM and progressive distillation regimes.[^18]
LCMs and LCM-LoRA are used wherever diffusion-model latency was previously a bottleneck:
LCMs come with several documented caveats:
openrail++, which permits commercial use; users distilling LCMs from licensed checkpoints must respect the upstream terms.[^13]