Latent Consistency Models (LCM)

Updated Jul 23, 2026

Latent Consistency Models (LCMs) are a family of accelerated text-to-image generative models that apply the consistency-models framework of Song et al. (2023) to pre-trained latent diffusion models such as Stable Diffusion, enabling high-quality image synthesis in only one to four denoising steps rather than the twenty-five to fifty steps required by conventional samplers.^[1]^[2] LCMs were proposed in October 2023 by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao at the Institute for Interdisciplinary Information Sciences, Tsinghua University.^[1] The follow-up technical report "LCM-LoRA: A Universal Stable-Diffusion Acceleration Module," released one month later, packages the same distillation as a Low-Rank Adaptation (LoRA) module that can be plugged into any SDXL or Stable-Diffusion-v1.5 checkpoint without further training.^[3] The combination of low compute cost (a 768x768 LCM trains in roughly 32 A100 GPU hours), tight integration with the HuggingFace Diffusers library, and broad compatibility with fine-tuned community checkpoints made LCMs one of the central building blocks of the late-2023 wave of real-time image-generation products and a widely cited example of accelerated generative AI.^[1]^[4]

Background

Latent diffusion models (LDMs), introduced by Rombach et al. at CVPR 2022 and most widely deployed as Stable Diffusion, generate images by running an iterative denoising process inside the compressed latent space of a variational autoencoder rather than directly in pixel space.^[5] This reformulation reduces memory and compute, but inference still requires running the denoising U-Net tens of times per image, typically twenty-five to fifty calls for a sampler such as DDIM or the higher-order DPM-Solver family.^[5]^[6] In practice this meant that a single 1024x1024 SDXL generation on consumer hardware took several seconds, foreclosing applications such as drawing in real time or generating images while a user types.

Two families of techniques addressed the latency problem before LCM. The first is fast ODE solvers: tools such as DDIM (Song et al., 2020) recast diffusion sampling as solving a deterministic ordinary differential equation that can be integrated in roughly twenty-to-fifty steps,^[7] and DPM-Solver (Lu et al., 2022) introduced a tailored high-order multistep solver that produces good samples in around ten function evaluations.^[6] These solvers do not change the underlying model; they only choose smarter integration step sizes. The second family is distillation, in which a fast student model is trained to imitate a slow teacher. Salimans and Ho's "Progressive Distillation" (2022) repeatedly halves the number of sampling steps, eventually reaching as few as four steps on CIFAR-10 with little quality loss.^[8] Guidance Distillation (Meng et al., 2022) further folds the cost of classifier-free guidance into a single network forward pass by conditioning the student on the guidance scale.^[9]

A third approach, consistency models, was proposed in March 2023 by Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever at OpenAI.^[2] Where progressive distillation pairs adjacent timesteps, consistency models impose a single self-consistency property: for any noise level along the probability flow ODE trajectory, the network should output the same clean sample. A network satisfying this constraint can therefore map noise directly to data in one step, while still supporting multistep refinement when a quality-compute trade-off is desired. The original paper achieves a one-step FID of 3.55 on CIFAR-10 and 6.20 on 64x64 ImageNet, outperforming earlier distillation methods at one-step generation.^[2] However, the OpenAI work targeted pixel-space diffusion at small resolutions; applying the same idea to the latent space of a large text-to-image model required additional machinery, which became the core contribution of Latent Consistency Models and distinguishes them from the pixel-space Consistency Models of Song et al.

The LCM Paper

The paper "Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference," posted to arXiv as 2310.04378 on 6 October 2023, was authored by Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao at the Institute for Interdisciplinary Information Sciences (IIIS) at Tsinghua University.^[1] The headline claim is that "a high-quality 768x768 2-to-4-step LCM takes only 32 A100 GPU hours for training" and produces samples competitive with the underlying Stable Diffusion teacher.^[1]

The contribution can be decomposed into four pieces:

Latent-space consistency distillation. The authors port consistency distillation from pixel space (where Song et al. operated) to the compressed VAE latent space used by Stable Diffusion. Distillation runs in the latent space and uses the teacher's frozen encoder and decoder, so the student inherits the same image domain.^[1]
An augmented probability flow ODE. Because Stable Diffusion is a guided model, the relevant deterministic trajectory is not the bare diffusion ODE but the classifier-free-guided ODE. LCM formulates this as an "augmented PF-ODE" whose trajectory depends on the guidance scale, and trains the consistency function over that augmented trajectory.^[1]
Guidance scale embedding (w-CFG). Rather than training a separate model per guidance scale, the LCM U-Net is conditioned on the guidance scale w via a sinusoidal embedding (the same trick introduced by Meng et al. for guidance distillation), so a single distilled checkpoint covers the full range of guidance scales.^[1]^[9]
Skipping-step strategy. To shorten the consistency-distillation training horizon, LCM uses a k-step skipping schedule: instead of enforcing consistency between adjacent diffusion timesteps t and t-1, it enforces it between t and t-k for k around 20. This lets the model learn coarse trajectories quickly and keeps the 32 GPU-hour budget feasible.^[1]

The first publicly released LCM checkpoint, SimianLuo/LCM_Dreamshaper_v7, was distilled from the popular Dreamshaper v7 fine-tune of Stable Diffusion 1.5 and reached usable quality in four denoising steps; the authors also released two-step and one-step variants showing graceful degradation.^[4]^[10] Evaluation on the LAION-5B-Aesthetics subset reports that LCMs match or exceed prior few-step methods on standard FID and CLIP-score metrics, though as with all diffusion benchmarks the absolute numbers depend heavily on the chosen subset and sampling configuration.^[1]

The paper also introduces Latent Consistency Fine-tuning (LCF), a procedure for adapting an existing LCM to a custom image dataset without requiring access to a teacher diffusion model. LCF treats the LCM itself as both teacher and student over augmented trajectories, allowing community users to specialize LCMs on niche styles (the project repository ships examples on Pokemon and Simpsons datasets).^[10]

Technical Details

Probability flow ODE in latent space

Latent diffusion models specify a forward process that progressively adds Gaussian noise to the latent representation z of an image. The corresponding reverse-time probability flow ODE is a deterministic differential equation whose solution at time zero recovers the clean latent.^[5] For an unconditional model the ODE depends only on the score function learned by the U-Net. For text-to-image generation, samples are guided by a prompt y via classifier-free guidance with a scale w, yielding a w-augmented ODE whose vector field interpolates between conditional and unconditional score estimates.

A consistency function is a parametric function $f_\theta(z_t, t)$ that maps any point on a particular ODE trajectory back to the same endpoint at $t=0$. The defining identity is

$$f_\theta(z_t, t) = f_\theta(z_{t'}, t') \quad \text{for all } t, t' \text{ on the same trajectory}$$

Once $f_\theta$ has been learned, one-step generation reduces to drawing pure noise $z_T$ and evaluating $f_\theta(z_T, T)$ once. Multi-step generation alternates between (i) evaluating $f_\theta$ to produce a candidate clean latent and (ii) re-injecting controlled noise to land on an earlier timestep, repeating for two to four iterations.^[2]
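The identity and the multistep loop can be checked on a toy problem where the consistency function is known in closed form. The sketch below is illustrative only and is not the LCM training or sampling code: it assumes one-dimensional Gaussian data and a variance-exploding noise parameterization in which the noise level `sigma` plays the role of $t$, and the names `MU`, `S`, `trajectory_point`, `consistency_fn`, and `multistep_sample` are hypothetical.

```python
import math
import random

MU, S = 2.0, 0.5  # toy data distribution N(MU, S^2); both values are arbitrary


def trajectory_point(c, sigma):
    """Point at noise level sigma on the PF-ODE trajectory indexed by the constant c."""
    return MU + c * math.sqrt(S**2 + sigma**2)


def consistency_fn(z, sigma):
    """Exact consistency function of the toy model: maps (z, sigma) to the sigma = 0 endpoint."""
    return MU + (z - MU) * S / math.sqrt(S**2 + sigma**2)


def multistep_sample(sigmas, rng):
    """Multistep consistency sampling over a decreasing list of noise levels."""
    z = rng.gauss(0.0, sigmas[0])             # start from pure noise at the largest level
    x = consistency_fn(z, sigmas[0])          # one evaluation gives a clean-sample estimate
    for sigma in sigmas[1:]:
        z = x + sigma * rng.gauss(0.0, 1.0)   # re-inject noise to land on an earlier level
        x = consistency_fn(z, sigma)          # denoise again with a single evaluation
    return x


# Two points on the same trajectory are mapped to the same endpoint.
c = 0.7
a = consistency_fn(trajectory_point(c, 80.0), 80.0)
b = consistency_fn(trajectory_point(c, 1.0), 1.0)
```

Here `a` and `b` agree up to floating-point error, which is the self-consistency property. In an LCM the closed-form `consistency_fn` is replaced by the distilled U-Net acting on latents.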

LCM distillation procedure

The LCM distillation loop follows the original consistency-distillation recipe of Song et al., with three changes that adapt it to a Stable Diffusion teacher.^[1] First, the loss is computed entirely in the VAE latent space: the frozen VAE encoder is run once per training image to obtain its latent. Second, the teacher signal at timestep t is computed by integrating the augmented PF-ODE across a skip of k timesteps with a numerical solver such as DDIM, producing the "target" latent $z_{t-k}$. The student is then trained so that $f_\theta(z_t, t)$ and $f_{\theta^-}(z_{t-k}, t-k)$ agree, where $f_{\theta^-}$ is an exponential-moving-average copy of the student weights.^[1] Third, the guidance scale w is sampled uniformly from a range (the paper uses $[w_{\text{min}}, w_{\text{max}}] = [2, 14]$) and the corresponding embedding is added to the time-embedding of the U-Net, mirroring Meng et al.'s guidance distillation.^[1]^[9]
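Two bookkeeping pieces of this loop are easy to show in isolation: which timestep pairs the skipping-step schedule compares, and how the EMA target is refreshed. The sketch below is a simplified illustration, not the authors' training code. The 1000-step teacher schedule, the clamping at timestep 0, the decay value, and the function names `skip_step_pairs` and `ema_update` are all assumptions made for the example.

```python
def skip_step_pairs(num_train_timesteps=1000, k=20):
    """Return (t, t - k) pairs: student input timestep and teacher-solved target timestep."""
    # Solver timesteps spaced k apart: k-1, 2k-1, ..., num_train_timesteps-1.
    solver_timesteps = [i * k - 1 for i in range(1, num_train_timesteps // k + 1)]
    # The target sits k timesteps earlier, clamped so it never goes below 0.
    return [(t, max(t - k, 0)) for t in solver_timesteps]


def ema_update(target_params, online_params, decay=0.95):
    """Move the target (theta^-) parameters a small step toward the online student."""
    return [decay * tp + (1.0 - decay) * op
            for tp, op in zip(target_params, online_params)]


pairs = skip_step_pairs()            # 50 pairs, from (19, 0) up to (999, 979)
new_target = ema_update([0.0], [1.0])
```

With k = 20 the student only ever sees 50 distinct starting timesteps instead of 1000, which is where the shorter training horizon comes from.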

A separate consistency boundary condition, expressed through the coefficients $c_{\text{skip}}(t)$ and $c_{\text{out}}(t)$, is used to parameterize the U-Net output so that at $t = 0$ the network returns the input unchanged; the Diffusers LCMScheduler exposes a timestep_scaling parameter (default 10.0) that controls how these boundary functions are scaled.^[11] This scaling factor is one of the few hyperparameters that practitioners may need to tune when distilling LCMs onto new base models.
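One common way to realize such a boundary condition is sketched below. The functional form and the constant `SIGMA_DATA = 0.5` are assumptions based on the usual consistency-model parameterization; the names `boundary_scalings` and `consistency_output` are hypothetical and this is not the Diffusers source.

```python
SIGMA_DATA = 0.5  # assumed data standard deviation


def boundary_scalings(timestep, timestep_scaling=10.0):
    """Return (c_skip, c_out) for a given timestep."""
    s = timestep * timestep_scaling
    c_skip = SIGMA_DATA**2 / (s**2 + SIGMA_DATA**2)
    c_out = s / (s**2 + SIGMA_DATA**2) ** 0.5
    return c_skip, c_out


def consistency_output(z_t, predicted_clean, timestep):
    """f(z_t, t) = c_skip(t) * z_t + c_out(t) * predicted_clean."""
    c_skip, c_out = boundary_scalings(timestep)
    return c_skip * z_t + c_out * predicted_clean


at_zero = boundary_scalings(0)   # (1.0, 0.0): the input passes through unchanged
at_one = boundary_scalings(1)    # c_skip is already close to 0, c_out close to 1
```

Under these assumptions a larger `timestep_scaling` pushes `c_skip` toward zero faster for every nonzero timestep, so the output is dominated by the network prediction everywhere except at the boundary.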

Inference with LCMScheduler

At inference time, an LCM checkpoint is paired with the LCMScheduler provided by the Diffusers library.^[11] The scheduler implements Algorithm 3 of the LCM paper and supports both one-step and multistep sampling between one and eight steps; the Diffusers documentation recommends two-to-eight steps for general use, with four steps as the default in the LatentConsistencyModelPipeline.__call__ signature.^[12]

A subtle but important detail concerns guidance scale. The original LCM paper defines guidance scale so that w = 0 corresponds to "no guidance," whereas Stable Diffusion's traditional convention sets guidance_scale = 1 to mean no guidance.^[12] The Diffusers implementation follows the Stable Diffusion convention: when a user passes, for example, guidance_scale = 8.5 in the pipeline call, Diffusers internally subtracts one before computing the embedding fed to the U-Net.^[12] In practice LCM-LoRA model cards advise setting guidance_scale between 1.0 and 2.0 (Stable-Diffusion convention) because larger values, which work well for the full teacher, tend to oversaturate the few-step student.^[13]
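The off-by-one between the two conventions is easiest to see on scalar stand-ins for the conditional and unconditional noise predictions. This is a minimal sketch with hypothetical function names; real pipelines operate on tensors, but the arithmetic is the same.

```python
def guided_eps_lcm(eps_cond, eps_uncond, w):
    """LCM-paper convention: w = 0 means no guidance."""
    return (1.0 + w) * eps_cond - w * eps_uncond


def guided_eps_sd(eps_cond, eps_uncond, guidance_scale):
    """Stable Diffusion convention: guidance_scale = 1 means no guidance."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def embedding_scale(guidance_scale):
    """Value to embed for the U-Net when the caller uses the Stable Diffusion convention."""
    return guidance_scale - 1.0


eps_cond, eps_uncond = 0.3, -0.2   # arbitrary scalar stand-ins
same = abs(guided_eps_lcm(eps_cond, eps_uncond, embedding_scale(8.5))
           - guided_eps_sd(eps_cond, eps_uncond, 8.5)) < 1e-12
```

The two formulas describe the same guided prediction once `w = guidance_scale - 1`, so a value quoted in one convention has to be shifted by one before it is compared with a value quoted in the other.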

Memory and compute envelope

The original LCM was distilled in approximately 32 A100 GPU-hours on a 768x768 dataset.^[1] By contrast, full-model fine-tunes of SDXL typically cost hundreds to thousands of GPU-hours, and progressive-distillation runs in the original Salimans-Ho paper consume tens of thousands of TPU-hours.^[8] At inference, the speed-up is similarly dramatic: HuggingFace's official LCM-LoRA blog post benchmarks an SDXL LCM-LoRA at four steps against the standard SDXL at twenty-five steps and reports speed-ups of roughly five-fold on an RTX 3090 (1.4s versus 7s), ten-fold on an M1 Mac (about 6s versus 60s), and three-fold on an A100 80GB (1.2s versus 3.8s) for a single 1024x1024 image.^[4]
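The quoted speed-ups follow directly from the per-image timings. The short calculation below reproduces them from the figures in the paragraph above; the M1 numbers are the rounded values given there.

```python
# (device, LCM-LoRA seconds at 4 steps, standard SDXL seconds at 25 steps)
timings = [
    ("RTX 3090", 1.4, 7.0),
    ("M1 Mac", 6.0, 60.0),
    ("A100 80GB", 1.2, 3.8),
]

# Speed-up is baseline time divided by accelerated time.
speedups = {device: baseline / fast for device, fast, baseline in timings}

for device, ratio in speedups.items():
    print(f"{device}: {ratio:.1f}x")
```

The step count drops by a factor of 25 / 4 = 6.25, yet the measured ratios range from about three to ten, so wall-clock gains evidently depend on the hardware and on costs that do not scale with the number of denoising steps.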

LCM-LoRA

One month after the original LCM paper, the same group released a follow-up technical report titled "LCM-LoRA: A Universal Stable-Diffusion Acceleration Module" (arXiv:2311.05556, 9 November 2023).^[3] The author list expanded to include several core HuggingFace Diffusers maintainers: Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinario Passos, Longbo Huang, Jian Li, and Hang Zhao.^[3] The collaboration was made public alongside an official HuggingFace blog post that shipped LCM-LoRA support in the Diffusers library on the same day.^[4]

Methodology

LCM-LoRA observes that the distilled LCM U-Net differs from its teacher Stable Diffusion U-Net only in a small subspace of weight directions. Rather than fine-tune all parameters, the LCM training loss is applied to a low-rank LoRA adapter that is injected into the cross-attention and self-attention layers of the U-Net.^[3] The resulting adapters have roughly 67M to 197M parameters depending on the base model (about 197M for SDXL, compared to the 2.6B parameters of the SDXL U-Net), drastically reducing the memory footprint of distillation. After training, the LoRA weights can be (i) merged into a base checkpoint for deployment, or (ii) loaded at inference and switched on or off like any other LoRA, which means the same adapter accelerates many derived checkpoints.^[3]^[4]
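The saving comes from replacing a dense update to a weight matrix with the product of two thin matrices. The arithmetic below is a sketch for a single layer; the layer width and the rank are hypothetical values chosen for illustration, not figures from the report.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by a rank-`rank` adapter on a d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


def dense_params(d_in, d_out):
    """Parameters in the full weight matrix that the adapter modifies."""
    return d_in * d_out


d_in = d_out = 1280   # hypothetical attention projection width
rank = 64             # hypothetical LoRA rank

added = lora_params(d_in, d_out, rank)      # 163,840 trainable parameters
full = dense_params(d_in, d_out)            # 1,638,400 frozen parameters
ratio = added / full                        # the adapter is one tenth of the layer
```

Summed over every adapted attention projection, ratios of this kind are what keep the adapter an order of magnitude smaller than the U-Net it accelerates.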

Universality claim

The empirical centerpiece of the report is that a single LCM-LoRA distilled against the base SDXL checkpoint generalizes to community fine-tunes of SDXL without retraining.^[3]^[4] In other words, a user who downloads the official latent-consistency/lcm-lora-sdxl adapter (197M parameters) can apply it on top of any SDXL Dreambooth or community fine-tune and obtain four-step inference without performing any distillation themselves.^[13] The paper interprets this as evidence that LCM-LoRA functions as a "plug-in neural PF-ODE solver," generalizing across the family of nearby diffusion models in much the same way that training-free numerical solvers such as DDIM and DPM-Solver can be applied to any of them.^[3]

Released checkpoints

The release shipped three official LoRA adapters and two full-parameter LCM checkpoints, all hosted on the HuggingFace Hub under the latent-consistency organization.^[4]

Checkpoint	Base model	Parameters	Recommended steps	Release
`latent-consistency/lcm-lora-sdv1-5`	Stable Diffusion v1.5	~67M	4-8	November 2023^[4]
`latent-consistency/lcm-lora-sdxl`	Stable Diffusion XL base 1.0	~197M	4-8	November 2023^[13]
`latent-consistency/lcm-lora-ssd-1b`	SSD-1B (segmind/SSD-1B)	~129M	4-8	November 2023^[4]
`latent-consistency/lcm-sdxl`	SDXL base 1.0 (full UNet)	~2.6B	1-8	November 2023^[4]
`latent-consistency/lcm-ssd-1b`	SSD-1B (full UNet)	~1.3B	1-8	November 2023^[4]

Reported guidance for lcm-lora-sdxl: between two and eight inference steps; guidance_scale of 1.0 (effectively disabled) or in the range 1.0-2.0; usable in image-to-image, inpainting, ControlNet, and T2I-adapter pipelines via the same LCMScheduler.^[13]

HuggingFace Diffusers Integration

The reference implementation of LCM is the LCMScheduler class in the diffusers library, contributed by Simian Luo (luosiallen), Daniel Gu (dg845), and others, and merged into the Diffusers main branch shortly after the paper release in October 2023.^[11]^[12] The scheduler exposes the multistep algorithm of Section 4.3 of the paper and supports the "Skipping-Step" timestep schedule that LCM uses internally.^[11]

Diffusers exposes two LCM-specific pipelines, LatentConsistencyModelPipeline for text-to-image and LatentConsistencyModelImg2ImgPipeline for image-to-image generation, both of which inherit the standard mixins for LoRA loading, CLIP text encoding, textual inversion, and IP-Adapter integration.^[12] Sample usage from the official documentation:

```python
from diffusers import DiffusionPipeline, LCMScheduler
import torch

# Load the base SDXL checkpoint (fp16 weight variant).
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0",
                                         variant="fp16")
# Attach the LCM-LoRA adapter and swap in the LCM scheduler.
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.to(device="cuda", dtype=torch.float16)

# Four steps; guidance_scale=1 effectively disables classifier-free guidance.
images = pipe(prompt="close-up photograph of an old man in the rain at night",
              num_inference_steps=4, guidance_scale=1).images
```

Beyond the core pipelines, LCM-LoRA was designed to be stackable with other LoRAs. The HuggingFace integration allows a user to load an LCM-LoRA together with a style or character LoRA, weighting each adapter independently, so the same model can produce four-step inference in a fine-tuned artistic style without retraining.^[4] Tools and pipelines such as ControlNet, image-to-image, and inpainting can all be driven by an LCM-LoRA-equipped UNet, which is what makes LCMs broadly useful in downstream workflows.^[4]
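Conceptually, stacking adapters amounts to adding several independently weighted low-rank updates to the same frozen base weight. The toy sketch below shows that arithmetic on 2x2 matrices. It deliberately avoids the Diffusers and PEFT APIs, and every matrix, weight, and function name in it is hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_delta(B, A, scale=1.0):
    """Low-rank update scale * (B @ A)."""
    return [[scale * v for v in row] for row in matmul(B, A)]


def combine(W, deltas_and_weights):
    """Effective weight W + sum_i weight_i * delta_i, leaving W itself untouched."""
    out = [row[:] for row in W]
    for delta, weight in deltas_and_weights:
        for i, row in enumerate(delta):
            for j, v in enumerate(row):
                out[i][j] += weight * v
    return out


W = [[1.0, 0.0], [0.0, 1.0]]                               # frozen base weight
lcm_delta = lora_delta([[1.0], [0.0]], [[0.5, 0.5]])       # rank-1 "acceleration" adapter
style_delta = lora_delta([[0.0], [1.0]], [[1.0, -1.0]])    # rank-1 "style" adapter

# Acceleration adapter at full strength, style adapter at 0.8.
W_eff = combine(W, [(lcm_delta, 1.0), (style_delta, 0.8)])
```

Because the base weight is never modified, either adapter can be re-weighted or dropped without reloading the model, which is the property the stacking workflow relies on.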

LCMScheduler also exposes optional zero-SNR rescaling (rescale_betas_zero_snr), v-prediction, and configurable original_inference_steps (default 50) to support arbitrary spacing of distillation and inference timesteps. The default timestep_scaling=10.0 controls the boundary-condition coefficients; per the Diffusers docstring, "the approximation error at the default of 10.0 is already pretty small."^[11]
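To make the role of original_inference_steps concrete, the sketch below derives an inference schedule by picking evenly spaced entries from the coarse grid that the distillation used. It is a plain-Python illustration that assumes 1000 training timesteps and floor-based index selection; the function name is hypothetical and the scheduler source remains the authority on the exact values.

```python
def lcm_inference_timesteps(num_inference_steps, original_inference_steps=50,
                            num_train_timesteps=1000):
    """Pick evenly spaced timesteps from the k-spaced distillation grid."""
    k = num_train_timesteps // original_inference_steps
    # Distillation grid in descending order: 999, 979, ..., 19 for the defaults.
    grid = [i * k - 1 for i in range(1, original_inference_steps + 1)][::-1]
    # Evenly spaced indices into the grid, rounded down.
    indices = [(i * len(grid)) // num_inference_steps for i in range(num_inference_steps)]
    return [grid[i] for i in indices]


four_step = lcm_inference_timesteps(4)   # [999, 759, 499, 259]
one_step = lcm_inference_timesteps(1)    # [999]
```

Every selected timestep lies on the grid the student was trained on, which is why the inference schedule is tied to the distillation spacing rather than chosen freely.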

Real-Time Generation Tools

The release of LCM-LoRA coincided with a wave of "real-time" generative-image products that exploit sub-second inference to allow users to draw, type, or move a camera and see the AI output update fluidly. The two most widely cited examples are Krea AI and fal.ai.

Krea AI

Krea AI launched a real-time canvas in November 2023 in which users sketch shapes and the system regenerates a photorealistic image roughly every frame. Krea's real-time stack uses LCM-LoRA on top of SDXL and Stable Diffusion variants and advertises sub-50ms updates on its real-time canvas.^[14] Founders Diego Rodriguez and Victor Perez positioned the product as a way to bridge "rough sketch and polished visual in milliseconds," a use-case made viable only by the LCM-class step counts.^[14] VentureBeat covered the release on 9 November 2023 under the headline "Realtime generative AI art is here thanks to LCM-LoRA," explicitly tying the product wave to the Luo et al. paper.^[15]

fal.ai

The serverless inference provider fal.ai released a real-time LCM endpoint shortly after the LCM-LoRA paper and documented end-to-end latencies of about 150ms for Stable Diffusion v1.5 LCM and 650ms for SDXL LCM at four inference steps, of which roughly 120ms is GPU compute.^[16] fal's blog notes that "real-time" applications such as drawing-canvas integrations and live-camera image generation are now feasible because LCM inference is dominated by network round-trip rather than GPU work.^[16] The provider collaborated with the collaborative-whiteboard tool tldraw on a real-time generation feature and helped popularize the "LCM-Painter" interactive space on HuggingFace.^[16]

Community tooling

LCM and LCM-LoRA checkpoints are also supported by community front-ends including ComfyUI and Automatic1111's Stable-Diffusion-WebUI, and by C# / ONNX-runtime ports for CPU and edge inference.^[10] The combination of low VRAM requirements (the LCM-LoRA adapter for SD v1.5 has only about 67M parameters) and few-step inference made LCM popular for self-hosted and on-device workloads where Stability AI's full SDXL Turbo was unavailable due to its noncommercial-research license.

Successors and Extensions

The latent-consistency idea was extended along two axes after 2023: refinements that address LCM's quality limitations, and ports of the method to modalities beyond still images.

Phased Consistency Models (PCM; Wang et al., NeurIPS 2024) identify three design flaws in LCM, including its sensitivity to classifier-free guidance and instability at very low step counts, and partition the ODE trajectory into multiple sub-trajectories ("phases") each governed by its own consistency constraint. PCM reports improvements over LCM across the one-to-sixteen-step range while remaining competitive with dedicated one-step methods.^[25] OpenAI's "Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models" (Lu and Song, 2024) revisits the consistency objective in continuous time; the resulting sCM models scale to 1.5 billion parameters and reach two-step ImageNet 512x512 samples within about 10 percent of the best diffusion FID, narrowing the quality gap that motivated GAN-augmented competitors to LCM.^[26]
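The phasing idea can be pictured as a lookup from a timestep to the boundary of the sub-trajectory that contains it. The sketch below is only a schematic of that bookkeeping: equal-width phases, the half-open interval convention, and the function names are assumptions, not details taken from the PCM paper.

```python
def phase_edges(num_train_timesteps=1000, num_phases=4):
    """Edges of equal-width phases, e.g. [0, 250, 500, 750, 1000]."""
    return [round(i * num_train_timesteps / num_phases) for i in range(num_phases + 1)]


def phase_target(t, edges):
    """Lower edge of the phase containing t: where that phase's consistency map ends."""
    for lo, hi in zip(edges, edges[1:]):
        if lo < t <= hi:
            return lo
    raise ValueError("timestep lies outside the trajectory")


edges = phase_edges()
target_late = phase_target(999, edges)    # 750: consistency is enforced only back to 750
target_early = phase_target(100, edges)   # 0: the last phase ends at the clean latent
```

A single-phase setup (`num_phases=1`) recovers the LCM case in which every timestep is mapped all the way to the clean latent.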

LCM beyond images

Because the latent-consistency recipe only assumes a latent diffusion teacher and an augmented probability-flow ODE, it transfers to video and motion generation. VideoLCM (Wang et al., 2023) applies consistency distillation to a latent video diffusion model for roughly four-step clip synthesis.^[27] AnimateLCM (Wang et al., SIGGRAPH Asia 2024) decouples the consistency distillation of image priors from motion priors, allowing four-step personalized video generation without personalized video data.^[28] MotionLCM (Dai et al., ECCV 2024) brings the same latent-consistency objective to real-time controllable human-motion generation.^[29] These extensions reuse the core LCM machinery, the w-augmented PF-ODE and the EMA consistency target, in higher-dimensional latent spaces.

Comparison with Other Acceleration Methods

LCM occupies a distinct point in the design space of fast diffusion samplers. The principal contemporaries are numerical ODE solvers, progressive distillation, guidance distillation, and Adversarial Diffusion Distillation. The table below summarizes their characteristics; entries are drawn from each method's reference paper or official documentation.

Method	Type	Reported steps	Requires training	Reference
DDIM	Numerical PF-ODE solver	20-50	No	Song et al. 2020^[7]
DPM-Solver / DPM-Solver++	Higher-order multistep solver	10-20	No	Lu et al. 2022^[6]
Progressive Distillation	Student-teacher halving	4-8	Yes (multi-stage)	Salimans & Ho 2022^[8]
Guidance Distillation	Folds CFG into one pass	unchanged	Yes	Meng et al. 2022^[9]
Consistency Models	Pixel-space self-consistency	1-4	Yes	Song et al. 2023^[2]
Latent Consistency Models	Latent-space self-consistency	1-4	Yes (32 A100h)	Luo et al. 2023^[1]
LCM-LoRA	LCM as low-rank adapter	2-8	Yes (LoRA only)	Luo et al. 2023^[3]
Adversarial Diffusion Distillation (SDXL Turbo)	GAN + score distillation	1-4	Yes	Sauer et al. 2023^[17]

LCM versus DPM-Solver

DPM-Solver is a training-free higher-order solver: it can be dropped onto an existing Stable Diffusion checkpoint and reduces inference from ~50 to ~10-20 steps without modifying the model.^[6] LCM cannot match this convenience because it requires a distillation pass, but in exchange it goes substantially further down the step count (one to four versus ten to twenty) and exposes a small LoRA that can be cached and shared. The LCM-LoRA paper explicitly frames LCM-LoRA as "a plug-in neural PF-ODE solver" that complements numerical solvers when the lowest possible step count is required.^[3]

LCM versus Progressive Distillation

Progressive Distillation (Salimans and Ho, 2022) reduces step counts via a chain of halving stages, each of which trains a new student that takes half as many steps as its teacher. Reaching four steps from a 1000-step teacher therefore requires multiple distillation rounds, each comparable in cost to a fine-tuning run.^[8] LCM, by contrast, distils a few-step student in a single training stage by enforcing a global self-consistency property, which is what enables the 32 GPU-hour training budget reported by Luo et al.^[1]
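The number of halving stages grows with the logarithm of the step reduction. The short calculation below counts them for the 1000-step example; rounding odd step counts up when halving is an assumption of this sketch.

```python
import math


def halving_rounds(teacher_steps, student_steps):
    """Count distillation stages needed if each stage halves the sampling steps."""
    rounds, steps = 0, teacher_steps
    while steps > student_steps:
        steps = math.ceil(steps / 2)   # each stage trains a student with half the steps
        rounds += 1
    return rounds


rounds_needed = halving_rounds(1000, 4)   # 8 stages: 1000 -> 500 -> ... -> 8 -> 4
```

Eight sequential training stages, against the single stage used by LCM, is the contrast the paragraph above draws.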

LCM versus Adversarial Diffusion Distillation

Stability AI released SDXL Turbo on 28 November 2023, three weeks after LCM-LoRA. SDXL Turbo is trained with Adversarial Diffusion Distillation (ADD), proposed by Sauer, Lorenz, Blattmann, and Rombach, which combines score distillation against a teacher diffusion model with a GAN-style discriminator that scores the realism of the student's one-step outputs.^[17]^[20] The ADD paper reports that the method "outperforms existing few-step methods (GANs, Latent Consistency Models) in a single step" and matches SDXL quality in four steps.^[20] Stability AI's announcement reports that SDXL Turbo at one step is preferred by human evaluators over LCM-XL at four steps on prompt-following and image-quality measures.^[17] The comparison is not entirely apples-to-apples (LCM-XL is a free-research artifact while SDXL Turbo was released under a non-commercial license at first), but it illustrates the broader pattern that GAN-style discriminator losses can buy additional one-step quality at the price of more complex training infrastructure.^[17]

Subsequent work has continued to push this frontier. ByteDance's SDXL-Lightning (Lin et al., 2024) combines progressive distillation with adversarial training and ships checkpoints for one-, two-, four-, and eight-step inference as both LoRA and full UNet weights.^[18] Hyper-SD (Ren et al., NeurIPS 2024) introduces Trajectory Segmented Consistency Distillation, which performs consistency distillation within predefined timestep segments rather than over the whole trajectory, adds human-feedback learning and score distillation, and provides a single unified LoRA that supports inference at every step count from one to eight.^[21] Academic follow-ups such as Multistep Consistency Models (Heek et al., 2024) unify consistency models and TRACT to interpolate between the few-step LCM regime and the many-step diffusion regime.^[19]

LCM versus rectified-flow few-step methods

A parallel line of acceleration work starts from rectified flow rather than from consistency distillation. InstaFlow (Liu et al., ICLR 2024) applies the rectified-flow "reflow" procedure to straighten the noise-to-image trajectories of Stable Diffusion before distilling a one-step student, reporting an FID of 23.3 on MS COCO 2017-5k for its one-step generator with inference around 0.1 second on an A100.^[22] Stability AI later generalized adversarial distillation to the latent space with Latent Adversarial Diffusion Distillation (LADD; Sauer et al., 2024), which it applied to Stable Diffusion 3 to obtain SD3-Turbo, matching state-of-the-art text-to-image quality in roughly four unguided steps.^[23] Black Forest Labs' FLUX.1 [schnell], released in August 2024, is a 12-billion-parameter rectified-flow transformer trained with latent adversarial diffusion distillation for one-to-four-step generation.^[24] These flow-matching and rectified-flow models reach comparable step counts to LCM but rely on adversarial or trajectory-straightening objectives rather than the self-consistency loss, and they are typically trained on much larger backbones than the SD1.5 and SDXL models that LCM-LoRA targets.

Applications

LCMs and LCM-LoRA are used wherever diffusion-model latency was previously a bottleneck:

Real-time creative tools: live canvas drawing in Krea AI, collaborative whiteboards (tldraw + fal.ai), and webcam-driven generation systems use LCMs to render at interactive frame rates.^[14]^[16]
Mobile and on-device inference: the small LCM-LoRA-SDv1.5 adapter (~67M parameters) plus a quantized SD1.5 backbone fits on modern phones, and CoreML / ONNX ports exist for both Apple Silicon and Windows; the LCM repository advertises CPU support for SDv1.5.^[10]
Batch image generation pipelines: applications that previously paid for ~25 steps per image (asset generation for games, e-commerce product imagery, design ideation) can drop step counts by a factor of five with minimal quality loss.^[4]
ControlNet and image-to-image workflows: because LCM-LoRA is compatible with ControlNet, img2img and inpainting pipelines, it is widely used to accelerate guided generation, not only unconstrained text-to-image.^[4]^[13]
Style mixing: LCM-LoRA can be stacked with style LoRAs, allowing a single base model to produce four-step inference in many distinct artistic styles by swapping the style adapter.^[4]

Limitations

LCMs come with several documented caveats:

Quality ceiling at one step: at a single inference step LCM outputs are "approximate shapes, no discernible features," per the HuggingFace LCM-LoRA blog post; four to six steps is the practical sweet spot.^[4]
Guidance scale must be small: large classifier-free guidance scales (e.g., 7.5 in Stable Diffusion conventions) cause oversaturation and cartoonish artifacts. Recommended values are 1.0 to 2.0 in the Diffusers convention.^[4]^[13]
Distillation drift on small datasets: Latent Consistency Fine-tuning on niche datasets sometimes drifts away from the base model's style. The community has reported best results using LCM-LoRA on top of large general-purpose base checkpoints rather than on heavily specialized fine-tunes.^[10]
Quality compared to ADD: at one step, Stability AI's adversarial-distilled SDXL Turbo is generally preferred over LCM-XL by human raters, suggesting that the pure consistency-distillation loss leaves quality on the table relative to GAN-augmented training.^[17]
License coupling: LCMs derived from a base model inherit that base model's license. SDXL LCM and LCM-LoRA adapters are distributed under openrail++, which permits commercial use; users distilling LCMs from licensed checkpoints must respect the upstream terms.^[13]
Few-step models still depend on a VAE decoder: although the U-Net is called only a handful of times, the VAE decoder still has to be evaluated for every image. For SDXL this is a non-trivial fraction of total wallclock time at low step counts, which is why real-time products often use distilled or tiled VAEs alongside the LCM U-Net.^[4]

Related Work

Consistency Models (Song et al., 2023) is the pixel-space predecessor of LCM and introduces the self-consistency property and the EMA-distillation loop adopted by Luo et al.^[2]
DDIM (Song et al., 2020) and DPM-Solver / DPM-Solver++ (Lu et al., 2022) are the numerical PF-ODE solvers that LCM-LoRA explicitly compares itself against.^[6]^[7]
Progressive Distillation (Salimans and Ho, 2022) is the antecedent few-step distillation method; LCM achieves similar step counts in a single training stage.^[8]
Guidance Distillation (Meng et al., 2022) supplies the w-conditioning trick that LCM uses to fold classifier-free guidance into a single forward pass.^[9]
SDXL Turbo (Sauer et al., 2023) is Stability AI's adversarial-distilled few-step model and the most direct competitor to LCM-XL on the speed-versus-quality Pareto frontier.^[17]
SDXL-Lightning (Lin et al., 2024) and Hyper-SD (Ren et al., 2024) combine progressive distillation, adversarial training, and consistency objectives to push few-step inference further; both build on the LCM line.^[18]^[21]
Rectified flow and flow matching few-step methods (InstaFlow, LADD / SD3-Turbo, FLUX.1 schnell) reach comparable step counts to LCM using trajectory-straightening or latent adversarial objectives.^[22]^[23]^[24]
Phased Consistency Models (Wang et al., 2024) directly address documented LCM limitations such as guidance sensitivity and low-step instability.^[25]
LoRA (Hu et al., 2021) is the parameter-efficient adapter framework that LCM-LoRA reuses to make the distilled module shippable and stackable.^[3]
Stable Diffusion and Stable Diffusion XL are the teacher models for the most widely deployed LCM checkpoints.^[5]

References

Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao, "Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference", arXiv (Tsinghua IIIS), 2023-10-06. https://arxiv.org/abs/2310.04378. Accessed 2026-05-31.
Yang Song, Prafulla Dhariwal, Mark Chen, Ilya Sutskever, "Consistency Models", arXiv (OpenAI), 2023-03-02. https://arxiv.org/abs/2303.01469. Accessed 2026-05-31.
Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinario Passos, Longbo Huang, Jian Li, Hang Zhao, "LCM-LoRA: A Universal Stable-Diffusion Acceleration Module", arXiv, 2023-11-09. https://arxiv.org/abs/2311.05556. Accessed 2026-05-31.
Suraj Patil, Apolinario Passos, Patrick von Platen, Pedro Cuenca, "SDXL in 4 steps with Latent Consistency LoRAs", HuggingFace Blog, 2023-11-09. https://huggingface.co/blog/lcm_lora. Accessed 2026-05-31.
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Bjoern Ommer, "High-Resolution Image Synthesis with Latent Diffusion Models", arXiv (CVPR 2022), 2021-12-20. https://arxiv.org/abs/2112.10752. Accessed 2026-05-31.
Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, Jun Zhu, "DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps", arXiv (NeurIPS 2022 Oral), 2022-06-02. https://arxiv.org/abs/2206.00927. Accessed 2026-05-31.
Jiaming Song, Chenlin Meng, Stefano Ermon, "Denoising Diffusion Implicit Models", arXiv, 2020-10-06. https://arxiv.org/abs/2010.02502. Accessed 2026-05-31.
Tim Salimans, Jonathan Ho, "Progressive Distillation for Fast Sampling of Diffusion Models", arXiv (ICLR 2022), 2022-02-01. https://arxiv.org/abs/2202.00512. Accessed 2026-05-31.
Chenlin Meng, Robin Rombach, Ruiqi Gao, Diederik P. Kingma, Stefano Ermon, Jonathan Ho, Tim Salimans, "On Distillation of Guided Diffusion Models", arXiv (CVPR 2023), 2022-10-06. https://arxiv.org/abs/2210.03142. Accessed 2026-05-31.
Simian Luo et al., "luosiallen/latent-consistency-model: Latent Consistency Models repository", GitHub, 2023-10-19 (initial release). https://github.com/luosiallen/latent-consistency-model. Accessed 2026-05-31.
HuggingFace, "Latent Consistency Model Multistep Scheduler (LCMScheduler)", Diffusers documentation, 2023-11. https://huggingface.co/docs/diffusers/api/schedulers/lcm. Accessed 2026-05-31.
HuggingFace, "Latent Consistency Models pipeline (LatentConsistencyModelPipeline)", Diffusers documentation, 2023-11. https://huggingface.co/docs/diffusers/api/pipelines/latent_consistency_models. Accessed 2026-05-31.
latent-consistency, "lcm-lora-sdxl model card", HuggingFace Hub, 2023-11-09. https://huggingface.co/latent-consistency/lcm-lora-sdxl. Accessed 2026-05-31.
Krea AI, "Realtime AI Drawing with Krea AI", Krea AI product blog, 2023-11. https://www.krea.ai/. Accessed 2026-05-31.
Sharon Goldman, "Realtime generative AI art is here thanks to LCM-LoRA", VentureBeat, 2023-11-09. https://venturebeat.com/ai/realtime-generative-ai-art-is-here-thanks-to-lcm-lora/. Accessed 2026-05-31.
Burkay Gur, "Building Applications with Real-Time Stable Diffusion APIs", fal.ai blog, 2023-11. https://blog.fal.ai/building-applications-with-real-time-stable-diffusion-apis/. Accessed 2026-05-31.
Stability AI, "Introducing SDXL Turbo: A Real-Time Text-to-Image Generation Model", Stability AI news, 2023-11-28. https://stability.ai/news/stability-ai-sdxl-turbo. Accessed 2026-05-31.
Shanchuan Lin, Anran Wang, Xiao Yang, "SDXL-Lightning: Progressive Adversarial Diffusion Distillation", arXiv (ByteDance), 2024-02-21. https://arxiv.org/abs/2402.13929. Accessed 2026-05-31.
Jonathan Heek, Emiel Hoogeboom, Tim Salimans, "Multistep Consistency Models", arXiv, 2024-03-11. https://arxiv.org/abs/2403.06807. Accessed 2026-05-31.
Axel Sauer, Dominik Lorenz, Andreas Blattmann, Robin Rombach, "Adversarial Diffusion Distillation", arXiv (Stability AI), 2023-11-28. https://arxiv.org/abs/2311.17042. Accessed 2026-05-31.
Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao, "Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis", arXiv (NeurIPS 2024), 2024-04-21. https://arxiv.org/abs/2404.13686. Accessed 2026-05-31.
Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, Qiang Liu, "InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation", arXiv (ICLR 2024), 2023-09-12. https://arxiv.org/abs/2309.06380. Accessed 2026-05-31.
Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach, "Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation", arXiv (SIGGRAPH Asia 2024), 2024-03-18. https://arxiv.org/abs/2403.12015. Accessed 2026-05-31.
Black Forest Labs, "FLUX.1 [schnell] model card", HuggingFace Hub, 2024-08-01. https://huggingface.co/black-forest-labs/FLUX.1-schnell. Accessed 2026-05-31.
Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang, "Phased Consistency Model", arXiv (NeurIPS 2024), 2024-05-28. https://arxiv.org/abs/2405.18407. Accessed 2026-05-31.
Cheng Lu, Yang Song, "Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models", arXiv (OpenAI), 2024-10-14. https://arxiv.org/abs/2410.11081. Accessed 2026-05-31.
Xiang Wang, Shiwei Zhang, Han Zhang, Yu Liu, Yingya Zhang, Changxin Gao, Nong Sang, "VideoLCM: Video Latent Consistency Model", arXiv, 2023-12-14. https://arxiv.org/abs/2312.09109. Accessed 2026-05-31.
Fu-Yun Wang, Zhaoyang Huang, Xiaoyu Shi, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, "AnimateLCM: Computation-Efficient Personalized Style Video Generation without Personalized Video Data", arXiv (SIGGRAPH Asia 2024), 2024-02-01. https://arxiv.org/abs/2402.00769. Accessed 2026-05-31.
Wenxun Dai, Ling-Hao Chen, Jingbo Wang, Jinpeng Liu, Bo Dai, Yansong Tang, "MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model", arXiv (ECCV 2024), 2024-04-30. https://arxiv.org/abs/2404.19759. Accessed 2026-05-31.
