Rectified Flow
Last reviewed
May 20, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 5,031 words
Rectified Flow is a generative modeling framework that learns a transport ordinary differential equation (ODE) between two probability distributions. It regresses a velocity field along straight-line interpolations between paired samples, then iteratively "rectifies" the flow so that trajectories become progressively straighter and the resulting model can generate samples in very few, ideally one, Euler steps. It was introduced by Xingchao Liu, Chengyue Gong, and Qiang Liu at the University of Texas at Austin in the paper "Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow", first posted to arXiv on 7 September 2022 and selected as a spotlight at the 2023 International Conference on Learning Representations.[^1] The same construction was independently proposed under the names Flow Matching by Lipman et al. and Stochastic Interpolants by Albergo and Vanden-Eijnden, with all three lines of work converging at ICLR 2023.[^2][^3] Rectified Flow with linear interpolation has since become the standard training objective for large-scale generative models including Stable Diffusion 3, the FLUX.1 family, and numerous downstream text-to-image, text-to-video, and audio systems.[^4][^5]
The dominant paradigm for high-quality image generation in the early 2020s was the denoising diffusion probabilistic model and its score-based relatives. These methods define a stochastic forward process that gradually corrupts data into Gaussian noise, then train a neural network to invert the process by predicting noise or scores. Sampling typically requires dozens to hundreds of network evaluations because the learned reverse process follows a curved trajectory that must be discretized finely to remain accurate. Although techniques such as DDIM solvers and progressive distillation reduced step counts, generation in one or two steps remained out of reach without significant quality loss.[^1]
A parallel research thread on continuous normalizing flows traced its lineage to Chen et al.'s 2018 work on neural ordinary differential equations. Continuous normalizing flows transport a base density to a data density by integrating a learned velocity field, but classical training required expensive simulation of the ODE and computation of the divergence of the velocity field during training. Researchers searched for a "simulation-free" objective that could regress the velocity field directly from data, in analogy to the score-matching objective used for diffusion models.[^2]
Rectified Flow, Flow Matching, and Stochastic Interpolants emerged within months of each other in late 2022 and early 2023, each providing a simulation-free regression objective for a velocity field that transports a noise distribution into a data distribution. Although the three groups used different language (transport ODEs, conditional probability paths, stochastic interpolants), the linear interpolation special case is identical across all three and forms the recipe that subsequent industrial systems adopted.[^1][^2][^3]
Let pi_0 and pi_1 be two probability distributions on R^d. In a generative setting pi_0 is typically a tractable noise distribution such as a standard Gaussian, and pi_1 is the empirical data distribution. Rectified Flow seeks a time-indexed velocity field v(z, t) on R^d such that solving the ordinary differential equation dZ_t = v(Z_t, t) dt with initial condition Z_0 ~ pi_0 produces Z_1 distributed according to pi_1.[^1]
The key constructive step is to choose a joint coupling between pi_0 and pi_1 (the trivial independent coupling is the default choice) and to define a deterministic interpolating path between each pair of samples (X_0, X_1) drawn from that coupling:
X_t = t * X_1 + (1 - t) * X_0, for t in [0, 1].
This is the straight-line interpolation between the endpoints. By construction, X_0 has the marginal pi_0 and X_1 has the marginal pi_1. The pointwise time derivative of the interpolation is the constant displacement vector X_1 - X_0, independent of t.[^1]
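A minimal sketch of this interpolation in plain Python follows. The function name `interpolate` and the sample values are illustrative assumptions, not taken from the original paper.

```python
def interpolate(x0, x1, t):
    """Return (x_t, target) for the straight-line path X_t = t*X_1 + (1-t)*X_0."""
    x_t = [t * b + (1.0 - t) * a for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]  # d/dt X_t, the same for every t
    return x_t, target


x0 = [0.0, 2.0]   # a sample standing in for noise (X_0)
x1 = [4.0, -2.0]  # a sample standing in for data (X_1)

start, _ = interpolate(x0, x1, 0.0)       # the path starts at x0
end, _ = interpolate(x0, x1, 1.0)         # and ends at x1
mid, velocity = interpolate(x0, x1, 0.5)  # halfway point and displacement
```

The returned `target` does not depend on `t`, which is the constant-displacement property described above.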
The straight-line interpolation is not by itself a causal ODE because, at any intermediate time t, the displacement X_1 - X_0 depends on the unknown endpoint X_1. To obtain a causal velocity field that depends only on the current state Z_t and time t, Liu, Gong, and Liu project the interpolation onto the space of causal ODEs by solving a least-squares regression problem.[^1]
The Rectified Flow training objective is
min_v ∫_0^1 E[ || (X_1 - X_0) - v(X_t, t) ||^2 ] dt,
where the expectation is over the joint coupling and the integral over t corresponds to drawing t uniformly on [0, 1]. The minimizer of this loss is the conditional expectation
v*(z, t) = E[ X_1 - X_0 | X_t = z ],
the average displacement direction over all interpolating lines that pass through the point z at time t.[^1] The objective is a standard nonlinear least-squares problem and is convex in the predicted vector field. It introduces no auxiliary parameters and is therefore trivially compatible with standard supervised training pipelines. A neural network v_theta(z, t) is trained to approximate v*(z, t).[^1]
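The regression can be checked on a toy case where v* is known. The sketch below assumes a one-dimensional setting with pi_0 = N(0, 1), pi_1 = N(m, s^2), and the independent coupling. Because (X_t, X_1 - X_0) is then jointly Gaussian, the conditional expectation is linear in z and has the closed form coded in `closed_form_velocity`. That formula is derived for this toy case and is not quoted from the paper. A straight-line least-squares fit at one fixed time stands in for the neural network v_theta.

```python
import random


def closed_form_velocity(z, t, m, s):
    """v*(z, t) for pi_0 = N(0, 1), pi_1 = N(m, s^2), independent coupling."""
    k = (t * s * s - (1.0 - t)) / (t * t * s * s + (1.0 - t) ** 2)
    return m + k * (z - t * m)


def fit_velocity_at(t, m, s, n=50_000, seed=0):
    """Least-squares fit of v(z, t) = slope * z + intercept at one fixed time t."""
    rng = random.Random(seed)
    zs, targets = [], []
    for _ in range(n):
        x0 = rng.gauss(0.0, 1.0)            # noise sample
        x1 = rng.gauss(m, s)                # data sample, independent of x0
        zs.append(t * x1 + (1.0 - t) * x0)  # point on the straight line
        targets.append(x1 - x0)             # regression target
    z_mean = sum(zs) / n
    y_mean = sum(targets) / n
    cov = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, targets))
    var = sum((z - z_mean) ** 2 for z in zs)
    slope = cov / var
    return slope, y_mean - slope * z_mean


slope, intercept = fit_velocity_at(t=0.7, m=3.0, s=0.5)
fitted = slope * 1.0 + intercept                     # fitted velocity at z = 1
exact = closed_form_velocity(1.0, 0.7, 3.0, 0.5)     # closed-form velocity at z = 1
```

With enough samples, `fitted` and `exact` agree up to Monte Carlo error, which illustrates that minimizing the squared loss recovers the conditional mean of the displacement.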
A central theoretical result in the original paper is that the ODE driven by v* preserves the marginals of the interpolation. In particular, if Z_0 is initialized from pi_0 and Z_t is propagated by dZ_t = v*(Z_t, t) dt, then the law of Z_t is identical to the law of X_t for every t in [0, 1], even though the underlying trajectories generally differ. The endpoint Z_1 therefore has marginal pi_1.[^1]
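The marginal-preserving property can be observed numerically in a toy setting. The sketch below assumes pi_0 = N(0, 1) and pi_1 = N(m, s^2) in one dimension with the independent coupling, uses a closed-form v* derived for that case (an assumption of this example, not a formula from the paper), and integrates the ODE with forward Euler.

```python
import random
import statistics


def velocity(z, t, m, s):
    """Closed-form v*(z, t) for pi_0 = N(0, 1), pi_1 = N(m, s^2), independent coupling."""
    k = (t * s * s - (1.0 - t)) / (t * t * s * s + (1.0 - t) ** 2)
    return m + k * (z - t * m)


def euler_flow(z0, m, s, steps=200):
    """Integrate dZ_t = v*(Z_t, t) dt from t = 0 to t = 1 with forward Euler."""
    z, h = z0, 1.0 / steps
    for i in range(steps):
        z += h * velocity(z, i * h, m, s)
    return z


rng = random.Random(0)
m, s = 3.0, 0.5
z1 = [euler_flow(rng.gauss(0.0, 1.0), m, s) for _ in range(4_000)]
mean_z1 = statistics.fmean(z1)  # expected to be close to m
std_z1 = statistics.pstdev(z1)  # expected to be close to s, up to Euler error
```

The endpoint statistics match the target distribution even though no individual trajectory follows one of the straight interpolation lines.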
The trajectories of the ODE governed by v* are not in general straight: although each individual interpolation X_t between endpoints (X_0, X_1) is a straight line, the conditional expectation v*(z, t) at a crossing point z averages over several different displacement vectors and can therefore curve. Rectified Flow addresses this curvature with an iterative procedure called reflow.[^1]
Given a trained 1-Rectified Flow, sample noise Z_0 ~ pi_0, solve the ODE to obtain Z_1, and record the new coupling (Z_0, Z_1). Then train a new Rectified Flow on this generated coupling instead of on the original independent coupling. The resulting model is called 2-Rectified Flow. The construction can be iterated to produce k-Rectified Flow for any positive integer k.[^1]
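The procedure can be sketched on the same kind of toy problem. The example assumes a one-dimensional Gaussian setting with a closed-form velocity in place of a trained network, and it measures, for each coupling, how much the regression target X_1 - X_0 still varies among lines that pass through the same point at t = 0.5. The helper `residual_variance` is a hypothetical diagnostic introduced for this illustration.

```python
import random


def velocity(z, t, m, s):
    """Closed-form v*(z, t) for pi_0 = N(0, 1), pi_1 = N(m, s^2), independent coupling."""
    k = (t * s * s - (1.0 - t)) / (t * t * s * s + (1.0 - t) ** 2)
    return m + k * (z - t * m)


def euler_flow(z0, m, s, steps=200):
    """Solve the 1-Rectified Flow ODE from t = 0 to t = 1 with forward Euler."""
    z, h = z0, 1.0 / steps
    for i in range(steps):
        z += h * velocity(z, i * h, m, s)
    return z


def residual_variance(pairs, t):
    """Mean squared residual of the best linear fit of (x1 - x0) on x_t at time t."""
    zs = [t * x1 + (1.0 - t) * x0 for x0, x1 in pairs]
    ys = [x1 - x0 for x0, x1 in pairs]
    n = len(pairs)
    zm, ym = sum(zs) / n, sum(ys) / n
    cov = sum((z - zm) * (y - ym) for z, y in zip(zs, ys)) / n
    var_z = sum((z - zm) ** 2 for z in zs) / n
    var_y = sum((y - ym) ** 2 for y in ys) / n
    return var_y - cov * cov / var_z


rng = random.Random(0)
m, s = 3.0, 0.5
noise = [rng.gauss(0.0, 1.0) for _ in range(2_000)]
independent = [(z0, rng.gauss(m, s)) for z0 in noise]    # original coupling
reflowed = [(z0, euler_flow(z0, m, s)) for z0 in noise]  # recorded (Z_0, Z_1) pairs

spread_before = residual_variance(independent, t=0.5)
spread_after = residual_variance(reflowed, t=0.5)
```

In this toy case `spread_after` is numerically zero: once the coupling is deterministic and non-crossing, the regression target at each point is a single direction, which is what training the 2-Rectified Flow exploits.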
The key theorem of the paper states that this reflow operation never increases the convex transport cost between the source and target marginals, and that under mild regularity conditions the trajectories of the k-Rectified Flow become progressively straighter as k increases.[^6] Straighter trajectories are easier to integrate with coarse step sizes, so each reflow step trades a small amount of statistical quality for a sharply lower number of function evaluations at inference time. In the limit of perfectly straight trajectories, the Euler discretization with a single step is exact, enabling true one-step generation.[^1][^6]
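The effect of straightness on one-step sampling can be illustrated with a closed-form example. The sketch assumes (X_0, X_1) is jointly Gaussian in one dimension with X_0 ~ N(0, 1), X_1 ~ N(m, s^2), and covariance c, for which the optimal velocity is linear in z. Setting c = 0 gives the independent coupling (1-Rectified Flow), and c = s gives the deterministic coupling X_1 = m + s * X_0, which is what the exact flow produces in this toy case. Both the formula and the choice c = s are assumptions of the example.

```python
def gaussian_velocity(z, t, m, s, c):
    """v*(z, t) for jointly Gaussian X_0 ~ N(0, 1), X_1 ~ N(m, s^2), Cov(X_0, X_1) = c."""
    var_t = t * t * s * s + (1.0 - t) ** 2 + 2.0 * t * (1.0 - t) * c
    cov_t = t * s * s - (1.0 - t) + (1.0 - 2.0 * t) * c
    return m + cov_t / var_t * (z - t * m)


def one_step_euler(z0, m, s, c):
    """A single Euler step of size 1 from t = 0: Z_1 is approximated by Z_0 + v(Z_0, 0)."""
    return z0 + gaussian_velocity(z0, 0.0, m, s, c)


m, s = 3.0, 0.5
samples = [-1.0, 0.0, 2.0]
one_flow = [one_step_euler(z, m, s, c=0.0) for z in samples]  # independent coupling
two_flow = [one_step_euler(z, m, s, c=s) for z in samples]    # reflowed coupling
exact = [m + s * z for z in samples]                          # exact ODE endpoint
```

With the independent coupling, a single Euler step sends every sample to the target mean, so all diversity is lost. With the reflowed coupling, the single step reproduces the exact endpoint because the trajectories are straight.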
The intuition behind reflow is geometric. Consider two source samples z and z' and two target samples y and y' such that the line segments [z, y] and [z', y'] cross at an interior point. The 1-Rectified Flow learned from the original coupling will, at the crossing point, predict an average of two different displacement directions, producing a curved overall trajectory. After one reflow iteration, however, the new coupling pairs each source with the endpoint that the previous flow actually reaches; the resulting set of trajectories has fewer crossings, so the conditional expectation v* is closer to a deterministic single direction at any point in space, and the new trajectories are correspondingly straighter. The paper formalizes this argument with a contraction inequality on the integrated transport cost.[^1][^6]
A practical concern is that the reflow procedure involves sampling noise-image pairs from the previous-generation model and then training a new network on that synthetic data. Sampling errors and finite minibatch noise compound across generations, which limits the practical depth of the iteration in high-dimensional settings such as text-to-image synthesis. Most large industrial systems apply at most one or two reflow iterations and then move directly to a final distillation step that compresses the straightened trajectory into a one-step or few-step student.[^9]
Three theoretical properties make the Rectified Flow framework appealing. First, the regression objective is pointwise convex in the velocity prediction, so the empirical risk inherits the convexity of squared error, simplifying optimization analysis.[^1] Second, the rectification operation provably reduces convex transport costs: for every convex cost function c, the start-end pairs produced by Rectified Flow have no larger expected c-cost than the original coupling, and recursive application drives the pairing toward a c-optimal transport coupling at the fixed point.[^6] Third, the special case of straight-line interpolation lies at the intersection of Rectified Flow with optimal transport: in one dimension, strictly convex transport costs are uniquely minimized by monotone (straight-line) couplings, and the reflow operation can be viewed as a marginal-preserving approach to optimal transport in higher dimensions.[^6]
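The cost-reduction property can be illustrated by Monte Carlo in one dimension. The sketch assumes pi_0 = N(0, 1) and pi_1 = N(m, s^2), and uses the map z0 -> m + s * z0 as the endpoint of the exact flow, which holds in this Gaussian toy case but is an assumption of the example. Three convex costs of the displacement X_1 - X_0 are compared under both couplings.

```python
import random


def expected_cost(pairs, cost):
    """Monte Carlo estimate of E[c(X_1 - X_0)] over a list of (x0, x1) pairs."""
    return sum(cost(x1 - x0) for x0, x1 in pairs) / len(pairs)


rng = random.Random(0)
m, s = 1.0, 0.5
noise = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
independent = [(z0, rng.gauss(m, s)) for z0 in noise]  # original coupling
rectified = [(z0, m + s * z0) for z0 in noise]         # start-end pairs of the flow

convex_costs = {
    "abs": abs,
    "square": lambda d: d * d,
    "quartic": lambda d: d ** 4,
}
# Maps each cost name to (cost under independent coupling, cost under rectified coupling).
report = {
    name: (expected_cost(independent, c), expected_cost(rectified, c))
    for name, c in convex_costs.items()
}
```

Both couplings have the same marginals, yet each convex cost is lower for the rectified pairs. In this example the squared cost falls from about m^2 + s^2 + 1 to about m^2 + (s - 1)^2.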
The original paper presents experimental results on image generation that illustrate the trade-off introduced by reflow. On CIFAR-10, the official implementation reports 1-Rectified Flow reaching a Fréchet Inception Distance (FID) of 2.58 with a high number of Euler steps, while 2-Rectified Flow reaches 3.36 and 3-Rectified Flow reaches 3.96.[^7] Quality measured by FID degrades slightly with successive reflow operations because each rectification involves a sample-and-retrain cycle that compounds approximation error.[^7]
The advantage appears at low step counts. The same official benchmarks report that the distilled 2-Rectified Flow generates CIFAR-10 images with a single Euler step at FID 4.85, outperforming the strongest one-step competitor available at the time of the ICLR 2023 submission, a U-Net based TDPM baseline with FID 8.91.[^7] Subsequent work has reduced one-step CIFAR-10 FID further. A 2-Rectified Flow with improved training has been reported at FID 3.07 in one step, and one-step variants based on the Rectified Flow framework now reach FID 3.08 to 3.36 across recent benchmarks.[^8]
The qualitative picture is consistent across modalities: as k grows, trajectories become straighter, the few-step or one-step generation FID improves, and the high-step FID degrades modestly. In practice, most large-scale industrial systems train a 1-Rectified Flow with carefully designed noise samplers and then use either a small number of higher-order ODE steps (Stable Diffusion 3) or an explicit distillation stage (InstaFlow, FLUX.1 schnell) rather than a long chain of reflow iterations.[^4][^5][^9]
A useful way to summarize the empirical observations is that the Rectified Flow framework converts a fundamentally curved generative problem into one that can be tackled in stages. The first stage trains a high-quality but multi-step model directly on the linear interpolant, recovering essentially the same sample quality as a diffusion model trained on the same data. The second stage (one reflow) trades a small amount of multi-step quality for a substantial gain in low-step quality. The third stage (distillation against the reflowed teacher) further compresses the model into a few-step or one-step student. Each stage uses the same underlying network architecture and the same regression-style loss, so the engineering complexity is concentrated in the data and sampling pipelines rather than in the loss function itself.[^7][^9]
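The final distillation stage can be sketched as a regression of a one-step student onto the outputs of a multi-step teacher. The example below is a toy under stated assumptions: a one-dimensional Gaussian problem, a closed-form velocity standing in for the trained teacher, and a linear student, which suffices here only because the teacher's map is affine. Production systems use neural networks and often different distillation losses.

```python
import random


def velocity(z, t, m, s):
    """Closed-form v*(z, t) for pi_0 = N(0, 1), pi_1 = N(m, s^2), independent coupling."""
    k = (t * s * s - (1.0 - t)) / (t * t * s * s + (1.0 - t) ** 2)
    return m + k * (z - t * m)


def teacher(z0, m, s, steps=200):
    """Multi-step teacher: forward Euler on dZ_t = v*(Z_t, t) dt from t = 0 to 1."""
    z, h = z0, 1.0 / steps
    for i in range(steps):
        z += h * velocity(z, i * h, m, s)
    return z


def distill(noise, outputs):
    """Fit a one-step student Z_1 = Z_0 + (a * Z_0 + b) by ordinary least squares."""
    disp = [z1 - z0 for z0, z1 in zip(noise, outputs)]
    n = len(noise)
    zm, dm = sum(noise) / n, sum(disp) / n
    a = (sum((z - zm) * (d - dm) for z, d in zip(noise, disp))
         / sum((z - zm) ** 2 for z in noise))
    return a, dm - a * zm


rng = random.Random(0)
m, s = 3.0, 0.5
noise = [rng.gauss(0.0, 1.0) for _ in range(1_000)]
outputs = [teacher(z0, m, s) for z0 in noise]  # 200 velocity calls per sample
a, b = distill(noise, outputs)


def student(z0):
    """One evaluation instead of 200."""
    return z0 + a * z0 + b
```

The student reproduces the teacher's endpoints with a single evaluation. In higher dimensions the same regression is only as accurate as the teacher's trajectories are straight, which is why reflow usually precedes distillation.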
InstaFlow, presented at ICLR 2024, was the first application of Rectified Flow's reflow procedure to a large-scale text-to-image generator and the first one-step distillation of a Stable Diffusion model that approached the quality of the multi-step teacher. The paper "InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation" was authored by Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, and Qiang Liu and posted to arXiv on 12 September 2023.[^9]
The pipeline starts from a pretrained Stable Diffusion checkpoint, treated as a 1-Rectified Flow with noisy trajectories. The authors then apply one reflow step by sampling noise-image pairs from the teacher, training a new velocity model on the straightened paths, and finally performing one-step distillation against the reflowed velocity field. InstaFlow reports a single-step FID of 23.3 on MS COCO 2017-5k, compared with 37.2 for the best progressive distillation baseline; a 1.7-billion-parameter version reduces single-step FID to 22.4 and reaches 13.1 on MS COCO 2014-30k in 0.09 seconds per image.[^9] Training the largest InstaFlow variant required 199 A100 GPU days, a fraction of the cost of a from-scratch text-to-image model.[^9] InstaFlow established the empirical claim that one Euler step is enough for high-quality text-to-image generation provided the underlying probability flow is straightened first by reflow.[^9]
Flow Matching, introduced by Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le in arXiv:2210.02747 (posted 6 October 2022, presented at ICLR 2023), is the contemporaneous formulation of essentially the same idea from a different theoretical entry point.[^2] Flow Matching defines a family of time-indexed conditional probability paths and trains a velocity field to match the conditional vector field that pushes a sample along the path. The Conditional Flow Matching objective Lipman et al. derive is a simulation-free regression problem in the same family as the Rectified Flow loss.[^2]
The two frameworks coincide exactly in the linear-interpolation special case. When Flow Matching is instantiated with Optimal Transport conditional paths between a Gaussian and a data sample, the conditional vector field reduces to the displacement X_1 - X_0, and the resulting objective is the velocity-field regression of Rectified Flow.[^2] The community now uses the names interchangeably for the linear case, with "rectified flow training" often emphasizing the reflow procedure and "flow matching training" emphasizing the conditional probability path viewpoint. Modern textbook treatments and surveys describe both as instances of a broader simulation-free continuous-time generative framework that also subsumes diffusion models and Stochastic Interpolants.[^3][^10]
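The coincidence can be checked directly. The sketch below writes the optimal-transport conditional path of Lipman et al. with its small terminal width `sigma_min` and compares it with the Rectified Flow interpolation. The function names are illustrative, and x0 denotes the noise sample and x1 the data sample, as elsewhere in this article.

```python
import math


def flow_matching_ot(x0, x1, t, sigma_min=0.0):
    """Point on the OT conditional path and its regression target."""
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    target = x1 - (1.0 - sigma_min) * x0
    return x_t, target


def rectified_flow(x0, x1, t):
    """Straight-line interpolation and its constant displacement target."""
    return t * x1 + (1.0 - t) * x0, x1 - x0


x0, x1, t = 0.3, 2.0, 0.4
fm_point, fm_target = flow_matching_ot(x0, x1, t, sigma_min=0.0)
rf_point, rf_target = rectified_flow(x0, x1, t)
fm_point_eps, fm_target_eps = flow_matching_ot(x0, x1, t, sigma_min=1e-4)
```

With `sigma_min = 0` the two functions return the same point and target. A small positive `sigma_min` perturbs both slightly.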
Stochastic Interpolants, proposed by Michael S. Albergo and Eric Vanden-Eijnden in arXiv:2303.08797 (posted 15 March 2023), provides a third equivalent viewpoint. The framework builds a stochastic interpolant between two arbitrary densities, with the linear interpolation X_t = t X_1 + (1-t) X_0 as a special case, and shows that the resulting time-dependent density satisfies both transport and Fokker-Planck equations.[^3] This perspective recovers Rectified Flow and Flow Matching as the deterministic-limit, linear-interpolation instances of a larger family of stochastic generative models.[^3]
Diffusion models, including DDPM and score-based variants, train a network to denoise noisy data along a stochastic corruption process. The probability flow ODE associated with a diffusion model is a particular curved transport between the noise prior and the data distribution: noise is added gradually with a variance schedule, and the reverse-time ODE traces the matched probability flow.[^1]
Rectified Flow with linear interpolation can be regarded as the limiting case of a diffusion-style transport in which all of the noise has already been added at time t = 0 and the path between noise and data is a straight line in ambient space rather than a curved Ornstein-Uhlenbeck bridge.[^1] The training objectives are formally similar: both regress an attribute of the data given the noisy state. The chief practical differences are:
| Feature | Diffusion / Score-Matching | Rectified Flow / Flow Matching (linear) |
|---|---|---|
| Forward process | Stochastic, with curved probability path | Deterministic linear interpolation |
| Training target | Noise epsilon, score grad log p, or v-prediction | Constant displacement X_1 - X_0 |
| Sampling trajectory | Curved; requires many steps | Straight after reflow; supports few-step or one-step inference |
| Typical step count | 25-1000 | 1-50, often 1-4 with distillation |
| Theoretical link to optimal transport | Indirect | Direct via reflow theorem |
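The training-target row can be made concrete. Under the convention used in this article (X_0 is noise, X_1 is data), a velocity value at a known state and time determines both endpoints of the straight line, so a velocity prediction can be converted into a data estimate or a noise estimate by simple algebra. The function name below is illustrative.

```python
import math


def endpoints_from_velocity(x_t, v, t):
    """Invert X_t = t*X_1 + (1-t)*X_0 and v = X_1 - X_0 for the two endpoints."""
    x1 = x_t + (1.0 - t) * v  # data estimate
    x0 = x_t - t * v          # noise estimate
    return x1, x0


noise, data, t = 0.5, 2.0, 0.25
x_t = t * data + (1.0 - t) * noise  # state on the straight line
v = data - noise                    # true displacement
data_hat, noise_hat = endpoints_from_velocity(x_t, v, t)
```

The same identities are what allow a network trained with one parameterization to be read out in another.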
Esser et al. argue in the Stable Diffusion 3 paper that Rectified Flow with linear paths offers both better theoretical properties and conceptual simplicity, although the formulation had not been used in large-scale image synthesis prior to their work.[^4] Their systematic comparison of 61 training formulations, including epsilon-prediction with linear schedule (the DDPM default), v-prediction with cosine schedule, Karras et al.'s EDM formulation, and several Rectified Flow variants, found that rectified flow with logit-normal timestep sampling was the most sample-efficient and remained competitive with epsilon-prediction at 25 or more steps.[^4]
Rectified Flow has become the dominant training objective for state-of-the-art text-to-image and text-to-video models released in 2024 and 2025. The most prominent examples follow.
Stability AI's Stable Diffusion 3 was announced on 22 February 2024 and the 2-billion-parameter "Medium" checkpoint was publicly released for local generation on 12 June 2024.[^11][^12] The accompanying paper, "Scaling Rectified Flow Transformers for High-Resolution Image Synthesis" by Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Muller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach, was posted to arXiv on 5 March 2024.[^4] The paper explicitly bases its training loss on Rectified Flow with linear-interpolation paths between data and noise and contributes two main innovations: a new transformer architecture called MM-DiT that uses separate parameter blocks for text and image tokens connected by joint attention, and a logit-normal sampling distribution over timesteps that places more weight on intermediate values where the velocity is hardest to predict.[^4] Models are released across a range of sizes from 800 million to 8 billion parameters; the 8B variant is reported to outperform DALL-E 3 on the GenEval benchmark and to achieve a 52 percent visual-quality win rate and a 53 percent prompt-following win rate against DALL-E 3 on the Parti-Prompts benchmark.[^4] Stable Diffusion 3.5, released on 22 October 2024 in Large and Large Turbo variants, retains the rectified flow transformer architecture.[^13]
The Stable Diffusion 3 paper is notable for conducting a controlled large-scale study comparing rectified flow against six other generative-modeling formulations under matched training conditions. The authors evaluated 61 different formulations including epsilon-prediction with linear and cosine noise schedules (the standard DDPM/LDM family), v-prediction variants, the EDM formulation of Karras et al., the LDM-Linear formulation used in earlier Stable Diffusion releases, and several rectified-flow timestep samplers. Their global ranking placed rectified flow with logit-normal timestep sampling (parameters location m = 0.0 and scale s = 1.0) as the consistently best performer across model sizes and sampling-step budgets.[^4] The paper also reports a scaling study covering models with transformer depths from 15 to 38 layers and shows that validation loss correlates strongly with downstream prompt-following metrics. Larger models also degrade less when the step budget is cut: reducing inference from 50 to 5 sampling steps costs the largest model (depth 38, approximately 8 billion parameters) 2.71 percent in relative CLIP score, compared with 4.30 percent for the depth-15 model.[^4] This scaling-favorable property is one of the central empirical justifications for using rectified flow training at industrial scale.[^4]
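A logit-normal timestep sampler of the kind described here can be sketched in a few lines: draw a Gaussian variable with the stated location and scale and pass it through a sigmoid. The function name and the diagnostic `middle_fraction` are illustrative, and the sketch makes no claim about how the Stable Diffusion 3 code base implements the sampler.

```python
import math
import random


def sample_logit_normal(rng, loc=0.0, scale=1.0):
    """Draw t in (0, 1) as sigmoid(u) with u ~ N(loc, scale^2)."""
    u = rng.gauss(loc, scale)
    return 1.0 / (1.0 + math.exp(-u))


rng = random.Random(0)
ts = [sample_logit_normal(rng) for _ in range(20_000)]

# Share of draws in the middle half of [0, 1]; a uniform sampler would give 0.5.
middle_fraction = sum(0.25 < t < 0.75 for t in ts) / len(ts)
mean_t = sum(ts) / len(ts)
```

With location 0 and scale 1, noticeably more than half of the draws fall in the middle half of the interval, which is the intended emphasis on intermediate timesteps.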
FLUX.1, released on 1 August 2024 by Black Forest Labs (a startup founded by several of the original Stable Diffusion authors and funded by a $31 million seed round led by Andreessen Horowitz), is described in its official announcement as a 12-billion-parameter rectified flow transformer trained in the latent space of an image encoder.[^5] Three variants were released initially: FLUX.1 pro (a commercial API model), FLUX.1 dev (an open-weight, non-commercial guidance-distilled version), and FLUX.1 schnell (an Apache 2.0 licensed variant trained with latent adversarial diffusion distillation for one-to-four-step inference).[^5] Black Forest Labs reports a novel timestep sampling distribution for rectified flow training that improves training and preserves the few-step sampling advantages of the framework.[^5]
Rectified flow training has been adopted across modalities beyond text-to-image. Frieren is a video-to-audio model based on rectified flow matching that regresses the conditional vector field from noise to a spectrogram latent along straight paths and can produce decent audio with a single sampling step after reflow and distillation.[^14] FlashAudio applies rectified flows to text-to-audio generation and reports 400x faster than real-time inference on a single consumer GPU.[^15] In structural biology, ReQFlow constructs a rectified flow on the quaternion representation of protein backbone rotations using spherical linear interpolation, and rectified flow has been used for structure-based drug design with state-of-the-art molecular affinity results.[^16][^17] Open-source video models such as Genmo's Mochi 1 (announced October 2024) and PixArt-style 4K image transformers also use flow-matching-style training closely aligned with the Rectified Flow recipe.[^18][^19]
The adoption pattern across these systems is consistent. A typical industrial pipeline starts from a backbone architecture (a U-Net or, more often after 2024, a diffusion transformer), trains a 1-Rectified Flow using a logit-normal or similar non-uniform timestep distribution, optionally performs one reflow iteration to straighten trajectories, and concludes with a guidance or adversarial distillation stage that produces a one-to-four-step student model for production deployment. The pretrained multi-step model is typically retained for offline batch generation and high-quality outputs, while the distilled student handles interactive use. This pattern was first articulated end-to-end by InstaFlow and has been reproduced with minor variations by many of the rectified-flow-based image and video systems released since.[^4][^5][^9]
The practical significance of Rectified Flow lies in the combination of three properties that no prior generative model achieved simultaneously: a simple, simulation-free, regression-based training loss; a theoretically motivated path toward few-step or one-step sampling; and a direct connection to optimal transport.[^1][^6]
Few-step sampling translates directly into lower latency and energy cost at inference time. A diffusion model that requires 50 network evaluations to produce a single 1024 x 1024 image becomes far more practical as an interactive tool when a rectified-flow-trained model reaches comparable quality in 4 to 8 evaluations, roughly a 6x to 12x reduction in network calls. This has been most visible in consumer-facing FLUX.1 schnell and InstaFlow demonstrations, which generate 1024-pixel images in under one second on a consumer GPU.[^5][^9]
The straight-trajectory property is also valuable for editing and inversion. Because the rectified flow ODE is well-approximated by its linearization between endpoints, the same model can be used in both forward and inverse directions with high fidelity, enabling applications such as text-guided image editing, controlled style transfer, and unsupervised image-to-image translation. The original 2022 paper devotes substantial space to such transfer applications, demonstrating that the framework works for both noise-to-data generation and data-to-data transport.[^1]
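Inversion amounts to integrating the same ODE backward in time. The sketch below runs a round trip on a one-dimensional Gaussian toy problem with a closed-form velocity, which is an assumption of the example rather than a trained model, using forward Euler in both directions.

```python
def velocity(z, t, m, s):
    """Closed-form v*(z, t) for pi_0 = N(0, 1), pi_1 = N(m, s^2), independent coupling."""
    k = (t * s * s - (1.0 - t)) / (t * t * s * s + (1.0 - t) ** 2)
    return m + k * (z - t * m)


def integrate(z, m, s, t_start, t_end, steps=500):
    """Euler integration from t_start to t_end; t_end < t_start runs the ODE in reverse."""
    h = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        z += h * velocity(z, t, m, s)
        t += h
    return z


m, s = 3.0, 0.5
z0 = 0.8
z1 = integrate(z0, m, s, 0.0, 1.0)       # noise -> data
z0_back = integrate(z1, m, s, 1.0, 0.0)  # data -> noise (inversion)
round_trip_error = abs(z0_back - z0)
```

The round-trip error comes only from discretization and shrinks as `steps` grows. Straighter trajectories reduce the number of steps needed for a given accuracy.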
Finally, the simplicity of the objective lowers the barrier to research adoption. Because training a rectified flow does not require simulating the ODE during the forward pass, computing a divergence, or scheduling a noise variance, practitioners can take existing diffusion infrastructure and replace the loss function with a one-line change. This contributed to the rapid spread of the method across labs and modalities through 2024 and 2025.[^4][^5][^10]
Beyond image generation, the same training recipe has been applied to a wide range of structured prediction problems where a continuous transport between a noise distribution and a target distribution is natural. In audio, FlashAudio and several follow-up systems use rectified flow to produce 24 kHz speech and music from text prompts, achieving real-time or faster-than-real-time inference rates that were previously the domain of GAN-based vocoders.[^15] In biology, ReQFlow uses rectified flow with spherical linear interpolation on the quaternion manifold to generate protein backbone conformations, and a separate line of work applies rectified flow to structure-based drug design with state-of-the-art binding affinity scores.[^16][^17] These domain-specific instantiations adapt the linear interpolation step to respect the geometry of the data, but retain the core regression-and-reflow recipe.
Another important practical benefit is the smooth interaction with classifier-free guidance, a technique developed for diffusion models in which the network is trained both with and without conditioning information and the inference-time velocity is taken as a weighted extrapolation between the two predictions. Because rectified flow's velocity field is a direct prediction of a displacement vector, classifier-free guidance integrates cleanly: the conditional and unconditional velocities can be combined linearly and the resulting extrapolated direction substituted into the ODE solver without any change to the underlying model. Stable Diffusion 3 and FLUX.1 both rely on this property to produce highly prompt-faithful outputs.[^4][^5]
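The linear combination described above can be written as a short helper. The sketch uses one common convention, in which a guidance scale of 1 returns the conditional velocity and larger values extrapolate beyond it. Implementations differ in how they define the scale, and the function name is illustrative.

```python
def guided_velocity(v_cond, v_uncond, scale):
    """Extrapolate from the unconditional velocity toward the conditional one."""
    return [u + scale * (c - u) for c, u in zip(v_cond, v_uncond)]


v_cond = [1.0, 0.0]    # velocity predicted with the text prompt
v_uncond = [0.5, 0.5]  # velocity predicted with an empty prompt

no_guidance = guided_velocity(v_cond, v_uncond, scale=1.0)    # equals v_cond
unconditional = guided_velocity(v_cond, v_uncond, scale=0.0)  # equals v_uncond
amplified = guided_velocity(v_cond, v_uncond, scale=2.0)      # pushed past v_cond
```

The guided vector is then passed to the ODE solver in place of the raw model output at each step.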
Despite its empirical success, Rectified Flow has well-documented limitations that have prompted active research.
A first limitation is that the basic recipe produces a flow whose trajectories are only approximately straight. A single 1-Rectified Flow trained on text-to-image data has trajectories that curve near the data manifold, and faithful generation at extremely low step counts (one or two Euler steps) requires either reflow, distillation, or both. Each reflow iteration consumes substantial compute (sampling and retraining the model), and each distillation stage can reduce sample diversity.[^9]
A second limitation is that the straightness theorem is asymptotic. In high-dimensional practice, only one or two reflow iterations are computationally tractable, and the residual curvature is not negligible. Several 2024 and 2025 papers, including "Improving the Training of Rectified Flows" and "Straightness Is Not Your Need in Rectified Flow", argue that the strict pursuit of straight trajectories sacrifices sample quality and propose alternative objectives that target first-order ODE consistency rather than pure straightness.[^8][^20]
A third limitation is the dependence on a well-chosen noise distribution and time-step sampling. The Stable Diffusion 3 paper documents that the naive uniform time-step distribution used in the original Rectified Flow paper underperforms a logit-normal distribution biased toward intermediate timesteps by a substantial margin on text-to-image benchmarks, and that the optimal sampler depends on the modality, model size, and resolution.[^4]
A fourth concern is theoretical. Although the rectification operation provably reduces convex transport costs, it is not guaranteed to converge to the optimal transport coupling between pi_0 and pi_1 in finite iterations, and the empirical fixed point is sensitive to the inductive bias of the velocity-network parameterization.[^6] Research on the exact relationship between rectified flows and optimal transport continues to refine the theoretical understanding.[^10]
Finally, although Rectified Flow training is conceptually simpler than diffusion training, the highest-quality systems combine it with elaborate engineering choices including latent encoders, multimodal text encoders, classifier-free guidance, advanced positional encodings, and distillation pipelines. The simplicity of the loss is real, but the overall complexity of a state-of-the-art text-to-image model is not substantially reduced.[^4][^5]
A subtler critique concerns evaluation. Most of the impressive few-step generation numbers reported in the literature rely on FID and on prompt-following benchmarks that are themselves imperfect measures of perceptual quality. Some recent work argues that low-step rectified flow samples, although close in distribution to the multi-step samples, can exhibit characteristic artifacts (over-smoothed textures, mode collapse on rare prompts) that FID does not capture. This has motivated additional evaluation pipelines based on human preference ratings, vision-language model judgments, and benchmark suites such as GenEval and Parti-Prompts. The Stable Diffusion 3 paper provides one of the most comprehensive applications of these complementary evaluation methods to a rectified-flow-trained system.[^4]
The framework also inherits broader limitations of generative image and video models. Rectified flow training does not on its own address dataset bias, copyright concerns surrounding training data, the generation of harmful or misleading content, or the energy cost of training large transformers. These social and ethical concerns are orthogonal to the choice of generative loss but apply with full force to deployed rectified-flow systems.
| Method | Trajectory shape | Training type | Typical inference steps | Reference |
|---|---|---|---|---|
| DDPM | Curved, stochastic | Score / noise regression | 50-1000 | Ho et al., 2020 |
| Score-based SDE | Curved, stochastic | Score matching | 50-2000 | Song et al., 2021 |
| EDM (Karras et al.) | Curved, deterministic | Denoising regression with sigma-conditioning | 20-50 | Karras et al., 2022 |
| Rectified Flow (linear) | Straight after reflow | Velocity regression on linear interpolant | 1-50 | Liu, Gong, Liu, 2022[^1] |
| Flow Matching (OT path) | Straight | Conditional vector-field regression | 1-50 | Lipman et al., 2022[^2] |
| Stochastic Interpolants | Configurable | Conditional vector-field regression | 1-50 | Albergo, Vanden-Eijnden, 2023[^3] |
| Consistency Models | One-shot | Self-consistency distillation | 1-4 | Song et al., 2023 |
For the linear-interpolation, deterministic case, Rectified Flow, Flow Matching, and Stochastic Interpolants are mathematically equivalent. The distinct names reflect three independent derivations that emphasize different aspects of the same underlying construction.[^1][^2][^3]