Stefano Ermon
Last reviewed
May 31, 2026
Sources
16 citations
Review status
Source-backed
Revision
v1 · 1,940 words
Stefano Ermon is an Italian computer scientist and an associate professor of computer science at Stanford University. His research centers on machine learning and generative models, and he is known for foundational work on score-based generative modeling, an approach that informed the modern theory of diffusion models. With his doctoral student Yang Song he introduced noise-conditional score networks in 2019 and a continuous-time formulation of score-based models in 2021. Ermon also works on probabilistic inference, imitation learning, and artificial intelligence for sustainability and social good. In 2025 he co-founded Inception, a company that builds diffusion-based large language models, and he serves as its chief executive. [1][2][3]
Ermon studied electrical engineering at the University of Padova in Italy, where he earned a Bachelor of Science in 2006 and a Master of Science in 2008, both summa cum laude. [4][5]
He then moved to the United States for doctoral study at Cornell University, where he studied from 2008 to early 2015 and earned a PhD in computer science with a minor in applied mathematics. His dissertation was titled "Decision Making and Inference under Limited Information and High Dimensionality," and his advisors were Carla P. Gomes and Bart Selman. While at Cornell he held a McMullen Fellowship and worked within the group that helped define computational sustainability, a research area that applies computational methods to environmental, economic, and societal problems. [4][6]
Ermon joined Stanford University as an assistant professor in November 2014 and was later promoted to associate professor. He is affiliated with the Stanford Artificial Intelligence Laboratory and is a fellow of the Woods Institute for the Environment. His group studies methods for probabilistic modeling, generative modeling, and decision making, with applications that range from image synthesis to satellite imagery analysis. [4][5]
The contribution most associated with Ermon is score-based generative modeling. In a 2019 paper titled "Generative Modeling by Estimating Gradients of the Data Distribution," Ermon and his student Yang Song proposed learning the gradient of the log probability density of the data with respect to the data itself, a quantity known as the score, rather than the density. Earlier generative methods often modeled the probability density directly or relied on adversarial training. Modeling the density directly requires either a tractable normalization constant, which restricts the model architecture, or approximations to an intractable one, while adversarial training is prone to instability. Song and Ermon argued that the score function sidesteps the normalization constant entirely: the constant does not depend on the data, so it vanishes from the gradient of the log density. [7][8]
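The normalization argument takes one line to verify. In the notation below, which is introduced here for illustration, a model density is written as an unnormalized function divided by its normalizing constant:

$$
\begin{aligned}
p_\theta(\mathbf{x}) &= \frac{\tilde{p}_\theta(\mathbf{x})}{Z_\theta}, \qquad Z_\theta = \int \tilde{p}_\theta(\mathbf{x})\, d\mathbf{x}, \\
\nabla_{\mathbf{x}} \log p_\theta(\mathbf{x}) &= \nabla_{\mathbf{x}} \log \tilde{p}_\theta(\mathbf{x}) - \nabla_{\mathbf{x}} \log Z_\theta = \nabla_{\mathbf{x}} \log \tilde{p}_\theta(\mathbf{x}).
\end{aligned}
$$

Because $Z_\theta$ depends on the model parameters but not on $\mathbf{x}$, its gradient with respect to $\mathbf{x}$ is zero, so a model of the score never needs to compute it.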
The method has two main parts. First, the authors trained a single neural network, the noise conditional score network, to estimate scores across many levels of added Gaussian noise. Perturbing the data at a range of noise scales addresses a practical problem: score estimates are unreliable in regions of low data density, where there are few training examples, and the added noise spreads probability mass into those regions so that the network receives an informative training signal everywhere. Second, for generation they used annealed Langevin dynamics, a sampling procedure that starts at the largest noise level and gradually reduces it, following the estimated score at each level to move random samples toward the data distribution. The paper appeared at the Conference on Neural Information Processing Systems as an oral presentation, and a 2020 follow-up improved the technique and scaled it to higher-resolution images. [7][8]
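A minimal sketch of annealed Langevin dynamics follows, in plain Python. It assumes a toy one-dimensional data distribution (two Gaussians centred at -4 and +4) whose noise-perturbed score has a closed form, so the hypothetical `perturbed_score` function stands in for a trained noise conditional score network. The step-size rule alpha = eps * sigma^2 / sigma_min^2 and the update x + (alpha/2) * score + sqrt(alpha) * z mirror the 2019 paper's sampler; the noise schedule, `eps`, and the Gaussian initialization are illustrative choices, not values from the paper.

```python
import math
import random

# Toy data distribution (an assumption for this sketch): an equal mixture of two
# 1D Gaussians centred at -4 and +4, each with standard deviation 0.5.
MEANS = (-4.0, 4.0)
DATA_STD = 0.5


def perturbed_score(x, sigma):
    """Closed-form score d/dx log q_sigma(x) of the data after adding N(0, sigma^2) noise.

    In a noise conditional score network this would be a learned s(x, sigma).
    """
    var = DATA_STD**2 + sigma**2
    # Log-responsibilities of each component (equal weights and variances).
    logits = [-((x - m) ** 2) / (2.0 * var) for m in MEANS]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    # Responsibility-weighted pull toward each component mean.
    return sum((w / total) * (m - x) / var for w, m in zip(weights, MEANS))


def annealed_langevin(n_samples, sigmas, steps_per_level=100, eps=1e-3, seed=0):
    """Run Langevin dynamics at each noise level, from the largest sigma to the smallest."""
    rng = random.Random(seed)
    sigma_min = sigmas[-1]
    # Initialise from broad Gaussian noise at the largest scale (an illustrative choice).
    xs = [rng.gauss(0.0, sigmas[0]) for _ in range(n_samples)]
    for sigma in sigmas:
        # Step size scales with sigma^2, following the 2019 paper's update rule.
        alpha = eps * (sigma / sigma_min) ** 2
        for _ in range(steps_per_level):
            xs = [
                x + 0.5 * alpha * perturbed_score(x, sigma)
                + math.sqrt(alpha) * rng.gauss(0.0, 1.0)
                for x in xs
            ]
    return xs


# Geometric schedule of 10 noise levels from sigma = 10 down to sigma = 0.1.
SIGMAS = [10.0 * (0.01 ** (i / 9)) for i in range(10)]
samples = annealed_langevin(500, SIGMAS)
```

At the largest noise level the two modes blur into one broad bump, so samples are divided between the modes early on and then sharpened as sigma shrinks. That is the motivation for annealing rather than running Langevin dynamics at a single small noise level, where samples starting in the low-density gap get poor guidance.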
In 2021 Song, Ermon, and several coauthors unified score-based models and diffusion models within a single framework based on stochastic differential equations. The paper, "Score-Based Generative Modeling through Stochastic Differential Equations," described data corruption as a continuous-time forward process that slowly injects noise until the data distribution turns into a simple, known prior distribution. Sample generation runs the corresponding reverse-time process, which removes noise and depends only on the time-varying score of the perturbed data. The authors showed that many earlier methods, including the noise conditional score network and denoising diffusion, are discretizations of this continuous-time framework. They also introduced a predictor-corrector sampler that combines numerical integration with score-based correction steps, and a probability flow ordinary differential equation that has the same marginal distributions as the stochastic process and allows exact likelihood computation. The paper received an Outstanding Paper Award at the International Conference on Learning Representations in 2021. This line of work sits beside the denoising diffusion approach developed elsewhere, and together these efforts established the theoretical basis for diffusion models that later powered systems such as Stable Diffusion. [9][10]
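The reverse-time process and the probability flow ODE can be illustrated in a case where the score is known exactly. The sketch below assumes one-dimensional Gaussian data and a variance-exploding SDE with a geometric noise schedule sigma(t), one of the SDE families considered in the 2021 paper. The hypothetical `score` function is the analytic score of the Gaussian marginal and stands in for a learned score model; the constants are illustrative, and both integrators are plain Euler schemes rather than the predictor-corrector sampler.

```python
import math
import random

# Toy setting (assumptions): 1D data x(0) ~ N(0, DATA_STD^2) and a variance-exploding
# SDE dx = g(t) dw with noise schedule sigma(t) = S_MIN * (S_MAX / S_MIN) ** t.
DATA_STD, S_MIN, S_MAX = 1.0, 0.01, 10.0


def sigma(t):
    return S_MIN * (S_MAX / S_MIN) ** t


def g_squared(t):
    # g(t)^2 = d[sigma(t)^2]/dt for the geometric schedule.
    return 2.0 * sigma(t) ** 2 * math.log(S_MAX / S_MIN)


def marginal_var(t):
    # Variance of x(t) under the forward SDE started from N(0, DATA_STD^2).
    return DATA_STD**2 + sigma(t) ** 2 - S_MIN**2


def score(x, t):
    # Exact score of the Gaussian marginal p_t; a trained network would estimate this.
    return -x / marginal_var(t)


def reverse_sde_sample(n, n_steps=1000, seed=0):
    """Euler-Maruyama for the reverse-time SDE dx = -g^2 * score dt + g dw_bar, t from 1 to 0."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    xs = [rng.gauss(0.0, math.sqrt(marginal_var(1.0))) for _ in range(n)]
    for i in range(n_steps):
        t = 1.0 - i * dt
        g2 = g_squared(t)
        # Moving from t to t - dt turns the drift -g^2 * score into +g^2 * score * dt.
        xs = [x + g2 * score(x, t) * dt + math.sqrt(g2 * dt) * rng.gauss(0.0, 1.0) for x in xs]
    return xs


def probability_flow_ode(x, n_steps=1000):
    """Euler steps for the probability flow ODE dx/dt = -0.5 * g^2 * score, t from 1 to 0."""
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        x = x + 0.5 * g_squared(t) * score(x, t) * dt
    return x


samples = reverse_sde_sample(1000)
x0 = probability_flow_ode(math.sqrt(marginal_var(1.0)))  # start one standard deviation out
```

The stochastic sampler returns draws whose spread is close to the data's unit standard deviation, while the ODE maps a single starting point deterministically to roughly one data standard deviation. That deterministic, invertible map is what allows likelihoods to be computed through the change-of-variables formula.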
With his student Jonathan Ho, Ermon introduced generative adversarial imitation learning, or GAIL, at the 2016 Conference on Neural Information Processing Systems. Imitation learning seeks to recover a policy from demonstrations of expert behavior without access to a reward signal. Classical approaches often first infer a reward function through inverse reinforcement learning and then optimize a policy against it, which is computationally expensive because reinforcement learning must be run repeatedly inside the reward-learning loop. GAIL skips the explicit reward step. It adapts ideas from generative adversarial networks by training a discriminator to distinguish expert state-action pairs from those produced by the learner, while the policy is updated to fool the discriminator, which drives the learner's distribution of state-action pairs toward the expert's. The method became a widely cited reference point in imitation learning and reinforcement learning. [11]
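The adversarial loop can be seen in miniature on a one-state problem in which each "trajectory" is a single action, so matching state-action pairs reduces to matching an action distribution. The sketch below is a toy analogue under stated assumptions, not the published algorithm: it uses exact expectations instead of sampled trajectories, tabular logits instead of neural networks, a plain policy-gradient step instead of the trust-region update used in the paper, and no entropy term. It also uses the mirrored sign convention in which D(a) approaches 1 on expert actions, with log D(a) as the learner's surrogate reward. The names `EXPERT` and `gail_toy` are hypothetical.

```python
import math

# Toy single-state imitation problem (an assumption for this sketch): the expert
# chooses among three actions with these probabilities; the learner sees only
# expert behaviour, never a reward.
EXPERT = [0.1, 0.2, 0.7]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gail_toy(outer_iters=2000, disc_steps=50, disc_lr=2.0, policy_lr=1.0):
    theta = [0.0, 0.0, 0.0]  # policy logits (uniform policy at the start)
    w = [0.0, 0.0, 0.0]      # discriminator logits, D(a) = sigmoid(w[a])
    for _ in range(outer_iters):
        pi = softmax(theta)
        # Discriminator: gradient ascent on E_expert[log D] + E_pi[log(1 - D)],
        # pushing D toward 1 on expert actions and toward 0 on learner actions.
        for _ in range(disc_steps):
            w = [wa + disc_lr * (pe * (1.0 - sigmoid(wa)) - pa * sigmoid(wa))
                 for wa, pe, pa in zip(w, EXPERT, pi)]
        # Policy: reward actions the discriminator mistakes for expert ones.
        reward = [math.log(sigmoid(wa)) for wa in w]
        baseline = sum(pa * r for pa, r in zip(pi, reward))
        # Exact policy gradient of E_pi[reward] for a softmax policy.
        theta = [th + policy_lr * pa * (r - baseline)
                 for th, pa, r in zip(theta, pi, reward)]
    return softmax(theta)


policy = gail_toy()
```

The only fixed point of this loop is the expert distribution itself: there the discriminator outputs 1/2 for every action, all surrogate rewards are equal, and the policy gradient vanishes.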
Ermon has also published extensively on probabilistic inference and variational inference, on discrete optimization and counting problems, and on techniques for improving the training and evaluation of generative models. Jonathan Ho, an early member of his group, went on to co-author the denoising diffusion probabilistic models paper that is often cited alongside the score-based work as a starting point for diffusion methods. [4]
A second strand of Ermon's research applies machine learning to sustainability and development. His group used satellite imagery and street-level images to estimate poverty, crop yields, and other livelihood indicators in regions where ground data is scarce. Work on poverty mapping in Africa was selected by Scientific American as one of its world-changing ideas for 2016, and crop-yield prediction models from his group won first place in the World Bank Big Data Innovation Challenge in 2017. In 2018 Ermon co-founded Atlas AI, a company focused on economic and agricultural forecasting from geospatial data, where he served as a chief technical advisor. [4][12]
In 2025 Ermon co-founded Inception, a company based in Palo Alto, California, with Aditya Grover and Volodymyr Kuleshov, two former collaborators who had worked with him at Stanford. Grover became a professor at the University of California, Los Angeles, and Kuleshov a professor at Cornell. Inception emerged from stealth in February 2025 and is sometimes referred to as Inception Labs. Ermon serves as the company's chief executive. [2][3][13]
Inception applies diffusion methods to text generation. Most large language models are autoregressive, producing one token at a time in sequence, so each token depends on the tokens before it. A diffusion language model instead starts from a noisy or masked draft and refines many tokens in parallel across several denoising steps. Inception argues that this parallel design can raise generation speed and lower cost relative to sequential decoding. [13][14]
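The contrast between sequential and parallel decoding can be sketched with a toy masked-diffusion loop. Everything here is hypothetical: `toy_denoiser` is an oracle that already knows the target sequence, and committing the most confident predictions at each step is one common strategy for masked diffusion models, not a description of Inception's sampler. The point is only the call count: each denoiser call proposes every masked position at once, so committing k tokens per step needs about n/k calls for n tokens, versus n calls for left-to-right decoding.

```python
MASK = "<mask>"


def toy_denoiser(tokens, target):
    """Hypothetical stand-in for a trained denoiser.

    In a single call it proposes a (token, confidence) pair for every masked
    position. This oracle simply reads the answer from `target`, and its
    confidence grows with the number of already revealed neighbours, a crude
    imitation of a model using context.
    """
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok != MASK:
            continue
        revealed = sum(1 for j in (i - 1, i + 1)
                       if 0 <= j < len(tokens) and tokens[j] != MASK)
        # The small position term breaks ties in favour of earlier slots.
        proposals[i] = (target[i], revealed + 1.0 / (i + 2))
    return proposals


def diffusion_decode(target, tokens_per_step):
    """Start from an all-mask draft and commit several tokens per denoiser call."""
    tokens = [MASK] * len(target)
    calls = 0
    while MASK in tokens:
        proposals = toy_denoiser(tokens, target)  # one call covers every masked slot
        calls += 1
        # Commit the most confident proposals in parallel.
        ranked = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:tokens_per_step]:
            tokens[i] = proposals[i][0]
    return tokens, calls


target = "def add ( a , b ) : return a + b".split()
decoded, calls = diffusion_decode(target, tokens_per_step=4)
```

With `tokens_per_step=4`, the 12-token sequence above is completed in 3 denoiser calls; setting `tokens_per_step=1` recovers one call per token, the same count as autoregressive decoding. A real model must also decide which of its proposals are reliable enough to commit, which is where the speed and quality trade-off of parallel decoding lies.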
The company's first product family is named Mercury. Inception described Mercury as the first commercial-scale diffusion large language model and released variants aimed at coding, including Mercury Coder Mini and Mercury Coder Small. The underlying network is a Transformer trained to refine a draft from noise over a small number of denoising steps, modifying many tokens at once rather than left to right. Inception reported throughput of about 1,109 tokens per second for Mercury Coder Mini and about 737 tokens per second for Mercury Coder Small on NVIDIA H100 hardware, figures it presented as roughly an order of magnitude faster than comparable autoregressive systems while reaching similar quality. On the HumanEval coding benchmark, for instance, the company reported a score of 88.0 for Mercury Coder Mini against 90.0 for a competing model of similar class. [13][15]
In November 2025 Inception announced a $50 million funding round led by Menlo Ventures, with participation from Mayfield, Innovation Endeavors, M12, Snowflake Ventures, Databricks, and NVentures, along with angel investments from Andrew Ng and Andrej Karpathy. Alongside the round the company released an updated Mercury model aimed at software development and reported integrations with several developer tools. [3][12]
Score-based generative modeling reframed sample generation as the task of estimating and following the gradient of a data distribution rather than directly modeling the distribution. The stochastic differential equation formulation that followed connected this view to diffusion processes and gave a single mathematical language for a family of generative methods. These ideas, developed in Ermon's group alongside parallel work on denoising diffusion, became part of the standard toolkit for image, audio, and video generation. His later move to apply diffusion to language placed him among researchers exploring alternatives to the autoregressive design that dominates large language models. [9][16]
Ermon has received several awards for his research. He received the National Science Foundation CAREER Award in 2017 and, in 2018, the IJCAI Computers and Thought Award, given to artificial intelligence researchers under the age of 35. He held a Microsoft Research Faculty Fellowship in 2019 and a Sloan Research Fellowship in 2020. His papers have won an Outstanding Paper Award at the International Conference on Learning Representations in 2021 and a Best Paper Award at the International Conference on Machine Learning in 2024, among other conference honors. [4][6]
| Field | Detail |
|---|---|
| Full name | Stefano Ermon |
| Occupation | Computer scientist |
| Known for | Score-based generative models, diffusion language models |
| Position | Associate professor of computer science, Stanford University |
| Affiliations | Stanford Artificial Intelligence Laboratory, Woods Institute for the Environment |
| PhD | Cornell University, computer science, 2015 |
| PhD advisors | Carla P. Gomes and Bart Selman |
| Earlier degrees | BSc 2006 and MSc 2008, electrical engineering, University of Padova |
| Companies | Inception (co-founder and chief executive, 2025), Atlas AI (co-founder, 2018) |
| Notable awards | IJCAI Computers and Thought Award (2018), Sloan Research Fellowship (2020), NSF CAREER Award (2017), ICLR Outstanding Paper (2021), ICML Best Paper (2024) |