Jonathan Ho
Last reviewed
May 31, 2026
Sources
17 citations
Review status
Source-backed
Revision
v1 · 2,087 words
Jonathan Ho is a machine learning researcher best known for introducing denoising diffusion probabilistic models, a method for generating images that he and his coauthors described in the 2020 paper "Denoising Diffusion Probabilistic Models" (commonly abbreviated DDPM). That work showed that a simple training recipe could turn diffusion models into high-quality image generators, and it launched the wave of diffusion-based text-to-image systems that followed. Ho completed his PhD at the University of California, Berkeley, under Pieter Abbeel, worked as a research scientist at Google Brain, and in 2022 became a cofounder of the image generation company Ideogram. [1][2][3]
Beyond DDPM, Ho is credited with several techniques that became standard parts of modern generative pipelines, including classifier-free guidance, cascaded diffusion models, and video diffusion models. He was also a coauthor on Google's Imagen text-to-image system. [4][5][6][7]
Ho studied machine learning as a graduate student and built his early research around generative models and sequential decision making. His first widely cited result came in 2016, before he finished his doctorate, in a collaboration with Stefano Ermon on "Generative Adversarial Imitation Learning" (GAIL). That paper, presented at NeurIPS 2016, framed imitation learning as a problem that could be solved with the adversarial training ideas used in generative adversarial networks, and it produced a model-free algorithm that learned to copy expert behavior in high-dimensional control tasks. The work was carried out while Ho was associated with Stanford and OpenAI, and Ermon was an early collaborator who helped shape his graduate research. [8]
Ho completed his PhD in the Electrical Engineering and Computer Sciences department at the University of California, Berkeley, in 2020. His doctoral advisor was Pieter Abbeel, a robotics and deep learning researcher whose group produced a long line of generative modeling work. Ho's dissertation, "Deep Generative Models: Imitation Learning, Image Synthesis, and Compression," gathered his work on imitation learning, flow-based models, and the diffusion models that he became known for. [1][9]
During this period he also worked on likelihood-based generative models. In 2019 he was the lead author of "Flow++," presented at ICML, which improved flow-based generative models through variational dequantization and a redesigned architecture. The coauthors were Xi Chen, Aravind Srinivas, Yan Duan, and Pieter Abbeel. Flow++ pushed normalizing flows toward stronger density estimation results, and the experience with likelihood-based image models fed directly into his later diffusion work. [10]
Diffusion probabilistic models had been proposed earlier, most notably by Jascha Sohl-Dickstein and colleagues in 2015, but they had not yet produced image samples competitive with the best generative adversarial networks. The idea is to define a forward process that gradually adds Gaussian noise to an image until it becomes pure noise, then to train a neural network to reverse that process step by step, turning noise back into a sample from the data distribution. [2][11]
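As a rough illustration of the forward process described above, the NumPy sketch below noises a toy array in closed form, using the standard result that a sample at step t can be drawn directly from the clean input. The linear variance schedule, its endpoints, the 1,000-step length, and the names `linear_beta_schedule`, `forward_noise`, and `alpha_bar` are assumptions chosen for the example, not details stated in this article.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances. The endpoints are illustrative assumptions."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) in one shot and return it with the noise used."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

betas = linear_beta_schedule(1000)
# Cumulative product of (1 - beta): the share of signal variance left at each step.
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for an image scaled to [-1, 1]
x_early, _ = forward_noise(x0, 10, alpha_bar, rng)   # still close to x0
x_late, _ = forward_noise(x0, 999, alpha_bar, rng)   # almost pure Gaussian noise
```

Under these assumed settings `alpha_bar` decays from nearly 1 toward 0, which is what "gradually adds noise until the image becomes pure noise" means in practice.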
In "Denoising Diffusion Probabilistic Models," posted to arXiv in June 2020 and presented at NeurIPS 2020, Ho, Ajay Jain, and Pieter Abbeel rewrote the training objective in a form that was simple to optimize. Rather than predicting the reverse distribution directly, the network was trained to predict the noise that had been added at each step, using a weighted variant of the variational bound that reduced to a plain mean squared error loss. The authors connected this objective to denoising score matching and to Langevin dynamics, which gave the method a clear theoretical footing. On the unconditional CIFAR-10 benchmark the model reached an Inception score of 9.46 and a Fréchet Inception Distance (FID) of 3.17. The FID was state of the art for the dataset at the time, and the results put diffusion models on the same level as the leading GANs. [2][11]
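The noise prediction objective can be sketched in a few lines. This is a minimal NumPy illustration rather than the authors' code: `predict_noise` is a hypothetical stand-in for the trained network, and the schedule used to build `alpha_bar` is an assumed linear one.

```python
import numpy as np

def simple_loss(predict_noise, x0, t, alpha_bar, rng):
    """Mean squared error between the noise actually added and the predicted noise."""
    eps = rng.standard_normal(x0.shape)
    # Noise the clean input to step t in closed form.
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - predict_noise(x_t, t)) ** 2)

# Assumed linear schedule with 1,000 steps (illustrative values).
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(16, 16))  # stand-in for a training image

def zero_predictor(x_t, t):
    """Hypothetical untrained model that always predicts zero noise."""
    return np.zeros_like(x_t)

loss = simple_loss(zero_predictor, x0, 500, alpha_bar, rng)
```

A predictor that always returns zero scores a loss near 1, the variance of the added noise, while a perfect predictor would score 0. Training moves the network between those two extremes.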
The paper mattered because it made diffusion models practical. The training procedure was stable and did not require the adversarial balancing act that made GANs hard to train, and the loss was easy to implement. Within a short time the approach was extended, scaled, and adopted across the field, and DDPM became one of the most cited papers in modern generative modeling. [2][3]
The rise of diffusion models is often told together with the parallel line of score-based generative models developed by Yang Song and Stefano Ermon. Their 2019 work on noise-conditional score networks estimated the gradient of the log data density, called the score, at multiple noise levels and used Langevin dynamics to sample. The two approaches turned out to be closely related. In 2021 Song and coauthors, including Ermon, framed both DDPM and score-based models as discretizations of a single continuous-time process described by a stochastic differential equation, which unified the two views and clarified why the denoising objective in DDPM worked. Ho's contribution sat on the diffusion side of this convergence, and the noise prediction objective he introduced is one of the standard ways the score is learned in practice. [12][13]
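The link between noise prediction and score estimation can be written out in one line. The notation here is an assumption introduced for illustration: $\bar\alpha_t$ is the fraction of signal variance remaining at step $t$, so that $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, $\epsilon_\theta$ is the noise predictor, and $s_\theta$ is a score network.

$$
\nabla_{x_t} \log q(x_t \mid x_0)
= -\frac{x_t - \sqrt{\bar\alpha_t}\,x_0}{1 - \bar\alpha_t}
= -\frac{\epsilon}{\sqrt{1 - \bar\alpha_t}},
\qquad
\epsilon_\theta(x_t, t) \approx -\sqrt{1 - \bar\alpha_t}\; s_\theta(x_t, t).
$$

Because the conditional density is Gaussian, its score is a rescaled copy of the added noise, so a network trained to predict the noise is, up to that scale factor, also estimating the score.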
After his PhD, Ho continued to develop the diffusion framework at Google, and several of his later methods became default components of large generative systems.
In 2022 he and Tim Salimans introduced classifier-free guidance in the paper "Classifier-Free Diffusion Guidance." Earlier work had improved sample quality by steering a diffusion model with the gradient of a separate image classifier, an approach called classifier guidance. Ho and Salimans showed that the same trade-off between sample quality and diversity could be reached without any extra classifier. Their method trains one network to act as both a conditional and an unconditional model, then combines the two score estimates at sampling time using a single guidance weight. The technique was simple to apply and became the standard way to control text-to-image diffusion models, used in systems such as DALL-E 2, GLIDE, and Imagen. [4][14]
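The two halves of the method, joint training and guided sampling, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `null_label` placeholder, and the dropout probability are hypothetical, and the combination rule is written in the common noise-prediction form with guidance weight `w`.

```python
import numpy as np

def drop_condition(labels, p_uncond, null_label, rng):
    """Training side: replace each label with a null token with probability p_uncond,
    so a single network learns both the conditional and the unconditional model."""
    mask = rng.random(labels.shape) < p_uncond
    return np.where(mask, null_label, labels)

def guided_noise(eps_cond, eps_uncond, w):
    """Sampling side: push the conditional estimate away from the unconditional one.
    w = 0 recovers the plain conditional prediction; larger w trades diversity for fidelity."""
    return (1.0 + w) * eps_cond - w * eps_uncond

rng = np.random.default_rng(0)
labels = np.array([3, 1, 4, 1, 5])
train_labels = drop_condition(labels, p_uncond=0.1, null_label=-1, rng=rng)

eps_cond = np.array([1.0, 2.0])     # stand-in for the network output given the condition
eps_uncond = np.array([0.0, 1.0])   # stand-in for the output given the null condition
eps_guided = guided_noise(eps_cond, eps_uncond, w=1.0)
```

In a real sampler both estimates would come from the same network evaluated twice per step, once with the condition and once with the null token.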
Ho also worked on building diffusion models that generate high-resolution images. In "Cascaded Diffusion Models for High Fidelity Image Generation," published in the Journal of Machine Learning Research, he and coauthors Chitwan Saharia, William Chan, David Fleet, Mohammad Norouzi, and Tim Salimans chained several diffusion models together. A base model produces a small image, and a sequence of super-resolution diffusion models upsamples it to higher resolutions. A method they called conditioning augmentation, which adds noise to the low-resolution inputs during training, was important for keeping sample quality high through the cascade. The pipeline produced class-conditional ImageNet samples that outperformed strong GAN and autoregressive baselines on Fréchet Inception Distance. [5]
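The control flow of such a cascade, and the noise-based form of conditioning augmentation, can be sketched with placeholders. Everything named here is hypothetical: `toy_base` and `toy_upsample_2x` are trivial stand-ins for trained diffusion models (the upsampler just repeats pixels), and `noise_std` is an arbitrary illustrative value.

```python
import numpy as np

def augment_low_res(low_res, noise_std, rng):
    """Conditioning augmentation sketch: perturb the low-resolution conditioning
    image with Gaussian noise before a super-resolution stage trains on it."""
    return low_res + noise_std * rng.standard_normal(low_res.shape)

def run_cascade(base_sampler, super_res_stages, rng):
    """Sample a small image, then pass it through each super-resolution stage in turn."""
    image = base_sampler(rng)
    for stage in super_res_stages:
        image = stage(image, rng)
    return image

def toy_base(rng):
    """Placeholder for a base diffusion model that samples an 8x8 image."""
    return rng.uniform(-1.0, 1.0, size=(8, 8))

def toy_upsample_2x(low_res, rng):
    """Placeholder for a super-resolution diffusion model (nearest-neighbor repeat)."""
    return np.repeat(np.repeat(low_res, 2, axis=0), 2, axis=1)

rng = np.random.default_rng(0)
sample = run_cascade(toy_base, [toy_upsample_2x, toy_upsample_2x], rng)  # 8x8 -> 16x16 -> 32x32
noisy_condition = augment_low_res(toy_base(rng), noise_std=0.1, rng=rng)
```

Training each super-resolution stage on slightly corrupted inputs makes it more tolerant of the imperfect samples it receives from the stage before it at generation time.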
These ideas came together in Imagen, Google's text-to-image system described in the 2022 paper "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding." Ho was one of the coauthors. Imagen paired a large frozen text encoder, taken from a pretrained language model, with a cascade of diffusion models and classifier-free guidance, and it reported strong results on photorealism and image-text alignment. The system drew directly on the guidance and cascade techniques that Ho had helped develop. [6]
Ho then extended diffusion to moving images. In "Video Diffusion Models," presented in 2022, he and coauthors adapted the image diffusion architecture to space and time so that it could generate short video clips, an early demonstration that the same training recipe could handle temporal data. This line of work led to Imagen Video, a text-conditional video generation system built from a cascade of video diffusion models with classifier-free guidance, which Ho coauthored later in 2022. [7][15]
The denoising diffusion framework that Ho helped establish became the technical foundation of the text-to-image boom that began around 2022. The combination of a diffusion model and classifier-free guidance underlies most of the well-known image generators of that period, including OpenAI's DALL-E 2, Google's Imagen, and the open-source Stable Diffusion family. The two methods most directly associated with Ho, the DDPM training objective and classifier-free guidance, appear in nearly all of these systems. [3][14]
Diffusion models also displaced generative adversarial networks as the dominant approach to high-quality image synthesis in research, in part because they were easier to train and scaled well with data and compute. The reach of the work extended past still images into video, audio, and other domains, and the noise prediction recipe from the 2020 paper remains a common starting point for new diffusion systems. [3][12]
After completing his PhD in 2020, Ho joined Google as a research scientist, working at Google Brain on the diffusion and text-to-image projects described above. In 2023 Google Brain was merged with DeepMind to form Google DeepMind. [1][6]
In 2022 Ho cofounded Ideogram, a company focused on text-to-image generation, together with Mohammad Norouzi, William Chan, and Chitwan Saharia, all former Google Brain researchers who had worked on Imagen and related generative modeling research. The company is based in Toronto. It came out of stealth in August 2023 with an initial model and announced 16.5 million dollars in seed funding led by Andreessen Horowitz and Index Ventures. Ideogram drew attention for generating legible text inside images, a task that earlier image generators had struggled with, and it raised a further 80 million dollars in February 2024 alongside the release of its 1.0 model. [3][16][17]
Ho's standing in the field rests mainly on the influence of his research rather than on formal prizes. "Denoising Diffusion Probabilistic Models" is among the most cited papers in generative modeling, and classifier-free guidance and cascaded diffusion models, both techniques he introduced with coauthors, are widely used building blocks. Commentary on the founders of Ideogram routinely describes him as the lead author of the DDPM paper that defined the modern diffusion framework used by consumer image generators. The score-based modeling work that converged with his diffusion research, led by Yang Song and Stefano Ermon, received an Outstanding Paper Award at ICLR 2021, which reflects the broader recognition the surrounding area attracted. [2][3][13]
| Field | Detail |
|---|---|
| Name | Jonathan Ho |
| Fields | Machine learning, generative models, computer vision |
| Known for | Denoising diffusion probabilistic models (DDPM), classifier-free guidance, cascaded diffusion models, video diffusion models |
| Education | PhD, Electrical Engineering and Computer Sciences, University of California, Berkeley (2020) |
| Doctoral advisor | Pieter Abbeel |
| Dissertation | "Deep Generative Models: Imitation Learning, Image Synthesis, and Compression" (2020) |
| Notable papers | "Denoising Diffusion Probabilistic Models" (2020); "Classifier-Free Diffusion Guidance" (2022); "Cascaded Diffusion Models for High Fidelity Image Generation" (2022); "Video Diffusion Models" (2022) |
| Employers | Google Brain (research scientist); Ideogram (cofounder, 2022) |
| Notable collaborators | Stefano Ermon (GAIL, 2016); Pieter Abbeel; Tim Salimans; Chitwan Saharia |