Diederik Kingma
Last reviewed
May 31, 2026
Sources
14 citations
Review status
Source-backed
Revision
v1 · 1,828 words
Diederik Kingma is a Dutch machine learning researcher known for foundational work in deep generative modeling and optimization. He is the first author of the Adam optimizer [1] and the variational autoencoder [2], two methods that became standard tools across deep learning. He also introduced the reparameterization trick, co-developed the Glow normalizing flow model, and contributed to variational inference and diffusion model research. After helping found OpenAI in 2015, he worked at Google Brain and Google DeepMind, and in 2024 he joined Anthropic [3][4].
His formal name is Diederik P. Kingma. He commonly goes by the Frisian nickname Durk, which is pronounced like Dirk [5].
Kingma completed a PhD in machine learning at the University of Amsterdam, where his advisor was Max Welling [5]. His doctoral thesis, titled Variational Inference and Deep Learning: A New Synthesis, brought together probabilistic inference and neural networks, and several of the methods that defined his early career appeared in it [5][6]. He defended the work in 2017 and received the cum laude distinction [5].
During his doctoral studies he received support from Google. In 2015 he was awarded one of the first European doctoral fellowships in deep learning offered by the company [5]. The period of his PhD coincided with the rapid growth of deep learning research, and the techniques he developed in Amsterdam connected the older field of variational Bayesian inference with the newer practice of training large neural networks by gradient descent.
In December 2013 Kingma and Welling released the preprint Auto-Encoding Variational Bayes, which introduced what became known as the variational autoencoder, or VAE [2]. The work was presented at the International Conference on Learning Representations in 2014 [2]. The paper addressed a long-standing difficulty in probabilistic modeling: how to perform efficient inference and learning in directed probabilistic models that contain continuous latent variables with intractable posterior distributions, especially on large datasets [2].
The central technical idea is the reparameterization trick. Rather than sampling a latent variable directly, which blocks the flow of gradients, the method rewrites the sample as a deterministic function of the model parameters and an independent noise variable. This reparameterization yields a differentiable and unbiased estimator of the variational lower bound, often called the stochastic gradient variational Bayes estimator, which can be optimized with ordinary stochastic gradient methods [2]. The trick made it practical to train deep latent variable models end to end and connected probabilistic inference with the standard backpropagation machinery used for neural networks.
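The idea can be illustrated with a minimal sketch, assuming a one-dimensional Gaussian latent variable and a toy objective f(z) = z**2 chosen only because its exact gradient is known. The function names and the objective are hypothetical and are not taken from the paper. The sample is written as mu + sigma * eps, so the derivative with respect to mu passes through the sample by the chain rule.

```python
import random


def reparameterized_sample(mu, sigma, eps):
    """Express z ~ N(mu, sigma^2) as a deterministic function of
    the parameters (mu, sigma) and independent noise eps ~ N(0, 1)."""
    return mu + sigma * eps


def pathwise_grad_mu(mu, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of d/dmu E[f(z)] for the toy objective f(z) = z**2.

    The exact value is 2 * mu, because E[z^2] = mu^2 + sigma^2.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)          # noise does not depend on mu or sigma
        z = reparameterized_sample(mu, sigma, eps)
        total += 2.0 * z                   # chain rule: f'(z) * dz/dmu = 2z * 1
    return total / n_samples


if __name__ == "__main__":
    print(pathwise_grad_mu(mu=1.5, sigma=0.5))  # should be close to 2 * 1.5
```

Because the noise is drawn independently of the parameters, the average of these per-sample derivatives is an unbiased estimate of the gradient of the expectation, which is the property the estimator relies on.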
A second contribution of the paper is amortized inference. Instead of solving a separate optimization problem to infer the latent variables for each data point, the method trains a neural network, called the encoder or recognition model, to predict the parameters of the approximate posterior directly from the input. A second network, the decoder, maps latent variables back to the data space. Training both networks together produces a learned latent space in which nearby points correspond to similar data, which is useful for representation learning as well as for generation [2].
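In commonly used notation, which is an assumption here because the text above does not fix symbols, the encoder is written q_phi(z | x), the decoder p_theta(x | z), and the prior p(z). Both networks are trained by maximizing the variational lower bound below. The second line gives the closed form of the KL term for the common case of a diagonal Gaussian encoder with mean mu and standard deviation sigma in each of J latent dimensions and a standard normal prior.

```latex
\begin{align*}
\mathcal{L}(\theta, \phi; x)
  &= \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
     - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big) \\
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
  &= \frac{1}{2} \sum_{j=1}^{J} \left( \mu_j^2 + \sigma_j^2 - 1 - \log \sigma_j^2 \right)
\end{align*}
```

The first term rewards accurate reconstruction through the decoder, while the KL term keeps the encoder's output close to the prior.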
The variational autoencoder became one of the main families of deep generative models, alongside generative adversarial networks and, later, diffusion models. It is used to learn compressed latent representations of data and to generate new samples such as images. In 2024 the paper received the inaugural Test of Time Award at the International Conference on Learning Representations, recognizing its long-lasting influence on the integration of deep learning with scalable probabilistic inference [7].
In December 2014 Kingma and Jimmy Ba published Adam: A Method for Stochastic Optimization [1]. The paper was presented at the International Conference on Learning Representations in 2015 in San Diego [1]. Adam is a first-order, gradient-based optimization algorithm that adapts the learning rate for each parameter using running estimates of the first and second moments of the gradients [1]. The name derives from adaptive moment estimation. The algorithm maintains exponential moving averages of the gradient and of the squared gradient. Because these averages start at zero, it applies a bias correction so that the estimates are accurate in the early steps of training, and then uses them to scale the update for each parameter individually [1].
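The update described above can be sketched in plain Python. This is a minimal illustration operating on lists of floats, not a production optimizer. The function name adam_step and the list-based interface are hypothetical, and the default hyperparameter values are the commonly cited ones.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update.

    params, grads, m, v are equal-length lists of floats.
    t is the 1-based step count, used for bias correction.
    Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction offsets the zero initialization of m and v.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Per-parameter scaled update.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


if __name__ == "__main__":
    # Toy problem: minimize (x - 3)^2 starting from x = 0.
    x, m, v = [0.0], [0.0], [0.0]
    for t in range(1, 201):
        grad = [2.0 * (x[0] - 3.0)]
        x, m, v = adam_step(x, grad, m, v, t, lr=0.1)
    print(x[0])
```

On the first step the bias-corrected ratio is approximately the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.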
Adam combined ideas from earlier adaptive methods such as AdaGrad and RMSProp into a single procedure that worked well with little tuning across many problems. One reason for its broad adoption is that its default hyperparameters perform reasonably across many tasks, which lowers the effort needed to train a new model. It became the default optimizer for training a wide range of neural network architectures, and its variant AdamW is also widely used. A survey of recent conference papers found that Adam or AdamW was referenced in a large majority of submissions that mention an optimizer, an indication of how thoroughly the method has been adopted [8]. The Adam paper is among the most cited works in modern science, with a citation count well into the hundreds of thousands [9]. In 2025 it received the Test of Time Award at the International Conference on Learning Representations [8].
In 2018 Kingma and Prafulla Dhariwal introduced Glow, described in the paper Glow: Generative Flow with Invertible 1x1 Convolutions [10]. Glow is a normalizing flow, a class of generative model that transforms a simple base distribution into a complex data distribution through a sequence of invertible mappings. Because each transformation is invertible, the model can compute exact likelihoods and generate samples by running the flow in reverse. The key component of Glow is an invertible 1x1 convolution, which generalizes a fixed permutation of channels to a learned linear mixing of channels and improves the expressiveness of the flow [10]. The model produced high-quality image synthesis and demonstrated that flow-based methods could scale to large images. The work was presented at the Conference on Neural Information Processing Systems in 2018.
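A 1x1 convolution applies the same channel-mixing matrix at every pixel, so it is invertible whenever that matrix is, and its log-determinant contribution to the likelihood is the number of pixels times log |det W|. The sketch below illustrates this for a hypothetical two-channel image stored as nested lists. The helper names are illustrative and are not the Glow codebase.

```python
import math


def det2(w):
    """Determinant of a 2x2 matrix given as nested lists."""
    return w[0][0] * w[1][1] - w[0][1] * w[1][0]


def inv2(w):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    d = det2(w)
    return [[w[1][1] / d, -w[0][1] / d],
            [-w[1][0] / d, w[0][0] / d]]


def conv1x1(image, w):
    """Mix the two channels of every pixel with the same matrix w.

    image is a list of rows; each row is a list of pixels;
    each pixel is a list of two channel values.
    """
    return [[[w[0][0] * px[0] + w[0][1] * px[1],
              w[1][0] * px[0] + w[1][1] * px[1]] for px in row]
            for row in image]


def log_det_jacobian(image, w):
    """Log-determinant of the whole layer: height * width * log|det w|."""
    height, width = len(image), len(image[0])
    return height * width * math.log(abs(det2(w)))


if __name__ == "__main__":
    image = [[[1.0, 2.0], [0.5, -1.0]],
             [[0.0, 3.0], [2.0, 2.0]]]
    w = [[2.0, 1.0], [0.5, 1.5]]
    y = conv1x1(image, w)
    restored = conv1x1(y, inv2(w))   # running the layer in reverse
    print(restored)
    print(log_det_jacobian(image, w))
```

A channel permutation is the special case in which w is a permutation matrix, for which the log-determinant term is zero.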
Kingma continued to develop methods that link variational inference with deep learning. With Tim Salimans and Max Welling he introduced variational dropout and the local reparameterization trick, presented at the Conference on Neural Information Processing Systems in 2015 [11]. That work reduced the variance of stochastic gradients in variational Bayesian inference and reinterpreted the dropout regularization technique as a form of variational inference in which the dropout rates can be learned [11]. He also co-authored work on semi-supervised learning with deep generative models.
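The local reparameterization trick can be sketched for a single linear layer, assuming independent Gaussian weights with per-weight mean and standard deviation. Instead of sampling every weight, the layer samples each output activation directly from the Gaussian that the weight noise induces on it. The function name and list-based layout are hypothetical.

```python
import math
import random


def local_reparam_layer(a, w_mu, w_sigma, rng):
    """Sample the outputs of a linear layer with Gaussian weights.

    a       : input vector of length n_in
    w_mu    : n_in x n_out matrix of weight means
    w_sigma : n_in x n_out matrix of weight standard deviations
    rng     : random.Random instance supplying the noise

    With independent weights w_ij ~ N(mu_ij, sigma_ij^2), output j is Gaussian
    with mean sum_i a_i * mu_ij and variance sum_i a_i^2 * sigma_ij^2,
    so one noise draw per output is enough.
    """
    n_in, n_out = len(a), len(w_mu[0])
    outputs = []
    for j in range(n_out):
        mean = sum(a[i] * w_mu[i][j] for i in range(n_in))
        var = sum(a[i] ** 2 * w_sigma[i][j] ** 2 for i in range(n_in))
        outputs.append(mean + math.sqrt(var) * rng.gauss(0.0, 1.0))
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    a = [1.0, 2.0]
    w_mu = [[0.5, -1.0], [0.25, 2.0]]
    w_sigma = [[0.1, 0.1], [0.1, 0.1]]
    print(local_reparam_layer(a, w_mu, w_sigma, rng))
```

Sampling at the level of activations uses far fewer random draws than sampling a full weight matrix per example, which is one intuition for the lower gradient variance.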
In later years his research moved toward diffusion models. With Tim Salimans, Ben Poole, and Jonathan Ho he co-authored Variational Diffusion Models, presented at the Conference on Neural Information Processing Systems in 2021, which analyzed diffusion-based generative models through the lens of the variational lower bound and improved likelihood estimation [12]. He continued related work connecting diffusion training objectives to the evidence lower bound.
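The forward noising step of a diffusion model can be sketched as follows. It assumes a variance-preserving schedule parameterized by a scalar gamma, with alpha^2 = sigmoid(-gamma) and sigma^2 = sigmoid(gamma). That parameterization and all names below are assumptions made for illustration and are not stated in the text above. The noisy sample is again a deterministic function of the data and independent Gaussian noise.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def noise_levels(gamma):
    """Return (alpha, sigma) for an assumed variance-preserving schedule,
    where alpha^2 + sigma^2 = 1."""
    alpha_sq = sigmoid(-gamma)
    sigma_sq = sigmoid(gamma)
    return math.sqrt(alpha_sq), math.sqrt(sigma_sq)


def snr(gamma):
    """Signal-to-noise ratio alpha^2 / sigma^2, equal to exp(-gamma) here."""
    alpha, sigma = noise_levels(gamma)
    return (alpha * alpha) / (sigma * sigma)


def diffuse(x, gamma, rng):
    """Noise a data vector: z = alpha * x + sigma * eps, eps ~ N(0, 1)."""
    alpha, sigma = noise_levels(gamma)
    return [alpha * xi + sigma * rng.gauss(0.0, 1.0) for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, -1.0, 0.5]
    for gamma in (-4.0, 0.0, 4.0):   # larger gamma means more noise
        print(gamma, snr(gamma), diffuse(x, gamma, rng))
```

Under this assumed schedule, a larger gamma gives a lower signal-to-noise ratio, so sweeping gamma from low to high moves a sample from nearly clean data to nearly pure noise.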
Before his research career, Kingma co-founded a technology company named Advanza, where he served as a technical lead. The company was later acquired [5].
In 2015 Kingma was part of the founding team of OpenAI, where he worked as a research scientist [3][5]. He led the organization's algorithms team, which developed methods for generative AI models [3]. He left this role in 2018 [4][13].
In July 2018 Kingma joined Google as a research scientist at Google Brain [3]. Google Brain was merged with DeepMind to form Google DeepMind in 2023 [13]. At Google he worked on generative models, including research on text-to-image generation [3][14].
On 1 October 2024 Kingma announced that he had joined Anthropic [3][4][13]. He stated that the company's approach to AI development aligned with his own views and that he looked forward to contributing to its work on building powerful AI systems responsibly [3]. He arranged to work largely remotely from the Netherlands while making regular visits to the San Francisco Bay Area [3][13]. His move was reported as part of a broader pattern of early OpenAI figures joining the rival company [13].
Kingma's research is among the most influential in contemporary machine learning, measured by both adoption and citation. The Adam paper has accumulated a citation count in the hundreds of thousands, placing it among the most cited scientific papers of any field [9]. Both of his best-known papers have received Test of Time Awards from the International Conference on Learning Representations: the variational autoencoder paper in 2024 and the Adam paper in 2025 [7][8].
He has also received recognition for his doctoral work and contributions to European machine learning research, including a European doctoral fellowship from Google and other honors associated with his thesis [5]. The combination of widely used optimization methods and foundational generative modeling techniques has made his work a common reference point in both research and applied deep learning. The reparameterization trick in particular reappears across later generative models, including the diffusion models that became prominent in image and video generation, which gives his early work continued relevance well beyond the variational autoencoder itself.
| Field | Detail |
|---|---|
| Full name | Diederik P. Kingma |
| Also known as | Durk Kingma |
| Nationality | Dutch |
| Field | Machine learning, deep learning |
| PhD institution | University of Amsterdam |
| Doctoral advisor | Max Welling |
| PhD year | 2017 (cum laude) |
| Known for | Adam optimizer, variational autoencoder, reparameterization trick, Glow |
| OpenAI | Founding team, research scientist, 2015 to 2018 |
| Google | Research scientist, Google Brain and Google DeepMind, from 2018 |
| Anthropic | Joined 2024 |
| Website | dpkingma.com |