Diederik Kingma

Updated Jun 28, 2026

Diederik Kingma is a Dutch machine learning researcher and a founding member of OpenAI. He is best known as the first author of the papers that introduced the Adam optimizer ^[1] and the variational autoencoder (VAE) ^[2], two methods that became standard tools across deep learning. Adam is the default training algorithm for most modern neural networks, and the variational autoencoder paper introduced the reparameterization trick that underpins much of today's deep generative modeling. As of 2026 Kingma is a research scientist at Anthropic working on large-scale machine learning, a role he took up in October 2024 after stints at OpenAI, Google Brain, and Google DeepMind ^[3]^[4].

He also co-developed the Glow normalizing flow model and contributed to research on variational inference and diffusion models. His formal name is Diederik P. Kingma, and he commonly goes by the Frisian nickname Durk, which is pronounced like Dirk ^[5].

Who is Diederik Kingma?

Diederik P. Kingma is a deep learning researcher whose work sits at the intersection of probabilistic inference and neural networks. Over roughly a decade he produced several of the most widely used building blocks of modern machine learning: the Adam optimizer for training networks, the variational autoencoder for generative modeling, the Glow normalizing flow, and theoretical results connecting diffusion models to the variational lower bound. He was part of the founding team of OpenAI in 2015, later worked at Google, and joined Anthropic in 2024. Both of his best known papers have since won Test of Time Awards from the International Conference on Learning Representations (ICLR) ^[7]^[8].

Where did Diederik Kingma study?

Kingma completed a PhD in machine learning at the University of Amsterdam, where his advisor was Max Welling ^[5]. His doctoral thesis, titled Variational Inference and Deep Learning: A New Synthesis, brought together probabilistic inference and neural networks, and several of the methods that defined his early career appeared in it ^[5]^[6]. He defended the work in 2017 and received the cum laude distinction ^[5].

During his doctoral studies he received support from Google. In 2015 he was awarded one of the first European doctoral fellowships in deep learning offered by the company ^[5]. The period of his PhD coincided with the rapid growth of deep learning research, and the techniques he developed in Amsterdam connected the older field of variational Bayesian inference with the newer practice of training large neural networks by gradient descent.

What is the variational autoencoder and the reparameterization trick?

In December 2013 Kingma and Welling released the preprint Auto-Encoding Variational Bayes, which introduced what became known as the variational autoencoder, or VAE ^[2]. The work was presented at the International Conference on Learning Representations in 2014 ^[2]. The paper addressed a long-standing difficulty in probabilistic modeling: how to perform efficient inference and learning in directed probabilistic models that contain continuous latent variables with intractable posterior distributions, especially on large datasets ^[2].

The central technical idea is the reparameterization trick. Rather than sampling a latent variable directly, which blocks the flow of gradients, the method rewrites the sample as a deterministic, differentiable function of the distribution's parameters and an independent noise variable. For a Gaussian approximate posterior with mean μ and standard deviation σ, for example, the sample becomes z = μ + σε, where ε is drawn from a standard normal distribution. This reparameterization yields a differentiable and unbiased estimator of the variational lower bound, often called the stochastic gradient variational Bayes estimator, which can be optimized with ordinary stochastic gradient methods ^[2]. The trick made it practical to train deep latent variable models end to end and connected probabilistic inference with the standard backpropagation machinery used for neural networks.
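The effect of the trick can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes a one-dimensional Gaussian latent variable and a hypothetical objective E[z²], chosen only because its exact gradients (2μ with respect to the mean and 2σ with respect to the standard deviation) are known and can be compared with the Monte Carlo estimates. The function names are illustrative.

```python
import random


def reparameterized_sample(mu, sigma, eps):
    """Return z = mu + sigma * eps, a deterministic function of (mu, sigma) given noise eps."""
    return mu + sigma * eps


def grad_estimate(mu, sigma, num_samples=100_000, seed=0):
    """Monte Carlo estimate of the gradients of E[z**2] for z ~ N(mu, sigma**2).

    Because z is written as a function of mu and sigma, the chain rule applies
    to every sample: d(z**2)/dmu = 2z * 1 and d(z**2)/dsigma = 2z * eps.
    """
    rng = random.Random(seed)
    d_mu = 0.0
    d_sigma = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # noise that does not depend on the parameters
        z = reparameterized_sample(mu, sigma, eps)
        d_mu += 2.0 * z
        d_sigma += 2.0 * z * eps
    return d_mu / num_samples, d_sigma / num_samples


if __name__ == "__main__":
    # Exact gradients of mu**2 + sigma**2 at (1.5, 0.5) are 3.0 and 1.0.
    print(grad_estimate(1.5, 0.5))
```

In a real variational autoencoder the same chain rule is applied by automatic differentiation, and the objective is the variational lower bound rather than this toy function.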

A second contribution of the paper is amortized inference. Instead of solving a separate optimization problem to infer the latent variables for each data point, the method trains a neural network, called the encoder or recognition model, to predict the parameters of the approximate posterior directly from the input. A second network, the decoder, maps latent variables back to the data space. Training both networks together produces a learned latent space in which nearby points correspond to similar data, which is useful for representation learning as well as for generation ^[2].
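A minimal sketch of how the encoder, the sampling step, and the decoder fit together is given below. It is an illustration under stated assumptions, not the paper's implementation: scalar linear functions with arbitrary, untrained weights stand in for the two neural networks, the prior is a standard normal, and the decoder is assumed to be Gaussian with unit variance. The closed-form KL term for a Gaussian posterior against a standard normal prior is the standard one; all function names are hypothetical.

```python
import math
import random


def encoder(x):
    """Hypothetical recognition model: maps an input to (mean, log-variance) of q(z|x)."""
    return 0.8 * x, -1.0 + 0.1 * x


def decoder(z):
    """Hypothetical generative model: maps a latent value to the mean of p(x|z)."""
    return 1.2 * z


def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to the standard normal prior N(0, 1)."""
    return -0.5 * (1.0 + log_var - mu * mu - math.exp(log_var))


def elbo_estimate(x, rng):
    """Single-sample estimate of the variational lower bound for one data point."""
    mu, log_var = encoder(x)  # same encoder for every x: amortized inference
    eps = rng.gauss(0.0, 1.0)
    z = mu + math.exp(0.5 * log_var) * eps  # reparameterized sample
    x_hat = decoder(z)
    # Log-likelihood of x under a unit-variance Gaussian centered on the decoder output.
    log_lik = -0.5 * ((x - x_hat) ** 2 + math.log(2.0 * math.pi))
    return log_lik - gaussian_kl(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sum(elbo_estimate(1.0, rng) for _ in range(1000)) / 1000)
```

Training would adjust the encoder and decoder weights to increase this estimate averaged over the dataset; the sketch only evaluates it for fixed weights.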

The variational autoencoder became one of the main families of deep generative models, alongside generative adversarial networks and, later, diffusion models. It is used to learn compressed latent representations of data and to generate new samples such as images. In 2024 the paper received the inaugural Test of Time Award at the International Conference on Learning Representations, recognizing its long lasting influence on the integration of deep learning with scalable probabilistic inference ^[7].

What is the Adam optimizer?

In December 2014 Kingma and Jimmy Ba published Adam: A Method for Stochastic Optimization ^[1]. The paper was presented at the International Conference on Learning Representations in 2015 in San Diego ^[1]. Adam is a first-order, gradient-based optimization algorithm that adapts the learning rate for each parameter using running estimates of the first and second moments of the gradients ^[1]. The name derives from adaptive moment estimation. The algorithm maintains exponential moving averages of the gradient and of the squared gradient, applies a bias correction so that those estimates are accurate in the early steps of training, and then uses them to scale the update for each parameter individually ^[1]. The method grew partly out of Kingma's own need for a better optimizer to train the variational autoencoders he had developed, and was first drafted while he and Ba were interns at DeepMind in 2014 ^[8].
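The update described above can be sketched for a single scalar parameter as follows. This is a minimal illustration rather than a reference implementation: the function name and the quadratic test objective are assumptions made for the example, while the default values of the step size, the two decay rates, and the small constant follow those suggested in the Adam paper.

```python
import math


def adam_minimize(grad_fn, x0, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function with Adam, given a function returning its gradient."""
    x = x0
    m = 0.0  # exponential moving average of the gradient (first moment)
    v = 0.0  # exponential moving average of the squared gradient (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero and are too small early on.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    # Hypothetical objective f(x) = (x - 3)**2 with gradient 2 * (x - 3).
    print(adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0, steps=2000, lr=0.1))
```

Because of the bias correction, the first update has magnitude close to the step size regardless of the gradient's scale, which is one reason the method is relatively insensitive to how the loss is scaled.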

Adam combined ideas from earlier adaptive methods such as AdaGrad and RMSProp into a single procedure that worked well with little tuning across many problems. One reason for its broad adoption is that its default hyperparameters perform reasonably across many tasks, which lowers the effort needed to train a new model. It became the default optimizer for training a wide range of neural network architectures, and its variant AdamW is also widely used. A survey of recent conference papers found that Adam or its AdamW variant was mentioned in over half of ICLR 2025 submissions, and in almost 90 percent of the ICLR 2025 papers that mention an optimizer at all, an indication of how thoroughly the method has been adopted ^[8]. The Adam paper is among the most cited works in modern science, with a citation count well into the hundreds of thousands ^[9]. In 2025 it received the Test of Time Award at the International Conference on Learning Representations ^[8].

What is Glow and how does it relate to normalizing flows?

In 2018 Kingma and Prafulla Dhariwal introduced Glow, described in the paper Glow: Generative Flow with Invertible 1x1 Convolutions ^[10]. Glow is a normalizing flow, a class of generative model that transforms a simple base distribution into a complex data distribution through a sequence of invertible mappings. Because each transformation is invertible, the model can compute exact likelihoods and generate samples by running the flow in reverse. The key component of Glow is an invertible 1x1 convolution, which generalizes a fixed permutation of channels and improves the expressiveness of the flow ^[10]. The model synthesized high-quality images and demonstrated that flow-based methods could scale to large images. The work was presented at the Conference on Neural Information Processing Systems in 2018.
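The invertible 1x1 convolution can be sketched in a few lines. The example below is a simplified illustration, not the Glow implementation: it assumes an image with only two channels so that the channel-mixing matrix is 2x2 and can be inverted by hand, and the function names are hypothetical. A 1x1 convolution applies the same matrix to the channel vector at every pixel, so its contribution to the log-likelihood is the number of pixels times the log of the absolute determinant of that matrix.

```python
import math


def conv1x1_forward(image, weight):
    """Apply a 2x2 channel-mixing matrix at every pixel of a 2-channel image.

    `image` is a list of rows, each row a list of (channel0, channel1) tuples.
    Returns the transformed image and the log-determinant term of the flow.
    """
    (a, b), (c, d) = weight
    out = [[(a * p0 + b * p1, c * p0 + d * p1) for (p0, p1) in row] for row in image]
    height, width = len(image), len(image[0])
    log_det = height * width * math.log(abs(a * d - b * c))
    return out, log_det


def conv1x1_inverse(out, weight):
    """Undo conv1x1_forward by applying the inverse matrix at every pixel."""
    (a, b), (c, d) = weight
    det = a * d - b * c
    inverse = ((d / det, -b / det), (-c / det, a / det))
    image, _ = conv1x1_forward(out, inverse)
    return image


if __name__ == "__main__":
    image = [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0), (7.0, 8.0)]]
    weight = ((2.0, 1.0), (0.5, 1.5))
    out, log_det = conv1x1_forward(image, weight)
    print(conv1x1_inverse(out, weight), log_det)
```

Setting the matrix to ((0, 1), (1, 0)) swaps the two channels and gives a log-determinant of zero, which shows how a fixed channel permutation is a special case of the learned mapping.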

What other research has Diederik Kingma done?

Kingma continued to develop methods that link variational inference with deep learning. With Tim Salimans and Max Welling he introduced variational dropout and the local reparameterization trick, presented at the Conference on Neural Information Processing Systems in 2015 ^[11]. That work reduced the variance of stochastic gradients in variational Bayesian inference and reinterpreted the dropout regularization technique as a form of variational inference in which the dropout rates can be learned ^[11]. He also co-authored work on semi-supervised learning with deep generative models.
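The sampling identity behind the local reparameterization trick can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's method in full: it considers a single output unit of a linear layer whose weights have independent Gaussian posteriors, and it shows only that the pre-activation can be sampled directly instead of sampling every weight. The function names and numeric values are hypothetical, and the learned dropout rates of variational dropout are not modeled.

```python
import math
import random


def sample_preactivation_weights(a, mu, sigma, rng):
    """Reference approach: sample every weight, then take the dot product with input a."""
    return sum(ai * (mi + si * rng.gauss(0.0, 1.0)) for ai, mi, si in zip(a, mu, sigma))


def sample_preactivation_local(a, mu, sigma, rng):
    """Local reparameterization: sample the pre-activation itself.

    A sum of independent Gaussians is Gaussian, with mean sum(a_i * mu_i)
    and variance sum(a_i**2 * sigma_i**2), so one noise draw is enough.
    """
    mean = sum(ai * mi for ai, mi in zip(a, mu))
    var = sum((ai * si) ** 2 for ai, si in zip(a, sigma))
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    a, mu, sigma = [1.0, -2.0, 0.5], [0.3, 0.1, -0.4], [0.2, 0.1, 0.5]
    print(sample_preactivation_weights(a, mu, sigma, rng))
    print(sample_preactivation_local(a, mu, sigma, rng))
```

Both functions draw from the same distribution; the second needs one noise sample per output unit and data point rather than one per weight.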

In later years his research moved toward diffusion models. With Tim Salimans, Ben Poole, and Jonathan Ho he co-authored Variational Diffusion Models, presented at the Conference on Neural Information Processing Systems in 2021, which analyzed diffusion based generative models through the lens of the variational lower bound and improved likelihood estimation ^[12]. He continued related work connecting diffusion training objectives to the evidence lower bound.

Where has Diederik Kingma worked?

Before his research career, Kingma co-founded a technology company named Advanza, where he served as a technical lead. The company was later acquired ^[5].

In 2015 Kingma was part of the founding team of OpenAI, where he worked as a research scientist ^[3]^[5]. He led the organization's algorithms team, which developed methods for generative AI models, and his work fed into systems such as DALL-E 3 and ChatGPT ^[3]. He left this role in 2018 ^[4]^[13].

In July 2018 Kingma joined Google as a research scientist at Google Brain ^[3]. Google Brain was merged with DeepMind to form Google DeepMind in 2023 ^[13]. At Google he worked on generative models, including research on text to image generation ^[3]^[14].

On 1 October 2024 Kingma announced that he had joined Anthropic ^[3]^[4]^[13]. Describing the move, he wrote that "Anthropic's approach to AI development resonates significantly with my own beliefs," and that he was "looking forward to contributing to Anthropic's mission of developing powerful AI systems responsibly" ^[3]. He arranged to work largely remotely from the Netherlands while making regular visits to the San Francisco Bay Area ^[3]^[13]. His move was reported as part of a broader pattern of early OpenAI figures joining the rival company ^[13]. As of 2026 his publicly listed role remains research on large-scale machine learning at Anthropic ^[5].

How influential is Diederik Kingma's work?

Kingma's research is among the most influential in contemporary machine learning, measured by both adoption and citation. The Adam paper has accumulated a citation count in the hundreds of thousands, placing it among the most cited scientific papers of any field ^[9]. Both of his best known papers have received Test of Time Awards from the International Conference on Learning Representations, the variational autoencoder paper in 2024 and the Adam paper in 2025 ^[7]^[8].

He has also received recognition for his doctoral work and contributions to European machine learning research, including a European doctoral fellowship from Google and other honors associated with his thesis ^[5]. The combination of widely used optimization methods and foundational generative modeling techniques has made his work a common reference point in both research and applied deep learning. The reparameterization trick in particular reappears across later generative models, including the diffusion models that became prominent in image and video generation, which gives his early work continued relevance well beyond the variational autoencoder itself.

Selected facts

Field	Detail
Full name	Diederik P. Kingma
Also known as	Durk Kingma
Nationality	Dutch
Research area	Machine learning, deep learning
PhD institution	University of Amsterdam
Doctoral advisor	Max Welling
PhD year	2017 (cum laude)
Known for	Adam optimizer, variational autoencoder, reparameterization trick, Glow
OpenAI	Founding team, research scientist, 2015 to 2018
Google	Research scientist, Google Brain and Google DeepMind, 2018 to 2024
Anthropic	Research scientist, large-scale machine learning, joined October 2024
Website	dpkingma.com

References

1. Kingma, Diederik P.; Ba, Jimmy. "Adam: A Method for Stochastic Optimization." arXiv:1412.6980, December 2014. Presented at ICLR 2015. https://arxiv.org/abs/1412.6980
2. Kingma, Diederik P.; Welling, Max. "Auto-Encoding Variational Bayes." arXiv:1312.6114, December 2013. Presented at ICLR 2014. https://arxiv.org/abs/1312.6114
3. "Anthropic hires OpenAI co-founder Durk Kingma." TechCrunch, 1 October 2024. https://techcrunch.com/2024/10/01/anthropic-hires-openai-co-founder-durk-kingma/
4. "OpenAI Cofounder Durk Kingma Joins Anthropic." Maginative, October 2024. https://www.maginative.com/article/openai-cofounder-durk-kingma-joins-anthropic/
5. "Diederik P. (Durk) Kingma." Personal website. https://dpkingma.com/
6. Kingma, Diederik P. "Variational Inference and Deep Learning: A New Synthesis." PhD thesis, University of Amsterdam, 2017.
7. "ICLR 2024 Test of Time Award." ICLR Blog, 7 May 2024. https://blog.iclr.cc/2024/05/07/iclr-2024-test-of-time-award/
8. "ICLR 2025 Test of Time Award: The Adam Optimiser and Attention." JoltML, 2025. https://joltml.com/iclr-2025/test-of-time-award/
9. "Adam: A Method for Stochastic Optimization." SciSpace paper record (citation count). https://scispace.com/papers/adam-a-method-for-stochastic-optimization-6rb5vm3utj
10. Kingma, Diederik P.; Dhariwal, Prafulla. "Glow: Generative Flow with Invertible 1x1 Convolutions." arXiv:1807.03039, July 2018. Presented at NeurIPS 2018. https://arxiv.org/abs/1807.03039
11. Kingma, Diederik P.; Salimans, Tim; Welling, Max. "Variational Dropout and the Local Reparameterization Trick." arXiv:1506.02557, 2015. Presented at NeurIPS 2015. https://arxiv.org/abs/1506.02557
12. Kingma, Diederik P.; Salimans, Tim; Poole, Ben; Ho, Jonathan. "Variational Diffusion Models." arXiv:2107.00630, 2021. Presented at NeurIPS 2021. https://arxiv.org/abs/2107.00630
13. "Another OpenAI founder moves to arch-rival Anthropic." The Register, 2 October 2024. https://www.theregister.com/2024/10/02/anthropic_hires_openai_founder_durk_kingma/
14. "Durk Kingma." Wikipedia. https://en.wikipedia.org/wiki/Durk_Kingma
