David Silver

Updated Jul 7, 2026

David Silver is a British computer scientist whose work has defined the modern field of deep reinforcement learning and computer game-playing. For more than a decade he served as a principal research scientist at Google DeepMind and as a professor of computer science at University College London (UCL), where he led the research programmes behind AlphaGo, AlphaZero, and MuZero: successive systems that achieved superhuman performance in Go, chess, shogi, and the Atari 2600 suite using progressively less hand-engineered knowledge.^[1]^[2]^[3]

Silver is best known as the technical lead of the AlphaGo project, which in March 2016 defeated 18-time world Go champion Lee Sedol four games to one at the Four Seasons Hotel in Seoul, an event watched by more than 200 million people and widely viewed as the most significant public demonstration of artificial intelligence since Deep Blue's 1997 victory over Garry Kasparov.^[4]^[5] He went on to lead the development of AlphaGo Zero, which mastered Go entirely through self-play; AlphaZero, which extended the same algorithm to chess and shogi; and MuZero, which removed the requirement that the system be given the rules of the game in advance.^[6]^[7]^[3] He was also a co-author of the 2015 Nature paper introducing the Deep Q-Network (DQN), the first deep reinforcement learning agent to achieve human-level performance on a broad range of Atari games.^[8]

For these contributions Silver received the 2019 ACM Prize in Computing, was elected a Fellow of the Royal Society in 2021, and was elected a Fellow of the Association for the Advancement of Artificial Intelligence in 2022.^[1]^[9]^[10] In November 2025 he co-founded the London-based AI startup Ineffable Intelligence and in January 2026 he left Google DeepMind to lead the company full time.^[11]^[12]

Key facts


Born	1976 (United Kingdom)^[10]
Education	BA, Christ's College, Cambridge (1997); MSc, Cambridge (2000); PhD, University of Alberta (2009)^[1]^[10]
Doctoral advisor	Richard S. Sutton^[10]
Thesis	Reinforcement Learning and Simulation-Based Search in Computer Go (2009)^[10]
Known for	DQN, AlphaGo, AlphaGo Zero, AlphaZero, MuZero, AlphaStar, "Reward is enough" hypothesis^[1]^[3]^[13]
Positions	Co-founder & CTO, Elixir Studios (1998-2005); Royal Society University Research Fellow, UCL (2011-); Principal research scientist, DeepMind/Google DeepMind (2013-2026); Professor of Computer Science, UCL; CEO/Director, Ineffable Intelligence (2026-)^[1]^[11]^[12]
Major awards	Royal Academy of Engineering Silver Medal (2017); Mensa Foundation Prize (2017); Marvin Minsky Medal (2018); 2019 ACM Prize in Computing; FRS (2021); AAAI Fellow (2022)^[14]^[9]^[1]^[10]
Citations	h-index of 104, i10-index of 182, and more than 313,000 total citations on Google Scholar as of 2026; among the most-cited researchers in artificial intelligence^[15]

Early life and education

Silver was born in the United Kingdom in 1976.^[10] He read computer science at Christ's College, Cambridge, where he was awarded the Addison-Wesley Prize on graduation in 1997.^[1]^[10] He completed an MSc at Cambridge in 2000 alongside his industry work.^[1]

In 2004 Silver returned to academia, enrolling at the University of Alberta to pursue a PhD in computer science under the supervision of Richard S. Sutton, one of the founders of modern reinforcement learning and co-author of the field's standard textbook.^[1]^[10] The university was at the time a leading centre for both reinforcement learning and computer Go, and home to the open-source Go program Fuego. Silver's doctoral research focused on combining reinforcement learning with simulation-based search to play the ancient board game of Go, a domain widely considered a "grand challenge" of artificial intelligence because of its enormous branching factor and the difficulty of writing a good evaluation function.^[10] His thesis, Reinforcement Learning and Simulation-Based Search in Computer Go, was defended in 2009 and established several of the algorithmic ideas (temporal-difference learning combined with Monte Carlo rollouts and policy/value function approximation) that he would later scale up at DeepMind.^[10]

Industry interlude: Elixir Studios

Between completing his Cambridge undergraduate degree and beginning doctoral work, Silver co-founded the British video game developer Elixir Studios in 1998 with university friend Demis Hassabis.^[1]^[10] At Elixir, Silver served as chief technology officer and lead programmer, responsible for game engine architecture and artificial intelligence systems for titles including Republic: The Revolution and Evil Genius.^[1] The company received several British Academy of Film and Television Arts (BAFTA) and Develop Industry Excellence awards for technology and innovation before closing in 2005.^[1] The collaboration with Hassabis would prove foundational: more than a decade later, Hassabis (then CEO of DeepMind) would recruit Silver as one of the company's earliest senior researchers.

Academic career: University College London

After completing his PhD, Silver took a faculty position at University College London. In 2011 he was awarded a Royal Society University Research Fellowship, a prestigious five-year (and renewable) award for early-career researchers in the UK; he formally joined UCL's Department of Computer Science as a lecturer the same year.^[16]^[1] He was later promoted to professor of computer science.^[9] His UCL appointment ran jointly with his DeepMind research role, an arrangement common among DeepMind's senior scientists.

At UCL, Silver taught the postgraduate course COMPM050 / COMPGI13 "Reinforcement Learning," which he delivered as a sequence of ten ninety-minute lectures during the 2015 academic year. Recorded video of the lecture series, covering Markov decision processes, dynamic programming, model-free prediction and control, function approximation, policy-gradient methods, integration of learning and planning, and exploration, was posted on YouTube under DeepMind's account and has since accumulated millions of views, becoming one of the most widely used introductions to reinforcement learning worldwide.^[17] The slides remain a standard reference for graduate courses in the subject at many universities.

What did David Silver do at Google DeepMind?

Silver began collaborating with DeepMind as a consultant in 2010, shortly after the company was founded in London by Demis Hassabis, Shane Legg, and Mustafa Suleyman.^[10] He joined full time in 2013, one of the company's earliest senior researchers, and led DeepMind's reinforcement learning research group through the company's 2014 acquisition by Google and its 2023 reorganisation as Google DeepMind.^[1]^[10] Across his tenure he was the driving figure behind DeepMind's game-playing programme, from DQN and AlphaGo through AlphaZero, MuZero, and AlphaStar, and he later contributed to the company's language-model and mathematical-reasoning work. In January 2026 he stepped down as a principal research scientist to focus on his startup Ineffable Intelligence.^[11]^[12]

Major research contributions

Deep Q-Networks (2013-2015)

Silver was a co-author of the foundational papers that launched the field of deep reinforcement learning. The first, a 2013 NeurIPS workshop paper by Volodymyr Mnih and colleagues titled "Playing Atari with Deep Reinforcement Learning," introduced the Deep Q-Network, a convolutional neural network trained with a variant of Q-learning to map raw pixel observations to action values.^[18] A more comprehensive version, "Human-level control through deep reinforcement learning," was published in Nature in February 2015 with Silver as third author.^[8] The system received only the screen pixels and game score and was evaluated on 49 Atari 2600 games using the same algorithm, network architecture, and hyperparameters in each; it reached a level comparable to that of a professional human games tester on more than half of them.^[8] The DQN paper is one of the most-cited works in the modern AI literature, with more than 43,000 citations on Google Scholar, and is credited with reigniting interest in reinforcement learning after a long period of relative quiet.^[8]^[15]
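The Q-learning idea behind DQN can be illustrated with a minimal sketch. It is not the published implementation: the real system evaluates action values with a convolutional network and adds further machinery, while here the values are plain floats. The function names, the discount factor, and the example transition are assumptions chosen for illustration.

```python
from typing import Sequence


def q_learning_target(reward: float, next_q_values: Sequence[float],
                      done: bool, gamma: float = 0.99) -> float:
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        # No future reward is expected after a terminal transition.
        return reward
    return reward + gamma * max(next_q_values)


def squared_td_error(q_value: float, target: float) -> float:
    """Squared gap between the current estimate Q(s, a) and the target."""
    return (target - q_value) ** 2


# Hypothetical transition: reward 1.0, two actions available in the next state.
target = q_learning_target(reward=1.0, next_q_values=[0.5, 2.0],
                           done=False, gamma=0.9)
loss = squared_td_error(q_value=2.0, target=target)
```

In a deep variant, `q_value` and `next_q_values` would come from a neural network applied to pixel observations, and the squared error would be minimised by gradient descent on the network's weights.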

AlphaGo (2014-2016)

The AlphaGo programme grew out of an earlier collaboration between Silver and his DeepMind colleague Aja Huang, a Taiwanese computer-Go researcher who had previously contributed to the leading Monte Carlo Go programs Erica and Crazy Stone. Beginning around 2013-2014 the pair led an effort to build a Go program capable of defeating top human players, a goal that had eluded researchers for two decades despite intense effort.^[24] The resulting system, AlphaGo, combined Monte Carlo tree search with deep convolutional policy and value networks that were trained on expert human games and then refined by self-play.^[2] In October 2015 an early version of AlphaGo defeated the European Go champion Fan Hui 5-0 in a closed match at DeepMind's London office, the first time a computer program had beaten a professional human player at full-size 19×19 Go without a handicap.^[2] The result was announced in January 2016 alongside the Nature paper "Mastering the game of Go with deep neural networks and tree search," on which Silver and Huang were joint first authors and Demis Hassabis was senior author.^[2]
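A small sketch shows how a policy network's move probabilities and a value estimate can be combined inside a tree search. It uses a PUCT-style selection rule of the kind described for the AlphaGo family, not DeepMind's code. The class and function names, the constant `c_puct = 1.5`, and the `+ 1` under the square root are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class EdgeStats:
    """Search statistics for one move from a position."""
    prior: float             # move probability suggested by the policy network
    visits: int = 0          # how many simulations have tried this move
    value_sum: float = 0.0   # sum of value estimates backed up through it

    @property
    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, EdgeStats], c_puct: float = 1.5) -> str:
    """Pick the move maximising mean value plus a prior-weighted bonus."""
    total_visits = sum(edge.visits for edge in edges.values())

    def score(edge: EdgeStats) -> float:
        # The bonus shrinks as a move is visited more often, so search
        # gradually shifts from the network's prior to observed values.
        # The +1 (an assumption) keeps priors relevant before any visit.
        bonus = c_puct * edge.prior * math.sqrt(total_visits + 1) / (1 + edge.visits)
        return edge.mean_value + bonus

    return max(edges, key=lambda move: score(edges[move]))


def backup(edge: EdgeStats, value: float) -> None:
    """Record the outcome of one simulation that passed through this move."""
    edge.visits += 1
    edge.value_sum += value


# Hypothetical position with two candidate moves.
edges = {"a": EdgeStats(prior=0.6, visits=10, value_sum=5.0),
         "b": EdgeStats(prior=0.4)}
first_choice = select_move(edges)    # the unvisited move gets a large bonus
backup(edges["b"], -1.0)             # a simulation through "b" turns out badly
second_choice = select_move(edges)   # search returns to the better-valued move
```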

The decisive public test came between 9 and 15 March 2016 at the Four Seasons Hotel in Seoul, where AlphaGo played a five-game match against the South Korean Go master Lee Sedol, then widely regarded as one of the strongest players of the previous decade. AlphaGo won 4-1 before an estimated global audience of 200 million viewers; the $1 million prize was donated to UNICEF and Go-related charities.^[4]^[5] The Korea Baduk Association awarded AlphaGo an honorary 9-dan diploma.^[5] A documentary film about the match, AlphaGo, was released in 2017.

AlphaGo Master and the Future of Go Summit (2017)

A successor system, internally called AlphaGo Master, played a series of online speed-Go games in early 2017 under the pseudonyms "Magister" and "Master," winning 60 consecutive games against top professionals.^[19] In May 2017, at the Future of Go Summit in Wuzhen, China, AlphaGo Master defeated Ke Jie, then the world's top-ranked player, 3-0 in a formal three-game match, and also won a team match against five leading Chinese professionals.^[19] The Chinese Weiqi Association awarded AlphaGo a professional 9-dan diploma. After the summit DeepMind announced that AlphaGo would retire from competitive play.

AlphaGo Zero (2017)

In October 2017 Silver led publication of a paper in Nature titled "Mastering the game of Go without human knowledge," introducing AlphaGo Zero.^[6] Unlike its predecessors, AlphaGo Zero was trained tabula rasa, without any human game records, opening books, or hand-crafted features beyond the rules of Go. It used a single neural network with a residual architecture that output both a policy and a value, together with a simplified tree search that did not rely on Monte Carlo rollouts. After three days of self-play training, AlphaGo Zero, playing on a single machine with four Google tensor processing units, defeated the version of AlphaGo that had beaten Lee Sedol by 100 games to 0.^[6] The paper argued that the main source of prior knowledge in earlier AlphaGo versions, the imitation of human expert play, was not only unnecessary but may have limited the system's strength.
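The combined policy-and-value training objective described in the AlphaGo Zero paper can be written compactly. The notation below is a sketch for orientation. The network with parameters \theta maps a position s to move probabilities \mathbf{p} and a value v; \boldsymbol{\pi} is the move distribution produced by the tree search, z is the final game outcome from the current player's perspective, and c is a regularisation weight.

```latex
(\mathbf{p}, v) = f_\theta(s), \qquad
\ell(\theta) = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c \,\lVert \theta \rVert^2
```

The first term pulls the value output towards the observed game result, the second pulls the policy output towards the search's improved move distribution, and the third discourages large weights.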

AlphaZero (2017-2018)

In December 2017 Silver and colleagues posted "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" on the arXiv preprint server, introducing AlphaZero, a generalisation of the AlphaGo Zero algorithm that learned the games of chess, shogi (Japanese chess), and Go using the same algorithm, network architecture, and hyperparameters in each.^[20] AlphaZero was given only the rules of each game and learned superhuman play purely from self-play. Within 9 hours of chess training it defeated Stockfish, then the strongest open-source chess engine; within 12 hours of shogi training it defeated Elmo, then the strongest shogi engine.^[20] The full peer-reviewed paper, "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," appeared in Science in December 2018.^[7] AlphaZero's distinctive playing style, characterised by long-term positional sacrifices and unconventional pawn structures, has had a lasting influence on top-level computer chess and on human opening preparation.

MuZero (2019-2020)

In November 2019 Silver and colleagues posted a preprint introducing MuZero, which extended the AlphaZero framework to settings where the rules of the environment are not known in advance.^[3] MuZero learns a model of the environment dynamics that is sufficient for planning, predicting the action-selection policy, the value function, and the immediate reward at each hypothetical future state, without ever attempting to reconstruct the full state of the environment. The peer-reviewed paper, "Mastering Atari, Go, chess and shogi by planning with a learned model," was published in Nature in December 2020.^[3] MuZero matched AlphaZero's superhuman performance on Go, chess, and shogi and simultaneously set a new state of the art on the Atari benchmark, the first time a single algorithm had achieved leading performance across both perfectly observed board games and the visually rich Atari domain.^[3]
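The structure described above, a learned model that predicts reward, policy, and value from an internal state rather than from the true environment state, can be sketched as three functions unrolled over a hypothetical action sequence. The names `represent`, `dynamics`, `predict`, and `unroll` and the toy arithmetic stand-ins are assumptions for illustration; in the real system each function is a trained neural network.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Hidden = Tuple[float, ...]  # internal state; need not resemble the real state


@dataclass
class LearnedModel:
    represent: Callable[[Sequence[float]], Hidden]            # observation -> hidden state
    dynamics: Callable[[Hidden, int], Tuple[Hidden, float]]   # (hidden, action) -> (next hidden, reward)
    predict: Callable[[Hidden], Tuple[List[float], float]]    # hidden -> (policy, value)


def unroll(model: LearnedModel, observation: Sequence[float],
           actions: Sequence[int]) -> List[Tuple[float, List[float], float]]:
    """Imagine a sequence of actions entirely inside the learned model."""
    hidden = model.represent(observation)
    outputs = []
    for action in actions:
        # The environment's rules are never consulted: the next hidden
        # state and the reward are both predictions of the model.
        hidden, reward = model.dynamics(hidden, action)
        policy, value = model.predict(hidden)
        outputs.append((reward, policy, value))
    return outputs


# Toy stand-ins (assumptions) so that the sketch runs end to end.
toy_model = LearnedModel(
    represent=lambda obs: (float(sum(obs)),),
    dynamics=lambda h, a: ((h[0] + a,), float(a)),
    predict=lambda h: ([0.5, 0.5], h[0]),
)
trajectory = unroll(toy_model, observation=[1.0, 2.0], actions=[1, 0])
```

A planner such as a tree search would call these functions in place of a simulator, which is why the rules of the game do not need to be supplied in advance.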

AlphaStar (2019)

Silver was a co-lead on AlphaStar, DeepMind's StarCraft II agent, which in 2019 became the first artificial system to reach Grandmaster level in the real-time strategy game across all three playable races. The work was published in Nature in October 2019 under the title "Grandmaster level in StarCraft II using multi-agent reinforcement learning."^[21] Unlike AlphaZero, which learned purely from self-play, AlphaStar combined imitation learning from human replays, multi-agent reinforcement learning in a structured "league" of agents, and policy distillation, and operated under human-like interface constraints (a limited actions-per-minute budget and a camera-restricted view).^[21]

"Reward is enough" hypothesis (2021)

In October 2021 Silver, together with Richard S. Sutton, Satinder Singh, and Doina Precup, published "Reward is enough" in the journal Artificial Intelligence.^[13] The paper articulated the hypothesis that "intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment," arguing that the suite of capabilities studied in natural and artificial intelligence (knowledge, learning, perception, social intelligence, language, generalisation, and imitation) can in principle emerge from reward maximisation alone in sufficiently rich environments.^[13] The paper has been one of the most discussed (and contested) theoretical statements in modern reinforcement learning and prompted a substantial response literature.

Later DeepMind work (2022-2025)

In the years following MuZero, Silver and his colleagues continued to develop reinforcement learning systems for increasingly open-ended domains. He contributed to DeepMind's Gemini language-model programme and to its mathematical-reasoning efforts. In July 2024 DeepMind announced that two systems, AlphaProof and AlphaGeometry 2, had together achieved a performance equivalent to a silver medal at the 2024 International Mathematical Olympiad, solving four of the six problems including the hardest.^[22] The peer-reviewed AlphaProof paper, "Olympiad-level formal mathematical reasoning with reinforcement learning," was published in Nature in 2025; AlphaProof applied an AlphaZero-style reinforcement learning loop to the formal proof assistant Lean.^[23] (For the systems themselves, see the pages on AlphaProof and AlphaGeometry.)

Research philosophy

Silver's research outlook is closely aligned with what is sometimes called the "Alberta school" of artificial intelligence, the tradition associated with his doctoral advisor Richard Sutton, which emphasises scalable computational methods that allow agents to learn directly from experience rather than from human-labelled data.^[10]^[12] Across DQN, the AlphaGo lineage, AlphaZero, MuZero, and the "Reward is enough" paper, Silver has consistently argued that the most general, and ultimately most powerful, route to advanced artificial intelligence is for an agent to maximise a scalar reward signal through interaction with its environment, discovering its own representations, strategies, and knowledge in the process.^[13]^[12] In a 2025 essay co-authored with Sutton, "Welcome to the Era of Experience" (discussed below), Silver argued that AI systems trained on human data are approaching the limits of what human knowledge can teach them, and that experience-based reinforcement learning is the more promising route to systems that can discover genuinely new knowledge.^[12]^[27] This view forms the explicit founding thesis of Ineffable Intelligence.^[11]^[12]

What is the "Era of Experience"?

"Welcome to the Era of Experience" is a widely read position paper that Silver co-authored with his former doctoral advisor Richard S. Sutton and released in April 2025 as a preprint of a chapter for the book Designing an Intelligence (MIT Press).^[27] The essay argues that AI is entering a new phase of development: following an era of simulation-based systems (exemplified by AlphaGo and AlphaZero) and an era built on human-generated data (exemplified by large language models), progress will increasingly come from agents that learn from their own experience. Silver and Sutton write that "experience will become the dominant medium of improvement and ultimately dwarf the scale of human data," and call for "agents to learn continually from their own experience, i.e., data that is generated by the agent interacting with its environment."^[27] They contend that the supply of high-quality human data in domains such as mathematics, coding, and science is being exhausted, and point to systems such as AlphaProof, which learns formal proofs through self-generated exploration rather than by imitating human solutions, as early evidence that experience-driven reinforcement learning can push past the ceiling of human imitation. The essay is the explicit intellectual foundation of Silver's company, Ineffable Intelligence.^[12]^[27]

What is Ineffable Intelligence?

Ineffable Intelligence is a London-based artificial intelligence company that Silver co-founded in November 2025.^[11] He stepped down from his principal research scientist role at Google DeepMind in January 2026 to lead the new company full time as its director and CEO.^[11]^[12] In April 2026 Ineffable Intelligence emerged from stealth with a seed round of approximately US$1.1 billion at a US$5.1 billion valuation, co-led by Sequoia and Lightspeed, with participation from Nvidia, DST Global, Index Ventures, Google, the British Business Bank, and the UK Sovereign AI Fund; at the time it was the largest seed financing in European venture capital history.^[11]^[28] Silver has described the company's mission as building "an endlessly learning superintelligence that self-discovers the foundations of all knowledge," positioning its bet on experience-driven reinforcement learning as a deliberate alternative to the human-imitation paradigm dominant in contemporary frontier large language models.^[12]^[28] He has framed the ambition in sweeping terms: "If successful, this will represent a scientific breakthrough of comparable magnitude to Darwin: where his law explained all Life, our law will explain and build all Intelligence."^[28] Silver has also pledged to donate the proceeds of his personal equity in the company to charity, stating that "any money that I make from Ineffable will go to high-impact charities that save as many lives as possible."^[28]

Is David Silver's reinforcement learning course available online?

Silver's UCL Reinforcement Learning lecture course (delivered in spring 2015 and recorded for public release) has become a canonical teaching resource for the field.^[17] The ten-lecture series, covering Markov decision processes, dynamic programming, Monte Carlo and temporal-difference methods, function approximation, policy gradients, integration of learning and planning, exploration and exploitation, and case studies including TD-Gammon and Atari DQN, is closely aligned with Sutton and Barto's textbook Reinforcement Learning: An Introduction but adds substantial material on deep RL.^[17] The full playlist is freely available on YouTube, and the accompanying slide deck is used in graduate machine-learning programmes worldwide.

The 2015 lectures predate the public announcement of AlphaGo by less than a year and present in lecture form many of the building blocks (temporal-difference learning, function approximation with deep networks, Monte Carlo tree search) that were subsequently assembled into the AlphaGo architecture.^[17]

What awards has David Silver won?

Silver's honours include:

2011: Royal Society University Research Fellowship^[16]
2017: Royal Academy of Engineering Silver Medal, awarded for an outstanding personal contribution to British engineering^[14]
2017: Mensa Foundation Prize^[9]
2018: Marvin Minsky Medal of the International Joint Conferences on Artificial Intelligence (IJCAI)^[9]
2019: ACM Prize in Computing, citation "for breakthrough advances in computer game-playing"^[1]^[9]
2021: Elected Fellow of the Royal Society (FRS)^[10]^[25]
2022: Elected Fellow of the Association for the Advancement of Artificial Intelligence (AAAI)^[10]

The ACM Prize in Computing carries a US$250,000 award funded by an endowment from Infosys.^[9] The ACM citation specifically credited Silver with "developing the AlphaGo algorithm" and with "fundamental contributions to deep reinforcement learning."^[1]^[9] Announcing the award, ACM President Cherri M. Pancake said, "Few other researchers have generated as much excitement in the AI field as David Silver."^[9]

Influence and reception

Silver's research has had a defining impact on the public perception of artificial intelligence in the 2010s and 2020s. The 2015 DQN paper, the 2016 AlphaGo paper, and the 2017 AlphaGo Zero paper appeared as cover stories in Nature and were among the most widely reported scientific results of their respective years. AlphaGo's victory over Lee Sedol in particular is routinely cited as a watershed moment for deep reinforcement learning, comparable in cultural impact to Deep Blue's 1997 defeat of Garry Kasparov in chess but considered technically more significant because of Go's much larger state space and the absence of a strong hand-engineered evaluation function.^[5]

AlphaZero's preference for long-term piece activity and its willingness to sacrifice material for positional gains or the initiative have been widely commented on by professional players and chess engine developers.^[7] In February 2022 DeepMind announced an application of MuZero to the VP9 video codec, in which the agent learned to make encoding decisions and reduced bitrate by roughly 4% at fixed quality on a portion of YouTube traffic, among the first published deployments of an AlphaZero-lineage algorithm on a non-game industrial problem.^[26]

The "Reward is enough" hypothesis remains contested. Critics, including Silver's co-authors on subsequent papers, have argued that reward maximisation alone may be insufficient when reward signals are sparse, ambiguous, or contested between agents, and that multi-objective formulations may be required. The debate has become one of the central theoretical questions of contemporary reinforcement learning research.^[13]

Selected publications

Mnih, V., Kavukcuoglu, K., Silver, D., et al. "Human-level control through deep reinforcement learning." Nature 518, 529-533 (2015).^[8]
Silver, D., Huang, A., Maddison, C. J., et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529, 484-489 (2016).^[2]
Silver, D., Schrittwieser, J., Simonyan, K., et al. "Mastering the game of Go without human knowledge." Nature 550, 354-359 (2017).^[6]
Silver, D., Hubert, T., Schrittwieser, J., et al. "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play." Science 362, 1140-1144 (2018).^[7]
Vinyals, O., Babuschkin, I., Czarnecki, W. M., et al. (incl. Silver, D.). "Grandmaster level in StarCraft II using multi-agent reinforcement learning." Nature 575, 350-354 (2019).^[21]
Schrittwieser, J., Antonoglou, I., Hubert, T., et al. (incl. Silver, D.). "Mastering Atari, Go, chess and shogi by planning with a learned model." Nature 588, 604-609 (2020).^[3]
Silver, D., Singh, S., Precup, D., Sutton, R. S. "Reward is enough." Artificial Intelligence 299, 103535 (2021).^[13]
Silver, D., Sutton, R. S. "Welcome to the Era of Experience." Preprint, April 2025 (chapter for Designing an Intelligence, MIT Press).^[27]

References

[1] ACM Press Release, "David Silver to receive 2019 ACM Prize in Computing," April 2020. https://awards.acm.org/about/2019-acm-prize
[2] Silver, D., Huang, A., et al. "Mastering the game of Go with deep neural networks and tree search." *Nature* 529, 484-489 (2016). https://www.nature.com/articles/nature16961
[3] Schrittwieser, J., Antonoglou, I., Hubert, T., et al. "Mastering Atari, Go, chess and shogi by planning with a learned model." *Nature* 588, 604-609 (2020). https://www.nature.com/articles/s41586-020-03051-4
[4] Google DeepMind blog, "AlphaGo's ultimate challenge: a five-game match against the legendary Lee Sedol." https://blog.google/technology/ai/alphagos-ultimate-challenge/
[5] Wikipedia, "AlphaGo versus Lee Sedol." https://en.wikipedia.org/wiki/AlphaGo_versus_Lee_Sedol
[6] Silver, D., Schrittwieser, J., et al. "Mastering the game of Go without human knowledge." *Nature* 550, 354-359 (2017). https://www.nature.com/articles/nature24270
[7] Silver, D., Hubert, T., Schrittwieser, J., et al. "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play." *Science* 362, 1140-1144 (2018). https://www.science.org/doi/10.1126/science.aar6404
[8] Mnih, V., Kavukcuoglu, K., Silver, D., et al. "Human-level control through deep reinforcement learning." *Nature* 518, 529-533 (2015). https://www.nature.com/articles/nature14236
[9] Eurekalert / ACM, "ACM Prize in Computing Awarded to AlphaGo Developer," 1 April 2020. https://www.eurekalert.org/news-releases/730838
[10] Wikipedia, "David Silver (computer scientist)." https://en.wikipedia.org/wiki/David_Silver_(computer_scientist)
[11] CNBC, "Ex-DeepMind David Silver raises $1.1 billion for AI startup Ineffable," 27 April 2026. https://www.cnbc.com/2026/04/27/deepmind-ineffable-intelligence-record-seed-funding-nvidia-google.html
[12] Fortune, "Exclusive: Google DeepMind researcher David Silver leaves to launch his own AI startup," 30 January 2026. https://fortune.com/2026/01/30/google-deepmind-ai-researcher-david-silver-leaves-to-found-ai-startup-ineffable-intelligence/
[13] Silver, D., Singh, S., Precup, D., Sutton, R. S. "Reward is enough." *Artificial Intelligence* 299 (2021). https://www.sciencedirect.com/science/article/pii/S0004370221000862
[14] Royal Academy of Engineering, Princess Royal Silver Medal previous winners. https://raeng.org.uk/programmes-and-prizes/prizes/princess-royal-silver-medal/previous-winners/
[15] Google Scholar profile, David Silver. https://scholar.google.com/citations?user=-8DNE4UAAAAJ
[16] David Silver personal site / biography. https://davidstarsilver.wordpress.com/about/
[17] DeepMind x UCL Introduction to Reinforcement Learning 2015 (YouTube playlist). https://www.youtube.com/playlist?list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ
[18] Mnih, V., Kavukcuoglu, K., Silver, D., et al. "Playing Atari with Deep Reinforcement Learning." arXiv:1312.5602 (2013). https://arxiv.org/abs/1312.5602
[19] Wikipedia, "AlphaGo versus Ke Jie." https://en.wikipedia.org/wiki/AlphaGo_versus_Ke_Jie
[20] Silver, D., Hubert, T., et al. "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm." arXiv:1712.01815 (2017). https://arxiv.org/abs/1712.01815
[21] Vinyals, O., Babuschkin, I., Czarnecki, W. M., et al. "Grandmaster level in StarCraft II using multi-agent reinforcement learning." *Nature* 575, 350-354 (2019). https://www.nature.com/articles/s41586-019-1724-z
[22] Google DeepMind blog, "AI achieves silver-medal standard solving International Mathematical Olympiad problems," July 2024. https://deepmind.google/blog/ai-solves-imo-problems-at-silver-medal-level/
[23] "Olympiad-level formal mathematical reasoning with reinforcement learning." *Nature* (2025). https://www.nature.com/articles/s41586-025-09833-y
[24] Wikipedia, "Aja Huang." https://en.wikipedia.org/wiki/Aja_Huang
[25] Royal Society, "Professor David Silver FRS." https://royalsociety.org/people/david-silver-35033/
[26] Google DeepMind blog, "MuZero's first step from research into the real world," February 2022. https://deepmind.google/discover/blog/muzeros-first-step-from-research-into-the-real-world/
[27] Silver, D., Sutton, R. S. "Welcome to the Era of Experience." Preprint, April 2025 (forthcoming chapter in *Designing an Intelligence*, MIT Press). https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
[28] TechCrunch, "DeepMind's David Silver just raised $1.1B to build an AI that learns without human data," 27 April 2026. https://techcrunch.com/2026/04/27/deepminds-david-silver-just-raised-1-1b-to-build-an-ai-that-learns-without-human-data/
