David Silver
Last reviewed
May 18, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v1 · 3,638 words
David Silver is a British computer scientist whose work has defined the modern field of deep reinforcement learning and computer game-playing. For more than a decade he served as a principal research scientist at Google DeepMind and as a professor of computer science at University College London (UCL), where he led the research programmes behind AlphaGo, AlphaZero, and MuZero — successive systems that achieved superhuman performance in Go, chess, shogi, and the Atari 2600 suite using progressively less hand-engineered knowledge.[^1][^2][^3]
Silver is best known as the technical lead of the AlphaGo project, which in March 2016 defeated 18-time world Go champion Lee Sedol four games to one at the Four Seasons Hotel in Seoul — an event widely viewed as the most significant public demonstration of artificial intelligence since Deep Blue's 1997 victory over Garry Kasparov.[^4][^5] He went on to lead the development of AlphaGo Zero, which mastered Go entirely through self-play; AlphaZero, which extended the same algorithm to chess and shogi; and MuZero, which removed the requirement that the system be given the rules of the game in advance.[^6][^7][^3] He was also a co-author of the 2015 Nature paper introducing the Deep Q-Network (DQN), the first deep reinforcement learning agent to achieve human-level performance on a broad range of Atari games.[^8]
For these contributions Silver received the 2019 ACM Prize in Computing, was elected a Fellow of the Royal Society in 2021, and was elected a Fellow of the Association for the Advancement of Artificial Intelligence in 2022.[^1][^9][^10] In November 2025 he co-founded the London-based AI startup Ineffable Intelligence and in January 2026 he left Google DeepMind to lead the company full time.[^11][^12]
| Field | Details |
|---|---|
| Born | 1976 (United Kingdom)[^10] |
| Education | BA, Christ's College, Cambridge (1997); MSc, Cambridge (2000); PhD, University of Alberta (2009)[^1][^10] |
| Doctoral advisor | Richard S. Sutton[^10] |
| Thesis | Reinforcement Learning and Simulation-Based Search in Computer Go (2009)[^10] |
| Known for | DQN, AlphaGo, AlphaGo Zero, AlphaZero, MuZero, AlphaStar, "Reward is enough" hypothesis[^1][^3][^13] |
| Positions | Co-founder & CTO, Elixir Studios (1998–2005); Royal Society University Research Fellow, UCL (2011–); Principal research scientist, DeepMind/Google DeepMind (2013–2026); Professor of Computer Science, UCL; CEO/Director, Ineffable Intelligence (2026–)[^1][^11][^12] |
| Major awards | Royal Academy of Engineering Silver Medal (2017); Mensa Foundation Prize; Marvin Minsky Medal (2018); 2019 ACM Prize in Computing; FRS (2021); AAAI Fellow (2022)[^14][^9][^1][^10] |
| Citations | h-index >100; among the most-cited researchers in artificial intelligence[^15] |
Silver was born in the United Kingdom in 1976.[^10] He read computer science at Christ's College, Cambridge, where he was awarded the Addison-Wesley Prize on graduation in 1997.[^1][^10] He completed an MSc at Cambridge in 2000 alongside his industry work.[^1]
In 2004, after several years in the video game industry, Silver returned to academia, enrolling at the University of Alberta to pursue a PhD in computer science under the supervision of Richard S. Sutton, one of the founders of modern reinforcement learning and co-author of the field's standard textbook.[^1][^10] Alberta was at the time a leading centre for both reinforcement learning and computer Go, and its Go group developed the open-source Fuego program. Silver's doctoral research focused on combining reinforcement learning with simulation-based search to play Go, a domain widely considered a "grand challenge" of artificial intelligence because of its enormous branching factor and the difficulty of writing a good evaluation function.[^10] His thesis, Reinforcement Learning and Simulation-Based Search in Computer Go, was defended in 2009 and developed several algorithmic ideas, notably temporal-difference learning combined with Monte Carlo simulation and function approximation of policies and values, that he would later scale up at DeepMind.[^10]
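Temporal-difference learning, one of the ideas the thesis combined with simulation-based search, nudges a value estimate toward a bootstrapped target after every step rather than waiting for the final result. The following is a minimal, hypothetical tabular TD(0) sketch applied to one simulated episode; the state names, the step size `alpha`, the discount `gamma`, and the function name `td0_update` are illustrative assumptions and do not reproduce the thesis's Go-specific function approximators.

```python
from typing import Dict, List, Tuple


def td0_update(
    values: Dict[str, float],
    episode: List[Tuple[str, float, str]],
    alpha: float = 0.1,
    gamma: float = 1.0,
    terminal: str = "END",
) -> Dict[str, float]:
    """Apply TD(0) updates in place along one simulated episode.

    Each transition is (state, reward, next_state); the terminal state has value 0.
    """
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state == terminal else values.get(next_state, 0.0)
        td_target = reward + gamma * next_value          # bootstrapped target
        td_error = td_target - values.get(state, 0.0)    # how wrong the estimate was
        values[state] = values.get(state, 0.0) + alpha * td_error
    return values


# Hypothetical simulated episode: A -> B -> END, with a win (reward 1) at the end.
v = td0_update({}, [("A", 0.0, "B"), ("B", 1.0, "END")], alpha=0.5)
print(v)  # {'A': 0.0, 'B': 0.5}
```

Running further simulated episodes propagates the win back to earlier states: a second pass over the same episode moves `A` to 0.25 and `B` to 0.75.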
Between completing his Cambridge undergraduate degree and beginning doctoral work, Silver co-founded the British video game developer Elixir Studios in 1998 with university friend Demis Hassabis.[^1][^10] At Elixir, Silver served as chief technology officer and lead programmer, responsible for game engine architecture and artificial intelligence systems for titles including Republic: The Revolution and Evil Genius.[^1] The company received several British Academy of Film and Television Arts (BAFTA) and Develop Industry Excellence awards for technology and innovation before closing in 2005.[^1] The collaboration with Hassabis would prove foundational: more than a decade later, Hassabis (then CEO of DeepMind) would recruit Silver as one of the company's earliest senior researchers.
After completing his PhD, Silver took a faculty position at University College London. In 2011 he was awarded a Royal Society University Research Fellowship, a prestigious five-year (and renewable) award for early-career researchers in the UK; he formally joined UCL's Department of Computer Science as a lecturer the same year.[^16][^1] He was later promoted to professor of computer science.[^9] His UCL appointment ran jointly with his DeepMind research role, an arrangement common among DeepMind's senior scientists.
At UCL, Silver taught the postgraduate course COMPM050 / COMPGI13 "Reinforcement Learning," which he delivered as a sequence of ten ninety-minute lectures during the 2015 academic year. Recorded video of the lecture series — covering Markov decision processes, dynamic programming, model-free prediction and control, function approximation, policy-gradient methods, integration of learning and planning, and exploration — was posted on YouTube under DeepMind's account and has since accumulated millions of views, becoming one of the most widely used introductions to reinforcement learning worldwide.[^17] The slides remain a standard reference for graduate courses in the subject at many universities.
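Dynamic programming, one of the early topics in the lecture series, computes optimal state values by repeatedly applying the Bellman optimality backup. Below is a minimal value-iteration sketch on a made-up two-state MDP; the states, actions, rewards, discount factor, and the `value_iteration` helper are hypothetical and not taken from the course slides.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: mdp[state][action] = list of (probability, next_state, reward).
MDP = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-10) -> Dict[str, float]:
    """Sweep the Bellman optimality backup over all states until values stop changing."""
    values = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s, actions in mdp.items():
            # Best expected one-step reward plus discounted value of the successor.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


toy_mdp: MDP = {
    "low": {"wait": [(1.0, "low", 0.0)], "work": [(1.0, "high", 1.0)]},
    "high": {"wait": [(1.0, "high", 2.0)], "work": [(1.0, "low", 0.0)]},
}
print({s: round(v, 6) for s, v in value_iteration(toy_mdp).items()})  # {'low': 19.0, 'high': 20.0}
```

The result matches the hand calculation: staying in `high` earns 2 / (1 − 0.9) = 20, and working from `low` earns 1 + 0.9 × 20 = 19.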
Silver began collaborating with DeepMind as a consultant in 2010, shortly after the company was founded in London by Demis Hassabis, Shane Legg, and Mustafa Suleyman.[^10] He joined full time in 2013 — one of the company's earliest senior researchers — and led DeepMind's reinforcement learning research group through the company's 2014 acquisition by Google and its 2023 reorganisation as Google DeepMind.[^1][^10] In January 2026 he stepped down as a principal research scientist to focus on his startup Ineffable Intelligence.[^11][^12]
Silver was a co-author of the foundational work that launched the field of deep reinforcement learning. The first version, "Playing Atari with Deep Reinforcement Learning," presented by Volodymyr Mnih and colleagues at the NIPS (now NeurIPS) 2013 Deep Learning Workshop, introduced the Deep Q-Network: a convolutional neural network trained with Q-learning to map raw pixel observations to action values.[^18] A more comprehensive version, "Human-level control through deep reinforcement learning," was published in Nature in February 2015 with Silver as one of the senior co-authors.[^8] The system received only the screen pixels and game score and, using the same algorithm, network architecture, and hyperparameters in each game, learned to play 49 Atari 2600 games at a level comparable to a professional human games tester across the set, exceeding 75% of the tester's score on more than half of them.[^8] The DQN paper is one of the most-cited works in the modern AI literature and is credited with reigniting interest in reinforcement learning after a long period of relative quiet.[^8]
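The Nature version of DQN stabilised Q-learning with two devices: minibatches sampled from an experience-replay buffer, and regression targets computed with a periodically updated copy of the network (the target network). The framework-free sketch below shows only the target and loss computation, with plain lists standing in for network outputs; the function names, the discount factor, and the Huber threshold of 1 are illustrative assumptions rather than the paper's code.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """y = r if the episode ended, else r + gamma * max_a' Q_target(s', a')."""
    return [
        r if done else r + gamma * max(q_next)
        for r, q_next, done in zip(rewards, next_q_target, dones)
    ]


def huber(error: float, delta: float = 1.0) -> float:
    """Quadratic for small errors, linear beyond |error| > delta (bounds the gradient)."""
    a = abs(error)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def dqn_loss(q_taken: Sequence[float], targets: Sequence[float]) -> float:
    """Mean Huber loss between Q(s, a) for the actions actually taken and the targets."""
    return sum(huber(q - y) for q, y in zip(q_taken, targets)) / len(targets)


# Hypothetical minibatch: two transitions, two actions each; the second ends the episode.
ys = dqn_targets(rewards=[1.0, 0.0], next_q_target=[[0.5, 2.0], [3.0, 1.0]],
                 dones=[False, True], gamma=0.5)
print(ys)                        # [2.0, 0.0]
print(dqn_loss([1.5, 0.0], ys))  # 0.0625
```

Because the targets come from the frozen copy rather than the network being trained, they do not shift with every gradient step, which is the motivation usually given for the target network.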
The AlphaGo programme grew out of an earlier collaboration between Silver and his DeepMind colleague Aja Huang, a Taiwanese computer-Go researcher who had previously contributed to the leading Monte-Carlo Go programmes Erica and Crazy Stone. Beginning around 2013–2014 the pair led an effort to build a Go program capable of defeating top human players, a challenge that had eluded researchers for two decades despite intense effort.[^24] The resulting system, AlphaGo, combined Monte Carlo tree search with two deep convolutional networks: a policy network trained first on expert human games and then refined by self-play reinforcement learning, and a value network trained to predict the winners of self-play games.[^2] The original Nature paper, "Mastering the game of Go with deep neural networks and tree search," appeared in January 2016 with Silver and Huang as joint first authors and Demis Hassabis as senior author.[^2] In October 2015 a preliminary version of AlphaGo defeated the European Go champion Fan Hui 5-0 in a closed match at DeepMind's London office, the first time a computer program had beaten a professional human player at full-size 19×19 Go without a handicap.[^2]
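Inside the tree search, the networks' outputs steer which moves get explored. The sketch below is a minimal, hypothetical version of a PUCT-style selection rule of the kind described in the AlphaGo papers: pick the move maximising Q(s, a) + c·P(s, a)·√(Σ N) / (1 + N(s, a)), so that high-prior, rarely visited moves receive an exploration bonus. The `Edge` class, the constant `c_puct = 1.5`, and the example numbers are assumptions; a full search would also expand leaves, evaluate them with the networks, and back values up the tree.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float             # P(s, a) from the policy network
    visits: int = 0          # N(s, a)
    value_sum: float = 0.0   # sum of evaluations backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited edges default to 0.
        return self.value_sum / self.visits if self.visits else 0.0


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move with the highest Q + U score."""
    total = sum(e.visits for e in edges.values())

    def score(e: Edge) -> float:
        u = c_puct * e.prior * math.sqrt(total) / (1 + e.visits)  # exploration bonus
        return e.q + u

    return max(edges, key=lambda m: score(edges[m]))


def backup(edge: Edge, value: float) -> None:
    """Record one simulation's evaluation, taken from the perspective of the player to move."""
    edge.visits += 1
    edge.value_sum += value


# Hypothetical root position with three candidate moves.
root = {
    "a": Edge(prior=0.6, visits=10, value_sum=4.0),
    "b": Edge(prior=0.3, visits=1, value_sum=0.7),
    "c": Edge(prior=0.1),
}
print(select_move(root))  # b
```

Here move `b` wins despite a lower prior than `a`: its single visit looked promising (Q = 0.7) and its bonus is still large, whereas `a` has already been explored heavily.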
The decisive public test came between 9 and 15 March 2016 at the Four Seasons Hotel in Seoul, where AlphaGo played a five-game match against the South Korean Go master Lee Sedol, then widely regarded as one of the strongest players of the previous decade. AlphaGo won 4-1 in front of a global audience of an estimated 200 million viewers; the $1 million prize was donated to UNICEF and Go-related charities.[^4][^5] DeepMind's documentary film about the match, AlphaGo, was released in 2017. The Korea Baduk Association awarded AlphaGo an honorary 9-dan diploma.[^5]
A successor system, internally called AlphaGo Master, played a series of online speed-Go games in early 2017 under the pseudonyms "Magister" and "Master," winning 60 consecutive games against top professionals.[^19] In May 2017, at the Future of Go Summit in Wuzhen, China, AlphaGo Master defeated Ke Jie — then the world's top-ranked player — 3-0 in a formal three-game match, and won a "pair Go" exhibition.[^19] The Chinese Weiqi Association awarded AlphaGo a professional 9-dan diploma. After the summit DeepMind announced that AlphaGo would retire from competitive play.
In October 2017 Silver was lead author of a paper in Nature titled "Mastering the game of Go without human knowledge," which introduced AlphaGo Zero.[^6] Unlike its predecessors, AlphaGo Zero was trained tabula rasa, without any human game records, opening books, or hand-crafted features beyond the rules of Go. It used a single residual neural network that output both move probabilities and a value estimate, and a simpler tree search that relied on this network alone, without Monte Carlo rollouts. After three days of self-play training, AlphaGo Zero, running on a single machine with four Google tensor processing units, defeated the version of AlphaGo that had beaten Lee Sedol by 100 games to 0.[^6] The paper argued that the most consequential prior knowledge in earlier AlphaGo versions, the imitation of human expert play, had been not just unnecessary but actively limiting.
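The single network's two outputs are trained jointly. As given in the AlphaGo Zero paper (notation lightly adapted here), the network $f_\theta$ maps a position $s$ to move probabilities $\mathbf{p}$ and a scalar value $v$, and is trained to match the search's move distribution $\boldsymbol{\pi}$ and the eventual game outcome $z \in \{-1, +1\}$:

```latex
(\mathbf{p}, v) = f_\theta(s), \qquad
\ell = (z - v)^2 \;-\; \boldsymbol{\pi}^{\top} \log \mathbf{p} \;+\; c\,\lVert \theta \rVert^2
```

The first term is a squared error on the predicted outcome, the second is the cross-entropy between the search policy and the network policy, and the third is L2 weight regularisation with coefficient $c$.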
In December 2017 Silver and colleagues posted "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" on the arXiv preprint server, introducing AlphaZero, a generalisation of the AlphaGo Zero algorithm that learned the games of chess, shogi (Japanese chess), and Go using the same algorithm, network architecture, and hyperparameters in each.[^20] AlphaZero was given only the rules of each game and learned superhuman play purely from self-play. Within 9 hours of chess training it defeated Stockfish, then the strongest open-source chess engine; within 12 hours of shogi training it defeated Elmo, then the strongest shogi engine.[^20] The full peer-reviewed paper, "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," appeared in Science in December 2018.[^7] AlphaZero's distinctive playing style — characterised by long-term positional sacrifices and unconventional pawn structures — has had a lasting influence on top-level computer chess and on human opening preparation.
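Both AlphaGo Zero and AlphaZero turn the search's root visit counts into the policy target π and into the move actually played during self-play. The minimal sketch below assumes the temperature form π(a) ∝ N(s, a)^(1/τ), where τ = 1 keeps exploration and τ → 0 becomes greedy; the function names, the chess moves, and the visit counts are hypothetical.

```python
import random
from typing import Dict, Optional


def visit_policy(visits: Dict[str, int], tau: float = 1.0) -> Dict[str, float]:
    """pi(a) proportional to N(s, a) ** (1 / tau); tau == 0 gives a greedy one-hot policy."""
    if tau == 0:
        best = max(visits, key=visits.get)
        return {a: (1.0 if a == best else 0.0) for a in visits}
    powered = {a: n ** (1.0 / tau) for a, n in visits.items()}
    total = sum(powered.values())
    return {a: w / total for a, w in powered.items()}


def choose_move(visits: Dict[str, int], tau: float, rng: Optional[random.Random] = None) -> str:
    """Sample the move to play from the visit-count policy."""
    rng = rng or random.Random()
    pi = visit_policy(visits, tau)
    moves = list(pi)
    return rng.choices(moves, weights=[pi[m] for m in moves], k=1)[0]


counts = {"e4": 60, "d4": 30, "c4": 10}
print(visit_policy(counts, tau=1.0))  # {'e4': 0.6, 'd4': 0.3, 'c4': 0.1}
print(visit_policy(counts, tau=0))    # {'e4': 1.0, 'd4': 0.0, 'c4': 0.0}
```

Sampling at τ = 1 early in a game diversifies the self-play data, while a low temperature later makes play closer to the search's best move.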
In November 2019 Silver and colleagues posted a preprint introducing MuZero, which extended the AlphaZero framework to settings where the rules of the environment are not known in advance.[^3] MuZero learns a model of the environment dynamics that is sufficient for planning — predicting the action-selection policy, the value function, and the immediate reward at each hypothetical future state — without ever attempting to reconstruct the full state of the environment. The peer-reviewed paper, "Mastering Atari, Go, chess and shogi by planning with a learned model," was published in Nature in December 2020.[^3] MuZero matched AlphaZero's superhuman performance on Go, chess, and shogi and simultaneously set a new state of the art on the Atari benchmark, the first time a single algorithm had achieved leading performance across both perfectly observed board games and the visually rich Atari domain.[^3]
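MuZero's learned model is usually described as three functions: a representation function h that encodes observations into a hidden state, a dynamics function g that maps a hidden state and an action to a predicted reward and the next hidden state, and a prediction function f that outputs a policy and a value from a hidden state. The sketch below substitutes hypothetical hand-written toy functions for the trained networks, purely to show how planning unrolls the model in latent space without reconstructing the real environment state; `unroll`, `toy_h`, `toy_g`, and `toy_f` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

Hidden = Tuple[float, ...]


def unroll(
    h: Callable[[Sequence[float]], Hidden],
    g: Callable[[Hidden, int], Tuple[float, Hidden]],
    f: Callable[[Hidden], Tuple[List[float], float]],
    observation: Sequence[float],
    actions: Sequence[int],
) -> List[Tuple[List[float], float, float]]:
    """Return (policy, value, reward) for the root and each imagined step.

    The root has no predicted reward, so 0.0 is used as a placeholder.
    """
    state = h(observation)
    policy, value = f(state)
    outputs = [(policy, value, 0.0)]
    for a in actions:
        reward, state = g(state, a)  # imagined transition, never touching the real environment
        policy, value = f(state)
        outputs.append((policy, value, reward))
    return outputs


# Hypothetical toy stand-ins for the three learned networks.
def toy_h(obs: Sequence[float]) -> Hidden:
    return (sum(obs),)


def toy_g(state: Hidden, action: int) -> Tuple[float, Hidden]:
    nxt = (state[0] + (1.0 if action == 1 else -1.0),)
    return (1.0 if action == 1 else 0.0), nxt


def toy_f(state: Hidden) -> Tuple[List[float], float]:
    return [0.5, 0.5], state[0] / 10.0


steps = unroll(toy_h, toy_g, toy_f, observation=[0.2, 0.3], actions=[1, 1, 0])
for policy, value, reward in steps:
    print(policy, round(value, 2), reward)
```

In the real system these per-step predictions are trained against observed rewards, search policies, and value targets, which is what lets the hidden state stay abstract.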
Silver was a co-lead on AlphaStar, DeepMind's StarCraft II agent, which in 2019 became the first artificial system to reach Grandmaster level in the real-time strategy game across all three playable races. The work was published in Nature in October 2019 under the title "Grandmaster level in StarCraft II using multi-agent reinforcement learning."[^21] Unlike AlphaGo and AlphaZero, AlphaStar combined imitation learning from human replays, multi-agent reinforcement learning in a structured "league" of agents, and policy distillation, and operated under human-like interface constraints (a limited action-per-minute budget and a camera-restricted view).[^21]
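One ingredient of the league is how a learning agent chooses its opponents. The AlphaStar paper describes prioritised fictitious self-play, in which league members are sampled with probability weighted by a function of the learner's estimated win rate against each. The sketch below assumes a weighting of the form (1 − x)^p, which favours opponents the learner usually loses to; the exponent p = 2, the agent names, the win rates, and the helper functions are hypothetical.

```python
import random
from typing import Dict, Optional


def pfsp_weights(win_rates: Dict[str, float], p: float = 2.0) -> Dict[str, float]:
    """Normalised matchmaking probabilities from win rates x = P(learner beats opponent).

    The weight (1 - x) ** p favours opponents the learner rarely beats.
    """
    raw = {opp: (1.0 - x) ** p for opp, x in win_rates.items()}
    total = sum(raw.values())
    if total == 0:
        # The learner beats everyone: fall back to uniform sampling.
        return {opp: 1.0 / len(raw) for opp in raw}
    return {opp: w / total for opp, w in raw.items()}


def pick_opponent(win_rates: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Sample one opponent according to the prioritised weights."""
    rng = rng or random.Random()
    probs = pfsp_weights(win_rates)
    opponents = list(probs)
    return rng.choices(opponents, weights=[probs[o] for o in opponents], k=1)[0]


# Hypothetical league: estimated win rates of the current learner against three agents.
league = {"old_agent": 0.9, "peer_agent": 0.5, "exploiter": 0.2}
print(pfsp_weights(league))
```

With these numbers the learner spends most of its games against the agent that beats it most often, rather than repeatedly farming wins against an outdated version of itself.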
In October 2021 Silver — together with Richard S. Sutton, Satinder Singh, and Doina Precup — published "Reward is Enough" in the journal Artificial Intelligence.[^13] The paper articulated the hypothesis that "intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment," arguing that the suite of capabilities studied in natural and artificial intelligence — knowledge, learning, perception, social intelligence, language, generalisation, and imitation — can in principle emerge from reward maximisation alone in sufficiently rich environments.[^13] The paper has been one of the most discussed (and contested) theoretical statements in modern reinforcement learning and prompted a substantial response literature.
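In the standard reinforcement-learning formalisation used throughout Sutton and Barto's textbook, "maximisation of reward" means choosing a policy $\pi$ that maximises the expected return $G_t$, the discounted sum of future rewards. The discount factor $\gamma$ is a modelling choice, and the paper frames its hypothesis more broadly than any single objective of this form:

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ G_t \right], \qquad 0 \le \gamma < 1
```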
In the years following MuZero, Silver and his colleagues continued to develop reinforcement learning systems for increasingly open-ended domains. He contributed to DeepMind's Gemini language-model programme and to its mathematical-reasoning efforts. In July 2024 DeepMind announced that two systems, AlphaProof and AlphaGeometry 2, had together achieved a performance equivalent to a silver medal at the 2024 International Mathematical Olympiad, solving four of the six problems including the hardest.[^22] The peer-reviewed AlphaProof paper, "Olympiad-level formal mathematical reasoning with reinforcement learning," was published in Nature in 2025; AlphaProof applied an AlphaZero-style reinforcement learning loop to the formal proof assistant Lean.[^23] (For the systems themselves, see AlphaProof and AlphaGeometry.)
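AlphaProof works with statements and proofs written in Lean, where every step of a proof is checked mechanically by the proof assistant. For readers unfamiliar with formal mathematics, the toy Lean 4 example below (nothing like an olympiad problem, and not taken from AlphaProof) shows what a machine-checkable statement and proof look like; the theorem names are illustrative.

```lean
-- A formal statement: addition of natural numbers is commutative.
-- The proof term `Nat.add_comm a b` is checked by Lean's kernel.
theorem toy_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A tactic-style proof of a simple inequality, closed by the `omega` decision procedure.
theorem toy_lt (n : Nat) (h : 2 < n) : 1 < n := by
  omega
```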
Silver's research outlook is closely aligned with what is sometimes called the "Alberta school" of artificial intelligence — the tradition associated with his doctoral advisor Richard Sutton, which emphasises scalable computational methods that allow agents to learn directly from experience rather than from human-labelled data.[^10][^12] Across DQN, the AlphaGo lineage, AlphaZero, MuZero, and the "Reward is enough" paper, Silver has consistently argued that the most general — and ultimately most powerful — route to advanced artificial intelligence is for an agent to maximise a scalar reward signal through interaction with its environment, discovering its own representations, strategies, and knowledge in the process.[^13][^12] In a 2025 essay co-authored with Sutton, "Welcome to the Era of Experience," Silver argued that large language models trained on human data are inherently limited to remixing existing human knowledge, and that experience-based reinforcement learning offers the only path to systems that can discover genuinely new knowledge.[^12] This view forms the explicit founding thesis of Ineffable Intelligence.[^11][^12]
In November 2025 Silver co-founded Ineffable Intelligence, a London-based artificial intelligence company.[^11] He stepped down from his principal research scientist role at Google DeepMind in January 2026 to lead the new company full time as director and CEO.[^11][^12] In April 2026 Ineffable Intelligence announced a seed round of approximately US$1.1 billion at a US$5.1 billion valuation, co-led by Sequoia and Lightspeed and including participation from Nvidia, DST Global, Google, and the UK Sovereign AI Fund — at the time the largest seed financing in European venture capital history.[^11][^12] Silver has described the company's mission as building "an endlessly learning superintelligence that self-discovers the foundations of all knowledge," positioning the firm's bet on experience-driven reinforcement learning as a deliberate alternative to the human-imitation paradigm dominant in contemporary frontier large language models.[^12]
Silver's UCL Reinforcement Learning lecture course (delivered in spring 2015 and recorded for public release) has become a canonical teaching resource for the field.[^17] The ten-lecture series — covering Markov decision processes, dynamic programming, Monte Carlo and temporal-difference methods, function approximation, policy gradients, integration of learning and planning, exploration and exploitation, and case studies including TD-Gammon and Atari DQN — is closely aligned with Sutton and Barto's textbook Reinforcement Learning: An Introduction but adds substantial material on deep RL.[^17] The accompanying slide deck is freely available and is used in graduate machine-learning programmes worldwide.
The 2015 lectures predate the public announcement of AlphaGo by less than a year and present in lecture form many of the building blocks — temporal-difference learning, function approximation with deep networks, Monte Carlo tree search — that were subsequently assembled into the AlphaGo architecture.[^17]
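Silver's honours include:

- Addison-Wesley Prize (Cambridge, 1997)
- Royal Society University Research Fellowship (2011)
- Royal Academy of Engineering Silver Medal (2017)
- Mensa Foundation Prize
- Marvin Minsky Medal (2018)
- ACM Prize in Computing (2019)
- Fellow of the Royal Society (2021)
- Fellow of the Association for the Advancement of Artificial Intelligence (2022)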
The ACM citation specifically credited Silver with "developing the AlphaGo algorithm" and with "fundamental contributions to deep reinforcement learning."[^1][^9]
Silver's research has had a defining impact on the public perception of artificial intelligence in the 2010s and 2020s. The 2015 DQN paper, the 2016 AlphaGo paper, and the 2017 AlphaGo Zero paper appeared as cover stories in Nature and were among the most widely reported scientific results of their respective years. AlphaGo's victory over Lee Sedol in particular is routinely cited as a watershed moment for deep reinforcement learning, comparable in cultural impact to Deep Blue's 1997 defeat of Garry Kasparov in chess but considered technically more significant because of Go's much larger state space and the absence of a strong hand-engineered evaluation function.[^5]
AlphaZero's preference for long-term piece activity and its willingness to sacrifice material for positional or initiative gains have been widely commented on by professional players and chess-engine developers.[^7]

In February 2022 DeepMind announced an extension of MuZero, "MuZero VP9," that learned to make encoding decisions for the VP9 video codec and reduced bitrate by roughly 4% at fixed quality on portions of YouTube traffic, among the first published deployments of an AlphaZero-lineage algorithm to a non-game industrial problem.[^26]
The "Reward is enough" hypothesis remains contested. Critics, including Silver's co-authors on subsequent papers, have argued that reward maximisation alone may be insufficient when reward signals are sparse, ambiguous, or contested between agents — and that multiobjective formulations may be required. The debate has nonetheless become one of the central theoretical questions of contemporary reinforcement learning research.[^13]