AlphaGo is a computer program developed by DeepMind that plays the board game Go. It was the first program to defeat a professional human Go player on a full-sized 19x19 board without a handicap, and the first to beat a 9-dan professional, the highest rank in Go. AlphaGo combines deep neural networks with Monte Carlo tree search (MCTS), using a blend of supervised learning from human expert games and reinforcement learning from games played against itself [1]. The program's victories over some of the strongest Go players in history between 2015 and 2017 marked a turning point in artificial intelligence, demonstrating that machines could master a game long considered too complex and intuitive for computers to play at a professional level.
Go, which originated in China over 2,500 years ago, presents a far greater search space than chess. A standard 19x19 Go board yields roughly 2.1 x 10^170 possible board positions, compared to approximately 10^47 in chess [2]. The game's enormous branching factor (about 250 legal moves per turn, versus roughly 35 in chess) and the difficulty of evaluating board positions made Go a grand challenge for AI researchers for decades. Before AlphaGo, the strongest Go programs played at the level of a weak amateur.
AlphaGo was created at DeepMind, a British AI research lab founded in 2010 by Demis Hassabis, Shane Legg, and Mustafa Suleyman. Google acquired DeepMind in January 2014 for approximately 400 million British pounds, making it one of Google's largest European acquisitions at the time [3]. Under Google's umbrella, DeepMind continued operating as a semi-independent research lab with a stated mission of "solving intelligence."
The AlphaGo project grew out of DeepMind's broader interest in combining deep learning with reinforcement learning to solve complex sequential decision-making problems. The team was led by David Silver, a researcher specializing in reinforcement learning, alongside Aja Huang and a group of engineers and scientists at DeepMind. Demis Hassabis, himself a former child chess prodigy and game designer, saw Go as the ideal testbed: it was well-defined enough to measure progress objectively, but complex enough to require genuine breakthroughs in AI.
The original AlphaGo paper, "Mastering the game of Go with deep neural networks and tree search," was published in Nature on January 27, 2016, by David Silver, Aja Huang, and colleagues [1]. The paper described how AlphaGo combined two deep convolutional neural networks (a policy network and a value network) with Monte Carlo tree search to achieve superhuman performance.
AlphaGo's architecture integrates several components that work together during gameplay. At a high level, deep neural networks handle pattern recognition and position evaluation, while Monte Carlo tree search orchestrates the actual move selection during a game. The interplay between these components is what gave AlphaGo its strength.
The Go board is represented as a 19x19 grid. AlphaGo encodes the board state as a set of 48 feature planes, each of size 19x19. These feature planes capture various aspects of the position, including the locations of black and white stones, liberties (open adjacent points for groups of stones), capture status, legality of moves (including the ko rule), and turn information. This multi-channel representation allows the neural networks to process the board in a way analogous to how a convolutional neural network processes the color channels of an image [1].
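For illustration, the following Python sketch (using NumPy) shows the idea of stacking binary 19x19 channels; the three planes here are a simplified stand-in for the paper's actual 48-plane feature set:

```python
import numpy as np

BOARD_SIZE = 19

def encode_planes(black_stones, white_stones, black_to_play):
    """Stack a few AlphaGo-style binary feature planes.

    black_stones / white_stones are sets of (row, col) tuples. The real
    system used 48 planes (liberties, captures, ko legality, turns since
    a move, and so on); this sketch keeps only three to show the shape.
    """
    planes = np.zeros((3, BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
    for r, c in black_stones:
        planes[0, r, c] = 1.0                        # plane 0: black stones
    for r, c in white_stones:
        planes[1, r, c] = 1.0                        # plane 1: white stones
    planes[2, :, :] = 1.0 if black_to_play else 0.0  # plane 2: side to move
    return planes

# Example: an empty board except for one stone of each colour.
x = encode_planes({(3, 3)}, {(15, 15)}, black_to_play=True)
print(x.shape)  # (3, 19, 19); the real input was (48, 19, 19)
```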
The policy network takes the current board state as input and outputs a probability distribution over all legal moves. In essence, it answers the question: "Given this position, which moves are most likely to be good?" The network architecture is a deep convolutional neural network with 13 layers. The first layer uses 5x5 filters (192 filters), while subsequent layers use 3x3 filters (also 192 filters each), with zero padding to maintain the 19x19 spatial dimension throughout [1].
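A hedged PyTorch sketch of this 13-layer architecture follows (the paper's per-position bias in the final layer and other training details are simplified):

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Approximation of the 13-layer policy network: a 5x5 convolution,
    eleven 3x3 convolutions (192 filters each, zero-padded to keep the
    19x19 shape), and a final 1x1 convolution feeding a softmax."""

    def __init__(self, in_planes=48, filters=192):
        super().__init__()
        layers = [nn.Conv2d(in_planes, filters, kernel_size=5, padding=2), nn.ReLU()]
        for _ in range(11):  # layers 2..12
            layers += [nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(filters, 1, kernel_size=1)]  # layer 13
        self.body = nn.Sequential(*layers)

    def forward(self, x):                    # x: (batch, 48, 19, 19)
        logits = self.body(x).flatten(1)     # (batch, 361)
        return torch.softmax(logits, dim=1)  # one probability per board point

net = PolicyNet()
probs = net(torch.zeros(1, 48, 19, 19))
print(probs.shape)  # torch.Size([1, 361])
```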
The policy network was trained in two stages:
Supervised learning (SL) policy network. The network was first trained on a dataset of approximately 30 million board positions from 160,000 games played by human experts on the KGS Go Server. The objective was to predict the move that the human player actually made. This SL policy network achieved a prediction accuracy of 57.0% on a held-out test set, a significant improvement over previous state-of-the-art results of around 44% [1].
Reinforcement learning (RL) policy network. The SL policy network was then improved through self-play. The RL policy network was initialized with the weights of the SL network and then played games against randomly selected previous versions of itself. It was updated using the REINFORCE algorithm, receiving a reward of +1 for winning and -1 for losing. After this RL training phase, the RL policy network won more than 80% of games against the SL policy network [1].
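Conceptually, REINFORCE scales the log-probability of each move played by the final game outcome. The sketch below (assuming a `policy` such as the network above and pre-collected per-game tensors) shows the bare update; the paper's mini-batching and opponent-pool details are omitted:

```python
import torch

def reinforce_update(policy, optimizer, games):
    """One REINFORCE step over a list of finished self-play games.

    Each game is assumed to be a tuple (states, actions, z): states of
    shape (T, 48, 19, 19), actions of shape (T,) holding move indices,
    and z = +1 for a win or -1 for a loss from the learner's side.
    """
    optimizer.zero_grad()
    loss = 0.0
    for states, actions, z in games:
        probs = policy(states)                                       # (T, 361)
        logp = torch.log(probs.gather(1, actions.unsqueeze(1))).squeeze(1)
        loss = loss - z * logp.sum()  # gradient ascent on z * log pi(a|s)
    loss.backward()
    optimizer.step()
```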
In addition to the full policy network, AlphaGo used a much simpler and faster "rollout policy" based on a linear softmax model with hand-crafted pattern features. This lightweight policy could select a move roughly 1,500 times faster than the deep policy network (about 2 microseconds per move versus 3 milliseconds). It was used during the simulation phase of Monte Carlo tree search to quickly play out games to completion and estimate outcomes [1].
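At its core, such a rollout policy is just a linear softmax over pattern features of the candidate moves. A self-contained toy illustration, in which the random features and weights stand in for the paper's hand-crafted patterns:

```python
import numpy as np

def rollout_move_probs(move_features, weights):
    """Linear softmax over per-move pattern features, the functional
    form of AlphaGo's fast rollout policy. move_features[i] holds the
    features of candidate move i."""
    logits = move_features @ weights   # one scalar score per move
    logits -= logits.max()             # for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy example: 4 candidate moves, 5 binary pattern features each.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(4, 5)).astype(float)
print(rollout_move_probs(features, rng.normal(size=5)))
```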
The value network takes a board position as input and outputs a single number estimating the probability that the current player will win from that position. Its architecture is similar to the policy network (a deep CNN with 13 convolutional layers) but ends with a single scalar output rather than a probability distribution over moves.
The value network was trained on 30 million positions, each sampled from a separate game of self-play by the RL policy network. Using positions from separate games was important to avoid overfitting; if multiple positions from the same game were used, the network could simply memorize game outcomes rather than learning to evaluate positions independently. The value network's predictions approached the accuracy of Monte Carlo rollouts but were 15,000 times faster to compute [1].
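A sketch of the value network in the same style as the policy network above (the head sizes approximate the paper's description, and the paper's extra input plane for the player's colour is folded into the 48 assumed here):

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Policy-net-style convolutional body whose output is collapsed,
    via a small fully connected head, to one tanh-squashed scalar
    estimating the current player's chance of winning."""

    def __init__(self, in_planes=48, filters=192):
        super().__init__()
        layers = [nn.Conv2d(in_planes, filters, 5, padding=2), nn.ReLU()]
        for _ in range(11):
            layers += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU()]
        self.body = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(filters * 19 * 19, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh(),  # scalar in [-1, 1]
        )

    def forward(self, x):  # x: (batch, 48, 19, 19)
        return self.head(self.body(x))
```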
During actual gameplay, AlphaGo combined all these components through a modified version of Monte Carlo tree search. The search proceeds in four steps, repeated thousands of times for each move:
| Step | Name | Description |
|---|---|---|
| 1 | Selection | Starting from the root (current board position), traverse the tree by selecting child nodes using the PUCT algorithm, which balances exploitation (choosing moves with high estimated value) and exploration (trying moves the policy network considers promising but that have been less explored) |
| 2 | Expansion | When a leaf node is reached, expand it by adding a new child node to the tree |
| 3 | Evaluation | Evaluate the new position using both the value network (producing value estimate v) and a fast rollout to the end of the game (producing outcome z) |
| 4 | Backup | Propagate the evaluation back up the tree, updating the statistics (visit count and mean value) of all nodes along the path |
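The sketch below shows how the four steps fit together in Python. The game and network interfaces (`state.play`, `policy_net` returning (move, prior) pairs, `value_net`, `rollout`) are assumptions of this illustration, and the flipping of value signs between the two players is omitted for brevity:

```python
import math

C_PUCT, LAMBDA = 5.0, 0.5  # exploration constant (assumed) and mixing weight

class Node:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}  # move -> Node

    def q(self):  # mean value of simulations through this node
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    """Step 1: choose the child maximizing Q + U (the PUCT rule)."""
    total = sum(c.visits for c in node.children.values())
    def score(child):
        u = C_PUCT * child.prior * math.sqrt(total) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda mc: score(mc[1]))

def simulate(state, root, policy_net, value_net, rollout):
    """One simulation: selection, expansion, evaluation, backup."""
    node, path = root, [root]
    while node.children:                    # 1. selection
        move, node = select_child(node)
        state = state.play(move)
        path.append(node)
    for move, prior in policy_net(state):   # 2. expansion with priors
        node.children[move] = Node(prior)
    v = value_net(state)                    # 3a. value-network estimate
    z = rollout(state)                      # 3b. fast rollout outcome
    leaf_value = (1 - LAMBDA) * v + LAMBDA * z
    for n in path:                          # 4. backup along the path
        n.visits += 1
        n.value_sum += leaf_value
```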
The final evaluation of each position combined the value network's estimate and the rollout outcome using a mixing parameter lambda, set to 0.5 in the match against Fan Hui, giving equal weight to both signals [1].
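In the paper's notation, with $v_\theta(s_L)$ the value network's estimate at leaf position $s_L$ and $z_L$ the rollout outcome, the leaf evaluation is

$$V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda\, z_L$$

so that $\lambda = 0.5$ weights the two signals equally [1].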
The PUCT (Predictor + Upper Confidence bounds applied to Trees) algorithm used in the selection phase incorporated the prior probabilities from the SL policy network. This meant that the search focused its effort on moves that the policy network considered most promising, while still exploring alternatives. The formula for selecting moves balanced the mean action value of a move, a prior probability term from the policy network, and an exploration bonus that decreased as a move was visited more often.
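Concretely, at each step of the selection phase the search picks

$$a_t = \underset{a}{\operatorname{argmax}} \bigl( Q(s_t, a) + u(s_t, a) \bigr), \qquad u(s, a) = c_{\text{puct}}\, P(s, a)\, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}$$

where $Q(s, a)$ is the mean action value, $P(s, a)$ is the prior from the SL policy network, $N(s, a)$ is the visit count, and $c_{\text{puct}}$ is a constant controlling the strength of exploration [1].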
AlphaGo typically performed around 10,000 to 100,000 simulations per move during tournament play, running on a distributed system of CPUs and GPUs [1].
AlphaGo's hardware requirements varied across its different versions:
| Version | Hardware | Notes |
|---|---|---|
| AlphaGo Fan (2015) | 176 GPUs, distributed across multiple machines | Used in the match against Fan Hui |
| AlphaGo Lee (2016) | 48 TPUs (first-generation) | Used in the match against Lee Sedol; ran on Google Cloud |
| AlphaGo Master (2017) | Single machine with 4 TPUs | Significantly more efficient than earlier versions |
| AlphaGo Zero (2017) | Single machine with 4 TPUs | No human data; trained entirely through self-play |
AlphaGo's development can be traced through a series of increasingly high-profile matches, each representing a significant step forward in capability.
In October 2015, AlphaGo played a formal five-game match against Fan Hui, the European Go champion, a 2-dan professional. The match took place at DeepMind's offices in London and was conducted under standard tournament conditions with no handicap on a full-sized 19x19 board. AlphaGo won all five games [1].
This was the first time any computer program had defeated a professional Go player under these conditions. The result was kept secret until the publication of the Nature paper in January 2016. When the news broke, it sent shockwaves through both the AI and Go communities. Many experts had predicted that it would take another decade or more before a computer could beat a professional Go player.
The match that brought AlphaGo to worldwide attention was a five-game series against Lee Sedol, a South Korean 9-dan professional widely regarded as one of the greatest Go players of the modern era. Lee held 18 world championship titles and was considered by many to be the strongest player of the previous decade [4].
The match took place at the Four Seasons Hotel in Seoul, South Korea, from March 9 to March 15, 2016. Google offered a prize of one million US dollars, to be donated to UNICEF, Go organizations, and STEM charities if AlphaGo won. The games were broadcast live and watched by an estimated 200 million viewers worldwide [4].
| Game | Date | Result | Notable events |
|---|---|---|---|
| Game 1 | March 9, 2016 | AlphaGo wins (Lee resigns) | Lee described AlphaGo's play as "flawless" |
| Game 2 | March 10, 2016 | AlphaGo wins (Lee resigns) | AlphaGo plays Move 37, shocking commentators |
| Game 3 | March 12, 2016 | AlphaGo wins (Lee resigns) | Lee appeared visibly distressed after the loss |
| Game 4 | March 13, 2016 | Lee Sedol wins (AlphaGo resigns) | Lee plays Move 78, the "Hand of God" |
| Game 5 | March 15, 2016 | AlphaGo wins (Lee resigns) | Lee resigned after a long and complex game |
AlphaGo won the match 4-1, earning a 9-dan professional honorary rank from the Korea Baduk Association [4].
The most discussed moment from the entire match occurred in Game 2. On AlphaGo's 37th move, the program placed a stone on the fifth line at the shoulder of White's position, a move so unusual that it stunned professional commentators. Michael Redmond, a 9-dan professional providing commentary, described it as "creative" and "unique," a move that virtually no human player would consider [5].
Conventional Go wisdom holds that fifth-line plays in the early to middle game are too high and inefficient. When the move appeared on the board, several expert commentators assumed it was a mistake. Fan Hui, watching the match, had a visceral reaction, later recalling that the move made him feel "cold" [5]. Lee Sedol himself left the table after seeing it and took more than 12 minutes to play his response.
As the game progressed, the brilliance of Move 37 became apparent. AlphaGo's policy network had estimated that a human would play that move with a probability of roughly 1 in 10,000, yet the program's analysis determined it was the strongest option available [5]. The move eventually contributed to AlphaGo's victory in Game 2 and became a symbol of AI's potential for creative problem-solving. It demonstrated that a machine could generate strategies that went beyond anything in its human training data.
Game 4 provided the only human victory in the match and produced its own legendary moment. On move 78, Lee Sedol played a brilliant wedge move that split AlphaGo's groups in the center of the board. The move was later dubbed the "Hand of God" (or "God's Touch") by Gu Li, a 9-dan Chinese professional, who described it as "divine" [6].
Lee's Move 78 was estimated to have a probability of roughly 1 in 10,000 of being played by a human, mirroring AlphaGo's own Move 37 from Game 2. AlphaGo responded poorly on move 79, and its win-rate estimate, which had been around 70% at that point, plummeted. The program went on to make a series of weak moves from moves 87 to 101, and Lee won the game decisively [6].
This game revealed a weakness in AlphaGo's architecture: the program struggled in positions that its training data and self-play experience had not adequately covered. When confronted with a highly unusual and brilliant move, its evaluation became unreliable, leading to a cascade of errors. Game 4 remains the only game a human has won against AlphaGo under match conditions.
In late December 2016 and early January 2017, an updated version of AlphaGo appeared on the Tygem and FoxGo online Go servers under the pseudonyms "Magister" and then "Master." Over the course of about a week, it played 60 rapid games against some of the world's top professional players, including Ke Jie (world number one), Park Junghwan, and numerous other top-ranked professionals. Master won all 60 games [7].
DeepMind confirmed after the streak that Master was indeed an updated version of AlphaGo. The 60-0 record, achieved against a who's who of professional Go, confirmed that the version used against Lee Sedol had been far from AlphaGo's ceiling. The online games, played at a faster time control than the Lee Sedol match, demonstrated that AlphaGo's superiority was not dependent on long thinking times.
The final public competition for AlphaGo took place at the Future of Go Summit in Wuzhen, China, in May 2017. The centerpiece was a three-game match between AlphaGo Master and Ke Jie, then the world's top-ranked Go player at age 19. The summit also featured other exhibition formats, including pair Go (human-AlphaGo teams) and a team match where five Chinese professionals collaborated against AlphaGo [8].
| Game | Date | Result |
|---|---|---|
| Game 1 | May 23, 2017 | AlphaGo wins by half a point |
| Game 2 | May 25, 2017 | AlphaGo wins (Ke resigns) |
| Game 3 | May 27, 2017 | AlphaGo wins (Ke resigns) |
AlphaGo won all three games against Ke Jie. The first game was particularly close, decided by just half a point (the smallest possible margin in Go). Ke Jie became visibly emotional during the third and final game, at one point stepping away from the board in tears, and later said he felt that AlphaGo was "like a god of Go" [8].
In the team match, five top Chinese professionals (including Ke Jie) played together against AlphaGo, and AlphaGo still won. After the summit, DeepMind announced that AlphaGo would retire from competitive play. Ke Jie was awarded a prize of 1.5 million yuan (about $200,000 USD) [8].
The following section describes the key technical differences across AlphaGo's versions.
The original system described in the 2016 Nature paper relied on a pipeline of four components:
| Component | Architecture | Purpose | Speed |
|---|---|---|---|
| SL policy network | 13-layer CNN (192 filters, 5x5 first layer, 3x3 rest) | Predict human expert moves | ~3 ms per position |
| RL policy network | Same architecture as SL policy, fine-tuned via self-play | Improved move prediction | ~3 ms per position |
| Fast rollout policy | Linear softmax with pattern features | Quick game simulations | ~2 microseconds per move |
| Value network | 13-layer CNN (similar to policy net, scalar output) | Evaluate board positions | ~3 ms per position |
The training pipeline proceeded as follows: the SL policy network was trained on human games, the RL policy network was improved through self-play against earlier versions of itself, and the value network was trained to predict the winner of RL policy self-play games. At game time, MCTS combined all four components.
AlphaGo Master, the version that achieved the 60-0 online streak and defeated Ke Jie, featured improvements to the neural network architecture and training process. DeepMind did not publish a separate paper detailing all the changes in Master, but it used a more powerful neural network, better training procedures, and ran on significantly less hardware than the Lee Sedol version (a single machine with 4 TPUs, compared to the distributed system of 48 TPUs used against Lee Sedol) [7].
On October 19, 2017, DeepMind published a paper in Nature titled "Mastering the game of Go without human knowledge," authored by David Silver, Julian Schrittwieser, Karen Simonyan, and colleagues [9]. This paper introduced AlphaGo Zero, a fundamentally redesigned version that learned to play Go entirely from scratch, with no human game data at all.
AlphaGo Zero differed from the original AlphaGo in several important ways:
| Feature | Original AlphaGo | AlphaGo Zero |
|---|---|---|
| Training data | 160,000 human expert games | None (self-play only) |
| Neural networks | Separate policy and value networks | Single dual-headed network |
| Network architecture | 13-layer CNN | 20-block or 40-block residual network |
| Input features | 48 hand-crafted feature planes | 17 raw feature planes (stone positions + move history) |
| Rollout policy | Used a fast rollout policy for simulations | No rollouts; relied entirely on the value head |
| MCTS evaluation | Combined value network and rollout results | Used only the value network output |
| Training method | Supervised learning then reinforcement learning | Pure reinforcement learning from self-play |
AlphaGo Zero used a single neural network with two output heads: a policy head (producing move probabilities) and a value head (producing a win probability estimate). The body of the network was a deep residual network consisting of an initial convolutional block followed by either 19 or 39 residual blocks (the commonly cited 20-block and 40-block versions), each residual block containing two convolutional layers with batch normalization and ReLU activations. The use of residual connections (skip connections) allowed the network to be trained to much greater depth than the original 13-layer CNN [9].
The input to the network was dramatically simplified compared to the original AlphaGo. Instead of 48 hand-crafted feature planes, AlphaGo Zero used only 17 binary feature planes: 8 planes encoding the positions of black stones over the last 8 time steps, 8 planes encoding white stone positions over the same period, and 1 plane indicating the current player's color [9].
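A PyTorch sketch of the dual-headed design follows; the head shapes match the paper's description, though batch normalization in the heads and other training details are omitted for brevity:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, filters=256):
        super().__init__()
        self.c1, self.b1 = nn.Conv2d(filters, filters, 3, padding=1), nn.BatchNorm2d(filters)
        self.c2, self.b2 = nn.Conv2d(filters, filters, 3, padding=1), nn.BatchNorm2d(filters)

    def forward(self, x):
        y = torch.relu(self.b1(self.c1(x)))
        return torch.relu(x + self.b2(self.c2(y)))  # skip connection

class DualHeadNet(nn.Module):
    """Shared residual body feeding a policy head (362 outputs: 361
    board points plus pass) and a value head (one tanh scalar)."""

    def __init__(self, blocks=19, filters=256):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(17, filters, 3, padding=1),
                                  nn.BatchNorm2d(filters), nn.ReLU())
        self.body = nn.Sequential(*[ResBlock(filters) for _ in range(blocks)])
        self.policy = nn.Sequential(nn.Conv2d(filters, 2, 1), nn.Flatten(),
                                    nn.Linear(2 * 19 * 19, 362))
        self.value = nn.Sequential(nn.Conv2d(filters, 1, 1), nn.Flatten(),
                                   nn.Linear(19 * 19, 256), nn.ReLU(),
                                   nn.Linear(256, 1), nn.Tanh())

    def forward(self, x):  # x: (batch, 17, 19, 19)
        h = self.body(self.stem(x))
        return self.policy(h), self.value(h)  # move logits, win estimate
```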
AlphaGo Zero's training was remarkably simple in concept: the current network guides MCTS during self-play games; for each position, the MCTS visit counts become the training target for the policy head, and the final game outcome becomes the target for the value head; the retrained network then drives the next round of self-play [9].
The neural network and the tree search improve each other in a virtuous cycle: as the network becomes more accurate, the tree search becomes more effective, and the stronger tree search generates better training data for the network.
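In code, the per-batch training objective reduces to two terms (a sketch; `net` is assumed to return policy logits and a value as in the architecture sketch above, with the paper's L2 regularization delegated to the optimizer's weight decay):

```python
import torch.nn.functional as F

def zero_loss(net, states, pi, z):
    """AlphaGo Zero-style loss: (z - v)^2 - pi^T log p, averaged over
    the batch. states: (B, 17, 19, 19); pi: (B, 362) MCTS visit-count
    distributions; z: (B,) game outcomes in {-1, +1}."""
    logits, v = net(states)
    value_loss = F.mse_loss(v.squeeze(1), z)
    policy_loss = -(pi * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return value_loss + policy_loss
```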
AlphaGo Zero's learning curve was extraordinary:
| Training time | Elo rating (approx.) | Milestone |
|---|---|---|
| 0 hours | Random play | Completely random moves |
| 3 days (4.9 million games) | ~3,700 | Surpassed AlphaGo Lee (the version that beat Lee Sedol) |
| 21 days | ~5,000 | Surpassed AlphaGo Master (60-0 online version) |
| 40 days (29 million games) | ~5,185 | Surpassed all previous versions; strongest Go player in history |
The three-day version of AlphaGo Zero defeated AlphaGo Lee, the version that beat Lee Sedol, by 100 games to 0. The 40-day version defeated AlphaGo Master by 89 games to 11 [9].
One of the most striking findings from the AlphaGo Zero paper was that the system independently rediscovered known Go strategies during its training. In its early phases, it learned basic tactics. Over time, it developed standard openings (joseki) used by human professionals. Eventually, it moved beyond known human strategies and developed novel approaches of its own, some of which professional Go players found genuinely instructive [9].
On December 5, 2017, less than two months after the AlphaGo Zero paper, DeepMind released a preprint describing AlphaZero, a generalized version of the AlphaGo Zero algorithm that could master not just Go but also chess and shogi (Japanese chess) [10]. The paper, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," was authored by David Silver, Thomas Hubert, Julian Schrittwieser, and colleagues. A more detailed version was later published in Science in December 2018 [11].
AlphaZero used the same general architecture and training approach as AlphaGo Zero. The key innovation was generality: the same algorithm, with minimal modification, could learn any two-player perfect-information game given only the rules. No game-specific knowledge beyond the rules was provided.
| Game | Opponent | Training time | Result |
|---|---|---|---|
| Chess | Stockfish (2016 TCEC world champion) | ~4 hours | Won 155, lost 6, drew 839 out of 1,000 games |
| Shogi | Elmo (2017 CSA world champion) | ~2 hours | Won 91.2% of games |
| Go | AlphaGo Zero (3-day version) | ~8 hours | Won 61% of games |
AlphaZero's chess play attracted particular attention from the chess community. The program developed an aggressive, dynamic playing style that favored piece activity and long-term positional advantages over material. Former world chess champion Garry Kasparov praised AlphaZero's style, noting that it played in a way that was "recognizably human" yet also alien, willing to sacrifice material for initiative in ways that conventional engines would not [10].
AlphaZero used approximately 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks, all running in parallel [10].
AlphaGo's victories had a profound and lasting effect on the global Go community. The game of Go, with its 2,500-year history, occupies a position of cultural significance in East Asia comparable to chess in the West, but with even deeper roots in philosophy, art, and intellectual tradition. In China, Japan, and Korea, Go is not merely a game but a cultural institution, and its top players are celebrities.
AlphaGo's defeat of Lee Sedol was front-page news across East Asia and received extensive coverage worldwide. In South Korea, the match drew the largest TV audience for a Go event in history. The psychological impact on professional players was significant. Several top professionals described feeling a mix of admiration, loss, and existential unease about their life's work [12].
Lee Sedol himself retired from professional Go in November 2019, citing AlphaGo as a factor in his decision. In an interview with Yonhap News, he said: "With the debut of AI in Go games, I've realized that I'm not at the top even if I become the number one through frantic efforts. Even if I become the number one, there is an entity that cannot be defeated" [12].
However, many Go professionals found AlphaGo's influence to be ultimately positive. The program introduced new opening strategies and middle-game ideas that human players adopted and built upon. Professional Go players began using AI tools for training, analyzing positions, and preparing for matches. Several of AlphaGo's moves, including ideas first seen in its games, became part of the standard professional repertoire. The 3-3 point invasion in the early opening, which AlphaGo favored and which contradicted decades of professional convention, became widely adopted after professionals studied AlphaGo's games [13].
Move 37 from Game 2 against Lee Sedol transcended Go to become a broader cultural symbol. It was referenced in discussions about AI creativity, the nature of intuition, and the relationship between human and machine intelligence. A documentary film, AlphaGo (2017), directed by Greg Kohs, told the story of the Lee Sedol match and featured Move 37 prominently. The film received critical acclaim and was screened at the Tribeca Film Festival [14].
The move challenged the common assumption that AI systems can only optimize within known patterns and cannot produce genuinely novel ideas. While the question of whether AlphaGo is truly "creative" remains a subject of philosophical debate, the practical impact was undeniable: the program generated a move that thousands of years of human play had never produced, and it turned out to be strong.
The AlphaGo matches, particularly the Lee Sedol series, brought AI into mainstream public consciousness in a way that few previous developments had. The match was covered by major media outlets worldwide, and the live streams drew millions of viewers. In South Korea and China, the matches sparked a surge of interest in both Go and AI. Enrollment in Go classes reportedly increased in China following the matches, as public attention brought new players to the game [13].
The event also prompted public discussion about the pace of AI progress, the future of human work, and the societal implications of increasingly capable AI systems. For many people, the AlphaGo match was the moment they first took seriously the possibility that AI could perform tasks requiring what appeared to be intuition and creativity.
AlphaGo, and especially AlphaGo Zero, demonstrated the power of reinforcement learning combined with deep neural networks at a scale and level of performance that had not been achieved before. The progression from AlphaGo (trained partly on human data) to AlphaGo Zero (trained entirely from scratch) showed that self-play reinforcement learning could not only match but surpass approaches that relied on human expertise. This finding influenced the broader AI research community's approach to training agents for complex tasks.
The idea that an AI system could start from zero knowledge and achieve superhuman performance purely through self-play was a powerful proof of concept. It suggested that human knowledge, while useful as a starting point, might actually constrain an AI system by anchoring it to human strategies and biases.
AlphaGo's architecture and training approach inspired a succession of systems at DeepMind and beyond:
| System | Year | Domain | Key advance |
|---|---|---|---|
| AlphaGo Zero | 2017 | Go | Self-play without human data |
| AlphaZero | 2017 | Chess, shogi, Go | Generalized single algorithm for multiple games |
| MuZero | 2019 | Atari, chess, shogi, Go | Learned its own model of game dynamics without being given the rules |
| AlphaStar | 2019 | StarCraft II | Applied similar principles to a real-time strategy game with imperfect information |
| AlphaFold | 2018-2020 | Protein structure prediction | Applied deep learning to the protein folding problem, winning CASP13 and CASP14 |
| AlphaCode | 2022 | Competitive programming | Applied deep learning and search to code generation |
MuZero, published in Nature in 2020, extended the AlphaZero approach further by eliminating the need to provide the system with game rules. MuZero learned its own internal model of how the environment worked, achieving superhuman performance in Go, chess, shogi, and Atari games without being told the rules of any of them [15].
AlphaFold, while architecturally quite different from AlphaGo, was developed at DeepMind by a team that drew on the lab's experience with the AlphaGo project. AlphaFold's solution to the protein structure prediction problem, one of biology's grand challenges, earned Demis Hassabis and John Jumper the Nobel Prize in Chemistry in 2024 [16].
Several technical ideas from AlphaGo have found broader application in AI research:
Neural network-guided tree search. The idea of using a learned policy to guide tree search, combined with a learned value function for evaluation, has been adopted in diverse areas including theorem proving, program synthesis, and planning. The PUCT algorithm used in AlphaGo's MCTS has become a standard approach in neural MCTS implementations.
Self-play as a training paradigm. AlphaGo Zero's demonstration that self-play could produce superhuman performance from scratch influenced research in multi-agent systems, curriculum learning, and emergent complexity.
Combining supervised and reinforcement learning. The original AlphaGo's two-phase training (supervised pretraining followed by RL fine-tuning) anticipated the pretrain-then-fine-tune paradigm that later became standard in natural language processing with models like BERT and GPT.
Dual-headed network architecture. AlphaGo Zero's use of a single network with both policy and value heads influenced the design of multi-task and multi-objective neural architectures.
Before AlphaGo, a common view among AI researchers was that mastering Go was at least a decade away. A 2015 survey of AI experts placed human-level Go play at roughly 2025 [17]. AlphaGo's victory in 2016 arrived far earlier than most predictions, contributing to a broader recalibration of timelines for AI capabilities. This recalibration influenced both research priorities and public policy discussions around AI safety, ethics, and regulation.
AlphaGo was central to DeepMind's identity and public profile. Before AlphaGo, DeepMind was known primarily within the AI research community for its work on deep reinforcement learning applied to Atari games (published in Nature in 2015). The AlphaGo matches transformed DeepMind into a household name, at least in technology circles, and validated Google's investment in the company.
The AlphaGo project also demonstrated the value of Google's hardware infrastructure, particularly its Tensor Processing Units (TPUs). The Lee Sedol match used an early version of Google's TPUs, and the efficiency gains from custom hardware were a significant factor in AlphaGo's improvement across versions.
For Google, AlphaGo served as a showcase of the company's AI capabilities during a period of intense competition with other technology companies. The matches generated enormous media coverage and helped establish Google (and DeepMind) as leaders in AI research. In 2023, DeepMind merged with Google Brain (Google's other major AI research division) to form Google DeepMind, with Demis Hassabis as CEO [3].
| Date | Event |
|---|---|
| September 2010 | DeepMind founded by Demis Hassabis, Shane Legg, and Mustafa Suleyman |
| January 2014 | Google acquires DeepMind for approximately 400 million GBP |
| October 2015 | AlphaGo defeats Fan Hui 5-0 (kept secret until January 2016) |
| January 27, 2016 | Original AlphaGo paper published in Nature |
| March 9-15, 2016 | AlphaGo defeats Lee Sedol 4-1 in Seoul |
| December 2016 - January 2017 | AlphaGo Master wins 60 consecutive online games against top professionals |
| May 23-27, 2017 | AlphaGo defeats Ke Jie 3-0 at the Future of Go Summit; AlphaGo retires from competition |
| October 19, 2017 | AlphaGo Zero paper published in Nature |
| December 5, 2017 | AlphaZero preprint released, generalizing the approach to chess and shogi |
| December 2018 | AlphaZero paper published in Science |
| November 2019 | Lee Sedol retires from professional Go |