AlphaGo is a computer program developed by DeepMind that plays the board game Go. It was the first program to defeat a professional human Go player on a full-sized 19x19 board without a handicap, and the first to beat a 9-dan professional, the highest rank in Go. AlphaGo combines deep neural networks with Monte Carlo tree search (MCTS), using a blend of supervised learning from human expert games and reinforcement learning from games played against itself [1]. The program's victories over some of the strongest Go players in history between 2015 and 2017 marked a turning point in artificial intelligence, demonstrating that machines could master a game long considered too complex and intuitive for computers to play at a professional level.
Go, which originated in China over 2,500 years ago, presents a far greater search space than chess. A standard 19x19 Go board yields roughly 2.1 x 10^170 possible board positions, compared to approximately 10^47 in chess [2]. The game's enormous branching factor (about 250 legal moves per turn, versus roughly 35 in chess) and the difficulty of evaluating board positions made Go a grand challenge for AI researchers for decades. Before AlphaGo, the strongest Go programs played at the level of a weak amateur.
AlphaGo was created at DeepMind, a British AI research lab founded in 2010 by Demis Hassabis, Shane Legg, and Mustafa Suleyman. Google acquired DeepMind in January 2014 for approximately 400 million British pounds, making it one of Google's largest European acquisitions at the time [3]. Under Google's umbrella, DeepMind continued operating as a semi-independent research lab with a stated mission of "solving intelligence."
The AlphaGo project grew out of DeepMind's broader interest in combining deep learning with reinforcement learning to solve complex sequential decision-making problems. The team was led by David Silver, a researcher specializing in reinforcement learning, alongside Aja Huang and a group of engineers and scientists at DeepMind. Demis Hassabis, himself a former child chess prodigy and game designer, saw Go as the ideal testbed: it was well-defined enough to measure progress objectively, but complex enough to require genuine breakthroughs in AI.
The original AlphaGo paper, "Mastering the game of Go with deep neural networks and tree search," was published in Nature on January 27, 2016, by David Silver, Aja Huang, and colleagues [1]. The paper described how AlphaGo combined two deep convolutional neural networks (a policy network and a value network) with Monte Carlo tree search to achieve superhuman performance.
AlphaGo's architecture integrates several components that work together during gameplay. At a high level, deep neural networks handle pattern recognition and position evaluation, while Monte Carlo tree search orchestrates the actual move selection during a game. The interplay between these components is what gave AlphaGo its strength.
The Go board is represented as a 19x19 grid. AlphaGo encodes the board state as a set of 48 feature planes, each of size 19x19. These feature planes capture various aspects of the position, including the locations of black and white stones, liberties (open adjacent points for groups of stones), capture status, legality of moves (including the ko rule), and turn information. This multi-channel representation allows the neural networks to process the board in a way analogous to how a convolutional neural network processes the color channels of an image [1].
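For illustration, the following Python sketch (using NumPy) shows the idea of stacking binary 19x19 channels; the three planes here are a simplified stand-in for the paper's actual 48-plane feature set:

```python
import numpy as np

BOARD_SIZE = 19

def encode_planes(black_stones, white_stones, black_to_play):
    """Stack a few AlphaGo-style binary feature planes.

    black_stones / white_stones are sets of (row, col) tuples. The real
    system used 48 planes (liberties, captures, ko legality, turns since
    a move, and so on); this sketch keeps only three to show the shape.
    """
    planes = np.zeros((3, BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
    for r, c in black_stones:
        planes[0, r, c] = 1.0                        # plane 0: black stones
    for r, c in white_stones:
        planes[1, r, c] = 1.0                        # plane 1: white stones
    planes[2, :, :] = 1.0 if black_to_play else 0.0  # plane 2: side to move
    return planes

# Example: an empty board except for one stone of each colour.
x = encode_planes({(3, 3)}, {(15, 15)}, black_to_play=True)
print(x.shape)  # (3, 19, 19); the real input was (48, 19, 19)
```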
The policy network takes the current board state as input and outputs a probability distribution over all legal moves. In essence, it answers the question: "Given this position, which moves are most likely to be good?" The network architecture is a deep convolutional neural network with 13 layers. The first layer uses 5x5 filters (192 filters), while subsequent layers use 3x3 filters (also 192 filters each), with zero padding to maintain the 19x19 spatial dimension throughout [1].
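A hedged PyTorch sketch of this 13-layer architecture follows (the paper's per-position bias in the final layer and other training details are simplified):

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Approximation of the 13-layer policy network: a 5x5 convolution,
    eleven 3x3 convolutions (192 filters each, zero-padded to keep the
    19x19 shape), and a final 1x1 convolution feeding a softmax."""

    def __init__(self, in_planes=48, filters=192):
        super().__init__()
        layers = [nn.Conv2d(in_planes, filters, kernel_size=5, padding=2), nn.ReLU()]
        for _ in range(11):  # layers 2..12
            layers += [nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(filters, 1, kernel_size=1)]  # layer 13
        self.body = nn.Sequential(*layers)

    def forward(self, x):                    # x: (batch, 48, 19, 19)
        logits = self.body(x).flatten(1)     # (batch, 361)
        return torch.softmax(logits, dim=1)  # one probability per board point

net = PolicyNet()
probs = net(torch.zeros(1, 48, 19, 19))
print(probs.shape)  # torch.Size([1, 361])
```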
The policy network was trained in two stages:
Supervised learning (SL) policy network. The network was first trained on a dataset of approximately 30 million board positions from 160,000 games played by human experts on the KGS Go Server. The objective was to predict the move that the human player actually made. This SL policy network achieved a prediction accuracy of 57.0% on a held-out test set, a significant improvement over previous state-of-the-art results of around 44% [1].
Reinforcement learning (RL) policy network. The SL policy network was then improved through self-play. The RL policy network was initialized with the weights of the SL network and then played games against randomly selected previous versions of itself. It was updated using the REINFORCE algorithm, receiving a reward of +1 for winning and -1 for losing. After this RL training phase, the RL policy network won more than 80% of games against the SL policy network [1].
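Conceptually, REINFORCE scales the log-probability of each move played by the final game outcome. The sketch below (assuming a `policy` such as the network above and pre-collected per-game tensors) shows the bare update; the paper's mini-batching and opponent-pool details are omitted:

```python
import torch

def reinforce_update(policy, optimizer, games):
    """One REINFORCE step over a list of finished self-play games.

    Each game is assumed to be a tuple (states, actions, z): states of
    shape (T, 48, 19, 19), actions of shape (T,) holding move indices,
    and z = +1 for a win or -1 for a loss from the learner's side.
    """
    optimizer.zero_grad()
    loss = 0.0
    for states, actions, z in games:
        probs = policy(states)                                       # (T, 361)
        logp = torch.log(probs.gather(1, actions.unsqueeze(1))).squeeze(1)
        loss = loss - z * logp.sum()  # gradient ascent on z * log pi(a|s)
    loss.backward()
    optimizer.step()
```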
In addition to the full policy network, AlphaGo used a much simpler and faster "rollout policy" based on a linear softmax model with hand-crafted pattern features. This lightweight policy could select a move roughly 1,500 times faster than the deep policy network (about 2 microseconds per move versus 3 milliseconds). It was used during the simulation phase of Monte Carlo tree search to quickly play out games to completion and estimate outcomes [1].
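At its core, such a rollout policy is just a linear softmax over pattern features of the candidate moves. A self-contained toy illustration, in which the random features and weights stand in for the paper's hand-crafted patterns:

```python
import numpy as np

def rollout_move_probs(move_features, weights):
    """Linear softmax over per-move pattern features, the functional
    form of AlphaGo's fast rollout policy. move_features[i] holds the
    features of candidate move i."""
    logits = move_features @ weights   # one scalar score per move
    logits -= logits.max()             # for numerical stability
    p = np.exp(logits)
    return p / p.sum()

# Toy example: 4 candidate moves, 5 binary pattern features each.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=(4, 5)).astype(float)
print(rollout_move_probs(features, rng.normal(size=5)))
```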
The value network takes a board position as input and outputs a single number estimating the probability that the current player will win from that position. Its architecture is similar to the policy network (a deep CNN with 13 convolutional layers) but ends with a single scalar output rather than a probability distribution over moves.
The value network was trained on 30 million positions, each sampled from a separate game of self-play by the RL policy network. Using positions from separate games was important to avoid overfitting; if multiple positions from the same game were used, the network could simply memorize game outcomes rather than learning to evaluate positions independently. The value network's predictions approached the accuracy of Monte Carlo rollouts but were 15,000 times faster to compute [1].
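A sketch of the value network in the same style as the policy network above (the head sizes approximate the paper's description, and the paper's extra input plane for the player's colour is folded into the 48 assumed here):

```python
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Policy-net-style convolutional body whose output is collapsed,
    via a small fully connected head, to one tanh-squashed scalar
    estimating the current player's chance of winning."""

    def __init__(self, in_planes=48, filters=192):
        super().__init__()
        layers = [nn.Conv2d(in_planes, filters, 5, padding=2), nn.ReLU()]
        for _ in range(11):
            layers += [nn.Conv2d(filters, filters, 3, padding=1), nn.ReLU()]
        self.body = nn.Sequential(*layers)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(filters * 19 * 19, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh(),  # scalar in [-1, 1]
        )

    def forward(self, x):  # x: (batch, 48, 19, 19)
        return self.head(self.body(x))
```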
During actual gameplay, AlphaGo combined all these components through a modified version of Monte Carlo tree search. The search proceeds in four steps, repeated thousands of times for each move:
| Step | Name | Description |
|---|---|---|
| 1 | Selection | Starting from the root (current board position), traverse the tree by selecting child nodes using the PUCT algorithm, which balances exploitation (choosing moves with high estimated value) and exploration (trying moves the policy network considers promising but that have been less explored) |
| 2 | Expansion | When a leaf node is reached, expand it by adding a new child node to the tree |
| 3 | Evaluation | Evaluate the new position using both the value network (producing value estimate v) and a fast rollout to the end of the game (producing outcome z) |
| 4 | Backup | Propagate the evaluation back up the tree, updating the statistics (visit count and mean value) of all nodes along the path |
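The sketch below shows how the four steps fit together in Python. The game and network interfaces (`state.play`, `policy_net` returning (move, prior) pairs, `value_net`, `rollout`) are assumptions of this illustration, and the flipping of value signs between the two players is omitted for brevity:

```python
import math

C_PUCT, LAMBDA = 5.0, 0.5  # exploration constant (assumed) and mixing weight

class Node:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum = prior, 0, 0.0
        self.children = {}  # move -> Node

    def q(self):  # mean value of simulations through this node
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node):
    """Step 1: choose the child maximizing Q + U (the PUCT rule)."""
    total = sum(c.visits for c in node.children.values())
    def score(child):
        u = C_PUCT * child.prior * math.sqrt(total) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=lambda mc: score(mc[1]))

def simulate(state, root, policy_net, value_net, rollout):
    """One simulation: selection, expansion, evaluation, backup."""
    node, path = root, [root]
    while node.children:                    # 1. selection
        move, node = select_child(node)
        state = state.play(move)
        path.append(node)
    for move, prior in policy_net(state):   # 2. expansion with priors
        node.children[move] = Node(prior)
    v = value_net(state)                    # 3a. value-network estimate
    z = rollout(state)                      # 3b. fast rollout outcome
    leaf_value = (1 - LAMBDA) * v + LAMBDA * z
    for n in path:                          # 4. backup along the path
        n.visits += 1
        n.value_sum += leaf_value
```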
The final evaluation of each position combined the value network's estimate and the rollout outcome using a mixing parameter lambda, set to 0.5 in the match against Fan Hui, giving equal weight to both signals [1].
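In the paper's notation, with $v_\theta(s_L)$ the value network's estimate at leaf position $s_L$ and $z_L$ the rollout outcome, the leaf evaluation is

$$V(s_L) = (1 - \lambda)\, v_\theta(s_L) + \lambda\, z_L$$

so that $\lambda = 0.5$ weights the two signals equally [1].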
The PUCT (Predictor + Upper Confidence bounds applied to Trees) algorithm used in the selection phase incorporated the prior probabilities from the SL policy network. This meant that the search focused its effort on moves that the policy network considered most promising, while still exploring alternatives. The formula for selecting moves balanced the mean action value of a move, a prior probability term from the policy network, and an exploration bonus that decreased as a move was visited more often.
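Concretely, at each step of the selection phase the search picks

$$a_t = \underset{a}{\operatorname{argmax}} \bigl( Q(s_t, a) + u(s_t, a) \bigr), \qquad u(s, a) = c_{\text{puct}}\, P(s, a)\, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}$$

where $Q(s, a)$ is the mean action value, $P(s, a)$ is the prior from the SL policy network, $N(s, a)$ is the visit count, and $c_{\text{puct}}$ is a constant controlling the strength of exploration [1].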
AlphaGo typically performed around 10,000 to 100,000 simulations per move during tournament play, running on a distributed system of CPUs and GPUs [1].
AlphaGo's hardware requirements varied across its different versions:
| Version | Hardware | Notes |
|---|---|---|
| AlphaGo Fan (2015) | 176 GPUs, distributed across multiple machines | Used in the match against Fan Hui |
| AlphaGo Lee (2016) | 48 TPUs (first-generation) | Used in the match against Lee Sedol; ran on Google Cloud |
| AlphaGo Master (2017) | Single machine with 4 TPUs | Significantly more efficient than earlier versions |
| AlphaGo Zero (2017) | Single machine with 4 TPUs | No human data; trained entirely through self-play |
AlphaGo's development can be traced through a series of increasingly high-profile matches, each representing a significant step forward in capability.
In October 2015, AlphaGo played a formal five-game match against Fan Hui, the European Go champion, a 2-dan professional. The match took place at DeepMind's offices in London and was conducted under standard tournament conditions with no handicap on a full-sized 19x19 board. AlphaGo won all five games [1].
This was the first time any computer program had defeated a professional Go player under these conditions. The result was kept secret until the publication of the Nature paper in January 2016. When the news broke, it sent shockwaves through both the AI and Go communities. Many experts had predicted that it would take another decade or more before a computer could beat a professional Go player.
The match that brought AlphaGo to worldwide attention was a five-game series against Lee Sedol, a South Korean 9-dan professional widely regarded as one of the greatest Go players of the modern era. Lee held 18 world championship titles and was considered by many to be the strongest player of the previous decade [4].
The match took place at the Four Seasons Hotel in Seoul, South Korea, from March 9 to March 15, 2016. Google offered a prize of one million US dollars, to be donated to UNICEF, Go organizations, and STEM charities if AlphaGo won. The games were broadcast live and watched by an estimated 200 million viewers worldwide [4].
| Game | Date | Result | Notable events |
|---|---|---|---|
| Game 1 | March 9, 2016 | AlphaGo wins (Lee resigns) | Lee described AlphaGo's play as "flawless" |
| Game 2 | March 10, 2016 | AlphaGo wins (Lee resigns) | AlphaGo plays Move 37, shocking commentators |
| Game 3 | March 12, 2016 | AlphaGo wins (Lee resigns) | Lee appeared visibly distressed after the loss |
| Game 4 | March 13, 2016 | Lee Sedol wins (AlphaGo resigns) | Lee plays Move 78, the "Hand of God" |
| Game 5 | March 15, 2016 | AlphaGo wins (Lee resigns) | Lee resigned after a long and complex game |
AlphaGo won the match 4-1, earning a 9-dan professional honorary rank from the Korea Baduk Association [4].
The most discussed moment from the entire match occurred in Game 2. On AlphaGo's 37th move, the program placed a stone on the fifth line at the shoulder of White's position, a move so unusual that it stunned professional commentators. Michael Redmond, a 9-dan professional providing commentary, described it as "creative" and "unique," a move that virtually no human player would consider [5].
Conventional Go wisdom holds that fifth-line plays in the early to middle game are too high and inefficient. When the move appeared on the board, several expert commentators assumed it was a mistake. Fan Hui, watching the match, had a visceral reaction, later recalling that the move made him feel "cold" [5]. Lee Sedol himself left the table after seeing it and took more than 12 minutes to play his response.
As the game progressed, the brilliance of Move 37 became apparent. AlphaGo's policy network had estimated that a human would play that move with a probability of roughly 1 in 10,000, yet the program's analysis determined it was the strongest option available [5]. The move eventually contributed to AlphaGo's victory in Game 2 and became a symbol of AI's potential for creative problem-solving. It demonstrated that a machine could generate strategies that went beyond anything in its human training data.
Game 4 provided the only human victory in the match and produced its own legendary moment. On move 78, Lee Sedol played a brilliant wedge move that split AlphaGo's groups in the center of the board. The move was later dubbed the "Hand of God" (or "God's Touch") by Gu Li, a 9-dan Chinese professional, who described it as "divine" [6].
Lee's Move 78 was estimated to have a probability of roughly 1 in 10,000 of being played by a human, mirroring AlphaGo's own Move 37 from Game 2. AlphaGo responded poorly on move 79, and its win-rate estimate, which had been around 70% at that point, plummeted. The program went on to make a series of weak moves from moves 87 to 101, and Lee won the game decisively [6].
This game revealed a weakness in AlphaGo's architecture: the program struggled in positions that its training data and self-play experience had not adequately covered. When confronted with a highly unusual and brilliant move, its evaluation became unreliable, leading to a cascade of errors. Game 4 remains the only game a human has won against AlphaGo under match conditions.
In late December 2016 and early January 2017, an updated version of AlphaGo appeared on the Tygem and FoxGo online Go servers under the pseudonyms "Magister" and then "Master." Over the course of about a week, it played 60 rapid games against some of the world's top professional players, including Ke Jie (world number one), Park Junghwan, and numerous other top-ranked professionals. Master won all 60 games [7].
DeepMind confirmed after the streak that Master was indeed an updated version of AlphaGo. The 60-0 record, achieved against a who's who of professional Go, confirmed that the version used against Lee Sedol had been far from AlphaGo's ceiling. The online games, played at a faster time control than the Lee Sedol match, demonstrated that AlphaGo's superiority was not dependent on long thinking times.
The final public competition for AlphaGo took place at the Future of Go Summit in Wuzhen, China, in May 2017. The centerpiece was a three-game match between AlphaGo Master and Ke Jie, then the world's top-ranked Go player at age 19. The summit also featured other exhibition formats, including pair Go (human-AlphaGo teams) and a team match where five Chinese professionals collaborated against AlphaGo [8].
| Game | Date | Result |
|---|---|---|
| Game 1 | May 23, 2017 | AlphaGo wins by half a point |
| Game 2 | May 25, 2017 | AlphaGo wins (Ke resigns) |
| Game 3 | May 27, 2017 | AlphaGo wins (Ke resigns) |
AlphaGo won all three games against Ke Jie. The first game was particularly close, decided by just half a point (the smallest possible margin in Go). Ke Jie became visibly emotional during the third and final game, at one point stepping away from the board in tears, and later said he felt that AlphaGo was "like a god of Go" [8].
In the team match, five top Chinese professionals (including Ke Jie) played together against AlphaGo, and AlphaGo still won. After the summit, DeepMind announced that AlphaGo would retire from competitive play. Ke Jie was awarded a prize of 1.5 million yuan (about $200,000 USD) [8].
The following section describes the key technical differences across AlphaGo's versions.
The original system described in the 2016 Nature paper relied on a pipeline of four components:
| Component | Architecture | Purpose | Speed |
|---|---|---|---|
| SL policy network | 13-layer CNN (192 filters, 5x5 first layer, 3x3 rest) | Predict human expert moves | ~3 ms per position |
| RL policy network | Same architecture as SL policy, fine-tuned via self-play | Improved move prediction | ~3 ms per position |
| Fast rollout policy | Linear softmax with pattern features | Quick game simulations | ~2 microseconds per move |
| Value network | 13-layer CNN (similar to policy net, scalar output) | Evaluate board positions | ~3 ms per position |
The training pipeline proceeded as follows: the SL policy network was trained on human games, the RL policy network was improved through self-play against earlier versions of itself, and the value network was trained to predict the winner of RL policy self-play games. At game time, MCTS combined all four components.
AlphaGo Master, the version that achieved the 60-0 online streak and defeated Ke Jie, featured improvements to the neural network architecture and training process. DeepMind did not publish a separate paper detailing all the changes in Master, but it used a more powerful neural network, better training procedures, and ran on significantly less hardware than the Lee Sedol version (a single machine with 4 TPUs, compared to the distributed system of 48 TPUs used against Lee Sedol) [7].
On October 19, 2017, DeepMind published a paper in Nature titled "Mastering the game of Go without human knowledge," authored by David Silver, Julian Schrittwieser, Karen Simonyan, and colleagues [9]. This paper introduced AlphaGo Zero, a fundamentally redesigned version that learned to play Go entirely from scratch, with no human game data at all.
AlphaGo Zero differed from the original AlphaGo in several important ways:
| Feature | Original AlphaGo | AlphaGo Zero |
|---|---|---|
| Training data | 160,000 human expert games | None (self-play only) |
| Neural networks | Separate policy and value networks | Single dual-headed network |
| Network architecture | 13-layer CNN | 20-block or 40-block residual network |
| Input features | 48 hand-crafted feature planes | 17 raw feature planes (stone positions + move history) |
| Rollout policy | Used a fast rollout policy for simulations | No rollouts; relied entirely on the value head |
| MCTS evaluation | Combined value network and rollout results | Used only the value network output |
| Training method | Supervised learning then reinforcement learning | Pure reinforcement learning from self-play |
AlphaGo Zero used a single neural network with two output heads: a policy head (producing move probabilities) and a value head (producing a win probability estimate). The body of the network was a deep residual network consisting of an initial convolutional block followed by either 19 or 39 residual blocks (the commonly cited 20-block and 40-block versions), each residual block containing two convolutional layers with batch normalization and ReLU activations. The use of residual connections (skip connections) allowed the network to be trained to much greater depth than the original 13-layer CNN [9].
The input to the network was dramatically simplified compared to the original AlphaGo. Instead of 48 hand-crafted feature planes, AlphaGo Zero used only 17 binary feature planes: 8 planes encoding the positions of black stones over the last 8 time steps, 8 planes encoding white stone positions over the same period, and 1 plane indicating the current player's color [9].
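A PyTorch sketch of the dual-headed design follows; the head shapes match the paper's description, though batch normalization in the heads and other training details are omitted for brevity:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, filters=256):
        super().__init__()
        self.c1, self.b1 = nn.Conv2d(filters, filters, 3, padding=1), nn.BatchNorm2d(filters)
        self.c2, self.b2 = nn.Conv2d(filters, filters, 3, padding=1), nn.BatchNorm2d(filters)

    def forward(self, x):
        y = torch.relu(self.b1(self.c1(x)))
        return torch.relu(x + self.b2(self.c2(y)))  # skip connection

class DualHeadNet(nn.Module):
    """Shared residual body feeding a policy head (362 outputs: 361
    board points plus pass) and a value head (one tanh scalar)."""

    def __init__(self, blocks=19, filters=256):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(17, filters, 3, padding=1),
                                  nn.BatchNorm2d(filters), nn.ReLU())
        self.body = nn.Sequential(*[ResBlock(filters) for _ in range(blocks)])
        self.policy = nn.Sequential(nn.Conv2d(filters, 2, 1), nn.Flatten(),
                                    nn.Linear(2 * 19 * 19, 362))
        self.value = nn.Sequential(nn.Conv2d(filters, 1, 1), nn.Flatten(),
                                   nn.Linear(19 * 19, 256), nn.ReLU(),
                                   nn.Linear(256, 1), nn.Tanh())

    def forward(self, x):  # x: (batch, 17, 19, 19)
        h = self.body(self.stem(x))
        return self.policy(h), self.value(h)  # move logits, win estimate
```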
AlphaGo Zero's training was remarkably simple in concept: the current network guides MCTS during self-play games; for each position, the MCTS visit counts become the training target for the policy head, and the final game outcome becomes the target for the value head; the retrained network then drives the next round of self-play [9].
The neural network and the tree search improve each other in a virtuous cycle: as the network becomes more accurate, the tree search becomes more effective, and the stronger tree search generates better training data for the network.
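In code, the per-batch training objective reduces to two terms (a sketch; `net` is assumed to return policy logits and a value as in the architecture sketch above, with the paper's L2 regularization delegated to the optimizer's weight decay):

```python
import torch.nn.functional as F

def zero_loss(net, states, pi, z):
    """AlphaGo Zero-style loss: (z - v)^2 - pi^T log p, averaged over
    the batch. states: (B, 17, 19, 19); pi: (B, 362) MCTS visit-count
    distributions; z: (B,) game outcomes in {-1, +1}."""
    logits, v = net(states)
    value_loss = F.mse_loss(v.squeeze(1), z)
    policy_loss = -(pi * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    return value_loss + policy_loss
```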
AlphaGo Zero's learning curve was extraordinary:
| Training time | Elo rating (approx.) | Milestone |
|---|---|---|
| 0 hours | Random play | Completely random moves |
| 3 days (4.9 million games) | ~3,700 | Surpassed AlphaGo Lee (the version that beat Lee Sedol) |
| 21 days | ~5,000 | Surpassed AlphaGo Master (60-0 online version) |
| 40 days (29 million games) | ~5,185 | Surpassed all previous versions; strongest Go player in history |
The three-day version of AlphaGo Zero defeated AlphaGo Lee, the version that beat Lee Sedol, by 100 games to 0. The 40-day version defeated AlphaGo Master by 89 games to 11 [9].
One of the most striking findings from the AlphaGo Zero paper was that the system independently rediscovered known Go strategies during its training. In its early phases, it learned basic tactics. Over time, it developed standard openings (joseki) used by human professionals. Eventually, it moved beyond known human strategies and developed novel approaches of its own, some of which professional Go players found genuinely instructive [9].
On December 5, 2017, less than two months after the AlphaGo Zero paper, DeepMind released a preprint describing AlphaZero, a generalized version of the AlphaGo Zero algorithm that could master not just Go but also chess and shogi (Japanese chess) [10]. The paper, "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm," was authored by David Silver, Thomas Hubert, Julian Schrittwieser, and colleagues. A more detailed version was later published in Science in December 2018 [11].
AlphaZero used the same general architecture and training approach as AlphaGo Zero. The key innovation was generality: the same algorithm, with minimal modification, could learn any two-player perfect-information game given only the rules. No game-specific knowledge beyond the rules was provided.
| Game | Opponent | Training time | Result |
|---|---|---|---|
| Chess | Stockfish (2016 TCEC world champion) | ~4 hours | Won 155, lost 6, drew 839 out of 1,000 games |
| Shogi | Elmo (2017 CSA world champion) | ~2 hours | Won 91.2% of games |
| Go | AlphaGo Zero (3-day version) | ~8 hours | Won 61% of games |
AlphaZero's chess play attracted particular attention from the chess community. The program developed an aggressive, dynamic playing style that favored piece activity and long-term positional advantages over material. Former world chess champion Garry Kasparov praised AlphaZero's style, noting that it played in a way that was "recognizably human" yet also alien, willing to sacrifice material for initiative in ways that conventional engines would not [10].
AlphaZero used approximately 5,000 first-generation TPUs to generate self-play games and 64 second-generation TPUs to train the neural networks, all running in parallel [10].
AlphaGo's victories had a profound and lasting effect on the global Go community. The game of Go, with its 2,500-year history, occupies a position of cultural significance in East Asia comparable to chess in the West, but with even deeper roots in philosophy, art, and intellectual tradition. In China, Japan, and Korea, Go is not merely a game but a cultural institution, and its top players are celebrities.
AlphaGo's defeat of Lee Sedol was front-page news across East Asia and received extensive coverage worldwide. In South Korea, the match drew the largest TV audience for a Go event in history. The psychological impact on professional players was significant. Several top professionals described feeling a mix of admiration, loss, and existential unease about their life's work [12].
Lee Sedol himself retired from professional Go in November 2019, citing AlphaGo as a factor in his decision. In an interview with Yonhap News, he said: "With the debut of AI in Go games, I've realized that I'm not at the top even if I become the number one through frantic efforts. Even if I become the number one, there is an entity that cannot be defeated" [12].
However, many Go professionals found AlphaGo's influence to be ultimately positive. The program introduced new opening strategies and middle-game ideas that human players adopted and built upon. Professional Go players began using AI tools for training, analyzing positions, and preparing for matches. Several of AlphaGo's moves, including ideas first seen in its games, became part of the standard professional repertoire. The 3-3 point invasion in the early opening, which AlphaGo favored and which contradicted decades of professional convention, became widely adopted after professionals studied AlphaGo's games [13].
Move 37 from Game 2 against Lee Sedol transcended Go to become a broader cultural symbol. It was referenced in discussions about AI creativity, the nature of intuition, and the relationship between human and machine intelligence. A documentary film, AlphaGo (2017), directed by Greg Kohs, told the story of the Lee Sedol match and featured Move 37 prominently. The film received critical acclaim and was screened at the Tribeca Film Festival [14].
The move challenged the common assumption that AI systems can only optimize within known patterns and cannot produce genuinely novel ideas. While the question of whether AlphaGo is truly "creative" remains a subject of philosophical debate, the practical impact was undeniable: the program generated a move that thousands of years of human play had never produced, and it turned out to be strong.
The AlphaGo matches, particularly the Lee Sedol series, brought AI into mainstream public consciousness in a way that few previous developments had. The match was covered by major media outlets worldwide, and the live streams drew millions of viewers. In South Korea and China, the matches sparked a surge of interest in both Go and AI. Enrollment in Go classes reportedly increased in China following the matches, as public attention brought new players to the game [13].
The event also prompted public discussion about the pace of AI progress, the future of human work, and the societal implications of increasingly capable AI systems. For many people, the AlphaGo match was the moment they first took seriously the possibility that AI could perform tasks requiring what appeared to be intuition and creativity.
AlphaGo, and especially AlphaGo Zero, demonstrated the power of reinforcement learning combined with deep neural networks at a scale and level of performance that had not been achieved before. The progression from AlphaGo (trained partly on human data) to AlphaGo Zero (trained entirely from scratch) showed that self-play reinforcement learning could not only match but surpass approaches that relied on human expertise. This finding influenced the broader AI research community's approach to training agents for complex tasks.
The idea that an AI system could start from zero knowledge and achieve superhuman performance purely through self-play was a powerful proof of concept. It suggested that human knowledge, while useful as a starting point, might actually constrain an AI system by anchoring it to human strategies and biases.
AlphaGo's architecture and training approach inspired a succession of systems at DeepMind and beyond:
| System | Year | Domain | Key advance |
|---|---|---|---|
| AlphaGo Zero | 2017 | Go | Self-play without human data |
| AlphaZero | 2017 | Chess, shogi, Go | Generalized single algorithm for multiple games |
| MuZero | 2019 | Atari, chess, shogi, Go | Learned its own model of game dynamics without being given the rules |
| AlphaStar | 2019 | StarCraft II | Applied similar principles to a real-time strategy game with imperfect information |
| AlphaFold | 2018-2020 | Protein structure prediction | Applied deep learning to the protein folding problem, winning CASP13 and CASP14 |
| AlphaCode | 2022 | Competitive programming | Applied deep learning and search to code generation |
MuZero, published in Nature in 2020, extended the AlphaZero approach further by eliminating the need to provide the system with game rules. MuZero learned its own internal model of how the environment worked, achieving superhuman performance in Go, chess, shogi, and Atari games without being told the rules of any of them [15].
AlphaFold, while architecturally quite different from AlphaGo, was developed at DeepMind by a team that drew on the lab's experience with the AlphaGo project. AlphaFold's solution to the protein structure prediction problem, one of biology's grand challenges, earned Demis Hassabis and John Jumper the Nobel Prize in Chemistry in 2024 [16].
Several technical ideas from AlphaGo have found broader application in AI research:
Neural network-guided tree search. The idea of using a learned policy to guide tree search, combined with a learned value function for evaluation, has been adopted in diverse areas including theorem proving, program synthesis, and planning. The PUCT algorithm used in AlphaGo's MCTS has become a standard approach in neural MCTS implementations.
Self-play as a training paradigm. AlphaGo Zero's demonstration that self-play could produce superhuman performance from scratch influenced research in multi-agent systems, curriculum learning, and emergent complexity.
Combining supervised and reinforcement learning. The original AlphaGo's two-phase training (supervised pretraining followed by RL fine-tuning) anticipated the pretrain-then-fine-tune paradigm that later became standard in natural language processing with models like BERT and GPT.
Dual-headed network architecture. AlphaGo Zero's use of a single network with both policy and value heads influenced the design of multi-task and multi-objective neural architectures.
Before AlphaGo, a common view among AI researchers was that mastering Go was at least a decade away. A 2015 survey of AI experts placed human-level Go play at roughly 2025 [17]. AlphaGo's victory in 2016 arrived far earlier than most predictions, contributing to a broader recalibration of timelines for AI capabilities. This recalibration influenced both research priorities and public policy discussions around AI safety, ethics, and regulation.
AlphaGo was central to DeepMind's identity and public profile. Before AlphaGo, DeepMind was known primarily within the AI research community for its work on deep reinforcement learning applied to Atari games (published in Nature in 2015). The AlphaGo matches transformed DeepMind into a household name, at least in technology circles, and validated Google's investment in the company.
The AlphaGo project also demonstrated the value of Google's hardware infrastructure, particularly its Tensor Processing Units (TPUs). The Lee Sedol match used an early version of Google's TPUs, and the efficiency gains from custom hardware were a significant factor in AlphaGo's improvement across versions.
For Google, AlphaGo served as a showcase of the company's AI capabilities during a period of intense competition with other technology companies. The matches generated enormous media coverage and helped establish Google (and DeepMind) as leaders in AI research. In 2023, DeepMind merged with Google Brain (Google's other major AI research division) to form Google DeepMind, with Demis Hassabis as CEO [3].
| Date | Event |
|---|---|
| September 2010 | DeepMind founded by Demis Hassabis, Shane Legg, and Mustafa Suleyman |
| January 2014 | Google acquires DeepMind for approximately 400 million GBP |
| October 2015 | AlphaGo defeats Fan Hui 5-0 (kept secret until January 2016) |
| January 27, 2016 | Original AlphaGo paper published in Nature |
| March 9-15, 2016 | AlphaGo defeats Lee Sedol 4-1 in Seoul |
| December 2016 - January 2017 | AlphaGo Master wins 60 consecutive online games against top professionals |
| May 23-27, 2017 | AlphaGo defeats Ke Jie 3-0 at the Future of Go Summit; AlphaGo retires from competition |
| October 19, 2017 | AlphaGo Zero paper published in Nature |
| December 5, 2017 | AlphaZero preprint released, generalizing the approach to chess and shogi |
| December 2018 | AlphaZero paper published in Science |
| November 2019 | Lee Sedol retires from professional Go |