# OpenAI Five

> Source: https://aiwiki.ai/wiki/openai_five
> Updated: 2026-06-22
> Categories: AI in Gaming, Artificial Intelligence, OpenAI, Reinforcement Learning
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

OpenAI Five was a [reinforcement learning](/wiki/reinforcement_learning) system developed by [OpenAI](/wiki/openai) to play the competitive multiplayer video game [Dota 2](/wiki/dota_2) at a professional level. On April 13, 2019, OpenAI Five became the first AI system to defeat the reigning world champions in a major esports title, beating Team OG 2-0 in a best-of-three series in San Francisco.[3] OpenAI described it as "the first AI to beat the world champions in an esports game, having won two back-to-back games versus the world champion Dota 2 team, OG, at Finals this weekend."[3] The system trained using [Proximal Policy Optimization](/wiki/ppo) (PPO) at massive scale, accumulating roughly 45,000 years of in-game experience over ten months of wall-clock self-play training, on a cluster that started at 256 GPUs and 128,000 CPU cores.[1][12] Days after the OG match, OpenAI opened the bots to the public in the OpenAI Five Arena, where the system won 99.4% of its games (7,215 wins, 42 losses) against human teams.[1][10]

## Background

Dota 2 is a multiplayer online battle arena (MOBA) game developed by Valve Corporation. In a standard match, two teams of five human players compete to destroy the opposing team's base structure called the Ancient. Each player controls a "hero" selected from a pool of over 115 characters, each with unique abilities. Matches typically last 30 to 60 minutes and require a combination of mechanical skill, strategic planning, team coordination, and real-time adaptation.

OpenAI selected Dota 2 as a research challenge because the game presented several properties that made it far more difficult for AI systems than previous game-playing benchmarks like chess or Go.[2] Greg Brockman, OpenAI's co-founder and then-CTO, described the game as a stepping stone toward building AI systems that could handle the complexity and unpredictability of real-world problems.

## Why Was Dota 2 a Hard Problem for AI?

Dota 2 posed a number of challenges that distinguished it from earlier AI game-playing milestones:

**Partial observability.** Unlike chess or Go, where both players can see the entire board, Dota 2 uses a "fog of war" mechanic. Players can only see areas of the map near their own units or wards, meaning much of the game state is hidden at any given time. The AI had to make decisions under significant uncertainty about enemy positions, intentions, and item builds.[1]

**Long time horizons.** A typical Dota 2 match involves around 20,000 timesteps (at the frame rate used by OpenAI Five). Actions taken in the early game, such as resource allocation and lane assignments, can have consequences that only become apparent 20 or 30 minutes later. This made [credit assignment](/wiki/credit_assignment) extremely difficult.[1]

**Enormous action space.** At each timestep, a hero can choose from roughly 8,000 to 80,000 valid actions depending on the situation. The theoretical action space, factoring in all possible combinations of action type, target, and positioning, reaches approximately 1.8 million dimensions.[1] For comparison, Go has a branching factor of roughly 250 moves per turn, and chess about 35.

**High-dimensional observation space.** Each hero receives approximately 16,000 numerical inputs per timestep describing the game state, including unit positions, health values, ability cooldowns, item inventories, and more. Rather than processing raw screen pixels, OpenAI Five consumed structured data through Valve's bot API.[1]

**Five-player coordination.** Dota 2 is a team game requiring tight coordination among five players. Successful play demands role specialization, shared map control, coordinated team fights, and collective resource management. Each of the five [AI agents](/wiki/ai_agents) needed to learn cooperative behavior without explicit communication channels.[1]

**Complex game mechanics.** Dota 2's rules are implemented in hundreds of thousands of lines of code, with intricate interactions between hero abilities, items, terrain, and neutral creeps. The game receives frequent patches that alter balance and mechanics.

## The 1v1 Bot (August 2017)

### Development

Before tackling the full 5v5 game, OpenAI built a bot that played 1v1 mid-lane matches using the hero Shadow Fiend.[6] Development of the underlying algorithms began in November 2016. The bot learned entirely through self-play, starting with no prior knowledge of the game and gradually discovering effective strategies by playing against copies of itself.[6] According to Greg Brockman, the bot required approximately two weeks of training to reach a competitive level.

The progression was rapid. By March 2017, the system achieved its first classical reinforcement learning results in a simplified Dota environment. By early June 2017, it could beat a tester at 1,500 matchmaking rating (MMR). By June 30, it won the majority of games against a 3,000 MMR tester. By July 8, it secured its first win against a 7,500 MMR semi-professional tester.[6]

### The International 2017 Demonstration

On August 11, 2017, OpenAI staged a surprise demonstration at The International 2017 (TI7), Dota 2's premier annual championship tournament, held in Seattle. The bot was matched against Danylo "Dendi" Ishutin, a Ukrainian professional player and former world champion widely regarded as one of the most recognizable figures in competitive Dota 2.[6]

The match was played under standard 1v1 Shadow Fiend rules: first to two kills or first to destroy the enemy tower, with no neutral creeps and with item restrictions. The bot won the first game in under ten minutes, establishing a commanding lead in last hits (34 to Dendi's 14, with 15 denies to Dendi's 2). Dendi conceded the second game shortly after it began. During the match, Dendi repeatedly remarked, "This guy is scary."[6]

In the days surrounding TI7, the bot also defeated several other top players in private matches, including Arteezy (rated approximately 10,000 MMR, one of the highest-rated players in the world) with a 10-0 record, SumaiL (a top 1v1 specialist) 6-0, Pajkatt (a professional player rated 8,500 MMR) 2-1, and Blitz (a former professional rated 6,200 MMR) 3-0.[6]

### Limitations

The 1v1 bot operated under significant constraints. It played only one hero (Shadow Fiend) in a simplified 1v1 mid-lane format that eliminated most of the strategic complexity of the full game. There were no allied or enemy teammates, no jungle, limited items, and no need for map awareness or team coordination. Critics noted that 1v1 mid was largely a test of mechanical execution and lane control rather than deep strategic reasoning.

## OpenAI Five: The 5v5 System

### Architecture

OpenAI Five consisted of five neural networks, one controlling each hero on the team. The networks were replicas of a single policy: each shared the same architecture and weights but received different observations indicating which of the five heroes it controlled.[1] The core of each network was a single-layer [Long Short-Term Memory](/wiki/long_short-term_memory_lstm) (LSTM) network with 4,096 hidden units. The full model contained approximately 159 million parameters, with the LSTM accounting for roughly 84% of the total parameter count.[1]

The observation space was flattened into a single vector of approximately 16,000 values per hero, representing all game-state information available to a human player (unit positions, health, mana, cooldowns, items, and so on). All floating-point observations were normalized using z-scores (subtracting the mean and dividing by the standard deviation) and clipped to the range (-5, 5) for training stability.[1]
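The normalization step described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `normalize_observation`, the `eps` guard, and the three-feature toy history are assumptions made for the example, not details from the published system.

```python
from statistics import fmean, pstdev


def normalize_observation(values, mean, std, clip=5.0, eps=1e-8):
    """Z-score each value, then clip the result to [-clip, clip]."""
    normalized = []
    for v, m, s in zip(values, mean, std):
        z = (v - m) / (s + eps)  # eps guards against zero variance
        normalized.append(max(-clip, min(clip, z)))
    return normalized


# Hypothetical history of three observation features (toy numbers).
history = [[100.0, 0.2, 3.0], [300.0, 0.4, 5.0], [500.0, 0.9, 4.0]]
columns = list(zip(*history))
mean = [fmean(c) for c in columns]
std = [pstdev(c) for c in columns]

# The extreme first feature is clipped rather than dominating the input.
obs = normalize_observation([5000.0, 0.5, 4.0], mean, std)
print(obs)
```

Clipping bounds the influence of rare outliers, such as an unusually large gold or damage value, on the network's inputs.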

The action space used a factored structure. At each timestep, a hero selected a primary action (from up to 30 possibilities, averaging 8.1 available per timestep), plus parameters for delay (4 dimensions), unit selection (189 dimensions), and spatial offset (81 dimensions). The combined theoretical action space, the product of these four factor sizes, reached 1,837,080 dimensions, though invalid actions were filtered based on cooldowns, valid targets, and situational constraints.[1]
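The size of the factored action space follows directly from multiplying the factor sizes quoted above. The short check below reproduces that arithmetic; the dictionary keys are labels chosen for this example.

```python
from math import prod

# Factor sizes as quoted in the text above.
ACTION_FACTORS = {
    "primary_action": 30,
    "delay": 4,
    "unit_selection": 189,
    "spatial_offset": 81,
}

theoretical_size = prod(ACTION_FACTORS.values())
print(f"{theoretical_size:,}")  # 1,837,080
```

Factoring matters because the network can emit four small output heads (30 + 4 + 189 + 81 logits) rather than a single head with one logit per combination.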

Critically, the five hero networks did not communicate directly with each other. There was no shared memory, messaging system, or centralized coordinator. All coordination emerged purely from training through self-play. Each agent learned to anticipate what its teammates would do based on the observable game state.[1]

### How Was OpenAI Five Trained? PPO at Scale

OpenAI Five was trained using Proximal Policy Optimization (PPO), a policy gradient [reinforcement learning](/wiki/reinforcement_learning) algorithm developed at OpenAI. PPO was chosen for its stability and scalability when applied to large-scale distributed training.[1]
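For readers unfamiliar with PPO, its central idea is a clipped surrogate objective that limits how far a single update can move the policy away from the one that collected the data. The sketch below evaluates that objective for one sample. It is a generic illustration rather than OpenAI Five's training code: the clipping value `epsilon=0.2` is a commonly used default assumed here, and the value-function and entropy terms that usually accompany the objective are omitted.

```python
import math


def ppo_clipped_objective(new_logp, old_logp, advantage, epsilon=0.2):
    """Clipped surrogate objective for one sample (to be maximized)."""
    ratio = math.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
    # The minimum removes any incentive to push the ratio outside
    # [1 - epsilon, 1 + epsilon] in the direction the advantage favors.
    return min(ratio * advantage, clipped_ratio * advantage)


# A good action (positive advantage) whose probability already rose by 50%:
# the objective is capped, so there is no gain from raising it further.
capped = ppo_clipped_objective(math.log(1.5), 0.0, advantage=1.0)

# A bad action (negative advantage) whose probability rose by 50%:
# the penalty is not clipped, so the update can still correct it.
uncapped = ppo_clipped_objective(math.log(1.5), 0.0, advantage=-1.0)
```

This conservative update rule is one reason the algorithm tolerates the slightly stale experience that asynchronous, large-scale data collection produces.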

The training ran on a custom distributed platform called "Rapid," hosted on Google Cloud Platform. The infrastructure consisted of:

- **256 NVIDIA P100 [GPUs](/wiki/gpu_computing)** for optimization (gradient computation and parameter updates)
- **128,000 preemptible CPU cores** for running game simulations (rollouts)

The system was organized into several components:

| Component | Role |
|---|---|
| Rollout Workers (CPUs) | Simulated Dota 2 games and collected experience data |
| Forward Pass GPUs | Computed actions for rollout workers during gameplay |
| Optimizer GPUs | Sampled experience from the buffer, computed gradients, and updated model parameters |
| Controller | Distributed updated parameters to all components |
| Experience Buffer | Stored gameplay data for the optimizers to sample from |

Rollout workers and optimizers operated asynchronously. The system targeted a sample reuse ratio close to 1, meaning optimizers consumed experience data at roughly the same rate that rollout workers produced it. Stale data was treated as harmful; game data was sent every 30 seconds, and model parameters were updated approximately once per minute.[1]

At peak throughput, OpenAI Five played the equivalent of 180 years of Dota 2 per day during its initial training phase. Over the full ten-month training period (June 30, 2018 to April 22, 2019), the system accumulated approximately 45,000 years of in-game experience. The optimization compute consumed by the time of the OG match was estimated at 770 plus or minus 50 PetaFLOP/s-days.[1]

The optimization itself grew far beyond the initial 2018 configuration. Each optimizer GPU computed gradients on minibatches of 120 sample sequences of 16 timesteps each, and gradients were averaged across a pool that peaked at 1,536 GPUs, for an effective batch size of up to 2,949,120 timesteps; the published paper describes the system as learning from batches of approximately 2 million frames every 2 seconds.[1] By the team's accounting, the run used batch sizes 50 to 150 times larger than [AlphaGo](/wiki/alphago)'s, a model roughly 20 times larger, and about 25 times more training time, with 180 days of actual training spread across the ten months of wall-clock time because of restarts and reverts.[1] The often-quoted 770 plus or minus 50 PetaFLOP/s-days figure corresponds to optimization compute consumed by the time of the OG match on April 13, 2019; by the final shutdown on April 22 the total had reached 820 plus or minus 50 PetaFLOP/s-days.[1]
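The effective batch size quoted above is the product of three numbers from the same paragraph, which the following check reproduces. The constant names are chosen for this example.

```python
SEQUENCES_PER_GPU = 120       # sample sequences per optimizer GPU minibatch
TIMESTEPS_PER_SEQUENCE = 16   # timesteps in each sample sequence
PEAK_OPTIMIZER_GPUS = 1536    # peak size of the gradient-averaging pool

timesteps_per_gpu = SEQUENCES_PER_GPU * TIMESTEPS_PER_SEQUENCE
effective_batch = timesteps_per_gpu * PEAK_OPTIMIZER_GPUS

print(f"{timesteps_per_gpu:,} timesteps per GPU")       # 1,920
print(f"{effective_batch:,} timesteps per batch")       # 2,949,120
```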

### Reward Shaping

Training a reinforcement learning agent on a game as complex as Dota 2 with only a win/loss signal at the end of a 45-minute match would be extremely slow. OpenAI addressed this by designing a shaped reward function that provided intermediate feedback throughout the game.[1]

The reward function included components tied to game metrics that human players use to evaluate performance:

| Reward Component | Description |
|---|---|
| Kills | Reward for killing enemy heroes |
| Deaths | Penalty for being killed |
| Assists | Reward for contributing to teammate kills |
| Last Hits | Reward for landing the killing blow on enemy creeps (gold income) |
| Net Worth | Reward tied to total gold and item value |
| Tower Damage | Reward for damaging or destroying enemy towers |

Two important mechanisms modified the raw rewards:

1. **Zero-sum adjustment.** Each hero's reward was adjusted by subtracting the average reward of the enemy team, preventing agents from discovering positive-sum exploits that would not translate to competitive play.[1]

2. **Exponential time weighting.** Rewards were scaled based on game time to prevent agents from overvaluing late-game actions, where power levels naturally increase and rewards grow larger in absolute terms.[1]
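The two adjustments above can be sketched as small functions. This is an illustration under stated assumptions: the function names are invented for the example, and the decay constants (`base=0.6` per `period_minutes=10.0`) are placeholder values, since the text above does not give the actual schedule.

```python
def zero_sum_rewards(team_rewards, enemy_rewards):
    """Subtract the enemy team's mean reward from each hero's reward."""
    enemy_mean = sum(enemy_rewards) / len(enemy_rewards)
    return [r - enemy_mean for r in team_rewards]


def time_decayed(reward, game_minutes, base=0.6, period_minutes=10.0):
    """Scale a reward down exponentially as the game clock advances.

    base and period_minutes are illustrative placeholders.
    """
    return reward * base ** (game_minutes / period_minutes)


radiant = [1.0, 2.0]
dire = [1.0, 3.0]
radiant_adjusted = zero_sum_rewards(radiant, dire)   # [-1.0, 0.0]
dire_adjusted = zero_sum_rewards(dire, radiant)      # [-0.5, 1.5]

# After both adjustments, the two teams' mean rewards cancel out.
balance = (sum(radiant_adjusted) / len(radiant_adjusted)
           + sum(dire_adjusted) / len(dire_adjusted))
```

Because the team means cancel, the two sides cannot both profit from the same behavior, which is the property the zero-sum adjustment is meant to enforce.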

OpenAI ran experiments comparing the shaped reward to a pure win/loss signal. The win/loss-only version trained an order of magnitude slower and plateaued at a lower skill level.[1]

### Team Spirit

One of the more notable design choices in OpenAI Five was a hyperparameter called "team spirit," denoted by the Greek letter tau. This parameter controlled the balance between individual and collective reward for each agent, using a simple formula:[1]

effective_reward[i] = tau * mean(all_hero_rewards) + (1 - tau) * hero_reward[i]

At tau = 0, each hero cared only about its own individual reward (kills, last hits, net worth). At tau = 1, every hero received the team's average reward, so individual and team incentives were identical and behavior was fully cooperative.

During training, tau was annealed from 0.2 at the start to 0.97 near the end. Early in training, lower team spirit allowed agents to learn basic individual skills like farming and fighting. As training progressed, higher team spirit pushed the agents toward coordinated team play, sacrificing individual advantage for collective benefit.[1]
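The team spirit formula translates directly into code. The sketch below applies it to a toy reward vector in which one hero earned everything; the function name and the reward values are made up for the example.

```python
def blend_team_spirit(hero_rewards, tau):
    """effective[i] = tau * mean(all rewards) + (1 - tau) * reward[i]."""
    team_mean = sum(hero_rewards) / len(hero_rewards)
    return [tau * team_mean + (1.0 - tau) * r for r in hero_rewards]


rewards = [10.0, 0.0, 0.0, 0.0, 0.0]   # one hero got the kill

selfish = blend_team_spirit(rewards, tau=0.0)   # unchanged individual rewards
early = blend_team_spirit(rewards, tau=0.2)     # mostly individual credit
late = blend_team_spirit(rewards, tau=0.97)     # almost fully shared credit
shared = blend_team_spirit(rewards, tau=1.0)    # every hero gets the mean
```

The blend redistributes credit without changing the team total, so raising tau changes who is rewarded for an outcome, not how much reward the outcome is worth.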

### Self-Play and Opponent Sampling

OpenAI Five trained entirely through self-play, with no human demonstration data or imitation learning. The system played 80% of its games against the latest version of itself and 20% against past checkpoints sampled from its training history.[1] This mixture helped prevent "strategy collapse," where the agent might develop a narrow set of tactics that work well against its current self but fail against diverse opponents.

Past opponents were selected using a dynamic quality scoring system that prioritized informative matchups over random historical snapshots.[1]
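The 80/20 opponent mixture can be sketched as follows. This is a simplified stand-in: past checkpoints are drawn uniformly here, whereas the system described above weighted them by a quality score whose details are not given in the text. The function name and checkpoint labels are hypothetical.

```python
import random


def sample_opponent(latest, past_checkpoints, rng, latest_prob=0.8):
    """Pick the latest policy with probability latest_prob, else a past one.

    Uniform sampling over past checkpoints is an assumption of this sketch.
    """
    if not past_checkpoints or rng.random() < latest_prob:
        return latest
    return rng.choice(past_checkpoints)


rng = random.Random(0)  # seeded for reproducibility
past = ["ckpt_001", "ckpt_002", "ckpt_003"]
draws = [sample_opponent("latest", past, rng) for _ in range(10_000)]
latest_fraction = draws.count("latest") / len(draws)
print(f"{latest_fraction:.1%} of games against the latest version")
```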

### Surgery: Adapting to Change

Over the 296-day (approximately 10-month) training run, OpenAI needed to modify the model multiple times to accommodate game patches, changes to the hero pool, and architectural improvements. The team developed a technique called "surgery" to handle these transitions.[1]

When model changes maintained the same input-output structure, the new model was initialized to replicate the old model's behavior as closely as possible. When this was not feasible (for example, when the observation space changed due to a game patch), the team gradually increased the proportion of games played with the new version, allowing the model to adapt incrementally rather than starting from scratch.[1]
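One way to picture function-preserving initialization is to widen a hidden layer while zeroing the new units' outgoing weights, so the enlarged network computes exactly what the old one did. The sketch below does this for a tiny two-layer feed-forward network in pure Python. It illustrates the general idea only; it is not the team's implementation, and a feed-forward layer stands in for the LSTM for brevity.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(w1, w2, x):
    """Two-layer network: y = W2 * relu(W1 * x)."""
    return matvec(w2, relu(matvec(w1, x)))


def widen_hidden(w1, w2, extra_units, rng):
    """Add hidden units without changing the network's output.

    New rows of W1 get small random weights so the new units can learn;
    the matching new columns of W2 are zero, so they contribute nothing yet.
    """
    n_inputs = len(w1[0])
    new_rows = [[rng.gauss(0.0, 0.1) for _ in range(n_inputs)]
                for _ in range(extra_units)]
    wider_w1 = [row[:] for row in w1] + new_rows
    wider_w2 = [row[:] + [0.0] * extra_units for row in w2]
    return wider_w1, wider_w2


w1 = [[0.5, -1.0], [1.5, 2.0]]   # 2 inputs -> 2 hidden units
w2 = [[1.0, -2.0]]               # 2 hidden units -> 1 output
x = [1.0, 2.0]

before = forward(w1, w2, x)
wide_w1, wide_w2 = widen_hidden(w1, w2, extra_units=2, rng=random.Random(0))
after = forward(wide_w1, wide_w2, x)
```

Training then resumes from the widened model, which starts at the old skill level instead of at random play.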

In total, the team performed more than twenty successful surgeries, alongside many attempts that were reverted after training failures, averaging roughly one surgery every two weeks.[1] Documented changes included switching the environment from multiple couriers to the standard single courier, doubling the LSTM from 2,048 to 4,096 units, adding the items Bottle and Divine Rapier, handing the previously scripted buyback decision to the model, and absorbing the Dota 2 patches 7.19, 7.20, and 7.21.[1] The final environment change, to game version 7.21d, came just eight days before the match against OG, which the team noted would not have been possible if it had needed to retrain from scratch.[1]

### Rerun: Validating the Final Pipeline

After the competitive phase ended, OpenAI trained a second agent, called "Rerun," from scratch on the final environment, model architecture, and codebase, starting on May 18, 2019.[1] Because it skipped the project's long history of game patches, rule changes, and architectural revisions, Rerun took two months and 150 plus or minus 5 PetaFLOP/s-days of compute, roughly 20 percent of the resources consumed by OpenAI Five itself.[1] Rerun did not merely match the champion-beating agent: it continued improving past it and reached a win rate of over 98 percent against the final version of OpenAI Five, at which point training was stopped because the goal of validating the final code and hyperparameters had been met.[1] The team estimated that naively retraining from scratch after each of its roughly twenty major surgeries would have stretched the ten-month project to about 40 months.[1] On the project's internal TrueSkill scale, where 0 corresponds to random play, the version of OpenAI Five that defeated OG was rated 254.[1]

## Timeline of Key Events and Matches

| Date | Event | Result |
|---|---|---|
| November 2016 | Algorithm development begins | N/A |
| March 2017 | First RL results in simplified Dota environment | N/A |
| August 11, 2017 | 1v1 bot vs. Dendi at TI7[6] | Bot wins 2-0 |
| June 25, 2018 | OpenAI Five announced; beats amateur teams[2] | OpenAI Five wins |
| August 5, 2018 | OpenAI Five Benchmark vs. casters/ex-pros (~4,200 MMR)[5] | OpenAI Five wins 2-1 |
| August 22-23, 2018 | TI8 Showmatches vs. paiN Gaming and a Chinese all-star team[4] | OpenAI Five loses both matches |
| April 13, 2019 | OpenAI Five Finals vs. OG (TI8 world champions)[3] | OpenAI Five wins 2-0 |
| April 18-21, 2019 | OpenAI Five Arena (public online event)[10] | 99.4% win rate (7,215 wins, 42 losses) |
| April 22, 2019 | Training officially ends; project retired[1] | N/A |
| May 2019 | "Rerun" experiment retrains a fresh agent on the final environment[1] | Exceeds 98% win rate vs. final OpenAI Five |

## Game Restrictions

OpenAI Five operated under a set of game restrictions that simplified the full Dota 2 experience. These restrictions were gradually relaxed over the project's lifetime, but some remained in place through the final matches against OG.

### Principal Restrictions

- **Restricted hero pool.** The system supported only 17 heroes in the final version[1] (down from 18 after Lich was removed due to a major rework in Dota 2 patch 7.20). The full game features over 115 heroes. The 17-hero roster consisted of: Axe, Crystal Maiden, Death Prophet, Earthshaker, Gyrocopter, Lion, Necrophos, Queen of Pain, Razor, Riki, Shadow Fiend, Slark, Sniper, Sven, Tidehunter, Viper, and Witch Doctor.[10]
- **No Divine Rapier (early versions).** This high-risk, high-reward item was initially excluded. Support for it was added partway through training via surgery, so it was available to the final system.[1]
- **No Bottle (early versions).** This commonly used regeneration item was initially unavailable. It, too, was added to the training environment in a later surgery.[1]
- **No summons or illusions.** Heroes that create additional controllable units or illusory copies were excluded.
- **Five invulnerable couriers.** Each hero received its own courier (delivery unit) that could not be killed, removing courier management as a gameplay element. This setup applied to the 2018 exhibition matches; a mid-training surgery switched the environment to the standard single shared courier, which the final system controlled through scripted logic.[1]
- **No Scan.** The Scan ability, which lets teams detect enemy presence in an area, was disabled.

### Restrictions Removed Over Time

Earlier versions of OpenAI Five also lacked wards (vision-granting items) and Roshan (a powerful neutral boss that grants significant team advantages when killed). Both were reintroduced before the later matches, adding strategic depth.

In the published system description, OpenAI ultimately characterized the final version as playing with only two limitations relative to the regular game: the 17-hero pool, and the lack of support for items that allow a single player to temporarily control multiple units at once (Illusion Rune, Helm of the Dominator, Manta Style, and Necronomicon), which were removed to avoid the technical complexity of multi-unit control.[1]

The restricted hero pool was one of the most commonly cited limitations. Hero selection ("drafting") is a fundamental part of competitive Dota 2. Teams spend significant effort constructing hero compositions that synergize well and counter the opponent's picks. With only 17 heroes available, this dimension of the game was severely limited. OpenAI reportedly attempted to expand the hero pool to 25 before the OG match but found that the system was not learning quickly enough to reach professional level with the larger pool.

Ablation experiments published with the project paper suggest the restriction was less fundamental than it appeared: in early training, runs with an 80-hero pool progressed only about 20 percent slower than the base 17-hero runs, leading the team to hypothesize that a similarly skilled agent with a much larger pool would have required roughly 20 percent more training time.[1]

## How Did OpenAI Five Beat OG? The OpenAI Five Finals

### The Event

On April 13, 2019, OpenAI hosted the "OpenAI Five Finals" in San Francisco. The headline match pitted OpenAI Five against OG, the winners of The International 2018 (TI8) and the reigning Dota 2 world champions at the time.[3] The event was broadcast on Twitch with commentary from well-known Dota 2 personalities, including William "Blitz" Lee, Austin "Capitalist" Walsh, Owen "ODPixel" Davies, Kevin "Purge" Godec, and Jorien "Sheever" van der Heijden.[9]

OG's roster for the event included their full TI8-winning lineup:

| Player | Real Name | Nationality | Position |
|---|---|---|---|
| ana | Anathan Pham | Australia | Carry (Position 1) |
| Topson | Topias Taavitsainen | Finland | Mid (Position 2) |
| 7ckngMad (Ceb) | Sebastien Debs | France | Offlane (Position 3) |
| JerAx | Jesse Vainikka | Finland | Support (Position 4) |
| N0tail | Johan Sundstein | Denmark | Support (Position 5) |

### Match Results

OpenAI Five won both games decisively, which OpenAI summarized as winning "two back-to-back games versus the world champion Dota 2 team, OG."[3]

**Game 1** lasted 38 minutes and 18 seconds. OpenAI Five drafted Sniper, Earthshaker, Viper, Riki, and Shadow Fiend. OG played Gyrocopter, Witch Doctor, Death Prophet, Tidehunter, and Crystal Maiden. OpenAI Five's announced win estimate stood at 64.3 percent after the first pick of the draft and at 67.6 percent once the draft was complete.[11] The AI established map control in the mid game and systematically dismantled OG's defenses.[9] Its win estimate climbed above 95 percent midway through the game after a pivotal team fight swung in its favor.[11]

**Game 2** lasted just 20 minutes and 51 seconds. OpenAI Five drafted Crystal Maiden, Gyrocopter, Sven, Witch Doctor, and Viper. OG picked Sniper, Earthshaker, Death Prophet, Slark, and Lion. The second game was a dominant performance by the AI, with OG calling "GG" (conceding defeat) before the 21-minute mark.[9]

This result marked the first time an AI system had defeated the reigning world champions in a major esports title in a public, live-streamed competition.[3]

### The OpenAI Five Arena

Following the OG match, OpenAI opened the system to the public through the "OpenAI Five Arena," an online event running from April 18 to April 21, 2019. During the event, anyone could form a five-player team and challenge OpenAI Five under the same rules used in the OG match.[3]

The results were overwhelming. Over the four-day period:

- **30,937 human players** participated.
- OpenAI Five played **7,257 games**, winning **7,215** and losing only **42**.
- The overall win rate was **99.4%**.
- It took **459 games** before a human team recorded the first victory.
- Only **29 unique teams** managed to defeat the AI.[1]
- The total in-game time played amounted to approximately 10.7 years.[10]

In the project paper's accounting, OpenAI Five faced 3,193 distinct teams during the event, and 3,140 of its 7,215 wins came from games that human teams abandoned before completion, sometimes immediately after an unfavorable draft; abandoned games were counted as wins because OpenAI Five itself never abandons a game.[1]
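The Arena figures above can be cross-checked with simple arithmetic, including the win rate restricted to games that were played to completion. The completed-game rate is derived here from the numbers in the text; the constant names are chosen for this example.

```python
TOTAL_GAMES = 7257
WINS = 7215
LOSSES = 42
ABANDONED_WINS = 3140   # wins credited when the human team abandoned

overall_win_rate = WINS / TOTAL_GAMES

# Restrict to games that ran to completion.
completed_wins = WINS - ABANDONED_WINS
completed_win_rate = completed_wins / (completed_wins + LOSSES)

print(f"overall:   {overall_win_rate:.2%}")
print(f"completed: {completed_win_rate:.2%} ({completed_wins:,} wins)")
```

Even with abandoned games excluded, the win rate stays just under 99 percent, so the headline figure is not mainly a product of how abandons were scored.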

### Cooperative Play with Humans

Although OpenAI Five was trained purely to win at self-play, the Finals event also included a cooperative demonstration in which human players were placed on mixed teams with OpenAI Five agents, and the Arena offered a cooperative mode alongside the competitive one, letting the public play with the agents as teammates rather than only against them.[3][13] Sheever, who took part in the demonstration, said her bot-controlled Viper teammate "gave his life for me at some point," adding "he believed in me."[3] OpenAI noted that the system displayed this teammate behavior despite having been trained exclusively against copies of itself.[3]

## How Did OpenAI Five Compare to AlphaStar?

Around the same period, [DeepMind](/wiki/deepmind) developed [AlphaStar](/wiki/alphastar), an AI system for Blizzard Entertainment's real-time strategy game StarCraft II.[8] The two projects represent the most prominent examples of applying deep reinforcement learning to complex competitive video games. While they shared broad similarities, they differed substantially in their technical approaches and the challenges they faced.

| Attribute | OpenAI Five | AlphaStar |
|---|---|---|
| Game | Dota 2 (MOBA) | StarCraft II (RTS) |
| Developer | [OpenAI](/wiki/openai) | [DeepMind](/wiki/deepmind) |
| Year of peak result | 2019 | 2019 |
| Players per team | 5 AI agents (cooperative) | 1 AI agent |
| Core architecture | Single-layer 4,096-unit [LSTM](/wiki/long_short-term_memory_lstm) | Deep LSTM with [Transformer](/wiki/attention) encoder and pointer network |
| Model parameters | ~159 million | ~139 million (55 million at inference) |
| Training method | Pure self-play with [PPO](/wiki/ppo) | Supervised learning on human replays, then RL with league-based self-play |
| Human data used | None | 971,000 human replays |
| Compute hardware | 256 NVIDIA P100 GPUs + 128,000 CPU cores | 16-32 TPUv3s per agent; 384 TPUv3s for league |
| Training duration | ~10 months wall-clock (~180 days of active training) | ~44 days (league training phase) |
| Game experience | ~45,000 years equivalent | Not directly comparable (league-based) |
| Input representation | Structured game-state vectors (bot API) | Structured game-state data (no raw pixels) |
| Game restrictions | 17-hero pool; no items granting control of multiple units (e.g., Manta Style, Necronomicon) | Full game; camera and APM constraints added later |
| Peak achievement | Defeated TI8 champions OG 2-0 | Reached Grandmaster (top 0.2%) in all three races |
| Publication | arXiv, December 2019 | Nature, October 2019 |

One notable difference was the use of human data. AlphaStar bootstrapped its training with supervised learning on nearly one million human replays before transitioning to reinforcement learning, while OpenAI Five learned entirely from scratch through self-play.[8] AlphaStar also used a "league" training approach where multiple agents specialized against different strategies, whereas OpenAI Five used a single population trained against its current and past selves.

Another key distinction involved game restrictions. AlphaStar eventually played the full StarCraft II game with all three races and no gameplay restrictions (though with constrained action rates to approximate human physical limitations).[8] OpenAI Five never played with the full hero roster or all game mechanics enabled.

## Technical Contributions and Impact

### Scaling Reinforcement Learning

OpenAI Five demonstrated that relatively simple reinforcement learning algorithms, when scaled to sufficient compute, could solve problems of remarkable complexity. The system used no search trees, no explicit planning modules, and no hand-coded strategies beyond the reward function. Its performance came entirely from the combination of a large LSTM network, self-play, and massive computational scale.[1]

The project provided evidence for a hypothesis that would become increasingly central to AI research in subsequent years: that scale, in terms of both model size and training compute, could substitute for algorithmic complexity in many domains.

### Transfer to Robotics

OpenAI reused the same reinforcement learning algorithms and training code from OpenAI Five for [Dactyl](/wiki/dactyl), a project that trained a robotic hand (a Shadow Dexterous Hand) to manipulate physical objects, first reorienting a block and later solving a Rubik's Cube.[7] Dactyl ran on the same "Rapid" distributed training platform; the block-reorientation policy used 6,144 CPU cores and 8 GPUs and collected approximately 100 years of simulated experience in 50 hours.[7] The successful transfer demonstrated that the infrastructure and algorithms developed for game-playing could generalize to physical robotics tasks.

### Emergent Coordination

One of the more surprising findings was the degree of team coordination that emerged without explicit communication. The five agents learned to execute complex team fights, set up ambushes, coordinate ability usage, and make collective decisions about when to push objectives or retreat. This coordination arose purely from the team spirit reward mechanism and shared training through self-play. No agent could send messages to its teammates; each simply learned to predict what the others would do.[1]

### Reaction Time and Mechanical Skill

OpenAI constrained the agents' reaction time to between 167 and 267 milliseconds (5 to 8 frames at the game's tick rate), placing them in a range comparable to human professional players. This was done deliberately to ensure that the AI's advantages came from strategic and tactical decision-making rather than superhuman reflexes. The published system description reports an average reaction time of 217 milliseconds, against a typical human visual reaction time of roughly 250 milliseconds.[1] The effective action rate was approximately 7.5 actions per second.[1]
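The timing figures in this section are mutually consistent if the game runs at 30 frames per second, a rate assumed here because it is what "5 frames is about 167 milliseconds" implies. The sketch below works through the arithmetic, including the roughly 20,000 decision steps in a 45-minute match mentioned earlier in the article.

```python
GAME_FPS = 30             # assumption: implied by 5 frames ~ 167 ms
ACTIONS_PER_SECOND = 7.5  # effective action rate quoted above


def frames_to_ms(frames, fps=GAME_FPS):
    """Convert a delay in game frames to milliseconds."""
    return frames * 1000.0 / fps


fastest_ms = frames_to_ms(5)
slowest_ms = frames_to_ms(8)
average_ms = (fastest_ms + slowest_ms) / 2

frames_per_action = GAME_FPS / ACTIONS_PER_SECOND   # one action every N frames
steps_in_45_minutes = ACTIONS_PER_SECOND * 45 * 60  # decision steps per match

print(f"{fastest_ms:.0f} to {slowest_ms:.0f} ms, average {average_ms:.0f} ms")
print(f"one action every {frames_per_action:.0f} frames")
print(f"{steps_in_45_minutes:,.0f} steps in a 45-minute match")
```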

### Limitations Acknowledged

OpenAI was transparent about the system's limitations. The restricted hero pool meant that OpenAI Five never experienced the full strategic complexity of Dota 2's drafting phase. The system also showed weaknesses in very late-game scenarios during its TI8 losses, where long-term strategic planning and item choices mattered most.[4] Professional players who competed against OpenAI Five noted that the bots sometimes made unusual item decisions and struggled with certain late-game strategies that required careful resource management.

Additionally, the enormous computational cost raised questions about sample efficiency. The 45,000 years of simulated gameplay and hundreds of thousands of dollars in cloud computing costs were far beyond what any human player would need to reach a similar skill level.

## Retirement and Legacy

After the OpenAI Five Arena concluded on April 21, 2019, OpenAI officially retired the project. Training was halted on April 22, 2019, and the system was not updated further. The team published its findings in a paper titled "Dota 2 with Large Scale Deep Reinforcement Learning" on arXiv in December 2019.[1]

The project left a lasting mark on the field of reinforcement learning and AI research more broadly. It demonstrated that cooperative multi-agent reinforcement learning could produce coordinated behavior in complex, partially observable environments. It validated the effectiveness of PPO as a general-purpose RL algorithm at scale. And it contributed to a growing body of evidence that compute scaling could unlock capabilities that had previously seemed to require fundamental algorithmic breakthroughs.

OpenAI Five also influenced the public perception of AI capabilities. The live-streamed matches at TI7, TI8, and the OpenAI Five Finals attracted millions of viewers and introduced a broad audience to the state of modern AI research. For many in the gaming community, the matches against Dendi and OG served as tangible demonstrations of how far machine learning had progressed.

### Later Developments (2020-2026)

The line of work OpenAI Five opened continued to shape both games research and OpenAI's own trajectory after the project's retirement.

- **MOBA AI after OpenAI Five.** [Tencent](/wiki/tencent) AI Lab's JueWu system, presented at NeurIPS 2020, extended large-scale self-play reinforcement learning to the mobile MOBA Honor of Kings with a 40-hero pool, fielding agents that defeated top esports professionals; its authors explicitly cited the fact that "OpenAI's Dota AI limits the play to a pool of only 17 heroes" as the central limitation their training paradigm addressed.[14]
- **From game-playing to language models.** [Proximal Policy Optimization](/wiki/ppo), the algorithm OpenAI Five ran at unprecedented batch sizes, was later adopted as the policy-optimization method in [reinforcement learning from human feedback](/wiki/rlhf) (RLHF) for [InstructGPT](/wiki/instructgpt) in 2022,[15] and OpenAI described [ChatGPT](/wiki/chatgpt) as trained "using the same methods as InstructGPT."[16]
- **The scaling thesis.** [Jakub Pachocki](/wiki/jakub_pachocki) and Szymon Sidor, two of the project's leading researchers, had pushed to scale up reinforcement learning "as a baseline to see where it broke when the conventional wisdom was that it didn't scale," as [Sam Altman](/wiki/sam_altman) wrote in a September 2025 tribute crediting that bet with producing the Dota result and much of the infrastructure behind OpenAI's later breakthroughs.[17] Pachocki went on to lead [GPT-4](/wiki/gpt-4) pretraining and succeeded [Ilya Sutskever](/wiki/ilya_sutskever) as OpenAI's chief scientist in May 2024.[17][18] In a 2025 TIME 100 AI profile, Pachocki described the Dota 2 victory as "a big thing for OpenAI, realizing that scaling was going to be very important."[19]

## References

1. OpenAI. "Dota 2 with Large Scale Deep Reinforcement Learning." arXiv:1912.06680, December 2019. https://arxiv.org/abs/1912.06680
2. OpenAI. "OpenAI Five." OpenAI Blog, June 25, 2018. https://openai.com/index/openai-five/
3. OpenAI. "OpenAI Five defeats Dota 2 world champions." OpenAI Blog, April 15, 2019. https://openai.com/index/openai-five-defeats-dota-2-world-champions/
4. OpenAI. "The International 2018: Results." OpenAI Blog, August 2018. https://openai.com/index/the-international-2018-results/
5. OpenAI. "OpenAI Five Benchmark." OpenAI Blog, August 2018. https://openai.com/index/openai-five-benchmark/
6. OpenAI. "Dota 2." OpenAI Blog, August 2017. https://openai.com/index/dota-2/
7. OpenAI. "Learning Dexterity." OpenAI Blog, July 2018. https://openai.com/index/learning-dexterity/
8. Vinyals, O. et al. "Grandmaster level in StarCraft II using multi-agent reinforcement learning." Nature 575, 350-354, October 2019. https://www.nature.com/articles/s41586-019-1724-z
9. Liquipedia. "OpenAI Five Finals." https://liquipedia.net/dota2/OpenAI_Five_Finals
10. Liquipedia. "OpenAI Five Arena." https://liquipedia.net/dota2/OpenAI_Five_Arena
11. The Game Haus. "DOTA 2: OpenAI Five vs OG." May 9, 2019. https://thegamehaus.com/dota/dota-2-openai-five-vs-og/2019/05/09/
12. OpenAI. "How to Train Your OpenAI Five." OpenAI Blog, April 15, 2019. https://openai.com/index/how-to-train-your-openai-five/
13. Coldewey, D. "OpenAI Five crushes Dota2 world champs, and soon you can lose to it too." TechCrunch, April 15, 2019. https://techcrunch.com/2019/04/15/openai-five-crushes-dota2-world-champs-and-soon-you-can-lose-to-it-too/
14. Ye, D. et al. "Towards Playing Full MOBA Games with Deep Reinforcement Learning." NeurIPS 2020; arXiv:2011.12692, November 2020. https://arxiv.org/abs/2011.12692
15. Ouyang, L. et al. "Training language models to follow instructions with human feedback." arXiv:2203.02155, March 2022. https://arxiv.org/abs/2203.02155
16. OpenAI. "Introducing ChatGPT." OpenAI Blog, November 30, 2022. https://openai.com/index/chatgpt/
17. Altman, S. "Jakub and Szymon." Sam Altman's blog, September 2025. https://blog.samaltman.com/jakub-and-szymon
18. OpenAI. "Ilya Sutskever to leave OpenAI, Jakub Pachocki announced as Chief Scientist." OpenAI Blog, May 14, 2024. https://openai.com/index/jakub-pachocki-announced-as-chief-scientist/
19. TIME. "Jakub Pachocki: The 100 Most Influential People in AI 2025." TIME, 2025. https://time.com/collections/time100-ai-2025/7305886/jakub-pachocki/