Dactyl (OpenAI)
Last reviewed: Jun 3, 2026
Sources: 9 citations
Review status: Source-backed
Revision: v1 · 1,576 words
Dactyl was a robotics research project at OpenAI that used deep reinforcement learning to control a five-fingered, human-like robot hand and manipulate physical objects with high dexterity. The system trained its control policies entirely in simulation and transferred them to the real robot without any real-world fine-tuning. Dactyl was first presented in 2018, when it learned to reorient a wooden block to arbitrary target orientations, and it became widely known in 2019 when the same hand solved a Rubik's Cube one-handed. The project is notable as an early large-scale demonstration of simulation-to-reality (sim-to-real) transfer for dexterous manipulation, and for sharing reinforcement-learning algorithms and infrastructure with OpenAI Five, OpenAI's Dota 2 system.[1][2][3]
Dactyl ran on a Shadow Dexterous Hand, a commercially available anthropomorphic robot hand with 24 degrees of freedom actuated by 20 pairs of agonist-antagonist tendons.[3] An object such as a block was placed in the hand's palm, and the system was asked to manipulate it into a specified pose. Rather than hand-programming the finger motions, OpenAI trained a neural-network control policy with the Proximal Policy Optimization (PPO) algorithm, the same general-purpose reinforcement-learning method and code base used for OpenAI Five.[1][3] The policy was a recurrent neural network built around a long short-term memory (LSTM) layer with 512 units, which allowed it to retain information about the object and the hand's dynamics over time.[3]
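The core of PPO is the clipped surrogate objective introduced in the original PPO paper. The sketch below computes it in plain Python for a small batch of samples. It is illustrative only: `clip_eps=0.2` is a commonly used default and an assumption here, not a value reported for Dactyl, and the LSTM policy, value function, and advantage estimation are all omitted.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current policy and the policy that collected the data.
    advantages: advantage estimates for those actions.
    clip_eps: clipping range (assumed value, not taken from the Dactyl paper).
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the minimum removes the incentive to move the ratio
        # outside [1 - eps, 1 + eps] in the direction that helps the objective.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio of 2 with a positive advantage is capped at 1 + eps.
    print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

Clipping keeps each update close to the data-collecting policy, which is one reason the method is practical in large distributed setups where experience is gathered by many workers.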
A central challenge in robotics is the difference between simulated physics and the real world, often called the reality gap. Collecting enough real-world robot experience to train a policy from scratch is prohibitively slow and can damage hardware. Dactyl addressed this by training only in simulation and using domain randomization to make the learned behavior robust enough to transfer to the physical hand.[1][2]
The first Dactyl results were published on July 30, 2018, in an OpenAI post titled "Learning Dexterity" and an accompanying paper, "Learning Dexterous In-Hand Manipulation" (arXiv:1808.00177), authored by a large OpenAI team including Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Bob McGrew, Jakub Pachocki, Matthias Plappert, Lilian Weng, and Wojciech Zaremba.[1][4] The task was to rotate a block resting in the palm to match a randomly chosen target orientation; once reached, a new target was generated, so the hand performed a continuous sequence of reorientations.[1]
The policy was trained without any human demonstrations and learned several human-like manipulation strategies on its own. OpenAI reported that behaviors such as finger gaiting (repositioning fingers to keep the object stable), multi-finger coordination, and the controlled use of gravity emerged naturally from the training process rather than being programmed.[1][4]
For perception, the real-world setup used two complementary systems. A motion-capture rig with 16 PhaseSpace cameras tracked LED markers to recover the hand's state at high precision, and a separate vision model estimated the object's pose from three Basler RGB cameras, showing that object tracking could work from ordinary camera images alone.[3] The control policy issued actions at roughly 12 Hz, while a low-level controller ran at about 1 kHz.[3]
Training was computationally intensive. Generating experience used a distributed system of 384 worker machines (16 CPU cores each), while optimization ran on 8 NVIDIA V100 GPUs.[3] Learning the block task with full randomization required on the order of 100 years of simulated experience, generated at roughly two years of simulation per hour, corresponding to about 50 hours of wall-clock time.[3] On the physical hand, the policy that used the vision-based pose estimator achieved a median of 11.5 consecutive successful rotations, while the version reading object state directly achieved a median of 13 and a maximum of 50 in a single trial.[3]
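The compute figures above are mutually consistent, as a quick back-of-the-envelope check shows. The snippet uses only the numbers quoted in this section; the variable names are ad hoc, and a 365-day year is assumed for the speed-up estimate.

```python
# Rollout hardware: 384 workers with 16 CPU cores each.
workers = 384
cores_per_worker = 16
total_cores = workers * cores_per_worker

# Experience budget: about 100 simulated years at about 2 simulated years per hour.
sim_years_needed = 100
sim_years_per_hour = 2
wall_clock_hours = sim_years_needed / sim_years_per_hour

# Aggregate speed-up over real time, assuming a 365-day year.
hours_per_year = 365 * 24
speedup_vs_real_time = sim_years_per_hour * hours_per_year

print(total_cores)           # 6144
print(wall_clock_hours)      # 50.0
print(speedup_vs_real_time)  # 17520
```

In other words, the rollout cluster as a whole produced experience roughly 17,500 times faster than a single physical robot running in real time could.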
Domain randomization was the key technique enabling transfer. During training, OpenAI randomized many physical and visual properties of the simulation, including object size and mass, friction coefficients, the surface appearance of the object and hand, gravity, and sensor noise and delays.[1][3] Because a single policy was forced to succeed across a wide distribution of simulated conditions, the real world appeared to it as just another variation it had already learned to handle.[2]
The cost of this robustness was substantial. OpenAI noted that learning the block-rotation task in a simulation without randomization required roughly three years of simulated experience, whereas reaching comparable performance with full randomization required about 100 years, roughly 33 times as much experience, to obtain a policy that could survive contact with reality.[1]
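The idea of domain randomization can be sketched as drawing a fresh set of simulator parameters at the start of every training episode. The following is a minimal illustration, not OpenAI's implementation: the `SimParams` fields mirror some of the physical properties listed above, while the numeric ranges, `MAX_DELAY_STEPS`, and all names are hypothetical. Visual randomization (surface appearance) is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Physical parameters for one simulated episode (illustrative subset)."""
    object_size_scale: float   # multiplier on the nominal object size
    object_mass_scale: float   # multiplier on the nominal object mass
    friction_scale: float      # multiplier on nominal friction coefficients
    gravity: float             # magnitude of gravity in m/s^2
    sensor_noise_std: float    # std. dev. of additive observation noise
    sensor_delay_steps: int    # observation delay, in control steps


# Hypothetical ranges, chosen only for illustration (not from the paper).
RANGES = {
    "object_size_scale": (0.95, 1.05),
    "object_mass_scale": (0.5, 1.5),
    "friction_scale": (0.7, 1.3),
    "gravity": (9.4, 10.2),
    "sensor_noise_std": (0.0, 0.02),
}
MAX_DELAY_STEPS = 3  # hypothetical upper bound on observation delay


def sample_sim_params(rng: random.Random) -> SimParams:
    """Draw an independent value for every randomized parameter."""
    continuous = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    delay = rng.randint(0, MAX_DELAY_STEPS)
    return SimParams(sensor_delay_steps=delay, **continuous)


if __name__ == "__main__":
    rng = random.Random(0)
    for episode in range(3):
        # In a real training loop, the simulator would be reconfigured
        # with these values before the episode starts.
        print(episode, sample_sim_params(rng))
```

Because the policy never sees the same physics twice, it cannot rely on any one setting being exact, which is the property that lets the real hand look like one more sample from the training distribution.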
On October 15, 2019, OpenAI published "Solving Rubik's Cube with a Robot Hand," with an accompanying paper (arXiv:1910.07113) authored by a team including Ilge Akkaya, Marcin Andrychowicz, Bob McGrew, Matthias Plappert, Jerry Tworek, Lilian Weng, and Wojciech Zaremba.[2][5] Using the same Shadow Dexterous Hand, the system manipulated a Rubik's Cube one-handed, performing the individual face rotations and cube flips needed to solve it.[2]
An important and frequently misunderstood point is the division of labor. Deciding which moves solve a given scramble is a solved problem in classical computer science: OpenAI used Kociemba's algorithm to compute the solution sequence.[2] The machine-learning contribution was not figuring out the solution but executing the required physical manipulations (rotating a face 90 degrees, or flipping the cube) reliably under real-world uncertainty.[2] To sense the cube's state, OpenAI used a custom cube fitted with Bluetooth sensors (based on a Giiker smart cube) to read face angles, alongside vision for the overall pose.[2]
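The division of labor can be pictured as a two-stage pipeline: a classical solver emits a symbolic move sequence, and the learned policy is asked to carry out one physical primitive at a time. The sketch below only illustrates that interface. It parses standard face-turn notation (`R`, `U'`, `F2`) into quarter turns; the example string is made up rather than real solver output, and `execute_primitive` is a hypothetical stand-in for the learned controller.

```python
FACES = "UDLRFB"


def expand_to_quarter_turns(solution):
    """Expand a move string such as "R2 U' F" into quarter turns.

    Returns a list of (face, direction) pairs, where direction is +1 for a
    clockwise quarter turn and -1 for a counterclockwise one.
    """
    steps = []
    for move in solution.split():
        face, suffix = move[0], move[1:]
        if face not in FACES:
            raise ValueError(f"unknown face in move {move!r}")
        if suffix == "":
            steps.append((face, +1))
        elif suffix == "'":
            steps.append((face, -1))
        elif suffix == "2":
            # A half turn is two quarter turns in the same direction.
            steps.extend([(face, +1), (face, +1)])
        else:
            raise ValueError(f"unknown suffix in move {move!r}")
    return steps


def run_solution(solution, execute_primitive):
    """Hand each quarter turn to a controller callback (hypothetical)."""
    for face, direction in expand_to_quarter_turns(solution):
        execute_primitive(face, direction)


if __name__ == "__main__":
    # Made-up move string; a real one would come from the solver.
    run_solution("R2 U' F", lambda face, d: print("rotate", face, d))
```

Everything hard about the robotics problem lives inside the callback: the planner's output is a short list of symbols, while executing each symbol on a physical cube is what the policy had to learn.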
The major algorithmic advance was Automatic Domain Randomization (ADR). Instead of fixing the randomization ranges by hand, ADR started from a single non-randomized environment and automatically widened the distribution of simulated conditions whenever the policy's performance crossed a threshold, generating an endless curriculum of ever-harder environments.[2] OpenAI reported that memory-augmented policies trained on ADR distributions showed signs of emergent meta-learning, appearing to adapt to the specific conditions of each trial at test time.[2][5]
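The widening rule at the heart of ADR can be sketched for a single randomized parameter. This is a simplified illustration under stated assumptions, not the published algorithm: the step size and both thresholds are hypothetical, and the narrowing step on poor performance is an addition of this sketch rather than something described above.

```python
import random
from dataclasses import dataclass


@dataclass
class RandomizedRange:
    """Sampling range for one simulator parameter."""
    nominal: float
    low: float
    high: float

    @classmethod
    def fixed(cls, nominal):
        # ADR starts from a single, non-randomized environment.
        return cls(nominal, nominal, nominal)

    def sample(self, rng):
        return rng.uniform(self.low, self.high)

    def widen(self, step):
        self.low -= step
        self.high += step

    def narrow(self, step):
        # Never shrink past the nominal (non-randomized) value.
        self.low = min(self.nominal, self.low + step)
        self.high = max(self.nominal, self.high - step)


def adr_update(rng_range, success_rate, step=0.05, widen_at=0.8, narrow_at=0.4):
    """Adjust the range from recent performance (thresholds are assumptions)."""
    if success_rate >= widen_at:
        rng_range.widen(step)
    elif success_rate <= narrow_at:
        rng_range.narrow(step)
    return rng_range


if __name__ == "__main__":
    friction = RandomizedRange.fixed(1.0)
    for rate in [0.9, 0.85, 0.5, 0.9]:
        adr_update(friction, rate)
    print(friction, friction.sample(random.Random(0)))
```

Run over many parameters at once, a rule of this kind produces a curriculum without hand-tuned ranges: the environment distribution only grows harder where the policy has shown it can cope.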
The system was notably robust to perturbations it had never been trained against. In demonstrations, the hand continued manipulating the cube while wearing a rubber glove (which changed friction and geometry), with two fingers tied together, under a blanket occluding the camera, and while being prodded with a pen and a plush giraffe (nicknamed "Rubik").[2] Full solves were less reliable: OpenAI reported that the hand solved a maximally scrambled cube (requiring 26 quarter-face rotations to undo) about 20 percent of the time, and an easier scramble (requiring 15 rotations) about 60 percent of the time.[2]
| Aspect | Learning Dexterity (2018) | Solving Rubik's Cube (2019) |
|---|---|---|
| Object / task | Reorient a block to target poses | Face rotations and flips to solve a Rubik's Cube |
| Hardware | Shadow Dexterous Hand (24 DoF) | Shadow Dexterous Hand (24 DoF) |
| Randomization | Manually tuned domain randomization | Automatic Domain Randomization (ADR) |
| Solution logic | N/A | Kociemba's algorithm (classical solver) |
| Headline result | Median 13 block rotations (state); max 50 | ~20% solve (26-move scramble); ~60% (15-move) |
| Key emergent behavior | Finger gaiting, use of gravity | Meta-learning, robustness to perturbations |
Dactyl drew wide attention as a vivid illustration of sim-to-real transfer and reinforcement learning applied to a difficult physical task. Coverage by outlets including IEEE Spectrum, VentureBeat, and MIT Technology Review highlighted that a policy trained purely in simulation could cope with real-world friction, wear, and disturbances.[6][7]
Observers also cautioned against overstating the result. The robot did not "figure out" the Rubik's Cube, since the solution was computed by classical software; the achievement was the dexterity required to carry out the moves.[2][6] Commentators noted that the full-solve success rate was modest (around 20 percent for the hardest scrambles) and that the demonstration used hand-only manipulation, whereas human speedcubers also brace the cube against surfaces.[6] The result is therefore best understood as a manipulation and robustness milestone rather than a puzzle-solving one.
Dactyl helped popularize domain randomization and ADR as practical tools for transferring policies from simulation to physical robots, and it is frequently cited in subsequent dexterous-manipulation and sim-to-real research. The work also reinforced OpenAI's broader thesis at the time that a single general-purpose reinforcement-learning recipe, scaled with large amounts of compute and simulated experience, could attack problems ranging from video games to robotics.[1][3]
OpenAI later stepped back from robotics. In July 2021, co-founder Wojciech Zaremba, who had led the robotics group, confirmed on the Weights & Biases podcast that the company had disbanded its robotics team. He cited a shortage of training data as a key constraint and said OpenAI had decided to focus on domains where data was more readily available; an OpenAI spokesperson confirmed the team had been wound down.[8][9]