Dactyl (OpenAI)

Updated Jun 3, 2026

Dactyl was a robotics research project at OpenAI that used deep reinforcement learning to control a five-fingered, human-like robot hand and manipulate physical objects with high dexterity. The system trained its control policies entirely in simulation and transferred them to the real robot without any real-world fine-tuning. Dactyl was first presented in 2018, when it learned to reorient a wooden block to arbitrary target orientations, and it became widely known in 2019 when the same hand solved a Rubik's Cube one-handed. The project is notable as an early large-scale demonstration of simulation-to-reality (sim-to-real) transfer for dexterous manipulation, and for sharing reinforcement-learning algorithms and infrastructure with OpenAI Five, OpenAI's Dota 2 system.^[1]^[2]^[3]

Overview

Dactyl ran on a Shadow Dexterous Hand, a commercially available anthropomorphic robot hand with 24 degrees of freedom actuated by 20 pairs of agonist-antagonist tendons.^[3] An object such as a block was placed in the hand's palm, and the system was asked to manipulate it into a specified pose. Rather than hand-programming the finger motions, OpenAI trained a neural-network control policy with the Proximal Policy Optimization (PPO) algorithm, the same general-purpose reinforcement-learning method and code base used for OpenAI Five.^[1]^[3] The policy was a recurrent neural network built around a long short-term memory (LSTM) layer with 512 units, which allowed it to retain information about the object and the hand's dynamics over time.^[3]
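
To make the PPO reference concrete, the following is a minimal sketch of the clipped surrogate objective that PPO maximizes, written in plain Python. The function name, the batch representation as lists of log-probabilities and advantages, and the clip range of 0.2 are illustrative assumptions; none of them are taken from Dactyl's training code, which also involves a value function, an LSTM policy, and distributed rollouts that are not shown here.

```python
import math

def ppo_clipped_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    clip_eps is an assumed, illustrative clip range.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that generated the data in a single update, which is one reason the method is considered stable enough to reuse across very different problems.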

A central challenge in robotics is the difference between simulated physics and the real world, often called the reality gap. Collecting enough real-world robot experience to train a policy from scratch is prohibitively slow and can damage hardware. Dactyl addressed this by training only in simulation and using domain randomization to make the learned behavior robust enough to transfer to the physical hand.^[1]^[2]

Learning Dexterity (2018)

The first Dactyl results were published on July 30, 2018, in an OpenAI post titled "Learning Dexterity" and an accompanying paper, "Learning Dexterous In-Hand Manipulation" (arXiv:1808.00177), authored by a large OpenAI team including Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Bob McGrew, Jakub Pachocki, Matthias Plappert, Lilian Weng, and Wojciech Zaremba.^[1]^[4] The task was to rotate a block resting in the palm to match a randomly chosen target orientation; once reached, a new target was generated, so the hand performed a continuous sequence of reorientations.^[1]
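
One way to express "target orientation reached" is to measure the rotation angle between the current and target orientations and compare it with a tolerance. The sketch below does this for unit quaternions in (w, x, y, z) order. The function names and the 0.4 rad tolerance are assumptions chosen for illustration, not a description of OpenAI's implementation.

```python
import math

def rotation_angle(q_current, q_target):
    """Smallest rotation angle in radians between two unit quaternions (w, x, y, z)."""
    # q and -q describe the same rotation, so use the absolute dot product.
    dot = abs(sum(a * b for a, b in zip(q_current, q_target)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return 2.0 * math.acos(dot)

def goal_reached(q_current, q_target, tolerance_rad=0.4):
    """True when the object is within the (assumed) angular tolerance of the target."""
    return rotation_angle(q_current, q_target) <= tolerance_rad

identity = (1.0, 0.0, 0.0, 0.0)
quarter_turn_z = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
```

With these values, `rotation_angle(identity, quarter_turn_z)` is a quarter turn (pi/2 radians), so `goal_reached` returns False under the assumed tolerance; a new target would only be drawn once the angle falls below it.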

The policy was trained without any human demonstrations and learned several human-like manipulation strategies on its own. OpenAI reported that behaviors such as finger gaiting (repositioning fingers to keep the object stable), multi-finger coordination, and the controlled use of gravity emerged naturally from the training process rather than being programmed.^[1]^[4]

For perception, the real-world setup used two complementary systems. A motion-capture rig with 16 PhaseSpace cameras tracked LED markers to recover the hand's state at high precision, and a separate vision model estimated the object's pose from three Basler RGB cameras, showing that the object itself did not need to be instrumented with markers.^[3] The control policy issued actions at roughly 12 Hz, while a low-level controller ran at about 1 kHz.^[3]

Training was computationally intensive. Generating experience used a distributed system of 384 worker machines (16 CPU cores each), while optimization ran on 8 NVIDIA V100 GPUs.^[3] Learning the block task with full randomization required on the order of 100 years of simulated experience, generated at roughly two years of simulation per hour, corresponding to about 50 hours of wall-clock time.^[3] On the physical hand, the policy that used the vision-based pose estimator achieved a median of 11.5 consecutive successful rotations, while the version reading object state directly achieved a median of 13 and a maximum of 50 in a single trial.^[3]
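
The figures above fit together with simple arithmetic, reproduced here as a short check. All inputs come from the numbers quoted in this section; the variable names and the 365-day year are the only assumptions.

```python
WORKERS = 384             # rollout worker machines
CORES_PER_WORKER = 16     # CPU cores per worker
SIM_YEARS_NEEDED = 100    # simulated experience for the fully randomized block task
SIM_YEARS_PER_HOUR = 2    # simulated years generated per wall-clock hour
HOURS_PER_YEAR = 365 * 24 # assumes a 365-day year

total_cores = WORKERS * CORES_PER_WORKER                   # CPU cores generating experience
wall_clock_hours = SIM_YEARS_NEEDED / SIM_YEARS_PER_HOUR   # training time in hours
real_time_factor = SIM_YEARS_PER_HOUR * HOURS_PER_YEAR     # simulated hours per real hour

print(total_cores, wall_clock_hours, real_time_factor)
```

This gives 6,144 CPU cores, 50 hours of wall-clock time, and an aggregate simulation rate of about 17,520 times real time.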

Domain randomization and sim-to-real

Domain randomization was the key technique enabling transfer. During training, OpenAI randomized many physical and visual properties of the simulation, including object size and mass, friction coefficients, the surface appearance of the object and hand, gravity, and sensor noise and delays.^[1]^[3] Because a single policy was forced to succeed across a wide distribution of simulated conditions, the real world appeared to it as just another variation it had already learned to handle.^[2]
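
In code, the core idea is small: draw a fresh set of simulator parameters at the start of every episode. The sketch below shows that pattern only. The parameter names and every numeric range are hypothetical placeholders, not the values OpenAI used, and a real setup would pass the sampled values to a physics simulator rather than just collecting them.

```python
import random

# Hypothetical ranges, for illustration only; not the values used for Dactyl.
RANGES = {
    "object_size_scale": (0.95, 1.05),
    "object_mass_scale": (0.5, 1.5),
    "friction_scale": (0.7, 1.3),
    "gravity_m_s2": (9.4, 10.2),
    "action_delay_s": (0.0, 0.04),
}

def sample_episode_params(rng, ranges=RANGES):
    """Draw one independent uniform sample per randomized parameter."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

rng = random.Random(0)  # seeded so the example is repeatable
episodes = [sample_episode_params(rng) for _ in range(3)]
```

Each entry in `episodes` describes a slightly different simulated world, so a policy trained across many such draws cannot rely on any single value of friction, mass, or delay.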

The cost of this robustness was substantial. OpenAI noted that learning the block-rotation task in a simulation without randomization required roughly three years of simulated experience, whereas reaching comparable performance with full randomization required about 100 years: roughly 33 times more experience, spent on making the policy able to survive contact with reality.^[1]

Solving the Rubik's Cube (2019) and ADR

On October 15, 2019, OpenAI published "Solving Rubik's Cube with a Robot Hand," with an accompanying paper (arXiv:1910.07113) authored by a team including Ilge Akkaya, Marcin Andrychowicz, Bob McGrew, Matthias Plappert, Jerry Tworek, Lilian Weng, and Wojciech Zaremba.^[2]^[5] Using the same Shadow Dexterous Hand, the system manipulated a Rubik's Cube one-handed, performing the individual face rotations and cube flips needed to solve it.^[2]

An important and frequently misunderstood point is the division of labor. Deciding which moves solve a given scramble is a solved problem in classical computer science: OpenAI used Kociemba's algorithm to compute the solution sequence.^[2] The machine-learning contribution was not figuring out the solution but executing the required physical manipulations (rotating a face 90 degrees, or flipping the cube) reliably under real-world uncertainty.^[2] To sense the cube's state, OpenAI used a custom cube fitted with Bluetooth sensors (based on a Giiker smart cube) to read face angles, alongside vision for the overall pose.^[2]
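
The division of labor can be sketched as a planner that returns a move list and an executor that hands each move to the learned controller. Everything below is a hypothetical stand-in: `plan_solution` simply undoes a known scramble in reverse order (the real system used Kociemba's algorithm at this step), it handles quarter turns only, and `try_move` represents whatever routine asks the policy to perform one face rotation and reports whether it succeeded.

```python
def invert_move(move):
    """Inverse of a quarter turn in standard cube notation: "R" <-> "R'"."""
    return move[:-1] if move.endswith("'") else move + "'"

def plan_solution(scramble):
    """Stand-in planner: undo the scramble in reverse order.

    A placeholder for the classical solver, not Kociemba's algorithm.
    """
    return [invert_move(m) for m in reversed(scramble)]

def execute_plan(plan, try_move):
    """Hand each planned move to the controller; stop at the first failure.

    Returns (moves_completed, solved).
    """
    done = 0
    for move in plan:
        if not try_move(move):
            break
        done += 1
    return done, done == len(plan)

plan = plan_solution(["R", "U'", "F"])
result = execute_plan(plan, lambda move: True)
```

Here `plan` is `["F'", "U", "R'"]`, and with a controller that never fails `result` is `(3, True)`. The structure shows why a long scramble is harder: one failed manipulation anywhere in the sequence ends the attempt.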

The major algorithmic advance was Automatic Domain Randomization (ADR). Instead of fixing the randomization ranges by hand, ADR started from a single non-randomized environment and automatically widened the distribution of simulated conditions whenever the policy's performance crossed a threshold, generating an endless curriculum of ever-harder environments.^[2] OpenAI reported that memory-augmented policies trained on ADR distributions showed signs of emergent meta-learning, appearing to adapt to the specific conditions of each trial at test time.^[2]^[5]
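
A heavily simplified sketch of the ADR loop follows: each randomized parameter starts as a single nominal value, and its range is widened whenever measured performance reaches a threshold. The class and function names, the step size, the hard limits, and the 0.8 threshold are all assumptions for illustration. This version widens every range together from one success rate, whereas the published method adjusts individual range boundaries based on performance measured at those boundaries.

```python
class AdrRange:
    """One randomized parameter whose sampling range grows as the policy improves."""

    def __init__(self, nominal, step, hard_min, hard_max):
        self.lo = self.hi = nominal  # start with no randomization
        self.step = step
        self.hard_min = hard_min
        self.hard_max = hard_max

    def widen(self):
        """Expand the range by one step on each side, within hard limits."""
        self.lo = max(self.hard_min, self.lo - self.step)
        self.hi = min(self.hard_max, self.hi + self.step)

def adr_update(ranges, success_rate, threshold=0.8):
    """Widen all ranges once the success rate reaches the (assumed) threshold."""
    if success_rate >= threshold:
        for r in ranges.values():
            r.widen()
        return True
    return False

ranges = {"friction_scale": AdrRange(1.0, 0.05, 0.5, 1.5)}
for rate in [0.6, 0.85, 0.9]:  # made-up evaluation results
    adr_update(ranges, rate)
```

After the three made-up evaluations, the friction range has been widened twice, to roughly 0.9 through 1.1. Training then continues on the harder distribution, which is what produces the curriculum described above.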

The system was notably robust to perturbations it had never been trained against. In demonstrations, the hand continued manipulating the cube while wearing a rubber glove (which changed friction and geometry), with two fingers tied together, under a blanket occluding the camera, and while being prodded with a pen and a plush giraffe (nicknamed "Rubik").^[2] On solving performance, OpenAI reported that the hand fully solved a maximally scrambled cube (requiring 26 quarter-face rotations to undo) about 20 percent of the time, and solved an easier scramble (requiring 15 rotations) about 60 percent of the time.^[2]

Aspect	Learning Dexterity (2018)	Solving Rubik's Cube (2019)
Object / task	Reorient a block to target poses	Face rotations and flips to solve a Rubik's Cube
Hardware	Shadow Dexterous Hand (24 DoF)	Shadow Dexterous Hand (24 DoF)
Randomization	Manually tuned domain randomization	Automatic Domain Randomization (ADR)
Solution logic	N/A	Kociemba's algorithm (classical solver)
Headline result	Median 13 block rotations (state); max 50	~20% solve (26-move scramble); ~60% (15-move)
Key emergent behavior	Finger gaiting, use of gravity	Meta-learning, robustness to perturbations

Reception and what it did and did not show

Dactyl drew wide attention as a vivid illustration of sim-to-real transfer and reinforcement learning applied to a difficult physical task. Coverage by outlets including IEEE Spectrum, VentureBeat, and MIT Technology Review highlighted that a policy trained purely in simulation could cope with real-world friction, wear, and disturbances.^[6]^[7]

Observers also cautioned against overstating the result. The robot did not "figure out" the Rubik's Cube, since the solution was computed by classical software; the achievement was the dexterity required to carry out the moves.^[2]^[6] Commentators noted that the full-solve success rate was modest (around 20 percent for the hardest scrambles) and that the demonstration used hand-only manipulation, whereas human speedcubers also brace the cube against surfaces.^[6] The result is therefore best understood as a manipulation and robustness milestone rather than a puzzle-solving one.

Legacy

Dactyl helped popularize domain randomization and ADR as practical tools for transferring policies from simulation to physical robots, and it is frequently cited in subsequent dexterous-manipulation and sim-to-real research. The work also reinforced OpenAI's broader thesis at the time that a single general-purpose reinforcement-learning recipe, scaled with large amounts of compute and simulated experience, could attack problems ranging from video games to robotics.^[1]^[3]

OpenAI later stepped back from robotics. In July 2021, co-founder Wojciech Zaremba, who had led the robotics group, confirmed on the Weights & Biases podcast that the company had disbanded its robotics team. He cited a shortage of training data as a key constraint and said OpenAI had decided to focus on domains where data was more readily available; an OpenAI spokesperson confirmed the team had been wound down.^[8]^[9]

References

[1] OpenAI, "Learning dexterity," July 30, 2018. https://openai.com/index/learning-dexterity/
[2] OpenAI, "Solving Rubik's Cube with a robot hand," October 15, 2019. https://openai.com/index/solving-rubiks-cube/
[3] OpenAI et al., "Learning Dexterous In-Hand Manipulation," arXiv:1808.00177, 2018. https://arxiv.org/abs/1808.00177
[4] "Learning Dexterous In-Hand Manipulation" (HTML), ar5iv. https://ar5iv.labs.arxiv.org/html/1808.00177
[5] OpenAI et al., "Solving Rubik's Cube with a Robot Hand," arXiv:1910.07113, 2019. https://arxiv.org/abs/1910.07113
[6] E. Ackerman, "OpenAI Teaches a Robot Hand to Solve a Rubik's Cube," IEEE Spectrum, October 15, 2019. https://spectrum.ieee.org/openai-demonstrates-sim2real-by-with-onehanded-rubiks-cube-solving
[7] W. Heaven, "A robot hand taught itself to solve a Rubik's Cube after creating its own training regime," MIT Technology Review, October 15, 2019. https://www.technologyreview.com/2019/10/15/75292/a-robot-hand-taught-itself-to-solve-a-rubiks-cube-after-creating-its-own-training-regime/
[8] K. Quach, "OpenAI shuts down robotics team because it doesn't have enough data yet," The Register, July 18, 2021. https://www.theregister.com/2021/07/18/in_brief_ai/
[9] K. Wiggers, "OpenAI disbands its robotics research team," VentureBeat, July 16, 2021. https://venturebeat.com/business/openai-disbands-its-robotics-research-team
