Spinning Up


Updated Jun 3, 2026


Spinning Up in Deep RL is a free, open-source educational resource produced by OpenAI to make deep reinforcement learning (deep RL) easier to learn. Released on November 8, 2018, and written primarily by Joshua Achiam, it combines a written introduction to RL theory, advice for aspiring researchers, a curated reading list, well-documented standalone implementations of several core algorithms, and a set of exercises. The material is hosted at spinningup.openai.com, and the accompanying code lives in the GitHub repository openai/spinningup under an MIT license.^[1]^[2]^[3]

Overview

OpenAI created Spinning Up to address a perceived gap: although introductory resources for deep learning were plentiful, deep reinforcement learning remained comparatively difficult to enter. The project's stated goal is to help people learn to use deep RL techniques and to develop intuitions about how and why they work.^[1]^[3]

The resource grew directly out of OpenAI's talent-development efforts. The organization observed through its OpenAI Scholars and OpenAI Fellows programs that people with little or no prior machine learning experience could become competent practitioners relatively quickly given the right guidance, and Spinning Up packaged that guidance for a wider audience. The material was subsequently incorporated into the curriculum for the 2019 cohorts of Scholars and Fellows.^[1]

The lead author is Joshua (Josh) Achiam, who at the time of release was a research scientist on OpenAI's safety team and a PhD student at the University of California, Berkeley, advised by Pieter Abbeel. His research focused on safety in deep reinforcement learning, including work on safe exploration.^[2]^[4]

Contents

Spinning Up consists of five main components:

An introduction to RL. A three-part conceptual overview that defines key terminology and notation (such as states, actions, policies, trajectories, rewards, and value functions), presents a taxonomy of RL algorithms, and introduces the mathematics of policy optimization.
An essay titled "Spinning Up as a Deep RL Researcher." Dated October 13, 2018, this guide walks readers from the prerequisite background (mathematics, deep learning, and familiarity with a framework such as TensorFlow or PyTorch) through learning by doing, developing a research project, and conducting rigorous experiments. Its practical advice includes writing single-threaded code before parallelizing, testing on simple environments first, assuming bugs rather than blaming hyperparameters when results disappoint, evaluating across multiple random seeds, and running ablation studies.^[6]
A curated list of key papers in deep RL. The reading list catalogs roughly 105 papers organized into 13 thematic categories: model-free RL, exploration, transfer and multitask RL, hierarchy, memory, model-based RL, meta-RL, scaling RL, RL in the real world, safety, imitation learning and inverse RL, reproducibility and critique, and a bonus set of classic theory and review papers.^[7]
Standalone algorithm implementations. Short, self-contained, heavily documented implementations of several core algorithms, each accompanied by a documentation page explaining the background, pseudocode, and references.^[5]
Exercises. A set of problems and benchmarks intended to test the reader's understanding and implementation skills.^[3]
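
The essay's advice to evaluate across multiple random seeds can be illustrated with a minimal sketch. It is not taken from Spinning Up: `run_experiment` and `evaluate_over_seeds` are hypothetical names, and the "training run" is replaced by seeded noise so that the example runs without any RL library. The point is only the shape of the procedure: one run per seed, then a mean and a spread rather than a single number.

```python
import random
import statistics


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one training run; returns a final score.

    A real run would train an agent and report its average episode return.
    Here the score is noise around 100 so the sketch runs anywhere.
    """
    rng = random.Random(seed)  # isolated RNG so each seed is reproducible
    return 100.0 + rng.gauss(0.0, 10.0)


def evaluate_over_seeds(seeds):
    """Run the experiment once per seed and summarize the spread."""
    scores = [run_experiment(s) for s in seeds]
    return {
        "scores": scores,
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # requires at least two seeds
    }


summary = evaluate_over_seeds([0, 1, 2, 3, 4])
print(
    f"mean={summary['mean']:.1f} "
    f"stdev={summary['stdev']:.1f} "
    f"over {len(summary['scores'])} seeds"
)
```

Reporting the standard deviation alongside the mean makes it easier to tell whether a difference between two configurations is larger than the seed-to-seed variation.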

The repository also ships utilities for logging, plotting results, and running experiments, including support for parallelized execution.^[5]

Implemented algorithms

Spinning Up provides implementations of six core deep RL algorithms, grouped into on-policy and off-policy families. At launch all were written using TensorFlow v1. On January 30, 2020, release 0.2 added PyTorch implementations for every algorithm except Trust Region Policy Optimization, which remains available only in the TensorFlow version. Both code bases are maintained side by side.^[5]^[8]

Algorithm	Abbreviation	Family	Frameworks
Vanilla Policy Gradient	VPG	On-policy	TensorFlow, PyTorch
Trust Region Policy Optimization	TRPO	On-policy	TensorFlow
Proximal Policy Optimization	PPO	On-policy	TensorFlow, PyTorch
Deep Deterministic Policy Gradient	DDPG	Off-policy	TensorFlow, PyTorch
Twin Delayed DDPG	TD3	Off-policy	TensorFlow, PyTorch
Soft Actor-Critic	SAC	Off-policy	TensorFlow, PyTorch

The documentation frames VPG, TRPO, and PPO as a progression of on-policy methods that successively improve stability and sample efficiency, while DDPG, TD3, and SAC are off-policy methods that reuse past data, with TD3 and SAC presented as descendants of DDPG that incorporate additional techniques to address stability problems.^[5]
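
Off-policy methods typically reuse past data by keeping transitions in a replay buffer and sampling from it at update time. The following is a minimal sketch of that idea, not Spinning Up's own buffer code: the class name `ReplayBuffer`, its methods, and the dummy transitions are assumptions made for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of past transitions (illustrative only)."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently drops the oldest item once full.
        self.storage = deque(maxlen=capacity)

    def store(self, obs, act, rew, next_obs, done):
        self.storage.append((obs, act, rew, next_obs, done))

    def sample_batch(self, batch_size: int, rng: random.Random):
        # Uniform sampling without replacement from everything stored so far.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    # Dummy transition: observation t, action 0, reward 1.0, next observation t + 1.
    buffer.store(t, 0, 1.0, t + 1, False)

batch = buffer.sample_batch(2, random.Random(0))
print(len(buffer), len(batch))
```

An on-policy method, by contrast, would discard collected data after each update, because that data no longer reflects the current policy.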

Educational use and reception

To support the launch, OpenAI offered a period of high-bandwidth support in November 2018, during which the author and others answered questions from learners.^[1] OpenAI also hosted a Spinning Up Workshop at its San Francisco office on February 2, 2019. Roughly 90 people attended in person, with close to 300 more following via livestream; attendees came from backgrounds spanning academia, software engineering, data science, machine learning engineering, medicine, and education. The morning featured talks by Joshua Achiam on the conceptual foundations and taxonomy of RL, Matthias Plappert on OpenAI's dexterous-robot-hand manipulation work, and Dario Amodei, then leader of OpenAI's safety team, on problems in AI safety. Afternoon breakout sessions, staffed by volunteer instructors, helped participants with implementation and research ideas.^[9]

Spinning Up has been widely cited as a reference point for newcomers to the field and has inspired community ports and clones. The GitHub repository is in maintenance status, meaning it receives bug fixes and minor updates rather than new features.^[3]

Relationship to Baselines and Gym

Spinning Up is distinct in purpose from OpenAI's two other major RL software releases, though they are often used together.^[1]^[5]

Project	Purpose	Audience
OpenAI Gym	A standardized toolkit of environments for developing and benchmarking RL algorithms	All RL practitioners
OpenAI Baselines	A set of high-quality reference implementations of RL algorithms, optimized for performance	Researchers reproducing or building on state-of-the-art results
Spinning Up	Educational, readable implementations and written material for learning deep RL	Learners and aspiring researchers

Where Baselines emphasizes performance and feature completeness, Spinning Up deliberately favors short, transparent code meant to be read and understood, and it relies on Gym environments (and compatible robotics environments) for training and evaluation. In its researcher essay, OpenAI explicitly recommends reading the Baselines code to learn engineering best practices once a learner has grasped the fundamentals.^[5]^[6]
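
For readers unfamiliar with what relying on Gym environments means in practice, the sketch below imitates the classic Gym convention, in which `reset()` returns an observation and `step(action)` returns an observation, a reward, a done flag, and an info dictionary. It does not import Gym: `ToyCorridorEnv`, `run_episode`, and `always_right` are hypothetical names invented for this example, and later Gym and Gymnasium releases use a different return signature.

```python
class ToyCorridorEnv:
    """Hypothetical environment mimicking the classic Gym reset/step interface."""

    def __init__(self, length: int = 5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation

    def step(self, action: int):
        # Action 1 moves right; any other action moves left (floored at 0).
        self.position = max(0, self.position + (1 if action == 1 else -1))
        done = self.position >= self.length
        reward = 1.0 if done else -0.1
        return self.position, reward, done, {}  # obs, reward, done, info


def run_episode(env, policy, max_steps: int = 100) -> float:
    """Roll out one episode and return the undiscounted sum of rewards."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


def always_right(obs):
    """Trivial policy that ignores the observation and always moves right."""
    return 1


episode_return = run_episode(ToyCorridorEnv(length=5), always_right)
print(round(episode_return, 2))
```

Because an algorithm only touches the environment through `reset` and `step`, the same training loop can be pointed at any environment that follows the interface.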

References

1. OpenAI. "Spinning Up in Deep RL." November 8, 2018. https://openai.com/index/spinning-up-in-deep-rl/
2. Achiam, Joshua. "Spinning Up in Deep Reinforcement Learning." 2018 (citation and author metadata). https://spinningup.openai.com/en/latest/etc/author.html
3. OpenAI. "openai/spinningup" repository readme. https://github.com/openai/spinningup
4. "About the Author." Spinning Up documentation. https://spinningup.openai.com/en/latest/etc/author.html
5. "Algorithms." Spinning Up documentation. https://spinningup.openai.com/en/latest/user/algorithms.html
6. Achiam, Joshua. "Spinning Up as a Deep RL Researcher." October 13, 2018. https://spinningup.openai.com/en/latest/spinningup/spinningup.html
7. "Key Papers in Deep RL." Spinning Up documentation. https://spinningup.openai.com/en/latest/spinningup/keypapers.html
8. "Releases: openai/spinningup." GitHub. https://github.com/openai/spinningup/releases
9. OpenAI. "Spinning Up in Deep RL: Workshop review." February 2019. https://openai.com/index/spinning-up-in-deep-rl-workshop-review/
