OpenAI Baselines

Updated Jun 3, 2026

OpenAI Baselines is a collection of open-source, high-quality reference implementations of reinforcement learning (RL) algorithms released by OpenAI. Distributed through the GitHub repository openai/baselines under the MIT License, the project provides correct, benchmarked implementations of widely used RL algorithms so that researchers can compare new methods against solid baselines instead of reimplementing older algorithms themselves. ^[1]^[2] The first component, an implementation of the Deep Q-Network (DQN) algorithm together with three of its variants, was announced on 24 May 2017. ^[3] The library is built on Google's TensorFlow framework. Although still publicly available, it is no longer in active development and is labeled as being in maintenance status. ^[1]

Overview

OpenAI Baselines was created to address a recurring problem in reinforcement learning research: results are notoriously difficult to reproduce because performance is highly sensitive to implementation details, hyperparameters, and subtle bugs. OpenAI argued that without known-good reference code, apparent algorithmic advances could be artifacts of comparing a new method against a poorly tuned or incorrect implementation of an older one. ^[3]

The repository describes itself as "a set of high-quality implementations of reinforcement learning algorithms" intended to make it easier for the research community to replicate, refine, and identify new ideas, and to create good baselines on which to build. ^[1] The accompanying announcement stated the goal of releasing "reliable implementations of reinforcement learning algorithms," beginning with DQN and three variants. ^[3]

The code is written for Python 3 (the repository requires Python 3.5 or higher) and depends on TensorFlow. The master branch supports TensorFlow versions 1.4 through 1.14, and a separate tf2 branch adds support for TensorFlow 2.0. ^[1] System dependencies include CMake, OpenMPI, and zlib, with OpenMPI used to support distributed and parallel training for several algorithms. Many of the continuous-control examples rely on the MuJoCo physics simulator, which historically required a proprietary license. ^[1] The library provides benchmark results on standard suites including Atari (evaluated at 10 million timesteps) and MuJoCo (evaluated at 1 million timesteps), and supports saving and loading trained models through --save_path and --load_path arguments. ^[1]
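The saving and loading arguments can be illustrated with a small helper that assembles a command line for the repository's `baselines.run` entry point. This is a minimal sketch: the helper name `baselines_run_command` and the model path are hypothetical, and the `--alg`, `--env`, and `--num_timesteps` flags are taken from the repository README rather than from the text above. The function only builds the argument list; it does not launch training.

```python
import sys
from typing import List, Optional


def baselines_run_command(
    alg: str,
    env: str,
    num_timesteps: float,
    save_path: Optional[str] = None,
    load_path: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `python -m baselines.run` (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "baselines.run",
        f"--alg={alg}",
        f"--env={env}",
        f"--num_timesteps={num_timesteps:g}",
    ]
    # Optional checkpoint arguments described in the article.
    if save_path is not None:
        cmd.append(f"--save_path={save_path}")
    if load_path is not None:
        cmd.append(f"--load_path={load_path}")
    return cmd


# Train PPO2 on Atari Pong for 10 million timesteps and save the model.
train_cmd = baselines_run_command(
    "ppo2", "PongNoFrameskip-v4", 1e7, save_path="models/pong_ppo2"
)

# Reload the saved model without further training (0 timesteps).
resume_cmd = baselines_run_command(
    "ppo2", "PongNoFrameskip-v4", 0, load_path="models/pong_ppo2"
)
```

The resulting list can be passed to `subprocess.run` in an environment where Baselines and a supported TensorFlow version are installed.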

OpenAI Baselines should be distinguished from two related OpenAI projects. OpenAI Gym is a toolkit of standardized environments used to train and evaluate agents, whereas Baselines provides the algorithm implementations that run on those environments. Spinning Up is a separate educational resource that offers simplified, well-documented implementations intended for teaching rather than as performance benchmarks.

Implemented algorithms

OpenAI Baselines grew incrementally during 2017 and afterward. DQN and its variants were released first, on 24 May 2017. The original DQN combines Q-learning with deep neural networks; the three accompanying variants were Double Q-learning, Prioritized Experience Replay, and Dueling DQN. ^[3] Scalable, parallel implementations of Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO), both using MPI for data passing, were released alongside the PPO announcement on 20 July 2017. ^[4] Implementations of ACKTR (Actor Critic using Kronecker-factored Trust Region) and A2C (a synchronous, batched variant of the A3C algorithm) were announced on 18 August 2017. ^[5]
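To make the difference between DQN and its Double Q-learning variant concrete, the following NumPy sketch computes bootstrap targets both ways. It is an illustration of the general technique, not code from the Baselines repository; the function names, array shapes, and the discount factor are assumptions chosen for the example.

```python
import numpy as np


def dqn_targets(rewards, dones, q_next_target, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates the next action."""
    max_next = q_next_target.max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    best_actions = q_next_online.argmax(axis=1)
    evaluated = q_next_target[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * evaluated


# Toy batch of two transitions with two actions each (made-up numbers).
rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])  # the second transition ends its episode
q_next_online = np.array([[1.0, 2.0], [0.5, 0.1]])
q_next_target = np.array([[3.0, 0.5], [0.2, 0.4]])

standard = dqn_targets(rewards, dones, q_next_target, gamma=0.9)
double = double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.9)
```

In this toy batch the standard target for the first transition uses the target network's largest value (3.0), while the Double DQN target uses the target network's value for the action preferred by the online network (0.5). Decoupling selection from evaluation in this way is intended to reduce overestimation of action values.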

The repository ultimately contained the following algorithm implementations, each in its own subdirectory: ^[1]

Algorithm	Full name / description	Type
DQN	Deep Q-Network, with Double Q-learning, Prioritized Replay, and Dueling variants	Value-based, off-policy
PPO1	Proximal Policy Optimization, original MPI version (marked obsolete)	Policy gradient, on-policy
PPO2	Proximal Policy Optimization, GPU-enabled version	Policy gradient, on-policy
TRPO	Trust Region Policy Optimization	Policy gradient, on-policy
A2C	Advantage Actor Critic (synchronous variant of A3C)	Actor-critic, on-policy
ACER	Actor-Critic with Experience Replay	Actor-critic, off-policy
ACKTR	Actor Critic using Kronecker-factored Trust Region	Actor-critic, on-policy
DDPG	Deep Deterministic Policy Gradient	Actor-critic, off-policy
GAIL	Generative Adversarial Imitation Learning	Imitation learning
HER	Hindsight Experience Replay (combined with DDPG)	Goal-based, off-policy

PPO1 is the original MPI-based implementation and is documented in the repository as obsolete, with PPO2 (which supports GPU acceleration) serving as the recommended version. ^[1] HER is not a standalone control algorithm but a replay strategy for sparse-reward, goal-conditioned tasks; the repository pairs it with DDPG and can additionally incorporate demonstrations during training. ^[1]
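The idea of HER as a replay strategy can be sketched in a few lines: transitions from an episode that failed to reach its goal are stored again with the goal replaced by a state the agent actually reached, so that some of them carry a non-failure reward. The sketch below uses the simple "final" relabeling variant for brevity, which is not necessarily the strategy the repository uses; the `Transition` type, the reward function, and all names are hypothetical.

```python
from typing import Callable, List, NamedTuple, Tuple

Goal = Tuple[int, ...]


class Transition(NamedTuple):
    obs: Tuple[int, ...]
    action: int
    achieved_goal: Goal  # goal state actually reached after the action
    desired_goal: Goal   # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Goal, desired: Goal) -> float:
    """Sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == desired else -1.0


def relabel_with_final_goal(
    episode: List[Transition],
    reward_fn: Callable[[Goal, Goal], float] = sparse_reward,
) -> List[Transition]:
    """Return a copy of the episode whose goal is the last achieved state."""
    final_goal = episode[-1].achieved_goal
    return [
        t._replace(
            desired_goal=final_goal,
            reward=reward_fn(t.achieved_goal, final_goal),
        )
        for t in episode
    ]


# A failed two-step episode: the agent wanted to reach (5,) but ended at (2,).
episode = [
    Transition(obs=(0,), action=1, achieved_goal=(1,), desired_goal=(5,), reward=-1.0),
    Transition(obs=(1,), action=1, achieved_goal=(2,), desired_goal=(5,), reward=-1.0),
]
relabeled = relabel_with_final_goal(episode)
```

Both the original and the relabeled transitions would then be added to the replay buffer of an off-policy learner such as DDPG.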

Purpose and reproducibility

A central motivation for OpenAI Baselines was reproducibility. The DQN announcement emphasized that small implementation differences can produce large performance gaps, and that releasing both correct code and a set of best practices would help ensure that reported RL advances were genuine rather than the result of comparison against buggy or untuned versions of existing algorithms. ^[3] To this end, OpenAI documented best practices for implementing RL algorithms correctly and provided benchmark numbers so that other researchers could check their own reproductions against a reference. ^[3]

This emphasis on benchmarked, citable implementations made OpenAI Baselines a common point of reference in subsequent RL literature, where papers frequently compared new algorithms against the Baselines versions of PPO, TRPO, DDPG, and DQN. The project also influenced later analyses of how implementation details, rather than core algorithmic ideas, drive measured RL performance.

Stable Baselines forks

Although widely used as a reference, OpenAI Baselines proved difficult to use as a software library. Community members noted that the codebase had inconsistent coding style, limited documentation and comments, significant code duplication, and a structure that made customization (for example, learning from non-image features) cumbersome. ^[6]

In response, researchers Antonin Raffin and Ashley Hill, working at the U2IS robotics laboratory (INRIA Flowers team) at ENSTA ParisTech, created Stable Baselines, a fork of OpenAI Baselines announced in August 2018. ^[6] Stable Baselines retained support for the same core algorithms (including A2C, PPO, TRPO, DQN, ACKTR, ACER, and DDPG) while introducing a unified, scikit-learn-style interface with common methods, comprehensive documentation, standardized code, expanded unit-test coverage, TensorBoard integration, callbacks, and various bug fixes. ^[6] Like the original, Stable Baselines was built on TensorFlow 1.x. ^[7]

As TensorFlow 1.x code was increasingly deprecated ahead of TensorFlow 2, the maintainers undertook a complete rewrite in PyTorch, producing Stable-Baselines3 (SB3). Maintained by the German Aerospace Center's Institute of Robotics and Mechatronics (DLR-RM), SB3 reached version 1.0 on 28 February 2021. ^[7] It preserves the user-friendly API of its predecessors while providing a smaller, cleaner core with static type checking and high automated test coverage, and it implements algorithms such as A2C, DDPG, DQN, PPO, SAC, TD3, and HER. ^[8] Additional and experimental algorithms (including TRPO, QR-DQN, TQC, and Recurrent PPO) are maintained in a companion repository, SB3-Contrib. ^[8] Stable-Baselines3 has continued to receive active maintenance releases. ^[8]
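The unified interface described above can be sketched with Stable-Baselines3. The example assumes that the `stable_baselines3` package and a Gymnasium environment named `CartPole-v1` are installed; the function name, timestep budget, and file name are hypothetical choices for illustration. The import is placed inside the function so that the definition itself loads without the library.

```python
def train_and_evaluate(
    env_id: str = "CartPole-v1",
    total_timesteps: int = 10_000,
    save_path: str = "ppo_cartpole",
) -> float:
    """Train PPO, save and reload it, then return the reward of one evaluation episode."""
    # Lazy import: requires the stable-baselines3 package at call time.
    from stable_baselines3 import PPO

    # Construct, train, and save in a scikit-learn-like style.
    model = PPO("MlpPolicy", env_id, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    model.save(save_path)

    # Reuse the vectorized environment created by the model for evaluation.
    vec_env = model.get_env()
    loaded = PPO.load(save_path)

    obs = vec_env.reset()
    episode_reward = 0.0
    for _ in range(500):
        action, _state = loaded.predict(obs, deterministic=True)
        obs, rewards, dones, _infos = vec_env.step(action)
        episode_reward += float(rewards[0])
        if dones[0]:
            break
    return episode_reward
```

Calling `train_and_evaluate()` runs a short training session. Other algorithms in the library expose the same `learn`, `save`, `load`, and `predict` methods, so replacing `PPO` with another class such as `A2C` should require few other changes.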

Status

The openai/baselines repository is labeled with the status "Maintenance (expect bug fixes and minor updates)," indicating that it is no longer in active feature development. ^[1] In practice, the community has largely migrated to the Stable Baselines lineage for new work: Stable Baselines (the TensorFlow fork) and especially the actively maintained, PyTorch-based Stable-Baselines3 are the recommended successors for users seeking a maintained library. ^[7]^[8] The original OpenAI Baselines code remains publicly available and is still cited as a historical reference implementation of its algorithms.

References

1. OpenAI. "openai/baselines: OpenAI Baselines: high-quality implementations of reinforcement learning algorithms." GitHub. https://github.com/openai/baselines
2. OpenAI. "openai/baselines-results." GitHub. https://github.com/openai/baselines-results
3. OpenAI. "OpenAI Baselines: DQN." 24 May 2017. https://openai.com/index/openai-baselines-dqn/
4. OpenAI. "Proximal Policy Optimization." 20 July 2017. https://openai.com/index/openai-baselines-ppo/
5. OpenAI. "OpenAI Baselines: ACKTR & A2C." 18 August 2017. https://openai.com/index/openai-baselines-acktr-a2c/
6. Raffin, Antonin. "Stable Baselines: a Fork of OpenAI Baselines, Reinforcement Learning Made Easy." Towards Data Science (Medium). https://medium.com/data-science/stable-baselines-a-fork-of-openai-baselines-reinforcement-learning-made-easy-df87c4b2fc82
7. Raffin, Antonin. "Stable-Baselines3: Reliable Reinforcement Learning Implementations." https://araffin.github.io/post/sb3/
8. DLR-RM. "stable-baselines3: PyTorch version of Stable Baselines." GitHub. https://github.com/DLR-RM/stable-baselines3
