OpenAI Baselines
Last reviewed: Jun 3, 2026
Sources: 8 citations
Review status: Source-backed
Revision: v1 · 1,279 words
OpenAI Baselines is a collection of open-source, high-quality reference implementations of reinforcement learning (RL) algorithms released by OpenAI. Distributed through the GitHub repository openai/baselines under the MIT License, the project provides correct, benchmarked implementations of widely used RL algorithms so that researchers can compare new methods against solid baselines instead of reimplementing older algorithms themselves. [1][2] The first component, an implementation of the Deep Q-Network (DQN) algorithm together with three of its variants, was announced on 24 May 2017. [3] The library is built on Google's TensorFlow framework. Although still publicly available, it is no longer in active development and is labeled as being in maintenance status. [1]
OpenAI Baselines was created to address a recurring problem in reinforcement learning research: results are notoriously difficult to reproduce because performance is highly sensitive to implementation details, hyperparameters, and subtle bugs. OpenAI argued that without known-good reference code, apparent algorithmic advances could be artifacts of comparing a new method against a poorly tuned or incorrect implementation of an older one. [3]
The repository describes itself as "a set of high-quality implementations of reinforcement learning algorithms" intended to make it easier for the research community to replicate, refine, and identify new ideas, and to create good baselines on which to build. [1] The accompanying announcement stated the goal of releasing "reliable implementations of reinforcement learning algorithms," beginning with DQN and three variants. [3]
The code is written for Python 3 (the repository requires Python 3.5 or higher) and depends on TensorFlow. The master branch supports TensorFlow versions 1.4 through 1.14, and a separate tf2 branch adds support for TensorFlow 2.0. [1] System dependencies include CMake, OpenMPI, and zlib, with OpenMPI used to support distributed and parallel training for several algorithms. Many of the continuous-control examples rely on the MuJoCo physics simulator, which historically required a proprietary license. [1] The library provides benchmark results on standard suites including Atari (evaluated at 10 million timesteps) and MuJoCo (evaluated at 1 million timesteps), and supports saving and loading trained models through --save_path and --load_path arguments. [1]
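The save and load workflow described above can be sketched as a small helper that assembles the command line for the repository's `baselines.run` entry point. This is a minimal illustration, not part of Baselines: the helper name `baselines_run_command` and the model path are hypothetical, and the `--alg`, `--env`, `--num_timesteps`, and `--play` flags are taken from the repository README rather than from this article. The sketch only builds and prints the commands; running them would require an installed copy of Baselines and its dependencies.

```python
import shlex


def baselines_run_command(alg, env, num_timesteps,
                          save_path=None, load_path=None, play=False):
    """Build an argv list for `python -m baselines.run` (hypothetical helper)."""
    cmd = [
        "python", "-m", "baselines.run",
        f"--alg={alg}",
        f"--env={env}",
        f"--num_timesteps={num_timesteps}",
    ]
    if save_path is not None:
        # Write the trained model to this location when training ends.
        cmd.append(f"--save_path={save_path}")
    if load_path is not None:
        # Restore a previously saved model before training or playing.
        cmd.append(f"--load_path={load_path}")
    if play:
        # Run the (loaded) policy in the environment after training.
        cmd.append("--play")
    return cmd


def as_shell(cmd):
    """Render an argv list as a single shell-safe string."""
    return " ".join(shlex.quote(part) for part in cmd)


# Train PPO2 on an Atari game for 10 million timesteps and save the result.
train_cmd = baselines_run_command(
    "ppo2", "PongNoFrameskip-v4", "1e7", save_path="models/pong_ppo2")

# Load the saved model, skip further training, and watch it play.
play_cmd = baselines_run_command(
    "ppo2", "PongNoFrameskip-v4", 0, load_path="models/pong_ppo2", play=True)

print(as_shell(train_cmd))
print(as_shell(play_cmd))
```

Passing either list to `subprocess.run` would launch the corresponding job on a machine where Baselines is installed.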
OpenAI Baselines should be distinguished from two related OpenAI projects. OpenAI Gym is a toolkit of standardized environments used to train and evaluate agents, whereas Baselines provides the algorithm implementations that run on those environments. Spinning Up is a separate educational resource that offers simplified, well-documented implementations intended for teaching rather than as performance benchmarks.
OpenAI Baselines grew incrementally during 2017 and afterward. DQN and its variants were released first, on 24 May 2017. The original DQN combines Q-learning with deep neural networks; the three accompanying variants were Double Q-learning, Prioritized Experience Replay, and Dueling DQN. [3] Scalable, parallel implementations of Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO), both using MPI for data passing, were released alongside the PPO announcement on 20 July 2017. [4] Implementations of ACKTR (Actor Critic using Kronecker-factored Trust Region) and A2C (a synchronous, batched variant of the A3C algorithm) were announced on 18 August 2017. [5]
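To make the difference between the original DQN and its Double Q-learning variant concrete, the following sketch computes the bootstrapped training target for a single transition under each rule. It is a simplified illustration in plain Python, not code from the Baselines repository; the function names and the example Q-values are assumptions chosen for clarity.

```python
def dqn_target(reward, done, gamma, next_q_target):
    """Standard DQN target.

    The target network both selects and evaluates the next action,
    which tends to overestimate action values.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, gamma, next_q_online, next_q_target):
    """Double Q-learning target.

    The online network selects the next action and the target network
    evaluates it, decoupling selection from evaluation.
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)),
                      key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Illustrative Q-values for the next state (three discrete actions).
next_q_online = [1.0, 2.0, 0.5]   # online network prefers action 1
next_q_target = [3.0, 1.5, 0.2]   # target network's largest value is action 0

standard = dqn_target(1.0, False, 0.99, next_q_target)
double = double_dqn_target(1.0, False, 0.99, next_q_online, next_q_target)

# The Double Q-learning target is lower here because it evaluates the
# online network's choice (action 1) instead of taking the maximum
# over the target network's own estimates.
print(standard, double)
```

Prioritized Experience Replay and Dueling DQN change other parts of the method (how transitions are sampled and how the network is structured, respectively) and are not shown here.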
The repository ultimately contained the following algorithm implementations, each in its own subdirectory: [1]
| Algorithm | Full name / description | Type |
|---|---|---|
| DQN | Deep Q-Network, with Double Q-learning, Prioritized Replay, and Dueling variants | Value-based, off-policy |
| PPO1 | Proximal Policy Optimization, original MPI version (marked obsolete) | Policy gradient, on-policy |
| PPO2 | Proximal Policy Optimization, GPU-enabled version | Policy gradient, on-policy |
| TRPO | Trust Region Policy Optimization | Policy gradient, on-policy |
| A2C | Advantage Actor Critic (synchronous variant of A3C) | Actor-critic, on-policy |
| ACER | Actor-Critic with Experience Replay | Actor-critic, off-policy |
| ACKTR | Actor Critic using Kronecker-factored Trust Region | Actor-critic, on-policy |
| DDPG | Deep Deterministic Policy Gradient | Actor-critic, off-policy |
| GAIL | Generative Adversarial Imitation Learning | Imitation learning |
| HER | Hindsight Experience Replay (combined with DDPG) | Goal-based, off-policy |
PPO1 is the original MPI-based implementation and is documented in the repository as obsolete, with PPO2 (which supports GPU acceleration) serving as the recommended version. [1] HER is not a standalone control algorithm but a replay strategy for sparse-reward, goal-conditioned tasks; the repository pairs it with DDPG and can additionally incorporate demonstrations during training. [1]
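The idea behind HER as a replay strategy can be sketched with a toy relabeling routine. The example below uses the simple "final" strategy, in which every transition of a failed episode is stored again as if the goal had been the state the agent actually ended in. It is an assumption-laden illustration rather than the repository's implementation: the `Transition` type, the function names, and the 0 / -1 sparse reward are hypothetical choices, and the goal-sampling strategy Baselines actually uses is not claimed here.

```python
from typing import Callable, List, NamedTuple, Tuple

Goal = Tuple[float, ...]


class Transition(NamedTuple):
    """One goal-conditioned step (hypothetical container for this sketch)."""
    action: int
    achieved_goal: Goal   # what the agent actually reached after the step
    desired_goal: Goal    # what it was asked to reach
    reward: float


def sparse_reward(achieved: Goal, desired: Goal) -> float:
    """Sparse reward: 0 on success, -1 otherwise."""
    return 0.0 if achieved == desired else -1.0


def relabel_with_final_goal(
    episode: List[Transition],
    reward_fn: Callable[[Goal, Goal], float] = sparse_reward,
) -> List[Transition]:
    """Return a copy of the episode whose goal is the final achieved state."""
    final_goal = episode[-1].achieved_goal
    return [
        t._replace(desired_goal=final_goal,
                   reward=reward_fn(t.achieved_goal, final_goal))
        for t in episode
    ]


# A failed episode on a 1-D line: the goal was position 5, the agent reached 3.
goal = (5.0,)
episode = [
    Transition(action=1, achieved_goal=(p,), desired_goal=goal,
               reward=sparse_reward((p,), goal))
    for p in (1.0, 2.0, 3.0)
]

hindsight = relabel_with_final_goal(episode)

# Both versions go into the replay buffer; the relabeled copy contains a
# successful final step, so the off-policy learner sees a non-trivial reward.
replay_buffer = episode + hindsight
print([t.reward for t in episode], [t.reward for t in hindsight])
```

Because the relabeled transitions were not generated while pursuing the substituted goal, this trick only works with an off-policy learner, which is consistent with the repository pairing HER with DDPG.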
A central motivation for OpenAI Baselines was reproducibility. The DQN announcement emphasized that small implementation differences can produce large performance gaps, and that releasing both correct code and a set of best practices would help ensure that reported RL advances were genuine rather than the result of comparison against buggy or untuned versions of existing algorithms. [3] To this end, OpenAI documented best practices for implementing RL algorithms correctly and provided benchmark numbers so that other researchers could check their own reproductions against a reference. [3]
This emphasis on benchmarked, citable implementations made OpenAI Baselines a common point of reference in subsequent RL literature, where papers frequently compared new algorithms against the Baselines versions of PPO, TRPO, DDPG, and DQN. The project also influenced later analyses of how implementation details, rather than core algorithmic ideas, drive measured RL performance.
Although widely used as a reference, OpenAI Baselines proved difficult to use as a software library. Community members noted that the codebase had inconsistent coding style, limited documentation and comments, significant code duplication, and a structure that made customization (for example, learning from non-image features) cumbersome. [6]
In response, researchers Antonin Raffin and Ashley Hill, working at the U2IS robotics laboratory (INRIA Flowers team) at ENSTA ParisTech, created Stable Baselines, a fork of OpenAI Baselines announced in August 2018. [6] Stable Baselines retained support for the same core algorithms (including A2C, PPO, TRPO, DQN, ACKTR, ACER, and DDPG) while introducing a unified, scikit-learn-style interface with common methods, comprehensive documentation, standardized code, expanded unit-test coverage, TensorBoard integration, callbacks, and various bug fixes. [6] Like the original, Stable Baselines was built on TensorFlow 1.x. [7]
As TensorFlow 1.x was superseded by TensorFlow 2, the maintainers undertook a complete rewrite in PyTorch, producing Stable-Baselines3 (SB3). Maintained by the German Aerospace Center's Institute of Robotics and Mechatronics (DLR-RM), SB3 reached version 1.0 on 28 February 2021. [7] It preserves the user-friendly API of its predecessors while providing a smaller, cleaner core with static type checking and high automated test coverage, and it implements algorithms such as A2C, DDPG, DQN, PPO, SAC, TD3, and HER. [8] Additional and experimental algorithms (including TRPO, QR-DQN, TQC, and Recurrent PPO) are maintained in a companion repository, SB3-Contrib. [8] Stable-Baselines3 has continued to receive active maintenance releases. [8]
The openai/baselines repository is labeled with the status "Maintenance (expect bug fixes and minor updates)," indicating that it is no longer in active feature development. [1] In practice, the community has largely migrated to the Stable Baselines lineage for new work: Stable Baselines (the TensorFlow fork) and especially the actively maintained, PyTorch-based Stable-Baselines3 are the recommended successors for users seeking a maintained library. [7][8] The original OpenAI Baselines code remains publicly available and is still cited as a historical reference implementation of its algorithms.