Gym
Last reviewed
May 10, 2026
Sources
13 citations
Review status
Source-backed
Revision
v4 · 2,494 words

Figure 1: OpenAI Gym agent-environment loop. Source: Velotio Perspectives.
Gym (often written as OpenAI Gym) is an open source Python toolkit for developing and comparing reinforcement learning algorithms, originally released by OpenAI on April 27, 2016. It pairs a small, opinionated programming interface with a curated collection of benchmark environments so that researchers can plug almost any reinforcement learning (RL) agent into a wide variety of tasks without having to rewrite the simulation code each time. Although it was built around RL, the same env.reset() and env.step(action) calls work fine with imitation learning, evolutionary search, and other approaches that need a uniform notion of "environment." [1] [2]
The toolkit was introduced in the whitepaper "OpenAI Gym" (arXiv:1606.01540), submitted on June 5, 2016 by Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. The paper framed Gym as an attempt to do for RL what ImageNet had done for supervised vision: provide a shared, well-versioned set of tasks plus a public site for comparing results, so that progress could actually be measured rather than just claimed. [1]
OpenAI stopped active maintenance of Gym in late 2020. In early 2021 the codebase was handed to a volunteer team led by Jordan Terry, and in October 2022 that team officially relaunched the project as Gymnasium under the Farama Foundation. The original openai/gym repository was archived on April 8, 2026, and Gymnasium is now the canonical successor; it can be dropped into existing projects by replacing import gym with import gymnasium as gym. [3] [4] [5]
| Attribute | Detail |
|---|---|
| Original name | OpenAI Gym |
| Initial release | April 27, 2016 (public beta) |
| Whitepaper | Brockman et al., arXiv:1606.01540, June 5, 2016 |
| Original developer | OpenAI |
| Current maintainer | Farama Foundation (as Gymnasium) |
| Final Gym release | 0.26.2, October 4, 2022 |
| Repository archived | April 8, 2026 |
| License | MIT |
| Language | Python (3.7+ for late Gym; 3.10-3.13 for current Gymnasium) |
| Successor | Gymnasium (Farama Foundation) |
Before Gym, every RL paper tended to ship with its own custom simulator and its own way of feeding observations into a learning algorithm. Comparing two methods meant either reimplementing somebody else's environment or trusting a number printed in a table. Brockman and colleagues argued that this lack of a shared evaluation surface was holding the field back, particularly as deep RL was starting to show real results on Atari games and continuous control. [1]
The response was deliberately minimal. Gym does not ship a learning algorithm at all. It only defines a contract: an environment is anything that exposes reset(), step(action), and a pair of spaces describing what observations look like and what actions are legal. Anything matching that contract is a Gym environment, whether it simulates a 2D pole, an Atari ROM, a 3D humanoid, or a custom robotics rig. This narrow scope is part of why the API spread so quickly across other libraries. [1] [3]
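As a rough illustration of how small that contract is, the sketch below implements it in plain Python without importing Gym at all. The names DiscreteSpace and LineWorld are hypothetical stand-ins invented for this example; a real environment would normally subclass gym.Env and use gym.spaces. The step() method uses the five-value return of Gym 0.26 and later.

```python
import random


class DiscreteSpace:
    """Hypothetical stand-in for gym.spaces.Discrete: the integers 0..n-1."""

    def __init__(self, n):
        self.n = n
        self._rng = random.Random()

    def sample(self):
        # Draw a uniformly random legal value.
        return self._rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class LineWorld:
    """Toy corridor: start in cell 0, the episode ends on reaching the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.observation_space = DiscreteSpace(length)  # agent position
        self.action_space = DiscreteSpace(2)            # 0 = left, 1 = right
        self._pos = 0

    def reset(self, seed=None):
        # seed is accepted for signature compatibility; this toy is deterministic.
        self._pos = 0
        return self._pos, {}

    def step(self, action):
        assert self.action_space.contains(action), "illegal action"
        move = 1 if action == 1 else -1
        # Clamp the position so the agent cannot leave the corridor.
        self._pos = min(max(self._pos + move, 0), self.length - 1)
        terminated = self._pos == self.length - 1
        reward = 1.0 if terminated else 0.0
        return self._pos, reward, terminated, False, {}


# A random agent can drive the environment using only the contract.
env = LineWorld()
obs, info = env.reset()
total_reward, terminated, truncated = 0.0, False, False
while not (terminated or truncated):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
```

The agent loop at the bottom never touches LineWorld internals, which is the point of the contract: swapping in a different environment would leave that loop unchanged.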
The core of Gym is a small object called Env. A typical interaction loop in the original Gym (versions before 0.26) looked like this: a researcher would call gym.make() to build a versioned environment, call reset() to get the first observation, and then repeatedly call step(action) until the environment signaled that the episode was over.
| Method or attribute | Purpose |
|---|---|
| gym.make("CartPole-v1") | Construct a versioned environment by string ID |
| env.reset() | Reset internal state and return the first observation (plus an info dict from Gym 0.26 onward) |
| env.step(action) | Apply an action and return (observation, reward, done, info) in classic Gym, or (observation, reward, terminated, truncated, info) from Gym 0.26 onward |
| env.render() | Visualize the current state, controlled by render_mode |
| env.action_space | A Space describing valid actions |
| env.observation_space | A Space describing the structure of observations |
| env.close() | Release rendering windows or simulator handles |
Observation and action spaces are described by gym.spaces objects. The most common are Box for bounded continuous vectors, Discrete for a finite set of integer actions, MultiDiscrete and MultiBinary for structured discrete spaces, and Dict and Tuple for nested observations. Because the spaces are first-class objects, libraries like Stable Baselines and RLlib can ask an environment what shape its inputs are and build neural networks automatically. [3] [6]
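The sketch below illustrates, under simplifying assumptions, how a library might read an environment's spaces to size a network's input and output layers. Box and Discrete here are hypothetical stand-ins that model only the shape and n attributes of the corresponding gym.spaces classes, and flat_size is an illustrative helper, not a function from Gym, Stable Baselines, or RLlib.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Box:
    """Stand-in for gym.spaces.Box; only the shape attribute is modeled."""
    shape: tuple


@dataclass(frozen=True)
class Discrete:
    """Stand-in for gym.spaces.Discrete; n is the number of choices."""
    n: int


def flat_size(space):
    """Length of the flat vector a network would need for this space."""
    if isinstance(space, Box):
        return prod(space.shape)  # one unit per array element
    if isinstance(space, Discrete):
        return space.n            # one-hot input, or one logit per choice
    raise TypeError(f"unsupported space: {space!r}")


# CartPole-like layout: 4 continuous observations, 2 discrete actions.
input_size = flat_size(Box(shape=(4,)))
output_size = flat_size(Discrete(n=2))
```

Real libraries handle more cases (nested Dict and Tuple spaces, image observations fed to convolutional layers), but the idea is the same: the space object carries enough structure for the model to be built without hand-written shapes.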
For most of Gym's life, step() returned a single boolean called done that bundled together two very different events: the agent reached a terminal state of the underlying Markov decision process, or the episode was cut off by a time limit. Treating these the same caused subtle bugs in algorithms that bootstrap value estimates, because a truncated episode is not really over from the agent's perspective: the next state still has value even though no further steps are observed. Gym 0.26 (first released in September 2022) and every Gymnasium release since split done into two flags: terminated for true MDP terminal states and truncated for time limits or external cutoffs. The new five-tuple return is now standard across the ecosystem. [4] [7]
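To make the bootstrapping issue concrete, here is a minimal sketch of a one-step temporal-difference target that respects the split. The function name td_target and its parameters are chosen for this illustration and do not come from Gym; the numbers are arbitrary.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target: reward + gamma * V(next state).

    The value of the next state is dropped only at a true terminal state.
    A truncated episode (terminated=False) still bootstraps from it.
    """
    bootstrap = 0.0 if terminated else next_value
    return reward + gamma * bootstrap


# True terminal state: nothing follows, so the target is just the reward.
terminal_target = td_target(reward=1.0, next_value=5.0, terminated=True, gamma=0.9)

# Time-limit cutoff (terminated=False, truncated=True): keep bootstrapping.
truncated_target = td_target(reward=1.0, next_value=5.0, terminated=False, gamma=0.9)
```

With a single done flag, both cases would have produced the first target, which underestimates the value of states that merely sit near the time limit.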
Gym popularized the idea of stacking environment wrappers. A wrapper takes an existing Env and modifies one slice of its behavior (observations, actions, rewards, or episode lifecycle) while passing the rest through. Common examples include normalizing observations, frame stacking for Atari, sticky actions, time-limit enforcement, and rescaling rewards using running statistics. Because wrappers compose, a single line such as env = TimeLimit(FrameStack(AtariPreprocessing(env), 4), 10000) sets up a fairly complex preprocessing pipeline. [3]
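The composition pattern can be sketched in plain Python as follows. All class names here (CountingEnv, Wrapper, StepLimit, ScaleReward) are hypothetical and written for this example; they imitate the idea behind gym.Wrapper and TimeLimit but are not the library's implementations.

```python
class CountingEnv:
    """Hypothetical toy environment: never terminates, reward 1.0 per step."""

    def reset(self, seed=None):
        self._t = 0
        return self._t, {}

    def step(self, action):
        self._t += 1
        return self._t, 1.0, False, False, {}


class Wrapper:
    """Forwards every call to the wrapped environment unless overridden."""

    def __init__(self, env):
        self.env = env

    def reset(self, seed=None):
        return self.env.reset(seed=seed)

    def step(self, action):
        return self.env.step(action)


class StepLimit(Wrapper):
    """Changes only the episode lifecycle: truncate after max_steps."""

    def __init__(self, env, max_steps):
        super().__init__(env)
        self.max_steps = max_steps
        self._elapsed = 0

    def reset(self, seed=None):
        self._elapsed = 0
        return super().reset(seed=seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = super().step(action)
        self._elapsed += 1
        truncated = truncated or self._elapsed >= self.max_steps
        return obs, reward, terminated, truncated, info


class ScaleReward(Wrapper):
    """Changes only the reward: multiply it by a constant factor."""

    def __init__(self, env, factor):
        super().__init__(env)
        self.factor = factor

    def step(self, action):
        obs, reward, terminated, truncated, info = super().step(action)
        return obs, reward * self.factor, terminated, truncated, info


# Wrappers nest: the outer one sees the output of the inner one.
env = StepLimit(ScaleReward(CountingEnv(), 0.5), max_steps=3)
obs, info = env.reset()
rewards, done = [], False
while not done:
    obs, reward, terminated, truncated, info = env.step(0)
    rewards.append(reward)
    done = terminated or truncated
```

Each wrapper overrides only the method it cares about, so the order of nesting and the set of wrappers can change without touching the base environment.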
Gym shipped with several families of environments, each with its own dependencies and typical research uses. Gymnasium inherited the same families and continues to maintain them.
| Category | Examples | Notes |
|---|---|---|
| Classic control | CartPole-v1, MountainCar-v0, Acrobot-v1, Pendulum-v1 | Lightweight 2D physics tasks taken from RL textbooks; useful for debugging and teaching |
| Toy text | FrozenLake-v1, Taxi-v3, Blackjack-v1, CliffWalking-v0 | Tiny tabular MDPs used for tabular methods like Q-learning and SARSA |
| Box2D | LunarLander-v2, BipedalWalker-v3, CarRacing-v2 | Built on the Box2D 2D physics engine; mid-difficulty continuous and discrete tasks |
| Atari | ALE/Pong-v5, ALE/Breakout-v5, ALE/SpaceInvaders-v5, and roughly 60 other ROMs | Wrapped from the Arcade Learning Environment (Bellemare et al., 2013); the standard benchmark for deep RL on pixels |
| MuJoCo | Ant-v4, HalfCheetah-v4, Humanoid-v4, Walker2d-v4 | Continuous control with detailed contact physics; originally required a paid MuJoCo license, free since 2021 |
| Robotics | FetchReach-v1, HandManipulateBlock-v0 | Goal-based manipulation tasks; later spun out to a separate gym-robotics package |
| Algorithmic | Copy-v0, RepeatCopy-v0, ReversedAddition-v0 | Simple symbol-manipulation puzzles; deprecated in later Gym versions |
The Atari family is the most influential of the bunch. By wrapping the Arcade Learning Environment and standardizing pre-processing, Gym made it trivial to reproduce the original DQN paper's experimental setup, and a generation of deep RL papers ran on exactly that suite of games. [2] [8]
In the original Gym, the base install was pip install gym. Optional extras pulled in environment-specific dependencies, for example pip install gym[atari] for Atari ROMs via ale-py, pip install gym[box2d] for the Box2D family, and pip install gym[mujoco] for the MuJoCo continuous control suite. The same pattern carries over to Gymnasium: pip install gymnasium, pip install "gymnasium[atari]", pip install "gymnasium[all]". [3] [5]
A minimal random-agent loop reads almost identically in either library:
```python
import gymnasium as gym

env = gym.make("CartPole-v1", render_mode="human")
obs, info = env.reset(seed=42)

done = False
while not done:
    # Pick a uniformly random legal action.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # The episode ends on a true terminal state or on a time-limit cutoff.
    done = terminated or truncated

env.close()
```
The seed argument to reset() is itself a late addition: it arrived in the final Gym releases and carried over to Gymnasium. Earlier Gym versions exposed seeding through a separate env.seed() method, which was eventually deprecated. [4]
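The following sketch shows the reproducibility that seeding through reset() is meant to provide. NoisyWalk and rollout are hypothetical names invented for this example, and the environment keeps its own random.Random instance as a simplified stand-in for the random generator a real environment would manage.

```python
import random


class NoisyWalk:
    """Hypothetical environment whose transitions depend on a private RNG."""

    def __init__(self):
        self._rng = random.Random()
        self._pos = 0

    def reset(self, seed=None):
        # Reseed only when a seed is given; a plain reset() continues the stream.
        if seed is not None:
            self._rng = random.Random(seed)
        self._pos = 0
        return self._pos, {}

    def step(self, action):
        # The requested move is flipped with probability 0.2.
        move = action if self._rng.random() > 0.2 else -action
        self._pos += move
        return self._pos, 0.0, False, False, {}


def rollout(env, seed, steps=20):
    """Reset with the given seed, then always try to move right."""
    obs, _info = env.reset(seed=seed)
    trajectory = [obs]
    for _ in range(steps):
        obs = env.step(1)[0]
        trajectory.append(obs)
    return trajectory


env = NoisyWalk()
first = rollout(env, seed=42)
second = rollout(env, seed=42)  # same seed, so the same noisy trajectory
```

Tying the seed to reset() means a single call both restores the starting state and fixes the random stream, which is harder to get wrong than a separate seeding method called at some earlier point.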
OpenAI built several projects on top of Gym during the years it was actively maintained, and most of them are now retired or community-maintained.
Universe, released by OpenAI in December 2016, used Gym's interface as the agent-side contract while running arbitrary desktop and browser programs inside Docker containers. Each container exposed a VNC server for pixels and keyboard or mouse events plus a separate WebSocket channel for reward signals, so Flash games, browser tasks, and even commercial titles like Grand Theft Auto V could be treated as Gym environments. The initial release advertised over 1,000 environments, of which a few hundred had reward signals wired up. Universe was effectively shelved by 2017 when OpenAI shifted focus to dedicated game research; the GitHub repository remained but stopped receiving updates. [9]
Roboschool, released by OpenAI in 2017, was an open source robotics simulator built on the Bullet physics engine. It provided MuJoCo-style continuous control environments without the proprietary license that MuJoCo required at the time and integrated with Gym through the standard interface. OpenAI deprecated Roboschool in 2019 and pointed users to PyBullet instead; Bullet-based RL environments live on in projects like PyBullet Gym. [10]
Gym Retro, launched in 2018, extended the Atari pattern to many more retro consoles, including SNES, Sega Genesis, NES, Game Boy, and Atari 2600. The full release shipped over 1,000 games and tools for adding new ones via game integration files. Gym Retro powered OpenAI's Retro Contest, a generalization-focused competition built around Sonic the Hedgehog levels. [11]
Safety Gym, released in 2019 by Alex Ray, Joshua Achiam, and Dario Amodei, focused on constrained RL and safe exploration. It included an environment-builder for composing tasks out of physics elements, goals, and safety constraints, plus a benchmark suite of 18 high-dimensional continuous control environments and nine debugging environments. Like Roboschool, Safety Gym is no longer actively maintained by OpenAI; a Gymnasium-compatible successor, Safety-Gymnasium, is maintained by the PKU-Alignment group. [12]
Gym's most lasting contribution is the API itself. Almost every popular RL library released after 2016 either consumes Gym environments directly or implements a compatible adapter.
| Project | Relationship to Gym |
|---|---|
| Stable Baselines and Stable Baselines3 | Algorithm libraries that train against any Gym-compatible environment; SB3 added explicit Gymnasium support in 2023 |
| RLlib | Ray's distributed RL framework; uses the Gym/Gymnasium API as its environment standard |
| PettingZoo | Multi-agent analogue of gym.Env, maintained by the same Farama team |
| CleanRL | Single-file reference implementations of RL algorithms; written against Gym and later Gymnasium |
| Tianshou | Modular PyTorch RL library that adopts the Gym API |
| dm-control and DeepMind Lab | Originally separate; now offer Gym wrappers via the Farama Foundation's Shimmy compatibility layer |
The ripple effect goes beyond single-agent Python libraries. Tools such as Unity ML-Agents, whose simulations run inside the Unity engine, and PettingZoo's parallel API model their interfaces explicitly on Gym so that existing agents and training scripts can be reused with minimal changes. [6] [13]
By 2020, OpenAI's research priorities had shifted decisively toward large language models, and Gym went largely unmaintained for most of that year. Pull requests piled up, environment versions drifted out of sync with their underlying simulators, and several core dependencies (notably MuJoCo and Atari) changed their licensing or distribution model in ways that broke the default install. [4] [5]
In early 2021, OpenAI agreed to hand the repository to a volunteer maintenance team led by Jordan Terry, who had been doing much of the upkeep informally. That team also founded the Farama Foundation, a nonprofit dedicated to open source RL infrastructure, which was publicly announced on October 25, 2022. The same announcement introduced Gymnasium as the long-term home for the Gym API, with Mark Towers and Ariel Kwiatkowski among the lead maintainers. The 2024 paper "Gymnasium: A Standard Interface for Reinforcement Learning Environments" (arXiv:2407.17032) by Towers et al. lists 16 contributors and documents the API in its current form. [4] [5]
Key changes between Gym and Gymnasium include the terminated/truncated split described above, a stricter contract for reset(seed=...) deterministic seeding, a unified render_mode argument set at construction time rather than passed to render(), and updated MuJoCo environments based on the open source mujoco Python bindings instead of the older mujoco-py. The Farama Foundation also publishes Shimmy, a compatibility layer that wraps older Gym environments and several non-Gym APIs (DeepMind Control Suite, OpenSpiel, Atari ALE) so they can be used as Gymnasium environments. [4] [5]
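For code that still receives classic four-value step results, the terminated/truncated split can be recovered with a small adapter. The sketch below is an illustration rather than Gymnasium's or Shimmy's actual compatibility code; to_five_tuple is a hypothetical helper, and it assumes the old environment was wrapped in Gym's TimeLimit wrapper, which recorded cutoffs under the "TimeLimit.truncated" key of info.

```python
def to_five_tuple(step_result):
    """Convert (obs, reward, done, info) to (obs, reward, terminated, truncated, info)."""
    obs, reward, done, info = step_result
    # Assumption: time-limit cutoffs were flagged under this info key.
    truncated = bool(info.get("TimeLimit.truncated", False))
    # Anything that ended the episode without being a cutoff is a true terminal.
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


# Episode ended because the task itself finished.
finished = to_five_tuple(([0.0], 1.0, True, {}))

# Episode ended only because the step budget ran out.
cut_off = to_five_tuple(([0.0], 1.0, True, {"TimeLimit.truncated": True}))

# Episode still running.
running = to_five_tuple(([0.0], 0.0, False, {}))
```

Environments that signaled cutoffs some other way would need their own rule, which is one reason the explicit truncated flag was added to the API.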
Gym is, by any reasonable measure, the most influential single piece of infrastructure in modern reinforcement learning research. The original openai/gym repository accumulated more than 37,000 GitHub stars and 8,700 forks before being archived, and the standard Atari, MuJoCo, and classic-control benchmark numbers reported in essentially every deep RL paper from 2016 onward trace back to environments first packaged here. [3]
For new work, however, the toolkit itself is no longer the right starting point. The original repository is read-only, several environment families have moved to separately maintained packages (ale-py for Atari, Gymnasium-Robotics, Safety-Gymnasium), and the API improvements introduced after 2022 only exist in Gymnasium. The practical advice from both OpenAI and Farama is the same: install Gymnasium and import it with the alias gym if backward compatibility matters. [3] [4] [5]
Viewed in retrospect, the most interesting thing about Gym may be how little it tried to do. It is a hundred or so lines of well-chosen abstractions plus a lot of glue code, and the field built almost everything else on top. That is roughly how good infrastructure tends to look once it has settled in.