Gym (OpenAI Gym / Gymnasium)

Updated Jun 21, 2026

Gym, often written as OpenAI Gym, is an open source Python toolkit for developing and comparing reinforcement learning algorithms, originally released by OpenAI on April 27, 2016.^[1]^[2] The original whitepaper opens with the single-sentence definition, "OpenAI Gym is a toolkit for reinforcement learning research."^[1] It pairs a small, opinionated programming interface with a curated collection of benchmark environments so that researchers can plug a reinforcement learning agent into a wide variety of tasks without rewriting the simulation code each time. Although it was built around RL, the same env.reset() and env.step(action) calls also serve imitation learning, evolutionary search, and other approaches that need a uniform notion of "environment."^[1]^[3] OpenAI stopped active maintenance of Gym around 2020 and 2021; the codebase was handed to a volunteer team, and in October 2022 that team officially relaunched the project as Gymnasium under the Farama Foundation.^[4]^[5]^[6] The original openai/gym repository was archived on April 8, 2026, and Gymnasium is now the canonical successor; it can be dropped into existing projects by replacing import gym with import gymnasium as gym.^[4]^[5]^[6]

The Gym API, with its trio of reset(), step(action), and render() methods plus typed observation and action spaces, is the de facto standard interface in modern reinforcement learning research. Almost every popular RL library released after 2016, including Stable Baselines, RLlib, CleanRL, Tianshou, and TorchRL, either consumes Gym or Gymnasium environments directly or implements a compatible adapter.^[5]^[6] The original OpenAI Gym whitepaper has been cited well over ten thousand times on Google Scholar, and the Atari, MuJoCo, and classic-control benchmark numbers reported in essentially every deep RL paper from 2016 onward trace back to environments first packaged here.^[1]^[7]

Quick facts

Attribute	Detail
Original name	OpenAI Gym
Initial public beta	April 27, 2016
Whitepaper	Brockman et al., arXiv:1606.01540, submitted June 5, 2016
Original developer	OpenAI
Current maintainer	Farama Foundation, as Gymnasium
Final OpenAI Gym release	0.26.2, October 4, 2022
Farama announcement	October 25, 2022
Original repository archived	April 8, 2026
Latest Gymnasium version	1.3.0, April 22, 2026
License	MIT
Languages	Python (3.7+ for late Gym; 3.10 through 3.13 for current Gymnasium)
Gymnasium paper	Towers et al., arXiv:2407.17032, July 24, 2024
Successor	Gymnasium (Farama Foundation)

Why was OpenAI Gym created?

Before Gym, almost every reinforcement learning paper shipped with its own custom simulator and its own way of feeding observations into a learning algorithm. Comparing two methods meant either reimplementing somebody else's environment or trusting a number printed in a table. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba argued in the original Gym whitepaper that this lack of a shared evaluation surface was holding the field back, particularly as deep RL was starting to show real results on Atari games and continuous control benchmarks.^[1] Greg Brockman later described Gym as an attempt to do for RL what ImageNet had done for supervised vision: provide a shared, well-versioned set of tasks plus a public site for comparing results, so that progress could actually be measured rather than just claimed.^[1]^[2]

The intellectual lineage is straightforward. The Atari benchmark suite came from the Arcade Learning Environment by Marc Bellemare and colleagues, published in the Journal of Artificial Intelligence Research in 2013.^[8] DeepMind's DQN papers applied deep Q-learning to that suite in 2013 and 2015, posting human-level scores on 29 of 49 games and demonstrating that a single network architecture could learn many games from raw pixels.^[9] By 2016, when Gym launched, researchers wanted to reproduce these results and extend them to continuous-control tasks; the missing piece was a uniform Python interface that would let an algorithm written for CartPole also drive a MuJoCo humanoid without code changes.^[1]

The response was deliberately minimal. Gym does not ship a learning algorithm at all. It only defines a contract: an environment is anything that exposes reset(), step(action), and a pair of spaces describing what observations look like and what actions are legal. Anything matching that contract is a Gym environment, whether it simulates a 2D pole, an Atari ROM, a 3D humanoid, or a custom robotics rig. This narrow scope is part of why the API spread so quickly across other libraries and is still the foundation of the Gymnasium fork a decade later.^[1]^[5]^[6]

How does the Gym and Gymnasium API work?

The core of the library is a small object called Env. A typical interaction loop in the original Gym (versions before 0.26) looked like this: a researcher would call gym.make() to build a versioned environment, call reset() to get the first observation, and then repeatedly call step(action) until the environment signaled that the episode was over.^[1]^[5]

Method or attribute	Purpose
`gym.make("CartPole-v1")`	Construct a versioned environment by string ID
`env.reset()`	Reset internal state and return the first observation (and an `info` dict in Gymnasium)
`env.step(action)`	Apply an action and return `(observation, reward, done, info)` in classic Gym, or `(observation, reward, terminated, truncated, info)` from Gym 0.26 onward
`env.render()`	Visualize the current state, controlled by `render_mode`
`env.action_space`	A `Space` describing valid actions
`env.observation_space`	A `Space` describing the structure of observations
`env.close()`	Release rendering windows or simulator handles

Observation and action spaces are described by gym.spaces objects. The most common are Box for bounded continuous vectors, Discrete for a finite set of integer actions, MultiDiscrete and MultiBinary for structured discrete spaces, and Dict and Tuple for nested observations.^[5]^[6] Because the spaces are first-class objects, downstream libraries can ask an environment what shape its inputs are and build neural networks automatically. This is the reason that algorithm libraries such as Stable Baselines 3 and RLlib can train on any Gym-compatible environment with essentially zero glue code: the environment advertises its own shapes, and the algorithm reads them at construction time.^[6]^[10]

What changed with terminated and truncated?

For most of Gym's life, step() returned a single boolean called done that bundled together two very different events: the agent reached a terminal state of the underlying Markov decision process (MDP), or the episode was cut off by a time limit. Treating these the same caused subtle bugs in algorithms that bootstrap value estimates, because a time-cut episode is not really over from the agent's perspective and the value function should keep extending past the cutoff. Gym 0.26 (October 2022) and every Gymnasium release since split done into two flags: terminated for true MDP terminal states and truncated for time limits or external cutoffs. The new five-tuple return is now standard across the ecosystem.^[5]^[6]^[11]
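
The bootstrapping issue can be made concrete with a one-step temporal-difference target. The two functions below are an illustrative sketch with hypothetical names, not code from Gym or any algorithm library; they only contrast masking on done with masking on terminated.

```python
def legacy_td_target(reward: float, next_value: float, done: bool,
                     gamma: float = 0.99) -> float:
    """Pre-0.26 style: any `done` zeroes the bootstrap, even a time-limit cutoff."""
    return reward + (0.0 if done else gamma * next_value)


def td_target(reward: float, next_value: float, terminated: bool,
              truncated: bool, gamma: float = 0.99) -> float:
    """Five-tuple style: only a true terminal state zeroes the bootstrap.

    `truncated` is accepted to make the point explicit: a cutoff leaves the
    value of the next state intact, so it does not affect the target.
    """
    return reward + (0.0 if terminated else gamma * next_value)


# A step that hit the time limit while the next state is still worth 10.0:
biased = legacy_td_target(1.0, 10.0, done=True)
correct = td_target(1.0, 10.0, terminated=False, truncated=True)
```

With the legacy flag the target collapses to the immediate reward, while the split flags keep the discounted value of the next state.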

Several other contract changes accompanied the split. reset() now returns a tuple of (observation, info) rather than just the observation, giving environments a place to attach per-episode metadata. Seeding moved into a keyword argument on reset(seed=...) instead of a separate env.seed() method, which was deprecated and later removed. The render_mode is now declared at construction time (gym.make(..., render_mode="human")) rather than passed to each render() call, so an environment knows up front whether it needs to allocate rendering resources. These changes broke a lot of existing code, which is why Farama published the Shimmy compatibility shim to wrap pre-0.26 Gym environments and several non-Gym APIs as Gymnasium environments.^[5]^[6]

What are Gym wrappers?

Gym popularized the idea of stacking environment wrappers. A wrapper takes an existing Env and modifies one slice of its behavior (observations, actions, rewards, or episode lifecycle) while passing the rest through. The standard library ships a long list of them, including TimeLimit (cap episode length), RecordVideo (write rollout footage to disk), RecordEpisodeStatistics (track per-episode return and length), NormalizeObservation and NormalizeReward (running-mean normalization), FrameStack (concatenate the last N frames), and AtariPreprocessing (the canonical DQN-era 84x84 grayscale, frame-skip, max-pool pipeline). Because wrappers compose, a single line such as env = TimeLimit(FrameStack(AtariPreprocessing(env), 4), 10000) reproduces a fairly complex training pipeline.^[5]^[6]

What environments does Gym include?

Gym shipped with several families of environments, each with its own dependencies and typical research uses. Gymnasium inherited the same families and continues to maintain them.

Category	Examples	Notes
Classic control	`CartPole-v1`, `MountainCar-v0`, `Acrobot-v1`, `Pendulum-v1`	Lightweight 2D physics tasks taken from RL textbooks; useful for debugging and teaching
Toy text	`FrozenLake-v1`, `Taxi-v3`, `Blackjack-v1`, `CliffWalking-v0`	Tiny tabular MDPs used for tabular methods such as Q-learning and SARSA
Box2D	`LunarLander-v2`, `BipedalWalker-v3`, `CarRacing-v2`	Built on the Box2D 2D physics engine; mid-difficulty continuous and discrete tasks
Atari	`ALE/Pong-v5`, `ALE/Breakout-v5`, `ALE/SpaceInvaders-v5`, plus roughly 60 ROMs	Wrapped from the Arcade Learning Environment (Bellemare et al., 2013); the standard benchmark for deep RL on pixels
MuJoCo	`Ant-v4`, `HalfCheetah-v4`, `Hopper-v4`, `Humanoid-v4`, `Walker2d-v4`	Continuous control with detailed contact physics; originally required a paid MuJoCo license, free under DeepMind since 2021
Robotics	`FetchReach-v1`, `HandManipulateBlock-v0`	Goal-based manipulation tasks; later spun out to a separate `Gymnasium-Robotics` package
Algorithmic	`Copy-v0`, `RepeatCopy-v0`, `ReversedAddition-v0`	Simple symbol-manipulation puzzles; deprecated in later Gym versions

Classic control and toy text

The classic-control suite is the easiest place to start: CartPole asks an agent to balance an inverted pendulum on a cart, MountainCar asks an under-powered car to climb a hill by building momentum, Acrobot swings a two-link pendulum up to a target height, and Pendulum-v1 asks the agent to swing a pendulum up and hold it upright with continuous torque. These are tiny 2D physics problems with state vectors of two to six floats and either discrete or one-dimensional continuous actions. Textbooks like Sutton and Barto have used variants of these tasks for decades, and they remain the standard sanity check for any new algorithm implementation.^[5]^[6]

The toy-text family covers tabular reinforcement learning. FrozenLake is a four-by-four (or eight-by-eight) gridworld with slippery transitions; Taxi-v3 is a five-by-five world where a taxi picks up and drops off passengers; Blackjack is the card game; CliffWalking is the famous example from Sutton and Barto that contrasts SARSA and Q-learning. These environments have small, enumerable state spaces, so they let students and researchers exercise tabular methods without any function approximation at all.^[5]^[6]

Box2D, Atari, and MuJoCo

The Box2D family uses the Box2D 2D physics engine. LunarLander asks an agent to land a craft between two flags, BipedalWalker has a two-legged robot traverse rough terrain, and CarRacing is a top-down driving task with pixel observations. These are noticeably harder than classic control but still cheap to simulate. CarRacing in particular has been a common benchmark for image-based continuous control.^[5]^[6]

The Atari family is the most influential of the bunch. By wrapping the Arcade Learning Environment and standardizing pre-processing (84x84 grayscale frames, frame-skip of four, life-loss as a terminal signal in some configurations), Gym made it trivial to reproduce the original DQN paper's experimental setup, and a generation of deep RL papers ran on exactly that suite of games. The 49-game DQN benchmark gave way to the broader 57-game Atari-57 set used by later work like Rainbow, IMPALA, R2D2, MuZero, and Agent57.^[8]^[9]^[12] In 2024 the Arcade Learning Environment 2.0 release, maintained jointly with Farama, integrated the modern Gymnasium API and replaced the older atari-py dependency.^[13]
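
The frame-skip part of that pre-processing can be sketched without any dependencies. The code below is an illustration only: skip_step and CountdownEnv are hypothetical names, the environment is duck-typed against the five-tuple step() contract, and the real AtariPreprocessing wrapper additionally max-pools frames, converts to grayscale, and resizes.

```python
from typing import Any


def skip_step(env: Any, action: Any, skip: int = 4):
    """Repeat `action` for up to `skip` frames (skip >= 1) and sum the rewards."""
    total_reward = 0.0
    for _ in range(skip):
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break  # never step past the end of an episode
    return obs, total_reward, terminated, truncated, info


class CountdownEnv:
    """Hypothetical stand-in env: reward 1.0 per frame, ends after `frames` steps."""

    def __init__(self, frames: int):
        self.remaining = frames

    def step(self, action):
        self.remaining -= 1
        return self.remaining, 1.0, self.remaining == 0, False, {}


env = CountdownEnv(frames=6)
first = skip_step(env, action=0)   # four full frames
second = skip_step(env, action=0)  # episode ends after two more frames
```

Stopping early on terminated or truncated matters: continuing to step a finished episode is undefined behavior under the Gym contract.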

The MuJoCo family covers continuous control with detailed multi-joint physics: Ant (a quadruped), HalfCheetah (a planar two-leg runner), Hopper, Humanoid, and Walker2d. These were originally distributed against the proprietary MuJoCo physics engine, which required a paid license and a separate Python binding (mujoco-py). In October 2021, DeepMind acquired MuJoCo and made it freely available, and it released the source under Apache 2.0 in 2022; the official mujoco Python bindings then replaced mujoco-py in both Gym and Gymnasium environment versions four and above.^[14]

The robotics family (Fetch and Shadow Hand manipulators) was originally part of Gym and is now maintained as the separate Gymnasium-Robotics package under Farama. The algorithmic family was deprecated and removed by later Gym versions.^[5]^[6]

How do you install and use Gym?

In the original Gym, the base install was pip install gym. Optional extras pulled in environment-specific dependencies, for example pip install gym[atari] for Atari ROMs via ale-py, pip install gym[box2d] for the Box2D family, and pip install gym[mujoco] for the MuJoCo continuous control suite. The same pattern carries over to Gymnasium: pip install gymnasium, pip install "gymnasium[atari]", pip install "gymnasium[all]".^[5]^[6]

A minimal random-agent loop reads almost identically in either library:

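import gymnasium as gym

# Declare the render mode at construction time; "human" opens a window.
env = gym.make("CartPole-v1", render_mode="human")
obs, info = env.reset(seed=42)  # reset() returns (observation, info)
done = False
while not done:
    action = env.action_space.sample()  # random action from the action space
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated  # stop on a terminal state or a time limit
env.close()  # release the rendering window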

The seed argument to reset() is a late addition to the API, standardized in Gym 0.26 and carried into Gymnasium. Earlier Gym versions exposed seeding through a separate env.seed() method, which was deprecated and later removed.^[5]^[6] A subtle gotcha for newcomers: in pre-0.26 code, env.reset() returned just obs (not a tuple), and env.step() returned a four-tuple. Most algorithm libraries detected the API version at runtime for a while, but new code should target the five-tuple convention exclusively.^[11]

Custom environments follow the same protocol. To register a new task with gym.make(), a developer subclasses gym.Env, implements reset, step, the two space attributes, and optionally render and close, and then calls gym.register() with a versioned ID. Because the contract is small, third-party environments such as Minigrid, MetaWorld, Procgen, MiniHack, CARLA wrappers, and many domain-specific simulators integrate with no changes to user code.^[5]^[6]

What did OpenAI build on top of Gym?

OpenAI built several projects on top of Gym during the years it was actively maintained, and most of them are now retired or community-maintained.

Universe

Universe, released by OpenAI in December 2016, used Gym's interface as the agent-side contract while running arbitrary desktop and browser programs inside Docker containers. Each container exposed a VNC server for pixels and keyboard or mouse events plus a separate WebSocket channel for reward signals, so Flash games, browser tasks, and even commercial titles like Grand Theft Auto V could be treated as Gym environments. The initial release advertised over 1,000 environments, of which a few hundred had reward signals wired up. Universe was effectively shelved by 2017 when OpenAI shifted focus to dedicated game research; the GitHub repository remained but stopped receiving updates.^[15]

Roboschool

Roboschool, released by OpenAI in May 2017, was an open source robotics simulator built on the Bullet physics engine. It provided MuJoCo-style continuous control environments without the proprietary license that MuJoCo required at the time and integrated with Gym through the standard interface. OpenAI deprecated Roboschool in 2019 and pointed users to PyBullet instead; Bullet-based RL environments live on in projects like PyBullet Gym.^[16]

Gym Retro

Gym Retro, launched in 2018, extended the Atari pattern to many more retro consoles, including SNES, Sega Genesis, NES, Game Boy, and Atari 2600. The full release shipped over 1,000 games and tools for adding new ones via game integration files. Gym Retro powered OpenAI's Retro Contest, a generalization-focused competition built around Sonic the Hedgehog levels; the contest produced research on transfer learning and procedurally generated levels.^[17]

Safety Gym

Safety Gym, released in 2019 by Alex Ray, Joshua Achiam, and Dario Amodei, focused on constrained RL and safe exploration. It included an environment-builder for composing tasks out of physics elements, goals, and safety constraints, plus a benchmark suite of 18 high-dimensional continuous control environments and nine debugging environments. Like Roboschool, Safety Gym is no longer actively maintained by OpenAI; the Farama Foundation now hosts a successor called Safety-Gymnasium.^[18]

Procgen

Procgen, released by OpenAI in 2019, was a suite of 16 procedurally generated game environments designed to measure generalization in deep RL. The motivation was that fixed Atari and MuJoCo levels reward memorization as much as policy learning, so a benchmark whose levels are sampled from a generator gives a cleaner read on generalization. Procgen environments include CoinRun, Maze, BigFish, and others, all using the Gym API.^[19]

How did Gym influence the RL ecosystem?

Gym's most lasting contribution is the API itself. Almost every popular RL library released after 2016 either consumes Gym environments directly or implements a compatible adapter.

Project	Relationship to Gym
Stable Baselines and Stable Baselines 3	Algorithm libraries that train against any Gym-compatible environment; SB3 added explicit Gymnasium support in 2023
RLlib	Ray's distributed RL framework; uses the Gym and Gymnasium API as its environment standard
PettingZoo	Multi-agent counterpart to Gym from the same Farama team; designed as the multi-agent analogue of `gym.Env`
CleanRL	Single-file reference implementations of RL algorithms; written against Gym and later Gymnasium
Tianshou	Modular PyTorch RL library that adopts the Gym API
TorchRL	PyTorch-native RL library from Meta; consumes Gym and Gymnasium environments through wrappers
Acme	DeepMind's RL agent library; ships Gym compatibility wrappers
dm-control and DeepMind Lab	Originally separate; now offer Gym wrappers via the Farama Foundation's Shimmy compatibility layer
Unity ML-Agents	Game-engine RL platform; provides a Gym wrapper so existing agents can drive Unity scenes
Isaac Gym and successors	NVIDIA's GPU-parallelized robotics simulator family; Isaac Gym used the Gym API directly, succeeded by Isaac Sim and Isaac Lab

The ripple effect goes beyond single-agent Python libraries. Several tools with a different scope, including the Unity ML-Agents game-engine platform and PettingZoo's parallel API for multi-agent tasks, model their interfaces explicitly on Gym so that existing agents and training scripts can be reused with minimal changes.^[10]^[20] On the algorithm side, the canonical implementations of DQN, asynchronous advantage actor-critic (A3C), PPO, soft actor-critic (SAC), DDPG, TD3, Rainbow, IMPALA, R2D2, and MuZero have all been benchmarked at one point or another on Gym Atari or Gym MuJoCo tasks, and the public scoreboards built into the original Gym website (before it was retired) were among the first community-curated leaderboards in RL.^[1]^[9]^[12]^[21]^[22]

Adoption in foundational deep RL papers

The 2013 DQN preprint and the 2015 Nature paper by Mnih et al. predate Gym, but the post-Gym era of deep RL is dominated by works that use it as their evaluation harness. PPO (Schulman et al., 2017) explicitly used the Gym MuJoCo suite for its main continuous-control comparisons; soft actor-critic (Haarnoja et al., 2018) reported numbers on Hopper, Walker2d, HalfCheetah, Ant, and Humanoid from the Gym MuJoCo family; Rainbow (Hessel et al., 2018) combined six DQN extensions and reported aggregate Atari-57 performance using the standard Gym wrappers.^[21]^[22]^[12] Later distributed and recurrent methods like IMPALA and R2D2, and the model-based MuZero, relied on the same benchmark family for direct comparability.^[23] AlphaZero and MuZero, while not direct consumers of Gym, share its convention of separating environment from learner and have influenced how Farama designs new benchmarks.

When did Gym become Gymnasium?

By 2020, OpenAI's research priorities had shifted decisively toward large language models, and Gym went largely unmaintained for most of that year. Pull requests piled up, environment versions drifted out of sync with their underlying simulators, and several core dependencies (notably MuJoCo and Atari) changed their licensing or distribution model in ways that broke the default install.^[4]^[5]^[6] The free release of MuJoCo in October 2021 and the migration from atari-py to ale-py were the two most disruptive of these shifts; without active maintenance, the published pip install gym[atari] and pip install gym[mujoco] paths went stale.^[14]

In early 2021, OpenAI agreed to hand the repository to a volunteer maintenance team led by Jordan Terry, who had been doing much of the upkeep informally. That team founded the Farama Foundation, a nonprofit dedicated to open source RL infrastructure, which was publicly announced on October 25, 2022.^[4] In its launch post the Farama team stated its goal plainly: "Our mission is to develop and maintain open source reinforcement learning tools, making reinforcement learning research faster and more productive."^[4] The same announcement introduced Gymnasium as the long-term home for the Gym API and noted that "It's our understanding that OpenAI has no plans to develop Gym going forward," so the fork would not split the community between competing libraries.^[4] Mark Towers became the lead Gymnasium maintainer, with Ariel Kwiatkowski and other contributors handling subsystems such as the MuJoCo bindings, the Atari integration, the robotics fork, and the documentation site.^[4]^[5]^[6]

The 2024 paper "Gymnasium: A Standard Interface for Reinforcement Learning Environments" (arXiv:2407.17032) by Mark Towers, Ariel Kwiatkowski, Jordan Terry, John U. Balis, and twelve other authors documented the API in its current form and was accepted at NeurIPS Datasets and Benchmarks 2025. The paper frames Gymnasium as the de facto standard interface for single-agent RL and discusses interoperability with the rest of the Farama ecosystem.^[7]

Key changes between Gym and Gymnasium include the terminated and truncated split described above, a stricter contract for reset(seed=...) deterministic seeding, a unified render_mode argument set at construction time rather than passed to render(), and updated MuJoCo environments based on the open source mujoco Python bindings instead of the older mujoco-py. The Farama Foundation also publishes Shimmy, a compatibility layer that wraps older Gym environments and several non-Gym APIs (DeepMind Control Suite, OpenSpiel, Atari ALE) so they can be used as Gymnasium environments.^[4]^[5]^[6]

The wider Farama ecosystem

Gymnasium sits at the center of a family of related projects maintained by Farama, all of which share or extend the Gym contract.

Project	Scope	Notes
Gymnasium	Single-agent environment API	Direct successor to OpenAI Gym; current version 1.3.0 (April 2026)
Gymnasium-Robotics	Goal-conditioned manipulation tasks	Hosts the Fetch and Shadow Hand environments that were once part of Gym
PettingZoo	Multi-agent environment API	The multi-agent analogue of `gym.Env`; introduced by Terry et al. in 2020
MAgent2	Large-scale multi-agent battles	Hundreds to thousands of agents per scene; uses PettingZoo's parallel API
Minigrid	Grid-world tasks	Originally Chevalier-Boisvert et al.; common benchmark for exploration and curriculum learning
MiniWorld	First-person 3D grid environments	Pixel-based generalization tasks
Safety-Gymnasium	Constrained RL benchmarks	Continuation of Safety Gym under Farama
Shimmy	Compatibility shim	Wraps legacy Gym, DeepMind Control Suite, OpenSpiel, dm-env, and Melting Pot as Gymnasium environments
Arcade Learning Environment 2.0+	Atari benchmark	Co-maintained with the original ALE authors; ships native Gymnasium support
MO-Gymnasium	Multi-objective RL	Vector reward variants of standard tasks

This collection covers most of the niches that motivated OpenAI's original spin-off projects (multi-agent, retro games, safety, large worlds) while keeping a single API surface.^[4]^[5]^[6]^[20]

Several non-Farama projects sit alongside the Gym ecosystem rather than inside it. Brax (Freeman et al., 2021) is a JAX-native rigid-body simulator that ships its own Gym-style interface and is widely used for massively parallel continuous-control RL on TPUs and GPUs.^[24] NVIDIA's Isaac Gym was a GPU-resident robotics simulator that later evolved into Isaac Lab on top of Isaac Sim; both expose Gym-compatible task APIs.^[25] MuJoCo MJX (introduced in 2023) is the JAX port of MuJoCo and ships Gymnasium-compatible environments through mujoco_playground. MetaWorld, NetHack Learning Environment, MiniHack, Procgen, CARLA, and Habitat all expose Gym or Gymnasium adapters for their respective domains.^[26]

What breaks when migrating from Gym to Gymnasium?

Three things bite newcomers most often when moving between Gym and Gymnasium. First, seed handling: pre-0.26 code called env.seed(s) once and then env.reset(), while Gymnasium expects env.reset(seed=s), typically on the first reset and again wherever a specific episode must be reproducible; the old env.seed() method was removed, so calling it on a Gymnasium environment raises an error instead of seeding anything. Second, the return-tuple change in step(): code that unpacks four values from step() breaks on Gymnasium, and code that ignores truncated will incorrectly bootstrap or fail to bootstrap on time-limit cutoffs. Third, render mode: pre-0.26 code passed mode="human" to render() every step, while Gymnasium expects render_mode="human" at gym.make() construction.^[5]^[6]^[11]
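
For code that still receives classic four-tuples, the conversion can be sketched as a small adapter. The function name to_five_tuple is hypothetical, and the sketch assumes the legacy TimeLimit wrapper's convention of flagging cutoffs under the info key "TimeLimit.truncated"; environments that never set that key are treated as terminating normally.

```python
def to_five_tuple(legacy_result):
    """Convert a classic (obs, reward, done, info) result to the five-tuple form."""
    obs, reward, done, info = legacy_result
    # Assumption: time-limit cutoffs were marked under this key by legacy TimeLimit.
    truncated = bool(info.get("TimeLimit.truncated", False))
    # A finished episode that was not cut off counts as a true terminal state.
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


goal_reached = to_five_tuple((3, 1.0, True, {}))
timed_out = to_five_tuple((3, 0.0, True, {"TimeLimit.truncated": True}))
still_running = to_five_tuple((3, 0.0, False, {}))
```

For whole environments rather than individual step results, the Shimmy compatibility layer is the supported route.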

Several environment IDs were renamed across the transition. Pendulum-v0 became Pendulum-v1 well before the Farama fork to fix a reward calculation bug, and the MuJoCo environments moved through versions two, three, and four as the underlying bindings switched from mujoco-py to the official mujoco package. Robotics environments were renamed when they moved to Gymnasium-Robotics (the old FetchReach-v1 is now FetchReach-v3 with updated kinematics). Code that hard-codes a specific environment ID should be reviewed when upgrading Gymnasium versions.^[5]^[6]

Is OpenAI Gym still maintained?

Gym is, by any reasonable measure, the most influential single piece of infrastructure in modern reinforcement learning research. The original openai/gym repository accumulated more than 37,000 GitHub stars and 8,700 forks before being archived, and the standard Atari, MuJoCo, and classic-control benchmark numbers reported in essentially every deep RL paper from 2016 onward trace back to environments first packaged here.^[3] The Brockman et al. whitepaper has been cited well over ten thousand times on Google Scholar, comparable to other widely cited infrastructure papers in machine learning.^[1]

For new work, however, the toolkit itself is no longer the right starting point. The original repository is read-only, several environment families have moved to Farama-maintained packages (ale-py for Atari, Gymnasium-Robotics, Safety-Gymnasium), and the API improvements introduced after 2022 only exist in Gymnasium. The practical advice from both OpenAI and Farama is the same: install Gymnasium and import it with the alias gym if backward compatibility matters.^[3]^[4]^[5]^[6]

Viewed in retrospect, the most interesting thing about Gym may be how little it tried to do. It defined a small contract, packaged a handful of canonical task families, and let other people build the algorithm libraries, the visualization tools, and the multi-agent extensions. The Farama team's decision to preserve that minimalism rather than rewrite the API from scratch is the main reason Gymnasium has been adopted so quickly. The same env.reset(), env.step(action), observation_space, action_space pattern that Brockman and colleagues sketched in 2016 is still the contract that an RL agent and an RL environment use to talk to each other in 2026.

References

[1] Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W., "OpenAI Gym", arXiv:1606.01540, 2016-06-05. https://arxiv.org/abs/1606.01540. Accessed 2026-05-26.
[2] OpenAI, "OpenAI Gym Beta", OpenAI blog, 2016-04-27. https://openai.com/index/openai-gym-beta/. Accessed 2026-05-26.
[3] OpenAI, "openai/gym: A toolkit for developing and comparing reinforcement learning algorithms", GitHub repository (archived 2026-04-08). https://github.com/openai/gym. Accessed 2026-05-26.
[4] Farama Foundation, "Announcing The Farama Foundation", farama.org, 2022-10-25. https://farama.org/Announcing-The-Farama-Foundation. Accessed 2026-05-26.
[5] Farama Foundation, "Gymnasium Documentation", gymnasium.farama.org. https://gymnasium.farama.org/. Accessed 2026-05-26.
[6] Farama Foundation, "Farama-Foundation/Gymnasium", GitHub repository, current version 1.3.0 released 2026-04-22. https://github.com/Farama-Foundation/Gymnasium. Accessed 2026-05-26.
[7] Towers, M., Kwiatkowski, A., Terry, J., Balis, J. U., et al., "Gymnasium: A Standard Interface for Reinforcement Learning Environments", arXiv:2407.17032, 2024-07-24. https://arxiv.org/abs/2407.17032. Accessed 2026-05-26.
[8] Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M., "The Arcade Learning Environment: An Evaluation Platform for General Agents", Journal of Artificial Intelligence Research 47, 253-279, 2013. arXiv:1207.4708. https://arxiv.org/abs/1207.4708. Accessed 2026-05-26.
[9] Mnih, V., Kavukcuoglu, K., Silver, D., et al., "Human-level control through deep reinforcement learning", Nature 518, 529-533, 2015-02-26. https://www.nature.com/articles/nature14236. Accessed 2026-05-26.
[10] Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., and Dormann, N., "Stable-Baselines3: Reliable Reinforcement Learning Implementations", Journal of Machine Learning Research, 2021. https://jmlr.org/papers/v22/20-1364.html. Accessed 2026-05-26.
[11] Farama Foundation, "Migration Guide v0.21 to v0.26 / Gymnasium", Gymnasium documentation. https://gymnasium.farama.org/introduction/migration_guide/. Accessed 2026-05-26.
[12] Hessel, M., Modayil, J., van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., and Silver, D., "Rainbow: Combining Improvements in Deep Reinforcement Learning", AAAI 2018. arXiv:1710.02298. https://arxiv.org/abs/1710.02298. Accessed 2026-05-26.
[13] Farama Foundation, "Arcade Learning Environment 2.0 release notes", Farama-Foundation/Arcade-Learning-Environment GitHub. https://github.com/Farama-Foundation/Arcade-Learning-Environment. Accessed 2026-05-26.
[14] DeepMind, "Opening up a physics simulator for robotics", deepmind.google blog, 2021-10-18. https://deepmind.google/discover/blog/opening-up-a-physics-simulator-for-robotics/. Accessed 2026-05-26.
[15] OpenAI, "openai/universe", GitHub repository, released December 2016. https://github.com/openai/universe. Accessed 2026-05-26.
[16] OpenAI, "openai/roboschool", GitHub repository (deprecated 2019). https://github.com/openai/roboschool. Accessed 2026-05-26.
[17] OpenAI, "openai/retro: Retro Games in Gym", GitHub repository. https://github.com/openai/retro. Accessed 2026-05-26.
[18] Ray, A., Achiam, J., and Amodei, D., "Benchmarking Safe Exploration in Deep Reinforcement Learning", OpenAI technical report, 2019. https://cdn.openai.com/safexp-short.pdf. Accessed 2026-05-26.
[19] Cobbe, K., Hesse, C., Hilton, J., and Schulman, J., "Leveraging Procedural Generation to Benchmark Reinforcement Learning", arXiv:1912.01588, 2019. https://arxiv.org/abs/1912.01588. Accessed 2026-05-26.
[20] Terry, J., Black, B., Grammel, N., et al., "PettingZoo: Gym for Multi-Agent Reinforcement Learning", arXiv:2009.14471, 2020. https://arxiv.org/abs/2009.14471. Accessed 2026-05-26.
[21] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O., "Proximal Policy Optimization Algorithms", arXiv:1707.06347, 2017-07-20. https://arxiv.org/abs/1707.06347. Accessed 2026-05-26.
[22] Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S., "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", ICML 2018. arXiv:1801.01290. https://arxiv.org/abs/1801.01290. Accessed 2026-05-26.
[23] Espeholt, L., Soyer, H., Munos, R., et al., "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures", ICML 2018. arXiv:1802.01561. https://arxiv.org/abs/1802.01561. Accessed 2026-05-26.
[24] Freeman, C. D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., and Bachem, O., "Brax: A Differentiable Physics Engine for Large Scale Rigid Body Simulation", arXiv:2106.13281, 2021. https://arxiv.org/abs/2106.13281. Accessed 2026-05-26.
[25] Makoviychuk, V., Wawrzyniak, L., Guo, Y., et al., "Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning", arXiv:2108.10470, 2021. https://arxiv.org/abs/2108.10470. Accessed 2026-05-26.
[26] Yu, T., Quillen, D., He, Z., Julian, R., Hausman, K., Finn, C., and Levine, S., "Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning", CoRL 2019. arXiv:1910.10897. https://arxiv.org/abs/1910.10897. Accessed 2026-05-26.
