# Gym (OpenAI Gym / Gymnasium)

> Source: https://aiwiki.ai/wiki/gym
> Updated: 2026-06-21
> Categories: Developer Tools, OpenAI, Reinforcement Learning
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

Gym, often written as **OpenAI Gym**, is an open source Python toolkit for developing and comparing reinforcement learning algorithms, originally released by [openai](/wiki/openai) on April 27, 2016.[^1][^2] The original whitepaper opens with the single-sentence definition, "OpenAI Gym is a toolkit for reinforcement learning research."[^1] It pairs a small, opinionated programming interface with a curated collection of benchmark environments so that researchers can plug a [reinforcement learning](/wiki/reinforcement_learning) agent into a wide variety of tasks without having to rewrite the simulation code each time. Although it was built around RL, the same `env.reset()` and `env.step(action)` calls work fine with imitation learning, evolutionary search, and other approaches that need a uniform notion of "environment."[^1][^3] OpenAI wound down active maintenance of Gym over 2020 and 2021, the codebase was handed to a volunteer team, and in October 2022 that team officially relaunched the project as Gymnasium under the [farama foundation](/wiki/farama_foundation).[^4][^5][^6] The original `openai/gym` repository was archived on April 8, 2026, and Gymnasium is now the canonical successor; projects already written against the Gym 0.26 API can adopt it by replacing `import gym` with `import gymnasium as gym`.[^4][^5][^6]

The Gym API, with its trio of `reset()`, `step(action)`, and `render()` methods plus typed observation and action spaces, is the de facto standard interface in modern reinforcement learning research. Almost every popular RL library released after 2016, including Stable Baselines, RLlib, CleanRL, Tianshou, and TorchRL, either consumes Gym or Gymnasium environments directly or implements a compatible adapter.[^5][^6] The original OpenAI Gym whitepaper has been cited well over ten thousand times on Google Scholar, and the Atari, MuJoCo, and classic-control benchmark numbers reported in essentially every deep RL paper from 2016 onward trace back to environments first packaged here.[^1][^7]

## Quick facts

| Attribute | Detail |
|---|---|
| Original name | OpenAI Gym |
| Initial public beta | April 27, 2016 |
| Whitepaper | Brockman et al., arXiv:1606.01540, submitted June 5, 2016 |
| Original developer | OpenAI |
| Current maintainer | Farama Foundation, as Gymnasium |
| Final OpenAI Gym release | 0.26.2, October 4, 2022 |
| Farama announcement | October 25, 2022 |
| Original repository archived | April 8, 2026 |
| Latest Gymnasium version | 1.3.0, April 22, 2026 |
| License | MIT |
| Languages | Python (3.7+ for late Gym; 3.10 through 3.13 for current Gymnasium) |
| Gymnasium paper | Towers et al., arXiv:2407.17032, July 24, 2024 |
| Successor | Gymnasium (Farama Foundation) |

## Why was OpenAI Gym created?

Before Gym, almost every reinforcement learning paper shipped with its own custom simulator and its own way of feeding observations into a learning algorithm. Comparing two methods meant either reimplementing somebody else's environment or trusting a number printed in a table. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, [john schulman](/wiki/john_schulman), Jie Tang, and Wojciech Zaremba argued in the original Gym whitepaper that this lack of a shared evaluation surface was holding the field back, particularly as deep RL was starting to show real results on Atari games and continuous control benchmarks.[^1] [greg brockman](/wiki/greg_brockman) later described Gym as an attempt to do for RL what [imagenet](/wiki/imagenet) had done for supervised vision: provide a shared, well-versioned set of tasks plus a public site for comparing results, so that progress could actually be measured rather than just claimed.[^1][^2]

The intellectual lineage is straightforward. The Atari benchmark suite came from the Arcade Learning Environment by Marc Bellemare and colleagues, published in the Journal of Artificial Intelligence Research in 2013.[^8] DeepMind's [dqn](/wiki/dqn) work applied deep Q-learning to that suite, first in a 2013 preprint and then in the 2015 Nature paper, which reported human-level scores on 29 of 49 games and demonstrated that a single network architecture could learn many games from raw pixels.[^9] By 2016, when Gym launched, researchers wanted to reproduce these results and extend them to continuous-control tasks; the missing piece was a uniform Python interface that would let an algorithm written for CartPole also drive a MuJoCo humanoid without code changes.[^1]

The response was deliberately minimal. Gym does not ship a learning algorithm at all. It only defines a contract: an environment is anything that exposes `reset()`, `step(action)`, and a pair of spaces describing what observations look like and what actions are legal. Anything matching that contract is a Gym environment, whether it simulates a 2D pole, an Atari ROM, a 3D humanoid, or a custom robotics rig. This narrow scope is part of why the API spread so quickly across other libraries and is still the foundation of the Gymnasium fork a decade later.[^1][^5][^6]

## How does the Gym and Gymnasium API work?

The core of the library is a small object called `Env`. A typical interaction loop in the original Gym (versions before 0.26) looked like this: a researcher would call `gym.make()` to build a versioned environment, call `reset()` to get the first observation, and then repeatedly call `step(action)` until the environment signaled that the episode was over.[^1][^5]

| Method or attribute | Purpose |
|---|---|
| `gym.make("CartPole-v1")` | Construct a versioned environment by string ID |
| `env.reset()` | Reset internal state and return the first observation (and an `info` dict in Gymnasium) |
| `env.step(action)` | Apply an action and return `(observation, reward, done, info)` in classic Gym, or `(observation, reward, terminated, truncated, info)` from Gym 0.26 onward |
| `env.render()` | Visualize the current state, controlled by `render_mode` |
| `env.action_space` | A `Space` describing valid actions |
| `env.observation_space` | A `Space` describing the structure of observations |
| `env.close()` | Release rendering windows or simulator handles |
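
The environment side of this contract can be sketched without installing anything. The block below is a dependency-free illustration, not library code: `CountdownEnv`, `sample_action`, and `run_episode` are hypothetical names, and the class only mimics the post-0.26 call signatures (a two-value `reset()` and a five-value `step()`). A real environment would come from `gym.make()` and expose `action_space.sample()` instead.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment that mimics the post-0.26 call signatures.

    The state is an integer counter. Action 1 decrements it, action 0 does nothing.
    The episode terminates when the counter reaches zero and is truncated
    after `max_steps` steps.
    """

    def __init__(self, start=3, max_steps=10):
        self.start = start
        self.max_steps = max_steps
        self.actions = (0, 1)  # stand-in for a two-action discrete space
        self._rng = random.Random()
        self._counter = start
        self._steps = 0

    def reset(self, seed=None):
        if seed is not None:
            self._rng = random.Random(seed)
        self._counter = self.start
        self._steps = 0
        return self._counter, {}  # (observation, info)

    def step(self, action):
        if action not in self.actions:
            raise ValueError(f"invalid action: {action!r}")
        self._counter -= action
        self._steps += 1
        terminated = self._counter == 0
        truncated = (not terminated) and self._steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        return self._counter, reward, terminated, truncated, {}

    def sample_action(self):
        return self._rng.choice(self.actions)

    def close(self):
        pass


def run_episode(env, seed=None):
    """Drive one episode with random actions and report how it ended."""
    obs, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = env.sample_action()
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
    env.close()
    return total_reward, steps, terminated, truncated
```

Because the loop in `run_episode` touches only `reset`, `step`, and `close`, any object with the same method signatures could be swapped in without changing it.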

Observation and action spaces are described by `gym.spaces` objects. The most common are `Box` for bounded continuous vectors, `Discrete` for a finite set of integer actions, `MultiDiscrete` and `MultiBinary` for structured discrete spaces, and `Dict` and `Tuple` for nested observations.[^5][^6] Because the spaces are first-class objects, downstream libraries can ask an environment what shape its inputs are and build neural networks automatically. This is the reason that algorithm libraries such as Stable Baselines 3 and RLlib can train on any Gym-compatible environment with essentially zero glue code: the environment advertises its own shapes, and the algorithm reads them at construction time.[^6][^10]
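
To make the "environment advertises its own shapes" idea concrete, here is a minimal sketch using simplified stand-ins for `Discrete` and `Box`. These classes are assumptions written for illustration and are much smaller than the real `gym.spaces` objects; `policy_layer_sizes` is a hypothetical helper showing how an algorithm library might size a network from the spaces alone.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrete:
    """Stand-in for a finite action set {0, ..., n-1}."""
    n: int

    def sample(self, rng):
        return rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


@dataclass(frozen=True)
class Box:
    """Stand-in for a bounded continuous vector with per-dimension limits."""
    low: tuple
    high: tuple

    @property
    def shape(self):
        return (len(self.low),)

    def sample(self, rng):
        return tuple(rng.uniform(lo, hi) for lo, hi in zip(self.low, self.high))

    def contains(self, x):
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


def policy_layer_sizes(observation_space, action_space, hidden=64):
    """Derive network layer sizes from the advertised spaces alone."""
    if not isinstance(observation_space, Box):
        raise TypeError("this sketch only handles Box observations")
    if not isinstance(action_space, Discrete):
        raise TypeError("this sketch only handles Discrete actions")
    return [observation_space.shape[0], hidden, action_space.n]


# A CartPole-like signature: four observation dimensions, two actions.
# The bounds here are illustrative, not CartPole's real limits.
obs_space = Box(low=(-1.0,) * 4, high=(1.0,) * 4)
act_space = Discrete(2)
layer_sizes = policy_layer_sizes(obs_space, act_space)  # [4, 64, 2]
```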

### What changed with terminated and truncated?

For most of Gym's life, `step()` returned a single boolean called `done` that bundled together two very different events: the agent reached a terminal state of the underlying [markov decision process mdp](/wiki/markov_decision_process_mdp), or the episode was cut off by a time limit. Treating these the same caused subtle bugs in algorithms that bootstrap value estimates, because a time-cut episode is not really over from the agent's perspective and the value target should still bootstrap from the state at the cutoff. Gym 0.26 (released in September 2022, with the final 0.26.2 patch following in October) and every Gymnasium release since split `done` into two flags: `terminated` for true MDP terminal states and `truncated` for time limits or external cutoffs. The new five-tuple return is now standard across the ecosystem.[^5][^6][^11]
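
The effect on a value target can be shown with a few lines of arithmetic. This is a minimal sketch of a one-step TD target, assuming a scalar estimate `next_value` for the next state; the function names are illustrative and do not come from any particular library.

```python
def td_target(reward, next_value, terminated, truncated, gamma=0.99):
    """One-step TD target that bootstraps unless the MDP truly terminated.

    `truncated` is accepted only to make the point explicit: a time-limit
    cutoff does not zero out the value of the next state.
    """
    if terminated:
        return reward
    return reward + gamma * next_value


def legacy_td_target(reward, next_value, done, gamma=0.99):
    """Pre-split behavior: any `done` flag, including a time limit, stops bootstrapping."""
    return reward if done else reward + gamma * next_value


# Same transition, cut off by a time limit: reward 1.0, next-state value 10.0.
split_target = td_target(1.0, 10.0, terminated=False, truncated=True)   # 1.0 + 0.99 * 10.0
legacy_target = legacy_td_target(1.0, 10.0, done=True)                  # 1.0
```

With a single `done` flag the target collapses to the immediate reward, so the learner is told that a state worth roughly 10 is worth 1 merely because the clock ran out.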

Several other contract changes accompanied the split. `reset()` now returns a tuple of `(observation, info)` rather than just the observation, giving environments a place to attach per-episode metadata. Seeding moved into a keyword argument on `reset(seed=...)` instead of a separate `env.seed()` method, which was eventually deprecated. The `render_mode` is now declared at construction time (`gym.make(..., render_mode="human")`) rather than passed to each `render()` call, so an environment knows up front whether it needs to allocate rendering resources. These changes broke a lot of existing code, which is why Farama published the Shimmy compatibility shim to wrap pre-0.26 Gym environments and several non-Gym APIs as Gymnasium environments.[^5][^6]
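
What a compatibility layer has to do can be sketched in a few lines. The adapter below is a hypothetical illustration, not Shimmy's implementation: it assumes the wrapped object follows the pre-0.26 conventions (`seed(s)`, `reset() -> obs`, four-value `step()`) and that a time-limit cutoff is flagged under `info["TimeLimit.truncated"]`, the key the old `TimeLimit` wrapper used.

```python
class LegacyToNewAPI:
    """Hypothetical adapter: exposes a pre-0.26-style env through the newer signatures."""

    def __init__(self, legacy_env):
        self.env = legacy_env

    def reset(self, seed=None):
        if seed is not None:
            self.env.seed(seed)          # old API: seeding is a separate call
        return self.env.reset(), {}      # new API: (observation, info)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = done and not truncated
        return obs, reward, terminated, truncated, info


class OldStyleCounter:
    """Toy legacy env used only to exercise the adapter."""

    def __init__(self, limit=3):
        self.limit = limit
        self.t = 0
        self.last_seed = None

    def seed(self, s):
        self.last_seed = s

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.limit
        # Report the cutoff the way the old TimeLimit wrapper did.
        info = {"TimeLimit.truncated": True} if done else {}
        return self.t, 0.0, done, info


adapted = LegacyToNewAPI(OldStyleCounter(limit=2))
```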

### What are Gym wrappers?

Gym popularized the idea of stacking environment wrappers. A wrapper takes an existing `Env` and modifies one slice of its behavior (observations, actions, rewards, or episode lifecycle) while passing the rest through. The library itself ships a long list of them, including `TimeLimit` (cap episode length), `RecordVideo` (write rollout footage to disk), `RecordEpisodeStatistics` (track per-episode return and length), `NormalizeObservation` and `NormalizeReward` (running-mean normalization), `FrameStack` (stack the last N frames; renamed `FrameStackObservation` in Gymnasium 1.0), and `AtariPreprocessing` (the canonical DQN-era 84x84 grayscale, frame-skip, max-pool pipeline). Because wrappers compose, a single line such as `env = TimeLimit(FrameStack(AtariPreprocessing(env), 4), 10000)` reproduces a fairly complex training pipeline.[^5][^6]
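
The composition pattern itself is simple enough to sketch without the library. The classes below are simplified stand-ins written for illustration (`Wrapper`, `StepLimit`, `ScaleReward`, and `EndlessEnv` are hypothetical names, and `StepLimit` only approximates what `TimeLimit` does); they assume the five-value `step()` convention.

```python
class Wrapper:
    """Minimal stand-in for a wrapper base class: forwards everything by default."""

    def __init__(self, env):
        self.env = env

    def reset(self, seed=None):
        return self.env.reset(seed=seed)

    def step(self, action):
        return self.env.step(action)


class StepLimit(Wrapper):
    """Sets `truncated` once `max_steps` steps have elapsed (a TimeLimit-like sketch)."""

    def __init__(self, env, max_steps):
        super().__init__(env)
        self.max_steps = max_steps
        self._elapsed = 0

    def reset(self, seed=None):
        self._elapsed = 0
        return super().reset(seed=seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = super().step(action)
        self._elapsed += 1
        if self._elapsed >= self.max_steps and not terminated:
            truncated = True
        return obs, reward, terminated, truncated, info


class ScaleReward(Wrapper):
    """Multiplies every reward by a constant and leaves the rest untouched."""

    def __init__(self, env, scale):
        super().__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, terminated, truncated, info = super().step(action)
        return obs, reward * self.scale, terminated, truncated, info


class EndlessEnv:
    """Toy environment that never ends on its own; the reward is always 1."""

    def reset(self, seed=None):
        return 0, {}

    def step(self, action):
        return 0, 1.0, False, False, {}


# Wrappers nest from the inside out: rewards are scaled first, then the step cap applies.
env = StepLimit(ScaleReward(EndlessEnv(), scale=0.5), max_steps=3)
```

Each wrapper overrides only the method it cares about, which is why the order of nesting, rather than any shared configuration, determines the final behavior.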

## What environments does Gym include?

Gym shipped with several families of environments, each with its own dependencies and typical research uses. Gymnasium inherited the same families and continues to maintain them.

| Category | Examples | Notes |
|---|---|---|
| Classic control | `CartPole-v1`, `MountainCar-v0`, `Acrobot-v1`, `Pendulum-v1` | Lightweight 2D physics tasks taken from RL textbooks; useful for debugging and teaching |
| Toy text | `FrozenLake-v1`, `Taxi-v3`, `Blackjack-v1`, `CliffWalking-v0` | Tiny tabular MDPs used for tabular methods such as [q-learning](/wiki/q-learning) and [sarsa](/wiki/sarsa) |
| Box2D | `LunarLander-v2`, `BipedalWalker-v3`, `CarRacing-v2` | Built on the Box2D 2D physics engine; mid-difficulty continuous and discrete tasks |
| Atari | `ALE/Pong-v5`, `ALE/Breakout-v5`, `ALE/SpaceInvaders-v5`, plus roughly 60 ROMs | Wrapped from the Arcade Learning Environment (Bellemare et al., 2013); the standard benchmark for deep RL on pixels; `v5` IDs carry the `ALE/` namespace |
| MuJoCo | `Ant-v4`, `HalfCheetah-v4`, `Hopper-v4`, `Humanoid-v4`, `Walker2d-v4` | Continuous control with detailed contact physics; originally required a paid [mujoco](/wiki/mujoco) license, free under DeepMind since 2021 |
| Robotics | `FetchReach-v1`, `HandManipulateBlock-v0` | Goal-based manipulation tasks; later spun out to a separate `Gymnasium-Robotics` package |
| Algorithmic | `Copy-v0`, `RepeatCopy-v0`, `ReversedAddition-v0` | Simple symbol-manipulation puzzles; deprecated in later Gym versions |

### Classic control and toy text

The classic-control suite is the easiest place to start: CartPole asks an agent to balance an inverted pendulum on a cart, MountainCar asks an under-powered car to climb a hill by building momentum, Acrobot swings a two-link pendulum up to a target height, and Pendulum-v1 simply asks for upright stabilization with continuous torque. These are tiny 2D physics problems with state vectors of two to six floats (two for MountainCar, three for Pendulum, four for CartPole, six for Acrobot) and either discrete or one-dimensional continuous actions. Textbooks like Sutton and Barto have used variants of these tasks for decades, and they remain the standard sanity-check for any new algorithm implementation.[^5][^6]

The toy-text family covers tabular reinforcement learning. FrozenLake is a four-by-four (or eight-by-eight) gridworld with slippery transitions; Taxi-v3 is a five-by-five world where a taxi picks up and drops off passengers; Blackjack is the card game; CliffWalking is the famous example from Sutton and Barto that contrasts SARSA and Q-learning. These environments have small, enumerable state spaces, so they let students and researchers exercise tabular methods without any function approximation at all.[^5][^6]
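
The SARSA versus Q-learning contrast comes down to one term in the update rule. The sketch below shows both updates over a dictionary-backed Q-table; the function names, the `(state, action)` key layout, and the example numbers are illustrative assumptions rather than anything shipped with Gym.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, terminated, actions, alpha=0.5, gamma=1.0):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = 0.0 if terminated else max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, terminated, alpha=0.5, gamma=1.0):
    """On-policy update: bootstrap from the action the agent actually takes next."""
    next_q = 0.0 if terminated else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_q - Q[(s, a)])


# Two identical tables: in state 1, action 0 is good (2.0) and action 1 is bad (-4.0).
Q_off = defaultdict(float, {(1, 0): 2.0, (1, 1): -4.0})
Q_on = defaultdict(float, {(1, 0): 2.0, (1, 1): -4.0})

# Same transition (s=0, a=0, r=-1, s_next=1), but the behavior policy explores
# and picks the bad action 1 in the next state.
q_learning_update(Q_off, 0, 0, -1.0, 1, False, actions=(0, 1))
sarsa_update(Q_on, 0, 0, -1.0, 1, 1, False)
```

After one update `Q_off[(0, 0)]` is 0.5 while `Q_on[(0, 0)]` is -2.5: Q-learning assumes the greedy follow-up, whereas SARSA charges the state for the exploratory mistake, which is the behavior CliffWalking is designed to expose.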

### Box2D, Atari, and MuJoCo

The Box2D family uses the Box2D 2D physics engine. LunarLander asks an agent to land a craft between two flags, BipedalWalker has a two-legged robot traverse rough terrain, and CarRacing is a top-down driving task with pixel observations. These are noticeably harder than classic control but still cheap to simulate. CarRacing in particular has been a common benchmark for image-based continuous control.[^5][^6]

The Atari family is the most influential of the bunch. By wrapping the Arcade Learning Environment and standardizing pre-processing (84x84 grayscale frames, frame-skip of four, life-loss as a terminal signal in some configurations), Gym made it trivial to reproduce the original DQN paper's experimental setup, and a generation of deep RL papers ran on exactly that suite of games. The 49-game DQN benchmark gave way to the broader 57-game Atari-57 set used by later work like Rainbow, IMPALA, R2D2, MuZero, and Agent57.[^8][^9][^12] In 2024 the Arcade Learning Environment 2.0 release, maintained jointly with Farama, integrated the modern Gymnasium API and replaced the older `atari-py` dependency.[^13]
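
The frame-skip and max-pool steps of that pipeline can be sketched on toy data. The helper below is an illustrative assumption, not the code of `AtariPreprocessing`: frames are plain lists of pixel values, `FlickerEnv` is a hypothetical stand-in for an emulator, and resizing and grayscale conversion are left out.

```python
def frame_skip_step(env, action, skip=4):
    """Repeat `action` for `skip` emulator frames and sum the rewards.

    The returned observation is the element-wise maximum of the last two raw
    frames, a common remedy for sprites that are drawn only on alternate frames.
    """
    total_reward = 0.0
    last_two = []
    terminated = truncated = False
    info = {}
    for _ in range(skip):
        frame, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        last_two = (last_two + [frame])[-2:]   # keep only the two newest frames
        if terminated or truncated:
            break
    pooled = [max(pixels) for pixels in zip(*last_two)]
    return pooled, total_reward, terminated, truncated, info


class FlickerEnv:
    """Toy stand-in: a 3-pixel 'screen' whose sprite is drawn only on odd frames."""

    def __init__(self):
        self.t = 0

    def step(self, action):
        self.t += 1
        frame = [0, 255, 0] if self.t % 2 == 1 else [0, 0, 0]
        return frame, 1.0, False, False, {}


obs, reward, terminated, truncated, info = frame_skip_step(FlickerEnv(), action=0)
```

Here the fourth raw frame is blank, so returning it directly would hide the sprite; pooling with the third frame yields `[0, 255, 0]` and a summed reward of 4.0.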

The MuJoCo family covers continuous control with detailed multi-joint physics: Ant (a quadruped), HalfCheetah (a planar two-leg runner), Hopper, Humanoid, and Walker2d. These were originally distributed against the proprietary MuJoCo physics engine, which required a paid license and a separate Python binding (`mujoco-py`). In October 2021, DeepMind acquired MuJoCo and made it freely available, with the source code following under Apache 2.0 in 2022, after which the official `mujoco` Python bindings replaced `mujoco-py` in both Gym and Gymnasium environment versions four and above.[^14]

The robotics family (Fetch and Shadow Hand manipulators) was originally part of Gym and is now maintained as the separate `Gymnasium-Robotics` package under Farama. The algorithmic family was deprecated and removed by later Gym versions.[^5][^6]

## How do you install and use Gym?

In the original Gym, the base install was `pip install gym`. Optional extras pulled in environment-specific dependencies, for example `pip install gym[atari]` for Atari ROMs via `ale-py`, `pip install gym[box2d]` for the Box2D family, and `pip install gym[mujoco]` for the MuJoCo continuous control suite. The same pattern carries over to Gymnasium: `pip install gymnasium`, `pip install "gymnasium[atari]"`, `pip install "gymnasium[all]"`.[^5][^6]

A minimal random-agent loop reads almost identically in either library:

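```python
import gymnasium as gym

# Build a versioned environment; render_mode is fixed at construction time.
env = gym.make("CartPole-v1", render_mode="human")
# reset() returns (observation, info); the seed fixes the environment's own randomness.
obs, info = env.reset(seed=42)
done = False
while not done:
    # Random agent: sample any valid action from the advertised action space.
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # The episode ends on a true terminal state or on a time-limit cutoff.
    done = terminated or truncated
env.close()  # release the rendering window
```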

The `seed` argument to `reset()` is a late addition to the API; earlier Gym versions exposed seeding through a separate `env.seed()` method.[^5][^6] A subtle gotcha for newcomers: in pre-0.26 code, `env.reset()` returned just `obs` (not a tuple), and `env.step()` returned a four-tuple. Most algorithm libraries detected the API version at runtime for a while, but new code should target the five-tuple convention exclusively.[^11]

Custom environments follow the same protocol. To make a new task available through `gym.make()`, a developer subclasses `gym.Env`, implements `reset`, `step`, the two space attributes, and optionally `render` and `close`, and then calls `gym.register()` with a versioned ID. Because the contract is small, third-party environments such as Minigrid, MetaWorld, Procgen, MiniHack, CARLA wrappers, and many domain-specific simulators integrate with no changes to user code.[^5][^6]
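
The register-then-make mechanism can be illustrated with a small dependency-free sketch. Everything here is a stand-in: `register` and `make` are local functions that imitate the idea behind `gym.register()` and `gym.make()` without reproducing their signatures, `LineWalkEnv` is a hypothetical task that does not subclass `gym.Env`, and its `actions` and `observations` tuples stand in for real space objects.

```python
import re

# Hypothetical stand-ins for gym.register() and gym.make(); not the library's code.
_REGISTRY = {}
_VERSIONED_ID = re.compile(r"^.+-v\d+$")


def register(env_id, entry_point, **default_kwargs):
    """Store a constructor under a versioned string ID such as 'LineWalk-v0'."""
    if not _VERSIONED_ID.match(env_id):
        raise ValueError(f"expected an ID ending in -v<number>, got {env_id!r}")
    _REGISTRY[env_id] = (entry_point, default_kwargs)


def make(env_id, **overrides):
    """Look up the ID and build a fresh environment instance."""
    if env_id not in _REGISTRY:
        raise KeyError(f"unknown environment ID: {env_id!r}")
    entry_point, default_kwargs = _REGISTRY[env_id]
    return entry_point(**{**default_kwargs, **overrides})


class LineWalkEnv:
    """Toy task: walk right along `size` cells; reaching the last cell ends the episode."""

    def __init__(self, size=5):
        self.size = size
        self.actions = (0, 1)                    # 0 = left, 1 = right
        self.observations = tuple(range(size))   # every cell index is a valid observation
        self._pos = 0

    def reset(self, seed=None):
        self._pos = 0
        return self._pos, {}

    def step(self, action):
        move = 1 if action == 1 else -1
        self._pos = min(self.size - 1, max(0, self._pos + move))
        terminated = self._pos == self.size - 1
        reward = 1.0 if terminated else 0.0
        return self._pos, reward, terminated, False, {}


register("LineWalk-v0", LineWalkEnv, size=5)
env = make("LineWalk-v0", size=3)   # keyword overrides replace the registered defaults
```

Tying the constructor to a versioned string is what lets a benchmark change its dynamics under a new ID while old results stay reproducible under the old one.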

## What did OpenAI build on top of Gym?

OpenAI built several projects on top of Gym during the years it was actively maintained, and most of them are now retired or community-maintained.

### Universe

Universe, released by OpenAI in December 2016, used Gym's interface as the agent-side contract while running arbitrary desktop and browser programs inside Docker containers. Each container exposed a VNC server for pixels and keyboard or mouse events plus a separate WebSocket channel for reward signals, so Flash games, browser tasks, and even commercial titles like Grand Theft Auto V could be treated as Gym environments. The initial release advertised over 1,000 environments, of which a few hundred had reward signals wired up. Universe was effectively shelved by 2017 when OpenAI shifted focus to dedicated game research; the GitHub repository remained but stopped receiving updates.[^15]

### Roboschool

Roboschool, released by OpenAI in May 2017, was an open source robotics simulator built on the Bullet physics engine. It provided MuJoCo-style continuous control environments without the proprietary license that MuJoCo required at the time and integrated with Gym through the standard interface. OpenAI deprecated Roboschool in 2019, two years before MuJoCo itself became free, and pointed users toward PyBullet ports of the same tasks; Bullet-based RL environments live on in projects like PyBullet Gym.[^16]

### Gym Retro

[gym retro](/wiki/gym_retro), launched in 2018, extended the Atari pattern to many more retro consoles, including SNES, Sega Genesis, NES, Game Boy, and Atari 2600. The full release shipped over 1,000 games and tools for adding new ones via game integration files. Gym Retro powered OpenAI's Retro Contest, a generalization-focused competition built around Sonic the Hedgehog levels; the contest produced research on transfer learning and procedurally generated levels.[^17]

### Safety Gym

Safety Gym, released in 2019 by Alex Ray, Joshua Achiam, and Dario Amodei, focused on constrained RL and safe exploration. It included an environment-builder for composing tasks out of physics elements, goals, and safety constraints, plus a benchmark suite of 18 high-dimensional continuous control environments and nine debugging environments. Like Roboschool, Safety Gym is no longer actively maintained by OpenAI; the Farama Foundation now hosts a successor called Safety-Gymnasium.[^18]

### Procgen

Procgen, released by OpenAI in 2019, was a suite of 16 procedurally generated game environments designed to measure generalization in deep RL. The motivation was that fixed Atari and MuJoCo levels reward memorization as much as policy learning, so a benchmark whose levels are sampled from a generator gives a cleaner read on generalization. Procgen environments include CoinRun, Maze, BigFish, and others, all using the Gym API.[^19]

## How did Gym influence the RL ecosystem?

Gym's most lasting contribution is the API itself. Almost every popular RL library released after 2016 either consumes Gym environments directly or implements a compatible adapter.

| Project | Relationship to Gym |
|---|---|
| Stable Baselines and Stable Baselines 3 | Algorithm libraries that train against any Gym-compatible environment; SB3 added explicit Gymnasium support in 2023 |
| RLlib | Ray's distributed RL framework; uses the Gym and Gymnasium API as its environment standard |
| PettingZoo | Multi-agent counterpart to Gym from the same Farama team; designed as the multi-agent analogue of `gym.Env` |
| CleanRL | Single-file reference implementations of RL algorithms; written against Gym and later Gymnasium |
| Tianshou | Modular PyTorch RL library that adopts the Gym API |
| TorchRL | PyTorch-native RL library from Meta; consumes Gym and Gymnasium environments through wrappers |
| Acme | DeepMind's RL agent library; ships Gym compatibility wrappers |
| dm-control and DeepMind Lab | Originally separate; now offer Gym wrappers via the Farama Foundation's Shimmy compatibility layer |
| Unity ML-Agents | Game-engine RL platform; provides a Gym wrapper so existing agents can drive Unity scenes |
| Isaac Gym and successors | NVIDIA's GPU-parallelized robotics simulator family; Isaac Gym used the Gym API directly, succeeded by Isaac Sim and [isaac lab](/wiki/isaac_lab) |

The ripple effect goes beyond single-agent Python simulators. Tools from neighboring niches, including Unity ML-Agents (a game-engine platform) and PettingZoo's parallel API (a multi-agent library), model their interfaces explicitly on Gym so that existing agents and training scripts can be reused with minimal changes.[^10][^20] On the algorithm side, the canonical implementations of [dqn](/wiki/dqn), [asynchronous advantage actor-critic](/wiki/a3c), [ppo](/wiki/ppo), [soft actor critic](/wiki/soft_actor_critic), [ddpg](/wiki/ddpg), TD3, Rainbow, IMPALA, R2D2, and MuZero have all been benchmarked at one point or another on Gym Atari or Gym MuJoCo tasks, and the public scoreboards baked into the original Gym website (before it was retired) were among the first community-curated leaderboards in RL.[^1][^9][^12][^21][^22]

### Adoption in foundational deep RL papers

The 2013 [dqn](/wiki/dqn) preprint and the 2015 Nature paper by Mnih et al. predate Gym, but the post-Gym era of deep RL is dominated by works that use it as their evaluation harness. [ppo](/wiki/ppo) (Schulman et al., 2017) explicitly used the Gym MuJoCo suite for its main continuous-control comparisons; [soft actor critic](/wiki/soft_actor_critic) (Haarnoja et al., 2018) reported numbers on Hopper, Walker2d, HalfCheetah, Ant, and Humanoid from the Gym MuJoCo family; Rainbow (Hessel et al., 2018) combined six DQN extensions and reported aggregate Atari-57 performance using the standard Gym wrappers.[^21][^22][^12] Later distributed, recurrent, and model-based methods like IMPALA, R2D2, and MuZero relied on the same benchmark family for direct comparability.[^23] [alphazero](/wiki/alphazero) and [muzero](/wiki/muzero), while not direct consumers of Gym, share its convention of separating environment from learner and have influenced how Farama designs new benchmarks.

## When did Gym become Gymnasium?

By 2020, OpenAI's research priorities had shifted decisively toward large language models, and Gym went largely unmaintained for most of that year. Pull requests piled up, environment versions drifted out of sync with their underlying simulators, and several core dependencies (notably MuJoCo and Atari) changed their licensing or distribution model in ways that broke the default install.[^4][^5][^6] DeepMind's free release of MuJoCo in October 2021 and the migration from `atari-py` to `ale-py` were the two most disruptive of these shifts; without active maintenance, the published `pip install gym[atari]` and `pip install gym[mujoco]` paths went stale.[^14]

In early 2021, OpenAI agreed to hand the repository to a volunteer maintenance team led by Jordan Terry, who had been doing much of the upkeep informally. That team founded the Farama Foundation, a nonprofit dedicated to open source RL infrastructure, which was publicly announced on October 25, 2022.[^4] In its launch post the Farama team stated its goal plainly: "Our mission is to develop and maintain open source reinforcement learning tools, making reinforcement learning research faster and more productive."[^4] The same announcement introduced Gymnasium as the long-term home for the Gym API and noted that "It's our understanding that OpenAI has no plans to develop Gym going forward," so the fork would not split the community between competing libraries.[^4] Mark Towers became the lead Gymnasium maintainer, with Ariel Kwiatkowski and other contributors handling subsystems such as the MuJoCo bindings, the Atari integration, the robotics fork, and the documentation site.[^4][^5][^6]

The 2024 paper "Gymnasium: A Standard Interface for Reinforcement Learning Environments" (arXiv:2407.17032) by Mark Towers, Ariel Kwiatkowski, Jordan Terry, John U. Balis, and twelve other authors documented the API in its current form and was accepted at NeurIPS Datasets and Benchmarks 2025. The paper frames Gymnasium as the de facto standard interface for single-agent RL and discusses interoperability with the rest of the Farama ecosystem.[^7]

Key changes between Gym and Gymnasium include the terminated and truncated split described above, a stricter contract for `reset(seed=...)` deterministic seeding, a unified `render_mode` argument set at construction time rather than passed to `render()`, and updated MuJoCo environments based on the open source `mujoco` Python bindings instead of the older `mujoco-py`. The Farama Foundation also publishes Shimmy, a compatibility layer that wraps older Gym environments and several non-Gym APIs (DeepMind Control Suite, OpenSpiel, Atari ALE) so they can be used as Gymnasium environments.[^4][^5][^6]

## The wider Farama ecosystem

Gymnasium sits at the center of a family of related projects maintained by Farama, all of which share or extend the Gym contract.

| Project | Scope | Notes |
|---|---|---|
| Gymnasium | Single-agent environment API | Direct successor to OpenAI Gym; current version 1.3.0 (April 2026) |
| Gymnasium-Robotics | Goal-conditioned manipulation tasks | Hosts the Fetch and Shadow Hand environments that were once part of Gym |
| PettingZoo | Multi-agent environment API | The multi-agent analogue of `gym.Env`; introduced by Terry et al. in 2020 |
| MAgent2 | Large-scale multi-agent battles | Hundreds to thousands of agents per scene; uses PettingZoo's parallel API |
| Minigrid | Grid-world tasks | Originally Chevalier-Boisvert et al.; common benchmark for exploration and curriculum learning |
| MiniWorld | First-person 3D grid environments | Pixel-based generalization tasks |
| Safety-Gymnasium | Constrained RL benchmarks | Continuation of Safety Gym under Farama |
| Shimmy | Compatibility shim | Wraps legacy Gym, DeepMind Control Suite, OpenSpiel, dm-env, and Melting Pot as Gymnasium environments |
| Arcade Learning Environment 2.0+ | Atari benchmark | Co-maintained with the original ALE authors; ships native Gymnasium support |
| MO-Gymnasium | Multi-objective RL | Vector reward variants of standard tasks |

This collection covers most of the niches that motivated OpenAI's original spin-off projects (multi-agent, retro games, safety, large worlds) while keeping a single API surface.[^4][^5][^6][^20]

Several non-Farama projects sit alongside the Gym ecosystem rather than inside it. Brax (Freeman et al., Google Research, 2021) is a JAX-native rigid-body simulator that ships its own Gym-style interface and is widely used for massively parallel continuous-control RL on TPUs and GPUs.[^24] NVIDIA's Isaac Gym was a GPU-resident robotics simulator that later evolved into [isaac lab](/wiki/isaac_lab) on top of Isaac Sim; both expose Gym-compatible task APIs.[^25] MuJoCo MJX (introduced in 2023) is the JAX port of MuJoCo and ships Gymnasium-compatible environments through `mujoco_playground`. MetaWorld, NetHack Learning Environment, MiniHack, Procgen, CARLA, and Habitat all expose Gym or Gymnasium adapters for their respective domains.[^26]

## What breaks when migrating from Gym to Gymnasium?

Three things bite newcomers most often when moving between Gym and Gymnasium. First, seed handling: pre-0.26 code called `env.seed(s)` once and then `env.reset()`, while Gymnasium expects `env.reset(seed=s)`, usually on the first reset only, with later resets left unseeded so the random stream continues; the old `seed()` method was removed, so calling it on a built-in Gymnasium environment raises an `AttributeError` rather than seeding anything. Second, the return-tuple change in `step()`: code that unpacks four values from `step()` breaks on Gymnasium, and code that ignores `truncated` will incorrectly bootstrap or fail to bootstrap on time-limit cutoffs. Third, render mode: pre-0.26 code passed `mode="human"` to `render()` every step, while Gymnasium expects `render_mode="human"` at `gym.make()` construction.[^5][^6][^11]
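
The seeding behavior is the easiest of the three to get subtly wrong, so a small dependency-free sketch may help. `DiceEnv` and `first_observations` are hypothetical names; the class assumes the convention that `reset(seed=s)` re-creates the environment's private random generator, while `reset()` without a seed keeps the existing stream going.

```python
import random


class DiceEnv:
    """Toy stand-in whose only randomness is a private RNG, reseeded via reset(seed=...)."""

    def __init__(self):
        self._rng = random.Random()

    def reset(self, seed=None):
        if seed is not None:
            self._rng = random.Random(seed)  # reseed only when asked to
        return self._rng.randint(1, 6), {}


def first_observations(env, seeds):
    """Reset once per entry in `seeds` and collect the first observation of each episode."""
    return [env.reset(seed=s)[0] for s in seeds]


# Seed once, then let the stream continue: reproducible run, varied episodes.
run_a = first_observations(DiceEnv(), [123, None, None])
run_b = first_observations(DiceEnv(), [123, None, None])

# Reseed on every reset: every episode starts from the same random state.
reseeded = first_observations(DiceEnv(), [123, 123, 123])
```

`run_a` and `run_b` are identical to each other, which is the reproducibility most experiments want, while `reseeded` repeats the same first observation three times, which is rarely intended.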

Several environment IDs were renamed across the transition. Pendulum-v0 became Pendulum-v1 well before the Farama fork to fix a reward calculation bug, and the MuJoCo environments moved through versions two, three, and four as the underlying bindings switched from `mujoco-py` to the official `mujoco` package. Robotics environments were renamed when they moved to `Gymnasium-Robotics` (the old `FetchReach-v1` is now `FetchReach-v3` with updated kinematics). Code that hard-codes a specific environment ID should be reviewed when upgrading Gymnasium versions.[^5][^6]

## Is OpenAI Gym still maintained?

Gym is, by any reasonable measure, the most influential single piece of infrastructure in modern reinforcement learning research. The original `openai/gym` repository accumulated more than 37,000 GitHub stars and 8,700 forks before being archived, and the standard Atari, MuJoCo, and classic-control benchmark numbers reported in essentially every deep RL paper from 2016 onward trace back to environments first packaged here.[^3] The Brockman et al. whitepaper has been cited well over ten thousand times on Google Scholar, comparable to other widely cited infrastructure papers in machine learning.[^1]

For new work, however, the toolkit itself is no longer the right starting point. The original repository is read-only, several environment families have moved to Farama-maintained packages (`ale-py` for Atari, `Gymnasium-Robotics`, `Safety-Gymnasium`), and the API improvements introduced after 2022 only exist in Gymnasium. The practical advice from both OpenAI and Farama is the same: install Gymnasium and import it with the alias `gym` if backward compatibility matters.[^3][^4][^5][^6]

Viewed in retrospect, the most interesting thing about Gym may be how little it tried to do. It defined a small contract, packaged a handful of canonical task families, and let other people build the algorithm libraries, the visualization tools, and the multi-agent extensions. The Farama team's decision to preserve that minimalism rather than rewrite the API from scratch is the main reason Gymnasium has been adopted so quickly. The same `env.reset()`, `env.step(action)`, `observation_space`, `action_space` pattern that Brockman and colleagues sketched in 2016 is still the contract that an RL agent and an RL environment use to talk to each other in 2026.

## See also

- Gymnasium
- [reinforcement learning](/wiki/reinforcement_learning)
- [openai](/wiki/openai)
- [farama foundation](/wiki/farama_foundation)
- [dqn](/wiki/dqn)
- [ppo](/wiki/ppo)
- [soft actor critic](/wiki/soft_actor_critic)
- [ddpg](/wiki/ddpg)
- [policy gradient](/wiki/policy_gradient)
- [mujoco](/wiki/mujoco)
- [markov decision process mdp](/wiki/markov_decision_process_mdp)
- [imagenet](/wiki/imagenet)
- [isaac lab](/wiki/isaac_lab)
- [gym retro](/wiki/gym_retro)
- [alphazero](/wiki/alphazero)
- [muzero](/wiki/muzero)
- [alphago](/wiki/alphago)
- [google deepmind](/wiki/google_deepmind)
- [q-learning](/wiki/q-learning)
- [sarsa](/wiki/sarsa)
- [reinforcement learning rl](/wiki/reinforcement_learning_rl)
- [greg brockman](/wiki/greg_brockman)
- [john schulman](/wiki/john_schulman)

## References

[^1]: Brockman, G., Cheung, V., Pettersson, L., Schneider, J., Schulman, J., Tang, J., and Zaremba, W., "OpenAI Gym", arXiv:1606.01540, 2016-06-05. https://arxiv.org/abs/1606.01540. Accessed 2026-05-26.
[^2]: OpenAI, "OpenAI Gym Beta", OpenAI blog, 2016-04-27. https://openai.com/index/openai-gym-beta/. Accessed 2026-05-26.
[^3]: OpenAI, "openai/gym: A toolkit for developing and comparing reinforcement learning algorithms", GitHub repository (archived 2026-04-08). https://github.com/openai/gym. Accessed 2026-05-26.
[^4]: Farama Foundation, "Announcing The Farama Foundation", farama.org, 2022-10-25. https://farama.org/Announcing-The-Farama-Foundation. Accessed 2026-05-26.
[^5]: Farama Foundation, "Gymnasium Documentation", gymnasium.farama.org. https://gymnasium.farama.org/. Accessed 2026-05-26.
[^6]: Farama Foundation, "Farama-Foundation/Gymnasium", GitHub repository, current version 1.3.0 released 2026-04-22. https://github.com/Farama-Foundation/Gymnasium. Accessed 2026-05-26.
[^7]: Towers, M., Kwiatkowski, A., Terry, J., Balis, J. U., et al., "Gymnasium: A Standard Interface for Reinforcement Learning Environments", arXiv:2407.17032, 2024-07-24. https://arxiv.org/abs/2407.17032. Accessed 2026-05-26.
[^8]: Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M., "The Arcade Learning Environment: An Evaluation Platform for General Agents", Journal of Artificial Intelligence Research 47, 253-279, 2013. arXiv:1207.4708. https://arxiv.org/abs/1207.4708. Accessed 2026-05-26.
[^9]: Mnih, V., Kavukcuoglu, K., Silver, D., et al., "Human-level control through deep reinforcement learning", Nature 518, 529-533, 2015-02-26. https://www.nature.com/articles/nature14236. Accessed 2026-05-26.
[^10]: Raffin, A., Hill, A., Gleave, A., Kanervisto, A., Ernestus, M., and Dormann, N., "Stable-Baselines3: Reliable Reinforcement Learning Implementations", Journal of Machine Learning Research 22(268), 1-8, 2021. https://jmlr.org/papers/v22/20-1364.html. Accessed 2026-05-26.
[^11]: Farama Foundation, "Migration Guide v0.21 to v0.26 / Gymnasium", Gymnasium documentation. https://gymnasium.farama.org/introduction/migration_guide/. Accessed 2026-05-26.
[^12]: Hessel, M., Modayil, J., van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., and Silver, D., "Rainbow: Combining Improvements in Deep Reinforcement Learning", AAAI 2018. arXiv:1710.02298. https://arxiv.org/abs/1710.02298. Accessed 2026-05-26.
[^13]: Farama Foundation, "Arcade Learning Environment 2.0 release notes", Farama-Foundation/Arcade-Learning-Environment GitHub. https://github.com/Farama-Foundation/Arcade-Learning-Environment. Accessed 2026-05-26.
[^14]: DeepMind, "Opening up a physics simulator for robotics", deepmind.google blog, 2021-10-18. https://deepmind.google/discover/blog/opening-up-a-physics-simulator-for-robotics/. Accessed 2026-05-26.
[^15]: OpenAI, "openai/universe", GitHub repository, released December 2016. https://github.com/openai/universe. Accessed 2026-05-26.
[^16]: OpenAI, "openai/roboschool", GitHub repository (deprecated 2019). https://github.com/openai/roboschool. Accessed 2026-05-26.
[^17]: OpenAI, "openai/retro: Retro Games in Gym", GitHub repository. https://github.com/openai/retro. Accessed 2026-05-26.
[^18]: Ray, A., Achiam, J., and Amodei, D., "Benchmarking Safe Exploration in Deep Reinforcement Learning", OpenAI technical report, 2019. https://cdn.openai.com/safexp-short.pdf. Accessed 2026-05-26.
[^19]: Cobbe, K., Hesse, C., Hilton, J., and Schulman, J., "Leveraging Procedural Generation to Benchmark Reinforcement Learning", arXiv:1912.01588, 2019. https://arxiv.org/abs/1912.01588. Accessed 2026-05-26.
[^20]: Terry, J., Black, B., Grammel, N., et al., "PettingZoo: Gym for Multi-Agent Reinforcement Learning", arXiv:2009.14471, 2020. https://arxiv.org/abs/2009.14471. Accessed 2026-05-26.
[^21]: Schulman, J., Wolski, F., Dhariwal, P., Radford, A., and Klimov, O., "Proximal Policy Optimization Algorithms", arXiv:1707.06347, 2017-07-20. https://arxiv.org/abs/1707.06347. Accessed 2026-05-26.
[^22]: Haarnoja, T., Zhou, A., Abbeel, P., and Levine, S., "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", ICML 2018. arXiv:1801.01290. https://arxiv.org/abs/1801.01290. Accessed 2026-05-26.
[^23]: Espeholt, L., Soyer, H., Munos, R., et al., "IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures", ICML 2018. arXiv:1802.01561. https://arxiv.org/abs/1802.01561. Accessed 2026-05-26.
[^24]: Freeman, C. D., Frey, E., Raichuk, A., Girgin, S., Mordatch, I., and Bachem, O., "Brax: A Differentiable Physics Engine for Large Scale Rigid Body Simulation", arXiv:2106.13281, 2021. https://arxiv.org/abs/2106.13281. Accessed 2026-05-26.
[^25]: Makoviychuk, V., Wawrzyniak, L., Guo, Y., et al., "Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning", arXiv:2108.10470, 2021. https://arxiv.org/abs/2108.10470. Accessed 2026-05-26.
[^26]: Yu, T., Quillen, D., He, Z., Julian, R., Hausman, K., Finn, C., and Levine, S., "Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning", CoRL 2019. arXiv:1910.10897. https://arxiv.org/abs/1910.10897. Accessed 2026-05-26.

