Gym (OpenAI Gym / Gymnasium)
Last reviewed
May 26, 2026
Sources
No citations yet
Review status
Needs citations
Revision
v5 · 4,779 words
Gym, often written as OpenAI Gym, is an open source Python toolkit for developing and comparing reinforcement learning algorithms, originally released by OpenAI on April 27, 2016.[1][2] It pairs a small, opinionated programming interface with a curated collection of benchmark environments so that researchers can plug a reinforcement learning agent into a wide variety of tasks without having to rewrite the simulation code each time. Although it was built around RL, the same env.reset() and env.step(action) calls work fine with imitation learning, evolutionary search, and other approaches that need a uniform notion of "environment."[1][3] OpenAI stopped active maintenance of Gym during 2020 and 2021, the codebase was handed to a volunteer team, and in October 2022 that team officially relaunched the project as Gymnasium under the Farama Foundation.[4][5][6] The original openai/gym repository was archived on April 8, 2026, and Gymnasium is now the canonical successor; it can be dropped into existing projects by replacing import gym with import gymnasium as gym.[4][5][6]
The Gym API, with its trio of reset(), step(action), and render() methods plus typed observation and action spaces, is the de facto standard interface in modern reinforcement learning research. Almost every popular RL library released after 2016, including Stable Baselines, RLlib, CleanRL, Tianshou, and TorchRL, either consumes Gym or Gymnasium environments directly or implements a compatible adapter.[5][6] The original OpenAI Gym whitepaper has been cited well over ten thousand times on Google Scholar, and the Atari, MuJoCo, and classic-control benchmark numbers reported in essentially every deep RL paper from 2016 onward trace back to environments first packaged here.[1][7]
| Attribute | Detail |
|---|---|
| Original name | OpenAI Gym |
| Initial public beta | April 27, 2016 |
| Whitepaper | Brockman et al., arXiv:1606.01540, submitted June 5, 2016 |
| Original developer | OpenAI |
| Current maintainer | Farama Foundation, as Gymnasium |
| Final OpenAI Gym release | 0.26.2, October 4, 2022 |
| Farama announcement | October 25, 2022 |
| Original repository archived | April 8, 2026 |
| Latest Gymnasium version | 1.3.0, April 22, 2026 |
| License | MIT |
| Languages | Python (3.7+ for late Gym; 3.10 through 3.13 for current Gymnasium) |
| Gymnasium paper | Towers et al., arXiv:2407.17032, July 24, 2024 |
| Successor | Gymnasium (Farama Foundation) |
Before Gym, almost every reinforcement learning paper shipped with its own custom simulator and its own way of feeding observations into a learning algorithm. Comparing two methods meant either reimplementing somebody else's environment or trusting a number printed in a table. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba argued in the original Gym whitepaper that this lack of a shared evaluation surface was holding the field back, particularly as deep RL was starting to show real results on Atari games and continuous control benchmarks.[1] Greg Brockman later described Gym as an attempt to do for RL what ImageNet had done for supervised vision: provide a shared, well-versioned set of tasks plus a public site for comparing results, so that progress could actually be measured rather than just claimed.[1][2]
The intellectual lineage is straightforward. The Atari benchmark suite came from the Arcade Learning Environment by Marc Bellemare and colleagues, published in the Journal of Artificial Intelligence Research in 2013.[8] DeepMind's DQN paper applied deep Q-learning to that suite in 2013 and 2015, posting human-level scores on 29 of 49 games and demonstrating that a single network architecture could learn many games from raw pixels.[9] By 2016, when Gym launched, researchers wanted to reproduce these results and extend them to continuous-control tasks; the missing piece was a uniform Python interface that would let an algorithm written for CartPole also drive a MuJoCo humanoid without code changes.[1]
The response was deliberately minimal. Gym does not ship a learning algorithm at all. It only defines a contract: an environment is anything that exposes reset(), step(action), and a pair of spaces describing what observations look like and what actions are legal. Anything matching that contract is a Gym environment, whether it simulates a 2D pole, an Atari ROM, a 3D humanoid, or a custom robotics rig. This narrow scope is part of why the API spread so quickly across other libraries and is still the foundation of the Gymnasium fork a decade later.[1][5][6]
The core of the library is a small object called Env. A typical interaction loop in the original Gym (versions before 0.26) looked like this: a researcher would call gym.make() to build a versioned environment, call reset() to get the first observation, and then repeatedly call step(action) until the environment signaled that the episode was over.[1][5]
| Method or attribute | Purpose |
|---|---|
| gym.make("CartPole-v1") | Construct a versioned environment by string ID |
| env.reset() | Reset internal state and return the first observation (plus an info dict from Gym 0.26 onward and in Gymnasium) |
| env.step(action) | Apply an action and return (observation, reward, done, info) in classic Gym, or (observation, reward, terminated, truncated, info) from Gym 0.26 onward |
| env.render() | Visualize the current state, controlled by render_mode |
| env.action_space | A Space describing valid actions |
| env.observation_space | A Space describing the structure of observations |
| env.close() | Release rendering windows or simulator handles |
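To make the contract in the table concrete, here is a minimal sketch of a toy environment that follows the post-0.26 conventions, written in plain Python so it runs without Gym installed. `CountdownEnv` and its small `Discrete` stand-in are hypothetical; a real environment would subclass `gymnasium.Env` and use the classes in `gymnasium.spaces`.

```python
import random


class Discrete:
    """Hypothetical stand-in for a Discrete space: the integers 0 .. n-1."""

    def __init__(self, n, seed=None):
        self.n = n
        self._rng = random.Random(seed)

    def sample(self):
        return self._rng.randrange(self.n)

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class CountdownEnv:
    """Hypothetical toy task: the state starts at `start`; action 1 decrements it
    and action 0 waits. Reaching 0 terminates; reaching max_steps truncates."""

    def __init__(self, start=5, max_steps=20):
        self.start, self.max_steps = start, max_steps
        self.action_space = Discrete(2)
        self.observation_space = Discrete(start + 1)

    def reset(self, seed=None):
        # seed is accepted for API compatibility; this toy task is deterministic
        self.state, self.steps = self.start, 0
        return self.state, {}  # (observation, info)

    def step(self, action):
        if not self.action_space.contains(action):
            raise ValueError(f"invalid action {action!r}")
        self.state -= action
        self.steps += 1
        terminated = self.state == 0              # true terminal state
        truncated = self.steps >= self.max_steps  # time-limit cutoff
        reward = 1.0 if terminated else -0.1
        return self.state, reward, terminated, truncated, {}

    def close(self):
        pass  # nothing to release in this toy example


env = CountdownEnv()
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    # always decrement: a trivially optimal policy for this task
    obs, reward, terminated, truncated, info = env.step(1)
    total_reward += reward
    done = terminated or truncated
env.close()
print(obs, round(total_reward, 1))  # 0 0.6
```

The loop is the same shape as any Gym training loop; only the policy (here a constant action) would change.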
Observation and action spaces are described by gym.spaces objects. The most common are Box for bounded continuous vectors, Discrete for a finite set of integer actions, MultiDiscrete and MultiBinary for structured discrete spaces, and Dict and Tuple for nested observations.[5][6] Because the spaces are first-class objects, downstream libraries can ask an environment what shape its inputs are and build neural networks automatically. This is the reason that algorithm libraries such as Stable Baselines 3 and RLlib can train on any Gym-compatible environment with essentially zero glue code: the environment advertises its own shapes, and the algorithm reads them at construction time.[6][10]
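The sketch below illustrates, under simplifying assumptions, how an algorithm library might read an environment's spaces to size a network's input and output layers. The `Box` and `Discrete` dataclasses, the use of a plain dict in place of a Dict space, and the helpers `flat_dim` and `policy_head_size` are all hypothetical. Gymnasium provides `gymnasium.spaces.utils.flatdim` for the flattening step, and real libraries make their own choices about policy heads; this sketch assumes one logit per discrete action and a diagonal Gaussian (a mean and a log standard deviation) per continuous action dimension.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical stand-in for a bounded continuous space of a given shape."""
    low: float
    high: float
    shape: tuple


@dataclass
class Discrete:
    """Hypothetical stand-in for a finite set of n integer actions."""
    n: int


def flat_dim(space):
    """Number of scalars needed to feed `space` into a flat network input."""
    if isinstance(space, Box):
        return math.prod(space.shape)
    if isinstance(space, Discrete):
        return space.n  # one-hot encoding
    if isinstance(space, dict):  # plain dict standing in for a Dict space
        return sum(flat_dim(s) for s in space.values())
    raise TypeError(f"unsupported space: {space!r}")


def policy_head_size(action_space):
    """Discrete: one logit per action. Box: mean and log-std per dimension."""
    if isinstance(action_space, Discrete):
        return action_space.n
    if isinstance(action_space, Box):
        return 2 * math.prod(action_space.shape)
    raise TypeError(f"unsupported space: {action_space!r}")


# CartPole-like spaces: 4 continuous observations, 2 discrete actions
obs_space = Box(low=-math.inf, high=math.inf, shape=(4,))
act_space = Discrete(2)
print(flat_dim(obs_space), policy_head_size(act_space))  # 4 2

# A nested observation: an 8x8 RGB image plus a three-valued mode flag
nested = {"pixels": Box(0, 255, (8, 8, 3)), "mode": Discrete(3)}
print(flat_dim(nested))  # 192 + 3 = 195
```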
For most of Gym's life, step() returned a single boolean called done that bundled together two very different events: the agent reached a terminal state of the underlying Markov decision process (MDP), or the episode was cut off by a time limit. Treating these the same caused subtle bugs in algorithms that bootstrap value estimates: a time-limited episode is not really over from the agent's perspective, so the value target at the cutoff should still include the estimated value of the next state instead of treating it as zero. Gym 0.26 (2022) and every Gymnasium release since split done into two flags: terminated for true MDP terminal states and truncated for time limits or other external cutoffs. The resulting five-tuple return is now standard across the ecosystem.[5][6][11]
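The distinction shows up directly in the one-step temporal-difference target. In this minimal sketch (function names are illustrative, and `next_value` stands for whatever value estimate the algorithm uses for the next state), bootstrapping stops only when `terminated` is true; a truncated step still adds the discounted value of the next state. The legacy version shows what happens when a single `done` flag is used for both cases.

```python
def td_target(reward, next_value, terminated, gamma=0.99):
    """One-step TD target: bootstrap from next_value unless the MDP truly ended.

    Truncation (for example a time limit) does not stop bootstrapping,
    because the underlying task would have continued past the cutoff.
    """
    return reward + gamma * (0.0 if terminated else 1.0) * next_value


def legacy_td_target(reward, next_value, done, gamma=0.99):
    """Target computed from a single `done` flag: a time-limit cutoff is
    treated as a real ending, so the next state's value is dropped."""
    return reward + gamma * (0.0 if done else 1.0) * next_value


# Last step of an episode cut off by a time limit (truncated, not terminated),
# where the critic estimates the next state is worth 50.0.
r, v_next = 1.0, 50.0
print(td_target(r, v_next, terminated=False))   # 1.0 + 0.99 * 50.0 = 50.5
print(legacy_td_target(r, v_next, done=True))   # 1.0: future value wrongly discarded
```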
Several other contract changes accompanied the split. reset() now returns a tuple of (observation, info) rather than just the observation, giving environments a place to attach per-episode metadata. Seeding moved into a keyword argument on reset(seed=...) instead of a separate env.seed() method, which was eventually deprecated. The render_mode is now declared at construction time (gym.make(..., render_mode="human")) rather than passed to each render() call, so an environment knows up front whether it needs to allocate rendering resources. These changes broke a lot of existing code, which is why Farama published the Shimmy compatibility shim to wrap pre-0.26 Gym environments and several non-Gym APIs as Gymnasium environments.[5][6]
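The seeding contract can be illustrated with a toy environment that owns its random number generator. In this sketch the class `NoisyStartEnv` is hypothetical, and Python's `random.Random` stands in for the NumPy generator that Gymnasium environments typically keep in `self.np_random`. Passing `seed` to `reset()` re-creates the generator, while calling `reset()` without a seed continues the existing random stream.

```python
import random


class NoisyStartEnv:
    """Hypothetical reset-only env whose initial observation is drawn at random."""

    def __init__(self):
        self._rng = None

    def reset(self, seed=None):
        # Re-create the generator when a seed is given (or on first use);
        # otherwise keep drawing from the existing stream.
        if seed is not None or self._rng is None:
            self._rng = random.Random(seed)
        obs = round(self._rng.uniform(-0.05, 0.05), 4)
        return obs, {}  # (observation, info)


env_a, env_b = NoisyStartEnv(), NoisyStartEnv()
first_a, _ = env_a.reset(seed=42)
first_b, _ = env_b.reset(seed=42)
print(first_a == first_b)    # True: same seed, same start state

second_a, _ = env_a.reset()  # no seed: next draw from the same stream
print(second_a == first_a)   # False

third_a, _ = env_a.reset(seed=42)
print(third_a == first_a)    # True: reseeding reproduces the first episode
```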
Gym popularized the idea of stacking environment wrappers. A wrapper takes an existing Env and modifies one slice of its behavior (observations, actions, rewards, or the episode lifecycle) while passing the rest through. The standard library ships a long list of them, including TimeLimit (cap episode length), RecordVideo (write rollout footage to disk), RecordEpisodeStatistics (track per-episode return and length), NormalizeObservation and NormalizeReward (normalization using running statistics), FrameStack (concatenate the last N frames; renamed FrameStackObservation in Gymnasium 1.0), and AtariPreprocessing (the canonical DQN-era pipeline of 84x84 grayscale frames, frame skipping, and max-pooling). Because wrappers compose, a single line such as env = TimeLimit(FrameStack(AtariPreprocessing(env), 4), 10000) reproduces a fairly complex training pipeline.[5][6]
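The wrapper pattern itself is small. The sketch below defines a hypothetical base `Wrapper` that forwards calls to the inner environment, plus two illustrative wrappers (`StepLimit`, a simplified time limit, and `ScaleReward`), and then stacks them the way the one-liner above stacks Gym's wrappers. The real Gymnasium classes (`gymnasium.Wrapper`, `gymnasium.wrappers.TimeLimit`) carry more bookkeeping; treat this as an illustration of composition only.

```python
class ConstantEnv:
    """Hypothetical inner env: every step gives reward 1.0 and never terminates."""

    def reset(self, seed=None):
        return 0, {}

    def step(self, action):
        return 0, 1.0, False, False, {}


class Wrapper:
    """Forward every call to the wrapped env; subclasses override one slice."""

    def __init__(self, env):
        self.env = env

    def reset(self, seed=None):
        return self.env.reset(seed=seed)

    def step(self, action):
        return self.env.step(action)


class StepLimit(Wrapper):
    """Simplified time limit: report truncated=True after max_steps steps."""

    def __init__(self, env, max_steps):
        super().__init__(env)
        self.max_steps = max_steps
        self.elapsed = 0

    def reset(self, seed=None):
        self.elapsed = 0
        return self.env.reset(seed=seed)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


class ScaleReward(Wrapper):
    """Multiply every reward by a constant factor."""

    def __init__(self, env, scale):
        super().__init__(env)
        self.scale = scale

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward * self.scale, terminated, truncated, info


# Inner wrappers act first: rewards are scaled, then the episode length is capped.
env = StepLimit(ScaleReward(ConstantEnv(), scale=0.5), max_steps=3)
env.reset()
rewards, done = [], False
while not done:
    _, reward, terminated, truncated, _ = env.step(0)
    rewards.append(reward)
    done = terminated or truncated
print(rewards)  # [0.5, 0.5, 0.5]
```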
Gym shipped with several families of environments, each with its own dependencies and typical research uses. Gymnasium inherited the same families and continues to maintain them.
| Category | Examples | Notes |
|---|---|---|
| Classic control | CartPole-v1, MountainCar-v0, Acrobot-v1, Pendulum-v1 | Lightweight 2D physics tasks taken from RL textbooks; useful for debugging and teaching |
| Toy text | FrozenLake-v1, Taxi-v3, Blackjack-v1, CliffWalking-v0 | Tiny tabular MDPs used for tabular methods such as Q-learning and SARSA |
| Box2D | LunarLander-v2, BipedalWalker-v3, CarRacing-v2 | Built on the Box2D 2D physics engine; mid-difficulty continuous and discrete tasks |
| Atari | ALE/Pong-v5, ALE/Breakout-v5, ALE/SpaceInvaders-v5, plus roughly 60 ROMs | Wrapped from the Arcade Learning Environment (Bellemare et al., 2013); the standard benchmark for deep RL on pixels |
| MuJoCo | Ant-v4, HalfCheetah-v4, Hopper-v4, Humanoid-v4, Walker2d-v4 | Continuous control with detailed contact physics; originally required a paid MuJoCo license, free under DeepMind since 2021 |
| Robotics | FetchReach-v1, HandManipulateBlock-v0 | Goal-based manipulation tasks; later spun out to a separate Gymnasium-Robotics package |
| Algorithmic | Copy-v0, RepeatCopy-v0, ReversedAddition-v0 | Simple symbol-manipulation puzzles; deprecated in later Gym versions |
The classic-control suite is the easiest place to start: CartPole asks an agent to balance an inverted pendulum on a cart, MountainCar asks an under-powered car to climb a hill by building momentum, Acrobot swings a two-link pendulum up to a target height, and Pendulum-v1 asks the agent to swing a pendulum upright and hold it there using continuous torque. These are tiny 2D physics problems with observation vectors of two to six floats and either discrete or one-dimensional continuous actions. Textbooks such as Sutton and Barto have used variants of these tasks for decades, and they remain the standard sanity check for any new algorithm implementation.[5][6]
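To show how small these problems are, here is a sketch of one CartPole transition using the cart-pole equations of motion from Barto, Sutton, and Anderson (1983) with explicit Euler integration. The constants (gravity 9.8, cart mass 1.0, pole mass 0.1, pole half-length 0.5, force magnitude 10, time step 0.02 s) and the termination thresholds (cart position beyond 2.4, pole angle beyond 12 degrees) are assumed to match Gym's classic CartPole implementation; check them against the current Gymnasium source before relying on them.

```python
import math

# Constants assumed to match Gym's classic CartPole implementation.
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5                      # half the pole's length
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0
TAU = 0.02                             # seconds between state updates
THETA_LIMIT = 12 * 2 * math.pi / 360   # 12 degrees in radians
X_LIMIT = 2.4


def cartpole_step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one Euler step.

    Action 1 pushes the cart right, action 0 pushes it left.
    Returns (next_state, reward, terminated).
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    temp = (force + POLE_MASS_LENGTH * theta_dot**2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t**2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS

    # Explicit Euler: positions are updated with the old velocities.
    x = x + TAU * x_dot
    x_dot = x_dot + TAU * x_acc
    theta = theta + TAU * theta_dot
    theta_dot = theta_dot + TAU * theta_acc

    terminated = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), 1.0, terminated


# Pushing right from rest: the cart gains rightward velocity (x_dot > 0)
# while the pole starts to rotate the other way (theta_dot < 0).
state, reward, terminated = cartpole_step((0.0, 0.0, 0.0, 0.0), action=1)
print(state, reward, terminated)
```

The reward is a constant 1.0 per surviving step, so an episode's return simply counts how long the pole stayed up.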
The toy-text family covers tabular reinforcement learning. FrozenLake is a four-by-four (or eight-by-eight) gridworld with slippery transitions; Taxi-v3 is a five-by-five world where a taxi picks up and drops off passengers; Blackjack is the card game; CliffWalking is the famous example from Sutton and Barto that contrasts SARSA and Q-learning. These environments have small, enumerable state spaces, so they let students and researchers exercise tabular methods without any function approximation at all.[5][6]
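As an illustration of the tabular methods these environments target, here is a minimal Q-learning sketch. It runs on a hypothetical five-state corridor (`Corridor`) rather than FrozenLake itself so that it needs no Gym dependency; the update rule is standard one-step Q-learning, and the environment, hyperparameters, and episode count are arbitrary choices for the example.

```python
import random


class Corridor:
    """Hypothetical corridor: states 0..4, start at 0, goal at 4.

    Action 0 moves left, action 1 moves right. Reaching the goal gives
    reward 1.0 and terminates the episode; every other step gives 0.0.
    """

    n_states, n_actions, goal = 5, 2, 4

    def reset(self):
        self.s = 0
        return self.s, {}

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(self.goal, self.s + 1)
        terminated = self.s == self.goal
        return self.s, (1.0 if terminated else 0.0), terminated, False, {}


def greedy_action(q_row, rng):
    """Pick the highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_learning(env, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, max_steps=50, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * env.n_actions for _ in range(env.n_states)]
    for _ in range(episodes):
        s, _ = env.reset()
        for _ in range(max_steps):
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                a = rng.randrange(env.n_actions)
            else:
                a = greedy_action(q[s], rng)
            s_next, r, terminated, truncated, _ = env.step(a)
            # Off-policy target: greedy value of the next state, or 0 if terminal.
            target = r + (0.0 if terminated else gamma * max(q[s_next]))
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
            if terminated or truncated:
                break
    return q


q = q_learning(Corridor())
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
print(policy)  # [1, 1, 1, 1]: move right in every non-goal state
```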
The Box2D family uses the Box2D 2D physics engine. LunarLander asks an agent to land a craft between two flags, BipedalWalker has a two-legged robot traverse rough terrain, and CarRacing is a top-down driving task with pixel observations. These are noticeably harder than classic control but still cheap to simulate. CarRacing in particular has been a common benchmark for image-based continuous control.[5][6]
The Atari family is the most influential of the bunch. By wrapping the Arcade Learning Environment and standardizing pre-processing (84x84 grayscale frames, frame-skip of four, life-loss as a terminal signal in some configurations), Gym made it trivial to reproduce the original DQN paper's experimental setup, and a generation of deep RL papers ran on exactly that suite of games. The 49-game DQN benchmark gave way to the broader 57-game Atari-57 set used by later work like Rainbow, IMPALA, R2D2, MuZero, and Agent57.[8][9][12] In 2024 the Arcade Learning Environment 2.0 release, maintained jointly with Farama, integrated the modern Gymnasium API and replaced the older atari-py dependency.[13]
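The frame-skip and max-pool steps of that preprocessing can be sketched without any emulator. In this illustration the hypothetical `BlinkingEnv` emits tiny 2x2 "frames" as nested lists, and `frame_skip_maxpool` (also hypothetical) repeats one action `skip` times, sums the rewards, and returns the element-wise maximum of the last two frames. That maximum is what DQN-style pipelines use to recover sprites that the Atari hardware draws only on alternate frames. Grayscale conversion, resizing to 84x84, and frame stacking are omitted.

```python
class BlinkingEnv:
    """Hypothetical env: a 2x2 frame whose top-left sprite is drawn only on odd ticks."""

    def reset(self, seed=None):
        self.t = 0
        return self._frame(), {}

    def _frame(self):
        sprite = 255 if self.t % 2 == 1 else 0
        return [[sprite, 0], [0, 10]]

    def step(self, action):
        self.t += 1
        return self._frame(), 1.0, False, False, {}


def frame_skip_maxpool(env, action, skip=4):
    """Repeat `action` for `skip` frames, sum the rewards, and return the
    element-wise maximum of the last two frames seen."""
    total_reward, frames = 0.0, []
    terminated = truncated = False
    info = {}
    for _ in range(skip):
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        frames.append(obs)
        if terminated or truncated:
            break  # never step past the end of an episode
    last_two = frames[-2:]
    # zip(*last_two) pairs matching rows; zip(*rows) pairs matching pixels.
    pooled = [[max(pixels) for pixels in zip(*rows)] for rows in zip(*last_two)]
    return pooled, total_reward, terminated, truncated, info


env = BlinkingEnv()
env.reset()
obs, reward, terminated, truncated, _ = frame_skip_maxpool(env, action=0)
print(obs, reward)  # [[255, 0], [0, 10]] 4.0
```

Without the max-pool, the fourth (even-tick) frame alone would hide the sprite entirely.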
The MuJoCo family covers continuous control with detailed multi-joint physics: Ant (a quadruped), HalfCheetah (a planar two-leg runner), Hopper, Humanoid, and Walker2d. These were originally distributed against the proprietary MuJoCo physics engine, which required a paid license and a separate Python binding (mujoco-py). In October 2021, DeepMind acquired MuJoCo and made it freely available, and in 2022 it released the source code under the Apache 2.0 license; the official mujoco Python bindings then replaced mujoco-py in the version-four and later environments of both Gym and Gymnasium.[14]
The robotics family (Fetch and Shadow Hand manipulators) was originally part of Gym and is now maintained as the separate Gymnasium-Robotics package under Farama. The algorithmic family was deprecated and removed by later Gym versions.[5][6]
In the original Gym, the base install was pip install gym. Optional extras pulled in environment-specific dependencies, for example pip install gym[atari] for Atari ROMs via ale-py, pip install gym[box2d] for the Box2D family, and pip install gym[mujoco] for the MuJoCo continuous control suite. The same pattern carries over to Gymnasium: pip install gymnasium, pip install "gymnasium[atari]", pip install "gymnasium[all]".[5][6]
A minimal random-agent loop reads almost identically in Gym 0.26 or later and in Gymnasium; only the import line differs:
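```python
import gymnasium as gym

# Build a versioned environment; render_mode is fixed at construction time
env = gym.make("CartPole-v1", render_mode="human")
obs, info = env.reset(seed=42)  # seeding happens through reset()

done = False
while not done:
    action = env.action_space.sample()  # random agent
    obs, reward, terminated, truncated, info = env.step(action)
    # Stop on a true terminal state or on a time-limit cutoff
    done = terminated or truncated

env.close()
```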
Passing the seed to reset() is a late-Gym convention that Gymnasium kept. Earlier Gym versions exposed seeding through a separate env.seed() method, which was eventually deprecated.[5][6] A subtle gotcha for newcomers: in pre-0.26 code, env.reset() returned just obs (not a tuple), and env.step() returned a four-tuple. Most algorithm libraries detected the API version at runtime for a while, but new code should target the five-tuple convention exclusively.[11]
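A minimal sketch of the kind of runtime adapter those libraries used: it accepts either a legacy four-tuple or a modern five-tuple from step() and always returns the five-tuple form. For legacy results it relies on the `TimeLimit.truncated` info key that Gym's TimeLimit wrapper set before 0.26. The function name `to_five_tuple` is hypothetical; Gymnasium's `gymnasium.utils.step_api_compatibility` module contains comparable helpers that should be preferred in real code.

```python
def to_five_tuple(step_result):
    """Normalize a step() result to (obs, reward, terminated, truncated, info).

    Legacy Gym returned (obs, reward, done, info); its TimeLimit wrapper marked
    time-limit cutoffs with info["TimeLimit.truncated"] = True.
    """
    if len(step_result) == 5:
        return tuple(step_result)
    if len(step_result) != 4:
        raise ValueError(f"expected 4 or 5 values from step(), got {len(step_result)}")
    obs, reward, done, info = step_result
    truncated = bool(info.get("TimeLimit.truncated", False))
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


legacy_cutoff = (0, 1.0, True, {"TimeLimit.truncated": True})  # hit the time limit
legacy_fall = (0, 1.0, True, {})                               # real terminal state
modern = (0, 1.0, False, True, {})

print(to_five_tuple(legacy_cutoff)[2:4])  # (False, True)
print(to_five_tuple(legacy_fall)[2:4])    # (True, False)
print(to_five_tuple(modern)[2:4])         # (False, True)
```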
Custom environments follow the same protocol. To register a new task with gym.make(), a developer subclasses gym.Env, implements reset, step, the two space attributes, and optionally render and close, and then calls gym.register() with a versioned ID. Because the contract is small, third-party environments such as Minigrid, MetaWorld, Procgen, MiniHack, CARLA wrappers, and many domain-specific simulators integrate with no changes to user code.[5][6]
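Registration pairs a versioned string ID with a constructor. The sketch below imitates that pattern with a hypothetical in-memory registry; `register_env`, `make_env`, and `GridEnv` are illustrative names, not Gymnasium functions. IDs must follow the `Name-vN` convention, and `make_env` looks up the constructor and passes keyword overrides through. With Gymnasium itself, the equivalent calls are `gymnasium.register(id=..., entry_point=...)` followed by `gymnasium.make(id)`.

```python
import re

_REGISTRY = {}
_ID_PATTERN = re.compile(r"^[A-Za-z][\w.-]*-v\d+$")


def register_env(env_id, entry_point, **default_kwargs):
    """Record a constructor under a versioned ID such as 'GridWorld-v0'."""
    if not _ID_PATTERN.match(env_id):
        raise ValueError(f"environment IDs should look like 'Name-vN', got {env_id!r}")
    if env_id in _REGISTRY:
        raise ValueError(f"{env_id!r} is already registered")
    _REGISTRY[env_id] = (entry_point, default_kwargs)


def make_env(env_id, **overrides):
    """Build a registered environment, letting callers override default kwargs."""
    try:
        entry_point, defaults = _REGISTRY[env_id]
    except KeyError:
        raise KeyError(f"unknown environment {env_id!r}; registered: {sorted(_REGISTRY)}") from None
    return entry_point(**{**defaults, **overrides})


class GridEnv:
    """Hypothetical custom environment; only the constructor matters here."""

    def __init__(self, size=4):
        self.size = size


register_env("GridWorld-v0", GridEnv, size=4)
register_env("GridWorld-v1", GridEnv, size=8)  # a behavior change gets a new version ID

print(make_env("GridWorld-v0").size)          # 4
print(make_env("GridWorld-v1").size)          # 8
print(make_env("GridWorld-v0", size=6).size)  # 6
```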
OpenAI built several projects on top of Gym during the years it was actively maintained, and most of them are now retired or community-maintained.
Universe, released by OpenAI in December 2016, used Gym's interface as the agent-side contract while running arbitrary desktop and browser programs inside Docker containers. Each container exposed a VNC server for pixels and keyboard or mouse events plus a separate WebSocket channel for reward signals, so Flash games, browser tasks, and even commercial titles like Grand Theft Auto V could be treated as Gym environments. The initial release advertised over 1,000 environments, of which a few hundred had reward signals wired up. Universe was effectively shelved by 2017 when OpenAI shifted focus to dedicated game research; the GitHub repository remained but stopped receiving updates.[15]
Roboschool, released by OpenAI in May 2017, was an open source robotics simulator built on the Bullet physics engine. It provided MuJoCo-style continuous control environments without the proprietary license that MuJoCo required at the time and integrated with Gym through the standard interface. OpenAI deprecated Roboschool in 2019, recommending PyBullet instead; Bullet-based RL environments live on in PyBullet and in community projects like PyBullet Gym.[16]
Gym Retro, launched in 2018, extended the Atari pattern to many more retro consoles, including SNES, Sega Genesis, NES, Game Boy, and Atari 2600. The full release shipped over 1,000 games and tools for adding new ones via game integration files. Gym Retro powered OpenAI's Retro Contest, a generalization-focused competition built around Sonic the Hedgehog levels; the contest produced research on transfer learning and procedurally generated levels.[17]
Safety Gym, released in 2019 by Alex Ray, Joshua Achiam, and Dario Amodei, focused on constrained RL and safe exploration. It included an environment-builder for composing tasks out of physics elements, goals, and safety constraints, plus a benchmark suite of 18 high-dimensional continuous control environments and nine debugging environments. Like Roboschool, Safety Gym is no longer actively maintained by OpenAI; the Farama Foundation now hosts a successor called Safety-Gymnasium.[18]
Procgen, released by OpenAI in 2019, was a suite of 16 procedurally generated game environments designed to measure generalization in deep RL. The motivation was that fixed Atari and MuJoCo levels reward memorization as much as policy learning, so a benchmark whose levels are sampled from a generator gives a cleaner read on generalization. Procgen environments include CoinRun, Maze, BigFish, and others, all using the Gym API.[19]
Gym's most lasting contribution is the API itself. Almost every popular RL library released after 2016 either consumes Gym environments directly or implements a compatible adapter.
| Project | Relationship to Gym |
|---|---|
| Stable Baselines and Stable Baselines 3 | Algorithm libraries that train against any Gym-compatible environment; SB3 added explicit Gymnasium support in 2023 |
| RLlib | Ray's distributed RL framework; uses the Gym and Gymnasium API as its environment standard |
| PettingZoo | Multi-agent counterpart to Gym from the same Farama team; designed as the multi-agent analogue of gym.Env |
| CleanRL | Single-file reference implementations of RL algorithms; written against Gym and later Gymnasium |
| Tianshou | Modular PyTorch RL library that adopts the Gym API |
| TorchRL | PyTorch-native RL library from Meta; consumes Gym and Gymnasium environments through wrappers |
| Acme | DeepMind's RL agent library; ships Gym compatibility wrappers |
| dm-control and DeepMind Lab | Originally separate; now offer Gym wrappers via the Farama Foundation's Shimmy compatibility layer |
| Unity ML-Agents | Game-engine RL platform; provides a Gym wrapper so existing agents can drive Unity scenes |
| Isaac Gym and successors | NVIDIA's GPU-parallelized robotics simulator family; Isaac Gym used the Gym API directly, succeeded by Isaac Sim and Isaac Lab |
The ripple effect extends beyond single-agent Python libraries. Other platforms, including Unity ML-Agents and PettingZoo's parallel API for multi-agent environments, model their interfaces explicitly on Gym so that existing agents and training scripts can be reused with minimal changes.[10][20] On the algorithm side, the canonical implementations of DQN, asynchronous advantage actor-critic (A3C), PPO, soft actor-critic (SAC), DDPG, TD3, Rainbow, IMPALA, R2D2, and MuZero have all been benchmarked at one point or another on Gym Atari or Gym MuJoCo tasks, and the public scoreboards on the original Gym website (before it was retired) were among the first community-curated leaderboards in RL.[1][9][12][21][22]
The 2013 DQN preprint and the 2015 Nature paper by Mnih et al. predate Gym, but the post-Gym era of deep RL is dominated by works that use it as their evaluation harness. PPO (Schulman et al., 2017) explicitly used the Gym MuJoCo suite for its main continuous-control comparisons; soft actor-critic (Haarnoja et al., 2018) reported numbers on Hopper, Walker2d, HalfCheetah, Ant, and Humanoid from the Gym MuJoCo family; Rainbow (Hessel et al., 2018) combined six DQN extensions and reported aggregate Atari-57 performance using the standard Gym wrappers.[21][22][12] Later distributed and recurrent methods such as IMPALA, R2D2, and MuZero relied on the same benchmark family for direct comparability.[23] AlphaZero and MuZero, while not direct consumers of Gym, share its convention of separating environment from learner and have influenced how Farama designs new benchmarks.
By 2020, OpenAI's research priorities had shifted decisively toward large language models, and Gym went largely unmaintained for most of that year. Pull requests piled up, environment versions drifted out of sync with their underlying simulators, and several core dependencies (notably MuJoCo and Atari) changed their licensing or distribution model in ways that broke the default install.[4][5][6] DeepMind's acquisition and relicensing of MuJoCo, beginning in October 2021, and the migration from atari-py to ale-py were the two most disruptive of these shifts; without active maintenance, the published pip install gym[atari] and pip install gym[mujoco] paths went stale.[14]
In early 2021, OpenAI agreed to hand the repository to a volunteer maintenance team led by Jordan Terry, who had been doing much of the upkeep informally. That team founded the Farama Foundation, a nonprofit dedicated to open source RL infrastructure, which was publicly announced on October 25, 2022.[4] The same announcement introduced Gymnasium as the long-term home for the Gym API. Mark Towers became the lead Gymnasium maintainer, with Ariel Kwiatkowski and other contributors handling subsystems such as the MuJoCo bindings, the Atari integration, the robotics fork, and the documentation site.[4][5][6]
The 2024 paper "Gymnasium: A Standard Interface for Reinforcement Learning Environments" (arXiv:2407.17032) by Mark Towers, Ariel Kwiatkowski, Jordan Terry, John U. Balis, and twelve other authors documented the API in its current form and was accepted at NeurIPS Datasets and Benchmarks 2025. The paper frames Gymnasium as the de facto standard interface for single-agent RL and discusses interoperability with the rest of the Farama ecosystem.[7]
Key changes between Gym and Gymnasium include the terminated and truncated split described above, a stricter contract for reset(seed=...) deterministic seeding, a unified render_mode argument set at construction time rather than passed to render(), and updated MuJoCo environments based on the open source mujoco Python bindings instead of the older mujoco-py. The Farama Foundation also publishes Shimmy, a compatibility layer that wraps older Gym environments and several non-Gym APIs (DeepMind Control Suite, OpenSpiel, Atari ALE) so they can be used as Gymnasium environments.[4][5][6]
Gymnasium sits at the center of a family of related projects maintained by Farama, all of which share or extend the Gym contract.
| Project | Scope | Notes |
|---|---|---|
| Gymnasium | Single-agent environment API | Direct successor to OpenAI Gym; current version 1.3.0 (April 2026) |
| Gymnasium-Robotics | Goal-conditioned manipulation tasks | Hosts the Fetch and Shadow Hand environments that were once part of Gym |
| PettingZoo | Multi-agent environment API | The multi-agent analogue of gym.Env; introduced by Terry et al. in 2020 |
| MAgent2 | Large-scale multi-agent battles | Hundreds to thousands of agents per scene; uses PettingZoo's parallel API |
| Minigrid | Grid-world tasks | Originally Chevalier-Boisvert et al.; common benchmark for exploration and curriculum learning |
| MiniWorld | First-person 3D grid environments | Pixel-based generalization tasks |
| Safety-Gymnasium | Constrained RL benchmarks | Continuation of Safety Gym under Farama |
| Shimmy | Compatibility shim | Wraps legacy Gym, DeepMind Control Suite, OpenSpiel, dm-env, and Melting Pot as Gymnasium environments |
| Arcade Learning Environment 2.0+ | Atari benchmark | Co-maintained with the original ALE authors; ships native Gymnasium support |
| MO-Gymnasium | Multi-objective RL | Vector reward variants of standard tasks |
This collection covers most of the niches that motivated OpenAI's original spin-off projects (multi-agent, retro games, safety, large worlds) while keeping a single API surface.[4][5][6][20]
Several non-Farama projects sit alongside the Gym ecosystem rather than inside it. Brax (Freeman et al., Google, 2021) is a JAX-native rigid-body simulator that ships its own Gym-style interface and is widely used for massively parallel continuous-control RL on TPUs and GPUs.[24] NVIDIA's Isaac Gym was a GPU-resident robotics simulator that later evolved into Isaac Lab on top of Isaac Sim; both expose Gym-compatible task APIs.[25] MuJoCo MJX (introduced in 2023) is the JAX port of MuJoCo and ships Gymnasium-compatible environments through mujoco_playground. MetaWorld, NetHack Learning Environment, MiniHack, Procgen, CARLA, and Habitat all expose Gym or Gymnasium adapters for their respective domains.[26]
Three things bite newcomers most often when moving between Gym and Gymnasium. First, seed handling: pre-0.26 code called env.seed(s) once and then env.reset(), while Gymnasium expects env.reset(seed=s) on each episode where determinism matters; Gymnasium environments no longer provide the old seed() method, so legacy calls to it fail instead of seeding anything. Second, the return-tuple change in step(): code that unpacks four values from step() breaks on Gymnasium, and code that ignores truncated will incorrectly bootstrap or fail to bootstrap on time-limit cutoffs. Third, render mode: pre-0.26 code passed mode="human" to render() every step, while Gymnasium expects render_mode="human" at gym.make() construction.[5][6][11]
Several environment IDs were renamed across the transition. Pendulum-v0 became Pendulum-v1 well before the Farama fork to fix a reward calculation bug, and the MuJoCo environments moved through versions two, three, and four as the underlying bindings switched from mujoco-py to the official mujoco package. Robotics environments were renamed when they moved to Gymnasium-Robotics (the old FetchReach-v1 is now FetchReach-v3 with updated kinematics). Code that hard-codes a specific environment ID should be reviewed when upgrading Gymnasium versions.[5][6]
Gym is, by any reasonable measure, the most influential single piece of infrastructure in modern reinforcement learning research. The original openai/gym repository accumulated more than 37,000 GitHub stars and 8,700 forks before being archived, and the standard Atari, MuJoCo, and classic-control benchmark numbers reported in essentially every deep RL paper from 2016 onward trace back to environments first packaged here.[3] The Brockman et al. whitepaper has been cited well over ten thousand times on Google Scholar, comparable to other widely cited infrastructure papers in machine learning.[1]
For new work, however, the toolkit itself is no longer the right starting point. The original repository is read-only, several environment families have moved to Farama-maintained packages (ale-py for Atari, Gymnasium-Robotics, Safety-Gymnasium), and the API improvements introduced after 2022 only exist in Gymnasium. The practical advice from both OpenAI and Farama is the same: install Gymnasium and import it with the alias gym if backward compatibility matters.[3][4][5][6]
Viewed in retrospect, the most interesting thing about Gym may be how little it tried to do. It defined a small contract, packaged a handful of canonical task families, and let other people build the algorithm libraries, the visualization tools, and the multi-agent extensions. The Farama team's decision to preserve that minimalism rather than rewrite the API from scratch is the main reason Gymnasium has been adopted so quickly. The same env.reset(), env.step(action), observation_space, action_space pattern that Brockman and colleagues sketched in 2016 is still the contract that an RL agent and an RL environment use to talk to each other in 2026.