# Gym Retro

> Source: https://aiwiki.ai/wiki/gym_retro
> Updated: 2026-06-27
> Categories: Artificial Intelligence
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

Gym Retro is an open-source [reinforcement learning](/wiki/rl) (RL) research platform from [OpenAI](/wiki/openai), released in full in May 2018, that turns roughly 1,000 classic console video games into standardized [Gym](/wiki/gym) environments. It works by wrapping console emulators that implement the [Libretro API](/wiki/libretro_api) and reading game state straight out of emulator RAM, so a researcher can train an agent on Sega Genesis, NES, or SNES titles using the same `reset()` and `step()` interface as any other Gym environment. In OpenAI's own words, "Gym Retro lets you turn classic video games into Gym environments for reinforcement learning" and ships "with integrations for ~1000 games." [1] [2]

Gym Retro was built to push pixel-based RL research past the [Atari](/wiki/atari) 2600, with a particular emphasis on 16-bit [Sega Genesis](/wiki/sega_genesis) games. Its highest-profile use was the [Retro Contest](/wiki/retro_contest), a two-month transfer-learning competition (April 5 to June 5, 2018) in which 923 registered teams tried to build agents that could generalize to unseen levels of the Sonic the Hedgehog Genesis games. OpenAI framed the larger goal as studying how agents can "generalize between games with similar concepts but different appearances." [1] [3] [8]

The preliminary release earlier in 2018 included about 70 [Atari 2600](/wiki/atari_2600) games from the [Arcade Learning Environment](/wiki/arcade_learning_environment) plus roughly 30 [Sega Genesis](/wiki/sega_genesis) games from the SEGA Mega Drive and Genesis Classics Steam bundle, and that smaller corpus shipped with the Retro Contest in April 2018. The May 2018 full release expanded the lineup to over 1,000 titles spanning Sega Genesis, [Sega Master System](/wiki/sega_master_system), [NES](/wiki/nes), [SNES](/wiki/snes), and [Game Boy](/wiki/game_boy), with preliminary support for Sega Game Gear, Game Boy Color, [Game Boy Advance](/wiki/game_boy_advance), and the NEC TurboGrafx-16. [1] [4]

Moving past Atari was the point. Atari 2600 games run on 1977 hardware with simple sprites and a tiny action space. That made them a natural early benchmark for deep RL after [DQN](/wiki/dqn) in 2013, but they were increasingly stale. Sega Genesis titles run on a 16-bit Motorola 68000 with a larger frame buffer, more buttons, and richer game logic, so agents must read richer pixels and act over longer horizons. Retro built on lessons from [Universe](/wiki/universe), OpenAI's late-2016 platform that ran browser and Flash games inside [VNC](/wiki/vnc) sessions and proved unreliable because of real-time stepping and screen-scraped state. Retro fixed that by emulating consoles in-process and reading state straight out of emulator RAM. [1] [3]

## Quick facts

| Attribute | Value |
| --- | --- |
| Developer | [OpenAI](/wiki/openai) |
| Initial preliminary release | April 5, 2018 (with the Retro Contest) |
| Full release | May 2018 |
| Latest OpenAI version | 0.8.0 (May 1, 2020) |
| License | MIT |
| Source | [github.com/openai/retro](https://github.com/openai/retro) |
| Maintenance status | Maintenance only; last upstream release May 2020 |
| Active fork | [stable-retro](/wiki/stable_retro) by the [Farama Foundation](/wiki/farama_foundation) |
| Game count at full release | ~1,000 across 8 systems |
| Languages | C (~69%), C++ (~27%), Python bindings |
| Companion paper | Gotta Learn Fast (Nichol et al., 2018) [5] |

## What is Gym Retro?

Gym Retro is a thin Python layer that exposes emulated retro console games through the OpenAI [Gym](/wiki/gym) API, the de facto standard interface for RL environments. Each game becomes an environment whose observations are RGB screen frames, whose actions are the original gamepad's buttons, and whose rewards are computed from values read live out of the emulator's memory. Because it sits on top of the [Libretro API](/wiki/libretro_api), the same package can cover many different consoles: "It uses various emulators that support the Libretro API, making it fairly easy to add new emulators." [2]

By 2017 the standard RL benchmark for pixel-based agents was the [Arcade Learning Environment](/wiki/arcade_learning_environment) (ALE), exposing a few dozen Atari 2600 ROMs through a Gym-compatible interface. ALE had carried the field through DQN, [A3C](/wiki/a3c), Rainbow, and [PPO](/wiki/ppo), but the limits were obvious: an 18-action joystick space, hand-tuned rewards, many titles already mastered, and agents that overfit to the specific ROM they were trained on. Gym Retro raised the ceiling by jumping from roughly 70 Atari and 30 Sega titles to over 1,000 games across eight systems, giving researchers a much wider and more visually varied pool to test generalization on. [1] [5]

OpenAI's earlier attempt at scale, Universe (late 2016), exposed thousands of browser and Flash titles through VNC but ran in wall-clock time and depended on fragile screen scraping. Retro took a different bet: run emulators inside the Python process and read game state directly out of emulator RAM, which unlocked deterministic resets, save states, and faster-than-real-time training. The Retro Learning Environment, an earlier academic project for SNES and Genesis RL, used a similar trick. OpenAI credits it as inspiration but argues Gym Retro is more flexible because it abstracts over Libretro cores rather than baking in specific emulators. [1] [3] [4]

## How does Gym Retro work? (architecture)

Gym Retro is a thin Python wrapper around emulator binaries plus a per-game data layer.

### Libretro cores

Libretro is an emulator API under which each emulator is compiled into a single shared library called a core. A Libretro frontend (RetroArch is the best known) loads the core, sends it inputs, and pulls back video, audio, and memory. Retro acts as the frontend: each supported system ships its Libretro core inside the package, and adding a new console mostly means dropping in a new core. The Libretro API includes `retro_get_memory_data` and `retro_get_memory_size`, which Retro uses to read the game variables that rewards are computed from. [3] [6]
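Retro does this memory reading in native code, but the decoding step is easy to sketch. The Python below is a minimal illustration, assuming the block returned by `retro_get_memory_data` has already been copied into a `bytes` object. `read_variable` and its type strings are hypothetical. The layout is an endianness character, a kind character, then a byte count, such as `>u2`. It is modeled loosely on how Gym Retro describes variable types and is not Retro's actual API.

```python
import sys


def read_variable(ram: bytes, address: int, type_str: str) -> int:
    """Decode one integer game variable from a raw RAM snapshot (sketch).

    Hypothetical type_str layout: endianness ('<' little, '>' big, '=' native,
    '|' single byte), kind ('u' unsigned, 'i' signed, 'd' binary-coded decimal),
    then the width in bytes, e.g. '>u2'.
    """
    order, kind, size = type_str[0], type_str[1], int(type_str[2:])
    raw = ram[address:address + size]
    if len(raw) != size:
        raise IndexError("variable runs past the end of the RAM snapshot")
    byteorder = {"<": "little", ">": "big", "=": sys.byteorder, "|": "big"}[order]
    if kind in ("u", "i"):
        return int.from_bytes(raw, byteorder, signed=(kind == "i"))
    if kind == "d":
        # BCD: every nibble stores one decimal digit, so 0x12 0x34 reads as 1234.
        digits = raw if byteorder == "big" else raw[::-1]
        value = 0
        for byte in digits:
            hi, lo = byte >> 4, byte & 0x0F
            if hi > 9 or lo > 9:
                raise ValueError(f"byte {byte:#04x} is not valid BCD")
            value = value * 100 + hi * 10 + lo
        return value
    raise ValueError(f"unsupported kind {kind!r}")


ram = bytes([0x00, 0x12, 0x34, 0xFF, 0x99])
print(read_variable(ram, 1, ">u2"), read_variable(ram, 1, "<u2"))  # 4660 13330
print(read_variable(ram, 3, "|i1"), read_variable(ram, 1, ">d2"))  # -1 1234
```

Byte order and BCD matter in practice. The Genesis's 68000 is big-endian, and many older games keep scores as binary-coded decimal, so the same two bytes can mean very different numbers depending on the declared type.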

### Per-game data files

Each integrated game has a small set of data files plus at least one save state. As OpenAI describes it, "Each game integration has files listing memory locations for in-game variables, reward functions based on those variables, episode end conditions, savestates at the beginning of levels and a file containing hashes of ROMs that work with these files." [2]

| File | Purpose |
| --- | --- |
| `data.json` | Maps named variables (lives, score, x position, ring count) to RAM addresses and types. |
| `scenario.json` | Defines the reward function and the done condition using variables from data.json. |
| `metadata.json` | Stores the default starting save state and other game-level settings. |
| `script.lua` | Optional Lua hooks for rewards or termination conditions that need logic beyond simple expressions. |
| `*.state` | Binary save states marking levels, checkpoints, or specific scenarios. |
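Together, `data.json` and `scenario.json` turn raw RAM into a reward and a done flag. The sketch below shows that idea under simplifying assumptions. The `SCENARIO` dictionary only imitates the general shape of a scenario file, with a weight per variable for the reward and a condition per variable for termination. The operator names are illustrative. The variables are assumed to be already decoded into an `info` dict each step. This is not Retro's scenario engine, which supports more options.

```python
# Illustrative scenario: reward is a weighted change in variables,
# and the episode ends when a condition on a variable holds.
SCENARIO = {
    "reward": {"variables": {"score": {"reward": 1.0}, "x": {"reward": 0.1}}},
    "done": {"variables": {"lives": {"op": "zero"}}},
}

# Hypothetical condition operators: (current value, reference) -> bool
OPS = {
    "zero": lambda value, ref: value == 0,
    "equal": lambda value, ref: value == ref,
    "negative": lambda value, ref: value < 0,
}


def step_reward(prev_info: dict, info: dict, scenario: dict) -> float:
    """Sum of (change in each rewarded variable since last step) * its weight."""
    return sum(
        (info[name] - prev_info[name]) * spec["reward"]
        for name, spec in scenario["reward"]["variables"].items()
    )


def is_done(info: dict, scenario: dict) -> bool:
    """True if any termination condition in the scenario is met."""
    return any(
        OPS[spec["op"]](info[name], spec.get("reference"))
        for name, spec in scenario["done"]["variables"].items()
    )


prev = {"score": 100, "x": 10, "lives": 3}
now = {"score": 150, "x": 30, "lives": 3}
print(step_reward(prev, now, SCENARIO), is_done(now, SCENARIO))  # 52.0 False
```

Rewarding changes rather than absolute values means the agent is paid only for progress made during that step, not for score it already held.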

ROM hashes are stored in a `rom.sha` file, and most hashes match No-Intro SHA-1 sums. ROMs themselves are not shipped: users have to provide them, although a few non-commercial homebrew titles such as `Airstriker-Genesis` come bundled for testing. [2] [3]
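Checking a user-supplied ROM against an integration comes down to a SHA-1 comparison. Below is a minimal sketch with Python's `hashlib`. It assumes the hash file holds one hexadecimal SHA-1 digest per line, so check a real `rom.sha` before relying on that layout. The helper names are hypothetical. For real installs, Gym Retro's own `python -m retro.import` script does this matching.

```python
import hashlib
from pathlib import Path


def rom_sha1(data: bytes) -> str:
    """Hex SHA-1 digest of a ROM image."""
    return hashlib.sha1(data).hexdigest()


def parse_hash_file(text: str) -> set:
    """Collect digests, assuming one hex SHA-1 per line (blank lines ignored)."""
    return {line.split()[0].lower() for line in text.splitlines() if line.strip()}


def rom_matches(rom_path, sha_path) -> bool:
    """True if the ROM at rom_path hashes to a digest listed in sha_path."""
    known = parse_hash_file(Path(sha_path).read_text())
    return rom_sha1(Path(rom_path).read_bytes()) in known


print(rom_sha1(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```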

### Integration UI

The integration UI is a Qt desktop app that lets researchers step through a game frame by frame, watch RAM, and bookmark addresses as named variables. To find lives, score, and progress counters, an integrator plays the game while searching memory for values that change in the expected way. The UI then exports the chosen addresses straight into a `data.json` for that ROM. [3]
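The memory sweep described above is a filtering problem. You start with every address and keep only those whose values track what you see on screen. Below is a minimal sketch over hypothetical single-byte RAM snapshots. The function names are illustrative, and this is not the integration UI's code.

```python
from typing import Callable, List, Sequence


def match_values(snapshots: Sequence[bytes], observed: Sequence[int]) -> List[int]:
    """Addresses whose byte equals the on-screen value in every snapshot."""
    size = len(snapshots[0])
    return [
        addr for addr in range(size)
        if all(snap[addr] == value for snap, value in zip(snapshots, observed))
    ]


def match_change(before: bytes, after: bytes,
                 relation: Callable[[int, int], bool]) -> List[int]:
    """Addresses whose byte moved as expected between two snapshots."""
    return [addr for addr in range(len(before)) if relation(before[addr], after[addr])]


# Three snapshots taken while the lives counter showed 3, then 2, then 1.
snaps = [bytes([7, 3, 9, 3, 5]), bytes([7, 2, 9, 1, 5]), bytes([7, 1, 9, 0, 5])]
print(match_values(snaps, [3, 2, 1]))                        # [1]
print(match_change(snaps[0], snaps[1], lambda b, a: a < b))  # [1, 3]
```

The relational filter is useful because a counter may be stored off by one or in BCD, in which case an exact-value search finds nothing. Alternating the two kinds of filter narrows the candidates quickly.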

### Python interface

The Python API mirrors Gym. After installation a typical session looks like:

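```python
import retro

# Airstriker-Genesis is a bundled homebrew ROM, so no ROM import is needed.
env = retro.make(game='Airstriker-Genesis')
obs = env.reset()  # first RGB frame of the default save state
while True:
    action = env.action_space.sample()  # random button presses
    # Classic Gym API: step returns a 4-tuple (observation, reward, done, info).
    obs, reward, done, info = env.step(action)
    if done:
        break
env.close()
```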

Observations are RGB frames. By default, actions are binary vectors with one entry per button of the original gamepad, so their length depends on the console. `info` contains the `data.json` variables. A different starting save state can be selected by passing the `state=` argument to `retro.make`. [2]
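With the default binary action space, pressing a combination of buttons means setting the matching entries to 1. Below is a minimal sketch, assuming a hypothetical 12-entry Genesis-style button ordering. A real environment defines its own ordering, which should be read from the environment rather than hard-coded.

```python
from typing import List, Sequence

# Hypothetical button ordering for illustration only.
BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN", "LEFT", "RIGHT", "C", "Y", "X", "Z"]


def press(*names: str, buttons: Sequence[str] = BUTTONS) -> List[int]:
    """Binary action vector with 1 for every named button, 0 elsewhere."""
    unknown = set(names) - set(buttons)
    if unknown:
        raise ValueError(f"unknown buttons: {sorted(unknown)}")
    return [1 if button in names else 0 for button in buttons]


# A small discrete action set built from button combinations.
COMBOS = [press("LEFT"), press("RIGHT"), press("RIGHT", "B"), press("DOWN")]
print(press("RIGHT", "B"))  # [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

A short list of combos like `COMBOS` is one way to give a discrete-action algorithm such as DQN a manageable action set. The agent picks an index, and the corresponding vector (possibly converted to a NumPy array) would then be passed to `env.step`.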

## How does Gym Retro relate to OpenAI Gym?

Gym Retro is not a replacement for [OpenAI Gym](/wiki/gym); it is a collection of environments that plug into it. Gym, first released by OpenAI in 2016, defines the standard RL contract: an environment exposes `reset()`, `step(action)`, an `observation_space`, and an `action_space`. Gym Retro implements that contract for emulated console games, so any agent or library written against Gym (such as Stable Baselines or OpenAI Baselines) can train on a Sonic level with no special plumbing. The relationship mirrors how its successor, [stable-retro](/wiki/stable_retro), is built on the [Farama Foundation](/wiki/farama_foundation)'s [Gymnasium](/wiki/gymnasium): "Like Gym Retro built upon OpenAI Gym, Stable Retro is built upon Farama Foundation Gymnasium." [2] [11]
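Because the contract is just a few methods, agent code can stay environment-agnostic. The sketch below uses a hypothetical toy environment, `CountdownEnv`, with the same classic `reset()`/`step()` shape as the Gym Retro example earlier, where `step` returns a 4-tuple. An environment created with `retro.make` could be passed to the same `rollout` function, given a policy that returns button vectors. The sketch does not import Gym, so it only mimics the interface and omits the space objects.

```python
from typing import Any, Callable


class CountdownEnv:
    """Hypothetical toy environment with the classic Gym reset/step contract."""

    def __init__(self, length: int = 5):
        self.length = length
        self.remaining = length

    def reset(self) -> int:
        self.remaining = self.length
        return self.remaining  # observation

    def step(self, action: int):
        self.remaining -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.remaining <= 0
        return self.remaining, reward, done, {"remaining": self.remaining}


def rollout(env: Any, policy: Callable[[Any], Any], max_steps: int = 10_000) -> float:
    """Run one episode with any env exposing reset() and a 4-tuple step()."""
    obs = env.reset()
    total = 0.0
    for _ in range(max_steps):
        obs, reward, done, info = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


print(rollout(CountdownEnv(5), policy=lambda obs: 1))  # 5.0
```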

## Which systems and games does Gym Retro support?

| System | Emulator core | Notes |
| --- | --- | --- |
| Atari 2600 | Stella | Same hardware as ALE; Retro re-uses ALE-style ROMs. |
| NEC TurboGrafx-16 / PC Engine | Mednafen / Beetle PCE Fast | Preliminary support at full release. |
| Nintendo Game Boy / Game Boy Color | gambatte | Game Boy Color marked preliminary. |
| Nintendo Game Boy Advance | mGBA | Preliminary support. |
| Nintendo NES | FCEUmm | Standard 8-bit Nintendo. |
| Nintendo SNES | Snes9x | 16-bit Nintendo. |
| Sega Game Gear | Genesis Plus GX | Preliminary. |
| Sega Genesis / Mega Drive | Genesis Plus GX | The flagship console for the Retro Contest. |
| Sega Master System | Genesis Plus GX | Same core covers Master System, Game Gear, and Genesis. |

The full release covered eight emulated systems with about 1,000 ROMs integrated overall, and the Genesis catalog is the most thoroughly annotated because it was the focus of the contest. [2] [3]

## How do you install Gym Retro?

Gym Retro shipped pre-built wheels for Windows 7/8/10, macOS 10.13 (High Sierra) and 10.14 (Mojave), and manylinux1 Linux. It supports Python 3.6, 3.7, and 3.8, and OpenAI recommended a CPU with SSSE3 or better. The simplest install is `pip install gym-retro`, which still fetches the May 2020 0.8.0 wheels on compatible Python versions. Building from source requires CMake, a C++ compiler, and the cloned repository at `github.com/openai/retro`. [2] [7]
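The upstream wheels stop at Python 3.8, so a script that supports both packages can check the interpreter first. Below is a minimal sketch. The helper name and the fallback message are hypothetical, and the version set simply restates the range given above. Wheel availability also depends on the operating system, which this does not check.

```python
import sys

# Python minor versions covered by the gym-retro 0.8.0 wheels, per the text above.
GYM_RETRO_PYTHONS = {(3, 6), (3, 7), (3, 8)}


def gym_retro_wheel_expected(version=sys.version_info) -> bool:
    """True if this Python version is in the range the 0.8.0 wheels targeted."""
    return tuple(version[:2]) in GYM_RETRO_PYTHONS


if not gym_retro_wheel_expected():
    print("No gym-retro wheel expected for this Python; consider stable-retro instead.")
```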

## What is the Gotta Learn Fast paper?

OpenAI released a [technical report](/wiki/gotta_learn_fast) titled "Gotta Learn Fast: A New Benchmark for Generalization in RL" by Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, and [John Schulman](/wiki/john_schulman). It was submitted to arXiv on April 10, 2018 and revised on April 23, 2018. The paper proposes a transfer-learning benchmark on the [Sonic the Hedgehog](/wiki/sonic_the_hedgehog) Genesis trilogy. Agents train on a large pool of training levels, then face a held-out set of test levels with a budget of one million timesteps per test level. Each timestep spans four emulator frames, so that budget is four million frames, or about 18.5 hours of in-game time at 60 frames per second. The report runs three baselines: Rainbow [DQN](/wiki/dqn), [PPO](/wiki/ppo), and a hand-coded random search baseline called JERK, short for 'Just Enough Retained Knowledge'. It shows that joint PPO, trained on the training levels and then fine-tuned on each test level, nearly doubles the performance of PPO trained from scratch on the test levels. The paper frames the benchmark as a complement to ALE rather than a replacement: ALE measures within-task performance, Sonic measures generalization. [5]
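JERK is simple enough to sketch. The Python below is a simplified reading of the idea described in the paper, in three parts:

- It explores by running right with random jumps.
- It backs up briefly when progress stalls.
- It replays the best action sequence found so far, with a probability that grows as the step budget is used up.

`ToyLevel`, its walls, and every parameter value are hypothetical stand-ins, not the paper's settings. The paper's version also has to cope with a stochastic environment, which this deterministic toy sidesteps.

```python
import random


class ToyLevel:
    """Hypothetical 1-D stand-in for a Sonic level (not a Retro environment).

    The agent starts at x=0 and must reach x=length. Walking into a wall cell
    is blocked; a jump moves two cells and clears any wall. Reward is new
    rightward progress, loosely echoing Sonic's x-based reward.
    """

    def __init__(self, length=30, walls=(5, 12, 20), max_steps=60):
        self.length, self.walls, self.max_steps = length, set(walls), max_steps

    def reset(self):
        self.x = self.best_x = self.t = 0
        return self.x

    def step(self, action):
        self.t += 1
        if action == "LEFT":
            self.x = max(0, self.x - 1)
        elif action == "JUMP_RIGHT":
            self.x = min(self.length, self.x + 2)
        elif self.x + 1 not in self.walls:  # plain "RIGHT"
            self.x = min(self.length, self.x + 1)
        reward = max(0, self.x - self.best_x)
        self.best_x = max(self.best_x, self.x)
        done = self.x >= self.length or self.t >= self.max_steps
        return self.x, reward, done, {}


def jerk(env, budget=600, explore_steps=8, backtrack_steps=3,
         jump_prob=0.3, exploit_frac=0.25, seed=0):
    """Simplified JERK-style search; returns (best_return, best_actions)."""
    rng = random.Random(seed)
    best_return, best_actions, used = float("-inf"), [], 0
    while used < budget:
        env.reset()
        actions, rewards, done = [], [], False

        def act(a):
            # Take one step and record it in this episode's trajectory.
            _, r, d, _ = env.step(a)
            actions.append(a)
            rewards.append(r)
            return r, d

        if best_actions and rng.random() < exploit_frac + used / budget:
            for a in best_actions:  # exploit: replay the best run so far
                _, done = act(a)
                if done:
                    break
        else:
            while not done:  # explore: run right, jumping at random
                gained = 0.0
                for _ in range(explore_steps):
                    r, done = act("JUMP_RIGHT" if rng.random() < jump_prob else "RIGHT")
                    gained += r
                    if done:
                        break
                if not done and gained == 0:  # stuck: back up a little
                    for _ in range(backtrack_steps):
                        _, done = act("LEFT")
                        if done:
                            break
        used += len(actions)
        total = sum(rewards)
        if total > best_return:
            best_return, best_actions = total, actions
    return best_return, best_actions


best, actions = jerk(ToyLevel())
print(best, len(actions))
```

The growing replay probability is the "retained knowledge" part. Early on, the search mostly explores. Later, it spends more of its budget re-running the best trajectory it has found.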

## What was the Retro Contest?

Between April 5 and June 5, 2018, OpenAI ran a public competition called the [Retro Contest](/wiki/retro_contest) for the best agent on previously unseen levels of the Sonic the Hedgehog Genesis games. Participants received the training levels from the three Genesis titles, were free to use any data or compute at training time, and submitted Docker containers; at test time each container had a one million timestep budget per held-out level, roughly 18 hours of in-game play. [4] [8]

### Format and participation

Submissions ran on OpenAI's evaluation infrastructure: train or script an agent on the public Sonic levels, wrap it in a Docker image, run against five public test levels (low-quality levels generated with a Sonic level editor) for the live leaderboard, then final standings on a separate set of secret evaluation levels that competitors never see. Leaderboard scores were averaged over levels, with each level capped at a normalized maximum of 10,000. OpenAI reported 923 registered teams, 229 of which submitted at least one solution; the evaluation cluster ran 4,448 evaluations over two months, roughly twenty per submitting team. Most entries started from the tuned PPO and Rainbow DQN baselines shipped with the contest. [8]
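The aggregation rule described above is to cap each level, then average. Below is a minimal sketch with made-up per-level numbers. The helper name is hypothetical, and the contest's own per-level normalization is not reproduced here.

```python
from typing import Sequence


def leaderboard_score(level_scores: Sequence[float], cap: float = 10_000) -> float:
    """Mean of per-level scores after capping each one at `cap`."""
    if not level_scores:
        raise ValueError("need at least one level score")
    return sum(min(score, cap) for score in level_scores) / len(level_scores)


# Made-up scores: the 12,000 is capped at 10,000 before averaging.
print(leaderboard_score([12_000, 3_000, 2_000]))  # 5000.0
```

The cap keeps one easy level from dominating the average, so a high score has to come from doing reasonably well across many levels.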

### Results

Top scores came from tuning existing model-free RL algorithms, not new architectures:

| Rank | Team | Approach | Notes |
| --- | --- | --- | --- |
| 1 | Dharmaraja | Joint PPO with modifications | Six-member team (Qing Da, Jing-Cheng Shi, Anxiang Zeng, Guangda Huzhang, Run-Ze Li, Yang Yu). |
| 2 | Mistake | Rainbow DQN variant | Added a CNN layer, tuned n-step Q-learning, and lowered the DQN target update interval; edged out Aborg narrowly. |
| 3 | Aborg | Joint PPO with extra training data | Solo entry from Alexandre Borghi; mixed in levels from the Master System and Game Boy Advance Sonic games and used a different network architecture. |

The top final score was 4,692 against a theoretical maximum of 10,000. OpenAI took this as a sign that the benchmark was hard but not saturated, and that Sonic-style transfer remained an open problem. The 'low quality' label on the leaderboard test levels refers to their being generated with a level editor rather than crafted by Sega. [8]

## How is Gym Retro used in research?

The Retro Contest got coverage in [TechCrunch](/wiki/techcrunch), The Register, and Wired, mostly framed as an OpenAI publicity move around Sonic. In academic circles Retro became a common base for transfer-learning and meta-RL work, often cited alongside [ProcGen](/wiki/procgen_benchmark) and the [Obstacle Tower Challenge](/wiki/obstacle_tower) once those benchmarks appeared in 2019. The 16-bit games tested long-horizon credit assignment more cleanly than Atari, and the Sonic levels were the canonical transfer-learning benchmark thanks to Gotta Learn Fast. The platform shows up in work on world models, exploration bonuses, [curriculum learning](/wiki/curriculum_learning), and [self-supervised learning](/wiki/self_supervised) with RL, and it remains a teaching staple in university RL courses. [1] [4] [5] [9]

## Is Gym Retro still maintained? (stable-retro)

OpenAI moved Gym Retro into maintenance mode soon after release. The GitHub README has carried "Status: Maintenance (expect bug fixes and minor updates)" since 2018, and the last upstream release on PyPI was 0.8.0 on May 1, 2020. After that, the open-source RL environment ecosystem moved away from OpenAI. Gym was forked into [Gymnasium](/wiki/gymnasium) under the [Farama Foundation](/wiki/farama_foundation), and Gym Retro followed the same path under the name [stable-retro](/wiki/stable_retro). [2] [10]

Stable-retro is led by Mathieu Poliquin and the Farama Foundation, and accepts pull requests for new games, emulator cores, and bug fixes that upstream no longer takes. The fork adds Sega Saturn, Sega CD, Sega 32X, Sega Dreamcast, Nintendo 64, Nintendo DS, and arcade machines while keeping the core Gym Retro API; Python support has been broadened to 3.7 through 3.12, and the Windows route runs through WSL2. The documentation lists more than 1,000 integrated games. For new RL projects on retro consoles, stable-retro is now the recommended starting point. [10] [11]

## How does Gym Retro compare with related platforms?

| Platform | Year | Scope | Reset model | Notes |
| --- | --- | --- | --- | --- |
| [Arcade Learning Environment](/wiki/arcade_learning_environment) | 2013 | ~60 [Atari 2600](/wiki/atari_2600) games | Deterministic | The original RL pixel benchmark; small action space and short horizons. |
| [Universe](/wiki/universe) | 2016 | Browser, Flash, commercial titles | Real-time, screen-scraped | Discontinued; reliability problems with VNC and timing. |
| Retro Learning Environment | 2016 | SNES, Genesis | Deterministic | Academic precursor to Gym Retro. |
| Gym Retro | 2018 | ~1,000 games across 8 retro consoles | Deterministic save states | Maintained by [OpenAI](/wiki/openai) until 2020. |
| [stable-retro](/wiki/stable_retro) | 2022 onward | Same as Retro plus Saturn, N64, DS, arcade | Deterministic save states | Active maintained fork by the [Farama Foundation](/wiki/farama_foundation). |
| [ProcGen](/wiki/procgen_benchmark) | 2019 | 16 procedurally generated game-like environments | Deterministic | Designed specifically for testing generalization, with no licensed ROMs. |
| [MiniGrid](/wiki/minigrid) | 2018 onward | Gridworlds | Deterministic | Lightweight benchmark for instruction following and planning. |

Gym Retro and ProcGen ended up filling complementary roles: ProcGen tests generalization across procedurally generated variations of a single game family, while Retro tests it across human-designed levels that share style and mechanics. Researchers often cite both. [4] [5]

## Limitations

ROMs are not bundled, so reproducible benchmarks depend on every contributor sourcing ROMs with identical hashes. Reward functions are computed from values read out of RAM, which means each new game needs manual integration to find the right addresses. The 0.8.0 wheels target Python 3.6 to 3.8 on older operating systems. On modern macOS or Linux it is often easier to install stable-retro than to fight legacy build tooling. The Sonic benchmark has not been updated since 2018. Dharmaraja's 4,692 remains a useful reference, but it predates diffusion policies, world models, and large-scale pre-training. [2] [10]

## See also

- [Gym](/wiki/gym)
- [Gymnasium](/wiki/gymnasium)
- [Universe](/wiki/universe)
- [Arcade Learning Environment](/wiki/arcade_learning_environment)
- [stable-retro](/wiki/stable_retro)
- [Sonic the Hedgehog](/wiki/sonic_the_hedgehog)
- [Reinforcement learning](/wiki/rl)
- [PPO](/wiki/ppo)
- [DQN](/wiki/dqn)
- [Farama Foundation](/wiki/farama_foundation)

## References

1. OpenAI. "Gym Retro." OpenAI Blog, May 25, 2018. https://openai.com/index/gym-retro/
2. OpenAI. "openai/retro" GitHub repository README. https://github.com/openai/retro
3. Gym Retro Documentation. Read the Docs. https://retro.readthedocs.io/en/latest/
4. OpenAI. "Retro Contest." OpenAI Blog, April 5, 2018. https://openai.com/index/retro-contest/
5. Nichol, A., Pfau, V., Hesse, C., Klimov, O., and Schulman, J. "Gotta Learn Fast: A New Benchmark for Generalization in RL." arXiv:1804.03720, April 10, 2018. https://arxiv.org/abs/1804.03720
6. Libretro. "API." libretro.com. https://www.libretro.com/index.php/api/
7. PyPI. "gym-retro 0.8.0." Python Package Index. https://pypi.org/project/gym-retro/
8. OpenAI. "Retro Contest: Results." OpenAI Blog, June 27, 2018. https://openai.com/index/first-retro-contest-retrospective/
9. Lardinois, F. "Machine Learning Zone: OpenAI competition takes on Sonic the Hedgehog." TechCrunch, April 5, 2018. https://techcrunch.com/2018/04/05/machine-learning-zone-openai-competition-takes-on-sonic-the-hedgehog/
10. Farama Foundation. "stable-retro." GitHub repository. https://github.com/Farama-Foundation/stable-retro
11. Stable-Retro Documentation. Farama Foundation. https://stable-retro.farama.org/

