Gym Retro

Updated Jun 27, 2026

Gym Retro is an open-source reinforcement learning (RL) research platform from OpenAI, released in full in May 2018, that turns roughly 1,000 classic console video games into standardized Gym environments. It works by wrapping console emulators that implement the Libretro API and reading game state straight out of emulator RAM, so a researcher can train an agent on Sega Genesis, NES, or SNES titles using the same reset() and step() interface as any other Gym environment. In OpenAI's own words, "Gym Retro lets you turn classic video games into Gym environments for reinforcement learning" and ships "with integrations for ~1000 games." ^[1] ^[2]

Gym Retro was built to push pixel-based RL research past the Atari 2600, with a particular emphasis on 16-bit Sega Genesis games. Its highest-profile use was the Retro Contest, a two-month transfer-learning competition (April 5 to June 5, 2018) in which 923 teams registered to build agents that could generalize to unseen levels of the Sonic the Hedgehog Genesis games. OpenAI framed the larger goal as studying how agents can "generalize between games with similar concepts but different appearances." ^[1] ^[3] ^[8]

The preliminary release earlier in 2018 included about 70 Atari 2600 games from the Arcade Learning Environment plus roughly 30 Sega Genesis games from the SEGA Mega Drive and Genesis Classics Steam bundle, and that smaller corpus shipped with the Retro Contest in April 2018. The May 2018 full release expanded the lineup to over 1,000 titles spanning Sega Genesis, Sega Master System, NES, SNES, and Game Boy, with preliminary support for Sega Game Gear, Game Boy Color, Game Boy Advance, and the NEC TurboGrafx-16. ^[1] ^[4]

Moving past Atari was the point. Atari 2600 games run on 1977 hardware with simple sprites and a tiny action space, which made them a natural early benchmark for deep RL after DQN in 2013 but an increasingly stale one. Sega Genesis titles run on a 16-bit Motorola 68000 with a larger frame buffer, more buttons, and richer game logic, so agents must process more complex visuals and act over longer horizons. Retro built on lessons from Universe, OpenAI's late-2016 platform that ran browser and Flash games inside VNC sessions and proved unreliable because of real-time stepping and screen-scraped state. Retro fixed that by emulating consoles in-process and reading state straight out of emulator RAM. ^[1] ^[3]

Quick facts

Attribute	Value
Developer	OpenAI
Initial preliminary release	April 5, 2018 (with the Retro Contest)
Full release	May 2018
Latest OpenAI version	0.8.0 (May 1, 2020)
License	MIT
Source	github.com/openai/retro
Maintenance status	Maintenance only since 2020
Active fork	stable-retro by the Farama Foundation
Game count at full release	~1,000 across 8 systems
Languages	C (~69%), C++ (~27%), Python bindings
Companion paper	Gotta Learn Fast (Nichol et al., 2018) ^[5]

What is Gym Retro?

Gym Retro is a thin Python layer that exposes emulated retro console games through the OpenAI Gym API, the de facto standard interface for RL environments. Each game becomes an environment whose observations are RGB screen frames, whose actions are the original gamepad's buttons, and whose rewards are computed from values read live out of the emulator's memory. Because it sits on top of the Libretro API, the same package can cover many different consoles: "It uses various emulators that support the Libretro API, making it fairly easy to add new emulators." ^[2]

By 2017 the standard RL benchmark for pixel-based agents was the Arcade Learning Environment (ALE), exposing a few dozen Atari 2600 ROMs through a Gym-compatible interface. ALE had carried the field through DQN, A3C, Rainbow, and PPO, but the limits were obvious: an 18-action joystick space, hand-tuned rewards, many titles already mastered, and agents that overfit to the specific ROM they were trained on. Gym Retro raised the ceiling by jumping from roughly 70 Atari and 30 Sega titles to over 1,000 games across eight systems, giving researchers a much wider and more visually varied pool to test generalization on. ^[1] ^[5]

OpenAI's earlier attempt at scale, Universe (late 2016), exposed thousands of browser and Flash titles through VNC but ran in wall-clock time and depended on fragile screen scraping. Retro took a different bet: run emulators inside the Python process and read game state directly out of emulator RAM, which unlocked deterministic resets, save states, and faster-than-real-time training. The Retro Learning Environment, an earlier academic project for SNES and Genesis RL, used a similar trick; OpenAI credits it as inspiration but argues Gym Retro is more flexible because it abstracts over Libretro cores rather than baking in specific emulators. ^[1] ^[3] ^[4]

How does Gym Retro work? (architecture)

Gym Retro is a thin Python wrapper around emulator binaries plus a per-game data layer.

Libretro cores

Libretro is an emulator API that compiles each emulator into a single shared library called a core. A Libretro frontend (RetroArch is the best known) loads the core, sends it inputs, and pulls back video, audio, and memory. Retro acts as the frontend: each supported system ships its Libretro core inside the package, and adding a new console mostly means dropping in a new core. The Libretro API includes retro_get_memory_data and retro_get_memory_size, which Retro uses to read game variables for reward shaping. ^[3] ^[6]

Per-game data files

Each integrated game has four files plus at least one save state. As OpenAI describes it, "Each game integration has files listing memory locations for in-game variables, reward functions based on those variables, episode end conditions, savestates at the beginning of levels and a file containing hashes of ROMs that work with these files." ^[2]

File	Purpose
`data.json`	Maps named variables (lives, score, x position, ring count) to RAM addresses and types.
`scenario.json`	Defines the reward function and the done condition using variables from data.json.
`metadata.json`	Stores the default starting save state and other game-level settings.
`script.lua`	Optional Lua hooks for rewards or termination conditions that need logic beyond simple expressions.
`*.state`	Binary save states marking levels, checkpoints, or specific scenarios.

ROM hashes are stored in a rom.sha file, and most hashes match No-Intro SHA-1 sums. ROMs themselves are not shipped: users have to provide them, although a few non-commercial homebrew titles such as Airstriker-Genesis come bundled for testing. ^[2] ^[3]
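
As a rough illustration of how these files fit together, the sketch below decodes named variables from a RAM snapshot using a data.json-style table, then derives a reward and a done flag from them in the spirit of scenario.json. The addresses, the tiny fake RAM buffers, and the simplified rule format are all invented for the example; the real files support more types and operators than shown here.

```python
# Simplified stand-in for data.json: names mapped to RAM addresses and types.
# Addresses are hypothetical. Type strings are assumed to follow a
# NumPy-like convention (byte order, kind, size in bytes), integers only.
DATA = {
    "lives": {"address": 2, "type": "|u1"},
    "score": {"address": 4, "type": ">u4"},
}

# Simplified stand-in for scenario.json: the reward is the weighted change
# in a variable between steps; the episode ends when a variable hits zero.
SCENARIO = {
    "reward": {"score": 1.0},
    "done_when_zero": ["lives"],
}

BYTE_ORDER = {"<": "little", ">": "big", "|": "big"}  # "|" means single byte


def read_variable(ram: bytes, spec: dict) -> int:
    """Decode one integer variable from a RAM snapshot."""
    order, kind, size = spec["type"][0], spec["type"][1], int(spec["type"][2:])
    raw = ram[spec["address"]:spec["address"] + size]
    return int.from_bytes(raw, BYTE_ORDER[order], signed=(kind == "i"))


def read_all(ram: bytes) -> dict:
    """Decode every variable listed in DATA."""
    return {name: read_variable(ram, spec) for name, spec in DATA.items()}


def reward_and_done(prev: dict, curr: dict):
    """Apply the simplified scenario rules to two consecutive readings."""
    reward = sum(w * (curr[n] - prev[n]) for n, w in SCENARIO["reward"].items())
    done = any(curr[n] == 0 for n in SCENARIO["done_when_zero"])
    return reward, done


# Two fake 8-byte RAM snapshots: lives at offset 2, score at offsets 4..7.
ram_before = bytes([0, 0, 3, 0, 0, 0, 0x01, 0x2C])  # lives=3, score=300
ram_after = bytes([0, 0, 2, 0, 0, 0, 0x01, 0x90])   # lives=2, score=400

info_before, info_after = read_all(ram_before), read_all(ram_after)
reward, done = reward_and_done(info_before, info_after)
print(info_after, reward, done)
```

The decoded dictionary plays the role of the info value returned by step(), and the reward is a per-step delta rather than the raw score.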

Integration UI

The integration UI is a Qt desktop app that lets researchers step through a game frame by frame, watch RAM, and bookmark addresses as named variables. Lives, score, and progress counters get found by playing while sweeping memory for values that change in the expected way; the UI then exports them straight into a data.json for that ROM. ^[3]
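
The memory sweep described above can be pictured as repeated filtering of candidate addresses. The following is a minimal sketch of that idea, not the integration UI's actual search code: the function name, the "up"/"down"/"same" labels, and the fake RAM dumps are all hypothetical.

```python
def narrow_candidates(snapshots, expected_changes, candidates=None):
    """Keep addresses whose byte changed as expected between snapshots.

    snapshots: list of equal-length bytes objects, one per observation.
    expected_changes: "up", "down" or "same" for each consecutive pair.
    """
    checks = {
        "up": lambda a, b: b > a,
        "down": lambda a, b: b < a,
        "same": lambda a, b: b == a,
    }
    if candidates is None:
        candidates = range(len(snapshots[0]))
    survivors = list(candidates)
    pairs = zip(snapshots, snapshots[1:])
    for (before, after), change in zip(pairs, expected_changes):
        ok = checks[change]
        survivors = [a for a in survivors if ok(before[a], after[a])]
    return survivors


# Fake 6-byte RAM dumps taken while playing: the player loses a life,
# then nothing relevant happens, then the player loses another life.
snapshots = [
    bytes([7, 3, 10, 0, 5, 9]),
    bytes([8, 2, 10, 1, 4, 9]),
    bytes([9, 2, 11, 1, 3, 9]),
    bytes([9, 1, 12, 2, 2, 9]),
]
lives_candidates = narrow_candidates(snapshots, ["down", "same", "down"])
print(lives_candidates)  # [1]
```

Address 4 also decreases when a life is lost, but it keeps falling during the quiet interval, so the "same" observation rules it out. Each extra observation shrinks the candidate set in the same way.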

Python interface

The Python API mirrors Gym. After installation a typical session looks like:

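import retro

# Airstriker-Genesis is the bundled homebrew ROM, so no extra files are needed.
env = retro.make(game='Airstriker-Genesis')
obs = env.reset()
done = False
while not done:
    # Sample a random button combination and advance the emulator one step.
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
env.close()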

Observations are RGB frames. By default, actions are binary vectors with one entry per button exposed by the emulated gamepad (twelve for the Genesis core, for example), and info contains the data.json variables. A game can ship several save states; a specific one is selected by passing the state= argument to retro.make. ^[2]
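
To make the button-vector action format concrete, here is a small sketch that turns a set of pressed buttons into a 0/1 vector. The button order below is an assumption for the Genesis core, and to_action is a hypothetical helper; in a live session the environment's own button list would be authoritative.

```python
# Assumed button order for the Genesis core (hypothetical for this sketch).
GENESIS_BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN",
                   "LEFT", "RIGHT", "C", "Y", "X", "Z"]


def to_action(pressed, buttons=GENESIS_BUTTONS):
    """Return a 0/1 list with a 1 for every button that is held down."""
    unknown = set(pressed) - set(buttons)
    if unknown:
        raise ValueError(f"unknown buttons: {sorted(unknown)}")
    return [1 if b in pressed else 0 for b in buttons]


run_right_and_jump = to_action({"RIGHT", "B"})
print(run_right_and_jump)  # [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
```

A helper like this is also a convenient place to restrict an agent to a handful of meaningful button combinations instead of every possible vector.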

How does Gym Retro relate to OpenAI Gym?

Gym Retro is not a replacement for OpenAI Gym; it is a collection of environments that plug into it. Gym, first released by OpenAI in 2016, defines the standard RL contract: an environment exposes reset(), step(action), an observation_space, and an action_space. Gym Retro implements that contract for emulated console games, so any agent or library written against Gym (such as Stable Baselines or OpenAI Baselines) can train on a Sonic level with no special plumbing. The relationship mirrors how its successor, stable-retro, is built on the Farama Foundation's Gymnasium: "Like Gym Retro built upon OpenAI Gym, Stable Retro is built upon Farama Foundation Gymnasium." ^[2] ^[11]
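
The contract can be illustrated without any emulator. The sketch below uses a hypothetical ToyEnv in place of a real Retro environment and a small normalize_step helper (also hypothetical) that accepts both the classic Gym four-value step() result and the five-value result used by Gymnasium, so the same episode loop could in principle drive either.

```python
def normalize_step(result):
    """Return (obs, reward, done, info) from either step() convention."""
    if len(result) == 5:  # Gymnasium: obs, reward, terminated, truncated, info
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    obs, reward, done, info = result  # classic Gym
    return obs, reward, done, info


class ToyEnv:
    """Hypothetical stand-in that ends after three steps (classic Gym style)."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 3, {"t": self.t}


def run_episode(env, policy):
    """Play one episode and return the summed reward."""
    out = env.reset()
    # Gymnasium's reset() returns (obs, info); classic Gym returns obs only.
    obs = out[0] if isinstance(out, tuple) and len(out) == 2 else out
    total, done = 0.0, False
    while not done:
        obs, reward, done, info = normalize_step(env.step(policy(obs)))
        total += reward
    return total


total_reward = run_episode(ToyEnv(), policy=lambda obs: 0)
print(total_reward)  # 3.0
```

Because the loop only touches reset() and step(), swapping ToyEnv for an emulator-backed environment would not change the agent-side code.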

Which systems and games does Gym Retro support?

System	Emulator core	Notes
Atari 2600	Stella	Same hardware as ALE; Retro re-uses ALE-style ROMs.
NEC TurboGrafx-16 / PC Engine	Mednafen / Beetle PCE Fast	Preliminary support at full release.
Nintendo Game Boy / Game Boy Color	gambatte	Game Boy Color marked preliminary.
Nintendo Game Boy Advance	mGBA	Preliminary support.
Nintendo NES	FCEUmm	Standard 8-bit Nintendo.
Nintendo SNES	Snes9x	16-bit Nintendo.
Sega Game Gear	Genesis Plus GX	Preliminary.
Sega Genesis / Mega Drive	Genesis Plus GX	The flagship console for the Retro Contest.
Sega Master System	Genesis Plus GX	Same core covers Master System, Game Gear, and Genesis.

The full release covered eight emulated systems with about 1,000 ROMs integrated overall, and the Genesis catalog is the most thoroughly annotated because it was the focus of the contest. ^[2] ^[3]

How do you install Gym Retro?

Gym Retro shipped pre-built wheels for Windows 7/8/10, macOS 10.13 (High Sierra) and 10.14 (Mojave), and manylinux1 Linux. It supports Python 3.6, 3.7, and 3.8, and OpenAI recommended a CPU with SSSE3 or better. The simplest install is pip install gym-retro, which still works against the May 2020 0.8.0 wheels for compatible Python versions. Building from source requires CMake, a C++ compiler, and the cloned repository at github.com/openai/retro. ^[2] ^[7]

What is the Gotta Learn Fast paper?

OpenAI released a technical report titled "Gotta Learn Fast: A New Benchmark for Generalization in RL" by Alex Nichol, Vicki Pfau, Christopher Hesse, Oleg Klimov, and John Schulman, submitted to arXiv on April 10, 2018 and revised on April 23, 2018. The paper proposes a transfer-learning benchmark on the Sonic the Hedgehog Genesis trilogy: agents train on a large pool of training levels, then face a held-out set of test levels with a budget of one million timesteps per test level (about 18 hours of in-game time, given that each timestep spans four frames at 60 Hz). The report evaluates Rainbow DQN, PPO, and a hand-coded random-search baseline called JERK (short for 'Just Enough Retained Knowledge'), and shows that joint PPO, pre-trained on the training levels and then fine-tuned on each test level, roughly doubles the score of PPO trained from scratch on the test levels. The paper frames the benchmark as a complement to ALE rather than a replacement: ALE measures within-task performance, Sonic measures generalization. ^[5]
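
The 18-hour figure can be checked with a few lines of arithmetic. The calculation below assumes that each agent timestep advances the emulator by four frames (a frame skip of 4) and that the game runs at 60 frames per second; both values are stated here as assumptions of the sketch.

```python
TIMESTEPS = 1_000_000
FRAMES_PER_TIMESTEP = 4   # assumed frame skip
FRAMES_PER_SECOND = 60    # assumed emulated frame rate

hours = TIMESTEPS * FRAMES_PER_TIMESTEP / FRAMES_PER_SECOND / 3600
print(round(hours, 1))  # 18.5
```

Without the frame skip, the same budget would amount to under five hours of game time, so the per-timestep frame count matters when comparing budgets across benchmarks.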

What was the Retro Contest?

Between April 5 and June 5, 2018, OpenAI ran a public competition called the Retro Contest for the best agent on previously unseen levels of the Sonic the Hedgehog Genesis games. Participants received the training levels from the three Genesis titles, were free to use any data or compute at training time, and submitted Docker containers; at test time each container had a one million timestep budget per held-out level, roughly 18 hours of in-game play. ^[4] ^[8]

Format and participation

Submissions ran on OpenAI's evaluation infrastructure. Participants trained or scripted an agent on the public Sonic levels, wrapped it in a Docker image, and had it run against five leaderboard test levels (low-quality levels generated with a Sonic level editor) that fed the live leaderboard. Final standings were decided on a separate set of secret evaluation levels that competitors never saw. Leaderboard scores were averaged over levels, with each level capped at a normalized maximum of 10,000. OpenAI reported 923 registered teams, 229 of which submitted at least one solution; the evaluation cluster ran 4,448 evaluations over two months, roughly twenty per submitting team. Most entries started from the tuned PPO and Rainbow DQN baselines shipped with the contest. ^[8]
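
The scoring rule and the participation figures reduce to simple arithmetic. The sketch below assumes the cap is applied to each level before averaging; the per-level scores are invented for illustration and are not contest results.

```python
LEVEL_CAP = 10_000


def leaderboard_score(level_scores):
    """Average the per-level scores after capping each at LEVEL_CAP."""
    capped = [min(score, LEVEL_CAP) for score in level_scores]
    return sum(capped) / len(capped)


example_scores = [4200.0, 10500.0, 3100.0, 6000.0, 1200.0]  # invented values
score = leaderboard_score(example_scores)
print(score)  # 4900.0

# Evaluations per submitting team, from the figures quoted above.
evals_per_team = round(4448 / 229, 1)
print(evals_per_team)  # 19.4
```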

Results

Top scores came from tuning existing model-free RL algorithms, not new architectures:

Rank	Team	Approach	Notes
1	Dharmaraja	Joint PPO with modifications	Six-member team (Qing Da, Jing-Cheng Shi, Anxiang Zeng, Guangda Huzhang, Run-Ze Li, Yang Yu); used RGB rather than grayscale frames, a slightly augmented action space, and a reward bonus for visiting new states.
2	Mistake	Modified Rainbow DQN	Tuned the n in n-step Q-learning, added a CNN layer, and lowered the DQN target update interval; edged out aborg narrowly.
3	aborg	Joint PPO with extra training data	Solo entry from Alexandre Borghi; mixed in Sonic levels from Master System and Game Boy Advance games and used a different network.

The top final score was 4,692 against a theoretical maximum of 10,000, which OpenAI took as a sign the benchmark was hard but not saturated and that Sonic-style transfer remained an open problem. The 'low-quality' label applies to the five leaderboard levels, which were generated with a level editor rather than crafted by Sega; the final ranking used the separate secret set. ^[8]

How is Gym Retro used in research?

The Retro Contest got coverage in TechCrunch, The Register, and Wired, mostly framed as an OpenAI publicity move around Sonic. In academic circles Retro became a common base for transfer-learning and meta-RL work, often cited alongside ProcGen and the Obstacle Tower Challenge once those benchmarks appeared in 2019. The 16-bit games tested long-horizon credit assignment more cleanly than Atari, and the Sonic levels were the canonical transfer-learning benchmark thanks to Gotta Learn Fast. The platform shows up in work on world models, exploration bonuses, curriculum learning, and self-supervised learning with RL, and it remains a teaching staple in university RL courses. ^[1] ^[4] ^[5] ^[9]

Is Gym Retro still maintained? (stable-retro)

OpenAI moved Gym Retro into maintenance mode soon after release; the GitHub README has carried "Status: Maintenance (expect bug fixes and minor updates)" since 2018, and the last upstream release on PyPI was 0.8.0 on May 1, 2020. After that, the public RL gym ecosystem migrated away from OpenAI: Gym was forked into Gymnasium under the Farama Foundation, and Gym Retro followed the same path under the name stable-retro. ^[2] ^[10]

Stable-retro is led by Mathieu Poliquin and the Farama Foundation, and accepts pull requests for new games, emulator cores, and bug fixes that upstream no longer takes. The fork adds Sega Saturn, Sega CD, Sega 32X, Sega Dreamcast, Nintendo 64, Nintendo DS, and arcade machines while keeping the core Gym Retro API; Python support has been broadened to 3.7 through 3.12, and the Windows route runs through WSL2. The documentation lists more than 1,000 integrated games. For new RL projects on retro consoles, stable-retro is now the recommended starting point. ^[10] ^[11]

How does Gym Retro compare with related platforms?

Platform	Year	Scope	Reset model	Notes
Arcade Learning Environment	2013	~60 Atari 2600 games	Deterministic	The original RL pixel benchmark; small action space and short horizons.
Universe	2016	Browser, Flash, commercial titles	Real-time, screen-scraped	Discontinued; reliability problems with VNC and timing.
Retro Learning Environment	2016	SNES, Genesis	Deterministic	Academic precursor to Gym Retro.
Gym Retro	2018	~1,000 games across 8 retro consoles	Deterministic save states	Maintained by OpenAI until 2020.
stable-retro	2022 onward	Same as Retro plus Saturn, N64, DS, arcade	Deterministic save states	Active maintained fork by the Farama Foundation.
ProcGen	2019	16 procedurally generated game-like environments	Deterministic	Designed specifically for testing generalization, with no licensed ROMs.
MiniGrid	2018 onward	Gridworlds	Deterministic	Lightweight benchmark for instruction following and planning.

Gym Retro and ProcGen ended up filling complementary roles: ProcGen tests generalization across procedurally generated levels of each of its games, while Retro tests it across human-designed levels that share style and mechanics. Researchers often cite both. ^[4] ^[5]

Limitations

ROMs are not bundled, so reproducible benchmarks depend on every contributor sourcing ROMs with identical hashes. Reward functions are computed from RAM variables, which means each new game needs manual integration to find the right addresses. The 0.8.0 wheels target Python 3.6 to 3.8 on older operating systems; on modern macOS or Linux it is often easier to install stable-retro than to fight legacy build tooling. The Sonic benchmark has not been updated since 2018, and although Dharmaraja's 4,692 remains a useful reference, it predates diffusion policies, more recent world-model methods, and large-scale pre-training. ^[2] ^[10]
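
One way to guard against mismatched ROMs is to compare a SHA-1 digest of the file contents with a list of known hashes before running an experiment. The sketch below is illustrative only: the helper names are hypothetical, the "ROM" is a placeholder byte string, and the hash list is assumed to hold one hexadecimal digest per line.

```python
import hashlib


def rom_sha1(data: bytes) -> str:
    """Return the lowercase hexadecimal SHA-1 digest of a ROM image."""
    return hashlib.sha1(data).hexdigest()


def is_known_rom(data: bytes, known_hashes) -> bool:
    """Check the digest against hash lines, ignoring case and whitespace."""
    normalized = {line.strip().lower() for line in known_hashes}
    return rom_sha1(data) in normalized


fake_rom = b"not a real ROM image"            # placeholder bytes
known = [rom_sha1(fake_rom).upper() + "\n"]   # pretend line from a hash file
print(is_known_rom(fake_rom, known), is_known_rom(b"modified", known))
```

Recording the digest alongside experiment results makes it easier to tell whether two reported scores were obtained on the same ROM revision.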

References

1. OpenAI. "Gym Retro." OpenAI Blog, May 25, 2018. https://openai.com/index/gym-retro/
2. OpenAI. "openai/retro" GitHub repository README. https://github.com/openai/retro
3. Gym Retro Documentation. Read the Docs. https://retro.readthedocs.io/en/latest/
4. OpenAI. "Retro Contest." OpenAI Blog, April 5, 2018. https://openai.com/index/retro-contest/
5. Nichol, A., Pfau, V., Hesse, C., Klimov, O., and Schulman, J. "Gotta Learn Fast: A New Benchmark for Generalization in RL." arXiv:1804.03720, April 10, 2018. https://arxiv.org/abs/1804.03720
6. Libretro. "API." libretro.com. https://www.libretro.com/index.php/api/
7. PyPI. "gym-retro 0.8.0." Python Package Index. https://pypi.org/project/gym-retro/
8. OpenAI. "Retro Contest: Results." OpenAI Blog, June 27, 2018. https://openai.com/index/first-retro-contest-retrospective/
9. Lardinois, F. "Machine Learning Zone: OpenAI competition takes on Sonic the Hedgehog." TechCrunch, April 5, 2018. https://techcrunch.com/2018/04/05/machine-learning-zone-openai-competition-takes-on-sonic-the-hedgehog/
10. Farama Foundation. "stable-retro." GitHub repository. https://github.com/Farama-Foundation/stable-retro
11. Stable-Retro Documentation. Farama Foundation. https://stable-retro.farama.org/
