Universe
Last reviewed
Sources
18 citations
Review status
Source-backed
Revision
v3 · 2,899 words
Universe was an open-source software platform that OpenAI released on December 5, 2016 for measuring and training an artificial intelligence agent's general intelligence across, in OpenAI's words, "the world's supply of games, websites and other applications." [1] [7] Its initial release shipped with over 1,000 environments in which an agent could take actions and gather observations, and it let any program become an OpenAI Gym environment without access to that program's source code or internal APIs. [1] [7] OpenAI archived the project on April 6, 2018 and pointed users to Gym Retro instead, but Universe is widely regarded as an early ancestor of today's browser-using and "computer use" AI agents. [9] [11]
Universe is middleware built on OpenAI's Gym, a toolkit for developing and evaluating reinforcement learning (RL) algorithms. [2] [3] Games and websites serve as the agent's training environments. In principle, any task a person can perform on a computer could be used for training: researchers can plug an application into Universe so that AI agents have a common way of interacting with it. [2] [4]
The software environments are instantiated in Docker containers, with AI agents interacting through a virtual keyboard and mouse using a Virtual Network Computing (VNC) remote desktop. The more interaction the agents have with the environment, the better they become at a specific task. [3]
Universe lets a user train and evaluate AI agents, with the agent using a computer the way a human would. This provides a wide range of real-time and complex environments. [1] The platform turns any program into an OpenAI Gym environment without needing special access to the program's internals, source code, or APIs. According to OpenAI's GitHub README, Universe "does this by packaging the program into a Docker container, and presenting the AI with the same interface a human uses: sending keyboard and mouse events, and receiving screen pixels." [7] [8]
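To make the human-style interface concrete, here is a minimal sketch of a vectorized, Gym-style step loop in which actions are lists of keyboard and mouse events and observations are raw pixel frames. Everything here is hypothetical: `ToyScreenEnv` is a stand-in written for illustration, not Universe's `VNCEnv`. The event tuples are loosely modeled on the `('KeyEvent', key, down)` style used in the project's examples and on the KeyEvent and PointerEvent messages of the VNC (RFB) protocol; treat the exact format as an assumption.

```python
from typing import List, Tuple, Union

# Hypothetical event shapes, loosely modeled on VNC's KeyEvent/PointerEvent.
KeyEvent = Tuple[str, str, bool]          # ('KeyEvent', key_name, is_down)
PointerEvent = Tuple[str, int, int, int]  # ('PointerEvent', x, y, button_mask)
Event = Union[KeyEvent, PointerEvent]

WIDTH, HEIGHT = 8, 4  # a tiny "screen" so the example stays readable


class ToyScreenEnv:
    """Hypothetical stand-in for a remote desktop: a cursor on a pixel grid.

    The agent never sees internal state; it only receives a frame of pixels
    (0 = background, 255 = cursor) and sends keyboard/mouse events.
    """

    def __init__(self, n_remotes: int = 1) -> None:
        self.n = n_remotes
        self.cursors = [(0, 0)] * n_remotes

    def _render(self, cursor: Tuple[int, int]) -> List[List[int]]:
        x, y = cursor
        return [[255 if (cx, cy) == (x, y) else 0 for cx in range(WIDTH)]
                for cy in range(HEIGHT)]

    def reset(self) -> List[List[List[int]]]:
        self.cursors = [(0, 0)] * self.n
        return [self._render(c) for c in self.cursors]

    def step(self, action_n: List[List[Event]]):
        """Take one list of events per remote, mirroring Gym's vectorized style."""
        obs_n, reward_n, done_n = [], [], []
        for i, events in enumerate(action_n):
            x, y = self.cursors[i]
            for ev in events:
                if ev[0] == 'KeyEvent' and ev[2]:      # key pressed down
                    if ev[1] == 'ArrowRight':
                        x = min(x + 1, WIDTH - 1)
                    elif ev[1] == 'ArrowLeft':
                        x = max(x - 1, 0)
                elif ev[0] == 'PointerEvent':          # mouse moved or clicked
                    x = min(max(ev[1], 0), WIDTH - 1)
                    y = min(max(ev[2], 0), HEIGHT - 1)
            self.cursors[i] = (x, y)
            obs_n.append(self._render((x, y)))
            # Toy reward: +1 for reaching the right edge of the screen.
            reached = x == WIDTH - 1
            reward_n.append(1.0 if reached else 0.0)
            done_n.append(reached)
        return obs_n, reward_n, done_n, {'n': self.n}


env = ToyScreenEnv(n_remotes=2)
obs_n = env.reset()
action_n = [[('KeyEvent', 'ArrowRight', True)],   # remote 0: press ArrowRight
            [('PointerEvent', 7, 2, 1)]]          # remote 1: click at (7, 2)
obs_n, reward_n, done_n, info = env.step(action_n)
print(reward_n, done_n)  # [0.0, 1.0] [False, True]
```

The agent code only ever touches event tuples and pixel grids, which is the property that lets the same agent loop drive any program behind the screen.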
The AI agent explores environments visually, observing pixels on a screen and inputting keyboard and mouse commands. [1] [2] This interface is implemented using the VNC program for remote desktop access. [2] Internally a Universe session has two halves: a Python client (a VNCEnv instance) running inside the agent's process, and a remote (a Docker container running the actual environment dynamics). The two communicate over VNC for pixels and keyboard/mouse events, and over a separate WebSocket channel for rewards, episode boundaries, and diagnostics. [7] [9]
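The two-channel split between client and remote can be illustrated by how a remote is addressed. As we recall from the archived documentation, a remote was written in a form like `vnc://localhost:5900+15900`, pairing a VNC port with a rewarder port on the same host; treat that format as an assumption. `RemoteAddress`, `parse_remote`, and the comma-separated `parse_remotes` helper below are hypothetical and not part of the Universe package.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass(frozen=True)
class RemoteAddress:
    """Hypothetical description of one remote: one host, two channels."""
    host: str
    vnc_port: int       # pixels out, keyboard/mouse events in
    rewarder_port: int  # rewards, episode boundaries, diagnostics

    def vnc_url(self) -> str:
        return "vnc://%s:%d" % (self.host, self.vnc_port)

    def rewarder_url(self) -> str:
        # The rewarder channel is a WebSocket, so a ws:// URL is used here.
        return "ws://%s:%d" % (self.host, self.rewarder_port)


def parse_remote(spec: str) -> RemoteAddress:
    """Parse an address like 'vnc://localhost:5900+15900' (assumed format)."""
    parsed = urlparse(spec)
    if parsed.scheme != "vnc":
        raise ValueError("expected a vnc:// address, got %r" % spec)
    host, sep, ports = parsed.netloc.rpartition(":")
    if not sep or "+" not in ports:
        raise ValueError("expected host:vncport+rewarderport, got %r" % spec)
    vnc, rewarder = ports.split("+", 1)
    return RemoteAddress(host, int(vnc), int(rewarder))


def parse_remotes(specs: str) -> List[RemoteAddress]:
    """Hypothetical: several comma-separated remotes, one per parallel environment."""
    return [parse_remote(s.strip()) for s in specs.split(",") if s.strip()]
```

Keeping both ports in one record reflects the design described above: pixels and input events travel over VNC, while everything the agent needs for learning (reward, done, diagnostics) travels over a separate channel that the remote desktop protocol has no room for.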
Games provide the feedback loop an agent needs to keep improving: it gathers experience on small tasks and uses it to solve new ones faster. [2] Ideally, the agent would move beyond specialized knowledge of any single environment toward more general intelligence. [7] [8] Reward functions are integral to RL. Many games display an on-screen score that can serve as a reward, and OpenAI shipped a convolutional neural network (CNN) OCR model, running inside the Docker container, that read those scores from the pixel buffer and relayed them to the agent as rewards. [1] [8]
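Turning an on-screen score into a reward reduces to differencing successive score readings. The sketch below assumes an OCR step has already produced one reading per frame, with `None` where the digits could not be read; it stands in for, and does not reproduce, OpenAI's CNN-based reader. The `rewards_from_scores` function and its `max_jump` sanity bound are hypothetical, and holding the last trusted reading (rather than treating an unreadable frame as a score of zero) is our own design choice.

```python
from typing import List, Optional


def rewards_from_scores(readings: List[Optional[int]],
                        max_jump: Optional[int] = None) -> List[float]:
    """Convert per-frame score readings into per-step rewards.

    readings: hypothetical OCR output, one per frame; None = unreadable.
    max_jump: optional sanity bound; larger changes are treated as misreads.
    Returns one reward per transition (len(readings) - 1 values).
    """
    rewards: List[float] = []
    last: Optional[int] = readings[0] if readings else None
    for current in readings[1:]:
        if current is None or last is None:
            rewards.append(0.0)          # no trustworthy delta this step
            if current is not None:
                last = current           # first good reading: start tracking
            continue
        delta = current - last
        if max_jump is not None and abs(delta) > max_jump:
            rewards.append(0.0)          # likely OCR error; keep old baseline
            continue
        rewards.append(float(delta))
        last = current
    return rewards
```

For readings `[0, 10, None, 10, 30, 9999, 40]` with `max_jump=100`, this returns `[10.0, 0.0, 0.0, 20.0, 0.0, 10.0]`: the unreadable frame and the implausible 9999 both yield zero instead of a spurious spike, which hints at why screen-based rewards were described as unreliable.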
Besides the game environments, Universe includes browser-based navigation where the agent can interact with the web like people do, learning how to use elements like buttons, lists, and sliders. [1] OpenAI developed a benchmark called Mini World of Bits to understand the challenges of browser interactions in a simplified setting. It consists of 80 environments that range from simple tasks like clicking a button to difficult ones like replying to a contact in a simulated email client; OpenAI believed "that mastering these environments provides valuable signal towards models and training techniques that will perform well on full websites and more complex tasks." [1] [5]
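A Mini World of Bits task can be pictured as a page layout, a natural-language instruction, and a reward for acting on the right element. The sketch below models the simplest case mentioned above, clicking a button. The `Element` and `ClickButtonTask` names, the instruction wording, and the +1/-1 reward convention are illustrative assumptions, not the benchmark's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """A hypothetical on-screen widget with a pixel bounding box."""
    label: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


@dataclass
class ClickButtonTask:
    """Toy 'click the button' task: the agent sees pixels plus an instruction."""
    elements: List[Element]
    target: str

    @property
    def instruction(self) -> str:
        return 'Click on the "%s" button.' % self.target

    def hit(self, px: int, py: int) -> Optional[Element]:
        # Later elements are treated as drawn on top, so search in reverse.
        for el in reversed(self.elements):
            if el.contains(px, py):
                return el
        return None

    def reward(self, px: int, py: int) -> float:
        """+1 for clicking the target, -1 for anything else (assumed scheme)."""
        el = self.hit(px, py)
        return 1.0 if el is not None and el.label == self.target else -1.0
```

The agent only sees the rendered page and the instruction string; the element list exists on the environment side so the task can score a click, which is what keeps the interface identical to the pixel-and-mouse setup used for games.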
The Universe Python client was supported on Linux and macOS, with Python 2.7 and Python 3.5 as the supported interpreters. The build chain depended on Go 1.5 or newer, NumPy, libjpeg-turbo, and a working Docker installation. There was no official Windows support; Windows users were directed to run the client through a Linux virtual machine. [7] [9]
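The requirements above amount to a short checklist. The function below encodes it as a hypothetical preflight check over facts that have already been collected (platform name, interpreter version, Go version, and whether Docker, NumPy, and libjpeg-turbo are present); the `preflight` function and its inputs are illustrative, and a real script would have to gather those facts itself.

```python
from typing import List, Tuple

SUPPORTED_PLATFORMS = {"linux", "darwin"}   # Linux and macOS
SUPPORTED_PYTHONS = {(2, 7), (3, 5)}        # the two supported interpreters
MIN_GO = (1, 5)                             # "Go 1.5 or newer"


def preflight(platform: str, python: Tuple[int, int],
              go: Tuple[int, int], has_docker: bool,
              has_numpy: bool, has_libjpeg_turbo: bool) -> List[str]:
    """Return human-readable problems; an empty list means 'looks OK'."""
    problems = []
    if platform not in SUPPORTED_PLATFORMS:
        hint = " (run the client inside a Linux VM)" if platform == "win32" else ""
        problems.append("unsupported platform %r%s" % (platform, hint))
    if python not in SUPPORTED_PYTHONS:
        problems.append("Python %d.%d is not 2.7 or 3.5" % python)
    if go < MIN_GO:
        problems.append("Go %d.%d is older than 1.5" % go)
    for ok, name in ((has_docker, "Docker"), (has_numpy, "NumPy"),
                     (has_libjpeg_turbo, "libjpeg-turbo")):
        if not ok:
            problems.append("%s is missing" % name)
    return problems
```

Versions are compared as integer tuples rather than strings, so Go 1.10 correctly counts as newer than 1.5.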
Universe has been compared to ImageNet, the hand-labeled image database used to benchmark image recognition systems; where ImageNet supplies images, Universe supplies flash games, web browsers, photo editors, and CAD software. [5] On release the platform shipped with about 2,500 Atari games, 1,000 flash games, and 80 browser environments, described by OpenAI as the largest single library of reinforcement learning environments at the time. [1] [5] Of the 1,000 flash games, only about 100 came with built-in reward functions at launch. [1]
For scale, the table below compares Universe's launch library with the dominant RL benchmark that preceded it, the Arcade Learning Environment.
| Environment family | Approximate count at launch | Notes |
|---|---|---|
| Atari games | ~2,500 | Drawn from the Atari lineage that the Arcade Learning Environment popularized [1] [5] |
| Flash games | 1,000 (about 100 with rewards) | Distributed inside a Docker image; chosen as a starting point for scaling [1] |
| Browser environments | 80 | The Mini World of Bits benchmark of web tasks [1] [5] |
| Arcade Learning Environment (prior art) | 55 | The largest comparable RL resource before Universe [2] [3] |
According to OpenAI, flash games were a starting point for scaling because they are pervasive on the internet and usually have better graphics than Atari titles while remaining simple enough for early agents. [1] OpenAI also noted that when the environments ran asynchronously inside the Docker image and were reached over a local network in the cloud, games usually ran at 60 frames per second; over the public internet the rate dropped to about 20 frames per second. [1]
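Because these environments could not run faster than real time, the quoted frame rates translate directly into a per-step time budget and an upper bound on experience per wall-clock hour. The snippet below does that arithmetic; `frame_budget` is a hypothetical helper, and it assumes one agent step per rendered frame, which is a simplification.

```python
def frame_budget(fps: float, hours: float = 1.0) -> tuple:
    """Return (milliseconds per frame, frames available in `hours` of wall-clock time).

    Assumes the environment is locked to real time and the agent acts once per frame.
    """
    ms_per_frame = 1000.0 / fps
    frames = int(fps * 3600 * hours)
    return ms_per_frame, frames


for label, fps in (("local cloud network", 60), ("public internet", 20)):
    ms, frames = frame_budget(fps)
    print("%-20s %5.1f ms/frame %8d frames/hour" % (label, ms, frames))
```

At 60 fps the agent has about 16.7 ms per step and at most 216,000 frames per hour; at 20 fps that becomes 50 ms and 72,000 frames, a threefold loss of experience that no amount of extra compute on the agent side can recover.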
According to OpenAI, the goal of the project was to "develop a single AI agent that can flexibly apply its past experience on Universe environments to quickly master unfamiliar, difficult environments, which would be a major step towards general intelligence." OpenAI co-founder Ilya Sutskever said "an AI should be able to solve any problem you throw at it." [1] [4]
By expanding the number of training resources, OpenAI expected the education of AI agents to accelerate. Before Universe, the largest reinforcement learning resource of comparable design was the Arcade Learning Environment, which included 55 Atari games. [2] [3] Universe set out to push that number into the thousands by absorbing whole categories of software previously considered too messy for RL: web browsers, photo editors, CAD tools, and games with no programmatic interface.
On release, Universe shipped with the largest library of games and resources ever assembled for RL, including 1,000 flash games distributed in a Docker image, games such as Slither.io and StarCraft, browser-based tasks such as form filling, and applications such as Foldit. [1] [7] As a worked example, OpenAI trained an agent with RL on Slither.io, a game in which the player must avoid colliding with other snakes. After about six days of training, the agent scored "an average of 1,000 points, with a high score of 9,300 points. As a point of comparison, OpenAI machine-learning researcher Rafal Jozefowicz, with five hours of playing experience, averaged about 1,400 points, with a high score of 7,050." [3]
Universe was unveiled on December 5, 2016, the same week as the Conference on Neural Information Processing Systems (NeurIPS, then known as NIPS) in Barcelona. [1] [2] [10] OpenAI had launched Gym in April of the same year as a more limited RL toolkit, and Universe was positioned as the next step: a way to wrap arbitrary off-the-shelf software into Gym-compatible environments without requiring source code or internal APIs. [1] [10] The rollout was tied to the NIPS schedule, with the project discussed by co-founders Greg Brockman and Ilya Sutskever alongside the company's other reinforcement learning work. [10]
Progress slowed during 2017. In April 2018, OpenAI released a follow-up project called Gym Retro, which brought emulated Sega Genesis and other classic console games into the Gym interface by reading game state directly from emulator memory rather than from pixels streamed over VNC. The same month, the Universe repository was archived on GitHub, with the README updated to recommend Retro for new work. [9] [11] In retrospective notes about Retro, OpenAI acknowledged that it could not get good results from Universe because its environments "ran asynchronously, could only run in real time, and were often unreliable due to screen-based detection of game state." [11]
The table below sketches the project's main milestones.
| Date | Event |
|---|---|
| April 2016 | OpenAI releases Gym, the toolkit Universe later builds on. [10] |
| December 5, 2016 | Universe is announced on the OpenAI blog with over 1,000 environments at launch. [1] [2] |
| December 2016 | Universe is presented during NIPS 2016 in Barcelona. [10] |
| 2017 | OpenAI and Stanford researchers publish "World of Bits" at ICML 2017, expanding the browser benchmark seeded with Universe. [12] |
| April 6, 2018 | The openai/universe GitHub repository is archived; users are redirected to Gym Retro. [9] [11] |
During the implementation of this platform, OpenAI emphasized four design properties:
A notable feature of the launch was the list of game and software publishers that granted OpenAI permission for Universe agents to play their commercial titles. The headline partners were EA (Electronic Arts), Microsoft Studios, Valve, and Wolfram Research, with smaller indie studios contributing additional titles. [1] [13] [14] The table below lists representative titles named in the launch announcement and contemporary press coverage.
| Software | Publisher / origin | Type |
|---|---|---|
| Portal | Valve | First-person puzzle |
| Wing Commander III | EA | Space combat |
| Command & Conquer: Red Alert 2 | EA | Real-time strategy |
| Sid Meier's Alpha Centauri | EA | Turn-based strategy |
| Magic Carpet | EA (Bullfrog) | First-person shooter |
| Mirror's Edge | EA | First-person platformer |
| Syndicate (1993) | EA (Bullfrog) | Real-time tactics |
| Fable Anniversary | Microsoft Studios | Action role-playing |
| World of Goo | 2D Boy | Physics puzzle |
| RimWorld | Ludeon Studios | Colony simulation |
| Slime Rancher | Monomi Park | Life simulation |
| Shovel Knight | Yacht Club Games | 2D platformer |
| SpaceChem | Zachtronics | Puzzle |
| Wolfram Mathematica | Wolfram Research | Computer algebra |
| Slither.io | Steve Howse | Browser game |
| StarCraft | Blizzard Entertainment | Real-time strategy |
| Grand Theft Auto V | Rockstar Games | Open world (community integration) |
| Foldit | University of Washington | Protein folding game |
Grand Theft Auto V appeared throughout press coverage, although the actual integration was developed in parallel by Craig Quiter with NVIDIA and was not part of the initial release. [1] [9] [13] OpenAI also discussed plans to connect Universe with Microsoft Research's Project Malmo, a Minecraft-based AI sandbox, although that crossover did not become a maintained integration. [13]
Coverage of the launch was generally enthusiastic, treating Universe as a step beyond Atari benchmarks toward a more realistic test bed for general intelligence. The Register described it as a "universal training ground for computers," PCWorld and ITPro framed it as a way to teach AI to use software the way humans do, and SD Times noted that the platform was a clear continuation of Gym in a more open-ended direction. [2] [13] [14] [15] Michael Bowling of the University of Alberta, who had worked on the Arcade Learning Environment, told Futurism that the breadth of Universe was useful as long as researchers remembered that games are a means rather than an end. [16]
The enthusiasm did not last. By mid-2017, GitHub issues on the Universe repository were noting that pull requests were sitting unreviewed, that several integrations were broken on recent Docker releases, and that the rate of new content had slowed sharply. [9] In April 2018 OpenAI shipped Gym Retro and ran the Retro Contest, a transfer learning competition centered on Sonic the Hedgehog. The Retro launch posts argued that VNC-based remote desktops were not a good substrate for reinforcement learning: the agent could not run faster than wall-clock time, the screen-scraped state was noisy, and the integration overhead of every new title was high. [11]
The project has since been abandoned by OpenAI in favor of Gym Retro. The public GitHub repository was archived on April 6, 2018, with a deprecation notice stating that the "repository has been deprecated in favor of the Retro library," and an issue titled "This project is ABANDONED" confirmed that maintenance had stopped. [6] [9] Several upcoming developments described in the Universe launch blog post were never released, including environment integration tools so any user could contribute new integrations, and the public release of human demonstration data. [6] [9]
The core technical reasons were architectural. Because Universe drove real software over VNC rather than emulating it, every environment ran in real time and could not be sped up, the agent's view of game state came from noisy screen pixels rather than memory, and the overhead of integrating each new title was high. [11] Gym Retro addressed these by reading game state directly from emulator memory, which let environments run faster than real time and report exact rewards. [11]
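The contrast with screen scraping is easiest to see in code. Below, a score is decoded exactly from a snapshot of emulator RAM using a small variable table. The table is loosely modeled on our recollection of Gym Retro's per-game integration data, in which a variable is named by an address and a type string such as `>u4`; the offsets, the `VARIABLES` table, and the `read_variable` and `step_reward` helpers are all hypothetical, and nothing here calls the real `retro` package.

```python
import struct
from typing import Dict, Tuple

# Hypothetical variable table: name -> (offset into RAM, type string).
# '>' = big-endian, '<' = little-endian, '|' = single byte; u = unsigned.
VARIABLES: Dict[str, Tuple[int, str]] = {
    "score": (0x10, ">u4"),
    "lives": (0x20, "|u1"),
}

_STRUCT_CODES = {"u1": "B", "u2": "H", "u4": "I"}


def read_variable(ram: bytes, name: str) -> int:
    """Decode one named variable exactly from a RAM snapshot."""
    offset, type_str = VARIABLES[name]
    order, width = type_str[0], type_str[1:]
    fmt = ("<" if order == "<" else ">") + _STRUCT_CODES[width]
    (value,) = struct.unpack_from(fmt, ram, offset)
    return value


def step_reward(prev_ram: bytes, ram: bytes) -> int:
    """Reward = exact change in score between two consecutive RAM snapshots."""
    return read_variable(ram, "score") - read_variable(prev_ram, "score")
```

Because the value is read rather than inferred from pixels, there is no misread to filter out, and since nothing waits on a rendered screen the emulator can be stepped as fast as the CPU allows.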
Even though the Universe codebase itself was abandoned, the ideas it tested have had a long afterlife. The Mini World of Bits subset became the seed for the World of Bits paper presented at ICML 2017 (pages 3135-3144) by Tianlin Shi, Andrej Karpathy, Linxi Fan, Jonathan Hernandez, and Percy Liang. The paper introduced an open-domain platform for web-based agents, with crowdworkers writing natural language tasks and demonstrations on real websites, and HTTP traffic cached so the tasks could be replayed offline. [12] Announcing the work, Karpathy described it as a "Mini World of Bits project (agents learn to use the web) at OpenAI and how to use it with Universe." [18] That benchmark was later cleaned up and extended by Stanford researchers as MiniWoB++, with more than 100 web interaction tasks, and by 2022 had become a standard reference for browser-based LLM agents. [12] [17]
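The record-and-replay idea behind caching HTTP traffic can be sketched in a few lines: while recording, each response is stored under a key derived from the request; while replaying, the same request is answered from the cache, so an episode sees identical pages offline. The `ReplayCache` class and its keying scheme (method, URL, and a hash of the request body) are illustrative assumptions, not the World of Bits implementation.

```python
import hashlib
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]


class ReplayCache:
    """Hypothetical record/replay layer between an agent's browser and the web."""

    def __init__(self) -> None:
        self._store: Dict[Key, bytes] = {}
        self.recording = True

    @staticmethod
    def key(method: str, url: str, body: bytes = b"") -> Key:
        return (method.upper(), url, hashlib.sha256(body).hexdigest())

    def fetch(self, method: str, url: str, body: bytes,
              live_fetch: Callable[[str, str, bytes], bytes]) -> bytes:
        k = self.key(method, url, body)
        if self.recording:
            response = live_fetch(method, url, body)   # hit the real site once
            self._store[k] = response
            return response
        if k not in self._store:
            # Offline replay cannot invent pages the recording never saw.
            raise KeyError("request %s %s was not recorded" % (method, url))
        return self._store[k]
```

Failing loudly on an unrecorded request, rather than falling back to the live site, keeps replayed episodes deterministic, which is the point of caching the traffic in the first place.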
More recent web-agent benchmarks acknowledge Universe and MiniWoB as predecessors. WebArena, released in 2023, builds a self-hosted set of realistic websites for agents to navigate, citing Mini World of Bits as a foundational simplified benchmark. The same lineage runs through Mind2Web, VisualWebArena, and various "computer use" agents built on large language models, all of which can be read as successors to the basic Universe idea: give an agent pixels and a keyboard, then ask it to operate real software.
Universe also influenced OpenAI's own internal direction. The lessons about asynchronous execution and pixel-based state estimation pushed the company toward emulator-backed environments in Gym Retro and simulators with controlled tick rates in projects like the OpenAI Five Dota 2 effort. The goal of an agent that can flexibly apply prior experience to unfamiliar software has since become more reachable through large multimodal models rather than the from-scratch reinforcement learning Universe was designed for, and recent agent products for browsing and computer use trace a clear genealogy back to this 2016 platform.