# Simulation (in AI and robotics)

> Source: https://aiwiki.ai/wiki/simulation
> Updated: 2026-06-24
> Categories: Reinforcement Learning, Robotics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Simulation** in [artificial intelligence](/wiki/artificial_intelligence) and [robotics](/wiki/robotics) is the use of computational physics, rendering, and procedural environments to recreate a synthetic version of a physical or virtual world inside which AI agents can perceive, act, learn, and be evaluated. It is the dominant way modern robots and [reinforcement learning](/wiki/reinforcement_learning) agents are trained: virtual robots run inside simulators for the equivalent of decades of experience, then the resulting policies are deployed on real hardware. The scale advantage is stark. A real robot collecting demonstrations produces a few hundred trajectories an hour, while a GPU-parallel simulator such as the Genesis engine reaches roughly 43 million frames per second for a Franka manipulation scene on a single RTX 4090, about 430,000 times faster than real time. [9] Without simulation, the field of reinforcement learning as we know it would not exist, and most contemporary work on humanoid robots, autonomous vehicles, dexterous manipulation, and embodied agents would be either far slower or simply impossible.

The term means something specific here. In statistics and physics, "simulation" can mean Monte Carlo sampling or finite-element solvers run for engineering. In the AI and robotics context the focus is narrower: physics engines and 3D environments wired into machine learning pipelines, with goals like data collection, [domain randomization](/wiki/domain_randomization), [sim-to-real](/wiki/sim_to_real) transfer, and the training of generalist policies. The frontier has expanded recently to include generative models that learn the simulator itself from video, blurring the line between traditional rigid-body engines and neural [world models](/wiki/world_model).

## Why does simulation matter for AI and robotics?

Real robots are expensive, slow, and fragile. A single Boston Dynamics Spot or a Unitree H1 humanoid costs tens of thousands of dollars; replacement parts can take weeks; and a fall during a learning episode might end an experiment for a day. Simulation sidesteps almost all of those constraints.

A simulator gives researchers cheap, parallelizable, safe, and resettable data. You can spin up thousands of robots on a single GPU, run them at hundreds or thousands of times real-time speed, and reset to any prior state on demand. Bad policies break virtual robots without consequences. Curricula are easy to design because you can vary anything: gravity, friction, lighting, the mass of an object, the geometry of a kitchen. None of that is feasible in the physical world.

There are several distinct reasons the field leans on simulation:

- **Cheap data collection.** A real robot collecting demonstrations might generate a few hundred trajectories an hour. A GPU-parallel simulator like [Isaac Lab](/wiki/isaac_lab) or Brax can generate millions per hour from thousands of parallel environments.
- **Safe exploration.** RL agents need to try bad actions to learn. A walking robot has to fall over many times before it stops falling. Letting that happen in simulation avoids hardware damage and human risk.
- **Counterfactuals.** Simulators let you ask what-if questions. What if the door were heavier, the floor more slippery, the camera placed 5 cm to the left? In the physical world, comparing those conditions cleanly is difficult; in simulation it is a parameter sweep.
- **Reproducibility.** Real-world experiments are notoriously hard to reproduce because lighting, calibration, and wear vary. A simulation run can be repeated from the same seed, software version, and hardware, so different research groups can compare methods fairly.
- **Coverage.** Edge cases that are rare in reality (a child running into the road, a robot dropping a glass on a tile floor) can be sampled at will. Self-driving programs depend on this kind of synthetic edge-case generation, since the truly dangerous scenarios are not common enough to learn from on real roads alone.
- **Speed.** A modern GPU simulator runs tens of thousands of rigid-body steps per second on commodity hardware, and millions of steps per second when batched. That changes what algorithms are practical: PPO with billions of steps was a research curiosity in 2017 and routine by 2022.

All of this comes with a fundamental, unsolved tradeoff: a simulator is a model, and models are wrong. Bridging the gap to the real world is the central engineering problem of the field, addressed mainly through domain randomization, system identification, and domain adaptation.

## What are the major physics engines for AI and robotics?

The physics engine is the core of any simulator. It computes how bodies move, collide, deform, and interact with actuators and sensors. The engines listed below are the ones most commonly cited in robotics and RL research; each has different tradeoffs in accuracy, speed, parallelism, and ergonomics.

| Engine | Origin | License | Strengths | Common use |
| --- | --- | --- | --- | --- |
| [MuJoCo](/wiki/mujoco) | Emo Todorov, 2012; DeepMind acquired 2021, open-sourced 2022 | Apache 2.0 | Fast, accurate contact-rich rigid-body dynamics; gradients through JAX autodiff in MJX | Continuous-control RL benchmarks; humanoid locomotion; manipulation |
| Bullet / [PyBullet](/wiki/pybullet) | Erwin Coumans (Sony, AMD, Google, NVIDIA), early 2000s | zlib | Mature collision detection; Python-friendly; URDF support | Hobbyist robotics, classic OpenAI Gym tasks |
| [Gazebo](/wiki/gazebo) | USC 2002, then Open Robotics | Apache 2.0 | Tight ROS integration; sensor models; large robotics community | ROS-based simulation, system integration testing |
| Isaac Sim and [Isaac Lab](/wiki/isaac_lab) | NVIDIA, built on [Omniverse](/wiki/omniverse) and PhysX | Open under EULA | Photoreal rendering; GPU parallel; OpenUSD scene format | Industrial robotics, humanoids, large-scale RL |
| Brax | Google, 2021 | Apache 2.0 | Fully differentiable; written in JAX; massive parallelism on TPU/GPU | Differentiable RL, learned dynamics, JAX pipelines |
| Drake | MIT (Russ Tedrake) and TRI, since 2005 | BSD-3 | Rigorous multibody dynamics; strong contact mechanics; optimization tooling | High-fidelity research, control theory, manipulation |
| Genesis | Genesis Embodied AI consortium, December 2024 | Apache 2.0 | Multi-physics (rigid, soft, fluids, MPM); Python; very fast | Generative robotics workflows, embodied AI research |
| Webots | Cyberbotics (originally EPFL, 1996); open-sourced December 2018 | Apache 2.0 | Educational use, large robot library, scripted scenarios | Teaching, RoboCup, prototyping |

### MuJoCo

[MuJoCo](/wiki/mujoco) (Multi-Joint dynamics with Contact) was published by Emanuel Todorov in 2012 and quickly became the standard physics engine for academic continuous-control RL. [1] Most of the canonical benchmark tasks (HalfCheetah, Humanoid, Ant, the OpenAI Gym MuJoCo suite) use it. DeepMind acquired the engine and Roboti LLC in October 2021, made the binaries free, and in May 2022 open-sourced the full code under Apache 2.0. The DeepMind release post described the goal plainly: "We are committed to developing and maintaining MuJoCo as a free, open-source, community-driven project with best-in-class capabilities." [2]

The modern MuJoCo ecosystem includes MJX, a JAX implementation that runs entire simulations on accelerators and is differentiable through JAX's automatic differentiation, and MuJoCo Warp (announced 2025), a GPU port developed in collaboration with NVIDIA. DeepMind reports that the Warp version reaches more than 70x speedup for humanoid simulation and around 100x for in-hand manipulation compared to the CPU baseline. [26] MuJoCo is also the physics backend for tools like RoboCasa and the MuJoCo Playground.

### PyBullet and Bullet

Bullet started in the early 2000s and has been used in everything from feature films to AAA games. Erwin Coumans, the original author, has worked on it through stints at Sony, AMD, Google, and NVIDIA. [3] PyBullet is the Python wrapper that turned it into a standard tool for RL research. It is mature, well-documented, and a comfortable starting point for anyone new to robotics simulation, though for cutting-edge GPU parallelism it has been overshadowed by MuJoCo MJX, Brax, Isaac Sim, and Genesis.

### Gazebo

Gazebo grew out of the Player Project at the University of Southern California in 2002, became its own project under Willow Garage in 2011, and has been stewarded by Open Robotics (formerly OSRF) since 2012. [4] Its tight integration with ROS made it the default simulator for industrial and academic robotics groups for over a decade. Gazebo went through a confusing rebrand: a modern fork called Ignition Gazebo started in 2017, and after a 2022 trademark dispute Open Robotics renamed the original to Gazebo Classic and the new fork to just Gazebo. Gazebo Classic was retired in 2025.

### NVIDIA Isaac

[NVIDIA Isaac](/wiki/nvidia_isaac) is the umbrella name for NVIDIA's robotics stack. It includes Isaac Sim, a photorealistic GPU simulator built on [Omniverse](/wiki/omniverse) and powered by PhysX; [Isaac Lab](/wiki/isaac_lab), the RL training framework that replaced the deprecated Isaac Gym, IsaacGymEnvs, OmniIsaacGymEnvs, and Orbit projects [5][6]; and Isaac GR00T, NVIDIA's humanoid robot foundation model effort. Isaac Sim renders scenes in real time, supports OpenUSD, and is now the most common simulator for industrial humanoid and manipulation programs.

### Brax

[Brax](/wiki/brax) is a fully differentiable rigid-body physics engine written in JAX, released by Google researchers (Freeman, Frey, Raichuk, Girgin, Mordatch, Bachem) in 2021 and presented at NeurIPS 2021. [7] Because the entire simulator is JAX-traceable, environment dynamics, neural networks, and the optimizer all compile together and run on the same accelerator. The combination is what allows Brax to train agents in seconds to minutes, which is hard to picture if your reference point is older RL workflows that took days. Brax is also the natural home for differentiable physics research, where gradients flow through the dynamics into policies.

### Drake

Drake started in 2005 in Russ Tedrake's Robot Locomotion Group at MIT CSAIL and is now jointly developed with the Toyota Research Institute. [8] It is more conservative than the GPU-first simulators above. Drake invests heavily in numerically robust contact mechanics, hydroelastic contact models, and a systems framework that integrates well with optimization-based control. It is C++ with Python bindings and is used in research where fidelity matters more than raw simulation throughput, such as control of humanoids, dexterous manipulation, and academic underactuated robotics.

### Genesis

Genesis was released on December 19, 2024 by a consortium of more than 20 research labs led by Zhou Xian, then a final-year PhD student at Carnegie Mellon University's Robotics Institute, after a 24-month development effort. [9] It is an Apache 2.0 Python simulator with a generative front-end (text-to-scene) and a multi-physics back-end that handles rigid bodies, soft bodies, cloth, fluids, and material point method (MPM) materials. The project reports throughput in the range of 10x to 80x faster than Isaac Gym or MuJoCo MJX, with a Franka manipulation scene running at around 43 million frames per second on a single RTX 4090 GPU, roughly 430,000 times faster than real time. [9] Whether those numbers hold across diverse workloads in independent benchmarks is still being shaken out by the community, but Genesis has clearly become a major platform.
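The two headline figures can be cross-checked with a little arithmetic. The sketch below assumes that the reported throughput is an aggregate over all parallel environments and that "real time" means one simulated second per wall-clock second; the function names are illustrative and not part of Genesis.

```python
def implied_sim_rate_hz(frames_per_second: float, realtime_factor: float) -> float:
    """Simulated frames per simulated second implied by the two reported numbers."""
    return frames_per_second / realtime_factor


def simulated_years_per_wall_hour(realtime_factor: float) -> float:
    """Aggregate simulated experience produced by one wall-clock hour (assumed 365-day year)."""
    hours_per_year = 24 * 365
    return realtime_factor / hours_per_year


# Figures as reported by the Genesis project for the Franka scene.
rate_hz = implied_sim_rate_hz(43_000_000, 430_000)
years = simulated_years_per_wall_hour(430_000)
print(f"implied simulation rate: {rate_hz:.0f} Hz (time step {1.0 / rate_hz:.2f} s)")
print(f"simulated experience per wall-clock hour: about {years:.0f} years")
```

Under those assumptions the two numbers are consistent with a 100 Hz simulation rate, and one wall-clock hour corresponds to roughly 49 years of aggregate simulated experience.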

### Webots

Webots was started at EPFL in 1996, commercialized by Cyberbotics from 1998 onward, and open-sourced under Apache 2.0 in December 2018. [10] It has a strong educational and competition footprint (RoboCup, university courses) and a polished GUI, with bindings for C, C++, Python, Java, MATLAB, and ROS.

## What simulators are used for embodied AI?

A second class of simulators sits on top of physics engines and provides large 3D environments populated with rooms, objects, and tasks. These are designed for embodied AI: agents that navigate and manipulate inside human environments. Photorealism, scene diversity, and task variety matter more here than raw physics throughput.

| Simulator | Lead organization | First released | Underlying engine | Focus |
| --- | --- | --- | --- | --- |
| Habitat | Meta AI (FAIR) | 2019 (Habitat 1.0) | Custom; Bullet for physics | Indoor navigation, embodied agents |
| Habitat 3.0 | Meta AI | October 2023 | Same lineage | Human-robot collaboration, social rearrangement |
| AI2-THOR | Allen Institute for AI | 2017 | Unity | Household interaction, navigation, manipulation |
| ManipulaTHOR | Allen Institute for AI | 2021 | Unity (AI2-THOR) | Manipulation with a 6-DoF arm in indoor scenes |
| iGibson and OmniGibson | Stanford | 2020 (iGibson 1.0) | Bullet, then NVIDIA Isaac Sim | Interactive household tasks, BEHAVIOR benchmarks |
| BEHAVIOR-1K | Stanford | 2022 | iGibson/OmniGibson | 1,000 everyday household activities |
| RoboCasa | UT Austin and NVIDIA (Nasiriany et al., RSS 2024) | June 2024 | MuJoCo via robosuite | Kitchen tasks for generalist robot policies |
| ManiSkill / ManiSkill3 | UC San Diego (Su Lab) | 2021; ManiSkill3 in 2024 | SAPIEN | GPU-parallel manipulation benchmark |
| ProcTHOR | Allen Institute for AI | 2022 | Unity | Procedurally generated 10K houses |

Habitat from Meta AI emphasizes fast navigation in photorealistic indoor scans (initially Matterport3D, Replica, and Gibson). [11] Habitat 3.0, released in October 2023, added human avatars that can be controlled by a learned policy or a real person via VR, opening up benchmarks like social rearrangement and social navigation where a robot and a person tidy a room together. [12]

AI2-THOR from the Allen Institute for AI (AI2) takes the opposite approach: hand-modeled rooms in Unity with carefully crafted interactions. [13] Pour water into a kettle, place it on a stove, watch it boil. ManipulaTHOR added a 6-DoF arm; ProcTHOR procedurally generated 10,000 houses. AI2's recent MolmoSpaces effort unifies 230,000 indoor scenes and 130,000 object models with around 42 million annotated grasps.

RoboCasa was published at Robotics: Science and Systems in 2024 by Soroush Nasiriany, Ajay Mandlekar, Yuke Zhu, and collaborators. [14] It is built on MuJoCo via the robosuite framework and focuses on kitchen environments populated with thousands of generative-AI-produced 3D assets. The follow-up RoboCasa365 covers 365 everyday tasks across 2,500 kitchens with hundreds of hours of human and synthetic demonstrations. ManiSkill3, from the Su Lab at UCSD, runs on SAPIEN and reports more than 30,000 FPS for GPU-parallel simulation with rendering, depending on the task. [15]

## What simulators are used for autonomous driving?

Self-driving research has its own simulator ecosystem because the relevant physics (large vehicles, road surfaces, traffic) and the relevant tasks (perception in adversarial conditions, multi-agent prediction) are quite different from indoor robotics.

| Simulator | Origin | Engine | Notes |
| --- | --- | --- | --- |
| CARLA | Intel Labs and Toyota Research, 2017 paper | Unreal Engine | Open-source, leading academic AV simulator |
| AirSim | Microsoft Research, 2017 | Unreal Engine, Unity plugin | Drones and ground vehicles; archived 2022 in favor of Project AirSim |
| NVIDIA DRIVE Sim | NVIDIA, on Omniverse | PhysX, RTX rendering | Used by Mercedes-Benz, Volvo, others |
| LGSVL Simulator | LG, 2019 to 2022 | Unity | Discontinued in 2022 |
| Carcraft / Simulation City | Waymo, internal | Proprietary | Reported tens of billions of simulated miles per year |
| Cognata | Cognata Inc. | Proprietary | OEM-focused, sensor accurate |

CARLA was introduced in the paper *CARLA: An Open Urban Driving Simulator* by Dosovitskiy, Ros, Codevilla, Lopez, and Koltun (Intel Labs, Toyota Research Institute, and the Computer Vision Center in Barcelona) at the first Conference on Robot Learning (CoRL) in 2017. [16] It is the standard academic simulator for autonomous urban driving, with a flexible sensor suite (cameras, LiDAR, radar, depth, semantic segmentation) and configurable weather and traffic. AirSim was a major Microsoft Research effort starting in 2017, but Microsoft archived the open-source repo in 2022 and refocused on the closed-source Microsoft Project AirSim for the aerospace industry. [17]

Waymo's internal simulator, sometimes called Carcraft and later Simulation City, is the most heavily used closed system. The company has reported that for every mile its cars drive on real roads, hundreds or thousands of miles are driven in simulation, much of it focused on rare and dangerous edge cases.

## How does GPU-accelerated parallelism speed up simulation?

The most important shift in the last few years has been the move from single-threaded CPU simulators to massively parallel GPU simulators. The pattern is the same in each project: instead of stepping one environment at a time, batch tens of thousands of environments together as a single tensor and step them all on the GPU. Throughput goes up by two to four orders of magnitude.

- **Isaac Gym (deprecated 2023) and Isaac Lab.** Isaac Gym, released as a preview by NVIDIA, popularized GPU-resident simulation for RL. [5] It has been replaced by [Isaac Lab](/wiki/isaac_lab), built on Isaac Sim, which unifies the older IsaacGymEnvs, OmniIsaacGymEnvs, and Orbit codebases. [6]
- **MJX.** A JAX implementation of MuJoCo that compiles entire simulations to XLA. It does not match Isaac Lab's photorealism but is widely used in research because it preserves MuJoCo semantics while running on accelerators, and it is differentiable through JAX.
- **Brax.** A pure-JAX rigid-body engine designed from day one for accelerators. The original NeurIPS 2021 paper reported one to two orders of magnitude faster training compared to a typical workstation setup, condensing experiments that took days into minutes. [7]
- **Genesis.** Reports throughput in the millions of FPS on commodity GPUs with multi-physics support. [9]
- **MuJoCo Warp.** Announced at GTC 2025 as part of the joint NVIDIA / Google DeepMind / Disney *Newton* initiative, with reported speedups north of 70x for humanoid sim and around 100x for in-hand manipulation. [26]
- **ManiSkill3.** Built on SAPIEN, with reported throughput above 30,000 FPS on tasks involving rendering and contact-rich manipulation. [15]

What changed in practice is that an algorithm like PPO that needed many CPU days now finishes in tens of minutes. That has reshaped how problems are posed: training a quadruped to walk in 10 minutes with 4,096 parallel environments on a single GPU is now a homework assignment rather than a publication.
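The batching pattern itself is simple to illustrate. The following is a minimal sketch, not the implementation of any engine named above: it uses NumPy on the CPU and a made-up point-mass model with drag (`step_batch`, `dt`, and `drag` are all assumptions), but the structure is the same one GPU simulators apply to accelerator-resident tensors.

```python
import numpy as np


def step_batch(state: np.ndarray, action: np.ndarray,
               dt: float = 0.01, drag: float = 0.1) -> np.ndarray:
    """Advance every environment by one step.

    state has shape (num_envs, 2) holding [position, velocity];
    action has shape (num_envs,) holding one force per environment.
    The point-mass dynamics are a toy stand-in for a real physics step.
    """
    pos, vel = state[:, 0], state[:, 1]
    vel = vel + dt * (action - drag * vel)   # one array operation updates all rows
    pos = pos + dt * vel
    return np.stack([pos, vel], axis=1)


num_envs = 4096
rng = np.random.default_rng(0)
state = np.zeros((num_envs, 2))
for _ in range(100):
    action = rng.uniform(-1.0, 1.0, size=num_envs)   # stand-in for a policy
    state = step_batch(state, action)

print(state.shape)  # (4096, 2)
```

There is no Python loop over environments: the environment index is just the leading array dimension, which is what lets the same code scale from four environments to thousands.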

## What is domain randomization?

A simulator that exactly matched reality would let you train policies in simulation and deploy them. No simulator does. The reality gap, the difference between simulated and real dynamics, lighting, and sensors, is the source of most sim-to-real failures.

[Domain randomization](/wiki/domain_randomization) is the dominant practical fix. The technique was introduced for vision in the 2017 paper *Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World* by Tobin, Fong, Ray, Schneider, Zaremba, and Abbeel (then at OpenAI and UC Berkeley, IROS 2017). [18] The core idea, in the authors' words, is that "with enough variability in the simulator, the real world may appear to the model as just another variation." [18] The original paper trained a real-world object detector accurate to 1.5 cm using only synthetic data with random non-photorealistic textures, which the authors describe as the first successful transfer of a deep neural network trained only on simulated RGB images to the real world for robotic control. [18]

Domain randomization came into the mainstream when OpenAI's Dactyl project used dynamics and visual randomization to train a Shadow Hand to manipulate a cube and (in a 2019 result) to solve a Rubik's cube using only simulated experience. [19][20] The trained policy transferred to the real Shadow Dexterous Hand without retraining and stayed robust to disturbances it had never seen, including being prodded by a stuffed giraffe. [20] That result relied on automatic domain randomization (ADR), where the simulator's randomization range is itself adapted by a curriculum of ever-increasing difficulty; structured DR, where physics parameters are sampled from posteriors fitted to real data, is a related modern variant. [20]
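The adaptive part of ADR can be sketched in a few lines. This is a heavily simplified illustration rather than OpenAI's implementation: the thresholds, step size, and the `Range` and `adr_update` names are assumptions, and a real system would average performance over a buffer of episodes run with the parameter pinned to the boundary being tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lo: float
    hi: float


def adr_update(r: Range, side: str, boundary_perf: float, nominal: float,
               step: float = 0.05, t_low: float = 0.2, t_high: float = 0.8) -> Range:
    """Adjust one boundary of r from the mean performance measured at that boundary."""
    if boundary_perf >= t_high:
        delta = step        # policy copes at the edge: widen the range
    elif boundary_perf <= t_low:
        delta = -step       # policy fails at the edge: narrow the range
    else:
        return r            # inconclusive: leave the range alone
    if side == "hi":
        # Never let the upper bound fall below the nominal (calibrated) value.
        return Range(r.lo, max(nominal, r.hi + delta))
    # Never let the lower bound rise above the nominal value.
    return Range(min(nominal, r.lo - delta), r.hi)


friction = Range(1.0, 1.0)   # start with no randomization around a nominal value
friction = adr_update(friction, "hi", boundary_perf=0.9, nominal=1.0)
friction = adr_update(friction, "lo", boundary_perf=0.9, nominal=1.0)
print(friction)
```

Each successful evaluation at a boundary pushes that boundary outward, so the training distribution grows only as fast as the policy can keep up with it.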

A short list of the things people typically randomize:

- visual: textures, colors, lighting, distractor objects, camera intrinsics and extrinsics, image noise
- dynamics: friction coefficients, contact stiffness, motor latency, joint backlash, payload mass
- sensing: IMU bias, encoder noise, latency, packet loss
- environment: obstacle layout, slope, surface compliance

This is one of those techniques that sounds dumb until it works. Throw a wide enough net and the policy learns features that are invariant to the things you randomized, which often happen to be the things that vary in reality.
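In code, plain (non-adaptive) domain randomization often amounts to drawing a fresh set of parameters at every episode reset. The sketch below assumes uniform sampling; the parameter names and ranges are hypothetical placeholders, not values from any of the cited papers.

```python
import random

# Hypothetical ranges (low, high); real projects tune these per robot and task.
RANGES = {
    "friction": (0.5, 1.5),
    "payload_mass_kg": (0.0, 2.0),
    "motor_latency_s": (0.0, 0.04),
    "imu_bias": (-0.05, 0.05),
    "light_intensity": (0.3, 1.0),
}


def sample_episode_params(rng: random.Random, ranges=RANGES) -> dict:
    """Draw one value per parameter, uniformly within its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


rng = random.Random(0)   # seeding keeps the sequence of draws repeatable
for episode in range(3):
    params = sample_episode_params(rng)
    # A real pipeline would write `params` into the simulator before resetting it.
    print(episode, {name: round(value, 3) for name, value in params.items()})
```

Uniform sampling is only one choice; log-uniform ranges for scale-like quantities and correlated draws are common variations.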

## What is sim-to-real transfer?

[Sim-to-real](/wiki/sim_to_real) (sometimes written *sim2real*) is the umbrella term for getting a policy trained in simulation to work on real hardware. Domain randomization is one piece of it, but the broader toolkit also includes:

- **System identification.** Use real-world data to fit the simulator's parameters (mass, friction, link lengths) before training. This narrows the reality gap before randomization has to cover it.
- **Domain adaptation.** Learn a mapping (often adversarial or contrastive) between simulated and real observations so the policy sees a similar distribution at deployment.
- **Real-to-sim-to-real.** Reconstruct the real world (often with NeRF or 3D Gaussian Splatting) into a simulator, train inside it, then deploy. This is increasingly common for kitchen and warehouse scenes.
- **Hand-eye calibration and sensor modeling.** A surprising amount of sim-to-real work is just modeling the camera, IMU, motor, and tactile sensor properly.
- **Co-training and finetuning on real data.** Pre-train in sim, finetune in real with a smaller dataset.
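System identification is the most mechanical item on this list, and a toy version fits in a few lines. The sketch below assumes a block sliding to rest under Coulomb friction, for which the stopping distance is $d = v_0^2 / (2 \mu g)$; the "measurements" are synthetic numbers invented for the example, and `fit_friction` is an illustrative name.

```python
G = 9.81  # gravitational acceleration in m/s^2


def stopping_distance(v0: float, mu: float) -> float:
    """Simulator model: distance a sliding block travels before friction stops it."""
    return v0 ** 2 / (2.0 * mu * G)


def fit_friction(v0s, measured) -> float:
    """Least-squares fit of mu from (initial speed, measured distance) pairs.

    With x = v0^2 / (2 g) the model is d = k * x where k = 1 / mu,
    so the one-parameter least-squares solution is k = sum(x d) / sum(x^2).
    """
    xs = [v ** 2 / (2.0 * G) for v in v0s]
    k = sum(x * d for x, d in zip(xs, measured)) / sum(x * x for x in xs)
    return 1.0 / k


# Synthetic "real-world" data: true mu = 0.4 with a few percent of measurement error.
v0s = [0.5, 1.0, 1.5, 2.0]
noise = [1.02, 0.98, 1.01, 0.99]
measured = [stopping_distance(v, 0.4) * n for v, n in zip(v0s, noise)]

mu_hat = fit_friction(v0s, measured)
print(f"fitted friction coefficient: {mu_hat:.3f}")
```

The fitted value would then replace the simulator's default friction coefficient, leaving randomization to cover only the residual uncertainty around it.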

Quadruped locomotion is the cleanest commercial success of sim-to-real. ANYmal (ETH Zurich), Unitree's robots, and Boston Dynamics' Spot all rely heavily on simulation for their walking controllers. Recent humanoid demos from Figure, 1X, Unitree, and Tesla follow the same playbook: train in sim with randomization, then deploy on hardware. Manipulation is harder, partly because contact and friction are harder to simulate accurately, but RoboCasa, ManiSkill3, and the Stanford BEHAVIOR programs are pushing the state of the art.

## What is differentiable simulation?

A differentiable simulator can take gradients of physical quantities with respect to actions, parameters, or initial conditions. That lets gradient-based optimization replace some uses of reinforcement learning. Instead of sampling thousands of trajectories, you backpropagate through the dynamics directly.

| Engine | Differentiable? | Notes |
| --- | --- | --- |
| Brax | Yes (JAX) | First-class differentiability; widely used for JAX-based control research |
| MuJoCo MJX | Yes | Gradients through JAX automatic differentiation, compiled with XLA |
| Genesis | Yes | Differentiable across rigid, soft, and MPM materials |
| Drake | Partial | Analytical gradients in some subsystems; AutoDiff scalars |
| DiffTaichi | Yes | Research framework for differentiable physics in Taichi |
| Newton (announced 2025) | Yes | Differentiable physics is a stated design goal |

Differentiable simulation is not a clean win. Contact and friction are non-smooth, so naive gradients can be biased or noisy, and sample-based methods like PPO often still beat gradient-based methods for tasks that involve a lot of contact. Where differentiable sim has worked well is for soft-body manipulation, parameter identification, trajectory optimization, and any setting where you want to optimize across many physical parameters at once.
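For a smooth system the basic idea is easy to show. The sketch below is a toy under stated assumptions: a one-dimensional point mass with linear drag, a single constant force `u` as the only decision variable, and hand-written forward-mode sensitivities instead of the reverse-mode backpropagation that engines such as Brax obtain from JAX. For one scalar parameter the two give the same gradient.

```python
def rollout_with_grad(u: float, steps: int = 20, dt: float = 0.1, drag: float = 0.5):
    """Simulate the point mass and return (final position, d final position / d u)."""
    x = v = 0.0
    dx_du = dv_du = 0.0
    for _ in range(steps):
        # Derivative of the velocity update below with respect to u (chain rule).
        dv_du = dv_du + dt * (1.0 - drag * dv_du)
        v = v + dt * (u - drag * v)
        # Position update and its derivative.
        x = x + dt * v
        dx_du = dx_du + dt * dv_du
    return x, dx_du


def optimize_force(target: float = 1.0, lr: float = 0.2, iters: int = 50) -> float:
    """Gradient descent on u so the mass ends at `target`; loss = (x_T - target)^2."""
    u = 0.0
    for _ in range(iters):
        x, dx_du = rollout_with_grad(u)
        u -= lr * 2.0 * (x - target) * dx_du
    return u


u_star = optimize_force(target=1.0)
final_x, _ = rollout_with_grad(u_star)
print(f"force {u_star:.4f} puts the mass at x = {final_x:.6f}")
```

No trajectories are sampled at random here: every iteration uses the exact gradient of the outcome with respect to the action. With contact in the loop the update equations stop being smooth, which is where the caveats above apply.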

## What are generative simulators?

The newest entry on the simulation side is not a physics engine at all. It is a neural network that learns to produce video conditioned on actions. Train such a model on enough gameplay or robot footage, and you get something that behaves like a simulator: it lets you take an action, and it shows you what would happen next.

The headline projects:

- **GameNGen** (Valevski, Leviathan, Arar, Fruchter, August 2024). A diffusion model that simulates id Software's Doom interactively at 20 FPS on a single TPU. [23] The model is trained on recordings from an RL agent that played the game, and human raters distinguishing real Doom clips from simulated ones do only slightly better than chance even after five minutes of generation. [23] The paper, *Diffusion Models Are Real-Time Game Engines*, was published at ICLR 2025.
- **Genie 1, 2, 3** (Google DeepMind). Genie 1 (early 2024) generated playable 2D worlds from a single image. Genie 2 (December 2024) extends to 3D, plays first or third person, and stays consistent for around a minute. [24] Genie 3, announced as a research preview in August 2025, generates navigable worlds from text at 720p and 24 FPS in real time while staying consistent for a few minutes, and is the most capable world model from DeepMind to date. [25]
- **[NVIDIA Cosmos](/wiki/nvidia_cosmos).** Announced at CES 2025 and expanded with a major release on March 18, 2025. [27] Cosmos World Foundation Models include Cosmos Predict (future-frame prediction), Cosmos Transfer (style transfer between simulated and real domains), and Cosmos Reason (a reasoning model). NVIDIA describes Cosmos as enabling "synthetic data generation to augment training datasets, simulation to test and debug physical AI models before they are deployed in the real world, and reinforcement learning in virtual environments." [27] Its main target is physical AI, especially robotics and self-driving: early adopters include the robotics firms 1X, Agility Robotics, and XPENG and the AV developers Uber and Waabi. The October 2025 release introduced Cosmos-Predict 2.5 and Cosmos-Transfer 2.5.
- **[Sora](/wiki/sora).** OpenAI has framed Sora as a step toward "world simulators," though it is primarily a video generation model. The same is true of Veo, Runway Gen-3, and similar systems.

These systems are not drop-in replacements for physics engines. They have stunning visual fidelity but no guarantees of physical consistency, no contact mechanics, no notion of mass. What they offer is *coverage*: arbitrary scenes, arbitrary actions, no need to model the world by hand. The likely future is a combination, with classical physics simulators handling contact-rich manipulation and generative models handling visual diversity, sensor simulation, and rare scenarios.

## How do world models relate to simulation in reinforcement learning?

A closely related but distinct line of work is the use of learned [world models](/wiki/world_model) inside RL itself. Instead of using the simulator only at training time, a world model is a neural network that the agent rolls out in its own head during training and even during deployment.

The canonical paper is Ha and Schmidhuber's *World Models* (2018), which trained a VAE plus RNN on rollouts from car racing and VizDoom environments and showed, in the VizDoom task, that a policy trained entirely "inside the dream" of the world model could transfer back to the actual environment. [21] Hafner's Dreamer line (DreamerV1 in 2019, DreamerV2 in 2021, DreamerV3 in 2023) generalized this. DreamerV3, *Mastering Diverse Domains through World Models* (Hafner, Pasukonis, Ba, Lillicrap, published in Nature in April 2025), is a single-configuration algorithm that masters more than 150 tasks with a fixed set of hyperparameters and was the first to obtain diamonds in Minecraft from scratch given sparse rewards, a long-standing challenge that previously required human data or domain-specific heuristics. [22] Other notable model-based RL methods include MuZero, IRIS (transformer-based world models), TD-MPC2, and Sutton's Dyna lineage going back to the 1990s.

The boundary between "a simulator" and "a world model" has gotten blurry. Genie 2, GameNGen, and Cosmos behave like simulators (you can take actions and observe results) but are trained from data rather than coded. World models in Dreamer behave like policies' internal simulators. The unifying view is that anything that lets an agent ask "what happens if I do X?" is, functionally, a simulator.

## What are the recent milestones (2024-2026)?

The field has moved fast. A few markers from the past two years:

- **December 2024.** Genesis open-sources a multi-physics Python simulator with reported throughput around 43 million FPS for a Franka manipulation scene. [9]
- **December 2024.** Google DeepMind releases Genie 2, a foundation world model for action-controllable 3D environments. [24]
- **January 2025.** NVIDIA introduces Cosmos World Foundation Models at CES 2025. [27]
- **March 18, 2025.** NVIDIA, Google DeepMind, and Disney Research jointly announce Newton, an open-source GPU physics engine built on NVIDIA Warp, plus MuJoCo Warp; Disney's BDX droid is debuted on stage at GTC as a Newton/MuJoCo Warp demonstrator. [26] The Linux Foundation hosts the project, with the three companies formally contributing Newton in September 2025.
- **March 18, 2025.** NVIDIA announces Isaac GR00T N1, billed as the world's first open humanoid robot foundation model, trained on a mix of human videos, real and simulated robot trajectories, and synthetic data, alongside new simulation frameworks. [28]
- **August 2025.** Google DeepMind unveils Genie 3, a real-time text-to-world model running at 720p and 24 FPS. [25]
- **2025.** Most leading humanoid programs (Figure 02, 1X NEO, Tesla Optimus, Unitree H1/G1, Apptronik Apollo) report training their controllers primarily in simulation with sim-to-real transfer.

A reasonable read: the line between a physics simulator, a generative video model, and a robot foundation model is collapsing. It is now plausible to train a generalist robot policy almost entirely on synthetic data, with classical physics for contact and learned models for visual diversity.

## What are the limitations and open problems?

Simulation is not a solved problem. The honest list of what still goes wrong:

- **Contact and friction.** Rigid-contact mechanics are still surprisingly hard. Different engines disagree on the same scene, and small differences in friction coefficients, restitution, and integration step size produce qualitatively different policies.
- **Deformable objects.** Cloth, rope, food, and human bodies are barely simulated by mainstream engines. Engines and models aimed at deformables and compliant contact (the MPM and FEM solvers in Genesis, NVIDIA Flex, Drake's hydroelastic contact) are improving but slow.
- **Photorealism vs. speed.** Photoreal rendering and high-throughput physics still pull in opposite directions on hardware budgets. Isaac Sim is photoreal but slower than MJX or Brax; Brax is fast but visually plain.
- **Sensor realism.** Cameras, LiDARs, IMUs, and tactile sensors are imperfectly modeled. Tactile sensing in particular is wildly under-served.
- **The reality gap is irreducible without real data.** Even with heavy domain randomization, a policy that has never seen a real robot underperforms one that has been finetuned on it. Real-to-sim-to-real pipelines and continual learning are partial answers.
- **Generative simulators have no physics guarantees.** A learned world model such as Genie 2 can render physically impossible events, such as a ball passing through a wall, because nothing in the model enforces conservation laws or non-penetration constraints. Whether that is a fixable training problem or a structural limitation is open.
- **Benchmark fragility.** Policies tend to overfit to simulation benchmarks. A policy that wins on RoboCasa is not necessarily a generalist robot, just as a model that wins on ImageNet is not always a strong vision system.
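The step-size sensitivity mentioned under contact and friction can be seen even in a toy setting. The sketch below is not any engine's contact solver. It assumes a point mass dropped from 1 m, semi-implicit Euler integration, and a simple "clamp to the ground and reflect the velocity" contact rule with a restitution coefficient of 0.8. The function name `rebound_apex` and all parameter values are hypothetical, chosen only for illustration. The analytic rebound height for this idealized setup is `restitution**2 * height`.

```python
def rebound_apex(dt, height=1.0, restitution=0.8, gravity=9.81):
    """Apex height after the first bounce of a dropped point mass.

    Toy model (an assumption for illustration): semi-implicit Euler
    integration, contact handled by clamping the position to the ground
    and reflecting the velocity scaled by `restitution`.
    """
    y, v = height, 0.0

    # Free fall until the discrete state penetrates the ground plane.
    while y > 0.0:
        v -= gravity * dt
        y += v * dt

    # Contact is detected one step late: the impact speed and the
    # penetration depth both depend on where the step boundary fell.
    y = 0.0
    v = -restitution * v

    # Ascend until the velocity changes sign, tracking the highest point.
    apex = 0.0
    while v > 0.0:
        v -= gravity * dt
        y += v * dt
        apex = max(apex, y)
    return apex


if __name__ == "__main__":
    exact = 0.8 ** 2 * 1.0  # analytic apex for the idealized bounce
    for dt in (1e-2, 1e-3, 1e-4):
        apex = rebound_apex(dt)
        print(f"dt={dt:g}  apex={apex:.5f}  error={abs(apex - exact):.5f}")
```

With the coarse 10 ms step the rebound apex falls visibly short of the analytic value, and the error shrinks as the step is refined. A real engine uses far more sophisticated contact handling, but the same kind of discretization effect is one reason a policy tuned at one step size can behave differently at another.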

None of this means simulation is going away. The opposite. Every part of the modern stack assumes it. But the gap between "works in the simulator" and "works in the real world" remains the central engineering problem of robotics and embodied AI.

## See also

- [MuJoCo](/wiki/mujoco)
- [PyBullet](/wiki/pybullet)
- [Gazebo](/wiki/gazebo)
- [Isaac Lab](/wiki/isaac_lab)
- [NVIDIA Isaac](/wiki/nvidia_isaac)
- [Brax](/wiki/brax)
- [Genesis](/wiki/genesis)
- [Drake](/wiki/drake)
- [Webots](/wiki/webots)
- [CARLA](/wiki/carla)
- [AirSim](/wiki/airsim)
- [Habitat](/wiki/habitat)
- [AI2-THOR](/wiki/ai2_thor)
- [Domain randomization](/wiki/domain_randomization)
- [Sim-to-real](/wiki/sim_to_real)
- [World model](/wiki/world_model)
- [Dreamer](/wiki/dreamer)
- [Genie](/wiki/genie)
- [NVIDIA Cosmos](/wiki/nvidia_cosmos)
- [Newton Physics](/wiki/newton_physics)
- [Reinforcement learning](/wiki/reinforcement_learning)
- [Robotics](/wiki/robotics)
- [Robot learning](/wiki/robot_learning)

## References

1. Todorov, Erez, and Tassa. *MuJoCo: A physics engine for model-based control.* IROS 2012.
2. DeepMind. *Open-sourcing MuJoCo.* Blog post, May 23, 2022. https://deepmind.google/discover/blog/open-sourcing-mujoco/
3. Coumans, Erwin. Bullet Physics SDK and PyBullet. https://pybullet.org/ ; https://github.com/bulletphysics/bullet3
4. Open Robotics. Gazebo and Ignition Gazebo. https://gazebosim.org/ ; Wikipedia: *Gazebo (simulator)*.
5. Makoviychuk et al. *Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning.* NeurIPS Datasets and Benchmarks, 2021.
6. NVIDIA. Isaac Lab Documentation. https://isaac-sim.github.io/IsaacLab/
7. Freeman, Frey, Raichuk, Girgin, Mordatch, and Bachem. *Brax: A Differentiable Physics Engine for Large Scale Rigid Body Simulation.* arXiv:2106.13281, NeurIPS 2021.
8. Tedrake, Russ, and the Drake Development Team. *Drake: Model-Based Design and Verification for Robotics.* https://drake.mit.edu/
9. Genesis Embodied AI. *Genesis: A Generative World for General Purpose Robotics and Embodied AI Learning.* December 19, 2024. https://github.com/Genesis-Embodied-AI/Genesis ; project page: https://genesis-embodied-ai.github.io/
10. Cyberbotics. *Webots.* Open-sourced December 2018. https://cyberbotics.com/
11. Savva et al. *Habitat: A Platform for Embodied AI Research.* ICCV 2019.
12. Puig et al. *Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots.* Meta AI, October 2023.
13. Kolve et al. *AI2-THOR: An Interactive 3D Environment for Visual AI.* arXiv:1712.05474.
14. Nasiriany et al. *RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots.* RSS 2024. arXiv:2406.02523.
15. Tao et al. *ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI.* arXiv:2410.00425, 2024.
16. Dosovitskiy, Ros, Codevilla, Lopez, and Koltun. *CARLA: An Open Urban Driving Simulator.* CoRL 2017. arXiv:1711.03938.
17. Microsoft Research. *AirSim.* https://github.com/microsoft/AirSim
18. Tobin, Fong, Ray, Schneider, Zaremba, and Abbeel. *Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World.* IROS 2017. arXiv:1703.06907.
19. OpenAI. *Learning Dexterity.* Blog post, July 2018. https://openai.com/index/learning-dexterity/
20. OpenAI et al. *Solving Rubik's Cube with a Robot Hand.* 2019. arXiv:1910.07113. https://openai.com/index/solving-rubiks-cube/
21. Ha and Schmidhuber. *World Models.* arXiv:1803.10122, 2018; NeurIPS 2018 version published as *Recurrent World Models Facilitate Policy Evolution.*
22. Hafner, Pasukonis, Ba, and Lillicrap. *Mastering Diverse Domains through World Models.* arXiv:2301.04104, 2023; published in Nature, April 2025.
23. Valevski, Leviathan, Arar, and Fruchter. *Diffusion Models Are Real-Time Game Engines (GameNGen).* arXiv:2408.14837, ICLR 2025. https://gamengen.github.io/
24. Google DeepMind. *Genie 2: A Large-Scale Foundation World Model.* December 2024. https://deepmind.google/blog/genie-2-a-large-scale-foundation-world-model/
25. Google DeepMind. *Genie 3: A New Frontier for World Models.* August 2025. https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
26. NVIDIA. *Announcing Newton, an Open-Source Physics Engine for Robotics Simulation.* March 18, 2025. https://developer.nvidia.com/blog/announcing-newton-an-open-source-physics-engine-for-robotics-simulation/
27. NVIDIA. *NVIDIA Launches Cosmos World Foundation Model Platform to Accelerate Physical AI Development.* CES 2025; major release March 18, 2025. https://blogs.nvidia.com/blog/cosmos-world-foundation-models/
28. NVIDIA. *NVIDIA Announces Isaac GR00T N1, the World's First Open Humanoid Robot Foundation Model, and Simulation Frameworks.* March 18, 2025. arXiv:2503.14734. https://nvidianews.nvidia.com/news/nvidia-isaac-gr00t-n1-open-humanoid-robot-foundation-model-simulation-frameworks
