Simulation (in AI and robotics)
Last reviewed
May 2, 2026
Sources
28 citations
Review status
Source-backed
Revision
v1 · 5,034 words
Simulation in artificial intelligence and robotics refers to the use of computational physics, rendering, and procedural environments to recreate a synthetic version of a physical or virtual world inside which AI agents can perceive, act, learn, and be evaluated. It is the dominant way modern robots and reinforcement learning agents are trained: virtual robots run inside simulators for the equivalent of decades of experience, then the resulting policies are deployed on real hardware or in real games. Without simulation, the field of reinforcement learning as we know it would not exist, and most contemporary work on humanoid robots, autonomous vehicles, dexterous manipulation, and embodied agents would be either far slower or simply impossible.
The term means something specific here. In statistics and physics, "simulation" can mean Monte Carlo sampling or finite-element solvers run for engineering. In the AI/robotics context the focus is narrower: physics engines and 3D environments wired into machine learning pipelines, with goals like data collection, domain randomization, sim-to-real transfer, and the training of generalist policies. The frontier has expanded recently to include generative models that learn the simulator itself from video, blurring the line between traditional rigid-body engines and neural world models.
Real robots are expensive, slow, and fragile. A single Boston Dynamics Spot or Unitree H1 humanoid costs tens of thousands of dollars; replacement parts can take weeks to arrive; and a fall during a learning episode might end an experiment for a day. Simulation sidesteps almost all of those constraints.
A simulator gives researchers cheap, parallelizable, safe, and resettable data. You can spin up thousands of robots on a single GPU, run them at hundreds or thousands of times real-time speed, and reset to any prior state on demand. Bad policies break virtual robots without consequences. Curricula are easy to design because you can vary anything: gravity, friction, lighting, the mass of an object, the geometry of a kitchen. None of that is feasible in the physical world.
There are several distinct reasons the field leans on simulation:
All of this comes with a fundamental, unsolved tradeoff: a simulator is a model, and models are wrong. Bridging the gap to the real world is the central engineering problem of the field, addressed mainly through domain randomization, system identification, and domain adaptation.
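System identification, mentioned above, can be illustrated with a toy example. The sketch below is a minimal illustration, not any particular toolkit's method: it assumes a one-parameter sliding-block model and a brute-force search over candidate friction values, and the "measured" trace is generated in code as a stand-in for logged hardware data. All function names are hypothetical.

```python
def simulate_sliding_block(mu, v0=2.0, dt=0.01, steps=50, g=9.81):
    """Velocity trace of a block sliding on a flat surface with friction coefficient mu."""
    v, trace = v0, []
    for _ in range(steps):
        v = max(0.0, v - mu * g * dt)  # friction decelerates the block until it stops
        trace.append(v)
    return trace


def trajectory_error(mu, measured):
    """Sum of squared differences between the simulated and measured velocity traces."""
    simulated = simulate_sliding_block(mu, steps=len(measured))
    return sum((s - m) ** 2 for s, m in zip(simulated, measured))


def identify_friction(measured, candidates):
    """Pick the candidate friction value whose simulation best matches the measurements."""
    return min(candidates, key=lambda mu: trajectory_error(mu, measured))


# Stand-in for hardware logs (an assumption): generated from a hidden "true" friction.
measured = simulate_sliding_block(mu=0.35)
candidates = [i / 100 for i in range(0, 101)]
mu_hat = identify_friction(measured, candidates)
print("identified friction:", mu_hat)
```

Real systems have many coupled parameters and noisy measurements, so practical pipelines typically replace the grid search with gradient-based or Bayesian fitting, but the structure (simulate, compare with real data, adjust parameters) is the same.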
The physics engine is the core of any simulator. It computes how bodies move, collide, deform, and interact with actuators and sensors. The engines listed below are the ones most commonly cited in robotics and RL research; each has different tradeoffs in accuracy, speed, parallelism, and ergonomics.
| Engine | Origin | License | Strengths | Common use |
|---|---|---|---|---|
| MuJoCo | Emo Todorov, 2012; DeepMind acquired 2021, open-sourced 2022 | Apache 2.0 | Fast, accurate contact-rich rigid-body dynamics; analytic gradients via MJX | Continuous-control RL benchmarks; humanoid locomotion; manipulation |
| Bullet / PyBullet | Erwin Coumans (Sony, AMD, Google, NVIDIA), early 2000s | zlib | Mature collision detection; Python-friendly; URDF support | Hobbyist robotics, classic OpenAI Gym tasks |
| Gazebo | USC 2002, then Open Robotics | Apache 2.0 | Tight ROS integration; sensor models; large robotics community | ROS-based simulation, system integration testing |
| Isaac Sim and Isaac Lab | NVIDIA, built on Omniverse and PhysX | Open under EULA | Photoreal rendering; GPU parallel; OpenUSD scene format | Industrial robotics, humanoids, large-scale RL |
| Brax | Google, 2021 | Apache 2.0 | Fully differentiable; written in JAX; massive parallelism on TPU/GPU | Differentiable RL, learned dynamics, JAX pipelines |
| Drake | MIT (Russ Tedrake) and TRI, since 2005 | BSD-3 | Rigorous multibody dynamics; strong contact mechanics; optimization tooling | High-fidelity research, control theory, manipulation |
| Genesis | Genesis Embodied AI consortium, December 2024 | Apache 2.0 | Multi-physics (rigid, soft, fluids, MPM); Python; very fast | Generative robotics workflows, embodied AI research |
| Webots | Cyberbotics (originally EPFL, 1996); open-sourced December 2018 | Apache 2.0 | Educational use, large robot library, scripted scenarios | Teaching, RoboCup, prototyping |
MuJoCo (Multi-Joint dynamics with Contact) was published by Emanuel Todorov in 2012 and quickly became the standard physics engine for academic continuous-control RL. Most of the canonical benchmark tasks (HalfCheetah, Humanoid, Ant, the OpenAI Gym MuJoCo suite) use it. DeepMind acquired the engine and Roboti LLC in October 2021, made the binaries free, and in May 2022 open-sourced the full code under Apache 2.0.
The modern MuJoCo ecosystem includes MJX, a JAX implementation that runs entire simulations on accelerators with analytic gradients, and MuJoCo Warp (announced 2025), a GPU port developed in collaboration with NVIDIA. DeepMind reports that the Warp version reaches more than a 70x speedup for humanoid simulation and around 100x for in-hand manipulation compared to the CPU baseline. MuJoCo is also the physics backend for tools like RoboCasa and the MuJoCo Playground.
Bullet started in the early 2000s and has been used in everything from feature films to AAA games. Erwin Coumans, the original author, has worked on it through stints at Sony, AMD, Google, and NVIDIA. PyBullet is the Python wrapper that turned it into a standard tool for RL research. It is mature, well-documented, and a comfortable starting point for anyone new to robotics simulation, though for cutting-edge GPU parallelism it has been overshadowed by MuJoCo MJX, Brax, Isaac Sim, and Genesis.
Gazebo grew out of the Player Project at the University of Southern California in 2002, became its own project under Willow Garage in 2011, and has been stewarded by Open Robotics (formerly OSRF) since 2012. Its tight integration with ROS made it the default simulator for industrial and academic robotics groups for over a decade. Gazebo went through a confusing rebrand: a modern fork called Ignition Gazebo started in 2017, and after a 2022 trademark dispute Open Robotics renamed the original to Gazebo Classic and the new fork to just Gazebo. Gazebo Classic was retired in 2025.
NVIDIA Isaac is the umbrella name for NVIDIA's robotics stack. It includes Isaac Sim, a photorealistic GPU simulator built on Omniverse and powered by PhysX; Isaac Lab, the RL training framework that replaced the deprecated Isaac Gym, IsaacGymEnvs, OmniIsaacGymEnvs, and Orbit projects; and Isaac GR00T, NVIDIA's humanoid robot foundation model effort. Isaac Sim renders scenes in real time, supports OpenUSD, and is now the most common simulator for industrial humanoid and manipulation programs.
Brax is a fully differentiable rigid-body physics engine written in JAX, released by Google researchers (Freeman, Frey, Raichuk, Girgin, Mordatch, Bachem) in 2021 and presented at NeurIPS 2021. Because the entire simulator is JAX-traceable, environment dynamics, neural networks, and the optimizer all compile together and run on the same accelerator. The combination is what allows Brax to train agents in seconds to minutes, which is hard to picture if your reference point is older RL workflows that took days. Brax is also the natural home for differentiable physics research, where gradients flow through the dynamics into policies.
Drake started in 2005 in Russ Tedrake's Robot Locomotion Group at MIT CSAIL and is now jointly developed with the Toyota Research Institute. It is more conservative than the GPU-first simulators above. Drake invests heavily in numerically robust contact mechanics, hydroelastic contact models, and a systems framework that integrates well with optimization-based control. It is C++ with Python bindings and is used in research where fidelity matters more than raw simulation throughput, such as control of humanoids, dexterous manipulation, and academic underactuated robotics.
Genesis was released in December 2024 by a consortium of more than 20 research labs led by Zhou Xian, after a 24-month development effort. It is an Apache 2.0 Python simulator with a generative front-end (text-to-scene) and a multi-physics back-end that handles rigid bodies, soft bodies, cloth, fluids, and material point method (MPM) materials. The project reports throughput in the range of 10x to 80x faster than Isaac Gym or MuJoCo MJX, with a Franka manipulation scene running at around 43 million frames per second on a single high-end GPU. Whether those numbers hold across diverse workloads in independent benchmarks is still being shaken out by the community, but Genesis has clearly become a major platform.
Webots was started at EPFL in 1996, commercialized by Cyberbotics from 1998 onward, and open-sourced under Apache 2.0 in December 2018. It has a strong educational and competition footprint (RoboCup, university courses) and a polished GUI, with bindings for C, C++, Python, Java, MATLAB, and ROS.
A second class of simulators sits on top of physics engines and provides large 3D environments populated with rooms, objects, and tasks. These are designed for embodied AI: agents that navigate and manipulate inside human environments. Photorealism, scene diversity, and task variety matter more here than raw physics throughput.
| Simulator | Lead organization | First released | Underlying engine | Focus |
|---|---|---|---|---|
| Habitat | Meta AI (FAIR) | 2019 (Habitat 1.0) | Custom; Bullet for physics | Indoor navigation, embodied agents |
| Habitat 3.0 | Meta AI | October 2023 | Same lineage | Human-robot collaboration, social rearrangement |
| AI2-THOR | Allen Institute for AI | 2017 | Unity | Household interaction, navigation, manipulation |
| ManipulaTHOR | Allen Institute for AI | 2021 | Unity (AI2-THOR) | Manipulation with a 6-DoF arm in indoor scenes |
| iGibson and OmniGibson | Stanford | 2020 (iGibson 1.0) | Bullet, then NVIDIA Isaac Sim | Interactive household tasks, BEHAVIOR benchmarks |
| BEHAVIOR-1K | Stanford | 2022 | iGibson/OmniGibson | 1,000 everyday household activities |
| RoboCasa | UT Austin and NVIDIA (Nasiriany et al., RSS 2024) | June 2024 | MuJoCo via robosuite | Kitchen tasks for generalist robot policies |
| ManiSkill / ManiSkill3 | UC San Diego (Su Lab) | 2021; ManiSkill3 in 2024 | SAPIEN | GPU-parallel manipulation benchmark |
| ProcTHOR | Allen Institute for AI | 2022 | Unity | Procedurally generated 10K houses |
Habitat from Meta AI emphasizes fast navigation in photorealistic indoor scans (initially Matterport3D, Replica, and Gibson). Habitat 3.0, released in October 2023, added human avatars that can be controlled by a learned policy or a real person via VR, opening up benchmarks like social rearrangement and social navigation where a robot and a person tidy a room together.
AI2-THOR from the Allen Institute for AI (AI2) takes the opposite approach: hand-modeled rooms in Unity with carefully crafted interactions. Pour water into a kettle, place it on a stove, watch it boil. ManipulaTHOR added a 6-DoF arm; ProcTHOR procedurally generated 10,000 houses. AI2's recent MolmoSpaces effort unifies 230,000 indoor scenes and 130,000 object models with around 42 million annotated grasps.
RoboCasa was published at Robotics: Science and Systems in 2024 by Soroush Nasiriany, Ajay Mandlekar, Yuke Zhu, and collaborators. It is built on MuJoCo via the robosuite framework and focuses on kitchen environments populated with thousands of generative-AI-produced 3D assets. The follow-up RoboCasa365 covers 365 everyday tasks across 2,500 kitchens with hundreds of hours of human and synthetic demonstrations. ManiSkill3, from the Su Lab at UCSD, runs on SAPIEN and reports more than 30,000 FPS for GPU-parallel manipulation with state and visual observations, depending on the task.
Self-driving research has its own simulator ecosystem because the relevant physics (large vehicles, road surfaces, traffic) and the relevant tasks (perception in adversarial conditions, multi-agent prediction) are quite different from indoor robotics.
| Simulator | Origin | Engine | Notes |
|---|---|---|---|
| CARLA | Intel Labs and Toyota Research, 2017 paper | Unreal Engine | Open-source, leading academic AV simulator |
| AirSim | Microsoft Research, 2017 | Unreal Engine, Unity plugin | Drones and ground vehicles; archived 2022 in favor of Project AirSim |
| NVIDIA DRIVE Sim | NVIDIA, on Omniverse | PhysX, RTX rendering | Used by Mercedes-Benz, Volvo, others |
| LGSVL Simulator | LG, 2019 to 2022 | Unity | Discontinued in 2022 |
| Carcraft / Simulation City | Waymo, internal | Proprietary | Reported tens of billions of simulated miles per year |
| Cognata | Cognata Inc. | Proprietary | OEM-focused, sensor accurate |
CARLA was introduced in the paper "CARLA: An Open Urban Driving Simulator" by Dosovitskiy, Ros, Codevilla, Lopez, and Koltun at the Conference on Robot Learning (CoRL) in 2017. It is the standard academic simulator for autonomous urban driving, with a flexible sensor suite (cameras, LiDAR, radar, depth, semantic segmentation) and configurable weather and traffic. AirSim was a major Microsoft Research effort starting in 2017, but Microsoft archived the open-source repo in 2022 and refocused on the closed-source Microsoft Project AirSim for the aerospace industry.
Waymo's internal simulator, sometimes called Carcraft and later Simulation City, is the most heavily used closed system. The company has reported that for every mile its cars drive on real roads, hundreds or thousands of miles are driven in simulation, much of it focused on rare and dangerous edge cases.
The most important shift in the last few years has been the move from single-threaded CPU simulators to massively parallel GPU simulators. The pattern is the same in each project: instead of stepping one environment at a time, batch tens of thousands of environments together as a single tensor and step them all on the GPU. Throughput goes up by two to four orders of magnitude.
What changed in practice is that an algorithm like PPO that needed many CPU days now finishes in tens of minutes. That has reshaped how problems are posed: training a quadruped to walk in 10 minutes with 4,096 parallel environments on a single GPU is now a homework assignment rather than a publication.
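The batching pattern can be sketched in a few lines. This is a toy illustration only: it uses NumPy on the CPU and a one-dimensional falling-body model, and the names `reset` and `step`, the state layout, and the random "thrust" actions are all assumptions made for the example. GPU simulators apply the same idea with accelerator arrays and far richer physics.

```python
import numpy as np

NUM_ENVS = 4096   # environments stepped together (illustrative value)
DT = 0.01         # time step in seconds
GRAVITY = -9.81


def reset(num_envs, rng):
    """Return a batched state: one row per environment, columns = (height, velocity)."""
    heights = rng.uniform(0.5, 2.0, size=num_envs)
    velocities = np.zeros(num_envs)
    return np.stack([heights, velocities], axis=1)


def step(state, thrust):
    """Advance every environment by one time step with a single vectorized update."""
    height, velocity = state[:, 0], state[:, 1]
    velocity = velocity + (GRAVITY + thrust) * DT   # semi-implicit Euler integration
    height = height + velocity * DT
    landed = height <= 0.0                          # per-environment ground contact test
    height = np.where(landed, 0.0, height)
    velocity = np.where(landed, 0.0, velocity)
    return np.stack([height, velocity], axis=1), landed


rng = np.random.default_rng(0)
state = reset(NUM_ENVS, rng)
for _ in range(100):
    thrust = rng.uniform(0.0, 5.0, size=NUM_ENVS)   # placeholder for a policy's actions
    state, landed = step(state, thrust)
print(state.shape, int(landed.sum()))
```

There is no Python loop over environments: adding more environments only makes the arrays larger, which is what lets an accelerator absorb the extra work.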
A simulator that exactly matched reality would let you train policies in simulation and deploy them. No simulator does. The reality gap, the difference between simulated and real dynamics, lighting, and sensors, is the source of most sim-to-real failures.
Domain randomization is the dominant practical fix. The technique was introduced for vision in the 2017 paper "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World" by Tobin, Fong, Ray, Schneider, Zaremba, and Abbeel (then at OpenAI/UC Berkeley, IROS 2017). The core idea is that if you randomize enough properties of the simulator (textures, lighting, camera position, object color, friction, mass), the real world becomes "just another randomization" the policy has already learned to handle. The original paper trained a real-world object detector to 1.5 cm accuracy using only synthetic data with random non-photorealistic textures.
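In code, the core of domain randomization is often little more than sampling a fresh set of simulator parameters at the start of each episode. The sketch below assumes a hypothetical `SimParams` container and made-up ranges; it is not taken from the paper, and a real pipeline would pass the sampled values to its simulator's own configuration API.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    friction: float
    object_mass_kg: float
    light_intensity: float
    camera_offset_cm: float


# Hypothetical ranges, chosen only for illustration.
RANGES = {
    "friction": (0.4, 1.2),
    "object_mass_kg": (0.05, 0.5),
    "light_intensity": (0.3, 1.5),
    "camera_offset_cm": (-2.0, 2.0),
}


def sample_params(rng):
    """Draw every parameter independently and uniformly from its range."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


rng = random.Random(0)
for episode in range(3):
    params = sample_params(rng)
    # A real pipeline would reconfigure the simulator with `params` here,
    # then run one training episode under those conditions.
    print(episode, params)
```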
Domain randomization came into the mainstream when OpenAI's Dactyl project used dynamics and visual randomization to train a Shadow Hand to manipulate a cube and (in a later result) a Rubik's cube using only simulated experience. The trained policy transferred to the real Shadow Hand without any retraining. Modern variants include automatic domain randomization (ADR), where the simulator's randomization range is itself adapted by curriculum, and structured DR, where physics parameters are sampled from posteriors fitted to real data.
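The curriculum idea behind automatic domain randomization can be sketched as a range that widens when the policy copes and narrows when it struggles. This is a deliberately simplified illustration, not OpenAI's published algorithm: the thresholds, step size, and the `update_range` helper are assumptions, and the success rates are made-up numbers standing in for evaluation results.

```python
def update_range(low, high, success_rate, step=0.05,
                 widen_above=0.8, shrink_below=0.4):
    """Widen the range when the policy copes with it; shrink it when it struggles."""
    if success_rate >= widen_above:
        return low - step, high + step
    if success_rate <= shrink_below and (high - low) > 2 * step:
        return low + step, high - step
    return low, high


friction_range = (0.9, 1.1)
# Made-up success rates standing in for evaluations after each training round.
for success_rate in [0.9, 0.85, 0.3, 0.95]:
    friction_range = update_range(*friction_range, success_rate)
print("final friction range:", friction_range)
```

The effect is that the policy starts on a nearly fixed environment and is only exposed to harder variations once it has mastered the easier ones.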
A short list of the things people typically randomize:
This is one of those techniques that sounds dumb until it works. Throw a wide enough net and the policy learns features that are invariant to the things you randomized, which often happen to be the things that vary in reality.
Sim-to-real (sometimes written sim2real) is the umbrella term for getting a policy trained in simulation to work on real hardware. Domain randomization is one piece of it, but the broader toolkit also includes:
Quadruped locomotion is the cleanest commercial success of sim-to-real. ANYmal (ETH Zurich), Unitree's robots, and Boston Dynamics' Spot all rely heavily on simulation for their walking controllers. Recent humanoid demos from Figure, 1X, Unitree, and Tesla follow the same playbook: train in sim with randomization, then deploy on hardware. Manipulation is harder, partly because contact and friction are harder to simulate accurately, but RoboCasa, ManiSkill3, and the Stanford BEHAVIOR programs are pushing the state of the art.
A differentiable simulator can take gradients of physical quantities with respect to actions, parameters, or initial conditions. That lets gradient-based optimization replace some uses of reinforcement learning. Instead of sampling thousands of trajectories, you backpropagate through the dynamics directly.
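A toy version of this idea fits in a few lines. The sketch below assumes a one-dimensional point mass with linear drag and propagates hand-derived sensitivities alongside the state, standing in for what an autodiff framework would do automatically; the function name, constants, and learning rate are illustrative choices. Gradient descent then finds the initial velocity that lands the mass on a target position.

```python
def rollout_with_gradient(v0, steps=100, dt=0.01, drag=0.5):
    """Simulate a point mass with linear drag; also return d(final position)/d(v0)."""
    x, v = 0.0, v0
    dx_dv0, dv_dv0 = 0.0, 1.0   # sensitivities of position and velocity w.r.t. v0
    for _ in range(steps):
        # Dynamics step (semi-implicit Euler).
        v = v - drag * v * dt
        x = x + v * dt
        # The same step applied to the sensitivities: chain rule through the dynamics.
        dv_dv0 = dv_dv0 - drag * dv_dv0 * dt
        dx_dv0 = dx_dv0 + dv_dv0 * dt
    return x, dx_dv0


target = 1.0
v0 = 0.0
learning_rate = 0.5
for _ in range(50):
    x_final, dx_dv0 = rollout_with_gradient(v0)
    loss_grad = 2.0 * (x_final - target) * dx_dv0   # gradient of (x_final - target)**2
    v0 -= learning_rate * loss_grad

x_final, _ = rollout_with_gradient(v0)
print("initial velocity:", v0, "final position:", x_final)
```

Each optimization step uses one rollout and an exact gradient, where a sampling-based method would need many rollouts to estimate the same direction. The dynamics here are smooth; contact would break that convenience.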
| Engine | Differentiable? | Notes |
|---|---|---|
| Brax | Yes (JAX) | First-class differentiability; widely used for JAX-based control research |
| MuJoCo MJX | Yes | Analytic and finite-difference gradients via XLA |
| Genesis | Yes | Differentiable across rigid, soft, and MPM materials |
| Drake | Partial | Analytical gradients in some subsystems; AutoDiff scalars |
| DiffTaichi | Yes | Research framework for differentiable physics in Taichi |
| Newton (announced 2025) | Yes | Differentiable physics is a stated design goal |
Differentiable simulation is not a clean win. Contact and friction are non-smooth, so naive gradients can be biased or noisy, and sample-based methods like PPO often still beat gradient-based methods for tasks that involve a lot of contact. Where differentiable sim has worked well is for soft-body manipulation, parameter identification, trajectory optimization, and any setting where you want to optimize across many physical parameters at once.
The newest entry on the simulation side is not a physics engine at all. It is a neural network that learns to produce video conditioned on actions. Train such a model on enough gameplay or robot footage, and you get something that behaves like a simulator: it lets you take an action, and it shows you what would happen next.
The headline projects:
These systems are not drop-in replacements for physics engines. They have stunning visual fidelity but no guarantees of physical consistency, no contact mechanics, no notion of mass. What they offer is coverage: arbitrary scenes, arbitrary actions, no need to model the world by hand. The likely future is a combination, with classical physics simulators handling contact-rich manipulation and generative models handling visual diversity, sensor simulation, and rare scenarios.
A closely related but distinct line of work is the use of learned world models inside RL itself. Rather than relying on an external simulator only at training time, the agent learns a world model: a neural network that it rolls out in its own head during training and even during deployment.
The canonical paper is Ha and Schmidhuber's "World Models" (2018), which trained a VAE plus RNN on car racing rollouts and showed that policies trained entirely "inside the dream" of the world model could transfer back to the real environment. Hafner's Dreamer line (DreamerV1 in 2019, DreamerV2 in 2021, DreamerV3 in 2023) generalized this. DreamerV3, "Mastering Diverse Domains through World Models" (Hafner, Pasukonis, Ba, Lillicrap), is a single-configuration algorithm that outperforms specialized methods across more than 150 tasks and was the first to collect diamonds in Minecraft from scratch. Other notable model-based RL methods include MuZero, IRIS (transformer-based world models), TD-MPC2, and Sutton's Dyna lineage going back to the 1990s.
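The loop shared by these methods (collect experience, fit a model, plan inside the model, act in the real environment) can be shown at toy scale. The sketch below is far simpler than any of the systems named above: a lookup table stands in for the neural world model, value iteration stands in for the learned policy, and the six-state corridor environment is an invented example.

```python
import random

N_STATES, GOAL = 6, 5
ACTIONS = (-1, +1)  # move left or right


def real_env_step(state, action):
    """The 'real' environment: a corridor with a reward at the right end."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


# 1. Collect experience with random actions and fit a tabular world model.
rng = random.Random(0)
model = {}  # (state, action) -> (next_state, reward)
for _ in range(500):
    s, a = rng.randrange(N_STATES), rng.choice(ACTIONS)
    model[(s, a)] = real_env_step(s, a)

# 2. Plan entirely inside the learned model: value iteration on imagined transitions.
gamma = 0.9
values = [0.0] * N_STATES


def q_value(s, a):
    # Unseen (state, action) pairs are assumed to change nothing and pay nothing.
    next_state, reward = model.get((s, a), (s, 0.0))
    return reward + gamma * values[next_state]


for _ in range(50):
    values = [max(q_value(s, a) for a in ACTIONS) for s in range(N_STATES)]
policy = [max(ACTIONS, key=lambda a, s=s: q_value(s, a)) for s in range(N_STATES)]

# 3. Deploy the planned policy back in the real environment.
state, steps = 0, 0
while state != GOAL and steps < 20:
    state, _ = real_env_step(state, policy[state])
    steps += 1
print("reached goal:", state == GOAL, "in", steps, "steps")
```

The planning stage never calls `real_env_step`; it only queries the learned model, which is the sense in which the policy is trained "inside the dream".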
The boundary between "a simulator" and "a world model" has gotten blurry. Genie 2, GameNGen, and Cosmos behave like simulators (you can take actions and observe results) but are trained from data rather than coded. The world models in Dreamer act as the policy's internal simulator. The unifying view is that anything that lets an agent ask "what happens if I do X?" is, functionally, a simulator.
The field has moved fast. A few markers from the past two years:
A reasonable read: the line between a physics simulator, a generative video model, and a robot foundation model is collapsing. It is now plausible to train a generalist robot policy almost entirely on synthetic data, with classical physics for contact and learned models for visual diversity.
Simulation is not a solved problem. The honest list of what still goes wrong:
None of this means simulation is going away. The opposite. Every part of the modern stack assumes it. But the gap between "works in the simulator" and "works in the real world" remains the central engineering problem of robotics and embodied AI.