Robotics is the interdisciplinary field of engineering and science concerned with the design, construction, operation, and use of physical machines, called robots, that sense, plan, and act in the real world. The discipline draws on mechanical engineering, electrical engineering, computer science, control theory, mathematics, and increasingly on artificial intelligence, machine learning, and the foundation models that emerged from large-scale deep learning research after 2017.
A robot is a programmable machine that combines three core capabilities: it perceives its surroundings through sensors, decides what to do through some form of computation or learned policy, and changes the world through actuators. The field spans tiny surgical instruments that suture inside a beating heart, factory arms that weld car bodies a thousand times an hour, autonomous vehicles that navigate city traffic without a human at the wheel, and bipedal humanoids designed to walk into a warehouse and pack boxes alongside human workers.
For most of the field's history, robotics and AI developed in parallel and only loosely connected. Industrial robots executed pre-programmed motions in caged work cells, while AI research focused on symbolic reasoning, search, and pattern recognition on disembodied data. That separation began to collapse in the 2020s. Large vision-language models trained on internet-scale image and text data showed they could be adapted to control robot arms; diffusion models originally built for image generation became state-of-the-art policies for dexterous manipulation; and a new class of systems known as vision-language-action models (VLAs) gave robots a single network that could see, read instructions, and produce motor commands. By 2026 the boundary between robotics and AI had largely dissolved at the research frontier, and the question of whether useful general-purpose robots were imminent was no longer dismissed as science fiction.
| Field | Detail |
|---|---|
| Discipline | Robotics |
| Parent fields | Mechanical engineering, electrical engineering, computer science, control theory, AI |
| Key subfields | Manipulation, locomotion, perception, planning, control, human-robot interaction |
| Year term coined | 1941 (by Isaac Asimov in the short story "Liar!") |
| First industrial robot | Unimate (1961, General Motors plant, Ewing Township, New Jersey) |
| Global installed base, industrial robots (2024) | 4.664 million units in operation [1] |
| Annual industrial robot installations (2024) | 542,000 units worldwide [1] |
| Largest national market | China (54% of 2024 installations) [1] |
| Flagship academic conferences | ICRA, IROS, RoboCup, Conference on Robot Learning (CoRL) |
| Standard middleware | Robot Operating System (ROS, ROS 2) |
| Common simulators | Gazebo, MuJoCo, NVIDIA Isaac Lab, PyBullet |
There is no single universally accepted definition of a robot, and the boundary between a robot and a complex machine has shifted as technology has matured. The Robotic Industries Association (now part of the Association for Advancing Automation) uses a definition aligned with ISO 8373: a programmable, multipurpose manipulator with several degrees of freedom that can move material, parts, tools, or specialized devices through variable programmed motions. That definition was written with industrial arms in mind and excludes many systems people now intuitively call robots, such as quadruped legged platforms, autonomous cars, and surgical teleoperators.
A more useful working definition for the modern field is functional. A robot is a physical system that:
- senses its environment through sensors,
- processes those measurements into a decision, whether through explicit programming or a learned policy, and
- acts on the physical world through actuators.
This sense-think-act loop is the canonical structure of a robotic system. Hardware that only senses (a security camera) or only acts (a remotely controlled drill) is not usually called a robot. The term "autonomous" is often appended to emphasize that the system makes decisions without continuous human control, but in practice most deployed robots in 2026 sit somewhere on a spectrum between full teleoperation and full autonomy.
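The loop can be made concrete with a few lines of illustrative Python. This is a minimal sketch only; `read_sensors`, `compute_action`, and `send_to_actuators` are hypothetical placeholders for whatever perception, policy, and driver code a particular robot actually uses.

```python
import time

CONTROL_HZ = 100          # a typical rate for a mobile-base control loop
DT = 1.0 / CONTROL_HZ

def read_sensors():
    """Hypothetical placeholder: return the latest camera, IMU, and encoder data."""
    ...

def compute_action(observation, goal):
    """Hypothetical placeholder: a planner or learned policy maps observation to a command."""
    ...

def send_to_actuators(command):
    """Hypothetical placeholder: write motor targets to the hardware drivers."""
    ...

def sense_think_act(goal):
    # The canonical loop: perceive, decide, act, at a fixed rate.
    while True:
        t_start = time.monotonic()
        observation = read_sensors()                  # sense
        command = compute_action(observation, goal)   # think
        send_to_actuators(command)                    # act
        # Sleep the remainder of the cycle to hold the control rate.
        time.sleep(max(0.0, DT - (time.monotonic() - t_start)))
```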
The word robot was coined by Czech writer Karel Capek in his 1920 play R.U.R. (Rossum's Universal Robots). It derives from the Czech word robota, meaning forced labor or drudgery, a term Capek's brother Josef suggested when Karel was searching for a name for the artificial workers in the play. The robots in R.U.R. are biological, built from a synthetic protoplasm, and the play ends with them rising up and exterminating their human creators. The concept of a manufactured servant species predated Capek by centuries (the golem of Jewish folklore, automata in Greek mythology, mechanical figures built by Al-Jazari in the 12th and 13th centuries), but the word itself entered global usage from the play.
The term robotics was coined by Isaac Asimov in the short story "Liar!" published in 1941, although it appeared more prominently in his 1942 story "Runaround," which also introduced the Three Laws of Robotics. Asimov assumed the word already existed and was surprised to learn he had invented it. The Three Laws (a robot may not harm a human or through inaction allow harm; must obey humans except where that would violate the first law; must protect itself except where that would violate the first two) became culturally pervasive but were not designed as engineering specifications. Modern roboticists generally regard them as a literary device rather than a basis for safety engineering, although the questions they raise about machine ethics remain alive.
The history of robotics is often told in three overlapping waves: industrial automation from the 1950s to the 1990s, mobile and humanoid research from the 1970s through the 2010s, and the current AI-driven era of general-purpose robots that began around 2022.
Mechanical figures that mimicked living motion existed long before electricity. The Antikythera mechanism (circa 100 BCE) is the earliest known geared computing device, and Hellenistic engineers such as Ctesibius and Hero of Alexandria designed water-powered automata. In the medieval Islamic world, Ismail al-Jazari described programmable musical automata and a humanoid waiter. In 18th-century Europe, Jacques de Vaucanson built a mechanical duck that appeared to digest food, and Pierre Jaquet-Droz constructed three writing and drawing automata that survive in working order at the Musee d'Art et d'Histoire in Neuchatel. Charles Babbage's analytical engine (1830s, never fully built) and the Jacquard loom (1801, programmed with punch cards) supplied conceptual ingredients that 20th-century roboticists would later combine.
The modern field began with George Devol and Joseph Engelberger. Devol filed a patent in 1954 for a "Programmed Article Transfer" device, a hydraulic arm whose motions could be recorded and played back. Engelberger, an engineer who had read Asimov's stories, met Devol at a cocktail party in 1956 and recognized the patent's commercial potential. Together they founded Unimation, the first industrial robotics company. The first Unimate arm was installed at a General Motors die-casting plant in Ewing Township, New Jersey in 1961, where it unloaded hot castings, a job dangerous for human workers.
The Unimate weighed about 1,800 kg, had hydraulic actuators, and stored its program on a magnetic drum. It was a commercial success: GM ordered more, Volkswagen and Fiat followed, and Engelberger licensed the design to Kawasaki Heavy Industries in 1969, which began manufacturing the Kawasaki-Unimate as Japan's first domestically produced industrial robot. By 1970 the basic shape of the industry had been set: large hydraulic or electric arms, mounted in fixed cells, executing repetitive motions in welding, painting, and material handling.
While industry refined the assembly-line arm, university and government laboratories pursued more general-purpose machines. In 1966 SRI International began building Shakey the Robot, a wheeled platform topped with a TV camera and a bump sensor. Shakey is generally regarded as the first mobile robot capable of reasoning about its actions. It planned routes through a room of large blocks and ramps, and the project produced foundational work on the A* search algorithm and on the visibility graph approach to path planning.
In 1969 Victor Scheinman at Stanford built the Stanford Arm, the first electrically powered, six-axis articulated arm controlled by a computer. Scheinman went on to design the PUMA (Programmable Universal Machine for Assembly) at Unimation in the late 1970s, which became the standard form factor copied by industrial arm makers for decades.
Japan led the early effort on humanoid robots. In 1973, Ichiro Kato and his team at Waseda University unveiled WABOT-1, the first full-scale humanoid robot. WABOT-1 could walk on two legs, grasp objects with its hands, and converse in basic Japanese. Honda began a secret humanoid program in 1986, going through a P-series of prototypes through the 1990s and unveiling ASIMO in 2000. ASIMO could walk, run at 6 km/h, climb stairs, recognize faces, and pour drinks; it became a global ambassador for humanoid research, although Honda retired the project in 2018 to focus on other mobility research.
In the 1980s Rodney Brooks at MIT proposed the subsumption architecture, a behavior-based design that rejected the symbolic, deliberative AI approach of Shakey. Brooks built insect-like robots that reacted to sensors directly, layering simple behaviors instead of maintaining a central world model. His ideas influenced a generation of mobile robots, and in 1990 he co-founded iRobot, which would later release the Roomba vacuum (2002), the first mass-market consumer robot, with more than 50 million units sold by the mid-2020s.
The DARPA Grand Challenge competitions catalyzed self-driving research. In the 2004 challenge no team finished the roughly 240 km desert course; the next year Stanford's Stanley, led by Sebastian Thrun, completed the 212 km 2005 course in just under seven hours. The 2007 Urban Challenge required vehicles to navigate traffic and obey road rules. Many engineers from those teams later founded or led autonomous vehicle programs at Waymo, Cruise, Aurora, and Tesla.
The DARPA Robotics Challenge (2012 to 2015) focused on disaster-response humanoids, motivated by the Fukushima nuclear accident. The DRC Finals in Pomona, California in June 2015 required robots to drive a vehicle, open doors, turn valves, climb stairs, and use power tools. Most entries fell over repeatedly, producing viral blooper reels that captured how brittle hand-engineered humanoid control still was. The DARPA Subterranean Challenge (2018 to 2021) extended this idea to multi-robot teams exploring caves, tunnels, and urban underground environments.
The success of AlexNet on ImageNet in 2012 spilled into robotics within a few years. Researchers began replacing hand-engineered perception pipelines with convolutional neural networks, and by 2016 Sergey Levine and colleagues at Google had a famous "arm farm" of 14 robots learning grasping policies from raw pixels, accumulating hundreds of thousands of grasp attempts. Deep reinforcement learning results from DeepMind and OpenAI inspired sim-to-real research: in 2019 OpenAI demonstrated a Shadow Hand solving a Rubik's Cube one-handed, with the policy trained entirely in simulation through aggressive domain randomization.
Quadruped robots became a public symbol of the era. Boston Dynamics, spun out of MIT in 1992, released Spot in 2019 as its first commercial product. Spot ran on a model-predictive controller and could trot, climb stairs, and recover from kicks. ETH Zurich's spinoff ANYbotics shipped its ANYmal quadruped for industrial inspection. Chinese company Unitree Robotics drove prices down dramatically: by 2024 its Go2 quadruped retailed for under $2,000, more than an order of magnitude cheaper than Spot.
The release of ChatGPT in November 2022 changed expectations across robotics overnight. Within months, multiple research groups had wired large language models into robot perception and planning. In July 2023 Google DeepMind introduced RT-2, the first widely cited vision-language-action model, which fine-tuned a vision-language model on robot trajectories so the same network that captioned images could also output joint commands. The pattern was set: pretrain on the internet, fine-tune on robot data, deploy a single multi-task policy.
Venture capital flooded into humanoid startups. Figure AI, founded by Brett Adcock in 2022, raised more than $1.5 billion across multiple rounds and reached a reported $39 billion valuation in 2026. Apptronik raised $520 million at a $5 billion valuation in February 2026. Tesla announced the Tesla Optimus program in 2021 and committed roughly $20 billion in 2026 capital expenditure to its production. 1X Technologies (formerly Halodi), Sanctuary AI, Agility Robotics, and Chinese firms Unitree, Fourier Intelligence, UBTech, XPENG Robotics, AgiBot, and RobotEra all shipped or announced humanoid platforms.
In parallel, NVIDIA repositioned itself as the platform layer for the new generation of robots. Its Isaac software stack, the Isaac Lab GPU-accelerated simulator, and the GR00T family of humanoid foundation models are used or supported by most major robot makers, and Jensen Huang has repeatedly described physical AI as the next major market for the company's accelerators.
| Year | Milestone |
|---|---|
| 1920 | Karel Capek's play R.U.R. introduces the word robot |
| 1941 to 1942 | Isaac Asimov coins the word robotics in "Liar!" and introduces the Three Laws of Robotics in "Runaround" |
| 1954 | George Devol files patent for the first programmable robot arm |
| 1961 | Unimate installed at General Motors, the first industrial robot in production use |
| 1966 to 1972 | Shakey the Robot at SRI International, the first mobile reasoning robot |
| 1969 | Stanford Arm by Victor Scheinman |
| 1973 | WABOT-1 at Waseda University, first full-scale humanoid |
| 1978 | PUMA arm released by Unimation |
| 1986 | LEGO releases the educational Mindstorms predecessor sets |
| 1989 | MIT's Genghis robot demonstrates Brooks's subsumption architecture |
| 1997 | Sojourner rover lands on Mars; Honda unveils the P3 humanoid; first RoboCup tournament |
| 2000 | Honda introduces ASIMO |
| 2002 | iRobot Roomba ships, first mass-market home robot |
| 2004 to 2007 | DARPA Grand Challenge and Urban Challenge |
| 2007 | Robot Operating System (ROS) released by Willow Garage |
| 2010 | Da Vinci Si surgical system from Intuitive Surgical becomes widely deployed |
| 2012 | Kiva Systems acquired by Amazon for $775M, foundation of Amazon Robotics |
| 2013 | Boston Dynamics' Atlas debuts in DRC trials |
| 2015 | DARPA Robotics Challenge Finals |
| 2016 | Universal Robots and others mainstream the collaborative robot (cobot) |
| 2019 | OpenAI's Shadow Hand solves a Rubik's Cube; Boston Dynamics releases Spot commercially |
| 2021 | Tesla announces Optimus; DARPA Subterranean Challenge concludes |
| 2022 | ChatGPT released, accelerating LLM use in robot planning; Figure AI founded |
| 2023 | Google DeepMind publishes RT-2, the first widely known VLA |
| 2024 | Boston Dynamics retires hydraulic Atlas, unveils all-electric Atlas; OpenVLA released |
| 2025 | Physical Intelligence releases pi-0.5, successor to its late-2024 pi-zero; NVIDIA releases Isaac GR00T N1; UBTech begins mass production of Walker S2 |
| 2026 | Boston Dynamics Atlas enters production; Tesla, Figure, Apptronik, Agility scale humanoid deployments |
A modern robot has roughly four layers: mechanical, electrical, sensing, and computational. The boundaries blur because clever design pushes capability into whichever layer is cheapest, but the layers are useful for thinking about how a robot is put together.
The mechanical layer determines what motions are physically possible. Industrial arms are characterized by their degrees of freedom (DOF), the number of independent joint axes. A typical six-axis arm can reach any position and orientation within its workspace; redundant seven-axis arms add an extra elbow joint that allows the same end-effector pose to be reached with multiple arm configurations, which is useful for avoiding obstacles. Humanoid robots have many more DOF: Boston Dynamics's electric Atlas has 56 DOF, Apptronik Apollo has roughly 71, Tesla Optimus has about 28, and Figure 02 has around 41 [3][4].
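The mapping from joint angles to end-effector pose (forward kinematics) can be illustrated with a planar three-joint arm. This is a minimal sketch under assumed, arbitrary link lengths, not the kinematics of any particular commercial robot.

```python
import numpy as np

def planar_arm_fk(joint_angles, link_lengths):
    """Forward kinematics of a planar serial arm: each revolute joint adds its
    angle to the running orientation, and each link extends the chain along it."""
    x, y, theta = 0.0, 0.0, 0.0
    for q, l in zip(joint_angles, link_lengths):
        theta += q                      # joint rotates the rest of the chain
        x += l * np.cos(theta)          # link extends along the current heading
        y += l * np.sin(theta)
    return x, y, theta                  # end-effector position and orientation

# Example: a 3-DOF arm with 0.4 m, 0.3 m, and 0.1 m links.
print(planar_arm_fk([np.pi / 4, -np.pi / 6, np.pi / 12], [0.4, 0.3, 0.1]))
```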
Manipulators end in end-effectors: parallel-jaw grippers, suction cups, multi-finger hands, or task-specific tools such as welding torches or paint spray heads. Multi-finger hands like the Shadow Dexterous Hand reproduce most of the human hand's 24 articulations; Shadow's newer DEX-EE platform, announced in May 2025, uses a tendon-driven design with multiple motors per joint to improve robustness, and integrates hundreds of channels of tactile sensing per finger [11].
Mobile robots use wheels, tracks, or legs. Differential-drive wheeled bases are the simplest and most common in warehouses; mecanum wheels allow holonomic motion in any direction; legged platforms (quadrupeds, bipeds, hexapods) trade mechanical complexity for the ability to traverse stairs, debris, and narrow gaps that wheeled robots cannot.
Actuators turn electrical or fluid power into motion. The dominant choices are:
| Actuator type | Strengths | Weaknesses | Typical use |
|---|---|---|---|
| Brushless DC motors with gearboxes | Clean, programmable, energy-efficient | Limited peak torque without large gear ratios | Most modern robot arms, quadrupeds, humanoids |
| Quasi-direct-drive motors | High torque-to-weight, low gear ratio, transparent backdrivability | Expensive, heat dissipation | Modern legged robots (Mini Cheetah lineage, Unitree, Boston Dynamics electric Atlas) |
| Hydraulic actuators | Very high force density, robust to shock | Heavy power packs, leaks, noise | Older Boston Dynamics Atlas, heavy industrial arms |
| Pneumatic actuators | Light, compliant, cheap | Hard to control precisely, energy inefficient | Soft robotics, simple grippers |
| Series elastic actuators | Force control, safe interaction | Bandwidth tradeoffs | Cobots, prosthetics, rehabilitation |
| Tendon-driven systems | Place motors away from joints, biologically inspired | Cable wear, complex routing | Dexterous hands like the Shadow Hand, some humanoids |
The shift from hydraulic to electric actuation is the defining mechanical story of the past decade. Boston Dynamics retired the iconic hydraulic Atlas in 2024 and replaced it with a fully electric design built around custom Hyundai Mobis actuators [5]. Quasi-direct-drive motors, popularized by Sangbae Kim's group at MIT through the Cheetah robot lineage in the early 2010s, gave designers high-bandwidth torque control without the seal failures and acoustic noise of hydraulics, and now appear in essentially every new humanoid platform.
The sensor stack determines what a robot can perceive. The dominant categories include:
| Sensor | What it measures | Notes |
|---|---|---|
| RGB camera | Color images | Cheap, ubiquitous, used for object detection and VLA inputs |
| Stereo camera | Depth via disparity | Examples: Intel RealSense, ZED, Luxonis OAK-D |
| Time-of-flight depth camera | Direct distance via light pulse timing | Microsoft Azure Kinect, iPhone Pro depth, RoboSense AC1 [12] |
| Lidar | 3D point clouds via laser scanning | Velodyne, Ouster, Hesai, Innoviz; mainstay for autonomous vehicles |
| IMU (inertial measurement unit) | Linear acceleration and angular rate | Used for pose tracking, balance |
| Joint encoders | Position and velocity of each joint | Required for feedback control |
| Force-torque sensors | 6-axis force at the wrist or fingertips | Critical for assembly, surgical robots |
| Tactile sensors | Pressure, slip, texture on contact | DIGIT, GelSight, Shadow's tactile fingertips |
| GPS / GNSS | Global position outdoors | Used in autonomous vehicles, agricultural robots |
| Microphones | Audio | Speech interfaces, anomaly detection |
Most serious robots fuse multiple sensor modalities. A self-driving car may carry six lidars, twelve cameras, an IMU, GPS, wheel encoders, and several radars; a humanoid in a warehouse uses depth cameras for object detection, encoders for joint feedback, an IMU for balance, and tactile sensors at the fingertips. Sensor fusion combines noisy estimates from different sources into a single coherent state estimate, classically through Kalman filters and their nonlinear variants, more recently through learned networks.
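A one-dimensional example shows the basic idea behind Kalman-filter fusion: a noisy wheel-odometry prediction and a noisy range measurement are blended into a single position estimate weighted by their uncertainties. The noise values below are made up for illustration.

```python
def kalman_predict(x, P, u, Q):
    """Prediction step: apply odometry motion u, inflating uncertainty by Q."""
    return x + u, P + Q

def kalman_update(x, P, z, R):
    """Measurement update: blend the prediction x (variance P)
    with the measurement z (variance R)."""
    K = P / (P + R)                 # Kalman gain: how much to trust the measurement
    x_new = x + K * (z - x)         # corrected estimate
    P_new = (1.0 - K) * P           # reduced uncertainty after the update
    return x_new, P_new

# Robot believes it is at 0.0 m with 0.5 m^2 variance, moves forward 1.0 m,
# then a range sensor reports 1.2 m with 0.1 m^2 variance.
x, P = 0.0, 0.5
x, P = kalman_predict(x, P, u=1.0, Q=0.05)
x, P = kalman_update(x, P, z=1.2, R=0.1)
print(f"fused position estimate: {x:.3f} m, variance {P:.3f}")
```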
On-board compute has grown dramatically. The original Unimate had a magnetic-drum sequencer; today's humanoids carry GPU-class processors. Boston Dynamics's electric Atlas integrates NVIDIA Jetson modules with a custom compute stack, and the Apptronik Apollo runs on Jetson AGX Orin and Jetson NX modules providing more than 275 TOPS of AI inference performance, while leveraging NVIDIA's Isaac GR00T foundation models for skill learning [3]. Edge inference matters because round-trips to the cloud add unacceptable latency for closed-loop control: a humanoid balance loop must execute at hundreds of hertz, and even a 100-millisecond network delay would topple it.
Robotics is large enough that researchers tend to specialize. The major subfields below correspond roughly to chapters in standard textbooks such as Siciliano and Khatib's Springer Handbook of Robotics and Lynch and Park's Modern Robotics.
Robot perception is the process of converting raw sensor data into structured information about the world: where am I, where are the objects around me, what are they, and how can I interact with them? Subproblems include 3D scene reconstruction, object detection and pose estimation, semantic segmentation, motion estimation, and tactile sensing. Modern robot perception relies heavily on convolutional neural networks and vision transformers, often pretrained on large image datasets and fine-tuned for the robot's specific environment.
Simultaneous Localization and Mapping (SLAM) is the canonical robotics perception problem. A robot moving through an unknown environment must build a map of that environment and simultaneously estimate its own pose within the map. Classical SLAM uses extended Kalman filters or graph optimization (g2o, GTSAM, Ceres Solver); modern SLAM systems include ORB-SLAM3, LSD-SLAM, RTAB-Map, and Cartographer [9]. SLAM technology is now embedded in mass-market products such as Roomba vacuums, Meta Quest VR headsets, and most autonomous vehicles.
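Graph-based SLAM can be reduced to a toy example: a robot takes four noisy odometry steps along a line and then recognizes its starting point (a loop closure). Writing each constraint as a row of a linear system and solving it in the least-squares sense corrects all the poses at once. This is a deliberately simplified 1D sketch of the idea behind graph optimizers such as g2o and GTSAM, not their actual APIs.

```python
import numpy as np

# Five 1D poses x0..x4. Constraints: a prior anchoring x0 at 0, odometry between
# consecutive poses, and one loop closure relating x4 back to x0.
odometry = [1.1, 0.9, 1.05, 0.95]      # noisy measured displacements
loop_closure = 3.9                      # measured displacement x4 - x0

A, b = [], []
A.append([1, 0, 0, 0, 0]); b.append(0.0)                  # prior: x0 = 0
for i, u in enumerate(odometry):                           # x_{i+1} - x_i = u_i
    row = [0] * 5
    row[i], row[i + 1] = -1, 1
    A.append(row); b.append(u)
A.append([-1, 0, 0, 0, 1]); b.append(loop_closure)         # loop closure: x4 - x0

x, *_ = np.linalg.lstsq(np.array(A, float), np.array(b), rcond=None)
print("optimized poses:", np.round(x, 3))
```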
Manipulation covers everything to do with grasping, moving, and assembling objects. The classical pipeline is perception (find the object), grasp planning (compute a stable grip), motion planning (find a collision-free trajectory), and control (execute the trajectory while compensating for disturbances). Each step has a deep literature. Form-closure and force-closure analysis from the 1980s gave a mathematical theory of grasping; sampling-based motion planners like the Rapidly-exploring Random Tree (RRT, 1998) and the Probabilistic Roadmap Method (PRM, 1996) made high-dimensional planning practical.
Learning-based manipulation became dominant in the late 2010s. The 2018 PoseCNN and follow-up work demonstrated learned 6D pose estimation; in 2023 the Diffusion Policy paper from Cheng Chi, Russ Tedrake, Shuran Song, and collaborators showed that representing a visuomotor policy as a conditional denoising diffusion process beat existing state-of-the-art methods by an average of 46.9% across 15 manipulation tasks [10]. The combination of imitation learning from human teleoperation, large-scale data collection efforts such as the Open X-Embodiment dataset (more than one million episodes from 22 different robot embodiments across 21 institutions), and diffusion- or flow-based policies has driven rapid progress in dexterous tasks like laundry folding, table bussing, and assembly.
Locomotion is the study of how robots move through space. Wheeled locomotion is well understood and dominates warehouse logistics. Legged locomotion is harder because legged robots are underactuated and inherently unstable: they fall over unless their controller actively manages balance.
The modern era of legged robotics combines model-predictive control with reinforcement learning trained in simulation. ETH Zurich's ANYmal showed that policies trained entirely in Isaac Gym and similar GPU-accelerated simulators could transfer zero-shot to real quadrupeds and outperform hand-engineered controllers on rough terrain. Unitree's H1 humanoid reached a record 3.3 m/s walking speed in 2024 using similar techniques [6]. Atlas, Optimus, Apollo, and Digit all use blends of model-based whole-body control with learned components for difficult terrain.
Bipedal locomotion is dramatically harder than quadrupedal: a biped has two contact points instead of four and must constantly manage center-of-pressure within a small support polygon. The Linear Inverted Pendulum Model and Zero Moment Point (ZMP) criterion from Miomir Vukobratovic provided much of the classical theoretical foundation for bipedal walking; modern controllers extend these with whole-body control, MPC, and learned compensators for unmodeled dynamics.
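The Linear Inverted Pendulum Model reduces the walking robot to a point mass held at constant height z_c above its center of pressure p; the center-of-mass dynamics then become linear, x_ddot = (g / z_c) * (x - p). The short simulation below integrates that equation to show how shifting the center of pressure (as when taking a step) steers the mass; the numbers are illustrative only.

```python
G = 9.81          # gravity, m/s^2
Z_C = 0.9         # assumed constant center-of-mass height, m
DT = 0.005        # integration step, s

def lipm_step(x, x_dot, p):
    """One Euler step of the Linear Inverted Pendulum Model:
    x_ddot = (g / z_c) * (x - p), where p is the center of pressure."""
    x_ddot = (G / Z_C) * (x - p)
    return x + x_dot * DT, x_dot + x_ddot * DT

# Start with the mass 5 cm ahead of the stance foot and let it fall forward,
# then shift the center of pressure forward at t = 0.5 s to catch it.
x, x_dot = 0.05, 0.0
for k in range(200):
    p = 0.0 if k < 100 else 0.15
    x, x_dot = lipm_step(x, x_dot, p)
print(f"CoM position {x:.3f} m, velocity {x_dot:.3f} m/s after 1 s")
```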
Motion planning finds a path from a start state to a goal state through a robot's configuration space, avoiding obstacles. Algorithms range from grid-based search (A*, Dijkstra), to sampling-based planners (RRT, RRT*, PRM), to optimization-based methods like CHOMP, STOMP, and TrajOpt that smooth and refine an initial trajectory. Task and motion planning (TAMP) integrates symbolic task planning with continuous motion planning, useful for problems like "clear the dishes off the counter" where a robot must decide an order of operations while also planning each individual motion.
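A minimal rapidly-exploring random tree in a 2D world with a circular obstacle illustrates the sampling-based approach: repeatedly sample a random point, extend the nearest tree node toward it by a fixed step, and keep the new node if it is collision-free. Real planners add configuration-space collision checking, rewiring (RRT*), and many refinements; this sketch, with made-up world bounds and obstacle, keeps only the core loop.

```python
import math
import random

OBSTACLES = [((5.0, 5.0), 1.5)]          # (center, radius) circles
STEP = 0.5
GOAL = (9.0, 9.0)

def collision_free(p):
    return all(math.dist(p, c) > r for c, r in OBSTACLES)

def rrt(start, goal, max_iters=5000):
    nodes = [start]
    parent = {start: None}
    for _ in range(max_iters):
        # Sample a random point, with a small bias toward the goal.
        sample = goal if random.random() < 0.05 else (random.uniform(0, 10),
                                                      random.uniform(0, 10))
        nearest = min(nodes, key=lambda n: math.dist(n, sample))
        d = math.dist(nearest, sample)
        if d == 0.0:
            continue
        # Step from the nearest node toward the sample.
        new = (nearest[0] + STEP * (sample[0] - nearest[0]) / d,
               nearest[1] + STEP * (sample[1] - nearest[1]) / d)
        if not collision_free(new):
            continue
        nodes.append(new)
        parent[new] = nearest
        if math.dist(new, goal) < STEP:   # close enough: trace the path back
            path = [new]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None

path = rrt((1.0, 1.0), GOAL)
print(f"found path with {len(path)} waypoints" if path else "no path found")
```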
Foundation models have begun to take over the high-level planning role. A robot equipped with a large language model can decompose a natural-language instruction ("make me a sandwich") into a sequence of subtasks, then call lower-level skills to execute each one. Google's SayCan (2022) was an early demonstration; more recent systems integrate VLMs and VLAs more tightly so the same network handles both planning and control.
Robot control is the layer that translates high-level commands into actuator signals. PID controllers handle simple position and velocity loops. Model-based methods include computed torque control, inverse dynamics, model predictive control (MPC), and operational space control. Modern legged and humanoid robots use whole-body control frameworks that solve a quadratic program at each control step to coordinate all joints under multiple constraints.
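The lowest layer is often nothing more exotic than a PID loop per joint. The sketch below shows a discrete-time PID position controller driving a crude unit-inertia joint model; the gains, time step, and setpoint are illustrative values, not tuned for any real actuator.

```python
class PID:
    """Discrete-time PID controller: output = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt                     # accumulate error
        derivative = (error - self.prev_error) / self.dt     # rate of change of error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Drive a simulated unit-inertia joint toward 1.0 rad.
pid = PID(kp=8.0, ki=0.5, kd=2.0, dt=0.01)
position, velocity = 0.0, 0.0
for _ in range(500):                       # 5 seconds at 100 Hz
    torque = pid.update(setpoint=1.0, measurement=position)
    velocity += torque * 0.01              # Euler integration of the joint dynamics
    position += velocity * 0.01
print(f"joint position after 5 s: {position:.3f} rad")
```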
There is also a growing literature on learned control, where the controller itself is a neural network trained by reinforcement learning or imitation learning. Learned controllers tend to be brittle to distribution shift but perform well in domains where the dynamics are too complex to model precisely, such as deformable object manipulation or contact-rich assembly.
Human-robot interaction (HRI) studies how people perceive, communicate with, and work alongside robots. It draws on psychology, sociology, and design as much as engineering. The field examines questions like how to design robot speech and gesture, how trust calibration works between humans and machine partners, what level of autonomy a person prefers in a given task, and how shared control between human and robot should be structured. The rise of collaborative robots (cobots) and home humanoids has made HRI commercially relevant, not just academic.
For most of the 20th century, AI and robotics were the same field at universities and different fields in industry. AI researchers wrote planners and pattern recognizers; robot makers built precise, scripted machines. The two halves have now fused, driven by three intersecting trends: foundation models trained on internet-scale data, the transformer architecture's ability to handle multimodal inputs, and the maturation of GPU-based simulation that lets policies be trained in simulation and transferred to physical hardware.
A robot foundation model is a large neural network pretrained on broad data and then adapted to many downstream robot tasks, in the same spirit that GPT-4 is a language foundation model. Robot foundation models typically take in images and natural language instructions and output low-level actions (joint targets, end-effector velocities, or discrete motion primitives).
Vision-language-action models (VLAs) are the most prominent class. The original RT-2, introduced by Google DeepMind in mid-2023, fine-tuned the PaLM-E and PaLI-X vision-language models on robot demonstration data, encoding actions as discrete tokens that the model could emit alongside text [7]. OpenVLA, released in June 2024 by researchers at Stanford, is a 7-billion-parameter open-source VLA trained on the Open X-Embodiment dataset of more than a million episodes; despite being smaller than RT-2 it outperformed it on a suite of manipulation tasks [7].
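The action-as-tokens idea can be illustrated independently of any particular model: each dimension of a continuous action (for example, an end-effector delta plus a gripper command) is clipped to a range and uniformly binned into a small vocabulary of discrete tokens, which a language model can emit like ordinary text and which are decoded back to continuous commands at execution time. The bin count and range below are illustrative assumptions, not the values of any published model.

```python
import numpy as np

NUM_BINS = 256             # illustrative vocabulary size per action dimension
LOW, HIGH = -1.0, 1.0      # assumed normalized action range per dimension

def actions_to_tokens(action):
    """Map a continuous action vector to discrete token ids by uniform binning."""
    clipped = np.clip(action, LOW, HIGH)
    scaled = (clipped - LOW) / (HIGH - LOW)          # -> [0, 1]
    return np.round(scaled * (NUM_BINS - 1)).astype(int)

def tokens_to_actions(tokens):
    """Inverse mapping: token ids back to bin-center continuous values."""
    return tokens / (NUM_BINS - 1) * (HIGH - LOW) + LOW

action = np.array([0.12, -0.4, 0.03, 0.0, 0.0, 0.25, 1.0])   # e.g. xyz, rpy, gripper
tokens = actions_to_tokens(action)
recovered = tokens_to_actions(tokens)
print(tokens, np.max(np.abs(recovered - action)))             # quantization error
```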
Physical Intelligence's pi-zero (also written pi-0), released in late 2024, replaced discrete action tokens with flow matching to produce smooth continuous action trajectories at 50 Hz, enabling dexterous tasks like folding laundry, bussing tables, and bagging groceries. Its successor pi-0.5, released in 2025, extended pi-zero with open-world generalization, enabling a mobile manipulator to clean entirely new kitchens and bedrooms it had never seen during training [8]. Figure AI released its proprietary Helix VLA in February 2025, claiming a dual-system architecture with a high-frequency low-level policy supervised by a slower vision-language reasoner; Figure simultaneously ended its partnership with OpenAI, with CEO Brett Adcock arguing that solving embodied AI requires vertical integration of the AI stack [13].
NVIDIA Isaac GR00T N1, released in March 2025, is the first openly available humanoid foundation model. It uses a dual-system architecture combining a vision-language reasoner with a diffusion transformer for continuous motion generation, and has been demonstrated controlling Fourier GR-1 and 1X NEO humanoids [14]. NVIDIA also reported using its Isaac GR00T blueprint to generate 780,000 synthetic robot trajectories (equivalent to 6,500 hours of human demonstration) in 11 hours of GPU time, and claimed a 40% performance improvement over real-data-only baselines when synthetic and real data are combined [14].
The ImageNet of robotics has not yet been built, but the field is heading toward one. Teleoperation rigs like the ALOHA system from Stanford (a bimanual setup using two leader and two follower arms), the Mobile ALOHA extension, and Tesla's Optimus teleoperation suits let human operators demonstrate tasks that are then used as training data for imitation learning policies. Companies are spending heavily on data collection: Tesla operates teleoperation centers in Texas, Apptronik captures data on its own robots, and Physical Intelligence collects continuously across multiple embodiments.
Classical imitation learning falls into two camps: behavioral cloning, which directly fits a policy to demonstrations through supervised learning, and inverse reinforcement learning, which infers the demonstrator's reward function and then plans against it. The 2023 Diffusion Policy work showed that representing the policy as a conditional denoising diffusion process gracefully handles multimodal action distributions and beats prior methods by large margins [10]. ALOHA and Diffusion Policy together became the de facto baselines for new manipulation research.
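Behavioral cloning in its simplest form is just supervised regression from observations to demonstrated actions. The PyTorch sketch below trains a small multilayer perceptron on random stand-in tensors; in a real pipeline the observations would be images or robot states and the actions would come from teleoperated demonstrations.

```python
import torch
import torch.nn as nn

# Stand-in demonstration data: 1,000 (observation, action) pairs.
# Real pipelines would load teleoperated trajectories instead of random tensors.
obs_dim, act_dim = 32, 7
observations = torch.randn(1000, obs_dim)
actions = torch.randn(1000, act_dim)

policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(20):
    for i in range(0, len(observations), 64):          # mini-batches of 64
        batch_obs = observations[i:i + 64]
        batch_act = actions[i:i + 64]
        loss = nn.functional.mse_loss(policy(batch_obs), batch_act)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# At deployment, the cloned policy maps a new observation to an action.
with torch.no_grad():
    print(policy(torch.randn(1, obs_dim)))
```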
Simulators are crucial because real-world robot data is expensive, slow to collect, and hard to scale. The challenge is the reality gap: a policy that works perfectly in simulation often fails on the physical robot because the simulator's friction, contact, and sensor models do not match reality.
Domain randomization, popularized by Josh Tobin and Pieter Abbeel at OpenAI in 2017, addresses this by deliberately varying simulator parameters during training so that the real world is just one more sample from a distribution the policy already handles. OpenAI's 2019 Rubik's Cube demonstration trained a Shadow Hand policy entirely in simulation with aggressive randomization of object friction, masses, lighting, and camera positions, then transferred it to a physical hand. Modern sim-to-real pipelines use NVIDIA Isaac Lab, MuJoCo MJX, Genesis, Gazebo, and similar tools, often training thousands of robot instances in parallel on a single GPU to gather years of equivalent experience in hours [15].
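The sampling side of domain randomization is conceptually simple: at the start of every training episode, draw physics and rendering parameters from broad distributions so the policy never overfits to one simulator configuration. The parameter names and ranges below are illustrative assumptions, and `make_randomized_env` is a hypothetical placeholder for configuring a simulator such as MuJoCo or Isaac Lab.

```python
import random

def sample_randomized_params():
    """Draw one set of simulation parameters from broad, illustrative ranges."""
    return {
        "friction":         random.uniform(0.4, 1.2),    # contact friction coefficient
        "object_mass_kg":   random.uniform(0.05, 0.8),
        "motor_strength":   random.uniform(0.8, 1.2),    # scale on commanded torques
        "sensor_latency_s": random.uniform(0.0, 0.04),
        "camera_yaw_deg":   random.gauss(0.0, 3.0),
        "light_intensity":  random.uniform(0.3, 1.5),
    }

def make_randomized_env(params):
    """Hypothetical placeholder: configure one simulator episode with these parameters."""
    ...

for episode in range(10000):
    params = sample_randomized_params()   # new physics and rendering every episode
    env = make_randomized_env(params)
    # ...collect a rollout in env and update the policy with RL or imitation learning...
```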
Isaac Gym, NVIDIA's earlier GPU-accelerated environment, has been deprecated in favor of Isaac Lab, which integrates non-linear actuator models, multi-frequency sensor simulation, and procedural environment generation [15]. Researchers have used Isaac Lab to train walking and manipulation policies for nearly every major commercial humanoid.
A growing body of work studies how to combine vision, touch, language, audio, and proprioception in a single learned policy. Tactile sensors like GelSight, DIGIT (developed at Meta), and Shadow's tactile fingertips give robots high-resolution feedback at the contact patch, which is essential for tasks like inserting a USB connector or folding fabric. Multimodal architectures, often based on transformer attention over heterogeneous tokens, are an active area of research at every major lab.
The commercial robot world in 2026 spans many distinct categories with different technologies, customers, and economics.
Industrial robots remain the largest commercial segment by units and revenue. The International Federation of Robotics (IFR) reported 542,000 units installed in 2024, more than double the volume of a decade earlier, bringing the global installed base to 4.664 million units in operation, a 9% year-over-year increase [1]. Asia accounted for 74% of new deployments; China alone took 54% of all 2024 installations (295,000 units), Japan installed 44,500 units, and Europe collectively installed 85,000, an 8% decline [1]. The IFR projects 575,000 installations in 2025 and expects the 700,000-per-year mark to be passed by 2028 [1].
The "Big Four" of traditional industrial robotics are ABB (Switzerland), Fanuc (Japan), Yaskawa (Japan), and Kuka (Germany, owned by Midea since 2017). In October 2025, SoftBank Group announced an agreement to acquire ABB's robotics division for approximately $5.4 billion, an indication of the strategic value placed on automation portfolios [16].
Collaborative robots, or cobots, are designed to work safely alongside humans without protective fencing. They use force-sensitive joints, rounded surfaces, and conservative control to limit injury if they collide with a person. Universal Robots (founded 2005, acquired by Teradyne in 2015) created the modern cobot category with the UR5 and remains a market leader. Other significant cobot makers include Techman Robot (Taiwan), Doosan Robotics (South Korea), AUBO (China), Franka Emika (Germany, with the Panda research arm popular in academic labs), and the cobot lines from each of the Big Four. The global cobot market was estimated at roughly $2.95 billion in 2025 with projections in the high teens of billions by the early 2030s [16].
Autonomous mobile robots (AMRs) move material through warehouses, factories, and hospitals without fixed paths. They differ from older automated guided vehicles (AGVs), which followed magnetic strips or QR codes, in that AMRs use SLAM and sensor-driven navigation to plan around obstacles dynamically.
Amazon Robotics, originally Kiva Systems before Amazon's $775 million acquisition in 2012, operates the largest robot fleet in the world. Amazon disclosed it had crossed 750,000 deployed mobile robots across its warehouses in 2025 [17]. Its main models include the Hercules drive units (descendants of the original Kiva robots), the Robin and Sparrow robotic arms, the Proteus fully autonomous mobile robot, and Vulcan, unveiled in May 2025 with a tactile sensing capability designed for picking variable items from inbound bins [17]. Other major AMR vendors include Geek+, Locus Robotics, Mobile Industrial Robots (MiR), OTTO Motors, Seegrid, and Hai Robotics. The global warehouse AMR market was around $5.3 billion in 2025 with projections to $28.7 billion by 2034 [17].
Quadruped robots found a commercial niche in industrial inspection. Boston Dynamics's Spot, launched commercially in 2019, is used for autonomous inspection of oil and gas facilities, power plants, mines, and construction sites. Boston Dynamics reported deploying more than 500 robots in 2025, generating roughly $130 million in combined revenue from Spot and the Stretch trailer-loading system [5]. ANYbotics's ANYmal, Ghost Robotics's Vision 60, and DEEP Robotics's quadrupeds compete in similar markets.
The Chinese cost competition has been ferocious. Unitree Robotics's Go2 quadruped retails for under $2,000 in entry-level configurations, opening quadrupeds to hobbyists, educators, and researchers who could not previously afford a serious legged platform.
The commercial humanoid push is the most-watched story in robotics in 2026. Capital, talent, and media attention have converged on a small number of platforms competing to be the first general-purpose humanoid in mass deployment.
| Platform | Maker | Specs and notes (2026) |
|---|---|---|
| Atlas (electric) | Boston Dynamics | 56 DOF, 2.3 m reach, 50 kg payload, custom Hyundai Mobis actuators; production began at Boston HQ January 2026; first deliveries to Hyundai's Robotics Metaplant Application Center and Google DeepMind [5] |
| Optimus | Tesla | ~28 DOF; targets $20,000 to $30,000 retail price; Tesla committed roughly $20 billion in 2026 capex toward Optimus production [4] |
| Figure 02 / Figure 03 | Figure AI | 5'6" tall, 20 kg payload, 5-hour battery, 1.2 m/s walking speed; Helix VLA launched February 2025; deployed two robots at BMW's Spartanburg plant for 11 months ending November 2025 (90,000+ parts handled, contributing to 30,000 vehicles) [13] |
| Apollo | Apptronik | 1.73 m, 72.6 kg, 25 kg payload, 4-hour battery, 71 DOF; runs NVIDIA Jetson AGX Orin / Jetson NX (~275 TOPS); leverages NVIDIA Project GR00T; raised $520M at $5B valuation in February 2026 [3] |
| Digit | Agility Robotics | Bipedal logistics robot with bird-like leg geometry, 8-hour shift battery; first humanoid in commercial revenue-generating deployment; multi-year RaaS contract with GXO Logistics at Spanx facility; pilots with Mercado Libre, Amazon, Schaeffler [18] |
| H1 / G1 | Unitree Robotics | H1: 1.8 m, 47 kg, peak joint torque 360 Nm, 3.3 m/s walking; G1: 1.27 m, 35 kg, 23 to 43 DOF, retail starting at low five figures USD [6] |
| Walker S2 | UBTech | Industrial humanoid; mass production announced 2025 with orders exceeding 800 million yuan; supports autonomous battery swap in 3 minutes |
| GR-1 / GR-2 | Fourier Intelligence | Initially positioned for rehabilitation and patient-care use; mass production claimed since 2023 |
| Phoenix | Sanctuary AI | Canadian humanoid; emphasizes general-purpose teleoperated and autonomous task execution |
| NEO | 1X Technologies | Soft-shell consumer humanoid backed by OpenAI Startup Fund |
Digit's deployment at GXO is widely credited as the first commercial humanoid contract paying for productive work [18]. Most other platforms are in pilot or pre-production at the time of writing. Whether these robots achieve the unit economics needed for mass deployment remains an open empirical question; the bull case rests on declining hardware costs, rapid VLA improvements, and labor shortages, and the bear case rests on persistent reliability issues, the difficulty of dexterous manipulation, and unfavorable comparisons against existing wheeled and arm-based automation that already does many warehouse jobs adequately.
Intuitive Surgical's Da Vinci system has dominated soft-tissue robotic surgery since the early 2000s. The fifth-generation Da Vinci 5, cleared by the FDA in March 2024 and broadly available in the U.S. by Q3 2025, brings 10,000 times the computing power of the Da Vinci Xi, force feedback at the instrument tip via a Force Gauge indicator, in-console video replay, and over 150 design improvements [19]. Approximately 1,200 Da Vinci 5 systems were installed and 270,000 procedures performed by early 2026, with U.S. utilization 11% higher than the predecessor Xi [19]. Competing platforms include Medtronic's Hugo, Johnson & Johnson's Ottava, CMR Surgical's Versius, and an expanding field of orthopedic and neurosurgical specialty systems.
Self-driving cars are robots whose primary task is locomotion through human road environments. Waymo, the Alphabet subsidiary that traces back to Google's 2009 self-driving project, operates by far the largest commercial robotaxi fleet, with roughly 2,500 to 3,000 vehicles spread across San Francisco, Los Angeles, Phoenix, Austin, and Atlanta as of late 2025, with weekly rides crossing 400,000 by the end of 2025 [20]. Tesla launched a small Robotaxi service in Austin in 2025, with the in-service fleet reportedly numbering in the tens rather than the hundreds Elon Musk had projected [20]. Zoox, Amazon's subsidiary, launched its first public-facing rides in 2025. Cruise wound down its operations after a 2023 incident in San Francisco. Outside the U.S., Chinese players Baidu Apollo (Apollo Go), Pony.ai, and WeRide operate sizeable fleets in multiple cities.
Aerial robots have become commonplace, from consumer quadcopters made by DJI and Skydio to delivery drones from Wing and Zipline to military systems. Autonomy stacks for drones combine GPS, IMU, vision-based obstacle avoidance, and increasingly learned policies for tasks like obstacle-avoidance flight and target tracking.
The iRobot Roomba, launched in 2002, was the first commercially successful home robot and has sold tens of millions of units. The category has since expanded to robot mops, lawn mowers (Husqvarna Automower, Mammotion), pool cleaners, and educational platforms (LEGO Mindstorms, VEX Robotics). Service robots in restaurants, hotels, and hospitals (delivery bots from Bear Robotics, Pudu, Keenon) have proliferated in Asia and increasingly elsewhere.
Most robotics systems share a common software architecture. At the bottom is real-time control code running on dedicated microcontrollers; in the middle is a publish-subscribe middleware that connects components; on top sit perception, planning, and decision-making.
Robot Operating System (ROS) is the dominant open-source framework. Originally developed at Willow Garage from 2007, ROS is a collection of libraries and tools that provide hardware abstraction, device drivers, message passing, package management, and visualization tools (rviz, rqt). Despite the name it is not an operating system: it runs on top of Linux. The original ROS 1 has been largely superseded by ROS 2 (Foxy, Galactic, Humble, Iron, Jazzy, Kilted releases through 2025), which uses DDS for real-time messaging and is suitable for production deployments. ROS 2 ships with slam_toolbox as the standard 2D SLAM solution; visual SLAM and lidar SLAM packages such as ORB-SLAM3, Cartographer, and RTAB-Map are widely used [9].
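A minimal ROS 2 node illustrates the publish-subscribe style: the node below publishes velocity commands on a `cmd_vel` topic at 10 Hz using the `rclpy` client library and the standard `geometry_msgs/Twist` message. The topic name and speed are illustrative; which topics a given robot listens to is defined by its drivers.

```python
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import Twist

class ForwardDriver(Node):
    """Publish a constant forward velocity command at 10 Hz."""

    def __init__(self):
        super().__init__('forward_driver')
        self.publisher = self.create_publisher(Twist, 'cmd_vel', 10)
        self.timer = self.create_timer(0.1, self.publish_command)

    def publish_command(self):
        msg = Twist()
        msg.linear.x = 0.2        # drive forward at 0.2 m/s
        msg.angular.z = 0.0       # no rotation
        self.publisher.publish(msg)

def main():
    rclpy.init()
    node = ForwardDriver()
    try:
        rclpy.spin(node)          # process timer callbacks until shut down
    finally:
        node.destroy_node()
        rclpy.shutdown()

if __name__ == '__main__':
    main()
```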
Simulators let researchers train and test robots without real hardware. Major options include:
| Simulator | Owner | Notes |
|---|---|---|
| Gazebo | Open Robotics | Long-standing ROS companion; multiple physics back-ends |
| MuJoCo | DeepMind (open-source) | Fast, accurate contact dynamics; MJX runs on GPU |
| NVIDIA Isaac Lab | NVIDIA | GPU-accelerated, integrates with Isaac Sim and GR00T; replaced Isaac Gym [15] |
| PyBullet | Open-source | Lightweight Python interface to Bullet physics |
| Webots | Cyberbotics | Educational and research; open-source since 2018 |
| Genesis | Genesis Embodied AI | 2024 entrant; designed for fast generative simulation |
| DRAKE | TRI / MIT | Trajectory optimization and dynamics for manipulation |
NVIDIA Isaac Sim, Omniverse, and the Newton physics engine (under development with Google DeepMind and Disney Research) are positioning themselves as the platform layer for synthetic data generation in the foundation-model era.
Libraries that wrap reinforcement learning, imitation learning, and policy training for robotics include LeRobot (Hugging Face), Stable-Baselines3, RLlib, Robosuite, and the Open X-Embodiment data tools. The pace of releases has accelerated: in 2024 and 2025 alone, public foundation-model releases like OpenVLA, pi-zero, pi-0.5, GR00T N1, and OpenPI have given small teams access to capabilities that previously required Google-scale infrastructure.
Robotics has a healthy conference culture, with several venues considered the equivalent of NeurIPS or ICML in machine learning.
| Conference | Sponsor | Typical scope |
|---|---|---|
| ICRA (IEEE International Conference on Robotics and Automation) | IEEE Robotics and Automation Society | Largest general robotics conference; ICRA 2025 was held in Atlanta, May 19 to 23, 2025 |
| IROS (IEEE/RSJ International Conference on Intelligent Robots and Systems) | IEEE / Robotics Society of Japan | Second flagship conference; IROS 2025 in Hangzhou, China received 5,083 submissions and accepted 1,991 papers [21] |
| RSS (Robotics: Science and Systems) | RSS Foundation | Selective single-track conference, often with foundational theoretical work |
| CoRL (Conference on Robot Learning) | Independent | Premier venue for learning-based robotics; founded 2017 |
| HRI (ACM/IEEE International Conference on Human-Robot Interaction) | ACM, IEEE | Human factors and HRI |
| ISRR (International Symposium on Robotics Research) | International Foundation of Robotics Research | Biennial, by-invitation, longer papers |
| RoboCup | RoboCup Federation | Annual competition (soccer, rescue, @home); founded 1997 with the goal of beating the human soccer World Cup champion by 2050 |
| WRC (World Robot Conference) | Chinese Institute of Electronics (Beijing) | Industry-oriented, dominant Chinese venue |
| Automate | A3 (Association for Advancing Automation) | North American industrial automation trade show |
Leading journals include the International Journal of Robotics Research (IJRR), IEEE Transactions on Robotics (T-RO), IEEE Robotics and Automation Letters (RA-L), Science Robotics, and the Journal of Field Robotics.
Recent figures from the IFR, company filings, and industry trackers paint a picture of a field that is large and growing in traditional segments while undergoing a structural shift toward general-purpose AI-driven robots.
| Metric | Value | Source |
|---|---|---|
| Industrial robots in operation worldwide (end of 2024) | 4.664 million units | IFR World Robotics 2025 [1] |
| Annual industrial robot installations (2024) | 542,000 units | IFR [1] |
| Year-over-year operational stock growth | +9% | IFR [1] |
| China share of 2024 installations | 54% (295,000 units) | IFR [1] |
| Japan installations (2024) | 44,500 units | IFR [1] |
| European installations (2024) | 85,000 units (down 8%) | IFR [1] |
| Projected 2025 industrial installations | 575,000 units | IFR forecast [1] |
| Projected installations to surpass | 700,000 units by 2028 | IFR forecast [1] |
| Global cobot market (2025) | ~$2.95 billion | Industry analyses [16] |
| Amazon mobile robot fleet | 750,000+ units | Amazon disclosures [17] |
| Warehouse AMR market (2025) | $5.3 billion | Market research [17] |
| Boston Dynamics 2025 revenue (Spot + Stretch) | ~$130 million | Company reporting [5] |
| Da Vinci 5 systems installed (early 2026) | ~1,200 systems, 270,000 procedures | Intuitive Surgical [19] |
| Waymo robotaxi fleet (late 2025) | ~2,500 to 3,000 vehicles | Independent trackers [20] |
| Waymo weekly ride volume (end of 2025) | ~400,000 | Waymo / press [20] |
| Apptronik valuation (Feb 2026) | $5 billion (raised $520M) | CNBC [3] |
| Figure AI valuation (2026) | ~$39 billion | Industry sources [13] |
Despite rapid progress, several long-standing problems remain unsolved.
Dexterous manipulation is the deepest of these. Folding a t-shirt cleanly, plugging in a USB cable on the first try, or screwing a small bolt under occlusion is still beyond the reliability bar that mass deployment demands. Tactile sensing is improving but remains rare on production hardware, and contact-rich tasks expose the gap between simulator dynamics and real friction and deformation.
Generalization across embodiments and environments is a second open question. The Open X-Embodiment dataset and models like pi-0.5 show that some skills transfer across robot bodies and into unseen rooms, but most VLA policies still require fine-tuning data from the specific robot and task. The gap between fine-tuned and zero-shot performance is the main empirical bottleneck for the foundation-model approach.
Safety, certification, and assurance remain underdeveloped relative to other safety-critical industries. Industrial robots have ISO 10218 and ISO/TS 15066 (for cobots), and surgical robots are regulated as medical devices. There is no comparable framework for general-purpose home or factory humanoids running learned policies whose internal behavior is hard to audit. The technical question of how to give safety guarantees over neural network controllers is an active research area, with formal verification of small networks beginning to scale to richer policy classes.
Long-horizon autonomy stresses both perception and planning. A robot that has to operate over hours or days, dealing with novel objects, partial failures, and changing goals, must combine reactive control with persistent memory and high-level reasoning. LLM-based agents have shown surprising competence at decomposing tasks but introduce their own failure modes, including hallucinated affordances and confident misidentification of objects.
Finally, the economic question of unit cost is unsettled. Industrial arms have a well-understood payback period in three- to five-shift operations. Humanoids have to compete not just with human labor but with the entire existing automation stack of conveyors, AMRs, and dedicated arms. Whether a $30,000 humanoid that can do 60% of human warehouse tasks at 50% of human throughput is a good investment depends on facility layout, labor markets, and reliability metrics that are only now starting to be measured at scale.