Physical AI refers to artificial intelligence systems that can perceive, understand, reason about, and interact with the physical world. Unlike purely digital AI, which operates on text, images, or data within software environments, physical AI bridges the gap between digital intelligence and real-world action. These systems combine advanced perception, cognitive reasoning, planning, and motor control to enable machines such as robots, autonomous vehicles, and industrial automation systems to operate intelligently in dynamic, unstructured environments.
The term gained widespread prominence through NVIDIA CEO Jensen Huang, who positioned physical AI as the third major era of artificial intelligence at CES 2025. Huang described the progression from perception AI (understanding images, words, and sounds) to generative AI (creating text, images, and media) to physical AI (perceiving, reasoning, planning, and acting in the real world). At CES 2026, Huang declared that "the ChatGPT moment for physical AI" had arrived, signaling that machines were beginning to understand, reason, and act in the physical world at a transformative scale.
Physical AI encompasses a broad set of capabilities that allow intelligent systems to function in the real world. At its core, a physical AI system must be able to:

- **Perceive** the environment through sensors such as cameras, LiDAR, radar, and tactile devices
- **Understand and reason** about what it perceives, building an internal representation of the scene
- **Plan** sequences of actions that accomplish a goal
- **Act** on the world through actuators, adjusting continuously in response to feedback
This closed-loop integration of perception, cognition, and action distinguishes physical AI from other forms of artificial intelligence. While a large language model like GPT-4 can reason about the world through text, it cannot fold laundry, drive a car, or assemble a product on a factory line. Physical AI aims to bring that level of intelligence into tangible, real-world applications.
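As a schematic, the closed loop can be expressed as a simple sense-plan-act cycle. The sketch below is illustrative only; `sensors`, `planner`, `controller`, and `actuators` are hypothetical placeholders, not components of any particular system:

```python
import time

def control_loop(sensors, planner, controller, actuators, rate_hz=50):
    """Schematic sense-plan-act loop at the heart of a physical AI system."""
    dt = 1.0 / rate_hz
    while True:
        observation = sensors.read()                  # perceive: cameras, LiDAR, IMU, ...
        plan = planner.update(observation)            # reason and plan at a higher level
        command = controller.step(plan, observation)  # translate plan into motor commands
        actuators.apply(command)                      # act on the physical world
        time.sleep(dt)                                # feedback arrives on the next tick
```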
Physical AI draws on insights from cognitive science and neuroscience, building on the idea that intelligence emerges from the dynamic coupling of perception, cognition, and physical interaction. This concept, sometimes called embodied cognition, suggests that an agent's physical form and its ability to interact with the environment are integral to how it develops and applies intelligence.
The perception layer serves as a physical AI system's sensory interface with the world. It captures and processes real-time environmental data to build an internal representation of the surroundings. Sensors typically include:
| Sensor Type | Function | Common Applications |
|---|---|---|
| Cameras (RGB, depth) | Visual scene understanding, object recognition | Autonomous driving, robotic manipulation |
| LiDAR | 3D spatial mapping, distance measurement | Self-driving cars, drone navigation |
| Radar | Velocity detection, obstacle tracking | Automotive safety, industrial monitoring |
| IMUs (accelerometers, gyroscopes) | Orientation, balance, motion tracking | Humanoid robots, drones |
| Force/torque sensors | Contact force measurement, tactile feedback | Robotic grasping, assembly tasks |
| Proximity sensors | Near-field object detection | Warehouse robots, collaborative robots |
Modern physical AI systems increasingly use multimodal perception, fusing data from multiple sensor types to create richer, more robust environmental understanding. Computer vision has advanced rapidly with transformer-based architectures, enabling real-time object detection, scene segmentation, and spatial reasoning.
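As an illustration of late multimodal fusion, the toy module below encodes each sensor stream separately and merges the embeddings into one scene representation. The encoder architectures are deliberately minimal placeholders (a small CNN, a PointNet-style point encoder, a linear IMU embedding), not any production design:

```python
import torch
import torch.nn as nn

class LateFusionPerception(nn.Module):
    """Toy late-fusion perception: encode each sensor stream, then fuse."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Camera branch: small CNN standing in for a vision transformer.
        self.camera_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, embed_dim),
        )
        # LiDAR branch: per-point MLP followed by max-pooling (PointNet-style).
        self.lidar_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, embed_dim)
        )
        # IMU branch: embed 3-axis accelerometer + 3-axis gyroscope readings.
        self.imu_encoder = nn.Linear(6, embed_dim)
        # Fusion head: concatenate all modalities and project to one embedding.
        self.fusion = nn.Sequential(nn.Linear(3 * embed_dim, embed_dim), nn.ReLU())

    def forward(self, rgb, points, imu):
        cam = self.camera_encoder(rgb)                         # (B, D)
        lidar = self.lidar_encoder(points).max(dim=1).values   # pool over N points
        imu_e = self.imu_encoder(imu)                          # (B, D)
        return self.fusion(torch.cat([cam, lidar, imu_e], dim=-1))

model = LateFusionPerception()
scene = model(
    torch.randn(1, 3, 64, 64),   # RGB image
    torch.randn(1, 1024, 3),     # LiDAR point cloud (x, y, z)
    torch.randn(1, 6),           # IMU window
)
print(scene.shape)  # torch.Size([1, 256])
```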
The cognitive layer processes perceptual inputs and generates plans of action. In physical AI, this often involves reasoning about the state of the environment, interpreting goals or natural language instructions, and decomposing tasks into step-by-step plans that the control system can execute.
Recent advances in large language models and vision-language models have significantly improved the cognitive capabilities of physical AI systems. These models provide a form of "System 2" thinking (slow, deliberate reasoning) that complements the fast, reflexive "System 1" control needed for real-time physical interaction.
The action layer translates cognitive plans into physical movements. This involves generating precise motor commands for robotic actuators, whether those are robotic arms, grippers, legs for walking, or wheels for navigation. Key challenges in action and control include generating smooth, continuous trajectories at high control frequencies, maintaining balance and managing contact forces, and acting safely around people and fragile objects.
A major development driving physical AI forward is the emergence of foundation models specifically designed for robotic control and physical interaction. These models, often called Vision-Language-Action (VLA) models, represent a convergence of computer vision, natural language processing, and robotic control into unified architectures.
A Vision-Language-Action model is a class of multimodal foundation model that integrates three capabilities: vision (camera images or video of the environment), language (natural language instructions), and action (low-level robot commands such as motor movements, joint angles, or gripper states). Given an input image of the robot's surroundings and a text instruction like "pick up the red cup and place it in the sink," a VLA directly outputs robot actions that can be executed to accomplish the task.
VLAs are generally constructed by fine-tuning a vision-language model (VLM) on large-scale datasets that pair visual observations and language instructions with robot trajectories. The architecture typically combines a vision-language encoder (often a vision transformer) with an action decoder that transforms latent representations into continuous output actions.
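A minimal sketch of this general pattern, with toy dimensions and a placeholder patch/token pipeline rather than the architecture of any specific model: image patches and instruction tokens pass through one shared transformer encoder, and an action head maps the pooled latent to a continuous action vector.

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Toy vision-language-action model: one transformer over image patch
    tokens and instruction tokens, decoded into continuous robot actions
    (e.g., 7 joint deltas plus a gripper state)."""

    def __init__(self, vocab=1000, d=128, action_dim=8):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, d)   # 16x16 RGB patches -> tokens
        self.text_embed = nn.Embedding(vocab, d)       # instruction token ids -> tokens
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d, action_dim)    # latent -> continuous actions

    def forward(self, patches, instruction_ids):
        tokens = torch.cat(
            [self.patch_embed(patches), self.text_embed(instruction_ids)], dim=1
        )
        latent = self.encoder(tokens).mean(dim=1)      # pool the joint sequence
        return self.action_head(latent)

vla = ToyVLA()
patches = torch.randn(1, 64, 16 * 16 * 3)      # 64 patches from one camera frame
instruction = torch.randint(0, 1000, (1, 12))  # "pick up the red cup..." as token ids
print(vla(patches, instruction).shape)         # torch.Size([1, 8])
```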
Several architectural paradigms have emerged in the VLA space as of 2025:
| Paradigm | Description | Example Models |
|---|---|---|
| Early fusion | Vision, language, and action tokens are combined into a single sequence processed by one transformer | OpenVLA, SmolVLA |
| Dual-system architecture | A slow "System 2" VLM for reasoning paired with a fast "System 1" policy for real-time control | GR00T N1, Helix |
| Flow matching | Uses continuous normalizing flows to produce smooth action trajectories at high frequency | pi0 |
| Self-correcting | Models that detect and recover from execution errors using visual feedback | CoA-VLA |
NVIDIA announced Isaac GR00T N1 in March 2025 as the world's first open, fully customizable foundation model for generalized humanoid reasoning and skills. GR00T N1 features a dual-system architecture inspired by principles of human cognition. "System 2" is a slow-thinking model powered by a vision-language model that reasons about the environment and instructions to plan actions. "System 1" is a fast-thinking action model that translates these plans into precise, continuous robot movements.
GR00T N1 can generalize across common tasks such as grasping, moving objects with one or both arms, and transferring items between arms, as well as performing multi-step tasks that require long context and combinations of general skills. These capabilities can be applied across use cases including material handling, packaging, and inspection.
The model was updated to GR00T N1.6 in late 2025, integrating NVIDIA Cosmos Reason, an open reasoning vision-language model built for physical AI. Cosmos Reason acts as the robot's deep-thinking brain, turning vague instructions into step-by-step plans using prior knowledge, common sense, and physics to handle new situations. Leading humanoid developers with early access to GR00T N1 include Agility Robotics, Boston Dynamics, Mentee Robotics, and NEURA Robotics.
Physical Intelligence (often stylized with the Greek letter π) developed pi0 (pi-zero), a general-purpose VLA foundation model for robots. Built on top of the PaliGemma VLM, pi0 was trained on data from seven robotic platforms performing 68 unique tasks. The model employs flow matching to produce smooth, real-time action trajectories at 50 Hz.
pi0 demonstrated strong zero-shot and fine-tuned performance on complex real-world tasks including laundry folding, table bussing, grocery bagging, box assembly, and object retrieval. Physical Intelligence open-sourced the model through its "openpi" release, enabling the broader robotics community to fine-tune pi0 for their own robots and tasks. A subsequent model, pi0.5, introduced in 2025, exhibited meaningful generalization to entirely new environments.
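To illustrate the flow-matching idea at a high level, the sketch below trains a small network to predict the velocity that transports Gaussian noise to action trajectories along straight-line paths, then integrates that velocity field at inference to sample an action chunk. This is a bare-bones illustration of the technique only; pi0's actual architecture, observation conditioning, and training recipe differ, and conditioning on images and language is omitted here entirely.

```python
import torch
import torch.nn as nn

ACTION_DIM = 7   # e.g., joint velocities
HORIZON = 16     # actions per predicted chunk

# Velocity field v(x_t, t): maps a noisy action chunk plus time to a velocity.
net = nn.Sequential(
    nn.Linear(HORIZON * ACTION_DIM + 1, 256), nn.ReLU(),
    nn.Linear(256, HORIZON * ACTION_DIM),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def training_step(actions):
    """One flow-matching step on a batch of expert action chunks (B, H*A)."""
    noise = torch.randn_like(actions)        # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1)      # random interpolation time in [0, 1]
    x_t = (1 - t) * noise + t * actions      # straight-line path between noise and data
    target_v = actions - noise               # constant velocity along that path
    pred_v = net(torch.cat([x_t, t], dim=-1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def sample(steps=10):
    """Integrate dx/dt = v(x, t) from noise to an action chunk (Euler steps)."""
    x = torch.randn(1, HORIZON * ACTION_DIM)
    for i in range(steps):
        t = torch.full((1, 1), i / steps)
        x = x + net(torch.cat([x, t], dim=-1)) / steps
    return x.view(HORIZON, ACTION_DIM)

for _ in range(100):
    training_step(torch.randn(32, HORIZON * ACTION_DIM))  # stand-in for real data
print(sample().shape)  # torch.Size([16, 7])
```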
Google DeepMind introduced Gemini Robotics and Gemini Robotics-ER (extended reasoning) in March 2025. Gemini Robotics is an advanced VLA generalist model capable of directly controlling robots, executing smooth and reactive movements to tackle a wide range of complex manipulation tasks. Built on the capabilities of Gemini 2.0, it extends multimodal understanding to physical action.
The reasoning capabilities of the Gemini 2.0 backbone, paired with learned low-level robot actions, allow robots to perform highly dexterous tasks such as folding origami and playing with cards. Gemini Robotics 1.5, released in 2025, brought AI agents further into the physical world by enabling robots to perceive, plan, think, use tools, and act to solve complex multi-step tasks.
In March 2026, Google partnered with Agile Robots to integrate Gemini Robotics foundation models with industrial hardware for manufacturing and logistics applications. Google also brought its Intrinsic robotics software division in-house to accelerate physical AI development.
Figure AI developed Helix, a generalist VLA model that unifies perception, language understanding, and learned control for humanoid robots. Helix was the first VLA to output high-rate continuous control of the entire humanoid upper body, including wrists, torso, head, and individual fingers. It was also the first VLA to operate simultaneously on two robots, enabling them to solve shared, long-horizon manipulation tasks with items they had never encountered.
Helix uses a dual-system approach: System 2 (S2), an onboard VLM operating at 7 to 9 Hz for scene understanding and language comprehension, and System 1 (S1), a fast reactive visuomotor policy that translates semantic representations into precise robot actions at 200 Hz. Helix 02, released in January 2026, extended control to full-body autonomy including walking and balance. In a demonstration, Helix 02 autonomously unloaded and reloaded a dishwasher across a full-sized kitchen in a continuous four-minute task integrating walking, manipulation, and balance with no resets or human intervention.
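The dual-rate pattern can be sketched schematically as follows. This is not Figure's implementation, only an illustration of the timing structure: a slow reasoning thread refreshes a shared latent plan at a few hertz while a fast control loop consumes the most recent plan at a much higher rate.

```python
import threading
import time

# Schematic dual-system control loop. The "models" are string stand-ins.
latest_plan = {"plan": None}
lock = threading.Lock()

def system2_loop():
    """Slow System 2: scene understanding + language -> latent plan at ~8 Hz."""
    step = 0
    while True:
        plan = f"latent-plan-{step}"   # stand-in for a VLM forward pass
        with lock:
            latest_plan["plan"] = plan
        step += 1
        time.sleep(1 / 8)

def system1_step(plan, observation):
    """Fast System 1: reactive visuomotor policy -> one motor command."""
    return f"action given {plan}"      # stand-in for a small policy network

threading.Thread(target=system2_loop, daemon=True).start()

for _ in range(1000):                  # 200 Hz control loop (~5 s total)
    with lock:
        plan = latest_plan["plan"]
    if plan is not None:
        action = system1_step(plan, observation=None)
        # send `action` to the actuators here
    time.sleep(1 / 200)
```

The key design property this illustrates is decoupling: the fast loop never blocks on the slow model, it simply acts on whatever plan is freshest.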
Skild AI is building a single, general-purpose artificial brain designed to control any robot for any task. The Skild Brain is "omni-bodied," meaning it can control various robot forms without prior knowledge of their exact body configuration, including quadrupeds, humanoids, tabletop arms, and mobile manipulators. In January 2026, Skild AI raised $1.4 billion in funding at a $14 billion valuation, led by SoftBank with participation from NVIDIA's NVentures, Bezos Expeditions, Samsung, LG, and Schneider Electric.
A critical enabler of physical AI is the development of world foundation models (WFMs) and advanced simulation environments. Training physical AI systems in the real world is expensive, slow, and potentially dangerous. Simulation provides a scalable alternative, allowing AI agents to learn from millions of interactions in virtual environments before deploying to the real world.
NVIDIA Cosmos is a platform of state-of-the-art generative world foundation models, advanced tokenizers, guardrails, and accelerated data processing pipelines. Cosmos generates realistic synthetic data for training and validating physical AI models, helping bridge the gap between simulation and reality.
Key components of the Cosmos platform include:
| Component | Description |
|---|---|
| Cosmos Predict 2.5 | Unifies Text2World, Image2World, and Video2World generation in a single model; trained on 200 million curated video clips |
| Cosmos Transfer 2.5 | Enables high-fidelity, spatially controlled world-to-world style transfer; 3.5x smaller than its predecessor |
| Cosmos Reason | An open reasoning VLM for physical AI that provides step-by-step planning using common sense and physics |
| Cosmos 3 | The first world foundation model unifying synthetic world generation, vision reasoning, and action simulation |
Cosmos models can be integrated into synthetic data pipelines running in NVIDIA Isaac Sim, the open-source robotics simulation framework built on the NVIDIA Omniverse platform. By generating photorealistic videos from simulated physics-based environments, these WFMs help reduce the simulation-to-real gap.
NVIDIA Omniverse is a platform for building and operating physically accurate digital twin simulations. It provides the infrastructure for creating virtual environments where physical AI systems can be trained, tested, and validated before real-world deployment.
NVIDIA Isaac Sim is a robotics simulation application built on Omniverse that enables researchers and developers to design, simulate, test, and train AI-based robots in physically accurate virtual environments. The Newton Physics Engine, released in late 2025, provides GPU-accelerated physics simulation within Isaac Lab for training robotic policies.
The simulation-to-reality (sim-to-real) pipeline typically works as follows:

1. Build a physically accurate virtual environment, often a digital twin of the target workspace.
2. Train policies in simulation on large volumes of synthetic interactions.
3. Validate the trained policies under increasingly realistic and randomized conditions.
4. Deploy to real hardware, optionally fine-tuning on a small amount of real-world data.
The simulation-to-reality gap (sim-to-real gap) remains one of the central challenges in physical AI. This gap refers to the discrepancies between simulated and real-world environments that cause policies trained in simulation to perform poorly on real hardware. Sources of this gap include imperfect physics models (friction, contact, deformation), differences between rendered and real sensor data, unmodeled actuator dynamics and latency, and real-world variability that simulations fail to capture.
Several techniques have been developed to address the sim-to-real gap:
| Technique | Description |
|---|---|
| Domain randomization | Varying simulation parameters (lighting, textures, physics properties) to make policies robust to real-world variation |
| Domain adaptation | Using techniques like adversarial training to align simulated and real feature distributions |
| Policy distillation | Transferring learned behaviors from a complex simulation policy to a simpler policy suitable for real deployment |
| Digital twins | Creating high-fidelity replicas of real environments to minimize the gap from the start |
| Zero-shot transfer | Training on sufficiently diverse synthetic data to enable direct deployment without real-world fine-tuning |
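Of the techniques above, domain randomization is the most widely used. A minimal sketch of the idea, assuming a generic simulator interface (the `sim` setters and `policy` methods here are hypothetical placeholders for whatever API a given simulator and training framework expose):

```python
import random

def randomize_domain(sim):
    """Resample visual and physical parameters each episode so the policy
    never overfits to any single simulated world."""
    sim.set_light_intensity(random.uniform(0.3, 1.5))            # lighting variation
    sim.set_texture(random.choice(["wood", "metal", "fabric"]))  # visual variation
    sim.set_friction(random.uniform(0.5, 1.2))                   # contact physics
    sim.set_object_mass(random.uniform(0.8, 1.25))               # dynamics variation
    sim.set_camera_jitter(random.gauss(0.0, 0.01))               # sensor placement noise

def train(policy, sim, episodes=10_000):
    for _ in range(episodes):
        randomize_domain(sim)          # a new world every episode
        obs = sim.reset()
        done = False
        while not done:
            obs, reward, done = sim.step(policy.act(obs))
            policy.update(obs, reward)
```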
Notable progress has been demonstrated by the Allen Institute for AI (Ai2), whose MolmoBot project showed that with sufficient diversity across scenes, objects, lighting, physics, and task definitions, zero-shot transfer from simulation alone is practical for real-world robotic manipulation.
The physical AI landscape involves major technology companies, specialized startups, and research institutions. The following table summarizes the leading players as of early 2026:
| Company | Focus Area | Key Products/Models | Notable Developments |
|---|---|---|---|
| NVIDIA | Platform and infrastructure | Cosmos, Isaac GR00T, Omniverse, Isaac Sim | Provides the foundational compute, simulation, and model platform for much of the industry |
| Google DeepMind | Foundation models, robotics research | Gemini Robotics, Gemini Robotics-ER | Partnered with Boston Dynamics and Agile Robots; brought Intrinsic in-house |
| Physical Intelligence | General-purpose robot foundation models | pi0, pi0.5 | Raised over $600M; open-sourced pi0; backed by OpenAI |
| Figure AI | Humanoid robots | Figure 03, Helix, Helix 02 | First VLA with full-body humanoid control; targeting home environments |
| Tesla | Humanoid robots, autonomous driving | Optimus, FSD | Leverages FSD neural networks for Optimus; planning 50,000 units by 2026 |
| Boston Dynamics | Humanoid robots, industrial automation | Atlas | Production began in 2026; 30,000-unit/year factory planned; partnered with Google DeepMind |
| Skild AI | Universal robot brain | Skild Brain | $1.4B funding at $14B valuation; omni-bodied control across robot types |
| Agility Robotics | Logistics humanoid robots | Digit | Moved 100,000+ totes in commercial operations; customers include Amazon and GXO |
| Apptronik | Humanoid robots | Apollo | Over $770M in total funding; partnered with Google for Gemini integration |
| 1X Technologies | Home humanoid robots | NEO | Accepting pre-orders at $20K; targeting 2026 US launch |
| Allen Institute for AI | Open research | MolmoBot | Demonstrated zero-shot sim-to-real transfer with fully open models |
Manufacturing represents one of the most immediate and high-value application domains for physical AI. Industrial robots powered by physical AI can perform tasks that previously required human judgment and dexterity, including assembly, quality inspection, material handling, and packaging.
The global market value of industrial robot installations reached an all-time high of $16.7 billion in 2024, with annual installations exceeding 500,000 units for the fourth consecutive year. Physical AI is accelerating this trend by enabling robots to handle more complex, unstructured tasks.
In March 2026, ABB and NVIDIA announced progress in closing the simulation-to-reality gap in industrial robotics. Boston Dynamics began manufacturing production Atlas robots immediately after their CES 2026 unveiling, with all 2026 deployments already committed to customers including Hyundai and Google DeepMind. The Atlas robot can perform a wide array of industrial tasks with a reach of up to 7.5 feet and the ability to lift 110 pounds.
Autonomous driving is a foundational application of physical AI, requiring real-time perception, prediction, and planning in highly dynamic environments. Self-driving systems must understand complex traffic scenarios, predict the behavior of other road users, and execute safe driving decisions.
NVIDIA's Alpamayo Autonomous Driving Platform features a 10-billion-parameter Vision-Language-Action model that leverages chain-of-thought reasoning to handle complex driving scenarios. Based on the Physical AI Open Dataset with more than 1,700 hours of driving data collected from over 2,500 cities in 25 countries, Alpamayo has been selected by Mercedes-Benz for integration into its vehicles.
Tesla's approach to physical AI in autonomous driving centers on its Full Self-Driving (FSD) platform, which uses camera-based perception with end-to-end neural networks for autonomous navigation and object detection. The same neural network architecture underpinning FSD has been adapted for the Optimus humanoid robot, demonstrating how physical AI techniques can transfer across different embodiments. Tesla expanded FSD globally in 2026, with public road testing launched in Japan in March 2026.
Autonomous vehicles with Level 4 capabilities (fully autonomous in defined conditions) are demonstrating viability in 2026, with broader commercial deployment expected within three to five years.
Warehouse automation is a rapidly growing application of physical AI. Amazon operates over one million robots in its warehouses as of 2026, and AI-orchestrated warehouse systems are reducing processing times by up to 60 percent.
Agility Robotics' Digit robot, purpose-built for logistics workflows, has moved over 100,000 totes in commercial operations with customers including GXO Logistics, Amazon, Schaeffler, and Spanx factories. Unlike general-purpose humanoids, Digit demonstrates the value of domain-specific physical AI optimized for particular operational environments.
Physical AI is finding applications in healthcare through surgical robots, rehabilitation systems, and assistive devices. AI-powered surgical robots can perform procedures with greater precision than human surgeons in certain tasks, while assistive robots help elderly or disabled individuals with daily activities.
Home assistance represents a longer-term goal for physical AI companies. Figure AI's Figure 03 robot is designed with home environments in mind, featuring soft materials, wireless charging, and safety features, though consumer availability is not expected until late 2026 at the earliest, and then only through limited pilot programs. 1X Technologies is accepting pre-orders for its NEO home humanoid robot at $20,000, targeting a 2026 US launch.
NVIDIA has positioned itself as the central platform provider for the physical AI ecosystem, analogous to its role in the broader AI revolution through GPU computing. The company's physical AI stack spans multiple layers: compute infrastructure for training and onboard inference, simulation and digital-twin tooling (Omniverse and Isaac Sim), and foundation models (Cosmos and Isaac GR00T).
At GTC 2025 and CES 2026, Jensen Huang outlined NVIDIA's vision for physical AI as the next major computing platform, comparing the coming wave of intelligent robots and autonomous systems to the personal computer and smartphone revolutions. NVIDIA has partnered with virtually every major robotics company, including Boston Dynamics, Figure AI, Agility Robotics, Apptronik, and Skild AI, providing compute infrastructure, simulation tools, and foundation models.
The physical AI market is experiencing extraordinary growth in both investment and market size:
| Metric | Value |
|---|---|
| Physical AI market size (2025) | $5.23 billion |
| Projected physical AI market (2033) | $49.73 billion (CAGR 32.53%) |
| Physical AI software platform market (projected 2034) | $55.8 billion (CAGR 42.0%) |
| Humanoid robot market (2025) | $2.92 billion |
| Projected humanoid market (2030) | $15.26 billion (CAGR 39.2%) |
| Long-term humanoid market (2050, Morgan Stanley estimate) | $5 trillion |
| Total robotics funding (2025) | Over $10.3 billion |
| Humanoid-specific funding (H1 2025) | $3.1 billion across 61 deals |
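The CAGR figures in the table follow from the standard compound-growth formula, CAGR = (end / start)^(1/years) − 1, which can be verified directly:

```python
def cagr(start, end, years):
    """Compound annual growth rate between two market-size estimates."""
    return (end / start) ** (1 / years) - 1

print(f"{cagr(5.23, 49.73, 2033 - 2025):.2%}")  # physical AI market -> ~32.5%
print(f"{cagr(2.92, 15.26, 2030 - 2025):.2%}")  # humanoid market    -> ~39.2%
```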
Major funding rounds in 2025 and early 2026 reflect the scale of investor interest:
| Company | Round | Amount | Valuation |
|---|---|---|---|
| Skild AI | Series C (Jan 2026) | $1.4 billion | $14 billion |
| Figure AI | Series B | $675 million | Not disclosed |
| Physical Intelligence | Series B | ~$600 million | ~$5.3 billion |
| Galaxy Bot | Series A | $453 million | Not disclosed |
| Apptronik | Series A | $403 million | Not disclosed |
Goldman Sachs projects global humanoid shipments of 50,000 to 100,000 units in 2026, with unit economics ultimately improving to $15,000 to $20,000 per robot as production scales.
Despite rapid progress, physical AI faces significant technical and practical challenges: the sim-to-real gap described above, the expense and risk of collecting real-world training data, safe operation around humans, hardware cost and reliability, and generalization from structured settings like factories to unstructured ones like homes.
As of early 2026, physical AI has reached a critical inflection point. Several converging trends indicate that the field is transitioning from research and prototyping to commercial deployment: general-purpose VLA foundation models are being released and open-sourced, world foundation models and simulation are shrinking the sim-to-real gap, humanoids such as Digit and Atlas are entering paid commercial operation, and capital is flowing into the sector at unprecedented scale.
Deloitte's 2026 Technology Trends report identified physical AI and humanoid robots as a major trend, noting the convergence of vision, sensing, cobots, and AI that is enabling humans and mobile robots to work together in increasingly flexible environments. Gartner predicted that 40 percent of enterprise applications would leverage task-specific AI agents by 2026, up from less than 5 percent in 2025.
Looking further ahead, the physical AI field is expected to progress through several phases: near-term commercial deployment in structured environments like factories and warehouses (2025 to 2027), broader deployment in semi-structured environments like stores and hospitals (2027 to 2030), and eventual deployment in fully unstructured environments like homes and outdoor spaces (2030 and beyond).
Physical AI intersects with and builds upon several related fields: robotics, embodied cognition and embodied AI, computer vision, natural language processing, reinforcement learning, and autonomous systems.