# Physical AI

> Source: https://aiwiki.ai/wiki/physical_ai
> Updated: 2026-06-23
> Categories: Artificial Intelligence, Embodied AI, Robotics
> From AI Wiki (https://aiwiki.ai), a free encyclopedia of artificial intelligence. Quote with attribution.

**Physical AI** is artificial intelligence that perceives, reasons about, and acts in the physical world through machines such as [robots](/wiki/robot), [autonomous vehicles](/wiki/autonomous_vehicle), and industrial automation, as opposed to digital AI that operates only on text, images, or data inside software. The term was popularized by [NVIDIA](/wiki/nvidia) CEO Jensen Huang, who framed it at CES 2025 as the third era of AI, after perception AI and [generative AI](/wiki/generative_ai), declaring that "the next frontier of AI is physical AI" and that "AI is now beginning to understand the laws of physics."[1] Physical AI combines advanced perception, cognitive reasoning, planning, and motor control to enable systems to operate in dynamic, unstructured environments.

Huang described the progression from **perception AI** (understanding images, words, and sounds) to **generative AI** (creating text, images, and media) to **physical AI** (perceiving, reasoning, planning, and acting in the real world).[1] At CES 2026, Huang declared that "the ChatGPT moment for physical AI" had arrived, signaling that machines were beginning to understand, reason, and act in the physical world at a transformative scale.[2] The technical foundation of physical AI rests on three converging pieces: [world models](/wiki/world_model) that simulate physics, [robot foundation models](/wiki/robot_foundation_model) that map perception and instructions to actions, and large-scale simulation that trains policies before they reach real hardware.

## What is physical AI?

Physical AI encompasses a broad set of capabilities that allow intelligent systems to function in the real world. At its core, a physical AI system must be able to:

1. **Perceive** its environment through sensors such as cameras, [LiDAR](/wiki/lidar), radar, proximity sensors, and inertial measurement units (IMUs)
2. **Understand** the physical properties of objects and scenes, including spatial relationships, material properties, and the laws of physics
3. **Reason and plan** sequences of actions to accomplish goals, adapting to new situations and unexpected changes
4. **Act** on the physical world through actuators, robotic arms, wheels, legs, or other mechanical systems
5. **Learn and adapt** from experience, improving performance over time through feedback loops

This closed-loop integration of perception, cognition, and action distinguishes physical AI from other forms of artificial intelligence.[20] While a [large language model](/wiki/large_language_model) like [GPT-4](/wiki/gpt-4) can reason about the world through text, it cannot fold laundry, drive a car, or assemble a product on a factory line. Physical AI aims to bring that level of intelligence into tangible, real-world applications.
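
As a rough illustration of the five-step loop listed above, the toy sketch below drives a one-dimensional "robot" toward a target position. Everything in it is a hypothetical stand-in chosen for illustration: the simulated actuator drift (`true_bias`), the proportional controller, and the names `run_closed_loop` and `bias_estimate` do not come from any real system.

```python
import random


def run_closed_loop(target=1.0, steps=200, seed=0):
    """Toy 1-D perceive -> understand -> plan -> act -> learn loop."""
    rng = random.Random(seed)
    true_bias = -0.02        # assumed actuator drift per step, hidden from the agent
    sensor_noise = 0.005     # assumed sensor noise (standard deviation)
    gain = 0.3               # proportional controller gain
    learning_rate = 0.1      # step size for the drift estimate

    position = 0.0           # true state of the simulated robot
    bias_estimate = 0.0      # what the agent has learned about the drift
    reading = position + rng.gauss(0.0, sensor_noise)   # initial perception

    for _ in range(steps):
        error = target - reading                         # understand: distance to goal
        command = gain * error - bias_estimate           # plan: compensate for learned drift
        position += command + true_bias                  # act: the world applies the drift
        new_reading = position + rng.gauss(0.0, sensor_noise)  # perceive the result
        observed_delta = new_reading - reading
        # learn: move the drift estimate toward (observed motion - commanded motion)
        bias_estimate += learning_rate * ((observed_delta - command) - bias_estimate)
        reading = new_reading

    return position, bias_estimate


final_position, learned_bias = run_closed_loop()
print(f"final position {final_position:.3f}, learned drift {learned_bias:.3f}")
```

The point of the sketch is the feedback structure: the "learn" step only works because the agent can compare what it commanded with what its sensors reported afterward.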

Physical AI draws on insights from [cognitive science](/wiki/cognitive_science) and neuroscience, building on the idea that intelligence emerges from the dynamic coupling of perception, cognition, and physical interaction. This concept, sometimes called [embodied cognition](/wiki/embodied_cognition), suggests that an agent's physical form and its ability to interact with the environment are integral to how it develops and applies intelligence.[20]

## Key Components

### Perception

The perception layer serves as a physical AI system's sensory interface with the world. It captures and processes real-time environmental data to build an internal representation of the surroundings. Sensors typically include:

| Sensor Type | Function | Common Applications |
|---|---|---|
| Cameras (RGB, depth) | Visual scene understanding, object recognition | [Autonomous driving](/wiki/autonomous_driving), robotic manipulation |
| [LiDAR](/wiki/lidar) | 3D spatial mapping, distance measurement | Self-driving cars, drone navigation |
| Radar | Velocity detection, obstacle tracking | Automotive safety, industrial monitoring |
| IMUs (accelerometers, gyroscopes) | Orientation, balance, motion tracking | [Humanoid robots](/wiki/humanoid_robot), drones |
| Force/torque sensors | Contact force measurement, tactile feedback | Robotic grasping, assembly tasks |
| Proximity sensors | Near-field object detection | Warehouse robots, collaborative robots |

Modern physical AI systems increasingly use multimodal perception, fusing data from multiple sensor types to create richer, more robust environmental understanding. [Computer vision](/wiki/computer_vision) has advanced rapidly with [transformer](/wiki/transformer)-based architectures, enabling real-time object detection, scene segmentation, and spatial reasoning.
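
One simple way to picture multimodal fusion is inverse-variance weighting, where each sensor's estimate counts in proportion to how precise it is. The sketch below is a minimal illustration under that assumption; production systems typically use Kalman filters or learned fusion networks instead, and the sensor values and variances here are made up.

```python
def fuse_estimates(measurements):
    """Fuse independent (value, variance) estimates by inverse-variance weighting.

    Returns the fused value and its variance, which is never larger than
    the smallest input variance.
    """
    weights = [1.0 / variance for _, variance in measurements]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical distance-to-obstacle readings in meters: (value, variance)
readings = [
    (10.0, 0.01),   # LiDAR: precise
    (10.4, 0.09),   # radar: noisier
    (9.8, 0.25),    # camera depth: noisiest
]
distance, variance = fuse_estimates(readings)
print(f"fused distance {distance:.2f} m, variance {variance:.4f}")  # about 10.03 m
```

The fused estimate sits closest to the LiDAR reading because it has the smallest variance, while the noisier sensors still pull the result slightly and reduce the overall uncertainty.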

### Cognition and Planning

The cognitive layer processes perceptual inputs and generates plans of action. In physical AI, this often involves:

- **World modeling**: Building internal representations of the environment that allow the system to predict the outcomes of potential actions. [World models](/wiki/world_model) are central to reasoning and planning, enabling agents to simulate and evaluate different courses of action before executing them.
- **Task decomposition**: Breaking complex goals into manageable subtasks. For example, the instruction "clean the kitchen table" requires identifying the table, recognizing objects on it, planning a sequence of pick-and-place actions, and executing them in order.
- **Physical reasoning**: Understanding the laws of physics as they apply to real-world interactions, such as predicting where a ball will roll, estimating the force needed to grasp an object without damaging it, or inferring that a pedestrian might be hidden behind a parked car.
- **Common-sense reasoning**: Applying everyday knowledge that humans take for granted, such as understanding that liquids spill, fragile objects break, and heavy objects require more force to move.

Recent advances in [large language models](/wiki/large_language_model) and [vision-language models](/wiki/vision_language_model) have significantly improved the cognitive capabilities of physical AI systems. These models provide a form of "System 2" thinking (slow, deliberate reasoning) that complements the fast, reflexive "System 1" control needed for real-time physical interaction.[20]
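
The world-modeling idea above (simulate candidate actions, then pick the best one) can be sketched with a deliberately tiny example. The "world model" here is a hand-written physics rule for a ball that slows at a constant rate, not a learned model, and the function names, the deceleration value, and the candidate speeds are all assumptions for illustration.

```python
def simulate_roll(initial_speed, deceleration=5.0, dt=0.01):
    """Toy world model: distance (m) a ball travels before stopping.

    Uses simple Euler integration of constant deceleration (m/s^2).
    """
    position, speed = 0.0, initial_speed
    while speed > 0:
        position += speed * dt
        speed -= deceleration * dt
    return position


def plan_push(target_distance, candidate_speeds):
    """Pick the push speed whose simulated outcome lands closest to the target."""
    return min(
        candidate_speeds,
        key=lambda speed: abs(simulate_roll(speed) - target_distance),
    )


# Evaluate pushes from 0.1 to 6.0 m/s "in imagination" before acting
candidates = [round(0.1 * i, 1) for i in range(1, 61)]
best_speed = plan_push(target_distance=1.0, candidate_speeds=candidates)
print(f"chosen push speed: {best_speed} m/s, "
      f"predicted distance: {simulate_roll(best_speed):.3f} m")
```

A learned world model replaces `simulate_roll` with a neural network's prediction, but the planning pattern of scoring imagined outcomes before acting is the same.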

### Action and Control

The action layer translates cognitive plans into physical movements. This involves generating precise motor commands for robotic actuators, whether those are robotic arms, grippers, legs for walking, or wheels for navigation. Key challenges in action and control include:

- **Dexterous manipulation**: Handling objects with human-like precision, including grasping irregularly shaped items, using tools, and performing fine motor tasks
- **Locomotion**: Walking, running, or navigating over uneven terrain, maintaining balance in dynamic conditions
- **Real-time responsiveness**: Reacting to unexpected events (a dropped object, a moving obstacle) within milliseconds
- **Multi-robot coordination**: Orchestrating multiple physical AI agents to work together on shared tasks

## Foundation Models for Physical AI

A major development driving physical AI forward is the emergence of [robot foundation models](/wiki/robot_foundation_model) specifically designed for robotic control and physical interaction. These models, often called **Vision-Language-Action (VLA) models**, represent a convergence of [computer vision](/wiki/computer_vision), [natural language processing](/wiki/natural_language_processing), and robotic control into unified architectures.

### What is a Vision-Language-Action (VLA) model?

A [Vision-Language-Action model](/wiki/vision-language-action_model) is a class of multimodal [foundation model](/wiki/foundation_model) that integrates three capabilities: vision (camera images or video of the environment), language (natural language instructions), and action (low-level robot commands such as motor movements, joint angles, or gripper states). Given an input image of the robot's surroundings and a text instruction like "pick up the red cup and place it in the sink," a VLA directly outputs robot actions that can be executed to accomplish the task.[19]

VLAs are generally constructed by fine-tuning a [vision-language model](/wiki/vision_language_model) (VLM) on large-scale datasets that pair visual observations and language instructions with robot trajectories. The architecture typically combines a vision-language encoder (often a [vision transformer](/wiki/vision_transformer)) with an action decoder that transforms latent representations into continuous output actions.[19]

Several architectural paradigms have emerged in the VLA space as of 2025:

| Paradigm | Description | Example Models |
|---|---|---|
| Early fusion | Vision, language, and action tokens are combined into a single sequence processed by one transformer | [OpenVLA](/wiki/openvla), SmolVLA |
| Dual-system architecture | A slow "System 2" VLM for reasoning paired with a fast "System 1" policy for real-time control | [GR00T N1](/wiki/isaac_gr00t), [Helix](/wiki/figure_ai) |
| Flow matching | Uses continuous normalizing flows to produce smooth action trajectories at high frequency | [pi0](/wiki/pi0) |
| Self-correcting | Models that detect and recover from execution errors using visual feedback | CoA-VLA |

### NVIDIA Isaac GR00T

[NVIDIA](/wiki/nvidia) announced [Isaac GR00T N1](/wiki/isaac_gr00t) in March 2025, describing it as the world's first open, fully customizable foundation model for generalized humanoid reasoning and skills.[3] GR00T N1 features a dual-system architecture inspired by principles of human cognition. "System 2" is a slow-thinking model powered by a [vision-language model](/wiki/vision_language_model) that reasons about the environment and instructions to plan actions. "System 1" is a fast-thinking action model that translates these plans into precise, continuous robot movements.[3]

GR00T N1 can generalize across common tasks such as grasping, moving objects with one or both arms, and transferring items between arms, as well as performing multi-step tasks that require long context and combinations of general skills. These capabilities can be applied across use cases including material handling, packaging, and inspection.[3]

The model was updated to GR00T N1.6 in late 2025, integrating NVIDIA Cosmos Reason, an open reasoning vision-language model built for physical AI. Cosmos Reason acts as the robot's deep-thinking brain, turning vague instructions into step-by-step plans using prior knowledge, common sense, and physics to handle new situations. Leading humanoid developers with early access to GR00T N1 include [Agility Robotics](/wiki/agility_robotics), [Boston Dynamics](/wiki/boston_dynamics), Mentee Robotics, and NEURA Robotics.[3]

### Physical Intelligence pi0

[Physical Intelligence](/wiki/physical_intelligence) (often stylized as Pi or with the Greek letter π) developed pi0 (pi-zero), a general-purpose VLA foundation model for robots. Built on top of the PaliGemma VLM, pi0 was trained on data from seven robotic platforms performing 68 unique tasks. The model employs flow matching to produce smooth, real-time action trajectories at 50 Hz.[4]
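
To give a feel for how flow matching turns noise into an action, the sketch below integrates a velocity field from a random starting point to an action vector using Euler steps. This is not pi0's implementation: in a trained model the velocity field is a neural network conditioned on images and language, whereas `toy_velocity` here is a closed-form stand-in that assumes a single known target action. The function names and the three-dimensional action are hypothetical.

```python
import random


def toy_velocity(x, t, target):
    """Stand-in for a learned network.

    For a straight-line path from noise (t=0) to one target (t=1), the
    velocity that carries the current point x to the target is
    (target - x) / (1 - t).
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(target, num_steps=10, seed=0):
    """Generate an action by Euler-integrating the velocity field from t=0 to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]    # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                                # t runs over 0, dt, ..., 1 - dt
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical 3-D action (for example, small end-effector displacements)
target_action = [0.10, -0.20, 0.30]
print(sample_action(target_action))
```

Because the output is a continuous vector refined over a few integration steps, this family of methods avoids discretizing actions into tokens, which is one reason it suits smooth, high-rate control.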

pi0 demonstrated strong zero-shot and fine-tuned performance on complex real-world tasks including laundry folding, table bussing, grocery bagging, box assembly, and object retrieval. Physical Intelligence open-sourced the model through its "openpi" release, enabling the broader robotics community to fine-tune pi0 for their own robots and tasks. A subsequent model, pi0.5, introduced in 2025, exhibited meaningful generalization to entirely new environments.[4]

### Google DeepMind Gemini Robotics

[Google DeepMind](/wiki/google_deepmind) introduced Gemini Robotics and Gemini Robotics-ER (embodied reasoning) in March 2025. Gemini Robotics is an advanced VLA generalist model capable of directly controlling robots, executing smooth and reactive movements to tackle a wide range of complex manipulation tasks. Built on the capabilities of [Gemini](/wiki/gemini) 2.0, it extends multimodal understanding to physical action.[5]

The reasoning capabilities of the Gemini 2.0 backbone, paired with learned low-level robot actions, allow robots to perform highly dexterous tasks such as folding origami and playing with cards.[5] Gemini Robotics 1.5, released in 2025, brought AI agents further into the physical world by enabling robots to perceive, plan, think, use tools, and act to solve complex multi-step tasks.[23]

In March 2026, Google partnered with Agile Robots to integrate Gemini Robotics foundation models with industrial hardware for manufacturing and logistics applications. Google also brought its Intrinsic robotics software division in-house to accelerate physical AI development.[14]

### Figure AI Helix

[Figure AI](/wiki/figure_ai) developed Helix, a generalist VLA model that unifies perception, language understanding, and learned control for humanoid robots. According to Figure, Helix was the first VLA to output high-rate continuous control of the entire humanoid upper body, including wrists, torso, head, and individual fingers, and the first to operate simultaneously on two robots, enabling them to solve shared, long-horizon manipulation tasks with items they had never encountered.[6]

Helix uses a dual-system approach: System 2 (S2), an onboard VLM operating at 7 to 9 Hz for scene understanding and language comprehension, and System 1 (S1), a fast reactive visuomotor policy that translates semantic representations into precise robot actions at 200 Hz.[6] Helix 02, released in January 2026, extended control to full-body autonomy including walking and balance. In a demonstration, Helix 02 autonomously unloaded and reloaded a dishwasher across a full-sized kitchen in a continuous four-minute task integrating walking, manipulation, and balance with no resets or human intervention.[7]
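
The two rates quoted above imply that the fast policy runs many control steps on each output of the slow model: at 200 Hz and roughly 8 Hz, that is about 25 S1 steps per S2 update. The sketch below illustrates this scheduling with a single synchronous loop. It is a simplification under stated assumptions: real dual-system stacks generally run the two components asynchronously, and `slow_reasoner`, `fast_policy`, and the 8 Hz figure (a midpoint of the 7 to 9 Hz range) are hypothetical stand-ins.

```python
def slow_reasoner(tick):
    """Hypothetical S2 stand-in: returns a latent goal tagged with its creation tick."""
    return {"goal": "reach", "created_at": tick}


def fast_policy(latent, tick):
    """Hypothetical S1 stand-in: reports how stale the latent is, in control ticks."""
    return tick - latent["created_at"]


def run_dual_system(duration_s=1.0, s1_hz=200, s2_hz=8):
    """Run S1 every tick and refresh the S2 latent every s1_hz // s2_hz ticks."""
    ticks_per_s2 = s1_hz // s2_hz          # 25 fast steps per slow update
    latent = None
    s1_calls = s2_calls = 0
    max_staleness = 0
    for tick in range(int(duration_s * s1_hz)):
        if tick % ticks_per_s2 == 0:       # time for the slow system to update
            latent = slow_reasoner(tick)
            s2_calls += 1
        max_staleness = max(max_staleness, fast_policy(latent, tick))
        s1_calls += 1
    return s1_calls, s2_calls, max_staleness


print(run_dual_system())  # (200, 8, 24)
```

In this toy schedule the fast policy acts on a latent that is at most 24 ticks (120 ms) old, which is why S1 must stay reactive on its own between S2 updates.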

### Skild AI

[Skild AI](/wiki/skild_ai) is building a single, general-purpose artificial brain designed to control any robot for any task. The Skild Brain is "omni-bodied," meaning it can control various robot forms without prior knowledge of their exact body configuration, including quadrupeds, humanoids, tabletop arms, and mobile manipulators. In January 2026, Skild AI raised $1.4 billion in funding at a $14 billion valuation, led by SoftBank with participation from NVIDIA's NVentures, Bezos Expeditions, Samsung, LG, and Schneider Electric.[15]

## World Foundation Models and Simulation

A critical enabler of physical AI is the development of **world foundation models (WFMs)** and advanced simulation environments. A world foundation model is a neural network that predicts and generates physics-aware video of the future state of a virtual environment, using text, image, video, and movement inputs to simulate worlds that accurately model the spatial relationships of objects and their physical interactions.[25] Training physical AI systems directly in the real world is expensive, slow, and potentially dangerous. Simulation provides a scalable alternative, allowing AI agents to learn from millions of interactions in virtual environments before deploying to the real world.

### What is NVIDIA Cosmos?

[NVIDIA Cosmos](/wiki/nvidia_cosmos) is a platform of state-of-the-art generative world foundation models, advanced tokenizers, guardrails, and accelerated data processing pipelines, launched at CES 2025 to accelerate physical AI development.[25] Cosmos generates realistic synthetic data for training and validating physical AI models, helping bridge the gap between simulation and reality.[8] In the platform's foundational research paper, NVIDIA defined the goal directly: "A World Foundation Model (WFM) is a general-purpose world model that can be fine-tuned into customized world models for downstream applications."[26]

Key components of the Cosmos platform include:

| Component | Description |
|---|---|
| Cosmos Predict 2.5 | Unifies Text2World, Image2World, and Video2World generation in a single flow-based model; trained on 200 million curated video clips and released at 2B and 14B parameter scales[27] |
| Cosmos Transfer 2.5 | Enables high-fidelity, spatially controlled world-to-world style transfer; 3.5x smaller than its predecessor |
| Cosmos Reason | An open reasoning VLM for physical AI that provides step-by-step planning using common sense and physics |
| Cosmos 3 | The first world foundation model unifying synthetic world generation, vision reasoning, and action simulation |

Cosmos models can be integrated into synthetic data pipelines running in NVIDIA Isaac Sim, the open-source robotics simulation framework built on the [NVIDIA Omniverse](/wiki/nvidia_omniverse) platform. By generating photorealistic videos from simulated physics-based environments, these WFMs help reduce the simulation-to-real gap.[24]

### NVIDIA Omniverse and Isaac Sim

[NVIDIA Omniverse](/wiki/nvidia_omniverse) is a platform for building and operating physically accurate [digital twin](/wiki/digital_twin) simulations. It provides the infrastructure for creating virtual environments where physical AI systems can be trained, tested, and validated before real-world deployment.[9]

NVIDIA Isaac Sim is a robotics simulation application built on Omniverse that enables researchers and developers to design, simulate, test, and train AI-based robots in physically accurate virtual environments. The Newton Physics Engine, released in late 2025, provides GPU-accelerated physics simulation within Isaac Lab for training robotic policies.[9]

The simulation-to-reality (sim-to-real) pipeline typically works as follows:

1. **Design**: Create a digital twin of the robot and its operating environment in Omniverse
2. **Train**: Use [reinforcement learning](/wiki/reinforcement_learning) or imitation learning in Isaac Sim to train robot policies across thousands of parallel environments
3. **Generate data**: Use Cosmos WFMs to generate diverse, photorealistic training data that augments simulation data
4. **Validate**: Test policies in progressively more realistic simulated scenarios
5. **Deploy**: Transfer validated policies to real robots, with continued fine-tuning from real-world data

### The Sim-to-Real Gap

The **simulation-to-reality gap** (sim-to-real gap) remains one of the central challenges in physical AI. This gap refers to the discrepancies between simulated and real-world environments that cause policies trained in simulation to perform poorly on real hardware. Sources of this gap include:

- Imperfect physics modeling (friction, deformable objects, fluid dynamics)
- Visual domain differences (lighting, textures, reflections)
- Sensor noise and calibration differences
- Actuator dynamics and mechanical tolerances

Several techniques have been developed to address the sim-to-real gap:

| Technique | Description |
|---|---|
| Domain randomization | Varying simulation parameters (lighting, textures, physics properties) to make policies robust to real-world variation |
| Domain adaptation | Using techniques like adversarial training to align simulated and real feature distributions |
| Policy distillation | Transferring learned behaviors from a complex simulation policy to a simpler policy suitable for real deployment |
| Digital twins | Creating high-fidelity replicas of real environments to minimize the gap from the start |
| Zero-shot transfer | Training on sufficiently diverse synthetic data to enable direct deployment without real-world fine-tuning |

Notable progress has been demonstrated by the Allen Institute for AI (Ai2), whose MolmoBot project showed that with sufficient diversity across scenes, objects, lighting, physics, and task definitions, zero-shot transfer from simulation alone is practical for real-world robotic manipulation.[10]
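
Domain randomization, the first technique in the table above, amounts to drawing fresh simulation parameters for every training episode so that a policy never overfits to one version of the world. The sketch below shows the sampling step only. The parameter names and ranges are illustrative assumptions, not values from Isaac Sim or any published training recipe.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    """Hypothetical per-episode simulation settings."""
    friction: float
    light_intensity: float
    object_mass_kg: float
    camera_noise_std: float


# Assumed (low, high) ranges, chosen only for illustration
PARAM_RANGES = {
    "friction": (0.4, 1.2),
    "light_intensity": (0.3, 1.5),
    "object_mass_kg": (0.05, 2.0),
    "camera_noise_std": (0.0, 0.03),
}


def sample_sim_params(rng):
    """Draw one randomized environment configuration, uniformly within each range."""
    return SimParams(**{
        name: rng.uniform(low, high) for name, (low, high) in PARAM_RANGES.items()
    })


rng = random.Random(0)
for episode in range(3):
    params = sample_sim_params(rng)
    # A real pipeline would configure the simulator with `params` and run an episode here
    print(episode, params)
```

Widening the ranges trades peak performance in any single setting for robustness across settings, so in practice the ranges are tuned against real-world evaluation.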

## Key Companies and Organizations

The physical AI landscape involves major technology companies, specialized startups, and research institutions. The following table summarizes the leading players as of early 2026:

| Company | Focus Area | Key Products/Models | Notable Developments |
|---|---|---|---|
| [NVIDIA](/wiki/nvidia) | Platform and infrastructure | Cosmos, Isaac GR00T, Omniverse, Isaac Sim | Provides the foundational compute, simulation, and model platform for much of the industry |
| [Google DeepMind](/wiki/google_deepmind) | Foundation models, robotics research | Gemini Robotics, Gemini Robotics-ER | Partnered with Boston Dynamics and Agile Robots; brought Intrinsic in-house |
| [Physical Intelligence](/wiki/physical_intelligence) | General-purpose robot foundation models | pi0, pi0.5 | Raised over $600M; open-sourced pi0; backed by OpenAI |
| [Figure AI](/wiki/figure_ai) | Humanoid robots | Figure 03, Helix, Helix 02 | First VLA with full-body humanoid control; targeting home environments |
| [Tesla](/wiki/tesla) | Humanoid robots, autonomous driving | [Optimus](/wiki/tesla_optimus), [FSD](/wiki/tesla_fsd) | Leverages FSD neural networks for Optimus; planning 50,000 units by 2026 |
| [Boston Dynamics](/wiki/boston_dynamics) | Humanoid robots, industrial automation | Atlas | Production began in 2026; 30,000-unit/year factory planned; partnered with Google DeepMind |
| [Skild AI](/wiki/skild_ai) | Universal robot brain | Skild Brain | $1.4B funding at $14B valuation; omni-bodied control across robot types |
| [Agility Robotics](/wiki/agility_robotics) | Logistics humanoid robots | Digit | Moved 100,000+ totes in commercial operations; customers include Amazon and GXO |
| [Apptronik](/wiki/apptronik) | Humanoid robots | Apollo | Over $770M in total funding; partnered with Google for Gemini integration |
| [1X Technologies](/wiki/1x_technologies) | Home humanoid robots | NEO | Accepting pre-orders at $20K; targeting 2026 US launch |
| Allen Institute for AI | Open research | MolmoBot | Demonstrated zero-shot sim-to-real transfer with fully open models |

## Applications

### Robotics and Manufacturing

Manufacturing represents one of the most immediate and high-value application domains for physical AI. Industrial robots powered by physical AI can perform tasks that previously required human judgment and dexterity, including assembly, quality inspection, material handling, and packaging.

The global market value of industrial robot installations reached an all-time high of $16.7 billion, with annual installations exceeding 500,000 units for the fourth consecutive year in 2024. Physical AI is accelerating this trend by enabling robots to handle more complex, unstructured tasks.

In March 2026, ABB and NVIDIA announced progress in closing the simulation-to-reality gap in industrial robotics. [Boston Dynamics](/wiki/boston_dynamics) began manufacturing production Atlas robots immediately after their CES 2026 unveiling, with all 2026 deployments already committed to customers including Hyundai and Google DeepMind.[13] The Atlas robot can perform a wide array of industrial tasks with a reach of up to 7.5 feet and the ability to lift 110 pounds.[12]

### Autonomous Vehicles

[Autonomous driving](/wiki/autonomous_driving) is a foundational application of physical AI, requiring real-time perception, prediction, and planning in highly dynamic environments. Self-driving systems must understand complex traffic scenarios, predict the behavior of other road users, and execute safe driving decisions.

NVIDIA's Alpamayo Autonomous Driving Platform features a 10-billion-parameter Vision-Language-Action model that uses chain-of-thought reasoning to handle complex driving scenarios. It is based on the Physical AI Open Dataset, which contains more than 1,700 hours of driving data collected from over 2,500 cities in 25 countries. Mercedes-Benz has selected Alpamayo for integration into its vehicles.

[Tesla](/wiki/tesla)'s approach to physical AI in autonomous driving centers on its Full Self-Driving (FSD) platform, which uses camera-based perception with end-to-end [neural networks](/wiki/neural_network) for autonomous navigation and object detection. The same neural network architecture underpinning FSD has been adapted for the Optimus humanoid robot, demonstrating how physical AI techniques can transfer across different embodiments.[21] Tesla expanded FSD globally in 2026, with public road testing launched in Japan in March 2026.

Autonomous vehicles with Level 4 capabilities (fully autonomous in defined conditions) are demonstrating viability in 2026, with broader commercial deployment expected within three to five years.[22]

### Warehouse and Logistics

Warehouse automation is a rapidly growing application of physical AI. [Amazon](/wiki/amazon) operates over one million robots in its warehouses as of 2026, and AI-orchestrated warehouse systems are reducing processing times by up to 60 percent.

[Agility Robotics](/wiki/agility_robotics)' Digit robot, purpose-built for logistics workflows, has moved over 100,000 totes in commercial operations, with deployments spanning GXO Logistics, Amazon, Schaeffler, and Spanx facilities. Unlike general-purpose humanoids, Digit demonstrates the value of domain-specific physical AI optimized for particular operational environments.

### Healthcare and Assistance

Physical AI is finding applications in healthcare through surgical robots, rehabilitation systems, and assistive devices. AI-powered surgical robots can perform certain tasks with greater precision than human surgeons, while assistive robots help elderly or disabled individuals with daily activities.

Home assistance represents a longer-term goal for physical AI companies. Figure AI's Figure 03 robot is designed with home environments in mind, featuring soft materials, wireless charging, and safety features, though consumer availability is not expected until late 2026 at the earliest, and then only in limited pilot programs. 1X Technologies is accepting pre-orders for its NEO home humanoid robot at $20,000, targeting a 2026 US launch.

## The Role of NVIDIA

NVIDIA has positioned itself as the central platform provider for the physical AI ecosystem, analogous to its role in the broader AI revolution through GPU computing. The company's physical AI stack spans multiple layers:

- **Compute hardware**: NVIDIA GPUs and specialized accelerators (including the Blackwell architecture) provide the computational power needed for training and running physical AI models
- **Simulation**: Omniverse and Isaac Sim provide the virtual environments for developing and testing physical AI systems
- **Foundation models**: Cosmos WFMs, GR00T, and Cosmos Reason provide pre-trained models that companies can customize for specific applications
- **Inference**: NVIDIA's Jetson platform provides edge computing for deploying AI models on robots and autonomous systems

At GTC 2025 and CES 2026, Jensen Huang outlined NVIDIA's vision for physical AI as the next major computing platform, comparing the coming wave of intelligent robots and autonomous systems to the personal computer and smartphone revolutions.[11] NVIDIA has partnered with virtually every major robotics company, including Boston Dynamics, Figure AI, Agility Robotics, Apptronik, and Skild AI, providing compute infrastructure, simulation tools, and foundation models.[9]

## How does SoftBank frame physical AI?

NVIDIA is not the only large company to make "physical AI" central to its strategy. In SoftBank Group's 2025 reporting, chairman and CEO Masayoshi Son declared that "SoftBank's next frontier is Physical AI," framing it as the fusion of artificial super intelligence (ASI) with robotics and identifying physical AI as the next trillion-dollar business opportunity.[28] SoftBank's stated mission is to realize ASI for the advancement of humanity, with investment concentrated across four areas: AI chips, AI robots, AI data centers, and energy.[28]

The company backed this framing with capital. On October 8, 2025, SoftBank Group entered a definitive agreement to acquire ABB's robotics business for a purchase price of $5.375 billion, a deal expected to close in mid to late 2026 and intended to accelerate SoftBank's physical AI ambitions.[29] SoftBank also led Skild AI's $1.4 billion round in January 2026.[15] This convergence, with both NVIDIA (as the platform and model supplier) and SoftBank (as a capital allocator and robotics owner) organizing their strategies around the same term, illustrates how "physical AI" has become the dominant industry framing for the embodied wave of artificial intelligence.

## Market and Investment Landscape

Analyst estimates point to rapid growth in both investment and market size for physical AI:

| Metric | Value |
|---|---|
| Physical AI market size (2025) | $5.23 billion |
| Projected physical AI market (2033) | $49.73 billion (CAGR 32.53%) |
| Physical AI software platform market (projected 2034) | $55.8 billion (CAGR 42.0%) |
| Humanoid robot market (2025) | $2.92 billion |
| Projected humanoid market (2030) | $15.26 billion (CAGR 39.2%) |
| Long-term humanoid market (2050, Morgan Stanley estimate) | $5 trillion |
| Total robotics funding (2025) | Over $10.3 billion |
| Humanoid-specific funding (H1 2025) | $3.1 billion across 61 deals |

Major funding rounds through early 2026 reflect the scale of investor interest:

| Company | Round | Amount | Valuation |
|---|---|---|---|
| [Skild AI](/wiki/skild_ai) | Series C (Jan 2026) | $1.4 billion | $14 billion |
| [Figure AI](/wiki/figure_ai) | Series B | $675 million | Not disclosed |
| [Physical Intelligence](/wiki/physical_intelligence) | Series B | ~$600 million | ~$5.3 billion |
| Galaxy Bot | Series A | $453 million | Not disclosed |
| [Apptronik](/wiki/apptronik) | Series A | $403 million | Not disclosed |

Goldman Sachs projects global humanoid shipments of 50,000 to 100,000 units in 2026, with unit costs ultimately falling to $15,000 to $20,000 per robot as production scales.
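
The growth figures in the first table of this section can be sanity-checked with the standard compound annual growth rate (CAGR) relationship, end = start × (1 + CAGR)^years. The short sketch below applies it to the table's own numbers; the helper names `project` and `implied_cagr` are ours, and small mismatches are expected because the published figures are rounded.

```python
def project(start_value, cagr, years):
    """Compound a starting value forward at a constant annual growth rate."""
    return start_value * (1.0 + cagr) ** years


def implied_cagr(start_value, end_value, years):
    """Annual growth rate that takes start_value to end_value over the given years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Physical AI market: $5.23B (2025) at 32.53% for 8 years vs. the stated $49.73B (2033)
physical_ai_2033 = project(5.23, 0.3253, 2033 - 2025)
print(f"projected 2033 market: ${physical_ai_2033:.1f}B")        # about $49.8B

# Humanoid market: $2.92B (2025) at 39.2% for 5 years vs. the stated $15.26B (2030)
humanoid_2030 = project(2.92, 0.392, 2030 - 2025)
print(f"projected 2030 humanoid market: ${humanoid_2030:.2f}B")  # about $15.26B

# Reverse check: growth rate implied by the two physical AI endpoints
print(f"implied CAGR: {implied_cagr(5.23, 49.73, 8):.2%}")
```

Both rows of the table are internally consistent to within rounding, which suggests each projection was derived from its stated base year and growth rate.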

## Challenges and Limitations

Despite rapid progress, physical AI faces significant technical and practical challenges:

### Technical Challenges

- **Sim-to-real gap**: Transferring policies trained in simulation to real-world robots remains difficult due to differences in physics, visuals, and sensor characteristics. While techniques like domain randomization and digital twins help, achieving reliable zero-shot transfer across diverse real-world conditions is still an active area of research.
- **Generalization**: Current physical AI systems can struggle when encountering objects, environments, or situations significantly different from their training data. Achieving truly general-purpose physical intelligence that can handle the full diversity of real-world scenarios remains a long-term goal.
- **Safety and reliability**: Physical AI systems interact with the real world, where failures can cause property damage or injury. Ensuring consistent, safe behavior across all possible situations is substantially harder than in purely digital AI applications.
- **Real-time performance**: Physical AI systems must often make decisions and execute actions within milliseconds. Balancing the computational demands of sophisticated reasoning with the latency requirements of real-time control is a fundamental engineering challenge.
- **Dexterous manipulation**: While progress has been rapid, robots still fall short of human-level dexterity in manipulating diverse objects, especially soft, deformable, or very small items.

### Practical Challenges

- **Cost**: Advanced physical AI systems, particularly humanoid robots, remain expensive. While prices are projected to decrease significantly with scale, current costs limit widespread deployment.
- **Energy and compute requirements**: Running sophisticated AI models on mobile robots requires significant computing power and energy, constraining battery life and operational duration.
- **Regulatory frameworks**: As physical AI systems become more autonomous, regulatory frameworks for safety certification, liability, and operational boundaries are still being developed.
- **Workforce transition**: The deployment of physical AI in manufacturing, logistics, and other sectors raises important questions about workforce displacement and the need for retraining programs.
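
The energy constraint above comes down to simple arithmetic: runtime is battery capacity divided by total power draw, so every watt spent on onboard inference shortens operating time. The sketch below uses hypothetical figures chosen only to make the arithmetic easy to follow; they are not measurements from any real robot.

```python
def runtime_hours(battery_wh: float, compute_w: float, actuation_w: float) -> float:
    """Estimate operating time as battery capacity over total average power draw."""
    total_w = compute_w + actuation_w
    if total_w <= 0:
        raise ValueError("total power draw must be positive")
    return battery_wh / total_w


# Hypothetical robot: 2,000 Wh battery, 400 W average draw for motors and sensors.
light_model = runtime_hours(battery_wh=2000, compute_w=100, actuation_w=400)
heavy_model = runtime_hours(battery_wh=2000, compute_w=400, actuation_w=400)
print(light_model, heavy_model)  # 4.0 2.5
```

Under these assumed numbers, raising onboard compute from 100 W to 400 W cuts runtime from 4 hours to 2.5 hours, which is why model size and inference efficiency are treated as deployment constraints and not only as accuracy trade-offs.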

## Safety and functional safety

Because physical AI systems act on the world through motors, wheels, and limbs, their failures can cause property damage or physical injury, which makes safety a first-order concern rather than an afterthought. Ensuring consistent, safe behavior across the full diversity of real-world situations is substantially harder than in purely digital AI, and it requires more than a reliable model: it requires a safety architecture spanning the compute hardware, the sensors and connectivity, the runtime software, and an independent process for validation and certification. Established functional-safety standards from adjacent fields, including IEC 61508 (general functional safety of electrical and electronic systems), ISO 13849 (safety of machinery control systems), and ISO/IEC TR 5469 (the use of AI within safety-related functions), are being adapted to govern these systems.[30]

### NVIDIA Halos for Robotics

On June 22, 2026, [NVIDIA](/wiki/nvidia) announced Halos for Robotics, described as the industry's first full-stack safety system for physical AI.[30] [NVIDIA Halos](/wiki/nvidia_halos) had originated as a safety system for autonomous vehicles, and Halos for Robotics extends that work, reusing the development processes, tools, and foundational standards built up across NVIDIA's autonomous-vehicle program so that robotics teams can inherit existing automotive safety engineering rather than rebuild it.[31] Deepu Talla, NVIDIA's vice president of robotics and edge AI, framed the motivation by noting that physical AI is transforming factories, warehouses, and logistics operations and that robotics teams need a unified safety architecture to scale autonomous systems into those environments.[30]

Halos for Robotics is organized as three layers:[31]

| Layer | Components | Role |
|---|---|---|
| Compute and connectivity | [Jetson Thor](/wiki/jetson_thor)-class IGX Thor; Holoscan Sensor Bridge | IGX Thor pairs high AI performance with an IEC 61508 SIL 3 capable Safety Island and thousands of on-chip safety mechanisms; the Holoscan Sensor Bridge extends the safety chain to the sensor edge with authenticated, encrypted data flows |
| Software | Halos OS, including Halos Core and the Outside-In Safety Blueprint | Halos Core provides a certified safe Linux (or Linux plus QNX) runtime and safety firmware; the Outside-In Safety Blueprint is a reference design that lets robots operate safely using external infrastructure cameras through AI perception, anomaly monitoring, multi-camera event fusion, and a safety decision maker running on the Functional Safety Island |
| Validation | Halos AI Systems Inspection Lab | An ANSI/ANAB-accredited inspection body (ISO/IEC 17020) that assesses elements of the Halos stack and issues inspection certificates, reducing the certification burden on partners |

The system targets the functional-safety standards relevant to robotics, including IEC 61508, ISO 13849, and ISO/IEC TR 5469.[31] [Agility Robotics](/wiki/agility_robotics) was named as an early adopter, integrating IGX Thor and Halos Core into its Digit humanoid robot for industrial logistics, manufacturing, and warehouse use, and the Halos inspection-lab ecosystem included other robotics firms such as Boston Dynamics and KION Group.[30] For a fuller treatment of the architecture and its automotive origins, see the dedicated [NVIDIA Halos](/wiki/nvidia_halos) article.
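
The fusion-and-decision pattern described for the Outside-In Safety Blueprint can be illustrated with a toy Python sketch. This is not NVIDIA's implementation or API: the `CameraEvent` type, the thresholds, and the `decide` function are all hypothetical, and a certified system would involve far more than this. The sketch only shows the general idea of combining several external camera reports conservatively and failing safe when the data cannot be trusted.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RUN = "run"
    SLOW = "slow"
    STOP = "stop"


@dataclass(frozen=True)
class CameraEvent:
    """Hypothetical per-camera report: distance to the nearest detected person."""
    camera_id: str
    nearest_person_m: Optional[float]  # None means no person detected
    timestamp_s: float


# Illustrative thresholds (assumed values), not taken from any standard or product.
STOP_DISTANCE_M = 1.0
SLOW_DISTANCE_M = 3.0
MAX_AGE_S = 0.2
MIN_FRESH_CAMERAS = 2


def decide(events: list[CameraEvent], now_s: float) -> Action:
    """Fuse camera events conservatively and return the most restrictive action."""
    fresh = [e for e in events if now_s - e.timestamp_s <= MAX_AGE_S]
    # Fail safe: with too few live cameras the scene cannot be trusted.
    if len(fresh) < MIN_FRESH_CAMERAS:
        return Action.STOP
    distances = [e.nearest_person_m for e in fresh if e.nearest_person_m is not None]
    if not distances:
        return Action.RUN
    nearest = min(distances)  # believe the most pessimistic camera
    if nearest <= STOP_DISTANCE_M:
        return Action.STOP
    if nearest <= SLOW_DISTANCE_M:
        return Action.SLOW
    return Action.RUN
```

Two choices in the sketch reflect common functional-safety practice in general terms: missing or stale sensor data leads to a stop instead of a default to normal operation, and when cameras disagree the closest reported distance wins.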

## Current State and Future Outlook (2025-2026)

As of early 2026, physical AI has reached a critical inflection point. Several converging trends indicate that the field is transitioning from research and prototyping to commercial deployment:

- **Foundation models are maturing**: VLA models like GR00T N1, Gemini Robotics, pi0, and Helix have demonstrated increasingly general and capable robot control across diverse tasks and environments.
- **Simulation tools are improving**: NVIDIA Cosmos, Omniverse, and Isaac Sim provide increasingly realistic training environments, narrowing the sim-to-real gap.
- **Hardware is scaling**: Multiple companies (Boston Dynamics, Figure AI, Tesla, Agility Robotics) are moving from prototypes to production manufacturing, with tens of thousands of units planned for 2026 and beyond.
- **Investment is accelerating**: Robotics funding exceeded $10 billion in 2025, and humanoid-specific investment in the first half of 2025 alone exceeded the total from 2010 to 2024.
- **Commercial deployments are expanding**: Atlas robots are shipping to Hyundai and Google DeepMind; Digit robots are operating in commercial warehouses; Tesla plans limited Optimus sales by late 2026.

Deloitte's 2026 Technology Trends report identified physical AI and humanoid robots as a major trend, noting the convergence of vision, sensing, cobots, and AI that is enabling humans and mobile robots to work together in increasingly flexible environments.[18] Gartner predicted that 40 percent of enterprise applications would leverage task-specific AI agents by 2026, up from less than 5 percent in 2025.

Looking further ahead, the physical AI field is expected to progress through several phases: near-term commercial deployment in structured environments like factories and warehouses (2025 to 2027), broader deployment in semi-structured environments like stores and hospitals (2027 to 2030), and eventual deployment in fully unstructured environments like homes and outdoor spaces (2030 and beyond).

## How does physical AI differ from embodied AI?

Embodied AI is the field closest to physical AI, and the two differ mainly in framing. Physical AI also intersects with and builds upon several other related fields:

- **[Embodied AI](/wiki/embodied_ai)**: The broader research field studying AI systems with physical bodies that interact with environments. Physical AI can be considered the applied, commercial manifestation of embodied AI research; the two terms are often used interchangeably, but "physical AI" is the industry and product framing popularized by NVIDIA and SoftBank, while "embodied AI" is the older academic term.
- **[Robot learning](/wiki/robot_learning)**: Techniques for training robots through [reinforcement learning](/wiki/reinforcement_learning), [imitation learning](/wiki/imitation_learning), and other methods. Physical AI relies heavily on robot learning approaches.
- **[Robot foundation models](/wiki/robot_foundation_model)**: Large pretrained models, including VLA models, that serve as a general-purpose control layer adaptable across many robots and tasks. These are the model-side core of physical AI.
- **[World models](/wiki/world_model)**: Learned simulators of environment dynamics that let an agent predict and plan; they underpin both the cognition layer of physical AI and the world foundation models used to generate synthetic training data.
- **[Computer vision](/wiki/computer_vision)**: The perception backbone of physical AI systems, providing visual understanding of the environment.
- **[Natural language processing](/wiki/natural_language_processing)**: Enables physical AI systems to understand and follow human language instructions.
- **[Digital twins](/wiki/digital_twin)**: Virtual replicas of physical systems used for simulation, testing, and monitoring.
- **[Edge computing](/wiki/edge_computing)**: Provides the on-device compute necessary for real-time physical AI inference on robots and autonomous systems.

## See Also

- [Embodied AI](/wiki/embodied_ai)
- [World Model](/wiki/world_model)
- [Robot Foundation Model](/wiki/robot_foundation_model)
- [NVIDIA Cosmos](/wiki/nvidia_cosmos)
- [Robot Learning](/wiki/robot_learning)
- [Autonomous Driving](/wiki/autonomous_driving)
- [Humanoid Robot](/wiki/humanoid_robot)
- [Vision-Language-Action Model](/wiki/vision-language-action_model)
- [NVIDIA Omniverse](/wiki/nvidia_omniverse)
- [Reinforcement Learning](/wiki/reinforcement_learning)
- [Digital Twin](/wiki/digital_twin)
- [Computer Vision](/wiki/computer_vision)

## References

1. NVIDIA Blog. "CES 2025: AI Advancing at 'Incredible Pace,' NVIDIA CEO Says." January 2025. https://blogs.nvidia.com/blog/ces-2025-jensen-huang/
2. Axios. "Nvidia CES 2026: Jensen Huang says 'ChatGPT moment for physical AI' is coming." January 2026. https://www.axios.com/2026/01/05/nvidia-ces-2026-jensen-huang-speech-ai
3. NVIDIA Newsroom. "NVIDIA Announces Isaac GR00T N1 -- the World's First Open Humanoid Robot Foundation Model." March 2025. https://nvidianews.nvidia.com/news/nvidia-isaac-gr00t-n1-open-humanoid-robot-foundation-model-simulation-frameworks
4. Physical Intelligence. "Our First Generalist Policy." 2024. https://www.pi.website/blog/pi0
5. Google DeepMind. "Gemini Robotics brings AI into the physical world." March 2025. https://deepmind.google/blog/gemini-robotics-brings-ai-into-the-physical-world/
6. Figure AI. "Helix: A Vision-Language-Action Model for Generalist Humanoid Control." 2025. https://www.figure.ai/news/helix
7. Figure AI. "Introducing Helix 02: Full-Body Autonomy." January 2026. https://www.figure.ai/news/helix-02
8. NVIDIA. "NVIDIA Cosmos: World Foundation Models Powering Physical AI." https://www.nvidia.com/en-us/ai/cosmos/
9. NVIDIA Newsroom. "NVIDIA Opens Portals to World of Robotics With New Omniverse Libraries, Cosmos Physical AI Models and AI Computing Infrastructure." 2025. https://nvidianews.nvidia.com/news/nvidia-opens-portals-to-world-of-robotics-with-new-omniverse-libraries-cosmos-physical-ai-models-and-ai-computing-infrastructure
10. Allen Institute for AI. "MolmoBot: Open, Simulation-First Stack for Physical AI." 2025. https://allenai.org/blog/molmobot
11. Superb AI. "Jensen Huang Declares 'Physical AI' the Next Wave of AI." 2025. https://superb-ai.com/en/resources/blog/physical-ai-series-1-what-is-it-en
12. Boston Dynamics. "Atlas Humanoid Robot." https://bostondynamics.com/products/atlas/
13. Engadget. "Boston Dynamics unveils production-ready version of Atlas robot at CES 2026." January 2026. https://www.engadget.com/big-tech/boston-dynamics-unveils-production-ready-version-of-atlas-robot-at-ces-2026-234047882.html
14. CNBC. "Google wants Intrinsic to be 'Android of robotics' as it pushes into physical AI." February 2026. https://www.cnbc.com/2026/02/28/google-wants-intrinsic-to-be-android-for-robots-moves-into-physical-ai.html
15. AI Business. "AI Startup That Builds a Brain for Robots Valued at $14 Billion." January 2026. https://aibusiness.com/robotics/skild-ai-startup-builds-robot-brain
16. SNS Insider. "Physical AI Market Size, Share & Growth Report 2033." https://www.snsinsider.com/reports/physical-ai-market-9007
17. Morgan Stanley. "Humanoid Robot Market Expected to Reach $5 Trillion by 2050." https://www.morganstanley.com/insights/articles/humanoid-robot-market-5-trillion-by-2050
18. Deloitte. "AI goes physical: Navigating the convergence of AI and robotics." 2026. https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2026/physical-ai-humanoid-robots.html
19. Wikipedia. "Vision-language-action model." https://en.wikipedia.org/wiki/Vision-language-action_model
20. arXiv. "Physical AI Agents: Integrating Cognitive Intelligence with Real-World Action." 2025. https://arxiv.org/html/2501.08944v1
21. Tesla. "AI & Robotics." https://www.tesla.com/AI
22. Barclays. "AI gets physical: Innovation meets opportunity." 2025. https://www.ib.barclays/our-insights/series/impact-series/ai-gets-physical-innovation-meets-opportunity.html
23. Google DeepMind. "Gemini Robotics 1.5 brings AI agents into the physical world." 2025. https://deepmind.google/blog/gemini-robotics-15-brings-ai-agents-into-the-physical-world/
24. NVIDIA Developer Blog. "Scale Synthetic Data and Physical AI Reasoning with NVIDIA Cosmos World Foundation Models." 2025. https://developer.nvidia.com/blog/scale-synthetic-data-and-physical-ai-reasoning-with-nvidia-cosmos-world-foundation-models/
25. NVIDIA Newsroom. "NVIDIA Launches Cosmos World Foundation Model Platform to Accelerate Physical AI Development." January 2025. https://nvidianews.nvidia.com/news/nvidia-launches-cosmos-world-foundation-model-platform-to-accelerate-physical-ai-development
26. NVIDIA Research. "Cosmos World Foundation Model Platform for Physical AI." arXiv:2501.03575. January 2025. https://arxiv.org/abs/2501.03575
27. NVIDIA Research. "Cosmos-Predict2.5: Improved World Simulation with Video Foundation Models." 2025. https://research.nvidia.com/labs/cosmos-lab/cosmos-predict2.5/
28. SoftBank Group. "CEO Message (Masayoshi Son), SoftBank Group Report 2025." 2025. https://group.softbank/en/ir/financials/annual_reports/2025/message/son
29. SoftBank Group. "Acquisition of ABB Ltd's Robotics Business." October 8, 2025. https://group.softbank/en/news/press/20251008
30. NVIDIA Newsroom. "NVIDIA Announces Halos for Robotics, the Industry's First Full-Stack Safety System for Physical AI." June 22, 2026. https://nvidianews.nvidia.com/news/nvidia-announces-halos-for-robotics-the-industrys-first-full-stack-safety-system-for-physical-ai
31. NVIDIA Developer Blog. "Inside NVIDIA Halos for Robotics: A Full-Stack Functional Safety System for Physical AI." June 2026. https://developer.nvidia.com/blog/inside-nvidia-halos-for-robotics-a-full-stack-functional-safety-system-for-physical-ai/

