Physical AI refers to artificial intelligence systems that can perceive, understand, reason about, and interact with the physical world. Unlike purely digital AI, which operates on text, images, or data within software environments, physical AI bridges the gap between digital intelligence and real-world action. These systems combine advanced perception, cognitive reasoning, planning, and motor control to enable machines such as robots, autonomous vehicles, and industrial automation systems to operate intelligently in dynamic, unstructured environments.
The term gained widespread prominence through NVIDIA CEO Jensen Huang, who positioned physical AI as the third major era of artificial intelligence at CES 2025. Huang described the progression from perception AI (understanding images, words, and sounds) to generative AI (creating text, images, and media) to physical AI (perceiving, reasoning, planning, and acting in the real world). At CES 2026, Huang declared that "the ChatGPT moment for physical AI" had arrived, signaling that machines were beginning to understand, reason, and act in the physical world at a transformative scale.
Physical AI encompasses a broad set of capabilities that allow intelligent systems to function in the real world. At its core, a physical AI system must be able to:

- **Perceive** the environment through sensors such as cameras, LiDAR, radar, and tactile devices
- **Understand and reason** about what it perceives, building an internal representation of the scene
- **Plan** sequences of actions that accomplish a goal
- **Act** on the world through actuators, adjusting continuously in response to feedback
This closed-loop integration of perception, cognition, and action distinguishes physical AI from other forms of artificial intelligence. While a large language model like GPT-4 can reason about the world through text, it cannot fold laundry, drive a car, or assemble a product on a factory line. Physical AI aims to bring that level of intelligence into tangible, real-world applications.
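As a schematic, the closed loop can be expressed as a simple sense-plan-act cycle. The sketch below is illustrative only; `sensors`, `planner`, `controller`, and `actuators` are hypothetical placeholders, not components of any particular system:

```python
import time

def control_loop(sensors, planner, controller, actuators, rate_hz=50):
    """Schematic sense-plan-act loop at the heart of a physical AI system."""
    dt = 1.0 / rate_hz
    while True:
        observation = sensors.read()                  # perceive: cameras, LiDAR, IMU, ...
        plan = planner.update(observation)            # reason and plan at a higher level
        command = controller.step(plan, observation)  # translate plan into motor commands
        actuators.apply(command)                      # act on the physical world
        time.sleep(dt)                                # feedback arrives on the next tick
```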
Physical AI draws on insights from cognitive science and neuroscience, building on the idea that intelligence emerges from the dynamic coupling of perception, cognition, and physical interaction. This concept, sometimes called embodied cognition, suggests that an agent's physical form and its ability to interact with the environment are integral to how it develops and applies intelligence.
The perception layer serves as a physical AI system's sensory interface with the world. It captures and processes real-time environmental data to build an internal representation of the surroundings. Sensors typically include:
| Sensor Type | Function | Common Applications |
|---|---|---|
| Cameras (RGB, depth) | Visual scene understanding, object recognition | Autonomous driving, robotic manipulation |
| LiDAR | 3D spatial mapping, distance measurement | Self-driving cars, drone navigation |
| Radar | Velocity detection, obstacle tracking | Automotive safety, industrial monitoring |
| IMUs (accelerometers, gyroscopes) | Orientation, balance, motion tracking | Humanoid robots, drones |
| Force/torque sensors | Contact force measurement, tactile feedback | Robotic grasping, assembly tasks |
| Proximity sensors | Near-field object detection | Warehouse robots, collaborative robots |
Modern physical AI systems increasingly use multimodal perception, fusing data from multiple sensor types to create richer, more robust environmental understanding. Computer vision has advanced rapidly with transformer-based architectures, enabling real-time object detection, scene segmentation, and spatial reasoning.
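As an illustration of late multimodal fusion, the toy module below encodes each sensor stream separately and merges the embeddings into one scene representation. The encoder architectures are deliberately minimal placeholders (a small CNN, a PointNet-style point encoder, a linear IMU embedding), not any production design:

```python
import torch
import torch.nn as nn

class LateFusionPerception(nn.Module):
    """Toy late-fusion perception: encode each sensor stream, then fuse."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        # Camera branch: small CNN standing in for a vision transformer.
        self.camera_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, embed_dim),
        )
        # LiDAR branch: per-point MLP followed by max-pooling (PointNet-style).
        self.lidar_encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, embed_dim)
        )
        # IMU branch: embed 3-axis accelerometer + 3-axis gyroscope readings.
        self.imu_encoder = nn.Linear(6, embed_dim)
        # Fusion head: concatenate all modalities and project to one embedding.
        self.fusion = nn.Sequential(nn.Linear(3 * embed_dim, embed_dim), nn.ReLU())

    def forward(self, rgb, points, imu):
        cam = self.camera_encoder(rgb)                         # (B, D)
        lidar = self.lidar_encoder(points).max(dim=1).values   # pool over N points
        imu_e = self.imu_encoder(imu)                          # (B, D)
        return self.fusion(torch.cat([cam, lidar, imu_e], dim=-1))

model = LateFusionPerception()
scene = model(
    torch.randn(1, 3, 64, 64),   # RGB image
    torch.randn(1, 1024, 3),     # LiDAR point cloud (x, y, z)
    torch.randn(1, 6),           # IMU window
)
print(scene.shape)  # torch.Size([1, 256])
```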
The cognitive layer processes perceptual inputs and generates plans of action. In physical AI, this often involves reasoning about the state of the environment, interpreting goals or natural language instructions, and decomposing tasks into step-by-step plans that the control system can execute.
Recent advances in large language models and vision-language models have significantly improved the cognitive capabilities of physical AI systems. These models provide a form of "System 2" thinking (slow, deliberate reasoning) that complements the fast, reflexive "System 1" control needed for real-time physical interaction.
The action layer translates cognitive plans into physical movements. This involves generating precise motor commands for robotic actuators, whether those are robotic arms, grippers, legs for walking, or wheels for navigation. Key challenges in action and control include generating smooth, continuous trajectories at high control frequencies, maintaining balance and managing contact forces, and acting safely around people and fragile objects.
A major development driving physical AI forward is the emergence of foundation models specifically designed for robotic control and physical interaction. These models, often called Vision-Language-Action (VLA) models, represent a convergence of computer vision, natural language processing, and robotic control into unified architectures.
A Vision-Language-Action model is a class of multimodal foundation model that integrates three capabilities: vision (camera images or video of the environment), language (natural language instructions), and action (low-level robot commands such as motor movements, joint angles, or gripper states). Given an input image of the robot's surroundings and a text instruction like "pick up the red cup and place it in the sink," a VLA directly outputs robot actions that can be executed to accomplish the task.
VLAs are generally constructed by fine-tuning a vision-language model (VLM) on large-scale datasets that pair visual observations and language instructions with robot trajectories. The architecture typically combines a vision-language encoder (often a vision transformer) with an action decoder that transforms latent representations into continuous output actions.
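A minimal sketch of this general pattern, with toy dimensions and a placeholder patch/token pipeline rather than the architecture of any specific model: image patches and instruction tokens pass through one shared transformer encoder, and an action head maps the pooled latent to a continuous action vector.

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Toy vision-language-action model: one transformer over image patch
    tokens and instruction tokens, decoded into continuous robot actions
    (e.g., 7 joint deltas plus a gripper state)."""

    def __init__(self, vocab=1000, d=128, action_dim=8):
        super().__init__()
        self.patch_embed = nn.Linear(16 * 16 * 3, d)   # 16x16 RGB patches -> tokens
        self.text_embed = nn.Embedding(vocab, d)       # instruction token ids -> tokens
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d, action_dim)    # latent -> continuous actions

    def forward(self, patches, instruction_ids):
        tokens = torch.cat(
            [self.patch_embed(patches), self.text_embed(instruction_ids)], dim=1
        )
        latent = self.encoder(tokens).mean(dim=1)      # pool the joint sequence
        return self.action_head(latent)

vla = ToyVLA()
patches = torch.randn(1, 64, 16 * 16 * 3)      # 64 patches from one camera frame
instruction = torch.randint(0, 1000, (1, 12))  # "pick up the red cup..." as token ids
print(vla(patches, instruction).shape)         # torch.Size([1, 8])
```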
Several architectural paradigms have emerged in the VLA space as of 2025:
| Paradigm | Description | Example Models |
|---|---|---|
| Early fusion | Vision, language, and action tokens are combined into a single sequence processed by one transformer | OpenVLA, SmolVLA |
| Dual-system architecture | A slow "System 2" VLM for reasoning paired with a fast "System 1" policy for real-time control | GR00T N1, Helix |
| Flow matching | Uses continuous normalizing flows to produce smooth action trajectories at high frequency | pi0 |
| Self-correcting | Models that detect and recover from execution errors using visual feedback | CoA-VLA |
NVIDIA announced Isaac GR00T N1 in March 2025 as the world's first open, fully customizable foundation model for generalized humanoid reasoning and skills. GR00T N1 features a dual-system architecture inspired by principles of human cognition. "System 2" is a slow-thinking model powered by a vision-language model that reasons about the environment and instructions to plan actions. "System 1" is a fast-thinking action model that translates these plans into precise, continuous robot movements.
GR00T N1 can generalize across common tasks such as grasping, moving objects with one or both arms, and transferring items between arms, as well as performing multi-step tasks that require long context and combinations of general skills. These capabilities can be applied across use cases including material handling, packaging, and inspection.
The model was updated to GR00T N1.6 in late 2025, integrating NVIDIA Cosmos Reason, an open reasoning vision-language model built for physical AI. Cosmos Reason acts as the robot's deep-thinking brain, turning vague instructions into step-by-step plans using prior knowledge, common sense, and physics to handle new situations. Leading humanoid developers with early access to GR00T N1 include Agility Robotics, Boston Dynamics, Mentee Robotics, and NEURA Robotics.
Physical Intelligence (often stylized with the Greek letter π) developed pi0 (pi-zero), a general-purpose VLA foundation model for robots. Built on top of the PaliGemma VLM, pi0 was trained on data from seven robotic platforms performing 68 unique tasks. The model employs flow matching to produce smooth, real-time action trajectories at 50 Hz.
pi0 demonstrated strong zero-shot and fine-tuned performance on complex real-world tasks including laundry folding, table bussing, grocery bagging, box assembly, and object retrieval. Physical Intelligence open-sourced the model through its "openpi" release, enabling the broader robotics community to fine-tune pi0 for their own robots and tasks. A subsequent model, pi0.5, introduced in 2025, exhibited meaningful generalization to entirely new environments.
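To illustrate the flow-matching idea at a high level, the sketch below trains a small network to predict the velocity that transports Gaussian noise to action trajectories along straight-line paths, then integrates that velocity field at inference to sample an action chunk. This is a bare-bones illustration of the technique only; pi0's actual architecture, observation conditioning, and training recipe differ, and conditioning on images and language is omitted here entirely.

```python
import torch
import torch.nn as nn

ACTION_DIM = 7   # e.g., joint velocities
HORIZON = 16     # actions per predicted chunk

# Velocity field v(x_t, t): maps a noisy action chunk plus time to a velocity.
net = nn.Sequential(
    nn.Linear(HORIZON * ACTION_DIM + 1, 256), nn.ReLU(),
    nn.Linear(256, HORIZON * ACTION_DIM),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def training_step(actions):
    """One flow-matching step on a batch of expert action chunks (B, H*A)."""
    noise = torch.randn_like(actions)        # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1)      # random interpolation time in [0, 1]
    x_t = (1 - t) * noise + t * actions      # straight-line path between noise and data
    target_v = actions - noise               # constant velocity along that path
    pred_v = net(torch.cat([x_t, t], dim=-1))
    loss = ((pred_v - target_v) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

@torch.no_grad()
def sample(steps=10):
    """Integrate dx/dt = v(x, t) from noise to an action chunk (Euler steps)."""
    x = torch.randn(1, HORIZON * ACTION_DIM)
    for i in range(steps):
        t = torch.full((1, 1), i / steps)
        x = x + net(torch.cat([x, t], dim=-1)) / steps
    return x.view(HORIZON, ACTION_DIM)

for _ in range(100):
    training_step(torch.randn(32, HORIZON * ACTION_DIM))  # stand-in for real data
print(sample().shape)  # torch.Size([16, 7])
```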
Google DeepMind introduced Gemini Robotics and Gemini Robotics-ER (extended reasoning) in March 2025. Gemini Robotics is an advanced VLA generalist model capable of directly controlling robots, executing smooth and reactive movements to tackle a wide range of complex manipulation tasks. Built on the capabilities of Gemini 2.0, it extends multimodal understanding to physical action.
The reasoning capabilities of the Gemini 2.0 backbone, paired with learned low-level robot actions, allow robots to perform highly dexterous tasks such as folding origami and playing with cards. Gemini Robotics 1.5, released in 2025, brought AI agents further into the physical world by enabling robots to perceive, plan, think, use tools, and act to solve complex multi-step tasks.
In March 2026, Google partnered with Agile Robots to integrate Gemini Robotics foundation models with industrial hardware for manufacturing and logistics applications. Google also brought its Intrinsic robotics software division in-house to accelerate physical AI development.
Figure AI developed Helix, a generalist VLA model that unifies perception, language understanding, and learned control for humanoid robots. Helix was the first VLA to output high-rate continuous control of the entire humanoid upper body, including wrists, torso, head, and individual fingers. It was also the first VLA to operate simultaneously on two robots, enabling them to solve shared, long-horizon manipulation tasks with items they had never encountered.
Helix uses a dual-system approach: System 2 (S2), an onboard VLM operating at 7 to 9 Hz for scene understanding and language comprehension, and System 1 (S1), a fast reactive visuomotor policy that translates semantic representations into precise robot actions at 200 Hz. Helix 02, released in January 2026, extended control to full-body autonomy including walking and balance. In a demonstration, Helix 02 autonomously unloaded and reloaded a dishwasher across a full-sized kitchen in a continuous four-minute task integrating walking, manipulation, and balance with no resets or human intervention.
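The dual-rate pattern can be sketched schematically as follows. This is not Figure's implementation, only an illustration of the timing structure: a slow reasoning thread refreshes a shared latent plan at a few hertz while a fast control loop consumes the most recent plan at a much higher rate.

```python
import threading
import time

# Schematic dual-system control loop. The "models" are string stand-ins.
latest_plan = {"plan": None}
lock = threading.Lock()

def system2_loop():
    """Slow System 2: scene understanding + language -> latent plan at ~8 Hz."""
    step = 0
    while True:
        plan = f"latent-plan-{step}"   # stand-in for a VLM forward pass
        with lock:
            latest_plan["plan"] = plan
        step += 1
        time.sleep(1 / 8)

def system1_step(plan, observation):
    """Fast System 1: reactive visuomotor policy -> one motor command."""
    return f"action given {plan}"      # stand-in for a small policy network

threading.Thread(target=system2_loop, daemon=True).start()

for _ in range(1000):                  # 200 Hz control loop (~5 s total)
    with lock:
        plan = latest_plan["plan"]
    if plan is not None:
        action = system1_step(plan, observation=None)
        # send `action` to the actuators here
    time.sleep(1 / 200)
```

The key design property this illustrates is decoupling: the fast loop never blocks on the slow model, it simply acts on whatever plan is freshest.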
Skild AI is building a single, general-purpose artificial brain designed to control any robot for any task. The Skild Brain is "omni-bodied," meaning it can control various robot forms without prior knowledge of their exact body configuration, including quadrupeds, humanoids, tabletop arms, and mobile manipulators. In January 2026, Skild AI raised $1.4 billion in funding at a $14 billion valuation, led by SoftBank with participation from NVIDIA's NVentures, Bezos Expeditions, Samsung, LG, and Schneider Electric.
A critical enabler of physical AI is the development of world foundation models (WFMs) and advanced simulation environments. Training physical AI systems in the real world is expensive, slow, and potentially dangerous. Simulation provides a scalable alternative, allowing AI agents to learn from millions of interactions in virtual environments before deploying to the real world.
NVIDIA Cosmos is a platform of state-of-the-art generative world foundation models, advanced tokenizers, guardrails, and accelerated data processing pipelines. Cosmos generates realistic synthetic data for training and validating physical AI models, helping bridge the gap between simulation and reality.
Key components of the Cosmos platform include:
| Component | Description |
|---|---|
| Cosmos Predict 2.5 | Unifies Text2World, Image2World, and Video2World generation in a single model; trained on 200 million curated video clips |
| Cosmos Transfer 2.5 | Enables high-fidelity, spatially controlled world-to-world style transfer; 3.5x smaller than its predecessor |
| Cosmos Reason | An open reasoning VLM for physical AI that provides step-by-step planning using common sense and physics |
| Cosmos 3 | The first world foundation model unifying synthetic world generation, vision reasoning, and action simulation |
Cosmos models can be integrated into synthetic data pipelines running in NVIDIA Isaac Sim, the open-source robotics simulation framework built on the NVIDIA Omniverse platform. By generating photorealistic videos from simulated physics-based environments, these WFMs help reduce the simulation-to-real gap.
NVIDIA Omniverse is a platform for building and operating physically accurate digital twin simulations. It provides the infrastructure for creating virtual environments where physical AI systems can be trained, tested, and validated before real-world deployment.
NVIDIA Isaac Sim is a robotics simulation application built on Omniverse that enables researchers and developers to design, simulate, test, and train AI-based robots in physically accurate virtual environments. The Newton Physics Engine, released in late 2025, provides GPU-accelerated physics simulation within Isaac Lab for training robotic policies.
The simulation-to-reality (sim-to-real) pipeline typically works as follows:

1. Build a physically accurate virtual environment, often a digital twin of the target workspace.
2. Train policies in simulation on large volumes of synthetic interactions.
3. Validate the trained policies under increasingly realistic and randomized conditions.
4. Deploy to real hardware, optionally fine-tuning on a small amount of real-world data.
The simulation-to-reality gap (sim-to-real gap) remains one of the central challenges in physical AI. This gap refers to the discrepancies between simulated and real-world environments that cause policies trained in simulation to perform poorly on real hardware. Sources of this gap include imperfect physics models (friction, contact, deformation), differences between rendered and real sensor data, unmodeled actuator dynamics and latency, and real-world variability that simulations fail to capture.
Several techniques have been developed to address the sim-to-real gap:
| Technique | Description |
|---|---|
| Domain randomization | Varying simulation parameters (lighting, textures, physics properties) to make policies robust to real-world variation |
| Domain adaptation | Using techniques like adversarial training to align simulated and real feature distributions |
| Policy distillation | Transferring learned behaviors from a complex simulation policy to a simpler policy suitable for real deployment |
| Digital twins | Creating high-fidelity replicas of real environments to minimize the gap from the start |
| Zero-shot transfer | Training on sufficiently diverse synthetic data to enable direct deployment without real-world fine-tuning |
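Of the techniques above, domain randomization is the most widely used. A minimal sketch of the idea, assuming a generic simulator interface (the `sim` setters and `policy` methods here are hypothetical placeholders for whatever API a given simulator and training framework expose):

```python
import random

def randomize_domain(sim):
    """Resample visual and physical parameters each episode so the policy
    never overfits to any single simulated world."""
    sim.set_light_intensity(random.uniform(0.3, 1.5))            # lighting variation
    sim.set_texture(random.choice(["wood", "metal", "fabric"]))  # visual variation
    sim.set_friction(random.uniform(0.5, 1.2))                   # contact physics
    sim.set_object_mass(random.uniform(0.8, 1.25))               # dynamics variation
    sim.set_camera_jitter(random.gauss(0.0, 0.01))               # sensor placement noise

def train(policy, sim, episodes=10_000):
    for _ in range(episodes):
        randomize_domain(sim)          # a new world every episode
        obs = sim.reset()
        done = False
        while not done:
            obs, reward, done = sim.step(policy.act(obs))
            policy.update(obs, reward)
```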
Notable progress has been demonstrated by the Allen Institute for AI (Ai2), whose MolmoBot project showed that with sufficient diversity across scenes, objects, lighting, physics, and task definitions, zero-shot transfer from simulation alone is practical for real-world robotic manipulation.
The physical AI landscape involves major technology companies, specialized startups, and research institutions. The following table summarizes the leading players as of early 2026:
| Company | Focus Area | Key Products/Models | Notable Developments |
|---|---|---|---|
| NVIDIA | Platform and infrastructure | Cosmos, Isaac GR00T, Omniverse, Isaac Sim | Provides the foundational compute, simulation, and model platform for much of the industry |
| Google DeepMind | Foundation models, robotics research | Gemini Robotics, Gemini Robotics-ER | Partnered with Boston Dynamics and Agile Robots; brought Intrinsic in-house |
| Physical Intelligence | General-purpose robot foundation models | pi0, pi0.5 | Raised over $600M; open-sourced pi0; backed by OpenAI |
| Figure AI | Humanoid robots | Figure 03, Helix, Helix 02 | First VLA with full-body humanoid control; targeting home environments |
| Tesla | Humanoid robots, autonomous driving | Optimus, FSD | Leverages FSD neural networks for Optimus; planning 50,000 units by 2026 |
| Boston Dynamics | Humanoid robots, industrial automation | Atlas | Production began in 2026; 30,000-unit/year factory planned; partnered with Google DeepMind |
| Skild AI | Universal robot brain | Skild Brain | $1.4B funding at $14B valuation; omni-bodied control across robot types |
| Agility Robotics | Logistics humanoid robots | Digit | Moved 100,000+ totes in commercial operations; customers include Amazon and GXO |
| Apptronik | Humanoid robots | Apollo | Over $770M in total funding; partnered with Google for Gemini integration |
| 1X Technologies | Home humanoid robots | NEO | Accepting pre-orders at $20K; targeting 2026 US launch |
| Allen Institute for AI | Open research | MolmoBot | Demonstrated zero-shot sim-to-real transfer with fully open models |
Manufacturing represents one of the most immediate and high-value application domains for physical AI. Industrial robots powered by physical AI can perform tasks that previously required human judgment and dexterity, including assembly, quality inspection, material handling, and packaging.
The global market value of industrial robot installations reached an all-time high of $16.7 billion in 2024, with annual installations exceeding 500,000 units for the fourth consecutive year. Physical AI is accelerating this trend by enabling robots to handle more complex, unstructured tasks.
In March 2026, ABB and NVIDIA announced progress in closing the simulation-to-reality gap in industrial robotics. Boston Dynamics began manufacturing production Atlas robots immediately after their CES 2026 unveiling, with all 2026 deployments already committed to customers including Hyundai and Google DeepMind. The Atlas robot can perform a wide array of industrial tasks with a reach of up to 7.5 feet and the ability to lift 110 pounds.
Autonomous driving is a foundational application of physical AI, requiring real-time perception, prediction, and planning in highly dynamic environments. Self-driving systems must understand complex traffic scenarios, predict the behavior of other road users, and execute safe driving decisions.
NVIDIA's Alpamayo Autonomous Driving Platform features a 10-billion-parameter Vision-Language-Action model that leverages chain-of-thought reasoning to handle complex driving scenarios. Based on the Physical AI Open Dataset with more than 1,700 hours of driving data collected from over 2,500 cities in 25 countries, Alpamayo has been selected by Mercedes-Benz for integration into its vehicles.
Tesla's approach to physical AI in autonomous driving centers on its Full Self-Driving (FSD) platform, which uses camera-based perception with end-to-end neural networks for autonomous navigation and object detection. The same neural network architecture underpinning FSD has been adapted for the Optimus humanoid robot, demonstrating how physical AI techniques can transfer across different embodiments. Tesla expanded FSD globally in 2026, with public road testing launched in Japan in March 2026.
Autonomous vehicles with Level 4 capabilities (fully autonomous in defined conditions) are demonstrating viability in 2026, with broader commercial deployment expected within three to five years.
Warehouse automation is a rapidly growing application of physical AI. Amazon operates over one million robots in its warehouses as of 2026, and AI-orchestrated warehouse systems are reducing processing times by up to 60 percent.
Agility Robotics' Digit robot, purpose-built for logistics workflows, has moved over 100,000 totes in commercial operations with customers including GXO Logistics, Amazon, Schaeffler, and Spanx factories. Unlike general-purpose humanoids, Digit demonstrates the value of domain-specific physical AI optimized for particular operational environments.
Physical AI is finding applications in healthcare through surgical robots, rehabilitation systems, and assistive devices. AI-powered surgical robots can perform procedures with greater precision than human surgeons in certain tasks, while assistive robots help elderly or disabled individuals with daily activities.
Home assistance represents a longer-term goal for physical AI companies. Figure AI's Figure 03 robot is designed with home environments in mind, featuring soft materials, wireless charging, and safety features, though consumer availability is not expected until late 2026 at the earliest, and then only through limited pilot programs. 1X Technologies is accepting pre-orders for its NEO home humanoid robot at $20,000, targeting a 2026 US launch.
NVIDIA has positioned itself as the central platform provider for the physical AI ecosystem, analogous to its role in the broader AI revolution through GPU computing. The company's physical AI stack spans multiple layers: compute infrastructure for training and onboard inference, simulation and digital-twin tooling (Omniverse and Isaac Sim), and foundation models (Cosmos and Isaac GR00T).
At GTC 2025 and CES 2026, Jensen Huang outlined NVIDIA's vision for physical AI as the next major computing platform, comparing the coming wave of intelligent robots and autonomous systems to the personal computer and smartphone revolutions. NVIDIA has partnered with virtually every major robotics company, including Boston Dynamics, Figure AI, Agility Robotics, Apptronik, and Skild AI, providing compute infrastructure, simulation tools, and foundation models.
The physical AI market is experiencing extraordinary growth in both investment and market size:
| Metric | Value |
|---|---|
| Physical AI market size (2025) | $5.23 billion |
| Projected physical AI market (2033) | $49.73 billion (CAGR 32.53%) |
| Physical AI software platform market (projected 2034) | $55.8 billion (CAGR 42.0%) |
| Humanoid robot market (2025) | $2.92 billion |
| Projected humanoid market (2030) | $15.26 billion (CAGR 39.2%) |
| Long-term humanoid market (2050, Morgan Stanley estimate) | $5 trillion |
| Total robotics funding (2025) | Over $10.3 billion |
| Humanoid-specific funding (H1 2025) | $3.1 billion across 61 deals |
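The CAGR figures in the table follow from the standard compound-growth formula, CAGR = (end / start)^(1/years) − 1, which can be verified directly:

```python
def cagr(start, end, years):
    """Compound annual growth rate between two market-size estimates."""
    return (end / start) ** (1 / years) - 1

print(f"{cagr(5.23, 49.73, 2033 - 2025):.2%}")  # physical AI market -> ~32.5%
print(f"{cagr(2.92, 15.26, 2030 - 2025):.2%}")  # humanoid market    -> ~39.2%
```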
Major funding rounds in 2025 and early 2026 reflect the scale of investor interest:
| Company | Round | Amount | Valuation |
|---|---|---|---|
| Skild AI | Series C (Jan 2026) | $1.4 billion | $14 billion |
| Figure AI | Series B | $675 million | Not disclosed |
| Physical Intelligence | Series B | ~$600 million | ~$5.3 billion |
| Galaxy Bot | Series A | $453 million | Not disclosed |
| Apptronik | Series A | $403 million | Not disclosed |
Goldman Sachs projects global humanoid shipments of 50,000 to 100,000 units in 2026, with unit economics ultimately improving to $15,000 to $20,000 per robot as production scales.
Despite rapid progress, physical AI faces significant technical and practical challenges: the sim-to-real gap described above, the expense and risk of collecting real-world training data, safe operation around humans, hardware cost and reliability, and generalization from structured settings like factories to unstructured ones like homes.
As of early 2026, physical AI has reached a critical inflection point. Several converging trends indicate that the field is transitioning from research and prototyping to commercial deployment: general-purpose VLA foundation models are being released and open-sourced, world foundation models and simulation are shrinking the sim-to-real gap, humanoids such as Digit and Atlas are entering paid commercial operation, and capital is flowing into the sector at unprecedented scale.
Deloitte's 2026 Technology Trends report identified physical AI and humanoid robots as a major trend, noting the convergence of vision, sensing, cobots, and AI that is enabling humans and mobile robots to work together in increasingly flexible environments. Gartner predicted that 40 percent of enterprise applications would leverage task-specific AI agents by 2026, up from less than 5 percent in 2025.
Looking further ahead, the physical AI field is expected to progress through several phases: near-term commercial deployment in structured environments like factories and warehouses (2025 to 2027), broader deployment in semi-structured environments like stores and hospitals (2027 to 2030), and eventual deployment in fully unstructured environments like homes and outdoor spaces (2030 and beyond).
Physical AI intersects with and builds upon several related fields: robotics, embodied cognition and embodied AI, computer vision, natural language processing, reinforcement learning, and autonomous systems.