Physical Intelligence (also known as Pi or π) is an American artificial intelligence robotics company that develops foundation models and learning algorithms for robots. Founded in early 2024 and headquartered in San Francisco, California, the company builds general-purpose AI systems designed to enable robots to perceive, reason about, and interact with the physical world. Physical Intelligence has attracted significant attention for its rapid fundraising, assembling over $1 billion in total capital within its first two years, and for releasing a series of vision-language-action models (VLAs) that allow a single model to control diverse robot hardware across a wide range of manipulation tasks.
The company was co-founded by a group of prominent robotics and AI researchers, including Karol Hausman (CEO, formerly of Google DeepMind), Sergey Levine (Chief Scientist, UC Berkeley professor), Chelsea Finn (Stanford professor), and Brian Ichter (formerly of Google DeepMind), along with Lachy Groom, Adnan Esmail, and Quan Vuong.
Physical Intelligence was incorporated in March 2024 by seven co-founders who shared a common vision: building a single, general-purpose AI system capable of controlling any robot for any task. The founding team brought together academic leaders in robot learning, reinforcement learning, and meta-learning with experienced operators from the technology and defense industries.
Karol Hausman, who serves as CEO, previously worked as a Staff Research Scientist at Google DeepMind and held an adjunct professorship at Stanford University. He earned his PhD in Computer Science from the University of Southern California, where he focused on general-purpose robot learning. Sergey Levine, the company's Chief Scientist, is an Associate Professor in the Department of Electrical Engineering and Computer Sciences at UC Berkeley, where he leads the Robotic AI and Learning (RAIL) Lab. Levine is widely recognized for pioneering work in deep reinforcement learning for robotic manipulation and control, as well as foundational contributions to offline reinforcement learning. Chelsea Finn is an Assistant Professor of Computer Science and Electrical Engineering at Stanford University. She is best known for developing Model-Agnostic Meta-Learning (MAML), a widely adopted technique that enables machine learning models to adapt rapidly to new tasks with minimal data. Finn earned her PhD at UC Berkeley under the supervision of Pieter Abbeel and Sergey Levine.
Brian Ichter, another co-founder, previously served as a Research Scientist at Google DeepMind and Google Brain, where he worked on kinodynamic planning and GPU-accelerated algorithms for robotic systems. He holds a PhD in Aerospace Engineering from Stanford University. Adnan Esmail brought operational and hardware engineering experience from his roles as Senior Vice President of Engineering at Anduril Industries and as a senior Autopilot engineer at Tesla. Lachy Groom, a former executive at Stripe, contributes business and product leadership. Quan Vuong, who previously worked at Google DeepMind as a software engineer, focuses on cross-embodiment learning and brings research experience in robotics and reinforcement learning from his time at UC San Diego.
In March 2024, shortly after its founding, Physical Intelligence closed a $70 million seed round. The round was led by Thrive Capital, with participation from Khosla Ventures, Lux Capital, OpenAI, and Sequoia Capital. The seed round valued the company at roughly $400 million, a notable figure for a startup that had not yet released any products or research.
On November 4, 2024, Physical Intelligence announced a $400 million Series A funding round at a post-money valuation of approximately $2.4 billion. The round drew investment from Jeff Bezos, OpenAI, Thrive Capital, Lux Capital, and Bond Capital. Additional participants included Khosla Ventures and Sequoia Capital. The round was one of the largest Series A raises in the history of the robotics industry and made Physical Intelligence a unicorn less than a year after its founding.
In November 2025, the company raised a $600 million Series B round led by CapitalG (Alphabet's growth equity fund) and Lux Capital. Other investors in the round included Bond, Redpoint Ventures, Sequoia Capital, T. Rowe Price, Thrive Capital, and Jeff Bezos. The round brought Physical Intelligence's post-money valuation to approximately $5.6 billion and its total funding to roughly $1.1 billion, making it one of the most heavily funded AI robotics startups in the world.
| Round | Date | Amount | Post-Money Valuation | Lead Investors | Other Notable Investors |
|---|---|---|---|---|---|
| Seed | March 2024 | $70 million | ~$400 million | Thrive Capital | Khosla Ventures, Lux Capital, OpenAI, Sequoia Capital |
| Series A | November 2024 | $400 million | ~$2.4 billion | Bond Capital, Thrive Capital, Lux Capital | Jeff Bezos, OpenAI, Khosla Ventures, Sequoia Capital |
| Series B | November 2025 | $600 million | ~$5.6 billion | CapitalG, Lux Capital | Bond, Redpoint Ventures, Sequoia Capital, T. Rowe Price, Jeff Bezos |
| Total | | ~$1.1 billion | | | |
Physical Intelligence's technical approach centers on building large-scale foundation models that can serve as general-purpose "brains" for robots. Rather than engineering task-specific control software for each individual robot and scenario, the company trains a single model on diverse robotic data and then fine-tunes it for downstream tasks. This strategy mirrors the approach that large language models (LLMs) use in natural language processing: pre-train on broad data, then specialize.
The core innovation at Physical Intelligence is the vision-language-action (VLA) model architecture. A VLA model takes in visual observations from cameras, processes natural language instructions, and outputs continuous motor commands that directly control a robot's actuators. This end-to-end approach eliminates the need for separate perception, planning, and control modules, instead learning all three capabilities within a single neural network.
Physical Intelligence's VLA models are built on top of pre-trained vision-language models (VLMs), which provide a strong foundation of visual understanding and language comprehension learned from internet-scale image-text data. By extending a VLM with an action generation module, the company's models can follow plain-language instructions (for example, "fold the shirt" or "put the dishes in the sink") and translate them into precise, coordinated motor actions.
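The input/output contract of such a model can be sketched as follows; all names, shapes, and the dictionary layout here are illustrative assumptions for the sake of the example, not Physical Intelligence's actual API.

```python
import numpy as np

def vla_step(model, camera_images, proprio_state, instruction):
    """One control step of a hypothetical vision-language-action policy.

    In: RGB camera frames, the robot's joint state, and a plain-language
    command. Out: a chunk of continuous low-level actions.
    """
    obs = {
        "images": camera_images,    # e.g. list of HxWx3 uint8 arrays
        "state": proprio_state,     # e.g. joint angles plus gripper width
        "prompt": instruction,      # e.g. "fold the shirt"
    }
    return model(obs)               # e.g. shape (horizon, action_dim)
```

The point of the single call is that perception, language grounding, and control all happen inside `model`; there is no separate planner between the instruction and the motor commands.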
A distinctive technical feature of Physical Intelligence's approach is the use of flow matching for action generation. Flow matching is a generative modeling technique related to diffusion models that learns to transform a simple noise distribution into a target distribution of robot actions. Compared to standard autoregressive action prediction, flow matching produces smooth, continuous action trajectories, which is important for dexterous manipulation tasks that require fluid, coordinated movements.
In Physical Intelligence's architecture, the flow matching module generates "action chunks," sequences of future actions (typically 50 timesteps at 50 Hz), rather than predicting a single action at a time. This allows the robot to plan short sequences of coordinated movements, resulting in smoother and more reliable execution.
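The flow matching objective for a whole action chunk can be sketched as below. The shapes and the toy velocity predictor are illustrative; π₀'s real action expert is a transformer conditioned on the VLM's representations.

```python
import numpy as np

HORIZON, ACTION_DIM = 50, 7   # a 50-step chunk for, e.g., a 7-DoF arm

def interpolate(actions, noise, t):
    """Noised chunk x_t on the straight path from noise to data,
    plus the constant target velocity along that path."""
    x_t = (1.0 - t) * noise + t * actions
    target_velocity = actions - noise
    return x_t, target_velocity

def flow_matching_loss(predict_velocity, actions, rng):
    """One sample's training loss: regress the model's velocity
    estimate onto the true noise-to-data velocity."""
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform()                      # random time in [0, 1]
    x_t, target_v = interpolate(actions, noise, t)
    pred_v = predict_velocity(x_t, t)
    return float(np.mean((pred_v - target_v) ** 2))

# At inference, a chunk is generated by starting from pure noise and
# integrating the learned velocity field over t (a few Euler steps),
# producing the whole 50-step trajectory in one generation pass.
```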
Physical Intelligence follows a two-stage training process. During pre-training, the model is exposed to a large and diverse dataset that includes internet-scale image-text data (through the underlying VLM), open-source robot manipulation datasets such as the Open X-Embodiment dataset, and proprietary datasets collected in-house across multiple robot platforms and dozens of tasks. This broad pre-training gives the model a general understanding of physical manipulation, language instructions, and visual scenes.
During fine-tuning, the pre-trained model is adapted to specific tasks or robot configurations using smaller, task-specific datasets. The company has reported that as little as 1 to 20 hours of demonstration data is often sufficient to fine-tune the model to a new task.
Physical Intelligence released π₀, its first generalist robot policy, on October 31, 2024. The model was described in a research paper titled "π₀: A Vision-Language-Action Flow Model for General Robot Control," authored by a team of 24 researchers including all of the company's co-founders.
π₀ is built on top of PaliGemma, a 3-billion-parameter pre-trained vision-language model developed by Google. The model extends PaliGemma with an "action expert" module (adding approximately 300 million parameters, for a total of roughly 3.3 billion parameters) that generates continuous robot actions using flow matching. The architecture uses a two-expert design: the pre-trained VLM weights process the image and language tokens, while the smaller action expert processes the robot's proprioceptive state and generates the action outputs.
A block-wise causal attention mask governs how information flows between these modules: the VLM block attends to itself, the proprioception block attends to itself and the VLM, and the action block attends to all other blocks.
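The mask described above can be written out directly as a boolean matrix; token counts here are illustrative, and `True` means "row token may attend to column token".

```python
import numpy as np

def blockwise_mask(n_vlm, n_prop, n_act):
    """Block-wise causal attention mask over three token groups,
    ordered VLM, proprioception, action."""
    n = n_vlm + n_prop + n_act
    mask = np.zeros((n, n), dtype=bool)
    vlm = slice(0, n_vlm)
    prop = slice(n_vlm, n_vlm + n_prop)
    act = slice(n_vlm + n_prop, n)
    mask[vlm, vlm] = True    # VLM block attends to itself only
    mask[prop, vlm] = True   # proprioception attends to the VLM...
    mask[prop, prop] = True  # ...and to itself
    mask[act, :] = True      # action block attends to all blocks
    return mask
```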
π₀ was trained on data from eight distinct robot platforms, including single-arm robots (UR5e, Franka), dual-arm configurations (Bimanual UR5e, Bimanual Trossen, Bimanual Arx), and mobile manipulators (Mobile Trossen, Mobile Fibocom). The training dataset combined proprietary data from over 10,000 hours of robot operation across 68 tasks with open-source data from the Open X-Embodiment dataset. Tasks ranged from simple object relocation to complex, multi-step activities such as folding laundry from a hamper, bussing tables (sorting dishes and trash), assembling boxes, bagging groceries, and preparing coffee.
In evaluations, π₀ outperformed prior generalist robot models, including OpenVLA and Octo, on five benchmark tasks. The model demonstrated both zero-shot task performance (completing tasks without any task-specific fine-tuning) and strong fine-tuned performance on complex, real-world tasks.
On February 4, 2025, Physical Intelligence open-sourced π₀ by releasing its model weights and code through the "openpi" repository on GitHub. The release included weights for both π₀ and the newer π₀-FAST variant, along with example fine-tuning configurations, inference code, and checkpoints for several robot platforms including ALOHA and DROID. Hugging Face subsequently prepared a PyTorch port of the model for developers who prefer PyTorch over the original JAX implementation.
π₀-FAST is an autoregressive variant of π₀ that replaces flow matching with a novel action tokenization scheme called FAST (Frequency-space Action Sequence Tokenization). Instead of generating continuous action trajectories through iterative denoising, π₀-FAST compresses continuous action sequences into discrete tokens using the Discrete Cosine Transform (DCT). These discrete tokens can then be predicted autoregressively by a standard transformer decoder, similar to how language models generate text.
The key advantage of FAST tokenization is training efficiency: π₀-FAST trains approximately five times faster than the flow-matching-based π₀ while matching its performance on dexterous and long-horizon manipulation tasks such as laundry folding, table bussing, and grocery bagging. However, one trade-off is that the autoregressive decoding process in π₀-FAST is slower at inference time than the flow matching approach used by π₀.
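The compression step can be sketched in the spirit of FAST as below. The `keep` and `scale` hyperparameters are made up for illustration, and the real tokenizer additionally applies byte-pair-style compression to the resulting tokens; this sketch shows only the DCT-and-quantize core.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows = frequencies)."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def tokenize(actions, keep=10, scale=0.1):
    """Keep low-frequency DCT coefficients and quantize them to integers."""
    coeffs = dct_matrix(actions.shape[0]) @ actions   # per action dimension
    return np.round(coeffs[:keep] / scale).astype(int)

def detokenize(tokens, horizon, scale=0.1):
    """Dequantize, zero-pad the dropped high frequencies, invert the DCT."""
    coeffs = np.zeros((horizon, tokens.shape[1]))
    coeffs[: tokens.shape[0]] = tokens * scale
    return dct_matrix(horizon).T @ coeffs   # orthonormal, so inverse = transpose
```

Because smooth trajectories concentrate their energy in low frequencies, a 50-step chunk compresses to a handful of discrete tokens per joint, which a standard transformer decoder can then predict autoregressively.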
Physical Intelligence published the research paper for π₀.₅ on April 22, 2025 (arXiv:2504.16054). The model represents a significant step toward open-world generalization, meaning the ability to operate effectively in environments that differ substantially from those seen during training.
The central innovation in π₀.₅ is co-training on heterogeneous data sources. While π₀ already combined robot data with VLM pre-training, π₀.₅ goes further by jointly training on data from multiple robots, high-level semantic task predictions, web-scale image and text data, and other sources. This co-training approach teaches the model physical manipulation skills, semantic understanding of objects and environments, the ability to infer task structure, and the ability to transfer behaviors across different robot embodiments.
The model maintains a hierarchical design in which the robot is tasked with a general goal and accomplishes it by both predicting semantic subtasks and generating low-level motor actions. Training with web data allows the model to recognize and manipulate objects it has never physically encountered before, because it has learned about those objects from internet images and descriptions.
Physical Intelligence tested π₀.₅ in three rental homes in San Francisco, where a mobile manipulator completed multi-step household chores such as putting dishes in the sink, placing clothes in a laundry basket, and organizing drawers. These environments were entirely new to the model, demonstrating meaningful generalization beyond training conditions.
In November 2025, Physical Intelligence published π*₀.₆, a VLA model that learns from its own deployment experience through reinforcement learning. The paper introduced RECAP (Reinforcement Learning with Experience and Corrections via Advantage-conditioned Policies), a training method that enables VLA models to improve through real-world interaction rather than relying solely on pre-collected demonstration data.
RECAP incorporates three types of data into the training process: human demonstrations, on-policy data from autonomous robot execution, and expert teleoperated interventions (corrections provided by human operators when the robot makes mistakes during autonomous execution). By conditioning the policy on an advantage signal, the model learns to distinguish between successful and unsuccessful behaviors and gradually improves its performance.
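A heavily simplified sketch of advantage-conditioned labeling in this spirit is shown below. The value function, the binary binning, and the episode format are all assumptions made for the example, not the paper's actual recipe.

```python
def label_with_advantage(episodes, value_fn):
    """Tag each (obs, action, return) triple with a binary advantage token.

    Demonstrations, autonomous rollouts, and teleoperated corrections can
    all be labeled this way; conditioning on the "good" token at
    deployment then steers the policy toward above-baseline behavior.
    """
    labeled = []
    for obs, action, ret in episodes:
        advantage = ret - value_fn(obs)     # outcome vs. expected outcome
        token = 1 if advantage > 0 else 0   # 1 = better than the baseline
        labeled.append((obs, action, token))
    return labeled
```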
π*₀.₆ is built on a 5-billion-parameter vision-language model augmented with an action expert. In real-world evaluations, the model trained with RECAP more than doubled throughput on some of the most challenging tasks and reduced failure rates by a factor of two or more. Demonstrated capabilities included folding laundry in real homes, reliably assembling boxes, and preparing espresso drinks using a professional espresso machine.
| Model | Release Date | Parameters | Architecture | Key Innovation |
|---|---|---|---|---|
| π₀ | October 2024 | ~3.3 billion | VLM + flow matching action expert | First generalist VLA for multi-robot control |
| π₀-FAST | 2025 | ~3.3 billion | VLM + autoregressive FAST tokenizer | 5x faster training via DCT-based action tokens |
| π₀.₅ | April 2025 | Not disclosed | Hierarchical VLA with co-training | Open-world generalization to unseen environments |
| π*₀.₆ | November 2025 | ~5 billion | VLA + RL (RECAP) | Self-improvement from deployment experience |
Physical Intelligence's approach differs from traditional robotics in several ways. Classical robotics systems typically rely on hand-engineered perception pipelines, task-specific planning algorithms, and carefully designed controllers. Each new task or environment generally requires significant engineering effort to adapt. In contrast, Physical Intelligence's learned approach uses a single model that can be adapted to new tasks through data collection and fine-tuning rather than manual engineering.
Compared to other learned approaches in robotics, Physical Intelligence distinguishes itself through its emphasis on generality across robot embodiments. While many prior works in robot learning focus on a single robot platform, Physical Intelligence's models are trained on data from multiple different robots with different morphologies, sensors, and actuators. The use of a pre-trained VLM as the backbone further differentiates the approach from methods that train policies from scratch, since the VLM provides a foundation of visual and linguistic knowledge learned from billions of image-text pairs on the internet.
The company describes its approach as building "hardware-agnostic AI models" that function as a kind of operating system for robotics, meaning any robot manufacturer could potentially use Physical Intelligence's models to power their hardware.
Physical Intelligence operates in a growing field of companies and research groups working on foundation models for robotics. The competitive landscape includes both well-funded startups and major technology corporations.
Google DeepMind has been a significant player in robotic foundation models, having developed the RT-1, RT-2, and RT-X series of models. RT-2, in particular, demonstrated that a vision-language model could be used to control robots, an approach that influenced subsequent work across the field, including Physical Intelligence's own research. Several Physical Intelligence founders (Hausman, Ichter, Vuong) previously worked at Google DeepMind on these projects.
Tesla is developing its Optimus humanoid robot, which leverages the company's extensive experience in data collection and neural network training from its Autopilot and Full Self-Driving programs. Tesla's approach differs in that it is vertically integrated, designing both the robot hardware and the AI software.
Figure AI, another venture-backed startup, focuses on building general-purpose humanoid robots for commercial applications such as manufacturing and logistics. Figure AI has raised over $675 million and works on integrating foundation models with humanoid form factors.
Skild AI, valued at $3.5 billion, is developing a "Skild Brain" foundation model for controlling robots. Like Physical Intelligence, Skild takes a hardware-agnostic approach, training on data from diverse robot platforms.
OpenVLA, an open-source VLA model developed by researchers at Stanford, UC Berkeley, and MIT, provides an accessible alternative for research teams looking to build robot policies without commercial software. Physical Intelligence's own open-sourcing of π₀ through the openpi repository places it in direct interaction with the broader open-source robotics AI community.
NVIDIA has invested in the physical AI space through its Omniverse simulation platform and Isaac robotics development tools, as well as by releasing its own VLA models. While NVIDIA is not a direct competitor in building robot brain models, its tools and hardware form a critical part of the infrastructure that companies like Physical Intelligence use for training and deployment.
| Company | Focus | Approach | Notable Funding |
|---|---|---|---|
| Physical Intelligence | General-purpose robot AI | Hardware-agnostic VLA foundation models | $1.1 billion |
| Google DeepMind | Research and applications | RT series models, simulation | Internal (Alphabet) |
| Tesla Optimus | Humanoid robots | Vertically integrated hardware + AI | Internal (Tesla) |
| Figure AI | Humanoid robots | Foundation models for humanoid form | $675 million+ |
| Skild AI | General-purpose robot AI | Hardware-agnostic "Skild Brain" | $3.5 billion valuation |
| 1X Technologies | Home assistant robots | Humanoid robots for domestic use | $125 million+ |
Physical Intelligence has demonstrated its models on a range of real-world tasks that span household and light-industrial scenarios, including folding laundry, bussing tables, assembling cardboard boxes, bagging groceries, loading dishes into a sink, organizing drawers, and preparing coffee and espresso drinks.
The company has indicated that its models could eventually be applied to manufacturing, warehouse automation, and other industrial contexts, though its publicly demonstrated work has focused primarily on household and semi-structured environments.
Physical Intelligence positions itself as a provider of AI models for robots rather than a robot manufacturer. This software-first, hardware-agnostic strategy means the company does not build or sell robots itself; instead, it develops the "brain" that can be integrated into robots built by other manufacturers. Reports have suggested a subscription-based pricing model, with an indicated price point of approximately $300 per month per connected robot for access to the company's AI platform.
As of early 2026, the company has not disclosed revenue figures. Physical Intelligence has continued to invest heavily in research and development, data collection, and compute infrastructure.
Physical Intelligence is headquartered in San Francisco's Mission District. The company has grown rapidly since its founding, with team estimates ranging from approximately 50 to nearly 200 employees as of late 2025. The team includes researchers and engineers from leading AI labs and universities, many of whom have published influential work in deep learning, reinforcement learning, computer vision, and robotic manipulation.
| Name | Role | Background |
|---|---|---|
| Karol Hausman | Co-Founder, CEO | PhD USC; formerly Staff Research Scientist at Google DeepMind, Adjunct Professor at Stanford |
| Sergey Levine | Co-Founder, Chief Scientist | PhD Stanford; Associate Professor at UC Berkeley (RAIL Lab) |
| Chelsea Finn | Co-Founder | PhD UC Berkeley; Assistant Professor at Stanford; creator of MAML |
| Brian Ichter | Co-Founder | PhD Stanford (Aerospace Engineering); formerly Research Scientist at Google DeepMind |
| Adnan Esmail | Co-Founder | MIT; formerly SVP Engineering at Anduril Industries, Sr. Staff Engineer at Tesla |
| Lachy Groom | Co-Founder | Former Stripe executive; business and product leadership |
| Quan Vuong | Co-Founder | UC San Diego; formerly Software Engineer at Google DeepMind |