# AI robotics

> Source: https://aiwiki.ai/wiki/ai_robotics
> Updated: 2026-06-23
> Categories: Artificial Intelligence, Robotics
> License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
> From AI Wiki (https://aiwiki.ai), the free encyclopedia of artificial intelligence. Reuse freely with attribution to "AI Wiki (aiwiki.ai)".

AI robotics is the integration of [artificial intelligence](/wiki/artificial_intelligence) techniques with physical robotic systems, enabling machines to perceive their environments, make decisions, and carry out actions in the real world. It is the engineering practice behind [embodied AI](/wiki/embodied_ai): rather than following pre-programmed sequences like a traditional industrial arm, an AI-powered robot uses learned models (computer vision for perception, reinforcement learning or imitation learning for control, and increasingly large [vision-language-action models](/wiki/vision_language_action_model)) to adapt to new situations, learn from experience, and handle unstructured tasks. The field sits at the intersection of [machine learning](/wiki/machine_learning), [computer vision](/wiki/computer_vision), [natural language processing](/wiki/natural_language_processing), [robotics](/wiki/robotics), and mechanical engineering.

The defining shift of the modern era is the move from hand-coded controllers to learned policies. The convergence of cheaper sensors, more powerful compute, and breakthroughs in [deep learning](/wiki/deep_learning) has accelerated this since the mid-2010s: robots that once required carefully controlled factory floors can now navigate warehouses, manipulate unfamiliar objects, and follow spoken instructions. As of 2026, the field is transitioning from research demonstrations toward commercial deployment, driven by a wave of general-purpose [humanoid robots](/wiki/humanoid_robot) and [robot foundation models](/wiki/robot_foundation_model). Goldman Sachs projects the total addressable market for humanoid robots will reach $38 billion by 2035, a more than sixfold increase from its earlier $6 billion estimate, with annual shipments rising to about 1.4 million units [1]. Morgan Stanley estimates the broader humanoid market, including supply chains and service networks, could surpass $5 trillion by 2050 [2].

## What makes a robot an "AI robot"?

The distinction is the source of the robot's behavior. A classical industrial robot executes a fixed program: every motion is specified in advance and the robot fails if the world deviates from the script. An AI robot instead runs a learned policy, a function (usually a neural network) that maps sensor observations to actions and is trained on data rather than written by hand. This lets the robot generalize to objects, layouts, and instructions it was not explicitly programmed for. Three ingredients define the modern stack: perception (computer vision and depth sensing that turn raw sensor data into a scene understanding), policy learning (reinforcement learning, imitation learning, or pretrained [vision-language-action models](/wiki/vision_language_action_model) that produce actions), and simulation (training in physics simulators before deploying to hardware). The remainder of this article traces how these came together.

## History

### Early research (1960s-1980s)

The roots of AI robotics trace back to the 1960s, when researchers first attempted to give machines the ability to reason about the physical world. The most notable early project was Shakey the Robot, developed at the Stanford Research Institute (SRI International) between 1966 and 1972. Funded by the Defense Advanced Research Projects Agency (DARPA), Shakey was the first mobile robot capable of reasoning about its own actions. It could perceive its surroundings using a TV camera and range finders, plan a sequence of actions to achieve a goal, and navigate through rooms while pushing objects around [3].

Shakey's software contributions proved as important as the robot itself. The project produced the A* search algorithm (used widely in pathfinding to this day), the STRIPS automated planner, and the Duda-Hart formulation of the Hough transform, the version still used to detect lines and curves in images. The robot's programming was done primarily in LISP, and it could accept commands in simple English [3].
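A* extends uniform-cost search with a heuristic that estimates the remaining distance to the goal, expanding first the node with the lowest estimated total cost. The sketch below is a textbook version on a 4-connected occupancy grid with a Manhattan-distance heuristic. The grid encoding (`#` for obstacles) and the function name are illustrative choices, not taken from the original Shakey code.

```python
import heapq

def a_star(grid, start, goal):
    """Shortest path on a 4-connected grid, or None if the goal is unreachable.

    grid: list of strings where '#' marks an obstacle; start/goal: (row, col).
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance never overestimates for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:       # walk parent links back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue                      # stale entry superseded by a cheaper one
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                ng = g + 1
                if ng < best_g.get(nxt, float('inf')):
                    best_g[nxt] = ng
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (ng + h(nxt), ng, nxt))
    return None

grid = ["....",
        ".##.",
        "...."]
path = a_star(grid, (0, 0), (2, 3))  # 6 cells, i.e. 5 moves around the wall
```

Because the heuristic is admissible and consistent, the first time the goal is popped from the heap its path is optimal.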

During the same period, industrial robotics took a separate path. Unimate, the first industrial robot arm, was installed at a General Motors plant in 1961 to perform die-casting tasks. These early industrial robots were not intelligent in any meaningful sense; they simply repeated pre-programmed movements with high precision. Throughout the 1970s and 1980s, companies like FANUC, ABB, and KUKA expanded the use of robot arms in automotive manufacturing, welding, and assembly lines [4].

### The AI winter and recovery (1980s-2000s)

Progress in AI robotics slowed during the [AI winter](/wiki/ai_winter) of the late 1980s and early 1990s, as funding dried up and the limitations of symbolic AI approaches became apparent. Robots that relied on hand-crafted rules and logical planning struggled with the messiness of real-world environments.

A different approach, behavior-based robotics, emerged in the mid-1980s and bore commercial fruit in the 2000s. Its leading advocate, Rodney Brooks at MIT, argued that robots should be built from the bottom up, with simple reactive behaviors layered on top of each other (his "subsumption architecture"), rather than relying on complex internal world models [5]. iRobot, the company he co-founded in 1990, released the Roomba vacuum cleaner in 2002, and it became one of the first commercially successful consumer robots.
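The layering idea can be illustrated with a simple priority-based arbiter: each behavior inspects the sensors and either proposes a command or stays silent, and a higher layer that fires suppresses everything below it. This is a simplified sketch in the spirit of subsumption, not Brooks's actual implementation (which wired augmented finite-state machines together with suppression and inhibition links). The behaviors, sensor keys, and command strings are hypothetical.

```python
from typing import Callable, Optional

# A behavior maps a sensor dict to a motor command, or None if it does not
# want control. All names and sensor keys here are hypothetical.
Behavior = Callable[[dict], Optional[str]]

def avoid_obstacle(sensors: dict) -> Optional[str]:
    return "turn_left" if sensors.get("bump") else None

def seek_charger(sensors: dict) -> Optional[str]:
    return "go_to_dock" if sensors.get("battery", 1.0) < 0.2 else None

def wander(sensors: dict) -> Optional[str]:
    return "forward"  # lowest layer: always proposes something

def arbitrate(layers: list[Behavior], sensors: dict) -> str:
    """Return the command of the highest-priority behavior that fires.

    layers is ordered from highest to lowest priority; the first behavior
    that returns a command suppresses every layer below it.
    """
    for behavior in layers:
        command = behavior(sensors)
        if command is not None:
            return command
    return "stop"

LAYERS = [avoid_obstacle, seek_charger, wander]  # highest priority first
```

No layer consults a world model: each reacts directly to current sensor readings, which is the design choice Brooks argued for.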

Meanwhile, [Honda](/wiki/honda)'s [ASIMO](/wiki/asimo) project (2000) and Sony's AIBO robot dog (1999) demonstrated that legged locomotion and consumer-facing robots were technically feasible, even if not yet practical for real work.

### The deep learning revolution (2012-present)

The modern era of AI robotics began with the deep learning revolution around 2012. The success of [convolutional neural networks](/wiki/convolutional_neural_network) on image recognition tasks (notably [AlexNet](/wiki/alexnet)'s victory in the [ImageNet](/wiki/imagenet) competition) opened the door to robots that could see and interpret their surroundings far more effectively than previous systems. [Reinforcement learning](/wiki/reinforcement_learning) provided a framework for robots to learn behaviors through trial and error, and the combination of simulation with real-world training ([sim-to-real transfer](/wiki/sim_to_real_transfer)) made it practical to train robots on millions of interactions without wearing out physical hardware [6].

## Core AI technologies in robotics

### Computer vision for perception

Modern robots rely heavily on computer vision to understand their environments. Cameras, depth sensors (like LiDAR and structured-light sensors), and sometimes radar provide raw sensory input. Deep learning models, particularly convolutional neural networks and more recently [vision transformers](/wiki/vision_transformer), process this input to perform object detection, semantic segmentation, pose estimation, and scene understanding.

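For manipulation tasks, robots need to identify objects, estimate their 3D position and orientation, and determine grasp points. Systems like DenseFusion and FoundationPose have pushed the accuracy of 6-DOF (six degrees of freedom: three for position, three for orientation) pose estimation. FoundationPose, for example, can be applied to a novel object at test time without fine-tuning, provided it is given the object's CAD model or a small set of reference images, which lets robots grasp objects that were absent from their training data [7].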
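A 6-DOF pose is commonly packed into a 4x4 homogeneous transform (a 3x3 rotation plus a translation). The NumPy sketch below shows how such a pose maps a grasp point defined on the object into the camera frame. The roll-pitch-yaw convention, frame names, and numbers are illustrative assumptions, not the output format of DenseFusion or FoundationPose.

```python
import numpy as np

def rotation_rpy(roll, pitch, yaw):
    """Rotation matrix R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    return rz @ ry @ rx

def pose_matrix(rpy, translation):
    """4x4 transform from the object frame to the camera frame."""
    T = np.eye(4)
    T[:3, :3] = rotation_rpy(*rpy)
    T[:3, 3] = translation
    return T

def transform_point(T, p):
    """Apply a homogeneous transform to a 3D point."""
    return (T @ np.append(p, 1.0))[:3]

# Hypothetical estimate: object 0.5 m ahead, 0.2 m up, rotated 90 degrees about z.
T_cam_obj = pose_matrix((0.0, 0.0, np.pi / 2), (0.5, 0.0, 0.2))
grasp_in_camera = transform_point(T_cam_obj, (0.1, 0.0, 0.0))  # approx. (0.5, 0.1, 0.2)
```

A grasp planner can store grasp points once in the object's own frame and reuse them whenever a new pose estimate arrives.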

### Reinforcement learning for manipulation and locomotion

Reinforcement learning has become a primary method for teaching robots physical skills. In RL, an agent learns a policy (a mapping from observations to actions) by interacting with an environment and receiving reward signals. For robotics, this means a robot can learn to walk, grasp objects, or perform assembly tasks through repeated practice.
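The observation-reward-update loop can be shown with tabular Q-learning on a toy task: an agent in a six-cell corridor is rewarded only for reaching the last cell. This is a deliberately small stand-in. Real robot RL uses continuous observations, neural-network policies, and algorithms such as PPO, and the environment, hyperparameters, and names here are assumptions for illustration.

```python
import random

N_STATES = 6          # corridor cells 0..5; reaching cell 5 ends the episode
ACTIONS = (-1, +1)    # index 0 = move left, index 1 = move right

def step(state, action):
    """Toy deterministic environment: reward 1 on reaching the goal cell, else 0."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                best = max(q[s])                         # exploit, breaking ties randomly
                a = rng.choice([i for i, v in enumerate(q[s]) if v == best])
            s2, r, done = step(s, ACTIONS[a])
            # Bootstrapped target: no future value after the terminal step.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
```

After training, the greedy policy moves right in every non-goal cell. Random tie-breaking matters: with all values initialized to zero, a fixed tie-break toward "left" would keep the agent stuck at the wall for a very long time.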

Key milestones include [OpenAI](/wiki/openai)'s work on dexterous in-hand manipulation (2018-2019), in which a robotic hand trained entirely in simulation learned to manipulate a Rubik's Cube through a complete solve (the sequence of face turns came from a conventional solving algorithm), and DeepMind's work on locomotion for quadruped robots. The main challenge with RL in robotics is sample efficiency: policies often need millions of training episodes, which is why simulation plays such a large role [8].

### Large language models for planning and instruction following

The emergence of [large language models](/wiki/large_language_model) (LLMs) has opened a new channel for human-robot interaction. Instead of programming specific behaviors, operators can issue natural language instructions that the robot interprets and executes. Several research projects from Google and [Google DeepMind](/wiki/google_deepmind) have demonstrated this approach:

| System | Year | Description | Key capability |
|--------|------|-------------|----------------|
| SayCan | 2022 | Grounds [LLM](/wiki/large_language_model) outputs in robotic affordances | Breaks long-horizon tasks into executable steps by scoring what the robot can actually do |
| Inner Monologue | 2022 | Closes the loop between LLM planning and environment feedback | Uses scene descriptions and error detection to let the LLM re-plan when something goes wrong |
| RT-1 | 2022 | Robotics [Transformer](/wiki/transformer) trained on 130,000 real-world episodes | 97% success rate on 700+ tasks with a fleet of 13 robots |
| RT-2 | 2023 | Vision-Language-Action model combining web-scale and robotics data | Transfers web knowledge directly to robot control; can reason about novel objects |
| Open X-Embodiment | 2023 | Dataset pooling data from 22 robot types across 33 labs | RT-2-X, trained on the pooled data, was about three times as successful as RT-2 on emergent-skill evaluations |

SayCan combines an LLM's language understanding with a learned "affordance function" (a value function) that estimates how likely the robot is to succeed at each skill in its current state. When a user says "I spilled my drink, can you help?", the LLM scores how useful each skill in the robot's repertoire would be as the next step, the affordance model scores how feasible each one is right now, and the robot executes the skill with the highest combined score, repeating the process until the task is complete. The system achieved 67% execution success on long-horizon kitchen tasks with as many as 50 individual steps [9].
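In the SayCan paper, each planning step multiplies two scores per skill: the language model's estimate that the skill is a useful next step, and the value function's estimate that the robot can currently succeed at it. The sketch below shows one such step. All scores are made-up numbers standing in for the two models, and the skill names are hypothetical.

```python
# Hypothetical scores for one planning step. In SayCan the first comes from
# the language model and the second from a learned value function; here they
# are hard-coded numbers for illustration only.
llm_usefulness = {
    "find a sponge": 0.45,
    "find a towel": 0.30,
    "pick up the apple": 0.05,
    "go to the trash can": 0.20,
}
affordance = {
    "find a sponge": 0.90,
    "find a towel": 0.20,        # e.g. no towel reachable from here
    "pick up the apple": 0.85,
    "go to the trash can": 0.95,
}

def select_skill(llm_scores, value_scores):
    """Pick the skill maximizing (LLM usefulness) x (probability of success)."""
    combined = {skill: llm_scores[skill] * value_scores[skill] for skill in llm_scores}
    best = max(combined, key=combined.get)
    return best, combined

best, scores = select_skill(llm_usefulness, affordance)  # "find a sponge"
```

The towel is the language model's second choice, but its low affordance score keeps it from being selected. This is the grounding effect: the LLM proposes, and the value function vetoes what the robot cannot do. The full system repeats this step, appending each chosen skill to the prompt, until it selects a termination skill.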

RT-2, published in 2023, took a different approach by fine-tuning a large vision-language model (VLM) to output robot actions directly. The model treats robot actions as text tokens, allowing it to leverage the vast knowledge learned from internet-scale training data. This means RT-2 can reason about objects it was never trained on in a robotics context. When asked to "pick up the object you would use to hammer a nail," it correctly selects a rock, even though no robotics training data included that instruction [10].
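To emit actions as tokens, each continuous action dimension is first discretized. RT-1 and RT-2 describe uniform discretization into 256 bins per dimension, with RT-2 then mapping those bin indices onto tokens in the vision-language model's vocabulary (the exact mapping depends on the backbone). The sketch below covers only the binning step. The action ranges and function names are assumptions.

```python
import numpy as np

N_BINS = 256  # bins per action dimension

def tokenize(action, low, high, n_bins=N_BINS):
    """Map each continuous action dimension to an integer bin in [0, n_bins - 1]."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    scaled = (action - low) / (high - low)                  # now in [0, 1]
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)

def detokenize(tokens, low, high, n_bins=N_BINS):
    """Map bin indices back to the centre of each bin."""
    return low + (np.asarray(tokens) + 0.5) * (high - low) / n_bins

# Hypothetical 3-D end-effector displacement normalized to [-1, 1].
low, high = np.full(3, -1.0), np.full(3, 1.0)
tokens = tokenize([0.0, 0.5, -1.0], low, high)   # array([128, 192, 0])
recovered = detokenize(tokens, low, high)        # within half a bin of the input
```

The round trip loses at most half a bin width per dimension (here 2/512 of the range), which is the precision cost of treating control as next-token prediction.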

### What is a vision-language-action (VLA) model?

A [vision-language-action model](/wiki/vision_language_action_model) (VLA) is a single neural network that takes camera images and a natural language instruction as input and directly outputs robot actions, typically by repurposing a pretrained vision-language model so that the robot inherits broad world knowledge from internet data. VLAs are the dominant paradigm for [robot foundation models](/wiki/robot_foundation_model) as of 2026, and several have moved from research papers into deployed hardware:

| Model | Developer | Year | Notable detail |
|-------|-----------|------|----------------|
| RT-2 | [Google DeepMind](/wiki/google_deepmind) | 2023 | First widely cited VLA; encodes actions as text tokens on a VLM backbone |
| pi0 | [Physical Intelligence](/wiki/physical_intelligence) | 2024 | Builds on the 3B-parameter PaliGemma VLM and adds flow matching for continuous control; trained on roughly 10,000 hours of robot data spanning 68 tasks across 7 robot configurations |
| Helix | [Figure AI](/wiki/figure_ai) | 2025 | First VLA to run two humanoids on a shared, long-horizon task; runs fully onboard low-power embedded GPUs |
| Gemini Robotics On-Device | [Google DeepMind](/wiki/google_deepmind) | 2025 | Runs locally on the robot with no network; can be fine-tuned for a new task from as few as 50 demonstrations |

[Physical Intelligence](/wiki/physical_intelligence) describes its [pi0](/wiki/pi0) model as "a general-purpose robot foundation model" that uses flow matching to generate continuous, high-frequency action sequences, in contrast to the discretized action tokens of earlier transformer policies [11]. [Figure AI](/wiki/figure_ai)'s [Helix](/wiki/figure_helix) uses a "System 1, System 2" design (a slow VLM for reasoning paired with a fast policy for control) and is, per Figure, "the first VLA to output high-rate continuous control of the entire humanoid upper body" including individual fingers, wrists, torso, and head [12]. Gemini Robotics On-Device, released in 2025, was trained for ALOHA robots but adapted to a bi-arm Franka FR3 and to [Apptronik](/wiki/apptronik)'s Apollo humanoid, illustrating the cross-embodiment ambition of the VLA approach [13].
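The "System 1, System 2" split amounts to running two loops at different rates: a slow model periodically refreshes a goal or latent, and a fast controller acts on the most recent value every tick. The sketch below reduces both to one-dimensional stand-ins. The rates, the proportional controller, and the function names are assumptions for illustration, not Figure's implementation.

```python
# Hypothetical dual-rate loop: a slow "System 2" planner refreshes a target
# every SLOW_PERIOD ticks, and a fast "System 1" controller tracks the most
# recent target on every tick. Rates and dynamics are illustrative only.
SLOW_PERIOD = 25   # e.g. an ~8 Hz planner inside a 200 Hz control loop (assumed rates)

def slow_planner(tick):
    """Stand-in for the slow VLM: choose a new 1-D target position."""
    return 1.0 if tick < 100 else -1.0

def fast_controller(position, target, gain=0.2):
    """Stand-in for the fast policy: proportional step toward the latest target."""
    return position + gain * (target - position)

def run(ticks=200):
    position, target = 0.0, 0.0
    planner_calls = 0
    for tick in range(ticks):
        if tick % SLOW_PERIOD == 0:       # slow loop fires only occasionally
            target = slow_planner(tick)
            planner_calls += 1
        position = fast_controller(position, target)  # fast loop fires every tick
    return position, planner_calls

position, calls = run()   # 8 planner calls; position ends very close to -1.0
```

The benefit of the split is that the expensive model does not have to keep up with the control rate: between its updates, the fast loop keeps correcting toward the last instruction it received.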

### Foundation models for robotics

The concept of [foundation models](/wiki/foundation_model) (large models pre-trained on broad data that can be adapted to many tasks) is now being applied to robotics. The Open X-Embodiment project, a collaboration among Google DeepMind and 33 academic labs, pooled 60 existing robot datasets into a shared corpus of more than 1 million real robot trajectories spanning 22 robot embodiments, 527 skills, and 160,266 tasks. A single model trained on this combined data performed significantly better than models trained on any individual robot's data [14].
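Pooling data across embodiments raises a practical problem: a 7-DoF arm and a 14-DoF bimanual rig produce action vectors of different lengths. One generic approach, used in some cross-embodiment models, is to zero-pad every action to a shared maximum size and mask the padded dimensions out of the loss. The NumPy sketch below illustrates that idea. It is not the RT-X training recipe, and the shared size of 14 is an assumption.

```python
import numpy as np

MAX_ACTION_DIM = 14  # assumed size of the shared action space

def pad_action(action, max_dim=MAX_ACTION_DIM):
    """Zero-pad an embodiment-specific action and return a validity mask."""
    action = np.asarray(action, dtype=float)
    n = action.shape[-1]
    if n > max_dim:
        raise ValueError("action has more dimensions than the shared space")
    padded = np.zeros(max_dim)
    padded[:n] = action
    mask = np.zeros(max_dim, dtype=bool)
    mask[:n] = True
    return padded, mask

def masked_mse(pred, target, mask):
    """Mean squared error over real dimensions only; padding is ignored."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))

# Hypothetical single-arm action: 6 joint deltas plus a gripper command.
arm_action, arm_mask = pad_action([0.1, -0.2, 0.0, 0.05, 0.0, 0.3, 1.0])
```

Masking keeps the model from being penalized for whatever it predicts in dimensions a given robot does not have.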

The goal is a [robot foundation model](/wiki/robot_foundation_model) analogous to GPT or [BERT](/wiki/bert) for language: a single pre-trained model that can be adapted to control many different robots on many different tasks with minimal fine-tuning. NVIDIA's Project GR00T, Physical Intelligence's pi0, and various academic efforts are pursuing this vision, though a truly general-purpose robotics foundation model remains an open research challenge as of 2026 [15].

## Key companies and platforms

The AI robotics landscape spans established players and well-funded startups:

| Company | Notable robot(s) | Focus area | Key details |
|---------|-------------------|------------|-------------|
| [Boston Dynamics](/wiki/boston_dynamics) | Atlas (electric), Spot, Stretch | Research and commercial humanoids and quadrupeds | Atlas electric launched at CES 2026; production deployments at Hyundai and Google DeepMind scheduled for 2026 |
| [Figure AI](/wiki/figure_ai) | Figure 02, Figure 03 | General-purpose humanoid for manufacturing | $39B valuation (Sep 2025); Figure 02 deployed at BMW Spartanburg plant, loading 90,000+ parts |
| [1X Technologies](/wiki/1x_technologies) | NEO | Household humanoid | $20,000 consumer price; early access delivery in U.S. starting 2026 |
| [Agility Robotics](/wiki/agility_robotics) | Digit | Logistics and warehouse operations | 100,000+ totes moved at GXO facility; 98% task success rate at Amazon testing site |
| [Tesla](/wiki/tesla) | Optimus (Gen 3) | Factory and eventually consumer | Gen 3 production starting summer 2026 at Fremont; 25 actuators per forearm/hand assembly |
| [Unitree Robotics](/wiki/unitree) | H1, G1 | Affordable humanoids and quadrupeds | G1 starting at ~$16,000; H1 achieved 3.3 m/s bipedal running speed |
| [Physical Intelligence](/wiki/physical_intelligence) | pi0 (software) | Robot foundation models | Cross-embodiment VLA running on third-party arms and humanoids |
| [NVIDIA](/wiki/nvidia) | Isaac platform, GR00T, Cosmos | Simulation, training infrastructure, and world models | Isaac Lab 3.0 with Newton physics engine; Cosmos world foundation models for physical AI |

## Hardware

### Sensors

AI robots depend on a suite of sensors to perceive the world:

- **Cameras.** RGB cameras provide color images for object recognition and scene understanding. Stereo camera pairs enable depth perception. Event cameras, which detect changes in brightness asynchronously, are used for high-speed tracking.
- **LiDAR.** Light Detection and Ranging sensors emit laser pulses and measure return times to build precise 3D maps of the environment. Widely used in autonomous vehicles and mobile robots.
- **Force/torque sensors.** Mounted at joints or end-effectors, these measure the forces a robot exerts on objects, enabling compliant manipulation.
- **Tactile sensors.** Sensors embedded in robotic fingers or grippers provide information about contact, pressure, and texture. The BioTac sensor mimics human fingertip sensitivity, while newer designs use arrays of "taxels" (tactile pixels) across entire finger surfaces.
- **IMUs.** Inertial measurement units track a robot's acceleration and orientation, which is especially important for legged robots maintaining balance.
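Two of these measurements reduce to short formulas. A LiDAR range is half the round-trip time of a light pulse multiplied by the speed of light. For balance, a complementary filter blends an integrated gyroscope rate (smooth but drifting) with the gravity direction measured by the accelerometer (noisy but drift-free). The sketch below is a single-axis version with an assumed blending weight. Real robots typically use Kalman filters or quaternion-based estimators over all three axes.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def lidar_range(round_trip_s: float) -> float:
    """Distance to a target from a time-of-flight pulse (light travels out and back)."""
    return SPEED_OF_LIGHT * round_trip_s / 2.0

def complementary_filter(pitch, gyro_rate, accel, dt, alpha=0.98):
    """One update of a single-axis pitch estimate, in radians.

    pitch: previous estimate; gyro_rate: pitch rate in rad/s;
    accel: (ax, ay, az) in m/s^2; alpha: assumed weight on the gyro path.
    """
    ax, ay, az = accel
    accel_pitch = math.atan2(-ax, math.hypot(ay, az))  # tilt implied by gravity
    return alpha * (pitch + gyro_rate * dt) + (1 - alpha) * accel_pitch

distance_m = lidar_range(2e-7)  # a 200 ns round trip is about 30 m
```

With the robot standing still, the estimate converges to the accelerometer's tilt; during fast motion the gyro term dominates and rejects accelerometer noise.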

### Actuators

Actuators convert electrical energy into physical motion. The main types used in modern AI robots include:

| Actuator type | Strengths | Weaknesses | Common use |
|---------------|-----------|------------|------------|
| Electric motors (brushless DC) | High efficiency, precise control, low maintenance | Limited torque density | Most humanoid and industrial robots |
| Hydraulic actuators | Very high force output, smooth motion | Heavy, prone to leaks, noisy | Heavy-lift applications; original Boston Dynamics Atlas |
| Series elastic actuators (SEAs) | Inherent compliance, safer for human interaction | Added mechanical complexity | Collaborative robots, legged locomotion |
| Quasi-direct drive | High backdrivability, good force control | Lower gear ratio limits torque | Unitree quadrupeds, some humanoid legs |
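The torque-versus-backdrivability trade-off in the table follows from the gear ratio N. Output torque scales with N, while the motor's rotor inertia seen from the joint scales with N², so a high-ratio joint is strong but hard to push back by hand. A series elastic actuator instead infers torque from how far its spring is deflected. The numbers below are hypothetical, and the single efficiency factor is a simplification (real gearboxes also add friction that further reduces backdrivability).

```python
def geared_output(motor_torque_nm, motor_inertia_kgm2, gear_ratio, efficiency=0.9):
    """Output torque and reflected inertia of a motor behind a gearbox.

    Torque scales with the ratio N; inertia seen at the output scales with N**2.
    efficiency is an assumed constant loss factor.
    """
    torque = motor_torque_nm * gear_ratio * efficiency
    reflected_inertia = motor_inertia_kgm2 * gear_ratio ** 2
    return torque, reflected_inertia

def sea_torque(stiffness_nm_per_rad, motor_side_angle, joint_angle):
    """Series elastic actuator: torque from spring deflection (Hooke's law)."""
    return stiffness_nm_per_rad * (motor_side_angle - joint_angle)

# Hypothetical motor: 1 N·m peak torque, 1e-4 kg·m² rotor inertia.
qdd = geared_output(1.0, 1e-4, 6)      # quasi-direct drive: ~5.4 N·m, ~0.0036 kg·m²
high = geared_output(1.0, 1e-4, 100)   # high-ratio drive: ~90 N·m, ~1.0 kg·m²
```

Going from a 6:1 to a 100:1 ratio multiplies torque by about 17 but reflected inertia by about 280, which is why quasi-direct-drive legs accept lower torque in exchange for compliant, force-controllable joints.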

### Dexterous hands

Robotic hand dexterity remains one of the field's hardest challenges. The human hand has 27 degrees of freedom and dense tactile sensing across every finger, a combination that is extremely difficult to replicate mechanically.

The Shadow Dexterous Hand, developed by Shadow Robot Company in London, is one of the most advanced robotic hands available. It has 24 degrees of freedom, 20 motors, and over 100 sensors operating at up to 1 kHz. In 2025, Shadow Robot partnered with Google DeepMind to develop the DEX-EE, a next-generation hand with hundreds of tactile sensors per fingertip, precise torque control at 10 kHz internal loops, and the ability to close from fully open in 500 milliseconds [16].

Tesla's Optimus Gen 3 features hands with 22 degrees of freedom and 25 actuators per forearm/hand assembly (50 total), a significant step up from the 12 actuators in Gen 2. These hands are designed for factory tasks like picking up small parts and operating tools [17].

## Sim-to-real transfer

One of the most important techniques in modern AI robotics is [sim-to-real transfer](/wiki/sim_to_real_transfer): training robot policies in simulated environments and then deploying them on physical robots. This approach solves a fundamental bottleneck. Training a robot directly in the real world is slow, expensive, and potentially dangerous. A robot learning to walk might fall thousands of times; a robot learning to grasp objects might break them. In simulation, these failures cost nothing.

The process typically works as follows:

1. A physics simulator (such as NVIDIA Isaac Sim, MuJoCo, or PyBullet) creates a virtual environment with the robot model.
2. The robot policy is trained using reinforcement learning across thousands of parallel simulation instances running on [GPUs](/wiki/gpu).
3. Domain randomization is applied: the simulator varies physical parameters (friction, object mass, lighting, sensor noise) randomly during training so the policy learns to handle a range of conditions.
4. The trained policy is transferred to the real robot, ideally requiring zero additional real-world training ("zero-shot transfer").
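Steps 2 and 3 can be sketched in NumPy: every one of thousands of parallel environments receives its own randomly drawn physical parameters, and a batched physics step applies them all at once. The parameter names and ranges, and the one-dimensional damped-mass "physics", are assumptions for illustration. Simulators such as Isaac Lab and MuJoCo provide their own randomization mechanisms, which this does not reproduce.

```python
import numpy as np

# Assumed randomization ranges for illustration; real projects tune them per robot.
RANGES = {
    "damping": (0.5, 1.5),            # viscous friction coefficient (N·s/m)
    "mass_kg": (0.8, 1.2),            # payload mass
    "sensor_noise_std": (0.0, 0.02),  # velocity-sensor noise (m/s)
}

def sample_params(num_envs, rng):
    """Draw an independent set of physical parameters for each parallel environment."""
    return {name: rng.uniform(lo, hi, size=num_envs) for name, (lo, hi) in RANGES.items()}

def step_point_masses(vel, force, params, dt, rng):
    """Advance a batch of damped sliding masses one Euler step.

    Returns the true velocities and the noisy velocities the policy would observe.
    """
    accel = (force - params["damping"] * vel) / params["mass_kg"]
    new_vel = vel + accel * dt
    # Each environment has its own sensor-noise level.
    observed = new_vel + rng.normal(0.0, params["sensor_noise_std"])
    return new_vel, observed

rng = np.random.default_rng(0)
params = sample_params(4096, rng)   # re-sampled at every episode reset during training
vel, obs = step_point_masses(np.zeros(4096), np.full(4096, 2.0), params, 0.01, rng)
```

Because a policy trained this way never sees the same mass, damping, or noise level twice, it cannot overfit to any single set of simulator parameters. The hope is that the real robot then looks like one more sample from the training distribution.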

NVIDIA's Isaac Lab is one of the most widely used platforms for this workflow. It uses GPU-based parallelization to run thousands of simulation instances simultaneously, and its domain randomization tools help bridge the gap between simulated and real physics. Isaac Lab 3.0, released in early access in 2026, uses the new Newton physics engine and supports large-scale training on DGX-class infrastructure [18].

Boston Dynamics demonstrated the power of this approach by training locomotion policies for its Spot quadruped in Isaac Lab and deploying them directly on the physical robot, achieving performance competitive with the company's hand-tuned controllers [18].

## Research milestones

### RT-1 and RT-2

Google's Robotics Transformer series represented a shift toward treating robot control as a sequence modeling problem. RT-1 (2022) used a [transformer](/wiki/transformer) architecture with an [EfficientNet](/wiki/efficientnet) backbone and early language fusion to output discretized robot actions. Trained on 130,000 real-world episodes covering over 700 tasks, RT-1 achieved a 97% success rate on known tasks and generalized 25% better to new tasks than prior baselines [19].

RT-2 (2023) went further by co-training on both internet-scale vision-language data and robotics data. The key insight was that robot actions could be represented as text tokens appended to the model's vocabulary, allowing a vision-language model to output control commands directly. This meant the model could leverage its "knowledge" of the world (learned from billions of image-text pairs) to handle robotics tasks involving novel objects and novel instructions [10].

### SayCan and language-conditioned planning

SayCan (2022) demonstrated that large language models could serve as high-level planners for robots, provided their outputs were grounded in what the robot could physically do. The system combined an LLM's ability to decompose tasks in natural language with a learned value function that scored each candidate action by how likely the robot was to succeed at it. This "grounding" step prevented the LLM from proposing actions the robot could not execute [9].

### Open X-Embodiment

The Open X-Embodiment project (2023) addressed a fundamental scaling problem in robotics: individual labs collect data on individual robots, but no single lab has enough data to train a general model. By pooling 60 datasets from 22 different robot types across 34 institutions into a corpus of more than 1 million real robot trajectories, the project showed that cross-embodiment training is not only possible but beneficial. A model trained on the combined dataset performed significantly better across many robots than models trained on any single robot's data, and RT-2-X, trained on this multi-embodiment data, was roughly three times as successful as RT-2 on emergent skills [14].

## Applications

### Manufacturing and logistics

The most immediate commercial applications for AI robots are in structured environments like factories and warehouses. Agility Robotics' Digit has been deployed at Amazon and GXO Logistics facilities for tote-moving tasks. Figure AI's Figure 02 has completed over 1,250 runtime hours at BMW's Spartanburg plant, running daily 10-hour shifts and loading parts for X3 vehicle production [20].

### Agriculture

AI-powered robots are used for fruit picking, weeding, and crop monitoring. Agrobot has developed strawberry-harvesting robots, and Abundant Robotics (before its closure) built a vacuum-based apple harvester. Both rely on computer vision to identify ripe fruit and are designed to handle delicate produce without bruising it.

### Healthcare

Surgical robots like Intuitive Surgical's da Vinci system already provide computer-assisted features such as tremor filtering and motion scaling, although these are conventional signal processing rather than learned AI. Research is pushing toward greater autonomy, with systems learning to perform specific surgical subtasks (such as suturing) from demonstration data.

### Household

1X Technologies' NEO and various other systems aim to bring robots into homes for cleaning, organizing, and fetching tasks. The challenge is that homes are far less structured than factories, with enormous variation in layouts, objects, and tasks.

## Challenges

### Generalization

Perhaps the biggest challenge is generalization. A robot trained to pick up mugs in a lab may fail when faced with a mug of a different shape, color, or orientation in a different lighting condition. Foundation model approaches like Open X-Embodiment aim to address this through scale and diversity of training data, but the gap between lab demos and reliable real-world performance remains substantial.

### Safety and reliability

Robots that share space with humans must be safe. This requires both hardware safeguards (force-limiting actuators, soft coverings) and software safeguards (collision avoidance, behavioral constraints). Certifying AI-controlled robots for safety is complicated by the opacity of neural network decision-making.
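One common pattern for the software side is to wrap the learned policy in a small, rule-based safety filter that is easy to audit independently of the neural network. The sketch below clamps joint velocities, slows the robot as a person approaches, and stops on excessive contact force. Every limit in it is a hypothetical placeholder. Real values come from a risk assessment and the applicable safety standards, not from this example.

```python
import numpy as np

# Hypothetical limits for illustration only.
MAX_JOINT_SPEED = 1.0      # rad/s
MAX_CONTACT_FORCE = 50.0   # N
SLOW_ZONE_M = 1.5          # start slowing when a person is closer than this
STOP_ZONE_M = 0.5          # stop entirely inside this distance

def safety_filter(joint_velocities, contact_force_n, human_distance_m):
    """Constrain a learned policy's velocity command before it reaches the motors."""
    cmd = np.clip(np.asarray(joint_velocities, dtype=float),
                  -MAX_JOINT_SPEED, MAX_JOINT_SPEED)
    if contact_force_n > MAX_CONTACT_FORCE or human_distance_m <= STOP_ZONE_M:
        return np.zeros_like(cmd)
    # Scale speed linearly from zero at the stop zone to full speed at the slow zone.
    scale = min(1.0, (human_distance_m - STOP_ZONE_M) / (SLOW_ZONE_M - STOP_ZONE_M))
    return cmd * scale

safe_cmd = safety_filter([2.0, -0.5], contact_force_n=10.0, human_distance_m=1.0)
# -> [0.5, -0.25]: clamped to 1 rad/s, then halved because the person is 1 m away
```

Because the filter is deterministic and only a few lines long, its behavior can be tested exhaustively even when the policy feeding it cannot.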

### Hardware cost and durability

Advanced sensors, high-torque actuators, and dexterous hands remain expensive. The Unitree G1 at roughly $16,000 represents the low end; most capable [humanoid robots](/wiki/humanoid_robot) cost $90,000 to over $140,000. Bringing costs down to levels suitable for mass deployment will require supply chain maturation similar to what happened with smartphones. Goldman Sachs cited a 40% reduction in the bill of materials for high-end humanoids as a key driver of its faster path to profitability [1].

### Data scarcity

Unlike language AI, where trillions of tokens of text are freely available on the internet, robotics data is scarce and expensive to collect. Every data point requires a physical robot interacting with a real or simulated environment. Projects like Open X-Embodiment and teleoperation-based data collection are addressing this, but robotics datasets remain orders of magnitude smaller than language datasets.

### Battery life

Mobile robots are constrained by battery capacity. Most humanoid robots operate for one to four hours on a charge. Longer operation requires either battery swapping (as Boston Dynamics' [Atlas](/wiki/atlas_robot) and [Apptronik](/wiki/apptronik)'s Apollo implement) or tethered power.
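Runtime is roughly usable pack energy divided by average power draw, which also shows why swapping matters for shift work. The pack size, power draw, and 90% usable fraction below are hypothetical figures chosen only to land in the one-to-four-hour range quoted above.

```python
import math

def runtime_hours(capacity_wh, average_power_w, usable_fraction=0.9):
    """Hours of operation per charge; usable_fraction reserves some capacity."""
    return capacity_wh * usable_fraction / average_power_w

def swaps_per_shift(shift_hours, capacity_wh, average_power_w, usable_fraction=0.9):
    """Battery swaps needed to cover a shift, assuming the first pack starts full."""
    hours = runtime_hours(capacity_wh, average_power_w, usable_fraction)
    return math.ceil(shift_hours / hours) - 1

# Hypothetical 2 kWh pack.
light_duty = runtime_hours(2000, 500)      # 3.6 h at a 500 W average draw
heavy_duty = runtime_hours(2000, 1000)     # 1.8 h at a 1 kW average draw
swaps = swaps_per_shift(10, 2000, 500)     # 2 swaps to cover a 10-hour shift
```

Doubling the average draw halves runtime, so a robot doing heavier manipulation may need more than twice as many swaps per shift.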

## How does AI robotics differ from embodied AI?

The terms overlap heavily and are often used interchangeably, but the emphasis differs. [Embodied AI](/wiki/embodied_ai) is the broader research thesis that intelligence is shaped by having a body that senses and acts in the world, a framing that includes simulated agents and virtual avatars as well as physical machines. AI robotics is the applied, hardware-grounded slice of that thesis: it specifically concerns building real robots whose behavior is produced by learned AI models. In practice, embodied-AI research (world models, VLAs, sim-to-real) supplies the algorithms, and AI robotics deploys them on [robotics](/wiki/robotics) hardware. The [humanoid robot](/wiki/humanoid_robot) wave of 2024-2026 sits squarely at this intersection.

## What is the current state of AI robotics (2025-2026)?

As of mid-2026, AI robotics is transitioning from research to early commercialization. Several milestones mark this shift:

- Boston Dynamics launched the production version of its electric Atlas at CES 2026, with commercial deployments at Hyundai factories scheduled for 2026 and plans for Hyundai to produce up to 30,000 units per year by 2028 [21].
- Figure AI reached a $39 billion valuation in its September 2025 Series C, a roughly 15x increase from its $2.6 billion valuation in February 2024, and demonstrated sustained, multi-month deployment of Figure 02 at BMW [20][22].
- The vision-language-action paradigm matured from research papers into shipping software, with Physical Intelligence's pi0, Figure's Helix, and Google DeepMind's Gemini Robotics On-Device all targeting cross-embodiment, general-purpose control [11][12][13].
- NVIDIA expanded its Isaac ecosystem with Isaac Lab 3.0 and the Cosmos world foundation models, establishing itself as the primary infrastructure provider for physical AI development [18].
- China's robotics industry, led by Unitree and others, pushed prices down and brought affordable hardware to market [23].

The field's trajectory suggests that the next few years will see a shift from dozens of deployed robots to thousands, concentrated initially in manufacturing and logistics where the environments are semi-structured and the economic case is clearest. Broader deployment in households, healthcare, and public spaces will likely follow as the technology matures and costs decline.

## See also

- [Robotics](/wiki/robotics)
- [Embodied AI](/wiki/embodied_ai)
- [Robot foundation model](/wiki/robot_foundation_model)
- [Humanoid robot](/wiki/humanoid_robot)
- [Vision-language-action model](/wiki/vision_language_action_model)
- [Reinforcement learning](/wiki/reinforcement_learning)
- [Computer vision](/wiki/computer_vision)
- [Foundation model](/wiki/foundation_model)
- [Sim-to-real transfer](/wiki/sim_to_real_transfer)
- [World model](/wiki/world_model)

## References

[1] Goldman Sachs. (2024). "The global market for humanoid robots could reach $38 billion by 2035." https://www.goldmansachs.com/insights/articles/the-global-market-for-robots-could-reach-38-billion-by-2035

[2] Morgan Stanley. (2025). "Humanoid Robot Market Expected to Reach $5 Trillion by 2050." https://www.morganstanley.com/insights/articles/humanoid-robot-market-5-trillion-by-2050

[3] SRI International. "Shakey the Robot." https://www.sri.com/hoi/shakey-the-robot/

[4] International Federation of Robotics. "History of Industrial Robots." https://ifr.org/robot-history

[5] Brooks, R. (1991). "Intelligence without Representation." Artificial Intelligence, 47(1-3), 139-159.

[6] Tobin, J., et al. (2017). "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World." IROS 2017. https://arxiv.org/abs/1703.06907

[7] Wen, B., et al. (2024). "FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects." CVPR 2024. https://arxiv.org/abs/2312.08344

[8] OpenAI. (2019). "Solving Rubik's Cube with a Robot Hand." https://openai.com/research/solving-rubiks-cube

[9] Ahn, M., et al. (2022). "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances." https://say-can.github.io/

[10] Google DeepMind. (2023). "RT-2: New model translates vision and language into action." https://deepmind.google/blog/rt-2-new-model-translates-vision-and-language-into-action/

[11] Physical Intelligence. (2024). "pi0: A Vision-Language-Action Flow Model for General Robot Control." https://www.pi.website/download/pi0.pdf

[12] Figure AI. (2025). "Helix: A Vision-Language-Action Model for Generalist Humanoid Control." https://www.figure.ai/news/helix

[13] Google DeepMind. (2025). "Gemini Robotics On-Device brings AI to local robotic devices." https://deepmind.google/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/

[14] Open X-Embodiment Collaboration. (2023). "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." https://arxiv.org/abs/2310.08864

[15] NVIDIA. (2025). "NVIDIA Accelerates Robotics Research and Development With New Open Models and Simulation Libraries." https://nvidianews.nvidia.com/news/nvidia-accelerates-robotics-research-and-development-with-new-open-models-and-simulation-libraries

[16] Shadow Robot Company. (2025). "Shadow Robot unveils the world's most robust dexterous robot hand, developed in partnership with Google DeepMind." https://shadowrobot.com/news/press-releases/shadow-robot-unveils-the-worlds-most-robust-dexterous-robot-hand-developed-in-partnership-with-google-deepmind/

[17] Musk, E. (2026). Tesla Q4 2025 Earnings Call. Referenced in: https://botinfo.ai/articles/tesla-optimus

[18] NVIDIA. (2026). "NVIDIA Opens Portals to World of Robotics With New Omniverse Libraries, Cosmos Physical AI Models and AI Computing Infrastructure." https://nvidianews.nvidia.com/news/nvidia-opens-portals-to-world-of-robotics-with-new-omniverse-libraries-cosmos-physical-ai-models-and-ai-computing-infrastructure

[19] Brohan, A., et al. (2022). "RT-1: Robotics Transformer for Real-World Control at Scale." https://arxiv.org/abs/2212.06817

[20] Figure AI. (2025). "Figure 02 deployment at BMW Spartanburg." https://www.figure.ai/news/series-c

[21] Boston Dynamics. (2026). "Boston Dynamics Unveils New Atlas Robot to Revolutionize Industry." https://bostondynamics.com/blog/boston-dynamics-unveils-new-atlas-robot-to-revolutionize-industry/

[22] Figure AI. (2025). "Figure Exceeds $1B in Series C Funding at $39B Post-Money Valuation." https://www.figure.ai/news/series-c

[23] Unitree Robotics. "G1 Humanoid Robot." https://www.unitree.com/g1/