AI robotics

Artificial Intelligence Robotics

22 min read

Updated Jun 23, 2026

Suggest edit History Talk

RawGraph

Last edited

Jun 23, 2026

Fact-checked

In review queue

Sources

23 citations

Revision

v4 · 4,376 words

Fact-checks are independent of edits: a reviewer re-verifies the article against its sources and stamps the date. How we verify

AI robotics is the integration of artificial intelligence techniques with physical robotic systems, enabling machines to perceive their environments, make decisions, and carry out actions in the real world. It is the engineering practice behind embodied AI: rather than following pre-programmed sequences like a traditional industrial arm, an AI-powered robot uses learned models (computer vision for perception, reinforcement learning or imitation learning for control, and increasingly large vision-language-action models) to adapt to new situations, learn from experience, and handle unstructured tasks. The field sits at the intersection of machine learning, computer vision, natural language processing, robotics, and mechanical engineering.

The defining shift of the modern era is the move from hand-coded controllers to learned policies. The convergence of cheaper sensors, more powerful compute, and breakthroughs in deep learning has accelerated this since the mid-2010s: robots that once required carefully controlled factory floors can now navigate warehouses, manipulate unfamiliar objects, and follow spoken instructions. As of 2026, the field is transitioning from research demonstrations toward commercial deployment, driven by a wave of general-purpose humanoid robots and robot foundation models. Goldman Sachs projects the total addressable market for humanoid robots will reach $38 billion by 2035, a more than sixfold increase from its earlier $6 billion estimate, with annual shipments rising to about 1.4 million units ^[1]. Morgan Stanley estimates the broader humanoid market, including supply chains and service networks, could surpass $5 trillion by 2050 ^[2].

What makes a robot an "AI robot"?

The distinction is the source of the robot's behavior. A classical industrial robot executes a fixed program: every motion is specified in advance and the robot fails if the world deviates from the script. An AI robot instead runs a learned policy, a function (usually a neural network) that maps sensor observations to actions and is trained on data rather than written by hand. This lets the robot generalize to objects, layouts, and instructions it was not explicitly programmed for. Three ingredients define the modern stack: perception (computer vision and depth sensing that turn raw sensor data into a scene understanding), policy learning (reinforcement learning, imitation learning, or pretrained vision-language-action models that produce actions), and simulation (training in physics simulators before deploying to hardware). The remainder of this article traces how these came together.

History

Early research (1960s-1980s)

The roots of AI robotics trace back to the 1960s, when researchers first attempted to give machines the ability to reason about the physical world. The most notable early project was Shakey the Robot, developed at the Stanford Research Institute (SRI International) between 1966 and 1972. Funded by the Defense Advanced Research Projects Agency (DARPA), Shakey was the first mobile robot capable of reasoning about its own actions. It could perceive its surroundings using a TV camera and range finders, plan a sequence of actions to achieve a goal, and navigate through rooms while pushing objects around ^[3].

Shakey's software contributions proved as important as the robot itself. The project produced the A* search algorithm (used widely in pathfinding to this day), the STRIPS automated planner, and the Hough transform for detecting geometric shapes in images. The robot's programming was done primarily in LISP, and it could accept commands in simple English ^[3].

During the same period, industrial robotics took a separate path. Unimate, the first industrial robot arm, was installed at a General Motors plant in 1961 to perform die-casting tasks. These early industrial robots were not intelligent in any meaningful sense; they simply repeated pre-programmed movements with high precision. Throughout the 1970s and 1980s, companies like FANUC, ABB, and KUKA expanded the use of robot arms in automotive manufacturing, welding, and assembly lines ^[4].

The AI winter and recovery (1980s-2000s)

Progress in AI robotics slowed during the AI winter of the late 1980s and early 1990s, as funding dried up and the limitations of symbolic AI approaches became apparent. Robots that relied on hand-crafted rules and logical planning struggled with the messiness of real-world environments.

A shift came in the late 1990s and 2000s with the rise of behavior-based robotics, championed by Rodney Brooks at MIT. Brooks argued that robots should be built from the bottom up, with simple reactive behaviors layered on top of each other, rather than relying on complex internal world models. His company iRobot, founded in 1990, eventually produced the Roomba vacuum cleaner in 2002, which became one of the first commercially successful consumer robots ^[5].

Meanwhile, Honda's ASIMO project (2000) and Sony's AIBO robot dog (1999) demonstrated that legged locomotion and consumer-facing robots were technically feasible, even if not yet practical for real work.

The deep learning revolution (2012-present)

The modern era of AI robotics began with the deep learning revolution around 2012. The success of convolutional neural networks on image recognition tasks (notably AlexNet's victory in the ImageNet competition) opened the door to robots that could see and interpret their surroundings far more effectively than previous systems. Reinforcement learning provided a framework for robots to learn behaviors through trial and error, and the combination of simulation with real-world training (sim-to-real transfer) made it practical to train robots on millions of interactions without wearing out physical hardware ^[6].

Core AI technologies in robotics

Computer vision for perception

Modern robots rely heavily on computer vision to understand their environments. Cameras, depth sensors (like LiDAR and structured-light sensors), and sometimes radar provide raw sensory input. Deep learning models, particularly convolutional neural networks and more recently vision transformers, process this input to perform object detection, semantic segmentation, pose estimation, and scene understanding.

For manipulation tasks, robots need to identify objects, estimate their 3D position and orientation, and determine grasp points. Systems like DenseFusion and FoundationPose have pushed the accuracy of 6-DOF (six degrees of freedom) pose estimation, allowing robots to pick up objects they have never seen before based on shape and visual features alone ^[7].

Reinforcement learning for manipulation and locomotion

Reinforcement learning has become a primary method for teaching robots physical skills. In RL, an agent learns a policy (a mapping from observations to actions) by interacting with an environment and receiving reward signals. For robotics, this means a robot can learn to walk, grasp objects, or perform assembly tasks through repeated practice.

Key milestones include OpenAI's work on dexterous in-hand manipulation (2018-2019), where a robotic hand learned to solve a Rubik's Cube entirely through simulation training, and DeepMind's work on locomotion for quadruped robots. The challenge with RL in robotics is sample efficiency: robots need millions of training episodes, which is why simulation plays such a large role ^[8].

Large language models for planning and instruction following

The emergence of large language models (LLMs) has opened a new channel for human-robot interaction. Instead of programming specific behaviors, operators can issue natural language instructions that the robot interprets and executes. Several research projects from Google and Google DeepMind have demonstrated this approach:

System	Year	Description	Key capability
SayCan	2022	Grounds LLM outputs in robotic affordances	Breaks long-horizon tasks into executable steps by scoring what the robot can actually do
Inner Monologue	2022	Closes the loop between LLM planning and environment feedback	Uses scene descriptions and error detection to let the LLM re-plan when something goes wrong
RT-1	2022	Robotics Transformer trained on 130,000 real-world episodes	97% success rate on 700+ tasks with a fleet of 13 robots
RT-2	2023	Vision-Language-Action model combining web-scale and robotics data	Transfers web knowledge directly to robot control; can reason about novel objects
Open X-Embodiment	2023	Dataset pooling data from 22 robot types across 33 labs	Training on multi-robot data tripled RT-2 performance on real-world skills

SayCan works by combining an LLM's language understanding with a learned "affordance function" that estimates whether the robot can physically perform each proposed action. When a user says "I spilled my drink, can you help?", the LLM generates candidate action sequences, and the affordance model filters them to what the robot can actually do in its current state. The system achieved 67% execution success on long-horizon kitchen tasks with as many as 50 individual steps ^[9].

RT-2, published in 2023, took a different approach by fine-tuning a large vision-language model (VLM) to output robot actions directly. The model treats robot actions as text tokens, allowing it to leverage the vast knowledge learned from internet-scale training data. This means RT-2 can reason about objects it was never trained on in a robotics context. When asked to "pick up the object you would use to hammer a nail," it correctly selects a rock, even though no robotics training data included that instruction ^[10].

What is a vision-language-action (VLA) model?

A vision-language-action model (VLA) is a single neural network that takes camera images and a natural language instruction as input and directly outputs robot actions, typically by repurposing a pretrained vision-language model so that the robot inherits broad world knowledge from internet data. VLAs are the dominant paradigm for robot foundation models as of 2026, and several have moved from research papers into deployed hardware:

Model	Developer	Year	Notable detail
RT-2	Google DeepMind	2023	First widely cited VLA; encodes actions as text tokens on a VLM backbone
pi0	Physical Intelligence	2024	Builds on the 3B-parameter PaliGemma VLM and adds flow matching for continuous control; trained on roughly 10,000 hours of robot data spanning 68 tasks across 7 robot configurations
Helix	Figure AI	2025	First VLA to run two humanoids on a shared, long-horizon task; runs fully onboard low-power embedded GPUs
Gemini Robotics On-Device	Google DeepMind	2025	Runs locally on the robot with no network; can be fine-tuned for a new task from as few as 50 demonstrations

Physical Intelligence describes its pi0 model as "a general-purpose robot foundation model" that uses flow matching to generate continuous, high-frequency action sequences, in contrast to the discretized action tokens of earlier transformer policies ^[11]. Figure AI's Helix uses a "System 1, System 2" design (a slow VLM for reasoning paired with a fast policy for control) and is, per Figure, "the first VLA to output high-rate continuous control of the entire humanoid upper body" including individual fingers, wrists, torso, and head ^[12]. Gemini Robotics On-Device, released in 2025, was trained for ALOHA robots but adapted to a bi-arm Franka FR3 and to Apptronik's Apollo humanoid, illustrating the cross-embodiment ambition of the VLA approach ^[13].

Foundation models for robotics

The concept of foundation models (large models pre-trained on broad data that can be adapted to many tasks) is now being applied to robotics. The Open X-Embodiment project, a collaboration among Google DeepMind and 33 academic labs, created a shared dataset of 60 existing robot datasets pooled into a corpus of more than 1 million real robot trajectories spanning 22 robot embodiments, 527 skills, and 160,266 tasks. Training a single model on this combined data led to significantly better performance than training on any individual robot's data ^[14].

The goal is a robot foundation model analogous to GPT or BERT for language: a single pre-trained model that can be adapted to control many different robots on many different tasks with minimal fine-tuning. NVIDIA's Project GR00T, Physical Intelligence's pi0, and various academic efforts are pursuing this vision, though a truly general-purpose robotics foundation model remains an open research challenge as of 2026 ^[15].

Key companies and platforms

The AI robotics landscape spans established players and well-funded startups:

Company	Notable robot(s)	Focus area	Key details
Boston Dynamics	Atlas (electric), Spot, Stretch	Research and commercial humanoids and quadrupeds	Atlas electric launched at CES 2026; production deployments at Hyundai and Google DeepMind scheduled for 2026
Figure AI	Figure 02, Figure 03	General-purpose humanoid for manufacturing	$39B valuation (Sep 2025); Figure 02 deployed at BMW Spartanburg plant, loading 90,000+ parts
1X Technologies	NEO	Household humanoid	$20,000 consumer price; early access delivery in U.S. starting 2026
Agility Robotics	Digit	Logistics and warehouse operations	100,000+ totes moved at GXO facility; 98% task success rate at Amazon testing site
Tesla	Optimus (Gen 3)	Factory and eventually consumer	Gen 3 production starting summer 2026 at Fremont; 25 actuators per hand
Unitree Robotics	H1, G1	Affordable humanoids and quadrupeds	G1 starting at ~$16,000; H1 achieved 3.3 m/s bipedal running speed
Physical Intelligence	pi0 (software)	Robot foundation models	Cross-embodiment VLA running on third-party arms and humanoids
NVIDIA	Isaac platform, GR00T, Cosmos	Simulation, training infrastructure, and world models	Isaac Lab 3.0 with Newton physics engine; Cosmos world foundation models for physical AI

Hardware

Sensors

AI robots depend on a suite of sensors to perceive the world:

Cameras. RGB cameras provide color images for object recognition and scene understanding. Stereo camera pairs enable depth perception. Event cameras, which detect changes in brightness asynchronously, are used for high-speed tracking.
LiDAR. Light Detection and Ranging sensors emit laser pulses and measure return times to build precise 3D maps of the environment. Widely used in autonomous vehicles and mobile robots.
Force/torque sensors. Mounted at joints or end-effectors, these measure the forces a robot exerts on objects, enabling compliant manipulation.
Tactile sensors. Sensors embedded in robotic fingers or grippers provide information about contact, pressure, and texture. The BioTac sensor mimics human fingertip sensitivity, while newer designs use arrays of "taxels" (tactile pixels) across entire finger surfaces.
IMUs. Inertial measurement units track a robot's acceleration and orientation, which is especially important for legged robots maintaining balance.

Actuators

Actuators convert electrical energy into physical motion. The main types used in modern AI robots include:

Actuator type	Strengths	Weaknesses	Common use
Electric motors (brushless DC)	High efficiency, precise control, low maintenance	Limited torque density	Most humanoid and industrial robots
Hydraulic actuators	Very high force output, smooth motion	Heavy, prone to leaks, noisy	Heavy-lift applications; original Boston Dynamics Atlas
Series elastic actuators (SEAs)	Inherent compliance, safer for human interaction	Added mechanical complexity	Collaborative robots, legged locomotion
Quasi-direct drive	High backdrivability, good force control	Lower gear ratio limits torque	Unitree quadrupeds, some humanoid legs

Dexterous hands

Robotic hand dexterity remains one of the field's hardest challenges. The human hand has 27 degrees of freedom and dense tactile sensing across every finger, a combination that is extremely difficult to replicate mechanically.

The Shadow Dexterous Hand, developed by Shadow Robot Company in London, is one of the most advanced robotic hands available. It has 24 degrees of freedom, 20 motors, and over 100 sensors operating at up to 1 kHz. In 2025, Shadow Robot partnered with Google DeepMind to develop the DEX-EE, a next-generation hand with hundreds of tactile sensors per fingertip, precise torque control at 10 kHz internal loops, and the ability to close from fully open in 500 milliseconds ^[16].

Tesla's Optimus Gen 3 features hands with 22 degrees of freedom and 25 actuators per forearm/hand assembly (50 total), a significant step up from the 12 actuators in Gen 2. These hands are designed for factory tasks like picking up small parts and operating tools ^[17].

Sim-to-real transfer

One of the most important techniques in modern AI robotics is sim-to-real transfer: training robot policies in simulated environments and then deploying them on physical robots. This approach solves a fundamental bottleneck. Training a robot directly in the real world is slow, expensive, and potentially dangerous. A robot learning to walk might fall thousands of times; a robot learning to grasp objects might break them. In simulation, these failures cost nothing.

The process typically works as follows:

A physics simulator (such as NVIDIA Isaac Sim, MuJoCo, or PyBullet) creates a virtual environment with the robot model.
The robot policy is trained using reinforcement learning across thousands of parallel simulation instances running on GPUs.
Domain randomization is applied: the simulator varies physical parameters (friction, object mass, lighting, sensor noise) randomly during training so the policy learns to handle a range of conditions.
The trained policy is transferred to the real robot, ideally requiring zero additional real-world training ("zero-shot transfer").

NVIDIA's Isaac Lab is currently the leading platform for this workflow. It leverages GPU-based parallelization to run thousands of simulation instances simultaneously, and its domain randomization tools help bridge the gap between simulated and real physics. Isaac Lab 3.0, released in early access in 2026, uses the new Newton physics engine and supports large-scale training on DGX-class infrastructure ^[18].

Boston Dynamics demonstrated the power of this approach by training locomotion policies for its Spot quadruped in Isaac Lab and deploying them directly on the physical robot, achieving performance competitive with the company's hand-tuned controllers ^[18].

Research milestones

RT-1 and RT-2

Google's Robotics Transformer series represented a shift toward treating robot control as a sequence modeling problem. RT-1 (2022) used a transformer architecture with an EfficientNet backbone and early language fusion to output discretized robot actions. Trained on 130,000 real-world episodes covering over 700 tasks, RT-1 achieved a 97% success rate on known tasks and generalized 25% better to new tasks than prior baselines ^[19].

RT-2 (2023) went further by co-training on both internet-scale vision-language data and robotics data. The key insight was that robot actions could be represented as text tokens appended to the model's vocabulary, allowing a vision-language model to output control commands directly. This meant the model could leverage its "knowledge" of the world (learned from billions of image-text pairs) to handle robotics tasks involving novel objects and novel instructions ^[10].

SayCan and language-conditioned planning

SayCan (2022) demonstrated that large language models could serve as high-level planners for robots, provided their outputs were grounded in what the robot could physically do. The system combined an LLM's ability to decompose tasks in natural language with a learned value function that scored each candidate action by how likely the robot was to succeed at it. This "grounding" step prevented the LLM from proposing actions the robot could not execute ^[9].

Open X-Embodiment

The Open X-Embodiment project (2023) addressed a fundamental scaling problem in robotics: individual labs collect data on individual robots, but no single lab has enough data to train a general model. By pooling 60 datasets from 22 different robot types across 34 institutions into a corpus of more than 1 million real robot trajectories, the project showed that cross-embodiment training is not only possible but beneficial. A model trained on the combined dataset performed significantly better across many robots than models trained on any single robot's data, and training RT-2 on this multi-embodiment data tripled its real-world performance ^[14].

Applications

Manufacturing and logistics

The most immediate commercial applications for AI robots are in structured environments like factories and warehouses. Agility Robotics' Digit has been deployed at Amazon and GXO Logistics facilities for tote-moving tasks. Figure AI's Figure 02 has completed over 1,250 runtime hours at BMW's Spartanburg plant, running daily 10-hour shifts and loading parts for X3 vehicle production ^[20].

Agriculture

AI-powered robots are used for fruit picking, weeding, and crop monitoring. Companies like Agrobot and Abundant Robotics (before its closure) developed strawberry-picking robots that use computer vision to identify ripe fruit and reinforcement-learned policies to handle delicate produce without bruising.

Healthcare

Surgical robots like Intuitive Surgical's da Vinci system already incorporate limited AI for tremor filtering and motion scaling. Research is pushing toward greater autonomy, with systems learning to perform specific surgical subtasks (such as suturing) from demonstration data.

Household

1X Technologies' NEO and various other systems aim to bring robots into homes for cleaning, organizing, and fetching tasks. The challenge is that homes are far less structured than factories, with enormous variation in layouts, objects, and tasks.

Challenges

Generalization

Perhaps the biggest challenge is generalization. A robot trained to pick up mugs in a lab may fail when faced with a mug of a different shape, color, or orientation in a different lighting condition. Foundation model approaches like Open X-Embodiment aim to address this through scale and diversity of training data, but the gap between lab demos and reliable real-world performance remains substantial.

Safety and reliability

Robots that share space with humans must be safe. This requires both hardware safeguards (force-limiting actuators, soft coverings) and software safeguards (collision avoidance, behavioral constraints). Certifying AI-controlled robots for safety is complicated by the opacity of neural network decision-making.

Hardware cost and durability

Advanced sensors, high-torque actuators, and dexterous hands remain expensive. The Unitree G1 at roughly $16,000 represents the low end; most capable humanoid robots cost $90,000 to over $140,000. Bringing costs down to levels suitable for mass deployment will require supply chain maturation similar to what happened with smartphones. Goldman Sachs cited a 40% reduction in the bill of materials for high-end humanoids as a key driver of its faster path to profitability ^[1].

Data scarcity

Unlike language AI, where trillions of tokens of text are freely available on the internet, robotics data is scarce and expensive to collect. Every data point requires a physical robot interacting with a real or simulated environment. Projects like Open X-Embodiment and teleoperation-based data collection are addressing this, but robotics datasets remain orders of magnitude smaller than language datasets.

Battery life

Mobile robots are constrained by battery capacity. Most humanoid robots operate for one to four hours on a charge. Longer operation requires either battery swapping (as Boston Dynamics' Atlas and Apptronik's Apollo implement) or tethered power.

How does AI robotics differ from embodied AI?

The terms overlap heavily and are often used interchangeably, but the emphasis differs. Embodied AI is the broader research thesis that intelligence is shaped by having a body that senses and acts in the world, a framing that includes simulated agents and virtual avatars as well as physical machines. AI robotics is the applied, hardware-grounded slice of that thesis: it specifically concerns building real robots whose behavior is produced by learned AI models. In practice, embodied-AI research (world models, VLAs, sim-to-real) supplies the algorithms, and AI robotics deploys them on robotics hardware. The humanoid robot wave of 2024-2026 sits squarely at this intersection.

What is the current state of AI robotics (2025-2026)?

As of mid-2026, AI robotics is transitioning from research to early commercialization. Several milestones mark this shift:

Boston Dynamics launched the production version of its electric Atlas at CES 2026, with commercial deployments at Hyundai factories scheduled for 2026 and plans for Hyundai to produce up to 30,000 units per year by 2028 ^[21].
Figure AI reached a $39 billion valuation in its September 2025 Series C, a roughly 15x increase from its $2.6 billion valuation in February 2024, and demonstrated sustained, multi-month deployment of Figure 02 at BMW ^[20]^[22].
The vision-language-action paradigm matured from research papers into shipping software, with Physical Intelligence's pi0, Figure's Helix, and Google DeepMind's Gemini Robotics On-Device all targeting cross-embodiment, general-purpose control ^[11]^[12]^[13].
NVIDIA expanded its Isaac ecosystem with Isaac Lab 3.0 and the Cosmos world foundation models, establishing itself as the primary infrastructure provider for physical AI development ^[18].
China's robotics industry, led by Unitree and others, pushed prices down and brought affordable hardware to market ^[23].

The field's trajectory suggests that the next few years will see a shift from dozens of deployed robots to thousands, concentrated initially in manufacturing and logistics where the environments are semi-structured and the economic case is clearest. Broader deployment in households, healthcare, and public spaces will likely follow as the technology matures and costs decline.

References

Goldman Sachs. (2024). "The global market for humanoid robots could reach $38 billion by 2035." https://www.goldmansachs.com/insights/articles/the-global-market-for-robots-could-reach-38-billion-by-2035 ↩
Morgan Stanley. (2025). "Humanoid Robot Market Expected to Reach $5 Trillion by 2050." https://www.morganstanley.com/insights/articles/humanoid-robot-market-5-trillion-by-2050 ↩
SRI International. "Shakey the Robot." https://www.sri.com/hoi/shakey-the-robot/ ↩
International Federation of Robotics. "History of Industrial Robots." https://ifr.org/robot-history ↩
Brooks, R. (1991). "Intelligence without Representation." Artificial Intelligence, 47(1-3), 139-159. ↩
Tobin, J., et al. (2017). "Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World." IROS 2017. https://arxiv.org/abs/1703.06907 ↩
Wen, B., et al. (2024). "FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects." CVPR 2024. https://arxiv.org/abs/2312.08344 ↩
OpenAI. (2019). "Solving Rubik's Cube with a Robot Hand." https://openai.com/research/solving-rubiks-cube ↩
Ahn, M., et al. (2022). "Do As I Can, Not As I Say: Grounding Language in Robotic Affordances." https://say-can.github.io/ ↩
Google DeepMind. (2023). "RT-2: New model translates vision and language into action." https://deepmind.google/blog/rt-2-new-model-translates-vision-and-language-into-action/ ↩
Physical Intelligence. (2024). "pi0: A Vision-Language-Action Flow Model for General Robot Control." https://www.pi.website/download/pi0.pdf ↩
Figure AI. (2025). "Helix: A Vision-Language-Action Model for Generalist Humanoid Control." https://www.figure.ai/news/helix ↩
Google DeepMind. (2025). "Gemini Robotics On-Device brings AI to local robotic devices." https://deepmind.google/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/ ↩
Open X-Embodiment Collaboration. (2023). "Open X-Embodiment: Robotic Learning Datasets and RT-X Models." https://arxiv.org/abs/2310.08864 ↩
NVIDIA. (2025). "NVIDIA Accelerates Robotics Research and Development With New Open Models and Simulation Libraries." https://nvidianews.nvidia.com/news/nvidia-accelerates-robotics-research-and-development-with-new-open-models-and-simulation-libraries ↩
Shadow Robot Company. (2025). "Shadow Robot unveils the world's most robust dexterous robot hand, developed in partnership with Google DeepMind." https://shadowrobot.com/news/press-releases/shadow-robot-unveils-the-worlds-most-robust-dexterous-robot-hand-developed-in-partnership-with-google-deepmind/ ↩
Musk, E. (2026). Tesla Q4 2025 Earnings Call. Referenced in: https://botinfo.ai/articles/tesla-optimus ↩
NVIDIA. (2026). "NVIDIA Opens Portals to World of Robotics With New Omniverse Libraries, Cosmos Physical AI Models and AI Computing Infrastructure." https://nvidianews.nvidia.com/news/nvidia-opens-portals-to-world-of-robotics-with-new-omniverse-libraries-cosmos-physical-ai-models-and-ai-computing-infrastructure ↩
Brohan, A., et al. (2022). "RT-1: Robotics Transformer for Real-World Control at Scale." https://arxiv.org/abs/2212.06817 ↩
Figure AI. (2025). "Figure 02 deployment at BMW Spartanburg." https://www.figure.ai/news/series-c ↩
Boston Dynamics. (2026). "Boston Dynamics Unveils New Atlas Robot to Revolutionize Industry." https://bostondynamics.com/blog/boston-dynamics-unveils-new-atlas-robot-to-revolutionize-industry/ ↩
Figure AI. (2025). "Figure Exceeds $1B in Series C Funding at $39B Post-Money Valuation." https://www.figure.ai/news/series-c ↩
Unitree Robotics. "G1 Humanoid Robot." https://www.unitree.com/g1/ ↩

Improve this article

Add missing citations, update stale details, or suggest a clearer explanation. Every suggestion is reviewed for sourcing before it goes live.

3 revisions by 1 contributors · full history

Suggest edit

What links here

CloudMinds Digital twin Dyna Robotics Foundation Phantom MK1 Humanoid robot Humanoid robot manufacturers Markov Decision Process (MDP)Motion 2 Physical Intelligence Robot safety SIMA (DeepMind)World model

What makes a robot an "AI robot"?

History

Early research (1960s-1980s)

The AI winter and recovery (1980s-2000s)

The deep learning revolution (2012-present)

Core AI technologies in robotics

Computer vision for perception

Reinforcement learning for manipulation and locomotion

Large language models for planning and instruction following

What is a vision-language-action (VLA) model?

Foundation models for robotics

Key companies and platforms

Hardware

Sensors

Actuators

Dexterous hands

Sim-to-real transfer

Research milestones

RT-1 and RT-2

SayCan and language-conditioned planning

Open X-Embodiment

Applications

Manufacturing and logistics

Agriculture

Healthcare

Household

Challenges

Generalization

Safety and reliability

Hardware cost and durability

Data scarcity

Battery life

How does AI robotics differ from embodied AI?

What is the current state of AI robotics (2025-2026)?

See also

References

Improve this article

Related Articles

SmolVLA

Humanoid robot

Tesla Optimus

Autonomous driving

Embodied AI

Warehouse robot

What links here

Related Articles

SmolVLA

Humanoid robot

Tesla Optimus

Autonomous driving

Embodied AI

Warehouse robot

What links here