Humanoid robot autonomy levels describe a graduated framework for classifying the degree of independent operation that a humanoid robot can achieve. Ranging from Level 0 (no autonomy, full human control) through Level 5 (full general intelligence), the framework provides a standardized way to compare different robots and track the progress of the industry over time. As of 2025, most commercially available humanoid robots operate at Level 2 or Level 3, with several companies actively developing Level 4 systems. Beyond autonomy, humanoid robots can also be classified by their physical form, application domain, and the nature of their artificial intelligence integration.
The autonomy framework for humanoid robots draws a loose parallel with the SAE levels of driving automation used in the automotive industry, adapted for the broader and more complex domain of general-purpose robotics. Each level represents a qualitative increase in the robot's ability to perceive, reason, and act without human intervention.
At Level 0, the robot has no capacity for independent operation. Every movement and action requires continuous, real-time human control. The operator must directly command each joint position or end-effector trajectory. Robots at this level are essentially sophisticated puppets, useful for teleoperation research and performance art but incapable of any autonomous task execution. Early humanoid prototypes from the 1960s and 1970s fell into this category, as did many research platforms built purely for mechanical validation.
Level 1 robots possess basic programmable movement with limited functionality. They can execute pre-programmed motion sequences (such as walking a fixed path or waving) but lack the ability to adapt to changes in their environment. Their control systems typically rely on open-loop or simple closed-loop algorithms with minimal sensor feedback. Industrial robotic arms from the 1980s and 1990s operated at roughly this level of capability. In the humanoid domain, early versions of Honda ASIMO (circa 2000) represented a transition from L0 to L1, executing choreographed walking patterns on flat, predictable surfaces.
Level 2 robots use algorithm-driven movement within structured environments. They incorporate sensor feedback loops for balance correction and obstacle detection but still operate within tightly constrained parameters. These robots can perform specific tasks in controlled settings (factory floors, laboratories) with minimal human oversight during execution, though a human must set up the task and monitor for failures. SoftBank Robotics NAO, widely used in academic research and education, operates near this level, capable of navigating known indoor environments and performing scripted interactions. Many warehouse and logistics robots with partial humanoid features also fall into this category.
Level 3 represents the current industry standard for leading humanoid robots as of 2025. Robots at this level are equipped with sensor suites (cameras, LiDAR, force/torque sensors, IMUs) that provide real environmental awareness. They can adapt their behavior based on perceived conditions, handle moderate variability in their surroundings, and recover from some classes of errors without human intervention. However, they still require human oversight for novel situations, complex decision-making, and safety-critical judgments.
Examples of Level 3 systems include early deployment versions of Tesla Optimus, the Boston Dynamics Atlas platform in its later iterations, and initial versions of Agility Robotics Digit. These robots can perform tasks like picking and placing objects, navigating semi-structured environments, and responding to basic voice commands. Their autonomy is conditional because it depends on operating within a known set of scenarios; truly unexpected situations still require human takeover or cause the robot to halt.
Level 4 robots can complete tasks with independent reasoning, planning, and execution across a broader range of environments. They can handle novel objects, recover from a wider set of failures, and make contextual decisions about how to accomplish goals that were specified at a high level (for example, "clean the kitchen" rather than "pick up object A and place it in location B"). This level requires sophisticated world models, advanced manipulation skills, and robust perception systems that can handle occlusion, clutter, and dynamic changes.
As of 2025, Level 4 is actively under development by several companies. Figure 02 with its OpenAI-powered reasoning capabilities, advanced versions of Tesla Optimus with end-to-end neural network control, and Sanctuary AI Phoenix with its Carbon AI system all aim to reach this tier. No humanoid robot has yet achieved consistent, reliable Level 4 operation across diverse real-world environments, though demonstrations in controlled settings have shown promising results.
Level 5 represents theoretical human-equivalent general intelligence in a robotic body. A Level 5 humanoid robot would possess creative problem-solving abilities, abstract reasoning, emotional understanding, and the capacity to learn any new task from minimal instruction, just as a human can. It would seamlessly handle novel environments, social interactions, and multi-step tasks that require common sense reasoning.
No current technology approaches Level 5. Achieving it would require breakthroughs in artificial general intelligence (AGI), and most researchers consider it decades away at minimum. The gap between Level 4 and Level 5 is arguably the largest in the entire framework, as it encompasses the full breadth of human cognitive capability.
| Level | Name | Description | Key capabilities | Examples | Status (2025) |
|---|---|---|---|---|---|
| L0 | No autonomy | Continuous human control required for all movements and actions | Teleoperation only; no independent sensing or decision-making | Early research prototypes, teleoperated humanoids | Historical |
| L1 | Auxiliary control | Basic programmable movement with limited function | Pre-programmed motion sequences; minimal sensor feedback; open-loop or simple closed-loop control | Early Honda ASIMO, basic research platforms | Historical/Legacy |
| L2 | Partial autonomy | Algorithm-driven movement within structured environments | Sensor-based balance correction; obstacle detection in known settings; task execution with human setup and monitoring | SoftBank Robotics NAO, early warehouse humanoids, Fourier Intelligence GR-1 | Available |
| L3 | Conditional autonomy | Sensor-equipped with environmental awareness and adaptive behavior | Environmental perception via cameras/LiDAR/IMU; moderate error recovery; voice command response; task adaptation within known scenarios | Tesla Optimus (early deployment), Boston Dynamics Atlas, Agility Robotics Digit | Current industry standard |
| L4 | High autonomy | Independent reasoning and task completion across diverse environments | High-level goal interpretation; novel object handling; multi-step planning; broad failure recovery; contextual decision-making | Figure 02 (target), advanced Tesla Optimus (target), Sanctuary AI Phoenix (target) | In development |
| L5 | Full intelligence | Human-equivalent general intelligence with creative problem-solving | Abstract reasoning; emotional understanding; learning any task from minimal instruction; common sense reasoning; full social interaction | None | Theoretical |
Beyond autonomy levels, humanoid robots can be classified by their physical locomotion design. The two primary categories are bipedal and wheeled humanoids, each with distinct advantages and trade-offs.
Bipedal humanoid robots walk on two legs, closely mimicking the human gait. This design offers maximum mobility and the ability to navigate environments built for humans, including stairs, narrow doorways, uneven terrain, and cluttered workspaces. However, bipedal locomotion imposes complex balance requirements. Maintaining stability while walking, turning, carrying loads, or recovering from pushes demands sophisticated control algorithms and high-bandwidth actuators.
Most flagship humanoid robots adopt the bipedal form factor. Boston Dynamics Atlas, Tesla Optimus, Figure 02, Unitree H1, and UBTECH Walker S2 are all bipedal. The engineering challenge of bipedal walking has driven significant advances in reinforcement learning, model predictive control, and actuator design.
Wheeled humanoid robots feature a human-like upper body mounted on a wheeled base. This design sacrifices stair navigation and rough terrain capability in exchange for greater stability, simpler control, lower energy consumption, and smoother movement on flat surfaces. Wheeled humanoids are particularly well-suited for structured indoor environments such as offices, hospitals, retail stores, and airports.
Examples include Enchanted Tools Mirokai, certain configurations of service robots from Keenon Robotics, and the LG Electronics CLOi platform. For many commercial service applications, the wheeled form factor offers a practical compromise between human-like interaction capability and operational reliability.
| Form factor | Advantages | Disadvantages | Best suited for | Examples |
|---|---|---|---|---|
| Bipedal | Navigates stairs and uneven terrain; fits in human-designed spaces; maximum versatility | Complex balance control; higher energy consumption; risk of falls; more expensive actuators | Manufacturing, home assistance, construction, disaster response | Tesla Optimus, Figure 02, Unitree H1 |
| Wheeled | Greater stability; simpler control; lower energy use; smoother movement; lower cost | Cannot climb stairs; limited on uneven terrain; restricted to flat indoor environments | Retail, hospitality, healthcare, offices, airports | Enchanted Tools Mirokai, LG Electronics CLOi |
Humanoid robots are designed for different deployment environments, each placing distinct demands on the robot's motion control, perception, and autonomy capabilities. These domains can be ranked by the complexity of the motion control required.
Industrial manufacturing environments place the lowest relative motion demands on humanoid robots because the workspace is highly structured, predictable, and engineered for efficiency. Tasks include loading parts onto assembly lines, operating machinery, moving totes between stations, and performing quality inspections. The environment is typically well-lit, temperature-controlled, and free of unexpected obstacles. Figure 02 deployed at BMW's Spartanburg plant and Agility Robotics Digit working in Amazon fulfillment centers represent leading examples in this domain.
Commercial service applications require robots to operate in semi-structured settings with moderate human traffic, such as retail stores, hotels, hospitals, and restaurants. Robots must navigate around people, interact socially, and handle a wider variety of objects than in a factory. The environment is less predictable than a factory floor but still follows regular patterns. UBTECH Walker and various service-oriented platforms target this domain.
Extreme environment applications deploy humanoid robots in hazardous locations where human presence is dangerous or impossible. This includes nuclear facilities, disaster sites, deep-sea operations, space exploration, and firefighting scenarios. These environments demand robust hardware (radiation hardening, waterproofing, temperature resistance) and high autonomy, since communication delays or signal loss may prevent real-time human control. NASA Valkyrie (R5) was designed specifically for disaster response and space exploration, while Boston Dynamics Atlas has been tested in simulated disaster scenarios through DARPA challenges.
Home service imposes the highest motion control requirements because domestic environments are inherently unstructured, cluttered, and variable. A home robot must handle deformable objects (clothing, food), navigate around furniture and pets, operate in varying lighting conditions, and interact safely with children and elderly individuals. No humanoid robot has achieved reliable autonomous operation in home environments as of 2025, though this remains one of the most active areas of research and a primary long-term market target for companies like Tesla, 1X Technologies, and Figure.
| Domain | Environment structure | Motion complexity | Key challenges | Leading examples |
|---|---|---|---|---|
| Industrial manufacturing | Highly structured | Lowest | Repetitive precision, cycle time, integration with existing lines | Figure 02 (BMW), Agility Robotics Digit (Amazon) |
| Commercial service | Semi-structured | Moderate | Human navigation, social interaction, object variety | UBTECH Walker S2, service platforms |
| Extreme environments | Unstructured/hazardous | High | Robustness, communication latency, hardware durability | NASA Valkyrie R5, Boston Dynamics Atlas |
| Home services | Unstructured/variable | Highest | Deformable objects, clutter, safety with people, diverse tasks | Research stage; targeted by Tesla, 1X Technologies, Figure |
The humanoid robotics industry features three distinct business models for AI integration, reflecting different strategic bets on where value will accumulate in the robotics stack.
Vertically integrated companies build both the robot hardware and the AI software that controls it. This approach allows tight co-optimization of the mechanical design, sensor suite, and learning algorithms. Tesla (Optimus), Figure (Figure 02/03), and Unitree all pursue this strategy, developing proprietary neural networks trained on data collected from their own hardware. The advantage is end-to-end control over performance and rapid iteration cycles. The disadvantage is the enormous capital and talent required to excel simultaneously in mechanical engineering, electrical engineering, and AI research.
Hardware-focused companies specialize in building high-quality robot bodies and rely on partnerships or third-party solutions for higher-level AI capabilities. Sanctuary AI develops its Carbon AI system but places primary emphasis on its dexterous hardware. 1X Technologies builds the NEO platform with a focus on safe, lightweight mechanical design. Boston Dynamics has decades of expertise in dynamic locomotion hardware and increasingly integrates AI through collaborations with partners. These companies bet that the hardware challenge is the primary bottleneck and that AI capabilities can be sourced or developed incrementally.
AI model providers develop the foundation models and software platforms that power humanoid robots but do not build the physical hardware themselves. OpenAI provides language and reasoning models to Figure through a partnership. NVIDIA offers the Isaac and GR00T platforms for robot simulation, training, and deployment. Google DeepMind develops robotics foundation models like RT-2 and RT-X. Microsoft invests in robotics AI through Azure cloud services and partnerships. These companies view humanoid robots as a deployment platform for their AI capabilities, similar to how smartphone operating systems run on hardware from multiple manufacturers.
| Integration model | Strategy | Advantages | Disadvantages | Key players |
|---|---|---|---|---|
| Vertically integrated | Build both hardware and AI | End-to-end optimization; rapid iteration; proprietary data flywheel | Enormous capital requirements; must excel across multiple disciplines | Tesla, Figure, Unitree |
| Hardware-focused | Specialize in robot bodies; partner for AI | Deep mechanical expertise; can adopt best-available AI; lower AI R&D costs | Dependent on external AI partners; less control over software performance | Sanctuary AI, 1X Technologies, Boston Dynamics |
| AI model providers | Develop foundation models and platforms | Scalable across many hardware platforms; leverage existing AI research; recurring software revenue | No control over hardware quality; dependent on hardware partners for deployment | OpenAI, NVIDIA, Google DeepMind, Microsoft |
The AI systems powering humanoid robots employ several distinct learning paradigms, each suited to different aspects of robot behavior. Modern systems increasingly combine multiple approaches.
Reinforcement learning (RL) enables robots to improve through trial and error. The robot takes actions in an environment, receives reward signals based on outcomes, and gradually learns policies that maximize cumulative reward. In humanoid robotics, RL has proven particularly effective for locomotion (walking, running, balancing, stair climbing) and dynamic tasks where the optimal control strategy is difficult to specify analytically.
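The trial-and-error loop described above can be sketched with tabular Q-learning on a toy balancing task. Everything here is illustrative: the seven-bucket "lean angle" state space, the push-left/push-right actions, and the reward are stand-ins for a real locomotion setup, which would use continuous states, deep networks, and a physics simulator.

```python
import random

# Toy environment: a pole's lean angle discretized into 7 buckets (0..6),
# where bucket 3 is upright. Actions push the angle left (-1) or right (+1).
# Reward is +1 for being upright, 0 otherwise. Purely illustrative.
N_STATES, UPRIGHT = 7, 3
ACTIONS = [-1, +1]

def step(state, action):
    drift = random.choice([-1, 0, 1])           # unmodeled disturbance
    nxt = min(N_STATES - 1, max(0, state + action + drift))
    reward = 1.0 if nxt == UPRIGHT else 0.0
    return nxt, reward

def train(episodes=2000, alpha=0.2, gamma=0.9, eps=0.1):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = random.randrange(N_STATES)
        for _ in range(30):                     # short episode
            a = random.choice(ACTIONS) if random.random() < eps \
                else max(ACTIONS, key=lambda act: q[(s, act)])
            nxt, r = step(s, a)
            best_next = max(q[(nxt, b)] for b in ACTIONS)
            # Q-learning update: move toward reward plus discounted future value.
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = nxt
    return q

random.seed(0)
q = train()
# The learned greedy policy should push the angle back toward upright.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
```

The reward signal never tells the robot *how* to balance, only *that* being upright is good; the corrective policy emerges from repeated interaction, which is exactly why RL suits dynamic tasks that are hard to specify analytically.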
RL training typically occurs in physics simulation environments before being transferred to the physical robot, a process known as sim-to-real transfer. NVIDIA Isaac Sim and MuJoCo are commonly used simulation platforms. The challenge lies in bridging the "reality gap" between simulated and real-world physics. Domain randomization (varying simulation parameters during training) and system identification (calibrating the simulator to match the real robot) are standard techniques for addressing this gap.
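Domain randomization, mentioned above, amounts to re-sampling the simulator's physical parameters before each training rollout so the learned policy cannot overfit to one specific physics model. The parameter names and ranges below are illustrative, not drawn from any particular simulator:

```python
import random

# Illustrative physics parameters and plausible-looking ranges; a real
# setup would randomize the simulator's actual configuration fields.
PARAM_RANGES = {
    "ground_friction": (0.4, 1.2),
    "link_mass_scale": (0.8, 1.2),     # +/-20% modeling error in link masses
    "motor_latency_s": (0.0, 0.03),
    "sensor_noise_std": (0.0, 0.05),
}

def sample_sim_params(rng=random):
    """Draw one randomized physics configuration for a training rollout."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}

def training_rollouts(n):
    """Yield a freshly randomized parameter set for each of n rollouts."""
    for _ in range(n):
        yield sample_sim_params()

random.seed(0)
rollouts = list(training_rollouts(3))
```

Because every rollout sees slightly different friction, mass, latency, and noise, the real robot's physics looks like just one more sample from the training distribution, which is what narrows the reality gap.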
Imitation learning (also called learning from demonstration) allows robots to acquire skills by observing human demonstrations. Rather than discovering optimal behaviors through trial and error, the robot directly learns a mapping from observations to actions based on expert examples. This approach is especially useful for manipulation tasks where specifying a reward function is difficult but demonstrating the desired behavior is straightforward.
Common methods include behavioral cloning (supervised learning on demonstration data) and inverse reinforcement learning (inferring the reward function from demonstrations). Tesla has used human teleoperators wearing motion-capture suits to collect demonstration data for Optimus. Figure has employed VR teleoperation to gather training data for dexterous manipulation. The scalability of data collection is a key bottleneck, and recent work on using internet video data for imitation learning aims to address this limitation.
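Behavioral cloning is ordinary supervised learning on demonstration data. The sketch below uses a hypothetical "expert" (a noisy proportional controller standing in for a human teleoperator) and fits a one-parameter linear policy by least squares; real systems fit deep networks over images and joint states, but the structure is the same.

```python
import random

def expert_action(x, rng):
    # Hypothetical expert: proportional controller u = -2.0 * x,
    # with Gaussian noise standing in for teleoperation imprecision.
    return -2.0 * x + rng.gauss(0.0, 0.05)

def collect_demos(n, rng):
    """Record (observation, action) pairs from the expert."""
    obs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, expert_action(x, rng)) for x in obs]

def clone_policy(demos):
    # Closed-form least squares for a one-parameter policy u = k * x.
    num = sum(x * u for x, u in demos)
    den = sum(x * x for x, u in demos)
    return num / den

rng = random.Random(0)
demos = collect_demos(500, rng)
k = clone_policy(demos)    # should recover a gain close to -2.0
```

The cloned policy is only as good as the demonstrations it saw, which is the distribution-shift problem in miniature: states the expert never visited produce extrapolated, unvetted actions.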
Vision-language models (VLMs) enable robots to understand natural language commands and reason about the visual world. These models, built on large language model architectures extended with visual encoders, can interpret instructions like "pick up the red cup on the table" by jointly processing camera images and text input. VLMs serve as the cognitive layer that bridges high-level human intent and low-level robot actions.
In humanoid robotics, VLMs are used for task planning, scene understanding, and human-robot dialogue. OpenAI's models power Figure 02's conversational abilities. Google DeepMind's PaLM-E and RT-2 demonstrated that VLMs can directly output robot control commands. The integration of VLMs gives humanoid robots a form of common-sense reasoning, allowing them to infer context, handle ambiguous instructions, and generalize to novel situations not seen during training.
Vision-language-action (VLA) models extend VLMs by directly outputting robot actions from visual and language inputs in an end-to-end architecture. This eliminates the need for hand-crafted perception pipelines, separate planning modules, and explicit state estimation. The model takes in camera images and a language instruction, then directly produces motor commands or action tokens.
Google DeepMind's RT-2 (Robotic Transformer 2) was a landmark VLA model, demonstrating that a single model could reason about the world and produce physical actions. Subsequent work, including RT-X (trained on data from multiple robot embodiments) and Physical Intelligence's pi0 model, has pushed VLA capabilities further. VLA models represent a shift toward unified, general-purpose robot brains that can potentially transfer across different robot bodies and tasks.
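A core mechanism behind these models is action tokenization: continuous motor commands are discretized into a small vocabulary so a language-model backbone can emit them like text. The 256-bin choice below mirrors published RT-2 practice; the normalized action range and the three-dimensional action are illustrative.

```python
# Uniform binning of a normalized actuator command into discrete tokens.
N_BINS = 256
ACTION_RANGE = (-1.0, 1.0)

def action_to_token(value, lo=ACTION_RANGE[0], hi=ACTION_RANGE[1]):
    """Clip a continuous action and map it to one of N_BINS token ids."""
    clipped = min(max(value, lo), hi)
    return round((clipped - lo) / (hi - lo) * (N_BINS - 1))

def token_to_action(token, lo=ACTION_RANGE[0], hi=ACTION_RANGE[1]):
    """Map a token id back to the center of its action bin."""
    return lo + token / (N_BINS - 1) * (hi - lo)

# A 3-DoF action becomes a short token sequence the model can predict,
# and decoding recovers the action up to the bin resolution.
action = [0.25, -0.8, 0.0]
tokens = [action_to_token(a) for a in action]
decoded = [token_to_action(t) for t in tokens]
```

The quantization error is bounded by half a bin width, small enough for most manipulation commands, and in exchange the model can treat "act" and "speak" as the same next-token prediction problem.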
Large Behavior Models (LBMs) represent the newest frontier in humanoid robot AI. Pioneered by Boston Dynamics in collaboration with the Toyota Research Institute, LBMs provide unified control of the entire robot body, treating hands and feet as parts of an integrated whole rather than as separate subsystems. This approach allows the robot to coordinate complex, multi-contact behaviors, such as using one hand for support while the other manipulates an object, or transitioning smoothly between walking and reaching.
LBMs are trained on large-scale datasets of whole-body behavior, capturing the continuous, fluid nature of human movement rather than decomposing it into discrete skills. The key advantage is the ability to execute long, complex task sequences without the brittle handoffs between separate locomotion and manipulation controllers that plague traditional approaches. Toyota Research Institute has demonstrated LBM-controlled robots performing extended kitchen tasks that require simultaneous walking, reaching, grasping, and placing.
| Learning method | How it works | Strengths | Limitations | Key applications |
|---|---|---|---|---|
| Reinforcement learning | Trial-and-error optimization via reward signals in simulation | Discovers novel solutions; excels at dynamic tasks; handles complex physics | Reward design is difficult; sim-to-real gap; sample inefficient | Walking, running, balancing, stair climbing, dynamic recovery |
| Imitation learning | Supervised learning from human demonstrations | Intuitive data collection; works where reward functions are hard to define; fast initial learning | Limited by demonstration quality and quantity; distribution shift problems | Dexterous manipulation, tool use, household tasks |
| Vision-language models | Joint processing of visual and textual inputs for reasoning | Common-sense reasoning; handles ambiguous instructions; generalizes to novel contexts | Computationally expensive; may hallucinate; slow inference | Task planning, scene understanding, human-robot dialogue |
| Vision-language-action models | End-to-end mapping from images and language to motor commands | Eliminates hand-crafted pipelines; transfers across embodiments; unified architecture | Requires massive training data; challenging to ensure safety; early stage | General-purpose manipulation, multi-step task execution |
| Large behavior models | Unified whole-body control from large-scale behavior datasets | Integrated locomotion and manipulation; fluid, continuous movement; long task sequences | Data-intensive; early research stage; limited availability | Complex multi-contact tasks, kitchen work, industrial assembly |
The low-level control systems that translate high-level AI decisions into physical joint movements are critical to humanoid robot performance. Several control paradigms are used, each with distinct characteristics.
Impedance control makes the robot behave like a mass-spring-damper system, regulating the relationship between the robot's position and the force it exerts on the environment. Rather than commanding a rigid position trajectory, the controller specifies a desired mechanical impedance (stiffness and damping) that determines how the robot responds to external forces. This makes the robot inherently compliant and safe for physical interaction.
Impedance control is widely used in humanoid robots that must interact with people or handle delicate objects. It provides stable behavior under unexpected contacts, such as being bumped or encountering an unseen obstacle. The primary drawback is that it requires force/torque sensing at the joints or end-effectors, which adds cost, complexity, and potential failure points to the hardware.
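The virtual mass-spring-damper behavior can be sketched for a single joint: the controller commands torque tau = K(x_des - x) + D(dx_des - dx), so under an external push the joint yields toward a new equilibrium and springs back when released. Gains, inertia, and the disturbance below are illustrative values, not tuned for any real robot.

```python
# 1-DoF impedance control around a desired position x_des = 0.
K, D = 20.0, 6.0          # virtual stiffness (N*m/rad) and damping
I = 1.0                   # joint inertia (kg*m^2)
DT = 0.001                # control period (s)

def impedance_torque(x, dx, x_des=0.0, dx_des=0.0):
    """Spring-damper control law: the joint behaves compliantly, not rigidly."""
    return K * (x_des - x) + D * (dx_des - dx)

def simulate(steps, push_torque=5.0, push_until=2000):
    x, dx = 0.0, 0.0
    trace = []
    for t in range(steps):
        ext = push_torque if t < push_until else 0.0   # external bump
        tau = impedance_torque(x, dx)
        ddx = (tau + ext) / I           # joint dynamics
        dx += ddx * DT                  # semi-implicit Euler integration
        x += dx * DT
        trace.append(x)
    return trace

trace = simulate(6000)
# While pushed, the joint settles where the virtual spring balances the
# push: x -> push_torque / K = 0.25 rad. After release it returns near 0.
```

The compliance lives entirely in the control law: the same hardware tracked with a stiff position loop would fight the push instead of yielding to it.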
Admittance control takes the inverse approach: it measures external forces (via force/torque sensors) and converts them into reference velocities or positions for the robot's position-controlled joints. The robot "admits" external forces by moving in the direction of the applied force, creating compliant behavior on top of stiff position-controlled actuators.
This approach is particularly useful for robots that use high-gain position-controlled servos, which are common in industrial humanoid platforms. The advantage is compatibility with standard position-controlled hardware. The disadvantage is that the behavior can feel sluggish or delayed compared to true impedance control, since the compliance is computed rather than inherent in the mechanical system. Tuning the admittance parameters for natural-feeling interaction remains an active area of research.
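The admittance loop can be sketched in the same style: a measured external force is filtered through a virtual mass-damper to produce a reference velocity, which the stiff position-controlled joint then tracks. The virtual parameters and the 4 N push are illustrative.

```python
# Admittance control: force in, motion reference out.
M_VIRT, D_VIRT = 2.0, 8.0     # virtual mass and damping
DT = 0.002                    # control period (s)

def admittance_step(v_ref, f_ext):
    """Integrate M*dv + D*v = f_ext one step; return the new reference velocity."""
    dv = (f_ext - D_VIRT * v_ref) / M_VIRT
    return v_ref + dv * DT

# A person pushes the arm with 4 N for 1 s, then lets go.
x_ref, v_ref = 0.0, 0.0
positions = []
for t in range(1000):
    f = 4.0 if t < 500 else 0.0
    v_ref = admittance_step(v_ref, f)
    x_ref += v_ref * DT       # position setpoint sent to the joint servo
    positions.append(x_ref)
# While pushed, the reference drifts with the force (v -> f/D = 0.5 m/s);
# when the force stops, the velocity decays and the joint holds its place.
```

Note that the compliance here is computed, not mechanical: the perceived "give" is limited by the force sensor, the filter dynamics, and the servo bandwidth, which is why admittance-controlled robots can feel sluggish compared to true impedance control.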
Whole-Body Quadratic Programming (QP) Control formulates the robot's control problem as a mathematical optimization solved at each control timestep. The QP balances multiple objectives simultaneously: maintaining balance, following a desired trajectory, respecting joint limits, obeying contact friction constraints, and minimizing energy consumption. Each objective is expressed as a cost function or constraint, and the QP solver finds the joint torques that best satisfy all of them.
This approach is the backbone of modern dynamic humanoid locomotion. Boston Dynamics Atlas and many research humanoids use variants of whole-body QP control. The strength is the ability to handle complex, multi-task scenarios elegantly within a unified framework. The weakness is computational intensity; solving a QP at 500 Hz or higher requires powerful onboard processors, and the formulation must be carefully constructed to remain feasible under all conditions.
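Stripped to its simplest form, with only weighted equality objectives and no inequality constraints, the per-timestep QP reduces to a weighted least-squares problem over joint accelerations. The two-joint "robot", task rows, and weights below are illustrative; real controllers add contact, friction, and torque-limit constraints and call a proper QP solver at each control cycle.

```python
# Minimal whole-body objective: minimize sum_i w_i * (row_i . ddq - target_i)^2
# over the joint accelerations ddq of a 2-joint robot, via normal equations.

def solve_2x2(A, b):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def whole_body_ls(tasks):
    """tasks: list of (weight, row, target). Returns the optimal ddq."""
    H = [[0.0, 0.0], [0.0, 0.0]]   # normal-equation matrix sum_i w_i row_i row_i^T
    g = [0.0, 0.0]                 # right-hand side sum_i w_i row_i target_i
    for w, row, target in tasks:
        for i in range(2):
            g[i] += w * row[i] * target
            for j in range(2):
                H[i][j] += w * row[i] * row[j]
    return solve_2x2(H, g)

tasks = [
    (100.0, [1.0, 0.5], 2.0),   # high weight: end-effector acceleration task
    (1.0,   [1.0, 0.0], 0.0),   # low weight: keep joint 1 still (posture)
    (1.0,   [0.0, 1.0], 0.0),   # low weight: keep joint 2 still (posture)
]
ddq = whole_body_ls(tasks)
# The high-weight task is met almost exactly (residual close to 2.0), while
# the posture tasks merely regularize how the joints share the motion.
```

The weights encode the same trade-off the prose describes: balance and trajectory objectives dominate, while secondary objectives like energy or posture shape the solution only where slack remains.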
Human-aware control extends standard controllers to incorporate models of human state and dynamics. The robot considers not just its own objectives but also the posture, fatigue level, intent, and comfort of nearby humans. This is essential for collaborative humanoid robots working alongside people in manufacturing or home environments.
The controller may reduce speed and force when a human is nearby, adjust its posture to maintain clear sightlines, or modify its grasp strategy to facilitate handovers. Advanced versions model human ergonomics and predict fatigue, adjusting the collaboration pattern to reduce strain on the human partner. The primary challenge is accurate real-time estimation of human state, which requires sophisticated perception systems and predictive models.
| Control approach | Mechanism | Pros | Cons | Typical use cases |
|---|---|---|---|---|
| Impedance control | Regulates position-force relationship as a virtual mass-spring-damper | Stable under unexpected contacts; inherently compliant; safe for human interaction | Requires force/torque sensing hardware; complex parameter tuning; sensor cost and fragility | Physical human-robot interaction, delicate object handling, collaborative tasks |
| Admittance control | Converts measured forces into reference velocities for position-controlled joints | Compatible with standard position-controlled hardware; straightforward implementation | Can feel sluggish or delayed; less natural compliance than impedance control; tuning challenges | Industrial robots with position-controlled servos, assembly tasks |
| Whole-body QP control | Solves quadratic program at each timestep to balance multiple objectives | Handles complex multi-task scenarios; unified framework; respects physical constraints | Computationally intensive; requires powerful onboard processors; careful formulation needed | Dynamic locomotion, whole-body coordination, parkour-style movements |
| Human-aware control | Incorporates human state and ergonomics into control objectives | Considers human comfort and safety; reduces partner fatigue; enables natural collaboration | Requires accurate real-time human state estimation; complex perception systems; limited validation | Human-robot collaboration in manufacturing, assistive robotics, home environments |
As humanoid robots increasingly operate alongside people, the design of effective human-robot interaction (HRI) has become a major research and engineering focus. HRI for humanoid robots spans three primary domains, each with distinct interaction paradigms.
Companions. Companion humanoid robots serve in coaching, therapy, elderly care, and educational roles. They engage in long-term social relationships with users, requiring consistent personality, emotional responsiveness, and the ability to remember previous interactions. Applications include rehabilitation coaching for stroke patients, social engagement for individuals with autism, and companionship for isolated elderly populations. The robot's social skills are paramount; it must convey warmth, patience, and attentiveness through facial expressions, gestures, and speech.
Co-Workers. Co-worker humanoid robots collaborate with humans in manufacturing, logistics, and professional settings. The interaction is task-focused, emphasizing efficiency, safety, and clear communication of intent. The robot must signal its planned movements to nearby workers, respond to human gestures and commands, and adjust its pace to match the human partner. Figure 02's deployment at BMW, where robots work alongside human operators on the assembly line, exemplifies this domain.
Avatars. Avatar humanoid robots are teleoperated platforms that provide a physical presence for a remote human operator. The operator sees through the robot's cameras, speaks through its speakers, and controls its movements in real time. Applications include remote medical consultations, hazardous environment inspection, and virtual tourism. Toyota T-HR3 was designed specifically as a telepresence avatar, allowing operators to control the robot's movements through a master-slave system with haptic feedback.
Effective humanoid robots require several social and cognitive capabilities that go beyond task execution.
Believability refers to the robot's ability to exhibit consistent behaviors that make it seem like a unified agent rather than a collection of disconnected responses. A believable robot maintains a stable personality, remembers context from earlier in a conversation, and behaves in ways that align with its established character. Inconsistent behavior erodes human trust and engagement.
Readability is the capacity to signal intentions clearly so that nearby humans can predict the robot's next actions. A readable robot telegraphs its movements through gaze direction, preparatory gestures, and explicit verbal cues. In collaborative manufacturing, readability is a safety requirement; workers must be able to anticipate the robot's movements to avoid collisions.
Theory of Mind (ToM) is the ability to attribute mental states (beliefs, desires, intentions) to other agents. A robot with ToM can infer that a human partner is confused, frustrated, or waiting for assistance, and adjust its behavior accordingly. This capability is essential for natural collaboration and is one of the most challenging aspects of humanoid robot AI. Current systems approximate ToM through learned models of human behavior rather than true mental state reasoning.
The development of humanoid robots can be understood through a six-stage evolutionary framework that describes the progressive accumulation of capabilities.
Stage 1: Structures. The foundation of humanoid robotics is mechanical design. This stage focuses on building bodies with human-like proportions, degrees of freedom, and physical capabilities. Advances in materials science, actuator technology, and mechanical engineering drive progress at this stage. The transition from hydraulic to electric actuation and from rigid to compliant mechanisms represents major milestones.
Stage 2: Senses. The second stage adds perception. Robots gain the ability to see (cameras, depth sensors), feel (force/torque sensors, tactile arrays), hear (microphones), and sense their own body state (IMUs, joint encoders). The richness and reliability of sensory input determine the upper bound on what the robot can achieve autonomously.
Stage 3: Behaviors. With structure and senses in place, robots develop repertoires of behaviors: walking, grasping, balancing, reaching, and manipulating. These behaviors may be hand-crafted (traditional robotics) or learned (reinforcement learning, imitation learning). The transition from scripted to adaptive behaviors marks a critical inflection point.
Stage 4: Functions. Behaviors combine into useful functions: completing an entire task such as loading boxes, cooking a meal, or assisting a patient. This stage requires task planning, sequencing, and error recovery. Most commercial humanoid robots in 2025 operate at the boundary between Stage 3 and Stage 4.
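The sequencing and error-recovery requirements of Stage 4 can be sketched as a simple executor that chains Stage 3 behaviors, retries bounded failures, and escalates to a human when recovery fails. The behavior names and retry policy below are hypothetical illustrations, not a description of any shipping system.

```python
# Illustrative Stage 4 sketch: a function ("load a box") decomposed into
# a sequence of Stage 3 behaviors, with bounded retries per behavior and
# escalation to a human operator when retries are exhausted.
# Behavior names and the retry limit are assumptions for illustration.

def run_task(behaviors, max_retries=2):
    """Execute (name, callable) behaviors in order; return (success, log)."""
    log = []
    for name, behavior in behaviors:
        for attempt in range(max_retries + 1):
            if behavior():
                log.append((name, "ok", attempt))
                break
            log.append((name, "fail", attempt))
        else:
            # All attempts failed: stop and request human assistance.
            log.append((name, "escalate", max_retries))
            return False, log
    return True, log

# Hypothetical behaviors; the grasp fails once, then succeeds on retry.
attempts = {"grasp_box": 0}

def walk_to_shelf():
    return True

def grasp_box():
    attempts["grasp_box"] += 1
    return attempts["grasp_box"] > 1

def place_on_conveyor():
    return True

ok, log = run_task([
    ("walk_to_shelf", walk_to_shelf),
    ("grasp_box", grasp_box),
    ("place_on_conveyor", place_on_conveyor),
])
```

Even this toy version shows why Stage 4 is harder than Stage 3: the executor must decide not just *how* to act but *whether* an action worked, *when* to retry, and *when* to give up.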
Stage 5: Humanity. At this stage, robots develop social and emotional capabilities that enable rich interaction with humans. They display empathy, humor, cultural awareness, and the ability to form ongoing relationships. This stage goes beyond functional task completion to encompass the full range of human social behavior.
Stage 6: Intelligence. The final stage represents general intelligence, the ability to learn, reason, create, and adapt across any domain without specific programming. This corresponds to Level 5 autonomy and remains a theoretical goal.
The evolution framework also defines three paradigm levels that describe the overall character of a humanoid robot:
| Paradigm level | Description | Key characteristics | Current status |
|---|---|---|---|
| Human-Looking | Robot has a human-like physical appearance | Correct proportions, recognizable body parts, human-like face and hands | Achieved by many platforms |
| Human-Like | Robot exhibits human-like behaviors and capabilities | Natural movement, social interaction, adaptive task execution, emotional expression | Partially achieved; active development |
| Human-Level | Robot matches human general intelligence and versatility | Creative problem-solving, abstract reasoning, learning from minimal instruction, full social competence | Theoretical; equivalent to AGI |
As the humanoid robotics industry generates increasing numbers of impressive-looking demonstration videos and announcements, a critical evaluation framework helps distinguish genuine progress from marketing. The following checklist provides a structured approach to assessing robot demonstrations.
Actual deployment location. Where was the demo conducted? A controlled lab with perfect lighting, flat floors, and no obstacles is very different from a real factory, warehouse, or home. Demos on company premises with engineered conditions should be viewed with appropriate skepticism about real-world generalization.
Who controlled the demo. Was the robot operating autonomously, or was it teleoperated (partially or fully) by a human? Many impressive demonstrations involve hidden human operators or extensive pre-programming. Look for explicit statements about the level of autonomy during the tasks shown.
Failure documentation. Does the company show failures alongside successes? All robots fail, and transparency about failure modes indicates engineering maturity. Companies that only show perfect runs may be cherry-picking from many attempts.
Real product offering. Is the robot available for purchase or lease, or is it purely a research prototype? A functioning product with paying customers represents a fundamentally different stage of development than a lab demonstration.
Data requirements. How much training data does the robot need to learn a new task? A system that requires thousands of demonstrations for each new skill has very different scalability prospects than one that generalizes from a handful of examples.
Fine print and restrictions. What constraints apply to the demo conditions? Look for disclaimers about controlled environments, pre-mapped spaces, limited object sets, or specific lighting requirements. The gap between demo conditions and real-world conditions is often the most important detail.
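The six criteria above can be expressed as a simple scoring rubric. The criterion names, equal weighting, and the idea of a single numeric score are illustrative assumptions layered on top of the checklist, not an established industry standard.

```python
# Illustrative rubric encoding the six-point demo checklist.
# Criterion names mirror the checklist; equal weighting is an assumption.

CRITERIA = [
    "demo_in_real_environment",   # not a controlled lab with engineered conditions
    "fully_autonomous",           # no hidden teleoperation or pre-programming
    "failures_shown",             # transparency about failure modes
    "purchasable_product",        # real offering with paying customers
    "learns_from_few_examples",   # data-efficient skill acquisition
    "conditions_disclosed",       # fine print about demo constraints
]

def credibility_score(answers):
    """Return the fraction of checklist criteria a demonstration satisfies."""
    met = sum(1 for criterion in CRITERIA if answers.get(criterion, False))
    return met / len(CRITERIA)

# Hypothetical assessment of a demo video that discloses its constraints
# and shows failures, but was partially teleoperated:
demo = {
    "failures_shown": True,
    "conditions_disclosed": True,
    "fully_autonomous": False,
}
score = credibility_score(demo)  # 2 of 6 criteria met
```

A low score does not prove a demo is fake; it flags how much the viewer is being asked to take on faith.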
An honest assessment of the state of humanoid robotics in 2025 reveals both genuine achievements and persistent challenges.
| Category | What works | What still struggles |
|---|---|---|
| Object handling | Tote and bin movement; container unloading; simple pick-and-place of rigid objects | Deformable objects (clothing, food, bags); transparent or reflective objects; very small items |
| Navigation | Structured warehouse aisles; flat factory floors; pre-mapped indoor spaces | Cluttered homes; outdoor terrain with varied surfaces; crowded public spaces |
| Task complexity | Single-step repetitive tasks; scripted multi-step sequences in controlled settings | Open-ended multi-step tasks; novel problem-solving; adapting to unexpected situations |
| Error recovery | Retrying simple grasp failures; recovering from minor balance disturbances | Recovering from novel failures; diagnosing root causes; requesting appropriate help |
| Manipulation precision | Placement within 5 mm tolerance for structured tasks; large object grasping | Fine assembly (screws, connectors); tool use requiring human-level dexterity; force-sensitive tasks |
| Human interaction | Basic voice commands; scripted conversational responses; simple gesture recognition | Natural conversation with context; understanding implicit instructions; reading emotional cues |
| Endurance | 5- to 10-hour shifts for simple, repetitive tasks | Full workday of varied tasks; graceful degradation under component wear; self-maintenance |