Helix (VLA model)
Last reviewed
Sources
27 citations
Review status
Source-backed
Revision
v3 · 5,574 words
Helix is a vision-language-action model (VLA) developed by Figure AI that controls humanoid robots by mapping camera images and natural language commands directly to continuous joint-level motion at 200 Hz. At its announcement on February 20, 2025, Figure described Helix as the first VLA to output high-rate continuous control of an entire humanoid upper body, and the first to run fully onboard a robot's embedded low-power GPUs with no cloud connection.[1] According to Figure, Helix uses "a single set of neural network weights to learn all behaviors, picking and placing items, using drawers and refrigerators, and cross-robot interaction, without any task-specific fine-tuning."[1] Helix runs onboard Figure 02 robots and was later extended to power the Figure 03 platform.
Helix uses a dual-system architecture: System 2 (S2), a roughly 7-billion-parameter VLM that reasons about the scene and language at 7 to 9 Hz, and System 1 (S1), an 80-million-parameter visuomotor policy that produces fast, reactive control at 200 Hz.[1] Figure's launch claims specified a 35-degree-of-freedom upper-body action space and added a multi-robot first: by Figure's account, Helix was the first VLA to operate simultaneously on two robots sharing identical weights.[1] In January 2026, Figure released Helix 02, an extension that added full lower-body locomotion control by introducing a third neural subsystem called System 0.[3] During 2025, Figure extended the same architecture to warehouse package handling, laundry folding, and navigation learned from egocentric human video, and in May 2026 the company demonstrated fleets of Figure 03 robots running fully autonomous package-sorting shifts of 8 to more than 24 hours.[18][19][20][26]
Helix is a generalist control model for humanoid robots: a single neural network that takes monocular camera images and a natural language instruction and outputs continuous control of the robot's wrists, torso, head, and individual fingers. Figure states that "Helix is the first VLA to output high-rate continuous control of the entire humanoid upper body, including wrists, torso, head, and individual fingers," and the first VLA "that runs entirely onboard embedded low-power-consumption GPUs, making it immediately ready for commercial deployment."[1] Rather than a library of hand-coded skills, Helix learns picking, placing, drawer and refrigerator use, and two-robot handovers with one set of weights and no per-task fine-tuning.[1]
Figure AI was founded in 2022 by entrepreneur Brett Adcock with the goal of building general-purpose humanoid robots for commercial work environments. The company's first platform, Figure 01, demonstrated basic teleoperated tasks and served as a proof-of-concept for the hardware. Its second, Figure 02, was designed with commercial deployment in mind and was piloted at BMW Group Plant Spartanburg in South Carolina, under a collaboration announced in early 2024.[6]
The early Figure 02 robots ran under conventional control architectures written in C++: hand-engineered state machines that specified motions explicitly for each task.[9] Adding a new task meant writing new code. Adding a new object type meant adjusting perception pipelines. These systems were brittle and required manual reprogramming whenever the environment changed or a new object was introduced. The company's researchers concluded that this approach would not scale to the variety of tasks required in real workplaces, let alone homes.
The broader robotics field had been wrestling with the same problem. Task-specific behavioral cloning, where a model learns to imitate recorded demonstrations for one specific task, produced policies that worked reliably in their training environment but failed unpredictably when objects were moved, lighting changed, or unfamiliar items appeared. The emergence of large language models and vision-language models between 2021 and 2023 opened a different path: if a model pretrained on internet data already contained broad knowledge about objects and actions, that knowledge might transfer to physical manipulation with far less robot-specific training.
Beginning in mid-2024, Figure's AI team built Helix as a replacement: a single neural network that could handle perception, language understanding, and motor control together. The effort was tied to Figure's decision to bring its robot intelligence fully in-house. On February 4, 2025, Adcock announced on X that Figure was ending its collaboration agreement with OpenAI, writing: "Figure made a major breakthrough on fully end-to-end robot AI, built entirely in-house. We're excited to show you in the next 30 days something no one has ever seen on a humanoid."[12] Adcock argued that, like Tesla's approach to autonomous driving, "to solve embodied AI at scale in the real world, you have to vertically integrate robot AI."[12] On February 20, 2025, Figure published both a detailed technical blog post and a demonstration video showing two Figure 02 robots collaborating to put away grocery items they had never encountered in training; Figure said Helix could pick up thousands of such novel household objects.[1][10]
The introduction of Helix coincided with a broader reorganization at Figure. The company restructured its engineering teams around the model, treating Helix as the central product rather than the robot hardware. Later in 2025, Figure announced the Helix Lab, a dedicated data collection and training facility built to feed a continuously growing corpus of robot demonstrations to the model. The lab was designed to integrate data collection, simulation, and on-robot learning into a single pipeline in preparation for broader commercial deployment of Figure 03.
Capital followed the model's progress. On September 16, 2025, Figure announced that it had exceeded $1 billion in committed Series C funding at a $39 billion post-money valuation, in a round led by Parkway Venture Capital with participation from Brookfield Asset Management, NVIDIA, Intel Capital, Macquarie Capital, LG Technology Ventures, Salesforce, T-Mobile Ventures, and Qualcomm Ventures. The company said the funds would expand robot production, build GPU infrastructure for Helix training and simulation, and finance large-scale data collection using human video and multimodal sensors.[21]
Helix uses a dual-system design loosely inspired by the cognitive-science distinction between fast and slow thinking, from which the System 1 and System 2 names are borrowed. The two subsystems are called System 2 (S2) and System 1 (S1). They operate at very different timescales and serve different functions, but share information through a latent representation vector in shared memory.[1]
System 2 is a 7-billion-parameter vision-language model (VLM) that processes monocular camera images and natural language task instructions. It runs at 7 to 9 Hz, updating its internal representation of the scene on every cycle.[1] The model was initialized from an open-source, open-weight VLM pretrained on internet-scale text and image data, then further trained on robot teleoperation data.[1]
S2's primary job is scene understanding: identifying objects, parsing instructions, and encoding the current behavioral intent into a compact latent vector. This latent vector is written to shared memory and read continuously by S1.[1] Because S2 is grounded in a large internet-pretrained model, it generalizes across a wide variety of objects, textures, and language phrasings without requiring specific training examples for every item the robot will encounter.
To run a 7B-parameter model on embedded hardware, Figure quantized the model to 4-bit precision and implemented model parallelism across the robot's dual onboard GPUs. The result is a 23x reduction in computational overhead compared to a naive cloud-based implementation, and the system stays under 60 watts while maintaining sub-100ms latency from image capture to latent update.
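The memory side of this compression can be made concrete with simple arithmetic: weight storage scales with bits per parameter, so moving from 16-bit to 4-bit weights cuts the weight footprint of a roughly 7-billion-parameter model by 4x. The sketch below computes that and shows a generic symmetric per-group int4 quantizer. The 16-bit baseline, the group size, and the rounding scheme are assumptions chosen for illustration, not Figure's method, and weight storage alone does not account for the 23x overhead figure.

```python
from typing import List, Tuple


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone (no activations or caches)."""
    return num_params * bits_per_param // 8


def quantize_int4(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric per-group int4 quantization (a generic scheme, assumed here).

    Each group of `group_size` weights shares one float scale, and every
    weight becomes an integer code in [-8, 7].
    """
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # map the largest weight to +/-7
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_int4(codes: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Recover approximate float weights from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


params = 7_000_000_000                     # "roughly 7B" S2 parameters
fp16_gb = weight_bytes(params, 16) / 1e9   # 14.0 GB at 16 bits per weight
int4_gb = weight_bytes(params, 4) / 1e9    # 3.5 GB at 4 bits per weight

w = [0.12, -0.5, 0.33, 0.07, 1.4, -0.2, 0.0, 0.9]
codes, scales = quantize_int4(w)
approx = dequantize_int4(codes, scales)
```

Production int4 schemes typically also pack two codes per byte and may keep sensitive layers at higher precision; the sketch keeps only the core idea that each weight's rounding error is bounded by half its group's scale.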
System 1 is an 80-million-parameter transformer-based visuomotor policy. It runs at 200 Hz, consuming both raw camera images and the latent vector produced by S2 to output continuous joint-level commands.[1] The S1 architecture uses a fully convolutional, multi-scale vision backbone pretrained in simulation, combined with a cross-attention encoder-decoder transformer that fuses visual and latent inputs into action predictions.[1]
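The fusion step can be illustrated with generic single-head scaled dot-product cross-attention, in which one token sequence queries another. In this pure-Python sketch the queries are assumed to be visual tokens and the keys and values are assumed to be tokens derived from the S2 latent. It uses tiny made-up dimensions and omits learned projections; S1's exact layer layout, including which stream supplies queries, is an assumption here.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Single-head scaled dot-product cross-attention.

    Returns one output vector per query, plus each query's attention weights.
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(wt * v[j] for wt, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


visual_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 queries, d = 2 (assumed)
latent_tokens = [[1.0, 1.0], [-1.0, 0.5]]             # 2 keys, d = 2 (assumed)
latent_values = [[0.2, 0.4, 0.6], [1.0, 0.0, -1.0]]   # 2 values, 3 dims each
fused, attn = cross_attention(visual_tokens, latent_tokens, latent_values)
```

Each output is a weighted mix of the latent values, so the same visual token can produce different features depending on what S2 currently encodes.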
S1 controls the robot's full upper-body action space: wrist poses in six degrees of freedom, finger flexion and abduction for each digit, torso orientation, and head gaze direction.[1] This 35-DoF action space is output as a continuous stream rather than discrete waypoints, which is what allows Helix to produce fluid, reactive motion rather than the step-by-step movements typical of earlier robot manipulation systems.
A critical design decision was the temporal offset between S1 and S2 inputs during training. Because S2 operates at roughly 7-9 Hz while S1 runs at 200 Hz, there is an inherent latency between when S2 updates its latent and when S1 acts on it. Figure deliberately replicated this latency offset during training so the model would not learn behaviors that depend on information S2 cannot provide in time during deployment.[1] This train-inference alignment is meant to prevent the compounding errors that can arise when a policy is trained on more timely information than it receives at deployment.
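One way to replicate deployment latency when building training examples is to pair each S1 target at time t with the newest S2 latent that would actually have been available at t. The sketch below assumes, hypothetically, that S2 starts inference every `period_ms` on the frame captured at that moment and publishes its result `latency_ms` later. The 125 ms period (8 Hz) and 60 ms latency are placeholder values, and this is an assumed procedure rather than Figure's published one.

```python
from typing import List, Optional, Tuple


def available_latent_index(t_ms: float, period_ms: float, latency_ms: float) -> Optional[int]:
    """Index k of the newest S2 latent that S1 could have read at time t_ms.

    Latent k is computed from the frame at k * period_ms and becomes
    readable at k * period_ms + latency_ms (an assumed timing model).
    """
    if t_ms < latency_ms:
        return None  # nothing has been published yet
    return int((t_ms - latency_ms) // period_ms)


def build_training_pairs(s1_times_ms: List[float], period_ms: float = 125.0,
                         latency_ms: float = 60.0) -> List[Tuple[float, int]]:
    """Pair each S1 action timestamp with the S2 latent index it would see."""
    pairs = []
    for t in s1_times_ms:
        k = available_latent_index(t, period_ms, latency_ms)
        if k is not None:
            pairs.append((t, k))
    return pairs


s1_times = [i * 5.0 for i in range(60)]  # 200 Hz S1 timestamps: 0, 5, ..., 295 ms
pairs = build_training_pairs(s1_times)
```

Under these assumed numbers, naive alignment (`t // period`) would pair the sample at 130 ms with latent 1, which would not be readable until 185 ms at deployment; the offset version pairs it with latent 0 instead.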
S2 runs as an asynchronous background process, continuously writing updated latents to shared memory.[1] S1 reads the most recent available latent on every 5-millisecond control cycle. The result is a system that can think at two timescales simultaneously: S2 reasons about what the robot is supposed to be doing while S1 handles the moment-to-moment physics of manipulation.
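This handoff behaves like a single-slot "latest value" buffer: the writer overwrites the slot, and each reader takes the newest complete latent without waiting for the next update. Below is a minimal thread-safe sketch in Python. The class name, the version counter, and the use of a lock are illustrative assumptions, since Figure's onboard implementation is not public.

```python
import threading
from typing import List, Optional, Sequence, Tuple


class LatestLatent:
    """Single-slot buffer: S2 overwrites it, S1 reads whatever is newest."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._latent: Optional[Tuple[float, ...]] = None
        self._version = 0  # increments on every write

    def write(self, latent: Sequence[float]) -> None:
        with self._lock:
            self._latent = tuple(latent)  # copy so later caller mutation can't leak in
            self._version += 1

    def read(self) -> Tuple[int, Optional[Tuple[float, ...]]]:
        with self._lock:
            return self._version, self._latent


def s2_writer(buf: LatestLatent, n_updates: int) -> None:
    # Stand-in for the slow VLM loop: publishes n_updates latents.
    for k in range(n_updates):
        buf.write([float(k), float(k) * 0.5])


buf = LatestLatent()
writer = threading.Thread(target=s2_writer, args=(buf, 8))
writer.start()
seen_versions: List[int] = []
for _ in range(200):  # stand-in for S1 control cycles, each reading the newest latent
    version, latent = buf.read()
    seen_versions.append(version)
writer.join()
final_version, final_latent = buf.read()
```

A real controller would run the S1 loop on a 5 ms timer and keep the latent in memory shared across processes or devices; a lock is simply the easiest correct choice for a Python sketch.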
Both subsystems run on dual NVIDIA GPU modules embedded in the robot chassis, with no reliance on cloud compute.[1][27] Inference is split across the two GPUs, with S2 running on one and S1 on the other, and the shared-memory latent vector bridges them. This onboard execution is what allows Helix to operate in environments with no network connectivity and to respond with the sub-100ms latency required for safe physical interaction.
Running the 7B-parameter S2 on embedded GPUs required the compression described above: 4-bit quantization to shrink the memory footprint, combined with splitting computation across both onboard GPUs, which together yield a 23x reduction in computational overhead relative to cloud-hosted inference and keep the system under 60 watts total. That power ceiling is a meaningful constraint for a mobile humanoid robot: the entire inference budget must fit within the robot's power envelope alongside motor controllers, sensors, and safety systems.
Helix was trained on roughly 500 hours of teleoperated demonstrations collected by a team of human operators controlling Figure 02 robots through a teleoperation interface.[1] The dataset was gathered using multiple robots and multiple operators to ensure behavioral diversity,[1] and care was taken to filter out slow or unsuccessful demonstrations while keeping examples that showed corrective behavior when failures arose from environmental randomness rather than operator error.
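This curation rule can be written as a predicate over episode metadata. In the sketch below, the `Episode` fields, the 30-second duration threshold, and the idea that recoveries are pre-tagged with a cause are all hypothetical; Figure has not published its curation criteria in this form.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    episode_id: str
    duration_s: float
    succeeded: bool
    # Hypothetical tag for a failure-and-recovery inside the episode:
    # "environment" (e.g. an object slipped), "operator", or None.
    recovered_failure_cause: Optional[str] = None


def keep_episode(ep: Episode, max_duration_s: float) -> bool:
    """Drop slow or unsuccessful demos; keep recoveries from environmental randomness."""
    if not ep.succeeded:
        return False
    if ep.duration_s > max_duration_s:
        return False
    if ep.recovered_failure_cause == "operator":
        return False  # corrections of operator mistakes are filtered out
    return True  # clean runs and environment-caused recoveries both pass


def curate(episodes: List[Episode], max_duration_s: float = 30.0) -> List[str]:
    return [ep.episode_id for ep in episodes if keep_episode(ep, max_duration_s)]


demos = [
    Episode("a", 12.0, True),
    Episode("b", 45.0, True),                 # too slow
    Episode("c", 14.0, False),                # unsuccessful
    Episode("d", 18.0, True, "environment"),  # useful corrective behavior
    Episode("e", 16.0, True, "operator"),     # recovery from operator error
]
kept = curate(demos)
```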
Figure's team described this training set as "a small fraction of the size of previously collected VLA datasets (<5%)," a point they attributed to data quality rather than data volume.[1] Working closely with teleoperators to standardize and refine manipulation strategies produced measurable improvements in policy quality. The company also applied hindsight instruction labeling: a separate auto-labeling VLM processed recorded demonstrations and generated natural language descriptions of each behavior after the fact, producing a paired dataset of (video, instruction) without requiring operators to dictate commands during collection.[1]
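Hindsight labeling inverts the usual order: demonstrations are recorded first, and instructions are generated afterward. A minimal sketch of the pipeline's shape follows. `describe_clip` is a hypothetical stand-in for the auto-labeling VLM (here a trivial rule over made-up metadata), and the clip format is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Clip:
    clip_id: str
    frames: List[str]  # placeholder for image data
    # Made-up metadata a toy labeler can read; a real VLM would look at frames.
    object_name: str
    destination: str


def describe_clip(clip: Clip) -> str:
    """Hypothetical stand-in for an auto-labeling VLM call."""
    return f"Put the {clip.object_name} in the {clip.destination}."


def hindsight_label(clips: List[Clip], labeler: Callable[[Clip], str]) -> List[Tuple[str, str]]:
    """Produce (clip_id, instruction) pairs after the demos were recorded."""
    return [(clip.clip_id, labeler(clip)) for clip in clips]


clips = [
    Clip("ep1_seg0", ["f0", "f1"], "cereal box", "cabinet"),
    Clip("ep1_seg1", ["f2", "f3"], "apple", "refrigerator"),
]
pairs = hindsight_label(clips, describe_clip)
```

Because the labeler is passed in as a function, the same recorded clips can be relabeled later with a different model without re-collecting any demonstrations.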
Training used a standard regression loss from raw pixels and text tokens to continuous action vectors.[1] There were no task-specific fine-tuning stages or separate action heads for different behaviors. The single set of neural network weights handles picking, placing, drawer operation, refrigerator use, multi-robot handovers, and other manipulation behaviors without any specialization per task.[1]
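A standard regression loss for continuous actions is most commonly the mean squared error between predicted and demonstrated action vectors. The sketch below assumes MSE averaged over every element of an action chunk; the exact loss form is not specified in this article's sources, so the choice of MSE over alternatives such as L1 is an assumption.

```python
from typing import List

Action = List[float]  # one continuous action vector (e.g. joint targets)


def mse_loss(predicted: List[Action], target: List[Action]) -> float:
    """Mean squared error over every element of an action chunk."""
    if len(predicted) != len(target):
        raise ValueError("chunks must have the same number of steps")
    total, count = 0.0, 0
    for p_step, t_step in zip(predicted, target):
        if len(p_step) != len(t_step):
            raise ValueError("action vectors must have the same dimension")
        for p, t in zip(p_step, t_step):
            total += (p - t) ** 2
            count += 1
    return total / count


pred = [[0.1, 0.2], [0.3, 0.4]]
demo = [[0.0, 0.2], [0.3, 0.6]]
loss = mse_loss(pred, demo)  # (0.01 + 0 + 0 + 0.04) / 4 ≈ 0.0125
```

Since one loss applies to every behavior, task identity enters only through the image and text inputs, which is consistent with having no per-task action heads.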
Figure states that objects used during training were excluded from its evaluations, so its published novel-object results measure performance on items outside the training data.[1]
On September 18, 2025, Figure announced Project Go-Big, an effort to build what it called the world's largest and most diverse pretraining dataset for humanoid robots, centered on egocentric (first-person) human video rather than robot teleoperation.[20] The initiative is supported by a partnership with Brookfield, whose real estate portfolio spans more than 100,000 residential units, 500 million square feet of commercial office space, and 160 million square feet of logistics facilities, giving Figure access to diverse human environments for passive video collection.[20] As a first result, Figure reported that Helix learned to navigate cluttered real-world spaces from natural language commands, mapping camera images and language directly to low-level SE(2) velocity commands, after training on human video alone. The company described this as zero-shot human-to-robot transfer and said it was the first time a humanoid robot had learned end-to-end navigation using only human video, with the same unified Helix network handling both dexterous manipulation and navigation.[20]
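An SE(2) velocity command specifies planar motion as forward speed, lateral speed, and yaw rate in the robot's own frame. To show what such a command does, the sketch below integrates one into a world-frame pose (x, y, heading) with a simple forward-Euler step. The integrator, step size, and example values are illustrative assumptions. Helix outputs the commands; how the locomotion layer tracks them is not described here.

```python
import math
from typing import Tuple

Pose = Tuple[float, float, float]  # (x [m], y [m], theta [rad]) in the world frame


def integrate_se2(pose: Pose, vx: float, vy: float, omega: float, dt: float) -> Pose:
    """Forward-Euler update for a body-frame SE(2) velocity command.

    vx: forward speed (m/s), vy: lateral speed (m/s), omega: yaw rate (rad/s).
    """
    x, y, theta = pose
    # Rotate the body-frame velocity into the world frame.
    dx = (vx * math.cos(theta) - vy * math.sin(theta)) * dt
    dy = (vx * math.sin(theta) + vy * math.cos(theta)) * dt
    # Wrap the heading into [-pi, pi).
    new_theta = (theta + omega * dt + math.pi) % (2 * math.pi) - math.pi
    return x + dx, y + dy, new_theta


pose: Pose = (0.0, 0.0, 0.0)
for _ in range(100):  # 1 s of commands at a hypothetical 100 Hz
    pose = integrate_se2(pose, vx=0.5, vy=0.0, omega=0.0, dt=0.01)
```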
The most widely discussed capability of Helix at launch was its ability to handle objects the model had never seen during training. Figure reported that Helix-controlled Figure 02 robots could pick up thousands of household items absent from the training data, responding to natural language commands such as "Pick up the pasta box" or "Place the cereal in the cabinet." Because S2 is grounded in a large internet-pretrained VLM, it can identify and reason about arbitrary objects based on visual appearance and category knowledge, even without prior robot interaction examples.[1]
Because Figure excluded training objects from the evaluation demonstrations, none of the items shown in the video appeared in any training demonstration.[1] Figure attributes this out-of-distribution generalization to the S2 architecture and its internet pretraining.
Figure described Helix as the first VLA demonstrated running simultaneously on two robots sharing identical model weights.[1] In the February 2025 demonstration, two Figure 02 robots coordinated to complete a grocery storage task, with one robot handing items to the other and both adapting their behavior based on the shared task context.[10] No separate coordination protocol or inter-robot communication channel was required: both robots received the same natural language prompt and used S2's scene understanding to infer appropriate roles from the visual context.[1]
Prior to Helix, VLA models in the humanoid space typically controlled only end-effectors, delegating wrist orientation, torso posture, and head movement to separate lower-level controllers. Helix integrates all 35 upper-body degrees of freedom into a single action output, which means the neural policy can learn to use the entire upper body as part of its manipulation strategy.[1] This produces more natural motion and enables tasks that require coordinating hand position with torso lean or head gaze.
Running S2 and S1 entirely on embedded GPU hardware, without cloud connectivity, was a stated commercial priority at launch. Figure framed this as the model being "immediately ready for commercial deployment" without infrastructure dependencies.[1] The combination of 4-bit quantization, model parallelism, and the 80M-parameter S1 keeps total power consumption low enough for a mobile robot that must manage its own energy budget.
The onboard-only constraint also has implications for safety and latency. A system that relies on cloud inference introduces network latency into the control loop and has a single point of failure: if the network connection drops, the robot stops. Edge-native deployment eliminates both concerns. Helix's 200 Hz control loop and sub-100ms end-to-end latency depend on all inference happening on the same hardware that drives the motors.
On August 12, 2025, Figure reported that Helix had learned to fold laundry, which the company described as the first instance of a humanoid robot with multi-fingered hands folding laundry fully autonomously using an end-to-end neural network.[19] Deformable textiles are among the hardest targets for manipulation policies because they change shape continuously and tend to wrinkle, crumple, or tangle. Figure stated that the same Helix architecture used for logistics was applied with no modifications to the model or training hyperparameters; only the training dataset changed. Demonstrated behaviors included picking towels from a mixed pile, adapting the folding strategy to each towel's starting configuration, recovering from multi-pick errors by returning extra items, and fine motions such as tracing an edge with a thumb and pinching corners before completing folds.[19]
Six days after the initial announcement, Figure published a follow-on technical post describing a specialized deployment of Helix for warehouse logistics.[2] The logistics variant added two modifications to the core System 1 visuomotor policy.
First, Figure added implicit stereo vision with multi-scale feature extraction. The stereo upgrade gave S1 a richer 3D understanding of package positions, particularly useful for estimating depth when placing items on moving conveyor belts. Benchmarks showed a 60% throughput increase over the non-stereo baseline, and the stereo model generalized to flat envelopes that had never appeared in the training corpus.[2]
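Figure describes the stereo as implicit, meaning the network learns to use the two views instead of computing depth explicitly. A second view carries depth information at all because of the classic rectified-stereo relation Z = f·B/d: focal length in pixels times camera baseline, divided by disparity. The sketch below evaluates that relation with made-up camera numbers, only to show the geometry the network can exploit. It is not part of Helix.

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth Z (m) of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 600 px focal length, 6 cm baseline.
near = depth_from_disparity(600.0, 0.06, 72.0)  # ≈ 0.5 m
far = depth_from_disparity(600.0, 0.06, 18.0)   # ≈ 2.0 m
```

Because the depth change per pixel of disparity grows as Z²/(f·B), stereo depth is most precise at the short ranges where manipulation happens.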
Second, Figure implemented learned visual proprioception, which allows S1 to calibrate itself to individual robot bodies without manual recalibration. This cross-robot transfer capability was important for scaling the policy across a fleet where each unit has slightly different mechanical tolerances.[2]
The logistics training set was notably small: just 8 hours of curated demonstration data produced a functional policy.[2] Curated, high-quality data outperformed a three-times-larger uncurated dataset by 40% on throughput, reinforcing Figure's emphasis on data quality.[2] A "Sport Mode" feature enabled test-time speedup by linearly resampling action chunks, allowing the robot to execute motions 20 to 50% faster than the human demonstrators while maintaining high success rates.[2]
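Linearly resampling an action chunk treats the chunk as a trajectory sampled at the control rate and re-samples it at fewer points, so the same path plays back in less time. A minimal sketch follows. It assumes a chunk is a list of equally spaced action vectors and treats the speed-up factor as a free parameter. Figure reported 20 to 50% faster execution, but its resampling code is not public.

```python
from typing import List

Action = List[float]


def resample_chunk(chunk: List[Action], speedup: float) -> List[Action]:
    """Linearly resample a chunk so it plays back `speedup` times faster.

    Keeps the first and last actions; the output has about
    (len(chunk) - 1) / speedup intervals at the same control rate.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    n_in = len(chunk)
    if n_in < 2:
        return [list(a) for a in chunk]
    n_out = max(2, round((n_in - 1) / speedup) + 1)
    out: List[Action] = []
    for i in range(n_out):
        # Position of output step i along the original chunk, in input-step units.
        s = i * (n_in - 1) / (n_out - 1)
        lo = min(int(s), n_in - 2)
        frac = s - lo
        a, b = chunk[lo], chunk[lo + 1]
        out.append([(1 - frac) * x + frac * y for x, y in zip(a, b)])
    return out


# A 1-D chunk of 11 steps moving from 0.0 to 1.0, played back 1.25x faster.
chunk = [[i / 10] for i in range(11)]
fast = resample_chunk(chunk, 1.25)
```

The 11-step chunk becomes 9 steps (10 intervals become 8), so at a fixed control rate the motion finishes 1.25x sooner. Uniform time scaling also raises commanded velocities by the same factor and accelerations by its square.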
A third logistics post on June 7, 2025 reported further gains from scaling the same recipe. Average handling time fell to 4.05 seconds per package, roughly 20% faster than the previous system, and the policy handled deformable poly bags, padded mailers, and flat envelopes about as reliably as rigid boxes.[18] The share of packages placed with the shipping label correctly oriented for barcode scanning rose to approximately 95%, up from roughly 70%. The update added three architectural elements to System 1: a short-term visual memory that retains information across recent frames, a proprioceptive state history that feeds recent hand, torso, and head positions into the policy, and force feedback that gives the controller a basic sense of touch for grasp precision.[18] A data-scaling study trained models on 10, 20, 40, and 60 hours of demonstration trajectories; the largest dataset produced a 58% throughput increase and lifted barcode-scan success to 94.4%, supporting Figure's claim that performance scales predictably with high-quality data.[18]
Figure released Helix 02 with the Figure 03 platform, introducing a third subsystem called System 0 (S0) that extended Helix from upper-body manipulation to full-body loco-manipulation.[3]
System 0 is a 10-million-parameter neural network that runs at 1,000 Hz, one level below S1 in the control hierarchy.[3] It replaces the hand-engineered C++ balance and locomotion controllers that Figure's earlier robots relied on. S0 takes full-body joint state and base motion commands as input and outputs joint-level actuator commands at 1 kHz, handling contact, balance, and coordination across legs, torso, and arms simultaneously.[3]
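The tier rates nest neatly. With S0 at 1 kHz, an S1 running at the 200 Hz of the original Helix design fires once every 5 S0 ticks, and an S2 update in the original 7 to 9 Hz range lands roughly every 111 to 143 ticks. The sketch below counts how often each tier fires in one simulated second on a shared 1 ms clock. It assumes Helix 02 keeps the original S1 rate and uses an 8 Hz S2 period, both assumptions. Real S2 updates are asynchronous rather than locked to the clock.

```python
from collections import Counter

S0_PERIOD_MS = 1    # 1 kHz whole-body controller (System 0)
S1_PERIOD_MS = 5    # 200 Hz visuomotor policy (System 1), assumed unchanged in Helix 02
S2_PERIOD_MS = 125  # 8 Hz VLM update (System 2), an assumed value in the 7-9 Hz range


def run_schedule(duration_ms: int) -> Counter:
    """Count tier activations on a shared millisecond clock."""
    calls: Counter = Counter()
    for t in range(duration_ms):
        if t % S2_PERIOD_MS == 0:
            calls["S2"] += 1  # refresh the latent that S1 conditions on
        if t % S1_PERIOD_MS == 0:
            calls["S1"] += 1  # produce commands that S0 consumes (interface assumed)
        if t % S0_PERIOD_MS == 0:
            calls["S0"] += 1  # output joint-level actuator commands
    return calls


counts = run_schedule(1000)  # S0: 1000, S1: 200, S2: 8
```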
S0 was trained entirely in simulation across more than 200,000 parallel environments, using reinforcement learning against a corpus of over 1,000 hours of retargeted human motion capture data.[3] The simulation-to-real transfer succeeded well enough that Figure could delete the final 109,504 lines of hand-engineered C++ from the robot's control stack, a milestone CEO Brett Adcock described as reaching "Software 2.0" status.[3][9]
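Training a controller with reinforcement learning against motion-capture data usually means rewarding the simulated robot for tracking a retargeted reference pose. A common form in motion-imitation RL is an exponentiated negative squared tracking error, r = exp(-k·||q − q_ref||²). The sketch below implements that generic reward. The sources here do not say which reward terms S0 uses, so both the form and the scale k are assumptions.

```python
import math
from typing import Sequence


def tracking_reward(q: Sequence[float], q_ref: Sequence[float], k: float = 2.0) -> float:
    """exp(-k * ||q - q_ref||^2): 1.0 for perfect tracking, decaying toward 0.

    q: simulated joint positions (rad); q_ref: retargeted mocap pose (rad).
    k is a hypothetical sharpness hyperparameter.
    """
    if len(q) != len(q_ref):
        raise ValueError("pose vectors must have the same length")
    sq_err = sum((a - b) ** 2 for a, b in zip(q, q_ref))
    return math.exp(-k * sq_err)


perfect = tracking_reward([0.1, -0.2, 0.3], [0.1, -0.2, 0.3])  # 1.0
off = tracking_reward([0.1, -0.2, 0.3], [0.2, -0.2, 0.3])      # exp(-0.02) ≈ 0.980
```

In practice, a term like this is typically combined with balance, contact, and regularization terms and averaged across many parallel simulated environments.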
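The result is a three-tier hierarchy in Helix 02: System 2 interprets the scene and language instructions, System 1 turns that intent into visuomotor control, and System 0 converts those commands into joint-level actuator outputs at 1 kHz while handling balance and contact across the whole body.[3]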
The flagship demonstration for Helix 02 showed a Figure 03 robot completing a dishwasher loading and unloading task: a four-minute, end-to-end autonomous operation with no resets and no human intervention.[3] The task required 61 sequential loco-manipulation actions, including walking across the kitchen, opening the dishwasher door, grasping dishes from various positions, placing them in the rack, and closing the door.[3] The robot also demonstrated improvised whole-body moves, closing a drawer with its hip and lifting the dishwasher door with its foot when its hands were occupied.[3]
Other demonstrated tasks included unscrewing bottle caps, extracting pills from a pill organizer, dispensing liquid from a syringe, and sorting small metal components.[3] These tasks were designed to show that tactile feedback and palm cameras enabled manipulation beyond the limits of vision-only policies.[3]
Figure 03 was designed alongside Helix 02 to provide the sensory data the upgraded policy required. The robot's vision system delivers twice the frame rate, one-quarter the latency, and 60% wider field of view per camera compared to Figure 02.[4] Palm-mounted cameras in each hand provide close-range visual feedback during grasping.[4] Custom fingertip tactile sensors detect forces as small as 3 grams, roughly the weight of a paperclip.[4] These sensing upgrades feed directly into S1 and S0, enabling manipulation behaviors that were out of reach for the original Helix running on Figure 02 hardware.[3]
In May 2026, Figure used Helix 02 to demonstrate sustained autonomous work. The company showed a team of Figure 03 robots completing a full 8-hour package-sorting shift, detecting barcodes, reorienting parcels, and replacing them on a conveyor, with Brett Adcock stating that the fleet ran "at human performance levels" and that the work was fully autonomous under Helix 02.[25] After recording zero failures, Figure extended the run past 24 hours of continuous operation in a livestreamed demonstration in which three robots sorted more than 28,000 packages at roughly 3 seconds per package, a pace the company characterized as approximate human parity, with no teleoperation involved. The system used autonomous recovery behaviors to reset itself when it encountered unfamiliar situations.[26]
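The livestream figures can be related with simple arithmetic. Taking exactly 24 hours and 28,000 packages (both lower bounds, so the result is approximate), the fleet-wide average interval comes to about 3.1 seconds per package, close to the reported pace. If that pace were instead read as per robot, three robots working continuously would have sorted about 86,400 packages. The sketch computes both readings and does not decide between them.

```python
SECONDS_PER_HOUR = 3600


def fleet_seconds_per_package(hours: float, packages: int) -> float:
    """Average interval between completed packages across the whole fleet."""
    return hours * SECONDS_PER_HOUR / packages


def packages_if_per_robot_pace(hours: float, robots: int, seconds_per_package: float) -> float:
    """Total packages if every robot sustained the given pace continuously."""
    return robots * hours * SECONDS_PER_HOUR / seconds_per_package


fleet_pace = fleet_seconds_per_package(24, 28_000)         # ≈ 3.09 s per package
per_robot_total = packages_if_per_robot_pace(24, 3, 3.0)   # 86,400 packages
```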
Figure 02 is a bipedal humanoid robot standing approximately 5 feet 6 inches tall and weighing around 70 kilograms.[27] It was the primary platform for Helix at launch and for the BMW deployment. Its human-scale form, with multi-fingered hands on a legged body, was designed to fit within industrial environments built for human workers.
Figure 03 was announced on October 9, 2025,[4] as a consumer and home-oriented platform designed around Helix; it later became the hardware for Helix 02. Standing 5 feet 8 inches tall and weighing 61 kilograms (9% lighter than Figure 02 despite adding new sensors), Figure 03 includes wireless inductive charging through coils in its feet, allowing the robot to autonomously dock and recharge. Its actuators run at twice the speed of Figure 02's with improved torque density, and the chassis incorporates multi-density foam padding and a soft textile exterior for safety in proximity to people.[4] Figure 03's 10 Gbps mmWave data offload capability allows each unit in the field to continuously upload operational data for fleet-level model updates.[4]
Figure priced Figure 03 at approximately $20,000 for consumers and targeted initial home deployments in late 2026. As of early 2026, the company was producing units at its BotQ manufacturing facility in California at a rate approaching one robot every 90 minutes, with a stated annual capacity of approximately 50,000 units.[23]
TIME named Figure 03 one of its Best Inventions of 2025.[16] Production continued to accelerate: in an April 29, 2026 update, Figure said BotQ was completing one robot per hour, a 24x throughput improvement in under 120 days, with more than 350 Figure 03 units delivered, end-of-line first-pass yield above 80%, and custom manufacturing software coordinating over 150 networked workstations.[24] Figure frames the growing fleet as a data flywheel for Helix: each deployed robot generates training data and receives fleet-wide capability upgrades over the air.[24]
The most extensively documented real-world deployment of Helix-equipped robots was the Figure 02 pilot at BMW Group Plant Spartanburg in South Carolina. Figure and BMW announced the collaboration in early 2024.[6] The project ran for 11 months, with robots on active production lines from month 10 onward, running 10-hour shifts Monday through Friday.[5]
During the deployment, Figure 02 robots performed sheet-metal loading operations: picking components from racks and placing them into welding fixtures within a 5-millimeter tolerance in a 2-second window.[5] Over the course of the program:
The robots targeted three KPIs per shift: 84-second cycle time (37 seconds for the loading phase), greater than 99% placement accuracy, and zero human interventions required.[5]
The program surfaced a critical hardware reliability issue: the forearm was the top failure point across the fleet, attributed to tight mechanical packaging, three degrees of freedom in a small volume, and thermal constraints. This finding directly shaped the redesign of Figure 03's wrist electronics, which eliminated the distribution board and dynamic cabling and moved each wrist motor controller to direct communication with the main computer, reducing complexity and improving thermal management.[5]
BMW subsequently expanded its humanoid program with a different robot maker, Hexagon Robotics. An initial test deployment of Hexagon's AEON robot at BMW Group Plant Leipzig in Germany began in December 2025, with a further test phase planned for spring 2026 ahead of a full pilot launch in summer 2026, marking the first humanoid use in BMW's German production network.[7]
Helix belongs to a generation of VLA models that emerged between 2023 and 2025 for robot control. The models differ substantially in architecture, parameter count, control frequency, and target robot platforms. Each reflects different tradeoffs between generality, speed, deployment constraints, and the type of robot it was designed for.
| Model | Developer | Release | Parameters | Control rate | Architecture type | Target platform |
|---|---|---|---|---|---|---|
| RT-2 | Google DeepMind | July 2023 | 55B (PaLI-X variant); 12B (PaLM-E variant) | ~1-3 Hz (55B variant) | Single end-to-end VLA | Robot arm |
| pi0 | Physical Intelligence | Oct 2024 | 3.3B | Up to 50 Hz (action chunks) | VLM + action expert (flow matching) | Multiple robot platforms, mainly arms |
| Helix (S1+S2) | Figure AI | Feb 2025 | 7B (S2) + 80M (S1) | 200 Hz | Dual-system (VLM + visuomotor) | Humanoid upper body |
| Isaac GR00T N1 | NVIDIA | Mar 2025 | 2B | 120 Hz | Dual-system (VLM + diffusion) | Humanoid |
| GR00T N1.7 | NVIDIA | 2025 | 3B | 120 Hz | Dual-system + reasoning | Humanoid |
RT-2 was the seminal demonstration that a VLM pretrained on internet data could be co-fine-tuned on robot demonstrations to produce a general-purpose robot policy.[15] Its primary contribution was establishing the transfer of web knowledge to robot control. However, RT-2 outputs discretized end-effector actions for a single robot arm at roughly 1-3 Hz and was not designed for humanoid whole-body control.[15] Its largest, 55-billion-parameter variant (built on PaLI-X) also ran on cloud compute and was not deployable onboard a mobile robot.
pi0 from Physical Intelligence uses a 3.3-billion-parameter architecture that combines a PaliGemma VLM backbone with a flow-matching action expert.[14] It generates chunks of continuous actions for control at up to 50 Hz and is primarily evaluated on arm manipulation tasks across multiple physical robot platforms. Physical Intelligence subsequently released pi0.5, aimed at generalizing to unfamiliar environments such as new homes, and pi0.6. A key difference from Helix is that pi0 targets a broader range of robot form factors and is made available to external researchers through an open-source release, whereas Helix is proprietary to Figure's robots.
Isaac GR00T N1 from NVIDIA uses a dual-system design that parallels Helix's S1/S2 architecture: a VLM module for scene understanding coupled with a diffusion transformer for action generation, running at 120 Hz. GR00T N1 was trained on a mixture of real teleoperation data, human video, and NVIDIA-generated synthetic data from Cosmos and Isaac Lab.[13] It was released as an open foundation model that third-party humanoid robot manufacturers can adapt, in contrast to Helix, which is tightly integrated with Figure's hardware and not publicly licensed.[13]
The core architectural distinction between Helix and most other VLAs is its 200 Hz continuous control rate and its 35-DoF upper-body action space.[1] Most VLA models target arm manipulation with 6-7 DoF at lower control frequencies. Helix's high-rate output is what allows it to make the reactive, real-time adjustments required for humanoid upper-body manipulation. The tradeoff is that the design is specific to Figure's robot hardware and not straightforwardly portable to other platforms.
Another notable difference is the training data profile. GR00T N1 incorporates large quantities of synthetic data alongside real robot data and human video,[13] while pi0 draws on a large multi-robot corpus of real demonstrations.[14] Helix v1's policy was trained on real teleoperated demonstrations, relying on data quality rather than simulation-generated volume, although S1's vision backbone was initialized from pretraining in simulation.[1] Helix 02 introduced simulation-trained System 0, bringing Figure's approach closer to the sim-to-real paradigm used by competitors, but S1 and S2 continued to rely on human demonstration data.[3]
Several limitations in Helix's capabilities and deployment context were identified through Figure's own technical disclosures and the BMW program findings.
The original Helix controlled only the upper body. The lower body ran under separate hand-coded controllers, meaning the robot could not simultaneously walk and manipulate with the same neural policy. This was the primary gap that Helix 02 addressed.[3]
System 1, in the original design, relied mainly on proprioception for base stabilization, giving the robot limited environmental awareness outside its upper-body workspace. Navigation in complex or unstructured environments, including stairs and uneven terrain, required separate systems or manual switching prior to the System 0 integration in Helix 02.
The BMW deployment revealed that hardware reliability, not the AI policy itself, was the binding constraint in sustained industrial use. The forearm failure rate required intervention and ultimately drove a full redesign of the wrist subsystem for Figure 03.[5] Industrial deployment at scale exposed failure modes that controlled laboratory testing did not.
Figure's training approach still depends on curated, high-quality human demonstration data. Figure framed the cost of teaching robots new behaviors, through expert programming or large numbers of demonstrations, as the problem Helix was meant to reduce, but each new behavior still requires collecting demonstrations. The 8-hour dataset for the logistics variant was held up as an achievement of data efficiency, yet it still represents a meaningful collection effort per task category.[2] Generalizing to entirely new task domains beyond the existing training distribution remains a research problem.
On November 21, 2025, Figure's former principal robotic safety engineer, Robert Gruendel, filed a lawsuit in federal court in the Northern District of California alleging that the company had terminated his employment after he raised concerns that the robots posed injury risks, specifically that they could generate forces sufficient to fracture a human skull. The suit also alleged that a malfunctioning robot had gashed a steel refrigerator door. Figure disputed the characterization, said Gruendel was terminated for poor performance, and called the allegations falsehoods it intended to discredit in court.[22] The case highlighted broader industry questions about safety evaluation standards for humanoid robots deployed in proximity to workers.
Figure 03 targeted home deployment in late 2026, but as of early 2026, no commercially available unit had been placed in a private residence. Adcock publicly noted that he would not release the robot for unsupervised home use until safety thresholds he had not yet fully specified were met.