Robot teleoperation is the remote control of a robot by a human operator, typically through a communication link that transmits commands from the operator to the robot and sensory feedback from the robot back to the operator. Teleoperation bridges human intelligence and decision-making with robotic precision and physical capability, enabling people to perform tasks in environments that are dangerous, distant, or otherwise inaccessible. In recent years, teleoperation has taken on a critically important secondary role: serving as a data collection mechanism for imitation learning and other robot learning approaches, where human demonstrations gathered through teleoperation are used to train autonomous robot policies.
The concept of remotely controlling a machine dates back to the late 19th century. In 1898, Nikola Tesla publicly demonstrated his "teleautomaton" at the Electrical Exposition in Madison Square Garden in New York City. The device was a radio-controlled boat, roughly three feet in length, equipped with blinking antennae and propelled by a small motor and rudder. Tesla transmitted wireless signals to the boat using a small box containing a lever and a telegraph key. A geared mechanism near the stern shifted a disk containing sets of electrical contacts; when advanced, the disk activated connections that powered electromagnetic motors controlling a rudder for steering and a screw propeller for propulsion. Tesla's U.S. Patent No. 613,809 described the first device for wireless remote control, and the demonstration laid foundational concepts for future developments in robotics, autonomous systems, and wireless communication.
The modern era of teleoperation began in the 1940s and 1950s with the development of mechanical manipulators for handling radioactive materials in nuclear research. Raymond C. Goertz, working at Argonne National Laboratory for the U.S. Atomic Energy Commission, designed the first bilateral master-slave manipulator in 1948. This was a seven-degree-of-freedom bilateral (symmetrical) metal tape transmission pantograph device operated through a leaded glass wall. In 1949, Goertz filed a patent for the device (U.S. Patent No. 2,632,574), and by 1951 he had improved the design with the first teleoperated articulated arm using steel pulleys and cables. Drawing on principles from cybernetics, Goertz also constructed the first electrical master-slave manipulator system, and by 1954 a modified version (CRL Model 8) entered commercial production. Goertz's work established the foundational principles of bilateral force-reflecting positional servomechanisms that underpin modern telerobotics.
Teleoperation played a central role in space exploration throughout the late 20th and early 21st centuries. The Canadarm (officially the Shuttle Remote Manipulator System, or SRMS), developed by Canada, operated on NASA's Space Shuttle fleet for 30 years from 1981 to 2011, deploying, maneuvering, and capturing payloads in orbit. Its successor, Canadarm2, was installed on the International Space Station (ISS) in 2001. Canadarm2 is 17.6 meters long when fully extended, has seven motorized joints, weighs 1,800 kg, and can handle payloads of up to 116,000 kg. It can be controlled by astronauts aboard the ISS or by ground teams at the Canadian Space Agency headquarters or NASA, and has been used to capture unpiloted spacecraft such as the SpaceX Dragon, Cygnus, and Japanese H-II Transfer Vehicle.
The application of teleoperation to surgery represents one of the field's most impactful commercial successes. The da Vinci Surgical System, developed by Intuitive Surgical, became the most widely used multipurpose robotic surgery system in the world. In the da Vinci system, the surgeon sits at an ergonomically designed control console, using a magnified three-dimensional view of the operating field, while robot arms on a movable cart next to the patient replicate the surgeon's hand movements with enhanced precision and dexterity. In 2025, Intuitive Surgical demonstrated transatlantic telesurgery capabilities, connecting surgeons across more than 4,000 miles. Separately, Dr. Sudhir Srivastava completed a cardiac repair on a patient in India while operating from France using the SSi Mantra 3 system, marking a landmark moment for long-distance teleoperated surgery.
Robot teleoperation systems can be classified according to the degree of autonomy shared between the human operator and the robot. The three principal categories are direct teleoperation, shared autonomy (or shared control), and supervisory control.
| Type | Description | Human Role | Robot Autonomy | Example Applications |
|---|---|---|---|---|
| Direct teleoperation | The operator controls robot motion directly, with no automated assistance | Full manual control of all degrees of freedom | None or minimal | Bomb disposal robots, basic ROV control, early master-slave manipulators |
| Shared autonomy | Operator and robot share control; the system provides automated assistance such as collision avoidance or trajectory smoothing | High-level guidance and corrections | Partial; assists with low-level control | Surgical robots, assistive manipulation, semi-autonomous driving |
| Supervisory control | The operator specifies high-level goals or sub-tasks, and the robot executes them autonomously | Task specification and monitoring | High; executes sub-tasks independently | Space robotics, autonomous vehicles with human oversight, industrial automation |
In direct teleoperation, the operator controls every aspect of the robot's motion in real time without any automated help. The operator's inputs (from a joystick, motion-tracked controller, exoskeleton, or leader robot) are mapped directly to the robot's actuators. This mode provides maximum operator authority but demands high cognitive load, continuous attention, and low-latency communication. Direct teleoperation is common in explosive ordnance disposal (EOD) robots, underwater remotely operated vehicles (ROVs), and basic industrial manipulators.
Shared autonomy (also called shared control) systems divide control responsibilities between the human and the robot. The robot handles certain known dimensions of the task at a fast update rate, such as maintaining stability, avoiding collisions, filtering hand tremor, or smoothing trajectories, while the human provides higher-level intent. The level of autonomy can be fixed or adjusted dynamically based on the situation. In collaborative control, a variant of shared autonomy, the user and robot are treated as peers that resolve conflicts through negotiation and dialogue. Shared autonomy is widely used in surgical teleoperation, where the robot provides tremor compensation and motion scaling while the surgeon directs the procedure.
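One common formulation of shared control is linear arbitration, where the executed command is a weighted blend of the operator's input and an assistance controller's suggestion. The following is a minimal sketch of that idea; the function names and the fixed blending weight are illustrative, not taken from any particular system:

```python
import numpy as np

def blend_commands(u_human, u_assist, alpha=0.7):
    """Linear arbitration for shared control.

    u_human : operator's commanded velocity (e.g., from a joystick)
    u_assist: autonomy's suggestion (e.g., from a collision-avoidance planner)
    alpha   : arbitration weight in [0, 1]; alpha=1 reduces to direct teleoperation
    """
    return alpha * np.asarray(u_human) + (1.0 - alpha) * np.asarray(u_assist)
```

Adaptive variants vary the weight online, for example shifting authority toward the robot as its confidence in the inferred operator goal increases.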
In supervisory control, the operator divides a problem into a sequence of sub-tasks, which the robot then executes autonomously. The human interacts with the robot through a user interface, specifying goals rather than detailed motions. The more capable the robot, the longer it can operate without human intervention. This mode is well suited for environments with significant communication delays (such as space or deep-sea operations), where direct real-time control would be impractical.
Leader-follower (also called master-slave) systems are among the most intuitive teleoperation interfaces. The operator physically manipulates a "leader" robot or device, and the "follower" robot replicates those motions in real time. The ALOHA system, for example, uses two smaller WidowX arms as leader devices that the operator backdrives (or "puppeteers"), while two larger ViperX 6-DoF arms serve as followers. This approach provides a natural kinesthetic mapping between operator input and robot output, making it particularly effective for bimanual manipulation tasks.
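The control loop itself can be very simple. The sketch below shows the basic leader-follower pattern; `read_joint_positions` and `command_joint_positions` are hypothetical driver calls standing in for a real arm SDK:

```python
import time

def leader_follower_loop(leader, follower, rate_hz=50.0):
    """Mirror the backdriven leader's joint positions onto the follower."""
    period = 1.0 / rate_hz
    while True:
        q = leader.read_joint_positions()      # operator backdrives the leader
        follower.command_joint_positions(q)    # follower tracks in real time
        time.sleep(period)                     # fixed-rate open-loop streaming
```

Because ALOHA's leader and follower arms are kinematically similar, a joint-space mapping like this suffices; dissimilar arms require retargeting through end-effector space instead.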
Virtual reality headsets have become increasingly popular for robot teleoperation, offering immersive visual feedback and intuitive hand tracking. The Open-TeleVision system uses VR devices (such as the Apple Vision Pro, Meta Quest 3, or PICO 4 Ultra) to stream the operator's hand, head, and wrist poses to a server, which retargets these poses to control a humanoid robot. The system provides stereoscopic vision from the robot's cameras, creating an immersive experience as though the operator's mind has been transferred into the robot body. Open-TeleVision operates at 60 Hz and has been demonstrated with remote control over the Internet, with an operator at MIT on the U.S. east coast teleoperating a Unitree H1 robot at UC San Diego on the west coast.
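In outline, a VR teleoperation server of this kind repeats a receive-retarget-command cycle at the display rate. The sketch below is a schematic of that pattern under stated assumptions, not Open-TeleVision's actual code; `receive_poses`, `solve_ik`, `retarget_fingers`, and the robot interface are hypothetical:

```python
import time

def vr_teleop_server(headset, robot, rate_hz=60.0):
    """Stream operator poses from a VR headset and retarget them to a robot."""
    period = 1.0 / rate_hz
    while True:
        head, wrists, hands = headset.receive_poses()   # streamed from the VR device
        q_arms = robot.solve_ik(wrists)                 # wrist poses -> arm joint targets
        q_hands = robot.retarget_fingers(hands)         # hand keypoints -> hand joints
        robot.command(q_arms, q_hands, gaze=head)       # head pose can steer a camera
        time.sleep(period)                              # ~60 Hz, as in Open-TeleVision
```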
NVIDIA has also demonstrated using the Apple Vision Pro to capture teleoperated demonstrations, which are then simulated in NVIDIA Isaac Sim and expanded using the MimicGen NIM microservice to generate large-scale synthetic training datasets from a small number of human demonstrations.
Exoskeleton-based interfaces allow operators to control a robot using their natural body movements while receiving force feedback from the remote environment. Haptic feedback is particularly valuable because it gives the operator a sense of touch, enabling more precise and careful manipulation. Research in 2025 demonstrated immersive bilateral bimanual telerobotic systems where operators wear exoskeleton gloves equipped with force feedback motors on each finger, providing up to 0.5 N·m of torque per finger. Studies have consistently shown that haptic force feedback enhances physical perception capabilities, compensates for visual perception deficiencies, and reduces operator cognitive burden.
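A force-feedback channel of this kind ultimately reduces to mapping a sensed contact force to a bounded motor torque. A minimal sketch, with an illustrative gain and the 0.5 N·m per-finger limit from the system described above:

```python
def finger_feedback_torque(contact_force_n, gain=0.02, tau_max=0.5):
    """Map a sensed fingertip force (N) to a feedback torque command (N·m).

    The torque is saturated at the actuator limit so the glove cannot
    overpower the operator's finger. The gain value is illustrative.
    """
    tau = gain * contact_force_n
    return max(-tau_max, min(tau, tau_max))
```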
The TelePulse system (presented at CHI 2025) explored a novel approach using electrical muscle stimulation (EMS) combined with biomechanical simulation to provide haptic feedback during VR-based teleoperation, demonstrating the potential for more lightweight and accessible haptic interfaces.
Whole-body teleoperation systems use motion capture technology to track an operator's full body movements and retarget them to a humanoid robot. The TWIST (Teleoperated Whole-Body Imitation System), published in 2025 and presented at CoRL 2025, generates reference motion clips by retargeting human motion capture data to a humanoid robot. TWIST uses a combination of reinforcement learning and behavior cloning to develop a robust whole-body controller, enabling a Unitree G1 humanoid robot with 29 degrees of freedom to perform whole-body manipulation (lifting boxes from the ground), legged manipulation (kicking a football), locomotion (walking sideways), and expressive movements (performing a waltz dance), all with a single unified neural network controller.
Not all teleoperation requires expensive equipment. The Universal Manipulation Interface (UMI), developed by researchers at Stanford University, Columbia University, and Toyota Research Institute in 2024, takes the form of a handheld parallel-jaw gripper mounted with a GoPro camera. UMI enables portable, low-cost data collection for robot manipulation demonstrations without requiring a physical robot during data collection. The resulting learned policies are hardware-agnostic and deployable across multiple robot platforms. UMI is not only faster than traditional teleoperation but can also capture demonstrations for tasks that are difficult with typical teleoperation interfaces, such as dynamic tossing. Both UMI's hardware and software are open-sourced.
One of the most significant developments in teleoperation during the 2020s has been its use as a systematic data collection tool for training robot machine learning policies. Rather than programming robots with explicit rules, researchers use teleoperation to gather large datasets of human demonstrations, which are then used to train autonomous policies through imitation learning, behavior cloning, or related techniques.
The typical workflow for teleoperation-based robot learning follows several stages (a minimal sketch of the training stage follows this list):
1. Demonstration collection: a human operator teleoperates the robot through the task while observations (camera images, proprioception) and the corresponding actions are recorded.
2. Dataset curation: demonstrations are synchronized, filtered, and stored in a standardized format.
3. Policy training: a policy network is trained on the demonstrations, typically with behavior cloning or a related imitation learning objective.
4. Deployment and evaluation: the trained policy runs autonomously on the robot and its success rate is measured.
5. Iteration: failure cases are addressed with further demonstrations or human corrections, as in the RoboCopilot and HACTS systems described below.
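As a concrete instance of the training stage, here is a deliberately minimal behavior-cloning sketch in PyTorch; the tiny MLP, the flattened observation vector, and the data loader are placeholders for the image-conditioned architectures used in practice:

```python
import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    """Toy policy: flattened observation in, action out."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def train_bc(policy, demo_loader, epochs=10, lr=1e-4):
    """Supervised regression onto teleoperated demonstration pairs."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, action in demo_loader:   # (observation, action) from demos
            loss = nn.functional.mse_loss(policy(obs), action)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```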
Action Chunking with Transformers (ACT) is a key algorithm developed alongside the ALOHA system by Tony Z. Zhao and colleagues at Stanford University. ACT's central innovation is predicting a sequence of actions (an "action chunk") rather than a single action at each timestep, which reduces the effective planning horizon and improves learning efficiency. The policy is trained as the decoder of a Conditional Variational Autoencoder (CVAE): it encodes images from multiple viewpoints together with joint positions through a transformer encoder, then predicts action sequences with a transformer decoder. ACT achieves 80 to 96 percent success rates on most real-world tasks with only 10 minutes of demonstration data, significantly outperforming baselines such as BC-ConvMLP, BeT, RT-1, and VINN.
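The chunking idea is easy to state in code. Below is a sketch of the temporal ensembling scheme described in the ACT paper, in which the policy is queried every timestep and the overlapping chunk predictions for the current step are averaged with exponential weights; the constant m and the surrounding bookkeeping are our own:

```python
import numpy as np

def ensembled_action(chunks, t, chunk_size, m=0.01):
    """Combine overlapping action-chunk predictions for timestep t.

    chunks: dict mapping the timestep a chunk was predicted at to an
            array of shape (chunk_size, act_dim).
    Weights follow w_i = exp(-m * i), with the oldest prediction
    receiving the largest weight, as described in the ACT paper.
    """
    preds = [chunk[t - i] for i, chunk in sorted(chunks.items())
             if i <= t < i + chunk_size]
    w = np.exp(-m * np.arange(len(preds)))   # index 0 = oldest prediction
    w /= w.sum()
    return np.einsum("i,ij->j", w, np.stack(preds))
```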
Diffusion models, originally developed for image generation, have been adapted for robot policy learning with strong results. Diffusion Policy, introduced by Cheng Chi and colleagues, formulates robot action prediction as a denoising diffusion process, iteratively refining random noise into precise action trajectories. ALOHA Unleashed (2024) from Google DeepMind combined the ALOHA 2 hardware with a transformer-based neural network trained using Diffusion Policy, enabling robots to autonomously perform complex tasks such as tying shoelaces, hanging t-shirts, replacing a robot finger, inserting gears, and stacking kitchen items. Subsequent work such as 3D Diffusion Policy (2024) encodes sparse point clouds into compact 3D representations, enabling successful policy training with as few as 10 to 40 demonstrations.
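Conceptually, inference with a diffusion policy runs the image-generation denoising loop over an action trajectory instead of pixels. The following is a schematic DDPM-style reverse process conditioned on the current observation; `noise_model` stands for the trained network, and real implementations typically use a scheduler library rather than this hand-rolled update:

```python
import torch

@torch.no_grad()
def sample_action_trajectory(noise_model, obs, horizon, act_dim, betas):
    """Iteratively denoise Gaussian noise into an action trajectory."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    a = torch.randn(horizon, act_dim)             # start from pure noise
    for t in reversed(range(len(betas))):
        eps = noise_model(a, obs, t)              # predicted noise at step t
        a = (a - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:                                 # inject noise except at the final step
            a = a + torch.sqrt(betas[t]) * torch.randn_like(a)
    return a                                      # denoised action trajectory
```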
More recent systems go beyond simple one-shot demonstration collection. RoboCopilot (2025) presents a complete bimanual teleoperation system that alternates between model training and data collection, where a human teleoperator interrupts during policy execution to provide corrective feedback. HACTS (Human-As-Copilot Teleoperation System, 2025) establishes bilateral real-time joint synchronization between a robot arm and teleoperation hardware, enabling seamless human intervention while collecting action-correction data for future learning.
Research has shown that different demonstration modalities affect downstream learning performance. A 2025 study found that kinesthetic teaching (physically guiding the robot by hand) provides the cleanest data for the best downstream policy learning performance, and that a combination of a small number of kinesthetic demonstrations mixed with data collected through teleoperation achieves the best overall results. This finding suggests that the ideal data collection approach may involve multiple complementary modalities.
ALOHA (A Low-cost Open-source Hardware System for Bimanual Teleoperation) was developed by Tony Z. Zhao, Zipeng Fu, and colleagues at Stanford University. The system demonstrated that low-cost hardware combined with effective imitation learning algorithms could achieve impressive manipulation capabilities.
| Feature | ALOHA (Original) | ALOHA 2 |
|---|---|---|
| Arms | 2x ViperX follower + 2x WidowX leader | 2x ViperX follower + 2x WidowX leader (improved) |
| Degrees of freedom | 6 DoF per arm | 6 DoF per arm |
| Gripper | Parallel jaw | Parallel jaw (enhanced) |
| Improvements | Original design | Greater performance, ergonomics, and robustness |
| Learning algorithm | ACT (Action Chunking with Transformers) | ACT, Diffusion Policy |
| Open source | Yes | Yes |
ALOHA 2, developed in collaboration with Google DeepMind, introduced improvements in performance, ergonomics, and robustness compared to the original design, and served as the hardware platform for ALOHA Unleashed.
Mobile ALOHA extends the original ALOHA system by mounting it on a wheeled mobile base, creating a low-cost whole-body teleoperation system for mobile manipulation. The system costs approximately $32,000 including onboard power and compute, and the mobile base can travel at up to 1.42 meters per second (comparable to average human walking speed). The Mobile ALOHA paper was published at the Conference on Robot Learning (CoRL) in 2024.
Using supervised behavior cloning and co-training with existing static ALOHA datasets, Mobile ALOHA can autonomously complete complex mobile manipulation tasks such as sauteing and serving shrimp. With 50 demonstrations per task, co-training increases success rates by up to 90 percent compared to policies trained on the mobile data alone. Both the hardware designs and software are fully open-sourced.
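Co-training of this kind can be as simple as mixing data sources when batches are assembled. A sketch with hypothetical dataset objects; the even mixing ratio is chosen for illustration and is in practice a tunable hyperparameter:

```python
import random

def cotraining_batch(mobile_demos, static_demos, batch_size, mobile_frac=0.5):
    """Assemble a batch mixing task-specific mobile demos with static ALOHA data."""
    n_mobile = int(batch_size * mobile_frac)
    batch = random.sample(mobile_demos, n_mobile)           # new mobile task data
    batch += random.sample(static_demos, batch_size - n_mobile)  # prior datasets
    random.shuffle(batch)
    return batch
```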
Open-TeleVision is an immersive teleoperation system that uses VR headsets to provide stereoscopic visual feedback and mirror the operator's arm and hand movements onto a humanoid robot. The system supports multiple VR devices and can operate over the Internet, enabling long-distance teleoperation. It was demonstrated teleoperating a Unitree H1 humanoid robot and has been used in conjunction with 3D Diffusion Policy for generalizable humanoid manipulation research.
UMI, developed in 2024, takes a fundamentally different approach by decoupling data collection from the physical robot. Demonstrations are collected using handheld grippers with mounted cameras, and the resulting policies can be transferred to different robot platforms. UMI's design enables data collection "in the wild" (in any environment, including homes and kitchens) without needing to bring a robot to each location. The system includes a carefully designed policy interface with inference-time latency matching and a relative-trajectory action representation for hardware-agnostic deployment.
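The relative-trajectory idea can be shown in a few lines: each future end-effector waypoint is re-expressed in the current end-effector frame, so the learned actions carry no robot-specific base frame. The helper below is our illustration of that representation, using 4x4 homogeneous transforms:

```python
import numpy as np

def relative_trajectory(T_now, future_poses):
    """Express future end-effector poses relative to the current pose.

    T_now        : 4x4 homogeneous transform of the current end-effector pose
    future_poses : list of 4x4 transforms for upcoming waypoints
    """
    T_inv = np.linalg.inv(T_now)
    return [T_inv @ T for T in future_poses]   # base-frame-independent actions
```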
LeRobot, developed by Hugging Face and led by ex-Tesla researcher Remi Cadene, is an open-source library providing models, datasets, and tools for real-world robotics in PyTorch. Launched in 2024, the project grew from zero to over 12,000 GitHub stars within its first twelve months. LeRobot provides a unified Robot class interface supporting a wide range of robots and teleoperation devices, including phone-based teleoperation (using an iOS or Android device as a teleoperator). It offers standardized dataset formats hosted on the Hugging Face Hub, pre-trained policy models (including ACT, Diffusion Policy, and Vision-Language-Action models like pi0.5 and GR00T N1.5), and affordable hardware options such as the SO-100 arm (approximately $100). In 2025, Hugging Face acquired Pollen Robotics and partnered with NVIDIA to further accelerate open-source robotics research.
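Loading a teleoperation dataset through LeRobot looks roughly like the following; `LeRobotDataset` and Hub-hosted ALOHA datasets are real, but import paths and dataset names have moved between LeRobot releases, so treat the specifics as illustrative:

```python
import torch
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# A teleoperated ALOHA dataset hosted on the Hugging Face Hub.
dataset = LeRobotDataset("lerobot/aloha_static_coffee")

# LeRobotDataset behaves as a standard PyTorch dataset, so usual tooling applies.
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
batch = next(iter(loader))   # dict of camera frames, robot state, and actions
```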
The commercialization of humanoid robots in 2025 and 2026 has brought teleoperation into the spotlight as a practical deployment strategy. Several companies have adopted teleoperation as a bridge between current robot capabilities and the long-term goal of full autonomy.
1X Technologies launched the NEO humanoid robot for pre-order in late October 2025 at a price of $20,000 (with a subscription option of $499 per month), with first customer deliveries planned for 2026. NEO combines lightweight construction, quiet electric actuation, and a teleoperation-to-autonomy AI pipeline. The company's approach is explicitly built on a "human-in-the-loop" training model, commercially branded as "Expert Mode." When NEO encounters a task it has not yet learned to perform autonomously, a human operator from 1X (called a "1X Expert") can remotely pilot the robot through the task. Every teleoperated session is recorded, labeled, and fed into 1X's Redwood AI model to accelerate generalized learning. NEO ships with basic autonomous capabilities (opening doors, fetching items, turning off lights) but relies on teleoperation for complex tasks such as folding clothes. The company's stated goal is to ship a "mostly fully autonomous" robot by late 2026, with the quality and breadth of autonomous behaviors improving continuously with data.
Sanctuary AI, based in Vancouver, Canada, developed the Phoenix humanoid robot with an emphasis on task-learning through teleoperation. Operators guide Phoenix through complex workflows, and the resulting data is used to train autonomous capabilities. By 2024, Sanctuary AI had completed a successful commercial deployment at a store in Canada, where the teleoperated Phoenix system demonstrated the ability to complete 110 retail-related tasks. The company's Phoenix robot progressed through several hardware generations: Generation 7 (April 2024) introduced faster task learning (under 24 hours), improved range of motion, lighter weight, and lower bill-of-materials cost; Generation 8 (January 2025) was optimized for data capture with improved cameras, telemetry, and person-robot interaction capabilities. In February 2025, Sanctuary AI equipped Phoenix with new tactile sensors, giving human operators a sense of touch during teleoperation and enabling more precise control for dexterous manipulation tasks.
Many humanoid robot companies use some form of teleoperation in their development pipeline, even if they are less explicit about it than 1X and Sanctuary AI. The general pattern across the industry involves a blend of scripted behavior, teleoperation assist, and large language model-driven planning rather than full autonomy. In real deployments during 2025, humanoid robots in factories and warehouses primarily performed repetitive tasks such as moving totes, bins, and parts, unloading containers, and handling intralogistics operations.
Communication latency is one of the most persistent challenges in teleoperation. Delays between command issuance and action execution degrade operator situational awareness, increase cognitive load, and reduce task performance. In bilateral teleoperation systems (where force feedback is reflected back to the operator), even small delays can destroy system passivity and potentially lead to instability. This challenge is especially acute in space teleoperation, where communication latency can reach seconds or tens of seconds. Future solutions are expected to involve hybrid approaches combining modeling of user intent, prediction of robot movements, and time delay prediction using time series methods.
In haptic bilateral teleoperation, stability and transparency are fundamental but conflicting design goals. Transparency refers to how faithfully the remote environment's forces are conveyed to the operator; stability ensures the system does not oscillate or become unstable. Passivity-based control methods (such as wave variable transformation and time domain passivity control) have been the dominant approach for ensuring stability under communication delays, but these methods typically sacrifice transparency. Achieving both high transparency and robust stability simultaneously remains an open research problem.
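The classical wave variable transformation illustrates the trade-off. Velocity and force are encoded into wave signals before transmission, with wave impedance b > 0:

```math
u = \frac{b\dot{x} + F}{\sqrt{2b}}, \qquad
v = \frac{b\dot{x} - F}{\sqrt{2b}}, \qquad
\dot{x}\,F = \tfrac{1}{2}\left(u^{2} - v^{2}\right)
```

Because the power flowing through the channel equals half the difference of the squared wave signals, transmitting u forward and v backward keeps the channel passive for any constant delay; the price is transparency, since the operator feels the environment filtered through the impedance b rather than directly.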
Teleoperation demands sustained attention, precise motor control, and continuous processing of sensory feedback. Over extended sessions, operators experience fatigue that degrades performance. This challenge is particularly relevant for data collection, where thousands of demonstrations may be needed. Ergonomic interface design, shared autonomy (where the robot handles routine sub-tasks), and shift-based operation models (as used by 1X for its Expert Mode) are approaches to mitigating operator fatigue.
While teleoperation produces high-quality demonstration data, it is inherently limited by the speed at which humans can perform demonstrations. Collecting the thousands or tens of thousands of demonstrations needed to train robust policies is expensive and time-consuming. Approaches to address this include co-training with data from multiple tasks, synthetic data generation (using simulation platforms such as NVIDIA Isaac Sim), data augmentation techniques (such as NVIDIA's MimicGen), and transferring policies across robot platforms using hardware-agnostic interfaces like UMI.
Teleoperated robots have been used extensively in military explosive ordnance disposal (EOD) since the late 20th century. These robots allow operators to inspect, manipulate, and neutralize explosive devices from a safe distance. Modern EOD robots are equipped with high-definition cameras, manipulator arms, and various sensors providing real-time feedback.
Remotely operated vehicles (ROVs) are teleoperated underwater robots used for deep-sea exploration, infrastructure inspection, scientific research, and military mine countermeasures. ROVs are connected to surface vessels by tethered cables that transmit power, commands, and video feeds. The MK20 Defender ROV, for example, is used by the U.S. Navy to locate and neutralize underwater mines and explosives in maritime zones worldwide.
Teleoperated surgical robots enable minimally invasive surgery with enhanced precision, reduced tremor, and improved visualization. Beyond the da Vinci system, teleoperation is being explored for remote diagnostics, physical therapy, and patient monitoring, driven by the need to provide healthcare access in underserved or remote areas.
Teleoperation remains essential for space robotics, where autonomous systems cannot always handle the complexity and unpredictability of tasks. In addition to Canadarm2, various rovers and robotic systems on the Moon and Mars have been operated through teleoperation with varying degrees of autonomy to compensate for communication delays.
In industrial settings, teleoperation allows human operators to guide robots through tasks that are too complex for full automation, such as assembly in unstructured environments, quality inspection, and handling of irregular objects. The growing teleoperation market in logistics is driven by driver shortages and the need for enhanced operational efficiency.
The global teleoperation market is experiencing rapid growth. Estimates for 2025 place the market at approximately $890 million, with projections reaching $4 billion by 2032 at a compound annual growth rate (CAGR) of approximately 24 percent. Key growth drivers include advances in 5G connectivity, artificial intelligence, and robotics; rising demand for remote surgery and diagnostics; increasing adoption of Industrial Internet of Things (IIoT) and Industry 4.0 technologies; and the expanding use of teleoperation for data collection in robot learning.
| Market Metric | Value |
|---|---|
| Estimated market size (2025) | ~$890 million |
| Projected market size (2032) | ~$4 billion |
| CAGR (2025-2032) | ~24% |
| Key growth sectors | Healthcare, logistics, manufacturing, defense |
| Enabling technologies | 5G, AI, VR/AR, haptics, cloud robotics |
The field of robot teleoperation is evolving along several trajectories. The convergence of teleoperation with embodied AI is blurring the line between teleoperation and autonomy; systems like 1X's NEO are designed from the ground up so that teleoperated operation and autonomous operation exist on a continuum. Foundation models for robotics, including Vision-Language-Action (VLA) models, are increasingly being trained on teleoperation data to produce generalist robot policies. The democratization of teleoperation hardware through open-source projects like ALOHA, UMI, and LeRobot is lowering the barrier to entry for robotics research and enabling a broader community of contributors. Cloud robotics architectures, where computation and even operator input can be provided remotely over the Internet, are expanding the range of possible teleoperation applications. As 5G and future communication technologies reduce latency and increase bandwidth, the fidelity and responsiveness of teleoperation systems will continue to improve, enabling new applications in healthcare, manufacturing, space exploration, and everyday life.