Definition

Teleoperation is the remote control of a robot by a human operator, either from across the room or across the world. The operator manipulates an input device — a leader arm, VR controller, SpaceMouse, or data glove — and the robot (the follower) mirrors those motions in real time. In the context of robot learning, teleoperation is far more than remote control: it is the primary pipeline for generating the expert demonstration datasets that power imitation learning, behavior cloning, and ACT-style policies.

Unlike kinesthetic teaching, where the operator physically moves the robot's joints, teleoperation decouples the operator from the robot. This means demonstrations can be collected on dangerous or remote hardware, at higher speeds, and with better ergonomics. The teleoperation system records synchronized streams of camera images, joint positions, gripper states, and optionally force-torque readings at 30–50 Hz, producing the (observation, action) pairs that policies learn from.
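The recording side can be sketched as a fixed-rate loop that pairs each observation with the action commanded at that instant. This is illustrative only: `robot` and `camera` are hypothetical interfaces standing in for real hardware drivers, and frameworks like LeRobot handle this loop internally.

```python
import time

def record_episode(robot, camera, rate_hz=30, max_steps=300):
    """Record (observation, action) pairs at a fixed rate.

    `robot` and `camera` are hypothetical interfaces; any objects
    exposing the methods used below will do.
    """
    dt = 1.0 / rate_hz
    episode = []
    for _ in range(max_steps):
        t0 = time.monotonic()
        obs = {
            "timestamp": t0,
            "image": camera.read(),                # camera frame
            "joint_pos": robot.joint_positions(),  # follower state
            "gripper": robot.gripper_state(),
        }
        action = robot.commanded_positions()       # what the leader asked for
        episode.append((obs, action))
        # Sleep off the remainder of the control period to hold the rate.
        time.sleep(max(0.0, dt - (time.monotonic() - t0)))
    return episode
```

Training then treats each `(obs, action)` tuple as one supervised example.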

The quality of a learned policy is fundamentally limited by the quality of the teleoperation data. Smooth, consistent demonstrations with minimal pauses and hesitations produce significantly better policies than noisy, start-stop data. This makes teleoperation interface design and operator training as important as the learning algorithm itself.

How It Works

A teleoperation system consists of three layers: the input device (master/leader), the communication channel, and the robot controller (slave/follower). The input device captures the operator's intended motions as a stream of poses or joint angles. These are transmitted to the robot controller, which converts them into motor commands. The operator receives visual feedback from cameras mounted on the robot, and optionally haptic feedback through force-reflecting devices.
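For pose-based input devices, the controller layer reduces to an inverse-kinematics solve followed by safety clamping. A minimal sketch, assuming a caller-supplied `ik_solver` and `robot` interface (both hypothetical):

```python
def teleop_step(leader_pose, ik_solver, robot, joint_limits):
    """Convert one operator pose into a safe joint command.

    `ik_solver` maps a 6-DOF pose to joint angles and `joint_limits`
    is a list of (low, high) bounds per joint; both are assumed
    interfaces, not any specific library's API.
    """
    target_q = ik_solver(leader_pose)
    # Clamp to the follower's joint limits before commanding the motors.
    safe_q = [min(max(q, lo), hi)
              for q, (lo, hi) in zip(target_q, joint_limits)]
    robot.command(safe_q)
    return safe_q
```

Running this at the control frequency (and nothing heavier inside the loop) is what keeps the input-to-motion latency low.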

For data collection, the system also runs a recording pipeline that timestamps and synchronizes all sensor streams. Modern frameworks like LeRobot, ALOHA, and SVRC's data platform handle recording, visualization, and export to standard training formats (HDF5, LeRobot dataset) automatically.
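Synchronization usually means aligning each camera frame with the nearest joint-state sample by timestamp. A minimal stdlib sketch of that alignment (recorders such as LeRobot do this internally, and the exact matching policy varies):

```python
import bisect

def sync_streams(cam_ts, joint_ts):
    """Pair each camera timestamp with the nearest joint-state timestamp.

    Both arguments are sorted lists of timestamps in seconds; returns a
    list of (camera_t, joint_t) pairs.
    """
    pairs = []
    for t in cam_ts:
        i = bisect.bisect_left(joint_ts, t)
        # Candidate neighbours: the sample just before and just after t.
        cand = [j for j in (i - 1, i) if 0 <= j < len(joint_ts)]
        j = min(cand, key=lambda k: abs(joint_ts[k] - t))
        pairs.append((t, joint_ts[j]))
    return pairs
```

The paired indices are then used to assemble the per-step records that get written to HDF5 or a LeRobot dataset.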

The critical design parameter is end-to-end latency: the delay between the operator's motion and the robot's response. For precise manipulation, latency must stay below 50–100 ms. Above 200 ms, operators begin to adopt slow, cautious strategies that produce poor training data. Network jitter and video compression introduce additional latency in remote setups.

Types of Teleoperation

  • Leader-follower (kinematic) — A smaller "leader" arm with matching kinematics is physically moved by the operator; the "follower" robot mirrors joint angles. ALOHA uses this approach, with WidowX 250 leader arms driving ViperX 300 followers. Provides the most intuitive 1:1 mapping and captures all degrees of freedom naturally.
  • VR headset + controllers — The operator wears a Meta Quest 3 or similar headset and uses hand controllers to specify 6-DOF end-effector poses. The robot's inverse kinematics solver converts these to joint commands. Best for remote operation and tasks requiring a third-person perspective.
  • SpaceMouse / 6-DOF input — A desktop device that provides incremental 6-DOF velocity commands. Compact and inexpensive but requires significant operator practice. Well-suited for single-arm tasks in structured environments.
  • Data glove / hand tracking — Captures finger articulation for controlling dexterous robotic hands (Allegro, LEAP, Inspire). Essential for tasks requiring in-hand manipulation, but mapping between human and robot hand kinematics is non-trivial.
  • Mobile + manipulation — Extends bimanual teleop to a mobile base, as in Mobile ALOHA. The operator controls base navigation (joystick or autonomous) while teleoperating the arms for mobile manipulation tasks like tidying a room.

Key Metrics

Latency: End-to-end delay from input to robot motion. Target <50 ms for local setups, <100 ms for remote. Measured with motion capture or timestamp correlation.

Fidelity: How accurately the robot reproduces the operator's intended motion. Affected by kinematic mismatch, joint limits, and control frequency. Leader-follower systems with matched kinematics achieve the highest fidelity.

Degrees of freedom captured: A SpaceMouse provides 6 DOF (position + orientation). A bimanual leader-follower rig captures 14+ DOF (7 per arm including gripper). Dexterous hand systems can capture 20+ DOF per hand.

Throughput: Demonstrations per hour. An experienced operator with a well-designed interface can produce 15–30 demonstrations per hour for simple tasks, or 5–10 per hour for complex bimanual tasks. This directly affects data collection cost.

Hardware for Teleoperation

ALOHA / ALOHA 2: The most widely adopted research teleoperation platform. Two WidowX 250 leader arms control two ViperX 300 follower arms. Total cost ~$20K. Designed specifically for data collection and ACT policy training.

Meta Quest 3: Consumer VR headset repurposed for robot teleoperation. Hand tracking or controller tracking provides 6-DOF input. Requires a software bridge (e.g., Open-TeleVision) to stream poses to the robot controller.

3Dconnexion SpaceMouse: $130–$300 desktop device for 6-DOF input. Popular in CAD and increasingly used for robot control. Low barrier to entry but steep learning curve for complex tasks.

Custom rigs: Many labs build custom leader arms using the same actuators as the follower (e.g., Dynamixel servos with current-based position reading). This ensures perfect kinematic matching and minimal latency.
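The mirroring logic for such a rig is simple: read the (torque-disabled) leader's joint positions each cycle and write them as goal positions to the follower. A sketch with optional exponential smoothing to suppress operator jitter; `leader` and `follower` are hypothetical thin wrappers over a servo SDK such as the Dynamixel SDK:

```python
def mirror_step(leader, follower, prev=None, alpha=1.0):
    """One leader-follower mirroring step.

    `leader.read_positions()` and `follower.write_goal_positions()` are
    assumed interfaces. With alpha < 1.0 and a previous command `prev`,
    the new command is low-pass filtered: alpha*q + (1-alpha)*prev.
    """
    q = leader.read_positions()
    if prev is not None and alpha < 1.0:
        q = [alpha * qi + (1.0 - alpha) * pi for qi, pi in zip(q, prev)]
    follower.write_goal_positions(q)
    return q
```

Because leader and follower share actuators and kinematics, no IK is needed, which is why these rigs achieve the lowest latency and highest fidelity.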

Key Papers

  • Zhao, T., Kumar, V., Levine, S., & Finn, C. (2023). "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware." RSS 2023. Introduced the ALOHA bimanual teleoperation system and demonstrated its effectiveness for training ACT policies.
  • Fu, Z. et al. (2024). "Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation." Extended teleoperation to mobile manipulation with a wheeled base and two arms.
  • Cheng, X. et al. (2024). "Open-TeleVision: Teleoperation with Immersive Active Visual Feedback." CoRL 2024. VR-based teleoperation with stereoscopic streaming that achieves near-local-quality data collection over the internet.

Related Terms

  • Imitation Learning — The learning paradigm that consumes teleoperation data
  • Action Chunking (ACT) — Policy architecture designed for teleoperated demonstration data
  • Behavior Cloning — Supervised learning from teleoperated demonstrations
  • DAgger — Interactive data collection that combines policy rollouts with expert correction
  • Force-Torque Sensing — Haptic feedback and force data recorded during teleoperation

Collect Data at SVRC

Silicon Valley Robotics Center operates ALOHA bimanual rigs, VR teleoperation stations, and SpaceMouse setups ready for your data collection campaigns. Our trained operators can collect hundreds of high-quality demonstrations per day, formatted and ready for ACT, Diffusion Policy, or VLA fine-tuning.

Explore Data Services   Contact Us