Definition

Teleoperation is the remote control of a robot by a human operator, either from across the room or across the world. The operator manipulates an input device — a leader arm, VR controller, SpaceMouse, or data glove — and the robot (the follower) mirrors those motions in real time. In the context of robot learning, teleoperation is far more than remote control: it is the primary pipeline for generating the expert demonstration datasets that power imitation learning, behavior cloning, and ACT-style policies.

Unlike kinesthetic teaching, where the operator physically moves the robot's joints, teleoperation decouples the operator from the robot. This means demonstrations can be collected on dangerous or remote hardware, at higher speeds, and with better ergonomics. The teleoperation system records synchronized streams of camera images, joint positions, gripper states, and optionally force-torque readings at 30–50 Hz, producing the (observation, action) pairs that policies learn from.
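The recording side can be sketched as a fixed-rate loop that pairs each observation with the action commanded at that instant. This is illustrative only: `robot` and `camera` are hypothetical interfaces standing in for real hardware drivers, and frameworks like LeRobot handle this loop internally.

```python
import time

def record_episode(robot, camera, rate_hz=30, max_steps=300):
    """Record (observation, action) pairs at a fixed rate.

    `robot` and `camera` are hypothetical interfaces; any objects
    exposing the methods used below will do.
    """
    dt = 1.0 / rate_hz
    episode = []
    for _ in range(max_steps):
        t0 = time.monotonic()
        obs = {
            "timestamp": t0,
            "image": camera.read(),                # camera frame
            "joint_pos": robot.joint_positions(),  # follower state
            "gripper": robot.gripper_state(),
        }
        action = robot.commanded_positions()       # what the leader asked for
        episode.append((obs, action))
        # Sleep off the remainder of the control period to hold the rate.
        time.sleep(max(0.0, dt - (time.monotonic() - t0)))
    return episode
```

Training then treats each `(obs, action)` tuple as one supervised example.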

The quality of a learned policy is fundamentally limited by the quality of the teleoperation data. Smooth, consistent demonstrations with minimal pauses and hesitations produce significantly better policies than noisy, start-stop data. This makes teleoperation interface design and operator training as important as the learning algorithm itself.

How It Works

A teleoperation system consists of three layers: the input device (master/leader), the communication channel, and the robot controller (slave/follower). The input device captures the operator's intended motions as a stream of poses or joint angles. These are transmitted to the robot controller, which converts them into motor commands. The operator receives visual feedback from cameras mounted on the robot, and optionally haptic feedback through force-reflecting devices.
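For pose-based input devices, the controller layer reduces to an inverse-kinematics solve followed by safety clamping. A minimal sketch, assuming a caller-supplied `ik_solver` and `robot` interface (both hypothetical):

```python
def teleop_step(leader_pose, ik_solver, robot, joint_limits):
    """Convert one operator pose into a safe joint command.

    `ik_solver` maps a 6-DOF pose to joint angles and `joint_limits`
    is a list of (low, high) bounds per joint; both are assumed
    interfaces, not any specific library's API.
    """
    target_q = ik_solver(leader_pose)
    # Clamp to the follower's joint limits before commanding the motors.
    safe_q = [min(max(q, lo), hi)
              for q, (lo, hi) in zip(target_q, joint_limits)]
    robot.command(safe_q)
    return safe_q
```

Running this at the control frequency (and nothing heavier inside the loop) is what keeps the input-to-motion latency low.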

For data collection, the system also runs a recording pipeline that timestamps and synchronizes all sensor streams. Modern frameworks like LeRobot, ALOHA, and SVRC's data platform handle recording, visualization, and export to standard training formats (HDF5, LeRobot dataset) automatically.
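Synchronization usually means aligning each camera frame with the nearest joint-state sample by timestamp. A minimal stdlib sketch of that alignment (recorders such as LeRobot do this internally, and the exact matching policy varies):

```python
import bisect

def sync_streams(cam_ts, joint_ts):
    """Pair each camera timestamp with the nearest joint-state timestamp.

    Both arguments are sorted lists of timestamps in seconds; returns a
    list of (camera_t, joint_t) pairs.
    """
    pairs = []
    for t in cam_ts:
        i = bisect.bisect_left(joint_ts, t)
        # Candidate neighbours: the sample just before and just after t.
        cand = [j for j in (i - 1, i) if 0 <= j < len(joint_ts)]
        j = min(cand, key=lambda k: abs(joint_ts[k] - t))
        pairs.append((t, joint_ts[j]))
    return pairs
```

The paired indices are then used to assemble the per-step records that get written to HDF5 or a LeRobot dataset.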

The critical design parameter is end-to-end latency: the delay between the operator's motion and the robot's response. For precise manipulation, latency must stay below 50–100 ms. Above 200 ms, operators begin to adopt slow, cautious strategies that produce poor training data. Network jitter and video compression introduce additional latency in remote setups.

Types of Teleoperation

  • Leader-follower (kinematic) — A smaller "leader" arm with matching kinematics is physically moved by the operator; the "follower" robot mirrors joint angles. ALOHA uses this approach, with WidowX 250 leader arms driving ViperX 300 followers. Provides the most intuitive 1:1 mapping and captures all degrees of freedom naturally.
  • VR headset + controllers — The operator wears a Meta Quest 3 or similar headset and uses hand controllers to specify 6-DOF end-effector poses. The robot's inverse kinematics solver converts these to joint commands. Best for remote operation and tasks requiring a third-person perspective.
  • SpaceMouse / 6-DOF input — A desktop device that provides incremental 6-DOF velocity commands. Compact and inexpensive but requires significant operator practice. Well-suited for single-arm tasks in structured environments.
  • Data glove / hand tracking — Captures finger articulation for controlling dexterous robotic hands (Allegro, LEAP, Inspire). Essential for tasks requiring in-hand manipulation, but mapping between human and robot hand kinematics is non-trivial.
  • Mobile + manipulation — Extends bimanual teleop to a mobile base, as in Mobile ALOHA. The operator controls base navigation (joystick or autonomous) while teleoperating the arms for mobile manipulation tasks like tidying a room.

Key Metrics

Latency: End-to-end delay from input to robot motion. Target <50 ms for local setups, <100 ms for remote. Measured with motion capture or timestamp correlation.

Fidelity: How accurately the robot reproduces the operator's intended motion. Affected by kinematic mismatch, joint limits, and control frequency. Leader-follower systems with matched kinematics achieve the highest fidelity.

Degrees of freedom captured: A SpaceMouse provides 6 DOF (position + orientation). A bimanual leader-follower rig captures 14+ DOF (7 per arm including gripper). Dexterous hand systems can capture 20+ DOF per hand.

Throughput: Demonstrations per hour. An experienced operator with a well-designed interface can produce 15–30 demonstrations per hour for simple tasks, or 5–10 per hour for complex bimanual tasks. This directly affects data collection cost.

Hardware for Teleoperation

ALOHA / ALOHA 2: The most widely adopted research teleoperation platform. Two WidowX 250 leader arms control two ViperX 300 follower arms. Total cost ~$20K. Designed specifically for data collection and ACT policy training.

Meta Quest 3: Consumer VR headset repurposed for robot teleoperation. Hand tracking or controller tracking provides 6-DOF input. Requires a software bridge (e.g., Open-TeleVision) to stream poses to the robot controller.

3Dconnexion SpaceMouse: $130–$300 desktop device for 6-DOF input. Popular in CAD and increasingly used for robot control. Low barrier to entry but steep learning curve for complex tasks.

Custom rigs: Many labs build custom leader arms using the same actuators as the follower (e.g., Dynamixel servos with current-based position reading). This ensures perfect kinematic matching and minimal latency.
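The mirroring logic for such a rig is simple: read the (torque-disabled) leader's joint positions each cycle and write them as goal positions to the follower. A sketch with optional exponential smoothing to suppress operator jitter; `leader` and `follower` are hypothetical thin wrappers over a servo SDK such as the Dynamixel SDK:

```python
def mirror_step(leader, follower, prev=None, alpha=1.0):
    """One leader-follower mirroring step.

    `leader.read_positions()` and `follower.write_goal_positions()` are
    assumed interfaces. With alpha < 1.0 and a previous command `prev`,
    the new command is low-pass filtered: alpha*q + (1-alpha)*prev.
    """
    q = leader.read_positions()
    if prev is not None and alpha < 1.0:
        q = [alpha * qi + (1.0 - alpha) * pi for qi, pi in zip(q, prev)]
    follower.write_goal_positions(q)
    return q
```

Because leader and follower share actuators and kinematics, no IK is needed, which is why these rigs achieve the lowest latency and highest fidelity.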

Key Papers

  • Zhao, T., Kumar, V., Levine, S., & Finn, C. (2023). "Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware." RSS 2023. Introduced the ALOHA bimanual teleoperation system and demonstrated its effectiveness for training ACT policies.
  • Fu, Z. et al. (2024). "Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation." Extended teleoperation to mobile manipulation with a wheeled base and two arms.
  • Cheng, X. et al. (2024). "Open-TeleVision: Teleoperation with Immersive Active Visual Feedback." CoRL 2024. VR-based teleoperation with stereoscopic streaming that achieves near-local-quality data collection over the internet.

Related Terms

  • Imitation Learning — The learning paradigm that consumes teleoperation data
  • Action Chunking (ACT) — Policy architecture designed for teleoperated demonstration data
  • Behavior Cloning — Supervised learning from teleoperated demonstrations
  • DAgger — Interactive data collection that combines policy rollouts with expert correction
  • Force-Torque Sensing — Haptic feedback and force data recorded during teleoperation

Collect Data at SVRC

Silicon Valley Robotics Center operates ALOHA bimanual rigs, VR teleoperation stations, and SpaceMouse setups ready for your data collection campaigns. Our trained operators can collect hundreds of high-quality demonstrations per day, formatted and ready for ACT, Diffusion Policy, or VLA fine-tuning.

Explore Data Services   Contact Us