Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

Tony Z. Zhao

Stanford University

Vikash Kumar

University of Washington

Sergey Levine

University of California, Berkeley

Chelsea Finn

Stanford University

Paper ID 16

Session 2. Manipulation from Demonstrations and Teleoperation

Poster Session Tuesday, July 11

Poster 16

Abstract: Fine manipulation tasks, such as threading cable ties or slotting a battery, are notoriously difficult for robots because they require precision, careful coordination of contact forces, and closed-loop visual feedback. Performing these tasks typically requires high-end robots, accurate sensors, or careful calibration, which can be expensive and difficult to set up. Can learning enable low-cost and imprecise hardware to perform these fine manipulation tasks? We present a low-cost system that performs end-to-end imitation learning directly from real demonstrations, collected with a custom teleoperation interface. Imitation learning, however, presents its own challenges, particularly in high-precision domains: the error of the policy can compound over time, drifting out of the training distribution. To address this challenge, we develop a simple yet novel algorithm Action Chunking with Transformers (ACT) which reduces the effective horizon by predicting actions in chunks. This allows us to learn difficult tasks such as opening a translucent condiment cup and slotting a battery with 80-90% success, with only 10 minutes worth of demonstration data. Project website: https://tonyzhaozh.github.io/aloha/