Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation


Jonathan Heewon Yang, Catherine Glossop, Arjun Bhorkar, Dhruv Shah, Quan Vuong, Chelsea Finn, Dorsa Sadigh, Sergey Levine

Paper ID 93

Session 12. Robot learning foundation models

Poster Session day 3 (Thursday, July 18)

Abstract: Recent years have seen remarkable progress in robotics and imitation learning, with large-scale foundation models trained on data from a multitude of embodiments. The success of such policies might lead us to wonder: just how diverse can the robots in the training set be while still facilitating positive transfer? In this work, we study this question in the context of heterogeneous embodiments, examining how even seemingly very different domains, such as robotic navigation and manipulation, can provide benefits when included in the training data for the same model. We train a single goal-conditioned policy capable of controlling a robotic arm, quadcopter, quadruped, and mobile base. We then investigate the extent to which transfer can occur across navigation and manipulation by framing them as a single goal-reaching task. In particular, we find that co-training with navigation data can enhance the robustness and performance of goal-conditioned manipulation with a wrist-mounted camera. Finally, we deploy our policy, trained only on navigation and static manipulation data, on a mobile manipulator, showing that it can control a similar but novel embodiment zero-shot. These results provide evidence that large-scale robotic policies can benefit from data collected across a wide variety of embodiments.
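The key idea in the abstract is that navigation and manipulation share one interface: the policy maps a current camera image and a goal image to an action in a common action space, regardless of the robot. The sketch below illustrates that interface only; the encoder, the action head, and all names here are hypothetical stand-ins, not the paper's actual architecture.

```python
import numpy as np

def encode(image: np.ndarray) -> np.ndarray:
    # Placeholder visual encoder: in the paper's setting this would be a
    # learned network shared across embodiments; here we just average
    # pixels per channel to get a tiny feature vector.
    return image.mean(axis=(0, 1))

def goal_conditioned_policy(obs_image: np.ndarray,
                            goal_image: np.ndarray) -> np.ndarray:
    """Map (observation image, goal image) to a unit relative waypoint.

    Navigation (egocentric camera) and manipulation (wrist-mounted camera)
    are both framed as reaching a goal image, so a single policy head can
    serve an arm, a quadcopter, a quadruped, or a mobile base.
    """
    obs_feat = encode(obs_image)
    goal_feat = encode(goal_image)
    # Hypothetical action head: a normalized (dx, dy, dz) direction from
    # the feature difference; a trained policy would learn this mapping.
    delta = goal_feat - obs_feat
    return delta / (np.linalg.norm(delta) + 1e-8)

# Usage: the call signature is identical for every embodiment; only the
# source of the images changes.
obs = np.zeros((64, 64, 3), dtype=np.float32)
goal = np.ones((64, 64, 3), dtype=np.float32)
action = goal_conditioned_policy(obs, goal)
```

Because the interface is embodiment-agnostic, a dataset mixing navigation and static-manipulation trajectories can train this one function, which is what allows zero-shot control of a similar but novel embodiment such as a mobile manipulator.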