Scaling Robot Learning with Semantically Imagined Experience


Tianhe Yu
Google Brain
Ted Xiao
Google Inc
Jonathan Tompson
Google Inc
Austin Stone
Google Inc
Su Wang
Google Inc
Anthony Brohan
Google Research
Jaspiar Singh
Google Inc
Clayton Tan
Google Inc
Dee M
Google Inc
Jodilyn Peralta
Google Inc
Karol Hausman
Google Brain
Brian Ichter
Google Brain
Fei Xia
Google Inc

Paper ID 27

Session 4. Large Data and Vision-Language Models for Robotics

Poster Session Tuesday, July 11

Poster 27

Abstract: Recent advances in robot learning have shown promise in enabling robots to perform a variety of manipulation tasks and to generalize to novel scenarios. One of the key factors contributing to this progress is the scale of robot data used to train the models. To obtain large-scale datasets, prior approaches have relied either on demonstrations that require heavy human involvement or on engineering-intensive autonomous data collection schemes, both of which are difficult to scale up to the space of new tasks and skills needed for building generalist robots. To mitigate this issue, we propose an alternative route: leveraging text-to-image foundation models, widely used in computer vision and natural language processing, to obtain meaningful data for robot learning without requiring additional robot data. Specifically, we use state-of-the-art text-to-image diffusion models to perform aggressive data augmentation on top of our existing robotic manipulation datasets, inpainting a variety of unseen manipulation objects, backgrounds, and distractors with purely textual guidance. Through extensive real-world experiments, we show that manipulation policies trained on the augmented data are able to solve completely unseen tasks with new objects and are more robust to novel distractors. In addition, we find that the same diffusion-based augmentation improves the robustness and generalization of high-level robot learning tasks such as success detection.
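
To make the augmentation recipe in the abstract concrete, the sketch below shows text-guided inpainting applied to a single frame from an existing manipulation episode. It is a minimal illustration only: the paper's own pipeline uses a different (non-public) text-to-image editor, so the open-source Stable Diffusion inpainting pipeline from the diffusers library stands in here, and the file names (episode_frame.png, target_region_mask.png, augmented_frame.png), the prompt, and the relabeled instruction are all hypothetical placeholders.

```python
# Minimal sketch of diffusion-based, text-guided augmentation of robot data.
# Assumption: an open-source inpainting model stands in for the editor used
# in the paper; file names and the prompt below are illustrative only.
from PIL import Image
import torch
from diffusers import StableDiffusionInpaintPipeline

# Load a pretrained inpainting pipeline (stand-in for the paper's editor).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

# A frame from an existing episode and a mask over the region to repaint
# (e.g., the manipulated object, the background, or a distractor location).
frame = Image.open("episode_frame.png").convert("RGB").resize((512, 512))
mask = Image.open("target_region_mask.png").convert("RGB").resize((512, 512))

# The text prompt describes the semantically new content to imagine in
# place of the masked region, such as an object the robot has never seen.
prompt = "a metal sink with an orange ceramic mug inside it"

augmented = pipe(prompt=prompt, image=frame, mask_image=mask).images[0]
augmented.save("augmented_frame.png")

# The augmented frame would then be paired with a correspondingly relabeled
# language instruction (e.g., "pick orange mug from the sink") and added to
# the training set, with no additional robot data collection.
```

In practice such augmentation would be applied across many frames and episodes, varying the prompts over objects, backgrounds, and distractors so that downstream policies and success detectors see a much broader visual and semantic distribution than the original robot dataset provides.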