Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators

Alexander Herzog

Google X

Kanishka Rao

Google Inc

Karol Hausman

Google Brain

Yao Lu

Google Research

Paul Wohlhart

Google Inc

Mengyuan Yan

Google Inc

Jessica Lin

Everyday Robots

Montserrat Gonzalez Arenas

Google Inc

Ted Xiao

Google Inc

Daniel Kappler

Google X

Daniel Ho

Google Inc

Jarek Rettinghouse

Everyday Robots

Yevgen Chebotar

Google Inc

Kuang-Huei Lee

Google Inc

Keerthana Gopalakrishnan

Google Inc

Ryan Julian

Google Inc

Adrian Li

Wayve

Chuyuan Fu

Everyday Robots

Bob Wei

Everyday Robots

Sangeetha Ramesh

Everyday Robots

Khem Holden

Google Inc

Kim Kleiven

Everyday Robots

David J Rendleman

Google Inc

Sean Kirmani

Everyday Robots

Jeffrey Bingham

Everyday Robots

Jonathan Weisz

Everyday Robots

Ying Xu

Everyday Robots

Wenlong Lu

Everyday Robots

Matthew Bennice

Everyday Robots

Cody Fong

Everyday Robots

David Do

Everyday Robots

Jessica Lam

Everyday Robots

Yunfei Bai

Google X

Benjie Holson

Google X

Michael Quinlan

Google X

Noah Brown

Google Inc

Mrinal Kalakrishnan

Google X

Julian Ibarz

Google Inc

Peter Pastor

Google X

Sergey Levine

Google Inc

Paper ID 22

Session 3. Self-supervision and RL for Manipulation

Poster Session Tuesday, July 11

Poster 22

Abstract: We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms but also the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems to boost generalization to novel objects while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation consists of 4800 evaluation trials across 240 waste station configurations, which we use to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects.