RT-1: Robotics Transformer for Real-World Control at Scale

Anthony Brohan

Google Research

Noah Brown

Google Research

Justice Carbajal

Google Research

Yevgen Chebotar

Google Inc

Joseph Dabis

Google Research

Chelsea Finn

Google Brain

Keerthana Gopalakrishnan

Google Inc

Karol Hausman

Google Brain

Alexander Herzog

Google X

Jasmine Hsu

Google Inc

Julian Ibarz

Google Inc

Brian Ichter

Google Brain

Alex Irpan

Google Inc

Tomas Jackson

Google Research

Sally Jesmonth

Google Research

Nikhil Joshi

Google Research

Ryan Julian

Google Inc

Dmitry Kalashnikov

Google Inc

Yuheng Kuang

Google Research

Isabel Leal

Google Research

Kuang-Huei Lee

Google Inc

Sergey Levine

Google Inc

Yao Lu

Google Research

Utsav Malla

Google Research

Deeksha Manjunath

Google Research

Igor Mordatch

Google Inc

Ofir Nachum

Google Inc

Carolina Parada

Google Inc

Jodilyn Peralta

Google Inc

Emily Perez

Google Inc

Karl Pertsch

Google Inc

Jornell Quiambao

Google Inc

Kanishka Rao

Google Inc

Michael S Ryoo

Google, Stony Brook University

Grecia Salazar

Google Inc

Pannag R Sanketi

Google Inc

Kevin Sayed

Google Inc

Jaspiar Singh

Google Inc

Sumedh Sontakke

Google Inc

Austin Stone

Google Inc

Clayton Tan

Google Inc

Huong Tran

Google Inc

Vincent Vanhoucke

Google Inc

Steve Vega

Google Inc

Quan H Vuong

Google Inc

Fei Xia

Google Inc

Ted Xiao

Google Inc

Peng Xu

Google Inc

Sichun Xu

Google Inc

Tianhe Yu

Google Brain

Brianna Zitkovich

Google Inc

Paper ID 25

Session 4. Large Data and Vision-Language Models for Robotics

Poster Session Tuesday, July 11

Poster 25

Abstract: By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks to a high level of performance, either zero-shot or with small task-specific datasets. While this capability has been demonstrated in other fields such as computer vision, natural language processing, and speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies in open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of data size, model size, and data diversity, based on a large-scale data collection on real robots performing real-world tasks.
