Demonstrating Large Language Models on Robots

Google DeepMind

Paper ID 24

Nominated for Best Demo Paper

Session 3. Self-supervision and RL for Manipulation

Demo

Poster Session Tuesday, July 11

Poster 24

Abstract: Robots may benefit from large language models (LLMs), which have demonstrated strong reasoning capabilities across various domains. This demonstration includes several systems based on recent methods that integrate LLMs on robots: SayCan, Socratic Models, Inner Monologue, and Code as Policies. While each algorithm highlights a different mode of grounding, they all share a common system-level structure in that they use LLMs to take as input natural language instructions and generate robot plans in the form of step-by-step procedures or code. This structure provides several practical perks for demonstration in that (i) we can use existing video chat interfaces to instruct the robot by typing commands and broadcasting its movements in action via video streaming, (ii) one can seamlessly switch between interfaces that communicate with different robots, and (iii) this can all be done remotely on a laptop, where the robots on real hardware can be held on standby in the lab ready to run on command. Our tentative plan is to show at least one system running on real hardware remotely – Inner Monologue or Code as Policies, and solicit task instructions from a live audience. Time-permitting we may also demonstrate the other systems available to run on real hardware. Otherwise, we will present recorded videos of past runs. We will link to open-source code, and conclude with a discussion of open research questions in the area.