Learning Agile Robotic Locomotion Skills by Imitating Animals


Xue Bin Peng , Erwin Coumans , Tingnan Zhang , Tsang-Wei Lee , Jie Tan , Sergey Levine

Abstract

Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually-designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise of the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire of behaviors for legged robots. By incorporating sample-efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.
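As a rough illustration of the motion-imitation idea described in the abstract, the sketch below shows a generic pose-tracking reward that scores how closely the robot's joint angles follow a reference motion frame. This is a minimal example of the general form used in motion-imitation work, not the paper's exact reward; the weight w and the 18-DoF example values are assumptions.

```python
# Minimal sketch of a pose-tracking imitation reward (illustrative only; the
# paper's actual reward terms and weights may differ, and w=5.0 is an assumed value).
import numpy as np

def pose_tracking_reward(q_robot, q_ref, w=5.0):
    """Reward in (0, 1]: approaches 1 as the robot's joint angles match the reference pose."""
    err = np.sum((np.asarray(q_ref) - np.asarray(q_robot)) ** 2)
    return float(np.exp(-w * err))

# Example: an 18-DoF pose that is close to, but not exactly, the reference frame.
q_ref = np.zeros(18)
q_robot = q_ref + 0.05 * np.ones(18)
print(pose_tracking_reward(q_robot, q_ref))  # ~0.80
```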

Live Paper Discussion Information

Start Time: 07/15 15:00 UTC
End Time: 07/15 17:00 UTC

Virtual Conference Presentation

Supplementary Video

Paper Reviews

Review 1

The paper presents a comprehensive study on mapping mocap animal gaits onto the Laikago robot. A technical contribution is a modification to [Yu-Liu-Turk-ICLR-2019], adding an information bottleneck (IB) via a stochastic encoder. The aim of this is to prevent potential overfitting that could result in the learned policy being brittle in ways that are not necessarily observed during the adaptation process. The adaptation algorithm itself is also slightly different. The paper represents a thorough study of the possibilities of leveraging imitation learning for quadruped robots. The results will be of broad interest to the community and will inspire future work.

Understanding when and why adaptation is necessary would be interesting to speculate on; e.g., [Hwangbo 2019] do not perform adaptation, aside from the learned motor dynamics. The only critique I have is that the "overfitting" problem that the information bottleneck aims to address could be better documented. Currently it seems fairly minimal; e.g., in Fig 10, the beta=10^-3 curve in general does almost as well as the beta=10^-4 curve. Consider adding a further large value of beta to these plots, to better document the problems of overfitting. Maybe I missed it, but it would be interesting to know how beneficial it would be to adapt across the ensemble of skills, as opposed to the individual skills. Perhaps with the ensemble of skills, the information-bottleneck "regularization" may not be needed, given that it would be more difficult to overfit in some particular way. I'm also curious as to how similar the final parameter estimates are for the different skills, i.e., how skill-specific are the adaptations?

Suggestions:
- The paper is specific to quadruped robots, so including "quadruped" in the title would be more precise and help make it findable for others working on quadrupeds.
- Abstract: could this be shortened? The first four sentences are all about the context. Only in sentence 5 do we get to "In this work, we present ...", which would be fine as the starting sentence for an abstract.
- Excellent and comprehensive review of the related work.
- Use of a low-pass filter to smooth motions: suggest providing the details and the time constant for the low-pass filter. E.g., for basic smoothing, s_t = alpha*x_t + (1-alpha)*s_{t-1}, provide either alpha or tau, where tau = dT/alpha and dT is the sampling rate (see the smoothing sketch after this review).
- Algo 1, Line 9: as explained in the text, the argmax is not what is actually used, so perhaps reflect this in the algorithm description.
- Fig 5: The fair presentation of the results is greatly appreciated.
- Fig 10: This is an important ablation/set of tests to see.
- Fig 5, Fig 7: I don't understand what the "Adaptive (Before)" results refer to. Presumably it is with parameters corresponding to the middle of their ranges as given in Table 1? Maybe I missed this in the text.
- How difficult is it for Laikago to do a pace? Its wide body would seem to preclude this.
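For reference on the low-pass filter point in the suggestions above, here is a minimal, self-contained sketch of the first-order exponential smoothing the reviewer writes out, s_t = alpha*x_t + (1-alpha)*s_{t-1}. The alpha and sampling period dT below are placeholder values, not the ones used in the paper.

```python
# Sketch of the basic first-order (exponential) low-pass filter the reviewer refers to.
# alpha and dT are assumed placeholder values for illustration.
import numpy as np

def exponential_smooth(x, alpha):
    """Apply s_t = alpha * x_t + (1 - alpha) * s_{t-1} along the first axis."""
    x = np.asarray(x, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1.0 - alpha) * s[t - 1]
    return s

# Example: smoothing a noisy joint-angle trajectory sampled at dT = 1/30 s.
dT, alpha = 1.0 / 30.0, 0.2
t = np.arange(0.0, 2.0, dT)
noisy = np.sin(2.0 * np.pi * t) + 0.1 * np.random.randn(len(t))
smoothed = exponential_smooth(noisy, alpha)
# The reviewer's suggested time constant: tau = dT / alpha (here ~0.17 s).
```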

Review 2

This is a well-written paper, and it was a pleasure to read. I think the value of this work is that it builds an end-to-end framework and demonstrates a number of very agile movements, trained from animation data, on a quadruped robot. The system as a whole is new, though some of the components are not very novel: the motion retargeting is pretty standard, and the training of control policies using RL largely follows existing work. The domain adaptation is based on the algorithm of [65,67], with a key change to enforce an information bottleneck during training (a generic sketch of such a penalty follows this review). While the ablation study on the IB is convincing, I wonder how the IB compares to directly reducing the dimensionality of the latent space. In addition, the dimension of the latent space is not provided in the paper. Is it the same for all the motions? The supplementary video includes a failure case which is not a real failure in my opinion, because the real robot follows the simulation well. I am actually curious about the cases in which sim-to-real transfer will fail. It would be appreciated to have such an example in the paper.
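Both reviews refer to an information bottleneck imposed on a stochastic encoding of the dynamics parameters. The sketch below shows the generic form such a penalty usually takes: a KL term toward a unit-Gaussian prior scaled by a coefficient beta. This is a standard VIB-style term given for orientation only, not the paper's exact objective; the latent dimension and beta value are assumptions (beta chosen to match the order of magnitude discussed in Review 1).

```python
# Generic information-bottleneck penalty on a stochastic latent z ~ N(mu, diag(exp(log_var))).
# Illustrative only: latent dimension (8) and beta (1e-4) are assumed values.
import numpy as np

def gaussian_kl_to_unit_prior(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for a diagonal Gaussian."""
    mu, log_var = np.asarray(mu), np.asarray(log_var)
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# Example: an 8-dimensional latent encoding and beta = 1e-4 (cf. the beta values in Fig 10).
mu = 0.3 * np.ones(8)
log_var = np.log(0.5 * np.ones(8))
beta = 1e-4
penalty = beta * gaussian_kl_to_unit_prior(mu, log_var)
# Subtracting such a penalty from the training objective pushes the encoder to discard
# dynamics information that the policy does not actually need, which is the kind of
# regularization against overfitting that the reviews discuss.
```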