LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning


Yiyang Shao, Bike Zhang, Qiayuan Liao, Xiaoyu Huang, Yuman Gao, Yufeng Chi, Zhongyu Li, Sophia Shao, Koushil Sreenath

Paper ID 65

Session 7. Humanoids

Poster Session (Day 2): Sunday, June 22, 6:30-8:00 PM

Abstract: General-purpose humanoid robots are expected to interact intuitively with humans, enabling seamless integration into daily life. Natural language provides the most accessible medium for this purpose. However, translating language into humanoid whole-body motions remains a significant challenge, primarily due to the gap between linguistic understanding and physical actions. In this work, we present an end-to-end, language-directed policy for real-world humanoid control. Our approach combines reinforcement learning with policy distillation, allowing a single neural network to interpret language commands and directly execute the corresponding physical actions. To enhance motion diversity and compositionality, we incorporate a Conditional Variational Autoencoder (CVAE) structure. The resulting policy achieves agile and versatile whole-body behaviors conditioned on language inputs, with smooth transitions between different motions, enabling iterative and adaptable control. We validate the efficacy and generalizability of our method through extensive simulations and real-world experiments, demonstrating robust whole-body control.
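
The abstract does not give implementation details, so the following is only a minimal sketch of what a CVAE-structured, language-conditioned control policy could look like, assuming PyTorch, a precomputed sentence embedding from a frozen text encoder, and proprioceptive observations as inputs. The class name LangCVAEPolicy and all layer sizes are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (not the authors' code) of a CVAE-style language-conditioned
# policy: a language embedding is encoded into a latent motion code, which is
# decoded together with proprioception into joint actions. Dimensions are
# assumed placeholders.
import torch
import torch.nn as nn

class LangCVAEPolicy(nn.Module):
    def __init__(self, obs_dim=69, lang_dim=512, latent_dim=32, act_dim=19):
        super().__init__()
        # Encoder: language command + observation -> latent distribution params.
        self.encoder = nn.Sequential(
            nn.Linear(lang_dim + obs_dim, 256), nn.ELU(),
            nn.Linear(256, 2 * latent_dim),  # outputs mean and log-variance
        )
        # Decoder: latent motion code + proprioception -> joint-space actions.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + obs_dim, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs, lang_emb):
        mu, logvar = self.encoder(torch.cat([lang_emb, obs], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        action = self.decoder(torch.cat([z, obs], -1))
        # KL term regularizes the latent space so that similar commands map to
        # nearby motion codes, which supports smooth transitions between motions.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return action, kl
```

In a distillation setup consistent with the abstract, the decoded actions would be supervised against a teacher policy's actions while the KL term keeps the latent space well structured; this is one plausible reading of the pipeline, not a confirmed detail of the paper.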