Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation

Abhiram Maddukuri, Zhenyu Jiang, Lawrence Yunliang Chen, Soroush Nasiriany, Yuqi Xie, Yu Fang, Wenqi Huang, Zu Wang, Zhenjia Xu, Nikita Chernyadev, Scott Reed, Ken Goldberg, Ajay Mandlekar, Linxi Fan, Yuke Zhu

Paper ID 109

Session 11. Manipulation II

Poster Session (Day 3): Monday, June 23, 6:30-8:00 PM

Abstract: Large real-world robot datasets hold great potential for developing generalist robot policies, but scaling real-world data collection is time-consuming, costly, and resource-intensive. Simulation offers a promising solution, with recent advances in generative AI and synthetic data generation tools enabling the creation of large-scale robot demonstration datasets while reducing human effort. However, when training policies solely on data from simulation, we must address the sim-to-real gap, which often requires extensive human effort to carefully align simulation with the real world. Recent work has suggested that training on a mixture of simulation and real-world datasets holds great promise for improving policy performance, yet a systematic understanding of how to effectively leverage simulation data for real-world vision-based manipulation remains lacking. In this work, we present a simple recipe for effectively utilizing simulation data in real-world manipulation tasks. We derive these insights from comprehensive experiments comparing co-training on various simulation and real-world datasets. Using two domains (a robot arm and a humanoid) across diverse tasks, we demonstrate that simulation data can significantly enhance real-world task performance, even with notable differences between the simulation and real-world data. Through controlled experiments, we provide guidelines on how to optimize across different factors in simulation data to enable successful real-world transfer.
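
As a rough illustration of what training on a mixture of simulation and real-world data can look like, the following Python sketch draws each training batch from both sources according to a mixing ratio. The function name `sample_cotraining_batch`, the `sim_ratio` parameter, and the per-sample sampling scheme are assumptions made for illustration only; the abstract does not specify how the mixture is constructed, so this should not be read as the authors' recipe.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_cotraining_batch(
    sim_data: Sequence[T],
    real_data: Sequence[T],
    batch_size: int,
    sim_ratio: float,
    rng: random.Random,
) -> List[Tuple[str, T]]:
    """Draw one batch that mixes simulation and real-world samples.

    Hypothetical helper: each slot in the batch is filled from the
    simulation dataset with probability `sim_ratio`, otherwise from the
    real-world dataset. Samples are drawn with replacement.
    """
    if not 0.0 <= sim_ratio <= 1.0:
        raise ValueError("sim_ratio must be in [0, 1]")
    if not sim_data or not real_data:
        raise ValueError("both datasets must be non-empty")

    batch: List[Tuple[str, T]] = []
    for _ in range(batch_size):
        # rng.random() is in [0, 1), so sim_ratio=1.0 always picks sim
        # and sim_ratio=0.0 always picks real.
        if rng.random() < sim_ratio:
            batch.append(("sim", rng.choice(sim_data)))
        else:
            batch.append(("real", rng.choice(real_data)))
    return batch


if __name__ == "__main__":
    # Placeholder demonstrations; real datasets would hold image/action pairs.
    sim_demos = [f"sim_demo_{i}" for i in range(1000)]
    real_demos = [f"real_demo_{i}" for i in range(20)]
    rng = random.Random(0)
    batch = sample_cotraining_batch(sim_demos, real_demos, 8, 0.5, rng)
    for source, demo in batch:
        print(source, demo)
```

Sampling by ratio rather than concatenating the two datasets keeps the much smaller real-world set from being swamped by the simulation set; the best value of such a ratio is an empirical question of the kind the controlled experiments are meant to answer.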