Abstract: Large real-world robot datasets hold great potential for developing generalist robot policies, but scaling real-world data collection is time-consuming, costly, and resource-intensive. Simulation offers a promising alternative: recent advances in generative AI and synthetic data generation tools enable the creation of large-scale robot demonstration datasets with greatly reduced human effort. However, policies trained solely on simulation data must overcome the sim-to-real gap, which often requires extensive human effort to carefully align the simulation with the real world. Recent work suggests that training on a mixture of simulation and real-world datasets holds great promise for improving policy performance, yet a systematic understanding of how to effectively leverage simulation data for real-world vision-based manipulation is still lacking. In this work, we present a simple recipe for effectively utilizing simulation data in real-world manipulation tasks. We derive these insights from comprehensive experiments that compare co-training on various combinations of simulation and real-world datasets. Across two domains, a robot arm and a humanoid, and a diverse set of tasks, we demonstrate that simulation data can significantly enhance real-world task performance, even when the simulation and real-world data differ notably. Through controlled experiments, we provide guidelines on how to optimize different factors of simulation data to enable successful real-world transfer.
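To make the co-training setup concrete, the sketch below illustrates one common way to mix simulation and real-world demonstrations during training: weighted sampling so that, in expectation, each batch draws a fixed fraction from each source. This is a minimal illustration under stated assumptions, not the paper's implementation; the dataset stand-ins and the mixing ratio `alpha` are hypothetical.

```python
# Minimal sketch of sim-and-real co-training via a weighted data mixture.
# `alpha` (the fraction of each batch drawn from simulation) and the dataset
# sizes/shapes below are illustrative assumptions, not the paper's values.
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Stand-ins for demonstration datasets: (observation, action) pairs.
sim_data = TensorDataset(torch.randn(10_000, 64), torch.randn(10_000, 7))  # large, cheap
real_data = TensorDataset(torch.randn(500, 64), torch.randn(500, 7))       # small, costly

alpha = 0.5  # hypothetical sim-to-real mixing ratio

# Weight each example so that a fraction `alpha` of every batch comes from
# simulation and `1 - alpha` from the real world, regardless of raw sizes.
weights = torch.cat([
    torch.full((len(sim_data),), alpha / len(sim_data)),
    torch.full((len(real_data),), (1 - alpha) / len(real_data)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
loader = DataLoader(ConcatDataset([sim_data, real_data]),
                    batch_size=256, sampler=sampler)

for obs, act in loader:
    ...  # one co-training step on the mixed batch, e.g. a behavior-cloning loss
```

Sampling with replacement keeps the effective mixture ratio fixed even when the simulation set is orders of magnitude larger than the real one, which is the typical regime this line of work targets.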