Interactive Knowledge Distillation with Adaptive Teachers in Cooperative Multi-Agent Reinforcement Learning

Minwoo Cho, Batuhan Altundas, Matthew Craig Gombolay

Paper ID: 40

Session: Multi-robot Systems

Poster session details TBA

Abstract: Knowledge distillation (KD) has the potential to accelerate multi-agent reinforcement learning (MARL) by employing a centralized teacher for decentralized students. However, centralized teachers in MARL often fail because decentralized student exploration induces out-of-distribution (OOD) state distributions the teacher was never trained on, compounded by partial observability, which creates observation mismatches between teacher and students at execution time. We propose HINT (Hierarchical INteractive Teacher-based transfer), a novel KD framework for MARL in a centralized training, decentralized execution setup. By leveraging hierarchical RL, HINT provides a scalable, high-performing teacher. Pseudo off-policy RL treats student trajectories as additional training data for the teacher, allowing it to adapt its policy to student-induced state distributions. Performance-based filtering removes teacher guidance that depends on centralized observations unavailable to decentralized students, retaining only outcome-relevant signals. Across FireCommander and MARINE, HINT consistently outperforms state-of-the-art online MARL baselines, improving task success rates by 60%–165%.
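
To make the performance-based filtering idea in the abstract concrete, here is a minimal Python sketch that keeps teacher guidance only from the student rollouts with the highest returns. The `Episode` layout, the `keep_fraction` parameter, and the top-fraction rule are all assumptions made for illustration; the abstract does not specify how HINT scores or filters guidance, so this should not be read as the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Episode:
    """One student rollout relabeled with teacher suggestions (hypothetical layout)."""
    observations: List[Sequence[float]]  # decentralized student observations, one per step
    teacher_actions: List[int]           # teacher's suggested action at each step
    episode_return: float                # task outcome achieved in this rollout


def filter_by_performance(episodes: List[Episode], keep_fraction: float = 0.5) -> List[Episode]:
    """Keep the top `keep_fraction` of episodes ranked by return (assumed rule)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not episodes:
        return []
    # Rank rollouts from best to worst outcome.
    ranked = sorted(episodes, key=lambda ep: ep.episode_return, reverse=True)
    # Always keep at least one episode so the distillation set is never empty.
    n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def distillation_pairs(episodes: List[Episode]) -> List[Tuple[Sequence[float], int]]:
    """Flatten kept episodes into (student observation, teacher action) targets."""
    return [
        (obs, act)
        for ep in episodes
        for obs, act in zip(ep.observations, ep.teacher_actions)
    ]


if __name__ == "__main__":
    rollouts = [
        Episode([[0.0]], [1], 1.0),
        Episode([[1.0]], [0], 5.0),
        Episode([[2.0]], [2], 3.0),
        Episode([[3.0]], [1], -1.0),
    ]
    kept = filter_by_performance(rollouts, keep_fraction=0.5)
    print(distillation_pairs(kept))
```

Ranking by episode return is only one possible proxy for "outcome-relevant"; a per-step advantage estimate or a success flag could play the same role under different assumptions.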