Learning to Evolve: Multi-modal Interactive Fields for Robust Humanoid Navigation in Dynamic Environments

Peifeng Jiang, Hong Liu, Jin Jin, Wenshuai Wang, Xia Li

Paper ID 23

Session Humanoids

Poster session details TBA

Abstract: Achieving safe manipulation-oriented navigation for humanoid robots is fundamentally challenged by two factors: locomotion-induced perceptual noise (causing semantic-geometry distortion) and changes within the environment (causing map-reality mismatches). Existing static scene graphs often fail under these conditions, leading to interaction failures. To address this, we introduce the Multi-modal Interaction Field (MIF), a hierarchical framework that transforms the robot from a passive map-user into an active knowledge-evolver. MIF constructs three synergistic fields: (i) a denoised Appearance Field that uses confidence-gated 3D Gaussian Splatting to suppress gait oscillation noise; (ii) a hierarchical Spatial Field for semantic reasoning; and (iii) a Geometry Field that leverages a Flow Matching-based generative model to reconstruct high-fidelity meshes for rigorous Interaction Pose Safety (IPS) verification against the target object. Crucially, we propose a closed-loop Interaction and Adaptation Mechanism to adapt to environmental changes. By monitoring a multi-modal discrepancy score $\mathcal{D}$, the system autonomously distinguishes between sensor noise and genuine environmental changes (e.g., relocated objects) and, when a genuine change is detected, triggers a local evolution loop to rectify obsolete memory. Real-world experiments on a Unitree-G1 humanoid demonstrate that MIF significantly outperforms static baselines (HOV-SG), improving the success rate in dynamic relocation scenarios from 12% to 94%, while reducing semantic memory footprint by 91.4% via feature distillation.
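
The abstract does not specify how the discrepancy score D is computed or thresholded. As a rough illustration only, the sketch below shows one way a monitor might separate transient sensor noise from a persistent environmental change: it combines per-modality discrepancies into a weighted score and reports a change only when the score stays high across several recent frames. The class name `DiscrepancyMonitor`, the weights, the threshold, and the window sizes are all hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DiscrepancyMonitor:
    """Hypothetical monitor; every default below is an illustrative assumption."""
    weights: tuple = (0.4, 0.3, 0.3)   # appearance, spatial, geometry (assumed)
    threshold: float = 0.5             # score above this counts as a "hit" (assumed)
    window: int = 5                    # number of recent frames remembered (assumed)
    min_hits: int = 4                  # hits in the window needed to declare a change (assumed)
    history: deque = field(init=False, repr=False)

    def __post_init__(self):
        # Bounded buffer of booleans: True where the score exceeded the threshold.
        self.history = deque(maxlen=self.window)

    def score(self, appearance: float, spatial: float, geometry: float) -> float:
        """Weighted sum of per-modality discrepancies, each assumed to lie in [0, 1]."""
        w_a, w_s, w_g = self.weights
        return w_a * appearance + w_s * spatial + w_g * geometry

    def update(self, appearance: float, spatial: float, geometry: float) -> str:
        """Classify the current frame as 'consistent', 'noise', or 'change'."""
        d = self.score(appearance, spatial, geometry)
        self.history.append(d > self.threshold)
        if sum(self.history) >= self.min_hits:
            return "change"      # persistent mismatch: memory is likely obsolete
        if d > self.threshold:
            return "noise"       # isolated spike: treated as a sensor artifact
        return "consistent"


if __name__ == "__main__":
    monitor = DiscrepancyMonitor()
    frames = [(0.1, 0.1, 0.1)] + [(0.9, 0.9, 0.9)] * 4
    for frame in frames:
        print(monitor.update(*frame))
```

In this sketch, a "change" result would be the point at which a local update of the stored map is triggered, while "noise" frames are ignored. The persistence window is a design choice made here for illustration; the authors' mechanism may use a different criterion.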