ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation


Jayjun Lee, Nima Fazeli

Paper ID 54

Session 6. Manipulation I

Poster Session (Day 2): Sunday, June 22, 6:30-8:00 PM

Abstract: Mastering dexterous, contact-rich object manipulation demands precise estimation of both in-hand object poses and external contact locations—tasks that are particularly challenging due to partial and noisy observations. We present ViTaSCOPE: Visuo-Tactile Simultaneous Contact and Object Pose Estimation, a neural implicit representation that fuses vision and high-resolution tactile feedback for contact-aware 3D object reconstruction. By representing objects as signed distance fields and conditioning on shear-field data from tactile sensors alongside visual feedback, ViTaSCOPE accurately localizes objects and registers extrinsic contacts onto their 3D geometry. Our method enables seamless reasoning over complementary visuo-tactile cues and bridges the sim-to-real gap by leveraging simulation for scalable training. We evaluate our method through comprehensive simulated and real-world experiments, demonstrating its capabilities in dexterous manipulation scenarios.
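For intuition, the core idea of a conditioned signed distance field can be sketched as a small network that maps a 3-D query point, together with a latent code summarizing visuo-tactile observations, to a signed distance value. The sketch below is illustrative only: the dimensions, weights, and function names are assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 3-D query point,
# 16-D latent code standing in for fused visuo-tactile features.
POINT_DIM, LATENT_DIM, HIDDEN = 3, 16, 64

# Randomly initialized weights stand in for a trained network.
W1 = rng.normal(0.0, 0.1, (POINT_DIM + LATENT_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, 1))
b2 = np.zeros(1)

def conditional_sdf(points, latent):
    """Signed distance at each 3-D query point, conditioned on a latent code.

    points: (N, 3) array of query points
    latent: (LATENT_DIM,) observation code shared across all queries
    returns: (N,) array of signed distances (negative inside the surface)
    """
    z = np.broadcast_to(latent, (points.shape[0], LATENT_DIM))
    x = np.concatenate([points, z], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)  # single hidden layer with ReLU
    return (h @ W2 + b2).squeeze(-1)

# Query five random points under one observation code.
pts = rng.normal(size=(5, POINT_DIM))
code = rng.normal(size=LATENT_DIM)
d = conditional_sdf(pts, code)
print(d.shape)  # (5,)
```

In an estimation pipeline of this kind, pose hypotheses can then be scored by how well observed surface points evaluate to near-zero signed distance under the current latent code.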