Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

Mingtong Zhang, Dhruv Shah

Paper ID 79

Session Imitation learning 1

Posters presented in the poster session following their oral. Locations not assigned.

Abstract: Robots deployed in the real world must learn from their experience and improve over time. This requires a mechanism for practicing and learning from feedback. In this paper, we propose a generator–verifier framework for autonomous improvement of generalist robot policies. We use a pre-trained policy as a “generator” and pair it with a gradient-free “visual verifier” that evaluates and selects actions at inference time. This framework enables inference-time steering that improves real-world performance without additional training. Across SIMPLER simulation and real-world DROID setups, we show that inference-time verification consistently improves policy performance over naïve execution, and that these gains hold across different choices of verifiers, including VLM-based and heuristic ones. Beyond inference-time steering, we demonstrate that verified rollouts provide effective supervision for offline policy improvement: policies fine-tuned on autonomously verified data achieve steep performance gains, with performance continuing to improve as more verified demonstrations are collected. Notably, we find that post-training with verified rollouts matches the efficiency of human expert demonstrations while requiring no human intervention. Our results highlight test-time verification as a practical and scalable mechanism for improving robotic policies during autonomous deployment.
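The inference-time steering described in the abstract can be illustrated with a minimal sketch. It assumes a best-of-N selection rule: the generator proposes several candidate actions, and the verifier scores each one without any gradient updates. The abstract does not specify the selection rule, so this is an assumption rather than the authors' implementation. The names `steer`, `make_toy_generator`, and `toy_verifier` are hypothetical, and the toy generator and verifier stand in for a pre-trained policy and a visual (VLM-based or heuristic) verifier.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = Tuple[float, ...]
Generator = Callable[[Sequence[float]], Action]
Verifier = Callable[[Sequence[float], Action], float]


def steer(
    generator: Generator,
    verifier: Verifier,
    observation: Sequence[float],
    num_candidates: int = 8,
) -> Tuple[Action, float]:
    """Sample candidate actions and return the one the verifier scores highest.

    Assumed best-of-N rule; the generator is only queried, never updated.
    """
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates: List[Action] = [generator(observation) for _ in range(num_candidates)]
    scores: List[float] = [verifier(observation, action) for action in candidates]
    best = max(range(num_candidates), key=scores.__getitem__)
    return candidates[best], scores[best]


def make_toy_generator(seed: int) -> Generator:
    """Hypothetical stand-in for a pre-trained policy: noisy guesses around a target."""
    rng = random.Random(seed)

    def generator(observation: Sequence[float]) -> Action:
        return tuple(value + rng.gauss(0.0, 1.0) for value in observation)

    return generator


def toy_verifier(observation: Sequence[float], action: Action) -> float:
    """Hypothetical heuristic verifier: negative squared distance to the target."""
    return -sum((a - o) ** 2 for a, o in zip(action, observation))


if __name__ == "__main__":
    target = (0.5, -0.2)
    _, naive_score = steer(make_toy_generator(0), toy_verifier, target, num_candidates=1)
    _, steered_score = steer(make_toy_generator(0), toy_verifier, target, num_candidates=16)
    print(f"naive execution score:   {naive_score:.3f}")
    print(f"verified selection score: {steered_score:.3f}")
```

With the same seed, the single-candidate call reproduces the first sample of the 16-candidate call, so the steered score can never be lower than the naive one in this toy setting. The same verifier score could, under a further assumption, be thresholded to decide which rollouts to keep as fine-tuning data.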