TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning


Matthew M. Hong, Jesse Zhang, Anusha Nagabandi, Abhishek Gupta

Paper ID: 208

Session: Imitation learning 3

Poster session: details TBA

Abstract: Efficient exploration remains a bottleneck in reinforcement learning (RL), particularly for long-horizon, high-dimensional tasks. While recent methods leverage pre-trained policies for guidance, they are often constrained by the base policy’s original behavior distribution. We introduce Timestep Modulated Reinforcement Learning (TMRL), a framework that enables agents to explore dynamically beyond these boundaries. TMRL leverages a forward diffusion process to inject noise into the context of a pre-trained policy, effectively aliasing nearby states to facilitate shared exploration modes. By training an RL policy to modulate the diffusion timestep at deployment, the agent can adaptively control conditioning strength, balancing marginal and conditional behaviors. Experimental results in navigation and robotic manipulation demonstrate that TMRL significantly outperforms existing baselines, indicating that timestep modulation is a robust mechanism for adapting action sequences to novel tasks.
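
To make the noising step in the abstract concrete, here is a minimal sketch of how a forward diffusion process can inject timestep-dependent noise into a policy's context. The abstract does not give the noise schedule or the interface to the pre-trained policy, so this assumes the standard DDPM forward process with a linear beta schedule; the names `make_alpha_bar` and `noise_context` are hypothetical and not taken from the paper.

```python
import numpy as np


def make_alpha_bar(num_steps: int = 1000,
                   beta_start: float = 1e-4,
                   beta_end: float = 0.02) -> np.ndarray:
    """Cumulative signal-retention coefficients for an ASSUMED linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_context(obs: np.ndarray,
                  t: int,
                  alpha_bar: np.ndarray,
                  rng: np.random.Generator) -> np.ndarray:
    """Standard forward-diffusion sample: sqrt(a_t) * obs + sqrt(1 - a_t) * eps.

    Small t keeps the context almost intact (strong conditioning);
    large t replaces it with near-pure Gaussian noise (weak conditioning).
    """
    eps = rng.standard_normal(obs.shape)
    return np.sqrt(alpha_bar[t]) * obs + np.sqrt(1.0 - alpha_bar[t]) * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha_bar = make_alpha_bar()
    obs = np.array([0.5, -1.0, 2.0, 0.0])  # hypothetical observation vector
    for t in (0, 500, 999):
        print(t, noise_context(obs, t, alpha_bar, rng))
```

In the setting the abstract describes, the timestep `t` would be chosen by the trained RL policy rather than fixed by hand: low values keep the pre-trained policy close to its conditional behavior, while high values push it toward its marginal behavior, since nearby observations become indistinguishable once noise dominates.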