TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

Matthew M. Hong, Jesse Zhang, Anusha Nagabandi, Abhishek Gupta

Paper ID 208

Session Imitation learning 3

Posters presented in the poster session following their oral. Locations not assigned.

Abstract: Efficient exploration remains a bottleneck in reinforcement learning (RL), particularly for long-horizon, high-dimensional tasks. While recent methods leverage pre-trained policies for guidance, they are often constrained by the base policy’s original behavior distribution. We introduce Timestep-Modulated Reinforcement Learning (TMRL), a framework that enables agents to explore dynamically beyond these boundaries. TMRL uses a forward diffusion process to inject noise into the context of a pre-trained policy, effectively aliasing nearby states so that they share exploration modes. By training an RL policy to modulate the diffusion timestep at deployment, the agent can adaptively control conditioning strength, balancing marginal and conditional behaviors. Experimental results in navigation and robotic manipulation show that TMRL significantly outperforms existing baselines, indicating that timestep modulation is a robust mechanism for adapting action sequences to novel tasks.
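
The noise-injection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard DDPM-style forward process with a linear beta schedule, and the names `alpha_bar`, `noise_context`, and `signal_fraction` are hypothetical. The abstract does not specify the schedule, the number of steps, or how the timestep is chosen, so those are assumptions made here for illustration only.

```python
import numpy as np


def alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention coefficients.

    Assumption: a linear beta schedule; the abstract does not name one.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_context(obs, t, alpha_bars, rng):
    """Forward-diffuse an observation (the policy's context) to timestep t.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    """
    eps = rng.standard_normal(obs.shape)
    return np.sqrt(alpha_bars[t]) * obs + np.sqrt(1.0 - alpha_bars[t]) * eps


def signal_fraction(t, alpha_bars):
    """Weight on the clean observation at timestep t (1 = fully conditional)."""
    return float(np.sqrt(alpha_bars[t]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a_bars = alpha_bar()
    state = np.array([1.0, -0.5])
    # Hand-picked timesteps; in TMRL an RL policy would choose the timestep.
    for t in (0, 250, 999):
        noisy = noise_context(state, t, a_bars, rng)
        print(t, round(signal_fraction(t, a_bars), 4), noisy)
```

At a small timestep the noised context stays close to the true state, so a policy conditioned on it behaves conditionally. At a large timestep the context is close to pure noise, so nearby states become indistinguishable and the policy falls back toward its marginal behavior. Choosing the timestep is the control knob the abstract describes.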