Michael Lutter (TU Darmstadt), Shie Mannor (Technion), Jan Peters (TU Darmstadt), Dieter Fox (NVIDIA), Animesh Garg (University of Toronto, Vector Institute, NVIDIA) |
Paper #007 |
Interactive Poster Session I | Interactive Poster Session IV |
When a control policy is transferred from simulation to a physical system, it must be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state distribution, and therefore fails when transferred to the physical system. In this paper, we present robust value iteration. This approach uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations of the system dynamics. The adversarial perturbations make the resulting optimal policy robust to changes in the dynamics. Using the continuous-time perspective of reinforcement learning, we derive the optimal perturbations for the states, actions, observations, and model parameters in closed form. The resulting algorithm does not require discretization of states or actions. Therefore, the optimal adversarial perturbations can be incorporated efficiently into the min-max value function update. We apply the resulting algorithm to the physical Furuta pendulum and cartpole. By changing the masses of the systems, we evaluate the quantitative and qualitative performance across different model parameters. We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
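To make the min-max value function update concrete, the following is a minimal sketch on a toy problem. It is not the method of the paper: it uses a small tabular MDP with discrete states and actions and a finite set of candidate transition models, whereas the paper works in continuous time with closed-form perturbations and no discretization. The function name `robust_value_iteration`, the two-state example, and all numerical values are assumptions chosen purely for illustration.

```python
import numpy as np


def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular min-max value iteration (illustrative stand-in, not the paper's algorithm).

    P_set: array of shape (K, S, A, S), K candidate transition models.
    R:     array of shape (S, A), rewards.
    The adversary picks the worst model for each (state, action) pair,
    and the agent picks the best action against that choice.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[k, s, a] = R[s, a] + gamma * sum_s' P_set[k, s, a, s'] * V[s']
        Q = R[None, :, :] + gamma * (P_set @ V)
        # min over models (adversary), then max over actions (agent)
        V_new = Q.min(axis=0).max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R[None, :, :] + gamma * (P_set @ V)
    policy = Q.min(axis=0).argmax(axis=1)
    return V, policy


# Hypothetical two-state, two-action example (values are made up).
nominal = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # state 0: action 0, action 1
    [[0.5, 0.5], [0.1, 0.9]],  # state 1: action 0, action 1
])
perturbed = np.array([
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.7, 0.3], [0.3, 0.7]],
])
R = np.array([[0.0, 0.0], [1.0, 0.5]])

V_robust, pi_robust = robust_value_iteration(np.stack([nominal, perturbed]), R)
V_nominal, pi_nominal = robust_value_iteration(nominal[None], R)
print("robust value: ", V_robust)
print("nominal value:", V_nominal)
```

Because the adversary's model set in this sketch contains the nominal model, the robust value is a lower bound on the nominal value in every state. That is the sense in which a min-max update trades nominal performance for robustness to changes in the dynamics.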