Osbert Bastani (University of Pennsylvania), Shuo Li (University of Pennsylvania), Anton Xue (University of Pennsylvania)
Paper #026
Interactive Poster Session I | Interactive Poster Session IV |
Reinforcement learning is a promising approach to solving hard robotics tasks. An important challenge is ensuring safety—e.g., that a walking robot does not fall over or an autonomous car does not crash into an obstacle. We build on an approach that composes the learned policy with a backup policy—it uses the learned policy on the interior of the region where the backup policy is guaranteed to be safe, and switches to the backup policy on the boundary of this region. The key challenge is checking when the backup policy is guaranteed to be safe. Our algorithm, statistical model predictive shielding (SMPS), uses sampling-based verification and linear systems analysis to perform this check. We prove that SMPS ensures safety with high probability, and empirically evaluate its performance on several benchmarks.
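The shielding scheme described above — run the learned policy while the backup policy is verifiably safe, and switch to the backup policy near the boundary of its safe region — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `dynamics`, `is_safe`, policy callables, and the rollout-based `is_backup_safe` check are all hypothetical stand-ins, and the actual SMPS check additionally uses linear systems analysis to certify behavior near equilibrium, which is omitted here.

```python
import numpy as np

def is_backup_safe(state, backup_policy, dynamics, is_safe,
                   horizon=50, n_samples=20, seed=0):
    """Sampling-based safety check (a sketch): roll out the backup
    policy from `state` under (possibly noisy) dynamics and verify
    that every visited state satisfies the safety predicate."""
    rng = np.random.default_rng(seed)
    if not is_safe(state):
        return False
    for _ in range(n_samples):
        s = state
        for _ in range(horizon):
            s = dynamics(s, backup_policy(s), rng)
            if not is_safe(s):
                return False
    return True

def shielded_policy(state, learned_policy, backup_policy, dynamics, is_safe):
    """Model predictive shielding: propose the learned action, and use it
    only if the backup policy remains verified safe from the resulting
    next state; otherwise fall back to the backup policy."""
    rng = np.random.default_rng(0)
    proposed_next = dynamics(state, learned_policy(state), rng)
    if is_backup_safe(proposed_next, backup_policy, dynamics, is_safe):
        return learned_policy(state)
    return backup_policy(state)
```

On a toy 1D system (state `x`, dynamics `x + u`, safe set `|x| <= 1`, a stabilizing backup `u = -0.5 x`), the shield passes the learned action through far from the boundary and overrides it when the learned action would leave the region from which the backup is verified safe.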