Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions

Jason Choi (UC Berkeley); Fernando CastaƱeda (UC Berkeley); Claire Tomlin (UC Berkeley); Koushil Sreenath (Berkeley)


In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-ouput linearization controller based on a nominal model along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework which learns the model uncertainty present in the CBF and CLF constraints, as well as other control-affine dynamic constraints in the quadratic program. The trained policy is combined with the nominal model-based CBF-CLF-QP, resulting in the Reinforcement Learning-based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The performance of the proposed method is validated by testing it on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one step preview, obtaining stable and safe walking under model uncertainty.

Live Paper Discussion Information

Start Time End Time
07/16 15:00 UTC 07/16 17:00 UTC

Virtual Conference Presentation

Paper Reviews

Review 1

The presentation is clear and the reader can follow the main ideas. Even though the paper does not present any theoretical guarantees on feasibility (rightfully noted in the discussion), the articulation of the derivation of the framework and the application to bipedal robots are both convincing. Bibliographical entries should be given a general check for consistency and typos.

Review 2

This is an important contribution to RSS, and addresses some of the pressing problems in the real-world implementation of QP based control laws. In addition, given that there are sim-to-real problems with the direct deployment of DNN based control laws, this manuscript provides a fusion of QP and RL based methods for walking. In other words, this method not only learns the uncertainties in the system, but also provides safety guarantees that are very critical for proper deployment of control in hardware. However, there are some concerns in the manuscript that require a thorough analysis. They are listed below: 1) It is important to note that measuring accelerations are hard, since they are very noisy. There is no discussion on how this can be addressed in hardware. In addition, the losses used for training are dependent on the acceleration terms. I feel that this will be detrimental to the implementation of these types of QPs in hardware. 2) This has been a common problem with all the CLF-QP implementations that I have seen: why feedback linearization based QPs? There are no alternative CLFs? Why so much reliance on IO linearization? In other words, why should the RL framework learn the uncertainties in a IO control law? 3) I could be wrong, but I have not seen an experimental implementation of the exponential type of CBFs that were described in the manuscript (for walking). What are the possible reasons? And how can this limit the implementation of methods proposed in this manuscript? 4) There was no detailed discussion on the feasibility, although there was a short sentence in VIII. Why is the proposed CLF-CBF-QP not feasible? 5) What happens if the model is scaled down (to 0.5) instead of scaling up. Also what is the affect of joint friction, damping, introduction of motor losses, bending of links etc. These are some of the problems associated with the experimental implementation. 6) DDPG was used for learning the uncertainty. Please comment on other techniques like PPO, SAC etc. This direction is promising, and can serve as the first steps towards bridging the gap between model based methods and learning. Given that this is a simulation result, I feel that a detailed analysis of the practicality of this method is required. There should be a discussion on what are the potential drawbacks, and provide some resolutions.

Review 3

Per the reviewer's knowledge, the paper presents an original work. It is well written and easy to follow. The proposed approach has the potential to significantly impact on addressing model uncertainty in robotics and autonomous systems. Here are some detailed comments: - In (27) and (28): the decoupling matrix in u_{\theta} is defined in terms of \tilde{f} and \tilde{g}. Why they become identity when applying it to dynamics in (2)? Also, should \alpha and \beta be switched in (28), according to the definition in (27)? - The authors claim that that \dot{V_{\epsilon}} and B^{(r_b)} are computed using numerical differentiation. I wonder how much noise such numerical differentiation introduces to the computation, especially for high relative degree CBFs. My concern is that the process may introduce noisy control inputs that destabilize the system. - What is the simulation environment for training and validation? How do you model the groud contacts? The simulation results show that the friction cone constraints are violated in some cases, but the robot continues to walk. In reality, the violation could cause a failure in walking. - While the RL-CLF-QP method has better performance in satisfying the friction cone constraints, it exhibits larger tracking errors than the standard L1 method. It would be nice if the authors provide a plausible explanation of such phenomena.