(59ac) Control Lyapunov-Barrier Function-Based Safe Reinforcement Learning for Nonlinear Optimal Control | AIChE

Authors 

Wang, Y., National University of Singapore
Modern industries have become increasingly focused on achieving optimal performance in their operations to address the challenge of balancing resource and environmental concerns in the new era [1]. Meanwhile, ensuring system safety is essential to prevent unforeseen events and to protect human life and the environment. Many industrial processes must keep parameters such as temperature, pressure, and concentration within specific safe operating ranges to prevent unsafe operation [2]. Given the practical need to balance optimality and safety, it is important to study the problem of safe optimal control.

Reinforcement learning (RL) aims to learn an optimal policy that maximizes a user-provided reward function. Unlike supervised learning, which is trained on a data set collected offline in advance, RL collects data online and learns the optimal policy by interacting with the environment through trial and error, using feedback from its own actions and experiences; this exploration may lead to unsafe process operation during data collection. System safety therefore requires particular attention, since exceeding the allowable range of system states and/or actuators can cause severe consequences. To achieve optimality while ensuring system safety, various techniques incorporating RL have recently been studied [3-6]. For example, a barrier function-based safe RL (SRL) scheme is developed in [5], which designs a barrier function-based RL control strategy to ensure safety during learning. However, since RL requires an admissible control law that ensures stability and safety at the initial learning stage, the design of an SRL method with guaranteed safety and stability remains an open question.
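To make the safe-operating-range idea concrete, the following is a minimal sketch (not the method proposed in this work) of a control barrier function safety filter in the spirit of [2]. It assumes hypothetical scalar dynamics x_dot = u, a safe operating range [x_min, x_max] encoded by barrier candidates h(x) >= 0, and a class-K gain alpha; an RL action is minimally modified so that h_dot >= -alpha * h holds.

```python
import numpy as np

# Hypothetical safe operating range and gain (illustrative values only)
X_MIN, X_MAX = 0.0, 10.0   # e.g., an allowable temperature band
ALPHA = 1.0                # class-K gain (a tuning assumption)

def h(x):
    """Barrier candidates: both must stay nonnegative for x to be safe."""
    return np.array([x - X_MIN, X_MAX - x])

def safety_filter(x, u_rl):
    """Clip the RL action to the interval implied by the two CBF conditions.

    For x_dot = u:  d/dt (x - X_MIN) =  u >= -ALPHA * (x - X_MIN)
                    d/dt (X_MAX - x) = -u >= -ALPHA * (X_MAX - x)
    """
    u_lo = -ALPHA * (x - X_MIN)   # smallest admissible action
    u_hi = ALPHA * (X_MAX - x)    # largest admissible action
    return float(np.clip(u_rl, u_lo, u_hi))
```

For example, near the upper boundary (x = 9.5) an aggressive action u_rl = 2.0 is cut back to 0.5, while well inside the safe set the RL action passes through unchanged; this is the "minimal intervention" behavior that the CBF quadratic programs of [2] generalize to multivariable systems.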

This work presents a novel safe reinforcement learning method to solve the safe optimal control problem for nonlinear systems with input constraints. Specifically, we design a new performance index function such that the value function is a control Lyapunov-barrier function (CLBF) with inherent stability and safety properties [6], which leads to an optimal control policy with simultaneous guarantees of stability, safety, and optimality. Since obtaining the closed form of the value function is challenging, we use neural networks (NNs) to approximate the CLBF-based value function in a policy iteration algorithm, in which process operation data labeled as safe or unsafe are used to develop an NN-based barrier function. Theoretical results on the stability, safety, and optimality of the SRL method are developed by accounting for the generalization error of the NN-based value function. Finally, an application to a chemical process example is presented to demonstrate the effectiveness of the proposed control strategy.
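As an illustrative sketch (not the authors' exact architecture) of learning a barrier function from labeled operating data, the following trains a logistic model whose logit B(x) > 0 flags unsafe states. Everything here is an assumption for illustration: the unsafe region is a hypothetical disk of radius 1 in a 2-D state space, and hand-picked quadratic features stand in for the neural network described above.

```python
import numpy as np

# Labeled operating data on a grid: 1 = unsafe (inside the disk), 0 = safe
grid = np.arange(-2.0, 2.01, 0.25)
X = np.array([(a, b) for a in grid for b in grid])
y = (np.sum(X**2, axis=1) < 1.0).astype(float)

def features(X):
    # Quadratic features make the disk linearly separable for the classifier.
    return np.column_stack([np.ones(len(X)), X[:, 0]**2, X[:, 1]**2])

Phi = features(X)
w = np.zeros(3)
for _ in range(5000):                       # plain gradient descent on log-loss
    p = 1.0 / (1.0 + np.exp(-Phi @ w))
    w -= 0.1 * Phi.T @ (p - y) / len(y)

def B(x):
    """Learned barrier surrogate: B(x) > 0 flags the unsafe region."""
    return features(np.atleast_2d(np.asarray(x, float)))[0] @ w
```

After training, B is positive deep inside the unsafe disk (e.g., at the origin) and negative far outside it, so a term built from B can play the barrier role inside a CLBF-style value function, with the Lyapunov part supplying the stability property.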

References:

  1. R. Nian, J. Liu, and B. Huang, “A review on reinforcement learning: Introduction and applications in industrial process control,” Computers and Chemical Engineering, vol. 139, p. 106886, 2020.
  2. A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2017.
  3. M. Zanon and S. Gros, “Safe reinforcement learning using robust MPC,” IEEE Transactions on Automatic Control, vol. 66, no. 8, pp. 3638–3652, 2021.
  4. M. H. Cohen and C. Belta, “Safe exploration in model-based reinforcement learning using control barrier functions,” Automatica, vol. 147, p. 110684, 2023.
  5. Z. Marvi and B. Kiumarsi, “Safe reinforcement learning: A control barrier function optimization approach,” International Journal of Robust and Nonlinear Control, vol. 31, no. 6, pp. 1923–1940, 2021.
  6. Z. Wu, F. Albalawi, Z. Zhang, J. Zhang, H. Durand, and P. D. Christofides, “Control Lyapunov-barrier function-based model predictive control of nonlinear systems,” Automatica, vol. 109, p. 108508, 2019.