(59ah) Control Invariant Set Enhanced Reinforcement Learning for Process Control: Improved Sampling Efficiency and Guaranteed Stability
2023 AIChE Annual Meeting
Computing and Systems Technology Division
Interactive Session: Data and Information Systems
Tuesday, November 7, 2023 - 3:30pm to 5:00pm
Safe RL algorithms have been developed to address these challenges, but some approaches only improve the probability of safety without providing guarantees [3, 4, 5], while others still incur a high online computational burden because MPC is involved [6, 7]. Control invariant sets (CIS) play a crucial role in ensuring the stability of a control system, and incorporating the concept of a CIS into RL is expected to improve stability and efficiency. Several algorithms combining CIS and RL have been proposed, in which the CIS is designed as a filter that projects risky actions onto safe ones [8, 9, 10, 11]. Due to the challenge of obtaining a CIS for a general nonlinear system, researchers have shifted their focus towards implicit methods that use control barrier functions (CBFs), Hamilton-Jacobi (HJ) reachability analysis, and safe backup controllers to define safety constraints and design filters indirectly [12]. In addition, these methods have mainly been studied in robotics; limited research has been conducted in process control, where control problems are more complex.
While the construction of a CIS is not a trivial task, various methods, including graph-based [13] and data-driven [14, 15] approaches, have been developed in the past decade. These works motivate us to study the explicit integration of RL and a CIS for process control, where the CIS serves as the state space within which the RL agent can explore safely. Minimal modification to the RL algorithm is required, and the reward function can incorporate both economic and zone-tracking objectives, which are common in process control.
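To make this objective concrete, the following is a minimal sketch of how a reward could combine zone tracking with an economic term while using CIS membership to terminate unsafe training episodes. The box-shaped CIS, the zone bounds, and the quadratic input cost are illustrative assumptions only, not the design used in the presentation.

```python
import numpy as np

CIS_LOWER = np.array([0.0, 300.0])    # assumed bounds on [concentration, temperature]
CIS_UPPER = np.array([1.0, 400.0])
ZONE_LOWER = np.array([0.4, 330.0])   # assumed target operating zone inside the CIS
ZONE_UPPER = np.array([0.6, 360.0])

def in_cis(x):
    """Membership check for the (illustrative) box-shaped CIS."""
    return bool(np.all(x >= CIS_LOWER) and np.all(x <= CIS_UPPER))

def reward(x, u, w_zone=1.0, w_econ=0.1):
    """Zone-tracking penalty plus a placeholder economic cost on input usage."""
    below = np.maximum(ZONE_LOWER - x, 0.0)   # zero when the state is inside the zone
    above = np.maximum(x - ZONE_UPPER, 0.0)
    zone_penalty = float(np.sum(below + above))
    econ_cost = float(np.sum(np.square(u)))
    return -(w_zone * zone_penalty + w_econ * econ_cost)

def step_reward(x_next, u, leave_penalty=100.0):
    """Reward for one transition; leaving the CIS terminates the training episode."""
    if not in_cis(x_next):
        return -leave_penalty, True
    return reward(x_next, u), False

# Example: a state inside the zone with a small input incurs only the economic cost.
r, done = step_reward(np.array([0.5, 345.0]), np.array([0.2]))
```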
In this presentation, the CIS of a nonlinear process is assumed to be available. A two-stage CIS-enhanced RL approach is then proposed to improve sampling efficiency and guarantee stability. The first stage involves offline training with a process model and the CIS. Because failed process control can have disastrous consequences, pre-training the RL agent offline with a model provides a large amount of data with strong temporal correlation and broad coverage of operating scenarios. Introducing the CIS can narrow the state space, reduce the required training dataset size, and guide the agent's exploration. However, even exhaustive training cannot guarantee that the RL agent has encountered every scenario, which may result in instability during online implementation. Hence, the second stage, online implementation, incorporates online learning whenever the safety constraint is violated. A new control implementation strategy is proposed to ensure closed-loop stability. The proposed approach is applied to a chemical reactor to demonstrate its applicability and efficiency.
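The sketch below illustrates, under strong simplifying assumptions (a scalar linear model, a box CIS, a deliberately imperfect stand-in policy, and a grid-search fallback), how an online stage can check CIS membership of the predicted next state before implementing an input and record violations for online learning. It is not the specific control implementation strategy proposed in the presentation, only an illustration of the general idea.

```python
import numpy as np

A, B = 0.9, 0.5                        # assumed scalar process: x+ = A*x + B*u
CIS = (-1.0, 1.0)                      # assumed box-shaped control invariant set
ACTIONS = np.linspace(-1.0, 1.0, 21)   # candidate inputs for the fallback search

def predict(x, u):
    """One-step model prediction used for the safety check."""
    return A * x + B * u

def in_cis(x):
    return CIS[0] <= x <= CIS[1]

def learned_policy(x):
    # Deliberately imperfect stand-in for the offline-trained RL policy,
    # so that the CIS check below is actually exercised.
    return 1.5 * x

def online_control_step(x, violation_buffer):
    """Check the policy's action against the CIS before implementing it."""
    u = learned_policy(x)
    if not in_cis(predict(x, u)):
        # Safety constraint violated: record the transition for online learning
        # (a real implementation would update the agent here) and fall back to
        # an input that keeps the predicted state inside the CIS.
        violation_buffer.append((x, u, predict(x, u)))
        safe = [a for a in ACTIONS if in_cis(predict(x, a))]
        u = min(safe, key=lambda a: abs(a - learned_policy(x)))
    return u

# Closed-loop simulation starting inside the CIS.
x, violations = 0.8, []
for _ in range(20):
    u = online_control_step(x, violations)
    x = predict(x, u)
    assert in_cis(x)                   # the state never leaves the CIS
```

In this toy example the fallback input closest to the policy's suggestion is chosen, which keeps the state inside the CIS even when the stand-in policy is destabilizing; the violations stored in the buffer are what the online learning step would use to correct the policy.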
References
[1] David Q Mayne, James B Rawlings, Christopher V Rao, and Pierre OM Scokaert. Constrained model predictive control: Stability and optimality. Automatica, 36(6):789–814, 2000.
[2] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
[3] Javier García and Fernando Fernández. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437–1480, 2015.
[4] Pavel Osinenko, Dmitrii Dobriborsci, and Wolfgang Aumer. Reinforcement learning with guarantees: a review. IFAC-PapersOnLine, 55(15):123–128, 2022.
[5] Shangding Gu, Long Yang, Yali Du, Guang Chen, Florian Walter, Jun Wang, Yaodong Yang, and Alois Knoll. A review of safe reinforcement learning: Methods, theory and applications. arXiv preprint arXiv:2205.10330, 2022.
[6] Mario Zanon, Sébastien Gros, and Alberto Bemporad. Practical reinforcement learning of stabilizing economic mpc. In 2019 18th European Control Conference (ECC), pages 2258–2263. IEEE, 2019.
[7] Sébastien Gros and Mario Zanon. Reinforcement learning based on mpc and the stochastic policy gradient method. In 2021 American Control Conference (ACC), pages 1947–1952. IEEE, 2021.
[8] Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. Safe reinforcement learning via shielding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[9] Sebastien Gros, Mario Zanon, and Alberto Bemporad. Safe reinforcement learning via projection on a safe set: How to achieve optimality? IFAC-PapersOnLine, 53(2):8076–8081, 2020.
[10] Shuo Li and Osbert Bastani. Robust model predictive shielding for safe reinforcement learning with stochastic dynamics. In 2020 IEEE International Conference on Robotics and Automation (ICRA), pages 7166–7172. IEEE, 2020.
[11] Daniel Tabas and Baosen Zhang. Computationally efficient safe reinforcement learning for power systems. In 2022 American Control Conference (ACC), pages 3303–3310. IEEE, 2022.
[12] Lukas Brunke, Melissa Greeff, Adam W Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P Schoellig. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5:411–444, 2022.
[13] Benjamin Decardi-Nelson and Jinfeng Liu. Computing robust control invariant sets of constrained nonlinear systems: A graph algorithm approach. Computers & Chemical Engineering, 145:107177, 2021.
[14] Shaoru Chen, Mahyar Fazlyab, Manfred Morari, George J Pappas, and Victor M Preciado. Learning region of attraction for nonlinear systems. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 6477–6484. IEEE, 2021.
[15] Angelo D Bonzanini, Joel A Paulson, Georgios Makrygiorgos, and Ali Mesbah. Scalable estimation of invariant sets for mixed-integer nonlinear systems using active deep learning. In 2022 IEEE 61st Conference on Decision and Control (CDC), pages 3431–3437. IEEE, 2022.