Bo, S. - Presenter, University of Alberta
Xunyuan, Y., Nanyang Technological University
Liu, J., University of Alberta
In process control, model predictive control (MPC) is a standard approach to optimal control. It is formulated as a constrained optimization problem in which safety constraints are considered explicitly [1]. For large-scale systems, however, MPC may suffer from high computational complexity. Reinforcement learning (RL), a class of optimal control algorithms that enables machines to learn an optimal policy through trial and error, provides an alternative to MPC and can shift the complex optimization calculations to model-based offline training [2]. The standard RL approach, however, does not incorporate safety constraints in its design.

Safe RL algorithms have been developed to address this limitation, but some approaches only increase the probability of safety without guaranteeing it [3, 4, 5], and others still impose a high online computational burden because MPC is involved [6, 7]. Control invariant sets (CIS) play a crucial role in ensuring the stability of a control system, and incorporating the concept of CIS in RL is expected to improve stability and efficiency. Several algorithms combining CIS and RL have been proposed, in which the CIS acts as a filter that projects risky actions onto safe ones [8, 9, 10, 11]. Because obtaining a CIS for a general nonlinear system is challenging, researchers have shifted their focus towards implicit methods that use control barrier functions (CBF), Hamilton-Jacobi (HJ) reachability analysis, and safe backup controllers to define safety constraints and design filters indirectly [12]. In addition, these methods have mainly been studied in robotics, and limited research has been conducted in process control, where control problems are more complex.
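As a rough illustration of the filter idea described above, the following Python sketch replaces a risky RL action with the nearest candidate action whose predicted next state stays inside the CIS. Everything in it is an assumption made for illustration: the scalar model `step`, the interval-shaped set built by `make_interval_cis`, and the discrete candidate grid are hypothetical and are not taken from the cited works [8, 9, 10, 11].

```python
from typing import Callable, Sequence


def make_interval_cis(lo: float, hi: float) -> Callable[[float], bool]:
    """Membership test for a hypothetical one-dimensional CIS [lo, hi]."""
    return lambda x: lo <= x <= hi


def step(x: float, u: float, a: float = 1.2, b: float = 1.0) -> float:
    """Toy scalar model x_next = a*x + b*u (illustrative assumption only)."""
    return a * x + b * u


def safety_filter(
    x: float,
    u_rl: float,
    in_cis: Callable[[float], bool],
    candidates: Sequence[float],
) -> float:
    """Return the candidate action closest to u_rl that keeps the next state in the CIS."""
    # Keep only the actions whose one-step prediction remains inside the set.
    safe = [u for u in candidates if in_cis(step(x, u))]
    if not safe:
        # For a true CIS this should not happen while x is inside the set.
        raise ValueError("no safe action found; x may lie outside the CIS")
    # Project the RL action onto the safe candidates (nearest in absolute value).
    return min(safe, key=lambda u: abs(u - u_rl))
```

With `in_cis = make_interval_cis(-1.0, 1.0)` and the candidate grid `[-1.0, -0.5, 0.0, 0.5, 1.0]`, an action that already keeps the state inside the set is passed through unchanged, while an action that would drive the state out is replaced by the nearest safe candidate.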

While constructing a CIS is not a trivial task, various methods, including graph-based [13] and data-driven [14, 15] approaches, have been developed in the past decade. These works motivate us to study the explicit integration of RL and CIS for process control, where the CIS serves as the state space that the RL agent can explore safely. Only minimal modification to the RL algorithm is required, and the reward function can incorporate economic or zone-tracking objectives, both of which are common in process control.
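One possible way to combine a zone-tracking objective with CIS membership in a reward function is sketched below in Python. It is a minimal sketch under stated assumptions, not the reward used in this work: the scalar state, the zone bounds `zone_lo` and `zone_hi`, the membership test `in_cis`, and the fixed `exit_penalty` are all hypothetical.

```python
from typing import Callable


def zone_tracking_reward(
    x: float,
    zone_lo: float,
    zone_hi: float,
    in_cis: Callable[[float], bool],
    exit_penalty: float = 100.0,
) -> float:
    """Reward for a scalar state: 0 inside the target zone, negative otherwise."""
    # Assumed design choice: leaving the CIS incurs a large fixed penalty.
    if not in_cis(x):
        return -exit_penalty
    # Inside the CIS but outside the zone: penalize the distance to the zone.
    if x < zone_lo:
        return -(zone_lo - x)
    if x > zone_hi:
        return -(x - zone_hi)
    # Inside the target zone: no penalty.
    return 0.0
```

An economic objective could be handled in the same way by replacing the distance-to-zone term with a negative operating cost, leaving the CIS membership check unchanged.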

In this presentation, the CIS of a nonlinear process is assumed to be available. A two-stage CIS-enhanced RL approach is then proposed to improve sampling efficiency and guarantee stability. The first stage involves offline training with a process model and the CIS. Because failed process control can have disastrous consequences, a model is used to pre-train the RL agent offline; the model can provide a large amount of data with strong temporal correlation and broad coverage of various scenarios. Introducing the CIS has the potential to narrow down the state space, reduce the training dataset size, and guide agent exploration. However, even exhaustive training cannot guarantee that the RL agent has encountered every scenario, which may result in instability during online implementation. Hence, the second stage, online implementation, involves online learning whenever the safety constraint is violated. A new control implementation strategy is proposed to ensure closed-loop stability. The proposed approach is applied to a chemical reactor to demonstrate its applicability and efficiency.


