(14f) Transfer Reinforcement Learning-Based Optimal Control of Nonlinear Systems

Authors 

Wu, G. - Presenter, National University of Singapore
Wang, Y., National University of Singapore
Xiao, M., National University of Singapore
Wu, Z., University of California Los Angeles
Safe reinforcement learning (RL) has been developed to address safety concerns in the optimal control of safety-critical applications such as self-driving cars, robotics, and chemical processes [1-3]. However, safe RL algorithms require considerable training time to obtain an optimal control policy, because learning relies on data and experience gained through interaction with the environment. Additionally, designing control policies that ensure closed-loop stability during the learning process remains an unresolved issue. Due to these challenges, research efforts have been made to improve traditional safe RL algorithms. Transfer learning (TL), an emerging subfield of machine learning, focuses on transferring the knowledge and experience gained from one task to another related task. Its advantage lies in providing a warm start to RL, which reduces overall learning time and improves learning performance. However, integrating TL into RL inevitably raises safety issues due to the discrepancy between the source and target tasks. How to balance efficient knowledge transfer with safe exploration remains an open question.

Motivated by the above challenges, we propose a safe transfer reinforcement learning (TRL) framework. The algorithm leverages knowledge obtained from pre-trained source tasks to expedite learning in a new yet related target task, thereby significantly reducing both the learning time and the computational overhead of optimizing a control policy. However, because of the discrepancy between the source and target tasks, the transferred knowledge may degrade performance on the target process. To account for this discrepancy, we develop a theoretical analysis based on statistical learning theory that characterizes the performance of TRL in terms of the differences between the source and target tasks [4,5]. Furthermore, the proposed TRL method collects data and optimizes the control policy within a control invariant set (CIS) to ensure the safety of the system throughout the learning process. Finally, we apply the proposed TRL method to the optimal control of a chemical reactor, demonstrating improved computational efficiency and safety guarantees compared with traditional RL that does not use transfer learning.
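
To make the safe data-collection idea concrete, the following minimal Python sketch illustrates CIS-constrained exploration of the kind described above: the target-task policy is warm-started from a pre-trained source policy, each exploratory action is screened so that the predicted successor state stays inside a control invariant set modeled here as a Lyapunov level set, and the safe source policy is used as a fallback whenever a candidate action would leave the set. All function names, the level-set form of the CIS, and the toy dynamics are illustrative assumptions, not the implementation used in this work.

import numpy as np

def in_cis(x, V, rho):
    """Membership test for the CIS, taken here as the level set {x : V(x) <= rho}."""
    return V(x) <= rho

def safe_trl_rollout(x0, target_policy, source_policy, dynamics, V, rho,
                     horizon=50, noise_std=0.05, rng=None):
    """Collect one trajectory for policy optimization, exploring around the
    transferred (source-initialized) policy while rejecting unsafe actions."""
    rng = np.random.default_rng() if rng is None else rng
    x, trajectory = np.asarray(x0, dtype=float), []
    for _ in range(horizon):
        # Warm start: perturb the target policy, which is initialized from the
        # pre-trained source-task policy, to explore the target task.
        u = target_policy(x) + noise_std * rng.standard_normal()
        # Safety screen: if the one-step prediction leaves the CIS, fall back to
        # the source policy, assumed to keep the state inside the CIS.
        if not in_cis(dynamics(x, u), V, rho):
            u = source_policy(x)
        x_next = dynamics(x, u)
        trajectory.append((x.copy(), u, x_next.copy()))
        x = x_next
    return trajectory

# Toy usage on a scalar system x_{k+1} = 0.9 x_k + u_k with V(x) = x^2 and rho = 1.
if __name__ == "__main__":
    data = safe_trl_rollout(x0=0.5,
                            target_policy=lambda x: -0.4 * x,
                            source_policy=lambda x: -0.5 * x,
                            dynamics=lambda x, u: 0.9 * x + u,
                            V=lambda x: float(x ** 2),
                            rho=1.0)
    print(len(data), "transitions collected inside the CIS")

In the actual framework the collected transitions would be used to update the control policy, and the theoretical analysis quantifies how the source-target task discrepancy affects the resulting performance; the snippet above only illustrates how exploration can be confined to a CIS throughout learning.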

References:

[1] B. Yan, P. Shi, C. P. Lim, Y. Sun, and R. K. Agarwal, “Security and safety-critical learning-based collaborative control for multiagent systems,” IEEE Transactions on Neural Networks and Learning Systems, pp. 1–12, 2024.

[2] O. Dogru, J. Xie, O. Prakash, et al., “Reinforcement learning in process industries: review and perspective,” IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 2, pp. 283–300, 2024.

[3] A. B. Jeddi, N. L. Dehghani, and A. Shafieezadeh, “Lyapunov-based uncertainty-aware safe reinforcement learning,” arXiv preprint arXiv:2107.13944, 2021.

[4] M. Xiao, C. Hu, and Z. Wu, “Modeling and predictive control of nonlinear processes using transfer learning method,” AIChE Journal, p. e18076, 2023.

[5] Y. Wang and Z. Wu, “Control Lyapunov-barrier function-based safe reinforcement learning for nonlinear optimal control,” AIChE Journal, p. e18306, 2023.