(59ao) Augmented Control Using Reinforcement Learning and Conventional Process Control
2023 AIChE Annual Meeting
Computing and Systems Technology Division
Interactive Session: Data and Information Systems
Tuesday, November 7, 2023 - 3:30pm to 5:00pm
In this work, we propose a control structure that augments existing conventional process control (CPC) methods with a reinforcement learning (RL) agent. An actor-critic structure based on the deep deterministic policy gradient (DDPG) algorithm is adopted for the RL agent. Because RL generally learns slowly and requires extensive exploration, the existing conventional controller (PID or MPC) continues to compute its own control action, which also accelerates the learning of the RL agent. A weighted sum of the RL and CPC control actions is applied to the plant; the resulting states and actions are then used to supplement the RL agent's learning. The proposed algorithm avoids applying the actions of a naive RL agent directly, which may yield unacceptable performance and may even be unsafe in worst-case scenarios. Algorithms for the weighting function are developed based on a measure of instantaneous and historical performance, as well as on a game-theoretic approach that can incorporate expert advice, if available. The performance of both the RL and CPC methods is assessed over a moving horizon that decays with time, so that more recent actions are valued more heavily than older ones. In this way, the RL agent can take over control when its performance exceeds that of the conventional control method; if the RL agent's performance begins to deteriorate, the conventional control method again assumes full control. The advantage of the DDPG approach is that it handles continuous action spaces, so its actions can be compared one-to-one with the control actions provided by the conventional control method.
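As an illustration of the weighting scheme described above, the following Python sketch blends the RL and CPC actions and updates the weight from an exponentially decaying moving-horizon performance measure. This is a minimal sketch, not the authors' implementation: the class name, the horizon length, the decay factor, and the assumption that separate per-controller performance signals (err_rl, err_cpc) are available at each step are all illustrative choices.

from collections import deque

import numpy as np


class AugmentedController:
    """Blend RL and CPC actions; shift weight toward the better-performing controller."""

    def __init__(self, horizon=50, decay=0.98):
        self.horizon = horizon                 # moving-horizon length (assumed value)
        self.decay = decay                     # per-step decay on older samples (assumed value)
        self.err_rl = deque(maxlen=horizon)    # recent performance signal attributed to the RL agent
        self.err_cpc = deque(maxlen=horizon)   # recent performance signal attributed to the CPC
        self.w = 0.0                           # weight on the RL action; CPC has full control at start

    def _score(self, errors):
        # Decayed sum of squared errors: recent samples count more than older ones.
        e = np.asarray(errors, dtype=float)
        weights = self.decay ** np.arange(len(e) - 1, -1, -1)
        return float(np.sum(weights * e ** 2))

    def blend(self, u_rl, u_cpc):
        # Weighted sum of the two control actions; this blended action is applied to the plant.
        return self.w * np.asarray(u_rl) + (1.0 - self.w) * np.asarray(u_cpc)

    def update_weight(self, err_rl, err_cpc):
        # Move the weight toward whichever controller scores better over the decaying horizon.
        self.err_rl.append(err_rl)
        self.err_cpc.append(err_cpc)
        if len(self.err_rl) < self.horizon:
            return self.w                      # keep CPC dominant until the horizon fills
        j_rl, j_cpc = self._score(self.err_rl), self._score(self.err_cpc)
        self.w = j_cpc / (j_rl + j_cpc + 1e-12)  # lower cost -> larger share of the action
        return self.w

The ratio update above is one simple instantaneous/historical-performance rule; the game-theoretic variant mentioned in the abstract could instead maintain multiplicative (exponential) weights over the two controllers and any available expert advice, but that choice is not specified here.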
The above approach is applied to a solid oxide fuel cell (SOFC) flowsheet [8]. The existing conventional control structure is a series of PID controllers, some arranged in cascade loops. Because of the mode-switching operation of the SOFC and its complex dynamics, PID performance is often poor, whereas the actor-critic structure of the RL algorithm can capture the dynamics accurately. For this case study, the RL agent is proposed to augment, and eventually phase out, the feed-side cascade loop. Other arrangements of the PID-RL structure, incorporating additional PID loops, are also considered. The episodic learning used for the RL-PID arrangement consists of a series of mode switches from maximum hydrogen production to maximum power production and back to maximum hydrogen production. While learning is episodic in nature, the states remain continuous across episodes, providing a consistent measure of performance improvement.
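A minimal sketch of the episodic mode-switching schedule described above is given below; the setpoint names, numerical values, and step counts are illustrative placeholders and are not taken from the SOFC flowsheet in [8].

# One training episode: maximum hydrogen production -> maximum power production -> back.
# Setpoint values are placeholders; the plant state is not reset between episodes,
# so states remain continuous and performance can be compared consistently over time.
MODE_SETPOINTS = {
    "max_hydrogen": {"fuel_utilization": 0.70, "power_demand": 0.2},   # illustrative numbers
    "max_power":    {"fuel_utilization": 0.85, "power_demand": 1.0},   # illustrative numbers
}


def episode_schedule(steps_per_mode=200):
    """Yield (step, mode, setpoints) over one mode-switching episode."""
    step = 0
    for mode in ("max_hydrogen", "max_power", "max_hydrogen"):
        for _ in range(steps_per_mode):
            yield step, mode, MODE_SETPOINTS[mode]
            step += 1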
[1] S. Gros and M. Zanon, "Data-driven economic NMPC using reinforcement learning," IEEE Trans. Automat. Contr., vol. 65, no. 2, pp. 636–648, Feb. 2020, doi: 10.1109/TAC.2019.2913768.
[2] E. Hedrick, K. Hedrick, D. Bhattacharyya, S. E. Zitney, and B. Omell, "Reinforcement learning for online adaptation of model predictive controllers: Application to a selective catalytic reduction unit," Comput. Chem. Eng., vol. 160, p. 107727, 2022, doi: 10.1016/j.compchemeng.2022.107727.
[3] X. Pan, X. Chen, Q. Zhang, and N. Li, "Model Predictive Control: A Reinforcement Learning-based Approach," J. Phys. Conf. Ser., vol. 2203, no. 1, p. 012058, 2022, doi: 10.1088/1742-6596/2203/1/012058.
[4] Y. Yang and S. Lucia, "Multi-step greedy reinforcement learning based on model predictive control," IFAC-PapersOnLine, vol. 54, no. 3, pp. 699–705, 2021, doi: 10.1016/j.ifacol.2021.08.323.
[5] M. Zanon and S. Gros, "Safe Reinforcement Learning Using Robust MPC," IEEE Trans. Automat. Contr., vol. 66, no. 8, pp. 3638–3652, Aug. 2021, doi: 10.1109/TAC.2020.3024161.
[6] Y. Wu, L. Xing, F. Guo, and X. Liu, "On the Combination of PID control and Reinforcement Learning: A Case Study with Water Tank System," in Proceedings of the 16th IEEE Conference on Industrial Electronics and Applications, ICIEA 2021, Aug. 2021, pp. 1877–1882, doi: 10.1109/ICIEA51954.2021.9516140.
[7] N. P. Lawrence, M. G. Forbes, P. D. Loewen, D. G. McClement, J. U. Backström, and R. B. Gopaluni, "Deep reinforcement learning with shallow controllers: An experimental application to PID tuning," Control Eng. Pract., vol. 121, 2022, doi: 10.1016/j.conengprac.2021.105046.
[8] D. Bhattacharyya and R. Rengaswamy, "A review of solid oxide fuel cell (SOFC) dynamic models," Ind. Eng. Chem. Res., vol. 48, no. 13, pp. 6068–6086, Jul. 2009, doi: 10.1021/ie801664j.