(106g) Development of Algorithms for Augmenting and Replacing Conventional Process Control Using Reinforcement Learning
AIChE Annual Meeting
2022 Annual Meeting
Computing and Systems Technology Division
Data-Driven Dynamic Modeling, Estimation and Control II
Monday, November 14, 2022 - 2:24pm to 2:43pm
In this work, a novel approach to introducing RL controllers alongside existing process controllers is developed. Because of their significant exploration requirements, training and implementing RL agents that directly generate input moves can substantially degrade control performance. To address this problem, it is assumed that a process controller of a standard form (PID, MPC, etc.) already regulates the plant. The RL controller then calculates its own control move, and a weighted sum of the two inputs is injected into the plant. The weighting factor in this approach determines when to bring the RL controller online: the RL controller starts by learning from the input profile of the standard controller, and as its performance improves its input move receives heavier weight while the standard controller is phased out. The weighting is calculated as a function of the expected return (the action-value function) approximated by the RL agent relative to the maximum expected return (i.e., zero for a quadratic reward function), quantifying when the RL controller performs well enough to regulate the plant on its own. For the formulation of the RL controller, multiple function approximators are considered.
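A minimal Python sketch of this blending step is given below. The linear mapping from the action-value estimate to the weight, and the scaling constant q_min, are illustrative assumptions for exposition rather than the exact formulation developed in the work.

    import numpy as np

    def blended_input(u_pid, u_rl, q_value, q_min=-50.0):
        """Blend the conventional and RL control moves.

        alpha approaches 1 as the agent's action-value estimate q_value
        approaches the maximum achievable return (zero for a quadratic
        reward), phasing out the conventional controller. q_min is an
        assumed scaling constant mapping poor expected returns to
        alpha close to 0.
        """
        alpha = float(np.clip(1.0 - q_value / q_min, 0.0, 1.0))
        u = alpha * u_rl + (1.0 - alpha) * u_pid
        return u, alpha

    # Example: with q_value = -10 and q_min = -50, alpha = 0.8, so the
    # injected input is dominated by the RL move.
    u, alpha = blended_input(u_pid=1.2, u_rl=0.9, q_value=-10.0)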
The approach detailed above is demonstrated on the control of a benchmark nonlinear CSTR. A PID controller is initially implemented to regulate the plant, and the RL agent, learning from a quadratic reward, is implemented alongside it. Learning is carried out in episodes that start from a random state and proceed through randomly sequenced disturbance injections and setpoint changes. Under this structure, controller performance is first presented in terms of the episodic return, standardized over multiple runs. Different thresholds for bringing the RL controller online are evaluated, and control performance is then compared between the first and last episodes (the PID controller is fully active in the first episode, marking baseline performance). Finally, the concept of "re-learning" is introduced: slow changes in the plant model are used to evaluate thresholds at which the RL controller must be updated online after implementation, in an average-reward setting rather than the episodic setting originally used for training. Owing to this continued learning, the performance of the RL-based control exceeds that of the PID or other controllers it learns from. The algorithm is generic because learning is based on the continuous control signal, and it can therefore be readily extended to other control approaches such as MPC.
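A rough sketch of one learning episode under this structure is shown below, assuming hypothetical interfaces: env is the CSTR simulation with reset()/step(), pid computes the conventional control move, and agent is an RL agent exposing act(), q() (an action-value estimate), and update(). The threshold and scaling values are likewise illustrative placeholders, not the values used in the study.

    import numpy as np

    def run_episode(env, pid, agent, q_threshold=-5.0, q_min=-50.0, steps=200):
        state = env.reset()                 # episode starts from a random state
        episodic_return = 0.0
        for _ in range(steps):
            u_pid = pid(state)
            u_rl = agent.act(state)
            q_val = agent.q(state, u_rl)
            if q_val < q_threshold:
                u = u_pid                   # RL not yet trusted: conventional controller acts alone
            else:
                # weighted-sum injection, phasing out the PID as q_val approaches zero
                alpha = float(np.clip(1.0 - q_val / q_min, 0.0, 1.0))
                u = alpha * u_rl + (1.0 - alpha) * u_pid
            # quadratic reward; disturbance injections and setpoint changes
            # are assumed to be handled inside the simulated environment
            next_state, reward = env.step(u)
            agent.update(state, u, reward, next_state)
            episodic_return += reward
            state = next_state
        return episodic_return

In the re-learning setting, the same loop would run continuously rather than in episodes, with agent updates resumed whenever a running estimate of the average reward falls below an acceptable threshold as the plant drifts.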