(106g) Development of Algorithms for Augmenting and Replacing Conventional Process Control Using Reinforcement Learning
AIChE Annual Meeting
2022 Annual Meeting
Computing and Systems Technology Division
Data-Driven Dynamic Modeling, Estimation and Control II
Monday, November 14, 2022 - 2:24pm to 2:43pm
In this work, a novel approach to introducing RL controllers alongside existing process controllers is developed. Because of their significant exploration requirements, training and implementing RL agents that directly generate input moves can substantially degrade control performance. To address this problem, it is assumed that a process controller of a standard form (PID, MPC, etc.) already regulates the plant. The RL controller then calculates its own control move, and a weighted sum of the two inputs is injected into the plant. The weighting factor in this approach determines when to bring the RL controller online: the RL controller starts by learning from the input profile of the standard controller, and as its performance improves its input move receives heavier weight while the standard controller is phased out. The weighting is calculated as a function of the expected return (the action-value function) approximated by the RL agent relative to the maximum expected return (i.e., zero for a quadratic reward function), quantifying when the RL controller performs well enough to regulate the plant on its own. For the formulation of the RL controller, multiple function approximators are considered.
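A minimal Python sketch of this blending step is given below. The linear mapping from the action-value estimate to the weight, and the scaling constant q_min, are illustrative assumptions for exposition rather than the exact formulation developed in the work.

    import numpy as np

    def blended_input(u_pid, u_rl, q_value, q_min=-50.0):
        """Blend the conventional and RL control moves.

        alpha approaches 1 as the agent's action-value estimate q_value
        approaches the maximum achievable return (zero for a quadratic
        reward), phasing out the conventional controller. q_min is an
        assumed scaling constant mapping poor expected returns to
        alpha close to 0.
        """
        alpha = float(np.clip(1.0 - q_value / q_min, 0.0, 1.0))
        u = alpha * u_rl + (1.0 - alpha) * u_pid
        return u, alpha

    # Example: with q_value = -10 and q_min = -50, alpha = 0.8, so the
    # injected input is dominated by the RL move.
    u, alpha = blended_input(u_pid=1.2, u_rl=0.9, q_value=-10.0)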
The approach detailed above is demonstrated on the control of a benchmark nonlinear CSTR. A PID controller is initially implemented to regulate the plant, and the RL agent, learning from a quadratic reward, is implemented alongside it. Learning is carried out in episodes that start from a random state and proceed through randomly sequenced disturbance injections and setpoint changes. Under this structure, controller performance is first presented in terms of the episodic return, standardized over multiple runs. Different thresholds for bringing the RL controller online are evaluated, and control performance is then compared between the first and last episodes (the PID controller is fully active in the first episode, marking baseline performance). Finally, the concept of "re-learning" is introduced: slow changes in the plant model are used to evaluate thresholds at which the RL controller must be updated online after implementation, in an average-reward setting rather than the episodic setting originally used for training. Owing to this continued learning, the performance of the RL-based control exceeds that of the PID or other controllers it learns from. The algorithm is generic because learning is based on the continuous control signal, and it can therefore be readily extended to other control approaches such as MPC.
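A rough sketch of one learning episode under this structure is shown below, assuming hypothetical interfaces: env is the CSTR simulation with reset()/step(), pid computes the conventional control move, and agent is an RL agent exposing act(), q() (an action-value estimate), and update(). The threshold and scaling values are likewise illustrative placeholders, not the values used in the study.

    import numpy as np

    def run_episode(env, pid, agent, q_threshold=-5.0, q_min=-50.0, steps=200):
        state = env.reset()                 # episode starts from a random state
        episodic_return = 0.0
        for _ in range(steps):
            u_pid = pid(state)
            u_rl = agent.act(state)
            q_val = agent.q(state, u_rl)
            if q_val < q_threshold:
                u = u_pid                   # RL not yet trusted: conventional controller acts alone
            else:
                # weighted-sum injection, phasing out the PID as q_val approaches zero
                alpha = float(np.clip(1.0 - q_val / q_min, 0.0, 1.0))
                u = alpha * u_rl + (1.0 - alpha) * u_pid
            # quadratic reward; disturbance injections and setpoint changes
            # are assumed to be handled inside the simulated environment
            next_state, reward = env.step(u)
            agent.update(state, u, reward, next_state)
            episodic_return += reward
            state = next_state
        return episodic_return

In the re-learning setting, the same loop would run continuously rather than in episodes, with agent updates resumed whenever a running estimate of the average reward falls below an acceptable threshold as the plant drifts.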