(108b) Fast-Convergence of Deep Reinforcement Learning Controller: Application to a Continuous Stirred Tank Reactor

Authors 

Bangi, M. S. F. - Presenter, Texas A&M University
Kwon, J., Texas A&M University
Reinforcement Learning (RL) originated several decades ago in computer science and operations research to solve complex sequential decision-making problems, but its application to process control has been recent and limited. RL involves an agent that learns the optimal policy by interacting with the environment in real time [1]. Many approaches have been proposed to solve the RL problem, but recent advancements in deep learning have made it possible to combine deep neural networks (DNNs) with RL. This combination has already delivered tremendous success in video games such as the Atari suite [2]. More recently, in the context of process control, a Deep RL (DRL) controller, a model-free, off-policy actor-critic algorithm, was proposed based on temporal difference (TD) learning [1] and the Deterministic Policy Gradient (DPG) algorithm [3] for controlling discrete-time nonlinear processes [4]. The DRL controller utilizes two DNNs to generalize the actor and the critic to continuous state and action spaces, and two more as target networks for their learning. Ideas such as replay memory and gradient clipping were incorporated into the DRL controller to make learning suitable for process control applications. The DRL controller was able to solve set-point tracking problems for single-input-single-output (SISO) and multi-input-multi-output (MIMO) systems, as well as a nonlinear system with external disturbances [4].
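
The actor-critic structure described above can be illustrated with a minimal PyTorch-style sketch: two networks (actor and critic), a target copy of each, a replay memory, and gradient clipping. The layer sizes, state/action dimensions, and class names below are illustrative assumptions, not the implementation of [4].

```python
import copy
from collections import deque

import torch
import torch.nn as nn


class Actor(nn.Module):
    """Policy network: maps a continuous state to a continuous control action."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Action-value network: approximates Q(s, a) for continuous s and a."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


# Four networks in total: actor, critic, and a target copy of each.
actor, critic = Actor(state_dim=4, action_dim=2), Critic(state_dim=4, action_dim=2)
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)

# Replay memory of (state, action, reward, next_state) transitions.
replay_memory = deque(maxlen=100_000)

# Gradient clipping would be applied before each optimizer step, e.g.
# torch.nn.utils.clip_grad_norm_(critic.parameters(), max_norm=1.0)
```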

Despite its success, the DRL controller has a few limitations, including the need for a large amount of data, high computational loads, and careful selection and initialization of hyperparameters for fast convergence. Additionally, one glaring limitation of the DRL controller, as with many other RL methods, is the long training time required before it can deliver satisfactory control performance [5]. To overcome this challenge, we propose to train the actor and the critic offline using historical process data before deploying them for online control. For the actor network, which approximates the policy function, we use past states and control actions to train it offline until convergence within the training region is achieved. For the critic network, which approximates the action-value function, we use the reward gained, calculated from a pre-defined reward function, to train it offline until convergence is achieved. Once trained offline, the learned actor-critic pair serves as the starting point for the DRL controller. This pre-trained DRL controller is implemented to track concentration and temperature set-points for a continuous stirred tank reactor (CSTR) process, and we demonstrate its ability to adapt and learn to track set-points outside the training region faster than a randomly initialized DRL controller. We also compare the control performance of the pre-trained DRL controller against a model predictive controller in tracking a set-point.
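
The offline pre-training step can be sketched as follows, assuming the Actor and Critic classes from the previous sketch and a historical data set of (state, action, next state) tensors. The quadratic set-point tracking reward and the function names are assumptions for illustration; the abstract only states that a pre-defined reward function is used.

```python
import torch
import torch.nn.functional as F


def tracking_reward(next_state, setpoint):
    """Illustrative reward: negative squared deviation from the set-point."""
    return -((next_state - setpoint) ** 2).sum(dim=-1, keepdim=True)


def pretrain_offline(actor, critic, states, actions, next_states, setpoint,
                     gamma=0.99, epochs=200, lr=1e-3):
    """Fit the actor to historical control actions and the critic to a
    one-step TD target built from the pre-defined reward function."""
    actor_opt = torch.optim.Adam(actor.parameters(), lr=lr)
    critic_opt = torch.optim.Adam(critic.parameters(), lr=lr)
    for _ in range(epochs):
        # Actor: supervised regression onto the recorded control actions.
        actor_loss = F.mse_loss(actor(states), actions)
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Critic: regress Q(s, a) onto r + gamma * Q(s', pi(s')).
        rewards = tracking_reward(next_states, setpoint)
        with torch.no_grad():
            td_target = rewards + gamma * critic(next_states, actor(next_states))
        critic_loss = F.mse_loss(critic(states, actions), td_target)
        critic_opt.zero_grad()
        critic_loss.backward()
        critic_opt.step()
    return actor, critic  # used to initialize the online DRL controller
```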

Literature cited:

[1] Sutton, R.S., Barto, A.G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.

[2] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv preprint, arXiv:1312.5602, 2013.

[3] Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M. Deterministic policy gradient algorithms. In ICML, 2014.

[4] Spielberg, S., Tulsyan, A., Lawrence, N.P., Loewen, P.D., Bhushan Gopaluni, R. Toward self‐driving processes: A deep reinforcement learning approach to control. AIChE J, 65(10), 2019.

[5] Shin, J., Badgwell, T.A., Liu, K.H., Lee, J.H. Reinforcement Learning – Overview of recent progress and implications for process control. Computer Aided Chemical Engineering, 44:71-85, 2018.