
(207a) Offline Reinforcement Learning for Control of Complex Chemical Processes

Authors 

Park, J. S., Seoul National University
Shim, J., Seoul National University
Lee, J. M., Seoul National University

In recent years, Reinforcement Learning (RL) has made remarkable progress in control applications. The majority of RL research has centered on online learning, which allows real-time adaptation to changes in the environment and can provide better control performance under uncertainty, particularly when supported by a precisely modeled simulator.

In the case of chemical processes, however, the disadvantages of online RL often outweigh its benefits. Online learning requires significant computational resources and time to derive a stable policy. Moreover, the exploration process may involve unnecessarily aggressive actions, making it challenging to apply online RL to chemical processes, where a single wrong action could result in significant economic loss or physical damage [1].

Despite the potential of online RL, many chemical plants continue to rely on simple PID control or model predictive control (MPC). Because MPC requires a model of the complex chemical system, precise modeling must be carried out first. However, in addition to demanding a substantial amount of time and effort, such modeling struggles to account for the diverse stochastic behaviors observed in real plants. Furthermore, system identification based on historical operation data may yield biased models and degraded control performance due to limited data coverage, as chemical plants typically operate only within a narrow range of desired conditions [2-4].

We propose the application of offline RL to chemical processes to circumvent the aforementioned issues. Offline RL is a type of RL in which learning relies solely on pre-collected data and does not require real-time interaction with the environment. By using the available data directly for agent learning, we eliminate the need for extensive background knowledge, a complex modeling step, and an imperfect, biased, and uncertain environment model. In addition, offline RL allows potentially hazardous or undesirable actions to be screened before they enter the learning phase and guarantees, with high probability, that performance will not fall below a lower bound, making it more suitable for chemical processes with stringent safety requirements [5].
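
As a minimal sketch of this setting, the snippet below shows the defining feature of offline RL: the agent only ever samples batches from a fixed buffer of logged plant transitions and never queries the real process. The names `OfflineDataset` and `agent.update` are illustrative placeholders, not part of a specific library.

```python
# Minimal sketch of offline RL training: no environment interaction,
# only sampling from a fixed dataset of historical transitions.
import numpy as np

class OfflineDataset:
    """Fixed buffer of logged plant transitions; nothing is ever added."""
    def __init__(self, states, actions, rewards, next_states):
        self.data = (states, actions, rewards, next_states)
        self.size = len(states)

    def sample(self, batch_size):
        idx = np.random.randint(0, self.size, size=batch_size)
        return tuple(x[idx] for x in self.data)

def train_offline(agent, dataset, steps=100_000, batch_size=256):
    for _ in range(steps):
        batch = dataset.sample(batch_size)   # historical operation data only
        agent.update(batch)                  # hypothetical agent update call
    return agent
```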

To maximize the benefits of offline RL, we combine the Conservative Q-Learning (CQL) and Soft Actor-Critic (SAC) algorithms. To address the common issue of overestimated Q-values for unseen (out-of-distribution) actions, the Q-function update rule is modified to include a term that minimizes the Q-value estimates for such actions. The SAC backbone enables the agent to perform balanced exploration within the limited dataset and to learn a stochastic policy for a complex chemical system [5, 6].
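
As a concrete illustration, the sketch below shows one way the conservative term can be folded into a SAC-style critic update. The handles `q_net`, `target_q_net`, `policy.sample`, and the weight `cql_alpha` are illustrative placeholders; the full CQL(H) penalty of [6] replaces the simple policy-action term with a log-sum-exp over sampled actions, and the SAC entropy term is omitted for brevity.

```python
# Hedged sketch of a conservative critic loss in the spirit of CQL + SAC.
# The extra term pushes Q-values down for actions sampled from the current
# policy (a proxy for out-of-distribution actions) and up for actions that
# actually appear in the dataset.
import torch
import torch.nn.functional as F

def cql_critic_loss(q_net, target_q_net, policy, batch, gamma=0.99, cql_alpha=1.0):
    s, a, r, s_next, done = batch

    # Standard Bellman target (SAC entropy bonus omitted for brevity).
    with torch.no_grad():
        a_next, _ = policy.sample(s_next)        # assumed to return (action, log_prob)
        target_q = r + gamma * (1.0 - done) * target_q_net(s_next, a_next)
    bellman_loss = F.mse_loss(q_net(s, a), target_q)

    # Conservative regularizer: minimize Q on policy actions (OOD proxy),
    # maximize Q on dataset actions.
    a_pi, _ = policy.sample(s)
    conservative_gap = q_net(s, a_pi).mean() - q_net(s, a).mean()

    return bellman_loss + cql_alpha * conservative_gap
```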

We evaluated our approach on various chemical systems, ranging from a simple Van de Vusse CSTR to highly complex systems with intricate reaction kinetics and time-varying dynamics. The plant was represented by a rigorous first-principles model, and operation data were obtained by numerically solving this model. We compared the control performance of offline RL with that of MPC applied after system identification on the same operation data. The results show that offline RL not only offers a computationally efficient implementation but also improves control performance, suggesting that offline RL is a promising approach for controlling unknown and highly complex chemical processes with limited data.
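
For reference, a minimal sketch of how such operation data might be generated from a first-principles model is given below, using the isothermal Van de Vusse CSTR (A → B → C, 2A → D) as an example. The rate constants, feed concentration, initial state, and input signal are placeholder values for illustration, not the exact settings used in this work.

```python
# Sketch: generate operation data by integrating a first-principles
# Van de Vusse CSTR model under a logged (piecewise-constant) input sequence.
import numpy as np
from scipy.integrate import solve_ivp

k1, k2, k3 = 50.0, 100.0, 10.0   # placeholder rate constants
ca_in = 10.0                      # placeholder feed concentration [mol/L]

def van_de_vusse(t, x, u):
    """x = [Ca, Cb]; u = dilution rate F/V."""
    ca, cb = x
    dca = u * (ca_in - ca) - k1 * ca - k3 * ca ** 2
    dcb = -u * cb + k1 * ca - k2 * cb
    return [dca, dcb]

def generate_operation_data(u_sequence, x0=(2.0, 1.0), dt=0.002):
    """Roll the model forward one sampling interval per logged input."""
    states, x = [np.array(x0)], np.array(x0)
    for u in u_sequence:
        sol = solve_ivp(van_de_vusse, (0.0, dt), x, args=(u,))
        x = sol.y[:, -1]
        states.append(x)
    return np.array(states)

data = generate_operation_data(u_sequence=np.full(500, 30.0))
```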

References

[1] T. Mannucci, E. J. van Kampen, C. De Visser, Q. Chu, "Safe exploration algorithms for reinforcement learning controllers." IEEE Transactions on Neural Networks and Learning Systems 29.4 (2017): 1069-1081.

[2] J. Deng, S. Sierla, J. Sun, V. Vyatkin, "Offline reinforcement learning for industrial process control: a case study from steel industry." Information Sciences (2023).

[3] S. Sharifian, R. Sotudeh-Gharebagh, R. Zarghami, P. Tanguy, N. Mostoufi, "Uncertainty in chemical process systems engineering: a critical review." Reviews in Chemical Engineering 37.6 (2021): 687-714.

[4] A. S. Badwe, R. S. Patwardhan, S. L. Shah, S. C. Patwardhan, R. D. Gudi, “Quantifying the impact of model-plant mismatch on controller performance.” Journal of Process Control 20.4 (2010): 408-425.

[5] S. Levine, A. Kumar, G. Tucker, J. Fu, "Offline reinforcement learning: Tutorial, review, and perspectives on open problems." arXiv preprint arXiv:2005.01643 (2020).

[6] A. Kumar, A. Zhou, G. Tucker, S. Levine, "Conservative Q-learning for offline reinforcement learning." Advances in Neural Information Processing Systems 33 (2020): 1179-1191.