(522h) Constrained Reinforcement Learning for Process Optimization and Control

Authors 

del Rio Chanona, A. - Presenter, Imperial College London
Petsagkourakis, P., University College London
Bradford, E., NTNU
Sandoval Cardenas, I. O., Imperial College London
Galvanin, F., University College London
The optimization of chemical processes presents distinctive challenges to the stochastic systems community, since these processes suffer from three conditions:

  • There is no precisely known model for most industrial-scale processes (plant-model mismatch), leading to inaccurate predictions and convergence to suboptimal solutions.
  • The process is affected by disturbances (i.e. it is stochastic).
  • State constraints must be satisfied due to operational and safety concerns; constraint violations can therefore be detrimental or even dangerous.

To solve the above problems, we propose a Reinforcement Learning (RL) Policy Gradient method, which satisfies chance constraints with probabilistic guarantees.
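As a point of reference (the notation below is introduced here for illustration and is not taken from the abstract), the setting can be summarized as a chance-constrained policy optimization problem of the form

\max_{\theta} \; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T} R(x_t, u_t) \right] \quad \text{s.t.} \quad \mathbb{P}\!\left[\, g_j(x_t) \le 0, \;\; \forall j \in \{1,\dots,n_g\}, \; \forall t \in \{0,\dots,T\} \,\right] \ge 1 - \alpha,

where \pi_\theta is the parametrized feedback policy, x_t and u_t are the state and control at time t, the g_j are the state constraints, and 1 - \alpha is the required joint satisfaction probability.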

Machine learning is helping to address complex problems in the chemical and process industries, such as optimal process control [1,2] and estimation and online monitoring [3,4]. However, fewer studies have investigated the applicability and efficiency of RL in process engineering, and none include the efficient handling of constraints.

RL is a natural choice for nonlinear, uncertain, and stochastic process control problems, as it effectively handles stochastic environments [5]. Unfortunately, present RL algorithms fail to reliably satisfy state constraints even when initialized with feasible initial policies [6]. Various approaches have been proposed in the literature, usually applying penalties for constraint violations. Such approaches can be very problematic, easily losing optimality or feasibility [7], especially in the case of a fixed penalty. The main approaches to incorporating constraints in this way make use of trust regions and fixed penalties [8,9], as well as the cross-entropy method [6]. As observed in [8], when penalty methods are applied in policy optimization, the behaviour of the policy may change depending on the value of the penalty parameter.
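For illustration only (again, our notation rather than the abstract's), a fixed-penalty approach replaces the chance-constrained objective with a surrogate such as

\tilde{J}_\kappa(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[ \sum_{t=0}^{T} R(x_t, u_t) - \kappa \sum_{t=0}^{T} \sum_{j} \max\{0,\, g_j(x_t)\} \right],

where a small penalty weight \kappa may leave the constraints violated, while a large one can sacrifice optimality or destabilize learning; this is the sensitivity to the penalty parameter noted in [8].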

We propose a constrained RL algorithm that guarantees the satisfaction of joint chance constraints. To accomplish this, we introduce backoffs, which are computed simultaneously with the feedback policy. The backoffs are adjusted with Bayesian optimization using the empirical cumulative distribution function, thereby guaranteeing the satisfaction of the joint chance constraints.
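A simplified, self-contained sketch of the backoff idea is given below; the toy system, the proportional controller, and all names are illustrative assumptions, not the authors' implementation. Constraints are tightened by a backoff b, joint constraint satisfaction is estimated via the empirical cumulative distribution function over Monte Carlo rollouts, and b is adjusted until the target probability 1 - alpha is met. A simple bisection stands in for the Bayesian optimization, and a fixed feedback law stands in for the policy-gradient training used in the actual method.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stochastic system: x_{t+1} = x_t + u_t + w_t,  w_t ~ N(0, 0.05^2).
    # Original joint constraint: x_t <= 1 for all t over the whole trajectory.
    # Target: P[ all_t x_t <= 1 ] >= 1 - ALPHA.
    ALPHA, HORIZON, N_MC = 0.05, 20, 2000

    def rollout(backoff):
        """Simulate one closed-loop trajectory under a simple proportional
        policy tracking the tightened limit (1 - backoff); return max_t x_t."""
        x, x_max = 0.0, 0.0
        target = 1.0 - backoff          # constraint tightened by the backoff
        for _ in range(HORIZON):
            u = 0.5 * (target - x)      # stand-in feedback law (not an RL policy)
            x = x + u + rng.normal(0.0, 0.05)
            x_max = max(x_max, x)
        return x_max

    def joint_satisfaction(backoff):
        """Empirical probability (empirical CDF evaluated at 1.0) that the
        joint constraint max_t x_t <= 1 holds over N_MC Monte Carlo rollouts."""
        maxima = np.array([rollout(backoff) for _ in range(N_MC)])
        return np.mean(maxima <= 1.0)

    # Adjust the backoff until the joint chance constraint is met.
    # Bisection is a simple stand-in for the Bayesian optimization step.
    lo, hi = 0.0, 0.5
    for _ in range(15):
        mid = 0.5 * (lo + hi)
        if joint_satisfaction(mid) >= 1.0 - ALPHA:
            hi = mid                    # constraint met: try a smaller backoff
        else:
            lo = mid                    # constraint violated: tighten further
    print(f"backoff = {hi:.3f}, empirical satisfaction = {joint_satisfaction(hi):.3f}")

Because the empirical satisfaction probability increases monotonically with the backoff in this toy example, the search converges to the smallest backoff that meets the target probability, which is the same trade-off the Bayesian optimization step resolves in the proposed method.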

[1] Bradford, E.; Schweidtmann, A. M.; Zhang, D.; Jing, K. and del Rio-Chanona, E. A., Dynamic modeling and optimization of sustainable algal production with uncertainty using multivariate Gaussian processes, 118, 143-158, 2018

[2] del Rio-Chanona, E. A.; Fiorelli, F.; Zhang, D.; Rashid Ahmed, N.; Jing, K. and Shah, N., An efficient model construction strategy to simulate microalgal lutein photo-production dynamic process, 114(11), 2518-2527, 2017

[3] do Carmo Nicoletti, M. and Jain, L. C., Computational Intelligence Techniques for Bioprocess Modelling, Supervision and Control, Volume 218 of Studies in Computational Intelligence, Springer Science & Business Media, 29 Jun 2009

[4] Xiong, Z. and Zhang, J., Modelling and optimal control of fed-batch processes using a novel control affine feedforward neural network, Neurocomputing, 61, 317-337, 2004

[5] Petsagkourakis, P.; Sandoval, I. O.; Bradford, E.; Zhang, D. and del Rio-Chanona, E. A., Reinforcement Learning for Batch Bioprocess Optimization, 133, 2020

[6] Wen, M., Constrained Cross-Entropy Method for Safe Reinforcement Learning, Neural Information Processing Systems (NeurIPS), 2018

[7] Ray, A.; Achiam, J. and Amodei, D., Benchmarking Safe Exploration in Deep Reinforcement Learning, Deep RL Workshop NeurIPS 2019, arXiv:1910.01708, 2019

[8] Achiam, J; Held, D.; Tamar, A. and Abbeel, P., Constrained Policy Optimization, International Conference on Machine Learning (ICML) 2017

[9] Tessler, C.; Mankowitz, D. J. and Mannor, S., Reward Constrained Policy Optimization, International Conference on Learning Representations (ICLR), 2019