(299h) Quantitative Comparison of Model-Free Reinforcement Learning and Data-Driven Model Based Optimal Control

The sequential use of system identification, filtering, and model predictive control has been the standard procedure for implementing optimal control in chemical processes. Thanks to linear system theory, computationally efficient methods such as N4SID, the Kalman filter, and LQR were developed, and optimal control implemented through this procedure shows satisfactory performance. However, because these methods benefit heavily from linear system theory, extending them to nonlinear systems with economic costs remains challenging.
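
To make this pipeline concrete, the sketch below (in Python with NumPy) walks through its three steps on a hypothetical two-state toy system: a linear state-space model is fitted by ordinary least squares (a simple stand-in for N4SID, assuming noisy full-state measurements), and the LQR and steady-state Kalman gains are then computed by iterating the discrete-time Riccati equation. The matrices A_true and B_true and all numerical values are illustrative, not taken from any benchmark in this study.

    import numpy as np

    def dare(A, B, Q, R, iters=500):
        """Iterate the discrete-time algebraic Riccati equation to a fixed point."""
        P = Q.copy()
        for _ in range(iters):
            K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            P = Q + A.T @ P @ (A - B @ K)
        return P, K

    # Step 1: system identification (least-squares fit of x+ = A x + B u).
    # Full state measurements are assumed for brevity; N4SID works from
    # input-output data alone.
    rng = np.random.default_rng(0)
    A_true = np.array([[0.9, 0.1], [0.0, 0.8]])   # hypothetical plant
    B_true = np.array([[0.0], [1.0]])
    X, U = [rng.standard_normal(2)], rng.standard_normal((200, 1))
    for u in U:
        X.append(A_true @ X[-1] + B_true @ u + 0.01 * rng.standard_normal(2))
    X = np.array(X)
    Z = np.hstack([X[:-1], U])                    # regressors [x_k, u_k]
    theta, *_ = np.linalg.lstsq(Z, X[1:], rcond=None)
    A_hat, B_hat = theta[:2].T, theta[2:].T       # fitted model

    # Step 2: LQR gain from the fitted model (control Riccati equation).
    _, K_lqr = dare(A_hat, B_hat, np.eye(2), np.eye(1))   # u = -K_lqr @ x

    # Step 3: steady-state Kalman predictor gain via the dual Riccati equation
    # (C = I here, i.e., noisy full-state measurements).
    W, V = 1e-4 * np.eye(2), 1e-2 * np.eye(2)     # process / measurement noise
    _, Kf = dare(A_hat.T, np.eye(2), W, V)
    L = Kf.T    # x_hat+ = A x_hat + B u + L (y - x_hat)

Every step here is cheap precisely because the model is linear and the cost quadratic; none of the three steps carries over directly to a nonlinear system with an economic cost, which is the difficulty noted above.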

Recently, the digitalization of manufacturing processes has begun to provide a platform where process data can be used online. Model-free reinforcement learning (RL), a purely data-based method, is therefore attracting attention as another option for implementing optimal control in chemical processes. Model-free RL does not rely on linearity and can learn the optimal control policy from process data alone (a minimal sketch of this principle follows the list below). Motivated by this generality, various studies have applied RL to chemical processes, including microfluidics, textiles, simulated moving beds, polishing, polymerization, and, most prominently, bioprocesses. These studies demonstrate the potential advantages of RL, emphasizing its ability to handle nonlinear and stochastic systems without a model. However, they also reveal several limitations of RL:

  • RL may require excessive amounts of data to achieve reasonable performance.
  • Conventional RL methods do not ensure constraint satisfaction.
  • Performance is sensitive to the choice of hyper-parameters.
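
The following toy sketch illustrates the model-free principle shared by these methods: a tabular Q-learning agent (far simpler than the A2C, DDPG, and SAC algorithms compared in this work) learns a setpoint-tracking policy for a hypothetical discretized process purely by interacting with it, with no model ever being fitted. The step function and all numerical values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions = 10, 3            # coarse discretization of a toy process
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.99, 0.1     # the hyper-parameters the list warns about

    def step(s, a):
        """Hypothetical plant: action a in {0,1,2} nudges the state down/none/up,
        with noise; the reward is the negative distance to the setpoint (state 5)."""
        s_next = int(np.clip(s + (a - 1) + rng.integers(-1, 2), 0, n_states - 1))
        return s_next, -abs(s_next - 5)

    for episode in range(2000):            # thousands of plant interactions needed
        s = int(rng.integers(n_states))
        for _ in range(50):
            # epsilon-greedy exploration: the agent must probe the plant for data
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = step(s, a)
            # temporal-difference update: no model of `step` is ever identified
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next

    policy = Q.argmax(axis=1)              # greedy policy learned from data alone

Even this ten-state toy consumes on the order of 10^5 plant interactions, and nothing in the update prevents exploratory actions from violating constraints, previewing the first two limitations listed above.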

The overall schemes of the model-based approach and RL are presented in Figures 1 and 2, respectively. The key difference is what is obtained from the process data. In Figure 1, a state-space model is first obtained using a system identification method, and the control policy is then obtained by performing an optimization over the fitted model. Under the RL scheme, in contrast, the control policy is obtained directly from the data.

This study quantitatively compares the conventional model-based approach and model-free RL with respect to several criteria: control performance, data efficiency, sensitivity of performance to system parameters and hyper-parameters, and frequency of constraint violations. Depending on the results, this work can either provide solid support for developing and implementing RL methods in chemical processes or reveal a proper direction for improving them. In addition, this work provides the comparison results and an environment for testing control methods developed in the future.

The comparisons are conducted on several selected benchmark chemical processes, such as a CSTR, distillation, a bioprocess, and polymerization. Each benchmark system has its own characteristics that make it challenging to control. For model-free RL, algorithms such as A2C, DDPG, and SAC are applied to the benchmark systems. For the model-based approach, a linear model, a Gaussian process, and a nonlinear ODE represented by deep neural networks are fitted from the data; note that first-principles models with parameter estimation are excluded because a proper first-principles model may not be available in practice. Model predictive control based on each fitted model is then applied to the benchmark systems.
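
As one concrete instance of the model-based side of the comparison, the sketch below implements a single unconstrained receding-horizon MPC step using a fitted linear model, the simplest of the three model classes named above. Because no constraints are imposed and the model is linear, the optimization collapses to one linear least-squares solve; the fitted matrices A_hat and B_hat are hypothetical placeholders (e.g., from the least-squares identification sketched earlier).

    import numpy as np

    def mpc_action(A, B, x0, x_ref, N=10, r=0.1):
        """One receding-horizon step: minimize sum_k ||x_k - x_ref||^2 + r*||u_k||^2
        over horizon N using the fitted model x+ = A x + B u. Unconstrained, so
        the problem reduces to a linear least-squares solve."""
        n, m = B.shape
        # Batch prediction: stack x_1..x_N as X = F x0 + G U.
        F = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
        G = np.zeros((N * n, N * m))
        for k in range(N):
            for j in range(k + 1):
                G[k*n:(k+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, k - j) @ B
        # Stack the tracking residual and the input penalty into one problem.
        H = np.vstack([G, np.sqrt(r) * np.eye(N * m)])
        b = np.concatenate([np.tile(x_ref, N) - F @ x0, np.zeros(N * m)])
        U, *_ = np.linalg.lstsq(H, b, rcond=None)
        return U[:m]                       # apply only the first input, then re-solve

    # Hypothetical fitted model and one control step toward the origin.
    A_hat = np.array([[0.9, 0.1], [0.0, 0.8]])
    B_hat = np.array([[0.0], [1.0]])
    u0 = mpc_action(A_hat, B_hat, x0=np.array([1.0, -1.0]), x_ref=np.zeros(2))

Calling mpc_action at every sampling instant and applying only the first input yields the receding-horizon behavior; swapping in a Gaussian-process or neural-ODE model would require a nonlinear solver in place of the least-squares step, and adding input or state constraints would turn the problem into a QP or NLP.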