(144f) Two-Stage Control Using Model-Based Reinforcement Learning and Predictive Control for Fed-Batch Bioreactor

Authors 

Oh, T. H. - Presenter, Seoul National University
Lee, J. M., Seoul National University
As market demand for biomass and pharmaceutical materials increases, various genetically modified cells are tested in batch bioreactors to raise the production rate [1]. At the same time, a robust and efficient control strategy is required to produce the product successfully despite the uncertainty of cellular behavior [2]. However, limited information on the system dynamics makes it challenging to optimize the operating conditions of a bioreactor and to implement an automatic controller. In general, biosystems are less reproducible and more nonlinear than other engineered systems. In particular, most of the parameters in the first-principles model are lumped, and their values vary not only within a single batch as the states change but also from batch to batch. Therefore, an automatic controller that simply tracks optimal control trajectories obtained off-line is not a robust approach to producing quality product across a large number of batch reactors.

In this work, we propose a two-stage control strategy to optimally control a fed-batch bioreactor that produces penicillin. The proposed strategy uses model-based Reinforcement Learning (RL) off-line to optimize the operating trajectories of the states and inputs. On-line, Moving Horizon Estimation (MHE) is applied to estimate the parameter values, and an adaptive Model Predictive Control (MPC) is used to simultaneously modify and track the trajectories obtained off-line. First, the first-principles model of the system dynamics is represented as Stochastic Differential Equations (SDEs), which can rigorously handle white noise in a continuous-time system. In addition, the variance of the process noise can shift the mean of the drift terms, so changes in parameter values can be represented by adjusting the process-noise variance [3]. These SDEs are numerically integrated and serve as a virtual plant. The next step is to obtain the operating trajectories of the reactor off-line. Optimal control methods can be classified into three classes: direct, indirect, and RL (dynamic-programming-based) methods. The direct method is the one most often applied in practice since its problem formulation and solution procedure are intuitive. On the other hand, RL has several advantages over the direct method. RL provides a closed-loop solution of the optimal control problem, so a control policy is obtained rather than just a control trajectory [4]. This is more robust because RL evaluates the control cost even when the states have deviated from the optimal trajectories. In addition, value-based RL provides not only the optimal trajectories but also the value of each state, i.e., the expected total cost of following the control policy from that state [4]. Although this value is only a by-product of finding the optimal control policy, it is useful information when the trajectories are modified on-line.
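To make the virtual-plant idea concrete, the sketch below integrates a generic SDE with the Euler-Maruyama scheme. The drift f(x, u), diffusion g(x), and the two-state kinetics (biomass and substrate with feed rate u) are hypothetical placeholders for illustration only, not the penicillin model used in this work.

```python
# Minimal sketch of an SDE "virtual plant", assuming a generic drift/diffusion.
import numpy as np

def euler_maruyama(f, g, x0, u_traj, dt, rng):
    """Integrate dx = f(x, u) dt + g(x) dW over one batch."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for u in u_traj:
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Wiener increments
        x = x + f(x, u) * dt + g(x) * dw
        traj.append(x.copy())
    return np.array(traj)

# Hypothetical 2-state example: biomass X and substrate S with feed rate u.
mu_max, Ks, Yxs = 0.1, 0.05, 0.5  # illustrative kinetic parameters
f = lambda x, u: np.array([mu_max * x[1] / (Ks + x[1]) * x[0],
                           -mu_max * x[1] / (Ks + x[1]) * x[0] / Yxs + u])
g = lambda x: 0.01 * np.abs(x)    # state-dependent process noise

rng = np.random.default_rng(0)
traj = euler_maruyama(f, g, x0=[0.1, 10.0], u_traj=np.full(200, 0.02),
                      dt=0.05, rng=rng)
```

Raising or lowering the noise scale in g(x) is one simple way to mimic the drift of parameter values within a batch and from batch to batch, as described above.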

The trajectories suggested by RL can serve as a baseline for optimal control. However, due to changes in parameter values and disturbances, tracking those trajectories might not be feasible, which requires the trajectories to be modified. For this purpose, MHE and adaptive MPC are used to estimate the parameters and modify the trajectories on-line. However, even when the changes in parameter values are successfully estimated by MHE, modifying the trajectories with MPC over the entire remaining horizon may be infeasible within the available computation time. This forces MPC to solve the optimal control problem over a limited prediction horizon in which the terminal state of the bioreactor is not considered. This is problematic because most of the important cost is imposed only at the terminal state of the bioreactor. Therefore, providing a proper terminal cost for MPC is crucial to modifying the trajectories properly. Here, the value calculated by RL can be used as the terminal cost of MPC. Although this value is no longer exact once the parameter values have changed, the simulation results suggest that the approximation is good enough for MPC to compute the new trajectories stably.
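The following sketch illustrates the idea of using the RL value function as the terminal cost of a short-horizon MPC. The step(), stage_cost(), and value_fn() callables are hypothetical stand-ins for the predicted dynamics, stage cost, and learned value described in the text, and the toy usage at the bottom is not the authors' penicillin model.

```python
# Minimal sketch: short-horizon MPC with an RL value estimate as terminal cost.
import numpy as np
from scipy.optimize import minimize

def mpc_with_rl_terminal_cost(x0, horizon, step, stage_cost, value_fn, u_dim=1):
    """Solve a short-horizon OCP whose terminal cost is the RL value estimate."""
    def objective(u_flat):
        u_seq = u_flat.reshape(horizon, u_dim)
        x, cost = np.array(x0, dtype=float), 0.0
        for u in u_seq:
            cost += stage_cost(x, u)
            x = step(x, u)             # predicted state under current parameters
        return cost + value_fn(x)      # RL value approximates the cost-to-go
    res = minimize(objective, np.zeros(horizon * u_dim), method="Nelder-Mead")
    return res.x.reshape(horizon, u_dim)[0]  # apply only the first input

# Illustrative usage with toy models (hypothetical, for shape only):
step = lambda x, u: x + 0.05 * (-0.1 * x + u)
stage_cost = lambda x, u: 0.01 * float(u @ u)
value_fn = lambda x: float((x - 1.0) @ (x - 1.0))  # stand-in for learned value
u0 = mpc_with_rl_terminal_cost(np.array([0.5]), horizon=5, step=step,
                               stage_cost=stage_cost, value_fn=value_fn)
```

The design point is that the truncated horizon no longer ignores the end of the batch: the cost-to-go absorbed into value_fn carries the terminal-state economics that the short horizon cannot see.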

The key to this combination of MPC and RL is selecting a proper prediction-horizon length. Under this scheme, the horizon length is determined not only by the computational limits but also by the reliability of the parameter values. A long prediction horizon implies that MPC relies more on the newly estimated parameter values than on those used for RL. The presence of parameter changes is one argument for a longer prediction horizon. However, the reliability of the estimated parameter values depends on both the quality and quantity of the measurement data. Therefore, the prediction-horizon length should be chosen carefully by evaluating that reliability. We propose a method for evaluating this reliability and a corresponding method for adjusting the prediction-horizon length.
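One simple way to realize such an adjustment, shown below, is to map a reliability measure of the MHE parameter estimate (here, the trace of its covariance) to a horizon length. The mapping, thresholds, and the use of the covariance trace are illustrative assumptions, not the specific rule proposed in this work.

```python
# Hypothetical heuristic: lengthen the MPC horizon only when the MHE parameter
# estimate is reliable (small covariance); otherwise lean on the RL value.
import numpy as np

def choose_horizon(param_cov, n_min=3, n_max=20, trace_lo=1e-3, trace_hi=1e-1):
    """Map the trace of the MHE parameter covariance to a horizon length."""
    spread = float(np.trace(np.atleast_2d(param_cov)))
    if spread <= trace_lo:   # parameters well identified -> trust them further out
        return n_max
    if spread >= trace_hi:   # poorly identified -> short horizon, rely on RL value
        return n_min
    # Log-linear interpolation between the two limits.
    frac = (np.log(trace_hi) - np.log(spread)) / (np.log(trace_hi) - np.log(trace_lo))
    return int(round(n_min + frac * (n_max - n_min)))

print(choose_horizon(np.diag([5e-3, 2e-3])))  # intermediate covariance -> mid-length horizon
```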

[1] C. Larroche, M. Á. Sanromán, G. Du, and A. Pandey, Current developments in biotechnology and bioengineering. Amsterdam: Elsevier, 2017.

[2] M. L. Shuler and F. Kargi, Bioprocess engineering. United States: Academic Internet Publishers, 2007.

[3] H.-H. Kuo, Introduction to stochastic integration. New York: Springer, 2006.

[4] R. S. Sutton and A. G. Barto, Reinforcement learning: an introduction, 2nd ed. Cambridge, MA: MIT Press, 2018.
