(144f) Two-Stage Control Using Model-Based Reinforcement Learning and Predictive Control for Fed-Batch Bioreactor
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Next-Gen Manufacturing
Next-Gen Manufacturing in Pharma, Food, and Bioprocessing
Monday, November 16, 2020 - 9:00am to 9:15am
In this work, we propose a two-stage control strategy for the optimal control of a fed-batch bioreactor producing penicillin. The proposed strategy uses model-based Reinforcement Learning (RL) off-line to optimize the state and input operating trajectories. On-line, Moving Horizon Estimation (MHE) is applied to estimate the parameter values, and an adaptive Model Predictive Control (MPC) is used to simultaneously modify and track the trajectories obtained off-line. First, the first-principles model of the system dynamics is expressed in the form of Stochastic Differential Equations (SDEs), which can handle white noise in a continuous-time system. In addition, the variance of the process noise can shift the mean value of the drift terms of the system, so changes in parameter values can be represented by adjusting the variance of the process noise [3]. These SDEs are numerically integrated and serve as a virtual plant. The next step is to obtain the operating trajectories of the reactor off-line. Optimal control methods can be classified into three classes: direct, indirect, and RL (dynamic programming-based) methods. The direct method is the one most widely applied in practice, since the problem formulation and the solution procedure are intuitive. On the other hand, RL has several advantages over the direct method. RL provides a closed-loop solution of the optimal control problem, so a control policy is obtained instead of just a control trajectory [4]. This is a more robust approach, since RL evaluates the control cost for cases in which the states have deviated from the optimal trajectories. In addition, value-based RL provides not only the optimal trajectories but also the value of a state, which is the expected total cost of following the control policy starting from that state [4]. Although this value is only a by-product of finding the optimal control policy, it is useful information when the trajectories are modified on-line.
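As a concrete illustration of the virtual-plant idea, the sketch below integrates a simplified fed-batch model written as an SDE with the Euler-Maruyama scheme. The state variables, kinetic expressions, parameter values, and noise levels are illustrative assumptions, not the model used in the paper.

```python
import numpy as np

def drift(x, u, theta):
    """Simplified fed-batch drift: biomass X, substrate S, product P, volume V.
    Monod growth plus a product-formation term; theta holds kinetic parameters
    (all values here are hypothetical)."""
    X, S, P, V = x
    mu_max, K_s, Y_xs, q_p = theta
    mu = mu_max * S / (K_s + S)          # specific growth rate (Monod)
    F, S_in = u                          # feed rate and feed substrate concentration
    dX = mu * X - F / V * X
    dS = -mu * X / Y_xs + F / V * (S_in - S)
    dP = q_p * X - F / V * P
    dV = F
    return np.array([dX, dS, dP, dV])

def step_euler_maruyama(x, u, theta, dt, sigma, rng):
    """One Euler-Maruyama step of dx = f(x, u; theta) dt + sigma dW."""
    dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
    return x + drift(x, u, theta) * dt + sigma * dw

# Simulate the virtual plant for one batch under a constant feed policy.
rng = np.random.default_rng(0)
theta = (0.11, 0.006, 0.47, 0.004)       # illustrative kinetic parameters
x = np.array([0.1, 15.0, 0.0, 100.0])    # initial X, S, P, V
sigma = np.array([1e-3, 1e-2, 1e-4, 0.0])
dt, n_steps = 0.5, 400                   # time step [h], number of steps
for _ in range(n_steps):
    x = step_euler_maruyama(x, (0.05, 400.0), theta, dt, sigma, rng)
```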
The trajectories suggested by RL can serve as a baseline for optimal control. However, due to changes in parameter values and disturbances, tracking those trajectories may become infeasible, which requires the trajectories to be modified. For this purpose, MHE and adaptive MPC can be utilized to estimate the parameters and modify the trajectories on-line. However, even when changes in parameter values are successfully estimated by MHE, modifying the trajectories with MPC over the entire horizon may not be feasible because of limited computation time. This forces MPC to solve the optimal control problem with a limited prediction horizon in which the terminal state of the bioreactor is not considered. This becomes a problem because most of the important cost is imposed only at the terminal state of the bioreactor. Therefore, providing a proper terminal cost for MPC is crucial to properly modifying the trajectories. Here, the value calculated by RL can be used as the terminal cost of MPC. Although this value is no longer exact once the parameter values change, the simulation results suggest that the approximation is sufficient for MPC to stably compute the new trajectories.
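A minimal sketch of this receding-horizon construction is given below, assuming a one-step prediction model f (with MHE-estimated parameters), a stage cost, and a learned value function terminal_value from the off-line RL stage. The sampling-based solver is only a placeholder for whichever NLP solver is actually used; all names and signatures are assumptions for illustration.

```python
import numpy as np

def mpc_objective(u_seq, x0, f, stage_cost, terminal_value, theta):
    """Finite-horizon MPC cost: summed stage costs plus the RL value V(x_N)
    evaluated at the predicted terminal state of the short horizon."""
    x, cost = x0, 0.0
    for u in u_seq:
        cost += stage_cost(x, u)
        x = f(x, u, theta)               # one-step prediction with MHE parameters
    return cost + terminal_value(x)      # RL value approximates the cost-to-go

def random_shooting_mpc(x0, f, stage_cost, terminal_value, theta,
                        horizon, n_samples, u_low, u_high, rng):
    """Crude sampling-based solver: evaluate n_samples random input sequences
    and keep the best one; stands in for a proper optimizer in this sketch."""
    best_u, best_cost = None, np.inf
    for _ in range(n_samples):
        u_seq = rng.uniform(u_low, u_high, size=horizon)
        c = mpc_objective(u_seq, x0, f, stage_cost, terminal_value, theta)
        if c < best_cost:
            best_cost, best_u = c, u_seq
    return best_u[0]                     # apply only the first input (receding horizon)
```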
The key to this combination of MPC and RL is selecting a proper prediction horizon length. Under this scheme, the prediction horizon length is determined not only by the computational limits but also by the reliability of the parameter values. A long prediction horizon implies that MPC relies more on the newly estimated parameter values than on those used for RL. The presence of parameter changes is one argument for using a longer prediction horizon. However, the reliability of the estimated parameter values depends on both the quality and the quantity of the measurement data. Therefore, the prediction horizon length should be chosen carefully by evaluating the reliability of the parameter values. We propose a method for evaluating this reliability and a method for adjusting the prediction horizon length accordingly.
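The abstract does not specify the reliability measure; as one hedged possibility, the sketch below maps a scalar summary of the MHE parameter covariance to a horizon length between a minimum and a maximum. The function select_horizon, its scalar reliability score, and the linear mapping are assumptions for illustration only.

```python
import numpy as np

def select_horizon(param_cov, n_min, n_max, cov_ref):
    """Heuristic horizon selection: shrink the prediction horizon when the
    estimated-parameter covariance is large (parameters unreliable), and grow
    it toward n_max as the covariance falls below a reference level cov_ref."""
    spread = np.sqrt(np.trace(param_cov))            # scalar uncertainty measure
    reliability = np.clip(cov_ref / (cov_ref + spread), 0.0, 1.0)
    return int(round(n_min + reliability * (n_max - n_min)))

# Example: fairly confident estimates map to a horizon near the maximum.
cov = np.diag([1e-4, 4e-4, 1e-5])                    # hypothetical MHE covariance
print(select_horizon(cov, n_min=5, n_max=30, cov_ref=0.05))
```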
[1] C. Larroche, M. Á. Sanromán, G. Du, and A. Pandey, Current Developments in Biotechnology and Bioengineering. Amsterdam: Elsevier, 2017.
[2] M. L. Shuler and F. Kargi, Bioprocess Engineering. United States: Academic Internet Publishers, 2007.
[3] H.-H. Kuo, Introduction to Stochastic Integration. New York: Springer, 2006.
[4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 2018.