(144f) Two-Stage Control Using Model-Based Reinforcement Learning and Predictive Control for Fed-Batch Bioreactor
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Next-Gen Manufacturing
Next-Gen Manufacturing in Pharma, Food, and Bioprocessing
Monday, November 16, 2020 - 9:00am to 9:15am
In this work, we propose a two-stage control strategy for the optimal control of a fed-batch bioreactor producing penicillin. The proposed strategy uses model-based Reinforcement Learning (RL) off-line to optimize the state and input operating trajectories. On-line, Moving Horizon Estimation (MHE) is applied to estimate the parameter values, and an adaptive Model Predictive Control (MPC) is used to simultaneously modify and track the trajectories obtained off-line. First, the first-principles model of the system dynamics is expressed in the form of Stochastic Differential Equations (SDEs), which can handle white noise in a continuous-time system. In addition, the variance of the process noise can shift the mean value of the drift terms of the system, so changes in parameter values can be represented by adjusting the variance of the process noise [3]. These SDEs are numerically integrated and serve as a virtual plant. The next step is to obtain the operating trajectories of the reactor off-line. Optimal control methods can be classified into three classes: direct, indirect, and RL (dynamic programming-based) methods. The direct method is the one most widely applied in practice, since the problem formulation and the solution procedure are intuitive. On the other hand, RL has several advantages over the direct method. RL provides a closed-loop solution of the optimal control problem, so a control policy is obtained instead of just a control trajectory [4]. This is a more robust approach, since RL evaluates the control cost for cases in which the states have deviated from the optimal trajectories. In addition, value-based RL provides not only the optimal trajectories but also the value of a state, which is the expected total cost of following the control policy starting from that state [4]. Although this value is only a by-product of finding the optimal control policy, it is useful information when the trajectories are modified on-line.
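As a concrete illustration of the virtual-plant idea, the sketch below integrates a simplified fed-batch model written as an SDE with the Euler-Maruyama scheme. The state variables, kinetic expressions, parameter values, and noise levels are illustrative assumptions, not the model used in the paper.

```python
import numpy as np

def drift(x, u, theta):
    """Simplified fed-batch drift: biomass X, substrate S, product P, volume V.
    Monod growth plus a product-formation term; theta holds kinetic parameters
    (all values here are hypothetical)."""
    X, S, P, V = x
    mu_max, K_s, Y_xs, q_p = theta
    mu = mu_max * S / (K_s + S)          # specific growth rate (Monod)
    F, S_in = u                          # feed rate and feed substrate concentration
    dX = mu * X - F / V * X
    dS = -mu * X / Y_xs + F / V * (S_in - S)
    dP = q_p * X - F / V * P
    dV = F
    return np.array([dX, dS, dP, dV])

def step_euler_maruyama(x, u, theta, dt, sigma, rng):
    """One Euler-Maruyama step of dx = f(x, u; theta) dt + sigma dW."""
    dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)
    return x + drift(x, u, theta) * dt + sigma * dw

# Simulate the virtual plant for one batch under a constant feed policy.
rng = np.random.default_rng(0)
theta = (0.11, 0.006, 0.47, 0.004)       # illustrative kinetic parameters
x = np.array([0.1, 15.0, 0.0, 100.0])    # initial X, S, P, V
sigma = np.array([1e-3, 1e-2, 1e-4, 0.0])
dt, n_steps = 0.5, 400                   # time step [h], number of steps
for _ in range(n_steps):
    x = step_euler_maruyama(x, (0.05, 400.0), theta, dt, sigma, rng)
```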
The trajectories suggested by RL can serve as a baseline for optimal control. However, due to changes in parameter values and disturbances, tracking those trajectories may become infeasible, which requires the trajectories to be modified. For this purpose, MHE and adaptive MPC can be utilized to estimate the parameters and modify the trajectories on-line. However, even when changes in parameter values are successfully estimated by MHE, modifying the trajectories with MPC over the entire horizon may not be feasible because of limited computation time. This forces MPC to solve the optimal control problem with a limited prediction horizon in which the terminal state of the bioreactor is not considered. This becomes a problem because most of the important cost is imposed only at the terminal state of the bioreactor. Therefore, providing a proper terminal cost for MPC is crucial to properly modifying the trajectories. Here, the value calculated by RL can be used as the terminal cost of MPC. Although this value is no longer exact once the parameter values change, the simulation results suggest that the approximation is sufficient for MPC to stably compute the new trajectories.
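A minimal sketch of this receding-horizon construction is given below, assuming a one-step prediction model f (with MHE-estimated parameters), a stage cost, and a learned value function terminal_value from the off-line RL stage. The sampling-based solver is only a placeholder for whichever NLP solver is actually used; all names and signatures are assumptions for illustration.

```python
import numpy as np

def mpc_objective(u_seq, x0, f, stage_cost, terminal_value, theta):
    """Finite-horizon MPC cost: summed stage costs plus the RL value V(x_N)
    evaluated at the predicted terminal state of the short horizon."""
    x, cost = x0, 0.0
    for u in u_seq:
        cost += stage_cost(x, u)
        x = f(x, u, theta)               # one-step prediction with MHE parameters
    return cost + terminal_value(x)      # RL value approximates the cost-to-go

def random_shooting_mpc(x0, f, stage_cost, terminal_value, theta,
                        horizon, n_samples, u_low, u_high, rng):
    """Crude sampling-based solver: evaluate n_samples random input sequences
    and keep the best one; stands in for a proper optimizer in this sketch."""
    best_u, best_cost = None, np.inf
    for _ in range(n_samples):
        u_seq = rng.uniform(u_low, u_high, size=horizon)
        c = mpc_objective(u_seq, x0, f, stage_cost, terminal_value, theta)
        if c < best_cost:
            best_cost, best_u = c, u_seq
    return best_u[0]                     # apply only the first input (receding horizon)
```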
The key to this combination of MPC and RL is selecting a proper prediction horizon length. Under this scheme, the prediction horizon length is determined not only by the computational limits but also by the reliability of the parameter values. A long prediction horizon implies that MPC relies more on the newly estimated parameter values than on those used for RL. The presence of parameter changes is one argument for using a longer prediction horizon. However, the reliability of the estimated parameter values depends on both the quality and the quantity of the measurement data. Therefore, the prediction horizon length should be chosen carefully by evaluating the reliability of the parameter values. We propose a method for evaluating this reliability and a method for adjusting the prediction horizon length accordingly.
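The abstract does not specify the reliability measure; as one hedged possibility, the sketch below maps a scalar summary of the MHE parameter covariance to a horizon length between a minimum and a maximum. The function select_horizon, its scalar reliability score, and the linear mapping are assumptions for illustration only.

```python
import numpy as np

def select_horizon(param_cov, n_min, n_max, cov_ref):
    """Heuristic horizon selection: shrink the prediction horizon when the
    estimated-parameter covariance is large (parameters unreliable), and grow
    it toward n_max as the covariance falls below a reference level cov_ref."""
    spread = np.sqrt(np.trace(param_cov))            # scalar uncertainty measure
    reliability = np.clip(cov_ref / (cov_ref + spread), 0.0, 1.0)
    return int(round(n_min + reliability * (n_max - n_min)))

# Example: fairly confident estimates map to a horizon near the maximum.
cov = np.diag([1e-4, 4e-4, 1e-5])                    # hypothetical MHE covariance
print(select_horizon(cov, n_min=5, n_max=30, cov_ref=0.05))
```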
[1] C. Larroche, M. Á. Sanromán, G. Du, and A. Pandey, Current Developments in Biotechnology and Bioengineering. Amsterdam: Elsevier, 2017.
[2] M. L. Shuler and F. Kargi, Bioprocess Engineering. United States: Academic Internet Publishers, 2007.
[3] H.-H. Kuo, Introduction to Stochastic Integration. New York: Springer, 2006.
[4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 2018.