
(135e) AI-Based Optimal Batch Control for Industrial Penicillin Fermentation Leveraging Deep Reinforcement Learning

Authors 

Qiu, T., Tsinghua University
Since the development of deep-tank fermentation in the 1940s, penicillin has been produced on a large scale, and over the following decades penicillin fermentation has grown into a multi-billion-dollar industry. However, due to regulatory restrictions, the development of advanced process control technology for this process still lags behind other chemical industries such as oil and gas. As a pioneering batch biopharmaceutical process, penicillin fermentation has seen steady progress in kinetic simulation. Currently, the most advanced industrial penicillin simulator is IndPenSim (http://www.industrialpenicillinsimulation.com). In this open-source simulator, the first-principles model of penicillin production is described by a system of ordinary differential equations incorporating a structured biomass model, component equilibria, and inhibition effects [1]. This simulator also serves as the benchmark for current research on advanced control strategies for penicillin fermentation processes [2].
The difficulty in batch control of the penicillin fermentation process comes from two main aspects. On one hand, the process state measurements available during operation are limited. Measurements fall into three main categories: online measurements (cheap and fast), offline measurements (slow and lagging), and Raman spectroscopy (expensive). Online measurements provide a basic description of the process state, such as temperature, pH, dissolved oxygen, fermenter weight, and outlet exhaust gas composition. However, critical process variables such as penicillin concentration, biomass concentration, phenylacetic acid concentration, nitrogen concentration, and viscosity, despite their direct and significant effect on the penicillin production rate, can only be obtained by offline measurements, which provide lagging, low-resolution data and make it difficult to construct high-precision control strategies. In addition, Raman-spectroscopy-based process analytical technology (PAT) can predict phenylacetic acid concentration in real time from spectral data, enabling real-time PID control of phenylacetic acid concentration; however, this technique is expensive, and few practical applications are available.
On the other hand, process disturbances make the batch yield of penicillin unstable. In batch production, the initial fermentation conditions differ from batch to batch, for example initial substrate concentration, specific biomass growth rate, and dissolved oxygen, and the penicillin production rate is very sensitive to these variables, so the batch yield can vary significantly under the same control strategy when the initial conditions differ. In addition, in-batch fluctuations occur during each batch, such as in inlet sugar concentration, oil concentration, phenylacetic acid concentration, and cooling water temperature; these exogenous disturbances perturb the state inside the fermenter and cause the batch yield to fluctuate.
Due to these two difficulties, the three control strategies reported so far (operator control, recipe-driven control, and PAT control) cannot achieve optimal batch control under disturbances with a limited observation space. In recent years, deep reinforcement learning (DRL) has achieved great success in fields such as gaming, aerospace, and autonomous driving. Reinforcement learning uses multilayer neural networks to approximate value and policy functions and can automatically find the action that maximizes the expected future reward in a given state by learning from interaction with the environment, thereby achieving optimal control of the process. However, applications of this technique in chemical process control are still limited.
In this work, we use DRL to achieve optimal batch control of the penicillin fermentation process. First, the three elements of reinforcement learning are defined: state, action, and reward. The state is a 16-dimensional vector describing the penicillin fermenter, containing five offline measurement variables with a sampling interval of 12 hours and 11 online measurement variables with a sampling interval of 12 minutes. The action is a 7-dimensional vector containing all manipulated variables of the process. The reward is the change in penicillin yield in the fermenter relative to the previous step, obtained by subtracting the previous step's yield from the current step's yield. The time interval for each step in training is 12 minutes. The agent-environment interaction proceeds as follows: at each step, the agent receives the state variables from the IndPenSim environment (PenSimEnv) and outputs an action, which is sent to PenSimEnv for execution. By solving the ODEs, PenSimEnv generates the current-step reward and the next state, which are used to update the agent.
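A minimal sketch of this interaction loop is shown below, assuming a gym-style Python wrapper around IndPenSim; the names env, agent, and their methods are illustrative assumptions, not the authors' implementation.

STATE_DIM = 16     # 5 offline variables (12 h interval) + 11 online variables (12 min interval)
ACTION_DIM = 7     # all manipulated variables of the process
STEP_MINUTES = 12  # control interval of one training step

def reward(prev_yield: float, curr_yield: float) -> float:
    # Reward as defined above: the change in penicillin yield over one
    # 12-minute step (computed by the environment at every step).
    return curr_yield - prev_yield

def run_episode(env, agent, n_steps: int = 1150) -> float:
    # Roll out one batch: the agent observes the 16-dimensional state,
    # issues a 7-dimensional action, and the environment solves the ODEs
    # for one step to return the next state and the yield-difference reward.
    state = env.reset()
    total_yield = 0.0
    for _ in range(n_steps):
        action = agent.act(state)                   # 7-dim control move
        next_state, r, done = env.step(action)      # integrate ODEs for 12 min
        agent.update(state, action, r, next_state)  # off-policy SAC update
        total_yield += r
        state = next_state
        if done:
            break
    return total_yield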
In this study, the Soft Actor-Critic (SAC) algorithm [3] was employed to train our model. This algorithm has gained significant attention in recent years for its effectiveness in solving complex continuous control problems. Unlike traditional Q-learning approaches, SAC is a policy optimization method that directly learns a stochastic policy in an off-policy setting, which makes it well suited to the continuous action space of this control problem. SAC uses maximum entropy reinforcement learning to encourage exploration, a critic network to estimate the value function, and a soft Q-learning objective to learn a more effective policy. The combination of these techniques allows SAC to learn efficient policies for a range of challenging robotic and control tasks.
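For reference, the maximum-entropy objective that SAC maximizes (following reference 3) augments the expected return with a policy-entropy bonus weighted by a temperature parameter \alpha:

J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big( \pi(\cdot \mid s_t) \big) \right]

Here r(s_t, a_t) is the yield-difference reward defined above, \rho_\pi is the state-action distribution induced by the policy \pi, and a larger \alpha places more weight on exploration.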
The number of training episodes is set to 10,000, with 1,150 steps per episode. After training, 30 test batches are used to compare the performance of the developed DRL model with the three reported control strategies (operator, recipe-driven, and Raman-spectroscopy-based PAT). The average batch yields for the operator, recipe-driven, PAT, and DRL models are 2552, 2879, 3517, and 3810 kg, respectively. In addition, the DRL model shows the smallest yield variance across the 30 test batches. These results indicate that the DRL model significantly improves the average batch yield while reducing batch-to-batch yield fluctuation, demonstrating its superiority over existing methods.
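A minimal sketch of this evaluation protocol, reusing the assumed rollout helper from the sketch above (the function names are illustrative, not the authors' code):

import numpy as np

def evaluate(env, agent, n_batches: int = 30, n_steps: int = 1150):
    # Roll out 30 test batches with the trained policy and report the mean
    # batch yield and its spread across batches (the DRL policy is reported
    # to give the highest mean yield and the smallest variance).
    yields = np.array([run_episode(env, agent, n_steps) for _ in range(n_batches)])
    return yields.mean(), yields.std()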
In summary, we present an advanced DRL-based optimal batch control framework for industrial penicillin production. The framework implicitly learns the process state from limited offline and online observations, achieving higher batch yields and smaller yield fluctuations than existing control methods in the presence of process disturbances. We believe the proposed framework is a promising artificial-intelligence-based batch control method for industrial processes.
(1) Goldrick, S.; Ştefan, A.; Lovett, D.; Montague, G.; Lennox, B. The Development of an Industrial-Scale Fed-Batch Fermentation Simulation. Journal of Biotechnology 2015, 193, 70–82. https://doi.org/10.1016/j.jbiotec.2014.10.029.
(2) Goldrick, S.; Duran-Villalobos, C. A.; Jankauskas, K.; Lovett, D.; Farid, S. S.; Lennox, B. Modern Day Monitoring and Control Challenges Outlined on an Industrial-Scale Benchmark Fermentation Process. Computers & Chemical Engineering 2019, 130, 106471. https://doi.org/10.1016/j.compchemeng.2019.05.037.
(3) Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. 2018. https://doi.org/10.48550/ARXIV.1801.01290.