(149z) Reinforcement Learning Based Control of Fed-Batch Production Reactor

Authors 

Kontoravdi, C., Imperial College London
Fadda, S., Imperial College London
Mammalian cells produce up to 80% of commercially available therapeutic proteins, with Chinese Hamster Ovary (CHO) cells being the primary production host. Producing monoclonal antibodies, which are used in the treatment of cancer and autoimmune diseases, in CHO cells is a highly complex and nonlinear process with many correlated variables. The industry standard for controlling this process is PID control of pH, temperature and dissolved oxygen tension. These traditional control strategies, however, fail to capture the complex dynamics of the process, leading to high batch-to-batch variability. This is problematic in a highly regulated industry like biopharma, which must ensure a consistent and safe product.
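For context on this baseline, the following is a minimal sketch of a discrete PID update as it might be applied to a single loop such as dissolved oxygen tension; the gains, setpoint and sample time are illustrative placeholders, not values from this work.

```python
# Minimal discrete PID sketch for a single loop (e.g. dissolved oxygen).
# Gains, setpoint and sample time are illustrative placeholders only.

class PID:
    def __init__(self, kp, ki, kd, setpoint, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint, self.dt = setpoint, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement):
        error = self.setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # Control signal, e.g. the sparger gas flow that actuates DO
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Example: regulate dissolved oxygen tension at 40% air saturation
do_controller = PID(kp=2.0, ki=0.1, kd=0.5, setpoint=40.0, dt=1.0)
gas_flow = do_controller.update(measurement=35.0)
```

Each loop (pH, temperature, DO) is tuned independently, which is precisely why this scheme cannot account for the coupled, nonlinear culture dynamics described above.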

The inherent complexity of bioprocesses makes them challenging to model purely mechanistically. At the same time, the scarcity of experimental data and the need for explainable control policies prevent the use of fully data-driven solutions. Hybrid systems that combine mechanistic and data-driven tools therefore provide a suitable compromise, although it remains unclear how best to integrate the two components. In this work, we propose a model-based reinforcement learning agent to optimise the glucose feeding strategy. In reinforcement learning, an agent interacts with an environment and learns to act through trial and error: by receiving rewards for its actions, the agent develops a control policy from the experiences gathered in the environment.
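As a concrete illustration of this trial-and-error interaction, the sketch below assumes a Gym-style reset()/step() environment interface and a generic agent exposing act()/learn() methods; both objects and the episode structure are illustrative assumptions, not the exact training code used in this work.

```python
# Generic reinforcement-learning interaction loop (a sketch only).
# `env` is assumed to expose a Gym-style reset()/step() interface and
# `agent` to expose act()/learn() methods.

def run_episode(env, agent):
    obs = env.reset()                       # initial (partial) observation
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(obs)             # e.g. glucose feed for this interval
        next_obs, reward, done, info = env.step(action)
        agent.learn(obs, action, reward, next_obs, done)  # update the policy
        obs = next_obs
        total_reward += reward
    return total_reward
```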

We define the environment as a high-fidelity kinetic model of the production reactor, simulating CHO cell growth and antibody production dynamics. The model consists of differential and algebraic equations describing the mass balances for each component (cells, nutrients and metabolites) and the corresponding specific uptake and production rates. The cell culture dynamics are described by Monod-type kinetics. The model covers 22 metabolites and requires the estimation of 35 parameters. This high-fidelity model provides a safe and realistic surrogate of the physical process in which the agent can explore and learn.

The agent receives only partial state observations of the reactor. These include the concentrations of amino acids, cell density and antibody titre, measurements that are commonly available offline during manufacturing. The agent acts on the environment by manipulating the glucose concentration and volume of the feed provided to the production reactor. To promote the learning of useful strategies, the agent is rewarded for maximising biomass during the growth phase of the batch and for maximising antibody production in the later stages. Furthermore, the agent receives observations from a wide operational space, allowing it to learn the system's behaviour across a broad range of conditions.
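To make this structure concrete, the sketch below implements a drastically simplified fed-batch environment: a single Monod-type growth rate, mu = mu_max * S / (K_S + S), stands in for the full 22-metabolite DAE model, the action is a glucose feed, observations are partial (offline-style measurements only) and the reward switches from biomass growth to product formation mid-batch. All kinetic parameters, state variables and the reward weighting are illustrative assumptions, not values from this work.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative kinetic parameters (not from this work):
# max specific growth rate, Monod constant, biomass yield, product rate.
MU_MAX, KS, Y_XS, Q_P = 0.04, 1.0, 0.5, 1e-3

def fed_batch_rhs(t, y, feed_rate, feed_conc):
    X, S, P, V = y                            # biomass, glucose, product, volume
    mu = MU_MAX * S / (KS + S)                # Monod-type specific growth rate
    dX = mu * X - feed_rate / V * X           # growth minus feed dilution
    dS = -mu * X / Y_XS + feed_rate / V * (feed_conc - S)
    dP = Q_P * X - feed_rate / V * P          # antibody accumulation
    dV = feed_rate
    return [dX, dS, dP, dV]

class FedBatchEnv:
    def __init__(self, dt=24.0, horizon=14):  # daily feeds over a 14-day batch
        self.dt, self.horizon = dt, horizon

    def reset(self):
        self.t_step = 0
        self.state = np.array([0.3, 5.0, 0.0, 1.0])   # X, S, P, V
        return self._observe()

    def step(self, action):
        feed_rate, feed_conc = action          # manipulated feed variables
        sol = solve_ivp(fed_batch_rhs, (0.0, self.dt), self.state,
                        args=(feed_rate, feed_conc))
        prev = self.state
        self.state = sol.y[:, -1]
        self.t_step += 1
        # Phase-dependent reward: biomass growth early, product formation late.
        if self.t_step <= self.horizon // 2:
            reward = self.state[0] - prev[0]
        else:
            reward = self.state[2] - prev[2]
        done = self.t_step >= self.horizon
        return self._observe(), reward, done, {}

    def _observe(self):
        # Partial observation: offline-style measurements only (volume hidden).
        X, S, P, _ = self.state
        return np.array([X, S, P])
```

Splitting the reward by batch phase mirrors the two-stage objective described above, while the restricted observation vector forces the agent to act on the same information available to operators during manufacturing.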

To evaluate the efficacy of the reinforcement learning agent, we compare its performance against two baselines: a design-of-experiments-based feeding strategy [1] and a model-based control approach that uses a metabolic model to dictate the control policy. Our results suggest that the proposed methodology can achieve higher product yield than conventional feeding approaches.

The proposed data-driven methodology is a step towards advanced process control in the bioprocessing industry. Our approach enables the integration of mechanistic knowledge with data to develop more effective control strategies. Furthermore, the proposed controller is adaptable, reducing the need for frequent reparameterisation of the controller model when culture conditions change.

References:

[1] Kyriakopoulos S, Kontoravdi C. A framework for the systematic design of fed-batch strategies in mammalian cell culture. Biotechnol Bioeng. 2014;111(12):2466-2476. doi:10.1002/bit.25319