(234c) Predictive in silico Models for Cell Culture Process Development for Biologics Manufacturing | AIChE

Authors 

Dasgupta, A. - Presenter, Massachusetts Institute of Technology
Gonzalez, J., Takeda Pharmaceuticals
Coley, C., Massachusetts Institute of Technology
Love, J., Massachusetts Institute of Technology
Tie, S., Georgia Institute of Technology
Chin, M., Takeda Pharmaceuticals
Fiordalis, A., Tufts University
Li, L., Takeda Pharmaceuticals
Wang, N., Takeda Pharmaceuticals
Drug discovery and development timelines for biopharmaceuticals span multiple years and cost millions of dollars, with process development and manufacturing accounting for 13-17% of the total research and development (R&D) costs for a successfully launched target [1]. FDA guidance encouraging the use of process analytical technology (PAT) for innovative pharmaceutical development, manufacturing, and quality assurance [2,3] has led to the adoption of these tools within multiple bioprocessing pipelines. Despite this adoption, much work remains to leverage the collected data to streamline process development. Machine learning (ML) and statistical analysis can be used to analyze process data and train predictive models that inform the selection of optimal process parameters and reduce empirical screening, in turn reducing process development timelines and costs. Applications of ML include soft-sensor development, statistical process control and validation, real-time process monitoring, and scale-up analysis. Although this approach is promising, the development and application of ML models for process development remains underexplored, as many studies focus exclusively on latent variable methods such as partial least squares (PLS) [4,5].

In this study, we describe the use of statistical learning techniques to predict cell culture performance attributes and drug quantity and quality metrics for two process types: a fed-batch process for a monoclonal antibody (mAb) product and a perfusion process for an enzyme product. We compare partial least squares regression (PLS-R), random forests (RF), extreme gradient boosting (XGBoost), and long short-term memory (LSTM) networks as modeling methods. We use data collected from offline and online monitoring sources in laboratory-scale experiments, including online pH, temperature, and dissolved oxygen probes as well as offline concentrations of metabolites and nutrients such as glutamine, glucose, and lactate. We assess the models on their ability to predict cell culture performance, product quantity, and product quality multiple days into the future. We predict viable cell density (VCD, a diagnostic of cell culture performance), product titer (a measure of product quantity), and one product quality attribute per product (enzyme activity for the enzyme product and monomer species, a measure of product purity, for the mAb product) with reasonable accuracy (≤20% mean relative error) up to several days in advance. We find that XGBoost achieves the most consistent performance, followed closely by the RF and LSTM models. PLS-R performs comparatively worse on all tasks except VCD prediction. The strengths and weaknesses of these modeling approaches will be discussed.
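
As a rough illustration of the multi-day-ahead prediction task and the error metric described above, the sketch below pairs each day's descriptors with the target measured a fixed number of days later and scores a naive baseline by mean relative error. The function names, the toy data, and the definition of mean relative error used here (the mean of |prediction - truth| / |truth|) are assumptions made for illustration; the abstract does not specify how the study builds its datasets or computes the metric.

```python
from typing import List, Sequence, Tuple


def make_horizon_pairs(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    horizon: int,
) -> Tuple[List[List[float]], List[float]]:
    """Pair the descriptors measured on day t with the target on day t + horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1 day")
    if len(features) != len(target):
        raise ValueError("features and target must cover the same days")
    n_pairs = len(target) - horizon
    X = [list(features[t]) for t in range(n_pairs)]
    y = [target[t + horizon] for t in range(n_pairs)]
    return X, y


def mean_relative_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean of |prediction - truth| / |truth|, as a fraction (0.20 means 20%)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical daily measurements from one run: [glucose, lactate] (made-up values).
descriptors = [[6.0, 0.2], [5.1, 0.6], [4.0, 1.1], [3.2, 1.5], [2.5, 1.7]]
vcd = [1.0, 2.0, 4.0, 8.0, 10.0]  # viable cell density, arbitrary units

X, y = make_horizon_pairs(descriptors, vcd, horizon=2)

# Naive persistence baseline: assume VCD two days from now equals today's VCD.
persistence = vcd[: len(y)]
mre = mean_relative_error(y, persistence)
print(f"{len(X)} training pairs, persistence MRE = {mre:.0%}")
```

In a setting like this, any of the compared model families could be trained on `X` and `y` in place of the persistence baseline, and a separate pair set could be built for each forecast horizon of interest.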

We also interrogate our models to assess the physical significance of their predictions by performing a sensitivity analysis using descriptor ablation studies. Initial results suggest that the models are only weakly reliant on any single descriptor, likely due to correlation with other descriptors. For example, the lowest R² (0.847) and highest mean absolute error (MAE, 9.11) for VCD prediction on the fed-batch process were found when the descriptor measuring the partial pressure of carbon dioxide was removed, compared with a baseline R² of 0.86 and MAE of 8.75 for the model with all descriptors included.
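
A descriptor ablation study of this kind can be sketched as follows: refit the model with one descriptor removed at a time and compare R² and MAE against the model trained on all descriptors. This is a minimal sketch under stated assumptions, not the study's implementation. The ridge regression stand-in, the descriptor names, and the synthetic data are all hypothetical, and the real study uses PLS-R, RF, XGBoost, and LSTM models on process data.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
FitPredict = Callable[[Matrix, List[float], Matrix], List[float]]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ridge_fit_predict(X_train: Matrix, y_train: List[float], X_test: Matrix,
                      alpha: float = 1e-6) -> List[float]:
    """Hypothetical stand-in model: linear ridge regression via Gauss-Jordan elimination.

    The tiny penalty alpha (applied to every coefficient, including the
    intercept) only keeps the normal equations solvable.
    """
    A = [[1.0] + list(row) for row in X_train]  # prepend intercept column
    n = len(A[0])
    # Augmented normal equations: (A^T A + alpha I | A^T y).
    M = [[sum(a[i] * a[j] for a in A) + (alpha if i == j else 0.0) for j in range(n)]
         + [sum(a[i] * t for a, t in zip(A, y_train))] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * z for x, z in zip(M[r], M[c])]
    w = [M[i][n] / M[i][i] for i in range(n)]
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X_test]


def ablation_study(fit_predict: FitPredict, X_train: Matrix, y_train: List[float],
                   X_test: Matrix, y_test: List[float],
                   names: Sequence[str]) -> Dict[str, Tuple[float, float]]:
    """Return (R2, MAE) on the test set for the full model and for each removal."""
    def score(cols: List[int]) -> Tuple[float, float]:
        keep = lambda X: [[row[j] for j in cols] for row in X]
        pred = fit_predict(keep(X_train), y_train, keep(X_test))
        return r2_score(y_test, pred), mean_absolute_error(y_test, pred)

    all_cols = list(range(len(names)))
    results = {"baseline": score(all_cols)}
    for j, name in enumerate(names):
        results[name] = score([c for c in all_cols if c != j])
    return results


# Synthetic data: descriptor_a_proxy closely tracks descriptor_a, while
# descriptor_b carries information that no other descriptor provides.
names = ["descriptor_a", "descriptor_b", "descriptor_a_proxy"]
X_train = [[1, 3, 1.3], [2, 1, 1.8], [3, 4, 3.1], [4, 1, 3.7],
           [5, 5, 5.2], [6, 9, 5.9], [7, 2, 7.3], [8, 6, 7.8]]
X_test = [[2, 8, 2.2], [4, 2, 3.9], [6, 7, 5.7], [8, 1, 8.1]]
y_train = [2 * a + 3 * b for a, b, _ in X_train]
y_test = [2 * a + 3 * b for a, b, _ in X_test]

results = ablation_study(ridge_fit_predict, X_train, y_train, X_test, y_test, names)
for label, (r2, mae) in results.items():
    print(f"removed {label:>18}: R2 = {r2:7.3f}, MAE = {mae:6.3f}")
```

In this toy setup, removing `descriptor_a` costs little because its correlated proxy remains, whereas removing `descriptor_b` degrades both metrics sharply. The same mechanism would explain a small change in R² and MAE when a single correlated descriptor is ablated.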

Our study provides a framework that will ultimately be used to shorten process development timelines and identify optimal, robust operating conditions. In turn, this has the potential to reduce R&D expenditures and improve the timely availability of life-saving medicines for patients worldwide.

References:

1. Suzanne S. Farid, Max Baron, Christos Stamatis, Wenhao Nie, and Jon Coffman. Benchmarking biopharmaceutical process development and manufacturing cost contributions to R&D. mAbs, 12(1):1754999, January 2020. Publisher: Taylor & Francis. https://doi.org/10.1080/19420862.2020.1754999.

2. Center for Drug Evaluation and Research. PAT — A Framework for Innovative Pharmaceutical Development, Manufacturing, and Quality Assurance, June 2020. Publisher: FDA.

3. Carina L. Gargalo, Isuru Udugama, Katrin Pontius, Pau C. Lopez, Rasmus F. Nielsen, Aliyeh Hasanzadeh, Seyed Soheil Mansouri, Christoph Bayer, Helena Junicke, and Krist V. Gernaey. Towards smart biomanufacturing: a perspective on recent developments in industrial measurement and monitoring technologies for bio-based production processes. Journal of Industrial Microbiology & Biotechnology, 47(11):947–964, November 2020.

4. M. Ignova, J. Glassey, A.C. Ward, and G.A. Montague. Multivariate statistical methods in bioprocess fault detection and performance forecasting. Transactions of the Institute of Measurement and Control, 19(5):271–279, December 1997. Publisher: SAGE Publications Ltd STM.

5. Diego A. Suarez-Zuluaga, Daniel Borchert, Nicole N. Driessen, Wilfried A. M. Bakker, and Yvonne E. Thomassen. Accelerating bioprocess development by analysis of all available data: A USP case study. Vaccine, 37(47):7081–7089, November 2019.