(234f) A Metaheuristic Investigation of the Impact of Nutrient Supplementation Strategies on the Growth and Productivity of Mammalian Cell Cultures
AIChE Annual Meeting
2021 Annual Meeting
Food, Pharmaceutical & Bioengineering Division
Biomanufacturing with Advanced Mammalian Cell Culture Technologies
Tuesday, November 9, 2021 - 9:45am to 10:06am
1. Introduction
Digital transformation of biopharmaceutical process development has become a rapidly expanding area, opening new opportunities for value creation across the entire drug manufacturing lifecycle. This digitalization has greatly benefited from ongoing developments in advanced sensor technology, robotic high-throughput experimental platforms, the consolidation of a broad range of instrumental techniques, increased computational power and emerging data management systems. These developments have led to a considerable number of process data repositories and archives that hold underlying information on the root causes of performance, input-output relationships and fluctuations in bioprocesses. The steady integration of model-based approaches and tools to interpret these large amounts of complex process data therefore aims to improve operational efficiency and robustness in bioprocess design.
Traditionally, model-based approaches in the bioprocessing space have focused on tasks such as process development, scale-up, optimisation, validation, monitoring and control. Across these tasks, process models have taken on descriptive, diagnostic, predictive and prescriptive functions. Nevertheless, the inherently multivariate nature of bioprocessing operations, the simultaneous observation of phenomena across multiple scales of complexity and their time-varying behavior still pose multiple challenges for the analysis of process datasets. More recently, an increasing number of data-centric approaches and applications based on machine learning have been developed, adapted and applied to biomanufacturing. Machine learning focuses on the theory, performance and properties of learning systems and the associated algorithms used to turn large data collections into knowledge and actionable outcomes. In particular, data mining, a subdomain of machine learning, has the potential to unveil meaningful relationships, predict future process outcomes and guide strategies for robust process design and operation.
Although several metaheuristic approaches have been developed for data mining across many data-intensive fields, the lack of a universal methodology gives rise to multiple challenges that depend on the nature, structure and complexity of the data layers. For instance, upstream unit operations generate inherently complex data covering biological system properties, interactions between the biological system and the process environment, process configuration and process control actions. In addition, a number of dynamically changing features are typically monitored with varied frequency and resolution (on-line, in-line, at-line and off-line data).
Herein, we introduce a data-driven investigation in the upstream space of the impact of nutrient supplementation strategies on the growth and productivity of fed-batch CHO cell cultures. The approach examined the use of machine learning techniques to extract, from historical records, knowledge relevant to bioprocess performance, cell culture media development and feeding strategies. This methodology (Gangadharan et al., 2021) addressed the particular challenges arising from the multivariate, rank-deficient, time-series and non-linear nature of bioprocess datasets through data pre-processing steps. Subsequently, the framework was used to explore how best to use the knowledge obtained to describe and predict the performance of current and future cell culture operations in terms of nutrient supplementation strategies.
2. Methodology
A historical dataset (2016-2020) consisting of 126 fed-batch CHO cell cultures (<5 L scale, 12-15 day cultivations, viable cell density (VCD) < 20 x 10^6 cells mL^-1) was collated for the investigation of upstream process design at bench scale (University College London, UK). This dataset included 51 variables related to cellular and metabolic parameters (3 cell line types, cell growth profile and cell-specific metabolic rates), physical and chemical environmental parameters (process parameters and extracellular nutrient concentrations) and nutrient supplementation strategy parameters (2 basal media, 4 feed formats, varied feeding schedules and 17 nutrient group supplementations). The combined framework was implemented in the MATLAB computing platform (MATLAB 2019a, The MathWorks, Inc., USA) and the open-source software R (version 3.6.0, R Foundation for Statistical Computing, Austria).
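As an illustration of the kind of derived variables listed above (cell growth profile and cell-specific rates), the following minimal sketch computes the interval specific growth rate, the integral of viable cell density (IVCD) and an average cell-specific productivity from a sampled time course. It is written in Python for compactness, whereas the study itself used MATLAB and R. The function names, the trapezoidal integration and the choice of units are assumptions made for this example; the abstract does not state how these rates were calculated.

```python
import math


def specific_growth_rate(t_days, vcd):
    """Interval specific growth rate, mu = ln(X2 / X1) / (t2 - t1), in 1/day."""
    return [
        math.log(vcd[i + 1] / vcd[i]) / (t_days[i + 1] - t_days[i])
        for i in range(len(vcd) - 1)
    ]


def integral_viable_cell_density(t_days, vcd):
    """Cumulative IVCD by the trapezoidal rule (VCD units x day)."""
    ivcd = [0.0]
    for i in range(len(vcd) - 1):
        area = 0.5 * (vcd[i] + vcd[i + 1]) * (t_days[i + 1] - t_days[i])
        ivcd.append(ivcd[-1] + area)
    return ivcd


def specific_productivity(t_days, vcd, titer):
    """Average cell-specific productivity over the whole culture.

    Assumed units (illustrative): VCD in 10^6 cells/mL and titer in mg/L,
    which gives qP in pg/cell/day.
    """
    ivcd = integral_viable_cell_density(t_days, vcd)
    return (titer[-1] - titer[0]) / ivcd[-1]
```

With hypothetical samples at days 0, 2 and 4, VCD values of 0.5, 2.0 and 8.0 x 10^6 cells mL^-1 and titers of 0, 50 and 300 mg L^-1, the final IVCD is 12.5 x 10^6 cells mL^-1 day and the average specific productivity is 24 pg per cell per day.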
3. Results and Discussion
3.1 Data pre-processing
Data pre-processing steps in this pipeline included handling and imputation of missing data, time series visualization, temporal clustering, multicollinearity elimination and dimensionality reduction with feature selection. In particular, feature importance for growth and productivity was assessed using supervised feature selection based on intrinsic methods (random forests) and wrapper methods (recursive feature elimination, RFE, and multivariate adaptive regression splines, MARS). This analysis corroborated the features that typically define process outcome, including culture stage, cell line type, media type, media format and by-product concentrations across cultivations. Feeding of essential amino acids, tricarboxylic acids, metabolic modifiers, redox sinks and carbohydrates was identified as having either a positive or a negative influence, depending on the nutrient group, on cell-specific antibody productivity. Supplementation with nucleosides, tricarboxylic acids, redox sinks, ketones and carbohydrates was likewise identified as influencing the growth profile and specific growth rate. These in-silico observations agreed with experimental observations from the analysis of end-point metrics and their correlation with conditions favoring growth or productivity in fed-batch cultures. Further investigation of the identified nutrient group supplementations in intensified cultures (VCD < 70 x 10^6 cells mL^-1) provided additional evidence of their effect on growth and productivity in CHO cell cultures across cultivation modes and media platforms.
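Two of the pre-processing steps named above, imputation of missing data and multicollinearity elimination, can be sketched as follows. This is a simplified illustration in Python, not the pipeline used in the study: linear interpolation over equally spaced samples, the greedy pairwise Pearson filter and the 0.9 threshold are all assumptions chosen for the example, as are the function and variable names.

```python
import math


def interpolate_missing(values):
    """Fill interior None gaps by linear interpolation between known neighbours.

    Assumes equally spaced sampling; leading and trailing gaps stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(columns, threshold=0.9):
    """Greedy filter over a dict of name -> list of values.

    A column is kept only if its absolute correlation with every column
    already kept does not exceed the threshold (hypothetical default 0.9).
    """
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

The greedy filter depends on column order, so in practice the columns would be ordered by prior importance or the filter replaced by a variance inflation factor analysis; the sketch only shows the principle of removing near-duplicate variables before feature selection.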
3.2 Data processing
Subsequently, the reduced data were further analyzed via supervised learning using Support Vector Machines for model training, validation and performance prediction. A cross-dimensional model approach was used to evaluate and predict antibody titer at harvest, the main cell culture performance metric, by combining models in a first dimension (single-variable prediction) and a second dimension (multivariable prediction). On average, a ±8.0% deviation between observed and predicted harvest antibody titers was obtained with this cross-dimensional model strategy, and a ±6.9% deviation could be achieved by further model optimisation. Nevertheless, a larger prediction offset was detected in higher-performing cultures (>3.0 g L^-1 titer), which represent a grey zone with less historical data available for pattern recognition. For these cases, feeding strategies using enrichment (10x) and concentrated feed media (3x) in place of the standard process (1x), along with other process strategies (mild hypothermia and multi-step feeding), were identified as the performance differentiators with respect to the rest of the cultures.
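The deviation figures above can be made concrete with a short calculation. The abstract does not state how the deviation was defined, so the sketch below assumes a mean absolute percent deviation relative to the observed titer, and additionally splits cultures at the 3.0 g L^-1 level mentioned in the text to expose a larger offset in the high-titer band. The function names and data layout are hypothetical.

```python
def percent_deviation(observed, predicted):
    """Signed deviation of each prediction from its observation, in percent."""
    return [100.0 * (p - o) / o for o, p in zip(observed, predicted)]


def mean_absolute_percent_deviation(observed, predicted):
    """Average magnitude of the percent deviations (assumed metric)."""
    devs = percent_deviation(observed, predicted)
    return sum(abs(d) for d in devs) / len(devs)


def deviation_by_titer_band(observed, predicted, cutoff=3.0):
    """Mean absolute percent deviation for cultures at or below, and above,
    an observed harvest titer cutoff in g/L. Empty bands return None."""
    pairs = list(zip(observed, predicted))
    low = [(o, p) for o, p in pairs if o <= cutoff]
    high = [(o, p) for o, p in pairs if o > cutoff]

    def summarise(band):
        if not band:
            return None
        obs, pred = zip(*band)
        return mean_absolute_percent_deviation(obs, pred)

    return {"at_or_below_cutoff": summarise(low), "above_cutoff": summarise(high)}
```

Reporting the metric per titer band rather than only as a global average is one simple way to see whether sparsely represented, high-performing cultures are predicted less accurately than the bulk of the historical data.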
4. Conclusions
Overall, this metaheuristic investigation applied to mammalian fed-batch cell cultures was capable of extracting relevant process information from the multivariate time-series dataset during both the data pre-processing and processing stages. In particular, specific nutrient supplementation strategies and process conditions were identified as having a relevant impact on process performance indicators, such as growth and productivity, across the historical dataset. These included strategies related to nutrient concentration (basal, feed and enrichment media), nutrient composition (macromolecular and metabolic pathway-associated nutrient supplements) and nutrient supplementation regime (feeding format and frequency). The framework also offered a modelling strategy based on historical datasets to predict the future performance of upstream unit operations from single- or multi-attribute outputs, and as early as mid-exponential phase (day 5 to 6). The case study presented herein demonstrates both the complexity and the considerable potential of data mining for biopharmaceutical processing applications. Consequently, the development of tailored machine learning pipelines for multi- and megavariate analysis and subsequent process design, optimisation and control is expected to become a more widely applied strategy in next-generation biopharmaceutical manufacturing.
5. References
Gangadharan, N., Sewell, D., Turner, R., Field, R., Cheeks, M., Oliver, S. G., Slater, N. K. H., & Dikicioglu, D. (2021). Data intelligence for process performance prediction in biologics manufacturing. Computers and Chemical Engineering, 146, 107226. https://doi.org/10.1016/j.compchemeng.2021.107226