(468a) End-to-End Data Paradigm for Cell Line Development | AIChE

Authors 

Tat, J., University of California San Diego
Kyger, R., Amgen
Yuan, Y., Amgen
Cuellar, M., Amgen
Lay, F., Amgen
Daris, K., Amgen
Das, A., Amgen
Gomez, N., Amgen
Wang, T., Amgen
Georgescu, R., Amgen
Stevens, J., Amgen
Cell Line Development holds a critical role in creating and selecting the cell lines that become the Master Cell Banks for clinical and commercial biomanufacturing. Critical decisions are made during the cell line development process, such as final subclone selection, which is typically locked for the life of a project. The datasets used to drive these decisions have traditionally required resource-intensive manual cell culture experiments and analytics to generate hundreds to thousands of datapoints per experiment. Moreover, these datasets have largely been composed of discrete or endpoint measurements upon which critical decisions depend, and data manipulation and analysis are often conducted on a per-project basis. We have identified the need to shift from processes governed by empirical data silos to a data-centric platform that provides holistic insight and predictive power. Our approach to an end-to-end data paradigm in Cell Line Development involves adopting high-throughput, data-rich technologies, establishing an Enterprise Data Lake (EDL), creating visualization tools, deploying automation, and applying off-the-shelf machine learning algorithms.

  • High-throughput technologies and methods were implemented, such as the Berkeley Lights Beacon, droplet digital PCR (ddPCR), and RNA-seq, which increase the scale and granularity of the data we can now interrogate.
  • Laboratory and data automation tools were deployed to reduce resources required to generate and manipulate data.
  • Amgen’s EDL was established to house historical and novel big data, and serves as the central source that enables digital tools and effective data management and mining.
  • Real-time visualization dashboards were built and deployed to generate trends and tables instantly, eliminating hours of data extraction, manipulation, and analysis.
  • Multivariate data analysis was integrated into our decision-making process to leverage the increased availability and granularity of datasets and provide holistic insight across historical projects.
  • Predictive models were built with open-source, Python-based machine learning algorithms to predict downstream features and provide forward-looking insight.
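As a minimal sketch of the last point, an off-the-shelf model such as scikit-learn's RandomForestRegressor can be trained on early, high-throughput clone-screening features to predict a downstream endpoint attribute. The feature names (ddPCR copy number, early viable cell density, a Beacon-derived secretion score), the synthetic data, and the choice of model below are illustrative assumptions, not the authors' actual pipeline:

```python
# Hypothetical example: predict a downstream attribute (e.g., endpoint titer)
# from early clone-screening features. All data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_clones = 200
# Illustrative early-stage features per clone: ddPCR copy number,
# day-3 viable cell density, Beacon secretion score (synthetic).
X = rng.normal(size=(n_clones, 3))
# Synthetic endpoint measurement: a linear signal plus noise.
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=n_clones)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(f"held-out R^2: {r2:.2f}")
```

In practice, such a model would be trained on historical projects pulled from the EDL, and its predictions used to rank or triage clones before resource-intensive downstream characterization.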

Altogether, in our new paradigm, data will be the central driver, ensuring that high-quality clones are produced from the moment a cell is transfected to the moment a clonal bank is created. This will open opportunities to shrink timelines and connect disjointed processes from research to commercialization.