From Sequence to Yield: Deep Learning for Protein Production Systems | AIChE

Authors 

Oyarzún, D. - Presenter, Imperial College London
Nikolados, E. M., University of Edinburgh
A key area in biotechnology is the production of recombinant proteins for the energy, food and pharmaceutical sectors. Protein expression systems are typically engineered to produce large amounts of protein, but doing so requires many iterations of strain design, construction and characterisation. Here we propose a machine learning pipeline to forecast protein production and improve strain performance. Our algorithms can predict protein production from DNA sequence with >80% accuracy. We trained machine learning models of increasing complexity, from simple linear regressors to deep neural networks, on large phenotypic screens of GFP readouts and fitness. Our results reveal advantages and caveats of the various algorithms and highlight trade-offs between prediction accuracy, the size of the training dataset, and the type of algorithm employed. We demonstrate that transfer learning can decrease data requirements by up to fivefold when training on multiple phenotypes. Our results lay the groundwork for machine learning-enabled engineering of strains with improved yields, providing a unique, data-driven approach to accelerate strain design and optimisation.
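
To make the simplest end of the model range concrete, the following is a minimal sketch of a linear regressor that maps one-hot encoded DNA sequences to an expression-like readout. It is not the authors' pipeline: the data are synthetic, and the function names (`one_hot`, `fit_linear`, `r_squared`), the sequence length, the per-base effects, and the hyperparameters are all assumptions chosen for illustration. The abstract does not state how sequences were encoded or how accuracy was measured, so one-hot encoding and held-out R^2 are stand-ins.

```python
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Flatten a DNA sequence into a 4 * len(seq) binary feature vector."""
    vec = []
    for base in seq:
        vec.extend(1.0 if base == b else 0.0 for b in ALPHABET)
    return vec

def fit_linear(X, y, lr=0.3, epochs=300, l2=1e-3):
    """Fit y ~ w.x + b by batch gradient descent with an L2 penalty on w."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Linear prediction for a single feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def r_squared(y_true, y_pred):
    """Coefficient of determination on a set of predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Synthetic toy data (an assumption, NOT from the study): each base adds a
# fixed amount to the readout, plus a little Gaussian measurement noise.
rng = random.Random(0)
seqs = ["".join(rng.choice(ALPHABET) for _ in range(8)) for _ in range(200)]
true_effect = {"A": 0.0, "C": 0.5, "G": 1.0, "T": -0.5}
labels = [sum(true_effect[b] for b in s) + rng.gauss(0, 0.1) for s in seqs]

# Hold out the last 50 sequences to estimate predictive performance.
X = [one_hot(s) for s in seqs]
X_train, y_train = X[:150], labels[:150]
X_test, y_test = X[150:], labels[150:]

w, b = fit_linear(X_train, y_train)
r2 = r_squared(y_test, [predict(w, b, x) for x in X_test])
print(f"held-out R^2 = {r2:.2f}")
```

The toy readout is purely additive in the bases, so a linear model fits it well. Real sequence-to-expression maps include interactions between positions, which is one reason the more complex models mentioned in the abstract may be needed, at the cost of larger training sets.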