Predicting Amyloid Fibrillation through Transfer Learning
AIChE Annual Meeting
2024
Annual Student Conference: Competitions & Events
Undergraduate Student Poster Session: Computing and Process Control
Monday, October 28, 2024 - 10:00am to 12:30pm
Drug discovery is a complex multi-objective optimization problem that involves balancing the biological activity and developability of target compounds. Recent advances in generative machine learning models have helped streamline this process, but these models lack precise control over target biophysical properties, many of which can be harmful to the development of a peptide therapeutic. Amyloid fibrils are a form of aggregate characterized by ordered stacks of peptides that form a fibrous structure. In the discovery process, these aggregates are difficult to metabolize and are often less potent than free molecules. In this work, we develop a generalizable model that predicts amyloid fibrillation from protein language model (pLM) embeddings. We use ESM2, a pLM, to generate latent embeddings for sequences of interest, which are then passed to our model. Experimental fibrillation data are limited, and the largest public datasets consist of sequences that are much shorter than therapeutically relevant peptides. We explore transfer learning preprocessing strategies that allow us to generalize effectively to new sequence lengths, including mean-pooling and a modified convolutional neural network with attention weights, referred to as light attention (LA). Processed embeddings are passed to a standard multilayer perceptron (MLP), and predictions are scored against labelled data. These architectures demonstrate high predictive power when evaluated on two publicly available datasets and serve to expedite the development of peptide-based pharmaceuticals.
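The two preprocessing strategies named in the abstract both turn a variable-length matrix of per-residue embeddings (length L by dimension D) into a fixed-size vector, so a single MLP can score sequences of any length. The sketch below illustrates that idea in plain Python. It is not the authors' implementation. The light attention layout (one convolution producing values, a second producing attention scores, a softmax over sequence positions, then an attention-weighted sum concatenated with a max-pool) is an assumption, since the abstract says only "a modified convolutional neural network with attention weights". The embeddings are random stand-ins for ESM2 output, the weights are untrained, and all names and sizes (`mean_pool`, `light_attention_pool`, `D`, `D_OUT`, `K`, `H`) are hypothetical.

```python
import math
import random


def mean_pool(emb):
    """Average per-residue embeddings (L x D) into a single D-vector."""
    length, dim = len(emb), len(emb[0])
    return [sum(row[d] for row in emb) / length for d in range(dim)]


def conv1d_same(emb, kernels):
    """1D convolution along the sequence axis with zero padding.

    emb: L x D_in, kernels: K x D_in x D_out (K odd), returns L x D_out.
    """
    k, pad = len(kernels), len(kernels) // 2
    length, d_in, d_out = len(emb), len(emb[0]), len(kernels[0][0])
    out = []
    for i in range(length):
        row = [0.0] * d_out
        for t in range(k):
            j = i + t - pad
            if 0 <= j < length:  # positions outside the sequence count as zero
                for a in range(d_in):
                    x, w = emb[j][a], kernels[t][a]
                    for b in range(d_out):
                        row[b] += x * w[b]
        out.append(row)
    return out


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def light_attention_pool(emb, value_kernels, attn_kernels):
    """Assumed light-attention pooling: returns a vector of size 2 * D_out."""
    values = conv1d_same(emb, value_kernels)
    scores = conv1d_same(emb, attn_kernels)
    weighted, maxed = [], []
    for b in range(len(values[0])):
        col = [row[b] for row in values]
        weights = softmax([row[b] for row in scores])  # normalize over positions
        weighted.append(sum(w * v for w, v in zip(weights, col)))
        maxed.append(max(col))
    return weighted + maxed


def mlp(x, w1, b1, w2, b2):
    """One ReLU hidden layer and a sigmoid output (fibrillation probability)."""
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, w)) + b)
              for w, b in zip(w1, b1)]
    logit = sum(h * w for h, w in zip(hidden, w2)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def rand_matrix(rng, *shape):
    """Nested lists of small random floats with the given shape."""
    if len(shape) == 1:
        return [rng.uniform(-0.1, 0.1) for _ in range(shape[0])]
    return [rand_matrix(rng, *shape[1:]) for _ in range(shape[0])]


# Hypothetical sizes; real ESM2 embeddings are far wider than D = 8.
rng = random.Random(0)
D, D_OUT, K, H = 8, 4, 3, 5
value_k = rand_matrix(rng, K, D, D_OUT)
attn_k = rand_matrix(rng, K, D, D_OUT)
w1, b1 = rand_matrix(rng, H, 2 * D_OUT), [0.0] * H
w2, b2 = rand_matrix(rng, H), 0.0

short_seq = rand_matrix(rng, 6, D)   # stand-in for a short training peptide
long_seq = rand_matrix(rng, 30, D)   # stand-in for a longer therapeutic peptide

for emb in (short_seq, long_seq):
    feats = light_attention_pool(emb, value_k, attn_k)
    prob = mlp(feats, w1, b1, w2, b2)
    print(len(emb), len(mean_pool(emb)), len(feats), round(prob, 3))
```

Both pooled vectors have the same size for the 6-residue and 30-residue inputs, which is what lets a model trained on short sequences be applied to longer ones. Mean-pooling weights every residue equally, whereas the attention variant can, once trained, emphasize the positions that matter most for fibrillation.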