(594c) Explainable Support Vector Machine Models for Analyzing Structure-Function Relationships of Membrane-Active Peptides
AIChE Annual Meeting
2024
Food, Pharmaceutical & Bioengineering Division
Machine Learning Based Protein Engineering
Wednesday, October 30, 2024 - 4:28pm to 4:46pm
To build explainable models of these structure-function relationships, we compiled datasets from the literature covering cell-penetrating, hemolytic, anticancer, antifungal, antiparasitic, antibacterial, antiviral, and mammalian-targeting membrane-active peptides (MAPs). Because peptide properties show known periodicities along the sequence that correlate with structure and function, we used Fourier transforms to generate features that measure the amplitude of amino acid property oscillations. We then applied an in-house feature selection procedure based on non-linear support vector machines (SVMs) to derive a structure-function fingerprint for each class of MAPs.
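A minimal sketch of Fourier-transform features of this kind follows. It assumes the Kyte-Doolittle hydropathy scale as the amino acid property and a hypothetical grid of angular frequencies from 0° to 180° in 10° steps; the abstract names neither the property scales nor the frequencies, and the function names are illustrative rather than the authors' code. For each angle ω (rotation per residue), the amplitude |Σₙ hₙ e^{inω}| / N measures how strongly the property h oscillates with that period: 100° matches the 3.6-residue period of an α-helix, and 180° matches the strict alternation of a β-strand.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale (an assumed property choice for illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def fourier_amplitude(sequence, scale, angle_deg):
    """Length-normalized amplitude of the property oscillation at one frequency.

    Computes |sum_n h_n * exp(i * n * omega)| / N, where h_n is the property
    value of residue n and omega is the rotation per residue in radians.
    At 0 degrees this reduces to the magnitude of the mean property value.
    """
    omega = math.radians(angle_deg)
    total = sum(scale[aa] * cmath.exp(1j * n * omega)
                for n, aa in enumerate(sequence))
    return abs(total) / len(sequence)


def fourier_features(sequence, scale, angles=range(0, 181, 10)):
    """Feature vector with one amplitude per angle (hypothetical 19-point grid)."""
    return [fourier_amplitude(sequence, scale, a) for a in angles]
```

For a perfectly alternating sequence such as `IDID`, the amplitude at 180° (4.0) far exceeds the amplitude at 0° (0.5), which is only the magnitude of the mean hydropathy. Repeating this over several property scales would give a feature pool for a downstream SVM-based selection step.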
As a reference point, we developed models using a more traditional feature set consisting of amino acid composition, dipeptide composition, and physicochemical properties of amino acids. Additionally, to benchmark our approach against the state of the art, we compared our models with deep-learning models trained by fine-tuning a protein language model (ESM2) to predict each class of MAPs.
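The traditional composition features can be sketched in a few lines. This assumes the 20 standard amino acids, overlapping residue pairs for the 400 dipeptide features, and a deliberately crude net-charge count as one example of a physicochemical descriptor; the exact descriptors used in the study are not given here, and `approximate_net_charge` is a hypothetical helper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(sequence):
    """Fraction of each of the 20 standard residues (20 features)."""
    n = len(sequence)
    return {aa: sequence.count(aa) / n for aa in AMINO_ACIDS}


def dipeptide_composition(sequence):
    """Fraction of each ordered residue pair among the N-1 overlapping pairs."""
    pairs = [sequence[i:i + 2] for i in range(len(sequence) - 1)]
    counts = {dp: 0 for dp in DIPEPTIDES}
    for dp in pairs:
        if dp in counts:  # skip pairs containing non-standard residues
            counts[dp] += 1
    total = len(pairs)
    return {dp: (c / total if total else 0.0) for dp, c in counts.items()}


def approximate_net_charge(sequence):
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E.

    Histidine and the termini are ignored in this simplified sketch.
    """
    positive = sum(sequence.count(aa) for aa in "KR")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative
```

Together the two composition vectors already contribute 420 fixed-length features before any physicochemical descriptors are added, which illustrates how a traditional feature set can grow much larger than a selected set of Fourier amplitudes.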
Furthermore, we compared our predictions with recently published models that were trained on the same dataset for each peptide class. Compared with these state-of-the-art models, our Fourier-transform-based approach yields models with significantly fewer features and at least comparable performance. Finally, we used the derived structure-function fingerprints to cluster the classes of MAPs, which provides insight into the design of MAPs with improved specificity.
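The abstract does not specify how the fingerprints are clustered. As one hypothetical illustration, assuming each class fingerprint is the set of feature names retained by feature selection, classes can be grouped by average-linkage agglomerative clustering on the Jaccard distance between their feature sets. The class names and feature labels below are placeholders, not results from the study.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - len(a & b) / len(a | b); two empty sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(fingerprints, n_clusters):
    """Agglomerative clustering of classes using mean pairwise Jaccard distance.

    fingerprints: dict mapping class name -> set of selected feature names
    (a hypothetical representation of a structure-function fingerprint).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[name] for name in fingerprints]

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        dists = [jaccard_distance(fingerprints[x], fingerprints[y])
                 for x in c1 for y in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Placeholder fingerprints for illustration only.
fingerprints = {
    "class_A": {"f1", "f2", "f3"},
    "class_B": {"f1", "f2", "f4"},
    "class_C": {"f5", "f6"},
}
print(average_linkage(fingerprints, 2))  # [['class_C'], ['class_A', 'class_B']]
```

Average linkage is used here only as a common default; single or complete linkage, or a distance computed on feature weights rather than set membership, would be equally reasonable assumptions.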
This study could expedite the design and development of novel membrane-active peptides, opening promising avenues for drug discovery and clinical trials.
Keywords: Membrane-active peptides, support vector machine models, sequential feature generation, feature selection.