(284d) Data-Driven Approach for the Prediction of MHC Class II Epitopes Using Oscillations of Physicochemical Properties
AIChE Annual Meeting
2022 Annual Meeting
Topical Conference: Chemical Engineers in Medicine
Big Data and Machine Learning to Advance Medicine
Tuesday, November 15, 2022 - 8:57am to 9:16am
Major histocompatibility complex (MHC) class II molecules, expressed on the surface of antigen-presenting cells (APCs), display peptides for recognition by CD4+ T-cells, which in turn elicit various host immune responses. Binding of peptides derived from a protein antigen to MHC molecules is therefore a prerequisite for T-cell immunogenicity. One approach to the computational prediction of peptide-MHC binding is data-driven machine learning, which predicts binding affinities from the sequences of the peptide and the MHC molecule. Numerous prediction tools have been developed for peptide-MHC class II binding, but the problem remains challenging because of the polymorphic nature of MHC class II molecules and the variation in peptide length.
The presented work tests the performance of multiple allele-specific support vector machine (SVM) models combined with a previously proposed SVM-based feature selection algorithm. The SVM models classify peptides as MHC class II binders or non-binders based on their amino acid sequences and derived features. In developing the models, we take advantage of underlying periodicities in physicochemical properties along the peptide sequence, which have been shown to be predictive features. Once the physicochemical descriptors are generated, Fourier transforms are applied so that peptide sequences of varying lengths can be encoded in a common representation. For training and testing, a comprehensive dataset of MHC class II binding peptides was taken from the Immune Epitope Database (IEDB), and cross-validation and grid search were applied across multiple training and test datasets. A feature selection algorithm is also incorporated into model development to identify an essential set of predictive features.
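The Fourier encoding step described above can be illustrated with a minimal sketch. The abstract does not specify which physicochemical properties, transform length, or spectral quantities were used, so everything here is an assumption made for illustration: the Kyte-Doolittle hydropathy scale stands in for the descriptors, the function name `encode_peptide` and the parameter `n_points` are hypothetical, and the feature vector is taken to be the magnitude spectrum of the zero-padded, mean-centered property profile.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale, used here only as one example
# physicochemical property (assumption: the abstract does not name
# the properties actually used).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_peptide(sequence, n_points=32):
    """Map a peptide of any length to a fixed-length spectral vector.

    Hypothetical helper: the property profile is mean-centered,
    zero-padded (or truncated) to n_points, and transformed with a
    real FFT. The magnitudes of the n_points // 2 + 1 frequency bins
    are returned.
    """
    if not sequence:
        raise ValueError("peptide sequence must not be empty")
    profile = np.array([KD_HYDROPATHY[aa] for aa in sequence.upper()])
    # Remove the mean so the zero-frequency bin does not dominate.
    profile = profile - profile.mean()
    # rfft with n=n_points pads with zeros or truncates as needed.
    spectrum = np.fft.rfft(profile, n=n_points)
    return np.abs(spectrum)


# Peptides of different lengths yield vectors of the same size.
# These sequences are arbitrary examples, not data from the study.
short_vec = encode_peptide("ILILILIL")
long_vec = encode_peptide("GSKAYVLLNDERQWTFP")
print(short_vec.shape, long_vec.shape)
```

Because every peptide maps to a vector of the same size, such vectors could in principle be stacked into a feature matrix and passed to a classifier, for example scikit-learn's `SVC` tuned with `GridSearchCV`. That pairing is again an assumption about tooling, not a statement of what the authors used.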