(543g) Premexotac: Bitterants in-Silico Screening Using Machine Learning for Advanced Pharmaceutical Development
AIChE Annual Meeting
2022 Annual Meeting
Computing and Systems Technology Division
Advances in Machine Learning and Intelligent Systems II
Wednesday, November 16, 2022 - 5:24pm to 5:43pm
An in silico bitterness predictor was constructed following the procedure shown in Figure 1. A database of compounds with the target bioactivity (bitter and non-bitter) was collected from the literature. The database was preprocessed by removing salts and disconnected structures, keeping only the largest fragment of each entry. The final database contained 932 bitterants and 1908 presumed non-bitter compounds. For feature extraction, two sets of molecular descriptors were calculated. The first set consisted of Extended Connectivity Fingerprints (ECFP) with a length of 1024 bits. The second set was a collection of 22 physicochemical and topological descriptors. Mutual information (MI) was used for feature selection. Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Random Forest (RF) and Adaptive Boosting (AdaBoost) were the algorithms selected for model training. For external validation, the UNIMI set of 56 compounds was evaluated. The evaluation metrics were specificity, recall and F-1 score.
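The abstract does not state which MI estimator was used. As a minimal sketch, assuming binary features such as ECFP bits and a binary label (1 = bitter), the plug-in estimate I(X;Y) = sum over (x, y) of p(x,y) log[p(x,y) / (p(x) p(y))] can be computed from counts and used to rank features. The function names and toy data are hypothetical. Continuous descriptors such as molecular weight would need binning or a nearest-neighbour estimator instead (scikit-learn's `mutual_info_classif` provides one).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def mutual_information(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Plug-in estimate of I(X;Y), in nats, for two discrete variables."""
    n = len(labels)
    if n == 0 or len(feature) != n:
        raise ValueError("feature and labels must be non-empty and of equal length")
    joint = Counter(zip(feature, labels))  # counts of (x, y) pairs
    count_x = Counter(feature)
    count_y = Counter(labels)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (n_xy / n) * math.log(n_xy * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(
    matrix: Sequence[Sequence[int]],
    labels: Sequence[int],
    names: Sequence[str],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every column of `matrix` against `labels`; return the top_k by MI."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in matrix]
        scores.append((name, mutual_information(column, labels)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy data: 6 compounds (1 = bitter), 3 binary fingerprint bits.
labels = [1, 1, 1, 0, 0, 0]
matrix = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
print(rank_features(matrix, labels, ["bit_a", "bit_b", "bit_c"], top_k=2))
```

Here `bit_a` separates the two classes perfectly and scores ln 2 (about 0.693 nats), `bit_b` is only weakly associated with the label, and the constant `bit_c` scores 0 and falls outside the top two.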
Feature selection showed that the Wiener index (WPath), molecular weight (MW), the ABC index, Crippen-Wildman molar refractivity (SMR) and the Graovac-Ghorbani ABC index (ABCGG) were the top five descriptors according to their MI scores. These descriptors provide key information for classifying bitter compounds. For the ECFP, the ten substructures with the highest MI scores were identified as key descriptors for bitterness prediction (Table 1). This updates previous work by Rodgers et al. (2006).
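Two of the top-ranked topological descriptors have simple definitions on the hydrogen-depleted molecular graph: the Wiener index is the sum of shortest-path (bond-count) distances over all pairs of heavy atoms, and the atom-bond connectivity (ABC) index sums sqrt((d_i + d_j - 2) / (d_i d_j)) over all bonds, where d is the heavy-atom degree. The minimal sketch below assumes a hypothetical input format (heavy-atom count plus a list of bonded index pairs) and ignores bond orders; it is not the descriptor software used in this work, and the Graovac-Ghorbani variant (ABCGG) is not covered.

```python
import math
from collections import deque
from typing import List, Tuple

# Hypothetical input format: heavy atoms are numbered 0..n_atoms-1,
# and each bond is an (i, j) pair; bond order is ignored.
Bond = Tuple[int, int]


def adjacency(n_atoms: int, bonds: List[Bond]) -> List[List[int]]:
    adj: List[List[int]] = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def wiener_index(n_atoms: int, bonds: List[Bond]) -> int:
    """Sum of topological distances over all unordered pairs of heavy atoms."""
    adj = adjacency(n_atoms, bonds)
    total = 0
    for source in range(n_atoms):
        # Breadth-first search yields shortest path lengths in an unweighted graph.
        dist = [-1] * n_atoms
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for d in dist if d > 0)  # unreachable atoms (-1) are skipped
    return total // 2  # every pair was counted once from each end


def abc_index(n_atoms: int, bonds: List[Bond]) -> float:
    """Atom-bond connectivity index: sum over bonds of sqrt((d_i + d_j - 2) / (d_i * d_j))."""
    degree = [len(neighbours) for neighbours in adjacency(n_atoms, bonds)]
    return sum(
        math.sqrt((degree[i] + degree[j] - 2) / (degree[i] * degree[j]))
        for i, j in bonds
    )


# n-butane carbon skeleton: C0-C1-C2-C3
butane = [(0, 1), (1, 2), (2, 3)]
print(wiener_index(4, butane), round(abc_index(4, butane), 4))  # 10 2.1213
```

For comparison, the branched isomer isobutane (bonds `[(0, 1), (0, 2), (0, 3)]`) has a Wiener index of 9, illustrating how the index decreases with branching at constant atom count.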
The performance of the best models was compared with that of the predictors available in the literature, using the metrics reported for the UNIMI set (Table 2). The F-1 score of PREMEXOTAC was 0.08 lower than that of the top performer. The compared models all used different descriptor sets, data preprocessing and modelling approaches. The comparison with the models available in the literature indicates that, with current methods and the available experimentally confirmed bitterants and non-bitterants, performance has reached a plateau. Novel approaches to feature extraction and model training are constantly being developed, and these may make it possible in the future to build models that significantly surpass this plateau. Dataset size is also a very important factor for improving performance. However, for bitterness classification, substantially enlarging the database would be expensive in both time and money, so a significant performance gain from additional data is unlikely in the short term. Nevertheless, the current models already perform very well and could substantially reduce the cost of subsequent in vivo/in vitro validation. Moreover, because machine learning algorithms are well suited to pattern recognition, the descriptors selected by MI provide significant insight into the key physicochemical and topological characteristics of bitter compounds.
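To make the evaluation metrics (specificity, recall and F-1 score) concrete, the minimal sketch below computes them from a binary confusion matrix, assuming bitter is the positive class (the abstract does not say). The function name and the toy predictions are hypothetical and are not UNIMI results.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Recall, specificity, precision and F-1 score for a binary classifier."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # convention: 0 when undefined

    recall = ratio(tp, tp + fn)  # share of bitter compounds that are detected
    specificity = ratio(tn, tn + fp)  # share of non-bitter compounds correctly rejected
    precision = ratio(tp, tp + fp)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"recall": recall, "specificity": specificity, "precision": precision, "f1": f1}


# Hypothetical predictions for 4 bitter (1) and 4 non-bitter (0) compounds.
print(classification_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 1, 0]))
```

Because the F-1 score depends only on precision and recall and ignores true negatives, reporting specificity alongside it shows how reliably non-bitter compounds are rejected; a gap in F-1 between two models can come from a change in precision, in recall, or in both.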
Acknowledgment: HERMES-Johannes-Burges-Stiftung is funding this project.