(372ah) Similarity-Based Machine Learning for Small Datasets; Application in Predicting Bio-Lubricant Properties | AIChE

Authors 

Kim, J. Y. - Presenter, University of Delaware
Khan, S. A., University of Delaware
Vlachos, D., University of Delaware - Catalysis Center For Ener
Machine learning (ML) has been successfully applied to learn patterns in experimentally generated chemical data and to predict molecular properties. However, experimental measurements can be expensive, and as a result, experimental data for several properties are scarce. Several ML methods struggle when trained on limited data. Here, we introduce a similarity-based ML approach to efficiently train ML models on small datasets. We group molecules with similar structures, represented by molecular fingerprints, and use these groups to train separate ML models. We apply the methodology to predict the kinematic viscosity of bio-lubricant base oil molecules at 40 °C (KV40). Our method noticeably improves model performance over transfer learning (TL) and a standard Random Forest (RF) approach. Our methodology provides a robust framework for scenarios with limited data and can be readily generalized to a diverse range of molecular datasets.
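
The grouping idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything below is an assumption made for illustration: the abstract does not name a similarity metric, grouping rule, or threshold. Here, fingerprints are sets of on-bit indices, similarity is Tanimoto, grouping is a simple leader-style rule, and each group's "model" is a group-mean placeholder standing in for a real regressor such as a Random Forest. The fingerprints and target values are toy numbers, not measured KV40 data.

```python
from statistics import mean


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(fps, threshold):
    """Leader-style grouping (an assumed rule, not taken from the abstract).

    Each fingerprint joins the first group whose leader it matches at or
    above `threshold`; otherwise it starts a new group and becomes its leader.
    Returns (leader indices, list of member-index lists).
    """
    leaders, groups = [], []
    for i, fp in enumerate(fps):
        for g, lead in enumerate(leaders):
            if tanimoto(fp, fps[lead]) >= threshold:
                groups[g].append(i)
                break
        else:
            leaders.append(i)
            groups.append([i])
    return leaders, groups


def fit_group_models(fps, y, threshold=0.5):
    """Fit one placeholder model per group: the mean target of its members.

    A real workflow would train a separate regressor on each group instead.
    Returns a list of (leader fingerprint, fitted model) pairs.
    """
    leaders, groups = group_by_similarity(fps, threshold)
    return [(fps[lead], mean(y[i] for i in idx)) for lead, idx in zip(leaders, groups)]


def predict(models, fp):
    """Route a query fingerprint to the group with the most similar leader."""
    _, model = max(models, key=lambda pair: tanimoto(fp, pair[0]))
    return model


# Toy data: hypothetical fingerprints and target values, for illustration only.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {10, 11, 13}]
y = [10.0, 12.0, 30.0, 34.0]

leaders, groups = group_by_similarity(fps, threshold=0.5)
models = fit_group_models(fps, y, threshold=0.5)
prediction = predict(models, {1, 2, 3})
```

In this sketch a query molecule is routed to the group whose leader it most resembles, so each prediction draws only on structurally similar training molecules. The threshold controls the trade-off between group homogeneity and the amount of data available to each per-group model.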