(372ah) Similarity-Based Machine Learning for Small Datasets; Application in Predicting Bio-Lubricant Properties
AIChE Annual Meeting
2024
Computing and Systems Technology Division
10B: Interactive Session: Systems and Process Control
Tuesday, October 29, 2024 - 3:30pm to 5:00pm
Machine learning (ML) has been successfully applied to learn patterns in experimentally generated chemical data and to predict molecular properties. However, experimental measurements can be expensive, so data for several properties are scarce, and several ML methods struggle when trained on limited data. Here, we introduce a similarity-based ML approach for training models efficiently on small datasets. We group molecules with similar structures, represented by molecular fingerprints, and train a separate ML model on each group. We apply the methodology to predict the kinematic viscosity at 40 °C (KV40) of bio-lubricant base oil molecules. Our method noticeably improves model performance compared with transfer learning (TL) and a standard random forest (RF) approach. It provides a robust framework for data-limited scenarios and can be readily generalized to a diverse range of molecular datasets.
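The grouping-then-training idea can be illustrated with a minimal, stdlib-only sketch. Several pieces are assumptions not stated in the abstract. Fingerprints are represented as sets of "on" bit indices (as a Morgan/ECFP fingerprint computed elsewhere might provide). Molecules are grouped by greedy leader clustering on Tanimoto similarity with a hypothetical threshold of 0.5; the abstract does not say which clustering method is used. Each group's "model" is a similarity-weighted k-nearest-neighbour average, a stand-in for whatever per-group ML models the authors trained (a random forest per group would be a natural choice). The names `tanimoto`, `leader_cluster`, and `GroupedSimilarityRegressor`, as well as the toy KV40 values, are invented for illustration.

```python
from typing import FrozenSet, List, Sequence

# Assumption: a fingerprint is the set of "on" bit indices, e.g. from a
# Morgan/ECFP fingerprint computed elsewhere.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Sequence[Fingerprint], threshold: float):
    """Greedy leader clustering (hypothetical grouping step): a molecule joins
    the first group whose leader is at least `threshold` similar to it;
    otherwise it founds a new group."""
    leaders: List[int] = []
    labels: List[int] = []
    for i, fp in enumerate(fps):
        for group, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                labels.append(group)
                break
        else:
            leaders.append(i)
            labels.append(len(leaders) - 1)
    return leaders, labels


class GroupedSimilarityRegressor:
    """One local model per similarity group. Each local model here is a
    similarity-weighted k-NN average (stand-in for a per-group ML model)."""

    def __init__(self, threshold: float = 0.5, k: int = 3):
        self.threshold = threshold
        self.k = k

    def fit(self, fps: Sequence[Fingerprint], y: Sequence[float]):
        self.fps = list(fps)
        self.y = list(y)
        self.leaders, self.labels = leader_cluster(self.fps, self.threshold)
        # Training set of each group's model: the indices of its members.
        self.groups = {g: [i for i, lab in enumerate(self.labels) if lab == g]
                       for g in range(len(self.leaders))}
        return self

    def predict(self, fp: Fingerprint) -> float:
        # Route the query to the group whose leader it most resembles.
        group = max(range(len(self.leaders)),
                    key=lambda g: tanimoto(fp, self.fps[self.leaders[g]]))
        members = sorted(self.groups[group],
                         key=lambda i: tanimoto(fp, self.fps[i]), reverse=True)
        top = members[: self.k]
        weights = [tanimoto(fp, self.fps[i]) for i in top]
        if sum(weights) == 0.0:  # no bit overlap: fall back to the group mean
            return sum(self.y[i] for i in members) / len(members)
        return sum(w * self.y[i] for w, i in zip(weights, top)) / sum(weights)


# Toy fingerprints and KV40 values, invented purely for illustration.
train_fps = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}),
             frozenset({10, 11, 12}), frozenset({10, 11, 13})]
train_kv40 = [10.0, 12.0, 40.0, 44.0]
model = GroupedSimilarityRegressor(threshold=0.5, k=2).fit(train_fps, train_kv40)
print(model.labels)                         # [0, 0, 1, 1]
print(model.predict(frozenset({1, 2, 3})))  # 11.0
```

Because each query is answered only from structurally similar training molecules, a small dataset is not diluted by dissimilar chemistry; the price is that very small groups give each local model even less data, so the similarity threshold trades group purity against group size.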