(28i) Machine Learning on Heterogeneous Data Sets of Adsorption Energies
2024 AIChE Annual Meeting
Fuels and Petrochemicals Division
Advances in Petroleum Production and Refining I
Sunday, October 27, 2024 - 5:38pm to 5:54pm
Machine learning models can accurately predict adsorption energies for catalyst screening, but they are not well suited to fitting heterogeneous datasets that combine data from different computational setups and different sets of materials. More broadly, existing models often perform poorly on data that differs from their training data. They also often fail to incorporate chemical knowledge that would improve prediction accuracy and make their results more interpretable, such as the fact that elements are discrete entities. Here, we develop and apply a framework that learns effectively from a heterogeneous dataset, allowing us to leverage multiple existing datasets simultaneously and enabling very efficient transfer learning to new applications.
To fit heterogeneous datasets, we develop a multilevel machine learning model that treats each element as a discrete entity and uses latent variables to link elements to adsorption energies. Essentially, the framework learns how the local environment around a given element modifies its chemical properties with respect to a broad set of adsorption energies. Our model has been shown to make accurate binding energy predictions on heterogeneous data and enables efficient transfer learning, allowing researchers to use existing datasets to accelerate screening with their own computational setup on their own problem of interest.
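The abstract does not specify the model architecture, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a toy bilinear form in which each discrete element has a learned latent vector, each adsorbate has a learned readout vector, and each data source (computational setup) has its own offset as the second "level". All names (`fit`, `predict`, `LATENT_DIM`, the synthetic data) are hypothetical, the energies are randomly generated rather than real, and the local-environment dependence described above is omitted for brevity.

```python
import numpy as np

# Hypothetical toy sizes; nothing here comes from the abstract.
N_ELEMENTS, N_ADSORBATES, N_SOURCES, LATENT_DIM = 6, 3, 2, 2

# Synthetic "ground truth" used only to generate toy data (not real energies).
data_rng = np.random.default_rng(0)
true_z = data_rng.normal(size=(N_ELEMENTS, LATENT_DIM))
true_w = data_rng.normal(size=(N_ADSORBATES, LATENT_DIM))
true_b = np.array([0.0, 0.3])  # systematic offset of each data source

n_samples = 200
elem = data_rng.integers(0, N_ELEMENTS, size=n_samples)
ads = data_rng.integers(0, N_ADSORBATES, size=n_samples)
src = data_rng.integers(0, N_SOURCES, size=n_samples)
energy = (np.sum(true_z[elem] * true_w[ads], axis=1)
          + true_b[src]
          + 0.01 * data_rng.normal(size=n_samples))


def predict(z, w, b, elem, ads, src):
    """Energy = <element latent, adsorbate readout> + per-source offset."""
    return np.sum(z[elem] * w[ads], axis=1) + b[src]


def fit(elem, ads, src, energy, steps=3000, lr=0.2, seed=1):
    """Fit latents, readouts, and offsets by plain gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    z = 0.1 * rng.normal(size=(N_ELEMENTS, LATENT_DIM))
    w = 0.1 * rng.normal(size=(N_ADSORBATES, LATENT_DIM))
    b = np.zeros(N_SOURCES)
    n = len(energy)
    losses = []
    for _ in range(steps):
        resid = predict(z, w, b, elem, ads, src) - energy
        losses.append(float(np.mean(resid ** 2)))
        gz, gw, gb = np.zeros_like(z), np.zeros_like(w), np.zeros_like(b)
        # Accumulate per-sample gradients onto the discrete index they belong to.
        np.add.at(gz, elem, resid[:, None] * w[ads])
        np.add.at(gw, ads, resid[:, None] * z[elem])
        np.add.at(gb, src, resid)
        z -= lr * gz / n
        w -= lr * gw / n
        b -= lr * gb / n
    return z, w, b, losses


z, w, b, losses = fit(elem, ads, src, energy)

# Transfer-learning sketch: a new data source is handled by freezing the
# shared latents and estimating only its offset from a handful of points.
new_elem = np.array([0, 1, 2, 3, 4])
new_ads = np.array([0, 1, 2, 0, 1])
new_energy = np.sum(true_z[new_elem] * true_w[new_ads], axis=1) + 0.7
new_offset = float(np.mean(new_energy - np.sum(z[new_elem] * w[new_ads], axis=1)))
```

In this sketch, the shared element and adsorbate latents are learned jointly from all sources, while the per-source offsets absorb systematic differences between computational setups. Adapting to a new setup then requires fitting only one extra parameter, which is one simple way to picture the efficient transfer learning the abstract describes.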