(169cx) Developing an Open-Source Tool for Generating Rich and Consistent Sigma Profiles
AIChE Annual Meeting
2024
Computational Molecular Science and Engineering Forum
Poster Session: Computational Molecular Science and Engineering Forum
Monday, October 28, 2024 - 3:30pm to 5:00pm
As an alternative, sigma profiles have been proposed as a universal molecular descriptor. A sigma profile is an un-normalized histogram of the screening charge density over the surface of a molecule embedded in a continuum solvent with a very high dielectric constant. As a result, sigma profiles encode valuable information about the inter- and intramolecular interactions the molecule can experience. They have been used successfully as inputs to convolutional neural networks and Gaussian processes to predict bulk material properties (e.g., boiling point, density, and aqueous solubility) from small datasets of fewer than 1500 molecules. Furthermore, because sigma profiles are histograms, their size is independent of molecular size: the sigma profiles of methane and of a C30 hydrocarbon have the same number of bins. The main reasons sigma profiles have not yet been adopted as molecular descriptors in more machine learning applications are the lack of an open-source tool for generating them and the publication restrictions imposed by commercially available tools.
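The histogram definition above can be illustrated with a short sketch. It assumes (none of this comes from the abstract) that a COSMO-type calculation has already produced a list of surface segments, each with a screening charge density sigma in e/Å^2 and an area in Å^2, and it bins them on a grid from -0.025 to +0.025 e/Å^2 in 0.001 steps, a grid commonly used in COSMO-SAC work. The function name `sigma_profile` and the segment data are hypothetical, and the averaging of sigma over neighbouring segments that real COSMO-SAC codes perform before binning is omitted.

```python
import math

# Assumed grid (not from the abstract): -0.025 .. +0.025 e/Å^2 in 0.001 steps.
SIGMA_MIN = -0.025   # e/Å^2, first grid point
SIGMA_STEP = 0.001   # e/Å^2, bin width
N_BINS = 51          # fixed length, independent of molecule size


def sigma_profile(segments):
    """Hypothetical helper: un-normalized sigma profile (Å^2 of surface per bin).

    `segments` is an iterable of (sigma, area) pairs, one per surface segment.
    Segment averaging over neighbouring patches is deliberately omitted.
    """
    profile = [0.0] * N_BINS
    for sigma, area in segments:
        # Index of the grid point nearest to this segment's charge density.
        idx = math.floor((sigma - SIGMA_MIN) / SIGMA_STEP + 0.5)
        # Charge densities beyond the grid are clamped into the edge bins.
        idx = min(max(idx, 0), N_BINS - 1)
        profile[idx] += area
    return profile


# Hypothetical segment data for a "small" and a "large" molecule.
small = [(-0.005, 10.0), (0.004, 12.0)]
large = [(0.001 * (i % 21 - 10), 2.5) for i in range(400)]

p_small = sigma_profile(small)
p_large = sigma_profile(large)
print(len(p_small), len(p_large))  # 51 51
```

However many segments a molecule has, the output always has 51 entries, which is the size-independence property described above, and the bin areas sum to the molecule's total surface area.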
This work summarizes the development of an open-source Python tool for generating sigma profiles, enabling their use in large-scale materials discovery. The work also addresses the effect of conformers on sigma profiles, to ensure that the generated profiles are consistent and rich in information. The quality, or "richness," of the generated sigma profiles was benchmarked against literature datasets by comparing the performance of Gaussian process models trained on each sigma profile source at predicting material properties.
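The benchmarking idea (train the same Gaussian process model on descriptors from different sources and compare held-out errors) can be sketched as follows. This is not the authors' benchmark: the function names, the squared-exponential kernel, the fixed hyperparameters (real studies usually fit them by maximizing the marginal likelihood), the mean absolute error metric, and the toy one-dimensional data standing in for sigma profiles are all assumptions. The linear algebra is written in plain Python only to keep the example self-contained; in practice a library such as scikit-learn's `GaussianProcessRegressor` would typically be used.

```python
import math


def rbf(x, y, length_scale):
    """Squared-exponential (RBF) kernel between two descriptor vectors."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * d2 / length_scale ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gp_fit_predict(x_train, y_train, x_test, length_scale=1.0, noise=1e-2):
    """Posterior mean of a GP regressor with fixed (assumed) hyperparameters."""
    n = len(x_train)
    mean_y = sum(y_train) / n  # centre targets; the GP models the residual
    K = [[rbf(x_train[i], x_train[j], length_scale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Solve K alpha = (y - mean): forward substitution with L, then back with L^T.
    z = [0.0] * n
    for i in range(n):
        z[i] = (y_train[i] - mean_y - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(L[k][i] * alpha[k] for k in range(i + 1, n))) / L[i][i]
    return [mean_y + sum(rbf(x, xt, length_scale) * a for xt, a in zip(x_train, alpha))
            for x in x_test]


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def benchmark(sources, y, train_idx, test_idx, **gp_kwargs):
    """Held-out MAE of a GP trained on each descriptor source, same split for all."""
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    results = {}
    for name, X in sources.items():
        pred = gp_fit_predict([X[i] for i in train_idx], y_train,
                              [X[i] for i in test_idx], **gp_kwargs)
        results[name] = mae(y_test, pred)
    return results


# Hypothetical toy data: a smooth "property" of a 1-D descriptor.
xs = [0.1 * i for i in range(31)]
y = [math.sin(x) for x in xs]
sources = {
    "informative": [[x] for x in xs],       # descriptor determines the property
    "uninformative": [[0.0] for _ in xs],   # same descriptor for every sample
}
train_idx = [i for i in range(31) if i % 3 != 1]
test_idx = [i for i in range(31) if i % 3 == 1]
scores = benchmark(sources, y, train_idx, test_idx, length_scale=0.5, noise=1e-3)
print(scores)
```

Using one shared train/test split for every source means differences in error reflect the descriptors rather than the data partition; a descriptor that carries no information reduces the model to predicting the training mean.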