(545a) Expanding Density-Correlation Machine Learning Formalisms for Anisotropic Particles and Hierarchical Systems | AIChE

Authors 

Cersonsky, R. - Presenter, EPFL STI IMX COSMO
In recent years, machine learning (ML) methods have transformed computational chemistry and materials research. At the start of any ML inquiry, we must first ask: how do we represent our system? This question is at the core of many ML methods, ranging from end-to-end methods, which take "raw" data formats and use multi-layer architectures to implicitly encode the chemical information, to "feature-forward" methods, which use physics and chemistry to design numerical representations conducive to ML workflows. While both approaches have merits and overlap, the latter often allows for added tunability to identify the role of key parameters in the design space, as well as more straightforward interpretation.

There are many ways to encode raw chemical data into features, and the suitable choice largely depends on the problem at hand. For cheminformatics, string-based featurizations such as SMILES or SELFIES are popular, as they encode important information such as the functional groups present and their connectivity, and can be parsed using natural language processing (NLP) models and other text-based advancements. However, in thermodynamic contexts where the chemistry and connectivity remain largely unchanged, such as molecular simulation, it is more typical to use configuration-dependent features, which transform molecular coordinates into suitable numerical representations.
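
As a rough illustration of what a configuration-dependent feature can look like, the sketch below maps particle coordinates to a Gaussian-smeared radial density around one central particle. It is not taken from the work described here; the function name `radial_density_features` and the parameters `r_cut`, `n_bins`, and `sigma` are hypothetical choices made for this example only.

```python
import numpy as np


def radial_density_features(positions, center_index, r_cut=5.0, n_bins=16, sigma=0.3):
    """Gaussian-smeared radial density around one particle (illustrative only).

    All names and default values here are assumptions made for this sketch.
    """
    positions = np.asarray(positions, dtype=float)
    center = positions[center_index]

    # Distances from the central particle to every other particle.
    others = np.delete(positions, center_index, axis=0)
    distances = np.linalg.norm(others - center, axis=1)

    # Keep only neighbors inside the cutoff sphere.
    distances = distances[distances < r_cut]

    # Place one Gaussian per neighbor on a fixed radial grid and sum them.
    grid = np.linspace(0.0, r_cut, n_bins)
    smeared = np.exp(-0.5 * ((grid[:, None] - distances[None, :]) / sigma) ** 2)
    return smeared.sum(axis=1)


# Example usage with random coordinates (placeholder data, not simulation output).
rng = np.random.default_rng(0)
coords = rng.uniform(-2.0, 2.0, size=(6, 3))
features = radial_density_features(coords, center_index=0)
```

Because the feature depends only on neighbor distances, it is unchanged by rigid rotation of the configuration and by reordering of the neighbors, which is the kind of symmetry such representations are usually designed to respect. It discards all angular information, which fuller density-based frameworks retain.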

Yet this leaves those of us who work beyond the atomic scale with few options for thermodynamic representations that reflect the complex interactions of mesoscale building blocks. Of the frameworks developed for atomic representations, density-based frameworks present the most compelling avenue for expansion. Furthermore, by defining our representations within the same conceptual framework as their atomic counterparts, we gain the ability to combine or compare representations across multiple scales, which is sorely needed for multiscale simulation and for understanding and designing cross-scale phenomena.

Here, we introduce and demonstrate a first such expansion of density-based frameworks for ML representations by taking the popular SOAP (Smooth Overlap of Atomic Positions) formalism and extending it to simple anisotropic bodies. We provide the mathematical theory behind this expansion and demonstrate several case studies of its usage across multiple simulation length scales and materials systems.
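
For orientation, the standard SOAP construction for point-like atoms can be summarized as below. This is the commonly published form, in which normalization conventions and the choice of radial basis R_n vary between implementations. The final line is only one plausible way to picture an anisotropic generalization, with a per-particle covariance matrix Sigma_j as an assumed symbol; the abstract does not state the form actually used.

```latex
% Neighbor density around particle i: one isotropic Gaussian per neighbor j
\rho_i(\mathbf{r}) = \sum_{j} \exp\!\left( -\frac{\lvert \mathbf{r} - \mathbf{r}_{ij} \rvert^{2}}{2\sigma^{2}} \right)

% Expansion in radial functions R_n and spherical harmonics Y_{lm}
c^{i}_{nlm} = \int \mathrm{d}\mathbf{r}\; R_n(r)\, Y^{*}_{lm}(\hat{\mathbf{r}})\, \rho_i(\mathbf{r})

% Rotationally invariant power spectrum
p^{i}_{nn'l} = \sum_{m=-l}^{l} \left( c^{i}_{nlm} \right)^{*} c^{i}_{n'lm}

% Assumed illustration only: an anisotropic Gaussian for a non-spherical body j
g_j(\mathbf{r}) = \exp\!\left( -\tfrac{1}{2}\, (\mathbf{r} - \mathbf{r}_{ij})^{\mathsf{T}}\, \boldsymbol{\Sigma}_j^{-1}\, (\mathbf{r} - \mathbf{r}_{ij}) \right)
```

In this picture, setting Sigma_j equal to sigma squared times the identity matrix recovers the isotropic Gaussian of the first line, which is what would allow atomic and mesoscale representations to sit within the same conceptual framework.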