(451e) A Graph Neural Network Approach for Efficient and Accurate Macromolecular Similarity Calculation | AIChE

Authors 

Shi, J. - Presenter, University of Notre Dame
Audus, D. J., University of California, Santa Barbara
Olsen, B., Massachusetts Institute of Technology
Pairwise similarity between macromolecules plays a significant role in numerous cheminformatics tasks such as ranking, clustering, and classification. Previous approaches to macromolecular similarity calculation generate chemistry-informed graph representations of macromolecules, with nodes representing monomers and edges representing bonds, and then compute similarity scores via graph edit distance or graph kernel methods. However, these approaches have significant limitations. Graph edit distance, which measures the minimum total cost of the edit operations (node and edge insertions, deletions, and substitutions) needed to transform one graph into another, is NP-hard to compute exactly. This limitation hinders its use in time-sensitive applications, such as the real-time ranking of search results. Graph kernels, by contrast, map graphs to fixed-length feature vectors and compute the inner product of those vectors to yield a pairwise similarity score. Because they avoid the combinatorial search over node correspondences that graph edit distance requires, graph kernel methods are more efficient, but they are less accurate and less interpretable than graph edit distance.
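To make this trade-off concrete, the sketch below computes an exact graph edit distance by brute force and compares it with one of the simplest graph kernels, the vertex-histogram kernel. It assumes unit costs for every node and edge insertion, deletion, and label substitution (the cost scheme is not stated here) and uses tiny hypothetical three-monomer graphs with labels `A` and `B`; the function names are illustrative only. The factorial loop over node correspondences is what makes exact graph edit distance impractical beyond small graphs, while the kernel needs only label counts.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def exact_ged(labels1, edges1, labels2, edges2):
    """Exact graph edit distance with unit costs, by brute force.

    Each node may be matched to a "dummy" node of the other graph
    (deletion or insertion), so both node sets are padded to n1 + n2.
    Runtime is factorial in n1 + n2: usable only for toy graphs.
    """
    n1, n2 = len(labels1), len(labels2)
    n = n1 + n2
    padded1 = list(labels1) + [None] * n2  # None marks a dummy node
    padded2 = list(labels2) + [None] * n1
    target_edges = {frozenset(e) for e in edges2}
    best = float("inf")
    for perm in permutations(range(n)):
        # Node costs: a mismatched label is a substitution; a real node
        # matched to a dummy is a deletion or insertion; dummy-dummy is free.
        cost = sum(1 for i in range(n) if padded1[i] != padded2[perm[i]])
        # Edge costs: mapped edges of graph 1 missing from graph 2 are
        # deletions; edges of graph 2 not covered are insertions.
        mapped_edges = {frozenset((perm[u], perm[v])) for u, v in edges1}
        cost += len(mapped_edges ^ target_edges)
        best = min(best, cost)
    return best


def vertex_histogram_kernel(labels1, labels2):
    """Inner product of monomer-label count vectors (a simple graph kernel)."""
    c1, c2 = Counter(labels1), Counter(labels2)
    return sum(c1[label] * c2[label] for label in c1)


def normalized_kernel(labels1, labels2):
    """Cosine-normalized kernel value in [0, 1]."""
    k12 = vertex_histogram_kernel(labels1, labels2)
    k11 = vertex_histogram_kernel(labels1, labels1)
    k22 = vertex_histogram_kernel(labels2, labels2)
    return k12 / sqrt(k11 * k22)


path = [(0, 1), (1, 2)]
g1 = (["A", "B", "A"], path)  # A-B-A
g2 = (["A", "B", "B"], path)  # A-B-B
g3 = (["B", "A", "A"], path)  # B-A-A: same monomer counts as g1, different sequence

print(exact_ged(*g1, *g2))               # 1
print(exact_ged(*g1, *g3))               # 2
print(normalized_kernel(g1[0], g3[0]))   # 1.0
```

Here the kernel rates `g1` and `g3` as identical because they share monomer counts, while the edit distance of 2 reflects their different sequences, a small instance of the accuracy gap between the two families of methods.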

In this study, we propose a graph neural network model to address these challenges and accelerate macromolecular similarity calculations. The first step is to build a database of macromolecular similarity by computing exact graph edit distances between pairs of macromolecules. The graph neural network model then proceeds through four stages: (1) node-level embedding, which transforms each node of a graph into a vector by extracting the node's features and structural properties; (2) graph-level embedding, which generates one embedding vector per graph by aggregating its node embeddings; (3) graph-graph interaction, which uses both a neural tensor network and pairwise node comparison; and (4) fully connected layers that predict the graph edit distance. The network is trained on the database of pairwise graph edit distances. The proposed model overcomes the limitations of previous graph similarity methods: it substantially reduces computational cost while maintaining high accuracy, providing an efficient and precise way to calculate the pairwise similarity of macromolecules. This method advances cheminformatics for macromolecules and paves the way for advanced search engines and quantitative design tools for macromolecules.
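The four stages resemble the published SimGNN architecture for graph similarity learning. The NumPy sketch below wires them together as a single forward pass under several assumptions that the abstract does not state: one-hot monomer labels as node features, two graph convolution layers for stage 1, attention pooling for stage 2, a histogram of sigmoid-scored node-pair similarities for the pairwise comparison in stage 3, and a sigmoid output meant to be regressed against a size-normalized similarity exp(-GED / ((n1 + n2) / 2)). All class and function names are hypothetical, and the weights are random and untrained, so the scores carry no meaning until the model is fitted to an exact-GED database.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed: random, untrained placeholder weights


def one_hot(labels, vocab):
    """Node features: one-hot encoding of each node's monomer label."""
    x = np.zeros((len(labels), len(vocab)))
    for i, label in enumerate(labels):
        x[i, vocab.index(label)] = 1.0
    return x


def normalized_adjacency(n_nodes, edges):
    """GCN propagation matrix D^-1/2 (A + I) D^-1/2."""
    a = np.eye(n_nodes)
    for u, v in edges:
        a[u, v] = a[v, u] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def similarity_target(ged, n1, n2):
    """Assumed training target: exp of the size-normalized GED."""
    return float(np.exp(-ged / ((n1 + n2) / 2.0)))


class GraphSimilaritySketch:
    """Hypothetical four-stage model; not the authors' implementation."""

    def __init__(self, n_features, hidden=16, k=8, bins=8):
        def init(*shape):
            return rng.normal(0.0, 0.3, shape)

        self.w_gcn = [init(n_features, hidden), init(hidden, hidden)]
        self.w_att = init(hidden, hidden)
        self.w_ntn = init(k, hidden, hidden)  # K bilinear slices
        self.v_ntn = init(k, 2 * hidden)
        self.b_ntn = np.zeros(k)
        self.bins = bins
        self.w_fc1 = init(k + bins, 16)
        self.w_fc2 = init(16, 1)

    def node_embeddings(self, x, a_hat):
        # Stage 1: graph convolutions, H <- ReLU(A_hat H W).
        h = x
        for w in self.w_gcn:
            h = np.maximum(a_hat @ h @ w, 0.0)
        return h

    def graph_embedding(self, h):
        # Stage 2: attention pooling; each node is weighted by its
        # sigmoid similarity to a global context vector.
        context = np.tanh(h.mean(axis=0) @ self.w_att)
        return sigmoid(h @ context) @ h

    def tensor_interaction(self, g1, g2):
        # Stage 3a: neural tensor network, ReLU(g1^T W[1:K] g2 + V [g1; g2] + b).
        bilinear = np.einsum("i,kij,j->k", g1, self.w_ntn, g2)
        linear = self.v_ntn @ np.concatenate([g1, g2])
        return np.maximum(bilinear + linear + self.b_ntn, 0.0)

    def pairwise_histogram(self, h1, h2):
        # Stage 3b: normalized histogram of all node-node similarity scores.
        scores = sigmoid(h1 @ h2.T)
        counts, _ = np.histogram(scores, bins=self.bins, range=(0.0, 1.0))
        return counts / counts.sum()

    def predict(self, graph1, graph2, vocab):
        (labels1, edges1), (labels2, edges2) = graph1, graph2
        h1 = self.node_embeddings(one_hot(labels1, vocab), normalized_adjacency(len(labels1), edges1))
        h2 = self.node_embeddings(one_hot(labels2, vocab), normalized_adjacency(len(labels2), edges2))
        g1, g2 = self.graph_embedding(h1), self.graph_embedding(h2)
        # Stage 4: fully connected layers map interaction features to a score.
        features = np.concatenate([self.tensor_interaction(g1, g2), self.pairwise_histogram(h1, h2)])
        hidden = np.maximum(features @ self.w_fc1, 0.0)
        return float(sigmoid(hidden @ self.w_fc2)[0])


vocab = ["A", "B"]
path = [(0, 1), (1, 2)]
model = GraphSimilaritySketch(n_features=len(vocab))
score = model.predict((["A", "B", "A"], path), (["A", "B", "B"], path), vocab)
print(score)  # a value in (0, 1); meaningless until trained
```

Training would regress `predict` onto `similarity_target` values computed from the exact-GED database, for example with a mean-squared-error loss in an autodiff framework; that loop is omitted here. Once trained, scoring a pair costs only a few small matrix products, which is what makes this kind of model attractive for ranking compared with exact graph edit distance.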