(566k) Automated Molecular Structure Deduction from Spectral and Reaction Data | AIChE

Authors 

Savoie, B., Purdue University
Molecular structure identification is a fundamental task in chemistry that enables various specialized applications such as degradant analysis, property prediction, and formulation development. Spectroscopic methods, including mass spectrometry (MS), infrared (IR) spectroscopy, and nuclear magnetic resonance (NMR) spectroscopy, are widely used to identify unknown products. However, because the space of possible molecular structures is so large, the information obtained from spectroscopic characterization is often insufficient, leaving the problem of elucidating chemical structure from spectra underdetermined. To address this challenge, we propose a machine learning (ML) model that integrates spectral data with information on the starting materials and establishes a deductive workflow for structure prediction. Our approach leverages the fact that researchers usually know what they put in the flask before they analyze the spectra of an unknown product. This additional information about reactants and reagents restricts the space of possible products, leading to more precise structure prediction than analyzing the spectra alone. The deductive ML model comprises several transformer modules arranged in parallel, which learn the structural features of the reactants as well as the fingerprint patterns in the spectra. A deductive super-network then combines their outputs to deduce the most likely molecular candidates. The success of an ML approach typically depends on the quality and completeness of the data used to train the model. However, our model has the advantage of maintaining comparable performance even when one or more types of spectra are absent.
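The two ideas in the abstract, using the reactants to narrow the product space and tolerating missing spectra, can be illustrated with a toy sketch. This is not the authors' transformer model: it replaces the learned networks with a hand-written atom-count rule and a plain average, and every name (`within_reactant_atoms`, `mean_available_score`, `rank_candidates`) and every score below is hypothetical and made up for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# One candidate: (name, element counts, per-modality match scores).
# A score of None marks a modality that was not measured.
Candidate = Tuple[str, Counter, Dict[str, Optional[float]]]


def within_reactant_atoms(candidate: Counter, reactant_pool: Counter) -> bool:
    """True if the candidate needs no more of any element than the reactants supply."""
    return all(reactant_pool.get(element, 0) >= count
               for element, count in candidate.items())


def mean_available_score(scores: Dict[str, Optional[float]]) -> float:
    """Average the match scores over the modalities that are present."""
    present = [s for s in scores.values() if s is not None]
    if not present:
        raise ValueError("at least one spectral modality is required")
    return sum(present) / len(present)


def rank_candidates(candidates: List[Candidate],
                    reactant_pool: Counter) -> List[Tuple[str, float]]:
    """Drop candidates the reactants cannot form, then sort by mean score."""
    feasible = [(name, mean_available_score(scores))
                for name, formula, scores in candidates
                if within_reactant_atoms(formula, reactant_pool)]
    return sorted(feasible, key=lambda item: item[1], reverse=True)


# Assumed example: acetic acid (C2H4O2) + ethanol (C2H6O) in the flask.
reactant_pool = Counter({"C": 4, "H": 10, "O": 3})

# Made-up scores; the NMR spectrum is treated as absent for every candidate.
candidates: List[Candidate] = [
    ("ethyl acetate", Counter({"C": 4, "H": 8, "O": 2}),
     {"ms": 0.75, "ir": 0.5, "nmr": None}),
    ("diethyl ether", Counter({"C": 4, "H": 10, "O": 1}),
     {"ms": 0.25, "ir": 0.5, "nmr": None}),
    ("ethyl chloroacetate", Counter({"C": 4, "H": 7, "Cl": 1, "O": 2}),
     {"ms": 1.0, "ir": 1.0, "nmr": None}),
]

ranking = rank_candidates(candidates, reactant_pool)
```

In this sketch the chlorine-containing candidate is excluded despite its high spectral scores, because no reactant supplies chlorine, and the remaining candidates are still ranked even though no NMR score is available. A learned model would replace both the hard feasibility rule and the averaging step.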