(364b) Message Passing Neural Networks for Prediction of IR Spectra | AIChE

Authors 

McGill, C. J. - Presenter, North Carolina State University
Guan, Y., Massachusetts Institute of Technology
Green, W., Massachusetts Institute of Technology

Spectroscopy is an important tool for ascertaining purity in manufacturing, particularly for biological systems and drug production. The ability to predict unknown spectra would be a powerful tool for extending existing spectral identification databases and for evaluating novel molecules for which no standards are available. Recent work has applied machine learning to predict IR spectra for specific classes of molecules, such as polyaromatic hydrocarbons. In this work, we discuss the release of an IR spectrum prediction model with a more general scope of application. The model will give researchers easy access to IR spectrum predictions, and the software used to develop it (an extension of the Chemprop software) will inform future efforts to build other spectrum prediction models. The software is freely distributed through GitHub, and the trained model files are available through Zenodo.

The presented model requires only a molecular SMILES string as input to generate an IR spectrum prediction. The SMILES representation is processed by a message passing neural network, which encodes the molecule as a latent vector; a feed-forward neural network then predicts the IR spectrum from that vector. Because the entire process is differentiable, the molecular encoding is not fixed in advance but is itself learned and optimized during training. The model is pre-trained on semi-empirical quantum chemistry calculations (GFN2-xTB) for molecules sampled from the PubChem database, so that it learns molecular encodings for a wide range of molecules. Further training uses 56,955 experimental spectra collected from four data sources: the National Institute of Standards and Technology (NIST), Pacific Northwest National Laboratory (PNNL), the National Institute of Advanced Industrial Science and Technology (AIST), and the Coblentz Society. The model supports spectrum predictions for the gas phase and for four condensed phases.
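The encoder/predictor split described above can be illustrated with a minimal NumPy sketch of a generic atom-level message passing encoder followed by a feed-forward spectrum head. This is not Chemprop's implementation. The class name `TinyMPNN`, the one-hot atom features, the sum readout, the number of message passing steps, the 100-bin output grid, and the softplus-plus-normalization output layer are all illustrative assumptions, and the weights are random rather than trained. SMILES parsing is omitted; the molecular graph is supplied directly as an atom feature matrix and an adjacency matrix.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class TinyMPNN:
    """Hypothetical toy model: MPNN encoder + feed-forward spectrum head (untrained)."""

    def __init__(self, n_atom_feats, hidden, n_bins, depth=3, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth  # number of message passing steps (assumed value)
        self.W_in = rng.normal(scale=0.1, size=(n_atom_feats, hidden))
        self.W_msg = rng.normal(scale=0.1, size=(hidden, hidden))
        self.W_ffn1 = rng.normal(scale=0.1, size=(hidden, hidden))
        self.W_ffn2 = rng.normal(scale=0.1, size=(hidden, n_bins))

    def encode(self, atom_feats, adjacency):
        """Return a latent molecule vector from an atom graph."""
        h0 = relu(atom_feats @ self.W_in)       # initial atom hidden states
        h = h0
        for _ in range(self.depth):
            messages = adjacency @ h            # sum of neighboring atom states
            h = relu(h0 + messages @ self.W_msg)
        return h.sum(axis=0)                    # sum readout: order-independent

    def predict_spectrum(self, atom_feats, adjacency):
        """Map the latent vector to a binned spectrum via a small FFN."""
        z = self.encode(atom_feats, adjacency)
        logits = relu(z @ self.W_ffn1) @ self.W_ffn2
        intensities = np.logaddexp(0.0, logits)  # stable softplus: strictly positive
        return intensities / intensities.sum()   # assumed normalization to unit sum


# Heavy-atom graph of ethanol (C-C-O), hydrogens implicit; features are one-hot [C, O].
atom_feats = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

model = TinyMPNN(n_atom_feats=2, hidden=16, n_bins=100)
spectrum = model.predict_spectrum(atom_feats, adjacency)
print(spectrum.shape)  # (100,)
```

Every operation in this forward pass is differentiable (ReLU almost everywhere), so the gradient of a spectrum loss can flow back through the feed-forward head into `W_in` and `W_msg`; that is what lets the molecular encoding itself be learned rather than hand-crafted. The sum readout also makes the latent vector independent of the order in which atoms are listed.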