(356j) Infrared Spectra Prediction with Machine Learning | AIChE

Authors 

McGill, C. J. - Presenter, North Carolina State University
Green, W., Massachusetts Institute of Technology
Guan, Y., Massachusetts Institute of Technology
Spectroscopy is an important tool for ascertaining purity in biological systems and in drug manufacturing. Predicting unknown spectra would be a powerful way to extend existing spectral identification databases and to evaluate novel molecules for which no standards exist. Machine learning algorithms using deep neural networks have been used to interpret spectra in microbiology and food production for the last ten years. Predicting the spectra themselves is a more recent area of research, with new efforts to predict IR spectra, proton NMR spectra, and mass spectra. This work uses machine learning to predict IR spectra for novel molecules, improving on previous work by including effects beyond calculated harmonic frequencies and by using learned molecular fingerprints.

A model for IR spectral prediction has been developed that requires only a molecular graph structure as input, provided as a SMILES string. The chemical representation is processed through a message passing neural network followed by a feed-forward neural network that predicts the IR spectrum, an adaptation of the Chemprop code for molecular property prediction. The model is pretrained on quantum chemistry calculations for molecules sampled from the PubChem database to learn the molecular harmonic vibration modes. During pretraining, active learning is used to explore chemical space more efficiently and to prioritize the molecules that will most improve the model. Further training is performed using experimentally collected spectra available in open-access databases from the National Institute of Standards and Technology (NIST) and the National Institute of Advanced Industrial Science and Technology (AIST). The model allows for predictions of spectra in the gas phase and in supported condensed phases.
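The pipeline described above (molecular graph, message passing, learned fingerprint, feed-forward prediction of a spectrum) can be illustrated with a minimal, dependency-free Python sketch. It is not the Chemprop implementation: Chemprop passes messages along directed bonds, whereas this sketch uses simpler atom-level messages. The weights are random and untrained, and the heavy-atom graph of ethanol (SMILES `CCO`) is written out by hand rather than parsed. The element vocabulary, layer sizes, number of spectral bins, all function names, and the choice to normalize the output so that it sums to one are assumptions made for illustration only.

```python
import math
import random

ELEMENTS = ["C", "N", "O"]  # toy element vocabulary (assumption)
HIDDEN = 8                  # width of each atom state (assumption)
N_BINS = 16                 # number of wavenumber bins predicted (assumption)

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def random_matrix(rows, cols, rng):
    """Untrained placeholder weights; a real model would learn these."""
    return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]

def message_passing(symbols, neighbors, W_in, W_msg, steps=3):
    """Return a fixed-length molecular fingerprint from an atom graph."""
    # Initial atom states: one-hot element encoding mapped to HIDDEN dimensions.
    h = [relu(matvec(W_in, [1.0 if s == e else 0.0 for e in ELEMENTS]))
         for s in symbols]
    for _ in range(steps):
        new_h = []
        for i, h_i in enumerate(h):
            # Aggregate the atom's own state with its bonded neighbors' states.
            agg = list(h_i)
            for j in neighbors[i]:
                agg = [a + b for a, b in zip(agg, h[j])]
            new_h.append(relu(matvec(W_msg, agg)))
        h = new_h
    # Readout: sum over atoms, so the result does not depend on atom ordering.
    return [sum(col) for col in zip(*h)]

def predict_spectrum(fingerprint, W_hidden, W_out):
    """Feed-forward head mapping a fingerprint to a normalized spectrum."""
    hidden = relu(matvec(W_hidden, fingerprint))
    logits = matvec(W_out, hidden)
    # Exponentiate for non-negative intensities; subtracting the maximum
    # avoids overflow without changing the normalized result.
    peak = max(logits)
    raw = [math.exp(x - peak) for x in logits]
    total = sum(raw)
    return [x / total for x in raw]

rng = random.Random(0)
W_in = random_matrix(HIDDEN, len(ELEMENTS), rng)
W_msg = random_matrix(HIDDEN, HIDDEN, rng)
W_hidden = random_matrix(HIDDEN, HIDDEN, rng)
W_out = random_matrix(N_BINS, HIDDEN, rng)

# Heavy-atom graph of ethanol (SMILES "CCO"), written by hand.
symbols = ["C", "C", "O"]
neighbors = [[1], [0, 2], [1]]

fingerprint = message_passing(symbols, neighbors, W_in, W_msg)
spectrum = predict_spectrum(fingerprint, W_hidden, W_out)
print(len(spectrum), round(sum(spectrum), 6))
```

Because the weights are untrained, the printed spectrum carries no chemical meaning; the sketch only shows how a variable-size molecular graph is reduced to a fixed-length fingerprint and then to a fixed-length spectral vector. In practice a cheminformatics toolkit would parse the SMILES string and supply richer atom and bond features.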