(509df) Graph Neural Networks to Predict Thermochemistry of Halogenated Hydrocarbons for Use in Combustion Modeling: New Data, Methods, and a Cautionary Tale.

Authors 

Farina, D. Jr. - Presenter, Northeastern University
Sirumalla, S. K., Northeastern University
Developing the next generation of more environmentally friendly flame suppressants and safer refrigerant working fluids requires predicting the combustion properties of novel halogenated hydrocarbons (HHCs). Software such as the open-source Reaction Mechanism Generator (RMG) can automatically construct detailed kinetic models of complex reaction mechanisms, but it relies on reasonable estimates of the thermochemistry (enthalpy, entropy, heat capacity) of thousands of intermediate species. In this work, we generate a new dataset of thermochemistry calculations and train three graph neural network architectures, including a novel one, for molecular property prediction. We test these neural networks on RMG-generated HHCs and highlight some challenges in doing so.

To generate training data, we constructed a new set of thermochemistry calculations by systematically enumerating halocarbon species containing up to four C and O atoms and at least one halogen atom (F, Cl, or Br). Conformers were generated with tight-binding DFT, geometry optimizations and 1D hindered rotor scans were performed with DFT, and electronic energies were recomputed with G4, yielding high-quality enthalpies, entropies, and heat capacities for over 16,000 species.
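As a rough illustration of the enumeration step, the sketch below (not our production pipeline; the skeleton list and substitution depth are placeholder choices, and RDKit is an assumed dependency) substitutes halogens onto the C-H sites of small C/O backbones:

    # Illustrative halocarbon enumeration with RDKit (assumed dependency).
    # The skeletons and substitution depth are placeholders; the actual
    # dataset systematically covers all species with up to 4 C/O atoms.
    from rdkit import Chem

    def substitute_one_h(smiles, halogen):
        """Return canonical SMILES for every single C-H -> C-halogen swap."""
        mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
        table = Chem.GetPeriodicTable()
        out = set()
        for atom in mol.GetAtoms():
            # only substitute hydrogens that are bonded to carbon
            if atom.GetAtomicNum() != 1:
                continue
            if atom.GetNeighbors()[0].GetAtomicNum() != 6:
                continue
            rw = Chem.RWMol(mol)
            rw.GetAtomWithIdx(atom.GetIdx()).SetAtomicNum(
                table.GetAtomicNumber(halogen))
            Chem.SanitizeMol(rw)
            out.add(Chem.MolToSmiles(Chem.RemoveHs(rw)))
        return out

    skeletons = ["C", "CC", "C=C", "CCO"]   # small example C/O backbones
    halogens = ["F", "Cl", "Br"]
    species = set()
    for smi in skeletons:
        frontier = {smi}
        for _ in range(3):                  # up to three substitutions here
            frontier = {s for parent in frontier for hal in halogens
                        for s in substitute_one_h(parent, hal)}
            species |= frontier
    print(len(species))

Each enumerated SMILES would then be fed through the conformer, DFT, and G4 workflow described above.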

Three graph neural network (GNN) architectures are investigated: MPNN (Gilmer et al. 2017), DMPNN (Yang et al. 2019), and a modified MPNN with dot-product attention inspired by GAT (Veličković et al. 2017), which we refer to as Attention MPNN. All models were trained with a batch size of 128 using the Adam optimizer (Kingma and Ba 2014) and a cosine annealing learning rate schedule, with a linear warm-up over the first 100 epochs and annealing from epoch 100 to epoch 540. The models are evaluated using a scaffold split (Wu et al. 2017), and the best-performing model is selected as the one with the lowest mean absolute error (MAE) on the validation set. All architectures are implemented in the Deep Graph Library (DGL) with a PyTorch backend, and the training framework is built on PyTorch Lightning. All models performed very well, with validation-set MAEs for ΔH°f(298 K) ranging from 0.6 to 2 kcal/mol.
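For concreteness, the following sketch shows the schedule described above (linear warm-up to epoch 100, cosine annealing to epoch 540) in plain PyTorch; the model and base learning rate are placeholders, not the values used in this work:

    # Sketch of the warm-up + cosine-annealing schedule (assumed base lr).
    import math
    import torch

    model = torch.nn.Linear(16, 1)    # placeholder standing in for a GNN
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    WARMUP_EPOCHS, TOTAL_EPOCHS = 100, 540

    def lr_scale(epoch):
        if epoch < WARMUP_EPOCHS:
            return (epoch + 1) / WARMUP_EPOCHS                # linear warm-up
        t = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
        return 0.5 * (1.0 + math.cos(math.pi * min(t, 1.0)))  # cosine decay

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_scale)

    for epoch in range(TOTAL_EPOCHS):
        # ... one epoch of optimizer.step() calls on batches of 128 graphs ...
        scheduler.step()

In PyTorch Lightning, the same optimizer and scheduler would simply be returned from configure_optimizers.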

Since our purpose is to deploy these models during automated mechanism generation, we built a test set of several hundred species by using RMG to model the combustion of a variety of HHC refrigerants and fire suppressants. On these test species, which better represent where the GNNs will actually be used, the MAE for the standard enthalpy of formation ranges from 15 to 40 kcal/mol, with some errors reaching 60 to 90 kcal/mol. Despite their good validation performance (MAE of 0.6 to 2 kcal/mol), all of the models trained on the initial dataset generalize very poorly to these out-of-distribution test points. By identifying weaknesses (e.g., ring compounds and biradicals) and expanding the training set with additional DFT calculations, we address these problems, and our final model's MAE on the test set improves to 3 to 5 kcal/mol. Besides describing a new dataset and state-of-the-art property prediction models, this talk serves as a cautionary tale: test generalization with very diverse data sets, and rely on a model only within the domain on which it was trained.

Gilmer, Justin, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. “Neural Message Passing for Quantum Chemistry.” In International Conference on Machine Learning, 1263–72. PMLR.

Kingma, Diederik P., and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” http://arxiv.org/abs/1412.6980.

Veličković, Petar, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. “Graph Attention Networks.” http://arxiv.org/abs/1710.10903.

Wu, Zhenqin, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, and Vijay Pande. 2017. “MoleculeNet: A Benchmark for Molecular Machine Learning.” http://arxiv.org/abs/1703.00564.

Yang, Kevin, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, et al. 2019. “Analyzing Learned Molecular Representations for Property Prediction.” Journal of Chemical Information and Modeling 59 (8): 3370–88. https://doi.org/10.1021/acs.jcim.9b00237.