(687e) Chemprop: Machine Learning for Molecular Property Prediction | AIChE

Authors 

McGill, C. - Presenter, Massachusetts Institute of Technology
Heid, E., Massachusetts Institute of Technology
Chung, Y., Massachusetts Institute of Technology
Greenman, K., Massachusetts Institute of Technology
Graff, D., Massachusetts Institute of Technology
Liu, M., Massachusetts Institute of Technology
Bilodeau, C., Massachusetts Institute of Technology
Gomez-Bombarelli, R., Massachusetts Institute of Technology
Coley, C., Massachusetts Institute of Technology
Jensen, K., Massachusetts Institute of Technology
Jaakkola, T. S., Massachusetts Institute of Technology
Barzilay, R., Massachusetts Institute of Technology
Green, W., Massachusetts Institute of Technology
Chemprop is an easy-to-use machine learning software package for chemical property prediction. The software is open-source and available for public use [1]. With a simple interface and self-contained workflow, Chemprop is accessible to scientists with varying levels of engagement with machine-learning techniques who want to create high-quality models from their data. Supported by the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium [2], the software has an active base of industry users, which drives continuous development of new features and support.

We will discuss the architecture used by Chemprop, some notable examples of recent applications, and several of the software’s significant features. Inputs to Chemprop models are provided as SMILES strings, from which the software constructs 2D connectivity graphs of molecules. Chemprop’s network architecture uses directed message passing neural networks (d-MPNN) to produce learnable molecular encodings, as implemented and benchmarked against other architectures by Yang et al. [3]. The end-to-end learning enabled by this architecture lets the software extract the information from the molecular graph that is most relevant to the property target being modeled. The software has been used for a variety of applications, showing the versatility of learned encodings: enthalpy of formation, activation energy, solubility, antibiotic activity, reaction regioselectivity, UV-Vis absorption, and infrared spectra.
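As a rough illustration of the directed message passing idea, the sketch below runs a simplified update on a hand-built connectivity graph for the heavy atoms of ethanol (SMILES "CCO"). It is not Chemprop's implementation: the graph is typed in by hand rather than parsed from SMILES, hidden states are single numbers seeded with atomic numbers, and there are no learned weight matrices. The function names (directed_edges, dmpnn_step, atom_readout) are hypothetical and exist only for this example. The one feature it does reproduce is that hidden states live on directed bonds, and the update for bond v->w aggregates the bonds arriving at v while excluding the reverse bond w->v.

```python
from collections import defaultdict


def directed_edges(bonds):
    """Turn each undirected bond (a, b) into two directed edges."""
    edges = []
    for a, b in bonds:
        edges.append((a, b))
        edges.append((b, a))
    return edges


def dmpnn_step(hidden, initial, edges):
    """One simplified directed message passing step (no learned weights).

    The new state of edge v->w is its initial state plus the sum of the
    states of edges k->v for every neighbor k of v other than w.
    """
    incoming = defaultdict(list)
    for k, v in edges:
        incoming[v].append((k, v))

    new_hidden = {}
    for v, w in edges:
        message = sum(hidden[(k, v)] for k, _ in incoming[v] if k != w)
        # ReLU stands in for the nonlinearity of a real network.
        new_hidden[(v, w)] = max(0.0, initial[(v, w)] + message)
    return new_hidden


def atom_readout(hidden, edges, n_atoms):
    """Sum the states of all directed edges arriving at each atom."""
    out = [0.0] * n_atoms
    for v, w in edges:
        out[w] += hidden[(v, w)]
    return out


# Hand-built graph for the heavy atoms of ethanol: C(0)-C(1)-O(2).
atomic_numbers = [6.0, 6.0, 8.0]  # toy atom features (assumption)
bonds = [(0, 1), (1, 2)]
edges = directed_edges(bonds)

# Seed each directed edge with the feature of its source atom.
initial = {(v, w): atomic_numbers[v] for v, w in edges}
hidden = dict(initial)
for _ in range(2):
    hidden = dmpnn_step(hidden, initial, edges)

atom_states = atom_readout(hidden, edges, len(atomic_numbers))
molecule_encoding = sum(atom_states)  # sum pooling over atoms
print(atom_states, molecule_encoding)
```

In a trained model the edge states would be vectors, the aggregation would pass through learned weight matrices, and the pooled molecule encoding would feed a feed-forward network that predicts the target property; a cheminformatics toolkit would normally build the graph from the SMILES string.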

The implementation of Chemprop incorporates a number of features and functions to fit the needs of its users. Chemprop supports GPU-enabled model training and prediction. Additional functions have been added for hyperparameter optimization, the extraction of molecule latent representations, and the estimation and calibration of model uncertainty. Additional molecule- and atom-level features can be provided to bring in information from outside methods such as experimental measurements or quantum mechanical calculations. Chemprop supports inputs using reactions or multiple molecules (e.g., solvent and solute). Tools for transfer learning and weighted multitask models enable the model to infer useful relationships across distinct datasets. The contained workflow allows users to perform all of Chemprop’s main functions with minimal coding required.
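To make the idea of a weighted multitask model concrete, the sketch below computes a task-weighted mean squared error over several property targets, skipping entries where a molecule has no measurement for a task. This is one plausible formulation under stated assumptions, not Chemprop's actual loss code: the function name weighted_multitask_mse, the use of None to mark missing targets, and the normalization by the number of observed entries are all choices made for this example.

```python
def weighted_multitask_mse(preds, targets, task_weights):
    """Task-weighted mean squared error that ignores missing targets.

    preds and targets are lists of rows, one row per molecule and one
    column per task. A target of None means "not measured" and is
    skipped. The result is normalized by the number of observed targets.
    """
    total = 0.0
    observed = 0
    for pred_row, target_row in zip(preds, targets):
        for pred, target, weight in zip(pred_row, target_row, task_weights):
            if target is None:
                continue  # no label for this molecule/task pair
            total += weight * (pred - target) ** 2
            observed += 1
    return total / observed if observed else 0.0


# Two molecules, two tasks; the first molecule lacks a label for task 2.
preds = [[1.0, 2.0], [3.0, 4.0]]
targets = [[1.5, None], [3.0, 5.0]]
task_weights = [1.0, 0.5]  # hypothetical weights: task 2 counts half as much

loss = weighted_multitask_mse(preds, targets, task_weights)
print(loss)
```

Masking missing targets is what lets datasets with only partially overlapping measurements share a single model, while the per-task weights control how strongly each property shapes the shared encoding.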

[1] Chemprop: Molecular Property Prediction. https://github.com/chemprop/chemprop

[2] Machine Learning for Pharmaceutical Discovery and Synthesis Consortium. https://mlpds.mit.edu

[3] Yang, K.; Swanson, K.; Jin, W.; Coley, C.; Eiden, P.; Gao, H.; Guzman-Perez, A.; Hopper, T.; Kelley, B.; Mathea, M.; et al. Analyzing Learned Molecular Representations for Property Prediction. J. Chem. Inf. Model. 2019, 59 (8), 3370–3388. https://doi.org/10.1021/acs.jcim.9b00237.