(119d) The Molecular Descriptors with Actively Identified Subspaces (MolDAIS) Framework for Efficient Molecular Property Optimization and Beyond
2024 AIChE Annual Meeting
Computing and Systems Technology Division
Advances in machine learning and intelligent systems I
Monday, October 28, 2024 - 1:24pm to 1:42pm
In this work, we present the MolDAIS (Molecular Descriptors with Actively Identified Subspaces) framework [9], which tackles molecular property optimization (MPO) problems from a different perspective. Specifically, MolDAIS combines a high-dimensional molecular descriptor representation of the search space with a Gaussian process (GP) model equipped with a sparse axis-aligned subspace (SAAS) prior [10], on top of which we can deploy standard Bayesian optimization acquisition functions (such as expected improvement). The fundamental assumption made by MolDAIS is that only a fairly small number of well-designed features are needed to accurately predict any specific property of interest. Since these key features are rarely known a priori, we aim to learn them (from a large set of features) in the low-data regime using a sparsity-inducing GP prior. This idea is similar to the one motivating so-called explainable machine learning methods, which have grown in popularity in recent years [11, 12]. An important difference in MolDAIS is that the chosen descriptors are actively learned, in the sense that they evolve as more property data are collected; we have observed that the ability to correct errors in the representation on the fly can have a large influence on performance. MolDAIS has been extensively benchmarked against several competing MPO algorithms; we have found that it routinely outperforms all other tested approaches and, in some cases, is able to find the best-in-class molecule from more than 100,000 candidates by testing only ~20 molecules. We will also show how MolDAIS can be straightforwardly extended to problems outside of traditional MPO, including constrained, multi-objective, and human-in-the-loop settings.
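To make the loop described above concrete, the following is a minimal NumPy sketch, not the MolDAIS implementation. It assumes a finite candidate pool with precomputed descriptors, a GP with an ARD squared-exponential kernel, and a hierarchical half-Cauchy prior on the inverse squared lengthscales in the spirit of SAAS [10]. As a loudly simplified stand-in for proper posterior inference, it draws relevance vectors from the prior and keeps the one with the highest marginal likelihood. All function names, the synthetic objective, and the settings (`alpha`, `noise`, pool size) are hypothetical choices made for illustration.

```python
import numpy as np
from math import erf, pi, sqrt


def sample_saas_rho(rng, dim, alpha=0.1):
    """Draw inverse squared lengthscales from a hierarchical half-Cauchy prior."""
    tau = abs(alpha * rng.standard_cauchy())       # global shrinkage level
    return np.abs(tau * rng.standard_cauchy(dim))  # per-descriptor relevance


def kernel(A, B, rho):
    """ARD kernel: k(a, b) = exp(-0.5 * sum_d rho_d * (a_d - b_d)^2)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.exp(-0.5 * np.einsum("ijd,d->ij", diff ** 2, rho))


def log_marginal_likelihood(X, y, rho, noise=1e-3):
    """GP log evidence for zero-mean, unit-amplitude kernel and fixed noise."""
    K = kernel(X, X, rho) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ a - np.log(np.diag(L)).sum() - 0.5 * len(X) * np.log(2 * pi)


def posterior(X, y, X_cand, rho, noise=1e-3):
    """GP posterior mean and standard deviation at the candidate points."""
    K = kernel(X, X, rho) + noise * np.eye(len(X))
    Ks = kernel(X_cand, X, rho)
    mean = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mean, std, best):
    """Closed-form expected improvement for maximization."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / sqrt(2.0 * pi)
    return (mean - best) * cdf + std * pdf


def optimize(X_pool, evaluate, n_init=5, n_iter=10, n_prior_draws=100, seed=0):
    """Sequentially pick pool indices to test; returns (indices, measured values)."""
    rng = np.random.default_rng(seed)
    tested = [int(i) for i in rng.choice(len(X_pool), n_init, replace=False)]
    values = [evaluate(i) for i in tested]
    for _ in range(n_iter):
        X, y = X_pool[tested], np.array(values)
        y_std = (y - y.mean()) / (y.std() + 1e-9)  # standardize the observations
        # ASSUMPTION: crude stand-in for full Bayesian inference over the
        # relevances; keep the prior draw with the highest marginal likelihood.
        draws = [sample_saas_rho(rng, X_pool.shape[1]) for _ in range(n_prior_draws)]
        rho = max(draws, key=lambda r: log_marginal_likelihood(X, y_std, r))
        mean, std = posterior(X, y_std, X_pool, rho)
        ei = expected_improvement(mean, std, y_std.max())
        ei[tested] = -np.inf                       # never re-test a molecule
        nxt = int(np.argmax(ei))
        tested.append(nxt)
        values.append(evaluate(nxt))
    return tested, values


# Synthetic stand-in for a descriptor table: 300 "molecules", 30 descriptors,
# with a property that depends on only two of them (hypothetical data).
pool_rng = np.random.default_rng(1)
X_pool = pool_rng.uniform(size=(300, 30))
truth = -(X_pool[:, 0] - 0.5) ** 2 - (X_pool[:, 3] - 0.2) ** 2
tested, values = optimize(X_pool, lambda i: truth[i])
print(f"best tested value: {max(values):.4f}; pool optimum: {truth.max():.4f}")
```

Because the relevance vector is re-selected at every iteration, the set of descriptors the model treats as important can change as data accumulate, which is the "actively identified" aspect the abstract describes. A faithful implementation would replace the prior-draw search with proper inference over the SAAS hyperparameters and real molecular descriptors.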
References:
[1] O'Boyle, N. M. (2012). Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI. Journal of Cheminformatics, 4, 1-14.
[2] Krenn, M., Häse, F., Nigam, A., Friederich, P., & Aspuru-Guzik, A. (2020). Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology, 1(4), 045024.
[3] Wieder, O., Kohlbacher, S., Kuenemann, M., Garon, A., Ducrot, P., Seidel, T., & Langer, T. (2020). A compact review of molecular property prediction with graph neural networks. Drug Discovery Today: Technologies, 37, 1-12.
[4] Cereto-Massagué, A., Ojeda, M. J., Valls, C., Mulero, M., Garcia-Vallvé, S., & Pujadas, G. (2015). Molecular fingerprint similarity search in virtual screening. Methods, 71, 58-63.
[5] Moriwaki, H., Tian, Y. S., Kawashita, N., & Takagi, T. (2018). Mordred: a molecular descriptor calculator. Journal of Cheminformatics, 10, 1-14.
[6] Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., & Aspuru-Guzik, A. (2018). Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 4(2), 268-276.
[7] Maus, N., Jones, H., Moore, J., Kusner, M. J., Bradshaw, J., & Gardner, J. (2022). Local latent space Bayesian optimization over structured inputs. Advances in Neural Information Processing Systems, 35, 34505-34518.
[8] Frazier, P. I. (2018). A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811.
[9] Sorourifar, F., Banker, T., & Paulson, J. A. (2024). Accelerating Black-Box Molecular Property Optimization by Adaptively Learning Sparse Subspaces. arXiv preprint arXiv:2401.01398.
[10] Eriksson, D., & Jankowiak, M. (2021, December). High-dimensional Bayesian optimization with sparse axis-aligned subspaces. In Uncertainty in Artificial Intelligence (pp. 493-503). PMLR.
[11] Wang, Y., Wagner, N., & Rondinelli, J. M. (2019). Symbolic regression in materials science. MRS Communications, 9(3), 793-805.
[12] Udrescu, S. M., & Tegmark, M. (2020). AI Feynman: A physics-inspired method for symbolic regression. Science Advances, 6(16), eaay2631.