(416f) Graph Hysteria – Comparing the Generative Performance of Graph and String-Based Translation VAEs for Molecular Design
AIChE Annual Meeting
2022
Topical Conference: Applications of Data Science to Molecules and Materials
Innovations in Methods of Data Science
Tuesday, November 15, 2022 - 4:56pm to 5:12pm
Here we systematically compare the generative and predictive properties of graph- and string-based encoders and decoders by framing the task of a VAE as a set of machine-translation problems: graph-to-graph, string-to-string, graph-to-string and string-to-graph. In doing so, we can isolate the impact of the input representation on the quality of the learned molecular embeddings, as well as the impact of the output representation on the novelty, diversity and validity of machine-generated structures. We find that the choice of encoder has a tangible effect on the model's ability to explore molecular phase space and that the choice of decoder has a significant influence on the practical viability of a model. Finally, we also compare the effect of input representation on property prediction and model interpretability and discuss the scenarios in which each architecture is likely to be optimal.
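As a rough illustration of the comparison described above, the following Python sketch enumerates the four encoder/decoder pairings and computes validity, uniqueness and novelty for a batch of generated strings. It is not the authors' implementation: the function names, the metric definitions (common conventions in the molecular generation literature) and the parenthesis-balancing validity check are all assumptions made for illustration. In practice a cheminformatics toolkit would decide validity, molecules would be canonicalized before comparison, and diversity (usually a fingerprint-similarity measure) is omitted here.

```python
from itertools import product

# The two molecular representations compared in the abstract.
REPRESENTATIONS = ("graph", "string")


def translation_tasks():
    """Enumerate every encoder/decoder pairing as an 'input-to-output' label."""
    return [f"{src}-to-{dst}" for src, dst in product(REPRESENTATIONS, repeat=2)]


def toy_is_valid(smiles):
    """Hypothetical stand-in for a real validity check.

    Only tests that the string is non-empty and its parentheses balance;
    it knows nothing about chemistry.
    """
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0 and len(smiles) > 0


def generation_metrics(generated, training_set, is_valid):
    """Return validity, uniqueness and novelty as fractions in [0, 1].

    Assumed definitions:
      validity   = valid samples / all generated samples
      uniqueness = distinct valid samples / valid samples
      novelty    = distinct valid samples not in the training set
                   / distinct valid samples
    Inputs are assumed to be canonicalized already.
    """
    valid = [m for m in generated if is_valid(m)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    print(translation_tasks())
    samples = ["CCO", "CCO", "C(C", "CCN"]  # made-up example strings
    print(generation_metrics(samples, {"CCO"}, toy_is_valid))
```

Passing the validity check as an argument keeps the metric code independent of the output representation, so the same function could score samples from either a string decoder or a graph decoder once the graphs are serialized.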