(372g) Molecule Selection and Batch Synthesis Planning for Design-Make-Test Cycles | AIChE

In design-make-test loops, chemists iteratively design a set of molecules, synthesize them, and test them. The design of candidate molecules should be guided not only by how likely a molecule is to fulfill the design criteria, but also by synthetic cost and the probability of successful synthesis. Typically, chemists balance these factors during the design phase, intuitively designing molecules that they believe to be synthetically accessible. However, if a computational tool is used to design molecules in addition to those posed by a chemist, some candidates may not be synthesizable. In these cases, the design of candidate molecules and the selection of those to synthesize must become distinct steps: first, a set of candidates is proposed by a chemist and/or computational tool, and then a subset of them is selected for synthesis and testing. As the set of candidate molecules increases in size, chemist-guided selection becomes increasingly challenging because various factors related to synthesizability and predicted molecular properties must be balanced. No systematic approach currently exists to simultaneously select a subset of candidate molecules and corresponding synthetic routes in the context of design-make-test cycles.

We address this challenge by formulating the selection of molecules and their synthetic routes as a constrained integer linear optimization problem. Previous work has explored synthesis planning for both cost [1] and batch efficiency [2], but these algorithms take the set of molecules to be synthesized as a fixed input. Our proposed optimization algorithm extends this work through a scalarized objective function that simultaneously considers the molecular design objective, batch synthesis complexity, and synthetic feasibility. Retrosynthesis trees are first constructed for each molecule, and a set of constraints is defined such that the optimal decision variables correspond to synthetic routes contained in the pre-defined trees. The integer linear optimization problem is solved using the commercial solver Gurobi, and the optimal decision variables are converted to a set of selected molecules and synthetic routes. The proposed workflow is described in greater detail below:
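A minimal sketch of such a formulation follows. The notation is introduced here for illustration and the exact constraint set used in this work is not given, so treat this as one plausible instance. Binary variables $y_m$, $x_r$, and $z_s$ mark a selected candidate $m \in \mathcal{C}$, a used reaction $r \in \mathcal{R}$, and a purchased starting material or reagent $s \in \mathcal{S}$. $U_m$ is a candidate's utility, $p_r$ a reaction penalty, $P(n)$ the set of forest reactions producing molecule $n$, $Q(r)$ the reactants of $r$, and $\lambda_1, \lambda_2, \lambda_3 \ge 0$ hypothetical scalarization weights. The constraints only let a candidate be selected if a chosen reaction makes it, and only let a reaction be chosen if each reactant is either purchased or made by another chosen reaction in the forest.

```latex
\begin{aligned}
\max_{x,\,y,\,z}\quad & \lambda_1 \sum_{m\in\mathcal{C}} U_m\, y_m
  \;-\; \lambda_2 \sum_{s\in\mathcal{S}} z_s
  \;-\; \lambda_3 \sum_{r\in\mathcal{R}} p_r\, x_r \\
\text{s.t.}\quad & y_m \le \sum_{r\in P(m)} x_r
  && \forall\, m\in\mathcal{C} \\
& x_r \le z_s
  && \forall\, r\in\mathcal{R},\; s\in Q(r)\cap\mathcal{S} \\
& x_r \le \sum_{r'\in P(n)} x_{r'}
  && \forall\, r\in\mathcal{R},\; n\in Q(r)\setminus\mathcal{S} \\
& x_r,\; y_m,\; z_s \in \{0,1\}
\end{aligned}
```

Because the penalty and starting-material terms are subtracted, the solver has no incentive to switch on reactions or purchases that no selected candidate needs.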

  1. Each candidate is assigned a “utility,” which captures the perceived information gained from testing that molecule.
  2. Retrosynthesis trees are constructed for each candidate molecule using ASKCOS [3]. The retrosynthesis trees for all candidate molecules are then combined into a reaction “forest”: a graph that connects reaction nodes to molecule nodes (starting materials, intermediates, or candidates).
  3. A constrained integer linear optimization problem is formulated to select a set of molecules and their synthetic routes that optimize a scalarization of multiple objectives. The objectives are (1) to maximize the sum of the utilities of selected molecules, (2) to minimize the total number of starting materials and reagents, and (3) to minimize the sum of reaction penalties, which capture the retrosynthesis model’s level of uncertainty in each reaction.
  4. The optimal decision variables are converted to a set of selected molecules and synthetic routes.
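The workflow steps above can be illustrated on a small, hypothetical example. This sketch hard-codes a toy reaction forest instead of calling ASKCOS, and it replaces Gurobi with exhaustive enumeration over reaction subsets. Enumeration is only tractable for a handful of reactions. The molecule names, utilities, penalties, and the weights `w_util`, `w_sm`, and `w_pen` are all made up. The sketch assumes the forest is acyclic and utilities are nonnegative, so a candidate counts as selected whenever a chosen reaction produces it.

```python
from itertools import combinations

# Hypothetical toy forest (NOT ASKCOS output): reaction -> (reactants, product, penalty).
# The penalty stands in for the retrosynthesis model's uncertainty in that reaction.
REACTIONS = {
    "r1": (("A", "B"), "I1", 0.1),   # shared intermediate I1
    "r2": (("I1", "C"), "T1", 0.2),
    "r3": (("I1", "D"), "T2", 0.3),
    "r4": (("D", "E"), "T2", 0.9),   # alternative, less confident route to T2
    "r5": (("C", "E"), "T3", 2.0),
}
STARTING_MATERIALS = {"A", "B", "C", "D", "E"}
UTILITIES = {"T1": 1.0, "T2": 1.0, "T3": 0.2}  # hypothetical candidate utilities


def evaluate(chosen, w_util, w_sm, w_pen):
    """Score one reaction subset; return None if some intermediate is never made.

    Assumes an acyclic forest, so checking every chosen reaction's reactants
    is enough to guarantee each route bottoms out in starting materials.
    """
    products = {REACTIONS[r][1] for r in chosen}
    purchased = set()
    for r in chosen:
        reactants, _, _ = REACTIONS[r]
        for m in reactants:
            if m in STARTING_MATERIALS:
                purchased.add(m)   # starting material must be bought (z_s = 1)
            elif m not in products:
                return None        # required intermediate not produced: infeasible
    # With nonnegative utilities, selecting every produced candidate is optimal.
    selected = {m for m in products if m in UTILITIES}
    objective = (
        w_util * sum(UTILITIES[m] for m in selected)
        - w_sm * len(purchased)
        - w_pen * sum(REACTIONS[r][2] for r in chosen)
    )
    return objective, selected


def select(w_util=1.0, w_sm=0.2, w_pen=0.5):
    """Exhaustive search over reaction subsets (a stand-in for an ILP solver)."""
    best = (0.0, frozenset(), set())  # the empty batch is always feasible
    names = sorted(REACTIONS)
    for k in range(1, len(names) + 1):
        for chosen in combinations(names, k):
            result = evaluate(chosen, w_util, w_sm, w_pen)
            if result is not None and result[0] > best[0]:
                best = (result[0], frozenset(chosen), result[1])
    return best


if __name__ == "__main__":
    objective, reactions, molecules = select()
    print(round(objective, 3), sorted(reactions), sorted(molecules))
```

With the default weights, the best batch in this toy forest makes T1 and T2 through the shared intermediate I1 (r1, r2, r3) rather than using the standalone route r4. This mirrors the batch-overlap effect the objective rewards. Raising `w_sm` to 0.5 makes every non-empty batch score below zero, so nothing is selected.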

We demonstrate that our algorithm scales to candidate sets containing hundreds of molecules. Moreover, we show how adjusting the scalarization weights affects the number of molecules selected, the number of overlapping reaction steps, and the retrosynthesis model’s confidence in the selected routes. Finally, we propose how the optimization task can be adjusted to account for synthetic cost and the probability of successful synthesis more rigorously.

[1] Badowski, T.; Molga, K.; Grzybowski, B. A. Selection of Cost-Effective yet Chemically Diverse Pathways from the Networks of Computer-Generated Retrosynthetic Plans. Chem. Sci. 2019, 10 (17), 4640–4651.

[2] Molga, K.; Dittwald, P.; Grzybowski, B. A. Computational Design of Syntheses Leading to Compound Libraries or Isotopically Labelled Targets. Chem. Sci. 2019, 10 (40), 9219–9232.

[3] Coley, C. W.; Thomas, D. A.; Lummiss, J. A. M.; Jaworski, J. N.; Breen, C. P.; Schultz, V.; Hart, T.; Fishman, J. S.; Rogers, L.; Gao, H.; Hicklin, R. W.; Plehiers, P. P.; Byington, J.; Piotti, J. S.; Green, W. H.; Hart, A. J.; Jamison, T. F.; Jensen, K. F. A Robotic Platform for Flow Synthesis of Organic Compounds Informed by AI Planning. Science 2019, 365 (6453), eaax1566.