(432h) Demonstration of Closed-Loop Machine-Learning-Guided Experimental Platform for the Discovery of Novel Dye-like Compounds | AIChE

Authors 

Koscher, B., Massachusetts Institute of Technology
Ha, S. K., Massachusetts Institute of Technology
Jensen, K., Massachusetts Institute of Technology
McDonald, M., Georgia Tech
We demonstrate an automated, high-throughput experimental platform that executes successive cycles of model prediction, experimental testing, and model updating without human intervention, using a well-plate batch architecture, to facilitate the molecular discovery of dye-like compounds. Molecular property predictors, in conjunction with procedural algorithms, generate rare and unreported dye scaffolds functionalized either to optimize a set of user-specified property objectives (exploitative operation) or to improve model performance in a user-specified property space (explorative operation). Retrosynthetic routes and reaction conditions are predicted for a library of compounds, from which a subset is selected considering available reagents, costs, diversity of target compounds, property performance, and model uncertainty (favored in explorative operation and discouraged in exploitative operation).
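
The role of model uncertainty in the selection step can be illustrated with a minimal sketch. This is not the platform's selection algorithm: the Candidate fields, the weights w_unc and w_cost, and the linear score are assumptions made for illustration, and reagent availability and diversity are left out for brevity. The only behavior taken from the abstract is that uncertainty is rewarded in explorative operation and penalized in exploitative operation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str           # hypothetical compound identifier
    predicted: float    # predicted property score (higher is better)
    uncertainty: float  # model uncertainty, e.g. an ensemble standard deviation
    cost: float         # estimated reagent cost, arbitrary units


def score(c: Candidate, mode: str, w_unc: float = 1.0, w_cost: float = 0.1) -> float:
    """Assumed linear score: uncertainty is a bonus when exploring, a penalty when exploiting."""
    if mode not in ("explore", "exploit"):
        raise ValueError(f"unknown mode: {mode!r}")
    sign = 1.0 if mode == "explore" else -1.0
    return c.predicted + sign * w_unc * c.uncertainty - w_cost * c.cost


def select(candidates: List[Candidate], mode: str, k: int) -> List[Candidate]:
    """Return the k highest-scoring candidates for the given operating mode."""
    return sorted(candidates, key=lambda c: score(c, mode), reverse=True)[:k]


# Toy library with made-up numbers.
library = [
    Candidate("A", predicted=0.9, uncertainty=0.1, cost=1.0),
    Candidate("B", predicted=0.6, uncertainty=0.8, cost=1.0),
    Candidate("C", predicted=0.7, uncertainty=0.3, cost=2.0),
]
explore_pick = [c.name for c in select(library, "explore", 2)]
exploit_pick = [c.name for c in select(library, "exploit", 2)]
```

With these toy numbers, the uncertain candidate B ranks first in explorative operation and drops out of the top two in exploitative operation.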

A central control network manages the selection of target compounds and coordinates their synthesis, purification, and characterization, with only a single human checkpoint for reagent purchasing. To account for the variety of outcomes in the synthesis of novel compounds, the network allows the instruments connected to it (agents) to add steps that perform recovery and optimization actions on the fly. Additionally, agents use pre-programmed logic to determine how a step is executed (for example, choosing whether to filter or perform liquid-liquid extraction on a reaction mixture). The control network dynamically allocates material and experimental resources between agents in response to changes made by the agents. The flexible operation of the platform allows scripts to be written in a high-level, human-interpretable manner.
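
The idea of agents adding steps while a campaign runs can be sketched as a work queue whose executor may return extra steps to run next. All names here (run_campaign, choose_workup, demo_execute, the has_precipitate flag) are hypothetical and do not describe the platform's actual control software; resource allocation between agents is not modeled.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Dict]  # (step name, parameters); an assumed representation


def choose_workup(mixture: Dict) -> str:
    """Assumed pre-programmed rule: filter if solids are present, otherwise extract."""
    return "filter" if mixture.get("has_precipitate") else "liquid_liquid_extraction"


def run_campaign(steps: List[Step],
                 execute: Callable[[Step], List[Step]],
                 max_steps: int = 100) -> List[Step]:
    """Run steps in order; steps returned by an agent run before the remaining ones."""
    queue = deque(steps)
    log: List[Step] = []
    while queue:
        if len(log) >= max_steps:
            raise RuntimeError("step limit reached; possible recovery loop")
        step = queue.popleft()
        log.append(step)
        extra = execute(step)
        # extendleft reverses its input, so reverse first to keep the agent's order.
        queue.extendleft(reversed(extra))
    return log


def demo_execute(step: Step) -> List[Step]:
    """Hypothetical agent: the first purification 'fails' and requests one retry."""
    name, params = step
    if name == "purify" and not params.get("retried"):
        return [("purify", {"retried": True})]
    return []


plan = [("synthesize", {}), ("purify", {}), ("characterize", {})]
executed = [name for name, _ in run_campaign(plan, demo_execute)]
```

In this toy run the retry is inserted directly after the failed purification, so characterization still happens last. The max_steps guard is a design choice of the sketch to stop an agent that keeps requesting recovery steps.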

Following a campaign on the platform, the synthesis and characterization data are processed autonomously and the results are uploaded to a shared database from which machine learning models are automatically retrained. This union of an automated platform with machine learning guidance allows for both accelerated molecular discovery and the exploration of best practices for aggregating new data and triggering model retraining. The platform has executed multiple cycles of automated molecular generation, synthesis planning and execution, characterization, data processing, and model retraining, demonstrating the viability of closed-loop active learning in the laboratory.
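
One simple policy for triggering model retraining, given here only as an illustration, is to retrain once a minimum number of new records has accumulated in the database. The abstract does not state which policy the platform uses; the RetrainTrigger class and its min_new_records threshold are assumptions.

```python
class RetrainTrigger:
    """Count-based trigger: signal a retrain once enough new records have arrived."""

    def __init__(self, min_new_records: int = 10) -> None:
        if min_new_records < 1:
            raise ValueError("min_new_records must be at least 1")
        self.min_new_records = min_new_records
        self.pending = 0  # records uploaded since the last retrain

    def add(self, n_records: int) -> bool:
        """Register newly uploaded records; return True when a retrain is due."""
        self.pending += n_records
        if self.pending >= self.min_new_records:
            self.pending = 0  # the caller is expected to retrain now
            return True
        return False


trigger = RetrainTrigger(min_new_records=5)
decisions = [trigger.add(3), trigger.add(2), trigger.add(1)]
```

Other policies, such as retraining after every campaign or when prediction error on new data exceeds a threshold, would fit the same interface.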