(6cj) Toward Autonomous Molecular Discovery: Machine Learning and Automation for the Rational Design and Optimization of Novel Compounds | AIChE



Research Interests:

The identification and synthesis of molecules that exhibit desired functions are essential to addressing contemporary problems in science and technology. The typical paradigm of molecular discovery is an iterative cycle of design, synthesis, and testing. The rate at which this cycle yields successful compounds can be limited by bottlenecks at all three stages, and the process as a whole is plagued by inefficiencies in resource allocation, including the need for frequent manual intervention.

My future research will address the unification of chemical knowledge and empirical data to support autonomous molecular discovery.

The overall goal of my work will be to develop novel modeling strategies and computational approaches that, in combination with more traditional automation techniques, will improve the efficiency of small molecule discovery. Molecular discovery is a problem of inference from incomplete and imperfect information, for which techniques in machine learning and artificial intelligence are well suited. My research will build upon methodologies in machine learning, laboratory automation, and synthetic chemistry to design, implement, and apply AI-driven experimental platforms for the rational design and optimization of novel molecular agents, e.g., chemical sensors, antibiotics, high-performance polymer precursors, organic semiconductors, organocatalysts, and protein binders.

My graduate research with Klavs F. Jensen and William H. Green has focused on addressing one aspect of this broad challenge—streamlining the synthesis of small molecules in the context of pharmaceutical discovery—from two perspectives: one experimental, the other using techniques in data science and machine learning.

My experimental research interests are driven by the goal of increasing the time, material, and experimental efficiency of data collection. To this end, I have developed several automated microfluidic reactor platforms for studying physical and chemical processes at the micromole scale. This has included the design and construction of an oscillatory droplet platform for the on-demand synthesis and purification of compound libraries, including those requiring multi-phase, multi-step, or photochemical transformations [1, 2]. The platform is capable of efficient closed-loop self-optimization using an optimal design of experiments to bridge the gap between research-scale and production-scale synthesis [3]; my industrial collaborators have demonstrated that conditions optimized at the microliter scale are suitable for scaled-up continuous flow synthesis [4].

As a complement to identifying new opportunities for and applications of automated experimentation, my computational research interests have been driven by the goal of meaningfully generalizing existing data to new problems in synthesis and synthesis design. Specifically, in addition to contributing to the quintessential cheminformatics problem of developing structure-activity relationships [5], I have developed new data-driven methodologies for the design and validation of small molecule synthetic routes [6]. Among those methodologies is an automated workflow to propose retrosynthetic disconnections for novel molecules based on analogy to precedents in reaction corpora, intended to mimic manual approaches to the same task [7]. Published reactions also contain an implicit definition of synthetic complexity that a neural model can be trained to quantify for the evaluation of potential synthetic targets [8]. As a more significant undertaking, I have developed novel strategies to predict reaction outcomes in silico (for virtual screening, impurity prediction, or structural elucidation), leveraging the flexibility in pattern recognition afforded by neural networks [9, 10]. My ongoing work in this area focuses on extracting new insights into chemical reactivity that support and supplement expert chemist intuition.

Teaching Interests:

My primary aims in teaching are to instill in students an inquisitiveness about why physical phenomena behave the way they do and to equip them with the ability to articulate and answer their questions. As an undergraduate at Caltech, I TAed a broad range of core chemical engineering subjects including separation process principles, thermodynamics I and II, a synthetic chemistry laboratory, and heat transport. I strongly support the cross-disciplinary use of programming and computational techniques and, after twice teaching the introductory computer science course at Caltech, TAed the first-year graduate course at MIT on numerical methods as applied to chemical engineering. Most recently, I completed the Kaufman Teaching Certificate Program to help formalize the best practices I have observed in the classroom both as a student and as a teacher.

From my formal education and past teaching experience, I am confident in my ability to teach any core chemical engineering course at the undergraduate or graduate level. I am particularly interested in teaching courses that combine chemical engineering with numerical methods, data science, or machine learning, whether as an expansion of an existing course or as a new course, “Introduction to Statistical Inference for the Chemical Sciences.”

Selected Awards:

  • DARPA Riser, 2018
  • William C. Rousseau Award in Leadership and Ethics in Chemical Engineering Practice, 2016
  • Robert T. Haslam Presidential Graduate Fellowship, MIT, 2014
  • NSF Graduate Research Fellowship, 2014
  • Frederic W. Hinrichs, Jr. Memorial Award for Student Leadership, Caltech, 2014
  • HHMI Undergraduate Teaching Assistant Fellowship, 2013
  • Edward Richter Memorial Undergraduate Research Fellowship, 2011

Selected Publications:

[1] Hwang, Y.-J.*, Coley, C. W.*, Abolhasani, M., Marzinzik, A. L., Koch, G., Spanka, C., Lehmann, H., Jensen, K. F. “A segmented flow platform for on-demand medicinal chemistry and compound synthesis in oscillating droplets.” Chem. Commun. 53(49), 6649–6652 (2017).

[2] Coley, C. W., Abolhasani, M., Lin, H., Jensen, K. F. “Material-efficient microfluidic platform for exploratory studies of visible-light photoredox catalysis.” Angew. Chem. Int. Ed. 129(33), 9979–9982 (2017).

[3] Baumgartner, L.*, Coley, C. W.*, Reizman, B., Gao, K., Jensen, K. F. “Optimum catalyst selection over continuous and discrete process variables with a single droplet microfluidic reaction platform.” React. Chem. Eng. 3(3), 301–311 (2018).

[4] Hsieh, H.-W., Coley, C. W., Baumgartner, L., Jensen, K. F., Robison, R. “Photoredox iridium-nickel dual catalyzed decarboxylative arylation cross-coupling: from batch to continuous flow via self-optimizing segmented flow reactor.” Org. Process Res. Dev. 22(4), 542–550 (2018).

[5] Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S., Jensen, K. F. “Convolutional embedding of attributed molecular graphs for physical property prediction.” J. Chem. Inf. Model. 57(8), 1757–1772 (2017).

[6] Coley, C. W., Green, W. H., Jensen, K. F. “Machine learning in computer-aided organic synthesis.” Acc. Chem. Res. 51(5), 1281–1289 (2018).

[7] Coley, C. W., Rogers, L., Green, W. H., Jensen, K. F. “Computer-assisted retrosynthesis based on molecular similarity.” ACS Cent. Sci. 3(12), 1237–1245 (2017).

[8] Coley, C. W., Rogers, L., Green, W. H., Jensen, K. F. “SCScore: Synthetic complexity learned from a reaction corpus.” J. Chem. Inf. Model. 58(2), 252–261 (2018).

[9] Coley, C. W., Barzilay, R., Jaakkola, T. S., Green, W. H., Jensen, K. F. “Prediction of organic reaction outcomes using machine learning.” ACS Cent. Sci. 3(5), 434–443 (2017).

[10] Jin, W., Coley, C. W., Barzilay, R., Jaakkola, T. “Predicting organic reaction outcomes with Weisfeiler-Lehman network.” NIPS (2017).
