(346bf) Combining Strategic Training Data Selection and Feature Engineering to Reach Accurate and Efficient Molecular Property Prediction

Conference

AIChE Annual Meeting

Year

2020

Proceeding

2020 Virtual AIChE Annual Meeting

Group

Computational Molecular Science and Engineering Forum

Session

Poster Session: Computational Molecular Science and Engineering Forum (CoMSEF)

Time

Wednesday, November 18, 2020 - 8:00am to 9:00am

Authors

Li, B. - Presenter, Lehigh University

Rangarajan, S., Lehigh University - Dept of Chem & Biomolecular

Organic molecular design problems, such as drug discovery or materials design, aim to identify molecules with desired properties within chemical space, where the number of potential compounds is estimated to reach 10⁶⁰. The size of chemical space makes it infeasible to evaluate every molecule by experiment or high-level quantum chemistry. In recent decades, integrating machine learning with virtual screening has made the exploration of chemical space practical, owing to its high efficiency and low cost. While many machine learning models reach high accuracy with hundreds of thousands of training molecules, only a handful of studies have focused on optimizing model performance under a tight computational budget. In this work, we propose a strategy for obtaining accurate machine learning predictions with a minimal number of training data points. Specifically, we address the problem in three parts. First, we demonstrate the efficacy of a method that adaptively builds a compact training set by systematically balancing exploitation, via experimental design, against exploration of the space, via cheminformatics-based diversity maximization procedures. Second, we extend this procedure with nonlinear and locally linear dimensionality reduction methods to leverage data embeddings. Third, we focus on improving model accuracy under the constraint of a small training set, which we achieve by progressively incorporating nonlinearity into our modified group additivity approach.
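To make the exploration step concrete, the following is a minimal sketch of one common diversity maximization heuristic, greedy max-min selection under Tanimoto distance. It is an illustration only and not necessarily the procedure used in this work: the abstract does not specify the algorithm, and the exploitation step (experimental design) is not shown. The function names `tanimoto_distance` and `maxmin_select`, and the toy fingerprints expressed as sets of functional-group labels, are hypothetical.

```python
def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of feature labels."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def maxmin_select(fingerprints, n_select, seed_index=0):
    """Greedily pick molecules that are far from everything picked so far.

    At each step, the candidate whose distance to its nearest selected
    molecule is largest gets added (the max-min rule).
    """
    selected = [seed_index]
    # Distance from each molecule to its nearest selected molecule.
    min_dist = [tanimoto_distance(fp, fingerprints[seed_index])
                for fp in fingerprints]
    while len(selected) < min(n_select, len(fingerprints)):
        next_index = max(range(len(fingerprints)), key=lambda i: min_dist[i])
        if min_dist[next_index] == 0.0:
            break  # only duplicates of already selected molecules remain
        selected.append(next_index)
        for i, fp in enumerate(fingerprints):
            d = tanimoto_distance(fp, fingerprints[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Hypothetical toy data: each molecule is a set of functional-group labels.
toy_fingerprints = [
    {"CH3", "CH2"},        # 0: alkane-like
    {"CH3", "CH2", "OH"},  # 1: alcohol-like
    {"CH3", "CH2", "CH"},  # 2: branched alkane-like
    {"C6H5", "OH"},        # 3: phenol-like
    {"C6H5", "NH2"},       # 4: aniline-like
    {"COOH", "CH3"},       # 5: acid-like
]

picked = maxmin_select(toy_fingerprints, n_select=3)
print(picked)
```

In an adaptive scheme of the kind described above, a selection like this would presumably alternate with a model-driven criterion, so that new training molecules are chosen both for coverage of chemical space and for their expected benefit to the model.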

Topics

Computational Molecular Engineering

