PAL 2.0: A Physics-Driven Machine Learning Algorithm for Material Discovery | AIChE

The lack of efficient discovery tools for advanced functional materials remains a major bottleneck to advances in the next generation of energy, health and sustainability technologies. A major source of difficulty is the large combinatorial space of material compositions and processing conditions that is typical of such materials-centric applications. Searches of this space are often guided by expert knowledge and clustered close to material configurations that are known to perform well, leaving potentially high-performing candidates undiscovered in unanticipated regions of the composition space or processing protocol. Experimental characterization or first-principles quantum mechanical calculation of every possible candidate can be prohibitively expensive, which makes an exhaustive search for the best candidates infeasible. As a result, there remains a need for computational algorithms that can efficiently search a large parameter space for a given application.

This poster introduces PAL 2.0, a machine learning method that combines a physics-based surrogate model with Bayesian optimization. The key contribution of our framework is the ability to build a physics-based hypothesis using XGBoost and neural networks. This hypothesis supplies a physics-based ‘prior’ (initial beliefs) to a Gaussian process model, which is then used to search the material design space. We demonstrate the approach on three test cases: (1) discovery of metal halide perovskites with desired photovoltaic properties, (2) design of metal halide perovskite-solvent pairs that produce the best solution-processed films and (3) design of organic thermoelectric semiconductors.
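
The general idea of a surrogate model acting as the prior mean of a Gaussian process inside a Bayesian optimization loop can be illustrated with a minimal sketch. This is not the PAL 2.0 implementation: the least-squares linear surrogate (standing in for XGBoost and neural networks), the RBF kernel, the expected-improvement acquisition function, and all function and parameter names are assumptions chosen to keep the example short and dependent only on NumPy.

```python
import math
import numpy as np

def rbf_kernel(A, B, length_scale=0.3):
    # Squared-exponential kernel between the rows of A and B (assumed kernel).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def fit_prior_mean(X, y):
    # Hypothetical stand-in surrogate: linear least squares with a bias term.
    # The abstract uses XGBoost and neural networks for this step.
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ w

def gp_posterior(X, y, Z, mean_fn, noise=1e-4):
    # GP regression on the residuals y - m(X); the prior mean m is added back at Z.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Z, X)
    mu = mean_fn(Z) + Ks @ np.linalg.solve(K, y - mean_fn(X))
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    # Expected improvement for a maximization problem (assumed acquisition).
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def bayes_opt(candidates, objective, n_init=5, n_iter=15, seed=0):
    # Search a finite candidate set; each objective call mimics one experiment.
    rng = np.random.default_rng(seed)
    tried = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    y = [objective(candidates[i]) for i in tried]
    for _ in range(n_iter):
        X, yv = candidates[tried], np.array(y)
        mean_fn = fit_prior_mean(X, yv)          # refit the surrogate prior
        mu, sigma = gp_posterior(X, yv, candidates, mean_fn)
        ei = expected_improvement(mu, sigma, yv.max())
        ei[tried] = -np.inf                      # never re-evaluate a candidate
        nxt = int(np.argmax(ei))
        tried.append(nxt)
        y.append(objective(candidates[nxt]))
    best = int(np.argmax(y))
    return tried[best], y[best], tried

# Toy usage: 100 hypothetical candidates described by two made-up features.
grid = np.linspace(0.0, 1.0, 10)
candidates = np.array([[a, b] for a in grid for b in grid])
toy_objective = lambda x: -((x[0] - 0.7) ** 2 + (x[1] - 0.3) ** 2)
best_idx, best_val, tried = bayes_opt(candidates, toy_objective)
```

In this sketch the surrogate is refit on all evaluated candidates at every iteration, so the Gaussian process only has to model what the surrogate fails to explain. Whether PAL 2.0 refits its surrogate in the same way is not stated in the abstract.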

The two most compelling results of PAL 2.0 are that we:

1. Demonstrate superior optimization performance, finding the optimal target in fewer iterations than state-of-the-art approaches to material discovery such as genetic algorithms, the off-the-shelf Bayesian optimization package SMAC, and one-hot-encoded Gaussian process models; and

2. Provide a predictive physics-informed model for the material space, offering valuable chemical insight.

To the best of our knowledge, PAL 2.0 is the first computational materials discovery framework to use predictive, physics-based surrogate models within Bayesian optimization by combining feature engineering and material space modeling.