(290g) Inverse Design of Nanoparticles Using Charge Transfer Properties and Multi-Target Machine Learning | AIChE

Authors 

Barnard, A. - Presenter, Australian National University
Li, S., Australian National University

Objectives

Inverse design [1, 2], which prescribes a target structure, is a primary objective in materials informatics and the ultimate goal of much academic and industrial research. However, most materials informatics uses machine learning (ML) to make forward predictions of a property of a material (the target label) from its structural characteristics (the features): so-called structure/property relationships. Inverse design instead involves property/structure relationships, which are highly desirable because a researcher usually knows what properties they need for a particular application and wants a “recipe” for what to make in the lab. This is challenging, and even more so in nanomaterials design, where the finite sizes and multitude of shapes enlarge the design space [3]. The goal has previously been approached by generating conventional structure/property relationships with ML, screening hypothetical candidate materials, and then optimizing and ranking the outcomes in an additional step. Unfortunately, this approach suffers from too much specificity, since structure/property relationships typically involve many structural features but only one target property label. Inverting the problem is then difficult: there is only one known variable (the property) and numerous unknown variables (the structural features), which makes the solution intractable.

An alternative approach to inverse design for multi-functional materials is possible in cases where more than one property label is known. By using multiple properties as inputs to predict a set of target structural outputs, we can overcome these technical problems and obtain a more holistic structural profile, which avoids the issue of a single structural characteristic being insufficient to guide experiments. In this paper we demonstrate this inverse design workflow using multi-target ML on a computational data set describing the charge transfer properties of silver nanoparticles. As we will show, by informing the inverse models with the structural characteristics that are important in a conventional (forward) structure/property relationship, new property/structure relationships can be predicted with comparable accuracy and generalizability.

Case Study and Discussion

This study employs a publicly available ensemble data set of 425 silver nanoparticles [4], originally simulated using electronic structure methods and characterized numerically using statistical analyses to define structural features for a set of attributes. Information on how the features were generated can be obtained from the references listed on the data repository [5]. Feature selection and engineering play an important role in the application of machine learning in materials science. In our study, linearly correlated attributes are identified using a correlation matrix, and attributes with over 95% correlation are removed, along with attributes with a standard deviation of less than 0.1%, to reduce noise. The labels in the data set include the formation energy (Formation_E) and the electron charge transfer properties: the ionization potential (IP), the electron affinity (EA), the electronic band gap (EG), and the electronegativity (EN) or the energy of the Fermi level (EF), all measured in eV.
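The two cleaning filters described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 95% threshold applies to the absolute Pearson correlation and that "0.1%" means an absolute standard deviation of 0.001, and the feature names `f0` to `f3` and the demo data are hypothetical.

```python
import numpy as np

def drop_uninformative_features(X, names, corr_threshold=0.95, std_threshold=1e-3):
    """Return (X_kept, names_kept) after two filters.

    1. Drop near-constant columns (standard deviation below std_threshold).
    2. For each pair with |Pearson r| > corr_threshold, drop the later column.
    Both thresholds are assumptions about how the 95% and 0.1% cut-offs apply.
    """
    X = np.asarray(X, dtype=float)
    keep = [j for j in range(X.shape[1]) if X[:, j].std() >= std_threshold]
    corr = np.abs(np.atleast_2d(np.corrcoef(X[:, keep], rowvar=False)))
    selected = []  # positions within `keep` that survive the correlation filter
    for j in range(len(keep)):
        if all(corr[j, i] <= corr_threshold for i in selected):
            selected.append(j)
    cols = [keep[j] for j in selected]
    return X[:, cols], [names[c] for c in cols]

# Hypothetical demo: f1 is almost a linear copy of f0, and f3 is constant.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X_demo = np.column_stack([a, 2.0 * a + 1e-6 * b, b, np.full(200, 3.0)])
X_clean, kept = drop_uninformative_features(X_demo, ["f0", "f1", "f2", "f3"])
print(kept)
```

Keeping the first member of each correlated pair is one arbitrary but reproducible choice; which member is dropped does not matter for a linear redundancy, but it does affect which attribute names appear in later rankings.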

Random Forest (RF) [6] is an ensemble technique that constructs a large number of decision trees and combines bootstrap aggregation, also known as bagging [7], with random feature selection. Each decision tree in the forest is trained on a different bootstrap sample of the data using random subsets of the features, so each tree sees only a limited view of the problem. The forest then aggregates the individual trees: in classification it returns the output with the most “votes”, and in regression it averages the tree outputs to form the final ensemble prediction. The RF ensemble predictor can be extended to tackle multi-target learning problems by replacing the typical univariate trees in the RF with multivariate trees [8].
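As a small illustration of the ensemble step in a multi-target regression setting, the sketch below uses scikit-learn's `RandomForestRegressor`, which accepts a two-dimensional target array. It is an assumption that this implementation is a reasonable stand-in for the multivariate forests of [8]; the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 4))                                 # synthetic features
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] * X[:, 3]])    # two synthetic targets

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Y)

# Each tree predicts both targets; the forest output is the mean over trees.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
ensemble = forest.predict(X[:5])
print(per_tree.shape, ensemble.shape)
print(np.allclose(per_tree.mean(axis=0), ensemble))
```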

To begin, the forward RF model is optimized and trained using k-fold cross validation, which enables the regressor to rank the feature attributes by Gini importance. Recursive feature elimination is then used to select the most important attributes that are sufficient for the model to predict all the property labels simultaneously without loss of accuracy or generalizability. It is important to reduce the number of structural characteristics to be predicted so that it is commensurate with (or similar to) the number of available properties. While not essential, it is good practice to repeat the forward model optimization and training using only the final subset of features to quantify any loss.
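The forward step can be sketched as a hand-written elimination loop: score the model with k-fold cross validation, refit, drop the attribute with the lowest impurity-based importance, and repeat. The function name `recursive_elimination`, the hyperparameters and the synthetic data are all hypothetical, and no hyperparameter optimization is shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

def recursive_elimination(X, Y, n_keep, n_splits=5, seed=0):
    """Drop the least important feature until n_keep remain.

    Returns the surviving column indices and a list of
    (number of features, cross-validated MAE) recorded at each step.
    """
    active = list(range(X.shape[1]))
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    history = []
    while True:
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        scores = cross_val_score(model, X[:, active], Y, cv=cv,
                                 scoring="neg_mean_absolute_error")
        history.append((len(active), -scores.mean()))
        if len(active) <= n_keep:
            return active, history
        model.fit(X[:, active], Y)
        weakest = int(np.argmin(model.feature_importances_))
        del active[weakest]

# Synthetic demo: two targets that depend only on columns 0 and 3.
rng = np.random.default_rng(0)
X_demo = rng.uniform(size=(200, 8))
Y_demo = np.column_stack([3.0 * X_demo[:, 0] + X_demo[:, 3],
                          X_demo[:, 0] - 2.0 * X_demo[:, 3]])
kept, history = recursive_elimination(X_demo, Y_demo, n_keep=2)
print(kept)
for n_features, mae in history:
    print(n_features, round(mae, 4))
```

Recording the cross-validated error at every step makes it possible to quantify the loss from using the reduced feature set, as recommended above.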

The forward multi-target model was initially trained with five property labels and the 48 features retained after cleaning. During data cleaning we removed outliers, retaining instances where IP<5 eV, EA>2.5 eV, EG<2.5 eV, -4.05<EF<-3.5 eV and Formation_E<0.8 eV. This resulted in a final set of 414 silver nanoparticles. Using all 48 structural features after cleaning, the model achieved MAE=0.0614 and MSE=0.0090. These 48 features were ranked, and recursive feature elimination determined that the optimal model can be achieved using only 4 structural attributes: the ratio of the fraction of surface atoms to bulk atoms (Ag_Ratio), the average nanoparticle radius (R_avg), the deviation of the nanoparticle from an ideal anisotropic sphere (Anisotropy) and the number of surface facets, which is a function of the polyhedral shape (Facets). The reduced feature set gives MAE=0.0710 and MSE=0.0120, a minimal loss of performance.
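The outlier thresholds quoted above translate directly into a boolean mask. In this sketch the dictionary-of-arrays layout and the three example particles are hypothetical; only the thresholds come from the text.

```python
import numpy as np

def outlier_mask(props):
    """Boolean mask of rows retained by the cleaning thresholds in the text (eV)."""
    return (
        (props["IP"] < 5.0)
        & (props["EA"] > 2.5)
        & (props["EG"] < 2.5)
        & (props["EF"] > -4.05) & (props["EF"] < -3.5)
        & (props["Formation_E"] < 0.8)
    )

# Three hypothetical particles: the second fails on IP, the third on EA.
props = {
    "IP": np.array([4.6, 5.2, 4.7]),
    "EA": np.array([3.0, 3.1, 2.4]),
    "EG": np.array([1.6, 1.5, 2.3]),
    "EF": np.array([-3.8, -3.9, -3.6]),
    "Formation_E": np.array([0.4, 0.5, 0.3]),
}
mask = outlier_mask(props)
print(mask, int(mask.sum()))
```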

The data set was then inverted: the reduced set of important attributes became the structural “meta-labels”, and the multiple property labels became the new “meta-features”. The inverted set was then split into training, testing and validation sets, and the optimization, training and validation of the multi-target RF model were repeated. The results show excellent performance, with MAE=0.0732 and MSE=0.0211, which are similar to the forward model and indicate that the forward model is also useful for anticipating the accuracy of the inverse model. To determine how sensitive the model is to the number of nanoparticle properties available, we also re-optimized and re-trained the inverse model using only the electron charge transfer properties. The impact is small, with MAE=0.0788 and MSE=0.0226 when predicting the important nanoparticle attributes from IP, EA, EG and EF.
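Inverting the data set amounts to swapping the roles of the two arrays before fitting. The sketch below assumes synthetic stand-ins for the four structural attributes and five property labels, and uses a single train/test split rather than the three-way split and optimization described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
n = 400
structure = rng.uniform(size=(n, 4))       # stand-ins for Ag_Ratio, R_avg, Anisotropy, Facets
mixing = rng.normal(size=(4, 5))           # arbitrary synthetic structure-to-property map
properties = structure @ mixing + 0.01 * rng.normal(size=(n, 5))  # stand-ins for 5 labels

# Forward model: structure -> properties. Inverse model: properties -> structure,
# so the properties become the "meta-features" and the structure the "meta-labels".
P_train, P_test, S_train, S_test = train_test_split(
    properties, structure, test_size=0.25, random_state=0)
inverse = RandomForestRegressor(n_estimators=100, random_state=0).fit(P_train, S_train)

S_pred = inverse.predict(P_test)
mae = mean_absolute_error(S_test, S_pred)   # averaged uniformly over the 4 targets
mse = mean_squared_error(S_test, S_pred)
print(round(mae, 4), round(mse, 4))
```

The sensitivity check in the text corresponds to refitting with one column of `properties` removed (the formation energy) and comparing the two pairs of error metrics.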

To show how a trained inverse model can be used in practice, we consider a hypothetical instance with reasonable properties and predict the nanoparticle that would exhibit them. For a silver nanoparticle with [Formation_E = 0.4 eV, IP = 4.0 eV, EA = 3.0 eV, EG = 2.0 eV, EF = 2.5 eV], the inverse model prescribes [Ag_Ratio = 0.56±0.0443, R_avg = 22±1.3084 Å, Anisotropy = 1.42±0.0848, Facets = 18±2]. The uncertainties were calculated using the MSE and are useful because they provide researchers with a fault tolerance.
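A query of this kind might look like the sketch below. The text says only that the uncertainties were calculated using the MSE; taking the held-out root-mean-square error of each structural target as its ± value is an assumption made here for illustration, as are the function name `prescribe` and the synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def prescribe(model, P_holdout, S_holdout, query):
    """Predict structure for one property vector, with per-target RMSE as +/-.

    Using the held-out RMSE of each structural target as its uncertainty is
    an assumption; the text only says the MSE was used.
    """
    residual = model.predict(P_holdout) - S_holdout
    rmse = np.sqrt((residual ** 2).mean(axis=0))
    estimate = model.predict(np.atleast_2d(query))[0]
    return estimate, rmse

rng = np.random.default_rng(3)
S = rng.uniform(size=(300, 4))          # synthetic structural targets
P = S @ rng.normal(size=(4, 5))         # synthetic property inputs
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(P[:240], S[:240])

estimate, rmse = prescribe(model, P[240:], S[240:], P[0])
for name, value, err in zip(["Ag_Ratio", "R_avg", "Anisotropy", "Facets"], estimate, rmse):
    print(f"{name} = {value:.3f} ± {err:.3f}")
```

Because a random forest predicts by averaging training targets, it cannot extrapolate; a query is only meaningful when the requested properties lie within the range covered by the training data.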

Conclusion

In this study we have demonstrated a new approach to inverse design using multi-target regression based on random forests. Using a publicly available data set of silver nanoparticles, characterized by a set of structural features and accompanied by multiple charge transfer property labels, we have shown that an inverse design model that predicts a set of structural characteristics from a set of properties can perform similarly to a traditional forward model (predicting properties from the structural features). Undertaking a precursory forward model is essential for focusing the inverse model on the features that are simultaneously influential to all properties, and it also provides a useful indication of the approximate performance that can be expected of the inverse model. Overall, the inverse design workflow used in the study is general and flexible enough to be applied to other nanoparticles and to identify other types of property/structure relationships. This approach differs from alternative inverse design strategies based on optimization and does not require the generation of an exhaustive synthetic data set a priori. The model can identify materials that were not in the original data set, so researchers can transform their materials design approach and potentially create nanomaterials suitable for specific chemical reactions.

References

[1] A Zunger (2018) Inverse design in search of materials with target functionalities. Nature Reviews Chemistry, 2(4):1-16.

[2] B Sanchez-Lengeling, A Aspuru-Guzik (2018) Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400):360-365.

[3] A S Barnard, B Motevalli, A J Parker, J M Fischer, C A Feigl, G Opletal (2019) Nanoinformatics, and the big challenges for the science of small things. Nanoscale, 11(41):19190-19201.

[4] A S Barnard, B Sun (2017) Silver nanoparticle data set, v3. CSIRO Data Collection.

[5] B Sun, M Fernandez, A S Barnard (2017) Machine learning for silver nanoparticle electron transfer property prediction. J. Chem. Inf. Model. 57(10):2413-2423.

[6] L Breiman (2001) Random forests. Mach. Learn. 45(1):5-32.

[7] L Breiman (1996) Bagging predictors. Mach. Learn. 24(2):123-140.

[8] M Segal, Y Xiao (2011) Multivariate random forests. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(1):80-87.
