(104c) Systematic Selection of Surrogate Modeling Techniques for Surrogate-Based Optimization Using Presto (Predictive REcommendations of Surrogate models To Optimize) | AIChE

Authors 

Williams, B. - Presenter, Auburn University
Cremaschi, S., Auburn University
Optimization is required for several chemical engineering applications, including process design and synthesis, operations, and supply chain management. These applications usually involve complex, high-fidelity simulations and/or physical experiments, for which collecting data can be costly, time-consuming, and computationally expensive. Traditional gradient-based optimization is impractical for these applications because gradient information is not readily available, and approximating gradients is infeasible because of the expense of the many simulation evaluations or experiments required. To overcome this challenge, cheaper surrogates that mimic the simulations' overall behavior can be constructed and used in their place for optimization.

Surrogate models, also known as response surfaces, black-box models, metamodels, or emulators, are simplified approximations of more complex, higher-order models. These models are used to map input data to output data when the actual relationship between the two is unknown or computationally expensive to evaluate (Han & Zhang, 2012). Given the many surrogate modeling techniques currently available, a systematic procedure is needed for selecting the appropriate technique for a given application. Current common practice for selecting a model form relies on process-specific expertise. Recent work has generalized the selection of a surrogate model for approximating a design space by using meta-learning approaches to build selection frameworks (Cui et al., 2016; Garud et al., 2018), avoiding expensive trial-and-error methods. However, the selection of surrogate models for surrogate-based optimization remains an open challenge.

This study aims to comprehensively investigate and compare the performance of several surrogate modeling techniques for surrogate-based optimization and to link that performance to generalized characteristics of the data involved in the application. Previous work on this topic has shown that performance for surrogate-based optimization depends on data characteristics such as the input dimension and the underlying function shape (Williams & Cremaschi, 2021). The surrogate modeling techniques considered include Artificial Neural Networks, Automated Learning of Algebraic Models using Optimization (ALAMO), Radial Basis Networks, Extreme Learning Machines, Gaussian Process Regression, Random Forests, Support Vector Regression, and Multivariate Adaptive Regression Splines (MARS). These techniques are used to construct surrogate models for data generated using optimization test functions from the Virtual Library of Simulation Experiments (Surjanovic & Bingham, 2013). A deterministic optimization problem is formulated and solved using each trained surrogate model as the objective function. The surrogate models' performance for surrogate-based optimization is evaluated with two measures: the distance between the extreme point(s) estimated by the model and the actual function extrema, and the difference between the actual minimum value of the test function and the minimum estimated by the surrogate model.
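The evaluation procedure described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the study: the two-dimensional sphere function stands in for a test function, an inverse-distance-weighted interpolant stands in for the surrogate techniques listed above, a grid search stands in for the deterministic optimizer, and the box-diagonal normalization is only one possible way to normalize the distance.

```python
import math
import random

def sphere(x):
    """Sphere test function; global minimum f = 0 at the origin."""
    return sum(xi * xi for xi in x)

def idw_surrogate(X, y, power=2.0):
    """Build an inverse-distance-weighted interpolant of the samples (X, y).

    This is a stand-in surrogate chosen for brevity, not one of the
    techniques compared in the study.
    """
    def predict(x):
        num = den = 0.0
        for xi, yi in zip(X, y):
            d = math.dist(x, xi)
            if d == 0.0:
                return yi  # query coincides with a training point
            w = d ** -power
            num += w * yi
            den += w
        return num / den
    return predict

def grid_minimize(f, lower, upper, n=41):
    """Deterministic grid search for the minimizer of f on a 2-D box."""
    step = (upper - lower) / (n - 1)
    best_x, best_f = None, math.inf
    for i in range(n):
        for j in range(n):
            x = (lower + i * step, lower + j * step)
            fx = f(x)
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

# Sample the "expensive" function (hypothetical sampling plan: 200 random points).
rng = random.Random(0)
lower, upper = -5.0, 5.0
X = [(rng.uniform(lower, upper), rng.uniform(lower, upper)) for _ in range(200)]
y = [sphere(x) for x in X]

# Optimize the surrogate in place of the true function.
surrogate = idw_surrogate(X, y)
x_hat, f_hat = grid_minimize(surrogate, lower, upper)

# Compare the surrogate's optimum with the known optimum of the test function.
x_true, f_true = (0.0, 0.0), 0.0
location_error = math.dist(x_hat, x_true)   # distance between extreme points
value_error = abs(f_hat - f_true)           # difference in minimum values
# Assumed normalization: divide by the diagonal of the sampled box.
normalized_location_error = location_error / math.dist((lower, lower), (upper, upper))

print(f"estimated optimum at {x_hat}, surrogate value {f_hat:.4f}")
print(f"location error {location_error:.4f}, value error {value_error:.4f}")
```

Repeating this loop over several surrogate techniques and test functions would give, for each pair, the two error measures that the comparison relies on.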

Using information extracted from the surrogate modeling comparison experiments and building upon previous meta-learning approaches (Cui et al., 2016; Garud et al., 2018), we have developed Predictive REcommendations of Surrogate models To Optimize (PRESTO). PRESTO is a random-forest-based tool that recommends appropriate surrogate modeling techniques for a dataset based only on the characteristics of the data being modeled. Characteristics, i.e., attributes, were calculated for each dataset with the goal of representing its overall behavior, using only the input and output values in the dataset. The attributes with the strongest relationships to the performance metrics were determined using feature reduction methods, including principal component analysis (Hotelling, 1933) and the built-in feature selection methods of the machine learning techniques. These attributes were used as inputs, with designated performance metrics as outputs, to train models for predicting the performance of the surrogate modeling techniques. The performance metric used as the output for training PRESTO is the normalized distance between the extreme point(s) estimated by the models and the actual extrema of the true model. Given a set of input/output data, PRESTO determines which surrogate modeling techniques are recommended for surrogate-based optimization. PRESTO correctly identified which surrogates should be recommended for a dataset with an accuracy of 82% and a precision of 89%, where precision is the probability that a surrogate modeling technique predicted to be recommended should actually be recommended.
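As a brief illustration of how the two reported scores are defined, the sketch below computes accuracy and precision from binary recommendation labels. The labels are hypothetical and are not the study's data, and the function name is chosen here for illustration only.

```python
def accuracy_and_precision(should_recommend, predicted_recommend):
    """Compute accuracy and precision for binary recommendation labels."""
    pairs = list(zip(should_recommend, predicted_recommend))
    tp = sum(1 for t, p in pairs if t and p)          # recommended, correctly
    fp = sum(1 for t, p in pairs if not t and p)      # recommended, wrongly
    tn = sum(1 for t, p in pairs if not t and not p)  # withheld, correctly
    accuracy = (tp + tn) / len(pairs)
    # Precision: of the techniques predicted as recommended, the fraction
    # that should actually be recommended.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return accuracy, precision

# Hypothetical labels for eight (dataset, technique) pairs; not the study's data.
should_recommend    = [1, 1, 0, 1, 0, 0, 1, 1]
predicted_recommend = [1, 0, 0, 1, 1, 0, 1, 1]

accuracy, precision = accuracy_and_precision(should_recommend, predicted_recommend)
print(f"accuracy = {accuracy:.2f}, precision = {precision:.2f}")
# prints "accuracy = 0.75, precision = 0.80"
```

A high precision means that a technique flagged as recommended is rarely a poor choice, which matters more to a user than catching every acceptable technique.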

References:

Bhosekar, A., & Ierapetritou, M. (2018). Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Computers & Chemical Engineering, 108, 250-267.

Cui, C., et al. (2016). A recommendation system for meta-modeling: A meta-learning based approach. Expert Systems with Applications, 46, 33-44.

Garud, S. S., et al. (2018). LEAPS2: Learning based Evolutionary Assistive Paradigm for Surrogate Selection. Computers & Chemical Engineering, 119, 352-370.

Han, Z., & Zhang, K. (2012). Surrogate-Based Optimization. In O. Roeva (Ed.), Real-World Applications of Genetic Algorithms (pp. 343-362). Rijeka, Croatia: InTech Open.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 498-520.

Ju, Y. P., et al. (2016). Artificial intelligence metamodel comparison and application to wind turbine airfoil uncertainty analysis. Advances in Mechanical Engineering, 8.

Kira, K., & Rendell, L. A. (1992). A Practical Approach to Feature Selection. Machine Learning, 249-256.

Luo, J. N., & Lu, W. X. (2014). Comparison of surrogate models with different methods in groundwater remediation process. Journal of Earth System Science, 123, 1579-1589.

Miles, J. (2005). R Squared, Adjusted R Squared. In Encyclopedia of Statistics in Behavioral Science. John Wiley & Sons Ltd.

Surjanovic, S., & Bingham, D. (2013). Virtual Library of Simulation Experiments. Simon Fraser University.

Williams, B., Lobel, W., Finklea, F., Halloin, C., Ritzenhoff, K., Manstein, F., Mohammadi, S., Hashemi, M., Zweigerdt, R., Lipke, E., & Cremaschi, S. (2020). Prediction of Human Induced Pluripotent Stem Cell Cardiac Differentiation Outcome by Multifactorial Process Modeling. Frontiers in Bioengineering and Biotechnology, 8, 851. doi: 10.3389/fbioe.2020.00851.

Williams, B., & Cremaschi, S. (2021). Selection of Surrogate Modeling Techniques for Surface Approximation and Surrogate-Based Optimization. Chemical Engineering Research and Design. doi: 10.1016/j.cherd.2021.03.028.