(104h) Families of Data-Driven Surrogates Based on Accuracy and Complexity
AIChE Annual Meeting
2021 Annual Meeting
Computing and Systems Technology Division
Advances in Machine Learning and Intelligent Systems I
Monday, November 8, 2021 - 2:15pm to 2:30pm
In this work, we aim to identify sets, or families, of similar surrogates from a pool of 50 surrogates. We evaluated surrogate performance over a diverse collection of data sets using two performance metrics. The coefficient of determination (R²) measures the predictive accuracy of a surrogate, while the Surrogate Quality Score (SQS) (Ahmad and Karimi, 2021) accounts for model complexity in addition to accuracy. We used the correlation coefficient to quantify the agreement, or similarity, between the performances of any two surrogates. This enabled us to identify pairs of similar surrogates and hence build families of mutually similar surrogates.
Our results revealed separate and very different families for non-noisy and noisy data sets under either performance metric. For non-noisy data sets, we obtained nine families with each of the R² and SQS metrics. The families were nearly alike for the two metrics but not identical: SQS heavily penalizes certain complex surrogates, especially those based on support vector regression, so these surrogates fell into different families under R² and SQS. While most families comprised surrogates built with the same modeling technique, two families under each metric contained surrogates from several different techniques.
For noisy data, surrogates based on kriging and radial basis functions did not belong to any family because they overfit; these techniques are naturally unsuitable for modeling noisy data. Hence, we obtained fewer families than for non-noisy data. Furthermore, the families based on R² and SQS differed markedly: seven families were identified based on R², but only three based on SQS. While some R²-based families comprised surrogates from different techniques, each SQS-based family consisted of surrogates from a single modeling technique.
We validated the families for noisy and non-noisy data sets by checking that the surrogates within each family remain similar on several new data sets not used to derive the original families. The proposed classification of surrogates into families opens up a computationally efficient route to surrogate selection that avoids an exhaustive search across all surrogates.
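The family-building idea described above (score each surrogate on every data set, correlate the score profiles of each pair of surrogates, then group mutually similar surrogates) can be illustrated with a minimal sketch. The abstract does not specify the details, so several choices here are assumptions made purely for illustration: the Pearson correlation, the 0.9 similarity threshold, the greedy grouping rule, the function names, and the toy score table. The SQS metric is not reproduced; any per-data-set score could be placed in the `scores` array.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def similar_pairs(scores, threshold=0.9):
    """Boolean matrix marking pairs of similar surrogates.

    scores: array of shape (n_datasets, n_surrogates), one metric value
    per data set and surrogate. The threshold is an assumed value.
    """
    corr = np.corrcoef(scores, rowvar=False)  # Pearson, column vs column
    adj = corr >= threshold
    np.fill_diagonal(adj, False)  # a surrogate is not paired with itself
    return adj

def build_families(adj):
    """Greedy grouping (an assumed rule): a surrogate joins a family only
    if it is similar to every current member. Surrogates that end up
    alone belong to no family."""
    unassigned = list(range(adj.shape[0]))
    families = []
    while unassigned:
        family = [unassigned.pop(0)]
        for j in list(unassigned):
            if all(adj[j, k] for k in family):
                family.append(j)
                unassigned.remove(j)
        if len(family) > 1:
            families.append(family)
    return families

# Made-up scores for 5 hypothetical surrogates on 4 hypothetical data sets.
scores = np.array([
    [0.90, 0.88, 0.20, 0.25, 0.50],
    [0.80, 0.79, 0.40, 0.43, 0.90],
    [0.70, 0.72, 0.60, 0.58, 0.10],
    [0.60, 0.61, 0.80, 0.82, 0.70],
])
families = build_families(similar_pairs(scores))
print(families)
```

In this toy table, surrogates 0 and 1 have nearly identical score profiles, as do surrogates 2 and 3, while surrogate 4 correlates with none of the others and is left without a family. A greedy pass is only one way to enforce mutual similarity; an exact maximal-clique search would be an alternative on a pool as small as 50 surrogates.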