(199d) Low-Complexity, Modified Radial Basis Function Modeling Technique | AIChE


Authors 

Ahmad, M. - Presenter, National University of Singapore
Karimi, I., National University of Singapore
With improved data-storage capabilities and significant advancements in AI technologies, machine-learning-based surrogate models are finding increased usage in modeling industrial process operations. The Radial Basis Function (RBF) modeling technique (Hardy, 1971) has shown promising capabilities in accurately modeling complex data. When trained on K data points, an RBF surrogate model uses a linear combination of K basis functions (Φk(x), k=1, 2, ..., K) and an optional tail function (t(x)) to define its analytical model form (see the equation below).
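The referenced equation appears to have been lost in extraction. The standard RBF surrogate form consistent with the description above (the weight symbols λk are a notational assumption; the basis functions and optional tail are as defined in the text) is:

$$\hat{f}(\mathbf{x}) \;=\; \sum_{k=1}^{K} \lambda_k \,\Phi_k(\mathbf{x}) \;+\; t(\mathbf{x}), \qquad \Phi_k(\mathbf{x}) \;=\; \phi\!\left(\lVert \mathbf{x} - \mathbf{x}_k \rVert\right),$$

where xk is the k-th sample point and φ is a radially symmetric kernel (e.g., Gaussian or multiquadric).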

The model parameters are obtained by solving a square system of linear equations. Each basis function Φk(x) is radially symmetric around a distinct sample point of the data set. By using as many independent model parameters as data points, an RBF surrogate learns the response at each trained point exactly, thereby perfectly fitting or interpolating every point of the training set. While this interpolating capability may be beneficial for certain applications, such as curve fitting and surrogate-based optimization, RBF surrogate forms also pose some potential limitations. First, they have large model complexity with respect to the degrees of freedom offered by the model parameters; in fact, their complexity grows linearly with the amount of data used for training. Often, a simple and accurate model with few parameters is desirable to maintain interpretability and tractability of the model. Second, an RBF surrogate may fail to capture the true trend of the response profile reliably at untrained locations of the input space. Although this is true for any ML-based modeling technique, the risk of poor generalizability is higher for high-complexity models such as RBFs, owing to their ability to create extremely flexible response surfaces. In other words, an RBF surrogate may overfit at the trained points by virtue of its interpolating characteristic and generalize poorly elsewhere. Third, RBF surrogates cannot be used to model noisy data, as they fit the noise completely in addition to the underlying trend, rendering their predictions at test locations unreliable. Most real-world industrial process data are inherently noisy and hence cannot be modeled reliably by interpolating RBF surrogates. In light of the aforementioned limitations, the key question is whether they can be addressed efficiently, without compromising the ability of RBF surrogates to exhibit sufficient flexibility and accuracy.
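The square-system fit and the resulting exact interpolation can be illustrated with a minimal sketch, assuming a Gaussian basis with no tail term; the function name and shape parameter here are illustrative, not the authors' code.

```python
import numpy as np

def rbf_fit_predict(X_train, y_train, X_test, eps=1.0):
    """Fit an interpolating RBF surrogate (Gaussian basis, no tail)
    and evaluate it at test points. One basis function is centered at
    each of the K training points, so the linear system is K x K square."""
    def gaussian(r):
        return np.exp(-(eps * r) ** 2)

    # Pairwise distances between training points -> K x K basis matrix
    d_train = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=-1)
    Phi = gaussian(d_train)

    # Solve the square system Phi @ w = y for the K weights
    w = np.linalg.solve(Phi, y_train)

    # Evaluate the surrogate at the test points
    d_test = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    return gaussian(d_test) @ w

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1])

# With as many weights as data points, the surrogate reproduces the
# training responses exactly (interpolation)
y_hat = rbf_fit_predict(X, y, X)
print(np.allclose(y_hat, y))  # True
```

Because the basis matrix has exactly K independent weights for K responses, the fit passes through every training point, which is precisely the interpolating behavior (and overfitting risk on noisy data) discussed above.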

To this end, we propose the development of low-complexity, modified RBF surrogates (MRBF) that do not interpolate at the trained points, with the aim of addressing the above-mentioned limitations of original RBF (ORBF) surrogates. Starting with a few center points and an initial RBF form, we iteratively add new center points and update the corresponding RBF surrogate form until the trained model no longer improves with increasing degrees of freedom. The performance of the trained model is tracked at each iteration by assessing it on a hold-out validation set. We assess the performance of MRBF surrogates using different basis functions over several data sets and compare them with ORBF surrogates. The comparison is based on two performance metrics: the error-based coefficient of determination (R2) and a hybrid accuracy-cum-complexity metric, the Surrogate Quality Score or SQS (Ahmad and Karimi, 2021). SQS combines model accuracy and complexity into a single score and is especially useful when simple and accurate surrogates are desired. MRBF either outperforms or performs as well as ORBF (absolute difference below 0.01) on ~60 of 75 data sets under both performance metrics. The impressive performance of MRBF surrogates opens up the possibility of exploiting them to model noisy data without significant overfitting. Our ongoing work involves assessing MRBF performance on noisy data sets, comparing them with other commonly used surrogates (Support Vector Regression, Neural Networks, Multivariate Adaptive Regression Splines, etc.), and exploring additional strategies to develop low-complexity MRBF surrogate forms.
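The iterative center-addition idea can be sketched as follows. This is a hedged illustration, not the authors' algorithm: it assumes a Gaussian basis, greedy selection of one training point as a new center per iteration, least-squares weight fitting (with fewer centers than data points, the model no longer interpolates), and stopping when hold-out validation error stops improving. All function and variable names are hypothetical.

```python
import numpy as np

def gaussian_features(X, centers, eps=1.0):
    """Basis matrix with one Gaussian column per center (no tail term)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return np.exp(-(eps * d) ** 2)

def fit_mrbf(X_train, y_train, X_val, y_val, max_centers=None):
    """Greedy sketch of a non-interpolating, low-complexity RBF fit:
    add the candidate center that most reduces hold-out validation
    error each iteration; stop when no candidate improves it."""
    K = len(X_train)
    max_centers = max_centers or K
    chosen, best_err, best_w = [], np.inf, None
    remaining = list(range(K))
    while remaining and len(chosen) < max_centers:
        trials = []
        for j in remaining:
            C = X_train[chosen + [j]]
            # Fewer columns than rows -> least squares, not interpolation
            w, *_ = np.linalg.lstsq(gaussian_features(X_train, C),
                                    y_train, rcond=None)
            e = np.mean((gaussian_features(X_val, C) @ w - y_val) ** 2)
            trials.append((e, j, w))
        e, j, w = min(trials, key=lambda t: t[0])
        if e >= best_err:  # no improvement on the hold-out set: stop
            break
        best_err, best_w = e, w
        chosen.append(j)
        remaining.remove(j)
    return X_train[chosen], best_w, best_err

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (30, 2))
y = np.sin(2 * X[:, 0]) + X[:, 1]
X_val = rng.uniform(-1, 1, (15, 2))
y_val = np.sin(2 * X_val[:, 0]) + X_val[:, 1]

centers, weights, val_mse = fit_mrbf(X, y, X_val, y_val)
print(len(centers), "centers selected out of", len(X), "training points")
```

Because the number of selected centers sets the model's degrees of freedom, stopping early yields the low-complexity forms the abstract describes, at the cost of exact fits at the training points.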