(373r) Data-Driven Bi-Level Optimization of Hyperparameters for Machine Learning Models | AIChE



Achieving optimal performance from machine learning (ML) models requires precise hyperparameter tuning. Hyperparameters are external adjustable parameters that are specified before model training begins, and their values can profoundly impact model performance: predictive accuracy can vary dramatically, from as low as 1% to as high as 95%, depending on the specific hyperparameter values chosen [1]. Although hyperparameter selection is commonly tackled using grid search or random search, these approaches are suboptimal because the full spectrum of hyperparameter combinations cannot be exhaustively explored [2]. Such discretization-based search methods can only test a restricted number of hyperparameter combinations to identify the one with the best validation performance, and they disregard the interconnection between the cross-validation error [3] and the training error. Hyperparameter tuning problems can also be addressed with Bayesian [4] or parametric bi-level optimization [5, 6]. Although Bayesian techniques are promising for finding near-optimal hyperparameters, they are computationally expensive and ineffective in tuning nonlinear ML models [7]. Similarly, parametric techniques are advantageous in that they deliver exact solutions, but they are limited to certain types of bi-level formulations and likewise cannot address the tuning of nonlinear ML models.

Motivated by this, we formulate the cross-validated hyperparameter optimization problem for nonlinear ML models as a bi-level program and address its solution using data-driven optimization. The formulation places the hyperparameter decisions at the upper level, which minimizes the mean squared validation error, while the lower level minimizes the training error through parameter estimation for each fold. We address this bi-level multi-follower optimization problem using the DOMINO framework, which approximates it as a single-level optimization problem using data-driven techniques [8]. We test the performance of various local and global derivative-free optimizers within DOMINO and evaluate their hyperparameter tuning performance on four different chemical processes [9, 10, 11, 12] for regression and classification tasks. Our results show that local optimizers such as NOMAD can accurately tune linear ML models with one hyperparameter at a computational cost significantly lower than that of global optimizers. Conversely, global optimization methods such as DIRECT, Particle Swarm Optimization (PSO), and Evolutionary Algorithms (EA) perform better for models with more hyperparameters and nonlinear characteristics, such as support vector machines with a radial basis function kernel for classification. Furthermore, our results show that EA and PSO are the most computationally expensive of the methods utilized in this work; however, they still outperform conventional grid search and Bayesian methods in terms of both tuning precision and computational cost.
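The bi-level multi-follower structure described above can be sketched as follows: each candidate hyperparameter vector proposed by the upper-level (leader) search is evaluated by solving the lower-level (follower) training problem on each fold, and the resulting mean squared validation error is treated as a black-box objective for a derivative-free optimizer. This is a minimal illustration, not the DOMINO implementation: it assumes a closed-form ridge regression as the lower-level parameter estimation, synthetic data, SciPy's `differential_evolution` as a stand-in for the evolutionary algorithms compared here, and a `log10` search range for the regularization hyperparameter.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Synthetic regression data (assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)

# K-fold split: each fold defines one lower-level (follower) problem.
K = 5
folds = np.array_split(rng.permutation(len(y)), K)

def lower_level_fit(X_tr, y_tr, lam):
    """Lower level: parameter estimation minimizing the regularized
    training error (closed-form ridge regression)."""
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def upper_level_objective(log_lam):
    """Upper level: mean squared validation error across the K folds,
    evaluated by solving each follower problem for the candidate
    hyperparameter proposed by the leader."""
    lam = 10.0 ** log_lam[0]
    errs = []
    for k in range(K):
        val = folds[k]
        tr = np.concatenate([folds[j] for j in range(K) if j != k])
        w = lower_level_fit(X[tr], y[tr], lam)
        errs.append(np.mean((X[val] @ w - y[val]) ** 2))
    return float(np.mean(errs))

# Derivative-free global search over log10(lambda) in [-6, 2];
# differential evolution stands in for the EA/PSO/DIRECT solvers.
res = differential_evolution(upper_level_objective, bounds=[(-6.0, 2.0)],
                             seed=1, maxiter=30)
print("best log10(lambda):", res.x[0], "validation MSE:", res.fun)
```

Because the upper-level objective only requires function evaluations (each one a full cross-validated training pass), any derivative-free local or global solver can be swapped in, which is the trade-off the abstract studies.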

References:

[1] Luo, Gang. "A review of automatic selection methods for machine learning algorithms and hyper-parameter values." Network Modeling Analysis in Health Informatics and Bioinformatics 5 (2016): 1-16.

[2] Yu, Tong, and Hong Zhu. "Hyper-parameter optimization: A review of algorithms and applications." arXiv preprint arXiv:2003.05689 (2020).

[3] Anguita, Davide, Luca Ghelardoni, Alessandro Ghio, Luca Oneto, and Sandro Ridella. "The 'K' in K-fold Cross Validation." In ESANN, vol. 102, pp. 441-446. 2012.

[4] Wu, Jia, Xiu-Yun Chen, Hao Zhang, Li-Dong Xiong, Hang Lei, and Si-Hao Deng. "Hyperparameter optimization for machine learning models based on Bayesian optimization." Journal of Electronic Science and Technology 17, no. 1 (2019): 26-40.

[5] Tso, William W., Baris Burnak, and Efstratios N. Pistikopoulos. "HY-POP: Hyperparameter optimization of machine learning models through parametric programming." Computers & Chemical Engineering 139 (2020): 106902.

[6] Sinha, Ankur, Pekka Malo, Peng Xu, and Kalyanmoy Deb. "A bilevel optimization approach to automated parameter tuning." In Proceedings of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 847-854. 2014.

[7] Alibrahim, Hussain, and Simone A. Ludwig. "Hyperparameter optimization: Comparing genetic algorithm against grid search and bayesian optimization." In 2021 IEEE Congress on Evolutionary Computation (CEC), pp. 1551-1559. IEEE, 2021.

[8] Beykal, Burcu, Styliani Avraamidou, Ioannis P. E. Pistikopoulos, Melis Onel, and Efstratios N. Pistikopoulos. "DOMINO: Data-driven optimization of bi-level mixed-integer nonlinear problems." Journal of Global Optimization 78 (2020): 1-36.

[9] Ghalavand, Younes, Hasan Nikkhah, and Ali Nikkhah. "Heat pump assisted divided wall column for ethanol azeotropic purification." Journal of the Taiwan Institute of Chemical Engineers 123 (2021): 206-218.

[10] Nikkhah, Hasan, and Burcu Beykal. "Process design and technoeconomic analysis for zero liquid discharge desalination via LiBr absorption chiller integrated HDH-MEE-MVR system." Desalination 558 (2023): 116643.

[11] Beykal, Burcu, Melis Onel, Onur Onel, and Efstratios N. Pistikopoulos. "A data-driven optimization algorithm for differential algebraic equations with numerical infeasibilities." AIChE Journal 66, no. 10 (2020): e16657.

[12] Aghayev, Z., A. T. Szafran, A. Tran, H. S. Ganesh, F. Stossi, L. Zhou, M. A. Mancini, E. N. Pistikopoulos, and B. Beykal. "Machine learning methods for endocrine disrupting potential identification based on single-cell data." Chemical Engineering Science 281 (2023): 119086.