(282c) Constrained Bayesian Optimization for Expensive Noisy Hybrid Models Using Differentiable Quantile Function Approximations | AIChE

Authors 

Lu, C. - Presenter, The Ohio State University
Paulson, J., The Ohio State University
Bayesian optimization (BO) has recently become one of the most popular frameworks for efficient data-driven discovery of globally optimal solutions for expensive black-box functions subject to noisy observations. BO’s effectiveness derives from its probabilistic representation of the black-box objective and constraint functions, which allows it to systematically manage the tradeoff between exploration (learning for the future) and exploitation (immediately advancing toward the goal) [1,2]. Any specific BO method consists of two major components. The first is the construction of a predictive model given by a Bayesian posterior distribution over the black-box functions, which acts as a “surrogate model” equipped with uncertainty estimates; one typically relies on Gaussian process (GP) models due to their non-parametric form [3]. The second is the selection of an “acquisition function,” which depends on the posterior distribution of the unknown functions and whose value at any candidate point quantifies the benefit of evaluating the functions there. Due to recent successes in a variety of application domains, including hyperparameter tuning in machine learning [4], material and drug design [5], and calibration of computationally expensive simulation models [6], BO has been extended in many directions, including the incorporation of black-box constraints [7], multiple models and objectives [8,9], and robust uncertainty handling [10].
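To make these two components concrete, the following is a minimal NumPy sketch of a single BO iteration in the fully black-box setting. The RBF kernel, toy 1-D objective, and UCB-style acquisition are illustrative choices only, not the method of this work; all names and values are hypothetical.

```python
import numpy as np

def rbf(A, B, ls=0.3):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """GP posterior mean and standard deviation at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = 1.0 - (v**2).sum(0)  # RBF prior variance is 1
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def ucb(mu, sigma, beta=2.0):
    """Upper confidence bound: exploit the mean, explore the uncertainty."""
    return mu + np.sqrt(beta) * sigma

# One iteration on a toy 1-D black-box objective (hypothetical)
f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x
X = np.array([[0.1], [0.5], [0.9]])          # points evaluated so far
y = f(X).ravel()                             # noiseless observations here
Xs = np.linspace(0, 1, 201)[:, None]         # candidate grid
mu, sigma = gp_posterior(X, y, Xs)
x_next = Xs[np.argmax(ucb(mu, sigma))]       # next point to evaluate
```

In practice the acquisition maximization is done with gradient-based or multi-start solvers rather than a grid, but the surrogate/acquisition split is the same.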

There has been limited work, however, on extending the BO paradigm to handle so-called hybrid models that frequently appear in important science and engineering problems. The phrase “hybrid model” here refers to a combination of mechanistic (first principles) models and black-box (data-driven) models – the mechanistic portion, which is almost always available in practice, thus represents a type of “prior information” that can be utilized within the BO process to improve the prediction accuracy of the surrogate models. Although the traditional BO framework can be applied in such cases by simply treating the hybrid model as a black box, it has been shown that this assumption places a fundamental limitation on the achievable rate of convergence [11]. This observation has motivated significant work by the process systems engineering community on so-called hybrid or grey-box modeling/optimization frameworks over the past several years, e.g., [12-16]. None of these works, however, takes a BO perspective, and thus they are often not applicable when the black-box portions of the model are very expensive to evaluate.

Our group recently proposed COBALT, which is, to the best of our knowledge, the first constrained BO method that is applicable to hybrid models represented by composite functions, i.e., f(x) = g(h(x)) where g(•) and h(•) are white- and black-box functions, respectively [17]. We showed that COBALT can achieve significant (orders of magnitude) gains in sampling efficiency over state-of-the-art alternatives on a variety of problems; however, it does have certain restrictions: (i) the acquisition function does not directly handle noisy function observations; (ii) it uses an approximate constraint handling method based on linearization of the probabilistic surrogate model around its mean prediction, which may lead to problems when the constraints are highly nonlinear; and (iii) there is not a straightforward way to theoretically analyze its performance including establishing bounds on its rate of convergence.
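To illustrate why composite functions complicate constraint handling, the following NumPy sketch (with purely illustrative numbers, not taken from [17]) pushes Gaussian posterior samples of a black-box h through a known white-box g. The resulting posterior of f = g(h) is skewed, so its mean and median disagree; this is precisely the situation in which a linearization around the mean prediction can misjudge constraint satisfaction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the black-box h(x) has a Gaussian posterior at some fixed x,
# e.g. h(x) ~ N(mu, sigma^2) from a GP surrogate (illustrative values).
mu, sigma = 0.3, 0.5
h_samples = rng.normal(mu, sigma, size=100_000)

# Pushing the samples through a known white-box g, here g(h) = h**2,
# yields the posterior of the composite f(x) = g(h(x)), which is no
# longer Gaussian: it is skewed and bounded below by zero.
f_samples = h_samples ** 2

# For a Gaussian, mean and median coincide; here they differ noticeably.
print(f_samples.mean(), np.median(f_samples))
```

Monte Carlo push-forward sampling of this kind is cheap because g is white-box; only h requires expensive evaluations.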

In this talk, we develop a novel approach for constrained BO of expensive noisy hybrid models that addresses the three restrictions of COBALT mentioned previously. The fundamental basis for the proposed method is the notion of quantile functions, which can be used to evaluate the likely prediction range of a random variable at a desired probability level. In the fully black-box case (with GP surrogate models), the predictions of the unknown functions are normally distributed, so the quantile function has an analytic form – this leads to the well-known upper confidence bound (UCB) acquisition function [18], whose practical and theoretical performance has been extensively studied in the literature in both the presence and absence of noise [19-21]. Extending UCB to hybrid models is non-trivial due to the difficulty of evaluating and optimizing the quantile function of non-standard distributions. We overcome this challenge by developing a sample-based approximation wherein the ⌈Nα⌉-th smallest of N samples is a convergent estimator of the quantile function evaluated at probability level α. Since sorting is non-differentiable, we also develop a differentiable sorting approximation via soft projection operators [22], which allows us to efficiently maximize the quantile-based acquisition function. Using these differentiable quantile function approximations for the objective and constraints, we directly address the noise (i) and constraint-handling (ii) limitations of COBALT. We further establish theoretical performance guarantees (addressing (iii)) on the cumulative regret and constraint violation by bounding the predictions from the quantile function in terms of the maximum information gain of the GP models. This implies that, for a wide class of covariance functions, a finite number of iterations suffices for the best sampled point to be arbitrarily close to the constrained global solution.
Thus, in sharp contrast to existing methods, our approach is easy to implement, effective in practice, and enjoys a guarantee of global optimality. The effectiveness of the proposed quantile-based BO method is demonstrated on several test problems, including parameter estimation in an environmental model and real-time optimization of a nonlinear reactor system, wherein we consistently observe similar or improved performance over COBALT.
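The sample-based quantile idea can be sketched as follows. Note that the actual method uses the projection-based soft sort of [22]; the pairwise-sigmoid soft rank below is a simpler, hypothetical stand-in chosen only to convey how sorting can be smoothed into a differentiable operation.

```python
import numpy as np

def hard_quantile(y, alpha):
    """Empirical quantile: the ceil(alpha*N)-th smallest of N samples."""
    k = int(np.ceil(alpha * len(y))) - 1
    return np.sort(y)[k]

def soft_quantile(y, alpha, tau=0.05):
    """Differentiable surrogate for the empirical quantile.

    Soft ranks come from pairwise sigmoids (a smooth stand-in for the
    projection-based soft sort of Blondel et al.); samples whose soft
    rank lies near alpha*N receive most of the weight.
    """
    n = len(y)
    z = np.clip((y[:, None] - y[None, :]) / tau, -60, 60)
    ranks = (1.0 / (1.0 + np.exp(-z))).sum(axis=1)   # soft rank per sample
    w = np.exp(-((ranks - alpha * n) ** 2) / (2 * (tau * n) ** 2))
    w /= w.sum()
    return float(w @ y)                              # smooth in y

rng = np.random.default_rng(1)
y = rng.normal(size=2000)   # stand-in for posterior samples of f(x)
# Both estimates approach the true N(0,1) 0.9-quantile as N grows
print(hard_quantile(y, 0.9), soft_quantile(y, 0.9))
```

Because the soft estimate is a smooth function of the sample values, gradients can flow through it back to the acquisition-function inputs, which is what enables efficient gradient-based maximization.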

References:

[1] Frazier, Peter I. "A tutorial on Bayesian optimization." arXiv preprint arXiv:1807.02811 (2018).

[2] Shahriari, B., Swersky, K., Wang, Z., Adams, R. P., & De Freitas, N. (2015). Taking the human out of the loop: A review of Bayesian optimization. Proceedings of the IEEE, 104(1), 148-175.

[3] Williams, Christopher KI, and Carl Edward Rasmussen. Gaussian processes for machine learning. Vol. 2. No. 3. Cambridge, MA: MIT press, 2006.

[4] M. Lindauer, K. Eggensperger, M. Feurer, A. Biedenkapp, D. Deng, C. Benjamins, T. Ruhkopf, R. Sass, and F. Hutter, “Smac3: A versatile bayesian optimization package for hyperparameter optimization,” Journal of Machine Learning Research, vol. 23, pp. 1–9, 2022.

[5] P. I. Frazier and J. Wang, “Bayesian optimization for materials design,” in Information Science for Materials Discovery and Design. Springer, 2016, pp. 45–75.

[6] R. A. Vargas-Hernandez, "Bayesian optimization for calibrating and selecting hybrid-density functional models," The Journal of Physical Chemistry A, vol. 124, no. 20, pp. 4053–4061, 2020.

[7] Lu, Congwen, and Joel A. Paulson. "No-regret Bayesian optimization with unknown equality and inequality constraints using exact penalty functions." IFAC-PapersOnLine 55.7 (2022): 895-902.

[8] Khan, Nazan, David E. Goldberg, and Martin Pelikan. "Multi-objective Bayesian optimization algorithm." Proceedings of the 4th Annual Conference on Genetic and Evolutionary Computation. 2002.

[9] Song, Jialin, Yuxin Chen, and Yisong Yue. "A general framework for multi-fidelity Bayesian optimization with Gaussian processes." The 22nd International Conference on Artificial Intelligence and Statistics. PMLR, 2019.

[10] Kudva, Akshay, Farshud Sorourifar, and Joel A. Paulson. "Constrained robust Bayesian optimization of expensive noisy black‐box functions with guaranteed regret bounds." AIChE Journal 68.12 (2022): e17857.

[11] H. Chen. Lower rate of convergence for locating a maximum of a function. The Annals of Statistics, 1330-1334, 1988.

[12] J. P. Eason, L. T. Biegler, A trust region filter method for glass box/black box optimization, AIChE Journal 62 (9) (2016) 3124-3136.

[13] I. Bajaj, S. S. Iyer, M. M. F. Hasan, A trust region-based two phase algorithm for constrained black-box and grey-box optimization with infeasible initial point, Computers & Chemical Engineering 116 (2018) 306-321.

[14] S. H. Kim, F. Boukouvala, Surrogate-based optimization for mixed-integer nonlinear problems, Computers & Chemical Engineering 140 (2020) 106847.

[15] B. Beykal, F. Boukouvala, C. A. Floudas, N. Sorek, H. Zalavadia, E. Gildin, Global optimization of grey-box computational systems using surrogate functions and application to highly constrained oil-field operations, Computers & Chemical Engineering 114 (2018) 99-110.

[16] B. Beykal, F. Boukouvala, C. A. Floudas, E. N. Pistikopoulos, Optimal design of energy systems using constrained grey-box multi-objective optimization, Computers & Chemical Engineering 116 (2018) 488-502.

[17] J.A. Paulson & C. Lu. COBALT: COnstrained Bayesian optimizAtion of computationaLly expensive grey-box models exploiting derivaTive information. Computers & Chemical Engineering, 160, 107700, 2022.

[18] Srinivas, N., Krause, A., Kakade, S. M., & Seeger, M. W. (2012). Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5), 3250-3265.

[19] Berkenkamp, Felix, Angela P. Schoellig, and Andreas Krause. "No-regret Bayesian optimization with unknown hyperparameters." arXiv preprint arXiv:1901.03357 (2019).

[20] Chowdhury, Sayak Ray, and Aditya Gopalan. "On kernelized multi-armed bandits." International Conference on Machine Learning. PMLR, 2017.

[21] Vakili, Sattar, Kia Khezeli, and Victor Picheny. "On information gain and regret bounds in Gaussian process bandits." International Conference on Artificial Intelligence and Statistics. PMLR, 2021.

[22] Blondel, M., Teboul, O., Berthet, Q., & Djolonga, J. (2020, November). Fast differentiable sorting and ranking. In International Conference on Machine Learning (pp. 950-959). PMLR.