(56e) Scalable Global Optimization of Gaussian Processes Using a Specialized Branch-and-Bound Algorithm | AIChE

(56e) Scalable Global Optimization of Gaussian Processes Using a Specialized Branch-and-Bound Algorithm

Authors 

Tang, W. T. - Presenter, The Ohio State University
Tsay, C., Imperial C
Paulson, J., The Ohio State University
Gaussian processes (GPs) have been extensively applied as data-driven surrogate models for unknown black-box functions across a variety of scientific and engineering domains [1,2]. GP models are non-parametric in the sense that they can represent any (smooth) nonlinear function as more data is collected and are closely related to empirical risk minimizers in reproducing kernel Hilbert spaces (RKHS). Such surrogate models are commonly incorporated into comprehensive optimization frameworks to identify optimal solutions, especially when some part of the problem is unknown or expensive to evaluate [3,4]. However, it is well-known that the predictive model, which corresponds to the posterior mean function for GPs, is highly non-convex and thus poses a significant challenge for deterministic global optimization algorithms.

Spatial branch and bound (BB) [5] is one of the leading techniques for rigorous global optimization of general nonlinear functions. BB relies on iteratively dividing the search domain through a branching process that aims to close the gap between upper and lower bounds on the function. Branching introduces new nodes (subdomains of the input space) within a BB tree; lower and upper bounds can be computed for each node and compared to the current best value to decide if the node can be eliminated or if it requires further branching. As such, the tightness of the relaxation of the function plays a crucial role in minimizing the size of the BB tree (number of nodes produced), which is closely related to the cost of solving the optimization problem. A key challenge in the design of a particular BB algorithm is thus achieving a good tradeoff/balance between the quality and computational complexity of optimizing the relaxation.

McCormick relaxation [6] and αBB [7] are widely used convex underestimators, yet they exhibit shortcomings in the case of GP surrogate models due their inherent structure as a summation over a large number of elements (one per data point). To the best of our knowledge, only a single previous work has considered guaranteed global optimization of GPs [8] – they develop custom McCormick relaxations that we show to be loose in practice, especially as more data is used to train the GP. This results in very large BB trees that increases the cost and memory requirements of the optimization algorithm. The main goal of this work is to introduce a new spatial BB algorithm that overcomes these limitations. The proposed method is built upon three key innovations: (i) development of an analytic lower bounding method for a subproblem defined in terms of a single term of the GP mean function; (ii) creation of a specialized non-convex underestimator for GP mean functions that can be efficiently optimized using state-of-the-art mixed-integer programming (MIP) methods; and (iii) proposal of heuristic strategy for cheaply splitting terms into inactive and active subsets for which we can apply the underestimators from (i) and (ii), respectively. We conduct a detailed comparison between our newly proposed method and previous BB schemes based on McCormick and αBB relaxations as well as established global optimization solvers such as BARON [9] on a broad collection of test problems ranging from 4 to 10 dimensions. Our findings show that our method can identify guaranteed global solutions faster (less CPU time in a naïve implementation) and with significantly smaller BB trees (100x or more reduction in many cases) compared to state-of-the-art alternative methods.

References:

[1] Chati, Y. S., & Balakrishnan, H. (2017, April). A Gaussian process regression approach to model aircraft engine fuel flow rate. In Proceedings of the 8th International Conference on Cyber-Physical Systems (pp. 131-140).

[2] di Sciascio, F., & Amicarelli, A. N. (2008). Biomass estimation in batch biotechnological processes by Bayesian Gaussian process regression. Computers & Chemical Engineering, 32(12), 3264-3273.

[3] Osborne, M. A., Garnett, R., & Roberts, S. J. (2009). Gaussian processes for global optimization.

[4] Ginsbourger, D., Le Riche, R., & Carraro, L. (2008). A multi-points criterion for deterministic parallel global optimization based on Gaussian processes.

[5] Smith, E. M., & Pantelides, C. C. (1999). A symbolic reformulation/spatial branch-and-bound algorithm for the global optimisation of nonconvex MINLPs. Computers & Chemical Engineering, 23(4-5), 457-478.

[6] McCormick, G. P. (1976). Computability of global solutions to factorable nonconvex programs: Part I—Convex underestimating problems. Mathematical programming, 10(1), 147-175.

[7] Adjiman, C. S., Dallwig, S., Floudas, C. A., & Neumaier, A. (1998). A global optimization method, αBB, for general twice-differentiable constrained NLPs—I. Theoretical advances. Computers & Chemical Engineering, 22(9), 1137-1158.

[8] Schweidtmann, A. M., Bongartz, D., Grothe, D., Kerkenhoff, T., Lin, X., Najman, J., & Mitsos, A. (2021). Deterministic global optimization with Gaussian processes embedded. Mathematical Programming Computation, 13(3), 553-581.

[9] Sahinidis, Nikolaos V. "Mixed-integer nonlinear programming 2018." Optimization and Engineering 20 (2019): 301-306.