(712b) Gradient Information Bayesian Optimization for Composite Functions (GIBO-CF): Local Physics-Based Grey-Box Optimization in High Dimensions | AIChE

Authors 

Makrygiorgos, G., UC Berkeley
Paulson, J., The Ohio State University
Mesbah, A., University of California, Berkeley
Bayesian optimization (BO) [1] has emerged as an efficient method for learning
and optimizing expensive black-box functions in numerous applications, such as
hyperparameter tuning for deep neural networks [2] and experimental design [3].
This is in part due to its ability to locate global optima with a small number of
noisy objective queries. BO constructs a surrogate, typically a Gaussian process
(GP) [4], as a model of the true black-box objective, and updates the posterior
as new observations are made. An acquisition function (e.g., expected
improvement [5] or the upper confidence bound [6]) balances exploration against
exploitation and determines where to query next.
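The loop described above can be made concrete with a minimal sketch. The code below is purely illustrative (it is not the GIBO-CF implementation): it uses a squared-exponential kernel with an arbitrarily chosen lengthscale, a 1-D search domain, and a grid search to maximize the UCB acquisition, all of which are simplifying assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length=0.3):
    """Squared-exponential kernel between two sets of 1-D points."""
    d = A[:, None] - B[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-5):
    """GP posterior mean and variance at test points Xs given data (X, y)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha                       # posterior mean
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)      # posterior variance (prior var = 1)
    return mu, np.maximum(var, 0.0)

def bayes_opt(f, bounds=(0.0, 1.0), n_init=3, n_iter=12, beta=2.0, seed=0):
    """Minimal BO loop maximizing f with a UCB acquisition on a grid."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, size=n_init)
    y = f(X)
    grid = np.linspace(*bounds, 201)
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)
        # UCB trades off exploitation (high mean) and exploration (high variance)
        ucb = mu + beta * np.sqrt(var)
        x_next = grid[np.argmax(ucb)]
        X = np.append(X, x_next)
        y = np.append(y, f(np.array([x_next])))
    return X[np.argmax(y)], y.max()
```

Running this on a simple concave test objective such as `f(x) = -(x - 0.6)**2` locates the maximizer near 0.6 within a handful of queries.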


BO is generally a derivative-free (zeroth-order) optimization algorithm. However,
leveraging gradient information is advantageous [7]: for an input of dimension
d, each query yields d + 1 pieces of information, giving a more complete view of
the function's behavior, and derivatives are relatively cheap when computed via
adjoint methods [8] or approximated via finite differences. There is currently
significant interest in using gradients in BO; [9] first suggested doing so
by modifying the expected improvement and probability of improvement acquisition
functions under the assumption of full gradient information. Gradients have
also been obtained explicitly and used as local-optimality constraints for global
optimization [10]. In [7], the property that the derivative of a GP is itself a GP
is exploited to create a joint kernel function that correlates all possible
combinations of zeroth-order values and gradients. Unfortunately, this approach
is prohibitively expensive: the Cholesky decomposition is the computational
bottleneck, and the resulting complexity is cubic in both the number of data
points and d, so the method does not scale. Furthermore, gradients are not
always readily accessible, and the curse of dimensionality prevents BO from
providing satisfactory empirical results for problems with 10 or more
dimensions [11]. Gradient information BO (GIBO) circumvents these issues by
focusing only on locating local optima and using zeroth-order queries to infer
gradients [12]. The strategy forgoes learning the zeroth-order function itself
and instead learns its Jacobian; the associated acquisition function, GI,
maximizes the expected reduction in Jacobian variance from before to after the
query. Based on the data in the vicinity of the current iterate, the approach
constructs a local approximation of the Jacobian and performs gradient descent
to find a local optimum. In the same vein, [13] queries points so as to maximize
the probability of descent when approximating the Jacobian for gradient descent.
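The key fact underlying the GI acquisition is that a GP's posterior covariance, and hence the Jacobian variance, depends only on query *locations*, not on observed values, so the effect of a candidate query can be evaluated before making it. The sketch below illustrates this for an RBF kernel with a hand-picked lengthscale; it is a simplified illustration of the idea, not the GIBO implementation of [12].

```python
import numpy as np

ELL = 0.5  # RBF lengthscale (illustrative choice)

def k(a, b):
    """RBF kernel k(a, b) = exp(-||a - b||^2 / (2 ELL^2))."""
    return np.exp(-np.sum((a - b) ** 2) / (2 * ELL ** 2))

def dk_dx(a, b):
    """Gradient of k(a, b) with respect to its first argument a."""
    return -(a - b) / ELL ** 2 * k(a, b)

def jacobian_variance(X, x_star, noise=1e-6):
    """Trace of the posterior covariance of grad f(x_star) given queries X.

    The posterior covariance does not depend on the observed values y, so
    this quantity can be computed for *hypothetical* query sets -- which is
    what makes a variance-reduction acquisition tractable."""
    n, d = X.shape
    K = np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])
    K += noise * np.eye(n)
    # Cross-covariance between grad f(x_star) and f(X): shape d x n.
    C = np.stack([dk_dx(x_star, X[j]) for j in range(n)], axis=1)
    # Prior covariance of the gradient at any point for an RBF kernel: I / ELL^2.
    post = np.eye(d) / ELL ** 2 - C @ np.linalg.solve(K, C.T)
    return np.trace(post)

def gi_acquisition(X, x_star, z):
    """Expected reduction in Jacobian variance from adding candidate query z."""
    return jacobian_variance(X, x_star) - jacobian_variance(np.vstack([X, z]), x_star)
```

As expected, a candidate close to the current iterate reduces the gradient uncertainty far more than a distant one, which is why the resulting queries cluster locally around the descent path.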
Additionally, recent research has shown promising performance by exploiting
the composite nature of the objective function [14]. By taking advantage of the
partially known structure of the objective, the limit on the convergence rate
imposed by a black-box, no-prior-information assumption can be overcome [15].
We therefore extend GIBO by integrating partial information in the form of a
grey-box composite function [16], which we term GIBO for composite functions
(GIBO-CF). In the computation of the Jacobian variance for the acquisition
function, the derivatives of the known portion of the function can be extracted
as constants at every iteration of gradient descent, reducing complexity and
greatly benefiting higher-dimensional applications. We demonstrate that, owing
to the combination of known and inferred gradients, GIBO-CF outperforms
traditional BO as well as other algorithms such as augmented random search [17]
and CMA-ES [18]. Furthermore, the composite structure of the proposed method
requires fewer objective calls to locate the optimum than GIBO.
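The role of the known derivatives can be seen from the chain rule. For a composite objective f(x) = g(h(x)) with g known in closed form and h the expensive black-box part, grad f(x) = J_h(x)^T grad g(h(x)), so at each descent iteration grad g enters as a constant vector and only the Jacobian of h must be inferred from queries. The sketch below illustrates this decomposition; the particular g, h, and the exact-Jacobian stand-in are illustrative assumptions, not the paper's test problems.

```python
import numpy as np

# Grey-box composite objective f(x) = g(h(x)):
#   h : R^d -> R^m is the expensive black-box part (its Jacobian is inferred),
#   g : R^m -> R   is known in closed form, so its gradient is exact and cheap.

def g(y):
    """Known outer function (illustrative choice)."""
    return -np.sum(y ** 2)

def grad_g(y):
    """Exact derivative of the known portion."""
    return -2.0 * y

def composite_gradient(x, h, jac_h_estimate):
    """Chain rule for the grey-box gradient:
        grad f(x) = J_h(x)^T grad g(h(x)).
    grad_g(h(x)) is a constant vector at each descent iteration, so only the
    m x d Jacobian of h needs to be estimated (e.g., from a GP model of h)."""
    y = h(x)                    # one evaluation of the expensive black box
    J = jac_h_estimate(x)       # m x d Jacobian estimate of the black box
    return J.T @ grad_g(y)
```

With a linear black box h(x) = A x and the exact Jacobian A supplied as the estimate, the composite gradient matches the analytic gradient -2 A^T A x, confirming that all of the "known" differentiation cost has been moved outside the inference problem.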

[1] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization
of expensive black-box functions,” J. Glob. Optim., vol. 13, no. 4,
pp. 455–492, 1998.
[2] H. Cho, Y. Kim, E. Lee, D. Choi, Y. Lee, and W. Rhee, “Basic enhancement
strategies when using bayesian optimization for hyperparameter tuning of
deep neural networks,” IEEE Access, vol. 8, pp. 52 588–52 608, 2020.
[3] S. Greenhill, S. Rana, S. Gupta, P. Vellanki, and S. Venkatesh, “Bayesian
optimization for adaptive experimental design: A review,” IEEE Access,
vol. 8, pp. 13 937–13 948, 2020.
[4] C. E. Rasmussen and C. K. I. Williams, Gaussian processes for machine
learning., ser. Adaptive computation and machine learning. MIT Press,
2006.
[5] D. Huang, T. Allen, W. Notz, and N. Zeng, “Global Optimization
of Stochastic Black-Box Systems via Sequential Kriging Meta-Models,”
Journal of Global Optimization, vol. 34, no. 3, pp. 441–466, March 2006.
[Online]. Available: https://ideas.repec.org/a/spr/jglopt/v34y2006i3p441-
466.html

[6] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger,
“Information-theoretic regret bounds for gaussian process optimiza-
tion in the bandit setting,” IEEE Transactions on Information
Theory, vol. 58, no. 5, p. 3250–3265, May 2012. [Online]. Available:
http://dx.doi.org/10.1109/TIT.2011.2182033
[7] J. Wu, M. Poloczek, A. G. Wilson, and P. Frazier, “Bayesian optimiza-
tion with gradients,” in Advances in Neural Information Processing Sys-
tems, I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vish-
wanathan, and R. Garnett, Eds., vol. 30. Curran Associates, Inc., 2017.
[8] R.-E. Plessix, “A review of the adjoint-state method for computing
the gradient of a functional with geophysical applications,” Geophysical
Journal International, vol. 167, no. 2, pp. 495–503, 11 2006. [Online].
Available: https://doi.org/10.1111/j.1365-246X.2006.02978.x
[9] D. J. Lizotte, “Practical bayesian optimization,” 2008.
[10] G. Makrygiorgos, J. A. Paulson, and A. Mesbah, “No-regret bayesian op-
timization with gradients using local optimality-based constraints: Appli-
cation to closed-loop policy search,” in 2023 62nd IEEE Conference on
Decision and Control (CDC), 2023, pp. 20–25.
[11] W.-T. Tang and J. A. Paulson, “Cages: Cost-aware gradient entropy search
for efficient local multi-fidelity bayesian optimization,” in 2024 63rd IEEE
Conference on Decision and Control (CDC), 2024.
[12] S. Müller, A. von Rohr, and S. Trimpe, “Local policy search with bayesian
optimization,” in Neural Information Processing Systems, 2021. [Online].
Available: https://api.semanticscholar.org/CorpusID:235593254
[13] Q. Nguyen, K. Wu, J. R. Gardner, and R. Garnett, “Local bayesian opti-
mization via maximizing probability of descent,” 2023.
[14] A. Kudva, F. Sorouifar, and J. A. Paulson, “Efficient robust global op-
timization for simulation-based problems using decomposed gaussian pro-
cesses: Application to mpc calibration,” in 2022 American Control Con-
ference (ACC), 2022, pp. 2091–2097.
[15] H. Chen, “Lower rate of convergence for locating a maximum of a
function,” The Annals of Statistics, vol. 16, no. 3, pp. 1330–1334, 1988.
[Online]. Available: http://www.jstor.org/stable/2241636
[16] J. A. Paulson and C. Lu, “Cobalt: Constrained bayesian optimization of
computationally expensive grey-box models exploiting derivative informa-
tion,” 2022.
[17] H. Mania, A. Guy, and B. Recht, “Simple random search provides a com-
petitive approach to reinforcement learning,” 2018.
[18] N. Hansen, “The cma evolution strategy: A tutorial,” 2023.