(207c) Necessary Optimality-Constrained Bayesian Optimization (NOBO) for Efficiently Learning Complex Control Policies from Closed-Loop Data

Authors 

Paulson, J., The Ohio State University
Mesbah, A., University of California, Berkeley
The control of complex systems involves several challenges due to the unknown (black-box) relationship between control policy parameters and the reward function that can only be observed through expensive and noisy simulations or experiments. Bayesian optimization (BO) has recently gained popularity for globally optimizing expensive black-box functions, which is a problem frequently encountered in learning-based control applications, thanks to its data efficiency [1,2]. Traditional BO methods construct a probabilistic surrogate model of the performance function and employ an acquisition function that approximates the value of information for future sample points, effectively balancing the exploration-exploitation tradeoff inherent in searching the design space [3]. Recent studies have shown that BO's convergence performance can be improved by incorporating additional information, such as derivative observations [4].

First-order BO methods mainly rely on standard acquisition functions and incorporate derivative measurements only indirectly, through the probabilistic surrogate model, to enhance local predictions [5]. These methods can nevertheless exhibit drawbacks: the added model complexity can significantly increase training and optimization costs, and they may fail when gradient observations are heavily corrupted by noise [6]. In this talk, we propose a computationally efficient approach that simultaneously utilizes performance (zeroth-order) and derivative (first-order) data within a single acquisition optimization subproblem. Our core idea is to impose, at each iteration, a set of black-box constraints that mimic the necessary optimality conditions of the original global optimization problem. The proposed necessary-optimality BO (NOBO) method [7] employs Gaussian process surrogates for the objective's partial derivatives to approximately enforce first-order optimality conditions as black-box constraints in the acquisition function. These constraints define a feasible set that explicitly accounts for the uncertainty in estimating partial gradients from data and that is updated as new data are observed. Consequently, the feasible set narrows the design-space search to regions that are jointly informative with respect to both zeroth- and first-order information.
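
The sketch below illustrates this idea schematically (it is not the exact acquisition subproblem of [7]): separate GP surrogates are fit to objective and partial-derivative observations, and the acquisition is maximized only over candidates where the derivative's confidence band still contains zero, i.e., where first-order stationarity cannot yet be ruled out. The one-dimensional setting, kernel choices, confidence parameter beta, and grid search are assumptions made purely for illustration.

```python
# Schematic sketch of an acquisition restricted by a necessary-optimality constraint.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def constrained_acquisition(X, y, dy, candidates, beta=2.0):
    """Return the candidate maximizing a UCB acquisition over the set where
    the derivative GP's credible interval still contains zero."""
    gp_f = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True).fit(X, y)
    gp_df = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2, normalize_y=True).fit(X, dy)

    mu, sigma = gp_f.predict(candidates, return_std=True)     # surrogate of the objective
    dmu, dsigma = gp_df.predict(candidates, return_std=True)  # surrogate of its derivative

    # Approximate first-order optimality constraint: |E[f'(x)]| <= beta * std[f'(x)].
    feasible = np.abs(dmu) <= beta * dsigma
    if not feasible.any():          # fall back to unconstrained BO if the set is empty
        feasible = np.ones_like(feasible, dtype=bool)

    ucb = mu + beta * sigma
    ucb[~feasible] = -np.inf        # search only the jointly informative region
    return candidates[[np.argmax(ucb)]]
```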

We examine the theoretical performance and regret bounds of the proposed algorithm and demonstrate in practice that incorporating these constraints, which restrict the allowable search space, leads to faster convergence than conventional BO. We further validate these performance gains on a reinforcement learning (RL) benchmark based on the linear quadratic regulator (LQR) [8], in which the reward function's derivatives can be estimated directly from closed-loop data using the policy gradient theorem.
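
As an illustration of how such derivative information could be gathered, the following sketch estimates the finite-horizon LQR cost of a static feedback gain together with a smoothed (REINFORCE-style) policy-gradient estimate of its derivative from closed-loop rollouts. The system matrices, horizon, smoothing radius, and rollout count are illustrative assumptions and are not tied to the specific benchmark used in the talk.

```python
# Sketch: zeroth- and first-order closed-loop data for an LQR policy u = -K x.
import numpy as np

rng = np.random.default_rng(1)
A, B = np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([[0.0], [0.1]])  # double integrator
Q, R = np.eye(2), 0.1 * np.eye(1)

def rollout_cost(K, x0=np.array([1.0, 0.0]), horizon=50):
    """Finite-horizon LQR cost of the policy u = -K x from one noisy closed-loop rollout."""
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u + 0.01 * rng.standard_normal(2)   # process noise
    return cost

def cost_and_gradient(K, radius=0.05, n_rollouts=100):
    """Zeroth-order cost estimate plus a smoothed policy-gradient estimate of dJ/dK."""
    grad, cost = np.zeros_like(K), 0.0
    for _ in range(n_rollouts):
        U = rng.standard_normal(K.shape)                     # random perturbation direction
        J = rollout_cost(K + radius * U)
        grad += (J / (radius * n_rollouts)) * U
        cost += J / n_rollouts
    return cost, grad

K0 = np.array([[0.5, 0.5]])
J0, dJ0 = cost_and_gradient(K0)
print("estimated cost:", J0, "estimated gradient:", dJ0)
```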

References:

[1] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of Bayesian optimization." Proceedings of the IEEE 104.1 (2016): 148-175.

[2] Paulson, Joel A., Georgios Makrygiorgos, and Ali Mesbah. "Adversarially robust Bayesian optimization for efficient auto‐tuning of generic control structures under uncertainty." AIChE Journal 68.6 (2022): e17591.

[3] Frazier, Peter I. "A tutorial on Bayesian optimization." arXiv preprint arXiv:1807.02811 (2018).

[4] Shekhar, Shubhanshu, and Tara Javidi. "Significance of gradient information in Bayesian optimization." International Conference on Artificial Intelligence and Statistics. PMLR, 2021.

[5] Wu, Jian, et al. "Bayesian optimization with gradients." Advances in neural information processing systems 30 (2017).

[6] Penubothula, Santosh, Chandramouli Kamanchi, and Shalabh Bhatnagar. "Novel first order Bayesian optimization with an application to reinforcement learning." Applied Intelligence 51 (2021): 1565-1579.

[7] Makrygiorgos, Georgios, Joel A. Paulson, and Ali Mesbah. "No-Regret Bayesian Optimization with Gradients using Local Optimality-based Constraints: Application to Closed-loop Policy Search." 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023.

[8] Recht, Benjamin. "A tour of reinforcement learning: The view from continuous control." Annual Review of Control, Robotics, and Autonomous Systems 2 (2019): 253-279.