
(712d) Multistep Lookahead Bayesian Optimization for High Dimensional Black-Box Optimization Problems Using Reinforcement Learning

Authors 

Koh, D. Y., Georgia Institute of Technology
Lee, J. H., University of Southern California
Tsay, C., Imperial College London
Multistep lookahead Bayesian optimization for high dimensional black-box optimization problems using reinforcement learning

Mujin Cheon^a, Dong-yeun Koh^a,*, Jay H. Lee^b,*, and Calvin Tsay^c,*

a Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology, 291 Daehak-Ro, Yuseong-Gu, Daejeon, 34141, Republic of Korea

b Department of Chemical and Biological Engineering, University of Southern California, Los Angeles, CA 90007, USA

c Department of Computing, Imperial College London, London, SW7 2AZ, England, United Kingdom

Abstract:

Bayesian optimization (BO) is a popular decision-making tool for global optimization problems in which the underlying system is not fully known and must be treated as a “black box”. It excels in sequential decision-making by balancing exploration (learning more about the system) and exploitation (making the best decision based on current knowledge), which enables it to find global optima efficiently with minimal data. This attribute is particularly valuable in fields such as chemical engineering, where collecting experimental data can be both expensive and time-consuming [1]. Accordingly, BO has been explored as a design-of-experiments strategy in areas such as materials discovery, reaction engineering, and process optimization [2-4]. Despite its effectiveness, standard BO methods optimize only for the immediate next-step improvement and thus do not directly account for the dependence between the current data acquisition and subsequent experiments. This limitation can be critical when decision-making extends beyond a single step, e.g., in an experimental campaign [5]. In theory, achieving an optimal balance between exploration and exploitation would require solving the “multi-step lookahead” stochastic dynamic programming (SDP) problem underlying BO. However, this is generally infeasible due to its computational complexity and resource requirements.
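For concreteness, the H-step lookahead acquisition can be written as a nested (Bellman-type) recursion over the GP posterior; the sketch below uses standard notation of our own choosing (D_t for the data after t evaluations, u for a one-step utility such as improvement over the incumbent), not notation taken from this work:

\alpha_H(x \mid \mathcal{D}_t) \;=\; \mathbb{E}_{y \sim p(y \mid x, \mathcal{D}_t)}\!\left[\, u(x, y \mid \mathcal{D}_t) \;+\; \max_{x'} \alpha_{H-1}\!\left(x' \mid \mathcal{D}_t \cup \{(x, y)\}\right) \right], \qquad \alpha_0 \equiv 0.

Standard BO corresponds to truncating this recursion at H = 1 (e.g., expected improvement); for H > 1, each additional level adds an expectation over unobserved outcomes and an inner maximization over future queries, which is what makes the exact SDP solution intractable.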

Several efforts have been undertaken to approximately solve the SDP problem inherent in multi-step lookahead BO. These approaches generally fall into two categories: those that employ rollout techniques and those limited to a two-step lookahead. More recently, several works have proposed methods that incorporate reinforcement learning (RL) to tackle multi-step lookahead BO [3, 6]. However, these RL-based approaches have encountered scalability issues, primarily due to how they represent the state space.

In this study, we introduce a novel architecture that integrates end-to-end RL with BO for multi-step lookahead decision-making in high-dimensional, unknown environments. Specifically, we encode the current state of knowledge in BO, i.e., the set of experiments performed so far, as a point in a latent space using a proposed neural network architecture. A critical characteristic of our model is its permutation invariance: the order in which data are acquired in chemical experiments does not affect our understanding of the system. The learned latent representation is then used by an RL agent to make multi-step lookahead decisions. Actions determined by the RL agent are executed within a virtual environment based on a Gaussian process (GP) model, and the rewards obtained from these virtual experiments are used to iteratively update both the encoder and the RL agent. We evaluate the proposed BO framework on several high-dimensional benchmark functions, comparing its performance against traditional BO, a high-dimensional BO algorithm, and another end-to-end BO algorithm. Our computational study shows that the proposed method attains lower average regret, indicating faster identification of optimal solutions across various scenarios. This suggests that the proposed framework can significantly enhance the efficiency of sequential decision-making in unknown environments, accelerating the discovery of globally optimal solutions.
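As a minimal illustration of the architectural idea described above (a sketch under our own assumptions, not the authors' implementation), the following PyTorch snippet shows a deep-sets-style encoder that mean-pools per-observation features into a single latent vector, followed by a small policy network that maps this latent state to the next query point; all module names, layer sizes, and the choice of PyTorch are illustrative assumptions.

# Illustrative sketch only: a permutation-invariant encoder plus an RL policy head.
# Names, layer sizes, and the use of PyTorch are assumptions, not the authors' code.
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Deep-sets-style encoder: phi acts on each (x, y) pair; mean pooling removes order."""
    def __init__(self, x_dim, latent_dim=32):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(x_dim + 1, 64), nn.ReLU(), nn.Linear(64, 64))
        self.rho = nn.Sequential(nn.Linear(64, latent_dim), nn.ReLU())

    def forward(self, X, y):
        # X: (n, x_dim) query points; y: (n, 1) observed responses
        h = self.phi(torch.cat([X, y], dim=-1))   # per-observation features, shape (n, 64)
        return self.rho(h.mean(dim=0))            # pooled latent state, shape (latent_dim,)

class Policy(nn.Module):
    """Actor network: maps the latent state to the next query in the unit hypercube."""
    def __init__(self, latent_dim, x_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, x_dim), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)

# Usage: encode five prior experiments on a 10-D problem and propose the next query.
x_dim = 10
encoder, policy = SetEncoder(x_dim), Policy(32, x_dim)
X, y = torch.rand(5, x_dim), torch.rand(5, 1)
x_next = policy(encoder(X, y))   # identical for any reordering of the five rows

Because the pooled latent state is invariant to the order of the rows of (X, y), the agent's decision depends only on what has been observed, not on when it was observed; in training, such a policy would be rolled forward inside a GP surrogate environment and updated from the resulting virtual-experiment rewards, as described above.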

[1] Beg, S., Swain, S., Rahman, M., Hasnain, M. S., & Imam, S. S. (2019). Application of design of experiments (DoE) in pharmaceutical product and process optimization. In Pharmaceutical quality by design (pp. 43-64). Academic Press.

[2] Pruksawan, S., Lambard, G., Samitsu, S., Sodeyama, K., & Naito, M. (2019). Prediction and optimization of epoxy adhesive strength from a small dataset through active learning. Science and Technology of Advanced Materials, 20(1), 1010-1021.

[3] Byun, H. E., Kim, B., & Lee, J. H. (2022). Multi-step lookahead Bayesian optimization with active learning using reinforcement learning and its application to data-driven batch-to-batch optimization. Computers & Chemical Engineering, 167, 107987.

[4] Paulson, J. A., & Tsay, C. (2024). Bayesian optimization as a flexible and efficient design framework for sustainable process systems. arXiv preprint arXiv:2401.16373.

[5] Lee, E., Eriksson, D., Bindel, D., Cheng, B., & McCourt, M. (2020, August). Efficient rollout strategies for Bayesian optimization. In Conference on Uncertainty in Artificial Intelligence (pp. 260-269). PMLR.

[6] Cheon, M., Byun, H., & Lee, J. H. (2022). Reinforcement learning based multi-step look-ahead Bayesian optimization. IFAC-PapersOnLine, 55(7), 100-105.