(558e) A New Reinforcement Learning Based Bayesian Optimization Method for a Sequential Decision Making in an Unknown Environment
AIChE Annual Meeting
Computing and Systems Technology Division
Advances in nonlinear and surrogate optimization
Thursday, November 11, 2021 - 9:16am to 9:35am
In this work, we propose a reinforcement learning based Bayesian optimization (BO) architecture for multi-step lookahead decision making in an uncertain environment. Reinforcement learning (RL) is used to approximately and efficiently solve the underlying dynamic programming (DP) problem and thereby enable multi-step lookahead decisions. To incorporate RL into BO, the BO problem must be translated into the Markov Decision Process (MDP) formulation that RL requires. Unlike the games and robotics domains where RL has been applied thus far, the proper definitions of the agent's state and reward are not obvious for the BO problem. This paper therefore suggests a novel way of defining an MDP for solving the multi-step lookahead BO problem. Proximal Policy Optimization (PPO), a state-of-the-art RL algorithm, is employed in this work. The performance of the proposed RL-based BO is tested on several benchmark functions by comparing the average regret at each decision step against that of conventional BO. The proposed method attains lower average regret than conventional BO, meaning the RL-based BO finds a better optimum faster. The proposed approach can be applied to a variety of sequential decision-making problems cast in an unknown environment (e.g., with an unknown decision-reward map) to accelerate the search for the globally optimal solution.
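As a rough illustration of the idea of casting a BO loop as an MDP, the sketch below wraps sequential function queries in an environment with a state (the observed data so far), an action (the next query point), and a reward (the one-step improvement over the best value found). All names, the benchmark function, and the reward definition are assumptions for illustration, not the authors' exact formulation; a trained policy such as PPO would replace the random baseline shown here.

```python
import math
import random

def benchmark_1d(x):
    # Simple 1-D stand-in for the paper's benchmark functions (assumed).
    return (x - 0.3) ** 2 + 0.1 * math.sin(8.0 * x)

class BOEnvironment:
    """BO loop cast as an MDP (hypothetical formulation):
    state  = list of observed (x, y) pairs,
    action = next query point,
    reward = one-step improvement over the best value so far."""
    def __init__(self, objective, horizon=10):
        self.objective = objective
        self.horizon = horizon
        self.reset()

    def reset(self):
        self.data = []              # observed (x, y) pairs: the MDP state
        self.best = float("inf")    # best objective value found so far
        self.t = 0
        return list(self.data)

    def step(self, action):
        y = self.objective(action)
        self.data.append((action, y))
        # Reward = reduction in simple regret achieved by this query.
        reward = max(0.0, self.best - y)
        self.best = min(self.best, y)
        self.t += 1
        done = self.t >= self.horizon
        return list(self.data), reward, done

# Random policy baseline; an RL policy (e.g. PPO) would map the state
# (the observed data, or GP posterior features derived from it) to actions.
random.seed(0)
env = BOEnvironment(benchmark_1d, horizon=20)
state = env.reset()
done = False
while not done:
    action = random.uniform(-1.0, 1.0)
    state, reward, done = env.step(action)
print(f"best value found: {env.best:.4f}")
```

In the paper's actual setting, the state would presumably summarize the Gaussian-process posterior rather than the raw data, so that the learned policy generalizes across objective functions.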