(363h) Comparing Reinforcement Learning and Bayesian Optimization for Tuning MPC Policies

Authors 

Pérez-Piñeiro, D. - Presenter, Norwegian University of Science and Technology
Skogestad, S., Norwegian University of Science and Technology
Model predictive control (MPC) policies generally yield poor closed-loop performance under plant-model mismatch. One traditional remedy is to improve the system model from input-output data using system identification techniques. An alternative is to parametrize the MPC policy (i.e., the stage cost, terminal cost, and constraints) and tune its parameters to improve closed-loop performance despite the plant-model mismatch. Selecting a good parametrization is often an art guided by process insight, although more systematic constructions, such as positive sums of convex functions and sum-of-squares parametrizations, are also possible; more research is needed in this direction.

Tuning the MPC parameters, on the other hand, can be done systematically. Data-driven optimization methods such as reinforcement learning and Bayesian optimization have been proposed in the literature to this end. Gros and Zanon (2019) applied reinforcement learning techniques such as Q-learning and deterministic policy gradient methods, while Lu et al. (2021) explored Bayesian optimization with reference models that exploit prior system information.
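
To make the tuning loop concrete, the following is a minimal sketch of tuning an MPC policy with Bayesian optimization, assuming a two-parameter policy with a Gaussian-process surrogate and an expected-improvement acquisition. It is illustrative only, not the implementation compared in this work: the parameter bounds and the `closed_loop_cost` function are hypothetical placeholders, where in practice the latter would run the parametrized MPC in closed loop on the plant or a simulator and return the accumulated stage cost.

```python
# Hedged sketch: Bayesian optimization over two hypothetical MPC
# policy parameters. All names and bounds are assumptions for
# illustration, not taken from the work described above.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
bounds = np.array([[0.1, 10.0],   # e.g., a stage-cost weight (assumed)
                   [0.1, 10.0]])  # e.g., a terminal-cost weight (assumed)

def closed_loop_cost(theta):
    # Placeholder: run the MPC policy with parameters `theta` in closed
    # loop and return the accumulated cost. Here a noisy synthetic
    # quadratic stands in for the plant rollout.
    return float(np.sum((theta - np.array([3.0, 1.5]))**2)
                 + 0.1 * rng.standard_normal())

def sample(n):
    # Uniform samples inside the box `bounds`.
    return bounds[:, 0] + (bounds[:, 1] - bounds[:, 0]) * rng.random((n, 2))

# Initial design: a handful of closed-loop experiments.
X = sample(5)
y = np.array([closed_loop_cost(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(20):                      # each iteration = one rollout
    gp.fit(X, y)
    cand = sample(512)                   # candidate set for the acquisition
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.min()
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, closed_loop_cost(x_next))

print("best parameters:", X[np.argmin(y)], "cost:", y.min())
```

Each iteration of the loop consumes exactly one closed-loop rollout, which is the natural unit for the data-efficiency comparison discussed below.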

Despite these results, the authors feel that there has not been enough cross-communication between these two lines of work. To bridge this gap, this work compares deterministic policy gradient methods from reinforcement learning against Bayesian optimization methods on a series of benchmark energy storage problems. We compare the two classes of algorithms in terms of data efficiency and discuss their relative merits and disadvantages, as well as opportunities for integration.
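
For contrast, here is an equally minimal policy-gradient-style sketch on the same kind of tuning problem. Gros and Zanon (2019) obtain the policy gradient analytically from NLP sensitivities of the MPC solution map; the central finite differences below are only a crude stand-in for that gradient, and `closed_loop_cost` is again a hypothetical placeholder (made deterministic here for simplicity).

```python
# Hedged sketch: gradient-descent tuning of two hypothetical MPC
# parameters, with finite differences standing in for the analytic
# MPC sensitivities of Gros and Zanon (2019). All names are assumed.
import numpy as np

def closed_loop_cost(theta):
    # Deterministic placeholder for a closed-loop MPC rollout cost.
    return float(np.sum((theta - np.array([3.0, 1.5]))**2))

def grad_fd(J, theta, eps=1e-3):
    # Central finite-difference estimate of dJ/dtheta; each gradient
    # costs 2 * theta.size closed-loop rollouts.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (J(theta + e) - J(theta - e)) / (2 * eps)
    return g

theta = np.array([5.0, 5.0])  # initial MPC parameters (assumed)
alpha = 0.1                   # step size (assumed)
for k in range(100):
    theta -= alpha * grad_fd(closed_loop_cost, theta)

print("tuned parameters:", theta)
```

Counting the closed-loop rollouts each sketch consumes (one per Bayesian optimization iteration versus 2 × dim per gradient estimate here) gives a rough feel for the data-efficiency axis along which the two algorithm classes are compared.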

References:

Gros, S., & Zanon, M. (2019). Data-driven economic NMPC using reinforcement learning. IEEE Transactions on Automatic Control, 65(2), 636-648.

Lu, Q., González, L. D., Kumar, R., & Zavala, V. M. (2021). Bayesian optimization with reference models: A case study in MPC for HVAC central plants. Computers & Chemical Engineering, 154, 107491.