(457c) Continuous Learning of the Value Function Utilizing Deep Reinforcement Learning to be Used As the Objective in Model Predictive Control
2024 AIChE Annual Meeting
Computing and Systems Technology Division
10B: Predictive Control and Optimization
Wednesday, October 30, 2024 - 8:32am to 8:48am
The work presented here investigates implementing RL within existing MPC frameworks. The selection of MPC for combination with RL is not arbitrary: two specific aspects of MPC make such a combination advantageous, namely the use of a value function and the use of a model. Direct use of a value function as the MPC objective (VF-MPC) preserves the structure of the MPC policy in terms of constraint formulation, but may restrict the choice of optimization solver, depending on the form of the value function approximation. The model in MPC is also useful in that solving for the optimal trajectory provides a projected view of the expected reward; while this projection can be inaccurate, since it depends on the current value function, it can accelerate learning.
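To make this structure concrete, the sketch below poses a small VF-MPC problem on the discrete-time double integrator (one of the case studies), with the learned value function bootstrapping the end of the prediction horizon. This is only one common realization of the idea; the quadratic stand-in for the learned value function, the stage cost, the horizon, the discounting, and the use of scipy.optimize.minimize are assumptions made so the sketch runs, and the formulation actually used in this work may differ (the critic here is an NN, whose form constrains the admissible solvers).

```python
import numpy as np
from scipy.optimize import minimize

# Discrete-time double integrator (one of the case studies): x = [position, velocity].
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])

# Stand-in for the learned value function V(x): a fixed quadratic used here only so
# the sketch runs; in the paper the critic is an NN.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def value(x):
    return float(x @ P @ x)

def stage_cost(x, u):
    # Illustrative quadratic stage cost (negative reward).
    return float(x @ x + 0.1 * u**2)

def rollout(x0, u_seq):
    # Propagate the controller model along a candidate input sequence.
    x, traj = np.asarray(x0, float), []
    for u in u_seq:
        traj.append(x)
        x = A @ x + (B * u).ravel()
    traj.append(x)
    return traj

def vf_mpc_objective(u_seq, x0, gamma=0.99):
    # Predicted return over the horizon: discounted stage costs plus the learned
    # value function evaluated at the end of the prediction horizon.
    traj = rollout(x0, u_seq)
    cost = sum(gamma**k * stage_cost(traj[k], u) for k, u in enumerate(u_seq))
    return cost + gamma**len(u_seq) * value(traj[-1])

def solve_vf_mpc(x0, horizon=10, u_max=1.0):
    # The constraint structure of the MPC policy is preserved (input bounds here).
    res = minimize(vf_mpc_objective, np.zeros(horizon), args=(x0,),
                   bounds=[(-u_max, u_max)] * horizon)
    return res.x[0], res.x   # apply the first move (receding horizon)

u0, u_plan = solve_vf_mpc(np.array([1.0, 0.0]))
```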
Most current RL algorithms for continuous systems use NNs as the basis function, which raises potential issues for the second-order optimization methods used in the VF-MPC algorithm. Most continuous RL algorithms circumvent this issue by adopting an actor-critic structure, such as DDPG [7] or TD3 [8]. This work develops an approach for adapting the actor-critic structure to the value function while maintaining the advantages of MPC. In addition, a common shortcoming of most RL algorithms is their sensitivity to their own learning parameters; for the VF-MPC algorithm, the most significant parameter is the length of the learned projection. This work therefore also develops an approach for meta-learning the optimal projection length for a given case study while simultaneously deriving an effective value function. The resulting algorithm is evaluated on the classic double integrator and on an industrial selective catalytic reduction (SCR) unit, a time-varying, time-delay system on which traditional MPC performs poorly [2].
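One simple way to realize the projected learning target and the meta-learning of the projection length is sketched below. The function names, the discounting convention, the candidate set of projection lengths, and the squared-error scoring rule are illustrative assumptions; the meta-learning scheme actually developed in this work may differ.

```python
import numpy as np

def n_step_return(x0, u_plan, model_step, reward, value_fn, n, gamma=0.99):
    # n-step return projected along the MPC's optimized input sequence: the
    # controller model supplies the future states, so the target looks n steps
    # ahead without waiting for n plant samples.
    x, ret = np.asarray(x0, float), 0.0
    for k in range(n):
        ret += gamma**k * reward(x, u_plan[k])
        x = model_step(x, u_plan[k])
    return ret + gamma**n * value_fn(x)

def select_projection_length(batch, model_step, reward, value_fn,
                             candidates=(1, 2, 5, 10), gamma=0.99):
    # Crude meta-learning of the projection length: score each candidate n by how
    # closely its bootstrapped target matches the observed discounted return G on
    # logged (x0, u_plan, G) tuples, and keep the best-scoring n.
    def score(n):
        return np.mean([(n_step_return(x0, u, model_step, reward, value_fn, n, gamma) - G)**2
                        for x0, u, G in batch])
    return min(candidates, key=score)
```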
In line with these points, the main contributions of this work are:
- The proposal of a combination of RL and MPC in which the learned value function is used as the cost function of the MPC. This formulation yields an MPC that is optimal with respect to the reward function.
- The use of the optimized trajectory from the MPC to accelerate learning, together with an analysis of how the search depth along the trajectory affects the rate of learning.
- The proposal of two algorithms employing these concepts: VFMPC(0), which learns the cost function from the one-step return, and VFMPC(n), which learns from the n-step return along the optimal trajectory subject to the dynamics of the controller model (a sketch of both targets follows this list).
- The performance of these algorithms is demonstrated on two process control examples.
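For concreteness, a minimal sketch of the two learning targets and a single critic update is given below. A linear-in-features critic is used only to keep the sketch short (the paper's critic is an NN), and the names are illustrative; the VFMPC(n) target would reuse the n_step_return projection from the earlier sketch.

```python
import numpy as np

def td_target_vfmpc0(r, x_next, value_fn, gamma=0.99):
    # VFMPC(0) target: one-step return built from the measured transition (r, x_next).
    return r + gamma * value_fn(x_next)

def critic_update(theta, features, x, target, lr=1e-3):
    # One semi-gradient step on a linear-in-features critic V_theta(x) = theta . phi(x).
    # For VFMPC(n), `target` would instead be the n-step return computed along the
    # MPC's optimized trajectory (n_step_return in the earlier sketch).
    phi = features(x)
    td_error = target - float(theta @ phi)
    return theta + lr * td_error * phi

# Hypothetical usage on one measured transition (x, r, x_next):
# theta = critic_update(theta, features, x, td_target_vfmpc0(r, x_next, value_fn))
```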
[1] O. Dogru et al., "Reinforcement Learning in Process Industries: Review and Perspective," IEEE/CAA J. Autom. Sin., vol. 11, no. 2, pp. 1–19, 2024, doi: 10.1109/JAS.2024.124227.
[2] E. Hedrick, K. Hedrick, D. Bhattacharyya, S. E. Zitney, and B. Omell, "Reinforcement learning for online adaptation of model predictive controllers: Application to a selective catalytic reduction unit," Comput. Chem. Eng., vol. 160, p. 107727, 2022, doi: 10.1016/j.compchemeng.2022.107727.
[3] S. Gros and M. Zanon, "Data-driven economic NMPC using reinforcement learning," IEEE Trans. Automat. Contr., vol. 65, no. 2, pp. 636–648, Feb. 2020, doi: 10.1109/TAC.2019.2913768.
[4] Y. Yang and S. Lucia, "Multi-step greedy reinforcement learning based on model predictive control," IFAC-PapersOnLine, vol. 54, no. 3, pp. 699–705, 2021, doi: 10.1016/j.ifacol.2021.08.323.
[5] X. Pan, X. Chen, Q. Zhang, and N. Li, "Model Predictive Control: A Reinforcement Learning-based Approach," J. Phys. Conf. Ser., vol. 2203, no. 1, p. 012058, 2022, doi: 10.1088/1742-6596/2203/1/012058.
[6] R. Nian, J. Liu, and B. Huang, "A review on reinforcement learning: Introduction and applications in industrial process control," Comput. Chem. Eng., vol. 139, p. 106886, Aug. 2020, doi: 10.1016/j.compchemeng.2020.106886.
[7] T. P. Lillicrap et al., "Continuous control with deep reinforcement learning," in 4th Int. Conf. Learn. Represent. (ICLR 2016) - Conf. Track Proc., 2016. [Online]. Available: http://arxiv.org/abs/1509.02971
[8] S. Fujimoto, H. van Hoof, and D. Meger, "Addressing Function Approximation Error in Actor-Critic Methods," in 35th International Conference on Machine Learning (ICML 2018), 2018, pp. 2587–2601. [Online]. Available: http://arxiv.org/abs/1802.09477