(46g) Reinforcement Learning Augmented Model Predictive Control with Fuzzy Subtractive Clustering and Fuzzy Cognitive Mapping
AIChE Annual Meeting
2020
2020 Virtual AIChE Annual Meeting
Topical Conference: Next-Gen Manufacturing
Artificial Intelligence and Advanced Computation I
Monday, November 16, 2020 - 9:15am to 9:30am
In this work, a novel adaptive model predictive controller (MPC) augmented with reinforcement learning (RL) is proposed, in which a supervisory RL agent learns an optimal tuning policy from the observed performance of the MPC, improving through exploration and exploitation. However, for a general class of nonlinear systems the state space is infinite-dimensional, and the aggregation of infinitely many states renders the learning problem computationally intractable.
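As a minimal illustration of such a supervisory loop, the Python sketch below has an agent pick an MPC tuning weight by an ε-greedy rule and update a bandit-style value estimate from the resulting closed-loop cost. The weight grid, the update rule, and `run_mpc_episode` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Sketch of a supervisory RL tuner for an MPC: the agent picks a tuning
# weight, observes the closed-loop cost over one evaluation window, and
# updates a bandit-style value estimate with an epsilon-greedy rule.
# `run_mpc_episode` is a placeholder for a real closed-loop MPC rollout.

WEIGHT_GRID = [0.1, 1.0, 10.0]      # candidate MPC tuning weights (assumed)
Q = np.zeros(len(WEIGHT_GRID))      # value estimate per tuning action
alpha, epsilon = 0.1, 0.2           # step size and exploration rate
rng = np.random.default_rng(0)

def run_mpc_episode(weight):
    # Placeholder: pretend cost is minimized at weight = 1.0, plus noise.
    return (weight - 1.0) ** 2 + 0.1 * rng.standard_normal()

for episode in range(200):
    if rng.random() < epsilon:                  # explore
        a = rng.integers(len(WEIGHT_GRID))
    else:                                       # exploit
        a = int(np.argmax(Q))
    reward = -run_mpc_episode(WEIGHT_GRID[a])   # lower cost, higher reward
    Q[a] += alpha * (reward - Q[a])             # incremental value update

print("learned best weight:", WEIGHT_GRID[int(np.argmax(Q))])
```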
Here we propose to reduce the size of the state space via a clustering approach. First, an updated value function approximation is constructed in a Bayesian Smoothing Spline (BSS)-ANOVA framework, where eigenfunctions of the Karhunen-Loève (KL) expansion serve as basis functions. Fuzzy Subtractive Clustering is then used to partition the state space in a control-centric way [9]. Clustering is performed in a state-optimal-action space, where the optimality of an action is evaluated with respect to cluster-specific basis functions. The state of the system is determined using a Fuzzy Cognitive Map (FCM) built on the existing cluster centers. This fuzzy clustering can be viewed as an adaptive mapping of states and actions in which cluster aggregation is minimized using an information-theoretic criterion, and the FCM exploits stored memory to compute the optimal move. In this algorithm, policy improvement is performed using a TD(λ) update [10], as sketched below.
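The TD(λ) update with a linear value approximation over a fixed basis follows the standard form in [10]. In the sketch below, a toy polynomial feature map stands in for the KL eigenfunction basis, and the trajectory is a placeholder:

```python
import numpy as np

# Sketch of TD(lambda) with accumulating eligibility traces and a linear
# value approximation V(s) = w . phi(s), following Sutton & Barto [10].
# phi() is a toy feature map; in the paper the features would be
# eigenfunctions of the KL expansion in the BSS-ANOVA framework.

def phi(s, n_features=5):
    # Toy polynomial features standing in for the KL eigenfunction basis.
    return np.array([s ** k for k in range(n_features)])

alpha, gamma, lam = 0.05, 0.95, 0.8
w = np.zeros(5)                      # value-function weights
z = np.zeros(5)                      # eligibility trace

# Placeholder trajectory of (state, reward, next_state) transitions.
trajectory = [(0.1, 1.0, 0.2), (0.2, 0.5, 0.4), (0.4, -0.2, 0.3)]

for s, r, s_next in trajectory:
    delta = r + gamma * w @ phi(s_next) - w @ phi(s)   # TD error
    z = gamma * lam * z + phi(s)                       # decay + accumulate
    w += alpha * delta * z                             # TD(lambda) update
```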
For fast and efficient learning, a two-stage approach is proposed: in the first stage the agent explores while learning on a reduced model; in the second stage it exploits the learned policy over the control space. This enables use of the RL technique for large-scale nonlinear systems with an acceptable optimality gap. Performance of the proposed controller is benchmarked against the statically tuned underlying MPC on several applications. Of particular importance is the application of the proposed approach to a time-varying time-delay system.
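A hypothetical outline of the two-stage schedule, with every object an illustrative stub rather than the authors' code, might look like:

```python
# Hypothetical two-stage learning schedule: broad exploration on a cheap
# reduced-order model first, then near-greedy exploitation on the full
# plant. `learn_episode` is a stub for one closed-loop learning episode.

def learn_episode(agent, model):
    # Stub: run one closed-loop episode on `model` and update `agent`.
    pass

def two_stage_training(agent, reduced_model, full_model,
                       n_explore=500, n_exploit=100):
    # Stage 1: explore aggressively where simulation is inexpensive.
    agent.epsilon = 0.5
    for _ in range(n_explore):
        learn_episode(agent, reduced_model)
    # Stage 2: mostly exploit the learned policy on the full system.
    agent.epsilon = 0.05
    for _ in range(n_exploit):
        learn_episode(agent, full_model)
    return agent
```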
[1] J. Shin, T. A. Badgwell, K.-H. Liu, and J. H. Lee, "Reinforcement Learning – Overview of recent progress and implications for process control," Comput. Chem. Eng., vol. 127, pp. 282–294, Aug. 2019, doi: 10.1016/j.compchemeng.2019.05.029.
[2] J. B. Rawlings and C. T. Maravelias, "Bringing new technologies and approaches to the operation and control of chemical process systems," AIChE J., vol. 65, no. 6, Jun. 2019, doi: 10.1002/aic.16615.
[3] L. A. Brujeni, J. M. Lee, and S. L. Shah, "Dynamic tuning of PI-controllers based on model-free Reinforcement Learning methods," in ICCAS 2010, Oct. 2010, pp. 453–458, doi: 10.1109/ICCAS.2010.5669655.
[4] I. Carlucho, M. De Paula, S. A. Villar, and G. G. Acosta, "Incremental Q-learning strategy for adaptive PID control of mobile robots," Expert Syst. Appl., vol. 80, pp. 183–199, Sep. 2017, doi: 10.1016/j.eswa.2017.03.002.
[5] J. E. Morinelly and B. E. Ydstie, "Dual MPC with Reinforcement Learning," 11th IFAC Symp. Dyn. Control Process Syst. Biosyst. DYCOPS-CAB 2016, vol. 49, no. 7, pp. 266–271, Jan. 2016, doi: 10.1016/j.ifacol.2016.07.276.
[6] J. Shin and J. H. Lee, "Multi-timescale, multi-period decision-making model development by combining reinforcement learning and mathematical programming," Comput. Chem. Eng., vol. 121, pp. 556–573, Feb. 2019, doi: 10.1016/j.compchemeng.2018.11.020.
[7] P. Petsagkourakis, I. O. Sandoval, E. Bradford, D. Zhang, and E. A. del Rio-Chanona, "Reinforcement learning for batch bioprocess optimization," Comput. Chem. Eng., vol. 133, p. 106649, Feb. 2020, doi: 10.1016/j.compchemeng.2019.106649.
[8] J. W. Kim, B. J. Park, H. Yoo, T. H. Oh, J. H. Lee, and J. M. Lee, "A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system," J. Process Control, vol. 87, pp. 166–178, Mar. 2020, doi: 10.1016/j.jprocont.2020.02.003.
[9] Q. Gao, G. Feng, Z. Xi, Y. Wang, and J. Qiu, "A New Design of Robust $H_{\infty}$ Sliding Mode Control for Uncertain Stochastic T-S Fuzzy Time-Delay Systems," IEEE Trans. Cybern., vol. 44, no. 9, pp. 1556–1566, Sep. 2014, doi: 10.1109/TCYB.2013.2289923.
[10] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA: MIT Press, 2018.