(46g) Reinforcement Learning Augmented Model Predictive Control with Fuzzy Subtractive Clustering and Fuzzy Cognitive Mapping

Authors 

Hedrick, E. - Presenter, West Virginia University
Reynolds, K., West Virginia University
Dwivedy, V., West Virginia University
Bhattacharyya, D., West Virginia University
Zitney, S., National Energy Technology Laboratory
Omell, B. P., National Energy Technology Laboratory
Reinforcement learning (RL) is a model-free machine learning framework in which an agent learns by interacting directly with a system, as opposed to learning from labeled data sets as in supervised learning. RL can be applied to process control [1], but there are considerable opportunities for improving it and for expanding its application areas [2]. RL has been shown to outperform traditional controllers by accounting for plant-model mismatch [3], especially for systems with complex dynamics [4]. The potential of RL to improve the approximation of uncertain system dynamics has also been illustrated, and the approach was shown to converge to a solution with a small optimality gap for the discrete linear case [5]. Applications to decision making and process optimization have also been demonstrated [6], [7]. Recent work has applied deep neural networks to represent the Hamilton-Jacobi-Bellman equation over a large state space for an optimal control problem [8]. Research on RL for practical process control applications is thus diverse and growing, but significant work remains in resolving key issues and exploiting all of the features of RL.

In this work, a novel adaptive model predictive controller (MPC) augmented with RL is proposed, in which a supervisory RL agent learns an optimal tuning policy from the performance of the MPC and improves that policy through exploration and exploitation. However, for a general class of nonlinear systems the state space is infinite dimensional, leading to computational intractability due to the need to aggregate infinitely many states.
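To make the supervisory structure concrete, the sketch below (not the authors' implementation) shows an RL agent choosing among candidate MPC tuning weights and updating its value estimates from the resulting closed-loop reward; the candidate weight grid, the reward surrogate, and all function names are hypothetical placeholders standing in for the real plant/MPC simulation.

```python
import numpy as np

# Hypothetical discrete action set: candidate MPC move-suppression weights.
CANDIDATE_WEIGHTS = [0.01, 0.1, 1.0, 10.0]

def run_mpc_episode(weight, rng):
    """Stand-in for a closed-loop MPC simulation: returns a scalar reward
    (e.g., negative integral squared error) for the chosen tuning weight.
    A toy quadratic surrogate with noise replaces the real plant here."""
    return -(np.log10(weight)) ** 2 + 0.1 * rng.standard_normal()

def learn_tuning_policy(n_episodes=200, alpha=0.1, eps=0.2, seed=0):
    """Bandit-style sketch of the supervisory agent: explore/exploit over the
    candidate tuning weights, keeping an incremental value estimate per action."""
    rng = np.random.default_rng(seed)
    q = np.zeros(len(CANDIDATE_WEIGHTS))          # value estimate per tuning action
    for _ in range(n_episodes):
        if rng.random() < eps:                    # explore
            a = int(rng.integers(len(CANDIDATE_WEIGHTS)))
        else:                                     # exploit
            a = int(np.argmax(q))
        r = run_mpc_episode(CANDIDATE_WEIGHTS[a], rng)
        q[a] += alpha * (r - q[a])                # incremental value update
    return CANDIDATE_WEIGHTS[int(np.argmax(q))]

print(learn_tuning_policy())   # best candidate weight under the toy surrogate
```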

Here we propose to reduce the size of the state space via a clustering approach. First, an updated value function approximation is considered, using a Bayesian Smoothing Spline (BSS)-ANOVA framework in which eigenfunctions of the Karhunen-Loève (KL) expansion are used as basis functions. Fuzzy Subtractive Clustering is then used to partition the state space in a control-centric way [9]. Clustering is performed in a state-optimal-action space, where the optimality of an action is evaluated with respect to cluster-specific basis functions. The state of the system is determined using a Fuzzy Cognitive Map (FCM) based on the existing cluster centers. This fuzzy clustering can be considered an adaptive mapping of the states and actions in which cluster aggregation is minimized using an information-theoretic criterion. The FCM exploits stored memory for computing the optimal move. In this algorithm, policy improvement is performed using a TD(λ) update [10].
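For context, a generic TD(λ) step with a linear basis-function value approximation, the standard form described in [10], is sketched below; the polynomial feature map is an arbitrary placeholder, not the cluster-specific BSS-ANOVA/KL eigenfunctions used in the actual algorithm.

```python
import numpy as np

def basis(state):
    """Placeholder feature map (simple polynomial features). In the proposed
    method these would instead be cluster-specific eigenfunctions of the
    Karhunen-Loève expansion within the BSS-ANOVA framework."""
    s = np.atleast_1d(np.asarray(state, dtype=float))
    return np.concatenate(([1.0], s, s ** 2))

def td_lambda_step(w, z, s, r, s_next, alpha=0.05, gamma=0.95, lam=0.8):
    """One TD(lambda) update with accumulating eligibility traces for a
    linear value approximation V(s) ≈ w·phi(s), following [10]."""
    phi, phi_next = basis(s), basis(s_next)
    delta = r + gamma * np.dot(w, phi_next) - np.dot(w, phi)   # TD error
    z = gamma * lam * z + phi                                   # eligibility trace
    w = w + alpha * delta * z                                   # weight update
    return w, z
```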

For fast and efficient learning, a two-stage learning approach is proposed in which exploration in the first stage, where the agent learns on a reduced model, is followed by exploitation of the control space in the second stage. The approach enables use of the RL technique for large-scale nonlinear systems with an acceptable optimality gap. Performance of the proposed controller is benchmarked against the static underlying MPC for several applications. Of particular importance is the application of the proposed approach to a time-varying time-delay system.
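A minimal sketch of such a two-stage schedule, under assumed (purely illustrative) stage lengths and exploration rates, might look as follows:

```python
def two_stage_schedule(episode, n_explore=100):
    """Illustrative two-stage schedule: heavy exploration on a reduced model
    first, then near-greedy exploitation on the full system. The split point,
    epsilon values, and model labels are placeholders, not the settings used
    in this work."""
    if episode < n_explore:
        return {"model": "reduced", "epsilon": 0.5}   # stage 1: cheap exploration
    return {"model": "full", "epsilon": 0.05}         # stage 2: exploitation
```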

[1] J. Shin, T. A. Badgwell, K.-H. Liu, and J. H. Lee, “Reinforcement Learning – Overview of recent progress and implications for process control,” Comput. Chem. Eng., vol. 127, pp. 282–294, Aug. 2019, doi: 10.1016/j.compchemeng.2019.05.029.

[2] J. B. Rawlings and C. T. Maravelias, “Bringing new technologies and approaches to the operation and control of chemical process systems,” AIChE J., vol. 65, no. 6, Jun. 2019, doi: 10.1002/aic.16615.

[3] L. A. Brujeni, J. M. Lee, and S. L. Shah, “Dynamic tuning of PI-controllers based on model-free Reinforcement Learning methods,” in ICCAS 2010, Oct. 2010, pp. 453–458, doi: 10.1109/ICCAS.2010.5669655.

[4] I. Carlucho, M. De Paula, S. A. Villar, and G. G. Acosta, “Incremental Q-learning strategy for adaptive PID control of mobile robots,” Expert Syst. Appl., vol. 80, pp. 183–199, Sep. 2017, doi: 10.1016/j.eswa.2017.03.002.

[5] J. E. Morinelly and B. E. Ydstie, “Dual MPC with Reinforcement Learning,” 11th IFAC Symp. Dyn. Control Process Syst. Biosyst. DYCOPS-CAB 2016, vol. 49, no. 7, pp. 266–271, Jan. 2016, doi: 10.1016/j.ifacol.2016.07.276.

[6] J. Shin and J. H. Lee, “Multi-timescale, multi-period decision-making model development by combining reinforcement learning and mathematical programming,” Comput. Chem. Eng., vol. 121, pp. 556–573, Feb. 2019, doi: 10.1016/j.compchemeng.2018.11.020.

[7] P. Petsagkourakis, I. O. Sandoval, E. Bradford, D. Zhang, and E. A. del Rio-Chanona, “Reinforcement learning for batch bioprocess optimization,” Comput. Chem. Eng., vol. 133, p. 106649, Feb. 2020, doi: 10.1016/j.compchemeng.2019.106649.

[8] J. W. Kim, B. J. Park, H. Yoo, T. H. Oh, J. H. Lee, and J. M. Lee, “A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system,” J. Process Control, vol. 87, pp. 166–178, Mar. 2020, doi: 10.1016/j.jprocont.2020.02.003.

[9] Q. Gao, G. Feng, Z. Xi, Y. Wang, and J. Qiu, “A New Design of Robust H∞ Sliding Mode Control for Uncertain Stochastic T-S Fuzzy Time-Delay Systems,” IEEE Trans. Cybern., vol. 44, no. 9, pp. 1556–1566, Sep. 2014, doi: 10.1109/TCYB.2013.2289923.

[10] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA: MIT Press, 2018.