(243d) Learning and Adapting Model Predictive Controllers with Reinforcement Learning for Time-Varying Systems
2019 AIChE Annual Meeting
Computing and Systems Technology Division
Advances in Process Control
Tuesday, November 12, 2019 - 8:57am to 9:16am
RL entails a Markov decision process (MDP) whereby an actor applies an input to a system and a critic determines a reward based on the resulting state [2]. The use of RL with MPC in the presence of parametric uncertainty for a linear system has been addressed in [3]. To exploit evolving information about future uncertainty, an RL approach has been proposed that uses the learned cost-to-go as the terminal penalty in an MPC [4]. However, one of the critical issues that can lead to poor MPC performance is model discrepancy. In this work, we propose a novel RL algorithm for learning as well as adapting the MPC.
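The sketch below illustrates the actor-critic MDP loop described above. The `env`, `actor`, and `critic` objects and their methods (`reset`/`step`, `act`, `value`/`update`) are hypothetical placeholders for illustration, not the interfaces used in this work.

```python
def run_episode(env, actor, critic, gamma=0.99, alpha=0.05, n_steps=100):
    """Actor applies an input, the system transitions to a new state,
    and the critic observes the reward and refines its value estimate
    with a temporal-difference (TD) update."""
    x = env.reset()
    for _ in range(n_steps):
        u = actor.act(x)                      # control input chosen by the actor (policy)
        x_next, r = env.step(u)               # new state and reward from the system
        td = r + gamma * critic.value(x_next) - critic.value(x)
        critic.update(x, alpha * td)          # nudge the value estimate toward the TD target
        x = x_next
    return critic
```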
Q-learning, one of the RL methods, can be used to learn the state dynamics and the value function [5]. However, adapting the MPC model based on the Q-function is impractical for time-varying systems, since the space of Q-functions is infinite-dimensional in that setting. Here we use a BSS-ANOVA Gaussian process (GP) in which the eigenfunctions of the Karhunen-Loève (KL) expansion serve as the orthogonal basis functions. A key advantage of using a KL expansion for the GP discrepancy model is that the stochasticity is carried entirely by the discrepancy parameters, since the basis functions of each functional component do not change with the covariance function parameters; this translates to reduced computational cost. Residual analysis of the Bellman optimality equation, together with policy gradient and actor-critic methods, is then used for model adaptation. The value functions and the policy, as a map of control actions, are stored in compact clusters using a subtractive clustering technique for unsupervised learning of unique, or core, control features. Cores are automatically updated as new information is gathered. The algorithm also includes directed exploration methods that add an intrinsic reward to the original reward, ensuring that the infinite-horizon cost function converges to the exact cost-to-go function as the discrepancy vanishes. Feasibility and optimality conditions of the proposed algorithm are also analyzed.
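To illustrate the structure of such a discrepancy model, the sketch below writes the discrepancy as a truncated KL expansion over a fixed orthogonal basis. The sine eigenfunctions are a stand-in (the Brownian-motion KL basis on [0, 1]), not the BSS-ANOVA basis used in this work, and `fit_theta` is a simple least-squares surrogate for the Bayesian update of the discrepancy parameters.

```python
import numpy as np

def kl_basis(x, n_terms=5):
    """Stand-in orthogonal KL eigenfunctions on [0, 1]:
    phi_k(x) = sqrt(2) * sin((k - 1/2) * pi * x)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k = np.arange(1, n_terms + 1)
    return np.sqrt(2.0) * np.sin((k - 0.5) * np.pi * x[:, None])

def discrepancy(x, theta):
    """delta(x) = sum_k theta_k * phi_k(x): the basis is fixed, so all
    stochasticity lives in the coefficients theta."""
    return kl_basis(x, n_terms=len(theta)) @ theta

def fit_theta(x_data, residuals, n_terms=5):
    """Least-squares estimate of theta from plant-minus-model residuals."""
    Phi = kl_basis(x_data, n_terms)
    theta, *_ = np.linalg.lstsq(Phi, residuals, rcond=None)
    return theta
```

Because the basis functions are fixed, only the coefficients need to be re-estimated as new plant data arrive, which is what keeps the adaptation computationally cheap.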
The algorithm developed here is applied to the load-following problem in the operation of a supercritical pulverized coal (SCPC) power plant. One of the critical control problems in this application is main steam temperature control under load changes. Due to sliding-pressure operation, the significant nonlinearity of steam properties over the operating domain, and evolving ash buildup on the tubes, this system is a time-varying nonlinear system. The proposed RL-augmented MPC algorithm is evaluated using a high-fidelity dynamic model of the SCPC plant [6].
Bibliography
[1] S. J. Qin and T. A. Badgwell, "A survey of industrial model predictive control technology," Control Eng. Pract., vol. 11, no. 7, pp. 733–764, Jul. 2003.
[2] T. A. Badgwell, J. H. Lee, and K.-H. Liu, "Reinforcement Learning – Overview of Recent Progress and Implications for Process Control," in Computer Aided Chemical Engineering, vol. 44, M. R. Eden, M. G. Ierapetritou, and G. P. Towler, Eds. Elsevier, 2018, pp. 71–85.
[3] J. E. Morinelly and B. E. Ydstie, "Dual MPC with Reinforcement Learning," 11th IFAC Symp. Dyn. Control Process Syst. Biosyst. DYCOPS-CAB 2016, vol. 49, no. 7, pp. 266–271, Jan. 2016.
[4] J. Lee and W. Wong, "Approximate Dynamic Programming Approach for Process Control," Journal of Process Control, vol. 20, pp. 1038–1048, 2010.
[5] C. Watkins, "Learning from Delayed Rewards," Ph.D. dissertation, University of Cambridge, Cambridge, U.K., 1989.
[6] P. Sarda, E. Hedrick, K. Reynolds, D. Bhattacharyya, E. S. Zitney, and B. Omell, "Development of a Dynamic Model and Control System for Load-Following Studies of Supercritical Pulverized Coal Power Plants," Processes, vol. 6, no. 11, 2018.