
(543f) Offset-Free Deep Deterministic Policy Gradient with Lyapunov Learning Penalties

Authors 

Hedrick, E. - Presenter, West Virginia University
Hedrick, K., West Virginia University
Bhattacharyya, D., West Virginia University
Zitney, S., National Energy Technology Laboratory
Omell, B. P., National Energy Technology Laboratory
Recent advances in deep learning have expanded the range of applications and the efficacy of actor-critic reinforcement learning (RL) algorithms [1]–[3]. However, significant challenges remain in applying these approaches to automatic process control, most notably sample inefficiency and the lack of performance guarantees. While deep networks can approximate generic functions very well, the large number of parameters in these networks (especially where network architectures are relatively limited) requires many samples to achieve satisfactory performance. Further, the structure of the actor network can be limiting in that, while “good” performance may be achieved, neither stability nor elimination of offset is guaranteed. This work proposes methods to address both issues.

To address the problem of offset-free control, a two-policy approach is proposed in which it is assumed that, close to the origin, a linear state-feedback controller exists that will drive the states to zero. Farther from the origin, it is assumed that a fully parameterized control policy (i.e., a neural network generating input moves) will drive the states near enough to the origin that offset can then be eliminated. To retain the model-free nature of the approach, it is assumed only that such a feedback policy exists; its gains are unknown and are learned by a second RL agent. This approach is applied to linear and nonlinear examples, where learning is carried out in episodes starting from random states. The learning method is deep deterministic policy gradient (DDPG), in which deep networks approximate both the action-value function and the optimal policy [4]; two such sets of networks are used in the proposed approach. After learning, it is shown that the proposed approach eliminates offset in both systems.
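As an illustration of how the two policies could be combined in a single actor, the following Python sketch blends a learned linear state-feedback gain with a neural-network policy according to the distance of the state from the origin. The switching radius EPS, the network sizes, and the class name TwoPolicyActor are illustrative assumptions and are not specified in the abstract.

```python
# Minimal sketch of a two-policy actor for offset-free control.
# Assumptions (not from the abstract): a switching radius EPS on the
# state norm and a tanh MLP for the outer (nonlinear) policy.
import torch
import torch.nn as nn

EPS = 0.1  # assumed switching radius around the origin


class TwoPolicyActor(nn.Module):
    def __init__(self, n_states, n_inputs, hidden=64):
        super().__init__()
        # Outer policy: fully parameterized network, intended to drive
        # the state into a neighborhood of the origin.
        self.outer = nn.Sequential(
            nn.Linear(n_states, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_inputs), nn.Tanh(),
        )
        # Inner policy: linear state feedback u = -K x, with K treated
        # as a learnable parameter (learned by a second RL agent in the
        # proposed approach rather than computed from a model).
        self.K = nn.Parameter(torch.zeros(n_inputs, n_states))

    def forward(self, x):
        inner = -(x @ self.K.T)           # linear feedback near the origin
        outer = self.outer(x)             # nonlinear policy elsewhere
        near = (x.norm(dim=-1, keepdim=True) < EPS).float()
        return near * inner + (1.0 - near) * outer


# Example: evaluate the combined policy on a batch of random states.
actor = TwoPolicyActor(n_states=4, n_inputs=2)
u = actor(torch.randn(8, 4))
print(u.shape)  # torch.Size([8, 2])
```

In the proposed approach the gain matrix and the outer network would each be updated by its own DDPG agent against the learned action-value function; the module above only illustrates how the two policies can be evaluated together.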

To address the problem of sample efficiency, relevant work exists in the inverse (I-)RL and apprenticeship learning literature [5], [6]. However, most work in this area also aims to generate parameterized reward functions, which introduces significant complications. This level of complexity may not be necessary where well-posed value functions (e.g., quadratic penalties) are already defined. Further, it is assumed for the purposes of this work that an appropriate controller for the plant, whether PID or another simple controller, exists or can easily be generated. In this way, the value function can be trained on the reward profile of the existing controller, while the policy, rather than being trained to maximize reward, can be trained to approximate the current control policy. These networks can then be used for initialization when learning is initiated on the true plant, potentially allowing for less exploration, faster convergence, or some combination of the two. Results are presented for the application of this approach to several energy and chemical systems and compared with naïve initialization of the same network structures.
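A minimal sketch of this pretraining step is given below, assuming closed-loop data (states, actions, and discounted quadratic returns) have already been collected under the existing PID or other simple controller. The function name pretrain, the optimizer, and the training details are illustrative assumptions, not the authors' implementation.

```python
# Sketch of pretraining DDPG networks from closed-loop data collected
# under an existing simple controller, before learning on the true plant.
import torch
import torch.nn as nn


def pretrain(actor, critic, states, actions, returns, epochs=200, lr=1e-3):
    """states: (N, n_x) and actions: (N, n_u) logged under the existing
    controller; returns: (N, 1) discounted quadratic rewards observed
    along the closed-loop trajectories."""
    opt_a = torch.optim.Adam(actor.parameters(), lr=lr)
    opt_c = torch.optim.Adam(critic.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        # Policy: imitate the existing controller (behavior cloning)
        # rather than maximizing the critic at this stage.
        opt_a.zero_grad()
        mse(actor(states), actions).backward()
        opt_a.step()
        # Value function: regress Q(s, a) onto the reward profile
        # generated by the existing controller.
        opt_c.zero_grad()
        mse(critic(torch.cat([states, actions], dim=-1)), returns).backward()
        opt_c.step()
    return actor, critic


# Example usage with simple MLPs and synthetic data standing in for
# logged PID closed-loop trajectories.
n_x, n_u, N = 4, 2, 256
actor = nn.Sequential(nn.Linear(n_x, 64), nn.ReLU(),
                      nn.Linear(64, n_u), nn.Tanh())
critic = nn.Sequential(nn.Linear(n_x + n_u, 64), nn.ReLU(),
                       nn.Linear(64, 1))
states, actions = torch.randn(N, n_x), torch.randn(N, n_u)
returns = -states.pow(2).sum(dim=-1, keepdim=True)  # stand-in quadratic penalty
pretrain(actor, critic, states, actions, returns)
```

The pretrained actor and critic would then serve as the initial networks for DDPG learning on the true plant, in place of random initialization.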

Bibliography

[1] H. Yoo, B. Kim, J. W. Kim, and J. H. Lee, “Reinforcement learning based optimal control of batch processes using Monte-Carlo deep deterministic policy gradient with phase segmentation,” Comput. Chem. Eng., vol. 144, Jan. 2021, doi: 10.1016/j.compchemeng.2020.107133.

[2] M. Zanon and S. Gros, “Safe Reinforcement Learning Using Robust MPC,” IEEE Trans. Automat. Contr., vol. 66, no. 8, pp. 3638–3652, Aug. 2021, doi: 10.1109/TAC.2020.3024161.

[3] D. Silver et al., “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, Oct. 2017, doi: 10.1038/nature24270.

[4] T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” arXiv, Sep. 2015.

[5] P. Abbeel, “Inverse Reinforcement Learning,” SpringerReference, 2012, doi: 10.1007/springerreference_179129.

[6] M. Mowbray, R. Smith, E. A. Del Rio-Chanona, and D. Zhang, “Using process data to generate an optimal control policy via apprenticeship and reinforcement learning,” AIChE J., pp. 1–15, 2021, doi: 10.1002/aic.17306.