(65g) Exploring The Q-Learning Approach For Direct Controller Synthesis
AIChE 2007 Annual Meeting
Computing and Systems Technology Division
Advances in Process Control - I
Monday, November 5, 2007 - 2:30pm to 2:50pm
The combination of safety and economic constraints makes closed-loop system identification and subsequent controller design attractive to industrial control engineers. However, it is not always clear how errors incurred during the identification step will affect controller performance. In this sense, it may be advantageous to treat identification and controller design in an integrated manner.
In view of this, we investigate a reinforcement learning method called 'Q-learning' for addressing model identification and controller design synergistically. This can be viewed as a form of direct adaptive optimal control; theoretical results for Linear Quadratic Regulation (LQR) were first derived by Bradtke et al. (1994).
Traditionally belonging to the realm of artificial intelligence, reinforcement learning is concerned with finding an optimal control policy via the decision-maker's interactions with its environment. Q-learning is an iterative instance of this paradigm with the added advantage that the true model of the environment need not be known. Central to this is the Q(x, u) function, which is parameterized by the current policy in place and maps a state (x)-action (u) pair to a scalar reflecting the value of taking that action in that state. More importantly, Q(.,.) directly dictates the corresponding policy. In the control context, controller design is separated into stages: at each stage, the learnt Q function is used to generate a policy no worse than the one currently in place, and the process converges to the optimal policy.
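To make the staged procedure concrete, the following is a minimal NumPy sketch of policy-iteration Q-learning for the LQR case, in the spirit of Bradtke et al. (1994). All names and numerical choices here (quad_basis, n_samples, the exploration level, etc.) are our own illustrative assumptions; the plant matrices A and B are used only to simulate transitions, while the learning update itself sees only observed state-action-cost data.

import numpy as np

def quad_basis(z):
    # Quadratic features: diagonal terms z_i^2 and doubled cross terms 2*z_i*z_j,
    # so that quad_basis(z) . theta == z^T H z for a symmetric H.
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unpack_H(theta, n):
    # Rebuild the symmetric matrix H from its upper-triangular parameter vector.
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

def q_learning_lqr(A, B, Q, R, K0, n_iters=10, n_samples=500, expl=0.1, seed=0):
    # Policy iteration with a quadratic Q-function: evaluate the current policy
    # by least squares on the Bellman equation, then improve the policy from H.
    rng = np.random.default_rng(seed)
    nx, nu = B.shape
    K = np.array(K0, dtype=float)
    for _ in range(n_iters):
        Phi, c = [], []
        x = rng.standard_normal(nx)
        for _ in range(n_samples):
            u = -K @ x + expl * rng.standard_normal(nu)   # persistent excitation
            x_next = A @ x + B @ u
            u_next = -K @ x_next                          # action the current policy would take
            c.append(x @ Q @ x + u @ R @ u)
            Phi.append(quad_basis(np.concatenate([x, u]))
                       - quad_basis(np.concatenate([x_next, u_next])))
            x = x_next if np.linalg.norm(x_next) < 1e3 else rng.standard_normal(nx)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
        H = unpack_H(theta, nx + nu)
        # Minimizing Q(x, u) over u gives the improved gain K = H_uu^{-1} H_ux.
        K = np.linalg.solve(H[nx:, nx:], H[nx:, :nx])
    return K, H

Starting from a stabilizing initial gain K0, the returned gain typically approaches the LQR-optimal feedback within a few such stages, provided the exploration noise keeps the least-squares regression well conditioned.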
In our context, we extend the LQR case of Bradtke et al. (1994) to the situation where only input-output data are available, i.e., without state measurements. Through a judicious re-definition of the state vector, we remove the requirement of full state feedback without sacrificing control performance. Also, using the certainty-equivalent case as a benchmark, we investigate the robustness of this method against undermodeling, noise, and nonlinearity. Our investigations indicate that this method is a competitive alternative to the indirect approach.
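As an illustration of such a re-definition (one standard construction, assumed here for concreteness rather than taken from the paper), the unmeasured state can be replaced by a vector of recent outputs and inputs, which is itself measurable:

\tilde{x}_k = \begin{bmatrix} y_k & y_{k-1} & \cdots & y_{k-N+1} & u_{k-1} & u_{k-2} & \cdots & u_{k-N+1} \end{bmatrix}^{\top},

so that the quadratic Q-function and the resulting feedback law u_k = -K \tilde{x}_k are expressed entirely in measured quantities.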
References
Bradtke, S.J., Ydstie, B.E., & Barto, A.G. (1994). Adaptive linear quadratic control using policy iteration. In Proceedings of the American Control Conference, pp. 3475-3476.