(65g) Exploring The Q-Learning Approach For Direct Controller Synthesis
AIChE 2007 Annual Meeting
Computing and Systems Technology Division
Advances in Process Control - I
Monday, November 5, 2007 - 2:30pm to 2:50pm
The combination of safety and economic constraints makes closed-loop system identification and subsequent controller design attractive to industrial control engineers. However, it is not always clear how errors incurred during the identification step will affect controller performance. In this sense, it may be advantageous to treat identification and controller design in an integrated manner.
In view of this, we investigate a reinforcement learning method called 'Q-learning' for addressing model identification and controller design synergistically. This can be viewed as a form of direct adaptive optimal control; theoretical results for Linear Quadratic Regulation (LQR) were first derived by Bradtke et al. (1994).
Traditionally belonging to the realm of artificial intelligence, reinforcement learning is concerned with finding an optimal control policy via the decision-maker's interactions with its environment. Q-learning is an iterative instance of this paradigm with the added advantage that the true model of the environment need not be known. Central to this is the Q(x, u) function, which is parameterized by the current policy in place and maps a state (x)-action (u) pair to a scalar reflecting the value of taking that action in that state. More importantly, Q(.,.) directly dictates the corresponding policy. In the control context, controller design is separated into stages: at each stage, the learnt Q function is used to generate a policy no worse than the one currently in place, and the process converges to the optimal policy.
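To make the staged procedure concrete, the following is a minimal NumPy sketch of policy-iteration Q-learning for the LQR case, in the spirit of Bradtke et al. (1994). All names and numerical choices here (quad_basis, n_samples, the exploration level, etc.) are our own illustrative assumptions; the plant matrices A and B are used only to simulate transitions, while the learning update itself sees only observed state-action-cost data.

import numpy as np

def quad_basis(z):
    # Quadratic features: diagonal terms z_i^2 and doubled cross terms 2*z_i*z_j,
    # so that quad_basis(z) . theta == z^T H z for a symmetric H.
    n = len(z)
    return np.array([z[i] * z[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])

def unpack_H(theta, n):
    # Rebuild the symmetric matrix H from its upper-triangular parameter vector.
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[k]
            k += 1
    return H

def q_learning_lqr(A, B, Q, R, K0, n_iters=10, n_samples=500, expl=0.1, seed=0):
    # Policy iteration with a quadratic Q-function: evaluate the current policy
    # by least squares on the Bellman equation, then improve the policy from H.
    rng = np.random.default_rng(seed)
    nx, nu = B.shape
    K = np.array(K0, dtype=float)
    for _ in range(n_iters):
        Phi, c = [], []
        x = rng.standard_normal(nx)
        for _ in range(n_samples):
            u = -K @ x + expl * rng.standard_normal(nu)   # persistent excitation
            x_next = A @ x + B @ u
            u_next = -K @ x_next                          # action the current policy would take
            c.append(x @ Q @ x + u @ R @ u)
            Phi.append(quad_basis(np.concatenate([x, u]))
                       - quad_basis(np.concatenate([x_next, u_next])))
            x = x_next if np.linalg.norm(x_next) < 1e3 else rng.standard_normal(nx)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
        H = unpack_H(theta, nx + nu)
        # Minimizing Q(x, u) over u gives the improved gain K = H_uu^{-1} H_ux.
        K = np.linalg.solve(H[nx:, nx:], H[nx:, :nx])
    return K, H

Starting from a stabilizing initial gain K0, the returned gain typically approaches the LQR-optimal feedback within a few such stages, provided the exploration noise keeps the least-squares regression well conditioned.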
In our context, we extend the LQR case of Bradtke et al. (1994) to the situation where only input-output data are available, i.e., without state measurements. Through a judicious re-definition of the state vector, we remove the requirement of full state feedback without sacrificing control performance. Also, using the certainty-equivalent case as a benchmark, we investigate the robustness of this method against undermodeling, noise, and nonlinearity. Our investigations indicate that this method is a competitive alternative to the indirect approach.
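As an illustration of such a re-definition (one standard construction, assumed here for concreteness rather than taken from the paper), the unmeasured state can be replaced by a vector of recent outputs and inputs, which is itself measurable:

\tilde{x}_k = \begin{bmatrix} y_k & y_{k-1} & \cdots & y_{k-N+1} & u_{k-1} & u_{k-2} & \cdots & u_{k-N+1} \end{bmatrix}^{\top},

so that the quadratic Q-function and the resulting feedback law u_k = -K \tilde{x}_k are expressed entirely in measured quantities.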
References
Bradtke, S.J., Ydstie, B.E., & Barto, A.G. (1994). Adaptive linear quadratic control using policy iteration. In Proceedings of the American Control Conference, pp. 3475-3476.