
(389e) Use of Post-Decision-State Value Function for Control of Constrained Stochastic Systems

Authors 

Wong, W. C. - Presenter, Georgia Institute of Technology
Lee, J. H. - Presenter, Korea Advanced Institute of Science and Technology (KAIST)

Model Predictive Control (MPC), the de facto advanced process control solution, solves a deterministic optimization problem by assuming nominal values for future disturbance signals and uncertainties. This deterministic formulation means that uncertainty is addressed only sub-optimally in MPC.

Most past attempts at formulating robust MPC minimize a worst-case objective [1], at the expense of overly conservative policies. Multi-scenario formulations [2] have also been developed, but the number of scenarios they can accommodate is limited and they do not yield closed-loop optimal policies. Stochastic-programming-based methodologies [3] allow for recourse actions, at the computational expense of enumerating an exponentially growing number of scenarios.

In order to tackle the problem of stochastic optimal control of nonlinear systems (Eq. (1)) rigorously and practicably at the same time, we appeal to stochastic dynamic programming and the approximate solution of the accompanying optimality equations. Specifically, we take advantage of a "post-decision state variable" [4, 5] so that the on-line and off-line computations reduce to single-stage optimizations that may be conveniently carried out with standard optimization solvers (a feature driving the attractiveness of MPC solutions), whilst ensuring that uncertainty is handled systematically.

Consider the control of the following nonlinear system under stochastic perturbation (by w_t):

x_{t+1} = f(x_t, u_t, w_t)    (1)

where the goal is to find the optimal stationary policy π that minimizes the expected, discounted, infinite-horizon cost

J^π(x_0) = E[ Σ_{t=0}^{∞} γ^t φ(x_t, π(x_t)) ]    (2)

where φ(x_t, π(x_t)) is a stage-wise penalty and γ a discounting factor. By shifting our attention to the post-decision state x^p_t, defined as the state obtained immediately after an action is chosen but before the realization of the uncertainty in Eq. (1), one obtains an alternative version of Bellman's optimality equations (Eq. (3)) and, with it, an implicit form of the optimal policy (Eq. (4)):

J*(x^p_t) = E_{w_t}[ min_{u_{t+1}} { φ(x_{t+1}, u_{t+1}) + γ J*(x^p_{t+1}) } ]    (3)

π*(x_t) = arg min_{u_t} { φ(x_t, u_t) + γ J*(x^p_t) }    (4)

where x_{t+1} follows from x^p_t and w_t according to Eq. (1), and x^p_{t+1} is the post-decision state reached from (x_{t+1}, u_{t+1}).
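To make the post-decision state concrete, the following minimal sketch (our own illustration; the scalar linear system, parameter values, and function names are assumptions, not taken from the work described here) splits a toy transition into its deterministic part, which produces x^p, and its stochastic completion:

```python
import numpy as np

# Toy scalar linear system x_{t+1} = a*x_t + b*u_t + w_t, split around the
# post-decision state x^p_t = a*x_t + b*u_t (all values are illustrative).
a, b = 0.9, 0.5      # assumed system parameters
gamma = 0.95         # discount factor, gamma in Eq. (2)

def stage_cost(x, u):
    """Stage-wise penalty phi(x, u); a quadratic is chosen purely for illustration."""
    return x**2 + 0.1 * u**2

def post_decision(x, u):
    """Deterministic part of the transition: the state right after u is chosen, before w is realized."""
    return a * x + b * u

def next_state(x_post, w):
    """Stochastic completion of the transition: x_{t+1} = x^p_t + w_t."""
    return x_post + w
```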

To overcome the intractability of solving Eq. (3) exactly, we judiciously employ off-line, closed-loop simulations and function approximation, in a manner similar to that proposed in [6], where Bellman's optimality equations based on the pre-decision state (x_t) were solved approximately for process control applications.

In our context, the set {(x^p, J*(x^p)) | x^p ∈ X_REL} is obtained as follows. A finite-sized, control-relevant portion of the state space (X_REL) is identified through closed-loop simulations under sub-optimal policies; converged approximations of the optimal value function J*(x^p) are then computed by recursing Eq. (3) for each member of X_REL. Due to the continuous nature of stochastic optimal control problems, an appropriately designed function approximator for J* is required to ensure convergence. With this in place, one solves Eq. (4) during on-line control.
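As a minimal sketch of this off-line phase (continuing the toy example above; the grid stand-in for X_REL, the polynomial approximator, and all numerical values are illustrative assumptions), each sweep of Eq. (3) performs a sample-averaged, deterministic single-stage minimization at every post-decision state and then refits the value-function approximator:

```python
# Off-line phase: approximate value iteration on Eq. (3), reusing stage_cost,
# post_decision, next_state, and gamma from the sketch above.
rng = np.random.default_rng(0)

x_rel = np.linspace(-2.0, 2.0, 41)           # stand-in for the control-relevant set X_REL
u_grid = np.linspace(-1.0, 1.0, 21)          # discretized admissible inputs (constraint |u| <= 1)
w_samples = 0.1 * rng.standard_normal(100)   # samples of the disturbance w

coeffs = np.zeros(5)                         # coefficients of the degree-4 polynomial approximator

def J_hat(x_post, c):
    """Approximate post-decision-state value function J*(x^p)."""
    return np.polyval(c, x_post)

for sweep in range(30):                      # value-iteration sweeps
    targets = []
    for xp in x_rel:
        # Eq. (3): expectation over w of independent, deterministic single-stage minimizations.
        backups = []
        for w in w_samples:
            x_next = next_state(xp, w)
            q = [stage_cost(x_next, u) + gamma * J_hat(post_decision(x_next, u), coeffs)
                 for u in u_grid]
            backups.append(min(q))
        targets.append(np.mean(backups))
    coeffs = np.polyfit(x_rel, targets, 4)   # refit the approximator to the new targets
```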

The off-line, recursive computations of Eq. (3) are computationally convenient since they involve independent, single-stage deterministic optimizations (which may be parallelized for efficiency, if desired). This is in contrast to the pre-decision-state optimality equations, where the single optimization involves an unwieldy expectation; i.e., the typically non-commutative min and E operators are interchanged through the introduction of the post-decision state x^p. This computational benefit naturally extends to the on-line optimization based on Eq. (4).
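Continuing the same sketch, the on-line step of Eq. (4) then amounts to one deterministic single-stage minimization per sampling instant (solved here by enumeration over the discretized inputs; a standard optimization solver would be used in practice):

```python
# On-line phase: Eq. (4) with the converged approximator J_hat from the off-line sketch.
def policy(x, c):
    """Greedy policy of Eq. (4): stage cost plus discounted value of the post-decision state."""
    q = [stage_cost(x, u) + gamma * J_hat(post_decision(x, u), c) for u in u_grid]
    return u_grid[int(np.argmin(q))]

# Short closed-loop simulation under the resulting policy.
x = 1.5
for t in range(20):
    u = policy(x, coeffs)
    x = next_state(post_decision(x, u), 0.1 * rng.standard_normal())
```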

We demonstrate the usefulness of the proposed algorithm through examples involving the control of constrained linear and nonlinear systems where the typical MPC approach gives poor performance due to the underlying certainty equivalence assumption.

[References]

 

1. Scokaert, P. and D. Mayne, Min-max feedback model predictive control for constrained linear systems. IEEE Transactions on Automatic Control, 1998. 43(8): p. 1136-1142.

2. Laird, C.D. and L.T. Biegler, Large-Scale Nonlinear Programming for Multi-scenario Optimization, in Modeling, Simulation and Optimization of Complex Processes, H.G. Bock, et al., Editors. 2008, Springer Berlin Heidelberg. p. 323-336.

3. Muñoz de la Peña, D., A. Bemporad, and T. Alamo, Stochastic Programming Applied to Model Predictive Control, in 44th IEEE Conference on Decision and Control and European Control Conference. 2005. Seville, Spain.

4. Van Roy, B., et al., A neuro-dynamic programming approach to retailer inventory management, in IEEE Conference on Decision and Control. 1997.

5. Powell, W.B., Approximate Dynamic Programming: Solving the Curses of Dimensionality. Wiley Series in Probability and Statistics. 2007: Wiley-Interscience.

6. Lee, J.H. and J.M. Lee, Approximate dynamic programming-based approach to process control and scheduling. Computers & Chemical Engineering, 2006. 30(10-12): p. 1603-1618.