(16g) Stochastic Optimal Control of Polynomial Jump-Diffusion Processes Via Local Occupation Measures | AIChE

Authors 

Barton, P. I., Massachusetts Institute of Technology
The optimal control of stochastic processes is arguably one of the most fundamental problems in decision-making under uncertainty. While a wide range of decision-making problems in engineering are naturally modeled as stochastic optimal control problems, only a small subset of such problems allows a globally optimal control policy to be identified in a tractable manner. As a consequence, engineers are often forced to resort to one of many available heuristic or approximate techniques for constructing control policies in practice. While such approximations and heuristics frequently perform remarkably well, they seldom come with certificates of optimality or mechanisms to quantify or bound the degree of suboptimality they induce. The study of nonlinear stochastic processes and control problems through the lens of occupation measures [1,2,3] offers a general theoretical and computational framework to close this gap. Specifically, it enables the systematic construction of (tight) convex relaxations in the form of generalized moment problems derived from a weak formulation of the underlying stochastic optimal control problem [4,5,6]. In the case of optimal control of jump-diffusion processes with polynomial data (i.e., polynomial drift and diffusion coefficients, arrival rates, etc.), these generalized moment problems further admit a tractable approximation by a hierarchy of semidefinite programs (SDPs) [7], which enables the practical computation of rigorous bounds on the degree of suboptimality of any given control policy. A critical limitation of this approach, however, remains its poor scalability, which is further amplified by the challenges associated with solving large SDPs.

In this work, we address this limitation by introducing the concept of local occupation measures. Local occupation measures are obtained by restricting the state-action occupation measure associated with a stochastic control system to a subset of the time domain. This straightforward generalization makes it possible to bridge the gap between the weak and strong formulations of the associated optimal control problem by discretizing the time domain and imposing constraints on the system trajectories in a weak form over the resultant collection of time intervals instead of over the entire time horizon as a whole (as is done traditionally). As a consequence, the generalized moment problems and associated SDP relaxations generated this way are not only tighter but also explicitly reflect the causal temporal structure that is inherent to optimal control problems, a feature that is absent from the traditional formulation. From a practical perspective, this structure can be exploited by distributed optimization algorithms, offering the potential to improve scalability. Moreover, the use of local occupation measures provides a new mechanism to tighten the generated SDP relaxations via refinement of the time discretization at the cost of only a linear increase in problem size. This is in stark contrast to the traditional tightening mechanism, which relies on increasing the truncation order of the moment sequences associated with multivariate measures and hence suffers from combinatorial scaling. As an aside, we note that the proposed approach inherits key properties of the original occupation measure framework; most notably, convergence of the optimal value of the SDP relaxations to the true optimal value of the stochastic control problem can be established under mild regularity conditions, and the dual SDPs furnish piecewise polynomial subsolutions to the Hamilton-Jacobi-Bellman equations, providing useful information for controller design [4,8].
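The contrast between the two tightening mechanisms can be made concrete with a back-of-the-envelope count: the number of moments up to degree 2d in n variables is the binomial coefficient C(n + 2d, 2d), which grows combinatorially in d, whereas refining the time grid at fixed order simply multiplies a fixed per-interval moment count by the number K of intervals. The dimension n = 4 below is an illustrative choice, not taken from the examples in this work.

```python
from math import comb

def n_moments(n_vars, order):
    # Number of monomials of total degree <= order in n_vars variables.
    return comb(n_vars + order, order)

n = 4  # combined state/control dimension (illustrative)

# Tightening by raising the truncation order d: combinatorial growth.
by_order = [n_moments(n, 2 * d) for d in range(1, 6)]
print(by_order)   # [15, 70, 210, 495, 1001]

# Tightening by refining the time grid at fixed order (here d = 2):
# K local occupation measures, each contributing the same moment count.
by_grid = [K * n_moments(n, 4) for K in range(1, 6)]
print(by_grid)    # [70, 140, 210, 280, 350]
```

The first list quintuples in size over five steps of d while the second grows by a constant increment per added interval, which is the linear-versus-combinatorial trade-off described above.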
We demonstrate the effectiveness and versatility of the proposed framework with examples from systems biology and population control.

[1] Wendell H. Fleming and Domokos Vermes. Convex duality approach to the optimal control of diffusions. SIAM Journal on Control and Optimization, 27(5):1136–1155, 1989.

[2] Abhay G. Bhatt and Vivek S. Borkar. Occupation Measures for Controlled Markov Processes: Characterization and Optimality. The Annals of Probability, 24(3):1531–1562, 1996.

[3] Thomas G. Kurtz and Richard H. Stockbridge. Existence of Markov Controls and Characterization of Optimal Markov Controls. SIAM Journal on Control and Optimization, 36(2):609–653, 1998.

[4] Jean B. Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear Optimal Control via Occupation Measures and LMI-Relaxations. SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008.

[5] Carlo Savorgnan, Jean B. Lasserre, and Moritz Diehl. Discrete-time stochastic optimal control via occupation measures and moment relaxations. Proceedings of the IEEE Conference on Decision and Control, pages 519–524, 2009.

[6] Milan Korda, Didier Henrion, and Jean B. Lasserre. Moments and Convex Optimization for Analysis and Control of Nonlinear Partial Differential Equations. arXiv preprint arXiv:1804.07565, 2018.

[7] Jean B. Lasserre. Moments, Positive Polynomials and Their Applications, volume 1. World Scientific, 2010.

[8] Milan Korda, Didier Henrion, and Colin N. Jones. Controller design and value function approximation for nonlinear dynamical systems. Automatica, 67:54–66, 2016.