(278e) Tractable Global Solutions to Bayesian Optimal Experiment Design | AIChE

(278e) Tractable Global Solutions to Bayesian Optimal Experiment Design

Authors 

Rodrigues, D. - Presenter, Instituto Superior Tecnico
Mesbah, A., University of California, Berkeley
Optimal experiment design (OED) uses a system model to design experimental conditions by maximizing the information content of observations [1]. Classical OED formulations are based on scalar metrics of the Fisher information matrix (FIM). On the other hand, the design criteria in Bayesian OED approaches are defined in terms of expected utility, which is often expressed in terms of prior and posterior distributions of the parameters [2]. Bayesian OED is especially advantageous when the system observations are noisy, incomplete, and indirect [3]. Gradient-based optimization approaches such as stochastic approximation and sample average approximation methods can be used to attain locally optimal designs [4]. However, these methods cannot guarantee the global optimality of the selected designs.

This contribution presents a tractable approach for obtaining globally optimal solutions to Bayesian OED for nonlinear systems with time-varying designs. The complexity in nonconvex and global optimization scales exponentially with the number of decision variables. Thus, it is desired to formulate the problem in terms of as few as possible decision variables to enable tractable solutions. This is achieved by a parsimonious input parametrization [5], which can be a promising approach for OED problems since they are typically high-dimensional in the design variables.

We consider continuous-time, nonlinear dynamical systems where the states depend on the uncertain parameters and the inputs that may be restricted between lower and upper bounds. Noisy measurements are collected at various instants. We aim to optimally design the inputs by maximizing the information content of the observations for estimation of the unknown parameters. To this end, we adopt a Bayesian perspective. Under a given design and a realization of the observations, the change in the information about the parameters between a prior probability density function (pdf) and a posterior pdf is given by Bayes' rule, which also depends on a likelihood function that results in an evidence.

In Bayesian OED, the optimal inputs are designed by maximizing an expected utility that corresponds to the prior expectation of a utility function. In this work, the utility function is the Kullback-Leibler (KL) divergence from the evidence to the likelihood function. The resulting design also maximizes the expected utility in terms of the KL divergence from the prior to the posterior distributions as well as the expected gain in Shannon information between the distributions.

The expected utility is approximated using nested Monte Carlo integration over the joint observation and parameter space, which can become prohibitively expensive. To address this computational challenge, we approximate the Bayesian OED problem. Under the assumption that the likelihood function and the prior pdf follow normal distributions, the aforementioned design can be approximated as the design that maximizes the scalar metric that involves the prior expectation of a function of the FIM for Bayes D-optimality. This corresponds to a utility function that depends on the FIM and on the state sensitivities.

The computation of the expected utility via multivariate integration over the parameter space typically requires computing the utility function for many parameters, which is computationally prohibitive if this procedure needs to be repeated for many designs in the context of optimization. Thus, we seek an integration rule based on as few quadrature points as possible. Even in the univariate case, methods based on Gaussian quadrature minimize the number of points needed for exact integration of polynomials of a given degree. We use an efficient approach that corresponds to sparse stochastic collocation and is the multivariate equivalent of Gaussian quadrature.

One can express the utility function as an expansion in terms of orthogonal polynomials with respect to the prior pdf, which are Hermite polynomials for a normal prior pdf. The orthogonal polynomials are given by a truncation scheme to introduce sparsity since this reduces the number of quadrature points when the number of parameters is large. If the number of quadrature points is sufficiently large, one can choose weights and points such that they satisfy the condition for exact integration of the orthogonal polynomials. It follows that the error in the approximation of the expected utility vanishes when the utility function is exactly expressed in terms of these orthogonal polynomials.

The approximate expected utility is an explicit function of the system states and their sensitivities for the different quadrature points. This means that the Bayesian OED problem can be approximated by an optimal control problem (OCP) subject to the dynamics of these states and sensitivities. The inputs that represent the solution to the OCP are composed of several arcs. For each input, each arc can be input constraint-seeking, such that it is determined by the input bounds, or sensitivity-seeking, such that it is determined by an equality that stems from the dynamics [5]. Hence, there is a finite number of arc types from which arc sequences can be formed. It follows that the number of plausible sequences is also finite.

We aim to show how Bayesian OED problems reformulated as an OCP can be solved efficiently to global optimality. The proposed approach for global optimality relies on determining: (i) when and how the optimal switching between arcs takes place for a given plausible arc sequence; and (ii) which sequence provides the optimal solution. Once question (i) is addressed for every plausible sequence, for example via parallel computing, it is trivial to answer question (ii). For a given plausible arc sequence, the input is defined by the following decision variables: the switching times between arcs and the initial conditions of the sensitivity-seeking arcs. Then, addressing question (i) above consists in computing the optimal values of the decision variables for the given arc sequence.

For a given arc sequence, we describe the input in each interval by defining states for this interval and combining all the states into a vector of extended states. Then, one can eliminate input dependencies and reformulate the OCP in terms of the extended states and the new decision variables, which is convenient for numerical optimization since there are only a few decision variables. We aim to reformulate the OCP for each arc sequence as a polynomial optimization problem (POP) that is amenable to global optimization. This entails expressing the cost as a polynomial function [6]. To this end, we compute the cost and its first-order partial derivatives with respect to the decision variables.

An efficient approach to approximate the cost as a polynomial function consists in using multivariate Hermite interpolation to obtain a polynomial of a certain degree that matches the value and the first-order partial derivatives at some sample points. Note that this requires no more than computing the extended states and the extended adjoint variables for the cost that correspond to each point. The number of sample points is polynomial in the number of decision variables, which is typically small owing to the parsimonious input parameterization. This yields the polynomial representation of the cost. Hence, the OCP for that arc sequence is reformulated as a POP.

This POP is solved efficiently to global optimality via reformulation as a hierarchy of convex semidefinite programs using the concept of sum-of-squares polynomials. Although the method to solve such problems to global optimality is out of the scope of this contribution, standard methods for this purpose have been described [6].

The proposed approach is applied to the case study of a Lotka-Volterra system that is widely used as a benchmark problem for OED. For all the plausible arc sequences, it is possible to extract the unique solution to the POP and certify its global optimality. One can determine the globally optimal solution with no more than 3 arcs, which only requires solving 6 problems in parallel in less than 1000 s. If we had only used local optimization to compute a local solution to the OED problem, we could have obtained a worse local solution and it would not have been possible to provide any guarantee that the local solution is in any way close to the globally optimal solution.

In future work, we aim to apply a similar methodology to obtain tractable global solutions to increasingly complex Bayesian OED problems, without the approximation of the expected utility for KL divergence as a Bayes D-optimality formulation and the assumption of normal distributions for the likelihood and the prior. A further extension would also include the consideration of more complex path constraints.

[1] G. Franceschini and S. Macchietto. Model-based design of experiments for parameter precision: State of the art, Chem. Eng. Sci., 63(19):4846 – 4872, 2008.

[2] K. Chaloner and I. Verdinelli. Bayesian experimental design: A review, Stat. Sci., 10(3):273-304, 1995.

[3] X. Huan and Y. M. Marzouk. Simulation-based optimal Bayesian experimental design for nonlinear systems, J. Comput. Phys., 232(1):288-317, 2013.

[4] J. A. Paulson, M. Martin-Casas, and A. Mesbah. Optimal Bayesian experiment design for nonlinear dynamic systems with chance constraints, J. Process Control, 77:155-171, 2019.

[5] D. Rodrigues and D. Bonvin. On reducing the number of decision variables for dynamic optimization, Optim. Control Appl. Meth., 41(1):292-311, 2020.

[6] D. Henrion and J. B. Lasserre. GloptiPoly: Global optimization over polynomials with Matlab and SeDuMi, ACM Trans. Math. Softw., 29(2):165-194, 2003.

Checkout

This paper has an Extended Abstract file available; you must purchase the conference proceedings to access it.

Checkout

Do you already own this?

Pricing

Individuals

AIChE Pro Members $150.00
AIChE Emeritus Members $105.00
AIChE Graduate Student Members Free
AIChE Undergraduate Student Members Free
AIChE Explorer Members $225.00
Non-Members $225.00