(463e) An Augmented Lagrangian Approach for Parallel Solution of NLP Problems On Graphics Processing Units

Large-scale nonlinear programming
(NLP) has proven to be an effective framework for optimal process design and
operation. As problem sizes continue to grow, the capabilities of a single
workstation can become insufficient. This, coupled with the emergence of new
concurrent computing architectures, drives the need for effective parallel
algorithms.

Graphics processing units (GPUs) are emerging as massively parallel systems that offer a large degree of parallelism at a relatively low cost. However, compared with classic multiple-instruction, multiple-data (MIMD) architectures, the single-instruction, multiple-thread (SIMT) architecture of the modern GPU limits its direct use in many numerical applications. This architecture contains a large number of processing cores; however, groups of these cores must execute the same instruction concurrently. This makes the architecture ill-suited to parallel direct factorization, the primary numerical workhorse of most nonlinear programming solvers.

In this paper, we propose an Augmented Lagrangian approach for the parallel solution of large-scale nonlinear programming problems on a GPU. The algorithm is iterative at three levels. The first level replaces the original problem with a sequence of bound-constrained optimization problems using an Augmented Lagrangian method. Each of these bound-constrained problems is solved using a nonlinear interior-point method. Inside the interior-point method, the barrier subproblems are solved using a variation of Newton's method, where the linear system is solved with a preconditioned conjugate gradient (PCG) method. If the PCG iteration detects negative curvature, a diagonal modification of the Hessian is made to ensure positive definiteness. The primary advantage of this algorithm is that it allows the use of the PCG method instead of direct factorization, and the PCG method can be implemented efficiently in parallel on the GPU.
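
For concreteness, the bound-constrained subproblems generated at the first level take the standard augmented Lagrangian form (written here, as a textbook reference rather than the paper's exact variant, for equality constraints c(x) = 0 and bounds l ≤ x ≤ u):

\[
\min_{l \le x \le u} \ \mathcal{L}_A(x;\lambda,\rho) = f(x) + \lambda^{T} c(x) + \frac{\rho}{2}\,\lVert c(x)\rVert^{2},
\]

and the interior-point method at the second level treats the bounds with a log-barrier term:

\[
\min_{x} \ \mathcal{L}_A(x;\lambda,\rho) - \mu \sum_{i} \bigl[ \ln(x_i - l_i) + \ln(u_i - x_i) \bigr].
\]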

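The abstract does not spell out the safeguarding rule for the Newton step, so the following NumPy sketch shows one plausible realization: a PCG solve of the (shifted) Newton system that restarts with a larger diagonal shift whenever a direction of negative curvature is encountered. The function name pcg_newton_step, the restart loop, and the tenfold shift growth are illustrative assumptions, not the authors' implementation; the operations inside the loop (matrix-vector products, dot products, and vector updates) are precisely the kernels that parallelize well on a GPU.

```python
import numpy as np

def pcg_newton_step(hess_vec, grad, prec_solve, tau0=1e-4, tol=1e-8, max_iter=200):
    """Compute a Newton-type step p from (H + tau*I) p = -grad via PCG.

    hess_vec(v) returns H @ v and prec_solve(r) applies the preconditioner;
    both are matrix-free, which is what a GPU implementation would run in
    parallel. If a direction d with d'(H + tau*I)d <= 0 appears (negative
    curvature), the diagonal shift tau is enlarged and the solve restarts,
    so the modified Hessian acts as a positive definite matrix.
    """
    n = grad.shape[0]
    tau = 0.0
    while True:
        p = np.zeros(n)
        r = -grad.copy()                 # residual of (H + tau*I) p = -grad at p = 0
        z = prec_solve(r)
        d = z.copy()
        rz = r @ z
        restarted = False
        for _ in range(max_iter):
            Hd = hess_vec(d) + tau * d
            dHd = d @ Hd
            if dHd <= 0.0:               # negative curvature detected
                tau = max(tau0, 10.0 * tau)  # one simple growth rule (an assumption)
                restarted = True
                break
            alpha = rz / dHd
            p += alpha * d
            r -= alpha * Hd
            if np.linalg.norm(r) <= tol:
                return p, tau
            z = prec_solve(r)
            rz_new = r @ z
            d = z + (rz_new / rz) * d
            rz = rz_new
        if not restarted:                # iteration budget exhausted: accept p
            return p, tau

if __name__ == "__main__":
    # Small indefinite example: H has a negative eigenvalue, so the solver
    # must enlarge the shift before PCG can converge.
    H = np.diag([4.0, 1.0, -0.5])
    g = np.array([1.0, -2.0, 0.5])
    p, tau = pcg_newton_step(lambda v: H @ v, g, lambda r: r)
    print("step:", p, "shift:", tau)
```

In a GPU version, the vectors would live in device memory (e.g., via a library such as CuPy), and each PCG iteration would launch only a handful of data-parallel kernels, which is what makes the method attractive on SIMT hardware.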
Finally, numerical experiments are reported using a set of problems from the COPS test collection. We compare the computational results of our serial and parallel versions with those of the IPOPT nonlinear optimization package, and show that a significant speedup (approximately an order of magnitude) is possible.
