(206g) Nonlinear Optimization on Exascale Computing Architectures and GPUs

Authors 

Shin, S. - Presenter, University of Wisconsin-Madison
Pacaud, F., Mines Paris - PSL
Anitescu, M., Argonne National Laboratory
Large-scale nonlinear optimization is critical to many scientific and engineering applications, such as energy infrastructure operations, process control, and machine learning. These problems are often extremely complex because they embed detailed physical models with space-time dynamics and/or uncertainty scenario trees. Solving such problems with second-order algorithms is computationally challenging, and traditional optimization solvers [1,2] struggle to cope with the extreme scale of emerging applications. With single-core performance stagnating in recent years, there is a growing need for optimization software that can exploit parallel and distributed computing architectures, such as distributed computing clusters and graphics processing units (GPUs), to achieve scalability.

Despite the potential of parallel and distributed computing architectures, the development of parallel nonlinear optimization solvers has been hindered by the sequential nature of several core operations within nonlinear optimization algorithms, and only limited success has been reported in the literature [3,4,5,6,7]. In the large-scale regime, most of the computational burden falls on automatic differentiation (AD) tools and linear solvers, so the key to developing parallel optimization solvers is parallelizing the derivative evaluations and the solution of sparse linear systems. While the parallel solution of linear systems has been extensively studied in the context of partial differential equations (PDEs), the linear systems arising within interior-point nonlinear optimization solvers are critically ill-conditioned, which precludes the widespread use of iterative methods. Consequently, the parallel solution of ill-conditioned sparse linear systems remains one of the most significant challenges in parallel nonlinear optimization. Moreover, parallel sparse automatic differentiation is inherently challenging to implement, and existing state-of-the-art modeling and AD tools do not support parallel computation of derivatives, even though significant opportunities exist for additional speed-up [8,9,10].
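
To make the difficulty concrete, recall the augmented KKT system solved at each iteration of a primal-dual interior-point method such as [1]; this is a generic textbook sketch in standard notation, not the exact form used in our solver:

\[
\begin{bmatrix} W + \Sigma + \delta_w I & J^\top \\ J & -\delta_c I \end{bmatrix}
\begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}
= - \begin{bmatrix} \nabla_x \mathcal{L}(x, y) \\ c(x) \end{bmatrix},
\qquad \Sigma = X^{-1} Z,
\]

where W is the Hessian of the Lagrangian, J is the constraint Jacobian, and the diagonal matrix \Sigma collects the barrier terms associated with the bound constraints. As the barrier parameter is driven to zero, the entries of \Sigma corresponding to active bounds diverge while those of inactive bounds vanish, so the condition number of the matrix grows without bound. Robust solvers therefore rely on direct factorization with inertia correction, and it is precisely this factorization step that is difficult to parallelize.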

In this work, we present our recent efforts to develop software infrastructure for solving extremely large-scale constrained nonlinear optimization problems on supercomputers equipped with GPU accelerators [11,12,13,14]. Our software tools include the algebraic modeling/AD tool MadDiff.jl [15] and the nonlinear optimization solver MadNLP.jl [16]. MadDiff.jl lets users specify optimization problems in a form that enables parallelization at the automatic-differentiation level, running on multi-core CPUs as well as GPU accelerators, while abstracting away low-level details such as dispatching parallel derivative evaluations to GPU kernels. MadNLP.jl takes the model in this form and solves it efficiently by performing the majority of its operations on the GPU. The key advantages of our software tools are flexibility and portability. The multiple dispatch feature of the Julia language enables a high-level optimization solver that can handle different data-structure backends, allowing structure-exploiting strategies for the linear algebra subroutines. KernelAbstractions.jl unifies the different vendor APIs behind a single Julia interface, enabling the same code to run on CPUs as well as Intel, AMD, and NVIDIA GPUs, as illustrated in the sketch below. Importantly, since Julia is a compiled language, this flexibility does not compromise performance.
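
As a minimal illustration of this portability (a generic sketch using a recent KernelAbstractions.jl API, not code taken from MadDiff.jl or MadNLP.jl), a kernel is written once and launched on whichever backend holds the data:

```julia
using KernelAbstractions

# Elementwise residual r[i] = x[i]^2 - b[i], written once for all backends.
@kernel function residual_kernel!(r, @Const(x), @Const(b))
    i = @index(Global)
    r[i] = x[i]^2 - b[i]
end

function residual!(r, x, b)
    backend = get_backend(r)               # CPU() for Array, CUDABackend() for CuArray, etc.
    kernel! = residual_kernel!(backend)
    kernel!(r, x, b; ndrange = length(r))  # one work item per entry
    KernelAbstractions.synchronize(backend)
    return r
end

# Runs on the CPU; swapping Array for CUDA.CuArray (or the AMD/Intel equivalents)
# runs the same code on the corresponding GPU.
x = rand(1_000); b = rand(1_000); r = similar(x)
residual!(r, x, b)
```

The derivative evaluations in MadDiff.jl and the array operations in MadNLP.jl are dispatched in a similar fashion, which is what allows a single code base to target CPUs and GPUs from multiple vendors.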

Although developing parallel linear algebra subroutines for general sparse linear systems is challenging, we have developed several specialized linear solvers based on structure-exploiting reduction strategies [11,12,13]. These methods reduce large sparse linear systems with few degrees of freedom to smaller dense systems that can be handled efficiently by vendor libraries (e.g., NVIDIA's CUDA libraries). While this strategy may not be sufficient for general sparse linear systems, it effectively addresses a wide range of optimization problems arising in chemical processes and energy systems.
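
As a schematic example of such a reduction (a simplified Schur-complement sketch; the precise reductions are problem-specific and detailed in [12,13]), partition the variables into many coupled states x_s and a small number of degrees of freedom x_d, so that the condensed KKT system has the block form

\[
\begin{bmatrix} K_{ss} & K_{sd} \\ K_{ds} & K_{dd} \end{bmatrix}
\begin{bmatrix} \Delta x_s \\ \Delta x_d \end{bmatrix}
=
\begin{bmatrix} r_s \\ r_d \end{bmatrix}.
\]

Eliminating the states yields the dense reduced system

\[
\left( K_{dd} - K_{ds} K_{ss}^{-1} K_{sd} \right) \Delta x_d = r_d - K_{ds} K_{ss}^{-1} r_s,
\]

whose dimension is only the number of degrees of freedom. The sparse operations involving K_{ss} expose abundant parallelism, and the small dense Schur complement can be factorized with vendor-supplied dense routines on the GPU.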

We have conducted numerical experiments to demonstrate the effectiveness of our tools in solving large-scale optimization problems on a single GPU and on clusters of GPUs. Our results show a significant reduction in computation time, making it feasible to tackle problems that were previously considered intractable. We have benchmarked our algorithms and software infrastructure on several large-scale optimization problems from various domains, including chemical process control, PDE-constrained optimization, and AC and security-constrained optimal power flow. For large-scale instances, our approach reduces the solution time by up to a factor of 50 compared to state-of-the-art methods running on a single CPU [13]. These promising results open up new opportunities for solving increasingly complex optimization problems by fully leveraging the computational power of exascale computing architectures and GPU accelerators.


[1] Wächter and Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming (2006).
[2] Byrd, Nocedal, and Waltz. KNITRO: An integrated package for nonlinear optimization. Large-scale nonlinear optimization (2006).
[3] Cao, Seth, and Laird. An augmented Lagrangian interior-point approach for large-scale NLP problems on graphics processing units. Computers & Chemical Engineering (2016).
[4] Rodriguez, Parker, Laird, Nicholson, Siirola, and Bynum. Scalable parallel nonlinear optimization with PyNumero and Parapint. INFORMS Journal on Computing (2023).
[5] Shin, Coffrin, Sundar, and Zavala. Graph-based modeling and decomposition of energy infrastructures. IFAC-PapersOnLine (2021).
[6] Chiang, Petra, and Zavala. Structured nonconvex optimization of large-scale energy systems using PIPS-NLP. In 2014 Power Systems Computation Conference (2014).
[7] Rodriguez, Laird, and Zavala. Scalable preconditioning of block-structured linear algebra systems using ADMM. Computers & Chemical Engineering (2020).
[8] Fourer, Gay, and Kernighan. AMPL: A mathematical programming language. AT&T Bell Laboratories (1987).
[9] Dunning, Huchette, and Lubin. JuMP: A modeling language for mathematical optimization. SIAM Review (2017).
[10] Andersson, Gillis, Horn, Rawlings, and Diehl. CasADi: a software framework for nonlinear optimization and optimal control. Mathematical Programming Computation (2019).
[11] Cole, Shin, Pacaud, Zavala, and Anitescu. Exploiting GPU/SIMD architectures for solving linear-quadratic MPC problems. 2023 American Control Conference (to appear).
[12] Pacaud, Shin, Schanen, Maldonado, and Anitescu. Accelerating condensed interior-point methods on SIMD/GPU architectures. Journal of Optimization Theory and Applications (2023).
[13] Pacaud, Shin, Schanen, Maldonado, and Anitescu. Parallel Interior-Point Solver for Block-Structured Nonlinear Programs on SIMD/GPU Architectures. arXiv:2301.04869 (2023).
[14] Anitescu, Kim, Kim, Maldonado, Pacaud, Rao, Schanen, Shin, and Subramanian. Targeting Exascale with Julia on GPUs for multiperiod optimization with scenario constraints. SIAG/OPT Views and News (2021).
[15] MadDiff.jl: An automatic differentiation and algebraic modeling package. https://github.com/sshin23/MadDiff.jl
[16] MadNLP.jl: A solver for nonlinear programming. https://github.com/MadNLP/MadNLP.jl