
(245b) An Algorithm for Exascale-Capable Integrated Process Design and Control

Authors 

Papadopoulos, A. - Presenter, Centre for Research and Technology Hellas
Vasilas, N., Centre for Research and Technology Hellas
Papadopoulos, L., National Technical University of Athens
Soudris, D., National Technical University of Athens
Seferlis, P., Aristotle University of Thessaloniki
Industrial processes are subject to endogenous and exogenous variability that has detrimental effects on their performance. It is therefore necessary to design process systems that inherently exhibit low sensitivity to variability, but that also facilitate the adoption of operating strategies and mechanisms (e.g., controllers) which efficiently alleviate the detrimental effects of variability. These requirements are addressed through methods for integrated process design and control [1], which aim to design processes that are both economically optimal at steady state and robust under the influence of disturbances. Such methods need to account for a very wide range of structural and operating options, simultaneously with numerous variability scenarios and detailed process models. The simultaneous consideration of all these features requires extremely intense computational effort.

Modern high-performance computing technologies, in the form of upcoming exascale systems, provide a unique opportunity to address these computationally demanding calculations. Exascale-class machines will see a massive increase in the number of computing units (between tens and hundreds of millions), in the form of homogeneous cores or heterogeneous mixtures of multipurpose CPUs, GPUs and other specialized processing units [2]. However, the exploitation of such technologies poses significant challenges, as it requires the development of algorithms that exhibit high scalability, enable effortless portability across heterogeneous computing resources and support resiliency to failures. Available algorithms for integrated process design and control have not been developed with such capabilities. Such algorithms have been used in parallel environments on only a few occasions, without specific reference to optimizations that facilitate parallel execution [3]. Specific developments in optimization algorithms [4] and in model predictive control [5] for parallel environments are available, although the two are not combined and the latter are rare. Reduced-order modelling for process design and control is becoming increasingly popular [6], but the focus is not on optimization for parallel infrastructures. Optimization algorithms have been reported that make use of either CPUs or GPUs [4,5], but not both. Frameworks such as OpenCL are generally available for cross-platform parallel programming [7], but specialized code is needed depending on the type of computing unit utilized. Derivative-free optimization (DFO) algorithms often result in redundant computations, as they lack mechanisms to avoid them, whereas the convergence challenges exhibited by derivative-based optimization (DBO) algorithms remain an open research topic. Algorithms with such features would require significant upgrading for use in parallel computing infrastructures: to avoid over- or under-utilization of computational resources and load imbalances, to improve their scalability and to enable the exploitation of heterogeneous resources.

This work proposes a novel and generic algorithmic scheme for simultaneous process design and controllability assessment that combines approximate computing techniques with skeleton programming and run-time scheduling over heterogeneous computing nodes. In approximate computing, exact computations are replaced by selective approximations that significantly reduce the utilization of computational resources, at the expense of a small deterioration in the accuracy of the obtained solutions [8]. The algorithm is further developed using general-purpose skeletons [9] for parallel matrix operations simultaneously on both CPUs and GPUs. Skeleton programming increases the developer's productivity by abstracting away parallelism, and enables effortless deployment on large-scale computing systems. Finally, a run-time scheduling library [10] is employed that is also suitable for heterogeneous computing nodes and enables efficient load management.
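To make the skeleton idea concrete, the following minimal C++ sketch shows a map skeleton whose caller selects a backend without writing any thread or device code. This is an illustration only, not the authors' implementation: the actual skeleton library [9] and run-time scheduler [10] expose richer interfaces, and the Backend enum and kernel here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative backend tag. A real skeleton library (such as [9]) would also
// provide GPU backends and let a run-time scheduler (such as [10]) choose
// among heterogeneous units per task.
enum class Backend { Sequential, MultiCoreCPU };

// A minimal "map" skeleton: applies an element-wise kernel to a vector.
// The calling code never mentions threads or devices; parallelism is hidden
// behind the skeleton, which is the core productivity argument.
template <typename T, typename F>
void map_skeleton(std::vector<T>& data, F kernel, Backend backend) {
    if (backend == Backend::Sequential) {
        for (auto& x : data) x = kernel(x);
        return;
    }
    // Multi-core CPU backend: strided split of the index range over threads.
    const std::size_t n = data.size();
    const unsigned nt = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nt; ++t) {
        workers.emplace_back([&data, &kernel, n, nt, t] {
            for (std::size_t i = t; i < n; i += nt) data[i] = kernel(data[i]);
        });
    }
    for (auto& w : workers) w.join();
}

int main() {
    // Hypothetical element-wise residual evaluation for one process module.
    std::vector<double> x(1'000'000, 2.0);
    map_skeleton(x, [](double v) { return v * v - 1.0; },
                 Backend::MultiCoreCPU);
    return 0;
}
```

In the same spirit, a GPU backend would dispatch the identical kernel to device code, and the scheduler would assign each process module to a backend at run time.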

These methods are incorporated into a hybrid scheme that comprises an external, DFO-based layer that handles discrete optimization variables and an internal, DBO-based layer for continuous process optimization and controllability assessment. In every iteration, the external DFO generates values for the discrete variables, which are used by a DBO for steady-state process optimization with an economic objective function. The control performance of each optimal process design generated by the DBO is assessed by a homotopy-continuation algorithm. The latter is used to evaluate the non-linear sensitivity of the process in the context of a control structure, within a wide range of variability scenarios. The control structure takes the form of an objective function that accounts for the distance of each design solution under variability from the desired set-point and for the cost of the resources needed to bring the process back to its set-point. These two objective functions are aggregated into one that is used by the DFO to guide the overall optimization search until convergence. The employed approximate computing techniques include memoization, task dropping and loop perforation, illustrated in the sketch below. Memoization is used at the level of the DFO to keep a record of previously visited solutions and to avoid their time-consuming re-evaluation in the internal layer. Task dropping is used within the DBO and homotopy-continuation algorithms to efficiently avoid the time-consuming simulations that eventually result in solutions failing to meet the algorithmic convergence criteria. Loop perforation is used within the control performance assessment, in a scheme that gradually increases the intensity of the investigated variability scenarios as the DFO proceeds to convergence. The skeleton programming and run-time scheduling frameworks are used to parallelize the evaluation of the equality and inequality constraints that form the process model used by the DBO and homotopy-continuation algorithms. Their implementation is adapted to the modular formulation of the superstructure used to develop the process model, enabling the simultaneous distribution of different process modules on the desired computing resources (CPUs or GPUs).
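The interplay of the three approximate computing techniques in the outer DFO loop can be sketched in C++ as follows. This is a minimal illustration under assumed interfaces: the DiscreteVars encoding, the evaluate_design stub and the perturb move are hypothetical stand-ins, not the authors' code, and the acceptance rule is simplified (the actual DFO is Simulated Annealing with a temperature schedule).

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <random>
#include <vector>

// Hypothetical encoding of the discrete superstructure decisions.
using DiscreteVars = std::vector<int>;

struct Evaluation { double objective; bool valid; };

// Stub standing in for the inner DBO layer plus the homotopy-continuation
// controllability assessment. In the real algorithm, task dropping happens
// inside this layer: simulations whose iterates stray from the convergence
// criteria are abandoned early and flagged invalid instead of being run to
// completion. Here it just scores the vector so the sketch is self-contained.
Evaluation evaluate_design(const DiscreteVars& d, double scenario_fraction) {
    double obj = 0.0;
    for (int v : d) obj += (v - 1) * (v - 1);  // toy economic objective
    return { obj + 0.01 * (1.0 - scenario_fraction), true };
}

// Hypothetical neighbourhood move: flip one discrete decision at random.
DiscreteVars perturb(DiscreteVars d, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, d.size() - 1);
    std::uniform_int_distribution<int> val(0, 2);
    d[pick(rng)] = val(rng);
    return d;
}

double dfo_search(int max_iters) {
    std::map<DiscreteVars, Evaluation> memo;  // memoization: visited designs
    std::mt19937 rng(42);
    DiscreteVars current = {1, 1, 0};
    double best = std::numeric_limits<double>::infinity();

    for (int it = 0; it < max_iters; ++it) {
        DiscreteVars candidate = perturb(current, rng);

        // Loop perforation: early iterations evaluate only a fraction of the
        // variability scenarios; the fraction grows as the search converges.
        double fraction = std::min(1.0, 0.2 + 0.8 * it / max_iters);

        // Memoization: skip the expensive inner layer for revisited designs.
        // (A real implementation would refresh stale cache entries as the
        // scenario set grows.)
        auto hit = memo.find(candidate);
        Evaluation ev = (hit != memo.end())
                            ? hit->second
                            : evaluate_design(candidate, fraction);
        memo[candidate] = ev;

        // Greedy acceptance for brevity; SA would also accept some worse
        // candidates according to the annealing temperature.
        if (ev.valid && ev.objective < best) {
            best = ev.objective;
            current = candidate;
        }
    }
    return best;
}

int main() { return dfo_search(200) < 1e9 ? 0 : 1; }
```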

The proposed algorithm is implemented for the optimal design and control of chemisorption-based CO2 capture flowsheets. A superstructure of absorption/desorption flowsheets is used that enables the consideration of a wide range of stream distribution options, together with equipment sizing and operating optimization [11]. Variability is considered in feed compositions, operating temperatures, etc. [12]. The employed DFO is Simulated Annealing, the DBO is the Interior Point Optimizer (IPOPT) and the homotopy-continuation algorithm is PITCON. Table 1 shows indicative results from the implementation of memoization and task dropping. The number of function evaluations in each thread is clearly much smaller due to the two techniques. The total CPU time to convergence of the new algorithm is 60 times lower, and the optimal objective function value it attains is 44% better. These CPU time and objective function values correspond to a design space that is two orders of magnitude wider than that of the conventional algorithm, in terms of discrete parameter combinations.

Furthermore, the task dropping approach reduces the number of function evaluations that result in invalid simulations by 63% on average, which greatly improves the time performance of the algorithm. Further investigations are performed for loop perforation, which exhibits improvements similar to those reported above. The skeleton programming and scheduling frameworks further reduce CPU time by approximately 40% compared to an implementation without them. Results will also be reported regarding the simultaneous use of CPUs and GPUs and an implementation on a supercomputer using up to 1000 parallel threads.

Acknowledgements

This work has received funding from the European Union's Horizon 2020 research and innovation programme, under grant agreement No. 801015 (EXA2PRO, https://exa2pro.eu/). This work was also supported by computational time granted by the National Infrastructures for Research and Technology S.A. (GRNET) in the National HPC facility ARIS, under project EXACO2.

References

1. P. Vega, R. Lamanna de Rocco, S. Revollar and M. Francisco, Comput. Chem. Eng., 2014, 71, 602–617.

2. S. Ashby, P. Beckman, J. Chen, P. Colella, B. Collins, D. Crawford, J. Dongarra, D. Kothe, R. Lusk, P. Messina, T. Mezzacappa, P. Moin, M. Norman, R. Rosner, V. Sarkar, A. Siegel, F. Streitz, A. White and M. Wright, The Opportunities and Challenges of Exascale Computing, Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee, 2010.

3. D. C. Miller, D. Agarwal, D. Bhattacharyya, J. Boverhof, Y. Chen, J. Eslick, J. Leek, J. Ma, P. Mahapatra, B. Ng, N. V. Sahinidis, C. Tong and S. E. Zitney, in Process Systems and Materials for CO2 Capture, 2017.

4. B. Sauk, N. Ploskas and N. Sahinidis, Optim. Methods Softw., 2020, 35, 638–660.

5. N. F. Gade-Nielsen, Interior Point Methods on GPU with application to Model Predictive Control, PhD thesis, Technical University of Denmark.

6. J. H. Lee, J. Shin and M. J. Realff, Comput. Chem. Eng., 2018, 114, 111–121.

7. D. Grewe and M. F. P. O’Boyle, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011, vol. 6601 LNCS, pp. 286–305.

8. B. Grigorian and G. Reinman, in ACM Transactions on Architecture and Code Optimization, 2015, vol. 12.

9. A. Ernstsson and C. Kessler, in Parallel Computing: Technology Trends, series: Advances in Parallel Computing, 2020, pp. 475–484.

10. C. Augonnet, S. Thibault, R. Namyst and P. A. Wacrenier, in Concurrency Computation Practice and Experience, 2011, vol. 23, pp. 187–198.

11. T. Damartzis, A. I. Papadopoulos and P. Seferlis, Clean Technol. Environ. Policy, 2014, 16, 1363–1380.

12. P. Seferlis and J. Grievink, Comput. Aided Chem. Eng., 2004, 17, 326–351.