
(245b) An Algorithm for Exascale-Capable Integrated Process Design and Control

Authors 

Papadopoulos, A. - Presenter, Centre for Research and Technology Hellas
Vasilas, N., Centre for Research and Technology Hellas
Papadopoulos, L., National Technical University of Athens
Soudris, D., National Technical University of Athens
Seferlis, P., Aristotle University of Thessaloniki
Industrial processes are subject to endogenous and exogenous variability that has detrimental effects on their performance. It is therefore necessary to design process systems that inherently exhibit low sensitivity to variability, but that also facilitate the adoption of operating strategies and mechanisms (e.g., controllers) which efficiently alleviate the detrimental effects of variability. These requirements are addressed through methods for integrated process design and control [1], which aim to design processes that are both economically optimal at steady state and robust under the influence of disturbances. Such methods need to account for a very wide range of structural and operating options, simultaneously with numerous variability scenarios and detailed process models. The simultaneous consideration of all these features requires extremely intense computational effort.

Modern high-performance computing technologies, in the form of upcoming exascale systems, provide a unique opportunity to address these computationally demanding calculations. Exascale-class machines will see a massive increase in the number of computing units (between tens and hundreds of millions), in the form of homogeneous cores or heterogeneous mixtures of multipurpose CPUs, GPUs and other specialized processing units [2]. However, the exploitation of such technologies poses significant challenges, as it requires the development of algorithms that exhibit high scalability, enable effortless portability across heterogeneous computing resources and support resiliency to failures. Available algorithms for integrated process design and control have not been developed with such capabilities. Such algorithms have been used in parallel environments on only a few occasions, without specific reference to optimizations that facilitate parallel execution [3]. Specific developments in optimization algorithms [4] and in model predictive control [5] for parallel environments are available, although the two are not combined and the latter are rare. Reduced-order modelling for process design and control is becoming increasingly popular [6], but the focus is not on optimization for parallel infrastructures. Optimization algorithms have been reported that make use of either CPUs or GPUs [4,5], but not both. Frameworks such as OpenCL are generally available for cross-platform parallel programming [7], but specialized code is needed depending on the type of computing unit utilized. Derivative-free optimization (DFO) algorithms often result in redundant computations, as they lack mechanisms to avoid them, whereas the convergence challenges exhibited by derivative-based optimization (DBO) algorithms remain an open research topic. Algorithms with such features would require significant upgrading for use in parallel computing infrastructures: to avoid over- or under-utilization of computational resources and load imbalances, to improve their scalability and to enable the exploitation of heterogeneous resources.

This work proposes a novel and generic algorithmic scheme for simultaneous process design and controllability assessment that combines approximate computing techniques with skeleton programming and run-time scheduling over heterogeneous computing nodes. In approximate computing, exact computations are replaced by selective approximations that significantly reduce the utilization of computational resources, at the expense of a small deterioration in the accuracy of the obtained solutions [8]. The algorithm is further developed using general-purpose skeletons [9] for parallel matrix operations simultaneously on both CPUs and GPUs. Skeleton programming increases the developer's productivity by abstracting away parallelism, and enables effortless deployment on large-scale computing systems. Finally, a run-time scheduling library [10] is employed that is also suitable for heterogeneous computing nodes and enables efficient load management.
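To make the skeleton idea concrete, the following minimal C++ sketch shows a map skeleton whose caller selects a backend without writing any thread or device code. This is an illustration only, not the authors' implementation: the actual skeleton library [9] and run-time scheduler [10] expose richer interfaces, and the Backend enum and kernel here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative backend tag. A real skeleton library (such as [9]) would also
// provide GPU backends and let a run-time scheduler (such as [10]) choose
// among heterogeneous units per task.
enum class Backend { Sequential, MultiCoreCPU };

// A minimal "map" skeleton: applies an element-wise kernel to a vector.
// The calling code never mentions threads or devices; parallelism is hidden
// behind the skeleton, which is the core productivity argument.
template <typename T, typename F>
void map_skeleton(std::vector<T>& data, F kernel, Backend backend) {
    if (backend == Backend::Sequential) {
        for (auto& x : data) x = kernel(x);
        return;
    }
    // Multi-core CPU backend: strided split of the index range over threads.
    const std::size_t n = data.size();
    const unsigned nt = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nt; ++t) {
        workers.emplace_back([&data, &kernel, n, nt, t] {
            for (std::size_t i = t; i < n; i += nt) data[i] = kernel(data[i]);
        });
    }
    for (auto& w : workers) w.join();
}

int main() {
    // Hypothetical element-wise residual evaluation for one process module.
    std::vector<double> x(1'000'000, 2.0);
    map_skeleton(x, [](double v) { return v * v - 1.0; },
                 Backend::MultiCoreCPU);
    return 0;
}
```

In the same spirit, a GPU backend would dispatch the identical kernel to device code, and the scheduler would assign each process module to a backend at run time.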

These methods are incorporated into a hybrid scheme that comprises an external, DFO-based layer that handles discrete optimization variables and an internal, DBO-based layer for continuous process optimization and controllability assessment. In every iteration, the external DFO generates values for the discrete variables, which are used by a DBO for steady-state process optimization with an economic objective function. The control performance of each optimal process design generated by the DBO is assessed by a homotopy-continuation algorithm. The latter is used to evaluate the non-linear sensitivity of the process in the context of a control structure, within a wide range of variability scenarios. The control structure takes the form of an objective function that accounts for the distance of each design solution under variability from the desired set-point and for the cost of the resources needed to bring the process back to its set-point. These two objective functions are aggregated into one that is used by the DFO to guide the overall optimization search until convergence. The employed approximate computing techniques include memoization, task dropping and loop perforation, illustrated in the sketch below. Memoization is used at the level of the DFO to keep a record of previously visited solutions and to avoid their time-consuming re-evaluation in the internal layer. Task dropping is used within the DBO and homotopy-continuation algorithms to efficiently avoid the time-consuming simulations that eventually result in solutions failing to meet the algorithmic convergence criteria. Loop perforation is used within the control performance assessment, in a scheme that gradually increases the intensity of the investigated variability scenarios as the DFO proceeds to convergence. The skeleton programming and run-time scheduling frameworks are used to parallelize the evaluation of the equality and inequality constraints that form the process model used by the DBO and homotopy-continuation algorithms. Their implementation is adapted to the modular formulation of the superstructure used to develop the process model, enabling the simultaneous distribution of different process modules on the desired computing resources (CPUs or GPUs).
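The interplay of the three approximate computing techniques in the outer DFO loop can be sketched in C++ as follows. This is a minimal illustration under assumed interfaces: the DiscreteVars encoding, the evaluate_design stub and the perturb move are hypothetical stand-ins, not the authors' code, and the acceptance rule is simplified (the actual DFO is Simulated Annealing with a temperature schedule).

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <map>
#include <random>
#include <vector>

// Hypothetical encoding of the discrete superstructure decisions.
using DiscreteVars = std::vector<int>;

struct Evaluation { double objective; bool valid; };

// Stub standing in for the inner DBO layer plus the homotopy-continuation
// controllability assessment. In the real algorithm, task dropping happens
// inside this layer: simulations whose iterates stray from the convergence
// criteria are abandoned early and flagged invalid instead of being run to
// completion. Here it just scores the vector so the sketch is self-contained.
Evaluation evaluate_design(const DiscreteVars& d, double scenario_fraction) {
    double obj = 0.0;
    for (int v : d) obj += (v - 1) * (v - 1);  // toy economic objective
    return { obj + 0.01 * (1.0 - scenario_fraction), true };
}

// Hypothetical neighbourhood move: flip one discrete decision at random.
DiscreteVars perturb(DiscreteVars d, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, d.size() - 1);
    std::uniform_int_distribution<int> val(0, 2);
    d[pick(rng)] = val(rng);
    return d;
}

double dfo_search(int max_iters) {
    std::map<DiscreteVars, Evaluation> memo;  // memoization: visited designs
    std::mt19937 rng(42);
    DiscreteVars current = {1, 1, 0};
    double best = std::numeric_limits<double>::infinity();

    for (int it = 0; it < max_iters; ++it) {
        DiscreteVars candidate = perturb(current, rng);

        // Loop perforation: early iterations evaluate only a fraction of the
        // variability scenarios; the fraction grows as the search converges.
        double fraction = std::min(1.0, 0.2 + 0.8 * it / max_iters);

        // Memoization: skip the expensive inner layer for revisited designs.
        // (A real implementation would refresh stale cache entries as the
        // scenario set grows.)
        auto hit = memo.find(candidate);
        Evaluation ev = (hit != memo.end())
                            ? hit->second
                            : evaluate_design(candidate, fraction);
        memo[candidate] = ev;

        // Greedy acceptance for brevity; SA would also accept some worse
        // candidates according to the annealing temperature.
        if (ev.valid && ev.objective < best) {
            best = ev.objective;
            current = candidate;
        }
    }
    return best;
}

int main() { return dfo_search(200) < 1e9 ? 0 : 1; }
```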

The proposed algorithm is implemented for the optimal design and control of chemisorption-based CO2 capture flowsheets. A superstructure of absorption/desorption flowsheets is used that enables the consideration of a wide range of stream distribution options, together with equipment sizing and operating optimization [11]. Variability is considered in feed compositions, operating temperatures, etc. [12]. The employed DFO is Simulated Annealing, the DBO is the Interior Point Optimizer (IPOPT) and the homotopy-continuation algorithm is PITCON. Table 1 shows indicative results from the implementation of memoization and task dropping. The number of function evaluations in each thread is clearly much smaller due to the two techniques. The total CPU time to convergence of the new algorithm is 60 times lower, and the optimal objective function value it attains is 44% better. These CPU time and objective function values correspond to a design space that is two orders of magnitude wider than that of the conventional algorithm, in terms of discrete parameter combinations.

Furthermore, the task dropping approach reduces the number of function evaluations that result in invalid simulations by 63% on average, which greatly improves the time performance of the algorithm. Further investigations are performed for loop perforation, which exhibits improvements similar to those reported above. The skeleton programming and scheduling frameworks further reduce CPU time by approximately 40% compared to an implementation without them. Results will also be reported regarding the simultaneous use of CPUs and GPUs and an implementation on a supercomputer using up to 1000 parallel threads.

Acknowledgements

This work has received funding from the European Union's Horizon 2020 research and innovation programme, under grant agreement No. 801015 (EXA2PRO, https://exa2pro.eu/). This work was also supported by computational time granted by the National Infrastructures for Research and Technology S.A. (GRNET) in the National HPC facility ARIS, under project EXACO2.

References

1. P. Vega, R. Lamanna de Rocco, S. Revollar and M. Francisco, Comput. Chem. Eng., 2014, 71, 602–617.

2. S. Ashby, P. Beckman, J. Chen, P. Colella, B. Collins, D. Crawford, J. Dongarra, D. Kothe, R. Lusk, P. Messina, T. Mezzacappa, P. Moin, M. Norman, R. Rosner, V. Sarkar, A. Siegel, F. Streitz, A. White and M. Wright, The Opportunities and Challenges of Exascale Computing, Summary Report of the Advanced Scientific Computing Advisory Committee (ASCAC) Subcommittee, 2010.

3. D. C. Miller, D. Agarwal, D. Bhattacharyya, J. Boverhof, Y. Chen, J. Eslick, J. Leek, J. Ma, P. Mahapatra, B. Ng, N. V. Sahinidis, C. Tong and S. E. Zitney, in Process Systems and Materials for CO2 Capture, 2017.

4. B. Sauk, N. Ploskas and N. Sahinidis, Optim. Methods Softw., 2020, 35, 638–660.

5. N. F. Gade-Nielsen, Interior Point Methods on GPU with application to Model Predictive Control, PhD thesis, Technical University of Denmark.

6. J. H. Lee, J. Shin and M. J. Realff, Comput. Chem. Eng., 2018, 114, 111–121.

7. D. Grewe and M. F. P. O’Boyle, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011, vol. 6601 LNCS, pp. 286–305.

8. B. Grigorian and G. Reinman, in ACM Transactions on Architecture and Code Optimization, 2015, vol. 12.

9. A. Ernstsson and C. Kessler, in Parallel Computing: Technology Trends, series: Advances in Parallel Computing, 2020, pp. 475–484.

10. C. Augonnet, S. Thibault, R. Namyst and P. A. Wacrenier, in Concurrency Computation Practice and Experience, 2011, vol. 23, pp. 187–198.

11. T. Damartzis, A. I. Papadopoulos and P. Seferlis, Clean Technol. Environ. Policy, 2014, 16, 1363–1380.

12. P. Seferlis and J. Grievink, Comput. Aided Chem. Eng., 2004, 17, 326–351.