(133c) Memory Optimization for Particle Access in CFD-DEM Simulations | AIChE

Authors 

Vaidhynathan, D. - Presenter, National Renewable Energy Laboratory
Sitaraman, H., National Renewable Energy Laboratory
Grout, R., National Renewable Energy Lab
Hauser, T., University of Colorado at Boulder
Hrenya, C., University of Colorado
Computational Fluid Dynamics – Discrete Element Method (CFD-DEM) simulations are used to model large-scale gas-solid systems in the energy, pharmaceutical, and petrochemical industries. These simulations play a crucial role in the design of industrial-scale systems, which are difficult to design from laboratory-scale experiments alone. The computational expense of tracking a large number of particles in these cases (on the order of a billion or more) inhibits the use of CFD-DEM, which is otherwise a more rigorous and accurate approach than continuum approximations of the particle phase. Memory access bottlenecks on current high-performance computing architectures become considerable at this scale. The particle data structures in these simulations have a large memory footprint that does not fit into the processor caches, leading to poor computational performance. This work addresses this bottleneck using a space-filling-curve approach for reordering particle data structures in CFD-DEM solvers.

The use of the Morton [1] space-filling curve to improve memory access patterns in the particle data structure is described, and its impact on the performance of representative CFD-DEM simulations is presented. The Morton space-filling curve was applied to co-located particles on uniform and non-uniform Cartesian grids generated with a k-dimensional tree [2]. This optimization technique reorders the particle data structure to improve spatial and temporal locality in memory. Its performance impact is presented for two benchmark cases: the homogeneous cooling system [3] and a fluidized bed. The technique leads to a two-fold performance improvement in operations that access the particle data structure, such as creation of neighbor lists, collisional force calculation, and inter-processor data exchange. An example case of a homogeneous cooling system simulation, in which particles are spatially distributed by thermal motion, is described in the attached figure. Without reordering, the particle index, which corresponds to the particle's position in memory, is not correlated with its spatial location, leading to cache misses and memory access inefficiencies. Spatial reordering of the particle data structure, on the other hand, leads to a twenty percent overall improvement in performance. Some frequently performed operations, such as building neighbor lists, exhibit a significant improvement (about two times), while the expense of reordering particles remains small (about 6% of overall run time).
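
The reordering idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a uniform Cartesian grid only (no k-dimensional tree) and particle positions stored as tuples. All names (spread_bits, morton3d, cell_index, morton_order, reorder) and the parameters origin, cell_size, and bits are hypothetical, chosen for illustration. Each particle is binned into a grid cell, the three cell indices are bit-interleaved into a Morton key, and the particle arrays are permuted by ascending key so that spatially close particles sit close together in memory.

```python
def spread_bits(n, bits=10):
    """Place bit i of n at bit position 3*i, leaving two zero bits in between."""
    out = 0
    for i in range(bits):
        out |= ((n >> i) & 1) << (3 * i)
    return out


def morton3d(ix, iy, iz, bits=10):
    """Interleave the bits of three cell indices into a single Morton key."""
    return (spread_bits(ix, bits)
            | (spread_bits(iy, bits) << 1)
            | (spread_bits(iz, bits) << 2))


def cell_index(pos, origin, cell_size, bits=10):
    """Map a position to integer cell indices on an assumed uniform grid.

    Indices are clamped to the range representable with `bits` bits per axis.
    """
    max_idx = (1 << bits) - 1
    return tuple(
        min(max(int((p - o) // cell_size), 0), max_idx)
        for p, o in zip(pos, origin)
    )


def morton_order(positions, origin, cell_size, bits=10):
    """Return the permutation that sorts particles by the Morton key of their cell."""
    keys = [morton3d(*cell_index(p, origin, cell_size, bits), bits=bits)
            for p in positions]
    return sorted(range(len(positions)), key=keys.__getitem__)


def reorder(array, perm):
    """Apply the same permutation to any per-particle array."""
    return [array[i] for i in perm]


# Four particles in a unit box split into 2 x 2 x 2 cells (illustrative data).
positions = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.6, 0.1, 0.1), (0.1, 0.6, 0.1)]
velocities = ["v0", "v1", "v2", "v3"]

perm = morton_order(positions, origin=(0.0, 0.0, 0.0), cell_size=0.5)
positions_sorted = reorder(positions, perm)
velocities_sorted = reorder(velocities, perm)
```

The same permutation has to be applied to every per-particle array (positions, velocities, forces, and so on) so that a particle's data stay aligned after reordering. How often to reorder, and how a non-uniform grid would change the cell lookup, are outside the scope of this sketch.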

References:

[1] D. W. Walker, Morton ordering of 2D arrays for efficient access to hierarchical memory, The International Journal of High Performance Computing Applications 32 (1) (2018) 189–203.

[2] M. Grandin, Data structures and algorithms for high-dimensional structured adaptive mesh refinement, Advances in Engineering Software 82 (2015) 75–86.

[3] W. Fullmer, C. Hrenya, The homogeneous cooling state as a verification test for kinetic-theory-based continuum models of gas-solid flows, Journal of Verification, Validation and Uncertainty Quantification 2 (4) (2017) 044501.