Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method (Q929255)

From MaRDI portal
Property / full work available at URL: https://doi.org/10.1504/pcfd.2008.018088 (normal rank)
Property / OpenAlex ID: W2128746953 (normal rank)

Latest revision as of 10:29, 30 July 2024

Language: English
Label: Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method
Description: scientific article

    Statements

    Introducing a parallel cache oblivious blocking approach for the lattice Boltzmann method (English)
    16 June 2008
    Summary: We propose a parallel cache oblivious spatial and temporal blocking algorithm for the lattice Boltzmann method in three spatial dimensions. The algorithm was originally proposed by M. Frigo et al. [ACM Trans. Algorithms 8, No. 1, Paper No. 4, 22 p. (2012; Zbl 1295.68236)] and divides the space-time domain of stencil-based methods in an optimal way, independently of external parameters such as cache size. In view of the increasing gap between processor speed and memory performance, this approach offers a promising path to higher cache utilisation. We find that even a straightforward cache oblivious implementation reduces memory traffic by at least a factor of two compared with a highly optimised standard kernel, and that it improves scalability for shared memory parallelisation. Because of the recursive structure of the algorithm, we use an unconventional parallelisation scheme based on task queuing.
    lattice Boltzmann
    cache optimisation
    cache oblivious
    multi core
    task queuing
    shared memory parallelisation
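
The recursive space-time blocking described in the summary can be illustrated with a minimal serial sketch. It is not the authors' implementation: it assumes a one-dimensional three-point averaging stencil with fixed boundary values in place of the three-dimensional lattice Boltzmann kernel, it omits the task-queuing parallelisation, and all names (`kernel`, `walk`, `run_oblivious`, `run_naive`) are hypothetical. The recursion splits a space-time trapezoid either in space (when it is wide enough) or in time, and no cache size appears anywhere as a parameter.

```python
def kernel(u, t, x):
    """Advance point x from time level t to t + 1 using two alternating buffers."""
    src, dst = u[t % 2], u[(t + 1) % 2]
    dst[x] = 0.25 * src[x - 1] + 0.5 * src[x] + 0.25 * src[x + 1]


def walk(u, t0, t1, x0, dx0, x1, dx1):
    """Recursively traverse the trapezoid t0 <= t < t1 whose left edge starts
    at x0 with slope dx0 and whose right edge starts at x1 with slope dx1.
    The stencil reaches one neighbour per time step, so cut edges use slope -1."""
    dt = t1 - t0
    if dt == 1:
        # Base case: a single time step over the current spatial range.
        for x in range(x0, x1):
            kernel(u, t0, x)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
            # Space cut: the trapezoid is wide, so split it along a line of
            # slope -1; the left part must be processed before the right part.
            xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
            walk(u, t0, t1, x0, dx0, xm, -1)
            walk(u, t0, t1, xm, -1, x1, dx1)
        else:
            # Time cut: process the lower half in time, then the upper half.
            s = dt // 2
            walk(u, t0, t0 + s, x0, dx0, x1, dx1)
            walk(u, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1)


def run_oblivious(init, steps):
    """Run `steps` time steps with the recursive traversal."""
    u = [list(init), list(init)]  # both buffers carry the fixed boundary values
    walk(u, 0, steps, 1, 0, len(init) - 1, 0)
    return u[steps % 2]


def run_naive(init, steps):
    """Reference: sweep the whole domain once per time step."""
    u = [list(init), list(init)]
    for t in range(steps):
        for x in range(1, len(init) - 1):
            kernel(u, t, x)
    return u[steps % 2]
```

Both drivers apply the same per-point arithmetic and differ only in the order in which space-time points are visited, so under these assumptions `run_oblivious(init, steps)` and `run_naive(init, steps)` should return the same list for any initial data.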

    Identifiers