Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication
From MaRDI portal
Abstract: Sparse matrix-matrix multiplication (SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdős-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that also exploits multiple (intra-node and inter-node) levels of parallelism, achieving significant speedups over the state-of-the-art publicly available codes at all levels of concurrency. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.
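To illustrate the primitive the abstract refers to, the following is a minimal sequential sketch of SpGEMM (C = A * B) on CSR arrays, using a row-by-row sparse accumulator in the style of Gustavson's classic algorithm. This is an assumption-laden illustration, not the paper's 3D parallel formulation; it only shows the local kernel that distributed implementations parallelize.

```python
# Hedged sketch: sequential Gustavson-style SpGEMM on CSR arrays.
# NOT the paper's 3D parallel algorithm -- just the row-wise kernel
# such implementations distribute. Function and variable names are
# illustrative, not taken from any specific library.

def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Multiply two CSR sparse matrices; return the product in CSR form."""
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C
        for k in range(a_ptr[i], a_ptr[i + 1]):
            col_a, v_a = a_idx[k], a_val[k]
            # scatter row col_a of B, scaled by A[i, col_a]
            for j in range(b_ptr[col_a], b_ptr[col_a + 1]):
                acc[b_idx[j]] = acc.get(b_idx[j], 0.0) + v_a * b_val[j]
        for col in sorted(acc):  # emit row i in column order
            c_idx.append(col)
            c_val.append(acc[col])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val

# Example: A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
a = ([0, 1, 2], [0, 1], [1.0, 2.0])
b = ([0, 1, 2], [1, 0], [3.0, 4.0])
c_ptr, c_idx, c_val = spgemm_csr(*a, *b)
# C = [[0, 3], [8, 0]]
```

The hash-map accumulator makes the work proportional to the number of nonzero multiplications, which is why SpGEMM cost depends on the sparsity structure of both operands rather than on matrix dimensions alone.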
Recommendations
- Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
- New ideas in sparse matrix matrix multiplication
- Simultaneous input and output matrix partitioning for outer-product--parallel sparse matrix-matrix multiplication
- Optimizing sparse matrix-matrix multiplication for the GPU
- Parallel matrix multiplication: a systematic journey
Cites work
- scientific article; zbMATH DE number 5937963
- A Simple Parallel Algorithm for the Maximal Independent Set Problem
- An Improved Spectral Graph Partitioning Algorithm for Mapping Parallel Computations
- An optimized sparse approximate matrix multiply for matrices with decay
- An overview of the Trilinos project
- Communication lower bounds for distributed-memory matrix multiplication
- Detecting short directed cycles using rectangular matrix multiplication and dynamic programming
- Direct Methods for Sparse Linear Systems
- Exposing fine-grained parallelism in algebraic multigrid methods
- GPU-accelerated sparse matrix-matrix multiplication by iterative row merging
- Implementing sparse matrices for graph algorithms
- On techniques to improve robustness and scalability of a parallel hybrid linear solver
- Optimizing sparse matrix-matrix multiplication for the GPU
- Parallel Matrix and Graph Algorithms
- Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
- Simultaneous input and output matrix partitioning for outer-product--parallel sparse matrix-matrix multiplication
- Sparse Matrices in MATLAB: Design and Implementation
- Sparse Matrix-Matrix Products Executed Through Coloring
- The University of Florida sparse matrix collection
- Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition
Cited in (17)
- Massively parallel sparse matrix function calculations with NTPoly
- New ideas in sparse matrix matrix multiplication
- Fast matrix multiplication and its algebraic neighbourhood
- Memory-Efficient Sparse Matrix-Matrix Multiplication by Row Merging on Many-Core Architectures
- Efficient parallel linear scaling method to get the response density matrix in all-electron real-space density-functional perturbation theory
- A novel partitioning method for accelerating the block Cimmino algorithm
- The parallelism motifs of genomic data analysis
- Parallel SPN on Multi-Core CPUs and Many-Core GPUs
- Simultaneous input and output matrix partitioning for outer-product--parallel sparse matrix-matrix multiplication
- Numerical algorithms for high-performance computational science
- pylspack: parallel algorithms and data structures for sketching, column subset selection, regression, and leverage scores
- Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
- Introduction to communication avoiding algorithms for direct methods of factorization in linear algebra
- Sparse matrix multiplication and triangle listing in the congested clique model
- Tail bounds for gaps between eigenvalues of sparse random matrices
- A distributed block Chebyshev-Davidson algorithm for parallel spectral clustering
MaRDI item Q2833530