Optimizing sparse matrix-matrix multiplication for the GPU

DOI: 10.1145/2699470, MaRDI QID: Q2828151

Authors: Steven Dalton, Luke Olson, Nathan Bell

Publication date: 24 October 2016

Published in: ACM Transactions on Mathematical Software

Full work available at URL: https://doi.org/10.1145/2699470

Keywords: sparse matrices, parallel algorithms, GPU, matrix-matrix multiplication

Computational methods for sparse matrices (65F50)
Parallel numerical computation (65Y05)
Numerical algorithms for specific classes of architectures (65Y10)

Recommendations

Memory-Efficient Sparse Matrix-Matrix Multiplication by Row Merging on Many-Core Architectures
A novel multi-GPU parallel optimization model for the sparse matrix-vector multiplication
Sparse matrix-vector multiplication on GPGPUs
Sparse matrix-vector multiplication on NVIDIA GPU
GPU-accelerated sparse matrix-matrix multiplication by iterative row merging

Cites work

scientific article; zbMATH DE number 1069512
An overview of the Trilinos project
Exposing fine-grained parallelism in algebraic multigrid methods
Maximum matchings in general graphs through randomization
Parallel Sparse Matrix-Matrix Multiplication and Indexing: Implementation and Experiments
Sparse matrix multiplication package (SMMP)
The University of Florida sparse matrix collection
Two Fast Algorithms for Sparse Matrices: Multiplication and Permuted Transposition
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy

Cited in (21)

A new class of AMG interpolation methods based on matrix-matrix multiplications
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication
Effective minimally-invasive GPU acceleration of distributed sparse matrix factorization
Redesigning triangular dense matrix computations on GPUs
A Communication Optimization Scheme for Basis Computation of Krylov Subspace Methods on Multi-GPUs
HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs
Reducing communication costs for sparse matrix multiplication within algebraic multigrid
GPU-accelerated sparse matrix-matrix multiplication by iterative row merging
A novel multi-GPU parallel optimization model for the sparse matrix-vector multiplication
On optimizing multiplications of sparse matrices
Generating optimized sparse matrix vector product over finite fields
KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators
Cache friendly sparse matrix-vector multiplication
Memory-Efficient Sparse Matrix-Matrix Multiplication by Row Merging on Many-Core Architectures
Efficient CSR-based sparse matrix-vector multiplication on GPU
A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes
scientific article; zbMATH DE number 1728269
Randomized GPU Algorithms for the Construction of Hierarchical Matrices from Matrix-Vector Operations
Accelerating Iterative SpMV for the Discrete Logarithm Problem Using GPUs
Achieving Native GPU Performance for Out-of-Card Large Dense Matrix Multiplication

Uses software

CUDA
PETSc
Trilinos
SparseMatrix
CUSP
Thrust
SMMP
