Efficient CSR-based sparse matrix-vector multiplication on GPU (Q1793182)

From MaRDI portal
Property / describes a project that uses: CSR5
Property / MaRDI profile type: MaRDI publication profile
Property / full work available at URL: https://doi.org/10.1155/2016/4596943
Property / OpenAlex ID: W2527991513
Property / cites work: Q2768030
Property / cites work: Compressed Multirow Storage Format for Sparse Matrices on Graphics Processing Units
Property / cites work: A novel CSR-based sparse matrix-vector multiplication on GPUs
Property / cites work: A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units
Property / cites work: The University of Florida sparse matrix collection

Latest revision as of 20:34, 16 July 2024

Language: English
Label: Efficient CSR-based sparse matrix-vector multiplication on GPU
Description: scientific article

    Statements

    Efficient CSR-based sparse matrix-vector multiplication on GPU (English)
    12 October 2018
    Summary: Sparse matrix-vector multiplication (SpMV) is an important operation in computational science and needs to be accelerated because it often represents the dominant cost in many widely used iterative methods and eigenvalue problems. We achieve this objective by proposing a novel SpMV algorithm for the GPU based on the compressed sparse row (CSR) format. Our method dynamically assigns different numbers of rows to each thread block and, for each block, executes a different optimized implementation depending on the number of rows the block contains. Accesses to the CSR arrays are fully coalesced, and the GPU's DRAM bandwidth is used efficiently by loading data into shared memory, which alleviates the bottleneck of many existing CSR-based algorithms such as CSR-scalar and CSR-vector. Test results on C2050 and K20c GPUs show that our method outperforms the perfect-CSR algorithm that inspired our work, the vendor-tuned CUSPARSE V6.5 and CUSP V0.5.1, and three popular algorithms: clSpMV, CSR5, and CSR-Adaptive.
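The summary does not spell out the kernel, so the following is only a minimal sequential sketch in Python of the general idea it describes: grouping a variable number of CSR rows into each "thread block" and choosing a strategy per block from its row count. It is modeled loosely on CSR-Adaptive-style row blocking and is not the paper's actual algorithm. All names and parameters here (`partition_rows`, `spmv_blocked`, the nonzero budget `max_nnz` standing in for per-block shared-memory capacity, and the warp-like width `lanes`) are hypothetical. The two per-block strategies (staging products in a buffer for multi-row blocks, a strided tree reduction for single-row blocks) are assumptions chosen for illustration.

```python
from typing import List


def csr_spmv_scalar(row_ptr: List[int], col_idx: List[int],
                    vals: List[float], x: List[float]) -> List[float]:
    """Reference CSR SpMV with one worker per row (the CSR-scalar pattern)."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y.append(acc)
    return y


def partition_rows(row_ptr: List[int], max_nnz: int) -> List[int]:
    """Greedily group consecutive rows into blocks holding at most max_nnz
    nonzeros; a row longer than max_nnz ends up in a block of its own.
    Block b covers rows bounds[b] .. bounds[b + 1] - 1."""
    n_rows = len(row_ptr) - 1
    bounds = [0]
    for i in range(n_rows):
        start = bounds[-1]
        # Close the current block before row i if adding row i would overflow it.
        if i > start and row_ptr[i + 1] - row_ptr[start] > max_nnz:
            bounds.append(i)
    if n_rows > 0:
        bounds.append(n_rows)
    return bounds


def spmv_blocked(row_ptr: List[int], col_idx: List[int], vals: List[float],
                 x: List[float], max_nnz: int, lanes: int = 32) -> List[float]:
    """Sequential simulation of a block-wise CSR SpMV that picks a strategy
    per block from the number of rows it contains (hypothetical scheme)."""
    y = [0.0] * (len(row_ptr) - 1)
    bounds = partition_rows(row_ptr, max_nnz)
    for b in range(len(bounds) - 1):
        r0, r1 = bounds[b], bounds[b + 1]
        lo, hi = row_ptr[r0], row_ptr[r1]
        if r1 - r0 > 1:
            # Multi-row block: stage every product of the block in one
            # contiguous buffer (a stand-in for shared memory; on a GPU,
            # consecutive threads would read consecutive nonzeros), then
            # let each row reduce its own slice of the buffer.
            buf = [vals[k] * x[col_idx[k]] for k in range(lo, hi)]
            for r in range(r0, r1):
                y[r] = sum(buf[row_ptr[r] - lo:row_ptr[r + 1] - lo], 0.0)
        else:
            # Single-row block: spread the row over `lanes` strided partial
            # sums, then combine them with a tree reduction (CSR-vector style).
            # `lanes` must be a power of two for this halving loop.
            partial = [0.0] * lanes
            for k in range(lo, hi):
                partial[(k - lo) % lanes] += vals[k] * x[col_idx[k]]
            step = lanes // 2
            while step:
                for t in range(step):
                    partial[t] += partial[t + step]
                step //= 2
            y[r0] = partial[0]
    return y


if __name__ == "__main__":
    # 3x3 example: [[1, 0, 2], [0, 3, 0], [4, 5, 6]] times a vector of ones.
    row_ptr = [0, 2, 3, 6]
    col_idx = [0, 2, 1, 0, 1, 2]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x = [1.0, 1.0, 1.0]
    print(partition_rows(row_ptr, 3))                  # [0, 2, 3]
    print(spmv_blocked(row_ptr, col_idx, vals, x, 3))  # [3.0, 3.0, 15.0]
    print(csr_spmv_scalar(row_ptr, col_idx, vals, x))  # [3.0, 3.0, 15.0]
```

Because the blocked version adds products in a different order than the row-by-row loop, results on general floating-point data can differ from `csr_spmv_scalar` in the last bits. On a real GPU, the per-block choice would be made by each thread block at run time rather than by a Python loop.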

    Identifiers