Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs
Publication: 2297181
Recommendations
- Mixed precision algorithms in numerical linear algebra
- Mixed precision block fused multiply-add: error analysis and application to GPU tensor cores
- Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
- Accelerating GPU kernels for dense linear algebra
- Design, implementation and testing of extended and mixed precision BLAS
Cites work
- Scientific article; zbMATH DE number 3303655 (no title available)
- A floating-point technique for extending the available precision
- Accelerating the solution of linear systems by iterative refinement in three precisions
- Accurate Sum and Dot Product
- Basic Linear Algebra Subprograms for Fortran Usage
- Design, implementation and testing of extended and mixed precision BLAS
- High-precision division and square root
- MPFR
- Reproducible and accurate matrix multiplication
- The University of Florida sparse matrix collection
Cited in (5)
- Mixed-precision conjugate gradient algorithm using the groupwise update strategy
- Infinite-precision inner product and sparse matrix-vector multiplication using Ozaki scheme with Dot2 on manycore processors
- Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
- GPU Based Mixed Precision PWR Depletion Calculation
- Mixed precision block fused multiply-add: error analysis and application to GPU tensor cores