Mixed precision block fused multiply-add: error analysis and application to GPU tensor cores
From MaRDI portal (Publication: 3300847)
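The operation named in the title can be illustrated with a minimal sketch: a GPU tensor core computes a small-block D = A·B + C with the inputs A and B rounded to half precision (fp16) while products are accumulated at a higher precision. The snippet below emulates this in pure Python; the names `to_fp16` and `block_fma` are illustrative, the accumulator uses Python's double precision as a stand-in for the fp32 accumulator, and this is not the paper's own algorithm or error analysis.

```python
import struct

def to_fp16(x):
    # Round a Python float to IEEE 754 binary16 via struct's 'e' format
    # (models the low-precision input format of a tensor core).
    return struct.unpack('e', struct.pack('e', x))[0]

def block_fma(A, B, C):
    """Emulate one mixed precision block fused multiply-add D = A @ B + C:
    the entries of A and B are rounded to fp16 before multiplication,
    while the accumulation is carried out in higher precision
    (Python double here, standing in for the fp32 accumulator)."""
    n = len(A)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = C[i][j]          # accumulator kept in high precision
            for k in range(n):
                acc += to_fp16(A[i][k]) * to_fp16(B[k][j])
            D[i][j] = acc
    return D
```

For example, with A the 2×2 identity, B = 2·I, and C the all-ones block, `block_fma` returns [[3.0, 1.0], [1.0, 3.0]]; for inputs like 0.1 that are not exactly representable in fp16, the rounding of A and B is the source of the error the analysis quantifies.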
Recommendations
- Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
- Rounding error analysis of mixed precision block Householder QR algorithms
- Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs
- Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
- Mixed precision algorithms in numerical linear algebra
Cites work
- A Class of Fast and Accurate Summation Algorithms
- A New Approach to Probabilistic Rounding Error Analysis
- A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems
- Accelerating the solution of linear systems by iterative refinement in three precisions
- Accuracy and Stability of Numerical Algorithms
- IEEE 754 Precision-k Base-β Arithmetic Inherited by Precision-m Base-β Arithmetic for k &lt; m
- Squeezing a Matrix into Half Precision, with an Application to Solving Linear Systems
- Systolic super summation
- The Arithmetic of the Digital Computer: A New Approach
- Verification methods: rigorous results using floating-point arithmetic
Cited in (12)
- Rigorous floating-point mixed-precision tuning
- Mixed precision algorithms in numerical linear algebra
- Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
- Mixed-precision explicit stabilized Runge-Kutta methods for single- and multi-scale differential equations
- Numerical algorithms for high-performance computational science
- Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
- Double precision is not necessary for LSQR for solving discrete linear ill-posed problems
- Numerical stability of algorithms at extreme scale and low precisions
- Sharper probabilistic backward error analysis for basic linear algebra kernels with random data
- Exploiting lower precision arithmetic in solving symmetric positive definite linear systems and least squares problems
- Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs
- Rounding error analysis of mixed precision block Householder QR algorithms