Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
Publication:608851
DOI: 10.1016/j.parco.2010.06.001, zbMath: 1214.65020, OpenAlex: W2125960020, MaRDI QID: Q608851
Rajib Nath, Stanimire Z. Tomov, Jack J. Dongarra
Publication date: 26 November 2010
Published in: Parallel Computing
Full work available at URL: https://doi.org/10.1016/j.parco.2010.06.001
Keywords: GPUs, bidiagonalization, dense linear algebra, Hessenberg reduction, hybrid computing, tridiagonalization, two-sided factorizations
Related Items
- Algorithm 1019: A Task-based Multi-shift QR/QZ Algorithm with Aggressive Early Deflation
- The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale
- Parallel Solver for Shifted Systems in a Hybrid CPU--GPU Framework
- Parallel and Heterogeneous $m$--Hessenberg--Triangular--Triangular Reduction
- Numerical algorithms for the determinant evaluation of general Hessenberg matrices
- KBLAS
- Algorithm 953
- Linear algebra software for large-scale accelerated multicore computing
- Parallel two-stage reduction to Hessenberg form using dynamic scheduling on shared-memory architectures
- Towards dense linear algebra for hybrid GPU accelerated manycore systems
- A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
Cites Work
- Unnamed Item
- Towards dense linear algebra for hybrid GPU accelerated manycore systems
- Block reduction of matrices to condensed forms for eigenvalue computations
- Communication-optimal Parallel and Sequential QR and LU Factorizations
- Programming matrix algorithms-by-blocks for thread-level parallelism
- A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
- Minimizing Communication in Numerical Linear Algebra
- The WY Representation for Products of Householder Matrices
- A Storage-Efficient $WY$ Representation for Products of Householder Transformations
- LAPACK Users' Guide
- Using the Hessenberg decomposition in control theory
- Communication-optimal Parallel and Sequential Cholesky Decomposition