Towards dense linear algebra for hybrid GPU accelerated manycore systems

DOI: 10.1016/J.PARCO.2009.12.005; MaRDI QID: Q991102

Authors: Marc Baboulin, Stanimire Tomov, Jack Dongarra

Publication date: 2 September 2010

Published in: Parallel Computing

Full work available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.5312

Keywords: parallel algorithms; graphics processing units; multicore processors; dense linear algebra; hybrid computing

MSC classification: Parallel numerical computation (65Y05); Numerical linear algebra (65F99); Parallel algorithms in computer science (68W10); Computer system organization (68M99)

Recommendations

Accelerating GPU kernels for dense linear algebra
Accelerating numerical dense linear algebra calculations with GPUs
Linear systems solvers for distributed-memory machines with GPU accelerators
Heterogenous Acceleration for Linear Algebra in Multi-coprocessor Environments
Dense linear algebra kernels on heterogeneous platforms: Redistribution issues
scientific article; zbMATH DE number 4176333
Parallel Algorithms for Dense Linear Algebra Computations
Accelerating iterative linear solvers using multiple graphical processing units

Cites work

Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
Accuracy and Stability of Numerical Algorithms
Communication-optimal parallel and sequential QR and LU factorizations
GEMM-based level 3 BLAS
LAPACK Users' Guide
Minimizing communication in numerical linear algebra
Out-of-core solution of linear systems on graphics processors
Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy

Cited in (34)

Simulating Low Precision Floating-Point Arithmetic
Divide and conquer on hybrid GPU-accelerated multicore systems
Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
Direct numerical simulations of turbulent reacting flows with shock waves and stiff chemistry using many-core/GPU acceleration
ELSI -- an open infrastructure for electronic structure solvers
A linear algebra method to decompose forms whose length is lower than the number of variables into weighted sum of squares
Productivity, performance, and portability for computational fluid dynamics applications
GPU-acceleration of the ELPA2 distributed eigensolver for dense symmetric and Hermitian eigenproblems
A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
ARKODE: a flexible IVP solver infrastructure for one-step methods
A LAPACK implementation of the dynamic mode decomposition
Quantum circuits synthesis using Householder transformations
GPU acceleration of all-electron electronic structure theory using localized numeric atom-centered basis functions
Accelerating the reduction to upper Hessenberg, tridiagonal, and bidiagonal forms through hybrid GPU-based computing
Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver
Computing least squares condition numbers on hybrid multicore/GPU systems
A new approach to the lattice Boltzmann method for graphics processing units
Accelerating GPU kernels for dense linear algebra
GPU accelerated computation of the isogeometric analysis stiffness matrix
Accelerating numerical dense linear algebra calculations with GPUs
A new era in scientific computing: domain decomposition methods in hybrid CPU-GPU architectures
Randomized GPU Algorithms for the Construction of Hierarchical Matrices from Matrix-Vector Operations
Using Random Butterfly Transformations to Avoid Pivoting in Sparse Direct Methods
Adapting regularized low-rank models for parallel architectures
scientific article; zbMATH DE number 7640509
Extending the length and time scales of Gram-Schmidt Lyapunov vector computations
DG-IMEX method for a two-moment model for radiation transport in the \(\mathcal{O}(v/c)\) limit
Exploiting lower precision arithmetic in solving symmetric positive definite linear systems and least squares problems
GPU parameter tuning for tall and skinny dense linear least squares problems
A heterogeneous parallel LU factorization algorithm based on a basic column block uniform allocation strategy
LU factorization on heterogeneous systems: an energy-efficient approach towards high performance
GPU accelerated Newton for Taylor series solutions of polynomial homotopies in multiple double precision
Direct numerical simulations of reacting flows with detailed chemistry using many-core/GPU acceleration
GPU-acceleration of stiffness matrix calculation and efficient initialization of EFG meshless methods

Uses software

mctoolbox
LINPACK
CUDA
GEMM
LAPACK
