BLIS: A Framework for Rapidly Instantiating BLAS Functionality
DOI: 10.1145/2764454; zbMath: 1347.65054; OpenAlex: W2252007067; Wikidata: Q57275429; Scholia: Q57275429; MaRDI QID: Q2828133
Authors: Field G. van Zee, Robert A. van de Geijn
Publication date: 24 October 2016
Published in: ACM Transactions on Mathematical Software
Full work available at URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.6527
Complexity and performance of numerical algorithms (65Y20); Packaged methods for numerical algorithms (65Y15); Numerical linear algebra (65Fxx); Software, source code, etc. for problems pertaining to numerical analysis (65-04)
Related Items
- Towards an efficient use of the BLAS library for multilinear tensor contractions
- Multidimensional Array Data Management
- The matrix reloaded: multiplication strategies in FrodoKEM
- An efficient implementation of two-component relativistic density functional theory with torque-free auxiliary variables
- Analytical modeling of matrix–vector multiplication on multicore processors
- High-Performance Tensor Contraction without Transposition
- A high-performance implementation of atomistic spin dynamics simulations on x86 CPUs
- Optimized implementation for calculation and fast-update of Pfaffians installed to the open-source fermionic variational solver mVMC
- Implementing High-Performance Complex Matrix Multiplication via the 1M Method
- GMRES with embedded ensemble propagation for the efficient solution of parametric linear systems in uncertainty quantification of computational models
- A compute-bound formulation of Galerkin model reduction for linear time-invariant dynamical systems
- Householder QR Factorization With Randomization for Column Pivoting (HQRRP)
- Parallel direct solver for solving systems of linear equations resulting from finite element method on multi-core desktops and workstations
- Strassen's Algorithm for Tensor Contraction
- Efficiency of Reproducible Level 1 BLAS
- BLIS
- Analytical Modeling Is Enough for High-Performance BLIS
Uses Software
Cites Work
- Unnamed Item
- Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors
- Programming matrix algorithms-by-blocks for thread-level parallelism
- Towards an Efficient Tile Matrix Inversion of Symmetric Positive Definite Matrices on Multicore Architectures
- Families of Algorithms for Reducing a Matrix to Condensed Form
- Elemental
- Algorithm-Based Fault Tolerance for Matrix Operations
- Accumulating Householder transformations, revisited
- Anatomy of high-performance matrix multiplication
- Cache efficient bidiagonalization using BLAS 2.5 operators
- The WY Representation for Products of Householder Matrices
- An extended set of FORTRAN basic linear algebra subprograms
- A Storage-Efficient $WY$ Representation for Products of Householder Transformations
- LAPACK Users' Guide
- Basic Linear Algebra Subprograms for Fortran Usage
- GEMM-based level 3 BLAS
- A set of level 3 basic linear algebra subprograms
- Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
- Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures
- FLAME
- The science of deriving dense linear algebra algorithms
- An overview of the sparse basic linear algebra subprograms