A set of level 3 basic linear algebra subprograms

DOI: 10.1145/77626.79170; zbMath: 0900.65115; OpenAlex: W2002257715; Wikidata: Q56455009; Scholia: Q56455009; MaRDI QID: Q4371637

Jeremy J. du Croz, Sven J. Hammarling, Jack J. Dongarra, Iain S. Duff

Publication date: 23 March 1998

Published in: ACM Transactions on Mathematical Software

Full work available at URL: http://www.acm.org/pubs/contents/journals/toms/1990-16/

zbMATH Keywords

verification, robustness, reliability, efficiency, testing, portability, certification, matrix-matrix operations

Mathematics Subject Classification ID

Complexity and performance of numerical algorithms (65Y20)

Related Items

A parallel R-matrix program PRMAT for electron-atom and electron-ion scattering calculations, Towards an efficient use of the BLAS library for multilinear tensor contractions, Object-oriented programming in control system design: A survey, A new parallel sparse direct solver: Presentation and numerical experiments in large-scale structural mechanics parallel computing, Interior-point solver for large-scale quadratic programming problems with bound constraints, PROFIL/BIAS - A fast interval library, Sparse Matrix Methods for Circuit Simulation Problems, Unnamed Item, An implicitly restarted block Lanczos bidiagonalization method using Leja shifts, Stabilizing canonical-ensemble calculations in the auxiliary-field Monte Carlo method, Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver, Basis selection in LOBPCG, The design of a parallel dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form, Nonlinear eigenvalue and frequency response problems in industrial practice, A block representation for products of hyperbolic Householder transforms, Parallel benchmarks of turbulence in complex geometries, Explicit parallel block Cholesky algorithms on the CRAY APP, High performance solution of partial differential equations discretized using a Chebyshev spectral collocation method, Parallel solution of almost block diagonal systems on a hypercube, An efficient approach to solve very large dense linear systems with verified computing on clusters, Sparse matrix factorization in the implicit finite element method on petascale architecture, Efficient algorithm for proper orthogonal decomposition of block-structured adaptively refined numerical simulations, A sparse nonsymmetric eigensolver for distributed memory architectures, Optimal size of the block in block GMRES on GPUs: computational model and experiments, Factorizing the factorization -- a spectral-element solver for elliptic 
equations with linear operation count, Reorthogonalized block classical Gram-Schmidt, Computer algebra systems - new strategies and techniques, Full multi grid method for electric field computation in point-to-plane streamer discharge in air at atmospheric pressure, A \(\mu\)-mode BLAS approach for multidimensional tensor-structured problems, Enhancing Performance and Robustness of ILU Preconditioners by Blocking and Selective Transposition, A highly efficient implementation of a backpropagation learning algorithm using matrix ISA, Efficient algorithms for the discrete Gabor transform with a long FIR window, Codes for almost block diagonal systems, Rank-profile revealing Gaussian elimination and the CUP matrix decomposition, Upper and lower I/O bounds for pebbling \(r\)-pyramids, A multiscale method for model order reduction in PDE parameter estimation, A domain-decomposing parallel sparse linear system solver, Fast interval matrix multiplication, Sparse direct factorizations through unassembled hyper-matrices, Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD, Unnamed Item, Deriving dense linear algebra libraries, Solving sequences of generalized least-squares problems on multi-threaded architectures, Cholesky and Gram-Schmidt Orthogonalization for Tall-and-Skinny QR Factorizations on Graphics Processors, An efficient out-of-core multifrontal solver for large-scale unsymmetric element problems, Approximate eigenvectors as preconditioner, New parallel sparse direct solvers for multicore architectures, Parallel implementation of a multilevel modelling package, Performance evaluation of supercomputers using HPCC and IMB benchmarks, Block-Cholesky for parallel processing, Solving large dense systems of linear equations on systems with virtual memory and with cache, A sparse proximal implementation of the LP dual active set algorithm, Dual multilevel optimization, Comparisons of Gaussian elimination algorithms on 
a Cray Y-MP, Augmented block Householder Arnoldi method, Parallel solution of almost block diagonal systems on the CRAY Y-MP using level 3 BLAS, High performance BLAS formulation of the multipole-to-local operator in the fast multipole method, VBARMS: a variable block algebraic recursive multilevel solver for sparse linear systems, Fast inclusion of interval matrix multiplication, From steady solutions to chaotic flows in a Rayleigh-Bénard problem at moderate Rayleigh numbers, Efficient use of sparsity by direct solvers applied to 3D controlled-source EM problems, Lattice quantum hadrodynamics on a CRAY Y-MP, Performance of parallel Cholesky factorization algorithms using BLAS, A massively-parallel electronic-structure calculations based on real-space density functional theory, Efficient iterative algorithms for the stochastic finite element method with application to acoustic scattering, Accelerating scientific computations with mixed precision algorithms, High performance BLAS formulation of the adaptive fast multipole method, Solving stable Sylvester equations via rational iterative schemes, A mathematical model of the static pantograph/catenary interaction, Diffusion forecasting model with basis functions from QR-decomposition, Solving path problems on the GPU, RECSY and SCASY Library Software: Recursive Blocked and Parallel Algorithms for Sylvester-Type Matrix Equations with Some Applications, LAPACK-Based Condition Estimates for the Discrete-Time LQG Design, Reproducibility strategies for parallel preconditioned conjugate gradient, Multifrontal Computations on GPUs and Their Multi-core Hosts, The parallel tiled WZ factorization algorithm for multicore architectures, Using dual techniques to derive componentwise and mixed condition numbers for a linear function of a linear least squares solution, Efficient algorithm for simultaneous reduction to the \(m\)-Hessenberg-triangular-triangular form, BLIS: A Framework for Rapidly Instantiating BLAS Functionality, 
Reliable Generation of High-Performance Matrix Algebra, Block reduction of matrices to condensed forms for eigenvalue computations, Designing linear algebra algorithms on the IBM 3090 vector multiprocessor with a hierarchical memory system, Self-Stabilizing Prefix Tree Based Overlay Networks, Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities, A parallel Davidson-type algorithm for several eigenvalues, Multifrontal parallel distributed symmetric and unsymmetric solvers, ScaLAPACK: A portable linear algebra library for distributed memory computers -- design issues and performance, High-performance computing -- an overview, A review of frontal methods for solving linear systems, Mathematical software: Past, present, and future, Numerical algorithm delivery mechanisms, A frontal solver for the 21st century, Evaluating recursive filters on distributed memory parallel computers, \(QR\)-like algorithms for eigenvalue problems, Numerical linear algebra algorithms and software, The impact of high-performance computing in the solution of linear systems: Trends and problems, Nodal high-order methods on unstructured grids. I: Time-domain solution of Maxwell's equations, Unnamed Item, A block variant of the GMRES method for unsymmetric linear systems, STRFLO: A program for time-independent calculations of multiphoton processes in one-electron atomic systems. 
I: Quasienergy spectra and angular distributions, SOLUTION OF LARGE LINEAR SYSTEMS ON PIPELINED SIMD MACHINES, Logarithmic barriers for sparse matrix cones, The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale, A PARALLEL BLOCK LANCZOS ALGORITHM FOR DISTRIBUTED MEMORY ARCHITECTURES, PARALLEL CFD BENCHMARKS ON CRAY COMPUTERS, Reordering Strategy for Blocking Optimization in Sparse Linear Solvers, Efficient update of determinants for many-electron wave function overlaps, Factorized structure of the long-range two-electron integrals tensor and its application in quantum chemistry, An efficient randomized QLP algorithm for approximating the singular value decomposition, Pebbling Game and Alternative Basis for High Performance Matrix Multiplication, Well‐scaled, a‐posteriori error estimation for model order reduction of large second‐order mechanical systems, Parallel Solution of Hierarchical Symmetric Positive Definite Linear Systems, Analytical modeling of matrix–vector multiplication on multicore processors, High-Performance Tensor Contraction without Transposition, Numerical stability of algorithms at extreme scale and low precisions, On Exploiting Sparsity of Multiple Right-Hand Sides in Sparse Direct Solvers, Implementing High-Performance Complex Matrix Multiplication via the 1M Method, Efficient Reduction of Banded Hermitian Positive Definite Generalized Eigenvalue Problems to Banded Standard Eigenvalue Problems, Real-Time Radiation Treatment Planning with Optimality Guarantees via Cluster and Bound Methods, ADI Methods for Cubic Spline Collocation Discretizations of Elliptic PDE, Using Level 3 BLAS in Rotation-Based Algorithms, Householder QR Factorization With Randomization for Column Pivoting (HQRRP), Linear algebra software for large-scale accelerated multicore computing, A survey of direct methods for sparse linear systems, Efficient computation of the compositional model for gas condensate reservoirs, Block classical 
Gram–Schmidt-based block updating in low-rank matrix approximation, PopRatio: A program to calculate atomic level populations in astrophysical plasmas, Strassen's Algorithm for Tensor Contraction, Communication lower bounds and optimal algorithms for numerical linear algebra, On short recurrence Krylov type methods for linear systems with many right-hand sides, The Eigenvalues Slicing Library (EVSL): Algorithms, Implementation, and Software, High-performance sampling of generic determinantal point processes, Block Modified Gram--Schmidt Algorithms and Their Analysis, Valuation of Structured Financial Products by Adaptive Multiwavelet Methods in High Dimensions, Computing the Gradient in Optimization Algorithms for the CP Decomposition in Constant Memory through Tensor Blocking, Numerical Computations and Computer Assisted Proofs of Periodic Orbits of the Kuramoto--Sivashinsky Equation, Computing Petaflops over Terabytes of Data, A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling, Analytical Modeling Is Enough for High-Performance BLIS

Uses Software

BLAS