Anatomy of high-performance matrix multiplication

Publication:3549230


DOI: 10.1145/1356052.1356053; zbMath: 1190.65064; Wikidata: Q56455012; Scholia: Q56455012; MaRDI QID: Q3549230

Kazushige Goto, Robert A. van de Geijn

Publication date: 21 December 2008

Published in: ACM Transactions on Mathematical Software

Full work available at URL: https://doi.org/10.1145/1356052.1356053



Related Items

Unnamed Item
High-Performance Tensor Contraction without Transposition
Automatic generation of fast algorithms for matrix–vector multiplication
Implementing High-Performance Complex Matrix Multiplication via the 1M Method
Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
A Componentwise Splitting Method for Pricing American Options Under the Bates Model
Continuation and stability of rotating waves in the magnetized spherical Couette system: secondary transitions and multistability
Computing the Gradient in Optimization Algorithms for the CP Decomposition in Constant Memory through Tensor Blocking
Analytical Modeling Is Enough for High-Performance BLIS
Householder QR Factorization With Randomization for Column Pivoting (HQRRP)
Dominant speed factors of active set methods for fast MPC
Strassen's Algorithm for Tensor Contraction
Scientific computations on multi-core systems using different programming frameworks
Towards an efficient use of the BLAS library for multilinear tensor contractions
Penalized splines for smooth representation of high-dimensional Monte Carlo datasets
Upper and lower I/O bounds for pebbling \(r\)-pyramids
Deriving dense linear algebra libraries
PARFES: A method for solving finite element linear equations on multi-core computers
The evaluation of American options in a stochastic volatility model with jumps: an efficient finite element approach
Fast verified solutions of linear systems
Blocked algorithms for the reduction to Hessenberg-triangular form revisited
Heterogeneous computing on mixed unstructured grids with pyfr
Parallel direct solver for solving systems of linear equations resulting from finite element method on multi-core desktops and workstations
Direct reconstruction method for discontinuous Galerkin methods on higher-order mixed-curved meshes III. Code optimization via tensor contraction
The matrix reloaded: multiplication strategies in FrodoKEM
GMRES with embedded ensemble propagation for the efficient solution of parametric linear systems in uncertainty quantification of computational models
Modulated rotating waves in the magnetised spherical Couette system
Safe feature elimination for non-negativity constrained convex optimization
A comparison of high-order time integrators for thermal convection in rotating spherical shells
High dimensional tori and chaotic and intermittent transients in magnetohydrodynamic Couette flows
BLIS: A Framework for Rapidly Instantiating BLAS Functionality
Parallel Matrix Multiplication: A Systematic Journey
Oscillatory Convection in Rotating Spherical Shells: Low Prandtl Number and Non-Slip Boundary Conditions


Uses Software