Anatomy of high-performance matrix multiplication


DOI: 10.1145/1356052.1356053
zbMath: 1190.65064
OpenAlex: W2073061372
Wikidata: Q56455012
Scholia: Q56455012
MaRDI QID: Q3549230

Kazushige Goto, Robert A. van de Geijn

Publication date: 21 December 2008

Published in: ACM Transactions on Mathematical Software

Full work available at URL: https://doi.org/10.1145/1356052.1356053




Related Items

Scientific computations on multi-core systems using different programming frameworks
Towards an efficient use of the BLAS library for multilinear tensor contractions
Multidimensional Array Data Management
Oscillatory Convection in Rotating Spherical Shells: Low Prandtl Number and Non-Slip Boundary Conditions
Penalized splines for smooth representation of high-dimensional Monte Carlo datasets
Fast verified solutions of linear systems
A comparison of high-order time integrators for thermal convection in rotating spherical shells
The matrix reloaded: multiplication strategies in FrodoKEM
Heterogeneous computing on mixed unstructured grids with pyfr
PARFES: A method for solving finite element linear equations on multi-core computers
The evaluation of American options in a stochastic volatility model with jumps: an efficient finite element approach
Parallel Matrix Multiplication: A Systematic Journey
An efficient implementation of two-component relativistic density functional theory with torque-free auxiliary variables
High dimensional tori and chaotic and intermittent transients in magnetohydrodynamic Couette flows
Architecture-based and target-oriented algorithm optimization of high-order methods via complete-search tensor contraction
High-Performance Tensor Contraction without Transposition
A high-performance implementation of atomistic spin dynamics simulations on x86 CPUs
Numerical stability of algorithms at extreme scale and low precisions
Upper and lower I/O bounds for pebbling \(r\)-pyramids
Implementing High-Performance Complex Matrix Multiplication via the 1M Method
Deriving dense linear algebra libraries
GMRES with embedded ensemble propagation for the efficient solution of parametric linear systems in uncertainty quantification of computational models
Automatic generation of fast algorithms for matrix–vector multiplication
Blocked algorithms for the reduction to Hessenberg-triangular form revisited
Householder QR Factorization With Randomization for Column Pivoting (HQRRP)
Parallel direct solver for solving systems of linear equations resulting from finite element method on multi-core desktops and workstations
Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance
Dominant speed factors of active set methods for fast MPC
Strassen's Algorithm for Tensor Contraction
A Componentwise Splitting Method for Pricing American Options Under the Bates Model
Direct reconstruction method for discontinuous Galerkin methods on higher-order mixed-curved meshes III. Code optimization via tensor contraction
Modulated rotating waves in the magnetised spherical Couette system
Safe feature elimination for non-negativity constrained convex optimization
BLIS: A Framework for Rapidly Instantiating BLAS Functionality
Continuation and stability of rotating waves in the magnetized spherical Couette system: secondary transitions and multistability
Computing the Gradient in Optimization Algorithms for the CP Decomposition in Constant Memory through Tensor Blocking
Unnamed Item
Analytical Modeling Is Enough for High-Performance BLIS


Uses Software