Strassen's Algorithm for Tensor Contraction
From MaRDI portal
Abstract: Tensor contraction (TC) is an important computational kernel widely used in numerous applications. It is a multi-dimensional generalization of matrix multiplication (GEMM). While Strassen's algorithm for GEMM is well studied in theory and practice, extending it to accelerate TC has not been previously pursued. Thus, we believe this to be the first paper to demonstrate how one can in practice speed up tensor contraction with Strassen's algorithm. By adopting a Block-Scatter-Matrix format, a novel matrix-centric tensor layout, we can conceptually view TC as GEMM for a general stride storage, with an implicit tensor-to-matrix transformation. This insight enables us to tailor a recent state-of-the-art implementation of Strassen's algorithm to TC, avoiding explicit transpositions (permutations) and extra workspace, and reducing the overhead of memory movement that is incurred. Performance benefits are demonstrated with a performance model as well as in practice on modern single core, multicore, and distributed memory parallel architectures, achieving up to 1.3x speedup. The resulting implementations can serve as a drop-in replacement for various applications with significant speedup.
Recommendations
- A tensor product formulation of Strassen's matrix multiplication algorithm
- The arithmetic complexity of tensor contraction
- Algorithmic simplification of tensor expressions
- On an algorithm converging to hyperstochastic tensors
- An algorithm to simplify tensor expressions
- The arithmetic complexity of tensor contractions
- Fast bilinear algorithms for symmetric tensor contractions
- High-Performance Tensor Contraction without Transposition
- A Lanczos-type procedure for tensors
- scientific article; zbMATH DE number 7650358
Cites work
- A set of level 3 basic linear algebra subprograms
- Accuracy and Stability of Numerical Algorithms
- Analytical modeling is enough for high-performance BLIS
- Anatomy of high-performance matrix multiplication
- BLIS: a framework for rapidly instantiating BLAS functionality
- Design of a high-performance GEMM-like tensor-tensor multiplication
- Design, implementation and testing of extended and mixed precision BLAS
- Efficient MATLAB Computations with Sparse and Factored Tensors
- Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems: matrix-multiplication and matrix-addition algorithm optimizations by software pipelining and threads allocation
- Fast matrix multiplication is stable
- GEMMW: A portable level 3 BLAS Winograd variant of Strassen's matrix- matrix multiply algorithm
- Gaussian elimination is not optimal
- High-Performance Tensor Contraction without Transposition
- Improving the numerical stability of fast matrix multiplication
- Matrix multiplication, a little faster
- Symmetric Tensors and Symmetric Tensor Rank
- TTC: a high-performance compiler for tensor transpositions
- Tensor Decompositions and Applications
- Towards an efficient use of the BLAS library for multilinear tensor contractions
Cited in
(13)- Towards an efficient use of the BLAS library for multilinear tensor contractions
- High-Performance Tensor Contraction without Transposition
- Strassen's algorithm reloaded on GPUs
- scientific article; zbMATH DE number 7650358 (Why is no real title available?)
- Fast bilinear algorithms for symmetric tensor contractions
- Design of a high-performance GEMM-like tensor-tensor multiplication
- Algorithmic simplification of tensor expressions
- On the optimal linear contraction order of tree tensor networks, and beyond
- Implementing high-performance complex matrix multiplication via the 1M method
- On an algorithm converging to hyperstochastic tensors
- Algorithm design for tensor units
- An algorithm to simplify tensor expressions
- TTC: a high-performance compiler for tensor transpositions
This page was built for publication: Strassen's Algorithm for Tensor Contraction
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q5745128)