Parallel Algorithms for Tensor Train Arithmetic
From MaRDI portal
Publication:5028405
Abstract: We present efficient and scalable parallel algorithms for performing mathematical operations on low-rank tensors represented in the tensor train (TT) format. We consider algorithms for addition, elementwise multiplication, computing norms and inner products, orthogonalization, and rounding (rank truncation). These are the kernel operations for applications such as iterative Krylov solvers that exploit the TT structure. The parallel algorithms are designed for distributed-memory computation, and we use a data distribution and strategy that parallelizes the computations for the individual TT cores of the format. We analyze the computation and communication costs of the proposed algorithms to show their scalability, and we present numerical experiments that demonstrate their efficiency on both shared-memory and distributed-memory parallel systems. For example, we observe better single-core performance than the existing MATLAB TT-Toolbox in rounding a 2 GB TT tensor, and our implementation achieves a speedup using all 40 cores of a single node. We also show nearly linear parallel scaling on larger TT tensors, up to over 10,000 cores, for all mathematical operations.
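The TT kernel operations named in the abstract can be illustrated with a short serial NumPy sketch. This is not the paper's distributed implementation: the function names are hypothetical, and the code follows the standard (sequential) TT addition and TT rounding procedures, assuming cores stored as 3-way arrays of shape (left rank, mode size, right rank).

```python
import numpy as np

def tt_random(shape, ranks, rng):
    """Random TT tensor: cores[k] has shape (ranks[k], shape[k], ranks[k+1])."""
    return [rng.standard_normal((ranks[k], shape[k], ranks[k + 1]))
            for k in range(len(shape))]

def tt_full(cores):
    """Contract all cores into the full tensor (exponential cost; testing only)."""
    t = cores[0]
    for c in cores[1:]:
        t = np.tensordot(t, c, axes=(-1, 0))
    return t[0, ..., 0]

def tt_add(a, b):
    """x + y in TT format via block-diagonal core concatenation; TT ranks add."""
    d, out = len(a), []
    for k, (x, y) in enumerate(zip(a, b)):
        r1x, n, r2x = x.shape
        r1y, _, r2y = y.shape
        if k == 0:                         # first core: stack along right rank
            out.append(np.concatenate([x, y], axis=2))
        elif k == d - 1:                   # last core: stack along left rank
            out.append(np.concatenate([x, y], axis=0))
        else:                              # interior cores: block diagonal
            z = np.zeros((r1x + r1y, n, r2x + r2y))
            z[:r1x, :, :r2x] = x
            z[r1x:, :, r2x:] = y
            out.append(z)
    return out

def tt_round(cores, eps=1e-10):
    """TT rounding: right-to-left QR orthogonalization sweep, then a
    left-to-right truncated-SVD sweep with relative tolerance eps."""
    d = len(cores)
    cores = [c.copy() for c in cores]
    # Right-to-left orthogonalization: make each core row-orthonormal.
    for k in range(d - 1, 0, -1):
        r1, n, r2 = cores[k].shape
        q, r = np.linalg.qr(cores[k].reshape(r1, n * r2).T)
        cores[k] = q.T.reshape(q.shape[1], n, r2)
        cores[k - 1] = np.tensordot(cores[k - 1], r.T, axes=(2, 0))
    # After the sweep, the tensor norm equals the first core's Frobenius norm.
    delta = eps * np.linalg.norm(cores[0]) / max(1.0, np.sqrt(d - 1))
    # Left-to-right truncation: SVD each core, drop a tail of norm <= delta.
    for k in range(d - 1):
        r1, n, r2 = cores[k].shape
        u, s, vt = np.linalg.svd(cores[k].reshape(r1 * n, r2),
                                 full_matrices=False)
        keep = len(s)
        while keep > 1 and np.linalg.norm(s[keep - 1:]) <= delta:
            keep -= 1
        cores[k] = u[:, :keep].reshape(r1, n, keep)
        carry = s[:keep, None] * vt[:keep]        # absorb into the next core
        cores[k + 1] = np.tensordot(carry, cores[k + 1], axes=(1, 0))
    return cores
```

For instance, adding a TT tensor to itself doubles the interior ranks, and rounding recovers the original ranks since the sum has the same exact TT rank; the paper's contribution is performing these sweeps with distributed cores and communication-avoiding orthogonalization rather than the serial loops above.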
Recommendations
- Parallel Algorithms for Computing the Tensor-Train Decomposition
- Parallel Algorithms for Low Rank Tensor Arithmetic
- Parallel approximation of multidimensional tensors using GPUs
- Efficient vector and parallel manipulation of tensor products
- High performance rearrangement and multiplication routines for sparse tensor arithmetic
- Parallel Algorithms for Dense Linear Algebra Computations
- Parallel algorithms for certain matrix computations
Cites work
- A new scheme for the tensor representation
- A survey of projection-based model reduction methods for parametric dynamical systems
- An overview of the Trilinos project
- Analysis of individual differences in multidimensional scaling via an \(n\)-way generalization of ``Eckart-Young'' decomposition
- Block tensor unfoldings
- Certified reduced basis methods for parametrized partial differential equations
- Communication lower bounds and optimal algorithms for numerical linear algebra
- Communication-optimal parallel and sequential QR and LU factorizations
- Faster tensor train decomposition for sparse data
- Krylov subspace methods for linear systems with tensor product structure
- Low-Rank Tensor Approximation for High-Order Correlation Functions of Gaussian Random Fields
- Low-rank solution to an optimization problem constrained by the Navier-Stokes equations
- Low-rank solvers for unsteady Stokes-Brinkman optimal control problem with random data
- Numerical operator calculus in higher dimensions
- PLANC
- Recompression of Hadamard Products of Tensors in Tucker Format
- Reduced basis methods for partial differential equations. An introduction
- TT-cross approximation for multidimensional arrays
- Tensor Decompositions and Applications
- Tensor approximations of matrices generated by asymptotically smooth functions
- Tensor train approximation of moment equations for elliptic equations with lognormal coefficient
- Tensor-train decomposition
- TuckerMPI: a parallel C++/MPI software package for large-scale data compression via the Tucker tensor decomposition
- \(O(d \log N)\)-quantics approximation of \(N\)-\(d\) tensors in high-dimensional numerical modeling
Cited in (15)
- Efficient vector and parallel manipulation of tensor products
- Parallel ALS algorithm for solving linear systems in the hierarchical Tucker representation
- Parallel Algorithms for Low Rank Tensor Arithmetic
- Generative modeling via tensor train sketching
- Implicit integration of nonlinear evolution equations on tensor manifolds
- Higher-Order QR with Tournament Pivoting for Tensor Compression
- Parallel approximation of multidimensional tensors using GPUs
- Performance of the low-rank TT-SVD for large dense tensors on modern multicore CPUs
- Parallel Algorithms for Computing the Tensor-Train Decomposition
- Adaptive integration of nonlinear evolution equations on tensor manifolds
- Randomized Algorithms for Rounding in the Tensor-Train Format
- Fundamental tensor operations for large-scale data analysis using tensor network formats
- Tensor rank reduction via coordinate flows
- Imposing different boundary conditions for thermal computational homogenization problems with FFT- and tensor-train-based Green's operator methods
- Distributed hierarchical SVD in the hierarchical Tucker format