Communication lower bounds for distributed-memory matrix multiplication
From MaRDI portal
Cited in (25)
- Revisiting matrix product on master-worker platforms
- Massively parallel sparse matrix function calculations with NTPoly
- HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs
- Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication
- Parallel complexity of matrix multiplication
- Parallel matrix multiplication: a systematic journey
- Communication Lower Bounds and Optimal Algorithms for Multiple Tensor-Times-Matrix Computation
- A cache-optimal alternative to the unidirectional hierarchization algorithm
- Graph expansion and communication costs of fast matrix multiplication
- Communication efficient matrix multiplication on hypercubes
- Numerical algorithms for high-performance computational science
- Cache optimization and performance modeling of batched, small, and rectangular matrix multiplication on Intel, AMD, and Fujitsu processors
- Task-based parallel programming for scalable matrix product algorithms
- Algorithm 953: Parallel library software for the multishift QR algorithm with aggressive early deflation
- Introduction to communication avoiding algorithms for direct methods of factorization in linear algebra
- A bridging model for multi-core computing
- Matrix exponentials and parallel prefix computation in a quantum control problem
- Pebbling Game and Alternative Basis for High Performance Matrix Multiplication
- scientific article; zbMATH DE number 6691438
- Communication lower bounds and optimal algorithms for numerical linear algebra
- Communication lower bounds of bilinear algorithms for symmetric tensor contractions
- Distributed control for large-scale systems with adaptive event-triggering
- On the cost of iterative computations
- Oblivious algorithms for multicores and networks of processors
- Parallel time integration using batched BLAS (Basic Linear Algebra Subprograms) routines
MaRDI item Q1886368