PLASMA - MaRDI portal

Cited in (93)

A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
Multi-GPU implementation of the lattice Boltzmann method
Implementing High-performance Complex Matrix Multiplication via the 3m and 4m Methods
Scientific computations on multi-core systems using different programming frameworks
Evaluation of selected resource allocation and scheduling methods in heterogeneous many-core processors and graphics processing units
ViennaCL: linear algebra library for multi- and many-core architectures
BLIS: a framework for rapidly instantiating BLAS functionality
Divide and conquer on hybrid GPU-accelerated multicore systems
A parallel algorithm for calculation of determinants and minors using arbitrary precision arithmetic
Toward a high performance tile divide and conquer algorithm for the dense symmetric eigenvalue problem
Parallel hierarchical hybrid linear solvers for emerging computing platforms
Exploiting symmetry in tensors for high performance: multiplication with symmetric tensors
An inertia-free filter line-search algorithm for large-scale nonlinear programming
An efficient approach to solve very large dense linear systems with verified computing on clusters
A new sparse \(LDL^T\) solver using a posteriori threshold pivoting
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures
Linear algebra software for large-scale accelerated multicore computing
Algorithm 953: Parallel library software for the multishift QR algorithm with aggressive early deflation
SLATE
H-LU factorization on many-core systems
KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators
FLAME
ScaLAPACK
PLAPACK
Algorithm 826
BLIS
CALU
libflame
Elemental
POOCLAPACK
SOLAR
Cilk
Cellss
SBR Toolbox
PLASMA
Algorithm 880
CULA
hwloc
MAGMA
LogGOPSim
MR3-SMP
HSL_MA87
STREAM benchmark
HSL_MA79
Superglue
QUARK
StarPU
FastFlow
CUMP
GPUprec
MPIGMP
SWARM
Wool
MINMOD
Algorithm 953
KBLAS
SSIDS
PBLAS
Algorithm 656
DAGuE
SuperMatrix
OpenGM
ReLAPACK
PDHSEQR
PDLAQR1
UHM
OmpSs
HSL_MA86
Zippy
qr_mumps
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
Static Scheduling with Load Balancing for Solving Triangular Band Linear Systems on Multicore Processors
A distributed and incremental SVD algorithm for agglomerative data analysis on large networks
A novel parallel algorithm based on the Gram-Schmidt method for tridiagonal linear systems of equations
AxiSEM
A high performance QDWH-SVD solver using hardware accelerators
An improved divide-and-conquer algorithm for the banded matrices with narrow bandwidths
Exact likelihood-free Markov chain Monte Carlo for elliptically contoured distributions
SPEX Left LU
Efficient semidefinite branch-and-cut for MAP-MRF inference
Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems
Parallel direct methods for solving the system of linear equations with pipelining on a multicore using OpenMP
SGEMM
Hiding global communication latency in the GMRES algorithm on massively parallel machines
The parallel tiled WZ factorization algorithm for multicore architectures
An efficient multicore implementation of a novel HSS-structured multifrontal solver using randomized sampling
Accelerating the solution of linear systems by iterative refinement in three precisions
Solving a large scale radiosity problem on GPU-based parallel computers
High-order finite-element seismic wave propagation modeling with MPI on a large GPU cluster
Numerical analysis of parallel implementation of the reorthogonalized ABS methods
Design of a multicore sparse Cholesky factorization using DAGs
Experiments with sparse Cholesky using a sequential task-flow implementation
Superglue: a shared memory framework using data versioning for dependency-aware task-based parallelization

This page was built for software: PLASMA