CUBLAS - MaRDI portal

MaRDI QID: Q18949

Official website: http://docs.nvidia.com/cuda/cublas/index.html

Cited in


HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs
scientific article; zbMATH DE number 7559357
A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units
Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm
A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
A GPU application for high-order compact finite difference scheme
Accelerating the explicitly restarted Arnoldi method with GPUs using an autotuned matrix vector product
Evaluation of gas sales agreements with indexation using tree and least-squares Monte Carlo methods on graphics processing units
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
Exposing fine-grained parallelism in algebraic multigrid methods
Higher order finite elements in space and time for anisotropic simulations with variational integrators. Application of an efficient GPU implementation
Fast and robust flow simulations in discrete fracture networks with GPGPUs
Efficient determination of the Markovian time-evolution towards a steady-state of a complex open quantum system
GPU-accelerated algorithms for many-particle continuous-time quantum walks
Megapixel topology optimization on a graphics processing unit
Auto-tuned Krylov methods on cluster of graphics processing unit
Introduction to high performance scientific computing
Solving time-fractional reaction-diffusion systems through a tensor-based parallel algorithm
A new efficient and accurate spline algorithm for the matrix exponential computation
GPU optimization of large-scale eigenvalue solver
Efficient L₀ resampling of point sets
Discrete particle swarm optimization for constructing uniform design on irregular regions
Parallel Prony's Method with Multivariate Matrix Pencil Approach and Its Numerical Aspects
Strassen's algorithm reloaded on GPUs
Fast Taylor polynomial evaluation for the computation of the matrix cosine
A Framework for Error-Bounded Approximate Computing, with an Application to Dot Products
Batch Matrix Exponentiation
GPU-accelerated preconditioned GMRES method for two-dimensional Maxwell's equations
Algorithms for efficient reproducible floating point summation
An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations
Compressed hierarchical Schur algorithm for frequency-domain analysis of photonic structures
MPI-CUDA sparse matrix-vector multiplication for the conjugate gradient method with an approximate inverse preconditioner
A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
CUDA-based scientific computing. Tools and selected applications
SLATE
Efficient and accurate algorithms for computing matrix trigonometric functions
Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver
Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
Accelerating GPU kernels for dense linear algebra
GPU-based block-wise nonlocal means denoising for 3D ultrasound images
3D data denoising via nonlocal means filter by using parallel GPU strategies
Low synchronization Gram–Schmidt and generalized minimal residual algorithms
KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators
A Fast Parallel SVM Algorithm for Massive Classification Tasks
Accelerated dimension-independent adaptive metropolis
ITPACK
VOLSCAT
LAWRA
BLAS
CUDA
SHTns
P3DFFT
RScaLAPACK
MKL
Seigtool
OpenCL
Algorithm 919
CUSP
PFFT
CUSPARSE
SciPAL
TERMOFLUIDS
PyCUDA
Thrust
SIMPAR
gem5
OpenACC
cuFFT
cuRAND
clSpMV
CULA
OpenBLAS
MAGMA
MERAM
SoftFloat
AmgX
CORAL
GAMPACK
IEL
CONLIN
MPC Toolbox
gputools
PyOpenCL
QUARK
SeLaLib
CholeskyQR2
SpGEMM
testmatrix
Boda-RTC
AccFFT
UPC++
pyCTQW
Sailfish
AUGEM
KBLAS
cuDNN
maxDNN
Algorithm 656
CLBlast
CLTune
