CUBLAS
From MaRDI portal
Cited in
- HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs
- scientific article; zbMATH DE number 7559357
- A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units
- Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm
- A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
- A GPU application for high-order compact finite difference scheme
- Accelerating the explicitly restarted Arnoldi method with GPUs using an autotuned matrix vector product
- Evaluation of gas sales agreements with indexation using tree and least-squares Monte Carlo methods on graphics processing units
- A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
- Exposing fine-grained parallelism in algebraic multigrid methods
- Higher order finite elements in space and time for anisotropic simulations with variational integrators. Application of an efficient GPU implementation
- Fast and robust flow simulations in discrete fracture networks with GPGPUs
- Efficient determination of the Markovian time-evolution towards a steady-state of a complex open quantum system
- GPU-accelerated algorithms for many-particle continuous-time quantum walks
- Megapixel topology optimization on a graphics processing unit
- Auto-tuned Krylov methods on cluster of graphics processing unit
- Introduction to high performance scientific computing
- Solving time-fractional reaction-diffusion systems through a tensor-based parallel algorithm
- A new efficient and accurate spline algorithm for the matrix exponential computation
- GPU optimization of large-scale eigenvalue solver
- Efficient L₀ resampling of point sets
- Discrete particle swarm optimization for constructing uniform design on irregular regions
- Parallel Prony's Method with Multivariate Matrix Pencil Approach and Its Numerical Aspects
- Strassen's algorithm reloaded on GPUs
- Fast Taylor polynomial evaluation for the computation of the matrix cosine
- A Framework for Error-Bounded Approximate Computing, with an Application to Dot Products
- Batch Matrix Exponentiation
- GPU-accelerated preconditioned GMRES method for two-dimensional Maxwell's equations
- Algorithms for efficient reproducible floating point summation
- An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations
- Compressed hierarchical Schur algorithm for frequency-domain analysis of photonic structures
- MPI-CUDA sparse matrix-vector multiplication for the conjugate gradient method with an approximate inverse preconditioner
- A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
- CUDA-based scientific computing. Tools and selected applications
- SLATE
- Efficient and accurate algorithms for computing matrix trigonometric functions
- Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver
- Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
- Accelerating GPU kernels for dense linear algebra
- GPU-based block-wise nonlocal means denoising for 3D ultrasound images
- 3D data denoising via nonlocal means filter by using parallel GPU strategies
- Low synchronization Gram–Schmidt and generalized minimal residual algorithms
- KBLAS: an optimized library for dense matrix-vector multiplication on GPU accelerators
- A Fast Parallel SVM Algorithm for Massive Classification Tasks
- Accelerated dimension-independent adaptive metropolis
- ITPACK
- VOLSCAT
- LAWRA
- BLAS
- CUDA
- SHTns
- P3DFFT
- RScaLAPACK
- MKL
- Seigtool
- OpenCL
- Algorithm 919
- CUSP
- PFFT
- CUSPARSE
- SciPAL
- TERMOFLUIDS
- PyCUDA
- Thrust
- SIMPAR
- gem5
- OpenACC
- cuFFT
- cuRAND
- clSpMV
- CULA
- OpenBLAS
- MAGMA
- MERAM
- SoftFloat
- AmgX
- CORAL
- GAMPACK
- IEL
- CONLIN
- MPC Toolbox
- gputools
- PyOpenCL
- QUARK
- SeLaLib
- CholeskyQR2
- SpGEMM
- testmatrix
- Boda-RTC
- AccFFT
- UPC++
- pyCTQW
- Sailfish
- AUGEM
- KBLAS
- cuDNN
- maxDNN
- Algorithm 656
- CLBlast
- CLTune
This page was built for software: CUBLAS