CUBLAS
From MaRDI portal
Software:18949
swMATH: 6880 | MaRDI QID: Q18949 | FDO: Q18949
Author name not available
Cited In (80)
- Title not available
- Title not available
- A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units
- KBLAS
- An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations
- Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation
- Parallel Prony's Method with Multivariate Matrix Pencil Approach and Its Numerical Aspects
- A Framework for Error-Bounded Approximate Computing, with an Application to Dot Products
- A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems
- Batch Matrix Exponentiation
- Accelerating the Explicitly Restarted Arnoldi Method with GPUs Using an Autotuned Matrix Vector Product
- Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
- A Fast Parallel SVM Algorithm for Massive Classification Tasks
- A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines
- A fast dense triangular solve in CUDA
- Strassen’s Algorithm Reloaded on GPUs
- SCELib4.0: the new program version for computing molecular properties in the single center approach
- HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs
- A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
- Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm
- A GPU application for high-order compact finite difference scheme
- Finite Element Integration on GPUs
- Evaluation of gas sales agreements with indexation using tree and least-squares Monte Carlo methods on graphics processing units
- A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
- Higher order finite elements in space and time for anisotropic simulations with variational integrators. Application of an efficient GPU implementation
- Fast and robust flow simulations in discrete fracture networks with gpgpus
- Efficient determination of the Markovian time-evolution towards a steady-state of a complex open quantum system
- GPU-accelerated algorithms for many-particle continuous-time quantum walks
- Auto-tuned Krylov methods on cluster of graphics processing unit
- A GPU-Accelerated Hybridizable Discontinuous Galerkin Method for Linear Elasticity
- Solving time-fractional reaction-diffusion systems through a tensor-based parallel algorithm
- A new efficient and accurate spline algorithm for the matrix exponential computation
- GPU optimization of large-scale eigenvalue solver
- Efficient \(L_0\) resampling of point sets
- Discrete particle swarm optimization for constructing uniform design on irregular regions
- Fast Taylor polynomial evaluation for the computation of the matrix cosine
- AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods
- Solving a Large-Scale Thermal Radiation Problem Using an Interoperable Executive Library Framework on Petascale Supercomputers
- GPU-Accelerated Bernstein–Bézier Discontinuous Galerkin Methods for Wave Problems
- GPU-accelerated preconditioned GMRES method for two-dimensional Maxwell's equations
- Compressed hierarchical Schur algorithm for frequency-domain analysis of photonic structures
- An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations
- MPI-CUDA sparse matrix-vector multiplication for the conjugate gradient method with an approximate inverse preconditioner
- A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
- Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods
- Parallel Solver for Shifted Systems in a Hybrid CPU–GPU Framework
- Low synchronization Gram–Schmidt and generalized minimal residual algorithms
- Efficient and accurate algorithms for computing matrix trigonometric functions
- Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver
- GPU-based block-wise nonlocal means denoising for 3D ultrasound images
- 3D data denoising via nonlocal means filter by using parallel GPU strategies
- Accelerated dimension-independent adaptive metropolis
- GPU accelerated computational homogenization based on a variational approach in a reduced basis framework
- Towards a parallel component in a GPU–CUDA environment: a case study with the L-BFGS Harwell routine
- GPGPU-based parallel computing applied in the FEM using the conjugate gradient algorithm: a review
- Megapixel Topology Optimization on a Graphics Processing Unit
- Title not available
- FloatX
- The Eigenvalues Slicing Library (EVSL): Algorithms, Implementation, and Software
- New Hermite series expansion for computing the matrix hyperbolic cosine
- Development of a parallel CUDA algorithm for solving 3D guiding center problems
- Graphics processing units and high-dimensional optimization
- Updating incomplete factorization preconditioners for model order reduction
- Rapid re-meshing and re-solution of three-dimensional boundary element problems for interactive stress analysis
- Optimal size of the block in block GMRES on GPUs: computational model and experiments
- On the GPGPU parallelization issues of finite element approximate inverse preconditioning
- Parallel and Heterogeneous \(m\)-Hessenberg-Triangular-Triangular Reduction
- Accelerating GPU Kernels for Dense Linear Algebra
- A heterogeneous parallel LU factorization algorithm based on a basic column block uniform allocation strategy
- Title not available
- Redesigning triangular dense matrix computations on GPUs
- Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
- Efficient GPU-based implementations of simplex type algorithms
- Acceleration of early-photon fluorescence molecular tomography with graphics processing units
- GPU accelerated intensities MPI (GAIN-MPI): a new method of computing Einstein-\(A\) coefficients
- Cucheb: a GPU implementation of the filtered Lanczos procedure
- Algorithms for Efficient Reproducible Floating Point Summation
- High-performance statistical computing in the computing environments of the 2020s
- A \(\mu\)-mode BLAS approach for multidimensional tensor-structured problems
- Highly efficient GPU eigensolver for three-dimensional photonic crystal band structures with any Bravais lattice