CUBLAS
From MaRDI portal
Software:18949
Related Items (80)
Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
Discrete particle swarm optimization for constructing uniform design on irregular regions
Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods
AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods
GPU-accelerated preconditioned GMRES method for two-dimensional Maxwell's equations
Compressed hierarchical Schur algorithm for frequency-domain analysis of photonic structures
Efficient and accurate algorithms for computing matrix trigonometric functions
Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver
MPI-CUDA sparse matrix-vector multiplication for the conjugate gradient method with an approximate inverse preconditioner
Solving a Large-Scale Thermal Radiation Problem Using an Interoperable Executive Library Framework on Petascale Supercomputers
Finite Element Integration on GPUs
Updating incomplete factorization preconditioners for model order reduction
FloatX
Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
GPU accelerated computational homogenization based on a variational approach in a reduced basis framework
Parallel Solver for Shifted Systems in a Hybrid CPU–GPU Framework
Acceleration of early-photon fluorescence molecular tomography with graphics processing units
New Hermite series expansion for computing the matrix hyperbolic cosine
A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
GPU accelerated intensities MPI (GAIN-MPI): a new method of computing Einstein-\(A\) coefficients
Optimal size of the block in block GMRES on GPUs: computational model and experiments
Parallel and Heterogeneous \(m\)-Hessenberg–Triangular–Triangular Reduction
Fast Taylor polynomial evaluation for the computation of the matrix cosine
Solving time-fractional reaction-diffusion systems through a tensor-based parallel algorithm
Redesigning triangular dense matrix computations on GPUs
Unnamed Item
Efficient GPU-based implementations of simplex type algorithms
A \(\mu\)-mode BLAS approach for multidimensional tensor-structured problems
Graphics processing units and high-dimensional optimization
Development of a parallel CUDA algorithm for solving 3D guiding center problems
Highly efficient GPU eigensolver for three-dimensional photonic crystal band structures with any Bravais lattice
SCELib4.0: the new program version for computing molecular properties in the single center approach
HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs
On the GPGPU parallelization issues of finite element approximate inverse preconditioning
KBLAS
Strassen’s Algorithm Reloaded on GPUs
Algorithms for Efficient Reproducible Floating Point Summation
Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
Efficient determination of the Markovian time-evolution towards a steady-state of a complex open quantum system
Cucheb: a GPU implementation of the filtered Lanczos procedure
GPU-accelerated algorithms for many-particle continuous-time quantum walks
A new efficient and accurate spline algorithm for the matrix exponential computation
A GPU-Accelerated Hybridizable Discontinuous Galerkin Method for Linear Elasticity
A GPU application for high-order compact finite difference scheme
Higher order finite elements in space and time for anisotropic simulations with variational integrators. Application of an efficient GPU implementation
An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations
GPU-Accelerated Bernstein–Bézier Discontinuous Galerkin Methods for Wave Problems
GPU optimization of large-scale eigenvalue solver
GPU-based block-wise nonlocal means denoising for 3D ultrasound images
3D data denoising via nonlocal means filter by using parallel GPU strategies
Efficient \(L_0\) resampling of point sets
Unnamed Item
GPGPU-based parallel computing applied in the FEM using the conjugate gradient algorithm: a review
Rapid re-meshing and re-solution of three-dimensional boundary element problems for interactive stress analysis
A heterogeneous parallel LU factorization algorithm based on a basic column block uniform allocation strategy
Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation
An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations
Accelerating GPU Kernels for Dense Linear Algebra
A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
A Fast Parallel SVM Algorithm for Massive Classification Tasks
Unnamed Item
A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units
The Eigenvalues Slicing Library (EVSL): Algorithms, Implementation, and Software
Towards a parallel component in a GPU–CUDA environment: a case study with the L-BFGS Harwell routine
Accelerated Dimension-Independent Adaptive Metropolis
Parallel Prony's Method with Multivariate Matrix Pencil Approach and Its Numerical Aspects
Megapixel Topology Optimization on a Graphics Processing Unit
Fast and robust flow simulations in discrete fracture networks with gpgpus
Evaluation of gas sales agreements with indexation using tree and least-squares Monte Carlo methods on graphics processing units
Accelerating the Explicitly Restarted Arnoldi Method with GPUs Using an Autotuned Matrix Vector Product
Unnamed Item
A Fast Dense Triangular Solve in CUDA
Auto-tuned Krylov methods on cluster of graphics processing unit
High-performance statistical computing in the computing environments of the 2020s
Low synchronization Gram–Schmidt and generalized minimal residual algorithms
Batch Matrix Exponentiation
A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems
A Framework for Error-Bounded Approximate Computing, with an Application to Dot Products
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines