CUBLAS

From MaRDI portal
Software:18949



swMATH: 6880
MaRDI QID: Q18949


No author found.





Related Items (80)

Matrix Multiplication in Multiword Arithmetic: Error Analysis and Application to GPU Tensor Cores
Discrete particle swarm optimization for constructing uniform design on irregular regions
Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods
AmgX: A Library for GPU Accelerated Algebraic Multigrid and Preconditioned Iterative Methods
GPU-accelerated preconditioned GMRES method for two-dimensional Maxwell's equations
Compressed hierarchical Schur algorithm for frequency-domain analysis of photonic structures
Efficient and accurate algorithms for computing matrix trigonometric functions
Performance models and workload distribution algorithms for optimizing a hybrid CPU-GPU multifrontal solver
MPI-CUDA sparse matrix-vector multiplication for the conjugate gradient method with an approximate inverse preconditioner
Solving a Large-Scale Thermal Radiation Problem Using an Interoperable Executive Library Framework on Petascale Supercomputers
Finite Element Integration on GPUs
Updating incomplete factorization preconditioners for model order reduction
FloatX
Portable implementation model for CFD simulations. Application to hybrid CPU/GPU supercomputers
GPU accelerated computational homogenization based on a variational approach in a reduced basis framework
Parallel Solver for Shifted Systems in a Hybrid CPU--GPU Framework
Acceleration of early-photon fluorescence molecular tomography with graphics processing units
New Hermite series expansion for computing the matrix hyperbolic cosine
A parallel computing method using blocked format with optimal partitioning for SpMV on GPU
GPU accelerated intensities MPI (GAIN-MPI): a new method of computing Einstein-\(A\) coefficients
Optimal size of the block in block GMRES on GPUs: computational model and experiments
Parallel and Heterogeneous $m$--Hessenberg--Triangular--Triangular Reduction
Fast Taylor polynomial evaluation for the computation of the matrix cosine
Solving time-fractional reaction-diffusion systems through a tensor-based parallel algorithm
Redesigning triangular dense matrix computations on GPUs
Unnamed Item
Efficient GPU-based implementations of simplex type algorithms
A \(\mu\)-mode BLAS approach for multidimensional tensor-structured problems
Graphics processing units and high-dimensional optimization
Development of a parallel CUDA algorithm for solving 3D guiding center problems
Highly efficient GPU eigensolver for three-dimensional photonic crystal band structures with any Bravais lattice
SCELib4.0: the new program version for computing molecular properties in the single center approach
HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs
On the GPGPU parallelization issues of finite element approximate inverse preconditioning
KBLAS
Strassen’s Algorithm Reloaded on GPUs
Algorithms for Efficient Reproducible Floating Point Summation
Parallel reduction of four matrices to condensed form for a generalized matrix eigenvalue algorithm
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures
Efficient determination of the Markovian time-evolution towards a steady-state of a complex open quantum system
Cucheb: a GPU implementation of the filtered Lanczos procedure
GPU-accelerated algorithms for many-particle continuous-time quantum walks
A new efficient and accurate spline algorithm for the matrix exponential computation
A GPU-Accelerated Hybridizable Discontinuous Galerkin Method for Linear Elasticity
A GPU application for high-order compact finite difference scheme
Higher order finite elements in space and time for anisotropic simulations with variational integrators. Application of an efficient GPU implementation
An efficient and accurate algorithm for computing the matrix cosine based on new Hermite approximations
GPU-Accelerated Bernstein--Bézier Discontinuous Galerkin Methods for Wave Problems
GPU optimization of large-scale eigenvalue solver
GPU-based block-wise nonlocal means denoising for 3D ultrasound images
3D data denoising via nonlocal means filter by using parallel GPU strategies
Efficient \(L_0\) resampling of point sets
Unnamed Item
GPGPU-based parallel computing applied in the FEM using the conjugate gradient algorithm: a review
Rapid re-meshing and re-solution of three-dimensional boundary element problems for interactive stress analysis
A heterogeneous parallel LU factorization algorithm based on a basic column block uniform allocation strategy
Performance and Numerical Accuracy Evaluation of Heterogeneous Multicore Systems for Krylov Orthogonal Basis Computation
An Error Correction Solver for Linear Systems: Evaluation of Mixed Precision Implementations
Accelerating GPU Kernels for Dense Linear Algebra
A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators
A Fast Parallel SVM Algorithm for Massive Classification Tasks
Unnamed Item
A Dynamic Pattern Factored Sparse Approximate Inverse Preconditioner on Graphics Processing Units
The Eigenvalues Slicing Library (EVSL): Algorithms, Implementation, and Software
Towards a parallel component in a GPU–CUDA environment: a case study with the L-BFGS Harwell routine
Accelerated Dimension-Independent Adaptive Metropolis
Parallel Prony's Method with Multivariate Matrix Pencil Approach and Its Numerical Aspects
Megapixel Topology Optimization on a Graphics Processing Unit
Fast and robust flow simulations in discrete fracture networks with gpgpus
Evaluation of gas sales agreements with indexation using tree and least-squares Monte Carlo methods on graphics processing units
Accelerating the Explicitly Restarted Arnoldi Method with GPUs Using an Autotuned Matrix Vector Product
Unnamed Item
A Fast Dense Triangular Solve in CUDA
Auto-tuned Krylov methods on cluster of graphics processing unit
High-performance statistical computing in the computing environments of the 2020s
Low synchronization Gram–Schmidt and generalized minimal residual algorithms
Batch Matrix Exponentiation
A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems
A Framework for Error-Bounded Approximate Computing, with an Application to Dot Products
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines

