Publication | Date of Publication | Type |
Applying Dijkstra's vision to numerical software | 2024-10-28 | Paper |
Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework | 2022-02-01 | Paper |
Strassen’s Algorithm Reloaded on GPUs | 2020-11-10 | Paper |
Strassen's Algorithm for Tensor Contraction | 2018-06-05 | Paper |
Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures | 2017-07-12 | Paper |
Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator | 2017-06-20 | Paper |
Householder QR Factorization With Randomization for Column Pivoting (HQRRP) | 2017-05-31 | Paper |
Programming matrix algorithms-by-blocks for thread-level parallelism | 2017-05-19 | Paper |
High-performance up-and-downdating via Householder-like transformations | 2017-05-19 | Paper |
A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures | 2017-05-19 | Paper |
Parallel Matrix Multiplication: A Systematic Journey | 2017-01-13 | Paper |
BLIS: A Framework for Rapidly Instantiating BLAS Functionality | 2016-10-24 | Paper |
Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance | 2015-03-10 | Paper |
Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors | 2015-01-23 | Paper |
Deriving dense linear algebra libraries | 2014-11-10 | Paper |
Families of Algorithms for Reducing a Matrix to Condensed Form | 2014-09-12 | Paper |
Elemental | 2014-09-12 | Paper |
Sparse direct factorizations through unassembled hyper-matrices | 2011-11-30 | Paper |
Goal-Oriented and Modular Stability Analysis | 2011-06-15 | Paper |
Out-of-core solution of linear systems on graphics processors | 2010-05-21 | Paper |
An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization | 2009-02-03 | Paper |
Accumulating Householder transformations, revisited | 2008-12-21 | Paper |
Improving the performance of reduction to Hessenberg form | 2008-12-21 | Paper |
Scalable parallelization of FLAME code via the workqueuing model | 2008-12-21 | Paper |
Anatomy of high-performance matrix multiplication | 2008-12-21 | Paper |
A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations | 2005-09-22 | Paper |
The science of deriving dense linear algebra algorithms | 2005-07-22 | Paper |
Representing linear algebra algorithms in code: the FLAME application program interfaces | 2005-07-22 | Paper |
Parallel out-of-core computation and updating of the QR factorization | 2005-07-22 | Paper |
FLAME | 2005-07-21 | Paper |
Formal derivation of algorithms | 2005-07-21 | Paper |
Specialized parallel algorithms for solving Lyapunov and Stein equations | 2002-07-31 | Paper |
| 2002-04-15 | Paper |
A Note On Parallel Matrix Inversion | 2001-03-19 | Paper |
On global combine operations | 1999-05-05 | Paper |
| 1998-11-08 | Paper |
Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality | 1997-02-24 | Paper |
Parallel performance and scalability for block preconditioned finite element (p) solution of viscous flow | 1995-07-03 | Paper |
High performance computational kernels for selected segments of a p finite element code | 1995-07-03 | Paper |
Performance and scalability of finite element analysis for distributed parallel computation | 1995-01-02 | Paper |
Deferred Shifting Schemes for Parallel QR Methods | 1993-05-16 | Paper |
Reduction to condensed form for the eigenvalue problem on distributed memory architectures | 1993-01-17 | Paper |
| 1991-01-01 | Paper |