| Publication | Date of Publication | Type |
|---|---|---|
| Applying Dijkstra's vision to numerical software | 2024-10-28 | Paper |
| Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework | 2022-02-01 | Paper |
| Strassen’s Algorithm Reloaded on GPUs | 2020-11-10 | Paper |
| Strassen's Algorithm for Tensor Contraction | 2018-06-05 | Paper |
| Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures | 2017-07-12 | Paper |
| Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator | 2017-06-20 | Paper |
| Householder QR Factorization With Randomization for Column Pivoting (HQRRP) | 2017-05-31 | Paper |
| Programming matrix algorithms-by-blocks for thread-level parallelism | 2017-05-19 | Paper |
| A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures | 2017-05-19 | Paper |
| High-performance up-and-downdating via Householder-like transformations | 2017-05-19 | Paper |
| Parallel Matrix Multiplication: A Systematic Journey | 2017-01-13 | Paper |
| BLIS: a framework for rapidly instantiating BLAS functionality | 2016-10-24 | Paper |
| Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance | 2015-03-10 | Paper |
| Exploiting Symmetry in Tensors for High Performance: Multiplication with Symmetric Tensors | 2015-01-23 | Paper |
| Deriving dense linear algebra libraries | 2014-11-10 | Paper |
| Elemental | 2014-09-12 | Paper |
| Families of Algorithms for Reducing a Matrix to Condensed Form | 2014-09-12 | Paper |
| Sparse direct factorizations through unassembled hyper-matrices | 2011-11-30 | Paper |
| Goal-Oriented and Modular Stability Analysis | 2011-06-15 | Paper |
| Out-of-core solution of linear systems on graphics processors | 2010-05-21 | Paper |
| An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization | 2009-02-03 | Paper |
| Scalable parallelization of FLAME code via the workqueuing model | 2008-12-21 | Paper |
| Anatomy of high-performance matrix multiplication | 2008-12-21 | Paper |
| Improving the performance of reduction to Hessenberg form | 2008-12-21 | Paper |
| Accumulating Householder transformations, revisited | 2008-12-21 | Paper |
| A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations | 2005-09-22 | Paper |
| Representing linear algebra algorithms in code: the FLAME application program interfaces | 2005-07-22 | Paper |
| The science of deriving dense linear algebra algorithms | 2005-07-22 | Paper |
| Parallel out-of-core computation and updating of the QR factorization | 2005-07-22 | Paper |
| FLAME | 2005-07-21 | Paper |
| Formal derivation of algorithms | 2005-07-21 | Paper |
| Specialized parallel algorithms for solving Lyapunov and Stein equations | 2002-07-31 | Paper |
| https://portal.mardi4nfdi.de/entity/Q2779289 | 2002-04-15 | Paper |
| A note on parallel matrix inversion | 2001-03-19 | Paper |
| On global combine operations | 1999-05-05 | Paper |
| https://portal.mardi4nfdi.de/entity/Q4209701 | 1998-11-08 | Paper |
| Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality | 1997-02-24 | Paper |
| Parallel performance and scalability for block preconditioned finite element (p) solution of viscous flow | 1995-07-03 | Paper |
| High performance computational kernels for selected segments of a p finite element code | 1995-07-03 | Paper |
| Performance and scalability of finite element analysis for distributed parallel computation | 1995-01-02 | Paper |
| Deferred Shifting Schemes for Parallel QR Methods | 1993-05-16 | Paper |
| Reduction to condensed form for the eigenvalue problem on distributed memory architectures | 1993-01-17 | Paper |
| https://portal.mardi4nfdi.de/entity/Q3361798 | 1991-01-01 | Paper |