| Publication | Venue | Date of Publication | Type |
|---|---|---|---|
| Applying Dijkstra's vision to numerical software | | 2024-10-28 | Paper |
| Supporting Mixed-domain Mixed-precision Matrix Multiplication within the BLIS Framework | ACM Transactions on Mathematical Software | 2022-02-01 | Paper |
| Strassen's algorithm reloaded on GPUs | ACM Transactions on Mathematical Software | 2020-11-10 | Paper |
| Strassen's Algorithm for Tensor Contraction | SIAM Journal on Scientific Computing | 2018-06-05 | Paper |
| Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures | IEEE Transactions on Computers | 2017-07-12 | Paper |
| Algorithm, Architecture, and Floating-Point Unit Codesign of a Matrix Factorization Accelerator | IEEE Transactions on Computers | 2017-06-20 | Paper |
| Householder QR factorization with randomization for column pivoting (HQRRP) | SIAM Journal on Scientific Computing | 2017-05-31 | Paper |
| Programming matrix algorithms-by-blocks for thread-level parallelism | ACM Transactions on Mathematical Software | 2017-05-19 | Paper |
| A Runtime System for Programming Out-of-Core Matrix Algorithms-by-Tiles on Multithreaded Architectures | ACM Transactions on Mathematical Software | 2017-05-19 | Paper |
| High-performance up-and-downdating via Householder-like transformations | ACM Transactions on Mathematical Software | 2017-05-19 | Paper |
| Parallel matrix multiplication: a systematic journey | SIAM Journal on Scientific Computing | 2017-01-13 | Paper |
| BLIS: a framework for rapidly instantiating BLAS functionality | ACM Transactions on Mathematical Software | 2016-10-24 | Paper |
| Restructuring the tridiagonal and bidiagonal QR algorithms for performance | ACM Transactions on Mathematical Software | 2015-03-10 | Paper |
| Exploiting symmetry in tensors for high performance: multiplication with symmetric tensors | SIAM Journal on Scientific Computing | 2015-01-23 | Paper |
| Deriving dense linear algebra libraries | Formal Aspects of Computing | 2014-11-10 | Paper |
| Elemental, a new framework for distributed memory dense matrix computations | ACM Transactions on Mathematical Software | 2014-09-12 | Paper |
| Families of Algorithms for Reducing a Matrix to Condensed Form | ACM Transactions on Mathematical Software | 2014-09-12 | Paper |
| Sparse direct factorizations through unassembled hyper-matrices | Computer Methods in Applied Mechanics and Engineering | 2011-11-30 | Paper |
| Goal-oriented and modular stability analysis | SIAM Journal on Matrix Analysis and Applications | 2011-06-15 | Paper |
| Out-of-core solution of linear systems on graphics processors | International Journal of Parallel, Emergent and Distributed Systems | 2010-05-21 | Paper |
| An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization | High Performance Computing for Computational Science - VECPAR 2008 | 2009-02-03 | Paper |
| Scalable parallelization of FLAME code via the workqueuing model | ACM Transactions on Mathematical Software | 2008-12-21 | Paper |
| Anatomy of high-performance matrix multiplication | ACM Transactions on Mathematical Software | 2008-12-21 | Paper |
| Improving the performance of reduction to Hessenberg form | ACM Transactions on Mathematical Software | 2008-12-21 | Paper |
| Accumulating Householder transformations, revisited | ACM Transactions on Mathematical Software | 2008-12-21 | Paper |
| A Parallel Eigensolver for Dense Symmetric Matrices Based on Multiple Relatively Robust Representations | SIAM Journal on Scientific Computing | 2005-09-22 | Paper |
| Representing linear algebra algorithms in code: the FLAME application program interfaces | ACM Transactions on Mathematical Software | 2005-07-22 | Paper |
| The science of deriving dense linear algebra algorithms | ACM Transactions on Mathematical Software | 2005-07-22 | Paper |
| Parallel out-of-core computation and updating of the QR factorization | ACM Transactions on Mathematical Software | 2005-07-22 | Paper |
| FLAME | ACM Transactions on Mathematical Software | 2005-07-21 | Paper |
| Formal derivation of algorithms | ACM Transactions on Mathematical Software | 2005-07-21 | Paper |
| Specialized parallel algorithms for solving Lyapunov and Stein equations | Journal of Parallel and Distributed Computing | 2002-07-31 | Paper |
| scientific article; zbMATH DE number 1728263 | | 2002-04-15 | Paper |
| A note on parallel matrix inversion | SIAM Journal on Scientific Computing | 2001-03-19 | Paper |
| On global combine operations | Journal of Parallel and Distributed Computing | 1999-05-05 | Paper |
| scientific article; zbMATH DE number 1203510 | | 1998-11-08 | Paper |
| Parallelizing the QR Algorithm for the Unsymmetric Algebraic Eigenvalue Problem: Myths and Reality | SIAM Journal on Scientific Computing | 1997-02-24 | Paper |
| Parallel performance and scalability for block preconditioned finite element (p) solution of viscous flow | International Journal for Numerical Methods in Engineering | 1995-07-03 | Paper |
| High performance computational kernels for selected segments of a p finite element code | International Journal for Numerical Methods in Engineering | 1995-07-03 | Paper |
| Performance and scalability of finite element analysis for distributed parallel computation | Journal of Parallel and Distributed Computing | 1995-01-02 | Paper |
| Deferred Shifting Schemes for Parallel QR Methods | SIAM Journal on Matrix Analysis and Applications | 1993-05-16 | Paper |
| Reduction to condensed form for the eigenvalue problem on distributed memory architectures | Parallel Computing | 1993-01-17 | Paper |
| scientific article; zbMATH DE number 4215265 | | 1991-01-01 | Paper |