Reliable generation of high-performance matrix algebra

DOI 10.1145/2629698; MaRDI QID Q2828141

Authors Thomas Nelson, Geoffrey Belter, Jeremy G. Siek, Elizabeth Jessup, Boyana Norris

Publication date 24 October 2016

Published in ACM Transactions on Mathematical Software

Full work available at URL https://arxiv.org/abs/1205.1098

Keywords: genetic algorithms; linear algebra; domain-specific languages; autotuning

MSC classification: Complexity and performance of numerical algorithms (65Y20); Packaged methods for numerical algorithms (65Y15); Theory of compilers and interpreters (68N20); Software, source code, etc. for problems pertaining to numerical analysis (65-04); Numerical linear algebra (65Fxx)

Abstract: Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in suboptimal performance. The entire sequence needs to be optimized in concert. Instead of vendor-tuned BLAS, a programmer could start with source code in Fortran or C (e.g., based on the Netlib BLAS) and use a state-of-the-art optimizing compiler. However, our experiments show that optimizing compilers often attain only one-quarter the performance of hand-optimized code. In this paper we present a domain-specific compiler for matrix algebra, the Build to Order BLAS (BTO), that reliably achieves high performance using a scalable search algorithm for choosing the best combination of loop fusion, array contraction, and multithreading for data parallelism. The BTO compiler generates code that is between 16% slower and 39% faster than hand-optimized code.
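Two of the transformations named in the abstract, loop fusion and array contraction, can be illustrated on a small two-call sequence. The sketch below is hand-written and is not output of the BTO compiler. The kernel y = A^T (A x) (often called ATAX), the function names `atax_unfused` and `atax_fused`, and the use of plain Python lists are illustrative assumptions. The unfused version mirrors two separate GEMV calls that communicate through a temporary vector `t`. The fused version merges the two outer loops over the rows of A, so each row is used by both products back to back, and the temporary vector shrinks to a single scalar per row.

```python
# Hand-written illustration (not BTO output) of loop fusion and array
# contraction for y = A^T (A x), a sequence of two BLAS-2 (GEMV) calls.
# Function names are hypothetical.

def atax_unfused(A, x):
    """Two separate passes over A, mirroring two GEMV calls."""
    m, n = len(A), len(x)
    # Pass 1: t = A x, where t is a temporary m-vector.
    t = [0.0] * m
    for i in range(m):
        for j in range(n):
            t[i] += A[i][j] * x[j]
    # Pass 2: y = A^T t, which walks through all of A a second time.
    y = [0.0] * n
    for i in range(m):
        for j in range(n):
            y[j] += A[i][j] * t[i]
    return y


def atax_fused(A, x):
    """Outer row loops fused; the temporary vector t becomes a scalar."""
    m, n = len(A), len(x)
    y = [0.0] * n
    for i in range(m):
        row = A[i]  # row i is used by both inner loops, one after the other
        ti = 0.0    # array contraction: t[i] lives only in this scalar
        for j in range(n):
            ti += row[j] * x[j]
        # The second inner loop needs the finished ti, so the two inner
        # loops stay separate; only the outer i-loops are fused.
        for j in range(n):
            y[j] += row[j] * ti
    return y


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [1.0, 1.0]
print(atax_unfused(A, x))  # [79.0, 100.0]
print(atax_fused(A, x))    # [79.0, 100.0]
```

Python is used here only for readability. The memory-traffic saving that motivates fusion appears in compiled code: the unfused version streams all of A from memory twice, while the fused one typically still finds row i in cache for its second use, and it never allocates `t`. Because every iteration of the fused i-loop updates all of `y`, running that loop across threads would require a reduction on `y`. This is one way the choices among fusion, contraction, and multithreading interact.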

Cited in 6 works