Reliable generation of high-performance matrix algebra

From MaRDI portal
Publication:2828141

DOI: 10.1145/2629698
zbMATH Open: 1347.68066
arXiv: 1205.1098
OpenAlex: W1877620707
Wikidata: Q113310287
Scholia: Q113310287
MaRDI QID: Q2828141
FDO: Q2828141


Authors: Thomas Nelson, Geoffrey Belter, Jeremy G. Siek, Elizabeth Jessup, Boyana Norris


Publication date: 24 October 2016

Published in: ACM Transactions on Mathematical Software

Abstract: Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in suboptimal performance. The entire sequence needs to be optimized in concert. Instead of vendor-tuned BLAS, a programmer could start with source code in Fortran or C (e.g., based on the Netlib BLAS) and use a state-of-the-art optimizing compiler. However, our experiments show that optimizing compilers often attain only one-quarter the performance of hand-optimized code. In this paper we present a domain-specific compiler for matrix algebra, the Build to Order BLAS (BTO), that reliably achieves high performance using a scalable search algorithm for choosing the best combination of loop fusion, array contraction, and multithreading for data parallelism. The BTO compiler generates code that is between 16% slower and 39% faster than hand-optimized code.
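The abstract names loop fusion and array contraction without illustrating them. The following is a minimal hand-written sketch in C, not output of the BTO compiler. The kernels, the function names, and the row-major storage of A are illustrative assumptions. The BiCG-style pair q = A p, s = A^T r shows fusion: two separate BLAS-style calls each stream all of A, while the fused loop reads each element of A once. An AXPY followed by a dot product shows contraction: once the loops are fused, the temporary vector collapses to a scalar.

```c
#include <stddef.h>

/* Unfused: q = A p, then s = A^T r. A is row-major, m rows by n columns.
   Each loop nest streams all of A, so A is read from memory twice,
   as it would be with two separate BLAS calls. */
void bicg_unfused(size_t m, size_t n, const double *A,
                  const double *p, const double *r,
                  double *q, double *s)
{
    for (size_t i = 0; i < m; ++i) {          /* q = A p */
        q[i] = 0.0;
        for (size_t j = 0; j < n; ++j)
            q[i] += A[i * n + j] * p[j];
    }
    for (size_t j = 0; j < n; ++j)            /* s = A^T r */
        s[j] = 0.0;
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j)
            s[j] += A[i * n + j] * r[i];
}

/* Fused: both products are computed in a single pass over A,
   so each element A[i][j] is loaded once and used twice. */
void bicg_fused(size_t m, size_t n, const double *A,
                const double *p, const double *r,
                double *q, double *s)
{
    for (size_t j = 0; j < n; ++j)
        s[j] = 0.0;
    for (size_t i = 0; i < m; ++i) {
        double qi = 0.0;
        for (size_t j = 0; j < n; ++j) {
            double a = A[i * n + j];          /* single load of A[i][j] */
            qi += a * p[j];                   /* contribution to q = A p */
            s[j] += a * r[i];                 /* contribution to s = A^T r */
        }
        q[i] = qi;
    }
}

/* Array contraction: t = alpha*x + y followed by d = t . t.
   Unfused, t would be an n-element temporary array. After fusion each
   t[i] is consumed as soon as it is produced, so t becomes a scalar. */
double axpy_dot_contracted(size_t n, double alpha,
                           const double *x, const double *y)
{
    double d = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double t = alpha * x[i] + y[i];       /* was t[i] before contraction */
        d += t * t;
    }
    return d;
}
```

Both BiCG variants accumulate in the same order, so they produce identical results; the fused one simply halves the traffic over A, which dominates the cost when A does not fit in cache.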


Full work available at URL: https://arxiv.org/abs/1205.1098




Cited in: 6 publications
