TTC: a high-performance compiler for tensor transpositions

Publication:4581372

DOI: 10.1145/3104988
zbMATH Open: 1484.68045
arXiv: 1603.02297
OpenAlex: W2963964033
Wikidata: Q113310112
Scholia: Q113310112
MaRDI QID: Q4581372
FDO: Q4581372


Authors: Paul Springer, Jeff R. Hammond, Paolo Bientinesi


Publication date: 17 August 2018

Published in: ACM Transactions on Mathematical Software

Abstract: We present TTC, an open-source parallel compiler for multidimensional tensor transpositions. In order to generate high-performance C++ code, TTC explores a number of optimizations, including software prefetching, blocking, loop-reordering, and explicit vectorization. To evaluate the performance of multidimensional transpositions across a range of possible use-cases, we also release a benchmark covering arbitrary transpositions of up to six dimensions. Performance results show that the routines generated by TTC achieve close to peak memory bandwidth on both the Intel Haswell and the AMD Steamroller architectures, and yield significant performance gains over modern compilers. By implementing a set of pruning heuristics, TTC allows users to limit the number of potential solutions; this option is especially useful when dealing with high-dimensional tensors, as the search space might become prohibitively large. Experiments indicate that when only 100 potential solutions are considered, the resulting performance is about 99% of that achieved with exhaustive search.
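Blocking is one of the optimizations the abstract lists. As a rough illustration of that idea only, the sketch below transposes a 2D row-major matrix tile by tile so that reads and writes both stay within a small working set. The function name `transpose_blocked`, the default tile size of 16, and the restriction to two dimensions are assumptions made for this example; this is not the code TTC generates, and it omits the prefetching, loop reordering, and explicit vectorization that the abstract also mentions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TTC): transposes a row-major
// rows x cols matrix into a row-major cols x rows matrix, tile by tile.
std::vector<float> transpose_blocked(const std::vector<float>& in,
                                     std::size_t rows, std::size_t cols,
                                     std::size_t block = 16) {
    assert(in.size() == rows * cols);
    assert(block > 0);
    std::vector<float> out(rows * cols);
    // Outer loops walk over tiles; edge tiles are clipped with std::min.
    for (std::size_t ib = 0; ib < rows; ib += block) {
        for (std::size_t jb = 0; jb < cols; jb += block) {
            const std::size_t iend = std::min(ib + block, rows);
            const std::size_t jend = std::min(jb + block, cols);
            // Inner loops copy one tile: element (i, j) moves to (j, i).
            for (std::size_t i = ib; i < iend; ++i) {
                for (std::size_t j = jb; j < jend; ++j) {
                    out[j * rows + i] = in[i * cols + j];
                }
            }
        }
    }
    return out;
}
```

The tile size here is an arbitrary placeholder; in practice the best value depends on cache and vector-register sizes, which is the kind of parameter a search-based generator would tune.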


Full work available at URL: https://arxiv.org/abs/1603.02297



