On the optimisation of the GSACA suffix array construction algorithm
From MaRDI portal
Publication:6166977
DOI10.1007/978-3-031-20643-6_8zbMATH Open1525.68040arXiv2206.12222MaRDI QIDQ6166977FDOQ6166977
Authors: Jannik Olbrich, Enno Ohlebusch, Thomas Büchler
Publication date: 4 August 2023
Published in: String Processing and Information Retrieval (Search for Journal in Brave)
Abstract: The suffix array is arguably one of the most important data structures in sequence analysis and consequently there is a multitude of suffix sorting algorithms. However, to this date the GSACA algorithm introduced in 2015 is the only known non-recursive linear-time suffix array construction algorithm (SACA). Despite its interesting theoretical properties, there has been little effort in improving the algorithm's subpar real-world performance. There is a super-linear algorithm DSH which relies on the same sorting principle and is faster than DivSufSort, the fastest SACA for over a decade. This paper is concerned with analysing the sorting principle used in GSACA and DSH and exploiting its properties in order to give an optimised linear-time algorithm. Our resulting algorithm is not only significantly faster than GSACA but also outperforms DivSufSort and DSH.
Full work available at URL: https://arxiv.org/abs/2206.12222
Recommendations
Cites Work
- Factorizing words over an ordered alphabet
- Bioinformatics algorithms. Sequence analysis, genome rearrangements, and phylogenetic reconstruction
- Title not available (Why is that?)
- Linear-time suffix sorting -- a new approach for suffix array construction
- Optimal in-place suffix sorting
- Lyndon Words Accelerate Suffix Sorting.
This page was built for publication: On the optimisation of the GSACA suffix array construction algorithm
Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q6166977)