Suffix-sorting via Shannon-Fano-Elias codes (Q1662548): Difference between revisions

From MaRDI portal
RedirectionBot (talk | contribs)
Changed an Item
Normalize DOI.
 
(3 intermediate revisions by 3 users not shown)
Property / DOI
 
Property / DOI: 10.3390/a3020145 / rank
Normal rank
 
Property / MaRDI profile type
 
Property / MaRDI profile type: MaRDI publication profile / rank
 
Normal rank
Property / full work available at URL
 
Property / full work available at URL: https://doi.org/10.3390/a3020145 / rank
 
Normal rank
Property / OpenAlex ID
 
Property / OpenAlex ID: W2175286270 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Suffix Arrays: A New Method for On-Line String Searches / rank
 
Normal rank
Property / cites work
 
Property / cites work: Faster suffix sorting / rank
 
Normal rank
Property / cites work
 
Property / cites work: Engineering a lightweight suffix array construction algorithm / rank
 
Normal rank
Property / cites work
 
Property / cites work: Algorithms on Strings, Trees and Sequences / rank
 
Normal rank
Property / cites work
 
Property / cites work: Linear work suffix array construction / rank
 
Normal rank
Property / cites work
 
Property / cites work: Space efficient linear time construction of suffix arrays / rank
 
Normal rank
Property / cites work
 
Property / cites work: Asymptotic properties of data compression and suffix trees / rank
 
Normal rank
Property / cites work
 
Property / cites work: Replacing suffix trees with enhanced suffix arrays / rank
 
Normal rank
Property / cites work
 
Property / cites work: On the sorting-complexity of suffix tree construction / rank
 
Normal rank
Property / cites work
 
Property / cites work: Constructing suffix arrays in linear time / rank
 
Normal rank
Property / cites work
 
Property / cites work: Optimal Lightweight Construction of Suffix Arrays for Constant Alphabets / rank
 
Normal rank
Property / cites work
 
Property / cites work: Breaking a Time-and-Space Barrier in Constructing Full-Text Indices / rank
 
Normal rank
Property / cites work
 
Property / cites work: Combinatorial Pattern Matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: Fast Lightweight Suffix Array Construction and Checking / rank
 
Normal rank
Property / cites work
 
Property / cites work: Linear Time Suffix Array Construction Using D-Critical Substrings / rank
 
Normal rank
Property / cites work
 
Property / cites work: In-Place Suffix Sorting / rank
 
Normal rank
Property / cites work
 
Property / cites work: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching / rank
 
Normal rank
Property / cites work
 
Property / cites work: New approaches for computer analysis of nucleic acid sequences. / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4023085 / rank
 
Normal rank
Property / DOI
 
Property / DOI: 10.3390/A3020145 / rank
 
Normal rank

Latest revision as of 01:54, 11 December 2024

scientific article
Language Label Description Also known as
English
Suffix-sorting via Shannon-Fano-Elias codes
scientific article

    Statements

    Suffix-sorting via Shannon-Fano-Elias codes (English)
    0 references
    0 references
    0 references
    0 references
    20 August 2018
    0 references
    Summary: Given a sequence \(T=t_0t_1\dots t_{n-1}\) of size \(n=| T|\), with symbols from a fixed alphabet \(\Sigma\), \((|\Sigma|\leq n)\), the suffix array provides a listing of all the suffixes of \(T\) in a lexicographic order. Given \(T\), the suffix sorting problem is to construct its suffix array. The direct suffix sorting problem is to construct the suffix array of \(T\) directly without using the suffix tree data structure. While algorithims for linear time, linear space direct suffix sorting have been proposed, the actual constant in the linear space is still a major concern, given that the applications of suffix trees and suffix arrays (such as in whole-genome analysis) often involve huge data sets. In this work, we reduce the gap between current results and the minimal space requirement. We introduce an algorithm for the direct suffix sorting problem with worst case time complexity in \(O(n)\), requiring only \((1\frac{2}{3}n\log n-n\log|\Sigma|+O(1))\) bits in memory space. This implies \(5\frac{2}{3}n+O(1)\) bytes for total space requirment, (including space for both the output suffix array and the input sequence \(T\)) assuming \(n\leq 2^{32}\), \(|\Sigma|\leq 256\), and 4 bytes per integer. The basis of our algorithm is an extension of Shannon-Fano-Elias codes used in source coding and information theory. This is the first time information-theoretic methods have been used as the basis for solving the suffix sorting problem.
    0 references
    suffix sorting
    0 references
    suffix arrays
    0 references
    suffix tree
    0 references
    Shannon-Fano-Elias codes
    0 references
    source coding
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references