Note on the greedy parsing optimality for dictionary-based text compression

DOI: 10.1016/J.TCS.2014.01.013; MaRDI QID: Q2437746

Authors Maxime Crochemore, Alessio Langiu, Filippo Mignosi

Publication date 13 March 2014

Published in Theoretical Computer Science

Full work available at URL https://arxiv.org/abs/1211.5108

data compression; text compression; optimal parsing; greedy parsing; LZ77 algorithm; Lempel-Ziv factorisation

Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) (68P30)
Computing methodologies for text processing; mathematical typography (68U15)

Abstract: LZ77-based compression schemes compress the input text by replacing factors in the text with an encoded reference to a previous occurrence, formed by the couple (length, offset). For a given factor, the smaller the offset, the smaller the resulting compression ratio. This is optimally achieved by using the rightmost occurrence of a factor in the previous text. Given a cost function, for instance the minimum number of bits used to represent an integer, we define the Rightmost Equal-Cost Position (REP) problem as the problem of finding one of the occurrences of a factor whose cost is equal to the cost of the rightmost one. We present the Multi-Layer Suffix Tree data structure which, for a text of length n, provides at any time i: REP(LPF) in constant time, where LPF is the longest previous factor, i.e. the greedy phrase; a reference to the list of REP({set of prefixes of LPF}) in constant time; and REP(p) in time O(|p| log log n) for any given pattern p.
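
The definitions in the abstract can be illustrated with a small brute-force sketch. It is not the Multi-Layer Suffix Tree and makes no attempt at the stated time bounds; it only spells out what LPF, the rightmost occurrence, and the REP condition mean. All function names are hypothetical, the cost function is assumed to be the binary length of the offset, and previous occurrences are assumed to be allowed to overlap the current position, as in self-referencing LZ77 copies.

```python
def bit_cost(offset: int) -> int:
    """Assumed cost function: minimum number of bits to write the offset in binary."""
    return offset.bit_length()


def previous_occurrences(text: str, i: int, length: int) -> list[int]:
    """Start positions j < i where text[j:j+length] equals the factor text[i:i+length].

    Occurrences may overlap position i (an assumption of this sketch).
    """
    factor = text[i:i + length]
    return [j for j in range(i) if text[j:j + length] == factor]


def lpf_length(text: str, i: int) -> int:
    """Length of the longest previous factor (the greedy phrase) at position i."""
    length = 0
    while i + length < len(text) and previous_occurrences(text, i, length + 1):
        length += 1
    return length


def rep_candidates(text: str, i: int, length: int) -> list[int]:
    """All previous occurrences whose offset costs the same as the rightmost one.

    Any element of the returned list is a valid answer to the REP problem.
    """
    occ = previous_occurrences(text, i, length)
    if not occ:
        return []
    rightmost_cost = bit_cost(i - occ[-1])  # occ is sorted, so occ[-1] is rightmost
    return [j for j in occ if bit_cost(i - j) == rightmost_cost]


def greedy_parse(text: str) -> list:
    """Greedy parsing: emit (length, offset) for the LPF, or a literal character."""
    phrases, i = [], 0
    while i < len(text):
        length = lpf_length(text, i)
        if length == 0:
            phrases.append(text[i])  # no previous occurrence: literal
            i += 1
        else:
            j = previous_occurrences(text, i, length)[-1]  # rightmost occurrence
            phrases.append((length, i - j))
            i += length
    return phrases


if __name__ == "__main__":
    text = "xababyyyab"
    # At position 8 the factor "ab" occurs earlier at positions 1 and 3,
    # i.e. at offsets 7 and 5, which both need 3 bits under bit_cost.
    print(rep_candidates(text, 8, 2))
    print(greedy_parse(text))
```

In this example both earlier occurrences of "ab" are acceptable REP answers, because their offsets fall in the same cost class even though only one of them is the rightmost occurrence.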

Cited in (9)
