On Finite Memory Universal Data Compression and Classification of Individual Sequences

DOI10.1109/TIT.2008.917666MaRDI QIDQ3604520zbMATH OpenOpenAlexFDO

Publication date 24 February 2009

Published in IEEE Transactions on Information Theory (Search for Journal in Brave)

Full work available at URL https://arxiv.org/abs/cs/0612019

Source coding (94A29) Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) (68P30)

Abstract: Consider the case where consecutive blocks of N letters of a semi-infinite individual sequence X over a finite-alphabet are being compressed into binary sequences by some one-to-one mapping. No a-priori information about X is available at the encoder, which must therefore adopt a universal data-compression algorithm. It is known that if the universal LZ77 data compression algorithm is successively applied to N-blocks then the best error-free compression for the particular individual sequence X is achieved, as

N

tends to infinity. The best possible compression that may be achieved by any universal data compression algorithm for finite N-blocks is discussed. It is demonstrated that context tree coding essentially achieves it. Next, consider a device called classifier (or discriminator) that observes an individual training sequence X. The classifier's task is to examine individual test sequences of length N and decide whether the test N-sequence has the same features as those that are captured by the training sequence X, or is sufficiently different, according to some appropriatecriterion. Here again, it is demonstrated that a particular universal context classifier with a storage-space complexity that is linear in N, is essentially optimal. This may contribute a theoretical "individual sequence" justification for the Probabilistic Suffix Tree (PST) approach in learning theory and in computational biology.

Cited in

(2)

This page was built for publication: On Finite Memory Universal Data Compression and Classification of Individual Sequences

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q3604520)