Textual data compression in computational biology: algorithmic techniques
DOI: 10.1016/j.cosrev.2011.11.001
zbMATH: 1298.68087
OpenAlex: W2089566450
MaRDI QID: Q465689
Davide Scaturro, Raffaele Giancarlo, Filippo Utro
Publication date: 24 October 2014
Published in: Computer Science Review
Full work available at URL: https://doi.org/10.1016/j.cosrev.2011.11.001
Keywords: entropy; Kolmogorov complexity; hidden Markov models; sequence alignment; Huffman coding; minimum description length principle; alignment-free sequence comparison; data compression practice; data compression theory; Lempel-Ziv compressors; pattern discovery in bioinformatics; reverse engineering of biological networks
Mathematics Subject Classification: Database theory (68P15); Learning and adaptive systems in artificial intelligence (68T05); Pattern recognition, speech recognition (68T10); Algorithmic information theory (Kolmogorov complexity, etc.) (68Q30); Protein sequences, DNA sequences (92D20); Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) (68P30); Research exposition (monographs, survey articles) pertaining to computer science (68-02); Computational methods for problems pertaining to biology (92-08)
Cites Work
- Unnamed Item (10 entries)
- Sublinear growth of information in DNA sequences
- The power of amnesia: Learning probabilistic automata with variable memory length
- Speeding up HMM decoding and training by exploiting sequence repetitions
- A small trip in the untranquil world of genomes: a survey on the detection and analysis of genome rearrangement breakpoints
- Normalized Lempel-Ziv complexity and its application in bio-sequence analysis
- Grammatical inference by hill climbing
- A new challenge for compression algorithms: Genetic sequences
- Comparison of TOPS strings based on LZ complexity
- Dynamical systems and computable information
- Compressing table data with column dependency
- Asymptotically optimal low-complexity sequential lossless coding for piecewise-stationary memoryless sources. I. The regular case
- Improving table compression with combinatorial optimization
- Speeding Up HMM Decoding and Training by Exploiting Sequence Repetitions
- Clustering by Compression
- The Similarity Metric
- Maximal Words in Sequence Comparisons Based on Subword Composition
- On Finite Memory Universal Data Compression and Classification of Individual Sequences
- The Application of Data Compression-Based Distances to Biological Sequences
- A Greedy Heuristic for the Set-Covering Problem
- Linear Algorithm for Data Compression via String Matching
- Universal codeword sets and representations of the integers
- On the Complexity of Finite Sequences
- Compression of individual sequences via variable-rate coding
- Arithmetic Coding
- Biological Sequence Analysis
- Algorithms on Strings, Trees and Sequences
- A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices
- New text indexing functionalities of the compressed suffix arrays
- Minimum description length induction, Bayesianism, and Kolmogorov complexity
- Grammar-based codes: a new class of universal lossless source codes
- A dynamic programming algorithm for haplotype block partitioning
- Information distance
- The minimum description length principle in coding and modeling
- On classification with empirically observed statistics and universal data compression
- Biological Information as Set-Based Complexity
- Bridging Lossy and Lossless Compression by Motif Pattern Discovery
- A Method for the Construction of Minimum-Redundancy Codes
- Run-length encodings (Corresp.)
- Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
- Combinatorial Pattern Matching