Algorithms to compute the Burrows-Wheeler similarity distribution

DOI10.1016/J.TCS.2019.03.012zbMATH Open1423.68621arXiv1903.10583OpenAlexW2920807403WikidataQ128232643 ScholiaQ128232643MaRDI QIDQ2420649FDOQ2420649

Authors: Yanyan Li

Publication date: 6 June 2019

Published in: Theoretical Computer Science (Search for Journal in Brave)

Abstract: The Burrows-Wheeler transform (BWT) is a well studied text transformation widely used in data compression and text indexing. The BWT of two strings can also provide similarity measures between them, based on the observation that the more their symbols are intermixed in the transformation, the more the strings are similar. In this article we present two new algorithms to compute similarity measures based on the BWT for string collections. In particular, we present practical and theoretical improvements to the computation of the Burrows-Wheeler similarity distribution for all pairs of strings in a collection. Our algorithms take advantage of the BWT computed for the concatenation of all strings, and use compressed data structures that allow reducing the running time with a small memory footprint, as shown by a set of experiments with real and artificial datasets.

Full work available at URL: https://arxiv.org/abs/1903.10583

Recommendations

zbMATH Keywords

parallel algorithms compressed data structures Burrows-Wheeler transform string similarity string collections

Mathematics Subject Classification ID

Data structures (68P05) Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) (68P30) Parallel algorithms in computer science (68W10) Algorithms on strings (68W32)

Cites Work

Cited In (5)

This page was built for publication: Algorithms to compute the Burrows-Wheeler similarity distribution

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q2420649)