Alignment-free sequence comparison using absent words

From MaRDI portal
Publication:1784946

DOI10.1016/J.IC.2018.06.002zbMATH Open1400.68264arXiv1806.02718OpenAlexW2805439715WikidataQ129623427 ScholiaQ129623427MaRDI QIDQ1784946FDOQ1784946


Authors: Panagiotis Charalampopoulos, Maxime Crochemore, Gabriele Fici, Robert Mercaş, Solon P. Pissis Edit this on Wikidata


Publication date: 27 September 2018

Published in: Information and Computation (Search for Journal in Brave)

Abstract: Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an {em absent word} of some sequence if it does not occur in the sequence. An absent word is {em minimal} if all its proper factors occur in the sequence. Here we present the first linear-time and linear-space algorithm to compare two sequences by considering {em all} their minimal absent words. In the process, we present results of combinatorial interest, and also extend the proposed techniques to compare circular sequences. We also present an algorithm that, given a word x of length n, computes the largest integer for which all factors of x of that length occur in some minimal absent word of x in time and space cO(n). Finally, we show that the known asymptotic upper bound on the number of minimal absent words of a word is tight.


Full work available at URL: https://arxiv.org/abs/1806.02718




Recommendations




Cites Work


Cited In (28)

Uses Software





This page was built for publication: Alignment-free sequence comparison using absent words

Report a bug (only for logged in users!)Click here to report a bug for this page (MaRDI item Q1784946)