Approximate word matches between two random sequences

From MaRDI portal
Publication:2476396

DOI: 10.1214/07-AAP452
zbMATH Open: 1141.60013
arXiv: 0801.3145
MaRDI QID: Q2476396
FDO: Q2476396


Authors: Conrad J. Burden, Miriam Ruth Kantorovitz, Susan R. Wilson


Publication date: 19 March 2008

Published in: The Annals of Applied Probability

Abstract: Given two sequences over a finite alphabet $\mathcal{L}$, the $D_2$ statistic is the number of $m$-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the $D_2$ statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For $k<m$, we look at the count of $m$-letter word matches with up to $k$ mismatches. For this statistic, we compute the expectation, give upper and lower bounds for the variance, and prove that its distribution is asymptotically normal.
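
The statistic described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `approx_word_matches` and `expected_matches` are hypothetical, words are assumed to be counted at every overlapping position of each (non-circular) sequence, and the brute-force double loop is chosen for clarity rather than speed. The expectation helper assumes only that letters in both sequences are drawn independently from the same distribution, which the strand symmetric Bernoulli model satisfies; the paper's own conventions and notation may differ.

```python
from math import comb


def hamming(u, v):
    """Number of positions at which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def approx_word_matches(seq_a, seq_b, m, k=0):
    """Count pairs (i, j) such that the m-letter words starting at
    position i of seq_a and position j of seq_b differ in at most k
    letters. With k = 0 this is the plain D2 count of exact matches."""
    count = 0
    for i in range(len(seq_a) - m + 1):
        word_a = seq_a[i:i + m]
        for j in range(len(seq_b) - m + 1):
            if hamming(word_a, seq_b[j:j + m]) <= k:
                count += 1
    return count


def expected_matches(n_a, n_b, m, k, probs):
    """Expected count for two independent sequences of lengths n_a and
    n_b whose letters are i.i.d. with the given letter probabilities
    (an assumption of this sketch).

    Two independent letters agree with probability p2 = sum of p^2, so
    the number of mismatches in a word pair is Binomial(m, 1 - p2)."""
    p2 = sum(p * p for p in probs.values())
    p_word = sum(comb(m, t) * (1 - p2) ** t * p2 ** (m - t)
                 for t in range(k + 1))
    n_pairs = max(0, n_a - m + 1) * max(0, n_b - m + 1)
    return n_pairs * p_word


if __name__ == "__main__":
    # Strand symmetric letter distribution: P(A) = P(T), P(C) = P(G).
    probs = {"A": 0.3, "T": 0.3, "C": 0.2, "G": 0.2}
    print(approx_word_matches("ACGT", "ACGA", m=2, k=0))
    print(approx_word_matches("ACGT", "ACGA", m=2, k=1))
    print(expected_matches(4, 4, m=2, k=1, probs=probs))
```

The brute-force count takes time proportional to the product of the two sequence lengths times $m$, which is adequate for small illustrations; practical database searches would need an indexed approach.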


Full work available at URL: https://arxiv.org/abs/0801.3145




