Approximate word matches between two random sequences
Publication:2476396
DOI: 10.1214/07-AAP452
zbMATH Open: 1141.60013
arXiv: 0801.3145
MaRDI QID: Q2476396
FDO: Q2476396
Authors: Conrad J. Burden, Miriam Ruth Kantorovitz, Susan R. Wilson
Publication date: 19 March 2008
Published in: The Annals of Applied Probability
Abstract: Given two sequences over a finite alphabet, the \(D_2\) statistic is the number of \(m\)-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the \(D_2\) statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For \(k < m\), we look at the count of \(m\)-letter word matches with up to \(k\) mismatches. For this statistic, we compute the expectation, give upper and lower bounds for the variance and prove its distribution is asymptotically normal.
Full work available at URL: https://arxiv.org/abs/0801.3145
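The statistic described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `approx_word_matches` and `expected_matches` are hypothetical, the sequences are treated as linear (non-periodic) strings, and the expectation is the elementary one that follows from assuming independent, identically distributed letters, where the number of mismatches in any word pair is binomial. The paper's own conventions (for example its boundary treatment) may differ.

```python
from math import comb


def approx_word_matches(a: str, b: str, m: int, k: int) -> int:
    """Count pairs (i, j) such that the m-letter words a[i:i+m] and
    b[j:j+m] differ in at most k positions (brute force, O(|a||b|m))."""
    count = 0
    for i in range(len(a) - m + 1):
        for j in range(len(b) - m + 1):
            mismatches = sum(x != y for x, y in zip(a[i:i + m], b[j:j + m]))
            if mismatches <= k:
                count += 1
    return count


def expected_matches(n_a: int, n_b: int, m: int, k: int, probs: dict) -> float:
    """Expected count under i.i.d. (Bernoulli) letters with distribution
    `probs`, assuming linear sequences of lengths n_a and n_b.

    Two independent letters agree with probability p2 = sum of squared
    letter probabilities, so the mismatch count in a word pair is
    Binomial(m, 1 - p2)."""
    p2 = sum(p * p for p in probs.values())
    tail = sum(comb(m, t) * (1 - p2) ** t * p2 ** (m - t) for t in range(k + 1))
    return (n_a - m + 1) * (n_b - m + 1) * tail


# Strand symmetry is taken here to mean P(A) = P(T) and P(C) = P(G);
# the uniform distribution is one such case (an illustrative choice).
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

exact = approx_word_matches("ACGT", "ACGA", m=2, k=0)   # k = 0 is the plain word-match count
approx = approx_word_matches("ACGT", "ACGA", m=2, k=1)  # allow one mismatch per word
mean = expected_matches(4, 4, m=2, k=1, probs=uniform)
```

Setting `k = 0` recovers the exact word-match count, and increasing `k` can only increase the count, since every pair counted at a smaller `k` is still counted.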
Recommendations
- Asymptotic Behavior of k-Word Matches Between two Uniformly Distributed Sequences
- Distributional regimes for the number of \(k\)-word matches between two random sequences
- An extreme value theory for sequence matching
- Counts of long aligned word matches among random letter sequences
- An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences
Keywords: DNA sequences; central limit theorem; sequence comparison; word matches; number of \(m\)-letter word matches
Cites Work
- Normal convergence by higher semi-invariants with applications to sums of dependent random variables and random graphs
- Title not available
- Distributional regimes for the number of \(k\)-word matches between two random sequences
- Title not available
- Poisson approximation for dependent trials
- Compound Poisson approximation: A user's guide
- Title not available
- Asymptotic Behavior of k-Word Matches Between two Uniformly Distributed Sequences
- Title not available
Cited In (9)
- Distributional regimes for the number of \(k\)-word matches between two random sequences
- Counts of long aligned word matches among random letter sequences
- Extraction of high quality \(k\)-words for alignment-free sequence comparison
- Scoring unusual words with varying mismatch errors
- Limit distributions of extremal distances to the nearest neighbor
- Indifference pricing for CRRA utilities
- Empirical distribution of \(k\)-word matches in biological sequences
- New powerful statistics for alignment-free sequence comparison under a pattern transfer model
- Statistical considerations underpinning an alignment-free sequence comparison method