Approximate word matches between two random sequences
From MaRDI portal
Publication:2476396
Abstract: Given two sequences over a finite alphabet \(\mathcal{L}\), the \(D_2\) statistic is the number of \(m\)-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the \(D_2\) statistic in the context of DNA sequences, under the assumption of strand symmetric Bernoulli text. For \(k < m\), we look at the count of \(m\)-letter word matches with up to \(k\) mismatches. For this statistic, we compute the expectation, give upper and lower bounds for the variance and prove its distribution is asymptotically normal.
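The count described in the abstract can be illustrated with a brute-force sketch. It is not taken from the publication: the function names, the use of m for the word length and k for the number of allowed mismatches, and the expectation helper are all assumptions made for illustration. The helper assumes both sequences are independent with i.i.d. letters drawn from the same distribution, so that the number of mismatches in any aligned word pair is binomial.

```python
from itertools import product
from math import comb


def approx_word_matches(seq_a: str, seq_b: str, m: int, k: int = 0) -> int:
    """Count pairs of m-letter windows, one from each sequence,
    that differ in at most k positions (k = 0 gives exact matches)."""
    if m <= 0 or not 0 <= k < m:
        raise ValueError("need m > 0 and 0 <= k < m")
    # All overlapping m-letter words of each sequence.
    words_a = [seq_a[i:i + m] for i in range(len(seq_a) - m + 1)]
    words_b = [seq_b[j:j + m] for j in range(len(seq_b) - m + 1)]
    count = 0
    for u, v in product(words_a, words_b):
        mismatches = sum(x != y for x, y in zip(u, v))
        if mismatches <= k:
            count += 1
    return count


def expected_matches(n_a: int, n_b: int, m: int, k: int, letter_probs: dict) -> float:
    """Expected count under the i.i.d. assumption stated above (hypothetical helper).

    Two independent letters match with probability sum(p^2); the number of
    mismatches in a word pair is then Binomial(m, 1 - p_match), and the
    expectation follows by linearity over all (n_a-m+1)(n_b-m+1) word pairs.
    """
    p_match = sum(p * p for p in letter_probs.values())
    tail = sum(
        comb(m, t) * (1 - p_match) ** t * p_match ** (m - t) for t in range(k + 1)
    )
    return (n_a - m + 1) * (n_b - m + 1) * tail


# Example letter distribution with P(A) = P(T) and P(C) = P(G),
# chosen here as one reading of "strand symmetric"; uniform for simplicity.
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
exact = approx_word_matches("ACGT", "ACGA", m=2, k=0)
approx = approx_word_matches("ACGT", "ACGA", m=2, k=1)
mean_approx = expected_matches(4, 4, m=2, k=1, letter_probs=uniform_dna)
```

The double loop costs time proportional to the product of the sequence lengths times m, which is fine for small examples but would need an index-based approach for database-scale searches.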
Recommendations
- Asymptotic Behavior of k-Word Matches Between two Uniformly Distributed Sequences
- Distributional regimes for the number of \(k\)-word matches between two random sequences
- An extreme value theory for sequence matching
- Counts of long aligned word matches among random letter sequences
- An accurate approximation to the distribution of the length of the longest matching word between two random DNA sequences
Cites work
- scientific article; zbMATH DE number 50805
- scientific article; zbMATH DE number 3438144
- scientific article; zbMATH DE number 1912144
- scientific article; zbMATH DE number 850226
- Asymptotic Behavior of k-Word Matches Between two Uniformly Distributed Sequences
- Compound Poisson approximation: A user's guide
- Distributional regimes for the number of \(k\)-word matches between two random sequences
- Normal convergence by higher semi-invariants with applications to sums of dependent random variables and random graphs
- Poisson approximation for dependent trials
Cited in (9)
- Distributional regimes for the number of \(k\)-word matches between two random sequences
- Counts of long aligned word matches among random letter sequences
- Extraction of high quality \(k\)-words for alignment-free sequence comparison
- Scoring unusual words with varying mismatch errors
- Limit distributions of extremal distances to the nearest neighbor
- Indifference pricing for CRRA utilities
- Empirical distribution of \(k\)-word matches in biological sequences
- New powerful statistics for alignment-free sequence comparison under a pattern transfer model
- Statistical considerations underpinning an alignment-free sequence comparison method