An extreme value theory for sequence matching (Q1081998): Difference between revisions

From MaRDI portal
Property / reviewed by: Susan R. Wilson
Property / MaRDI profile type: MaRDI publication profile
Property / full work available at URL: https://doi.org/10.1214/aos/1176350045
Property / OpenAlex ID: W2093189601
Property / Wikidata QID: Q104764149

Latest revision as of 03:10, 27 March 2024

Language | Label | Description | Also known as
English | An extreme value theory for sequence matching | scientific article |

    Statements

    An extreme value theory for sequence matching (English)
    1986
    Suppose there are two finite sequences \(X_1, X_2, \ldots, X_m\) and \(Y_1, Y_2, \ldots, Y_n\), where the letters \(\{X_i\}\) and \(\{Y_j\}\) are chosen i.i.d. on a countable alphabet with \(p = P\{X_{\ell} = Y_{\ell}\} \in (0,1)\). This paper studies the distribution of the longest contiguous run of matches between the \(X\)'s and \(Y\)'s, allowing at most \(k\) mismatches. The distribution is closely approximated by that of the maximum of \((1-p)mn\) i.i.d. negative binomial random variables. The latter is shown to behave like the integer part of an extreme value distribution. Approximations to the expectation and the variance are found. The paper concludes with an example in which segments taken from the DNA sequence of the bacteriophage lambda are compared. The authors attribute the lack of fit to the asymptotic prediction to the small-sample properties of the distribution of the maximum \(k\)-interrupted match run length. However, this reviewer's experience is that the i.i.d. assumption is usually not valid for such data, and so this may be the cause of the lack of fit. Corresponding results for the more difficult case of insertions and deletions are still outstanding.
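    The statistic described in the review can be computed directly for small inputs. The following is a minimal Python sketch, not the authors' method: the function name `longest_k_mismatch_run` is hypothetical, and it assumes that the run length counts every aligned position in the window, including the (at most \(k\)) mismatched ones. It scans each diagonal of the \(m \times n\) comparison matrix with a sliding window.

```python
def longest_k_mismatch_run(x, y, k=0):
    """Length of the longest aligned window x[i:i+L], y[j:j+L]
    containing at most k mismatched positions.

    Assumption (not taken from the paper): L counts all positions
    in the window, mismatches included.
    """
    m, n = len(x), len(y)
    best = 0
    # Each diagonal d = j - i fixes the relative shift of the two sequences.
    for d in range(-(m - 1), n):
        i0 = max(0, -d)          # start index in x on this diagonal
        j0 = i0 + d              # start index in y on this diagonal
        length = min(m - i0, n - j0)
        left = 0                 # left edge of the sliding window
        mismatches = 0           # mismatches currently inside the window
        for right in range(length):
            if x[i0 + right] != y[j0 + right]:
                mismatches += 1
            # Shrink the window from the left until it holds at most k mismatches.
            while mismatches > k:
                if x[i0 + left] != y[j0 + left]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best
```

    For example, `longest_k_mismatch_run("ACGTAC", "TTACGA", k=0)` finds the exact match `ACG`, while `k=1` extends the window by one mismatched position. The scan takes \(O(mn)\) time, so it is suited to checking the approximation on simulated i.i.d. sequences rather than to very long inputs.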
    distribution of the longest contiguous run of matches
    negative binomial
    extreme value distribution
    DNA sequence
    bacteriophage lambda
