An extreme value theory for sequence matching (Q1081998)

From MaRDI portal





scientific article; zbMATH DE number 3971966
Language Label Description Also known as
default for all languages
No label defined
    English
    An extreme value theory for sequence matching
    scientific article; zbMATH DE number 3971966

      Statements

      An extreme value theory for sequence matching (English)
      0 references
      0 references
      0 references
      0 references
      1986
      0 references
      Suppose there are two finite sequences \(X_ 1,X_ 2,...,X_ m\) and \(Y_ 1,Y_ 2,...,Y_ n\) where the letters \(\{X_ i\}\) and \(\{Y_ j\}\) are chosen i.i.d. on a countable alphabet with \(p=P\{X_{\ell}=Y_{\ell}\}\in (0,1)\). This paper studies the distribution of the longest contiguous run of matches between the X's and Y's, allowing at most k mismatches. The distribution is closely approximated by that of the maximum of (1-p)mn i.i.d. negative binomial random variables. The latter is shown to behave like the integer part of an extreme value distribution. Approximations to the expectation and the variance are found. The paper concludes with an example in which segments taken from the DNA sequence of the bacteriophage lambda are compared. The authors attribute the lack of fit from asymptotic prediction to the small sample properties of the distribution of the maximum k-interrupted match run length. However, this reviewer's experience is that the i.i.d. assumption is usually not valid for such data, and so this may be the cause of the lack of fit. Corresponding results for the more difficult case of insertions and deletions are still outstanding.
      0 references
      distribution of the longest contiguous run of matches
      0 references
      negative binomial
      0 references
      extreme value distribution
      0 references
      DNA sequence
      0 references
      bacteriophage lambda
      0 references

      Identifiers

      0 references
      0 references
      0 references
      0 references
      0 references
      0 references