A nonlinear measure of subalignment similarity and its significance levels (Q1085099)

From MaRDI portal
scientific article

    Statements

    A nonlinear measure of subalignment similarity and its significance levels (English)
    1986
    A new measure of subalignment similarity is introduced. Specifically, similarity s(l,c) is defined as the logarithm to the base p of the probability of finding c or fewer mismatches in a subalignment of length l, where p is the probability of a match. Previous algorithms cannot use this measure to find locally optimal subalignments because, unlike the Needleman-Wunsch and Sellers similarities, it is nonlinear. A new pattern recognition algorithm, the DD algorithm, is described for finding all locally optimal subalignments of two nucleotide sequences. The DD algorithm can use s(l,c) or any other reasonable similarity function to assess the relative interest of subalignments. It searches only the diagonal graph, which lacks insertions and deletions. This search strategy greatly decreases the computation time and does not require an arbitrary choice of gap cost. The paths of the resulting DD graph usually draw attention to likely locations for insertions and deletions. A heuristic formula is derived for estimating significance levels for s(l,c) in the context of the lengths of the two aligned sequences. The DD algorithm has been used to find interesting subalignments between the nucleotide sequences for human and murine interleukin 2.
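
    The definition of s(l,c) can be illustrated with a minimal sketch. It assumes that positions match independently with probability p, so that the number of mismatches in l positions is binomially distributed; the abstract does not state this model, and the default p = 0.25 (four equally likely nucleotides) is likewise an assumption. The function names `similarity` and `best_gapless_segment` are hypothetical. The second function is a brute-force scan of gapless diagonal segments, included only to show how a nonlinear score can rank subalignments; it is not the DD algorithm, whose details the abstract does not give.

```python
import math


def similarity(l, c, p=0.25):
    """s(l, c) = log base p of P(at most c mismatches in l positions).

    Assumption: positions are independent and each matches with
    probability p, so the mismatch count is Binomial(l, 1 - p).
    """
    q = 1.0 - p
    tail = sum(math.comb(l, k) * q**k * p**(l - k) for k in range(c + 1))
    return math.log(tail) / math.log(p)


def best_gapless_segment(a, b, p=0.25):
    """Brute-force search over every gapless segment on every diagonal.

    Returns (score, start_in_a, start_in_b, length) for the first
    segment found with the highest s(l, c). Illustrative only.
    """
    best = (0.0, 0, 0, 0)
    # A diagonal is identified by d = (index in b) - (index in a).
    for d in range(-(len(a) - 1), len(b)):
        i0, j0 = (-d, 0) if d < 0 else (0, d)
        n = min(len(a) - i0, len(b) - j0)  # number of cells on this diagonal
        for start in range(n):
            mismatches = 0
            for length in range(1, n - start + 1):
                k = start + length - 1
                if a[i0 + k] != b[j0 + k]:
                    mismatches += 1
                score = similarity(length, mismatches, p)
                if score > best[0]:
                    best = (score, i0 + start, j0 + start, length)
    return best


if __name__ == "__main__":
    score, i, j, length = best_gapless_segment("ACGTACGT", "TTACGTAA")
    print(score, i, j, length)
```

    Under the binomial assumption, a perfect match of length l has s(l,0) = l, and s(l,l) = 0, so the measure behaves like a match count that is discounted nonlinearly as mismatches are allowed.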
    nonlinear measure
    biochemistry
    new measure of subalignment similarity
    new pattern recognition algorithm
    locally optimal subalignments of two nucleotide sequences
    heuristic formula
    estimating significance levels
    DD algorithm