Local multiple alignment via subgraph enumeration (Q5961633)

From MaRDI portal
scientific article; zbMATH DE number 981930
Language Label Description Also known as
English
Local multiple alignment via subgraph enumeration
scientific article; zbMATH DE number 981930

    Statements

    Local multiple alignment via subgraph enumeration (English)
    0 references
    11 November 1998
    0 references
    The algorithms described in this paper were developed to solve the following sort of computational problem from molecular biology. Suppose a newly sequenced protein has been used as the query for a database search, and that statistically significant similarity has been found to, say, six sequences from the database. The problem now is to identify the motif or motifs that are common to most or all of the seven sequences. In particular, the following difficulties may arise. (1) A motif may not appear in all of the sequences. (2) Two distinct motifs may appear in different subsets of the sequences. (3) A motif may have multiple appearance in a sequence. (4) Two motifs may appear in different multiplicites and relative orders in two sequences. Moreover, as in most sequence analysis problems in biology, when we say that a motif appears in some sequences, we mean that there are substrings that approximately match in a sense that needs to be made precise. Finally, the alignment-scoring scheme most appropriate for providing the desired precision to the notion of ``approximately match'' may depend on both the set of sequences and the particular motif. We discuss three problems, which we call blocking, chaining and flattening, that arise when computing a multiple-sequence alignment from given pairwise alignments. Blocking is the construction of gap-free multiple alignments, each called a ``block'', from the pairwise alignments; it is formalized here as the enumeration of maximal cliques in a certain graph. Chaining is the identification of a collection of blocks that can appear together in a multiple alignment, which we formalize as determining a maximal connected subgraph (of a different graph) that satisfies certain consistency conditions. Flattening is the introduction of gaps within a chain of blocks to create a multiple alignment, which involves solving a problem of dynamic bipartite matching. For each problem, practical algorithms are presented and shown to be effective for analyzing sequences containing internal repeats.
    0 references
    sequence comparison
    0 references
    subgraph enumeration
    0 references
    maximal clique
    0 references
    molecular biology
    0 references
    blocking
    0 references
    chaining
    0 references
    flattening
    0 references
    multiple alignments
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references