Local multiple alignment via subgraph enumeration (Q5961633)
From MaRDI portal
scientific article; zbMATH DE number 981930
Language | Label | Description | Also known as |
---|---|---|---|
English | Local multiple alignment via subgraph enumeration |
scientific article; zbMATH DE number 981930 |
Statements
Local multiple alignment via subgraph enumeration (English)
0 references
11 November 1998
0 references
The algorithms described in this paper were developed to solve the following sort of computational problem from molecular biology. Suppose a newly sequenced protein has been used as the query for a database search, and that statistically significant similarity has been found to, say, six sequences from the database. The problem now is to identify the motif or motifs that are common to most or all of the seven sequences. In particular, the following difficulties may arise. (1) A motif may not appear in all of the sequences. (2) Two distinct motifs may appear in different subsets of the sequences. (3) A motif may have multiple appearance in a sequence. (4) Two motifs may appear in different multiplicites and relative orders in two sequences. Moreover, as in most sequence analysis problems in biology, when we say that a motif appears in some sequences, we mean that there are substrings that approximately match in a sense that needs to be made precise. Finally, the alignment-scoring scheme most appropriate for providing the desired precision to the notion of ``approximately match'' may depend on both the set of sequences and the particular motif. We discuss three problems, which we call blocking, chaining and flattening, that arise when computing a multiple-sequence alignment from given pairwise alignments. Blocking is the construction of gap-free multiple alignments, each called a ``block'', from the pairwise alignments; it is formalized here as the enumeration of maximal cliques in a certain graph. Chaining is the identification of a collection of blocks that can appear together in a multiple alignment, which we formalize as determining a maximal connected subgraph (of a different graph) that satisfies certain consistency conditions. Flattening is the introduction of gaps within a chain of blocks to create a multiple alignment, which involves solving a problem of dynamic bipartite matching. For each problem, practical algorithms are presented and shown to be effective for analyzing sequences containing internal repeats.
0 references
sequence comparison
0 references
subgraph enumeration
0 references
maximal clique
0 references
molecular biology
0 references
blocking
0 references
chaining
0 references
flattening
0 references
multiple alignments
0 references