Multiple sequence comparison and consistency on multipartite graphs (Q1894790): Difference between revisions
From MaRDI portal
Added link to MaRDI item. |
Set profile property. |
||
Property / MaRDI profile type | |||
Property / MaRDI profile type: MaRDI publication profile / rank | |||
Normal rank |
Revision as of 05:07, 5 March 2024
scientific article
Language | Label | Description | Also known as |
---|---|---|---|
English | Multiple sequence comparison and consistency on multipartite graphs |
scientific article |
Statements
Multiple sequence comparison and consistency on multipartite graphs (English)
0 references
18 March 1996
0 references
The human genome project has generated vast amounts of biological sequence data. While pairwise comparison of such sequences is traditionally done using the so-called dot-matrices, multiple comparison of \(n\) sequences poses the challenging task of identifying commonalities or consensuses in these, using the pairwise comparisons. The authors model this as an \(n\)-dimensional image reconstruction problem. The pairwise comparisons may correspond to projections of `real' \(n\)- dimensional points or `noises'; that is which are not projections in 2- dimensional coordinate planes of \(n\)-dimensional points. One way for the reconstruction is to use a notion of consistency of the pairwise comparisons, and the other extreme is the idea of transitivity. The mathematical model used is that of an \(n\)-partite graph \(G(V_1\cup V_2\cup\cdots \cup V_n, E)\), where the \(V_i\) correspond to the biological sequences consisting of `real points' and `noises' and the edges \((i, j)\), \(i\in V_s\), \(j\in V_t\) represent the pairwise alignments (dots). Consistency is defined through the membership of the edges in triangles. The mathematical consistency problem boils down to determining a maximal consistent subgraph of \(G\). Besides showing that an earlier algorithm by the first author and \textit{P. Argos} [J. Mol. Biol. 218, 33-43 (1991)] may involve \(O(L)\) numbers of iterations, each of complexity \(O(n^3 L^3)\), where \(L\) is the length of the sequences, the present authors give an algorithm with overall complexity \(O(n^3 L^3)\). The transitivity problem is the dual to this and can be tackled similarly. But, what may be useful for biological applications, is the introduction of a hierarchy of consistency classes through the idea of \(k\)-consistency defined here by means of a `\(k\)-mask' of a matrix. The utility of this is illustrated by the study of the protein elongation factor \(Tu\) in drosophila, homo sapiens and ten other species. Three open problems are suggested.
0 references
multiple sequence comparison
0 references
multipartite graphs
0 references
biological sequence data
0 references
pairwise comparisons
0 references
reconstruction
0 references
consistency
0 references
transitivity
0 references
alignments
0 references
algorithm
0 references
complexity
0 references
protein elongation factor \(Tu\)
0 references