Multiple sequence comparison and consistency on multipartite graphs (Q1894790): Difference between revisions

From MaRDI portal
Added link to MaRDI item.
Import240304020342 (talk | contribs)
Set profile property.
Property / MaRDI profile type
 
Property / MaRDI profile type: MaRDI publication profile / rank
 
Normal rank

Revision as of 05:07, 5 March 2024

scientific article
Language Label Description Also known as
English
Multiple sequence comparison and consistency on multipartite graphs
scientific article

    Statements

    Multiple sequence comparison and consistency on multipartite graphs (English)
    0 references
    0 references
    0 references
    18 March 1996
    0 references
    The human genome project has generated vast amounts of biological sequence data. While pairwise comparison of such sequences is traditionally done using the so-called dot-matrices, multiple comparison of \(n\) sequences poses the challenging task of identifying commonalities or consensuses in these, using the pairwise comparisons. The authors model this as an \(n\)-dimensional image reconstruction problem. The pairwise comparisons may correspond to projections of `real' \(n\)- dimensional points or `noises'; that is which are not projections in 2- dimensional coordinate planes of \(n\)-dimensional points. One way for the reconstruction is to use a notion of consistency of the pairwise comparisons, and the other extreme is the idea of transitivity. The mathematical model used is that of an \(n\)-partite graph \(G(V_1\cup V_2\cup\cdots \cup V_n, E)\), where the \(V_i\) correspond to the biological sequences consisting of `real points' and `noises' and the edges \((i, j)\), \(i\in V_s\), \(j\in V_t\) represent the pairwise alignments (dots). Consistency is defined through the membership of the edges in triangles. The mathematical consistency problem boils down to determining a maximal consistent subgraph of \(G\). Besides showing that an earlier algorithm by the first author and \textit{P. Argos} [J. Mol. Biol. 218, 33-43 (1991)] may involve \(O(L)\) numbers of iterations, each of complexity \(O(n^3 L^3)\), where \(L\) is the length of the sequences, the present authors give an algorithm with overall complexity \(O(n^3 L^3)\). The transitivity problem is the dual to this and can be tackled similarly. But, what may be useful for biological applications, is the introduction of a hierarchy of consistency classes through the idea of \(k\)-consistency defined here by means of a `\(k\)-mask' of a matrix. The utility of this is illustrated by the study of the protein elongation factor \(Tu\) in drosophila, homo sapiens and ten other species. Three open problems are suggested.
    0 references
    multiple sequence comparison
    0 references
    multipartite graphs
    0 references
    biological sequence data
    0 references
    pairwise comparisons
    0 references
    reconstruction
    0 references
    consistency
    0 references
    transitivity
    0 references
    alignments
    0 references
    algorithm
    0 references
    complexity
    0 references
    protein elongation factor \(Tu\)
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references