On the complexity of computing MP distance between binary phylogenetic trees (Q1682616)

From MaRDI portal
scientific article
Language Label Description Also known as
English
On the complexity of computing MP distance between binary phylogenetic trees
scientific article

    Statements

    On the complexity of computing MP distance between binary phylogenetic trees (English)
    0 references
    0 references
    0 references
    30 November 2017
    0 references
    Distance based measures are used to quantify the dissimilarity of two trees in Phylogenetics. Maximum Parsimony distance (MP-distance) is a recent addition to those measures. The MP-distance requires the search for a character which has a low parsimony score on one of the trees involved and a high score on the other one. In this article, authors show that the computation of the MP-distance on two Phylogenetic binary trees is NP-hard. A rooted phylogenetic \(X\)-tree is a rooted tree \(T=(V(T),E(T))\) on a leaf set \(X=\{1,2,\dots,n\}\subset V(T)\). A character \(f\) is a function \(f:X\to \mathcal{C}\), where \(\mathcal{C}\) is a set \(\{c_1,c_2,c_3,\ldots, c_k\}\) of \(k\) character states, \(k\) being a positive integer. If \(|f(X)|=r\), then \(f\) is said to be an \(r\)-state character. An extension of a character \(f\) to \(V(T)\) is a map \(g:V(T)\to \mathcal{C}\) such that \(g(i)=f(i)\) for all \(i\in X\). The parsimony score or parsimony length of a character \(f\) on \(T\), denoted by \(l_f(T)\), is obtained by minimizing \(l_g(T)\) over all possible extensions \(g\) of \(f\). The maximum parsimony distance \(d_{MP}\) of two phylogenetic trees \(T_1\) and \(T_2\) on the same set \(X\) is defined as \(d_{MP}(T_1,T_2)=\max_f\{|l_f(T_1)-l_f(T_2)|\}\), where this maximum is taken over all characters \(f\) on \(X\). With the help of an interesting symmetry breaking construction, a couple of lemmas and an observation, authors establish that the computation of \(d_{MP}(T_1,T_2)\) on binary trees is NP-hard. The proof of this result is pretty lengthy, logical and intuitive and progress by constructing will construct two trees \(T_E\) and \(T_V\) such that, for a certain integer \(P\), \(d_{MP}(T_E,T_V)=P\) if and only if \(\chi^\prime(G)=3\), and the authors establish many interesting properties on the phylogenetic trees \(T_E\) and \(T_V\). Following this result, the authors also prove that the computation of \(d^2_{MP}(T_1,T_2)\) is also NP-hard in a similar way and also show that computation of \(d^2_{MP}(T_1,T_2)\) is APX-hard as well. In the final section, the authors also present an integer linear programming formulation for binary instances. The article is well-written, innovative and interesting. As mentioned earlier, the content of this paper is logical, intuitive and presented sequentially in brilliant manner. The authors deserve much appreciation for the preparation of such a marvellous paper.
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    maximum parsimony distance
    0 references
    phylogenetics
    0 references
    binary trees
    0 references
    NP-hardness
    0 references
    0 references
    0 references
    0 references