On the subnet prune and regraft distance (Q1740360)

From MaRDI portal
scientific article
Language Label Description Also known as
English
On the subnet prune and regraft distance
scientific article

    Statements

    On the subnet prune and regraft distance (English)
    0 references
    0 references
    0 references
    30 April 2019
    0 references
    A phylogenetic tree graphically represents evolutionary transformation among organisms, where vertices represent biological species, and edges applicable local rearrangement operations, e.g., subtree prune and regraft, tree bisection and reconnection, etc.. It turns out that the search operation in the collection of all the phylogenetic trees plays a critical part of the process of reconstructing such a phylogenetic tree from molecular sequence data. \par Furthermore, rooted phylogenetic networks, i.e., leaf-labeled rooted directed acyclic graphs, have become increasingly popular in the analysis of ancestral relationships. In such a network, the structure of reticulation, or rather a special kinds of vertices, stands out: with in-degree 2 and out-degree 1, it naturally reflects the evolution of combining two ancestors into one descendant. \par Searching in such networks is also critical, although much more challenging since the space of such networks is much larger than that of the phylogenetic trees, which itself grows exponentially. To facilitate such a task, the tree based transforming operation, rooted subtree prune and regraft, or rSPR, was generalized to the subnet prune and regraft, SNPR, operation, appropriate for the phylogenetic networks. It is known that this late operation induces a metric on several classes, including that of all the phylogenetic networks with $n$ labels for their leaves. \par One important problem in phylogenetic study is to construct a phylogenetic tree such that distance in between two vertices equals their known distance. The problem studied in this paper seems to be the opposite, although related, i.e., find out the distance for two given phylogenetic networks. Simply put, the SNPR-distance, i.e., the distance between two phylogenetic networks equals the minimum number of SNPR operations that is required to transform one network into the another. Not surprisingly, the number of the aforementioned reticulations contributes a dominant term in such a distance expression. And it has been known that the calculation of such a distance, even in the tree case, is NP-hard. \par In this paper, the authors investigated several computation related problems that arise in the context of computing the SNPR-distance in the network search space. In particular, for a phylogenetic tree $T$ and a phylogenetic network $N,$ it is shown that such a distance between $T$ and $N$ can be computed by considering the set of all the trees that are embedded in $N,$ thus reduced to the problem of rSPR-distance calculation between phylogenetic trees. Incidentally, this NP-hard problem is fixed-parameter tractable with rSPR distance as the parameter. An alternative direct approach is also discussed to characterize the SNPR-distance between $T$ and $N$ in terms of a maximum agreement forest for $T$ and $N,$ expanding a notion underneath all the works related to the study of tree distances. In particular, if $F\ (=(T_\rho, T_1, \dots, T_k, E_1, \dots, E_r))$ is a maximum agreement forest for $T$ and $N$, where $r$ is the number of reticulations in $N$, i.e., the inherent need; and $k,$ representing the effort, is minimized, then the SNPR-distance between $T$ and $N$ is $k+r.$ The properties of shortest SNPR-sequences between two phylogenetic networks are also analyzed which help to answer the question, in the negative, whether or not, for any of the classes of tree-child, reticulation-visible, or tree-based networks, its associated search space embeds into only these phylogenetic networks. Computationally speaking, the search space for computing the SNPR-distance between two networks $N$ and $N^\prime$ cannot be restricted to those with its reticulation number bounded between $\min\{r, r^\prime\}$ and $\max\{r, r^\prime\},$ where $r$\ (respectively, $r^\prime$) is the number of reticulations of $N$\ (respectively, $N^\prime$). \par This reviewer does practice combinatorial study, in studying various problems in computer networks, where, e.g., distances between various nodes in the same network, or that of a certain network, are studied, but not that between networks. Through an incomplete review of this paper, this reviewer find these results, and properties, although not surprising, interesting and inspiring. The proofs are thorough and integrating. \par This paper certainly shows another application area where combinatoric study can play a significant role in exploring some subtle but important properties, both algorithmic and computational. It is at those points where this reviewer does feel that we are on the same page.
    0 references
    0 references
    phylogenetic network
    0 references
    search operation
    0 references
    SNPR-distance
    0 references
    reticulation
    0 references
    fixed-parameter tractable problem
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references