Learning nonsingular phylogenies and hidden Markov models (Q5901107)

From MaRDI portal
scientific article; zbMATH DE number 5178103
Language Label Description Also known as
English
Learning nonsingular phylogenies and hidden Markov models
scientific article; zbMATH DE number 5178103

    Statements

    Learning nonsingular phylogenies and hidden Markov models (English)
    0 references
    0 references
    0 references
    16 August 2010
    0 references
    8 August 2007
    0 references
    The aim of the present paper is to study and to provide new interesting results on the learning problem for phylogenies and hidden Markov models. Phylogenies are used in evolutionary biology to model the stochastic evolution of genetic data on the ancestral tree relating a group of species. The underlying assumption is that genetic information evolves from the root to the leaves according to a Markov model on the tree. In molecular biology, the problem of reconstructing the phylogenetic trees is to obtain the topology of the (unknown) tree from the characters (sequences) at the tree leaves (extant species). The present paper pays a special attention to the problem of inferring the complete Markov model, i.e. inferring the Markov matrices on the edges. The authors looked for the design of efficient reconstruction algorithms of phylogenies, based on learning techniques. A Markov model is called nonsingular is all its transition matrices have determinants bounded away from 0 (and 1). The role of the nonsingularity condition is pointed out for the learning approach. In learning theory, the source of hardness for tree topologies and hidden Markov models are the transition matrices \(P\) for which \(\det(P)\) is 0 (or close to 0), but \(\text{rank}(P) > 1\). When the number of character states is \(k = 2,\) there are no matrices \(P\) whose determinant is 0 and rank is more than 1, thus efficient reconstruction algorithms of phylogenies and hidden Markov models do exist for this case. The main contribution of this paper is to investigate the problem of learning hidden Markov models without the nonsingularity condition, for \(k > 2\), and proper polynomial-time (PAC) algorithm for learning nonsingular phylogenies and hidden Markov models under the weaker condition that \(1 / \text{poly}(n)< |\det(P)| \leq 1\). The obtained class of learning algorithms is based on a combination of techniques from phylogeny, statistics, combinatorics, and probability. Several known results are shown to be particular or closely related cases, including the recovery of the mutation matrices from an infinite number of samples. The reconstruction of the mutation matrices from a polynomial number of samples is proved to require a refined error analysis along with various combinatorial and algorithmic ideas.
    0 references
    efficient reconstruction algorithms
    0 references
    problem of reconstructing the phylogenetic trees
    0 references
    hidden Markov models
    0 references
    evolutionary trees
    0 references
    phylogenetic reconstruction
    0 references
    stochastic evolution of genetic data
    0 references
    PAC learning
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references