Statistically consistent and computationally efficient inference of ancestral DNA sequences in the TKF91 model under dense taxon sampling (Q2299336)

scientific article

Language	Label	Description	Also known as
default for all languages	No label defined
English	Statistically consistent and computationally efficient inference of ancestral DNA sequences in the TKF91 model under dense taxon sampling	scientific article

Statements

instance of

scholarly article

0 references

title

Statistically consistent and computationally efficient inference of ancestral DNA sequences in the TKF91 model under dense taxon sampling (English)

0 references

Bulletin of Mathematical Biology

0 references

publication date

21 February 2020

0 references

full work available at URL

https://arxiv.org/abs/1707.05711

0 references

review text

In the present paper, the authors are interested in statistically consistent estimators for the ancestral sequence reconstruction (ASR) problem under the TKF91 process in the taxon-rich setting, which differs from the ``solvability'' results in [\textit{A. Andoni} et al., Stochastic Processes Appl. 122, No. 12, 3852--3874 (2012; Zbl 1250.92034)]. In fact, an ASR statistical consistency result in this context is already implied by the general results of [\textit{W.-T. Fan} and \textit{S. Roch}, Electron. J. Probab. 23, Paper No. 47, 24 p. (2018; Zbl 1410.60074)]. More concretely, the authors consider the ASR problem in the taxon-rich context for the TKF91 process. It is known from previous work [Zbl 1410.60074, Theorem 1] that the Big Bang condition is necessary for the existence of consistent estimators. In this paper, the authors design the first estimator that is not only consistent but also explicit and computationally tractable. Their ancestral reconstruction algorithm involves two steps: first the length of the ancestral sequence is estimated, and then the nucleotides are estimated conditional on that length. The novel observation that leads to the design of the authors' estimator is a new constructive proof of initial-state identifiability, formulated in Lemma 2, which says that one can explicitly invert the mapping from the root sequence to the distribution of the leaf sequences. This is nontrivial for evolutionary models with indels. The estimator is computationally efficient in the sense that the number of arithmetic operations required scales polynomially in the size of the input data. Indeed, the length estimator is linear in the number of input sequences, and the matrix manipulations in the sequence estimator are polynomial in the length of the longest input sequence.
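The first step described in the review, estimating the ancestral sequence length, can be illustrated with a small method-of-moments sketch. This is not the authors' estimator. It assumes a star tree in which every leaf sits at the same known distance t from the root, known insertion and deletion rates lam < mu, and the standard TKF91 length dynamics (every link, including the immortal one, inserts at rate lam; every mortal link is deleted at rate mu). The function names are hypothetical. Under these assumptions the mean leaf length is affine in the root length, so it can be inverted at the sample mean in one pass over the input sequences.

```python
import math
import random


def expected_length(m, lam, mu, t):
    """Mean number of mortal links after time t, starting from m links.

    Solves dE/dt = lam * (E + 1) - mu * E, the TKF91 length dynamics.
    Assumes lam < mu.
    """
    beta = math.exp(-(mu - lam) * t)
    gamma = lam / mu
    return m * beta + gamma / (1.0 - gamma) * (1.0 - beta)


def estimate_root_length(leaf_lengths, lam, mu, t):
    """Invert expected_length at the sample mean of the leaf lengths.

    Illustrative assumption: all leaves evolved independently from the
    root for the same time t (star tree). Cost is linear in the number
    of leaves.
    """
    if not leaf_lengths:
        raise ValueError("need at least one leaf length")
    beta = math.exp(-(mu - lam) * t)
    gamma = lam / mu
    mean_len = sum(leaf_lengths) / len(leaf_lengths)
    return (mean_len - gamma / (1.0 - gamma) * (1.0 - beta)) / beta


def simulate_leaf_length(m, lam, mu, t, rng):
    """Gillespie simulation of the length process along one branch."""
    n, clock = m, 0.0
    while True:
        birth = lam * (n + 1)  # immortal link plus n mortal links insert
        death = mu * n         # only mortal links can be deleted
        clock += rng.expovariate(birth + death)
        if clock > t:
            return n
        if rng.random() * (birth + death) < birth:
            n += 1
        else:
            n -= 1


if __name__ == "__main__":
    rng = random.Random(0)
    leaves = [simulate_leaf_length(50, 1.0, 2.0, 0.05, rng) for _ in range(5000)]
    print(estimate_root_length(leaves, 1.0, 2.0, 0.05))
```

With many leaves at a short distance from the root, which is the dense-sampling regime the review refers to, the estimate concentrates around the true root length. The second step, recovering the nucleotides given the length, relies on the inversion in Lemma 2 and is not sketched here.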

0 references

reviewed by

Andrey Zahariev

0 references

zbMATH Keywords

phylogenetics

0 references

ancestral reconstruction

0 references

insertion/deletions

0 references

MaRDI profile type