A model selection approach for multiple sequence segmentation and dimensionality reduction

DOI: 10.48550/ARXIV.1501.01756; zbMATH Open: 1403.62044; DBLP: journals/ma/CastroLCHL18; arXiv: 1501.01756; OpenAlex: W1597804836; Wikidata: Q101496321; Scholia: Q101496321; MaRDI QID: Q143696; FDO: Q143696

Authors: Bruno M. Castro, Renan B. Lemes, Jonatas Cesar, Tábita Hünemeier, Florencia Leonardi

Publication date: 8 January 2015

Published in: Journal of Multivariate Analysis

Abstract: In this paper we consider the problem of segmenting $n$ aligned random sequences of equal length $m$ into a finite number of independent blocks. We propose to use a penalized maximum likelihood criterion to infer simultaneously the number of points of independence and the position of each of these points. We show how to compute the estimator efficiently by means of a dynamic programming algorithm with time complexity $O(m^{2} n)$. We also propose another algorithm, called the hierarchical algorithm, that provides an approximation to the estimator when the sample size increases and runs in time $O(m n)$. Our main theoretical result is the proof of almost sure consistency of the estimator and the convergence of the hierarchical algorithm when the sample size $n$ grows to infinity. We illustrate the convergence of these algorithms through some simulation examples and we apply the method to a real protein sequence alignment of Ebola Virus.
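The dynamic programming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`block_loglik_table`, `penalty`, `segment`), the BIC-style penalty, and the constant `c` are assumptions made here for illustration, and the abstract does not state the paper's actual penalty. The sketch scores each candidate block of consecutive columns by its empirical log-likelihood, subtracts a per-block penalty, and picks the best set of cut points by a standard optimal-partition recursion. The log-likelihood table is filled by refining a partition of the sequences one column at a time, so that each of the roughly m^2/2 blocks costs O(n) work.

```python
import math
from collections import Counter


def block_loglik_table(seqs):
    """ll[i][j] = maximized log-likelihood of columns i..j-1 treated as one block."""
    n, m = len(seqs), len(seqs[0])
    ll = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m):
        # groups[k] identifies the word that sequence k shows on columns i..j-1.
        groups = [0] * n
        for j in range(i + 1, m + 1):
            # Refine the partition with column j-1 in O(n).
            relabel = {}
            groups = [relabel.setdefault((g, s[j - 1]), len(relabel))
                      for g, s in zip(groups, seqs)]
            counts = Counter(groups)
            ll[i][j] = sum(c * math.log(c / n) for c in counts.values())
    return ll


def penalty(length, n, alphabet_size, c=0.5):
    """Hypothetical BIC-style penalty: c * (free parameters of the block) * log n."""
    try:
        return c * (alphabet_size ** length - 1) * math.log(n)
    except OverflowError:
        # The parameter count is too large to represent; never pick this block.
        return math.inf


def segment(seqs, alphabet_size, c=0.5):
    """Return the sorted cut points maximizing the penalized log-likelihood."""
    n, m = len(seqs), len(seqs[0])
    ll = block_loglik_table(seqs)
    best = [0.0] + [-math.inf] * m   # best[j]: best score for columns 0..j-1
    back = [0] * (m + 1)             # back[j]: start of the last block ending at j
    for j in range(1, m + 1):
        for i in range(j):
            score = best[i] + ll[i][j] - penalty(j - i, n, alphabet_size, c)
            if score > best[j]:
                best[j], back[j] = score, i
    cuts, j = [], m
    while j > 0:
        j = back[j]
        if j > 0:
            cuts.append(j)
    return sorted(cuts)
```

For example, with 200 binary sequences in which columns 0 and 1 always agree, columns 2 and 3 always agree, and the two pairs vary independently, `segment(seqs, alphabet_size=2)` places a single cut between columns 1 and 2. Passing the penalty constant as a parameter keeps the sketch agnostic about how strongly long blocks should be discouraged.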

Full work available at URL: https://arxiv.org/abs/1501.01756

zbMATH Keywords

dimensionality reduction; model selection approach; multiple sequence segmentation

Mathematics Subject Classification ID

Nonparametric estimation (62G05); Asymptotic properties of nonparametric inference (62G20); Estimation in multivariate analysis (62H12); Protein sequences, DNA sequences (92D20)

Cited In (5)

Uses Software

PLINK
