Sublinear time motif discovery from multiple sequences

DOI: 10.3390/A6040636; MaRDI QID: Q1736589

Authors Bin Fu, Yunhui Fu, Yuan Xue

Publication date 26 March 2019

Published in Algorithms

Copyright license Creative Commons Attribution 4.0 International

Full work available at URL https://arxiv.org/abs/1007.2618

zbMATH Keywords

randomized algorithm; deterministic algorithm; motif discovery; sublinear time

Mathematics Subject Classification ID

Protein sequences, DNA sequences (92D20); Randomized algorithms (68W20); Computational methods for problems pertaining to biology (92-08); Algorithms on strings (68W32)

Abstract: A natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are $k$ background sequences, and each character in a background sequence is a random character from an alphabet $\Sigma$. A motif $G = g_{1} g_{2} \ldots g_{m}$ is a string of $m$ characters. Each background sequence is implanted with a probabilistically generated approximate copy of $G$. For a probabilistically generated approximate copy $b_{1} b_{2} \ldots b_{m}$ of $G$, every character $b_{i}$ is generated such that the probability of $b_{i} \neq g_{i}$ is at most $\alpha$. We develop three algorithms that, under this probabilistic model, can find the implanted motif with high probability via a tradeoff between computational time and the probability of mutation. The methods developed in this paper have been used in a software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
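To make the probabilistic model in the abstract concrete, the following is a minimal sketch of an instance generator. It is not the paper's software, and several details are assumptions that the abstract does not fix: the common sequence length `n`, uniform sampling of background characters from the alphabet, a mutation probability of exactly `alpha` per position (the abstract only bounds it from above), uniform choice of the replacement character, and a uniformly random implant position. The function names `mutate_motif` and `generate_instance` are hypothetical.

```python
import random


def mutate_motif(motif, alphabet, alpha, rng):
    """Return an approximate copy b_1...b_m of the motif.

    Assumption: each position mutates independently with probability
    exactly alpha, and a mutated character is drawn uniformly from the
    other characters of the alphabet.
    """
    copy = []
    for g in motif:
        if rng.random() < alpha:
            others = [c for c in alphabet if c != g]
            copy.append(rng.choice(others))
        else:
            copy.append(g)
    return "".join(copy)


def generate_instance(motif, k, n, alphabet="ACGT", alpha=0.1, seed=0):
    """Generate k background sequences of length n, each with one
    implanted approximate copy of the motif.

    Returns (sequences, positions), where positions[i] is the start
    index of the implanted copy in sequences[i].
    """
    m = len(motif)
    if n < m:
        raise ValueError("sequence length n must be at least the motif length")
    if len(set(alphabet)) < 2:
        raise ValueError("alphabet needs at least two distinct characters")

    rng = random.Random(seed)
    sequences, positions = [], []
    for _ in range(k):
        # Background: every character is a random character from the alphabet
        # (assumed uniform here).
        background = [rng.choice(alphabet) for _ in range(n)]
        # Implant an approximate copy at a random position (assumed uniform).
        start = rng.randrange(n - m + 1)
        background[start:start + m] = mutate_motif(motif, alphabet, alpha, rng)
        sequences.append("".join(background))
        positions.append(start)
    return sequences, positions
```

For example, `generate_instance("ACGTACGT", k=5, n=50, alpha=0.2)` returns five sequences of length 50 together with the implant positions, which can serve as ground truth when checking a motif finder against this model.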


Cited in (14)
