Sublinear time motif discovery from multiple sequences (Q1736589): Difference between revisions

From MaRDI portal
Importer (talk | contribs)
Changed an Item
ReferenceBot (talk | contribs)
Changed an Item
 
Property / cites work
 
Property / cites work: On covering problems of codes / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4252402 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Distinguishing string selection problems. / rank
 
Normal rank
Property / cites work
 
Property / cites work: On the closest string and substring problems / rank
 
Normal rank
Property / cites work
 
Property / cites work: Finding similar regions in many strings / rank
 
Normal rank
Property / cites work
 
Property / cites work: Algorithms on Strings, Trees and Sequences / rank
 
Normal rank
Property / cites work
 
Property / cites work: Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences / rank
 
Normal rank
Property / cites work
 
Property / cites work: Discovering almost any hidden motif from multiple sequences / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4856179 / rank
 
Normal rank
Property / cites work
 
Property / cites work: Q4139463 / rank
 
Normal rank

Latest revision as of 21:48, 18 July 2024

scientific article
Language Label Description Also known as
English
Sublinear time motif discovery from multiple sequences
scientific article

    Statements

    Sublinear time motif discovery from multiple sequences (English)
    0 references
    0 references
    0 references
    0 references
    0 references
    26 March 2019
    0 references
    Summary: In this paper, a natural probabilistic model for motif discovery has been used to experimentally test the quality of motif discovery programs. In this model, there are \(k\) background sequences, and each character in a background sequence is a random character from an alphabet, \(\Sigma\). A motif \(G=g_1 g_2 \dots g_m\) is a string of \(m\) characters. In each background sequence is implanted a probabilistically-generated approximate copy of \(G\). For a probabilistically-generated approximate copy \(b_1 b_2 \dots b_m\) of \(G\), every character, \(b_i\), is probabilistically generated, such that the probability for \(b_i \neq g_i\) is at most \(\alpha\). We develop two new randomized algorithms and one new deterministic algorithm. They make advancements in the following aspects: (1) The algorithms are much faster than those before. Our algorithms can even run in sublinear time. (2) They can handle any motif pattern. (3) The restriction for the alphabet size is a lower bound of four. This gives them potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms have rigorous proofs about their performances. The methods developed in this paper have been used in the software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
    0 references
    motif discovery
    0 references
    sublinear time
    0 references
    randomized algorithm
    0 references
    deterministic algorithm
    0 references
    0 references
    0 references

    Identifiers

    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references
    0 references