Sublinear time motif discovery from multiple sequences (Q1736589): Difference between revisions

Summary: This paper considers a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are \(k\) background sequences, and each character in a background sequence is a random character from an alphabet \(\Sigma\). A motif \(G=g_1 g_2 \dots g_m\) is a string of \(m\) characters. A probabilistically generated approximate copy of \(G\) is implanted in each background sequence. In such a copy \(b_1 b_2 \dots b_m\), each character \(b_i\) is generated probabilistically so that the probability that \(b_i \neq g_i\) is at most \(\alpha\). We develop two new randomized algorithms and one new deterministic algorithm. They improve on earlier work in the following respects: (1) The algorithms are much faster than previous ones and can even run in sublinear time. (2) They can handle any motif pattern. (3) The alphabet size is only required to be at least four, which gives the algorithms potential applications in practical problems, since gene sequences have an alphabet size of four. (4) All algorithms come with rigorous proofs of their performance. The methods developed in this paper have been used in a software implementation. We observed some encouraging results that show improved performance for motif detection compared with other software.
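
The model described in the summary can be illustrated with a small generator. This is only a sketch under stated assumptions, not the authors' code: the function name plant_motif, the sequence length n, the uniform background distribution, the uniform choice of implant position, and the use of a mutation probability of exactly alpha (the summary only requires at most alpha) are all choices made here for illustration.

```python
import random


def plant_motif(motif, k, n, alpha, alphabet="ACGT", seed=None):
    """Return (sequences, positions) for k sequences of length n.

    Each sequence is random background with one noisy copy of `motif`
    implanted at a random position. All names and distributional choices
    are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    m = len(motif)
    if n < m:
        raise ValueError("sequence length n must be at least the motif length")
    sequences, positions = [], []
    for _ in range(k):
        # Background: every character drawn uniformly from the alphabet
        # (uniformity is an assumption).
        seq = [rng.choice(alphabet) for _ in range(n)]
        # Approximate copy b_1 ... b_m: each g_i is replaced by a different
        # character with probability alpha, so P[b_i != g_i] = alpha.
        copy = []
        for g in motif:
            if rng.random() < alpha:
                copy.append(rng.choice([c for c in alphabet if c != g]))
            else:
                copy.append(g)
        # Implant the copy at a uniformly chosen start position (assumption).
        start = rng.randrange(n - m + 1)
        seq[start:start + m] = copy
        sequences.append("".join(seq))
        positions.append(start)
    return sequences, positions
```

For example, plant_motif("ACGTAC", k=5, n=40, alpha=0.1, seed=1) returns five sequences of length 40 together with the start positions of the implanted copies, which a motif discovery program would then have to recover without being told them.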

0 references

Mathematics Subject Classification ID

68W32

0 references

motif discovery

0 references

sublinear time

0 references

randomized algorithm

0 references

deterministic algorithm

0 references

describes a project that uses

WebMOTIFS

0 references

PhyME

0 references

MaRDI profile type

MaRDI publication profile

0 references

On covering problems of codes

0 references

Q4252402

0 references

Distinguishing string selection problems.

0 references

On the closest string and substring problems

0 references

Finding similar regions in many strings

0 references

Algorithms on Strings, Trees and Sequences

0 references

Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences

0 references

Discovering almost any hidden motif from multiple sequences

0 references

Q4856179

0 references

Q4139463

0 references

Sitelinks

Mathematics (1 entry)

mardi Publication:1736589

@@ Property / describes a project that uses @@
+WebMOTIFS
@@ Property / describes a project that uses: WebMOTIFS / rank @@
+Normal rank
@@ Property / describes a project that uses @@
+PhyME
@@ Property / describes a project that uses: PhyME / rank @@
+Normal rank
@@ Property / MaRDI profile type @@
+MaRDI publication profile
@@ Property / MaRDI profile type: MaRDI publication profile / rank @@
+Normal rank
@@ Property / OpenAlex ID @@
+W1999959196
@@ Property / OpenAlex ID: W1999959196 / rank @@
+Normal rank
@@ Property / arXiv ID @@
+1007.2618
@@ Property / arXiv ID: 1007.2618 / rank @@
+Normal rank
@@ Property / cites work @@
+On covering problems of codes
@@ Property / cites work: On covering problems of codes / rank @@
+Normal rank
@@ Property / cites work @@
+Q4252402
@@ Property / cites work: Q4252402 / rank @@
+Normal rank
@@ Property / cites work @@
+Distinguishing string selection problems.
@@ Property / cites work: Distinguishing string selection problems. / rank @@
+Normal rank
@@ Property / cites work @@
+On the closest string and substring problems
@@ Property / cites work: On the closest string and substring problems / rank @@
+Normal rank
@@ Property / cites work @@
+Finding similar regions in many strings
@@ Property / cites work: Finding similar regions in many strings / rank @@
+Normal rank
@@ Property / cites work @@
+Algorithms on Strings, Trees and Sequences
@@ Property / cites work: Algorithms on Strings, Trees and Sequences / rank @@
+Normal rank
@@ Property / cites work @@
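+Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences
@@ Property / cites work: Probabilistic Analysis of a Motif Discovery Algorithm for Multiple Sequences / rank @@
+Normal rank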
@@ Property / cites work @@
+Discovering almost any hidden motif from multiple sequences
@@ Property / cites work: Discovering almost any hidden motif from multiple sequences / rank @@
+Normal rank
@@ Property / cites work @@
+Q4856179
@@ Property / cites work: Q4856179 / rank @@
+Normal rank
@@ Property / cites work @@
+Q4139463
@@ Property / cites work: Q4139463 / rank @@
+Normal rank
@@ links / mardi / name @@
+Publication:1736589