molecular-biology_promoters (Q6032977)
From MaRDI portal
OpenML dataset with id 164
Language | Label | Description | Also known as |
---|---|---|---|
English | molecular-biology_promoters | OpenML dataset with id 164 | |
Statements
1
**Author**: C. Harley, R. Reynolds, M. Noordewier, J. Shavlik.
**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Promoter+Gene+Sequences)) - 1990
**Please cite**: [UCI](https://archive.ics.uci.edu/ml/citation_policy.html)

**E. coli promoter gene sequences (DNA)**
Compilation of promoters with known transcriptional start points for E. coli genes. The task is to recognize promoters in strings that represent nucleotides (one of A, G, T, or C). A promoter is a genetic region which initiates the first step in the expression of an adjacent gene (transcription).

The input features are 57 sequential DNA nucleotides. Fifty-three sample promoters and 53 nonpromoter sequences were used. The 53 sample promoters were obtained from a compilation produced by Hawley and McClure (1983). Negative training examples were derived by selecting contiguous substrings from a 1.5 kilobase sequence provided by Prof. T. Record of the Univ. of Wisconsin’s Chemistry Dept. This sequence is a fragment from E. coli bacteriophage T7 isolated with the restriction enzyme HaeIII. Because the fragment does not bind RNA polymerase, it is believed not to contain any promoter sites.

This dataset was developed to help evaluate a "hybrid" learning algorithm ("KBANN") that uses examples to inductively refine preexisting knowledge.

### Attribute Description

* 1. One of {+/-}, indicating the class ("+" = promoter).
* 2. The instance name (non-promoters are named by position in the 1500-long nucleotide sequence provided by T. Record).
* 3-59. The remaining 57 fields are the sequence, starting at position -50 (p-50) and ending at position +7 (p7). Each of these fields is filled by one of {a, g, t, c}.

### Relevant papers

* Harley, C. and Reynolds, R. 1987. "Analysis of E. Coli Promoter Sequences." Nucleic Acids Research, 15:2343-2361.
* Towell, G., Shavlik, J. and Noordewier, M. 1990. "Refinement of Approximate Domain Theories by Knowledge-Based Artificial Neural Networks." In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90).
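The attribute layout described above (class, instance name, then 57 nucleotide fields from p-50 to p7) can be illustrated with a minimal Python sketch. It is not part of the dataset or of any OpenML tooling. The function names `position_names`, `parse_record`, and `one_hot` are hypothetical, the example sequence is made up rather than taken from the data, and the field naming assumes there is no position 0 (which is what 57 fields spanning -50 to +7 implies).

```python
from typing import Dict, List

NUCLEOTIDES = ("a", "g", "t", "c")  # alphabet given in the attribute description
SEQUENCE_LENGTH = 57                # number of nucleotide fields per instance


def position_names() -> List[str]:
    """Field names p-50 ... p-1, p1 ... p7 (assumption: no position 0)."""
    positions = list(range(-50, 0)) + list(range(1, 8))
    return [f"p{p}" for p in positions]


def parse_record(label: str, name: str, sequence: str) -> Dict[str, str]:
    """Validate one instance and map it onto the 59 described attributes."""
    if label not in {"+", "-"}:
        raise ValueError(f"class must be '+' or '-', got {label!r}")
    seq = sequence.strip().lower()
    if len(seq) != SEQUENCE_LENGTH:
        raise ValueError(f"expected {SEQUENCE_LENGTH} nucleotides, got {len(seq)}")
    unknown = set(seq) - set(NUCLEOTIDES)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    record = {"class": label, "instance": name}
    record.update(zip(position_names(), seq))
    return record


def one_hot(sequence: str) -> List[int]:
    """Encode each nucleotide as 4 indicator values (one common choice)."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded: List[int] = []
    for base in sequence.lower():
        row = [0] * len(NUCLEOTIDES)
        row[index[base]] = 1
        encoded.extend(row)
    return encoded


if __name__ == "__main__":
    # Made-up sequence for illustration only; not a row from the dataset.
    example_sequence = ("acgt" * 15)[:SEQUENCE_LENGTH]
    record = parse_record("+", "EXAMPLE", example_sequence)
    print(len(record))                     # class + instance + 57 positions
    print(len(one_hot(example_sequence)))  # 57 positions * 4 indicators
```

One-hot encoding is only one possible numeric representation for the nominal nucleotide fields; the source does not prescribe an encoding.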
23 April 2014
class
instance
1
2
58
106
0
0
0
58