molecular-biology_promoters (Q6032977)

From MaRDI portal
Revision as of 09:46, 15 April 2024 by Importer (Created a new Item)
OpenML dataset with id 164

    Statements

**Author**: C. Harley, R. Reynolds, M. Noordewier, J. Shavlik.
**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Promoter+Gene+Sequences)) - 1990
**Please cite**: [UCI](https://archive.ics.uci.edu/ml/citation_policy.html)

**E. coli promoter gene sequences (DNA)**
Compilation of promoters with known transcriptional start points for E. coli genes. The task is to recognize promoters in strings that represent nucleotides (one of A, G, T, or C). A promoter is a genetic region which initiates the first step in the expression of an adjacent gene (transcription).

The input features are 57 sequential DNA nucleotides. Fifty-three sample promoters and 53 nonpromoter sequences were used. The 53 sample promoters were obtained from a compilation produced by Hawley and McClure (1983). Negative training examples were thus derived by selecting contiguous substrings from a 1.5 kilobase sequence provided by Prof. T. Record of the Univ. of Wisconsin’s Chemistry Dept. This sequence is a fragment from E. coli bacteriophage T7 isolated with the restriction enzyme HaeIII. By virtue of the fact that the fragment does not bind RNA polymerase, it is believed to not contain any promoter sites.

This dataset has been developed to help evaluate a "hybrid" learning algorithm ("KBANN") that uses examples to inductively refine preexisting knowledge.

### Attribute Description

* 1. One of {+/-}, indicating the class ("+" = promoter).
* 2. The instance name (non-promoters named by position in the 1500-long nucleotide sequence provided by T. Record).
* 3-59. The remaining 57 fields are the sequence, starting at position -50 (p-50) and ending at position +7 (p7). Each of these fields is filled by one of {a, g, t, c}.

### Relevant papers

* Harley, C. and Reynolds, R. 1987. "Analysis of E. Coli Promoter Sequences." Nucleic Acids Research, 15:2343-2361.
* Towell, G., Shavlik, J. and Noordewier, M. 1990. "Refinement of Approximate Domain Theories by Knowledge-Based Artificial Neural Networks." In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90).
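The attribute description above implies a simple numeric encoding of each record. The following is a minimal sketch, not the encoding used by the dataset authors or by KBANN: it assumes a one-hot representation (four indicators per nucleotide) and assumes that the 57 positions run from -50 to +7 with no position 0, which is the reading under which -50..-1 (50 positions) plus +1..+7 (7 positions) totals 57. The names `POSITIONS`, `one_hot`, and `EXAMPLE_SEQUENCE` are hypothetical, and the example sequence is synthetic rather than taken from the dataset.

```python
from typing import List

# Alphabet in the order given by the attribute description: {a, g, t, c}.
NUCLEOTIDES = "agtc"

# Assumption: positions skip 0, so -50..-1 and +1..+7 give 57 labels.
POSITIONS: List[int] = [p for p in range(-50, 8) if p != 0]


def one_hot(sequence: str) -> List[int]:
    """Encode a 57-nucleotide string as a flat vector of 57 * 4 indicators."""
    seq = sequence.strip().lower()
    if len(seq) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} nucleotides, got {len(seq)}")
    vector: List[int] = []
    for base in seq:
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        # Exactly one of the four indicators is set for each position.
        vector.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vector


# Synthetic 57-character sequence for illustration only (not a real record).
EXAMPLE_SEQUENCE = "acgt" * 14 + "a"
EXAMPLE_VECTOR = one_hot(EXAMPLE_SEQUENCE)
```

Under these assumptions each instance becomes a vector of 228 binary inputs, and `POSITIONS[i]` gives the position label for the i-th group of four indicators.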
    23 April 2014
    class
    instance
    51cdf7757673b47cadc65d6fbd223fd7
    1
    2
    58
    106
    0
    58
