splice (Q6032902)

From MaRDI portal
OpenML dataset with id 46

    Statements

    **Author**: Genbank. Donated by G. Towell, M. Noordewier, and J. Shavlik
    **Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Splice-junction+Gene+Sequences))
    **Please cite**: None

    Primate splice-junction gene sequences (DNA) with associated imperfect domain theory.

    Splice junctions are points on a DNA sequence at which 'superfluous' DNA is removed during the process of protein creation in higher organisms. The problem posed in this dataset is to recognize, given a sequence of DNA, the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out). This problem consists of two subtasks: recognizing exon/intron boundaries (referred to as EI sites), and recognizing intron/exon boundaries (IE sites). (In the biological community, IE borders are referred to as "acceptors" while EI borders are referred to as "donors".)

    All examples were taken from Genbank 64.1. Categories "ei" and "ie" include every "split-gene" for primates in Genbank 64.1. Non-splice examples were taken from sequences known not to include a splicing site.

    ### Attribute Information

    1. One of {n, ei, ie}, indicating the class.
    2. The instance name.
    3-62. The remaining 60 fields are the sequence, starting at position -30 and ending at position +30. Each of these fields is almost always filled by one of {a, g, t, c}. Other characters indicate ambiguity among the standard characters according to the following table:

        character: meaning
        D: A or G or T
        N: A or G or C or T
        S: C or G
        R: A or G

    Notes:
    * Instance_name is an identifier and should be ignored for modelling
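The ambiguity table in the attribute description can be turned into a small encoder. The following is a minimal sketch, not part of the dataset or of any library: the names `AMBIGUITY`, `encode_base`, `encode_sequence`, and `position_labels` are hypothetical, and spreading the weight of an ambiguous character evenly over its possible bases is one illustrative design choice among several. The position labels assume that the 60 fields run from -30 to -1 and from +1 to +30 with no position 0, which is inferred from the field count rather than stated in the description.

```python
# Characters listed in the dataset description, mapped to the bases they may stand for.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "AGT",   # A or G or T
    "N": "ACGT",  # A or G or C or T
    "S": "CG",    # C or G
    "R": "AG",    # A or G
}
BASES = "ACGT"  # column order of the encoded vector


def encode_base(ch):
    """Return a length-4 vector over A, C, G, T.

    Unambiguous bases become one-hot vectors; ambiguous characters
    share their weight evenly among the candidate bases (an assumption).
    """
    options = AMBIGUITY[ch.upper()]
    weight = 1.0 / len(options)
    return [weight if base in options else 0.0 for base in BASES]


def encode_sequence(seq):
    """Encode a 60-character sequence as a list of 60 length-4 vectors."""
    if len(seq) != 60:
        raise ValueError(f"expected 60 characters, got {len(seq)}")
    return [encode_base(ch) for ch in seq]


def position_labels():
    """Labels for the 60 fields, assuming -30..-1 and +1..+30 (no position 0)."""
    return list(range(-30, 0)) + list(range(1, 31))
```

A character outside the table raises `KeyError`, which makes unexpected symbols visible instead of silently encoding them.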
    1992-01-01
    6 April 2014
    Class
    https://dl.acm.org/doi/abs/10.5555/2986766.2986838
    21a60c8d1b14bbf0f146b4afeda39287
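The 32-character hexadecimal value above has the shape of an MD5 digest. Assuming it is the checksum of the dataset file (the page does not label it), a downloaded copy could be checked along these lines. The helper names `file_md5` and `matches_expected` are hypothetical, and which file the digest refers to is likewise an assumption.

```python
import hashlib

# Value listed on this page; assumed (not confirmed) to be the MD5 of the dataset file.
EXPECTED_MD5 = "21a60c8d1b14bbf0f146b4afeda39287"


def file_md5(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return file_md5(path) == expected.lower()
```

Reading in chunks keeps memory use constant regardless of file size.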
    0
    3
    61
    3,190
    0
    61