isolet




OpenML: 300 | MaRDI QID: Q6033075 | FDO: Q6033075 | RO-Crate: Q6033075

OpenML dataset with id 300

Mark Fanty, Ron Cole

Full work available at URL: https://api.openml.org/data/v1/download/52405/isolet.arff

Upload date: 20 August 2014



Dataset Characteristics

Number of classes: 26
Number of features: 618 (numeric: 617, symbolic: 1, binary: 0)
Number of instances: 7,797
Number of instances with missing values: 0
Number of missing values: 0
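
The feature counts above can be checked against the attribute declarations in the ARFF header. The following is a minimal sketch using only the Python standard library. The function name count_arff_attributes and the toy header are illustrative assumptions, not part of the dataset, and the parser handles only simple one-line attribute declarations.

```python
def count_arff_attributes(arff_text):
    """Count numeric and nominal (symbolic) attributes in an ARFF header."""
    counts = {"numeric": 0, "nominal": 0, "other": 0}
    for raw in arff_text.splitlines():
        line = raw.strip()
        lowered = line.lower()
        if lowered.startswith("@data"):
            break  # the header ends where the data section begins
        if not lowered.startswith("@attribute"):
            continue
        tokens = line[len("@attribute"):].split()
        if not tokens:
            continue  # malformed declaration, skip it
        declaration = " ".join(tokens)
        if declaration.endswith("}") and "{" in declaration:
            counts["nominal"] += 1  # e.g. "class {1,2,3}"
        elif tokens[-1].lower() in ("numeric", "real", "integer"):
            counts["numeric"] += 1
        else:
            counts["other"] += 1  # string, date, or anything unrecognized
    return counts


# Tiny made-up header for illustration; this is NOT the real isolet.arff.
sample = """@relation toy
@attribute f1 numeric
@attribute f2 real
@attribute class {1,2,3}
@data
0.1,-0.2,1
"""
sample_counts = count_arff_attributes(sample)
print(sample_counts)
```

Applied to the text of the downloaded isolet.arff file, the same function should report counts that can be compared with the figures listed above, provided the file uses only simple declarations of this kind.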

Author: Ron Cole and Mark Fanty (cole@cse.ogi.edu, fanty@cse.ogi.edu)
Donor: Tom Dietterich (tgd@cs.orst.edu)
Source: UCI
Please cite: UCI

Description

The ISOLET (Isolated Letter Speech Recognition) dataset was generated as follows: 150 subjects spoke the name of each letter of the alphabet twice. Hence, there are 52 training examples from each speaker. The speakers are grouped into five sets of 30 speakers each; four groups can serve as the training set and the last group as the test set. A total of 3 examples are missing; the authors dropped them due to difficulties in recording.
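
The counts in this description can be reproduced with a few lines of arithmetic. This is only a sketch of the bookkeeping; the constant names are illustrative. Because the description does not say which speaker groups the 3 missing examples belong to, the training and test sizes below are nominal upper bounds.

```python
SPEAKERS = 150      # subjects recorded
LETTERS = 26        # letters of the alphabet
REPETITIONS = 2     # each letter was spoken twice
MISSING = 3         # examples dropped by the authors
GROUP_SIZE = 30     # speakers per group

per_speaker = LETTERS * REPETITIONS                       # examples per speaker
nominal_total = SPEAKERS * per_speaker                    # before dropping any
actual_total = nominal_total - MISSING                    # instances in the dataset
groups = SPEAKERS // GROUP_SIZE                           # number of speaker groups
train_nominal = (groups - 1) * GROUP_SIZE * per_speaker   # four training groups
test_nominal = GROUP_SIZE * per_speaker                   # one test group

print(per_speaker, nominal_total, actual_total, groups, train_nominal, test_nominal)
```

The resulting total of 7,797 matches the number of instances listed under Dataset Characteristics.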

This is a good domain for a noisy, perceptual task. It is also a very good domain for testing the scaling abilities of algorithms. For example, C4.5 on this domain is slower than backpropagation!

Source

  • Creators:

Ron Cole and Mark Fanty, Department of Computer Science and Engineering, Oregon Graduate Institute, Beaverton, OR 97006. cole '@' cse.ogi.edu, fanty '@' cse.ogi.edu

  • Donor:

Tom Dietterich, Department of Computer Science, Oregon State University, Corvallis, OR 97331. tgd '@' cs.orst.edu

Attributes Information

All attributes are continuous, real-valued attributes scaled into the range -1.0 to 1.0. The features are described in the paper by Cole and Fanty cited below. They include spectral coefficients, contour features, sonorant features, pre-sonorant features, and post-sonorant features. The exact order of appearance of the features is not known.
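
The stated scaling is easy to verify once the numeric features have been loaded. The following minimal sketch assumes the features are available as a list of rows of floats. The helper name values_outside_range and the toy rows are illustrative and are not taken from the dataset.

```python
def values_outside_range(rows, low=-1.0, high=1.0):
    """Return (row_index, column_index, value) for every out-of-range value."""
    return [
        (i, j, v)
        for i, row in enumerate(rows)
        for j, v in enumerate(row)
        if not (low <= v <= high)
    ]


# Made-up rows for illustration; the second row contains one violation.
toy_rows = [[-1.0, 0.25, 1.0], [0.5, 1.2, -0.3]]
violations = values_outside_range(toy_rows)
print(violations)
```

An empty result on the real feature matrix would be consistent with the range described above.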

Relevant papers

Fanty, M., Cole, R. (1991). Spoken letter recognition. In Lippman, R. P., Moody, J., and Touretzky, D. S. (Eds.), Advances in Neural Information Processing Systems 3. San Mateo, CA: Morgan Kaufmann.

Dietterich, T. G., Bakiri, G. (1991). Error-correcting output codes: A general method for improving multiclass inductive learning programs. Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91), Anaheim, CA: AAAI Press.

Dietterich, T. G., Bakiri, G. (1994). Solving Multiclass Learning Problems via Error-Correcting Output Codes.





RO-Crate

What is a RO-Crate?

A RO-Crate is a standardized research object package used to bundle data together with rich machine-readable metadata. Each RO-Crate contains:

  • the files belonging to the dataset (e.g. CSVs, images, code, documentation)
  • a ro-crate-metadata.json file describing the content, provenance, and context
  • persistent identifiers and references to related research objects (e.g. software, publications)

This ensures that the dataset can be easily reused, cited, validated, and interpreted in a reproducible manner.
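
To make the role of ro-crate-metadata.json concrete, here is a minimal sketch of how such a file might be inspected. The embedded JSON is a hypothetical, hand-written example that follows the general RO-Crate 1.1 layout (a JSON-LD "@graph" whose metadata descriptor points to the root dataset through "about"). It is not the metadata actually generated for this dataset, and the helper name find_root_dataset is illustrative.

```python
import json

# Hypothetical minimal metadata document, written by hand for illustration.
example_metadata = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"}
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "isolet"
    }
  ]
}
""")


def find_root_dataset(metadata):
    """Follow the metadata descriptor's "about" link to the root dataset entity."""
    entities = {entity["@id"]: entity for entity in metadata["@graph"]}
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


root = find_root_dataset(example_metadata)
print(root["@type"], root["name"])
```

For a downloaded crate, the same function could be applied to the result of json.load on the crate's ro-crate-metadata.json file, assuming that file follows the same layout.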


