covertype
OpenML: 293; MaRDI QID: Q6033070; FDO: Q6033070; RO-Crate: Q6033070
OpenML dataset with id 293
Dr. Charles W. Anderson, Dr. Denis J. Dean, Jock A. Blackard
Full work available at URL: https://api.openml.org/data/v1/download/49823/covertype.sparse_arff
Upload date: 15 August 2014
Dataset Characteristics
Number of classes: 2
Number of features: 55 (numeric: 54, symbolic: 1; binary in total: 1)
Number of instances: 581,012
Number of instances with missing values: 0
Number of missing values: 0
Author: Jock A. Blackard, Dr. Denis J. Dean, Dr. Charles W. Anderson
Source: LibSVM repository - 2013-11-14
Please cite: For the binarization: R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. Neural Computation, 14(05):1105-1114, 2002.
This is the famous covertype dataset in its binary version, retrieved 2013-11-13 from the LibSVM site (where it is called covtype.binary). In addition to the preprocessing done there (see the LibSVM site for details), this dataset was created as follows:
- Load the covertype dataset, unscaled.
- Normalize each file column-wise according to the following rules:
  - If a column contains only one value (constant feature), it is set to zero and thus removed by sparsity.
  - If a column contains two values (binary feature), the value occurring more often is set to zero, the other to one.
  - If a column contains more than two values (multinary/real feature), the column is divided by its standard deviation.
- Finally, remove duplicate lines.
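The column-wise rules described above can be sketched in plain Python as follows. This is only an illustration, not the script that produced the dataset: the function names `normalize_columns` and `drop_duplicate_rows` are hypothetical, the use of the population standard deviation is an assumption (the description does not say which variant was used), and ties between the two values of a binary column are broken by first occurrence, which is also an assumption.

```python
from collections import Counter
from statistics import pstdev


def normalize_columns(rows):
    """Apply the three column rules to a list of equal-length numeric rows."""
    columns = [list(col) for col in zip(*rows)]
    normalized = []
    for col in columns:
        counts = Counter(col)
        if len(counts) == 1:
            # Constant feature: set to zero (it vanishes in a sparse format).
            new_col = [0.0] * len(col)
        elif len(counts) == 2:
            # Binary feature: the more frequent value becomes 0, the other 1.
            majority = counts.most_common(1)[0][0]
            new_col = [0.0 if v == majority else 1.0 for v in col]
        else:
            # Multinary/real feature: divide by the standard deviation.
            # Assumption: population standard deviation (pstdev).
            sd = pstdev(col)
            new_col = [v / sd for v in col]
        normalized.append(new_col)
    # Transpose back from columns to rows.
    return [list(row) for row in zip(*normalized)]


def drop_duplicate_rows(rows):
    """Remove duplicate rows, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [[5, 1, 2], [5, 1, 4], [5, 0, 6], [5, 1, 2]]
cleaned = drop_duplicate_rows(normalize_columns(rows))
```

Duplicates are dropped after normalization here, matching the order given in the description.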
Preprocessing: Transform from multiclass into binary class.
RO-Crate
What is a RO-Crate?
A RO-Crate is a standardized research object package used to bundle data together with rich machine-readable metadata. Each RO-Crate contains:
- the files belonging to the dataset (e.g. CSVs, images, code, documentation)
- a ro-crate-metadata.json file describing the content, provenance, and context
- persistent identifiers and references to related research objects (e.g. software, publications)
This ensures that the dataset can be easily reused, cited, validated, and interpreted in a reproducible manner.
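To give a rough idea of what a ro-crate-metadata.json file contains, the sketch below builds a minimal one in Python. It follows the general layout of the RO-Crate 1.1 specification (a JSON-LD `@graph` holding a metadata descriptor, a root dataset, and one entry per file). The helper name `build_minimal_crate_metadata` is hypothetical, and the file name is taken from the download URL above purely for illustration. The crate generated by this page will likely contain more entities and properties.

```python
import json


def build_minimal_crate_metadata(name, description, data_files):
    """Return a minimal RO-Crate 1.1 style metadata document as a dict."""
    graph = [
        {
            # Metadata descriptor: points at the root dataset it describes.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: the crate itself, listing its files.
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "hasPart": [{"@id": path} for path in data_files],
        },
    ]
    # One entity per data file referenced from the root dataset.
    graph.extend({"@id": path, "@type": "File"} for path in data_files)
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": graph,
    }


metadata = build_minimal_crate_metadata(
    name="covertype",
    description="OpenML dataset with id 293",
    data_files=["covertype.sparse_arff"],  # illustrative file name
)
metadata_json = json.dumps(metadata, indent=2)
```

Writing `metadata_json` to a file named ro-crate-metadata.json next to the data files would give a directory with the three ingredients listed above, in their simplest form.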
Download
You can download a RO-Crate for this dataset here: Download RO-Crate
HINT: The RO-Crate is created dynamically, so it may take up to 30 seconds until the download starts.