madelon

OpenML dataset with id 1485

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/1590986/madelon.arff

Upload date: 22 May 2015

Dataset Characteristics

Number of classes: 2
Number of features: 501 (numeric: 500, symbolic: 1; binary in total: 1)
Number of instances: 2,600
Number of instances with missing values: 0
Number of missing values: 0
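
The counts above can be checked against the ARFF file itself. The following is a minimal sketch using only the Python standard library. It assumes the file is a dense ARFF document with no quoted commas and no spaces in attribute names. The helper names (`parse_dense_arff`, `summarize`, `download_madelon`) are illustrative and are not part of any OpenML tooling; only the URL comes from this page.

```python
import urllib.request

# URL taken from the page above; everything else here is illustrative.
ARFF_URL = "https://api.openml.org/data/v1/download/1590986/madelon.arff"


def parse_dense_arff(text):
    """Parse a dense ARFF document into (attribute_names, rows).

    Assumption: no sparse rows, no quoted commas, no spaces in names.
    """
    names, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif lower.startswith("@attribute"):
            names.append(line.split()[1].strip("'\""))
        elif lower.startswith("@data"):
            in_data = True
    return names, rows


def summarize(names, rows):
    """Return (n_instances, n_columns, n_missing) for parsed ARFF rows."""
    missing = sum(field == "?" for row in rows for field in row)
    return len(rows), len(names), missing


def download_madelon(url=ARFF_URL):
    """Fetch the ARFF file over the network and parse it (not called here)."""
    with urllib.request.urlopen(url) as response:
        return parse_dense_arff(response.read().decode("utf-8"))
```

If the file matches the characteristics listed above, `summarize(*download_madelon())` would be expected to report 2,600 instances, 501 columns and 0 missing values.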

Description

Author: Isabelle Guyon
Source: UCI
Please cite: Isabelle Guyon, Steve R. Gunn, Asa Ben-Hur, Gideon Dror, 2004. Result analysis of the NIPS 2003 feature selection challenge.

Abstract:

MADELON is an artificial dataset, which was part of the NIPS 2003 feature selection challenge. This is a two-class classification problem with continuous input variables. The difficulty is that the problem is multivariate and highly non-linear.

Source:

Isabelle Guyon, Clopinet, 955 Creston Road, Berkeley, CA 90708. isabelle '@' clopinet.com

Data Set Information:

MADELON is an artificial dataset containing data points grouped in 32 clusters placed on the vertices of a five-dimensional hypercube and randomly labeled +1 or -1. The five dimensions constitute 5 informative features. 15 linear combinations of those features were added to form a set of 20 (redundant) informative features. Based on those 20 features, one must separate the examples into the 2 classes (corresponding to the +1 and -1 labels). A number of distractor features called 'probes', which have no predictive power, were also added. The order of the features and of the patterns was randomized.
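
The construction described above can be sketched in a few lines of standard-library Python. This is an illustration of the recipe, not the original generator. The cluster spread (0.3), the uniform mixing weights, the Gaussian probes and the function name `make_madelon_like` are all assumptions made for the example. The default of 480 probes is simply the 500 numeric features listed on this page minus the 20 informative ones.

```python
import itertools
import random


def make_madelon_like(n_samples=2600, n_probes=480, seed=0):
    """Generate a MADELON-style dataset (illustrative, not the original)."""
    rng = random.Random(seed)

    # 32 cluster centres: the vertices of a 5-dimensional hypercube.
    vertices = list(itertools.product((-1.0, 1.0), repeat=5))
    # Each cluster (not each point) receives a random +1 / -1 label.
    cluster_label = [rng.choice((-1, 1)) for _ in vertices]

    # 15 random linear combinations of the 5 informative features
    # (assumed: weights drawn uniformly from [-1, 1]).
    mix = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(15)]

    # One shared column permutation hides which features are informative.
    n_features = 5 + 15 + n_probes
    column_order = list(range(n_features))
    rng.shuffle(column_order)

    X, y = [], []
    for _ in range(n_samples):
        c = rng.randrange(len(vertices))
        # Assumed: Gaussian scatter around the cluster centre.
        informative = [v + rng.gauss(0, 0.3) for v in vertices[c]]
        redundant = [sum(w * f for w, f in zip(row, informative)) for row in mix]
        # Probes are pure noise with no relation to the label.
        probes = [rng.gauss(0, 1) for _ in range(n_probes)]
        features = informative + redundant + probes
        X.append([features[j] for j in column_order])
        y.append(cluster_label[c])

    # Randomize the order of the patterns as well.
    order = list(range(n_samples))
    rng.shuffle(order)
    return [X[i] for i in order], [y[i] for i in order]
```

Because labels are assigned per cluster rather than per point, no single informative feature separates the classes on its own, which is what makes the problem multivariate and non-linear. For a library implementation of a similar scheme, scikit-learn's `make_classification` is documented as being adapted from the MADELON design.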

This dataset is one of five datasets used in the NIPS 2003 feature selection challenge. The original data was split into training, validation and test sets. Target values are provided only for the first two sets (not for the test set), so this dataset version contains all the examples from the training and validation partitions.

No attribute information is provided, to avoid biasing the feature selection process.

Relevant Papers:

The best challenge entrants wrote papers collected in the book: Isabelle Guyon, Steve Gunn, Masoud Nikravesh, Lotfi Zadeh (Eds.), Feature Extraction, Foundations and Applications. Studies in Fuzziness and Soft Computing. Physica-Verlag, Springer.

Isabelle Guyon et al., 2007. Competitive baseline methods set new standards for the NIPS 2003 feature selection benchmark. Pattern Recognition Letters 28, 1438–1444.

Isabelle Guyon et al., 2006. Feature selection with the CLOP package. Technical Report.
