duke-breast-cancer

From MaRDI portal
Dataset:6034021



OpenML ID: 1434
MaRDI QID: Q6034021

OpenML dataset with id 1434

No author is recorded in the portal metadata.

Full work available at URL: https://api.openml.org/data/v1/download/1426694/duke-breast-cancer.sparse_arff

Upload date: 27 April 2015



Dataset Characteristics

Number of classes: 0
Number of features: 7,130 (numeric: 7,130; symbolic: 0; binary: 0)
Number of instances: 86
Number of instances with missing values: 0
Number of missing values: 0
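The download URL given earlier points to a sparse ARFF file. In that format, each data row lists only its non-zero entries as `{index value, ...}` pairs using 0-based attribute indices, and every omitted attribute is zero. Because all 7,130 attributes are reported as numeric, a data row can be expanded into a dense vector with a short parser. The following is a minimal Python sketch. The helper `parse_sparse_arff_row` is hypothetical, and the sketch skips the `@attribute` header and assumes there are no quoted or missing (`?`) values.

```python
def parse_sparse_arff_row(line: str, n_attributes: int) -> list[float]:
    """Expand one sparse ARFF data row into a dense list (hypothetical helper).

    Sparse ARFF rows list only non-zero entries as "{index value, ...}" with
    0-based attribute indices; every attribute that is not listed is 0.
    Assumes all attributes are numeric and no values are missing ("?").
    """
    body = line.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a sparse ARFF data row: {line!r}")
    dense = [0.0] * n_attributes
    inner = body[1:-1].strip()
    if inner:
        # Each comma-separated pair is "<index> <value>".
        for pair in inner.split(","):
            index_text, value_text = pair.split()
            dense[int(index_text)] = float(value_text)
    return dense


if __name__ == "__main__":
    print(parse_sparse_arff_row("{0 1.5, 3 -2}", 5))  # [1.5, 0.0, 0.0, -2.0, 0.0]
```

For this dataset `n_attributes` would be 7,130. In practice, a full ARFF reader that handles the header and nominal or missing values is the more robust choice.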

Author: Shirish Krishnaj Shevade and S. Sathiya Keerthi; libSVM, AAD group

Source: original. Date unknown.

Please cite: Shirish Krishnaj Shevade and S. Sathiya Keerthi. A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics, 19(17):2246-2253, 2003.

  1. Dataset from the LIBSVM data repository.

Preprocessing: Each instance was first normalized to zero mean and unit variance, and then each feature was normalized to zero mean and unit variance. The original dataset consists of 49 instances. Five were removed because the classification results from immunohistochemistry and protein immunoblotting assays conflicted, and two more were rejected due to failed array hybridization. The remaining 42 instances were split into training (38) and validation (4) sets.
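The two-stage normalization described in the preprocessing note can be sketched with NumPy as follows. This is a minimal sketch, assuming the data form a dense `(n_instances, n_features)` array and that "variance one" refers to the population variance (`ddof=0`). The source specifies neither, and the helper name `duke_style_normalize` is hypothetical. Rows or columns with zero spread are centered but left unscaled to avoid dividing by zero.

```python
import numpy as np


def duke_style_normalize(X: np.ndarray) -> np.ndarray:
    """Instance-wise, then feature-wise standardization (hypothetical helper).

    Assumes X has shape (n_instances, n_features) and uses the population
    standard deviation (ddof=0); the dataset description does not specify this.
    """
    X = np.asarray(X, dtype=float)

    # Step 1: instance-wise normalization, each row to mean 0 and variance 1.
    row_mean = X.mean(axis=1, keepdims=True)
    row_std = X.std(axis=1, keepdims=True)
    row_std[row_std == 0] = 1.0  # constant rows: center only, do not scale
    X = (X - row_mean) / row_std

    # Step 2: feature-wise normalization, each column to mean 0 and variance 1.
    col_mean = X.mean(axis=0, keepdims=True)
    col_std = X.std(axis=0, keepdims=True)
    col_std[col_std == 0] = 1.0  # constant columns: center only, do not scale
    return (X - col_mean) / col_std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(loc=10.0, scale=3.0, size=(5, 8))  # synthetic toy data
    Z = duke_style_normalize(X)
    print(np.allclose(Z.mean(axis=0), 0.0), np.allclose(Z.std(axis=0), 1.0))  # True True
```

Because the feature-wise step runs second, the columns end up exactly standardized, while the rows generally no longer have zero mean and unit variance. Reversing the order would give the opposite guarantee.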