AP_Prostate_Kidney

OpenML dataset with id 1144

No author found.

Full work available at URL: https://api.openml.org/data/v1/download/54027/AP_Prostate_Kidney.arff

Upload date: 7 October 2014

Dataset Characteristics

Number of classes: 2
Number of features: 10,936 (numeric: 10,935; symbolic: 1; binary in total: 1)
Number of instances: 329
Number of instances with missing values: 0
Number of missing values: 0
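The characteristics above can be checked after downloading the data. The following is a minimal sketch, not an official OpenML recipe. The helper names `summarize` and `load_ap_prostate_kidney` are hypothetical, the toy rows and their class labels are made up for illustration, and the loader assumes scikit-learn is installed and the network is reachable.

```python
from collections import Counter


def _is_missing(value):
    # Treat None and NaN (the only value not equal to itself) as missing.
    return value is None or value != value


def summarize(rows, target_index=-1):
    """Return basic characteristics of a table given as a list of rows.

    The class column is counted as a feature, as on the OpenML page.
    """
    labels = Counter(row[target_index] for row in rows)
    return {
        "classes": len(labels),
        "features": len(rows[0]) if rows else 0,
        "instances": len(rows),
        "instances_with_missing": sum(
            any(_is_missing(v) for v in row) for row in rows
        ),
        "missing_values": sum(_is_missing(v) for row in rows for v in row),
    }


def load_ap_prostate_kidney():
    """Fetch OpenML dataset 1144 (needs network access and scikit-learn)."""
    from sklearn.datasets import fetch_openml  # optional dependency, imported lazily

    bunch = fetch_openml(data_id=1144, as_frame=False)
    # Append the class label as the last column so summarize() sees it.
    return [list(x) + [y] for x, y in zip(bunch.data, bunch.target)]


# Toy stand-in for the real table: two expression values plus a class label.
# The label strings are illustrative, not taken from the dataset.
toy = [[0.1, 2.3, "Prostate"], [1.4, 0.2, "Kidney"], [0.7, 1.1, "Kidney"]]
toy_summary = summarize(toy)
```

Calling `summarize(load_ap_prostate_kidney())` on the real data should reproduce the counts listed above if the download matches this page. The loader is not invoked here so that the sketch runs offline.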

Description

Author: not specified
Source: Unknown - Date unknown
Please cite: see the reference given below

GEMLeR provides a collection of gene expression datasets that can be used for benchmarking gene expression oriented machine learning algorithms. They can be used for estimation of different quality metrics (e.g. accuracy, precision, area under ROC curve, etc.) for classification, feature selection or clustering algorithms.
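As an illustration of the quality metrics named above, here is a minimal pure-Python sketch of accuracy, precision, and area under the ROC curve for a two-class problem. The labels, predictions, and scores are invented for the example and are not results on this dataset. The AUC is computed with the pairwise (Mann-Whitney) formulation, which is one of several equivalent ways to obtain it.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive):
    """Fraction of predicted positives that are truly positive."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0  # convention chosen here when nothing is predicted positive
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


def roc_auc(y_true, scores, positive):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Both classes must be present in y_true.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, predictions, and classifier scores for "Prostate".
y_true = ["Kidney", "Kidney", "Prostate", "Prostate"]
y_pred = ["Kidney", "Prostate", "Prostate", "Prostate"]
scores = [0.1, 0.6, 0.4, 0.9]

acc = accuracy(y_true, y_pred)
prec = precision(y_true, y_pred, positive="Prostate")
auc = roc_auc(y_true, scores, positive="Prostate")
```

With only 329 instances and 10,936 features, such metrics are usually estimated with cross-validation rather than a single split. That choice is left to the reader.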

This repository was inspired by an increasing need in the machine learning and bioinformatics communities for a collection of microarray classification problems that could be used by different researchers. This way, many different classification or feature selection techniques can be compared to each other on the same set of problems.

Origin of data

Each gene expression sample in the GEMLeR repository comes from expO (Expression Project For Oncology), a large, publicly available repository maintained by the International Genomics Consortium.

The goal of expO and its consortium supporters is to procure tissue samples under standard conditions and perform gene expression analyses on a clinically annotated set of deidentified tumor samples. The tumor data is updated with clinical outcomes and is released into the public domain without intellectual property restriction. The availability of this information translates into direct benefits for patients, researchers and pharma alike.

Source: expO website

Although various other sources of gene expression data are available, the decision to use data from the expO repository was made because of:
- consistency of the tissue sample processing procedure
- the same microarray platform being used for all samples
- availability of additional information for combined genotype-phenotype studies
- availability of a large number of samples for different tumor types

If you publish material based on GEMLeR datasets, please acknowledge your use of this repository. This will help others to obtain the same datasets and replicate your experiments. Please cite as follows when referring to this repository:

Stiglic, G., & Kokol, P. (2010). Stability of Ranked Gene Lists in Large Microarray Analysis Studies. Journal of Biomedicine and Biotechnology, 2010, 616358.

You are also welcome to acknowledge the contribution of expO (Expression Project For Oncology) and the International Genomics Consortium in making their gene expression samples available to the public.
