liver-disorders

OpenML: 8 | MaRDI QID: Q6032861 | FDO: Q6032861 | RO-Crate: Q6032861

OpenML dataset with id 8

BUPA Medical Research Ltd, Richard S. Forsyth

Full work available at URL: https://api.openml.org/data/v1/download/8/liver-disorders.arff

Upload date: 6 April 2014

Dataset Characteristics

Number of classes: 0
Number of features: 6 (numeric: 6, symbolic: 0, binary: 0)
Number of instances: 345
Number of instances with missing values: 0
Number of missing values: 0

Description

Author: BUPA Medical Research Ltd.
Donor: Richard S. Forsyth
Source: UCI, 5/15/1990
Please cite:

BUPA liver disorders

The first five variables are blood tests thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each line in the dataset is the record of a single male individual.

Important note: The 7th field (selector) has been widely misinterpreted in the past as a dependent variable representing presence or absence of a liver disorder. This is incorrect [1]. The 7th field was created by BUPA researchers as a train/test selector. It is not suitable as a dependent variable for classification. The dataset does not contain any variable representing presence or absence of a liver disorder. Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising, as a dependent variable for classification. Because of widespread misinterpretation in the past, researchers should take care to state their method clearly.
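A minimal Python sketch of the classification setup described above: the `selector` field is ignored, and `drinks` is dichotomised into a binary target while the five blood tests serve as features. The cutoff `DRINKS_THRESHOLD = 3.0` and the helper names are hypothetical; the description does not state a threshold, so use (and report) the one from the study being replicated. Rows are assumed to be 7-field sequences in the order given under "Attribute information".

```python
from typing import Iterable, List, Sequence, Tuple

# HYPOTHETICAL cutoff: not specified in the dataset description.
# Replace it with the threshold used in the study you are replicating.
DRINKS_THRESHOLD = 3.0


def dichotomise_drinks(drinks: Iterable[float],
                       threshold: float = DRINKS_THRESHOLD) -> List[int]:
    """Map `drinks` (half-pint equivalents per day) to a binary label:
    1 if drinks >= threshold, else 0."""
    return [1 if d >= threshold else 0 for d in drinks]


def features_and_target(rows: Sequence[Sequence[float]],
                        threshold: float = DRINKS_THRESHOLD
                        ) -> Tuple[List[List[float]], List[int]]:
    """Split 7-field rows into the five blood-test features (indices 0-4)
    and a binary target derived from `drinks` (index 5).

    The `selector` column (index 6) is deliberately dropped: it is a
    train/test split marker, not a diagnosis.
    """
    X = [list(r[:5]) for r in rows]
    y = dichotomise_drinks((r[5] for r in rows), threshold)
    return X, y
```

Keeping the threshold as an explicit parameter makes the dichotomisation step visible in code, which supports the note's request that researchers state their method clearly.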

Attribute information

   1. mcv: mean corpuscular volume
   2. alkphos: alkaline phosphatase
   3. sgpt: alanine aminotransferase
   4. sgot: aspartate aminotransferase
   5. gammagt: gamma-glutamyl transpeptidase
   6. drinks: number of half-pint equivalents of alcoholic beverages drunk per day
   7. selector: field created by the BUPA researchers to split the data into train/test sets
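The seven fields above can be captured as a typed record. This is a sketch that assumes the original UCI layout (one comma-separated line per individual, no header, fields in the listed order). The OpenML ARFF file reports 6 features, so its column set may differ; check the ARFF header before reusing this. `LiverRecord` and `parse_line` are hypothetical names, and the sample line in the usage note uses made-up values, not a row from the dataset.

```python
from typing import NamedTuple


class LiverRecord(NamedTuple):
    """One line of the BUPA data: the record of a single male individual."""
    mcv: float      # mean corpuscular volume
    alkphos: float  # alkaline phosphatase
    sgpt: float     # alanine aminotransferase
    sgot: float     # aspartate aminotransferase
    gammagt: float  # gamma-glutamyl transpeptidase
    drinks: float   # half-pint equivalents of alcoholic beverages per day
    selector: int   # train/test split marker, NOT a disorder label


def parse_line(line: str) -> LiverRecord:
    """Parse one comma-separated line with the seven fields in list order.

    Assumes the UCI layout; raises ValueError on a wrong field count
    or a non-numeric value.
    """
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    *numeric, selector = parts
    # int(float(...)) also accepts a selector written as "1.0"
    return LiverRecord(*(float(x) for x in numeric), int(float(selector)))
```

For example, `parse_line("85, 92, 45, 27, 31, 0.0, 1")` yields a record whose `selector` is `1` and whose `drinks` is `0.0`. Naming the fields makes it harder to accidentally treat `selector` as a class label.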

[1] McDermott & Forsyth (2016). Diagnosing a disorder in a classification benchmark. Pattern Recognition Letters, 73. Note that Forsyth is named on the UCI page as the original donor of the dataset.

RO-Crate

What is a RO-Crate?

A RO-Crate is a standardized research object package used to bundle data together with rich machine-readable metadata. Each RO-Crate contains:

the files belonging to the dataset (e.g. CSVs, images, code, documentation)
a ro-crate-metadata.json file describing the content, provenance, and context
persistent identifiers and references to related research objects (e.g. software, publications)

This makes the dataset easy to reuse, cite, validate, and interpret in a reproducible manner.
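To illustrate the structure of `ro-crate-metadata.json`, here is a stripped-down sketch following the RO-Crate 1.1 conventions: a JSON-LD `@graph` containing a metadata descriptor whose `about` points at the root `Dataset`, which lists its files under `hasPart`. This is an assumption about the general format, not a description of how the crate on this page is generated. The file name `liver-disorders.arff` is taken from the download URL, and the helper names are hypothetical. The specification requires further properties on the root dataset (e.g. description, datePublished, license) that are omitted here for brevity.

```python
import json


def minimal_crate_metadata(data_file: str, name: str) -> dict:
    """Build a stripped-down RO-Crate 1.1 metadata document (JSON-LD)
    whose root dataset contains a single data file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # metadata descriptor: describes ro-crate-metadata.json itself
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # root data entity: the crate's dataset
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "hasPart": [{"@id": data_file}],
            },
            {   # the data file bundled in the crate
                "@id": data_file,
                "@type": "File",
            },
        ],
    }


def find_root(metadata: dict) -> dict:
    """Locate the root data entity via the descriptor's `about` reference."""
    graph = {entity["@id"]: entity for entity in metadata["@graph"]}
    root_id = graph["ro-crate-metadata.json"]["about"]["@id"]
    return graph[root_id]


if __name__ == "__main__":
    meta = minimal_crate_metadata("liver-disorders.arff", "liver-disorders")
    print(json.dumps(meta, indent=2))
```

Resolving the root through the descriptor's `about` link, instead of assuming `"./"`, mirrors how a consumer is expected to find the dataset entity in a crate.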

Download

You can download a RO-Crate for this dataset here: Download RO-Crate

HINT: The RO-Crate is created dynamically, so it may take up to 30 seconds before the download starts.

This page was built for dataset: liver-disorders