liver-disorders (Q6032861)

From MaRDI portal
OpenML dataset with id 8

    Statements

    BUPA Medical Research Ltd
    Richard S. Forsyth
    1990-05-15
    6 April 2014
    drinks
    https://www.sciencedirect.com/science/article/pii/S0167865516000088?casa_token=V_00WKobQo4AAAAA:vksCWgiDdjbISFvgBuV8LfRTDarVcsyo-L88EKG8sBN9WqlbSTyAMcdljmjRZsiRGJZHoJOiO4Y
    e4905a8be9c1eeedb0d41304239155a8
    0
    0
    6
    345
    0
**Author**: BUPA Medical Research Ltd. Donor: Richard S. Forsyth
**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Liver+Disorders) - 5/15/1990
**Please cite**:

**BUPA liver disorders**

The first 5 variables are all blood tests which are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. Each line in the dataset constitutes the record of a single male individual.

**Important note:** The 7th field (selector) has been widely misinterpreted in the past as a dependent variable representing presence or absence of a liver disorder. This is incorrect [1]. The 7th field was created by BUPA researchers as a train/test selector. It is not suitable as a dependent variable for classification. The dataset does not contain any variable representing presence or absence of a liver disorder.

Researchers who wish to use this dataset as a classification benchmark should follow the method used in experiments by the donor (Forsyth & Rada, 1986, Machine learning: applications in expert systems and information retrieval) and others (e.g. Turney, 1995, Cost-sensitive classification: Empirical evaluation of a hybrid genetic decision tree induction algorithm), who used the 6th field (drinks), after dichotomising, as a dependent variable for classification. Because of widespread misinterpretation in the past, researchers should take care to state their method clearly.

**Attribute information**

1. mcv: mean corpuscular volume
2. alkphos: alkaline phosphatase
3. sgpt: alanine aminotransferase
4. sgot: aspartate aminotransferase
5. gammagt: gamma-glutamyl transpeptidase
6. drinks: number of half-pint equivalents of alcoholic beverages drunk per day
7. selector: field created by the BUPA researchers to split the data into train/test sets

[1] McDermott & Forsyth 2016, Diagnosing a disorder in a classification benchmark, Pattern Recognition Letters, Volume 73.

Note: Forsyth is named on the UCI page as the original donor of the dataset.
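
The recommended setup described above (dichotomise the 6th field, drinks, and treat the 7th field, selector, only as a train/test marker) can be sketched in plain Python as follows. This is a minimal illustration, not the donor's exact procedure: the function name `make_classification_data` and the default `threshold` of 3.0 are assumptions, since the description does not specify a cut-off, and rows are assumed to be numeric sequences in the attribute order listed above.

```python
from typing import List, Sequence, Tuple

# Column order as given under "Attribute information".
COLUMNS = ("mcv", "alkphos", "sgpt", "sgot", "gammagt", "drinks", "selector")


def make_classification_data(
    rows: Sequence[Sequence[float]],
    threshold: float = 3.0,  # hypothetical cut-off; the description names none
) -> Tuple[List[List[float]], List[int]]:
    """Build (X, y) with the five blood tests as features and
    dichotomised drinks as the target. The selector field is dropped:
    it is a train/test marker, not a diagnosis."""
    drinks_idx = COLUMNS.index("drinks")
    X: List[List[float]] = []
    y: List[int] = []
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        # Features: the first five fields (blood tests) only.
        X.append(list(row[:drinks_idx]))
        # Target: 1 if drinks exceeds the chosen threshold, else 0.
        y.append(1 if row[drinks_idx] > threshold else 0)
    return X, y
```

Calling `make_classification_data(rows)` on the 345 records would yield five-column feature rows and a binary target. Whatever threshold is chosen should be reported alongside any results, in line with the request above that researchers state their method clearly.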