adult (Q6034160)

From MaRDI portal
Revision as of 13:16, 15 April 2024 by Importer (talk | contribs) (‎Created a new Item)
OpenML dataset with id 1590
Language: English
Label: adult
Description: OpenML dataset with id 1590
Also known as: none listed

    Statements

**Author**: Ronny Kohavi and Barry Becker
**Source**: [UCI](https://archive.ics.uci.edu/ml/datasets/Adult) - 1996
**Please cite**: Ron Kohavi, "Scaling Up the Accuracy of Naive-Bayes Classifiers: a Decision-Tree Hybrid", Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996

Prediction task is to determine whether a person makes over 50K a year. Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))

This is the original version from the UCI repository, with training and test sets merged.

### Variable description

Variables are all self-explanatory except __fnlwgt__. This is a proxy for the demographic background of the people: "People with similar demographic characteristics should have similar weights". This similarity-statement is not transferable across the 51 different states.

Description from the donor of the database:

The weights on the CPS files are controlled to independent estimates of the civilian noninstitutional population of the US. These are prepared monthly for us by Population Division here at the Census Bureau. We use 3 sets of controls. These are:
1. A single cell estimate of the population 16+ for each state.
2. Controls for Hispanic Origin by age and sex.
3. Controls by Race, age and sex.

We use all three sets of controls in our weighting program and "rake" through them 6 times so that by the end we come back to all the controls we used. The term estimate refers to population totals derived from CPS by creating "weighted tallies" of any specified socio-economic characteristics of the population. People with similar demographic characteristics should have similar weights. There is one important caveat to remember about this statement. That is that since the CPS sample is actually a collection of 51 state samples, each with its own probability of selection, the statement only applies within state.

### Relevant papers

Ronny Kohavi and Barry Becker. Data Mining and Visualization, Silicon Graphics.
e-mail: ronnyk '@' live.com for questions.
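
### Illustrative sketch

The extraction condition and the "raking" step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the Census Bureau's weighting program: the function names `is_clean_record` and `rake` are hypothetical, the raking is reduced to two sets of control totals (rows and columns of a small table) instead of the three sets named in the description, and the example numbers are made up.

```python
def is_clean_record(aage, agi, afnlwgt, hrswk):
    """Extraction condition quoted in the description:
    ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))."""
    return aage > 16 and agi > 100 and afnlwgt > 1 and hrswk > 0


def rake(weights, row_targets, col_targets, passes=6):
    """Simplified raking (iterative proportional fitting) on a 2-D table.

    Assumption: only two sets of controls are used here (row totals and
    column totals); the description mentions three sets and 6 passes.
    """
    table = [row[:] for row in weights]  # work on a copy
    for _ in range(passes):
        # Scale every row so that it sums to its control total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                factor = target / total
                table[i] = [w * factor for w in table[i]]
        # Scale every column so that it sums to its control total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                factor = target / total
                for row in table:
                    row[j] *= factor
    return table


if __name__ == "__main__":
    # Made-up starting weights and control totals, for illustration only.
    raked = rake([[1.0, 2.0], [3.0, 4.0]], [40.0, 60.0], [50.0, 50.0])
    for row in raked:
        print(row)
    print(is_clean_record(aage=39, agi=50000, afnlwgt=77516, hrswk=40))
```

Each pass rescales the table to one set of controls and then the other, so after several passes the weighted tallies agree with all controls at once, which is the sense in which the procedure "comes back to all the controls".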
    9 June 2015
    class
    bb6510925e5d4b23d136715febb2cdf5
    2
    2
    15
    48,842
    6,465

    Identifiers
