Glass-Classification
OpenML43750MaRDI QIDQ6036843FDOQ6036843RO-CrateQ6036843
OpenML dataset with id 43750
Author name not available (Why is that?)
Full work available at URL: https://api.openml.org/data/v1/download/22102575/Glass-Classification.arff
Upload date: 24 March 2022
Dataset Characteristics
Number of features: 10 (numeric: 10, symbolic: 0 and in total binary: 0 )
Number of instances: 214
Number of instances with missing values: 0
Number of missing values: 0
Context This is a Glass Identification Data Set from UCI. It contains 10 attributes including id. The response is glass type(discrete 7 values) Content Attribute Information:
Id number: 1 to 214 (removed from CSV file) RI: refractive index Na: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10) Mg: Magnesium Al: Aluminum Si: Silicon K: Potassium Ca: Calcium Ba: Barium Fe: Iron Type of glass: (class attribute) -- 1 buildingwindowsfloatprocessed -- 2 buildingwindowsnonfloatprocessed -- 3 vehiclewindowsfloatprocessed -- 4 vehiclewindowsnonfloatprocessed (none in this database) -- 5 containers -- 6 tableware -- 7 headlamps
Acknowledgements
https://archive.ics.uci.edu/ml/datasets/Glass+Identification
Source:
Creator:
B. German
Central Research Establishment
Home Office Forensic Science Service
Aldermaston, Reading, Berkshire RG7 4PN
Donor:
Vina Spiehler, Ph.D., DABFT
Diagnostic Products Corporation
(213) 776-0180 (ext 3014)
Inspiration
Data exploration of this dataset reveals two important characteristics :
1) The variables are highly corelated with each other including the response variables:
So which kind of ML algorithm is most suitable for this dataset Random Forest , KNN or other? Also since dataset is too small is there any chance of applying PCA or it should be completely avoided?
2) Highly Skewed Data:
Is scaling sufficient or are there any other techniques which should be applied to normalize data? Like BOX-COX Power transformation?
Cited In (1)
ROCrate
What is a RO-Crate?
A RO-Crate is a standardized research object package used to bundle data together with rich machine-readable metadata. Each RO-Crate contains:
- the files belonging to the dataset (e.g. CSVs, images, code, documentation)
- a ro-crate-metadata.json file describing the content, provenance, and context
- persistent identifiers and references to related research objects (e.g. software, publications)
This ensures that the dataset can be easily reused, cited, validated, and interpreted in a reproducible manner. More information can be found here.
Download
You can download a RO-Crate for this dataset here: Download RO-Crate
HINT: The RO-Crate is created dynamically, so it could take up to 30 seconds until the downloads starts.
This page was built for dataset: Glass-Classification