Project:OpenMLDatamodels: Difference between revisions

From MaRDI portal

Revision as of 14:58, 18 March 2024

Dataset

Dataset from OpenML. Currently, these datasets can be associated with papers only when there is a DOI match.

  • label: OpenML dataset name
  • description: OpenML dataset with id {ID}
  • instance of: data set (Wikidata: Q1172284)
  • OpenML dataset ID: unique dataset ID from OpenML
  • dataset version: version of the dataset
  • author name string: free-text input that cannot currently be linked to an ID of any kind; both creators and contributors are recorded here
  • collection date: taken from a free-form text field; the string is used as is
  • upload date: automatically generated timestamp
  • license: the license name is matched to License items in the knowledge graph (KG), as in the CRAN importer
  • full work available at url: populated from both the "url" and "original data url" fields
  • default target attribute (e.g. class)
  • row id attribute
  • OpenML semantic tag: tags assigned automatically in OpenML; only tags from the following list are considered: Agriculture, Astronomy, Chemistry, Computational Universe, Computer Systems, Culture, Demographics, Earth Science, Economics, Education, Geography, Government, Health, History, Human Activities, Images, Language, Life Science, Machine Learning, Manufacturing, Mathematics, Medicine, Meteorology, Physical Sciences, Politics, Social Media, Sociology, Statistics, Text & Literature, Transportation
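
The mapping described in the list above could be sketched roughly as follows. This is an illustration only, not the importer's actual code: the function name `build_dataset_item`, the helper `_as_list`, and the input keys (`id`, `name`, `creator`, `contributor`, `collection_date`, `upload_date`, `licence`, `url`, `original_data_url`, `default_target_attribute`, `row_id_attribute`, `tag`) are assumptions about how the OpenML metadata might arrive. Matching the license name to a KG item is not shown; the raw name is passed through.

```python
from typing import Any

# Semantic tags accepted by the data model (taken from the list above).
ALLOWED_TAGS = frozenset({
    "Agriculture", "Astronomy", "Chemistry", "Computational Universe",
    "Computer Systems", "Culture", "Demographics", "Earth Science",
    "Economics", "Education", "Geography", "Government", "Health",
    "History", "Human Activities", "Images", "Language", "Life Science",
    "Machine Learning", "Manufacturing", "Mathematics", "Medicine",
    "Meteorology", "Physical Sciences", "Politics", "Social Media",
    "Sociology", "Statistics", "Text & Literature", "Transportation",
})


def _as_list(value: Any) -> list[str]:
    """Normalise a missing value, a single string, or a list into a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value] if value.strip() else []
    return [str(v) for v in value if str(v).strip()]


def build_dataset_item(meta: dict[str, Any]) -> dict[str, Any]:
    """Map one OpenML metadata record (hypothetical keys) to the listed properties."""
    dataset_id = meta["id"]
    return {
        "label": meta["name"],
        "description": f"OpenML dataset with id {dataset_id}",
        "instance of": "Q1172284",  # Wikidata item for "data set"
        "OpenML dataset ID": dataset_id,
        "dataset version": meta.get("version"),
        # Creators and contributors both end up as free-text author strings.
        "author name string": _as_list(meta.get("creator"))
        + _as_list(meta.get("contributor")),
        # Free-form text, used as is.
        "collection date": meta.get("collection_date"),
        "upload date": meta.get("upload_date"),
        # Raw license name; matching to a KG License item is out of scope here.
        "license": meta.get("licence"),
        # Both URL fields feed the same property.
        "full work available at url": _as_list(meta.get("url"))
        + _as_list(meta.get("original_data_url")),
        "default target attribute": meta.get("default_target_attribute"),
        "row id attribute": meta.get("row_id_attribute"),
        # Keep only tags from the allowed list, preserving input order.
        "OpenML semantic tag": [
            t for t in _as_list(meta.get("tag")) if t in ALLOWED_TAGS
        ],
    }
```

Returning a plain dictionary keyed by property label keeps the sketch independent of any particular knowledge-graph client; a real importer would translate these labels into property IDs before writing.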

Publication