Project:OpenMLDatamodels
From MaRDI portal
Latest revision as of 11:03, 11 April 2024
Dataset
A dataset imported from OpenML. Currently, these datasets can be associated with papers only when there is a DOI match.
- label: OpenML dataset name
- description: OpenML dataset with id {ID}
- instance of (Property:P31) data set (MaRDI: Item:Q56885)
- OpenML dataset ID (Property:P1473): unique dataset ID from OpenML
- dataset version: version of dataset
- author name string: free-text input that cannot currently be associated with an ID of any kind; both creators and contributors are stored here
- collection date: taken from a free-form text field; the string is used as is
- upload date: this is an automatic timestamp
- license: the license name is matched to License items in the knowledge graph (KG), as in the CRAN importer ("Public" is the default in OpenML)
- full work available at url: both the "url" and "original data url" fields are used for this
- default target attribute (e.g. class)
- row id attribute
- OpenML semantic tag (Property:P1465): tags assigned automatically in OpenML; only tags in this list (the items that are instances of Item:Q6032783) are considered: Agriculture, Astronomy, Chemistry, Computational Universe, Computer Systems, Culture, Demographics, Earth Science, Economics, Education, Geography, Government, Health, History, Human Activities, Images, Language, Life Science, Machine Learning, Manufacturing, Mathematics, Medicine, Meteorology, Physical Sciences, Politics, Social Media, Sociology, Statistics, Text & Literature, Transportation
- cites work: points to a Publication item when a DOI or arXiv ID can be extracted from the citation text; in that case the importer first looks for an existing paper with that ID and creates a new paper only if none is found
- citation text: raw citation text
- has feature: features and their data types, such as the feature "width" with the data type "numeric"
- number of binary features
- number of classes
- number of features
- number of instances
- number of instances with missing values
- number of missing values
- number of numeric features
- number of symbolic features
- file format: ARFF or Sparse ARFF
- MaRDI profile type: MaRDI dataset profile
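Two of the rules above, extracting an identifier from the citation text and restricting semantic tags to the allowed list, can be sketched in Python as follows. This is only an illustration, not the importer's actual code: the regular expressions, the function names `extract_identifiers` and `filter_tags`, and the returned dictionary layout are assumptions made for this example.

```python
import re

# Assumed patterns: the expressions used by the real importer are not
# documented on this page.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ARXIV_RE = re.compile(r"arxiv\s*:\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE)

# The allowed tag labels, copied from the list above.
ALLOWED_TAGS = frozenset({
    "Agriculture", "Astronomy", "Chemistry", "Computational Universe",
    "Computer Systems", "Culture", "Demographics", "Earth Science",
    "Economics", "Education", "Geography", "Government", "Health",
    "History", "Human Activities", "Images", "Language", "Life Science",
    "Machine Learning", "Manufacturing", "Mathematics", "Medicine",
    "Meteorology", "Physical Sciences", "Politics", "Social Media",
    "Sociology", "Statistics", "Text & Literature", "Transportation",
})


def extract_identifiers(citation_text):
    """Return the first DOI and arXiv ID found in a raw citation string."""
    doi_match = DOI_RE.search(citation_text)
    arxiv_match = ARXIV_RE.search(citation_text)
    # Strip punctuation that usually belongs to the sentence, not the DOI.
    doi = doi_match.group(0).rstrip(".,;)") if doi_match else None
    # Keep the arXiv ID without its version suffix (e.g. "v2").
    arxiv_id = arxiv_match.group(1) if arxiv_match else None
    return {"doi": doi, "arxiv_id": arxiv_id}


def filter_tags(tags):
    """Keep only the tags that appear in the allowed list, in input order."""
    return [tag for tag in tags if tag in ALLOWED_TAGS]
```

If both values come back as `None`, no "cites work" statement can be created and only the raw citation text is stored.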
Publication
If no publication is found for the identifier, a publication item is created that consists only of the identifier and has no label
- label: None
- description: scientific article about an OpenML dataset
- arxiv ID (if present)
- doi (if present)
- MaRDI profile type: MaRDI publication profile
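The find-or-create behaviour for Publication items can be sketched as below. This is a minimal in-memory illustration under stated assumptions: `Publication`, `InMemoryGraph` and `find_or_create_publication` are hypothetical names, and the dictionaries stand in for lookups against the knowledge graph, whose real interface is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Publication:
    """Hypothetical stand-in for a Publication item with the fields above."""
    label: Optional[str] = None  # stub items are created without a label
    description: str = "scientific article about an OpenML dataset"
    doi: Optional[str] = None
    arxiv_id: Optional[str] = None
    profile_type: str = "MaRDI publication profile"


class InMemoryGraph:
    """Toy replacement for the knowledge graph, indexed by identifier."""

    def __init__(self):
        self.by_doi = {}
        self.by_arxiv = {}

    def find(self, doi=None, arxiv_id=None):
        if doi and doi in self.by_doi:
            return self.by_doi[doi]
        if arxiv_id and arxiv_id in self.by_arxiv:
            return self.by_arxiv[arxiv_id]
        return None

    def add(self, publication):
        if publication.doi:
            self.by_doi[publication.doi] = publication
        if publication.arxiv_id:
            self.by_arxiv[publication.arxiv_id] = publication


def find_or_create_publication(graph, doi=None, arxiv_id=None):
    """Reuse an existing paper with this identifier, else create a stub."""
    if not doi and not arxiv_id:
        return None  # nothing to match on, so no item is created
    existing = graph.find(doi=doi, arxiv_id=arxiv_id)
    if existing is not None:
        return existing
    stub = Publication(doi=doi, arxiv_id=arxiv_id)
    graph.add(stub)
    return stub
```

Calling the function twice with the same identifier returns the same item, so repeated imports in this sketch do not create duplicate stubs.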
Sample item
- Item:Q6032831 (anneal)