Project talk:OpenMLDatamodels

From MaRDI portal

Initial feedback on the data model for OpenML dataset items

The following remarks are based on this version of the documentation page and this version of the sample item.

On the documentation page, each statement type should link to the respective property. For a generally useful approach to sharing Wikibase data models, see the corresponding pages on some WikiProjects over on Wikidata, e.g. here. It is also advisable to create and document the necessary properties in advance in order to facilitate their discussion.

Some specific points regarding individual properties:

  • dataset version
    • The property description page is essentially empty. Should this be specific to OpenML or generic?
      • Answer: In my opinion, we can keep it generic.
      • Tim requested a name change to "dataset version identifier".
    • Some of the other properties may change with the version number (certainly the checksum, for instance). How should that be handled?
      • Answer: My plan was to always update it to the newest version, including all properties
  • author name string
    • Keep track of the order in the author list, as per series ordinal, so as to facilitate conversion to author statements.
      • Answer: Even though there is no meaningful order in OpenML, afaik?
        • Yes.
  • default target attribute
    • The property description page is essentially empty and needs to be fleshed out.
      • Answer: Sure :) (this also applies to all other property pages)
  • checksum
    • Depends on the version, so this should be coordinated with that (see above).
      • Answer: Also see above, this depends on how we handle the version
  • has feature
    • The property description page is essentially empty and needs to be fleshed out.
      • This property will be removed.
  • number of binary features
    • The property description page is essentially empty and needs to be fleshed out.
  • number of classes
    • The property description page is essentially empty and needs to be fleshed out.
  • number of features
    • The property description page is essentially empty and needs to be fleshed out.
  • number of instances
    • The property description page is essentially empty and needs to be fleshed out.
  • number of instances with missing values
    • The property description page is essentially empty and needs to be fleshed out.
  • number of missing values
    • The property description page is essentially empty and needs to be fleshed out.
  • number of numeric features
    • The property description page is essentially empty and needs to be fleshed out.
  • number of symbolic features
    • The property description page is essentially empty and needs to be fleshed out.

--Daniel (talk) 16:05, 21 March 2024 (CET)
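
To illustrate the series ordinal suggestion for author name strings above, here is a minimal sketch of how an import script might attach the list position to each author statement. It assumes the authors arrive as an ordered list of strings. The `Statement` class, the `author_statements` helper, and the two property identifiers are hypothetical placeholders and do not correspond to actual IDs on the portal.

```python
from dataclasses import dataclass, field

# Hypothetical placeholder identifiers; the real property IDs on the
# target Wikibase are not known here and must be substituted.
AUTHOR_NAME_STRING = "P_AUTHOR_NAME_STRING"
SERIES_ORDINAL = "P_SERIES_ORDINAL"


@dataclass
class Statement:
    """A simplified stand-in for a Wikibase statement with qualifiers."""
    prop: str
    value: str
    qualifiers: dict[str, str] = field(default_factory=dict)


def author_statements(authors: list[str]) -> list[Statement]:
    """Build one author-name-string statement per author.

    Each statement carries a series-ordinal qualifier holding the
    1-based position of the author in the cleaned input list.
    """
    # Drop empty entries first so the ordinals stay contiguous.
    cleaned = [name.strip() for name in authors if name.strip()]
    return [
        Statement(AUTHOR_NAME_STRING, name, {SERIES_ORDINAL: str(position)})
        for position, name in enumerate(cleaned, start=1)
    ]


if __name__ == "__main__":
    for stmt in author_statements(["First Author", "Second Author"]):
        print(stmt.prop, stmt.value, stmt.qualifiers)
```

Storing the ordinal as a qualifier on each statement, rather than relying on statement order, means a later conversion to author statements can carry the position over unchanged, even if the source order turns out to have no particular meaning.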