Project talk:OpenMLDatamodels: Difference between revisions

From MaRDI portal

Revision as of 16:12, 21 March 2024

Initial feedback on the data model for OpenML dataset items

The following remarks are based on this version of the documentation page and this version of the sample item.

On the documentation page, each statement type should link to the respective property. For a generally useful approach to sharing Wikibase data models, see the corresponding pages on some WikiProjects over on Wikidata, e.g. here. It is also advisable to create and document the necessary properties in advance in order to facilitate their discussion.

Some specific points regarding individual properties:

  • dataset version
    • The property description page is essentially empty. Should this be specific to OpenML or generic?
      • Answer: We can keep it generic in my opinion
    • Some of the other properties may change with the version number (the checksum certainly will). How should that be handled?
      • Answer: My plan was to always update it to the newest version, including all properties
  • author name string
    • Keep track of the order in the author list, as per series ordinal, so as to facilitate conversion to author statements
  • default target attribute
    • The property description page is essentially empty and needs to be fleshed out.
  • checksum
    • Depends on version, so should be coordinated with that (see above)
  • has feature
    • The property description page is essentially empty and needs to be fleshed out.
  • number of binary features
    • The property description page is essentially empty and needs to be fleshed out.
  • number of classes
    • The property description page is essentially empty and needs to be fleshed out.
  • number of features
    • The property description page is essentially empty and needs to be fleshed out.
  • number of instances
    • The property description page is essentially empty and needs to be fleshed out.
  • number of instances with missing values
    • The property description page is essentially empty and needs to be fleshed out.
  • number of missing values
    • The property description page is essentially empty and needs to be fleshed out.
  • number of numeric features
    • The property description page is essentially empty and needs to be fleshed out.
  • number of symbolic features
    • The property description page is essentially empty and needs to be fleshed out.

--Daniel (talk) 16:05, 21 March 2024 (CET)
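
To illustrate the point about author order, here is a minimal sketch of how author name string statements could carry a series ordinal qualifier, so that a later conversion to author statements keeps the original order. The property IDs (P43, P146, P16) are the ones linked from this page. Everything else is an assumption made for illustration: the dict layout is a simplified stand-in rather than the Wikibase JSON format, the ordinal is stored as a string, and `resolved` is a hypothetical mapping from name strings to item IDs.

```python
# Property IDs as linked on this talk page.
AUTHOR_NAME_STRING = "P43"
SERIES_ORDINAL = "P146"
AUTHOR = "P16"


def author_name_statements(names):
    """Return one statement per author name, with list position as a qualifier.

    The dict layout is a simplified assumption, not the Wikibase JSON format.
    """
    return [
        {
            "property": AUTHOR_NAME_STRING,
            "value": name,
            # Assumption: the ordinal is stored as a string value.
            "qualifiers": {SERIES_ORDINAL: str(position)},
        }
        for position, name in enumerate(names, start=1)
    ]


def to_author_statements(statements, resolved):
    """Convert name strings to author statements where an item is known.

    `resolved` is a hypothetical mapping from name string to item ID.
    Unmatched names are left unchanged; the ordinal is carried over.
    """
    converted = []
    for statement in statements:
        item_id = resolved.get(statement["value"])
        if statement["property"] == AUTHOR_NAME_STRING and item_id is not None:
            converted.append(
                {
                    "property": AUTHOR,
                    "value": item_id,
                    "qualifiers": dict(statement["qualifiers"]),
                }
            )
        else:
            converted.append(statement)
    return converted
```

Because the ordinal sits on each statement rather than in the list position, the order survives even if only some of the names can be matched to items at a given time.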