Project:MilestonesMeeting/20240223: Difference between revisions

From MaRDI portal

Latest revision as of 14:38, 26 February 2024

MaRDI TA5 Milestones Meeting 23.02.2024 @ ZIB

Goals of the meeting

  • We have an idea about how to reach the official milestones
  • Everybody is aware of the personal milestones
  • Some (all) technical points are discussed / solved
  • We have a plan about how to have better documentation

Agenda

  1. Welcome
  2. Mission clarification
    1. What IS our mission in TA5? Connect papers/software AND data-sets?
      --> Make it easy to access and find the data produced by MaRDI TAs 1-4
  3. Milestone Planning
    1. What are our 2024 goals for MaRDI?
      --> Bring in content from TAs 1-4
    2. What are the official 2024 milestones?
    3. Who is doing what? (-> Personal Milestone Planning) PART1 - Presentations
  4. Open (technical) topics (see below)
  5. Documentation
    1. How to improve internal documentation?
      1. --> If possible, update the upstream documentation (e.g. MediaWiki) - and link it from our Wiki
      2. --> Use Rim as a test-person to check whether all needed information is documented on our Wiki
    2. How to improve documentation for external users?
      1. --> Collect technical questions from other TAs and create documentation about them
      2. --> Start with a FAQ-like document (potentially link to more complex documentation from there)
  6. Outreach (to other SFBs, Math+, Libraries, ...)
    1. --> Connect better with: Math+, LifeDocs (Christoph Lehrenfeld), TU Darmstadt Library (Jens Freund)

(Technical) Topics to discuss

  • How to define items? --> Create a property ("mardi-profile"), set on each item, that identifies the item's type (software, formula, publication, ...)
    • How to define profile types?
    • Formulae
      • Which properties to use?
        • --> Same as DLMF
    • Papers
      • Current way of selecting papers in SPARQL queries by "has zbMath ID"? --> solved through the new "mardi-profile" property
      • How to link from a paper, as in "cites software"? / "uses dataset"?
      • How to link to a paper, as in "This data-set / software was used in this paper" (Now: in software-item we use "is described in" and in )
        • --> use the reverse of P4510 (https://www.wikidata.org/wiki/Property:P4510)
    • Datasets
      • Which properties to use?
        • --> Larissa made a first draft; compatibility should be checked with Zenodo items; then implement it
      • How to link to a paper, as in "was used in paper"? (Is this necessary?)
        • --> as before for software
    • Software items (How can we query all of them - "instance of X" - what is X?)
      • "instance of software" is violating the WikiData hierarchy? (Software is quite high-level)
        • --> Solved by using the new "mardi-profile" property
  • arXiv Importer
    • What is the plan?
      • --> Use zbMath data about arXiv paper meta-data (blocker: API is not yet giving out that information)
      • Import of formulae (can we use an LLM to describe a particular formula? parameters etc.?)
        • --> Do this on a small sub-set of arXiv papers to showcase the idea
      • Import of paper-meta-data? (->Disambiguation)
        • Use zbMath data
    • Next steps?
      • --> Take 2..10 arXiv papers, extract formulas, add to MaRDI KG, try Moritz's formula search service
      • --> Discuss results and see whether this is useful at all
  • LLMs for MaRDI portal
    • What is the overall plan?
    • What is the status?
      • Chat-Bot (LLM to query the portal)
  • How to integrate more of the cool Scholia stuff? (Simple example: number of citations of a paper, see e.g. https://scholia.portal.mardi4nfdi.de/work/Q25938997)
    • --> Define what "cool" Scholia stuff is
    • --> For the citation example: Use available services such as OpenCitations.net to get needed meta-data
  • Zenodo importer (for Math+ integration)
    • What is the plan?
      • Set up a workflow to harvest the Math+ Zenodo Community items
  • Workflows for periodic updates (for any source we have)
    • --> Use the Zenodo example as a demonstrator
  • zbMath MSC Keyword import? (We only have the IDs)
    • --> Put the ID<->Keyword relations in an SQL database to avoid license issues
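  • Wikidata graph split (https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/IIA5LVHBYK45FSMLPIVZI6WXA5QSRPF4/)
    • --> If this happens, Scholia might become dysfunctional for many of its queries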
  • Licensing
    • Put a general "this is our licensing strategy" page on our Wiki
  • Author disambiguation
  • OKMaps (https://openknowledgemaps.org/)
  • Environmental footprint
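
Sketch: selecting items by "mardi-profile"

Several points above (selecting papers, querying all software items) were resolved by the new "mardi-profile" property. The following is a minimal sketch of how such a SPARQL query could be built, not the portal's actual implementation. The property ID P0000 and the profile item IDs Q0001 to Q0003 are hypothetical placeholders, and the function name build_profile_query is made up for illustration. The sketch also assumes that the usual Wikibase prefixes (wdt:, wd:) and the label service are available on the portal's query endpoint.

```python
from string import Template

# Hypothetical placeholders: the real IDs must be looked up on the portal.
MARDI_PROFILE_PROPERTY = "P0000"  # stands in for the "mardi-profile" property
PROFILE_ITEMS = {
    "software": "Q0001",     # stands in for the software profile item
    "formula": "Q0002",      # stands in for the formula profile item
    "publication": "Q0003",  # stands in for the publication profile item
}

# One triple pattern replaces per-type workarounds such as "has zbMath ID".
QUERY_TEMPLATE = Template("""\
SELECT ?item ?itemLabel WHERE {
  ?item wdt:$prop wd:$profile .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT $limit""")


def build_profile_query(profile: str, limit: int = 100) -> str:
    """Return a SPARQL query listing items that carry the given profile."""
    if profile not in PROFILE_ITEMS:
        raise ValueError(f"unknown profile type: {profile!r}")
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.substitute(
        prop=MARDI_PROFILE_PROPERTY,
        profile=PROFILE_ITEMS[profile],
        limit=limit,
    )


if __name__ == "__main__":
    # Print the query for all software items (first 10 results).
    print(build_profile_query("software", limit=10))
```

The resulting string can be pasted into the portal's query service once the placeholders are replaced by the real IDs. Because every item type goes through the same property, the query does not depend on a particular "instance of" hierarchy.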