Bio-ML: Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching
DOI: 10.5281/zenodo.13119437, Zenodo: 13119437, MaRDI QID: Q6718572, FDO: Q6718572
Dataset published in the Zenodo repository.
Ernesto Jiménez-Ruiz, Ian Horrocks, Yuan He, Ali Hadian, Hang Dong, Jiaoyan Chen
Publication date: 28 July 2024
Copyright license: Creative Commons Attribution 4.0 International
This version is used in the Bio-ML track of OAEI 2024; the only change compared to OAEI 2023 is the deletion of certain training subsumption mappings.

Overview

The purpose of these datasets is to support equivalence and subsumption ontology matching. There are five ontology pairs extracted from Mondo and UMLS:

| Source | Task | Category | #SrcCls | #TgtCls | #Ref (equiv) | #Ref (subs) |
|--------|-------------|----------|---------|---------|--------------|--------------|
| Mondo | OMIM-ORDO | Disease | 9,648 | 9,275 | 3,721 | 103 |
| Mondo | NCIT-DOID | Disease | 15,762 | 8,465 | 4,686 | 3,338 (-1) |
| UMLS | SNOMED-FMA | Body | 34,418 | 88,955 | 7,256 | 5,453 (-53) |
| UMLS | SNOMED-NCIT | Pharm | 29,500 | 22,136 | 5,803 | 4,224 (-1) |
| UMLS | SNOMED-NCIT | Neoplas | 22,971 | 20,247 | 3,804 | 213 |

The negative numbers in parentheses reflect the changes due to the deletion of certain training subsumption mappings.

The main track is available at "bio-ml", where each pair is associated with a task folder containing the source and target ontologies, the reference equivalence mappings (in "refs_equiv"), and the reference subsumption mappings (in "refs_subs").

The special sub-track is available at "bio-llm", where each pair is associated with a task folder containing the source and target ontologies and the test candidate mappings.

Citation

Bio-ML (Main Track)

```
@inproceedings{he2022machine,
  title={Machine learning-friendly biomedical datasets for equivalence and subsumption ontology matching},
  author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Jim{\'e}nez-Ruiz, Ernesto and Hadian, Ali and Horrocks, Ian},
  booktitle={International Semantic Web Conference},
  pages={575--591},
  year={2022},
  organization={Springer}
}
```

Bio-LLM (Sub-track)

```
@article{he2023exploring,
  title={Exploring large language models for ontology alignment},
  author={He, Yuan and Chen, Jiaoyan and Dong, Hang and Horrocks, Ian},
  journal={arXiv preprint arXiv:2309.07172},
  year={2023}
}
```

Important Links

See detailed documentation at: https://krr-oxford.github.io/DeepOnto/bio-ml

See the OAEI Bio-ML track at: https://www.cs.ox.ac.uk/isg/projects/ConCur/oaei/

See our resource paper for the original Bio-ML on arXiv or Springer (accepted at ISWC-2022 and nominated as a best resource paper candidate).

See our poster paper for the Bio-LLM sub-track on arXiv (accepted at ISWC-2023 Posters and Demos).

Changelog

The only change in this version compared to OAEI 2023 is the deletion of certain training subsumption mappings that can be directly exploited through deductive reasoning.
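As a rough illustration of the task-folder layout described in the Overview, the following Python sketch counts the reference mappings found under "refs_equiv" and "refs_subs" for one task. It rests on assumptions that this page does not state: that the mappings are stored as tab-separated files with a header row containing `SrcEntity` and `TgtEntity` columns, and that the files end in `.tsv`. The function names `load_mappings` and `summarize_task` are hypothetical helpers, not part of any released tooling; the detailed documentation linked above remains the authority on the actual file format.

```python
import csv
from pathlib import Path


def load_mappings(tsv_path):
    """Read one mapping file into a list of (source, target) pairs.

    Assumption: the file is tab-separated with a header row that
    includes 'SrcEntity' and 'TgtEntity' columns.
    """
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [(row["SrcEntity"], row["TgtEntity"]) for row in reader]


def summarize_task(task_dir):
    """Count the mappings in every .tsv file under refs_equiv and refs_subs.

    Returns a dict keyed by 'folder/filename'. Folders that do not
    exist are skipped silently.
    """
    task_dir = Path(task_dir)
    counts = {}
    for ref_folder in ("refs_equiv", "refs_subs"):
        folder = task_dir / ref_folder
        if not folder.is_dir():
            continue
        for tsv_file in sorted(folder.glob("*.tsv")):
            key = f"{ref_folder}/{tsv_file.name}"
            counts[key] = len(load_mappings(tsv_file))
    return counts
```

Calling `summarize_task` on an unpacked task folder should give per-file counts that can be compared against the #Ref columns in the table, provided the assumed file format holds.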