Equivalence Classes of the LOD Cloud




DOI: 10.5281/zenodo.3345674
Zenodo: 3345674
MaRDI QID: Q6718013
FDO: Q6718013

Dataset published in the Zenodo repository.

Jan Wielemaker, Fatiha Saïs, Joe Raad, Frank van Harmelen, Wouter Beek, Nathalie Pernelle

Publication date: 28 January 2019

Copyright license: Creative Commons Attribution 4.0 International



This data set contains all 49 million non-singleton equivalence classes resulting from the transitive closure of over 556 million owl:sameAs statements extracted from the LOD Cloud in the 2015 LOD Laundromat crawl. The owl:sameAs links that were closed are those available in the sameAs.cc data set.

We represent these non-singleton equivalence classes using two CSV files:

1. id2terms.csv: the first column contains the equivalence class identifier (a randomly generated number), and the remaining columns contain all IRIs belonging to this equivalence class, which in theory should all refer to the same real-world entity. In the example row below, 42467584 in the first column is the ID of the equivalence class, and the 4 other columns are the IRIs that are identical after transitive closure:

   42467584 http://nl.dbpedia.org/resource/Cnodocentron_trilineatum http://sv.dbpedia.org/resource/Cnodocentron_trilineatum http://vi.dbpedia.org/resource/Cnodocentron_trilineatum http://www.wikidata.org/entity/Q2304468

2. terms2id.csv: contains two columns, mapping each IRI in the sameAs.cc data set that is involved in an owl:sameAs link to the equivalence class it belongs to. An example row of this file:

   http://nl.dbpedia.org/resource/Cnodocentron_trilineatum 42467584

In addition to the closure of all owl:sameAs links (available in closure_all.zip), this data set contains two further closures, each consisting of two CSV files with the same structure as described above:

- closure_099.zip represents the closure of all owl:sameAs links in the sameAs.cc data set after discarding around 1 million probably erroneous owl:sameAs links (error degree above 0.99). The error degree is computed from the community structure of the network, following the approach described in [Raad et al., 2018].
- closure_04.zip represents the closure of all owl:sameAs links in the sameAs.cc data set after discarding around 150 million owl:sameAs links (error degree above 0.4). The evaluation conducted in [Raad et al., 2018] shows that the 400M owl:sameAs links with an error degree of at most 0.4 have a higher probability of correctness than the other links.

The availability of these 3 closures allows Linked Data practitioners, for the first time, to control in practice the trade-off between (a) using more identity links, possibly not all correct, and benefiting from more contextual information from the LOD Cloud, and (b) using a smaller subset of higher-quality identity links to limit the risk of propagating erroneous identity links and information through the application of owl:sameAs semantics, i.e. transitivity, symmetry, reflexivity and property sharing.
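The two-file layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the data set itself: the function names read_id2terms, invert and same_entity are hypothetical, and the space delimiter is an assumption taken from how the example row appears in this description. Check the actual delimiter and quoting in the downloaded files before relying on it.

```python
import csv
import io


def read_id2terms(lines, delimiter=" "):
    """Map each equivalence class ID to the list of IRIs in its row.

    The delimiter is an assumption based on the example row shown in
    the description; adjust it to match the real id2terms.csv.
    """
    classes = {}
    for row in csv.reader(lines, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        class_id, iris = row[0], row[1:]
        classes[class_id] = iris
    return classes


def invert(classes):
    """Build the IRI -> class ID mapping that terms2id.csv describes."""
    return {iri: cid for cid, iris in classes.items() for iri in iris}


def same_entity(a, b, term2id):
    """True if two IRIs fall in the same equivalence class.

    Only non-singleton classes are stored, so an IRI missing from the
    mapping is treated as identical only to itself.
    """
    if a == b:
        return True
    return a in term2id and term2id[a] == term2id.get(b)


# The example row from the description, used here as in-memory input.
sample = io.StringIO(
    "42467584 http://nl.dbpedia.org/resource/Cnodocentron_trilineatum "
    "http://sv.dbpedia.org/resource/Cnodocentron_trilineatum "
    "http://vi.dbpedia.org/resource/Cnodocentron_trilineatum "
    "http://www.wikidata.org/entity/Q2304468\n"
)
classes = read_id2terms(sample)
term2id = invert(classes)
```

For the full files, an open file handle can be passed in place of the in-memory sample. Given the 49 million classes, holding everything in a dictionary may need a large amount of memory, so streaming the rows or loading them into a key-value store may be a better fit.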