A Novel Curated Scholarly Graph Connecting Textual and Data Publications

From MaRDI portal
Dataset:6724750



DOI: 10.5281/zenodo.7464120; Zenodo: 7464120; MaRDI QID: Q6724750; FDO: Q6724750

Dataset published in the Zenodo repository.

Ornella Irrera, Paolo Manghi, Andrea Mannocci, Gianmaria Silvello

Publication date: 20 December 2022

Copyright license: Creative Commons Attribution 4.0 International



This dataset contains an open and curated scholarly graph we built as a training and test set for data discovery, data connection, author disambiguation, and link prediction tasks. This graph represents the European Marine Science (MES) community included in the OpenAIRE Graph.

The nodes of the graph we release represent publications, datasets, software, and authors; edges interconnecting research products always have the publication as source and the dataset or software as target. In addition, edges are labeled with semantics that outline whether the publication is referencing, citing, documenting, or supplementing the related outcome.

To curate and enrich node metadata and edge semantics, we relied on information extracted from the PDFs of the publications and from the dataset and software webpages, respectively. We curated the authors to remove duplicate nodes representing the same person.

The resource we release counts 4,047 publications, 5,488 datasets, 22 software, 21,561 authors, and 9,692 edges connecting publications to datasets/software. This graph is in the curated_MES folder.

We provide this resource as:
- a property graph: a dump that can be imported into Neo4j;
- 5 JSONL files containing publications, datasets, software, authors, and relationships, respectively. Each line of a JSONL file contains a JSON object representing a node (or a relationship) together with its metadata.

We provide two additional scholarly graphs:
- The curated MES graph with the removed edges. During curation we removed some edges because they were labeled with inconsistent or imprecise semantics. This graph includes the same nodes and edges as the previous one and, in addition, contains the edges removed during the curation pipeline; these edges are marked as Removed. This graph is in the curated_MES_with_removed_semantics folder.
- The original MES community of OpenAIRE. It represents the MES community extracted from the OpenAIRE Research Graph. This graph has not been curated, and the metadata and semantics are those of the OpenAIRE Research Graph. This graph is in the original_MES_community folder.
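
Since each line of a JSONL file holds one JSON object, the files can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself: the file name `relationships.jsonl` and the key `semantics` are hypothetical placeholders, because the description above does not state the actual file names or field names. Inspect the downloaded files and adjust both before use.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)


def count_by_key(records: Iterable[dict], key: str) -> Counter:
    """Count records by the value stored under `key`.

    Records that lack the key are counted under None. The values are
    assumed to be scalars (strings or numbers), since Counter needs
    hashable keys.
    """
    return Counter(record.get(key) for record in records)


# Hypothetical path and key: neither name is confirmed by the dataset
# description, so check the real files first.
edges_path = Path("curated_MES") / "relationships.jsonl"
if edges_path.exists():
    for label, n in count_by_key(read_jsonl(edges_path), "semantics").most_common():
        print(label, n)
```

Because `read_jsonl` is a generator, the file is streamed line by line rather than loaded whole, which keeps memory use low for the larger files such as the authors file.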






