AYNEC-Datasets




DOI: 10.5281/zenodo.2564955, Zenodo: 2564955, MaRDI QID: Q6686428, FDO: Q6686428

Dataset published at Zenodo repository.

Agustín Borrego, David Ruiz, Carlos Rivero, Daniel Ayala, Inma Hernández

Publication date: 14 February 2019

Copyright license: Creative Commons Attribution 4.0 International



These datasets are presented in the article "AYNEC: All You Need for Evaluating Completion Techniques in Knowledge Graphs", submitted to ESWC19. Please cite it in your work if you make use of them.

The following datasets are included:

WN18-AF, generated from WN18.
WN18-AR, generated from WN18, removing inverses.
WN11-AF, generated from WN11.
WN11-AR, generated from WN11, removing inverses.
FB13-A, generated from FB13.
FB15K-AF, generated from FB15K.
FB15K-AR, generated from FB15K, keeping relations that cover 95% of the graph and removing inverses.
NELL-AF, generated from NELL.
NELL-AR, generated from NELL, keeping relations that cover 95% of the graph and removing inverses.

In all datasets, we removed relations with only one instance, held out 20% of the triples of each relation for testing, and generated one negative for each positive in both training and testing by replacing the target of the positive with a random entity. In WN11 and WN18, all entities are potential candidates. In the remaining datasets, only entities that have appeared as targets of the relation are candidates.

Two relations were considered inverses when there was a 90% overlap between them. That is, relations A and B are inverses if for 90% of the instances of A there is an instance of B with source and target swapped, and vice versa. When removing inverses, the smaller relation of each pair of inverses was removed.

Each zip file contains the following files about a dataset:

train.txt - triples used for training. Each line contains the source, the relation, the target, and the label (1 for positives and -1 for negatives).
test.txt - triples used for testing, following the same format.
relations.txt - a list of the relations in the dataset, each with its frequency.
entities.txt - a list of the entities in the dataset, each with its total degree, inward degree, and outward degree.
inverses.txt - a list of the inverses in the original graph, whether or not they were removed. Each inverse relationship is represented by a pair of relations.
summary.html - a visual summary of the relation frequencies and entity degrees (without removed inverses).
dataset.gexf - the entire dataset in the open graph format GEXF, which can be opened by applications such as Gephi.
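
The triple file layout, the 90% inverse criterion, and the negative generation described above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (parse_triples, are_inverses, corrupt_target) are hypothetical, the tab separator is an assumption (the description does not state the field delimiter), and excluding the original target when drawing a negative is also an assumption.

```python
import random


def parse_triples(lines, sep="\t"):
    """Parse 'source, relation, target, label' lines into tuples.

    ASSUMPTION: fields are tab-separated; pass another `sep` if they are not.
    """
    triples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        source, relation, target, label = line.split(sep)
        triples.append((source, relation, target, int(label)))  # label is 1 or -1
    return triples


def are_inverses(pairs_a, pairs_b, threshold=0.9):
    """Check the overlap criterion in both directions.

    `pairs_a` and `pairs_b` are the (source, target) pairs of relations A and B.
    A and B count as inverses if at least `threshold` of A's pairs appear in B
    with source and target swapped, and vice versa.
    """
    a, b = set(pairs_a), set(pairs_b)
    if not a or not b:
        return False
    a_in_b = sum((t, s) in b for s, t in a) / len(a)
    b_in_a = sum((t, s) in a for s, t in b) / len(b)
    return a_in_b >= threshold and b_in_a >= threshold


def corrupt_target(triple, candidates, rng):
    """Build one negative by replacing the target with a random candidate.

    ASSUMPTION: the original target is excluded from the draw; the description
    only says the target is replaced with a random entity.
    Raises IndexError if no other candidate is available.
    """
    source, relation, target, _ = triple
    options = [entity for entity in candidates if entity != target]
    return (source, relation, rng.choice(options), -1)


# Usage with made-up identifiers (not taken from the actual datasets).
sample_lines = ["e1\thypernym\te2\t1\n", "e1\thypernym\te3\t-1\n"]
sample_triples = parse_triples(sample_lines)
negative = corrupt_target(sample_triples[0], ["e2", "e3", "e4"], random.Random(0))
```

For WN11 and WN18 the candidates argument would hold all entities; for the other datasets it would hold only the entities seen as targets of the relation in question.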






