OIE4PA: Open Information Extraction for the Public Administration

DOI: 10.5281/zenodo.8331106, Zenodo: 8331106, MaRDI QID: Q6718885, FDO: Q6718885

Dataset published in the Zenodo repository.

Pierpaolo Basile, Lucia Siciliani, Pasquale Lops, Eleonora Ghizzota

Publication date: 9 September 2023

Copyright license: Creative Commons Attribution 4.0 International

Tenders are powerful means of investment of public funds and represent a strategic development resource. Despite the efforts made so far by governments at national and international levels to digitalise documents related to the Public Administration sector, most of the information is still available in an unstructured format only. With the aim of bridging this gap, we present OIE4PA, our latest study on extracting and classifying relations from tenders of the Public Administration. Our work focuses on the Italian language, where the availability of linguistic resources to perform Natural Language Processing tasks is considerably limited. For evaluation purposes, we built a dataset composed of 2,000 triples extracted from Italian tenders, which have been manually annotated by two human experts.

The dataset, compressed in a single zip file, is composed of:

- The corpus of 6,262 texts extracted from Italian public tenders (corpus_tenders)
- The training set of 1,600 annotated triples (training_set)
- The test set of 400 annotated triples (test_set)
- The set U of 14,096 triples used for the self-training (u_triples_dd)
- A compressed archive that contains both the extracted triples and the index for each supervised approach (extraction)
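
The archive components listed above can be located after download with a short script. The following is a minimal sketch, not part of the dataset: it assumes only the component names given in the description (corpus_tenders, training_set, test_set, u_triples_dd, extraction) and makes no assumption about file extensions or internal formats, which the description does not state. The function name `group_members` is hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Component names as given in the dataset description.
# File extensions and internal formats are not documented there,
# so members are matched by name prefix only (an assumption).
COMPONENTS = ["corpus_tenders", "training_set", "test_set", "u_triples_dd", "extraction"]


def group_members(archive_path):
    """Map each documented component name to the matching archive members.

    A member matches a component if any part of its path (folder or
    file name) starts with the component name.
    """
    groups = {name: [] for name in COMPONENTS}
    with zipfile.ZipFile(archive_path) as zf:
        for member in zf.namelist():
            if member.endswith("/"):
                continue  # skip explicit directory entries
            parts = PurePosixPath(member).parts
            for name in COMPONENTS:
                if any(part.startswith(name) for part in parts):
                    groups[name].append(member)
                    break  # assign each member to at most one component
    return groups
```

Calling `group_members` on the path of the downloaded zip file returns a dictionary keyed by component name; an empty list for a component suggests that the prefix assumption does not hold for that file and the archive should be inspected manually.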
