Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using LLMs

DOI: 10.5281/zenodo.10989709; Zenodo: 10989709; MaRDI QID: Q6702279; FDO: Q6702279

Dataset published in the Zenodo repository.

Daniil Dobriy, Axel Polleres, Tek Raj Chhetri, Finn Årup Nielsen, Sanju Tiwari, Nandana Mihindukulasooriya

Publication date: 18 April 2024

This dataset provides the input data and intermediate results of the paper titled "Scholarly Wikidata: Population and Exploration of Conference Data in Wikidata using Large Language Models and Semantic Web Techniques". It contains the following resources.

conference proceedings front matter links - these links can be used to download the PDF files of the conference proceedings front matter. The front matter reports the number of submitted and accepted papers (from which acceptance rates can be calculated), the names of all conference organization committee members, and the names of the programme committee and senior programme committee members for each track, along with other facts such as the main topics of the submitted papers and the emerging topics according to the editors.

web crawl of conference websites - this contains crawled content from each conference website in both HTML and text formats. Each file contains the web pages of a specific conference along with each page's URL, title, and content. Information such as important dates (deadlines) and other announcements can be extracted from the content of the websites.

papers and paper-authors list for each conference in a given conference series - this contains the list of papers with their corresponding authors for each conference series, extracted from DBLP.

OpenRefine projects - this contains examples of the OpenRefine projects that were used to perform entity linking and reconciliation, as well as the schemas that were used to map tabular data columns to Wikidata properties and qualifiers, and cell values to Wikidata entities.

evaluation benchmark - this contains the outputs of LLM generations for the following tasks: (a) extraction of the number of submitted and accepted papers for each track at a given conference, (b) extraction of organizers with their roles for each conference, (c) extraction of programme committee members with their track and role (member, SPC member), and (d) extraction of important dates or deadlines for each activity (submission, notification, etc.) in each track.

The corresponding source code is available at the scholary-data repo.
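The acceptance rate mentioned for the front matter data is the number of accepted papers divided by the number of submitted papers. The following is a minimal sketch of that calculation in Python. The `TrackCounts` record, its field names, and the numbers are hypothetical and chosen for illustration only; they are not the dataset's actual schema or values.

```python
from dataclasses import dataclass


@dataclass
class TrackCounts:
    """Hypothetical per-track record; not the dataset's actual schema."""
    track: str
    submitted: int
    accepted: int


def acceptance_rate(counts: TrackCounts) -> float:
    """Return accepted / submitted as a fraction between 0 and 1."""
    if counts.submitted <= 0:
        raise ValueError("submitted must be positive")
    if not 0 <= counts.accepted <= counts.submitted:
        raise ValueError("accepted must be between 0 and submitted")
    return counts.accepted / counts.submitted


# Illustrative numbers only, invented for this example.
example = TrackCounts(track="research", submitted=200, accepted=50)
rate = acceptance_rate(example)
print(f"{example.track}: {rate:.1%}")  # prints "research: 25.0%"
```

The validation step is a design choice for this sketch: counts produced by automated extraction may be inconsistent (for example, more accepted than submitted papers), and failing early makes such cases visible.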
