Gitome: A curated dataset for GitHub README-related tasks
DOI: 10.5281/zenodo.10311456
Zenodo: 10311456
MaRDI QID: Q6706838
FDO: Q6706838
Dataset published at Zenodo repository.
Claudio Di Sipio, Phuong Than Nguyen, Juri Di Rocco, Davide Di Ruscio, Riccardo Rubei
Publication date: 8 December 2023
Copyright license: Creative Commons Attribution 4.0 International
About

This repository contains the source code used to replicate the experimental results reported in the paper "Gitome: A curated dataset for GitHub README-related tasks", submitted to the 21st International Conference on Mining Software Repositories (MSR 2024).

Authored by: Claudio Di Sipio, Juri Di Rocco, Riccardo Rubei, Phuong Than Nguyen, and Davide Di Ruscio, Università degli Studi dell'Aquila, Italy

Data description

The dataset is structured as follows:

emf_metamodel.zip: the Ecore project with the Gitome data model
existing_dumps.zip: the existing datasets used to build Gitome
lang_aggr_stats.csv: the language data used to compute the statistics presented in the paper
langs.csv: all the languages and their frequency
output_dataset.zip: the benchmarking dataset obtained by parsing the README files
repository_lists.zip: the list of repositories for each considered dataset (with possible duplicates)
topics.csv: all the topics and their frequency
topics_aggr_stats.csv: the topics data used to compute the statistics presented in the paper
gitome_repo.txt: the list of the URLs of the considered GitHub repositories

How to collect Gitome

To collect all the data stored in this archive, please refer to the supporting GitHub repository: https://github.com/MDEGroup/Gitome-MSR2024.
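
Once the archive has been downloaded from Zenodo, the CSV and zip files listed in the data description can be inspected with a few lines of Python. The following is only a minimal sketch, not part of the authors' tooling: the `gitome` directory name and the helper functions `preview_csv` and `list_archive` are hypothetical, and no assumption is made about the column names or the internal layout of the archives, which are simply printed as found.

```python
import csv
import zipfile
from itertools import islice
from pathlib import Path


def preview_csv(path, n_rows=5):
    """Return the first row (assumed to be a header) and up to n_rows following rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = list(islice(reader, n_rows))
    return header, rows


def list_archive(path):
    """Return the names of the members stored in a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical local directory holding the files downloaded from Zenodo.
    data_dir = Path("gitome")

    for name in ("langs.csv", "topics.csv"):
        csv_path = data_dir / name
        if csv_path.exists():
            header, rows = preview_csv(csv_path)
            print(name, "first row:", header)
            for row in rows:
                print("  ", row)

    for name in ("output_dataset.zip", "repository_lists.zip"):
        zip_path = data_dir / name
        if zip_path.exists():
            print(name, "members:", list_archive(zip_path)[:10])
```

Printing the first row and the archive member names before any further processing is a cautious way to discover the actual structure of the files rather than guessing it.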